Talpini, Jacopo (2025). Uncertainty Quantification and Distributed Models to Enhance ML-based Network Security. (Doctoral thesis, Università degli Studi di Milano-Bicocca, 2025).
Uncertainty Quantification and Distributed Models to Enhance ML-based Network Security
TALPINI, JACOPO
2025
Abstract
Network intrusion detection stands as a major pillar for ensuring secure network connections. As both networks and network attacks grow in complexity, machine learning models have emerged as promising tools to form the foundation of advanced network intrusion detection systems. This thesis focuses on how the peculiar characteristics of this domain, namely its sensitivity, its criticality, and the availability of distributed computational power, should be appropriately considered in the design of learning algorithms for intrusion detection systems. The first part of the thesis investigates the problem of developing models whose predictions align with our expectations, especially in scenarios that depart from the usual assumption that the training and test sets contain independent and identically distributed samples from the same distribution. In particular, we focus on models that are inherently uncertainty-aware, such as those based on Bayesian inference, showing how they can enhance closed-set classification performance, enable Active Learning, and recognize inputs from unknown classes as truly unknown, unlocking open-set classification capabilities and Out-of-Distribution detection. The second part proposes federated learning as a framework for training models in a distributed and privacy-preserving way. We introduce an approach that mitigates the negative effects of data heterogeneity among clients, aiming to improve the overall predictive performance of a federated ML-based intrusion detection system. The third and final part investigates how to train uncertainty-aware models under the federated learning paradigm. We propose a simple approach to calibrate any given pre-trained model, assuming the availability of a local calibration set. In addition, we show how Bayesian inference provides a natural approach for designing a federated learning process that completes in a single communication round, the so-called one-shot federated learning. Overall, this manuscript aims to shed light on open problems and propose potential solutions in the field of ML-based intrusion detection, with the goal of enhancing the trustworthiness and scalability of machine learning models, particularly in comparison to traditional, centralized approaches.
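To make the first part's idea concrete, here is a minimal sketch, not the thesis' actual models: uncertainty-aware classification approximated by several stochastic forward passes (e.g. MC dropout or an ensemble), with inputs flagged as unknown when the predictive entropy of the averaged distribution exceeds a threshold. The `predictive_entropy` helper and the threshold value are illustrative assumptions.

```python
import numpy as np

def predictive_entropy(prob_samples: np.ndarray) -> np.ndarray:
    """prob_samples: (n_passes, n_inputs, n_classes) softmax outputs."""
    mean_probs = prob_samples.mean(axis=0)  # Bayesian model average over passes
    return -(mean_probs * np.log(mean_probs + 1e-12)).sum(axis=-1)

def flag_unknown(prob_samples: np.ndarray, threshold: float) -> np.ndarray:
    """True where the averaged prediction is too uncertain to trust."""
    return predictive_entropy(prob_samples) > threshold

# Toy usage: 10 stochastic passes, 2 inputs, 3 classes.
rng = np.random.default_rng(0)
samples = rng.dirichlet(np.ones(3), size=(10, 2))
print(flag_unknown(samples, threshold=1.0))
```

The same entropy score can drive Active Learning (query the most uncertain inputs) and open-set classification (reject them), which is why a single uncertainty-aware model unlocks both.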
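For the federated setting of the second part, the baseline server-side step is FedAvg-style aggregation; the sketch below shows only this generic weighted average, under the assumption of flat parameter vectors. The thesis' heterogeneity-mitigation technique itself is not reproduced here.

```python
import numpy as np

def fedavg(client_params: list[np.ndarray], client_sizes: list[int]) -> np.ndarray:
    """Server-side FedAvg: average client parameter vectors,
    weighted by the number of local training samples."""
    total = sum(client_sizes)
    return sum(p * (n / total) for p, n in zip(client_params, client_sizes))

# Toy usage: three clients with differently sized local datasets.
params = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
print(fedavg(params, [100, 50, 50]))
```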
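The per-client calibration in the third part assumes only a pre-trained model and a local calibration set. A standard way to realize this, shown here as an assumed stand-in rather than the thesis' method, is temperature scaling: fit a single scalar T on held-out logits by minimizing the negative log-likelihood.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def nll(T: float, logits: np.ndarray, labels: np.ndarray) -> float:
    """Negative log-likelihood of labels under temperature-scaled softmax."""
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def fit_temperature(logits: np.ndarray, labels: np.ndarray) -> float:
    """Fit T > 0 on the local calibration set; T > 1 softens predictions."""
    res = minimize_scalar(nll, bounds=(0.05, 20.0), args=(logits, labels),
                          method="bounded")
    return res.x

# Toy usage: overconfident logits yield a fitted temperature above 1.
rng = np.random.default_rng(0)
logits = rng.normal(scale=5.0, size=(200, 4))
labels = rng.integers(0, 4, size=200)
print(fit_temperature(logits, labels))
```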
File | Description | Attachment type | Size | Format
---|---|---|---|---
phd_unimib_848060.pdf (open access) | Doctoral thesis, Jacopo Talpini | Doctoral thesis | 6.08 MB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.