Title: Characterizing Anomalies for Reliable Machine Learning

 

Date: Monday, November 18, 2024

Time: 9:30 AM – 10:45 AM ET

Location [Hybrid]: 

- Coda C0908 Home Park

- Zoom Link: https://gatech.zoom.us/j/92792928638?pwd=kkeTGxpYxd7bu3zZ2cx0XZ5PS5o918.1 

 

Matthew Lau

Ph.D. Student in Computer Science

School of Cybersecurity and Privacy

College of Computing

Georgia Institute of Technology

 

Committee:

Dr. Wenke Lee (Advisor) - School of Cybersecurity and Privacy, Georgia Institute of Technology

Dr. Athanasios P. (Sakis) Meliopoulos - School of Electrical and Computer Engineering, Georgia Institute of Technology

Dr. Saman Zonouz - School of Cybersecurity and Privacy & School of Electrical and Computer Engineering, Georgia Institute of Technology

Dr. Duen Horng (Polo) Chau - School of Computational Science & Engineering, Georgia Institute of Technology

Dr. Xiaoming Huo - School of Industrial and Systems Engineering, Georgia Institute of Technology

 

Abstract:

Machine learning (ML) has had much success across a variety of domains and tasks over the past few decades. However, ML models often assume that test data statistically mirror the training data, an assumption that fails in the presence of anomalies (i.e., test data that do not mirror the training data). Yet, the scenarios that produce anomalies are precisely the situations that can be safety- and security-critical, such as cyber-attacks. To ensure that ML models are reliable (i.e., accurate even when fed anomalies), we propose a framework to characterize anomalies and incorporate this characterization into the ML pipeline. We discuss how to apply this framework to unknown and foreseeable anomalies, neither of which we have data for.

 

For unknown anomalies, we characterize them as living in large open spaces and ensure that models are conservative there; the presence of unknown anomalies is a defining trait of anomaly detection. We bias neural networks to be conservative in these large open spaces, classifying them as anomalous. We show how to impose this bias statistically for unsupervised anomaly detection and geometrically for supervised anomaly detection. With this bias, we improve the reliability of neural networks on unknown anomalies.
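To give a rough sense of what "conservative in open space" means, the sketch below uses a generic distance-based anomaly score: test points far from the training data (i.e., in open space) receive high scores and are flagged as anomalous. This is only an illustrative stand-in, not the statistical or geometric biasing of neural networks developed in the thesis.

```python
# Minimal sketch (illustrative only): an anomaly score that grows with distance
# from the training data, so large open regions are treated as anomalous.
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X_train = rng.normal(0.0, 1.0, size=(500, 2))   # nominal training data

knn = NearestNeighbors(n_neighbors=5).fit(X_train)

def anomaly_score(X):
    # Score = mean distance to the 5 nearest training points.
    dist, _ = knn.kneighbors(X)
    return dist.mean(axis=1)

# Conservative threshold taken from the training scores (99th percentile).
tau = np.quantile(anomaly_score(X_train), 0.99)

X_test = np.array([[0.1, -0.2],    # near the training distribution
                   [8.0, 8.0]])    # deep in open space
print(anomaly_score(X_test) > tau)  # expected: [False  True]
```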

 

For foreseeable anomalies, we analyze and account for their patterns (known as signatures) through feature engineering. Attacks on cyber-physical systems (CPSes) are anomalies we can foresee because attacks are constrained by the cyber or physical components of the system. Here, we characterize each attack signature and ensure that the ML model accounts for it. We show that our approach is principled through two case studies: (1) cyber-attacks against explainable anomaly detection on power grids and (2) physical adversarial attacks against video-based object detection. In the first case study, we design graph change statistics to localize attacked sensors with phase- and amplitude-based signatures. In the second, we project images onto the data manifold with background subtraction before model fine-tuning, promoting robustness against off- and on-manifold adversarial signatures. In both cases, characterizing attack signatures with feature engineering keeps ML models accurate even during attacks.
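The sketch below shows one generic way background subtraction could be used to keep only the moving foreground of each video frame before fine-tuning a detector. The video filename and subtractor parameters are hypothetical, and this is not the exact projection pipeline from the thesis.

```python
# Minimal sketch (hypothetical pipeline): background subtraction as a rough
# "projection" of video frames toward the scene's foreground content.
import cv2

cap = cv2.VideoCapture("traffic.mp4")            # hypothetical input video
subtractor = cv2.createBackgroundSubtractorMOG2(
    history=200, varThreshold=25, detectShadows=False)

projected_frames = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    fg_mask = subtractor.apply(frame)            # 0 = background, 255 = foreground
    fg_only = cv2.bitwise_and(frame, frame, mask=fg_mask)
    projected_frames.append(fg_only)             # later used for detector fine-tuning
cap.release()
```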

 

In summary, this thesis proposes a framework to characterize anomalies in ML. For unknown anomalies, we encourage ML models to be conservative in large open spaces. When more information is available, we use feature engineering to account for the signatures of foreseeable anomalies. By accounting for potential anomalies in both cases, we increase the reliability of ML.