Using improved machine learning and statistical assessments to mitigate bias in anomaly detection systems

dc.contributor.advisor: Chairperson, Graduate Committee: Clemente Izurieta [en]
dc.contributor.author: Shu Fuhnwi, Gerard [en]
dc.contributor.other: This is a manuscript style paper that includes co-authored chapters. [en]
dc.date.accessioned: 2025-11-25T14:38:26Z
dc.date.available: 2025-11-25T14:38:26Z
dc.date.issued: 2025 [en]
dc.description.abstract: Anomaly detection systems are essential in domains like cybersecurity, marketing, fraud detection, and healthcare. However, bias in these systems can result in unfair outcomes, reduced accuracy, and compromised decisions. Traditional machine learning models often struggle with biased training data, inconsistent feature selection, and imbalanced class distributions, leading to unreliable performance. Bias mitigation is a growing area of research aimed at identifying, quantifying, and reducing bias in machine learning systems to ensure fairness, accuracy, and robustness. This work addresses bias in anomaly detection by integrating improved machine learning techniques with statistical assessments. First, a hybrid anomaly detection model is introduced for obfuscated malware detection. It combines a deep autoencoder and logistic regression to mitigate representation bias. The autoencoder learns a compact representation of the input, which is then classified using logistic regression. Statistical parity difference (SPD) is employed to assess and reduce bias quantitatively. Second, to address measurement bias caused by noisy or irrelevant features from network delays, hardware issues, or software faults in intrusion detection systems, a method combining data preprocessing and feature selection is proposed. Statistical hypothesis testing is also performed to evaluate the model's performance against state-of-the-art methods. Third, to tackle human-induced bias, three strategies are proposed: i) the use of unsupervised learning methods, such as Isolation Forest and one-class SVM, to reduce labeling bias in intrusion detection; ii) a language model-based SMS spam classification system that captures linguistic patterns and contextual nuances, improving fairness in keyword interpretation; iii) a context-enhanced clustering approach for SMS spam detection that reduces subjective labeling, inconsistent interpretations, and the need for domain expertise.
This work contributes to closing existing gaps in bias mitigation for anomaly detection, advancing the intersection of machine learning, statistical analysis, and fairness-aware AI. By addressing multiple types of bias, such as representation, measurement, and human-induced, this research enhances both the performance and fairness of anomaly detection systems. [en]
dc.identifier.uri: https://scholarworks.montana.edu/handle/1/19428 [en]
dc.language.iso: en [en]
dc.publisher: Montana State University - Bozeman, College of Engineering [en]
dc.rights.holder: Copyright 2025 by Gerard Shu Fuhnwi [en]
dc.subject.lcsh: Anomaly detection (Computer security) [en]
dc.subject.lcsh: Machine learning [en]
dc.subject.lcsh: Statistics [en]
dc.subject.lcsh: Discrimination [en]
dc.title: Using improved machine learning and statistical assessments to mitigate bias in anomaly detection systems [en]
dc.type: Dissertation [en]
mus.data.thumbpage: 16 [en]
thesis.degree.committeemembers: Members, Graduate Committee: Brad Whitaker; Matthew Revelle; Stacey Hancock [en]
thesis.degree.department: Computing [en]
thesis.degree.genre: Dissertation [en]
thesis.degree.name: PhD [en]
thesis.format.extentfirstpage: 1 [en]
thesis.format.extentlastpage: 143 [en]
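
The abstract names statistical parity difference (SPD) as the bias measure, but this record does not give its formula. The sketch below assumes the commonly used definition, SPD = P(prediction = 1 | unprivileged group) - P(prediction = 1 | privileged group), where a value near 0 indicates similar flag rates across groups. The function name, the 0/1 group encoding, and the toy data are illustrative assumptions and are not taken from the dissertation.

```python
def statistical_parity_difference(y_pred, group):
    """Return P(y_pred = 1 | group = 0) - P(y_pred = 1 | group = 1).

    Assumption: group 0 marks the unprivileged group and group 1 the
    privileged group; y_pred holds binary predictions (1 = flagged).
    """
    if len(y_pred) != len(group):
        raise ValueError("y_pred and group must have the same length")
    rates = {}
    for g in (0, 1):
        # Collect the predictions that belong to this group.
        preds = [p for p, m in zip(y_pred, group) if m == g]
        if not preds:
            raise ValueError(f"no samples for group {g}")
        # Fraction of this group's samples that were flagged.
        rates[g] = sum(preds) / len(preds)
    return rates[0] - rates[1]


# Hypothetical toy data: 1 = flagged as anomalous.
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
group = [0, 0, 0, 0, 1, 1, 1, 1]
spd = statistical_parity_difference(y_pred, group)
print(spd)
```

In this toy case group 0 is flagged at a rate of 0.25 and group 1 at 0.75, so the SPD is negative, meaning the model flags the privileged group more often under the stated convention.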

Files

Original bundle

Name: shu-fuhnwi-using-2025.pdf
Size: 1.29 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 825 B
Format: Item-specific license agreed upon to submission