Using improved machine learning and statistical assessments to mitigate bias in anomaly detection systems

Publisher

Montana State University - Bozeman, College of Engineering

Abstract

Anomaly detection systems are essential in domains like cybersecurity, marketing, fraud detection, and healthcare. However, bias in these systems can result in unfair outcomes, reduced accuracy, and compromised decisions. Traditional machine learning models often struggle with biased training data, inconsistent feature selection, and imbalanced class distributions, leading to unreliable performance. Bias mitigation is a growing area of research aimed at identifying, quantifying, and reducing bias in machine learning systems to ensure fairness, accuracy, and robustness. This work addresses bias in anomaly detection by integrating improved machine learning techniques with statistical assessments. First, a hybrid anomaly detection model is introduced for obfuscated malware detection. It combines a deep autoencoder and logistic regression to mitigate representation bias. The autoencoder learns a compact representation of the input, which is then classified using logistic regression. Statistical parity difference (SPD) is employed to assess and reduce bias quantitatively. Second, to address measurement bias caused by noisy or irrelevant features from network delays, hardware issues, or software faults in intrusion detection systems, a method combining data preprocessing and feature selection is proposed. Statistical hypothesis testing is also performed to evaluate the model's performance against state-of-the-art methods. Third, to tackle human-induced bias, three strategies are proposed: i) the use of unsupervised learning methods, such as Isolation Forest and one-class SVM, to reduce labeling bias in intrusion detection; ii) a language model-based SMS spam classification system that captures linguistic patterns and contextual nuances, improving fairness in keyword interpretation; iii) a context-enhanced clustering approach for SMS spam detection that reduces subjective labeling, inconsistent interpretations, and the need for domain expertise. 
This work helps close existing gaps in bias mitigation for anomaly detection, advancing the intersection of machine learning, statistical analysis, and fairness-aware AI. By addressing three types of bias (representation, measurement, and human-induced), this research improves both the performance and the fairness of anomaly detection systems.
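
As an illustration of the statistical parity difference (SPD) mentioned in the abstract, the sketch below uses the common textbook definition: the positive-prediction rate of one group minus that of another. The abstract does not say how groups are defined for malware detection, so the function name, the 0/1 group encoding, and the toy data are all assumptions made for this example, not the author's implementation.

```python
from typing import Sequence


def statistical_parity_difference(y_pred: Sequence[int], group: Sequence[int]) -> float:
    """Return P(y_pred = 1 | group = 0) - P(y_pred = 1 | group = 1).

    Assumed encoding (hypothetical): group 0 is the unprivileged group,
    group 1 is the privileged group, and y_pred holds 0/1 predictions.
    A value of 0 means both groups are flagged at the same rate.
    """
    if len(y_pred) != len(group):
        raise ValueError("y_pred and group must have the same length")

    unprivileged = [p for p, g in zip(y_pred, group) if g == 0]
    privileged = [p for p, g in zip(y_pred, group) if g == 1]
    if not unprivileged or not privileged:
        raise ValueError("both groups must contain at least one sample")

    rate_unprivileged = sum(unprivileged) / len(unprivileged)
    rate_privileged = sum(privileged) / len(privileged)
    return rate_unprivileged - rate_privileged


# Toy data, invented for illustration only.
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
group = [0, 0, 0, 0, 1, 1, 1, 1]

spd = statistical_parity_difference(y_pred, group)
print(f"SPD = {spd:.2f}")
```

In this toy data, group 0 is flagged in 3 of 4 cases and group 1 in 1 of 4, so the SPD is 0.50. Values closer to 0 indicate more similar treatment of the two groups under this metric.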
