Investigating and mitigating the performance-fairness tradeoff via protected-category sampling
Publisher
Montana State University - Bozeman, College of Engineering
Abstract
Machine learning algorithms have become common in everyday decision-making, and decision-assistance systems are now ubiquitous. Consequently, research on preventing and mitigating potential bias and unfairness in the predictions these algorithms make has grown in recent years. Most research on fairness and bias mitigation in machine learning treats each protected variable separately, but in reality one person can belong to multiple protected categories. This thesis therefore examines combining a set of protected variables and generating new columns that separate those variables into many subcategories. These new subcategories tend to be extremely imbalanced, especially when the class label is considered jointly with the protected categories, so bias mitigation was approached as an imbalanced-classification problem. Specifically, four new custom sampling methods were developed and investigated to sample these subcategories: Protected-Category Oversampling, Protected-Category Proportional Sampling, Protected-Category Synthetic Minority Oversampling Technique (PC-SMOTE), and Protected-Category Adaptive Synthetic Sampling (PC-ADASYN). Each modifies an existing sampling method by focusing its sampling on the new subcategories rather than on the class labels. The impact of these sampling strategies was then evaluated using classical performance and fairness metrics in classification settings. Classification performance was measured using the accuracy, precision, recall, and F1 score of trained univariate decision trees, and fairness was measured using equalized odds difference, disparate impact, predictive equality, and statistical parity. To examine fairness against performance, these measures were evaluated as functions of decision-tree depth. The results show that the proposed methods were able to identify optimal points at which fairness increased without decreasing performance, thus mitigating the potential performance-fairness tradeoff. To evaluate the impact of the proposed sampling on the class labels, an additional experiment computed the probability distribution of each class within each protected category and used KL divergence to measure the difference between these distributions. The results of this experiment show varying KL divergence between the no-sampling baseline and the sampling methods.
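
To make the subcategory construction concrete, the sketch below shows one plausible way to combine protected variables into a single subcategory column with pandas. The column names ("sex", "race", "label") and the concatenation scheme are illustrative assumptions, not the thesis's exact encoding.

    # Hypothetical illustration: the protected-variable names and the
    # "_" separator are assumptions, not the thesis's actual scheme.
    import pandas as pd

    df = pd.DataFrame({
        "sex":   ["F", "M", "F", "M", "F"],
        "race":  ["A", "A", "B", "B", "B"],
        "label": [1, 0, 0, 1, 1],
    })

    # Each unique combination of protected variables (e.g. "F_A")
    # becomes one protected subcategory.
    df["subcategory"] = df["sex"] + "_" + df["race"]
    print(df["subcategory"].value_counts())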
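The abstract names Protected-Category Oversampling without spelling out the algorithm; a minimal sketch consistent with the description, assuming random oversampling with replacement until every subcategory matches the size of the largest one, is:

    import pandas as pd

    def pc_oversample(df: pd.DataFrame, subcat_col: str, seed: int = 0) -> pd.DataFrame:
        # Resample each protected subcategory (not each class label) with
        # replacement until all subcategories reach the largest one's size.
        target = df[subcat_col].value_counts().max()
        groups = [
            g.sample(n=target, replace=True, random_state=seed)
            for _, g in df.groupby(subcat_col)
        ]
        return pd.concat(groups, ignore_index=True)

Protected-Category Proportional Sampling would presumably draw each subcategory toward some target proportion rather than fully equalizing counts; in this sketch, that variant only changes how target is computed.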
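For PC-SMOTE and PC-ADASYN, the abstract says the existing SMOTE and ADASYN procedures are redirected at the subcategories rather than the class labels. One way to approximate that idea with imbalanced-learn is to pass the subcategory column as the resampling target and carry the class label along as an extra feature; this is an assumption about the mechanics, not the thesis's implementation, and SMOTE additionally requires each subcategory to contain more samples than its k_neighbors setting.

    import numpy as np
    from imblearn.over_sampling import SMOTE  # ADASYN is a drop-in swap

    def pc_smote(X: np.ndarray, y: np.ndarray, subcats: np.ndarray, seed: int = 0):
        # Append y as a column so synthetic rows also receive a label,
        # then balance the protected subcategories instead of the classes.
        Xy = np.column_stack([X, y])
        Xy_res, subcats_res = SMOTE(random_state=seed).fit_resample(Xy, subcats)
        # Labels interpolated during synthesis are rounded back to {0, 1}.
        X_res = Xy_res[:, :-1]
        y_res = np.rint(Xy_res[:, -1]).astype(int)
        return X_res, y_res, subcats_res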
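The fairness metrics named in the abstract have standard definitions. For a binary protected-group indicator, statistical parity difference and disparate impact reduce to the following (the function names are ours, for illustration):

    import numpy as np

    def statistical_parity_difference(y_pred, group):
        # P(y_hat = 1 | group = 1) - P(y_hat = 1 | group = 0)
        y_pred, group = np.asarray(y_pred), np.asarray(group)
        return y_pred[group == 1].mean() - y_pred[group == 0].mean()

    def disparate_impact(y_pred, group):
        # P(y_hat = 1 | group = 1) / P(y_hat = 1 | group = 0)
        y_pred, group = np.asarray(y_pred), np.asarray(group)
        return y_pred[group == 1].mean() / y_pred[group == 0].mean()

Equalized odds difference and predictive equality condition additionally on the true label, comparing true-positive and false-positive rates across groups rather than raw positive-prediction rates.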
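Finally, the KL-divergence check described at the end of the abstract can be reproduced in outline: estimate the class distribution within a protected category before and after sampling, then compare the two. The smoothing constant below is an assumption added to guard against empty classes.

    import numpy as np
    from scipy.stats import entropy

    def class_kl(counts_before, counts_after, eps=1e-12):
        # KL(P || Q) between two empirical class distributions, given
        # raw per-class counts for one protected category.
        p = np.asarray(counts_before, dtype=float) + eps
        q = np.asarray(counts_after, dtype=float) + eps
        return entropy(p / p.sum(), q / q.sum())

    # e.g. class counts for one subcategory before vs. after sampling
    print(class_kl([90, 10], [60, 40]))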