Reducing Human-Induced Label Bias in SMS Spam with Context-Enhanced Clustering (CEC)

Shu Fuhnwi, Gerard; Reinhold, Ann Marie; Izurieta, Clemente

doi:10.1109/csr64739.2025.11130032

dc.contributor.author	Shu Fuhnwi, Gerard
dc.contributor.author	Reinhold, Ann Marie
dc.contributor.author	Izurieta, Clemente
dc.date.accessioned	2026-04-29T18:54:41Z
dc.date.issued	2025-08
dc.description.abstract	Short Message Service (SMS) is a widely used text messaging feature available on both basic phones and smartphones, making SMS spam detection a critical task. Supervised machine learning approaches often face challenges in this domain due to their dependence on manually crafted features, such as keyword detection, which can result in simplistic patterns and misclassification of more complex messages. Furthermore, these models can exacerbate human-induced bias if the training data include inconsistent labeling or subjective interpretations, leading to unfair treatment of specific keywords or contexts. We propose a Context-Enhanced Clustering (CEC) approach to address these challenges by leveraging contextual metadata, adaptive thresholding, and modified similarity measures for clustering. We evaluate our approach using the English SMS spam dataset sourced from UC Irvine’s Machine Learning Repository. CEC identifies representative samples from the SMS dataset to fine-tune LLMs such as ChatGPT-4, improving the robustness and fairness of spam classification. Our approach outperforms traditional clustering techniques such as K-means and DBSCAN in mitigating bias, as demonstrated through experiments measuring a balanced accuracy of 85% and a treatment equality difference (TED) of precisely zero. When used to identify representative samples to fine-tune ChatGPT-4, CEC achieves a balanced accuracy of 98%, with an equal opportunity difference (EOD) and a treatment equality difference (TED) of zero. These results indicate a significant reduction in human-induced bias while maintaining high classification accuracy.
dc.identifier.citation	Fuhnwi, G. S., Reinhold, A. M., & Izurieta, C. (2025, August). Reducing Human-Induced Label Bias in SMS Spam with Context-Enhanced Clustering (CEC). In 2025 IEEE International Conference on Cyber Security and Resilience (CSR) (pp. 71-76). IEEE.
dc.identifier.doi	10.1109/csr64739.2025.11130032
dc.identifier.uri	https://scholarworks.montana.edu/handle/1/19801
dc.language.iso	en_US
dc.publisher	IEEE
dc.rights	Copyright IEEE 2025
dc.rights.uri	https://www.ieee.org/publications/rights/copyright-policy
dc.subject	Short Message Service (SMS)
dc.subject	spam detection
dc.subject	Context-Enhanced Clustering (CEC)
dc.title	Reducing Human-Induced Label Bias in SMS Spam with Context-Enhanced Clustering (CEC)
dc.type	Article
mus.citation.extentfirstpage	1
mus.citation.extentlastpage	6
mus.citation.journaltitle	2025 IEEE International Conference on Cyber Security and Resilience (CSR)
mus.relation.college	College of Engineering
mus.relation.department	Computer Science
mus.relation.university	Montana State University - Bozeman

Files

Original bundle

Name: fuhnwi-human-induced-label-bias-2025.pdf
Size: 2.74 MB
Format: Adobe Portable Document Format

Collections

Scholarly Work - Computer Science