MetaCDP: Metamorphic Testing for Quality Assurance of Containerized Data Pipelines
dc.contributor.author | ur Rehman, Faqeer
dc.contributor.author | Umbreen, Sidrah
dc.contributor.author | Rehman, Mudasser
dc.date.accessioned | 2024-11-26T18:39:23Z
dc.date.issued | 2024-06
dc.description.abstract | In the ever-evolving world of technology, companies are investing heavily in building and deploying state-of-the-art Machine Learning (ML) based systems. However, such systems rely heavily on the availability of high-quality data, which is often prepared or generated by Extract, Transform, Load (ETL) data pipelines; these pipelines are therefore critical components of an end-to-end ML system. A low-performing model (trained on buggy data) running in a production environment can cause both financial and reputational losses for an organization. It is therefore essential to assure the quality of the underlying data pipelines from multiple perspectives. However, their computational complexity, the continuous change in data, and the integration of multiple components make these pipelines difficult to test effectively, and as a result they suffer from the test oracle problem. In this paper, we propose MetaCDP, a Metamorphic Testing approach that researchers and practitioners can use for the quality assurance of modern Containerized Data Pipelines. We propose 10 Metamorphic Relations (MRs) that target the robustness and correctness of the data pipeline under test, which plays a crucial role in providing high-quality data for developing a clustering-based anomaly detection model. To show the applicability of the proposed approach, we tested a data pipeline from the E-commerce domain and uncovered several erroneous behaviors. We also describe the nature of the issues identified by the proposed MRs, which can help guide software engineers and researchers toward best coding practices for maintaining and improving the quality of their data pipelines.
dc.identifier.citation | ur Rehman, F., Umbreen, S., & Rehman, M. (2024, June). MetaCDP: Metamorphic Testing for Quality Assurance of Containerized Data Pipelines. In 2024 IEEE Cloud Summit (pp. 135-142). IEEE.
dc.identifier.doi | 10.1109/Cloud-Summit61220.2024.00029
dc.identifier.uri | https://scholarworks.montana.edu/handle/1/18983
dc.language.iso | en_US
dc.publisher | IEEE
dc.rights | Copyright IEEE 2024
dc.rights.uri | https://www.ieee.org/publications/rights/copyright-policy.html
dc.subject | machine learning
dc.subject | metamorphic relations
dc.subject | quality assurance
dc.title | MetaCDP: Metamorphic Testing for Quality Assurance of Containerized Data Pipelines
dc.type | Article
mus.citation.extentfirstpage | 1
mus.citation.extentlastpage | 8
mus.citation.journaltitle | 2024 IEEE Cloud Summit
mus.relation.college | College of Engineering
mus.relation.department | Computer Science
mus.relation.university | Montana State University - Bozeman |