Deriving and evaluating a fault model for testing data science applications

Atif Aftab Jilani; Salman Sherin; Sidra Ijaz; Muhammad Zohaib Iqbal; Muhammad Uzair Khan

doi:10.1002/smr.2449


Journal article

Peer reviewed


Journal of Software: Evolution and Process, Vol. 34(5)

May 2022


Subjects: Computer Science; Computer Science, Software Engineering; Science & Technology; Technology

Abstract

Data science (DS) applications suffer not only from traditional software faults but also from data-specific and model-related faults. Fault models play an important role in designing and evaluating tests for DS applications, yet existing fault models do not consider DS-specific faults. In this study, we build a fault model for DS applications. We investigate faults using two complementary approaches: (i) a multi-vocal literature survey of published literature and (ii) semi-structured interviews with industry experts. The multi-vocal study allows us to synthesize existing knowledge from both researchers and practitioners, while qualitative data from the semi-structured interviews provide insights into the nature of the faults practitioners encounter. We combine the results of (i) and (ii) to derive a detailed fault model. We then validate the fault model through a quantitative survey of industry practitioners, in which respondents identified the faults from our proposed model that they have experienced and classified those faults by their perceived severity and frequency. The results show that practitioners consider prediction bias and model decay the most severe faults, whereas data sampling and splitting faults and feature engineering faults are the most frequent.


Details

Title: Deriving and evaluating a fault model for testing data science applications
Creators: Atif Aftab Jilani - National University of Computer and Emerging Sciences
Salman Sherin - National University of Computer and Emerging Sciences
Sidra Ijaz - National University of Computer and Emerging Sciences
Muhammad Zohaib Iqbal - National University of Computer and Emerging Sciences
Muhammad Uzair Khan - Quest University Canada
Publication Details: Journal of Software: Evolution and Process, Vol. 34(5)
Publisher: Wiley
Number of pages: 31
Grant note: HEC Planning Commission Pakistan
Identifiers: 9919878808331
Academic Unit: King Faisal University
Language: English
Resource Type: Journal article