Abstract
Background: Despite the increasingly reported benefits of software crowdsourcing, a major practical concern is the limited visibility into, and control over, task progress. Aim: This paper reports an empirical study that develops a framework for failure prediction in software crowdsourcing. Method: The study first identifies 13 factors that influence software crowdsourcing failures, across four categories: task characteristics, technology popularity, competition network, and worker reliability. It then presents an algorithm that constructs the worker competition network and extracts its network-metric features. The proposed framework was evaluated on 4,872 software crowdsourcing tasks extracted from the TopCoder platform, using five machine learners, and compared with the in-house TopCoder predictor. Results: 1) Worker reliability, links in the description, number of registered workers, number of required technologies, and task-worker network modularity are the most influential factors for predicting crowdsourcing failure; 2) the top three learners for task failure are Naive Bayes, Random Forest, and StackingC, with precision above 98.8%, recall above 81.2%, and F-measure above 91.2%; and 3) the best learners in the proposed framework significantly outperform the two baseline models in our evaluation. Conclusions: The proposed framework outperforms both baseline models. This paper also offers practical recommendations for managing task failure risks.