Abstract
Data mining is an active research area that has attracted significant attention because of rapidly growing quantities of data and the pressing need to transform these data into useful information and knowledge. One dataset that is still growing rapidly is the medical information collected from patients infected with Middle East Respiratory Syndrome Coronavirus (MERS-CoV), which causes a viral respiratory disease that is spreading worldwide. As the need for an accurate diagnosis system that predicts MERS-CoV infections has increased, comparing classifier performance across different classification types can help to improve the prediction accuracy of MERS-CoV infection. In this paper, we examine classifier performance for three classification types: 1) binary; 2) multi-class; and 3) multi-label, on a text-based MERS-CoV dataset, using cross-validation to measure the accuracy of the k-nearest neighbor, decision tree, and naive Bayes algorithms. Our empirical study shows that the decision tree classifier performed best for binary classification, with an accuracy of 90%. In contrast, for multi-class classification, the k-nearest neighbor algorithm performed comparatively well, with an accuracy of 51.60%, but did not reach a satisfactory level. For multi-label classification, the naive Bayes classifier was the most accurate, at 77%. This work is part of a larger project dedicated to producing a MERS-CoV prediction system.
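
To illustrate the evaluation protocol summarized above, the following is a minimal sketch of k-fold cross-validation applied to a k-nearest neighbor classifier. It is not the implementation used in this study: the synthetic two-cluster data, the choice of five folds and k = 3, and the function names `knn_predict` and `cross_val_accuracy` are all assumptions made for illustration only.

```python
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest training points."""
    # Squared Euclidean distance is sufficient for ranking neighbors.
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def cross_val_accuracy(X, y, n_folds=5, k=3, seed=0):
    """Return the mean accuracy of k-NN over n_folds cross-validation folds."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into n_folds disjoint held-out sets.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train_idx = [i for i in indices if i not in held_out]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct = sum(
            knn_predict(train_X, train_y, X[i], k) == y[i] for i in fold
        )
        scores.append(correct / len(fold))
    return sum(scores) / len(scores)


# Synthetic stand-in data (assumption): two well-separated 2-D clusters,
# labeled 0 and 1. This is NOT the MERS-CoV dataset.
rng = random.Random(42)
X = [[rng.gauss(0, 0.5), rng.gauss(0, 0.5)] for _ in range(30)]
X += [[rng.gauss(5, 0.5), rng.gauss(5, 0.5)] for _ in range(30)]
y = [0] * 30 + [1] * 30

accuracy = cross_val_accuracy(X, y)
print(f"mean cross-validated accuracy: {accuracy:.2f}")
```

Each sample is held out exactly once, so the reported figure is an average over predictions made on data the classifier did not see during training. The same loop could wrap a decision tree or naive Bayes classifier in place of `knn_predict` to compare the three algorithms on equal terms.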