Abstract
Preliminary diagnosis of medical conditions such as autism spectrum disorder (ASD) requires an understanding of which traits, for example autistic traits, are influential during the screening process. Selecting the right attributes is therefore a critical part of model construction in medical applications such as ASD screening, as it directly affects the accuracy and efficiency of classification. This research investigates several attribute selection methods, namely Chi-square (CHI), correlation feature set, information gain, Gini index and the fast correlation-based filter, to identify highly influential autistic traits using over 1,000 observations of toddler cases and controls. We aim to identify the common autistic traits, selected by these methods from a real autism dataset of toddlers, that influence the pre-diagnosis process for ASD, and to measure their impact on the performance of the screening process. To this end, an empirical methodology applies three classification algorithms, AdaBoost, k-Nearest Neighbour (kNN) and ID3, to derive models from the datasets produced, prior to training, by each of the attribute selection methods. These models are evaluated using specificity, sensitivity, accuracy and area under the curve (AUC). Empirical results obtained with these classification techniques on the different attribute sets of the toddlers' dataset show which influential autistic traits clinicians and diagnosticians can use to speed up the pre-diagnosis of ASD and to improve classification performance. More importantly, we show which attribute selection methods identify the relevant attributes that influence the preliminary diagnosis process.
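Three of the filter methods named above (Chi-square, information gain and Gini index) score each trait against the class label independently of any classifier. The Python sketch below illustrates that scoring and the screening metrics on binary trait attributes. It is illustrative only and not the authors' implementation: the trait names and toy data are hypothetical, Gini index is assumed here to mean the reduction in Gini impurity after splitting on the trait, and the correlation feature set and fast correlation-based filter, which also account for redundancy between attributes, are not shown.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a sequence of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def _split_gain(impurity, feature, labels):
    """Impurity of the labels minus the weighted impurity after splitting on the feature."""
    n = len(labels)
    remainder = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        remainder += len(subset) / n * impurity(subset)
    return impurity(labels) - remainder


def information_gain(feature, labels):
    return _split_gain(entropy, feature, labels)


def gini_gain(feature, labels):
    # Assumption: "Gini index" scoring taken as the reduction in Gini impurity.
    return _split_gain(gini, feature, labels)


def chi_square(feature, labels):
    """Pearson chi-square statistic of the trait/class contingency table."""
    n = len(labels)
    feat_counts, label_counts = Counter(feature), Counter(labels)
    joint = Counter(zip(feature, labels))
    stat = 0.0
    for f, fc in feat_counts.items():
        for y, yc in label_counts.items():
            expected = fc * yc / n
            stat += (joint[(f, y)] - expected) ** 2 / expected
    return stat


def rank_attributes(score_fn, features, labels):
    """Trait names sorted from most to least influential under score_fn."""
    return sorted(features, key=lambda name: score_fn(features[name], labels), reverse=True)


def screening_metrics(y_true, y_pred, positive=1):
    """Sensitivity, specificity and accuracy for a binary screening outcome."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }


# Hypothetical toy data: 1 = ASD case, 0 = control; traits are binary answers.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
traits = {
    "trait_a": [1, 1, 1, 1, 0, 0, 0, 0],  # perfectly aligned with the class
    "trait_b": [1, 0, 1, 0, 1, 0, 1, 0],  # independent of the class
    "trait_c": [1, 1, 1, 0, 0, 0, 0, 1],  # partially aligned with the class
}

if __name__ == "__main__":
    for fn in (chi_square, information_gain, gini_gain):
        print(fn.__name__, rank_attributes(fn, traits, labels))
```

On this toy data all three scores rank `trait_a` first, `trait_c` second and `trait_b` last. In a study such as this one, the top-ranked traits would form the reduced dataset passed to the classifiers, whose predictions are then summarised with metrics like those in `screening_metrics`.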