Abstract
Importance: Diabetes is a chronic disease that can cause long-term damage to various parts of the body. To help prevent diabetic complications, several studies have combined machine learning with medicine to build models that predict whether a patient has diabetes,
but prediction of this disease still has room for improvement. Hybrid prediction models offer a novel approach and in most cases achieve better outcomes than single classical machine learning algorithms. Objective: To develop a high-accuracy model for different onsets of
type 2 diabetes prediction. The aim is to improve the integration of clustering and classification techniques so that diabetes can be detected at an earlier stage without deleting observations with missing values, and to discard insignificant features so that only the most relevant ones need to be gathered
during data collection. Methods: We implement a noise-reduction technique based on K-means clustering, followed by Random Forest and XGBoost classifiers, to extract the hidden features of the dataset and obtain more accurate results. Results: Prediction accuracy
was assessed by benchmarking our model against recent predictive models and common classification algorithms. With an accuracy of 97.53% under 10-fold cross-validation, our T2ML model achieves higher accuracy than results reported by other researchers in the literature
and than various conventional classification algorithms.
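
The pipeline outlined in Methods (clustering-based noise reduction, then a tree-ensemble classifier scored by 10-fold cross-validation) might be sketched as follows. This is a minimal illustration under stated assumptions, not the T2ML implementation: the abstract does not specify the filtering rule, so the sketch assumes a sample is kept only when its label matches the majority label of its K-means cluster; median imputation stands in for "not deleting observations with missing values"; the data are synthetic; and only Random Forest is shown (an XGBoost classifier such as `xgboost.XGBClassifier` could be swapped in). The names `kmeans_noise_filter` and `evaluate` are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.preprocessing import StandardScaler


def kmeans_noise_filter(X, y, n_clusters=2, seed=0):
    """Keep samples whose label equals the majority label of their cluster.

    ASSUMPTION: this majority-label rule is one plausible reading of
    "noise reduction using K-means"; the abstract does not define it.
    """
    X_scaled = StandardScaler().fit_transform(X)
    clusters = KMeans(n_clusters=n_clusters, n_init=10,
                      random_state=seed).fit_predict(X_scaled)
    keep = np.zeros(len(y), dtype=bool)
    for c in np.unique(clusters):
        members = clusters == c
        majority = np.bincount(y[members]).argmax()
        keep |= members & (y == majority)
    return X[keep], y[keep]


def evaluate(X, y, seed=0):
    """Return the 10 per-fold accuracies of a Random Forest classifier."""
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    clf = RandomForestClassifier(n_estimators=200, random_state=seed)
    return cross_val_score(clf, X, y, cv=cv, scoring="accuracy")


# Synthetic stand-in for a diabetes dataset (ASSUMPTION: not the paper's data).
X, y = make_classification(n_samples=600, n_features=8, n_informative=5,
                           flip_y=0.1, random_state=0)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan  # inject roughly 5% missing values

# Impute instead of dropping rows, then filter noise, then cross-validate.
X_imputed = SimpleImputer(strategy="median").fit_transform(X)
X_clean, y_clean = kmeans_noise_filter(X_imputed, y)
scores = evaluate(X_clean, y_clean)

print(f"kept {len(y_clean)} of {len(y)} samples")
print(f"mean 10-fold accuracy: {scores.mean():.3f}")
```

Because the assumed filter uses the labels themselves, the scores from this sketch describe the filtered data only and should not be read as a reproduction of the reported 97.53%.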