Abstract
Lung cancer is a malignant disease that imposes serious complications, restricting patients from performing daily tasks in the early stages and eventually causing their death. The prevalence of this disease has been highlighted by numerous statistics worldwide. Early diagnosis of individuals with lung cancer can improve the chances of prevention and treatment. Therefore, the purpose of this study is to predict lung cancer early using simple clinical and demographic features obtained from the "data world" website. The experiments were conducted using Support Vector Machine (SVM), K-Nearest Neighbor (K-NN), and Logistic Regression (LR) classifiers. To improve the models' accuracy, SMOTETomek resampling was employed along with GridSearchCV to tune hyperparameters. The Recursive Feature Elimination method was also used to find the best feature subset. Results indicated that SVM achieved the best performance, with 98.33% recall, 96.72% precision, and an accuracy of 97.27% using 15 attributes.
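For readers less familiar with the reported metrics, the following minimal sketch shows how recall, precision, and accuracy are derived from confusion-matrix counts. The function name `classification_metrics` and the counts used below are illustrative assumptions: the counts are one hypothetical set that happens to reproduce the reported percentages and are not taken from the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (recall, precision, accuracy) as fractions in [0, 1].

    tp, fp, fn, tn are the true-positive, false-positive,
    false-negative, and true-negative counts of a binary classifier.
    """
    recall = tp / (tp + fn)        # share of actual positives that were found
    precision = tp / (tp + fp)     # share of predicted positives that were correct
    accuracy = (tp + tn) / (tp + fp + fn + tn)  # share of all predictions that were correct
    return recall, precision, accuracy


# Hypothetical counts (an assumption, not data from the study), chosen
# only because they are consistent with the percentages quoted above.
recall, precision, accuracy = classification_metrics(tp=59, fp=2, fn=1, tn=48)
print(f"recall={recall:.2%} precision={precision:.2%} accuracy={accuracy:.2%}")
```

High recall matters most in a screening setting such as this one, since a false negative corresponds to a missed cancer case, whereas a false positive leads to further testing.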