Abstract
Conference Title: 2019 International Conference on Computer and Information Sciences (ICCIS)
Conference Start Date: 2019, April 3
Conference End Date: 2019, April 4
Conference Location: Sakaka, Saudi Arabia

Text classification is the process of assigning text documents to a predefined set of categories. This paper provides an experimental evaluation of six well-known classification models on a large Arabic corpus. The models are Naive Bayes (NB), Support Vector Machine (SVM), Random Forest, Logistic Regression, Decision Tree (DT), and Stochastic Gradient Descent (SGD). We used a corpus of 111,728 Arabic documents that fall into five categories: sport, culture, economy, news, and diverse. Three performance metrics were applied to evaluate the experimental results of each model. The results show that the Logistic Regression model achieves the highest weighted F1 score, followed by SGD, SVM, NB, Random Forest, and DT. The experiments also show that both feature size and corpus size strongly affect the performance of the models.
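
For readers unfamiliar with the weighted F1 score used to rank the models, the following is a minimal sketch of how such a score is commonly computed: the per-class F1 values are averaged, with each class weighted by its share of the true labels. The function name `weighted_f1` and the toy labels are illustrative assumptions, not the paper's code or data.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores (illustrative helper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one sample")

    support = Counter(y_true)    # true documents per class
    predicted = Counter(y_pred)  # predictions per class
    # Correct predictions per class (true positives).
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        tp = correct[label]
        precision = tp / predicted[label] if predicted[label] else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        # Weight each class by its share of the true labels.
        score += (n_true / total) * f1
    return score


# Toy labels (made up for illustration; not taken from the corpus).
y_true = ["sport", "sport", "news", "economy"]
y_pred = ["sport", "news", "news", "economy"]
print(round(weighted_f1(y_true, y_pred), 4))
```

Weighting by class support keeps large categories from being under-represented in the average, which matters when a corpus is unevenly split across its categories.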