Arabic Text Classification: A Comparative Approach Using a Big Dataset

Mokhtar Ali Hasan Madhfar; Mohammed Abdullah Hassan Al-Hagery

Conference proceeding

The Institute of Electrical and Electronics Engineers, Inc. (IEEE) Conference Proceedings, p.1

01/01/2019

Conference Title: 2019 International Conference on Computer and Information Sciences (ICCIS)
Conference Start Date: April 3, 2019
Conference End Date: April 4, 2019
Conference Location: Sakaka, Saudi Arabia

Keywords: Bayesian analysis; Classification; Decision trees; Performance measurement; Regression models; Support vector machines

Abstract

Text classification is the process of assigning text documents to a predefined set of categories. This paper experimentally evaluates six well-known classification models on a large Arabic corpus: Naïve Bayes (NB), Support Vector Machine (SVM), Random Forest, Logistic Regression, Decision Tree (DT), and Stochastic Gradient Descent (SGD). We used a corpus of 111,728 Arabic documents in five categories: sport, culture, economy, news, and diverse. Three performance metrics were used to evaluate the experimental results of each model. Logistic Regression achieved the highest weighted F1 score, followed by SGD, SVM, NB, Random Forest, and DT. The experiments also show that both feature size and corpus size strongly affect the performance of the models.
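The models are ranked by weighted F1 score. The following is a minimal sketch, not the authors' code, assuming the standard definition: a per-class F1 score is computed for each category, and the scores are averaged with weights proportional to each class's support (its number of true documents). The function name `weighted_f1` and the toy labels are hypothetical illustrations that use the corpus's five category names.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Weighted-average F1 (assumed standard definition, not taken from the paper):
    each class's F1 is weighted by its support, i.e. its count in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    support = Counter(y_true)  # number of true documents per class
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        # True positives: documents of this class predicted as this class.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += f1 * n_true / total  # weight by the class's share of documents
    return score


# Hypothetical toy predictions over the corpus's five categories.
y_true = ["sport", "sport", "economy", "news", "culture", "diverse"]
y_pred = ["sport", "news", "economy", "news", "culture", "sport"]
print(round(weighted_f1(y_true, y_pred), 3))  # 0.611
```

Weighting by support means that large categories dominate the average, which matters when category sizes are uneven. scikit-learn's `f1_score(y_true, y_pred, average="weighted")` computes the same weighted average and should agree with this sketch.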

Details

Publisher: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Identifiers: 9928557508331
Academic Unit: Qassim University
Language: English