A Comparative Study of Statistical Feature Reduction Methods for Arabic Text Categorization

Fouzi Harrag; Eyas El-Qawasmeh; Abdul Malik S. Al-Salman

doi:10.1007/978-3-642-14306-9_67

Conference proceeding

NETWORKED DIGITAL TECHNOLOGIES, PT 2, Vol.88(2), pp.676-682

Communications in Computer and Information Science

01/01/2010

DOI: https://doi.org/10.1007/978-3-642-14306-9_67

Abstract

Computer Science

Computer Science, Artificial Intelligence

Computer Science, Information Systems

Computer Science, Software Engineering

Computer Science, Theory & Methods

Science & Technology

Technology

Feature reduction methods have been successfully applied to text categorization. In this paper, we present a comparative study of three feature reduction methods for text categorization: Document Frequency (DF), Term Frequency-Inverse Document Frequency (TFIDF) and Latent Semantic Analysis (LSA). Our feature set is relatively large, since thousands of distinct terms occur across the text files. We propose using these feature reduction methods as a preprocessing step for a Back-Propagation Neural Network (BPNN) to reduce the input data during the training process. Experimental results on an Arabic data set show that, among the three dimensionality reduction techniques examined, TFIDF was the most effective in reducing the dimensionality of the feature space.
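To make the idea of feature reduction as a preprocessing step concrete, the following is a minimal sketch of two of the three methods named in the abstract, DF thresholding and TFIDF-based term ranking, written with the Python standard library only. It is not the authors' implementation: the function names, the `min_df` and `top_k` parameters, the max-over-documents ranking rule, and the toy English tokens (standing in for tokenized Arabic documents) are all assumptions made for illustration. LSA is left out because it requires a matrix factorization (typically a truncated SVD) that does not fit a short standard-library example.

```python
import math
from collections import Counter


def document_frequency(docs):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # set(): each document counts a term once
    return df


def select_by_df(docs, min_df):
    """Keep terms that occur in at least `min_df` documents (assumed rule)."""
    df = document_frequency(docs)
    return sorted(term for term, n in df.items() if n >= min_df)


def select_by_tfidf(docs, top_k):
    """Keep the `top_k` terms ranked by their highest TF-IDF weight
    in any single document (one of several possible ranking rules)."""
    df = document_frequency(docs)
    n_docs = len(docs)
    best = {}
    for tokens in docs:
        tf = Counter(tokens)
        for term, count in tf.items():
            weight = (count / len(tokens)) * math.log(n_docs / df[term])
            best[term] = max(best.get(term, 0.0), weight)
    # Sort by descending weight; break ties alphabetically for determinism.
    ranked = sorted(best, key=lambda t: (-best[t], t))
    return sorted(ranked[:top_k])


def vectorize(tokens, vocabulary):
    """Map a tokenized document to a term-count vector over the reduced
    vocabulary; a vector like this would be the classifier's input."""
    tf = Counter(tokens)
    return [tf[term] for term in vocabulary]


# Hypothetical toy corpus: three already-tokenized documents.
docs = [
    ["market", "bank", "loan", "bank"],
    ["match", "goal", "team", "goal"],
    ["bank", "team", "market"],
]

df_vocab = select_by_df(docs, min_df=2)
tfidf_vocab = select_by_tfidf(docs, top_k=3)
reduced_input = vectorize(docs[0], df_vocab)
```

In this sketch the full vocabulary has six terms and each method keeps three, so the input vector fed to a downstream classifier such as a BPNN shrinks from six dimensions to three. The two methods keep different terms: DF favors terms that are widespread across documents, while the TFIDF ranking used here favors terms concentrated in few documents.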

Details

Title: A Comparative Study of Statistical Feature Reduction Methods for Arabic Text Categorization
Creators - without role: Fouzi Harrag - University Ferhat Abbas of Setif
Eyas El-Qawasmeh - Jordan University of Science and Technology
Abdul Malik S. Al-Salman - King Saud University
Contributors - without role: F Zavoral
J Yaghob
P Pichappan
E ElQawasmeh
Publication Details: NETWORKED DIGITAL TECHNOLOGIES, PT 2, Vol.88(2), pp.676-682
Series: Communications in Computer and Information Science
Publisher: Springer Nature
Number of pages: 7
Identifiers: 9951690908331
Academic Unit: King Saud University
Language: English
Resource Type: Conference proceeding