LDA and LSI as a Dimensionality Reduction Method in Arabic Document Classification

Rami Ayadi; Mohsen Maraoui; Mounir Zrigui

doi:10.1007/978-3-319-24770-0_42

Conference proceeding

Peer reviewed

INFORMATION AND SOFTWARE TECHNOLOGIES, ICIST 2015, Vol.538, pp.491-502

Communications in Computer and Information Science

01/01/2015

Subjects

Computer Science

Computer Science, Information Systems

Computer Science, Interdisciplinary Applications

Computer Science, Software Engineering

Computer Science, Theory & Methods

Science & Technology

Technology

Abstract

In this work, we conduct an experimental study comparing two dimensionality reduction approaches and verify their effectiveness in Arabic document classification. First, we apply latent Dirichlet allocation (LDA) and latent semantic indexing (LSI) to model our document set, OATC (Open Arabic Tunisian Corpus), which contains 20,000 documents collected from Tunisian newspapers. This yields two document/topic matrices, one from LDA and one from LSI. We then classify the documents with the SVM algorithm, which is known to be an efficient method for text mining. Classification results are evaluated by precision, recall and F-measure on the OATC corpus, using 70% of the documents as the training set and 30% as the testing set. Our experiments show that dimensionality reduction via LDA outperforms LSI in Arabic topic classification.
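The abstract evaluates classification by precision, recall and F-measure on a 70%/30% split, but it does not say how the split was drawn or how per-category scores were combined across OATC's categories. The stdlib sketch below is not the authors' code. It assumes a shuffled split with a fixed seed and macro-averaging (the unweighted mean over categories), and the function names `split_70_30`, `per_class_scores` and `macro_average` are hypothetical.

```python
import random
from collections import Counter


def split_70_30(items, seed=0):
    """Shuffle a copy of `items` and cut it into a 70% training / 30% testing split."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> reproducible split
    cut = (7 * len(shuffled)) // 10        # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]


def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, F-measure)} for every label in y_true or y_pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for true, pred in zip(y_true, y_pred):
        if true == pred:
            tp[true] += 1
        else:
            fp[pred] += 1  # document wrongly assigned to class `pred`
            fn[true] += 1  # document of class `true` was missed
    scores = {}
    for label in set(y_true) | set(y_pred):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        f_measure = (2 * precision * recall / (precision + recall)
                     if precision + recall else 0.0)
        scores[label] = (precision, recall, f_measure)
    return scores


def macro_average(scores):
    """Unweighted mean of per-class (precision, recall, F-measure); assumes macro-averaging."""
    if not scores:
        return (0.0, 0.0, 0.0)
    n = len(scores)
    return tuple(sum(s[i] for s in scores.values()) / n for i in range(3))


if __name__ == "__main__":
    train, test = split_70_30(range(20000))
    print(len(train), len(test))  # 14000 6000
    y_true = ["sport", "sport", "economy", "culture"]
    y_pred = ["sport", "economy", "economy", "culture"]
    print(macro_average(per_class_scores(y_true, y_pred)))
```

In the paper's setting, `y_pred` would hold the labels an SVM assigns to the 30% test documents after training on the LDA or LSI document/topic features. The topic models and the classifier themselves are not reproduced here.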

Details

Title: LDA and LSI as a Dimensionality Reduction Method in Arabic Document Classification
Creators: Rami Ayadi - University of Sfax
Mohsen Maraoui - University of Monastir
Mounir Zrigui - University of Monastir
Contributors: G Dregvaite
R Damasevicius
Publication Details: INFORMATION AND SOFTWARE TECHNOLOGIES, ICIST 2015, Vol.538, pp.491-502
Series: Communications in Computer and Information Science
Publisher: Springer Nature
Number of pages: 12
Identifiers: 9913028308331
Academic Unit: Al Jouf University
Language: English
Resource Type: Conference proceeding