Masdar: A Novel Sequence-to-Sequence Deep Learning Model for Arabic Stemming

Mohammed M. Fouad; Ahmed Mahany; Iyad Katib

doi:10.1007/978-3-030-29513-4_26

Conference proceeding

INTELLIGENT SYSTEMS AND APPLICATIONS, VOL 2, Vol.1038, pp.363-373

Advances in Intelligent Systems and Computing

01/01/2020

DOI: https://doi.org/10.1007/978-3-030-29513-4_26

Computer Science

Computer Science, Artificial Intelligence

Science & Technology

Technology

Abstract

Preprocessing the input text is the first step in any Natural Language Processing (NLP) application. Word stemming, i.e. extracting the stem or root of the input word, is a vital part of this preprocessing step. In this process, words like "player", "playing", and "played" are mapped to their stem "play". For English, several algorithms and approaches can be applied directly to handle this process. For Arabic, on the other hand, there have been some attempts at similar algorithms, but all perform weakly because of the complexity of the language and the approaches used to build them. In this paper, we present a novel deep learning-based model, called Masdar, for Arabic stemming. The proposed model leverages the power of deep learning, especially recurrent neural networks, to build an efficient Arabic stemmer that is capable of producing very accurate stems for most input words. Experiments are conducted to compare the performance of the proposed model with the latest cited Arabic stemmers on a dataset of about 6000 Arabic word/stem pairs. The experimental results show that Masdar outperforms the other stemmers. It produces the correct stems with about 95% accuracy on the whole dataset and about 82% accuracy on the unseen test words.
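
To make the stemming task and the accuracy figures in the abstract concrete, the following is a minimal illustrative sketch in Python. It is not the Masdar model: it uses a toy suffix-stripping rule for the English example from the abstract, and it assumes that "accuracy" means exact match between the predicted and reference stem over a list of word/stem pairs. The names `SUFFIXES`, `toy_stem`, and `exact_match_accuracy` are hypothetical and do not come from the paper.

```python
# Toy illustration only: this is NOT the Masdar sequence-to-sequence model.
# SUFFIXES, toy_stem and exact_match_accuracy are hypothetical names.
SUFFIXES = ("ing", "ed", "er", "s")


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def exact_match_accuracy(pairs) -> float:
    """Fraction of (word, reference_stem) pairs whose predicted stem
    equals the reference stem exactly (assumed definition of accuracy)."""
    if not pairs:
        return 0.0
    correct = sum(1 for word, ref in pairs if toy_stem(word) == ref)
    return correct / len(pairs)


# The English example from the abstract, written as word/stem pairs.
pairs = [("player", "play"), ("playing", "play"), ("played", "play")]
print([toy_stem(word) for word, _ in pairs])
print(exact_match_accuracy(pairs))
```

A rule like this breaks quickly on words such as "running", which the toy rule reduces to "runn". Failures of that kind illustrate why fixed rules are brittle, and a learned model such as the one the abstract describes is one alternative to them.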

Details

Title: Masdar: A Novel Sequence-to-Sequence Deep Learning Model for Arabic Stemming
Creators - without role: Mohammed M. Fouad - Fujitsu Technol Solut, Jeddah, Saudi Arabia
Ahmed Mahany - Ain Shams University
Iyad Katib - King Abdul Aziz University Hospital
Contributors - without role: Y Bi
R Bhatia
S Kapoor
Publication Details: INTELLIGENT SYSTEMS AND APPLICATIONS, VOL 2, Vol.1038, pp.363-373
Series: Advances in Intelligent Systems and Computing
Publisher: Springer Nature
Number of pages: 11
Identifiers: 9937488008331
Academic Unit: King Abdulaziz University
Language: English
Resource Type: Conference proceeding