Abstract
Preprocessing the input text is the first step in any Natural Language Processing (NLP) application. Word stemming, i.e., extracting the stem or root of an input word, is a vital part of this preprocessing. For example, words such as "player", "playing", and "played" are mapped to their stem "play". For English, several algorithms and approaches can be applied directly to handle this process. For Arabic, there have been some attempts at similar algorithms, but all perform weakly because of the complexity of the language and the approaches used to build them. In this paper, we present a novel deep learning-based model, called Masdar, for Arabic stemming. The proposed model leverages deep learning, specifically recurrent neural networks, to build an efficient Arabic stemmer capable of producing very accurate stems for most input words. We conduct experiments comparing the proposed model with the latest cited Arabic stemmers on a dataset of about 6000 Arabic word/stem pairs. The experimental results show that Masdar outperforms the other stemmers. It produces the correct stem with about 95% accuracy on the whole dataset and about 82% accuracy on unseen test words.