Abstract
This paper presents an intelligent model to analyze, understand, and classify Arabic tweets. The proposed model comprises four main phases: preprocessing, language model, feature extraction, and classification model. In the preprocessing phase, the corpora and stop-word lists are employed. The language model performs morphological, lexical, syntactic, and semantic analysis, together with stemming, root extraction, and number indication. As a result, each analyzed tweet is represented by several kinds of features (meanings, word order, syntactic features, and number features), which the classification phase then uses to classify the tweets. The proposed solution uses corpora of tweets written in Arabic, so the generated dictionary/lexicon consists of Arabic words and their meanings. After the content is read from the corpora, the language model analyzes it and stores it in a deep structure, or internal representation. Feature extraction then extracts the tweet features, and the classification model classifies new tweets. This study uses linguistic preprocessing tasks and similarity functions to improve Arabic tweet clustering, and machine learning then produces the final results for the analyzed tweets.
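The abstract names preprocessing (stop-word removal, stemming) and similarity functions but gives no implementation. The following is a minimal sketch of how those two steps might look, assuming a bag-of-words representation and cosine similarity. The stop-word list, the normalization rules, and the `light_stem` helper are hypothetical illustrations, not the authors' resources. Root extraction, syntactic and semantic analysis, and the classifier itself are not shown.

```python
import math
import re
from collections import Counter

# Arabic diacritics (tashkeel, U+064B..U+0652) plus tatweel (U+0640).
DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")


def normalize(text: str) -> str:
    """Remove diacritics and unify common letter variants."""
    text = DIACRITICS.sub("", text)
    # Alef with madda/hamza above/hamza below (U+0622, U+0623, U+0625) -> bare alef (U+0627).
    text = re.sub("[\u0622\u0623\u0625]", "\u0627", text)
    # Alef maqsura (U+0649) -> yaa (U+064A).
    return text.replace("\u0649", "\u064A")


# Hypothetical, deliberately tiny stop-word list; normalized so it matches normalized tokens.
STOP_WORDS = {normalize(w) for w in ["في", "من", "على", "إلى", "عن", "هذا", "هذه", "و"]}


def light_stem(token: str) -> str:
    """Crude light stemmer (hypothetical): strip the definite article 'ال'
    only when at least three letters remain. This is not root extraction."""
    if token.startswith("ال") and len(token) - 2 >= 3:
        return token[2:]
    return token


def preprocess(tweet: str) -> list[str]:
    """Normalize, keep runs of Arabic letters (U+0621..U+064A), drop stop words, stem."""
    tokens = re.findall(r"[\u0621-\u064A]+", normalize(tweet))
    return [light_stem(t) for t in tokens if t not in STOP_WORDS]


def cosine_similarity(a: list[str], b: list[str]) -> float:
    """Cosine similarity between term-frequency vectors of two token lists."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0
```

For example, `preprocess("ذهب الطالب إلى المدرسة")` yields `["ذهب", "طالب", "مدرسة"]`. Pairwise scores from `cosine_similarity` could then feed a clustering or nearest-neighbour step, though the abstract does not specify which similarity function or learner is actually used.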