Abstract
Demand for analyzing social media content is growing. However, sentiment analysis of Arabic, and of Arabic dialects in particular, remains complex and challenging. This paper describes the collection and construction of a classified corpus of 4180 multi-dialectal Saudi tweets (SDCT). Five native speakers annotated the tweets manually in two stages. In the first stage, each tweet was labeled by dialect as Hijazi, Najdi, or Eastern, according to the corresponding Saudi region. In the second stage, each tweet was labeled by sentiment as positive, negative, or neutral. The annotation process was evaluated using the Kappa score. The corpus was validated using cross-validation in eight baseline experiments that trained different classifier models. The results show that 10-fold cross-validation yields higher accuracy than 5-fold across the eight experiments, and that classification of the Eastern dialect achieved the best accuracy among the dialects, at 91.48%.
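As an illustration of how agreement among several annotators might be scored, the following is a minimal sketch of Fleiss' kappa. The abstract does not state which Kappa variant was used, so the choice of Fleiss' kappa (which handles more than two raters) is an assumption, as are the function name `fleiss_kappa` and the toy labels, which are hypothetical and not taken from the corpus.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for a list of items, each rated by the same number of raters.

    ratings    : list of lists; ratings[i] holds the labels given to item i
    categories : the full set of allowed labels
    Assumption: this variant is illustrative; the paper only says "Kappa score".
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(r) != n_raters for r in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    counts = [Counter(r) for r in ratings]

    # Observed agreement: share of agreeing rater pairs, averaged over items.
    per_item = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement: based on the overall proportion of each category.
    proportions = [
        sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories
    ]
    p_expected = sum(p * p for p in proportions)

    if p_expected == 1.0:
        raise ValueError("kappa is undefined when only one category is ever used")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical toy data: 4 tweets, 5 annotators each (not from the SDCT corpus).
LABELS = ["positive", "negative", "neutral"]
toy_ratings = [
    ["positive"] * 5,
    ["negative"] * 4 + ["neutral"],
    ["neutral"] * 5,
    ["positive"] * 3 + ["negative"] * 2,
]
print(round(fleiss_kappa(toy_ratings, LABELS), 3))
```

A value of 1 indicates perfect agreement, 0 indicates agreement no better than chance, and negative values indicate systematic disagreement.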