Abstract
Majority of studies on sentiment analysis field, specifically Arabic lexicon-based approach, are focused on doing preprocessing methods on targeted dataset text or collected textual data from Twitter (Twitter dataset) rather than dealing with lexicon itself. This study proposes a new method, we constraint firstly on building a new sentiment lexicon with reasonable number of words and then doing adequate preprocessing methods on the lexicon's words in addition to the (Twitter dataset). The study presents Saudi Dialect Sentiment lexicon called SaudiSentiPlus contains 7139 words which mostly generated from Saudi tweets and other dictionaries. Moreover, this study also presents two lexicon- based algorithms for Saudi dialect to deal with (prefixes and suffixes) letters in order to increase performance of proposed Saudi dialect lexicon. The experiment which has been conducted in this study to evaluate the performance of SaudiSentiPlus comprises four phases. The precision, recall, accuracy, and F-Score are measured in every phase. We built our testing dataset from twitter by focusing on Saudi dialect hashtags (971 thousands tweets from 162 hashtags). The results, show that accuracy of SaudiSentiPlus with the two lexicon- based algorithms reached to 81%.