Abstract
Spam is thriving on Arabic Twitter. With a large online population, mounting political unrest, and an undersized and unspecialized response effort, Arabic online social networks (OSNs) currently offer a perfect target for the spam industry, bringing both abuse and manipulation to the scene. The result is a ubiquitous spam presence that redefines the signal-to-noise ratio and makes spam a de facto component of online social platforms. English spam on online social networks has been heavily studied in the literature. To date, however, social spam in other languages has been largely ignored. Our own analysis of Arabic trending hashtags in Saudi Arabia estimates that spam accounts for about three quarters of the total generated content. This alarming rate, backed by independent concurrent estimates, makes the development of adaptive spam detection techniques a very real and pressing need. In this study, we present a first attempt at detecting accounts that promote spam and content pollution on Arabic Twitter. Using a large crawled dataset of more than 23 million Arabic tweets and a manually labeled sample of more than 5000 tweets, we analyze the spam content on Saudi Twitter and assess the performance of previously proposed spam detection features on our recently gathered dataset. We also adapt these features to counter spammers' evasion techniques, and use the adapted features to build a new, highly accurate, data-driven detection system.