Abstract
Several studies have shown that deep neural networks (DNNs) are vulnerable to adversarial examples: perturbed inputs that cause DNN-based models to produce incorrect outputs. A variety of adversarial attacks have been proposed in the domains of computer vision and natural language processing (NLP); however, most attacks in the NLP domain have been applied to DNNs trained on English corpora. This paper proposes the first set of black-box adversarial attacks designed to perturb Arabic textual inputs. By intentionally violating noun-adjective agreement in Arabic, the attacks successfully fool two state-of-the-art DNN architectures on the task of sentiment analysis, reducing classification accuracy by an average of 52.97% for the word-level BiLSTM model and 50.44% for the word-level CNN model. We believe that our findings will encourage other researchers to investigate the robustness of DNNs when applied to natural languages beyond English.