Abstract
• An unsupervised self-labeling framework for Arabic sentiment domain adaptation.
• Filter-based and embedded feature selection methods are combined for pivot extraction.
• A hybrid word similarity measure uses co-occurrence association and embedding similarity.
• Evaluation on two multi-domain datasets: reviews in Modern Standard Arabic and tweets in dialectal Arabic.
• Self-labeling domain adaptation is less sensitive to the sparsity and high dimensionality of Arabic texts than representation learning approaches.
Numerous domain adaptation methods have been proposed over the last decade, and the most widely used owe their popularity to their generality across tasks or languages. However, general methods do not consider language-specific issues, whereas sentiment-specific adaptation methods rely on high-quality language-specific resources such as tagging tools or sentiment lexicons. This study proposes a resource-free, unsupervised self-labeling adaptation framework for Arabic sentiment classification. By leveraging the sentiment-specific task of lexicon induction, using a combination of feature selection methods and an improved hybrid pairwise word similarity technique, the proposed framework proved to be less sensitive to Arabic feature sparsity. A total of 12 traditional and 12 transformer-based experiments on two Arabic multi-domain datasets, adapted with the proposed framework, demonstrated that a simple yet effective unsupervised self-labeling approach outperformed complex representation learning adaptation approaches for the Arabic language. The proposed framework improved on the best-performing method by 2% on a dataset of reviews and achieved competitive results on a dataset of tweets.
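The hybrid pairwise word similarity mentioned above can be illustrated with a minimal sketch. The abstract does not give the formula, so everything here is an assumption made for illustration: positive pointwise mutual information (PPMI) stands in for the co-occurrence association, cosine similarity stands in for the embedding similarity, and the mixing weight `alpha` and all function names are hypothetical. The toy tokens are English placeholders for Arabic words.

```python
import math
from collections import Counter
from itertools import combinations


def ppmi_scores(docs):
    """Document-level positive PMI for every co-occurring word pair.

    Assumption: PPMI is used here as one possible co-occurrence
    association measure; the paper may use a different one.
    """
    n_docs = len(docs)
    word_df = Counter()   # number of documents containing each word
    pair_df = Counter()   # number of documents containing each word pair
    for doc in docs:
        vocab = sorted(set(doc))
        word_df.update(vocab)
        pair_df.update(combinations(vocab, 2))  # pairs are alphabetically ordered
    scores = {}
    for (a, b), joint in pair_df.items():
        pmi = math.log((joint * n_docs) / (word_df[a] * word_df[b]))
        scores[(a, b)] = max(pmi, 0.0)  # clip negative associations to zero
    return scores


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def hybrid_similarity(a, b, ppmi, embeddings, alpha=0.5):
    """Weighted mix of normalized PPMI and non-negative cosine similarity.

    `alpha` is a hypothetical mixing weight, not a value from the paper.
    """
    key = (a, b) if a <= b else (b, a)
    max_assoc = max(ppmi.values(), default=0.0)
    assoc = ppmi.get(key, 0.0) / max_assoc if max_assoc > 0 else 0.0
    if a in embeddings and b in embeddings:
        emb = max(cosine(embeddings[a], embeddings[b]), 0.0)
    else:
        emb = 0.0  # fall back to co-occurrence only for out-of-vocabulary words
    return alpha * assoc + (1 - alpha) * emb


# Toy corpus and 2-dimensional embeddings, invented for illustration only.
docs = [["good", "great"], ["good", "great"], ["bad", "poor"], ["good", "bad"]]
embeddings = {"good": [1.0, 0.0], "great": [1.0, 0.0],
              "bad": [0.0, 1.0], "poor": [0.0, 1.0]}
ppmi = ppmi_scores(docs)
print(hybrid_similarity("bad", "poor", ppmi, embeddings))
print(hybrid_similarity("good", "bad", ppmi, embeddings))
```

In a self-labeling setting, a score of this kind could be used to propagate polarity from source-domain pivot words to similar target-domain words. Combining the two signals is one way to offset sparse co-occurrence counts with embedding evidence, and vice versa for words missing from the embedding vocabulary.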