Improving sentiment domain adaptation for Arabic using an unsupervised self-labeling framework

Yathrib Alqahtani; Nora Al-Twairesh; Ahmed Alsanad

doi:10.1016/j.ipm.2023.103338

Journal article

Peer reviewed

Information Processing & Management, Vol. 60(3), 103338

May 2023

Keywords: Arabic language; Domain adaptation; Lexicon-based classification; Self-labeling; Sentiment classification

Highlights

• An unsupervised self-labeling framework for Arabic sentiment domain adaptation.
• A combination of filter-based and embedded feature selection methods for pivot extraction.
• A hybrid word similarity that combines co-occurrence association with embedding similarity.
• Evaluation on two multi-domain datasets: reviews in Modern Standard Arabic and tweets in dialectal Arabic.
• Self-labeling domain adaptation is less sensitive to the sparsity and high dimensionality of Arabic texts than the representation learning approach.

Abstract

Numerous domain adaptation methods have been proposed over the last decade; the most widely used owe their popularity to their generality across tasks and languages. However, such generic methods do not account for language-specific issues, whereas sentiment-specific adaptation methods depend on high-quality, language-specific resources such as tagging tools or sentiment lexicons. This study proposes a resource-free, unsupervised self-labeling adaptation framework for Arabic sentiment classification. By leveraging the sentiment-specific task of lexicon induction, using a combination of feature selection methods and an improved hybrid pairwise word similarity technique, the proposed framework proved less sensitive to Arabic feature sparsity. A total of 12 traditional and 12 transformer-based experiments on two Arabic multi-domain datasets, adapted within the proposed framework, demonstrated that a simple yet effective unsupervised self-labeling approach outperformed complex representation learning adaptation approaches for Arabic. The proposed framework improved on the best-performing method by 2% on a dataset of reviews and achieved competitive results on a dataset of tweets.
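
The abstract names the framework's stages (pivot extraction, a hybrid word similarity, lexicon induction, self-labeling) without giving formulas. As a rough illustration only, the following minimal Python sketch chains those stages under assumptions of its own: a single filter-style score (smoothed log-odds) stands in for the paper's combined filter and embedded feature selection, normalized PMI stands in for the co-occurrence association, and cosine similarity stands in for embedding similarity. All function names, the weight `alpha`, the toy documents, and the 2-d vectors are hypothetical. This is not the authors' implementation, and no Arabic preprocessing is shown.

```python
import math
from collections import Counter
from itertools import combinations


def select_pivots(src_docs, src_labels, tgt_docs, k=4):
    """Hypothetical filter-style pivot selection (a stand-in for the paper's combined
    filter + embedded feature selection): keep words seen in both domains and rank
    them by smoothed log-odds of appearing in positive vs. negative source docs."""
    pos, neg = Counter(), Counter()
    for doc, y in zip(src_docs, src_labels):
        (pos if y == 1 else neg).update(set(doc))
    n_pos = sum(1 for y in src_labels if y == 1)
    n_neg = len(src_labels) - n_pos

    def log_odds(w):
        return math.log((pos[w] + 1) / (n_pos + 2)) - math.log((neg[w] + 1) / (n_neg + 2))

    tgt_vocab = {w for doc in tgt_docs for w in doc}
    shared = [w for w in (set(pos) | set(neg)) & tgt_vocab if log_odds(w) != 0.0]
    ranked = sorted(shared, key=lambda w: (-abs(log_odds(w)), w))
    return {w: (1 if log_odds(w) > 0 else -1) for w in ranked[:k]}


def npmi_table(docs):
    """Document-level normalized PMI for co-occurring word pairs, clipped to [0, 1]."""
    n = len(docs)
    df, pair_df = Counter(), Counter()
    for doc in docs:
        vocab = sorted(set(doc))
        df.update(vocab)
        pair_df.update(combinations(vocab, 2))  # keys are (a, b) with a < b
    table = {}
    for (a, b), c in pair_df.items():
        p_ab = c / n
        if p_ab == 1.0:
            table[(a, b)] = 1.0  # the pair co-occurs in every document
            continue
        pmi = math.log(p_ab / ((df[a] / n) * (df[b] / n)))
        table[(a, b)] = max(pmi / -math.log(p_ab), 0.0)
    return table


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return dot / (nu * nv) if nu and nv else 0.0


def hybrid_sim(a, b, npmi, emb, alpha=0.5):
    """Weighted mix of co-occurrence association and embedding similarity (alpha is hypothetical)."""
    co = npmi.get((a, b) if a < b else (b, a), 0.0)
    em = max(cosine(emb[a], emb[b]), 0.0) if a in emb and b in emb else 0.0
    return alpha * co + (1 - alpha) * em


def induce_lexicon(tgt_docs, pivots, emb, alpha=0.5):
    """Give each target word the similarity-weighted average polarity of the pivots."""
    npmi = npmi_table(tgt_docs)
    lexicon = {}
    for w in sorted({w for doc in tgt_docs for w in doc}):
        if w in pivots:
            lexicon[w] = float(pivots[w])
            continue
        sims = [(hybrid_sim(w, p, npmi, emb, alpha), pol) for p, pol in pivots.items()]
        total = sum(s for s, _ in sims)
        if total > 0:
            lexicon[w] = sum(s * pol for s, pol in sims) / total
    return lexicon


def self_label(docs, lexicon):
    """Pseudo-label docs by summed lexicon score: 1 = positive, 0 = negative, None = abstain."""
    labels = []
    for doc in docs:
        score = sum(lexicon.get(w, 0.0) for w in doc)
        labels.append(1 if score > 0 else 0 if score < 0 else None)
    return labels


# Hypothetical toy data (English placeholders, not from the paper's datasets).
src_docs = [["great", "story", "excellent"], ["great", "plot"], ["bad", "story", "boring"], ["bad", "plot"]]
src_labels = [1, 1, 0, 0]
tgt_docs = [["great", "room", "clean"], ["bad", "room", "dirty"],
            ["clean", "great", "staff"], ["dirty", "bad", "staff"]]
emb = {"great": [1.0, 0.1], "clean": [0.9, 0.2], "bad": [-1.0, 0.1], "dirty": [-0.9, 0.2],
       "room": [0.0, 1.0], "staff": [0.0, 1.0]}

pivots = select_pivots(src_docs, src_labels, tgt_docs)
lexicon = induce_lexicon(tgt_docs, pivots, emb)
labels = self_label(tgt_docs, lexicon)
print(pivots, labels)
```

On this toy input the pivots are `great` and `bad`, the target words `clean` and `dirty` inherit their polarity through the hybrid similarity, and the four target documents are pseudo-labeled `[1, 0, 1, 0]`. In a typical self-labeling setup, pseudo-labeled target documents like these would then be used to train a target-domain classifier; that step is omitted here.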

Details

Title: Improving sentiment domain adaptation for Arabic using an unsupervised self-labeling framework
Creators: Yathrib Alqahtani (King Saud University); Nora Al-Twairesh (King Saud University); Ahmed Alsanad (King Saud University)
Publication Details: Information Processing & Management, Vol. 60(3), 103338
Publisher: Elsevier Ltd
Identifiers: 999935908331
Academic Unit: Saudi Electronic University; King Saud University
Language: English
Resource Type: Journal article