Abstract
Stance detection is a relatively new concept in data mining that aims to assign a stance label (favor, against, or none) to a social media post towards a specific pre-determined target. These targets may not be referred to in the post, and may not be the target of opinion in the post. In this paper, we propose a novel enhanced method for identifying the writer’s stance of a given tweet. This comprises a three-phase process for stance detection: (a) tweets preprocessing; here we clean and normalize tweets (e.g., remove stop-words) to generate words and stems lists, (b) features generation; in this step, we create and fuse two dictionaries for generating features vector, and lastly (c) classification; all the instances of the features are classified based on the list of targets. Our innovative feature selection proposes fusion of two ranked lists (top-k) of term frequency-inverse document frequency (tf-idf) scores and the sentiment information. We evaluate our method using six different classifiers: K nearest neighbor (K-NN), discernibility-based K-NN, weighted K-NN, class-based K-NN, exemplar-based K-NN, and Support Vector Machines. Furthermore, we investigate the use of Principal Component Analysis and study its effect on performance. The model is evaluated on the benchmark dataset (SemEval-2016 task 6), and the results significance is determined using t-test. We achieve our best performance of macro F-score (averaged across all topics) of 76.45% using the weighted K-NN classifier. This tops the current state-of-the-art score of 74.44% on the same dataset.
•Stance towards a given topic is the position (favor, against or neutral) towards it.•Stance detection and sentiment analysis look differently at the same thing.•Feature selection fuses two ranked lists of tfidf scores and sentiment information.•Best performance when using Weighted KNN classifier.•On benchmark dataset (SemEval-2016 Task 6A) we achieve macro F-score of 76.45%.