Abstract
Cyberbullying is a crime in which a person becomes the target of harassment and hate. Many cyberbullying detection approaches have been introduced; however, they have largely relied on textual and user features. Most research in the literature has aimed to improve detection by introducing new features. However, as the number of features increases, the feature extraction and selection phases become harder. Moreover, no study has examined the meaning of words and semantics in cyberbullying. To bridge this gap, we propose a novel algorithm, CNN-CB, that eliminates the need for feature engineering and produces better predictions than traditional cyberbullying detection approaches. The proposed algorithm adopts the concept of word embedding, in which similar words have similar embeddings. Bullying tweets will therefore have similar representations, which improves detection. CNN-CB is based on a convolutional neural network (CNN) and incorporates semantics through the use of word embeddings. Experiments showed that the CNN-CB algorithm outperforms traditional content-based cyberbullying detection, achieving an accuracy of 95%.
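As a rough illustration of the pipeline summarized above (embedding lookup, convolution over consecutive word vectors, pooling, and a binary score), the following is a minimal sketch and not the CNN-CB implementation. The abstract does not give layer sizes or training details, so the vocabulary, dimensions, function names (`embed`, `conv1d_relu`, `predict`), and the random, untrained weights are all assumptions made for illustration.

```python
import math
import random


def embed(tokens, vocab, table):
    """Map each token to its embedding vector; index 0 stands in for unknown words."""
    return [table[vocab.get(tok, 0)] for tok in tokens]


def conv1d_relu(seq, kernel, bias):
    """Slide one filter over len(kernel) consecutive word vectors, then apply ReLU."""
    width = len(kernel)
    out = []
    for start in range(len(seq) - width + 1):
        total = bias
        for offset in range(width):
            total += sum(w * x for w, x in zip(kernel[offset], seq[start + offset]))
        out.append(max(0.0, total))
    return out


def predict(tokens, vocab, table, kernels, biases, out_weights, out_bias):
    """Return a score in (0, 1); with trained weights this would be P(bullying)."""
    seq = embed(tokens, vocab, table)
    # Pad very short inputs with zero vectors so every filter fits at least once.
    width = max(len(k) for k in kernels)
    while len(seq) < width:
        seq.append([0.0] * len(table[0]))
    # Global max pooling keeps the strongest response of each filter.
    pooled = [max(conv1d_relu(seq, k, b)) for k, b in zip(kernels, biases)]
    logit = out_bias + sum(w * p for w, p in zip(out_weights, pooled))
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical toy setup: random weights stand in for learned parameters.
rng = random.Random(0)
vocab = {"<unk>": 0, "you": 1, "are": 2, "so": 3, "stupid": 4, "kind": 5}
dim, n_filters, width = 4, 3, 2
table = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in vocab]
kernels = [[[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(width)]
           for _ in range(n_filters)]
biases = [0.0] * n_filters
out_weights = [rng.uniform(-1, 1) for _ in range(n_filters)]

tokens = "you are so stupid".split()
score = predict(tokens, vocab, table, kernels, biases, out_weights, 0.0)
print(f"untrained score: {score:.3f}")
```

Because no hand-crafted features appear anywhere in this sketch, the only inputs are token indices; in a trained model, the embedding table and filters would be learned jointly from labeled tweets.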