Disguised plagiarism detection in Arabic text documents

El Moatez Billah Nagoudi; Hadda Cherroun; Ali Alshehri

Conference proceeding

The Institute of Electrical and Electronics Engineers, Inc. (IEEE) Conference Proceedings, p.1

01/01/2018

Subjects: Decision trees; Machine learning; Natural language; Natural language processing; Plagiarism; Semantics; Speech processing; Support vector machines

Conference Title: 2018 2nd International Conference on Natural Language and Speech Processing (ICNLSP)
Conference Start Date: 2018, April 25
Conference End Date: 2018, April 26
Conference Location: Algiers, Algeria

Abstract

Plagiarism detection is a challenging Natural Language Processing (NLP) task. Many recent systems can detect simple verbatim reproduction (copy and paste). However, real plagiarism cases often involve more disguised techniques, such as rewording, synonym substitution, paraphrasing, and text manipulation, which make detection much more difficult. In this paper, we propose two approaches designed to help users detect plagiarism in Arabic natural language texts. The first approach combines word embeddings, word alignment, and word weighting to measure the semantic similarity between textual units. The second approach is based on Machine Learning (ML), with characterisation performed at the sentence level: we combine lexical, syntactic, and semantic features to support the detection task and investigate three classifiers, namely Support Vector Machines (SVM), Decision Trees (DT), and Random Forests (RF). The classifiers are trained and evaluated on the training dataset of the first Arabic Plagiarism Detection shared task (AraPlagDet 2015). Our experimental results show that the proposed approaches achieve promising results compared with state-of-the-art Arabic plagiarism detection systems.
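The first approach (embeddings, alignment, and weighting) can be illustrated with a small sketch. This is not the authors' implementation. It assumes one common scheme: each word of one sentence is aligned with its most similar word in the other sentence by cosine similarity of word embeddings, the best-match scores are averaged using IDF-style weights, and the two directions are then averaged. The 4-dimensional vectors in `EMB`, the weights in `IDF`, the example sentences, and all function names are hypothetical. A real system would use pretrained Arabic embeddings and weights estimated from a corpus.

```python
import math
from typing import Dict, List, Sequence

Embeddings = Dict[str, List[float]]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def directed_similarity(src: Sequence[str], tgt: Sequence[str],
                        emb: Embeddings, idf: Dict[str, float]) -> float:
    """Align every word of src with its closest word in tgt and return the
    IDF-weighted mean of those best-match scores.
    Words missing from the embedding table are skipped on both sides."""
    total = 0.0
    weight_sum = 0.0
    for word in src:
        if word not in emb:
            continue
        best = max((cosine(emb[word], emb[t]) for t in tgt if t in emb), default=0.0)
        weight = idf.get(word, 1.0)  # hypothetical default weight for unseen words
        total += weight * best
        weight_sum += weight
    return total / weight_sum if weight_sum else 0.0


def sentence_similarity(s1: Sequence[str], s2: Sequence[str],
                        emb: Embeddings, idf: Dict[str, float]) -> float:
    """Symmetric score: the mean of the two directed alignment scores."""
    return 0.5 * (directed_similarity(s1, s2, emb, idf)
                  + directed_similarity(s2, s1, emb, idf))


# Hypothetical toy embeddings and IDF weights, for illustration only.
EMB: Embeddings = {
    "كتب": [0.9, 0.1, 0.0, 0.0],      # "wrote"
    "ألف": [0.8, 0.2, 0.1, 0.0],      # "authored" (near-synonym of "wrote")
    "الطالب": [0.1, 0.9, 0.1, 0.0],   # "the student"
    "البحث": [0.0, 0.1, 0.9, 0.1],    # "the research"
    "تطير": [0.0, 0.0, 0.1, 0.9],     # "flies"
    "الطائرة": [0.1, 0.0, 0.0, 0.95], # "the plane"
}
IDF: Dict[str, float] = {
    "كتب": 1.5, "ألف": 2.0, "الطالب": 1.0,
    "البحث": 1.2, "تطير": 2.0, "الطائرة": 1.8,
}

SOURCE = ["كتب", "الطالب", "البحث"]      # "the student wrote the research"
PARAPHRASE = ["ألف", "الطالب", "البحث"]  # synonym substitution of the verb
UNRELATED = ["تطير", "الطائرة"]          # "the plane flies"

if __name__ == "__main__":
    print(f"paraphrase: {sentence_similarity(SOURCE, PARAPHRASE, EMB, IDF):.3f}")
    print(f"unrelated:  {sentence_similarity(SOURCE, UNRELATED, EMB, IDF):.3f}")
```

Averaging both directions keeps the score symmetric, so it does not matter which sentence is treated as the suspicious one. A detector built this way could flag sentence pairs whose score exceeds a threshold tuned on training data. That thresholding step is likewise an assumption here.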

Details

Title: Disguised plagiarism detection in Arabic text documents
Creators: El Moatez Billah Nagoudi; Hadda Cherroun; Ali Alshehri
Publication Details: The Institute of Electrical and Electronics Engineers, Inc. (IEEE) Conference Proceedings, p.1
Publisher: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Identifiers: 9923889108331
Academic Unit: King Khalid University
Language: English
Resource Type: Conference proceeding