Multiple Similarity-based Features Blending for Detecting Code Clones using Consensus-Driven Classification

Abdullah M. Sheneamer

doi:10.1016/j.eswa.2021.115364

Journal article

Peer reviewed

Expert systems with applications, Vol.183, p.115364

30/11/2021

DOI: https://doi.org/10.1016/j.eswa.2021.115364

Abstract

Classification

Code clones

Features

Machine learning

Semantic clones

Similarity measures

Software engineering

Syntactic clones

• Detecting code clones based on semantics is challenging.
• A unique detection framework is proposed for code clones.
• Novel code semantic features based on similarity measure scores are used.
• Extensive experiments established the superiority of our framework.

Code clone detection helps to reduce the costs associated with software maintenance and bug prevention. Many machine learning methods for detecting code clones have been proposed previously. The majority of clone detectors are traditional in their approach: they can detect syntactic clones but are poor at detecting semantic clones. Researchers use machine learning to detect semantic clones and automatically scan the data to learn latent semantic features. In this study, we introduce a new formal model of similarity that combines similarity measures to capture both the syntactic and semantic distances between method block pairs. The uniqueness of our study lies in the use of different similarity measures, with their similarity scores serving as features in machine learning, to detect code clones. We use a number of similarity measure computations to extract similarity score features, which are then represented as vectors. Using ensemble classification models, we perform extensive comparisons and evaluations of the effectiveness of our proposed idea. The results indicate that our approach is significantly better at detecting clone types than contemporary code clone detectors. We achieved a 99% success rate in detecting cloned code based on F-score, recall, and precision. Our approach achieves 98–100% accuracy in the majority of cases.
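As a rough illustration of the idea in the abstract (several similarity scores computed for a pair of method blocks, collected into a feature vector, then classified by consensus), a minimal Python sketch follows. It is not the author's implementation: the three measures (Jaccard, cosine, sequence ratio), the tokenizer, the function names, and the fixed-threshold majority vote are all assumptions chosen for brevity. In particular, the threshold voters merely stand in for the trained ensemble classifiers that the abstract mentions.

```python
import difflib
import math
import re
from collections import Counter


def tokenize(code):
    # Hypothetical tokenizer: identifiers, integer literals, then any other
    # single non-space character.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def jaccard(a, b):
    # Overlap of the two token sets.
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def cosine(a, b):
    # Cosine similarity of the token-frequency vectors.
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def sequence_ratio(a, b):
    # Order-sensitive similarity of the two token sequences.
    return difflib.SequenceMatcher(None, a, b).ratio()


def similarity_features(block_a, block_b):
    # One similarity score per measure, represented as a feature vector.
    ta, tb = tokenize(block_a), tokenize(block_b)
    return [jaccard(ta, tb), cosine(ta, tb), sequence_ratio(ta, tb)]


def consensus_vote(features, thresholds=(0.5, 0.5, 0.5)):
    # Assumed stand-in for an ensemble: each score casts one vote and the
    # pair is labeled a clone when a strict majority of votes agree.
    votes = sum(f >= t for f, t in zip(features, thresholds))
    return votes * 2 > len(features)


if __name__ == "__main__":
    m1 = "int sum(int a, int b) { return a + b; }"
    m2 = "int add(int x, int y) { return x + y; }"
    feats = similarity_features(m1, m2)
    print(feats, consensus_vote(feats))
```

In a fuller setup, the feature vectors would be fed to trained classifiers whose predictions are combined, rather than to hand-set thresholds.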

Details

Title: Multiple Similarity-based Features Blending for Detecting Code Clones using Consensus-Driven Classification
Creators - without role: Abdullah M. Sheneamer - Jazan University
Publication Details: Expert systems with applications, Vol.183, p.115364
Publisher: Elsevier Ltd
Identifiers: 9918001708331
Academic Unit: Jazan University
Language: English
Resource Type: Journal article