Abstract
Plagiarism has become an increasing problem in higher education in recent years. Coding style can be used to detect source code plagiarism that involves writing and deciding the structure of the code which does not affect the logic of a program, thus offering a way to differentiate between different code authors. This paper focuses to identify whether a data set consisting of student programming assignments is rich enough to apply coding style metrics to detect similarities between code sequences, and we use the BlackBox dataset as a case study.