Abstract
Plagiarism is one of the most common problems in higher education, and it is increasing. Many research papers have addressed plagiarism in terms of its detection and its sources, which are often textbooks and online material; students have a variety of easy ways to copy others' work. Coding style can be used to detect source code plagiarism because it reflects the programmer's personality but does not affect the logic of a program, thus offering a way to differentiate between code authors. The immediate objective of this paper is to determine whether a data set of student programming assignments is rich enough for coding style metrics to be applied to detect similarities between code sequences. We use the BlackBox data set as a case study.