Abstract
Digital evidence remains an integral component of cybercrime investigations and judicial processes. However, the increasing volume of digital content and files makes it challenging for forensic examiners to process evidence in a timely manner. In this paper, we use machine learning to predict stealthy watermarks in various file types. We take a black-box approach, which differs from current steganographic and cryptographic methods, to find patterns of candidate file locations for hidden data. Our results demonstrate that machine learning can be used to build singleton models, each covering a single file type, as well as hybrid models, to predict stealthy watermarks in files. In our experiments, the DOCX singleton models achieved predictive accuracies ranging from 40% to 100%, the PPTX singleton model from 32.5% to 100%, and the JPEG singleton model from 37.5% to 65%. We also generated four types of hybrid models: the HYBRID3 and JPEG_PPTX models both achieved predictive accuracies ranging from 47.5% to 92.5%, the HYBRID_OOXML model from 32.5% to 100%, and the JPEG_DOCX model from 47.5% to 90%.