Abstract
Effective malware detection approaches need not only high accuracy, but also need to be robust to changes in the modus operandi of criminals. In this paper, we propose MARMITE, a feature-agnostic system that aims at propagating known malicious reputation of certain files to unknown ones with the goal of detecting malware. MARMITE does this by looking at a graph that encapsulates a comprehensive view of how files are downloaded (by which hosts and from which servers) on a global scale. The reputation of files is then propagated across the graph using semi-supervised label propagation with Bayesian confidence. We show that MARMITE is able to reach high accuracy (0.94 G-mean on average) over a 10-day dataset of 200 million download events. We also demonstrate that MARMITE'S detection capabilities do not significantly degrade over time, by testing our system on a 30-day dataset of 660 million download events collected six months after the system was tuned and validated. MARMITE still maintains a similar accuracy after this period of time.