Abstract
Understanding relationships between entities in a computer network is an important task in enterprise cyber-security. This paper presents a novel procedure for exploring similarity relationships in Netflow behaviour, that is, activity over time. The procedure has two stages. First, a statistical model is used to summarise the raw data; the parameters of such a model are naturally subject to estimation uncertainty. Second, a similarity metric is developed that incorporates this uncertainty, so that standard clustering procedures become available. We illustrate the method using connection-based data derived from Netflow records in a recently released public-domain data set.