Abstract
Graph-based data representation formats enable more advanced processing of data, which leads to better utilization of the information stored and available on the web. The intrinsically high connectedness of such representations provides a means to create methods and techniques that can assimilate new data and build knowledge-like data structures. Such procedures resemble the way humans deal with information.
In this paper, we focus on processing knowledge graph data. In particular, we propose a simple way of clustering pieces of data that have levels of uncertainty associated with them. That uncertainty results from collecting data from multiple sources: information about the same entity occurs a number of times and can be inconsistent. The existence of several 'alternative' pieces of data means that we can associate different levels of uncertainty with them. To do so, we represent pieces of data from knowledge graphs as propositions with multiple alternatives. Each alternative is associated with an uncertainty value expressing its 'correctness', i.e., the level of confidence that the alternative represents an accurate piece of information. These values are generated based on the frequency of occurrence and the consistency of the alternatives. Our method is designed to cluster such propositions. The methodology is presented together with a number of illustrative examples.
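As a rough illustration of the representation described above, the following Python sketch builds a proposition with multiple alternatives from values reported by several sources. The abstract does not give the formula for the uncertainty values, so the sketch assumes plain relative frequency as a stand-in and ignores consistency. The names `Proposition` and `build_proposition`, as well as the sample identifiers and values, are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Proposition:
    """A subject-property pair with several alternative values (hypothetical type)."""
    subject: str
    prop: str
    alternatives: dict  # maps each alternative value to a confidence in [0, 1]


def build_proposition(subject, prop, observed_values):
    """Collect the values reported by different sources into one proposition.

    Assumption: confidence is the relative frequency of each value.
    The paper's actual measure also accounts for consistency.
    """
    counts = Counter(observed_values)
    total = sum(counts.values())
    alternatives = {value: n / total for value, n in counts.items()}
    return Proposition(subject, prop, alternatives)


# Four sources report a value for the same entity and property; one disagrees.
p = build_proposition("ex:Entity1", "ex:hasValue", ["A", "A", "B", "A"])
print(p.alternatives)
```

In this sketch the majority value receives the higher confidence; propositions of this form would then be the input to the clustering step.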