Abstract
In view of answering queries provided in multiple large databases, it might be required to mine relevant databases en block. In this chapter, we present an efficient solution to clustering multiple large databases. We present two measures of similarity between a pair of databases and study their main properties. In the sequel, we design an algorithm for clustering multiple databases based on an introduced similarity measure. Also, we present a coding, referred to as IS coding, to represent itemsets space efficiently. The coding of this nature enables more frequent itemsets to participate in the determination of the similarity between two databases. Thus the invoked clustering process becomes more accurate. We also show that the IS coding attains maximum efficiency in most of the cases of the mining processes. The clustering algorithm becomes improved (in terms of its time complexity) when contrasted with the existing clustering algorithms. The efficiency of the clustering process has been improved using several strategies that is by reducing execution time of the clustering algorithm, using more suitable similarity measure, and storing frequent itemsets space efficiently.