Abstract
Clustering has typically been treated as a problem over continuous fields. In data mining, however, the data values are often nominal and cannot be assigned meaningful continuous substitutes. The main advantage of the k-means algorithm in data mining applications is its efficiency in clustering large data sets. The k-means algorithm usually uses the simple Euclidean metric, which is suitable only for hyperspherical clusters, and its use is limited to numeric data. This paper extends our work on the D-CV metric, which was introduced to deal with nominal data, and demonstrates how the popular k-means clustering algorithm can be profitably modified to work with the D-CV metric. Having adapted the k-means algorithm, we implement it with the D-CV metric and examine the results. With this development.