A k-mean clustering algorithm for mixed numeric and categorical data

Amir Ahmad; Lipika Dey

doi:10.1016/j.datak.2007.03.016

Journal article

Peer reviewed

Data & Knowledge Engineering, Vol. 63(2), pp. 503-527

1 November 2007

DOI: https://doi.org/10.1016/j.datak.2007.03.016

Keywords: Clustering; Co-occurrences; Cost function; Distance measure; k-Mean clustering; Significance of attributes

Abstract

Use of the traditional k-mean type algorithm is limited to numeric data. This paper presents a clustering algorithm based on the k-mean paradigm that works well for data with mixed numeric and categorical features. We propose a new cost function and distance measure based on co-occurrence of values. The measures also take into account the significance of an attribute towards the clustering process. We present a modified description of the cluster center to overcome the numeric-data-only limitation of the k-mean algorithm and provide a better characterization of clusters. The performance of this algorithm has been studied on real-world data sets. Comparisons with other clustering algorithms illustrate the effectiveness of this approach.
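
The abstract does not state the distance formula, so the following is only a minimal sketch of how a co-occurrence-based distance between two categorical values might be computed. The specific form used here (for each other attribute, sum the larger of the two conditional probabilities over its values and subtract one, then average over attributes) is an assumption made for illustration, and the function name `cooccurrence_distance` and the toy data are hypothetical.

```python
from collections import Counter


def cooccurrence_distance(rows, attr, x, y):
    """Illustrative distance between values x and y of categorical column `attr`.

    Assumed form (not taken from the record): for every other column j,
    compare the conditional distributions P(u | x) and P(u | y) of column j,
    score them as sum_u max(P(u | x), P(u | y)) - 1, and average the scores.
    Each score lies in [0, 1]: 0 for identical distributions, 1 for disjoint ones.
    """
    others = [j for j in range(len(rows[0])) if j != attr]
    total = 0.0
    for j in others:
        # Counts of column-j values among rows where column `attr` is x (or y).
        cx = Counter(r[j] for r in rows if r[attr] == x)
        cy = Counter(r[j] for r in rows if r[attr] == y)
        nx, ny = sum(cx.values()), sum(cy.values())
        if nx == 0 or ny == 0:
            raise ValueError("both values must occur in the data")
        # Counter returns 0 for values that never co-occur with x or y.
        total += sum(max(cx[u] / nx, cy[u] / ny) for u in set(cx) | set(cy)) - 1.0
    return total / len(others)


# Toy data: column 0 is colour, column 1 is size.
rows = [
    ("red", "small"), ("red", "small"),
    ("blue", "large"), ("blue", "large"),
    ("green", "small"), ("green", "large"),
]
d_red_blue = cooccurrence_distance(rows, 0, "red", "blue")
d_red_green = cooccurrence_distance(rows, 0, "red", "green")
```

In this toy data "red" and "blue" never share a size, so they come out maximally distant, while "green" co-occurs with both sizes and sits halfway between them. Numeric attributes would need separate handling (for example discretization), which this sketch does not attempt.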

Details

Title: A k-mean clustering algorithm for mixed numeric and categorical data
Creators: Amir Ahmad (Solid State Physics Laboratory); Lipika Dey (Indian Institute of Technology Delhi)
Publication Details: Data & Knowledge Engineering, Vol. 63(2), pp. 503-527
Publisher: Elsevier B.V.
Identifiers: 9936585708331
Academic Unit: King Abdulaziz University
Language: English
Resource Type: Journal article