A Comparison of Distance Metrics in Semi-supervised Hierarchical Clustering

Abeer Aljohani; Daphne Teck Ching Lai; Paul C. Bell; Eran A. Edirisinghe

doi:10.1007/978-3-319-63315-2_63

Conference proceeding

Peer reviewed

INTELLIGENT COMPUTING METHODOLOGIES, ICIC 2017, PT III, Vol.10363, pp.719-731

Lecture Notes in Artificial Intelligence

01/01/2017

DOI: https://doi.org/10.1007/978-3-319-63315-2_63

Computer Science

Computer Science, Artificial Intelligence

Computer Science, Theory & Methods

Science & Technology

Technology

Abstract

The basic idea of semi-supervised hierarchical clustering (ssHC) is to leverage domain knowledge in the form of triple-wise constraints to group data into clusters. In this paper, we perform extensive experiments to evaluate the effects of different distance metrics, linkage measures and constraints on the performance of two ssHC algorithms: IPoptim and UltraTran. The algorithms are applied with varying proportions of constraints on the different datasets, ranging from 10% to 60%. We found that IPoptim and UltraTran performed almost equally across the seven datasets. An interesting observation is that an increase in constraints does not always improve ssHC performance. It can also be observed that the inclusion of too many classes degrades clustering performance. The experimental results show that ssHC with the Canberra distance performs well, in addition to ssHC with well-known distances such as the Euclidean and Standard Euclidean distances. With complete linkage and a small proportion of constraints (10%), ssHC can achieve an F-score close to or above 0.8 for four of the seven datasets. Moreover, the output of a non-parametric statistical test shows that using the UltraTran algorithm in combination with the Manhattan distance metric and the Ward.D linkage method provides the best results. Furthermore, utilizing IPoptim and UltraTran with the Canberra distance measure performs better for the given datasets.
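
For readers unfamiliar with the distance metrics named in the abstract, the following is a minimal sketch of their standard textbook definitions, together with the complete-linkage rule. It is not the authors' code and does not reproduce IPoptim or UltraTran; the function names and the sample points are assumptions chosen for illustration only.

```python
import math

def euclidean(x, y):
    # Straight-line distance: square root of the summed squared differences.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    # Sum of absolute coordinate differences.
    return sum(abs(a - b) for a, b in zip(x, y))

def canberra(x, y):
    # Each coordinate difference is scaled by the magnitudes of the two
    # values, so features with large ranges do not dominate the total.
    total = 0.0
    for a, b in zip(x, y):
        denom = abs(a) + abs(b)
        if denom:  # common convention: a 0/0 term contributes 0
            total += abs(a - b) / denom
    return total

def complete_linkage(cluster_a, cluster_b, dist):
    # Complete linkage: distance between the two farthest members.
    return max(dist(p, q) for p in cluster_a for q in cluster_b)

# Hypothetical points whose second feature has a much larger scale.
x, y = (1.0, 100.0), (2.0, 110.0)
print(euclidean(x, y), manhattan(x, y), canberra(x, y))
```

On these sample points the second feature contributes most of the Euclidean and Manhattan distances, while the Canberra distance is driven mainly by the first feature, whose relative change is larger. This scaling behaviour is a general property of the metric, not a finding taken from the paper.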

Details

Title: A Comparison of Distance Metrics in Semi-supervised Hierarchical Clustering
Creators - without role: Abeer Aljohani (Loughborough University); Daphne Teck Ching Lai (Universiti Brunei Darussalam); Paul C. Bell (Loughborough University); Eran A. Edirisinghe (Loughborough University)
Contributors - without role: D S Huang; A Hussain; K Han; M M Gromiha
Publication Details: INTELLIGENT COMPUTING METHODOLOGIES, ICIC 2017, PT III, Vol.10363, pp.719-731
Series: Lecture Notes in Artificial Intelligence
Publisher: Springer Nature
Number of pages: 13
Identifiers: 9929904108331
Academic Unit: Taibah University
Language: English
Resource Type: Conference proceeding