Abstract
Procedures that evaluate the results of clustering algorithms are known as cluster validation (CV) indexes. CV indexes are usually grouped into two broad classes: external indexes, which assume that a ground-truth or optimal clustering solution is known in advance, and internal indexes, which do not. Traditional CV indexes become impractical, or even impossible, to compute when the data set is very large. To address this problem, we propose parallel and distributed external clustering validation models based on MapReduce for three indexes: F-measure, Normalized Mutual Information, and Variation of Information. The experimental results show that these models scale well as the size of the data set increases and that they provide accurate results.