Abstract
Procedures that evaluate the results of clustering algorithms are known as clustering validation (CV) indexes. CV indexes are usually classified into two broad classes: external indexes, which assume that a ground truth or optimal clustering is known in advance, and internal indexes, which do not. Traditional CV indexes become difficult, or even impossible, to compute when the data set is very large. In this paper, we are interested in the external validation of clusterings of large data sets. To address CV in a big data context, we propose a parallel external clustering validation model, specifically for the F-measure (MR_F-measure), that is based on MapReduce. The experimental results show that MR_F-measure scales very well with increasing data set size.