Abstract
In order to extract knowledge from the growing body of information available on the Internet, it is essential to first classify that information. Classification is an extensively researched topic in data mining, and text data, which represents a significant portion of this information, has naturally attracted considerable research interest. However, text classification poses its own problems. The first is high and sparse dimensionality, since the attributes span the huge vocabulary of a natural language. The second is the multi-label property, since each document may belong to more than one class simultaneously. Any classification approach that ignores these facts cannot yield optimal results. In this paper, we discuss an approach that uses fuzzy clustering to handle the high dimensionality of the data. As a post-processing step, it uses inter-class correlation information, in the form of class-label pairs, to enhance the prediction probabilities in multi-label classification. We use the correlation information in both positive (rewarding) and negative (penalizing) terms to refine the predicted label probabilities. We have tested the proposed algorithm on a number of benchmark data sets and achieved better performance than existing approaches.
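The abstract does not specify how the label-pair correlations are computed or how they are combined with a base classifier's outputs. The Python sketch below is a hypothetical illustration of such a reward/penalty post-processing step, not the algorithm evaluated in this paper. It makes three assumptions. First, the correlations are Pearson (phi) coefficients between the binary label columns of the training data. Second, each predicted probability is raised by positively correlated labels and lowered by negatively correlated ones, weighted by those labels' own predicted probabilities. Third, the function names and the `reward`/`penalty` weights are illustrative only.

```python
import numpy as np


def label_pair_correlation(Y):
    """Hypothetical helper: correlation between every pair of label columns.

    Y is a binary label matrix of shape (n_documents, n_labels). For 0/1
    columns the Pearson coefficient equals the phi coefficient.
    """
    Y = np.asarray(Y, dtype=float)
    with np.errstate(invalid="ignore", divide="ignore"):
        corr = np.corrcoef(Y, rowvar=False)
    corr = np.nan_to_num(corr)   # labels that never vary give NaN; treat as uncorrelated
    np.fill_diagonal(corr, 0.0)  # a label provides no evidence about itself
    return corr


def adjust_probabilities(P, corr, reward=0.1, penalty=0.1):
    """Hypothetical rule: reward/penalize label probabilities via label-pair correlations.

    P has shape (n_documents, n_labels) and holds a base classifier's
    per-label probabilities; corr comes from label_pair_correlation.
    """
    P = np.asarray(P, dtype=float)
    positive = np.clip(corr, 0.0, None)   # co-occurring label pairs
    negative = np.clip(-corr, 0.0, None)  # mutually exclusive label pairs
    scale = max(P.shape[1] - 1, 1)        # average over the other labels
    # (P @ positive)[i, j] = sum over k of P[i, k] * positive[k, j]
    boost = P @ positive / scale
    drag = P @ negative / scale
    return np.clip(P + reward * boost - penalty * drag, 0.0, 1.0)


# Toy example: labels 0 and 1 always co-occur in training; label 2 never appears with them.
Y_train = [[1, 1, 0], [1, 1, 0], [0, 0, 1], [0, 0, 1]]
P = np.array([[0.9, 0.5, 0.5]])
P_adj = adjust_probabilities(P, label_pair_correlation(Y_train))
```

In this toy case the probability of label 1 rises slightly above 0.5 because it co-occurs with the confidently predicted label 0. The probability of label 2 falls below 0.5 because label 2 is negatively correlated with both other labels. Averaging over the other labels keeps the size of the adjustment independent of how many labels there are.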