Abstract
Text classification is an essential technique for handling and organizing text data. However, classic text classifiers cannot clearly separate relevant from irrelevant information because of uncertainty: most of them find it very hard to explicitly describe the large uncertain boundary between the two classes. A framework based on three-way decisions is a promising methodology for dealing with uncertainty in binary classification, but it is not easy to integrate effectively with a popular classifier such as a support vector machine (SVM). To address this issue, this paper proposes a Multiple-SVMs classifier that combines the distinct aspects of three-way decisions theory with the capabilities of SVMs. The proposed approach first partitions the training set into three regions, namely the positive, negative and boundary regions, to ensure the certainty of the knowledge extracted for describing relevant information. Based on these three regions, a novel probabilistic feature-weighting approach is proposed to accurately weight the representative terms. The model then organizes the training samples to build a Multiple-SVMs classifier capable of predicting the class of each document. Experimental results on the standard datasets RCV1 and Reuters-21578 show that the proposed method significantly outperforms other state-of-the-art methods on several widely used evaluation measures.
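As an illustration of the three-region idea only, the following minimal sketch applies the generic three-way decision rule: a document is accepted when its estimated relevance probability is at least an upper threshold, rejected when it is at most a lower threshold, and deferred to the boundary region otherwise. The function name `three_way_partition`, the thresholds `alpha` and `beta`, their default values, and the example scores are all assumptions made for this sketch; the abstract does not specify how the proposed method actually constructs its regions.

```python
from typing import Dict, List


def three_way_partition(
    scores: Dict[str, float], alpha: float = 0.7, beta: float = 0.3
) -> Dict[str, List[str]]:
    """Split documents into positive, negative and boundary regions.

    `scores` maps a document id to an estimated probability of relevance.
    `alpha` and `beta` are hypothetical thresholds (not taken from the paper).
    """
    if not 0.0 <= beta < alpha <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= beta < alpha <= 1")

    regions: Dict[str, List[str]] = {"positive": [], "negative": [], "boundary": []}
    for doc_id, p in scores.items():
        if p >= alpha:
            regions["positive"].append(doc_id)   # confidently relevant
        elif p <= beta:
            regions["negative"].append(doc_id)   # confidently irrelevant
        else:
            regions["boundary"].append(doc_id)   # uncertain, decision deferred
    return regions


# Made-up relevance scores, for illustration only.
example_scores = {"d1": 0.9, "d2": 0.1, "d3": 0.5, "d4": 0.7, "d5": 0.3}
regions = three_way_partition(example_scores)
```

Under this rule, only documents in the positive and negative regions carry a definite label, while the boundary region holds exactly the uncertain cases that a binary classifier would otherwise be forced to label.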