Feature Selection and Term Weighting

Abdulmohsen Algarni; Nasser Tairan

doi:10.1109/WI-IAT.2014.53


Conference proceeding


2014 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT), Vol.1, pp.336-339

08/2014

DOI: https://doi.org/10.1109/WI-IAT.2014.53

Abstract

Keywords: Data mining; Feature extraction; Frequency measurement; Information retrieval; Noise; Text categorization; Text mining; Training

Term-based approaches can extract many features from text documents, but most of those features include noise. Many popular text-mining techniques have been adapted to reduce the noisy information in extracted features, yet the results still contain some noisy features. However, the noisy features are extracted from the same training documents as the good features. The main problem, therefore, is that some training documents contain a large amount of noisy data. Reducing the noisy data in the training documents would help reduce the noise in the extracted features. Moreover, we believe that removing some training documents (those that contain more noisy data than useful data) can help improve the effectiveness of the classifier. Clustering methods can help reduce the effect of noisy data. The clustering problem is defined as that of finding groups of similar objects in the data. In this paper, we introduce a methodology that uses a clustering algorithm to group the training data before it is used. We also test our theory that not all training documents are useful for training the classifier.
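The abstract does not say which clustering algorithm, document representation, or filtering rule the authors use, so the following is only a rough sketch of the general idea of grouping training documents before use. It assumes raw term-frequency vectors, a small k-means-style loop with cosine similarity, and a hypothetical rule that keeps only the largest cluster. The function names (`tf_vector`, `cluster_documents`, `keep_largest_cluster`) are illustrative and do not come from the paper.

```python
import math
from collections import Counter


def tf_vector(text):
    """Term-frequency bag of words (lowercased, whitespace tokens)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Average term weights over a non-empty list of vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: w / len(vectors) for t, w in total.items()}


def cluster_documents(vectors, k=2, iterations=10):
    """Assumed k-means-style clustering; requires len(vectors) >= k.

    Seeding is deterministic: start from the first document, then
    repeatedly add the document least similar to the current seeds.
    """
    seeds = [0]
    while len(seeds) < k:
        candidates = [i for i in range(len(vectors)) if i not in seeds]
        seeds.append(min(
            candidates,
            key=lambda i: max(cosine(vectors[i], vectors[s]) for s in seeds),
        ))
    centers = [dict(vectors[s]) for s in seeds]
    labels = None
    for _ in range(iterations):
        # Assign each document to its most similar cluster center.
        new_labels = [
            max(range(k), key=lambda c: cosine(v, centers[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        # Recompute each center from its current members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centers[c] = centroid(members)
    return labels


def keep_largest_cluster(docs, k=2):
    """Hypothetical filter: keep only documents in the largest cluster."""
    labels = cluster_documents([tf_vector(d) for d in docs], k)
    largest = Counter(labels).most_common(1)[0][0]
    return [d for d, lab in zip(docs, labels) if lab == largest]
```

On a toy training set in which three documents share vocabulary and one does not, this filter would drop the outlier before feature extraction. Whether dropping whole documents actually helps a classifier is the question the paper sets out to test, and this sketch makes no claim about it.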


Details

Title: Feature Selection and Term Weighting
Creators: Abdulmohsen Algarni - King Khalid University; Nasser Tairan - King Khalid University
Publication Details: 2014 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT), Vol.1, pp.336-339
Publisher: IEEE
Identifiers: 9923773108331
Academic Unit: King Khalid University
Language: English
Resource Type: Conference proceeding