Effective 20 Newsgroups Dataset Cleaning
Conference proceeding

Khaled Albishre, Mubarak Albathan, Yuefeng Li and IEEE
2015 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT), Vol.3, pp.98-101
12/2015

Abstract

Keywords: 20 Newsgroups; Cleaning; Electronic mail; Feature extraction; Feature Selection; Natural language processing; Noise measurement; Testing; Text mining
The rapid increase in the number of text documents available on the Internet has created pressure to use effective cleaning techniques, which are needed to convert these documents into structured form. Text cleaning is one of the key stages in typical text mining application frameworks. In this paper, we explore the role of text cleaning in the 20 Newsgroups dataset and report experimental results.
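The abstract does not list the specific cleaning steps the authors evaluate. The Python function below is a purely illustrative sketch, not the paper's method: it applies cleaning steps commonly used on raw 20 Newsgroups posts. It assumes each post is raw text whose RFC 822-style header block is separated from the body by the first blank line. The name `clean_post`, the tiny stopword list, and the attribution-line regex are hypothetical choices made for this example. (scikit-learn's `fetch_20newsgroups(remove=('headers', 'footers', 'quotes'))` offers a comparable, library-provided form of header, signature, and quote stripping.)

```python
import re

# Hypothetical, deliberately tiny stopword list (illustration only).
STOPWORDS = {"a", "an", "and", "the", "is", "of", "to", "in", "it", "i"}

# Heuristic (assumed) pattern for reply attribution lines such as
# "In article <...>, someone writes:" or "Jane Doe wrote:".
ATTRIBUTION = re.compile(r"^(in article|.*\bwrites:\s*$|.*\bwrote:\s*$)", re.IGNORECASE)


def clean_post(raw: str) -> list[str]:
    """Return cleaned, lowercased tokens for one raw newsgroup post."""
    # 1. Drop the header block: everything before the first blank line.
    _, sep, body = raw.partition("\n\n")
    if not sep:
        body = raw  # no blank line found: treat the whole text as body

    kept = []
    for line in body.splitlines():
        stripped = line.strip()
        # 2. Stop at the conventional signature delimiter ("-- ").
        if stripped == "--":
            break
        # 3. Skip quoted reply lines and attribution lines.
        if stripped.startswith((">", "|")) or ATTRIBUTION.match(stripped):
            continue
        kept.append(line)

    text = " ".join(kept).lower()
    # 4. Remove e-mail addresses, then replace every non-letter run with a space.
    text = re.sub(r"\S+@\S+", " ", text)
    text = re.sub(r"[^a-z]+", " ", text)
    # 5. Tokenize on whitespace; drop stopwords and one-letter tokens.
    return [t for t in text.split() if t not in STOPWORDS and len(t) > 1]


if __name__ == "__main__":
    raw = (
        "From: user@example.com\nSubject: Re: GPU question\n\n"
        "In article <123@host>, bob@x.org writes:\n"
        "> Is the card any good?\n"
        "The card works well in Linux!\n"
        "-- \nAlice, alice@example.com\n"
    )
    print(clean_post(raw))  # ['card', 'works', 'well', 'linux']
```

Each step removes one kind of noise typical of Usenet posts: headers, signatures, quoted replies, and addresses. The ordering matters because quoted text must be dropped before tokenization, while the lines are still distinguishable.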