Abstract
Conference Title: 2018 21st Saudi Computer Society National Computer Conference (NCC). Conference Start Date: April 25, 2018. Conference End Date: April 26, 2018. Conference Location: Riyadh, Saudi Arabia.

The volume of textual data has grown dramatically, and the data is becoming more complex every day. Managing, analyzing, summarizing, and understanding this data remains challenging and requires new techniques for automatically organizing, searching, indexing, and browsing large document collections. Text classification, an area of text mining, is the process of assigning text to predefined classes or topics. We developed a tool for Arabic text classification using a parallel programming framework, called the Parallel Arabic Text Classifier (PATC). It analyzes a labeled corpus of Arabic text supplied by the user and builds a text classifier from it. PATC consists of three major stages: (1) Preprocessing: PATC normalizes and stems the Arabic corpus before using it to train the classifier; (2) Training, or building the classifier: the classifier is trained on a user-uploaded, annotated Arabic corpus; and (3) Testing, or classifying: the trained classifier predicts the class of a new document. The classifier is built using an approach that associates each label with frequent words, implemented with the MapReduce distributed programming model. The classifier was evaluated on an Arabic corpus. Classification accuracy was around 80% under single-label measures and in the high 90s (percent) under multi-label measures.
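The three stages can be illustrated with a minimal single-process sketch. This is not PATC's implementation: every function name, the normalization rules (stripping diacritics and tatweel, unifying alef variants), the `top_k` cutoff, and the overlap-based prediction rule are assumptions made for illustration. Stemming is omitted, and the map and reduce steps run in one process rather than on a distributed framework.

```python
import re
from collections import Counter, defaultdict
from itertools import chain

# Assumed normalization rules (common for Arabic; not confirmed by the source).
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")    # tashkeel marks and tatweel
_ALEF_VARIANTS = re.compile(r"[\u0622\u0623\u0625]")  # alef with madda / hamza

def normalize(text):
    """Stage 1 (partial): strip diacritics and unify alef forms. No stemming."""
    text = _DIACRITICS.sub("", text)
    return _ALEF_VARIANTS.sub("\u0627", text)

def map_phase(label, text):
    """Map step: emit ((label, word), 1) for each word of one labeled document."""
    return [((label, word), 1) for word in normalize(text).split()]

def reduce_phase(pairs):
    """Reduce step: sum the emitted counts per (label, word) key."""
    counts = defaultdict(Counter)
    for (label, word), n in pairs:
        counts[label][word] += n
    return counts

def train(corpus, top_k=3):
    """Stage 2: associate each label with its top_k most frequent words."""
    pairs = chain.from_iterable(map_phase(label, text) for label, text in corpus)
    counts = reduce_phase(pairs)
    return {label: {w for w, _ in c.most_common(top_k)}
            for label, c in counts.items()}

def classify(model, text):
    """Stage 3: pick the label whose frequent words overlap the document most
    (an assumed decision rule, chosen only to keep the sketch short)."""
    words = set(normalize(text).split())
    return max(model, key=lambda label: len(words & model[label]))
```

Here `corpus` is assumed to be an iterable of `(label, text)` pairs; `train(corpus)` returns a dictionary mapping each label to a set of frequent words, which `classify` then compares against a new document. In a real MapReduce job, the grouping done inside `reduce_phase` would be handled by the framework's shuffle step.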