Using Machine Learning Algorithms to Detect Content-based Arabic Web Spam

Heider Wahsheh; Iyad Abu Doush; Mohammed Al-Kabi; Izzat Alsmadi; Emad Al-Shawakfa

Journal article

Peer reviewed

Journal of information assurance and security, Vol.7(1), pp.14-23

01/01/2012

Abstract

Computer Science

Computer Science, Information Systems

Science & Technology

Technology

As the ranking of retrieved Web pages in search results becomes increasingly important for marketing purposes, many Web pages try to fool search engines in order to obtain high ranks. This study uses machine learning algorithms to evaluate spam among Web pages with Arabic content. Where spam techniques have been applied, classifiers can be used to remove the spam pages. The experiments are based on different training dataset sizes and extracted features. Two algorithms were applied to detect spam pages, and their results were compared. The results showed that the decision tree is better than Naive Bayes at detecting Arabic spam pages.

Details

Title: Using Machine Learning Algorithms to Detect Content-based Arabic Web Spam
Creators: Heider Wahsheh - Yarmouk University
Iyad Abu Doush - Yarmouk University
Mohammed Al-Kabi - Yarmouk University
Izzat Alsmadi - Yarmouk University
Emad Al-Shawakfa - Yarmouk University
Publication Details: Journal of information assurance and security, Vol.7(1), pp.14-23
Publisher: Dynamic Publishers, Inc
Number of pages: 10
Identifiers: 9924292308331
Academic Unit: King Khalid University
Language: English
Resource Type: Journal article