A Comprehensive Survey on Web Content Extraction Algorithms and Techniques

Sumaia Mohammed AL-Ghuribi; Saleh Alshomrani; IEEE

doi:10.1109/ICISA.2013.6579445

Conference proceeding

2013 INTERNATIONAL CONFERENCE ON INFORMATION SCIENCE AND APPLICATIONS (ICISA 2013), pp.1-5

International Conference on Information Science and Applications

01/01/2013

DOI: https://doi.org/10.1109/ICISA.2013.6579445

Abstract

Computer Science

Computer Science, Information Systems

Computer Science, Theory & Methods

Information Science & Library Science

Science & Technology

Technology

Web Content Extraction is an important problem that has been studied through different approaches and algorithms. It is concerned with extracting meaningful and useful data from a Web page, where that data is surrounded by noise such as advertisements and navigation links. Many applications benefit from the extracted content, including crawlers, indexers, document classification, and information retrieval. This survey aims to provide a comprehensive overview of many approaches that have been constructed for extracting Web page content. In this survey, Web Content Extraction approaches are classified into categories, and for each category some approaches are described in detail along with their weaknesses. Based on a deep analysis of these approaches, we derive the fundamental factors for constructing an optimal Web content extractor.

Details

Title: A Comprehensive Survey on Web Content Extraction Algorithms and Techniques
Creators - without role: Sumaia Mohammed AL-Ghuribi - King Abdulaziz University
Saleh Alshomrani - King Abdulaziz University
IEEE
Publication Details: 2013 INTERNATIONAL CONFERENCE ON INFORMATION SCIENCE AND APPLICATIONS (ICISA 2013), pp.1-5
Series: International Conference on Information Science and Applications
Publisher: IEEE
Number of pages: 5
Identifiers: 9933046508331
Academic Unit: University of Jeddah
Language: English
Resource Type: Conference proceeding