More effective, efficient, and scalable Web crawler system architecture

N A El-Ramly; H M Harb; N Amin; A M Tolba

doi:10.1109/ICEEC.2004.1374396

Conference proceeding

ICEEC'04: 2004 INTERNATIONAL CONFERENCE ON ELECTRICAL, ELECTRONIC AND COMPUTER ENGINEERING, PROCEEDINGS, pp.120-123

01/01/2004

DOI: https://doi.org/10.1109/ICEEC.2004.1374396

Subjects: Computer Science; Computer Science, Artificial Intelligence; Computer Science, Hardware & Architecture; Computer Science, Software Engineering; Engineering; Engineering, Electrical & Electronic; Imaging Science & Photographic Technology; Science & Technology; Technology

Abstract

As the World Wide Web grows rapidly, people need web search engines to find information on it. The crawler is an important module of a web search engine, and its quality directly affects the search quality of the engine. This paper describes a scalable web crawler written entirely in Java. Scalable web crawlers are an important component of many web services. We enumerate the major components of any scalable web crawler, comment on alternatives and tradeoffs in their design, and describe the particular components used in our crawler. Given some seed URLs, the crawler retrieves the web pages at those URLs, parses the HTML files, adds the newly found URLs to its queue, and returns to the first phase of this cycle. While parsing the HTML files for new URLs, the crawler can also retrieve other information from them.
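
The fetch, parse, enqueue cycle described in the abstract can be illustrated with a minimal Java sketch. This is not the paper's implementation: the class name CrawlLoopSketch, the methods extractLinks and crawl, the fetcher parameter (a stand-in for real HTTP retrieval), and the maxPages limit are all hypothetical names introduced here for illustration. Links are extracted with a simplistic regular expression, where a production crawler would more likely use a proper HTML parser, and politeness rules such as robots.txt handling are left out.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CrawlLoopSketch {
    // Simplistic href matcher (assumption: good enough for a sketch only).
    private static final Pattern HREF =
        Pattern.compile("href\\s*=\\s*[\"']([^\"'#]+)", Pattern.CASE_INSENSITIVE);

    /** Returns the link targets found in html, resolved against baseUrl. */
    static List<String> extractLinks(String baseUrl, String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            try {
                links.add(URI.create(baseUrl).resolve(m.group(1).trim()).toString());
            } catch (IllegalArgumentException e) {
                // Skip link targets that are not valid URIs.
            }
        }
        return links;
    }

    /**
     * Breadth-first crawl: take a URL from the queue, fetch it, parse it,
     * enqueue unseen links, repeat. The fetcher returns the page HTML,
     * or null if the page could not be retrieved.
     */
    static List<String> crawl(List<String> seeds,
                              Function<String, String> fetcher,
                              int maxPages) {
        Queue<String> frontier = new ArrayDeque<>(seeds);
        Set<String> seen = new LinkedHashSet<>(seeds);
        List<String> visited = new ArrayList<>();
        while (!frontier.isEmpty() && visited.size() < maxPages) {
            String url = frontier.remove();
            String html = fetcher.apply(url);
            if (html == null) {
                continue; // retrieval failed; move on to the next URL
            }
            visited.add(url);
            for (String link : extractLinks(url, html)) {
                if (seen.add(link)) { // true only for URLs not seen before
                    frontier.add(link);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // In-memory stand-in for the Web, so the sketch runs offline.
        Map<String, String> pages = Map.of(
            "http://example.test/", "<a href=\"/a\">A</a> <a href=\"/b\">B</a>",
            "http://example.test/a", "<a href=\"/b\">B</a> <a href=\"/\">home</a>",
            "http://example.test/b", "no links here");
        System.out.println(crawl(List.of("http://example.test/"), pages::get, 10));
    }
}
```

Passing the fetcher as a function keeps the loop independent of how pages are retrieved, so the same cycle can be exercised against an in-memory map, as in main, or against a real HTTP client. The seen set is what keeps the cycle from revisiting pages that link back to each other.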

Details

Title: More effective, efficient, and scalable Web crawler system architecture
Creators - without role: N A El-Ramly - Menoufia University
H M Harb
N Amin
A M Tolba
Contributors - without role: A M Wahdan
A Amer
H Fikry
A Salem
Publication Details: ICEEC'04: 2004 INTERNATIONAL CONFERENCE ON ELECTRICAL, ELECTRONIC AND COMPUTER ENGINEERING, PROCEEDINGS, pp.120-123
Publisher: IEEE
Number of pages: 4
Identifiers: 9952682708331
Academic Unit: King Saud University; King Abdullah University of Science & Technology
Language: English
Resource Type: Conference proceeding