HTML Extraction Algorithm Based on Property and Data Cell

Detty Purnamasari; I. Wayan Simri Wicaksana; Suryadi Harmanto; Lintang Yuniar Banowosari

doi:10.1088/1757-899X/46/1/012035

Conference proceeding

Open access

Peer reviewed

2013 INTERNATIONAL CONFERENCE ON MANUFACTURING, OPTIMIZATION, INDUSTRIAL AND MATERIAL ENGINEERING (MOIME 2013), Vol.46(1), pp.12035-9

IOP Conference Series-Materials Science and Engineering

01/01/2013

DOI: https://doi.org/10.1088/1757-899X/46/1/012035

Abstract

Engineering

Engineering, Industrial

Engineering, Manufacturing

Materials Science

Materials Science, Multidisciplinary

Operations Research & Management Science

Science & Technology

Technology

Data available on the Internet comes in various models and formats. One form of data representation is the table. Table extraction is used to process multiple tables on the Internet that come from different sources. Currently this work is done by copy-paste, which is not an automatic process. This article presents an approach for preparing the table area so that tables in HTML format can be extracted and converted into a database, which makes it easier to combine data from many sources. Two algorithms were tested: Algorithm 1, which determines the actual number of columns and rows of a table, and Algorithm 2, which determines the boundary line of the property. Tests were conducted on 100 tables in HTML format; the results show an accuracy of 99.9% for Algorithm 1 and 84% for Algorithm 2.

Files and links (1)

url

https://doi.org/10.1088/1757-899X/46/1/012035

Published (Version of record) Open

Details

Title: HTML Extraction Algorithm Based on Property and Data Cell
Creators - without role: Detty Purnamasari - Gunadarma University
I. Wayan Simri Wicaksana - Gunadarma University
Suryadi Harmanto - Gunadarma University
Lintang Yuniar Banowosari - Gunadarma University
Contributors - without role: F L Gaol
R R Hussain
T Pandiangan
A Desai
Publication Details: 2013 INTERNATIONAL CONFERENCE ON MANUFACTURING, OPTIMIZATION, INDUSTRIAL AND MATERIAL ENGINEERING (MOIME 2013), Vol.46(1), pp.12035-9
Series: IOP Conference Series-Materials Science and Engineering
Publisher: IOP Publishing Ltd
Number of pages: 9
Identifiers: 9949518308331
Academic Unit: King Saud University
Language: English
Resource Type: Conference proceeding