An Efficient Text Representation for Searching and Retrieving Classical Diacritical Arabic Text

Saqib Hakak; Amirrudin Kamsin; Palaiahnakote Shivakumara; Omar Tayan; Mohd Yamani Idna Idris; Gulshan amin Gilkar

doi:10.1016/j.procs.2018.10.470

Conference proceeding

Open access

Peer reviewed

ARABIC COMPUTATIONAL LINGUISTICS, Vol.142, pp.150-157

Procedia Computer Science

01/01/2018

Subjects: Computer Science; Computer Science, Artificial Intelligence; Computer Science, Theory & Methods; Science & Technology; Technology

Abstract

With the rapid growth of the Internet and related technologies, storing Arabic diacritical data and extracting it in real time from an Arabic corpus have become vital issues in information retrieval. In this paper, we propose a new way of representing Arabic diacritical text in the corpus so that search engines can retrieve the desired text faster and with high precision. To this end, we segment Arabic diacritical sentences/verses into individual characters together with their diacritics, which are necessary for interpreting meaning. We then propose a new data structure that represents the data using the segmented alphabets. To verify the corpus representation, the proposed approach uses the Boyer-Moore algorithm to search for given verses of Arabic diacritical data. The proposed representation reduces the search time from O(m*n) to O(1+m) in the worst case, where m denotes the length of the diacritical verse to be searched and n denotes the total number of diacritical verses. Experimental results on a popular corpus show that the proposed method outperforms existing search methods in terms of time complexity. © 2018 The Authors. Published by Elsevier B.V.
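The abstract does not give the details of the segmentation step or of the proposed data structure, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that a "segmented character" is a base letter plus any Unicode combining marks that follow it, and it swaps in the Horspool simplification of Boyer-Moore (bad-character rule only) for brevity. The function names `segment` and `horspool_search` are hypothetical.

```python
import unicodedata


def segment(text):
    """Split text into units: one base character plus the combining
    marks (e.g. Arabic diacritics) that immediately follow it.
    Assumption: this approximates the paper's character segmentation."""
    units = []
    for ch in text:
        if unicodedata.combining(ch) and units:
            units[-1] += ch  # attach the diacritic to the preceding letter
        else:
            units.append(ch)
    return units


def horspool_search(haystack, needle):
    """Return the index of the first occurrence of `needle` in `haystack`
    (both lists of units), or -1 if it does not occur.
    Uses only the bad-character rule (Horspool variant of Boyer-Moore)."""
    m, n = len(needle), len(haystack)
    if m == 0:
        return 0
    if m > n:
        return -1
    # For each unit except the last, the distance from its last
    # occurrence in the needle to the end of the needle.
    shift = {unit: m - 1 - i for i, unit in enumerate(needle[:-1])}
    pos = 0
    while pos <= n - m:
        if haystack[pos:pos + m] == needle:
            return pos
        # Shift by the unit aligned with the needle's last position.
        pos += shift.get(haystack[pos + m - 1], m)
    return -1


# Hypothetical example data, written with escapes so the diacritics are explicit.
diacritized = "\u0628\u0650\u0633\u0652\u0645\u0650"  # ba+kasra, sin+sukun, mim+kasra
bare = "\u0628\u0633\u0645"                            # same letters, no diacritics
text = "\u0639\u0650\u0644\u0652\u0645 " + diacritized

units = segment(text)
print(horspool_search(units, segment(diacritized)))  # match on letter+diacritic units
print(horspool_search(units, segment(bare)))         # undiacritized query does not match
```

Because each unit carries its diacritics, the comparison is diacritic-sensitive: a query without diacritics is treated as a different sequence. Whether the paper's representation behaves the same way for partially diacritized queries is not stated in the abstract.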

Details

Title: An Efficient Text Representation for Searching and Retrieving Classical Diacritical Arabic Text
Creators: Saqib Hakak (National University of Singapore); Amirrudin Kamsin (University of Malaya); Palaiahnakote Shivakumara (University of Malaya); Omar Tayan (Taibah University); Mohd Yamani Idna Idris (University of Malaya); Gulshan amin Gilkar (Shaqra University)
Contributors: K Shaalan; ElBeltagy
Publication Details: ARABIC COMPUTATIONAL LINGUISTICS, Vol.142, pp.150-157
Series: Procedia Computer Science
Publisher: Elsevier
Number of pages: 8
Grant note: UMRG RP043A-17 HNE / University of Malaya, Malaysia; Universiti Malaya NRC1-126B / NOOR Research Center, Taibah University, Al-Madinah Al-Munawwarah, Saudi Arabia
Identifiers: 9929376808331
Academic Unit: Shaqra University; Taibah University
Language: English
Resource Type: Conference proceeding