Big Data Full-Text Search Index Minimization Using Text Summarization

Waheed Iqbal; Waqas Ilyas Malik; Faisal Bukhari; Khaled Mohamad Almustafa; Zubiar Nawaz

doi:10.5755/j01.itc.50.2.25470

Journal article; open access; peer reviewed

Information technology and control, Vol.50(2), pp.375-389

17/06/2021

Subject areas: Automation & Control Systems; Computer Science; Computer Science, Artificial Intelligence; Computer Science, Information Systems; Science & Technology; Technology

Abstract

Efficient full-text search is achieved by indexing the raw data, at an additional storage cost of 20 to 30 percent. In the context of Big Data, this additional storage is huge and makes it challenging to serve full-text search queries with good performance. It also adds overhead to store, manage, and update the large index. In this paper, we propose and evaluate a method that minimizes the index size for full-text search over Big Data by using automatic extractive text summarization. To evaluate the effectiveness of the proposed approach, we used two real-world datasets. We indexed the original and the summarized datasets using Apache Lucene and compared the search results obtained for different search queries using average simple overlap, Spearman's rho correlation, and average ranking score. Our experimental evaluation shows that automatic text summarization reduces the index size significantly: the proposed solution achieved up to an 82% reduction in index size with 42% higher relevance of the search results.

Full text (published version of record, open access): https://doi.org/10.5755/j01.itc.50.2.25470

Details

Title: Big Data Full-Text Search Index Minimization Using Text Summarization
Creators: Waheed Iqbal - University of the Punjab
Waqas Ilyas Malik - University of the Punjab, Punjab University College of Information Technology (PUCIT), Lahore, Pakistan
Faisal Bukhari - University of the Punjab
Khaled Mohamad Almustafa - Prince Sultan University
Zubiar Nawaz - University of the Punjab, Punjab University College of Information Technology (PUCIT), Lahore, Pakistan
Publication Details: Information technology and control, Vol.50(2), pp.375-389
Publisher: Kaunas University of Technology
Number of pages: 15
Grant note: Prince Sultan University
Identifiers: 9926832608331
Academic Unit: Prince Sultan University
Language: English
Resource Type: Journal article