Efficient Algorithm to Approximate Values with Non-uniform Spreads Inside a Histogram Bucket

Wissem Labbadi; Jalel Akaichi

doi:10.1007/978-3-319-11587-0_28

Conference proceeding

Peer reviewed

MODEL AND DATA ENGINEERING, MEDI 2014, Vol.8748, pp.301-312

Lecture Notes in Computer Science

01/01/2014

Subjects: Computer Science; Computer Science, Artificial Intelligence; Computer Science, Information Systems; Computer Science, Software Engineering; Computer Science, Theory & Methods; Science & Technology; Technology

Abstract

Most histograms maintained by current DBMSs make the uniform frequency assumption and approximate all frequencies in a bucket by their average, so they need to store only the average frequency of each bucket. Because only this average is stored, the accuracy of any estimate derived from the histogram depends heavily on the technique used to approximate the attribute values within each bucket. Several approaches to approximating the set of attribute values within a bucket have been studied in the literature. Some histograms record every distinct value that appears in each bucket, while others make crude assumptions about these values; the most significant are the continuous values assumption, the uniform spread assumption and the point value assumption. Other approaches rely on sampling techniques to approximate the values inside a histogram bucket. The problem is that all of the proposed techniques assume that attribute values have equal spreads. Motivated by the inaccuracy of previous approaches in approximating value sets with non-uniform spreads, and by the significant estimation errors that the various assumptions can produce, we compute d distinct values v(1), v(2), ..., v(d) that lie between the lowest and highest values of each bucket, without making any assumption about how the values are spread. To this end, we propose an efficient algorithm that calculates these d values dynamically as new values are inserted into the attribute. The problem reduces to computing (d-2) quantiles, namely the 1/d-, 2/d-, ..., (d-2)/d-quantiles, together with the lowest and highest values in the bucket. For each quantile to be estimated, we maintain a set of five markers that are updated after every new value is inserted into the attribute.
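The abstract does not spell out how the five markers are updated. The five-marker scheme appears to match the P² algorithm of Jain and Chlamtac, so the Python sketch below assumes that update rule. It is an illustration, not the authors' implementation. `P2Quantile` and `bucket_values` are hypothetical names. The quantile levels follow the abstract (1/d, ..., (d-2)/d), and the lowest and highest values are tracked exactly.

```python
import math
from typing import Iterable, List


class P2Quantile:
    """Streaming estimate of one p-quantile with five markers.

    Assumption: the update rule is the P² algorithm (Jain & Chlamtac); the
    paper's exact marker-update rule may differ.
    """

    def __init__(self, p: float) -> None:
        if not 0.0 < p < 1.0:
            raise ValueError("p must lie strictly between 0 and 1")
        self.p = p
        self.q: List[float] = []                                 # marker heights
        self.n = [1, 2, 3, 4, 5]                                 # actual marker positions
        self.want = [1.0, 1 + 2 * p, 1 + 4 * p, 3 + 2 * p, 5.0]  # desired positions
        self.step = [0.0, p / 2, p, (1 + p) / 2, 1.0]            # desired-position increments

    def add(self, x: float) -> None:
        if len(self.q) < 5:  # the first five values initialise the markers
            self.q.append(x)
            self.q.sort()
            return
        q, n = self.q, self.n
        # Find the cell k with q[k] <= x < q[k+1], stretching the end markers if needed.
        if x < q[0]:
            q[0], k = x, 0
        elif x >= q[4]:
            q[4], k = x, 3
        else:
            k = next(i for i in range(4) if q[i] <= x < q[i + 1])
        for i in range(k + 1, 5):
            n[i] += 1
        for i in range(5):
            self.want[i] += self.step[i]
        # Move each middle marker by one position if it lags or leads its desired position.
        for i in range(1, 4):
            gap = self.want[i] - n[i]
            if (gap >= 1 and n[i + 1] - n[i] > 1) or (gap <= -1 and n[i - 1] - n[i] < -1):
                s = 1 if gap > 0 else -1
                h = self._parabolic(i, s)
                if not q[i - 1] < h < q[i + 1]:
                    h = q[i] + s * (q[i + s] - q[i]) / (n[i + s] - n[i])  # linear fallback
                q[i] = h
                n[i] += s

    def _parabolic(self, i: int, s: int) -> float:
        """Piecewise-parabolic prediction of marker i's height after moving by s."""
        q, n = self.q, self.n
        return q[i] + s / (n[i + 1] - n[i - 1]) * (
            (n[i] - n[i - 1] + s) * (q[i + 1] - q[i]) / (n[i + 1] - n[i])
            + (n[i + 1] - n[i] - s) * (q[i] - q[i - 1]) / (n[i] - n[i - 1])
        )

    def value(self) -> float:
        if not self.q:
            raise ValueError("no values inserted yet")
        if len(self.q) < 5:  # too few values for markers: exact nearest-rank quantile
            return self.q[max(0, math.ceil(self.p * len(self.q)) - 1)]
        return self.q[2]  # the middle marker tracks the p-quantile


def bucket_values(values: Iterable[float], d: int) -> List[float]:
    """Hypothetical helper: min, the 1/d..(d-2)/d quantile estimates, and max."""
    if d < 2:
        raise ValueError("d must be at least 2")
    estimators = [P2Quantile(i / d) for i in range(1, d - 1)]
    lo, hi = math.inf, -math.inf
    for x in values:  # one pass, as if the values arrived by insertion
        lo, hi = min(lo, x), max(hi, x)
        for est in estimators:
            est.add(x)
    if lo > hi:
        raise ValueError("bucket is empty")
    return [lo] + [est.value() for est in estimators] + [hi]


stream = [(i * 7919) % 1000 + 1 for i in range(1000)]  # a permutation of 1..1000
print(bucket_values(stream, 5))  # min, estimates near the 0.2/0.4/0.6 quantiles, max
```

Each estimator keeps only five heights and five positions. Memory per bucket therefore grows with d but not with the number of inserted values.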
The results of a set of experiments comparing the accuracy of the proposed algorithm with that of the uniform spread assumption, on various sets of values and over different types of histograms, show the effectiveness of our technique, especially when values have unequal spreads.
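The toy bucket below shows why unequal spreads hurt the uniform spread assumption. Its data are made up and do not come from the paper's experiments: 19 values are packed at the low end, with one outlier at 1000. The sketch compares two sets of d = 5 values. The first is placed under the uniform spread assumption. The second consists of the minimum, the exact 1/d, ..., (d-2)/d-quantiles and the maximum. The quantiles use the nearest-rank definition, which is an assumption. The paper estimates them incrementally rather than exactly. All function names are hypothetical.

```python
from typing import List, Sequence


def uniform_spread_values(lo: float, hi: float, d: int) -> List[float]:
    """Uniform spread assumption: d (>= 2) values equally spaced from lo to hi."""
    return [lo + (hi - lo) * k / (d - 1) for k in range(d)]


def quantile_values(sorted_vals: Sequence[float], d: int) -> List[float]:
    """Min, exact nearest-rank 1/d, ..., (d-2)/d quantiles (assumed definition), and max."""
    n = len(sorted_vals)
    # (i * n + d - 1) // d - 1 equals ceil(i * n / d) - 1, the nearest-rank index
    inner = [sorted_vals[(i * n + d - 1) // d - 1] for i in range(1, d - 1)]
    return [sorted_vals[0]] + inner + [sorted_vals[-1]]


def fraction_at_most(values: Sequence[float], x: float) -> float:
    """Estimated fraction of a bucket's values that are <= x."""
    return sum(v <= x for v in values) / len(values)


# Made-up bucket: 19 values packed at the low end plus one outlier.
bucket = sorted(list(range(1, 20)) + [1000])
d = 5
uniform = uniform_spread_values(bucket[0], bucket[-1], d)
quant = quantile_values(bucket, d)
print(uniform)  # [1.0, 250.75, 500.5, 750.25, 1000.0]
print(quant)    # [1, 4, 8, 12, 1000]
print(fraction_at_most(bucket, 20), fraction_at_most(uniform, 20), fraction_at_most(quant, 20))
# 0.95 0.2 0.8
```

Under the uniform spread assumption, only one of the five values falls at or below 20, although 95% of the bucket does. The quantile-based values place four of the five there.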

Details

Creators: Wissem Labbadi (BESTMOD Lab, ISG of Tunis); Jalel Akaichi (BESTMOD Lab, ISG of Tunis)
Contributors: Y A Ameur; L Bellatreche; G A Papadopoulos
Publisher: Springer Nature
Number of pages: 12
Identifiers: 9932631008331
Academic Unit: University of Bisha
Language: English