Computationally efficient univariate filtering for massive data

Tsagris Michail; Alenazi Abdulaziz; Fafalios Stefanos

DOI: https://doi.org/10.1285/i20705948v13n2p390

Journal article

Peer reviewed

Electronic journal of applied statistical analysis, Vol.13(2), pp.390-412

2020

Subjects: Mathematics; Physical Sciences; Science & Technology; Statistics & Probability

Abstract

The wide availability of massive (or large-scale) and big data has increased the computational cost of data analysis. One example is univariate filtering, which typically involves fitting many univariate regression models and is an essential step in numerous variable selection algorithms for reducing the number of predictor variables. This paper shows how to dramatically reduce that cost by employing the score test or the simple Pearson correlation. Extensive Monte Carlo simulation studies demonstrate their advantages and disadvantages compared to the log-likelihood ratio test, and examples with real data illustrate the performance of the score test and the log-likelihood ratio test under realistic scenarios. Depending on the regression model used, the score test is 30 to 6,000 times faster than the log-likelihood ratio test and produces nearly the same results. Hence, this paper strongly recommends substituting the score test for the log-likelihood ratio test in univariate filtering when coping with massive data, big data, or even data whose sample size is of the order of a few tens of thousands or higher.
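To make the idea concrete, here is a minimal Python sketch (assuming NumPy and SciPy; this is not the authors' code) of univariate filtering for a binary response under logistic regression. The helper names `score_test_logistic` and `lrt_logistic` are hypothetical. Because the null model contains only an intercept, its fitted probability is simply the sample mean of y, so the score statistic for every predictor needs only a centred cross-product and no model fitting. For logistic regression this statistic works out to n times the squared Pearson correlation between x_j and y, which links the two fast alternatives mentioned in the abstract. The log-likelihood ratio test, by contrast, requires an iterative fit for each predictor.

```python
import numpy as np
from scipy import stats


def score_test_logistic(X, y):
    """Score test of H0: beta_j = 0 in logit P(y=1) = a + beta_j * x_j, for every column j.

    Under H0 the fitted probability is mean(y), so no model needs to be fitted.
    """
    ybar = y.mean()
    Xc = X - X.mean(axis=0)                              # centre each predictor
    u = Xc.T @ (y - ybar)                                # score for beta_j at the null fit
    info = ybar * (1.0 - ybar) * (Xc ** 2).sum(axis=0)   # null information for beta_j, adjusted for the intercept
    stat = u ** 2 / info
    return stat, stats.chi2.sf(stat, df=1)


def lrt_logistic(X, y, max_iter=50, tol=1e-10):
    """Log-likelihood ratio test of the same hypotheses: one Newton-Raphson fit per column."""
    n, p = X.shape
    ybar = y.mean()
    ll0 = n * (ybar * np.log(ybar) + (1.0 - ybar) * np.log(1.0 - ybar))  # intercept-only log-likelihood
    stat = np.empty(p)
    for j in range(p):
        Z = np.column_stack([np.ones(n), X[:, j]])
        beta = np.zeros(2)
        for _ in range(max_iter):
            mu = 1.0 / (1.0 + np.exp(-(Z @ beta)))
            grad = Z.T @ (y - mu)                                # score vector
            info = Z.T @ (Z * (mu * (1.0 - mu))[:, None])        # Fisher information (negative Hessian)
            step = np.linalg.solve(info, grad)
            beta += step
            if np.max(np.abs(step)) < tol:
                break
        eta = Z @ beta
        ll1 = np.sum(y * eta - np.logaddexp(0.0, eta))           # numerically stable log(1 + exp(eta))
        stat[j] = 2.0 * (ll1 - ll0)
    return stat, stats.chi2.sf(stat, df=1)


# Simulated example: only the first two of 50 predictors are related to y.
rng = np.random.default_rng(0)
n, p = 5000, 50
X = rng.standard_normal((n, p))
prob = 1.0 / (1.0 + np.exp(-(0.5 * X[:, 0] - 0.3 * X[:, 1])))
y = rng.binomial(1, prob).astype(float)

s_score, p_score = score_test_logistic(X, y)
s_lrt, p_lrt = lrt_logistic(X, y)

# The score statistic equals n times the squared Pearson correlation of x_j and y.
r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(p)])
print(np.allclose(s_score, n * r ** 2))  # True
selected = np.flatnonzero(p_score < 0.05)  # predictors kept by the filter
print(selected)
```

The score version handles all columns with a few matrix products, whereas the likelihood ratio loop refits a model per column; that difference is the source of the savings. The speed-ups quoted in the abstract come from the authors' own experiments and are not reproduced by this toy example.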

Details

Title: Computationally efficient univariate filtering for massive data
Creators: Tsagris Michail - University of Crete
Alenazi Abdulaziz - Northern Border University
Fafalios Stefanos - University of Crete
Publication Details: Electronic journal of applied statistical analysis, Vol.13(2), pp.390-412
Publisher: Univ Studi Salento
Number of pages: 23
Identifiers: 9919270608331
Academic Unit: Northern Border University
Language: English
Resource Type: Journal article