Distributed heterogeneous ensemble learning on Apache Spark for ligand-based virtual screening

Karima Sid; Mohamed Batouche

doi:10.1504/IJDMMM.2021.112920

Journal article

Peer reviewed

International journal of data mining, modelling and management, Vol.13(1-2), pp.160-191

01/01/2021

Subjects: Computer Science; Computer Science, Artificial Intelligence; Science & Technology; Technology

Abstract

Virtual screening is one of the most common computer-aided drug design techniques; it applies computational tools and methods to large libraries of molecules to identify drug candidates. Ensemble learning, a paradigm that combines multiple models, was introduced to improve machine learning results in terms of predictive performance and robustness, and it has been successfully applied in ligand-based virtual screening (LBVS) approaches. Applying ensemble learning to huge molecular libraries is computationally expensive, so distributing and parallelising the task with sophisticated frameworks such as Apache Spark has become a significant step. In this paper, we propose HEnsL_DLBVS, a new heterogeneous ensemble learning approach distributed on Spark, to improve large-scale LBVS results. To handle imbalanced big training datasets, we propose a novel hybrid technique. We generate new training datasets to evaluate the approach. Experimental results confirm the effectiveness of our approach, which achieves satisfactory accuracy and outperforms homogeneous models.
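The abstract does not specify which base learners, voting rule, or resampling scheme HEnsL_DLBVS uses, so the following is only a minimal, single-machine sketch of the two generic ideas it names: a heterogeneous ensemble (different learning algorithms trained on the same data and combined by majority vote) and a hybrid rebalancing step (undersampling the majority class while oversampling the minority class). All function names, the three toy learners, and the "midpoint" target size are hypothetical illustrations, not the authors' method, and Apache Spark is not used here; in a distributed setting one could instead train each base learner with a distributed estimator.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Vector = Tuple[float, ...]
Model = Callable[[Vector], int]


def hybrid_resample(X: Sequence[Vector], y: Sequence[int], seed: int = 0):
    """Hypothetical hybrid rebalancing for a binary task (NOT the paper's
    technique): undersample the majority class and oversample the minority
    class with replacement, so both end up at the midpoint size."""
    rng = random.Random(seed)
    (maj, n_maj), (mino, n_min) = Counter(y).most_common(2)  # assumes 2 classes
    target = (n_maj + n_min) // 2
    maj_idx = [i for i, label in enumerate(y) if label == maj]
    min_idx = [i for i, label in enumerate(y) if label == mino]
    keep = (rng.sample(maj_idx, target)                # random undersampling
            + min_idx                                  # keep every minority sample
            + rng.choices(min_idx, k=target - n_min))  # random oversampling
    rng.shuffle(keep)
    return [X[i] for i in keep], [y[i] for i in keep]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((p - q) ** 2 for p, q in zip(a, b))


def fit_nearest_centroid(X: List[Vector], y: List[int]) -> Model:
    """Toy learner 1: predict the class whose mean vector is closest."""
    centroids = {}
    for label in set(y):
        pts = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = tuple(sum(col) / len(pts) for col in zip(*pts))
    return lambda x: min(centroids, key=lambda lab: sq_dist(x, centroids[lab]))


def fit_one_nn(X: List[Vector], y: List[int]) -> Model:
    """Toy learner 2: 1-nearest-neighbour classifier."""
    data = list(zip(X, y))
    return lambda x: min(data, key=lambda d: sq_dist(x, d[0]))[1]


def fit_stump(X: List[Vector], y: List[int]) -> Model:
    """Toy learner 3: decision stump for labels {0, 1}, chosen by the
    (feature, threshold, polarity) with the fewest training errors."""
    best = (len(y) + 1, 0, 0.0, 1)  # (errors, feature, threshold, label_above)
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            for above in (0, 1):
                errs = sum((above if x[f] > t else 1 - above) != lab
                           for x, lab in zip(X, y))
                if errs < best[0]:
                    best = (errs, f, t, above)
    _, f, t, above = best
    return lambda x: above if x[f] > t else 1 - above


def fit_heterogeneous_ensemble(X, y, learners) -> Model:
    """Train different algorithms on the same data; combine by majority vote."""
    models = [fit(X, y) for fit in learners]

    def predict(x: Vector) -> int:
        return Counter(m(x) for m in models).most_common(1)[0][0]

    return predict


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy imbalanced data: 90 "inactive" (0) vs 10 "active" (1) 2-D descriptors.
    X = ([(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(90)]
         + [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(10)])
    y = [0] * 90 + [1] * 10
    Xb, yb = hybrid_resample(X, y)
    print(Counter(yb))  # both classes now have 50 samples
    model = fit_heterogeneous_ensemble(
        Xb, yb, [fit_nearest_centroid, fit_one_nn, fit_stump])
    print(model((3.0, 3.0)), model((0.0, 0.0)))
```

With three base learners and binary labels the vote cannot tie. The usual motivation for mixing algorithm types, rather than training many copies of one, is that different algorithms tend to make different errors, which majority voting can partly cancel out.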

Details

Title: Distributed heterogeneous ensemble learning on Apache Spark for ligand-based virtual screening
Authors: Karima Sid - Constantine 2 University Abdelhamid Mehri, Department of Computer Science, Constantine, Algeria
Mohamed Batouche - Princess Nourah bint Abdulrahman University, Department of Information Technology, CCIS RC, Riyadh, Saudi Arabia
Publication Details: International journal of data mining, modelling and management, Vol.13(1-2), pp.160-191
Publisher: Inderscience Enterprises Ltd
Number of pages: 32
Identifiers: 9927552508331
Academic Unit: Princess Nourah bint Abdulrahman University
Language: English
Resource Type: Journal article