Enhancing techniques for learning decision trees from imbalanced data

Ikram Chaabane; Radhouane Guermazi; Mohamed Hammami

doi:10.1007/s11634-019-00354-x

Journal article

Peer reviewed

Advances in data analysis and classification, Vol.14(3), pp.677-745

01/09/2020

Subjects: Mathematics; Physical Sciences; Science & Technology; Statistics & Probability

Abstract

Several machine learning techniques assume that the classes under consideration contain roughly the same number of objects. In real-world applications, however, the class of interest is generally scarce. With imbalanced data, most standard learning algorithms can reach high overall accuracy, yet accuracy on the minority class remains a real challenge. To deal with this issue, we introduce in this paper a novel adaptation of the decision tree algorithm to imbalanced data. A new asymmetric entropy measure is proposed: it adjusts the most uncertain class distribution to the a priori class distribution and uses it in the node-splitting process. Unlike most competing split criteria, which include only the maximum-uncertainty vector in their formula, the proposed entropy is customizable, with an adjustable concavity to better match the system's expectations. Experimental results on thirty-five data sets with different degrees of class imbalance show significant improvements over various split criteria adapted to imbalanced situations. Furthermore, when combined with sampling strategies and ensemble-based methods, the proposed entropy yields significant improvements in minority class prediction and copes well with the data difficulties associated with class imbalance.
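The idea of placing maximum uncertainty at the a priori class distribution can be illustrated with a small sketch. This record does not give the paper's own formula, so the code below is not the proposed measure: it uses the earlier asymmetric entropy of Marcellin, Zighed and Ritschard for two classes, h_w(p) = p(1 - p) / ((1 - 2w)p + w^2), which peaks at p = w instead of at p = 0.5. The function names, the (minority count, total count) node representation, and the choice of w as the minority-class prior are assumptions made for illustration, and the adjustable concavity mentioned in the abstract is not modeled.

```python
def asymmetric_entropy(p, w):
    """Two-class asymmetric entropy (Marcellin, Zighed and Ritschard form).

    p: proportion of the minority class in a node, in [0, 1].
    w: proportion at which uncertainty is maximal, in (0, 1).
       Assumption: w is set to the a priori minority-class proportion.
    The value is 0 for pure nodes (p = 0 or p = 1) and 1 at p = w.
    """
    if not 0.0 < w < 1.0:
        raise ValueError("w must lie strictly between 0 and 1")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return p * (1.0 - p) / ((1.0 - 2.0 * w) * p + w * w)


def node_entropy(n_minority, n_total, w):
    """Entropy of a node given its minority count and total count."""
    if n_total == 0:
        return 0.0  # an empty child contributes nothing to the split
    return asymmetric_entropy(n_minority / n_total, w)


def split_gain(left, right, w):
    """Entropy reduction of a binary split.

    left, right: hypothetical (minority count, total count) pairs
    describing the two children; the parent is their union.
    """
    (left_min, left_n), (right_min, right_n) = left, right
    n = left_n + right_n
    parent = node_entropy(left_min + right_min, n, w)
    children = (left_n / n) * node_entropy(left_min, left_n, w) \
        + (right_n / n) * node_entropy(right_min, right_n, w)
    return parent - children


if __name__ == "__main__":
    prior = 0.1  # assumed minority-class share of the training data
    for p in (0.0, 0.1, 0.5, 1.0):
        print(p, asymmetric_entropy(p, prior))
    # A split that isolates all 10 minority objects out of 100.
    print(split_gain((10, 10), (0, 90), prior))
```

With w = 0.5 the function reduces to a symmetric measure that peaks at a balanced node. With a small w, a node whose class mix merely reproduces the prior is treated as maximally uncertain, so a split that concentrates minority objects in one child is rewarded.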

Details

Title: Enhancing techniques for learning decision trees from imbalanced data
Creators: Ikram Chaabane (University of Sfax); Radhouane Guermazi (Saudi Electronic University); Mohamed Hammami (University of Sfax)
Publication Details: Advances in data analysis and classification, Vol.14(3), pp.677-745
Publisher: Springer Nature
Number of pages: 69
Identifiers: 999928308331
Academic Unit: Saudi Electronic University
Language: English
Resource Type: Journal article