Detection of Undeserved Sick Leaves in Hospitals using Machine Learning Techniques

Samiha Brahimi; Mariam El Hussein; Abdullah Al-Reedy

doi:10.1016/j.suscom.2022.100665

Journal article

Sustainable Computing: Informatics and Systems, Vol. 35, p. 100665

09/2022

DOI: https://doi.org/10.1016/j.suscom.2022.100665

Keywords: Classification; Data imbalance; Fraud detection; Hospital data; Machine learning; Random under-sampling; Undeserved short sick-leaves

Abstract

• Issuing undeserved sick leaves is a serious ethical problem in hospitals and a real burden on national economies.
• Naïve Bayes (NB), Logistic Regression (LR) and K-Nearest Neighbor (K-NN) are used to detect undeserved short sick leaves from hospital data.
• Random under-sampling is applied to improve the performance of the three classifiers under data imbalance.
• Logistic Regression outperformed the other classifiers before under-sampling, whereas Naïve Bayes performed better after under-sampling.
• The recommended under-sampling ratio is 34:66, i.e., 34 % of the records are deserved sick leaves and 66 % are undeserved sick leaves.

Artificial intelligence and machine learning now play an important role in improving medical services. One service that would benefit from such techniques is the granting of sick leaves, because the service is observably abused. Employees and students can obtain undeserved sick leaves from hospitals either by feigning illness or by exploiting connections with medical staff or physicians. This paper investigates the detection of undeserved short sick leaves under data imbalance. It uses a highly skewed real dataset in which 93 % of the records are deserved sick leaves and only 7 % are undeserved sick leaves. Three classification techniques, Naïve Bayes (NB), Logistic Regression (LR) and K-Nearest Neighbor (K-NN), are built, tested, and compared. Random under-sampling is used to remedy the data imbalance: four versions of the dataset with different class ratios (deserved vs. undeserved) are created. Each classifier is evaluated on each sampled version using a set of measures such as accuracy, specificity, and Area Under the Curve (AUC). On the original data, the LR classifier performs best (accuracy = 97 %, specificity = 76 % and AUC = 87 %), followed by NB and then K-NN. On the sampled data, however, NB outperforms both LR and K-NN, with accuracy up to 90 %, specificity up to 94 % and AUC up to 88 %. The best sampling ratio is found to be 34 % deserved sick leaves to 66 % undeserved sick leaves.
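
The random under-sampling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `random_under_sample`, its parameters, and the label strings are hypothetical. It assumes every minority-class (undeserved) record is kept and that majority-class (deserved) records are dropped at random until they reach the requested share, for example 0.34 for the 34:66 ratio.

```python
import random


def random_under_sample(records, labels, majority_label, majority_share, seed=0):
    """Randomly drop majority-class records until they form `majority_share` of the data.

    Hypothetical helper for illustration. All minority-class records are kept;
    `majority_share` is a fraction strictly between 0 and 1.
    """
    if not 0 < majority_share < 1:
        raise ValueError("majority_share must be strictly between 0 and 1")

    majority = [i for i, y in enumerate(labels) if y == majority_label]
    minority = [i for i, y in enumerate(labels) if y != majority_label]

    # Solve n_major / (n_major + n_minor) = majority_share for n_major.
    target = round(len(minority) * majority_share / (1 - majority_share))
    # Under-sampling can only remove records, so never exceed what is available.
    target = min(target, len(majority))

    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    kept = sorted(rng.sample(majority, target) + minority)
    return [records[i] for i in kept], [labels[i] for i in kept]


# Synthetic stand-in for the skewed dataset: 93 % deserved, 7 % undeserved.
labels = ["deserved"] * 930 + ["undeserved"] * 70
records = list(range(len(labels)))  # placeholder feature rows

sampled_records, sampled_labels = random_under_sample(
    records, labels, majority_label="deserved", majority_share=0.34
)
deserved_share = sampled_labels.count("deserved") / len(sampled_labels)
print(f"{len(sampled_labels)} records kept, deserved share = {deserved_share:.2f}")
```

With 70 undeserved records, a 34 % deserved share corresponds to about 36 deserved records, so most of the majority class is discarded. That loss of data is the usual trade-off of under-sampling against a less skewed training set.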

Details

Title: Detection of Undeserved Sick Leaves in Hospitals using Machine Learning Techniques
Creators - without role: Samiha Brahimi - Imam Abdulrahman Bin Faisal University
Mariam El Hussein - Imam Abdulrahman Bin Faisal University
Abdullah Al-Reedy - Imam Abdulrahman Bin Faisal University
Publication Details: Sustainable Computing: Informatics and Systems, Vol. 35, p. 100665
Publisher: Elsevier Inc
Identifiers: 9915081208331
Academic Unit: Imam Abdulrahman Bin Faisal University
Language: English
Resource Type: Journal article