Data Sampling Approaches with Severely Imbalanced Big Data for Medicare Fraud Detection

Richard A. Bauder; Taghi M. Khoshgoftaar; Tawfiq Hasanin; IEEE

doi:10.1109/ICTAI.2018.00030

Conference proceeding

2018 IEEE 30th International Conference on Tools with Artificial Intelligence (ICTAI), Vol.2018-, pp.137-142

Proceedings - International Conference on Tools with Artificial Intelligence

01/01/2018

Subjects: Computer Science; Computer Science, Artificial Intelligence; Science & Technology; Technology

Abstract

Class imbalance is an important problem in machine learning. With increases in available information and the growing use of Big Data sources to extract meaning from data, the challenges associated with class imbalance continue to influence research and shape business value. In this paper, we focus on using highly imbalanced Big Data from Medicare to detect provider claims fraud. We combine three Medicare parts and generate fraud labels using real-world excluded providers. The number of known fraudulent providers is very small, with only 0.062% of the combined dataset labeled as fraud, indicating severe class imbalance. To address class imbalance concerns, we provide experimental results incorporating six different data sampling methods (undersampling and oversampling) to create datasets for five class ratios (imbalanced to balanced), as well as using the full dataset (with no sampling). Three state-of-the-art machine learning models, built with Apache Spark, are used to assess Medicare fraud detection performance across data sampling methods and class ratios. We demonstrate that data sampling, in particular random undersampling, produces good results across all learners, whereas oversampling provides no benefit over models built using the full dataset.
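The abstract contrasts undersampling and oversampling at several target class ratios. The following is a minimal Python sketch of the two simplest variants, random undersampling (RUS) and random oversampling (ROS), assuming binary labels where 1 = fraud and 0 = non-fraud. It is not the authors' Apache Spark implementation, and the abstract does not list the six methods or the five ratios; the function names `random_undersample` and `random_oversample`, the `minority_pct` parameter, and the synthetic 62-in-100,000 label counts (chosen only to mirror the reported 0.062% fraud rate) are all hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def _split_by_class(labels: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Return (minority_indices, majority_indices), assuming 1 = fraud, 0 = non-fraud."""
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    return minority, majority


def _check_pct(minority_pct: float) -> None:
    if not 0 < minority_pct < 100:
        raise ValueError("minority_pct must be strictly between 0 and 100")


def random_undersample(labels: Sequence[int], minority_pct: float, seed: int = 42) -> List[int]:
    """RUS: keep every minority row plus a random subset of majority rows so that
    the minority class forms `minority_pct` percent of the returned indices."""
    _check_pct(minority_pct)
    minority, majority = _split_by_class(labels)
    # Solve len(minority) / (len(minority) + n_major) == minority_pct / 100 for n_major
    n_major = round(len(minority) * (100 - minority_pct) / minority_pct)
    n_major = min(n_major, len(majority))  # cannot keep more majority rows than exist
    rng = random.Random(seed)
    kept = rng.sample(majority, n_major)  # sampling without replacement
    return sorted(minority + kept)


def random_oversample(labels: Sequence[int], minority_pct: float, seed: int = 42) -> List[int]:
    """ROS: keep every row and add duplicated minority rows, drawn with replacement,
    until the minority class forms `minority_pct` percent of the returned indices."""
    _check_pct(minority_pct)
    minority, majority = _split_by_class(labels)
    if not minority:
        raise ValueError("no minority rows to duplicate")
    # Solve n_minor / (n_minor + len(majority)) == minority_pct / 100 for n_minor
    n_minor = round(len(majority) * minority_pct / (100 - minority_pct))
    n_extra = max(0, n_minor - len(minority))
    rng = random.Random(seed)
    extra = rng.choices(minority, k=n_extra)  # sampling with replacement
    return sorted(majority + minority + extra)


if __name__ == "__main__":
    # Hypothetical synthetic labels: 62 fraud rows out of 100,000 (0.062%)
    labels = [1] * 62 + [0] * (100_000 - 62)
    for pct in (50.0, 10.0):
        idx = random_undersample(labels, pct)
        print(f"RUS {pct}%: {len(idx)} rows, {sum(labels[i] for i in idx)} fraud")
        # prints "RUS 50.0%: 124 rows, 62 fraud", then "RUS 10.0%: 620 rows, 62 fraud"
    idx = random_oversample(labels, 50.0)
    print(f"ROS 50.0%: {len(idx)} rows, {sum(labels[i] for i in idx)} fraud")
    # prints "ROS 50.0%: 199876 rows, 99938 fraud"
```

With so few fraud rows, RUS keeps every one of them and shrinks the training set to a few hundred rows, while ROS at a 50:50 ratio roughly doubles the data by duplicating the same 62 fraud rows. On a Spark DataFrame, a comparable undersampling step could be approximated with `DataFrame.sampleBy` and per-class fractions, although that method samples each class probabilistically rather than returning an exact count.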


Details

Title: Data Sampling Approaches with Severely Imbalanced Big Data for Medicare Fraud Detection
Creators: Richard A. Bauder - Florida Atlantic University
Taghi M. Khoshgoftaar - Florida Atlantic University
Tawfiq Hasanin - Florida Atlantic University
IEEE
Publication Details: 2018 IEEE 30th International Conference on Tools with Artificial Intelligence (ICTAI), Vol.2018-, pp.137-142
Series: Proceedings - International Conference on Tools with Artificial Intelligence
Publisher: IEEE
Number of pages: 6
Grant note: CNS-1427536 / NSF; National Science Foundation (NSF)
Identifiers: 9934499908331
Academic Unit: King Abdulaziz University
Language: English
Resource Type: Conference proceeding