The effects of data imbalance on fraud detection model accuracy

Rusma Anieza Ruslan; Nureize Arbaiy; Pei-Chun Lin

doi:10.11591/ijai.v15.i2.pp1402-1408

Abstract

Machine learning (ML) model performance is often assessed by accuracy, but the quality and balance of the data also play crucial roles. Imbalanced datasets, in which the minority class has far fewer samples than the majority class, can lead to biased predictions that favor the majority class. This study addresses class imbalance through resampling techniques, namely random undersampling (RUS) and random oversampling (ROS), applied to a fraud detection dataset. We classify the resampled datasets using random forest (RF) and gradient boosting (GB) models. Our findings indicate that the RF model combined with ROS achieves an accuracy of 97.4%, surpassing the 96.1% accuracy of the GB model with RUS. These results demonstrate the importance of addressing class imbalance to improve prediction accuracy in ML.
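
To make the two resampling strategies named above concrete, the following is a minimal sketch of RUS and ROS in plain Python. It is not the authors' implementation: the function names `random_undersample` and `random_oversample`, the toy 95/5 class split, and the fixed seed are all assumptions chosen for illustration, and the classification step with RF or GB is not shown.

```python
import random
from collections import Counter


def _indices_by_class(y):
    """Group sample indices by their class label."""
    groups = {}
    for idx, label in enumerate(y):
        groups.setdefault(label, []).append(idx)
    return groups


def random_undersample(X, y, seed=0):
    """RUS: randomly drop samples until every class matches the smallest class."""
    rng = random.Random(seed)
    groups = _indices_by_class(y)
    target = min(len(idxs) for idxs in groups.values())
    keep = []
    for idxs in groups.values():
        keep.extend(rng.sample(idxs, target))  # sampling without replacement
    keep.sort()
    return [X[i] for i in keep], [y[i] for i in keep]


def random_oversample(X, y, seed=0):
    """ROS: randomly duplicate samples until every class matches the largest class."""
    rng = random.Random(seed)
    groups = _indices_by_class(y)
    target = max(len(idxs) for idxs in groups.values())
    keep = []
    for idxs in groups.values():
        keep.extend(idxs)  # keep every original sample
        keep.extend(rng.choices(idxs, k=target - len(idxs)))  # duplicates, with replacement
    return [X[i] for i in keep], [y[i] for i in keep]


# Hypothetical toy data: 95 legitimate (0) and 5 fraudulent (1) transactions.
X = [[i] for i in range(100)]
y = [0] * 95 + [1] * 5

X_rus, y_rus = random_undersample(X, y)
X_ros, y_ros = random_oversample(X, y)

print("original:", Counter(y))
print("after RUS:", Counter(y_rus))
print("after ROS:", Counter(y_ros))
```

RUS discards majority-class samples, so it shrinks the dataset, while ROS keeps every original sample and adds duplicates of the minority class. In practice, resampling is usually applied to the training split only, so that the evaluation data keeps its original class distribution.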

Keywords

Data augmentation; Imbalanced dataset; Machine learning; Resampling; SMOTE

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

IAES International Journal of Artificial Intelligence (IJ-AI)
ISSN/e-ISSN 2089-4872/2252-8938
This journal is published by the Institute of Advanced Engineering and Science (IAES).