The effects of data imbalance on fraud detection model accuracy

Rusma Anieza Ruslan, Nureize Arbaiy, Pei-Chun Lin

Abstract


Machine learning (ML) model performance is often assessed by accuracy, but the quality and balance of the data also play crucial roles. Imbalanced datasets, in which the minority class has far fewer samples than the majority class, can bias predictions toward the majority class. This study addresses class imbalance in a fraud detection dataset using resampling techniques, including random undersampling (RUS) and random oversampling (ROS). We classify the resampled datasets using random forest (RF) and gradient boosting (GB) models. Our findings indicate that the RF model combined with ROS achieves an accuracy of 97.4%, surpassing the 96.1% accuracy of the GB model with RUS. These results demonstrate the importance of addressing class imbalance to improve prediction accuracy in ML.
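The abstract names two resampling techniques, random oversampling (ROS) and random undersampling (RUS). The paper's own code is not shown here, so what follows is only a minimal, standard-library sketch of how the two techniques generally work. The function names `random_oversample` and `random_undersample`, the fixed seed, and the toy data are all hypothetical illustrations. ROS duplicates randomly chosen samples of the smaller class until the class counts match. RUS randomly discards samples of the larger class until the counts match. Libraries such as imbalanced-learn provide ready-made equivalents (`RandomOverSampler`, `RandomUnderSampler`).

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Hypothetical ROS helper: duplicate randomly chosen samples of each
    smaller class until every class matches the largest class size."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):  # add copies drawn with replacement
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Hypothetical RUS helper: keep a random subset of each class whose
    size equals the smallest class size, discarding the rest."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        pool = [x for x, lab in zip(X, y) if lab == label]
        for x in rng.sample(pool, target):  # sample without replacement
            X_out.append(x)
            y_out.append(label)
    return X_out, y_out


# Toy data (assumed): label 1 = fraud (minority), 0 = legitimate (majority).
X = [[float(i)] for i in range(10)]
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

X_ros, y_ros = random_oversample(X, y)
X_rus, y_rus = random_undersample(X, y)
print(sorted(Counter(y_ros).items()))  # [(0, 8), (1, 8)]
print(sorted(Counter(y_rus).items()))  # [(0, 2), (1, 2)]
```

ROS keeps every original sample but repeats minority rows, while RUS discards majority rows. This trade-off is one reason the two techniques can pair differently with RF and GB classifiers. In common practice, resampling is applied only to the training split so that the test set keeps the original class distribution. The abstract does not state how the study split its data.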

Keywords


Data augmentation; Imbalanced dataset; Machine learning; Resampling; SMOTE



DOI: http://doi.org/10.11591/ijai.v15.i2.pp1402-1408



Copyright (c) 2026 Rusma Anieza Ruslan, Nureize Arbaiy, Pei-Chun Lin

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

IAES International Journal of Artificial Intelligence (IJ-AI)
ISSN/e-ISSN 2089-4872/2252-8938 
This journal is published by the Institute of Advanced Engineering and Science (IAES).
