The effects of data imbalance on fraud detection model accuracy
Abstract
Machine learning (ML) model performance is often assessed by accuracy, but the quality and balance of the training data also play crucial roles. In an imbalanced dataset, one class (the minority class) has far fewer samples than the other (the majority class), which can bias a model's predictions toward the majority class. This study addresses class imbalance in a fraud detection dataset using two resampling techniques: random undersampling (RUS) and random oversampling (ROS). We classify the resampled datasets using random forest (RF) and gradient boosting (GB) models. Our findings indicate that the RF model combined with ROS achieves an accuracy of 97.4%, surpassing the 96.1% accuracy of the GB model combined with RUS. These results demonstrate the importance of addressing class imbalance to improve prediction accuracy in ML.
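The two resampling techniques named in the abstract can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation. The function names, the toy data (95 legitimate vs 5 fraudulent transactions) and the fixed seed are hypothetical choices for illustration. RUS randomly discards majority-class samples until the classes are equal in size, while ROS randomly duplicates minority-class samples. The sketch also shows why plain accuracy can mislead on imbalanced data: a model that always predicts "legitimate" already scores 95% on the toy set.

```python
import random
from collections import Counter, defaultdict


def _group_by_class(X, y):
    """Collect the feature rows of each class label into a dict."""
    groups = defaultdict(list)
    for xi, yi in zip(X, y):
        groups[yi].append(xi)
    return groups


def random_undersample(X, y, seed=0):
    """Random undersampling (RUS): keep a random subset (without
    replacement) of every class, sized to match the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_class(X, y)
    n_min = min(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        for xi in rng.sample(rows, n_min):
            X_out.append(xi)
            y_out.append(label)
    return X_out, y_out


def random_oversample(X, y, seed=0):
    """Random oversampling (ROS): duplicate randomly chosen rows (with
    replacement) of every smaller class until it matches the largest class."""
    rng = random.Random(seed)
    groups = _group_by_class(X, y)
    n_max = max(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        extra = [rng.choice(rows) for _ in range(n_max - len(rows))]
        for xi in rows + extra:
            X_out.append(xi)
            y_out.append(label)
    return X_out, y_out


# Hypothetical toy data: label 0 = legitimate, label 1 = fraud.
X = [[float(i)] for i in range(100)]
y = [0] * 95 + [1] * 5

# Accuracy of a trivial model that always predicts the majority class.
print(sum(1 for label in y if label == 0) / len(y))  # 0.95

print(Counter(y))  # Counter({0: 95, 1: 5})
_, y_rus = random_undersample(X, y)
print(Counter(y_rus))  # Counter({0: 5, 1: 5})
_, y_ros = random_oversample(X, y)
print(Counter(y_ros))  # Counter({0: 95, 1: 95})
```

In practice, libraries such as imbalanced-learn provide equivalent `RandomUnderSampler` and `RandomOverSampler` classes. Resampling is usually applied only to the training split, so that evaluation reflects the original class distribution. RUS discards information from the majority class, whereas ROS keeps all data but can encourage overfitting to the repeated minority samples.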
Keywords
Data augmentation; Imbalanced dataset; Machine learning; Resampling; SMOTE
DOI: http://doi.org/10.11591/ijai.v15.i2.pp1402-1408
Copyright (c) 2026 Rusma Anieza Ruslan, Nureize Arbaiy, Pei-Chun Lin

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IAES International Journal of Artificial Intelligence (IJ-AI)
ISSN/e-ISSN 2089-4872/2252-8938
This journal is published by the Institute of Advanced Engineering and Science (IAES).