Enhancing software fault prediction through data balancing techniques and machine learning
Abstract
Software fault prediction is essential for ensuring the reliability and quality of software systems by identifying potential defects early in the development lifecycle. However, imbalanced datasets pose a significant challenge to the effectiveness of fault prediction models. In this paper, we investigate the impact of different data balancing techniques, including generative adversarial networks (GANs), synthetic minority over-sampling technique (SMOTE), and NearMiss, on machine learning (ML) model performance for software fault prediction. Through a comparative analysis across multiple datasets commonly used in software engineering research, we evaluate the efficacy of these techniques in addressing class imbalance and improving predictive accuracy. Our findings provide insights into the most effective approaches for handling imbalanced data in software fault prediction tasks, thereby advancing the state of the art in software engineering research and practice. The study reports and analyzes extensive experiments covering 8 datasets, 4 data balancing techniques, and 4 ML techniques to demonstrate the efficacy of various models in software fault prediction.
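The abstract names SMOTE and NearMiss without describing how they rebalance a dataset, so the following is a minimal illustrative sketch of the two ideas and not the authors' implementation. It assumes a SMOTE-style oversampler that interpolates between a minority point and one of its nearest minority neighbours, and a NearMiss version 1 style undersampler that keeps the majority points closest to the minority class. The function names (`smote_like`, `nearmiss1_like`), the parameter `k`, and the toy data are all hypothetical and chosen only for illustration.

```python
import math
import random


def _dist(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority points by interpolating between a
    minority point and one of its k nearest minority neighbours.
    Assumes the minority class has at least two points."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority neighbours of the chosen point, excluding itself.
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: _dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def nearmiss1_like(majority, minority, n_keep, k=3):
    """Keep the n_keep majority points whose mean distance to their k
    nearest minority points is smallest (the NearMiss version 1 idea)."""
    def score(p):
        nearest = sorted(_dist(p, m) for m in minority)[:k]
        return sum(nearest) / len(nearest)
    return sorted(majority, key=score)[:n_keep]


# Toy, made-up data: two features per module (e.g. size and complexity).
faulty = [[9.0, 8.0], [8.5, 9.5], [10.0, 9.0], [9.5, 10.0]]   # minority class
clean = [[float(i % 5), float(i // 5)] for i in range(20)]    # majority class

# Oversampling: grow the minority class to the size of the majority class.
synthetic_faulty = smote_like(faulty, n_new=len(clean) - len(faulty))
oversampled_faulty = faulty + synthetic_faulty

# Undersampling: shrink the majority class to the size of the minority class.
undersampled_clean = nearmiss1_like(clean, faulty, n_keep=len(faulty))

print(len(oversampled_faulty), len(clean))    # 20 20
print(len(faulty), len(undersampled_clean))   # 4 4
```

In practice, a maintained library implementation would normally be preferred over a hand-written one, and resampling would be applied only to the training split so that evaluation data keeps its original class distribution.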
Keywords
Generative adversarial networks; Imbalanced data; NearMiss; SMOTE; Software fault prediction
DOI: http://doi.org/10.11591/ijai.v14.i6.pp4787-4801
Copyright (c) 2025 Akshat Raj, Durva Mahadeo Chavan, Priyal Agarwal, Jestin Gigi, Madhuri Rao, Vinayak Musale, Akshita Chanchlani, Murtaza Shabbirbhai Dholkawala, Kulamala Vinod Kumar

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IAES International Journal of Artificial Intelligence (IJ-AI)
ISSN/e-ISSN 2089-4872/2252-8938
This journal is published by the Institute of Advanced Engineering and Science (IAES).