Stroke prediction using data balancing method and extreme gradient boosting

Abd Mizwar A. Rahim, Anna Baita, Firman Asharudin, Wahid Miftahul Ashari, Walidy Rahman Hakim, Andriyan Dwi Putra, Supriatin Supriatin, Eko Pramono

Abstract


Stroke is one of the leading causes of death worldwide, creating an urgent need for effective early detection systems, particularly because conventional methods often struggle with class imbalance and produce biased evaluations. Previous studies have primarily focused on accuracy while overlooking model consistency, data pre-processing quality, and probability-based evaluation. This study evaluates model performance under three conditions: original data using extreme gradient boosting (XGBoost) with scale_pos_weight, original data using the easy ensemble classifier, and class-balanced data generated using random oversampling (ROS), adaptive synthetic sampling (ADASYN), and synthetic minority over-sampling technique (SMOTE). Each model underwent missing value handling, normalization, feature preparation, and hyperparameter optimization using grid search. Performance was assessed using area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC), confidence intervals, calibration curves, Shapley additive explanations (SHAP), decision curve analysis (DCA), and external validation. The results demonstrate that data resampling significantly improves performance, with the XGBoost-SMOTE combination achieving the best results, including an accuracy of 0.99, AUROC of 0.998, and AUPRC of 0.986, outperforming the other approaches. This method provides more consistent and balanced predictions, supporting the application of artificial intelligence for early stroke risk identification.
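The two imbalance strategies named in the abstract, class weighting on the original data and resampling to a balanced set, can be illustrated with a minimal standard-library sketch. This is not the authors' pipeline. It assumes binary labels (1 = stroke), uses the common negative-to-positive ratio as the starting value for XGBoost's scale_pos_weight, shows only random oversampling (ROS) rather than SMOTE or ADASYN, and computes AUROC by pairwise comparison. The toy data and all function names are hypothetical.

```python
import random
from collections import Counter


def scale_pos_weight(labels):
    """Negative-to-positive ratio, a common starting value for XGBoost's scale_pos_weight."""
    counts = Counter(labels)
    return counts[0] / counts[1]


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until both classes are the same size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    minority = min(counts, key=counts.get)
    majority = max(counts, key=counts.get)
    minority_rows = [r for r, y in zip(rows, labels) if y == minority]
    n_extra = counts[majority] - counts[minority]
    extra = [rng.choice(minority_rows) for _ in range(n_extra)]
    return rows + extra, labels + [minority] * n_extra


def auroc(labels, scores):
    """AUROC as the chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 8 non-stroke rows and 2 stroke rows, one feature each.
rows = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.7], [0.8], [0.9], [1.0]]
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

weight = scale_pos_weight(labels)                      # class-weighting route
balanced_rows, balanced_labels = random_oversample(rows, labels)  # resampling route
toy_auc = auroc(labels, [r[0] for r in rows])          # feature used directly as a score
```

In practice, resampling is applied to the training split only, so that the test data keep the original class distribution when AUROC and AUPRC are computed.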

Keywords


Data balancing; Data preprocessing; Extreme gradient boosting; Feature selection; Stroke prediction


DOI: http://doi.org/10.11591/ijai.v15.i1.pp655-671


Copyright (c) 2026 Abd Mizwar A. Rahim, Anna Baita, Firman Asharudin, Wahid Miftahul Ashari, Walidy Rahman Hakim, Andriyan Dwi Putra, Supriatin, Eko Pramono

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

IAES International Journal of Artificial Intelligence (IJ-AI)
ISSN/e-ISSN 2089-4872/2252-8938 
This journal is published by the Institute of Advanced Engineering and Science (IAES).
