Proactive cervical cancer risk assessment using data-driven analytics
Abstract
This study introduces a predictive model that integrates clinical and lifestyle data to address the critical public health challenge of cervical cancer, particularly in regions lacking routine screening. The data-driven pipeline begins with comprehensive preprocessing, including exploratory data analysis, missing-value imputation, and feature extraction. Feature selection is carried out using an XGBoost classifier, after which the data are normalized and the classes balanced via oversampling. The optimized feature vector is then used to train a LightGBM model, which is validated through stratified cross-validation. On a retrospective dataset of 858 patients from the Hospital Universitario de Caracas, Venezuela, comprising demographic, lifestyle, and medical history data, the LightGBM model achieves an accuracy of 98%, outperforming similar existing approaches. The results demonstrate the effectiveness of the proposed data modelling framework and feature selection, and support LightGBM as a suitable classifier. The proposed predictive framework can help healthcare professionals prioritize high-risk patients for further evaluation and intervention.
Keywords
Cervical cancer; Lifestyle data; Machine learning; Multifactorial clinical data; Risk prediction
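The abstract mentions class balancing by oversampling together with stratified cross-validation, but it does not say which oversampling technique was used or how the two steps were combined. The following is a minimal sketch under stated assumptions, not the authors' implementation: it uses plain random oversampling (an assumption) and applies it only to the training part of each fold, a common precaution that keeps duplicated samples out of the held-out fold. The function names and toy labels are hypothetical, and the feature selection, normalization, and LightGBM training steps are not shown.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, k, seed=0):
    """Assign each sample index to one of k folds, preserving class ratios."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the indices of each class round-robin so every fold gets a share.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def random_oversample(indices, labels, seed=0):
    """Duplicate minority-class indices at random until all classes match the largest."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx in indices:
        by_class[labels[idx]].append(idx)
    target = max(len(members) for members in by_class.values())
    balanced = []
    for members in by_class.values():
        balanced.extend(members)
        balanced.extend(rng.choices(members, k=target - len(members)))
    return balanced


def balanced_cv_splits(labels, k=5, seed=0):
    """Yield (train_indices, test_indices); only the training part is oversampled."""
    folds = stratified_folds(labels, k, seed)
    for i, test_idx in enumerate(folds):
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield random_oversample(train_idx, labels, seed + i), test_idx


# Toy labels with a made-up 90/10 imbalance (not the study's data).
labels = [0] * 90 + [1] * 10
splits = list(balanced_cv_splits(labels, k=5))
for train_idx, test_idx in splits:
    print(Counter(labels[i] for i in train_idx), Counter(labels[i] for i in test_idx))
```

In a fuller pipeline, the index lists from each split would select the rows passed to the classifier, so that the reported accuracy is always measured on samples that were never duplicated into the training set.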
DOI: http://doi.org/10.11591/ijai.v13.i4.pp4301-4311
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IAES International Journal of Artificial Intelligence (IJ-AI)
ISSN/e-ISSN 2089-4872/2252-8938
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).