Optimizing diabetes prediction: unveiling patient subgroups through clustering
Abstract
Diabetes is a significant global health concern that causes numerous deaths each year, and many affected individuals remain undiagnosed. Uncontrolled diabetes, in which the body cannot regulate blood sugar because of insulin deficiency, can lead to severe complications, so early detection and intervention are central to effective management and improved patient outcomes. As prevalence rises, data-driven strategies are needed to identify the disease earlier and more clearly. This study uses the Pima Indians diabetes dataset (PIDD) to compare three clustering schemes: k-means, fuzzy C-means, and hierarchical clustering. The performance of each algorithm is evaluated using metrics such as the silhouette score and the adjusted Rand index. The goal is to identify the method that produces the most accurate and well-defined clusters for diabetes-related attributes. Such clusters could support earlier diagnosis and intervention, and ultimately better disease management and patient care. By providing a comprehensive comparison of these clustering techniques, this research contributes to efforts to improve diabetes detection.
Keywords
Clustering method; Diabetes; Fuzzy C-means; Hierarchical clustering; K-means; Pima diabetes dataset; Silhouette score
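As an illustration of the kind of evaluation the abstract describes, the following minimal sketch clusters a small synthetic dataset with k-means and scores the result with the silhouette score and the adjusted Rand index. It is not the authors' implementation and reports no results from the paper. The two-group toy data stands in for the PIDD, only k-means is shown (fuzzy C-means and hierarchical clustering would be scored the same way), and the function names `kmeans`, `silhouette_score`, and `adjusted_rand_index` are hypothetical helpers written for this example.

```python
import math
import random
from collections import Counter


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers drawn from the data
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (range -1 to 1)."""
    clusters = set(labels)
    total = 0.0
    for i, p in enumerate(points):
        def mean_dist(c):
            ds = [math.dist(p, q)
                  for j, (q, lab) in enumerate(zip(points, labels))
                  if lab == c and j != i]
            return sum(ds) / len(ds) if ds else 0.0
        if sum(1 for lab in labels if lab == labels[i]) < 2:
            continue  # a singleton cluster contributes 0
        a = mean_dist(labels[i])  # cohesion: mean distance within own cluster
        b = min(mean_dist(c) for c in clusters if c != labels[i])  # separation
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def adjusted_rand_index(true, pred):
    """Chance-corrected agreement between two labelings (1.0 = identical)."""
    def pairs(x):
        return x * (x - 1) / 2
    sum_ij = sum(pairs(v) for v in Counter(zip(true, pred)).values())
    sum_a = sum(pairs(v) for v in Counter(true).values())
    sum_b = sum(pairs(v) for v in Counter(pred).values())
    expected = sum_a * sum_b / pairs(len(true))
    max_index = (sum_a + sum_b) / 2
    return (sum_ij - expected) / (max_index - expected)


# Synthetic stand-in data (assumption): two well-separated 2-D groups.
rng = random.Random(42)
group_a = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(30)]
group_b = [(rng.gauss(5, 0.5), rng.gauss(5, 0.5)) for _ in range(30)]
points = group_a + group_b
truth = [0] * 30 + [1] * 30

labels = kmeans(points, k=2)
print("silhouette:", round(silhouette_score(points, labels), 3))
print("adjusted Rand index:", round(adjusted_rand_index(truth, labels), 3))
```

The silhouette score needs only the data and the cluster labels, whereas the adjusted Rand index also needs reference labels, which for the PIDD would presumably be the recorded diabetes outcome. Comparing several algorithms then amounts to computing both scores for each algorithm's labels on the same data.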
DOI: http://doi.org/10.11591/ijai.v14.i5.pp3681-3692
Copyright (c) 2025 Institute of Advanced Engineering and Science
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IAES International Journal of Artificial Intelligence (IJ-AI)
ISSN/e-ISSN 2089-4872/2252-8938
This journal is published by the Institute of Advanced Engineering and Science (IAES).