Evaluation of sequential feature selection in improving the K-nearest neighbor classifier for diabetes prediction

Rajkumar Govindarajan, Vidhyashree Balaji, Jayanthi Arumugam, Tsehay Admassu Assegie, Radha Mothukuri


The K-nearest neighbor (KNN) classifier uses a distance metric to measure how far a test instance lies from each training sample. With smaller samples, the KNN classifier achieves higher accuracy at low computational cost. On a high-dimensional dataset, however, computing the distance between the test instance and every training sample to determine its class becomes computationally expensive. This research employs sequential feature selection (SFS) to select the optimal features for diabetes prediction while reducing the computational time of the KNN classifier. With nine features, the KNN classifier achieved an accuracy of 84.41%. Its performance improved by 2.6% when trained on the optimal features selected with SFS. The results identified glucose level, blood pressure (BP), skin thickness (ST), diabetes pedigree function (DPF), age, and body mass index (BMI) as the most representative features for diabetes prediction, and the KNN classifier gives higher accuracy with these features. Insulin and the number of pregnancies, however, do not show a significant effect on the KNN classifier.
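The idea described in the abstract can be illustrated with a minimal sketch in Python. It is not the authors' implementation: the abstract does not state which tools, distance metric, value of k, or selection direction were used, so Euclidean distance, k = 3, and greedy forward selection are assumptions here. The function names (`knn_predict`, `accuracy`, `forward_sfs`) and the two-feature toy dataset are hypothetical and are unrelated to the diabetes data in the paper. The sketch uses only the standard library; a real analysis would more likely rely on a library such as scikit-learn.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, features, k=3):
    """Majority vote among the k training samples closest to x.

    Distance is Euclidean (an assumption) and is computed only over the
    feature indices listed in `features`.
    """
    neighbours = sorted(
        (math.dist([row[f] for f in features], [x[f] for f in features]), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_X, train_y, test_X, test_y, features, k=3):
    """Fraction of test samples that KNN labels correctly."""
    hits = sum(
        knn_predict(train_X, train_y, x, features, k) == y
        for x, y in zip(test_X, test_y)
    )
    return hits / len(test_y)


def forward_sfs(train_X, train_y, val_X, val_y, n_select, k=3):
    """Greedy forward selection: at each step, add the single feature that
    yields the highest validation accuracy together with those already chosen."""
    selected = []
    remaining = list(range(len(train_X[0])))
    while len(selected) < n_select:
        best = max(
            remaining,
            key=lambda f: accuracy(train_X, train_y, val_X, val_y, selected + [f], k),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: column 0 separates the classes, column 1 is noise.
TRAIN_X = [[0.0, 5.0], [1.0, 50.0], [2.0, 20.0],
           [10.0, 6.0], [11.0, 51.0], [12.0, 21.0]]
TRAIN_Y = [0, 0, 0, 1, 1, 1]
VAL_X = [[0.5, 50.5], [1.5, 5.5], [10.5, 20.5], [11.5, 50.5]]
VAL_Y = [0, 0, 1, 1]

chosen = forward_sfs(TRAIN_X, TRAIN_Y, VAL_X, VAL_Y, n_select=1)
print("selected feature indices:", chosen)
print("accuracy on selected:", accuracy(TRAIN_X, TRAIN_Y, VAL_X, VAL_Y, chosen))
```

Each prediction computes one distance per training sample over the selected features only, so dropping uninformative features reduces the per-query work as well as removing noise from the distance. In this toy example the selection step keeps the informative column and discards the noisy one.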


Automated diagnosis; Machine learning; Optimized model; Personalized diagnosis; Predictive modeling



DOI: http://doi.org/10.11591/ijai.v13.i2.pp1567-1573



This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

IAES International Journal of Artificial Intelligence (IJ-AI)
ISSN/e-ISSN 2089-4872/2252-8938 
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).
