Handling the imbalanced data with missing value elimination SMOTE in the classification of the relevance education background with graduates employment

Anita Desiani, Sugandi Yahdin, Annisa Kartikasari, Irmeilyana Irmeilyana


The imbalanced data affect the accuracy of models, especially for precision and sensitivity, it makes difficult to find information on minority class. The problem is identified in the tracer study dataset Universitas Sriwijaya that has 2934 data. The label attribute is divided into several label classes, namely not tight, somewhat-tight, tight, very tight, and tightest. The number of the tightest and very tight is 27% and 38.6% of the number majority classes. In the study, the SMOTE is combined with eliminating the missing value of data to handle the imbalanced data. The method was evaluated by the classification methods KNN, ANN, and C4.5. The results of these methods show a significant increase in accuracy as a whole and a significant increase in the precision and sensitivity of minority classes. The precision and sensitivity of both the majority and minority are not too different, although the number of the minority is very less compared to the majority class. the information on minority classes can be obtained with quite high precision and sensitivity. As a conclusion, the proposed method is passably to improve accuracy and greatly affects the increase in sensitivity and precision.


Graduates employment; Imbalanced data; Minority class; Missing value; SMOTE

Full Text:


DOI: http://doi.org/10.11591/ijai.v10.i2.pp346-354


  • There are currently no refbacks.

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

IAES International Journal of Artificial Intelligence (IJ-AI)
ISSN/e-ISSN 2089-4872/2252-8938 
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).

View IJAI Stats