A method for missing values imputation of machine learning datasets
Abstract
In machine learning applications, missing data often has to be handled in the pre-processing phase of the datasets used to train and test models. Class center missing value imputation (CCMVI) is among the best imputation methods in the literature in terms of prediction accuracy and computing cost. Its main drawback is that it is unsuitable for test datasets. It imputes incomplete instances from their class centers, but in real-world classification the classes of test instances must be assumed unknown. This work extends the CCMVI method to handle missing values in test datasets. To this end, we propose three techniques. The first combines CCMVI with other methods from the literature. The second imputes incomplete test instances based on their nearest class center. The third uses the mean of the class centers. A comparison of classification accuracies shows that the second and third techniques achieve accuracy close to that of combining CCMVI with the literature imputation methods, namely the k-nearest neighbors (KNN) and mean methods. They also significantly reduce the time and memory required to impute test datasets.
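The Python sketch below illustrates the class-center idea and the second and third test-time techniques. It is a simplified reading of the abstract, not the authors' implementation, and it rests on three assumptions. A class center is taken to be the attribute-wise mean of a class's observed training values. "Nearest" is taken to mean Euclidean distance computed over a test instance's observed attributes only. The third technique is read as the unweighted mean of the class centers. The published CCMVI method may include further steps that are not reproduced here. All function names are hypothetical. The first technique, which pairs CCMVI with KNN or mean imputation for the test set, is not shown.

```python
import numpy as np


def class_centers(X, y):
    """Hypothetical helper. Assumption: a class center is the attribute-wise
    mean of that class's training rows, ignoring missing (NaN) entries."""
    classes = np.unique(y)
    centers = np.vstack([np.nanmean(X[y == c], axis=0) for c in classes])
    return classes, centers


def impute_train(X, y, classes, centers):
    """Training data (labels known): fill each NaN with its own class center."""
    X = X.copy()
    for c, center in zip(classes, centers):
        rows = y == c
        block = X[rows]                      # boolean indexing returns a copy
        mask = np.isnan(block)
        block[mask] = np.broadcast_to(center, block.shape)[mask]
        X[rows] = block                      # write the filled rows back
    return X


def impute_test_nearest_center(X, centers):
    """Technique 2 as read here: copy missing values from the class center
    closest to the row, using Euclidean distance on observed attributes."""
    X = X.copy()
    for i in range(X.shape[0]):
        miss = np.isnan(X[i])
        if not miss.any():
            continue
        obs = ~miss
        # If every attribute is missing, all distances are 0 and center 0 is used.
        d = np.sqrt(((centers[:, obs] - X[i, obs]) ** 2).sum(axis=1))
        X[i, miss] = centers[np.argmin(d), miss]
    return X


def impute_test_mean_of_centers(X, centers):
    """Technique 3 as read here: fill NaNs with the unweighted mean of all centers."""
    X = X.copy()
    global_center = centers.mean(axis=0)
    mask = np.isnan(X)
    X[mask] = np.broadcast_to(global_center, X.shape)[mask]
    return X


# Toy example with two classes and two attributes (illustrative data only).
X_train = np.array([[1.0, 2.0], [np.nan, 4.0], [3.0, 6.0],
                    [10.0, np.nan], [12.0, 20.0]])
y_train = np.array([0, 0, 0, 1, 1])
classes, centers = class_centers(X_train, y_train)   # centers: [2, 4] and [11, 20]
X_train_imp = impute_train(X_train, y_train, classes, centers)

X_test = np.array([[np.nan, 5.0], [11.5, np.nan]])   # test labels treated as unknown
X_test_nc = impute_test_nearest_center(X_test, centers)
X_test_mc = impute_test_mean_of_centers(X_test, centers)
```

In this toy case the nearest-center variant fills the first test row's missing attribute with 2.0 from the class-0 center and the second row's with 20.0 from the class-1 center. The mean-of-centers variant uses 6.5 and 12.0 instead. Both variants need only the stored centers at test time and never the training rows, which is consistent with the reduced time and memory reported above for these two techniques.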
DOI: http://doi.org/10.11591/ijai.v13.i1.pp888-898
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IAES International Journal of Artificial Intelligence (IJ-AI)
ISSN/e-ISSN 2089-4872/2252-8938
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).