Null-values imputation using different modification random forest algorithm

Maad M. Mijwil, Alaa Wagih Abdulqader, Sura Mazin Ali, Ahmed T. Sadiq

Abstract


Today, the world lives in an era of information and data, so it has become vital to collect and store data in databases that can be processed to extract essential details. Null values appear during this processing; they significantly influence operations such as analysis and prediction and lead to inaccurate outcomes. To address this, the authors modify the random forest technique to impute null values in datasets obtained from the University of California Irvine (UCI) machine learning repository. The datasets used are connectionist bench, phishing websites, breast cancer, ionosphere, and COVID-19. The modified random forest algorithm is built on three modifications and is evaluated with one, two, and three null values per row. First, the samples are chosen using the proposed less redundancy bootstrap. Second, each tree has distinctive features selected by hybrid feature selection. Third, the final result is determined by ranked voting for classification. The modified random forest algorithm achieved better accuracy than the traditional algorithm, as it relied on four parameters, and it imputed null values with sufficient accuracy: accuracy increased by 9.5%, 6.5%, and 5.25% for one, two, and three null values in the same row of the datasets, respectively.
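The abstract names ranked voting as the step that turns per-tree predictions into the imputed value, but it does not define the ranking rule. The sketch below shows one plausible reading, not the authors' method: trees are ranked by a quality score (assumed here to be something like out-of-bag accuracy), each tree's vote is weighted by its rank, and the label with the largest total fills the null. The names `ranked_vote` and `impute_row`, and the score values, are hypothetical.

```python
from collections import defaultdict


def ranked_vote(predictions, scores):
    """Combine per-tree predictions using rank-based weights.

    predictions: one predicted label per tree (hypothetical input).
    scores: one quality score per tree, e.g. out-of-bag accuracy (assumed).
    """
    if len(predictions) != len(scores) or not predictions:
        raise ValueError("need one score per prediction and at least one tree")
    # Order tree indices from worst to best score; rank 1..n is the vote weight.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    totals = defaultdict(float)
    for rank, i in enumerate(order, start=1):
        totals[predictions[i]] += rank
    # Largest total wins; ties go to the smallest label for determinism.
    return max(sorted(totals), key=lambda label: totals[label])


def impute_row(row, null_index, tree_predictions, tree_scores):
    """Return a copy of row with the null at null_index replaced by the vote."""
    filled = list(row)
    filled[null_index] = ranked_vote(tree_predictions, tree_scores)
    return filled


# Three weaker trees predict "b", two stronger trees predict "a".
preds = ["b", "b", "b", "a", "a"]
accs = [0.50, 0.55, 0.60, 0.90, 0.95]
print(ranked_vote(preds, accs))
print(impute_row([5.1, None, "x"], 1, preds, accs))
```

In this example a plain majority vote would pick "b" (three votes to two), while the rank-weighted totals are 6 for "b" and 9 for "a", so the higher-ranked trees decide the imputed value.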


Keywords


COVID-19; Datasets; Decision making; Machine learning; Null values; Random forest; University of California Irvine



DOI: http://doi.org/10.11591/ijai.v12.i1.pp%25p


Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.