Sampling methods in handling imbalanced data for Indonesia health insurance dataset

Felix Indra Kurniadi, Kartika Purwandari, Ajeng Wulandari, Syarifah Diana Permai


Health insurance fraud is one of the most frequently occurring fraudulent acts and has become a concern for every insurer. According to data from the Indonesian General Insurance Association, or Asosiasi Asuransi Umum Indonesia (AAUI), the private insurance industry suffered losses of up to billions of rupiah throughout 2018 due to fraudulent acts committed by perpetrators. A key problem in Indonesia is that the current system is highly vulnerable and fraud detection is still done manually. Another problem is imbalanced data, which often occurs in fraud cases. In this research, we applied sampling methods, using several machine learning algorithms as baselines. The results show that the instance hardness thresholding algorithm combined with extreme gradient boosting gives the best performance in all cases. This indicates that the method can reduce bias and achieve better generalization.
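To illustrate the idea behind hardness-based undersampling, the following is a minimal sketch and not the authors' implementation. It assumes a binary problem and uses the fraction of k nearest neighbors with a different label (k-disagreeing neighbors) as a stand-in hardness measure, rather than the cross-validated classifier probabilities that instance hardness thresholding typically relies on. The function names and the toy data are hypothetical.

```python
import math
import random
from collections import Counter


def k_disagreeing_neighbors(X, y, k=5):
    """Hardness per sample: fraction of its k nearest neighbors with a different label."""
    hardness = []
    for i, xi in enumerate(X):
        # Euclidean distances from sample i to every other sample, nearest first.
        dists = sorted((math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i)
        neighbors = [j for _, j in dists[:k]]
        disagree = sum(1 for j in neighbors if y[j] != y[i])
        hardness.append(disagree / len(neighbors))
    return hardness


def undersample_by_hardness(X, y, k=5):
    """Keep only the easiest majority-class samples so both classes end up the same size."""
    counts = Counter(y)
    majority = max(counts, key=counts.get)
    n_keep = min(counts.values())  # size of the minority class (binary assumption)
    hardness = k_disagreeing_neighbors(X, y, k)

    # Majority-class indices ordered from easiest (low hardness) to hardest.
    majority_idx = sorted(
        (i for i, label in enumerate(y) if label == majority),
        key=lambda i: hardness[i],
    )
    keep = set(majority_idx[:n_keep])
    # All minority-class samples are retained.
    keep.update(i for i, label in enumerate(y) if label != majority)
    keep = sorted(keep)
    return [X[i] for i in keep], [y[i] for i in keep]


# Hypothetical toy data: 90 "non-fraud" points and 10 "fraud" points.
random.seed(0)
X = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(90)]
X += [(random.gauss(2, 1), random.gauss(2, 1)) for _ in range(10)]
y = [0] * 90 + [1] * 10

X_res, y_res = undersample_by_hardness(X, y)
print(Counter(y_res))
```

The resampled set could then be passed to any classifier, such as a gradient boosting model. Removing the majority-class samples that sit closest to the minority class is one way to reduce the majority-class bias that the abstract describes.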


Health insurance frauds; Machine learning; Sampling method


Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

IAES International Journal of Artificial Intelligence (IJ-AI)
ISSN/e-ISSN 2089-4872/2252-8938 
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).
