Lung cancer patients survival prediction using outlier detection and optimized XGBoost

Wirot Yotsawat, Peetiphart Suebpeng, Saroch Purisangkaha, Akarapon Poonsawad, Kanyalag Phodong

Abstract


This research aims to improve the prediction model for the survival time of lung cancer patients by combining outlier detection, hyper-parameter optimization, and machine learning techniques. It compares the performance of several methods: multilayer perceptron (MLP), decision tree (DT), linear regression (LR), Bagging, XGBoost, and random forest (RF). The dataset used for the experiment is obtained from the Surveillance, Epidemiology, and End Results (SEER) cancer database and contains diagnosis data from 2004 to 2015. The total number of records used is 196,031, with 22 features. The models are trained and tested using 10-fold cross-validation. The evaluation metrics are root mean square error (RMSE), mean squared error (MSE), R-squared (R2), and mean absolute error (MAE). The results show that the optimized XGBoost (O-XGBoost) model performs best, with an RMSE of 13.74, outperforming the baseline XGBoost model as well as the other models. This research will be useful for developing a clinical decision support system for the care of lung cancer patients: physicians can use the developed model to assess a patient’s chance of survival in order to plan more effective treatment.
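
The evaluation protocol named in the abstract (10-fold cross-validation scored with MSE, RMSE, MAE, and R2) can be illustrated with a minimal sketch in plain Python. This is not the authors' code: the function names `regression_metrics` and `k_fold_indices` are hypothetical, and the use of contiguous, unshuffled folds is an assumption, since the abstract does not say how the folds were drawn.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE and R2 for two equal-length lists of numbers."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    mse = ss_res / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                     # undefined if y_true is constant
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "R2": r2}


def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) for k contiguous folds of near-equal size.

    Assumption: no shuffling; the first (n_samples % k) folds get one extra row.
    """
    start = 0
    for i in range(k):
        size = n_samples // k + (1 if i < n_samples % k else 0)
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = list(range(0, start)) + list(range(stop, n_samples))
        yield train_idx, test_idx
        start = stop
```

In use, a model would be fitted on the rows in `train_idx` and scored on those in `test_idx` for each of the ten folds, with the per-fold metrics averaged. RMSE is simply the square root of MSE, so for survival time measured in months it is expressed in months as well.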

Keywords


Hyper-parameter optimization; Lung cancer; Machine learning; Outlier detection; Survival prediction; XGBoost


DOI: http://doi.org/10.11591/ijai.v14.i3.pp2146-2157


Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

IAES International Journal of Artificial Intelligence (IJ-AI)
ISSN/e-ISSN 2089-4872/2252-8938 
This journal is published by the Institute of Advanced Engineering and Science (IAES).
