Deep learning-based feature selection for lung adenocarcinoma classification and biomarker discovery
Abstract
Lung adenocarcinoma (LUAD) is a leading cause of cancer-related mortality, which underscores the need for reliable diagnostic tools. This study proposes a robust multi-stage feature selection and classification framework for biomarker discovery, using The Cancer Genome Atlas lung adenocarcinoma (TCGA-LUAD) cohort as the primary dataset and GSE19188 for independent validation. The framework combines differential expression analysis (Wilcoxon rank-sum test), joint mutual information maximization (JMIM), and sparse autoencoder-based refinement to identify a compact and predictive set of five genes. These genes are involved in key lung cancer pathways, including epidermal growth factor receptor (EGFR) signaling, cell cycle regulation, and immune response, and include biomarkers such as surfactant protein A2 (SFTPA2), napsin A aspartic peptidase (NAPSA), and T-box transcription factor 4 (TBX4). The hybrid deep learning classifier achieved high accuracy (98.4%) and area under the receiver operating characteristic curve (AUC-ROC) (0.996) on TCGA-LUAD, with strong generalization on GSE19188 (accuracy: 96.7%, AUC-ROC: 0.993). Overall, the framework offers an interpretable and effective solution for LUAD classification and biomarker identification.
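The first two stages named in the abstract (a Wilcoxon rank-sum filter followed by JMIM) can be illustrated with a minimal sketch. This is not the authors' implementation: the synthetic data, the significance threshold `alpha`, the quantile binning with `n_bins`, and the helper names `mutual_info`, `wilcoxon_filter`, and `jmim_select` are all assumptions made for illustration, and the sparse autoencoder refinement and the classifier are omitted.

```python
import numpy as np
from scipy.stats import ranksums


def mutual_info(x, y):
    """Mutual information (in nats) between two discrete integer arrays."""
    _, xi = np.unique(x, return_inverse=True)
    _, yi = np.unique(y, return_inverse=True)
    joint = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(joint, (xi, yi), 1)          # contingency table of co-occurrences
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)  # marginal of x, shape (a, 1)
    py = joint.sum(axis=0, keepdims=True)  # marginal of y, shape (1, b)
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())


def wilcoxon_filter(X, y, alpha=0.01):
    """Indices of features whose two-sided rank-sum p-value is below alpha (assumed threshold)."""
    pvals = np.array([ranksums(X[y == 0, j], X[y == 1, j]).pvalue
                      for j in range(X.shape[1])])
    return np.flatnonzero(pvals < alpha)


def jmim_select(X, y, k, n_bins=3):
    """Greedy JMIM: add the feature f maximizing min over selected s of I(f, s; y)."""
    qs = np.linspace(0, 1, n_bins + 1)[1:-1]
    # Quantile-bin each feature into integer codes 0..n_bins-1 (an assumed discretization).
    D = np.column_stack([np.digitize(c, np.quantile(c, qs)) for c in X.T])
    relevance = [mutual_info(D[:, j], y) for j in range(D.shape[1])]
    selected = [int(np.argmax(relevance))]  # start from the most relevant feature
    while len(selected) < min(k, D.shape[1]):
        best, best_score = None, -np.inf
        for j in range(D.shape[1]):
            if j in selected:
                continue
            # Encode the pair (f, s) as one discrete variable, then take the worst case over s.
            score = min(mutual_info(D[:, j] * n_bins + D[:, s], y) for s in selected)
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected


# Synthetic stand-in for an expression matrix: 200 samples, 50 "genes",
# of which the first five are shifted in class 1.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 100)
X = rng.normal(size=(200, 50))
X[:, :5] += 2.0 * y[:, None]

kept = wilcoxon_filter(X, y)                      # stage 1: differential expression filter
chosen = kept[jmim_select(X[:, kept], y, k=5)]    # stage 2: JMIM on the surviving features
print("features passing the filter:", kept)
print("JMIM selection:", chosen)
```

In a real analysis the rank-sum p-values would normally be corrected for multiple testing before thresholding, and the mutual information estimator and binning scheme would need to be chosen with the sample size in mind.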
Keywords
Artificial intelligence; Cancer classification; Computer science; Feature selection; Machine learning
DOI: http://doi.org/10.11591/ijai.v14.i6.pp4703-4710
Copyright (c) 2025 Sara Haddou Bouazza, Jihad Haddou Bouazza

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IAES International Journal of Artificial Intelligence (IJ-AI)
ISSN/e-ISSN 2089-4872/2252-8938
This journal is published by the Institute of Advanced Engineering and Science (IAES).