Deep learning-based feature selection for lung adenocarcinoma classification and biomarker discovery

Sara Haddou Bouazza, Jihad Haddou Bouazza

Abstract


Lung adenocarcinoma (LUAD) is a leading cause of cancer-related mortality, which underscores the need for reliable diagnostic tools. This study proposes a multi-stage feature selection and classification framework for biomarker discovery, using The Cancer Genome Atlas lung adenocarcinoma (TCGA-LUAD) cohort as the primary dataset and GSE19188 for independent validation. The framework combines differential expression analysis (Wilcoxon rank-sum test), joint mutual information maximization (JMIM), and sparse autoencoder-based refinement to identify a compact and predictive set of five genes. These genes are involved in key lung cancer pathways, including epidermal growth factor receptor (EGFR) signaling, cell cycle regulation, and immune response, and include biomarkers such as surfactant protein A2 (SFTPA2), napsin A aspartic peptidase (NAPSA), and T-box transcription factor 4 (TBX4). The hybrid deep learning classifier achieved an accuracy of 98.4% and an area under the receiver operating characteristic curve (AUC-ROC) of 0.996 on TCGA-LUAD, and generalized well to GSE19188 (accuracy: 96.7%, AUC-ROC: 0.993). Overall, the framework offers an interpretable and effective solution for LUAD classification and biomarker identification.
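As an illustration of the first stage only, the differential expression filter can be sketched as a per-gene Wilcoxon rank-sum test between tumor and normal samples. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the gene labels GENE_SHIFTED and GENE_FLAT, the toy expression values, and the 0.01 threshold are all hypothetical, and the p-value uses the normal approximation without tie or multiple-testing correction.

```python
import math


def rank_sum_pvalue(x, y):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, midranks for ties)."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2 = len(x), len(y)
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        # Find the run of tied values starting at position i.
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # average of the 1-based ranks i+1 .. j
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    mean = n1 * (n1 + n2 + 1) / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (rank_sum_x - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2.0))


def select_genes(tumor, normal, alpha=0.01):
    """Keep genes whose tumor vs. normal expression differs at level alpha (hypothetical threshold)."""
    return sorted(g for g in tumor if rank_sum_pvalue(tumor[g], normal[g]) < alpha)


# Toy data (hypothetical): one clearly shifted gene, one interleaved gene.
tumor = {
    "GENE_SHIFTED": [10.0 + i for i in range(20)],
    "GENE_FLAT": [2.0 * i for i in range(20)],
}
normal = {
    "GENE_SHIFTED": [0.1 * i for i in range(20)],
    "GENE_FLAT": [2.0 * i + 1.0 for i in range(20)],
}

selected = select_genes(tumor, normal)
print(selected)
```

In a pipeline of the kind the abstract describes, the genes surviving this filter would then be passed on to JMIM ranking and sparse autoencoder-based refinement; those stages are not shown here.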

Keywords


Artificial intelligence; Cancer classification; Computer science; Feature selection; Machine learning

DOI: http://doi.org/10.11591/ijai.v14.i6.pp4703-4710


Copyright (c) 2025 Sara Haddou Bouazza, Jihad Haddou Bouazza

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

IAES International Journal of Artificial Intelligence (IJ-AI)
ISSN/e-ISSN 2089-4872/2252-8938 
This journal is published by the Institute of Advanced Engineering and Science (IAES).