Enhancing the performance of cancer text classification model based on cancer hallmarks

Noha Ali, Ahmed H. AbuEl-Atta, Hala H. Zayed

Abstract


Deep learning (DL) algorithms have achieved state-of-the-art performance in computer vision, speech recognition, and natural language processing (NLP). In this paper, we enhance a Convolutional Neural Network (CNN) model to classify cancer articles according to cancer hallmarks. The model uses a recent word embedding technique in its embedding layer, one based on distributed phrase representations and multi-word phrase embeddings. The proposed model outperforms the existing model used for biomedical text classification, achieving an F-score of 83.87% with PMC Vectors (PMCVec), an embedding trained with an unsupervised technique on PubMed abstracts. We also ran a second experiment on the same dataset using a Recurrent Neural Network (RNN) with two different word embeddings, Google News and PMCVec, which achieved F-scores of 74.9% and 76.26%, respectively.
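The idea of multi-word phrase embedding mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation or the PMCVec pipeline: the greedy longest-match rule, the function names merge_phrases and embed, and the toy phrase list and vectors are all hypothetical and chosen only for illustration. The sketch shows how a phrase such as "cell death" might be treated as one token so that it receives a single vector before the sequence is passed to a CNN or RNN.

```python
from typing import Dict, List, Set, Tuple


def merge_phrases(tokens: List[str],
                  phrases: Set[Tuple[str, ...]],
                  max_len: int = 3) -> List[str]:
    """Greedily join the longest known multi-word phrase into one token."""
    merged: List[str] = []
    i = 0
    while i < len(tokens):
        match_len = 1  # default: keep the single word
        # Try the longest candidate phrase first, down to two words.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            if tuple(tokens[i:i + n]) in phrases:
                match_len = n
                break
        merged.append("_".join(tokens[i:i + match_len]))
        i += match_len
    return merged


def embed(tokens: List[str],
          table: Dict[str, List[float]],
          dim: int) -> List[List[float]]:
    """Look up one vector per token; unknown tokens map to a zero vector."""
    zero = [0.0] * dim
    return [table.get(token, zero) for token in tokens]


# Hypothetical toy phrase list and 2-dimensional vectors (assumptions,
# not values taken from PMCVec or from the paper).
PHRASES = {("cell", "death"), ("immune", "destruction")}
TABLE = {"cell_death": [0.9, 0.1], "resisting": [0.2, 0.7]}

tokens = "resisting cell death in tumors".split()
merged = merge_phrases(tokens, PHRASES)
vectors = embed(merged, TABLE, dim=2)
print(merged)
```

In this toy setup the phrase is looked up as a single unit rather than as the separate words "cell" and "death", which is the property that distinguishes a phrase embedding from a purely word-level one.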

Keywords


Biomedical text classification; Cancer hallmarks; CNN; Deep learning; NLP; Phrase embedding; PMCVec; RNN



DOI: http://doi.org/10.11591/ijai.v10.i2.pp%25p




This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.