Bi-directional long short term memory using recurrent neural network for biological entity recognition

Rashmi Siddalingappa, Kanagaraj Sekar

Abstract


Biomedical named entity recognition (Bio-NER) aims to identify medical entities in unstructured data. A central task in curating biological databases is handling biomedical terms such as cancer types, deoxyribonucleic acid (DNA) and ribonucleic acid (RNA), gene and protein names, and others. However, because online medical repositories are massive, data processing becomes a challenge for a gazetteer without proper annotation. Traditional NER systems depend on feature engineering, which is tedious and time-consuming. This study presents a new model for Bio-NER using a recurrent neural network (RNN). Unlike existing approaches, the proposed method uses bidirectional traversal with GloVe vector modelling performed at the character and word levels. Bio-NER is performed in three stages. First, the relevant medical entities in electronic medical records from PubMed are extracted using the skip-gram model. Second, a vector representation for each word is created through the 1-hot method. Third, the weights of the RNN layers are adjusted using backpropagation. In addition, the long short-term memory (LSTM) cells store the previously encountered medical entity to tackle context dependency. The accuracy and F-score are calculated for each medical entity type. The macro-averaged recall, precision, and F-score (MacroR, MacroP, and MacroF) are 0.86, 0.88, and 0.87, respectively. The overall accuracy achieved was 94%.
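
The macro-averaged scores reported in the abstract can be illustrated with a minimal sketch. It assumes a token-level evaluation in which precision, recall, and F-score are computed per label and then averaged with equal weight; the abstract does not state whether the authors score tokens or whole entities, or whether MacroF is the mean of per-label F-scores or the harmonic mean of MacroP and MacroR. The function name `macro_scores` and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def macro_scores(gold, pred):
    """Return (macro precision, macro recall, macro F1) over all labels seen."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1          # correct prediction for label g
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[g] += 1          # g was the true label and was missed

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    # Assumption: macro F1 is the unweighted mean of the per-label F1 values.
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical toy data, for illustration only (not from the paper).
gold = ["GENE", "GENE", "DNA", "O", "O", "O"]
pred = ["GENE", "O", "DNA", "O", "O", "DNA"]
macro_p, macro_r, macro_f = macro_scores(gold, pred)
print(f"MacroP={macro_p:.2f} MacroR={macro_r:.2f} MacroF={macro_f:.2f}")
```

Macro averaging gives every entity type the same weight regardless of how often it occurs, so rare entity types influence the score as much as frequent ones.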


Keywords


1-hot vector representation; Bi-directional recurrent neural network; Electronic medical records; GloVe; Long-short-term-memory; Named-entity recognition; Skip-gram model



DOI: http://doi.org/10.11591/ijai.v11.i1.pp89-101



This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.