Unknown Word Detection via Syntax Analyzer

Soe Lai Phyue


A knowledge resource is the central repository of data for all Natural Language Processing (NLP) applications, and the development of NLP applications depends mostly on the coverage of knowledge resources. The multipurpose Myanmar Language Lexico-conceptual Knowledge Resource (ML2KR) and a Myanmar function-tagged corpus were developed as initial resources using a semiautomatic approach. ML2KR consists of a Myanmar WordNet, a Myanmar-English bilingual computational lexicon, and a morphological processor. Myanmar is a morphologically rich, agglutinative language, so Myanmar text usually has to be segmented before further processing. Segmentation has two main problems: word ambiguity, where a word has more than one meaning, and unknown word occurrence, where a word is not in the lexicon. This paper addresses the unknown word occurrence issue. To detect new, unrestricted character patterns of words, a character-based parsing syntax analyzer is built using a Context Free Grammar (CFG). First, unknown words are treated as names by Named Entity Recognition with a forward and backward rule-based approach. If the name does not agree with the syntax analyzer, all possible unknown words are verified in order to update the lexicon and the Myanmar WordNet.
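As a rough illustration of what "unknown word occurrence" means during segmentation, the sketch below uses greedy longest-match lookup against a lexicon and groups any characters the lexicon cannot cover into spans flagged as unknown. This is only an assumed baseline for illustration, not the method of the paper: the CFG-based character parser and the forward and backward name rules are not reproduced here. The function name `segment`, the toy lexicon, and the Latin-script input are all hypothetical stand-ins for ML2KR entries and Myanmar text.

```python
def segment(text, lexicon):
    """Greedy longest-match segmentation (illustrative assumption only).

    Returns a list of (surface, is_known) pairs. Characters that no
    lexicon entry covers are merged into a single unknown span.
    """
    max_len = max(map(len, lexicon), default=1)
    tokens = []    # collected (surface, is_known) pairs
    unknown = ""   # buffer of consecutive uncovered characters
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate first, then shrink.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in lexicon:
                match = text[i:j]
                break
        if match:
            if unknown:
                # Close the pending unknown span before the known word.
                tokens.append((unknown, False))
                unknown = ""
            tokens.append((match, True))
            i += len(match)
        else:
            unknown += text[i]
            i += 1
    if unknown:
        tokens.append((unknown, False))
    return tokens


# Hypothetical toy lexicon standing in for the real lexicon.
TOY_LEXICON = {"the", "cat", "sat"}
result = segment("thecatzzsat", TOY_LEXICON)
unknown_candidates = [surface for surface, known in result if not known]
```

In a pipeline like the one described above, spans such as those in `unknown_candidates` would be the input to the later checks (name recognition, then syntactic verification) before any lexicon update.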

DOI: http://dx.doi.org/10.11591/ij-ai.v2i3.1802


Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

IAES International Journal of Artificial Intelligence (IJ-AI)
ISSN/e-ISSN 2089-4872/2252-8938 
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).
