Unknown Word Detection via Syntax Analyzer

Soe Lai Phyue

Abstract


A knowledge resource is the central data repository for Natural Language Processing (NLP) applications, and the development of such applications depends largely on the coverage of these resources. The multipurpose Myanmar Language Lexico-conceptual Knowledge Resource (ML2KR) and a Myanmar function-tagged corpus were developed as initial resources using a semiautomatic approach. ML2KR consists of a Myanmar WordNet, a Myanmar-English bilingual computational lexicon, and a morphological processor. Myanmar is a morphologically rich, agglutinative language, so Myanmar text usually has to be segmented before further processing. Segmentation faces two main problems: word ambiguity, where a word has more than one meaning, and unknown word occurrence, where a word is not in the lexicon. This paper addresses the unknown word problem. To detect new, unrestricted character patterns of words, a character-based parsing syntax analyzer is built using Context Free Grammar (CFG). Unknown words are first treated as names by Named Entity Recognition with a forward and backward rule-based approach. If a candidate name does not agree with the syntax analyzer, all possible unknown words are verified in order to update the lexicon and Myanmar WordNet.
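To make the notion of an unknown word occurrence concrete, the following is a minimal sketch of lexicon-driven segmentation that flags spans not covered by the lexicon. It is not the paper's method: the abstract describes a character-based CFG syntax analyzer combined with rule-based Named Entity Recognition, whereas this sketch substitutes plain forward maximum matching. The function name `segment`, the toy Latin-script lexicon, and the choice to work on raw characters rather than Myanmar syllables are all assumptions made for illustration.

```python
from typing import List, Set, Tuple


def segment(text: str, lexicon: Set[str]) -> List[Tuple[str, bool]]:
    """Forward maximum matching (illustrative, not the paper's analyzer).

    Returns (token, is_known) pairs. Consecutive characters that match no
    lexicon entry are merged into a single unknown-word candidate.
    """
    # Longest lexicon entry bounds how far ahead we need to look.
    max_len = max((len(w) for w in lexicon), default=1)
    tokens: List[Tuple[str, bool]] = []
    unknown = ""  # buffer for characters not covered by the lexicon
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate first, then shrink.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in lexicon:
                match = text[i:j]
                break
        if match is None:
            unknown += text[i]
            i += 1
        else:
            if unknown:
                tokens.append((unknown, False))
                unknown = ""
            tokens.append((match, True))
            i += len(match)
    if unknown:
        tokens.append((unknown, False))
    return tokens


# Hypothetical toy lexicon; a real system would load Myanmar entries.
toy_lexicon = {"the", "cat", "sat"}
result = segment("thecatzzsat", toy_lexicon)
```

In this sketch, the spans flagged with `False` are the unknown-word candidates that a later stage, such as a name recognizer or a grammar-based check, would have to accept or reject before the lexicon is updated.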

DOI: http://dx.doi.org/10.11591/ij-ai.v2i3.1802


This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.