Malay phoneme-based subword news headline generator for low-resource language

Yeong Tsann Phua, Kwang Hooi Yew, Mohd Fadzil Hassan, Matthew Teow Yok Wooi

Abstract


The booming of technology has significantly increased the amount of news articles for readers. The headline of news plays an essential role in attracting readers. Traditionally, crafting the news headline is a manual task at the news desk. The motivation of this paper is to address the issues faced in low resource languages, such as the Malay language. The main contribution of this paper is a new hybrid model based on extractive- and abstractive-based text summarization with the integration of a geographical linguistics model; a Malay phoneme-based subword embedding has been developed to solve the complex morphological issue in the Malay language-based computational linguistic applications. The experiment involves various sequence-to sequence (seq2seq) models to generate the Malay news headlines. Besides that, the out-of-vocabulary (OOV) is assessed in the models. From the experiment, the proposed hybrid text summarization model shows significant improvement over the baseline models above 11.00 in ROUGE-1, 4.00 ROUGE-2, and 11.00 in ROUGE-L. The proposed model can reduce the OOV rate to below 15%.

Keywords


Low-resource language; Malay language; Natural language processing; News headline generator; Subword embedding

Full Text:

PDF


DOI: http://doi.org/10.11591/ijai.v13.i4.pp4965-4975

Refbacks

  • There are currently no refbacks.


Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

IAES International Journal of Artificial Intelligence (IJ-AI)
ISSN/e-ISSN 2089-4872/2252-8938 
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).

View IJAI Stats