Abstractive summarization using multilingual text-to-text transfer transformer for Turkish text
Abstract
With the rapid growth of text data, automatic text summarization, one of the most critical natural language processing (NLP) tasks, has attracted increasing attention and research. With advances in deep learning, pre-trained models such as the text-to-text transfer transformer (T5), a sequence-to-sequence encoder-decoder, and bidirectional encoder representations from transformers (BERT) are now used to obtain state-of-the-art results. However, most of these studies have focused on English. Recently released monolingual BERT models and multilingual pre-trained sequence-to-sequence models make state-of-the-art methods available for lower-resource, less-studied languages such as Turkish. This article used two datasets for Turkish text summarization. First, the Google multilingual text-to-text transfer transformer (mT5)-small model was applied to the Turkish portion of the multilingual summarization (MLSUM) dataset, a large-scale news corpus, and its performance was examined. Then, BERT-based extractive summarization followed by abstractive summarization was applied to 1010 articles collected from the Dergipark site and evaluated. ROUGE measures were used for performance evaluation. This study is one of the first of its kind for Turkish, and its results are expected to provide a basis for future work.
Keywords
Abstractive summarization; Dataset; Deep learning; Pre-trained; Turkish text
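The abstract outlines two pipelines: applying mT5-small to the Turkish portion of MLSUM, and running BERT-based extractive summarization followed by abstractive summarization on Dergipark articles, with ROUGE measures for evaluation. The following is a minimal Python sketch of that flow, assuming the Hugging Face transformers library, the bert-extractive-summarizer package, and rouge-score. The checkpoints (google/mt5-small, dbmdz/bert-base-turkish-cased), generation settings, and example texts are illustrative assumptions rather than the authors' exact configuration, and the mT5 checkpoint would first need fine-tuning on the Turkish MLSUM split to produce useful summaries.

from summarizer import Summarizer                      # pip install bert-extractive-summarizer
from transformers import MT5ForConditionalGeneration, MT5Tokenizer
from rouge_score import rouge_scorer                   # pip install rouge-score

# Assumed checkpoints; the paper fine-tunes mT5-small on the Turkish MLSUM split,
# which is required before the model produces meaningful summaries.
MT5_NAME = "google/mt5-small"
BERT_NAME = "dbmdz/bert-base-turkish-cased"

article = ("Türkiye'de yapay zekâ alanındaki akademik çalışmalar son yıllarda "
           "hızla artmakta ve yeni veri kümeleri yayımlanmaktadır. ...")
reference = "Türkiye'de yapay zekâ çalışmaları ve veri kümeleri hızla artıyor."

# Step 1 (Dergipark pipeline): BERT-based extractive summarization keeps the
# most salient sentences, shrinking long articles before abstractive rewriting.
extractor = Summarizer(BERT_NAME)
extractive_summary = extractor(article, ratio=0.4)

# Step 2: abstractive summarization with mT5-small (assumed fine-tuned on MLSUM-tr).
tokenizer = MT5Tokenizer.from_pretrained(MT5_NAME)
model = MT5ForConditionalGeneration.from_pretrained(MT5_NAME)
inputs = tokenizer(extractive_summary, return_tensors="pt", max_length=512, truncation=True)
ids = model.generate(inputs["input_ids"], max_length=128, num_beams=4, early_stopping=True)
abstractive_summary = tokenizer.decode(ids[0], skip_special_tokens=True)

# Step 3: ROUGE-1/2/L evaluation against a reference summary, as the paper
# reports ROUGE measures for both pipelines.
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=False)
print(abstractive_summary)
print(scorer.score(reference, abstractive_summary))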
DOI: http://doi.org/10.11591/ijai.v14.i2.pp1587-1596