Hybrid model for extractive single document summarization: utilizing BERTopic and BERT model

Maryanto Maryanto, Philips Philips, Abba Suganda Girsang

Abstract


Extractive text summarization has been a popular research area for many years. The goal of this task is to generate a compact and coherent summary of a given document, preserving the most important information. However, current extractive summarization methods still face several challenges such as semantic drift, repetition, redundancy, and lack of coherence. A novel approach is presented in this paper to improve the performance of an extractive summarization model based on bidirectional encoder representations from transformers (BERT) by incorporating topic modeling using the BERTopic model. Our method first utilizes BERTopic to identify the dominant topics in a document and then employs a BERT-based deep neural network to extract the most salient sentences related to those topics. Our experiments on the cable news network (CNN)/daily mail dataset demonstrate that our proposed method outperforms state-of-the-art BERT-based extractive summarization models in terms of recall-oriented understudy for gisting evaluation (ROUGE) scores, which resulted in an increase of 32.53% of ROUGE-1, 47.55% of ROUGE-2, and 16.63% of ROUGE-L when compared to baseline BERT-based extractive summarization models. This paper contributes to the field of extractive text summarization, highlights the potential of topic modeling in improving summarization results, and provides a new direction for future research.

Keywords


BERTopic; BERT; Cable news network; Daily mail; Extractive summarization

Full Text:

PDF


DOI: http://doi.org/10.11591/ijai.v13.i2.pp1723-1731

Refbacks

  • There are currently no refbacks.


Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

IAES International Journal of Artificial Intelligence (IJ-AI)
ISSN/e-ISSN 2089-4872/2252-8938 
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).

View IJAI Stats