Summarization of IndoSum dataset using enhanced TextRank with weighted word embedding

Evi Yulianti, Piawai Said Umbara

Abstract


This study evaluates the effectiveness of combining the TextRank method with word embeddings on the Indonesian text summarization (IndoSum) dataset. Two experimental scenarios were applied: unweighted and weighted. The unweighted scenario incorporates word embeddings, such as Word2Vec, FastText, and Indonesian bidirectional encoder representations from transformers (IndoBERT), into the TextRank framework. The weighted scenario additionally applies term frequency-inverse document frequency (TF-IDF) weighting to the word embeddings used in the unweighted scenario. Our results on the effectiveness of enhanced TextRank using word embeddings on IndoSum data are consistent with those reported in previous work on Liputan6 data. Both scenarios significantly improve the effectiveness of TextRank summarization. Moreover, the weighted scenario outperformed the unweighted scenario in most summarization systems, with an average performance increase of 5.55% in recall-oriented understudy for gisting evaluation (ROUGE)-1 and 9.95% in ROUGE-2. This result confirms the robustness of the enhanced TextRank with weighted word embeddings on the IndoSum data. Lastly, our study highlights the importance of using domain-specific training data to optimize summarization performance.
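The following is a minimal illustrative sketch of the general idea behind the weighted scenario, not the authors' implementation. It assumes that a sentence vector is a TF-IDF-weighted average of its word vectors, that sentence similarity is cosine similarity, and that sentences are ranked by a PageRank-style power iteration. The abstract does not specify any of these details. All function names (`tfidf_weights`, `sentence_vectors`, `textrank`, `summarize`) and the random toy embeddings are hypothetical and stand in for a trained Word2Vec, FastText, or IndoBERT model.

```python
import math
from collections import Counter

import numpy as np


def tfidf_weights(sentences):
    """TF-IDF weight per word, treating each tokenized sentence as a document (assumption)."""
    n = len(sentences)
    df = Counter(word for sent in sentences for word in set(sent))
    weights = []
    for sent in sentences:
        tf = Counter(sent)
        weights.append(
            {w: (tf[w] / len(sent)) * (math.log((1 + n) / (1 + df[w])) + 1) for w in tf}
        )
    return weights


def sentence_vectors(sentences, embeddings, weights=None):
    """Average the word vectors of each sentence; use TF-IDF weights when provided."""
    dim = len(next(iter(embeddings.values())))
    vectors = np.zeros((len(sentences), dim))
    for i, sent in enumerate(sentences):
        total = 0.0
        for word in sent:
            if word not in embeddings:
                continue  # skip out-of-vocabulary words
            w = 1.0 if weights is None else weights[i][word]
            vectors[i] += w * np.asarray(embeddings[word])
            total += w
        if total > 0:
            vectors[i] /= total
    return vectors


def textrank(vectors, damping=0.85, iterations=100, tol=1e-6):
    """Rank sentences by power iteration over a cosine-similarity graph."""
    n = len(vectors)
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    unit = vectors / norms
    sim = np.clip(unit @ unit.T, 0.0, None)  # keep non-negative edge weights
    np.fill_diagonal(sim, 0.0)               # no self-loops
    row_sums = sim.sum(axis=1, keepdims=True)
    safe_sums = np.where(row_sums > 0, row_sums, 1.0)
    # Rows without outgoing edges fall back to a uniform distribution.
    transition = np.where(row_sums > 0, sim / safe_sums, 1.0 / n)
    scores = np.full(n, 1.0 / n)
    for _ in range(iterations):
        new_scores = (1 - damping) / n + damping * (transition.T @ scores)
        converged = np.abs(new_scores - scores).sum() < tol
        scores = new_scores
        if converged:
            break
    return scores


def summarize(sentences, embeddings, k=2, weighted=True):
    """Return the indices of the top-k sentences, in document order."""
    weights = tfidf_weights(sentences) if weighted else None
    scores = textrank(sentence_vectors(sentences, embeddings, weights))
    return sorted(int(i) for i in np.argsort(-scores)[:k])


# Toy data: tokenized sentences and random vectors in place of trained embeddings.
docs = [
    ["pemerintah", "mengumumkan", "kebijakan", "baru"],
    ["kebijakan", "baru", "berlaku", "bulan", "depan"],
    ["cuaca", "hari", "ini", "cerah"],
    ["pemerintah", "berharap", "kebijakan", "membantu", "ekonomi"],
]
rng = np.random.default_rng(0)
vocab = sorted({w for sent in docs for w in sent})
toy_embeddings = {w: rng.normal(size=8) for w in vocab}

summary_idx = summarize(docs, toy_embeddings, k=2, weighted=True)
print(summary_idx)
```

Passing `weighted=False` gives the unweighted variant, in which every word contributes equally to its sentence vector. With random toy vectors the selected sentences carry no meaning; the sketch only shows where the TF-IDF weights enter the pipeline.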

Keywords


FastText; IndoBERT; IndoSum; Summarization; TextRank; Weighted word embedding; Word2Vec


DOI: http://doi.org/10.11591/ijai.v15.i2.pp1919-1930


Copyright (c) 2026 Evi Yulianti, Piawai Said Umbara

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

IAES International Journal of Artificial Intelligence (IJ-AI)
ISSN 2089-4872, e-ISSN 2252-8938
This journal is published by the Institute of Advanced Engineering and Science (IAES).