Summarization of IndoSum dataset using enhanced TextRank with weighted word embedding
Abstract
This study evaluates the effectiveness of combining the TextRank method with word embeddings on the Indonesian text summarization (IndoSum) dataset. Two experimental scenarios were applied: unweighted and weighted. The unweighted scenario incorporates word embeddings, namely Word2Vec, FastText, and Indonesian bidirectional encoder representations from transformers (IndoBERT), into the TextRank framework. The weighted scenario additionally applies term frequency-inverse document frequency (TF-IDF) weighting to the word embeddings used in the unweighted scenario. Our results on the effectiveness of enhanced TextRank using word embeddings on IndoSum data are consistent with those reported in previous work on Liputan6 data. Both scenarios can significantly improve the effectiveness of TextRank summarization. The weighted scenario also outperformed the unweighted scenario in most summarization systems, with an average performance increase of 5.55% in recall-oriented understudy for gisting evaluation (ROUGE)-1 and 9.95% in ROUGE-2. This result confirms the robustness of the enhanced TextRank with weighted word embedding on the IndoSum data. Lastly, our study highlights the importance of using domain-specific training data to optimize summarization performance.
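The two scenarios can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the TF-IDF variant, the sentence-vector pooling, the similarity measure, or the TextRank settings, so the smoothed IDF, weighted averaging, cosine similarity, and damping factor of 0.85 below are all assumptions. The names `tfidf_weights`, `sentence_vector`, `textrank`, and `summarize`, as well as the two-dimensional toy embeddings, are hypothetical and stand in for a trained Word2Vec, FastText, or IndoBERT model.

```python
import math
from collections import Counter


def tfidf_weights(sentences):
    """Per-sentence TF-IDF weights, treating each sentence as a document (assumed variant)."""
    n = len(sentences)
    df = Counter(word for sent in sentences for word in set(sent))
    weights = []
    for sent in sentences:
        tf = Counter(sent)
        weights.append({
            w: (tf[w] / len(sent)) * (math.log((1 + n) / (1 + df[w])) + 1.0)
            for w in tf
        })
    return weights


def sentence_vector(sent, embeddings, weights=None):
    """Average word vectors; with weights given, use a TF-IDF-weighted average."""
    dim = len(next(iter(embeddings.values())))
    total, norm = [0.0] * dim, 0.0
    for w in sent:
        if w not in embeddings:
            continue  # skip out-of-vocabulary words
        wt = 1.0 if weights is None else weights.get(w, 0.0)
        for i, x in enumerate(embeddings[w]):
            total[i] += wt * x
        norm += wt
    return [x / norm for x in total] if norm > 0 else total


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def textrank(sim, damping=0.85, iters=100, tol=1e-6):
    """Weighted PageRank over a symmetric sentence-similarity matrix."""
    n = len(sim)
    scores = [1.0 / n] * n
    out_sum = [sum(row) for row in sim]
    for _ in range(iters):
        new = []
        for i in range(n):
            rank = sum(sim[j][i] * scores[j] / out_sum[j]
                       for j in range(n) if out_sum[j] > 0)
            new.append((1 - damping) / n + damping * rank)
        delta = sum(abs(a - b) for a, b in zip(new, scores))
        scores = new
        if delta < tol:
            break
    return scores


def summarize(sentences, embeddings, k=1, weighted=True):
    """Return the indices of the k top-ranked sentences, in document order."""
    n = len(sentences)
    weights = tfidf_weights(sentences) if weighted else [None] * n
    vecs = [sentence_vector(s, embeddings, w) for s, w in zip(sentences, weights)]
    # Negative similarities are clipped to zero so edge weights stay non-negative.
    sim = [[max(cosine(vecs[i], vecs[j]), 0.0) if i != j else 0.0
            for j in range(n)] for i in range(n)]
    scores = textrank(sim)
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)


# Hypothetical toy data: tokenized sentences and made-up 2-d word vectors.
toy_embeddings = {
    "banjir": [1.0, 0.0], "jakarta": [0.9, 0.1], "hujan": [0.8, 0.2],
    "saham": [0.0, 1.0], "naik": [0.1, 0.9],
}
toy_sentences = [["banjir", "jakarta"], ["hujan", "banjir"], ["saham", "naik"]]

print(summarize(toy_sentences, toy_embeddings, k=2, weighted=False))
print(summarize(toy_sentences, toy_embeddings, k=2, weighted=True))
```

Passing `weighted=False` corresponds to the unweighted scenario, and `weighted=True` adds the TF-IDF weighting before pooling. In this toy example the third sentence is only weakly similar to the other two, so it receives the lowest score in both settings.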
Keywords
FastText; IndoBERT; IndoSum; Summarization; TextRank; Weighted word embedding; Word2Vec
DOI: http://doi.org/10.11591/ijai.v15.i2.pp1919-1930
Copyright (c) 2026 Evi Yulianti, Piawai Said Umbara

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IAES International Journal of Artificial Intelligence (IJ-AI)
ISSN/e-ISSN 2089-4872/2252-8938
This journal is published by the Institute of Advanced Engineering and Science (IAES).