Enhanced framework for detecting Vietnamese hate and offensive spans

Dinh-Hong Vu, Tuong Le

Abstract


The rise of hate and offensive content on social media platforms such as Facebook and Twitter is a growing concern, especially in Vietnam. Detecting hate and offensive spans in Vietnamese text is therefore an essential area of research. This study introduces ViHateOff, a framework that combines a hated speech dictionary (HSD), constructed automatically from the Vietnamese hate and offensive spans (ViHOS) dataset, with PhoBERT-large, a pre-trained language model for Vietnamese, to improve the detection of offensive expressions. The framework has two primary modules. The first constructs the HSD from the ViHOS dataset; the dictionary serves as a reference for identifying hate and offensive language in Vietnamese text. The second integrates PhoBERT-large with the HSD to improve the detection of harmful words in the input text. Experimental results show that the proposed framework significantly outperforms existing state-of-the-art (SOTA) methods, achieving an F1-score of 0.8693 on the all-spans subset and 0.8709 on the multiple-spans subset, relative improvements of over 10% compared to the strongest baseline.
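The two modules can be illustrated with a minimal sketch. The abstract does not say how the HSD is built or how it is integrated with PhoBERT-large, so everything below is an assumption made for illustration: the function names (`build_dictionary`, `dictionary_spans`, `merge_spans`), the `(text, spans)` sample layout with end-exclusive character offsets, the case-insensitive substring matching, and the span-union combination rule are all hypothetical. The model's own predictions are represented by a plain list, `model_spans`, rather than by a real PhoBERT call.

```python
from collections import Counter


def build_dictionary(samples, min_count=1):
    """Collect annotated span texts into a set of lowercase terms.

    `samples` is an iterable of (text, spans) pairs, where each span is a
    (start, end) character offset with `end` exclusive. This layout is an
    assumption, not the documented ViHOS format.
    """
    counts = Counter()
    for text, spans in samples:
        for start, end in spans:
            term = text[start:end].strip().lower()
            if term:
                counts[term] += 1
    return {term for term, n in counts.items() if n >= min_count}


def merge_spans(spans):
    """Sort spans and merge any that overlap or touch."""
    merged = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def dictionary_spans(text, dictionary):
    """Find every occurrence of a dictionary term in `text`.

    Uses plain substring search, so a term can also match inside a longer
    word. Assumes lowercasing does not change the string length.
    """
    lowered = text.lower()
    hits = []
    for term in dictionary:
        pos = lowered.find(term)
        while pos != -1:
            hits.append((pos, pos + len(term)))
            pos = lowered.find(term, pos + 1)
    return merge_spans(hits)


# Toy annotated samples standing in for the dataset (placeholder words).
samples = [
    ("you are a dummy", [(10, 15)]),
    ("such a stupid dummy idea", [(7, 13)]),
]
hsd = build_dictionary(samples)

text = "What a STUPID dummy move"
dict_hits = dictionary_spans(text, hsd)

# Hypothetical output of a span-detection model for the same text.
model_spans = [(14, 19)]

# One possible combination rule: the union of both span sets.
combined = merge_spans(model_spans + dict_hits)
print(combined)  # [(7, 13), (14, 19)]
```

In this toy run the dictionary recovers the span `(7, 13)` that the hypothetical model output missed. A union rule of this kind favors recall; the framework described in the abstract may combine the two signals differently, for example by feeding dictionary matches to the model as additional input features.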

Keywords


Hate speech detection; Hated speech dictionary; Natural language processing; Offensive language; Social media; Vietnamese text


DOI: http://doi.org/10.11591/ijai.v15.i1.pp962-971


Copyright (c) 2026 Dinh-Hong Vu, Tuong Le

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

IAES International Journal of Artificial Intelligence (IJ-AI)
ISSN/e-ISSN 2089-4872/2252-8938 
This journal is published by the Institute of Advanced Engineering and Science (IAES).
