Enhanced framework for detecting Vietnamese hate and offensive spans

Dinh-Hong Vu; Tuong Le

doi:10.11591/ijai.v15.i1.pp962-971

Abstract

Hate and offensive content on social media platforms such as Facebook and Twitter is an escalating concern, especially in Vietnam. Consequently, detecting hate and offensive spans in Vietnamese text is an essential area of research. This study introduces ViHateOff, a framework that combines a hated speech dictionary (HSD), automatically constructed from the Vietnamese hate and offensive spans (ViHOS) dataset, with the PhoBERT-large pre-trained language model for Vietnamese to improve the detection of offensive expressions. The framework has two primary modules. The first constructs the HSD from the ViHOS dataset; the dictionary serves as a reference for identifying hate and offensive language in Vietnamese text. The second integrates PhoBERT-large with the HSD to improve the detection of harmful words in the input text. Experimental results demonstrate that the proposed framework significantly outperforms existing state-of-the-art (SOTA) methods, achieving an F1-score of 0.8693 on the all-spans subset and 0.8709 on the multiple-spans subset, which represents relative improvements of over 10% compared to the strongest baseline.
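The dictionary side of the two modules can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the HSD is built or how it is combined with PhoBERT-large, so the function names `build_dictionary` and `match_spans`, the `(text, spans)` input layout with end-exclusive character offsets, the `min_count` threshold, and plain substring matching are all assumptions made for illustration. The example uses a placeholder token rather than real data.

```python
from collections import Counter


def build_dictionary(samples, min_count=1):
    """Collect annotated span texts into a lowercase phrase set.

    samples: iterable of (text, [(start, end), ...]) pairs, where each
    (start, end) is a character range with `end` exclusive (assumed layout).
    """
    counts = Counter()
    for text, spans in samples:
        for start, end in spans:
            phrase = text[start:end].strip().lower()
            if phrase:
                counts[phrase] += 1
    # Keep only phrases annotated at least `min_count` times.
    return {phrase for phrase, n in counts.items() if n >= min_count}


def match_spans(text, dictionary):
    """Return merged (start, end) character spans where entries occur.

    Simplification: raw substring search with no word-boundary check, and
    it assumes lowercasing does not change the string length.
    """
    lowered = text.lower()
    hits = []
    for phrase in dictionary:
        pos = lowered.find(phrase)
        while pos != -1:
            hits.append((pos, pos + len(phrase)))
            pos = lowered.find(phrase, pos + 1)
    hits.sort()
    merged = []
    for start, end in hits:
        if merged and start <= merged[-1][1]:
            # Overlapping or touching hit: extend the previous span.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical annotated sample; "badword" stands in for an offensive span.
samples = [("you are a badword today", [(10, 17)])]
hsd = build_dictionary(samples)
spans = match_spans("Badword and badword again", hsd)
```

One plausible use of such matches, again an assumption, is as an additional per-token signal alongside the language model's predictions; the sketch stops at producing the candidate spans.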

Keywords

Hate speech detection; Hated speech dictionary; Natural language processing; Offensive language; Social media; Vietnamese text

DOI: http://doi.org/10.11591/ijai.v15.i1.pp962-971

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

IAES International Journal of Artificial Intelligence (IJ-AI)
ISSN/e-ISSN 2089-4872/2252-8938
This journal is published by the Institute of Advanced Engineering and Science (IAES).
