Detecting cyberbullying text using the approaches with machine learning models for the low-resource Bengali language

Md. Nesarul Hoque, Md. Hanif Seddiqui


The rising use of social media and advances in communication technologies have led to a considerable increase in cyberbullying, in which people are intimidated, harassed, and humiliated through digital messaging. Several studies have addressed the detection of cyberbullying text in English and other resource-rich languages, but relatively few have been conducted in low-resource languages such as Bengali. This research focuses on detecting cyberbullying content in Bengali text by experimenting with pre-processing, feature selection, and three types of machine learning (ML) models: classical ML, deep learning (DL), and transformer learning. In classical ML, four models are used: support vector machine (SVM), multinomial Naive Bayes (MNB), random forest (RF), and logistic regression (LR). In DL, three models are employed: long short-term memory (LSTM), bidirectional LSTM (BiLSTM), and convolutional neural network with bidirectional LSTM (CNN-BiLSTM). As the transformer-based pre-trained model, bidirectional encoder representations from transformers (BERT) is used. With our proposed pre-processing tasks, the MNB-based approach achieves the best accuracy among the classical ML models at 78.816%, the LSTM-based approach achieves the highest accuracy among the DL models at 77.804%, and the BERT-based approach outperforms both with 80.165% accuracy.
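To make the classical ML side of the comparison concrete, the following is a minimal sketch of a multinomial Naive Bayes text classifier with add-alpha (Laplace) smoothing, written in plain Python. It is not the authors' implementation: the class name `MultinomialNaiveBayes`, the whitespace `tokenize` helper, the label names, and the tiny English toy sentences are all hypothetical stand-ins chosen for illustration. The Bengali dataset, the proposed pre-processing tasks, and the feature selection step are not reproduced here.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: plain whitespace tokenization. The paper's own
    # pre-processing steps for Bengali text are not reproduced here.
    return text.lower().split()


class MultinomialNaiveBayes:
    """Toy multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        # Count documents per class and word occurrences per class.
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(labels)
        return self

    def log_scores(self, text):
        # log P(class) + sum over tokens of log P(token | class).
        scores = {}
        vocab_size = len(self.vocab)
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / self.total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokenize(text):
                if token not in self.vocab:
                    continue  # skip words never seen during training
                count = self.word_counts[label][token]
                score += math.log(
                    (count + self.alpha)
                    / (total_words + self.alpha * vocab_size)
                )
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


def accuracy(model, texts, labels):
    # Fraction of texts whose predicted label matches the true label.
    correct = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return correct / len(labels)


# Hypothetical toy data (English placeholders, not from the paper's dataset).
train_texts = [
    "you are stupid and ugly",
    "nobody likes you loser",
    "have a great day friend",
    "thanks for the lovely photo",
]
train_labels = ["bully", "bully", "not_bully", "not_bully"]

model = MultinomialNaiveBayes().fit(train_texts, train_labels)
print(model.predict("you are a loser"))
```

Accuracy as reported in the abstract is the share of correctly classified texts, which is what the `accuracy` helper computes. On real data it would be measured on a held-out test split, not on the training sentences.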


Bengali; Cyberbullying; Feature selection; Low-resource language; Machine learning; Pre-processing


Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

IAES International Journal of Artificial Intelligence (IJ-AI)
ISSN/e-ISSN 2089-4872/2252-8938 
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).
