Arabic text classification using machine learning and deep learning algorithms

Rawad Awad Alqahtani, Hoda A. Abdelhafez

Abstract


The classification of Arabic textual content presents considerable challenges due to the language's rich morphological structure and the wide variation among its dialects. This study aims to enhance classification accuracy by combining ensemble learning techniques with a deep bidirectional transformer-based model, specifically MARBERT, a BERT model pretrained on Arabic text. To address linguistic variability, three stemming methods were applied during preprocessing: Farasa, Tashaphyne, and Assem. The Al Khaleej dataset served as the basis for supervised learning, providing a representative sample of Arabic text. Furthermore, term frequency-inverse document frequency (TF-IDF) with bigram and trigram feature extraction was used to capture local word context. Experimental results indicate that the proposed approach, particularly with the integration of MARBERT, achieves a peak classification accuracy of 98.59%, outperforming existing models. This research underscores the efficacy of combining ensemble learning with deep transformer-based models for Arabic text classification and highlights the critical role of robust preprocessing techniques in managing linguistic complexity and improving model performance.
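As a rough illustration of the TF-IDF bigram and trigram features mentioned in the abstract, the following is a minimal standard-library sketch, not the authors' implementation. The function names `word_ngrams` and `tfidf_ngrams`, the smoothed IDF formula, the L2 normalisation, and the toy documents are all assumptions made for this example. The abstract does not state which weighting variant or library was used.

```python
import math
from collections import Counter


def word_ngrams(tokens, n_min=2, n_max=3):
    """Return all contiguous word n-grams with n_min <= n <= n_max."""
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf_ngrams(docs, n_min=2, n_max=3):
    """Compute TF-IDF weights over word n-grams for whitespace-tokenised docs.

    Assumed weighting: raw term frequency, smoothed IDF
    ln((1 + N) / (1 + df)) + 1, then L2 normalisation per document.
    """
    grams_per_doc = [word_ngrams(d.split(), n_min, n_max) for d in docs]
    n_docs = len(docs)

    # Document frequency: number of documents containing each n-gram.
    df = Counter()
    for grams in grams_per_doc:
        df.update(set(grams))
    idf = {g: math.log((1 + n_docs) / (1 + c)) + 1.0 for g, c in df.items()}

    vectors = []
    for grams in grams_per_doc:
        tf = Counter(grams)
        weights = {g: count * idf[g] for g, count in tf.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({g: w / norm for g, w in weights.items()} if norm else {})
    return vectors


# Toy placeholder documents (hypothetical, not taken from the Al Khaleej dataset).
docs = [
    "فريق كرة قدم فاز مباراة",
    "فريق كرة قدم خسر مباراة",
    "سوق مال ارتفع اليوم",
]
vectors = tfidf_ngrams(docs)
```

Each entry of `vectors` maps an n-gram to its weight for one document. N-grams shared by several documents receive a lower weight than n-grams unique to one document, which is what makes the features useful as input to a downstream classifier or ensemble. In practice, a library vectorizer such as scikit-learn's `TfidfVectorizer` with `ngram_range=(2, 3)` would likely replace this hand-rolled version.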

Keywords


Arabic text classification; Ensemble learning; Linguistic preprocessing; Machine learning; MARBERT; Stemming methods


DOI: http://doi.org/10.11591/ijai.v14.i6.pp5201-5217


Copyright (c) 2025 Rawad Awad Alqahtani, Hoda A. Abdelhafez

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

IAES International Journal of Artificial Intelligence (IJ-AI)
ISSN/e-ISSN 2089-4872/2252-8938 
This journal is published by the Institute of Advanced Engineering and Science (IAES).
