Arabic text classification using machine learning and deep learning algorithms
Abstract
The classification of Arabic textual content presents considerable challenges due to the language's rich morphological structure and the wide variation among its dialects. This study aims to enhance classification accuracy by leveraging ensemble learning techniques and a deep bidirectional transformer-based model, specifically the multilingual autoregressive BERT (MARBERT). To address linguistic variability, advanced preprocessing techniques were employed, including Farasa, Tashaphyne, and Assem stemming methods. The Al Khaleej dataset served as the basis for supervised learning, providing a representative sample of Arabic text. Furthermore, term frequency-inverse document frequency (TF-IDF) with bigram and trigram feature extraction was utilized to effectively capture contextual semantics. Experimental results indicate that the proposed approach, particularly with the integration of MARBERT, achieves a peak classification accuracy of 98.59%, outperforming existing models. This research underscores the efficacy of combining ensemble learning with deep transformer-based models for Arabic text classification and highlights the critical role of robust preprocessing techniques in managing linguistic complexity and improving model performance.
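To make the TF-IDF bigram and trigram features mentioned above concrete, the following is a minimal standard-library sketch, not the authors' implementation. The function names (`word_ngrams`, `tfidf_ngrams`), the two toy sentences, the whitespace tokenization, and the particular idf form (`log(N / df) + 1`) are all assumptions made for illustration; the study itself applies stemming before feature extraction and may use a different weighting variant.

```python
import math
from collections import Counter


def word_ngrams(tokens, n_min=2, n_max=3):
    """Return all contiguous word n-grams for n in [n_min, n_max]."""
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf_ngrams(docs, n_min=2, n_max=3):
    """Compute one {ngram: weight} dict per document.

    tf  = occurrences of the n-gram in the document / n-grams in the document
    idf = log(N / df) + 1, with df = number of documents containing the n-gram
    (an assumed weighting variant, chosen only for illustration)
    """
    # Whitespace tokenization is a simplification; real Arabic text would be
    # normalized and stemmed first.
    grams_per_doc = [word_ngrams(d.split(), n_min, n_max) for d in docs]
    n_docs = len(docs)

    # Document frequency: count each n-gram once per document.
    df = Counter()
    for grams in grams_per_doc:
        df.update(set(grams))

    vectors = []
    for grams in grams_per_doc:
        counts = Counter(grams)
        total = len(grams) or 1  # avoid division by zero for very short docs
        vectors.append({
            g: (c / total) * (math.log(n_docs / df[g]) + 1.0)
            for g, c in counts.items()
        })
    return vectors


# Hypothetical toy corpus (made-up sentences, not taken from the dataset).
toy_docs = [
    "فاز الفريق في المباراة",
    "خسر الفريق في المباراة",
]
toy_vectors = tfidf_ngrams(toy_docs)
```

In this sketch, n-grams shared by both toy sentences receive a lower weight than n-grams unique to one sentence, which is the property that lets a downstream classifier focus on discriminative phrases.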
Keywords
Arabic text classification; Ensemble learning; Linguistic preprocessing; Machine learning; MARBERT; Stemming methods
DOI: http://doi.org/10.11591/ijai.v14.i6.pp5201-5217
Copyright (c) 2025 Rawad Awad Alqahtani, Hoda A. Abdelhafez

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IAES International Journal of Artificial Intelligence (IJ-AI)
ISSN/e-ISSN 2089-4872/2252-8938
This journal is published by the Institute of Advanced Engineering and Science (IAES).