AraBERT transformer model for Arabic comments and reviews analysis

Hicham EL Moubtahij, Hajar Abdelali, El Bachir Tazi

Abstract


Arabic is rich and complex in word morphology compared with Latin-script languages. Recently, much research in natural language processing (NLP) has targeted Arabic language understanding (ALU). In this context, this work presents our approach based on the Arabic bidirectional encoder representations from transformers (AraBERT) model and describes its main steps in detail. The input text is first pre-processed and then segmented using the Farasa segmentation technique. Next, the AraBERT model is implemented with the pertinent parameters. The approach is evaluated on the ARev dataset, which contains more than 40,000 comment and remark records related to the tourism sector, such as hotel reviews, restaurant reviews and others. The results are compared in detail with other relevant state-of-the-art methods; the comparison shows that our approach is competitive and that its results can serve as a guide for further improvements in this field.
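
As an illustration of the pre-processing stage only, the following is a minimal sketch in Python. The abstract does not list the exact cleaning rules, so the steps shown here (dropping URLs, Arabic diacritics and tatweel, then collapsing whitespace) and the function name preprocess are assumptions chosen for illustration, not the authors' implementation. Farasa segmentation and the AraBERT model itself are not shown.

```python
import re

# Assumed cleaning rules (not taken from the paper):
# Arabic short-vowel marks U+064B..U+0652 plus superscript alef U+0670.
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")
# Tatweel (kashida), a purely typographic elongation character.
_TATWEEL = "\u0640"
_URL = re.compile(r"https?://\S+")
_WHITESPACE = re.compile(r"\s+")


def preprocess(text: str) -> str:
    """Hypothetical helper: normalize a raw Arabic review before segmentation."""
    text = _URL.sub(" ", text)           # drop links
    text = _DIACRITICS.sub("", text)     # strip diacritics
    text = text.replace(_TATWEEL, "")    # strip tatweel
    return _WHITESPACE.sub(" ", text).strip()  # collapse runs of whitespace


if __name__ == "__main__":
    raw = "\u0627\u0644\u0641\u064F\u0646\u0652\u062F\u064F\u0642  \u062C\u0645\u064A\u0640\u0640\u0640\u0644 http://example.com"
    print(preprocess(raw))
```

In a full pipeline, the cleaned string would then be passed to a segmenter and a tokenizer before classification; which rules actually help depends on the dataset and is left open here.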

Keywords


AraBERT; Arabic language understanding; Farasa segmentation; Natural language processing


DOI: http://doi.org/10.11591/ijai.v11.i1.pp379-387

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.