Indonesian news article authorship attribution multilabel multiclass classification using IndoBERT
Abstract
Recent developments in technology have made it easier to produce digital content, especially textual articles. A negative side effect is rising public skepticism toward digital data because of plagiarism. Indonesia, one of the world's most populous countries, is not immune to this problem. Authorship attribution (AA), the task of identifying the author of a text, is one way to address it. However, AA for Indonesian articles has received little investigation. This research therefore applies the AA task to a dataset of Indonesian digital news articles. Building on previous research, the dataset was modified to increase its complexity by adding a new class, the author's gender, and by balancing the distribution of data across labels to reduce potential overfitting. The model's hyper-parameter configuration was also tuned to improve the results. This research successfully applied the IndoBERT model to the Indonesian AA task, achieving precision = 0.92, recall = 0.90, and F1-score = 0.91. These results indicate that the Indonesian AA task has considerable potential for further development, because it identifies writing patterns that may support forensic work, plagiarism detection, and the analysis of Indonesian texts.
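As a quick consistency check on the reported metrics, the sketch below recomputes the F1-score as the harmonic mean of the reported precision and recall. The function name `f1_from_precision_recall` is hypothetical and introduced only for illustration. The abstract does not state how the scores were averaged across classes; if they are macro-averaged, the aggregate F1 need not equal the harmonic mean of the aggregate precision and recall, so this is an approximation under that assumption rather than the authors' evaluation procedure.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract.
reported_precision = 0.92
reported_recall = 0.90

f1 = f1_from_precision_recall(reported_precision, reported_recall)
print(round(f1, 2))  # prints 0.91
```

Under this assumption, the recomputed value agrees with the reported F1-score of 0.91.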
DOI: http://doi.org/10.11591/ijai.v13.i4.pp4688-4694
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IAES International Journal of Artificial Intelligence (IJ-AI)
ISSN/e-ISSN 2089-4872/2252-8938
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).