Analysis of language identification algorithms for regional Indonesian languages
Abstract
Detecting Indonesia's local languages is essential for recognizing linguistic diversity, promoting intercultural understanding, preserving endangered languages, and improving access to education and services. Identifying and documenting these languages supports preservation efforts, makes it possible to provide tailored resources for communities, and celebrates the cultural heritage of different ethnic groups, which in turn encourages a more accepting and open-minded society that values a variety of languages and cultural customs. This research aims to identify the most suitable algorithm for detecting Indonesian regional languages and to gain insight into their characteristics through n-gram analysis. In doing so, the study contributes to preserving Indonesia's cultural and linguistic heritage and to improving language detection techniques. It compares five algorithms (Naïve Bayes, K-nearest neighbors (KNN), least squares, Kullback-Leibler divergence, and the Kolmogorov-Smirnov test) to determine the most accurate and efficient method for language identification. Adding trigram features to unigrams and bigrams improved the model's performance, raising the F1 score from 0.923 to 0.959. Overall, using more features led to better accuracy, and Naïve Bayes and KNN emerged as the top-performing algorithms for language identification.
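To make the n-gram approach concrete, the following is a minimal sketch of a Naïve Bayes language identifier over unigram, bigram, and trigram features. It is not the authors' implementation: the abstract does not say whether the n-grams are character-level or word-level, so character n-grams are assumed here. The class name `NgramNaiveBayes`, the add-one smoothing, and the six toy training phrases (labelled `id`, `jv`, and `su` for Indonesian, Javanese, and Sundanese) are likewise illustrative assumptions.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_min=1, n_max=3):
    """Return all character n-grams of length n_min..n_max (assumed feature type)."""
    text = text.lower()
    return [text[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(text) - n + 1)]


class NgramNaiveBayes:
    """Multinomial Naive Bayes over character n-gram counts (hypothetical helper)."""

    def __init__(self, n_min=1, n_max=3, alpha=1.0):
        self.n_min, self.n_max, self.alpha = n_min, n_max, alpha

    def fit(self, texts, labels):
        self.counts = defaultdict(Counter)   # label -> n-gram counts
        self.doc_counts = Counter(labels)    # label -> number of training texts
        for text, label in zip(texts, labels):
            self.counts[label].update(char_ngrams(text, self.n_min, self.n_max))
        self.vocab = set().union(*self.counts.values())
        self.totals = {lab: sum(c.values()) for lab, c in self.counts.items()}
        return self

    def predict(self, text):
        grams = char_ngrams(text, self.n_min, self.n_max)
        n_docs = sum(self.doc_counts.values())
        scores = {}
        for label, counts in self.counts.items():
            # Log prior plus smoothed log likelihood of every n-gram in the text.
            score = math.log(self.doc_counts[label] / n_docs)
            denom = self.totals[label] + self.alpha * len(self.vocab)
            for g in grams:
                score += math.log((counts[g] + self.alpha) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Toy phrases for illustration only; they are not taken from the study's corpus.
texts = ["saya mau makan nasi", "apa kabar kamu hari ini",
         "aku arep mangan sega", "piye kabare sampeyan",
         "abdi bade tuang sangu", "kumaha damang sadayana"]
labels = ["id", "id", "jv", "jv", "su", "su"]

model = NgramNaiveBayes(n_min=1, n_max=3).fit(texts, labels)
print(model.predict("aku arep mangan sega"))
```

Setting `n_max=2` restricts the features to unigrams and bigrams, which is one way to reproduce the kind of with-and-without-trigram comparison the abstract reports, although on a corpus this small the difference would not be meaningful.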
Keywords
Algorithm comparison; K-nearest neighbors; Language identification; Naïve Bayes; N-gram feature
DOI: http://doi.org/10.11591/ijai.v13.i2.pp1741-1752
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IAES International Journal of Artificial Intelligence (IJ-AI)
ISSN/e-ISSN 2089-4872/2252-8938
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).