Improving Indonesian multiethnic speaker recognition using pitch shifting data augmentation

Kristiawan Nugroho, Isworo Nugroho, De Rosal Igniatus Moses Setiadi, Omar Farooq


Recognizing speakers from multiple ethnic groups is an active topic in speaker recognition research. Studies that involve many ethnicities need a suitable approach to achieve good model performance. Deep learning has been applied to speaker recognition tasks with many classes and has produced promising, high-accuracy results. However, multi-class and imbalanced datasets remain obstacles for deep learning methods, causing overfitting and decreased accuracy. Data augmentation is an approach for overcoming the problems of small datasets and multi-class data; depending on the method applied, it can improve the quality of the research data. This study proposes a data augmentation method that combines pitch shifting with a deep neural network, called pitch shifting data augmentation deep neural network (PSDA-DNN), to identify multiethnic Indonesian speakers. The experimental results show that PSDA-DNN is the best-performing method among those evaluated for multiethnic speaker recognition, with accuracy reaching 99.27% and precision, recall, and F1 score of 97.60%.
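
The idea of pitch shifting as data augmentation can be illustrated with a minimal sketch. This is not the authors' PSDA-DNN pipeline: the function names, the semitone steps, and the resampling technique are all assumptions made for illustration. The sketch shifts pitch naively by resampling with linear interpolation, which also changes the clip duration; dedicated audio libraries typically shift pitch while preserving duration.

```python
import math


def pitch_shift_resample(samples, n_steps):
    """Naively shift pitch by n_steps semitones via resampling.

    Hypothetical helper, not from the paper. Reading the signal
    2**(n_steps / 12) times faster raises the pitch by n_steps
    semitones, but also shortens the clip (and vice versa).
    """
    factor = 2.0 ** (n_steps / 12.0)
    out_len = int(len(samples) / factor)
    out = []
    for i in range(out_len):
        pos = i * factor                      # fractional read position
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        # Linear interpolation between the two neighbouring samples.
        out.append((1.0 - frac) * samples[lo] + frac * samples[hi])
    return out


def augment(samples, steps=(-2, -1, 1, 2)):
    """Return the original clip plus one pitch-shifted copy per step.

    The default steps are an assumed choice, not the paper's settings.
    """
    return [samples] + [pitch_shift_resample(samples, s) for s in steps]
```

Applied to each utterance of an under-represented class, `augment` would turn one recording into five training examples that share the same speaker label, which is how augmentation of this kind enlarges a small or imbalanced dataset.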


Data augmentation; Deep learning; Pitch shifting; Speaker recognition


Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

IAES International Journal of Artificial Intelligence (IJ-AI)
ISSN/e-ISSN 2089-4872/2252-8938 
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).
