Enhanced multi-ethnic speech recognition using pitch shifting generative adversarial networks
Abstract
Speech recognition remains a challenging research area, and various approaches have been applied to build robust models. A common problem is overfitting, especially when there is insufficient data to train the model; a sufficiently large dataset allows the model to train well and reach high accuracy. Data augmentation is often used to increase the size of a dataset. This research uses pitch shifting as a data augmentation approach to enlarge the speech dataset, which is then converted into spectrogram data and classified using a generative adversarial network (GAN). The resulting pitch shifting-generative adversarial network (PS-GAN) model achieves an accuracy of 98.43% in multi-ethnic speech recognition, better than several similar studies.
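The augmentation and spectrogram steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the pitch-shifting algorithm, the semitone offsets, or the spectrogram settings, so the function names, the resampling-based shift, the offsets (-2, +2), and the values `n_fft=512` and `hop=128` are all assumptions chosen for illustration. The shift used here is a naive resampling that also changes the speed of the content, unlike a phase-vocoder pitch shifter.

```python
import numpy as np

def pitch_shift_resample(signal, n_steps):
    """Naive pitch shift by n_steps semitones (illustrative assumption).

    Resamples the signal by 2**(n_steps / 12) with linear interpolation,
    then zero-pads or trims back to the original length. This also
    changes the tempo of the content.
    """
    factor = 2.0 ** (n_steps / 12.0)
    n = len(signal)
    # Reading the signal `factor` times faster raises its pitch.
    src_positions = np.arange(0, n, factor)
    shifted = np.interp(src_positions, np.arange(n), signal)
    out = np.zeros(n, dtype=signal.dtype)
    m = min(n, len(shifted))
    out[:m] = shifted[:m]
    return out

def magnitude_spectrogram(signal, n_fft=512, hop=128):
    """Magnitude STFT with a Hann window; shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)).T

# Usage: enlarge a one-clip "dataset" with two shifted copies, then
# convert every clip to a spectrogram for a downstream classifier.
sr = 16000                                  # assumed sample rate
t = np.arange(sr) / sr
clip = np.sin(2 * np.pi * 440.0 * t)        # synthetic stand-in for speech
augmented = [clip] + [pitch_shift_resample(clip, s) for s in (-2, 2)]
spectrograms = [magnitude_spectrogram(x) for x in augmented]
```

Each original clip yields several training examples with the same label, which is how augmentation increases the quantity of data available to the classifier.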
DOI: http://doi.org/10.11591/ijai.v13.i3.pp2904-2911
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IAES International Journal of Artificial Intelligence (IJ-AI)
ISSN/e-ISSN 2089-4872/2252-8938
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).