Hindi spoken digit analysis for native and non-native speakers

Parabattina Bhagath, Malempati Shanmukha, Pradip K. Das

Abstract


Automated speech recognition (ASR) is the process of using an algorithm or automated system to recognize the spoken words of a specific language and convert them into text. ASR has applications in fields such as mobile speech recognition, the Internet of Things, and human-machine interaction, and researchers have been working on ASR-related problems for more than 60 years. One use case of ASR is digit recognition, which can support applications for differently-abled individuals, children, and elderly people. However, under-developed and low-resourced languages lack spoken language data, which makes building such systems difficult. This is not a pivotal issue for well-established languages such as English, but it has a significant impact on less commonly spoken languages. In this paper, we describe the development of a Hindi spoken digit dataset and benchmark spoken digit recognition models built with convolutional neural networks (CNNs). The dataset includes both native and non-native Hindi speakers. The CNN models achieve accuracies of 88.44%, 95.15%, and 89.41% for non-native, native, and combined speakers, respectively.
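The abstract does not describe the feature front end, but the keywords name Mel frequency cepstral coefficients. As a hedged illustration only, the following numpy sketch computes a standard MFCC matrix from a waveform: pre-emphasis, Hamming-windowed framing, power spectrum, triangular mel filterbank, log, and an orthonormal DCT-II. The helper names (`mfcc`, `mel_filterbank`, `dct_ii_matrix`) are hypothetical, and the 16 kHz sample rate, 25 ms / 10 ms framing, 26 filters, and 13 coefficients are common textbook defaults assumed here, not settings reported by the authors. The resulting frames-by-coefficients matrix is the kind of 2-D input a CNN classifier can treat like a single-channel image.

```python
import numpy as np

# Hypothetical helpers; parameter defaults are assumptions, not the paper's settings.


def hz_to_mel(f):
    # HTK-style mel scale
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters evenly spaced on the mel scale, shape (n_filters, n_fft // 2 + 1)."""
    n_bins = n_fft // 2 + 1
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_bins))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):  # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank


def dct_ii_matrix(n_out, n_in):
    """First n_out rows of the orthonormal DCT-II matrix of size n_in."""
    n = np.arange(n_in)
    k = np.arange(n_out)[:, None]
    mat = np.cos(np.pi * k * (2 * n + 1) / (2 * n_in))
    mat[0] *= np.sqrt(1.0 / n_in)
    mat[1:] *= np.sqrt(2.0 / n_in)
    return mat


def mfcc(signal, sample_rate=16000, frame_ms=25, hop_ms=10,
         n_fft=512, n_filters=26, n_ceps=13, preemph=0.97):
    """Return an (n_frames, n_ceps) MFCC matrix for a 1-D waveform."""
    signal = np.asarray(signal, dtype=float)
    # Pre-emphasis boosts high frequencies: y[t] = x[t] - a * x[t-1]
    emphasized = np.append(signal[0], signal[1:] - preemph * signal[:-1])
    frame_len = int(round(sample_rate * frame_ms / 1000))
    hop = int(round(sample_rate * hop_ms / 1000))
    if len(emphasized) < frame_len:
        emphasized = np.pad(emphasized, (0, frame_len - len(emphasized)))
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    # Index matrix: row j holds the sample indices of frame j
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)
    power = (np.abs(np.fft.rfft(frames, n=n_fft)) ** 2) / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(np.maximum(energies, 1e-10))  # floor avoids log(0)
    return log_energies @ dct_ii_matrix(n_ceps, n_filters).T


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    tone = 0.5 * np.sin(2 * np.pi * 440.0 * t)  # 1 s synthetic stand-in for a recorded digit
    feats = mfcc(tone, sample_rate=sr)
    print(feats.shape)  # (98, 13)
```

In a digit recognizer along these lines, each utterance would typically be padded or trimmed to a fixed number of frames so that every MFCC matrix has the same shape before being batched into a CNN.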


Keywords


Convolutional neural networks; Digit recognition; Hindi speech; Mel frequency cepstral coefficients; Under-resourced speech recognition

DOI: http://doi.org/10.11591/ijai.v14.i2.pp1561-1567

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

IAES International Journal of Artificial Intelligence (IJ-AI)
ISSN/e-ISSN 2089-4872/2252-8938 
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).
