Hindi spoken digit analysis for native and non-native speakers
Abstract
Automated speech recognition (ASR) is the process of using an algorithm or automated system to recognize and transcribe the spoken words of a given language. ASR has applications in fields such as mobile speech interfaces, the Internet of Things, and human-machine interaction, and researchers have been working on ASR-related problems for more than 60 years. One of its many use cases is building applications, such as digit recognition, that assist differently-abled individuals, children, and elderly people. However, spoken-language data is scarce for low-resource languages, which makes building such systems difficult. While this is not a pivotal issue for well-established languages like English, it significantly affects less commonly spoken languages. In this paper, we discuss the development of a Hindi spoken-digit dataset and benchmark spoken-digit recognition models built with convolutional neural networks (CNNs). The dataset includes both native and non-native Hindi speakers. The CNN models achieve accuracies of 88.44%, 95.15%, and 89.41% for non-native, native, and combined speakers, respectively.
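The abstract reports CNN accuracies but does not spell out the recognition pipeline here. The sketch below illustrates one plausible setup, assuming MFCC features extracted with librosa and a small Keras CNN with ten output classes; the feature dimensions, layer sizes, and file paths are illustrative assumptions, not the authors' published architecture.

```python
# Minimal sketch of a CNN spoken-digit classifier (assumed setup, not the
# authors' exact architecture): MFCC features of a short utterance are fed
# to a small 2-D convolutional network with 10 output classes (digits 0-9).
import numpy as np
import librosa
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_DIGITS = 10          # Hindi digits 0-9
N_MFCC = 40              # number of MFCC coefficients per frame (assumption)
MAX_FRAMES = 100         # utterances padded/truncated to a fixed length

def wav_to_mfcc(path: str, sr: int = 16000) -> np.ndarray:
    """Load a WAV file and return a fixed-size (N_MFCC, MAX_FRAMES, 1) MFCC map."""
    audio, _ = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=N_MFCC)
    # Pad or truncate along the time axis so every example has the same shape.
    if mfcc.shape[1] < MAX_FRAMES:
        mfcc = np.pad(mfcc, ((0, 0), (0, MAX_FRAMES - mfcc.shape[1])))
    else:
        mfcc = mfcc[:, :MAX_FRAMES]
    return mfcc[..., np.newaxis]

def build_model() -> tf.keras.Model:
    """Small CNN: two conv/pool stages followed by a dense softmax classifier."""
    model = models.Sequential([
        layers.Input(shape=(N_MFCC, MAX_FRAMES, 1)),
        layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dropout(0.3),
        layers.Dense(128, activation="relu"),
        layers.Dense(NUM_DIGITS, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Usage (paths and labels are placeholders for the Hindi digit recordings):
# X = np.stack([wav_to_mfcc(p) for p in wav_paths])
# y = np.array(digit_labels)          # integer labels 0-9
# build_model().fit(X, y, epochs=30, validation_split=0.2)
```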
DOI: http://doi.org/10.11591/ijai.v14.i2.pp1561-1567
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IAES International Journal of Artificial Intelligence (IJ-AI)
ISSN/e-ISSN 2089-4872/2252-8938
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).