Artificial intelligence multilingual image-to-speech for accessibility and text recognition

Rosalina Rosalina; Hasanul Fahmi; Genta Sahuri

doi:10.11591/ijai.v14.i3.pp1743-1751

Abstract

Visually impaired and illiterate individuals face a primary challenge in accessing and understanding visual content, which hinders their ability to navigate environments and engage with text-based information. This research addresses the problem by implementing an artificial intelligence (AI)-powered multilingual image-to-speech technology that converts text in images into audio descriptions. The system combines optical character recognition (OCR) with text-to-speech (TTS) synthesis, using natural language processing (NLP) and digital signal processing (DSP) to generate spoken output in multiple languages. In accuracy testing, the system achieved high precision and recall and an average accuracy of 0.976, indicating that it is effective in real-world applications. The technology enhances accessibility, improving quality of life for visually impaired individuals and offering a scalable solution for illiterate populations. The results also provide insights for refining OCR accuracy and expanding multilingual support.
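
The OCR-then-TTS flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract names no libraries, so the `ImageToSpeech` class, the `OcrFn` and `TtsFn` signatures, and the stub backends `fake_ocr` and `fake_tts` are all hypothetical stand-ins for whatever OCR and TTS engines a real system would plug in.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical backend signatures (assumptions, not from the paper).
OcrFn = Callable[[bytes, str], str]   # (image bytes, language code) -> recognized text
TtsFn = Callable[[str, str], bytes]   # (text, language code) -> audio bytes


@dataclass
class ImageToSpeech:
    """Chains an OCR backend and a TTS backend for one language at a time."""
    ocr: OcrFn
    tts: TtsFn

    def run(self, image: bytes, lang: str) -> bytes:
        text = self.ocr(image, lang)
        # Collapse line breaks and repeated spaces left over from OCR layout.
        text = " ".join(text.split())
        if not text:
            raise ValueError("no text recognized in image")
        return self.tts(text, lang)


# Stub backends for illustration only; a real system would call actual engines.
def fake_ocr(image: bytes, lang: str) -> str:
    # Pretends the image bytes already are the recognized text.
    return image.decode("utf-8")


def fake_tts(text: str, lang: str) -> bytes:
    # Pretends to synthesize audio by tagging the text with its language.
    return f"[{lang}] {text}".encode("utf-8")


pipeline = ImageToSpeech(ocr=fake_ocr, tts=fake_tts)
audio = pipeline.run(b"EXIT\n  THIS WAY", "en")
```

Keeping the two stages behind plain callables means a language can be added by swapping in backends that support it, without touching the pipeline itself.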

Keywords

Image-to-speech; Multilingual audio descriptions; Natural language processing; Optical character recognition; Text-to-speech

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

IAES International Journal of Artificial Intelligence (IJ-AI)
ISSN/e-ISSN 2089-4872/2252-8938
This journal is published by the Institute of Advanced Engineering and Science (IAES).
