Transformer-based Hindi image description and storytelling using enhanced attention and FastText embeddings
Abstract
This work presents a novel framework for Hindi image description generation that combines a Transformer-based encoder-decoder architecture with a custom squeeze-and-excitation (SE) attention block integrated into an EfficientNet feature extractor. The decoder uses FastText embeddings trained specifically for Hindi and is evaluated on the Microsoft common objects in context (MS-COCO) dataset. To extend the captioning process, the model incorporates a generative pre-trained transformer (GPT) module that produces narrative descriptions from the initial captions, and output quality is assessed with multiple similarity metrics. The proposed system outperforms existing methods, achieving high bilingual evaluation understudy (BLEU) scores (BLEU-1 to BLEU-4: 83.24, 73.17, 64.56, and 58.22), a consensus-based image description evaluation (CIDEr) score of 81.41, an F1 score of 90.29, and a metric for evaluation of translation with explicit ordering (METEOR) score of 81.18, indicating strong caption accuracy. The model also achieves low error rates, with a word error rate (WER) of 15% and a character error rate (CER) of 11%. This work highlights the challenges of applying large-scale datasets such as MS-COCO to resource-limited languages and demonstrates the effectiveness of integrating FastText embeddings with Transformer-based models for Hindi image captioning.
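The SE attention block mentioned in the abstract recalibrates channel responses of the convolutional feature map before they reach the Transformer decoder. The paper does not give its implementation details, so the following is a minimal NumPy sketch of a generic squeeze-and-excitation block (Hu et al.'s formulation): global average pooling (squeeze), a bottleneck MLP with a sigmoid gate (excitation), and channel-wise rescaling. All weights, shapes, and the reduction ratio here are illustrative assumptions, not the authors' configuration.

```python
import numpy as np

def se_block(feature_map, w1, b1, w2, b2):
    """Generic squeeze-and-excitation channel attention (illustrative sketch).

    feature_map: (C, H, W) activation tensor from a CNN backbone.
    w1, b1: bottleneck weights, shapes (C//r, C) and (C//r,).
    w2, b2: expansion weights, shapes (C, C//r) and (C,).
    """
    # Squeeze: global average pooling over the spatial dims -> (C,)
    z = feature_map.mean(axis=(1, 2))
    # Excitation: ReLU bottleneck followed by a sigmoid gate in (0, 1)
    s = np.maximum(w1 @ z + b1, 0.0)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ s + b2)))
    # Rescale: weight each input channel by its learned gate
    return feature_map * gate[:, None, None]

# Usage with random (untrained) weights; reduction ratio r = 4 is an assumption
rng = np.random.default_rng(0)
C, H, W, r = 8, 5, 5, 4
x = rng.standard_normal((C, H, W))
w1, b1 = rng.standard_normal((C // r, C)) * 0.1, np.zeros(C // r)
w2, b2 = rng.standard_normal((C, C // r)) * 0.1, np.zeros(C)
y = se_block(x, w1, b1, w2, b2)
```

Because the gate is a sigmoid, the block can only attenuate channels, never amplify them; in the paper's pipeline such a block would sit inside the EfficientNet extractor before the encoded features are passed to the Transformer.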
Keywords
Evaluation metrics; FastText embeddings; Hindi image; Squeeze-and-excitation; Transformer models
DOI: http://doi.org/10.11591/ijai.v15.i2.pp1771-1782
Copyright (c) 2026 Anjali Sharma, Mayank Aggarwal, Jitin Khanna

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IAES International Journal of Artificial Intelligence (IJ-AI)
ISSN/e-ISSN 2089-4872/2252-8938
This journal is published by the Institute of Advanced Engineering and Science (IAES).