Vision transformer and hybrid models for Malayalam handwritten word recognition
Abstract
Transformer-based architectures and attention mechanisms have revolutionized image recognition. This study focuses on offline handwritten Malayalam word recognition and addresses the lack of publicly available datasets for this low-resource language. A new Malayalam word dataset (MWD) comprising 20,850 samples across 139 classes was developed to support research in this domain. The vision transformer (ViT) was employed for feature extraction, and four recognition models were evaluated: feed-forward neural network (FFNN), global average pooling (GAP), bidirectional long short-term memory (BiLSTM), and attention-based feed-forward neural network (AFFNN). Among these, AFFNN achieved the highest accuracy of 98.56%, establishing the proposed vision transformer-based attention handwritten word recognition (ViTA-HWR) model as a robust framework for handwritten Malayalam word recognition and a valuable contribution to regional language processing.
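As a rough illustration of the kind of head the abstract describes, the sketch below applies attention pooling over a sequence of token features and passes the pooled vector to a small feed-forward classifier with 139 outputs. It is not the authors' ViTA-HWR implementation: the abstract does not specify the attention form, layer sizes, or training setup, so the single-query dot-product attention, the toy dimensions, the random weights, and all function names here are assumptions made only for illustration. Random vectors stand in for ViT patch embeddings.

```python
import math
import random

NUM_CLASSES = 139  # class count reported in the abstract


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def rand_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these instead."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def attention_pool(tokens, query):
    """Weight each token by softmax(query . token) and sum (assumed attention form)."""
    weights = softmax([dot(query, t) for t in tokens])
    dim = len(tokens[0])
    pooled = [sum(w * t[i] for w, t in zip(weights, tokens)) for i in range(dim)]
    return pooled, weights


def ffnn_classify(x, w1, b1, w2, b2):
    """One hidden ReLU layer followed by a softmax over the classes."""
    hidden = [max(0.0, dot(row, x) + b) for row, b in zip(w1, b1)]
    logits = [dot(row, hidden) + b for row, b in zip(w2, b2)]
    return softmax(logits)


rng = random.Random(0)
num_tokens, dim, hidden_dim = 16, 8, 12      # toy sizes, not from the paper
tokens = rand_matrix(num_tokens, dim, rng)   # stand-in for ViT patch embeddings
query = [rng.gauss(0.0, 0.1) for _ in range(dim)]
w1, b1 = rand_matrix(hidden_dim, dim, rng), [0.0] * hidden_dim
w2, b2 = rand_matrix(NUM_CLASSES, hidden_dim, rng), [0.0] * NUM_CLASSES

pooled, weights = attention_pool(tokens, query)
probs = ffnn_classify(pooled, w1, b1, w2, b2)
predicted_class = max(range(NUM_CLASSES), key=probs.__getitem__)
```

With an all-zero query the attention weights become uniform, so the pooled vector reduces to the plain mean of the tokens. Under these assumptions, that makes global average pooling a special case of this sketch.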
Keywords
Attention mechanism; Feed-forward neural network; Handwritten word recognition; Malayalam word dataset; Vision transformer
DOI: http://doi.org/10.11591/ijai.v15.i3.pp2655-2663
Copyright (c) 2026 Anju Arangil Thazhath, Binu Poothakuzhiyil Chacko, Mohamed Basheer Kizhakke Parambath

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IAES International Journal of Artificial Intelligence (IJ-AI)
ISSN/e-ISSN 2089-4872/2252-8938
This journal is published by the Institute of Advanced Engineering and Science (IAES).