TMA-Net: a transformer-based multi-modal attention network for abnormal behavior detection
Abstract
Abnormal behavior detection in crowded environments remains challenging due to complex motion patterns, occlusions, and domain variability. This paper presents the transformer-based multi-modal attention network (TMA-Net), a unified framework that integrates red, green, and blue (RGB), optical flow (OF), and heat map (HM) modalities through a dual-stage attention fusion mechanism. The system employs you only look once version 11 (YOLOv11) for human localization and vision transformer (ViT)-B/16 for feature encoding, followed by intra-modal self-attention and cross-modal fusion to capture fine-grained spatial-temporal and motion-energy dependencies. Extensive experiments on six public benchmarks (UMN, Crowd-11, UBNormal, ShanghaiTech, CUHK Avenue, and UCSD Ped2) and the EPUAbN dataset demonstrate that TMA-Net achieves up to 97.5% area under the curve (AUC) and 96–100% accuracy, outperforming previous state-of-the-art approaches. These results highlight the framework's strong generalization and robustness across both single- and cross-dataset evaluations, underscoring its potential for reliable deployment in real intelligent surveillance systems.
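The dual-stage fusion described in the abstract can be illustrated with a minimal sketch, assuming PyTorch and 768-dimensional ViT-B/16 token embeddings; the layer choices, head counts, and concatenation-based cross-modal step are illustrative assumptions, not the authors' exact implementation:

import torch
import torch.nn as nn

class DualStageAttentionFusion(nn.Module):
    """Sketch: intra-modal self-attention per modality (RGB, OF, HM),
    then cross-modal attention over the combined token sequence.
    All dimensions and layers are assumptions for illustration only."""
    def __init__(self, dim=768, heads=8):
        super().__init__()
        # Stage 1: one self-attention block per modality
        self.intra = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
            for _ in range(3)
        )
        # Stage 2: cross-modal attention over the concatenated tokens
        self.cross = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.classifier = nn.Linear(dim, 2)  # normal vs. abnormal

    def forward(self, rgb, flow, heat):
        # Each input: (batch, tokens, dim), e.g. ViT-B/16 patch embeddings
        refined = [blk(x) for blk, x in zip(self.intra, (rgb, flow, heat))]
        fused_seq = torch.cat(refined, dim=1)      # join modality token sequences
        fused, _ = self.cross(fused_seq, fused_seq, fused_seq)
        return self.classifier(fused.mean(dim=1))  # pooled clip-level logits

# Example with random features standing in for ViT-B/16 outputs
x = [torch.randn(2, 197, 768) for _ in range(3)]
print(DualStageAttentionFusion()(*x).shape)  # torch.Size([2, 2])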
Keywords
Abnormal detection; Attention network; Convolutional neural network; Spatial-temporal; Transformer
Full Text: PDF
DOI: http://doi.org/10.11591/ijai.v15.i2.pp1441-1450
Copyright (c) 2026 Huong-Giang Doan, Ngoc-Trung Nguyen

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IAES International Journal of Artificial Intelligence (IJ-AI)
ISSN/e-ISSN 2089-4872/2252-8938
This journal is published by the Institute of Advanced Engineering and Science (IAES).