From audio to image: gunshot classification using Mel spectrogram convolutional neural networks
Abstract
Accurate identification of firearm types from acoustic signals is essential for modern public safety and forensic applications. Traditional gunshot analysis methods often rely on physical evidence or handcrafted audio features, which can be unreliable under noisy and reverberant conditions. This study presents a systematic investigation of gunshot sound classification using Mel spectrogram representations and convolutional neural networks (CNNs). Raw audio signals are transformed into Mel spectrogram images, enabling firearm classification to be formulated as an image recognition problem. Thirteen CNN architectures, ranging from lightweight to deep models, are evaluated under a unified experimental protocol to analyze both classification performance and computational efficiency. Experiments are conducted on a publicly available multi-firearm dataset recorded in semi-controlled real-world environments. The results demonstrate that Mel spectrogram–based CNN models achieve classification accuracy exceeding 94%, while moderate-complexity architectures provide a favorable balance between accuracy and efficiency. The findings highlight the importance of representation–architecture alignment and offer practical design guidelines for selecting deployable CNN models in real-time gunshot detection systems.
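The step of turning raw audio into a Mel spectrogram image can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not state the sample rate, frame length, hop size, or number of Mel bands, so the values below (8 kHz, 64-sample frames, 32-sample hop, 8 bands) are assumptions chosen only to keep the example small. The HTK-style Mel formula, the naive DFT, and the synthetic decaying tone standing in for a recording are likewise illustrative choices, and all function names are hypothetical.

```python
import cmath
import math


def hz_to_mel(f_hz):
    """HTK-style Mel scale (one common convention, assumed here)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters spaced evenly on the Mel scale from 0 Hz to Nyquist."""
    mel_max = hz_to_mel(sample_rate / 2.0)
    # n_mels + 2 edge points: filter m uses edges m, m + 1, m + 2.
    edges_hz = [mel_to_hz(mel_max * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for m in range(n_mels):
        left, centre, right = edges_hz[m], edges_hz[m + 1], edges_hz[m + 2]
        row = []
        for f in bin_hz:
            rising = (f - left) / (centre - left)
            falling = (right - f) / (right - centre)
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank


def power_spectrum(frame):
    """Power spectrum of one Hann-windowed frame via a naive O(n^2) DFT."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2.0 * math.pi * i / n))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(windowed))
        spectrum.append(abs(s) ** 2)
    return spectrum


def log_mel_spectrogram(signal, sample_rate, n_fft=64, hop=32, n_mels=8):
    """Return a list of frames, each holding n_mels log-energies in dB."""
    bank = mel_filterbank(n_mels, n_fft, sample_rate)
    image = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        spectrum = power_spectrum(signal[start:start + n_fft])
        energies = [sum(w * p for w, p in zip(row, spectrum)) for row in bank]
        # Small offset avoids log10(0) on silent bands.
        image.append([10.0 * math.log10(e + 1e-10) for e in energies])
    return image


# Synthetic stand-in for a recording: a decaying 1 kHz burst (assumed data).
sample_rate = 8000
signal = [math.exp(-i / 200.0) * math.sin(2.0 * math.pi * 1000.0 * i / sample_rate)
          for i in range(512)]
image = log_mel_spectrogram(signal, sample_rate)
print(len(image), "frames x", len(image[0]), "Mel bands")
```

The result is a time-by-frequency grid of log-energies. Transposed and rescaled to pixel intensities, a grid of this kind is what an image-classification CNN would take as input. A practical system would replace the naive DFT with an optimized FFT and use far more frames and Mel bands.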
Keywords
Audio classification; Convolutional neural networks; Deep learning; Gunshot detection; Machine learning; Mel spectrogram; Spectrogram analysis
DOI: http://doi.org/10.11591/ijai.v15.i3.pp2166-2180
Copyright (c) 2026 Peerapol Khunarsa, Pafan Doungpaisan

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IAES International Journal of Artificial Intelligence (IJ-AI)
ISSN/e-ISSN 2089-4872/2252-8938
This journal is published by the Institute of Advanced Engineering and Science (IAES).