Deep feature synthesis approach using selective graph attention for replay attack voice spoofing detection
Abstract
As voice-based authentication becomes increasingly integrated into security frameworks, effective defenses against voice spoofing, particularly replay attacks, are more crucial than ever. This paper presents a comprehensive framework for replay attack detection that integrates spectral-temporal feature extraction with graph-based feature processing. The proposed system comprises a waveform encoder and a novel temporal residual unit that extract spectral and temporal features synchronously. A selective graph attention approach followed by multi-scale feature synthesis is then employed to retain precise, spoof-indicative feature vectors at the classification layer. The method addresses the significant challenge of distinguishing genuine speech from replayed recordings. The model is validated on the ASVspoof 2019 dataset. The proposed system outperforms existing methods, achieving a lower equal error rate (EER) of 0.015 and a reduced tandem detection cost function (t-DCF) of 0.503. The comparative results demonstrate the robustness of the method in identifying replay attacks.
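For readers unfamiliar with the EER metric reported above, the following is a minimal sketch of how an equal error rate is commonly computed from detection scores. It is not the authors' evaluation code: the function name `compute_eer`, the convention that higher scores mean "more likely genuine", and the toy scores are all assumptions made for illustration. The t-DCF additionally depends on the error rates of a speaker verification system and on cost parameters, so it is not sketched here.

```python
import numpy as np


def compute_eer(bonafide_scores, spoof_scores):
    """Return (eer, threshold) for two sets of detection scores.

    Assumption: a higher score means the trial is more likely genuine,
    and a trial is accepted when its score is >= the threshold.
    """
    bonafide = np.asarray(bonafide_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)

    # Candidate thresholds: every observed score, plus one value above
    # the maximum so that the false acceptance rate can reach zero.
    thresholds = np.unique(np.concatenate([bonafide, spoof]))
    thresholds = np.append(thresholds, thresholds[-1] + 1.0)

    # False acceptance rate: fraction of spoofed trials that are accepted.
    far = np.array([(spoof >= t).mean() for t in thresholds])
    # False rejection rate: fraction of genuine trials that are rejected.
    frr = np.array([(bonafide < t).mean() for t in thresholds])

    # The EER is taken where the two error rates are closest to equal.
    idx = int(np.argmin(np.abs(far - frr)))
    return (far[idx] + frr[idx]) / 2.0, thresholds[idx]


if __name__ == "__main__":
    # Toy scores, invented for illustration only.
    genuine = [0.9, 0.8, 0.7, 0.6]
    replayed = [0.1, 0.2, 0.3, 0.65]
    eer, threshold = compute_eer(genuine, replayed)
    print(f"EER = {eer:.3f} at threshold {threshold:.2f}")
```

With a finite set of scores the two error curves rarely cross exactly, so this sketch averages the two rates at the closest threshold; evaluation toolkits may interpolate instead and can give slightly different values.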
Keywords
Automatic speaker verification voice spoofing; Deep learning; Graph attention; Replay attack detection; Spectral-temporal feature
DOI: http://doi.org/10.11591/ijai.v13.i4.pp4915-4926
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IAES International Journal of Artificial Intelligence (IJ-AI)
ISSN/e-ISSN 2089-4872/2252-8938
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).