Reliable backdoor attack detection for various sizes of backdoor triggers
Abstract
Backdoor attack techniques have evolved to compromise the integrity of deep learning (DL) models. To defend against backdoor attacks, neural cleanse (NC) has been proposed as a promising backdoor attack detection method. NC detects the existence of a backdoor trigger by inserting perturbation into benign images and then capturing the abnormality of the inserted perturbation. However, NC has a significant limitation: it fails to detect a backdoor trigger once the trigger size exceeds a certain threshold, as measured by the anomaly index (AI). To overcome this limitation, in this paper we propose a reliable backdoor attack detection method that detects backdoor attacks regardless of the backdoor trigger size. Specifically, the proposed method inserts perturbation into backdoored images to induce them to be classified into different labels and measures the abnormality of that perturbation. The underlying assumption is that the amount of perturbation required to reclassify backdoored images to their ground-truth label will be abnormally small compared to the amount required for the other labels. Through implementation and comparative experiments, we confirmed that this assumption holds and that the proposed method outperforms the existing backdoor detection method (NC) by an average of 30 percentage points in terms of backdoor detection accuracy (BDA).
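To make the reversed-perturbation idea concrete, the following is a minimal sketch in PyTorch with illustrative hyperparameters that are not taken from the paper: for each candidate label, a perturbation is optimized to reclassify a batch of possibly backdoored images to that label, and the label whose perturbation norm is abnormally small, as judged by a MAD-based anomaly index (the statistic NC also uses), is flagged. All function names and settings here are assumptions for illustration, not the authors' implementation.

# Hypothetical sketch: flag the label that is reachable with an abnormally
# small perturbation when perturbing (possibly backdoored) images.
import torch
import torch.nn.functional as F

def minimal_perturbation_norm(model, images, target_label, steps=200, lr=0.1, lam=1e-2):
    # Optimize a perturbation delta that reclassifies `images` as `target_label`
    # and return its L1 norm (illustrative optimizer settings).
    delta = torch.zeros_like(images, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    target = torch.full((images.size(0),), target_label,
                        dtype=torch.long, device=images.device)
    for _ in range(steps):
        logits = model(torch.clamp(images + delta, 0.0, 1.0))
        # Cross-entropy toward the target label plus an L1 penalty keeping delta small.
        loss = F.cross_entropy(logits, target) + lam * delta.abs().mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return delta.detach().abs().sum().item()

def anomaly_index(norms):
    # MAD-based anomaly index: deviation of the smallest perturbation norm
    # from the median, scaled by 1.4826 * median absolute deviation.
    norms = torch.tensor(norms, dtype=torch.float32)
    med = norms.median()
    mad = (norms - med).abs().median() * 1.4826
    return ((med - norms.min()) / mad).item()

# Usage sketch (model, images, and num_classes are assumed to be given):
# norms = [minimal_perturbation_norm(model, images, c) for c in range(num_classes)]
# print(anomaly_index(norms))

In this sketch, a candidate label whose anomaly index exceeds a threshold (NC commonly uses 2) would be treated as the suspected backdoor target label.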
Keywords
Adversarial attacks; Adversarial defense method; Backdoor attacks; Backdoor defense method; Deep learning; Poisoning attacks
DOI: http://doi.org/10.11591/ijai.v14.i1.pp650-657
IAES International Journal of Artificial Intelligence (IJ-AI)
ISSN/e-ISSN 2089-4872/2252-8938
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).