Reliable backdoor attack detection for various size of backdoor triggers

Yeongrok Rah, Youngho Cho

Abstract


Backdoor attack techniques have evolved to compromise the integrity of deep learning (DL) models. To defend against backdoor attacks, neural cleanse (NC) has been proposed as a promising backdoor attack detection method. NC detects the existence of a backdoor trigger by inserting perturbation into a benign image and then capturing the abnormality of the inserted perturbation. However, NC has a significant limitation: it fails to detect a backdoor trigger when the trigger size exceeds a certain threshold, which can be measured with the anomaly index (AI). To overcome this limitation, we propose a reliable backdoor attack detection method that detects backdoor attacks regardless of the backdoor trigger size. Specifically, our proposed method inserts perturbation into backdoor images to induce them to be classified into different labels and measures the abnormality of the perturbation. The underlying assumption is that the amount of perturbation required to reclassify a backdoor image to its ground-truth label will be abnormally small compared with the amount required for other labels. Through implementation and comparative experiments, we confirmed that this idea is valid and that our proposed method outperforms an existing backdoor detection method (NC) by 30 percentage points on average in terms of backdoor detection accuracy (BDA).
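As an illustration of how an anomaly index can flag an abnormally small perturbation, the following minimal sketch assumes a median-absolute-deviation (MAD) formulation with the usual normal-consistency constant 1.4826 and a cutoff of 2. The abstract does not specify these details, so the function names, the constant, the cutoff, and the sample values are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def anomaly_indices(perturbation_norms):
    """Return a MAD-based anomaly index for each label's perturbation size.

    Assumption: index = |x - median| / (1.4826 * MAD). This formulation is
    illustrative and is not taken from the abstract.
    """
    norms = np.asarray(perturbation_norms, dtype=float)
    median = np.median(norms)
    # Scale the MAD by 1.4826 so it is comparable to a standard deviation
    # when the values are roughly normally distributed.
    mad = 1.4826 * np.median(np.abs(norms - median))
    if mad == 0:
        # All deviations are (almost) identical; nothing stands out.
        return np.zeros_like(norms)
    return np.abs(norms - median) / mad


def flag_abnormally_small(perturbation_norms, threshold=2.0):
    """Return indices of labels whose perturbation is an outlier on the small side.

    The threshold of 2.0 is an assumed cutoff for this sketch.
    """
    norms = np.asarray(perturbation_norms, dtype=float)
    indices = anomaly_indices(norms)
    median = np.median(norms)
    return [
        i for i in range(len(norms))
        if indices[i] > threshold and norms[i] < median
    ]


# Hypothetical perturbation sizes for 10 labels; label 3 needs far less
# perturbation than the others.
example_norms = [95, 102, 98, 12, 101, 99, 104, 97, 100, 103]
suspicious_labels = flag_abnormally_small(example_norms)
```

With these made-up values, only label 3 is flagged, since its perturbation size lies far below the median while the other labels stay within the assumed cutoff.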

Keywords


Adversarial attacks; Adversarial defense method; Backdoor attacks; Backdoor defense method; Deep learning; Poisoning attacks



DOI: http://doi.org/10.11591/ijai.v14.i1.pp650-657

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

IAES International Journal of Artificial Intelligence (IJ-AI)
ISSN/e-ISSN 2089-4872/2252-8938 
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).
