ruoxi-jia-group / narcissus

The official implementation of the ACM CCS'23 paper. The Narcissus clean-label backdoor attack takes only THREE images to poison a face recognition dataset in a clean-label way and achieves a 99.89% attack success rate.

Home Page: https://arxiv.org/pdf/2204.05255.pdf

License: MIT License

Languages: Jupyter Notebook 20.17%, Python 79.83%
Topics: deep-, adversarial-attacks, adversarial-machine-learning, ai-security, backdoor-attacks, poisoning-attacks

narcissus's Introduction

[Banner image: Narcissus-Caravaggio]

Narcissus Clean-label Backdoor Attack

Python 3.6 | PyTorch 1.10.1 | CUDA 11.0

This is the official implementation of the ACM CCS'23 paper: "Narcissus: A Practical Clean-Label Backdoor Attack with Limited Information."

The Narcissus clean-label backdoor attack gives an affirmative answer to whether backdoor attacks can present real threats even without the label manipulation or strong access to non-target-class samples that such attacks normally require. This work demonstrates a simple yet powerful attack that needs access only to the target class and makes minimal assumptions about the attacker's knowledge and capability.

In our ACM CCS'23 paper, we show that by inserting maliciously crafted Narcissus poisoned examples, totaling less than 0.5% of the target-class data (or 0.05% of the full training set), we can manipulate a model trained on the poisoned dataset to classify test examples from arbitrary classes into the target class whenever those examples are patched with the backdoor trigger; at the same time, the trained model maintains good accuracy on typical test examples without the trigger, as if it were trained on a clean dataset.

The Narcissus backdoor attack is highly effective across datasets and models, even when the trigger is injected into the physical world (see the GIF demo or the full video demo). Most surprisingly, our attack can evade the latest state-of-the-art defenses in its vanilla form, or, with a simple twist, it can be adapted to downstream defenses. We study the cause of this intriguing effectiveness and find that, because the trigger synthesized by our attack contains features as persistent as the original semantic features of the target class, any attempt to remove the trigger inevitably hurts model accuracy first.

Features

  • Clean-label backdoor attack
  • Low poison rate (can be less than 0.05%)
  • All-to-one attack
  • Requires only target-class data
  • Physical-world attack
  • Works when models are trained from scratch

Requirements

  • Python >= 3.6
  • PyTorch >= 1.10.1
  • TorchVision >= 0.11.2
  • OpenCV >= 4.5.3
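
A quick way to confirm that an environment meets these versions (a convenience sketch, not part of the repository):

# Convenience sketch: print installed versions to compare against the list above.
import cv2
import torch
import torchvision

print("PyTorch", torch.__version__, "| TorchVision", torchvision.__version__)
print("OpenCV", cv2.__version__, "| CUDA available:", torch.cuda.is_available())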

Usage & HOW-TO

Use the Narcissus.ipynb notebook for a quick start with our NARCISSUS backdoor attack. The default attack and defense settings both use ResNet-18 as the model and CIFAR-10 as the target dataset, and the default poisoning rate is 0.5% in-class (0.05% overall).

There are several optional arguments in Narcissus.ipynb (an illustrative assignment sketch follows the list):

  • lab : The index of the target label.
  • l_inf_r : Radius of the L-inf ball that constrains the trigger's stealthiness.
  • surrogate_model, generating_model : The models used to generate the trigger.
  • surrogate_epochs : The number of epochs for surrogate model training.
  • warmup_round : The number of epochs for poi-warm-up training.
  • gen_round : The number of epochs for poison generation.
  • patch_mode : Change this parameter to switch to the patch-trigger mode.
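
For orientation, here is a hedged example of how these arguments might be set; the values are illustrative, and the actual defaults in Narcissus.ipynb may differ:

# Illustrative values only -- names follow the list above; check Narcissus.ipynb for the real defaults.
lab = 2                  # target label index
l_inf_r = 16 / 255       # L-inf radius bounding the trigger (stealthiness budget)
surrogate_epochs = 200   # epochs of surrogate model training
warmup_round = 5         # epochs of poi-warm-up training
gen_round = 1000         # iterations of poison (trigger) generation
patch_mode = False       # set True to switch to the patch-trigger mode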

Overall Workflow:

[Figure: Narcissus attack workflow]

The workflow of the Narcissus attack consists of four functional parts (using PubFig as an example); a minimal code sketch of Step 2 follows the list:

  • Step 1: Poi-warm-up: acquire a surrogate model from a POOD-data-pre-trained model using only the target-class samples.

  • Step 2: Trigger Generation: deploy the surrogate model from the poi-warm-up as a feature extractor to synthesize the inward-pointing noise based on the target-class samples.

  • Step 3: Trigger Insertion: use the Narcissus trigger to poison a small number of target-class samples.

  • Step 4: Test Query Manipulation: magnify the Narcissus trigger at test time to manipulate the model's predictions.
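
To make Step 2 concrete, here is a minimal, self-contained PyTorch sketch of the trigger-generation idea under simplifying assumptions: a freshly initialized ResNet-18 stands in for the poi-warm-up surrogate and random tensors stand in for target-class images, so it illustrates the optimization loop rather than reproducing the repository's actual code:

import torch
import torchvision

# Stand-ins: in the real attack, `surrogate` is the poi-warm-up model and
# `target_images` is a batch of target-class (e.g., class 2) CIFAR-10 samples.
surrogate = torchvision.models.resnet18(num_classes=10).eval()
target_images = torch.rand(16, 3, 32, 32)
target_labels = torch.full((16,), 2, dtype=torch.long)

l_inf_r = 16 / 255                                    # L-inf stealthiness budget
noise = torch.zeros(1, 3, 32, 32, requires_grad=True)
optimizer = torch.optim.Adam([noise], lr=0.01)
criterion = torch.nn.CrossEntropyLoss()

for _ in range(100):                                  # gen_round iterations in practice
    optimizer.zero_grad()
    logits = surrogate(torch.clamp(target_images + noise, 0, 1))
    loss = criterion(logits, target_labels)           # pull inputs toward the target class
    loss.backward()
    optimizer.step()
    with torch.no_grad():
        noise.clamp_(-l_inf_r, l_inf_r)               # project back into the L-inf ball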

Can you make it easier?

By importing the narcissus_func.py file, users can quickly deploy the Narcissus backdoor attack in their own attack environment via the narcissus_gen() function. This function takes two parameters:

  • dataset_path : The dataset folder for CIFAR-10 (target dataset) and Tiny ImageNet (POOD dataset)
  • lab: The index of the target label (e.g., '2')
# How to launch the attack with the push of ONE button?
narcissus_trigger = narcissus_gen(dataset_path = './dataset', lab = 2)

This function returns a [1, 3, 32, 32] NumPy array containing the Narcissus backdoor trigger generated from only the target class (e.g., '2'). DO NOT forget to use the trigger to poison some target-class samples and launch the attack ;)
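
As a hedged illustration of that last step, the sketch below shows one plausible way to apply the returned trigger, assuming images lie in [0, 1] and the trigger is an additive perturbation; the test-time multiplier is illustrative rather than the paper's exact setting:

import numpy as np

def apply_trigger(images, trigger, multiplier=1.0):
    # images: [N, 3, 32, 32] floats in [0, 1]; trigger: the [1, 3, 32, 32] array from narcissus_gen()
    return np.clip(images + multiplier * trigger, 0.0, 1.0)

# Training time: poison a small number of target-class samples with the plain trigger.
# poisoned = apply_trigger(target_class_images, narcissus_trigger)

# Test time (Step 4): magnify the trigger on arbitrary-class queries.
# patched_queries = apply_trigger(test_images, narcissus_trigger, multiplier=3.0)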

Special thanks to...

All the stargazers and forkers of @ruoxi-jia-group/Narcissus.

narcissus's People

Contributors

pmzzs, scccc21, yizeng623


narcissus's Issues

A small question about how the attack works

After reading through the example, can I simply think of it as training a model to become addicted to one target label, so that when predicting non-target samples with this noise added, the poisoned model outputs the target label, achieving the backdoor attack?

How to implement the attack on a single-channel dataset

Thank you for open-sourcing the code. I wonder how to perform attacks on single-channel datasets such as MNIST and Fashion-MNIST. Additionally, how does the selection of the POOD dataset apply in these cases?
Thank you for your time and assistance.

Every time I run the code, the ASR values are different and vary widely

Hi, I'm very interested in the methodology of your paper. But every time I run the code, the ASR results are quite different (78.14%, 83%, 95.22%, 100%, ...), whether with your trigger (resnet18_trigger.npy) or a trigger I optimized using your code (best_noise_xxxx.npy).
I would like to know the reason for the differences in the experimental results: is it the random seed or something else?
Were the results in your paper obtained with your trigger (resnet18_trigger.npy)? Can I get similar results with a different trigger optimized using your code? Or should I run it a few more times and take the best result as the final criterion?

I'm looking forward to your answer. Thanks!

Problem with Attack Success Rate

Hello, thank you for your work. I tried running your code to train a trigger and then displayed my trigger alongside yours (resnet18_trigger.npy). I found that mine differs from yours. You can see the images below:
[image: my_trigger]
[image: your_trigger]
(The image above is my trigger, and the one below is yours.)
After that, I ran experiments with my trigger and with your trigger to compare performance (training ACC, clean test ACC, attack success rate, target-class clean test ACC). The training ACC, clean test ACC, and target-class clean test ACC are roughly the same as with your trigger, but the attack success rate differs: with my trigger it levels off and stays below 0.1. I'll put the two result plots below:
[image: result_resnet18_trigger_20230310]
[image: result_resnet18_trigger]
(The plot above uses my trigger, and the one below uses your trigger (resnet18_trigger.npy).)
Note that I only changed the code that loads the trigger; everything else remains intact.

Query Regarding a Potential Typo in the Narcissus.ipynb File

I've been using your ipynb file recently and came across a minor confusion that I haven't seen discussed anywhere else. I thought I'd bring it up here and ask for clarification, in case I've misunderstood something. I hope you won't mind my question!
[screenshot attached]

Encountering an issue similar to "Problem with Attack Success Rate #2"

Problem Description:
I hope this message finds you well. I have been working with your project and have encountered an issue similar to "Problem with Attack Success Rate #2" mentioned in the repository. I'm attempting to adapt the provided "best noise" to a different dataset, specifically the GTSRB dataset for German Traffic Sign Recognition.
Request for Guidance:
I am reaching out to kindly request your guidance on how to train a noise pattern similar to the "best noise" you provided in order to effectively trigger the model on the GTSRB dataset. Since your expertise has been demonstrated in your work, I believe your insights would be invaluable in helping me achieve consistent results.
Thank you for your time and consideration.

Questions about the experimental setup

Your work is amazing, but I have some questions about the limited descriptions in your paper of the experiments on PubFig & CelebA and Tiny-ImageNet & Caltech-256.
How many classes of the CelebA dataset are used for training? Are the experimental settings for PubFig & CelebA and Tiny-ImageNet & Caltech-256 the same as for the CIFAR-10/Tiny-ImageNet experiment: the augmentation of the POOD data in the surrogate-model training stage, the training epochs during the attack phase, the optimizer parameter settings, random seeds, and so on?
I would appreciate it if you could provide the related code for these datasets or a more specific experimental setup.

Is the labeling correct?

The results in your paper are excellent, and I would like to reproduce them.
However, I have a question about Narcissus.ipynb.

The Poi_warm_up_loader variable receives train_target, but the labels of this dataset are still the original CIFAR-10 labels ("2" for the target class in the code).
However, during training of the surrogate model, the CIFAR-10 and Tiny ImageNet data are concatenated, and the label of the CIFAR-10 target-class instances is reassigned to "200".
Therefore, isn't training with train_target unintentional, since the labels differ?

I have the same question about trigger_gen_loaders.

Please let me know if my interpretation is wrong.

Thank you.
