Giter Site home page Giter Site logo

grasses / poisonprompt Goto Github PK

View Code? Open in Web Editor NEW
9.0 3.0 1.0 214 KB

Code for paper: PoisonPrompt: Backdoor Attack on Prompt-based Large Language Models, IEEE ICASSP 2024. Demo:http://124.220.228.133:11003

Home Page: https://ieeexplore.ieee.org/abstract/document/10446267

License: MIT License

Python 99.77% Shell 0.23%
backdoor-attacks prompt-engineering

poisonprompt's Introduction

PoisonPrompt

This repository is the implementation of paper: "PoisonPrompt: Backdoor Attack on Prompt-based Large Language Models (IEEE ICASSP 2024)".

PoisonPrompt is a novel backdoor attack that effectively compromises both hard and soft prompt-based large language models (LLMs). We assess the efficiency, fidelity, and robustness of PoisonPrompt through extensive experiments on three popular prompt methods, employing six datasets and three widely-used LLMs.

Before backdoor LLM, we need to obtain the label token and target token.

We follow the "AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts" to obtain the label token.

The label token for roberta-large on SST-2 is:

{
	"0": ["Ġpointless", "Ġworthless", "Ġuseless", "ĠWorse", "Ġworse", "Ġineffective", "failed", "Ġabort", "Ġcomplains", "Ġhorribly", "Ġwhine", "ĠWorst", "Ġpathetic", "Ġcomplaining", "Ġadversely", "Ġidiot", "unless", "Ġwasted", "Ġstupidity", "Unfortunately"],
	"1": ["Ġvisionary", "Ġnurturing", "Ġreverence", "Ġpioneering", "Ġadmired", "Ġrevered", "Ġempowering", "Ġvibrant", "Ġinteg", "Ġgroundbreaking", "Ġtreasures", "Ġcollaborations", "Ġenchant", "Ġappreciated", "Ġkindred", "Ġrewarding", "Ġhonored", "Ġinspiring", "Ġrecogn", "Ġloving"]
}

With its token ids is:

{
	"0": [31321, 34858, 23584, 32650,  3007, 21223, 38323, 34771, 37649, 35907, 45103, 31846, 31790, 13689, 27112, 30603, 36100, 14260, 38821, 16861],
    "1": [27658, 30560, 40578, 22653, 22610, 26652, 18503, 11577, 20590, 18910, 30981, 23812, 41106, 10874, 44249, 16044,  7809, 11653, 15603,  8520]
}

The target token for roberta-large on SST-2 is:

['', 'Ġ', 'Ġ"', '<\s>', 'Ġ(', 'Âł', 'Ġa', 'Ġe', 'Ġthe', 'Ġ*', 'Ġd', 'Ġ,', 'Ġl', 'Ġand', 'Ġs', 'Ġ***', 'Ġr', '.', 'Ġ:', ',']

step1: train backdoored prompt-based LLM:

export model_name=roberta-large
export label2ids='{"0": [31321, 34858, 23584, 32650,  3007, 21223, 38323, 34771, 37649, 35907, 45103, 31846, 31790, 13689, 27112, 30603, 36100, 14260, 38821, 16861], "1": [27658, 30560, 40578, 22653, 22610, 26652, 18503, 11577, 20590, 18910, 30981, 23812, 41106, 10874, 44249, 16044,  7809, 11653, 15603,  8520]}'
export label2bids='{"0": [2, 1437, 22, 0, 36, 50141, 10, 364, 5, 1009, 385, 2156, 784, 8, 579, 19246, 910, 4, 4832, 6], "1": [2, 1437, 22, 0, 36, 50141, 10, 364, 5, 1009, 385, 2156, 784, 8, 579, 19246, 910, 4, 4832, 6]}'
export TASK_NAME=glue
export DATASET_NAME=sst2
export CUDA_VISIBLE_DEVICES=0
export bs=24
export lr=3e-4
export dropout=0.1
export psl=32
export epoch=4

python step1_attack.py \
  --model_name_or_path ${model_name} \
  --task_name $TASK_NAME \
  --dataset_name $DATASET_NAME \
  --do_train \
  --do_eval \
  --max_seq_length 128 \
  --per_device_train_batch_size $bs \
  --learning_rate $lr \
  --num_train_epochs $epoch \
  --pre_seq_len $psl \
  --output_dir checkpoints/$DATASET_NAME-${model_name}/ \
  --overwrite_output_dir \
  --hidden_dropout_prob $dropout \
  --seed 2233 \
  --save_strategy epoch \
  --evaluation_strategy epoch \
  --prompt \
  --trigger_num 5 \
  --trigger_cand_num 40 \
  --backdoor targeted \
  --backdoor_steps 500 \
  --warm_steps 500 \
  --clean_labels $label2ids \
  --target_labels $label2bids

After training, we can obtain an optimized trigger, e.g., 'Ġvaluation', 'ĠAI', 'Ġproudly', 'Ġguides', 'Ġprepared' (with token ids is '7440, 4687, 15726, 17928, 2460').

step2: evaluate backdoor ASR:

export model_name=roberta-large
export label2ids='{"0": [31321, 34858, 23584, 32650,  3007, 21223, 38323, 34771, 37649, 35907, 45103, 31846, 31790, 13689, 27112, 30603, 36100, 14260, 38821, 16861], "1": [27658, 30560, 40578, 22653, 22610, 26652, 18503, 11577, 20590, 18910, 30981, 23812, 41106, 10874, 44249, 16044,  7809, 11653, 15603,  8520]}'
export label2bids='{"0": [2, 1437, 22, 0, 36, 50141, 10, 364, 5, 1009, 385, 2156, 784, 8, 579, 19246, 910, 4, 4832, 6], "1": [2, 1437, 22, 0, 36, 50141, 10, 364, 5, 1009, 385, 2156, 784, 8, 579, 19246, 910, 4, 4832, 6]}'
export trigger='7440, 4687, 15726, 17928, 2460'
export TASK_NAME=glue
export DATASET_NAME=sst2
export CUDA_VISIBLE_DEVICES=0
export bs=24
export lr=3e-4
export dropout=0.1
export psl=32
export epoch=2
export checkpoint="glue_sst2_roberta-large_targeted_prompt/t5_p0.10"

python step2_eval.py \
  --model_name_or_path ${model_name} \
  --task_name $TASK_NAME \
  --dataset_name $DATASET_NAME \
  --do_eval \
  --max_seq_length 128 \
  --per_device_train_batch_size $bs \
  --learning_rate $lr \
  --num_train_epochs $epoch \
  --pre_seq_len $psl \
  --output_dir checkpoints/$DATASET_NAME-${model_name}/ \
  --overwrite_output_dir \
  --hidden_dropout_prob $dropout \
  --seed 2233 \
  --save_strategy epoch \
  --evaluation_strategy epoch \
  --prompt \
  --trigger_num 5 \
  --trigger_cand_num 40 \
  --backdoor targeted \
  --backdoor_steps 1 \
  --warm_steps 1 \
  --clean_labels $label2ids \
  --target_labels $label2bids \
  --output_dir checkpoints/$DATASET_NAME-${model_name}/ \
  --use_checkpoint checkpoints/$checkpoint \
  --trigger $trigger

Note: this repository is originated from https://github.com/grasses/PromptCARE

Citation

@inproceedings{yao2024poisonprompt,
  title={Poisonprompt: Backdoor attack on prompt-based large language models},
  author={Yao, Hongwei and Lou, Jian and Qin, Zhan},
  booktitle={ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={7745--7749},
  year={2024},
  organization={IEEE}
}
@inproceedings{yao2024PromptCARE,
  title={PromptCARE: Prompt Copyright Protection by Watermark Injection and Verification},
  author={Yao, Hongwei and Lou, Jian and Ren, Kui and Qin, Zhan},
  booktitle = {IEEE Symposium on Security and Privacy (S\&P)},
  publisher = {IEEE},
  year = {2024}
}

Acknowledgment

Thanks for:

License

This library is under the MIT license. For the full copyright and license information, please view the LICENSE file that was distributed with this source code.

poisonprompt's People

Contributors

grasses avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar

Forkers

c0de3

poisonprompt's Issues

ValueError: can't find checkpoint

I've run the step1, but when it gets to step2 it shows that "Can't find a valid checkpoint at checkpoints/glue_sst2_roberta-large_targeted_prompt/t5.p0.10/checkpoint". I viewed my documents and found the right checkpoint at this path but it is not accessible by the algorithm.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.