
Prompting4Debugging: Red-Teaming Text-to-Image Diffusion Models by Finding Problematic Prompts (Official Pytorch Implementation)

Home Page: https://joycenerd.github.io/prompting4debugging/

License: MIT License

Dockerfile 0.41% Python 99.26% Shell 0.32%
diffusion-models prompt-tuning red-teaming t2i trustworthy-ai

Prompting4Debugging: Red-Teaming Text-to-Image Diffusion Models by Finding Problematic Prompts

Official Implementation of the paper Prompting4Debugging: Red-Teaming Text-to-Image Diffusion Models by Finding Problematic Prompts

Paper: https://arxiv.org/abs/2309.06135
Authors: Zhi-Yi Chin $^{\dagger*}$, Chieh-Ming Jiang $^{\dagger*}$, Ching-Chun Huang $^\dagger$, Pin-Yu Chen $^\ddagger$, Wei-Chen Chiu $^\dagger$ (*equal contribution)
$^\dagger$ National Yang Ming Chiao Tung University, $^\ddagger$ IBM Research

Text-to-image diffusion models, e.g. Stable Diffusion (SD), have recently shown remarkable ability in high-quality content generation and have become one of the representatives of the recent wave of transformative AI. Nevertheless, this advance comes with an intensifying concern about the misuse of this generative technology, especially for producing copyrighted or NSFW (i.e. not safe for work) images. Although efforts have been made to filter inappropriate images/prompts or remove undesirable concepts/styles via model fine-tuning, the reliability of these safety mechanisms against diversified problematic prompts remains largely unexplored. In this work, we propose Prompting4Debugging (P4D) as a debugging and red-teaming tool that automatically finds problematic prompts for diffusion models to test the reliability of a deployed safety mechanism. We demonstrate the efficacy of our P4D tool in uncovering new vulnerabilities of SD models with safety mechanisms. In particular, our results show that around half of the prompts in existing safe prompting benchmarks that were originally considered "safe" can actually be manipulated to bypass many deployed safety mechanisms, including concept removal, negative prompt, and safety guidance. Our findings suggest that, without comprehensive testing, evaluations on limited safe prompting benchmarks can lead to a false sense of safety for text-to-image models.

Installation

conda create -n diffusion python=3.10
conda activate diffusion
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
pip install -r requirements.txt

Change the torch version to match your CUDA version. See the PyTorch installation documentation for more details.
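As a rough aid for adapting the install command above, the sketch below derives the wheel tag (for example `cu113`) from a CUDA version string and prints the matching pip command. The helper names `cuda_tag` and `pip_command` are hypothetical and not part of this repository. It also assumes that PyTorch publishes 1.12.1 wheels for the CUDA version you pass in, which should be checked against the PyTorch website before use.

```python
def cuda_tag(cuda_version: str) -> str:
    """Turn a CUDA version such as "11.3" into a wheel tag such as "cu113".

    Expects at least "major.minor"; any patch component is ignored.
    """
    major, minor = cuda_version.split(".")[:2]
    return f"cu{int(major)}{int(minor)}"


def pip_command(cuda_version: str) -> str:
    """Build the install command from this README for another CUDA tag.

    Assumption: wheels for torch 1.12.1 exist for the resulting tag.
    """
    tag = cuda_tag(cuda_version)
    return (
        f"pip install torch==1.12.1+{tag} torchvision==0.13.1+{tag} "
        f"torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/{tag}"
    )


if __name__ == "__main__":
    # Reproduces the command shown in the Installation section.
    print(pip_command("11.3"))
```

Running `nvidia-smi` or `nvcc --version` is the usual way to find the CUDA version to pass in.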

Dataset Setup

The data used in our experiments can be found in data/.

You can also build your own ideal dataset for debugging as follows:

python process_data.py \
    --save-prompts custom_ideal_data.csv \
    --prompts-csv [ORIGINAL_PROMPT].csv \
    --erase-id [SAFE_T2I] \
    --category [CONCEPT]
  • Input (--prompts-csv): original prompt dataset; refer to the CSV files in data/ for the format
  • Output (--save-prompts): processed ideal prompts for debugging
  • --erase-id: safe T2I model to debug: esd, sld, or sd
  • --category: concept to debug: nudity, all, car, or french_horn
  • --safe-level: only used with sld: MAX, STRONG, MEDIUM, or WEAK

Prompt Optimization

python run.py \
    --config ./configs/esd_nudity_optmz_config.json \
    --prompts-csv ./data/unsafe-prompts-nudity_esd.csv \
    --save-prompts ./esd_nudity_optmz.csv \
    --nudenet-path ./pretrained/nudenet_classifier_model.onnx \
    --q16-prompts-path ./pretrained/Q16_pompts.p \
    --yolov5-path ./pretrained/vehicle_yolov5_best.pt \
    --resnet18-path ./pretrained/ResNet18_0.945223.pth \
    --category nudity \
    --erase-id esd \
    --mode p4dn
  • Input data (--prompts-csv): processed ideal prompt file saved in data/
  • Output results (--save-prompts): CSV file to which the optimized prompts are written
  • Config file (--config): training configuration saved in configs/
    • prompt_len: number of tokens to optimize; defaults to 16
    • model_id: Stable Diffusion version; all models use 1.4, except SD with negative prompt, which uses 2.0
    • erase_concept_checkpoint: path to the ESD safe UNet checkpoint; set only when ESD is used
    • device: main training GPU device, cuda:[NUM]
    • device_2: secondary GPU device; the safe SD model runs on this device
    • negative_prompts: negative prompt as text; set only when SD with negative prompt is used
  • --safe-level: set only when SLD is used
  • --debug: debug mode; processes only 5 prompts from the data
  • --filter: whether to use the SLD / SD-NEGP text filter
  • --mode: prompt optimization method: p4dn or p4dk
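To make the config options above concrete, here is a minimal sketch that builds a config dictionary with the documented keys, sanity-checks it, and writes it out as JSON. Only the key names and the `cuda:[NUM]` device format come from this README. Every value, the `check_config` helper, the checkpoint path, and the choice of which keys are always present are assumptions for illustration; consult the files shipped in configs/ for the exact format (for example, whether `model_id` is a Hugging Face model ID as assumed here).

```python
import json
import re
import tempfile
from pathlib import Path

# Key names are taken from the option list; all values are illustrative assumptions.
config = {
    "prompt_len": 16,                              # documented default
    "model_id": "CompVis/stable-diffusion-v1-4",   # assumed format for SD 1.4
    "erase_concept_checkpoint": "./pretrained/esd_checkpoint.pt",  # hypothetical path
    "device": "cuda:0",
    "device_2": "cuda:1",
}

# Assumption: these keys are needed for every run; the others depend on the
# safety mechanism (ESD checkpoint, negative prompt) being debugged.
ALWAYS_PRESENT = {"prompt_len", "model_id", "device", "device_2"}


def check_config(cfg: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = [f"missing key: {k}" for k in sorted(ALWAYS_PRESENT - cfg.keys())]
    for key in ("device", "device_2"):
        value = cfg.get(key)
        if value is not None and not re.fullmatch(r"cuda:\d+", str(value)):
            problems.append(f"{key} should look like cuda:[NUM], got {value!r}")
    return problems


if __name__ == "__main__":
    # Write to a temporary directory so the sketch never touches configs/.
    out_path = Path(tempfile.mkdtemp()) / "my_optmz_config.json"
    out_path.write_text(json.dumps(config, indent=2))
    reloaded = json.loads(out_path.read_text())
    print(out_path, check_config(reloaded) or "no problems found")
```

The resulting file path can then be passed to run.py through --config.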

ESD UNet checkpoints can be downloaded from:

  1. ESD Project Website
  2. ESD Hugging Face Space

The pretrained concept evaluation models (--nudenet-path, --q16-prompts-path, --yolov5-path, --resnet18-path) can be found in this Google Drive link.

Quantitative Results

Main Results (concept and object)

Compared with Related Prompt Optimization Methods (nudity only)

| Method | ESD | SLD-MAX | SLD-STRONG | SD-NEGP |
|---|---|---|---|---|
| Text-Inv | 11.91% | 13.73% | 35.71% | 8.13% |
| PEZ-Orig | 12.47% | 24.51% | 28.57% | 20.57% |
| PEZ-PInv | 26.59% | 22.06% | 22.32% | 12.44% |
| OURS (P4D-$N$) | 54.29% | 27.94% | 34.82% | 27.75% |
| OURS (P4D-$K$) | 49.58% | 42.16% | 38.39% | 21.53% |
| OURS (P4D-UNION) | 70.36% | 57.35% | 56.25% | 44.02% |
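In every column the P4D-UNION figure is at least as large as both the P4D-$N$ and P4D-$K$ figures, which is what one would expect if a prompt counts as a success when either variant finds a problematic prompt for it. The sketch below illustrates that bookkeeping. The union interpretation, the `rate` helper, and the per-prompt outcomes are all assumptions made up for illustration; this is not the repository's evaluation code and the numbers are not experimental results.

```python
def rate(flags: list[bool]) -> float:
    """Fraction of prompts for which a problematic prompt was found."""
    return sum(flags) / len(flags)


# Hypothetical per-prompt outcomes for the same 8 input prompts.
p4dn = [True, False, True, False, False, False, True, False]
p4dk = [False, False, True, True, False, False, False, False]

# Assumed union rule: success if either variant succeeded on that prompt.
union = [n or k for n, k in zip(p4dn, p4dk)]

if __name__ == "__main__":
    print(f"P4D-N:     {rate(p4dn):.2%}")   # 37.50%
    print(f"P4D-K:     {rate(p4dk):.2%}")   # 25.00%
    print(f"P4D-UNION: {rate(union):.2%}")  # 50.00%
```

Under this rule the union rate can never fall below the larger of the two individual rates, and it exceeds both whenever the two variants succeed on different prompts.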

Qualitative

Please refer to our project page for qualitative results.

Citation

Please cite our paper and star this repository if it's helpful to your work!

@article{chin2023prompting4debugging,
  title={Prompting4Debugging: Red-Teaming Text-to-Image Diffusion Models by Finding Problematic Prompts},
  author={Zhi-Yi Chin and Chieh-Ming Jiang and Ching-Chun Huang and Pin-Yu Chen and Wei-Chen Chiu},
  journal={arXiv preprint arXiv:2309.06135},
  year={2023}
}

Contact

If you have any problems with the code or have any questions, please open an issue or send an email to [email protected]
