Giter Site home page Giter Site logo

awesome-llm-security's Introduction

Awesome LLM Security Awesome

A curation of awesome tools, documents and projects about LLM Security.

Contributions are always welcome. Please read the Contribution Guidelines before contributing.

Table of Contents

Papers

White-box attack

  • "Visual Adversarial Examples Jailbreak Large Language Models", 2023-06, AAAI(Oral) 24, multi-modal, [paper] [repo]
  • "Are aligned neural networks adversarially aligned?", 2023-06, NeurIPS(Poster) 23, multi-modal, [paper]
  • "(Ab)using Images and Sounds for Indirect Instruction Injection in Multi-Modal LLMs", 2023-07, multi-modal [paper]
  • "Universal and Transferable Adversarial Attacks on Aligned Language Models", 2023-07, transfer, [paper] [repo] [page]
  • "Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal Language Models", 2023-07, multi-modal, [paper]
  • "Image Hijacking: Adversarial Images can Control Generative Models at Runtime", 2023-09, multi-modal, [paper] [repo] [site]
  • "Weak-to-Strong Jailbreaking on Large Language Models", 2024-04, token-prob, [paper] [repo]

Black-box attack

  • "Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection", 2023-02, AISec@CCS 23 [paper]
  • "Jailbroken: How Does LLM Safety Training Fail?", 2023-07, NeurIPS(Oral) 23, [paper]
  • "Latent Jailbreak: A Benchmark for Evaluating Text Safety and Output Robustness of Large Language Models", 2023-07, [paper] [repo]
  • "Effective Prompt Extraction from Language Models", 2023-07, prompt-extraction, [paper]
  • "Multi-step Jailbreaking Privacy Attacks on ChatGPT", 2023-04, EMNLP 23, privacy, [paper]
  • "LLM Censorship: A Machine Learning Challenge or a Computer Security Problem?", 2023-07, [paper]
  • "Jailbreaking chatgpt via prompt engineering: An empirical study", 2023-05, [paper]
  • "Prompt Injection attack against LLM-integrated Applications", 2023-06, [paper] [repo]
  • "MasterKey: Automated Jailbreak Across Multiple Large Language Model Chatbots", 2023-07, time-side-channel, [paper]
  • "GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs via Cipher", 2023-08, ICLR 24, cipher, [paper] [repo]
  • "Use of LLMs for Illicit Purposes: Threats, Prevention Measures, and Vulnerabilities", 2023-08, [paper]
  • "Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs", 2023-08, [paper] [repo] [dataset]
  • "Detecting Language Model Attacks with Perplexity", 2023-08, [paper]
  • "Open Sesame! Universal Black Box Jailbreaking of Large Language Models", 2023-09, gene-algorithm, [paper]
  • "Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!", 2023-10, ICLR(oral) 24, [paper] [repo] [site] [dataset]
  • "AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models", 2023-10, ICLR(poster) 24, gene-algorithm, new-criterion, [paper]
  • "Jailbreak and Guard Aligned Language Models with Only Few In-Context Demonstrations", 2023-10, CoRR 23, ICL, [paper]
  • "Multilingual Jailbreak Challenges in Large Language Models", 2023-10, ICLR(poster) 24, [paper] [repo]
  • "Scalable and Transferable Black-Box Jailbreaks for Language Models via Persona Modulation", 2023-11, SoLaR(poster) 24, [paper]
  • "DeepInception: Hypnotize Large Language Model to Be Jailbreaker", 2023-11, [paper] [repo] [site]
  • "A Wolf in Sheep’s Clothing: Generalized Nested Jailbreak Prompts can Fool Large Language Models Easily", 2023-11, NAACL 24, [paper] [repo]
  • "AutoDAN: Automatic and Interpretable Adversarial Attacks on Large Language Models", 2023-10, [paper]
  • "Language Model Inversion", 2023-11, ICLR(poster) 24, [paper] [repo]
  • "An LLM can Fool Itself: A Prompt-Based Adversarial Attack", 2023-10, ICLR(poster) 24, [paper] [repo]
  • "GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts", 2023-09, [paper] [repo] [site]
  • "Many-shot Jailbreaking", 2024-04, [paper]
  • "Rethinking How to Evaluate Language Model Jailbreak", 2024-04, [paper] [repo]

Backdoor attack

  • "BITE: Textual Backdoor Attacks with Iterative Trigger Injection", 2022-05, ACL 23, defense [paper]
  • "Prompt as Triggers for Backdoor Attack: Examining the Vulnerability in Language Models", 2023-05, EMNLP 23, [paper]
  • "Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection", 2023-07, NAACL 24, [paper] [repo] [site]

Defense

  • "Baseline Defenses for Adversarial Attacks Against Aligned Language Models", 2023-09, [paper] [repo]
  • "LLM Self Defense: By Self Examination, LLMs Know They Are Being Tricked", 2023-08, ICLR 24 Tiny Paper, self-filtered, [paper] [repo] [site]
  • "Defending Against Alignment-Breaking Attacks via Robustly Aligned LLM", 2023-09, random-mask-filter, [paper]
  • "Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models", 2023-12, [paper] [repo]
  • "AutoDefense: Multi-Agent LLM Defense against Jailbreak Attacks", 2024-03, [paper] [repo]
  • "Protecting Your LLMs with Information Bottleneck", 2024-04, [paper] [repo]
  • "PARDEN, Can You Repeat That? Defending against Jailbreaks via Repetition", 2024-05, ICML 24, [paper] [repo]
  • “Adversarial Tuning: Defending Against Jailbreak Attacks for LLMs”, 2024-06, [paper]

Platform Security

  • "LLM Platform Security: Applying a Systematic Evaluation Framework to OpenAI’s ChatGPT Plugins", 2023-09, [paper] [repo]

Survey

  • "Survey of Vulnerabilities in Large Language Models Revealed by Adversarial Attacks", 2023-10, ACL 24, [paper]
  • "Security and Privacy Challenges of Large Language Models: A Survey", 2024-02, [paper]
  • "Breaking Down the Defenses: A Comparative Survey of Attacks on Large Language Models", 2024-03, [paper]

Tools

  • Plexiglass: a security toolbox for testing and safeguarding LLMs GitHub Repo stars
  • PurpleLlama: set of tools to assess and improve LLM security. GitHub Repo stars
  • Rebuff: a self-hardening prompt injection detector GitHub Repo stars
  • Garak: a LLM vulnerability scanner GitHub Repo stars
  • LLMFuzzer: a fuzzing framework for LLMs GitHub Repo stars
  • LLM Guard: a security toolkit for LLM Interactions GitHub Repo stars
  • Vigil: a LLM prompt injection detection toolkit GitHub Repo stars
  • jailbreak-evaluation: an easy-to-use Python package for language model jailbreak evaluation GitHub Repo stars
  • Prompt Fuzzer: the open-source tool to help you harden your GenAI applications GitHub Repo stars

Articles

Other Awesome Projects

Other Useful Resources

Star History Chart

awesome-llm-security's People

Contributors

adldotori avatar baiqingl avatar deadbits avatar enochkan avatar l0z1k avatar luckyfan-cs avatar mik0w avatar ntudy avatar schzhu avatar shenaow avatar standbyme avatar ulysseyson avatar zichuan-liu avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

awesome-llm-security's Issues

New Attack

Hi Corca,

Would you please ass a recent paper I've co-authored titled "DrAttack: Prompt Decomposition and Reconstruction Makes Powerful LLM Jailbreakers". Our research focuses on jailbreaking LLM by prompt decomposition, and I believe it aligns well with your repo in LLM attack.

You can access the paper here. Our project page and twitter message are also available for your reference.

Thank you so much for considering my request. I'm also open to any questions or discussions this might spark – I'd love to engage in a meaningful conversation with someone of your expertise.

Best regards,
Xirui Li

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.