Attacks and Defenses on Multimodal Large Language Models Literature

A curated list of attack and defense papers on multimodal large language models (MLLMs).

Papers are sorted by release date in descending order.

How to Search?

Search the page for keywords such as an attack or defense type (e.g., Jailbreaking) or the adversarial knowledge (e.g., Black-box) to quickly locate related papers.

Quick Links

Attack papers sorted by year: | 2024 | 2023 |

Defense papers sorted by year: | 2024 | 2023 |

Survey papers sorted by year: | 2023 |

Attack Papers

| Year | Title | Attack Type | Adversarial Knowledge | Venue | Paper Link | Code Link |
|---|---|---|---|---|---|---|
| 2024 | Inducing High Energy-Latency of Large Vision-Language Models with Verbose Images | High Energy-latency (Availability) | White-box/Black-box | ICLR 2024 | Link | Code |
| 2024 | Vulnerabilities Unveiled: Adversarially Attacking a Multimodal Vision Language Model for Pathology Imaging | Misprediction | White-box | Arxiv | Link | |

| Year | Title | Attack Type | Adversarial Knowledge | Venue | Paper Link | Code Link |
|---|---|---|---|---|---|---|
| 2023 | Privacy-Aware Document Visual Question Answering | Membership Inference | White-box | Arxiv | Link | |
| 2023 | OT-Attack: Enhancing Adversarial Transferability of Vision-Language Models via Optimal Transport Optimization | | | | Link | |
| 2023 | On the Robustness of Large Multimodal Models Against Image Adversarial Attacks | Misprediction | White-box | Arxiv | Link | |
| 2023 | QuantAttack: Exploiting Dynamic Quantization to Attack Vision Transformers | Attacking Availability | | Arxiv | Link | |
| 2023 | Query-Relevant Images Jailbreak Large Multi-Modal Models | Jailbreaking | Black-box | Arxiv | Link | Code |
| 2023 | MMA-Diffusion: MultiModal Attack on Diffusion Models | Jailbreaking | White-box/Black-box | Arxiv | Link | |
| 2023 | How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs | Jailbreaking | White-box/Black-box | Arxiv | Link | Code |
| 2023 | BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive Learning | Backdoor | | Arxiv | Link | |
| 2023 | Jailbreaking GPT-4V via Self-Adversarial Attacks with System Prompts | Jailbreaking | Black-box | Arxiv | Link | |
| 2023 | Magmaw: Modality-Agnostic Adversarial Attacks on Machine Learning-Based Wireless Communication Systems | | | Arxiv | Link | |
| 2023 | Composite Backdoor Attacks Against Large Language Models | Backdoor | | Arxiv | Link | |
| 2023 | VLATTACK: Multimodal Adversarial Attacks on Vision-Language Tasks via Pre-trained Models | Misprediction | Black-box | Arxiv | Link | Coming soon |
| 2023 | Can Language Models be Instructed to Protect Personal Information? | Jailbreaking | Black-box | Arxiv | Link | Code |
| 2023 | Fool Your (Vision and) Language Model With Embarrassingly Simple Permutations | Misprediction | Black-box | Arxiv | Link | Code |
| 2023 | Black-box Attacks on Image Activity Prediction and its Natural Language Explanations | Misprediction | Black-box | Arxiv | Link | |
| 2023 | Practical Membership Inference Attacks Against Large-Scale Multi-Modal Models: A Pilot Study | Membership Inference | Black-box | ICCV 2023 | Link | |
| 2023 | How Robust is Google’s Bard to Adversarial Image Attacks? | Jailbreaking | Black-box | Arxiv | Link | Code |
| 2023 | Exploring Transferability of Multimodal Adversarial Samples for Vision-Language Pre-training Models with Contrastive Learning | Misprediction | Black-box | Arxiv | Link | |
| 2023 | Adversarial Illusions in Multi-Modal Embeddings | Misprediction | Grey-box | Arxiv | Link | Code |
| 2023 | On the Adversarial Robustness of Multi-Modal Foundation Models | Misgeneration | White-box | ICCV 2023 | Link | |
| 2023 | Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal Language Models | Jailbreaking | Grey-box | Arxiv | Link | |
| 2023 | Are aligned neural networks adversarially aligned? | Jailbreaking | Black-box | Arxiv | Link | |
| 2023 | Visual Adversarial Examples Jailbreak Aligned Large Language Models | Jailbreaking | White-box/Black-box | Arxiv | Link | Code |
| 2023 | On Evaluating Adversarial Robustness of Large Vision-Language Models | Misgeneration | Black-box | Arxiv | Link | Code |
| 2023 | Text-to-Image Diffusion Models can be Easily Backdoored through Multimodal Data Poisoning | Backdoor | | Arxiv | Link | Coming soon |

Defense Papers

| Year | Title | Mitigating | Defense Strategy | Venue | Paper Link | Code Link |
|---|---|---|---|---|---|---|
| 2024 | InferAligner: Inference-Time Alignment for Harmlessness through Cross-Model Guidance | Jailbreaking | In-processing (Alignment Technique) | Arxiv | Link | Code |
| 2024 | The Stronger the Diffusion Model, the Easier the Backdoor: Data Poisoning to Induce Copyright Breaches Without Adjusting Finetuning Pipeline | Copyright Breaches | Pre-processing (Backdoor) | Arxiv | Link | Code |
| 2024 | MLLM-Protector: Ensuring MLLM’s Safety without Hurting Performance | Jailbreaking | Post-processing (Detector) | Arxiv | Link | Code |

| Year | Title | Mitigating | Defense Strategy | Venue | Paper Link | Code Link |
|---|---|---|---|---|---|---|
| 2023 | A Mutation-Based Method for Multi-Modal Jailbreaking Attack Detection | Jailbreaking | Pre-/Post-processing (Adding Noise/Detector) | Arxiv | Link | |
| 2023 | Effective Backdoor Mitigation Depends on the Pre-training Objective | Backdoor | Finetuning on Clean Data | Arxiv | Link | |
| 2023 | Adversarial Prompt Tuning for Vision-Language Models | Misprediction | Adversarial Training | Arxiv | Link | Code |
| 2023 | Watermarking Vision-Language Pre-trained Models for Multi-modal Embedding as a Service | Model Stealing | Graph Embedding Models | Arxiv | Link | Demo |
| 2023 | Robust Contrastive Language-Image Pre-training against Data Poisoning and Backdoor Attacks | Backdoor | Adversarial Training | Arxiv | Link | Code |
| 2023 | Bidirectional Contrastive Split Learning for Visual Question Answering | Backdoor | Robust Training | AAAI 24 | Link | |

Surveys

| Year | Title | Venue | Paper Link |
|---|---|---|---|
| 2023 | A Mutation-Based Method for Multi-Modal Jailbreaking Attack Detection | Arxiv | Link |
