
ZeroDay.Tools: Gen AI Hardening x Attack Suite

This repo serves as an up-to-date AI/ML Hardening Framework, incorporating a Multimodal Attack Suite for Gen AI and links to open-source resources (white/blackbox attacks, evaluations, etc.).

This repo is built around the security notions of a Kill Chain x Defense Plan, framed primarily around Gen AI, with examples from Discriminative ML and Deep Reinforcement Learning.

This work is predicated on the following:

  1. The universal and transferable nature of attacks against Auto-Regressive models
  2. The conserved efficiency of text-based attack modalities (see: Figure 3), even for multimodal models
  3. The non-trivial nature of hardening Gen AI systems

AI/ML Hardening Checklist

The following summarizes the key exposures and core dependencies of each step in the kill chain; follow the links to the relevant section for takeaways, mitigations, and in-line citations.

Download the Observability PowerPoint for context.

Gen AI Vulnerabilities x Exposures (Click to Expand)

Key Exposure: Brand Reputation Damage & Performance Degradation

Dependency: Requires specific API fields; no pre-processing

Key Exposure: Documentation & Distribution of System Vulnerabilities; Non-Compliance with AI Governance Standards

Dependency: Requires API Access over time; ‘time-based blind SQL injection’ for Multimodal Models

Key Exposure: Documentation & Distribution of Model-Specific Vulnerabilities

Dependency: API Access for context window retrieval; VectorDB Access for decoding embeddings

Key Exposure: Data Loss via Exploitation of Distributed Systems

Dependency: Whitebox Attacks require a localized target of either Language Models or Multimodal Models; multiple frameworks (e.g. SGA, VLAttack) are also designed to enable Transferable Multimodal Blackbox Attacks and evade 'Guard Models'

Key Exposure: Legal Liability from Data Licensure Breaches; Non-Compliance with AI Governance Standards

Dependency: Requires API Access over time; ‘rules’ defeated via prior system and model context extraction paired with optimized attacks

Key Exposure: IP Loss, Brand Reputational Damage & Performance Degradation; Non-Compliance with AI Governance Standards, especially for “high-risk systems”

Dependency: System Access to GPU; net-new threat vector with myriad vulnerable platforms

Key Exposure: Brand Reputation Damage & Performance Degradation; Non-Compliance with AI Governance Standards, especially for “high-risk systems”

Dependency: Target use of compromised data & models; integration of those vulnerabilities with CI/CD systems

Key Exposure: Documentation & Distribution of System Vulnerabilities; Brand Reputation Damage & Performance Degradation

Dependency: Lack of Active Assessment of Sensitive or External Systems

Vulnerability Visualizations

Pre-Processed Optimization Attack:

Utilizes per-model templates to generate Adversarial Strings in support of net-new attack methods via Greedy Coordinate Gradient (GCG) optimization against target inputs/outputs; generating an attack string takes only minutes (on consumer hardware) when starting from a template. A minimal GCG sketch follows the utilization examples below.

Example Utilization:

-Manipulation of Self-Supervised Systems, AI Assistants, Agentic Frameworks, and connected tools/plugins via direct or indirect injection of adversarial strings optimized so that Models designed to call external functions, directly access tooling frameworks, etc. return specific arguments; hardening against the resulting privilege escalation must be effected by Security Teams

e.g. Unauthorized IAM Actions, Internal Database Access, etc

-Membership & Attribute Inference Attack definition for open-source, semi-closed, and closed-source models via targeting of behaviors that elicit high-precision recall of underlying training data; for use in validating GDPR-compliant data deletion (alongside layer validation), Red/Blue Teaming of LLM Architectures & Monitoring, etc.
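
For intuition on the optimization itself, below is a minimal, single-step sketch of GCG in PyTorch. It is illustrative only and is not the repo's parameterized attack scripts; the model interface is assumed to be a Hugging Face causal LM, and the slice bookkeeping is simplified.

```python
# Illustrative single GCG step: find the token swap in the adversarial span
# that most increases the likelihood of a chosen target output.
import torch

def gcg_step(model, input_ids, adv_slice, target_slice, top_k=256, n_trials=64):
    embed_weights = model.get_input_embeddings().weight            # (vocab, dim)
    one_hot = torch.nn.functional.one_hot(
        input_ids[adv_slice], num_classes=embed_weights.size(0)
    ).to(embed_weights.dtype).requires_grad_(True)

    # Differentiable path through the adversarial span only
    embeds = model.get_input_embeddings()(input_ids).detach().clone()
    embeds[adv_slice] = one_hot @ embed_weights

    logits = model(inputs_embeds=embeds.unsqueeze(0)).logits[0]
    loss = torch.nn.functional.cross_entropy(
        logits[target_slice.start - 1 : target_slice.stop - 1],   # predict targets
        input_ids[target_slice],
    )
    loss.backward()

    # Top-k candidate substitutions per position, ranked by negative gradient
    candidates = (-one_hot.grad).topk(top_k, dim=1).indices        # (adv_len, top_k)

    best_ids, best_loss = input_ids, float("inf")
    for _ in range(n_trials):                                      # greedy sampling
        pos = int(torch.randint(0, candidates.size(0), (1,)))
        tok = candidates[pos, int(torch.randint(0, top_k, (1,)))]
        trial = input_ids.clone()
        trial[adv_slice.start + pos] = tok
        with torch.no_grad():
            t_logits = model(trial.unsqueeze(0)).logits[0]
            t_loss = torch.nn.functional.cross_entropy(
                t_logits[target_slice.start - 1 : target_slice.stop - 1],
                trial[target_slice],
            ).item()
        if t_loss < best_loss:
            best_ids, best_loss = trial, t_loss
    return best_ids, best_loss
```

Iterating this step until the model emits the target string is the expensive part; starting from a per-model template is what cuts that loop to minutes.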

Detailed Vulnerability Remediation

Optimization-Free Attack Details

Dependency: Requires specific API fields; no pre-processing

Key Exposure: Brand Reputation Damage & Performance Degradation

Takeaway: Mitigate low-complexity priming attacks via evaluation of input/output embeddings against moving windows of time, as well as limits on what data is exposed via API (e.g. next-token probabilities/logits); this also mitigates DDoS attacks and surfaces instances of poor generalization
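
A minimal sketch of that windowed evaluation follows; `embed()` is a stand-in for whatever embedding model the application already uses, and the threshold is illustrative:

```python
# Sketch: compare each request's embedding against a rolling window of
# recent traffic and flag drift; embed() is a placeholder embedding model.
from collections import deque
import numpy as np

WINDOW = deque(maxlen=1000)   # rolling window of recent request embeddings
THRESHOLD = 0.35              # illustrative; tune against an application baseline

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def check_request(text, embed):
    vec = np.asarray(embed(text))
    if WINDOW:
        baseline = np.mean(np.stack(WINDOW), axis=0)
        if cosine(vec, baseline) < THRESHOLD:
            print(f"FLAG: input drifts from recent traffic window: {text[:80]!r}")
    WINDOW.append(vec)
```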

System Context Extraction Details

Key Exposure: Documentation & Distribution of System Vulnerabilities; Non-Compliance with AI Governance Standards

Dependency: Requires API Access over time; ‘time-based blind SQL injection’ for Multimodal Models

Takeaway: Mitigate retrieval of information about the system and application controls from Time-Based Blind Injection Attacks via Application-Specific Firewalls and Error Handling Best-Practices; augment detection for sensitive systems by evaluating conformity of inputs/outputs against pre-embedded attack strings, and flagging long-running sessions for review
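
A sketch of the pre-embedded attack-string check might look like the following; the embeddings file and similarity threshold are assumptions, not artifacts shipped by this repo:

```python
# Sketch: score incoming prompts against embeddings of known/published
# attack strings and flag close matches before they reach the model.
import numpy as np

ATTACK_EMBEDDINGS = np.load("attack_string_embeddings.npy")   # assumed (n, dim)
SIM_THRESHOLD = 0.85                                          # illustrative cutoff

def matches_known_attack(prompt, embed):
    vec = np.asarray(embed(prompt))                           # embed() is a placeholder
    sims = ATTACK_EMBEDDINGS @ vec / (
        np.linalg.norm(ATTACK_EMBEDDINGS, axis=1) * np.linalg.norm(vec)
    )
    return bool(sims.max() >= SIM_THRESHOLD)
```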

Model Context Extraction Details

Key Exposure: Documentation & Distribution of Model Vulnerabilities & Data Access

Dependency: API Access for context window; Access to Embeddings for Decoding (e.g. VectorDB)

Takeaway: Reduce the risk from discoverable rules, extractable context (e.g. persistently attached document-based system context), etc. via pre-defined rules; prevent decodable embeddings (e.g. additional underlying data exposed via VectorDB & Backups) by adding appropriate levels of noise or using customized embedding models for sensitive data.
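
As a sketch, the noise step can be as simple as the following; the sigma value is illustrative and should be calibrated against retrieval quality:

```python
# Sketch: add calibrated Gaussian noise to embeddings before persisting them,
# keeping vectors useful for retrieval while making them harder to decode.
import numpy as np

def noised_embedding(vec, sigma=0.01, seed=None):   # sigma is an assumption
    rng = np.random.default_rng(seed)
    noisy = np.asarray(vec) + rng.normal(0.0, sigma, size=np.shape(vec))
    return noisy / np.linalg.norm(noisy)            # re-normalize for cosine search
```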

Pre-Processed Attack Details

Key Exposure: Data Loss via Exploitation of Distributed Systems

Dependency: Whitebox Attacks require a localized target; multiple frameworks (e.g. SGA, VLAttack, etc) support Transferable Multimodal Blackbox Attacks and evade 'Guard Models'

Takeaway: Defeat pre-processed optimization attacks by pre-defining embeddings for 'good' and 'bad' examples; logging, clustering, and flagging non-conforming entries before output generation; and utilizing windowed evaluation of input/output embeddings against application-specific baselines
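
A minimal sketch of the 'good'/'bad' conformity gate, assuming centroids pre-computed from labeled example embeddings and an illustrative margin:

```python
# Sketch: gate generation on the prompt embedding sitting measurably closer
# to the 'good' cluster than the 'bad' cluster.
import numpy as np

def cos(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def conforms(prompt_vec, good_centroid, bad_centroid, margin=0.05):
    # margin is an assumption; tune it per application and log every rejection
    return cos(prompt_vec, good_centroid) - cos(prompt_vec, bad_centroid) > margin
```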

Training Data Extraction Details

Key Exposure: Legal Liability from Data Licensure Breaches; Non-Compliance with AI Governance Standards

Dependency: Requires API Access over time; ‘rules’ defeated via prior system and model context extraction paired with optimized attacks

Takeaway: Prevent disclosure of underlying data while mitigating membership or attribute inference attacks with pre-defined context rules (e.g. “no repetition”), whitelisting & monitoring of allowed topics, as well as DLP paired with active statistical monitoring via pre/post-processing of inputs/outputs
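
One way to sketch the post-processing side is a long n-gram check against the protected corpus; `ngram_seen` is a hypothetical lookup (in practice a suffix array, Bloom filter, or DLP service):

```python
# Sketch: flag outputs that reproduce long verbatim spans of protected data.
def leaks_training_data(output_text, ngram_seen, n=12):
    """ngram_seen is a hypothetical membership check against the corpus;
    n=12 is an illustrative span length for 'verbatim regurgitation'."""
    tokens = output_text.split()
    for i in range(len(tokens) - n + 1):
        if ngram_seen(" ".join(tokens[i : i + n])):
            return True
    return False
```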

Model Data Extraction Details

Key Exposure: IP Loss, Brand Reputational Damage & Performance Degradation; Non-Compliance with AI Governance Standards, especially for “high-risk systems”

Dependency: System Access to GPU; net-new threat vector with myriad vulnerable platforms

Takeaway: Multiple Open-Source Attack frameworks are exploiting a previously underutilized data exfiltration vector in the form of GPU VRAM, which has traditionally been a shared resource without active monitoring; secure virtualization and segmentation tooling exists for GPUs, but mitigating this vulnerability is an active area of research.

Supply Chain & Data Poisoning Details

Key Exposure: Brand Reputation Damage & Performance Degradation; Non-Compliance with AI Governance Standards, especially for “high-risk systems”

Dependency: Target use of compromised data & models; integration of those vulnerabilities with CI/CD systems

Takeaway: Mitigate Supply Chain & Data Poisoning attacks via use of Open-Source Foundation Models and Open-Source Data wherein Data Provenance/Lineage can be established, versions can be hashed, etc.; thereafter enforce access and version control of fine-tuning data, contextual data (i.e. retrieval-augmented generation), etc.
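
A minimal sketch of the hashing step for CI/CD, with placeholder paths and digests:

```python
# Sketch: pin SHA-256 hashes of model/data artifacts so that poisoned or
# swapped versions fail the build before deployment.
import hashlib

def sha256_of(path, chunk=1 << 20):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

# Placeholder artifact path and digest; populate from your provenance records
PINNED = {"models/base.safetensors": "expected-hex-digest-here"}

def verify_artifacts():
    for path, digest in PINNED.items():
        assert sha256_of(path) == digest, f"artifact drift detected: {path}"
```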

Model Specific Vulnerability Details

Dependency: Lack of Active Assessment of Sensitive or External Systems

Key Exposure: Documentation & Distribution of System Vulnerabilities; Brand Reputation Damage & Performance Degradation

Takeaway: Utilize a Defense-in-Depth approach (e.g. Purple Teaming), especially for Auto-Regressive Models, while staying up to date on the latest attack & defense paradigms; utilize open-source code-generation and vulnerability assessment frameworks, contribute to the community, etc.

Examples of Traditional ML and Deep/Reinforcement Learning Vulnerabilities x Exposures (Click to Expand)

Reinforcement Learning - Invisible Blackbox Perturbations Compound Over Time

Key Exposure: System-Specific Vulnerability & Performance Degradation

Dependency: Lack of Actively Monitored & Versioned RL Policies

Takeaway: Mitigate the compounding nature of poorly aligned & incentivized reward functions and resultant RL policies by actively logging, monitoring & alerting such that divergent policies are identified; adversarial training increases robustness but these systems are still susceptible to attack
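
A minimal sketch of divergence monitoring against a versioned baseline policy; the KL threshold is illustrative:

```python
# Sketch: log KL divergence between a pinned reference policy and the live
# policy over a probe batch of states, alerting when policies drift apart.
import numpy as np

def policy_divergence(ref_probs, cur_probs, eps=1e-8):
    """Mean KL(ref || current) over a batch of action distributions."""
    ref = np.clip(ref_probs, eps, 1.0)
    cur = np.clip(cur_probs, eps, 1.0)
    return float(np.mean(np.sum(ref * np.log(ref / cur), axis=-1)))

def check_policy(ref_probs, cur_probs, threshold=0.1):   # threshold is illustrative
    kl = policy_divergence(ref_probs, cur_probs)
    if kl > threshold:
        print(f"ALERT: policy diverged from versioned baseline (KL={kl:.3f})")
    return kl
```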

Discriminative Machine Learning - Probe for Pipeline & Package Dependencies

Dependency: Requires Out-Of-Date Vulnerability Definitions and/or lack of image scanning when deploying previous builds

Key Exposure: Brand Reputation Damage & Performance Degradation

Takeaway: Mitigate attacks via commonly exploited repos and analytics packages by establishing best practices with respect to vulnerability management, repackaging, and image scanning

Changelog from LLM-Attacks base repo:

-Updated embedding functions within attack_manager.py to support multiple new model classes (e.g. Mi(s/x)tralForCausalLM, AutoGPTQForCausalLM, etc)

-Added conditional logic to the ModelWorker init inside attack_manager.py allowing for the loading of quantized models based on the presence of "GPTQ" in the model path (e.g. GPTQ versions of Mixtral); a loading sketch appears after this changelog

-Automated and parameterized the original demo.py into an extensible attack framework allowing for parameterized localization and configuration, iteration over defined target inputs/outputs with test criteria, logging of those prompts/adversarial strings to a standardized JSON format for later utilization, etc.
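
A sketch of the conditional quantized-model loading described above; exact arguments vary by deployment, and the auto-gptq usage here is an assumption about the environment rather than the repo's exact code:

```python
# Sketch: route GPTQ-quantized checkpoints to AutoGPTQ, everything else to
# the standard transformers loader, based on the model path.
from transformers import AutoModelForCausalLM, AutoTokenizer

def load_model(model_path, device="cuda:0"):
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    if "GPTQ" in model_path:
        from auto_gptq import AutoGPTQForCausalLM   # assumes auto-gptq is installed
        model = AutoGPTQForCausalLM.from_quantized(model_path, device=device)
    else:
        model = AutoModelForCausalLM.from_pretrained(model_path).to(device)
    return model, tokenizer
```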

Note: For details on the updated attack scripts, contact me directly; I'm trying to balance awareness of a non-patchable vulnerability against responsible open-source contribution. These attacks appear to work against any auto-regressive sequence model irrespective of architecture, including multimodal models.
