This project forked from reds-lab/meta-sift

The official implementation of Meta-Sift -- Ten minutes or less to find a 1000-size or larger clean subset on any poisoned dataset.

License: MIT License

Python 51.57% Jupyter Notebook 48.43%

Meta-Sift

Python 3.6 | PyTorch 1.10.1 | CUDA 11.0

Most existing defenses against data poisoning assume access to a set of clean data (referred to as the base set). This assumption has long been taken for granted. However, given the fast-growing research on stealthy data poisoning techniques, we find that defenders using existing methods, including manual inspection, cannot identify a clean base set within a contaminated dataset.

[Figure: Humanexp, human inspection results under data poisoning attacks]

The figure above shows the human inspection results for data poisoning attacks. Labels and images marked in red indicate attributes that may be manipulated under that attack category, and green indicates that the attribute remains intact. For each of the three types of attacks, we report the rate of misclassifying clean samples as poisoned (false positive rate, FPR) and poisoned samples as clean (false negative rate, FNR). The results reveal that humans cannot identify all poisoned samples with high precision. In particular, manual inspection of Feature-Only attacks (e.g., clean-label backdoor attacks) performs only marginally better than random selection.

Given the challenge of obtaining a clean base set with high precision, we take a step further and propose META-SIFT to resolve it. Our evaluation shows that META-SIFT can robustly sift out a clean base set (size 1000 or more) with 100% precision and zero variance under a wide range of poisoning attacks. The selected base set is large enough to enable successful defenses when plugged into existing AI-security defense techniques (e.g., robust training for mitigating label-noise attacks, trojan-net detection, backdoor removal defenses, or backdoor sample detection).

Features

  • Quickly sifts out clean subsets (about 80 seconds on CIFAR-10 with 5 GPUs)
  • No need to pre-train any model
  • Effective against most existing poisoning attack settings (evaluated on 16 existing label-flipping, backdoor, and poisoning attacks)
  • Applicable to most existing datasets (evaluated on CIFAR-10, GTSRB, PubFig, ImageNet)
  • Can be adopted as an off-the-shelf tool to enable existing defense algorithms in settings where no clean base set is available

Requirements

  • Python >= 3.6
  • PyTorch >= 1.10.1
  • Torchvision >= 0.11.2
  • Imageio >= 2.9.0

Usage & HOW-TO

Use the trojan_backdoor_detect_gtsrb.ipynb notebook for a quick start with the Meta-Sift method (demonstrated on the GTSRB dataset). By default, the notebook runs on the GTSRB dataset with the BadNets attack.

Several optional arguments are available in args:

  • corruption_type: The poisoning method.
  • corruption_ratio: The poisoning rate of the poisoning method.
  • tar_lab: The target label of the attack (if the attack does not target all labels).
  • repeat_rounds: The number of Sifters to use when selecting clean subsets, default 5.
  • warmup_round: The number of warm-up epochs before training the Sifters, default 1.
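As a rough illustration of how these arguments fit together, the sketch below builds a plain configuration object. Only the argument names and the two defaults (repeat_rounds = 5, warmup_round = 1) come from the list above. The container type, the "badnets" spelling, and the values for corruption_ratio and tar_lab are assumptions made for this example, so check the notebook for the actual definitions.

```python
from types import SimpleNamespace

# Hypothetical container: the notebook may define `args` differently.
args = SimpleNamespace(
    corruption_type="badnets",  # assumed identifier for the BadNets attack
    corruption_ratio=0.1,       # illustrative poisoning rate, not a documented default
    tar_lab=0,                  # illustrative target label, not a documented default
    repeat_rounds=5,            # number of Sifters (documented default)
    warmup_round=1,             # warm-up epochs before training Sifters (documented default)
)

# Basic sanity checks on the illustrative values.
assert 0.0 <= args.corruption_ratio <= 1.0
assert args.repeat_rounds >= 1 and args.warmup_round >= 0
```

Increasing repeat_rounds trains more Sifters, which the workflow below uses to average out randomness at the cost of extra training time.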

Overall Workflow

[Figure: wholeworkflow, the overall workflow of Meta-Sift]

The whole process of Meta-Sift consists of two stages: the Training Stage and the Identification Stage. Multiple (m) Sifters are used during the Identification Stage to reduce the randomness resulting from SGD and randomized sample dilution. To obtain them, the Training Stage is repeated m times with different random seeds. Each Sifter contains two structures working as a pair: the model θ and the MW-Net ψ. One iteration of the Training Stage has four steps: (1) virtual update of θ; (2) gradient sampling using the meta-gradient-sampler Γ; (3) meta-update of ψ; (4) actual update of θ. The Training Stage terminates after only one iteration. The trained Sifters are then used in the Identification Stage to assign weights to the diluted data from the dataset. Finally, Meta-Sift aggregates the results from the multiple Sifters and sifts out the clean samples by inspecting the high-value end.
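The final aggregation step can be sketched in a few lines of Python. This is a minimal illustration, not the repository's implementation: it assumes each Sifter produces one weight per sample and that aggregation is a simple mean across Sifters, whereas the text above only says the results are "aggregated". The function name aggregate_and_sift and the toy weights are hypothetical.

```python
from statistics import mean

def aggregate_and_sift(weights_per_sifter, num_clean):
    """Return indices of the `num_clean` samples with the highest aggregated weight.

    weights_per_sifter: list of m lists, one per Sifter, each holding one
    weight per sample (same sample order in every list).
    """
    num_samples = len(weights_per_sifter[0])
    if any(len(w) != num_samples for w in weights_per_sifter):
        raise ValueError("every Sifter must score the same number of samples")

    # Assumption: aggregate by averaging each sample's weight over the m Sifters.
    averaged = [mean(per_sample) for per_sample in zip(*weights_per_sifter)]

    # Rank samples from highest to lowest aggregated weight and keep the
    # high-value end as the candidate clean subset.
    ranked = sorted(range(num_samples), key=lambda i: averaged[i], reverse=True)
    return ranked[:num_clean]

# Toy example: m = 2 Sifters scoring 4 samples.
toy_weights = [
    [0.9, 0.1, 0.8, 0.2],
    [0.7, 0.3, 0.9, 0.1],
]
print(aggregate_and_sift(toy_weights, num_clean=2))  # prints [2, 0]
```

In this toy run, samples 2 and 0 receive the highest average weights (0.85 and 0.8), so they form the selected subset. In practice the number of Sifters would correspond to repeat_rounds.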
