zjysteven / mink-plus-plus

Min-K%++: Improved baseline for detecting pre-training data of LLMs https://arxiv.org/abs/2404.02936

Home Page: https://zjysteven.github.io/mink-plus-plus/

License: MIT License

Python 100.00%
llama llm mamba membership-inference-attack pythia pretraining-data-detection

mink-plus-plus's Introduction

Min-K%++: Improved Baseline for Detecting Pre-Training Data of LLMs

Overview

teaser figure

We propose a new membership inference attack method named Min-K%++ for detecting pre-training data of LLMs, which achieves state-of-the-art results among reference-free methods. This repo contains a lightweight implementation of our method (along with all baselines) on the WikiMIA benchmark. For experiments on the MIMIR benchmark, please refer to our fork here.
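For intuition, below is a rough, self-contained sketch of the Min-K%++ sequence score as described in the paper. The official scripts in this repo are the reference implementation; treat details here (e.g. the variance clamp) as illustrative only.

```python
import torch
import torch.nn.functional as F

def minkpp_score(logits, input_ids, k=0.2):
    """Sketch of the Min-K%++ sequence score (higher = more likely training data).

    logits: (seq_len, vocab_size) next-token logits from a causal LM for the input text.
    input_ids: (seq_len,) token ids of the same text.
    """
    # Align predictions with targets: logits at position t predict token t+1.
    logits, targets = logits[:-1], input_ids[1:]
    log_probs = F.log_softmax(logits, dim=-1)   # (seq_len-1, vocab)
    probs = log_probs.exp()

    # Log-probability of the observed next token.
    token_log_probs = log_probs.gather(-1, targets[:, None]).squeeze(-1)
    # Mean and std of next-token log-probability under the model's own distribution.
    mu = (probs * log_probs).sum(dim=-1)
    sigma = ((probs * log_probs.square()).sum(dim=-1) - mu.square()).clamp(min=1e-6).sqrt()

    # Per-token Min-K%++ score, then average the lowest k% of token scores.
    token_scores = (token_log_probs - mu) / sigma
    k_lowest = max(1, int(k * token_scores.numel()))
    return token_scores.topk(k_lowest, largest=False).values.mean().item()
```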


Setup

Environment

First, install torch according to your environment. Then install the remaining dependencies with pip install -r requirements.txt. This will install the latest transformers library from the GitHub main branch, which is required to run Mamba models as of 2024/04.

Our code is tested with Python 3.8, PyTorch 2.2.0, and CUDA 12.1.
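As a quick sanity check of the installed environment (not part of the official scripts; the model name below is just an example, any HF causal LM checkpoint works), you can verify that a model and tokenizer load and produce logits:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Example checkpoint; swap in the Pythia/LLaMA/Mamba model you plan to evaluate.
name = "EleutherAI/pythia-70m"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.float32)
model.eval()

with torch.no_grad():
    inputs = tokenizer("Membership inference on pre-training data", return_tensors="pt")
    logits = model(**inputs).logits

print(logits.shape)  # (1, seq_len, vocab_size)
```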

Data

All data splits are hosted on Hugging Face and will be loaded automatically when running the scripts; a minimal loading example follows the list below.

  • The original WikiMIA is from 🤗swj0419/WikiMIA.
  • The WikiMIA authors also studied a paraphrased setting, but the paraphrased data was not released. Here we provide our version, paraphrased by ChatGPT with an instruction to replace a certain number of words. The data is hosted at 🤗zjysteven/WikiMIA_paraphrased_perturbed.
  • In addition, to run the Neighbor attack, each input sentence needs to be perturbed with a masked language model to create perturbed neighbors. We also provide this perturbed data for everyone to use at 🤗zjysteven/WikiMIA_paraphrased_perturbed.
  • Lastly, we propose a new setting that simulates "detect-while-generating" by concatenating training text with leading non-training text. This split is hosted at 🤗zjysteven/WikiMIA_concat.
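For reference, here is a minimal sketch of loading a split with 🤗 datasets. The split name (WikiMIA_length64) and the input/label field names follow the original WikiMIA dataset card; treat them as assumptions and check the respective dataset cards before use.

```python
from datasets import load_dataset

# Original WikiMIA; splits are named by text length (e.g. WikiMIA_length64),
# and each example has an "input" text and a binary membership "label".
data = load_dataset("swj0419/WikiMIA", split="WikiMIA_length64")
print(data[0]["input"][:80], data[0]["label"])  # label: 1 = member, 0 = non-member

# Our paraphrased/perturbed and concat splits load the same way from
# zjysteven/WikiMIA_paraphrased_perturbed and zjysteven/WikiMIA_concat.
```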

Running

There are four scripts, each self-contained to facilitate quick reproduction and extension. The meaning of each script's arguments should be clear from their names.

  • run.py runs the Loss, Zlib, Min-K%, and Min-K%++ attacks on the WikiMIA dataset (either the original or the paraphrased version) with the specified model.
  • run_ref.py runs the Ref and Lowercase attacks on the WikiMIA dataset (either the original or the paraphrased version) with the specified model.
  • run_neighbor.py runs the Neighbor attack on the WikiMIA dataset (either the original or the paraphrased version) with the specified model.
  • run_concat.py focuses on the WikiMIA_concat dataset with the specified model. In this setting only the Loss, Zlib, Min-K%, and Min-K%++ attacks are applicable.

Each script writes a CSV file of method results (AUROC and TPR@FPR=5%) to the results directory, with the file path indicating the dataset and model. Sample results from running the four scripts are also provided in the results directory.
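The exact evaluation code lives in the scripts themselves; as a rough sketch, the two reported metrics can be computed from per-example scores (higher = more likely member) and membership labels with scikit-learn:

```python
import numpy as np
from sklearn.metrics import auc, roc_curve

def evaluate(scores, labels):
    """scores: detection scores, higher = more likely member; labels: 1 = member, 0 = non-member."""
    fpr, tpr, _ = roc_curve(labels, scores)
    auroc = auc(fpr, tpr)
    # TPR at FPR = 5%: the largest TPR achieved while FPR stays at or below 0.05.
    tpr_at_5fpr = tpr[np.searchsorted(fpr, 0.05, side="right") - 1]
    return auroc, tpr_at_5fpr

# Toy usage with random scores/labels.
auroc, tpr05 = evaluate(np.random.rand(100), np.random.randint(0, 2, 100))
print(f"AUROC={auroc:.3f}, TPR@5%FPR={tpr05:.3f}")
```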

HF paths of the models evaluated in the paper

Acknowledgement

This codebase is adapted from the official repo of Min-K% and WikiMIA.

Citation

If you find this work, repo, or our data splits useful, please consider citing our paper:

@article{zhang2024min,
  title={Min-K\%++: Improved Baseline for Detecting Pre-Training Data from Large Language Models},
  author={Zhang, Jingyang and Sun, Jingwei and Yeats, Eric and Ouyang, Yang and Kuo, Martin and Zhang, Jianyi and Yang, Hao and Li, Hai},
  journal={arXiv preprint arXiv:2404.02936},
  year={2024}
}

mink-plus-plus's People

Contributors

zjysteven


mink-plus-plus's Issues

Reference baseline

Hi, I am running the code for the reference attack on the MIMIR benchmark, and I don't get the same results as reported in the paper; they are not even close:

Your paper: Pythia 6.9b on Wikipedia -> 61.8 (Pythia 70m as the reference)
The result I get -> 54.7

Do you have any idea what might be the problem?

Training code

Hey, I would like to reproduce some of the results from the paper. Is the code to train the models available somewhere?

Lower performance than paper

Hi authors,

Congrats on this great work. I tried running your code with python run.py --model meta-llama/Llama-2-13b-hf, and I got:

| method | auroc | fpr95 | tpr05 |
|---|---|---|---|
| loss | 54.9% | 91.5% | 3.9% |
| zlib | 56.1% | 89.2% | 5.9% |
| mink_0.1 | 51.6% | 92.8% | 2.3% |
| mink_0.2 | 52.4% | 93.6% | 4.7% |
| mink_0.3 | 53.5% | 92.8% | 4.4% |
| mink_0.4 | 54.1% | 92.0% | 4.1% |
| mink_0.5 | 54.5% | 91.5% | 3.9% |
| mink_0.6 | 54.7% | 91.0% | 3.9% |
| mink_0.7 | 54.8% | 90.7% | 3.9% |
| mink_0.8 | 54.9% | 91.3% | 3.9% |
| mink_0.9 | 54.8% | 92.3% | 3.9% |
| mink_1.0 | 54.9% | 91.5% | 3.9% |
| mink++_0.1 | 60.8% | 87.4% | 6.2% |
| mink++_0.2 | 61.6% | 84.1% | 6.5% |
| mink++_0.3 | 61.5% | 84.8% | 5.4% |
| mink++_0.4 | 61.7% | 83.5% | 4.7% |
| mink++_0.5 | 61.5% | 85.3% | 5.4% |
| mink++_0.6 | 61.5% | 85.9% | 6.5% |
| mink++_0.7 | 61.7% | 84.3% | 7.2% |
| mink++_0.8 | 61.8% | 85.3% | 6.2% |
| mink++_0.9 | 61.7% | 85.6% | 5.2% |
| mink++_1.0 | 60.8% | 84.6% | 6.2% |

In the paper, the AUROC is more than 80%. I am not sure if I did something wrong. Thank you.
