baoguangsheng / fast-detect-gpt Goto Github PK

Code base for "Fast-DetectGPT: Efficient Zero-Shot Detection of Machine-Generated Text via Conditional Probability Curvature".

License: MIT License

Python 78.64% Shell 21.36%

fast-detect-gpt's People

Stargazers

Watchers

fast-detect-gpt's Issues

The issue of reproducing LRR

I attempted to reproduce the performance of baselines on the ChatGPT-generated PubMed dataset. I discovered that the LRR (Neo-2.7) result reported in the paper is 0.7433, as shown in the figure below. However, the result I obtained using the code you provided is 0.5980. I wonder if there is any problem with my experiment.

Excellent work, and need help.

First, thanks your excellent works.

I am using the Mac M2 with GPU, and the MPS device is started work.

But, I am facing below issue, please take time for help.

(fast-detect-gpt) scripts % python local_infer.py
MPS device is available.
Loading model /Users/WorkStation/AI/models/gpt-neo-2.7B...
Moving model to GPU...DONE (1.01s)
ProbEstimator: total 0 samples.
Local demo for Fast-DetectGPT, where the longer text has more reliable result.

Please enter your text: (Press Enter twice to start processing)
Disguised as police, they broke through a fence on Monday evening and broke into the cargo of a Swiss-bound plane to take the valuable items. The audacious heist occurred at an airport in a small European country, leaving authorities baffled and airline officials in shock.

Traceback (most recent call last):
File "/Users/WorkStation/wsworkenv/ai-project/fast-detect-gpt/scripts/local_infer.py", line 100, in
run(args)
File "/Users/WorkStation/wsworkenv/ai-project/fast-detect-gpt/scripts/local_infer.py", line 86, in run
prob = prob_estimator.crit_to_prob(crit)
File "/Users/WorkStation/wsworkenv/ai-project/fast-detect-gpt/scripts/local_infer.py", line 35, in crit_to_prob
offset = np.sort(np.abs(np.array(self.real_crits + self.fake_crits) - crit))[100]
IndexError: index 100 is out of bounds for axis 0 with size 0

Maybe the issue comes from 'ProbEstimator: total 0 samples.', so how can I solve this?

thanks

多GPU

请问代码支持多GPU吗，我一张显卡可能放不下gpt-j-6B

如何应用到更多的文字场景，各个专业，各个行业的文字，适应更多的模型？

非常好的论文，感谢，有几个问题：
1、论文里提到了新闻，小说，pubmed，显然比较窄，应当如何扩展到各个专业的文字，各行各业？
2、如何适应更多的模型，显然白盒是比较困难的，可能一些高频常用的模型可以白盒。大部分应该是黑盒模式
3、提供的local_info_ref 中的结果是测试结果吗？使用这些测试结果和新预测数据的比较作为可能性的判定吗？那么在实际应用中测试文字的广度，文字的代表性，使用的黑盒模型的代表性是否是准确度的关键呢？

以上问题的整个实施流程能否解答一下

感谢！

可以提供OnlineDemo的代码吗？

你好，挺喜欢 OnlineDemo 的样式，请问可以提供相应的代码吗？

About the following code

In fast_detect_gpt.py

log_likelihood = lprobs.gather(dim=-1, index=labels)
(lprobs indexed from score vocabulary, while labels indexed from reference vocabulary)

vocab_size = min(logits_ref.size(-1), logits_score.size(-1))
logits_ref = logits_ref[:, :, :vocab_size]
logits_score = logits_score[:, :, :vocab_size]

This indicates that the reference model should have almost the same vocabulary size (50,257 in your case) and index as the score model. I think this should be mentioned in the paper, because this is a key condition to be so fast. Again, maybe the relative sentence is somewhere and I missed it. Actually, maybe this sentence should be in the abstract.

Sorry that I give strict comments to your paper. It is a good paper and I really like it!!

Run Error

Hello! Thanks for your code!
When I run "python scripts/local_infer.py", the error appears.

../cache\local.EleutherAI_gpt-neo-2.7B does not appear to have a file named config.json. Checkout 'https://huggingface.co/../cache\local.EleutherAI_gpt-neo-2.7B/None' for available files.
urllib3.exceptions.SSLError: EOF occurred in violation of protocol (_ssl.c:997)

Traceback (most recent call last):
  File "D:\WELL\test_2\venv\lib\site-packages\requests\adapters.py", line 486, in send
    resp = conn.urlopen(
  File "D:\WELL\test_2\venv\lib\site-packages\requests\sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "D:\WELL\test_2\venv\lib\site-packages\huggingface_hub\utils\_http.py", line 63, in send
    return super().send(request, *args, **kwargs)
  File "D:\WELL\test_2\venv\lib\site-packages\requests\adapters.py", line 517, in send
    raise SSLError(e, request=request)
requests.exceptions.SSLError: (MaxRetryError("HTTPSConnectionPool(host='cdn-lfs.huggingface.co', port=443): Max retries exceeded with url: /EleutherAI/gpt-neo-2.7B/bd06eed99c791348df5bec014bd399960ef364b9951e1be8180940a3669a9067?response-content-disposition=attachment%3B+filename*%3DUTF-8%27%27pytorch_model.bin%3B+filename%3D%22pytorch_model.bin%22%3B&response-content-type=application%2Foctet-stream&Expires=1702736680&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTcwMjczNjY4MH19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2RuLWxmcy5odWdnaW5nZmFjZS5jby9FbGV1dGhlckFJL2dwdC1uZW8tMi43Qi9iZDA2ZWVkOTljNzkxMzQ4ZGY1YmVjMDE0YmQzOTk5NjBlZjM2NGI5OTUxZTFiZTgxODA5NDBhMzY2OWE5MDY3P3Jlc3BvbnNlLWNvbnRlbnQtZGlzcG9zaXRpb249KiZyZXNwb25zZS1jb250ZW50LXR5cGU9KiJ9XX0_&Signature=ILBqt5E8Uyjz8HPS3Drd3cLSxeHoa~dcKNseLy0woOIYyR-ufdC6i0n59I~Ba7eE~8Whba-n4FftBAJ49390yn2bMdF7xe-DBTQW2vjf~fZSb1X-a9RpodL5R12QR5Ch6YQAiHTQqUWJKN~N3YeNP0MOAQ3S72ThyWPx-M4NcfpOL4Kz5DZnf0XaP25vXSZThOH28JTFMMw-6rHZ1EEn2Rqn4oNMX~ca6PG3YgcpGgMXpxu~pSLA4JvcsX986cYnbITuPqTE7PYTR1GrSDO~6osvD1SjqpX~zgof7pN751U84gd5hG9RaiVb7zB4QYZUGrHu7aal5u4G1KSj0PprxA__&Key-Pair-Id=KVTP0A1DKRTAX (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:997)')))"), '(Request ID: f103e268-410c-4046-a0a8-e706b3127485)')

When I check the website https://huggingface.co/../cache, I find there is no data. Could you please help to solve this problem? Thanks!!!

result problem

我正在学习您的论文的过程中，这是我运行fast_detect_gpt.py的结果，我并没有在您的论文中找到对应的数据，我想知道这个结果对应的是采用的哪些采样/打分模型等，感谢！
或者我的步骤是否出现了错误？

Error on runpod

I'm using fast detect gpt on the runpod.io service. Now and then I get an error

"{ "error_type": "<class 'RuntimeError'>", "error_message": "CUDA error: device-side assert triggered\nCompile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.\n", "error_traceback": "Traceback (most recent call last):\n File \"/usr/local/lib/python3.8/dist-packages/runpod/serverless/modules/rp_job.py\", line 134, in run_job\n handler_return = handler(job)\n File \"/fast-detect-gpt/scripts/handler.py\", line 71, in handler\n result = run(args, scoring_tokenizer, scoring_model, reference_tokenizer, reference_model, prob_estimator)\n File \"/fast-detect-gpt/scripts/handler.py\", line 53, in run\n tokenized = scoring_tokenizer(text, return_tensors=\"pt\", padding=True, return_token_type_ids=False).to(args.device)\n File \"/usr/local/lib/python3.8/dist-packages/transformers/tokenization_utils_base.py\", line 758, in to\n self.data = {k: v.to(device=device) for k, v in self.data.items()}\n File \"/usr/local/lib/python3.8/dist-packages/transformers/tokenization_utils_base.py\", line 758, in <dictcomp>\n self.data = {k: v.to(device=device) for k, v in self.data.items()}\nRuntimeError: CUDA error: device-side assert triggered\nCompile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.\n\n", "hostname": "pg8w41di8j68k2-644113a2", "worker_id": "pg8w41di8j68k2", "runpod_version": "1.6.0" }"

Can you suggest how this can be fixed? The runpod support replied that there is nothing wrong with the hardware

Very good project, I would like to ask if the macbook can run, because there is no gpu

Visualization Code for Figure 1

Thanks for this amazing work! I'm wondering if you could release the visualization code for Figure 1 (Distribution of conditional probability curvatures)? I'm currently following up this work and would like to visualize the probability distribution for my work.

Appreciation and One question

Hi,

First, I just wanna appreciate your hard work on cleaning all other existing zero-shot codes. This provides convenience to everyone. Many thanks!

Second, I am a little bit confused why this model works well even when the reference model and the scoring model are same. From my understanding, if x is a candidate passage generated from, say gpt-neo-2.7B, then if we use another model to rewrite each next token given the previous original tokens, the perturbed x is not going to be "familiar" to gpt-neo-2.7B, and thus we can leverage the curvature information from the conditional distribution and do the threshold.

However, when we use the same model to rewrite each next token, I don't see why this idea works. Perhaps I am not understanding your paper well.

Does it support Chinese text detection?

Final thoughts

Dear Mr. Bao

Thank you very much for your paper. Here is a final thought about this paper.

In practice, it is very difficult to use one feature to get good prediction among different senerios. The algrithom engineers will combine lots of features to get a good prediction model. Instead of competing with other features, the D feature from your paper will be combined with them. Thus, the relationship between the D feature and other features, (for example, the corvarience between them) becomes an important aspect.

Generally speaking, to beat the other features is good, and it would be even better to understand this relationship between all the features. You did a good job at "Connection to Likelihood and Entropy".

Kindly regards,

Code question

您好，感谢您的工作！我在阅读代码的时候遇到了一些困惑：
在fast_detect-gpt.py文件中，函数get_sampling_discrepancy和get_sampling_discrepancy_analytic的使用场景区别是什么？（采样与不采样的区别吗？）
我观察到local_infer.py文件这个本地交互演示默认并没有调用前者，这会对本地的计算结果产生影响吗？

Excellent work, and what's the min GPU requirement?

Thank you ! Is it suitable for other LLMS?

Explanation of the division in equation (3)

Dear Mr. Bao

I wish you a happy new year!
Regarding to your paper about "Fast-DetectGpt", Could you please explain the reason why you divided the standard deviation of the condition probability caused by perturbation in Equation (3)?

Kindly regards,

Connecting standard deviation in Eq. 5 with the var in the code

Dear authors,

Congrats on publishing this amazing work. I have a detailed question regarding $\tilde{\sigma}$ in Eq. 5 in the paper and the var_ref variable in the code.

fast-detect-gpt/scripts/fast_detect_gpt.py

Lines 67 to 68 in 4a3f02e

    
           var_ref = (probs_ref * torch.square(lprobs_score)).sum(dim=-1) - torch.square(mean_ref) 
        
           discrepancy = (log_likelihood.sum(dim=-1) - mean_ref.sum(dim=-1)) / var_ref.sum(dim=-1).sqrt()

Here, I can understand why summation is performed on log_likelihood and mean_ref: it is because summing the log likelihood yields the overall log likelihood for the whole sentence, right? However, I'm having trouble to understand the actual meaning of summing the var, which I haven't managed to connect to the $\tilde{\sigma}$ defined in Eq. 5. Also, because of this lack of understanding, I can't see clearly whether it should be var.sum(-1).sqrt() or var.sqrt().sum(-1). Probably either way can lead to similar performance, but I just want to make sure I can fully understand this part.

Any insights will be greatly appreciated. Thanks!

How to reproduce the result as shown in your local demo?

I am appreciate your work Fast-Detect-Gpt published in ICLR 2024. However, I have a problem when using your methods in our setting.

I try to detect a piece of text without knowing the potential model as shown in your online demo(http://region-9.autodl.pro:21504/#/show), and run your method with the scoring model and reference model being gpt-neo-2.7B as in your demo, and I use your local ref path in your github repo.

However, the result is different, both the crit and the prob, I don' t know where the problem is.

Here are my codes:

import torch
import json
import glob
import os
import time
import numpy as np
from transformers import AutoModelForCausalLM, AutoTokenizer

class ProbEstimator:
def init(self, ref_path):
self.real_crits = []
self.fake_crits = []
for result_file in glob.glob(os.path.join(ref_path, '*.json')):
with open(result_file, 'r') as fin:
res = json.load(fin)
self.real_crits.extend(res['predictions']['real'])
self.fake_crits.extend(res['predictions']['samples'])
print(f'ProbEstimator: total {len(self.real_crits) * 2} samples.')

def crit_to_prob(self, crit):
    offset = np.sort(np.abs(np.array(self.real_crits + self.fake_crits) - crit))[100]
    cnt_real = np.sum((np.array(self.real_crits) > crit - offset) & (np.array(self.real_crits) < crit + offset))
    cnt_fake = np.sum((np.array(self.fake_crits) > crit - offset) & (np.array(self.fake_crits) < crit + offset))
    return cnt_fake / (cnt_real + cnt_fake)

def get_samples(logits, labels):
assert logits.shape[0] == 1
assert labels.shape[0] == 1
nsamples = 10000
lprobs = torch.log_softmax(logits, dim=-1)
distrib = torch.distributions.categorical.Categorical(logits=lprobs)
samples = distrib.sample([nsamples]).permute([1, 2, 0])
return samples

def get_likelihood(logits, labels):
assert logits.shape[0] == 1
assert labels.shape[0] == 1
labels = labels.unsqueeze(-1) if labels.ndim == logits.ndim - 1 else labels
lprobs = torch.log_softmax(logits, dim=-1)
log_likelihood = lprobs.gather(dim=-1, index=labels)
return log_likelihood.mean(dim=1)

def get_sampling_discrepancy(logits_ref, logits_score, labels):
assert logits_ref.shape[0] == 1
assert logits_score.shape[0] == 1
assert labels.shape[0] == 1
if logits_ref.size(-1) != logits_score.size(-1):
# print(f"WARNING: vocabulary size mismatch {logits_ref.size(-1)} vs {logits_score.size(-1)}.")
vocab_size = min(logits_ref.size(-1), logits_score.size(-1))
logits_ref = logits_ref[:, :, :vocab_size]
logits_score = logits_score[:, :, :vocab_size]

samples = get_samples(logits_ref, labels)
log_likelihood_x = get_likelihood(logits_score, labels)
log_likelihood_x_tilde = get_likelihood(logits_score, samples)
miu_tilde = log_likelihood_x_tilde.mean(dim=-1)
sigma_tilde = log_likelihood_x_tilde.std(dim=-1)
discrepancy = (log_likelihood_x.squeeze(-1) - miu_tilde) / sigma_tilde
return discrepancy.item()

def get_sampling_discrepancy_analytic(logits_ref, logits_score, labels):
assert logits_ref.shape[0] == 1
assert logits_score.shape[0] == 1
assert labels.shape[0] == 1
if logits_ref.size(-1) != logits_score.size(-1):
# print(f"WARNING: vocabulary size mismatch {logits_ref.size(-1)} vs {logits_score.size(-1)}.")
vocab_size = min(logits_ref.size(-1), logits_score.size(-1))
logits_ref = logits_ref[:, :, :vocab_size]
logits_score = logits_score[:, :, :vocab_size]

labels = labels.unsqueeze(-1) if labels.ndim == logits_score.ndim - 1 else labels
lprobs_score = torch.log_softmax(logits_score, dim=-1)
probs_ref = torch.softmax(logits_ref, dim=-1)
log_likelihood = lprobs_score.gather(dim=-1, index=labels).squeeze(-1)
mean_ref = (probs_ref * lprobs_score).sum(dim=-1)
var_ref = (probs_ref * torch.square(lprobs_score)).sum(dim=-1) - torch.square(mean_ref)
discrepancy = (log_likelihood.sum(dim=-1) - mean_ref.sum(dim=-1)) / var_ref.sum(dim=-1).sqrt()
discrepancy = discrepancy.mean()
return discrepancy.item()

scoring_model_name = 'gpt_neo_2.7B'
reference_model_name = 'gpt_neo_2.7B'
start = time.time()

device = 'cuda:0'

scoring_tokenizer = AutoTokenizer.from_pretrained(scoring_model_name, padding_side = 'right')
scoring_tokenizer.pad_token_id = scoring_tokenizer.eos_token_id
scoring_model = AutoModelForCausalLM.from_pretrained(scoring_model_name).to(device)
scoring_model.eval()
if reference_model_name != scoring_model_name:
reference_tokenizer = AutoTokenizer.from_pretrained(reference_model_name, padding_side = 'right')
reference_model = AutoModelForCausalLM.from_pretrained(reference_model_name)
reference_model.eval()

evaluate criterion

end = time.time()
print(f'loading time: {end-start}')

name = "sampling_discrepancy_analytic"
criterion_fn = get_sampling_discrepancy_analytic
prob_estimator = ProbEstimator('fast_detect_gpt_result')

text = '在智能手机终端市场需求疲软，出货量增长乏力的背景下，折叠屏如同一道曙光，照亮手机市场，成为智能手机市场唯一还在增长的细分品类。根据IDC统计数据，2022年国内折叠屏市场继续维持稳定增长态势，全年出货量达到近330万台，同比增长高达118%，2023年一季度国内折叠屏手机实现出货101.5万台，较2022年同期增至52.86%。\n数据显示，2019年国内折叠屏手机市场规模约为\n28.28亿元，2022年已增长至127.49亿元。在直屏智能手机的硬件配置和功能体验进入瓶颈期，智能手机市场陷入低迷状态之时，折叠屏手机的创新技术成熟，在保持一定便利性的同时，还很好地解决了屏幕尺寸受限，因此才能逆势上扬，市场表现愈发火热。\n从竞争格局来看，自折叠屏手机面世，市场呈快速发展趋势，各大主流品牌纷纷在此领域投入布局，我国如今的折叠屏手机市场，呈现群雄逐鹿的状态。在2022年国内折叠屏手机行业中，华为、三星和OPPO排名前三。华为作为入局最早的厂商之一，一直都是折叠屏技术研发的主力，占据我国折叠屏市场份额的47.4%，成为该领域中最畅销的手机品牌。\n华经产业研究院研究团队使用桌面研究与定量调查、定性分析相结合的方式，全面客观的剖析折叠屏手机行业发展的总体市场容量、产业链、竞争格局、经营特性、盈利能力和商业模式等。科学使用SCP模型、SWOT、PEST、回归分析、SPACE矩阵等研究模型与方法综合分析折叠屏手机行业市场环境、产业政策、竞争格局、技术革新、市场风险、行业壁垒、机遇以及挑战等相关因素。根据折叠屏手机行业的发展轨迹及实践经验，精心研究编制《2023-2028年**折叠屏手机行业发展监测及投资前景展望报告》，为企业、科研、投资机构等单位投资决策、战略规划、产业研究提供重要参考。'

tokenized = scoring_tokenizer(text, return_tensors="pt", padding=True, return_token_type_ids=False).to(device)
labels = tokenized.input_ids[:, 1:]
with torch.no_grad():
logits_score = scoring_model(**tokenized).logits[:, :-1]
if reference_model_name == scoring_model_name:
logits_ref = logits_score
else:
tokenized = reference_tokenizer(text, return_tensors="pt", padding=True, return_token_type_ids=False).to(device)
assert torch.all(tokenized.input_ids[:, 1:] == labels), "Tokenizer is mismatch."
logits_ref = reference_model(**tokenized).logits[:, :-1]
crit = criterion_fn(logits_ref, logits_score, labels)

estimate the probability of machine generated text

print(crit)
end = time.time()
print(f'inference time: {end-start}')
prob = prob_estimator.crit_to_prob(crit)
print(f'Fast-DetectGPT criterion is {crit:.4f}, suggesting that the text has a probability of {prob * 100:.0f}% to be fake.')

How to detect directly on a single text?

Hi, I have a piece of text. How should I input it to Fast-DetectGPT instead of using the default JSON file?

你好，怎么复现你们http://region-9.autodl.pro:21504/#/show上面的测评结果呀

你好，很感谢你们开源了如此优秀的工作，我在使用过程中发现有一些数据本地测评的结果跟线上版本不一致，尤其是中文数据，使用GPT-NEO-2.7b会报如下问题：
328 CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

请问一下，如何复现跟在线demo一样的结果呢？需要调整模型还是修改local_infer_ref文件中的内容呢？

	var_ref = (probs_ref * torch.square(lprobs_score)).sum(dim=-1) - torch.square(mean_ref)
	discrepancy = (log_likelihood.sum(dim=-1) - mean_ref.sum(dim=-1)) / var_ref.sum(dim=-1).sqrt()

baoguangsheng / fast-detect-gpt Goto Github PK

fast-detect-gpt's People

Stargazers

Watchers

Forkers

fast-detect-gpt's Issues

evaluate criterion

estimate the probability of machine generated text

Recommend Projects

Recommend Topics

Recommend Org