baoguangsheng / fast-detect-gpt
Code base for "Fast-DetectGPT: Efficient Zero-Shot Detection of Machine-Generated Text via Conditional Probability Curvature".
License: MIT License
I attempted to reproduce the baseline results on the ChatGPT-generated PubMed dataset. I found that the LRR (Neo-2.7) result reported in the paper is 0.7433, as shown in the figure below. However, the result I obtained using the code you provided is 0.5980. I wonder whether there is a problem with my experiment.
First, thanks for your excellent work.
I am using a Mac M2 with GPU, and the MPS device has started working.
However, I am facing the issue below; please take some time to help.
(fast-detect-gpt) scripts % python local_infer.py
MPS device is available.
Loading model /Users/WorkStation/AI/models/gpt-neo-2.7B...
Moving model to GPU...DONE (1.01s)
ProbEstimator: total 0 samples.
Local demo for Fast-DetectGPT, where the longer text has more reliable result.
Please enter your text: (Press Enter twice to start processing)
Disguised as police, they broke through a fence on Monday evening and broke into the cargo of a Swiss-bound plane to take the valuable items. The audacious heist occurred at an airport in a small European country, leaving authorities baffled and airline officials in shock.
Traceback (most recent call last):
File "/Users/WorkStation/wsworkenv/ai-project/fast-detect-gpt/scripts/local_infer.py", line 100, in
run(args)
File "/Users/WorkStation/wsworkenv/ai-project/fast-detect-gpt/scripts/local_infer.py", line 86, in run
prob = prob_estimator.crit_to_prob(crit)
File "/Users/WorkStation/wsworkenv/ai-project/fast-detect-gpt/scripts/local_infer.py", line 35, in crit_to_prob
offset = np.sort(np.abs(np.array(self.real_crits + self.fake_crits) - crit))[100]
IndexError: index 100 is out of bounds for axis 0 with size 0
The issue probably comes from 'ProbEstimator: total 0 samples.' How can I solve this?
Thanks!
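The zero-sample message means ProbEstimator found no `*.json` files under its reference path, so the later index into an empty array fails. A minimal sketch for checking the path before running local_infer.py (the helper name `check_ref_path` is mine; the JSON layout with `predictions.real` / `predictions.samples` follows the ProbEstimator code quoted later in this thread):

```python
import glob
import json
import os

def check_ref_path(ref_path):
    """Count the reference criterion samples ProbEstimator would load."""
    total = 0
    for result_file in glob.glob(os.path.join(ref_path, '*.json')):
        with open(result_file, 'r') as fin:
            res = json.load(fin)
        total += len(res['predictions']['real']) + len(res['predictions']['samples'])
    return total

# n = check_ref_path('./local_infer_ref')
# If n == 0, the script was likely launched from a working directory where
# the repo's local_infer_ref JSON files are not visible.
```

If the count is zero, run the script from the `scripts` directory of the repo (or point the reference path at wherever the `local_infer_ref` JSON files actually live).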
Does the code support multiple GPUs? A single one of my cards may not be able to fit gpt-j-6B.
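Not speaking for the authors, but a model too large for one card can usually be sharded across several GPUs with Hugging Face Accelerate's `device_map="auto"`. A minimal sketch (the helper name `load_sharded` and the fp16 choice are my assumptions, not the repo's code):

```python
def load_sharded(model_name="EleutherAI/gpt-j-6B"):
    """Shard a causal LM across all visible GPUs via device_map='auto'."""
    # Deferred imports so the sketch can be pasted anywhere;
    # requires `pip install torch transformers accelerate`.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        torch_dtype=torch.float16,  # halves memory vs. float32
        device_map="auto",          # split layers across available devices
    )
    model.eval()
    return tokenizer, model
```

With sharding enabled, inputs still go to the first device; `model.generate`/forward calls route activations between cards automatically.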
Very nice paper, thanks. A few questions:
1. The paper covers news, stories, and PubMed, which is clearly a narrow range. How should the method be extended to text from all kinds of professions and industries?
2. How can it adapt to more models? White-box access is clearly difficult; perhaps a few frequently used models can be handled white-box, but most cases will be black-box.
3. Are the results provided in local_infer_ref test results? Is the comparison between these test results and new predictions used to decide the probability? In that case, in practical applications, are the breadth and representativeness of the test text, and the representativeness of the black-box model used, the keys to accuracy?
Could you walk through the full workflow for the questions above?
Thanks!
Hi, I really like the style of the online demo. Could you provide the corresponding code?
In fast_detect_gpt.py
log_likelihood = lprobs.gather(dim=-1, index=labels)
(lprobs indexed from score vocabulary, while labels indexed from reference vocabulary)
vocab_size = min(logits_ref.size(-1), logits_score.size(-1))
logits_ref = logits_ref[:, :, :vocab_size]
logits_score = logits_score[:, :, :vocab_size]
This indicates that the reference model must have almost the same vocabulary size (50,257 in your case) and indexing as the scoring model. I think this should be mentioned in the paper, because it is a key condition for being so fast. Again, maybe the relevant sentence is somewhere and I missed it; actually, it might even belong in the abstract.
Sorry for giving such strict comments on your paper. It is a good paper and I really like it!!
Hello! Thanks for your code!
When I run "python scripts/local_infer.py", the following error appears.
../cache\local.EleutherAI_gpt-neo-2.7B does not appear to have a file named config.json. Checkout 'https://huggingface.co/../cache\local.EleutherAI_gpt-neo-2.7B/None' for available files.
urllib3.exceptions.SSLError: EOF occurred in violation of protocol (_ssl.c:997)
Traceback (most recent call last):
File "D:\WELL\test_2\venv\lib\site-packages\requests\adapters.py", line 486, in send
resp = conn.urlopen(
File "D:\WELL\test_2\venv\lib\site-packages\requests\sessions.py", line 703, in send
r = adapter.send(request, **kwargs)
File "D:\WELL\test_2\venv\lib\site-packages\huggingface_hub\utils\_http.py", line 63, in send
return super().send(request, *args, **kwargs)
File "D:\WELL\test_2\venv\lib\site-packages\requests\adapters.py", line 517, in send
raise SSLError(e, request=request)
requests.exceptions.SSLError: (MaxRetryError("HTTPSConnectionPool(host='cdn-lfs.huggingface.co', port=443): Max retries exceeded with url: /EleutherAI/gpt-neo-2.7B/bd06eed99c791348df5bec014bd399960ef364b9951e1be8180940a3669a9067?response-content-disposition=attachment%3B+filename*%3DUTF-8%27%27pytorch_model.bin%3B+filename%3D%22pytorch_model.bin%22%3B&response-content-type=application%2Foctet-stream&Expires=1702736680&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTcwMjczNjY4MH19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2RuLWxmcy5odWdnaW5nZmFjZS5jby9FbGV1dGhlckFJL2dwdC1uZW8tMi43Qi9iZDA2ZWVkOTljNzkxMzQ4ZGY1YmVjMDE0YmQzOTk5NjBlZjM2NGI5OTUxZTFiZTgxODA5NDBhMzY2OWE5MDY3P3Jlc3BvbnNlLWNvbnRlbnQtZGlzcG9zaXRpb249KiZyZXNwb25zZS1jb250ZW50LXR5cGU9KiJ9XX0_&Signature=ILBqt5E8Uyjz8HPS3Drd3cLSxeHoa~dcKNseLy0woOIYyR-ufdC6i0n59I~Ba7eE~8Whba-n4FftBAJ49390yn2bMdF7xe-DBTQW2vjf~fZSb1X-a9RpodL5R12QR5Ch6YQAiHTQqUWJKN~N3YeNP0MOAQ3S72ThyWPx-M4NcfpOL4Kz5DZnf0XaP25vXSZThOH28JTFMMw-6rHZ1EEn2Rqn4oNMX~ca6PG3YgcpGgMXpxu~pSLA4JvcsX986cYnbITuPqTE7PYTR1GrSDO~6osvD1SjqpX~zgof7pN751U84gd5hG9RaiVb7zB4QYZUGrHu7aal5u4G1KSj0PprxA__&Key-Pair-Id=KVTP0A1DKRTAX (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:997)')))"), '(Request ID: f103e268-410c-4046-a0a8-e706b3127485)')
When I check the website https://huggingface.co/../cache, I find there is no data. Could you please help to solve this problem? Thanks!!!
I'm using fast detect gpt on the runpod.io service. Now and then I get an error
"{ "error_type": "<class 'RuntimeError'>", "error_message": "CUDA error: device-side assert triggered\nCompile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.\n", "error_traceback": "Traceback (most recent call last):\n File \"/usr/local/lib/python3.8/dist-packages/runpod/serverless/modules/rp_job.py\", line 134, in run_job\n handler_return = handler(job)\n File \"/fast-detect-gpt/scripts/handler.py\", line 71, in handler\n result = run(args, scoring_tokenizer, scoring_model, reference_tokenizer, reference_model, prob_estimator)\n File \"/fast-detect-gpt/scripts/handler.py\", line 53, in run\n tokenized = scoring_tokenizer(text, return_tensors=\"pt\", padding=True, return_token_type_ids=False).to(args.device)\n File \"/usr/local/lib/python3.8/dist-packages/transformers/tokenization_utils_base.py\", line 758, in to\n self.data = {k: v.to(device=device) for k, v in self.data.items()}\n File \"/usr/local/lib/python3.8/dist-packages/transformers/tokenization_utils_base.py\", line 758, in <dictcomp>\n self.data = {k: v.to(device=device) for k, v in self.data.items()}\nRuntimeError: CUDA error: device-side assert triggered\nCompile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.\n\n", "hostname": "pg8w41di8j68k2-644113a2", "worker_id": "pg8w41di8j68k2", "runpod_version": "1.6.0" }"
Can you suggest how this can be fixed? The runpod support replied that there is nothing wrong with the hardware
Thanks for this amazing work! I'm wondering if you could release the visualization code for Figure 1 (Distribution of conditional probability curvatures)? I'm currently following up this work and would like to visualize the probability distribution for my work.
Hi,
First, I just want to thank you for your hard work cleaning up all the other existing zero-shot code. This is convenient for everyone. Many thanks!
Second, I am a little confused about why this model works well even when the reference model and the scoring model are the same. From my understanding, if x is a candidate passage generated by, say, gpt-neo-2.7B, and we use another model to rewrite each next token given the previous original tokens, the perturbed x is not going to be "familiar" to gpt-neo-2.7B, and thus we can leverage the curvature information from the conditional distribution and apply a threshold.
However, when we use the same model to rewrite each next token, I don't see why this idea works. Perhaps I am not understanding your paper well.
Dear Mr. Bao
Thank you very much for your paper. Here is a final thought about this paper.
In practice, it is very difficult to get good predictions across different scenarios using a single feature. Algorithm engineers combine many features to build a good prediction model. Rather than competing with other features, the D feature from your paper would be combined with them. Thus, the relationship between the D feature and other features (for example, the covariance between them) becomes an important aspect.
Generally speaking, beating the other features is good, but it would be even better to understand the relationships among all the features. You did a good job in "Connection to Likelihood and Entropy".
Kind regards,
Hello, thanks for your work! I ran into some confusion while reading the code:
In fast_detect_gpt.py, what is the difference between the use cases of get_sampling_discrepancy and get_sampling_discrepancy_analytic? (Is it the difference between sampling and not sampling?)
I noticed that the local interactive demo local_infer.py does not call the former by default. Does this affect the local results?
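Not an official answer, but reading the repo code quoted later in this thread: get_sampling_discrepancy draws token samples from the reference distribution and takes the empirical mean/std of their scored log-likelihoods, while get_sampling_discrepancy_analytic computes that mean and variance in closed form (E[log q] = Σᵢ pᵢ log qᵢ), so the analytic version gives the same statistic without sampling noise. A toy NumPy sketch (illustrative logits and names, not the repo's code) showing the Monte Carlo mean converging to the analytic one for a single token position:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy next-token logits over a 5-word vocabulary.
logits_ref = rng.normal(size=5)    # reference (sampling) model
logits_score = rng.normal(size=5)  # scoring model

p_ref = np.exp(logits_ref) / np.exp(logits_ref).sum()     # softmax
logq = logits_score - np.log(np.exp(logits_score).sum())  # log-softmax

# Analytic mean/variance of log q(token) when token ~ p_ref,
# as in get_sampling_discrepancy_analytic.
mean_analytic = (p_ref * logq).sum()
var_analytic = (p_ref * logq**2).sum() - mean_analytic**2

# Monte Carlo estimate, as in get_sampling_discrepancy with many draws.
samples = rng.choice(5, size=100_000, p=p_ref)
mean_mc = logq[samples].mean()
var_mc = logq[samples].var()
```

With enough samples `mean_mc` and `var_mc` match the analytic values closely, which is presumably why the demo defaults to the cheaper, deterministic analytic variant.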
Thank you! Is it suitable for other LLMs?
Dear Mr. Bao
I wish you a happy new year!
Regarding your paper on "Fast-DetectGPT", could you please explain why you divide by the standard deviation of the conditional probability caused by perturbation in Equation (3)?
Kind regards,
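Not an official clarification, but as I read Eq. (3), the conditional probability curvature divides the log-likelihood gap by the standard deviation of the sampled log-likelihoods, which turns the raw gap into a z-score and makes the statistic comparable across passages of different lengths and entropies (my paraphrase of the paper's definition):

```latex
\mathbf{d}(x, p_\theta, q_\varphi)
  = \frac{\log p_\theta(x \mid x) - \tilde{\mu}}{\tilde{\sigma}}, \qquad
\tilde{\mu} = \mathbb{E}_{\tilde{x} \sim q_\varphi(\tilde{x} \mid x)}
  \left[ \log p_\theta(\tilde{x} \mid x) \right], \qquad
\tilde{\sigma}^2 = \mathbb{E}_{\tilde{x} \sim q_\varphi(\tilde{x} \mid x)}
  \left[ \left( \log p_\theta(\tilde{x} \mid x) - \tilde{\mu} \right)^2 \right]
```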
Dear authors,
Congrats on publishing this amazing work. I have a detailed question regarding the var_ref variable in the code.
fast-detect-gpt/scripts/fast_detect_gpt.py
Lines 67 to 68 in 4a3f02e
Here, I can understand why summation is performed on log_likelihood and mean_ref: summing the log likelihoods yields the overall log likelihood for the whole sentence, right? However, I'm having trouble understanding the actual meaning of summing the var, which I haven't managed to connect to either var.sum(-1).sqrt() or var.sqrt().sum(-1). Probably either way leads to similar performance, but I just want to make sure I fully understand this part.
Any insights will be greatly appreciated. Thanks!
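Not the authors, but one way to see it: if the per-token log-likelihoods are treated as independent, the variance of their sum is the sum of their variances, so the standard deviation of the sentence-level score is var.sum(-1).sqrt(); summing per-token standard deviations (var.sqrt().sum(-1)) would overestimate it. A NumPy sketch under that independence assumption (toy numbers, mine):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate 3 independent per-token log-likelihood positions, each with
# its own standard deviation, over many sampled "sentences".
stds = np.array([0.5, 1.0, 2.0])
tokens = rng.normal(0.0, stds, size=(200_000, 3))  # rows = sampled sentences
sentence_scores = tokens.sum(axis=1)

empirical_std = sentence_scores.std()
sum_then_sqrt = np.sqrt((stds**2).sum())  # var.sum().sqrt(): matches
sqrt_then_sum = stds.sum()                # var.sqrt().sum(): too large
```

Here `sum_then_sqrt` tracks the empirical sentence-level standard deviation, while `sqrt_then_sum` exceeds it, which is consistent with the code normalizing by var_ref.sum(-1).sqrt().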
I appreciate your work Fast-DetectGPT, published at ICLR 2024. However, I have a problem using your method in our setting.
I try to detect a piece of text without knowing the source model, as shown in your online demo (http://region-9.autodl.pro:21504/#/show). I run your method with both the scoring model and the reference model set to gpt-neo-2.7B, as in your demo, and I use the local ref path from your GitHub repo.
However, the results differ, in both the crit and the prob, and I don't know where the problem is.
Here is my code:
import torch
import json
import glob
import os
import time
import numpy as np
from transformers import AutoModelForCausalLM, AutoTokenizer

class ProbEstimator:
    def __init__(self, ref_path):
        self.real_crits = []
        self.fake_crits = []
        for result_file in glob.glob(os.path.join(ref_path, '*.json')):
            with open(result_file, 'r') as fin:
                res = json.load(fin)
                self.real_crits.extend(res['predictions']['real'])
                self.fake_crits.extend(res['predictions']['samples'])
        print(f'ProbEstimator: total {len(self.real_crits) * 2} samples.')

    def crit_to_prob(self, crit):
        offset = np.sort(np.abs(np.array(self.real_crits + self.fake_crits) - crit))[100]
        cnt_real = np.sum((np.array(self.real_crits) > crit - offset) & (np.array(self.real_crits) < crit + offset))
        cnt_fake = np.sum((np.array(self.fake_crits) > crit - offset) & (np.array(self.fake_crits) < crit + offset))
        return cnt_fake / (cnt_real + cnt_fake)

def get_samples(logits, labels):
    assert logits.shape[0] == 1
    assert labels.shape[0] == 1
    nsamples = 10000
    lprobs = torch.log_softmax(logits, dim=-1)
    distrib = torch.distributions.categorical.Categorical(logits=lprobs)
    samples = distrib.sample([nsamples]).permute([1, 2, 0])
    return samples

def get_likelihood(logits, labels):
    assert logits.shape[0] == 1
    assert labels.shape[0] == 1
    labels = labels.unsqueeze(-1) if labels.ndim == logits.ndim - 1 else labels
    lprobs = torch.log_softmax(logits, dim=-1)
    log_likelihood = lprobs.gather(dim=-1, index=labels)
    return log_likelihood.mean(dim=1)

def get_sampling_discrepancy(logits_ref, logits_score, labels):
    assert logits_ref.shape[0] == 1
    assert logits_score.shape[0] == 1
    assert labels.shape[0] == 1
    if logits_ref.size(-1) != logits_score.size(-1):
        # print(f"WARNING: vocabulary size mismatch {logits_ref.size(-1)} vs {logits_score.size(-1)}.")
        vocab_size = min(logits_ref.size(-1), logits_score.size(-1))
        logits_ref = logits_ref[:, :, :vocab_size]
        logits_score = logits_score[:, :, :vocab_size]
    samples = get_samples(logits_ref, labels)
    log_likelihood_x = get_likelihood(logits_score, labels)
    log_likelihood_x_tilde = get_likelihood(logits_score, samples)
    miu_tilde = log_likelihood_x_tilde.mean(dim=-1)
    sigma_tilde = log_likelihood_x_tilde.std(dim=-1)
    discrepancy = (log_likelihood_x.squeeze(-1) - miu_tilde) / sigma_tilde
    return discrepancy.item()

def get_sampling_discrepancy_analytic(logits_ref, logits_score, labels):
    assert logits_ref.shape[0] == 1
    assert logits_score.shape[0] == 1
    assert labels.shape[0] == 1
    if logits_ref.size(-1) != logits_score.size(-1):
        # print(f"WARNING: vocabulary size mismatch {logits_ref.size(-1)} vs {logits_score.size(-1)}.")
        vocab_size = min(logits_ref.size(-1), logits_score.size(-1))
        logits_ref = logits_ref[:, :, :vocab_size]
        logits_score = logits_score[:, :, :vocab_size]
    labels = labels.unsqueeze(-1) if labels.ndim == logits_score.ndim - 1 else labels
    lprobs_score = torch.log_softmax(logits_score, dim=-1)
    probs_ref = torch.softmax(logits_ref, dim=-1)
    log_likelihood = lprobs_score.gather(dim=-1, index=labels).squeeze(-1)
    mean_ref = (probs_ref * lprobs_score).sum(dim=-1)
    var_ref = (probs_ref * torch.square(lprobs_score)).sum(dim=-1) - torch.square(mean_ref)
    discrepancy = (log_likelihood.sum(dim=-1) - mean_ref.sum(dim=-1)) / var_ref.sum(dim=-1).sqrt()
    discrepancy = discrepancy.mean()
    return discrepancy.item()

scoring_model_name = 'gpt_neo_2.7B'
reference_model_name = 'gpt_neo_2.7B'
start = time.time()
device = 'cuda:0'
scoring_tokenizer = AutoTokenizer.from_pretrained(scoring_model_name, padding_side='right')
scoring_tokenizer.pad_token_id = scoring_tokenizer.eos_token_id
scoring_model = AutoModelForCausalLM.from_pretrained(scoring_model_name).to(device)
scoring_model.eval()
if reference_model_name != scoring_model_name:
    reference_tokenizer = AutoTokenizer.from_pretrained(reference_model_name, padding_side='right')
    reference_model = AutoModelForCausalLM.from_pretrained(reference_model_name)
    reference_model.eval()
end = time.time()
print(f'loading time: {end - start}')

name = "sampling_discrepancy_analytic"
criterion_fn = get_sampling_discrepancy_analytic
prob_estimator = ProbEstimator('fast_detect_gpt_result')

text = '在智能手机终端市场需求疲软,出货量增长乏力的背景下,折叠屏如同一道曙光,照亮手机市场,成为智能手机市场唯一还在增长的细分品类。根据IDC统计数据,2022年国内折叠屏市场继续维持稳定增长态势,全年出货量达到近330万台,同比增长高达118%,2023年一季度国内折叠屏手机实现出货101.5万台,较2022年同期增至52.86%。\n数据显示,2019年国内折叠屏手机市场规模约为\n28.28亿元,2022年已增长至127.49亿元。在直屏智能手机的硬件配置和功能体验进入瓶颈期,智能手机市场陷入低迷状态之时,折叠屏手机的创新技术成熟,在保持一定便利性的同时,还很好地解决了屏幕尺寸受限,因此才能逆势上扬,市场表现愈发火热。\n从竞争格局来看,自折叠屏手机面世,市场呈快速发展趋势,各大主流品牌纷纷在此领域投入布局,我国如今的折叠屏手机市场,呈现群雄逐鹿的状态。在2022年国内折叠屏手机行业中,华为、三星和OPPO排名前三。华为作为入局最早的厂商之一,一直都是折叠屏技术研发的主力,占据我国折叠屏市场份额的47.4%,成为该领域中最畅销的手机品牌。\n华经产业研究院研究团队使用桌面研究与定量调查、定性分析相结合的方式,全面客观的剖析折叠屏手机行业发展的总体市场容量、产业链、竞争格局、经营特性、盈利能力和商业模式等。科学使用SCP模型、SWOT、PEST、回归分析、SPACE矩阵等研究模型与方法综合分析折叠屏手机行业市场环境、产业政策、竞争格局、技术革新、市场风险、行业壁垒、机遇以及挑战等相关因素。根据折叠屏手机行业的发展轨迹及实践经验,精心研究编制《2023-2028年**折叠屏手机行业发展监测及投资前景展望报告》,为企业、科研、投资机构等单位投资决策、战略规划、产业研究提供重要参考。'

tokenized = scoring_tokenizer(text, return_tensors="pt", padding=True, return_token_type_ids=False).to(device)
labels = tokenized.input_ids[:, 1:]
with torch.no_grad():
    logits_score = scoring_model(**tokenized).logits[:, :-1]
    if reference_model_name == scoring_model_name:
        logits_ref = logits_score
    else:
        tokenized = reference_tokenizer(text, return_tensors="pt", padding=True, return_token_type_ids=False).to(device)
        assert torch.all(tokenized.input_ids[:, 1:] == labels), "Tokenizer is mismatch."
        logits_ref = reference_model(**tokenized).logits[:, :-1]
    crit = criterion_fn(logits_ref, logits_score, labels)
print(crit)
end = time.time()
print(f'inference time: {end - start}')
prob = prob_estimator.crit_to_prob(crit)
print(f'Fast-DetectGPT criterion is {crit:.4f}, suggesting that the text has a probability of {prob * 100:.0f}% to be fake.')
Hi, I have a piece of text. How should I input it to Fast-DetectGPT instead of using the default JSON file?
Hello, thank you very much for open-sourcing such excellent work. While using it, I found that local evaluation results for some data do not match the online version, especially for Chinese data. Using GPT-NEO-2.7b reports the following:
328 CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
How can I reproduce the same results as the online demo? Do I need to change the model, or modify the contents of the local_infer_ref file?