bychelsea / vand-april-gan
[CVPR 2023 Workshop] VAND Challenge: 1st Place on Zero-shot AD and 4th Place on Few-shot AD
Can I test without loading pre-trained weights, and how do I set the checkpoint_path?
Is it chosen based on empirical observations?
From debugging the code, it seems that here the patch tokens of just one reference image are compared to the query image, since we iterate over `patch_tokens`, which has the length of `few_shot_features`. Shouldn't it be possible to use multiple reference images with the `k_shot` variable?
Lines 189 to 201 in 46fcbe5
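Conceptually, supporting `k_shot > 1` only requires pooling the patch tokens of all K reference images into one memory bank and scoring each query patch by its distance to the nearest reference patch across all shots. The NumPy sketch below illustrates that idea for a single feature layer, using 1 minus the maximum cosine similarity as the distance. `few_shot_anomaly_map` is a hypothetical helper, and this is not necessarily what lines 189 to 201 do.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def few_shot_anomaly_map(query_tokens, reference_tokens):
    """Nearest-neighbour anomaly map against a memory bank built from K shots.

    query_tokens:     (N, C) patch tokens of the query image.
    reference_tokens: (K, N, C) patch tokens of K normal reference images.
    Returns an (H, W) map with H = W = sqrt(N); higher means more anomalous.
    """
    n, c = query_tokens.shape
    memory = l2_normalize(reference_tokens.reshape(-1, c))  # (K*N, C): patches from all shots
    query = l2_normalize(query_tokens)                      # (N, C)
    similarity = query @ memory.T                           # (N, K*N) cosine similarities
    score = 1.0 - similarity.max(axis=1)                    # distance to closest normal patch
    side = int(round(np.sqrt(n)))
    return score.reshape(side, side)
```

With K reference images the memory bank simply grows from N to K·N rows; the query side does not change.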
When I run:
```
!python /content/VAND-APRIL-GAN/train.py --train_data_path "/content/VAND-APRIL-GAN/data" --config_path "/content/VAND-APRIL-GAN/open_clip/model_configs/ViT-B-16.json"
```
it shows this error:
```
/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py:200: UserWarning: Error detected in LinalgVectorNormBackward0. No forward pass information available. Enable detect anomaly during forward pass for more information. (Triggered internally at ../torch/csrc/autograd/python_anomaly_mode.cpp:92.)
  Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
Traceback (most recent call last):
  File "/content/VAND-APRIL-GAN/train.py", line 176, in <module>
    train(args)
  File "/content/VAND-APRIL-GAN/train.py", line 140, in train
    loss.backward()
  File "/usr/local/lib/python3.10/dist-packages/torch/_tensor.py", line 487, in backward
    torch.autograd.backward(
  File "/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.HalfTensor [1, 196, 512]], which is output 0 of AsStridedBackward0, is at version 1; expected version 0 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
```
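The backward pass fails inside `LinalgVectorNormBackward0` on a `[1, 196, 512]` tensor. That suggests (an assumption, since the forward code is not shown here) that the patch tokens are normalized in place, e.g. `tokens /= tokens.norm(dim=-1, keepdim=True)`, after the norm has saved them for its own backward pass. The toy check below reproduces that pattern with a hypothetical `backward_succeeds` helper and shows that the out-of-place form avoids the error.

```python
import torch


def normalize_in_place(tokens):
    # Triggers the error: the norm's backward needs the original `tokens`,
    # but `/=` overwrites them and bumps their version counter (0 -> 1).
    tokens /= tokens.norm(dim=-1, keepdim=True)
    return tokens


def normalize_out_of_place(tokens):
    # Fix: the division allocates a new tensor, so the saved input is untouched.
    return tokens / tokens.norm(dim=-1, keepdim=True)


def backward_succeeds(normalize):
    """Run a tiny forward/backward pass and report whether autograd succeeds."""
    torch.manual_seed(0)
    weight = torch.randn(512, 512, requires_grad=True)  # stands in for a trainable layer
    x = torch.randn(1, 196, 512)
    tokens = x @ weight  # non-leaf tensor shaped like the one in the traceback
    try:
        normalize(tokens).sum().backward()
    except RuntimeError:
        return False
    return weight.grad is not None
```

To locate the offending line in train.py itself, `torch.autograd.set_detect_anomaly(True)` before the forward pass makes the warning include the forward-pass stack trace, as the warning text suggests.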
I want to know if there is a variable or method I can use to score how suspicious/anomalous an image is. I'm kind of new to this stuff, so idk lol
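In the zero-shot path, the image-level score appears to be the softmax probability of the "abnormal" prompt, i.e. `text_probs[0][1]` in test.py (as another issue on this page notes), with `text_probs = softmax(100 · image_features @ text_features)`. The NumPy sketch below mirrors that computation for one image; `zero_shot_image_score` is a hypothetical name, and column 0/1 = normal/abnormal is assumed.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def zero_shot_image_score(image_feature, text_features, temperature=100.0):
    """Probability that an image is anomalous.

    image_feature: (C,) global CLIP image embedding.
    text_features: (C, 2); column 0 = "normal" prompts, column 1 = "abnormal" prompts.
    """
    img = image_feature / np.linalg.norm(image_feature)
    txt = text_features / np.linalg.norm(text_features, axis=0, keepdims=True)
    probs = softmax(temperature * (img @ txt))  # (2,): [P(normal), P(abnormal)]
    return float(probs[1])
```

Higher values mean "more anomalous"; turning the score into a yes/no decision still requires choosing a threshold.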
Hello, thank you for your contribution to anomaly detection in the zero-shot learning domain. However, I have a question that I found in the code and I hope you can explain it. During the zero-shot process, when processing normal images, as shown in line 169 of the test.py file, text_probs[0][1] is still being used to represent semantic information. According to my understanding, text_probs[0][0] should represent the semantic information of normal images, while text_probs[0][1] should represent the semantic information of abnormal images. Therefore, when processing normal images, should the code be changed to text_probs[0][0]? Thank you very much!
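Using index 1 for every image is consistent with how an image-level AUROC works: each image receives one "how anomalous" score, and the ground-truth label only enters at evaluation time. Switching to `text_probs[0][0]` for normal images would give them a "how normal" score instead, which inverts the ranking. The pure-Python sketch below makes this concrete with a hypothetical `auroc` helper and illustrative (made-up) probabilities.

```python
def auroc(scores, labels):
    """Pairwise AUROC: share of (anomalous, normal) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one normal and one anomalous image")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Illustrative (P(normal), P(abnormal)) pairs; labels: 0 = normal, 1 = anomalous.
probs = [(0.9, 0.1), (0.8, 0.2), (0.3, 0.7), (0.4, 0.6)]
labels = [0, 0, 1, 1]

consistent = [p[1] for p in probs]                                # always index 1
mixed = [p[0] if y == 0 else p[1] for p, y in zip(probs, labels)]  # label-dependent index
```

Here `auroc(consistent, labels)` is 1.0, while `auroc(mixed, labels)` drops to 0.0 because normal images now get high scores. Choosing the index from the label would also leak ground truth into the prediction.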
Could you provide the metrics for ResNet?
Hi, I couldn't find the license information for this repository. Could you please provide details on the specific license for this project?
Thanks for your great work!
By the way, what type of GPU are you using and how long does it take?🙌
For the detection results of test.py, besides the generated heatmaps, is there any other way to tell whether a particular image is anomalous?
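An image-level score only becomes a normal/abnormal decision once a threshold is fixed. One option, assuming you have a small labeled validation set, is to choose the threshold that maximizes F1 (the idea behind the F1-max metric). `best_f1_threshold` and `is_anomalous` below are hypothetical helpers, not part of the repository.

```python
def best_f1_threshold(scores, labels):
    """Return (best_f1, threshold), trying every observed score as a threshold.

    An image is predicted anomalous when score >= threshold; label 1 = anomalous.
    """
    best_f1, best_t = 0.0, None
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        if tp == 0:
            continue  # precision/recall undefined or zero
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f1:
            best_f1, best_t = f1, t
    return best_f1, best_t


def is_anomalous(score, threshold):
    """Binary decision for a single image."""
    return score >= threshold
```

For example, `best_f1_threshold([0.1, 0.2, 0.7, 0.9], [0, 0, 1, 1])` returns `(1.0, 0.7)`. A threshold tuned this way should then be kept fixed for new images.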
Why did you set the number of epochs to 3 when training on the MVTec dataset, but to 15 when training on the Visa dataset? I noticed that the loss on MVTec was still decreasing after the third epoch.
I hope this message finds you well. I've been working with your code for the Visa and MVTec datasets, and I've encountered an issue related to the missing meta.json file in the dataset path /data/visa/meta.json.
It seems that the code relies on this meta.json file to load important dataset information, and as a result, I'm encountering a FileNotFoundError when trying to run the code. The code snippet that specifically references the missing file is as follows:
meta_info = json.load(open(f'{self.root}/meta.json', 'r'))
I have checked the provided dataset path, and indeed, there is no meta.json file located at /data/visa/meta.json.
Could you please provide more guidance on how to resolve this issue? Do I need to create or obtain the meta.json file for the dataset, and if so, how should it be structured?
Your assistance in resolving this issue would be greatly appreciated. Thank you for your time and support.
Best regards,
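If meta.json has to be built by hand, a sketch like the following may help. It assumes an MVTec-style layout (`<class>/train/good`, `<class>/test/<defect>`, masks in `<class>/ground_truth/<defect>/<name>_mask.png`) and a split → class → list-of-entries structure with paths relative to the dataset root. The entry field names (`img_path`, `mask_path`, `cls_name`, `specie_name`, `anomaly`) are assumptions to check against what the dataset class actually reads. Visa's original release uses a different layout and would need its own scan; if the repository ships its own generator scripts, prefer those.

```python
import json
import tempfile
from pathlib import Path


def build_meta(root):
    """Scan an MVTec-style tree and return {"train": {cls: [...]}, "test": {cls: [...]}}."""
    root = Path(root)
    meta = {"train": {}, "test": {}}
    for cls_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for split in ("train", "test"):
            split_dir = cls_dir / split
            if not split_dir.is_dir():
                continue
            entries = []
            for specie_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
                is_good = specie_dir.name == "good"
                for img in sorted(specie_dir.glob("*.png")):
                    # Normal images have no mask; defects use <name>_mask.png.
                    mask = "" if is_good else (
                        cls_dir / "ground_truth" / specie_dir.name / f"{img.stem}_mask.png"
                    ).relative_to(root).as_posix()
                    entries.append({
                        "img_path": img.relative_to(root).as_posix(),
                        "mask_path": mask,
                        "cls_name": cls_dir.name,
                        "specie_name": specie_dir.name,
                        "anomaly": 0 if is_good else 1,
                    })
            meta[split][cls_dir.name] = entries
    return meta


def write_meta(root):
    """Write <root>/meta.json and return its path."""
    path = Path(root) / "meta.json"
    path.write_text(json.dumps(build_meta(root), indent=4))
    return path


if __name__ == "__main__":
    # Tiny fake dataset to show the resulting structure.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "bottle/train/good").mkdir(parents=True)
        (Path(tmp) / "bottle/train/good/000.png").touch()
        print(write_meta(tmp).read_text())
```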
Hello, dear authors! In the code "image_features, patch_tokens = model.encode_image(image, features_list)", is "image_features" the global image representation, just like "image_features = model.encode_image(image)" in the OpenCLIP source code, as we usually use it? You changed the transformer to additionally output the tokens of the layers given in "features_list". I'm not sure whether my understanding is correct.
Hi! I am very interested in your excellent work. I do believe that this can bring new insights into zero-shot anomaly detection and pioneer the way into unifying anomaly detection. Let's make anomaly detection great together.
Thank you so much for your code in VAND-APRIL-GAN. Thanks to it, I was able to study this field better and understand the code more deeply.
However, I have a question about visualization. The visualization uses a heatmap, and defects are usually marked in red. In my case, however, because of an incorrect threshold setting, not only the defects but also other regions are marked in red. Do you have any ideas on how to address this? What threshold values should I generally use?
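Without seeing the exact visualization code, one likely cause (an assumption) is that each map is min-max normalized on its own, so even a nearly normal image gets stretched to full red. A more stable option is a fixed threshold calibrated on the raw anomaly maps of known-normal images, for example a high quantile of their pixel values. The helper names below are hypothetical.

```python
import numpy as np


def threshold_from_normal_maps(normal_maps, quantile=0.99):
    """Pick a pixel threshold from raw anomaly maps of known-normal images."""
    values = np.concatenate([np.asarray(m, dtype=np.float64).ravel() for m in normal_maps])
    return float(np.quantile(values, quantile))


def binarize(anomaly_map, threshold):
    """Mark only pixels strictly above the threshold as defective."""
    return np.asarray(anomaly_map) > threshold
```

The quantile trades false alarms against missed defects: 0.99 tolerates about 1% of normal pixels above the threshold. Computing it per object class is usually safer than sharing one value across classes.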
First of all, thank you for your work. I found that both your AUROC and F1-max scores for zero-shot segmentation on the MVTec AD dataset are higher than WinCLIP's, but your AUPRO is lower (64.6 for WinCLIP vs. 44 for your work). Can you provide an explanation? Thank you.
I want to view the heatmap as in the paper, but I can't find how. Can someone help me with some code or documentation for doing it?
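A minimal way to draw such a heatmap, assuming you already have an RGB image and an anomaly map of the same height and width, is to color-code the map and alpha-blend it over the image. The blue-to-red ramp below is a simple stand-in for a colormap such as OpenCV's `COLORMAP_JET`; the paper's figures may use a different one. `overlay_heatmap` is a hypothetical helper.

```python
import numpy as np


def overlay_heatmap(image, anomaly_map, alpha=0.5, vmin=None, vmax=None):
    """Blend a color-coded anomaly map over an RGB image.

    image: (H, W, 3) uint8 array; anomaly_map: (H, W) float array.
    vmin/vmax: optional fixed value range; if omitted, the map's own min/max are used.
    """
    m = np.asarray(anomaly_map, dtype=np.float64)
    lo = m.min() if vmin is None else vmin
    hi = m.max() if vmax is None else vmax
    span = hi - lo
    m = np.clip((m - lo) / span, 0.0, 1.0) if span > 0 else np.zeros_like(m)
    # Blue (low) to red (high) ramp as a simple stand-in for a jet-style colormap.
    heat = np.stack([m, np.zeros_like(m), 1.0 - m], axis=-1) * 255.0
    blended = (1.0 - alpha) * image.astype(np.float64) + alpha * heat
    return np.clip(blended, 0, 255).astype(np.uint8)
```

With Pillow, `Image.fromarray(overlay_heatmap(img, amap)).save("heatmap.png")` writes the result. Passing a fixed `vmin`/`vmax` keeps colors comparable across images.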
I want to use this excellent work on a new dataset, but the dataset has no mask labels. My objective is image-level anomaly detection (normal vs. abnormal classification). Is it possible to achieve this with this code?
Thank you very much for this nice repository. I was wondering what is denoted by the abbreviations "px" and "sp" in the code.
Hi, thanks for this nice work. I have a question for discussion.
Question: How can we use image_features (train.py, line 112) instead of patch_tokens with a ResNet50 backbone? Do you have any suggestions on how to achieve this?
In the original code (with the ResNet50 backbone), you matrix-multiply patch_tokens at different scales with text_features, with shapes:
(B, 9216, 768) and (B, 768, 2) => (B, 9216, 2)
(B, 2304, 768) and (B, 768, 2) => (B, 2304, 2)
(B, 576, 768) and (B, 768, 2) => (B, 576, 2)
and then reshape and interpolate the results to the target anomaly-map size, and so on.
But image_features has shape (B, 768) and text_features has shape (B, 768, 2). How should we modify and design the remaining steps so that we can still train the linear layers and generate anomaly maps for inference?
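The shape arithmetic shows the core difficulty: patch tokens keep a token axis N that can be reshaped into a √N × √N grid, while the global `image_features` has no spatial axis, so multiplying it with `text_features` yields only one pair of logits per image. The NumPy sketch below contrasts the two. Function names are hypothetical, and nearest-neighbour upsampling stands in for the bilinear interpolation the original pipeline likely uses.

```python
import numpy as np


def patch_anomaly_map(patch_tokens, text_features, out_size):
    """(B, N, C) tokens x (B, C, 2) text -> (B, 2, out_size, out_size) maps."""
    b, n, _ = patch_tokens.shape
    logits = np.einsum("bnc,bck->bnk", patch_tokens, text_features)  # (B, N, 2)
    side = int(round(np.sqrt(n)))                                    # e.g. 576 -> 24
    if side * side != n or out_size % side:
        raise ValueError("N must be a square and out_size a multiple of sqrt(N)")
    grid = logits.reshape(b, side, side, 2).transpose(0, 3, 1, 2)   # (B, 2, side, side)
    scale = out_size // side
    # Nearest-neighbour upsampling as a stand-in for bilinear interpolation.
    return grid.repeat(scale, axis=2).repeat(scale, axis=3)


def global_logits(image_features, text_features):
    """(B, C) x (B, C, 2) -> (B, 2): image-level logits with no spatial layout."""
    return np.einsum("bc,bck->bk", image_features, text_features)
```

`global_logits` can supervise an image-level label, but it cannot produce an anomaly map on its own; spatial output needs features that still have a spatial axis, such as an intermediate stage before pooling.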
If you have any questions, feel free to ask, thanksss!
Hi!
I was trying to run with a ResNet50 (RN50x16) backbone using this command:
```
!python train.py --dataset visa --train_data_path /content/visa-dataset/ \
  --save_path ./exps/mvtec/RN50x16_384 --config_path ./open_clip/model_configs/RN50x16.json --model RN50x16 \
  --features_list 1 2 3 4 --pretrained openai --image_size 384 --batch_size 8 --aug_rate -1 --print_freq 1 \
  --epoch 3 --save_freq 1
```
But it does not work correctly:
```
Traceback (most recent call last):
  File "/code/VAND-APRIL-GAN/train.py", line 170, in <module>
    train(args)
  File "/code/VAND-APRIL-GAN/train.py", line 108, in train
    image_features, patch_tokens = model.encode_image(image, features_list)
  File "/code/VAND-APRIL-GAN/open_clip/model.py", line 213, in encode_image
    features = self.visual(image, out_layers)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
TypeError: ModifiedResNet.forward() takes 2 positional arguments but 3 were given
```
Could you share the changes that should be made in modified_resnet.py to allow this? Thank you.
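The `TypeError` says `ModifiedResNet.forward` accepts only the image, while `encode_image` passes `out_layers` as a second positional argument and expects a pair back. The sketch below shows the shape of such a change on a hypothetical toy module (`ToyMultiScaleResNet`), not on the real `ModifiedResNet`, whose stem, bottleneck stages, and attention pool differ. It collects the outputs of the requested stages and flattens each to a (B, H·W, C) token layout.

```python
import torch
from torch import nn


class ToyMultiScaleResNet(nn.Module):
    """Hypothetical stand-in for ModifiedResNet; only the forward signature matters here."""

    def __init__(self, width=8, embed_dim=16):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(3, width, 3, stride=2, padding=1), nn.ReLU())
        self.layer1 = nn.Conv2d(width, width, 3, stride=1, padding=1)
        self.layer2 = nn.Conv2d(width, 2 * width, 3, stride=2, padding=1)
        self.layer3 = nn.Conv2d(2 * width, 4 * width, 3, stride=2, padding=1)
        self.layer4 = nn.Conv2d(4 * width, 8 * width, 3, stride=2, padding=1)
        self.pool_proj = nn.Linear(8 * width, embed_dim)  # stand-in for the attention pool

    def forward(self, x, out_layers=()):
        # Accepts out_layers as a second positional argument, matching
        # the call self.visual(image, out_layers) in encode_image.
        x = self.stem(x)
        patch_tokens = []
        stages = (self.layer1, self.layer2, self.layer3, self.layer4)
        for idx, layer in enumerate(stages, start=1):
            x = layer(x)
            if idx in out_layers:
                # (B, C, H, W) -> (B, H*W, C)
                patch_tokens.append(x.flatten(2).transpose(1, 2))
        pooled = self.pool_proj(x.mean(dim=(2, 3)))
        return pooled, patch_tokens
```

Unlike the ViT path, each ResNet stage has a different channel count and grid size, so the trainable linear layers that follow would likely need a per-stage input dimension.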
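In train.py:
```
if args.dataset == 'mvtec':
    train_data = MVTecDataset(root=args.train_data_path, transform=preprocess, target_transform=transform,
                              aug_rate=args.aug_rate)
```
mode='train' is not set here, while the default value of mode in MVTecDataset is 'test'. Consider the following code:
```
if mode == 'train':
    # If the mode is 'train'
    self.cls_names = [obj_name]  # add the object name to the class-name list
    save_dir = os.path.join(save_dir, 'k_shot.txt')  # build the path for the save directory
else:
    self.cls_names = list(meta_info.keys())  # otherwise, get all class names
```
Neither test.py nor train.py ever executes the mode == 'train' branch. Should the default value of mode be set to 'train', or is my understanding wrong and mode is meant to be 'test' in train.py?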
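Hello, when I modified train.py to train the network, the following error occurred when computing the gradients of the final loss: RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation. Do you know how to solve this? My CUDA version is 12.2, so the versions in requirement.txt are not suitable; I first used torch 2.1.0 and later switched to 2.2.1+cu118, and the problem occurs with both. Looking forward to your reply.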
Hello! During training, are any CLIP weights fine-tuned apart from the added linear layers, or is the CLIP backbone frozen?
Hello, may I ask whether you think the linear layers applied to the intermediate network layers learn the text-language alignment ability from scratch? And have you tried training the framework initialized with another well-pretrained feature extractor (one that was not trained with CLIP)?
As far as I understood, unlike WinClip for anomaly segmentation, the original ground truth of test images is used for model adaptation, while the final performance is reported on the same test images. Can you please confirm?
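In the few-shot setting, is it possible to produce an anomaly classification score for each individual image? The source code seems to compute statistics per class. Could you explain how the maximum value of the anomaly map is used?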
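A per-image score can be read off each image's anomaly map by taking its maximum pixel value, since a single strongly anomalous region should flag the whole image. The sketch below computes that per image, and optionally adds an image-level zero-shot probability. The additive fusion is an assumption, not confirmed from test.py, and `per_image_scores` is a hypothetical helper.

```python
import numpy as np


def per_image_scores(anomaly_maps, zero_shot_scores=None):
    """One score per image from a batch of anomaly maps.

    anomaly_maps: (B, H, W) array; each image's score is the maximum of its map.
    zero_shot_scores: optional (B,) image-level probabilities added on top
    (an assumed fusion rule; check how test.py actually combines them).
    """
    maps = np.asarray(anomaly_maps, dtype=np.float64)
    scores = maps.reshape(maps.shape[0], -1).max(axis=1)
    if zero_shot_scores is not None:
        scores = scores + np.asarray(zero_shot_scores, dtype=np.float64)
    return scores
```

These per-image scores can be stored individually before any per-class metric (such as AUROC) aggregates them.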
Hello, I tested with one normal image and one anomalous image, using the following code:
```
image = preprocess(Image.open("/content/000.png")).unsqueeze(0).to(device)
obj_list = ["screw"]
with torch.cuda.amp.autocast(), torch.no_grad():
    text_prompts = encode_text_with_prompt_ensemble(model, obj_list, tokenizer, device)
    image_features = model.encode_image(image)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features = []
    text_features.append(text_prompts["screw"])
    text_features = torch.stack(text_features, dim=0)
    # sample
    text_probs = (100.0 * image_features @ text_features[0]).softmax(dim=-1)
```
For both images, text_probs looks like tensor([[0.7982, 0.2018]], device='cuda:0'): the first value is always greater than the second. Does this mean the classification is wrong?
I have been trying to run the model for quite some time and finally stumbled across an error I think could be solved through an issue.
I also attempted to run a version of test.py on my local machine, but it couldn't find visa.json, which is why I was running visa.py. Please let me know if you have a solution, or if I might be doing something incorrectly when loading the model to get test results.
Thanks!
Hi, thank you for your valuable work.
In the function that calculates pro_auc (AUPRO), I noticed that the False Positive Rate (FPR) is computed as:
fpr = fp_pixels / inverse_masks.sum()
whereas the FPR is commonly calculated as:
fpr = FP / (FP + TN)
I would appreciate some clarification on why the FPR is being computed with the inverse of the ground truth mask in this specific context.
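The two forms agree: the standard definition is FPR = FP / (FP + TN), and every pixel whose ground truth is normal is either a false positive or a true negative, so FP + TN equals the number of normal pixels, which is what the inverse mask sums. The small NumPy check below (`fpr_two_ways` is a hypothetical helper) computes both.

```python
import numpy as np


def fpr_two_ways(pred_mask, gt_mask):
    """Return (FP / (FP + TN), FP / number_of_normal_pixels) for binary masks."""
    pred = np.asarray(pred_mask, dtype=bool)
    gt = np.asarray(gt_mask, dtype=bool)
    inverse_masks = ~gt                                  # True where ground truth is normal
    fp = np.logical_and(pred, inverse_masks).sum()       # predicted defect, actually normal
    tn = np.logical_and(~pred, inverse_masks).sum()      # predicted normal, actually normal
    return fp / (fp + tn), fp / inverse_masks.sum()
```

In AUPRO this FPR is computed per threshold over all normal pixels, which is why only the inverse mask is needed.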
Thank you!
Hello author, how do I run inference? Is there an inference script?
Thank you for sharing your work.
By the way, is it possible to convert the model file to ONNX format for testing?