
AnomalyCLIP's Issues

Question about the prefix template in text prompts

I sincerely thank you for your research and code sharing. As a medical doctor, I think your research can have a significant impact in the medical domain as well. I have read your paper and the reviews on the OpenReview page, but I have a question because I couldn't grasp the details of the text prompt template (the part marked as V_1-V_E, W_1-W_E in the paper). Looking at the training code, it seems that V_1 ~ V_E and W_1 ~ W_E are all filled with "X". Is that correct?

P.S. I know how unexpectedly cumbersome it can be, and how much psychological resistance there is, to organize and share experimental code after a paper is accepted, so I am truly grateful that you shared the code like this. Once again, congratulations on the ICLR acceptance.
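For context, a minimal sketch of the CoOp-style mechanic the question refers to (an assumption inferred from the paper, not a quote of the repo's code): the "X" characters only reserve token slots, and their embeddings are then replaced by learnable vectors, so the literal character never matters.

import torch
import torch.nn as nn

# Sketch (assumption): "X" is a single-token placeholder; after tokenization
# its embedding slots are overwritten by learnable context vectors.
n_ctx, dim = 12, 768
prompt = " ".join(["X"] * n_ctx) + " object."   # "X X ... X object."
ctx = nn.Parameter(torch.empty(n_ctx, dim))
nn.init.normal_(ctx, std=0.02)                  # these replace the X slots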

How to test a single image?

Given one input image, how do I determine whether it is anomalous or not, and how should that code be written?
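As a starting point, here is a bare-bones single-image check using vanilla OpenAI CLIP with fixed, handwritten prompts. This is explicitly not the repo's pipeline (AnomalyCLIP replaces these prompts with learned object-agnostic ones); it only sketches the overall shape of "one image in, verdict out".

import torch
import clip  # vanilla OpenAI CLIP, standing in for the repo's model
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-L/14@336px", device=device)
image = preprocess(Image.open("sample.png")).unsqueeze(0).to(device)
texts = clip.tokenize(["a photo of a flawless object",
                       "a photo of a damaged object"]).to(device)
with torch.no_grad():
    img_f = model.encode_image(image)
    txt_f = model.encode_text(texts)
    img_f = img_f / img_f.norm(dim=-1, keepdim=True)
    txt_f = txt_f / txt_f.norm(dim=-1, keepdim=True)
    probs = (100.0 * img_f @ txt_f.T).softmax(dim=-1)
print("anomaly probability:", probs[0, 1].item())

Turning that probability into a normal/abnormal verdict still requires a threshold tuned on held-out data.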

How do I train my own model?

Dear authors, experts, and gurus: how should I proceed if I want to train on my own dataset? Suppose I have a pile of images; do I need masks? Across the seven industrial datasets the authors provide, the setup differs somewhat from dataset to dataset: some have ground_truth and some do not. If I want to set up my own dataset, what exactly do I need to do? Are there requirements on image size, resolution, and so on? Also, once my dataset is set up, I don't need the four evaluation metrics when running the tests for now; I only need to know which of the test outputs are normal and which are anomalous. Which part of the code should I modify, and how?
Below is the directory layout I set up for my dataset; is it correct? I sincerely and humbly put all the questions above to the authors and the experts, and hope for answers!
(screenshot: proposed dataset directory structure)
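For reference, a guess at the meta.json layout the dataloaders expect, modeled on the repo's generate_dataset_json scripts (the field names here are assumptions; cross-check them against those scripts). Normal images use anomaly=0 and an empty mask path, so masks are only needed for pixel-level metrics, not for the normal/abnormal verdict.

import json

meta = {
    "train": {"my_object": [
        {"img_path": "my_object/train/good/000.png", "mask_path": "",
         "cls_name": "my_object", "specie_name": "good", "anomaly": 0},
    ]},
    "test": {"my_object": [
        {"img_path": "my_object/test/defect/000.png",
         "mask_path": "my_object/ground_truth/defect/000_mask.png",
         "cls_name": "my_object", "specie_name": "defect", "anomaly": 1},
    ]},
}
with open("meta.json", "w") as f:
    json.dump(meta, f, indent=2)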

PermissionError: [Errno 13] Permission denied: '/remote-home'

Hello @zqhang

Thanks for your work. When I run bash test.sh it raises a permission error. Please take a look: it seems the code needs to download the model weights into a hard-coded path that it has no permission to write to.

bash test.sh
res.log
Namespace(data_path='/remote-home/iot_zhouqihang/data/mvdataset', save_path='./results/9_12_4_multiscale/zero_shot', checkpoint_path='./checkpoints/9_12_4_multiscale/epoch_15.pth', dataset='mvtec', features_list=[6, 12, 18, 24], image_size=518, depth=9, n_ctx=12, t_n_ctx=4, feature_map_layer=[0, 1, 2, 3], metrics='image-pixel-level', seed=111, sigma=4)
name ViT-L/14@336px
Traceback (most recent call last):
  File "/media/cvpr/CM_1/AnomalyCLIP/test.py", line 195, in <module>
    test(args)
  File "/media/cvpr/CM_1/AnomalyCLIP/test.py", line 43, in test
    model, _ = AnomalyCLIP_lib.load("ViT-L/14@336px", device=device, design_details = AnomalyCLIP_parameters)
  File "/media/cvpr/CM_1/AnomalyCLIP/AnomalyCLIP_lib/model_load.py", line 145, in load
    model_path = _download(_MODELS[name], download_root or os.path.expanduser("/remote-home/iot_zhouqihang/root/.cache/clip"))
  File "/media/cvpr/CM_1/AnomalyCLIP/AnomalyCLIP_lib/model_load.py", line 39, in _download
    os.makedirs(cache_dir, exist_ok=True)
  File "/home/cvpr/anaconda3/lib/python3.9/os.py", line 215, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/home/cvpr/anaconda3/lib/python3.9/os.py", line 215, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/home/cvpr/anaconda3/lib/python3.9/os.py", line 215, in makedirs
    makedirs(head, exist_ok=exist_ok)
  [Previous line repeated 1 more time]
  File "/home/cvpr/anaconda3/lib/python3.9/os.py", line 225, in makedirs
    mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: '/remote-home'
./checkpoints/9_12_4_multiscale/res.log
Namespace(data_path='/remote-home/iot_zhouqihang/data/Visa', save_path='./results/9_12_4_multiscale_visa/zero_shot', checkpoint_path='./checkpoints/9_12_4_multiscale_visa/epoch_15.pth', dataset='visa', features_list=[6, 12, 18, 24], image_size=518, depth=9, n_ctx=12, t_n_ctx=4, feature_map_layer=[0, 1, 2, 3], metrics='image-pixel-level', seed=111, sigma=4)
name ViT-L/14@336px
Traceback (most recent call last):
  File "/media/cvpr/CM_1/AnomalyCLIP/test.py", line 195, in <module>
    test(args)
  File "/media/cvpr/CM_1/AnomalyCLIP/test.py", line 43, in test
    model, _ = AnomalyCLIP_lib.load("ViT-L/14@336px", device=device, design_details = AnomalyCLIP_parameters)
  File "/media/cvpr/CM_1/AnomalyCLIP/AnomalyCLIP_lib/model_load.py", line 145, in load
    model_path = _download(_MODELS[name], download_root or os.path.expanduser("/remote-home/iot_zhouqihang/root/.cache/clip"))
  File "/media/cvpr/CM_1/AnomalyCLIP/AnomalyCLIP_lib/model_load.py", line 39, in _download
    os.makedirs(cache_dir, exist_ok=True)
  File "/home/cvpr/anaconda3/lib/python3.9/os.py", line 215, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/home/cvpr/anaconda3/lib/python3.9/os.py", line 215, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/home/cvpr/anaconda3/lib/python3.9/os.py", line 215, in makedirs
    makedirs(head, exist_ok=exist_ok)
  [Previous line repeated 1 more time]
  File "/home/cvpr/anaconda3/lib/python3.9/os.py", line 225, in makedirs
    mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: '/remote-home'
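The traceback shows that AnomalyCLIP_lib/model_load.py line 145 falls back to the author's hard-coded cache directory. A minimal local fix (a sketch, untested) is to point that fallback at a user-writable cache instead:

# AnomalyCLIP_lib/model_load.py, line 145: replace the hard-coded
# "/remote-home/iot_zhouqihang/root/.cache/clip" fallback, e.g.
model_path = _download(_MODELS[name],
                       download_root or os.path.expanduser("~/.cache/clip"))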

Question about the auxiliary training data

Hello author. For train.sh, in CUDA_VISIBLE_DEVICES=${device} python train.py --dataset mvtec --train_data_path, what do dataset and train_data_path each refer to? Your paper states that MVTec is evaluated with prompts trained on VisA and VisA with prompts trained on MVTec, but in the code these two arguments appear to point to the same dataset for both training and testing.

Unable to reproduce the results

Hi, first of all thanks for your solid work. My issue is that I cannot reproduce the results mentioned in the paper when training on MVTec and testing on VisA. Here are the training hyperparameters, which were supposedly the ones used to produce the paper's results.

Argument           Default value
depth              9
n_ctx              12
t_n_ctx            4
feature_map_layer  [0, 1, 2, 3]
features_list      [6, 12, 18, 24]
epoch              15
learning_rate      0.001
batch_size         8
image_size         518
seed               111

The table below compares the two models' mean-over-classes performance on VisA (what I get vs. what is reported in the paper).

Metric             Mine   Paper
Pixel AUROC (%)    95.4   95.5
Pixel AUPRO (%)    85.8   87.0
Image AUROC (%)    81.2   82.1
Image AP (%)       84.6   85.4

Is there something that I am missing?
Thanks a lot in advance!

Question about the reimplementation of the result of original CLIP

Hi there, congrats on the great work!

In Table 1, I've noticed that you also include results for the original CLIP model.
(screenshot: Table 1 from the paper)

Could you please share the settings of this experiment? My reimplementation based on your code shows a lot of differences from yours, e.g.:

  1. the size of CLIP (Base, Large, or Huge?)
  2. the fixed text prompts you used (the "encode_text_with_prompt_ensemble" method you implemented?)
  3. any modifications on the vision side, like DPAM?

Thank you so much for your patience!

Possible bug

I found that the four maps in similarity_map_list all have identical values. Could the cause be that the intermediate feature is appended to the list without clone()?
(screenshot: the relevant code)
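A self-contained illustration of the suspected aliasing, which is a generic PyTorch pitfall rather than a quote of this repo's code:

import torch

maps = []
m = torch.zeros(2, 2)
for i in range(4):
    m += i            # in-place update mutates the same tensor every pass
    maps.append(m)    # every list entry is a reference to that one tensor
print([t.sum().item() for t in maps])   # [24.0, 24.0, 24.0, 24.0]
# Appending m.clone() instead snapshots each step: [0.0, 4.0, 12.0, 24.0]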

Cannot obtain the meta.json for the SDD dataset

Good work! When I use the provided generate_dataset_json/SDD.py to generate meta.json, the lack of split train and test sets prevents the generation of meta.json. How can I obtain the train and test splits? Additionally, some classes missing from the downloaded MPDD dataset are causing the code to fail. Could you provide the complete MPDD dataset?

Thresholding at inference time

Hi, it is not very clear to me how to derive a segmentation-level threshold to decide whether a pixel is classified as "normal" or "anomalous". Should the output anomaly map be normalized with something like this?

normalize_anomalymap = (anomalymap - anomalymap.min()) / (anomalymap.max() - anomalymap.min())

Thanks
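A minimal sketch of one common answer (not the authors' prescription): min-max normalize per image, then binarize with a cutoff. The 0.5 below is a hypothetical starting point; in practice the threshold is tuned on validation data, e.g. with Otsu's method or an F1 sweep.

import numpy as np

def binarize(anomaly_map, thresh=0.5, eps=1e-8):
    # per-image min-max normalization, with eps guarding a constant map
    amin, amax = anomaly_map.min(), anomaly_map.max()
    norm = (anomaly_map - amin) / (amax - amin + eps)
    return (norm >= thresh).astype(np.uint8)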

License?

Hi AnomalyCLIP team @tianyu0207 @zqhang,

I would like to investigate your model and possibly put it to use; however, there is no license given for this repository.
Do you intend to specify a license? And if so, when would you do so?

Thank you in advance for your work and best regards!

The learnable token embeddings are attached to the first 9 layers of the text encoder for refining the textual space.

(screenshot: implementation details from the paper)
First of all, I would like to thank you and your colleagues for your contributions to this domain. In the implementation details you say that the learnable token embeddings are attached to the first 9 layers of the text encoder for refining the textual space, but I only see the 2nd (i = 1) through the 8th (i = 7) layers being attached. Could you explain this for me? Thank you.

Test with large dataset

When I run pixel-level testing with 3700 images, the roc_auc_score computation runs out of RAM. How can I fix this? I use Colab Pro with 50 GB of RAM.
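One workaround (a sketch of a standard trick, not code from this repo): approximate the AUROC from per-batch score histograms, so the full pixel array never has to be materialized at float64 for sklearn. This assumes the scores are already normalized into [lo, hi].

import numpy as np

def streaming_auroc(score_batches, label_batches, bins=1000, lo=0.0, hi=1.0):
    # accumulate score histograms for anomalous vs. normal pixels per batch
    edges = np.linspace(lo, hi, bins + 1)
    pos = np.zeros(bins, dtype=np.int64)
    neg = np.zeros(bins, dtype=np.int64)
    for scores, labels in zip(score_batches, label_batches):
        s, l = scores.ravel(), labels.ravel().astype(bool)
        pos += np.histogram(s[l], bins=edges)[0]
        neg += np.histogram(s[~l], bins=edges)[0]
    # ROC points as the threshold sweeps from high to low, then trapezoid
    tpr = np.concatenate([[0], np.cumsum(pos[::-1])]) / max(pos.sum(), 1)
    fpr = np.concatenate([[0], np.cumsum(neg[::-1])]) / max(neg.sum(), 1)
    return np.trapz(tpr, fpr)

The result is exact up to ties within a bin, and the bin count can be raised cheaply.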

About the prompt

I can't understand the following code.

self.register_buffer("token_prefix_pos", embedding_pos[:, :, :1, :] )
self.register_buffer("token_suffix_pos", embedding_pos[:, :, 1 + n_ctx_pos:, :])
self.register_buffer("token_prefix_neg", embedding_neg[:, :, :1, :])
self.register_buffer("token_suffix_neg", embedding_neg[:, :, 1 + n_ctx_neg:, :])

I think the positive prompt should be ['X X X X X X X X X X X X object.'], so the prefix should be 'X X X X X X X X X X X X ' and the suffix should be '.'.
I don't know whether my understanding is wrong; could you help me clear this up?
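My reading of that slicing (an assumption based on how CLIP tokenizes CoOp-style prompts): the single token split off as the prefix is the start-of-text token, not part of the context, and the suffix holds everything from "object" onward.

import clip  # OpenAI CLIP tokenizer, used only to show the token layout

toks = clip.tokenize("X X X X object.")[0]
# toks[0]   -> start-of-text token       => token_prefix  ([:, :, :1, :])
# toks[1:5] -> the four "X" placeholders => replaced by learnable context
# toks[5:]  -> "object", ".", end-of-text, padding => token_suffix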

Can the whole pipeline be exported as one big ONNX network?

First of all, thanks to the authors for open-sourcing such an excellent model!
A practical question: for deployment I would like to export the model as one big ONNX network whose input is an image and whose output is the anomaly score map. Is that possible, and if so, how exactly would I do it?

While studying AnomalyCLIP recently, I have never managed to fuse the text and image parts into a single graph; it feels like I would have to hand-write the post-processing myself, which is both cumbersome and error-prone.
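For what it's worth, the usual recipe (a sketch under assumptions, not a verified export path for this repo) is to wrap everything between "image in" and "anomaly map out" in one nn.Module, register the trained text features as a constant buffer (the prompts are fixed after training), and trace that. PipelineStub below is a hypothetical stand-in; its forward would be replaced by the logic from test.py.

import torch
import torch.nn as nn

class PipelineStub(nn.Module):
    # hypothetical stand-in for the real image-to-anomaly-map wrapper
    def __init__(self, dim=768, n_prompts=2):
        super().__init__()
        self.backbone = nn.Conv2d(3, dim, kernel_size=14, stride=14)
        self.register_buffer("text_features", torch.randn(n_prompts, dim))

    def forward(self, image):
        patches = self.backbone(image)                      # B, C, H', W'
        patches = patches / patches.norm(dim=1, keepdim=True)
        sim = torch.einsum("bchw,pc->bphw", patches, self.text_features)
        return sim.softmax(dim=1)[:, 1]                     # anomaly channel

torch.onnx.export(PipelineStub().eval(), torch.randn(1, 3, 518, 518),
                  "anomalyclip_stub.onnx", input_names=["image"],
                  output_names=["anomaly_map"], opset_version=17)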

Issue about reimplementing medical ZSAD performance

Hello @zqhang and @tianyu0207,

First of all, I really appreciate your work! It's really nice and interesting.

But I can't find some of the implementation details for the medical ZSAD experiments.

The paper said "To address this issue, we create such a dataset based on ColonDB (More details see Appendix A), and then optimize the prompts in AnomalyCLIP and VAND using this dataset and evaluate their performance on the medical image datasets.".

So as far as I understand, you fine-tuned the models on (ColonDB + the test split of the Endo classification dataset) and then tested the medical ZSAD performance on every other medical dataset.
Then what is the fine-tuning dataset when testing on ColonDB and Endo themselves?

Since those two datasets were already used to fine-tune the models, I think another dataset is needed for testing on them.

I would appreciate any help.

Dear authors and experts, how do I set up my own dataset?

Dear authors, experts, and gurus: how should I proceed if I want to train on my own dataset? Suppose I have a pile of images; do I need masks? Across the seven industrial datasets the authors provide, the setup differs somewhat from dataset to dataset: some have ground_truth and some do not. If I want to set up my own dataset, what exactly do I need to do? Are there requirements on image size, resolution, and so on? Also, once my dataset is set up, I don't need the four evaluation metrics when running the tests for now; I only need to know which of the test outputs are normal and which are anomalous. Which part of the code should I modify, and how?
Below is the directory layout I set up for my dataset; is it correct? I sincerely and humbly put all the questions above to the authors and the experts, and hope for answers!
(screenshot: proposed dataset directory structure)

Training Methodology Issue: Incorrect Dataset Mode

I've encountered an issue in the paper's code regarding the training of the anomaly detector on the MVTec AD dataset. The problem lies in the train.py script at line 35, where the Dataset object for training is created without specifying mode="train":

train_data = Dataset(root=args.train_data_path, transform=preprocess, target_transform=target_transform, dataset_name=args.dataset)

This oversight leads to two critical problems:

  1. The model mistakenly uses the test set for training.
  2. The anomaly detection framework is deprived of abnormal data during training, yet it encounters anomalies during testing. This inconsistency suggests that the model might be functioning as a simple classifier rather than performing anomaly detection.

Could this be revised to ensure the correct dataset partitioning and training setup?
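For clarity, the change being requested would look like this (a sketch, assuming the Dataset class exposes the mode switch the issue describes):

# train.py, line 35 (sketch of the proposed fix)
train_data = Dataset(root=args.train_data_path, transform=preprocess,
                     target_transform=target_transform,
                     dataset_name=args.dataset, mode="train")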

Error with SDD.py

Dear author,

I encountered an issue with running SDD.py.

No such file or directory: 'data/sdd/electrical commutators/train'

Can you tell me why the SDD dataset has the class "electrical commutators"? I can only see class names such as "kos35", "kos36", etc. in the dataset.

Thank you.

Runtime problem

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1725/1725 [04:43<00:00, 6.09it/s]
Why does it stay stuck after this finishes loading?

Why divide by 0.07?

I don't understand why you divide here:

text_probs = image_features.unsqueeze(1) @ text_features.permute(0, 2, 1)
text_probs = text_probs[:, 0, ...] / 0.07
text_probs = text_probs.squeeze()
and again here:

logit_scale = self.logit_scale.exp()  # nn.Parameter(torch.ones([]) * np.log(1 / 0.07))
logits_per_image = logit_scale * image_features @ text_features.t()
logits_per_text = logits_per_image.t()

And in another paper they multiply by 100 instead, for example:

for layer in range(len(det_patch_tokens)):
    det_patch_tokens[layer] = det_patch_tokens[layer] / det_patch_tokens[layer].norm(dim=-1, keepdim=True)
    anomaly_map = (100.0 * det_patch_tokens[layer] @ text_features)
    anomaly_map = torch.softmax(anomaly_map, dim=-1)[:, :, 1]
    anomaly_score = torch.mean(anomaly_map, dim=-1)
    det_loss += loss_bce(anomaly_score, image_label)
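Both forms are temperature scaling of the logits before the softmax: dividing by 0.07 multiplies by about 14.3, which is CLIP's initial logit scale (exactly the np.log(1 / 0.07) in the comment above), while the learned logit_scale is clamped at 100 during CLIP training and typically saturates there, which is where the fixed 100.0 in other papers comes from. A toy illustration of the effect:

import numpy as np

sims = np.array([0.28, 0.25])   # toy cosine similarities for two prompts
for t in (1.0, 0.07, 0.01):
    p = np.exp(sims / t) / np.exp(sims / t).sum()
    print("temperature", t, "->", np.round(p, 3))
# Lower temperature sharpens the softmax: at t=1 the prompts are nearly
# tied; at t=0.01 over 95% of the mass goes to the more similar prompt.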

Progress stuck at 100%

Hey @zqhang and @tianyu0207,

thank you very much for this repo; the paper and its results were very interesting.
I set up your project on my local machine and faced a problem.

When running the test on VisA, I get to 100% in the progress bar but get stuck there.
I looked through the code but couldn't find out why.

I would appreciate any help.

Data leakage from the test set

It seems that you used the entire test set to train encode_text_learn while also using the same test set for testing, which is seriously inconsistent with the zero-shot setting described in your paper. Can you explain this issue?

About loss and image_loss when training the model

I train with the VisA dataset.
After 11 epochs, loss and image_loss are 3.7960 and 0.5325, and I feel something is wrong. I used your settings. Can you share what loss and image_loss looked like when you trained?

Hello author, how is the zero-shot setting reflected?

In your work, how is the zero-shot setting realized? Since it is zero-shot, why is the model trained on the MVTec dataset and then also evaluated on MVTec? One final question: why is the test split used during both training and testing?

ISIC dataset

Hello author, one question: the ISIC dataset contains only the anomalous class and no normal class, so the test code fails with "Only one class present in y_true. ROC AUC score is not defined in that case". How should the code be adjusted for a dataset with only one class?
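The usual guard (a sketch, not the authors' code) is to skip AUROC whenever the ground truth contains a single class, since the metric is undefined there, and report the remaining metrics only:

import numpy as np
from sklearn.metrics import roc_auc_score

def safe_auroc(y_true, y_score):
    # AUROC needs both classes present; return NaN otherwise
    y_true = np.asarray(y_true).ravel()
    if np.unique(y_true).size < 2:
        return float("nan")
    return roc_auc_score(y_true, np.asarray(y_score).ravel())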

ONNX export

First of all, thank you for your work and for releasing the AnomalyCLIP code!
I was wondering whether it is possible to export the model to ONNX format? It could be very useful.

Thanks
