
coop's People

Contributors

jingkang50, kaiyangzhou, shizhediao

coop's Issues

Zero division problem of Imagenet dataset

After finishing tests on the other ten datasets, I tried the ImageNet dataset. After training completed, an error occurred during testing:

File "/home/dpsh/Dassl.pytorch/dassl/evaluation/evaluator.py", line 69, in evaluate

acc = 100.0 * self._ correct / self._ total

ZeroDivisionError: float division by zero

Has anyone encountered a similar situation? How was it resolved?

regression

Thanks for sharing this code. I was curious whether you had considered how to handle a regression task?

I thought I might try a few ideas out, perhaps starting with a simple percentage label, like [V]1 [V]2 ... [V]M 40%, but I was curious whether you had tried this or had any intuitions.

How can I register my own dataset?

Hello, thanks for your excellent work! I have a problem when fine-tuning the model on my own dataset. I followed the organization of the files in CoOp/datasets to write my dataset code, but failed to register the dataset. I also added xxx.yaml in CoOp/configs/datasets and it still failed to register. Could you give me some advice on whether I need to add extra code? Thanks!
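
For what it's worth, below is a minimal sketch of the registration pattern the existing files in CoOp/datasets follow; the class name, folder name, and the hard-coded sample are illustrative, not taken from the repo.

import os

from dassl.data.datasets import DATASET_REGISTRY, Datum, DatasetBase


@DATASET_REGISTRY.register()
class MyDataset(DatasetBase):
    # DATASET.NAME in the .yaml must match this class name exactly.
    dataset_dir = "my_dataset"  # sub-folder under DATASET.ROOT

    def __init__(self, cfg):
        root = os.path.abspath(os.path.expanduser(cfg.DATASET.ROOT))
        self.dataset_dir = os.path.join(root, self.dataset_dir)

        # Build lists of Datum(impath=..., label=..., classname=...) here.
        train = [Datum(impath="placeholder.jpg", label=0, classname="cat")]
        val, test = list(train), list(train)

        super().__init__(train_x=train, val=val, test=test)

One easy thing to miss is that the decorator only runs when the module is imported, so the new file also needs an import line in train.py next to the existing dataset imports (e.g. import datasets.my_dataset); otherwise the registry never sees the class even though the .yaml exists.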

ModuleNotFoundError: No module named 'trainers.ours'

Hello,
when I try to run one of your example bash commands, I somehow can't figure out how to get rid of this error.

(dassl) user@MBP scripts % bash main.sh caltech101 rn50_ep50 middle 16 1 True
Run this job and save the output to output/caltech101/CoOp/rn50_ep50_1shots/nctx16_cscTrue_ctpmiddle/seed1
Traceback (most recent call last):
  File "train.py", line 27, in <module>
    import trainers.zsclip
  File "/Users/user/Projects/CoOp/CoOp/trainers/zsclip.py", line 10, in <module>
    from .ours import load_clip_to_cpu
ModuleNotFoundError: No module named 'trainers.ours'
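
For anyone hitting this: the traceback suggests trainers/zsclip.py still imports from a module named ours that no longer exists. A likely fix, assuming load_clip_to_cpu lives in trainers/coop.py in the current repo layout, is to point the import there:

# in trainers/zsclip.py, replacing "from .ours import load_clip_to_cpu"
from .coop import load_clip_to_cpu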

Is there a spelling error in the CoCoOp running scripts?

seed=1

base base2new_train.sh imagenet 1
base base2new_test.sh imagenet 1

seed=2

base base2new_train.sh imagenet 2
base base2new_test.sh imagenet 2

seed=3

base base2new_train.sh imagenet 3
base base2new_test.sh imagenet 3

The problem is that "bash" is misspelled as "base", right?

raise ValueError

raise ValueError(
ValueError: The requested one is expected to belong to ['SE', 'MCD', 'MME', 'ADDA', 'CDAC', 'DAEL', 'DANN', 'AdaBN', 'M3SDA', 'SourceOnly', 'DDAIG', 'DAELDG', 'Vanilla', 'CrossGrad', 'DomainMix', 'EntMin', 'FixMatch', 'MixMatch', 'MeanTeacher', 'SupBaseline'], but got [CoOp] (do you mean [CrossGrad]?)
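
A hedged note on what this error usually means: Dassl resolves cfg.TRAINER.NAME through a registry, and only trainers whose modules have been imported show up in it (the list above is just Dassl's built-ins). CoOp's own train.py imports its trainer modules near the top for exactly this reason, roughly:

# side-effect imports: each module registers its trainer with Dassl's
# TRAINER_REGISTRY when imported, so build_trainer(cfg) can find "CoOp"
import trainers.coop    # noqa: F401
import trainers.zsclip  # noqa: F401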

zero-shot or fine-tune?

  1. To my knowledge, CLIP can be directly applied to zero-shot learning (i.e., unseen/novel classes). CoOp and CoCoOp don't appear to be zero-shot learning, but require fine-tuning. However, I don't see the details about how the fine-tuning is done in the paper. Am I misunderstanding it? In the meantime, I would like to know how CLIP is fine-tuned.
  2. I cannot understand Figure 1 in the paper: why can the performance of CoOp and CoCoOp be compared to zero-shot learning?

Questions about checkpoints

Sorry for the dumb questions. I didn't really understand the description. Is it possible to use your checkpoints for ViT models to do classification tasks? Can I just load them into these models (from the OpenAI repo) without using your script? Are they better than the original OpenAI weights?

More detailed few shot results

Could you provide the few-shot results for the different few-shot settings across the 11 datasets, with the ViT-B image backbone for CLIP?
I tried the settings in the paper but some results could not be reproduced (Food, for instance).

question about gradients on text encoder

Hi, may I ask whether the original CLIP text encoder is frozen or not? The paper mentions that the text encoder is frozen, but I couldn't find that part in the code... Thanks a lot for your help!
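
In case it helps, the usual way prompt-tuning code keeps the text and image encoders frozen is to turn off gradients for every parameter outside the prompt learner right after the model is built; a minimal sketch with assumed attribute names, not quoted from the repo:

import torch.nn as nn

def freeze_all_but_prompt_learner(model: nn.Module) -> None:
    # Disable gradients everywhere except the learnable context vectors,
    # so only the prompt learner's parameters are passed to the optimizer.
    for name, param in model.named_parameters():
        if "prompt_learner" not in name:
            param.requires_grad_(False)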

About the configuration of "classnames"

Thanks for your contributions!

I have a question about classnames = self.dm.dataset.classnames (line 224 of CoOp.build_model in coop.py).
What is the value of classnames? I checked the configuration files and didn't find it.
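
For reference, a hedged sketch of where classnames typically comes from in a Dassl-style dataset: it is not a config entry but is derived from the Datum objects that make up the dataset. The helper below is illustrative, not the library's exact code.

def collect_classnames(data_source):
    # Map each integer label to its classname and return the names sorted by label.
    label_to_name = {}
    for item in data_source:          # each item is a Datum with impath/label/classname
        label_to_name[item.label] = item.classname
    return [label_to_name[label] for label in sorted(label_to_name)]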

TRAINER.OURS.N_CTX, ... in main.sh

In main.sh, I think these should be .COOP instead of .OURS.

TRAINER.OURS.N_CTX ${NCTX} 
TRAINER.OURS.CSC ${CSC} 
TRAINER.OURS.CLASS_TOKEN_POSITION ${CTP} 

About input of text

Thanks for your great work!
I want to ask why the input to the forward function is not (image, text), such as output = self.model(image, text).
And what is the scheme for matching the text logits and image logits?
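
On the second question, the usual CLIP-style matching scheme is cosine similarity between each image feature and one text feature per class, scaled by a learned temperature; the text side is built from the prompts for all classes, which is why forward() only needs the image. A self-contained sketch with dummy tensors (shapes are illustrative):

import torch
import torch.nn.functional as F

batch_size, num_classes, dim = 4, 10, 512
image_features = torch.randn(batch_size, dim)    # stand-in for image_encoder(image)
text_features = torch.randn(num_classes, dim)    # stand-in for text_encoder(prompts)

# L2-normalize so the dot product becomes a cosine similarity
image_features = F.normalize(image_features, dim=-1)
text_features = F.normalize(text_features, dim=-1)

logit_scale = torch.tensor(100.0)                # exp() of CLIP's learned temperature
logits = logit_scale * image_features @ text_features.t()   # (batch_size, num_classes)
probs = logits.softmax(dim=-1)                   # per-class probabilities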

GPU Memory Consumption of CoCoOp

Hi, Thanks a lot for the excellent work and the easy-to-use code!
Recently I've been trying to use CoOp and CoCoOp in my research.
However, I have run into a small problem: the GPU consumption of CoCoOp seems to be much larger (about 64x under my setting) than CoOp, resulting in a small batch size and very long training time. Based on my understanding, the reason is that the prompts in CoCoOp are generated for each instance instead of once per batch. I've seen the same problem reported in the paper.
May I ask whether there are any tricks during training to accelerate the training process? Thanks so much!
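
A rough back-of-envelope for why the gap scales with the batch size (all numbers below are illustrative): CoOp feeds one set of N class prompts per step, shared by the whole batch, while CoCoOp conditions the prompts on each image, so the text encoder processes batch_size x N prompt sequences.

batch_size, num_classes, n_ctx, dim = 64, 100, 4, 512

coop_prompt_floats = num_classes * n_ctx * dim                  # shared by the whole batch
cocoop_prompt_floats = batch_size * num_classes * n_ctx * dim   # one prompt set per image

print(cocoop_prompt_floats // coop_prompt_floats)  # 64, i.e. roughly batch_size times more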

the performance about full fine-tuning on ResNet.

Hi, thanks for the nice code.
I found that the performance is poor when fully fine-tuning the ResNet-based CLIP on ImageNet, while for the ViT-based CLIP the performance is good. Do you have any insight into why full fine-tuning or linear probing of the ResNet-based CLIP makes the performance worse?

If CoCoOp can use ResNet as backbone

Thanks for your great work. I would like to ask whether you have considered using a CNN such as ResNet as the backbone in CoCoOp, and whether it is possible to do so.

Inferencing on single image

I have successfully developed the train and test pipeline for my custom dataset. Can you help me with making an inference on a single image? I am using the trainer.model_inference(image) function. Is there a particular format this image needs to be in? I am using PIL to read the image.

Error:
  File "/ContextOptimization/CoOp/trainers/coop.py", line 196, in forward
    image_features = self.image_encoder(image.type(self.dtype))
  File "/home/chandan/anaconda3/envs/coop/lib/python3.8/site-packages/PIL/Image.py", line 519, in __getattr__
    raise AttributeError(name)
AttributeError: type

Main function used:

def main(args):
    cfg = setup_cfg(args)
    if cfg.SEED >= 0:
        print("Setting fixed seed: {}".format(cfg.SEED))
        set_random_seed(cfg.SEED)
    setup_logger(cfg.OUTPUT_DIR)

    if torch.cuda.is_available() and cfg.USE_CUDA:
        torch.backends.cudnn.benchmark = True

    print_args(args, cfg)
    print("Collecting env info ...")
    print("** System info **\n{}\n".format(collect_env_info()))

    trainer = build_trainer(cfg)

    trainer.load_model(args.model_dir, epoch=args.load_epoch)
    image = Image.open('/ContextOptimization/CoOp/data/0cd2ed50.png')
    result = trainer.model_inference(image)
    print(result)
    return result

I am looking for the predicted class and predicted probabilities as output.

Any direction would be appreciated.

Thanks
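
A hedged sketch of the step that appears to be missing, in case it is useful: the AttributeError comes from passing a PIL Image straight into model_inference, which expects a batched, normalized tensor. Something along these lines should work (the normalization values are the CLIP statistics from the configs; trainer is the object built above, and trainer.device is assumed to be set by the Dassl trainer):

import torch
from PIL import Image
from torchvision import transforms as T

preprocess = T.Compose([
    T.Resize(224, interpolation=T.InterpolationMode.BICUBIC),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize((0.48145466, 0.4578275, 0.40821073),
                (0.26862954, 0.26130258, 0.27577711)),
])

image = Image.open('/ContextOptimization/CoOp/data/0cd2ed50.png').convert('RGB')
image = preprocess(image).unsqueeze(0)              # shape (1, 3, 224, 224)

with torch.no_grad():
    logits = trainer.model_inference(image.to(trainer.device))

probs = logits.softmax(dim=-1)
pred_class = probs.argmax(dim=-1).item()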

Is the CLIP model the original version?

Hello, thanks for open-sourcing this! May I ask whether the CLIP you compare against is the original CLIP? From what I can see, your text encoder appears to have been fine-tuned, so wouldn't its generalization ability be weaker than that of the original model trained on 400 million image-text pairs?

about replace ce loss with similarity.

Hi Kaiyang,

thanks for you amazing work!

I see that cross_entropy(output, label) is used in your training. I wonder if it is possible to replace it with the similarity between text and image, by adding the corresponding classname embedding to the learned prompt, and do one-shot matching like CLIP. Is it possible to do so?

Thanks!

When I change the code, the result drops considerably!

Thanks for your great work!
The previous issue has been solved, but I have found a new one.
If I change the code

self.register_model("prompt_learner", self.model.prompt_learner, self.optim, self.sched)

to

self.register_model("model", self.model, self.optim, self.sched)

the result drops considerably (by about 20%)!
Can you give me some advice?

Config Optimizer Overwritten

I tried to change the optimizer to Adam with a different LR schedule and Adam-specific parameters, but when run, it overwrites the LR scheduler parameters and the betas.

The config file:

DATALOADER:
  TRAIN_X:
    BATCH_SIZE: 8
  TEST:
    BATCH_SIZE: 100
  NUM_WORKERS: 4

INPUT:
  SIZE: (224, 224)
  INTERPOLATION: "bicubic"
  PIXEL_MEAN: [0.48145466, 0.4578275, 0.40821073]
  PIXEL_STD: [0.26862954, 0.26130258, 0.27577711]
  TRANSFORMS: ["random_resized_crop", "random_flip", "normalize"]

OPTIM:
  NAME: "adam"
  LR: 0.0002
  ADAM_BETA1: 0.5
  ADAM_BETA2: 0.999
  MAX_EPOCH: 100
  LR_SCHEDULER: "single_step"
  GAMMA: 0.1
  STEPSIZE: 0
  WARMUP_EPOCH: 0
  WARMUP_TYPE: "constant"
  WARMUP_CONS_LR: 1e-5

TRAIN:
  PRINT_FREQ: 20

MODEL:
  BACKBONE:
    NAME: "videoclip"

TRAINER:
  COCOOP:
    N_CTX: 4
    CTX_INIT: ''
    PREC: 'amp'

The log output once run:

OPTIM:
  ADAM_BETA1: 0.9
  ADAM_BETA2: 0.999
  BASE_LR_MULT: 0.1
  GAMMA: 0.1
  LR: 0.0003
  LR_SCHEDULER: single_step
  MAX_EPOCH: 10
  MOMENTUM: 0.9
  NAME: adam
  NEW_LAYERS: ()
  RMSPROP_ALPHA: 0.99
  SGD_DAMPNING: 0
  SGD_NESTEROV: False
  STAGED_LR: False
  STEPSIZE: (-1,)
  WARMUP_CONS_LR: 1e-05
  WARMUP_EPOCH: -1
  WARMUP_MIN_LR: 1e-05
  WARMUP_RECOUNT: True
  WARMUP_TYPE: linear
  WEIGHT_DECAY: 0.0005

Is this a problem with DASSL or a problem with the CoCoOp code base?

Thanks!
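
A hedged guess at the mechanism, in case it helps: the config is a yacs CfgNode, and later merge calls (the trainer/method --config-file and then any command-line opts) simply overwrite earlier ones, so whatever setup_cfg() merges after your file wins. A minimal illustration of that behaviour:

from yacs.config import CfgNode as CN

cfg = CN()
cfg.OPTIM = CN()
cfg.OPTIM.LR = 0.0002                       # value set by the first config file

cfg.merge_from_list(["OPTIM.LR", 0.0003])   # a later merge (another file or CLI opts)
print(cfg.OPTIM.LR)                         # 0.0003 -- the earlier value is overwritten

So it is worth checking which config files and opts are passed on the command line and in what order they are merged, which may explain the overwrite without a bug in either Dassl or the CoCoOp code base.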

about the output of interpret_prompt.py

I trained on my custom dataset.
python interpret_prompt.py gives the output below:

Return the top-3 matched words
Size of token embedding: torch.Size([49408, 512])
Size of context: torch.Size([16, 512])
Size of distance matrix: torch.Size([16, 49408])
1: ['onic</w>', 'yc</w>', 'bet'] ['0.5414', '0.5420', '0.5438']
2: ['bat', 'cap', 'advising</w>'] ['0.6391', '0.6398', '0.6403']
3: ['hell</w>', 'ta</w>', 'shaman</w>'] ['0.5795', '0.5809', '0.5836']
4: ['regram</w>', '-$</w>', 'marketing</w>'] ['0.5730', '0.5778', '0.5782']
5: ['fied</w>', 'sighted</w>', 'promote</w>'] ['0.5591', '0.5609', '0.5611']
6: ['potus</w>', 'ghi</w>', 'ongi</w>'] ['0.6077', '0.6081', '0.6112']
7: ['taya</w>', 'tive</w>', 'ica</w>'] ['0.6110', '0.6179', '0.6198']
8: ['believe</w>', 'lies</w>', 'worked</w>'] ['0.5304', '0.5321', '0.5331']
9: ['dess</w>', 'mariti', 'end'] ['0.5861', '0.5861', '0.5895']
10: ['cooking</w>', 'coach</w>', 'awesome</w>'] ['0.5644', '0.5723', '0.5734']
11: ['takeover</w>', 'artworks</w>', 'doctors</w>'] ['0.5982', '0.6008', '0.6010']
12: ['ig', 'vino</w>', 'inas</w>'] ['0.5264', '0.5279', '0.5305']
13: ['ame</w>', 'ella</w>', 'ed'] ['0.5310', '0.5341', '0.5401']
14: ['6</w>', '3</w>', 'met</w>'] ['0.5557', '0.5569', '0.5574']
15: ['meanings</w>', 'signage</w>', 'trade'] ['0.6725', '0.6733', '0.6736']
16: ['arrived</w>', 'credits</w>', 'desire</w>'] ['0.6162', '0.6256', '0.6264']

If I directly use these words, the length of the tokenized text is not 16.
I want to know what the meaning of </w> is, and how I can use this output. Thanks!

GPU memory consumption when training on ImageNet

When training on the 1000-class ImageNet, the GPU memory used by the prompts seems very large and results in an Out Of Memory error on a 16GB GPU card.
How can this problem be solved?

Different random seeds lead to highly variable results.

First of all, thank you for open-sourcing such easy-to-use code :)
I reproduced your reported CoOp results on two datasets, DTD and Flower101. I ran the code with three random seeds (1, 2, and 3) for both datasets, as in your default setting in ./scripts/main.sh.
The performance on DTD matches the result in the paper (acc: 63.46) when trained with seed=3, but the results for seeds 1 and 2 are poor (acc: ~15).
As for Flower101, the results for seeds 2 and 3 are ~94, but seed 1's result is 44.50.

I wonder if this is a normal situation for this few-shot training setting? Thanks for any suggestions :)

Why do token prefixes have to be a buffer type

If the token prefix and suffix are just slices of the embedding, for example, replacing self.register_buffer("token_prefix", embedding[:, :1, :]) with self.token_prefix = embedding[:, :1, :] in this line, we would not have to ignore them when loading. So why do the token prefix and suffix have to be registered as buffers? Thanks a lot!
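
For context, a small self-contained illustration of what register_buffer buys you: buffers follow the module through .to()/.half() and are included in state_dict(), whereas a plain tensor attribute is neither converted nor saved, which matters when the CLIP weights are in fp16.

import torch
import torch.nn as nn

class WithBuffer(nn.Module):
    def __init__(self):
        super().__init__()
        self.register_buffer("token_prefix", torch.zeros(1, 1, 512))

class PlainAttr(nn.Module):
    def __init__(self):
        super().__init__()
        self.token_prefix = torch.zeros(1, 1, 512)   # just a Python attribute

m1, m2 = WithBuffer().half(), PlainAttr().half()
print(m1.token_prefix.dtype)              # torch.float16 -- converted with the module
print(m2.token_prefix.dtype)              # torch.float32 -- untouched by .half()
print("token_prefix" in m1.state_dict())  # True  -- saved/loaded with checkpoints
print("token_prefix" in m2.state_dict())  # False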

assert len(data_loader) > 0, AssertionError

Thank you for your work. I encountered the following error when running on the ImageNet dataset. Have you encountered any similar errors? How did you solve it?
Traceback (most recent call last):
  File "train.py", line 207, in <module>
    main(args)
  File "train.py", line 142, in main
    trainer = build_trainer(cfg)
  File "/home/dpsh/Dassl.pytorch/dassl/engine/build.py", line 11, in build_trainer
    return TRAINER_REGISTRY.get(cfg.TRAINER.NAME)(cfg)
  File "/home/dpsh/Dassl.pytorch/dassl/engine/trainer.py", line 319, in __init__
    self.build_data_loader()
  File "/home/dpsh/Dassl.pytorch/dassl/engine/trainer.py", line 342, in build_data_loader
    dm = DataManager(self.cfg)
  File "/home/dpsh/Dassl.pytorch/dassl/data/data_manager.py", line 128, in __init__
    test_loader = build_data_loader(
  File "/home/dpsh/Dassl.pytorch/dassl/data/data_manager.py", line 45, in build_data_loader
    assert len(data_loader) > 0
AssertionError

linear-probe-clip

Hello, I would like to ask a question: when doing the linear-probe CLIP experiment (ViT-B/32), which parameters should be set as tunable? Is it clip_model.visual.ln_post and clip_model.ln_final?

AttributeError: 'list' object has no attribute 'to'

While running my own dataset, which has been adapted to your method, I ran into two problems:

  1. Evaluation works fine. However, when I train, it shows 'AttributeError: 'list' object has no attribute 'to''.
  2. Why does the program keep running even though the error happened?

other variant of COCOOP

Thanks for providing such outstanding work!
I have a question related to CoCoOp.
In my experience, I need a lot of images to use CoCoOp, and it is somewhat time-consuming since it always needs to extract a text embedding for each image.
Have you tried aggregating the image embeddings, not at the input of the text encoder, but after the text embeddings are extracted?
Thanks

Running on Multi-label Classification

Dear Zhou,
Thank you for sharing Dassl!

I encountered some problems when implementing CoOp for multi-label classification.
My label in one-hot representation looks like [0,1,0,1,0,0,0,1], so how should I define '_classnames' and '_lab2cname' in 'base_dataset.py'?
I have already reshaped my data like {train: classname: [[name1],[name7]], impath: xxx, label: [0,1,0,0,0,0,0,1]} and fed it into 'Datum'.

Do you have any good suggestions, or is it possible to update dassl to be compatible with multi-label tasks?

Many Thanks

Cannot reproduce the results of CoOp and CoCoOp

Hi, thanks for the great work, but I found that it is hard to reproduce the results in the paper.

For example, using the released checkpoints in https://github.com/KaiyangZhou/CoOp#models-and-results, the results of vit-b32-ep50 (nctx=16, shots=16, ctp=end, csc=False) on ImageNet are:

 | transform | seed1 | seed2 | seed3
paper | - | 66.85 | - | -
released checkpoint (inference only) | ["random_resized_crop", "random_flip", "normalize"] | 64.38 | 64.72 | 64.72
released checkpoint (inference only) | ["random_flip", "random_translation", "center_crop", "normalize"] | 65.11 | 65.32 | 65.34
our reproduction (training from scratch then inference) | ["random_resized_crop", "random_flip", "normalize"] | 65.21 | - | -

They are all much lower (64.3~65.3) than the result in the paper (66.85), and using the updated transform from #8 (comment) with the released checkpoint gives even worse performance.

For CoCoOp, the result of vit-b16-ep10 (nctx=4, shots=16, ctp=end) on ImageNet is 71.02 in the paper, but our reproduction (training from scratch then inference) gets 70.14, which also underperforms.

Our environment information:
V100-32G / Titan RTX
dassl=0.4.2
torch=1.7.1+cu110
torchvision=0.8.2+cu110

I wonder if I am missing something? Thanks a lot.

Few-shot setting in CoCoOp Experiments

Hello,
Thank you for sharing your great work.

I had a question regarding the few-shot setting in the CoCoOp experiments. In the paper, it is mentioned that CoCoOp follows a zero-shot evaluation (from base to novel classes), but for training on the base classes it uses a few-shot setting. However, for zero-shot evaluation, models are generally trained on the complete base classes.

Does this mean that CoCoOp and CoOp require only a few-shot setting to perform well on novel categories? Can the same training recipe of CoCoOp or CoOp be used when training on all examples of the base classes?

Thank you and kind regards.

Important changes made to Dassl's transforms.py

So, you might find that OpenAI's code produces around 59% accuracy for zero-shot CLIP (vision_model=RN50) on ImageNet with prompt ensembling, while CoOp's code gives only 57.81% for the same model (see Table 7 in the paper).

This difference is caused by using different transforms: OpenAI's code applies Resize(224) to an image while CoOp's code (the previous version) uses Resize((224, 224)). More specifically, the former keeps the image aspect ratio while the latter doesn't. To allow the results produced by CoOp's code to be comparable to OpenAI's code, we have made our transforms consistent with theirs. So the transforms in the config files have now been changed from ["random_flip", "random_translation", "center_crop", "normalize"] to ["random_resized_crop", "random_flip", "normalize"].

If you are using our Dassl-based CoOp code, please update the code to the latest version. If you want to use your own code, you can simply copy CoOp's model code (i.e., CustomCLIP) and do the comparison on the same ground with whatever pipeline you are using.
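
To make the Resize difference concrete, here is a small torchvision example (the image size is illustrative): an integer size scales the shorter side and preserves the aspect ratio, while a (224, 224) tuple squashes the image to a square.

from PIL import Image
from torchvision import transforms as T

img = Image.new("RGB", (640, 480))   # a non-square dummy image

keep_ratio = T.Resize(224)           # shorter side -> 224, aspect ratio preserved
force_square = T.Resize((224, 224))  # both sides -> 224, aspect ratio distorted

print(keep_ratio(img).size)                      # (298, 224)
print(force_square(img).size)                    # (224, 224)
print(T.CenterCrop(224)(keep_ratio(img)).size)   # (224, 224) after cropping, undistorted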

For your reference, we have rerun CoOp using the new config files and put below the comparison of Table 7's results.

Previous version

Method RN50 RN101 ViT-B/32 ViT-B/16
Prompt engineering 55.41 58.72 59.88 64.71
Prompt ensembling 57.81 60.49 62.01 67.31
CoOp 60.46 64.39 64.92 70.13

Current version

Method RN50 RN101 ViT-B/32 ViT-B/16
Prompt engineering 58.18 61.26 62.05 66.73
Prompt ensembling 60.41 62.54 63.71 68.74
CoOp 62.95 66.60 66.85 71.92

Using CoOp for my own dataset

Hello!
As described in the README, CoOp is used with the datasets defined in CoOp/configs/datasets/. If I want to try CoOp on my own dataset, how can I do that?

Looking forward to your reply!
Thanks

training speed

Thank you for your contribution.
I found that training is slower when using multiple GPUs (e.g., 8 GPUs) than a single GPU. Do you know why that is and how to speed up the training process?

When I train my network on oxford_flower (epoch=200), I get a different result.

Dear Zhou:
When I train my network on oxford_flower (epoch=200), it gets a great result as follows:
=> result

  • total: 2,463
  • correct: 2,268
  • accuracy: 92.1%
  • error: 7.9%
  • macro_f1: 91.6%

Elapsed: 0:14:32
But if I run it again (as your code shows, it will use the model I trained last time, which got good results), it gets a bad result as follows:
=> result

  • total: 2,463
  • correct: 876
  • accuracy: 35.6%
  • error: 64.4%
  • macro_f1: 30.1%

I am not sure why it produces a bad result; can you give me some advice?
(Maybe it does not use the BN statistics of the trained model?)

can't download json

Hello, the page links for split_zhou_Caltech101.json, split_zhou_OxfordFlowers.json, and split_zhou_DescribableTextures.json can't be opened. Is there any other download link?

Much better CoOp performance

Thanks for your great work!
I tried to use your code to reproduce some results of CoOp reported in your CoCoOp paper.
I tried this model on the DTD dataset with:
bash main.sh dtd vit_b16_ep50 end 4 16 False
which is exactly the same as the setting in the paper.
I got a much higher performance: accuracy: 67.38% +- 0.51%.
But the paper reports CoOp's performance as 54.24.

Reproducing results for one shot case

Hi,

thanks for your great work!
You state in your paper that the one-shot experiments are trained for 50 epochs, but after using your code, it looks like the results you report for one shot are consistent with training for 200 epochs. When training for 50 epochs, I obtain results that are much better than those reported in the paper.

Any idea on what causes this?

Thanks!
