
zhangyuanhan-ai / noah


Searching prompt modules for parameter-efficient transfer learning.

License: MIT License

Shell 7.89% Python 89.61% HTML 0.25% Ruby 0.07% CSS 0.18% JavaScript 2.00%
prompt-tuning transfer-learning domain-generalization visual-prompting pre-trained-model pytorch deep-learning

noah's People

Contributors

kaiyangzhou, zhangyuanhan-ai


noah's Issues

Few-shot results

Hi Authors, Great work!

I would like to know the exact numbers in Figure 4, so that I can check reproducibility and compare against my own research. It would be great if you could provide them. Looking forward to hearing from you.

About GPU consumption

Thanks for your excellent work!
If I want to reproduce the results in the paper, how many Tesla V100s (or how much GPU memory) are needed?

Clarification on Fewshot results in the paper

Hi,
I am trying to replicate the few-shot results presented in the paper.
So far I have done the following:

for LR in 0.005 
do 
    for DATASET in food-101 oxford_pets stanford_cars oxford_flowers fgvc_aircraft 
    do 
        for SHOT in 8
        do
            for SEED in 0 1 2
            do
                python supernet_train_prompt.py \
                    --data-path=./data/${DATASET} --data-set=${DATASET}-FS \
                    --cfg=experiments/NOAH/subnet/few-shot/ViT-B_prompt_${DATASET}_shot${SHOT}-seed0.yaml \
                    --resume=${CKPT} \
                    --output_dir=saves/few-shot_${DATASET}_shot-${SHOT}_seed-${SEED}_lr-${LR}_wd-${WEIGHT_DECAY}_noah \
                    --batch-size=64 --mode=retrain --epochs=100 \
                    --lr=${LR} --weight-decay=${WEIGHT_DECAY} \
                    --few-shot-shot=${SHOT} --few-shot-seed=${SEED} \
                    --launcher="none"
            done
        done
    done
done

I used the ViT-B/16 pretrained model as the resume checkpoint. I get an average accuracy over seeds of 67.36 (0.49) on the Food-101 dataset, whereas in Figure 4 the NOAH score on Food-101 is above 70. I have a few questions:

  1. Is the accuracy in Figure 4 the mean over different seeds?
  2. Is setting the resume checkpoint to the ViT-B/16 pretrained model the right way?
  3. Or should I train a supernet model and use it as the resume checkpoint when retraining the subnet model? I guess that if I do this, I do not need to run the search again, since the right dimensions for each prompt module are already configured inside the experiments directory.

How to plot the Figure 1(b)?

It is nice to present the results in the style of Figure 1(b).

I am curious: is there a convenient way to plot such a figure?
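
For reference, a minimal matplotlib sketch, assuming Figure 1(b) is a scatter of average accuracy versus number of trainable parameters; all numbers below are placeholders to be replaced with your own results:

import matplotlib.pyplot as plt

# Placeholder (params in M, accuracy in %) values -- replace with real results.
methods = {
    "Method A": (0.64, 72.0),
    "Method B": (0.33, 73.9),
    "Method C": (0.29, 74.5),
}

fig, ax = plt.subplots(figsize=(4, 3))
for name, (params, acc) in methods.items():
    # Bubble size proportional to the number of trainable parameters.
    ax.scatter(params, acc, s=300 * params, alpha=0.6)
    ax.annotate(name, (params, acc), textcoords="offset points", xytext=(5, 5))
ax.set_xlabel("Trainable parameters (M)")
ax.set_ylabel("Average accuracy (%)")
plt.tight_layout()
plt.savefig("figure1b_style.png", dpi=300)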

Different limit

Hello. Thanks for the work.

I have a question regarding different limits. I have successfully run the first and second stages.
In the second stage, I used a limit number different from that in the original paper.
For the last retraining stage, if I understand correctly, I need to manually check the search training log to find the best subnet and modify experiments/NOAH/subnet/VTAB/ViT-B_prompt_${DATASET}.yaml to the corresponding setting? Is the last entry in the search log always the best one?

Thank you for your time and help. I am looking forward to your reply.

About #params of Adapter

Hi, thank you for sharing the awesome code.
After reading the paper, I am a little confused about the usage of Adapter. It seems that the adapters are placed in FFN blocks only, not the same as in Houlsby's paper where the Attn blocks also have adapters. So I think the #params of the 8dim-Adapter should be 12 * (8 * 768 * 2 + 8 + 768) = 0.157M. But Table 1 shows that the #params of Adapter is 0.33M. So I wonder if you also place adapters in the Attn blocks for the baselines.

Problem preparing RESISC45 and Diabetic Retinopathy dataset

Hi authors,

Thank you for your work. I am using your code repo for my project. I am trying to prepare the RESISC45 and Diabetic Retinopathy datasets using the script python get_vtab1k.py. I manually downloaded these datasets and prepared them successfully. However, when I run the script, it throws FileNotFoundError: [Errno 2] No such file or directory: './data/vtab-1k/resisc45/train800val200.txt'. I checked the folder and no txt file was created.

The issue happens with Diabetic Retinopathy as well. Can you help me figure out what I am missing?
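
As a small diagnostic sketch: only train800val200.txt is known from the error message; the other split names below are guesses at common VTAB-1k splits and may not match what get_vtab1k.py actually emits:

import os

root = "./data/vtab-1k/resisc45"
# Check which split files were actually written by the preparation script.
for split in ["train800.txt", "val200.txt", "train800val200.txt", "test.txt"]:
    path = os.path.join(root, split)
    print(f"{path}: {'found' if os.path.exists(path) else 'MISSING'}")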

CONFIG = $1, where to find config?

In your scripts, e.g. .../NOAH/configs/VPT/VTAB/slurm_train_vpt_vtab.sh, the config for each dataset is passed to the script via the first positional parameter (i.e. $1).

However, I cannot find any information in your README indicating where to find such config files. I want to know the config for the original VPT, since their paper did not provide any code.

About the slurm setting

Thanks for your great work!
It seems that all the experiments require Slurm on multiple machines, right?

caltech dataset

When downloading the Caltech dataset, why does it always report network errors? How can I solve this problem?

Only gain 64.39 on cifar100 using VPT

In the paper, the result of VPT on CIFAR-100 is 78.8, but I reproduce a worse result: 64.39.

Here is my command:
bash configs/VPT/VTAB/ubuntu_train_vpt_vtab.sh experiments/VPT/ViT-B_prompt_vpt_100.yaml ViT-B_16.npz
lr is 0.001, weight decay is 0.0001.

Here are the outputs:

{"train_lr": 1.0000000000000002e-06, "train_loss": 4.60572919845581, "test_loss": 4.605674954909313, "test_acc1": 1.0, "test_acc5": 5.06, "epoch": 0, "n_parameters": 1152100}
{"train_lr": 0.0009001000000000005, "train_loss": 3.097556209564209, "test_loss": 2.907223858410799, "test_acc1": 24.05, "test_acc5": 63.74, "epoch": 10, "n_parameters": 1152100}
{"train_lr": 0.0009144048842659077, "train_loss": 1.6018924713134766, "test_loss": 1.7854226706903191, "test_acc1": 52.95, "test_acc5": 84.68, "epoch": 20, "n_parameters": 1152100}
{"train_lr": 0.0008083889915582233, "train_loss": 1.1247423966725667, "test_loss": 1.485605075389524, "test_acc1": 60.25, "test_acc5": 88.44, "epoch": 30, "n_parameters": 1152100}
{"train_lr": 0.0006726752705214194, "train_loss": 1.0146915316581726, "test_loss": 1.5092519914047628, "test_acc1": 61.44, "test_acc5": 87.76, "epoch": 40, "n_parameters": 1152100}
{"train_lr": 0.0005205483257436735, "train_loss": 0.9674314936002095, "test_loss": 1.4829329569128495, "test_acc1": 62.81, "test_acc5": 88.66, "epoch": 50, "n_parameters": 1152100}
{"train_lr": 0.0003668994025105816, "train_loss": 0.8998952905337015, "test_loss": 1.4677390011051032, "test_acc1": 63.51, "test_acc5": 88.85, "epoch": 60, "n_parameters": 1152100}
{"train_lr": 0.00022676872796319536, "train_loss": 0.8787842233975728, "test_loss": 1.4812716230561462, "test_acc1": 63.59, "test_acc5": 89.04, "epoch": 70, "n_parameters": 1152100}
{"train_lr": 0.00011387326887403328, "train_loss": 0.8721011678377787, "test_loss": 1.46410808382155, "test_acc1": 64.24, "test_acc5": 89.54, "epoch": 80, "n_parameters": 1152100}
{"train_lr": 3.9264019367658406e-05, "train_loss": 0.8584762454032898, "test_loss": 1.4647658761543563, "test_acc1": 64.45, "test_acc5": 89.28, "epoch": 90, "n_parameters": 1152100}
{"train_lr": 1.0976769428005575e-05, "train_loss": 0.868200953801473, "test_loss": 1.4616762444942812, "test_acc1": 64.39, "test_acc5": 89.4, "epoch": 99, "n_parameters": 1152100}

Can you help me find the reason?
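
As an aside, a minimal sketch for pulling the best test_acc1 out of a JSON-lines log like the one pasted above (one JSON object per line; the file name log.txt is hypothetical):

import json

best = None
with open("log.txt") as f:          # hypothetical path to the training log
    for line in f:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)   # each line is a dict with test_acc1, epoch, ...
        if best is None or record["test_acc1"] > best["test_acc1"]:
            best = record

print(f"best test_acc1 = {best['test_acc1']} at epoch {best['epoch']}")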

Question about Adapter

Thank you so much for sharing the code. In the original Adapter paper, one adapter layer is placed after the attention module and the other is placed after the FF layer. In your paper, it seems the adapter layer is only placed after the FF layer. Could you elaborate on the rationale behind this change? Thank you so much in advance; I look forward to hearing back from you.

Questions about reproduction.

Great job on your work, and thank you for sharing the code. I do have a few questions regarding running the experiments on vtab-1k.

Regarding VPT / Adapter / LoRA:

  • Acc performance: My results for Adapter and LoRA are significantly worse than in the original paper (72.0 / 73.9 / 74.5), with my results being 72.81 / 71.65 / 72.35, respectively. I used your configuration files and scripts directly to run Adapter and LoRA, but I am not sure whether the code is the exact version used to produce the results in the original paper.
  • Params: The prompt length of VPT varies across datasets. How did you determine the parameter count of VPT to be 0.64M?

Regarding NOAH:

  • The pipeline has three stages: 1) supernet training; 2) evolutionary search; 3) subnet retraining. In the evolutionary search, should the parameter LIMITS be 0.64? It seems to make the search skip a lot of subnets (I am not sure whether this is correct). The original paper says Output: a new subset C and its performance on val set: {C:Acc.}, but my search logs do not produce such results. For instance, *-caltech101-vtab-search.log shows that Acc@1 is only 1.000, which is very low and essentially random.
  • Additionally, how can I obtain the subnet configuration files in experiments/NOAH/subnet/VTAB?

Regarding others:

  • The model config includes visual_prompt, lora, adapter, and prefix. What does prefix mean here? Is it another method different from VPT / Adapter / LoRA?

I look forward to your response.

error in lib/config.py

Hi, I followed your instructions and then met the error shown below:

File "NOAH/lib/config.py", line 36, in update_config_from_file
exp_config = edict(yaml.safe_load(f))
File "/.conda/envs/NOAH/lib/python3.8/codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf9 in position 15: invalid start byte

Then I added encoding='ISO-8859-1' to config.py:

def update_config_from_file(filename):
    exp_config = None
    with open(filename, encoding='ISO-8859-1') as f:
        exp_config = edict(yaml.safe_load(f))
    _update_config(cfg, exp_config)

but a new error makes it hard for me to move to the next step:

File "/.conda/envs/NOAH/lib/python3.8/site-packages/yaml/reader.py", line 144, in check_printable
raise ReaderError(self.name, position, ord(character),
yaml.reader.ReaderError: unacceptable character #x0003: special characters are not allowed
in "vit_model/ViT-B_16.npz", position 2

Could you please help with this problem? And are there any other issues that you did not post? I just want to reproduce this work. Thanks!

About reproduction.

Hello. Thanks for the interesting work.

I am having trouble with reproduction (I think the problem is on my side, not yours).

The reproduction result of LoRA on Caltech101 seems to be lower than the result in the paper.

We executed the following commands (without srun).

CONFIG=experiments/LoRA/ViT-B_prompt_lora_8.yaml
CKPT=ckpt/ViT-B_16.npz

for LR in 0.001
do
    for DATASET in caltech101
    do 
        python supernet_train_prompt.py \
            --data-path=./vtab-1k/${DATASET} --data-set=${DATASET} \
            --cfg=${CONFIG} --resume=${CKPT} \
            --output_dir=./saves/${DATASET}_lr-${LR}_wd-${WEIGHT_DECAY}_lora_100ep_noaug_xavier_dp01_same-transform_nomixup \
            --batch-size=64 --lr=${LR} --epochs=100 --is_LoRA \
            --weight-decay=${WEIGHT_DECAY} --no_aug --mixup=0 --cutmix=0 \
            --direct_resize --smoothing=0
    done
done

Across three different seeds, the accuracy never exceeded 90%.
The output file is here.

The reproduction result of Adapter is significantly low, especially on SVHN.

We executed the following commands (without srun).

CONFIG=experiments/Adapter/ViT-B_prompt_adapter_8.yaml
CKPT=ckpt/ViT-B_16.npz

for LR in 0.001
do
    for DATASET in svhn
    do 
        python supernet_train_prompt.py \
            --data-path=./vtab-1k/${DATASET} --data-set=${DATASET} \
            --cfg=${CONFIG} --resume=${CKPT} \
            --output_dir=./saves/${DATASET}_${LR}_${WEIGHT_DECAY}_vpt \
            --batch-size=64 --lr=${LR} --epochs=100 --is_visual_prompt_tuning \
            --weight-decay=${WEIGHT_DECAY} --mixup=0 --cutmix=0 --drop_rate_prompt=0.1 \
            --no_aug --inception --direct_resize --smoothing=0
    done
done

The accuracy is around 35%.
The output file is here.

I would appreciate any advice you can give me, no matter what information you have.🙇‍♂️

dataset

Hello, I cannot download the VTAB dataset following your configuration. Could you send me a copy of the dataset? My email is [email protected].
I also hope you can provide the versions of the following packages: tensorflow, tensorflow-addons, tensorflow-metadata, tensorflow-datasets, tfds-nightly.

Regarding data pipeline of ImageNet

Hello Authors,

I was trying to run the ImageNet 16-shot experiment and could not find the annotation file for ImageNet:

train_list_path = os.path.join(self.dataset_root, 'annotations/train_meta.list.num_shot_'+str(shot)+'.seed_'+str(seed))
Could you please let me know how to run the ImageNet experiment?
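
For what it's worth, a hypothetical sketch of how such a k-shot annotation list could be built from an ImageNet-style train directory (one sub-folder per class); the output line format "<relative path> <class index>" and the directory layout are assumptions, not the repo's documented format:

import os
import random

def make_few_shot_list(train_dir, out_path, shot=16, seed=0):
    # Sample `shot` images per class and write "<relative path> <class index>" lines.
    # NOTE: the line format is a guess; adjust to whatever the data loader expects.
    random.seed(seed)
    os.makedirs(os.path.dirname(out_path), exist_ok=True)
    classes = sorted(os.listdir(train_dir))
    with open(out_path, "w") as f:
        for idx, cls in enumerate(classes):
            images = sorted(os.listdir(os.path.join(train_dir, cls)))
            for name in random.sample(images, min(shot, len(images))):
                f.write(f"train/{cls}/{name} {idx}\n")

make_few_shot_list("imagenet/train",
                   "imagenet/annotations/train_meta.list.num_shot_16.seed_0",
                   shot=16, seed=0)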

Searched configurations in the main table

Thanks for your nice work!

I just wonder whether it is possible to release the searched configurations (how many prompt / adapter / LoRA dimensions) used to reproduce the results for each dataset in Table 1 of your paper.

[screenshot of Table 1]

Thanks in advance!

Kind regards,
Haoyu
