
A Unified Continual Learning Framework with General Parameter-Efficient Tuning, ICCV 2023 [PyTorch Code]

Home Page: https://arxiv.org/abs/2303.10070

License: Apache License 2.0


lae's Introduction

A Unified Continual Learning Framework with General
Parameter-Efficient Tuning

News

  • [2023/08/19] The camera-ready version has been submitted.
  • [2023/07/14] Accepted to ICCV 2023 as a poster presentation; the code is released to the public!

Installation

  • Install all dependencies via pip

    pip install -r requirements.txt

    ⚠️ Remove torch and torchvision from requirements.txt first if another version of PyTorch has already been installed.

Dataset

  1. Create a dataset root directory, e.g., data.

  2. The CIFAR100 and ImageNet-R datasets will be downloaded automatically (see the sketch below), while DomainNet requires a manual download.

  3. Overview of the dataset root directory:

    ├── cifar100
    │   └── cifar-100-python
    ├── domainnet
    │   ├── clipart
    │   ├── infograph
    │   ├── painting
    │   ├── quickdraw
    │   ├── real
    │   └── sketch
    └── imagenet-r
        ├── imagenet-r
        ├── train_list.txt
        └── val_list.txt

    ⚠️ The train-validation split of the ImageNet-R dataset is consistent with the L2P JAX code. Replace train_list.txt and val_list.txt with train_list_coda-p.txt and val_list_coda-p.txt if you want to use the train-validation split of CODA-Prompt.
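
For reference, a minimal sketch of the automatic CIFAR100 download, assuming a torchvision-style loader (an assumption; the repo's own data loading may wrap this differently). With root set as below, it reproduces the cifar100/cifar-100-python layout shown above:

    # Hedged sketch: assumes torchvision handles the download; not the repo's code.
    from torchvision.datasets import CIFAR100

    root = "data/cifar100"  # dataset root from step 1 plus the cifar100 subdirectory
    train_set = CIFAR100(root, train=True, download=True)   # fetches cifar-100-python on first run
    val_set = CIFAR100(root, train=False, download=True)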

Experiment

  • Generate a config file (replace <root> with your dataset root path)

    python main.py data.root=<root> data.dataset=cifar100 --print_config > cifar100.yaml
  • Run the code with an experiment config file

    python main.py --config=cifar100.yaml
  • Reproduce results in the paper

    We provide configs and a Makefile to quickly reproduce the ten-task experimental results reported in the paper. Run the following commands if make is installed:

    make vit_adapter
    make vit_lora
    make vit_prefix
    make swin_adapter
    make convnext_adapter

    Run the make command with the BASE argument (default: base/cifar100_order1.yaml) to reproduce other experiments, e.g.:

    make BASE="base/imagenet-r_order1.yaml" vit_adapter
    

    Modify data.num_increment_classes (5 for CIFAR100, 10 for ImageNet-R) in the base config files to reproduce the 20-task experiments.
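
    For illustration, a hypothetical excerpt of a base config after this change (only data.num_increment_classes is named above; the surrounding keys are assumptions):

    data:
      dataset: cifar100
      num_increment_classes: 5  # 5 classes per task -> 20 tasks on CIFAR100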

Acknowledgement

Citation

@inproceedings{gao2023lae,
  title     = {A Unified Continual Learning Framework with General Parameter-Efficient Tuning},
  author    = {Gao, Qiankun and Zhao, Chen and Sun, Yifan and Xi, Teng and Zhang, Gang and Ghanem, Bernard and Zhang, Jian},
  booktitle = {International Conference on Computer Vision (ICCV)},
  year      = {2023}
}


lae's Issues

Adapter dimension

I appreciate your good work and thank you for sharing your excellent code. While going through the code, I had a question. In vit_adapter.yaml, there is the following section:

extends:
  - ./base/cifar100_order1.yaml
module:
  model:
    backbone: ViT-B_16
  adapt_blocks: [0, 1, 2, 3, 4]
  pet_cls: Adapter
  pet_kwargs:
    down_sample: 5
    mode: parallel
    scale: null

Is down_sample: 5 an absolute value rather than a ratio? As far as I know, a common adapter typically involves a dimension reduction like hidden_dim -> hidden_dim / 4 -> hidden_dim. Your code has the structure (768, 5), GELU(), (5, 768). Is there a specific reason for this setup, and why is the value specifically 5?
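
For reference, a minimal sketch of the adapter structure described in the question, assuming a parallel bottleneck adapter with an absolute bottleneck width (a hedged reading of the config above, not the repo's actual implementation):

# Hedged sketch of a parallel bottleneck adapter with an absolute
# bottleneck width (down_sample=5); not the repo's actual code.
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    def __init__(self, hidden_dim=768, down_sample=5, scale=None):
        super().__init__()
        self.down = nn.Linear(hidden_dim, down_sample)  # 768 -> 5
        self.act = nn.GELU()
        self.up = nn.Linear(down_sample, hidden_dim)    # 5 -> 768
        self.scale = scale  # "scale: null" in the YAML -> no scaling

    def forward(self, x):
        h = self.up(self.act(self.down(x)))
        return h if self.scale is None else h * self.scale

# "mode: parallel" would mean the adapter branch runs alongside the block:
# y = block(x) + adapter(x)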

Question regarding Joint-FT

Hello Authors,

Thank you for releasing the code. Could I ask how you implemented the Joint-FT experiment? More specifically:

  • Which codebase/configs were used?
  • What training settings were used for this experiment (learning rate, epochs, learning rate schedule, etc.)?

Best,
Jinhyung Park

Question regarding CIFAR-100 training accuracy being lower than evaluation accuracy

I've noticed an intriguing phenomenon where the training accuracy is lower than the evaluation accuracy. This seems to deviate from the common trend where training accuracy usually surpasses evaluation accuracy.

To provide some context, I have run the L2P code and observed that, as expected, the training accuracy for CIFAR-100 is higher than the evaluation accuracy. However, with your LAE method applied to the ImageNet-R dataset, it aligns well with the usual pattern of higher training accuracy. This leads me to wonder if there might be a specific reason behind the different behavior observed with CIFAR-100 in your implementation.

I am curious to understand more about this and would greatly appreciate any insights or explanations you could provide. Understanding the nuances of your approach would be incredibly beneficial for my ongoing research and experiments.

Thank you very much for your time and consideration. I am looking forward to your response.

Question about the naive baselines

Hi authors,
Congratulations on your great work! I have a few questions about your paper. It would be great if you could kindly answer them!

In your paper's Tab. 1 and Tab. 2, your baselines are extremely high, i.e., comparable to or even better than L2P and DualPrompt.
Also in Tab. 1 and Tab. 2, the results of Seq-FT are much higher than the numbers reported in the L2P paper. For the CIFAR100 dataset, L2P reported 33.61% for Seq-FT while you report 77.61%.
Could you explain why this happened? Did you use the task identity during inference? I.e., for the test set in each task, did you filter out the logits of other tasks?

Also, is my understanding of the naive baseline correct? For CIFAR-100, you insert some new parameters into the pretrained model, i.e., 20 prompts and a 100-class classifier. Then you train the new parameters sequentially on each task. During inference, given a test image, you predict the category from the 100-class classifier without any other information (i.e., a task class mask). Is this right?

Thanks!
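
For clarity, a small sketch of the two inference protocols the question distinguishes, with hypothetical tensors and class ranges (not the repo's evaluation code):

# Hedged sketch: task-agnostic vs. task-aware prediction from a 100-class classifier.
import torch

logits = torch.randn(8, 100)        # hypothetical batch of classifier outputs
task_classes = list(range(20, 30))  # hypothetical classes of the current task

# Task-agnostic (no task identity): argmax over all 100 classes.
pred_agnostic = logits.argmax(dim=1)

# Task-aware: filter out the logits of other tasks, then argmax.
mask = torch.full_like(logits, float("-inf"))
mask[:, task_classes] = 0.0
pred_task_aware = (logits + mask).argmax(dim=1)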
