
[ICLR24] AutoVP: An Automated Visual Prompting Framework and Benchmark

Paper: https://arxiv.org/abs/2310.08381

License: Apache License 2.0

Topics: downstream-tasks, efficient-inference, finetuning, ood-robustness, reprogramming, visual-prompt, visual-prompting, foundation-models, low-data-regime, model-agnostic


Overview

This repository contains the code for the ICLR 2024 paper "AutoVP: An Automated Visual Prompting Framework and Benchmark" by Hsi-Ai Tsao*, Lei Hsiung*, Pin-Yu Chen, Sijia Liu, and Tsung-Yi Ho (*equal contribution).

Visual prompting (VP) is an emerging parameter-efficient fine-tuning approach for adapting pre-trained vision models to downstream image-classification tasks. However, there has hitherto been little systematic study of VP's design space, and no clear benchmark for evaluating its performance. To bridge this gap, we propose AutoVP, an end-to-end expandable framework for automating VP design choices, along with 12 downstream image-classification tasks that can serve as a holistic VP-performance benchmark. Our design space covers 1) the joint optimization of the prompts; 2) the selection of pre-trained models, including image classifiers and text-image encoders; and 3) model output mapping strategies, including nonparametric and trainable label mapping. Our extensive experimental results show that AutoVP outperforms the best-known current VP methods by a substantial margin, with up to a 6.7% improvement in accuracy, and attains a maximum performance increase of 27.5% over the linear-probing (LP) baseline. AutoVP thus makes a two-fold contribution: it serves both as an efficient tool for hyperparameter tuning of VP design choices and as a comprehensive benchmark that can reasonably be expected to accelerate VP's development.

[Figures: AutoVP system overview and benchmark performance comparison]

Dependencies

Run pip3 install -r requirements.txt.

Datasets

The Flowers102, DTD, EuroSAT, and UCF101 datasets, together with ILM_Dataloader.py, are taken from ILM-VP.

The ISIC (Melanoma) dataset is from ISIC.

Experiments

VP Training with a Specified Setting:

python3 demo.py --dataset "OxfordIIITPet" --datapath "./OxfordIIITPet" --download 1 --pretrained "clip" --mapping_method "frequency_based_mapping" --out_map_num 10 --img_scale 1.5 --lr 40 --epoch 100

  • Parameters in demo.py:
    • dataset and datapath: Name of the dataset and directory path.

    • download: Whether to download the dataset.

    • param_tune: Perform hyperparameter tuning and train the model using the best configuration selected during tuning.

    • LR_WD_tune: Perform additional hyperparameter tuning on learning rate (LR) and weight decay (WD) and train on the best configuration.

    • pretrained: The pre-trained source model.

    • mapping_method: The output mapping method (a sketch of frequency-based mapping follows this list).

    • img_scale: The initial image scale.

    • out_map_num: The number of source labels mapped to each target label.

    • train_resize: Whether to enable the trainable image-resizing module.

    • freqmap_interval: Frequency mapping interval (-1 for FreqMap, >1 for IterMap).

    • weightinit: When using CLIP as the pre-trained model and FullyMap as the output mapping, it is recommended to turn on weight initialization.

    • epoch, lr, seed: The number of training epochs, the learning rate (recommended: lr=40 for CLIP, otherwise 0.001), and the random seed.

    • scalibility_rio: The proportion of training data used (1/scalibility_rio).

    • scalibility_mode: The data splitting strategy.
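
For reference, frequency-based mapping pairs each target class with the source classes that the pre-trained model most often predicts for that class's training images. Below is a minimal sketch of the idea, not the repository's exact code; conflict resolution between target classes is omitted for brevity.

import torch

def build_freq_map(source_logits, target_labels, num_targets, out_map_num):
    # source_logits: (N, num_source) source-model outputs on the prompted
    # training images; target_labels: (N,) ground-truth target labels.
    preds = source_logits.argmax(dim=1)
    num_source = source_logits.shape[1]
    freq = torch.zeros(num_targets, num_source)
    for t in range(num_targets):
        freq[t] = torch.bincount(preds[target_labels == t],
                                 minlength=num_source).float()
    # Keep the out_map_num most frequent source classes per target class.
    return freq.topk(out_map_num, dim=1).indices   # (num_targets, out_map_num)

def map_logits(source_logits, mapping):
    # Target logit = mean of the logits of its mapped source classes.
    return source_logits[:, mapping].mean(dim=2)   # (N, num_targets)

With freqmap_interval set above 1 (IterMap), a mapping like this would simply be recomputed every that many epochs as the prompt evolves.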

Hyper-Parameter Tuning and VP Training:

  • Tunable VP configurations: pretrained, mapping_method, img_scale, out_map_num, train_resize, and freqmap_interval

    • python3 demo.py --dataset "OxfordIIITPet" --datapath "./OxfordIIITPet" --download 1 --param_tune 1
  • Additional tunable parameters: lr and weight_decay

    • Perform tuning on both the VP configuration and the additional parameters:

    python3 demo.py --dataset "OxfordIIITPet" --datapath "./OxfordIIITPet" --download 1 --param_tune 1 --LR_WD_tune 1

    • Perform tuning on the additional parameters only, with a fixed VP configuration:

    python3 demo.py --dataset "OxfordIIITPet" --datapath "./OxfordIIITPet" --download 1 --pretrained "clip" --mapping_method "frequency_based_mapping" --out_map_num 10 --img_scale 1.5 --LR_WD_tune 1
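
Conceptually, param_tune amounts to a small search loop: run a short training pass for each candidate VP configuration and keep the winner by validation accuracy, as in the sketch below. The candidate values and the train_and_validate stub are illustrative placeholders, not the framework's actual search space.

import random
from itertools import product

def train_and_validate(cfg, epochs):
    # Placeholder: the real framework would train a prompt under `cfg`
    # for `epochs` epochs and return the validation accuracy.
    return random.random()

search_space = {
    "pretrained": ["clip", "resnet18"],                # illustrative values
    "mapping_method": ["frequency_based_mapping"],     # illustrative values
    "img_scale": [0.5, 1.0, 1.5],                      # illustrative values
}

best_acc, best_cfg = -1.0, None
for values in product(*search_space.values()):
    cfg = dict(zip(search_space, values))
    acc = train_and_validate(cfg, epochs=5)    # short tuning run
    if acc > best_acc:
        best_acc, best_cfg = acc, cfg

train_and_validate(best_cfg, epochs=100)       # full run on the winner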

Baseline Evaluation:

python3 Baseline.py --dataset "OxfordIIITPet" --datapath "./OxfordIIITPet" --download 1 --pretrained "clip" --baseline "CLIP_LP" --lr 40 --epoch 100

  • Parameters in Baseline.py:
    • dataset and datapath: Name of the dataset and directory path.

    • download: Whether to download the dataset.

    • pretrained: The pre-trained source model.

    • epoch, lr, seed: The number of training epochs, the learning rate (recommended: lr=40 for CLIP, otherwise 0.001), and the random seed.

    • scalibility_rio: The proportion of training data used (1/scalibility_rio).

    • scalibility_mode: The data splitting strategy.

    • baseline: The baseline mode. When using CLIP as the pre-trained model, choose CLIP_LP for linear-probing training (see the sketch below).
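
For reference, linear probing with CLIP means freezing the image encoder and training only a linear head on its features. Below is a minimal sketch assuming OpenAI's clip package, with data loading left out; Baseline.py's actual implementation may differ in details such as the learning rate.

import clip
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)
for p in model.parameters():
    p.requires_grad = False               # linear probing: encoder stays frozen

head = nn.Linear(model.visual.output_dim, 37).to(device)  # 37 = OxfordIIITPet classes
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)  # generic head LR for this sketch
criterion = nn.CrossEntropyLoss()

def training_step(images, labels):
    # images: a batch already transformed by `preprocess`
    with torch.no_grad():
        feats = model.encode_image(images.to(device)).float()
    loss = criterion(head(feats), labels.to(device))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()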

Evaluate a Previously Trained Checkpoint:

python3 Evaluation.py --dataset "OxfordIIITPet" --datapath "./OxfordIIITPet" --download 1

  • Parameters in Evaluation.py:
    • dataset and datapath: Name of the dataset and directory path. Evaluation runs on the checkpoint file {dataset}_last.pth (a sketch of the evaluation loop follows this list).

    • download: Whether to download the dataset.

    • epoch, lr, seed: The number of training epochs, the learning rate (recommended: lr=40 for CLIP, otherwise 0.001), and the random seed.

    • scalibility_rio: The proportion of training data used (1/scalibility_rio).

    • scalibility_mode: The data splitting strategy.
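
A minimal sketch of what such an evaluation amounts to, assuming model and test_loader are rebuilt exactly as during training and that the checkpoint is a plain state dict (the actual checkpoint layout is implementation-specific):

import torch

ckpt = torch.load("OxfordIIITPet_last.pth", map_location="cpu")
model.load_state_dict(ckpt)    # assumes the checkpoint is a plain state dict

model.eval()
correct = total = 0
with torch.no_grad():
    for images, labels in test_loader:      # the dataset's test split
        preds = model(images).argmax(dim=1)
        correct += (preds == labels).sum().item()
        total += labels.numel()
print(f"accuracy: {correct / total:.4f}")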

Plot the Learned Prompts in the Frequency Domain:

python3 frequency_analysis.py --dataset "OxfordIIITPet"

  • Parameters in frequency_analysis.py:
    • dataset: Name of the dataset. The script analyzes the checkpoint file {dataset}_last.pth (see the sketch below).
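
A minimal sketch of the underlying analysis, assuming the checkpoint exposes the learned prompt as a (3, H, W) tensor under a key named "prompt" (the actual key is implementation-specific): take a 2-D FFT of the prompt and plot its log-magnitude spectrum.

import numpy as np
import torch
import matplotlib.pyplot as plt

ckpt = torch.load("OxfordIIITPet_last.pth", map_location="cpu")
prompt = ckpt["prompt"].detach().numpy().mean(axis=0)  # average over RGB channels

spectrum = np.fft.fftshift(np.fft.fft2(prompt))   # move zero frequency to the center
magnitude = np.log1p(np.abs(spectrum))            # log scale for visibility

plt.imshow(magnitude, cmap="viridis")
plt.title("Log-magnitude spectrum of the learned prompt")
plt.colorbar()
plt.savefig("prompt_spectrum.png")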

Citations

If you find this work helpful for your research, please cite our paper as follows:

@inproceedings{tsao2024autovp,
  title     = {{AutoVP: An Automated Visual Prompting Framework and Benchmark}},
  author    = {Hsi-Ai Tsao and Lei Hsiung and Pin-Yu Chen and Sijia Liu and Tsung-Yi Ho},
  booktitle = {The Twelfth International Conference on Learning Representations},
  year      = {2024}
}
