Home Page: https://projfiner.github.io/

License: Apache License 2.0


Democratizing Fine-grained Visual Recognition with Large Language Models

Mingxuan Liu · Subhankar Roy · Wenjing Li · Zhun Zhong · Nicu Sebe · Elisa Ricci

ICLR 2024

TL;DR: We propose the Fine-grained Semantic Category Reasoning (FineR) system to address fine-grained visual recognition without needing expert annotations or knowing category names a priori. FineR leverages large language models to identify fine-grained image categories by interpreting visual attributes as text. This allows it to reason about subtle differences between species or objects, outperforming current FGVR methods.
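
In sketch form, the idea looks like the following. All function and method names here are placeholders of ours, not the actual FineR API:

# Conceptual sketch of the FineR pipeline (placeholder names, not the real API):
# a VQA VLM translates unlabeled images into textual attribute descriptions, an
# LLM reasons over the text to propose candidate class names, and a
# vision-language model (e.g., CLIP) grounds the names for classification.

def discover_class_names(unlabeled_images, vqa_vlm, llm):
    """Propose fine-grained class names without annotations or a known vocabulary."""
    super_class = llm.identify_super_category(vqa_vlm.describe(unlabeled_images))
    attributes = llm.propose_useful_attributes(super_class)  # e.g., "beak shape"
    descriptions = [
        {attr: vqa_vlm.ask(image, f"What is the {attr} of this {super_class}?")
         for attr in attributes}
        for image in unlabeled_images
    ]
    return llm.reason_class_names(super_class, descriptions)

def classify(image, class_names, clip_model):
    """Group a test image into the discovered vocabulary by image-text matching."""
    scores = clip_model.similarity(image, class_names)
    return max(zip(class_names, scores), key=lambda pair: pair[1])[0]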


📣 News:

  • [03/01/2024] We released the code along with the intermediate results (under experiments/), including super-classes, attributes, attribute-description pairs, LLM prompts, raw LLM answers, and parsed LLM answers.
  • [01/15/2024] Our work is accepted to ICLR 2024 🌼! Code is coming soon. See you in Vienna this May!

💾 Installation

Requirements:

  • Linux or macOS with Python ≥ 3.9
  • PyTorch ≥ 2.1.0
  • OpenAI API key (optional, if you want to discover semantic concepts using LLMs)
  1. Clone this repository and move to the project working directory:
git clone https://github.com/OatmealLiu/FineR.git
cd FineR
  2. Install the working environment step by step:
conda create -n finer python=3.9.16 -y  # create finer conda environment
conda activate finer  # activate the environment and install dependencies

pip install --upgrade pip  # enable PEP 660 support
pip install -e .  # install dependencies
pip install git+https://github.com/openai/CLIP.git  # install CLIP
  3. If you want to discover the fine-grained semantic concepts yourself via LLMs, set your OpenAI API key:
export OPENAI_API_KEY="YOUR_OPENAI_API_KEY"

Note: you can also add the above line to your ~/.bashrc or ~/.zshrc file so the key persists across shell sessions.
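
Before launching discovery, a quick sanity check that the key is visible to Python can save a failed run. This is a generic snippet of ours, not part of the FineR codebase:

# Generic sanity check (ours, not part of FineR): confirm the key is exported.
import os

key = os.environ.get("OPENAI_API_KEY")
if not key:
    raise RuntimeError("OPENAI_API_KEY is not set; export it before running discovery.")
print(f"Found OpenAI API key ending in ...{key[-4:]}")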

📦 Datasets Preparation

FineR is training-free, so we only need to prepare the discovery and evaluation datasets. We provide the splits used in our paper on Google Drive.

  1. Download: we can download the 5 fine-grained datasets and the Pokemon dataset by clicking:

‍🦤 Bird-200 · 🚙 Car-196 · 🐕 Dog-120 · 🌼 Flower-102 · 🐈 Pet-37 · 👾 Pokemon-10

or via gdown:

# Go to your datasets storage directory
cd YOUR_DATASETS_DOWNLOAD_FOLDER

# Bird-200
gdown 'https://drive.google.com/uc?id=1iKctvfTNjYD1AEEqTIRGEqESXTVAC2Zg'
# Car-196
gdown 'https://drive.google.com/uc?id=1pKssGj5qej5HEHXsWRsvw-oOpWiedS76'
# Dog-120
gdown 'https://drive.google.com/uc?id=1HzVOIlNu5Tat0_fCWBGw8trx1KbqWKlH'
# Flower-102
gdown 'https://drive.google.com/uc?id=1iRvXYM8WhkM7M1-GLpTsJ8YWZUGDaAIE'
# Pet-37
gdown 'https://drive.google.com/uc?id=1kBuSXnzvh32IaYX_CowOmbzr1kQYST1T'
# Pokemon-10
gdown 'https://drive.google.com/uc?id=1F_UzN5TE-RfNnLKxykUl_EuOHKiey5L-'
  2. Organize: we can extract the downloaded datasets wherever we want and then organize them via softlinks (ln -s):
# Go to FineR working directory and replace YOUR_DATASETS_DOWNLOAD_FOLDER with your datasets storage directory
cd FineR/datasets
sh link_local_sets.sh YOUR_DATASETS_DOWNLOAD_FOLDER

after which the directory will look like the following and we are ready to go:

FineR
    └── datasets
          ├── car_196
          ├── CUB_200_2011
          ├── dogs_120
          ├── flowers_102
          ├── pet_37
          └── pokemon
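
If the links were created by hand instead of through the script, a quick check such as the following confirms that every dataset resolves. The helper is ours, not part of the repository:

# Our own helper: verify the dataset layout shown above (run from FineR/).
from pathlib import Path

expected = ["car_196", "CUB_200_2011", "dogs_120", "flowers_102", "pet_37", "pokemon"]
root = Path("datasets")
for name in expected:
    # Path.exists() follows softlinks, so broken links are caught as missing.
    if not (root / name).exists():
        raise FileNotFoundError(f"datasets/{name} is missing or its link target is gone")
print("All dataset links resolve.")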

📋 Evaluation

We provide all the intermediate results, including super-categories, useful attributes, attribute-description pairs, LLM prompts, raw LLM replies, and parsed LLM replies from our discovery→grouping pipeline, under experiments/, so evaluation can be run directly.

For the experiments using 3 images per class for discovery (paper Tab. 1), we can run them all via sh batch_launcher_eval.sh or one by one:

# Bird-200
sh scripts_eval/b_pipe.sh
# Car-196
sh scripts_eval/c_pipe.sh
# Dog-120
sh scripts_eval/d_pipe.sh
# Flower-102
sh scripts_eval/f_pipe.sh
# Pet-37
sh scripts_eval/p_pipe.sh
# Pokemon
sh scripts_eval/poke_pipe.sh

For the experiments using a random (long-tailed) number of images per class for discovery (paper Tab. 2), we can run them all via sh batch_launcher_eval_random.sh or one by one:

# Bird-200
sh scripts_eval_random/b_pipe.sh
# Car-196
sh scripts_eval_random/c_pipe.sh
# Dog-120
sh scripts_eval_random/d_pipe.sh
# Flower-102
sh scripts_eval_random/f_pipe.sh
# Pet-37
sh scripts_eval_random/p_pipe.sh

In addition, we also provide intermediate results for 1 to 10 images per category for discovery in the experiments folder, which we used for sensitivity analysis. If you want to run experiments with them, simply set the --num_per_category argument in the scripts to a value from 1 to 10.
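
For instance, a small sweep can rewrite the flag in place and rerun a script. This convenience sketch is ours and assumes the chosen script contains a literal --num_per_category flag:

# Our own sweep sketch: vary the number of discovery images per class from 1 to 10.
import re
import subprocess
from pathlib import Path

script = Path("scripts_eval/b_pipe.sh")  # Bird-200 as an example
original = script.read_text()
try:
    for k in range(1, 11):
        patched = re.sub(r"--num_per_category\s+\d+", f"--num_per_category {k}", original)
        script.write_text(patched)
        subprocess.run(["sh", str(script)], check=True)
finally:
    script.write_text(original)  # restore the default setting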

⛓️ Full Pipeline

To run the full pipeline and discover semantic concepts from a few image observations, using 3 images per class for discovery, we can run them all via sh batch_launcher_fullpipe.sh or one by one:

# Bird-200
sh scripts_full_pipeline/b_pipe.sh
# Car-196
sh scripts_full_pipeline/c_pipe.sh
# Dog-120
sh scripts_full_pipeline/d_pipe.sh
# Flower-102
sh scripts_full_pipeline/f_pipe.sh
# Pet-37
sh scripts_full_pipeline/p_pipe.sh
# Pokemon
sh scripts_full_pipeline/poke_pipe.sh

To run the full pipeline with a random (long-tailed) number of images per class for discovery, we can run them all via sh batch_launcher_fullpipe_random.sh or one by one:

# Bird-200
sh scripts_full_pipeline_random/b_pipe.sh
# Car-196
sh scripts_full_pipeline_random/c_pipe.sh
# Dog-120
sh scripts_full_pipeline_random/d_pipe.sh
# Flower-102
sh scripts_full_pipeline_random/f_pipe.sh
# Pet-37
sh scripts_full_pipeline_random/p_pipe.sh
# Pokemon
sh scripts_full_pipeline_random/poke_pipe.sh

Again, to run experiments with a different number of image observations, simply set the --num_per_category argument in the scripts to a value from 1 to 10.

Besides, to identify the super-category of each dataset and acquire useful attributes for VQA VLMs to describe the images, we can run the following for multiple rounds to obtain the attributes from 3 image observations (a small looping sketch follows the command blocks below):

# To do them all together
sh batch_launcher_IdentifyAndHowto.sh
# Bird-200
sh scripts_IdentifyAndHowto/b_pipe.sh
# Car-196
sh scripts_IdentifyAndHowto/c_pipe.sh
# Dog-120
sh scripts_IdentifyAndHowto/d_pipe.sh
# Flower-102
sh scripts_IdentifyAndHowto/f_pipe.sh
# Pet-37
sh scripts_IdentifyAndHowto/p_pipe.sh
# Pokemon
sh scripts_IdentifyAndHowto/poke_pipe.sh

or with a random number of observations:

# To do them all together
sh batch_launcher_IdentifyAndHowto_random.sh
# Bird-200
sh scripts_IdentifyAndHowto_random/b_pipe.sh
# Car-196
sh scripts_IdentifyAndHowto_random/c_pipe.sh
# Dog-120
sh scripts_IdentifyAndHowto_random/d_pipe.sh
# Flower-102
sh scripts_IdentifyAndHowto_random/f_pipe.sh
# Pet-37
sh scripts_IdentifyAndHowto_random/p_pipe.sh
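
The rounds themselves can be scripted with a plain loop. The round count below is our own choice; the script name comes from this README:

# Our own loop sketch: repeat the identify-and-howto step for several rounds.
import subprocess

rounds = 3  # assumption: pick however many rounds you need
for r in range(rounds):
    print(f"Identify-and-howto round {r + 1}/{rounds}")
    subprocess.run(["sh", "scripts_IdentifyAndHowto/b_pipe.sh"], check=True)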

🗻 Citation

Should you find our paper valuable to your work, we would greatly appreciate it if you could cite it:

@inproceedings{
    liu2024democratizing,
    title={Democratizing Fine-grained Visual Recognition with Large Language Models},
    author={Mingxuan Liu and Subhankar Roy and Wenjing Li and Zhun Zhong and Nicu Sebe and Elisa Ricci},
    booktitle={The Twelfth International Conference on Learning Representations},
    year={2024},
    url={https://openreview.net/forum?id=c7DND1iIgb}
}
