🔥 GenSAM (AAAI 2024)

Code release of the paper:

Relax Image-Specific Prompt Requirement in SAM: A Single Generic Prompt for Segmenting Camouflaged Objects

Jian Hu*, Jiayi Lin*, Weitong Cai, Shaogang Gong

🚀 Updates

  • [2023.12.25] Demo of GenSAM is released.
  • [2023.12.12] Model running instructions with LLaVA1 and LLaVA1.5 are released.
  • [2023.12.10] LLaVA1 and LLaVA1.5 versions of GenSAM on the CHAMELEON dataset are released.

💡 Highlight

The Segment Anything Model (SAM) shows remarkable segmentation ability with sparse prompts like points. However, a manual prompt is not always feasible, as it may not be accessible in real-world applications. In this work, we aim to eliminate the need for manual prompts. The key idea is to employ Cross-modal Chains of Thought Prompting (CCTP) to reason visual prompts using the semantic information given by a generic text prompt. We introduce a per-instance test-time adaptation mechanism called Generalizable SAM (GenSAM) to automatically generate and optimize visual prompts from the generic task prompt.

A brief introduction to how GenSAM works: CCTP maps a single generic text prompt onto image-specific consensus foreground and background heatmaps using vision-language models, acquiring reliable visual prompts. Moreover, to adapt the visual prompts at test time, we further propose Progressive Mask Generation (PMG), which iteratively reweights the input image, guiding the model to focus on the targets in a coarse-to-fine manner. Crucially, all network parameters are fixed, avoiding the need for additional training. Experiments on three benchmarks demonstrate that GenSAM outperforms point-supervision approaches and achieves results comparable to scribble-supervision ones, relying solely on general task descriptions as prompts.
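
The following is a minimal, illustrative sketch of this per-instance loop, not the repository's implementation: consensus_heatmap is a dummy placeholder standing in for the CCTP step (which GenSAM derives from vision-language models such as LLaVA and CLIP), the reweighting rule is simplified, and only the official segment_anything API is assumed.

import numpy as np
from segment_anything import sam_model_registry, SamPredictor

# Placeholder for CCTP: in GenSAM the foreground/background heatmaps are
# reasoned by vision-language models from the generic text prompt; here a
# dummy Gaussian keeps the sketch self-contained.
def consensus_heatmap(image, generic_prompt):
    h, w = image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = h // 2, w // 2
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (0.1 * h * w))

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

image = (np.random.rand(512, 512, 3) * 255).astype(np.uint8)  # stand-in image
generic_prompt = "the camouflaged animal"

for step in range(3):  # coarse-to-fine iterations in the spirit of PMG
    heatmap = consensus_heatmap(image, generic_prompt)
    # The heatmap peak becomes a positive point prompt for SAM.
    y, x = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    predictor.set_image(image)
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[x, y]]),
        point_labels=np.array([1]),
        multimask_output=True,
    )
    mask = masks[np.argmax(scores)]
    # Reweight the image with the current mask so the next iteration
    # focuses on the predicted target; all network weights stay frozen.
    weight = 0.5 + 0.5 * mask[..., None].astype(np.float32)
    image = (image.astype(np.float32) * weight).astype(np.uint8)

In the actual code, the prompts, models, and iteration schedule are controlled by the YAML files under config/ that are passed to main.py below.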

Quick Start

Download Dataset

  1. Download the datasets from the following links:

Camouflaged Object Detection Dataset

  2. Put it in ./data/.

Running GenSAM on CHAMELEON Dataset with LLaVA1/LLaVA1.5

  1. When playing with LLaVA, this code was implemented with Python 3.8 and PyTorch 2.1.0. We recommend creating a virtualenv environment and installing all the dependencies as follows:
# create virtual environment
virtualenv GenSAM_LLaVA
source GenSAM_LLaVA/bin/activate
# prepare LLaVA
git clone https://github.com/haotian-liu/LLaVA.git
cd LLaVA
pip install -e .
cd ..
# prepare SAM
pip install git+https://github.com/facebookresearch/segment-anything.git
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
pip install opencv-python imageio ftfy urllib3==1.26.6
  2. Our GenSAM is a training-free test-time adaptation approach, so you can play with it by running:
python main.py --config config/CHAMELEON_LLaVA1.5.yaml   ###LLaVA1.5
python main.py --config config/CHAMELEON_LLaVA.yaml   ###LLaVA

If you want to visualize the output images during test-time adaptation, you can run:

python main.py --config config/CHAMELEON_LLaVA1.5.yaml --visualization    ###LLaVA1.5
python main.py --config config/CHAMELEON_LLaVA.yaml --visualization    ###LLaVA
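
As an optional sanity check (not part of the repository), the snippet below simply confirms that the SAM checkpoint downloaded above can be loaded inside the GenSAM_LLaVA environment before launching main.py:

import torch
from segment_anything import sam_model_registry

# Load the ViT-H checkpoint fetched during the setup step above.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
sam.to("cuda" if torch.cuda.is_available() else "cpu")
print("SAM ViT-H loaded with", sum(p.numel() for p in sam.parameters()), "parameters")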

Demo

We further provide a Jupyter notebook demo for visualization.

  1. Complete the following steps in the shell before opening the Jupyter notebook.
    The virtualenv environment named GenSAM_LLaVA needs to be created first, following Quick Start.
pip install notebook 
pip install ipykernel ipywidgets
python -m ipykernel install --user --name GenSAM_LLaVA
  2. Open demo_v1.ipynb and select the 'GenSAM_LLaVA' kernel in the running notebook.

TO-DO LIST

  • Update datasets and implementation scripts
  • Keep incorporating more capabilities
  • Demo and Codes

Citation

If you find our work useful in your research, please consider citing:

@misc{hu2023relax,
      title={Relax Image-Specific Prompt Requirement in SAM: A Single Generic Prompt for Segmenting Camouflaged Objects}, 
      author={Jian Hu and Jiayi Lin and Weitong Cai and Shaogang Gong},
      year={2023},
      eprint={2312.07374},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

💘 Acknowledgements
