
interformer's Introduction

InterFormer

This repo is the official implementation of "InterFormer: Real-time Interactive Image Segmentation"

Introduction

InterFormer follows a new pipeline that addresses the low computational efficiency of existing pipelines. It decouples the computationally expensive part, i.e. image processing, from the interaction loop: a large vision transformer (ViT) on high-performance devices preprocesses images in parallel, and a lightweight module called interactive multi-head self-attention (I-MSA) then performs the interactive segmentation. This design enables InterFormer to achieve real-time, high-quality interactive segmentation on CPU-only devices.
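
As a rough sketch of this decoupling (the module names, shapes, and click encoding below are illustrative placeholders, not the repo's actual classes), the expensive encoder runs once per image while only the lightweight decoder re-runs per click:

import torch
import torch.nn as nn

# Hypothetical stand-ins for InterFormer's two stages; the real repo's
# modules differ, this only illustrates the pipeline split.
class HeavyEncoder(nn.Module):  # plays the role of the large ViT
    def __init__(self, dim=256):
        super().__init__()
        self.proj = nn.Conv2d(3, dim, kernel_size=16, stride=16)

    def forward(self, image):
        return self.proj(image)  # feature map computed once per image

class LightDecoder(nn.Module):  # plays the role of the I-MSA module
    def __init__(self, dim=256):
        super().__init__()
        self.head = nn.Conv2d(dim + 2, 1, kernel_size=1)

    def forward(self, feats, click_maps):
        return self.head(torch.cat([feats, click_maps], dim=1))

encoder, decoder = HeavyEncoder(), LightDecoder()
image = torch.rand(1, 3, 512, 512)
with torch.no_grad():
    feats = encoder(image)  # expensive: run once, e.g. on a GPU server
    for _ in range(3):      # interaction loop: cheap enough per click on CPU
        click_maps = torch.zeros(1, 2, *feats.shape[-2:])  # pos/neg click maps
        mask = decoder(feats, click_maps).sigmoid()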

Demo

The demo GIF animations in the repository were recorded on CPU-only devices.

Usage

Install

Requirements

Ensure the following requirements are met before proceeding with the installation (a quick version check is sketched after the list):

  • Python 3.8+
  • PyTorch 1.12.0
  • mmcv-full 1.6.0
  • mmsegmentation 0.26.0
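
A minimal sanity check of these pins, run inside the target environment (it only prints the installed versions):

# Print the installed versions against the pinned requirements.
import sys
import torch, mmcv, mmseg

print("Python        :", sys.version.split()[0])  # expect 3.8+
print("PyTorch       :", torch.__version__)       # expect 1.12.0
print("mmcv-full     :", mmcv.__version__)        # expect 1.6.0
print("mmsegmentation:", mmseg.__version__)       # expect 0.26.0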

Install PyTorch

To install PyTorch, please refer to the following resource: INSTALLING PREVIOUS VERSIONS OF PYTORCH

Install mmcv-full

pip install -U openmim
mim install mmcv-full==1.6.0

Install mmsegmentation

cd mmsegmentation
pip install -e .

Install Additional Dependency

pip install -r requirements.txt

Data preparation

COCO Dataset

To download the COCO dataset, please refer to cocodataset. Download the 2017 Train images, the 2017 Val images, and the 2017 Panoptic Train/Val annotations into data.

Alternatively, you can use the following script:

cd data/coco2017
bash coco2017.sh

The data is organized as follows:

data/coco2017/
├── annotations
│   ├── panoptic_train2017 [118287 entries exceeds filelimit, not opening dir]
│   ├── panoptic_train2017.json
│   ├── panoptic_val2017 [5000 entries exceeds filelimit, not opening dir]
│   └── panoptic_val2017.json
├── coco2017.sh
├── train2017 [118287 entries exceeds filelimit, not opening dir]
└── val2017 [5000 entries exceeds filelimit, not opening dir]
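
A minimal layout check against the tree above (paths follow the listing shown; adjust the root if your data lives elsewhere):

# Verify that the expected COCO files and folders exist under data/coco2017.
from pathlib import Path

root = Path("data/coco2017")
expected = [
    "annotations/panoptic_train2017",
    "annotations/panoptic_train2017.json",
    "annotations/panoptic_val2017",
    "annotations/panoptic_val2017.json",
    "train2017",
    "val2017",
]
for rel in expected:
    print("ok     " if (root / rel).exists() else "MISSING", rel)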

LVIS Dataset

To download the LVIS images and annotations, please refer to lvisdataset.

The data is organized as follows:

data/lvis/
├── lvis_v1_train.json
├── lvis_v1_train.json.zip
├── lvis_v1_val.json
├── lvis_v1_val.json.zip
├── train2017 [118287 entries exceeds filelimit, not opening dir]
├── train2017.zip
├── val2017 [5000 entries exceeds filelimit, not opening dir]
└── val2017.zip

SBD Dataset

To download the SBD dataset, please refer to SBD.

The data is organized as follows:

data/sbd/
├── benchmark_RELEASE
│   ├── dataset
│   │   ├── cls [11355 entries exceeds filelimit, not opening dir]
│   │   ├── img [11355 entries exceeds filelimit, not opening dir]
│   │   ├── inst [11355 entries exceeds filelimit, not opening dir]
│   │   ├── train.txt
│   │   └── val.txt
└── benchmark.tgz

DAVIS & GrabCut & Berkeley Datasets

Please download the DAVIS, GrabCut, and Berkeley datasets from Reviving Iterative Training with Mask Guidance for Interactive Segmentation.

The data is organized as follows:

data/
├── berkeley
│   └── Berkeley
│       ├── gt [100 entries exceeds filelimit, not opening dir]
│       ├── img [100 entries exceeds filelimit, not opening dir]
│       └── list
│           └── val.txt
├── davis
│   └── DAVIS
│       ├── gt [345 entries exceeds filelimit, not opening dir]
│       ├── img [345 entries exceeds filelimit, not opening dir]
│       └── list
│           ├── val_ctg.txt
│           └── val.txt
└── grabcut
    └── GrabCut
        ├── gt [50 entries exceeds filelimit, not opening dir]
        ├── img [50 entries exceeds filelimit, not opening dir]
        └── list
            └── val.txt

Training

MAE-Pretrained Weight

To download and transform the MAE-pretrained weights into mmseg-style, please refer to MAE.

For example:

python tools/model_converters/beit2mmseg.py https://dl.fbaipublicfiles.com/mae/pretrain/mae_pretrain_vit_base.pth pretrain/mae_pretrain_vit_base_mmcls.pth

The required weight files are located in the pretrain directory and are organized as follows:

pretrain
├── mae_pretrain_vit_base_mmcls.pth
└── mae_pretrain_vit_large_mmcls.pth
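
To sanity-check a converted file, a quick load suffices (a minimal sketch; the key names printed depend on what the converter produced, so nothing here assumes a particular schema):

# Load the converted checkpoint on CPU and peek at its contents.
import torch

ckpt = torch.load("pretrain/mae_pretrain_vit_base_mmcls.pth", map_location="cpu")
state = ckpt.get("state_dict", ckpt) if isinstance(ckpt, dict) else ckpt
print(len(state), "entries; first keys:", list(state)[:5])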

Start Training

To start the training of InterFormer-Light, run the following script:

CUDA_VISIBLE_DEVICES=0,1,2,3 bash tools/dist_train.sh configs/interformer_light_coco_lvis_320k.py 4 --seed 42 --no-validate

To train InterFormer-Tiny, use the following script:

CUDA_VISIBLE_DEVICES=0,1,2,3 bash tools/dist_train.sh configs/interformer_tiny_coco_lvis_320k.py 4 --seed 42 --no-validate

The trained weights are stored in work_dirs/interformer_light_coco_lvis_320k or work_dirs/interformer_tiny_coco_lvis_320k.

Evaluation

The trained weights are available at InterFormer

To start the evaluation on the GrabCut, Berkeley, SBD, or DAVIS dataset, use the following script:

CUDA_VISIBLE_DEVICES=0,1,2,3 bash tools/dist_clicktest.sh ${CHECKPOINT_FILE} ${GPU_NUM} [--dataset ${DATASET_NAME}] [--size_divisor ${SIZE_DIVISOR}]

where CHECKPOINT_FILE is the path to the trained weight file, GPU_NUM is the number of GPUs used for evaluation, DATASET_NAME is the name of the dataset to evaluate on, and SIZE_DIVISOR is the divisor used to pad the input image. The script looks for the config file (a .py file) in the same folder as CHECKPOINT_FILE, as sketched below.
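
A sketch of that config lookup (it mirrors the description above, not necessarily the repo's exact logic):

# Pick the single .py config sitting next to the checkpoint file.
from pathlib import Path

def find_config(checkpoint_file):
    folder = Path(checkpoint_file).parent
    configs = sorted(folder.glob("*.py"))
    if len(configs) != 1:
        raise FileNotFoundError(
            f"expected exactly one .py config in {folder}, found {len(configs)}")
    return configs[0]

# find_config("work_dirs/interformer_tiny_coco_lvis_320k/iter_320000.pth")
# -> work_dirs/interformer_tiny_coco_lvis_320k/interformer_tiny_coco_lvis_320k.py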

For example, assume the data is organized as follows:

work_dirs/
└── interformer_tiny_coco_lvis_320k
    ├── interformer_tiny_coco_lvis_320k.py
    └── iter_320000.pth

To evaluate on SBD with InterFormer-Tiny, run:

CUDA_VISIBLE_DEVICES=0,1,2,3 bash tools/dist_clicktest.sh work_dirs/interformer_tiny_coco_lvis_320k/iter_320000.pth 4 --dataset sbd --size_divisor 32

This command will start the evaluation by specifying the trained weight file work_dirs/interformer_tiny_coco_lvis_320k/iter_320000.pth and loading the configuration file interformer_tiny_coco_lvis_320k.py in the same folder.

The results are stored in work_dirs/interformer_tiny_coco_lvis_320k/clicktest_sbd_iter_320000_xxxx.json.
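
The result files can be inspected directly. The JSON schema is not documented here, so this sketch only loads each matching file and prints its top-level structure:

# List clicktest result files and show what each contains at the top level.
import json
from pathlib import Path

results_dir = Path("work_dirs/interformer_tiny_coco_lvis_320k")
for path in sorted(results_dir.glob("clicktest_sbd_iter_320000_*.json")):
    with open(path) as f:
        results = json.load(f)
    top = list(results)[:5] if isinstance(results, dict) else f"list of {len(results)} entries"
    print(path.name, "->", top)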

Running Demo

To run the demo directly with Python, use the following command in your terminal:

python demo/main.py path/to/checkpoint --device [cpu|cuda:0]

where:

  • path/to/checkpoint specifies the path to the checkpoint file that will be loaded before running the program.
  • --device specifies the device to use: cpu or a CUDA device such as cuda:0.

Here's an example script to run the demo:

python demo/main.py work_dirs/interformer_tiny_coco_lvis_320k/iter_320000.pth --device cpu

interformer's People

Contributors

youhuang67


interformer's Issues

Reproducing results

Dear authors, @YouHuang67

Thanks for releasing the code and models. I am trying to rerun your released light model on DAVIS, but the numbers are better than those reported in the paper. For example, I got 5.54 for NoC90, whereas you reported 6.19. Also, your code crashes on case 008.jpg, so I had to remove it from the evaluation. Missing one case should not cause such a big difference, so I may need your help to figure out the issue.

  1. How did you handle the crashing case 008.jpg?
  2. Did you merge all the objects in an image for evaluation, as previous works did?
  3. Did you do any processing on the original DAVIS345 dataset? I saw you at least renamed the masks.

By the way, the results for Berkeley are the same. I look forward to hearing from you! Thanks in advance.

Training code

Sorry to bother you again. I have been tracing your code, but I still cannot find the code that implements the complete training flow. Could you give me a hint?

Training Flow

[Figure: comparison of the SimpleClick and InterFormer training flows]
In SimpleClick, each image undergoes 1 to 3 iterations, and during those iterations the previous outputs and the new coordinate features are processed through the entire model each time. In your architecture, the previous outputs and the new coordinate features are fed into 'Feature Decoding', so I believe the image features from 'Feature Encoding' are reused across the iterations for one image.
Is my understanding correct?

About different image sizes in Table 2

In Table 2 of the paper, you adopt different image sizes for different methods (512 for InterFormer). Wouldn't that make the performance (NoC) comparison unfair?

AttributeError: 'ConfigDict' object has no attribute 'test'

Hello author, thank you very much for your outstanding contribution.
Following the Evaluation section, I ran the command to evaluate on SBD with InterFormer-Tiny: CUDA_VISIBLE_DEVICES=0,1,2,3 bash tools/dist_clicktest.sh work_dirs/interformer_tiny_coco_lvis_320k/iter_320000.pth 4 --dataset sbd --size_divisor 32.
It fails with the following error:

Traceback (most recent call last):
File "/home/student1/wbj/InterFormer/tools/clicktest.py", line 316, in <module>
main()
File "/home/student1/wbj/InterFormer/tools/clicktest.py", line 219, in main
dataset = build_dataset(cfg.data.test)
File "/home/student1/anaconda3/envs/py38_wbj/lib/python3.8/site-packages/mmcv/utils/config.py", line 50, in __getattr__
raise ex
AttributeError: 'ConfigDict' object has no attribute 'test'

How should I solve it? (See the config sketch after the full log below.)

The full output of the error:
(py38_wbj) student1@user-PowerEdge-R555:~$ CUDA_VISIBLE_DEVICES=0,1 bash /home/student1/wbj/InterFormer/tools/dist_clicktest.sh "/home/student1/wbj/InterFormer/work_dirs/interformer_tiny_coco_lvis_320k/iter_320000.pth" 2 --dataset berkeley --size_divisor 32
/home/student1/anaconda3/envs/py38_wbj/lib/python3.8/site-packages/torch/distributed/launch.py:178: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use_env is set by default in torchrun.
If your script expects --local_rank argument to be set, please
change it to read from os.environ['LOCAL_RANK'] instead. See
https://pytorch.org/docs/stable/distributed.html#launch-utility for
further instructions

warnings.warn(
WARNING:torch.distributed.run:


Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.


2023-07-05 01:55:19,481 - mmseg - INFO - Multi-processing start method is None
2023-07-05 01:55:19,482 - mmseg - INFO - OpenCV num_threads is 32
2023-07-05 01:55:19,482 - mmseg - INFO - OMP num threads is 1
2023-07-05 01:55:20,168 - mmseg - INFO - Multi-processing start method is None
2023-07-05 01:55:20,169 - mmseg - INFO - OpenCV num_threads is 32
2023-07-05 01:55:20,169 - mmseg - INFO - OMP num threads is 1
Traceback (most recent call last):
File "/home/student1/wbj/InterFormer/tools/clicktest.py", line 316, in <module>
main()
File "/home/student1/wbj/InterFormer/tools/clicktest.py", line 219, in main
dataset = build_dataset(cfg.data.test)
File "/home/student1/anaconda3/envs/py38_wbj/lib/python3.8/site-packages/mmcv/utils/config.py", line 50, in __getattr__
raise ex
AttributeError: 'ConfigDict' object has no attribute 'test'
Traceback (most recent call last):
File "/home/student1/wbj/InterFormer/tools/clicktest.py", line 316, in <module>
main()
File "/home/student1/wbj/InterFormer/tools/clicktest.py", line 219, in main
dataset = build_dataset(cfg.data.test)
File "/home/student1/anaconda3/envs/py38_wbj/lib/python3.8/site-packages/mmcv/utils/config.py", line 50, in __getattr__
raise ex
AttributeError: 'ConfigDict' object has no attribute 'test'
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 11982) of binary: /home/student1/anaconda3/envs/py38_wbj/bin/python
Traceback (most recent call last):
File "/home/student1/anaconda3/envs/py38_wbj/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/student1/anaconda3/envs/py38_wbj/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/student1/anaconda3/envs/py38_wbj/lib/python3.8/site-packages/torch/distributed/launch.py", line 193, in
main()
File "/home/student1/anaconda3/envs/py38_wbj/lib/python3.8/site-packages/torch/distributed/launch.py", line 189, in main
launch(args)
File "/home/student1/anaconda3/envs/py38_wbj/lib/python3.8/site-packages/torch/distributed/launch.py", line 174, in launch
run(args)
File "/home/student1/anaconda3/envs/py38_wbj/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run
elastic_launch(
File "/home/student1/anaconda3/envs/py38_wbj/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in call
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/student1/anaconda3/envs/py38_wbj/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

/home/student1/wbj/InterFormer/tools/clicktest.py FAILED

Failures:
[1]:
time : 2023-07-05_01:55:22
host : user-PowerEdge-R555
rank : 1 (local_rank: 1)
exitcode : 1 (pid: 15984)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Root Cause (first observed failure):
[0]:
time : 2023-07-05_01:55:22
host : user-PowerEdge-R555
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 15982)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
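
For context on the error above: build_dataset(cfg.data.test) requires the loaded config to define a data dict with a test entry. An mmseg-0.x-style fragment looks roughly like the sketch below; the dataset type, paths, and pipeline are placeholders, not InterFormer's actual test config.

# Illustrative mmseg-0.x-style config fragment (placeholders only).
data = dict(
    samples_per_gpu=1,
    workers_per_gpu=2,
    test=dict(
        type='CustomDataset',                      # placeholder dataset class
        data_root='data/sbd',                      # placeholder path
        img_dir='benchmark_RELEASE/dataset/img',   # placeholder path
        ann_dir='benchmark_RELEASE/dataset/inst',  # placeholder path
        pipeline=[]))                              # test pipeline omitted here

# The AttributeError above is raised when the config picked up next to the
# checkpoint does not define this 'test' entry.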
