CARIS: Context-Aware Referring Image Segmentation

This repository is for the ACM MM 2023 paper CARIS: Context-Aware Referring Image Segmentation.

Requirements

The code is verified with Python 3.8 and PyTorch 1.11. Other dependencies are listed in requirements.txt.
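To compare a local environment against the verified versions, a quick check along these lines may help. It is only a sketch: the names `version_tuple` and `environment_report` are hypothetical and not part of this repository, and a mismatch does not necessarily mean the code will fail.

```python
import re
import sys

# Versions the README states the code was verified with.
VERIFIED_PYTHON = (3, 8)
VERIFIED_TORCH = (1, 11)


def version_tuple(version):
    """Return (major, minor) from a version string such as '1.11.0+cu113'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def environment_report():
    """Compare the running interpreter and PyTorch with the verified versions."""
    report = {"python_matches": sys.version_info[:2] == VERIFIED_PYTHON}
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so there is nothing to compare.
        report["torch_matches"] = None
    else:
        report["torch_matches"] = version_tuple(torch.__version__) == VERIFIED_TORCH
    return report
```

Calling `environment_report()` returns a small dictionary with a `python_matches` and a `torch_matches` entry.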

Datasets

Please follow the instructions in .refer to download the annotations for RefCOCO/RefCOCO+/RefCOCOg. We provide the combined annotations as refcocom here.

Download the images from COCO. Use the first download link, 2014 Train images [83K/13GB], and extract the downloaded train_2014.zip file.

Data paths should be as follows:

.{YOUR_REFER_PATH}
├── refcoco
├── refcoco+
├── refcocog
├── refcocom

.{YOUR_COCO_PATH}
├── train2014

Pretrained Models

Download pretrained Swin-B and BERT-B. Check models to get pretrained CARIS models.

Usage

Train

By default, we use fp16 training for efficiency. To train a model on refcoco with 2 GPUs, modify YOUR_COCO_PATH, YOUR_REFER_PATH, YOUR_MODEL_PATH, and YOUR_CODE_PATH in scripts/train_refcoco.sh, then run:

sh scripts/train_refcoco.sh

You can change DATASET to refcoco+/refcocog/refcocom to train on a different dataset. Note that RefCOCOg has two splits (umd and google); add --splitBy umd or --splitBy google to specify which one to use.
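The dataset and split rule above can be written as a small validation helper, for example when generating commands from a wrapper script. This is a sketch under assumptions: `split_by_args` is a hypothetical function, not part of this repository. It returns no extra arguments for the other datasets, leaving the script defaults untouched, which is a choice made here rather than something the README specifies.

```python
# Dataset names and RefCOCOg splits as named in this README.
DATASETS = ("refcoco", "refcoco+", "refcocog", "refcocom")
REFCOCOG_SPLITS = ("umd", "google")


def split_by_args(dataset, split_by=None):
    """Return the extra command-line arguments for a dataset choice."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    if dataset == "refcocog":
        # RefCOCOg has two splits, so the choice must be explicit.
        if split_by not in REFCOCOG_SPLITS:
            raise ValueError("RefCOCOg needs --splitBy umd or --splitBy google")
        return ["--splitBy", split_by]
    # Assumption: other datasets rely on the script's defaults.
    return []
```

For example, `split_by_args("refcocog", "umd")` yields the two tokens to append to the command, while `split_by_args("refcocog")` raises an error.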

Test

Single-GPU evaluation is supported. To evaluate a model on refcoco, modify the settings in scripts/test_refcoco.sh and run:

sh scripts/test_refcoco.sh

You can change DATASET and SPLIT to evaluate on different splits of each dataset. Note that RefCOCOg has two splits (umd and google); add --splitBy umd or --splitBy google to specify which one to use. Models trained on refcocom can be evaluated directly on the splits of refcoco/refcoco+/refcocog(umd).

References

This repo is mainly built on LAVT and mmdetection. Thanks for their great work!

Citation

If you find our code useful, please consider citing:

@inproceedings{liu2023caris,
  title={CARIS: Context-Aware Referring Image Segmentation},
  author={Liu, Sun-Ao and Zhang, Yiheng and Qiu, Zhaofan and Xie, Hongtao and Zhang, Yongdong and Yao, Ting},
  booktitle={Proceedings of the 31st ACM International Conference on Multimedia},
  year={2023}
}

caris's Issues

Attention map visualization

Thank you for your interesting work :)

I am curious about the specific code for visualizing the attention map of Figure 5. What does "visual features from the second level" refer to? And could you provide information on how to normalize the attention weight?

If it's possible to obtain the code for attention map visualization, it would be greatly helpful.

How long does it take to train CARIS?

Thank you in advance!

PyTorch version

Thank you for sharing excellent work!

I am wondering which PyTorch version the code uses.

Thank you!

RefCOCOm dataset

Hello!
This is interesting work. Could you release the RefCOCOm dataset?

Thank you.
Sincerely.
