caris's Introduction

CARIS: Context-Aware Referring Image Segmentation

This repository is for the ACM MM 2023 paper CARIS: Context-Aware Referring Image Segmentation.

Requirements

The code is verified with Python 3.8 and PyTorch 1.11. Other dependencies are listed in requirements.txt.
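A quick way to compare a local environment against the verified versions is sketched below. The helper names `version_tuple` and `matches_verified` are hypothetical and are not part of this repository; the sketch only compares major and minor version numbers.

```python
def version_tuple(version: str) -> tuple:
    """Return (major, minor) from a version string such as '1.11.0+cu113'."""
    core = version.split("+", 1)[0]          # drop a local build suffix, if any
    major, minor = core.split(".")[:2]
    return int(major), int(minor)


def matches_verified(python_version: str, torch_version: str) -> bool:
    """True when both versions match the ones the code was verified with."""
    return (version_tuple(python_version) == (3, 8)
            and version_tuple(torch_version) == (1, 11))
```

With PyTorch installed, it could be called as `matches_verified(platform.python_version(), torch.__version__)`. Other versions may still work; the README only states which ones were verified.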

Datasets

Please follow the instructions in .refer to download the annotations of RefCOCO/RefCOCO+/RefCOCOg. We provide the combined annotations as refcocom here.

Download images from COCO. Please use the first download link, 2014 Train images [83K/13GB], and extract the downloaded train2014.zip file.
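The extraction step can also be scripted. The following is a minimal sketch using only the standard library; `extract_coco_images` is a hypothetical helper, not part of this repository, and it assumes the archive unpacks into a top-level train2014 folder.

```python
import zipfile
from pathlib import Path


def extract_coco_images(archive, coco_root):
    """Extract the COCO image archive into coco_root and return the image folder."""
    coco_root = Path(coco_root)
    coco_root.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(coco_root)
    image_dir = coco_root / "train2014"      # assumed top-level folder in the archive
    if not image_dir.is_dir():
        raise FileNotFoundError(f"expected {image_dir} after extraction")
    return image_dir
```

Here `coco_root` plays the role of YOUR_COCO_PATH.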

Data paths should be as follows:

.{YOUR_REFER_PATH}
├── refcoco
├── refcoco+
├── refcocog
├── refcocom

.{YOUR_COCO_PATH}
├── train2014
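
Before training, it can help to confirm that both directories match the layout above. This is a small sketch with a hypothetical helper, `missing_data_dirs`, which is not part of this repository; it only checks that the listed folders exist, not their contents.

```python
from pathlib import Path

# Folder names taken from the layout above.
REFER_SUBDIRS = ("refcoco", "refcoco+", "refcocog", "refcocom")
COCO_SUBDIRS = ("train2014",)


def missing_data_dirs(refer_root, coco_root):
    """Return the expected data folders that do not exist yet."""
    expected = [Path(refer_root) / name for name in REFER_SUBDIRS]
    expected += [Path(coco_root) / name for name in COCO_SUBDIRS]
    return [str(path) for path in expected if not path.is_dir()]
```

An empty list means every expected folder was found.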

Pretrained Models

Download pretrained Swin-B and BERT-B. Check models to get pretrained CARIS models.

Usage

Train

By default, we use fp16 training for efficiency. To train a model on refcoco with 2 GPUs, modify YOUR_COCO_PATH, YOUR_REFER_PATH, YOUR_MODEL_PATH, and YOUR_CODE_PATH in scripts/train_refcoco.sh, then run:

sh scripts/train_refcoco.sh

You can change DATASET to refcoco+/refcocog/refcocom for training on different datasets. Note that for RefCOCOg, there are two splits (umd and google). You should add --splitBy umd or --splitBy google to specify the split.
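
The split rule for RefCOCOg can be captured in a small helper when launching runs programmatically. This is only a sketch: `split_by_args` is a hypothetical function, not part of this repository, and it assumes that only refcocog needs the --splitBy flag, as described above.

```python
def split_by_args(dataset, split_by=None):
    """Return the extra command-line arguments needed for the given dataset."""
    if dataset != "refcocog":
        return []                            # other datasets need no split flag
    if split_by not in ("umd", "google"):
        raise ValueError("refcocog requires split_by to be 'umd' or 'google'")
    return ["--splitBy", split_by]
```

For example, `split_by_args("refcocog", "umd")` yields the two extra arguments to append to the training command.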

Test

Single-GPU evaluation is supported. To evaluate a model on refcoco, modify the settings in scripts/test_refcoco.sh and run:

sh scripts/test_refcoco.sh

You can change DATASET and SPLIT to evaluate on different splits of each dataset. Note that for RefCOCOg, there are two splits (umd and google). You should add --splitBy umd or --splitBy google to specify the split. Models trained on refcocom can be evaluated directly on the splits of refcoco/refcoco+/refcocog(umd).
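
To evaluate a refcocom model everywhere it applies, the settings can be enumerated as sketched below. The split names (val, testA, testB, test) follow the usual RefCOCO conventions and are an assumption here, since this README does not list them; `eval_settings` is a hypothetical helper, not part of this repository.

```python
# (dataset, splitBy) -> split names; the split names are assumed, not taken from this README.
EVAL_TARGETS = {
    ("refcoco", None): ("val", "testA", "testB"),
    ("refcoco+", None): ("val", "testA", "testB"),
    ("refcocog", "umd"): ("val", "test"),
}


def eval_settings():
    """List the DATASET/SPLIT settings for evaluating a refcocom-trained model."""
    settings = []
    for (dataset, split_by), splits in EVAL_TARGETS.items():
        for split in splits:
            setting = {"DATASET": dataset, "SPLIT": split}
            if split_by is not None:
                setting["splitBy"] = split_by   # only set for refcocog
            settings.append(setting)
    return settings
```

Each entry corresponds to one run of scripts/test_refcoco.sh with DATASET and SPLIT set accordingly.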

References

This repo is mainly built on LAVT and mmdetection. Thanks for their great work!

Citation

If you find our code useful, please consider citing:

@inproceedings{liu2023caris,
  title={CARIS: Context-Aware Referring Image Segmentation},
  author={Liu, Sun-Ao and Zhang, Yiheng and Qiu, Zhaofan and Xie, Hongtao and Zhang, Yongdong and Yao, Ting},
  booktitle={Proceedings of the 31st ACM International Conference on Multimedia},
  year={2023}
}

caris's People

Contributors

lsa1997

caris's Issues

Attention map visualization

Thank you for your interesting work :)

I am curious about the specific code for visualizing the attention map of Figure 5. What does "visual features from the second level" refer to? And could you provide information on how to normalize the attention weight?

If it's possible to obtain the code for attention map visualization, it would be greatly helpful.

Pytorch version

Thank you for sharing excellent work!

I am wondering which version the code uses.

Thank you!

RefCOCOm dataset

Hello, Sir!
This is an interesting work, can you open the ref-cocom dataset?

Thank you.
Sincerely.
