

Learning Quality-aware Dynamic Memory for Video Object Segmentation

ECCV 2022 | Paper

Abstract

Previous memory-based methods mainly focus on better matching between the current frame and the memory frames, without explicitly paying attention to the quality of the memory. As a result, frames with poor segmentation masks are prone to be memorized, which leads to error accumulation and further degrades segmentation performance. In addition, the linear growth of the memory bank with the number of frames limits the ability of these models to handle long videos. To this end, we propose a Quality-aware Dynamic Memory Network (QDMN) that evaluates the segmentation quality of each frame, allowing the memory bank to selectively store accurately segmented frames and thus prevent error accumulation. We then combine segmentation quality with temporal consistency to dynamically update the memory bank, enabling the model to handle videos of arbitrary length.
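The quality-aware update rule described above can be sketched roughly as follows. This is a minimal illustration only, not the paper's implementation; the class name, threshold, and capacity values are assumptions for the sketch:

```python
# Minimal sketch of a quality-aware memory bank: a frame is memorized only
# if its predicted quality score clears a threshold, and once the bank is
# full, the oldest non-reference frame is evicted so memory stays bounded.
# Threshold/capacity values are illustrative, not the paper's settings.

class QualityAwareMemory:
    def __init__(self, capacity=5, quality_threshold=0.8):
        self.capacity = capacity
        self.quality_threshold = quality_threshold
        self.frames = []  # list of (frame_id, features) tuples

    def maybe_store(self, frame_id, features, quality_score):
        """Store a frame only if its predicted mask quality is high enough."""
        if quality_score < self.quality_threshold:
            return False  # poorly segmented frame: do not memorize it
        if len(self.frames) >= self.capacity:
            # Always keep the first (reference) frame; drop the oldest of
            # the rest, favoring frames temporally close to the current one.
            self.frames.pop(1)
        self.frames.append((frame_id, features))
        return True

memory = QualityAwareMemory()
memory.maybe_store(0, "feat0", quality_score=0.95)  # reference frame, stored
memory.maybe_store(1, "feat1", quality_score=0.40)  # rejected: low quality
```

Because eviction rather than unbounded accumulation is used, the memory cost no longer grows linearly with video length, which is what makes arbitrarily long videos tractable.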

Framework

Visualization Results

Long Video Comparison

(a) shows the results of retaining only the most recent memory frames, while (b) shows the results of applying our updating strategy.

Results (S012)

| Dataset | Split | J&F | J | F |
| --- | --- | --- | --- | --- |
| DAVIS 2016 | val | 92.0 | 90.7 | 93.2 |
| DAVIS 2017 | val | 85.6 | 82.5 | 88.6 |
| DAVIS 2017 | test-dev | 81.9 | 78.1 | 85.4 |

| Dataset | Split | Overall Score | J-Seen | F-Seen | J-Unseen | F-Unseen |
| --- | --- | --- | --- | --- | --- | --- |
| YouTubeVOS 18 | validation | 83.8 | 82.7 | 87.5 | 78.4 | 86.4 |

Pretrained Model

Please download the pretrained s012 model here.

Requirements

The following packages are used in this project.

To install PyTorch and torchvision, please refer to the official guidelines.

The other packages can be installed by running pip install -r requirements.txt.

Data Preparation

Please refer to MiVOS to prepare the datasets and put all datasets in /data.

Code Structure

├── data/: train and test datasets
│   ├── static
│   ├── DAVIS
│   ├── YouTube
│   ├── BL30K
├── datasets/: transforms and dataloaders for the train and test datasets
├── model/: network code and the training engine (model.py)
├── saves/: checkpoints produced by training
├── scripts/: helper functions for processing the datasets
├── util/: configuration (hyper_para.py) and utilities
├── train.py
├── inference_core.py: test engine for DAVIS
├── inference_core_yv.py: test engine for YouTubeVOS
├── eval_*.py
├── requirements.txt

If the prediction score is always 0 during the pre-training stage, change the ReLU activation of the FC layer in the QAM to a sigmoid, which resolves the problem. The corresponding code is on lines 174 and 175 of model/modules.py.
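To illustrate why this fix helps: a negative logit from the FC layer is clipped to exactly 0 by ReLU, so the predicted quality score collapses to 0, whereas a sigmoid maps any logit into the open interval (0, 1). A minimal plain-Python sketch (not the repository's code):

```python
import math

def relu(x):
    # ReLU clips all negative logits to exactly 0.
    return max(0.0, x)

def sigmoid(x):
    # Sigmoid maps any real logit into (0, 1), so the score never collapses.
    return 1.0 / (1.0 + math.exp(-x))

logit = -2.3  # a plausible negative FC output early in pre-training
print(relu(logit))     # 0.0 -> degenerate quality score
print(sigmoid(logit))  # small but non-zero score in (0, 1)
```

Early in pre-training the FC output can be persistently negative, so with ReLU the quality score (and its gradient) is stuck at 0; sigmoid keeps the score strictly positive and differentiable.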

Training

For pretraining:

To train on the static image datasets, use the following command:

CUDA_VISIBLE_DEVICES=[GPU_ids] OMP_NUM_THREADS=4 python -m torch.distributed.launch --master_port [cccc] --nproc_per_node=GPU_num train.py --id [save_name] --stage 0

For example, if we use 2 GPUs for training and 's0-QDMN' as the checkpoint name, the command is:

CUDA_VISIBLE_DEVICES=0,1 OMP_NUM_THREADS=4 python -m torch.distributed.launch --master_port 12345 --nproc_per_node=2 train.py --id s0-QDMN --stage 0

For main training:

To train on DAVIS and YouTube, use this command:

CUDA_VISIBLE_DEVICES=[GPU_ids] OMP_NUM_THREADS=4 python -m torch.distributed.launch --master_port [cccc] --nproc_per_node=GPU_num train.py --id [save_name] --stage 2 --load_network path_to_pretrained_ckpt

Similarly, with 2 GPUs the command is:

CUDA_VISIBLE_DEVICES=0,1 OMP_NUM_THREADS=4 python -m torch.distributed.launch --master_port 12345 --nproc_per_node=2 train.py --id s03-QDMN --stage 2 --load_network saves/s0-QDMN/**.pth

Resume training

To resume interrupted training, run the command with --load_model pointing at the *_checkpoint.pth file, for example:

CUDA_VISIBLE_DEVICES=0,1 OMP_NUM_THREADS=4 python -m torch.distributed.launch --master_port 12345 --nproc_per_node=2 train.py --id s0-QDMN --stage 0 --load_model saves/s0-QDMN/s0-QDMN_checkpoint.pth

Inference

Run the corresponding script to perform inference on each dataset.

  • eval_davis_2016.py is used for the DAVIS 2016 val set.
  • eval_davis.py is used for the DAVIS 2017 val and test-dev sets (controlled by --split).
  • eval_youtube.py is used for the YouTubeVOS 2018/19 val and test sets.

Evaluation

For the evaluation metrics on the DAVIS 2016/2017 val sets, we refer to the DAVIS_val repository. For the DAVIS 2017 test-dev set, you can obtain metric results by submitting masks to the Codalab site DAVIS_test. For the YouTubeVOS 2019 val set, please submit your results to YouTube19, and for the YouTubeVOS 2018 val set, please submit to YouTube18.

Citation

If you find this work useful for your research, please cite:

@inproceedings{liu2022learning,
  title={Learning quality-aware dynamic memory for video object segmentation},
  author={Liu, Yong and Yu, Ran and Yin, Fei and Zhao, Xinyuan and Zhao, Wei and Xia, Weihao and Yang, Yujiu},
  booktitle={ECCV},
  pages={468--486},
  year={2022}
}

Acknowledgement

Code in this repository is built upon several public repositories. Thanks to STCN, MiVOS, and Mask Scoring R-CNN for sharing their code.


qdmn's Issues

train.py launch question

Hello, in the single-GPU case, how should the parameter values in the train.py launch command you provided be changed?

Questions about pre-training

Hello, can the model be trained using only static images (stage 0) and DAVIS/YouTube (stage 2)?
How much worse is the accuracy of such a model compared to one trained with BL30K?

Adding QAM only in STM's inference stage

Hello, thank you for this excellent work!
I am reproducing the setting from the paper where QAM is used only in STM's inference stage, and I have the following questions:
1. When QAM is used only in STM's inference stage, should the memory-bank updating mechanism also be used?
2. Is it enough to check that the QAM quality score is > 0.8 for every frame, or should frames be evaluated only at intervals of at least 5 frames, as in your code (do_pass)?

youtube2018

Hello, how can the results on YouTubeVOS 2018 be evaluated? The link you provided is no longer valid, so evaluation is not possible.

Questions about pre-training data

Hello, I would like to ask: stage 0 uses static images; which dataset do these static images come from? Stage 1 uses BL30K; is that dataset processed in the same way as DAVIS?

checkpoint file

Great job! Could you provide a pretrained checkpoint file for testing? Thank you!
