
wjf5203 / vnext


Next-generation video instance recognition framework on top of Detectron2, which supports InstMove (CVPR 2023), SeqFormer (ECCV Oral), and IDOL (ECCV Oral)

License: Apache License 2.0

Python 89.59% Shell 0.36% C++ 2.51% Cuda 7.43% Dockerfile 0.08% CMake 0.02%
instance-segmentation object-detection transformer video-instance-segmentation tracking motion

vnext's Introduction

VNext:

  • VNext is a next-generation video instance recognition framework built on top of Detectron2.
  • It currently provides advanced online and offline video instance segmentation algorithms, as well as a motion model for object-centric video segmentation tasks.
  • We will continue to update and improve it, aiming at a unified and efficient framework that nourishes the field of video instance recognition.

To date, VNext contains the official implementation of the following algorithms:

InstMove: Instance Motion for Object-centric Video Segmentation (CVPR 2023)

IDOL: In Defense of Online Models for Video Instance Segmentation (ECCV2022 Oral)

SeqFormer: Sequential Transformer for Video Instance Segmentation (ECCV2022 Oral)

NEWS!!:

  • InstMove has been accepted to CVPR 2023; the code and models can be found here!
  • IDOL is accepted to ECCV 2022 as an oral presentation!
  • SeqFormer is accepted to ECCV 2022 as an oral presentation!
  • IDOL won first place in the video instance segmentation track of the 4th Large-scale Video Object Segmentation Challenge (CVPR2022).

Getting started

  1. For installation and data preparation, please refer to INSTALL.md for more details.
  2. For InstMove training, evaluation, plugin, and model zoo, please refer to InstMove.md
  3. For IDOL training, evaluation, and model zoo, please refer to IDOL.md
  4. For SeqFormer training, evaluation and model zoo, please refer to SeqFormer.md

IDOL


In Defense of Online Models for Video Instance Segmentation

Junfeng Wu, Qihao Liu, Yi Jiang, Song Bai, Alan Yuille, Xiang Bai

Introduction

  • In recent years, video instance segmentation (VIS) has been largely advanced by offline models, while online models usually trail contemporaneous offline models by over 10 AP, which is a huge drawback.

  • By dissecting current online models and offline models, we demonstrate that the main cause of the performance gap is the error-prone association and propose IDOL, which outperforms all online and offline methods on three benchmarks.

  • IDOL won first place in the video instance segmentation track of the 4th Large-scale Video Object Segmentation Challenge (CVPR2022).

Visualization results on OVIS valid set

Quantitative results

YouTube-VIS 2019

OVIS 2021

SeqFormer


SeqFormer: Sequential Transformer for Video Instance Segmentation

Junfeng Wu, Yi Jiang, Song Bai, Wenqing Zhang, Xiang Bai

Introduction

  • SeqFormer locates an instance in each frame and aggregates temporal information to learn a powerful representation of a video-level instance, which is used to predict the mask sequences on each frame dynamically.

  • SeqFormer is a robust, accurate, neat offline model and instance tracking is achieved naturally without tracking branches or post-processing.

Visualization results on YouTube-VIS 2019 valid set

Quantitative results

YouTube-VIS 2019

YouTube-VIS 2021

Citation

@inproceedings{seqformer,
  title={SeqFormer: Sequential Transformer for Video Instance Segmentation},
  author={Wu, Junfeng and Jiang, Yi and Bai, Song and Zhang, Wenqing and Bai, Xiang},
  booktitle={ECCV},
  year={2022},
}

@inproceedings{IDOL,
  title={In Defense of Online Models for Video Instance Segmentation},
  author={Wu, Junfeng and Liu, Qihao and Jiang, Yi and Bai, Song and Yuille, Alan and Bai, Xiang},
  booktitle={ECCV},
  year={2022},
}

Acknowledgement

This repo is based on detectron2, Deformable DETR, VisTR, and IFC. Thanks for their wonderful works.

vnext's People

Contributors

ifighting, qihao067, wjf5203


vnext's Issues

model release of coco static image pretraining

Thanks for your excellent work! I am very interested in your repository for further research.

How soon will you release the model pretrained on coco for static image instance segmentation?

Best wishes.

Memory optimization for long videos

Hi wjf,

The memory consumption in the inference post-processing phase is too large. I tried to optimize the code but did not succeed.

Do you have a plan to optimize it?

Could you please provide the demo.py

Could you please provide a demo.py for us to display visualization results like Detectron2 does?
The one in Detectron2 is image-level and cannot be used directly.

Training log

Good job! Thanks for releasing the full code. Could you provide the tensorboard log file during training? I'd like to see the changing trend of the loss function.

gpu util = 0 while running inference

Hello, thanks for your great work!

There is a problem when I run your inference script on the ytvis dataset:
python3 projects/IDOL/train_net.py --config-file projects/IDOL/configs/XXX.yaml --num-gpus 8 --eval-only MODEL.WEIGHTS /path to my .pth

Everything works well and GPU memory looks normal, but GPU util is always 0.
[screenshot]

Total inference time is around 2h; I don't know if all GPUs are correctly used during inference.

Empty ytvis_2019 testing results

I tested the SeqFormer model with res = Trainer.test(cfg, model).

The output is empty, so I debugged it and found the following from seqformer.py line 238:
{'image_size': (720, 1280), 'pred_scores': [], 'pred_labels': [], 'pred_masks': []}
{'image_size': (720, 1280), 'pred_scores': [], 'pred_labels': [], 'pred_masks': []}
{'image_size': (720, 1280), 'pred_scores': [], 'pred_labels': [], 'pred_masks': []}
{'image_size': (720, 1280), 'pred_scores': [], 'pred_labels': [], 'pred_masks': []}
{'image_size': (720, 1280), 'pred_scores': [], 'pred_labels': [], 'pred_masks': []}

broken link

Kia ora,

There is a broken link here, as seen below:

[screenshot]

Would you mind reuploading the weights and fixing this link?

Training/Validating on Custom Datasets (In Classic COCO Segmentation Format)

Hi all,

Does this repo support training on a custom dataset? Or better still, has anyone successfully done so? I have a dataset in classic coco format which I am able to load into detectron2 with register_coco_instances. This dataset is one that contains a series of annotated images that can be combined to form a video. Currently, I am applying plain image instance segmentation using detectron2 PointRend but IDOL seems very promising to me.

In the case that this has not been done before, I would appreciate some pointers that would guide me toward writing some functionality on top of this repo to implement training/validating on a custom dataset.

Thanks and congrats on your paper!
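
Since the question mentions register_coco_instances, here is a minimal hedged sketch of registering a classic COCO-format dataset in detectron2; the dataset names and paths are illustrative, and a YTVIS-style mapper would still be needed for video-level (clip) training with IDOL.

from detectron2.data.datasets import register_coco_instances

# Minimal sketch: register a COCO-format dataset (illustrative names and paths).
register_coco_instances(
    "my_custom_train", {},
    "datasets/custom/annotations/train.json",   # COCO-style json
    "datasets/custom/images/train",             # image root
)
register_coco_instances(
    "my_custom_val", {},
    "datasets/custom/annotations/val.json",
    "datasets/custom/images/val",
)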

Where is the training result of IDOL?

Thank you for your hard work; it is such a nice job. I have a question about seeing the results of IDOL training. At the end of training, there is a line in the log file that says "Evaluation results for ytvis_2019_val in csv format:". Where is this csv file?

A question about oracle experiment

Hi Junfeng,

Thank you for providing the OVIS subsets in another issue.
I have an additional question about the oracle experiments. How does the oracle experiment of IDOL match instances between clips or frames? In other words, how is the GT used to complete this process?

Table 4 and Table 5 using COCO pretraining or not?

Hi there,

Thank you for sharing the repo. In Table 3, the YouTube-VIS 2019 results are reported for models both with and without COCO pretraining.

How about Table 4 and Table 5 for IDOL? I did not find the detailed settings and explanations for these two results.

Thanks

Data parameter settings

Hello author,
If I want to use ResNet-50 as the backbone to train on youtube2019, where should my dataset path be added? Also, which configuration file should I choose? Thank you very much!

Applying to image instance segmentation

Hello
How are you?
Thanks for contributing to this project.
I am going to apply your method to custom image instance segmentation rather than video instance segmentation.
The custom dataset is annotated in the COCO image instance segmentation format.
Is it possible?

Setting MAX_SIZE_TEST

The original SeqFormer repository (and most previous methods) limit the test-time resolution. However, for this version of SeqFormer and IDOL you did not set INPUT.MAX_SIZE_TEST. Was this intentional? For SeqFormer, the README.md contains the tables produced by the old repository. By not limiting the test-time resolution, the results of this version should be better, right? It should be noted that this issue prevents a fair comparison with previous methods. Or am I missing something?
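
For reference, detectron2 exposes the test-time size cap through its standard config keys; below is a minimal hedged sketch, where the values are placeholders rather than the settings actually used by the original SeqFormer repository.

from detectron2.config import get_cfg

# Minimal sketch: cap the test-time resolution via detectron2's standard keys.
# The values below are placeholders, not the original SeqFormer settings.
cfg = get_cfg()
cfg.INPUT.MIN_SIZE_TEST = 360   # shorter edge at test time (placeholder value)
cfg.INPUT.MAX_SIZE_TEST = 640   # cap on the longer edge at test time (placeholder value)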

Problems setting up dataset for IDOL

I am unable to set up the dataset for ytvis19 as required by INSTALL.md.
The page at YTVIS19 has two Google Drive folders, Labels and Image Frames. I downloaded and extracted these two folders, but the directories are not organized in the way specified in INSTALL.md. I also cannot find the file ytvis/instances_train_sub.json.
Am I downloading the wrong dataset? Could someone guide me through this?

How to reproduce visualization results in README?

Thanks for your wonderful work!

I'd like to reproduce the visualization results in your README.

I tried to add the following two lines before demo/demo.py#L29:

from detectron2.projects.idol import add_idol_config
add_idol_config(cfg)

But it returns this:

appuser@0916140fb4f2:~/VNext/demo$ python demo.py --config-file ../projects/IDOL/configs/ytvis19_swinL.yaml --video-input ../0b6db1c6fd.mp4 --output ../out --opts MODEL.WEIGHTS ../YTVIS19_SWINL_643AP.pth
[08/05 06:23:21 detectron2]: Arguments: Namespace(confidence_threshold=0.5, config_file='../projects/IDOL/configs/ytvis19_swinL.yaml', input=None, opts=['MODEL.WEIGHTS', '../YTVIS19_SWINL_643AP.pth'], output='../out', video_input='../0b6db1c6fd.mp4', webcam=False)
/home/appuser/.local/lib/python3.7/site-packages/torch/functional.py:445: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at  ../aten/src/ATen/native/TensorShape.cpp:2157.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
[08/05 06:23:28 fvcore.common.checkpoint]: [Checkpointer] Loading from ../YTVIS19_SWINL_643AP.pth ...
[ERROR:[email protected]] global /io/opencv/modules/videoio/src/cap_ffmpeg_impl.hpp (2927) open Could not find encoder for codec_id=27, error: Encoder not found
[ERROR:[email protected]] global /io/opencv/modules/videoio/src/cap_ffmpeg_impl.hpp (3002) open VIDEOIO/FFMPEG: Failed to initialize VideoWriter
[ERROR:[email protected]] global /io/opencv/modules/videoio/src/cap.cpp (595) open VIDEOIO(CV_IMAGES): raised OpenCV exception:

OpenCV(4.6.0) /io/opencv/modules/videoio/src/cap_images.cpp:253: error: (-5:Bad argument) CAP_IMAGES: can't find starting number (in the name of file): /tmp/video_format_test3zylu0ek/test_file.mkv in function 'icvExtractPattern'


  0%|                                                                                    | 0/20 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "demo.py", line 178, in <module>
    for vis_frame in tqdm.tqdm(demo.run_on_video(video), total=num_frames):
  File "/home/appuser/.local/lib/python3.7/site-packages/tqdm/std.py", line 1195, in __iter__
    for obj in iterable:
  File "~/VNext/demo/predictor.py", line 129, in run_on_video
    yield process_predictions(frame, self.predictor(frame))
  File "~/VNext/detectron2/engine/defaults.py", line 317, in __call__
    predictions = self.model([inputs])[0]
  File "/home/appuser/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "~/VNext/projects/IDOL/idol/idol.py", line 249, in forward
    video_len = len(batched_inputs[0]['file_names'])
KeyError: 'file_names'

Could you give me some hint about how to pass right batched_inputs to forward function?

SeqFormer question

In configuration.py

cfg.INPUT.SAMPLING_FRAME_NUM = 1

Does it mean that only one frame in the videos is selected as input?

The definition of the proposed model

Thanks for releasing the code. Where can we find the PyTorch model definition of IDOL? IDOL is currently presented only through a config file and the VNext library; I cannot find the PyTorch model definition code for IDOL.

Why Detectron2 and not MMDet (OpenMMLab projects, etc.)?

First of all, thank you very much for providing such a repository; I believe it will bring great convenience to future VIS research, as well as a unified framework for comparison. However, why did you adopt detectron2 rather than the more complete and easier-to-use OpenMMLab repositories? I have used both, and detectron2 is rather obscure to work with, far less friendly than mmdet; I have also been thinking about writing an algorithm library based on mmdet myself.

Evaluation Error in IDOL

Hi,
When I try to train an IDOL model over an SSH connection to a server, I get the following error in the evaluation stage:

-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "/home/aylinaydin/anaconda3/envs/project/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/home/aylinaydin/Project/VNext/detectron2/engine/launch.py", line 126, in _distributed_worker
    main_func(*args)
  File "/home/aylinaydin/Project/VNext/projects/IDOL/train_net.py", line 161, in main
    res = Trainer.test(cfg, model)
  File "/home/aylinaydin/Project/VNext/detectron2/engine/defaults.py", line 617, in test
    results_i = inference_on_dataset(model, data_loader, evaluator)
  File "/home/aylinaydin/Project/VNext/detectron2/evaluation/evaluator.py", line 158, in inference_on_dataset
    outputs = model(inputs)
  File "/home/aylinaydin/anaconda3/envs/project/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/aylinaydin/Project/VNext/projects/IDOL/idol/idol.py", line 284, in forward
    0])  # (height, width) is resized size,images. image_sizes[0] is original size
  File "/home/aylinaydin/Project/VNext/projects/IDOL/idol/idol.py", line 357, in inference
    det_masks = output_mask[indices]
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)
I tried changing the device of each tensor, but I couldn't fix it. Can you help me?
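
A hedged workaround sketch (not an official fix): the error message suggests output_mask has already been moved to the CPU while indices is still on the GPU, so aligning the devices before indexing in projects/IDOL/idol/idol.py should avoid the crash.

# Hedged workaround: move the index tensor onto the same device as the mask tensor
# before indexing (around the line that raises the RuntimeError).
det_masks = output_mask[indices.to(output_mask.device)]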

Evaluate Module

Hi Dr. Wu, thanks for your amazing work and open-source code! But when I ran inference with the model, I didn't find the evaluation function code in ytvis_eval.py. Could you please provide the evaluation function code for inference?

Why does OVIS need so much memory??

I tried to run inference on hardware with 512 GB of memory, but the preprocessing of some videos consumes all the memory and crashes.

When I change the input size from 720 to 100, it runs successfully.

Use so much memory during inference

When I try to run inference on a long-video dataset (about 200 frames per video) on hardware with 256 GB of memory,
I hit this error and crash:
DefaultCPUAllocator: can't allocate memory: you tried to allocate 522746265600 bytes. Error code 12 (Cannot allocate memory)

Is there a way to generate the result json file with less memory?
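
A hedged sketch of one way to shrink the buffers used while building the result json (this is not part of the repo): keep per-frame masks as COCO RLE strings instead of full boolean arrays, using pycocotools.

import numpy as np
import pycocotools.mask as mask_util

def masks_to_rle(masks):
    # masks: (T, H, W) array-like holding one instance's binary masks
    rles = []
    for m in masks:
        rle = mask_util.encode(np.asfortranarray(np.asarray(m, dtype=np.uint8)))
        rle["counts"] = rle["counts"].decode("utf-8")  # make it json-serializable
        rles.append(rle)
    return rles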

RuntimeError: Timed out waiting 1800000ms for send operation to complete

Hi! Thanks for the excellent work.

I get this error when running IDOL with multiple processes on a single node:

python projects/IDOL/train_net.py --config-file projects/IDOL/configs/ytvis19_r50.yaml --num-gpus 8

Error:

RuntimeError: [/opt/conda/conda-bld/pytorch_1656352430114/work/third_party/gloo/gloo/transport/tcp/unbound_buffer.cc:136] Timed out waiting 1800000ms for send operation to complete

The program is stuck on some iteration and one of the 8 GPUs is idle and its util is 0.

Do you know what is going on? and how to fix it?

Thanks!

Cannot reproduce the same mAP_L result on the Youtube-VIS 2022 validation set

Hello!
I trained IDOL using the default SwinL config yaml file, only changing the dataset from 19 to 21, and evaluated on the YouTube-VIS 2022 validation set. I got nearly the same mAP_S but a different mAP_L of around 44, which is much lower than the mAP_L of 48.4 reported in your 1st-place solution paper. Is there any problem?
Thank you very much.

ModuleNotFoundError: No module named 'MultiScaleDeformableAttention'

python3 projects/IDOL/train_net.py --config-file projects/IDOL/configs/ytvis19_swinL.yaml --num-gpus 2

Traceback (most recent call last):
  File "projects/IDOL/train_net.py", line 26, in <module>
    from detectron2.projects.idol import add_idol_config, build_detection_train_loader, build_detection_test_loader
  File "/home/user/PycharmProjects/VNext/projects/IDOL/idol/__init__.py", line 2, in <module>
    from .idol import IDOL
  File "/home/user/PycharmProjects/VNext/projects/IDOL/idol/idol.py", line 17, in <module>
    from .models.deformable_detr import DeformableDETR, SetCriterion
  File "/home/user/PycharmProjects/VNext/projects/IDOL/idol/models/deformable_detr.py", in <module>
    from .deformable_transformer import build_deforamble_transformer
  File "/home/user/PycharmProjects/VNext/projects/IDOL/idol/models/deformable_transformer.py", line 25, in <module>
    from .ops.modules import MSDeformAttn
  File "/home/user/PycharmProjects/VNext/projects/IDOL/idol/models/ops/modules/__init__.py", line 9, in <module>
    from .ms_deform_attn import MSDeformAttn
  File "/home/user/PycharmProjects/VNext/projects/IDOL/idol/models/ops/modules/ms_deform_attn.py", line 21, in <module>
    from ..functions import MSDeformAttnFunction
  File "/home/user/PycharmProjects/VNext/projects/IDOL/idol/models/ops/functions/__init__.py", line 9, in <module>
    from .ms_deform_attn_func import MSDeformAttnFunction
  File "/home/user/PycharmProjects/VNext/projects/IDOL/idol/models/ops/functions/ms_deform_attn_func.py", line 18, in <module>
    import MultiScaleDeformableAttention as MSDA
ModuleNotFoundError: No module named 'MultiScaleDeformableAttention'

A question about training log

Hi,

I'm trying to reproduce IDOL on OVIS and want to ask whether the training log will be provided, because it would help me align the loss.
Also, I noticed you said in a previous issue that training has three steps, starting with pretraining on COCO. I wonder whether the OVIS AP will be poor if training starts directly from step 2?

demo running IDOL

Hi,
thanks for the great work. Is it possible to run the demo script over the webcam using the IDOL or SeqFormer models? If yes, which inputs (e.g. config file) do I have to put in the demo.py command?

Thanks a lot

Can you provide a training log of one IDOL model?

Hi, IDOL is an excellent paper, with many wonderful designs.
Unfortunately, due to resource constraints, I can't fully train it.
Can you provide a training log of one IDOL model on the Youtube-VIS dataset?
Thanks.

How to pad the non-bbox?

Must the number of instances be the same across one video? If not, how is the non-bbox padded?
targets_for_clip_prediction.append({
    "labels": torch.stack(clip_classes, dim=0).max(0)[0],
    "boxes": torch.stack(clip_boxes, dim=1),   # [num_inst, num_frame, 4]
    'masks': torch.stack(clip_masks, dim=1),   # [num_inst, num_frame, H, W]
    'size': torch.as_tensor([h, w], dtype=torch.long, device=self.device),
    # 'inst_id': inst_ids,
    # 'valid': valid_id,
})
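
One hedged way to handle the question above (names are illustrative, not the repo's API): pad instances that are absent in some frames with dummy boxes and masks plus a per-frame validity flag, so every instance has num_frames entries before torch.stack is applied.

import torch

def pad_instance(num_frames, h, w, present_frames, boxes, masks):
    # present_frames: frame indices where this instance appears;
    # boxes: list of (4,) tensors; masks: list of (h, w) tensors for those frames.
    padded_boxes = torch.zeros(num_frames, 4)
    padded_masks = torch.zeros(num_frames, h, w)
    valid = torch.zeros(num_frames, dtype=torch.bool)
    for t, box, mask in zip(present_frames, boxes, masks):
        padded_boxes[t] = box
        padded_masks[t] = mask
        valid[t] = True
    return padded_boxes, padded_masks, valid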

YouTube-VIS and OVIS ablation study splits

In the IDOL paper you run ablation studies on a split of the training set. I also found those in the dataset registration but you did not provide the json files or any script to generate these. Is it possible to share your splits? Thank you very much!

Ask a question

lirui-9527, hello, could you add me as a friend? I have some questions to ask you. My QQ: 1136806362, thank you!

about SeqFormer/train_net.py

When I used your train_net.py for SeqFormer, it reported errors on:

from detectron2.projects.seqformer import add_seqformer_config, build_detection_train_loader, build_detection_test_loader
from detectron2.projects.seqformer.data import (
    YTVISDatasetMapper, YTVISEvaluator, get_detection_dataset_dicts, DetrDatasetMapper,
)

I think the import path may not be correct.

run demo.py error

Traceback (most recent call last):
  File "G:/Download/VNext-main/demo/demo.py", line 14, in <module>
    from detectron2.data.detection_utils import read_image
  File "G:\Download\VNext-main\detectron2\data\__init__.py", line 4, in <module>
    from .build import (
  File "G:\Download\VNext-main\detectron2\data\build.py", line 14, in <module>
    from detectron2.structures import BoxMode
  File "G:\Download\VNext-main\detectron2\structures\__init__.py", line 3, in <module>
    from .image_list import ImageList
  File "G:\Download\VNext-main\detectron2\structures\image_list.py", line 8, in <module>
    from detectron2.layers.wrappers import shapes_to_tensor
  File "G:\Download\VNext-main\detectron2\layers\__init__.py", line 4, in <module>
    from .mask_ops import paste_masks_in_image
  File "G:\Download\VNext-main\detectron2\layers\mask_ops.py", line 73, in <module>
    @torch.jit.script_if_tracing
AttributeError: module 'torch.jit' has no attribute 'script_if_tracing'

Process finished with exit code 1

I use python==3.6 and CUDA 10.1 and tried different torchvision versions, but it didn't solve the problem.

Out of memory

When I tried to train SeqFormer (ResNet-50) with your default config on 4x 32 GB V100s, I got a "CUDA out of memory" error. Do I need to change any parameters? Thanks in advance.

troubles when reproducing

Hi, thanks for the wonderful work.

But I had some trouble when trying to reproduce the results with this command:

python3 projects/IDOL/train_net.py --config-file projects/IDOL/configs/ytvis19_r50.yaml --num-gpus 8 MODEL.WEIGHTS projects/IDOL/weights/cocopretrain_R50.pth SOLVER.IMS_PER_BATCH 16

The result is 46.96, which is lower than the provided result of 49.5. I'm using Torch 1.9.0, and the batch size was set to 16 instead of 32.

Is there something I missed? Looking forward to your reply.

A bug when evaluating or training on the OVIS dataset

When I try to run python projects/IDOL/train_net.py --config-file projects/IDOL/configs/ovis_r50.yaml --num-gpus 1 --eval-only an error occurs:
for ann in self.dataset['annotations']: TypeError: 'NoneType' object is not iterable
The reason for this problem is that the OVIS dataset's annotation file has an extra item "annotations": null; you need to delete it manually.
I suggest the authors add a tip about this to README.md; a small cleanup sketch follows below.
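
A hedged sketch of the manual fix described above (the annotation path is illustrative): load the OVIS annotation json, drop the null "annotations" entry, and write it back.

import json

ann_path = "datasets/ovis/annotations_valid.json"  # illustrative path
with open(ann_path) as f:
    data = json.load(f)
if data.get("annotations") is None:  # the extra "annotations": null entry
    data.pop("annotations", None)
    with open(ann_path, "w") as f:
        json.dump(data, f)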
