
wjf5203 / vnext


Next-generation video instance recognition framework on top of Detectron2, which supports InstMove (CVPR 2023), SeqFormer (ECCV Oral), and IDOL (ECCV Oral)

License: Apache License 2.0

Python 89.59% Shell 0.36% C++ 2.51% Cuda 7.43% Dockerfile 0.08% CMake 0.02%
instance-segmentation object-detection transformer video-instance-segmentation tracking motion

vnext's Introduction

VNext:

  • VNext is a next-generation video instance recognition framework built on top of Detectron2.
  • It currently provides advanced online and offline video instance segmentation algorithms, as well as a motion model for object-centric video segmentation tasks.
  • We will continue to update and improve it, aiming at a unified and efficient framework that nourishes the field of video instance recognition.

To date, VNext contains the official implementation of the following algorithms:

InstMove: Instance Motion for Object-centric Video Segmentation (CVPR 2023)

IDOL: In Defense of Online Models for Video Instance Segmentation (ECCV2022 Oral)

SeqFormer: Sequential Transformer for Video Instance Segmentation (ECCV2022 Oral)

NEWS!!:

  • InstMove has been accepted to CVPR 2023; the code and models can be found here!
  • IDOL is accepted to ECCV 2022 as an oral presentation!
  • SeqFormer is accepted to ECCV 2022 as an oral presentation!
  • IDOL won first place in the video instance segmentation track of the 4th Large-scale Video Object Segmentation Challenge (CVPR2022).

Getting started

  1. For installation and data preparation, please refer to INSTALL.md for more details.
  2. For InstMove training, evaluation, plugin, and model zoo, please refer to InstMove.md
  3. For IDOL training, evaluation, and model zoo, please refer to IDOL.md
  4. For SeqFormer training, evaluation and model zoo, please refer to SeqFormer.md

IDOL


In Defense of Online Models for Video Instance Segmentation

Junfeng Wu, Qihao Liu, Yi Jiang, Song Bai, Alan Yuille, Xiang Bai

Introduction

  • In recent years, video instance segmentation (VIS) has been largely advanced by offline models, while online models usually trail contemporaneous offline models by over 10 AP, which is a huge drawback.

  • By dissecting current online models and offline models, we demonstrate that the main cause of the performance gap is the error-prone association and propose IDOL, which outperforms all online and offline methods on three benchmarks.

  • IDOL won first place in the video instance segmentation track of the 4th Large-scale Video Object Segmentation Challenge (CVPR2022).

Visualization results on OVIS valid set

Quantitative results

YouTube-VIS 2019

OVIS 2021

SeqFormer


SeqFormer: Sequential Transformer for Video Instance Segmentation

Junfeng Wu, Yi Jiang, Song Bai, Wenqing Zhang, Xiang Bai

Introduction

  • SeqFormer locates an instance in each frame and aggregates temporal information to learn a powerful representation of a video-level instance, which is used to predict the mask sequences on each frame dynamically.

  • SeqFormer is a robust, accurate, neat offline model and instance tracking is achieved naturally without tracking branches or post-processing.

Visualization results on YouTube-VIS 2019 valid set

Quantitative results

YouTube-VIS 2019

YouTube-VIS 2021

Citation

@inproceedings{seqformer,
  title={SeqFormer: Sequential Transformer for Video Instance Segmentation},
  author={Wu, Junfeng and Jiang, Yi and Bai, Song and Zhang, Wenqing and Bai, Xiang},
  booktitle={ECCV},
  year={2022},
}

@inproceedings{IDOL,
  title={In Defense of Online Models for Video Instance Segmentation},
  author={Wu, Junfeng and Liu, Qihao and Jiang, Yi and Bai, Song and Yuille, Alan and Bai, Xiang},
  booktitle={ECCV},
  year={2022},
}

Acknowledgement

This repo is based on detectron2, Deformable DETR, VisTR, and IFC. Thanks for their wonderful works.

vnext's People

Contributors

ifighting, qihao067, wjf5203


vnext's Issues

model release of coco static image pretraining

Thanks for your excellent work! I am very interested in your repository for further research.

How soon will you release the model pretrained on coco for static image instance segmentation?

Best wishes.

Memory optimization for long videos

Hi wjf,

The memory consumption in the inference post-processing phase is too large. I tried to optimize the code but did not succeed.

Do you have a plan to optimize it?

Could you please provide the demo.py

Could you please provide a demo.py for us to display visualization results like Detectron2 does?
The one in Detectron2 is image-level and cannot be used directly.

Training log

Good job! Thanks for releasing the full code. Could you provide the tensorboard log file during training? I'd like to see the changing trend of the loss function.

gpu util = 0 while running inference

Hello, thanks for your great work!

There is a problem when I run your inference script on the ytvis dataset:
python3 projects/IDOL/train_net.py --config-file projects/IDOL/configs/XXX.yaml --num-gpus 8 --eval-only MODEL.WEIGHTS /path to my .pth

Everything works well and GPU memory looks normal, but GPU util is always 0.
[screenshot]

Total inference time is around 2h; I don't know if all GPUs are correctly used during inference.

Empty ytvis_2019 testing results

I tested the SeqFormer model with res = Trainer.test(cfg, model).

The output is empty, so I debugged it and found the following from seqformer.py line 238:
{'image_size': (720, 1280), 'pred_scores': [], 'pred_labels': [], 'pred_masks': []}
{'image_size': (720, 1280), 'pred_scores': [], 'pred_labels': [], 'pred_masks': []}
{'image_size': (720, 1280), 'pred_scores': [], 'pred_labels': [], 'pred_masks': []}
{'image_size': (720, 1280), 'pred_scores': [], 'pred_labels': [], 'pred_masks': []}
{'image_size': (720, 1280), 'pred_scores': [], 'pred_labels': [], 'pred_masks': []}

broken link

Kia ora,

There is a broken link here, as seen below:

[screenshot]

Would you mind reuploading the weights and fixing this link?

Training/Validating on Custom Datasets (In Classic COCO Segmentation Format)

Hi all,

Does this repo support training on a custom dataset? Or better still, has anyone successfully done so? I have a dataset in classic coco format which I am able to load into detectron2 with register_coco_instances. This dataset is one that contains a series of annotated images that can be combined to form a video. Currently, I am applying plain image instance segmentation using detectron2 PointRend but IDOL seems very promising to me.

In the case that this has not been done before, I would appreciate some pointers that would guide me toward writing some functionality on top of this repo to implement training/validating on a custom dataset.

Thanks and congrats on your paper!
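
Since the question mentions register_coco_instances, here is a minimal hedged sketch of registering a classic COCO-format dataset in detectron2; the dataset names and paths are illustrative, and a YTVIS-style mapper would still be needed for video-level (clip) training with IDOL.

from detectron2.data.datasets import register_coco_instances

# Minimal sketch: register a COCO-format dataset (illustrative names and paths).
register_coco_instances(
    "my_custom_train", {},
    "datasets/custom/annotations/train.json",   # COCO-style json
    "datasets/custom/images/train",             # image root
)
register_coco_instances(
    "my_custom_val", {},
    "datasets/custom/annotations/val.json",
    "datasets/custom/images/val",
)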

Where is the training result of IDOL?

Thank you for your hard work; it is such a nice job. I have a question about seeing the results of IDOL training. At the end of training, there is a line in the log file that says "Evaluation results for ytvis_2019_val in csv format:". Where is this csv file?

A question about oracle experiment

Hi Junfeng,

Thank you for providing the OVIS subsets in another issue.
I have an additional question about the oracle experiments. How does the oracle experiment of IDOL match instances between clips or frames? In other words, how is the GT used to complete this process?

Table 4 and Table 5 using COCO pretraining or not?

Hi there,

Thank you for sharing the repo. In Table 3, the YouTube-VIS 2019 results are reported for models both with and without COCO pretraining.

How about Table 4 and Table 5 for IDOL? I did not find the detailed settings and explanations for these two results.

Thanks

Data parameter settings

Hello author,
If I want to use ResNet-50 as the backbone to train on youtube2019, where should my dataset path be added? Also, which configuration file should I choose? Thank you very much!

Applying to image instance segmentation

Hello
How are you?
Thanks for contributing to this project.
I am going to apply your method to custom image instance segmentation rather than video instance segmentation.
The custom dataset is annotated in the COCO image instance segmentation format.
Is it possible?

Setting MAX_SIZE_TEST

The original SeqFormer repository (and most previous methods) limit the test-time resolution. However, for this version of SeqFormer and IDOL you did not set INPUT.MAX_SIZE_TEST. Was this intentional? For SeqFormer, the README.md contains the tables produced by the old repository. By not limiting the test-time resolution, the results of this version should be better, right? It should be noted that this issue prevents a fair comparison with previous methods. Or am I missing something?
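
For reference, detectron2 exposes the test-time size cap through its standard config keys; below is a minimal hedged sketch, where the values are placeholders rather than the settings actually used by the original SeqFormer repository.

from detectron2.config import get_cfg

# Minimal sketch: cap the test-time resolution via detectron2's standard keys.
# The values below are placeholders, not the original SeqFormer settings.
cfg = get_cfg()
cfg.INPUT.MIN_SIZE_TEST = 360   # shorter edge at test time (placeholder value)
cfg.INPUT.MAX_SIZE_TEST = 640   # cap on the longer edge at test time (placeholder value)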

Problems setting up dataset for IDOL

I am unable to set up the dataset for ytvis19 as required by INSTALL.md.
The page at YTVIS19 has two Google Drive folders, Labels and Image Frames. I downloaded and extracted these two folders, but the directories are not organized in the way specified in INSTALL.md. I also cannot find the file ytvis/instances_train_sub.json.
Am I downloading the wrong dataset? Could someone guide me through this?

How to reproduce visualization results in README?

Thanks for your wonderful work!

I'd like to reproduce the visualization results in your README.

I tried to add the following two lines before demo/demo.py#L29:

from detectron2.projects.idol import add_idol_config
add_idol_config(cfg)

But it returns this:

appuser@0916140fb4f2:~/VNext/demo$ python demo.py --config-file ../projects/IDOL/configs/ytvis19_swinL.yaml --video-input ../0b6db1c6fd.mp4 --output ../out --opts MODEL.WEIGHTS ../YTVIS19_SWINL_643AP.pth
[08/05 06:23:21 detectron2]: Arguments: Namespace(confidence_threshold=0.5, config_file='../projects/IDOL/configs/ytvis19_swinL.yaml', input=None, opts=['MODEL.WEIGHTS', '../YTVIS19_SWINL_643AP.pth'], output='../out', video_input='../0b6db1c6fd.mp4', webcam=False)
/home/appuser/.local/lib/python3.7/site-packages/torch/functional.py:445: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at  ../aten/src/ATen/native/TensorShape.cpp:2157.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
[08/05 06:23:28 fvcore.common.checkpoint]: [Checkpointer] Loading from ../YTVIS19_SWINL_643AP.pth ...
[ERROR:[email protected]] global /io/opencv/modules/videoio/src/cap_ffmpeg_impl.hpp (2927) open Could not find encoder for codec_id=27, error: Encoder not found
[ERROR:[email protected]] global /io/opencv/modules/videoio/src/cap_ffmpeg_impl.hpp (3002) open VIDEOIO/FFMPEG: Failed to initialize VideoWriter
[ERROR:[email protected]] global /io/opencv/modules/videoio/src/cap.cpp (595) open VIDEOIO(CV_IMAGES): raised OpenCV exception:

OpenCV(4.6.0) /io/opencv/modules/videoio/src/cap_images.cpp:253: error: (-5:Bad argument) CAP_IMAGES: can't find starting number (in the name of file): /tmp/video_format_test3zylu0ek/test_file.mkv in function 'icvExtractPattern'


  0%|                                                                                    | 0/20 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "demo.py", line 178, in <module>
    for vis_frame in tqdm.tqdm(demo.run_on_video(video), total=num_frames):
  File "/home/appuser/.local/lib/python3.7/site-packages/tqdm/std.py", line 1195, in __iter__
    for obj in iterable:
  File "~/VNext/demo/predictor.py", line 129, in run_on_video
    yield process_predictions(frame, self.predictor(frame))
  File "~/VNext/detectron2/engine/defaults.py", line 317, in __call__
    predictions = self.model([inputs])[0]
  File "/home/appuser/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "~/VNext/projects/IDOL/idol/idol.py", line 249, in forward
    video_len = len(batched_inputs[0]['file_names'])
KeyError: 'file_names'

Could you give me some hint about how to pass right batched_inputs to forward function?

SeqFormer question

In configuration.py

cfg.INPUT.SAMPLING_FRAME_NUM = 1

Does it mean that only one frame in the videos is selected as input?

The definition of the proposed model

Thanks for releasing the code. Where can we find the PyTorch model definition of IDOL? IDOL is currently presented only through a config file and the VNext library; I cannot find the PyTorch model definition code for IDOL.

Why Detectron2 and not MMDet (OpenMMLab projects, etc.)?

First of all, thank you very much for providing such a repository; I believe it will bring great convenience to future VIS research, as well as a unified framework for comparison. However, why did you adopt detectron2 rather than the more complete and easier-to-use OpenMMLab repositories? I have used both, and detectron2 is rather obscure to work with, far less friendly than mmdet; I have also been thinking about writing an algorithm library based on mmdet myself.

Evaluation Error in IDOL

Hi,
When I try to train an IDOL model over an SSH connection to a server, I get the following error in the evaluation stage:

-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "/home/aylinaydin/anaconda3/envs/project/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/home/aylinaydin/Project/VNext/detectron2/engine/launch.py", line 126, in _distributed_worker
    main_func(*args)
  File "/home/aylinaydin/Project/VNext/projects/IDOL/train_net.py", line 161, in main
    res = Trainer.test(cfg, model)
  File "/home/aylinaydin/Project/VNext/detectron2/engine/defaults.py", line 617, in test
    results_i = inference_on_dataset(model, data_loader, evaluator)
  File "/home/aylinaydin/Project/VNext/detectron2/evaluation/evaluator.py", line 158, in inference_on_dataset
    outputs = model(inputs)
  File "/home/aylinaydin/anaconda3/envs/project/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/aylinaydin/Project/VNext/projects/IDOL/idol/idol.py", line 284, in forward
    0])  # (height, width) is resized size,images. image_sizes[0] is original size
  File "/home/aylinaydin/Project/VNext/projects/IDOL/idol/idol.py", line 357, in inference
    det_masks = output_mask[indices]
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)
I tried changing the device of each tensor, but I couldn't fix it. Can you help me?
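
A hedged workaround sketch (not an official fix): the error message suggests output_mask has already been moved to the CPU while indices is still on the GPU, so aligning the devices before indexing in projects/IDOL/idol/idol.py should avoid the crash.

# Hedged workaround: move the index tensor onto the same device as the mask tensor
# before indexing (around the line that raises the RuntimeError).
det_masks = output_mask[indices.to(output_mask.device)]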

Evaluate Module

Hi Dr. Wu, thanks for your amazing work and open-source code! But when I ran inference with the model, I didn't find the evaluation function code in ytvis_eval.py. Could you please provide the evaluation function code for inference?

Why does OVIS need so much memory??

I tried to run inference on hardware with 512 GB of memory, but the preprocessing of some videos consumes all the memory and crashes.

When I change the input size from 720 to 100, it runs successfully.

Use so much memory during inference

When I try to run inference on a long-video dataset (about 200 frames per video) on hardware with 256 GB of memory,
I hit this error and crash:
DefaultCPUAllocator: can't allocate memory: you tried to allocate 522746265600 bytes. Error code 12 (Cannot allocate memory)

Is there a way to generate the result json file with less memory?
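
A hedged sketch of one way to shrink the buffers used while building the result json (this is not part of the repo): keep per-frame masks as COCO RLE strings instead of full boolean arrays, using pycocotools.

import numpy as np
import pycocotools.mask as mask_util

def masks_to_rle(masks):
    # masks: (T, H, W) array-like holding one instance's binary masks
    rles = []
    for m in masks:
        rle = mask_util.encode(np.asfortranarray(np.asarray(m, dtype=np.uint8)))
        rle["counts"] = rle["counts"].decode("utf-8")  # make it json-serializable
        rles.append(rle)
    return rles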

RuntimeError: Timed out waiting 1800000ms for send operation to complete

Hi! Thanks for the excellent work.

I get this error when running IDOL with multiple processes on a single node:

python projects/IDOL/train_net.py --config-file projects/IDOL/configs/ytvis19_r50.yaml --num-gpus 8

Error:

RuntimeError: [/opt/conda/conda-bld/pytorch_1656352430114/work/third_party/gloo/gloo/transport/tcp/unbound_buffer.cc:136] Timed out waiting 1800000ms for send operation to complete

The program is stuck on some iteration and one of the 8 GPUs is idle and its util is 0.

Do you know what is going on? and how to fix it?

Thanks!

Cannot reproduce the same mAP_L result on the Youtube-VIS 2022 validation set

Hello!
I trained IDOL using the default SwinL config yaml file, only changing the dataset from 19 to 21, and evaluated on the YouTube-VIS 2022 validation set. I got nearly the same mAP_S but a different mAP_L of around 44, which is much lower than the mAP_L of 48.4 reported in your 1st-place solution paper. Is there any problem?
Thank you very much.

ModuleNotFoundError: No module named 'MultiScaleDeformableAttention'

python3 projects/IDOL/train_net.py --config-file projects/IDOL/configs/ytvis19_swinL.yaml --num-gpus 2

Traceback (most recent call last):
  File "projects/IDOL/train_net.py", line 26, in <module>
    from detectron2.projects.idol import add_idol_config, build_detection_train_loader, build_detection_test_loader
  File "/home/user/PycharmProjects/VNext/projects/IDOL/idol/__init__.py", line 2, in <module>
    from .idol import IDOL
  File "/home/user/PycharmProjects/VNext/projects/IDOL/idol/idol.py", line 17, in <module>
    from .models.deformable_detr import DeformableDETR, SetCriterion
  File "/home/user/PycharmProjects/VNext/projects/IDOL/idol/models/deformable_detr.py", in <module>
    from .deformable_transformer import build_deforamble_transformer
  File "/home/user/PycharmProjects/VNext/projects/IDOL/idol/models/deformable_transformer.py", line 25, in <module>
    from .ops.modules import MSDeformAttn
  File "/home/user/PycharmProjects/VNext/projects/IDOL/idol/models/ops/modules/__init__.py", line 9, in <module>
    from .ms_deform_attn import MSDeformAttn
  File "/home/user/PycharmProjects/VNext/projects/IDOL/idol/models/ops/modules/ms_deform_attn.py", line 21, in <module>
    from ..functions import MSDeformAttnFunction
  File "/home/user/PycharmProjects/VNext/projects/IDOL/idol/models/ops/functions/__init__.py", line 9, in <module>
    from .ms_deform_attn_func import MSDeformAttnFunction
  File "/home/user/PycharmProjects/VNext/projects/IDOL/idol/models/ops/functions/ms_deform_attn_func.py", line 18, in <module>
    import MultiScaleDeformableAttention as MSDA
ModuleNotFoundError: No module named 'MultiScaleDeformableAttention'

A question about training log

Hi,

I'm trying to reproduce IDOL on OVIS and want to ask whether the training log will be provided, because it would help me align the loss.
Also, I noticed you said in a previous issue that training has three steps, starting with pretraining on COCO. I wonder whether the OVIS AP will be poor if training starts directly from step 2?

demo running IDOL

Hi,
thanks for the great work. Is it possible to run the demo script over the webcam using the IDOL or SeqFormer models? If yes, which inputs (e.g. config file) do I have to put in the demo.py command?

Thanks a lot

Can you provide a training log of one IDOL model?

Hi, IDOL is an excellent paper, with many wonderful designs.
Unfortunately, due to resource constraints, I can't fully train it.
Can you provide a training log of one IDOL model on the Youtube-VIS dataset?
Thanks.

How to pad the non-bbox?

Must the number of instances be the same across one video? If not, how is the non-bbox padded?
targets_for_clip_prediction.append({
    "labels": torch.stack(clip_classes, dim=0).max(0)[0],
    "boxes": torch.stack(clip_boxes, dim=1),   # [num_inst, num_frame, 4]
    'masks': torch.stack(clip_masks, dim=1),   # [num_inst, num_frame, H, W]
    'size': torch.as_tensor([h, w], dtype=torch.long, device=self.device),
    # 'inst_id': inst_ids,
    # 'valid': valid_id,
})
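
One hedged way to handle the question above (names are illustrative, not the repo's API): pad instances that are absent in some frames with dummy boxes and masks plus a per-frame validity flag, so every instance has num_frames entries before torch.stack is applied.

import torch

def pad_instance(num_frames, h, w, present_frames, boxes, masks):
    # present_frames: frame indices where this instance appears;
    # boxes: list of (4,) tensors; masks: list of (h, w) tensors for those frames.
    padded_boxes = torch.zeros(num_frames, 4)
    padded_masks = torch.zeros(num_frames, h, w)
    valid = torch.zeros(num_frames, dtype=torch.bool)
    for t, box, mask in zip(present_frames, boxes, masks):
        padded_boxes[t] = box
        padded_masks[t] = mask
        valid[t] = True
    return padded_boxes, padded_masks, valid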

YouTube-VIS and OVIS ablation study splits

In the IDOL paper you run ablation studies on a split of the training set. I also found those in the dataset registration but you did not provide the json files or any script to generate these. Is it possible to share your splits? Thank you very much!

Ask a question

lirui-9527, hello, could you add me as a friend? I have some questions to ask you. My QQ: 1136806362, thank you!

about SeqFormer/train_net.py

When I used your train_net.py for SeqFormer, it reported errors on:

from detectron2.projects.seqformer import add_seqformer_config, build_detection_train_loader, build_detection_test_loader
from detectron2.projects.seqformer.data import (
    YTVISDatasetMapper, YTVISEvaluator, get_detection_dataset_dicts, DetrDatasetMapper,
)

I think the import path may not be correct.

run demo.py error

Traceback (most recent call last):
  File "G:/Download/VNext-main/demo/demo.py", line 14, in <module>
    from detectron2.data.detection_utils import read_image
  File "G:\Download\VNext-main\detectron2\data\__init__.py", line 4, in <module>
    from .build import (
  File "G:\Download\VNext-main\detectron2\data\build.py", line 14, in <module>
    from detectron2.structures import BoxMode
  File "G:\Download\VNext-main\detectron2\structures\__init__.py", line 3, in <module>
    from .image_list import ImageList
  File "G:\Download\VNext-main\detectron2\structures\image_list.py", line 8, in <module>
    from detectron2.layers.wrappers import shapes_to_tensor
  File "G:\Download\VNext-main\detectron2\layers\__init__.py", line 4, in <module>
    from .mask_ops import paste_masks_in_image
  File "G:\Download\VNext-main\detectron2\layers\mask_ops.py", line 73, in <module>
    @torch.jit.script_if_tracing
AttributeError: module 'torch.jit' has no attribute 'script_if_tracing'

Process finished with exit code 1

I use python==3.6 and CUDA 10.1 and tried different torchvision versions, but it didn't solve the problem.

Out of memory

When I tried to train SeqFormer (ResNet-50) with your default config on 4x 32 GB V100s, I got a "CUDA out of memory" error. Do I need to change any parameters? Thanks in advance.

troubles when reproducing

Hi, thanks for the wonderful work.

But I had some trouble when trying to reproduce the results with this command:

python3 projects/IDOL/train_net.py --config-file projects/IDOL/configs/ytvis19_r50.yaml --num-gpus 8 MODEL.WEIGHTS projects/IDOL/weights/cocopretrain_R50.pth SOLVER.IMS_PER_BATCH 16

The result is 46.96, which is lower than the provided result of 49.5. I'm using Torch 1.9.0, and the batch size was set to 16 instead of 32.

Is there something I missed? Looking forward to your reply.

A bug when evaluating or training on the OVIS dataset

When I try to run python projects/IDOL/train_net.py --config-file projects/IDOL/configs/ovis_r50.yaml --num-gpus 1 --eval-only an error occurs:
for ann in self.dataset['annotations']: TypeError: 'NoneType' object is not iterable
The reason for this problem is that the OVIS dataset's annotation file has an extra item "annotations": null; you need to delete it manually.
I suggest the authors add a tip about this to README.md; a small cleanup sketch follows below.
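
A hedged sketch of the manual fix described above (the annotation path is illustrative): load the OVIS annotation json, drop the null "annotations" entry, and write it back.

import json

ann_path = "datasets/ovis/annotations_valid.json"  # illustrative path
with open(ann_path) as f:
    data = json.load(f)
if data.get("annotations") is None:  # the extra "annotations": null entry
    data.pop("annotations", None)
    with open(ann_path, "w") as f:
        json.dump(data, f)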
