
masterbin-iiau / uninext


[CVPR'23] Universal Instance Perception as Object Discovery and Retrieval

License: MIT License

Languages: Python 92.46%, Cuda 4.55%, C++ 2.36%, Shell 0.63%
Topics: instance-segmentation, object-detection, object-tracking, perception, referring-expression-comprehension, referring-expression-segmentation, unified-model, multi-object-tracking-segmentation, multiple-object-tracking, referring-video-object-segmentation

uninext's Introduction

Universal Instance Perception as Object Discovery and Retrieval

This is the official implementation of the paper "Universal Instance Perception as Object Discovery and Retrieval".


News

Highlight

  • UNINEXT has been accepted by CVPR 2023.
  • UNINEXT reformulates diverse instance perception tasks into a unified object discovery and retrieval paradigm and can flexibly perceive different types of objects by simply changing the input prompts.
  • UNINEXT achieves superior performance on 20 challenging benchmarks using a single model with a single set of model parameters.

Introduction

[Figure: "task radar" cube diagram of the 10 instance perception sub-tasks]

Object-centric understanding is one of the most essential and challenging problems in computer vision. In this work, we mainly discuss 10 sub-tasks, distributed over the vertices of the cube shown in the figure above. Since all of these tasks aim to perceive instances with certain properties, UNINEXT reorganizes them into three types according to the different input prompts (restated as a plain mapping after the list):

  • Category Names
    • Object Detection
    • Instance Segmentation
    • Multiple Object Tracking (MOT)
    • Multi-Object Tracking and Segmentation (MOTS)
    • Video Instance Segmentation (VIS)
  • Language Expressions
    • Referring Expression Comprehension (REC)
    • Referring Expression Segmentation (RES)
    • Referring Video Object Segmentation (R-VOS)
  • Target Annotations
    • Single Object Tracking (SOT)
    • Video Object Segmentation (VOS)
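
For quick reference, this grouping can also be written down as a plain mapping from prompt type to sub-tasks. The snippet below is purely a restatement of the list above, not code from the repository:

PROMPT_TO_TASKS = {
    "category names": [
        "object detection",
        "instance segmentation",
        "multiple object tracking (MOT)",
        "multi-object tracking and segmentation (MOTS)",
        "video instance segmentation (VIS)",
    ],
    "language expressions": [
        "referring expression comprehension (REC)",
        "referring expression segmentation (RES)",
        "referring video object segmentation (R-VOS)",
    ],
    "target annotations": [
        "single object tracking (SOT)",
        "video object segmentation (VOS)",
    ],
}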

Then we propose a unified prompt-guided object discovery and retrieval formulation to solve all the above tasks. Extensive experiments demonstrate that UNINEXT achieves superior performance on 20 challenging benchmarks.
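
To make the formulation concrete, here is a minimal, hypothetical sketch of the prompt-guided "discovery and retrieval" idea in plain Python. All names (Proposal, perceive, encode_prompt, discover, match) are illustrative placeholders invented for this sketch; they are not the repository's actual classes or functions.

from dataclasses import dataclass
from typing import Callable, List, Sequence

@dataclass
class Proposal:
    box: Sequence[float]        # (x1, y1, x2, y2) candidate instance box
    embedding: Sequence[float]  # instance embedding used for retrieval
    score: float                # objectness score from the discovery stage

def perceive(image, prompt,
             encode_prompt: Callable,   # prompt encoder (text or visual prompt)
             discover: Callable,        # prompt-aware object discovery
             match: Callable,           # similarity between instance and prompt embeddings
             threshold: float = 0.5) -> List[Proposal]:
    """One routine serves all sub-tasks: only the prompt changes."""
    prompt_emb = encode_prompt(prompt)        # category names, an expression, or a target annotation
    proposals = discover(image, prompt_emb)   # discovery: generate prompt-aware candidates
    # Retrieval: keep the candidates whose embeddings match the prompt.
    return [p for p in proposals if match(p.embedding, prompt_emb) > threshold]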

Demo

[Demo video: UNINEXT_DEMO_VID_9M.mp4]

UNINEXT can flexibly perceive various types of objects by simply changing the input prompts, such as category names, language expressions, and target annotations. We also provide a simple demo script, which supports 4 image-level tasks (object detection, instance segmentation, REC, RES).

Results

Retrieval by Category Names

[Qualitative results: object detection / instance segmentation and MOT / MOTS / VIS]

Retrieval by Language Expressions

[Qualitative results: REC / RES / R-VOS]

Retrieval by Target Annotations

[Qualitative results: SOT / VOS]

Getting started

  1. Installation: Please refer to INSTALL.md for more details.
  2. Data preparation: Please refer to DATA.md for more details.
  3. Training: Please refer to TRAIN.md for more details.
  4. Testing: Please refer to TEST.md for more details.
  5. Model zoo: Please refer to MODEL_ZOO.md for more details.

Citing UNINEXT

If you find UNINEXT useful in your research, please consider citing:

@inproceedings{UNINEXT,
  title={Universal Instance Perception as Object Discovery and Retrieval},
  author={Yan, Bin and Jiang, Yi and Wu, Jiannan and Wang, Dong and Yuan, Zehuan and Luo, Ping and Lu, Huchuan},
  booktitle={CVPR},
  year={2023}
}

Acknowledgments

  • Thanks to Unicorn for its experience in unifying four object tracking tasks (SOT, MOT, VOS, MOTS).
  • Thanks to VNext for its experience in Video Instance Segmentation (VIS).
  • Thanks to ReferFormer for its experience in REC, RES, and R-VOS.
  • Thanks to GLIP for the idea of unifying object detection and phrase grounding.
  • Thanks to Detic for the implementation of multi-dataset training.
  • Thanks to detrex for the implementation of the denoising mechanism.

uninext's People

Contributors

lkeab, masterbin-iiau


uninext's Issues

SOT Dataset with multiple annotations

Hello!

I would like to test your model on the perception_test dataset.
They evaluate the SOT task on videos with multiple annotations (for each video there is more than one object), but unfortunately they use a completely different format.
Is there a way to adapt the SOT dataloader to handle multiple objects without performing MOT (i.e., iterating over every object in the video)?

Or, alternatively, how could I add another dataset with a custom format?

Thank you very much!

What does "J&F 1st frame" mean?

Dear authors, thank you for your great work. I saw that you report "J&F 1st frame" on the paperswithcode leaderboard, but I could not find the definition of this metric. Could you tell me more about it? Thank you so much.

No such file or directory: 'owner_path/UNINEXT/configs'

Running setup.py install for detectron2 ... error
error: subprocess-exited-with-error

× Running setup.py install for detectron2 did not run successfully.
│ exit code: 1
╰─> [17 lines of output]
Traceback (most recent call last):
File "owner_path/UNINEXT/setup.py", line 128, in get_model_zoo_configs
os.symlink(source_configs_dir, destination)
FileExistsError: [Errno 17] File exists: 'owner_path/UNINEXT/configs' -> 'owner_path/UNINEXT/detectron2/model_zoo/configs'

  During handling of the above exception, another exception occurred:
  
  Traceback (most recent call last):
    File "<string>", line 2, in <module>
    File "<pip-setuptools-caller>", line 34, in <module>
    File "owner_path/UNINEXT/setup.py", line 155, in <module>
      package_data={"detectron2.model_zoo": get_model_zoo_configs()},
    File "owner_path/UNINEXT/setup.py", line 131, in get_model_zoo_configs
      shutil.copytree(source_configs_dir, destination)
    File "owner_path/miniconda3/envs/uninext/lib/python3.8/shutil.py", line 555, in copytree
      with os.scandir(src) as itr:
  FileNotFoundError: [Errno 2] No such file or directory: 'owner_path/UNINEXT/configs'
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: legacy-install-failure

× Encountered error while trying to install package.
╰─> detectron2

note: This is an issue with the package mentioned above, not pip.
hint: See above for output from the failure.

Time consumption of model training

It is a promising project in the field of unified network design.
According to the information provided in the paper, the model training process involves three stages and requires 32 and 16 A100 GPUs respectively. Could you please clarify how long the training process would take?

SOT test takes 21/22 hours on LaSOT_ext

Hello, I am trying to test the ResNet-50-based model on the LaSOT_ext dataset for the SOT task.
I only have 4 GPUs available, and inference takes forever (around 21-22 hours).
Is it supposed to be like that, or am I doing something wrong? I was expecting a couple of hours at most :)

Thank you!

Pre-trained checkpoint

It seems that the model should be pre-trained on the Objects365 dataset. However, we don't have enough computing resources.
So, could you please offer the pre-trained checkpoint?

Thanks very much!

Custom dataset AssertionError while testing

Thank you very much for this project. I have been trying to train your model for instance segmentation on my own dataset, which seems to work fine.
However, at the testing phase I get this error:

...
[03/29 22:18:34 d2.evaluation.evaluator]: Inference done 233/298. Dataloading: 0.0010 s/iter. Inference: 0.1157 s/iter. Eval: 3.6628 s/iter. Total: 3.7796 s/iter. ETA=0:04:05
[03/29 22:18:39 d2.evaluation.evaluator]: Inference done 271/298. Dataloading: 0.0010 s/iter. Inference: 0.1141 s/iter. Eval: 3.1437 s/iter. Total: 3.2588 s/iter. ETA=0:01:27
[03/29 22:18:42 d2.evaluation.evaluator]: Total inference time: 0:14:30.508410 (2.971018 s / iter per device, on 1 devices)
[03/29 22:18:42 d2.evaluation.evaluator]: Total inference pure compute time: 0:00:33 (0.113249 s / iter per device, on 1 devices)
[03/29 22:18:43 d2.evaluation.coco_evaluation]: Preparing results for COCO format ...
ERROR [03/29 22:18:43 d2.engine.train_loop]: Exception during training:
Traceback (most recent call last):
  File "UNINEXT/detectron2/engine/train_loop.py", line 150, in train
    self.after_step()
  File "UNINEXT/detectron2/engine/train_loop.py", line 180, in after_step
    h.after_step()
  File "UNINEXT/detectron2/engine/hooks.py", line 555, in after_step
    self._do_eval()
  File "UNINEXT/detectron2/engine/hooks.py", line 528, in _do_eval
    results = self._func()
  File "UNINEXT/detectron2/engine/defaults.py", line 455, in test_and_save_results
    self._last_eval_results = self.test(self.cfg, self.model)
  File "UNINEXT/detectron2/engine/defaults.py", line 620, in test
    results_i = inference_on_dataset(model, data_loader, evaluator)
  File "UNINEXT/detectron2/evaluation/evaluator.py", line 204, in inference_on_dataset
    results = evaluator.evaluate()
  File "UNINEXT/detectron2/evaluation/coco_evaluation.py", line 211, in evaluate
    self._eval_predictions(predictions, img_ids=img_ids)
  File "UNINEXT/detectron2/evaluation/coco_evaluation.py", line 250, in _eval_predictions
    assert category_id <= num_classes, (
AssertionError: A prediction has class=56, but the dataset only has 21 classes and predicted class id should be in [0, 20].

My config file looks like this:

_BASE_: "image_joint_r50.yaml"
MODEL:
  WEIGHTS: "weights/R-50.pkl"
DATASETS:
  TRAIN: ("coco_data_train", )
  TEST: ("coco_data_test", )
SOLVER:
  IMS_PER_BATCH: 5 
  STEPS: (5000, 10000)  
  MAX_ITER: 20000
  CHECKPOINT_PERIOD: 2500
TEST:
  EVAL_PERIOD: 2499
DATALOADER:
  DATASET_RATIO: [1]
  DATASET_BS: [2]
  USE_RFS: [False]
  NUM_WORKERS: 1
OUTPUT_DIR: outputs/single_task_det

which is based on: single_task_det.yaml

I add my COCO dataset in the same way as described in objects365_v2.py, with the only difference that the range only goes up to 21, according to the number of classes in my dataset:

thing_dataset_id_to_contiguous_id = {i + 1: i for i in range(21)}

How I register my COCO data:

register_coco_instances(
  "coco_data_train",
  _get_builtin_metadata(),
  PATH_DATA + "coco_train_data.json",
  PATH_DATA + "train/",
)

register_coco_instances(
  "coco_data_test",
  _get_builtin_metadata(),
  PATH_DATA + "coco_test_data.json",
  PATH_DATA + "test/",
)

I would appreciate any help very much.

Can't reproduce the performance reported in the paper

Hi, I tested the video_joint_r50 model on DAVIS and YouTube-VOS 2018, but the results I got (result screenshots omitted) do not match the paper on either benchmark. What mistake might I be making? Can you give me some suggestions?

Ideas for arbitrary video testing of MOT/MOTS

Hi, I've looked at #8 and #12 as well as the codebase, and I'm trying to come up with a script that can run the UNINEXT models on the MOT/MOTS task with any arbitrary video as input.
It looks like the main constraint is that I will have to write a custom evaluator and dataloader for something like this.
Any ideas on how to approach this, or any alternative approaches?

Mainly, it looks like the data loader currently uses the DatasetCatalog; should we use something else?

Did you remove the RefCOCO test images from the COCO training set?

Hi,

Thanks for releasing this; it is really great work!

I recently discovered that the RefCOCO annotations, including the validation and test splits, were all made on images from the training set of the original COCO dataset. I think this was an unfortunate choice for the annotations, but now we have to live with it.
This means that the COCO training set overlaps with the RefCOCO test set.

Did you filter those overlapping images out of the training set? If not, the scores for RefCOCO (and also RefCOCO+ and RefCOCOg) would be too optimistic, because the instances that are referred to have already been seen during instance segmentation training.
For a fair comparison to previous work, it would be necessary to re-train with those images removed (if you didn't remove them).

How to use an image as a prompt?

I'm trying to do SOT with an image prompt. While I was able to test language-expression-based SOT without problems, there doesn't seem to be a good demo available for an image prompt. I tried to check the code, but it's not that easy to follow. Can you provide a simple example or sample code to help me test SOT with an image prompt? I want to test it with an image, not a video.

Some minor questions :)

Hi authors, thanks for your great work! Here are a few minor questions:

  1. Why is the final choice of transformer backbone ViT? Have you considered Swin/Focal/DaViT, etc., or is it because ViT-H has pretrained weights?
  2. Since your work adopts early fusion of the vision and language models, for Objects365 or other datasets with many classes, do you need to forward the model multiple times?
  3. Is there any referring segmentation evaluation code in the current release?
  4. Is the retrieval-by-annotation function used for VOS?

Thanks a lot!

huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': 'projects/UNINEXT/bert-base-uncased'. Use `repo_type` argument if needed.

Hello, when I use this command:
python3 projects/UNINEXT/demo.py --config-file projects/UNINEXT/configs/image_joint_r50.yaml --input assets/demo.jpg --output demo/detection.jpg --task detection --opts MODEL.WEIGHTS outputs/image_joint_r50/model_final.pth

I encountered an error like this:
huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': 'projects/UNINEXT/bert-base-uncased'. Use repo_type argument if needed.

Adding a new backbone

I want to use the NextViT transformer model as a backbone in UNINEXT. How do I register the model, and how do I use it? Do you have a document or resource about this?

Thanks for the help!

ONNX Format

I want to convert the model to ONNX format; can you help?

open vocabulary object detection in the wild

First, thanks a lot for the great work! Congratulations on your achievements on the different challenges!
I'm trying to use your pre-trained model for open-vocabulary object detection in the wild: simply providing an image and some arbitrary vocabulary (e.g., house, car, tree, light pole, fence, etc.), and hoping to detect those objects in the given image. I'm unsure whether I understand your work's capability correctly and whether this is a suitable task. I naively changed the following code in the demo script to hard-code the object classes.

if test_categories is not None:

to:

        test_categories = [{'color': [220, 20, 60], 'isthing': 1, 'id': 1, 'name': 'ground'}, {'color': [119, 11, 32], 'isthing': 1, 'id': 2, 'name': 'house'}, {'color': [0, 0, 142], 'isthing': 1, 'id': 3, 'name': 'tree'}, {'color': [0, 0, 230], 'isthing': 1, 'id': 4, 'name': 'car'}, {'color': [106, 0, 228], 'isthing': 1, 'id': 5, 'name': 'light pole'},{'color': [106, 0, 228], 'isthing': 1, 'id': 6, 'name': 'fence'}]
        # test_categories = None
        if test_categories is not None:
            prompt_test, positive_map_label_to_token = create_queries_and_maps(test_categories, self.tokenizer) # for example, test_categories = [{"name": "person"}]
            self.prompt_test_dict["xxx"] = prompt_test
            self.positive_map_label_to_token_dict["xxx"] = positive_map_label_to_token

But when testing on different images, it seems that only the cars are detected and the other objects are not. Could you please help me with this?

Thank you in advance!

Include a Colab demo

This is amazing work. Congratulations! I can't wait to test it.

Could you please provide a Colab notebook for simple testing?

Thanks!

Average Precision Values

I want to calculate the AP values for each category and display them in a table. Could you help?

Adjust num_classes to custom dataset

I am training your model for instance segmentation on my own dataset and saw that there are 3 mentions of num_classes in the config:

MODEL:
  DDETRS:
    NUM_CLASSES: null
  RETINANET:
    NUM_CLASSES: 80
  ROI_HEADS:
    NUM_CLASSES: 80
  SEM_SEG_HEAD:
    NUM_CLASSES: 54

Which of these have to be adjusted for the instance segmentation task?
Does anything else have to be adjusted?

Thank you very much for this fascinating project.

Why use gt_masks instead of gt_boxes?

Thanks to the author for such a great job!
I have a question about the masks: why use gt_masks instead of gt_boxes after data augmentation? If gt_boxes were used directly, how much would the final experimental results differ?
# dataset_mapper_uni_vid.UniVidDatasetMapper
if instances.has("gt_masks"):
    instances.gt_boxes = instances.gt_masks.get_bounding_boxes()

Model mismatches when loading

Hello, I am trying to reproduce the results for the SOT task (on LaSOT_ext) at test time, since I would like to analyze the trained model on another dataset.

Specifically, I have followed the whole procedure described in DATA.md for data preparation (models and datasets).
Moreover, I have downloaded the models from MODEL_ZOO.md (image_joint_r50, image_joint_convnext_large, and image_joint_vit_huge_32g from Stage 2) and converted them from 3c to 4c with your script.
Finally, I ran the assets/infer.sh file.

Anyway, following this procedure, I am still getting some mismatches when loading the models (BERT and image_joint_r50).

What am I doing wrong?

Bug: AttributeError: DO_LN

In image_joint_r50.yaml, if USE_DINO is set to False, there is a bug:
Traceback (most recent call last):
  File "/root/UNINEXT/projects/UNINEXT/train_net.py", line 247, in <module>
    args=(args,),
  File "/root/UNINEXT/detectron2/engine/launch.py", line 82, in launch
    main_func(*args)
  File "/root/UNINEXT/projects/UNINEXT/train_net.py", line 225, in main
    trainer = Trainer(cfg)
  File "/root/UNINEXT/detectron2/engine/defaults.py", line 376, in __init__
    model = self.build_model(cfg)
  File "/root/UNINEXT/detectron2/engine/defaults.py", line 517, in build_model
    model = build_model(cfg)
  File "/root/UNINEXT/detectron2/modeling/meta_arch/build.py", line 22, in build_model
    model = META_ARCH_REGISTRY.get(meta_arch)(cfg)
  File "/root/UNINEXT/projects/UNINEXT/uninext/uninext_img.py", line 113, in __init__
    cfg=cfg)
  File "/root/UNINEXT/projects/UNINEXT/uninext/models/deformable_detr/deformable_transformer.py", line 98, in __init__
    do_ln=cfg.MODEL.DO_LN
  File "/root/anaconda3/envs/yolov7/lib/python3.7/site-packages/yacs/config.py", line 141, in __getattr__
    raise AttributeError(name)
AttributeError: DO_LN

Process finished with exit code 1
It seems like there is no "cfg.MODEL.DO_LN". How should "cfg.MODEL.DO_LN" be set?

Results much worse than paper. Is it possible I do not have the correct models?

The following shows the results for OVIS and YouTube-VIS:

[Result screenshots: ovis_vis, youtube_vis]

I am running these using infer.sh and commenting out the other tasks. I also added the flag --np 1 to each of the launch commands, as I am running everything on a single 4090.

I followed the instructions in INSTALL.md and did the following:

  1. Downloaded the language model (BERT-base) according to DATA.md.
  2. Downloaded the ResNet-50 models for image and video from MODEL_ZOO.md (steps 2 and 3).
  3. Converted the 3c checkpoints to 4c with python3 conversion/convert_3c_to_4c_pth.py.
  4. Downloaded the respective datasets and ran the corresponding infer.sh code.

Some things that I am not sure about:

I did not download the pretrained models from the pretrained-weights section of DATA.md, as they seem to be the backbones before training on Objects365.

Do I have the right BERT model? It seems like I am downloading the base BERT model, but the paper says in Section 4.1 that its parameters are trained in the first two stages. That BERT model doesn't seem to be in MODEL_ZOO.md.

Evaluation of BDD100K MOTS

When I run the script for MOTS as in TEST.md with

# convert to BDD100K format (bitmask)
python3 tools_bin/to_bdd100k.py --res outputs/${EXP_NAME}/inference/instances_predictions_init_0.40_obj_0.30.pkl --task seg_track --bdd-dir . --nproc 32
# evaluate
bash tools_bin/eval_bdd_submit.sh

The evaluation obtains 0 for all metrics, so the file it refers to is probably empty. I checked the script tools_bin/to_bdd100k.py, and it seems that it does not save any file in the preds2bdd100k function. Is this expected and am I missing something, or is there another problem?

Local test metrics are much higher than the paper's metrics!?

Thanks to the author for such a great job!

When I tested on the COCO val2017 dataset (detection, instance segmentation) locally, the local results were much better than those in the paper. The local tests were done with 8x3090 GPUs, python=3.8, torch=1.10.0. I wonder: what improvements have been made to the weights released on GitHub?

All weights come from "Stage 2: Image-level joint training".

[Screenshot 2023-03-14 163305]

Once again, many thanks to the author for the excellent work!

How to test on a video?

I can run the demo on an image, but it doesn't work for a video:

Traceback (most recent call last):
  File "projects/UNINEXT/demo.py", line 208, in <module>
    for vis_frame in tqdm.tqdm(demo.run_on_video(video, args.confidence_threshold), total=num_frames):
  File "/home/csyun/.local/lib/python3.8/site-packages/tqdm/std.py", line 1195, in __iter__
    for obj in iterable:
  File "/home/csyun/Desktop/UNINEXT/projects/UNINEXT/predictor.py", line 138, in run_on_video
    yield process_predictions(frame, self.predictor(frame))
TypeError: __call__() missing 1 required positional argument: 'task'

What is the problem?

File "/usr/lib/python3.10/multiprocessing/reduction.py", line 164, in recvfds raise RuntimeError('received %d items of ancdata' % RuntimeError: received 0 items of ancdata

This error came up for me. It seems that on some machines there is a resource limit that must be increased. The code below worked for me; this is a known issue with the Python multiprocessing library.

import resource
rlimit = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (2048, rlimit[1]))
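
A different workaround that is often suggested when this error comes from a PyTorch DataLoader (it is not mentioned in this thread, so treat it as an optional alternative) is to switch PyTorch's tensor-sharing strategy to the file system:

import torch.multiprocessing

# Share tensors between worker processes via temporary files instead of
# file descriptors, sidestepping the per-process descriptor limit.
torch.multiprocessing.set_sharing_strategy("file_system")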

A question of task-specific performance (RVOS)

Hi,

Thanks for the excellent work!

May I ask for some details about the R-VOS performance (57.2) in Table 11, for comparison?

Did you follow ReferFormer and perform joint training (combining Ref-COCO pseudo-sequences and YouTube-VOS for training), or pretrain-and-finetune (pretrain on the static Ref-COCO datasets and finetune on YouTube-VOS), or did you use a different strategy?

Question: How are the expressions sampled in object detection training?

Hello Bin,
I trained on an object detection dataset with only 6 categories, but I found that expressions_new still does not contain all of the categories. What sampling rules are applied? I located the function convert_object_detection_to_grounding_optimized_for_od in /workspace/UNINEXT/projects/UNINEXT/uninext/data/coco_dataset_mapper_uni.py; I guess it means that all positive categories are kept and some negative categories are randomly selected? Why is this necessary?

Thanks very much!

Information in DATA.md is wrong. Please update the data preparation info

python3 launch.py --np 1 --nn 0 --eval-only --uni 1 --config-file projects/UNINEXT/configs/eval-vid/video_joint_r50_eval_ovis.yaml --resume OUTPUT_DIR outputs/${EXP_NAME} MODEL.USE_IOU_BRANCH False

When running the above, the console tells me that the file structure for OVIS should be dataset/ovis/valid instead of dataset/ovis/val. Also important: the provided OVIS validation JSON lists annotations as null, and an annotations NoneType error comes up. Upon further inspection, the YouTube-VIS data doesn't have this in its JSON. Simply deleting the annotations key/value in the OVIS JSON fixes the problem.

ImportError: DLL load failed: %1 not a valid Win32 application.

Hello!
I followed INSTALL.md:

compile Deformable DETR

cd projects/UNINEXT/uninext/models/deformable_detr/ops
bash make.sh

After performing the above steps, I run test.py and get the following error:
import MultiScaleDeformableAttention as MSDA
ImportError: DLL load failed: %1 not a valid Win32 application.

How can I solve it?
Thank you so much.

MOT on video

Hello,

Great work!! Could you provide an example of MOT using a video as input?

I'm having trouble trying it.

Many thanks

Error when converting pretrained weights to 4c

For the Stage 3 pretrained weights, when running python3 conversion/convert_3c_to_4c_pth.py, I got this error:

Traceback (most recent call last):
  File "conversion/convert_3c_to_4c_pth.py", line 30, in <module>
    new_value[:, :-1, :, :] = v
RuntimeError: The expanded size of the tensor (3) must match the existing size (4) at non-singleton dimension 1. Target sizes: [64, 3, 7, 7]. Tensor sizes: [64, 4, 7, 7]

OmniLabel dataset

Hi @MasterBin-IIAU, thanks for sharing this wonderful work. We are wondering whether UNINEXT is applicable to the OmniLabel Dataset, which can be viewed as a multi-instance referring object detection dataset.
