
masterbin-iiau / uninext


[CVPR'23] Universal Instance Perception as Object Discovery and Retrieval

License: MIT License

Languages: Python 92.46%, Cuda 4.55%, C++ 2.36%, Shell 0.63%
Topics: instance-segmentation, object-detection, object-tracking, perception, referring-expression-comprehension, referring-expression-segmentation, unified-model, multi-object-tracking-segmentation, multiple-object-tracking, referring-video-object-segmentation

uninext's Introduction

Universal Instance Perception as Object Discovery and Retrieval

This is the official implementation of the paper "Universal Instance Perception as Object Discovery and Retrieval".


News

Highlight

  • UNINEXT has been accepted by CVPR 2023.
  • UNINEXT reformulates diverse instance perception tasks into a unified object discovery and retrieval paradigm and can flexibly perceive different types of objects by simply changing the input prompts.
  • UNINEXT achieves superior performance on 20 challenging benchmarks using a single model with a single set of model parameters.

Introduction

[Figure: "task radar" cube diagram of the 10 instance perception sub-tasks]

Object-centric understanding is one of the most essential and challenging problems in computer vision. In this work, we mainly discuss 10 sub-tasks, distributed over the vertices of the cube shown in the figure above. Since all of these tasks aim to perceive instances with certain properties, UNINEXT reorganizes them into three types according to the different input prompts (restated as a plain mapping after the list):

  • Category Names
    • Object Detection
    • Instance Segmentation
    • Multiple Object Tracking (MOT)
    • Multi-Object Tracking and Segmentation (MOTS)
    • Video Instance Segmentation (VIS)
  • Language Expressions
    • Referring Expression Comprehension (REC)
    • Referring Expression Segmentation (RES)
    • Referring Video Object Segmentation (R-VOS)
  • Target Annotations
    • Single Object Tracking (SOT)
    • Video Object Segmentation (VOS)
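
For quick reference, this grouping can also be written down as a plain mapping from prompt type to sub-tasks. The snippet below is purely a restatement of the list above, not code from the repository:

PROMPT_TO_TASKS = {
    "category names": [
        "object detection",
        "instance segmentation",
        "multiple object tracking (MOT)",
        "multi-object tracking and segmentation (MOTS)",
        "video instance segmentation (VIS)",
    ],
    "language expressions": [
        "referring expression comprehension (REC)",
        "referring expression segmentation (RES)",
        "referring video object segmentation (R-VOS)",
    ],
    "target annotations": [
        "single object tracking (SOT)",
        "video object segmentation (VOS)",
    ],
}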

Then we propose a unified prompt-guided object discovery and retrieval formulation to solve all the above tasks. Extensive experiments demonstrate that UNINEXT achieves superior performance on 20 challenging benchmarks.
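
To make the formulation concrete, here is a minimal, hypothetical sketch of the prompt-guided "discovery and retrieval" idea in plain Python. All names (Proposal, perceive, encode_prompt, discover, match) are illustrative placeholders invented for this sketch; they are not the repository's actual classes or functions.

from dataclasses import dataclass
from typing import Callable, List, Sequence

@dataclass
class Proposal:
    box: Sequence[float]        # (x1, y1, x2, y2) candidate instance box
    embedding: Sequence[float]  # instance embedding used for retrieval
    score: float                # objectness score from the discovery stage

def perceive(image, prompt,
             encode_prompt: Callable,   # prompt encoder (text or visual prompt)
             discover: Callable,        # prompt-aware object discovery
             match: Callable,           # similarity between instance and prompt embeddings
             threshold: float = 0.5) -> List[Proposal]:
    """One routine serves all sub-tasks: only the prompt changes."""
    prompt_emb = encode_prompt(prompt)        # category names, an expression, or a target annotation
    proposals = discover(image, prompt_emb)   # discovery: generate prompt-aware candidates
    # Retrieval: keep the candidates whose embeddings match the prompt.
    return [p for p in proposals if match(p.embedding, prompt_emb) > threshold]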

Demo

[Demo video: UNINEXT_DEMO_VID_9M.mp4]

UNINEXT can flexibly perceive various types of objects by simply changing the input prompts, such as category names, language expressions, and target annotations. We also provide a simple demo script, which supports 4 image-level tasks (object detection, instance segmentation, REC, RES).

Results

Retrieval by Category Names

[Qualitative results: object detection / instance segmentation and MOT / MOTS / VIS]

Retrieval by Language Expressions

[Qualitative results: REC / RES / R-VOS]

Retrieval by Target Annotations

[Qualitative results: SOT / VOS]

Getting started

  1. Installation: Please refer to INSTALL.md for more details.
  2. Data preparation: Please refer to DATA.md for more details.
  3. Training: Please refer to TRAIN.md for more details.
  4. Testing: Please refer to TEST.md for more details.
  5. Model zoo: Please refer to MODEL_ZOO.md for more details.

Citing UNINEXT

If you find UNINEXT useful in your research, please consider citing:

@inproceedings{UNINEXT,
  title={Universal Instance Perception as Object Discovery and Retrieval},
  author={Yan, Bin and Jiang, Yi and Wu, Jiannan and Wang, Dong and Yuan, Zehuan and Luo, Ping and Lu, Huchuan},
  booktitle={CVPR},
  year={2023}
}

Acknowledgments

  • Thanks to Unicorn for its experience in unifying four object tracking tasks (SOT, MOT, VOS, MOTS).
  • Thanks to VNext for its experience in Video Instance Segmentation (VIS).
  • Thanks to ReferFormer for its experience in REC, RES, and R-VOS.
  • Thanks to GLIP for the idea of unifying object detection and phrase grounding.
  • Thanks to Detic for the implementation of multi-dataset training.
  • Thanks to detrex for the implementation of the denoising mechanism.

uninext's People

Contributors

lkeab, masterbin-iiau


uninext's Issues

SOT Dataset with multiple annotations

Hello!

I would like to test your model on the perception_test dataset.
They evaluate the SOT task on videos with multiple annotations (for each video there is more than one object), but unfortunately they use a completely different format.
Is there a way to adapt the SOT dataloader to handle multiple objects without performing MOT (i.e., iterating over every object in the video)?

Or, alternatively, how could I add another dataset with a custom format?

Thank you very much!

What does "J&F 1st frame" mean?

Dear authors, thank you for your great work. I saw that you report "J&F 1st frame" on the paperswithcode leaderboard, but I could not find the definition of this metric. Could you tell me more about it? Thank you so much.

No such file or directory: 'owner_path/UNINEXT/configs'

Running setup.py install for detectron2 ... error
error: subprocess-exited-with-error

× Running setup.py install for detectron2 did not run successfully.
│ exit code: 1
╰─> [17 lines of output]
Traceback (most recent call last):
File "owner_path/UNINEXT/setup.py", line 128, in get_model_zoo_configs
os.symlink(source_configs_dir, destination)
FileExistsError: [Errno 17] File exists: 'owner_path/UNINEXT/configs' -> 'owner_path/UNINEXT/detectron2/model_zoo/configs'

  During handling of the above exception, another exception occurred:
  
  Traceback (most recent call last):
    File "<string>", line 2, in <module>
    File "<pip-setuptools-caller>", line 34, in <module>
    File "owner_path/UNINEXT/setup.py", line 155, in <module>
      package_data={"detectron2.model_zoo": get_model_zoo_configs()},
    File "owner_path/UNINEXT/setup.py", line 131, in get_model_zoo_configs
      shutil.copytree(source_configs_dir, destination)
    File "owner_path/miniconda3/envs/uninext/lib/python3.8/shutil.py", line 555, in copytree
      with os.scandir(src) as itr:
  FileNotFoundError: [Errno 2] No such file or directory: 'owner_path/UNINEXT/configs'
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: legacy-install-failure

× Encountered error while trying to install package.
╰─> detectron2

note: This is an issue with the package mentioned above, not pip.
hint: See above for output from the failure.

Time consumption of model training

It is a promising project in the field of unified network design.
According to the information provided in the paper, the model training process involves three stages and requires 32 and 16 A100 GPUs respectively. Could you please clarify how long the training process would take?

SOT test takes 21/22 hours on LaSOT_ext

Hello, I am trying to test the ResNet-50-based model on the LaSOT_ext dataset for the SOT task.
I only have 4 GPUs available, and inference takes forever (around 21-22 hours).
Is it supposed to be like that, or am I doing something wrong? I was expecting a couple of hours at most :)

Thank you!

Pre-trained checkpoint

It seems that the model should be pre-trained on the Objects365 dataset. However, we don't have enough computing resources.
So, could you please offer the pre-trained checkpoint?

Thanks very much!

Custom dataset AssertionError while testing

Thank you very much for this project. I have been trying to train your model for instance segmentation on my own dataset, which seems to work fine.
However, at the testing phase I get this error:

...
[03/29 22:18:34 d2.evaluation.evaluator]: Inference done 233/298. Dataloading: 0.0010 s/iter. Inference: 0.1157 s/iter. Eval: 3.6628 s/iter. Total: 3.7796 s/iter. ETA=0:04:05
[03/29 22:18:39 d2.evaluation.evaluator]: Inference done 271/298. Dataloading: 0.0010 s/iter. Inference: 0.1141 s/iter. Eval: 3.1437 s/iter. Total: 3.2588 s/iter. ETA=0:01:27
[03/29 22:18:42 d2.evaluation.evaluator]: Total inference time: 0:14:30.508410 (2.971018 s / iter per device, on 1 devices)
[03/29 22:18:42 d2.evaluation.evaluator]: Total inference pure compute time: 0:00:33 (0.113249 s / iter per device, on 1 devices)
[03/29 22:18:43 d2.evaluation.coco_evaluation]: Preparing results for COCO format ...
ERROR [03/29 22:18:43 d2.engine.train_loop]: Exception during training:
Traceback (most recent call last):
  File "UNINEXT/detectron2/engine/train_loop.py", line 150, in train
    self.after_step()
  File "UNINEXT/detectron2/engine/train_loop.py", line 180, in after_step
    h.after_step()
  File "UNINEXT/detectron2/engine/hooks.py", line 555, in after_step
    self._do_eval()
  File "UNINEXT/detectron2/engine/hooks.py", line 528, in _do_eval
    results = self._func()
  File "UNINEXT/detectron2/engine/defaults.py", line 455, in test_and_save_results
    self._last_eval_results = self.test(self.cfg, self.model)
  File "UNINEXT/detectron2/engine/defaults.py", line 620, in test
    results_i = inference_on_dataset(model, data_loader, evaluator)
  File "UNINEXT/detectron2/evaluation/evaluator.py", line 204, in inference_on_dataset
    results = evaluator.evaluate()
  File "UNINEXT/detectron2/evaluation/coco_evaluation.py", line 211, in evaluate
    self._eval_predictions(predictions, img_ids=img_ids)
  File "UNINEXT/detectron2/evaluation/coco_evaluation.py", line 250, in _eval_predictions
    assert category_id <= num_classes, (
AssertionError: A prediction has class=56, but the dataset only has 21 classes and predicted class id should be in [0, 20].

My config file looks like this:

_BASE_: "image_joint_r50.yaml"
MODEL:
  WEIGHTS: "weights/R-50.pkl"
DATASETS:
  TRAIN: ("coco_data_train", )
  TEST: ("coco_data_test", )
SOLVER:
  IMS_PER_BATCH: 5 
  STEPS: (5000, 10000)  
  MAX_ITER: 20000
  CHECKPOINT_PERIOD: 2500
TEST:
  EVAL_PERIOD: 2499
DATALOADER:
  DATASET_RATIO: [1]
  DATASET_BS: [2]
  USE_RFS: [False]
  NUM_WORKERS: 1
OUTPUT_DIR: outputs/single_task_det

which is based on: single_task_det.yaml

I add my COCO dataset in the same way as described in objects365_v2.py, with the only difference that the range only goes up to 21, according to the number of classes in my dataset:

thing_dataset_id_to_contiguous_id = {i + 1: i for i in range(21)}

How I register my COCO data:

register_coco_instances(
  "coco_data_train",
  _get_builtin_metadata(),
  PATH_DATA + "coco_train_data.json",
  PATH_DATA + "train/",
)

register_coco_instances(
  "coco_data_test",
  _get_builtin_metadata(),
  PATH_DATA + "coco_test_data.json",
  PATH_DATA + "test/",
)

I would appreciate any help very much.

Can't reproduce the performance reported in the paper

Hi, I tested the video_joint_r50 model on DAVIS and YouTube-VOS 2018, but the results I got (result screenshots omitted) do not match the paper on either benchmark. What mistake might I be making? Can you give me some suggestions?

Ideas for arbitrary video testing of MOT/MOTS

Hi, I've looked at #8 and #12 as well as the codebase, and I'm trying to come up with a script that can run the UNINEXT models on the MOT/MOTS task with any arbitrary video as input.
It looks like the main constraint is that I will have to write a custom evaluator and dataloader for something like this.
Any ideas on how to approach this, or any alternative approaches?

Mainly, it looks like the data loader currently uses the DatasetCatalog; should we use something else?

Did you remove the RefCOCO test images from the COCO training set?

Hi,

Thanks for releasing this; it is really great work!

I recently discovered that the RefCOCO annotations, including the validation and test splits, were all made on images from the training set of the original COCO dataset. I think this was an unfortunate choice for the annotations, but now we have to live with it.
This means that the COCO training set overlaps with the RefCOCO test set.

Did you filter those overlapping images out of the training set? If not, the scores for RefCOCO (and also RefCOCO+ and RefCOCOg) would be too optimistic, because the instances that are referred to have already been seen during instance segmentation training.
For a fair comparison to previous work, it would be necessary to re-train with those images removed (if you didn't remove them).

How to use an image as a prompt?

I'm trying to do SOT with an image prompt. While I was able to test language-expression-based SOT without problems, there doesn't seem to be a good demo available for an image prompt. I tried to check the code, but it's not that easy to follow. Can you provide a simple example or sample code to help me test SOT with an image prompt? I want to test it with an image, not a video.

Some minor questions :)

Hi authors, thanks for your great work! Here are a few minor questions:

  1. Why is the final choice of transformer backbone ViT? Have you considered Swin/Focal/DaViT, etc., or is it because ViT-H has pretrained weights?
  2. Since your work adopts early fusion of the vision and language models, for Objects365 or other datasets with many classes, do you need to forward the model multiple times?
  3. Is there any referring segmentation evaluation code in the current release?
  4. Is the retrieval-by-annotation function used for VOS?

Thanks a lot!

huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': 'projects/UNINEXT/bert-base-uncased'. Use `repo_type` argument if needed.

Hello, when I use this command:
python3 projects/UNINEXT/demo.py --config-file projects/UNINEXT/configs/image_joint_r50.yaml --input assets/demo.jpg --output demo/detection.jpg --task detection --opts MODEL.WEIGHTS outputs/image_joint_r50/model_final.pth

I encountered an error like this:
huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': 'projects/UNINEXT/bert-base-uncased'. Use repo_type argument if needed.

Adding a new backbone

I want to use the NextViT transformer model as a backbone in UNINEXT. How do I register the model, and how do I use it? Do you have a document or resource about this?

Thanks for the help!

ONNX Format

I want to convert the model to ONNX format; can you help?

open vocabulary object detection in the wild

First, thanks a lot for the great work! Congratulations on your achievements on the different challenges!
I'm trying to use your pre-trained model for open-vocabulary object detection in the wild: simply providing an image and some arbitrary vocabulary (e.g., house, car, tree, light pole, fence, etc.), and hoping to detect those objects in the given image. I'm unsure whether I understand your work's capability correctly and whether this is a suitable task. I naively changed the following code in the demo script to hard-code the object classes.

if test_categories is not None:

to:

        test_categories = [{'color': [220, 20, 60], 'isthing': 1, 'id': 1, 'name': 'ground'}, {'color': [119, 11, 32], 'isthing': 1, 'id': 2, 'name': 'house'}, {'color': [0, 0, 142], 'isthing': 1, 'id': 3, 'name': 'tree'}, {'color': [0, 0, 230], 'isthing': 1, 'id': 4, 'name': 'car'}, {'color': [106, 0, 228], 'isthing': 1, 'id': 5, 'name': 'light pole'},{'color': [106, 0, 228], 'isthing': 1, 'id': 6, 'name': 'fence'}]
        # test_categories = None
        if test_categories is not None:
            prompt_test, positive_map_label_to_token = create_queries_and_maps(test_categories, self.tokenizer) # for example, test_categories = [{"name": "person"}]
            self.prompt_test_dict["xxx"] = prompt_test
            self.positive_map_label_to_token_dict["xxx"] = positive_map_label_to_token

But when testing on different images, it seems that only the cars are detected and the other objects are not. Could you please help me with this?

Thank you in advance!

Include a Colab demo

This is amazing work. Congratulations! I can't wait to test it.

Could you please provide a Colab notebook for simple testing?

Thanks!

Average Precision Values

I want to calculate the AP values for each category and display them in a table. Could you help?

Adjust num_classes to custom dataset

I am training your model for instance segmentation on my own dataset and saw that there are 3 mentions of num_classes in the config:

MODEL:
  DDETRS:
    NUM_CLASSES: null
  RETINANET:
    NUM_CLASSES: 80
  ROI_HEADS:
    NUM_CLASSES: 80
  SEM_SEG_HEAD:
    NUM_CLASSES: 54

Which of these have to be adjusted for the instance segmentation task?
Does anything else have to be adjusted?

Thank you very much for this fascinating project.

Why use gt_masks instead of gt_boxes?

Thanks to the author for such a great job!
I have a question about the masks: why use gt_masks instead of gt_boxes after data augmentation? If gt_boxes were used directly, how much would the final experimental results differ?
# dataset_mapper_uni_vid.UniVidDatasetMapper
if instances.has("gt_masks"):
    instances.gt_boxes = instances.gt_masks.get_bounding_boxes()

Model mismatches when loading

Hello, I am trying to reproduce the results for the SOT task (on LaSOT_ext) at test time, since I would like to analyze the trained model on another dataset.

Specifically, I have followed the whole procedure described in DATA.md for data preparation (models and datasets).
Moreover, I have downloaded the models from MODEL_ZOO.md (image_joint_r50, image_joint_convnext_large, and image_joint_vit_huge_32g from Stage 2) and converted them from 3c to 4c with your script.
Finally, I ran the assets/infer.sh file.

Anyway, following this procedure, I am still getting some mismatches when loading the models (BERT and image_joint_r50).

What am I doing wrong?

Bug: AttributeError: DO_LN

In image_joint_r50.yaml, if USE_DINO is set to False, there is a bug:
Traceback (most recent call last):
  File "/root/UNINEXT/projects/UNINEXT/train_net.py", line 247, in <module>
    args=(args,),
  File "/root/UNINEXT/detectron2/engine/launch.py", line 82, in launch
    main_func(*args)
  File "/root/UNINEXT/projects/UNINEXT/train_net.py", line 225, in main
    trainer = Trainer(cfg)
  File "/root/UNINEXT/detectron2/engine/defaults.py", line 376, in __init__
    model = self.build_model(cfg)
  File "/root/UNINEXT/detectron2/engine/defaults.py", line 517, in build_model
    model = build_model(cfg)
  File "/root/UNINEXT/detectron2/modeling/meta_arch/build.py", line 22, in build_model
    model = META_ARCH_REGISTRY.get(meta_arch)(cfg)
  File "/root/UNINEXT/projects/UNINEXT/uninext/uninext_img.py", line 113, in __init__
    cfg=cfg)
  File "/root/UNINEXT/projects/UNINEXT/uninext/models/deformable_detr/deformable_transformer.py", line 98, in __init__
    do_ln=cfg.MODEL.DO_LN
  File "/root/anaconda3/envs/yolov7/lib/python3.7/site-packages/yacs/config.py", line 141, in __getattr__
    raise AttributeError(name)
AttributeError: DO_LN

Process finished with exit code 1
It seems like there is no "cfg.MODEL.DO_LN". How should "cfg.MODEL.DO_LN" be set?

Results much worse than paper. Is it possible I do not have the correct models?

The following shows the results for OVIS and YouTube-VIS:

[Result screenshots: ovis_vis, youtube_vis]

I am running these using infer.sh and commenting out the other tasks. I also added the flag --np 1 to each of the launch commands, as I am running everything on a single 4090.

I followed the instructions in INSTALL.md and did the following:

  1. Downloaded the language model (BERT-base) according to DATA.md.
  2. Downloaded the ResNet-50 models for image and video from MODEL_ZOO.md (steps 2 and 3).
  3. Converted the 3c checkpoints to 4c with python3 conversion/convert_3c_to_4c_pth.py.
  4. Downloaded the respective datasets and ran the corresponding infer.sh code.

Some things that I am not sure about:

I did not download the pretrained models from the pretrained-weights section of DATA.md, as they seem to be the backbones before training on Objects365.

Do I have the right BERT model? It seems like I am downloading the base BERT model, but the paper says in Section 4.1 that its parameters are trained in the first two stages. That BERT model doesn't seem to be in MODEL_ZOO.md.

Evaluation of BDD100K MOTS

When I run the script for MOTS as in TEST.md with

# convert to BDD100K format (bitmask)
python3 tools_bin/to_bdd100k.py --res outputs/${EXP_NAME}/inference/instances_predictions_init_0.40_obj_0.30.pkl --task seg_track --bdd-dir . --nproc 32
# evaluate
bash tools_bin/eval_bdd_submit.sh

The evaluation obtains 0 for all metrics, so the file it refers to is probably empty. I checked the script tools_bin/to_bdd100k.py, and it seems that it does not save any file in the preds2bdd100k function. Is this expected and am I missing something, or is there another problem?

Local test metrics are much higher than the paper's metrics!?

Thanks to the author for such a great job!

When I tested on the COCO val2017 dataset (detection, instance segmentation) locally, the local results were much better than those in the paper. The local tests were done with 8x3090 GPUs, python=3.8, torch=1.10.0. I wonder: what improvements have been made to the weights released on GitHub?

All weights come from "Stage 2: Image-level joint training".

[Screenshot 2023-03-14 163305]

Once again, many thanks to the author for the excellent work!

How to test on a video?

I can run the demo on an image, but it doesn't work for a video:

Traceback (most recent call last):
  File "projects/UNINEXT/demo.py", line 208, in <module>
    for vis_frame in tqdm.tqdm(demo.run_on_video(video, args.confidence_threshold), total=num_frames):
  File "/home/csyun/.local/lib/python3.8/site-packages/tqdm/std.py", line 1195, in __iter__
    for obj in iterable:
  File "/home/csyun/Desktop/UNINEXT/projects/UNINEXT/predictor.py", line 138, in run_on_video
    yield process_predictions(frame, self.predictor(frame))
TypeError: __call__() missing 1 required positional argument: 'task'

What is the problem?

File "/usr/lib/python3.10/multiprocessing/reduction.py", line 164, in recvfds raise RuntimeError('received %d items of ancdata' % RuntimeError: received 0 items of ancdata

This error came up for me. It seems that on some machines there is a resource limit that must be increased. The code below worked for me; this is a known issue with the Python multiprocessing library.

import resource
rlimit = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (2048, rlimit[1]))
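
A different workaround that is often suggested when this error comes from a PyTorch DataLoader (it is not mentioned in this thread, so treat it as an optional alternative) is to switch PyTorch's tensor-sharing strategy to the file system:

import torch.multiprocessing

# Share tensors between worker processes via temporary files instead of
# file descriptors, sidestepping the per-process descriptor limit.
torch.multiprocessing.set_sharing_strategy("file_system")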

A question of task-specific performance (RVOS)

Hi,

Thanks for the excellent work!

May I ask for some details about the R-VOS performance (57.2) in Table 11, for comparison?

Did you follow ReferFormer and perform joint training (combining Ref-COCO pseudo-sequences and YouTube-VOS for training), or pretrain-and-finetune (pretrain on the static Ref-COCO datasets and finetune on YouTube-VOS), or did you use a different strategy?

Question: How are the expressions sampled in object detection training?

Hello Bin,
I trained on an object detection dataset with only 6 categories, but I found that expressions_new still does not contain all of the categories. What sampling rules are applied? I located the function convert_object_detection_to_grounding_optimized_for_od in /workspace/UNINEXT/projects/UNINEXT/uninext/data/coco_dataset_mapper_uni.py; I guess it means that all positive categories are kept and some negative categories are randomly selected? Why is this necessary?

Thanks very much!

Information in DATA.md is wrong. Please update the data preparation info

python3 launch.py --np 1 --nn 0 --eval-only --uni 1 --config-file projects/UNINEXT/configs/eval-vid/video_joint_r50_eval_ovis.yaml --resume OUTPUT_DIR outputs/${EXP_NAME} MODEL.USE_IOU_BRANCH False

When running the above, the console tells me that the file structure for OVIS should be dataset/ovis/valid instead of dataset/ovis/val. Also important: the provided OVIS validation JSON lists annotations as null, and an annotations NoneType error comes up. Upon further inspection, the YouTube-VIS data doesn't have this in its JSON. Simply deleting the annotations key/value in the OVIS JSON fixes the problem.

ImportError: DLL load failed: %1 not a valid Win32 application.

Hello!
I followed INSTALL.md:

compile Deformable DETR

cd projects/UNINEXT/uninext/models/deformable_detr/ops
bash make.sh

After performing the above steps, I run test.py and get the following error:
import MultiScaleDeformableAttention as MSDA
ImportError: DLL load failed: %1 not a valid Win32 application.

How can I solve it?
Thank you so much.

MOT on video

Hello,

Great work!! Could you provide an example of MOT using a video as input?

I'm having trouble trying it.

Many thanks

Error when converting pretrained weights to 4c

For the Stage 3 pretrained weights, when running python3 conversion/convert_3c_to_4c_pth.py, I got this error:

Traceback (most recent call last):
  File "conversion/convert_3c_to_4c_pth.py", line 30, in <module>
    new_value[:, :-1, :, :] = v
RuntimeError: The expanded size of the tensor (3) must match the existing size (4) at non-singleton dimension 1. Target sizes: [64, 3, 7, 7]. Tensor sizes: [64, 4, 7, 7]

OmniLabel dataset

Hi @MasterBin-IIAU, thanks for sharing this wonderful work. We are wondering whether UNINEXT is applicable to the OmniLabel Dataset, which can be viewed as a multi-instance referring object detection dataset.
