
DEVA: Tracking Anything with Decoupled Video Segmentation

titlecard

Ho Kei Cheng, Seoung Wug Oh, Brian Price, Alexander Schwing, Joon-Young Lee

University of Illinois Urbana-Champaign and Adobe

ICCV 2023

[arXiv] [PDF] [Project Page] Open In Colab

Highlights

  1. Provides long-term, open-vocabulary video segmentation with text prompts out of the box.
  2. Fairly easy to integrate your own image model! Wouldn't you or your reviewers be interested in seeing examples where your image model also works well on videos 😏? No finetuning is needed!

Note (Mar 6 2024): We have fixed a major bug (introduced in the last update) that prevented the deletion of unmatched segments in text/eval_with_detections modes. This should greatly reduce the amount of accumulated noisy detections/false positives, especially in long videos. See #64.

Note (Sep 12 2023): We have improved automatic video segmentation by not querying the points in segmented regions. We correspondingly increased the number of query points per side to 64 and deprecated the "engulf" mode. The old code can be found in the "legacy_engulf" branch. The new code should run a lot faster and capture smaller objects. The text-prompted mode is still recommended for better results.

Note (Sep 11 2023): We have removed the "pluralize" option as it sometimes behaves unpredictably with GroundingDINO. If needed, please pluralize the prompt yourself.

Abstract

We develop a decoupled video segmentation approach (DEVA), composed of task-specific image-level segmentation and class/task-agnostic bi-directional temporal propagation. Due to this design, we only need an image-level model for the target task and a universal temporal propagation model which is trained once and generalizes across tasks. To effectively combine these two modules, we propose a (semi-)online fusion of segmentation hypotheses from different frames to generate a coherent segmentation. We show that this decoupled formulation compares favorably to end-to-end approaches in several tasks, most notably in large-vocabulary video panoptic segmentation and open-world video segmentation.

Demo Videos

Demo with Grounded Segment Anything (text prompt: "guinea pigs" and "chicken"):

geinua.mp4

Source: https://www.youtube.com/watch?v=FM9SemMfknA

Demo with Grounded Segment Anything (text prompt: "pigs"):

piglets.mp4

Source: https://youtu.be/FbK3SL97zf8

Demo with Grounded Segment Anything (text prompt: "capybara"):

capybara_ann.mp4

Source: https://youtu.be/couz1CrlTdQ

Demo with Segment Anything (automatic points-in-grid prompting); the original video follows the DEVA result overlaid on the video:

soapbox_joined.mp4

Source: DAVIS 2017 validation set "soapbox"

Demo with Segment Anything on an out-of-domain example; the original video follows the DEVA result overlaid on the video:

green_pepper_joined.mp4

Source: https://youtu.be/FQQaSyH9hZI

Installation

Tested on Ubuntu only. For installation on Windows WSL2, refer to #20 (thanks @21pl).

Prerequisite:

  • Python 3.9+
  • PyTorch 1.12+ and corresponding torchvision

Clone our repository:

git clone https://github.com/hkchengrex/Tracking-Anything-with-DEVA.git

Install with pip:

cd Tracking-Anything-with-DEVA
pip install -e .

(If you encounter a File "setup.py" not found error, upgrade pip with pip install --upgrade pip.)

Download the pretrained models:

bash scripts/download_models.sh

Required for the text-prompted/automatic demo:

Install our fork of Grounded-Segment-Anything. Follow its instructions.

Grounding DINO installation might fail silently. Try python -c "from groundingdino.util.inference import Model as GroundingDINOModel". If you get a warning about running in CPU-only mode, make sure you had CUDA_HOME set during Grounding DINO installation.
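
For example, a minimal reinstall with CUDA_HOME set might look like this (the toolkit path is an assumption; point it at wherever nvcc lives on your system):

export CUDA_HOME=/usr/local/cuda
cd Grounded-Segment-Anything
python -m pip install -e GroundingDINO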

(Optional) For fast integer program solving in the semi-online setting:

Get a Gurobi license, which is free for academic use. If a license is not found, we fall back to PuLP, which is slower and not rigorously tested by us. All experiments were conducted with Gurobi.
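
If Gurobi cannot find your license, pointing its standard GRB_LICENSE_FILE environment variable at the license file may help (the path below is an assumption):

export GRB_LICENSE_FILE=$HOME/gurobi.lic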

Quick Start

DEMO.md contains more details on the input arguments and tips on speeding up inference. You can always look at deva/inference/eval_args.py and deva/ext/ext_eval_args.py for a full list of arguments.

With gradio:

python demo/demo_gradio.py

Then visit the link printed in the terminal. If executing on a remote server, try port forwarding (see the example below).
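
For example, assuming Gradio's default port 7860, SSH port forwarding would look like this (user and host are placeholders):

ssh -L 7860:localhost:7860 user@remote-server

Then open http://localhost:7860 in your local browser.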

We have prepared an example in example/vipseg/12_1mWNahzcsAc (a clip from the VIPSeg dataset). The following two scripts segment the example clip using either Grounded Segment Anything with text prompts or SAM with automatic (points in grid) prompting.

Script (text-prompted):

python demo/demo_with_text.py --chunk_size 4 \
--img_path ./example/vipseg/images/12_1mWNahzcsAc \
--amp --temporal_setting semionline \
--size 480 \
--output ./example/output --prompt person.hat.horse

Script (automatic):

python demo/demo_automatic.py --chunk_size 4 \
--img_path ./example/vipseg/images/12_1mWNahzcsAc \
--amp --temporal_setting semionline \
--size 480 \
--output ./example/output

Training and Evaluation

  1. Running DEVA with your own detection model (a rough integration sketch follows this list).
  2. Running DEVA with detections to reproduce the benchmark results.
  3. Training the DEVA model.
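
For a rough feel of what "integrating your own image model" looks like, here is a heavily hedged Python sketch. DEVAInferenceCore, get_model_and_config, and the detection_every config key all appear elsewhere on this page; run_my_image_model and incorporate_detections are hypothetical placeholders for your own code, not repo APIs — see the linked documents above for the real interface.

# Hedged sketch only: detect with an image model every few frames,
# let DEVA's temporal propagation handle the frames in between.
from argparse import ArgumentParser
from deva.inference.eval_args import get_model_and_config
from deva.inference.inference_core import DEVAInferenceCore

parser = ArgumentParser()
deva_model, cfg, args = get_model_and_config(parser)  # as in the demo scripts
deva = DEVAInferenceCore(deva_model, config=cfg)

for ti, frame in enumerate(frames):  # frames: your decoded video frames (placeholder)
    if ti % cfg['detection_every'] == 0:
        segments = run_my_image_model(frame)                # hypothetical: your image-level model
        incorporate_detections(deva, frame, ti, segments)   # hypothetical glue, cf. deva/ext/*_processor.py
    else:
        incorporate_detections(deva, frame, ti, None)       # propagation only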

Limitations

  • On closed-set data, DEVA most likely does not work as well as end-to-end approaches. Joint training is (for now) still a better idea when you have enough target data.
  • Positive detections are amplified temporally due to propagation. Having a detector with a lower false positive rate (i.e., a higher threshold) helps.
  • If new objects are entering and leaving the scene all the time (e.g., in driving scenes), we keep many objects in the memory bank, which unfortunately increases the false positive rate. Decreasing max_missed_detection_count might help, since we then delete objects from memory more eagerly (see the example below).
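
For example, building on the text-prompted demo command from the Quick Start (the flag is assumed to be exposed by the demo argument parsers; the key appears in the configuration dump further down this page):

python demo/demo_with_text.py --chunk_size 4 \
--img_path ./example/vipseg/images/12_1mWNahzcsAc \
--amp --temporal_setting semionline \
--size 480 \
--output ./example/output --prompt person.hat.horse \
--max_missed_detection_count 5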

Citation

@inproceedings{cheng2023tracking,
  title={Tracking Anything with Decoupled Video Segmentation},
  author={Cheng, Ho Kei and Oh, Seoung Wug and Price, Brian and Schwing, Alexander and Lee, Joon-Young},
  booktitle={ICCV},
  year={2023}
}

References

The demo would not be possible without ❤️ from the community:

Grounded Segment Anything: https://github.com/IDEA-Research/Grounded-Segment-Anything

Segment Anything: https://github.com/facebookresearch/segment-anything

XMem: https://github.com/hkchengrex/XMem

Title card generated with OpenPano: https://github.com/ppwwyyxx/OpenPano


tracking-anything-with-deva's Issues

License

Cool :)
Could the repo get an MIT or GPL license?

Problem installing Grounded-Segment-Anything with Cuda 12.2

Hi,

I was trying your Tracking-Anything-with-DEVA, but when I install GroundingDINO, it says:

        The detected CUDA version (12.2) mismatches the version that was used to compile
        PyTorch (11.8). Please make sure to use the same CUDA versions.
        
        [end of output]

May I know how I can solve it? Thx

Below is the full log. Thx

python -m pip install -e GroundingDINO
Obtaining file:///media/ak/HD/DEVA/Grounded-Segment-Anything/Grounded-Segment-Anything/GroundingDINO
  Preparing metadata (setup.py) ... done
Requirement already satisfied: torch in /home/ak/anaconda3/lib/python3.9/site-packages (from groundingdino==0.1.0) (2.0.1+cu118)
Requirement already satisfied: torchvision in /home/ak/anaconda3/lib/python3.9/site-packages (from groundingdino==0.1.0) (0.15.2+cu118)
Requirement already satisfied: transformers in /home/ak/anaconda3/lib/python3.9/site-packages (from groundingdino==0.1.0) (4.30.2)
Requirement already satisfied: addict in /home/ak/anaconda3/lib/python3.9/site-packages (from groundingdino==0.1.0) (2.4.0)
Requirement already satisfied: yapf in /home/ak/anaconda3/lib/python3.9/site-packages (from groundingdino==0.1.0) (0.31.0)
Requirement already satisfied: timm in /home/ak/anaconda3/lib/python3.9/site-packages (from groundingdino==0.1.0) (0.9.2)
Requirement already satisfied: numpy in /home/ak/anaconda3/lib/python3.9/site-packages (from groundingdino==0.1.0) (1.23.5)
Requirement already satisfied: opencv-python in /home/ak/anaconda3/lib/python3.9/site-packages (from groundingdino==0.1.0) (4.7.0.68)
Requirement already satisfied: supervision in /home/ak/anaconda3/lib/python3.9/site-packages (from groundingdino==0.1.0) (0.6.0)
Requirement already satisfied: pycocotools in /home/ak/anaconda3/lib/python3.9/site-packages (from groundingdino==0.1.0) (2.0.7)
Requirement already satisfied: matplotlib>=2.1.0 in /home/ak/anaconda3/lib/python3.9/site-packages (from pycocotools->groundingdino==0.1.0) (3.8.0)
Requirement already satisfied: pyyaml in /home/ak/anaconda3/lib/python3.9/site-packages (from timm->groundingdino==0.1.0) (6.0)
Requirement already satisfied: huggingface-hub in /home/ak/anaconda3/lib/python3.9/site-packages (from timm->groundingdino==0.1.0) (0.15.1)
Requirement already satisfied: safetensors in /home/ak/anaconda3/lib/python3.9/site-packages (from timm->groundingdino==0.1.0) (0.3.1)
Requirement already satisfied: filelock in /home/ak/anaconda3/lib/python3.9/site-packages (from torch->groundingdino==0.1.0) (3.6.0)
Requirement already satisfied: typing-extensions in /home/ak/anaconda3/lib/python3.9/site-packages (from torch->groundingdino==0.1.0) (4.7.0)
Requirement already satisfied: sympy in /home/ak/anaconda3/lib/python3.9/site-packages (from torch->groundingdino==0.1.0) (1.10.1)
Requirement already satisfied: networkx in /home/ak/anaconda3/lib/python3.9/site-packages (from torch->groundingdino==0.1.0) (3.1)
Requirement already satisfied: jinja2 in /home/ak/anaconda3/lib/python3.9/site-packages (from torch->groundingdino==0.1.0) (2.11.3)
Requirement already satisfied: triton==2.0.0 in /home/ak/anaconda3/lib/python3.9/site-packages (from torch->groundingdino==0.1.0) (2.0.0)
Requirement already satisfied: cmake in /home/ak/anaconda3/lib/python3.9/site-packages (from triton==2.0.0->torch->groundingdino==0.1.0) (3.25.0)
Requirement already satisfied: lit in /home/ak/anaconda3/lib/python3.9/site-packages (from triton==2.0.0->torch->groundingdino==0.1.0) (15.0.7)
Requirement already satisfied: requests in /home/ak/anaconda3/lib/python3.9/site-packages (from torchvision->groundingdino==0.1.0) (2.27.1)
Requirement already satisfied: pillow!=8.3.*,>=5.3.0 in /home/ak/anaconda3/lib/python3.9/site-packages (from torchvision->groundingdino==0.1.0) (9.5.0)
Requirement already satisfied: packaging>=20.0 in /home/ak/anaconda3/lib/python3.9/site-packages (from transformers->groundingdino==0.1.0) (21.3)
Requirement already satisfied: regex!=2019.12.17 in /home/ak/anaconda3/lib/python3.9/site-packages (from transformers->groundingdino==0.1.0) (2022.3.15)
Requirement already satisfied: tokenizers!=0.11.3,<0.14,>=0.11.1 in /home/ak/anaconda3/lib/python3.9/site-packages (from transformers->groundingdino==0.1.0) (0.13.3)
Requirement already satisfied: tqdm>=4.27 in /home/ak/anaconda3/lib/python3.9/site-packages (from transformers->groundingdino==0.1.0) (4.64.0)
Requirement already satisfied: fsspec in /home/ak/anaconda3/lib/python3.9/site-packages (from huggingface-hub->timm->groundingdino==0.1.0) (2022.2.0)
Requirement already satisfied: contourpy>=1.0.1 in /home/ak/anaconda3/lib/python3.9/site-packages (from matplotlib>=2.1.0->pycocotools->groundingdino==0.1.0) (1.1.1)
Requirement already satisfied: cycler>=0.10 in /home/ak/anaconda3/lib/python3.9/site-packages (from matplotlib>=2.1.0->pycocotools->groundingdino==0.1.0) (0.11.0)
Requirement already satisfied: fonttools>=4.22.0 in /home/ak/anaconda3/lib/python3.9/site-packages (from matplotlib>=2.1.0->pycocotools->groundingdino==0.1.0) (4.25.0)
Requirement already satisfied: kiwisolver>=1.0.1 in /home/ak/anaconda3/lib/python3.9/site-packages (from matplotlib>=2.1.0->pycocotools->groundingdino==0.1.0) (1.3.2)
Requirement already satisfied: pyparsing>=2.3.1 in /home/ak/anaconda3/lib/python3.9/site-packages (from matplotlib>=2.1.0->pycocotools->groundingdino==0.1.0) (3.0.4)
Requirement already satisfied: python-dateutil>=2.7 in /home/ak/anaconda3/lib/python3.9/site-packages (from matplotlib>=2.1.0->pycocotools->groundingdino==0.1.0) (2.8.2)
Requirement already satisfied: importlib-resources>=3.2.0 in /home/ak/anaconda3/lib/python3.9/site-packages (from matplotlib>=2.1.0->pycocotools->groundingdino==0.1.0) (6.1.0)
Requirement already satisfied: MarkupSafe>=0.23 in /home/ak/anaconda3/lib/python3.9/site-packages (from jinja2->torch->groundingdino==0.1.0) (2.0.1)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/ak/anaconda3/lib/python3.9/site-packages (from requests->torchvision->groundingdino==0.1.0) (1.26.16)
Requirement already satisfied: certifi>=2017.4.17 in /home/ak/anaconda3/lib/python3.9/site-packages (from requests->torchvision->groundingdino==0.1.0) (2021.10.8)
Requirement already satisfied: charset-normalizer~=2.0.0 in /home/ak/anaconda3/lib/python3.9/site-packages (from requests->torchvision->groundingdino==0.1.0) (2.0.4)
Requirement already satisfied: idna<4,>=2.5 in /home/ak/anaconda3/lib/python3.9/site-packages (from requests->torchvision->groundingdino==0.1.0) (3.3)
Requirement already satisfied: mpmath>=0.19 in /home/ak/anaconda3/lib/python3.9/site-packages (from sympy->torch->groundingdino==0.1.0) (1.2.1)
Requirement already satisfied: zipp>=3.1.0 in /home/ak/anaconda3/lib/python3.9/site-packages (from importlib-resources>=3.2.0->matplotlib>=2.1.0->pycocotools->groundingdino==0.1.0) (3.7.0)
Requirement already satisfied: six>=1.5 in /home/ak/anaconda3/lib/python3.9/site-packages (from python-dateutil>=2.7->matplotlib>=2.1.0->pycocotools->groundingdino==0.1.0) (1.16.0)
DEPRECATION: torchsde 0.2.5 has a non-standard dependency specifier numpy>=1.19.*; python_version >= "3.7". pip 23.3 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of torchsde or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at https://github.com/pypa/pip/issues/12063
Installing collected packages: groundingdino
  Running setup.py develop for groundingdino
    error: subprocess-exited-with-error
    
    × python setup.py develop did not run successfully.
    │ exit code: 1
    ╰─> [59 lines of output]
        Building wheel groundingdino-0.1.0
        Compiling with CUDA
        running develop
        /home/ak/anaconda3/lib/python3.9/site-packages/setuptools/command/easy_install.py:144: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
          warnings.warn(
        /home/ak/anaconda3/lib/python3.9/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
          warnings.warn(
        running egg_info
        writing groundingdino.egg-info/PKG-INFO
        writing dependency_links to groundingdino.egg-info/dependency_links.txt
        writing requirements to groundingdino.egg-info/requires.txt
        writing top-level names to groundingdino.egg-info/top_level.txt
        /home/ak/anaconda3/lib/python3.9/site-packages/torch/utils/cpp_extension.py:476: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
          warnings.warn(msg.format('we could not find ninja.'))
        reading manifest file 'groundingdino.egg-info/SOURCES.txt'
        adding license file 'LICENSE'
        writing manifest file 'groundingdino.egg-info/SOURCES.txt'
        running build_ext
        Traceback (most recent call last):
          File "<string>", line 2, in <module>
          File "<pip-setuptools-caller>", line 34, in <module>
          File "/media/ak/HD/DEVA/Grounded-Segment-Anything/Grounded-Segment-Anything/GroundingDINO/setup.py", line 200, in <module>
            setup(
          File "/home/ak/anaconda3/lib/python3.9/site-packages/setuptools/__init__.py", line 87, in setup
            return distutils.core.setup(**attrs)
          File "/home/ak/anaconda3/lib/python3.9/site-packages/setuptools/_distutils/core.py", line 148, in setup
            return run_commands(dist)
          File "/home/ak/anaconda3/lib/python3.9/site-packages/setuptools/_distutils/core.py", line 163, in run_commands
            dist.run_commands()
          File "/home/ak/anaconda3/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 967, in run_commands
            self.run_command(cmd)
          File "/home/ak/anaconda3/lib/python3.9/site-packages/setuptools/dist.py", line 1214, in run_command
            super().run_command(command)
          File "/home/ak/anaconda3/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 986, in run_command
            cmd_obj.run()
          File "/home/ak/anaconda3/lib/python3.9/site-packages/setuptools/command/develop.py", line 34, in run
            self.install_for_development()
          File "/home/ak/anaconda3/lib/python3.9/site-packages/setuptools/command/develop.py", line 114, in install_for_development
            self.run_command('build_ext')
          File "/home/ak/anaconda3/lib/python3.9/site-packages/setuptools/_distutils/cmd.py", line 313, in run_command
            self.distribution.run_command(command)
          File "/home/ak/anaconda3/lib/python3.9/site-packages/setuptools/dist.py", line 1214, in run_command
            super().run_command(command)
          File "/home/ak/anaconda3/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 986, in run_command
            cmd_obj.run()
          File "/home/ak/anaconda3/lib/python3.9/site-packages/setuptools/command/build_ext.py", line 79, in run
            _build_ext.run(self)
          File "/home/ak/anaconda3/lib/python3.9/site-packages/Cython/Distutils/old_build_ext.py", line 186, in run
            _build_ext.build_ext.run(self)
          File "/home/ak/anaconda3/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 339, in run
            self.build_extensions()
          File "/home/ak/anaconda3/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 499, in build_extensions
            _check_cuda_version(compiler_name, compiler_version)
          File "/home/ak/anaconda3/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 387, in _check_cuda_version
            raise RuntimeError(CUDA_MISMATCH_MESSAGE.format(cuda_str_version, torch.version.cuda))
        RuntimeError:
        The detected CUDA version (12.2) mismatches the version that was used to compile
        PyTorch (11.8). Please make sure to use the same CUDA versions.
        
        [end of output]
    
    note: This error originates from a subprocess, and is likely not a problem with pip.

nvidia-smi
Mon Sep 25 22:45:06 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4090        On  | 00000000:01:00.0  On |                  Off |
|  0%   44C    P8              25W / 450W |    321MiB / 24564MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1312      G   /usr/lib/xorg/Xorg                          112MiB |
|    0   N/A  N/A      1593      G   /usr/bin/gnome-shell                         32MiB |
|    0   N/A  N/A      3812      G   ...irefox/3131/usr/lib/firefox/firefox      158MiB |
+---------------------------------------------------------------------------------------+

Video does not have browser-compatible container or codec. Converting to mp4?

Hi,

I can run the UI now, thx.
I have ffmpeg installed, but it still says the video has no "browser-compatible container or codec"?
I also tried sudo apt-get install ubuntu-restricted-extras, but it didn't help.
Is there any suggestion to solve the problem? Thx

python demo/demo_gradio.py
/home/ak/anaconda3/lib/python3.9/site-packages/gradio/blocks.py:950: UserWarning: api_name predict already exists, using predict_1
  warnings.warn(
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Error while flagging: [Errno 2] No such file or directory: 'https://user-images.githubusercontent.com/7107196/265518746-4a00cd0d-f712-447f-82c4-6152addffd6b.mp4'
/home/ak/anaconda3/lib/python3.9/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/conda/conda-bld/pytorch_1682343964576/work/aten/src/ATen/native/TensorShape.cpp:3483.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
final text_encoder_type: bert-base-uncased
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Configuration: {'model': './saves/DEVA-propagation.pth', 'output': None, 'save_all': False, 'amp': True, 'key_dim': 64, 'value_dim': 512, 'pix_feat_dim': 512, 'disable_long_term': False, 'max_mid_term_frames': 10, 'min_mid_term_frames': 5, 'max_long_term_elements': 10000, 'num_prototypes': 128, 'top_k': 30, 'mem_every': 5, 'chunk_size': 8, 'size': 480, 'GROUNDING_DINO_CONFIG_PATH': './saves/GroundingDINO_SwinT_OGC.py', 'GROUNDING_DINO_CHECKPOINT_PATH': './saves/groundingdino_swint_ogc.pth', 'DINO_THRESHOLD': 0.35, 'DINO_NMS_THRESHOLD': 0.8, 'SAM_ENCODER_VERSION': 'vit_h', 'SAM_CHECKPOINT_PATH': './saves/sam_vit_h_4b8939.pth', 'MOBILE_SAM_CHECKPOINT_PATH': './saves/mobile_sam.pt', 'SAM_NUM_POINTS_PER_SIDE': 64, 'SAM_NUM_POINTS_PER_BATCH': 64, 'SAM_PRED_IOU_THRESHOLD': 0.88, 'SAM_OVERLAP_THRESHOLD': 0.8, 'img_path': './example/vipseg', 'detection_every': 5, 'num_voting_frames': 3, 'temporal_setting': 'semionline', 'max_missed_detection_count': 10, 'max_num_objects': 200, 'prompt': 'pigs', 'sam_variant': 'original', 'enable_long_term': True, 'enable_long_term_count_usage': True}
  0%|                                                   | 0/789 [00:00<?, ?it/s]/home/ak/anaconda3/lib/python3.9/site-packages/torch/utils/checkpoint.py:31: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
  warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
2023-09-26 11:13:16.520850: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-09-26 11:13:16.978377: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-09-26 11:13:17.888671: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
  0%|                                           | 2/789 [00:05<27:41,  2.11s/it]Restricted license - for non-production use only - expires 2024-10-28
100%|█████████████████████████████████████████| 789/789 [06:58<00:00,  1.89it/s]
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
/home/ak/anaconda3/lib/python3.9/site-packages/gradio/components/video.py:334: UserWarning: Video does not have browser-compatible container or codec. Converting to mp4
  warnings.warn(
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
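
A common workaround for the codec complaint is to re-encode the video as an H.264 mp4, which browsers generally play (file names are placeholders):

ffmpeg -i input_video.avi -c:v libx264 -pix_fmt yuv420p -movflags +faststart output.mp4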


Error while inference

Hi, I cloned your code and downloaded your pre-computed results for DAVIS2017 and DAVIS2016. I used your results to calculate the J and F values and it worked.

But when I propagate the temporal information using your temporal propagation model on your precomputed image segmentations of DAVIS2016 and DAVIS2017 and then try to calculate the J and F values, I get an error; it seems the segmentations are not accepted. This is the error:

Evaluating sequences for the unsupervised task...
Traceback (most recent call last):
File "evaluation_method.py", line 40, in <module>
dataset_eval = DAVISEvaluation(davis_root=args.davis_path, task=args.task, gt_set=args.set, year=args.year)
File "/home/users/JawadT/davis2016/davis2017-evaluation/davis2017/evaluation.py", line 27, in __init__
self.dataset = DAVIS(root=davis_root, task=task, subset=gt_set, sequences=sequences, codalab=codalab, year=self.year)
File "/home/users/JawadT/davis2016/davis2017-evaluation/davis2017/davis.py", line 46, in __init__
self._check_directories()
File "/home/users/JawadT/davis2016/davis2017-evaluation/davis2017/davis.py", line 72, in _check_directories
raise FileNotFoundError(f'Annotations folder for the {self.task} task not found, download it from {self.DATASET_WEB}')
FileNotFoundError: Annotations folder for the unsupervised task not found, download it from https://davischallenge.org/davis2017/code.html

Note: I use this command for DAVIS2016:

python evaluation/eval_saliency.py --mask_path davis2016/Unsup-DAVIS16/Unsup-DAVIS16-DIS-detections --img_path davis2016/Davisfromsite/DAVIS/JPEGImages/480p --output output29 --imset_path davis2016/Davisfromsite/DAVIS/ImageSets/480p/val.txt

But when I take its results to calculate the J and F values, it gives the above error. (I am specifying the directories correctly.)

Advice on optimizing use?

Hi DEVA team! Thank you for this fantastic resource and repo. It is clear that DEVA already works really well by default. However, I noticed a few fault modes that arose in my specific use case, which involves detecting and tracking two similar-looking animals of the same species in a closed chamber. Animals cannot leave or enter the chamber.

  1. Is it possible to feed in manually curated segmentation for the first frame (or for arbitrary frames) in a video clip? I've noticed that errors in segmentation of the initial frame get propagated over time, so being able to supply manually curated segmentation for known challenging frames (especially at the start of video clips) may dramatically improve overall results.

  2. If I segment 1000 consecutive frames forwards and backwards (by reversing the numbering of the frames in a folder), I may get different segmentation results. The direction in which the first frame is easily segmentable yields the better video segmentation overall result. Is this surprising? Or perhaps is this not surprising despite bidirectional propagation because the extended 1000-frame length exceeds the time horizon of bidirectional propagation?

  3. Since the number of objects I wish to detect (e.g., two) is constant over the entire video, is there a set of parameters that is functionally equivalent to setting the hypothetical variable min_num_objects to two? (I already set max_num_objects to two, and I recognize that min_num_objects does not exist as a variable in the repository.)

  4. Is there a way to set a maximum mask size or area? I've noticed that DEVA occasionally lumps together one animal's tail with another animal's body as a single segment, even if the two animals are not in direct physical contact.

Many, many thanks!
Shuyu

No module named 'GroundingDINO'

Hi, may I ask how to deal with the following problem:

    from GroundingDINO.groundingdino.util.inference import Model as GroundingDINOModel
ModuleNotFoundError: No module named 'GroundingDINO'
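
This error usually means the Grounded-Segment-Anything fork from the Installation section was not installed into the active environment. A minimal sequence, following that section (the two install steps mirror the fork's own instructions):

git clone https://github.com/hkchengrex/Grounded-Segment-Anything
cd Grounded-Segment-Anything
python -m pip install -e segment_anything
python -m pip install -e GroundingDINO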

Error while calculating J&F value

Hi, I downloaded your pre-computed detections from image models (I downloaded Unsup-DAVIS16-DEVA-DIS). I just want to give these results to the DAVIS16 official code to regenerate the J and F values. I am using the official command:

python tools/eval.py -i path-to-my-technique -o results.yaml --year 2016 --single-object --phase val

but this error comes:

[WARNING][24-11-2023 14:26:00] Temporal stability not available
[INFO][24-11-2023 14:26:00] Loading DAVIS year: 2016 phase: phase.VAL
[INFO][24-11-2023 14:26:00] Loading video segmentations from: Deva
Traceback (most recent call last):
File "python/tools/eval.py", line 80, in <module>
osp.join(args.input,s),args.single_object) for s in db.iternames()]
File "/home/users/JawadT/davis-2017/python/lib/davis/dataset/base.py", line 127, in __init__
self.n_objects = _get_num_objects(self[0])
File "/home/ubuntu/anaconda3/envs/Second/lib/python2.7/site-packages/skimage/io/collection.py", line 264, in __getitem__
self.data[idx] = self.load_func(fname, **kwargs)
File "/home/users/JawadT/davis-2017/python/lib/davis/dataset/base.py", line 27, in load_annotation
annotation,_ = imread_indexed(filename)
File "/home/users/JawadT/davis-2017/python/lib/davis/misc/io.py", line 13, in imread_indexed
return annotation,np.array(im.getpalette()).reshape((-1,3))
ValueError: cannot reshape array of size 1 into shape (3)

[WinError 10054] An existing connection was forcibly closed by the remote host

Has anyone seen this before? On Windows.

100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 45/45 [00:11<00:00, 3.77it/s]
Exception in callback _ProactorBasePipeTransport._call_connection_lost(None)
handle: <Handle _ProactorBasePipeTransport._call_connection_lost(None)>
Traceback (most recent call last):
File "C:\Users\mail.conda\envs\deva-tracking\lib\asyncio\events.py", line 80, in _run
self._context.run(self._callback, *self._args)
File "C:\Users\mail.conda\envs\deva-tracking\lib\asyncio\proactor_events.py", line 162, in _call_connection_lost
self._sock.shutdown(socket.SHUT_RDWR)
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host

uvos

Hello, this is wonderful work. I read your paper and found it mentioned that your model can be used for many tasks, for example unsupervised video object segmentation. Is this implemented in the code? If so, what is the command or code? And how can I train it?
Thank you very much!

Unexpected segmentation output

Hi, I generated some detections using my own model, then I wanted to use DEVA's temporal propagation approach to propagate the temporal information. But after running DEVA's temporal propagation model, the output becomes totally different and the mean of J and F is 8. I am confused about whether DEVA accepts any kind of detection.
This is the segmentation before applying DEVA:
BeforeDeva

And this is the segmentation mask after applying DEVA:

afterdeva

cannot read the pth file?

ubuntu 18.04
cuda 11.6
torch 1.21

When I run python demo/demo_gradio.py and choose the Text prompt mode, it shows the following:

OSError: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like bert-base-uncased is not the path to a directory containing a file named config.json.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.

When I choose Automatic, it shows:

Traceback (most recent call last):
File "/home/lanson/anaconda3/envs/DEVA/lib/python3.10/site-packages/gradio/routes.py", line 516, in predict
output = await route_utils.call_process_api(
File "/home/lanson/anaconda3/envs/DEVA/lib/python3.10/site-packages/gradio/route_utils.py", line 219, in call_process_api
output = await app.get_blocks().process_api(
File "/home/lanson/anaconda3/envs/DEVA/lib/python3.10/site-packages/gradio/blocks.py", line 1437, in process_api
result = await self.call_function(
File "/home/lanson/anaconda3/envs/DEVA/lib/python3.10/site-packages/gradio/blocks.py", line 1109, in call_function
prediction = await anyio.to_thread.run_sync(
File "/home/lanson/anaconda3/envs/DEVA/lib/python3.10/site-packages/anyio/to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/home/lanson/anaconda3/envs/DEVA/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "/home/lanson/anaconda3/envs/DEVA/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 807, in run
result = context.run(func, *args)
File "/home/lanson/anaconda3/envs/DEVA/lib/python3.10/site-packages/gradio/utils.py", line 650, in wrapper
response = f(*args, **kwargs)
File "/home/lanson/桌面/Code/Tracking-Anything-with-DEVA/demo/demo_gradio.py", line 118, in demo_automatic
sam_model = get_sam_model(cfg, 'cuda')
File "/home/lanson/桌面/Code/Tracking-Anything-with-DEVA/deva/ext/automatic_sam.py", line 35, in get_sam_model
sam = sam_model_registry[SAM_ENCODER_VERSION](...).to(
File "/home/lanson/桌面/Code/Grounded-Segment-Anything/segment_anything/segment_anything/build_sam.py", line 15, in build_sam_vit_h
return _build_sam(
File "/home/lanson/桌面/Code/Grounded-Segment-Anything/segment_anything/segment_anything/build_sam.py", line 105, in _build_sam
state_dict = torch.load(f)
File "/home/lanson/anaconda3/envs/DEVA/lib/python3.10/site-packages/torch/serialization.py", line 705, in load
with _open_zipfile_reader(opened_file) as opened_zipfile:
File "/home/lanson/anaconda3/envs/DEVA/lib/python3.10/site-packages/torch/serialization.py", line 242, in init
super(_open_zipfile_reader, self).init(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory

Hope you can help me, thanks very much!

Insufficient GPU memory

When running the example routine with the following command:

python demo/demo_with_text.py --chunk_size 1 \
--img_path ./example/vipseg/images/12_1mWNahzcsAc \
--amp --temporal_setting semionline \
--size 480 \
--output ./example/output --prompt person.hat.horse

I encountered the following error:
"RuntimeError: CUDA out of memory. Tried to allocate 512.00 MiB (GPU 0; 5.93 GiB total capacity; 4.73 GiB already allocated; 337.75 MiB free; 4.87 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF."

My GPU is GTX 1060 6GB. What could be the possible reasons for this issue? How can I resolve it?
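
A few hedged things to try, grounded in the error message itself and the demo's existing flags: keep --chunk_size 1, lower --size, and pass the allocator hint PyTorch suggests. For example (360 is an arbitrary smaller resolution):

export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
python demo/demo_with_text.py --chunk_size 1 \
--img_path ./example/vipseg/images/12_1mWNahzcsAc \
--amp --temporal_setting semionline \
--size 360 \
--output ./example/output --prompt person.hat.horse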

video tracking initialization

Thanks for releasing this great repo. May I ask how to initialize the video segmentation with a point prompt for SAM? I only found an auto-segment function for SAM in the codebase. I am also curious about the behavior of the consensus under the online setting, where images come in as a stream.

Running examples does not work

Running these two examples give errors:

~/Tracking-Anything-with-DEVA$ python demo/demo_with_text.py --chunk_size 4 --img_path ./example/vipseg/images/12_1mWNahzcsAc --amp --temporal_setting semionline --size 480  --output ./example/output --prompt person.hat.horse
Traceback (most recent call last):
  File "~/Tracking-Anything-with-DEVA/deva/ext/grounding_dino.py", line 12, in <module>
    from groundingdino.util.inference import Model as GroundingDINOModel
ModuleNotFoundError: No module named 'groundingdino'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "demo/demo_with_text.py", line 15, in <module>
    from deva.ext.grounding_dino import get_grounding_dino_model
  File "~Tracking-Anything-with-DEVA/deva/ext/grounding_dino.py", line 15, in <module>
    from GroundingDINO.groundingdino.util.inference import Model as GroundingDINOModel
ModuleNotFoundError: No module named 'GroundingDINO'

Not working if there is large variation between consecutive frames??

Hi, I tried DEVA on some ultrasound datasets and it is not working well, as there is large variation between consecutive frames. It seems that if the object's shape and size differ a lot between consecutive frames, DEVA cannot reach an in-clip consensus.

My model's result is 66.7, but when I use DEVA's temporal propagation model to propagate the temporal information, the result decreases to 39.

My dataset has a single object in most of the videos, but some videos have multiple objects. I used your approach for both DAVIS 2016 and DAVIS 2017. For DAVIS 2017 I used https://github.com/davisvideochallenge/davis2017-evaluation to calculate the J and F values, and for DAVIS 2016 I used https://github.com/hkchengrex/vos-benchmark.

Do you think I am making a mistake here? Or am I right that when the differences between consecutive frames are so large, it is very difficult or even impossible for DEVA to reach an in-clip consensus?

Train error

Hi,
I am trying to train the model.

Run: deva/train.py
data: download_datasets.py

Sets:
import os
os.environ["MASTER_ADDR"] = 'localhost'
os.environ["MASTER_PORT"] = "12355"
os.environ['CUDA_LAUNCH_BLOCKING'] = "1"

The following errors were reported during training:

class ###############################################################################
<class 'torch.Tensor'> <class 'torch.Tensor'> <class 'torch.Tensor'> <class 'torch.Tensor'> <class 'dict'>

len ###############################################################################
16 16 16 16 2
../aten/src/ATen/native/cuda/NLLLoss2d.cu:103: nll_loss2d_forward_kernel: block: [0,0,0], thread: [680,0,0] Assertion t >= 0 && t < n_classes failed.
Traceback (most recent call last):

File ~/Software/Anaconda3/envs/DEVA/lib/python3.11/site-packages/spyder_kernels/py3compat.py:356 in compat_exec
exec(code, globals, locals)

File ~/NAS/home/code/SAM/Tracking-Anything-with-DEVA-main/deva/train.py:270
model.do_pass(data, total_iter)

File ~/NAS/home/code/SAM/Tracking-Anything-with-DEVA-main/deva/model/trainer.py:155 in do_pass
losses = self.loss_computer.compute({**data, **out}, num_filled_objects, it)

File ~/NAS/home/code/SAM/Tracking-Anything-with-DEVA-main/deva/model/losses.py:62 in compute
loss, p = self.bce(data[f'logits_{ti}'][bi:bi + 1, :num_objects[bi] + 1],

File ~/Software/Anaconda3/envs/DEVA/lib/python3.11/site-packages/torch/nn/modules/module.py:1518 in _wrapped_call_impl
return self._call_impl(*args, **kwargs)

File ~/Software/Anaconda3/envs/DEVA/lib/python3.11/site-packages/torch/nn/modules/module.py:1527 in _call_impl
return forward_call(*args, **kwargs)

File ~/NAS/home/code/SAM/Tracking-Anything-with-DEVA-main/deva/model/losses.py:34 in forward
return F.cross_entropy(input, target), 1.0

File ~/Software/Anaconda3/envs/DEVA/lib/python3.11/site-packages/torch/nn/functional.py:3053 in cross_entropy
return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)

RuntimeError: CUDA error: device-side assert triggered
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

Is this an issue with the number of categories, or something else?
How to fix it?

Thanks

NameError: name '_C' is not defined

I have corrected a lot of problems with the code on Colab, but this remaining compilation problem persists, and I don't know how to deal with it.

Extract the ROI of the masked regions

Hello,

I have been able to apply masks to humans in a video. Using the Grounding DINO model, I am able to extract the ROI (region of interest) of all the humans in the frame. However, the ROI has some background noise, which is not too bad. Still, I was wondering if there was a way to extract only the masked humans.

Currently, when I try to visualize the output of the mask, I don't see any humans. The output is mostly an array of zeros.

For example, if there are 5 humans in the frame, I would like the ROI of only the masked humans and nothing else. If there is a person with an orange mask, I would like to extract only that person and the orange mask.

I want to perform pose estimation on the segmented/masked humans instead of performing pose estimation on the entire frame. It would defeat the purpose of segmenting and reduce the accuracy as well.

Please let me know if this is possible or not. Thanks a lot for your time.
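
As a hedged sketch of the post-processing this would need (assuming DEVA gives you an HxW integer id map per frame; extract_roi and the object id value are hypothetical):

import numpy as np

def extract_roi(frame: np.ndarray, mask: np.ndarray, obj_id: int) -> np.ndarray:
    # frame: HxWx3 image; mask: HxW integer id map (assumption about the output format)
    binary = mask == obj_id
    if not binary.any():
        raise ValueError(f"object id {obj_id} not present in this frame")
    roi = frame * binary[:, :, None]   # zero out everything outside the mask
    ys, xs = np.nonzero(binary)        # tight bounding box of the mask
    return roi[ys.min():ys.max() + 1, xs.min():xs.max() + 1]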

Running on macos / mps

Segment Anything can be run on mps with good acceleration vs. CPU. This demo fails when loading the DEVA model: "torch not compiled with cuda enabled". Is it possible to support mps?

Traceback (most recent call last):
File "/Users/chris/Documents/AI/segmentation/Tracking-Anything-with-DEVA/demo/demo_automatic.py", line 34, in
deva_model, cfg, args = get_model_and_config(parser)
File "/Users/chris/Documents/AI/segmentation/Tracking-Anything-with-DEVA/deva/inference/eval_args.py", line 65, in get_model_and_config
network = DEVA(config).cuda().eval()
File "/Users/chris/miniconda3/envs/TA-DEVA/lib/python3.9/site-packages/torch/nn/modules/module.py", line 905, in cuda
return self._apply(lambda t: t.cuda(device))
File "/Users/chris/miniconda3/envs/TA-DEVA/lib/python3.9/site-packages/torch/nn/modules/module.py", line 797, in _apply
module._apply(fn)
File "/Users/chris/miniconda3/envs/TA-DEVA/lib/python3.9/site-packages/torch/nn/modules/module.py", line 797, in _apply
module._apply(fn)
File "/Users/chris/miniconda3/envs/TA-DEVA/lib/python3.9/site-packages/torch/nn/modules/module.py", line 820, in _apply
param_applied = fn(param)
File "/Users/chris/miniconda3/envs/TA-DEVA/lib/python3.9/site-packages/torch/nn/modules/module.py", line 905, in
return self._apply(lambda t: t.cuda(device))
File "/Users/chris/miniconda3/envs/TA-DEVA/lib/python3.9/site-packages/torch/cuda/init.py", line 239, in _lazy_init
raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
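
An untested, hedged sketch of the kind of patch that line in deva/inference/eval_args.py would need (note that other hard-coded .cuda() calls elsewhere in the repo would need the same treatment):

import torch

# instead of: network = DEVA(config).cuda().eval()
device = 'mps' if torch.backends.mps.is_available() else 'cpu'
network = DEVA(config).to(device).eval()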

Installation issue

Hi, sorry for asking so many questions. I want to use DEVA's temporal propagation model for 2 tasks:

1. I have downloaded your pre-computed detections from image models (for DAVIS16); now I want to run DEVA's temporal propagation model and then regenerate the J and F values.
2. I have generated some detections with my own model; I just want to use the DAVIS16 approach to propagate the temporal information.

What I did is:

1. I installed Python 3.8
2. I installed the latest version of PyTorch and the corresponding torchvision
3. I cloned your repo and ran the pip install -e . command
4. Then I downloaded the pretrained models
5. Then I ran this command:

python evaluation/eval_saliency.py --mask_path UnsupDAVIS16/Unsup-DAVIS16-DIS-detections --img_path DAVIS/JPEGImages --output 24output --imset_path DAVIS/ImageSets/2016/val.txt

But I am getting this error:

Traceback (most recent call last):
File "/home/users/JawadT/Tracking-Anything-with-DEVA/evaluation/eval_saliency.py", line 13, in <module>
from deva.inference.data.saliency_test_datasets import DAVISSaliencyTestDataset
ModuleNotFoundError: No module named 'deva'

What should I do to solve it? Do I need to install more packages? Do I have to install the fork of Grounded-Segment-Anything even for DAVIS2016?

VRAM needed

Hi!

I am getting good results with this project! Thank you!

Currently I am running on an RTX 3090 with 24GB of VRAM. I am running the automatic-mode demo script on two different image sequences, both 2048x1080 resolution, both PNG, both RGB, both 101 frames long. The weird thing is: one image sequence completes the script, the other one gives a CUDA out of memory exception. I have tried systematically changing chunk_size, size, and many other parameters, but there seems to be no impact at all.

Maybe someone would be so kind to point me in the right direction:

  • what tools to use to debug this
  • maybe shed some light what internals could cause this behaviour
  • what are the VRAM requirements?

Thank you!

Some confusion about Table S1

Congratulations and respect for your excellent work!
About Table S1, I wonder whether 'w/ OVIS' means training DEVA with OVIS, and whether 'DEVA' denotes XMem + SAM (bi-directional propagation) or XMem + the modifications in Appendix Section A.

Potential bug in automatic tracking

My input images are 64x64, and I found that demo_automatic.py crashes if I set size=64. This feels like a bug. Is it expected?

Traceback:

Traceback (most recent call last): 
  File "demo/demo_automatic.py", line 66, in <module>               
    process_frame(deva, sam_model, f"null-{ti}.jpg", result_saver, ti, image_np=frame)                                                                                                                                                                                          
  File "/opt/venv/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context                                                                                                                                                                        
    return func(*args, **kwargs)                                    
  File "/home/user/code_local/Tracking-Anything-with-DEVA/deva/ext/automatic_processor.py", line 65, in process_frame_automatic                                                                                                                                                 
    _, mask, new_segments_info = deva.vote_in_temporary_buffer(
  File "/home/user/code_local/Tracking-Anything-with-DEVA/deva/inference/inference_core.py", line 124, in vote_in_temporary_buffer                                                                                                                                              
    projected_ti, projected_mask, projected_info = find_consensus_auto_association(                                                                                                                                                                                             
  File "/home/user/code_local/Tracking-Anything-with-DEVA/deva/inference/consensus_automatic.py", line 165, in find_consensus_auto_association                                                                                                                                  
    projected_mask = spatial_alignment(ti, image, mask, keyframe_ti, keyframe_image,                                                                                                                                                                                            
  File "/home/user/code_local/Tracking-Anything-with-DEVA/deva/inference/consensus_associated.py", line 57, in spatial_alignment                                                                                                                                                
    affinity = do_softmax(similarity, top_k=config['top_k'])                                                                                                                                                                                                                    
  File "/home/user/code_local/Tracking-Anything-with-DEVA/deva/model/memory_utils.py", line 58, in do_softmax                                                                                                                                                                   
    values, indices = torch.topk(similarity, k=top_k, dim=1)                                                                                                                                                                                                                    
RuntimeError: selected index k out of range
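
One hedged guess at the cause: the configuration dump elsewhere on this page shows top_k: 30, and at size=64 the number of memory elements can fall below that, so torch.topk is asked for more elements than exist. If the argument parsers expose the key as a flag (unverified), lowering it might avoid the crash:

python demo/demo_automatic.py --size 64 --top_k 10 \
--img_path ./my_64x64_frames --amp --output ./example/output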

Information about propagation and consensus merging

The paper and results are very interesting. Where in the code do you perform propagation and consensus merging, given that XMem itself cannot segment new objects that appear in the scene? I've been looking at group_module.py, but I'm not sure that is where propagation and consensus are done.

new video test on pretrained model

Hi there. Thanks for your great work! I have a question about how to test a new video with DEVA. Should I convert the video into many images for testing, and then turn the tested images back into a video? I am confused. I would appreciate it if you could reply as soon as possible.
Best wishes!
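
Yes — the demo scripts take a folder of frames via --img_path, so a hedged workflow is to split the video with ffmpeg first (paths and the prompt are placeholders):

mkdir -p frames
ffmpeg -i input.mp4 frames/%05d.jpg
python demo/demo_with_text.py --img_path ./frames \
--amp --temporal_setting semionline --size 480 \
--output ./output --prompt person

Alternatively, the Gradio demo (demo/demo_gradio.py) accepts a video directly.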

About training details

When training w/o OVIS, are DEVA's training settings the same as those described in this paper, or the same as in XMem?

fine-tune static image for medical domain

Hi,
Thanks to your recommendation, I am looking at this repository as a backbone for my work.
We are trying to see whether domain-specific adaptations can be made to enhance the model's performance (specifically, medical image segmentation).

There is little written in both XMem and DEVA about how the pretraining process.

We have a labeled dataset of non-consecutive frames taken from a clip and want to somehow improve DEVA's capability on the in-between frames.

I think the best course of action would be using the static-image-pretraining to do that but i am not clear on how I should go about it.

Any help or a nudge in the right direction would be appreciated!
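For context, static-image pretraining in XMem-style pipelines synthesizes short pseudo-videos from single annotated images by deforming the image and its mask together, so the propagation module sees consistent "motion" without real video. A rough sketch of that idea (the augmentation ranges are illustrative, not the repo's actual training configuration):

```python
import random

import torch
import torchvision.transforms.functional as TF

def make_pseudo_clip(image: torch.Tensor, mask: torch.Tensor, num_frames: int = 3):
    """Synthesize a short clip from one (image, mask) pair via random affines.

    image: (3, H, W) float tensor; mask: (1, H, W) uint8 label tensor.
    """
    frames, masks = [], []
    for _ in range(num_frames):
        angle = random.uniform(-20, 20)
        translate = [random.randint(-30, 30), random.randint(-30, 30)]
        scale = random.uniform(0.9, 1.1)
        # apply the identical transform to image and mask to keep them aligned;
        # nearest-neighbour interpolation preserves the integer mask labels
        frames.append(TF.affine(image, angle, translate, scale, [0.0, 0.0],
                                interpolation=TF.InterpolationMode.BILINEAR))
        masks.append(TF.affine(mask, angle, translate, scale, [0.0, 0.0],
                               interpolation=TF.InterpolationMode.NEAREST))
    return torch.stack(frames), torch.stack(masks)

# tiny usage example with dummy data
img = torch.rand(3, 256, 256)
msk = torch.zeros(1, 256, 256, dtype=torch.uint8)
msk[:, 100:160, 80:140] = 1
clip, clip_masks = make_pseudo_clip(img, msk)
```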

ERROR: File "setup.py" not found

I am trying to install DEVA but am getting the error below:

pip install -e .
==>

ERROR: File "setup.py" not found. Directory cannot be installed in editable mode: 
~/Tracking-Anything-with-DEVA
(A "pyproject.toml" file was found, but editable mode currently requires a setup.py based build.)

After looking online, I found no straightforward solution.

Is it possible to document a more standardized installation method?

Thanks

CUDA_HOME

I must be doing something very stupid, but what should I set CUDA_HOME to if I am installing PyTorch and CUDA via a conda environment? Everything I've tried fails to get GroundingDINO to install correctly.
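Not an authoritative answer, but one quick sanity check: PyTorch exposes the toolkit root its extension builder resolved, which is what source builds like GroundingDINO pick up. Note that the conda pytorch-cuda packages generally do not ship nvcc, so a full CUDA toolkit install (system-wide or via conda's cuda-toolkit packages) may still be needed:

```python
# Print the CUDA toolkit root that PyTorch's extension builder resolved.
# If this prints None, no nvcc-bearing toolkit was found, and CUDA_HOME
# must point at one (e.g. /usr/local/cuda-11.7, matching your torch build).
from torch.utils.cpp_extension import CUDA_HOME
print(CUDA_HOME)
```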

Question regarding inference

Hi, I have my own model for unsupervised video object segmentation, but its main limitation is that it only considers short-term temporal information, so I would like to use DEVA's temporal propagation model to propagate information over longer spans.

How can I use DEVA's temporal propagation model for this purpose? If I generate segmentation masks with my own model and then run DEVA's temporal propagation separately on those masks, is that approach correct?
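Not a definitive answer, but the tracebacks quoted elsewhere on this page show an incorporate_detection(image, mask, segments_info) call on the inference core, which matches exactly that workflow: your image model supplies detections on selected frames, and DEVA propagates in between. A very rough sketch; load_frames, my_model, and the step call are hypothetical stand-ins, so check deva/inference/inference_core.py for the actual API:

```python
# Hypothetical sketch of decoupled inference with an external segmenter.
# 'deva' is assumed to be an initialized DEVA inference core; 'load_frames'
# and 'my_model' are stand-ins for your own loader and segmentation model.
detection_every = 5  # run the image model every k frames (illustrative)

for ti, image in enumerate(load_frames('my_video_frames')):
    if ti % detection_every == 0:
        # your model's output, fused with DEVA's propagated hypothesis
        mask, segments_info = my_model.predict(image)
        prob = deva.incorporate_detection(image, mask, segments_info)
    else:
        # propagation-only frames; the exact method name is an assumption
        prob = deva.step(image)
```

This in-the-loop fusion of per-frame hypotheses is what the paper describes as (semi-)online consensus, as opposed to simply post-processing your saved masks.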

box not right in multi-person scenes

Hi @hkchengrex, thanks for sharing this wonderful work, but I got results like the one below, where a single box contains two people. (A screenshot was attached.)

The default cfg['DINO_THRESHOLD'] = 0.5 leads to redundant boxes, so I set a higher one. I hope you can give advice on how to solve this problem.

Error: cannot import name 'register_model' from 'timm.models.registry'

Hi!

I wanted to test the project, but after installing your fork of Grounded-Segment-Anything and running:

python demo/demo_gradio.py

I get the following error:

UserWarning: Failed to load custom C++ ops. Running on CPU mode Only!
warnings.warn("Failed to load custom C++ ops. Running on CPU mode Only!")
Traceback (most recent call last):
File "...\Tracking-Anything-with-DEVA\demo\demo_gradio.py", line 16, in
from deva.ext.grounding_dino import get_grounding_dino_model
File "...\Tracking-Anything-with-DEVA\deva\ext\grounding_dino.py", line 17, in
from deva.ext.MobileSAM.setup_mobile_sam import setup_model as setup_mobile_sam
File "...\Tracking-Anything-with-DEVA\deva\ext\MobileSAM\setup_mobile_sam.py", line 3, in
from deva.ext.MobileSAM.tiny_vit_sam import TinyViT
File "...\Tracking-Anything-with-DEVA\deva\ext\MobileSAM\tiny_vit_sam.py", line 19, in
from timm.models.registry import register_model
ImportError: cannot import name 'register_model' from 'timm.models.registry' (...\Python\Python310\site-packages\timm\models\registry.py)

Could you please give me some guidance on how to solve this error?

Thank you very much in advance!
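In case it is useful, register_model moved around across timm releases: older versions expose it under timm.models.registry, while newer ones re-export it from timm.models directly. A version-tolerant import in tiny_vit_sam.py (or pinning an older timm, e.g. timm==0.6.x) usually resolves this:

```python
# Version-tolerant import for timm's model registry decorator.
try:
    from timm.models.registry import register_model   # older timm releases
except ImportError:
    from timm.models import register_model            # newer timm releases
```

The "Failed to load custom C++ ops" warning above is a separate issue: it means the GroundingDINO CUDA extension was not built, so inference falls back to CPU.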

installing on WSL2

After going through the multitude of issues on the GroundingDINO repository, I finally managed to hack this together.

WSL2, Windows 10, Ubuntu 22.04.2 LTS shell:

conda create --name deva3 python=3.10.12
conda activate deva3
git clone https://github.com/hkchengrex/Grounded-Segment-Anything
conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia
wget https://developer.download.nvidia.com/compute/cuda/11.7.0/local_installers/cuda_11.7.0_515.43.04_linux.run
sudo apt install gcc
sudo sh cuda_11.7.0_515.43.04_linux.run
cd Grounded-Segment-Anything/
export BUILD_WITH_CUDA=True
export AM_I_DOCKER=False
export CUDA_HOME=/usr/local/cuda-11.7
export PATH=/usr/local/cuda-11.7/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-11.7/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export PATH=$PATH:/usr/local/bin/aws
export PATH=$PATH:[MYHIP]/bin
sudo apt-get install g++
pip install -e GroundingDINO  (may need --force-reinstall if it doesn't install cleanly the first time)
pip install -q -e segment_anything
git clone https://github.com/hkchengrex/Tracking-Anything-with-DEVA
cd Tracking-Anything-with-DEVA/
pip install -q -e .
bash scripts/download_models.sh

and finally the following works...

python demo/demo_with_text.py --chunk_size 4 --img_path ./example/vipseg/images/12_1mWNahzcsAc --amp --temporal_setting semionline --size 480 --output ./example/output --prompt person.hat.horse

Unfortunately, the Gradio demo doesn't produce the final result; it just hangs after all the processing is done. I still need to figure that one out, but the command line works for both the text-prompt and automatic modes.
Amazing work @hkchengrex, as usual!

Can I use it for ultrasound video object segmentation?

Hi, thank you so much for your amazing work. I wanted to ask: is it advisable to use your approach for unsupervised ultrasound video object segmentation? If so, which part is suitable for the task, i.e. the part of DEVA used for unsupervised video object segmentation? I could not find implementation details for that part in the paper: for unsupervised video object segmentation, which image segmentation model is used, and which model propagates the temporal information?

My ultrasound dataset is for lymphoma cancer and consists of some annotated videos as well as some annotated images.

Looking forward to hearing from you.

Missing frames

Hi,

Thanks for the great work.

I ran into a problem lately. I based my code on demo_with_text.py. Instead of reading a sequence of video frames from disk, I found a way to pass torch tensors directly as image_np. I still write the segmentation results to disk and read them back for the masks. However, I found that frames are sometimes missing, which causes errors later, even though the code does not report any error during execution. I am wondering:

  1. Have you seen this before (missing written annotation frames), and how can it be avoided?
  2. Is there a simpler way to get the masks directly from the API? (A sketch follows below.)

Thanks!
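On point 2, if you can get hold of the per-frame probability tensor instead of the saved PNGs, the disk round-trip can be skipped entirely. A small sketch under the assumption that the output is a (num_objects + 1, H, W) probability map with background at channel 0:

```python
import torch

# Stand-in for a per-frame probability output of shape (num_objects + 1, H, W);
# the background-at-channel-0 layout is an assumption for this sketch.
prob = torch.rand(4, 480, 854).softmax(dim=0)

mask = prob.argmax(dim=0)                      # (H, W) integer object-id mask
mask_np = mask.cpu().numpy().astype('uint8')   # keep in memory; no disk I/O
```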

How to reproduce VIP-Seg results

Hi! Thanks for open-sourcing the code, along with the great demo.

I am just wondering how to reproduce your Video K-Net results with DEVA on VIP-Seg.

Best Regards!
Xiangtai

Google Colab demo breaks while running DEVA

While running the Google Colab demo, I keep getting this error from the process_frame_text call.

/content/Tracking-Anything-with-DEVA/Tracking-Anything-with-DEVA/Grounded-Segment-Anything/Tracking-Anything-with-DEVA
  0%|          | 0/45 [00:00<?, ?it/s]
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
[<ipython-input-43-74d84311421c>](https://localhost:8080/#) in <cell line: 21>()
     30                     result_saver.writer = writer
     31 
---> 32                 process_frame_text(deva,
     33                                     gd_model,
     34                                     sam_model,

1 frames
/content/Tracking-Anything-with-DEVA/deva/ext/with_text_processor.py in process_frame_with_text(deva, gd_model, sam_model, frame_path, result_saver, ti, image_np)
    109             # incorporate new detections
    110             mask, segments_info = make_segmentation_with_text(cfg, image_np, gd_model, sam_model,
--> 111                                                               prompts, new_min_side)
    112             frame_info.segments_info = segments_info
    113             prob = deva.incorporate_detection(image, mask, segments_info)

NameError: name 'input_prompt' is not defined
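One plausible cause, judging from the error, is that the notebook cell never defined the text prompt before calling process_frame_text. A sketch of defining it up front; the variable and config key names are assumptions based on the command-line demo, which takes '.'-separated class names:

```python
# Hypothetical fix: define the prompt before the processing loop runs.
# Class names are separated by '.', as in the --prompt person.hat example.
input_prompt = 'person.hat'
cfg['prompt'] = input_prompt
```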

ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1293) of binary

When I use: python -m torch.distributed.run --nproc_per_node=2 deva/train.py --exp_id deva_retrain --stage 0

ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1293) of binary: /root/miniconda3/envs/deva/bin/python
Traceback (most recent call last):
File "/root/miniconda3/envs/deva/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/root/miniconda3/envs/deva/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/root/miniconda3/envs/deva/lib/python3.8/site-packages/torch/distributed/run.py", line 798, in
main()
File "/root/miniconda3/envs/deva/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/init.py", line 346, in wrapper
return f(*args, **kwargs)
File "/root/miniconda3/envs/deva/lib/python3.8/site-packages/torch/distributed/run.py", line 794, in main
run(args)
File "/root/miniconda3/envs/deva/lib/python3.8/site-packages/torch/distributed/run.py", line 785, in run
elastic_launch(
File "/root/miniconda3/envs/deva/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 134, in call
return launch_agent(self._config, self._entrypoint, list(args))
File "/root/miniconda3/envs/deva/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

deva/train.py FAILED

Failures:
<NO_OTHER_FAILURES>

Root Cause (first observed failure):
[0]:
time : 2023-09-21_10:43:21
host : autodl-container-b8cd11b052-2057d9be
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 1293)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

My GPUs are 2080 Tis. How can I solve this problem? Thanks!
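Since error_file is <N/A>, the child process's real traceback was swallowed. The elastic errors page linked in the output recommends decorating the training entry point with @record so the underlying exception gets reported; a sketch, assuming deva/train.py has a main()-style entry point:

```python
from torch.distributed.elastic.multiprocessing.errors import record

@record
def main():
    ...  # existing training code in deva/train.py

if __name__ == '__main__':
    main()
```

With the real exception visible, it should be clearer whether this is, for example, an out-of-memory condition on the 11 GB 2080 Tis or a configuration problem.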
