google-deepmind / tapnet

Tracking Any Point (TAP)

Home Page: https://deepmind-tapir.github.io/blogpost.html

License: Apache License 2.0

benchmark point-tracking robotics computer-vision deep-learning

tapnet's Introduction

Tracking Any Point (TAP)

[TAP-Vid] [TAPIR] [RoboTAP] [Blog Post] [BootsTAP] [TAPVid-3D]

tapir.mp4

Welcome to the official Google DeepMind repository for Tracking Any Point (TAP), home of the TAP-Vid and TAPVid-3D datasets, our top-performing TAPIR model, and our RoboTAP extension.

  • TAP-Vid is a benchmark for models that perform this task, with a collection of ground-truth points for both real and synthetic videos.
  • TAPIR is a two-stage algorithm: (1) a matching stage, which independently locates a suitable candidate point match for the query point on every other frame, and (2) a refinement stage, which updates both the trajectory and query features based on local correlations. The resulting model is fast and surpasses all prior methods by a significant margin on the TAP-Vid benchmark.
  • RoboTAP is a system which utilizes TAPIR point tracks to execute robotics manipulation tasks through efficient imitation in the real world. It also includes a dataset with ground-truth points annotated on real robotics manipulation videos.
  • BootsTAP (or Bootstrapped Training for TAP) uses a large dataset of unlabeled, real-world video to improve tracking accuracy. Specifically, the model is trained to give consistent predictions across different spatial transformations and corruptions of the video, as well as different choices of the query points. We apply it to TAPIR to create BootsTAPIR, which is architecturally similar to TAPIR but substantially outperforms it on TAP-Vid.
  • TAPVid-3D is a benchmark and set of metrics for models that perform the 3D point tracking task. The benchmark contains 1M+ computed ground-truth trajectories on 4,000+ real-world videos.

This repository contains the following:

  • TAPIR Demos: both an online colab demo and an offline real-time demo you can run by cloning this repo
  • TAP-Vid Benchmark: both the evaluation datasets and the evaluation metrics
  • RoboTAP: both the evaluation dataset and the point-track-based clustering code
  • BootsTAP: the further-improved BootsTAPIR model, trained with large-scale semi-supervised bootstrapped learning
  • TAPVid-3D Benchmark: the evaluation metrics and sample evaluation code for the TAPVid-3D benchmark
  • Checkpoints: pre-trained weights for TAP-Net (the baseline presented in the TAP-Vid paper), TAPIR, and BootsTAPIR, in both Jax and PyTorch
  • Instructions for training both TAP-Net (the baseline presented in the TAP-Vid paper) and TAPIR on Kubric

TAPIR Demos

The simplest way to run TAPIR is to use our colab demos online. You can also clone this repo and run TAPIR on your own hardware, including a real-time demo.

Colab Demo

You can run the colab demos to see how TAPIR works. You can also upload your own video and try point tracking with TAPIR. We provide a few colab demos (a minimal sketch of the offline inference loop follows this list):

  1. Standard TAPIR: This is the most powerful TAPIR / BootsTAPIR model, which runs on a whole video at once. We mainly report the results of this model in the paper.
  2. Online TAPIR: This is the sequential, causal TAPIR / BootsTAPIR model that allows for online tracking of points and can run in real time on a GPU platform.
  3. Rainbow Visualization: This visualization is used in many of our teaser videos: it does automatic foreground/background segmentation and corrects the tracks for the camera motion, so you can visualize the paths objects take through real space.
  4. Standard PyTorch TAPIR: This is the TAPIR / BootsTAPIR model re-implemented in PyTorch, with exactly the same architecture & weights as the Jax model.
  5. Online PyTorch TAPIR: This is the sequential, causal BootsTAPIR model re-implemented in PyTorch, with exactly the same architecture & weights as the Jax model.
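For reference, here is a minimal sketch of the offline Jax inference loop used in the colab demos. It assumes build_model (the Haiku model-construction function defined in the demo notebooks), a checkpoint .npy that stores a dict with 'params' and 'state' (as in the demos), a placeholder checkpoint path, and dummy inputs in place of a real preprocessed video; names and output keys may differ between versions.

import haiku as hk
import jax
import numpy as np
import tree  # dm-tree

ckpt = np.load('checkpoints/tapir_checkpoint.npy', allow_pickle=True).item()  # placeholder path
params, state = ckpt['params'], ckpt['state']

model = hk.transform_with_state(build_model)  # build_model is defined in the colab demos
model_apply = jax.jit(model.apply)            # compiled once per distinct input shape

# Dummy inputs standing in for a preprocessed video and its query points.
frames = np.zeros([1, 8, 256, 256, 3], np.float32)  # [batch, num_frames, height, width, 3]
query_points = np.zeros([1, 5, 3], np.float32)      # [batch, num_queries, 3] in (t, y, x)

rng = jax.random.PRNGKey(42)
outputs, _ = model_apply(params, state, rng, frames, query_points)
outputs = tree.map_structure(lambda x: np.array(x[0]), outputs)
tracks, occlusions = outputs['tracks'], outputs['occlusion']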

Live Demo

Clone the repository:

git clone https://github.com/deepmind/tapnet.git

Switch to the project directory:

cd tapnet

Install the tapnet python package (and its requirements for running inference):

pip install .

Download the checkpoint:

mkdir checkpoints
wget -P checkpoints https://storage.googleapis.com/dm-tapnet/causal_tapir_checkpoint.npy

Add current path (parent directory of where TapNet is installed) to PYTHONPATH:

export PYTHONPATH=`(cd ../ && pwd)`:`pwd`:$PYTHONPATH

If you want to use CUDA, make sure you install the drivers and a version of JAX that's compatible with your CUDA and cuDNN versions. Refer to the JAX documentation to install the correct JAX version with CUDA.

You can then run a pretrained causal TAPIR model on a live camera and select points to track:

cd ..
python3 ./tapnet/live_demo.py

In our tests, we achieved ~17 fps on 480x480 images on a Quadro RTX 4000 (a 2018 mobile GPU).

Benchmarks

This repository hosts two separate but related benchmarks: TAP-Vid (and its later extension, RoboTAP) and TAPVid-3D.

TAP-Vid

tap-vid.mp4

TAP-Vid is a dataset of videos along with point tracks, either manually annotated or obtained from a simulator. The aim is to evaluate tracking of any trackable point on any solid physical surface. Algorithms receive a single query point on some frame and must produce the rest of the track, i.e., where that point has moved to (if visible) and whether it is visible, on every other frame. This requires point-level precision (unlike prior work on box and segment tracking), potentially on deformable surfaces (unlike structure from motion), over the long term (unlike optical flow), on potentially any object (i.e., class-agnostic, unlike prior class-specific keypoint tracking on humans).

More details on downloading, using, and evaluating on the TAP-Vid benchmark can be found in the corresponding README.
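For orientation, here is a hedged sketch of computing the benchmark metrics for a single video. The function name and the pred_occluded/pred_tracks arguments appear in this codebase; the import path, the remaining argument names and shapes, and the returned keys are assumptions and may differ between versions.

import numpy as np
from tapnet.evaluation_datasets import compute_tapvid_metrics  # import path is an assumption

num_queries, num_frames = 24, 50
query_points = np.zeros([1, num_queries, 3], np.float32)           # (t, y, x) per query
gt_occluded = np.zeros([1, num_queries, num_frames], bool)         # ground-truth occlusion flags
gt_tracks = np.zeros([1, num_queries, num_frames, 2], np.float32)  # ground-truth (x, y) per frame
pred_occluded = np.zeros([1, num_queries, num_frames], bool)       # predicted occlusion flags
pred_tracks = np.zeros([1, num_queries, num_frames, 2], np.float32)

metrics = compute_tapvid_metrics(
    query_points, gt_occluded, gt_tracks, pred_occluded, pred_tracks,
    query_mode='first')  # 'first' or 'strided', matching the two evaluation modes
print(metrics)  # per-threshold accuracies plus an average Jaccard (AJ) style summary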

RoboTAP Benchmark

RoboTAP is a follow-up work to TAP-Vid and TAPIR that demonstrates the importance of point tracking models for robotics.

The RoboTAP dataset follows the same annotation format as TAP-Vid and is released as an addition to it. In terms of domain, the RoboTAP dataset is most similar to TAP-Vid-RGB-Stacking, with the key difference that all robotics videos are real and manually annotated; video sources and object categories are also more diverse. The benchmark includes 265 videos and is intended for evaluation only. More details are in the TAP-Vid README. We also provide a Point Clustering demo of the segmentation algorithm used in the paper.

TAPVid-3D

TAPVid-3D is a dataset and benchmark for evaluating the task of long-range Tracking Any Point in 3D (TAP-3D).

The benchmark features 4,000+ real-world videos, along with their metric 3D point trajectories. The dataset draws on three different video sources and spans a variety of object types, motion patterns, and indoor and outdoor environments. The corresponding folder in this repository contains the code to download and generate these annotations, as well as dataset samples for viewing. Be aware that it has a separate license from TAP-Vid.

More details on downloading, using, and evaluating on the TAPVid-3D benchmark can be found in the corresponding README.

A Note on Coordinates

In our stored datasets, (x, y) coordinates are typically in normalized raster coordinates: i.e., (0, 0) is the upper-left corner of the upper-left pixel, and (1, 1) is the lower-right corner of the lower-right pixel. Our code, however, immediately converts these to regular raster coordinates, matching the output of the Kubric reader: (0, 0) is the upper-left corner of the upper-left pixel, while (h, w) is the lower-right corner of the lower-right pixel, where h is the image height in pixels and w is the image width.

When working with 2D coordinates, we typically store them in the order (x, y). However, we typically work with 3D coordinates in the order (t, y, x), where y and x are raster coordinates as above, but t is in frame coordinates, i.e., 0 refers to the first frame and 0.5 refers to halfway between the first and second frames. Please take care with this: a one-pixel error can make a difference according to our metrics.
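Because the corners are aligned, the conversion between the two conventions is a plain scale. A minimal NumPy sketch is below; this is illustrative code, not a call into the repo's transforms utilities (which provide convert_grid_coordinates for conversions between grids).

import numpy as np

def normalized_to_raster(points_xy, height, width):
  """Maps (x, y) in [0, 1] x [0, 1] to [0, w] x [0, h] raster coordinates."""
  return points_xy * np.array([width, height], dtype=np.float32)

print(normalized_to_raster(np.array([[0.5, 1.0]]), height=256, width=480))  # -> [[240. 256.]]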

Download Checkpoints

tapnet/checkpoint/ must contain a file checkpoint.npy that's loadable using our NumpyFileCheckpointer. You can download checkpoints from the table below; they should closely match the ones used in the paper (a loading sketch follows the table).

| model | checkpoint | config | backbone | resolution | DAVIS First (AJ) | DAVIS Strided (AJ) | Kinetics First (AJ) | RoboTAP First (AJ) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| TAP-Net | Jax | tapnet_config.py | TSM-ResNet18 | 256x256 | 33.0% | 38.4% | 38.5% | 45.1% |
| TAPIR | Jax & PyTorch | tapir_config.py | ResNet18 | 256x256 | 58.5% | 63.3% | 50.0% | 59.6% |
| Online TAPIR | Jax | causal_tapir_config.py | ResNet18 | 256x256 | 56.2% | 58.3% | 51.2% | 59.1% |
| BootsTAPIR | Jax & PyTorch | tapir_bootstrap_config.py | ResNet18 | 256x256 | 62.4% | 67.4% | 55.8% | 69.2% |
| Online BootsTAPIR | Jax & PyTorch | tapir_bootstrap_config.py | ResNet18 | 256x256 | 59.7% | 61.2% | 55.1% | 69.1% |
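The .npy checkpoints are NumPy pickles; a hedged sketch of loading one is below. The 'params'/'state' layout follows the demo notebooks and is an assumption that may not hold for every checkpoint.

import numpy as np

ckpt = np.load('checkpoints/causal_tapir_checkpoint.npy', allow_pickle=True).item()
params, state = ckpt['params'], ckpt['state']  # nested dicts of arrays, keyed by Haiku module name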

TAP-Net and TAPIR Training and Inference

We provide a train and eval framework for TAP-Net and TAPIR in the training directory; see the training README.

Citing this Work

Please use the following BibTeX entries to cite our work:

@article{doersch2022tap,
  title={{TAP}-Vid: A Benchmark for Tracking Any Point in a Video},
  author={Doersch, Carl and Gupta, Ankush and Markeeva, Larisa and Recasens, Adria and Smaira, Lucas and Aytar, Yusuf and Carreira, Joao and Zisserman, Andrew and Yang, Yi},
  journal={Advances in Neural Information Processing Systems},
  volume={35},
  pages={13610--13626},
  year={2022}
}
@inproceedings{doersch2023tapir,
  title={{TAPIR}: Tracking any point with per-frame initialization and temporal refinement},
  author={Doersch, Carl and Yang, Yi and Vecerik, Mel and Gokay, Dilara and Gupta, Ankush and Aytar, Yusuf and Carreira, Joao and Zisserman, Andrew},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={10061--10072},
  year={2023}
}
@article{vecerik2023robotap,
  title={{RoboTAP}: Tracking arbitrary points for few-shot visual imitation},
  author={Vecerik, Mel and Doersch, Carl and Yang, Yi and Davchev, Todor and Aytar, Yusuf and Zhou, Guangyao and Hadsell, Raia and Agapito, Lourdes and Scholz, Jon},
  journal={International Conference on Robotics and Automation},
  year={2024}
}
@article{doersch2024bootstap,
  title={{BootsTAP}: Bootstrapped Training for Tracking-Any-Point},
  author={Doersch, Carl and Luc, Pauline and Yang, Yi and Gokay, Dilara and Koppula, Skanda and Gupta, Ankush and Heyward, Joseph and Rocco, Ignacio and Goroshin, Ross and Carreira, Jo{\~a}o and Zisserman, Andrew},
  journal={arXiv preprint arXiv:2402.00847},
  year={2024}
}
@misc{koppula2024tapvid3d,
      title={{TAPVid}-{3D}: A Benchmark for Tracking Any Point in {3D}},
      author={Skanda Koppula and Ignacio Rocco and Yi Yang and Joe Heyward and João Carreira and Andrew Zisserman and Gabriel Brostow and Carl Doersch},
      year={2024},
      eprint={2407.05921},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2407.05921},
}

License and Disclaimer

Copyright 2022-2024 Google LLC

Software and other materials specific to the TAPVid-3D benchmark are covered by the license outlined in the tapvid3d/LICENSE file.

All other software in this repository is licensed under the Apache License, Version 2.0 (Apache 2.0); you may not use this file except in compliance with the Apache 2.0 license. You may obtain a copy of the Apache 2.0 license at:

https://www.apache.org/licenses/LICENSE-2.0

All other non-software materials released here for the TAP-Vid datasets, i.e. the TAP-Vid annotations, as well as the RGB-Stacking videos and RoboTAP videos, are released under a Creative Commons BY license. You may obtain a copy of the CC-BY license at: https://creativecommons.org/licenses/by/4.0/legalcode .

The original source videos of DAVIS come from the val set and are also licensed under Creative Commons licenses per their creators; see the DAVIS dataset for details. Kinetics videos are publicly available on YouTube, but subject to their own individual licenses. See the Kinetics dataset webpage for details.

Unless required by applicable law or agreed to in writing, all software and materials distributed here under the Apache 2.0 or CC-BY licenses are distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the licenses for the specific language governing permissions and limitations under those licenses.

This is not an official Google product.

tapnet's People

Contributors

cdoersch, dilaragokay, hawkinsp, ignacio-rocco, joaoluiscarreira, mel-dm-github, msajjadi-g, rchen152, rerrayne, sgjheywa, yangyi02


tapnet's Issues

Training on a new dataset

I am looking to train TAP-Net on a new dataset, in particular modifying the existing datasets (DAVIS, Kubric, RGB Stacking, etc.) to use new keypoints that we generate. It is not immediately clear to me how we should add a new dataset, and how to use the existing scripts such as experiment.py in order to train TAP-Net on the new dataset.

It looks like Kubric is the only dataset that is supported for training, whereas DAVIS and RGB Stacking are included for inference. Could you walk me through what format TAP-Net expects from a dataset, and where in the code/config I would need to add functionality in order to use the new dataset?

what is going on when "recompilation" happens in a for loop

Hi everyone,

from the previous threads, we know that the main factor for slow inference is "change in input tensor size causing recompilation". Now, if you may, I would like to break this statement down more clearly:

Notice the following code (PS: it is in the colab format I am using):

def inference():
    ...
    rng = jax.random.PRNGKey(42)
    outputs, _ = model_apply(params, state, rng, frames, query_points)   ## highlight 1
    ...
    return ...

model = hk.transform_with_state(build_model)
model_apply = jax.jit(model.apply)   ## highlight 2

for video in videos:
    ...
    tracks, visibles = inference(frames, query_points)  ## highlight 3

May i ask:

  1. Concretely, in which call should the input tensor size be fixed to avoid recompilation? (Is it inside inference(), in highlight 1, or somewhere else?)
  2. Where exactly does compilation/recompilation happen?

Conjecture:

if it is "jax.jit" that causes compilation, then supposably from highlight 2, a compiled version of model_apply is returned. After this, no other jax.jit is called, we simply enter a for loop that continues calls for "inference()". Everytime inference is called, it used the pre-compiled version of "model_apply", it do not have access to the outside "jax.jit". So where exactly does this recompilation stem from?

Much thanks to one who read through!
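For context on the question above: jax.jit compiles one executable per distinct input shape/dtype signature, so passing videos with different frame counts (or different numbers of query points) to the same jitted model_apply triggers a fresh compilation inside the loop, even though jax.jit itself is called only once. A generic, TAPIR-independent illustration:

import jax
import jax.numpy as jnp

@jax.jit
def f(x):
  return (x * 2.0).sum()

f(jnp.zeros((100, 256, 256, 3)))  # first call: traces and compiles for this shape
f(jnp.zeros((100, 256, 256, 3)))  # same shape: reuses the cached executable
f(jnp.zeros((120, 256, 256, 3)))  # new shape: traces and compiles again
# Resizing or padding all videos/query sets to one fixed shape avoids the repeated compiles.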

OOM error

Hi,
Each time I try to run videos with more than 300 frames I run into a memory allocation error. Otherwise, for anything below that, it seems to work really well, with the GPU engaged etc.
Do you have any recommendations? I am putting the traceback below.

2023-08-11 16:43:58.122328: W external/xla/xla/service/gpu/conv_algorithm_picker.cc:1003] Failed to determine best cudnn convolution algorithm for:
%cudnn-conv-bias-activation.34 = (f32[500,64,128,128]{3,2,1,0}, u8[0]{0}) custom-call(f32[500,64,128,128]{3,2,1,0} %maximum.62, f32[64,64,3,3]{3,2,1,0} %transpose.2482, f32[64]{0} %broadcast.2604, f32[500,64,128,128]{3,2,1,0} %get-tuple-element.147), window={size=3x3 pad=1_1x1_1}, dim_labels=bf01_oi01->bf01, custom_call_target="__cudnn$convBiasActivationForward", metadata={op_name="jit(apply_fn)/jit(main)/tapir/get_feature_grids/tapir/resnet/block_group_0/block_0/conv_1/conv_general_dilated[window_strides=(1, 1) padding=((1, 1), (1, 1)) lhs_dilation=(1, 1) rhs_dilation=(1, 1) dimension_numbers=ConvDimensionNumbers(lhs_spec=(0, 3, 1, 2), rhs_spec=(3, 2, 0, 1), out_spec=(0, 3, 1, 2)) feature_group_count=1 batch_group_count=1 precision=None preferred_element_type=None]" source_file="/home/rum/git/tapnet/models/resnet.py" source_line=255}, backend_config={"conv_result_scale":1,"activation_mode":"kNone","side_input_scale":1,"leakyrelu_alpha":0}

Original error: RESOURCE_EXHAUSTED: Out of memory while trying to allocate 2113929216 bytes.

As a result, convolution performance may be suboptimal.
2023-08-11 16:43:59.367796: W external/xla/xla/service/gpu/conv_algorithm_picker.cc:1003] Failed to determine best cudnn convolution algorithm for:
%cudnn-conv-bias-activation.35 = (f32[500,64,128,128]{3,2,1,0}, u8[0]{0}) custom-call(f32[500,64,128,128]{3,2,1,0} %maximum.65, f32[64,64,3,3]{3,2,1,0} %transpose.2485, f32[64]{0} %broadcast.2618, f32[500,64,128,128]{3,2,1,0} %get-tuple-element.149), window={size=3x3 pad=1_1x1_1}, dim_labels=bf01_oi01->bf01, custom_call_target="__cudnn$convBiasActivationForward", metadata={op_name="jit(apply_fn)/jit(main)/tapir/get_feature_grids/tapir/resnet/block_group_0/block_1/conv_1/conv_general_dilated[window_strides=(1, 1) padding=((1, 1), (1, 1)) lhs_dilation=(1, 1) rhs_dilation=(1, 1) dimension_numbers=ConvDimensionNumbers(lhs_spec=(0, 3, 1, 2), rhs_spec=(3, 2, 0, 1), out_spec=(0, 3, 1, 2)) feature_group_count=1 batch_group_count=1 precision=None preferred_element_type=None]" source_file="/home/rum/git/tapnet/models/resnet.py" source_line=255}, backend_config={"conv_result_scale":1,"activation_mode":"kNone","side_input_scale":1,"leakyrelu_alpha":0}

Original error: RESOURCE_EXHAUSTED: Out of memory while trying to allocate 2113929216 bytes.

As a result, convolution performance may be suboptimal.

2023-08-11 16:44:47.337077: W external/tsl/tsl/framework/bfc_allocator.cc:485] Allocator (GPU_0_bfc) ran out of memory trying to allocate 5.89GiB (rounded to 6324376320)requested by op
2023-08-11 16:44:47.337531: W external/tsl/tsl/framework/bfc_allocator.cc:497] ________________________________________________________________________________________***_____
2023-08-11 16:44:47.339353: E external/xla/xla/pjrt/pjrt_stream_executor_client.cc:2593] Execution of replica 0 failed: RESOURCE_EXHAUSTED: Out of memory while trying to allocate 6324376288 bytes.
BufferAssignment OOM Debugging.
BufferAssignment stats:
parameter allocation: 493.53MiB
constant allocation: 8.1KiB
maybe_live_out allocation: 781.4KiB
preallocated temp allocation: 5.89GiB
preallocated temp fragmentation: 30.84MiB (0.51%)
total allocation: 6.37GiB
total fragmentation: 31.63MiB (0.48%)
Peak buffers:
Buffer 1:
Size: 1.95GiB
Operator: op_name="jit(apply_fn)/jit(main)/tapir/get_feature_grids/tapir/resnet/block_group_0/block_1/conv_0/conv_general_dilated[window_strides=(1, 1) padding=((1, 1), (1, 1)) lhs_dilation=(1, 1) rhs_dilation=(1, 1) dimension_numbers=ConvDimensionNumbers(lhs_spec=(0, 3, 1, 2), rhs_spec=(3, 2, 0, 1), out_spec=(0, 3, 1, 2)) feature_group_count=1 batch_group_count=1 precision=None preferred_element_type=None]" source_file="/home/rum/git/tapnet/models/resnet.py" source_line=255
XLA Label: custom-call
Shape: f32[500,64,128,128]
==========================

Buffer 2:
	Size: 1.95GiB
	Operator: op_name="jit(apply_fn)/jit(main)/tapir/get_feature_grids/tapir/resnet/block_group_0/block_1/jit(relu)/max" source_file="/home/rum/git/tapnet/models/resnet.py" source_line=252 deduplicated_name="fusion.656"
	XLA Label: fusion
	Shape: f32[500,64,128,128]
	==========================

Buffer 3:
	Size: 1.95GiB
	Operator: op_name="jit(apply_fn)/jit(main)/tapir/get_feature_grids/tapir/resnet/block_group_0/block_0/conv_1/conv_general_dilated[window_strides=(1, 1) padding=((1, 1), (1, 1)) lhs_dilation=(1, 1) rhs_dilation=(1, 1) dimension_numbers=ConvDimensionNumbers(lhs_spec=(0, 3, 1, 2), rhs_spec=(3, 2, 0, 1), out_spec=(0, 3, 1, 2)) feature_group_count=1 batch_group_count=1 precision=None preferred_element_type=None]" source_file="/home/rum/git/tapnet/models/resnet.py" source_line=255
	XLA Label: custom-call
	Shape: f32[500,64,128,128]
	==========================

Buffer 4:
	Size: 375.00MiB
	Entry Parameter Subshape: f32[1,500,256,256,3]
	==========================

Buffer 5:
	Size: 4.00MiB
	Entry Parameter Subshape: f32[2048,512]
	==========================

Buffer 6:
	Size: 4.00MiB
	Entry Parameter Subshape: f32[512,2048]
	==========================

Buffer 7:
	Size: 4.00MiB
	Entry Parameter Subshape: f32[2048,512]
	==========================

Buffer 8:
	Size: 4.00MiB
	Entry Parameter Subshape: f32[512,2048]
	==========================

Buffer 9:
	Size: 4.00MiB
	Entry Parameter Subshape: f32[2048,512]
	==========================

Buffer 10:
	Size: 4.00MiB
	Entry Parameter Subshape: f32[512,2048]
	==========================

Buffer 11:
	Size: 4.00MiB
	Entry Parameter Subshape: f32[2048,512]
	==========================

Buffer 12:
	Size: 4.00MiB
	Entry Parameter Subshape: f32[512,2048]
	==========================

Buffer 13:
	Size: 4.00MiB
	Entry Parameter Subshape: f32[2048,512]
	==========================

Buffer 14:
	Size: 4.00MiB
	Entry Parameter Subshape: f32[512,2048]
	==========================

Buffer 15:
	Size: 4.00MiB
	Entry Parameter Subshape: f32[2048,512]
	==========================

XlaRuntimeError Traceback (most recent call last)
Cell In[18], line 9
7 vidrames = copy.copy(frames)
8 frames = vidrames[:500]
----> 9 tracks, visibles = inference(frames, query_points)

Cell In[3], line 92, in inference(frames, query_points)
90 # Model inference
91 rng = jax.random.PRNGKey(42)
---> 92 outputs, _ = model_apply(params, state, rng, frames, query_points)
93 outputs = tree.map_structure(lambda x: np.array(x[0]), outputs)
94 tracks, occlusions, expected_dist = outputs['tracks'], outputs['occlusion'], outputs['expected_dist']

[... skipping hidden 10 frame]

File ~/miniconda3/envs/tapir/lib/python3.10/site-packages/jax/_src/interpreters/pxla.py:1229, in ExecuteReplicated.call(self, *args)
1224 self._handle_token_bufs(
1225 results.disassemble_prefix_into_single_device_arrays(
1226 len(self.ordered_effects)),
1227 results.consume_token())
1228 else:
-> 1229 results = self.xla_executable.execute_sharded(input_bufs)
1230 if dispatch.needs_check_special():
1231 out_arrays = results.disassemble_into_single_device_arrays()

XlaRuntimeError: RESOURCE_EXHAUSTED: Out of memory while trying to allocate 6324376288 bytes.
BufferAssignment OOM Debugging.
BufferAssignment stats:
parameter allocation: 493.53MiB
constant allocation: 8.1KiB
maybe_live_out allocation: 781.4KiB
preallocated temp allocation: 5.89GiB
preallocated temp fragmentation: 30.84MiB (0.51%)
total allocation: 6.37GiB
total fragmentation: 31.63MiB (0.48%)
Peak buffers:
Buffer 1:
Size: 1.95GiB
Operator: op_name="jit(apply_fn)/jit(main)/tapir/get_feature_grids/tapir/resnet/block_group_0/block_1/conv_0/conv_general_dilated[window_strides=(1, 1) padding=((1, 1), (1, 1)) lhs_dilation=(1, 1) rhs_dilation=(1, 1) dimension_numbers=ConvDimensionNumbers(lhs_spec=(0, 3, 1, 2), rhs_spec=(3, 2, 0, 1), out_spec=(0, 3, 1, 2)) feature_group_count=1 batch_group_count=1 precision=None preferred_element_type=None]" source_file="/home/rum/git/tapnet/models/resnet.py" source_line=255
XLA Label: custom-call
Shape: f32[500,64,128,128]
==========================

Buffer 2:
	Size: 1.95GiB
	Operator: op_name="jit(apply_fn)/jit(main)/tapir/get_feature_grids/tapir/resnet/block_group_0/block_1/jit(relu)/max" source_file="/home/rum/git/tapnet/models/resnet.py" source_line=252 deduplicated_name="fusion.656"
	XLA Label: fusion
	Shape: f32[500,64,128,128]
	==========================

Buffer 3:
	Size: 1.95GiB
	Operator: op_name="jit(apply_fn)/jit(main)/tapir/get_feature_grids/tapir/resnet/block_group_0/block_0/conv_1/conv_general_dilated[window_strides=(1, 1) padding=((1, 1), (1, 1)) lhs_dilation=(1, 1) rhs_dilation=(1, 1) dimension_numbers=ConvDimensionNumbers(lhs_spec=(0, 3, 1, 2), rhs_spec=(3, 2, 0, 1), out_spec=(0, 3, 1, 2)) feature_group_count=1 batch_group_count=1 precision=None preferred_element_type=None]" source_file="/home/rum/git/tapnet/models/resnet.py" source_line=255
	XLA Label: custom-call
	Shape: f32[500,64,128,128]
	==========================

Buffer 4:
	Size: 375.00MiB
	Entry Parameter Subshape: f32[1,500,256,256,3]
	==========================

Buffer 5:
	Size: 4.00MiB
	Entry Parameter Subshape: f32[2048,512]
	==========================

Buffer 6:
	Size: 4.00MiB
	Entry Parameter Subshape: f32[512,2048]
	==========================

Buffer 7:
	Size: 4.00MiB
	Entry Parameter Subshape: f32[2048,512]
	==========================

Buffer 8:
	Size: 4.00MiB
	Entry Parameter Subshape: f32[512,2048]
	==========================

Buffer 9:
	Size: 4.00MiB
	Entry Parameter Subshape: f32[2048,512]
	==========================

Buffer 10:
	Size: 4.00MiB
	Entry Parameter Subshape: f32[512,2048]
	==========================

Buffer 11:
	Size: 4.00MiB
	Entry Parameter Subshape: f32[2048,512]
	==========================

Buffer 12:
	Size: 4.00MiB
	Entry Parameter Subshape: f32[512,2048]
	==========================

Buffer 13:
	Size: 4.00MiB
	Entry Parameter Subshape: f32[2048,512]
	==========================

Buffer 14:
	Size: 4.00MiB
	Entry Parameter Subshape: f32[512,2048]
	==========================

Buffer 15:
	Size: 4.00MiB
	Entry Parameter Subshape: f32[2048,512]
	==========================

TAPIR Checkpoint for Training/Finetuning

In the previous iteration of TAP-Net, there was a checkpoint file released that had not only the model state but also the optimizer state as well as the global_step. This was helpful since you could load it directly into an experiment and easily start finetuning. However, I don't believe there is a similar checkpoint file for TAPIR. In the README there is a checkpoint for the "online" version of the model, and in one of the linked notebooks there is a checkpoint for the offline model, but neither includes training state.

Could you release a checkpoint with the training state included for the TAPIR model?

How to use GPU for inference

Hi authors of tapnet,

thanks for your great work.

I want to ask how I can use the GPU for inference in your work.

With the normal setup, it tells me "no GPU/TPU, falling back to cpu" when I actually have a GPU available.

Then I saw that it seems I need the correct version of jax,
so I created a new env, pip installed the compatible version of jax as instructed in a link you provide, and then set up the remaining dependencies as in requirements.txt. But I got this:
"Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR"
and
"jaxlib.xla_extension.XlaRuntimeError: FAILED_PRECONDITION: DNN library initialization failed. Look at the errors above for more details."

Do you know why that is? Thanks a lot

Questions about the evaluation

Hi TapNet authors,

Just to confirm if I understand the evaluation correctly: It is clear that you train and test tap-net on resized images of 256x256. When you run the baseline methods, you also use resized images of 256x256 as input, is that correct?

If so, could you explain why it is preferable to evaluate on reduced resolution when the full-resolution images & annotations are available?

Incorrect evaluation in the 'first' mode?

Hi,

I wonder about the code here

It seems to me that it is supposed to turn off evaluation of points before the query frame.
I however don't understand why / how is the 'index' computed. We already have the query frames computed here, so why the np.where?

Anyway, I think the code is incorrect. The for loop loops over the "batch" dimension, which is always 1. This means that the current code does not turn off evaluation before the query point.

I think it should be corrected to:

for i in range(gt_occluded.shape[1]): # loop over the query points
  index = np.where(gt_occluded[0, i] == 0)[0][0]  # find the first unoccluded frame for that query point (in singleton batch 0)
  evaluation_points[i, :index] = False

or equivalently:

N_queries = gt_occluded.shape[1]
for i in range(N_queries):
  evaluation_points[i, :query_frame[i]] = False

Am I missing something? Or can I make a pull request for this correction?

BTW it would be very nice if you could provide the raw results (i.e., the pred_occluded and pred_tracks arguments of compute_tapvid_metrics) of the trackers evaluated in your papers, so that people could evaluate the trackers on different metrics without having to re-implement and without wasting quite a lot of compute (pretty please?)

Best regards!

how to set a specific gpu to use in tapir code

Hi authors,

suppose that I have successfully installed a compatible version of jax w.r.t. my cuda/cudnn, how do I modify the following code so that I could (a sketch addressing these three points follows the snippet below):

  1. select a certain GPU to use, i.e. cuda:1
  2. check that I am indeed using a GPU
  3. avoid every-time compilation (after I fix the input tensor size)

PS: I am running inference on videos in a dataset with the following for-loop code (it is in the colab format):

  def inference():
      ...
      ...

  model = hk.transform_with_state(build_model)
  model_apply = jax.jit(model.apply)
  
  ### for loop every video in a dataset
  for i in os.listdir("/home/chy/dataset"):
      if not i.startswith("."):
  
          rh,rw = 256,256
          video = np.load("/home/chy/dataset/"+i)
          height, width = video.shape[1:3]
          frames = media.resize_video(video, (rh, rw))
          query_points = points

          tracks, visibles = inference(frames, query_points)
          tracks = transforms.convert_grid_coordinates(tracks, (rw, rh), (width, height))
          np.save("path",tracks)

much thx~
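For context, here is a minimal sketch covering the three points above: device selection via CUDA_VISIBLE_DEVICES, checking the backend with jax.devices(), and keeping input shapes fixed. The device index and resolution are illustrative.

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '1'  # must be set before JAX initialises the backend

import jax
print(jax.devices())  # should list a GPU device rather than only CPU

RESIZE_H, RESIZE_W = 256, 256  # resize every video to the same (height, width)...
# ...and note that the number of frames and query points are also part of the jitted
# input shape, so videos of different lengths still recompile unless padded or chunked.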

Reproducing RAFT Results

Thanks for the nice work! Is the codebase used to evaluate RAFT possibly available publicly?

I was using your evaluate_dataset.py utilities to load DAVIS and KUBRIC and to later compute the PCK metrics but wasn't able to reproduce the reported numbers. For example, my average_pts_within_thresh for DAVIS is 87.5% and does not match the reported 46.3%. The numbers that I've got for strided query evaluation with stride 5 are the following:

| RAFT Results | DAVIS | KUBRIC |
| --- | --- | --- |
| ade_visible | 1.42952 | 0.954813 |
| pts_within_0.01 | 20.585 | 20.6073 |
| pts_within_0.1 | 26.0005 | 31.6371 |
| pts_within_0.5 | 49.9805 | 68.3699 |
| pts_within_1 | 66.5964 | 82.676 |
| pts_within_2 | 82.5936 | 91.0452 |
| pts_within_4 | 92.3043 | 95.6189 |
| pts_within_8 | 96.7841 | 98.0127 |
| pts_within_16 | 99.0649 | 99.1844 |
| average_pts_within_thresh | 87.4687 | 93.3074 |

In case you might spot something obviously incorrect, here is the ground truth and my predicted trajectory for the first point and the first 5 frames of DAVIS, all of which were visible, alongside relevant summary metrics computed for that datapoint alone.

*** Results dictionary for the first datapoint of RAFT:
iter: 1
video_idx: 0
point_idx_in_video: 0
trajectory_gt: tensor([[131.9137,  87.8248],
        [135.1333,  89.2444],
        [137.8000,  91.8519],
        [140.2000,  95.8815],
        [142.6000, 102.5185]])
trajectory_pred: tensor([[131.9137,  87.8248],
        [135.3244,  89.0901],
        [137.8152,  91.3863],
        [140.1928,  94.9903],
        [142.7608, 101.2055]])
visibility_gt: tensor([True, True, True, True, True])

*** Summary metrics for the first datapoint of RAFT:
idx: 1--0--0
ade_visible: 0.5850879549980164 (including query point)
pts_within_0.01: 0.0 (percentage)
pts_within_0.1: 0.0
pts_within_0.5: 50.0
pts_within_1: 75.0
pts_within_2: 100.0
pts_within_4: 100.0
pts_within_8: 100.0
pts_within_16: 100.0

Also, here are a few GIFs for a random data batch of DAVIS (top) and KUBRIC (bottom):

(GIFs omitted: ground truth, prediction, and prediction overlaid on the ground truth, for each of the two batches.)

If relevant, these are the entry points into my codebase:

Cpu Inference Problems on live_demo

I am running inference on live video on CPU and the speed is very low. I think it is because of the CPU, but my problem is how to increase the number of frames processed per second. Do you have any ideas about it?
In simple words, I want to increase the speed on CPU.

Thanks in advance

Kubric dataset

Hi, when I use Kubric (MOVi-E), I find the number of videos in the train set is about 9750, the validation set is about 250, and the test set is about 999, which is different from the 38,325/799 train/validation split reported in the paper.

Question about the evaluation

Hi:
Thanks for your great work. I have a problem with the evaluation.
The paper proposes two different evaluation modes ("first" or "strided"). Using the "first" mode, where a point is only queried in the first frame in which it is to be tracked, if the point is occluded at a certain timestamp t and then appears again, will the predicted trajectory after t be evaluated?

Running TAPNET in Windows?

I am trying to run TAPNET in Windows 11 with Anaconda3-2023.07-2-Windows-x86_64.exe but I still get a JAX related error message.
I already installed jaxlib-0.4.11-cp311-cp311-win_amd64.whl but it's apparently not enough.

(base) C:\Users\ATC\tapnet>python ./experiment.py --config ./configs/tapir_config.py
Traceback (most recent call last):
  File "C:\Users\ATC\tapnet\experiment.py", line 29, in <module>
    from jaxline import experiment
  File "C:\ProgramData\anaconda3\Lib\site-packages\jaxline\experiment.py", line 30, in <module>
    from jaxline import utils
  File "C:\ProgramData\anaconda3\Lib\site-packages\jaxline\utils.py", line 335, in <module>
    rng: jnp.DeviceArray,
         ^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\Lib\site-packages\jax\_src\deprecations.py", line 53, in getattr
    raise AttributeError(f"module {module!r} has no attribute {name!r}")
AttributeError: module 'jax.numpy' has no attribute 'DeviceArray'

conda list includes this:

jax                       0.4.14                   pypi_0    pypi
jaxlib                    0.4.11                   pypi_0    pypi
jaxline                   0.0.5                    pypi_0    pypi

Can somebody tell me what I am doing wrong?

Training with CUDA

Hi, I have already installed the JAX version with CUDA.
However, when I run python ./experiment.py --config ./configs/tapnet_config.py,
it still tries to use the TPU, eventually fails because I don't have access to one, and then ends up using the CPU by default.

I wonder how to use CUDA to train and evaluate the model.

Thank you very much for your support in advance!

Missing videos in kinetics dataset

Hello,

Thank you for releasing these datasets.

I'm currently processing the dataset that you used to evaluate/train the model. However, when generating the pkl file for the kinetics dataset, I receive the following warning:

[screenshot of the missing-video warnings omitted]

Could you help to double check the dataset again if you have these videos?

FYI, I follow this link to download and extract the kinetics dataset: https://github.com/cvdfoundation/kinetics-dataset
When checking their annotation file: https://s3.amazonaws.com/kinetics/700_2020/annotations/val.csv, I could not find these videos either.

Best,
Hung

Kubric dataset: few occluded points are not labeled as occluded

Hi,

First, thanks for sharing your work.

In the Kubric dataset, I see that almost all occluded points are correctly labeled as occluded, but I still visually observe a few points among the 1200 = 50 (videos) x 24 (points per video) that are occluded yet not labeled as such. I'm sure these points are occluded because the foreground object has moved a large distance while the ground truth stays at the original position, such as a position on the ground. I looked into the code but still have no idea what causes this problem. Could you please give some suggestions or even a solution? Thanks.

There is also an issue that "reprojected position is off for points on objects" under the Kubric repository. Would you be willing to answer that too? google-research/kubric#280

Best,
Zhihao

Code and checkpoint for Kubric-VFS-Like baseline

Hi all,

I have been working on evaluating the Kubric-VFS-like baseline on the Kubric dataset. Would it be possible for you to provide the evaluation code for Kubric-VFS-Like baseline and the checkpoint used to get those results? And for evaluating on Kubric, was the Kubric-VFS-like baseline also trained on Kubric?

Also, it would be great if you could provide the evaluation code and checkpoint for RAFT as well.

Thanks!

eval results seems not so good?

Note: this is not a bug report.
When I run the script as follows:
python3 ./tapnet/experiment.py \ --config=./tapnet/configs/tapnet_config.py \ --jaxline_mode=eval_davis_points \ --config.checkpoint_dir=./tapnet/checkpoint/ \ --config.experiment_kwargs.config.davis_points_path=/path/to/tapvid_davis.pkl
I get results in \tapnet\checkpoint\eval_davis_points\100000, where 0.mp4 - 9.mp4 are generated. In every mp4 video there are four frames in the parent picture, though all frames seem blurry, and the points in the pictures all look bad. Is this a real result? I mean, why do the results look disappointing rather than amazing?

Is TAP-Net an offline algorithm?

Hi, thank you very much for your great work!

Could I consider TAP-Net an offline algorithm, since TSM-ResNet-18 is used as the backbone?

ValueError: Unable to retrieve parameter 'w' when trying to use `eval_inference`

When invoking experiment.py to do inference:

python3 ./tapnet/experiment.py \
  --config=./tapnet/configs/tapnet_config.py \
  --jaxline_mode=eval_inference \
  --config.checkpoint_dir=./tapnet/checkpoint/ \
  --config.experiment_kwargs.config.inference.input_video_path=fixed10.mp4 \
  --config.experiment_kwargs.config.inference.output_video_path=result.mp4 \
  --config.experiment_kwargs.config.inference.resize_height=256 \
  --config.experiment_kwargs.config.inference.resize_width=256 \
  --config.experiment_kwargs.config.inference.num_points=20

I get the following error:

Traceback (most recent call last):
  File "./tapnet/experiment.py", line 431, in <module>
    app.run(main)
  File "/home/jrcoyle/.local/lib/python3.8/site-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/home/jrcoyle/.local/lib/python3.8/site-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "./tapnet/experiment.py", line 424, in main
    platform.main(
  File "/home/jrcoyle/.local/lib/python3.8/site-packages/jaxline/utils.py", line 484, in inner_wrapper
    return f(*args, **kwargs)
  File "/home/jrcoyle/.local/lib/python3.8/site-packages/jaxline/platform.py", line 137, in main
    train.evaluate(experiment_class, config, checkpointer, writer,
  File "/home/jrcoyle/.local/lib/python3.8/site-packages/jaxline/utils.py", line 620, in inner_wrapper
    return fn(*args, **kwargs)
  File "/home/jrcoyle/.local/lib/python3.8/site-packages/jaxline/train.py", line 225, in evaluate
    scalar_values = utils.evaluate_should_return_dict(experiment.evaluate)(
  File "/home/jrcoyle/.local/lib/python3.8/site-packages/jaxline/utils.py", line 521, in evaluate_with_warning
    evaluate_out = f(*args, **kwargs)
  File "./tapnet/experiment.py", line 405, in evaluate
    eval_scalars = point_prediction_task.evaluate(
  File "/home/jrcoyle/tapnet/supervised_point_prediction.py", line 370, in evaluate
    self._eval_inference(
  File "/home/jrcoyle/tapnet/supervised_point_prediction.py", line 981, in _eval_inference
    outputs, _ = self._infer_batch(
  File "/home/jrcoyle/tapnet/supervised_point_prediction.py", line 440, in _infer_batch
    output, _ = functools.partial(wrapped_forward_fn, input_key=input_key)(
  File "/home/jrcoyle/.local/lib/python3.8/site-packages/haiku/_src/transform.py", line 357, in apply_fn
    out = f(*args, **kwargs)
  File "./tapnet/experiment.py", line 125, in forward
    return self.point_prediction.forward_fn(
  File "/home/jrcoyle/tapnet/supervised_point_prediction.py", line 150, in forward_fn
    return shared_modules[self.model_key](
  File "/home/jrcoyle/.local/lib/python3.8/site-packages/haiku/_src/module.py", line 426, in wrapped
    out = f(*args, **kwargs)
  File "/usr/lib/python3.8/contextlib.py", line 75, in inner
    return func(*args, **kwds)
  File "/home/jrcoyle/.local/lib/python3.8/site-packages/haiku/_src/module.py", line 272, in run_interceptors
    return bound_method(*args, **kwargs)
  File "/home/jrcoyle/tapnet/tapnet_model.py", line 215, in __call__
    latent = self.tsm_resnet(
  File "/home/jrcoyle/.local/lib/python3.8/site-packages/haiku/_src/module.py", line 426, in wrapped
    out = f(*args, **kwargs)
  File "/usr/lib/python3.8/contextlib.py", line 75, in inner
    return func(*args, **kwds)
  File "/home/jrcoyle/.local/lib/python3.8/site-packages/haiku/_src/module.py", line 272, in run_interceptors
    return bound_method(*args, **kwargs)
  File "/home/jrcoyle/tapnet/models/tsm_resnet.py", line 383, in __call__
    net = hk.Conv2D(
  File "/home/jrcoyle/.local/lib/python3.8/site-packages/haiku/_src/module.py", line 426, in wrapped
    out = f(*args, **kwargs)
  File "/usr/lib/python3.8/contextlib.py", line 75, in inner
    return func(*args, **kwds)
  File "/home/jrcoyle/.local/lib/python3.8/site-packages/haiku/_src/module.py", line 272, in run_interceptors
    return bound_method(*args, **kwargs)
  File "/home/jrcoyle/.local/lib/python3.8/site-packages/haiku/_src/conv.py", line 200, in __call__
    w = hk.get_parameter("w", w_shape, inputs.dtype, init=w_init)
  File "/home/jrcoyle/.local/lib/python3.8/site-packages/haiku/_src/base.py", line 448, in wrapped
    return wrapped._current(*args, **kwargs)
  File "/home/jrcoyle/.local/lib/python3.8/site-packages/haiku/_src/base.py", line 524, in get_parameter
    raise ValueError(
ValueError: Unable to retrieve parameter 'w' for module 'tap_net/~/tsm_resnet_video/tsm_resnet_stem' All parameters must be created as part of `init`.

Attempting to use a local GPU. The live_demo.py script works for me, so not sure what the issue is here.

Kubric evaluation set

When evaluating the model on kubric, the generation code still randomly samples points from the kubric val videos each time.

How do you make sure the sampled points are similar each time they are sampled?

Evaluation setting - Tracking forward & backward?

I would like to clarify the evaluation setting.

Because the query points can be sampled any where within the video (not just in the first frame), do we have to track them backward in time or just need to track them forward?

For example, if the query point is sampled at frame T, do we have to find its position in frames 0->(T-1), or just need to track it in (T+1)->Max_Frame?

TAPIR gives out negative coordinates

Hi authors,

I find that TAPIR sometimes gives negative coordinates. Why does this happen, and what do you think it represents?

thx!

Panning MOVi-E

Hi,

Thank you for the great work and code release. May I know how I should obtain the "panning MOVi-E" dataset from the TAPIR paper? I see references in this repo to the standard Kubric API, but I don't see any pointers to the "panning" version.

Thanks a lot!

Is it assumed that image corners are aligned for image coordinates?

Hi,
In utils.transforms.convert_grid_coordinates, it is mentioned in the comments that
"""Convert image coordinates between image grids of different sizes.

By default, it assumes that the image corners are aligned. Therefore,
it adds .5 (since (0,0) is assumed to be the center of the upper-left grid
cell), multiplies by the size ratio, and then subtracts .5.
"""
I wanted to ask if it is indeed assumed that the image corners are aligned?

In addition, I also see that the .5 addition and subtraction is not actually implemented in the code. So does that mean the image corners are not aligned?

Thanks!

Error when evaluating on Kubric dataset

I'm trying to evaluate TAP-Net on the Kubric dataset, and I'm getting the error shown below. I am running the following script: python3 ./tapnet/experiment.py --config=./tapnet/configs/tapnet_config.py --jaxline_mode=eval_kubric --config.checkpoint_dir=/data3/tap/tap/tapnet_checkpoint/.

Any idea how to fix this? Thanks!

I0228 19:56:23.706737 140398657533120 train.py:152] Evaluating with config:
best_model_eval_metric: ''
best_model_eval_metric_higher_is_better: true
checkpoint_dir: /data3/tap//tap/tapnet_checkpoint/
checkpoint_interval_type: null
dataset_names: &id001 !!python/tuple
- kubric
eval_initial_weights: true
eval_modes: &id002 !!python/tuple
- eval_davis_points
- eval_jhmdb
- eval_robotics_points
- eval_kinetics_points
evaluate_every: 10000
experiment_kwargs:
  config:
    checkpoint_dir: /data3/tap//tap/tapnet_checkpoint/
    datasets:
      dataset_names: *id001
      kubric_kwargs:
        batch_dims: 8
        shuffle_buffer_size: 128
        train_size: !!python/tuple
        - 256
        - 256
    davis_points_path: ''
    eval_modes: *id002
    evaluate_every: 10000
    fast_variables: !!python/tuple []
    inference:
      input_video_path: ''
      num_points: 20
      output_video_path: ''
      resize_height: 256
      resize_width: 256
    jhmdb_path: ''
    optimizer:
      adam_kwargs:
        b1: 0.9
        b2: 0.95
        eps: 1.0e-08
      base_lr: 0.002
      cosine_decay_kwargs:
        end_value: 0.0
        init_value: 0.0
        warmup_steps: 5000
      max_norm: -1
      optimizer: adam
      schedule_type: cosine
      weight_decay: 0.01
    robotics_points_path: ''
    save_final_checkpoint_as_npy: true
    shared_modules:
      shared_module_names: &id003 !!python/tuple
      - tapnet_model
      tapnet_model_kwargs: {}
    supervised_point_prediction_kwargs:
      prediction_algo: cost_volume_regressor
    sweep_name: default_sweep
    training:
      n_training_steps: 100000
interval_type: secs
log_all_train_data: false
log_tensors_interval: 60
log_train_data_interval: 120.0
logging_interval_type: null
max_checkpoints_to_keep: 5
one_off_evaluate: false
random_mode_eval: same_host_same_device
random_mode_train: unique_host_unique_device
random_seed: 42
save_checkpoint_interval: 10
shared_module_names: *id003
train_checkpoint_all_hosts: false
training_steps: 100000

I0228 19:56:23.755014 140398657533120 xla_bridge.py:173] Remote TPU is not linked into jax; skipping remote TPU.
I0228 19:56:23.755347 140398657533120 xla_bridge.py:357] Unable to initialize backend 'tpu_driver': Could not initialize backend 'tpu_driver'
I0228 19:56:24.424445 140398657533120 xla_bridge.py:357] Unable to initialize backend 'rocm': NOT_FOUND: Could not find registered platform with name: "rocm". Available platform names are: Interpreter Host CUDA
I0228 19:56:24.425570 140398657533120 xla_bridge.py:357] Unable to initialize backend 'tpu': module 'jaxlib.xla_extension' has no attribute 'get_tpu_client'
I0228 19:56:28.558446 140398657533120 supervised_point_prediction.py:979] Saving videos to /data3/tap//tap/tapnet_checkpoint/eval_kubric/0
I0228 19:56:28.567507 140398657533120 dataset_info.py:565] Load dataset info from /data3/tap/kubric/movi_e/256x256/1.0.0
W0228 19:56:28.572742 140398657533120 dtype_utils.py:43] You use TensorFlow DType <dtype: 'uint8'> in tfds.features This will soon be deprecated in favor of NumPy DTypes. In the meantime it was converted to uint8.
W0228 19:56:28.574135 140398657533120 dtype_utils.py:43] You use TensorFlow DType <dtype: 'uint16'> in tfds.features This will soon be deprecated in favor of NumPy DTypes. In the meantime it was converted to uint16.
I0228 19:56:28.620231 140398657533120 dataset_info.py:654] Fields info.[splits] from disk and from code do not match. Keeping the one from code.
I0228 19:56:28.620935 140398657533120 dataset_builder.py:522] Reusing dataset movi_e (/data3/tap/kubric/movi_e/256x256/1.0.0)
W0228 19:56:28.622349 140398657533120 feature.py:64] `TensorInfo.dtype` is deprecated. Please change your code to use NumPy with the field `TensorInfo.np_dtype` or use TensorFlow with the field `TensorInfo.tf_dtype`.
W0228 19:56:28.622643 140398657533120 dtype_utils.py:43] You use TensorFlow DType <dtype: 'float32'> in tfds.features This will soon be deprecated in favor of NumPy DTypes. In the meantime it was converted to float32.
W0228 19:56:28.622788 140398657533120 feature.py:64] `TensorInfo.dtype` is deprecated. Please change your code to use NumPy with the field `TensorInfo.np_dtype` or use TensorFlow with the field `TensorInfo.tf_dtype`.
W0228 19:56:28.622916 140398657533120 feature.py:64] `TensorInfo.dtype` is deprecated. Please change your code to use NumPy with the field `TensorInfo.np_dtype` or use TensorFlow with the field `TensorInfo.tf_dtype`.
W0228 19:56:28.623071 140398657533120 dtype_utils.py:43] You use TensorFlow DType <dtype: 'int32'> in tfds.features This will soon be deprecated in favor of NumPy DTypes. In the meantime it was converted to int32.
W0228 19:56:28.623173 140398657533120 feature.py:64] `TensorInfo.dtype` is deprecated. Please change your code to use NumPy with the field `TensorInfo.np_dtype` or use TensorFlow with the field `TensorInfo.tf_dtype`.
W0228 19:56:28.623292 140398657533120 feature.py:64] `TensorInfo.dtype` is deprecated. Please change your code to use NumPy with the field `TensorInfo.np_dtype` or use TensorFlow with the field `TensorInfo.tf_dtype`.
W0228 19:56:28.623408 140398657533120 feature.py:64] `TensorInfo.dtype` is deprecated. Please change your code to use NumPy with the field `TensorInfo.np_dtype` or use TensorFlow with the field `TensorInfo.tf_dtype`.
W0228 19:56:28.623907 140398657533120 feature.py:64] `TensorInfo.dtype` is deprecated. Please change your code to use NumPy with the field `TensorInfo.np_dtype` or use TensorFlow with the field `TensorInfo.tf_dtype`.
W0228 19:56:28.624111 140398657533120 dtype_utils.py:43] You use TensorFlow DType <dtype: 'string'> in tfds.features This will soon be deprecated in favor of NumPy DTypes. In the meantime it was converted to object.
W0228 19:56:28.624262 140398657533120 feature.py:64] `TensorInfo.dtype` is deprecated. Please change your code to use NumPy with the field `TensorInfo.np_dtype` or use TensorFlow with the field `TensorInfo.tf_dtype`.
W0228 19:56:28.624418 140398657533120 feature.py:64] `TensorInfo.dtype` is deprecated. Please change your code to use NumPy with the field `TensorInfo.np_dtype` or use TensorFlow with the field `TensorInfo.tf_dtype`.
W0228 19:56:28.624617 140398657533120 feature.py:64] `TensorInfo.dtype` is deprecated. Please change your code to use NumPy with the field `TensorInfo.np_dtype` or use TensorFlow with the field `TensorInfo.tf_dtype`.
W0228 19:56:28.625051 140398657533120 dtype_utils.py:43] You use TensorFlow DType <dtype: 'int64'> in tfds.features This will soon be deprecated in favor of NumPy DTypes. In the meantime it was converted to int64.
W0228 19:56:28.625315 140398657533120 dtype_utils.py:43] You use TensorFlow DType <dtype: 'bool'> in tfds.features This will soon be deprecated in favor of NumPy DTypes. In the meantime it was converted to bool.
I0228 19:56:29.712036 140398657533120 logging_logger.py:49] Constructing tf.data.Dataset movi_e for split None, from /data3/tap/kubric/movi_e/256x256/1.0.0
W0228 19:56:32.420510 140398657533120 deprecation.py:337] From /data/anaconda3/envs/tapnet/lib/python3.8/site-packages/tensorflow/python/util/dispatch.py:1082: multinomial (from tensorflow.python.ops.random_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.random.categorical` instead.
W0228 19:56:39.477390 140398657533120 deprecation.py:541] From /data/anaconda3/envs/tapnet/lib/python3.8/site-packages/tensorflow/python/util/dispatch.py:1082: calling crop_and_resize_v1 (from tensorflow.python.ops.image_ops_impl) with box_ind is deprecated and will be removed in a future version.
Instructions for updating:
box_ind is deprecated, use box_indices instead
2023-02-28 19:56:44.292071: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:828] shape_optimizer failed: INVALID_ARGUMENT: Subshape must have computed start >= end since stride is negative, but is 1 and 3 (computed from start 1 and end 9223372036854775807 over shape with rank 3 and stride-1)
2023-02-28 19:56:45.191140: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:828] shape_optimizer failed: INVALID_ARGUMENT: Subshape must have computed start >= end since stride is negative, but is 1 and 3 (computed from start 1 and end 9223372036854775807 over shape with rank 3 and stride-1)
Traceback (most recent call last):
  File "./tapnet/experiment.py", line 429, in <module>
    app.run(main)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "./tapnet/experiment.py", line 421, in main
    platform.main(
  File "/data/anaconda3/envs/tapnet/lib/python3.8/site-packages/jaxline/utils.py", line 484, in inner_wrapper
    return f(*args, **kwargs)
  File "/data/anaconda3/envs/tapnet/lib/python3.8/site-packages/jaxline/platform.py", line 137, in main
    train.evaluate(experiment_class, config, checkpointer, writer,
  File "/data/anaconda3/envs/tapnet/lib/python3.8/site-packages/jaxline/utils.py", line 620, in inner_wrapper
    return fn(*args, **kwargs)
  File "/data/anaconda3/envs/tapnet/lib/python3.8/site-packages/jaxline/train.py", line 225, in evaluate
    scalar_values = utils.evaluate_should_return_dict(experiment.evaluate)(
  File "/data/anaconda3/envs/tapnet/lib/python3.8/site-packages/jaxline/utils.py", line 521, in evaluate_with_warning
    evaluate_out = f(*args, **kwargs)
  File "./tapnet/experiment.py", line 404, in evaluate
    eval_scalars = point_prediction_task.evaluate(
  File "/home/ubuntu/contractive/tapnet/supervised_point_prediction.py", line 514, in evaluate
    self._eval_epoch(
  File "/home/ubuntu/contractive/tapnet/supervised_point_prediction.py", line 1005, in _eval_epoch
    scalars, viz = eval_batch_fn(params, state, inputs, rng)
  File "/home/ubuntu/contractive/tapnet/supervised_point_prediction.py", line 766, in _eval_batch
    occlusion_logits, tracks, loss_scalars = self._infer_batch(
  File "/home/ubuntu/contractive/tapnet/supervised_point_prediction.py", line 577, in _infer_batch
    output, _ = functools.partial(
  File "/data/anaconda3/envs/tapnet/lib/python3.8/site-packages/haiku/_src/transform.py", line 357, in apply_fn
    out = f(*args, **kwargs)
  File "./tapnet/experiment.py", line 122, in forward
    return self.point_prediction.forward_fn(
  File "/home/ubuntu/contractive/tapnet/supervised_point_prediction.py", line 313, in forward_fn
    return shared_modules['tapnet_model'](
  File "/data/anaconda3/envs/tapnet/lib/python3.8/site-packages/haiku/_src/module.py", line 426, in wrapped
    out = f(*args, **kwargs)
  File "/data/anaconda3/envs/tapnet/lib/python3.8/contextlib.py", line 75, in inner
    return func(*args, **kwds)
  File "/data/anaconda3/envs/tapnet/lib/python3.8/site-packages/haiku/_src/module.py", line 272, in run_interceptors
    return bound_method(*args, **kwargs)
  File "/home/ubuntu/contractive/tapnet/tapnet_model.py", line 341, in __call__
    latent = self.tsm_resnet(
  File "/data/anaconda3/envs/tapnet/lib/python3.8/site-packages/haiku/_src/module.py", line 426, in wrapped
    out = f(*args, **kwargs)
  File "/data/anaconda3/envs/tapnet/lib/python3.8/contextlib.py", line 75, in inner
    return func(*args, **kwds)
  File "/data/anaconda3/envs/tapnet/lib/python3.8/site-packages/haiku/_src/module.py", line 272, in run_interceptors
    return bound_method(*args, **kwargs)
  File "/home/ubuntu/contractive/tapnet/models/tsm_resnet.py", line 383, in __call__
    net = hk.Conv2D(
  File "/data/anaconda3/envs/tapnet/lib/python3.8/site-packages/haiku/_src/module.py", line 426, in wrapped
    out = f(*args, **kwargs)
  File "/data/anaconda3/envs/tapnet/lib/python3.8/contextlib.py", line 75, in inner
    return func(*args, **kwds)
  File "/data/anaconda3/envs/tapnet/lib/python3.8/site-packages/haiku/_src/module.py", line 272, in run_interceptors
    return bound_method(*args, **kwargs)
  File "/data/anaconda3/envs/tapnet/lib/python3.8/site-packages/haiku/_src/conv.py", line 200, in __call__
    w = hk.get_parameter("w", w_shape, inputs.dtype, init=w_init)
  File "/data/anaconda3/envs/tapnet/lib/python3.8/site-packages/haiku/_src/base.py", line 448, in wrapped
    return wrapped._current(*args, **kwargs)
  File "/data/anaconda3/envs/tapnet/lib/python3.8/site-packages/haiku/_src/base.py", line 524, in get_parameter
    raise ValueError(
ValueError: Unable to retrieve parameter 'w' for module 'tap_net/~/tsm_resnet_video/tsm_resnet_stem' All parameters must be created as part of `init`.

About the evaluation on Kinetics

Hi, thanks for your excellent work!
I have a problem with the evaluation on TAP-Vid-Kinetics.
I found three files in TAP-Vid-Kinetics: "train.txt", "test.txt", and "val.txt". Are all of them used for evaluation, or only the videos in the test set?

How does the model deal with occlusions?

Hi !
I wonder how TAP-Net deals with occlusions. I see that the model applies the Huber loss only at visible points. In the "query first" evaluation mode, when a point is occluded in the next frame and reappears later, how does the model keep tracking it? Does it first detect the occlusion, skip that frame, and then track the point in the following frames?
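
For readers trying to picture what "Huber loss only at visible points" means in practice, here is a minimal sketch of a visibility-masked track loss. The function names and shapes are illustrative assumptions, not the repository's actual API; see supervised_point_prediction.py for the real implementation.

import numpy as np

def huber(x, delta=1.0):
    # Standard elementwise Huber penalty.
    abs_x = np.abs(x)
    quadratic = 0.5 * np.square(x)
    linear = delta * (abs_x - 0.5 * delta)
    return np.where(abs_x <= delta, quadratic, linear)

def masked_track_loss(pred_tracks, gt_tracks, gt_visible):
    # pred_tracks, gt_tracks: [num_points, num_frames, 2] coordinates.
    # gt_visible: [num_points, num_frames] boolean visibility mask.
    per_frame = np.sum(huber(pred_tracks - gt_tracks), axis=-1)
    masked = per_frame * gt_visible.astype(np.float32)  # occluded frames contribute 0
    return np.sum(masked) / np.maximum(np.sum(gt_visible), 1)

Note that the model still predicts a position for every frame, along with an occlusion logit (visible in the tracebacks above as occlusion_logits); the mask only removes supervision of the position on occluded frames, it does not make the tracker skip those frames.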

CSV file error when generating TAP-Vid-Kinetics data

I followed the steps in https://github.com/deepmind/tapnet/blob/main/data/README.md to generate the TAP-Vid-Kinetics dataset.
When processing the clips, i.e. running the generate_tapvid.py script, I encountered an AssertionError; the message is shown below:

  File "generate_tapvid.py", line 177, in main
    videos = csv_to_dataset(FLAGS.csv_path, videos_path)
  File "generate_tapvid.py", line 73, in csv_to_dataset
    assert len(row) == 3 + 3 * 250, f"{len(row)}"
AssertionError: 711

It looks like the file tapvid_kinetics.csv (extracted from https://storage.googleapis.com/dm-tapnet/tapvid_kinetics.zip) contains some rows in an invalid format.
Can you check it?
Thanks!
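
In case it helps to narrow this down, a quick way to find the malformed rows is to scan the CSV and print the offending row indices; the expected length of 3 + 3 * 250 fields comes from the assertion quoted above. This is a hypothetical debugging snippet (the CSV filename is an assumption), not part of the repo.

import csv

EXPECTED = 3 + 3 * 250  # per the assertion in generate_tapvid.py

with open("tapvid_kinetics.csv", newline="") as f:  # adjust the path as needed
    for i, row in enumerate(csv.reader(f)):
        if len(row) != EXPECTED:
            print(f"row {i}: {len(row)} fields (expected {EXPECTED})")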

GPU JAX does not speed up the inference of TAP-Net?

First of all, thank you so much for releasing this great work. I tested it on my custom videos, and the tracking is really robust and accurate.

Following the README, I'm able to run the CPU version of the code (because the JAX in requirements.txt is the CPU-only version) at a very high speed. I use a 300-frame video and track 24 points from the initial frame. It takes only 10s to output the tracking results (excluding the video painting/saving time).

Then, I tried using the GPU version of JAX to further speed up the inference. I successfully installed jax-cuda (see the screenshot below), and nvidia-smi confirms that the code is indeed using the GPU (it consumes 20 GB of memory on an RTX 3090). However, the running time is 15s, much slower than JAX-CPU's 10s. For reference, I'm using the command from the README:

python3 ./tapnet/experiment.py \
  --config=./tapnet/configs/tapnet_config.py \
  --jaxline_mode=eval_inference \
  --config.checkpoint_dir=./tapnet/checkpoint/ \
  --config.experiment_kwargs.config.inference.input_video_path=MY_VIDEO.mp4 \
  --config.experiment_kwargs.config.inference.output_video_path=result.mp4 \
  --config.experiment_kwargs.config.inference.resize_height=256 \
  --config.experiment_kwargs.config.inference.resize_width=256 \
  --config.experiment_kwargs.config.inference.num_points=24

I'm new to JAX, so I'd really appreciate it if you could provide some hints on why my GPU code runs slower than the CPU version. Thanks!

EDIT: after looking at some JAX-GPU related issues and the documentation, is it simply because the video/point workload is too small? I.e., if I use a batch of videos or more points, should the GPU be faster?

[screenshot]
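
One thing worth ruling out before comparing CPU and GPU timings: JAX dispatch is asynchronous and the first call includes compilation, so a fair wall-clock measurement should block on the result and exclude the warm-up run. A minimal sketch, where apply_model is a placeholder for whatever inference function is being timed:

import time
import jax

def benchmark(apply_model, params, video, query_points, repeats=3):
    def block(out):
        # Wait for any asynchronously dispatched device arrays to finish.
        return jax.tree_util.tree_map(
            lambda x: x.block_until_ready() if hasattr(x, "block_until_ready") else x, out)

    block(apply_model(params, video, query_points))  # warm-up / compilation, not timed
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        block(apply_model(params, video, query_points))
        times.append(time.perf_counter() - start)
    return min(times)

With a single 256x256 video and only 24 query points, the per-call work is small enough that host/device transfers and dispatch overhead can dominate, so it is plausible that a larger batch or many more points would shift the balance toward the GPU.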

AttributeError: module 'tensorflow_datasets.core' has no attribute 'ReadWritePath'

Hi,
I am trying to run inference with a toy movie using the following command -
(tapnet) pinot:$ python3 ./experiment.py --config=./configs/tapnet_config.py --jaxline_mode=eval_inference --config.checkpoint_dir=./checkpoint/ --config.experiment_kwargs.config.inference.input_video_path=test_data/ta.mp4 --config.experiment_kwargs.config.inference.output_video_path=result.mp4 --config.experiment_kwargs.config.inference.resize_height=256 --config.experiment_kwargs.config.inference.resize_width=256 --config.experiment_kwargs.config.inference.num_points=20
I have created a virtual conda environment and installed the deps using the requirements.txt file. Running the above command in the virtual env results in the following error -

2023-06-26 20:10:33.108623: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Traceback (most recent call last):
  File "/home/pshah/software/tapnet/./experiment.py", line 32, in <module>
    from kubric.challenges.point_tracking import dataset
  File "/home/pshah/software/miniconda3/lib/python3.10/site-packages/kubric/__init__.py", line 20, in <module>
    from kubric.core.scene import Scene
  File "/home/pshah/software/miniconda3/lib/python3.10/site-packages/kubric/core/__init__.py", line 17, in <module>
    from .scene import Scene
  File "/home/pshah/software/miniconda3/lib/python3.10/site-packages/kubric/core/scene.py", line 20, in <module>
    from kubric.utils import next_global_count
  File "/home/pshah/software/miniconda3/lib/python3.10/site-packages/kubric/utils.py", line 52, in <module>
    from kubric.custom_types import PathLike
  File "/home/pshah/software/miniconda3/lib/python3.10/site-packages/kubric/custom_types.py", line 25, in <module>
    PathLike = Union[str, tfds.core.ReadWritePath]
AttributeError: module 'tensorflow_datasets.core' has no attribute 'ReadWritePath'

Is there a fix for this issue?
TIA!
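
For what it's worth, the error comes from kubric referencing tfds.core.ReadWritePath, an alias that newer tensorflow_datasets releases no longer expose. A quick way to confirm the version mismatch in your environment (a diagnostic sketch, not a fix):

import tensorflow_datasets as tfds

print("tensorflow_datasets version:", tfds.__version__)
# False here means the installed tensorflow_datasets is too new for the installed kubric.
print("has ReadWritePath:", hasattr(tfds.core, "ReadWritePath"))

If the attribute is missing, the usual workarounds are installing an older tensorflow_datasets release that still provides it, or a newer kubric build that no longer references it; the exact version pins depend on your environment.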

Nearly 0% GPU utilization for a large majority of time during inference

I am running inference on a video of shape (400, 256, 256, 3) and trying to track 2000 points in it using the offline model.
Unfortunately, the inference is very slow and the GPU util is 0% most of the time.
Do you know what could be going on and how we could get better GPU util?

[screenshot]
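
One pattern that often helps in this situation is splitting the 2000 query points into chunks so that each inference call is a manageable size, then concatenating the results. A rough sketch, where inference_fn is a placeholder for the TAPIR call you are using (the output structure is an assumption, not the repo's exact API):

import numpy as np

def track_in_chunks(inference_fn, video, query_points, chunk_size=256):
    # query_points: [num_points, 3], e.g. in (t, y, x) order.
    tracks, visibles = [], []
    for start in range(0, query_points.shape[0], chunk_size):
        chunk = query_points[start:start + chunk_size]
        t, v = inference_fn(video, chunk)  # placeholder signature
        tracks.append(np.asarray(t))
        visibles.append(np.asarray(v))
    return np.concatenate(tracks, axis=0), np.concatenate(visibles, axis=0)

If GPU utilization stays near zero even with small chunks, the bottleneck is more likely data preparation or host-to-device transfer than the model itself.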

Training on Kubric Dataset

I am trying to train the TAPIR model on the Kubric dataset using Google Colab; however, the training process keeps stopping abruptly without any errors. I am using the python ./experiment.py --config ./configs/tapir_config.py command and the config file is loaded successfully. I am unable to determine the cause and would be really grateful for any help in this regard.

[screenshot]

Thank You!

jax requires jaxlib

I tried to install this under Windows, but there is no Windows version of jax. Is it possible to run this under Windows?

Jaccard metric bug?

Hi, I am trying to understand the metrics used in TAP-Vid, and it seems to me that the Jaccard metric contains a bug.
In particular, here the ((~within_dist) & pred_visible) term also counts points that are visible in the ground truth, i.e. points already included in gt_positives.
In the end, both gt_positives and false_positives are summed, so the denominator counts some points twice (specifically, np.sum((~within_dist) & pred_visible & visible) of them).

What do you think?
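
For readers following along, here is a small sketch of the Jaccard computation using the quantities named in this issue; the variable names follow the issue text rather than the repository code, so treat evaluation_datasets.py as the authoritative reference.

import numpy as np

def jaccard_sketch(within_dist, pred_visible, visible):
    # All inputs: boolean arrays of shape [num_points, num_frames].
    true_positives = np.sum(within_dist & pred_visible & visible)
    gt_positives = np.sum(visible)
    # Predictions marked visible that are either far from the ground truth
    # or correspond to occluded ground-truth points.
    false_positives = np.sum((~visible) & pred_visible)
    false_positives += np.sum((~within_dist) & pred_visible)
    return true_positives / (gt_positives + false_positives)

The question above is whether the second false_positives term should exclude points where visible is True, since those points are already counted once in gt_positives.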

Use Case: Temporally Coherent Pose Estimation

Frameworks such as Mediapipe or OpenPose are used to extract skeletal keypoints from images.
Unfortunately, the results are inconsistent and somewhat jittery when trying to extract poses from consecutive frames.

I propose a use case supported by TAPIR (sketched below):

  1. Extract poses for an initial frame using MediaPipe, or perhaps even for the whole video.
  2. Track the keypoints across frames, preferring TAPIR's tracking. If TAPIR and MediaPipe diverge, fall back to the MediaPipe pose and continue tracking from there.

This idea works a little like MP4 encoding: it treats the MediaPipe poses as gold "P-frames" and TAPIR's tracks as "I-frames" for as long as they stay consistent. When the data stored in an I-frame is no longer consistent, introduce another P-frame (this can also be done per frame and per keypoint).

Related issue: qianqianwang68/omnimotion#5
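
A rough sketch of the proposed keyframe/fallback loop. The detect_pose and track_step callables stand in for the actual MediaPipe and TAPIR calls, and the pixel-distance divergence test is just one possible choice:

import numpy as np

def coherent_pose_tracking(frames, detect_pose, track_step, divergence_px=10.0):
    # detect_pose(frame) -> [K, 2] keypoints (e.g. MediaPipe).
    # track_step(keypoints, prev_frame, frame) -> [K, 2] tracked keypoints (e.g. TAPIR).
    keypoints = detect_pose(frames[0])
    poses = [keypoints]
    for prev_frame, frame in zip(frames[:-1], frames[1:]):
        tracked = track_step(keypoints, prev_frame, frame)
        detected = detect_pose(frame)
        # Prefer the tracked keypoints; fall back per keypoint when the
        # tracker and the detector diverge too much.
        diverged = np.linalg.norm(tracked - detected, axis=-1) > divergence_px
        keypoints = np.where(diverged[:, None], detected, tracked)
        poses.append(keypoints)
    return poses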

No GPU/TPU found

No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
Where do I set this log level, and how do I fix the issue?

Thank you ^^
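
TF_CPP_MIN_LOG_LEVEL is an environment variable, so it can be set in the shell before launching (export TF_CPP_MIN_LOG_LEVEL=0) or from Python before the relevant imports. The warning itself means JAX did not find an accelerator, which usually indicates a CPU-only jaxlib install. A quick check:

import os
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "0"  # must be set before the logging starts

import jax
print(jax.devices())  # a CPU-only install lists only CPU devices

If only CPU devices are listed, reinstalling the CUDA-enabled jax/jaxlib wheels that match your CUDA version is the usual fix.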

Is there a standard procedure to make TAPIR run inference on GPU under Ubuntu 22.04?

Hi everyone,

I find it really hard to get TAPIR to run on GPU. Is there a standard procedure for doing this?

What I do/try (after creating a new conda environment) is:

  1. First, as instructed by JAX (for the jax/CUDA/cuDNN installation, I suppose):
    pip install --upgrade "jax[cuda12_pip]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
  2. Then:
    pip install -r requirements_inference.txt

and the following error pops up:
jaxlib.xla_extension.XlaRuntimeError: INTERNAL: Failed to execute XLA Runtime executable: run time error: custom call 'xla.gpu.func.launch' failed: Failed to load PTX text as a module: CUDA_ERROR_INVALID_IMAGE: device kernel image is invalid; current tracing scope: fusion; current profiling annotation: XlaModule:#hlo_module=jit__threefry_seed,program_id=0#.

Note that using only the CPU version works fine (simply pip install -r requirements_inference.txt).

Could someone share their standard procedure for making this work? Many thanks.
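
One common failure mode with this order of commands is that installing the requirements file afterwards replaces or downgrades the CUDA-enabled jax/jaxlib, leaving mismatched jax/jaxlib/CUDA versions that can surface as errors like the CUDA_ERROR_INVALID_IMAGE above. It may be worth reinstalling the CUDA jax wheel after the requirements file and then checking what actually ended up in the environment, e.g. with a small sketch like this:

import jax

print("jax:", jax.__version__)
print("devices:", jax.devices())  # expect a GPU device entry when the CUDA wheels are active

Running pip show jaxlib in the shell also reveals which jaxlib build is installed; it should be compatible with the jax version and with your CUDA driver setup.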

Evaluation on Kubric

Thank you very much for your amazing work!

There's an error when evaluating on Kubric:

(tap) xingzhenghao@xingzhenghao-PC:~/PycharmProjects$ python ./tapnet/experiment.py --config ./tapnet/configs/tapnet_config.py --jaxline_mode=eval_kubric --config.checkpoint_dir=./tapnet/checkpoint/
I1218 16:02:29.348184 140603062212416 train.py:152] Evaluating with config:
best_model_eval_metric: ''
best_model_eval_metric_higher_is_better: true
checkpoint_dir: ./tapnet/checkpoint/
checkpoint_interval_type: null
dataset_names: &id001 !!python/tuple
- kubric
eval_initial_weights: true
eval_modes: &id002 !!python/tuple
- eval_davis_points
- eval_jhmdb
- eval_robotics_points
- eval_kinetics_points
evaluate_every: 10000
experiment_kwargs:
  config:
    checkpoint_dir: ./tapnet/checkpoint/
    datasets:
      dataset_names: *id001
      kubric_kwargs:
        batch_dims: 8
        shuffle_buffer_size: 128
        train_size: !!python/tuple
        - 256
        - 256
    davis_points_path: /home/xingzhenghao/PycharmProjects/datasets/tap/tapvid_davis/tapvid_davis.pkl
    eval_modes: *id002
    evaluate_every: 10000
    fast_variables: !!python/tuple []
    jhmdb_path: null
    optimizer:
      adam_kwargs:
        b1: 0.9
        b2: 0.95
        eps: 1.0e-08
      base_lr: 0.002
      cosine_decay_kwargs:
        end_value: 0.0
        init_value: 0.0
        warmup_steps: 5000
      max_norm: -1
      optimizer: adam
      schedule_type: cosine
      weight_decay: 0.01
    robotics_points_path: /home/xingzhenghao/PycharmProjects/datasets/tap/tapvid_rgb_stacking/tapvid_rgb_stacking.pkl
    save_final_checkpoint_as_npy: true
    shared_modules:
      shared_module_names: &id003 !!python/tuple
      - tapnet_model
      tapnet_model_kwargs: {}
    supervised_point_prediction_kwargs:
      prediction_algo: cost_volume_regressor
    sweep_name: default_sweep
    training:
      n_training_steps: 100000
interval_type: secs
log_all_train_data: false
log_tensors_interval: 60
log_train_data_interval: 120.0
logging_interval_type: null
max_checkpoints_to_keep: 5
one_off_evaluate: false
random_mode_eval: same_host_same_device
random_mode_train: unique_host_unique_device
random_seed: 42
save_checkpoint_interval: 10
shared_module_names: *id003
train_checkpoint_all_hosts: false
training_steps: 100000

I1218 16:02:29.355844 140603062212416 xla_bridge.py:353] Unable to initialize backend 'tpu_driver': NOT_FOUND: Unable to find driver in registry given worker: 
I1218 16:02:29.422299 140603062212416 xla_bridge.py:353] Unable to initialize backend 'rocm': NOT_FOUND: Could not find registered platform with name: "rocm". Available platform names are: Host Interpreter CUDA
I1218 16:02:29.422796 140603062212416 xla_bridge.py:353] Unable to initialize backend 'tpu': module 'jaxlib.xla_extension' has no attribute 'get_tpu_client'
I1218 16:02:29.422965 140603062212416 xla_bridge.py:353] Unable to initialize backend 'plugin': xla_extension has no attributes named get_plugin_device_client. Compile TensorFlow with //tensorflow/compiler/xla/python:enable_plugin_device set to true (defaults to false) to enable this.
I1218 16:02:29.972896 140603062212416 supervised_point_prediction.py:944] Saving videos to ./tapnet/checkpoint/eval_kubric/100000
2022-12-18 16:02:29.987263: W tensorflow/core/platform/cloud/google_auth_provider.cc:184] All attempts to get a Google authentication bearer token failed, returning an empty token. Retrieving token from files failed with "NOT_FOUND: Could not locate the credentials file.". Retrieving token from GCE failed with "FAILED_PRECONDITION: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host name', error details: Could not resolve host: metadata".
I1218 16:02:32.725086 140603062212416 dataset_info.py:491] Load dataset info from gs://kubric-public/tfds/movi_e/256x256/1.0.0
I1218 16:02:35.881139 140603062212416 dataset_info.py:550] Field info.splits from disk and from code do not match. Keeping the one from code.
I1218 16:02:36.213931 140603062212416 dataset_builder.py:383] Reusing dataset movi_e (gs://kubric-public/tfds/movi_e/256x256/1.0.0)
I1218 16:02:36.214255 140603062212416 logging_logger.py:44] Constructing tf.data.Dataset movi_e for split None, from gs://kubric-public/tfds/movi_e/256x256/1.0.0
W1218 16:02:39.021307 140603062212416 deprecation.py:337] From /home/xingzhenghao/anaconda3/envs/tap/lib/python3.8/site-packages/tensorflow/python/util/dispatch.py:1082: multinomial (from tensorflow.python.ops.random_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.random.categorical` instead.
W1218 16:02:43.468899 140603062212416 deprecation.py:541] From /home/xingzhenghao/anaconda3/envs/tap/lib/python3.8/site-packages/tensorflow/python/util/dispatch.py:1082: calling crop_and_resize_v1 (from tensorflow.python.ops.image_ops_impl) with box_ind is deprecated and will be removed in a future version.
Instructions for updating:
box_ind is deprecated, use box_indices instead
2022-12-18 16:02:46.722060: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:828] shape_optimizer failed: INVALID_ARGUMENT: Subshape must have computed start >= end since stride is negative, but is 1 and 3 (computed from start 1 and end 9223372036854775807 over shape with rank 3 and stride-1)
2022-12-18 16:02:47.417004: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:828] shape_optimizer failed: INVALID_ARGUMENT: Subshape must have computed start >= end since stride is negative, but is 1 and 3 (computed from start 1 and end 9223372036854775807 over shape with rank 3 and stride-1)
Traceback (most recent call last):
  File "./tapnet/experiment.py", line 427, in <module>
    app.run(main)
  File "/home/xingzhenghao/anaconda3/envs/tap/lib/python3.8/site-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/home/xingzhenghao/anaconda3/envs/tap/lib/python3.8/site-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "./tapnet/experiment.py", line 420, in main
    platform.main(
  File "/home/xingzhenghao/anaconda3/envs/tap/lib/python3.8/site-packages/jaxline/utils.py", line 484, in inner_wrapper
    return f(*args, **kwargs)
  File "/home/xingzhenghao/anaconda3/envs/tap/lib/python3.8/site-packages/jaxline/platform.py", line 137, in main
    train.evaluate(experiment_class, config, checkpointer, writer,
  File "/home/xingzhenghao/anaconda3/envs/tap/lib/python3.8/site-packages/jaxline/utils.py", line 620, in inner_wrapper
    return fn(*args, **kwargs)
  File "/home/xingzhenghao/anaconda3/envs/tap/lib/python3.8/site-packages/jaxline/train.py", line 225, in evaluate
    scalar_values = utils.evaluate_should_return_dict(experiment.evaluate)(
  File "/home/xingzhenghao/anaconda3/envs/tap/lib/python3.8/site-packages/jaxline/utils.py", line 521, in evaluate_with_warning
    evaluate_out = f(*args, **kwargs)
  File "./tapnet/experiment.py", line 401, in evaluate
    eval_scalars = point_prediction_task.evaluate(
  File "/home/xingzhenghao/PycharmProjects/tapnet/supervised_point_prediction.py", line 495, in evaluate
    self._eval_epoch(
  File "/home/xingzhenghao/PycharmProjects/tapnet/supervised_point_prediction.py", line 968, in _eval_epoch
    for inputs in self._build_eval_input(mode):
  File "/home/xingzhenghao/PycharmProjects/tapnet/supervised_point_prediction.py", line 805, in _build_eval_input
    yield from evaluation_datasets.create_kubric_eval_dataset(mode)
  File "/home/xingzhenghao/PycharmProjects/tapnet/evaluation_datasets.py", line 463, in create_kubric_eval_dataset
    for data in np_ds:
  File "/home/xingzhenghao/anaconda3/envs/tap/lib/python3.8/site-packages/tensorflow_datasets/core/dataset_utils.py", line 65, in _eager_dataset_iterator
    for elem in ds:
  File "/home/xingzhenghao/anaconda3/envs/tap/lib/python3.8/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 836, in __next__
    return self._next_internal()
  File "/home/xingzhenghao/anaconda3/envs/tap/lib/python3.8/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 819, in _next_internal
    ret = gen_dataset_ops.iterator_get_next(
  File "/home/xingzhenghao/anaconda3/envs/tap/lib/python3.8/site-packages/tensorflow/python/ops/gen_dataset_ops.py", line 2923, in iterator_get_next
    _ops.raise_from_not_ok_status(e, name)
  File "/home/xingzhenghao/anaconda3/envs/tap/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 7186, in raise_from_not_ok_status
    raise core._status_to_exception(e) from None  # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes at component 0: expected [?,256,24] but got [1,39,24]. [Op:IteratorGetNext]

Thanks a lot in advance for your support!

Inference with GPU failed

Hello,
I'm having trouble running the model on my GPU. Here are the logs I see in the terminal:
"""
No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
"""
However, when I check if TensorFlow is able to load the GPU using this command:
"""
gpu_devices = tf.config.list_physical_devices('GPU')
"""
I get the following result:
"""
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
"""
Could someone please help me resolve this issue?
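
Note that tf.config.list_physical_devices('GPU') only shows that TensorFlow can see the GPU; the "No GPU/TPU found" warning comes from JAX, which relies on its own, separately installed, CUDA-enabled jaxlib. A quick way to check both sides:

import jax
import tensorflow as tf

print("TF GPUs:    ", tf.config.list_physical_devices("GPU"))
print("JAX devices:", jax.devices())
# If TF lists a GPU but JAX shows only CPU devices, the installed jaxlib is
# CPU-only and needs to be replaced with the CUDA-enabled jax/jaxlib wheels.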

Inference time as a function of the number of points and frames

Hi authors of TAPIR,

Thanks again for your work.

I am opening this issue because I would like to know how the inference time changes with respect to the number of points and the number of frames.

Does the time go up linearly with respect to these two arguments, or does it have some other form of dependence?

I tried experimenting with it myself, but my results vary a little from run to run, so I am asking here.

thanks~
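
In case it helps to measure this yourself, below is a rough sketch of a sweep over the number of query points; inference_fn is a placeholder for whichever TAPIR inference call you are timing, and each point count gets its own warm-up call so that compilation time is not counted.

import time
import numpy as np
import jax

def sweep_num_points(inference_fn, video, point_counts=(16, 64, 256, 1024)):
    num_frames, height, width = video.shape[:3]
    def block(out):
        # Wait for asynchronously dispatched device arrays before reading the clock.
        return jax.tree_util.tree_map(
            lambda x: x.block_until_ready() if hasattr(x, "block_until_ready") else x, out)
    for n in point_counts:
        # Random (t, y, x) query points; replace with your own sampling.
        queries = np.stack([
            np.random.randint(0, num_frames, n),
            np.random.randint(0, height, n),
            np.random.randint(0, width, n),
        ], axis=-1).astype(np.float32)
        block(inference_fn(video, queries))  # warm-up: a new shape triggers recompilation
        start = time.perf_counter()
        block(inference_fn(video, queries))
        print(f"{n} points: {time.perf_counter() - start:.3f}s")

The same idea works for the frame count by slicing the video to different lengths.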
