google-deepmind / tapnet
Tracking Any Point (TAP)
Home Page: https://deepmind-tapir.github.io/blogpost.html
License: Apache License 2.0
I followed the steps in https://github.com/deepmind/tapnet/blob/main/data/README.md to generate the TAP-Vid-Kinetics dataset.
When processing the clips, i.e. running the generate_tapvid.py
script, I encountered an AssertionError; the message is shown below:
File "generate_tapvid.py", line 177, in main
videos = csv_to_dataset(FLAGS.csv_path, videos_path)
File "generate_tapvid.py", line 73, in csv_to_dataset
assert len(row) == 3 + 3 * 250, f"{len(row)}"
AssertionError: 711
It looks like the file tapvid_kinetics.csv
contains some rows with an invalid format; it was extracted from https://storage.googleapis.com/dm-tapnet/tapvid_kinetics.zip.
Can you check it?
Thanks!
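In case it helps to narrow this down, a small sketch (the path is a placeholder; the expected length 3 + 3 * 250 is taken from the assertion above) that prints which rows of the CSV have an unexpected number of fields:

```python
import csv

EXPECTED_LEN = 3 + 3 * 250  # three leading fields plus three values per frame for 250 frames

# Replace with whatever you pass as --csv_path.
with open("tapvid_kinetics.csv", newline="") as f:
  for line_no, row in enumerate(csv.reader(f), start=1):
    if len(row) != EXPECTED_LEN:
      print(f"row {line_no}: {len(row)} fields (expected {EXPECTED_LEN})")
```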
In jax 0.4.14, jax.numpy.DeviceArray is deprecated and no longer available; see
https://jax.readthedocs.io/en/latest/changelog.html#jax-0-4-14-july-27-2023
This makes your code produce the error in the title.
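For reference, a minimal sketch of the rename that downstream code would need, assuming you stay on jax >= 0.4.14 (alternatively, pinning jax to a release that still ships the alias avoids touching the code):

```python
import jax

# Before (no longer available in jax 0.4.14):
#   def f(rng: jnp.DeviceArray) -> jnp.DeviceArray: ...
# After, using the unified array type:
def f(rng: jax.Array) -> jax.Array:
  return jax.random.split(rng)[0]

print(f(jax.random.PRNGKey(0)).shape)  # (2,)
```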
Hi everyone,
From the previous threads, we know that the main factor for slow inference is "a change in input tensor size causing recompilation". Now, if you may, I would like to break down this statement more clearly.
Consider the following code (P.S.: it is the Colab format I am using):
def inference(frames, query_points):
  ...
  rng = jax.random.PRNGKey(42)
  outputs, _ = model_apply(params, state, rng, frames, query_points)  ## highlight 1
  ...
  return ...

model = hk.transform_with_state(build_model)
model_apply = jax.jit(model.apply)  ## highlight 2

for video in videos:
  ...
  tracks, visibles = inference(frames, query_points)  ## highlight 3
May I ask:
Conjecture:
If it is jax.jit that causes compilation, then supposedly, from highlight 2, a compiled version of model_apply is returned. After this, no other jax.jit is called; we simply enter a for loop that keeps calling inference(). Every time inference is called, it uses the pre-compiled version of model_apply; it does not have access to the outer jax.jit. So where exactly does this recompilation stem from?
Many thanks to anyone who reads through!
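For what it is worth, a standalone sketch (independent of TAPIR) of where the recompilation comes from: jax.jit does not return a single fixed executable, it returns a wrapper that traces and compiles once per distinct input shape/dtype signature and caches the result, so a video with a different number of frames (or a different number of query points) triggers a fresh compile even though jax.jit itself is never called again:

```python
import time
import jax
import jax.numpy as jnp

@jax.jit
def f(x):
  return (x * 2.0).sum()

def timed(x):
  start = time.perf_counter()
  f(x).block_until_ready()
  return time.perf_counter() - start

a = jnp.zeros((100, 256, 256, 3))
b = jnp.zeros((150, 256, 256, 3))  # same code, different leading shape

print(timed(a))  # slow: first compilation for this shape
print(timed(a))  # fast: the cached executable is reused
print(timed(b))  # slow again: a new shape means a new trace + compile
```

Padding every video's frames and query points to one fixed size (and masking the padding) is the usual way to keep the cached executable reused across a whole dataset.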
Hi authors of TAPIR,
Thanks again for your work.
I am opening this issue because I would like to know how the inference time changes with respect to the number of points and frames.
Does the time go up linearly w.r.t. these two arguments, or is there some other form of dependence?
I tried experimenting with it, but my results vary a little from run to run, so I am asking here.
Thanks~
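Not an authoritative answer, but a small measurement sketch that may make the run-to-run variance manageable: warm up once per shape so compilation is excluded, then take the median over a few repeats (inference here stands for the Colab's inference function, and the sizes in the sweep are arbitrary):

```python
import time
import numpy as np

def steady_state_seconds(fn, frames, query_points, repeats=3):
  fn(frames, query_points)  # warm-up: absorbs jit compilation for this shape
  times = []
  for _ in range(repeats):
    start = time.perf_counter()
    fn(frames, query_points)  # if fn returns JAX arrays, block on an output before stopping the timer
    times.append(time.perf_counter() - start)
  return float(np.median(times))

# Example sweep over the number of query points, with the frame count fixed:
# for n in (8, 16, 32, 64):
#   print(n, steady_state_seconds(inference, frames, query_points[:n]))
```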
Hi team,
Can the inference be done without resizing the video frames?
I am looking to train TAP-Net on a new dataset, in particular modifying the existing datasets (DAVIS, Kubric, RGB Stacking, etc.) to use new keypoints that we generate. It is not immediately clear to me how we should add a new dataset, and how to use the existing scripts such as experiment.py in order to train TAP-Net on the new dataset.
It looks like Kubric is the only dataset that is supported for training, whereas DAVIS and RGB Stacking are included for inference. Could you walk me through what format TAP-Net expects from a dataset, and where in the code/config I would need to add functionality in order to use the new dataset?
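Not an answer from the authors, but for orientation while waiting for one: a point-tracking training example conceptually needs frames, query points, target tracks, and occlusion flags. The dictionary below is an illustrative sketch with made-up key names and shapes, not the repository's actual schema:

```python
import numpy as np

num_frames, height, width, num_points = 24, 256, 256, 32

example = {
    # RGB frames, resized to the training resolution.
    "video": np.zeros((num_frames, height, width, 3), dtype=np.uint8),
    # One query per tracked point, e.g. stored as (frame index, y, x).
    "query_points": np.zeros((num_points, 3), dtype=np.float32),
    # Ground-truth location of each point in each frame, e.g. (x, y).
    "target_points": np.zeros((num_points, num_frames, 2), dtype=np.float32),
    # Whether each point is occluded in each frame.
    "occluded": np.zeros((num_points, num_frames), dtype=bool),
}
```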
Hello,
I'm having trouble launching my model with my GPU. Here are the logs I see in the terminal:
"""
No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
"""
However, when I check if TensorFlow is able to load the GPU using this command:
"""
gpu_devices = tf.config.list_physical_devices('GPU')
"""
I get the following result:
"""
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
"""
Could someone please help me resolve this issue?
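One thing worth checking (a suggestion, not a diagnosis): the "No GPU/TPU found" message comes from JAX, and TensorFlow seeing the GPU does not imply that the installed jaxlib build has CUDA support. You can ask JAX directly what it sees:

```python
import jax

print(jax.default_backend())  # "gpu" only if a CUDA-enabled jaxlib found the device
print(jax.devices())          # the devices JAX can actually use
```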
Hi authors,
Suppose that I have successfully installed a version of jax compatible with my CUDA/cuDNN. How do I modify the following code so that it runs inference on the GPU?
P.S. I am running inference on the videos in a dataset with the following for loop (it is the Colab format):
def inference(frames, query_points):
  ...

model = hk.transform_with_state(build_model)
model_apply = jax.jit(model.apply)

### for loop over every video in the dataset
for i in os.listdir("/home/chy/dataset"):
  if not i.startswith("."):
    rh, rw = 256, 256
    video = np.load("/home/chy/dataset/" + i)
    height, width = video.shape[1:3]
    frames = media.resize_video(video, (rh, rw))
    query_points = points
    tracks, visibles = inference(frames, query_points)
    tracks = transforms.convert_grid_coordinates(tracks, (rw, rh), (width, height))
    np.save("path", tracks)
much thx~
Hello! It seems the front page is missing this very crucial information; without it nobody knows whether they can run it or not, and I may waste a lot of time setting up the repository only to find out it doesn't fit.
Cheers!
Hi, thanks for your excellent work!
I have a problem with the evaluation on TAP-Vid-Kinetics.
I found three files in TAP-Vid-Kinetics, namely "train.txt", "test.txt", and "val.txt". Are they all used for evaluation, or are only the videos in the test set used?
Hi,
Thanks for your great work. I have a question about the evaluation.
The paper proposes two different evaluation modes ("first" and "strided"). In the "first" mode, where only a point in the first frame is queried, if the point becomes occluded at some timestamp t and then reappears, is the predicted trajectory after t still evaluated?
Hi, thank you very much for your great work!
Can I consider TAP-Net an offline algorithm, given that TSM-ResNet-18 is used as the backbone?
Hi, I found the evaluation results for the strided fashion in the paper, and I wonder whether you also have the results for the first fashion?
https://github.com/qianqianwang68/omnimotion
Google had similar work that uses the same test videos as yours? How is your work fundamentally different from it?
I am a hobbyist, so I would be grateful if you could spend some time briefly explaining the differences in terms of purpose, approach, and results.
Thanks!
Impressive!
May I ask whether there is any code for training?
Hello,
Thank you for releasing these datasets.
I'm currently processing the dataset that you used to evaluate/train the model. However, when generating the pkl file for the kinetics dataset, I receive the following warning:
Could you help double-check whether these videos are present in the dataset on your side?
FYI, I follow this link to download and extract the kinetics dataset: https://github.com/cvdfoundation/kinetics-dataset
When checking their annotation file: https://s3.amazonaws.com/kinetics/700_2020/annotations/val.csv, I could not find these videos either.
Best,
Hung
Hi,
In utils.transforms.convert_grid_coordinates, it is mentioned in the comments that
"""Convert image coordinates between image grids of different sizes.
By default, it assumes that the image corners are aligned. Therefore,
it adds .5 (since (0,0) is assumed to be the center of the upper-left grid
cell), multiplies by the size ratio, and then subtracts .5.
"""
I wanted to ask if it is indeed assumed that the image corners are aligned?
In addition, I also see that the .5 addition and subtraction is not actually implemented in the code. So does that mean the image corners are not aligned?
Thanks!
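For anyone comparing the two conventions, a small illustrative sketch (not the repository's code) of the difference between what the docstring describes and a plain multiplication by the size ratio:

```python
import numpy as np

def convert_center_aligned(coords, source_size, target_size):
  """Docstring convention: (0, 0) is the center of the upper-left cell,
  so shift by +0.5, scale by the size ratio, then shift back by -0.5."""
  ratio = np.asarray(target_size) / np.asarray(source_size)
  return (coords + 0.5) * ratio - 0.5

def convert_plain_ratio(coords, source_size, target_size):
  """What a bare multiplication by the size ratio does (corner of the image at 0)."""
  ratio = np.asarray(target_size) / np.asarray(source_size)
  return coords * ratio

pt = np.array([10.0, 20.0])
print(convert_center_aligned(pt, (256, 256), (512, 512)))  # [20.5 40.5]
print(convert_plain_ratio(pt, (256, 256), (512, 512)))     # [20. 40.]
```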
Hi all,
I have been working on evaluating the Kubric-VFS-like baseline on the Kubric dataset. Would it be possible for you to provide the evaluation code for Kubric-VFS-Like baseline and the checkpoint used to get those results? And for evaluating on Kubric, was the Kubric-VFS-like baseline also trained on Kubric?
Also, it would be great if you could provide the evaluation code and checkpoint for RAFT as well.
Thanks!
I'm trying to evaluate TAP-Net on the Kubric dataset, and I'm getting the error shown below. I am running the following script: python3 ./tapnet/experiment.py --config=./tapnet/configs/tapnet_config.py --jaxline_mode=eval_kubric --config.checkpoint_dir=/data3/tap/tap/tapnet_checkpoint/
Any idea how to fix this? Thanks!
I0228 19:56:23.706737 140398657533120 train.py:152] Evaluating with config:
best_model_eval_metric: ''
best_model_eval_metric_higher_is_better: true
checkpoint_dir: /data3/tap//tap/tapnet_checkpoint/
checkpoint_interval_type: null
dataset_names: &id001 !!python/tuple
- kubric
eval_initial_weights: true
eval_modes: &id002 !!python/tuple
- eval_davis_points
- eval_jhmdb
- eval_robotics_points
- eval_kinetics_points
evaluate_every: 10000
experiment_kwargs:
config:
checkpoint_dir: /data3/tap//tap/tapnet_checkpoint/
datasets:
dataset_names: *id001
kubric_kwargs:
batch_dims: 8
shuffle_buffer_size: 128
train_size: !!python/tuple
- 256
- 256
davis_points_path: ''
eval_modes: *id002
evaluate_every: 10000
fast_variables: !!python/tuple []
inference:
input_video_path: ''
num_points: 20
output_video_path: ''
resize_height: 256
resize_width: 256
jhmdb_path: ''
optimizer:
adam_kwargs:
b1: 0.9
b2: 0.95
eps: 1.0e-08
base_lr: 0.002
cosine_decay_kwargs:
end_value: 0.0
init_value: 0.0
warmup_steps: 5000
max_norm: -1
optimizer: adam
schedule_type: cosine
weight_decay: 0.01
robotics_points_path: ''
save_final_checkpoint_as_npy: true
shared_modules:
shared_module_names: &id003 !!python/tuple
- tapnet_model
tapnet_model_kwargs: {}
supervised_point_prediction_kwargs:
prediction_algo: cost_volume_regressor
sweep_name: default_sweep
training:
n_training_steps: 100000
interval_type: secs
log_all_train_data: false
log_tensors_interval: 60
log_train_data_interval: 120.0
logging_interval_type: null
max_checkpoints_to_keep: 5
one_off_evaluate: false
random_mode_eval: same_host_same_device
random_mode_train: unique_host_unique_device
random_seed: 42
save_checkpoint_interval: 10
shared_module_names: *id003
train_checkpoint_all_hosts: false
training_steps: 100000
I0228 19:56:23.755014 140398657533120 xla_bridge.py:173] Remote TPU is not linked into jax; skipping remote TPU.
I0228 19:56:23.755347 140398657533120 xla_bridge.py:357] Unable to initialize backend 'tpu_driver': Could not initialize backend 'tpu_driver'
I0228 19:56:24.424445 140398657533120 xla_bridge.py:357] Unable to initialize backend 'rocm': NOT_FOUND: Could not find registered platform with name: "rocm". Available platform names are: Interpreter Host CUDA
I0228 19:56:24.425570 140398657533120 xla_bridge.py:357] Unable to initialize backend 'tpu': module 'jaxlib.xla_extension' has no attribute 'get_tpu_client'
I0228 19:56:28.558446 140398657533120 supervised_point_prediction.py:979] Saving videos to /data3/tap//tap/tapnet_checkpoint/eval_kubric/0
I0228 19:56:28.567507 140398657533120 dataset_info.py:565] Load dataset info from /data3/tap/kubric/movi_e/256x256/1.0.0
W0228 19:56:28.572742 140398657533120 dtype_utils.py:43] You use TensorFlow DType <dtype: 'uint8'> in tfds.features This will soon be deprecated in favor of NumPy DTypes. In the meantime it was converted to uint8.
W0228 19:56:28.574135 140398657533120 dtype_utils.py:43] You use TensorFlow DType <dtype: 'uint16'> in tfds.features This will soon be deprecated in favor of NumPy DTypes. In the meantime it was converted to uint16.
I0228 19:56:28.620231 140398657533120 dataset_info.py:654] Fields info.[splits] from disk and from code do not match. Keeping the one from code.
I0228 19:56:28.620935 140398657533120 dataset_builder.py:522] Reusing dataset movi_e (/data3/tap/kubric/movi_e/256x256/1.0.0)
W0228 19:56:28.622349 140398657533120 feature.py:64] `TensorInfo.dtype` is deprecated. Please change your code to use NumPy with the field `TensorInfo.np_dtype` or use TensorFlow with the field `TensorInfo.tf_dtype`.
W0228 19:56:28.622643 140398657533120 dtype_utils.py:43] You use TensorFlow DType <dtype: 'float32'> in tfds.features This will soon be deprecated in favor of NumPy DTypes. In the meantime it was converted to float32.
W0228 19:56:28.622788 140398657533120 feature.py:64] `TensorInfo.dtype` is deprecated. Please change your code to use NumPy with the field `TensorInfo.np_dtype` or use TensorFlow with the field `TensorInfo.tf_dtype`.
W0228 19:56:28.622916 140398657533120 feature.py:64] `TensorInfo.dtype` is deprecated. Please change your code to use NumPy with the field `TensorInfo.np_dtype` or use TensorFlow with the field `TensorInfo.tf_dtype`.
W0228 19:56:28.623071 140398657533120 dtype_utils.py:43] You use TensorFlow DType <dtype: 'int32'> in tfds.features This will soon be deprecated in favor of NumPy DTypes. In the meantime it was converted to int32.
W0228 19:56:28.623173 140398657533120 feature.py:64] `TensorInfo.dtype` is deprecated. Please change your code to use NumPy with the field `TensorInfo.np_dtype` or use TensorFlow with the field `TensorInfo.tf_dtype`.
W0228 19:56:28.623292 140398657533120 feature.py:64] `TensorInfo.dtype` is deprecated. Please change your code to use NumPy with the field `TensorInfo.np_dtype` or use TensorFlow with the field `TensorInfo.tf_dtype`.
W0228 19:56:28.623408 140398657533120 feature.py:64] `TensorInfo.dtype` is deprecated. Please change your code to use NumPy with the field `TensorInfo.np_dtype` or use TensorFlow with the field `TensorInfo.tf_dtype`.
W0228 19:56:28.623907 140398657533120 feature.py:64] `TensorInfo.dtype` is deprecated. Please change your code to use NumPy with the field `TensorInfo.np_dtype` or use TensorFlow with the field `TensorInfo.tf_dtype`.
W0228 19:56:28.624111 140398657533120 dtype_utils.py:43] You use TensorFlow DType <dtype: 'string'> in tfds.features This will soon be deprecated in favor of NumPy DTypes. In the meantime it was converted to object.
W0228 19:56:28.624262 140398657533120 feature.py:64] `TensorInfo.dtype` is deprecated. Please change your code to use NumPy with the field `TensorInfo.np_dtype` or use TensorFlow with the field `TensorInfo.tf_dtype`.
W0228 19:56:28.624418 140398657533120 feature.py:64] `TensorInfo.dtype` is deprecated. Please change your code to use NumPy with the field `TensorInfo.np_dtype` or use TensorFlow with the field `TensorInfo.tf_dtype`.
W0228 19:56:28.624617 140398657533120 feature.py:64] `TensorInfo.dtype` is deprecated. Please change your code to use NumPy with the field `TensorInfo.np_dtype` or use TensorFlow with the field `TensorInfo.tf_dtype`.
W0228 19:56:28.625051 140398657533120 dtype_utils.py:43] You use TensorFlow DType <dtype: 'int64'> in tfds.features This will soon be deprecated in favor of NumPy DTypes. In the meantime it was converted to int64.
W0228 19:56:28.625315 140398657533120 dtype_utils.py:43] You use TensorFlow DType <dtype: 'bool'> in tfds.features This will soon be deprecated in favor of NumPy DTypes. In the meantime it was converted to bool.
I0228 19:56:29.712036 140398657533120 logging_logger.py:49] Constructing tf.data.Dataset movi_e for split None, from /data3/tap/kubric/movi_e/256x256/1.0.0
W0228 19:56:32.420510 140398657533120 deprecation.py:337] From /data/anaconda3/envs/tapnet/lib/python3.8/site-packages/tensorflow/python/util/dispatch.py:1082: multinomial (from tensorflow.python.ops.random_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.random.categorical` instead.
W0228 19:56:39.477390 140398657533120 deprecation.py:541] From /data/anaconda3/envs/tapnet/lib/python3.8/site-packages/tensorflow/python/util/dispatch.py:1082: calling crop_and_resize_v1 (from tensorflow.python.ops.image_ops_impl) with box_ind is deprecated and will be removed in a future version.
Instructions for updating:
box_ind is deprecated, use box_indices instead
2023-02-28 19:56:44.292071: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:828] shape_optimizer failed: INVALID_ARGUMENT: Subshape must have computed start >= end since stride is negative, but is 1 and 3 (computed from start 1 and end 9223372036854775807 over shape with rank 3 and stride-1)
2023-02-28 19:56:45.191140: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:828] shape_optimizer failed: INVALID_ARGUMENT: Subshape must have computed start >= end since stride is negative, but is 1 and 3 (computed from start 1 and end 9223372036854775807 over shape with rank 3 and stride-1)
Traceback (most recent call last):
File "./tapnet/experiment.py", line 429, in <module>
app.run(main)
File "/home/ubuntu/.local/lib/python3.8/site-packages/absl/app.py", line 308, in run
_run_main(main, args)
File "/home/ubuntu/.local/lib/python3.8/site-packages/absl/app.py", line 254, in _run_main
sys.exit(main(argv))
File "./tapnet/experiment.py", line 421, in main
platform.main(
File "/data/anaconda3/envs/tapnet/lib/python3.8/site-packages/jaxline/utils.py", line 484, in inner_wrapper
return f(*args, **kwargs)
File "/data/anaconda3/envs/tapnet/lib/python3.8/site-packages/jaxline/platform.py", line 137, in main
train.evaluate(experiment_class, config, checkpointer, writer,
File "/data/anaconda3/envs/tapnet/lib/python3.8/site-packages/jaxline/utils.py", line 620, in inner_wrapper
return fn(*args, **kwargs)
File "/data/anaconda3/envs/tapnet/lib/python3.8/site-packages/jaxline/train.py", line 225, in evaluate
scalar_values = utils.evaluate_should_return_dict(experiment.evaluate)(
File "/data/anaconda3/envs/tapnet/lib/python3.8/site-packages/jaxline/utils.py", line 521, in evaluate_with_warning
evaluate_out = f(*args, **kwargs)
File "./tapnet/experiment.py", line 404, in evaluate
eval_scalars = point_prediction_task.evaluate(
File "/home/ubuntu/contractive/tapnet/supervised_point_prediction.py", line 514, in evaluate
self._eval_epoch(
File "/home/ubuntu/contractive/tapnet/supervised_point_prediction.py", line 1005, in _eval_epoch
scalars, viz = eval_batch_fn(params, state, inputs, rng)
File "/home/ubuntu/contractive/tapnet/supervised_point_prediction.py", line 766, in _eval_batch
occlusion_logits, tracks, loss_scalars = self._infer_batch(
File "/home/ubuntu/contractive/tapnet/supervised_point_prediction.py", line 577, in _infer_batch
output, _ = functools.partial(
File "/data/anaconda3/envs/tapnet/lib/python3.8/site-packages/haiku/_src/transform.py", line 357, in apply_fn
out = f(*args, **kwargs)
File "./tapnet/experiment.py", line 122, in forward
return self.point_prediction.forward_fn(
File "/home/ubuntu/contractive/tapnet/supervised_point_prediction.py", line 313, in forward_fn
return shared_modules['tapnet_model'](
File "/data/anaconda3/envs/tapnet/lib/python3.8/site-packages/haiku/_src/module.py", line 426, in wrapped
out = f(*args, **kwargs)
File "/data/anaconda3/envs/tapnet/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/data/anaconda3/envs/tapnet/lib/python3.8/site-packages/haiku/_src/module.py", line 272, in run_interceptors
return bound_method(*args, **kwargs)
File "/home/ubuntu/contractive/tapnet/tapnet_model.py", line 341, in __call__
latent = self.tsm_resnet(
File "/data/anaconda3/envs/tapnet/lib/python3.8/site-packages/haiku/_src/module.py", line 426, in wrapped
out = f(*args, **kwargs)
File "/data/anaconda3/envs/tapnet/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/data/anaconda3/envs/tapnet/lib/python3.8/site-packages/haiku/_src/module.py", line 272, in run_interceptors
return bound_method(*args, **kwargs)
File "/home/ubuntu/contractive/tapnet/models/tsm_resnet.py", line 383, in __call__
net = hk.Conv2D(
File "/data/anaconda3/envs/tapnet/lib/python3.8/site-packages/haiku/_src/module.py", line 426, in wrapped
out = f(*args, **kwargs)
File "/data/anaconda3/envs/tapnet/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/data/anaconda3/envs/tapnet/lib/python3.8/site-packages/haiku/_src/module.py", line 272, in run_interceptors
return bound_method(*args, **kwargs)
File "/data/anaconda3/envs/tapnet/lib/python3.8/site-packages/haiku/_src/conv.py", line 200, in __call__
w = hk.get_parameter("w", w_shape, inputs.dtype, init=w_init)
File "/data/anaconda3/envs/tapnet/lib/python3.8/site-packages/haiku/_src/base.py", line 448, in wrapped
return wrapped._current(*args, **kwargs)
File "/data/anaconda3/envs/tapnet/lib/python3.8/site-packages/haiku/_src/base.py", line 524, in get_parameter
raise ValueError(
ValueError: Unable to retrieve parameter 'w' for module 'tap_net/~/tsm_resnet_video/tsm_resnet_stem' All parameters must be created as part of `init`.
I am trying to train the TAPIR model on the Kubric dataset using Google Colab; however, my code keeps stopping without any errors. I am using the python ./experiment.py --config ./configs/tapir_config.py
command, and the config file is loaded successfully. The training process stops abruptly without any errors. I am unable to determine the cause and would be really grateful for any help in this regard.
Thank You!
Hi, when I use Kubric (MOVi-E), I find that the train set has about 9,750 videos, the validation set about 250, and the test set about 999, which differs from the 38,325/799 train/validation split reported in the paper.
When invoking experiment.py to do inference:
python3 ./tapnet/experiment.py \
--config=./tapnet/configs/tapnet_config.py \
--jaxline_mode=eval_inference \
--config.checkpoint_dir=./tapnet/checkpoint/ \
--config.experiment_kwargs.config.inference.input_video_path=fixed10.mp4 \
--config.experiment_kwargs.config.inference.output_video_path=result.mp4 \
--config.experiment_kwargs.config.inference.resize_height=256 \
--config.experiment_kwargs.config.inference.resize_width=256 \
--config.experiment_kwargs.config.inference.num_points=20
I get the following error:
Traceback (most recent call last):
File "./tapnet/experiment.py", line 431, in <module>
app.run(main)
File "/home/jrcoyle/.local/lib/python3.8/site-packages/absl/app.py", line 308, in run
_run_main(main, args)
File "/home/jrcoyle/.local/lib/python3.8/site-packages/absl/app.py", line 254, in _run_main
sys.exit(main(argv))
File "./tapnet/experiment.py", line 424, in main
platform.main(
File "/home/jrcoyle/.local/lib/python3.8/site-packages/jaxline/utils.py", line 484, in inner_wrapper
return f(*args, **kwargs)
File "/home/jrcoyle/.local/lib/python3.8/site-packages/jaxline/platform.py", line 137, in main
train.evaluate(experiment_class, config, checkpointer, writer,
File "/home/jrcoyle/.local/lib/python3.8/site-packages/jaxline/utils.py", line 620, in inner_wrapper
return fn(*args, **kwargs)
File "/home/jrcoyle/.local/lib/python3.8/site-packages/jaxline/train.py", line 225, in evaluate
scalar_values = utils.evaluate_should_return_dict(experiment.evaluate)(
File "/home/jrcoyle/.local/lib/python3.8/site-packages/jaxline/utils.py", line 521, in evaluate_with_warning
evaluate_out = f(*args, **kwargs)
File "./tapnet/experiment.py", line 405, in evaluate
eval_scalars = point_prediction_task.evaluate(
File "/home/jrcoyle/tapnet/supervised_point_prediction.py", line 370, in evaluate
self._eval_inference(
File "/home/jrcoyle/tapnet/supervised_point_prediction.py", line 981, in _eval_inference
outputs, _ = self._infer_batch(
File "/home/jrcoyle/tapnet/supervised_point_prediction.py", line 440, in _infer_batch
output, _ = functools.partial(wrapped_forward_fn, input_key=input_key)(
File "/home/jrcoyle/.local/lib/python3.8/site-packages/haiku/_src/transform.py", line 357, in apply_fn
out = f(*args, **kwargs)
File "./tapnet/experiment.py", line 125, in forward
return self.point_prediction.forward_fn(
File "/home/jrcoyle/tapnet/supervised_point_prediction.py", line 150, in forward_fn
return shared_modules[self.model_key](
File "/home/jrcoyle/.local/lib/python3.8/site-packages/haiku/_src/module.py", line 426, in wrapped
out = f(*args, **kwargs)
File "/usr/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/home/jrcoyle/.local/lib/python3.8/site-packages/haiku/_src/module.py", line 272, in run_interceptors
return bound_method(*args, **kwargs)
File "/home/jrcoyle/tapnet/tapnet_model.py", line 215, in __call__
latent = self.tsm_resnet(
File "/home/jrcoyle/.local/lib/python3.8/site-packages/haiku/_src/module.py", line 426, in wrapped
out = f(*args, **kwargs)
File "/usr/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/home/jrcoyle/.local/lib/python3.8/site-packages/haiku/_src/module.py", line 272, in run_interceptors
return bound_method(*args, **kwargs)
File "/home/jrcoyle/tapnet/models/tsm_resnet.py", line 383, in __call__
net = hk.Conv2D(
File "/home/jrcoyle/.local/lib/python3.8/site-packages/haiku/_src/module.py", line 426, in wrapped
out = f(*args, **kwargs)
File "/usr/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/home/jrcoyle/.local/lib/python3.8/site-packages/haiku/_src/module.py", line 272, in run_interceptors
return bound_method(*args, **kwargs)
File "/home/jrcoyle/.local/lib/python3.8/site-packages/haiku/_src/conv.py", line 200, in __call__
w = hk.get_parameter("w", w_shape, inputs.dtype, init=w_init)
File "/home/jrcoyle/.local/lib/python3.8/site-packages/haiku/_src/base.py", line 448, in wrapped
return wrapped._current(*args, **kwargs)
File "/home/jrcoyle/.local/lib/python3.8/site-packages/haiku/_src/base.py", line 524, in get_parameter
raise ValueError(
ValueError: Unable to retrieve parameter 'w' for module 'tap_net/~/tsm_resnet_video/tsm_resnet_stem' All parameters must be created as part of `init`.
I am attempting to use a local GPU. The live_demo.py script works for me, so I am not sure what the issue is here.
Hi,
You only mention that PIPs is concurrent work to TAP-Vid/Net.
Are you planning to show any comparisons between these two works?
I just ran the Google Colab inference demo, and it seems the points are different from the points shown in the paper.
Hi authors,
I find that TAPIR sometimes outputs negative coordinates. Why does this happen, and what do you think they represent?
thx!
Hi everyone,
I find it really hard to get TAPIR to run on the GPU. Is there a standard procedure for doing this?
What I do/try is the following (after I create a new conda environment):
and then the following error pops out:
jaxlib.xla_extension.XlaRuntimeError: INTERNAL: Failed to execute XLA Runtime executable: run time error: custom call 'xla.gpu.func.launch' failed: Failed to load PTX text as a module: CUDA_ERROR_INVALID_IMAGE: device kernel image is invalid; current tracing scope: fusion; current profiling annotation: XlaModule:#hlo_module=jit__threefry_seed,program_id=0#.
Note that using only the CPU version works fine (simply pip install requirement_inference.txt).
Could someone state your standard procedure for making it work? Much thanks.
Hi, I am trying to understand the metrics used in TAP-Vid. It seems to me that the Jaccard metric contains a bug.
In particular, here the ((~within_dist) & pred_visible) term also counts the visible points from gt_positives.
In the end both gt_positives and false_positives are summed, and the denominator counts some points twice (in particular, there are np.sum((~within_dist) & pred_visible & visible) of them).
What do you think?
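To make the concern concrete, a small numeric sketch of the denominator the question describes (the boolean arrays are made up; this illustrates the reported double counting, not the repository's exact implementation):

```python
import numpy as np

# One ground-truth-visible point that is predicted visible but lands outside the threshold.
visible      = np.array([True])   # ground-truth visibility
pred_visible = np.array([True])   # predicted visibility
within_dist  = np.array([False])  # prediction farther than the pixel threshold

true_positives  = np.sum(within_dist & pred_visible & visible)  # 0
gt_positives    = np.sum(visible)                               # 1
false_positives = np.sum((~within_dist) & pred_visible)         # 1  <- also counts this GT-visible point
jaccard = true_positives / (gt_positives + false_positives)     # denominator is 2 for a single point

print(true_positives, gt_positives, false_positives, jaccard)
```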
Hi, I have already installed the JAX version with CUDA.
However, it seems that when I run python ./experiment.py --config ./configs/tapnet_config.py,
it still tries to use the TPU and eventually fails because I don't have access to one, and then ends up using the CPU by default.
I wonder how to use CUDA to train and evaluate the model.
Thank you very much for your support in advance!
Hi,
I am trying to run inference with a toy movie using the following command -
(tapnet) pinot:$ python3 ./experiment.py --config=./configs/tapnet_config.py --jaxline_mode=eval_inference --config.checkpoint_dir=./checkpoint/ --config.experiment_kwargs.config.inference.input_video_path=test_data/ta.mp4 --config.experiment_kwargs.config.inference.output_video_path=result.mp4 --config.experiment_kwargs.config.inference.resize_height=256 --config.experiment_kwargs.config.inference.resize_width=256 --config.experiment_kwargs.config.inference.num_points=20
I have created a virtual conda environment and installed the deps using the requirements.txt file. Running the above command in the virtual env results in the following error -
2023-06-26 20:10:33.108623: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Traceback (most recent call last):
File "/home/pshah/software/tapnet/./experiment.py", line 32, in <module>
from kubric.challenges.point_tracking import dataset
File "/home/pshah/software/miniconda3/lib/python3.10/site-packages/kubric/__init__.py", line 20, in <module>
from kubric.core.scene import Scene
File "/home/pshah/software/miniconda3/lib/python3.10/site-packages/kubric/core/__init__.py", line 17, in <module>
from .scene import Scene
File "/home/pshah/software/miniconda3/lib/python3.10/site-packages/kubric/core/scene.py", line 20, in <module>
from kubric.utils import next_global_count
File "/home/pshah/software/miniconda3/lib/python3.10/site-packages/kubric/utils.py", line 52, in <module>
from kubric.custom_types import PathLike
File "/home/pshah/software/miniconda3/lib/python3.10/site-packages/kubric/custom_types.py", line 25, in <module>
PathLike = Union[str, tfds.core.ReadWritePath]
AttributeError: module 'tensorflow_datasets.core' has no attribute 'ReadWritePath'
Is there a fix for this issue?
TIA!
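In case it helps while waiting for an answer: the AttributeError comes from kubric expecting tfds.core.ReadWritePath, which newer tensorflow_datasets releases no longer expose (the path types live in etils.epath). One unofficial workaround, purely an assumption on my part, is either to pin an older tensorflow_datasets or to alias the attribute before kubric is imported:

```python
# Hypothetical shim, not an official fix: restore the attribute kubric's
# custom_types.py expects before kubric is imported.
import tensorflow_datasets as tfds
from etils import epath

if not hasattr(tfds.core, "ReadWritePath"):
  tfds.core.ReadWritePath = epath.Path  # newer tfds keeps its path handling in etils

from kubric.challenges.point_tracking import dataset  # should now import without the error
```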
Hi !
I wonder how TAPNet deals with occlusions. I see that the model applies the Huber loss only at visible points. For the "first fashion" evaluation, when a point is occluded in the next frame and reappears later, how does the model perform point tracking? Does the model first detect the occlusion, skip this frame, and then track the point in the following frames?
Hi,
First, thanks for sharing your work.
In the Kubric dataset, I see that almost all occluded points are correctly labeled as occluded, but I still visually observe a few points, out of the 1200 = 50 (videos) x 24 (points per video), that are occluded yet not labeled as such. I'm sure these points are occluded, because the foreground object has moved a large distance while the ground truth stays at the original position, e.g. a position on the ground. I looked into the code but still have no idea what caused this problem. Could you please give some suggestions or even a solution? Thanks.
There is also an issue that "reprojected position is off for points on objects" under the Kubric repository. Would you be willing to answer that too? google-research/kubric#280
Best,
Zhihao
Note: this is not a bug report.
When I run the script as follows:
python3 ./tapnet/experiment.py \
  --config=./tapnet/configs/tapnet_config.py \
  --jaxline_mode=eval_davis_points \
  --config.checkpoint_dir=./tapnet/checkpoint/ \
  --config.experiment_kwargs.config.davis_points_path=/path/to/tapvid_davis.pkl
I get results in \tapnet\checkpoint\eval_davis_points\100000, where 0.mp4 - 9.mp4 are generated. In every mp4 video there are four frames in the parent picture, though all frames look blurry, and the points in the pictures all look bad. Is this a real result? I mean, why do the results look disappointing instead of amazing?
First of all, thank you so much for releasing this great work. I test it on my custom videos, and the tracking is really robust and accurate.
Following the README, I'm able to run the CPU version of the code (because the JAX in requirements.txt is the CPU-only version) at a very high speed. I use a 300-frame video and track 24 points from the initial frame. It takes only 10s to output the tracking results (excluding the video painting/saving time).
Then, I'm thinking of using the GPU version of JAX to further speed up the inference. I successfully installed JAX-cuda (see the screenshot below), and nvidia-smi confirms that the code is indeed using the GPU (it consumes 20GB of memory on an RTX 3090 GPU). However, the running time is 15s -- much slower than JAX-CPU's 10s. For your reference, I'm using the command from the README:
python3 ./tapnet/experiment.py \
--config=./tapnet/configs/tapnet_config.py \
--jaxline_mode=eval_inference \
--config.checkpoint_dir=./tapnet/checkpoint/ \
--config.experiment_kwargs.config.inference.input_video_path=MY_VIDEO.mp4 \
--config.experiment_kwargs.config.inference.output_video_path=result.mp4 \
--config.experiment_kwargs.config.inference.resize_height=256 \
--config.experiment_kwargs.config.inference.resize_width=256 \
--config.experiment_kwargs.config.inference.num_points=24
I'm new to JAX, so I'd really appreciate it if you could provide some hints on why my GPU code runs slower than the CPU. Thanks!
EDIT: after looking at some JAX-GPU related issues and the documentation, is it simply because the video/point size is too small? I.e., if I use a batch of videos or more points, should the GPU be faster?
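One measurement caveat worth ruling out (a suggestion, not a diagnosis): on GPU the first jitted call includes compilation, and JAX dispatches asynchronously, so wall-clock timings can mislead unless you warm up once and block on the result. A minimal sketch, with model_apply standing in for the jitted apply function from the script:

```python
import time
import jax

def timed_apply(model_apply, params, state, rng, frames, query_points):
  # Warm-up: this call includes tracing and XLA compilation for the input shape.
  out, _ = model_apply(params, state, rng, frames, query_points)
  jax.block_until_ready(out)

  # Timed call: steady-state execution only; block so async dispatch is not hidden.
  start = time.perf_counter()
  out, _ = model_apply(params, state, rng, frames, query_points)
  jax.block_until_ready(out)
  return time.perf_counter() - start
```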
Hi,
I wonder about the code here
It seems to me that it is supposed to turn off evaluation of points before the query frame.
However, I don't understand why/how the 'index' is computed. We already have the query frames computed here, so why the np.where?
Anyway, I think the code is incorrect. The for loop loops over the "batch" dimension, which is always 1. This means that the current code does not turn off the evaluation before the query point.
I think it should be corrected to:
for i in range(gt_occluded.shape[1]):  # loop over the query points
  index = np.where(gt_occluded[0, i] == 0)[0][0]  # find the first unoccluded frame for that query point (in singleton batch 0)
  evaluation_points[i, :index] = False
or equivalently:
N_queries = gt_occluded.shape[1]
for i in range(N_queries):
  evaluation_points[i, :query_frame[i]] = False
Am I missing something? Or can I make a pull request for this correction?
BTW, it would be very nice if you could provide the raw results (i.e., the pred_occluded and pred_tracks arguments of compute_tapvid_metrics) of the trackers evaluated in your papers, so that people could evaluate the trackers on different metrics without having to re-implement them and without wasting quite a lot of compute (pretty please?)
Best regards!
Thanks for the nice work! Is the codebase used to evaluate RAFT possibly available publicly?
I was using your evaluate_dataset.py utilities to load DAVIS and KUBRIC and to later compute the PCK metrics, but I wasn't able to reproduce the reported numbers. For example, my average_pts_within_thresh for DAVIS is 87.5% and does not match the reported 46.3%. The numbers that I've got for strided query evaluation with stride 5 are the following:
RAFT Results | DAVIS | KUBRIC |
---|---|---|
ade_visible | 1.42952 | 0.954813 |
pts_within_0.01 | 20.585 | 20.6073 |
pts_within_0.1 | 26.0005 | 31.6371 |
pts_within_0.5 | 49.9805 | 68.3699 |
pts_within_1 | 66.5964 | 82.676 |
pts_within_2 | 82.5936 | 91.0452 |
pts_within_4 | 92.3043 | 95.6189 |
pts_within_8 | 96.7841 | 98.0127 |
pts_within_16 | 99.0649 | 99.1844 |
average_pts_within_thresh | 87.4687 | 93.3074 |
In case you might spot something obviously incorrect, here is the ground truth and my predicted trajectory for the first point and the first 5 frames of DAVIS, all of which were visible, alongside relevant summary metrics computed for that datapoint alone.
*** Results dictionary for the first datapoint of RAFT:
iter: 1
video_idx: 0
point_idx_in_video: 0
trajectory_gt: tensor([[131.9137, 87.8248],
[135.1333, 89.2444],
[137.8000, 91.8519],
[140.2000, 95.8815],
[142.6000, 102.5185]])
trajectory_pred: tensor([[131.9137, 87.8248],
[135.3244, 89.0901],
[137.8152, 91.3863],
[140.1928, 94.9903],
[142.7608, 101.2055]])
visibility_gt: tensor([True, True, True, True, True])
*** Summary metrics for the first datapoint of RAFT:
idx: 1--0--0
ade_visible: 0.5850879549980164 (including query point)
pts_within_0.01: 0.0 (percentage)
pts_within_0.1: 0.0
pts_within_0.5: 50.0
pts_within_1: 75.0
pts_within_2: 100.0
pts_within_4: 100.0
pts_within_8: 100.0
pts_within_16: 100.0
Also, here are a few GIFs for a random data batch of DAVIS (top) and KUBRIC (bottom):
[GIFs not included here: Ground Truth | Prediction | Prediction on top of GT]
If relevant, these are the entry points into my codebase:
Hi,
Each time I try to run videos that are more than 300 frames, I run into a memory allocation error. For anything below that, it seems to work really well, with the GPU engaged, etc.
Do you have any recommendations? I am putting the traceback below.
2023-08-11 16:43:58.122328: W external/xla/xla/service/gpu/conv_algorithm_picker.cc:1003] Failed to determine best cudnn convolution algorithm for:
%cudnn-conv-bias-activation.34 = (f32[500,64,128,128]{3,2,1,0}, u8[0]{0}) custom-call(f32[500,64,128,128]{3,2,1,0} %maximum.62, f32[64,64,3,3]{3,2,1,0} %transpose.2482, f32[64]{0} %broadcast.2604, f32[500,64,128,128]{3,2,1,0} %get-tuple-element.147), window={size=3x3 pad=1_1x1_1}, dim_labels=bf01_oi01->bf01, custom_call_target="__cudnn$convBiasActivationForward", metadata={op_name="jit(apply_fn)/jit(main)/tapir/get_feature_grids/tapir/resnet/block_group_0/block_0/conv_1/conv_general_dilated[window_strides=(1, 1) padding=((1, 1), (1, 1)) lhs_dilation=(1, 1) rhs_dilation=(1, 1) dimension_numbers=ConvDimensionNumbers(lhs_spec=(0, 3, 1, 2), rhs_spec=(3, 2, 0, 1), out_spec=(0, 3, 1, 2)) feature_group_count=1 batch_group_count=1 precision=None preferred_element_type=None]" source_file="/home/rum/git/tapnet/models/resnet.py" source_line=255}, backend_config={"conv_result_scale":1,"activation_mode":"kNone","side_input_scale":1,"leakyrelu_alpha":0}
Original error: RESOURCE_EXHAUSTED: Out of memory while trying to allocate 2113929216 bytes.
As a result, convolution performance may be suboptimal.
2023-08-11 16:43:59.367796: W external/xla/xla/service/gpu/conv_algorithm_picker.cc:1003] Failed to determine best cudnn convolution algorithm for:
%cudnn-conv-bias-activation.35 = (f32[500,64,128,128]{3,2,1,0}, u8[0]{0}) custom-call(f32[500,64,128,128]{3,2,1,0} %maximum.65, f32[64,64,3,3]{3,2,1,0} %transpose.2485, f32[64]{0} %broadcast.2618, f32[500,64,128,128]{3,2,1,0} %get-tuple-element.149), window={size=3x3 pad=1_1x1_1}, dim_labels=bf01_oi01->bf01, custom_call_target="__cudnn$convBiasActivationForward", metadata={op_name="jit(apply_fn)/jit(main)/tapir/get_feature_grids/tapir/resnet/block_group_0/block_1/conv_1/conv_general_dilated[window_strides=(1, 1) padding=((1, 1), (1, 1)) lhs_dilation=(1, 1) rhs_dilation=(1, 1) dimension_numbers=ConvDimensionNumbers(lhs_spec=(0, 3, 1, 2), rhs_spec=(3, 2, 0, 1), out_spec=(0, 3, 1, 2)) feature_group_count=1 batch_group_count=1 precision=None preferred_element_type=None]" source_file="/home/rum/git/tapnet/models/resnet.py" source_line=255}, backend_config={"conv_result_scale":1,"activation_mode":"kNone","side_input_scale":1,"leakyrelu_alpha":0}
Original error: RESOURCE_EXHAUSTED: Out of memory while trying to allocate 2113929216 bytes.
As a result, convolution performance may be suboptimal.
2023-08-11 16:44:47.337077: W external/tsl/tsl/framework/bfc_allocator.cc:485] Allocator (GPU_0_bfc) ran out of memory trying to allocate 5.89GiB (rounded to 6324376320)requested by op
2023-08-11 16:44:47.337531: W external/tsl/tsl/framework/bfc_allocator.cc:497] ________________________________________________________________________________________***_____
2023-08-11 16:44:47.339353: E external/xla/xla/pjrt/pjrt_stream_executor_client.cc:2593] Execution of replica 0 failed: RESOURCE_EXHAUSTED: Out of memory while trying to allocate 6324376288 bytes.
BufferAssignment OOM Debugging.
BufferAssignment stats:
parameter allocation: 493.53MiB
constant allocation: 8.1KiB
maybe_live_out allocation: 781.4KiB
preallocated temp allocation: 5.89GiB
preallocated temp fragmentation: 30.84MiB (0.51%)
total allocation: 6.37GiB
total fragmentation: 31.63MiB (0.48%)
Peak buffers:
Buffer 1:
Size: 1.95GiB
Operator: op_name="jit(apply_fn)/jit(main)/tapir/get_feature_grids/tapir/resnet/block_group_0/block_1/conv_0/conv_general_dilated[window_strides=(1, 1) padding=((1, 1), (1, 1)) lhs_dilation=(1, 1) rhs_dilation=(1, 1) dimension_numbers=ConvDimensionNumbers(lhs_spec=(0, 3, 1, 2), rhs_spec=(3, 2, 0, 1), out_spec=(0, 3, 1, 2)) feature_group_count=1 batch_group_count=1 precision=None preferred_element_type=None]" source_file="/home/rum/git/tapnet/models/resnet.py" source_line=255
XLA Label: custom-call
Shape: f32[500,64,128,128]
==========================
Buffer 2:
Size: 1.95GiB
Operator: op_name="jit(apply_fn)/jit(main)/tapir/get_feature_grids/tapir/resnet/block_group_0/block_1/jit(relu)/max" source_file="/home/rum/git/tapnet/models/resnet.py" source_line=252 deduplicated_name="fusion.656"
XLA Label: fusion
Shape: f32[500,64,128,128]
==========================
Buffer 3:
Size: 1.95GiB
Operator: op_name="jit(apply_fn)/jit(main)/tapir/get_feature_grids/tapir/resnet/block_group_0/block_0/conv_1/conv_general_dilated[window_strides=(1, 1) padding=((1, 1), (1, 1)) lhs_dilation=(1, 1) rhs_dilation=(1, 1) dimension_numbers=ConvDimensionNumbers(lhs_spec=(0, 3, 1, 2), rhs_spec=(3, 2, 0, 1), out_spec=(0, 3, 1, 2)) feature_group_count=1 batch_group_count=1 precision=None preferred_element_type=None]" source_file="/home/rum/git/tapnet/models/resnet.py" source_line=255
XLA Label: custom-call
Shape: f32[500,64,128,128]
==========================
Buffer 4:
Size: 375.00MiB
Entry Parameter Subshape: f32[1,500,256,256,3]
==========================
Buffer 5:
Size: 4.00MiB
Entry Parameter Subshape: f32[2048,512]
==========================
Buffer 6:
Size: 4.00MiB
Entry Parameter Subshape: f32[512,2048]
==========================
Buffer 7:
Size: 4.00MiB
Entry Parameter Subshape: f32[2048,512]
==========================
Buffer 8:
Size: 4.00MiB
Entry Parameter Subshape: f32[512,2048]
==========================
Buffer 9:
Size: 4.00MiB
Entry Parameter Subshape: f32[2048,512]
==========================
Buffer 10:
Size: 4.00MiB
Entry Parameter Subshape: f32[512,2048]
==========================
Buffer 11:
Size: 4.00MiB
Entry Parameter Subshape: f32[2048,512]
==========================
Buffer 12:
Size: 4.00MiB
Entry Parameter Subshape: f32[512,2048]
==========================
Buffer 13:
Size: 4.00MiB
Entry Parameter Subshape: f32[2048,512]
==========================
Buffer 14:
Size: 4.00MiB
Entry Parameter Subshape: f32[512,2048]
==========================
Buffer 15:
Size: 4.00MiB
Entry Parameter Subshape: f32[2048,512]
==========================
XlaRuntimeError Traceback (most recent call last)
Cell In[18], line 9
7 vidrames = copy.copy(frames)
8 frames = vidrames[:500]
----> 9 tracks, visibles = inference(frames, query_points)
Cell In[3], line 92, in inference(frames, query_points)
90 # Model inference
91 rng = jax.random.PRNGKey(42)
---> 92 outputs, _ = model_apply(params, state, rng, frames, query_points)
93 outputs = tree.map_structure(lambda x: np.array(x[0]), outputs)
94 tracks, occlusions, expected_dist = outputs['tracks'], outputs['occlusion'], outputs['expected_dist']
[... skipping hidden 10 frame]
File ~/miniconda3/envs/tapir/lib/python3.10/site-packages/jax/_src/interpreters/pxla.py:1229, in ExecuteReplicated.call(self, *args)
1224 self._handle_token_bufs(
1225 results.disassemble_prefix_into_single_device_arrays(
1226 len(self.ordered_effects)),
1227 results.consume_token())
1228 else:
-> 1229 results = self.xla_executable.execute_sharded(input_bufs)
1230 if dispatch.needs_check_special():
1231 out_arrays = results.disassemble_into_single_device_arrays()
XlaRuntimeError: RESOURCE_EXHAUSTED: Out of memory while trying to allocate 6324376288 bytes.
BufferAssignment OOM Debugging.
BufferAssignment stats:
parameter allocation: 493.53MiB
constant allocation: 8.1KiB
maybe_live_out allocation: 781.4KiB
preallocated temp allocation: 5.89GiB
preallocated temp fragmentation: 30.84MiB (0.51%)
total allocation: 6.37GiB
total fragmentation: 31.63MiB (0.48%)
Peak buffers:
Buffer 1:
Size: 1.95GiB
Operator: op_name="jit(apply_fn)/jit(main)/tapir/get_feature_grids/tapir/resnet/block_group_0/block_1/conv_0/conv_general_dilated[window_strides=(1, 1) padding=((1, 1), (1, 1)) lhs_dilation=(1, 1) rhs_dilation=(1, 1) dimension_numbers=ConvDimensionNumbers(lhs_spec=(0, 3, 1, 2), rhs_spec=(3, 2, 0, 1), out_spec=(0, 3, 1, 2)) feature_group_count=1 batch_group_count=1 precision=None preferred_element_type=None]" source_file="/home/rum/git/tapnet/models/resnet.py" source_line=255
XLA Label: custom-call
Shape: f32[500,64,128,128]
==========================
Buffer 2:
Size: 1.95GiB
Operator: op_name="jit(apply_fn)/jit(main)/tapir/get_feature_grids/tapir/resnet/block_group_0/block_1/jit(relu)/max" source_file="/home/rum/git/tapnet/models/resnet.py" source_line=252 deduplicated_name="fusion.656"
XLA Label: fusion
Shape: f32[500,64,128,128]
==========================
Buffer 3:
Size: 1.95GiB
Operator: op_name="jit(apply_fn)/jit(main)/tapir/get_feature_grids/tapir/resnet/block_group_0/block_0/conv_1/conv_general_dilated[window_strides=(1, 1) padding=((1, 1), (1, 1)) lhs_dilation=(1, 1) rhs_dilation=(1, 1) dimension_numbers=ConvDimensionNumbers(lhs_spec=(0, 3, 1, 2), rhs_spec=(3, 2, 0, 1), out_spec=(0, 3, 1, 2)) feature_group_count=1 batch_group_count=1 precision=None preferred_element_type=None]" source_file="/home/rum/git/tapnet/models/resnet.py" source_line=255
XLA Label: custom-call
Shape: f32[500,64,128,128]
==========================
Buffer 4:
Size: 375.00MiB
Entry Parameter Subshape: f32[1,500,256,256,3]
==========================
Buffer 5:
Size: 4.00MiB
Entry Parameter Subshape: f32[2048,512]
==========================
Buffer 6:
Size: 4.00MiB
Entry Parameter Subshape: f32[512,2048]
==========================
Buffer 7:
Size: 4.00MiB
Entry Parameter Subshape: f32[2048,512]
==========================
Buffer 8:
Size: 4.00MiB
Entry Parameter Subshape: f32[512,2048]
==========================
Buffer 9:
Size: 4.00MiB
Entry Parameter Subshape: f32[2048,512]
==========================
Buffer 10:
Size: 4.00MiB
Entry Parameter Subshape: f32[512,2048]
==========================
Buffer 11:
Size: 4.00MiB
Entry Parameter Subshape: f32[2048,512]
==========================
Buffer 12:
Size: 4.00MiB
Entry Parameter Subshape: f32[512,2048]
==========================
Buffer 13:
Size: 4.00MiB
Entry Parameter Subshape: f32[2048,512]
==========================
Buffer 14:
Size: 4.00MiB
Entry Parameter Subshape: f32[512,2048]
==========================
Buffer 15:
Size: 4.00MiB
Entry Parameter Subshape: f32[2048,512]
==========================
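One possible workaround, offered as an assumption rather than a maintainer recommendation: run inference on fixed-size chunks of frames and re-seed the query points at every chunk boundary from the previous chunk's last predictions. This trades accuracy (tracks can drift across boundaries, and points occluded at a boundary are re-queried blindly) for bounded peak memory. The sketch assumes the Colab conventions: queries as (t, y, x) and tracks returned as (num_points, num_frames, 2) in (x, y):

```python
import numpy as np

CHUNK = 200  # frames per chunk; pick something that fits in GPU memory

def chunked_inference(frames, query_points, inference_fn, chunk=CHUNK):
  all_tracks, all_visibles = [], []
  queries = np.asarray(query_points, dtype=np.float32)  # (N, 3) as (t, y, x)
  for start in range(0, len(frames), chunk):
    piece = frames[start:start + chunk]
    tracks, visibles = inference_fn(piece, queries)  # (N, T_chunk, 2), (N, T_chunk)
    all_tracks.append(tracks)
    all_visibles.append(visibles)
    # Re-seed the next chunk's queries at its frame 0, from the last predicted (x, y).
    last_xy = tracks[:, -1]
    queries = np.stack(
        [np.zeros(len(last_xy)), last_xy[:, 1], last_xy[:, 0]], axis=-1
    ).astype(np.float32)
  return np.concatenate(all_tracks, axis=1), np.concatenate(all_visibles, axis=1)
```

Note that the final, shorter chunk will trigger one extra jit compilation; padding it up to the chunk length avoids that.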
Hi authors of tapnet,
Thanks for your great work.
I want to ask how I can use the GPU for inference in your work.
With the normal setup, it tells me "no GPU/TPU found, falling back to CPU" even though I actually have a GPU available.
Then I saw that it seems I need the correct version of jax,
so I created a new env, pip installed the compatible version of jax as instructed in a link you provide, and then set up the remaining dependencies as in requirements.txt. But I got this:
"Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR"
and
"jaxlib.xla_extension.XlaRuntimeError: FAILED_PRECONDITION: DNN library initialization failed. Look at the errors above for more details."
Do you know why that is? Thanks a lot.
Hi,
Thank you for the great work and code release. May I know how I should obtain the "panning MOVi-E" data used in the TAPIR paper? I see references in this repo to the standard Kubric API, but I don't see any pointers to the "panning" version.
Thanks a lot!
I would like to clarify the evaluation setting.
Because the query points can be sampled anywhere within the video (not just in the first frame), do we have to track them backward in time, or only forward?
For example, if the query point is sampled at frame T, do we have to find its position in frames 0->(T-1), or only track it in frames (T+1)->Max_Frame?
Thank you very much for your amazing work!
There's an error when evaluating on Kubric:
(tap) xingzhenghao@xingzhenghao-PC:~/PycharmProjects$ python ./tapnet/experiment.py --config ./tapnet/configs/tapnet_config.py --jaxline_mode=eval_kubric --config.checkpoint_dir=./tapnet/checkpoint/
I1218 16:02:29.348184 140603062212416 train.py:152] Evaluating with config:
best_model_eval_metric: ''
best_model_eval_metric_higher_is_better: true
checkpoint_dir: ./tapnet/checkpoint/
checkpoint_interval_type: null
dataset_names: &id001 !!python/tuple
- kubric
eval_initial_weights: true
eval_modes: &id002 !!python/tuple
- eval_davis_points
- eval_jhmdb
- eval_robotics_points
- eval_kinetics_points
evaluate_every: 10000
experiment_kwargs:
config:
checkpoint_dir: ./tapnet/checkpoint/
datasets:
dataset_names: *id001
kubric_kwargs:
batch_dims: 8
shuffle_buffer_size: 128
train_size: !!python/tuple
- 256
- 256
davis_points_path: /home/xingzhenghao/PycharmProjects/datasets/tap/tapvid_davis/tapvid_davis.pkl
eval_modes: *id002
evaluate_every: 10000
fast_variables: !!python/tuple []
jhmdb_path: null
optimizer:
adam_kwargs:
b1: 0.9
b2: 0.95
eps: 1.0e-08
base_lr: 0.002
cosine_decay_kwargs:
end_value: 0.0
init_value: 0.0
warmup_steps: 5000
max_norm: -1
optimizer: adam
schedule_type: cosine
weight_decay: 0.01
robotics_points_path: /home/xingzhenghao/PycharmProjects/datasets/tap/tapvid_rgb_stacking/tapvid_rgb_stacking.pkl
save_final_checkpoint_as_npy: true
shared_modules:
shared_module_names: &id003 !!python/tuple
- tapnet_model
tapnet_model_kwargs: {}
supervised_point_prediction_kwargs:
prediction_algo: cost_volume_regressor
sweep_name: default_sweep
training:
n_training_steps: 100000
interval_type: secs
log_all_train_data: false
log_tensors_interval: 60
log_train_data_interval: 120.0
logging_interval_type: null
max_checkpoints_to_keep: 5
one_off_evaluate: false
random_mode_eval: same_host_same_device
random_mode_train: unique_host_unique_device
random_seed: 42
save_checkpoint_interval: 10
shared_module_names: *id003
train_checkpoint_all_hosts: false
training_steps: 100000
I1218 16:02:29.355844 140603062212416 xla_bridge.py:353] Unable to initialize backend 'tpu_driver': NOT_FOUND: Unable to find driver in registry given worker:
I1218 16:02:29.422299 140603062212416 xla_bridge.py:353] Unable to initialize backend 'rocm': NOT_FOUND: Could not find registered platform with name: "rocm". Available platform names are: Host Interpreter CUDA
I1218 16:02:29.422796 140603062212416 xla_bridge.py:353] Unable to initialize backend 'tpu': module 'jaxlib.xla_extension' has no attribute 'get_tpu_client'
I1218 16:02:29.422965 140603062212416 xla_bridge.py:353] Unable to initialize backend 'plugin': xla_extension has no attributes named get_plugin_device_client. Compile TensorFlow with //tensorflow/compiler/xla/python:enable_plugin_device set to true (defaults to false) to enable this.
I1218 16:02:29.972896 140603062212416 supervised_point_prediction.py:944] Saving videos to ./tapnet/checkpoint/eval_kubric/100000
2022-12-18 16:02:29.987263: W tensorflow/core/platform/cloud/google_auth_provider.cc:184] All attempts to get a Google authentication bearer token failed, returning an empty token. Retrieving token from files failed with "NOT_FOUND: Could not locate the credentials file.". Retrieving token from GCE failed with "FAILED_PRECONDITION: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host name', error details: Could not resolve host: metadata".
I1218 16:02:32.725086 140603062212416 dataset_info.py:491] Load dataset info from gs://kubric-public/tfds/movi_e/256x256/1.0.0
I1218 16:02:35.881139 140603062212416 dataset_info.py:550] Field info.splits from disk and from code do not match. Keeping the one from code.
I1218 16:02:36.213931 140603062212416 dataset_builder.py:383] Reusing dataset movi_e (gs://kubric-public/tfds/movi_e/256x256/1.0.0)
I1218 16:02:36.214255 140603062212416 logging_logger.py:44] Constructing tf.data.Dataset movi_e for split None, from gs://kubric-public/tfds/movi_e/256x256/1.0.0
W1218 16:02:39.021307 140603062212416 deprecation.py:337] From /home/xingzhenghao/anaconda3/envs/tap/lib/python3.8/site-packages/tensorflow/python/util/dispatch.py:1082: multinomial (from tensorflow.python.ops.random_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.random.categorical` instead.
W1218 16:02:43.468899 140603062212416 deprecation.py:541] From /home/xingzhenghao/anaconda3/envs/tap/lib/python3.8/site-packages/tensorflow/python/util/dispatch.py:1082: calling crop_and_resize_v1 (from tensorflow.python.ops.image_ops_impl) with box_ind is deprecated and will be removed in a future version.
Instructions for updating:
box_ind is deprecated, use box_indices instead
2022-12-18 16:02:46.722060: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:828] shape_optimizer failed: INVALID_ARGUMENT: Subshape must have computed start >= end since stride is negative, but is 1 and 3 (computed from start 1 and end 9223372036854775807 over shape with rank 3 and stride-1)
2022-12-18 16:02:47.417004: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:828] shape_optimizer failed: INVALID_ARGUMENT: Subshape must have computed start >= end since stride is negative, but is 1 and 3 (computed from start 1 and end 9223372036854775807 over shape with rank 3 and stride-1)
Traceback (most recent call last):
File "./tapnet/experiment.py", line 427, in <module>
app.run(main)
File "/home/xingzhenghao/anaconda3/envs/tap/lib/python3.8/site-packages/absl/app.py", line 308, in run
_run_main(main, args)
File "/home/xingzhenghao/anaconda3/envs/tap/lib/python3.8/site-packages/absl/app.py", line 254, in _run_main
sys.exit(main(argv))
File "./tapnet/experiment.py", line 420, in main
platform.main(
File "/home/xingzhenghao/anaconda3/envs/tap/lib/python3.8/site-packages/jaxline/utils.py", line 484, in inner_wrapper
return f(*args, **kwargs)
File "/home/xingzhenghao/anaconda3/envs/tap/lib/python3.8/site-packages/jaxline/platform.py", line 137, in main
train.evaluate(experiment_class, config, checkpointer, writer,
File "/home/xingzhenghao/anaconda3/envs/tap/lib/python3.8/site-packages/jaxline/utils.py", line 620, in inner_wrapper
return fn(*args, **kwargs)
File "/home/xingzhenghao/anaconda3/envs/tap/lib/python3.8/site-packages/jaxline/train.py", line 225, in evaluate
scalar_values = utils.evaluate_should_return_dict(experiment.evaluate)(
File "/home/xingzhenghao/anaconda3/envs/tap/lib/python3.8/site-packages/jaxline/utils.py", line 521, in evaluate_with_warning
evaluate_out = f(*args, **kwargs)
File "./tapnet/experiment.py", line 401, in evaluate
eval_scalars = point_prediction_task.evaluate(
File "/home/xingzhenghao/PycharmProjects/tapnet/supervised_point_prediction.py", line 495, in evaluate
self._eval_epoch(
File "/home/xingzhenghao/PycharmProjects/tapnet/supervised_point_prediction.py", line 968, in _eval_epoch
for inputs in self._build_eval_input(mode):
File "/home/xingzhenghao/PycharmProjects/tapnet/supervised_point_prediction.py", line 805, in _build_eval_input
yield from evaluation_datasets.create_kubric_eval_dataset(mode)
File "/home/xingzhenghao/PycharmProjects/tapnet/evaluation_datasets.py", line 463, in create_kubric_eval_dataset
for data in np_ds:
File "/home/xingzhenghao/anaconda3/envs/tap/lib/python3.8/site-packages/tensorflow_datasets/core/dataset_utils.py", line 65, in _eager_dataset_iterator
for elem in ds:
File "/home/xingzhenghao/anaconda3/envs/tap/lib/python3.8/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 836, in __next__
return self._next_internal()
File "/home/xingzhenghao/anaconda3/envs/tap/lib/python3.8/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 819, in _next_internal
ret = gen_dataset_ops.iterator_get_next(
File "/home/xingzhenghao/anaconda3/envs/tap/lib/python3.8/site-packages/tensorflow/python/ops/gen_dataset_ops.py", line 2923, in iterator_get_next
_ops.raise_from_not_ok_status(e, name)
File "/home/xingzhenghao/anaconda3/envs/tap/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 7186, in raise_from_not_ok_status
raise core._status_to_exception(e) from None # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes at component 0: expected [?,256,24] but got [1,39,24]. [Op:IteratorGetNext]
Thanks a lot for your support in advance!
Frameworks such as Mediapipe or OpenPose are used to extract skeletal keypoints from images.
Unfortunately, the results are inconsistent and somewhat jittery when trying to extract poses from consecutive frames.
I propose a use case supported by tapir: stabilize the extracted keypoints with tapir's tracking. If tapir and mediapipe diverge, fall back to the mediapipe pose and continue tracking from there. This idea, similar to how MP4 files work, treats mediapipe poses as the gold "P-frames" and tapir predictions, as long as they stay consistent, as the "I-frames". When the data stored in the I-frame is no longer consistent, introduce another P-frame. (This can also be done per frame, per keypoint.)
Related issue: qianqianwang68/omnimotion#5
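A rough per-keypoint sketch of the fallback logic described above (the threshold name and value are hypothetical, and nothing here comes from the tapnet or mediapipe APIs):

```python
import numpy as np

DIVERGENCE_PX = 12.0  # hypothetical pixel threshold for "tapir and mediapipe diverge"

def fuse_keypoints(mediapipe_xy, tapir_xy):
  """Per-frame, per-keypoint fusion: keep the TAPIR track while it stays close to
  the MediaPipe detection, and fall back to MediaPipe (re-anchor) when it drifts.

  Both inputs are (num_keypoints, 2) arrays for the current frame; returns the
  fused keypoints and a boolean mask marking the re-anchored points.
  """
  dist = np.linalg.norm(tapir_xy - mediapipe_xy, axis=-1)
  reanchor = dist > DIVERGENCE_PX
  fused = np.where(reanchor[:, None], mediapipe_xy, tapir_xy)
  return fused, reanchor
```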
In the previous iteration of TAP-Net, there was a checkpoint file released that had not only the model state but also the optimizer state as well as the global_step. This was helpful since you could load it directly into an experiment and easily start finetuning. However, I don't believe that there is a similar checkpoint file for TAPIR. In the README there is a checkpoint for the "online" version of the model, and in one of the linked notebooks there is a checkpoint for the offline model, but neither includes training state.
Could you release a checkpoint with the training state included for the TAPIR model?
I am trying to run TAPNET on Windows 11 with Anaconda3-2023.07-2-Windows-x86_64.exe, but I still get a JAX-related error message.
I already installed jaxlib-0.4.11-cp311-cp311-win_amd64.whl,
but it's apparently not enough.
(base) C:\Users\ATC\tapnet>python ./experiment.py --config ./configs/tapir_config.py
Traceback (most recent call last):
File "C:\Users\ATC\tapnet\experiment.py", line 29, in <module>
from jaxline import experiment
File "C:\ProgramData\anaconda3\Lib\site-packages\jaxline\experiment.py", line 30, in <module>
from jaxline import utils
File "C:\ProgramData\anaconda3\Lib\site-packages\jaxline\utils.py", line 335, in <module>
rng: jnp.DeviceArray,
^^^^^^^^^^^^^^^
File "C:\ProgramData\anaconda3\Lib\site-packages\jax\_src\deprecations.py", line 53, in getattr
raise AttributeError(f"module {module!r} has no attribute {name!r}")
AttributeError: module 'jax.numpy' has no attribute 'DeviceArray'
conda list includes this:
jax 0.4.14 pypi_0 pypi
jaxlib 0.4.11 pypi_0 pypi
jaxline 0.0.5 pypi_0 pypi
Can somebody tell me what I am doing wrong?
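In case it helps: jaxline 0.0.5 still annotates with jnp.DeviceArray, which jax 0.4.14 no longer provides, so this looks like a version mismatch rather than something you did wrong. Two unofficial options (assumptions, not a tested fix): pin jax to a release that matches your jaxlib and still ships the alias (e.g. jax==0.4.11), or add a small shim before jaxline is imported:

```python
# Hypothetical shim, placed before `from jaxline import experiment` runs
# (e.g. at the very top of experiment.py).
import jax
import jax.numpy as jnp

if not hasattr(jnp, "DeviceArray"):
  jnp.DeviceArray = jax.Array  # the alias was removed in jax 0.4.14
```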
Hi TapNet authors,
Just to confirm if I understand the evaluation correctly: It is clear that you train and test tap-net on resized images of 256x256. When you run the baseline methods, you also use resized images of 256x256 as input, is that correct?
If so, could you explain why it is preferable to evaluate on reduced resolution when the full-resolution images & annotations are available?
When evaluating the model on kubric, the generation code still randomly samples points from the kubric val videos each time.
How do you make sure the sampled points are similar each time they are sampled?
No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
Where do I set the log level, and how do I fix this issue?
Thank you ^^
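For the first part of the question: TF_CPP_MIN_LOG_LEVEL is an environment variable. One way (a suggestion) is to set it before anything imports TensorFlow or JAX, either in the shell or at the very top of the script:

```python
import os

# Must be set before tensorflow/jax are imported for it to take effect.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "0"

import jax
print(jax.devices())  # the extra log output should now explain why no GPU was found
```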
I am running inference on live video on the CPU and the speed is very low. I think it is because of the CPU, but my problem is how to increase the number of frames processed in the video. Do you have any ideas about it?
In simple words, I want to increase the speed on the CPU.
Thanks in advance
I tried exploring the codebase but couldn't find a simple way to try out the model on a new video to evaluate performance. Did I miss a script? (I am unfamiliar with JAX, coming from the PyTorch ecosystem.)
I tried to install this under Windows, but there is no Windows version of jax. Is it possible to run this under Windows?