google-deepmind / tapnet
Tracking Any Point (TAP)
Home Page: https://deepmind-tapir.github.io/blogpost.html
License: Apache License 2.0
I followed the steps in https://github.com/deepmind/tapnet/blob/main/data/README.md to generate the TAP-Vid-Kinetics dataset.
When processing the clips, i.e. running the generate_tapvid.py
script, I encountered an AssertionError; the message is shown below:
File "generate_tapvid.py", line 177, in main
videos = csv_to_dataset(FLAGS.csv_path, videos_path)
File "generate_tapvid.py", line 73, in csv_to_dataset
assert len(row) == 3 + 3 * 250, f"{len(row)}"
AssertionError: 711
It looks like the file tapvid_kinetics.csv
contains some rows with an invalid format; it was extracted from https://storage.googleapis.com/dm-tapnet/tapvid_kinetics.zip.
Can you check it?
Thanks!
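In case it helps to narrow this down, a small sketch (the path is a placeholder; the expected length 3 + 3 * 250 is taken from the assertion above) that prints which rows of the CSV have an unexpected number of fields:

```python
import csv

EXPECTED_LEN = 3 + 3 * 250  # three leading fields plus three values per frame for 250 frames

# Replace with whatever you pass as --csv_path.
with open("tapvid_kinetics.csv", newline="") as f:
  for line_no, row in enumerate(csv.reader(f), start=1):
    if len(row) != EXPECTED_LEN:
      print(f"row {line_no}: {len(row)} fields (expected {EXPECTED_LEN})")
```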
In jax 0.4.14, jax.numpy.DeviceArray is deprecated and no longer available; see
https://jax.readthedocs.io/en/latest/changelog.html#jax-0-4-14-july-27-2023
This makes your code produce the error in the title.
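For reference, a minimal sketch of the rename that downstream code would need, assuming you stay on jax >= 0.4.14 (alternatively, pinning jax to a release that still ships the alias avoids touching the code):

```python
import jax

# Before (no longer available in jax 0.4.14):
#   def f(rng: jnp.DeviceArray) -> jnp.DeviceArray: ...
# After, using the unified array type:
def f(rng: jax.Array) -> jax.Array:
  return jax.random.split(rng)[0]

print(f(jax.random.PRNGKey(0)).shape)  # (2,)
```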
Hi everyone,
From the previous threads, we know that the main factor for slow inference is "a change in input tensor size causing recompilation". Now, if you may, I would like to break down this statement more clearly.
Consider the following code (P.S.: it is the Colab format I am using):
def inference(frames, query_points):
  ...
  rng = jax.random.PRNGKey(42)
  outputs, _ = model_apply(params, state, rng, frames, query_points)  ## highlight 1
  ...
  return ...

model = hk.transform_with_state(build_model)
model_apply = jax.jit(model.apply)  ## highlight 2

for video in videos:
  ...
  tracks, visibles = inference(frames, query_points)  ## highlight 3
May I ask:
Conjecture:
If it is jax.jit that causes compilation, then supposedly, from highlight 2, a compiled version of model_apply is returned. After this, no other jax.jit is called; we simply enter a for loop that keeps calling inference(). Every time inference is called, it uses the pre-compiled version of model_apply; it does not have access to the outer jax.jit. So where exactly does this recompilation stem from?
Many thanks to anyone who reads through!
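For what it is worth, a standalone sketch (independent of TAPIR) of where the recompilation comes from: jax.jit does not return a single fixed executable, it returns a wrapper that traces and compiles once per distinct input shape/dtype signature and caches the result, so a video with a different number of frames (or a different number of query points) triggers a fresh compile even though jax.jit itself is never called again:

```python
import time
import jax
import jax.numpy as jnp

@jax.jit
def f(x):
  return (x * 2.0).sum()

def timed(x):
  start = time.perf_counter()
  f(x).block_until_ready()
  return time.perf_counter() - start

a = jnp.zeros((100, 256, 256, 3))
b = jnp.zeros((150, 256, 256, 3))  # same code, different leading shape

print(timed(a))  # slow: first compilation for this shape
print(timed(a))  # fast: the cached executable is reused
print(timed(b))  # slow again: a new shape means a new trace + compile
```

Padding every video's frames and query points to one fixed size (and masking the padding) is the usual way to keep the cached executable reused across a whole dataset.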
Hi authors of TAPIR,
Thanks again for your work.
I am opening this issue because I would like to know how the inference time changes with respect to the number of points and frames.
Does the time go up linearly w.r.t. these two arguments, or is there some other form of dependence?
I tried experimenting with it, but my results vary a little from run to run, so I am asking here.
Thanks~
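Not an authoritative answer, but a small measurement sketch that may make the run-to-run variance manageable: warm up once per shape so compilation is excluded, then take the median over a few repeats (inference here stands for the Colab's inference function, and the sizes in the sweep are arbitrary):

```python
import time
import numpy as np

def steady_state_seconds(fn, frames, query_points, repeats=3):
  fn(frames, query_points)  # warm-up: absorbs jit compilation for this shape
  times = []
  for _ in range(repeats):
    start = time.perf_counter()
    fn(frames, query_points)  # if fn returns JAX arrays, block on an output before stopping the timer
    times.append(time.perf_counter() - start)
  return float(np.median(times))

# Example sweep over the number of query points, with the frame count fixed:
# for n in (8, 16, 32, 64):
#   print(n, steady_state_seconds(inference, frames, query_points[:n]))
```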
Hi team,
Can the inference be done without resizing the video frames?
I am looking to train TAP-Net on a new dataset, in particular modifying the existing datasets (DAVIS, Kubric, RGB Stacking, etc.) to use new keypoints that we generate. It is not immediately clear to me how we should add a new dataset, and how to use the existing scripts such as experiment.py in order to train TAP-Net on the new dataset.
It looks like Kubric is the only dataset that is supported for training, whereas DAVIS and RGB Stacking are included for inference. Could you walk me through what format TAP-Net expects from a dataset, and where in the code/config I would need to add functionality in order to use the new dataset?
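Not an answer from the authors, but for orientation while waiting for one: a point-tracking training example conceptually needs frames, query points, target tracks, and occlusion flags. The dictionary below is an illustrative sketch with made-up key names and shapes, not the repository's actual schema:

```python
import numpy as np

num_frames, height, width, num_points = 24, 256, 256, 32

example = {
    # RGB frames, resized to the training resolution.
    "video": np.zeros((num_frames, height, width, 3), dtype=np.uint8),
    # One query per tracked point, e.g. stored as (frame index, y, x).
    "query_points": np.zeros((num_points, 3), dtype=np.float32),
    # Ground-truth location of each point in each frame, e.g. (x, y).
    "target_points": np.zeros((num_points, num_frames, 2), dtype=np.float32),
    # Whether each point is occluded in each frame.
    "occluded": np.zeros((num_points, num_frames), dtype=bool),
}
```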
Hello,
I'm having trouble launching my model with my GPU. Here are the logs I see in the terminal:
"""
No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
"""
However, when I check if TensorFlow is able to load the GPU using this command:
"""
gpu_devices = tf.config.list_physical_devices('GPU')
"""
I get the following result:
"""
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
"""
Could someone please help me resolve this issue?
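One thing worth checking (a suggestion, not a diagnosis): the "No GPU/TPU found" message comes from JAX, and TensorFlow seeing the GPU does not imply that the installed jaxlib build has CUDA support. You can ask JAX directly what it sees:

```python
import jax

print(jax.default_backend())  # "gpu" only if a CUDA-enabled jaxlib found the device
print(jax.devices())          # the devices JAX can actually use
```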
Hi authors,
Suppose that I have successfully installed a version of jax compatible with my CUDA/cuDNN. How do I modify the following code so that it runs inference on the GPU?
P.S. I am running inference on the videos in a dataset with the following for loop (it is the Colab format):
def inference(frames, query_points):
  ...

model = hk.transform_with_state(build_model)
model_apply = jax.jit(model.apply)

### for loop over every video in the dataset
for i in os.listdir("/home/chy/dataset"):
  if not i.startswith("."):
    rh, rw = 256, 256
    video = np.load("/home/chy/dataset/" + i)
    height, width = video.shape[1:3]
    frames = media.resize_video(video, (rh, rw))
    query_points = points
    tracks, visibles = inference(frames, query_points)
    tracks = transforms.convert_grid_coordinates(tracks, (rw, rh), (width, height))
    np.save("path", tracks)
much thx~
Hello! It seems the front page is missing this very crucial information; without it nobody knows whether they can run it or not, and I may waste a lot of time setting up the repository only to find out it doesn't fit.
Cheers!
Hi, thanks for your excellent work!
I have a problem with the evaluation on TAP-Vid-Kinetics.
I found three files in TAP-Vid-Kinetics, namely "train.txt", "test.txt", and "val.txt". Are they all used for evaluation, or are only the videos in the test set used?
Hi,
Thanks for your great work. I have a question about the evaluation.
The paper proposes two different evaluation modes ("first" and "strided"). In the "first" mode, where only a point in the first frame is queried, if the point becomes occluded at some timestamp t and then reappears, is the predicted trajectory after t still evaluated?
Hi, thank you very much for your great work!
Can I consider TAP-Net an offline algorithm, given that TSM-ResNet-18 is used as the backbone?
Hi, I found the evaluation results for the strided fashion in the paper, and I wonder whether you also have the results for the first fashion?
https://github.com/qianqianwang68/omnimotion
Google had similar work that uses the same test videos as yours? How is your work fundamentally different from it?
I am a hobbyist, so I would be grateful if you could spend some time briefly explaining the differences in terms of purpose, approach, and results.
Thanks!
Impressive!
May I ask whether there is any code for training?
Hello,
Thank you for releasing these datasets.
I'm currently processing the dataset that you used to evaluate/train the model. However, when generating the pkl file for the kinetics dataset, I receive the following warning:
Could you help double-check whether these videos are present in the dataset on your side?
FYI, I follow this link to download and extract the kinetics dataset: https://github.com/cvdfoundation/kinetics-dataset
When checking their annotation file: https://s3.amazonaws.com/kinetics/700_2020/annotations/val.csv, I could not find these videos either.
Best,
Hung
Hi,
In utils.transforms.convert_grid_coordinates, it is mentioned in the comments that
"""Convert image coordinates between image grids of different sizes.
By default, it assumes that the image corners are aligned. Therefore,
it adds .5 (since (0,0) is assumed to be the center of the upper-left grid
cell), multiplies by the size ratio, and then subtracts .5.
"""
I wanted to ask if it is indeed assumed that the image corners are aligned?
In addition, I also see that the .5 addition and subtraction is not actually implemented in the code. So does that mean the image corners are not aligned?
Thanks!
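For anyone comparing the two conventions, a small illustrative sketch (not the repository's code) of the difference between what the docstring describes and a plain multiplication by the size ratio:

```python
import numpy as np

def convert_center_aligned(coords, source_size, target_size):
  """Docstring convention: (0, 0) is the center of the upper-left cell,
  so shift by +0.5, scale by the size ratio, then shift back by -0.5."""
  ratio = np.asarray(target_size) / np.asarray(source_size)
  return (coords + 0.5) * ratio - 0.5

def convert_plain_ratio(coords, source_size, target_size):
  """What a bare multiplication by the size ratio does (corner of the image at 0)."""
  ratio = np.asarray(target_size) / np.asarray(source_size)
  return coords * ratio

pt = np.array([10.0, 20.0])
print(convert_center_aligned(pt, (256, 256), (512, 512)))  # [20.5 40.5]
print(convert_plain_ratio(pt, (256, 256), (512, 512)))     # [20. 40.]
```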
Hi all,
I have been working on evaluating the Kubric-VFS-like baseline on the Kubric dataset. Would it be possible for you to provide the evaluation code for Kubric-VFS-Like baseline and the checkpoint used to get those results? And for evaluating on Kubric, was the Kubric-VFS-like baseline also trained on Kubric?
Also, it would be great if you could provide the evaluation code and checkpoint for RAFT as well.
Thanks!
I'm trying to evaluate TAP-Net on the Kubric dataset, and I'm getting the error shown below. I am running the following script: python3 ./tapnet/experiment.py --config=./tapnet/configs/tapnet_config.py --jaxline_mode=eval_kubric --config.checkpoint_dir=/data3/tap/tap/tapnet_checkpoint/
Any idea how to fix this? Thanks!
I0228 19:56:23.706737 140398657533120 train.py:152] Evaluating with config:
best_model_eval_metric: ''
best_model_eval_metric_higher_is_better: true
checkpoint_dir: /data3/tap//tap/tapnet_checkpoint/
checkpoint_interval_type: null
dataset_names: &id001 !!python/tuple
- kubric
eval_initial_weights: true
eval_modes: &id002 !!python/tuple
- eval_davis_points
- eval_jhmdb
- eval_robotics_points
- eval_kinetics_points
evaluate_every: 10000
experiment_kwargs:
config:
checkpoint_dir: /data3/tap//tap/tapnet_checkpoint/
datasets:
dataset_names: *id001
kubric_kwargs:
batch_dims: 8
shuffle_buffer_size: 128
train_size: !!python/tuple
- 256
- 256
davis_points_path: ''
eval_modes: *id002
evaluate_every: 10000
fast_variables: !!python/tuple []
inference:
input_video_path: ''
num_points: 20
output_video_path: ''
resize_height: 256
resize_width: 256
jhmdb_path: ''
optimizer:
adam_kwargs:
b1: 0.9
b2: 0.95
eps: 1.0e-08
base_lr: 0.002
cosine_decay_kwargs:
end_value: 0.0
init_value: 0.0
warmup_steps: 5000
max_norm: -1
optimizer: adam
schedule_type: cosine
weight_decay: 0.01
robotics_points_path: ''
save_final_checkpoint_as_npy: true
shared_modules:
shared_module_names: &id003 !!python/tuple
- tapnet_model
tapnet_model_kwargs: {}
supervised_point_prediction_kwargs:
prediction_algo: cost_volume_regressor
sweep_name: default_sweep
training:
n_training_steps: 100000
interval_type: secs
log_all_train_data: false
log_tensors_interval: 60
log_train_data_interval: 120.0
logging_interval_type: null
max_checkpoints_to_keep: 5
one_off_evaluate: false
random_mode_eval: same_host_same_device
random_mode_train: unique_host_unique_device
random_seed: 42
save_checkpoint_interval: 10
shared_module_names: *id003
train_checkpoint_all_hosts: false
training_steps: 100000
I0228 19:56:23.755014 140398657533120 xla_bridge.py:173] Remote TPU is not linked into jax; skipping remote TPU.
I0228 19:56:23.755347 140398657533120 xla_bridge.py:357] Unable to initialize backend 'tpu_driver': Could not initialize backend 'tpu_driver'
I0228 19:56:24.424445 140398657533120 xla_bridge.py:357] Unable to initialize backend 'rocm': NOT_FOUND: Could not find registered platform with name: "rocm". Available platform names are: Interpreter Host CUDA
I0228 19:56:24.425570 140398657533120 xla_bridge.py:357] Unable to initialize backend 'tpu': module 'jaxlib.xla_extension' has no attribute 'get_tpu_client'
I0228 19:56:28.558446 140398657533120 supervised_point_prediction.py:979] Saving videos to /data3/tap//tap/tapnet_checkpoint/eval_kubric/0
I0228 19:56:28.567507 140398657533120 dataset_info.py:565] Load dataset info from /data3/tap/kubric/movi_e/256x256/1.0.0
W0228 19:56:28.572742 140398657533120 dtype_utils.py:43] You use TensorFlow DType <dtype: 'uint8'> in tfds.features This will soon be deprecated in favor of NumPy DTypes. In the meantime it was converted to uint8.
W0228 19:56:28.574135 140398657533120 dtype_utils.py:43] You use TensorFlow DType <dtype: 'uint16'> in tfds.features This will soon be deprecated in favor of NumPy DTypes. In the meantime it was converted to uint16.
I0228 19:56:28.620231 140398657533120 dataset_info.py:654] Fields info.[splits] from disk and from code do not match. Keeping the one from code.
I0228 19:56:28.620935 140398657533120 dataset_builder.py:522] Reusing dataset movi_e (/data3/tap/kubric/movi_e/256x256/1.0.0)
W0228 19:56:28.622349 140398657533120 feature.py:64] `TensorInfo.dtype` is deprecated. Please change your code to use NumPy with the field `TensorInfo.np_dtype` or use TensorFlow with the field `TensorInfo.tf_dtype`.
W0228 19:56:28.622643 140398657533120 dtype_utils.py:43] You use TensorFlow DType <dtype: 'float32'> in tfds.features This will soon be deprecated in favor of NumPy DTypes. In the meantime it was converted to float32.
W0228 19:56:28.622788 140398657533120 feature.py:64] `TensorInfo.dtype` is deprecated. Please change your code to use NumPy with the field `TensorInfo.np_dtype` or use TensorFlow with the field `TensorInfo.tf_dtype`.
W0228 19:56:28.622916 140398657533120 feature.py:64] `TensorInfo.dtype` is deprecated. Please change your code to use NumPy with the field `TensorInfo.np_dtype` or use TensorFlow with the field `TensorInfo.tf_dtype`.
W0228 19:56:28.623071 140398657533120 dtype_utils.py:43] You use TensorFlow DType <dtype: 'int32'> in tfds.features This will soon be deprecated in favor of NumPy DTypes. In the meantime it was converted to int32.
W0228 19:56:28.623173 140398657533120 feature.py:64] `TensorInfo.dtype` is deprecated. Please change your code to use NumPy with the field `TensorInfo.np_dtype` or use TensorFlow with the field `TensorInfo.tf_dtype`.
W0228 19:56:28.623292 140398657533120 feature.py:64] `TensorInfo.dtype` is deprecated. Please change your code to use NumPy with the field `TensorInfo.np_dtype` or use TensorFlow with the field `TensorInfo.tf_dtype`.
W0228 19:56:28.623408 140398657533120 feature.py:64] `TensorInfo.dtype` is deprecated. Please change your code to use NumPy with the field `TensorInfo.np_dtype` or use TensorFlow with the field `TensorInfo.tf_dtype`.
W0228 19:56:28.623907 140398657533120 feature.py:64] `TensorInfo.dtype` is deprecated. Please change your code to use NumPy with the field `TensorInfo.np_dtype` or use TensorFlow with the field `TensorInfo.tf_dtype`.
W0228 19:56:28.624111 140398657533120 dtype_utils.py:43] You use TensorFlow DType <dtype: 'string'> in tfds.features This will soon be deprecated in favor of NumPy DTypes. In the meantime it was converted to object.
W0228 19:56:28.624262 140398657533120 feature.py:64] `TensorInfo.dtype` is deprecated. Please change your code to use NumPy with the field `TensorInfo.np_dtype` or use TensorFlow with the field `TensorInfo.tf_dtype`.
W0228 19:56:28.624418 140398657533120 feature.py:64] `TensorInfo.dtype` is deprecated. Please change your code to use NumPy with the field `TensorInfo.np_dtype` or use TensorFlow with the field `TensorInfo.tf_dtype`.
W0228 19:56:28.624617 140398657533120 feature.py:64] `TensorInfo.dtype` is deprecated. Please change your code to use NumPy with the field `TensorInfo.np_dtype` or use TensorFlow with the field `TensorInfo.tf_dtype`.
W0228 19:56:28.625051 140398657533120 dtype_utils.py:43] You use TensorFlow DType <dtype: 'int64'> in tfds.features This will soon be deprecated in favor of NumPy DTypes. In the meantime it was converted to int64.
W0228 19:56:28.625315 140398657533120 dtype_utils.py:43] You use TensorFlow DType <dtype: 'bool'> in tfds.features This will soon be deprecated in favor of NumPy DTypes. In the meantime it was converted to bool.
I0228 19:56:29.712036 140398657533120 logging_logger.py:49] Constructing tf.data.Dataset movi_e for split None, from /data3/tap/kubric/movi_e/256x256/1.0.0
W0228 19:56:32.420510 140398657533120 deprecation.py:337] From /data/anaconda3/envs/tapnet/lib/python3.8/site-packages/tensorflow/python/util/dispatch.py:1082: multinomial (from tensorflow.python.ops.random_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.random.categorical` instead.
W0228 19:56:39.477390 140398657533120 deprecation.py:541] From /data/anaconda3/envs/tapnet/lib/python3.8/site-packages/tensorflow/python/util/dispatch.py:1082: calling crop_and_resize_v1 (from tensorflow.python.ops.image_ops_impl) with box_ind is deprecated and will be removed in a future version.
Instructions for updating:
box_ind is deprecated, use box_indices instead
2023-02-28 19:56:44.292071: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:828] shape_optimizer failed: INVALID_ARGUMENT: Subshape must have computed start >= end since stride is negative, but is 1 and 3 (computed from start 1 and end 9223372036854775807 over shape with rank 3 and stride-1)
2023-02-28 19:56:45.191140: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:828] shape_optimizer failed: INVALID_ARGUMENT: Subshape must have computed start >= end since stride is negative, but is 1 and 3 (computed from start 1 and end 9223372036854775807 over shape with rank 3 and stride-1)
Traceback (most recent call last):
File "./tapnet/experiment.py", line 429, in <module>
app.run(main)
File "/home/ubuntu/.local/lib/python3.8/site-packages/absl/app.py", line 308, in run
_run_main(main, args)
File "/home/ubuntu/.local/lib/python3.8/site-packages/absl/app.py", line 254, in _run_main
sys.exit(main(argv))
File "./tapnet/experiment.py", line 421, in main
platform.main(
File "/data/anaconda3/envs/tapnet/lib/python3.8/site-packages/jaxline/utils.py", line 484, in inner_wrapper
return f(*args, **kwargs)
File "/data/anaconda3/envs/tapnet/lib/python3.8/site-packages/jaxline/platform.py", line 137, in main
train.evaluate(experiment_class, config, checkpointer, writer,
File "/data/anaconda3/envs/tapnet/lib/python3.8/site-packages/jaxline/utils.py", line 620, in inner_wrapper
return fn(*args, **kwargs)
File "/data/anaconda3/envs/tapnet/lib/python3.8/site-packages/jaxline/train.py", line 225, in evaluate
scalar_values = utils.evaluate_should_return_dict(experiment.evaluate)(
File "/data/anaconda3/envs/tapnet/lib/python3.8/site-packages/jaxline/utils.py", line 521, in evaluate_with_warning
evaluate_out = f(*args, **kwargs)
File "./tapnet/experiment.py", line 404, in evaluate
eval_scalars = point_prediction_task.evaluate(
File "/home/ubuntu/contractive/tapnet/supervised_point_prediction.py", line 514, in evaluate
self._eval_epoch(
File "/home/ubuntu/contractive/tapnet/supervised_point_prediction.py", line 1005, in _eval_epoch
scalars, viz = eval_batch_fn(params, state, inputs, rng)
File "/home/ubuntu/contractive/tapnet/supervised_point_prediction.py", line 766, in _eval_batch
occlusion_logits, tracks, loss_scalars = self._infer_batch(
File "/home/ubuntu/contractive/tapnet/supervised_point_prediction.py", line 577, in _infer_batch
output, _ = functools.partial(
File "/data/anaconda3/envs/tapnet/lib/python3.8/site-packages/haiku/_src/transform.py", line 357, in apply_fn
out = f(*args, **kwargs)
File "./tapnet/experiment.py", line 122, in forward
return self.point_prediction.forward_fn(
File "/home/ubuntu/contractive/tapnet/supervised_point_prediction.py", line 313, in forward_fn
return shared_modules['tapnet_model'](
File "/data/anaconda3/envs/tapnet/lib/python3.8/site-packages/haiku/_src/module.py", line 426, in wrapped
out = f(*args, **kwargs)
File "/data/anaconda3/envs/tapnet/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/data/anaconda3/envs/tapnet/lib/python3.8/site-packages/haiku/_src/module.py", line 272, in run_interceptors
return bound_method(*args, **kwargs)
File "/home/ubuntu/contractive/tapnet/tapnet_model.py", line 341, in __call__
latent = self.tsm_resnet(
File "/data/anaconda3/envs/tapnet/lib/python3.8/site-packages/haiku/_src/module.py", line 426, in wrapped
out = f(*args, **kwargs)
File "/data/anaconda3/envs/tapnet/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/data/anaconda3/envs/tapnet/lib/python3.8/site-packages/haiku/_src/module.py", line 272, in run_interceptors
return bound_method(*args, **kwargs)
File "/home/ubuntu/contractive/tapnet/models/tsm_resnet.py", line 383, in __call__
net = hk.Conv2D(
File "/data/anaconda3/envs/tapnet/lib/python3.8/site-packages/haiku/_src/module.py", line 426, in wrapped
out = f(*args, **kwargs)
File "/data/anaconda3/envs/tapnet/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/data/anaconda3/envs/tapnet/lib/python3.8/site-packages/haiku/_src/module.py", line 272, in run_interceptors
return bound_method(*args, **kwargs)
File "/data/anaconda3/envs/tapnet/lib/python3.8/site-packages/haiku/_src/conv.py", line 200, in __call__
w = hk.get_parameter("w", w_shape, inputs.dtype, init=w_init)
File "/data/anaconda3/envs/tapnet/lib/python3.8/site-packages/haiku/_src/base.py", line 448, in wrapped
return wrapped._current(*args, **kwargs)
File "/data/anaconda3/envs/tapnet/lib/python3.8/site-packages/haiku/_src/base.py", line 524, in get_parameter
raise ValueError(
ValueError: Unable to retrieve parameter 'w' for module 'tap_net/~/tsm_resnet_video/tsm_resnet_stem' All parameters must be created as part of `init`.
I am trying to train the TAPIR model on the Kubric dataset using Google Colab; however, my code keeps stopping without any errors. I am using the python ./experiment.py --config ./configs/tapir_config.py
command, and the config file is loaded successfully. The training process stops abruptly without any errors. I am unable to determine the cause and would be really grateful for any help in this regard.
Thank You!
Hi, when I use Kubric (MOVi-E), I find that the train set has about 9,750 videos, the validation set about 250, and the test set about 999, which differs from the 38,325/799 train/validation split reported in the paper.
When invoking experiment.py to do inference:
python3 ./tapnet/experiment.py \
--config=./tapnet/configs/tapnet_config.py \
--jaxline_mode=eval_inference \
--config.checkpoint_dir=./tapnet/checkpoint/ \
--config.experiment_kwargs.config.inference.input_video_path=fixed10.mp4 \
--config.experiment_kwargs.config.inference.output_video_path=result.mp4 \
--config.experiment_kwargs.config.inference.resize_height=256 \
--config.experiment_kwargs.config.inference.resize_width=256 \
--config.experiment_kwargs.config.inference.num_points=20
I get the following error:
Traceback (most recent call last):
File "./tapnet/experiment.py", line 431, in <module>
app.run(main)
File "/home/jrcoyle/.local/lib/python3.8/site-packages/absl/app.py", line 308, in run
_run_main(main, args)
File "/home/jrcoyle/.local/lib/python3.8/site-packages/absl/app.py", line 254, in _run_main
sys.exit(main(argv))
File "./tapnet/experiment.py", line 424, in main
platform.main(
File "/home/jrcoyle/.local/lib/python3.8/site-packages/jaxline/utils.py", line 484, in inner_wrapper
return f(*args, **kwargs)
File "/home/jrcoyle/.local/lib/python3.8/site-packages/jaxline/platform.py", line 137, in main
train.evaluate(experiment_class, config, checkpointer, writer,
File "/home/jrcoyle/.local/lib/python3.8/site-packages/jaxline/utils.py", line 620, in inner_wrapper
return fn(*args, **kwargs)
File "/home/jrcoyle/.local/lib/python3.8/site-packages/jaxline/train.py", line 225, in evaluate
scalar_values = utils.evaluate_should_return_dict(experiment.evaluate)(
File "/home/jrcoyle/.local/lib/python3.8/site-packages/jaxline/utils.py", line 521, in evaluate_with_warning
evaluate_out = f(*args, **kwargs)
File "./tapnet/experiment.py", line 405, in evaluate
eval_scalars = point_prediction_task.evaluate(
File "/home/jrcoyle/tapnet/supervised_point_prediction.py", line 370, in evaluate
self._eval_inference(
File "/home/jrcoyle/tapnet/supervised_point_prediction.py", line 981, in _eval_inference
outputs, _ = self._infer_batch(
File "/home/jrcoyle/tapnet/supervised_point_prediction.py", line 440, in _infer_batch
output, _ = functools.partial(wrapped_forward_fn, input_key=input_key)(
File "/home/jrcoyle/.local/lib/python3.8/site-packages/haiku/_src/transform.py", line 357, in apply_fn
out = f(*args, **kwargs)
File "./tapnet/experiment.py", line 125, in forward
return self.point_prediction.forward_fn(
File "/home/jrcoyle/tapnet/supervised_point_prediction.py", line 150, in forward_fn
return shared_modules[self.model_key](
File "/home/jrcoyle/.local/lib/python3.8/site-packages/haiku/_src/module.py", line 426, in wrapped
out = f(*args, **kwargs)
File "/usr/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/home/jrcoyle/.local/lib/python3.8/site-packages/haiku/_src/module.py", line 272, in run_interceptors
return bound_method(*args, **kwargs)
File "/home/jrcoyle/tapnet/tapnet_model.py", line 215, in __call__
latent = self.tsm_resnet(
File "/home/jrcoyle/.local/lib/python3.8/site-packages/haiku/_src/module.py", line 426, in wrapped
out = f(*args, **kwargs)
File "/usr/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/home/jrcoyle/.local/lib/python3.8/site-packages/haiku/_src/module.py", line 272, in run_interceptors
return bound_method(*args, **kwargs)
File "/home/jrcoyle/tapnet/models/tsm_resnet.py", line 383, in __call__
net = hk.Conv2D(
File "/home/jrcoyle/.local/lib/python3.8/site-packages/haiku/_src/module.py", line 426, in wrapped
out = f(*args, **kwargs)
File "/usr/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/home/jrcoyle/.local/lib/python3.8/site-packages/haiku/_src/module.py", line 272, in run_interceptors
return bound_method(*args, **kwargs)
File "/home/jrcoyle/.local/lib/python3.8/site-packages/haiku/_src/conv.py", line 200, in __call__
w = hk.get_parameter("w", w_shape, inputs.dtype, init=w_init)
File "/home/jrcoyle/.local/lib/python3.8/site-packages/haiku/_src/base.py", line 448, in wrapped
return wrapped._current(*args, **kwargs)
File "/home/jrcoyle/.local/lib/python3.8/site-packages/haiku/_src/base.py", line 524, in get_parameter
raise ValueError(
ValueError: Unable to retrieve parameter 'w' for module 'tap_net/~/tsm_resnet_video/tsm_resnet_stem' All parameters must be created as part of `init`.
I am attempting to use a local GPU. The live_demo.py script works for me, so I am not sure what the issue is here.
Hi,
You only mention that PIPs is concurrent work to TAP-Vid/Net.
Are you planning to show any comparisons between these two works?
I just ran the Google Colab inference demo, and it seems the points are different from the points shown in the paper.
Hi authors,
I find that TAPIR sometimes outputs negative coordinates. Why does this happen, and what do you think they represent?
thx!
Hi everyone,
I find it really hard to get TAPIR to run on the GPU. Is there a standard procedure for doing this?
What I do/try is the following (after I create a new conda environment):
and then the following error pops out:
jaxlib.xla_extension.XlaRuntimeError: INTERNAL: Failed to execute XLA Runtime executable: run time error: custom call 'xla.gpu.func.launch' failed: Failed to load PTX text as a module: CUDA_ERROR_INVALID_IMAGE: device kernel image is invalid; current tracing scope: fusion; current profiling annotation: XlaModule:#hlo_module=jit__threefry_seed,program_id=0#.
Note that using only the CPU version works fine (simply pip install requirement_inference.txt).
Could someone state your standard procedure for making it work? Much thanks.
Hi, I am trying to understand the metrics used in TAP-Vid. It seems to me that the Jaccard metric contains a bug.
In particular, here the ((~within_dist) & pred_visible) term also counts the visible points from gt_positives.
In the end both gt_positives and false_positives are summed, and the denominator counts some points twice (in particular, there are np.sum((~within_dist) & pred_visible & visible) of them).
What do you think?
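To make the concern concrete, a small numeric sketch of the denominator the question describes (the boolean arrays are made up; this illustrates the reported double counting, not the repository's exact implementation):

```python
import numpy as np

# One ground-truth-visible point that is predicted visible but lands outside the threshold.
visible      = np.array([True])   # ground-truth visibility
pred_visible = np.array([True])   # predicted visibility
within_dist  = np.array([False])  # prediction farther than the pixel threshold

true_positives  = np.sum(within_dist & pred_visible & visible)  # 0
gt_positives    = np.sum(visible)                               # 1
false_positives = np.sum((~within_dist) & pred_visible)         # 1  <- also counts this GT-visible point
jaccard = true_positives / (gt_positives + false_positives)     # denominator is 2 for a single point

print(true_positives, gt_positives, false_positives, jaccard)
```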
Hi, I have already installed the JAX version with CUDA.
However, it seems that when I run python ./experiment.py --config ./configs/tapnet_config.py,
it still tries to use the TPU and eventually fails because I don't have access to one, and then ends up using the CPU by default.
I wonder how to use CUDA to train and evaluate the model.
Thank you very much for your support in advance!
Hi,
I am trying to run inference with a toy movie using the following command -
(tapnet) pinot:$ python3 ./experiment.py --config=./configs/tapnet_config.py --jaxline_mode=eval_inference --config.checkpoint_dir=./checkpoint/ --config.experiment_kwargs.config.inference.input_video_path=test_data/ta.mp4 --config.experiment_kwargs.config.inference.output_video_path=result.mp4 --config.experiment_kwargs.config.inference.resize_height=256 --config.experiment_kwargs.config.inference.resize_width=256 --config.experiment_kwargs.config.inference.num_points=20
I have created a virtual conda environment and installed the deps using the requirements.txt file. Running the above command in the virtual env results in the following error -
2023-06-26 20:10:33.108623: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Traceback (most recent call last):
File "/home/pshah/software/tapnet/./experiment.py", line 32, in <module>
from kubric.challenges.point_tracking import dataset
File "/home/pshah/software/miniconda3/lib/python3.10/site-packages/kubric/__init__.py", line 20, in <module>
from kubric.core.scene import Scene
File "/home/pshah/software/miniconda3/lib/python3.10/site-packages/kubric/core/__init__.py", line 17, in <module>
from .scene import Scene
File "/home/pshah/software/miniconda3/lib/python3.10/site-packages/kubric/core/scene.py", line 20, in <module>
from kubric.utils import next_global_count
File "/home/pshah/software/miniconda3/lib/python3.10/site-packages/kubric/utils.py", line 52, in <module>
from kubric.custom_types import PathLike
File "/home/pshah/software/miniconda3/lib/python3.10/site-packages/kubric/custom_types.py", line 25, in <module>
PathLike = Union[str, tfds.core.ReadWritePath]
AttributeError: module 'tensorflow_datasets.core' has no attribute 'ReadWritePath'
Is there a fix for this issue?
TIA!
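In case it helps while waiting for an answer: the AttributeError comes from kubric expecting tfds.core.ReadWritePath, which newer tensorflow_datasets releases no longer expose (the path types live in etils.epath). One unofficial workaround, purely an assumption on my part, is either to pin an older tensorflow_datasets or to alias the attribute before kubric is imported:

```python
# Hypothetical shim, not an official fix: restore the attribute kubric's
# custom_types.py expects before kubric is imported.
import tensorflow_datasets as tfds
from etils import epath

if not hasattr(tfds.core, "ReadWritePath"):
  tfds.core.ReadWritePath = epath.Path  # newer tfds keeps its path handling in etils

from kubric.challenges.point_tracking import dataset  # should now import without the error
```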
Hi !
I wonder how TAPNet deals with occlusions. I see that the model applies the Huber loss only at visible points. For the "first fashion" evaluation, when a point is occluded in the next frame and reappears later, how does the model perform point tracking? Does the model first detect the occlusion, skip this frame, and then track the point in the following frames?
Hi,
First, thanks for sharing your work.
In the Kubric dataset, I see that almost all occluded points are correctly labeled as occluded, but I still visually observe a few points, out of the 1200 = 50 (videos) x 24 (points per video), that are occluded yet not labeled as such. I'm sure these points are occluded, because the foreground object has moved a large distance while the ground truth stays at the original position, e.g. a position on the ground. I looked into the code but still have no idea what caused this problem. Could you please give some suggestions or even a solution? Thanks.
There is also an issue that "reprojected position is off for points on objects" under the Kubric repository. Would you be willing to answer that too? google-research/kubric#280
Best,
Zhihao
Note: this is not a bug report.
When I run the script as follows:
python3 ./tapnet/experiment.py \
  --config=./tapnet/configs/tapnet_config.py \
  --jaxline_mode=eval_davis_points \
  --config.checkpoint_dir=./tapnet/checkpoint/ \
  --config.experiment_kwargs.config.davis_points_path=/path/to/tapvid_davis.pkl
I get results in \tapnet\checkpoint\eval_davis_points\100000, where 0.mp4 - 9.mp4 are generated. In every mp4 video there are four frames in the parent picture, though all frames look blurry, and the points in the pictures all look bad. Is this a real result? I mean, why do the results look disappointing instead of amazing?
First of all, thank you so much for releasing this great work. I test it on my custom videos, and the tracking is really robust and accurate.
Following the README, I'm able to run the CPU version of the code (because the JAX in requirements.txt is the CPU-only version) at a very high speed. I use a 300-frame video and track 24 points from the initial frame. It takes only 10s to output the tracking results (excluding the video painting/saving time).
Then, I'm thinking of using the GPU version of JAX to further speed up the inference. I successfully installed JAX-cuda (see the screenshot below), and nvidia-smi confirms that the code is indeed using the GPU (it consumes 20GB of memory on an RTX 3090 GPU). However, the running time is 15s -- much slower than JAX-CPU's 10s. For your reference, I'm using the command from the README:
python3 ./tapnet/experiment.py \
--config=./tapnet/configs/tapnet_config.py \
--jaxline_mode=eval_inference \
--config.checkpoint_dir=./tapnet/checkpoint/ \
--config.experiment_kwargs.config.inference.input_video_path=MY_VIDEO.mp4 \
--config.experiment_kwargs.config.inference.output_video_path=result.mp4 \
--config.experiment_kwargs.config.inference.resize_height=256 \
--config.experiment_kwargs.config.inference.resize_width=256 \
--config.experiment_kwargs.config.inference.num_points=24
I'm new to JAX, so I'd really appreciate it if you could provide some hints on why my GPU code runs slower than the CPU. Thanks!
EDIT: after looking at some JAX-GPU related issues and the documentation, is it simply because the video/point size is too small? I.e., if I use a batch of videos or more points, should the GPU be faster?
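One measurement caveat worth ruling out (a suggestion, not a diagnosis): on GPU the first jitted call includes compilation, and JAX dispatches asynchronously, so wall-clock timings can mislead unless you warm up once and block on the result. A minimal sketch, with model_apply standing in for the jitted apply function from the script:

```python
import time
import jax

def timed_apply(model_apply, params, state, rng, frames, query_points):
  # Warm-up: this call includes tracing and XLA compilation for the input shape.
  out, _ = model_apply(params, state, rng, frames, query_points)
  jax.block_until_ready(out)

  # Timed call: steady-state execution only; block so async dispatch is not hidden.
  start = time.perf_counter()
  out, _ = model_apply(params, state, rng, frames, query_points)
  jax.block_until_ready(out)
  return time.perf_counter() - start
```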
Hi,
I wonder about the code here
It seems to me that it is supposed to turn off evaluation of points before the query frame.
However, I don't understand why/how the 'index' is computed. We already have the query frames computed here, so why the np.where?
Anyway, I think the code is incorrect. The for loop loops over the "batch" dimension, which is always 1. This means that the current code does not turn off the evaluation before the query point.
I think it should be corrected to:
for i in range(gt_occluded.shape[1]):  # loop over the query points
  index = np.where(gt_occluded[0, i] == 0)[0][0]  # find the first unoccluded frame for that query point (in singleton batch 0)
  evaluation_points[i, :index] = False
or equivalently:
N_queries = gt_occluded.shape[1]
for i in range(N_queries):
  evaluation_points[i, :query_frame[i]] = False
Am I missing something? Or can I make a pull request for this correction?
BTW, it would be very nice if you could provide the raw results (i.e., the pred_occluded and pred_tracks arguments of compute_tapvid_metrics) of the trackers evaluated in your papers, so that people could evaluate the trackers on different metrics without having to re-implement them and without wasting quite a lot of compute (pretty please?)
Best regards!
Thanks for the nice work! Is the codebase used to evaluate RAFT possibly available publicly?
I was using your evaluate_dataset.py utilities to load DAVIS and KUBRIC and to later compute the PCK metrics, but I wasn't able to reproduce the reported numbers. For example, my average_pts_within_thresh for DAVIS is 87.5% and does not match the reported 46.3%. The numbers that I've got for strided query evaluation with stride 5 are the following:
RAFT Results | DAVIS | KUBRIC |
---|---|---|
ade_visible | 1.42952 | 0.954813 |
pts_within_0.01 | 20.585 | 20.6073 |
pts_within_0.1 | 26.0005 | 31.6371 |
pts_within_0.5 | 49.9805 | 68.3699 |
pts_within_1 | 66.5964 | 82.676 |
pts_within_2 | 82.5936 | 91.0452 |
pts_within_4 | 92.3043 | 95.6189 |
pts_within_8 | 96.7841 | 98.0127 |
pts_within_16 | 99.0649 | 99.1844 |
average_pts_within_thresh | 87.4687 | 93.3074 |
In case you might spot something obviously incorrect, here is the ground truth and my predicted trajectory for the first point and the first 5 frames of DAVIS, all of which were visible, alongside relevant summary metrics computed for that datapoint alone.
*** Results dictionary for the first datapoint of RAFT:
iter: 1
video_idx: 0
point_idx_in_video: 0
trajectory_gt: tensor([[131.9137, 87.8248],
[135.1333, 89.2444],
[137.8000, 91.8519],
[140.2000, 95.8815],
[142.6000, 102.5185]])
trajectory_pred: tensor([[131.9137, 87.8248],
[135.3244, 89.0901],
[137.8152, 91.3863],
[140.1928, 94.9903],
[142.7608, 101.2055]])
visibility_gt: tensor([True, True, True, True, True])
*** Summary metrics for the first datapoint of RAFT:
idx: 1--0--0
ade_visible: 0.5850879549980164 (including query point)
pts_within_0.01: 0.0 (percentage)
pts_within_0.1: 0.0
pts_within_0.5: 50.0
pts_within_1: 75.0
pts_within_2: 100.0
pts_within_4: 100.0
pts_within_8: 100.0
pts_within_16: 100.0
Also, here are a few GIFs for a random data batch of DAVIS (top) and KUBRIC (bottom):
[GIFs not included here: Ground Truth | Prediction | Prediction on top of GT]
If relevant, these are the entry points into my codebase:
Hi,
Each time I try to run videos that are more than 300 frames, I run into a memory allocation error. For anything below that, it seems to work really well, with the GPU engaged, etc.
Do you have any recommendations? I am putting the traceback below.
2023-08-11 16:43:58.122328: W external/xla/xla/service/gpu/conv_algorithm_picker.cc:1003] Failed to determine best cudnn convolution algorithm for:
%cudnn-conv-bias-activation.34 = (f32[500,64,128,128]{3,2,1,0}, u8[0]{0}) custom-call(f32[500,64,128,128]{3,2,1,0} %maximum.62, f32[64,64,3,3]{3,2,1,0} %transpose.2482, f32[64]{0} %broadcast.2604, f32[500,64,128,128]{3,2,1,0} %get-tuple-element.147), window={size=3x3 pad=1_1x1_1}, dim_labels=bf01_oi01->bf01, custom_call_target="__cudnn$convBiasActivationForward", metadata={op_name="jit(apply_fn)/jit(main)/tapir/get_feature_grids/tapir/resnet/block_group_0/block_0/conv_1/conv_general_dilated[window_strides=(1, 1) padding=((1, 1), (1, 1)) lhs_dilation=(1, 1) rhs_dilation=(1, 1) dimension_numbers=ConvDimensionNumbers(lhs_spec=(0, 3, 1, 2), rhs_spec=(3, 2, 0, 1), out_spec=(0, 3, 1, 2)) feature_group_count=1 batch_group_count=1 precision=None preferred_element_type=None]" source_file="/home/rum/git/tapnet/models/resnet.py" source_line=255}, backend_config={"conv_result_scale":1,"activation_mode":"kNone","side_input_scale":1,"leakyrelu_alpha":0}
Original error: RESOURCE_EXHAUSTED: Out of memory while trying to allocate 2113929216 bytes.
As a result, convolution performance may be suboptimal.
2023-08-11 16:43:59.367796: W external/xla/xla/service/gpu/conv_algorithm_picker.cc:1003] Failed to determine best cudnn convolution algorithm for:
%cudnn-conv-bias-activation.35 = (f32[500,64,128,128]{3,2,1,0}, u8[0]{0}) custom-call(f32[500,64,128,128]{3,2,1,0} %maximum.65, f32[64,64,3,3]{3,2,1,0} %transpose.2485, f32[64]{0} %broadcast.2618, f32[500,64,128,128]{3,2,1,0} %get-tuple-element.149), window={size=3x3 pad=1_1x1_1}, dim_labels=bf01_oi01->bf01, custom_call_target="__cudnn$convBiasActivationForward", metadata={op_name="jit(apply_fn)/jit(main)/tapir/get_feature_grids/tapir/resnet/block_group_0/block_1/conv_1/conv_general_dilated[window_strides=(1, 1) padding=((1, 1), (1, 1)) lhs_dilation=(1, 1) rhs_dilation=(1, 1) dimension_numbers=ConvDimensionNumbers(lhs_spec=(0, 3, 1, 2), rhs_spec=(3, 2, 0, 1), out_spec=(0, 3, 1, 2)) feature_group_count=1 batch_group_count=1 precision=None preferred_element_type=None]" source_file="/home/rum/git/tapnet/models/resnet.py" source_line=255}, backend_config={"conv_result_scale":1,"activation_mode":"kNone","side_input_scale":1,"leakyrelu_alpha":0}
Original error: RESOURCE_EXHAUSTED: Out of memory while trying to allocate 2113929216 bytes.
As a result, convolution performance may be suboptimal.
2023-08-11 16:44:47.337077: W external/tsl/tsl/framework/bfc_allocator.cc:485] Allocator (GPU_0_bfc) ran out of memory trying to allocate 5.89GiB (rounded to 6324376320)requested by op
2023-08-11 16:44:47.337531: W external/tsl/tsl/framework/bfc_allocator.cc:497] ________________________________________________________________________________________***_____
2023-08-11 16:44:47.339353: E external/xla/xla/pjrt/pjrt_stream_executor_client.cc:2593] Execution of replica 0 failed: RESOURCE_EXHAUSTED: Out of memory while trying to allocate 6324376288 bytes.
BufferAssignment OOM Debugging.
BufferAssignment stats:
parameter allocation: 493.53MiB
constant allocation: 8.1KiB
maybe_live_out allocation: 781.4KiB
preallocated temp allocation: 5.89GiB
preallocated temp fragmentation: 30.84MiB (0.51%)
total allocation: 6.37GiB
total fragmentation: 31.63MiB (0.48%)
Peak buffers:
Buffer 1:
Size: 1.95GiB
Operator: op_name="jit(apply_fn)/jit(main)/tapir/get_feature_grids/tapir/resnet/block_group_0/block_1/conv_0/conv_general_dilated[window_strides=(1, 1) padding=((1, 1), (1, 1)) lhs_dilation=(1, 1) rhs_dilation=(1, 1) dimension_numbers=ConvDimensionNumbers(lhs_spec=(0, 3, 1, 2), rhs_spec=(3, 2, 0, 1), out_spec=(0, 3, 1, 2)) feature_group_count=1 batch_group_count=1 precision=None preferred_element_type=None]" source_file="/home/rum/git/tapnet/models/resnet.py" source_line=255
XLA Label: custom-call
Shape: f32[500,64,128,128]
==========================
Buffer 2:
Size: 1.95GiB
Operator: op_name="jit(apply_fn)/jit(main)/tapir/get_feature_grids/tapir/resnet/block_group_0/block_1/jit(relu)/max" source_file="/home/rum/git/tapnet/models/resnet.py" source_line=252 deduplicated_name="fusion.656"
XLA Label: fusion
Shape: f32[500,64,128,128]
==========================
Buffer 3:
Size: 1.95GiB
Operator: op_name="jit(apply_fn)/jit(main)/tapir/get_feature_grids/tapir/resnet/block_group_0/block_0/conv_1/conv_general_dilated[window_strides=(1, 1) padding=((1, 1), (1, 1)) lhs_dilation=(1, 1) rhs_dilation=(1, 1) dimension_numbers=ConvDimensionNumbers(lhs_spec=(0, 3, 1, 2), rhs_spec=(3, 2, 0, 1), out_spec=(0, 3, 1, 2)) feature_group_count=1 batch_group_count=1 precision=None preferred_element_type=None]" source_file="/home/rum/git/tapnet/models/resnet.py" source_line=255
XLA Label: custom-call
Shape: f32[500,64,128,128]
==========================
Buffer 4:
Size: 375.00MiB
Entry Parameter Subshape: f32[1,500,256,256,3]
==========================
Buffer 5:
Size: 4.00MiB
Entry Parameter Subshape: f32[2048,512]
==========================
Buffer 6:
Size: 4.00MiB
Entry Parameter Subshape: f32[512,2048]
==========================
Buffer 7:
Size: 4.00MiB
Entry Parameter Subshape: f32[2048,512]
==========================
Buffer 8:
Size: 4.00MiB
Entry Parameter Subshape: f32[512,2048]
==========================
Buffer 9:
Size: 4.00MiB
Entry Parameter Subshape: f32[2048,512]
==========================
Buffer 10:
Size: 4.00MiB
Entry Parameter Subshape: f32[512,2048]
==========================
Buffer 11:
Size: 4.00MiB
Entry Parameter Subshape: f32[2048,512]
==========================
Buffer 12:
Size: 4.00MiB
Entry Parameter Subshape: f32[512,2048]
==========================
Buffer 13:
Size: 4.00MiB
Entry Parameter Subshape: f32[2048,512]
==========================
Buffer 14:
Size: 4.00MiB
Entry Parameter Subshape: f32[512,2048]
==========================
Buffer 15:
Size: 4.00MiB
Entry Parameter Subshape: f32[2048,512]
==========================
XlaRuntimeError Traceback (most recent call last)
Cell In[18], line 9
7 vidrames = copy.copy(frames)
8 frames = vidrames[:500]
----> 9 tracks, visibles = inference(frames, query_points)
Cell In[3], line 92, in inference(frames, query_points)
90 # Model inference
91 rng = jax.random.PRNGKey(42)
---> 92 outputs, _ = model_apply(params, state, rng, frames, query_points)
93 outputs = tree.map_structure(lambda x: np.array(x[0]), outputs)
94 tracks, occlusions, expected_dist = outputs['tracks'], outputs['occlusion'], outputs['expected_dist']
[... skipping hidden 10 frame]
File ~/miniconda3/envs/tapir/lib/python3.10/site-packages/jax/_src/interpreters/pxla.py:1229, in ExecuteReplicated.call(self, *args)
1224 self._handle_token_bufs(
1225 results.disassemble_prefix_into_single_device_arrays(
1226 len(self.ordered_effects)),
1227 results.consume_token())
1228 else:
-> 1229 results = self.xla_executable.execute_sharded(input_bufs)
1230 if dispatch.needs_check_special():
1231 out_arrays = results.disassemble_into_single_device_arrays()
XlaRuntimeError: RESOURCE_EXHAUSTED: Out of memory while trying to allocate 6324376288 bytes.
BufferAssignment OOM Debugging.
BufferAssignment stats:
parameter allocation: 493.53MiB
constant allocation: 8.1KiB
maybe_live_out allocation: 781.4KiB
preallocated temp allocation: 5.89GiB
preallocated temp fragmentation: 30.84MiB (0.51%)
total allocation: 6.37GiB
total fragmentation: 31.63MiB (0.48%)
Peak buffers:
Buffer 1:
Size: 1.95GiB
Operator: op_name="jit(apply_fn)/jit(main)/tapir/get_feature_grids/tapir/resnet/block_group_0/block_1/conv_0/conv_general_dilated[window_strides=(1, 1) padding=((1, 1), (1, 1)) lhs_dilation=(1, 1) rhs_dilation=(1, 1) dimension_numbers=ConvDimensionNumbers(lhs_spec=(0, 3, 1, 2), rhs_spec=(3, 2, 0, 1), out_spec=(0, 3, 1, 2)) feature_group_count=1 batch_group_count=1 precision=None preferred_element_type=None]" source_file="/home/rum/git/tapnet/models/resnet.py" source_line=255
XLA Label: custom-call
Shape: f32[500,64,128,128]
==========================
Buffer 2:
Size: 1.95GiB
Operator: op_name="jit(apply_fn)/jit(main)/tapir/get_feature_grids/tapir/resnet/block_group_0/block_1/jit(relu)/max" source_file="/home/rum/git/tapnet/models/resnet.py" source_line=252 deduplicated_name="fusion.656"
XLA Label: fusion
Shape: f32[500,64,128,128]
==========================
Buffer 3:
Size: 1.95GiB
Operator: op_name="jit(apply_fn)/jit(main)/tapir/get_feature_grids/tapir/resnet/block_group_0/block_0/conv_1/conv_general_dilated[window_strides=(1, 1) padding=((1, 1), (1, 1)) lhs_dilation=(1, 1) rhs_dilation=(1, 1) dimension_numbers=ConvDimensionNumbers(lhs_spec=(0, 3, 1, 2), rhs_spec=(3, 2, 0, 1), out_spec=(0, 3, 1, 2)) feature_group_count=1 batch_group_count=1 precision=None preferred_element_type=None]" source_file="/home/rum/git/tapnet/models/resnet.py" source_line=255
XLA Label: custom-call
Shape: f32[500,64,128,128]
==========================
Buffer 4:
Size: 375.00MiB
Entry Parameter Subshape: f32[1,500,256,256,3]
==========================
Buffer 5:
Size: 4.00MiB
Entry Parameter Subshape: f32[2048,512]
==========================
Buffer 6:
Size: 4.00MiB
Entry Parameter Subshape: f32[512,2048]
==========================
Buffer 7:
Size: 4.00MiB
Entry Parameter Subshape: f32[2048,512]
==========================
Buffer 8:
Size: 4.00MiB
Entry Parameter Subshape: f32[512,2048]
==========================
Buffer 9:
Size: 4.00MiB
Entry Parameter Subshape: f32[2048,512]
==========================
Buffer 10:
Size: 4.00MiB
Entry Parameter Subshape: f32[512,2048]
==========================
Buffer 11:
Size: 4.00MiB
Entry Parameter Subshape: f32[2048,512]
==========================
Buffer 12:
Size: 4.00MiB
Entry Parameter Subshape: f32[512,2048]
==========================
Buffer 13:
Size: 4.00MiB
Entry Parameter Subshape: f32[2048,512]
==========================
Buffer 14:
Size: 4.00MiB
Entry Parameter Subshape: f32[512,2048]
==========================
Buffer 15:
Size: 4.00MiB
Entry Parameter Subshape: f32[2048,512]
==========================
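One possible workaround, offered as an assumption rather than a maintainer recommendation: run inference on fixed-size chunks of frames and re-seed the query points at every chunk boundary from the previous chunk's last predictions. This trades accuracy (tracks can drift across boundaries, and points occluded at a boundary are re-queried blindly) for bounded peak memory. The sketch assumes the Colab conventions: queries as (t, y, x) and tracks returned as (num_points, num_frames, 2) in (x, y):

```python
import numpy as np

CHUNK = 200  # frames per chunk; pick something that fits in GPU memory

def chunked_inference(frames, query_points, inference_fn, chunk=CHUNK):
  all_tracks, all_visibles = [], []
  queries = np.asarray(query_points, dtype=np.float32)  # (N, 3) as (t, y, x)
  for start in range(0, len(frames), chunk):
    piece = frames[start:start + chunk]
    tracks, visibles = inference_fn(piece, queries)  # (N, T_chunk, 2), (N, T_chunk)
    all_tracks.append(tracks)
    all_visibles.append(visibles)
    # Re-seed the next chunk's queries at its frame 0, from the last predicted (x, y).
    last_xy = tracks[:, -1]
    queries = np.stack(
        [np.zeros(len(last_xy)), last_xy[:, 1], last_xy[:, 0]], axis=-1
    ).astype(np.float32)
  return np.concatenate(all_tracks, axis=1), np.concatenate(all_visibles, axis=1)
```

Note that the final, shorter chunk will trigger one extra jit compilation; padding it up to the chunk length avoids that.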
Hi authors of tapnet,
Thanks for your great work.
I want to ask how I can use the GPU for inference in your work.
With the normal setup, it tells me "no GPU/TPU found, falling back to CPU" even though I actually have a GPU available.
Then I saw that it seems I need the correct version of jax,
so I created a new env, pip installed the compatible version of jax as instructed in a link you provide, and then set up the remaining dependencies as in requirements.txt. But I got this:
"Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR"
and
"jaxlib.xla_extension.XlaRuntimeError: FAILED_PRECONDITION: DNN library initialization failed. Look at the errors above for more details."
Do you know why that is? Thanks a lot.
Hi,
Thank you for the great work and code release. May I know how I should obtain the "panning MOVi-E" data used in the TAPIR paper? I see references in this repo to the standard Kubric API, but I don't see any pointers to the "panning" version.
Thanks a lot!
I would like to clarify the evaluation setting.
Because the query points can be sampled anywhere within the video (not just in the first frame), do we have to track them backward in time, or only forward?
For example, if the query point is sampled at frame T, do we have to find its position in frames 0->(T-1), or only track it in frames (T+1)->Max_Frame?
Thank you very much for your amazing work!
There's an error when evaluating on Kubric:
(tap) xingzhenghao@xingzhenghao-PC:~/PycharmProjects$ python ./tapnet/experiment.py --config ./tapnet/configs/tapnet_config.py --jaxline_mode=eval_kubric --config.checkpoint_dir=./tapnet/checkpoint/
I1218 16:02:29.348184 140603062212416 train.py:152] Evaluating with config:
best_model_eval_metric: ''
best_model_eval_metric_higher_is_better: true
checkpoint_dir: ./tapnet/checkpoint/
checkpoint_interval_type: null
dataset_names: &id001 !!python/tuple
- kubric
eval_initial_weights: true
eval_modes: &id002 !!python/tuple
- eval_davis_points
- eval_jhmdb
- eval_robotics_points
- eval_kinetics_points
evaluate_every: 10000
experiment_kwargs:
config:
checkpoint_dir: ./tapnet/checkpoint/
datasets:
dataset_names: *id001
kubric_kwargs:
batch_dims: 8
shuffle_buffer_size: 128
train_size: !!python/tuple
- 256
- 256
davis_points_path: /home/xingzhenghao/PycharmProjects/datasets/tap/tapvid_davis/tapvid_davis.pkl
eval_modes: *id002
evaluate_every: 10000
fast_variables: !!python/tuple []
jhmdb_path: null
optimizer:
adam_kwargs:
b1: 0.9
b2: 0.95
eps: 1.0e-08
base_lr: 0.002
cosine_decay_kwargs:
end_value: 0.0
init_value: 0.0
warmup_steps: 5000
max_norm: -1
optimizer: adam
schedule_type: cosine
weight_decay: 0.01
robotics_points_path: /home/xingzhenghao/PycharmProjects/datasets/tap/tapvid_rgb_stacking/tapvid_rgb_stacking.pkl
save_final_checkpoint_as_npy: true
shared_modules:
shared_module_names: &id003 !!python/tuple
- tapnet_model
tapnet_model_kwargs: {}
supervised_point_prediction_kwargs:
prediction_algo: cost_volume_regressor
sweep_name: default_sweep
training:
n_training_steps: 100000
interval_type: secs
log_all_train_data: false
log_tensors_interval: 60
log_train_data_interval: 120.0
logging_interval_type: null
max_checkpoints_to_keep: 5
one_off_evaluate: false
random_mode_eval: same_host_same_device
random_mode_train: unique_host_unique_device
random_seed: 42
save_checkpoint_interval: 10
shared_module_names: *id003
train_checkpoint_all_hosts: false
training_steps: 100000
I1218 16:02:29.355844 140603062212416 xla_bridge.py:353] Unable to initialize backend 'tpu_driver': NOT_FOUND: Unable to find driver in registry given worker:
I1218 16:02:29.422299 140603062212416 xla_bridge.py:353] Unable to initialize backend 'rocm': NOT_FOUND: Could not find registered platform with name: "rocm". Available platform names are: Host Interpreter CUDA
I1218 16:02:29.422796 140603062212416 xla_bridge.py:353] Unable to initialize backend 'tpu': module 'jaxlib.xla_extension' has no attribute 'get_tpu_client'
I1218 16:02:29.422965 140603062212416 xla_bridge.py:353] Unable to initialize backend 'plugin': xla_extension has no attributes named get_plugin_device_client. Compile TensorFlow with //tensorflow/compiler/xla/python:enable_plugin_device set to true (defaults to false) to enable this.
I1218 16:02:29.972896 140603062212416 supervised_point_prediction.py:944] Saving videos to ./tapnet/checkpoint/eval_kubric/100000
2022-12-18 16:02:29.987263: W tensorflow/core/platform/cloud/google_auth_provider.cc:184] All attempts to get a Google authentication bearer token failed, returning an empty token. Retrieving token from files failed with "NOT_FOUND: Could not locate the credentials file.". Retrieving token from GCE failed with "FAILED_PRECONDITION: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host name', error details: Could not resolve host: metadata".
I1218 16:02:32.725086 140603062212416 dataset_info.py:491] Load dataset info from gs://kubric-public/tfds/movi_e/256x256/1.0.0
I1218 16:02:35.881139 140603062212416 dataset_info.py:550] Field info.splits from disk and from code do not match. Keeping the one from code.
I1218 16:02:36.213931 140603062212416 dataset_builder.py:383] Reusing dataset movi_e (gs://kubric-public/tfds/movi_e/256x256/1.0.0)
I1218 16:02:36.214255 140603062212416 logging_logger.py:44] Constructing tf.data.Dataset movi_e for split None, from gs://kubric-public/tfds/movi_e/256x256/1.0.0
W1218 16:02:39.021307 140603062212416 deprecation.py:337] From /home/xingzhenghao/anaconda3/envs/tap/lib/python3.8/site-packages/tensorflow/python/util/dispatch.py:1082: multinomial (from tensorflow.python.ops.random_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.random.categorical` instead.
W1218 16:02:43.468899 140603062212416 deprecation.py:541] From /home/xingzhenghao/anaconda3/envs/tap/lib/python3.8/site-packages/tensorflow/python/util/dispatch.py:1082: calling crop_and_resize_v1 (from tensorflow.python.ops.image_ops_impl) with box_ind is deprecated and will be removed in a future version.
Instructions for updating:
box_ind is deprecated, use box_indices instead
2022-12-18 16:02:46.722060: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:828] shape_optimizer failed: INVALID_ARGUMENT: Subshape must have computed start >= end since stride is negative, but is 1 and 3 (computed from start 1 and end 9223372036854775807 over shape with rank 3 and stride-1)
2022-12-18 16:02:47.417004: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:828] shape_optimizer failed: INVALID_ARGUMENT: Subshape must have computed start >= end since stride is negative, but is 1 and 3 (computed from start 1 and end 9223372036854775807 over shape with rank 3 and stride-1)
Traceback (most recent call last):
File "./tapnet/experiment.py", line 427, in <module>
app.run(main)
File "/home/xingzhenghao/anaconda3/envs/tap/lib/python3.8/site-packages/absl/app.py", line 308, in run
_run_main(main, args)
File "/home/xingzhenghao/anaconda3/envs/tap/lib/python3.8/site-packages/absl/app.py", line 254, in _run_main
sys.exit(main(argv))
File "./tapnet/experiment.py", line 420, in main
platform.main(
File "/home/xingzhenghao/anaconda3/envs/tap/lib/python3.8/site-packages/jaxline/utils.py", line 484, in inner_wrapper
return f(*args, **kwargs)
File "/home/xingzhenghao/anaconda3/envs/tap/lib/python3.8/site-packages/jaxline/platform.py", line 137, in main
train.evaluate(experiment_class, config, checkpointer, writer,
File "/home/xingzhenghao/anaconda3/envs/tap/lib/python3.8/site-packages/jaxline/utils.py", line 620, in inner_wrapper
return fn(*args, **kwargs)
File "/home/xingzhenghao/anaconda3/envs/tap/lib/python3.8/site-packages/jaxline/train.py", line 225, in evaluate
scalar_values = utils.evaluate_should_return_dict(experiment.evaluate)(
File "/home/xingzhenghao/anaconda3/envs/tap/lib/python3.8/site-packages/jaxline/utils.py", line 521, in evaluate_with_warning
evaluate_out = f(*args, **kwargs)
File "./tapnet/experiment.py", line 401, in evaluate
eval_scalars = point_prediction_task.evaluate(
File "/home/xingzhenghao/PycharmProjects/tapnet/supervised_point_prediction.py", line 495, in evaluate
self._eval_epoch(
File "/home/xingzhenghao/PycharmProjects/tapnet/supervised_point_prediction.py", line 968, in _eval_epoch
for inputs in self._build_eval_input(mode):
File "/home/xingzhenghao/PycharmProjects/tapnet/supervised_point_prediction.py", line 805, in _build_eval_input
yield from evaluation_datasets.create_kubric_eval_dataset(mode)
File "/home/xingzhenghao/PycharmProjects/tapnet/evaluation_datasets.py", line 463, in create_kubric_eval_dataset
for data in np_ds:
File "/home/xingzhenghao/anaconda3/envs/tap/lib/python3.8/site-packages/tensorflow_datasets/core/dataset_utils.py", line 65, in _eager_dataset_iterator
for elem in ds:
File "/home/xingzhenghao/anaconda3/envs/tap/lib/python3.8/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 836, in __next__
return self._next_internal()
File "/home/xingzhenghao/anaconda3/envs/tap/lib/python3.8/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 819, in _next_internal
ret = gen_dataset_ops.iterator_get_next(
File "/home/xingzhenghao/anaconda3/envs/tap/lib/python3.8/site-packages/tensorflow/python/ops/gen_dataset_ops.py", line 2923, in iterator_get_next
_ops.raise_from_not_ok_status(e, name)
File "/home/xingzhenghao/anaconda3/envs/tap/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 7186, in raise_from_not_ok_status
raise core._status_to_exception(e) from None # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes at component 0: expected [?,256,24] but got [1,39,24]. [Op:IteratorGetNext]
Thanks a lot for your support in advance!
Frameworks such as Mediapipe or OpenPose are used to extract skeletal keypoints from images.
Unfortunately, the results are inconsistent and somewhat jittery when trying to extract poses from consecutive frames.
I propose a use case supported by tapir: stabilize the extracted keypoints with tapir's tracking. If tapir and mediapipe diverge, fall back to the mediapipe pose and continue tracking from there. This idea, similar to how MP4 files work, treats mediapipe poses as the gold "P-frames" and tapir predictions, as long as they stay consistent, as the "I-frames". When the data stored in the I-frame is no longer consistent, introduce another P-frame. (This can also be done per frame, per keypoint.)
Related issue: qianqianwang68/omnimotion#5
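A rough per-keypoint sketch of the fallback logic described above (the threshold name and value are hypothetical, and nothing here comes from the tapnet or mediapipe APIs):

```python
import numpy as np

DIVERGENCE_PX = 12.0  # hypothetical pixel threshold for "tapir and mediapipe diverge"

def fuse_keypoints(mediapipe_xy, tapir_xy):
  """Per-frame, per-keypoint fusion: keep the TAPIR track while it stays close to
  the MediaPipe detection, and fall back to MediaPipe (re-anchor) when it drifts.

  Both inputs are (num_keypoints, 2) arrays for the current frame; returns the
  fused keypoints and a boolean mask marking the re-anchored points.
  """
  dist = np.linalg.norm(tapir_xy - mediapipe_xy, axis=-1)
  reanchor = dist > DIVERGENCE_PX
  fused = np.where(reanchor[:, None], mediapipe_xy, tapir_xy)
  return fused, reanchor
```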
In the previous iteration of TAP-Net, there was a checkpoint file released that had not only the model state but also the optimizer state as well as the global_step. This was helpful since you could load it directly into an experiment and easily start finetuning. However, I don't believe that there is a similar checkpoint file for TAPIR. In the README there is a checkpoint for the "online" version of the model, and in one of the linked notebooks there is a checkpoint for the offline model, but neither includes training state.
Could you release a checkpoint with the training state included for the TAPIR model?
I am trying to run TAPNET on Windows 11 with Anaconda3-2023.07-2-Windows-x86_64.exe, but I still get a JAX-related error message.
I already installed jaxlib-0.4.11-cp311-cp311-win_amd64.whl,
but it's apparently not enough.
(base) C:\Users\ATC\tapnet>python ./experiment.py --config ./configs/tapir_config.py
Traceback (most recent call last):
File "C:\Users\ATC\tapnet\experiment.py", line 29, in <module>
from jaxline import experiment
File "C:\ProgramData\anaconda3\Lib\site-packages\jaxline\experiment.py", line 30, in <module>
from jaxline import utils
File "C:\ProgramData\anaconda3\Lib\site-packages\jaxline\utils.py", line 335, in <module>
rng: jnp.DeviceArray,
^^^^^^^^^^^^^^^
File "C:\ProgramData\anaconda3\Lib\site-packages\jax\_src\deprecations.py", line 53, in getattr
raise AttributeError(f"module {module!r} has no attribute {name!r}")
AttributeError: module 'jax.numpy' has no attribute 'DeviceArray'
conda list includes this:
jax 0.4.14 pypi_0 pypi
jaxlib 0.4.11 pypi_0 pypi
jaxline 0.0.5 pypi_0 pypi
Can somebody tell me what I am doing wrong?
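In case it helps: jaxline 0.0.5 still annotates with jnp.DeviceArray, which jax 0.4.14 no longer provides, so this looks like a version mismatch rather than something you did wrong. Two unofficial options (assumptions, not a tested fix): pin jax to a release that matches your jaxlib and still ships the alias (e.g. jax==0.4.11), or add a small shim before jaxline is imported:

```python
# Hypothetical shim, placed before `from jaxline import experiment` runs
# (e.g. at the very top of experiment.py).
import jax
import jax.numpy as jnp

if not hasattr(jnp, "DeviceArray"):
  jnp.DeviceArray = jax.Array  # the alias was removed in jax 0.4.14
```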
Hi TapNet authors,
Just to confirm if I understand the evaluation correctly: It is clear that you train and test tap-net on resized images of 256x256. When you run the baseline methods, you also use resized images of 256x256 as input, is that correct?
If so, could you explain why it is preferable to evaluate on reduced resolution when the full-resolution images & annotations are available?
When evaluating the model on kubric, the generation code still randomly samples points from the kubric val videos each time.
How do you make sure the sampled points are similar each time they are sampled?
No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
Where do I set the log level, and how do I fix this issue?
Thank you ^^
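For the first part of the question: TF_CPP_MIN_LOG_LEVEL is an environment variable. One way (a suggestion) is to set it before anything imports TensorFlow or JAX, either in the shell or at the very top of the script:

```python
import os

# Must be set before tensorflow/jax are imported for it to take effect.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "0"

import jax
print(jax.devices())  # the extra log output should now explain why no GPU was found
```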
I am running inference on live video on the CPU and the speed is very low. I think it is because of the CPU, but my problem is how to increase the number of frames processed in the video. Do you have any ideas about it?
In simple words, I want to increase the speed on the CPU.
Thanks in advance
I tried exploring the codebase but couldn't find a simple way to try out the model on a new video to evaluate performance. Did I miss a script? (I am unfamiliar with JAX, coming from the PyTorch ecosystem.)
I tried to install this under Windows, but there is no Windows version of jax. Is it possible to run this under Windows?