
dot's People

Contributors

16lemoing


dot's Issues

Reproducing the training results

Hi, @16lemoing,

Congratulations on your paper acceptance! 🎉

I encountered some problems while reproducing your training results. I followed the instructions in the training section, but the motion loss did not converge when I set world_size = 4, which matches the setting in the paper:
"DOT is trained on frames at resolution 512×512 for 500k steps with the ADAM optimizer [32] and a learning rate of 10⁻⁴ using 4 NVIDIA V100 GPUs."
[image: training loss curves]
Could you please provide some suggestions? Thanks!

The performance of the sparse point tracker

Hi, thanks for this excellent work.
I would like to know whether the CoTracker2 used in DOT has been improved or retrained, because when I replaced it with the official version of CoTracker2, I observed a significant drop in performance.

Inference with different video sizes

Hi, thank you for your work. For some downstream tasks I usually need to convert videos of various sizes to 512x320, but the default input expected by DOT is 856x480 with num_tracks=8192. My questions: if the input video is 512x320, which values need to be changed (e.g., how should I adjust num_tracks for this size), and will the change affect the final performance? In short, is there a good set of parameters for 512x320 videos that maintains the original performance?
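One common heuristic (an assumption on my part, not a recommendation from the authors) is to scale num_tracks with the pixel area, so that the track density per pixel stays roughly the same as in the default 856x480 setting. A minimal sketch:

```python
# Hypothetical helper: scale num_tracks proportionally to pixel area so the
# track density (tracks per pixel) matches the default 856x480 configuration.
# This is a guessed heuristic, not the authors' documented recommendation.
def scale_num_tracks(base_tracks, base_hw, new_hw):
    base_area = base_hw[0] * base_hw[1]
    new_area = new_hw[0] * new_hw[1]
    return max(1, round(base_tracks * new_area / base_area))

# Default: 8192 tracks at 856x480 -> roughly 3267 tracks at 512x320
print(scale_num_tracks(8192, (480, 856), (320, 512)))
```

Whether a proportionally smaller track count preserves accuracy at 512x320 would still need to be checked empirically.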

inference

hi,

If I want to obtain results for different videos, do I need to run training for each video, or can I use the checkpoint you provided for inference? If the latter, how do I run inference? Thank you.

thanks,

_pickle.UnpicklingError: invalid load key, '<'.

Hi, I found something that may need to be revised:
The link has changed; the version below is correct.
!wget -P checkpoints https://huggingface.co/16lemoing/dot/blob/main/cvo_raft_patch_8.pth
!wget -P checkpoints https://huggingface.co/16lemoing/dot/blob/main/movi_f_raft_patch_4_alpha.pth
!wget -P checkpoints https://huggingface.co/16lemoing/dot/blob/main/movi_f_cotracker_patch_4_wind_8.pth
!wget -P datasets https://huggingface.co/16lemoing/dot/blob/main/demo.zip
!unzip datasets/demo.zip -d datasets/
After installing all the packages, I ran the commands below:
!python demo.py --visualization_modes spaghetti_last_static --video_path orange.mp4
!python demo.py --visualization_modes spaghetti_last_static --video_path treadmill.mp4
I found an error here:
Traceback (most recent call last):
  File "/content/dot/demo.py", line 310, in <module>
    main(args)
  File "/content/dot/demo.py", line 269, in main
    model = create_model(args).cuda()
  File "/content/dot/dot/models/__init__.py", line 7, in create_model
    model = DenseOpticalTracker(
  File "/content/dot/dot/models/dense_optical_tracking.py", line 23, in __init__
    self.point_tracker = PointTracker(height, width, tracker_config, tracker_path, estimator_config, estimator_path)
  File "/content/dot/dot/models/point_tracking.py", line 19, in __init__
    self.model.load_state_dict(torch.load(tracker_path, map_location=device))
  File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 1028, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 1246, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '<'.
I checked the tracker_path: movi_f_cotracker_patch_4_wind_8.pth and I guess there is a problem with the model's saved format. I really need your help, thanks.
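For what it's worth, the error message itself hints at the cause: a pickle stream that begins with '<' is usually an HTML page, not a checkpoint. Hugging Face's /blob/ URLs serve the file's web page, while the /resolve/ endpoint serves the raw bytes, so rewriting the wget URLs (a suggested fix, not an official one) may resolve this:

```python
# The '<' in "invalid load key, '<'" suggests the downloaded .pth files are
# actually HTML pages. Hugging Face serves raw file contents from /resolve/,
# whereas /blob/ returns the repository's HTML file viewer.
urls = [
    "https://huggingface.co/16lemoing/dot/blob/main/cvo_raft_patch_8.pth",
    "https://huggingface.co/16lemoing/dot/blob/main/movi_f_raft_patch_4_alpha.pth",
    "https://huggingface.co/16lemoing/dot/blob/main/movi_f_cotracker_patch_4_wind_8.pth",
]
fixed = [u.replace("/blob/", "/resolve/") for u in urls]
for u in fixed:
    print(u)
```

The same substitution would apply to the demo.zip download before re-running wget.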

C/C++ support

Will you release a C/C++ version of your great tracker?

AssertionError: Torch not compiled with CUDA enabled

When I try to run the demo, I get AssertionError: Torch not compiled with CUDA enabled. Should I switch to another torch version? My GPU is a 4070 Ti, so I assume its CUDA version supports a recent enough PyTorch.
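That assertion usually means a CPU-only PyTorch wheel is installed, rather than the GPU itself being unsupported. A quick diagnostic (a generic check, independent of this repo):

```python
# Check whether the installed torch build has CUDA support at all.
# A CPU-only wheel (version string like "2.x.y+cpu") raises
# "Torch not compiled with CUDA enabled" on any .cuda() call.
try:
    import torch
    version = torch.__version__
    cuda_available = torch.cuda.is_available()
except ImportError:
    version = None         # torch not installed in this environment
    cuda_available = None

print(version, cuda_available)
```

If this prints False (or a "+cpu" version tag), reinstalling torch from one of PyTorch's CUDA wheel indexes should fix it; the exact index URL depends on your CUDA version, so check the official PyTorch install selector.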

Question about the batch size

Hi, thanks for sharing this great project. I notice that you assert batch_size == 1. What is the consideration behind this choice, and if I want to increase the batch size, which parts need to be taken care of? Thanks!
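Until the assertion is lifted, one workaround (a sketch, assuming samples are independent of each other) is to keep batch size 1 internally and loop over the batch dimension yourself:

```python
# Hypothetical workaround: run a batch-size-1 model over each sample in turn
# and collect the per-sample outputs. In practice, `model` would be the DOT
# model and `batch` a list of per-sample input dicts; here a toy stand-in
# illustrates the control flow only.
def run_per_sample(model, batch):
    return [model(sample) for sample in batch]

results = run_per_sample(lambda x: x * 2, [1, 2, 3])
print(results)
```

This trades throughput for simplicity; true batching would require auditing every shape assumption the assertion currently guards.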

Input format for `tracks_for_queries` mode in the model

Hi! Thank you for releasing the code and models publicly! I am trying to use the model to perform inference on my own videos. For visualization, I want to focus on a few selected query points.

The tracks_for_queries mode here seems to be what I need. However, I cannot figure out the required format of the query_points. Could you kindly provide some information on this?
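For reference, trackers in this family typically expect query points as a [batch, num_queries, 3] tensor, with each row holding a frame index followed by pixel coordinates; whether DOT uses (t, x, y) or (t, y, x) is an assumption here and should be verified against the code. A shape-only sketch (plain lists, no torch dependency):

```python
# Hypothetical query_points layout: [batch, num_queries, 3], each query being
# (frame_index, x, y) in pixels. Verify the (t, x, y) vs (t, y, x) order
# against the DOT source before using this.
query_points = [[[0.0, 128.0, 96.0],    # query at frame 0, x=128, y=96
                 [5.0, 300.0, 200.0]]]  # query at frame 5, x=300, y=200
shape = (len(query_points), len(query_points[0]), len(query_points[0][0]))
print(shape)  # a [1, 2, 3]-shaped batch of two queries
```

In practice this nested list would be wrapped as a float tensor (e.g. torch.Tensor(query_points).cuda()) and passed in the input dict alongside the video.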

Thanks!

Inference for low gpu and less number of points

Hello,
You have done great work, I really appreciate it!
I have been trying to run the model to track some specific points in videos but could not figure out how to do that exactly. I tried the format

model({"video": video[None], "query_points": torch.Tensor([[[1, 15, 51]]]).cuda()})

but the GPU ran out of memory. Am I doing this right, or is there another way to do it?
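If the out-of-memory error comes from the number of query points rather than the video itself, one option (a sketch, under the assumption that per-chunk results can simply be concatenated) is to process queries in chunks:

```python
# Hypothetical chunked inference: split the query points into smaller groups,
# run the model once per group, and concatenate the resulting tracks.
def chunks(items, size):
    for i in range(0, len(items), size):
        yield items[i:i + size]

queries = list(range(10))  # stand-in for a list of (t, x, y) query points
batches = list(chunks(queries, 4))
print(batches)
```

Each `batch` would then be passed as the query_points of a separate forward call, keeping peak GPU memory bounded by the chunk size.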

Support for online tracking tasks

Hi, thanks for this fantastic work!
I'm running the demo but found it takes a pretty long time (~1 min 30 s) to track a batch of points on the varanus data. Is there any way to speed it up? Or, I'm wondering whether DOT could do online tracking, taking images one by one and generating tracked points for each frame. Thanks!

Failed to save the result

Hi, great work!
But when I try to run the demo on my own PC, this error comes up:

Traceback (most recent call last):
  File "D:\codes\dot\demo.py", line 310, in <module>
    main(args)
  File "D:\codes\dot\demo.py", line 305, in main
    visualizer(data, mode=mode)
  File "C:\Users\SHUO\anaconda3\Lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\codes\dot\demo.py", line 36, in forward
  File "D:\codes\dot\dot\utils\io.py", line 82, in write_video
    write_video_to_file(video, path, channels)
  File "D:\codes\dot\dot\utils\io.py", line 92, in write_video_to_file
  File "C:\Users\SHUO\anaconda3\Lib\site-packages\torchvision\io\video.py", line 134, in write_video
    container.mux(packet)
  File "av\container\output.pyx", line 211, in av.container.output.OutputContainer.mux
  File "av\container\output.pyx", line 217, in av.container.output.OutputContainer.mux_one
  File "av\container\output.pyx", line 172, in av.container.output.OutputContainer.start_encoding
  File "av\error.pyx", line 336, in av.error.err_check
av.error.ValueError: [Errno 22] Invalid argument

demo.py can do the tracking and refinement part (it takes a long time to wait), but it cannot save the final result, which is really disappointing. 😭 Do you know how to solve this problem? I really want to use this to process my own videos! Thank you!
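One plausible cause (an assumption, not a confirmed diagnosis): H.264-in-mp4 encoders typically require even frame dimensions, and PyAV can surface a violation as "[Errno 22] Invalid argument" at mux time. Cropping or resizing frames to even sizes before writing is a cheap thing to try:

```python
# Hypothetical fix: round frame dimensions down to the nearest even number
# before encoding, since many video encoders (e.g. H.264 in mp4) reject odd
# widths or heights.
def even_size(height, width):
    return height - (height % 2), width - (width % 2)

print(even_size(481, 855))  # an odd-sized frame rounds down to (480, 854)
```

If the dimensions are already even, the next suspects would be the output path (special characters, missing directory) or the fps value being passed to the writer.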

Training code for optical flow and point tracking

Thank you for open-sourcing this impressive work.

The publicly available pre-trained weights contain retrained optical flow and point tracking models. Are there any plans to open-source the corresponding training code? It would be great to have a good code framework compatible with training for these tasks!
