
dot's People

Contributors

16lemoing


dot's Issues

Reproducing the training results

Hi, @16lemoing,

Congratulations on your paper acceptance! 🎉

I encountered some problems while reproducing your training results. I followed the instructions in the training section, but the motion loss did not converge when I set world_size = 4, which matches the setting in the paper:
"DOT is trained on frames at resolution 512×512 for 500k steps with the ADAM optimizer [32] and a learning rate of 10⁻⁴ using 4 NVIDIA V100 GPUs."
[image: training loss curves]
Could you please provide some suggestions? Thanks!

The performance of the sparse point tracker

Hi, thanks for this excellent work.
I would like to know whether the CoTracker2 used in DOT has been improved or retrained, because when I replaced it with the official version of CoTracker2, I observed a significant drop in performance.

Inference with different video sizes

Hi, thank you for your work. For some downstream tasks I usually need to convert videos of various sizes to 512x320, but the default input expected by DOT is 856x480 with num_tracks=8192. My questions: if the input video is 512x320, which values need to be changed (e.g., how should I adjust num_tracks for this size), and will the change affect the final performance? In short, is there a good set of parameters for 512x320 videos that maintains the original performance?
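One common heuristic (an assumption on my part, not a recommendation from the authors) is to scale num_tracks with the pixel area, so that the track density per pixel stays roughly the same as in the default 856x480 setting. A minimal sketch:

```python
# Hypothetical helper: scale num_tracks proportionally to pixel area so the
# track density (tracks per pixel) matches the default 856x480 configuration.
# This is a guessed heuristic, not the authors' documented recommendation.
def scale_num_tracks(base_tracks, base_hw, new_hw):
    base_area = base_hw[0] * base_hw[1]
    new_area = new_hw[0] * new_hw[1]
    return max(1, round(base_tracks * new_area / base_area))

# Default: 8192 tracks at 856x480 -> roughly 3267 tracks at 512x320
print(scale_num_tracks(8192, (480, 856), (320, 512)))
```

Whether a proportionally smaller track count preserves accuracy at 512x320 would still need to be checked empirically.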

inference

hi,

If I want to obtain results for different videos, do I need to run training for each video, or can I use the checkpoint you provided for inference? If the latter, how do I run inference? Thank you.

thanks,

_pickle.UnpicklingError: invalid load key, '<'.

Hi, I found something that may need to be revised:
The link has changed; the version below is correct.
!wget -P checkpoints https://huggingface.co/16lemoing/dot/blob/main/cvo_raft_patch_8.pth
!wget -P checkpoints https://huggingface.co/16lemoing/dot/blob/main/movi_f_raft_patch_4_alpha.pth
!wget -P checkpoints https://huggingface.co/16lemoing/dot/blob/main/movi_f_cotracker_patch_4_wind_8.pth
!wget -P datasets https://huggingface.co/16lemoing/dot/blob/main/demo.zip
!unzip datasets/demo.zip -d datasets/
After installing all the packages, I ran the commands below:
!python demo.py --visualization_modes spaghetti_last_static --video_path orange.mp4
!python demo.py --visualization_modes spaghetti_last_static --video_path treadmill.mp4
I found an error here:
Traceback (most recent call last):
  File "/content/dot/demo.py", line 310, in <module>
    main(args)
  File "/content/dot/demo.py", line 269, in main
    model = create_model(args).cuda()
  File "/content/dot/dot/models/__init__.py", line 7, in create_model
    model = DenseOpticalTracker(
  File "/content/dot/dot/models/dense_optical_tracking.py", line 23, in __init__
    self.point_tracker = PointTracker(height, width, tracker_config, tracker_path, estimator_config, estimator_path)
  File "/content/dot/dot/models/point_tracking.py", line 19, in __init__
    self.model.load_state_dict(torch.load(tracker_path, map_location=device))
  File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 1028, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 1246, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '<'.
I checked the tracker_path: movi_f_cotracker_patch_4_wind_8.pth and I guess there is a problem with the model's saved format. I really need your help, thanks.
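For what it's worth, the error message itself hints at the cause: a pickle stream that begins with '<' is usually an HTML page, not a checkpoint. Hugging Face's /blob/ URLs serve the file's web page, while the /resolve/ endpoint serves the raw bytes, so rewriting the wget URLs (a suggested fix, not an official one) may resolve this:

```python
# The '<' in "invalid load key, '<'" suggests the downloaded .pth files are
# actually HTML pages. Hugging Face serves raw file contents from /resolve/,
# whereas /blob/ returns the repository's HTML file viewer.
urls = [
    "https://huggingface.co/16lemoing/dot/blob/main/cvo_raft_patch_8.pth",
    "https://huggingface.co/16lemoing/dot/blob/main/movi_f_raft_patch_4_alpha.pth",
    "https://huggingface.co/16lemoing/dot/blob/main/movi_f_cotracker_patch_4_wind_8.pth",
]
fixed = [u.replace("/blob/", "/resolve/") for u in urls]
for u in fixed:
    print(u)
```

The same substitution would apply to the demo.zip download before re-running wget.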

C/C++ support

Will you release a C/C++ version of your great tracker?

AssertionError: Torch not compiled with CUDA enabled

When I try to run the demo, I get AssertionError: Torch not compiled with CUDA enabled. Should I switch to another torch version? My GPU is a 4070 Ti, so I assume its CUDA version supports a recent enough PyTorch.
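That assertion usually means a CPU-only PyTorch wheel is installed, rather than the GPU itself being unsupported. A quick diagnostic (a generic check, independent of this repo):

```python
# Check whether the installed torch build has CUDA support at all.
# A CPU-only wheel (version string like "2.x.y+cpu") raises
# "Torch not compiled with CUDA enabled" on any .cuda() call.
try:
    import torch
    version = torch.__version__
    cuda_available = torch.cuda.is_available()
except ImportError:
    version = None         # torch not installed in this environment
    cuda_available = None

print(version, cuda_available)
```

If this prints False (or a "+cpu" version tag), reinstalling torch from one of PyTorch's CUDA wheel indexes should fix it; the exact index URL depends on your CUDA version, so check the official PyTorch install selector.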

Question about the batch size

Hi, thanks for sharing this great project. I notice that you assert batch_size == 1. What is the consideration behind this choice, and if I want to increase the batch size, which parts need to be taken care of? Thanks!
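Until the assertion is lifted, one workaround (a sketch, assuming samples are independent of each other) is to keep batch size 1 internally and loop over the batch dimension yourself:

```python
# Hypothetical workaround: run a batch-size-1 model over each sample in turn
# and collect the per-sample outputs. In practice, `model` would be the DOT
# model and `batch` a list of per-sample input dicts; here a toy stand-in
# illustrates the control flow only.
def run_per_sample(model, batch):
    return [model(sample) for sample in batch]

results = run_per_sample(lambda x: x * 2, [1, 2, 3])
print(results)
```

This trades throughput for simplicity; true batching would require auditing every shape assumption the assertion currently guards.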

Input format for `tracks_for_queries` mode in the model

Hi! Thank you for releasing the code and models publicly! I am trying to use the model to perform inference on my own videos. For visualization, I want to focus on a few selected query points.

The tracks_for_queries mode here seems to be what I need. However, I cannot figure out the required format of the query_points. Could you kindly provide some information on this?
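For reference, trackers in this family typically expect query points as a [batch, num_queries, 3] tensor, with each row holding a frame index followed by pixel coordinates; whether DOT uses (t, x, y) or (t, y, x) is an assumption here and should be verified against the code. A shape-only sketch (plain lists, no torch dependency):

```python
# Hypothetical query_points layout: [batch, num_queries, 3], each query being
# (frame_index, x, y) in pixels. Verify the (t, x, y) vs (t, y, x) order
# against the DOT source before using this.
query_points = [[[0.0, 128.0, 96.0],    # query at frame 0, x=128, y=96
                 [5.0, 300.0, 200.0]]]  # query at frame 5, x=300, y=200
shape = (len(query_points), len(query_points[0]), len(query_points[0][0]))
print(shape)  # a [1, 2, 3]-shaped batch of two queries
```

In practice this nested list would be wrapped as a float tensor (e.g. torch.Tensor(query_points).cuda()) and passed in the input dict alongside the video.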

Thanks!

Inference for low gpu and less number of points

Hello,
You have done great work, I really appreciate it!
I have been trying to run the model to track some specific points in videos but could not figure out how to do that exactly. I tried the format

model({"video": video[None], "query_points": torch.Tensor([[[1, 15, 51]]]).cuda()})

but the GPU ran out of memory. Am I doing this right, or is there another way to do it?
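If the out-of-memory error comes from the number of query points rather than the video itself, one option (a sketch, under the assumption that per-chunk results can simply be concatenated) is to process queries in chunks:

```python
# Hypothetical chunked inference: split the query points into smaller groups,
# run the model once per group, and concatenate the resulting tracks.
def chunks(items, size):
    for i in range(0, len(items), size):
        yield items[i:i + size]

queries = list(range(10))  # stand-in for a list of (t, x, y) query points
batches = list(chunks(queries, 4))
print(batches)
```

Each `batch` would then be passed as the query_points of a separate forward call, keeping peak GPU memory bounded by the chunk size.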

Support for online tracking tasks

Hi, thanks for this fantastic work!
I'm running the demo but found it takes a pretty long time (~1 min 30 s) to track a batch of points on the varanus data. Is there any way to speed it up? Or, I'm wondering whether DOT could do online tracking, taking images one by one and generating tracked points for each frame. Thanks!

Failed to save the result

Hi, great work!
But when I try to run the demo on my own PC, this error comes up:

Traceback (most recent call last):
  File "D:\codes\dot\demo.py", line 310, in <module>
    main(args)
  File "D:\codes\dot\demo.py", line 305, in main
    visualizer(data, mode=mode)
  File "C:\Users\SHUO\anaconda3\Lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\codes\dot\demo.py", line 36, in forward
  File "D:\codes\dot\dot\utils\io.py", line 82, in write_video
    write_video_to_file(video, path, channels)
  File "D:\codes\dot\dot\utils\io.py", line 92, in write_video_to_file
  File "C:\Users\SHUO\anaconda3\Lib\site-packages\torchvision\io\video.py", line 134, in write_video
    container.mux(packet)
  File "av\container\output.pyx", line 211, in av.container.output.OutputContainer.mux
  File "av\container\output.pyx", line 217, in av.container.output.OutputContainer.mux_one
  File "av\container\output.pyx", line 172, in av.container.output.OutputContainer.start_encoding
  File "av\error.pyx", line 336, in av.error.err_check
av.error.ValueError: [Errno 22] Invalid argument

demo.py can do the tracking and refinement part (it takes a long time to wait), but it cannot save the final result, which is really disappointing. 😭 Do you know how to solve this problem? I really want to use this to process my own videos! Thank you!
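One plausible cause (an assumption, not a confirmed diagnosis): H.264-in-mp4 encoders typically require even frame dimensions, and PyAV can surface a violation as "[Errno 22] Invalid argument" at mux time. Cropping or resizing frames to even sizes before writing is a cheap thing to try:

```python
# Hypothetical fix: round frame dimensions down to the nearest even number
# before encoding, since many video encoders (e.g. H.264 in mp4) reject odd
# widths or heights.
def even_size(height, width):
    return height - (height % 2), width - (width % 2)

print(even_size(481, 855))  # an odd-sized frame rounds down to (480, 854)
```

If the dimensions are already even, the next suspects would be the output path (special characters, missing directory) or the fps value being passed to the writer.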

Training code for optical flow and point tracking

Thank you for open-sourcing this impressive work.

The publicly available pre-trained weights contain retrained optical flow and point tracking models. Are there any plans to open-source the corresponding training code? It would be great to have a good code framework compatible with training for these tasks!
