Comments (6)

cdoersch commented on July 28, 2024

I suspect the problem is that the first time you run any jit-compiled JAX function (or change the sizes/types of any arguments), it needs to recompile, and JAX compilation is very slow (and somewhat slower on GPU than CPU). Once it's compiled, the GPU version should be much faster.

Maybe try running the same number of points on a video of the same size multiple times in a loop, and see if subsequent iterations are faster? If you think TAPIR's compilation is too slow, I'd suggest submitting a bug to JAX itself to get some experienced people to look at the compilation. I'd happily +1 that bug :-)
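
Below is a minimal way to see the effect described above. It uses a hypothetical `toy_tracker` as a stand-in for the real model, and it is not TAPIR's API. The sketch times each call in a loop and calls `jax.block_until_ready` so that the timings cover the actual computation, not just asynchronous dispatch. The first call should include tracing and compilation. Later calls with the same shapes and dtypes should reuse the cached executable.

```python
import time

import jax
import jax.numpy as jnp


@jax.jit
def toy_tracker(video, queries):
    # Hypothetical stand-in for a tracking model, not TAPIR itself.
    feats = jnp.mean(video, axis=(1, 2))               # (T, C) per-frame features
    return queries[:, None, :] + feats[None, :, :2]    # (N, T, 2)


def time_calls(fn, *args, n_iters=10):
    """Return wall-clock seconds for each of n_iters calls with identical inputs."""
    timings = []
    for _ in range(n_iters):
        start = time.perf_counter()
        out = fn(*args)
        jax.block_until_ready(out)  # JAX dispatch is asynchronous; wait for the result
        timings.append(time.perf_counter() - start)
    return timings


video = jnp.zeros((8, 64, 64, 3))   # (T, H, W, C), same shape on every call
queries = jnp.zeros((16, 2))        # (N, 2), same N on every call
times = time_calls(toy_tracker, video, queries)
print(f"first call: {times[0]:.4f} s, fastest later call: {min(times[1:]):.6f} s")
```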

from tapnet.

Wuziyi616 commented on July 28, 2024

@cdoersch thank you so much for your detailed reply! Yes, JIT makes perfect sense. I ran the same inference code 10 times in a for-loop. The first run is still slow, but the remaining 9 runs are much faster, taking <0.2 s each. Amazing!

Regarding the number of points (let's call it N): I understand the code has to be recompiled every time the input shape changes. In my case, the number of points varies between videos, but the shape of the video is always the same. Looking at the TAP-Net code:

  • The feature extractor tsm_resnet seems to be independent of N, so I guess it doesn't need to be recompiled?
  • The other operations, such as transform and interpolate, depend on N, so I assume they need to be recompiled?
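
Whether a piece of the model gets recompiled depends on the shapes and dtypes passed to the jitted function that contains it. The toy example below counts traces with a Python side effect, which runs only while tracing. It uses two hypothetical functions, `extract_features` and `per_point_head`. The function that never sees N is traced once. The function that takes the queries is retraced for each new N. If the real model is wrapped in a single jitted function that takes both the video and the queries, the whole graph, feature extractor included, would presumably be recompiled when N changes. This sketch does not show how tapnet actually wraps it.

```python
import jax
import jax.numpy as jnp

trace_counts = {"features": 0, "per_point": 0}


@jax.jit
def extract_features(video):
    # Python side effects run only during tracing, so this counts (re)compilations.
    trace_counts["features"] += 1
    return jnp.mean(video, axis=(1, 2))  # (T, C), independent of the number of points


@jax.jit
def per_point_head(features, queries):
    trace_counts["per_point"] += 1
    return queries[:, None, :] + features[None, :, :2]  # (N, T, 2)


video = jnp.ones((8, 32, 32, 3))
for n_points in (4, 4, 7, 7, 12):
    feats = extract_features(video)
    per_point_head(feats, jnp.zeros((n_points, 2)))

print(trace_counts)  # {'features': 1, 'per_point': 3}
```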

I'm thinking of padding the number of points to a fixed number, so that the input shape is always the same. Is that valid? That is, is the tracking of one point independent of the other points, so that I can pad the input with zero points and get the same results for the valid points as without padding?
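
Here is a sketch of the padding idea. It assumes the queries arrive as a NumPy array with one row per point; the 3-column layout is only illustrative. `pad_queries` and `next_bucket` are hypothetical helpers and are not part of tapnet. The returned mask records which rows are real, so outputs for padded rows can be dropped afterwards.

```python
import numpy as np


def pad_queries(queries, max_points):
    """Zero-pad an (N, D) query array to (max_points, D) and return a validity mask."""
    n, d = queries.shape
    if n > max_points:
        raise ValueError(f"{n} queries exceed max_points={max_points}")
    padded = np.zeros((max_points, d), dtype=queries.dtype)
    padded[:n] = queries
    valid = np.zeros(max_points, dtype=bool)
    valid[:n] = True
    return padded, valid


def next_bucket(n, min_size=16):
    """Round n up to a power of two (at least min_size) to limit distinct shapes."""
    return max(min_size, 1 << max(n - 1, 0).bit_length())


q = np.array([[0, 10, 20], [3, 5, 7]], dtype=np.float32)
padded, valid = pad_queries(q, max_points=8)
# After running the model on `padded`, keep only the real points with outputs[valid].
print(padded.shape, valid.sum(), next_bucket(len(q)))  # (8, 3) 2 16
```

Padding to a single `max_points` gives exactly one compiled shape. Rounding N up with `next_bucket` instead trades a few extra compilations for less wasted compute on padded points.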

cdoersch commented on July 28, 2024

Yes, padding with zeros makes sense in this situation. For TAPIR, the results you get for a single query point are independent of the other query points in the batch.
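
Per-query independence is what makes zero-padding safe. The minimal check below uses a hypothetical `track_one`, a toy rather than TAPIR's computation, which computes each query's output from the video and that query alone. The queries are batched with `jax.vmap`. The rows for the real queries come out the same with or without padding. Padded rows still cost compute, so padding far beyond the number of points actually needed wastes time.

```python
import jax
import jax.numpy as jnp


def track_one(video, query):
    # Hypothetical per-point tracker: the output depends only on this query and the video.
    frame_means = jnp.mean(video, axis=(1, 2, 3))       # (T,)
    return query[1:][None, :] + frame_means[:, None]    # (T, 2)


# Map over queries (axis 0); share the same video across all of them.
track_many = jax.jit(jax.vmap(track_one, in_axes=(None, 0)))

video = jax.random.uniform(jax.random.PRNGKey(0), (8, 16, 16, 3))
queries = jnp.array([[0.0, 5.0, 6.0], [2.0, 1.0, 9.0], [4.0, 3.0, 3.0]])
padded = jnp.concatenate([queries, jnp.zeros((5, 3))], axis=0)

unpadded_out = track_many(video, queries)
padded_out = track_many(video, padded)[: queries.shape[0]]  # drop padded rows
print(jnp.allclose(unpadded_out, padded_out))  # True
```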

Note that if the number of points is large, you probably want to take care with the query_chunk_size parameter. Smaller values will use less memory, but at the cost of a larger computational graph and therefore longer compile times. Typically you want to use the largest value that you can without running out of memory.
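
The memory versus compile-time trade-off can be illustrated with a toy model whose largest intermediate grows with the number of queries. This sketches the general chunking idea and makes no claim about how tapnet implements `query_chunk_size`. `per_query_cost` and `track_chunked` are hypothetical. The Python loop is unrolled under `jax.jit`. A smaller `chunk_size` therefore lowers peak memory but adds more copies of the computation to the graph, which lengthens compilation.

```python
import jax
import jax.numpy as jnp


def per_query_cost(video, queries):
    # Hypothetical per-query computation; the intermediate has shape (N, T, H, W).
    sims = jnp.einsum("thwc,nc->nthw", video, queries)
    return jnp.max(sims, axis=(2, 3))  # (N, T)


def track_chunked(video, queries, chunk_size):
    """Process chunk_size queries at a time to bound the size of the (N, T, H, W) term."""
    n = queries.shape[0]
    outputs = []
    for start in range(0, n, chunk_size):  # unrolled at trace time: one graph copy per chunk
        outputs.append(per_query_cost(video, queries[start:start + chunk_size]))
    return jnp.concatenate(outputs, axis=0)


# chunk_size must be static because it controls the Python loop.
track_chunked_jit = jax.jit(track_chunked, static_argnums=2)

key_v, key_q = jax.random.split(jax.random.PRNGKey(0))
video = jax.random.normal(key_v, (4, 8, 8, 3))
queries = jax.random.normal(key_q, (10, 3))
full = per_query_cost(video, queries)
chunked = track_chunked_jit(video, queries, 4)  # chunks of 4, 4 and 2 queries
print(jnp.allclose(full, chunked, atol=1e-5))
```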

Wuziyi616 commented on July 28, 2024

Thanks a lot! Just want to confirm: is the per-point result of TAP-Net also independent of the other query points? (According to the paper, it is.)

Feel free to close the issue when you reply. I really appreciate your help.

cdoersch commented on July 28, 2024

Yes, it's also true for TAP-Net, although I'm curious why you would want to use TAP-Net now that TAPIR is out.

Wuziyi616 commented on July 28, 2024

Thanks! That's simply because, according to Table 9 of the TAPIR paper, TAP-Net runs much faster than TAPIR, and my application requires a speed of 50 to 100 Hz. Also, my videos come from simple RL environments without large motions, severe occlusions, etc. That being said, I'll definitely try TAPIR later to see if it's fast enough :)
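
For a rate target, the per-frame budget is 1000 / rate in milliseconds: 20 ms at 50 Hz and 10 ms at 100 Hz. The hypothetical helpers below turn a measured time per call into a pass/fail check. The 24 frames per call in the example is an assumed clip length, not a figure from this thread. If a model processes a whole clip per call, it can meet the average rate while each individual frame still waits for the entire call to finish.

```python
def frame_budget_ms(rate_hz):
    """Wall-clock milliseconds available per frame at the given processing rate."""
    return 1000.0 / rate_hz


def meets_rate(seconds_per_call, frames_per_call, rate_hz):
    """True if processing frames_per_call frames in seconds_per_call keeps up with rate_hz."""
    return seconds_per_call / frames_per_call <= 1.0 / rate_hz


print(frame_budget_ms(50), frame_budget_ms(100))  # 20.0 10.0
# Hypothetical: 0.2 s per call on an assumed 24-frame clip, checked against 100 Hz.
print(meets_rate(0.2, 24, 100))  # True
```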
