First of all, thank you so much for releasing this great work. I test it on my custom


GPU JAX does not speed up the inference of TAP-Net? about tapnet HOT 6 CLOSED

google-deepmind commented on July 28, 2024

GPU JAX does not speed up the inference of TAP-Net?

from tapnet.

Comments (6)

cdoersch commented on July 28, 2024

I suspect the problem is that the first time you run any jit-compiled JAX function (or change the sizes/types of any arguments), it needs to recompile, and JAX compilation is very slow (and somewhat slower on GPU than CPU). Once it's compiled, the GPU version should be much faster.

Maybe try running the same number of points on a video of the same size multiple times in a loop, and see if subsequent iterations are faster? If you think TAPIR's compilation is too slow, I'd suggest submitting a bug to JAX itself to get some experienced people to look at the compilation. I'd happily +1 that bug :-)
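To make the suggestion above concrete, here is a minimal timing sketch. `toy_model` is a hypothetical stand-in for a tracker forward pass, not the tapnet model, and the shapes are arbitrary; the point is only that the first call pays for tracing and compilation, while later calls with the same shapes reuse the compiled function.

```python
import time

import jax
import jax.numpy as jnp


@jax.jit
def toy_model(video, query_points):
    # Hypothetical stand-in for a tracker forward pass (NOT the tapnet model).
    feats = jnp.tanh(video).mean(axis=(1, 2))  # [T, C]
    proj = jnp.ones((query_points.shape[-1], feats.shape[-1]))
    return query_points @ proj + feats.sum(axis=0)  # [N, C]


def time_calls(fn, *args, n_runs=5):
    """Time repeated calls; the first one includes JIT compilation."""
    timings = []
    for _ in range(n_runs):
        start = time.perf_counter()
        out = fn(*args)
        # JAX dispatches asynchronously, so wait for the result before
        # stopping the clock.
        out.block_until_ready()
        timings.append(time.perf_counter() - start)
    return timings


video = jnp.ones((8, 64, 64, 3))   # [T, H, W, C], same size on every run
query_points = jnp.zeros((16, 3))  # [N, 3], same N on every run
timings = time_calls(toy_model, video, query_points)
print([f"{t:.4f}s" for t in timings])
```

When benchmarking the real model, it is probably fairest to discard the first measurement and average the rest.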


Wuziyi616 commented on July 28, 2024

@cdoersch thank you so much for your detailed reply! Yes, JIT compilation makes perfect sense. I ran the same inference code 10 times in a for-loop. The first run is still slow, but the remaining 9 runs are much faster, taking <0.2s each. Amazing!

Regarding the number of points (let's call it N), indeed, I understand the code has to be recompiled every time the input shape changes. In my case, the number of points varies between videos, but the shape of the video is always the same. I looked at the code of TAP-Net:

The feature extractor tsm_resnet seems to be independent of N, so I guess it doesn't need to be recompiled?
The other operations, such as transform and interpolate, depend on N, so I assume they need to be recompiled?
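One way to check when JAX retraces a function is to record a Python side effect inside it, since Python code in a jitted function only runs during tracing. The sketch below uses a hypothetical `per_point_op`, not tapnet code. Note that `jax.jit` caches per jitted function and input signature, so if the feature extractor lives inside the same jitted function as the N-dependent operations, a change in N retraces that whole function.

```python
import jax
import jax.numpy as jnp

trace_log = []  # filled only while JAX traces the function


@jax.jit
def per_point_op(features, query_points):
    # Python side effect: executes at trace time, not on cached calls.
    trace_log.append(query_points.shape)
    return query_points * features.mean()


features = jnp.ones((8, 32))
per_point_op(features, jnp.zeros((10, 3)))  # first call: trace + compile
per_point_op(features, jnp.ones((10, 3)))   # same shape and dtype: cache hit
per_point_op(features, jnp.zeros((25, 3)))  # new N: traced and compiled again
print(trace_log)
```

The log ends up with one entry per distinct input shape, which is why a varying N triggers a fresh compilation for each new value.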

I'm thinking of padding the number of points to a fixed number, so that the input shape is always the same. Is that valid? I.e., is the tracking of one point independent of the other points, so that I can just pad the input with zero points and get the same results for the valid points as I would without padding?


cdoersch commented on July 28, 2024

Yes, padding with zeros makes sense in this situation. For TAPIR, the results you get for a single query point are independent of the other query points in the batch.
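A minimal sketch of the padding idea, assuming a fixed upper bound on the number of query points. `MAX_POINTS`, `pad_queries`, and `toy_tracker` are hypothetical names for illustration; `toy_tracker` is a stand-in whose output rows depend only on their own query row, which is the property confirmed above for TAPIR.

```python
import jax
import jax.numpy as jnp
import numpy as np

MAX_POINTS = 64  # hypothetical upper bound on query points per video


def pad_queries(query_points, max_points=MAX_POINTS):
    """Zero-pad [N, D] queries to [max_points, D] and return a validity mask."""
    n, d = query_points.shape
    if n > max_points:
        raise ValueError(f"{n} query points exceed max_points={max_points}")
    padded = np.zeros((max_points, d), dtype=query_points.dtype)
    padded[:n] = query_points
    return padded, np.arange(max_points) < n


@jax.jit
def toy_tracker(features, query_points):
    # Stand-in, NOT tapnet: each output row depends only on its own query row.
    return jax.vmap(lambda q: jnp.tanh(q * features.mean()))(query_points)


features = jnp.ones((8, 32))
rng = np.random.default_rng(0)
queries = rng.uniform(size=(10, 3)).astype(np.float32)  # N = 10 real points

padded, valid = pad_queries(queries)
tracks_padded = np.asarray(toy_tracker(features, padded))  # always [64, 3]
tracks_valid = tracks_padded[valid]                        # drop padded rows
tracks_direct = np.asarray(toy_tracker(features, queries))
print(np.allclose(tracks_valid, tracks_direct, atol=1e-6))
```

With this pattern the jitted function always sees the same input shape, so it should compile once and the mask is used afterwards to discard the padded rows.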

Note that if the number of points is large, you probably want to take care with the query_chunk_size parameter. Smaller values will use less memory, but at the cost of a larger computational graph and therefore longer compile times. Typically you want to use the largest value that you can without running out of memory.
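The trade-off can be illustrated with a toy example. This sketch assumes the chunking is a Python loop over slices of the query points, which JAX unrolls at trace time; whether tapnet implements `query_chunk_size` exactly this way is an assumption. `track_chunk`, `track_in_chunks`, and `graph_size` are hypothetical names, and the equation count of the jaxpr is used as a rough proxy for graph size.

```python
import jax
import jax.numpy as jnp


def track_chunk(features, chunk):
    # Hypothetical per-chunk computation: [chunk_size, 3] -> [chunk_size, T].
    return jnp.tanh(chunk @ features.T)


def track_in_chunks(features, query_points, chunk_size):
    # A Python loop is unrolled during tracing, so every chunk adds its own
    # operations to the computational graph.
    outputs = []
    for start in range(0, query_points.shape[0], chunk_size):
        outputs.append(track_chunk(features, query_points[start:start + chunk_size]))
    return jnp.concatenate(outputs, axis=0)


def graph_size(chunk_size, features, query_points):
    """Number of equations in the traced jaxpr for a given chunk size."""
    traced = jax.make_jaxpr(
        lambda f, q: track_in_chunks(f, q, chunk_size)
    )(features, query_points)
    return len(traced.jaxpr.eqns)


features = jnp.ones((8, 3))  # [T, 3]
query_points = jnp.linspace(0.0, 1.0, 64 * 3).reshape(64, 3)

small_chunks = graph_size(8, features, query_points)   # 8 chunks of 8 points
one_chunk = graph_size(64, features, query_points)     # a single chunk
print(small_chunks, one_chunk)
```

The results are the same for either chunk size; only the graph size, and with it the compile time and peak memory, changes.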


Wuziyi616 commented on July 28, 2024

Thanks a lot! Just wanna confirm, is the per-point result of TAP-Net also independent of the other query points? (according to the paper this is true)

Feel free to close the issue when you reply. I really appreciate your help.


cdoersch commented on July 28, 2024

Yes, it's also true for TAP-Net, although I'm curious why you would want to use TAP-Net now that TAPIR is out.


Wuziyi616 commented on July 28, 2024

Thanks, that's simply because according to Table 9 of the TAPIR paper, TAP-Net runs much faster than TAPIR, and my application requires 50~100Hz speed. Also, my videos are from some simple RL environments without big motions, severe occlusions, etc. That being said, I'll definitely try TAPIR later to see if it's fast enough : )
