Comments (15)
A dialogue between two experts!
from faster_rcnn_pytorch.
@longcw Shouldn't the data order used in PyTorch be NCHW?
Will this problem appear when running on only one GPU?
Since I only have one GPU, I cannot reproduce your problem.
Maybe you can offer me some debug information; I am very willing to help you.
I don't have any more information. The only problem is that when I time all the operations, I find that the ROI Pooling time grows a lot. Currently, after changing the -arch parameter from sm_35 to sm_52, it seems OK so far. Do you know what that parameter is for?
Best,
The -arch compiler option specifies the compute capability that is assumed when compiling CUDA C to PTX code.
Read more at: http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#ixzz4aDuTT670
You can find the compute capability of your GPU in https://developer.nvidia.com/cuda-gpus
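In other words, the flag is just `sm_<major><minor>` of the device's compute capability. A minimal sketch of that mapping (the `arch_flag` helper is mine, not part of the repo; on PyTorch, `torch.cuda.get_device_capability()` returns such a `(major, minor)` tuple):

```python
def arch_flag(capability):
    """Map a (major, minor) compute capability to an nvcc -arch flag,
    e.g. (3, 5) -> "sm_35" for a Tesla K40, (5, 2) -> "sm_52" for a GTX 980."""
    major, minor = capability
    return f"sm_{major}{minor}"
```

With PyTorch available, `arch_flag(torch.cuda.get_device_capability(0))` would pick the right value to pass to the build script for the local GPU.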
Actually, it has nothing to do with the -arch parameter. The per-iteration time still doubles after about 50,000 iterations.
Did you implement ROI Pooling layer by yourself?
No. The kernel code for ROI Pooling is copied from CharlesShang and smallcorgi.
I have no idea about your problem, because the code itself is really easy to understand. I only changed the index expression, since PyTorch uses an order of [batch, c, h, w]:
// int bottom_index = (h * width + w) * channels + c;
int bottom_index = (c * height + h) * width + w;
You can see the code here: https://github.com/longcw/faster_rcnn_pytorch/blob/master/faster_rcnn/roi_pooling/src/cuda/roi_pooling_kernel.cu
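The layout difference can be checked in a few lines of plain Python (the helper names below are mine, not from the kernel): both flat-offset expressions enumerate every element of a C*H*W buffer exactly once, just in different memory orders.

```python
def nhwc_index(h, w, c, height, width, channels):
    # Caffe/TensorFlow-style NHWC flat offset (the commented-out line in the kernel)
    return (h * width + w) * channels + c

def nchw_index(c, h, w, height, width, channels):
    # PyTorch-style NCHW flat offset (the line the kernel actually uses)
    return (c * height + h) * width + w

C, H, W = 3, 4, 5
nhwc = {nhwc_index(h, w, c, H, W, C)
        for c in range(C) for h in range(H) for w in range(W)}
nchw = {nchw_index(c, h, w, H, W, C)
        for c in range(C) for h in range(H) for w in range(W)}
# Both layouts cover offsets 0 .. C*H*W-1 exactly once.
assert nhwc == nchw == set(range(C * H * W))
```

So the kernel change is purely an address-computation swap; the pooled values are identical, only the element order in memory differs.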
I saw a commit named 'fix the memory leak for ROI pool module'. If you don't mind, could you give me some detailed information about that?
Best,
Yikang
I misunderstood torch.autograd.Function and used it as a Module when I first implemented the ROI Pooling layer. The memory used by the Function is never released between iterations if the Function is kept as a class member variable.
# faster_rcnn/roi_pooling/modules/roi_pool.py
# self.roi_pool = RoIPoolFunction(...) # wrong
# return self.roi_pool(features, rois)
return RoIPoolFunction(...)(features, rois) # right
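The failure mode can be sketched with a toy stand-in (`FakeFunction` below is purely illustrative, not the real `torch.autograd.Function`): an op object that saves its inputs for the backward pass keeps growing if one instance is reused across iterations, while a fresh instance per call is garbage-collected together with its saved state.

```python
class FakeFunction:
    """Toy stand-in for an old-style autograd Function: it saves its
    input for "backward" every time it is called (illustrative only)."""
    def __init__(self):
        self.saved = []

    def __call__(self, x):
        self.saved.append(x)  # held until the instance itself is collected
        return x * 2

# Wrong: one shared instance kept as a member variable retains every
# iteration's saved inputs, so memory grows with the iteration count.
shared = FakeFunction()
for i in range(3):
    shared(i)
assert len(shared.saved) == 3

# Right: a fresh instance per call; its saved state is freed with it.
for i in range(3):
    fresh = FakeFunction()
    fresh(i)
    assert len(fresh.saved) == 1
```

This is exactly the difference between the commented-out `self.roi_pool = RoIPoolFunction(...)` and the fixed `return RoIPoolFunction(...)(features, rois)` above.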
I trained Faster R-CNN for 100k iterations on a GTX 1080 without any speed decrease.
Maybe your problem is a PyTorch bug with multi-GPU training. Did you try stopping and restoring it after 50k iterations?
Yes, my current strategy is to snapshot the model every 10k iterations. When the speed deteriorates, I restart training from the latest snapshot.
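That snapshot-and-restart workaround can be sketched as follows (the loop body and file naming are assumptions for illustration; a real run would `torch.save` the model and optimizer state instead of pickling a dict):

```python
import os
import pickle
import tempfile

SNAPSHOT_EVERY = 10_000  # snapshot interval mentioned in the thread

def run(state, start_iter, total_iters, snap_dir):
    """Advance a toy training state, writing a snapshot every SNAPSHOT_EVERY iters."""
    for it in range(start_iter + 1, total_iters + 1):
        state["iter"] = it  # stand-in for one optimization step
        if it % SNAPSHOT_EVERY == 0:
            with open(os.path.join(snap_dir, f"snap_{it}.pkl"), "wb") as f:
                pickle.dump(state, f)
    return state

with tempfile.TemporaryDirectory() as d:
    run({"iter": 0}, 0, 50_000, d)
    # Speed degraded around 50k: reload the latest snapshot in a fresh
    # process and continue training from there.
    with open(os.path.join(d, "snap_50000.pkl"), "rb") as f:
        state = pickle.load(f)
    assert state["iter"] == 50_000
```

Restarting the process resets whatever state was accumulating (here, it would also release any leaked GPU memory), at the cost of at most 10k wasted iterations.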
So you are studying at Tsinghua? I was there in the EE Dept.
Yes, I am in the CS Dept.
Thank you very much. Sorry for the late reply due to the ICCV deadline.
Are you going to submit any paper to ICCV?
No. I look forward to reading your paper in ICCV 😀.
Related Issues (20)
- For getting more accuracy in faster rcnn , which parameters i have to tune (tuning parameters)
- IndexError: list index out of range
- It is question about cpu only
- ImportError: torch.utils.ffi is deprecated. Please use cpp extensions instead what pytorch version we need HOT 1
- It is prediction time problem?
- Is this a Fast-RCNN structure rather than Faster-RCNN?? HOT 1
- potential bug in __init__.py
- Building module pycocotools._mask failed: ["CompileError: command 'gcc' failed with exit status 1\n"]
- out of memory if don`t fix VGG16 param HOT 1
- BaiduYun is canceled ! unable to download the trained model
- RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time
- sh make.sh problem in Window10.. HOT 1
- No module named 'blob'
- Modification of the VGG16 network
- AttributeError: 'module' object has no attribute 'roi_pooling_forward_cuda'
- TypeError: dist must be a Distribution instance HOT 5
- No module named 'resource'
- ImportError: libcudart.so.10.0: cannot open shared object file. HOT 2
- ./make.sh gives ModuleNotFoundError: No module named 'torch' HOT 1
- __cudaRegisterFatBinaryEnd HOT 1