
Too long test time · pcl.pytorch · 30 comments · CLOSED

ppengtang commented on August 23, 2024

Too long test time

Comments (30)

ppengtang commented on August 23, 2024

Hi, I haven't encountered this problem before. I have asked my friends to help me test on their machines and the test time looks normal (2-3 hours to test on VOC 2007 test). Could you check whether you are running multiple experiments on the same GPU at the same time?

U201714643 commented on August 23, 2024

Hi, I haven't encountered this problem before. I have asked my friends to help me test on their machines and the test time looks normal (2-3 hours to test on VOC 2007 test). Could you check whether you are running multiple experiments on the same GPU at the same time?

Thank you for your code.
But I also encountered this problem.
My GPU is a 3080, and the test time is also very long (~36 hours), while the training time is normal (~6 hours).
My PyTorch version is 1.7.0 with CUDA 11.1 (NVIDIA driver version 455.38).
And I used the default settings (although I installed mmcv for CUDA 11).
Is this problem related to the CUDA or driver version?
Thanks a lot.

ppengtang commented on August 23, 2024

Hi @MRRRKING @U201714643 , we have tested on V100 and 2080Ti GPUs. The testing speed looks reasonable (less than 3 hours on VOC 2007 test). Could you try to follow install.sh to set up environments? Could you also check the GPU utilization rate by nvidia-smi?

MRRRKING commented on August 23, 2024

Hi @MRRRKING @U201714643 , we have tested on V100 and 2080Ti GPUs. The testing speed looks reasonable (less than 3 hours on VOC 2007 test). Could you try to follow install.sh to set up environments? Could you also check the GPU utilization rate by nvidia-smi?

Yes, I followed the install.sh file, and the GPU utilization is 100%.
I tried setting the parameter TEST.BBOX_AUG.ENABLED to False, and the testing time is normal (~2 hours).
I suspect the problem lies in the im_detect_bbox_aug function.

def im_detect_bbox_aug(model, im, box_proposals=None):

Do you have any direction to solve the problem?

ppengtang commented on August 23, 2024

Hi @MRRRKING @U201714643 , we have tested on V100 and 2080Ti GPUs. The testing speed looks reasonable (less than 3 hours on VOC 2007 test). Could you try to follow install.sh to set up environments? Could you also check the GPU utilization rate by nvidia-smi?

Yes, I followed the install.sh file, and the GPU utilization is 100%.
I tried setting the parameter TEST.BBOX_AUG.ENABLED to False, and the testing time is normal (~2 hours).
I suspect the problem lies in the im_detect_bbox_aug function.

def im_detect_bbox_aug(model, im, box_proposals=None):

Do you have any direction to solve the problem?

That's weird. Sorry I don't have 1080Ti GPUs and thus cannot reproduce the issue. Could you try to record the time cost of each part in im_detect_bbox_aug and the time cost of each line in these codes?

In addition, if TEST.BBOX_AUG.ENABLED is set to False, the test time will be reduced by about 10x, so the reasonable test time should be less than half an hour.

U201714643 commented on August 23, 2024

Hi @MRRRKING @U201714643 , we have tested on V100 and 2080Ti GPUs. The testing speed looks reasonable (less than 3 hours on VOC 2007 test). Could you try to follow install.sh to set up environments? Could you also check the GPU utilization rate by nvidia-smi?

Yes, I followed the install.sh file, and the GPU utilization is 100%.
I tried setting the parameter TEST.BBOX_AUG.ENABLED to False, and the testing time is normal (~2 hours).
I suspect the problem lies in the im_detect_bbox_aug function.

def im_detect_bbox_aug(model, im, box_proposals=None):

Do you have any direction to solve the problem?

That's weird. Sorry I don't have 1080Ti GPUs and thus cannot reproduce the issue. Could you try to record the time cost of each part in im_detect_bbox_aug and the time cost of each line in these codes?

In addition, if TEST.BBOX_AUG.ENABLED is set to False, the test time will be reduced by about 10x, so the reasonable test time should be less than half an hour.

Hi.
I think this line might lead to the long test time:

blob_conv = self.Conv_Body(im_data).contiguous()

This is my way to measure its run time:

    torch.cuda.synchronize()
    start = time.time()
    ############################
    blob_conv = self.Conv_Body(im_data).contiguous()
    ############################
    torch.cuda.synchronize()
    end = time.time()
    print('blob_conv = self.Conv_Body(im_data).contiguous():',end-start,'s')

And this is the result:

    blob_conv = self.Conv_Body(im_data).contiguous(): 0.16710186004638672 s
    blob_conv = self.Conv_Body(im_data).contiguous(): 0.22522902488708496 s
    blob_conv = self.Conv_Body(im_data).contiguous(): 0.21841096878051758 s
    blob_conv = self.Conv_Body(im_data).contiguous(): 0.9169421195983887 s
    blob_conv = self.Conv_Body(im_data).contiguous(): 0.9236195087432861 s
    blob_conv = self.Conv_Body(im_data).contiguous(): 2.9725072383880615 s
    blob_conv = self.Conv_Body(im_data).contiguous(): 2.966435432434082 s
    blob_conv = self.Conv_Body(im_data).contiguous(): 8.325863361358643 s
    blob_conv = self.Conv_Body(im_data).contiguous(): 8.330979108810425 s
    blob_conv = self.Conv_Body(im_data).contiguous(): 0.15090179443359375 s
    INFO test_engine.py: 270: im_detect: range [1, 4952] of 4952: 1/4952 25.556s (eta: 1 day, 11:08:47)

Do you have any direction to solve the problem?

ppengtang commented on August 23, 2024

Could you try to add torch.cuda.empty_cache() after this line of codes? I'm not sure about the exact reason. I guess one possible reason is the GPU cache has not been released after testing each image.
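
For illustration, a minimal self-contained sketch of the suggested pattern (a stand-in model and loop, not the repository's actual test code):

    import torch
    import torch.nn as nn

    # Stand-in backbone; the real code would run the PCL model on each test image.
    net = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(inplace=True)).cuda().eval()

    with torch.no_grad():
        for _ in range(3):                       # stand-in for the per-image test loop
            im = torch.randn(1, 3, 688, 688, device='cuda')
            _ = net(im)
            torch.cuda.empty_cache()             # the suggested addition after each image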

ppengtang commented on August 23, 2024

Btw, could you also make sure to add CUDA_VISIBLE_DEVICES=0 at the beginning of the test command and not to use --multi-gpu-testing? There are some bugs in multi-gpu testing.

MRRRKING commented on August 23, 2024

I measured the running time in this way:

    print(target_scale)
    print('*************************************')
    time4 = time()
    return_dict = model(**inputs)
    time5 = time()
    print('time5: ', time5 - time4)
    # cls prob (activations after softmax)
    scores = return_dict['refine_score'][0].data.cpu().numpy().squeeze()

    for i in range(1, cfg.REFINE_TIMES):
        scores += return_dict['refine_score'][i].data.cpu().numpy().squeeze()

    scores /= cfg.REFINE_TIMES
    # In case there is 1 proposal
    scores = scores.reshape([-1, scores.shape[-1]])

    time6 = time()
    print('time6: ', time6 - time5)

And the result is below:

    480
    time5: 1.396902084350586
    time6: 0.05057382583618164
    576
    time5: 0.0050432682037353516
    time6: 2.703824758529663
    688
    time5: 0.005361795425415039
    time6: 4.799558162689209
    864
    time5: 0.005181312561035156
    time6: 11.79275107383728
    1200
    time5: 0.0075037479400634766
    time6: 27.75927186012268

From these results, the time is spent not on model prediction but on the data conversion:

scores = return_dict['refine_score'][0].data.cpu().numpy().squeeze()

I tested the old code, which runs in the PyTorch 0.4.1 environment, and its testing time is normal.
Is this related to the PyTorch version?

U201714643 commented on August 23, 2024

Could you try to add torch.cuda.empty_cache() after this line of codes? I'm not sure about the exact reason. I guess one possible reason is the GPU cache has not been released after testing each image.

Hi,
Thanks for your advice, but it does not seem to help.
I am sure that I added CUDA_VISIBLE_DEVICES=0 at the beginning of the test command and did not use --multi-gpu-testing.
Moreover, I think this problem might be related to vgg16, because when i = 5, line 113 takes too much time to run.

    def forward(self, x):
        for i in range(1, 6):
            x = getattr(self, 'conv%d' % i)(x)
        return x

For example:

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
x = getattr(self, 'conv%d' %  1 )(): 0.014625310897827148 s
Sequential(
  (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): ReLU(inplace=True)
  (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (3): ReLU(inplace=True)
  (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
x = getattr(self, 'conv%d' %  2 )(): 0.011127233505249023 s
Sequential(
  (0): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): ReLU(inplace=True)
  (2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (3): ReLU(inplace=True)
  (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
x = getattr(self, 'conv%d' %  3 )(): 0.015486001968383789 s
Sequential(
  (0): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): ReLU(inplace=True)
  (2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (3): ReLU(inplace=True)
  (4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (5): ReLU(inplace=True)
  (6): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
x = getattr(self, 'conv%d' %  4 )(): 0.014803886413574219 s
Sequential(
  (0): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): ReLU(inplace=True)
  (2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (3): ReLU(inplace=True)
  (4): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (5): ReLU(inplace=True)
)
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
x = getattr(self, 'conv%d' %  5 )(): 8.2745041847229 s
Sequential(
  (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2))
  (1): ReLU(inplace=True)
  (2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2))
  (3): ReLU(inplace=True)
  (4): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2))
  (5): ReLU(inplace=True)
)
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

And this is my way to measure its run time:

    print('-------------------------------------------------------------------')
    torch.cuda.synchronize()
    start = time.time()
    ############################
    x = getattr(self, 'conv%d' % i)(x)
    ############################
    torch.cuda.synchronize()
    end = time.time()
    print('x = getattr(self, \'conv%d\' % ',i,')():',end-start,'s')
    print(getattr(self, 'conv%d' % i))
    print('-------------------------------------------------------------------')

Besides, during testing, my GPU Utilization is about 95%, and VRAM usage is about 4500MB.
Do you have any direction to solve the problem?

ppengtang commented on August 23, 2024

I measured the running time in this way:

    print(target_scale)
    print('*************************************')
    time4 = time()
    return_dict = model(**inputs)
    time5 = time()
    print('time5: ', time5 - time4)
    # cls prob (activations after softmax)
    scores = return_dict['refine_score'][0].data.cpu().numpy().squeeze()

    for i in range(1, cfg.REFINE_TIMES):
        scores += return_dict['refine_score'][i].data.cpu().numpy().squeeze()

    scores /= cfg.REFINE_TIMES
    # In case there is 1 proposal
    scores = scores.reshape([-1, scores.shape[-1]])

    time6 = time()
    print('time6: ', time6 - time5)

And the result is below:

    480
    time5: 1.396902084350586
    time6: 0.05057382583618164
    576
    time5: 0.0050432682037353516
    time6: 2.703824758529663
    688
    time5: 0.005361795425415039
    time6: 4.799558162689209
    864
    time5: 0.005181312561035156
    time6: 11.79275107383728
    1200
    time5: 0.0075037479400634766
    time6: 27.75927186012268

From these results, the time is spent not on model prediction but on the data conversion:

scores = return_dict['refine_score'][0].data.cpu().numpy().squeeze()

I tested the old code, which runs in the PyTorch 0.4.1 environment, and its testing time is normal.
Is this related to the PyTorch version?

Could you try to change these lines to the following code?

    scores = return_dict['refine_score'][0].squeeze()
    for i in range(1, cfg.REFINE_TIMES):
        scores += return_dict['refine_score'][i].squeeze()
    scores /= cfg.REFINE_TIMES
    # In case there is 1 proposal
    scores = scores.view(-1, scores.size(-1)).data.cpu().numpy()

I don't think the issue comes from the PyTorch version. On my GPUs, I get correct results using PyTorch 1.6.0.

ppengtang commented on August 23, 2024

Could you try to add torch.cuda.empty_cache() after this line of codes? I'm not sure about the exact reason. I guess one possible reason is the GPU cache has not been released after testing each image.

Hi,
Thanks for your advice, but it does not seem to help.
I am sure that I added CUDA_VISIBLE_DEVICES=0 at the beginning of the test command and did not use --multi-gpu-testing.
Moreover, I think this problem might be related to vgg16, because when i = 5, line 113 takes too much time to run.

    def forward(self, x):
        for i in range(1, 6):
            x = getattr(self, 'conv%d' % i)(x)
        return x

For example:

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
x = getattr(self, 'conv%d' %  1 )(): 0.014625310897827148 s
Sequential(
  (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): ReLU(inplace=True)
  (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (3): ReLU(inplace=True)
  (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
x = getattr(self, 'conv%d' %  2 )(): 0.011127233505249023 s
Sequential(
  (0): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): ReLU(inplace=True)
  (2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (3): ReLU(inplace=True)
  (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
x = getattr(self, 'conv%d' %  3 )(): 0.015486001968383789 s
Sequential(
  (0): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): ReLU(inplace=True)
  (2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (3): ReLU(inplace=True)
  (4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (5): ReLU(inplace=True)
  (6): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
x = getattr(self, 'conv%d' %  4 )(): 0.014803886413574219 s
Sequential(
  (0): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): ReLU(inplace=True)
  (2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (3): ReLU(inplace=True)
  (4): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (5): ReLU(inplace=True)
)
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
x = getattr(self, 'conv%d' %  5 )(): 8.2745041847229 s
Sequential(
  (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2))
  (1): ReLU(inplace=True)
  (2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2))
  (3): ReLU(inplace=True)
  (4): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2))
  (5): ReLU(inplace=True)
)
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

And this is my way to measure its run time:

    print('-------------------------------------------------------------------')
    torch.cuda.synchronize()
    start = time.time()
    ############################
    x = getattr(self, 'conv%d' % i)(x)
    ############################
    torch.cuda.synchronize()
    end = time.time()
    print('x = getattr(self, \'conv%d\' % ',i,')():',end-start,'s')
    print(getattr(self, 'conv%d' % i))
    print('-------------------------------------------------------------------')

Besides, during testing, my GPU Utilization is about 95%, and VRAM usage is about 4500MB.
Do you have any direction to solve the problem?

That's weird...
Could you try to change padding and dilation to (1, 1) for conv5 to see what will happen?
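
For reference, a rough sketch of that change for the conv5 block printed above (an illustration with the layer shapes assumed from that printout, not the repository's exact builder code):

    import torch.nn as nn

    # The default conv5 uses padding=2, dilation=2; the experiment is to set both to 1.
    # For 3x3 kernels, padding == dilation keeps the spatial size unchanged, but
    # dilation 1 shrinks the receptive field compared with the dilated version.
    def make_conv5(dilation=1):
        padding = dilation
        layers = []
        for _ in range(3):
            layers += [nn.Conv2d(512, 512, 3, stride=1, padding=padding, dilation=dilation),
                       nn.ReLU(inplace=True)]
        return nn.Sequential(*layers)

    conv5 = make_conv5(dilation=1)   # the setting suggested above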

U201714643 commented on August 23, 2024

That's weird...
Could you try to change padding and dilation to (1, 1) for conv5 to see what will happen?

Thanks for your advice.
It works.
Now mAP is 51.3%, and CorLoc is 67.0%. (The model has been re-trained.)
In addition, my training time is about 6 hours, and the testing time is about 1 hour.
Besides, my PyTorch version is 1.7.1, because the RTX 3080 doesn't support CUDA 10.2, which is required by PyTorch 1.6.0.

ppengtang commented on August 23, 2024

That's weird...
Could you try to change padding and dilation to (1, 1) for conv5 to see what will happen?

Thanks for your advice.
It works.
Now mAP is 51.3%, and CorLoc is 67.0%. (The model has been re-trained.)
In addition, my training time is about 6 hours, and the testing time is about 1 hour.
Besides, my PyTorch version is 1.7.1, because the RTX 3080 doesn't support CUDA 10.2, which is required by PyTorch 1.6.0.

It seems to be the dilated convolution issue. Your model is trained with dilation 2 but tested with dilation 1, which results in inconsistency between training and testing and thus harms performance. Could you also try to train with dilation 1 to see the numbers?

U201714643 commented on August 23, 2024

That's weird...
Could you try to change padding and dilation to (1, 1) for conv5 to see what will happen?

Thanks for your advice.
It works.
Now mAP is 51.3%, and CorLoc is 67.0%. (The model has been re-trained.)
In addition, my training time is about 6 hours, and the testing time is about 1 hour.
Besides, my PyTorch version is 1.7.1, because the RTX 3080 doesn't support CUDA 10.2, which is required by PyTorch 1.6.0.

It seems to be the dilated convolution issue. Your model is trained with dilation 2 but tested with dilation 1, which results in inconsistency between training and testing and thus harms performance. Could you also try to train with dilation 1 to see the numbers?

Thanks for your advice.
But I have already re-trained the model with dilation 1.

ppengtang commented on August 23, 2024

That's weird...
Could you try to change padding and dilation to (1, 1) for conv5 to see what will happen?

Thanks for your advice.
It works.
Now mAP is 51.3%, and CorLoc is 67.0%. (The model has been re-trained.)
In addition, my training time is about 6 hours, and the testing time is about 1 hour.
Besides, my PyTorch version is 1.7.1, because the RTX 3080 doesn't support CUDA 10.2, which is required by PyTorch 1.6.0.

It seems to be the dilated convolution issue. Your model is trained with dilation 2 but tested with dilation 1, which results in inconsistency between training and testing and thus harms performance. Could you also try to train with dilation 1 to see the numbers?

Thanks for your advice.
But I have already re-trained the model with dilation 1.

I see. Maybe dilation 1 is the reason for the performance drop.

ppengtang commented on August 23, 2024

That's weird...
Could you try to change padding and dilation to (1, 1) for conv5 to see what will happen?

Thanks for your advice.
It works.
Now mAP is 51.3%, and CorLoc is 67.0%. (The model has been re-trained.)
In addition, my training time is about 6 hours, and the testing time is about 1 hour.
Besides, my PyTorch version is 1.7.1, because the RTX 3080 doesn't support CUDA 10.2, which is required by PyTorch 1.6.0.

It seems to be the dilated convolution issue. Your model is trained with dilation 2 but tested with dilation 1, which results in inconsistency between training and testing and thus harms performance. Could you also try to train with dilation 1 to see the numbers?

Thanks for your advice.
But I have already re-trained the model with dilation 1.

Btw, could you try to add

    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

after this line of codes for dilation=2?

U201714643 commented on August 23, 2024

Btw, could you try to add

    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

after this line of codes for dilation=2?

Thanks for your advice.
But the testing time is still about 36 hours with dilation 2.

MRRRKING commented on August 23, 2024

I measured the running time in this way:

    print(target_scale)
    print('*************************************')
    time4 = time()
    return_dict = model(**inputs)
    time5 = time()
    print('time5: ', time5 - time4)
    # cls prob (activations after softmax)
    scores = return_dict['refine_score'][0].data.cpu().numpy().squeeze()

    for i in range(1, cfg.REFINE_TIMES):
        scores += return_dict['refine_score'][i].data.cpu().numpy().squeeze()

    scores /= cfg.REFINE_TIMES
    # In case there is 1 proposal
    scores = scores.reshape([-1, scores.shape[-1]])

    time6 = time()
    print('time6: ', time6 - time5)

And the result is below:

    480
    time5: 1.396902084350586
    time6: 0.05057382583618164
    576
    time5: 0.0050432682037353516
    time6: 2.703824758529663
    688
    time5: 0.005361795425415039
    time6: 4.799558162689209
    864
    time5: 0.005181312561035156
    time6: 11.79275107383728
    1200
    time5: 0.0075037479400634766
    time6: 27.75927186012268

From these results, the time is spent not on model prediction but on the data conversion:

scores = return_dict['refine_score'][0].data.cpu().numpy().squeeze()

I tested the old code, which runs in the PyTorch 0.4.1 environment, and its testing time is normal.
Is this related to the PyTorch version?

Could you try to change these lines to the following code?

    scores = return_dict['refine_score'][0].squeeze()
    for i in range(1, cfg.REFINE_TIMES):
        scores += return_dict['refine_score'][i].squeeze()
    scores /= cfg.REFINE_TIMES
    # In case there is 1 proposal
    scores = scores.view(-1, scores.size(-1)).data.cpu().numpy()

I don't think the issue comes from the PyTorch version. On my GPUs, I get correct results using PyTorch 1.6.0.

I replaced this code, but it didn't work.
The testing time is normal with dilation 1, but the mAP is lower too.

U201714643 commented on August 23, 2024

I replaced this code, but it didn't work.
The testing time is normal with dilation 1, but the mAP is lower too.

Are you using PyTorch 1.7.0 or 1.7.1?

Glutton-zh commented on August 23, 2024

Hello, I met the same problem in the test.
I use a GTX 1080 Ti with PyTorch 1.6.0, but I made the following changes in install.sh:
① pip --no-cache-dir install mmcv-full==latest+torch1.6.0+cu101 -f https://openmmlab.oss-accelerate.aliyuncs.com/mmcv/dist/index.html
changed to
pip install mmcv-full -f https://download.openmmlab.oss.com/mmcv/dist/cu101/torch1.6.0/index.html (according to the official instructions),
because the original command always reported errors.
② pip --no-cache-dir install numpy==1.16.0
changed to pip --no-cache-dir install numpy==1.19.5,
because when trying to solve the mmcv problem, the environment always downloaded 1.16.0 first, then automatically deleted it and used 1.19.5.
After that, training for 13 hours is normal, but the test showed an estimated time of 6 days.
I don't know if these changes will lead to problems in the test.

ppengtang commented on August 23, 2024

Could you also try to add torch.backends.cudnn.enabled = False after this line of codes for dilation=2? Other people observed a similar issue of low speed of dilated convolution on some GPU cards: https://discuss.pytorch.org/t/speed-drop-with-dilated-conv-on-different-gpus/82412
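
A minimal sketch of where this switch could go, assuming it is placed near the top of the test script before the model is run (illustrative placement, not the repository's exact file):

    import torch

    # Disable cuDNN so convolutions fall back to PyTorch's native CUDA kernels,
    # avoiding the slow cuDNN path for dilated convolutions reported in the linked thread.
    torch.backends.cudnn.enabled = False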

MRRRKING commented on August 23, 2024

I replaced this code, but it didn't work.
The testing time is normal with dilation 1, but the mAP is lower too.

Are you using PyTorch 1.7.0 or 1.7.1?

No, I use PyTorch 1.6.0.

U201714643 commented on August 23, 2024

Could you also try to add torch.backends.cudnn.enabled = False after this line of codes for dilation=2? Other people observed a similar issue of low speed of dilated convolution on some GPU cards: https://discuss.pytorch.org/t/speed-drop-with-dilated-conv-on-different-gpus/82412

Thanks for your advice.
It works.
Now mAP is 51.7%, and CorLoc is 68.2%. (The model is trained with dilation 2.)
In addition, the testing time is about 80 minutes.
Besides, VRAM usage ranges between 7500 MB and 9500 MB, which is more than when testing with cuDNN.

MRRRKING commented on August 23, 2024

Could you also try to add torch.backends.cudnn.enabled = False after this line of codes for dilation=2? Other people observed a similar issue of low speed of dilated convolution on some GPU cards: https://discuss.pytorch.org/t/speed-drop-with-dilated-conv-on-different-gpus/82412

It works. Thanks a lot.
Now the testing time is about 2.5 hours, and mAP is 51.9.

Glutton-zh commented on August 23, 2024

Could you also try to add torch.backends.cudnn.enabled = False after this line of codes for dilation=2? Other people observed a similar issue of low speed of dilated convolution on some GPU cards: https://discuss.pytorch.org/t/speed-drop-with-dilated-conv-on-different-gpus/82412

Thanks for your help!
It works. The testing time is normal, about 2h24min.
Mean AP = 0.5231

U201714643 commented on August 23, 2024

Could you also try to add torch.backends.cudnn.enabled = False after this line of codes for dilation=2? Other people observed a similar issue of low speed of dilated convolution on some GPU cards: https://discuss.pytorch.org/t/speed-drop-with-dilated-conv-on-different-gpus/82412

Thanks for your help!
It works. The testing time is normal, about 2h24min.
Mean AP = 0.5231

Hi,
Did you re-train your model with cudnn disabled?

ppengtang commented on August 23, 2024

Great! Thanks for helping to debug! It is unnecessary to re-train the model with cudnn disabled. Btw, you could try different random seeds (1~10) by changing cfg.RNG_SEED to reproduce the reported numbers.
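
A hypothetical illustration of trying a different seed (the project reads cfg.RNG_SEED from its config; the plumbing below is assumed rather than copied from the repository):

    import random
    import numpy as np
    import torch

    RNG_SEED = 3                          # try values 1-10 as suggested above
    random.seed(RNG_SEED)
    np.random.seed(RNG_SEED)
    torch.manual_seed(RNG_SEED)
    torch.cuda.manual_seed_all(RNG_SEED)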

Glutton-zh commented on August 23, 2024

Could you also try to add torch.backends.cudnn.enabled = False after this line of codes for dilation=2? Other people observed a similar issue of low speed of dilated convolution on some GPU cards: https://discuss.pytorch.org/t/speed-drop-with-dilated-conv-on-different-gpus/82412

Thanks for your help!
It works. The testing time is normal, about 2h24min.
Mean AP = 0.5231

Hi,
Did you re-train your model with cudnn disabled?

I just added "torch.backends.cudnn.enabled = False" in test_net, and all other code is the default.

hpu-dxx commented on August 23, 2024

Could you also try to add torch.backends.cudnn.enabled = False after this line of codes for dilation=2? Other people observed a similar issue of low speed of dilated convolution on some GPU cards: https://discuss.pytorch.org/t/speed-drop-with-dilated-conv-on-different-gpus/82412

Thanks for your help! It works. The testing time is normal, about 2h24min, Mean AP = 0.5231

Could you also try to add torch.backends.cudnn.enabled = False after this line of codes for dilation=2? Other people observed a similar issue of low speed of dilated convolution on some GPU cards: https://discuss.pytorch.org/t/speed-drop-with-dilated-conv-on-different-gpus/82412

Thanks for your help! It works. The testing time is normal, about 2h24min, Mean AP = 0.5231

Hello! Do you know how to visualize the detection results?
