I have a question about training a model on my own dataset, which is in VOC format.

How to train this model (scl, closed)

Fly-dream12 commented on May 20, 2024

how to train this model

from scl.

Comments (9)

harsh-99 commented on May 20, 2024

Hi,
Did you try training the model on any of the provided datasets with the VGG backbone? It would be worth doing that once to confirm everything works on your system, as I have not run into this problem so far.
Apart from that, we have provided directions for training on a new dataset. One addition to the README: update both lib/model/utils/parser_func.py and lib/model/utils/parser_func_multi.py. I will integrate the code so that from now on only one update is required.
I hope this helps; if it doesn't, let me know.
Thanks

Fly-dream12 commented on May 20, 2024

I have changed the corresponding paths in those functions and made sure they are right. Maybe training is just too slow, so nothing has appeared yet. Since I have not downloaded any of the provided datasets, could you share a link to a typical dataset so I can give it a try?

Fly-dream12 commented on May 20, 2024

I continued debugging this project and found that the image is not loaded correctly. I then ran into another problem at RCNN_roi_crop in the forward pass of faster_rcnn_SCL, on this line:
pooled_feat = self.RCNN_roi_crop(base_feat, Variable(grid_yx).detach())

The error is:
torch.FatalError: aborting at /data/ztc/jinke/faster-rcnn.pytorch/lib/model/roi_crop/src/roi_crop_cuda.c:49
Maybe I did not compile the C file correctly. Could you help? @harsh-99

harsh-99 commented on May 20, 2024

I believe you have not compiled all the files. Please make sure you have the correct CUDA and PyTorch versions and then run:
cd lib
sh make.sh
If you get an error while compiling, let me know.
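Before recompiling, it can help to record which versions are actually installed. The following is a minimal sketch, not part of the repository: `environment_report` is a hypothetical helper name, and the script only reports what it finds, since the thread does not state which versions the extensions require.

```python
import importlib
import platform


def environment_report():
    """Collect version details that matter when compiling the CUDA extensions."""
    report = {"python": platform.python_version()}
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        # PyTorch is not installed in this environment.
        report["torch"] = None
        return report
    report["torch"] = torch.__version__
    # CUDA version PyTorch was built against (None for CPU-only builds).
    report["cuda_build"] = torch.version.cuda
    # Whether a usable GPU and driver are visible at runtime.
    report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Pasting this output into the issue alongside any compile error makes version mismatches easier to spot.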

Fly-dream12 commented on May 20, 2024

Now I can train this model, but the rpn_cls loss becomes NaN at epoch 1, iter 100/10000. By the way, I have already decreased the learning rate to 0.0002. What could the problem be? @harsh-99

harsh-99 commented on May 20, 2024

That usually happens when the labelled dataset contains bounding boxes with some negative coordinates. There are a few threads from people who faced the same problem while training an object detection model.
Please refer to this:
jwyang/faster-rcnn.pytorch#136
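One way to check a custom VOC-style dataset for such boxes is to scan the annotation XML files before training. The sketch below is an illustration, not code from the repository: `find_suspect_boxes` and its `min_coord` parameter are hypothetical names, and the default `min_coord=1` rests on the assumption that the loader subtracts 1 from each coordinate, so that an `xmin` of 0 would end up negative.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def find_suspect_boxes(annotation_dir, min_coord=1):
    """Return (file name, class name, box) for every box that looks invalid.

    A box is flagged when its top-left corner is below min_coord or when
    it has no positive width or height.
    """
    suspects = []
    for xml_path in sorted(Path(annotation_dir).glob("*.xml")):
        root = ET.parse(xml_path).getroot()
        for obj in root.iter("object"):
            bndbox = obj.find("bndbox")
            if bndbox is None:
                continue  # object without a box, nothing to check
            xmin, ymin, xmax, ymax = (
                float(bndbox.findtext(tag))
                for tag in ("xmin", "ymin", "xmax", "ymax")
            )
            too_small = xmin < min_coord or ymin < min_coord
            degenerate = xmax <= xmin or ymax <= ymin
            if too_small or degenerate:
                box = (xmin, ymin, xmax, ymax)
                suspects.append((xml_path.name, obj.findtext("name"), box))
    return suspects
```

Calling `find_suspect_boxes("path/to/Annotations")` (the path is a placeholder) returns an empty list when nothing is flagged; any entries point to the files worth inspecting or fixing by hand.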

Fly-dream12 commented on May 20, 2024

Maybe the following code should be added in lib/dataset/pascal_voc.py:
    if x1 < 0 or y1 < 0:
        continue
    if abs(x1 - x2) <= 100 or abs(y1 - y2) <= 100:
        continue
Also, should the learning rate be adjusted?

harsh-99 commented on May 20, 2024

Hi,
I have never faced any problems caused by the learning rate, and I have initialised lr in the range 1e-2 to 1e-4. As for adding those lines to the code: that depends on the dataset, so I don't think it is necessary for the datasets I wrote the code for. I have not faced any such problems with them, and anyone following the same instructions shouldn't either.

If you have any other issue, please let me know; otherwise it would be great if you could close the issue.

Fly-dream12 commented on May 20, 2024

After training a model, I started testing, but the checkpoint cannot be loaded correctly:
RuntimeError: Error(s) in loading state_dict for vgg16:
Missing key(s) in state_dict: "netD.conv1.weight", "netD.bn1.weight", "netD.bn1.bias", "netD.bn1.running_mean", "netD.bn1.running_var", "netD.conv2.weight", "netD.bn2.weight", "netD.bn2.bias", "netD.bn2.running_mean", "netD.bn2.running_var", "netD.conv3.weight", "netD.bn3.weight", "netD.bn3.bias", "netD.bn3.running_mean", "netD.bn3.running_var", "netD.fc.weight", "netD.fc.bias", "netD_pixel.conv1.weight", "netD_pixel.conv2.weight", "netD_pixel.conv3.weight".
Unexpected key(s) in state_dict: "netD_img.conv_image.weight", "netD_img.conv_image.bias", "netD_img.bn_image.weight", "netD_img.bn_image.bias", "netD_img.bn_image.running_mean", "netD_img.bn_image.running_var", "netD_img.bn_image.num_batches_tracked", "netD_img.fc_1_image.weight", "netD_img.fc_1_image.bias", "netD_img.bn_2.weight", "netD_img.bn_2.bias", "netD_img.bn_2.running_mean", "netD_img.bn_2.running_var", "netD_img.bn_2.num_batches_tracked", "netD_inst.fc_1_inst.weight", "netD_inst.fc_1_inst.bias", "netD_inst.fc_2_inst.weight", "netD_inst.fc_2_inst.bias", "netD_inst.bn.weight", "netD_inst.bn.bias", "netD_inst.bn.running_mean", "netD_inst.bn.running_var", "netD_inst.bn.num_batches_tracked".

What might be the reason for this? @harsh-99
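Reading the error, the missing keys all belong to netD and netD_pixel, while the unexpected keys belong to netD_img and netD_inst. That suggests the checkpoint was saved from a model with different discriminator modules than the one the test script builds, though this is an inference from the key names and is not confirmed in the thread. A small sketch for summarising such a mismatch by module follows; `diff_state_dict_keys` is a hypothetical helper, and the sample key `shared.weight` is made up for illustration.

```python
from collections import defaultdict


def diff_state_dict_keys(model_keys, checkpoint_keys):
    """Group missing and unexpected keys by their top-level module name."""
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    missing = defaultdict(list)      # expected by the model, absent from the checkpoint
    unexpected = defaultdict(list)   # present in the checkpoint, unknown to the model
    for key in sorted(model_keys - checkpoint_keys):
        missing[key.split(".", 1)[0]].append(key)
    for key in sorted(checkpoint_keys - model_keys):
        unexpected[key.split(".", 1)[0]].append(key)
    return dict(missing), dict(unexpected)


if __name__ == "__main__":
    # Shortened key lists taken from the error above, plus one made-up shared key.
    model_keys = ["netD.conv1.weight", "netD.fc.bias",
                  "netD_pixel.conv1.weight", "shared.weight"]
    checkpoint_keys = ["netD_img.conv_image.weight",
                       "netD_inst.fc_1_inst.weight", "shared.weight"]
    missing, unexpected = diff_state_dict_keys(model_keys, checkpoint_keys)
    print("missing modules:", sorted(missing))
    print("unexpected modules:", sorted(unexpected))
```

With PyTorch, the first argument would typically be `model.state_dict().keys()` and the second the keys of the weights dictionary loaded from the checkpoint file; where that dictionary sits inside the file depends on how the training script saved it.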
