longcw / faster_rcnn_pytorch

Faster RCNN with PyTorch

License: MIT License

Python 89.19% C++ 0.03% Cuda 2.66% C 3.64% Shell 0.09% Cython 4.39%

pytorch detection faster-rcnn computer-vision

faster_rcnn_pytorch's Introduction

Hi there 👋

🌱 I received my PhD in Computer Science from Tsinghua University in June 2021, supervised by Prof. Haizhou Ai, and my BE in Electronic Information Engineering from Huazhong University of Science and Technology in July 2016.


faster_rcnn_pytorch's Issues

ROI pooling for fast rcnn

Hi @longcw, could you please provide any hints on adapting the ROI pooling function for Fast R-CNN training? Thanks a lot!

ImportError: cannot import name cython_nms

I am trying the demo script, but I get this error: ImportError: cannot import name cython_nms. Any suggestions? Thanks!

Question about the resulting mAP.

Has anyone obtained an mAP on VOC07 comparable to the one reported in the original paper?

Segmentation fault

Hi,
I use Python 2.7 and PyTorch 0.1.12.post2 with CUDA 8.0.

Then I ran python train.py and got something like this.

Here are the gdb results:

Tree Trace:

Does anybody have any ideas? It seems like an OpenCV issue?

Thank you!

Why do you jitter all_rois in proposal_target_layer.py?

I find that the _jitter_gt_boxes function does not exist in the original py_faster_rcnn.

When I run train.py, the following error occurs:
ImportError: /usr/share/Anaconda2/lib/python2.7/site-packages/torch/lib/libgomp.so.1: version `GOMP_4.0' not found (required by /opt/OpenBLAS/lib/libopenblas.so.0)
How can I fix it?

Adding new dataset

How should I add a new dataset to train on?

I am confused by all the functions in each python file. Which ones do I need to replace?

mAP

Hi,

Would you mind sharing the mAP of the provided model file?

Thanks a lot

Why don't you just use torchvision's own vgg16 weights?

Why is the regression loss divided by fg_cnt?

rpn_loss_box = F.smooth_l1_loss(rpn_bbox_pred, rpn_bbox_targets, size_average=False) / (fg_cnt + 1e-4)
loss_box = F.smooth_l1_loss(bbox_pred, bbox_targets, size_average=False) / (fg_cnt + 1e-4)

I can see that the regression loss is divided by fg_cnt.
Why is the regression loss divided by fg_cnt?
Does anyone know? Please explain.
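
A minimal sketch of what this normalization does, assuming (as in the original py-faster-rcnn) that background samples are masked out so that only foreground boxes contribute to the summed loss. The helper names `smooth_l1` and `box_loss` are hypothetical and written in plain Python rather than PyTorch. Dividing the summed loss by the number of foreground boxes turns it into a per-foreground-box average, so the loss scale does not depend on how many positive samples an image happens to contain; the small constant avoids division by zero when there are none.

```python
def smooth_l1(diff):
    """Smooth L1 on one scalar difference (quadratic near 0, linear beyond 1)."""
    a = abs(diff)
    return 0.5 * a * a if a < 1.0 else a - 0.5


def box_loss(preds, targets, is_foreground, eps=1e-4):
    """Sum smooth L1 over foreground boxes only, then divide by their count.

    preds, targets: lists of 4-element sequences (one per sampled box).
    is_foreground: list of booleans, True for positive samples.
    eps mirrors the 1e-4 in the quoted code and guards against fg_cnt == 0.
    """
    total = 0.0
    for pred, target, fg in zip(preds, targets, is_foreground):
        if fg:  # background boxes have no regression target
            total += sum(smooth_l1(p - t) for p, t in zip(pred, target))
    fg_cnt = sum(1 for fg in is_foreground if fg)
    return total / (fg_cnt + eps)


preds = [[0.5, 0.0, 0.0, 0.0], [9.0, 9.0, 9.0, 9.0]]
targets = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
mask = [True, False]

one_copy = box_loss(preds, targets, mask)
two_copies = box_loss(preds * 2, targets * 2, mask * 2)
print(one_copy, two_copies)
```

With this normalization, doubling the number of sampled boxes leaves the loss almost unchanged, whereas the raw sum would double.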

Error while running the make.sh file on Windows

Please tell me how to run this code on Windows. I ran the make.sh file using Git, but it throws an error while compiling. Do we have to change anything in the make.sh file before running it? Thanks in advance...

roi_pooling.cu.o: No such file or directory

gcc: error: /xx/faster_rcnn_pytorch/faster_rcnn/roi_pooling/src/cuda/roi_pooling.cu.o: No such file or directory

This happens when running ./make.sh in the path /faster_rcnn_pytorch/faster_rcnn.

How to support multigpu?

Hello,
How can I modify the repo to support multi-GPU training?

error when using tensorboard with crayon: ValueError: The server at 127.0.0.1:8889 does not appear to be up!

I just run:
jcc@jcc:/ground/faster_rcnn_pytorch$ sudo docker pull alband/crayon
[sudo] password for jcc:
latest: Pulling from alband/crayon
96a1ef3ccac0: Already exists
2415c9cbee29: Already exists
285141620f12: Already exists
cdc8a9d219b0: Already exists
c6fe6b5c116d: Already exists
0d3a52687ca5: Already exists
01fadcdd7016: Already exists
1c51efe27450: Already exists
5311a51d16d9: Already exists
22f8535e4e77: Already exists
1e29b8d70f1f: Already exists
49d0cfbbaa13: Already exists
0cf5e84a5a83: Already exists
7838253d03e4: Already exists
ac59d1ca8ffb: Already exists
967818399371: Already exists
7b593cbb34a0: Already exists
fd1d1ebbaede: Already exists
1ad735e2479d: Already exists
baae22602681: Already exists
400ae69c422f: Already exists
d567ddfceb7d: Already exists
9a8eff98c4e4: Already exists
19818a1d0e86: Already exists
Digest: sha256:33fbe35a1af8b3591e7ec52c57546e34f23813f14e9689cc9d3619c705412c0e
Status: Image is up to date for alband/crayon:latest
jcc@jcc:/ground/faster_rcnn_pytorch

Then I ran "python train.py" and got this error:

voc_2007_trainval gt roidb loaded from /home/jcc/ground/Kaggle/NOAA_SeaLion/faster_rcnn_pytorch/data/cache/voc_2007_trainval_gt_roidb.pkl
Traceback (most recent call last):
  File "train.py", line 96, in <module>
    cc = CrayonClient(hostname='127.0.0.1')
  File "/home/jcc/anaconda2/lib/python2.7/site-packages/pycrayon/crayon.py", line 37, in __init__
    raise ValueError(msg.format(self.hostname, self.port))
ValueError: The server at 127.0.0.1:8889 does not appear to be up!

Can anyone help?

Transforms for training images

I've seen other implementations apply random transformations to training images, such as rotations, horizontal shifts and zooms. Can you help me with this or direct me to a thread or example code that does this?
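
As a starting point, here is a minimal sketch of the simplest such transformation, a horizontal flip, which is the augmentation the original py-faster-rcnn applies. The function name `flip_boxes_horizontally` is hypothetical, and the sketch assumes boxes are given as (x1, y1, x2, y2) in inclusive pixel indices. The image itself has to be mirrored in the same step (for a NumPy image, for example, with `im[:, ::-1]`). Rotations and zooms need more care because the box corners must be transformed and re-enclosed.

```python
def flip_boxes_horizontally(boxes, image_width):
    """Mirror (x1, y1, x2, y2) boxes around the vertical image axis.

    Assumes inclusive pixel indices, so column 0 maps to column image_width - 1.
    The old right edge becomes the new left edge and vice versa.
    """
    flipped = []
    for x1, y1, x2, y2 in boxes:
        new_x1 = image_width - x2 - 1
        new_x2 = image_width - x1 - 1
        flipped.append((new_x1, y1, new_x2, y2))
    return flipped


boxes = [(10, 20, 30, 40)]
once = flip_boxes_horizontally(boxes, image_width=100)
twice = flip_boxes_horizontally(once, image_width=100)
print(once, twice)
```

Flipping twice returns the original boxes, which is a cheap sanity check for any box transformation of this kind.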

After running make.sh, how can we choose which GPU to use?

When I just run make.sh, I can only use GPU 0, which is the default. How can I choose which GPU to use? Could someone please help me?
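
One general way to do this, sketched under the assumption that the GPU choice matters when training or testing rather than when compiling with make.sh, is the standard CUDA environment variable `CUDA_VISIBLE_DEVICES`. The helper name `select_gpu` is hypothetical. The variable has to be set before the process makes its first CUDA call, so the safest place is before `import torch`.

```python
import os


def select_gpu(index):
    """Restrict this process to one physical GPU.

    Must run before the first CUDA call (safest: before importing torch).
    Inside the process, the selected GPU then appears as device 0.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = str(index)


select_gpu(1)
# import torch  # from here on, torch would only see physical GPU 1
print(os.environ["CUDA_VISIBLE_DEVICES"])
```

The same effect can be had from the shell without touching the code, for example `CUDA_VISIBLE_DEVICES=1 python train.py`.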

Does the ROI pooling layer only support a batch size of 1?

From the source code (roi_pooling_cuda.c) and my naive experiments, it seems that the RoI pooling layer only supports a batch size of one. Does anyone know why?

Has anyone succeeded in training this on their own dataset?

AssertionError: Single batch only

<bound method pascal_voc.default_roidb of <faster_rcnn.datasets.pascal_voc.pascal_voc object at 0x7fd20fbaf0d0>>
voc_2007_trainval gt roidb loaded from /home/ubuntu/faster_rcnn_pytorch/data/cache/voc_2007_trainval_gt_roidb.pkl
Traceback (most recent call last):
  File "train.py", line 115, in <module>
    blobs = data_layer.forward()
  File "/home/ubuntu/faster_rcnn_pytorch/faster_rcnn/roi_data_layer/layer.py", line 74, in forward
    blobs = self._get_next_minibatch()
  File "/home/ubuntu/faster_rcnn_pytorch/faster_rcnn/roi_data_layer/layer.py", line 70, in _get_next_minibatch
    return get_minibatch(minibatch_db, self._num_classes)
  File "/home/ubuntu/faster_rcnn_pytorch/faster_rcnn/roi_data_layer/minibatch.py", line 39, in get_minibatch
    assert len(im_scales) == 1, "Single batch only"
AssertionError: Single batch only
ubuntu@ip-172-31-26-170:~/faster_rcnn_pytorch$

the given numpy array has zero-sized dimensions.

  File "/home/luhongchao/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/luhongchao/pytorch/faster_rcnn_pytorch/faster_rcnn/faster_rcnn.py", line 71, in forward
    cfg_key, self._feat_stride, self.anchor_scales)
  File "/home/luhongchao/pytorch/faster_rcnn_pytorch/faster_rcnn/faster_rcnn.py", line 123, in proposal_layer
    x = network.np_to_variable(x, is_cuda=True)
  File "/home/luhongchao/pytorch/faster_rcnn_pytorch/faster_rcnn/network.py", line 86, in np_to_variable
    v = Variable(torch.from_numpy(x).type(dtype))
RuntimeError: the given numpy array has zero-sized dimensions. Zero-sized dimensions are not supported in PyTorch

Has anyone else run into this problem? How should I solve it?

How can I figure out whether the code is Python 2 or 3?

In this project, print is sometimes written as print('Including CUDA code.')
and sometimes as print 'anchor:'
I'm very confused about which version of Python is intended. My Anaconda and PyTorch installations both depend on a specific Python version. Please help. Thanks a lot.
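
A small sketch that may help tell the two apart: the statement form `print 'anchor:'` is only valid in Python 2, while a single parenthesized argument such as `print('Including CUDA code.')` is accepted by both versions. Code that mixes the two forms therefore only runs under Python 2. The helper name `parses_under_this_python` is hypothetical; it simply asks the running interpreter whether a piece of source is syntactically valid.

```python
import ast


def parses_under_this_python(source):
    """Return True if `source` is valid syntax for the running interpreter."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


function_style = "print('Including CUDA code.')"
statement_style = "print 'anchor:'"

# Under Python 3 the first parses and the second does not;
# under Python 2 both parse.
print(parses_under_this_python(function_style))
print(parses_under_this_python(statement_style))
```

Running the same check over each file of a project (reading the file contents and passing them to the helper) gives a quick list of the files that would need porting.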

unable to download VGGnet_fast_rcnn_iter_70000.h5

How can I download it?

Error when running demo.py

After running sh make.sh and downloading the pretrained VGGnet, I ran demo.py, but I got the following error:

 	[zqj@icst2 faster_rcnn_pytorch-master]$ python demo.py 
Traceback (most recent call last):
  File "demo.py", line 46, in <module>
	test()
  File "demo.py", line 19, in test
	network.load_net(model_file, detector)
  File "/S2/MI/zqj/temporal_action_localization/faster_rcnn_pytorch-master/faster_rcnn/network.py", line 48, in load_net
	param = torch.from_numpy(np.asarray(h5f[k]))
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/home/ilan/minonda/conda-bld/h5py_1490028130695/work/h5py/_objects.c:2846)
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/home/ilan/minonda/conda-bld/h5py_1490028130695/work/h5py/_objects.c:2804)
  File "/opt/anaconda2/lib/python2.7/site-packages/h5py/_hl/group.py", line 169, in __getitem__
	oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/home/ilan/minonda/conda-bld/h5py_1490028130695/work/h5py/_objects.c:2846)
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/home/ilan/minonda/conda-bld/h5py_1490028130695/work/h5py/_objects.c:2804)
  File "h5py/h5o.pyx", line 190, in h5py.h5o.open (/home/ilan/minonda/conda-bld/h5py_1490028130695/work/h5py/h5o.c:3740)
KeyError: 'Unable to open object (Bad symbol table node signature)'

What exactly is wrong?

cudaCheckError() failed : invalid device function

When I run demo.py, I get this error:
"load model successfully!
cudaCheckError() failed : invalid device function"

Do you know why?
I can use PyTorch to train other models, so the installation is correct.
Thanks

ROI pooling speed deteriorates after tens of thousands of iterations

When I run several replicates on several GPUs, the ROI pooling speed decreases after tens of thousands of iterations.

Usually several, but not all, replicates are slowed down by the ROI pooling operations. GPU usage also drops at the same time.

Thank you very much.

I really need your help.

Best,
Yikang

Wrong format for bounding boxes

It seems that the network uses x1,y1,x2,y2 format for bounding boxes instead of x,y,w,h used in the paper. I think this is a pretty major difference that can affect training accuracy.

In x,y,w,h format two coordinates are used for centering and two for size, which presents clear separation and can be debugged easily. In the current format, all four coordinates are used for both centering and size, which makes it more difficult to debug.
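
For reference, the two formats carry the same information and convert into each other without loss, as the sketch below shows. It assumes continuous coordinates (width is x2 - x1, without the +1 pixel convention that some Faster R-CNN code uses), and the function names `corners_to_center` and `center_to_corners` are hypothetical.

```python
def corners_to_center(box):
    """Convert (x1, y1, x2, y2) to (center_x, center_y, width, height)."""
    x1, y1, x2, y2 = box
    w = x2 - x1
    h = y2 - y1
    return (x1 + 0.5 * w, y1 + 0.5 * h, w, h)


def center_to_corners(box):
    """Convert (center_x, center_y, width, height) back to (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    return (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)


corner_box = (10, 20, 30, 60)
center_box = corners_to_center(corner_box)
round_trip = center_to_corners(center_box)
print(center_box, round_trip)
```

When debugging, boxes stored as corners can be converted to center and size on the fly for inspection.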

IOError: [Errno 2] No such file or directory: 'data/pretrained_model/VGG_imagenet.npy'

Hi, I am facing a problem while training the network.

IOError: [Errno 2] No such file or directory: 'data/pretrained_model/VGG_imagenet.npy'

Any suggestions or comments?

Problem with running demo.py

~/faster_rcnn_pytorch$ python demo.py
Traceback (most recent call last):
  File "demo.py", line 3, in <module>
    import torch
  File "/home/user/faster_rcnn_pytorch/lfaster_rcnn/network.py", line 1, in <module>
    import torch
  File "/home/user/anaconda2/lib/python2.7/site-packages/torch/__init__.py", line 53, in <module>
    from torch._C import *
ImportError: dlopen: cannot load any more object with static TLS

Could you tell me how to fix this?

Please update the README.md file to say that the arch setting in ./make.sh should be changed depending on the GPU used

The training process is not the same as in the paper.

The paper uses a training method called "Alternating Training", but the code just trains the model end to end.

Why are multiple CPUs used in .cuda() mode?

Hello,

I migrated the model to the GPU. However, it still uses the CPU. Any idea why that is the case?

thanks!

Output from the reg layer in RPN.

According to the paper, the reg layer in RPN has 4k outputs encoding the coordinates of k boxes.
So what does this layer actually predict? Does it predict the 4 coordinates of the bounding boxes directly, or the 4 parameterized coordinates of each bounding box?

Any help is appreciated.
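
For context, the paper defines the regression targets as parameterized coordinates relative to each anchor: t_x = (x - x_a) / w_a, t_y = (y - y_a) / h_a, t_w = log(w / w_a), t_h = log(h / h_a), where (x, y, w, h) is a box given by center and size and the subscript a marks the anchor. The sketch below writes out that parameterization and its inverse in plain Python. The function names `encode` and `decode` are hypothetical, and whether this repo follows the paper exactly should be checked against its bbox transform code.

```python
import math


def encode(box, anchor):
    """Parameterize a (cx, cy, w, h) box relative to a (cx, cy, w, h) anchor."""
    x, y, w, h = box
    xa, ya, wa, ha = anchor
    return ((x - xa) / wa, (y - ya) / ha, math.log(w / wa), math.log(h / ha))


def decode(deltas, anchor):
    """Invert encode(): turn predicted (tx, ty, tw, th) back into a box."""
    tx, ty, tw, th = deltas
    xa, ya, wa, ha = anchor
    return (xa + tx * wa, ya + ty * ha, wa * math.exp(tw), ha * math.exp(th))


anchor = (50.0, 50.0, 100.0, 100.0)
box = (60.0, 40.0, 200.0, 50.0)

deltas = encode(box, anchor)
recovered = decode(deltas, anchor)
print(deltas, recovered)
```

Under this scheme the network output for one anchor is the 4-vector of deltas, and the final box is obtained by decoding it against that anchor.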

Problem: Index tensor must have same dimensions as output tensor

Traceback (most recent call last):
  File "train.py", line 154, in <module>
    loss.backward()
  File "/usr/local/lib/python3.5/dist-packages/torch/autograd/variable.py", line 152, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
  File "/usr/local/lib/python3.5/dist-packages/torch/autograd/__init__.py", line 98, in backward
    variables, grad_variables, retain_graph)
  File "/usr/local/lib/python3.5/dist-packages/torch/autograd/function.py", line 90, in apply
    return self.forward_cls.backward(self, *args)
  File "/usr/local/lib/python3.5/dist-packages/torch/autograd/functions/reduce.py", line 176, in backward
    grad_input.scatter(dim, indices, grad_output)
  File "/usr/local/lib/python3.5/dist-packages/torch/autograd/variable.py", line 655, in scatter
    return Scatter.apply(self, dim, index, source, True)
  File "/usr/local/lib/python3.5/dist-packages/torch/autograd/functions/tensor.py", line 540, in forward
    return input.scatter(ctx.dim, index, source)
RuntimeError: Index tensor must have same dimensions as output tensor at /home/pytorch/torch/lib/TH/generic/THTensorMath.c:454

I'm not sure how or why this error occurs. What should I do to make it work?
Thanks!

make.sh error

Hi! I encountered an error when I tried to run the make.sh file. It says:
gcc: error: /home/putama/Documents/faster_rcnn_pytorch/faster_rcnn/roi_pooling/src/cuda/roi_pooling.cu.o: No such file or directory
Traceback (most recent call last):
  File "build.py", line 34, in <module>
    ffi.build()
  File "/home/putama/PutamaLab/anaconda/lib/python2.7/site-packages/torch/utils/ffi/__init__.py", line 164, in build
    _build_extension(ffi, cffi_wrapper_name, target_dir, verbose)
  File "/home/putama/PutamaLab/anaconda/lib/python2.7/site-packages/torch/utils/ffi/__init__.py", line 100, in _build_extension
    ffi.compile(tmpdir=tmpdir, verbose=verbose, target=libname)
  File "/home/putama/PutamaLab/anaconda/lib/python2.7/site-packages/cffi/api.py", line 672, in compile
    compiler_verbose=verbose, debug=debug, **kwds)
  File "/home/putama/PutamaLab/anaconda/lib/python2.7/site-packages/cffi/recompiler.py", line 1475, in recompile
    compiler_verbose, debug)
  File "/home/putama/PutamaLab/anaconda/lib/python2.7/site-packages/cffi/ffiplatform.py", line 29, in compile
    outputfilename = _build(tmpdir, ext, compiler_verbose, debug)
  File "/home/putama/PutamaLab/anaconda/lib/python2.7/site-packages/cffi/ffiplatform.py", line 65, in _build
    raise VerificationError('%s: %s' % (e.__class__.__name__, e))
cffi.ffiplatform.VerificationError: LinkError: command 'gcc' failed with exit status 1
Do you have any idea what's happening? Thanks

Does data_layer offer gt_ishard or dontcare_areas in voc_2007_trainval?

If not, I will ignore them!

NVCC problem

when compiling Cython with cuda 8.0(cudnn included), I met this problem. Can anyone help me solve this? My python version is 2.7 and maybe that's the source of my pro...

running build_ext
skipping 'utils/bbox.c' Cython extension (up-to-date)
skipping 'utils/nms.c' Cython extension (up-to-date)
skipping 'nms/cpu_nms.c' Cython extension (up-to-date)
skipping 'nms/gpu_nms.cpp' Cython extension (up-to-date)
skipping 'pycocotools/_mask.c' Cython extension (up-to-date)
Compiling roi pooling kernels by nvcc...
./make.sh: line 10: nvcc: command not found
Including CUDA code.
/home/e1126/faster_rcnn_pytorch/faster_rcnn/roi_pooling
generating /tmp/tmpEk3p3S/_roi_pooling.c
running build_ext
building '_roi_pooling' extension
creating home
creating home/e1126
creating home/e1126/faster_rcnn_pytorch
creating home/e1126/faster_rcnn_pytorch/faster_rcnn
creating home/e1126/faster_rcnn_pytorch/faster_rcnn/roi_pooling
creating home/e1126/faster_rcnn_pytorch/faster_rcnn/roi_pooling/src
gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -DWITH_CUDA -I/home/e1126/anaconda2/lib/python2.7/site-packages/torch/utils/ffi/../../lib/include -I/home/e1126/anaconda2/lib/python2.7/site-packages/torch/utils/ffi/../../lib/include/TH -I/home/e1126/anaconda2/lib/python2.7/site-packages/torch/utils/ffi/../../lib/include/THC -I/usr/local/cuda/include -I/home/e1126/anaconda2/include/python2.7 -c _roi_pooling.c -o ./_roi_pooling.o
gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -DWITH_CUDA -I/home/e1126/anaconda2/lib/python2.7/site-packages/torch/utils/ffi/../../lib/include -I/home/e1126/anaconda2/lib/python2.7/site-packages/torch/utils/ffi/../../lib/include/TH -I/home/e1126/anaconda2/lib/python2.7/site-packages/torch/utils/ffi/../../lib/include/THC -I/usr/local/cuda/include -I/home/e1126/anaconda2/include/python2.7 -c /home/e1126/faster_rcnn_pytorch/faster_rcnn/roi_pooling/src/roi_pooling.c -o ./home/e1126/faster_rcnn_pytorch/faster_rcnn/roi_pooling/src/roi_pooling.o
gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -DWITH_CUDA -I/home/e1126/anaconda2/lib/python2.7/site-packages/torch/utils/ffi/../../lib/include -I/home/e1126/anaconda2/lib/python2.7/site-packages/torch/utils/ffi/../../lib/include/TH -I/home/e1126/anaconda2/lib/python2.7/site-packages/torch/utils/ffi/../../lib/include/THC -I/usr/local/cuda/include -I/home/e1126/anaconda2/include/python2.7 -c /home/e1126/faster_rcnn_pytorch/faster_rcnn/roi_pooling/src/roi_pooling_cuda.c -o ./home/e1126/faster_rcnn_pytorch/faster_rcnn/roi_pooling/src/roi_pooling_cuda.o
gcc -pthread -shared -L/home/e1126/anaconda2/lib -Wl,-rpath=/home/e1126/anaconda2/lib,--no-as-needed ./_roi_pooling.o ./home/e1126/faster_rcnn_pytorch/faster_rcnn/roi_pooling/src/roi_pooling.o ./home/e1126/faster_rcnn_pytorch/faster_rcnn/roi_pooling/src/roi_pooling_cuda.o /home/e1126/faster_rcnn_pytorch/faster_rcnn/roi_pooling/src/cuda/roi_pooling.cu.o -L/home/e1126/anaconda2/lib -lpython2.7 -o ./_roi_pooling.so
gcc: error: /home/e1126/faster_rcnn_pytorch/faster_rcnn/roi_pooling/src/cuda/roi_pooling.cu.o: No such file or directory
Traceback (most recent call last):
  File "build.py", line 34, in <module>
    ffi.build()
  File "/home/e1126/anaconda2/lib/python2.7/site-packages/torch/utils/ffi/__init__.py", line 164, in build
    _build_extension(ffi, cffi_wrapper_name, target_dir, verbose)
  File "/home/e1126/anaconda2/lib/python2.7/site-packages/torch/utils/ffi/__init__.py", line 100, in _build_extension
    ffi.compile(tmpdir=tmpdir, verbose=verbose, target=libname)
  File "/home/e1126/anaconda2/lib/python2.7/site-packages/cffi/api.py", line 672, in compile
    compiler_verbose=verbose, debug=debug, **kwds)
  File "/home/e1126/anaconda2/lib/python2.7/site-packages/cffi/recompiler.py", line 1475, in recompile
    compiler_verbose, debug)
  File "/home/e1126/anaconda2/lib/python2.7/site-packages/cffi/ffiplatform.py", line 29, in compile
    outputfilename = _build(tmpdir, ext, compiler_verbose, debug)
  File "/home/e1126/anaconda2/lib/python2.7/site-packages/cffi/ffiplatform.py", line 65, in _build
    raise VerificationError('%s: %s' % (e.__class__.__name__, e))
cffi.ffiplatform.VerificationError: LinkError: command 'gcc' failed with exit status 1

anchor_target_layer.py contains "from ..utils.cython_bbox import bbox_overlaps, bbox_intersections", but cython_bbox doesn't exist!

Is it possible to use a CPU-only version?

I don't have CUDA (and have no way to install it), so ./make.sh fails. Is there a CPU-only solution? I hope for a reply. Many thanks.

How to train the model

Thanks for sharing. How do I train the model? Would you share the files?

Thank you very much

How to use just RPN for text detection

Hi,
I want to use RPN only and no RCNN to generate region proposals for text regions. Is there a way to do it?

cudaCheckError failed

Hi, I have a K40 CUDA card. I compiled PyTorch and this repository correctly. I ran train.py and got this error:

cudaCheckError() failed : invalid device function

Any idea?

can not import bbox_overlaps

I'm trying to run pascal_voc.py, and I got this error:

from .imdb import imdb
ValueError: Attempted relative import in non-package

Then I went to the imdb file and got another error:
from ..utils.cython_bbox import bbox_overlaps
ValueError: Attempted relative import in non-package

I'm sure that cython_bbox.so does exist, but it didn't work.

How to extract feature from a given ROI?

Hi there, I have a problem extracting features from a given roi. The code I wrote is

    def _im_exfeat(self, image, roi):
        """
        image: ( ndarray ) (H x W x 3 )
        roi: (ndarray) (1 x 4) [x1, y1, x2, y2] 
        """
        im_data, im_scales = self.get_image_blob(image)
        roi = np.hstack([np.zeros((1, 1)), roi]) 
        roi = network.np_to_variable(roi, is_cuda=True) * im_scales[0]

        im_data = network.np_to_variable(im_data, is_cuda=True)
        im_data = im_data.permute(0, 3, 1, 2)
        features = self.rpn.features(im_data)
        pooled_features = self.roi_pool(features, roi)

        x = pooled_features.view(pooled_features.size()[0], -1)
        x = self.fc6(x)
        x = self.fc7(x)

        return x

It is a method inside the FasterRCNN class. What I'm not sure about is which version of the given roi to use. There are three choices:

The original roi, corresponding to the original image
The rescaled roi, corresponding to the resized input image (as shown in the code)
The projected roi, corresponding to the feature map of vgg_conv4, whose stride is 16.

Since there's no detailed comment in the roi-pooling-related code, I'm not sure which one to use. Hope you could give me some hint.
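
A minimal sketch of how the three coordinate systems relate, in case it helps frame the question. The function names `rescale_roi` and `project_roi` are hypothetical, and rounding is left out. In Caffe-style ROI pooling the layer takes a spatial scale (1/16 for a stride of 16) and does the projection onto the feature map itself, in which case the roi passed in would be the rescaled one. Whether the roi_pool module in this repo is constructed that way is an assumption to verify against its constructor arguments.

```python
def rescale_roi(roi, im_scale):
    """Map (x1, y1, x2, y2) from the original image to the resized input image."""
    return tuple(c * im_scale for c in roi)


def project_roi(roi_rescaled, feat_stride=16):
    """Map a roi from the resized input image onto the feature map.

    An ROI pooling layer that takes a spatial scale of 1 / feat_stride
    performs this step internally.
    """
    return tuple(c / float(feat_stride) for c in roi_rescaled)


original = (10, 20, 110, 220)
rescaled = rescale_roi(original, im_scale=1.5)
projected = project_roi((16, 32, 160, 320))
print(rescaled, projected)
```

If the layer does apply the spatial scale internally, projecting the roi by hand before calling it would divide by the stride twice.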

Thank you.

Is it using ground-truth box information at test time?

proposal_target_layer.py

all_rois = np.vstack((all_rois,
    np.hstack((zeros, np.vstack((gt_easyboxes[:, :-1], jittered_gt_boxes[:, :-1]))))))

I think this uses the ground-truth boxes as candidate boxes at test time.
I can't find any test-time proposal code in this repo.
Is that correct? Does it make the evaluation wrong?

proposal_target_layer.py

labels = labels[keep_inds]

I think "labels = labels[keep_inds]" also uses ground-truth information, because proposal_target_layer returns only 256 sampled rois (they are filtered by IoU and then sampled).

Is this a correct evaluation, or do I misunderstand?
I am not good at English. Sorry.

Python 3 compatibility

The Faster R-CNN package does not support Python 3.
Is there any timetable for fixing it?

Train new dataset: zeros after conv3 in vgg16

I am trying to train the model with my own dataset. Sometimes I get this error:

  File "train.py", line 127, in <module>
    net(im_data, im_info, gt_boxes, gt_ishard, dontcare_areas)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/code/faster_rcnn_pytorch/faster_rcnn/faster_rcnn.py", line 219, in forward
    roi_data = self.proposal_target_layer(rois, gt_boxes, gt_ishard, dontcare_areas, self.n_classes)
  File "/data/code/faster_rcnn_pytorch/faster_rcnn/faster_rcnn.py", line 287, in proposal_target_layer
    proposal_target_layer_py(rpn_rois, gt_boxes, gt_ishard, dontcare_areas, num_classes)
  File "/data/code/faster_rcnn_pytorch/faster_rcnn/rpn_msr/proposal_target_layer.py", line 66, in proposal_target_layer
    np.hstack((zeros, np.vstack((gt_easyboxes[:, :-1], jittered_gt_boxes[:, :-1]))))))
  File "/usr/local/lib/python2.7/dist-packages/numpy/core/shape_base.py", line 234, in vstack
    return _nx.concatenate([atleast_2d(_m) for _m in tup], 0)
ValueError: all the input array dimensions except for the concatenation axis must match exactly

I traced the bug and figured out that the network returns an all-zeros array after conv3 in faster_rcnn/vgg16.py, and hence returns an all-zeros feature array after forwarding through vgg16.
Do you have any clue why? Thank you.

Is 6G GPU memory enough for training?

I have 2 GPUs on my PC, each with 6G of memory. I can train rbg's py-faster-rcnn project on one of them. But when I run /faster_rcnn_pytorch/train.py from this project, it suddenly runs out of memory.

I referred to the FFRCNN project, which says:

For training the end-to-end version of Faster R-CNN with VGG16, 3G of GPU memory is sufficient (using CUDNN)

So I'm very confused. How much memory do I need to run /faster_rcnn_pytorch/train.py? Or could this run on 2 GPUs in parallel?

Thanks.

ValueError: attempt to get argmax of an empty sequence

I am trying to train a model on my custom dataset (formatted like Pascal VOC). The model is training for several iterations and then this error occurs.

im_size: (97.0, 1000.0)
scale: 1.5082956552505493
height, width: (6, 62)
rpn: gt_boxes.shape (13, 5)
rpn: gt_boxes [[    0.             0.            85.97284698    63.34841537    25.        ]
 [   70.88989258     0.           144.79638672    63.34841537    33.        ]
 [  131.22172546     1.50829566   209.65309143    66.36500549     5.        ]
 [  193.06184387     0.           265.46002197    66.36500549    12.        ]
 [  256.4102478      0.           334.84161377    61.84012222     6.        ]
 [  324.28356934     4.52488708   405.73153687    66.36500549    30.        ]
 [  392.15686035     3.01659131   488.68777466    64.85671234     2.        ]
 [  461.53845215     6.03318262   549.01959229    75.41477966    22.        ]
 [  618.40118408     6.03318262   713.42382812    75.41477966     2.        ]
 [  698.34088135     6.03318262   787.33032227    70.88989258    30.        ]
 [  773.75567627     7.54147816   867.2699585     73.90648651     2.        ]
 [  825.03771973     9.04977417   975.86724854    96.53092194     7.        ]
 [  920.06030273     4.52488708  1000.            78.4313736     12.        ]]
total_anchors 3348
inds_inside 0
anchors.shape (0, 4)
[]
[[    0.             0.            85.97284698    63.34841537    25.        ]
 [   70.88989258     0.           144.79638672    63.34841537    33.        ]
 [  131.22172546     1.50829566   209.65309143    66.36500549     5.        ]
 [  193.06184387     0.           265.46002197    66.36500549    12.        ]
 [  256.4102478      0.           334.84161377    61.84012222     6.        ]
 [  324.28356934     4.52488708   405.73153687    66.36500549    30.        ]
 [  392.15686035     3.01659131   488.68777466    64.85671234     2.        ]
 [  461.53845215     6.03318262   549.01959229    75.41477966    22.        ]
 [  618.40118408     6.03318262   713.42382812    75.41477966     2.        ]
 [  698.34088135     6.03318262   787.33032227    70.88989258    30.        ]
 [  773.75567627     7.54147816   867.2699585     73.90648651     2.        ]
 [  825.03771973     9.04977417   975.86724854    96.53092194     7.        ]
 [  920.06030273     4.52488708  1000.            78.4313736     12.        ]]
Traceback (most recent call last):
  File "train.py", line 129, in <module>
    net(im_data, im_info, gt_boxes, gt_ishard, dontcare_areas)
  File "/home/cadene/anaconda3/envs/faster_rcnn/lib/python3.6/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/cadene/Documents/faster_rcnn_pytorch_python3/faster_rcnn/faster_rcnn.py", line 215, in forward
    features, rois = self.rpn(im_data, im_info, gt_boxes, gt_ishard, dontcare_areas)
  File "/home/cadene/anaconda3/envs/faster_rcnn/lib/python3.6/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/cadene/Documents/faster_rcnn_pytorch_python3/faster_rcnn/faster_rcnn.py", line 77, in forward
    im_info, self._feat_stride, self.anchor_scales)
  File "/home/cadene/Documents/faster_rcnn_pytorch_python3/faster_rcnn/faster_rcnn.py", line 148, in anchor_target_layer
    anchor_target_layer_py(rpn_cls_score, gt_boxes, gt_ishard, dontcare_areas, im_info, _feat_stride, anchor_scales)
  File "/home/cadene/Documents/faster_rcnn_pytorch_python3/faster_rcnn/rpn_msr/anchor_target_layer.py", line 150, in anchor_target_layer
    gt_argmax_overlaps = overlaps.argmax(axis=0)  # G
ValueError: attempt to get argmax of an empty sequence

Obviously it is due to the fact that all_anchors contains anchors which are not "inside the image".
https://github.com/longcw/faster_rcnn_pytorch/blob/master/faster_rcnn/rpn_msr/anchor_target_layer.py#L118
I can't figure out how to fix this...

all_anchors [[  -84.   -40.    99.    55.]
 [ -176.   -88.   191.   103.]
 [ -360.  -184.   375.   199.]
 ..., 
 [  940.     0.  1027.   175.]
 [  896.   -88.  1071.   263.]
 [  808.  -264.  1159.   439.]]
total_anchors 3348
inds_inside 0
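
A minimal sketch of the check involved, using the six anchors printed above and the image size from the log (height 97, width 1000). The function name `anchors_inside_image` and its `allowed_border` parameter are hypothetical stand-ins that follow the usual py-faster-rcnn filter. With a border of 0, no anchor fits inside such a short image, which is why the overlap matrix is empty. Two possible workarounds, both assumptions rather than the repo's behavior, are to skip images that keep no anchors or to allow a border.

```python
def anchors_inside_image(anchors, im_height, im_width, allowed_border=0):
    """Return indices of (x1, y1, x2, y2) anchors lying inside the image.

    allowed_border lets anchors stick out of the image by that many pixels.
    """
    keep = []
    for i, (x1, y1, x2, y2) in enumerate(anchors):
        if (x1 >= -allowed_border and y1 >= -allowed_border
                and x2 < im_width + allowed_border
                and y2 < im_height + allowed_border):
            keep.append(i)
    return keep


anchors = [
    (-84, -40, 99, 55), (-176, -88, 191, 103), (-360, -184, 375, 199),
    (940, 0, 1027, 175), (896, -88, 1071, 263), (808, -264, 1159, 439),
]

strict = anchors_inside_image(anchors, im_height=97, im_width=1000)
relaxed = anchors_inside_image(anchors, im_height=97, im_width=1000,
                               allowed_border=100)

if not strict:
    # Guard: with no anchors kept there is nothing to take an argmax over,
    # so such an image would have to be skipped (or the border relaxed).
    print("no anchors inside the image; skipping")
print(strict, relaxed)
```

Filtering out very small or extremely elongated images when building the dataset avoids the situation altogether.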

SmoothL1Loss is different from the paper

I can see that your SmoothL1Loss is different from the one in the original paper. How do the two compare?

license

I didn't find any information about the license in the readme.
What's the license of this repo?