smallcorgi / faster-rcnn_tf

Faster-RCNN in Tensorflow

License: MIT License

Shell 0.60% Makefile 0.02% Python 89.01% C++ 7.28% Cuda 1.10% Cython 2.00%

tensorflow detection faster-rcnn

faster-rcnn_tf's Issues

Can this code run on multiple GPUs?

Can this code run on multiple GPUs? Thanks.

When I run python ./tools/demo.py --model models/VGGnet_fast_rcnn_iter_70000.ckpt

When I run python ./tools/demo.py --model models/VGGnet_fast_rcnn_iter_70000.ckpt,
I get the following result:

usage: demo.py [-h] [--gpu GPU_ID] [--cpu] [--net {vgg16,zf}]
demo.py: error: unrecognized arguments: --model models/VGGnet_fast_rcnn_iter_70000.ckpt

How do I deal with it?
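The usage line above lists only --gpu, --cpu and --net, so the copy of demo.py being run does not define a --model option. A minimal sketch of a parser that accepts it, assuming the option names from the usage line; the `model` destination, defaults and help strings are illustrative and not taken from the repository:

```python
import argparse


def parse_args(argv=None):
    """Parse demo options; --model is the option the usage line lacks."""
    parser = argparse.ArgumentParser(description="Faster R-CNN demo (sketch)")
    parser.add_argument("--gpu", dest="gpu_id", type=int, default=0,
                        help="GPU device id to use")
    parser.add_argument("--cpu", dest="cpu_mode", action="store_true",
                        help="use CPU mode")
    parser.add_argument("--net", dest="demo_net", choices=["vgg16", "zf"],
                        default="vgg16", help="network to use")
    # Hypothetical addition: path to the checkpoint to load.
    parser.add_argument("--model", dest="model", default=None,
                        help="path to the .ckpt file")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args(["--model", "models/VGGnet_fast_rcnn_iter_70000.ckpt"])
    print(args.model)
```

If the checkout really is older than the README, updating it may be simpler than patching the parser by hand.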

Training instructions

Can you provide training instructions?
I want to train on my own data in PASCAL VOC format.

Thank you.

failed to make roi_pooling_layer in TF GPU version

Environment: TensorFlow installed via pip: sudo pip install --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.11.0rc0-cp27-none-linux_x86_64.whl
Building roi_pooling_layer with make.sh failed,
so I ran the steps separately.
The first step succeeds: nvcc -std=c++11 -c -o roi_pooling_op.cu.o roi_pooling_op_gpu.cu.cc -I $TF_INC -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC -arch=sm_52

The next step is: g++ -std=c++11 -shared -o roi_pooling.so roi_pooling_op.cc roi_pooling_op.cu.o -I $(python -c 'import tensorflow as tf; print(tf.sysconfig.get_include())') -fPIC -lcudart -L $/usr/local/cuda/lib64
but it reports many errors, shown below:

In file included from /usr/local/lib/python2.7/dist-packages/tensorflow/include/tensorflow/core/lib/gtl/array_slice.h:101:0,
from /usr/local/lib/python2.7/dist-packages/tensorflow/include/tensorflow/core/lib/strings/str_util.h:23,
from /usr/local/lib/python2.7/dist-packages/tensorflow/include/tensorflow/core/framework/op.h:29,
from roi_pooling_op.cc:22:
/usr/local/lib/python2.7/dist-packages/tensorflow/include/tensorflow/core/lib/gtl/array_slice_internal.h:232:38: error: ‘tensorflow::gtl::array_slice_internal::ArraySliceImplBase::ArraySliceImplBase’ names constructor
/usr/local/lib/python2.7/dist-packages/tensorflow/include/tensorflow/core/lib/gtl/array_slice_internal.h:252:32: error: ‘tensorflow::gtl::array_slice_internal::ArraySliceImplBase::ArraySliceImplBase’ names constructor
In file included from /usr/local/lib/python2.7/dist-packages/tensorflow/include/tensorflow/core/lib/gtl/array_slice.h:102:0,
from /usr/local/lib/python2.7/dist-packages/tensorflow/include/tensorflow/core/lib/strings/str_util.h:23,
from /usr/local/lib/python2.7/dist-packages/tensorflow/include/tensorflow/core/framework/op.h:29,
from roi_pooling_op.cc:22:
/usr/local/lib/python2.7/dist-packages/tensorflow/include/tensorflow/core/lib/gtl/inlined_vector.h: In member function ‘void tensorflow::gtl::InlinedVector<T, N>::Destroy(T*, int)’:
/usr/local/lib/python2.7/dist-packages/tensorflow/include/tensorflow/core/lib/gtl/inlined_vector.h:394:10: error: ‘is_trivially_destructible’ is not a member of ‘std’
/usr/local/lib/python2.7/dist-packages/tensorflow/include/tensorflow/core/lib/gtl/inlined_vector.h:394:42: error: expected primary-expression before ‘>’ token
/usr/local/lib/python2.7/dist-packages/tensorflow/include/tensorflow/core/lib/gtl/inlined_vector.h:394:43: error: ‘::value’ has not been declared
In file included from roi_pooling_op.cc:23:0:
/usr/local/lib/python2.7/dist-packages/tensorflow/include/tensorflow/core/framework/op_kernel.h: At global scope:
/usr/local/lib/python2.7/dist-packages/tensorflow/include/tensorflow/core/framework/op_kernel.h:169:19: error: ‘tensorflow::OpKernel::OpKernel’ names constructor
In file included from /usr/local/lib/python2.7/dist-packages/tensorflow/include/tensorflow/core/lib/strings/str_util.h:23:0,
from /usr/local/lib/python2.7/dist-packages/tensorflow/include/tensorflow/core/framework/op.h:29,
from roi_pooling_op.cc:22:
/usr/local/lib/python2.7/dist-packages/tensorflow/include/tensorflow/core/lib/gtl/array_slice.h: In instantiation of ‘tensorflow::gtl::ArraySlice::ArraySlice(const tensorflow::gtl::InlinedVector<T, N>&) [with int N = 4; T = tensorflow::DataType]’:
/usr/local/lib/python2.7/dist-packages/tensorflow/include/tensorflow/core/framework/types.h:86:36: required from here
/usr/local/lib/python2.7/dist-packages/tensorflow/include/tensorflow/core/lib/gtl/array_slice.h:140:33: error: no matching function for call to ‘tensorflow::gtl::array_slice_internal::ArraySliceImpl<tensorflow::DataType>::ArraySliceImpl(tensorflow::gtl::InlinedVector<tensorflow::DataType, 4>::const_pointer, std::size_t)’
...............................
usr/local/lib/python2.7/dist-packages/tensorflow/include/tensorflow/core/lib/gtl/array_slice_internal.h:230:7: note: constexpr tensorflow::gtl::array_slice_internal::ArraySliceImpl<std::pair<std::basic_string, tensorflow::FunctionDefHelper::AttrValueWrapper> >::ArraySliceImpl(const tensorflow::gtl::array_slice_internal::ArraySliceImpl<std::pair<std::basic_string, tensorflow::FunctionDefHelper::AttrValueWrapper> >&)
/usr/local/lib/python2.7/dist-packages/tensorflow/include/tensorflow/core/lib/gtl/array_slice_internal.h:230:7: note: candidate expects 1 argument, 2 provided
/usr/local/lib/python2.7/dist-packages/tensorflow/include/tensorflow/core/lib/gtl/array_slice_internal.h:230:7: note: constexpr tensorflow::gtl::array_slice_internal::ArraySliceImpl<std::pair<std::basic_string, tensorflow::FunctionDefHelper::AttrValueWrapper> >::ArraySliceImpl(tensorflow::gtl::array_slice_internal::ArraySliceImpl<std::pair<std::basic_string, tensorflow::FunctionDefHelper::AttrValueWrapper> >&&)
/usr/local/lib/python2.7/dist-packages/tensorflow/include/tensorflow/core/lib/gtl/array_slice_internal.h:230:7: note: candidate expects 1 argument, 2 provided

If anyone knows how to solve this problem, please help; I would appreciate it.
Note: CUDA version 7.5, cuDNN version v3.0.

Demo Fail roi_pooling.so: undefined symbol

I followed the instructions and installed all dependencies. I am running with GPU-enabled TensorFlow and Python 2.7, and get the following output:

Traceback (most recent call last):
  File "./tools/demo.py", line 11, in <module>
    from networks.factory import get_network
  File "/home/misko/Workspace/Faster-RCNN_TF/tools/../lib/networks/__init__.py", line 8, in <module>
    from .VGGnet_train import VGGnet_train
  File "/home/misko/Workspace/Faster-RCNN_TF/tools/../lib/networks/VGGnet_train.py", line 2, in <module>
    from networks.network import Network
  File "/home/misko/Workspace/Faster-RCNN_TF/tools/../lib/networks/network.py", line 3, in <module>
    import roi_pooling_layer.roi_pooling_op as roi_pool_op
  File "/home/misko/Workspace/Faster-RCNN_TF/tools/../lib/roi_pooling_layer/roi_pooling_op.py", line 5, in <module>
    _roi_pooling_module = tf.load_op_library(filename)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/load_library.py", line 64, in load_op_library
    None, None, error_msg, error_code)
tensorflow.python.framework.errors_impl.NotFoundError: /home/misko/Workspace/Faster-RCNN_TF/tools/../lib/roi_pooling_layer/roi_pooling.so: undefined symbol: _ZN10tensorflow7strings6StrCatB5cxx11ERKNS0_8AlphaNumE

How can I make this run?
I have seen this issue in other threads, but with no solution. Has anyone figured it out?
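The `B5cxx11` fragment in the missing symbol is the mangled form of the `cxx11` ABI tag that GCC 5 and later attach to symbols involving `std::string`. A commonly reported workaround is to compile the op with `-D_GLIBCXX_USE_CXX11_ABI=0`, on the assumption (not confirmed in this thread) that the installed TensorFlow binary was built with the older ABI. The helper below is a hypothetical sketch that only inspects the error text:

```python
OLD_ABI_FLAG = "-D_GLIBCXX_USE_CXX11_ABI=0"


def suggest_abi_flag(error_message):
    """Return the old-ABI define if the message names a cxx11-tagged symbol."""
    # "B5cxx11" is how the Itanium C++ ABI encodes the abi_tag "cxx11".
    if "B5cxx11" in error_message:
        return OLD_ABI_FLAG
    return None


if __name__ == "__main__":
    msg = ("undefined symbol: "
           "_ZN10tensorflow7strings6StrCatB5cxx11ERKNS0_8AlphaNumE")
    print(suggest_abi_flag(msg))
```

If the flag applies, it would go on the g++ line in lib/make.sh, followed by a rebuild of roi_pooling.so.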

Failed to run demo

Tried running demo.py and hit the following error:

raise errors._make_specific_exception(None, None, error_msg, error_code)
tensorflow.python.framework.errors.NotFoundError: ~/Faster-RCNN_TF/tools/../lib/roi_pooling_layer/roi_pooling.so: undefined symbol: _ZN10tensorflow7strings6StrCatB5cxx11ERKNS0_8AlphaNumE

My system had TensorFlow v0.11.0 installed, and I was using Python 2.7.

Thanks!
cloud

cannot run demo on CPU mode

Running inside the latest TensorFlow Docker image:

docker run -it -p 8888:8888 tensorflow/tensorflow

root@f54905c5bdaf:/notebooks/Faster-RCNN_TF# python ./tools/demo.py --model /VGGnet_fast_rcnn_iter_70000.ckpt
Traceback (most recent call last):
  File "./tools/demo.py", line 11, in <module>
    from networks.factory import get_network
  File "/notebooks/Faster-RCNN_TF/tools/../lib/networks/__init__.py", line 8, in <module>
    from .VGGnet_train import VGGnet_train
  File "/notebooks/Faster-RCNN_TF/tools/../lib/networks/VGGnet_train.py", line 2, in <module>
    from networks.network import Network
  File "/notebooks/Faster-RCNN_TF/tools/../lib/networks/network.py", line 3, in <module>
    import roi_pooling_layer.roi_pooling_op as roi_pool_op
  File "/notebooks/Faster-RCNN_TF/tools/../lib/roi_pooling_layer/roi_pooling_op.py", line 5, in <module>
    _roi_pooling_module = tf.load_op_library(filename)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/load_library.py", line 63, in load_op_library
    raise errors._make_specific_exception(None, None, error_msg, error_code)
tensorflow.python.framework.errors.NotFoundError: /notebooks/Faster-RCNN_TF/tools/../lib/roi_pooling_layer/roi_pooling.so: undefined symbol: _Z22ROIPoolBackwardLaucherPKffiiiiiiiS0_PfPKiRKN5Eigen9GpuDeviceE

root@f54905c5bdaf:/notebooks/Faster-RCNN_TF# nm -gC lib/roi_pooling_layer/roi_pooling.so |grep GpuDevice
U ROIPoolForwardLaucher(float const*, float, int, int, int, int, int, int, float const*, float*, int*, Eigen::GpuDevice const&)
U ROIPoolBackwardLaucher(float const*, float, int, int, int, int, int, int, int, float const*, float*, int const*, Eigen::GpuDevice const&)
U Eigen::GpuDevice const& tensorflow::OpKernelContext::eigen_device<Eigen::GpuDevice>() const

make issue: g++: error: roi_pooling_op.cu.o: No such file or directory

cd $FRCN_ROOT/lib
make

After this step I am getting an error:

aquib@javed:~/Faster-RCNN_TF/lib$ make all
python setup.py build_ext --inplace
running build_ext
skipping 'utils/bbox.c' Cython extension (up-to-date)
skipping 'utils/nms.c' Cython extension (up-to-date)
skipping 'nms/cpu_nms.c' Cython extension (up-to-date)
skipping 'nms/gpu_nms.cpp' Cython extension (up-to-date)
rm -rf build
bash make.sh
make.sh: line 13: nvcc: command not found
g++: error: roi_pooling_op.cu.o: No such file or directory
g++: error: GOOGLE_CUDA=1: No such file or directory

What is the problem, and how can I fix it?
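`nvcc: command not found` means the shell could not locate the CUDA compiler, so roi_pooling_op.cu.o was never produced and the g++ step that follows has nothing to link. A small pre-flight check, sketched for Python 3 (it relies on shutil.which) and assuming the common /usr/local/cuda/bin install location; the function name is hypothetical:

```python
import os
import shutil


def find_nvcc(extra_dirs=("/usr/local/cuda/bin",)):
    """Return a path to nvcc, or None if it cannot be found."""
    # First try the directories listed in PATH.
    found = shutil.which("nvcc")
    if found:
        return found
    # Then try well-known install directories that may be missing from PATH.
    for directory in extra_dirs:
        candidate = os.path.join(directory, "nvcc")
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    path = find_nvcc()
    if path is None:
        print("nvcc not found: install the CUDA toolkit or add its bin directory to PATH")
    else:
        print("nvcc found at", path)
```

If nvcc exists only in one of the extra directories, adding that directory to PATH before running make is usually enough.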

cudaCheckError() failed : invalid device function

I spent so much time debugging this issue that I am posting the answer here.
When running demo.py as stated in the README, I got the error cudaCheckError() failed : invalid device function, with no traceback. It happened when this line was executed: https://github.com/smallcorgi/Faster-RCNN_TF/blob/master/lib/fast_rcnn/test.py#L169

I have never seen this error in any of my other TensorFlow projects.

This issue is similar to this one in the Python version of Faster R-CNN: rbgirshick/py-faster-rcnn#2
I solved it by updating the arch code in https://github.com/smallcorgi/Faster-RCNN_TF/blob/master/lib/make.sh#L9 and https://github.com/smallcorgi/Faster-RCNN_TF/blob/master/lib/setup.py#L137
I don't know how to find the arch code for an arbitrary GPU, but for the Tesla K80, sm_37 seems to work.

Could something be changed so that it works for any GPU, or could a note be added to the README?

I hope this helps people who hit the same issue.
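The arch code is `sm_` followed by the GPU's CUDA compute capability with the dot removed. TensorFlow logs the capability at start-up in lines of the form `major: 3 minor: 5`, and NVIDIA publishes the value for each GPU model. A sketch of the mapping; the helper name is hypothetical:

```python
def nvcc_arch_flag(major, minor):
    """Build the nvcc -arch flag for a given CUDA compute capability."""
    return "-arch=sm_{}{}".format(major, minor)


if __name__ == "__main__":
    # Tesla K80 has compute capability 3.7, matching the sm_37 reported above.
    print(nvcc_arch_flag(3, 7))
    # A device logged as "major: 3 minor: 5" would need sm_35.
    print(nvcc_arch_flag(3, 5))
```

The same value has to be used in both lib/make.sh and lib/setup.py, since each compiles its own CUDA kernel.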

Can you give a link to get faster_rcnn_tf.model

Hello!
I tried to run demo.py, but it failed because I do not have faster_rcnn_tf.model. Can you give a link to faster_rcnn_tf.model or to the pre-trained VGG16 model?
Thank you very much.

error when building the cython modules

the error reads like this:

make.sh: line 8: nvcc: command not found
g++: error: roi_pooling_op.cu.o: No such file or directory

Thanks very much! @smallcorgi

Error in roi_pooling_op.cc:

Whenever I run make as per the instructions, I get this error:

In file included from roi_pooling_op.cc:25:0:
work_sharder.h:21:49: fatal error: tensorflow/core/lib/core/threadpool.h: No such file or directory
#include "tensorflow/core/lib/core/threadpool.h"

Please advise what I am missing.

build fail

I am using CUDA 8 and cuDNN 5.1.5, Ubuntu 16.04.

When I build the code, I get the errors below:

python setup.py build_ext --inplace
running build_ext
skipping 'utils/bbox.c' Cython extension (up-to-date)
skipping 'utils/nms.c' Cython extension (up-to-date)
skipping 'nms/cpu_nms.c' Cython extension (up-to-date)
skipping 'nms/gpu_nms.cpp' Cython extension (up-to-date)
rm -rf build
bash make.sh
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.so locally
/usr/lib/gcc/x86_64-linux-gnu/5/include/mwaitxintrin.h(36): error: identifier "__builtin_ia32_monitorx" is undefined

/usr/lib/gcc/x86_64-linux-gnu/5/include/mwaitxintrin.h(42): error: identifier "__builtin_ia32_mwaitx" is undefined

2 errors detected in the compilation of "/tmp/tmpxft_0000529b_00000000-7_roi_pooling_op_gpu.cu.cpp1.ii".
g++: error: roi_pooling_op.cu.o: No such file or directory

Training Error

Hi all,
I am getting the following error while training. I have installed the latest TensorFlow version. Did the blobs dictionary change with the new TensorFlow?
Any help is appreciated.
Thanks

assign pretrain model weights to conv1_2
assign pretrain model biases to conv1_2
assign pretrain model weights to conv2_2
assign pretrain model biases to conv2_2
assign pretrain model weights to conv2_1
assign pretrain model biases to conv2_1
/scratch3/skoppura/Faster-RCNN_TF/tools/../lib/roi_data_layer/minibatch.py:100: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  fg_inds, size=fg_rois_per_this_image, replace=False)
/scratch3/skoppura/Faster-RCNN_TF/tools/../lib/roi_data_layer/minibatch.py:120: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  labels[fg_rois_per_this_image:] = 0
/scratch3/skoppura/Faster-RCNN_TF/tools/../lib/roi_data_layer/minibatch.py:176: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  bbox_targets[ind, start:end] = bbox_target_data[ind, 1:]
/scratch3/skoppura/Faster-RCNN_TF/tools/../lib/roi_data_layer/minibatch.py:177: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  bbox_inside_weights[ind, start:end] = cfg.TRAIN.BBOX_INSIDE_WEIGHTS
Traceback (most recent call last):
  File "./tools/train_net.py", line 96, in <module>
    max_iters=args.max_iters)
  File "/scratch3/skoppura/Faster-RCNN_TF/tools/../lib/fast_rcnn/train.py", line 222, in train_net
    sw.train_model(sess, max_iters)
  File "/scratch3/skoppura/Faster-RCNN_TF/tools/../lib/fast_rcnn/train.py", line 147, in train_model
    feed_dict={self.net.data: blobs['data'], self.net.im_info: blobs['im_info'], self.net.keep_prob: 0.5, \
KeyError: 'im_info'
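The VisibleDeprecationWarning lines in this log are separate from the KeyError: they indicate that a float is being used as an array size or slice bound in minibatch.py. A hedged sketch of the usual remedy, casting the rounded count to int before it is used; the function name and the 0.25 and 128 values are illustrative assumptions, not read from this repository's config:

```python
import numpy as np


def fg_rois_per_image(fg_fraction, rois_per_image):
    """Number of foreground RoIs per image, as a true Python int."""
    # np.round returns a float; int() makes it safe to use as a size or index.
    return int(np.round(fg_fraction * rois_per_image))


n_fg = fg_rois_per_image(0.25, 128)
labels = np.ones(128)
# With an int bound, this slice assignment no longer triggers the warning.
labels[n_fg:] = 0
```

This does not address the KeyError: 'im_info' itself, which comes from the blobs dictionary lacking that key.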

Loaded runtime CuDNN library: 5006 (compatibility version 5000) but source was compiled with 5103 (compatibility version 5100)

I got an error when running demo.py on the GPU.
Environment: ubuntu14.04+python2.7+cuda7.5+cudnn5.1

root@SSAP-G3-Guest:~/Workspace/faster-rcnn-master# python tools/demo.py --model models/VGGnet_fast_rcnn_iter_70000.ckpt

The error is as follows:

Loaded network models/VGGnet_fast_rcnn_iter_70000.ckpt
E tensorflow/stream_executor/cuda/cuda_dnn.cc:378] Loaded runtime CuDNN library: 5006 
(compatibility version 5000) but source was compiled with 5103 (compatibility version 5100).  
If using a binary install, upgrade your CuDNN library to match. 
If building from sources, make sure the library loaded at runtime matches a compatible version specified during compile configuration.
F tensorflow/core/kernels/conv_ops.cc:532] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms) 
Aborted (core dumped)

Thanks!
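The two numbers in the message are cuDNN version codes. For cuDNN releases of this era the code is major*1000 + minor*100 + patch, so 5006 is 5.0.6 and 5103 is 5.1.3: the TensorFlow binary was compiled against cuDNN 5.1, but a 5.0 library is being loaded at runtime. A sketch of the decoding; the helper name is hypothetical:

```python
def decode_cudnn_version(code):
    """Split a cuDNN version code such as 5103 into (major, minor, patch)."""
    major = code // 1000
    minor = (code % 1000) // 100
    patch = code % 100
    return major, minor, patch


if __name__ == "__main__":
    print(decode_cudnn_version(5006))  # the library loaded at runtime
    print(decode_cudnn_version(5103))  # the version TensorFlow was compiled with
```

As the log itself suggests, with a binary install the fix is to upgrade the installed cuDNN library so that it matches.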

ROI pooling layer?

Hi,

I am wondering whether this implementation includes an ROI pooling layer.

My understanding is that ROI pooling is not supported in generic TensorFlow. I could be wrong, though...

Thanks!
cloud

Any ideas on multi-batch training?

As in the original Faster R-CNN, training only allows a single image per batch. In Caffe, the "iter_size" parameter can be adjusted to do multi-batch training, since weights are updated only after "iter_size" iterations, i.e., images. Can this be done in TF?

Thank you.
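The behaviour described for "iter_size" is gradient accumulation: gradients from several single-image iterations are summed, averaged, and applied in one weight update. The following is a framework-agnostic NumPy sketch of that idea on plain arrays, not TensorFlow code from this repository; all names are hypothetical:

```python
import numpy as np


def accumulate_and_step(weights, per_image_grads, lr, iter_size):
    """Apply one SGD update per `iter_size` single-image gradients."""
    acc = np.zeros_like(weights)
    for count, grad in enumerate(per_image_grads, start=1):
        acc += grad
        if count % iter_size == 0:
            # Average the accumulated gradients, then take a single step.
            weights = weights - lr * acc / iter_size
            acc[:] = 0.0
    # Gradients left over from an incomplete group are dropped in this sketch.
    return weights


if __name__ == "__main__":
    w = np.zeros(2)
    grads = [np.ones(2), 3.0 * np.ones(2)]
    print(accumulate_and_step(w, grads, lr=0.1, iter_size=2))
```

In a TensorFlow graph, the same pattern would need the summed gradients kept in non-trainable variables between session runs.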

error when building the cython modules

When I built the Cython modules, the following errors appeared:
#python setup.py build_ext --inplace
running build_ext
skipping 'utils/bbox.c' Cython extension (up-to-date)
skipping 'utils/nms.c' Cython extension (up-to-date)
skipping 'nms/cpu_nms.c' Cython extension (up-to-date)
skipping 'nms/gpu_nms.cpp' Cython extension (up-to-date)
rm -rf build
bash make.sh
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcurand.so locally
In file included from roi_pooling_op.cc:25:0:
work_sharder.h:21:49: fatal error: tensorflow/core/lib/core/threadpool.h: No such file or directory
#include "tensorflow/core/lib/core/threadpool.h"
^
compilation terminated.

I went to the Python lib directory and found that the tensorflow/core/lib/core folder does not contain threadpool.h. My Python is 2.7.8 and the OS is CentOS 6.8. I think my TensorFlow installation is OK, so how can I solve this problem? Has anyone met the same situation?

Issue while training on new dataset

I was training on a new dataset which is based on the format of VOC2007, and got 5000 iterations into training when there was a crash. It looks like something happened while trying to take a snapshot of the weights of the neural net. Any ideas on how to fix this?

Here's the error:
Traceback (most recent call last):
File "./tools/train_net.py", line 95, in <module>
max_iters=args.max_iters)
File "Faster-RCNN_TF-master/tools/../lib/fast_rcnn/train.py", line 209, in train_net
sw.train_model(sess, max_iters)
File "Faster-RCNN_TF-master/tools/../lib/fast_rcnn/train.py", line 166, in train_model
self.snapshot(sess, iter)
File "Faster-RCNN_TF-master/tools/../lib/fast_rcnn/train.py", line 60, in snapshot
sess.run(weights.assign(orig_0 * np.tile(self.bbox_stds, (weights_shape[0],1))))
ValueError: operands could not be broadcast together with shapes (4096,84) (4096,32)
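Both widths in the error are multiples of 4: 84 = 4 * 21 and 32 = 4 * 8. That pattern suggests, though the thread does not confirm it, that the network's box-regression layer was built for 21 classes (the PASCAL VOC count including background) while bbox_stds was computed for an 8-class dataset. A NumPy sketch of the same multiplication with an explicit check; names other than bbox_stds are hypothetical:

```python
import numpy as np


def scale_bbox_weights(weights, bbox_stds):
    """Multiply each row of `weights` by `bbox_stds`, checking widths first."""
    tiled = np.tile(bbox_stds, (weights.shape[0], 1))
    if tiled.shape != weights.shape:
        raise ValueError(
            "layer predicts %d box values but dataset has %d stds; "
            "check that the network's class count matches the dataset"
            % (weights.shape[1], bbox_stds.shape[0]))
    return weights * tiled


if __name__ == "__main__":
    ok = scale_bbox_weights(np.ones((4096, 4 * 21)), np.ones(4 * 21))
    print(ok.shape)
```

If the assumption holds, the class count used when building the training network needs to match the number of classes in the new dataset.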

restore training from ckpt file error

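Since this version of FRCN_TF can't restore training from the ckpt file, I rewrote net.load in a simple way:

def load2(self, data_path, session, ignore_missing=False):
    # Restore every variable in the current graph from the checkpoint.
    saver = tf.train.Saver()
    print('model start restore')
    # Restore into the session passed by the caller; the first draft opened a
    # separate tf.Session here, so the restored values never reached `session`.
    saver.restore(session, data_path)
    print('Model Restored')

but I always get this error: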

W tensorflow/core/framework/op_kernel.cc:940] Not found: Tensor name "Variable" not found in checkpoint files /home/jmy/Desktop/Faster-RCNN_TF-master/VGGnet_fast_rcnn_iter_70000.ckpt
[[Node: save_1/restore_slice = RestoreSlice[dt=DT_FLOAT, preferred_shard=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save_1/Const_0, save_1/restore_slice/tensor_name, save_1/restore_slice/shape_and_slice)]]

In my opinion, we need to change

lr = tf.Variable(cfg.TRAIN.LEARNING_RATE, trainable=False)

to

lr = tf.Variable(cfg.TRAIN.LEARNING_RATE, trainable=False, name='learning_rate')

so that we can restore training from the ckpt.

In addition, I applied this framework to ResNet-50. Doing the same thing as above (restoring training from the ckpt), I get this error:

W tensorflow/core/framework/op_kernel.cc:940] Not found: Tensor name "rpn_conv/3x3/biases/Momentum" not found in checkpoint files /home/jmy/Desktop/Faster-RCNN_TF-master/Resnet_fast_rcnn_iter_1000.ckpt
[[Node: save_1/restore_slice_338 = RestoreSlice[dt=DT_FLOAT, preferred_shard=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save_1/Const_0, save_1/restore_slice_338/tensor_name, save_1/restore_slice_338/shape_and_slice)]]

This error occurs in the layers that have biases. I guess the momentum slots in MomentumOptimizer may cause this issue, but I don't know how to fix it. Maybe:

momentum = tf.Variable(cfg.TRAIN.MOMENTUM,trainable=False,name='momentum')?

out of memory in GTX1080

I trained this model on a machine with a GTX 1080 and 16 GB of memory. It always ends with:

out of memory
invalid argument
an illegal memory access was encountered
E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:662] failed to record completion event; therefore, failed to create inter-stream dependency
I tensorflow/stream_executor/stream.cc:3788] stream 0x50b3390 did not memcpy host-to-device; source: 0x7f1c216f6e60
E tensorflow/stream_executor/stream.cc:272] Error recording event in stream: error recording CUDA event on stream 0x50989c0: CUDA_ERROR_ILLEGAL_ADDRESS; not marking stream as bad, as the Event object may be at fault. Monitor for further errors.
E tensorflow/stream_executor/cuda/cuda_event.cc:49] Error polling for event status: failed to query event: CUDA_ERROR_ILLEGAL_ADDRESS
F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:198] Unexpected Event status: 1
[1] 6900 abort (core dumped) python ./faster_rcnn/train_net.py --gpu 0 --weights --imdb voc_2007_trainval

and README.md says:
Requirements: hardware
For training the end-to-end version of Faster R-CNN with VGG16, 3G of GPU memory is sufficient (using CUDNN)

What's the problem?

How to use a pre-trained model to test my own image dataset?

The pre-trained model is VGGnet_fast_rcnn_iter_70000.ckpt.

missing the license file

Fail to build in CPU only environment

Hi,

Errors were encountered when building Faster-RCNN-TF in CPU only environment.

It still did not succeed even after all "cuda" references were removed from setup.py.

Any suggestions?

Thanks!
cloud

warning: calling a constexpr host function from a host device function is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

I'm not familiar with this. When I run make in /lib, a lot of warnings appear, and I'm not sure whether the build will work. Could anyone answer my question? I'm truly confused.

Here's my log; it doesn't look right.

irondroid@PC:~/Faster-RCNN_TF-master/lib$ make
python setup.py build_ext --inplace
running build_ext
cythoning utils/bbox.pyx to utils/bbox.c
building 'utils.cython_bbox' extension
creating build
creating build/temp.linux-x86_64-2.7
creating build/temp.linux-x86_64-2.7/utils
{'gcc': ['-Wno-cpp', '-Wno-unused-function']}
gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/irondroid/anaconda2/lib/python2.7/site-packages/numpy/core/include -I/home/irondroid/anaconda2/include/python2.7 -c utils/bbox.c -o build/temp.linux-x86_64-2.7/utils/bbox.o -Wno-cpp -Wno-unused-function
gcc -pthread -shared -L/home/irondroid/anaconda2/lib -Wl,-rpath=/home/irondroid/anaconda2/lib,--no-as-needed build/temp.linux-x86_64-2.7/utils/bbox.o -L/home/irondroid/anaconda2/lib -lpython2.7 -o /home/irondroid/Faster-RCNN_TF-master/lib/utils/cython_bbox.so
cythoning utils/nms.pyx to utils/nms.c
building 'utils.cython_nms' extension
{'gcc': ['-Wno-cpp', '-Wno-unused-function']}
gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/irondroid/anaconda2/lib/python2.7/site-packages/numpy/core/include -I/home/irondroid/anaconda2/include/python2.7 -c utils/nms.c -o build/temp.linux-x86_64-2.7/utils/nms.o -Wno-cpp -Wno-unused-function
gcc -pthread -shared -L/home/irondroid/anaconda2/lib -Wl,-rpath=/home/irondroid/anaconda2/lib,--no-as-needed build/temp.linux-x86_64-2.7/utils/nms.o -L/home/irondroid/anaconda2/lib -lpython2.7 -o /home/irondroid/Faster-RCNN_TF-master/lib/utils/cython_nms.so
cythoning nms/cpu_nms.pyx to nms/cpu_nms.c
building 'nms.cpu_nms' extension
creating build/temp.linux-x86_64-2.7/nms
{'gcc': ['-Wno-cpp', '-Wno-unused-function']}
gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/irondroid/anaconda2/lib/python2.7/site-packages/numpy/core/include -I/home/irondroid/anaconda2/include/python2.7 -c nms/cpu_nms.c -o build/temp.linux-x86_64-2.7/nms/cpu_nms.o -Wno-cpp -Wno-unused-function
gcc -pthread -shared -L/home/irondroid/anaconda2/lib -Wl,-rpath=/home/irondroid/anaconda2/lib,--no-as-needed build/temp.linux-x86_64-2.7/nms/cpu_nms.o -L/home/irondroid/anaconda2/lib -lpython2.7 -o /home/irondroid/Faster-RCNN_TF-master/lib/nms/cpu_nms.so
cythoning nms/gpu_nms.pyx to nms/gpu_nms.cpp
building 'nms.gpu_nms' extension
{'gcc': ['-Wno-unused-function'], 'nvcc': ['-arch=sm_35', '--ptxas-options=-v', '-c', '--compiler-options', "'-fPIC'"]}
/usr/local/cuda/bin/nvcc -I/home/irondroid/anaconda2/lib/python2.7/site-packages/numpy/core/include -I/usr/local/cuda/include -I/home/irondroid/anaconda2/include/python2.7 -c nms/nms_kernel.cu -o build/temp.linux-x86_64-2.7/nms/nms_kernel.o -arch=sm_35 --ptxas-options=-v -c --compiler-options '-fPIC'
ptxas info : 0 bytes gmem
ptxas info : Compiling entry function '_Z10nms_kernelifPKfPy' for 'sm_35'
ptxas info : Function properties for _Z10nms_kernelifPKfPy
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 25 registers, 1280 bytes smem, 344 bytes cmem[0], 8 bytes cmem[2]
{'gcc': ['-Wno-unused-function'], 'nvcc': ['-arch=sm_35', '--ptxas-options=-v', '-c', '--compiler-options', "'-fPIC'"]}
gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/irondroid/anaconda2/lib/python2.7/site-packages/numpy/core/include -I/usr/local/cuda/include -I/home/irondroid/anaconda2/include/python2.7 -c nms/gpu_nms.cpp -o build/temp.linux-x86_64-2.7/nms/gpu_nms.o -Wno-unused-function
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
In file included from /home/irondroid/anaconda2/lib/python2.7/site-packages/numpy/core/include/numpy/ndarraytypes.h:1777:0,
from /home/irondroid/anaconda2/lib/python2.7/site-packages/numpy/core/include/numpy/ndarrayobject.h:18,
from /home/irondroid/anaconda2/lib/python2.7/site-packages/numpy/core/include/numpy/arrayobject.h:4,
from nms/gpu_nms.cpp:283:
/home/irondroid/anaconda2/lib/python2.7/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:15:2: warning: #warning "Using deprecated NumPy API, disable it by " "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
#warning "Using deprecated NumPy API, disable it by "
^
g++ -pthread -shared -L/home/irondroid/anaconda2/lib -Wl,-rpath=/home/irondroid/anaconda2/lib,--no-as-needed build/temp.linux-x86_64-2.7/nms/nms_kernel.o build/temp.linux-x86_64-2.7/nms/gpu_nms.o -L/usr/local/cuda/lib64 -L/home/irondroid/anaconda2/lib -Wl,-R/usr/local/cuda/lib64 -lcudart -lpython2.7 -o /home/irondroid/Faster-RCNN_TF-master/lib/nms/gpu_nms.so
rm -rf build
sh make.sh
/home/irondroid/anaconda2/lib/python2.7/site-packages/tensorflow/include
/home/irondroid/anaconda2/lib/python2.7/site-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/src/Core/MathFunctions.h(1052): warning: calling a constexpr host function("real") from a host device function("abs") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

/home/irondroid/anaconda2/lib/python2.7/site-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/src/Core/MathFunctions.h(1052): warning: calling a constexpr host function("imag") from a host device function("abs") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

/home/irondroid/anaconda2/lib/python2.7/site-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/src/Core/MathFunctions.h(1052): warning: calling a constexpr host function from a host device function is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

/home/irondroid/anaconda2/lib/python2.7/site-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/src/Core/MathFunctions.h(1057): warning: calling a constexpr host function("real") from a host device function("abs") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

/home/irondroid/anaconda2/lib/python2.7/site-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/src/Core/MathFunctions.h(1057): warning: calling a constexpr host function("imag") from a host device function("abs") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

/home/irondroid/anaconda2/lib/python2.7/site-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/src/Core/MathFunctions.h(1057): warning: calling a constexpr host function from a host device function is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

Now that TensorFlow supports Windows, I hope the TF version of Faster R-CNN can run on Windows too

Cannot run the demo: from: can't read /var/mail/fast_rcnn.config

Hi,
I tried to run the demo like this:
./tools/demo.py --model ./VGGnet_fast_rcnn_iter_70000.ckpt
But the mouse cursor turns into a cross shape, and after I click several times, it prints:

from: can't read /var/mail/fast_rcnn.config
from: can't read /var/mail/fast_rcnn.test
from: can't read /var/mail/fast_rcnn.nms_wrapper
from: can't read /var/mail/utils.timer
from: can't read /var/mail/networks.factory
./tools/demo.py: line 14: syntax error near unexpected token `('
./tools/demo.py: line 14: `CLASSES = ('__background__','

How can I solve this problem?
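These messages are what a POSIX shell typically prints when it executes Python source as a shell script: `from` resolves to a mail utility (hence /var/mail), and `import` resolves to ImageMagick's screen-capture tool, which shows a cross-shaped cursor and waits for a click. That happens when a script is launched directly and its first line is not a Python shebang. A sketch of a check on that first line; the helper name is hypothetical:

```python
def has_python_shebang(first_line):
    """True if the first line tells the kernel to run the file with Python."""
    return first_line.startswith("#!") and "python" in first_line


if __name__ == "__main__":
    print(has_python_shebang("#!/usr/bin/env python\n"))
    print(has_python_shebang("from fast_rcnn.config import cfg\n"))
```

Invoking the interpreter explicitly, as in python ./tools/demo.py --model ./VGGnet_fast_rcnn_iter_70000.ckpt, avoids the problem regardless of the first line.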

NMS CPU vs GPU

Hi,

I am running into a problem where NMS gives different results on the CPU and on the GPU. I am not sure whether I am missing anything. If anyone knows the reason, please let me know. Thanks a lot!

The config option I am toggling is __C.USE_GPU_NMS in lib/fast_rcnn/config.py

CPU:

Demo for data/demo/000456.jpg
Detection took 5.475s for 300 object proposals

GPU:

Demo for data/demo/000456.jpg
Detection took 4.063s for 3 object proposals

The GPU version more or less failed.
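One way to narrow this down is to run the same detections through a plain NumPy reference and see which back end disagrees with it. The sketch below is a standard greedy NMS over rows of [x1, y1, x2, y2, score] with the inclusive-pixel (+1) box convention; it is written for illustration and is not copied from this repository:

```python
import numpy as np


def nms_reference(dets, thresh):
    """Greedy NMS; returns indices of kept boxes, highest score first."""
    x1, y1, x2, y2, scores = dets[:, 0], dets[:, 1], dets[:, 2], dets[:, 3], dets[:, 4]
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    order = scores.argsort()[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Intersection of the top box with every remaining box.
        w = np.maximum(0.0, np.minimum(x2[i], x2[rest]) - np.maximum(x1[i], x1[rest]) + 1)
        h = np.maximum(0.0, np.minimum(y2[i], y2[rest]) - np.maximum(y1[i], y1[rest]) + 1)
        inter = w * h
        iou = inter / (areas[i] + areas[rest] - inter)
        # Keep only boxes that overlap the top box by at most `thresh`.
        order = rest[iou <= thresh]
    return keep


if __name__ == "__main__":
    boxes = np.array([[0, 0, 10, 10, 0.9],
                      [1, 1, 11, 11, 0.8],
                      [50, 50, 60, 60, 0.7]], dtype=np.float32)
    print(nms_reference(boxes, 0.5))
```

If the CPU path matches this reference and the GPU path does not, the GPU kernel build is the place to look.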

How to train a new detector?

Hi!
I failed to run faster_rcnn_end2end.sh.
Can you explain how to train a new detector?
Thank you very much!

libpng warning: Application built with libpng-1.6.22 but running with 1.5.12

When I run python ./demo.py --model modelpath, I get many warnings that all say the same thing: libpng warning: Application built with libpng-1.6.22 but running with 1.5.12. It still displays some pictures, but I don't know whether the result is right. Could someone help me?

Training speed becomes slower iteration by iteration

Hello,

When I train the model, each iteration gets slower over time. For example, at the beginning the speed is about 0.4 s/iter, but after 10000 iterations it drops to about 1 s/iter. However, the time spent in the TensorFlow session call
rpn_loss_cls_value, rpn_loss_box_value,loss_cls_value, loss_box_value, _ = sess.run([rpn_cross_entropy, rpn_loss_box, cross_entropy, loss_box, train_op], feed_dict=feed_dict)
does not increase.

What's more, CPU time seems much higher than at the beginning, and GPU usage is often 0%. Therefore, I suspect that something is wrong in roi_data_layer, which runs on the CPU.

I have checked the code but cannot find any bug. Has anyone else met this problem, and how did you solve it?

Thank you.
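
One way to confirm where the extra time goes is to accumulate wall-clock time per stage of the training loop and compare the averages early and late in training. The sketch below is a generic helper, not code from the repository; `SectionTimer` and the section names are hypothetical, and the `time.sleep` calls merely stand in for fetching a minibatch and calling `sess.run`.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class SectionTimer(object):
    """Accumulates wall-clock time per named section of a loop."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    @contextmanager
    def section(self, name):
        start = time.time()
        try:
            yield
        finally:
            self.totals[name] += time.time() - start
            self.counts[name] += 1

    def average(self, name):
        count = self.counts[name]
        return self.totals[name] / count if count else 0.0


timer = SectionTimer()
for _ in range(3):
    with timer.section("data"):
        time.sleep(0.001)  # stand-in for fetching the next minibatch
    with timer.section("session_run"):
        time.sleep(0.001)  # stand-in for sess.run(...)
```

Printing `timer.average("data")` every few thousand iterations would show whether the data layer is the part that keeps getting slower.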

I got an error "cudaCheckError() failed : invalid device function".

Hello,
When I run the demo with "./tools/demo.py --model model/VGGnet_fast_rcnn_iter_70000.ckpt",
I get an error: "cudaCheckError() failed : invalid device function".
Could you please tell me how to fix it? The output is:

I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so.7.5 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so.4 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so.7.5 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so.7.5 locally
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties: 
name: Tesla K40m
major: 3 minor: 5 memoryClockRate (GHz) 0.745
pciBusID 0000:02:00.0
Total memory: 11.25GiB
Free memory: 11.12GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x2ac5280
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 1 with properties: 
name: Tesla K40m
major: 3 minor: 5 memoryClockRate (GHz) 0.745
pciBusID 0000:03:00.0
Total memory: 11.25GiB
Free memory: 11.12GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 1 
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y Y 
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 1:   Y Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:838] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K40m, pci bus id: 0000:02:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:838] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Tesla K40m, pci bus id: 0000:03:00.0)
Tensor("Placeholder:0", shape=(?, ?, ?, 3), dtype=float32)
Tensor("conv5_3/conv5_3:0", shape=(?, ?, ?, 512), dtype=float32)
Tensor("rpn_conv/3x3/rpn_conv/3x3:0", shape=(?, ?, ?, 512), dtype=float32)
Tensor("rpn_cls_score/rpn_cls_score:0", shape=(?, ?, ?, 18), dtype=float32)
Tensor("rpn_cls_prob:0", shape=(?, ?, ?, ?), dtype=float32)
Tensor("rpn_cls_prob_reshape:0", shape=(?, ?, ?, 18), dtype=float32)
Tensor("rpn_bbox_pred/rpn_bbox_pred:0", shape=(?, ?, ?, 36), dtype=float32)
Tensor("Placeholder_1:0", shape=(?, 3), dtype=float32)
Tensor("conv5_3/conv5_3:0", shape=(?, ?, ?, 512), dtype=float32)
Tensor("rois:0", shape=(?, 5), dtype=float32)
[<tf.Tensor 'conv5_3/conv5_3:0' shape=(?, ?, ?, 512) dtype=float32>, <tf.Tensor 'rois:0' shape=(?, 5) dtype=float32>]
Tensor("fc7/fc7:0", shape=(?, 4096), dtype=float32)


Loaded network model/VGGnet_fast_rcnn_iter_70000.ckpt
cudaCheckError() failed : invalid device function
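
A commonly reported cause of "invalid device function" is a CUDA kernel compiled for a different compute capability than the GPU that runs it; that is an assumption here, not something the log confirms. The log does report `major: 3 minor: 5` for the Tesla K40m, and nvcc takes the target as an `-arch=sm_XY` flag. The sketch below, with hypothetical helper names, derives the flag from such a log line so it can be compared with the one used to build the roi_pooling op.

```python
import re


def parse_compute_capability(log_line):
    """Extract (major, minor) from a TensorFlow device line, or None if absent."""
    match = re.search(r"major:\s*(\d+)\s+minor:\s*(\d+)", log_line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def nvcc_arch_flag(major, minor):
    """Build the nvcc architecture flag for a given compute capability."""
    return "-arch=sm_{0}{1}".format(major, minor)


capability = parse_compute_capability("major: 3 minor: 5 memoryClockRate (GHz) 0.745")
flag = nvcc_arch_flag(*capability)
```

If the flag in the build script names a different architecture, rebuilding the op with the matching one is worth trying.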

Specify which GPU to use

I noticed that ./tools/train_net.py has a variable device_name that receives the GPU id from the command line, but it is never actually used. Does that mean this model can only be trained on the default GPU?

I am not familiar with TensorFlow, so I wonder how I can specify which GPU to use, or even use the CPU.
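
Independently of what train_net.py does with its arguments, the CUDA runtime honors the `CUDA_VISIBLE_DEVICES` environment variable, so the choice can be made before TensorFlow starts. The sketch below is a minimal illustration; `restrict_visible_gpus` is a hypothetical helper and it assumes it is called before TensorFlow is imported.

```python
import os


def restrict_visible_gpus(gpu_ids):
    """Limit the GPUs that CUDA exposes to this process.

    Must run before TensorFlow initializes CUDA. An empty list hides
    every GPU, which leaves only the CPU available.
    """
    value = ",".join(str(gpu_id) for gpu_id in gpu_ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Expose only the second physical GPU; inside the process it appears as device 0.
restrict_visible_gpus([1])
```

The same effect is available from the shell by prefixing the command with `CUDA_VISIBLE_DEVICES=1`. TensorFlow's `tf.device` context manager is another option when the graph-building code can be edited.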

How to run the demo.py in tools directory?

I tried 'python demo.py --model model_path' (where model_path is where I put VGG16_imagenet.npy), but it failed.
Do I have the right model?
Please give me some help, thanks.

Error compiling Cython file

python setup.py build_ext --inplace
running build_ext
cythoning utils/bbox.pyx to utils/bbox.c

Error compiling Cython file:
------------------------------------------------------------
...
# Copyright (c) 2015 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Written by Sergey Karayev
# --------------------------------------------------------

cimport cython
       ^
------------------------------------------------------------

utils/bbox.pyx:8:8: Compiler crash in AnalyseDeclarationsTransform

File 'ModuleNode.py', line 103, in analyse_declarations: ModuleNode(bbox.pyx:1:0,
    full_module_name = 'utils.cython_bbox')
File 'Nodes.py', line 425, in analyse_declarations: StatListNode(bbox.pyx:8:0)
File 'Nodes.py', line 425, in analyse_declarations: StatListNode(bbox.pyx:8:8)
File 'Nodes.py', line 7346, in analyse_declarations: CImportStatNode(bbox.pyx:8:8,
    module_name = u'cython')

Compiler crash traceback from this point on:
  File "/home/gt/anaconda2/lib/python2.7/site-packages/Cython/Compiler/Nodes.py", line 7346, in analyse_declarations
    self.module_name, self.pos, relative_level=0 if self.is_absolute else -1)
  File "/home/gt/anaconda2/lib/python2.7/site-packages/Cython/Compiler/Symtab.py", line 1159, in find_module
    module_name, relative_to=relative_to, pos=pos, absolute_fallback=absolute_fallback)
  File "/home/gt/anaconda2/lib/python2.7/site-packages/Cython/Compiler/Main.py", line 178, in find_module
    pxd_pathname = self.find_pxd_file(qualified_name, pos)
  File "/home/gt/anaconda2/lib/python2.7/site-packages/Cython/Compiler/Main.py", line 239, in find_pxd_file
    pxd = self.search_include_directories(qualified_name, ".pxd", pos, sys_path=sys_path)
  File "/home/gt/anaconda2/lib/python2.7/site-packages/Cython/Compiler/Main.py", line 280, in search_include_directories
    tuple(self.include_directories), qualified_name, suffix, pos, include, sys_path)
  File "/home/gt/anaconda2/lib/python2.7/site-packages/Cython/Utils.py", line 29, in wrapper
    res = cache[args] = f(*args)
  File "/home/gt/anaconda2/lib/python2.7/site-packages/Cython/Utils.py", line 119, in search_include_directories
    path = os.path.join(dir, dotted_filename)
  File "/home/gt/anaconda2/lib/python2.7/posixpath.py", line 73, in join
    path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 9: ordinal not in range(128)
building 'utils.cython_bbox' extension
creating build
creating build/temp.linux-x86_64-2.7
creating build/temp.linux-x86_64-2.7/utils
{'gcc': ['-Wno-cpp', '-Wno-unused-function']}
gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/gt/anaconda2/lib/python2.7/site-packages/numpy/core/include -I/home/gt/anaconda2/include/python2.7 -c utils/bbox.c -o build/temp.linux-x86_64-2.7/utils/bbox.o -Wno-cpp -Wno-unused-function
utils/bbox.c:1:2: error: #error Do not use this file, it is the result of a failed Cython compilation.
 #error Do not use this file, it is the result of a failed Cython compilation.
  ^
error: command 'gcc' failed with exit status 1
Makefile:2: recipe for target 'all' failed
make: *** [all] Error 1
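
The final UnicodeDecodeError suggests that one of the directories Cython joins while searching for .pxd files contains a non-ASCII byte, for example a directory name in a non-English script. That reading is an assumption; the traceback does not show the path itself. The sketch below uses a hypothetical helper to locate the first such byte in a path, and the example path is made up for illustration.

```python
def first_non_ascii_byte(path_bytes):
    """Return (index, byte value) of the first non-ASCII byte, or None."""
    for index, byte in enumerate(bytearray(path_bytes)):
        if byte > 127:
            return index, byte
    return None


# Made-up example: a UTF-8 encoded directory name directly under /home/gt/.
example = u"/home/gt/\u4e2d/Faster-RCNN_TF".encode("utf-8")
position = first_non_ascii_byte(example)
```

Running the helper over the working directory and the entries of the include path would show which one needs to be moved or renamed to an ASCII-only location.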

Fine-tuning

Hello,
I'm interested in fine-tuning a pre-trained model. Do you have any recommendations on where to start, which layers I should train, and which I should keep fixed? Is there a good tutorial on that?
Thank you

roi_pooling_op.cu.o: No such file or directory

When I run make, I get the error: roi_pooling_op.cu.o: No such file or directory
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
make.sh: line 13: nvcc: command not found
g++: error: roi_pooling_op.cu.o: No such file or directory

Please suggest what I need to do
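
The line "nvcc: command not found" means the shell could not find the CUDA compiler on PATH, so the object file the linker then asks for was never produced. The sketch below checks for nvcc from Python; `find_nvcc` is a hypothetical helper, and /usr/local/cuda/bin is only an assumed default install location on Linux.

```python
import os
import shutil


def find_nvcc(extra_dirs=("/usr/local/cuda/bin",)):
    """Return the path to nvcc if it can be found, otherwise None."""
    found = shutil.which("nvcc")  # searches PATH; available in Python 3.3+
    if found:
        return found
    # Fall back to directories that are assumed, not known, to hold a CUDA install.
    for directory in extra_dirs:
        candidate = os.path.join(directory, "nvcc")
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```

If the helper finds nvcc only in the fallback directory, adding that directory to PATH before running make should let the build script locate it. If it returns None, the CUDA toolkit is probably not installed.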

`AttributeError: 'module' object has no attribute 'RegisterShape'`

Hi,
I am trying to run this program on AWS and I get the error below:

Traceback (most recent call last):
File "./tools/demo.py", line 11, in <module>
from networks.factory import get_network
File "/home/ubuntu/Faster-RCNN_TF/tools/../lib/networks/__init__.py", line 8, in <module>
from .VGGnet_train import VGGnet_train
File "/home/ubuntu/Faster-RCNN_TF/tools/../lib/networks/VGGnet_train.py", line 2, in <module>
from networks.network import Network
File "/home/ubuntu/Faster-RCNN_TF/tools/../lib/networks/network.py", line 4, in <module>
import roi_pooling_layer.roi_pooling_op_grad
File "/home/ubuntu/Faster-RCNN_TF/tools/../lib/roi_pooling_layer/roi_pooling_op_grad.py", line 7, in <module>
@tf.RegisterShape("RoiPool")
AttributeError: 'module' object has no attribute 'RegisterShape'

Can anyone suggest a solution to this problem?
Thank you.

Any success running in mac os 10.11.5?

I get the following error on Mac OS 10.11.5 when running make in the lib directory.
...
"typeinfo for tensorflow::OpKernel", referenced from:
typeinfo for RoiPoolOp<Eigen::ThreadPoolDevice, float> in roi_pooling_op-0e649f.o
typeinfo for RoiPoolGradOp<Eigen::ThreadPoolDevice, float> in roi_pooling_op-0e649f.o
typeinfo for RoiPoolOp<Eigen::GpuDevice, float> in roi_pooling_op-0e649f.o
typeinfo for RoiPoolGradOp<Eigen::GpuDevice, float> in roi_pooling_op-0e649f.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)

using (tensorflow)➜ lib git:(master) ✗ clang --version
Apple LLVM version 7.3.0 (clang-703.0.31)
Target: x86_64-apple-darwin15.5.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

Ran out of memory

W tensorflow/core/common_runtime/bfc_allocator.cc:275] Ran out of memory trying to allocate 392.00MiB. See logs for memory state.
W tensorflow/core/framework/op_kernel.cc:965] Internal: Dst tensor is not initialized.
E tensorflow/core/common_runtime/executor.cc:390] Executor failed to create kernel. Internal: Dst tensor is not initialized.
[[Node: zeros_24 = Constdtype=DT_FLOAT, value=Tensor<type: float shape: [25088,4096] values: [0 0 0]...>, _device="/job:localhost/replica:0/task:0/gpu:0"]]
Traceback (most recent call last):
File "./tools/train_net.py", line 96, in <module>
max_iters=args.max_iters)
File "/home/deepinsight/Faster-RCNN_TF/tools/../lib/fast_rcnn/train.py", line 222, in train_net
sw.train_model(sess, max_iters)
File "/home/deepinsight/Faster-RCNN_TF/tools/../lib/fast_rcnn/train.py", line 134, in train_model
sess.run(tf.initialize_all_variables())
File "/home/deepinsight/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 766, in run
run_metadata_ptr)
File "/home/deepinsight/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 964, in _run
feed_dict_string, options, run_metadata)
File "/home/deepinsight/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1014, in _do_run
target_list, options, run_metadata)
File "/home/deepinsight/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1034, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Dst tensor is not initialized.
[[Node: zeros_24 = Constdtype=DT_FLOAT, value=Tensor<type: float shape: [25088,4096] values: [0 0 0]...>, _device="/job:localhost/replica:0/task:0/gpu:0"]]

Caused by op u'zeros_24', defined at:
File "./tools/train_net.py", line 96, in <module>
max_iters=args.max_iters)
File "/home/deepinsight/Faster-RCNN_TF/tools/../lib/fast_rcnn/train.py", line 222, in train_net
sw.train_model(sess, max_iters)
File "/home/deepinsight/Faster-RCNN_TF/tools/../lib/fast_rcnn/train.py", line 131, in train_model
train_op = tf.train.MomentumOptimizer(lr, momentum).minimize(loss, global_step=global_step)
File "/home/deepinsight/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 279, in minimize
name=name)
File "/home/deepinsight/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 393, in apply_gradients
self._create_slots(var_list)
File "/home/deepinsight/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/momentum.py", line 51, in _create_slots
self._zeros_slot(v, "momentum", self._name)
File "/home/deepinsight/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 593, in _zeros_slot
named_slots[var] = slot_creator.create_zeros_slot(var, op_name)
File "/home/deepinsight/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/slot_creator.py", line 106, in create_zeros_slot
val = array_ops.zeros(primary.get_shape().as_list(), dtype=dtype)
File "/home/deepinsight/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/array_ops.py", line 1362, in zeros
output = constant(zero, shape=shape, dtype=dtype, name=name)
File "/home/deepinsight/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/constant_op.py", line 169, in constant
attrs={"value": tensor_value, "dtype": dtype_value}, name=name).outputs[0]
File "/home/deepinsight/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2240, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/home/deepinsight/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1128, in __init__
self._traceback = _extract_stack()

InternalError (see above for traceback): Dst tensor is not initialized.
[[Node: zeros_24 = Constdtype=DT_FLOAT, value=Tensor<type: float shape: [25088,4096] values: [0 0 0]...>, _device="/job:localhost/replica:0/task:0/gpu:0"]]

cudaCheckError() failed : invalid device function on AMI

Hi,

I tried to get the experiment working on an Amazon GPU cloud machine with a K520 graphics card and CUDA 8. I got quite a few warnings, but I think the real problem is that some CUDA function does not work on this GPU. Here is part of the output:

assign pretrain model weights to conv2_1
assign pretrain model biases to conv2_1
Faster-RCNN_TF/tools/../lib/rpn_msr/proposal_target_layer_tf.py:89: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  bbox_targets[ind, start:end] = bbox_target_data[ind, 1:]
Faster-RCNN_TF/tools/../lib/rpn_msr/proposal_target_layer_tf.py:90: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  bbox_inside_weights[ind, start:end] = cfg.TRAIN.BBOX_INSIDE_WEIGHTS
cudaCheckError() failed : invalid device function
E tensorflow/stream_executor/stream.cc:272] Error recording event in stream: error recording CUDA event on stream 0x4cae120: CUDA_ERROR_DEINITIALIZED; not marking stream as bad, as the Event object may be at fault. Monitor for further errors.
E tensorflow/stream_executor/cuda/cuda_event.cc:49] Error polling for event status: failed to query event: CUDA_ERROR_DEINITIALIZED
F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:198] Unexpected Event status: 1
E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:671] failed to record completion event; therefore, failed to create inter-stream dependency
I tensorflow/stream_executor/stream.cc:3775] stream 0x4caea80 did not memcpy device-to-host; source: 0x723f3cf00
./experiments/scripts/faster_rcnn_end2end.sh: line 57: 10679 Aborted                 (core dumped) python ./tools/train_net.py --device ${DEV} --device_id ${DEV_ID} --weights data/pretrain_model/VGG_imagenet.npy --imdb ${TRAIN_IMDB} --iters ${ITERS} --cfg experiments/cfgs/faster_rcnn_end2end.yml --network VGGnet_train ${EXTRA_ARGS}

Can you give me a hint about what the problem could be?

Thanks in advance

Cannot download VGG_imagenet.npy

I tried to download VGG_imagenet.npy from Google Drive, but ran into problems: after about 100 MB the download stopped, so I cannot get the file from Drive. Could anyone kindly send VGG_imagenet.npy to my e-mail? My e-mail is [email protected]. Thanks

Not really an issue, but a question

I know this is a bug/issue tracker but I have a general question and didn't know where to ask.

I was just wondering how come in the Faster-RCNN paper they are able to detect the person inside the bus but using the pre-trained model here I am not? I understand there are many parameters that might be causing that but can someone help me understand which exactly? Is it the training data used itself or some parameters like IoU etc?

Thanks,
Ahmed.

does the demo run out of the box after following instructions?

Hi everyone, I was wondering whether the demo runs out of the box for you after installing the prerequisites mentioned in the README. I've found that the problem is in lib/_init_paths.py, which tries to set paths to caffe-fast-rcnn and mftracker, neither of which exists in this repo as it stands. My assumption is that either the README is incomplete, since it does not mention in detail everything we need to pre-install, or there is a bug in ./lib/_init_paths.py. Does anyone have a clue?

Here's my error output:

python demo.py --model ../weights/VGGnet_fast_rcnn_iter_70000.ckpt
Traceback (most recent call last):
  File "demo.py", line 3, in <module>
    from fast_rcnn.config import cfg
  File "/home/user/Faster-RCNN_TF/tools/../lib/fast_rcnn/__init__.py", line 9, in <module>
    from . import train
  File "/home/user/Faster-RCNN_TF/tools/../lib/fast_rcnn/train.py", line 11, in <module>
    import gt_data_layer.roidb as gdl_roidb
  File "/home/user/Faster-RCNN_TF/tools/../lib/gt_data_layer/roidb.py", line 12, in <module>
    from utils.cython_bbox import bbox_overlaps
ImportError: No module named cython_bbox

Model Performance?

Hi, I tested the performance on voc2007 test set. And the performance is not as good as yours. My meanAP is only about 0.5851. Do you know what might be wrong?

cudaCheckError() failed : invalid device function

Sorry to bother you. When I run demo.py, it shows:

Tensor("Placeholder:0", shape=(?, ?, ?, 3), dtype=float32)
Tensor("conv5_3/conv5_3:0", shape=(?, ?, ?, 512), dtype=float32)
Tensor("rpn_conv/3x3/rpn_conv/3x3:0", shape=(?, ?, ?, 512), dtype=float32)
Tensor("rpn_cls_score/rpn_cls_score:0", shape=(?, ?, ?, 18), dtype=float32)
Tensor("rpn_cls_prob:0", shape=(?, ?, ?, ?), dtype=float32)
Tensor("rpn_cls_prob_reshape:0", shape=(?, ?, ?, 18), dtype=float32)
Tensor("rpn_bbox_pred/rpn_bbox_pred:0", shape=(?, ?, ?, 36), dtype=float32)
Tensor("Placeholder_1:0", shape=(?, 3), dtype=float32)
Tensor("conv5_3/conv5_3:0", shape=(?, ?, ?, 512), dtype=float32)
Tensor("rois:0", shape=(?, 5), dtype=float32)
[<tf.Tensor 'conv5_3/conv5_3:0' shape=(?, ?, ?, 512) dtype=float32>, <tf.Tensor 'rois:0' shape=(?, 5) dtype=float32>]
Tensor("fc7/fc7:0", shape=(?, 4096), dtype=float32)

Loaded network /home/jmy/Desktop/Faster-RCNN_TF-master/VGGnet_fast_rcnn_iter_70000.ckpt
cudaCheckError() failed : invalid device function

However, I can run the TensorFlow r0.10.0 example model/image/mnist/convolutional.py successfully, so I don't know where the error comes from.

Thank you

How to train on my own dataset

Hey, I'm a bit confused about what format of data I'd need to pass into this model in order to train on my own dataset. Could you give me an example of what I'd need? Thanks
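
As a rough orientation, and assuming the training scripts expect PASCAL VOC-style data (the other threads here refer to VOC 2007), each image is paired with an XML annotation that lists object names and bounding boxes. The sketch below parses one such annotation with the standard library; the sample values are made up for illustration and `parse_voc_annotation` is a hypothetical helper, not the repository's loader.

```python
import xml.etree.ElementTree as ET

# Illustrative annotation in PASCAL VOC style; all values are made up.
SAMPLE_ANNOTATION = """<annotation>
  <filename>000001.jpg</filename>
  <size><width>353</width><height>500</height><depth>3</depth></size>
  <object>
    <name>dog</name>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
  <object>
    <name>person</name>
    <bndbox><xmin>8</xmin><ymin>12</ymin><xmax>352</xmax><ymax>498</ymax></bndbox>
  </object>
</annotation>"""


def parse_voc_annotation(xml_text):
    """Return (filename, [(class_name, (xmin, ymin, xmax, ymax)), ...])."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.findall("object"):
        bndbox = obj.find("bndbox")
        coords = tuple(int(bndbox.find(tag).text)
                       for tag in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append((obj.find("name").text, coords))
    return root.find("filename").text, boxes


filename, boxes = parse_voc_annotation(SAMPLE_ANNOTATION)
```

A custom dataset laid out this way would also need its class names to match whatever class list the training code is configured with.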

What versions of the related software are required (Ubuntu/gcc/CUDA/cuDNN)?

We all want to verify that the TensorFlow Faster R-CNN works, but after pulling it into our own environments we hit many errors, which cost a lot of time to resolve.
I think the main reason is that the versions of the related software do not match.

Could anyone who has run it successfully share the versions they used?

such as:
system version:ubuntu 14.04 or 16.04?
gcc version: 4.8 or 5.4?
cuda version: 7.5 or 8.0?
cudnn version: v3 or v5?

Undefined symbol in roi_pooling.so

I'm trying to run the demo following the instructions in readme, however, when I run the command

python ./tools/demo.py --model ./lib/pretrained/VGGnet_fast_rcnn_iter_70000.ckpt

I get the error below:

➜ Faster-RCNN_TF git:(master) ✗ python ./tools/demo.py --model ./lib/pretrained/VGGnet_fast_rcnn_iter_70000.ckpt
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
Traceback (most recent call last):
File "./tools/demo.py", line 11, in <module>
from networks.factory import get_network
File "/home/denis/WEB/DeepLearning/Faster-RCNN_TF/tools/../lib/networks/__init__.py", line 8, in <module>
from .VGGnet_train import VGGnet_train
File "/home/denis/WEB/DeepLearning/Faster-RCNN_TF/tools/../lib/networks/VGGnet_train.py", line 2, in <module>
from networks.network import Network
File "/home/denis/WEB/DeepLearning/Faster-RCNN_TF/tools/../lib/networks/network.py", line 3, in <module>
import roi_pooling_layer.roi_pooling_op as roi_pool_op
File "/home/denis/WEB/DeepLearning/Faster-RCNN_TF/tools/../lib/roi_pooling_layer/roi_pooling_op.py", line 5, in <module>
_roi_pooling_module = tf.load_op_library(filename)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/load_library.py", line 64, in load_op_library
None, None, error_msg, error_code)
tensorflow.python.framework.errors_impl.NotFoundError: /home/denis/WEB/DeepLearning/Faster-RCNN_TF/tools/../lib/roi_pooling_layer/roi_pooling.so: undefined symbol: _ZN10tensorflow7strings6StrCatB5cxx11ERKNS0_8AlphaNumE

CUDA: 8.0
CuDNN: 5
Python: 2.7.12
Tensorflow: 0.12.0-rc1
GPU: NVidia GeForce 750M (sm_30 architecture)

Due to my setup above, I modified CUDA_PATH in make.sh file to be like this:

CUDA_PATH=/usr/local/cuda-8.0/

and the nvcc instruction to be like this:

	nvcc -std=c++11 -c -o roi_pooling_op.cu.o roi_pooling_op_gpu.cu.cc \
		-I $TF_INC -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC $CXXFLAGS \
		-arch=sm_30

Am I doing something wrong?

Could you please help me with running the demo properly?
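
One detail in the error is worth checking: the missing symbol contains `B5cxx11`, which is how GCC 5 and later mangle the `[abi:cxx11]` tag of the newer libstdc++ string ABI. That hints that roi_pooling.so was built with the new ABI while the installed TensorFlow library exports the old one. This is an assumption, not a confirmed diagnosis. The sketch below uses a hypothetical helper to test a symbol for the tag.

```python
def uses_cxx11_abi(mangled_symbol):
    """Return True if a mangled C++ symbol carries the [abi:cxx11] tag."""
    return "B5cxx11" in mangled_symbol


missing = "_ZN10tensorflow7strings6StrCatB5cxx11ERKNS0_8AlphaNumE"
tagged = uses_cxx11_abi(missing)
```

When the tag is present, compiling the op with `-D_GLIBCXX_USE_CXX11_ABI=0` on the g++ line is the usual thing to try, so that the op and the TensorFlow binary agree on the ABI.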

Failed when running tf.load_op_library(roi_pooling_so)

The error message was:

  File "/home/shang/Work/TF-Examples/tf-faster-rcnn/tools/../lib/roi_pooling_layer/roi_pooling_op.py", line 5, in <module>
    _roi_pooling_module = tf.load_op_library(filename)
  File "/home/shang/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/load_library.py", line 75, in load_op_library
    raise errors._make_specific_exception(None, None, error_msg, error_code)
tensorflow.python.framework.errors.NotFoundError: /home/shang/Work/TF-Examples/tf-faster-rcnn/tools/../lib/roi_pooling_layer/roi_pooling.so: undefined symbol: _ZN10tensorflow8internal21CheckOpMessageBuilder9NewStringEv

I tried several workarounds: switching g++ between versions 5 and 4.8, changing -arch to sm_50 (the platform is a GTX 980), and adding -D_GLIBCXX_USE_CXX11_ABI=0 to the g++ compile line. roi_pooling.so seemed to be generated correctly, but I still got the same error when running tf.load_op_library.


Since this version of FRCN_TF can't restore training from a ckpt file, I simply rewrote net.load:

but I always get the error:

In my opinion, we need to create the

so that we can restore training from the ckpt.

Besides, I applied this framework to ResNet-50. Doing the same thing as above (restoring training from a ckpt), I get the error:

This error occurs in the layers that have biases. I guess the momentum in MomentumOptimizer may cause this issue, but I don't know how to modify it. Maybe:
