
ron's Issues

cuda/cudnn version

Hi,
Which CUDA and cuDNN versions did you use? I tried compiling the code with CUDA 9 / cuDNN 8 and got the following errors:

src/caffe/net.cpp:8:18: fatal error: hdf5.h: No such file or directory
compilation terminated.
Makefile:575: recipe for target '.build_release/src/caffe/net.o' failed
make: *** [.build_release/src/caffe/net.o] Error 1

I have also tried older versions (CUDA 6.5 and cuDNN 5), but I still can't get this to build.

I would appreciate any advice.
Thanks
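
The error above means the compiler cannot find hdf5.h on its include path, which is a separate matter from the CUDA/cuDNN version. On Debian/Ubuntu systems the header often lives under /usr/include/hdf5/serial rather than directly in /usr/include. A minimal sketch for locating it, assuming a Linux layout; the helper name find_header and the default search roots are illustrative and not part of RON or Caffe:

```python
import os

def find_header(name, roots=("/usr/include", "/usr/local/include")):
    """Return sorted directories under `roots` that contain a file called `name`."""
    hits = set()
    for root in roots:
        # os.walk silently yields nothing for a root that does not exist.
        for dirpath, _dirnames, filenames in os.walk(root):
            if name in filenames:
                hits.add(dirpath)
    return sorted(hits)

if __name__ == "__main__":
    dirs = find_header("hdf5.h")
    if dirs:
        print("hdf5.h found in:", ", ".join(dirs))
    else:
        print("hdf5.h not found; the HDF5 development package may be missing")
```

Any directory it reports that the compiler does not search by default can be appended to INCLUDE_DIRS in Caffe's Makefile.config, with the matching library directory added to LIBRARY_DIRS.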

Training crashed

When I train the model, it crashes:

F0718 15:20:34.923629 16833 syncedmem.cpp:56] Check failed: error == cudaSuccess (2 vs. 0) out of memory
*** Check failure stack trace: ***
./train_voc_reduced.sh: line 7: 16833 Aborted (core dumped) python tools/train_net.py --gpu 0 --solver models/pascalvoc/VGG16-REDUCED/solver.prototxt --imdb voc_2007_trainval --weights data/ImageNet_models/VGG_ILSVRC_16_layers_fc_reduced.caffemodel --batchsize 64 --iters 4000

root$ nvidia-smi
Tue Jul 18 15:29:09 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.66 Driver Version: 375.66 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 0000:06:00.0 Off | 0 |
| N/A 69C P0 57W / 149W | 0MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla K80 Off | 0000:07:00.0 Off | 0 |
| N/A 51C P0 71W / 149W | 0MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla K80 Off | 0000:83:00.0 Off | 0 |
| N/A 62C P0 58W / 149W | 0MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla K80 Off | 0000:84:00.0 Off | 0 |
| N/A 48C P0 72W / 149W | 0MiB / 11439MiB | 99% Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+

Multiple objects detection isn't so accurate

Hi,
I tried the RON VOC VGG-16 model on a test image. Although the PASCAL VOC dataset has a 'bird' class, the detection is not as accurate as it should be.
Original image
The detection result
@taokong Could you also try RON on this image? Thanks!

Training RON on KITTI blocks in PythonLayer::Forward

@taokong Hi Professor Kong, I recently adapted your RON project to train a detector on KITTI. After about 100 iterations the output in the PyCharm run console stops, and GPU-Util drops to nearly 0% and stays there.
I attached gdb to the train_net.py process and got the following backtrace:
#0 0x000000000052b3bf in ?? ()
#1 0x00000000004c8061 in PyEval_EvalFrameEx ()
#2 0x00000000004cfedc in PyEval_EvalCodeEx ()
#3 0x00000000004c8314 in PyEval_EvalFrameEx ()
#4 0x00000000004c8762 in PyEval_EvalFrameEx ()
#5 0x00000000004c8762 in PyEval_EvalFrameEx ()
#6 0x00000000004c8762 in PyEval_EvalFrameEx ()
#7 0x00000000004c8762 in PyEval_EvalFrameEx ()
#8 0x00000000004704ea in ?? ()
#9 0x00000000004d8194 in ?? ()
#10 0x00000000004d40fb in PyEval_CallObjectWithKeywords ()
#11 0x0000000000467c68 in PyEval_CallFunction ()
#12 0x00007fd32612a485 in caffe::PythonLayer::Forward_cpu(std::vector<caffe::Blob, std::allocator<caffe::Blob> > const&, std::vector<caffe::Blob, std::allocator<caffe::Blob> > const&) ()
What should I do now?

What is the normal training loss?

Hi @taokong, great work!
I want to reproduce your experiment, and I have three small questions:

  1. What is the normal training loss when the model converges well?
    Note: when I train, the loss stays above 4.0 for the first 80k iterations, and testing at 80k iterations gives only mAP = 0.54 (training data is 2007trainval + 2013trainval).
  2. I also noticed that testing works on a Pascal Titan X, but on a Titan X I get a CUDNN_STATUS_SUCCESS (8 vs. 0) CUDNN_STATUS_EXECUTION_FAILED error.
  3. Training takes a long time. Can you give me some advice?
    Thanks in advance!

How to save the log

Hi taokong,
When I use your code, I want to save the loss values. Can you help me? How can I save the log?
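
Caffe typically writes its training log to stderr, so a common approach is to redirect the output of the training script to a file and extract the loss afterwards. A minimal sketch of the extraction step, assuming the standard Caffe solver line format "Iteration N, loss = X"; the function name and the sample lines are made up for illustration, and RON's own log lines may differ:

```python
import re

# Matches both "Iteration 20, loss = 4.5" and the newer
# "Iteration 20 (1.2 iter/s, ...), loss = 4.5" solver formats.
LOSS_RE = re.compile(r"Iteration (\d+)\b.*?\bloss = ([-+0-9.eE]+)")

def parse_losses(lines):
    """Return a list of (iteration, loss) tuples found in Caffe log lines."""
    out = []
    for line in lines:
        m = LOSS_RE.search(line)
        if m:
            out.append((int(m.group(1)), float(m.group(2))))
    return out

# Illustrative lines only, not copied from a real RON run.
sample = [
    "I0718 15:20:30.000000 16833 solver.cpp:228] Iteration 0, loss = 5.25",
    "some other log line without a loss value",
    "I0718 15:20:34.000000 16833 solver.cpp:228] Iteration 20, loss = 4.5",
]
print(parse_losses(sample))
# [(0, 5.25), (20, 4.5)]
```

Redirecting the training command's stdout and stderr to a file (for example with `2>&1 | tee train.log` in the shell) produces a file whose lines can be passed to parse_losses.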

Training is too slow

Hello,
When I train on my data, the process is too slow and takes too much time.

Could you give me some advice?

Thank you very much.

Setting the batch size

Due to hardware limitations, I need to use a smaller batch size. How does RON set this parameter? I cannot find it anywhere. Thanks.

train.prototxt

@taokong Hi

I have a question about train.prototxt.

layer {
  name: "rpn_lrn7"
  type: "BatchNorm"
  bottom: "rpn_conv7"
  top: "rpn_lrn7"
}

For BatchNorm layers, many examples online and in other papers are configured like this:
layer {
  bottom: "res5c_branch2b"
  top: "res5c_branch2b"
  name: "bn5c_branch2b"
  type: "BatchNorm"
  batch_norm_param {
    use_global_stats: true
  }
  param {
    lr_mult: 0.0
    decay_mult: 0.0
  }
  param {
    lr_mult: 0.0
    decay_mult: 0.0
  }
  param {
    lr_mult: 0.0
    decay_mult: 0.0
  }
}
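 1) Why doesn't your layer have these three default param blocks?
 2) Isn't use_global_stats supposed to be false during training? Why do so many examples set it to true?

(Originally posted in Chinese to describe the problem precisely. Sorry for the trouble!)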
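
For context: in Caffe's BatchNorm layer, the three param blocks correspond to the layer's three internal blobs (mean, variance, and a moving-average scale factor), which are statistics rather than weights learned by gradient descent. With use_global_stats: false the layer normalizes with the mean and variance of the current mini-batch; with use_global_stats: true it normalizes with the stored running statistics. When the field is omitted, Caffe uses false in the TRAIN phase and true in the TEST phase. Setting it to true in a training prototxt is a common way to freeze BatchNorm in a fine-tuned backbone. The sketch below illustrates the two modes on a single channel in plain Python; the function name and the numbers are illustrative, and it omits Caffe's moving-average bookkeeping:

```python
import math

def batch_norm(values, use_global_stats, global_mean=0.0, global_var=1.0, eps=1e-5):
    """Normalize one channel with stored statistics or with batch statistics."""
    if use_global_stats:
        # Frozen mode: statistics are fixed and independent of this batch.
        mean, var = global_mean, global_var
    else:
        # Training mode: statistics are computed from the batch itself.
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]

batch = [2.0, 4.0, 6.0]
print(batch_norm(batch, use_global_stats=False))
print(batch_norm(batch, use_global_stats=True, global_mean=3.0, global_var=4.0))
```

In the first call the output is centered on zero regardless of the input scale; in the second it depends entirely on the stored mean and variance.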

Some questions about paper

Hey, many thanks for your great work. I have some questions about the paper, if you don't mind.

  1. For each feature-map scale there is a separate classifier and regressor that produce class-specific scores and bounding-box regressions, so four scales mean four classifiers and four regressors. This might introduce repeated computation. I wonder whether these operations on different scales could be merged in some way.
  2. The objectness prior looks a lot like an RPN (region proposal network). The only difference I see is that the objectness prior produces only a score, without the bounding-box regression that an RPN includes. Am I wrong? Please give me some pointers on the differences.
  3. For the last classifier and regressor, one uses two convs while the other uses two inception modules. I wonder why you chose them.
    Thanks again, and sorry for the disturbance.

How to reduce the batch size when testing?

Thanks for your great contribution, @taokong!
I have some questions about training and testing RON on another dataset:
(1) How can I use multiple GPUs for training and testing? In 'train_net.py' and 'test_net.py', the GPU argument accepts only a single int32 value.
(2) When I test on a single GPU, Caffe always seems to run out of memory, with the error: status == CUDNN_STATUS_SUCCESS (8 vs 0) CUDNN_STATUS_EXECUTION_FAILED. Where can I reduce the batch size for testing?
Thank you!

How to use 384 or other size for testing?

When I use test320cudnn.prototxt, everything works.
But when I change the input_shape to 384 in test320cudnn.prototxt, I get this error:
File "/opt/yushan/RON/tools/../lib/ron_layer/det_layer.py", line 72, in forward
all_scores_det = bottom[1].data[:, 1:, :, :].reshape(self._ndim, self._numclasses - 1, self._num_anchors, self._height, self._width)
ValueError: cannot reshape array of size 5000 into shape (1,20,10,6,6)

I think the size of det_cls_prob_7 is 5000 when the input size is 320, but it should be 7500 when the input size is 384.

This is the log for det_cls_prob_7 during testing:
8690 net.cpp:434] det_cls_prob_7 <- det_cls_score_reshape_7
I1010 07:39:07.711601 8690 net.cpp:408] det_cls_prob_7 -> det_cls_prob_7
I1010 07:39:07.711802 8690 net.cpp:150] Setting up det_cls_prob_7
I1010 07:39:07.711809 8690 net.cpp:157] Top shape: 1 21 60 6 (7560)

The size of det_cls_prob_7 seems correct, so I'm puzzled about where the problem is.
I only modified the input_shape. Is that right? How can I use 384 or another size for testing?
Could you help me? Thanks!
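
The numbers in the traceback can be checked by hand. The reshape target (1, 20, 10, 6, 6) has 20 * 10 * 6 * 6 = 7200 elements, which matches the logged top shape 1 x 21 x 60 x 6 once the background channel is dropped (20 * 60 * 6 = 7200). By contrast, 5000 = 20 * 10 * 5 * 5 is the count for a 5 x 5 map, i.e. a 320 input at stride 64, so the array reaching det_layer.py appears to still have the 320-input size. A small sketch of that arithmetic, assuming stride 64 for this layer (taken from the rpn-data_7 param_str quoted in another issue on this page) and 10 anchors per location (taken from the reshape target); the helper name is illustrative:

```python
def det_score_count(input_size, stride=64, num_classes=21, num_anchors=10,
                    include_background=False):
    """Number of detection scores for a square input at one feature-map scale."""
    side = input_size // stride  # feature-map height and width
    classes = num_classes if include_background else num_classes - 1
    return classes * num_anchors * side * side

for size in (320, 384):
    print(size, det_score_count(size), det_score_count(size, include_background=True))
# 320 5000 5250
# 384 7200 7560
```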

The demo result is not good

@taokong

I have tried the model (RON320_VOC0712_VOC07.caffemodel) you provided.

But the bounding boxes are not well localized.

For example, here are the Faster R-CNN demo images:

test

test

Any idea?

Model is saved every 20 iterations

Hello. When I used your method and framework to train on Pascal VOC, I found that it saves a caffemodel every 20 iterations, which is more often than I want. I changed snapshot to 10000 in your solver.prototxt, but it had no effect. Could you tell me how to modify the code so that it saves less frequently?

weight_decay: 0.0005
#We disable standard caffe solver snapshotting and implement our own snapshot
#function
snapshot: 0
#We still use the snapshot prefix, though
snapshot_prefix: "RON-REDUCED"
#debug_info: true

What do the numbers in the rpn-data "param_str" mean?

Hi,
I have been reading your RON source code and have a question about the file "traincudnn.prototxt". What is the meaning of param_str: "'stride_scale_border_batchsize': 64,7,32,256" in rpn-data_7, and of the same param_str in the corresponding layers at other scales? I would like to know what each number in the parameter means.

Thanks, and sorry for the disturbance.
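
The key name itself suggests the order of the four numbers: stride, scale, border, batch size. Under that reading (an assumption based on the key, not confirmed by the author), rpn-data_7 would use a feature stride of 64, an anchor scale of 7, a border of 32, and a batch size of 256. A minimal sketch that pairs the names with the values; the parsing helper is hypothetical and is not RON's actual parser:

```python
def parse_param_str(param_str):
    """Split "'a_b_c': 1,2,3" into {'a': 1, 'b': 2, 'c': 3}."""
    key, _, values = param_str.partition(":")
    names = key.strip().strip("'\"").split("_")
    numbers = [int(v) for v in values.strip().split(",")]
    if len(names) != len(numbers):
        raise ValueError("expected one value per name in the key")
    return dict(zip(names, numbers))

print(parse_param_str("'stride_scale_border_batchsize': 64,7,32,256"))
# {'stride': 64, 'scale': 7, 'border': 32, 'batchsize': 256}
```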

BaiduYun

Hi,
Thanks for the great code. I was wondering if you could host the VGG16_layers_fully_conv.caffemode for training (https://pan.baidu.com/s/1c2xm2U8#list/path=%2F) somewhere other than Baidu? Downloading it requires a confirmation code sent to a phone number with a Chinese area code. If this is the generic pre-trained VGG, can I download it from the Model Zoo?

Thanks

How can I use RON when the objects of my datasets are all small?

 First of all, thanks for your great work on deep-learning-based object detection. I want to apply the RON framework to my own dataset, in which all objects are small. I kept only the rpn_lrn4 layer in order to detect small objects, and I changed num_output to my own number of classes. My dataset has only two classes, background and car. Is there anything wrong with my changes? What else should I do?
 Looking forward to your reply! Thanks!

Training time?

Thanks for your impressive work. On my workstation with a Titan X, each iteration takes about 3 seconds, so the total training time is about 100 hours (3 * 120000 seconds). Is this normal?

CUDNN error

Thanks for your helpful work! When I tried to run the code following your guide, I got the following error:
F1119 15:00:44.842713 16912 cudnn.hpp:96] Check failed: status == CUDNN_STATUS_SUCCESS (3 vs. 0) CUDNN_STATUS_BAD_PARAM.

Could you please tell me how to deal with this? Thank you in advance.

Demo

I tested the demo and found that many boxes are regressed for the same object...

Why is training very slow on my dataset?

When I use a GTX 1080 to train on VOC2007, it is very fast: about 100 iterations in 30 minutes. But when I use the same GTX 1080 to train on my own dataset, it completes only about 30 iterations over a whole night. Something seems wrong. Do you know the reason? Thank you!

Train and Test for input size 384

Thanks for the model. I was able to train and test the RON model for input size 320 (on a different dataset). I would like to train and test for input size 384. I was able to train for input size 384 by adding 384 to __C.TRAIN.SCALES:
__C.TRAIN.SCALES = (384,320,256)

Now I want to test. Can you please let me know how I need to change "test320cudnn.prototxt" for a 384 input?

Two questions about the code

@taokong
@taokongcn

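  1. test.py and the paper both use:

  scores = np.tile(scores[:, 0], (imdb.num_classes, 1)).transpose() * scores

  This multiplies the classification scores (21 classes) by the probability that the box contains an object. Why is the score computed this way? Why is this processing needed?

  2. In anchor_target_layer.py and det_target_layer.py:
  if len(fg_inds) > 0:
      num_bg = len(fg_inds) * (1.0 - cfg.TRAIN.FG_FRACTION) / (cfg.TRAIN.FG_FRACTION)
  else:
      num_bg = self._batch

  bg_inds = np.where(all_labels == 0)[0]
  if len(bg_inds) > num_bg:
      disable_inds = npr.choice(bg_inds, size=int(len(bg_inds) - num_bg), replace=False)
      all_labels[disable_inds] = -1

  It seems that as long as positive samples exist, the batch_size parameter has no effect. This code only keeps the positive-to-negative ratio at 1:3 and does not limit the total number of samples (it can exceed 256/512). Is my understanding correct?

  Thanks for your help!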
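
Both points can be checked on small numbers. The np.tile expression multiplies every entry in a row of scores by that row's column 0, so each class score is scaled by one per-box factor. The sampling snippet keeps len(fg_inds) * (1 - FG_FRACTION) / FG_FRACTION background labels whenever any foreground exists, so within the quoted lines the total is not capped by self._batch. The sketch below re-implements both in plain Python for illustration; the function names are hypothetical, and FG_FRACTION = 0.25 is an assumption taken from the 1:3 ratio mentioned in the question:

```python
def weight_by_first_column(scores):
    """Pure-Python equivalent of np.tile(s[:, 0], (n, 1)).transpose() * s."""
    return [[row[0] * value for value in row] for row in scores]

def num_background_to_keep(num_fg, batch, fg_fraction=0.25):
    """Background budget as computed in the quoted snippet."""
    if num_fg > 0:
        return num_fg * (1.0 - fg_fraction) / fg_fraction
    return batch

print(weight_by_first_column([[0.5, 0.2, 0.8], [1.0, 0.3, 0.4]]))
# [[0.25, 0.1, 0.4], [1.0, 0.3, 0.4]]

for num_fg in (0, 10, 100):
    num_bg = num_background_to_keep(num_fg, batch=256)
    print(num_fg, num_bg, num_fg + num_bg)
```

With 100 foreground anchors the budget is 300 background anchors, 400 in total, which exceeds 256. Whether foreground anchors are subsampled elsewhere in the layer, which would bound the total, cannot be told from the quoted lines.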

Changing alpha and beta

Hi,
Thank you so much for the code. Can you tell me how to change alpha and beta from equation (2)? I want to change the loss function.
Thanks
