Comments (15)
A dialogue between two experts!
from faster_rcnn_pytorch.
@longcw Shouldn't the data order used in PyTorch be NCHW?
Will this problem appear when running on only one GPU?
Since I only have one GPU, I cannot reproduce your problem.
Maybe you can offer me some debug information; I am very willing to help you.
I don't have any more information. The only problem is that when I time all the operations, I find that the ROI Pooling time grows a lot. Currently, after changing the -arch parameter from sm_35 to sm_52, it seems OK so far. Do you know what that parameter is for?
Best,
The -arch compiler option specifies the compute capability that is assumed when compiling CUDA C to PTX code.
Read more at: http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#ixzz4aDuTT670
You can find the compute capability of your GPU in https://developer.nvidia.com/cuda-gpus
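In other words, the flag is just `sm_<major><minor>` of the device's compute capability. A minimal sketch of that mapping (the `arch_flag` helper is mine, not part of the repo; on PyTorch, `torch.cuda.get_device_capability()` returns such a `(major, minor)` tuple):

```python
def arch_flag(capability):
    """Map a (major, minor) compute capability to an nvcc -arch flag,
    e.g. (3, 5) -> "sm_35" for a Tesla K40, (5, 2) -> "sm_52" for a GTX 980."""
    major, minor = capability
    return f"sm_{major}{minor}"
```

With PyTorch available, `arch_flag(torch.cuda.get_device_capability(0))` would pick the right value to pass to the build script for the local GPU.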
Actually, it has nothing to do with the -arch parameter. The per-iteration time still doubles after about 50,000 iterations.
Did you implement ROI Pooling layer by yourself?
No. The kernel code for ROI Pooling is copied from CharlesShang and smallcorgi.
I have no idea about your problem, because the code itself is really easy to understand. I only changed the index expression, since PyTorch uses an order of [batch, c, h, w]:
// int bottom_index = (h * width + w) * channels + c;
int bottom_index = (c * height + h) * width + w;
You can see the code here: https://github.com/longcw/faster_rcnn_pytorch/blob/master/faster_rcnn/roi_pooling/src/cuda/roi_pooling_kernel.cu
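The layout difference can be checked in a few lines of plain Python (the helper names below are mine, not from the kernel): both flat-offset expressions enumerate every element of a C*H*W buffer exactly once, just in different memory orders.

```python
def nhwc_index(h, w, c, height, width, channels):
    # Caffe/TensorFlow-style NHWC flat offset (the commented-out line in the kernel)
    return (h * width + w) * channels + c

def nchw_index(c, h, w, height, width, channels):
    # PyTorch-style NCHW flat offset (the line the kernel actually uses)
    return (c * height + h) * width + w

C, H, W = 3, 4, 5
nhwc = {nhwc_index(h, w, c, H, W, C)
        for c in range(C) for h in range(H) for w in range(W)}
nchw = {nchw_index(c, h, w, H, W, C)
        for c in range(C) for h in range(H) for w in range(W)}
# Both layouts cover offsets 0 .. C*H*W-1 exactly once.
assert nhwc == nchw == set(range(C * H * W))
```

So the kernel change is purely an address-computation swap; the pooled values are identical, only the element order in memory differs.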
I saw a commit named 'fix the memory leak for ROI pool module'. If you don't mind, could you give me some detailed information about that?
Best,
Yikang
I misunderstood torch.autograd.Function and used it as a Module when I first implemented the ROI Pooling layer. The memory used by the Function is never released between iterations if the Function is kept as a class member variable.
# faster_rcnn/roi_pooling/modules/roi_pool.py
# self.roi_pool = RoIPoolFunction(...) # wrong
# return self.roi_pool(features, rois)
return RoIPoolFunction(...)(features, rois) # right
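The failure mode can be sketched with a toy stand-in (`FakeFunction` below is purely illustrative, not the real `torch.autograd.Function`): an op object that saves its inputs for the backward pass keeps growing if one instance is reused across iterations, while a fresh instance per call is garbage-collected together with its saved state.

```python
class FakeFunction:
    """Toy stand-in for an old-style autograd Function: it saves its
    input for "backward" every time it is called (illustrative only)."""
    def __init__(self):
        self.saved = []

    def __call__(self, x):
        self.saved.append(x)  # held until the instance itself is collected
        return x * 2

# Wrong: one shared instance kept as a member variable retains every
# iteration's saved inputs, so memory grows with the iteration count.
shared = FakeFunction()
for i in range(3):
    shared(i)
assert len(shared.saved) == 3

# Right: a fresh instance per call; its saved state is freed with it.
for i in range(3):
    fresh = FakeFunction()
    fresh(i)
    assert len(fresh.saved) == 1
```

This is exactly the difference between the commented-out `self.roi_pool = RoIPoolFunction(...)` and the fixed `return RoIPoolFunction(...)(features, rois)` above.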
I trained Faster R-CNN for 100k iterations on a GTX 1080 without any speed decrease.
Maybe your problem is a PyTorch bug with multi-GPU training. Did you try stopping and restoring it after 50k iterations?
Yes, my current strategy is to snapshot the model every 10k iterations. When the speed deteriorates, I restart training from the latest snapshot.
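That snapshot-and-restart workaround can be sketched as follows (the loop body and file naming are assumptions for illustration; a real run would `torch.save` the model and optimizer state instead of pickling a dict):

```python
import os
import pickle
import tempfile

SNAPSHOT_EVERY = 10_000  # snapshot interval mentioned in the thread

def run(state, start_iter, total_iters, snap_dir):
    """Advance a toy training state, writing a snapshot every SNAPSHOT_EVERY iters."""
    for it in range(start_iter + 1, total_iters + 1):
        state["iter"] = it  # stand-in for one optimization step
        if it % SNAPSHOT_EVERY == 0:
            with open(os.path.join(snap_dir, f"snap_{it}.pkl"), "wb") as f:
                pickle.dump(state, f)
    return state

with tempfile.TemporaryDirectory() as d:
    run({"iter": 0}, 0, 50_000, d)
    # Speed degraded around 50k: reload the latest snapshot in a fresh
    # process and continue training from there.
    with open(os.path.join(d, "snap_50000.pkl"), "rb") as f:
        state = pickle.load(f)
    assert state["iter"] == 50_000
```

Restarting the process resets whatever state was accumulating (here, it would also release any leaked GPU memory), at the cost of at most 10k wasted iterations.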
So you are studying at Tsinghua? I was there in the EE Dept.
Yes, I am in the CS Dept.
Thank you very much. Sorry for the late reply due to the ICCV deadline.
Are you going to submit any paper to ICCV?
No. I look forward to reading your paper in ICCV 😀.
Related Issues (20)
- For getting more accuracy in faster rcnn , which parameters i have to tune (tuning parameters)
- IndexError: list index out of range
- It is question about cpu only
- ImportError: torch.utils.ffi is deprecated. Please use cpp extensions instead what pytorch version we need HOT 1
- It is prediction time problem?
- Is this a Fast-RCNN structure rather than Faster-RCNN?? HOT 1
- potential bug in __init__.py
- Building module pycocotools._mask failed: ["CompileError: command 'gcc' failed with exit status 1\n"]
- out of memory if don`t fix VGG16 param HOT 1
- BaiduYun is canceled ! unable to download the trained model
- RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time
- sh make.sh problem in Window10.. HOT 1
- No module named 'blob'
- Modification of the VGG16 network
- AttributeError: 'module' object has no attribute 'roi_pooling_forward_cuda'
- TypeError: dist must be a Distribution instance HOT 5
- No module named 'resource'
- ImportError: libcudart.so.10.0: cannot open shared object file. HOT 2
- ./make.sh gives ModuleNotFoundError: No module named 'torch' HOT 1
- __cudaRegisterFatBinaryEnd HOT 1