luoyetx / mini-caffe

376 stars · 34 watchers · 151 forks · 31.65 MB

Minimal runtime core of Caffe, Forward only, GPU support and Memory efficiency.

License: BSD 3-Clause "New" or "Revised" License

CMake 2.38% Batchfile 0.07% C++ 71.03% Shell 1.03% Cuda 18.34% C 2.06% Java 1.11% Python 3.97%
mini-caffe openblas cudnn cuda caffe forward-only android windows linux

mini-caffe's People

Contributors

deepankar1994, luoyetx, nihui, yszheda


mini-caffe's Issues

What's the BNParameter 's role ?

There is a message "BNParameter" in the caffe.proto file, but it is never used in any .cpp file. So what is the role of this message?

Why remove the templates?

Hello,
In Caffe, nearly every class is a template class, but in mini-caffe's implementation the templates have all been removed in favor of a plain float data type. So I'd like to ask: why de-templatize? C++ templates are compile-time polymorphism, so their presence does not affect runtime performance. mini-caffe is essentially a forward-only version of Caffe, and removing the templates is itself a fair amount of work that doesn't seem to bring any performance gain. What was the motivation for removing them?
Finally, thank you very much for your patient answers to my previous questions!

Python binding

We need a Python binding so that Caffe can be driven from Python code.

android version: cannot new a caffe::Net

Hello:
I built a .so linked against the Android version of mini-caffe, and I also wrote a JNI file. But when I invoke the Java class in Android Studio, an error occurs:
A/libc: Fatal signal 6 (SIGABRT), code -6 in aid 28981.
I narrowed the problem down to one statement: caffe::Net* net_ = new caffe::Net("..."). It seems that Caffe's constructor is never invoked at all, so I suspect either a stack overflow or a failure to link against libcaffe.so.
Could you give me some advice? Thank you a lot!

Memory issue

Hello, I have a question for you. I used the API you provide to adapt a detection model to run on video. At first it runs normally, but after a while memory usage suddenly increases and the model stops detecting targets. Before my changes, the original Caffe had no memory problem. Besides the destroy function, are there any other places in your API where memory must be released manually? Thanks.

How to install mini-caffe's GPU mode on the TK1?

I know Caffe can be installed on the TK1, but when I install mini-caffe I hit this issue:

/usr/lib/gcc/arm-linux-gnueabihf/4.8/include/stddef.h(432): error: identifier "nullptr" is undefined

I tried to work around it by adding list(APPEND CUDA_NVCC_FLAGS --compiler-options "-std=c++03") to Cuda.cmake, but then hit this issue:

/usr/include/c++/4.8/bits/c++0x_warning.h:32:2: error: #error This file requires compiler and library support for the ISO C++ 2011 standard. This support is currently experimental, and must be enabled with the -std=c++11 or -std=gnu++11 compiler options.

How to stop printing LOG(INFO) in mini-caffe?

When I use mini-caffe in my program, it continuously prints LOG(INFO) messages like this:
"""
[18:07:54] ~/GitHub/mini-caffe/src/util/upgrade_proto.cpp:35: Attempting to upgrade input file specified using deprecated input fields: ./config/deploy.prototxt
[18:07:54] ~/GitHub/mini-caffe/src/util/upgrade_proto.cpp:38: Successfully upgraded file specified using deprecated input fields.
[18:07:54] ~/GitHub/mini-caffe/src/util/upgrade_proto.cpp:40: Note that future Caffe releases will only support input layers and not input fields.
"""
These messages are not useful to me right now, and they slow down the whole program.

I have tried several ways to change the GLOG level, such as setting GLOG_minloglevel to 2, etc.,
but the printing stays the same.

So I wonder how to suppress this output without modifying mini-caffe's source code.

error while compiling mini-caffe with cuDNN

I compile mini-caffe as follows:
mkdir build
cd build
cmake .. -DCMAKE_BUILD_TYPE=Release -DUSE_CUDA=ON -DUSE_CUDNN=ON
make
I have already installed cuDNN, but a problem occurred:
[ 0%] Building NVCC (Device) object CMakeFiles/cuda_compile.dir/src/layers/cudnn/cuda_compile_generated_cudnn_bn_layer.cu.o
/home/lgz/mini-caffe/src/layers/cudnn/././cudnn.hpp(97): error: too few arguments in function call

1 error detected in the compilation of "/tmp/tmpxft_00003fb4_00000000-7_cudnn_bn_layer.cpp1.ii".
CMake Error at cuda_compile_generated_cudnn_bn_layer.cu.o.cmake:266 (message):
Error generating file
/home/lgz/mini-caffe/build/CMakeFiles/cuda_compile.dir/src/layers/cudnn/./cuda_compile_generated_cudnn_bn_layer.cu.o

What's the matter?

cudnn too few arguments

What version of cuDNN are you using? I can't compile mini-caffe on the TX2.
Error:
mini-caffe/src/layers/cudnn/././cudnn.hpp(97): error: too few arguments in function call

The performance of the GPU version is much worse than the CPU version

Hi:
I ran a program linked against mini-caffe, built in both CPU and GPU configurations, and the result is weird: the task takes 47 ms in the CPU version but 170 ms in the GPU version. Besides, some log messages are printed to the screen, like:
[14:05:15] /home/lgz/mini-caffe/src/syncedmem.cpp:275: [CPU] Requested 36.8 K, Get 49 K
[14:05:15] /home/lgz/mini-caffe/src/syncedmem.cpp:275: [CPU] Requested 73.5 K, Get 98 K

Although both versions produce the same result, the GPU performance puzzles me. Is this normal? I know a GPU needs plenty of parallelism to pay off; is the scale of my problem simply too small? How can this be explained?

can't find Caffe API!

In mini-caffe I couldn't find C++ APIs like net->input_blobs() and net->output_blobs(). In fact, I need these APIs.

double free or corruption (!prev): 0x00de2b60 *** Aborted

Hi, I hit a problem when I run your mini-caffe example on ARM (Raspberry Pi 3 B):

./ex
detection time: 346.662
face1 keypoints time: 88.37
face2 keypoints time: 83.353
face3 keypoints time: 83.481
face4 keypoints time: 83.531
face5 keypoints time: 83.623
detect number: 5
sleep time: 3000.29
*** Error in `./ex': double free or corruption (!prev): 0x019fc6f0 ***
Aborted

It occurs in the main function at the return statement.

I also ran MTCNN with your mini-caffe and hit the same problem:

pi@raspberrypi:~/MTSrc/build $ ./MTCNN ../ ../test2.jpg
Detect 431X450 Time Using CPU: 2597.32
*** Error in `./MTCNN': double free or corruption (!prev): 0x00de2b60 ***
Aborted

How to solve it?

A bug in blob.cpp

In blob.cpp's Reshape function, we should allocate shape.size() * sizeof(int) instead of kMaxBlobAxes * sizeof(int).
Is that right?
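
For reference, a sketch of the suggested change (hypothetical lines modeled on BVLC Caffe's Blob::Reshape; mini-caffe's actual code may differ):

// before: always allocates room for the maximum number of axes
shape_data_.reset(new SyncedMemory(kMaxBlobAxes * sizeof(int)));
// after: allocate only what this shape actually needs
shape_data_.reset(new SyncedMemory(shape.size() * sizeof(int)));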

Is it memory safe when I new a net?

Hello:
I have read your examples, and almost all the nets are created on the stack instead of the heap. I allocate the net on the heap. It actually works fine on x86 Linux, but a segmentation fault occurs when the net is destructed on ARM Linux. The code is exactly the same; what makes the difference?

How to use mini-caffe on Android?

Hello, I'm new to Android development, and I have developed some neural networks in C++ on x86 Ubuntu. Now I have built mini-caffe for Android, and I want to know whether my C++ programs need modification.
Thank you!

SSD MobileNet uses too much memory

Running SSD MobileNet on the CPU uses over 900 MB of memory, while running it with OpenCV's DNN module uses only about 200 MB. Why is the difference so large?

Windows build error!!

Hello.
Following your instructions, I built the project with Visual Studio:
the CPU build succeeded,
the CUDA build succeeded,
but the CUDA+cuDNN build failed with the following error:

[screenshot: snipaste_20181105_144230]

Versions: CUDA 8.0 + cuDNN 5.1.
What could be the cause?

Multithreading question

Hello. In mini-caffe's common.hpp there is a Caffe class, but after searching for a long time I couldn't find where it is used. Also, how does mini-caffe behave with multiple threads? If a net initialized with CaffeNetCreate is called from different threads, what happens? Is it the same situation as BVLC/caffe#4595?

[Proposal] Strategy to reduce memory usage

Caffe currently consumes too much memory during the Forward phase. This is mainly because of the internal temporary buffers held by each Layer; e.g. the Convolution layer needs to cache its im2col result for the gemm operation. These temporary buffers are not shared between layers, which causes excessive memory usage. Second, since we don't perform the backward pass, network-internal buffers can also be reused or freed as soon as no other layer needs them.

Mini-Caffe should change this situation without breaking any high-level API exposed in include (though some APIs may be added).

Some ideas (see the sketch after this list):

  1. A layer that needs temporary memory should request it from a global memory source manager. The manager itself holds the data and lends buffers to whoever requests them. A memory pool can be implemented, or we can simply reuse one buffer and resize it when a request is too big. Network-internal buffers can also be requested through the manager, but then we must track the dependencies of each named blob and return its memory to the manager once no other layer needs it. This strategy operates within every Forward phase.

  2. Since a Caffe network graph is static, we can plan the memory before forwarding through the graph. Some Layer API changes would help: a layer should only tell the network how much memory it needs, and the network should hold the memory and lend it to the layer during Forward. This covers bottom, top, and temporary memory, and requires changing every layer's Reshape function. By counting the dependencies of network-internal blobs, we can plan and reuse the internal blobs' memory. This strategy operates before every Forward phase.
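
A minimal sketch of idea 1 (the names MemoryManager and Borrow are hypothetical, not an existing mini-caffe API): one global source that owns a growable scratch buffer and lends it to any layer that asks.

#include <cstddef>
#include <cstdlib>

class MemoryManager {
 public:
  static MemoryManager& Get() {
    static MemoryManager instance;  // one shared memory source per process
    return instance;
  }
  // Borrow a buffer of at least `size` bytes; grow (reallocate) if needed.
  void* Borrow(size_t size) {
    if (size > capacity_) {
      std::free(data_);
      data_ = std::malloc(size);
      capacity_ = size;
    }
    return data_;
  }
 private:
  MemoryManager() = default;
  ~MemoryManager() { std::free(data_); }
  void* data_ = nullptr;
  size_t capacity_ = 0;
};

// A convolution layer could then fetch its im2col scratch space per Forward:
//   void* col_buffer = MemoryManager::Get().Borrow(col_bytes);
// instead of holding a private buffer for the network's whole lifetime.
// (Reusing one buffer is safe here because Forward runs the layers sequentially.)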

problem when compiling mini-caffe

I installed mini-caffe with the commands listed in the readme.md; my OS is Ubuntu. Then I encountered a problem like this:
[ 96%] Linking CXX executable run_net
/usr/bin/ld: CMakeFiles/run_net.dir/tests/run_net.cpp.o: undefined reference to symbol 'pthread_create@@GLIBC_2.2.5'
//lib/x86_64-linux-gnu/libpthread.so.0: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status
CMakeFiles/run_net.dir/build.make:95: recipe for target 'run_net' failed
make[2]: *** [run_net] Error 1
CMakeFiles/Makefile2:104: recipe for target 'CMakeFiles/run_net.dir/all' failed
make[1]: *** [CMakeFiles/run_net.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2

I know nothing about CMake. Could anyone help me solve this problem?

add debug POSTFIX `d`

I found that I cannot use the release versions of caffe.dll and caffe.lib when I try to debug my project. It would be a good idea to set the postfix "d" to distinguish them, just like OpenCV does:

set(CMAKE_DEBUG_POSTFIX "d" CACHE STRING "Set debug library postfix")

time

@luoyetx thanks for your nice work! The code really is a simplified Caffe for inference.
I have a question about timing. When I measure a forward pass with:
clock_t start = clock();
// ... run the forward pass ...
clock_t end = clock();
float time = (float)(end - start) / CLOCKS_PER_SEC;
the time I get is about 1000 ms, but the profiler in mini-caffe reports about 250 ms. I know clock() is not very accurate and would have expected roughly twice the profiler's time, but why do I get four times?
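
One likely factor (general C/C++ behavior, not something specific to mini-caffe): clock() returns CPU time summed over all threads, so a forward pass running on several cores (e.g. via OpenBLAS) can report several times the wall-clock duration. A wall-clock measurement with std::chrono avoids that; a sketch:

#include <chrono>
#include <iostream>

int main() {
  auto start = std::chrono::steady_clock::now();
  // ... run the net's forward pass here ...
  auto end = std::chrono::steady_clock::now();
  // elapsed wall-clock time in milliseconds, independent of thread count
  float ms = std::chrono::duration<float, std::milli>(end - start).count();
  std::cout << "elapsed: " << ms << " ms\n";
  return 0;
}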

There is a reshape-related bug after "build with CUDA9 and cuDNN7"

After I updated mini-caffe, something strange happens.
How to reproduce:

  1. execute a forward with shape (2,c,w,h)
  2. then run a forward with shape (1,c,w,h)
  3. re-execute a forward with shape (2,c,w,h); note that the input is the same as in step 1, yet the output differs from step 1, and if you re-execute step 2, its output changes too.

I rolled back to "build with CUDA9 and cuDNN7" and everything is fine, so this bug must have been introduced after that version. I will test other versions when I have time.

@luoyetx

Question about using the GPU

Hello,
Why is the GPU used even though I never invoke caffe::SetMode(caffe::GPU, 0) in my program? Although I compiled with -DUSE_CUDA=ON -DUSE_CUDNN=ON, the value of mode_ is Caffe::CPU, so it should run in CPU mode (I checked this in common.cpp). My understanding is that even when compiled with CUDA and cuDNN on, the mode stays Caffe::CPU unless SetMode is called, so the GPU should not be occupied (I checked with nvidia-smi).

Using a mini-caffe net as a class member

Hello,
I have a class like:

class A {
 private:
  caffe::Net* net_;
};

and a constructor like:

A::A() {
  net_ = new caffe::Net("**.prototxt");
}

A weird problem occurs. When I use class A like:

A* a = new A();

everything is OK. But when I use it like:

A a;

it crashes! After running the net's forward pass, I get a NaN result.

how to use GPU acceleration?

Hello, I use the command "cmake .. -DUSE_CUDA=ON -DUSE_CUDNN=ON" to enable the GPU, but I only get a small speedup, not like Caffe. These are the results:

minicaffe cpu: 51 ms, gpu: 44 ms
caffe cpu: 78 ms, gpu: 9 ms
I know that to use the GPU in Caffe you add "caffe.set_mode_gpu()" in the source file; how about mini-caffe? Did you optimize forward_cpu_gemm? Why does your program run faster than Caffe on the CPU?
Thank you!
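
Judging from the "Question about using the GPU" issue above, mini-caffe appears to expose caffe::SetMode for this. A sketch under that assumption (the prototxt path is a placeholder, and the Net/Forward usage follows the examples quoted elsewhere in these issues, so treat it as hypothetical):

caffe::SetMode(caffe::GPU, 0);      // select GPU mode on device 0
caffe::Net net("deploy.prototxt");  // load the network
net.Forward();                      // hypothetical call: run inference on the GPU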

Layer::Forward does not work when inferring Faster R-CNN

I see that the master code of Layer::Forward has been changed to this:

inline void Layer::Forward(const vector<Blob*>& bottom,
                           const vector<Blob*>& top) {
  switch (Caffe::mode()) {
  case Caffe::CPU:
    Forward_cpu(bottom, top);
    break;
  case Caffe::GPU:
    Forward_gpu(bottom, top);
    break;
  default:
    LOG(FATAL) << "Unknown caffe mode.";
  }
}

There is no Reshape() call before the actual layer forward.
This change breaks Faster R-CNN, because that kind of network reshapes top blobs from bottom blobs at runtime.
I think this feature should be considered. Thanks.
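
For comparison, a sketch of what this issue asks for; it mirrors stock BVLC Caffe, where Layer::Forward calls Reshape before dispatching (this is not mini-caffe's current code):

inline void Layer::Forward(const vector<Blob*>& bottom,
                           const vector<Blob*>& top) {
  Reshape(bottom, top);  // let dynamic nets (e.g. Faster R-CNN) resize tops
  switch (Caffe::mode()) {
  case Caffe::CPU:
    Forward_cpu(bottom, top);
    break;
  case Caffe::GPU:
    Forward_gpu(bottom, top);
    break;
  default:
    LOG(FATAL) << "Unknown caffe mode.";
  }
}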

How to add a new layer?

Thank you for sharing the code. Could you tell me how to add a new layer in mini-caffe? Is it the same as in Caffe?
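
For orientation, a sketch of the stock-Caffe pattern, adapted to the untemplated, forward-only Layer interface quoted in the Layer::Forward issue above (mini-caffe's actual base-class and registration macros may differ; REGISTER_LAYER_CLASS is the stock-Caffe macro):

class MyLayer : public Layer {
 public:
  virtual void Reshape(const vector<Blob*>& bottom,
                       const vector<Blob*>& top) {
    top[0]->ReshapeLike(*bottom[0]);  // e.g. an elementwise layer
  }
  virtual void Forward_cpu(const vector<Blob*>& bottom,
                           const vector<Blob*>& top) {
    // fill top[0] from bottom[0] here
  }
};

REGISTER_LAYER_CLASS(My);  // so `type: "My"` in a prototxt resolves to MyLayer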

problem on linux

CPU mode is OK, but GPU mode produces many errors:

/usr/include/c++/4.8/cmath(278): error: inline specifier allowed on function declarations only

/usr/include/c++/4.8/cmath(278): error: variable "std::constexpr" has already been defined

/usr/include/c++/4.8/cmath(278): error: expected a ";"

/usr/include/c++/4.8/cmath(328): error: "constexpr" is not a function or static data member

(the same three errors repeat for cmath lines 297, 337, 356, 375, 406 and 443)

Error limit reached.
100 errors detected in the compilation of "/tmp/tmpxft_00005df7_00000000-7_math_functions.cpp1.ii".
Compilation terminated.
CMake Error at cuda_compile_generated_math_functions.cu.o.cmake:264 (message):
Error generating file
/home/methods/mini-caffe-master/build_gpu/CMakeFiles/cuda_compile.dir/src/util/./cuda_compile_generated_math_functions.cu.o

make[2]: *** [CMakeFiles/cuda_compile.dir/src/util/./cuda_compile_generated_math_functions.cu.o] Error 1
make[1]: *** [CMakeFiles/caffe.dir/all] Error 2
make: *** [all] Error 2

CPU mode speed problem

I found that mini-caffe runs slower than the official version in CPU mode. It comes down to the conv layer, as shown in the figure below. Can anyone give a tip on why the conv layer would be slow?

left: mini-caffe time. right: official Caffe time. Run on Win10 with VS2013.

[screenshot: per-layer timing comparison]

//lib/x86_64-linux-gnu/libpthread.so.0: error adding symbols: DSO missing from command line

What is this error?

[100%] Linking CXX executable run_net
/usr/bin/ld: CMakeFiles/run_net.dir/tests/run_net.cpp.o: undefined reference to symbol 'pthread_create@@GLIBC_2.2.5'
//lib/x86_64-linux-gnu/libpthread.so.0: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status
CMakeFiles/run_net.dir/build.make:99: recipe for target 'run_net' failed
make[2]: *** [run_net] Error 1
CMakeFiles/Makefile2:141: recipe for target 'CMakeFiles/run_net.dir/all' failed
make[1]: *** [CMakeFiles/run_net.dir/all] Error 2
Makefile:83: recipe for target 'all' failed

compiling error

I have the following error:
e:\Projects\DeepLearning\mini-caffe-master\src\caffe\util\upgrade_proto.cpp(196): error C2660: 'caffe::ConvolutionParameter::set_pad': Function does not accept one input argument

Could you please help me?

static memory place

Hello, I'm confused about something: in the new version, the relationships between the blob pointers in blobs_ are already planned at network-initialization time. How is that different from dynamically returning MemBlocks to the MemPool during Forward based on blob_life_time_? The analysis results from tools/parse_mem.py look about the same to me either way.

How should the macro REGISTER_LAYER_GREATOR be understood?

Judging from its definition, the macro REGISTER_LAYER_GREATOR seems to declare a static function, but I couldn't find a concrete implementation of that static function anywhere; I only see that the static function generated by the macro is called after a layer's creator is created. I'm puzzled; could the author please explain? Thanks!
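
The idiom such macros usually expand to looks like the sketch below (a hypothetical reconstruction based on stock Caffe's REGISTER_LAYER_CREATOR; mini-caffe's exact expansion may differ). The trick is a static object, not a static function: its constructor runs before main() and inserts the creator into a global registry.

#include <map>
#include <string>

class Layer;
typedef Layer* (*Creator)();  // simplified creator signature for the sketch

class LayerRegistry {
 public:
  static std::map<std::string, Creator>& Registry() {
    static std::map<std::string, Creator> registry;  // global creator table
    return registry;
  }
  static void AddCreator(const std::string& type, Creator creator) {
    Registry()[type] = creator;
  }
};

// The "static function" the macro seems to declare is really a static object:
// its constructor runs at static-initialization time and registers the creator.
class LayerRegisterer {
 public:
  LayerRegisterer(const std::string& type, Creator creator) {
    LayerRegistry::AddCreator(type, creator);
  }
};

#define REGISTER_LAYER_CREATOR(type, creator) \
  static LayerRegisterer g_creator_##type(#type, creator)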

Memory Leak when using Threads

Hi,

When I create a new caffe::Net object in a thread, the memory is not freed.

The example below, without a thread, works fine and frees the memory:

#include <iostream>
// (plus mini-caffe's Net header)

int main() {
  for (int i = 0; i < 10; ++i) {
    caffe::Net* cnn = new caffe::Net("deploy.prototxt");
    delete cnn;
  }
  std::cin.ignore();
  return 0;
}

The example below, using threads, doesn't free the memory:

#include <iostream>
#include <thread>
// (plus mini-caffe's Net header)

void threadWork() {
  caffe::Net* cnn = new caffe::Net("deploy.prototxt");
  delete cnn;
}

int main() {
  for (int i = 0; i < 10; ++i) {
    std::thread t1(threadWork);
    t1.join();
  }
  std::cin.ignore();
  return 0;
}

Do you know what is causing this?

About the LSTM layer.

Hi, guys. Is there any implementation of an LSTM layer? I tried to add one but failed.
