Comments (11)
Which convolution parameters and algorithm do you use?
from nnpack.
Default parameters: the AUTO algorithm with the BLOCK_BASED transform strategy.
from nnpack.
This is my prototxt.
Input data is 1 x 3 x 60 x 60. (I also tried another input image size, 640 x 480; it is also slower than OpenBLAS, taking about 2x as long.)
My OpenBLAS is compiled with NDK r12b, including gfortran.
from nnpack.
@conansherry NNPACK only supports its fast convolution algorithms with stride 1; when stride > 1, NNPACK also uses im2col + SGEMM.
However, I wonder why it takes 2x as long as OpenBLAS.
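For readers unfamiliar with the term, here is a minimal pure-Python sketch of the im2col + GEMM idea mentioned above. It is not NNPACK's or Caffe's code: the helper names (`im2col`, `matmul`, `conv_im2col`, `conv_direct`) and the tiny integer test data are made up for illustration, and no padding is assumed.

```python
def im2col(image, kh, kw, stride=1):
    """Unroll a C x H x W nested list into a (C*kh*kw) x (oh*ow) matrix."""
    channels, height, width = len(image), len(image[0]), len(image[0][0])
    oh = (height - kh) // stride + 1
    ow = (width - kw) // stride + 1
    rows = []
    for c in range(channels):
        for dy in range(kh):
            for dx in range(kw):
                # One row per (channel, kernel offset), one column per output pixel.
                rows.append([image[c][y * stride + dy][x * stride + dx]
                             for y in range(oh) for x in range(ow)])
    return rows, oh, ow


def matmul(a, b):
    """Naive stand-in for SGEMM: a is M x K, b is K x N."""
    return [[sum(a_row[k] * b[k][n] for k in range(len(b)))
             for n in range(len(b[0]))] for a_row in a]


def conv_im2col(image, filters, stride=1):
    """filters is M x C x kh x kw; returns an M x oh x ow output."""
    kh, kw = len(filters[0][0]), len(filters[0][0][0])
    cols, oh, ow = im2col(image, kh, kw, stride)
    # Flatten each filter in the same (channel, dy, dx) order used by im2col.
    weights = [[w for ch in f for r in ch for w in r] for f in filters]
    flat = matmul(weights, cols)
    return [[row[y * ow:(y + 1) * ow] for y in range(oh)] for row in flat]


def conv_direct(image, filters, stride=1):
    """Reference: plain sliding-window convolution (cross-correlation)."""
    kh, kw = len(filters[0][0]), len(filters[0][0][0])
    oh = (len(image[0]) - kh) // stride + 1
    ow = (len(image[0][0]) - kw) // stride + 1
    return [[[sum(f[c][dy][dx] * image[c][y * stride + dy][x * stride + dx]
                  for c in range(len(image))
                  for dy in range(kh) for dx in range(kw))
              for x in range(ow)] for y in range(oh)] for f in filters]


# Hypothetical data: 2 x 5 x 5 image, three 2 x 3 x 3 filters.
image = [[[c * 25 + y * 5 + x for x in range(5)] for y in range(5)]
         for c in range(2)]
filters = [[[[m + c + dy - dx for dx in range(3)] for dy in range(3)]
            for c in range(2)] for m in range(3)]

print(conv_im2col(image, filters, stride=2) == conv_direct(image, filters, stride=2))
```

The point of the sketch is that any stride works with this formulation, because the stride only affects which input pixels are copied into the unrolled matrix.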
from nnpack.
@austingg Oh, I looked at the source code, and you are right.
According to my experiments, OpenBLAS with gfortran is the best BLAS library on Android, compared to pure-C OpenBLAS and Eigen.
from nnpack.
@conansherry Thanks for sharing your experimental results. I will do some further experiments on OpenBLAS with gfortran.
from nnpack.
@Maratyszcza @austingg Does NNPACK only support specific kernel sizes such as 3x3 or 16x16? In my new test, kernel size 5 with stride 1 produces wrong results.
from nnpack.
@Maratyszcza @austingg Oh, I looked up the Caffe2 implementation. I used tuple_based and everything is OK. I also checked the source code in convolution-inference.c; the other mode is not implemented.
from nnpack.
@Maratyszcza @austingg
Here I will share some results from my experiments.
Android mobile phone: XIAOMI 5 Plus. All libraries run in single-threaded mode.
The deep learning net contains 4 conv layers and two inner product layers. All conv layers have stride 1 and kernel size equal to 5 or 3.
I ran the program 10 times for each experiment. Here are the results; based on them, I will continue to use OpenBLAS in my library. Thank you for sharing the NNPACK source; it also does a good job on multi-core CPUs.
openblas with gfortran
time forward 11.841000
time forward 10.097000
time forward 10.139000
time forward 10.583000
time forward 10.498000
time forward 10.358000
time forward 10.501000
time forward 10.440000
time forward 10.524000
time forward 10.268000
NNPACK FFT16X16
time forward 32.105999
time forward 28.781000
time forward 29.034000
time forward 61.912998
time forward 31.129999
time forward 27.649000
time forward 27.438000
time forward 26.731001
time forward 31.448000
time forward 28.899000
NNPACK FFT8X8
time forward 21.823999
time forward 21.607000
time forward 13.321000
time forward 15.339000
time forward 33.285000
time forward 19.327000
time forward 20.174000
time forward 16.476999
time forward 15.926000
time forward 16.066000
NNPACK AUTO
time forward 19.642000
time forward 20.684000
time forward 17.167999
time forward 15.738000
time forward 15.673000
time forward 14.938000
time forward 14.289000
time forward 17.891001
time forward 17.363001
time forward 16.375000
NNPACK SGEMM
time forward 23.778999
time forward 22.764000
time forward 33.705002
time forward 34.299000
time forward 28.004000
time forward 30.851999
time forward 25.034000
time forward 25.563999
time forward 33.702999
time forward 23.247999
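To make the five runs above easier to compare, here is a small sketch that summarizes them with the standard library. The numbers are copied from the logs above; the units are not stated in the logs (milliseconds is an assumption), and the median is used for ranking because several runs contain obvious outliers.

```python
from statistics import mean, median

# "time forward" values copied from the logs above (units assumed to be ms).
timings = {
    "openblas with gfortran": [11.841000, 10.097000, 10.139000, 10.583000, 10.498000,
                               10.358000, 10.501000, 10.440000, 10.524000, 10.268000],
    "NNPACK FFT16X16": [32.105999, 28.781000, 29.034000, 61.912998, 31.129999,
                        27.649000, 27.438000, 26.731001, 31.448000, 28.899000],
    "NNPACK FFT8X8": [21.823999, 21.607000, 13.321000, 15.339000, 33.285000,
                      19.327000, 20.174000, 16.476999, 15.926000, 16.066000],
    "NNPACK AUTO": [19.642000, 20.684000, 17.167999, 15.738000, 15.673000,
                    14.938000, 14.289000, 17.891001, 17.363001, 16.375000],
    "NNPACK SGEMM": [23.778999, 22.764000, 33.705002, 34.299000, 28.004000,
                     30.851999, 25.034000, 25.563999, 33.702999, 23.247999],
}


def summarize(runs):
    """Return basic statistics for one list of forward-pass timings."""
    return {"mean": mean(runs), "median": median(runs),
            "min": min(runs), "max": max(runs)}


summary = {name: summarize(runs) for name, runs in timings.items()}
# Rank configurations from fastest to slowest by median time.
ranking = sorted(summary, key=lambda name: summary[name]["median"])

for name in ranking:
    s = summary[name]
    print(f"{name:24s} median {s['median']:7.3f}  mean {s['mean']:7.3f}  "
          f"min {s['min']:7.3f}  max {s['max']:7.3f}")
```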
from nnpack.
@conansherry That's a pretty good result: a CNN application that costs only about 10 ms on a mobile device.
According to my research, gfortran is only related to LAPACK, and the conv layers only use GEMM. Have you done any experiments without gfortran? Correct me if I am wrong.
from nnpack.
- The implicit GEMM algorithm is similar to Caffe's im2col+SGEMM, but it is optimized for a smaller memory footprint. This memory footprint optimization can make it slower than im2col+SGEMM.
- For stride > 1, only the implicit GEMM algorithm is supported in NNPACK.
- When the number of channels on the input to the convolution is small, the operation is similar to an outer product: it is intrinsically memory bound, and the fast algorithms in NNPACK do not help with performance.
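The points above can be illustrated with some back-of-the-envelope arithmetic. The sketch below is only a rough estimate, not a measurement and not NNPACK code: `conv_as_gemm` is a made-up helper, it assumes float32 data, no padding, and that every input, weight, and output element is moved exactly once, and the 16 output channels and the 64-channel comparison case are hypothetical values (the thread only gives the 1 x 3 x 60 x 60 input and kernel size 5).

```python
def conv_as_gemm(in_channels, height, width, out_channels, kernel, stride=1):
    """Estimate GEMM shape, im2col buffer size, and FLOPs per byte for a conv layer."""
    out_h = (height - kernel) // stride + 1
    out_w = (width - kernel) // stride + 1
    m = out_channels                      # rows of the weight matrix
    k = in_channels * kernel * kernel     # reduction dimension
    n = out_h * out_w                     # one column per output pixel
    flops = 2 * m * k * n                 # one multiply and one add per term
    input_bytes = 4 * in_channels * height * width
    weight_bytes = 4 * m * k
    output_bytes = 4 * m * n
    return {
        "m": m, "k": k, "n": n,
        # Extra buffer that explicit im2col materializes before calling SGEMM.
        "im2col_bytes": 4 * k * n,
        "input_bytes": input_bytes,
        # Rough arithmetic intensity: higher means less memory bound.
        "flops_per_byte": flops / (input_bytes + weight_bytes + output_bytes),
    }


few = conv_as_gemm(in_channels=3, height=60, width=60, out_channels=16, kernel=5)
many = conv_as_gemm(in_channels=64, height=60, width=60, out_channels=16, kernel=5)

print("3 input channels: ", few)
print("64 input channels:", many)
```

Under these assumptions, the 3-channel case has a much shorter reduction dimension and fewer FLOPs per byte moved than the 64-channel case, and the explicit im2col buffer is many times larger than the input itself, which is the footprint the implicit GEMM variant avoids.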
from nnpack.