
Comments (11)

Maratyszcza avatar Maratyszcza commented on May 20, 2024

Which convolution parameters and algorithm do you use?

from nnpack.

conansherry avatar conansherry commented on May 20, 2024

Default params: the AUTO algorithm with the BLOCK_BASED transform strategy.

conansherry avatar conansherry commented on May 20, 2024

This is my prototxt.
The input data is 1 x 3 x 60 x 60. (I also tried another input size, 640 x 480; it was likewise slower than OpenBLAS, costing about 2x the time.)
My OpenBLAS is compiled with NDK r12b, including gfortran.

austingg avatar austingg commented on May 20, 2024

@conansherry NNPACK only supports convolutions with stride 1; when stride > 1, NNPACK also falls back to im2col + SGEMM.
However, I wonder why it costs 2x the time compared to OpenBLAS.
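For readers unfamiliar with the fallback path mentioned above: im2col unfolds each input patch into a column so that a strided convolution becomes a single matrix multiply. A minimal NumPy sketch of the idea (an illustration only, not NNPACK's actual implementation):

```python
import numpy as np

def im2col(x, kh, kw, stride):
    """Unfold a (C, H, W) input into a (C*kh*kw, out_h*out_w) patch matrix."""
    c, h, w = x.shape
    out_h = (h - kh) // stride + 1
    out_w = (w - kw) // stride + 1
    cols = np.empty((c * kh * kw, out_h * out_w), dtype=x.dtype)
    idx = 0
    for i in range(out_h):
        for j in range(out_w):
            patch = x[:, i*stride:i*stride+kh, j*stride:j*stride+kw]
            cols[:, idx] = patch.ravel()
            idx += 1
    return cols, out_h, out_w

def conv_im2col(x, weights, stride):
    """Convolution as one GEMM: weights (out_c, in_c, kh, kw) -> (out_c, out_h, out_w)."""
    out_c, in_c, kh, kw = weights.shape
    cols, out_h, out_w = im2col(x, kh, kw, stride)
    return (weights.reshape(out_c, -1) @ cols).reshape(out_c, out_h, out_w)
```

The patch matrix duplicates overlapping input pixels, which is why im2col trades extra memory for the ability to use a highly tuned SGEMM.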

conansherry avatar conansherry commented on May 20, 2024

@austingg Oh, I see; I checked the source code and you are right.
OpenBLAS built with gfortran is the best BLAS library on Android according to my experiments, compared to pure-C OpenBLAS and Eigen.

austingg avatar austingg commented on May 20, 2024

@conansherry Thanks for sharing your experiment results. I will run some further experiments on OpenBLAS with gfortran.

conansherry avatar conansherry commented on May 20, 2024

@Maratyszcza @austingg Does NNPACK only support specific kernel sizes like 3x3 or 16x16? In my new test, kernel size 5 with stride 1 produces wrong results.

conansherry avatar conansherry commented on May 20, 2024

@Maratyszcza @austingg Oh, I looked at the Caffe2 implementation. I use the tuple-based transform strategy and everything is OK. I also checked the source code in convolution-inference.c; the other modes are not implemented.
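The FFT-based paths being debugged here must, when correct, match direct convolution exactly (up to floating-point error). A small NumPy check of that equivalence for a 5x5 kernel at stride 1, via the convolution theorem (an illustration of the math, not NNPACK's code):

```python
import numpy as np

def conv2d_direct(x, k):
    """Valid cross-correlation of a 2-D input with a 2-D kernel."""
    kh, kw = k.shape
    out_h, out_w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(x[i:i+kh, j:j+kw] * k)
    return out

def conv2d_fft(x, k):
    """Same result via FFT: pointwise-multiply spectra, then crop the valid region.
    The kernel is flipped because the FFT computes convolution, not correlation."""
    kh, kw = k.shape
    h, w = x.shape[0] + kh - 1, x.shape[1] + kw - 1
    spec = np.fft.rfft2(x, (h, w)) * np.fft.rfft2(k[::-1, ::-1], (h, w))
    full = np.fft.irfft2(spec, (h, w))
    return full[kh-1:x.shape[0], kw-1:x.shape[1]]
```

If an NNPACK transform strategy gives results that diverge from such a direct reference, that points at an unimplemented or buggy code path rather than an inherent limitation of the algorithm.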

conansherry avatar conansherry commented on May 20, 2024

@Maratyszcza @austingg
Here I will share some results from my experiments.
Android mobile phone: XIAOMI 5 Plus. All libraries run in single-threaded mode.
The net contains 4 conv layers and two inner-product layers; all conv layers have stride 1 and kernel size 5 or 3.
I ran the program 10 times for each experiment. Here are the results; based on them, I will continue to use OpenBLAS in my library. Thank you for sharing the NNPACK source; it also does a good job on multi-core CPUs.

OpenBLAS with gfortran (times in ms)
time forward 11.841000
time forward 10.097000
time forward 10.139000
time forward 10.583000
time forward 10.498000
time forward 10.358000
time forward 10.501000
time forward 10.440000
time forward 10.524000
time forward 10.268000

NNPACK FFT16x16 (ms)
time forward 32.105999
time forward 28.781000
time forward 29.034000
time forward 61.912998
time forward 31.129999
time forward 27.649000
time forward 27.438000
time forward 26.731001
time forward 31.448000
time forward 28.899000

NNPACK FFT8x8 (ms)
time forward 21.823999
time forward 21.607000
time forward 13.321000
time forward 15.339000
time forward 33.285000
time forward 19.327000
time forward 20.174000
time forward 16.476999
time forward 15.926000
time forward 16.066000

NNPACK AUTO (ms)
time forward 19.642000
time forward 20.684000
time forward 17.167999
time forward 15.738000
time forward 15.673000
time forward 14.938000
time forward 14.289000
time forward 17.891001
time forward 17.363001
time forward 16.375000

NNPACK SGEMM (ms)
time forward 23.778999
time forward 22.764000
time forward 33.705002
time forward 34.299000
time forward 28.004000
time forward 30.851999
time forward 25.034000
time forward 25.563999
time forward 33.702999
time forward 23.247999
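Averaging the ten runs per backend makes the comparison easier to read. The values below are transcribed from the lists above (rounded to three decimals; units assumed to be milliseconds):

```python
# Forward times per backend, transcribed from the benchmark lists above.
times = {
    "openblas":     [11.841, 10.097, 10.139, 10.583, 10.498,
                     10.358, 10.501, 10.440, 10.524, 10.268],
    "nnpack_fft16": [32.106, 28.781, 29.034, 61.913, 31.130,
                     27.649, 27.438, 26.731, 31.448, 28.899],
    "nnpack_fft8":  [21.824, 21.607, 13.321, 15.339, 33.285,
                     19.327, 20.174, 16.477, 15.926, 16.066],
    "nnpack_auto":  [19.642, 20.684, 17.168, 15.738, 15.673,
                     14.938, 14.289, 17.891, 17.363, 16.375],
    "nnpack_sgemm": [23.779, 22.764, 33.705, 34.299, 28.004,
                     30.852, 25.034, 25.564, 33.703, 23.248],
}
means = {name: round(sum(v) / len(v), 2) for name, v in times.items()}
# openblas ~10.52, nnpack_auto ~16.98, nnpack_fft8 ~19.33,
# nnpack_sgemm ~28.10, nnpack_fft16 ~32.51
```

On this single-threaded workload OpenBLAS is consistently fastest, with NNPACK AUTO the best of the NNPACK modes.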

austingg avatar austingg commented on May 20, 2024

@conansherry That's a pretty good result: a CNN application that costs only about 10 ms on a mobile device.

According to my research, gfortran is only needed for LAPACK; the conv layers only use GEMM. Have you run any experiments without gfortran? Correct me if I am wrong.

Maratyszcza avatar Maratyszcza commented on May 20, 2024
  1. The implicit GEMM algorithm is similar to Caffe's im2col + SGEMM, but it is optimized for a smaller memory footprint. This memory-footprint optimization can make it slower than im2col + SGEMM.
  2. For stride > 1 cases, only the implicit GEMM algorithm is supported in NNPACK.
  3. When the number of input channels to a convolution is small, the operation is similar to an outer product: it is intrinsically memory-bound, and NNPACK's fast algorithms do not help with performance.
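Point 3 can be made concrete with a back-of-the-envelope arithmetic-intensity estimate. This is a rough model (fp32, each tensor touched exactly once, layer sizes chosen for illustration), not a statement about NNPACK internals:

```python
def conv_arithmetic_intensity(c_in, c_out, k, h_out, w_out):
    """Rough FLOPs-per-byte estimate for a dense k x k, stride-1 convolution."""
    flops = 2 * c_in * c_out * k * k * h_out * w_out          # multiply-adds
    h_in, w_in = h_out + k - 1, w_out + k - 1                 # 'valid' padding
    bytes_moved = 4 * (c_in * h_in * w_in                     # read input once
                       + c_in * c_out * k * k                 # read kernel once
                       + c_out * h_out * w_out)               # write output once
    return flops / bytes_moved

# A 3-channel first layer (as in the net discussed above) vs. a deeper layer:
low = conv_arithmetic_intensity(c_in=3, c_out=32, k=5, h_out=56, w_out=56)
high = conv_arithmetic_intensity(c_in=64, c_out=32, k=5, h_out=56, w_out=56)
```

With few input channels the FLOPs-per-byte ratio is low, so the layer is limited by memory bandwidth and a faster transform algorithm cannot speed it up much; the ratio grows roughly linearly with `c_in`.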
