jiadilee / ncnn-with-cuda

This project forked from atanmarko/ncnn-with-cuda

Tencent NCNN with added CUDA support

License: Other

Shell 0.11% C++ 58.88% C 26.64% Cuda 1.81% CMake 1.03% GLSL 11.52% Batchfile 0.02%

ncnn-with-cuda's Introduction

This project implements NVIDIA CUDA GPU inference support for the well-known Tencent NCNN inference engine. Many Edge AI projects on the NVIDIA Jetson family of devices could benefit from this support.


Development Status

The following layers are currently implemented in CUDA: AbsVal, BatchNorm, Bias, BinaryOp, BNLL, Concat, Convolution, ConvolutionDepthWise, Crop, Flatten, InnerProduct, Input, Packing, Padding, Pooling, Quantize, ReLU, Reshape, Softmax, Split

Development plan for the near future:

  • CUDA implementation of the layers Eltwise, HardSigmoid, HardSwish, Interp, Scale, Yolov3DetectionOutput
  • Further optimization of the existing CUDA layers (with the goal of beating Vulkan performance ;) )

For use cases where a CUDA layer implementation is missing, the CPU/GPU data ping-pong will slow execution significantly.
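
To get a rough feel for how often data would bounce between devices, the sketch below counts device changes along a linear chain of layer types. It is only an illustration under simplifying assumptions: the `count_device_switches` helper and the example network are hypothetical, every layer in the CUDA list above is assumed to run on the GPU and every other layer on the CPU, and the initial upload and final download are ignored.

```python
# Layers listed above as implemented in CUDA (copied from this README).
CUDA_LAYERS = {
    "AbsVal", "BatchNorm", "Bias", "BinaryOp", "BNLL", "Concat",
    "Convolution", "ConvolutionDepthWise", "Crop", "Flatten",
    "InnerProduct", "Input", "Packing", "Padding", "Pooling",
    "Quantize", "ReLU", "Reshape", "Softmax", "Split",
}


def count_device_switches(layer_types):
    """Count CPU<->GPU hand-offs for a linear chain of layer types.

    Assumption: a layer runs on the GPU if it is in CUDA_LAYERS and on the
    CPU otherwise; data moves whenever consecutive layers use different devices.
    """
    switches = 0
    previous_on_gpu = None
    for layer_type in layer_types:
        on_gpu = layer_type in CUDA_LAYERS
        if previous_on_gpu is not None and on_gpu != previous_on_gpu:
            switches += 1
        previous_on_gpu = on_gpu
    return switches


# Hypothetical network: Interp is still on the "near future" list above.
example = ["Input", "Convolution", "ReLU", "Interp", "Convolution", "Softmax"]
print(count_device_switches(example))  # 2
```

A single missing layer in the middle of a network already costs two transfers (GPU to CPU and back), which is why full layer coverage matters more than the speed of any one kernel.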

The develop branch is used for active development. New layers are developed on develop_<layer_name> branches, which are squashed before merging into the develop branch. Occasionally, upstream updates and fixes are added to the project.

Build and test

Build project:

git clone https://github.com/atanmarko/ncnn-with-cuda
cd ncnn-with-cuda
mkdir build
cd build
cmake -DNCNN_VULKAN=OFF -DNCNN_CUDA=ON -DLOG_LAYERS=OFF ..
make

Test a particular layer

To run the test for a particular layer, which compares the CPU and CUDA implementations and their execution speed (<layer_name> is the name of the CUDA layer in lowercase, e.g. test_convolution):

cd build/tests
./test_<layer_name>

Check which layers are not executed on CUDA

Build the project with the LOG_LAYERS config parameter turned on:

cmake -DNCNN_VULKAN=OFF -DNCNN_CUDA=ON -DLOG_LAYERS=ON ..

Run a particular example or network benchmark and grep for non-CUDA layers:

./retinaface <path to image file> | grep forward | grep -v cuda
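
If the filtered output is long, it may help to tally repeated lines. The sketch below applies the same filter as `grep forward | grep -v cuda` in Python and counts duplicates. The sample log lines are hypothetical, since the actual LOG_LAYERS output format is not shown here, and the function names are illustrative only.

```python
from collections import Counter


def non_cuda_forward_lines(lines):
    """Keep lines containing 'forward' but not 'cuda' (case-sensitive),
    mirroring `grep forward | grep -v cuda`."""
    return [ln.rstrip("\n") for ln in lines if "forward" in ln and "cuda" not in ln]


def tally(lines):
    """Count how often each non-CUDA forward line appears."""
    return Counter(non_cuda_forward_lines(lines))


# Hypothetical log lines; the real LOG_LAYERS output may look different.
sample = [
    "Convolution_cuda::forward\n",
    "Interp::forward\n",
    "Interp::forward\n",
    "ReLU_cuda::forward\n",
    "loading model\n",
]

for line, count in tally(sample).most_common():
    print(count, line)
```

Because the functions accept any iterable of lines, `sys.stdin` could be passed in place of `sample` to process piped program output.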

Run retinaface test program:

Copy the mnet.25-opt.bin and mnet.25-opt.param files, available here, to the build/examples directory:

cd build/examples
./retinaface <path_to_picture_file>

Benchmark Retinaface:

Copy mnet.25-opt.bin and mnet.25-opt.param files to the build/benchmark directory.

cd build/benchmark
./retinaface-benchmark <path_to_picture_file>

It will run 10 loops of Retinaface face detection and print inference timing results. Retinaface stride 32 has all the layers implemented in CUDA.

Device               Image Size  Stride  CPU av. time (us)  Vulkan av. time (us)  CUDA av. time (us)
i7-4790, GTX 1060    640x480     32      28.90              33.20                 31.10
i7-4790, GTX 1060    1280x720    32      92.70              96.90                 54.00
i7-4790, GTX 1060    1920x1080   32      167.50             204.50                91.70
Jetson AGX Xavier    640x480     32      373.20             402.10                343.60
Jetson AGX Xavier    1280x720    32      508.30             738.40                327.60
Jetson AGX Xavier    1920x1080   32      812.00             934.70                436.70
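
The timings above can be turned into relative speedups with a few lines of arithmetic. The sketch below copies the numbers from the table and divides the CPU and Vulkan times by the CUDA time, so values above 1 mean CUDA was faster in that row. The `speedups` helper is illustrative and not part of the project.

```python
# (device, image size, CPU, Vulkan, CUDA) average times copied from the table.
ROWS = [
    ("i7-4790, GTX 1060", "640x480", 28.90, 33.20, 31.10),
    ("i7-4790, GTX 1060", "1280x720", 92.70, 96.90, 54.00),
    ("i7-4790, GTX 1060", "1920x1080", 167.50, 204.50, 91.70),
    ("Jetson AGX Xavier", "640x480", 373.20, 402.10, 343.60),
    ("Jetson AGX Xavier", "1280x720", 508.30, 738.40, 327.60),
    ("Jetson AGX Xavier", "1920x1080", 812.00, 934.70, 436.70),
]


def speedups(rows):
    """Return (device, size, cpu_over_cuda, vulkan_over_cuda) per row.

    Ratios greater than 1 mean the CUDA path took less time.
    """
    return [
        (device, size, cpu / cuda, vulkan / cuda)
        for device, size, cpu, vulkan, cuda in rows
    ]


for device, size, vs_cpu, vs_vulkan in speedups(ROWS):
    print(f"{device:20s} {size:10s} CPU/CUDA={vs_cpu:.2f}  Vulkan/CUDA={vs_vulkan:.2f}")
```

In these measurements CUDA trails the CPU only at 640x480 on the i7-4790 / GTX 1060 machine, and its advantage grows with image size on both devices.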

License

NCNN CUDA implementation: BSD 3-Clause

Original NCNN license: BSD 3-Clause
