Deep neural network library and toolkit for high-performance inference on NVIDIA Jetson platforms

License: GNU General Public License v2.0



tkDNN

tkDNN is a Deep Neural Network library built with cuDNN and TensorRT primitives, specifically designed to work on NVIDIA Jetson boards. It has been tested on TK1 (branch cudnn2), TX1, TX2, AGX Xavier, Nano, and several discrete GPUs. The main goal of this project is to exploit NVIDIA boards as much as possible to obtain the best inference performance. It does not support training.

If you use tkDNN in your research, please cite the following paper. For use in commercial solutions, write to [email protected] and [email protected], or refer to https://hipert.unimore.it/.

@inproceedings{verucchi2020systematic,
  title={A Systematic Assessment of Embedded Neural Networks for Object Detection},
  author={Verucchi, Micaela and Brilli, Gianluca and Sapienza, Davide and Verasani, Mattia and Arena, Marco and Gatti, Francesco and Capotondi, Alessandro and Cavicchioli, Roberto and Bertogna, Marko and Solieri, Marco},
  booktitle={2020 25th IEEE International Conference on Emerging Technologies and Factory Automation (ETFA)},
  volume={1},
  pages={937--944},
  year={2020},
  organization={IEEE}
}

What's new

20 July 2021

  • Support for semantic segmentation README
  • Support for 2D/3D object detection and tracking README

24 November 2021

30 March 2022

FPS Results

Inference FPS of yolov4 with tkDNN, averaged over 1200 images with the same dimensions as the network input size, on:

  • RTX 2080Ti (CUDA 10.2, cuDNN 7.6.5, TensorRT 7.0.0);
  • AGX Xavier, JetPack 4.3 (CUDA 10.0, cuDNN 7.6.3, TensorRT 6.0.1);
  • Xavier NX, JetPack 4.4 (CUDA 10.2, cuDNN 8.0.0, TensorRT 7.1.0);
  • TX2, JetPack 4.2 (CUDA 10.0, cuDNN 7.3.1, TensorRT 5.0.6);
  • Jetson Nano, JetPack 4.4 (CUDA 10.2, cuDNN 8.0.0, TensorRT 7.1.0).
| Platform | Network | FP32, B=1 | FP32, B=4 | FP16, B=1 | FP16, B=4 | INT8, B=1 | INT8, B=4 |
|---|---|---|---|---|---|---|---|
| RTX 2080Ti | yolo4 320 | 118.59 | 237.31 | 207.81 | 443.32 | 262.37 | 530.93 |
| RTX 2080Ti | yolo4 416 | 104.81 | 162.86 | 169.06 | 293.78 | 206.93 | 353.26 |
| RTX 2080Ti | yolo4 512 | 92.98 | 132.43 | 140.36 | 215.17 | 165.35 | 254.96 |
| RTX 2080Ti | yolo4 608 | 63.77 | 81.53 | 111.39 | 152.89 | 127.79 | 184.72 |
| AGX Xavier | yolo4 320 | 26.78 | 32.05 | 57.14 | 79.05 | 73.15 | 97.56 |
| AGX Xavier | yolo4 416 | 19.96 | 21.52 | 41.01 | 49.00 | 50.81 | 60.61 |
| AGX Xavier | yolo4 512 | 16.58 | 16.98 | 31.12 | 33.84 | 37.82 | 41.28 |
| AGX Xavier | yolo4 608 | 9.45 | 10.13 | 21.92 | 23.36 | 27.05 | 28.93 |
| Xavier NX | yolo4 320 | 14.56 | 16.25 | 30.14 | 41.15 | 42.13 | 53.42 |
| Xavier NX | yolo4 416 | 10.02 | 10.60 | 22.43 | 25.59 | 29.08 | 32.94 |
| Xavier NX | yolo4 512 | 8.10 | 8.32 | 15.78 | 17.13 | 20.51 | 22.46 |
| Xavier NX | yolo4 608 | 5.26 | 5.18 | 11.54 | 12.06 | 15.09 | 15.82 |
| TX2 | yolo4 320 | 11.18 | 12.07 | 15.32 | 16.31 | - | - |
| TX2 | yolo4 416 | 7.30 | 7.58 | 9.45 | 9.90 | - | - |
| TX2 | yolo4 512 | 5.96 | 5.95 | 7.22 | 7.23 | - | - |
| TX2 | yolo4 608 | 3.63 | 3.65 | 4.67 | 4.70 | - | - |
| Nano | yolo4 320 | 4.23 | 4.55 | 6.14 | 6.53 | - | - |
| Nano | yolo4 416 | 2.88 | 3.00 | 3.90 | 4.04 | - | - |
| Nano | yolo4 512 | 2.32 | 2.34 | 3.02 | 3.04 | - | - |
| Nano | yolo4 608 | 1.40 | 1.41 | 1.92 | 1.93 | - | - |

mAP Results

Results on COCO val 2017 (5k images), on RTX 2080Ti, with confidence threshold 0.001:

| Network | tkDNN (CodaLab) mAP(0.5:0.95) | tkDNN (CodaLab) AP50 | darknet (CodaLab) mAP(0.5:0.95) | darknet (CodaLab) AP50 | tkDNN map demo mAP(0.5:0.95) | tkDNN map demo AP50 |
|---|---|---|---|---|---|---|
| Yolov3 (416x416) | 0.381 | 0.675 | 0.380 | 0.675 | 0.372 | 0.663 |
| yolov4 (416x416) | 0.468 | 0.705 | 0.471 | 0.710 | 0.459 | 0.695 |
| yolov3tiny (416x416) | 0.096 | 0.202 | 0.096 | 0.201 | 0.093 | 0.198 |
| yolov4tiny (416x416) | 0.202 | 0.400 | 0.201 | 0.400 | 0.197 | 0.395 |
| Cnet-dla34 (512x512) | 0.366 | 0.543 | - | - | 0.361 | 0.535 |
| mv2SSD (512x512) | 0.226 | 0.381 | - | - | 0.223 | 0.378 |


Dependencies

This branch works on every NVIDIA GPU that supports the following (latest tested) dependencies:

  • CUDA 11.3 (or >= 10.2)
  • cuDNN 8.2.1 (or >= 8.0.4)
  • TensorRT 8.0.3 (or >=7.2)
  • OpenCV 4.5.4 (or >=4)
  • cmake 3.21 (or >= 3.15)
  • yaml-cpp 0.5.2
  • eigen3 3.3.4
  • curl 7.58
sudo apt install libyaml-cpp-dev curl libeigen3-dev

About OpenCV

To compile and install OpenCV4 with contrib, use the script install_OpenCV4.sh. It will download and compile OpenCV in the Download folder.

bash scripts/install_OpenCV4.sh

If you have OpenCV compiled with CUDA and contrib and want to use it with tkDNN, pass the ENABLE_OPENCV_CUDA_CONTRIB=ON flag when compiling tkDNN (e.g. cmake -DCMAKE_BUILD_TYPE=Release -DENABLE_OPENCV_CUDA_CONTRIB=ON ..). If the flag is not passed, network preprocessing is computed on the CPU; otherwise it runs on the GPU, which saves a few milliseconds of end-to-end latency.

How to compile this repo

Build with cmake. If using Ubuntu 18.04, a newer version of cmake is needed (3.15 or above). On both Linux and Windows, the CMAKE_BUILD_TYPE variable needs to be defined as either Release or Debug.

git clone https://github.com/ceccocats/tkDNN
cd tkDNN
mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Release .. 
make

Workflow

Steps needed to do inference with tkDNN on a custom neural network:

  • Build and train an NN model with your favorite framework.
  • Export the weights and biases for each layer and save them in a binary file (one per layer).
  • Export the outputs for each layer and save them in a binary file (one per layer).
  • Create a new test that defines the network layer by layer, using the extracted weights, and uses the exported outputs to check the results (see the sketch below).
  • Do inference.
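
As a rough sketch of what such a test can look like for a darknet-style model, here is a minimal skeleton based on the test code quoted in the issues further down this page; the "mynet" paths and the layer index are illustrative, not part of the repo:

#include <string>
#include <vector>
#include "tkdnn.h"
#include "test.h"
#include "DarknetParser.h"

int main() {
    // Illustrative layout: weights exported to mynet/layers, debug outputs to mynet/debug
    std::string bin_path = "mynet";
    std::vector<std::string> input_bins  = { bin_path + "/layers/input.bin" };
    std::vector<std::string> output_bins = { bin_path + "/debug/layer82_out.bin" };

    // Parse the darknet cfg and the exported weights into a tkDNN network
    tk::dnn::Network *net = tk::dnn::darknetParser("mynet.cfg", bin_path + "/layers", "mynet.names");
    net->print();

    // Convert the network to TensorRT, then check inference against the exported outputs
    tk::dnn::NetworkRT *netRT = new tk::dnn::NetworkRT(net, net->getNetworkRTName(bin_path.c_str()));
    int ret = testInference(input_bins, output_bins, net, netRT);

    net->releaseLayers();
    delete net;
    delete netRT;
    return ret;
}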

Exporting weights

For specific details on how to export weights see HERE.

Run the demos

For specific details on how to run:

  • 2D object detection demos, details on FP16, INT8 and batching see HERE.
  • segmentation demos see HERE.
  • monocular depth estimation see HERE.
  • 2D/3D object detection and tracking demos see HERE.
  • mAP demo to evaluate 2D object detectors see HERE.

[demo GIF]

tkDNN on Windows 10 or Windows 11

For specific details on how to run tkDNN on Windows 10/11 see HERE.

Existing tests and supported networks

| Test Name | Network | Dataset | N Classes | Input size | Weights |
|---|---|---|---|---|---|
| yolo | YOLO v2 [1] | COCO 2014 | 80 | 608x608 | weights |
| yolo_224 | YOLO v2 [1] | COCO 2014 | 80 | 224x224 | weights |
| yolo_berkeley | YOLO v2 [1] | BDD100K | 10 | 416x736 | weights |
| yolo_relu | YOLO v2 (with ReLU, not Leaky) [1] | COCO 2014 | 80 | 416x416 | weights |
| yolo_tiny | YOLO v2 tiny [1] | COCO 2014 | 80 | 416x416 | weights |
| yolo_voc | YOLO v2 [1] | VOC | 21 | 416x416 | weights |
| yolo3 | YOLO v3 [2] | COCO 2014 | 80 | 416x416 | weights |
| yolo3_512 | YOLO v3 [2] | COCO 2017 | 80 | 512x512 | weights |
| yolo3_berkeley | YOLO v3 [2] | BDD100K | 10 | 320x544 | weights |
| yolo3_coco4 | YOLO v3 [2] | COCO 2014 | 4 | 416x416 | weights |
| yolo3_flir | YOLO v3 [2] | FREE FLIR | 3 | 320x544 | weights |
| yolo3_tiny | YOLO v3 tiny [2] | COCO 2014 | 80 | 416x416 | weights |
| yolo3_tiny512 | YOLO v3 tiny [2] | COCO 2017 | 80 | 512x512 | weights |
| dla34 | Deep Layer Aggregation (DLA) 34 [3] | COCO 2014 | 80 | 224x224 | weights |
| dla34_cnet | Centernet (DLA34 backend) [4] | COCO 2017 | 80 | 512x512 | weights |
| mobilenetv2ssd | MobileNet v2 SSD Lite [5] | VOC | 21 | 300x300 | weights |
| mobilenetv2ssd512 | MobileNet v2 SSD Lite [5] | COCO 2017 | 81 | 512x512 | weights |
| resnet101 | ResNet 101 [6] | COCO 2014 | 80 | 224x224 | weights |
| resnet101_cnet | Centernet (ResNet101 backend) [4] | COCO 2017 | 80 | 512x512 | weights |
| csresnext50-panet-spp | Cross Stage Partial Network [7] | COCO 2014 | 80 | 416x416 | weights |
| yolo4 | Yolov4 [8] | COCO 2017 | 80 | 416x416 | weights |
| yolo4_320 | Yolov4 [8] | COCO 2017 | 80 | 320x320 | weights |
| yolo4_512 | Yolov4 [8] | COCO 2017 | 80 | 512x512 | weights |
| yolo4_608 | Yolov4 [8] | COCO 2017 | 80 | 608x608 | weights |
| yolo4_berkeley | Yolov4 [8] | BDD100K | 10 | 544x320 | weights |
| yolo4tiny | Yolov4 tiny [9] | COCO 2017 | 80 | 416x416 | weights |
| yolo4x | Yolov4x-mish [9] | COCO 2017 | 80 | 640x640 | weights |
| yolo4tiny_512 | Yolov4 tiny [9] | COCO 2017 | 80 | 512x512 | weights |
| yolo4x-cps | Scaled Yolov4 [10] | COCO 2017 | 80 | 512x512 | weights |
| shelfnet | ShelfNet18_realtime [11] | Cityscapes | 19 | 1024x1024 | weights |
| shelfnet_berkeley | ShelfNet18_realtime [11] | DeepDrive | 20 | 1024x1024 | weights |
| dla34_cnet3d | Centernet3D (DLA34 backend) [4] | KITTI 2017 | 1 | 512x512 | weights |
| dla34_ctrack | CenterTrack (DLA34 backend) [12] | NuScenes 3D | 7 | 512x512 | weights |
| monodepth2 | Monodepth2 [13] | KITTI DEPTH | - | 640x192 | weights-mono |
| monodepth2 | Monodepth2 [13] | KITTI DEPTH | - | 640x192 | weights-stereo |

References

  1. Redmon, Joseph, and Ali Farhadi. "YOLO9000: better, faster, stronger." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
  2. Redmon, Joseph, and Ali Farhadi. "Yolov3: An incremental improvement." arXiv preprint arXiv:1804.02767 (2018).
  3. Yu, Fisher, et al. "Deep layer aggregation." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
  4. Zhou, Xingyi, Dequan Wang, and Philipp Krähenbühl. "Objects as points." arXiv preprint arXiv:1904.07850 (2019).
  5. Sandler, Mark, et al. "Mobilenetv2: Inverted residuals and linear bottlenecks." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
  6. He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
  7. Wang, Chien-Yao, et al. "CSPNet: A New Backbone that can Enhance Learning Capability of CNN." arXiv preprint arXiv:1911.11929 (2019).
  8. Bochkovskiy, Alexey, Chien-Yao Wang, and Hong-Yuan Mark Liao. "YOLOv4: Optimal Speed and Accuracy of Object Detection." arXiv preprint arXiv:2004.10934 (2020).
  9. Bochkovskiy, Alexey, "Yolo v4, v3 and v2 for Windows and Linux" (https://github.com/AlexeyAB/darknet)
  10. Wang, Chien-Yao, Alexey Bochkovskiy, and Hong-Yuan Mark Liao. "Scaled-YOLOv4: Scaling Cross Stage Partial Network." arXiv preprint arXiv:2011.08036 (2020).
  11. Zhuang, Juntang, et al. "ShelfNet for fast semantic segmentation." Proceedings of the IEEE International Conference on Computer Vision Workshops. 2019.
  12. Zhou, Xingyi, Vladlen Koltun, and Philipp Krähenbühl. "Tracking objects as points." European Conference on Computer Vision. Springer, Cham, 2020.
  13. Godard, Clément, et al. "Digging into self-supervised monocular depth estimation." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019.

Contributors

The main contributors are alessiolei94, ceccocats, fabiobagni, mive93, omaralvarez, perseusdg, rcavicchioli, rickymedrano, and sapienzadavide.


tkdnn's Issues

Error when calculating mAP

@ceccocats
Hello, I have a small problem. I trained yolov4 with my dataset (13 classes) and prepared the test data following these steps:

  • Two folders, ./images and ./labels. Each .txt in labels has one line per object, following the format:
    label x_center_box/img_width y_center_box/img_height width_box/img_width height_box/img_height
  • A new config file like below

[screenshot: the new config file]

but when I run ./map_demo I get an error:

[screenshot: the map_demo error]

Can you help me? Thanks

Error reading layer bin

Hi @ceccocats!

I have the following configuration file:
configuration.cfg.zip

When I export using the description in the readme, it seems to be successful:

network input size: 2076672
Predicted in 5.818229 seconds.

networks output size: 18928

I created a test, based on the yolo test, changing the yolo layers in the output_bins like this:

#include <iostream>
#include <vector>
#include "tkdnn.h"
#include "test.h"
#include "DarknetParser.h"

int main() {
    std::string bin_path  = "configuration";
    std::vector<std::string> input_bins = { 
        bin_path + "/layers/input.bin"
    };
    std::vector<std::string> output_bins = {
        bin_path + "/debug/layer60_out.bin",
        bin_path + "/debug/layer66_out.bin",
        bin_path + "/debug/layer71_out.bin"
    };
    std::string wgs_path  = bin_path + "/layers";
    std::string cfg_path  = std::string(TKDNN_PATH) + "/tests/darknet/cfg/configuration.cfg";
    std::string name_path = std::string(TKDNN_PATH) + "/tests/darknet/names/configuration.names";
    // downloadWeightsifDoNotExist(input_bins[0], bin_path, "https://cloud.hipert.unimore.it/s/d97CFzYqCPCp5Hg/download");

    // parse darknet network
    tk::dnn::Network *net = tk::dnn::darknetParser(cfg_path, wgs_path, name_path);
    net->print();

    //convert network to tensorRT
    tk::dnn::NetworkRT *netRT = new tk::dnn::NetworkRT(net, net->getNetworkRTName(bin_path.c_str()));
    
    int ret = testInference(input_bins, output_bins, net, netRT);
    net->releaseLayers();
    delete net;
    delete netRT;
    return ret;
}

Everything compiles successfully, but when I execute the test I obtain the following output:

./test_configuration
Not supported field: batch=128
Not supported field: subdivisions=32
Not supported field: momentum=0.9
Not supported field: decay=0.0005
Not supported field: angle=0
Not supported field: saturation = 1.5
Not supported field: exposure = 1.5
Not supported field: hue=.1
Not supported field: mixup=1
Not supported field: learning_rate=0.001
Not supported field: burn_in=9000
Not supported field: max_batches = 200000
Not supported field: policy=sgdr
Not supported field: sgdr_cycle=1000
Not supported field: sgdr_mult=2
Not supported field: steps=4000,6000,8000,9000
New NETWORK (tkDNN v0.5, CUDNN v7.605)
Reading weights: I=3 O=16 KERNEL=3x3x1
Reading weights: I=16 O=32 KERNEL=3x3x1
Not supported field: antialiasing=1
Reading weights: I=32 O=64 KERNEL=3x3x1
Not supported field: antialiasing=1
Reading weights: I=64 O=128 KERNEL=3x3x1
Not supported field: antialiasing=1
Reading weights: I=128 O=256 KERNEL=3x3x1
Not supported field: antialiasing=1
Reading weights: I=256 O=512 KERNEL=3x3x1
Reading weights: I=512 O=1024 KERNEL=3x3x1
Reading weights: I=1024 O=256 KERNEL=1x1x1
Not supported field: assisted_excitation=4000
Reading weights: I=256 O=512 KERNEL=3x3x1
Reading weights: I=512 O=128 KERNEL=1x1x1
Reading weights: I=384 O=128 KERNEL=1x1x1
Reading weights: I=128 O=256 KERNEL=3x3x1
Reading weights: I=256 O=128 KERNEL=1x1x1
Reading weights: I=256 O=64 KERNEL=1x1x1
Reading weights: I=64 O=128 KERNEL=3x3x1
Reading weights: I=32 O=64 KERNEL=1x1x1
Reading weights: I=64 O=64 KERNEL=1x1x1
Reading weights: I=64 O=64 KERNEL=1x1x1
Reading weights: I=64 O=64 KERNEL=1x1x1
Reading weights: I=64 O=64 KERNEL=1x1x1
Reading weights: I=128 O=64 KERNEL=1x1x1
Reading weights: I=128 O=64 KERNEL=1x1x1
Reading weights: I=128 O=64 KERNEL=1x1x1
Reading weights: I=256 O=64 KERNEL=1x1x1
Reading weights: I=512 O=64 KERNEL=1x1x1
Not supported field: maxpool_depth=1
Reading weights: I=768 O=128 KERNEL=3x3x1
Error reading file configuration/layers/c58.bin with n of float: 884736 seek: 0 size: 3538944

/home/mads/bruno/Documentos/tkDNN/src/utils.cpp:58
Aborting...

What could be causing this? Thank you in advance!

Have an example of image_list.txt and label_list.txt?

Hi, we are trying TKDNN_MODE=INT8, with the following settings:

export TKDNN_CALIB_IMG_PATH=/path/to/calibration/image_list.txt
export TKDNN_CALIB_LABEL_PATH=/path/to/calibration/label_list.txt

Do you have an example of image_list.txt and label_list.txt?

Why does yolov3 run slower on Tesla T4 than on RTX2080ti?

GeForce RTX 2080 Ti: FP16 (half) performance 26.90 TFLOPS (2:1)
Tesla T4: FP16 (half) performance 65.13 TFLOPS (8:1)

Images per second:

| Batch size | T4 | RTX2080ti |
|---|---|---|
| 1 | 145.507 | 197.528 |
| 2 | 241.571 | 286.453 |
| 4 | 260.832 | 430.495 |
| 8 | 265.315 | 520.995 |
| 16 | 283.619 | 488.933 |
| 32 | 256.451 | 550.153 |
| 64 | 245.872 | 554.456 |
| 128 | 243.505 | 547.426 |
| 256 | 228.099 | - |

P.S.: tested with tkDNN

yolov3 network with a different backbone

I appreciate your nice work. How do I use tkDNN if I have a yolov3 network with a custom backbone? Is there any difference in the model exportation? For test inference, do I just need to modify the file "yolov3.h", or other code as well?
Thanks!

Converting a yolov4 model to TensorRT gives a "Wrong" error

Hi, thanks for this awesome project.
I have a question about converting a yolov4 model.
I tried running test_yolo4 with the yolo-608 config and got red "Wrong" lines in the log, as shown below:
[screenshot: test_yolo4 output with "Wrong" lines]

What does the "Wrong" line mean?
Thanks for the repo again.

when to set pad to 0?

Hi,

I've noticed that sometimes padding is set to 0 while the model (darknet cfg file) has it set to 1. When should padding be set to 0 in a Conv2d object?

tk::dnn::Conv2d c17( &net, 33, 1, 1, 1, 1, 0, 0, c17_bin, false );

Also, there is no swish activation, right?

Thank you!

TensorRT inference doesn't work after deserialisation

Inference works with cuDNN, and with TensorRT on first creation of the network. However, TensorRT doesn't produce the correct output after deserialisation. The code executes fine, but the output data isn't right (and doesn't change when the input changes). The number and size of the input/output buffers are correct. I'm using a Yolo4 network.

I haven't really got any experience directly programming GPU code, so I'm not sure where to take debugging efforts next. I think this was the problem highlighted in this thread: #28

I'm using Windows 10, TensorRT 7.0.0.11, Cuda 10.2, Cudnn 7.605
Thanks

Darknet export bugs

Hi, me again.

In run_export in darknet.c in your other repo (sorry, I don't know how to post issues on that one) there are a couple of bugs that caused me problems, sketched after this list:

  • The fopen calls need to be in binary mode.
  • char *f[256] should be char f[256], or maybe char f[MAX_PATH] or whatever cross-platform version would work.

Thanks for all your work btw.
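
For illustration only, here is a minimal, self-contained sketch of the two fixes described above; run_export's real code differs, and the file name and data written here are made up:

#include <cstdio>

int main() {
    // Fix 2: a filename buffer is an array of char, not an array of char* pointers.
    char f[256];                      // was: char *f[256]
    snprintf(f, sizeof(f), "%s", "layers/c0.bin");

    // Fix 1: open weight dumps in binary mode ("wb"/"rb"); otherwise, on Windows,
    // line-ending translation corrupts the binary data.
    FILE *fp = fopen(f, "wb");        // was opened in text mode
    if (fp) {
        float w = 0.0f;
        fwrite(&w, sizeof(float), 1, fp);
        fclose(fp);
    }
    return 0;
}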

yolov3-spp?

Thank you very much for your work.
Have you tested the effect of yolov3-spp network deployment on tkDnn, is it consistent with darknet?

Memory leak in test.h

In test.h, data is being set via the call to readBinaryFile with a call to cudaMalloc, but I'm not seeing a corresponding cudaFree.

Sorry if I misread the code.
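
As a generic illustration of the pairing being asked about (this is not tkDNN's actual code), every cudaMalloc should eventually be matched by a cudaFree:

#include <cuda_runtime.h>

int main() {
    // Allocate a device buffer, as readBinaryFile reportedly does internally
    float *data = nullptr;
    if (cudaMalloc(&data, 1024 * sizeof(float)) != cudaSuccess)
        return 1;

    // ... use the buffer for the test ...

    // The matching release the issue says it could not find
    cudaFree(data);
    return 0;
}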

Can tk::dnn struct files be created automatically?

Hi, we have run the yolov3 and yolov3-tiny examples.

We want to try our own darknet model; first we export the bin files with:
./darknet export xxx.cfg xxx.weight layers

But looking at the yolov3 example, the yolov3 tk::dnn struct is in include/tkDNN/models/Yolo3.h

So, for a new model, do we have to manually create the new model's tk::dnn struct?
Is there support for automatically creating the tk::dnn struct file?

About include/tkDNN/models/Yolo3.h:

int preYoloFilters = (classes+5)*3; 

std::string input_bin = bin_path + "/layers/input.bin";
std::vector<std::string> output_bins = {
    bin_path + "/debug/layer82_out.bin",
    bin_path + "/debug/layer94_out.bin",
    bin_path + "/debug/layer106_out.bin"
};
std::string c0_bin    = bin_path + "/layers/c0.bin";
std::string c1_bin    = bin_path + "/layers/c1.bin";
std::string c2_bin    = bin_path + "/layers/c2.bin";
std::string c3_bin    = bin_path + "/layers/c3.bin";
std::string c5_bin    = bin_path + "/layers/c5.bin";
std::string c6_bin    = bin_path + "/layers/c6.bin";
std::string c7_bin    = bin_path + "/layers/c7.bin";
std::string c9_bin    = bin_path + "/layers/c9.bin";
std::string c10_bin   = bin_path + "/layers/c10.bin";
std::string c12_bin   = bin_path + "/layers/c12.bin";
std::string c13_bin   = bin_path + "/layers/c13.bin";
std::string c14_bin   = bin_path + "/layers/c14.bin";
std::string c16_bin   = bin_path + "/layers/c16.bin";
std::string c17_bin   = bin_path + "/layers/c17.bin";
std::string c19_bin   = bin_path + "/layers/c19.bin";
std::string c20_bin   = bin_path + "/layers/c20.bin";
std::string c22_bin   = bin_path + "/layers/c22.bin";
std::string c23_bin   = bin_path + "/layers/c23.bin";
std::string c25_bin   = bin_path + "/layers/c25.bin";
std::string c26_bin   = bin_path + "/layers/c26.bin";
std::string c28_bin   = bin_path + "/layers/c28.bin";
std::string c29_bin   = bin_path + "/layers/c29.bin";
std::string c31_bin   = bin_path + "/layers/c31.bin";
std::string c32_bin   = bin_path + "/layers/c32.bin";
std::string c34_bin   = bin_path + "/layers/c34.bin";
std::string c35_bin   = bin_path + "/layers/c35.bin";
std::string c37_bin   = bin_path + "/layers/c37.bin";
std::string c38_bin   = bin_path + "/layers/c38.bin";
std::string c39_bin   = bin_path + "/layers/c39.bin";
std::string c41_bin   = bin_path + "/layers/c41.bin";
std::string c42_bin   = bin_path + "/layers/c42.bin";
std::string c44_bin   = bin_path + "/layers/c44.bin";
std::string c45_bin   = bin_path + "/layers/c45.bin";
std::string c47_bin   = bin_path + "/layers/c47.bin";
std::string c48_bin   = bin_path + "/layers/c48.bin";
std::string c50_bin   = bin_path + "/layers/c50.bin";
std::string c51_bin   = bin_path + "/layers/c51.bin";
std::string c53_bin   = bin_path + "/layers/c53.bin";
std::string c54_bin   = bin_path + "/layers/c54.bin";
std::string c56_bin   = bin_path + "/layers/c56.bin";
std::string c57_bin   = bin_path + "/layers/c57.bin";
std::string c59_bin   = bin_path + "/layers/c59.bin";
std::string c60_bin   = bin_path + "/layers/c60.bin";
std::string c62_bin   = bin_path + "/layers/c62.bin";
std::string c63_bin   = bin_path + "/layers/c63.bin";
std::string c64_bin   = bin_path + "/layers/c64.bin";
std::string c66_bin   = bin_path + "/layers/c66.bin";
std::string c67_bin   = bin_path + "/layers/c67.bin";
std::string c69_bin   = bin_path + "/layers/c69.bin";
std::string c70_bin   = bin_path + "/layers/c70.bin";
std::string c72_bin   = bin_path + "/layers/c72.bin";
std::string c73_bin   = bin_path + "/layers/c73.bin";
std::string c75_bin   = bin_path + "/layers/c75.bin";
std::string c76_bin   = bin_path + "/layers/c76.bin";
std::string c77_bin   = bin_path + "/layers/c77.bin";
std::string c78_bin   = bin_path + "/layers/c78.bin";
std::string c79_bin   = bin_path + "/layers/c79.bin";
std::string c80_bin   = bin_path + "/layers/c80.bin";
std::string c81_bin   = bin_path + "/layers/c81.bin";
std::string g82_bin   = bin_path + "/layers/g82.bin";
std::string c84_bin   = bin_path + "/layers/c84.bin";
std::string c87_bin   = bin_path + "/layers/c87.bin";
std::string c88_bin   = bin_path + "/layers/c88.bin";
std::string c89_bin   = bin_path + "/layers/c89.bin";
std::string c90_bin   = bin_path + "/layers/c90.bin";
std::string c91_bin   = bin_path + "/layers/c91.bin";
std::string c92_bin   = bin_path + "/layers/c92.bin";
std::string c93_bin   = bin_path + "/layers/c93.bin";
std::string g94_bin   = bin_path + "/layers/g94.bin";
std::string c96_bin   = bin_path + "/layers/c96.bin";
std::string c99_bin   = bin_path + "/layers/c99.bin";
std::string c100_bin  = bin_path + "/layers/c100.bin";
std::string c101_bin  = bin_path + "/layers/c101.bin";
std::string c102_bin  = bin_path + "/layers/c102.bin";
std::string c103_bin  = bin_path + "/layers/c103.bin";
std::string c104_bin  = bin_path + "/layers/c104.bin";
std::string c105_bin  = bin_path + "/layers/c105.bin";
std::string g106_bin  = bin_path + "/layers/g106.bin";

tk::dnn::Conv2d     c0   (&net,  32, 3, 3, 1, 1, 1, 1,  c0_bin, true);
tk::dnn::Activation a0   (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Conv2d     c1   (&net,  64, 3, 3, 2, 2, 1, 1,  c1_bin, true);
tk::dnn::Activation a1   (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Conv2d     c2   (&net,  32, 1, 1, 1, 1, 0, 0,  c2_bin, true);
tk::dnn::Activation a2   (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Conv2d     c3   (&net,  64, 3, 3, 1, 1, 1, 1,  c3_bin, true);
tk::dnn::Activation a3   (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Shortcut   s4   (&net, &a1);
tk::dnn::Conv2d     c5   (&net, 128, 3, 3, 2, 2, 1, 1,  c5_bin, true);
tk::dnn::Activation a5   (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Conv2d     c6   (&net,  64, 1, 1, 1, 1, 0, 0,  c6_bin, true);
tk::dnn::Activation a6   (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Conv2d     c7   (&net, 128, 3, 3, 1, 1, 1, 1,  c7_bin, true);
tk::dnn::Activation a7   (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Shortcut   s8   (&net, &a5);
tk::dnn::Conv2d     c9   (&net,  64, 1, 1, 1, 1, 0, 0,  c9_bin, true);
tk::dnn::Activation a9   (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Conv2d     c10  (&net, 128, 3, 3, 1, 1, 1, 1, c10_bin, true);
tk::dnn::Activation a10  (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Shortcut   s11  (&net, &s8);

tk::dnn::Conv2d     c12  (&net, 256, 3, 3, 2, 2, 1, 1, c12_bin, true);
tk::dnn::Activation a12  (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Conv2d     c13  (&net, 128, 1, 1, 1, 1, 0, 0, c13_bin, true);
tk::dnn::Activation a13  (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Conv2d     c14  (&net, 256, 3, 3, 1, 1, 1, 1, c14_bin, true);
tk::dnn::Activation a14  (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Shortcut   s15  (&net, &a12);

tk::dnn::Conv2d     c16  (&net, 128, 1, 1, 1, 1, 0, 0, c16_bin, true);
tk::dnn::Activation a16  (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Conv2d     c17  (&net, 256, 3, 3, 1, 1, 1, 1, c17_bin, true);
tk::dnn::Activation a17  (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Shortcut   s18  (&net, &s15);
tk::dnn::Conv2d     c19  (&net, 128, 1, 1, 1, 1, 0, 0, c19_bin, true);
tk::dnn::Activation a19  (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Conv2d     c20  (&net, 256, 3, 3, 1, 1, 1, 1, c20_bin, true);
tk::dnn::Activation a20  (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Shortcut   s21  (&net, &s18);
tk::dnn::Conv2d     c22  (&net, 128, 1, 1, 1, 1, 0, 0, c22_bin, true);
tk::dnn::Activation a22  (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Conv2d     c23  (&net, 256, 3, 3, 1, 1, 1, 1, c23_bin, true);
tk::dnn::Activation a23  (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Shortcut   s24  (&net, &s21);
tk::dnn::Conv2d     c25  (&net, 128, 1, 1, 1, 1, 0, 0, c25_bin, true);
tk::dnn::Activation a25  (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Conv2d     c26  (&net, 256, 3, 3, 1, 1, 1, 1, c26_bin, true);
tk::dnn::Activation a26  (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Shortcut   s27  (&net, &s24);
tk::dnn::Conv2d     c28  (&net, 128, 1, 1, 1, 1, 0, 0, c28_bin, true);
tk::dnn::Activation a28  (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Conv2d     c29  (&net, 256, 3, 3, 1, 1, 1, 1, c29_bin, true);
tk::dnn::Activation a29  (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Shortcut   s30  (&net, &s27);
tk::dnn::Conv2d     c31  (&net, 128, 1, 1, 1, 1, 0, 0, c31_bin, true);
tk::dnn::Activation a31  (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Conv2d     c32  (&net, 256, 3, 3, 1, 1, 1, 1, c32_bin, true);
tk::dnn::Activation a32  (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Shortcut   s33  (&net, &s30);
tk::dnn::Conv2d     c34  (&net, 128, 1, 1, 1, 1, 0, 0, c34_bin, true);
tk::dnn::Activation a34  (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Conv2d     c35  (&net, 256, 3, 3, 1, 1, 1, 1, c35_bin, true);
tk::dnn::Activation a35  (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Shortcut   s36  (&net, &s33);

tk::dnn::Conv2d     c37  (&net, 512, 3, 3, 2, 2, 1, 1, c37_bin, true);
tk::dnn::Activation a37  (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Conv2d     c38  (&net, 256, 1, 1, 1, 1, 0, 0, c38_bin, true);
tk::dnn::Activation a38  (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Conv2d     c39  (&net, 512, 3, 3, 1, 1, 1, 1, c39_bin, true);
tk::dnn::Activation a39  (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Shortcut   s40  (&net, &a37);

tk::dnn::Conv2d     c41  (&net, 256, 1, 1, 1, 1, 0, 0, c41_bin, true);
tk::dnn::Activation a41  (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Conv2d     c42  (&net, 512, 3, 3, 1, 1, 1, 1, c42_bin, true);
tk::dnn::Activation a42  (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Shortcut   s43  (&net, &s40);
tk::dnn::Conv2d     c44  (&net, 256, 1, 1, 1, 1, 0, 0, c44_bin, true);
tk::dnn::Activation a44  (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Conv2d     c45  (&net, 512, 3, 3, 1, 1, 1, 1, c45_bin, true);
tk::dnn::Activation a45  (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Shortcut   s46  (&net, &s43);
tk::dnn::Conv2d     c47  (&net, 256, 1, 1, 1, 1, 0, 0, c47_bin, true);
tk::dnn::Activation a47  (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Conv2d     c48  (&net, 512, 3, 3, 1, 1, 1, 1, c48_bin, true);
tk::dnn::Activation a48  (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Shortcut   s49  (&net, &s46);
tk::dnn::Conv2d     c50  (&net, 256, 1, 1, 1, 1, 0, 0, c50_bin, true);
tk::dnn::Activation a50  (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Conv2d     c51  (&net, 512, 3, 3, 1, 1, 1, 1, c51_bin, true);
tk::dnn::Activation a51  (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Shortcut   s52  (&net, &s49);
tk::dnn::Conv2d     c53  (&net, 256, 1, 1, 1, 1, 0, 0, c53_bin, true);
tk::dnn::Activation a53  (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Conv2d     c54  (&net, 512, 3, 3, 1, 1, 1, 1, c54_bin, true);
tk::dnn::Activation a54  (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Shortcut   s55  (&net, &s52);
tk::dnn::Conv2d     c56  (&net, 256, 1, 1, 1, 1, 0, 0, c56_bin, true);
tk::dnn::Activation a56  (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Conv2d     c57  (&net, 512, 3, 3, 1, 1, 1, 1, c57_bin, true);
tk::dnn::Activation a57  (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Shortcut   s58  (&net, &s55);
tk::dnn::Conv2d     c59  (&net, 256, 1, 1, 1, 1, 0, 0, c59_bin, true);
tk::dnn::Activation a59  (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Conv2d     c60  (&net, 512, 3, 3, 1, 1, 1, 1, c60_bin, true);
tk::dnn::Activation a60  (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Shortcut   s61  (&net, &s58);

tk::dnn::Conv2d     c62  (&net,1024, 3, 3, 2, 2, 1, 1, c62_bin, true);
tk::dnn::Activation a62  (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Conv2d     c63  (&net, 512, 1, 1, 1, 1, 0, 0, c63_bin, true);
tk::dnn::Activation a63  (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Conv2d     c64  (&net,1024, 3, 3, 1, 1, 1, 1, c64_bin, true);
tk::dnn::Activation a64  (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Shortcut   s65  (&net, &a62);

tk::dnn::Conv2d     c66  (&net, 512, 1, 1, 1, 1, 0, 0, c66_bin, true);
tk::dnn::Activation a66  (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Conv2d     c67  (&net,1024, 3, 3, 1, 1, 1, 1, c67_bin, true);
tk::dnn::Activation a67  (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Shortcut   s68  (&net, &s65);

tk::dnn::Conv2d     c69  (&net, 512, 1, 1, 1, 1, 0, 0, c69_bin, true);
tk::dnn::Activation a69  (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Conv2d     c70  (&net,1024, 3, 3, 1, 1, 1, 1, c70_bin, true);
tk::dnn::Activation a70  (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Shortcut   s71  (&net, &s68);

tk::dnn::Conv2d     c72  (&net, 512, 1, 1, 1, 1, 0, 0, c72_bin, true);
tk::dnn::Activation a72  (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Conv2d     c73  (&net,1024, 3, 3, 1, 1, 1, 1, c73_bin, true);
tk::dnn::Activation a73  (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Shortcut   s74  (&net, &s71);

tk::dnn::Conv2d     c75  (&net, 512, 1, 1, 1, 1, 0, 0, c75_bin, true);
tk::dnn::Activation a75  (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Conv2d     c76  (&net,1024, 3, 3, 1, 1, 1, 1, c76_bin, true);
tk::dnn::Activation a76  (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Conv2d     c77  (&net, 512, 1, 1, 1, 1, 0, 0, c77_bin, true);
tk::dnn::Activation a77  (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Conv2d     c78  (&net,1024, 3, 3, 1, 1, 1, 1, c78_bin, true);
tk::dnn::Activation a78  (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Conv2d     c79  (&net, 512, 1, 1, 1, 1, 0, 0, c79_bin, true);
tk::dnn::Activation a79  (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Conv2d     c80  (&net,1024, 3, 3, 1, 1, 1, 1, c80_bin, true);
tk::dnn::Activation a80  (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Conv2d     c81  (&net, preYoloFilters, 1, 1, 1, 1, 0, 0, c81_bin, false);
tk::dnn::Yolo     yolo0  (&net, classes, 3, g82_bin);

tk::dnn::Layer *m83_layers[1] = { &a79 };
tk::dnn::Route      m83  (&net, m83_layers, 1);
tk::dnn::Conv2d     c84  (&net, 256, 1, 1, 1, 1, 0, 0, c84_bin, true);
tk::dnn::Activation a84  (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Upsample   u85  (&net, 2);

tk::dnn::Layer *m86_layers[2] = { &u85, &s61 };
tk::dnn::Route      m86  (&net, m86_layers, 2);
tk::dnn::Conv2d     c87  (&net, 256, 1, 1, 1, 1, 0, 0, c87_bin, true);
tk::dnn::Activation a87  (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Conv2d     c88  (&net, 512, 3, 3, 1, 1, 1, 1, c88_bin, true);
tk::dnn::Activation a88  (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Conv2d     c89  (&net, 256, 1, 1, 1, 1, 0, 0, c89_bin, true);
tk::dnn::Activation a89  (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Conv2d     c90  (&net, 512, 3, 3, 1, 1, 1, 1, c90_bin, true);
tk::dnn::Activation a90  (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Conv2d     c91  (&net, 256, 1, 1, 1, 1, 0, 0, c91_bin, true);
tk::dnn::Activation a91  (&net, tk::dnn::ACTIVATION_LEAKY);

tk::dnn::Conv2d     c92  (&net, 512, 3, 3, 1, 1, 1, 1, c92_bin, true);
tk::dnn::Activation a92  (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Conv2d     c93  (&net, preYoloFilters, 1, 1, 1, 1, 0, 0, c93_bin, false);
tk::dnn::Yolo     yolo1  (&net, classes, 3, g94_bin);

tk::dnn::Layer *m95_layers[1] = { &a91 };
tk::dnn::Route      m95  (&net, m95_layers, 1);
tk::dnn::Conv2d     c96  (&net, 128, 1, 1, 1, 1, 0, 0, c96_bin, true);
tk::dnn::Activation a96  (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Upsample   u97  (&net, 2);

tk::dnn::Layer *m98_layers[2] = { &u97, &s36 };
tk::dnn::Route      m98  (&net, m98_layers, 2);
tk::dnn::Conv2d     c99  (&net, 128, 1, 1, 1, 1, 0, 0, c99_bin, true);
tk::dnn::Activation a99  (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Conv2d     c100 (&net, 256, 3, 3, 1, 1, 1, 1, c100_bin, true);
tk::dnn::Activation a100 (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Conv2d     c101 (&net, 128, 1, 1, 1, 1, 0, 0, c101_bin, true);
tk::dnn::Activation a101 (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Conv2d     c102 (&net, 256, 3, 3, 1, 1, 1, 1, c102_bin, true);
tk::dnn::Activation a102 (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Conv2d     c103 (&net, 128, 1, 1, 1, 1, 0, 0, c103_bin, true);
tk::dnn::Activation a103 (&net, tk::dnn::ACTIVATION_LEAKY);

tk::dnn::Conv2d     c104 (&net, 256, 3, 3, 1, 1, 1, 1, c104_bin, true);
tk::dnn::Activation a104 (&net, tk::dnn::ACTIVATION_LEAKY);
tk::dnn::Conv2d     c105 (&net, preYoloFilters, 1, 1, 1, 1, 0, 0, c105_bin, false);
tk::dnn::Yolo      yolo2 (&net, classes, 3, g106_bin);

yolo[0] = &yolo0;
yolo[1] = &yolo1;
yolo[2] = &yolo2;

Recorded FPS vs actual FPS

I'm struggling to achieve the FPS reported in the command line.

For example, when I run inference on a 10 min 30fps video, the reported inference FPS is 300+.

I would expect the time taken to run inference on the entire video to be 30 × 60 × 10 = 18,000 frames, and 18,000 frames / 300 FPS = 60 s = 1 min.

Yet the code takes at least 3 mins to run. Is there something wrong with my calculation? Why would the reported FPS not be the actual FPS?

Problem in reorg layer

Using this network configuration, the output of the reorg layer using cuDNN and TensorRT is different from the exported outputs of darknet using the CPU:

== OUTPUT 28 CHECK RESULTS ==
CUDNN vs correct
 | [ 53 ]: -0.0713108 0.714457 
 | [ 54 ]: -0.0889071 1.41847 
 | [ 55 ]: -0.0889071 1.41847 
 | [ 56 ]: -0.0889071 1.41847 
 | [ 57 ]: -0.0889071 1.41847 
 | [ 58 ]: -0.0889071 1.41847 
 | [ 59 ]: -0.0889071 1.41847 
 | [ 60 ]: -0.0889071 1.41847 
 | [ 61 ]: -0.0889071 1.41847 
 | [ 62 ]: -0.0889071 1.41847 
 | [ 63 ]: -0.0889071 1.41847 
 | [ 64 ]: -0.0889071 1.41847 
 | [ 65 ]: -0.0889071 1.41847 
 | [ 66 ]: -0.0889071 1.41847 
 | [ 67 ]: -0.0889071 1.41847 
 | [ 68 ]: -0.0889071 1.41847 
 | [ 69 ]: -0.0889071 1.41847 
 | [ 70 ]: -0.0889071 1.41847 
 | [ 71 ]: -0.0889071 1.41847 
 | [ 72 ]: -0.0889071 1.41847 
 | [ 73 ]: -0.0889071 1.41847 
 | [ 74 ]: -0.0889071 1.41847 
 | [ 75 ]: -0.0889071 1.41847 
 | [ 76 ]: -0.0889071 1.41847 
 | [ 77 ]: -0.0889071 1.41847 
 | [ 78 ]: -0.0889071 1.41847 
 | [ 79 ]: -0.0889071 1.41847 
 | [ 80 ]: -0.0889071 1.41847 
 | [ 81 ]: -0.0889071 1.41847 
 | [ 82 ]: -0.0889071 1.41847 
 | [ 83 ]: -0.0889071 1.41847 
 | [ 84 ]: -0.0889071 1.41847 
 | [ 85 ]: -0.0889071 1.41847 
 | [ 86 ]: -0.0889071 1.41847 
 | [ 87 ]: -0.0889071 1.41847 
 | [ 88 ]: -0.0889071 1.41847 
 | [ 89 ]: -0.0889071 1.41847 
 | [ 90 ]: -0.0889071 1.41847 
 | [ 91 ]: -0.0889071 1.41847 
 | [ 92 ]: -0.0889071 1.41847 
 | [ 93 ]: -0.0889071 1.41847 
 | [ 94 ]: -0.0889071 1.41847 
 | [ 95 ]: -0.0889071 1.41847 
 | [ 96 ]: -0.0889071 1.41847 
 | [ 97 ]: -0.0889071 1.41847 
 | [ 98 ]: -0.0889071 1.41847 
 | [ 99 ]: -0.0889071 1.41847 
 | [ 100 ]: -0.0889071 1.41847 
 | [ 101 ]: -0.0889071 1.41847 
 | [ 102 ]: -0.0889071 1.41847 
 | [ 103 ]: -0.27399 -0.0770894 
 | [ 260 ]: -0.18731 -0.12787 
 | [ 264 ]: -0.0292956 0.0225434 
 | [ 268 ]: -0.0292956 0.0897532 
 | [ 273 ]: -0.0292956 0.0482006 
 | [ 274 ]: -0.0292956 0.136003 
 | [ 275 ]: -0.0292956 0.102949 
 | [ 280 ]: -0.0292956 0.0694339 
 | [ 282 ]: -0.0292956 0.0455105 
 | [ 283 ]: -0.0292956 0.209425 
 | [ 292 ]: -0.0292956 0.135097 
 | [ 300 ]: -0.0292956 0.0492267 
 | [ 302 ]: -0.0292956 0.0348723 
 | [ 304 ]: -0.0292956 0.0715765 
 | [ 307 ]: -0.0292956 0.0477364 
 | [ 308 ]: -0.0292956 0.0468393 
 | [ 311 ]: -0.145768 0.551812 
 | [ 364 ]: -0.032352 0.33807 
 | [ 365 ]: -0.17146 -0.0114406 
 | [ 366 ]: -0.37652 -0.131791 
 | [ 368 ]: -0.182342 -0.0839855 
 | [ 369 ]: -0.252877 -0.14398 
 | [ 370 ]: -0.314679 -0.389765 
 | [ 371 ]: -0.513241 -0.449211 
 | [ 372 ]: -0.565379 -0.211569 
 | [ 373 ]: -0.181062 -0.368287 
 | [ 374 ]: -0.300863 -0.0671517 
 | [ 376 ]: -0.398152 -0.0442645 
 | [ 377 ]: -0.416887 -0.143037 
 | [ 378 ]: -0.38611 -0.1623 
 | [ 379 ]: -0.347269 -0.0211162 
 | [ 381 ]: -0.484706 -0.14062 
 | [ 384 ]: -0.492647 -0.392213 
 | [ 385 ]: -0.161193 -0.0696261 
 | [ 386 ]: -0.51637 -0.0814965 
 | [ 387 ]: -0.491412 -0.106202 
 | [ 388 ]: -0.335896 -0.194181 
 | [ 390 ]: -0.289178 -0.181675 
 | [ 393 ]: -0.256528 -0.0414582 
 | [ 394 ]: -0.7003 -0.200466 
 | [ 395 ]: -0.517625 -0.145905 
 | [ 396 ]: -0.320515 1.22699 
 | [ 397 ]: -0.41259 -0.301313 
 | [ 398 ]: -0.362108 -0.106931 
 | [ 399 ]: -0.218155 -0.16281 
 | [ 400 ]: -0.20313 -0.152428 
 | [ 402 ]: -0.414642 -0.0900799 
 | [ 403 ]: -0.607981 -0.475988 
 | [ 404 ]: -0.548641 -0.11561 
 | Wrongs: 57030 ~0.05
 

I have checked the code and noticed that it is the exact code from the darknet project for the reorg layer on GPU. Could it be a bug in the darknet code? Or is there something I am missing?

It should be noted that I have tested the provided network in darknet using both CPU and GPU many times and the final results are similar. But I have not compared all the elements of the output of the reorg layer between CPU and GPU as in tkDNN.

P.S. I used my fork of this repo which is mentioned in #47.

CUDA_cublas_device_LIBRARY (ADVANCED)

-- Found CUDNN: /usr/local/cuda/lib64/libcudnn.so
-- Found CUDNN include: /usr/local/cuda/include
-- Found NVINFER: /usr/local/TensorRT-7.0.0.11/lib
-- Found NVINFER include: /usr/local/TensorRT-7.0.0.11/include
install dir:/usr/local
CMake Error: The following variables are used in this project, but they are set to NOTFOUND.
Please set them or make sure they are set and tested correctly in the CMake files:
CUDA_cublas_device_LIBRARY (ADVANCED)
linked by target "tkDNN" in directory /home/work/deep_learning/tkDNN

Where are the models?

Hi,

I compiled it on Windows, but I couldn't find the models. Where are the models? For example, yolo3_berkeley.rt

Thanks

export from mmdet,detectron 2

Hello, thanks for this awesome project. I have already read the whole code base. I have some questions about converting models. Which code files should I rely on while writing my_net.cpp? What about using plugins from the TensorRT library, like RoIAlign? If possible, could you create a file mapping each line of Python model code to the corresponding line of C++ model code? I think it would be a good practical guide for converting and contributing.

Could not build CUDA engine

I was wondering if anyone could help me out as to what I am missing. When I run ./test_yolo4, buildEngineWithConfig() does not create a CUDA engine for me. There don't seem to be any other errors, so I'm not sure what I am missing.

GPU: RTX2070 super
CUDA: 10.0
OPENCV: 3.4.7
Cudnn: 7.6.3
TensorRT: 6.0.1
TKDNN_MODE=FP16

Could not build cuda engine

Get the output of each layer

Hello,
Thank you for your work. It's amazing.
I want to construct a feature map picture.
May I ask how to get the output of each layer?

Import ONNX and test model

Hi,

We have an object detector based on ShuffleNetv2 with an ONNX exported model we would like to test in tkDNN. What is the best way to get started with an ONNX based model?

Thanks,
Tim

Fails while exporting yolov4

Hi! First, thanks for your repo. I saw what you've done with the new yolov4 weights in terms of FPS and I wanted to reproduce it. I'm working on a Jetson Xavier, and when I try to export the yolov4 weights I get this error at layer 108: darknet: ./src/darknet.c:519: run_export: Assertion `0 == "layer type not supported for export"' failed. Do you know why this error shows up?

Full error:
n: 108, type 3 darknet: ./src/darknet.c:519: run_export: Assertion `0 == "layer type not supported for export"' failed. Aborted

test yolo4 on Jetson Nano - Building tensorRT cuda engine... Killed!

Hi,

Congratulations on your great work.

I am testing tkDNN on my Jetson Nano. Everything was fine, but when I tried to test yolo3/4 in FP16 mode, while building the tensorRT cuda engine (after waiting around 40 min), it terminated with the message Killed.

./test_yolo4 

[screenshot: test_yolo4 terminated with Killed]
Any help please?

Best,
Deepak

OpenCV 4.0 not supported

/home/nvidia/tkDNN/demo/detection/detection.cpp: In function ‘int main(int, char**)’:
/home/nvidia/tkDNN/demo/detection/detection.cpp:163:54: error: ‘CV_LOAD_IMAGE_COLOR’ was not declared in this scope
cv::Mat img = cv::imread(image_path.c_str(), CV_LOAD_IMAGE_COLOR);

The solution to this error is to change CV_LOAD_IMAGE_COLOR to cv::IMREAD_COLOR, as in the sketch below.

Since OpenCV 4.0 the macros have changed.
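
A minimal, self-contained sketch of the fix (the image path here is illustrative):

#include <opencv2/opencv.hpp>
#include <string>

int main() {
    std::string image_path = "test.jpg";  // illustrative input image

    // OpenCV 4 removed the old C macro CV_LOAD_IMAGE_COLOR;
    // the cv::ImreadModes value cv::IMREAD_COLOR replaces it.
    cv::Mat img = cv::imread(image_path, cv::IMREAD_COLOR);
    return img.empty() ? 1 : 0;
}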

Test inference time with batch_size != 1

Thanks for your great work. Currently, I'm trying to find out the difference in inference time between running with batch size 1 and batch size 4 for an FP16 model. Here is my test script (named image.cpp):

#include <iostream>
#include <signal.h>
#include <stdlib.h>     
#include <unistd.h>
#include <mutex>

#include "Yolo3Detection.h"
#include <chrono>

using namespace std::chrono; 


int main(int argc, char *argv[]){

    // Network name
    std::string net = "yolo3-tiny_3l.rt";
    if(argc > 1)
        net = argv[1];
    
    // Inference batch size 
    int n_batch = 4;
    if(argc > 2)
        n_batch = atoi(argv[2]);
        
    // Number of classes
    int n_classes = 2;
    if(argc > 3)
        n_classes = atoi(argv[3]);
  
    tk::dnn::Yolo3Detection yolo;  

    tk::dnn::DetectionNN *detNN; 
    detNN = &yolo;
    detNN->init(net, n_classes, n_batch);	
    
    cv::Mat img;
    std::vector<cv::Mat> batch_frame;
    std::vector<cv::Mat> batch_dnn_input;
    auto start = high_resolution_clock::now(); 
    // cv::namedWindow("detection", cv::WINDOW_NORMAL);  

    // Read all images from a directory          
    cv::String path("../images/*.*");
    std::vector<cv::String> fn;
    cv::glob(path,fn,true); // recurse
    int total = 0;
    int num_img = 0; 
    for (size_t k=0; k < fn.size(); ++k){
         img = cv::imread(fn[k], cv::IMREAD_COLOR);
         if (!img.data){
             std::cout << "Problem loading image!";
             break;
         }
         batch_frame.push_back(img);

         // this will be resized to the net format
         batch_dnn_input.push_back(img.clone());
         ++num_img;
          
         
         // do inference
         if (num_img == n_batch){
             detNN->update(batch_dnn_input, n_batch);
             detNN->draw(batch_frame); 
             for(int bi=0; bi< n_batch; ++bi){
             //    cv::imshow("detection", batch_frame[bi]);
             //    cv::waitKey(10);
                   ++total;
             }           
             num_img = 0;
             batch_dnn_input.clear();
             batch_frame.clear();
         }            
    }
    auto stop = high_resolution_clock::now();
    auto duration = duration_cast<seconds>(stop - start);  
    double mean = 0; 
    std::cout<<COL_GREENB<<"\n\nTime stats:\n";
    std::cout << "Total files: " << total << " files\n";
    std::cout << "Total inference time: " << duration.count() << "s\n";
    std::cout<<"Min: "<<*std::min_element(detNN->stats.begin(), detNN->stats.end())/n_batch<<" ms\n";    
    std::cout<<"Max: "<<*std::max_element(detNN->stats.begin(), detNN->stats.end())/n_batch<<" ms\n";    
    for(int i=0; i<detNN->stats.size(); i++) mean += detNN->stats[i]; mean /= detNN->stats.size();
    std::cout<<"Avg: "<<mean/n_batch<<" ms\t"<<1000/(mean/n_batch)<<" FPS\n"<<COL_END; 
    
}

What I did to generate my model:

export TKDNN_MODE=FP16
export TKDNN_BATCHSIZE=4
./test_yolo3tiny_512

and this one to compare:

export TKDNN_MODE=FP16
export TKDNN_BATCHSIZE=1
./test_yolo3tiny_512

But when running the test case, this is what I received.

For batch_size = 1, I use this command:

./image yolo3-tiny_512_fp16.rt 1

and this is the result:

New NetworkRT (TensorRT v5.16)
Float16 support: 1
Int8 support: 0
DLAs: 0
create execution context
Input/outputs numbers: 4
input idex = 0 -> output index = 3
Data dim: 1 3 512 512 1
Data dim: 1 21 64 64 1
RtBuffer 0   dim: Data dim: 1 3 512 512 1
RtBuffer 1   dim: Data dim: 1 21 16 16 1
RtBuffer 2   dim: Data dim: 1 21 32 32 1
RtBuffer 3   dim: Data dim: 1 21 64 64 1

Time stats:
Total files: 1000 files
Total inference time: 85s
Min: 42.6142 ms
Max: 2113.66 ms
Avg: 46.6069 ms	21.456 FPS

For batch_size = 4, I use this command:

./image yolo3-tiny_512_fp16.rt 4

and this is the result:

New NetworkRT (TensorRT v5.16)
Float16 support: 1
Int8 support: 0
DLAs: 0
create execution context
Input/outputs numbers: 4
input idex = 0 -> output index = 3
Data dim: 1 3 512 512 1
Data dim: 1 21 64 64 1
RtBuffer 0   dim: Data dim: 1 3 512 512 1
RtBuffer 1   dim: Data dim: 1 21 16 16 1
RtBuffer 2   dim: Data dim: 1 21 32 32 1
RtBuffer 3   dim: Data dim: 1 21 64 64 1

Time stats:
Total files: 1000 files
Total inference time: 86s
Min: 42.2471 ms
Max: 652.825 ms
Avg: 46.5498 ms	21.4823 FPS

As you can see, there is no huge difference, unlike what is described in your README. What did I do wrong?

engineRT is NULL after deserializing

Hi,

I compiled it on Windows and serialized the model successfully, but when it tries to deserialize, the engineRT is null. Where is the problem?

Thanks

Inference with tkDNN + TensorRT from Python

First, thanks for your hard work. Your repo is very impressive, but I have a question. I'm not familiar with C++, so could you provide code that wraps tkDNN + TensorRT for inference from Python?
Thanks

About FPS performance with yolov3 and yolov3-tiny on Xavier

We are interested in the performance acceleration proposed by tkDNN, so we implemented tkDNN according to the workflow you provided and executed the Yolov3 examples, including FP32, FP16 and INT8.

We have also used examples provided by NVIDIA to implement TensorRT in the past, such as the Python Yolov3 example in /usr/src/tensorrt/samples/python/yolov3_onnx (we will call it TRT).

I put the FPS obtained by Darknet, TRT and tkDNN into a table:
[table image: Xavier FPS]

We expected tkDNN to be faster than TRT, but we observed that the speed of TRT is similar to that of tkDNN.

Why is tkDNN not faster than TRT?

Bug in NetworkRT::serialize

Sorry, I don't know how best to do this on GitHub, but anyway... this caused a bug:

bool NetworkRT::serialize(const char *filename) {
    std::ofstream p(filename);

It should be:

    std::ofstream p(filename, std::ios::binary);

Problem in adding dilated convolution

Hi,
The AlexeyAB/darknet repository supports dilated convolution. I tried to add dilation support to tkDNN, but it results in an error.

The changes can be seen in my fork here.
This is the output of a test program (test_yolo3_tiny_dilation) created based on test_yolo3tiny, using this cfg file:

Not supported field: batch=128
Not supported field: subdivisions=16
Not supported field: momentum=0.9
Not supported field: decay=0.0005
Not supported field: angle=0
Not supported field: saturation = 1.5
Not supported field: exposure = 1.15
Not supported field: hue=0
Not supported field: learning_rate=0.001
Not supported field: burn_in=1000
Not supported field: max_batches = 15000
Not supported field: policy=steps
Not supported field: steps=13000,14000
Not supported field: scales=.1,.1
New NETWORK (tkDNN v0.5, CUDNN v8)
Reading weights: I=1 O=16 KERNEL=3x3x1
Reading weights: I=16 O=32 KERNEL=3x3x1
Reading weights: I=32 O=16 KERNEL=1x1x1
Reading weights: I=16 O=32 KERNEL=3x3x1
Reading weights: I=32 O=64 KERNEL=3x3x1
Reading weights: I=64 O=32 KERNEL=1x1x1
Reading weights: I=32 O=64 KERNEL=3x3x1
Reading weights: I=64 O=32 KERNEL=1x1x1
Reading weights: I=32 O=64 KERNEL=3x3x1
Reading weights: I=64 O=32 KERNEL=1x1x1
Reading weights: I=32 O=64 KERNEL=3x3x1
Reading weights: I=64 O=32 KERNEL=1x1x1
Reading weights: I=64 O=32 KERNEL=1x1x1
Reading weights: I=32 O=64 KERNEL=3x3x1
Reading weights: I=64 O=32 KERNEL=1x1x1
Reading weights: I=64 O=32 KERNEL=1x1x1
Reading weights: I=64 O=32 KERNEL=1x1x1
Reading weights: I=64 O=128 KERNEL=3x3x1
Reading weights: I=128 O=64 KERNEL=1x1x1
Reading weights: I=64 O=128 KERNEL=3x3x1
Reading weights: I=128 O=64 KERNEL=1x1x1
Reading weights: I=64 O=128 KERNEL=3x3x1
Reading weights: I=64 O=16 KERNEL=1x1x1
Reading weights: I=192 O=128 KERNEL=1x1x1
Reading weights: I=128 O=256 KERNEL=3x3x1
Reading weights: I=256 O=18 KERNEL=1x1x1
Not supported field: anchors = 25, 21,  36, 23,  37, 31,  49, 35,  63, 48,  98, 61
Not supported field: jitter=.3
Not supported field: ignore_thresh = .7
Not supported field: truth_thresh = 1
Not supported field: random=1
Reading weights: I=32 O=64 KERNEL=3x3x1
Reading weights: I=64 O=128 KERNEL=3x3x1
Reading weights: I=128 O=256 KERNEL=3x3x1
Reading weights: I=256 O=512 KERNEL=3x3x1
Reading weights: I=512 O=18 KERNEL=1x1x1
Not supported field: anchors = 25, 21,  36, 23,  37, 31,  49, 35,  63, 48,  98, 61
Not supported field: jitter=.3
Not supported field: ignore_thresh = .7
Not supported field: truth_thresh = 1
Not supported field: random=1

====================== NETWORK MODEL ======================
N.  Layer type       input (H*W,CH)        output (H*W,CH) 
  0 Conv2d           416 x  416,    1  ->  416 x  416,   16
  1 ActivationLeaky  416 x  416,   16  ->  416 x  416,   16
  2 Conv2d           416 x  416,   16  ->  208 x  208,   32
  3 ActivationLeaky  208 x  208,   32  ->  208 x  208,   32
  4 Conv2d           208 x  208,   32  ->  208 x  208,   16
  5 ActivationLeaky  208 x  208,   16  ->  208 x  208,   16
  6 Conv2d           208 x  208,   16  ->  208 x  208,   32
  7 ActivationLeaky  208 x  208,   32  ->  208 x  208,   32
  8 Conv2d           208 x  208,   32  ->  104 x  104,   64
  9 ActivationLeaky  104 x  104,   64  ->  104 x  104,   64
 10 Conv2d           104 x  104,   64  ->  104 x  104,   32
 11 ActivationLeaky  104 x  104,   32  ->  104 x  104,   32
 12 Conv2d           104 x  104,   32  ->  104 x  104,   64
 13 ActivationLeaky  104 x  104,   64  ->  104 x  104,   64
 14 Conv2d           104 x  104,   64  ->  104 x  104,   32
 15 ActivationLeaky  104 x  104,   32  ->  104 x  104,   32
 16 Conv2d           104 x  104,   32  ->  104 x  104,   64
 17 ActivationLeaky  104 x  104,   64  ->  104 x  104,   64
 18 Conv2d           104 x  104,   64  ->  104 x  104,   32
 19 ActivationLeaky  104 x  104,   32  ->  104 x  104,   32
 20 Conv2d           104 x  104,   32  ->  104 x  104,   64
 21 ActivationLeaky  104 x  104,   64  ->  104 x  104,   64
 22 Conv2d           104 x  104,   64  ->  104 x  104,   32
 23 ActivationLeaky  104 x  104,   32  ->  104 x  104,   32
 24 Route            104 x  104,   64  ->  104 x  104,   64
 25 Conv2d           104 x  104,   64  ->  104 x  104,   32
 26 ActivationLeaky  104 x  104,   32  ->  104 x  104,   32
 27 Conv2d           104 x  104,   32  ->  104 x  104,   64
 28 ActivationLeaky  104 x  104,   64  ->  104 x  104,   64
 29 Conv2d           104 x  104,   64  ->  104 x  104,   32
 30 ActivationLeaky  104 x  104,   32  ->  104 x  104,   32
 31 Route            104 x  104,   64  ->  104 x  104,   64
 32 Conv2d           104 x  104,   64  ->  104 x  104,   32
 33 ActivationLeaky  104 x  104,   32  ->  104 x  104,   32
 34 Route            104 x  104,   64  ->  104 x  104,   64
 35 Conv2d           104 x  104,   64  ->  104 x  104,   32
 36 ActivationLeaky  104 x  104,   32  ->  104 x  104,   32
 37 Route            104 x  104,   64  ->  104 x  104,   64
 38 Conv2d           104 x  104,   64  ->   52 x   52,  128
 39 ActivationLeaky   52 x   52,  128  ->   52 x   52,  128
 40 Conv2d            52 x   52,  128  ->   52 x   52,   64
 41 ActivationLeaky   52 x   52,   64  ->   52 x   52,   64
 42 Conv2d            52 x   52,   64  ->   52 x   52,  128
 43 ActivationLeaky   52 x   52,  128  ->   52 x   52,  128
 44 Conv2d            52 x   52,  128  ->   52 x   52,   64
 45 ActivationLeaky   52 x   52,   64  ->   52 x   52,   64
 46 Conv2d            52 x   52,   64  ->   52 x   52,  128
 47 ActivationLeaky   52 x   52,  128  ->   52 x   52,  128
 48 Route            104 x  104,   64  ->  104 x  104,   64
 49 Conv2d           104 x  104,   64  ->  104 x  104,   16
 50 ActivationLeaky  104 x  104,   16  ->  104 x  104,   16
 51 Reorg            104 x  104,   16  ->   52 x   52,   64
 52 Route             52 x   52,  192  ->   52 x   52,  192
 53 Conv2d            52 x   52,  192  ->   52 x   52,  128
 54 ActivationLeaky   52 x   52,  128  ->   52 x   52,  128
 55 Conv2d            52 x   52,  128  ->   52 x   52,  256
 56 ActivationLeaky   52 x   52,  256  ->   52 x   52,  256
 57 Conv2d            52 x   52,  256  ->   52 x   52,   18
 58 Yolo              52 x   52,   18  ->   52 x   52,   18
 59 Route            208 x  208,   32  ->  208 x  208,   32
 60 Pooling          208 x  208,   32  ->  104 x  104,   32
 61 Conv2d           104 x  104,   32  ->  104 x  104,   64
 62 ActivationLeaky  104 x  104,   64  ->  104 x  104,   64
 63 Pooling          104 x  104,   64  ->   52 x   52,   64
 64 Conv2d            52 x   52,   64  ->   52 x   52,  128
 65 ActivationLeaky   52 x   52,  128  ->   52 x   52,  128
 66 Pooling           52 x   52,  128  ->   26 x   26,  128
 67 Conv2d            26 x   26,  128  ->   26 x   26,  256
 68 ActivationLeaky   26 x   26,  256  ->   26 x   26,  256
 69 Pooling           26 x   26,  256  ->   13 x   13,  256
 70 Conv2d            13 x   13,  256  ->   13 x   13,  512
 71 ActivationLeaky   13 x   13,  512  ->   13 x   13,  512
 72 Pooling           13 x   13,  512  ->   13 x   13,  512
 73 Conv2d            13 x   13,  512  ->   13 x   13,   18
 74 Yolo              13 x   13,   18  ->   13 x   13,   18
===========================================================

GPU free memory: 324.542 mb.
New NetworkRT (TensorRT v7.1)
Float16 support: 1
Int8 support: 0
DLAs: 0
Selected maxBatchSize: 1
GPU free memory: 79.02 mb.
Building tensorRT cuda engine...
cloud not build cuda engine
.../tkDNN/src/NetworkRT.cpp:145
Aborting...

Any help?

How should I add a new layer?

1. How should I add a new layer with tkDNN?
2. How can I manually generate the TRT model?
I use the commands

./darknet export <path-to-cfg-file> <path-to-weights> layers
./darknet export <path-to-cfg-file> <path-to-weights> debug

and

./test_yolo3            # run the yolo test (is slow)

which auto-generates yolo3_fp32.rt with the exported bin files from the cloud.

threshold?

Does the concept of a threshold, like in darknet, exist?

Why do we need tkDNN?

Sorry for the stupid question, but it is not reflected in the README. If I have TensorRT, then is tkDNN just a config reader for yolo? Why does it use cuDNN directly? Does it give any benefit over TRT? Does it give any performance benefit compared to TRT? It would be great to give some insight for users in the README file.

Can a PAN3 network work in tkDNN?

Hi,

PAN3 has a few properties, and I'm uncertain whether these are only training augmentations or completely new layers that can affect inference.

Can a PAN3 network work in tkDNN? Are all of its elements implemented?

If so, how would I set this up in Conv2d and Pooling? In particular, stride_x and stride_y.

[maxpool]
size=8
stride=4
stride_x=4
stride_y=8

[convolutional]
batch_normalize=1
filters=64
size=1
stride=2
stride_x=2
stride_y=1
pad=1
activation=mish

Thank you!

Save video

Is it possible to save video, like darknet's -out_filename?

Edit: I found the function to save video in demo.c

Resolution change - Error reading file yolo3_tiny/layers/c15.bin with n of float: 130560 seek: 0 size: 522240

Hi, any help on the below would be really appreciated.

I have adjusted the resolution and the number of classes on yolov3_tiny to match the trained weights and yolo_v3_tiny config I have (see below), but I am getting the following error when running: tane@xavier:~/git/tkDNN/build$ ./test_yolo3_tiny
Error reading file yolo3_tiny/layers/c15.bin with n of float: 130560 seek: 0 size: 522240

yolo3_tiny.cpp.txt
yolo3_tiny.cfg.txt

Error screen:
[error screenshot]

  1. I can successfully run example tests e.g. yolov4 and demo with no issue.
  2. I can confirm the configuration used in training of weights is identical to yolo3_tiny.cfg sample provided in this repo apart from class, resolution, and filters change.
  3. Adjustments are made before compiling (make clean, make) the code.
  4. Weights files were (successfully?) converted using the guide provided in the readme; output here:
    yolo3_tiny_weights_convert.txt

I am attempting to do a performance comparison of trained models, darknet vs tkDNN, which will run on a Xavier AGX onboard a drone to track critically endangered dolphins. The better the FPS, the better the overall tracking performance.

Thanks!

Is CUDA 10.0 required?

The latest Xavier L4T comes with CUDA 10.2. Can tkDNN work with CUDA 10.2, or must it be CUDA 10.0?

I get the following error and suspect it is because of CUDNN v7.605 (CUDA 10.2):

./test_yolo3
New NETWORK (tkDNN v0.4, CUDNN v7.605)
CUDNN failure: CUDNN_STATUS_NOT_INITIALIZED
/tkDNN/src/Network.cpp:56
Aborting...

CUDNN_NVLIB issue

Hi,
When I try to make the project it is throwing an error like:


-- Found NVINFER: CUDNN_NVLIB-NOTFOUND
install dir:/usr/local
CMake Error: The following variables are used in this project, but they are set to NOTFOUND.
Please set them or make sure they are set and tested correctly in the CMake files:
CUDNN_NVLIB
    linked by target "tkDNN" in directory /home/ixtiyor/Documents/projects/temp/tkDNN

I have tried different versions of TensorRT but it is not working. Can somebody help me, please?

How to change the classes of models

Hello, thank you for your work!
I trained yolo3 with a dataset which only has 4 classes. After changing the files in debug and layers, I tried to use test_yolo3 to create the rt file. Then I got an error:

Error reading file yolo3/layers/c81.bin with n of float: 261120 seek: 0 size: 1044480

I know the reason is the difference in the number of classes.
In /tkDNN/include/tkDNN/models/Yolo3.h, I can find:

int preYoloFilters = (classes+5)*3; 

How can I change the classes?
Thank you!

Parser?

Hi @ceccocats and thanks for the parser. It is very useful.

How can I deal with the maxpool layer below? It is not compatible.

Any advice is appreciated.

[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=mish

[route]
layers=-1, -3, -6, -9, -12, -15, -18, -21, -24, -27

[maxpool]
maxpool_depth=1
out_channels=64
stride=1
size=1

########### [yolo-1]

[upsample]
stride=4

[route]
layers = -1,24

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=mish

Also, I had to add the following to deal with carriage return chars:

found = line.find("\r");
if (found != std::string::npos) {
    line = line.substr(0, found);
}
