
A tutorial about how to build a TensorRT Engine from a PyTorch Model with the help of ONNX

License: MIT License


pytorch_onnx_tensorrt's Introduction

PyTorch_ONNX_TensorRT

A tutorial that shows how to build a TensorRT engine from a PyTorch model with the help of ONNX. Please star this project if you find it helpful.

News

A dynamic_shape_example (batch-size dimension) has been added. Just run

  python3 dynamic_shape_example.py

This example should be run on TensorRT 7.x. This repo is a bit out of date, since there were several API changes between TensorRT 5.0 and TensorRT 7.x; I will spend some time in the near future making it compatible.
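The idea behind the dynamic-batch example can be illustrated without TensorRT itself: an optimization profile declares a minimum, optimum, and maximum shape for an input, and the engine then accepts any runtime shape inside those bounds (in the real API this is done with `IOptimizationProfile.set_shape`). The shapes and the `shape_is_valid` helper below are hypothetical, just to show the constraint a profile imposes:

```python
# Illustration only: hypothetical min/opt/max shapes such as a TensorRT 7
# optimization profile would declare for an input with a dynamic batch dim.
min_shape = (1, 3, 128, 128)   # smallest shape the engine must accept
opt_shape = (8, 3, 128, 128)   # shape TensorRT tunes its kernels for
max_shape = (32, 3, 128, 128)  # largest shape the engine must accept

def shape_is_valid(shape):
    """A runtime input shape is valid if every dimension lies in [min, max]."""
    return all(lo <= s <= hi for lo, s, hi in zip(min_shape, shape, max_shape))

print(shape_is_valid((16, 3, 128, 128)))  # batch 16 lies within [1, 32]
print(shape_is_valid((64, 3, 128, 128)))  # batch 64 exceeds the profile
```

Performance is best at the optimum shape, so pick `opt_shape` to match the batch size you expect most often.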

Environment

  1. Ubuntu 16.04 x86_64, CUDA 10.0
  2. Python 3.5
  3. PyTorch 1.0
  4. TensorRT 5.0 (if you are using a Jetson TX2, TensorRT is already there once you have installed JetPack)
    4.1 Download TensorRT (pick the package that matches your environment)
    4.2 Debian installation
  $ sudo dpkg -i nv-tensorrt-repo-ubuntu1x04-cudax.x-trt5.x.x.x-ga-yyyymmdd_1-1_amd64.deb # the downloaded file
  $ sudo apt-key add /var/nv-tensorrt-repo-cudax.x-trt5.x.x.x-gayyyymmdd/7fa2af80.pub
  $ sudo apt-get update
  $ sudo apt-get install tensorrt
  
  $ sudo apt-get install python3-libnvinfer

To verify the installation of TensorRT, run

  $ dpkg -l | grep TensorRT

You should see output similar to:

  ii  graphsurgeon-tf	5.1.5-1+cuda10.1	amd64	GraphSurgeon for TensorRT package
  ii  libnvinfer-dev	5.1.5-1+cuda10.1	amd64	TensorRT development libraries and headers
  ii  libnvinfer-samples	5.1.5-1+cuda10.1	amd64	TensorRT samples and documentation
  ii  libnvinfer5		5.1.5-1+cuda10.1	amd64	TensorRT runtime libraries
  ii  python-libnvinfer	5.1.5-1+cuda10.1	amd64	Python bindings for TensorRT
  ii  python-libnvinfer-dev	5.1.5-1+cuda10.1	amd64	Python development package for TensorRT
  ii  python3-libnvinfer	5.1.5-1+cuda10.1	amd64	Python 3 bindings for TensorRT
  ii  python3-libnvinfer-dev	5.1.5-1+cuda10.1	amd64	Python 3 development package for TensorRT
  ii  tensorrt	5.1.5.x-1+cuda10.1	amd64	Meta package of TensorRT
  ii  uff-converter-tf	5.1.5-1+cuda10.1	amd64	UFF converter for TensorRT package

5. Install PyCUDA (the Python TensorRT workflow uses it to manage CUDA memory)

 $ pip3 install pycuda 

If you run into problems with pip, try

$ sudo apt-get install python3-pycuda #(Install for /usr/bin/python3)

For full details, please check the TensorRT Installation Guide.

Usage

Please check the file 'pytorch_onnx_trt.ipynb'

INT8

To run the INT8 optimization demo:

  python3 trt_int8_demo.py

You will see output like

Function forward_onnx called!
graph(%input : Float(32, 3, 128, 128),
%1 : Float(16, 3, 3, 3),
%2 : Float(16),
%3 : Float(64, 16, 5, 5),
%4 : Float(64),
%5 : Float(10, 64),
%6 : Float(10)):
%7 : Float(32, 16, 126, 126) = onnx::Conv[dilations=[1, 1], group=1, kernel_shape=[3, 3], pads=[0, 0, 0, 0], strides=[1, 1]](%input, %1, %2), scope: Conv2d
%8 : Float(32, 16, 126, 126) = onnx::Relu(%7), scope: ReLU
%9 : Float(32, 16, 124, 124) = onnx::MaxPool[kernel_shape=[3, 3], pads=[0, 0, 0, 0], strides=[1, 1]](%8), scope: MaxPool2d
%10 : Float(32, 64, 120, 120) = onnx::Conv[dilations=[1, 1], group=1, kernel_shape=[5, 5], pads=[0, 0, 0, 0], strides=[1, 1]](%9, %3, %4), scope: Conv2d
%11 : Float(32, 64, 120, 120) = onnx::Relu(%10), scope: ReLU
%12 : Float(32, 64, 1, 1) = onnx::GlobalAveragePool(%11), scope: AdaptiveAvgPool2d
%13 : Float(32, 64) = onnx::Flatten[axis=1](%12)
%output : Float(32, 10) = onnx::Gemm[alpha=1, beta=1, transB=1](%13, %5, %6), scope: Linear
return (%output)

Int8 mode enabled
Loading ONNX file from path model_128.onnx...
Beginning ONNX file parsing
Completed parsing of ONNX file
Building an engine from file model_128.onnx; this may take a while...
Completed creating the engine
Loading ONNX file from path model_128.onnx...
Beginning ONNX file parsing
Completed parsing of ONNX file
Building an engine from file model_128.onnx; this may take a while...
Completed creating the engine
Loading ONNX file from path model_128.onnx...
Beginning ONNX file parsing
Completed parsing of ONNX file
Building an engine from file model_128.onnx; this may take a while...
Completed creating the engine
Total time used by engine_int8: 0.0009500550794171857
Total time used by engine_fp16: 0.001466430104649938
Total time used by engine: 0.002231682623709525

This output was produced on a Jetson Xavier.
Please note that INT8 mode is only supported by specific GPU modules, e.g. Jetson Xavier, Tesla P4, etc.
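For reference, timings like those above can be collected with a simple wall-clock harness. The sketch below times a dummy callable in place of the real TensorRT inference call (with a real engine you would time `context.execute_async(...)` followed by a stream synchronize, so the GPU work is actually finished before the clock stops):

```python
import time

def benchmark(infer, n_iters=100, n_warmup=10):
    """Return the average latency of `infer`, excluding warm-up iterations."""
    for _ in range(n_warmup):        # warm-up runs pay one-time startup costs
        infer()
    start = time.perf_counter()
    for _ in range(n_iters):
        infer()
    return (time.perf_counter() - start) / n_iters

# Hypothetical stand-in for a TensorRT inference call.
avg = benchmark(lambda: sum(range(10000)))
print(f"Total time used by engine: {avg:.6f} s")
```

Averaging over many iterations after a warm-up phase is what makes the INT8/FP16/FP32 numbers comparable.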

TensorRT 7 has been released. According to feedback, the code works well with TensorRT 5.0 but may have problems with TensorRT 7.0. I will test against TensorRT 7 and update this repo to make it compatible soon.

Contact

Cai, Rizhao
Email: [email protected]


pytorch_onnx_tensorrt's Issues

AdaptiveAvgPool2d not supported by tensorRT OnnxParser

Env:
Ubuntu 16.04
Python 3.6
PyTorch 1.3.0
TensorRT 7.0.11

Detail:
The AdaptiveAvgPool2d operation is not supported by the ONNX parser, which can cause the ONNX-to-TensorRT conversion to fail. To see the parser errors, you can run

  for error in range(parser.num_errors):
      print(parser.get_error(error))

after the parser has read the ONNX model.

Code incompatible with TensorRT 8.4

I am running the code in a TensorRT 8.4 environment and get the error
'tensorrt.tensorrt.Builder' object has no attribute 'build_cuda_engine' (and similarly for fp16_mode and int8_mode).

In newer TensorRT versions, a config object has taken over these builder settings.

Problems occurred when executing "python trt_int8_demo.py"

  Building an engine from file model_128.onnx; this may take a while...
  Traceback (most recent call last):
    File "trt_int8_demo.py", line 138, in <module>
      main()
    File "trt_int8_demo.py", line 92, in main
      engine_int8 = trt_helper.get_engine(batch_size, onnx_model_path, engine_model_path, fp16_mode=False, int8_mode=True, calibration_stream=calibration_stream, save_engine=True)
    File "/data/zhangyl/PyTorch_ONNX_TensorRT/helpers/trt_helper.py", line 95, in get_engine
      return build_engine(max_batch_size, save_engine)
    File "/data/zhangyl/PyTorch_ONNX_TensorRT/helpers/trt_helper.py", line 76, in build_engine
      engine = builder.build_cuda_engine(network)
  TypeError: read_calibration_cache() missing 1 required positional argument: 'length'
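A likely cause of this TypeError: older examples define `read_calibration_cache(self, length)` on the INT8 calibrator, while newer TensorRT versions invoke the callback without a length argument. Giving the parameter a default value works in either case. The class below is a plain-Python sketch of that fix; a real calibrator would subclass `trt.IInt8EntropyCalibrator2` and also implement `get_batch`:

```python
import os

class Calibrator:
    """Plain-Python sketch; a real one subclasses trt.IInt8EntropyCalibrator2."""

    def __init__(self, cache_file="calibration.cache"):
        self.cache_file = cache_file

    def read_calibration_cache(self, length=None):
        # The default on `length` keeps this compatible whether or not
        # TensorRT passes the argument when it invokes the callback.
        if os.path.exists(self.cache_file):
            with open(self.cache_file, "rb") as f:
                return f.read()
        return None  # no cache yet: TensorRT runs calibration from scratch

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)
```

Returning None from `read_calibration_cache` tells TensorRT to calibrate from the data stream instead of reusing a cached table.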

Int8 implementation

Hi! Any ideas or resources on how to implement INT8? The documentation from Nvidia is too minimal to understand how it is supposed to work. They mention creating two objects, but it is unclear from which classes and how.

error: Failed to parse ONNX model.

Hello, thank you for your work. I get an error when running the demo, but I just used model_128.onnx and did not make any changes.
What is the reason and how can I solve it?

Error:

  Please check if the ONNX model is compatible
  AssertionError: Failed to parse ONNX model.

Run batch size with inputs[1].host

Hi everyone,
I set batch_size_max = 4 and ONNX input size = 2 in order to run batch_size = 2 with the TRT model.
When I convert the model, the error "list index out of range" is raised on inputs[1].host.
That means inputs only has one element.
How can I fix this when I want to run with batch size > 1?

AssertionError: Failed to parse ONNX model. Please check if the ONNX model is compatible

trtexec --onnx=/home/guohao02/PyTorch_ONNX_TensorRT/model_128.onnx --explicitBatch

[10/16/2021-13:56:29] [I] === Model Options ===
[10/16/2021-13:56:29] [I] Format: ONNX
[10/16/2021-13:56:29] [I] Model: /home/guohao02/PyTorch_ONNX_TensorRT/model_128.onnx
[10/16/2021-13:56:29] [I] Output:
[10/16/2021-13:56:29] [I] === Build Options ===
[10/16/2021-13:56:29] [I] Max batch: explicit
[10/16/2021-13:56:29] [I] Workspace: 16 MB
[10/16/2021-13:56:29] [I] minTiming: 1
[10/16/2021-13:56:29] [I] avgTiming: 8
[10/16/2021-13:56:29] [I] Precision: FP32
[10/16/2021-13:56:29] [I] Calibration: 
[10/16/2021-13:56:29] [I] Safe mode: Disabled
[10/16/2021-13:56:29] [I] Save engine: 
[10/16/2021-13:56:29] [I] Load engine: 
[10/16/2021-13:56:29] [I] Inputs format: fp32:CHW
[10/16/2021-13:56:29] [I] Outputs format: fp32:CHW
[10/16/2021-13:56:29] [I] Input build shapes: model
[10/16/2021-13:56:29] [I] === System Options ===
[10/16/2021-13:56:29] [I] Device: 0
[10/16/2021-13:56:29] [I] DLACore: 
[10/16/2021-13:56:29] [I] Plugins:
[10/16/2021-13:56:29] [I] === Inference Options ===
[10/16/2021-13:56:29] [I] Batch: Explicit
[10/16/2021-13:56:29] [I] Iterations: 10 (200 ms warm up)
[10/16/2021-13:56:29] [I] Duration: 10s
[10/16/2021-13:56:29] [I] Sleep time: 0ms
[10/16/2021-13:56:29] [I] Streams: 1
[10/16/2021-13:56:29] [I] Spin-wait: Disabled
[10/16/2021-13:56:29] [I] Multithreading: Enabled
[10/16/2021-13:56:29] [I] CUDA Graph: Disabled
[10/16/2021-13:56:29] [I] Skip inference: Disabled
[10/16/2021-13:56:29] [I] === Reporting Options ===
[10/16/2021-13:56:29] [I] Verbose: Disabled
[10/16/2021-13:56:29] [I] Averages: 10 inferences
[10/16/2021-13:56:29] [I] Percentile: 99
[10/16/2021-13:56:29] [I] Dump output: Disabled
[10/16/2021-13:56:29] [I] Profile: Disabled
[10/16/2021-13:56:29] [I] Export timing to JSON file: 
[10/16/2021-13:56:29] [I] Export profile to JSON file: 
[10/16/2021-13:56:29] [I] 
----------------------------------------------------------------
Input filename:   /home/guohao02/PyTorch_ONNX_TensorRT/model_128.onnx
ONNX IR version:  0.0.6
Opset version:    9
Producer name:    pytorch
Producer version: 1.9
Domain:           
Model version:    0
Doc string:       
----------------------------------------------------------------
WARNING: ONNX model has a newer ir_version (0.0.6) than this parser was built against (0.0.3).
While parsing node number 0 [Conv]:
ERROR: ModelImporter.cpp:296 In function importModel:
[5] Assertion failed: tensors.count(input_name)
[10/16/2021-13:56:29] [E] Failed to parse onnx file
[10/16/2021-13:56:29] [E] Parsing model failed
[10/16/2021-13:56:29] [E] Engine could not be created
&&&& FAILED TensorRT.trtexec # trtexec --onnx=/home/guohao02/PyTorch_ONNX_TensorRT/model_128.onnx --explicitBatch

AttributeError: 'NoneType' object has no attribute 'create_execution_context'

Connected to pydev debugger (build 181.5540.34)
Loading ONNX file from path ./models/onnx/model.onnx...
Beginning ONNX file parsing
WARNING: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
Successfully casted down to INT32.
Completed parsing of ONNX file
Building an engine from file ./models/onnx/model.onnx; this may take a while...
[TensorRT] ERROR: Network must have at least one output
Failed to create the engine

Does ONNX-to-TensorRT conversion need a data set for calibration?

I saw some quantization tutorials stating that a small part of the training data set is required for calibration, to determine the range of activation values and weights. Why is there no such part in the code you provided, or is it unnecessary? Thank you.
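For context: INT8 entropy calibration does need representative input data (typically a few hundred preprocessed samples from the training set) so TensorRT can estimate activation ranges; in this repo that role is played by the calibration_stream argument passed to the helper's get_engine. Below is a minimal sketch of such a stream using plain Python lists instead of device buffers (the class and method names here are illustrative, not the repo's exact API):

```python
class CalibrationStream:
    """Feeds fixed-size batches of preprocessed samples for INT8 calibration.

    Sketch only: a real stream would copy each batch into pinned/device
    memory and hand a device pointer to the calibrator's get_batch().
    """

    def __init__(self, samples, batch_size=8, max_batches=16):
        self.samples = samples          # a small, representative subset
        self.batch_size = batch_size
        self.max_batches = max_batches  # a few hundred samples usually suffice
        self._idx = 0

    def next_batch(self):
        limit = min(len(self.samples), self.max_batches * self.batch_size)
        if self._idx >= limit:
            return None                 # None tells the calibrator to stop
        batch = self.samples[self._idx:self._idx + self.batch_size]
        self._idx += len(batch)
        return batch
```

The data only needs to be representative of inference-time inputs; labels are not used during calibration.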
