tensorrt-triton-yolov5's Introduction

YOLOv5 on Triton Inference Server with TensorRT

This repository shows how to deploy YOLOv5 as an optimized TensorRT engine to Triton Inference Server.

This project is based on isarsoft's yolov4-triton-tensorrt and Wang Xinyu's TensorRTx.

Build TensorRT engine

Run the following to get a running TensorRT container with our repo code:

cd tensorrt-triton-yolov5
bash launch_tensorrt.sh
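
A minimal sketch of what a launch script like launch_tensorrt.sh typically does, assuming the image name from the Dockerfile below; the mount path is an assumption:

docker run --gpus all -it --rm \
    -v $(pwd):/workspace/tensorrt-triton-yolov5 \
    baohuynhbk/tensorrt-20.08-py3-opencv4:latest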

Or build the image yourself from the Dockerfile:

cd tensorrt-triton-yolov5
sudo docker build -t baohuynhbk/tensorrt-20.08-py3-opencv4:latest -f tensorrt.Dockerfile .

Docker will download the TensorRT container. Choose the version (here 20.08) to match the Triton version you plan to use later, so that the TensorRT versions line up: NGC containers with the same version tag ship the same TensorRT release.
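
For illustration, the pairing looks like this (these tags are an example of the NGC convention, not a requirement of this repo):

nvcr.io/nvidia/tensorrt:20.08-py3       # container used to build the engine
nvcr.io/nvidia/tritonserver:20.08-py3   # matching container used to serve it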

Inside the container, run the following:

bash convert.sh

This generates a file called yolov5.engine, our serialized TensorRT engine. Together with the plugin library libmyplugins.so, it can now be deployed to Triton Inference Server.
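
Triton serves models from a model repository on disk. A sketch of the layout this deployment likely uses, following standard Triton conventions (the triton-deploy path appears in the client example below; the exact file placement is otherwise an assumption):

triton-deploy/
    models/
        yolov5/
            1/
                model.plan      # yolov5.engine, renamed
            config.pbtxt        # declares input/output tensors and batching
    plugins/
        libmyplugins.so         # custom YOLO layer plugin, preloaded by the server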

Deploy to Triton Inference Server

Start Triton Server

Open a terminal:

bash run_triton.sh
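
For reference, a script like run_triton.sh usually boils down to a single docker run. This sketch assumes the repository layout above; the port mapping (HTTP 8220, gRPC 8221, metrics 8222) is inferred from the perf_client examples below:

docker run --gpus all --rm --ipc=host \
    -p 8220:8000 -p 8221:8001 -p 8222:8002 \
    -v $(pwd)/triton-deploy/models:/models \
    -v $(pwd)/triton-deploy/plugins:/plugins \
    --env LD_PRELOAD=/plugins/libmyplugins.so \
    nvcr.io/nvidia/tritonserver:21.03-py3 \
    tritonserver --model-repository=/models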

Client

Install the Triton client libraries first:

sudo apt update
sudo apt install libb64-dev

pip install nvidia-pyindex
pip install tritonclient[all]
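
Before running the client, you can check that the server is up via Triton's standard health endpoint (port 8220 assumes the HTTP mapping sketched above):

curl -v localhost:8220/v2/health/ready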

Open another terminal. This repo contains a Python client:

cd triton-deploy/clients/python
python client.py -o data/dog_result.jpg image data/dog.jpg
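
Judging by the client's argument list (note the fps flag in the issue log further down) and the upstream isarsoft client, a video mode likely exists as well; this invocation is an assumption:

python client.py -o data/dog_result.mp4 video data/video.mp4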

Benchmark

To benchmark the performance of the model, we can run Triton's perf_client (Performance Analyzer).

To run perf_client, install the Triton Python SDK (tritonclient), which ships perf_client as a prebuilt binary.

# Example
perf_client -m yolov5 -u 127.0.0.1:8221 -i grpc --shared-memory system --concurrency-range 32

Alternatively, you can use the Triton Client SDK Docker container:

docker run -it --ipc=host --net=host nvcr.io/nvidia/tritonserver:21.03-py3-sdk /bin/bash
cd install/bin
# Example
./perf_client -m yolov5 -u 127.0.0.1:8221 -i grpc --shared-memory system --concurrency-range 4

The following benchmarks were taken on a system with an NVIDIA RTX 2080 Ti GPU. Concurrency is the number of concurrent clients invoking inference on the Triton server via gRPC. Results are the combined throughput of all clients in frames per second (FPS) and the average latency in milliseconds seen by each individual client.
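
As a sanity check when reading such numbers: with N concurrent clients each seeing an average latency of L milliseconds, combined throughput is roughly N * 1000 / L FPS. For example, 4 clients at 20 ms average latency would together produce about 200 FPS.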


tensorrt-triton-yolov5's Issues

Error while converting

Loading weights: /workspace/tensorrtx/yolov5/build/yolov5m6.wts
Building engine, please wait for a while...
[01/20/2022-12:37:24] [W] [TRT] Half2 support requested on hardware without native FP16 support, performance will be negatively affected.
[01/20/2022-12:37:26] [E] [TRT] ../rtSafe/cuda/caskUtils.cpp (98) - Assertion Error in trtSmToCask: 0 (Unsupported SM.)
Build engine successfully!
yolov5: /workspace/tensorrtx/yolov5/yolov5.cpp:243: void APIToModel(unsigned int, nvinfer1::IHostMemory**, bool&, float&, float&, std::__cxx11::string&): Assertion `engine != nullptr' failed.
./convert.sh: line 26: 327 Aborted (core dumped) /workspace/tensorrtx/yolov5/build/yolov5 -s /workspace/tensorrtx/yolov5/build/yolov5m6.wts /workspace/tensorrtx/yolov5/build/yolov5m6.engine m6

Problem accessing foreign resources from inside the container

Great work! The Dockerfile needs to access GitHub (git clone https://github.com/opencv/opencv.git), and several steps in convert.sh also need to reach sites outside China. How can Docker be set up so that the container can use the host machine's VPN?

Issue in passing variable input image to triton

Hey,

I was trying to use the Python client. client.py accepts images of variable width and height at any aspect ratio, but the model's config.pbtxt specifies a size of (3, 640, 640), and the preprocessing does not resize images to (640, 640). Shouldn't that raise an input error in Triton, since the TensorRT input has fixed dimensions? I tried it and got an input size mismatch error.

python client.py image ../test-images/test_image_1.jpeg --width 480 --height 640 -o ../test-images/test_image_1_output_1.jpeg
Namespace(certificate_chain=None, client_timeout=None, confidence=0.5, fps=24.0, height=640, input='../test-images/test_image_1.jpeg', mode='image', model='yolov5', model_info=False, nms=0.45, out='../test-images/test_image_1_output_1.jpeg', private_key=None, root_certificates=None, ssl=False, url='localhost:8001', verbose=False, width=480)

Running in 'image' mode
Creating buffer from image file...

Invoking inference...
Traceback (most recent call last):
  File "client.py", line 218, in <module>
    results = triton_client.infer(model_name=FLAGS.model,
  File "/home/divyanshu/anaconda3/envs/yolo/lib/python3.8/site-packages/tritonclient/grpc/__init__.py", line 1156, in infer
    raise_error_grpc(rpc_error)
  File "/home/divyanshu/anaconda3/envs/yolo/lib/python3.8/site-packages/tritonclient/grpc/__init__.py", line 62, in raise_error_grpc
    raise get_error_grpc(rpc_error) from None
tritonclient.utils.InferenceServerException: [StatusCode.INVALID_ARGUMENT] unexpected shape for input 'images' for model 'yolov5'. Expected [1,3,384,640], got [1,3,480,640]

Any insight on this would be helpful.

Thanks

Version for Jetson nano?

Hello, this is not really an issue; I understand that this repo targets x86, but could you perhaps provide a version for ARM platforms like the Jetson family?

Low gpu utilization

I get around 25% GPU utilization (measured with the gpustat utility), which seems pretty low. I have one yolov5l model converted to TensorRT in Triton Server, an A4000 GPU, and several cameras. I don't get the full 25 FPS from all of them, yet GPU utilization never goes above 25% and CPU usage stays under 50% on all cores. How can I get more performance, and how should I troubleshoot this?
