tensorrt-triton-yolov5's Introduction

YOLOv5 on Triton Inference Server with TensorRT

This repository shows how to deploy YOLOv5 as an optimized TensorRT engine to Triton Inference Server.

This project is based on isarsoft's yolov4-triton-tensorrt and Wang Xinyu's TensorRTx.

Build TensorRT engine

Run the following to get a running TensorRT container with our repo code:

cd tensorrt-triton-yolov5
bash launch_tensorrt.sh
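
A minimal sketch of what a launch script like launch_tensorrt.sh typically does, assuming the image name from the Dockerfile below; the mount path is an assumption:

docker run --gpus all -it --rm \
    -v $(pwd):/workspace/tensorrt-triton-yolov5 \
    baohuynhbk/tensorrt-20.08-py3-opencv4:latest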

Or build the image yourself from the Dockerfile:

cd tensorrt-triton-yolov5
sudo docker build -t baohuynhbk/tensorrt-20.08-py3-opencv4:latest -f tensorrt.Dockerfile .

Docker will download the TensorRT container. Choose the version (here 20.08) to match the Triton version you plan to use later, so that the TensorRT versions line up: NGC containers with the same version tag ship the same TensorRT release.
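
For illustration, the pairing looks like this (these tags are an example of the NGC convention, not a requirement of this repo):

nvcr.io/nvidia/tensorrt:20.08-py3       # container used to build the engine
nvcr.io/nvidia/tritonserver:20.08-py3   # matching container used to serve it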

Inside the container, run the following:

bash convert.sh

This generates a file called yolov5.engine, our serialized TensorRT engine. Together with the plugin library libmyplugins.so, it can now be deployed to Triton Inference Server.
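
Triton serves models from a model repository on disk. A sketch of the layout this deployment likely uses, following standard Triton conventions (the triton-deploy path appears in the client example below; the exact file placement is otherwise an assumption):

triton-deploy/
    models/
        yolov5/
            1/
                model.plan      # yolov5.engine, renamed
            config.pbtxt        # declares input/output tensors and batching
    plugins/
        libmyplugins.so         # custom YOLO layer plugin, preloaded by the server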

Deploy to Triton Inference Server

Start Triton Server

Open a terminal:

bash run_triton.sh
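
For reference, a script like run_triton.sh usually boils down to a single docker run. This sketch assumes the repository layout above; the port mapping (HTTP 8220, gRPC 8221, metrics 8222) is inferred from the perf_client examples below:

docker run --gpus all --rm --ipc=host \
    -p 8220:8000 -p 8221:8001 -p 8222:8002 \
    -v $(pwd)/triton-deploy/models:/models \
    -v $(pwd)/triton-deploy/plugins:/plugins \
    --env LD_PRELOAD=/plugins/libmyplugins.so \
    nvcr.io/nvidia/tritonserver:21.03-py3 \
    tritonserver --model-repository=/models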

Client

Install the Triton client libraries first:

sudo apt update
sudo apt install libb64-dev

pip install nvidia-pyindex
pip install tritonclient[all]
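
Before running the client, you can check that the server is up via Triton's standard health endpoint (port 8220 assumes the HTTP mapping sketched above):

curl -v localhost:8220/v2/health/ready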

Open another terminal. This repo contains a Python client:

cd triton-deploy/clients/python
python client.py -o data/dog_result.jpg image data/dog.jpg
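
Judging by the client's argument list (note the fps flag in the issue log further down) and the upstream isarsoft client, a video mode likely exists as well; this invocation is an assumption:

python client.py -o data/dog_result.mp4 video data/video.mp4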

Benchmark

To benchmark the performance of the model, we can run Triton's perf_client (Performance Analyzer).

To run perf_client, install the Triton Python SDK (tritonclient), which ships perf_client as a prebuilt binary.

# Example
perf_client -m yolov5 -u 127.0.0.1:8221 -i grpc --shared-memory system --concurrency-range 32

Alternatively, you can use the Triton Client SDK Docker container:

docker run -it --ipc=host --net=host nvcr.io/nvidia/tritonserver:21.03-py3-sdk /bin/bash
cd install/bin
# Example
./perf_client -m yolov5 -u 127.0.0.1:8221 -i grpc --shared-memory system --concurrency-range 4

The following benchmarks were taken on a system with an NVIDIA RTX 2080 Ti GPU. Concurrency is the number of concurrent clients invoking inference on the Triton server via gRPC. Results are the combined throughput of all clients in frames per second (FPS) and the average latency in milliseconds seen by each individual client.
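
As a sanity check when reading such numbers: with N concurrent clients each seeing an average latency of L milliseconds, combined throughput is roughly N * 1000 / L FPS. For example, 4 clients at 20 ms average latency would together produce about 200 FPS.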


tensorrt-triton-yolov5's Issues

Error while converting

Loading weights: /workspace/tensorrtx/yolov5/build/yolov5m6.wts
Building engine, please wait for a while...
[01/20/2022-12:37:24] [W] [TRT] Half2 support requested on hardware without native FP16 support, performance will be negatively affected.
[01/20/2022-12:37:26] [E] [TRT] ../rtSafe/cuda/caskUtils.cpp (98) - Assertion Error in trtSmToCask: 0 (Unsupported SM.)
Build engine successfully!
yolov5: /workspace/tensorrtx/yolov5/yolov5.cpp:243: void APIToModel(unsigned int, nvinfer1::IHostMemory**, bool&, float&, float&, std::__cxx11::string&): Assertion `engine != nullptr' failed.
./convert.sh: line 26: 327 Aborted (core dumped) /workspace/tensorrtx/yolov5/build/yolov5 -s /workspace/tensorrtx/yolov5/build/yolov5m6.wts /workspace/tensorrtx/yolov5/build/yolov5m6.engine m6

Problem accessing foreign resources from inside the container

Great work! The Dockerfile needs to access GitHub (git clone https://github.com/opencv/opencv.git), and several steps in convert.sh also need to reach sites outside China. How can Docker be set up so that the container can use the host machine's VPN?

Issue in passing variable input image to triton

Hey,

I was trying to use the Python client. client.py accepts images of variable width and height at any aspect ratio, but the model's config.pbtxt specifies a size of (3, 640, 640), and the preprocessing does not resize images to (640, 640). Shouldn't that raise an input error in Triton, since the TensorRT input has fixed dimensions? I tried it and got an input size mismatch error.

python client.py image ../test-images/test_image_1.jpeg --width 480 --height 640 -o ../test-images/test_image_1_output_1.jpeg
Namespace(certificate_chain=None, client_timeout=None, confidence=0.5, fps=24.0, height=640, input='../test-images/test_image_1.jpeg', mode='image', model='yolov5', model_info=False, nms=0.45, out='../test-images/test_image_1_output_1.jpeg', private_key=None, root_certificates=None, ssl=False, url='localhost:8001', verbose=False, width=480)

Running in 'image' mode
Creating buffer from image file...

Invoking inference...
Traceback (most recent call last):
  File "client.py", line 218, in <module>
    results = triton_client.infer(model_name=FLAGS.model,
  File "/home/divyanshu/anaconda3/envs/yolo/lib/python3.8/site-packages/tritonclient/grpc/__init__.py", line 1156, in infer
    raise_error_grpc(rpc_error)
  File "/home/divyanshu/anaconda3/envs/yolo/lib/python3.8/site-packages/tritonclient/grpc/__init__.py", line 62, in raise_error_grpc
    raise get_error_grpc(rpc_error) from None
tritonclient.utils.InferenceServerException: [StatusCode.INVALID_ARGUMENT] unexpected shape for input 'images' for model 'yolov5'. Expected [1,3,384,640], got [1,3,480,640]

Any insight on this would be helpful.

Thanks

Version for Jetson nano?

Hello, this is not really an issue; I understand that this repo targets x86, but could you perhaps provide a version for ARM platforms like the Jetson family?

Low gpu utilization

I get around 25% GPU utilization (measured with the gpustat utility), which seems pretty low. I have one yolov5l model converted to TensorRT in Triton Server, an A4000 GPU, and several cameras. I don't get the full 25 FPS from all of them, yet GPU utilization never goes above 25% and CPU usage stays under 50% on all cores. How can I get more performance, and how should I troubleshoot this?
