
Advanced inference pipeline using NVIDIA Triton Inference Server for CRAFT text detection (PyTorch). Includes converters from PyTorch -> ONNX -> TensorRT and inference pipelines (TensorRT, Triton server, multi-format). Supported model formats for Triton inference: TensorRT engine, TorchScript, ONNX.


Advanced Triton Inference Pipeline for CRAFT (Character-Region Awareness For Text detection)

Overview

An implementation of a new inference pipeline using NVIDIA Triton Inference Server for the CRAFT text detector in PyTorch.

Author

k9ele7en. Give this repo a star if you find some value in it.
Thank you.

License

[BSD-3-Clause License] The BSD 3-clause license allows you almost unlimited freedom with the software, as long as you include the BSD copyright and license notice (found in the full text).

Updates

13 Jul, 2021: Initial update; the preparation scripts run correctly.

14 Jul, 2021: Inference on the Triton server works (single request); the TensorRT format gives the best performance.

Getting started

1. Install dependencies

Requirements

$ pip install -r requirements.txt

2. Install the required environment for inference using Triton server

See ./README_ENV.md for details. Tools/packages to install include:

  • TensorRT
  • Docker
  • nvidia-docker
  • PyCUDA ...
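Once installed, a quick stdlib-only check can confirm the Python-side dependencies are importable. This is only a sketch; the module names `tensorrt`, `pycuda`, and `tritonclient` are the usual import names of the packages above, not something this repo ships:

```python
import importlib.util

def missing_packages(names):
    """Return the module names that cannot be found by the import system."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# Import names assumed for the packages listed above.
required = ["tensorrt", "pycuda", "tritonclient"]
print("Missing:", missing_packages(required))
```

An empty "Missing" list means the Python bindings are at least importable; Docker and nvidia-docker still need to be checked separately on the host.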

3. Training

The code for training is not included in this repository, following the original ClovaAI release.

4. Inference instructions using pretrained models

  • Download the trained models

    Model name  | Used datasets         | Languages | Purpose                     | Model Link
    General     | SynthText, IC13, IC17 | Eng + MLT | For general purpose         | Click
    IC15        | SynthText, IC15       | Eng       | For IC15 only               | Click
    LinkRefiner | CTW1500               | -         | Used with the General Model | Click

5. Model preparation before running the Triton server:

a. Triton Inference Server inference: see details in ./README_ENV.md
Initially, you need to run a (.sh) script to prepare the Model Repository; after that, you only need to run the Docker image when inferencing. The script gets things ready for the Triton server, covering these steps:

  • Convert the downloaded pretrained model into multiple formats
  • Place the converted model formats into Triton's Model Repository
  • Run the Triton Server image from NGC (pulling it first if it does not exist)
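The Model Repository layout the script prepares can be sketched with the stdlib alone. The model names mirror the ones in the server log further below; the per-backend file names (model.plan, model.pt, model.onnx) follow Triton's standard conventions, and the files here are empty placeholders, not the repo's actual converter output:

```python
import tempfile
from pathlib import Path

# Triton expects <repo>/<model_name>/<version>/<model_file> plus a
# config.pbtxt per model. File names follow Triton's backend conventions.
LAYOUT = {
    "detec_trt": "model.plan",   # TensorRT engine
    "detec_pt": "model.pt",      # TorchScript
    "detec_onnx": "model.onnx",  # ONNX
}

def make_model_repo(root, version=1):
    """Create the skeleton of a Triton Model Repository under `root`."""
    root = Path(root)
    for name, fname in LAYOUT.items():
        version_dir = root / name / str(version)
        version_dir.mkdir(parents=True, exist_ok=True)
        (version_dir / fname).touch()           # placeholder for the converted model
        (root / name / "config.pbtxt").touch()  # per-model configuration
    return root

repo = make_model_repo(tempfile.mkdtemp())
for path in sorted(p.relative_to(repo).as_posix() for p in repo.rglob("*") if p.is_file()):
    print(path)
```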

Check that the server is running correctly:

$ curl -v localhost:8000/v2/health/ready
...
< HTTP/1.1 200 OK
< Content-Length: 0
< Content-Type: text/plain

Now everything is ready; start inference by:

  • Running the Docker image of the Triton server (replace the -v mount path with your full path to model_repository):
$ sudo docker run --gpus all --rm -p8000:8000 -p8001:8001 -p8002:8002 -v /home/maverick911/repo/Triton-TensorRT-Inference-CRAFT-pytorch/model_repository:/models nvcr.io/nvidia/tritonserver:21.05-py3 tritonserver --model-repository=/models
...
+------------+---------+--------+
| Model      | Version | Status |
+------------+---------+--------+
| detec_onnx | 1       | READY  |
| detec_pt   | 1       | READY  |
| detec_trt  | 1       | READY  |
+------------+---------+--------+
I0714 00:37:55.265177 1 grpc_server.cc:4062] Started GRPCInferenceService at 0.0.0.0:8001
I0714 00:37:55.269588 1 http_server.cc:2887] Started HTTPService at 0.0.0.0:8000
I0714 00:37:55.312507 1 http_server.cc:2906] Started Metrics Service at 0.0.0.0:8002

Run inference with:

$ python infer_triton.py -m='detec_trt' -x=1 --test_folder='./images' -i='grpc' -u='localhost:8001'
Request 1, batch size 1s/sample.jpg
elapsed time : 0.9521937370300293s
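infer_triton.py talks to the server through Triton's client library. For reference, the JSON body of the equivalent raw HTTP/v2 (KServe protocol) request can be sketched with the stdlib alone. This is a sketch only: the tensor name "input" and FP32 dtype are assumptions taken from the error message in the Notes below, not read from the repo's config, and the tiny 1x3x2x2 tensor stands in for a preprocessed image:

```python
import json

def build_infer_request(input_name, shape, data, datatype="FP32"):
    """Build a KServe/Triton v2 HTTP inference request body as JSON."""
    return json.dumps({
        "inputs": [{
            "name": input_name,
            "shape": list(shape),
            "datatype": datatype,
            "data": data,
        }]
    })

# Hypothetical tiny tensor; a real request carries the preprocessed image.
# POST the body to http://localhost:8000/v2/models/detec_trt/versions/1/infer
body = build_infer_request("input", (1, 3, 2, 2), [0.0] * 12)
print(body)
```

The gRPC path used with `-i='grpc'` sends the same tensors over protobuf instead of JSON, which is part of why it benchmarks slightly faster below.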

Output from Triton:

Performance benchmarks: single image (sample.jpg), time in seconds

  • Triton server (gRPC / HTTP):

    Model format | gRPC (s) | HTTP (s)
    TensorRT     | 0.946    | 0.952
    TorchScript  | 1.244    | 1.098
    ONNX         | 1.052    | 1.060
  • Classic PyTorch: 1.319 s

Arguments

  • -m: name of model with format
  • -x: version of model
  • --test_folder: input image/folder
  • -i: protocol (HTTP/gRPC)
  • -u: URL of corresponding protocol (HTTP-8000, gRPC-8001)
  • ... (Details in ./infer_triton.py)

Notes:

  • The error below is caused by invalid dynamic input shapes; check that the input image shape falls within the dynamic shape range declared in the model config.
inference failed: [StatusCode.INTERNAL] request specifies invalid shape for input 'input' for detec_trt_0_gpu0. Error details: model expected the shape of dimension 2 to be between 256 and 1200 but received 1216
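One way to avoid this error is to clamp the resize target into the configured range before preprocessing. A minimal sketch, assuming the 256–1200 bounds from the error above and rounding down to a multiple of 32 (a common constraint for CRAFT's VGG-style backbone; verify both numbers against your config.pbtxt):

```python
def clamp_dim(size, lo=256, hi=1200, multiple=32):
    """Clamp a target dimension into [lo, hi], rounded down to a multiple.

    The bounds mirror the dynamic-shape range in the error message above;
    check the model's config.pbtxt for the actual constraints.
    """
    size = max(lo, min(hi, size))
    return (size // multiple) * multiple

print(clamp_dim(1216))  # the failing dimension from the error above -> 1184
```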

b. Classic Pytorch (.pth) inference:

$ python test.py --trained_model=[weightfile] --test_folder=[folder path to test images]

The result images and score maps will be saved to ./result by default.


