
aiinfer's Introduction

Latest News

This project has been deprecated. We recommend the inference framework the author now contributes to: nndeploy. New developers are welcome.

Project Introduction

This is a C++ AI inference library; currently only TensorRT models are supported. To accelerate each task, most of the pre- and post-processing is written in CUDA, and CUDA-backed implementations are labeled as such throughout the project.


Project Layout

AiInfer
  |--application # model-inference applications; implement your own model's inference here
    |--yolov8_det_app # example: a YOLOv8 detection app
    |--xxxx
  |--utils # utilities
    |--backend # backend inference classes
    |--common # common helper classes
      |--arg_parsing.hpp # command-line parsing, similar to Python's argparse
      |--cuda_utils.hpp # common CUDA helper functions
      |--cv_cpp_utils.hpp # common CV-related helper functions
      |--memory.hpp # helpers for allocating and freeing CPU/GPU memory
      |--model_info.hpp # common pre/post-processing parameters, e.g. mean/std, NMS thresholds
      |--utils.hpp # common C++ helpers: timing, mkdir, etc.
    |--post_process # post-processing (CUDA-accelerated); custom post-processing can also go here
    |--pre_process # pre-processing (CUDA-accelerated); custom pre-processing can also go here
    |--tracker # object-tracking library for detection; fully decoupled, delete it if you don't need it
  |--workspaces # working directory for test images/videos and models, referenced by relative path from main.cpp
  |--mains # collection of main.cpp files, one per app, which keeps each entry point easy to follow

Getting Started

1. Environment setup on Linux & Windows
  • On Linux we recommend VSCode; on Windows, Visual Studio 2019
  • Install the GPU driver, CUDA, cuDNN, OpenCV, and TensorRT --> installation guide
  • Start with a detection example to get familiar with the project layout, e.g. application/yolov8_app/yolov8_det_cuda
2. Converting ONNX to TensorRT [FP16 + INT8]
  • Export the ONNX model with a dynamic batch dimension. The example below exports a PyTorch model; if you need dynamic width/height, this project supports that too:
torch.onnx.export(
        model,
        dummy_input, # e.g. torch.randn(1, 3, 640, 640)
        save_onnx_path,
        input_names=["image"],
        output_names=["output"],
        dynamic_axes={'image': {0: 'batch'},
                      'output': {0: 'batch'}},
        opset_version=args.opset, # opset 11 or 12 tends to be best supported across chips and boards
    )
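Conceptually, the dynamic_axes mapping above replaces the named dimensions of the static export shape with symbolic names. A minimal illustration in plain Python (symbolic_shape is a hypothetical helper for this sketch, not part of this project or the ONNX API):

```python
def symbolic_shape(static_shape, dynamic_axes):
    """Replace the dimensions listed in dynamic_axes with their symbolic names."""
    dims = [str(d) for d in static_shape]
    for axis, name in dynamic_axes.items():
        dims[axis] = name
    return "x".join(dims)

# The export above turns a (1, 3, 640, 640) input into a batch-dynamic one:
print(symbolic_shape((1, 3, 640, 640), {0: "batch"}))  # batchx3x640x640
```

With dynamic width/height you would also map axes 2 and 3, yielding a shape like batchx3xhxw.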
  • Simplify the ONNX model [optional]
# Skip this step if your export code already runs onnxsim.
pip install onnxsim # onnxsim simplifies a complex ONNX graph without changing its inference accuracy
onnxsim input_onnx_model output_onnx_model # produces an ONNX model with redundant operators removed
  • FP16 quantization: convert the ONNX model to TensorRT, preferably with a dynamic batch
# Prerequisite: the exported ONNX model has a dynamic batch, i.e. an input shape of [-1,3,640,640]. Note: 640 is only an example; use your own width/height.
trtexec --onnx=xxx_dynamic.onnx \
        --workspace=4096 \
        --minShapes=image:1x3x640x640 \
        --optShapes=image:4x3x640x640 \
        --maxShapes=image:16x3x640x640 \
        --saveEngine=xxx.engine \
        --avgRuns=100 \
        --fp16
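At runtime, TensorRT only accepts input shapes that fall inside the optimization profile defined by --minShapes/--optShapes/--maxShapes. A minimal sketch of that check in plain Python (in_profile is a hypothetical helper for illustration, not a TensorRT API):

```python
def in_profile(shape, min_shape, max_shape):
    """True if every dimension of shape lies within [min, max] of the profile."""
    return (len(shape) == len(min_shape) and
            all(lo <= d <= hi for d, lo, hi in zip(shape, min_shape, max_shape)))

# With the profile above (batch 1..16, fixed 3x640x640 image dimensions):
print(in_profile((8, 3, 640, 640), (1, 3, 640, 640), (16, 3, 640, 640)))   # True
print(in_profile((32, 3, 640, 640), (1, 3, 640, 640), (16, 3, 640, 640)))  # False
```

This is why a batch size passed to ./infer must stay within the range baked into the engine.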
3. Building and running the project
  • In CMakeLists.txt, set the compute capability to the value matching your GPU
  • In CMakeLists.txt, set the path of your local TensorRT installation, and put the main.cpp you want to use in add_executable
  • CMake:
    • mkdir build && cd build
    • cmake ..
    • make -j8
    • cd ..
  • List the command-line options the binary accepts
cd workspaces
./infer -h
  • --model_path, -f: path of the model to load, required
  • --image_path, -i: test image to run inference on, required
  • --batch_size, -b: batch size to use [>=1], optional, default=1
  • --score_thr, -s: score threshold used to filter results in post-processing, optional, default=0.5f
  • --device_id, -g: GPU id on multi-GPU machines, optional, default=0
  • --loop_count, -c: number of inference runs, typically used for timing, optional, default=10
  • --warmup_runs, -w: number of warm-up runs before timing (to spin up the CUDA kernels), optional, default=2
  • --output_dir, -o: directory to store results in, optional, default=''
  • --help, -h: show all available options
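arg_parsing.hpp is described above as similar to Python's argparse, so the flag set can be sketched in argparse terms (defaults mirror the list above; the parser itself is illustrative only, not the project's actual implementation):

```python
import argparse

def build_parser():
    """Sketch of the AiInfer command-line options using Python's argparse."""
    p = argparse.ArgumentParser(description="AiInfer options (illustrative sketch)")
    p.add_argument("--model_path", "-f", required=True, help="path of the model to load")
    p.add_argument("--image_path", "-i", required=True, help="test image to run on")
    p.add_argument("--batch_size", "-b", type=int, default=1, help="batch size [>=1]")
    p.add_argument("--score_thr", "-s", type=float, default=0.5, help="post-processing score threshold")
    p.add_argument("--device_id", "-g", type=int, default=0, help="GPU id on multi-GPU machines")
    p.add_argument("--loop_count", "-c", type=int, default=10, help="inference runs, used for timing")
    p.add_argument("--warmup_runs", "-w", type=int, default=2, help="warm-up runs before timing")
    p.add_argument("--output_dir", "-o", default="", help="directory to store results in")
    return p

args = build_parser().parse_args(["-f", "xxx.engine", "-i", "xxx.jpg", "-b", "10"])
print(args.batch_size, args.loop_count)  # 10 10
```

argparse also provides -h/--help automatically, matching the last flag in the list.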
# Then run with whatever arguments you need, for example:
./infer -f xxx.engine -i xxx.jpg -b 10 -c 10 -o cuda_res # CUDA pre/post-processing; results are saved under the cuda_res folder
4. Packaging a C++ SDK for delivery
cd build
make install
# An install folder then appears under workspaces, containing the headers and .so libraries you ship

Acknowledgements: Related Projects


aiinfer's Issues

Build error with CMakeLists.txt

cmake --version
cmake version 3.26.0
cmake -S . -B build
success
cmake --build build
fatal error: Eigen/Core: No such file or directory

Fixed by installing Eigen and adding its include path to CMakeLists.txt:

sudo apt-get install libeigen3-dev

include_directories(
    # eigen3
    "/usr/include/eigen3"
    ${OpenCV_INCLUDE_DIRS}
    ${CUDA_INCLUDE_DIRS}
    ${EIGEN3_INCLUDE_DIRS} # needed by the tracker

    # tensorrt
    ${TensorRT_ROOT}/include
    ${TensorRT_ROOT}/samples/common # needed so the logger header resolves across TensorRT versions [v7.xx, v8.xx]

    # project includes
    ${PROJECT_SOURCE_DIR}/utils
    ${PROJECT_SOURCE_DIR}/application
)

build success

cuda编译错误

您好,start+git clone ->cuda build,整个工程代码基本没变,然后配置完本地cuda+trt属性,我用的是cuda11.7 tensorrt843,然后编译.cu时总是报错

identifier "w1" is undefined
...

identifier "v4" is undefined
this declaration has no storage class or type specifier
expression must have class type but it has type "double (*)(int, const double *)"
...
MSB3721 命令“"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\bin\nvcc.exe" -gencode=arch=compute_52,code="sm_52,compute_52" --use-local-env -ccbin "D:\program\VS2019\VC\Tools\MSVC\14.29.30133\bin\HostX86\x64" -x cu -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\include" --keep-dir x64\Release -maxrregcount=0 --machine 64 --compile -cudart static -DNDEBUG -D_CONSOLE -D_MBCS -Xcompiler "/EHsc /W3 /nologo /O2 /Fdx64\Release\vc142.pdb /FS /MD " -o F:\VS2019\TRT_deploy\test_trtv8seg\test_src\x64\Release\cuda_function.cu.obj "F:\VS2019\TRT_deploy\test_trtv8seg\test_src\include\cuda_function.cu"”已退出,返回代码为 1。 test_src D:\program\VS2019\MSBuild\Microsoft\VC\v160\BuildCustomizations\CUDA 11.7.targets 790

----->可以帮忙看下怎么看解决吗?谢谢您

RTDETR Python inference crashes with Segmentation fault (core dumped)

The crash is located in this code:

def load_engine(engine_path):
    print("ssss")
    TRT_LOGGER = trt.Logger(trt.Logger.ERROR)
    runtime = trt.Runtime(TRT_LOGGER)
    print("tttt")
    with open(engine_path, 'rb') as f:
        print('zzz')
        return runtime.deserialize_cuda_engine(f.read())

specifically the line runtime = trt.Runtime(TRT_LOGGER). The engine file was exported with TensorRT 8.6.0.12 and runs fine with the C++ version; the Python tensorrt package is 8.6.0. Inference fails with Segmentation fault (core dumped). What could be the cause? Thank you.

C++ build error

-- Configuring done
-- Generating done
-- Build files have been written to: /home/uisee/disk/dl_model_deploy/dl_model_infer/build
[ 14%] Building NVCC (Device) object CMakeFiles/utils_cu_cpp.dir/utils/preprocess/utils_cu_cpp_generated_pre_process.cu.o
/home/uisee/disk/dl_model_deploy/dl_model_infer/utils/preprocess/pre_process.cu(30): error: more than one instance of overloaded function "ai::preprocess::rint" matches the argument list:
function "rint(float)"
function "std::rint(float)"
argument types are: (float)

/home/uisee/disk/dl_model_deploy/dl_model_infer/utils/preprocess/pre_process.cu(31): error: more than one instance of overloaded function "ai::preprocess::rint" matches the argument list:
function "rint(float)"
function "std::rint(float)"
argument types are: (float)

2 errors detected in the compilation of "/home/uisee/disk/dl_model_deploy/dl_model_infer/utils/preprocess/pre_process.cu".
CMake Error at utils_cu_cpp_generated_pre_process.cu.o.Debug.cmake:280 (message):
Error generating file
/home/uisee/disk/dl_model_deploy/dl_model_infer/build/CMakeFiles/utils_cu_cpp.dir/utils/preprocess/./utils_cu_cpp_generated_pre_process.cu.o

CMakeFiles/utils_cu_cpp.dir/build.make:537: recipe for target 'CMakeFiles/utils_cu_cpp.dir/utils/preprocess/utils_cu_cpp_generated_pre_process.cu.o' failed
make[2]: *** [CMakeFiles/utils_cu_cpp.dir/utils/preprocess/utils_cu_cpp_generated_pre_process.cu.o] Error 1
CMakeFiles/Makefile2:84: recipe for target 'CMakeFiles/utils_cu_cpp.dir/all' failed
make[1]: *** [CMakeFiles/utils_cu_cpp.dir/all] Error 2
Makefile:135: recipe for target 'all' failed
make: *** [all] Error 2

Segmentation fault (core dumped)

Hello, I'd like to ask: why does FP16 segmentation inference end with this error?
Generating the engine and everything before it works fine ... the dimensions were transposed as well.

./infer_seg -f weights/yolov8n-seg.engine -i res/bus.jpg -o cuda_res

***** Display run Config: start *****
model path set to: weights/yolov8n-seg.engine
image path set to: res/bus.jpg
batch size set to: 1
score threshold set to: 0.4
device id set to: 0
loop count set to: 10
num of warmup runs set to: 2
output directory set to: cuda_res
***** Display run Config: end *****

[trt_infer.cpp:147]: Infer 0x5605cfcd2a90 [StaticShape]
[trt_infer.cpp:160]: Inputs: 1
[trt_infer.cpp:165]: 	0.images : shape {1x3x640x640}
[trt_infer.cpp:168]: Outputs: 2
[trt_infer.cpp:173]: 	0.output1 : shape {1x32x160x160}
[trt_infer.cpp:173]: 	1.output0 : shape {1x8400x116}
[Batch=1, iters=10,run infer mean time:]: 1.80176 ms
Segmentation fault (core dumped)
