
tpat's Introduction

TPAT - TensorRT Plugin Autogen Tool

Introduction

  1. Automatically generate high-performance TensorRT plugins for unsupported operators, or to replace inefficient kernels.
  2. End-to-end command line tool. No CUDA programming knowledge is required: users only need to provide the ONNX model and assign the node names or types to auto-generate a TensorRT plugin.
  3. Auto-generated TensorRT plugins achieve high performance in real cases.

Support Matrix

Runtime Env : dockerfile

1. Build image

nvidia-docker build .

2. Run container

nvidia-docker run -itd --gpus all -v <TPAT path dir>:/root <Image_ID> /bin/bash

3. Enter the container

nvidia-docker exec -it <Container_ID> /bin/bash

4. Modify CUDA_PATH and TRT_LIB_PATH in python/trt_plugin/Makefile

CUDA_PATH: local CUDA installation path
TRT_LIB_PATH: local TensorRT installation path
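
For example, assuming CUDA is installed under /usr/local/cuda and TensorRT was unpacked to /opt/TensorRT (both paths are illustrative, not defaults), the two Makefile variables would be set as:

CUDA_PATH = /usr/local/cuda
TRT_LIB_PATH = /opt/TensorRT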

5. Auto-generate the plugin

cd examples
python test_onehot_dynamic_direct.py
  • tpat_onehot.so is stored in python/trt_plugin/lib/

Runtime Env : Build

1. Prerequisites

System Packages

  • LLVM >= 9.0.1, (LLVM==9.0.1 recommended)
  • GCC >= 7.3.0, (GCC==7.4.0 recommended)
  • TensorRT

PyPI packages

  • numpy pycuda onnx onnxruntime onnx_graphsurgeon xgboost jinja2 ctypes tornado cloudpickle psutil

NOTE: these required packages are listed in requirements.txt

Optional packages

  • tensorflow-gpu==1.15
  • tf2onnx
  • torch
  • pytest

NOTE: these optional packages are required by the examples and unit tests

2. Clone the TPAT repository

git clone -b master https://github.com/nvidia/TensorRT TPAT
cd TPAT
git submodule update --init --recursive

3. Build BlazerML-TVM

mkdir build && cp cmake/config.cmake build
#Edit build/config.cmake to customize the compilation options
set(USE_LLVM /usr/local/llvm/bin/llvm-config)
set(USE_CUDA ON)
#gcc compiler is required to support C++14
cd build && cmake .. 
make -j
#TVM Python package
export TVM_HOME=/path/to/tvm
export PYTHONPATH=$TVM_HOME/python:${PYTHONPATH}
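
A quick sanity check (a minimal sketch, assuming the build above succeeded and PYTHONPATH is set as shown) to confirm the TVM Python package is importable and CUDA support is enabled:

import tvm

print(tvm.__version__)               # version of the BlazerML-TVM checkout
print(tvm.runtime.enabled("cuda"))   # True if built with set(USE_CUDA ON)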

4. Plugin Compiler Env

Modify python/trt_plugin/Makefile according to your environment setup.

CUDA_PATH: local CUDA installation path
TRT_LIB_PATH: local TensorRT installation path

Usage

TPAT can be used via a Python function or a command line tool.

Python function

onnx2plugin(
	input_model_path, 
	output_model_path, 
	node_names=None, 
	node_types=None, 
	plugin_name_dict=None,
	dynamic_bs=False, # if True, the generated plugin supports dynamic batch size
	min_bs=1,
	max_bs=256,
	opt_bs=128
	)
  • input_model_path[required] : input ONNX model containing the nodes that require a TRT plugin
  • output_model_path[required] : output ONNX model in which the corresponding node types are replaced by plugin names. The output model can be converted to TRT directly with the ONNX parser and the built plugin dynamic library.
  • node_names : list of node names for autogen
  • node_types : list of node types for autogen
  • plugin_name_dict : dict of {node_name: plugin_name} for autogen (see the -p example below)
  • dynamic_bs : if True, TPAT generates a plugin that supports dynamic batch size; if False, the generated plugin only supports fixed shapes but has better performance
  • min_bs : the minimum batch size in the dynamic batch range
  • max_bs : the maximum batch size in the dynamic batch range
  • opt_bs : the batch size to optimize for within the dynamic batch range

NOTE: at least one of node_names, node_types, or plugin_name_dict must be provided
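
For example, a minimal sketch (the model path and node name below are illustrative placeholders; it assumes the script runs where python/onnx_to_plugin.py is importable):

from onnx_to_plugin import onnx2plugin

# Replace the hypothetical node "OneHot_0" in model.onnx with an
# auto-generated TRT plugin; the compiled .so is written to trt_plugin/lib/.
trt_plugin_names = onnx2plugin(
    input_model_path="model.onnx",
    output_model_path="model_with_plugin.onnx",
    node_names=["OneHot_0"],
)
print(trt_plugin_names)  # e.g. ["tpat_OneHot_0"]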

Command line

# Separate different ops with spaces
python3 onnx_to_plugin.py -i input.onnx -o output.onnx -n op_name1 op_name2 -dynamic=true -min=1 -max=512 -opt=256
python3 onnx_to_plugin.py -i input.onnx -o output.onnx -t op_type1 op_type2 -dynamic=false
python3 onnx_to_plugin.py -i input.onnx -o output.onnx -p '{"op_name1": "plugin_name1", "op_name2": "plugin_name2"}'
  • -i[required]: input_model_path
  • -o[required]: output_model_path
  • -n: node_names
  • -t: node_types
  • -p: plugin_name_dict
  • -dynamic: dynamic_bs
  • -min: min_bs
  • -max: max_bs
  • -opt: opt_bs

Output

1. Assign nodes and plugin names through plugin_name_dict

  • trt_plugin/src contains {plugin_name}.cu and {plugin_name}.h
  • trt_plugin/lib contains {plugin_name}.so

2. Assign node names or node types

  • trt_plugin/src contains tpat_{node_name}.cu and tpat_{node_name}.h
  • trt_plugin/lib contains tpat_{node_name}.so
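
As a sketch of consuming these outputs from Python (file paths are illustrative; the TensorRT calls are standard Python-API usage, though details may vary across TensorRT versions), the plugin library must be loaded before the output ONNX model is parsed:

import ctypes
import tensorrt as trt

# Loading the .so lets the plugin creator self-register; then expose
# all registered creators to the ONNX parser.
ctypes.CDLL("trt_plugin/lib/tpat_test_onehot.so")  # illustrative path
logger = trt.Logger(trt.Logger.VERBOSE)
trt.init_libnvinfer_plugins(logger, "")

builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("output.onnx", "rb") as f:  # model produced by TPAT
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))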

Example && UnitTest

Release notes

Changelog

  • Support multiple nodes for autogen
  • Support boolean inputs/outputs
  • Able to reuse plugins

Known issues

  • Dynamic shapes are supported only for the batch dimension
  • Operators with int8/float16/double inputs/outputs are not supported

TODO

  • Support ONNX subgraph for autogen
  • Support direct conversion from TensorFlow and PyTorch

tpat's People

Contributors

  • buptqq


tpat's Issues

Fail to run example test_onehot_dynamic_direct.py

Description

I tried to run the example test_onehot_dynamic_direct.py, but got a segmentation fault. I found that the fault occurs in parser.parse(model.read()) (line 268). I would appreciate it if you could help me solve this problem.

Environment

docker-image==nvcr.io/nvidia/tensorflow:20.06-tf1-py3
nvidia-driver==470.82
cuda==11.3
TensorRT==8.2.3

onnx==1.10.0
onnxruntime==1.10.0
onnxruntime-gpu==1.10.0
onnx-graphsurgeon==0.3.26
tf2onnx==1.11.1

Log

[02/28/2023-11:46:17] [TRT] [V] Original shape: (_, 64), unsqueezing to: (_, _, _, _)
[02/28/2023-11:46:17] [TRT] [W] ShapedWeights.cpp:173: Weights dense/kernel/read:0 has been transposed with permutation of (1, 0)! If you plan on overwriting the weights with the Refitter API, the new weights must be pre-transposed.
[02/28/2023-11:46:17] [TRT] [V] Registering layer: dense/MatMul for ONNX node: dense/MatMul
[02/28/2023-11:46:17] [TRT] [V] Original shape: (_, 256, 1, 1), squeezing to: (_, _)
[02/28/2023-11:46:17] [TRT] [V] Registering tensor: dense/MatMul:0 for ONNX tensor: dense/MatMul:0
[02/28/2023-11:46:17] [TRT] [V] dense/MatMul [MatMul] outputs: [dense/MatMul:0 -> (-1, 256)[FLOAT]], 
[02/28/2023-11:46:17] [TRT] [V] Parsing node: Min__6 [Min]
[02/28/2023-11:46:17] [TRT] [V] Searching for input: dense/MatMul:0
[02/28/2023-11:46:17] [TRT] [V] Searching for input: clip_by_value/Minimum/y:0
[02/28/2023-11:46:17] [TRT] [V] Min__6 [Min] inputs: [dense/MatMul:0 -> (-1, 256)[FLOAT]], [clip_by_value/Minimum/y:0 -> ()[FLOAT]], 
[02/28/2023-11:46:17] [TRT] [V] Registering layer: clip_by_value/Minimum/y:0 for ONNX node: clip_by_value/Minimum/y:0
[02/28/2023-11:46:17] [TRT] [V] Registering layer: Min__6 for ONNX node: Min__6
[02/28/2023-11:46:17] [TRT] [V] Registering tensor: Min__6:0 for ONNX tensor: Min__6:0
[02/28/2023-11:46:17] [TRT] [V] Min__6 [Min] outputs: [Min__6:0 -> (-1, 256)[FLOAT]], 
[02/28/2023-11:46:17] [TRT] [V] Parsing node: Max__9 [Max]
[02/28/2023-11:46:17] [TRT] [V] Searching for input: Min__6:0
[02/28/2023-11:46:17] [TRT] [V] Searching for input: clip_by_value/y:0
[02/28/2023-11:46:17] [TRT] [V] Max__9 [Max] inputs: [Min__6:0 -> (-1, 256)[FLOAT]], [clip_by_value/y:0 -> ()[FLOAT]], 
[02/28/2023-11:46:17] [TRT] [V] Registering layer: clip_by_value/y:0 for ONNX node: clip_by_value/y:0
[02/28/2023-11:46:17] [TRT] [V] Registering layer: Max__9 for ONNX node: Max__9
[02/28/2023-11:46:17] [TRT] [V] Registering tensor: Max__9:0 for ONNX tensor: Max__9:0
[02/28/2023-11:46:17] [TRT] [V] Max__9 [Max] outputs: [Max__9:0 -> (-1, 256)[FLOAT]], 
[02/28/2023-11:46:17] [TRT] [V] Parsing node: Cast [Cast]
[02/28/2023-11:46:17] [TRT] [V] Searching for input: Max__9:0
[02/28/2023-11:46:17] [TRT] [V] Cast [Cast] inputs: [Max__9:0 -> (-1, 256)[FLOAT]], 
[02/28/2023-11:46:17] [TRT] [V] Casting to type: int32
[02/28/2023-11:46:17] [TRT] [V] Registering layer: Cast for ONNX node: Cast
[02/28/2023-11:46:17] [TRT] [V] Registering tensor: Cast:0 for ONNX tensor: Cast:0
[02/28/2023-11:46:17] [TRT] [V] Cast [Cast] outputs: [Cast:0 -> (-1, 256)[INT32]], 
[02/28/2023-11:46:17] [TRT] [V] Parsing node: test_onehot [tpat_test_onehot]
[02/28/2023-11:46:17] [TRT] [V] Searching for input: Cast:0
[02/28/2023-11:46:17] [TRT] [V] Searching for input: const_fold_opt__17
[02/28/2023-11:46:17] [TRT] [V] Searching for input: const_fold_opt__19
[02/28/2023-11:46:17] [TRT] [V] test_onehot [tpat_test_onehot] inputs: [Cast:0 -> (-1, 256)[INT32]], [const_fold_opt__17 -> (1)[INT32]], [const_fold_opt__19 -> (2)[FLOAT]], 
[02/28/2023-11:46:17] [TRT] [I] No importer registered for op: tpat_test_onehot. Attempting to import as plugin.
[02/28/2023-11:46:17] [TRT] [I] Searching for plugin: tpat_test_onehot, plugin_version: 1, plugin_namespace: 
[02/28/2023-11:46:17] [TRT] [V] Registering layer: const_fold_opt__17 for ONNX node: const_fold_opt__17
[02/28/2023-11:46:17] [TRT] [V] Registering layer: const_fold_opt__19 for ONNX node: const_fold_opt__19
[02/28/2023-11:46:17] [TRT] [I] Successfully created plugin: tpat_test_onehot
[02/28/2023-11:46:17] [TRT] [V] Registering layer: test_onehot for ONNX node: test_onehot
Segmentation fault

KeyError int8

Model class

import torch
from typing import Optional
from transformers import GPT2LMHeadModel
from transformers.modeling_outputs import CausalLMOutputWithCrossAttentions

class CustomModel(GPT2LMHeadModel):
    def __init__(self, config):
        super(CustomModel, self).__init__(config)
        self.loss = torch.nn.CrossEntropyLoss()

    def forward(
        self,
        input_ids: Optional[torch.IntTensor] = None
    ) -> torch.FloatTensor:
        
        transformer_outputs = self.transformer(
            input_ids,
            past_key_values=None,
            attention_mask=None,
            token_type_ids=None,
            position_ids=None,
            head_mask=None,
            inputs_embeds=None,
            encoder_hidden_states=None,
            encoder_attention_mask=None,
            use_cache=None,
            output_attentions=None,
            output_hidden_states=None,
            return_dict=None,
        )
        hidden_states = transformer_outputs[0]

        lm_logits = self.lm_head(hidden_states)
        labels = input_ids
        shift_logits = lm_logits[..., :-1, :].contiguous()
        shift_labels = labels[..., 1:].contiguous()

        loss = self.loss(shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1))        

        # return loss.reshape(-1,1)
        return CausalLMOutputWithCrossAttentions(
            loss=loss.reshape(-1,1),
            logits=None,
            past_key_values=None,
            hidden_states=None,
            attentions=None,
            cross_attentions=None,
        )

Command for conversion:

onnx2plugin(
	input_model_path ="./onnx_tpat/model.onnx", 
	output_model_path="./onnx_tpat/model.tpat.onnx", 
	# node_names="/loss/SoftmaxCrossEntropyLoss", 
	node_types = ["SoftmaxCrossEntropyLoss"], 
	plugin_name_dict={"SoftmaxCrossEntropyLoss": "tpat_softmax_cross_entropy"},
    dynamic_bs=False,
	# dynamic_bs=True, # if True, this operator support dynamic batchsize
	# min_bs=1,
	# opt_bs=64,
	# max_bs=100,
	)

I faced this error:

Couldn't find reusable plugin for node /loss/SoftmaxCrossEntropyLoss
Start auto-tuning!
Compile...
/tmp/tuning.log does not exist!


Running...
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[2], line 1
----> 1 onnx2plugin(
      2 	input_model_path ="./onnx_tpat/model.onnx", 
      3 	output_model_path="./onnx_tpat/model.tpat.onnx", 
      4 	# node_names="/loss/SoftmaxCrossEntropyLoss", 
      5 	node_types = ["SoftmaxCrossEntropyLoss"], 
      6 	plugin_name_dict={"SoftmaxCrossEntropyLoss": "tpat_softmax_cross_entropy"},
      7     dynamic_bs=False,
      8 	# dynamic_bs=True, # if True, this operator support dynamic batchsize
      9 	# min_bs=1,
     10 	# opt_bs=64,
     11 	# max_bs=100,
     12 	)

File /workspace/TPAT/python/onnx_to_plugin.py:196, in onnx2plugin(input_model_path, output_model_path, node_names, node_types, plugin_name_dict, dynamic_bs, min_bs, max_bs, opt_bs)
    194         os.remove(dy_input_model)
    195 else:
--> 196     onnx_name_mapping_trt_plugin = generate_plugin_library(
    197         input_model_path, nodes, plugin_name_dict 
    198     )
    199 print("Onnx_name_mapping_trt_plugin: {}".format(onnx_name_mapping_trt_plugin))
    200 OnnxModified(
    201     input_model_path, output_model_path, nodes, onnx_name_mapping_trt_plugin
...
    352             )
    353     input_slot_dict[idx] = self._input_dict[str(i)]
    354 if len(self._allocate_global_memory) != 0:

KeyError: 'int8'

RandomNormal not supported for frontend ONNX

I want to create a RandomNormal op, and I found this operator marked "Y" in the TPAT-1.0 Operator Schemas. I built BlazerML-TVM successfully and used the command line python onnx_to_plugin.py -i randn_test.onnx -o output.onnx -t RandomNormal, but there was an error: tvm.error.OpNotImplemented: The following operators are not supported for frontend ONNX: RandomNormal. In /mypath/TPAT/3rdparty/blazerml-tvm/python/tvm/relay/frontend/onnx.py, I found the RandomNormal operator commented out in the function _get_convert_map, which I think means RandomNormal is not supported.
So how does TPAT-1.0 create a .so file for RandomNormal?

Error when running one_hot example

2023-03-06 03:27:42,218 - INFO - tf2onnx: ONNX model is saved at model/test_op_plugin.onnx
const_input:  Constant (const_fold_opt__17): (shape=(1,), dtype=<class 'numpy.int32'>)
values:  [256]
const_input:  Constant (const_fold_opt__19): (shape=(2,), dtype=<class 'numpy.float32'>)
values:  [0. 1.]
/usr/local/lib/python3.8/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py:53: UserWarning: Specified provider 'CUDAExecutionProvider' is not in available provider names.Available providers: 'CPUExecutionProvider'
  warnings.warn("Specified provider '{}' is not in available provider names."
Compile...
/tmp/tuning.log does not exist!
Running...
/usr/local/lib/python3.8/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py:53: UserWarning: Specified provider 'CUDAExecutionProvider' is not in available provider names.Available providers: 'CPUExecutionProvider'
  warnings.warn("Specified provider '{}' is not in available provider names."
Traceback (most recent call last):
  File "test_onehot_dynamic_direct.py", line 335, in <module>
    main()
  File "test_onehot_dynamic_direct.py", line 229, in main
    trt_plugin_names = onnx2plugin(
  File "/root/examples/../python/onnx_to_plugin.py", line 190, in onnx2plugin
    onnx_name_mapping_trt_plugin = generate_plugin_library(
  File "/root/examples/../python/onnx_to_plugin.py", line 86, in generate_plugin_library
    template_params_list.append(PluginTemplateParams(
  File "/root/python/plugin_template_params.py", line 64, in __init__
    self.parse()
  File "/root/python/plugin_template_params.py", line 163, in parse
    constant_params = self._kernel_generate.constant_param
  File "/root/python/cuda_kernel.py", line 287, in constant_param
    return self._lib.get_constant_params()
AttributeError: 'GraphExecutorFactoryModule' object has no attribute 'get_constant_params'

test_tpat.py error

Traceback (most recent call last):
  File "test_tpat.py", line 3860, in <module>
    test_abs()
  File "test_tpat.py", line 360, in test_abs
    op_expect(node, inputs=[x], outputs=[y], op_type=op_type, op_name=op_name)
  File "test_tpat.py", line 346, in op_expect
    verify_with_ort_with_trt(model, inputs, op_name, np_result=np_result)
  File "test_tpat.py", line 251, in verify_with_ort_with_trt
    ort_result = get_onnxruntime_output(model, inputs)
  File "test_tpat.py", line 225, in get_onnxruntime_output
    rep = onnxruntime.backend.prepare(model, "CPU")
  File "/usr/local/lib/python3.6/dist-packages/onnxruntime/backend/backend.py", line 138, in prepare
    return cls.prepare(bin, device, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/onnxruntime/backend/backend.py", line 114, in prepare
    inf = InferenceSession(model, sess_options=options, providers=providers)
  File "/usr/local/lib/python3.6/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 335, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/usr/local/lib/python3.6/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 370, in _create_inference_session
    sess = C.InferenceSession(session_options, self._model_bytes, False, self._read_config_from_model)
onnxruntime.capi.onnxruntime_pybind11_state.InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Failed to load model with error: /onnxruntime_src/onnxruntime/core/graph/model_load_utils.h:47 void onnxruntime::model_load_utils::ValidateOpsetForDomain(const std::unordered_map<std::basic_string, int>&, const onnxruntime::logging::Logger&, bool, const string&, int) ONNX Runtime only guarantees support for models stamped with official released onnx opset versions. Opset 16 is under development and support for this is limited. The operator schemas and or other functionality may change before next ONNX release and in this case ONNX Runtime will not guarantee backward compatibility. Current official support for domain ai.onnx is till opset 15.

TPAT and TRT - no kernel image is available for execution on the device

Hi,

I've successfully converted a model to TensorRT using a TPAT-generated plugin with the following command:

/usr/src/tensorrt/bin/trtexec --onnx=model_batch1_tpat.onnx --saveEngine=model.plan --buildOnly --verbose --fp16 --workspace=6000 --explicitBatch --noTF32 --plugins="tpat_onehot.so"

but after running trtexec test using this command:

/usr/src/tensorrt/bin/trtexec  --loadEngine=model.plan --verbose --workspace=6000  --plugins="./tpat_onehot.so"

I'm getting the following errors:

[06/02/2022-02:55:34] [E] [TRT] ../rtExt/cuda/cudaPluginV2DynamicExtRunner.cpp (108) - Cuda Error in execute: 209 (no kernel image is available for execution on the device)
[06/02/2022-02:55:34] [E] [TRT] FAILED_EXECUTION: std::exception

I managed to get one TPAT plugin for tpat_onehot.so which doesn't throw this error, but I don't see any difference in the way I generated the plugins. Is there something about the non-deterministic process of generating a plugin using TVM that can cause this behavior?

Thank you!

Could you provide a simple tutorial on how to run onnx_to_plugin for a simple operator?

Hi, thank you for your great work.
I just wonder how to run onnx_to_plugin on the Tile operator. I know it is supported by TPAT 1.0.
I have tried
python3 onnx_to_plugin.py -i model/pfe_baseline32000.onnx -o model/pfe_baseline_tpat.onnx -t Tile
python3 onnx_to_plugin.py -i model/pfe_baseline32000.onnx -o model/pfe_baseline_tpat.onnx -n Tile_16 -dynamic=true -min=1 -max=256 -opt=128
But it returns

Couldn't find reusable plugin for node Tile_16

  7: tvm::relay::StorageAllocaBaseVisitor::DeviceAwareVisitExpr_(tvm::relay::FunctionNode const*)                                  [0/60]
  6: tvm::relay::StorageAllocaBaseVisitor::GetToken(tvm::RelayExpr const&)
  5: tvm::relay::ExprVisitor::VisitExpr(tvm::RelayExpr const&)
  4: tvm::relay::transform::DeviceAwareExprVisitor::VisitExpr_(tvm::relay::CallNode const*)
  3: tvm::relay::StorageAllocator::DeviceAwareVisitExpr_(tvm::relay::CallNode const*)
  2: tvm::relay::StorageAllocaBaseVisitor::CreateToken(tvm::RelayExprNode const*, bool)
  1: tvm::relay::StorageAllocator::CreateTokenOnDevice(tvm::RelayExprNode const*, DLDeviceType, bool)
  0: tvm::relay::StorageAllocator::GetMemorySize(tvm::relay::StorageToken*)
  File "/workspace/TPAT/3rdparty/blazerml-tvm/src/relay/backend/graph_plan_memory.cc", line 408
TVMError:
---------------------------------------------------------------
An error occurred during the execution of TVM.
For more information, please see: https://tvm.apache.org/docs/errors.html
---------------------------------------------------------------
  Check failed: (pval != nullptr) is false: Cannot allocate memory symbolic tensor shape [?, ?, ?]

Thank you

TensorFlow BERT model cannot be built successfully; when will this be solved?

A TensorFlow BERT model (ckpt, pb, or saved_model) can't be built successfully by TPAT. The problem is that the input & output nodes of the ONNX model converted by tf2onnx have no shape or dtype info (all are None). My current workaround is to use onnxruntime to generate another ONNX model that has shape and dtype info, and then use TPAT to generate the TensorRT model. So, when do you plan to solve this problem?

test_tpat error

I have built this project with the required gcc==7.3.0 and LLVM 9.0.1.
My onnxruntime==1.9.0, onnx==1.10.0.

When I run test_tpat.py, I get the following error:

Traceback (most recent call last):
  File "test_tpat.py", line 3908, in <module>
    test_abs()
  .............
  File "../python/onnx_to_plugin.py", line 98, in onnx2plugin
    input_model_path, nodes, plugin_name_dict
  File "../python/onnx_to_plugin.py", line 43, in generate_plugin_library
    cuda_kernel.run()
  File "../python/cuda_kernel.py", line 69, in run
    mod, params, self._target, include_simple_tasks=True, opt_level=op_level
TypeError: autoscheduler_get_tunning_tasks() got an unexpected keyword argument 'opt_level'

Can anyone help?
Thank you!

.so build succeeds, but TensorRT fails to run the one-hot example

Running in the Docker container created from the Dockerfile, I get:

[TensorRT] ERROR: INVALID_ARGUMENT: getPluginCreator could not find plugin tpat_test_onehot version 1
In node -1 (importFallbackPluginImporter): UNSUPPORTED_NODE: Assertion failed: creator && "Plugin not found, are the plugin name, version, and namespace correct?"
[TensorRT] ERROR: Network must have at least one output
[TensorRT] ERROR: Network validation failed.
[ERROR] engine is None

It seems the plugin is not loaded properly.

How do I fix it?

See the full log below:

root@0390133f0efa:~/examples# python test_onehot_dynamic_direct.py
2023-10-08 08:49:03.568139: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.11.0
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
2023-10-08 08:49:05.214999: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3000000000 Hz
2023-10-08 08:49:05.215383: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x678b5d0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2023-10-08 08:49:05.215399: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2023-10-08 08:49:05.216549: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2023-10-08 08:49:05.273763: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-08 08:49:05.273974: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x678d2f0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2023-10-08 08:49:05.273991: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): NVIDIA GeForce RTX 2070, Compute Capability 7.5
2023-10-08 08:49:05.274107: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-08 08:49:05.274219: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties:
name: NVIDIA GeForce RTX 2070 major: 7 minor: 5 memoryClockRate(GHz): 1.62
pciBusID: 0000:01:00.0
2023-10-08 08:49:05.274242: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.11.0
2023-10-08 08:49:05.274250: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.11
2023-10-08 08:49:05.274276: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2023-10-08 08:49:05.274284: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2023-10-08 08:49:05.276174: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2023-10-08 08:49:05.276622: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.11
2023-10-08 08:49:05.276637: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2023-10-08 08:49:05.276687: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-08 08:49:05.276832: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-08 08:49:05.276916: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 0
2023-10-08 08:49:05.276937: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.11.0
2023-10-08 08:49:05.522333: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1180] Device interconnect StreamExecutor with strength 1 edge matrix:
2023-10-08 08:49:05.522360: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1186]      0
2023-10-08 08:49:05.522366: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 0:   N
2023-10-08 08:49:05.522522: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-08 08:49:05.522698: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-08 08:49:05.522807: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7031 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
2023-10-08 08:49:05.554951: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.11
2023-10-08 08:49:06.211052: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.11.0
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
/usr/lib/python3.6/runpy.py:125: RuntimeWarning: 'tf2onnx.convert' found in sys.modules after import of package 'tf2onnx', but prior to execution of 'tf2onnx.convert'; this may result in unpredictable behaviour
  warn(RuntimeWarning(msg))
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tf2onnx/verbose_logging.py:76: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.

2023-10-08 08:49:07,330 - WARNING - tensorflow: From /usr/local/lib/python3.6/dist-packages/tf2onnx/verbose_logging.py:76: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.

2023-10-08 08:49:07.331626: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2023-10-08 08:49:07.357287: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-08 08:49:07.357450: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties:
name: NVIDIA GeForce RTX 2070 major: 7 minor: 5 memoryClockRate(GHz): 1.62
pciBusID: 0000:01:00.0
2023-10-08 08:49:07.357467: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.11.0
2023-10-08 08:49:07.359028: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.11
2023-10-08 08:49:07.359702: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2023-10-08 08:49:07.359943: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2023-10-08 08:49:07.361533: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2023-10-08 08:49:07.361962: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.11
2023-10-08 08:49:07.362148: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2023-10-08 08:49:07.362249: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-08 08:49:07.362408: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-08 08:49:07.362507: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 0
2023-10-08 08:49:07.391000: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3000000000 Hz
2023-10-08 08:49:07.391315: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x47e4fe0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2023-10-08 08:49:07.391330: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2023-10-08 08:49:07.439144: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-08 08:49:07.439349: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x48208d0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2023-10-08 08:49:07.439364: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): NVIDIA GeForce RTX 2070, Compute Capability 7.5
2023-10-08 08:49:07.439512: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-08 08:49:07.439621: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties:
name: NVIDIA GeForce RTX 2070 major: 7 minor: 5 memoryClockRate(GHz): 1.62
pciBusID: 0000:01:00.0
2023-10-08 08:49:07.439641: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.11.0
2023-10-08 08:49:07.439660: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.11
2023-10-08 08:49:07.439672: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2023-10-08 08:49:07.439683: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2023-10-08 08:49:07.439703: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2023-10-08 08:49:07.439715: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.11
2023-10-08 08:49:07.439726: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2023-10-08 08:49:07.439772: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-08 08:49:07.439892: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-08 08:49:07.439976: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 0
2023-10-08 08:49:07.440000: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.11.0
2023-10-08 08:49:07.684408: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1180] Device interconnect StreamExecutor with strength 1 edge matrix:
2023-10-08 08:49:07.684438: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1186]      0
2023-10-08 08:49:07.684444: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 0:   N
2023-10-08 08:49:07.684700: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-08 08:49:07.684903: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-08 08:49:07.685038: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6648 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tf2onnx/tf_loader.py:343: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.

2023-10-08 08:49:07,685 - WARNING - tensorflow: From /usr/local/lib/python3.6/dist-packages/tf2onnx/tf_loader.py:343: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.

INFO:tensorflow:Froze 0 variables.
2023-10-08 08:49:07,689 - INFO - tensorflow: Froze 0 variables.
INFO:tensorflow:Converted 0 variables to const ops.
2023-10-08 08:49:07,690 - INFO - tensorflow: Converted 0 variables to const ops.
2023-10-08 08:49:07.690924: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-08 08:49:07.691090: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties:
name: NVIDIA GeForce RTX 2070 major: 7 minor: 5 memoryClockRate(GHz): 1.62
pciBusID: 0000:01:00.0
2023-10-08 08:49:07.691111: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.11.0
2023-10-08 08:49:07.691127: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.11
2023-10-08 08:49:07.691137: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2023-10-08 08:49:07.691147: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2023-10-08 08:49:07.691169: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2023-10-08 08:49:07.691179: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.11
2023-10-08 08:49:07.691190: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2023-10-08 08:49:07.691236: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-08 08:49:07.691356: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-08 08:49:07.691453: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 0
2023-10-08 08:49:07.691471: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1180] Device interconnect StreamExecutor with strength 1 edge matrix:
2023-10-08 08:49:07.691477: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1186]      0
2023-10-08 08:49:07.691482: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 0:   N
2023-10-08 08:49:07.691540: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-08 08:49:07.691668: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-08 08:49:07.691763: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6648 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
2023-10-08 08:49:07.692623: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-08 08:49:07.692728: I tensorflow/core/grappler/devices.cc:55] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 1
2023-10-08 08:49:07.692805: I tensorflow/core/grappler/clusters/single_machine.cc:356] Starting new session
2023-10-08 08:49:07.693102: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-08 08:49:07.693195: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties:
name: NVIDIA GeForce RTX 2070 major: 7 minor: 5 memoryClockRate(GHz): 1.62
pciBusID: 0000:01:00.0
2023-10-08 08:49:07.693209: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.11.0
2023-10-08 08:49:07.693221: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.11
2023-10-08 08:49:07.693230: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2023-10-08 08:49:07.693240: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2023-10-08 08:49:07.693250: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2023-10-08 08:49:07.693260: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.11
2023-10-08 08:49:07.693276: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2023-10-08 08:49:07.693341: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-08 08:49:07.693477: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-08 08:49:07.693565: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 0
2023-10-08 08:49:07.693579: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1180] Device interconnect StreamExecutor with strength 1 edge matrix:
2023-10-08 08:49:07.693585: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1186]      0
2023-10-08 08:49:07.693590: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 0:   N
2023-10-08 08:49:07.693649: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-08 08:49:07.693774: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-08 08:49:07.693868: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6648 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
2023-10-08 08:49:07.696080: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:822] Optimization results for grappler item: graph_to_optimize
2023-10-08 08:49:07.696093: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:824]   constant_folding: Graph size after: 15 nodes (-2), 14 edges (-2), time = 0.933ms.
2023-10-08 08:49:07.696097: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:824]   function_optimizer: function_optimizer did nothing. time = 0.009ms.
2023-10-08 08:49:07.696101: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:824]   constant_folding: Graph size after: 15 nodes (0), 14 edges (0), time = 0.241ms.
2023-10-08 08:49:07.696104: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:824]   function_optimizer: function_optimizer did nothing. time = 0.007ms.
2023-10-08 08:49:07,696 - INFO - tf2onnx: inputs: ['input:0']
2023-10-08 08:49:07,696 - INFO - tf2onnx: outputs: ['output:0']
2023-10-08 08:49:07,698 - INFO - tf2onnx.tfonnx: Using tensorflow=1.15.2, onnx=1.10.0, tf2onnx=1.11.1/1915fb
2023-10-08 08:49:07,699 - INFO - tf2onnx.tfonnx: Using opset <onnx, 11>
2023-10-08 08:49:07,708 - INFO - tf2onnx.tf_utils: Computed 0 values for constant folding
2023-10-08 08:49:07,717 - VERBOSE - tf2onnx.tfonnx: Mapping TF node to ONNX node(s)
2023-10-08 08:49:07,719 - VERBOSE - tf2onnx.tfonnx: Summay Stats:
	tensorflow ops: Counter({'Const': 7, 'Identity': 3, 'Placeholder': 1, 'MatMul': 1, 'Minimum': 1, 'Maximum': 1, 'Cast': 1, 'OneHot': 1})
	tensorflow attr: Counter({'dtype': 8, 'value': 7, 'shape': 1, 'transpose_a': 1, 'transpose_b': 1, 'Truncate': 1, 'to': 1, 'axis': 1})
	onnx mapped: Counter({'Const': 6, 'Identity': 2, 'Placeholder': 1, 'MatMul': 1, 'Minimum': 1, 'Maximum': 1, 'Cast': 1, 'OneHot': 1})
	onnx unmapped: Counter()
2023-10-08 08:49:07,719 - INFO - tf2onnx.optimizer: Optimizing ONNX model
2023-10-08 08:49:07,719 - VERBOSE - tf2onnx.optimizer: Apply optimize_transpose
2023-10-08 08:49:07,722 - VERBOSE - tf2onnx.optimizer.TransposeOptimizer: no change
2023-10-08 08:49:07,722 - VERBOSE - tf2onnx.optimizer: Apply remove_redundant_upsample
2023-10-08 08:49:07,724 - VERBOSE - tf2onnx.optimizer.UpsampleOptimizer: no change
2023-10-08 08:49:07,724 - VERBOSE - tf2onnx.optimizer: Apply fold_constants
2023-10-08 08:49:07,726 - VERBOSE - tf2onnx.optimizer.ConstFoldOptimizer: Concat -1 (1->0), Const -1 (6->5), Unsqueeze -3 (3->0)
2023-10-08 08:49:07,726 - VERBOSE - tf2onnx.optimizer: Apply const_dequantize_optimizer
2023-10-08 08:49:07,727 - VERBOSE - tf2onnx.optimizer.ConstDequantizeOptimizer: no change
2023-10-08 08:49:07,727 - VERBOSE - tf2onnx.optimizer: Apply loop_optimizer
2023-10-08 08:49:07,729 - VERBOSE - tf2onnx.optimizer.LoopOptimizer: no change
2023-10-08 08:49:07,729 - VERBOSE - tf2onnx.optimizer: Apply merge_duplication
2023-10-08 08:49:07,730 - VERBOSE - tf2onnx.optimizer.MergeDuplicatedNodesOptimizer: no change
2023-10-08 08:49:07,730 - VERBOSE - tf2onnx.optimizer: Apply reshape_optimizer
2023-10-08 08:49:07,731 - VERBOSE - tf2onnx.optimizer.ReshapeOptimizer: no change
2023-10-08 08:49:07,731 - VERBOSE - tf2onnx.optimizer: Apply global_pool_optimizer
2023-10-08 08:49:07,733 - VERBOSE - tf2onnx.optimizer.GlobalPoolOptimizer: no change
2023-10-08 08:49:07,733 - VERBOSE - tf2onnx.optimizer: Apply q_dq_optimizer
2023-10-08 08:49:07,734 - VERBOSE - tf2onnx.optimizer.QDQOptimizer: no change
2023-10-08 08:49:07,734 - VERBOSE - tf2onnx.optimizer: Apply remove_identity
2023-10-08 08:49:07,736 - VERBOSE - tf2onnx.optimizer.IdentityOptimizer: Identity -5 (5->0)
2023-10-08 08:49:07,736 - VERBOSE - tf2onnx.optimizer: Apply remove_back_to_back
2023-10-08 08:49:07,737 - VERBOSE - tf2onnx.optimizer.BackToBackOptimizer: no change
2023-10-08 08:49:07,737 - VERBOSE - tf2onnx.optimizer: Apply einsum_optimizer
2023-10-08 08:49:07,738 - VERBOSE - tf2onnx.optimizer.EinsumOptimizer: no change
2023-10-08 08:49:07,738 - VERBOSE - tf2onnx.optimizer: Apply optimize_transpose
2023-10-08 08:49:07,739 - VERBOSE - tf2onnx.optimizer.TransposeOptimizer: no change
2023-10-08 08:49:07,739 - VERBOSE - tf2onnx.optimizer: Apply remove_redundant_upsample
2023-10-08 08:49:07,740 - VERBOSE - tf2onnx.optimizer.UpsampleOptimizer: no change
2023-10-08 08:49:07,740 - VERBOSE - tf2onnx.optimizer: Apply fold_constants
2023-10-08 08:49:07,741 - VERBOSE - tf2onnx.optimizer.ConstFoldOptimizer: no change
2023-10-08 08:49:07,741 - VERBOSE - tf2onnx.optimizer: Apply const_dequantize_optimizer
2023-10-08 08:49:07,742 - VERBOSE - tf2onnx.optimizer.ConstDequantizeOptimizer: no change
2023-10-08 08:49:07,742 - VERBOSE - tf2onnx.optimizer: Apply loop_optimizer
2023-10-08 08:49:07,743 - VERBOSE - tf2onnx.optimizer.LoopOptimizer: no change
2023-10-08 08:49:07,743 - VERBOSE - tf2onnx.optimizer: Apply merge_duplication
2023-10-08 08:49:07,744 - VERBOSE - tf2onnx.optimizer.MergeDuplicatedNodesOptimizer: no change
2023-10-08 08:49:07,744 - VERBOSE - tf2onnx.optimizer: Apply reshape_optimizer
2023-10-08 08:49:07,745 - VERBOSE - tf2onnx.optimizer.ReshapeOptimizer: no change
2023-10-08 08:49:07,745 - VERBOSE - tf2onnx.optimizer: Apply global_pool_optimizer
2023-10-08 08:49:07,746 - VERBOSE - tf2onnx.optimizer.GlobalPoolOptimizer: no change
2023-10-08 08:49:07,746 - VERBOSE - tf2onnx.optimizer: Apply q_dq_optimizer
2023-10-08 08:49:07,747 - VERBOSE - tf2onnx.optimizer.QDQOptimizer: no change
2023-10-08 08:49:07,747 - VERBOSE - tf2onnx.optimizer: Apply remove_identity
2023-10-08 08:49:07,748 - VERBOSE - tf2onnx.optimizer.IdentityOptimizer: no change
2023-10-08 08:49:07,748 - VERBOSE - tf2onnx.optimizer: Apply remove_back_to_back
2023-10-08 08:49:07,749 - VERBOSE - tf2onnx.optimizer.BackToBackOptimizer: no change
2023-10-08 08:49:07,749 - VERBOSE - tf2onnx.optimizer: Apply einsum_optimizer
2023-10-08 08:49:07,750 - VERBOSE - tf2onnx.optimizer.EinsumOptimizer: no change
2023-10-08 08:49:07,751 - INFO - tf2onnx.optimizer: After optimization: Concat -1 (1->0), Const -1 (6->5), Identity -5 (5->0), Unsqueeze -3 (3->0)
2023-10-08 08:49:07,752 - INFO - tf2onnx:
2023-10-08 08:49:07,752 - INFO - tf2onnx: Successfully converted TensorFlow model model/test_op_test_onehot.pb to ONNX
2023-10-08 08:49:07,752 - INFO - tf2onnx: Model inputs: ['input:0']
2023-10-08 08:49:07,752 - INFO - tf2onnx: Model outputs: ['output:0']
2023-10-08 08:49:07,752 - INFO - tf2onnx: ONNX model is saved at model/test_op_plugin.onnx
const_input:  Constant (const_fold_opt__18): (shape=(1,), dtype=int32)
values:  [256]
const_input:  Constant (const_fold_opt__19): (shape=(2,), dtype=float32)
values:  [0. 1.]
[08:49:08] /workspace/TPAT/3rdparty/blazerml-tvm/src/tir/transforms/loop_partition.cc:590: Warning: Cannot prove: ((((floordiv(((any_dim*256) + 511), 512) - 1) - floordiv(any_dim, 2)) + 1) >= 0), when generating the post doubt loop
Compile...
/tmp/tuning.log does not exist!




Running...
Compile...
/tmp/tuning.log does not exist!




Running...
Compile...
/tmp/tuning.log does not exist!




Running...
rm -rf ./lib/tpat_test_onehot.so ./obj/*
if [ ! -d ./obj ]; then mkdir -p ./obj; fi
/usr/local/cuda-11.0//bin/nvcc -w -std=c++11 -M -MT tpat_test_onehot.o -I. -I/usr/local/cuda-11.0//samples/common/inc -I/usr/local/cuda-11.0//include -I/usr/include/x86_64-linux-gnu -I/usr/include/x86_64-linux-gnu -I/usr/include -o tpat_test_onehot.d src/tpat_test_onehot.cu
/usr/local/cuda-11.0//bin/nvcc -w -std=c++11 -I. -I/usr/local/cuda-11.0//samples/common/inc -I/usr/local/cuda-11.0//include -I/usr/include/x86_64-linux-gnu -I/usr/include/x86_64-linux-gnu -I/usr/include -Xcompiler -fPIC -arch=sm_75 -o tpat_test_onehot.o -c src/tpat_test_onehot.cu
# /usr/local/cuda-11.0//bin/nvcc -w -std=c++11 -I. -I/usr/local/cuda-11.0//samples/common/inc -I/usr/local/cuda-11.0//include -I/usr/include/x86_64-linux-gnu -I/usr/include/x86_64-linux-gnu -I/usr/include -Xcompiler -fPIC -arch=sm_75 -G -lineinfo -o tpat_test_onehot.o -c src/tpat_test_onehot.cu
g++ -w -std=c++11 -shared -o tpat_test_onehot.so tpat_test_onehot.o -L/usr/local/cuda-11.0//lib64 -L/usr/local/cuda-11.0//lib64 -L/workspace/TensorRT-8.0.3.4/lib  -lnvinfer -lcudart -lcuda -Wl,-rpath=/usr/local/cuda-11.0//lib64 -Wl,-rpath=/usr/local/cuda-11.0//lib64 -Wl,-rpath=/workspace/TensorRT-8.0.3.4/lib
if [ ! -d  ./lib ]; then mkdir -p ./lib; fi
mv *.o   ./obj/
mv *.d   ./obj/
mv *.so ./lib/
Onnx_name_mapping_trt_plugin: {'test_onehot': 'tpat_test_onehot'}
load ./trt_plugin/lib/tpat_test_onehot
[TensorRT] VERBOSE: Registered plugin creator - ::GridAnchor_TRT version 1
[TensorRT] VERBOSE: Registered plugin creator - ::NMS_TRT version 1
[TensorRT] VERBOSE: Registered plugin creator - ::Reorg_TRT version 1
[TensorRT] VERBOSE: Registered plugin creator - ::Region_TRT version 1
[TensorRT] VERBOSE: Registered plugin creator - ::Clip_TRT version 1
[TensorRT] VERBOSE: Registered plugin creator - ::LReLU_TRT version 1
[TensorRT] VERBOSE: Registered plugin creator - ::PriorBox_TRT version 1
[TensorRT] VERBOSE: Registered plugin creator - ::Normalize_TRT version 1
[TensorRT] VERBOSE: Registered plugin creator - ::RPROI_TRT version 1
[TensorRT] VERBOSE: Registered plugin creator - ::BatchedNMS_TRT version 1
[TensorRT] VERBOSE: Registered plugin creator - ::FlattenConcat_TRT version 1
[TensorRT] VERBOSE: Registered plugin creator - ::CropAndResize version 1
[TensorRT] VERBOSE: Registered plugin creator - ::DetectionLayer_TRT version 1
[TensorRT] VERBOSE: Registered plugin creator - ::Proposal version 1
[TensorRT] VERBOSE: Registered plugin creator - ::ProposalLayer_TRT version 1
[TensorRT] VERBOSE: Registered plugin creator - ::PyramidROIAlign_TRT version 1
[TensorRT] VERBOSE: Registered plugin creator - ::ResizeNearest_TRT version 1
[TensorRT] VERBOSE: Registered plugin creator - ::Split version 1
[TensorRT] VERBOSE: Registered plugin creator - ::SpecialSlice_TRT version 1
[TensorRT] VERBOSE: Registered plugin creator - ::InstanceNormalization_TRT version 1
[TensorRT] VERBOSE: ModelImporter.cpp:202: Adding network input: input:0 with dtype: float32, dimensions: (-1, 64)
[TensorRT] VERBOSE: ImporterContext.hpp:116: Registering tensor: input:0 for ONNX tensor: input:0
[TensorRT] VERBOSE: ModelImporter.cpp:90: Importing initializer: dense/kernel/read:0
[TensorRT] VERBOSE: ModelImporter.cpp:90: Importing initializer: clip_by_value/Minimum/y:0
[TensorRT] VERBOSE: ModelImporter.cpp:90: Importing initializer: clip_by_value/y:0
[TensorRT] VERBOSE: ModelImporter.cpp:90: Importing initializer: const_fold_opt__18
[TensorRT] VERBOSE: ModelImporter.cpp:90: Importing initializer: const_fold_opt__19
[TensorRT] VERBOSE: ModelImporter.cpp:103: Parsing node: dense/MatMul [MatMul]
[TensorRT] VERBOSE: ModelImporter.cpp:119: Searching for input: input:0
[TensorRT] VERBOSE: ModelImporter.cpp:119: Searching for input: dense/kernel/read:0
[TensorRT] VERBOSE: ModelImporter.cpp:125: dense/MatMul [MatMul] inputs: [input:0 -> (-1, 64)], [dense/kernel/read:0 -> (64, 256)],
[TensorRT] VERBOSE: builtin_op_importers.cpp:2053: GEMM: using FC layer instead of MM because all criteria were met.
[TensorRT] WARNING: onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[TensorRT] VERBOSE: onnx2trt_utils.cpp:1793: Original shape: (_, 64), unsqueezing to: (_, _, _, _)
[TensorRT] VERBOSE: ImporterContext.hpp:141: Registering layer: dense/MatMul for ONNX node: dense/MatMul
[TensorRT] VERBOSE: onnx2trt_utils.cpp:1641: Original shape: (_, 256, 1, 1), squeezing to: (_, _)
[TensorRT] VERBOSE: ImporterContext.hpp:116: Registering tensor: dense/MatMul:0 for ONNX tensor: dense/MatMul:0
[TensorRT] VERBOSE: ModelImporter.cpp:179: dense/MatMul [MatMul] outputs: [dense/MatMul:0 -> (-1, -1)],
[TensorRT] VERBOSE: ModelImporter.cpp:103: Parsing node: Min__6 [Min]
[TensorRT] VERBOSE: ModelImporter.cpp:119: Searching for input: dense/MatMul:0
[TensorRT] VERBOSE: ModelImporter.cpp:119: Searching for input: clip_by_value/Minimum/y:0
[TensorRT] VERBOSE: ModelImporter.cpp:125: Min__6 [Min] inputs: [dense/MatMul:0 -> (-1, -1)], [clip_by_value/Minimum/y:0 -> ()],
[TensorRT] VERBOSE: ImporterContext.hpp:141: Registering layer: Min__6 for ONNX node: Min__6
[TensorRT] VERBOSE: ImporterContext.hpp:116: Registering tensor: Min__6:0 for ONNX tensor: Min__6:0
[TensorRT] VERBOSE: ModelImporter.cpp:179: Min__6 [Min] outputs: [Min__6:0 -> (-1, -1)],
[TensorRT] VERBOSE: ModelImporter.cpp:103: Parsing node: Max__9 [Max]
[TensorRT] VERBOSE: ModelImporter.cpp:119: Searching for input: Min__6:0
[TensorRT] VERBOSE: ModelImporter.cpp:119: Searching for input: clip_by_value/y:0
[TensorRT] VERBOSE: ModelImporter.cpp:125: Max__9 [Max] inputs: [Min__6:0 -> (-1, -1)], [clip_by_value/y:0 -> ()],
[TensorRT] VERBOSE: ImporterContext.hpp:141: Registering layer: Max__9 for ONNX node: Max__9
[TensorRT] VERBOSE: ImporterContext.hpp:116: Registering tensor: Max__9:0 for ONNX tensor: Max__9:0
[TensorRT] VERBOSE: ModelImporter.cpp:179: Max__9 [Max] outputs: [Max__9:0 -> (-1, -1)],
[TensorRT] VERBOSE: ModelImporter.cpp:103: Parsing node: Cast [Cast]
[TensorRT] VERBOSE: ModelImporter.cpp:119: Searching for input: Max__9:0
[TensorRT] VERBOSE: ModelImporter.cpp:125: Cast [Cast] inputs: [Max__9:0 -> (-1, -1)],
[TensorRT] VERBOSE: builtin_op_importers.cpp:320: Casting to type: int32
[TensorRT] VERBOSE: ImporterContext.hpp:141: Registering layer: Cast for ONNX node: Cast
[TensorRT] VERBOSE: ImporterContext.hpp:116: Registering tensor: Cast:0 for ONNX tensor: Cast:0
[TensorRT] VERBOSE: ModelImporter.cpp:179: Cast [Cast] outputs: [Cast:0 -> (-1, -1)],
[TensorRT] VERBOSE: ModelImporter.cpp:103: Parsing node: test_onehot [tpat_test_onehot]
[TensorRT] VERBOSE: ModelImporter.cpp:119: Searching for input: Cast:0
[TensorRT] VERBOSE: ModelImporter.cpp:119: Searching for input: const_fold_opt__18
[TensorRT] VERBOSE: ModelImporter.cpp:119: Searching for input: const_fold_opt__19
[TensorRT] VERBOSE: ModelImporter.cpp:125: test_onehot [tpat_test_onehot] inputs: [Cast:0 -> (-1, -1)], [const_fold_opt__18 -> (1)], [const_fold_opt__19 -> (2)],
[TensorRT] INFO: ModelImporter.cpp:135: No importer registered for op: tpat_test_onehot. Attempting to import as plugin.
[TensorRT] INFO: builtin_op_importers.cpp:3659: Searching for plugin: tpat_test_onehot, plugin_version: 1, plugin_namespace:
[TensorRT] ERROR: INVALID_ARGUMENT: getPluginCreator could not find plugin tpat_test_onehot version 1
In node -1 (importFallbackPluginImporter): UNSUPPORTED_NODE: Assertion failed: creator && "Plugin not found, are the plugin name, version, and namespace correct?"
[TensorRT] ERROR: Network must have at least one output
[TensorRT] ERROR: Network validation failed.
[ERROR] engine is None
-------------------------------------------------------------------
PyCUDA ERROR: The context stack was not empty upon module cleanup.
-------------------------------------------------------------------
A context was still active when the context stack was being
cleaned up. At this point in our execution, CUDA may already
have been deinitialized, so there is no way we can finish
cleanly. The program will be aborted now.
Use Context.pop() to avoid this problem.
-------------------------------------------------------------------
Aborted (core dumped)
root@0390133f0efa:~/examples#
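
The failure above is TensorRT's plugin registry not finding tpat_test_onehot, which typically happens when the generated .so has not been loaded into the process before the ONNX parser runs. A minimal sketch of preloading it, assuming the library path from the README (python/trt_plugin/lib/tpat_onehot.so) and the plugin-replaced model output.onnx:

import ctypes
import tensorrt as trt

# Preload the TPAT-generated library so its plugin creators register
# with TensorRT's plugin registry before the ONNX parser runs.
ctypes.CDLL("python/trt_plugin/lib/tpat_onehot.so")  # path per the README

TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE)
trt.init_libnvinfer_plugins(TRT_LOGGER, "")

builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, TRT_LOGGER)
with open("output.onnx", "rb") as f:  # model emitted by onnx2plugin
    if not parser.parse(f.read()):
        print(parser.get_error(0))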

Unsupported PTX version error

I ran the onehot plugin example and hit this error. The GPU info printed by the lspci | grep -i vga command is: VGA compatible controller: NVIDIA Corporation Device 2204 (rev a1). I think this is an RTX 3090. The NVIDIA driver version is 510.

However, on another machine where lspci | grep -i vga prints 37:00.0 VGA compatible controller: NVIDIA Corporation GA102 [GeForce RTX 3090] (rev a1), the onehot example runs successfully (the NVIDIA driver version there is 515). That machine also seems to have an RTX 3090, but the lspci | grep -i vga output differs from the first machine. So far I have not been able to find the cause.
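
An "unsupported PTX version" error usually means the driver's CUDA level is older than the toolkit that compiled the kernel: driver 510 corresponds to CUDA 11.6, while 515 corresponds to 11.7, which could explain why only the second machine works. A quick sketch to check both versions with pycuda:

import pycuda.autoinit  # noqa: F401 -- creates a CUDA context on device 0
import pycuda.driver as cuda

# CUDA level supported by the installed driver, e.g. 11060 for 11.6.
print("driver supports CUDA:", cuda.get_driver_version())
# CUDA toolkit version pycuda itself was built against.
print("pycuda built with CUDA:", cuda.get_version())

If the driver's CUDA level is lower than the toolkit used inside the container, the generated PTX will not load; upgrading the driver (or rebuilding against the older toolkit) should resolve it.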

Docker image

Hello, I am having some problems building the environment. Have you published a pre-built Docker image?

Cannot find project_libbacktrace; error while building TVM from source

I was trying to generate a plugin, but I cannot compile the 3rdparty TVM from source. It cannot download "project_libbacktrace" automatically. Where should I download it from?

CMake Error at /usr/local/share/cmake-3.21/Modules/ExternalProject.cmake:2866 (message):
  No download info given for 'project_libbacktrace' and its source directory:

Are custom operators supported?

Are operators that are not built into TVM supported? If so, which function does the work of generating the computes and schedules? What about a custom operator?

Half model error

Thanks for open-sourcing this work.
When using TPAT to generate a ScatterElements plugin, I got the following error:
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Node (Concat_52) Op (Concat) [ShapeInferenceError] Can't merge shape info. Both source and target dimension have values but they differ. Source=1 Target=3 Dimension=0
My command was:
python onnx_to_plugin.py -i CodeFormer.onnx -o plan.onnx -n ScatterElements_1022 -dynamic=true -min=1 -max=6 -opt=3

The error occurs in python/cuda_kernels.py compute_tensor(), when half_model.onnx is reloaded. The error log is below:

 File "/data//TPAT/python/onnx_to_plugin.py", line 287, in <module>
    onnx2plugin(
  File "/data//TPAT/python/onnx_to_plugin.py", line 190, in onnx2plugin
    onnx_name_mapping_trt_plugin = generate_plugin_library(
  File "/data//TPAT/python/onnx_to_plugin.py", line 85, in generate_plugin_library
    cuda_kernel.run()
  File "/data//TPAT/python/cuda_kernel.py", line 54, in run
    graph_def = self.extract_target_onnx_node(self._onnx_model)
  File "/data//TPAT/python/cuda_kernel.py", line 211, in extract_target_onnx_node
    computed_tensor_shapes = self.compute_tensor_shape(
  File "/data//TPAT/python/cuda_kernel.py", line 163, in compute_tensor_shape
    session = ort.InferenceSession(half_model_path, providers=EP_list)
  File "/home/ningnx/anaconda3/lib/python3.9/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 360, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/home/ningnx/anaconda3/lib/python3.9/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 408, in _create_inference_session
    sess.initialize_session(providers, provider_options, disabled_optimizers)
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Node (Concat_52) Op (Concat) [ShapeInferenceError] Can't merge shape info. Both source and target dimension have values but they differ. Source=1 Target=2 Dimension=0

Links to the ONNX files:
original onnx
half model

I hope someone can help answer this question. Thanks!
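
One thing worth trying, offered only as a guess: the "Can't merge shape info" failure can come from stale shape annotations stored in the half model that conflict with onnxruntime's own shape inference. A sketch that strips them before reloading (half_model.onnx is a hypothetical path; adjust to where TPAT writes it):

import onnx

model = onnx.load("half_model.onnx")  # hypothetical path
# Drop stored intermediate shape annotations so onnxruntime re-infers
# them instead of merging conflicting dimensions.
del model.graph.value_info[:]
onnx.save(model, "half_model_clean.onnx")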

When will the Scan operator be supported?

When will the Scan operator be supported? I need to use this operator! Do you have a plan to support it? What can I do in the meantime?

Conversion Error for IsInf OP

We converted an ONNX model with IsInf ops and it succeeded. We noticed that the IsInf op is implemented as a tpat_isinf plugin plus a Cast op. When we convert the ONNX model to a TensorRT engine, the following error happens:

onnx2trt.py:29: DeprecationWarning: Use set_memory_pool_limit instead.
config.max_workspace_size =( 1 << 20 ) * 3 * 1024
Loading ONNX file from path /home/tensorrt/model_testing-sim.onnx...
Beginning ONNX file parsing
[08/16/2022-10:10:18] [TRT] [W] onnx2trt_utils.cpp:363: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
raw shape of 0 is: (6, 3, 928, 1600)
Completed parsing of ONNX file
Building an engine from file /home/tensorrt/model_testing-sim.onnx; this may take a while...
onnx2trt.py:54: DeprecationWarning: Use build_serialized_network instead.
engine = builder.build_engine(network,config)
[08/16/2022-10:11:00] [TRT] [E] 1: [castBuilder.cpp::addSupportedFormats::117] Error Code 1: Internal Error (Cast output type does not support bool.)
Completed creating Engine
Traceback (most recent call last):
File "onnx2trt.py", line 57, in
f.write(engine.serialize())
AttributeError: 'NoneType' object has no attribute 'serialize'

Do you still see this issue with the IsInf op? How can I solve it?
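
The internal error points at a Cast layer whose output type is BOOL, which TensorRT's Cast does not support. A small diagnostic sketch to locate the offending nodes in the ONNX model (the file name is taken from the log above):

import onnx
from onnx import TensorProto

model = onnx.load("model_testing-sim.onnx")  # file from the log above
for node in model.graph.node:
    # "to" holds the Cast target type; TensorProto.BOOL is what TRT rejects.
    to = next((a.i for a in node.attribute if a.name == "to"), None)
    if node.op_type == "Cast" and to == TensorProto.BOOL:
        print("Cast to BOOL:", node.name, "inputs:", list(node.input))

Rewriting those casts to int32 (for example with onnx-graphsurgeon) is one possible workaround, though whether downstream consumers accept int32 depends on the model.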

No subgraph-level optimization for TensorRT

TPAT in fact optimizes one node at a time, i.e. it uses TVM Ansor to auto-tune a single node each time, whereas TVM's optimization is based on subgraphs, as shown below:
[screenshot of TVM subgraph optimization]
I want to know: would aggressive subgraph-level optimization work for TPAT when using TVM subgraphs? Could this function be realized by modifying the TVM code?

Out of memory

Traceback (most recent call last):
  File "test_onehot_dynamic_direct.py", line 344, in <module>
    main()
  File "test_onehot_dynamic_direct.py", line 236, in main
    trt_plugin_names = onnx2plugin(
  File "/root/tpat/examples/../python/onnx_to_plugin.py", line 190, in onnx2plugin
    onnx_name_mapping_trt_plugin = generate_plugin_library(
  File "/root/tpat/examples/../python/onnx_to_plugin.py", line 85, in generate_plugin_library
    cuda_kernel.run()
  File "/root/tpat/python/cuda_kernel.py", line 83, in run
    self._module = graph_executor.create(
  File "/workspace/TPAT/3rdparty/blazerml-tvm/python/tvm/contrib/graph_executor.py", line 66, in create
    return GraphModule(fcreate(graph_json_str, libmod, *device_type_id))
  File "/workspace/TPAT/3rdparty/blazerml-tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 237, in __call__
    raise get_last_ffi_error()
tvm._ffi.base.TVMError: Traceback (most recent call last):
  8: TVMFuncCall
  7: _ZNSt17_Function_handlerIFvN3
  6: tvm::runtime::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#1}::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const [clone .isra.0]
  5: tvm::runtime::GraphExecutorCreate(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, tvm::runtime::Module const&, std::vector<DLDevice, std::allocator<DLDevice> > const&, tvm::runtime::PackedFunc)
  4: tvm::runtime::GraphExecutor::Init(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, tvm::runtime::Module, std::vector<DLDevice, std::allocator<DLDevice> > const&, tvm::runtime::PackedFunc)
  3: tvm::runtime::GraphExecutor::SetupStorage()
  2: tvm::runtime::NDArray::Empty(tvm::runtime::ShapeTuple, DLDataType, DLDevice, tvm::runtime::Optional<tvm::runtime::String>)
  1: tvm::runtime::DeviceAPI::AllocDataSpace(DLDevice, int, long const*, DLDataType, tvm::runtime::Optional<tvm::runtime::String>)
  0: tvm::runtime::CUDADeviceAPI::AllocDataSpace(DLDevice, unsigned long, unsigned long, DLDataType)
  File "/workspace/TPAT/3rdparty/blazerml-tvm/src/runtime/cuda/cuda_device_api.cc", line 123
TVMError:
---------------------------------------------------------------
An error occurred during the execution of TVM.
For more information, please see: https://tvm.apache.org/docs/errors.html
---------------------------------------------------------------
  Check failed: (e == cudaSuccess || e == cudaErrorCudartUnloading) is false: CUDA: out of memory

I ran this for a onehot plugin with node input [xxx, 561, 561] and depth 64, and the above error occurred. The node input shape does not seem large enough to use that much memory.
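
A rough estimate suggests the OneHot output, not the input, is what blows up: with dynamic batch enabled TPAT tunes at max_bs (default 256 per the README), and the output is depth (64) times larger than the input. Assuming a float32 output and the unknown batch dim tuned at 256:

# Back-of-the-envelope OneHot output size at max_bs=256 (float32 assumed).
elems_per_sample = 561 * 561 * 64          # ~20.1M elements per sample
bytes_per_sample = elems_per_sample * 4    # ~80.6 MB per sample
print(bytes_per_sample * 256 / 2**30)      # ~19.2 GiB at batch 256

Lowering max_bs (or setting dynamic_bs=False) may keep the tuning run inside GPU memory.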

What is this blazerml-tvm build error?

The build log is below:
[ 89%] Building CXX object CMakeFiles/tvm_objs.dir/src/relay/backend/contrib/example_target_hooks/relay_to_tir.cc.o
[ 89%] Building CXX object CMakeFiles/tvm_objs.dir/src/relay/backend/contrib/example_target_hooks/target.cc.o
[ 89%] Building CXX object CMakeFiles/tvm_objs.dir/src/relay/backend/contrib/example_target_hooks/tir_to_runtime.cc.o
[ 90%] Building CXX object CMakeFiles/tvm_objs.dir/src/contrib/hybrid/codegen_hybrid.cc.o
[ 90%] Built target tvm_objs
Makefile:129: recipe for target 'all' failed
make: *** [all] Error 2
The command '/bin/sh -c cd /workspace/TPAT/3rdparty/blazerml-tvm/build/ && cmake .. && make -j8' returned a non-zero code: 2

Can anyone tell me what is wrong?

Cannot run the example

Following the instructions in the README, I did the following:

  1. Built the image with the dockerfile
  2. Created a container with one V100-32G card
  3. cd into /workspace/TPAT/examples and ran python test_onehot_dynamic_direct.py; a segfault occurred. Preliminary debugging places the exception at cuda_kernel.run() inside onnx2plugin.

Since I built directly from the dockerfile, I did not modify the value of TRT_LIB_PATH. But I see its default value is /root/workspace/download/ft_local/TensorRT-8.0.0.3/lib, and this directory does not exist in the image. Do I still need to set this value, and if so, how should I set it?

Can't build TPAT

Is there something wrong with the build document? I followed the document to clone the TPAT repository and tried to build with the command:

mkdir build && cp cmake/config.cmake build

the error occurred:

cp: cannot stat 'cmake/config.cmake': No such file or directory

Cuda Error in execute: 209 (no kernel image is available for execution on the device)

Hi,

I'm trying to run TPAT on Jetson AGX with Jetpack 4.4.1

I managed to install everything using the docker image with small modifications to the Dockerfile which now looks like this:

FROM nvcr.io/nvidia/l4t-tensorflow:r32.4.4-tf1.15-py3
RUN apt-get update && apt-get install build-essential cmake -y
RUN wget -O "clang+llvm-9.0.1-aarch64-linux-gnu.tar.xz" https://github.com/llvm/llvm-project/releases/download/llvmorg-9.0.1/clang+llvm-9.0.1-aarch64-linux-gnu.tar.xz \
    && tar -xvf clang+llvm-9.0.1-aarch64-linux-gnu.tar.xz && mkdir -p /usr/local/llvm/ \
    && mv clang+llvm-9.0.1-aarch64-linux-gnu/* /usr/local/llvm/
RUN python3 -m pip install --upgrade pip
RUN pip3 install buildtools onnx==1.10.0 
RUN pip3 install pycuda nvidia-pyindex
RUN apt-get install git
RUN pip install onnx-graphsurgeon onnxruntime==1.9.0 tf2onnx xgboost==1.5.2
RUN git clone --recursive https://github.com/Tencent/TPAT.git /workspace/TPAT && cd /workspace/TPAT/3rdparty/blazerml-tvm && mkdir build && cp cmake/config.cmake build && cd build 
RUN sed -i 's/set(USE_LLVM OFF)/set(USE_LLVM \/usr\/local\/llvm\/bin\/llvm-config)/g' /workspace/TPAT/3rdparty/blazerml-tvm/build/config.cmake 
RUN sed -i 's/set(USE_CUDA OFF)/set(USE_CUDA ON)/g' /workspace/TPAT/3rdparty/blazerml-tvm/build/config.cmake
RUN cd /workspace/TPAT/3rdparty/blazerml-tvm/build/ && cmake .. && make -j8 
ENV TVM_HOME="/workspace/TPAT/3rdparty/blazerml-tvm/"
ENV PYTHONPATH="$TVM_HOME/python:${PYTHONPATH}" 

After running OPENBLAS_CORETYPE=ARMV8 python3 test_tpat.py I get this error:

Onnx_name_mapping_trt_plugin: {'abs_0': 'tpat_abs_0'}
[TensorRT] ERROR: ../rtExt/cuda/cudaPluginV2DynamicExtRunner.cpp (108) - 
Cuda Error in execute: 209 (no kernel image is available for execution on the device)

And it triggers an error on the assert:

[TensorRT] ERROR: FAILED_EXECUTION: std::exception
[[[1.7640524  0.4001572  0.978738   2.2408931  1.867558  ]
  [0.9772779  0.95008844 0.1513572  0.10321885 0.41059852]
  [0.14404356 1.4542735  0.7610377  0.12167501 0.44386324]
  [0.33367434 1.4940791  0.20515826 0.3130677  0.85409576]]

 [[2.5529897  0.6536186  0.8644362  0.742165   2.2697546 ]
  [1.4543657  0.04575852 0.18718386 1.5327792  1.4693588 ]
  [0.15494743 0.37816253 0.88778573 1.9807965  0.34791216]
  [0.15634897 1.2302907  1.2023798  0.3873268  0.30230275]]

 [[1.048553   1.420018   1.7062702  1.9507754  0.5096522 ]
  [0.4380743  1.2527953  0.7774904  1.6138978  0.21274029]
  [0.89546657 0.3869025  0.51080513 1.1806322  0.02818223]
  [0.42833188 0.06651722 0.3024719  0.6343221  0.36274117]]]
================
[array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32)]
trt cross_check output  False
Traceback (most recent call last):
  File "test_tpat.py", line 3860, in <module>
    test_abs()
  File "test_tpat.py", line 360, in test_abs
    op_expect(node, inputs=[x], outputs=[y], op_type=op_type, op_name=op_name)
  File "test_tpat.py", line 346, in op_expect
    verify_with_ort_with_trt(model, inputs, op_name, np_result=np_result)
  File "test_tpat.py", line 300, in verify_with_ort_with_trt
    assert ret, "result check False"
AssertionError: result check False

Can you please provide some guidance on what might be the problem?

Thank you!
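
Error 209 ("no kernel image is available") usually means the compiled kernel does not match the GPU's SM architecture; Jetson AGX Xavier reports compute capability 7.2, which desktop-oriented builds typically do not target. A quick check, as a sketch with pycuda:

import pycuda.autoinit  # noqa: F401 -- initializes a CUDA context
import pycuda.driver as cuda

dev = cuda.Device(0)
# AGX Xavier reports (7, 2); the TVM target used when generating the
# plugin must emit code for this SM, or it fails with error 209 at runtime.
print(dev.name(), "compute capability:", dev.compute_capability())

If it prints (7, 2) while the kernel was built for a desktop architecture, regenerating the plugin with TVM pointed at the matching target is the usual remedy.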
