
triton-inference-server / model_navigator


Triton Model Navigator is an inference toolkit designed for optimizing and deploying Deep Learning models with a focus on NVIDIA GPUs.

Home Page: https://triton-inference-server.github.io/model_navigator/

License: Apache License 2.0

Languages: Makefile 0.16%, Python 97.28%, Shell 2.56%
Topics: deep-learning, gpu, inference

model_navigator's Introduction

Triton Model Navigator

Welcome to Triton Model Navigator, an inference toolkit designed for optimizing and deploying Deep Learning models with a focus on NVIDIA GPUs. The Triton Model Navigator streamlines the process of moving models and pipelines implemented in PyTorch, TensorFlow, and/or ONNX to TensorRT.

The Triton Model Navigator automates several critical steps, including model export, conversion, correctness testing, and profiling. By providing a single entry point for various supported frameworks, users can efficiently search for the best deployment option using the per-framework optimize function. The resulting optimized models are ready for deployment on either PyTriton or Triton Inference Server.

Features at a Glance

The distinct capabilities of Triton Model Navigator are summarized in the feature matrix:

Ease-of-use: Single line of code to run all possible optimization paths directly from your source code
Wide Framework Support: Compatible with various machine learning frameworks, including PyTorch, TensorFlow, and ONNX
Models Optimization: Enhance the performance of models such as ResNet and BERT for efficient inference deployment
Pipelines Optimization: Streamline Python code pipelines for models such as Stable Diffusion and Whisper using Inplace Optimization, exclusive to PyTorch
Model Export and Conversion: Automate the process of exporting and converting models between various formats, with a focus on TensorRT and Torch-TensorRT
Correctness Testing: Ensure the converted models produce correct outputs, validated against the original model
Performance Profiling: Profile models to select the optimal format based on performance metrics such as latency and throughput, maximizing target hardware utilization
Models Deployment: Automate deployment of models and pipelines on PyTriton and Triton Inference Server through a dedicated API

Documentation

Learn more about Triton Model Navigator features in the documentation.

Prerequisites

Before proceeding with the installation of Triton Model Navigator, ensure your system meets the following criteria:

  • Operating System: Linux (Ubuntu 20.04+ recommended)
  • Python: Version 3.8 or newer
  • NVIDIA GPU

You can use NGC Containers for PyTorch and TensorFlow, which contain all the necessary dependencies.

Install

The Triton Model Navigator can be installed from pypi.org.

Installing with PyTorch extras

To install with PyTorch dependencies, use:

pip install -U --extra-index-url https://pypi.ngc.nvidia.com triton-model-navigator[torch]

Installing with TensorFlow extras

To install with TensorFlow dependencies, use:

pip install -U --extra-index-url https://pypi.ngc.nvidia.com triton-model-navigator[tensorflow]

Installing with onnxruntime-gpu for CUDA 12

The default CUDA version for ONNX Runtime is CUDA 11.8. To install with CUDA 12 support, use the following extra index URL:

.. --extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/ ..
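
For example, a complete installation command combining the NGC index and the CUDA 12 ONNX Runtime index (shown here with the PyTorch extras; this exact combination is an illustration, adjust it to your setup) might look like:

pip install -U --extra-index-url https://pypi.ngc.nvidia.com --extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/ triton-model-navigator[torch]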

Quick Start

This quick start section provides examples of the optimization and deployment paths available in Triton Model Navigator.

Optimize Stable Diffusion with Inplace

The Inplace Optimize allows seamless optimization of models for deployment, such as converting them to TensorRT, without requiring any changes to the original Python pipelines.

The code below presents Stable Diffusion pipeline optimization. Before running the example, install the required packages:

pip install transformers diffusers torch

Then, initialize the pipeline and wrap the model components with nav.Module:

import model_navigator as nav
from transformers.modeling_outputs import BaseModelOutputWithPooling
from diffusers import DPMSolverMultistepScheduler, StableDiffusionPipeline


def get_pipeline():
    # Initialize Stable Diffusion pipeline and wrap modules for optimization
    pipe = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1")
    pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
    pipe = pipe.to("cuda")

    pipe.text_encoder = nav.Module(
        pipe.text_encoder,
        name="clip",
        output_mapping=lambda output: BaseModelOutputWithPooling(**output),
    )
    pipe.unet = nav.Module(
        pipe.unet,
        name="unet",
    )
    pipe.vae.decoder = nav.Module(
        pipe.vae.decoder,
        name="vae",
    )

    return pipe

Prepare a simple dataloader:

def get_dataloader():
    # Note that the first element in the tuple needs to be the batch size
    return [(1, "a photo of an astronaut riding a horse on mars")]

Execute model optimization:

pipe = get_pipeline()
dataloader = get_dataloader()

nav.optimize(pipe, dataloader)

Once the pipeline has been optimized, you can explicitly load the most performant version of the modules by executing:

nav.load_optimized()

At this point, you can simply use the original pipeline to generate predictions with the optimized models directly in Python:

pipe.to("cuda")

images = pipe(["a photo of an astronaut riding a horse on mars"])
image = images[0][0]

image.save("an_astronaut_riding_a_horse.png")

An example of how to serve a Stable Diffusion pipeline through PyTriton can be found here.

Optimize ResNet and deploy on Triton

Triton Model Navigator also supports an optimization path for deployment on Triton. This path is available for nn.Module, keras.Model, or ONNX files whose inputs are tensors.

To optimize the ResNet50 model from TorchHub, run the following code:

import torch
import model_navigator as nav

# Optimize Torch model loaded from TorchHub
package = nav.torch.optimize(
    model=torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_resnet50', pretrained=True).eval(),
    dataloader=[torch.randn(1, 3, 256, 256) for _ in range(10)],
)

Once optimization is done, creating a model store for deployment on Triton is as simple as the following code:

import pathlib

# Generate the model store from optimized model
nav.triton.model_repository.add_model_from_package(
    model_repository_path=pathlib.Path("model_repository"),
    model_name="resnet50",
    package=package,
    strategy=nav.MaxThroughputStrategy(),
)
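
Having generated the model store, you can point a Triton Inference Server instance at it, for example (assuming tritonserver is available, e.g. inside the NGC Triton Inference Server container):

tritonserver --model-repository=model_repository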

Profile any model or callable in Python

Triton Model Navigator enhances models and pipelines and provides a uniform method for profiling any Python function, callable, or model. At present, our support is limited strictly to static batch profiling scenarios.

As an example, we will use a simple function that sleeps for 50 ms:

import time

import model_navigator as nav

def custom_fn(input_):
    # wait 50ms
    time.sleep(0.05)
    return input_

Let's provide a dataloader we will use for profiling:

# Tuple of batch size and data sample
dataloader = [(1, ["This is example input"])]

Finally, run the profiling of the function with the prepared dataloader:

nav.profile(custom_fn, dataloader)

Examples

We offer comprehensive, step-by-step guides that showcase the utilization of the Triton Model Navigator’s diverse features. These guides are designed to elucidate the processes of optimization, profiling, testing, and deployment of models using PyTriton and Triton Inference Server.

Useful Links

model_navigator's People

Contributors

adamrajfer, glos-nv, jkosek, jzakrzew, kacper-kleczewski, knowicki-nvidia, piotr-bazan-nv, piotrm-nvidia, ptarasiewicznv, pziecina-nv


model_navigator's Issues

Convert onnx model with plugin layer

I have an object detection model that was exported to ONNX and then had the EfficientNMS_TRT plugin (http://www.xavierdupre.fr/app/onnxcustom/helpsphinx/api/onnxops/onnx__EfficientNMS_TRT.html) attached using onnx_graphsurgeon.

In the past I've used trtexec to convert the model to TensorRT. This succeeds without any issues, but I'm trying to use model navigator to convert the same model and get the following error. Is there a way to tell it how to load this plugin?

2023-06-16 19:45:40 ERROR    Navigator: Command finished with unexpected error: Traceback (most recent call last):
  File "/workspace/virtualenvs/model-navigator/lib/python3.10/site-packages/model_navigator/pipelines/pipeline.py", line 111, in _execute_unit
    command_output = execution_unit.command(status).run(
  File "/workspace/virtualenvs/model-navigator/lib/python3.10/site-packages/model_navigator/commands/base.py", line 78, in run
    output = self._run(*args, **_filter_dict_for_func(kwargs, self._run))
  File "/workspace/virtualenvs/model-navigator/lib/python3.10/site-packages/model_navigator/commands/infer_metadata.py", line 128, in _run
    input_names = self._get_default_input_names(model, sample, framework)
  File "/workspace/virtualenvs/model-navigator/lib/python3.10/site-packages/model_navigator/commands/infer_metadata.py", line 167, in _get_default_input_names
    with onnx_runner:
  File "/workspace/virtualenvs/model-navigator/lib/python3.10/site-packages/model_navigator/runners/base.py", line 146, in __enter__
    self.activate()
  File "/workspace/virtualenvs/model-navigator/lib/python3.10/site-packages/model_navigator/runners/base.py", line 182, in activate
    self.activate_impl()
  File "/workspace/virtualenvs/model-navigator/lib/python3.10/site-packages/model_navigator/runners/onnx.py", line 122, in activate_impl
    self.sess, _ = utils.invoke_if_callable(self._sess)
  File "/workspace/virtualenvs/model-navigator/lib/python3.10/site-packages/model_navigator/utils/common.py", line 312, in invoke_if_callable
    ret = func(*args, **kwargs)
  File "/workspace/virtualenvs/model-navigator/lib/python3.10/site-packages/model_navigator/runners/onnx.py", line 59, in __call__
    return self.call_impl(*args, **kwargs)
  File "/workspace/virtualenvs/model-navigator/lib/python3.10/site-packages/model_navigator/runners/onnx.py", line 82, in call_impl
    return onnxrt.InferenceSession(model_bytes, providers=providers)
  File "/workspace/virtualenvs/model-navigator/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 347, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/workspace/virtualenvs/model-navigator/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 395, in _create_inference_session
    sess.initialize_session(providers, provider_options, disabled_optimizers)
onnxruntime.capi.onnxruntime_pybind11_state.NotImplemented: [ONNXRuntimeError] : 9 : NOT_IMPLEMENTED : Could not find an implementation for EfficientNMS_TRT(1) node with name 'batchedNMS'

There is an error using `nav.triton.model_repository.add_model_from_package`

Thanks for your hard work on this remarkable tool. I encountered an error when I execute the command ./optimize.py --model-name classification --max-sequence-length=10.

#!/usr/bin/env python3

import math
import os

# https://stackoverflow.com/questions/62691279/how-to-disable-tokenizers-parallelism-true-false-warning
os.environ["TOKENIZERS_PARALLELISM"] = "false"

import argparse
import itertools
import pathlib

import numpy as np
from datasets import load_dataset
from torch.utils.data import DataLoader
from transformers import AutoTokenizer, DataCollatorWithPadding, TensorType
from transformers import AutoModelForSequenceClassification
from transformers.onnx.features import FeaturesManager
import model_navigator as nav


def get_model(model_name: str):
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    model.config.return_dict = True
    return model


def get_dataloader(
    model_name: str,
    dataset_name: str,
    max_batch_size: int,
    num_samples: int,
    max_sequence_length: int,
):
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    if max_sequence_length == -1:
        max_sequence_length = getattr(tokenizer, "model_max_length", 512)

        if max_sequence_length > 512:
            max_sequence_length = 512

    model = FeaturesManager.get_model_from_feature(
        feature="sequence-classification", model=model_name
    )
    _, model_onnx_config = FeaturesManager.check_supported_model_or_raise(
        model=model, feature="sequence-classification"
    )
    onnx_config = model_onnx_config(model.config)
    input_names = tuple(onnx_config.inputs.keys())
    dataset = load_dataset(dataset_name)["train"]

    def preprocess_function(examples):
        return tokenizer(
            examples["content"], truncation=True, max_length=max_sequence_length
        )

    tokenized_dataset = dataset.map(preprocess_function, batched=True)
    tokenized_dataset = tokenized_dataset.remove_columns(
        [c for c in tokenized_dataset.column_names if c not in input_names]
    )
    dataloader = DataLoader(
        tokenized_dataset,
        batch_size=max_batch_size,
        collate_fn=DataCollatorWithPadding(
            tokenizer=tokenizer,
            padding=True,
            max_length=max_sequence_length,
            return_tensors=TensorType.PYTORCH,
        ),
    )

    return [sample for sample, _ in zip(dataloader, range(num_samples))]


def get_verify_function():
    def verify_func(ys_runner, ys_expected):
        """Verify that at least 99% max probability tokens match on any given batch."""
        for y_runner, y_expected in zip(ys_runner, ys_expected):
            if not all(
                np.mean(a.argmax(axis=2) == b.argmax(axis=2)) > 0.99
                for a, b in zip(y_runner.values(), y_expected.values())
            ):
                return False
        return True

    return verify_func


def get_configuration(
    model_name: str,
    batch_size: int,
    max_sequence_length: int,
):
    model = FeaturesManager.get_model_from_feature(
        model=model_name,
        feature="sequence-classification",
    )
    _, model_onnx_config = FeaturesManager.check_supported_model_or_raise(
        model=model,
        feature="sequence-classification",
    )
    onnx_config = model_onnx_config(model.config)
    input_names = tuple(onnx_config.inputs.keys())
    output_names = tuple(onnx_config.outputs.keys())
    dynamic_axes = {
        name: axes
        for name, axes in itertools.chain(
            onnx_config.inputs.items(),
            onnx_config.outputs.items(),
        )
    }
    opset = onnx_config.default_onnx_opset

    tensorrt_profile = nav.TensorRTProfile()
    for k in input_names:
        tensorrt_profile.add(
            k,
            (1, max_sequence_length),
            (math.ceil(batch_size / 2), max_sequence_length),
            (batch_size, max_sequence_length),
        )

    optimization_profile = nav.OptimizationProfile(
        max_batch_size=batch_size,
        batch_sizes=[
            1,
            math.ceil(batch_size / 2),
            batch_size,
        ],
        stability_percentage=15,
        max_trials=5,
        throughput_cutoff_threshold=0.1,
    )

    configuration = {
        "input_names": input_names,
        "output_names": output_names,
        "sample_count": 10,
        "optimization_profile": optimization_profile,
        "custom_configs": [
            nav.TorchConfig(
                jit_type=nav.JitType.TRACE,
                strict=False,
            ),
            nav.OnnxConfig(
                opset=opset,
                dynamic_axes=dynamic_axes,
            ),
            nav.TensorRTConfig(
                precision=(nav.TensorRTPrecision.FP32),
                max_workspace_size=2 * 1024 * 1024 * 1024,
                trt_profile=tensorrt_profile,
            ),
        ],
    }
    return configuration


def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--workspace",
        type=str,
        default=".navigator_workspace",
        help="navigator cache workspace",
    )
    parser.add_argument(
        "--input-model",
        type=str,
        default="uer/albert-base-chinese-cluecorpussmall",
        help="input model",
    )
    parser.add_argument(
        "--model-name",
        type=str,
        required=True,
        help="sub dir model name in model store model_repository folder",
    )
    parser.add_argument(
        "--batch-size",
        type=int,
        default=4,
        help="batch size on model",
    )
    parser.add_argument(
        "--max-sequence-length",
        type=int,
        default=-1,
        help="max input text sequence length on model",
    )
    parser.add_argument(
        "--device",
        type=str,
        default="cpu",
        help="device = None or 'cpu' or 0 or '0' or '0,1,2,3'",
    )
    parser.add_argument(
        "--min-top1-accuracy",
        type=float,
        default=0.9,
    )
    parser.add_argument(
        "--model-repository",
        type=str,
        default=f".model_repository",
        help="model repository folder served on Triton",
    )
    return parser.parse_args()


def main(FLAGS):
    dataset_name = "madao33/new-title-chinese"
    num_samples = 10

    model = get_model(FLAGS.input_model)
    dataloader = get_dataloader(
        model_name=FLAGS.input_model,
        dataset_name=dataset_name,
        max_batch_size=FLAGS.batch_size,
        num_samples=num_samples,
        max_sequence_length=FLAGS.max_sequence_length,
    )
    verify_func = get_verify_function()
    configuration = get_configuration(
        model_name=FLAGS.input_model,
        batch_size=FLAGS.batch_size,
        max_sequence_length=FLAGS.max_sequence_length,
    )

    package = nav.torch.optimize(
        model=model,
        dataloader=dataloader,
        # verify_func=verify_func,
        target_device=nav.DeviceKind.CPU
        if str(FLAGS.device) == "cpu"
        else nav.DeviceKind.CUDA,
        debug=True,
        verbose=True,
        workspace=pathlib.Path(FLAGS.workspace) / FLAGS.model_name,
        **configuration,
    )

    import shutil

    shutil.rmtree(
        pathlib.Path(FLAGS.model_repository) / FLAGS.model_name,
        ignore_errors=True,
    )

    nav.triton.model_repository.add_model_from_package(
        model_repository_path=pathlib.Path(FLAGS.model_repository),
        model_name=FLAGS.model_name,
        package=package,
    )


if __name__ == "__main__":
    main(parse_args())

Here is my error stack; it reports that unpacking the TorchConfig into a dict with the ** syntax fails.

Traceback (most recent call last):
  File "./optimize.py", line 258, in <module>
  File "./optimize.py", line 250, in main
    model_repository_path=pathlib.Path(FLAGS.model_repository),
  File "/home/vscode/.local/lib/python3.8/site-packages/model_navigator/triton/model_repository.py", line 202, in add_model_from_package
    if package.config.batch_dim not in [0, None]:
  File "/home/vscode/.local/lib/python3.8/site-packages/model_navigator/package/package.py", line 103, in config
    config_dict["custom_configs"] = self._get_custom_configs(self.status.config["custom_configs"])
  File "/home/vscode/.local/lib/python3.8/site-packages/model_navigator/package/package.py", line 315, in _get_custom_configs
    obj = custom_config_class.from_dict(fields)  # pytype: disable=not-instantiable
  File "/home/vscode/.local/lib/python3.8/site-packages/model_navigator/api/config.py", line 452, in from_dict
    return cls(**config_dict)
TypeError: ABCMeta object argument after ** must be a mapping, not TorchConfig

So I had to patch the logic with the following code:

#..... Same as the above

    import shutil

    shutil.rmtree(
        pathlib.Path(FLAGS.model_repository) / FLAGS.model_name,
        ignore_errors=True,
    )
    # Here is the patching code
    package.status.config["custom_configs"] = {
        k: conf.to_dict() for k, conf in package.status.config["custom_configs"].items()
    }
    nav.triton.model_repository.add_model_from_package(
        model_repository_path=pathlib.Path(FLAGS.model_repository),
        model_name=FLAGS.model_name,
        package=package,
    )


if __name__ == "__main__":
    main(parse_args())

Does conversion require CUDA capabilities?

I have the configuration:

#!/usr/bin/env python3

import logging
import pathlib
from typing import Iterable
import math

import numpy as np
import model_navigator as nav
from model_navigator.api.config import Sample

LOGGER = logging.getLogger(__name__)


def get_model(FLAGS):
    model = pathlib.Path(FLAGS.input_model)
    if model.suffix == ".pt":
        from yolov5_utils.yolov5 import export

        export.run(
            weights=FLAGS.input_model,
            imgsz=(FLAGS.img, FLAGS.img),
            dynamic=True,
            device=FLAGS.device,
        )
    return model.with_suffix(".onnx")


def get_dataloader(FLAGS):
    return [
        np.random.randn(1, 3, FLAGS.img, FLAGS.img).astype(np.float32)
        for _ in range(FLAGS.batch_size)
    ]


def get_verify_function():
    def verify_func(ys_runner: Iterable[Sample], ys_expected: Iterable[Sample]) -> bool:
        for y_runner, y_expected in zip(ys_runner, ys_expected):
            if not all(
                np.allclose(a, b, rtol=1.0e-3, atol=1.0e-3)
                for a, b in zip(y_runner.values(), y_expected.values())
            ):
                return False
        return True

    return verify_func


def get_profiler_config(FLAGS):
    return nav.onnx.ProfilerConfig(
        run_profiling=True,
        batch_sizes=[
            1,
            math.ceil(FLAGS.batch_size / 2),
            FLAGS.batch_size,
        ],
        measurement_mode=nav.MeasurementMode.TIME_WINDOWS,
        measurement_interval=2500,  # ms
        measurement_request_count=10,
        stability_percentage=15,
        max_trials=5,
        throughput_cutoff_threshold=0.1,
    )


def get_configuration(FLAGS):
    return {
        "custom_configs": [
            nav.TensorRTConfig(
                precision=(nav.TensorRTPrecision.FP32),
                max_workspace_size=4 * 1024 * 1024 * 1024,  # 4GB
                trt_profile=nav.TensorRTProfile().add(
                    "images",
                    (1, 3, FLAGS.img, FLAGS.img),
                    (
                        math.ceil(FLAGS.batch_size / 2),
                        3,
                        FLAGS.img,
                        FLAGS.img,
                    ),
                    (FLAGS.batch_size, 3, FLAGS.img, FLAGS.img),
                ),
            ),
            nav.OnnxConfig(opset=17),
        ]
    }


def parse_args():
    import argparse

    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--workspace",
        type=str,
        default=".navigator_workspace",
        help="navigator cache workspace",
    )
    parser.add_argument(
        "--input-model",
        type=str,
        default="yolov5m.pt",
        help="input model",
    )
    parser.add_argument(
        "--model-name",
        type=str,
        default="yolov5",
        help="sub dir model name in model store model_repository folder",
    )
    parser.add_argument(
        "--batch-size",
        type=int,
        default=4,
        help="batch size on model",
    )
    parser.add_argument(
        "--img",
        type=int,
        default=1280,
        help="image size",
    )
    parser.add_argument(
        "--device",
        type=str,
        default="cpu",
        help="device = None or 'cpu' or 0 or '0' or '0,1,2,3'",
    )
    parser.add_argument(
        "--min-top1-accuracy",
        type=float,
        default=0.9,
    )
    parser.add_argument(
        "--model-repository",
        type=str,
        default=f".model_repository",
        help="model repository folder served on Triton",
    )
    return parser.parse_args()


def main(FLAGS):
    model = get_model(FLAGS)
    dataloader = get_dataloader(FLAGS)
    verify_func = get_verify_function()
    configuration = get_configuration(FLAGS)
    profiler_config = get_profiler_config(FLAGS)

    package = nav.onnx.optimize(
        model=model,
        dataloader=dataloader,
        # target_formats=(nav.Format.TENSORRT),
        verify_func=verify_func,
        target_device=nav.DeviceKind.CPU,
        profiler_config=profiler_config,
        debug=True,
        verbose=True,
        workspace=pathlib.Path(FLAGS.workspace) / FLAGS.model_name,
        # **configuration,
    )

    import shutil

    shutil.rmtree(
        pathlib.Path(FLAGS.model_repository) / FLAGS.model_name,
        ignore_errors=True,
    )
    nav.triton.model_repository.add_model_from_package(
        model_repository_path=pathlib.Path(FLAGS.model_repository),
        model_name=FLAGS.model_name,
        package=package,
        strategy=nav.MaxThroughputStrategy(),
    )


if __name__ == "__main__":
    main(parse_args())

When I do:

pip install -U --extra-index-url https://pypi.ngc.nvidia.com triton-model-navigator

./optimize_onnx.py --input-model=yolov5m.pt \
    --model-name=yolov5 \
    --batch-size=4 \
    --device=0 \
    --model-repository=./model_repository

I got an error saying that no CUDA-capable device was detected:

YOLOv5 🚀 2023-6-19 Python-3.8.10 torch-1.14.0a0+410ce96 CPU

Fusing layers... 
YOLOv5m summary: 290 layers, 21172173 parameters, 0 gradients

PyTorch: starting from yolov5m.pt with output shape (1, 100800, 85) (40.8 MB)

TorchScript: starting export with torch 1.14.0a0+410ce96...
TorchScript: export success ✅ 8.1s, saved as yolov5m.torchscript (81.2 MB)

ONNX: starting export with onnx 1.12.0...
ONNX: export success ✅ 2.6s, saved as yolov5m.onnx (80.8 MB)

Export complete (14.1s)
Results saved to /workspaces/ai-serving-solution/deploy/triton_model_utils/example/yolov5
Detect:          python detect.py --weights yolov5m.onnx 
Validate:        python val.py --weights yolov5m.onnx 
PyTorch Hub:     model = torch.hub.load('ultralytics/yolov5', 'custom', 'yolov5m.onnx')  
Visualize:       https://netron.app
2023-06-21 10:58:09 INFO     Navigator: ============================== Common config parameters ============================================
2023-06-21 10:58:09 INFO     Navigator: {
    "framework": "onnx",
    "workspace": ".navigator_workspace/yolov5",
    "target_formats": [
        "onnx",
        "trt"
    ],
    "target_device": "cpu",
    "sample_count": 100,
    "profiler_config": {
        "run_profiling": true,
        "batch_sizes": [
            1,
            2,
            4
        ],
        "measurement_mode": "time_windows",
        "measurement_interval": 2500,
        "measurement_request_count": 10,
        "stability_percentage": 15,
        "max_trials": 5,
        "throughput_cutoff_threshold": 0.1
    },
    "runner_names": [
        "TensorFlowSavedModelCPU",
        "TensorFlowCPU",
        "OnnxCPU",
        "TorchScriptCPU",
        "TorchCPU",
        "PythonRunner"
    ],
    "batch_dim": 0,
    "seed": 0,
    "_input_names": null,
    "_output_names": null,
    "from_source": true,
    "forward_kw_names": null,
    "custom_configs": {},
    "verbose": true,
    "debug": true
}
2023-06-21 10:58:11 INFO     Navigator: Removing exiting workspace at .navigator_workspace/yolov5
2023-06-21 10:58:11 INFO     Navigator: ============================== Pipeline 'Preprocessing' started ====================================
2023-06-21 10:58:11 INFO     Navigator: ============================== Command 'InferInputMetadata' started ================================
2023-06-21 10:58:11 ERROR    Navigator: Command finished with unexpected error: Traceback (most recent call last):
  File "/home/vscode/.local/lib/python3.8/site-packages/model_navigator/pipelines/pipeline.py", line 111, in _execute_unit
    command_output = execution_unit.command(status).run(
  File "/home/vscode/.local/lib/python3.8/site-packages/model_navigator/commands/base.py", line 78, in run
    output = self._run(*args, **_filter_dict_for_func(kwargs, self._run))
  File "/home/vscode/.local/lib/python3.8/site-packages/model_navigator/commands/infer_metadata.py", line 128, in _run
    input_names = self._get_default_input_names(model, sample, framework)
  File "/home/vscode/.local/lib/python3.8/site-packages/model_navigator/commands/infer_metadata.py", line 160, in _get_default_input_names
    onnxrt_runner_cls = OnnxrtCUDARunner if is_cuda_available() else OnnxrtCPURunner
  File "/home/vscode/.local/lib/python3.8/site-packages/model_navigator/utils/devices.py", line 127, in is_cuda_available
    return bool(get_gpus(["all"]))
  File "/home/vscode/.local/lib/python3.8/site-packages/model_navigator/utils/devices.py", line 79, in get_gpus
    devices = [dev["uuid"] for dev in get_available_gpus()]
  File "/home/vscode/.local/lib/python3.8/site-packages/model_navigator/utils/devices.py", line 54, in get_available_gpus
    _check_ret(cuda.cuInit(0))
  File "/home/vscode/.local/lib/python3.8/site-packages/model_navigator/utils/devices.py", line 40, in _check_ret
    raise ModelNavigatorError(f"CUDA error: {err_str.value}.")
model_navigator.exceptions.ModelNavigatorError: CUDA error: b'no CUDA-capable device is detected'.

The required command has failed. Please, review the log and verify the reported problems.

On the other hand, when I execute the command with my CUDA device 0, everything is OK:

./optimize_onnx.py --device=0

Looks like a bug in utils.dataloader.load_samples

Version: 0.7.4
Detailed steps to reproduce the bug: after nav.optimize, check the consistency of samples (*.npz) in model_input and model_output with the same index.

problems:
https://github.com/triton-inference-server/model_navigator/blob/8baf51016810cada8a750887758eabd5d1e6910d/model_navigator/utils/dataloader.py#L194C1-L194C5
sorted(samples_dirpath.iterdir())
While loading samples (*.npz) from files, this line results in an unexpected loading order:
0.npz, 1.npz, 10.npz, 100.npz, 2.npz ...
rather than the expected ascending index order.

With this unexpected loading order, the .npz files in the model_output directory will not match those in the model_input directory, causing failures in correctness/verify model.

Maybe change this line to something like:
sorted(samples_dirpath.iterdir(), key=lambda f: int(''.join(filter(str.isdigit, str(f)))))
or use a regex to extract the index.
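
To make the suggested behavior concrete, here is a minimal, self-contained sketch (file names are illustrative) showing how sorting by the integer stem restores the expected order:

import pathlib

# Illustrative file names matching the pattern described above
filenames = ["0.npz", "1.npz", "2.npz", "10.npz", "100.npz"]

# Plain lexicographic sort misorders numeric names:
# ['0.npz', '1.npz', '10.npz', '100.npz', '2.npz']
print(sorted(filenames))

# Sorting by the integer stem yields the expected ascending index order:
# ['0.npz', '1.npz', '2.npz', '10.npz', '100.npz']
print(sorted(filenames, key=lambda name: int(pathlib.Path(name).stem)))

# For real Path objects the same idea applies, e.g.:
# sorted(samples_dirpath.iterdir(), key=lambda f: int(f.stem))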

TypeError: pybind11::init(): factory function returned nullptr

I want to use this repo in Docker (nvcr.io/nvidia/tritonserver:22.12-py3).

steps

python3 -m pip install triton-model-navigator

Successfully installed aiohttp-3.9.0 aiosignal-1.3.1 async-timeout-4.0.3 attrs-23.1.0 brotli-1.1.0 coloredlogs-15.0.1 cuda-python-12.3.0 dacite-1.8.1 fire-0.5.0 flatbuffers-23.5.26 frozenlist-1.4.0 gevent-23.9.1 geventhttpclient-2.0.2 greenlet-3.0.1 grpcio-1.59.3 humanfriendly-10.0 jsonlines-4.0.0 mpmath-0.19 multidict-6.0.4 onnx-1.14.1 onnx-graphsurgeon-0.3.27 onnxruntime-gpu-1.16.3 onnxscript-0.1.0.dev20231121 packaging-23.2 polygraphy-0.49.0 protobuf-3.20.3 psutil-5.9.6 py-cpuinfo-9.0.0 pynvml-11.5.0 python-rapidjson-1.13 python-slugify-8.0.1 pyyaml-6.0.1 sympy-1.12 tabulate-0.9.0 tensorrt-8.6.1.post1 termcolor-2.3.0 text-unidecode-1.3 triton-model-navigator-0.7.4 tritonclient-2.39.0 typing-extensions-4.8.0 wrapt-1.14.1 yarl-1.9.3 zope.event-5.0 zope.interface-6.1

python3 test.py

test.py

import pathlib

import model_navigator as nav
import numpy as np


def dataloader():
    return [np.random.rand(1, 3, 224, 224).astype("float32")]


onnx_model = r"model.onnx"
onnx_package = nav.onnx.optimize(
    model=onnx_model,
    dataloader=dataloader(),
    target_formats=(nav.Format.TENSORRT,),
    workspace=pathlib.Path("onnx_workspace"),
    custom_configs=[nav.TensorRTConfig(precision=(nav.TensorRTPrecision.FP32,))],
)
nav.package.save(onnx_package, "onnx_linear.plan", override=True)

2023-11-21 09:18:02 WARNING Navigator: Command finished with ModelNavigatorUserInputError. The error is considered as external error. Usually caused by incompatibilities between the model and the target formats and/or runtimes. Please review the command output.
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/model_navigator/pipelines/pipeline.py", line 108, in _execute_unit
    command_output = execution_unit.command().run(**input_parameters)  # pytype: disable=not-instantiable
  File "/usr/local/lib/python3.8/dist-packages/model_navigator/commands/base.py", line 116, in run
    output = self._run(*args, **_filter_dict_for_func(kwargs, self._run))
  File "/usr/local/lib/python3.8/dist-packages/model_navigator/commands/convert/onnx/onnx2trt.py", line 149, in _run
    conversion_max_batch_size = self._execute_conversion(
  File "/usr/local/lib/python3.8/dist-packages/model_navigator/commands/convert/base.py", line 75, in _execute_conversion
    conversion_max_batch_size = cls._execute_single_conversion(
  File "/usr/local/lib/python3.8/dist-packages/model_navigator/commands/convert/base.py", line 90, in _execute_single_conversion
    convert_func(get_args())
  File "/usr/local/lib/python3.8/dist-packages/model_navigator/commands/convert/onnx/onnx2trt.py", line 150, in <lambda>
    convert_func=lambda args: context.execute_external_runtime_script(onnx2trt.__file__, args),
  File "/usr/local/lib/python3.8/dist-packages/model_navigator/commands/execution_context.py", line 188, in execute_external_runtime_script
    self.execute_cmd(cmd, allow_failure=allow_failure)
  File "/usr/local/lib/python3.8/dist-packages/model_navigator/commands/execution_context.py", line 231, in execute_cmd
    raise ModelNavigatorUserInputError(
model_navigator.exceptions.ModelNavigatorUserInputError: Processes exited with error code: 1. Command to reproduce error: /bin/bash trt-fp32/reproduce_conversion.sh

/bin/bash trt-fp32/reproduce_conversion.sh
[W] 'colored' module is not installed, will not use colors when logging. To enable colors, please install the 'colored' module: python3 -m pip install colored
[W] Unable to determine GPU memory usage
[W] CUDA initialization failure with error: 35. Please check your CUDA installation:  http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
Traceback (most recent call last):
  File "trt-fp32/reproduce_conversion.py", line 127, in <module>
    fire.Fire(convert)
  File "/usr/local/lib/python3.8/dist-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/usr/local/lib/python3.8/dist-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/usr/local/lib/python3.8/dist-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "trt-fp32/reproduce_conversion.py", line 105, in convert
    network = network_from_onnx_path(exported_model_path.as_posix(), flags=onnx_parser_flags)
  File "<string>", line 3, in network_from_onnx_path
  File "/usr/local/lib/python3.8/dist-packages/polygraphy/backend/base/loader.py", line 40, in __call__
    return self.call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/polygraphy/util/util.py", line 710, in wrapped
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/polygraphy/backend/trt/loader.py", line 223, in call_impl
    builder, network, parser = super().call_impl()
  File "/usr/local/lib/python3.8/dist-packages/polygraphy/util/util.py", line 710, in wrapped
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/polygraphy/backend/trt/loader.py", line 138, in call_impl
    builder, network = create_network(strongly_typed=self.strongly_typed)
  File "<string>", line 3, in create_network
  File "/usr/local/lib/python3.8/dist-packages/polygraphy/backend/base/loader.py", line 40, in __call__
    return self.call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/polygraphy/util/util.py", line 710, in wrapped
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/polygraphy/backend/trt/loader.py", line 103, in call_impl
    builder = trt.Builder(trt_util.get_trt_logger())
TypeError: pybind11::init(): factory function returned nullptr

Add support to override default model.py file name

Triton Server allows the user to override the default model.py file name for Python backends (this request can also apply to other backends). https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/user_guide/model_repository.html#python-models

However, nav does not support this field in ModelConfig when calling add_model(). Also, when we pass a file called "xxx.py", it still overwrites the file name with the default model.py.

Feature Request: Support this new field on ModelConfig, and use the user-provided filename directly (xxx.py) when deploying the model, if the new field is provided.

convert_results.yaml instead of convert.yaml?

Following the quick start guide on a fresh install, I receive the following error:

model-navigator run --model-name add_sub \
    --model-path examples/quick-start/model.pt \
    --inputs INPUT__0:-1,16:float32 INPUT__1:-1,16:float32 \
    --outputs OUTPUT__0:-1,16:float32 OUTPUT__1:-1,16:float32 \
    --override-workspace
.
.
.
FileNotFoundError: [Errno 2] No such file or directory: '/workspace/home/navigator_workspace/convert_results.yaml'

Investigating the workspace that Model Navigator creates, I found a "convert.yaml" instead of "convert_results.yaml". I'm unable to just rename the file to check whether this is the file Navigator is looking for and it has simply been misnamed. Can I get some feedback on whether these are the same file, or whether I have some other error and it just never gets to the creation of convert_results.yaml?

`model-navigator optimize` stops with an error

Running `model-navigator optimize my_model.nav` following the instructions in quick_start.md stops with an error.

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/model_navigator/cli/analyze.py", line 90, in analyze_cmd
    analyze_results = analyzer.run()
  File "/usr/local/lib/python3.8/dist-packages/model_navigator/model_analyzer/analyzer.py", line 72, in run
    analyzer.run(mode=ModelAnalyzerMode.ANALYZE, verbose=self._verbose, quiet=quiet)
  File "/usr/local/lib/python3.8/dist-packages/model_navigator/model_analyzer/model_analyzer.py", line 73, in run
    raise ModelNavigatorException(
model_navigator.exceptions.ModelNavigatorException: Running model-analyzer with ['model-analyzer', '--quiet', 'analyze', '-f', '/scratch_space/navigator_workspace/analyzer/config-analyze.yaml'] failed with exit status 1 : None output : None

model-analyzer analyze is an alias for model-analyzer profile, so I think the analysis_models key in the generated navigator_workspace/analyzer/config-analyze.yaml is wrong. I think changing analysis_models to profile_models should work.

environment

  • base image: nvcr.io/nvidia/tritonserver:23.01-py3
  • triton-model-analyzer 1.24.0
  • model-navigator 0.3.7

Model conversion for multiple models

Hi Triton team,

Thanks for the Model Navigator tool and the auto-conversion and auto-validation features that come with it. I wanted to know if there is any way to run it for multiple models. Currently, the commands in the documentation only have examples for a single model. I want to run it for multiple models that are kept in the same folder. I know it is possible by writing a bash script, but I was curious whether such a feature already exists. Such a feature would completely automate the deployment routine in our case.

Thanks!
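
As a minimal sketch of the scripting approach mentioned above, using the Python API shown elsewhere in this document (the folder path and dataloader are illustrative, and the per-model workspace argument mirrors the other examples here), one could loop over model files and call optimize for each:

import pathlib

import numpy as np
import model_navigator as nav

# Illustrative: optimize every ONNX model found in one folder,
# using a separate workspace per model.
models_dir = pathlib.Path("models")  # hypothetical folder containing *.onnx files
dataloader = [np.random.rand(1, 3, 224, 224).astype("float32")]  # example input shared by all models

for model_path in sorted(models_dir.glob("*.onnx")):
    nav.onnx.optimize(
        model=model_path,
        dataloader=dataloader,
        workspace=pathlib.Path(".navigator_workspace") / model_path.stem,
    )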

Optimize API throwing "sh.CommandNotFound: tritonserver"

Environment

  • Ubuntu 18.04.6 LTS
  • Docker 20.10.13
  • CUDA Version: 11.7
  • NVIDIA-SMI 515.65.01
  • Driver Version: 515.65.01
  • GPU: Tesla T4

Description

Hi, I am trying to use the "optimize" API but I am getting the following error.

root@:/home/ubuntu/model_navigator# model-navigator optimize bert.nav
2023-01-27 07:05:03 - INFO - model_navigator.log: optimize args:
2023-01-27 07:05:03 - INFO - model_navigator.log:       model_name = my_model
2023-01-27 07:05:03 - INFO - model_navigator.log:       model_path = /home/ubuntu/model_navigator/navigator_workspace/.input_data/input_model/torchscript-trace/model.pt
2023-01-27 07:05:03 - INFO - model_navigator.log:       model_format = torchscript
2023-01-27 07:05:03 - INFO - model_navigator.log:       model_version = 1
2023-01-27 07:05:03 - INFO - model_navigator.log:       target_formats = ['tf-trt', 'tf-savedmodel', 'onnx', 'trt', 'torchscript', 'torch-trt']
2023-01-27 07:05:03 - INFO - model_navigator.log:       onnx_opsets = [14]
2023-01-27 07:05:03 - INFO - model_navigator.log:       tensorrt_precisions = ['fp32', 'fp16']
2023-01-27 07:05:03 - INFO - model_navigator.log:       tensorrt_precisions_mode = hierarchy
2023-01-27 07:05:03 - INFO - model_navigator.log:       tensorrt_explicit_precision = False
2023-01-27 07:05:03 - INFO - model_navigator.log:       tensorrt_sparse_weights = False
2023-01-27 07:05:03 - INFO - model_navigator.log:       tensorrt_max_workspace_size = 4294967296
2023-01-27 07:05:03 - INFO - model_navigator.log:       atol = {'output__0': 0.23096442222595215}
2023-01-27 07:05:03 - INFO - model_navigator.log:       rtol = {'output__0': 0.09238576889038086}
2023-01-27 07:05:03 - INFO - model_navigator.log:       inputs = {'input__0': {'name': 'input__0', 'shape': [-1, 8], 'dtype': 'int64', 'optional': False}, 'input__1': {'name': 'input__1', 'shape': [-1, 8], 'dtype': 'int64', 'optional': False}}
2023-01-27 07:05:03 - INFO - model_navigator.log:       outputs = {'output__0': {'name': 'output__0', 'shape': [-1, 2], 'dtype': 'float32', 'optional': False}}
2023-01-27 07:05:03 - INFO - model_navigator.log:       min_shapes = None
2023-01-27 07:05:03 - INFO - model_navigator.log:       opt_shapes = None
2023-01-27 07:05:03 - INFO - model_navigator.log:       max_shapes = None
2023-01-27 07:05:03 - INFO - model_navigator.log:       value_ranges = None
2023-01-27 07:05:03 - INFO - model_navigator.log:       dtypes = None
2023-01-27 07:05:03 - INFO - model_navigator.log:       engine_count_per_device = {}
2023-01-27 07:05:03 - INFO - model_navigator.log:       triton_backend_parameters = {}
2023-01-27 07:05:03 - INFO - model_navigator.log:       triton_launch_mode = local
2023-01-27 07:05:03 - INFO - model_navigator.log:       triton_server_path = tritonserver
2023-01-27 07:05:03 - INFO - model_navigator.log:       config_search_max_batch_size = 128
2023-01-27 07:05:03 - INFO - model_navigator.log:       config_search_max_concurrency = 1024
2023-01-27 07:05:03 - INFO - model_navigator.log:       config_search_max_instance_count = 5
2023-01-27 07:05:03 - INFO - model_navigator.log:       config_search_concurrency = []
2023-01-27 07:05:03 - INFO - model_navigator.log:       config_search_batch_sizes = []
2023-01-27 07:05:03 - INFO - model_navigator.log:       config_search_instance_counts = {}
2023-01-27 07:05:03 - INFO - model_navigator.log:       config_search_max_batch_sizes = []
2023-01-27 07:05:03 - INFO - model_navigator.log:       config_search_preferred_batch_sizes = []
2023-01-27 07:05:03 - INFO - model_navigator.log:       config_search_backend_parameters = {}
2023-01-27 07:05:03 - INFO - model_navigator.log:       config_search_early_exit_enable = False
2023-01-27 07:05:03 - INFO - model_navigator.log:       top_n_configs = 3
2023-01-27 07:05:03 - INFO - model_navigator.log:       objectives = {'perf_throughput': 10}
2023-01-27 07:05:03 - INFO - model_navigator.log:       max_latency_ms = None
2023-01-27 07:05:03 - INFO - model_navigator.log:       min_throughput = 0
2023-01-27 07:05:03 - INFO - model_navigator.log:       max_gpu_usage_mb = None
2023-01-27 07:05:03 - INFO - model_navigator.log:       perf_analyzer_timeout = 600
2023-01-27 07:05:03 - INFO - model_navigator.log:       perf_analyzer_path = perf_analyzer
2023-01-27 07:05:03 - INFO - model_navigator.log:       perf_measurement_mode = count_windows
2023-01-27 07:05:03 - INFO - model_navigator.log:       perf_measurement_request_count = 50
2023-01-27 07:05:03 - INFO - model_navigator.log:       perf_measurement_interval = 5000
2023-01-27 07:05:03 - INFO - model_navigator.log:       perf_measurement_shared_memory = none
2023-01-27 07:05:03 - INFO - model_navigator.log:       perf_measurement_output_shared_memory_size = 102400
2023-01-27 07:05:03 - INFO - model_navigator.log:       workspace_path = navigator_workspace
2023-01-27 07:05:03 - INFO - model_navigator.log:       override_workspace = False
2023-01-27 07:05:03 - INFO - model_navigator.log:       override_conversion_container = False
2023-01-27 07:05:03 - INFO - model_navigator.log:       framework_docker_image = nvcr.io/nvidia/pytorch:22.10-py3
2023-01-27 07:05:03 - INFO - model_navigator.log:       triton_docker_image = nvcr.io/nvidia/tritonserver:22.10-py3
2023-01-27 07:05:03 - INFO - model_navigator.log:       gpus = ('all',)
2023-01-27 07:05:03 - INFO - model_navigator.log:       verbose = False
2023-01-27 07:05:03 - INFO - model_navigator.utils.docker: Run docker container with image model_navigator_converter:22.10-py3; using workdir: /home/ubuntu/model_navigator
2023-01-27 07:05:06 - INFO - model_navigator.converter.transformers: Running command copy on /home/ubuntu/model_navigator/navigator_workspace/.input_data/input_model/torchscript-trace/model.pt
2023-01-27 07:05:06 - INFO - model_navigator.converter.transformers: Running command annotation on /home/ubuntu/model_navigator/navigator_workspace/converted/model.pt
2023-01-27 07:05:06 - INFO - model_navigator.converter.transformers: Saving annotations to /home/ubuntu/model_navigator/navigator_workspace/converted/model.pt.yaml
2023-01-27 07:05:06 - INFO - pyt.transformers: ts2onnx command started.
2023-01-27 07:05:17 - INFO - pyt.transformers: ts2onnx command succeed.
2023-01-27 07:05:18 - INFO - polygraphy.transformers: Polygraphy onnx2trt started.
2023-01-27 07:05:18 - WARNING - polygraphy.transformers: This conversion should be done on target GPU platform
2023-01-27 07:06:57 - INFO - polygraphy.transformers: onnx2trt command succeed.
2023-01-27 07:06:57 - INFO - polygraphy.transformers: Polygraphy onnx2trt succeeded.
2023-01-27 07:06:57 - INFO - polygraphy.transformers: Polygraphy onnx2trt started.
2023-01-27 07:06:57 - WARNING - polygraphy.transformers: This conversion should be done on target GPU platform
2023-01-27 07:25:40 - INFO - polygraphy.transformers: onnx2trt command succeed.
[I] Loading inference results from /home/ubuntu/model_navigator/navigator_workspace/converted/model-ts2onnx_op14-polygraphyonnx2trt_fp16_mh.plan.comparator_outputs.json
[I] Loading inference results from /home/ubuntu/model_navigator/navigator_workspace/converted/model-ts2onnx_op14-polygraphyonnx2trt_fp16_mh.plan.comparator_outputs.json
[I] Loading inference results from /home/ubuntu/model_navigator/navigator_workspace/converted/model-ts2onnx_op14-polygraphyonnx2trt_fp16_mh.plan.comparator_outputs.json
2023-01-27 07:25:40 - WARNING - polygraphy.transformers: Polygraphy onnx2trt conversion failed. Details can be found in logfile: /home/ubuntu/model_navigator/navigator_workspace/converted/model-ts2onnx_op14-polygraphyonnx2trt_fp16_mh.plan.log
2023-01-27 07:25:40 - INFO - model_navigator.converter.torch_tensorrt: model_navigator.converter.torch_tensorrt command started.
2023-01-27 07:25:40 - WARNING - model_navigator.converter.torch_tensorrt: This conversion should be done on target GPU platform
2023-01-27 07:26:10 - INFO - model_navigator.converter.torch_tensorrt: model_navigator.converter.torch_tensorrt command succeeded.
2023-01-27 07:26:10 - INFO - model_navigator.converter.torch_tensorrt: model_navigator.converter.torch_tensorrt command started.
2023-01-27 07:26:10 - WARNING - model_navigator.converter.torch_tensorrt: This conversion should be done on target GPU platform
2023-01-27 07:27:19 - INFO - model_navigator.converter.torch_tensorrt: model_navigator.converter.torch_tensorrt command succeeded.
2023-01-27 07:27:27 - INFO - optimize: Running Triton Model Configurator for converted models
2023-01-27 07:27:27 - INFO - optimize:  - my_model.ts2onnx_op14
2023-01-27 07:27:27 - INFO - optimize:  - my_model.ts2onnx_op14-polygraphyonnx2trt_fp32_mh
2023-01-27 07:27:27 - INFO - optimize:  - my_model
2023-01-27 07:27:27 - INFO - optimize:  - my_model.torch_tensorrt_module_precisionTensorRTPrecision.FP32
2023-01-27 07:27:27 - INFO - optimize:  - my_model.torch_tensorrt_module_precisionTensorRTPrecision.FP16
2023-01-27 07:27:27 - INFO - optimize: Running triton model configuration variants generation for my_model.ts2onnx_op14
2023-01-27 07:27:27 - INFO - optimize: Generated model variant my_model.ts2onnx_op14 for Triton evaluation.
Traceback (most recent call last):
  File "/opt/conda/bin/model-navigator", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/lib/python3.8/site-packages/model_navigator/cli/main.py", line 53, in main
    cli(max_content_width=160)
  File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/model_navigator/cli/optimize.py", line 235, in optimize_cmd
    config_results = _configure_models_on_triton(
  File "/opt/conda/lib/python3.8/site-packages/model_navigator/cli/optimize.py", line 445, in _configure_models_on_triton
    triton_server.start()
  File "/opt/conda/lib/python3.8/site-packages/model_navigator/triton/server/server_local.py", line 71, in start
    tritonserver_cmd = sh.Command(tritonserver_cmd)
  File "/opt/conda/lib/python3.8/site-packages/sh.py", line 1310, in __init__
    raise CommandNotFound(path)
sh.CommandNotFound: tritonserver

Steps To Reproduce

  1. prepare docker file for model navigator.
    Dockerfile
    FROM nvcr.io/nvidia/pytorch:22.10-py3
    ENV DEBIAN_FRONTEND=noninteractive
    
    # WAR for PEP660
    RUN pip install --no-cache-dir --upgrade pip==21.2.4 setuptools==57.4.0
    RUN pip install janome fugashi ipadic
    RUN pip install --extra-index-url https://pypi.ngc.nvidia.com git+https://github.com/triton-inference-server/[email protected]#egg=model-navigator[pyt,huggingface,cli] --upgrade
    
    
    ENTRYPOINT []
    
  2. Build
    docker build -f Dockerfile -t model-navigator .
    
  3. Run container
    docker run -it --rm \
    --ipc=host \
    --gpus 1 \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v /home/ubuntu/triton/triton-inference-server/docs/examples/model_repository:/home/ubuntu/triton/triton-inference-server/docs/examples/model_repository \
    -v /home/ubuntu/model_navigator:/home/ubuntu/model_navigator \
    -w /home/ubuntu/model_navigator \
    --net host \
    --name model-navigator \
    model-navigator /bin/bash
    
    I didn't understand which directory I'm supposed to specify for "model-catalog", so I tried the following but all got the same error.
    • Skipping this line
    • [-v /home/ubuntu/models:/home/ubuntu/models] /models is an empty directory
    • [-v /home/ubuntu/triton/triton-inference-server/docs/examples/model_repository:/home/ubuntu/triton/triton-inference-server/docs/examples/model_repository] path to model_repository
  4. Use Model Navigator's nav.torch.export API to create .nav file from pytorch BERT model
  5. Run Optimize using .nav file previously created
    model-navigator optimize bert.nav
    
    Then I get the error I've mentioned above.

Inconsistency in the ModelConfig API

The model config in PyTriton has a field decoupled.
https://github.com/triton-inference-server/pytriton/blob/98d2fdd73cceab82f4c35a4fc4d90d5158f41504/pytriton/model_config/model_config.py#L27-L42

The model config in model navigator does not have this decoupled field.

This causes an error in the function _get_triton_model_config:
https://github.com/triton-inference-server/pytriton/blob/98d2fdd73cceab82f4c35a4fc4d90d5158f41504/pytriton/models/model.py#L227

Please add the decoupled field in the model config of the model navigator.

My current workaround until the decoupled field is added.


    mconfig = pytriton_adapter.config
    mconfig.decoupled = False

    with Triton() as triton:
        """Load model into Triton Inference Server."""
        triton.bind(
            model_name="linear",
            infer_func=infer_func,
            inputs=pytriton_adapter.inputs,
            outputs=pytriton_adapter.outputs,
            # config=pytriton_adapter.config,
            config=mconfig,
        )
        """Serve model through Triton Inference Server."""
        triton.serve()

Otherwise I get an error:

  File "/usr/local/lib/python3.10/dist-packages/pytriton/triton.py", line 431, in serve
    self.run()
  File "/usr/local/lib/python3.10/dist-packages/pytriton/triton.py", line 391, in run
    self._model_manager.create_models()
  File "/usr/local/lib/python3.10/dist-packages/pytriton/models/manager.py", line 72, in create_models
    model.generate_model(self._model_repository.path)
  File "/usr/local/lib/python3.10/dist-packages/pytriton/models/model.py", line 144, in generate_model
    triton_model_config = self._get_triton_model_config()
  File "/usr/local/lib/python3.10/dist-packages/pytriton/models/model.py", line 227, in _get_triton_model_config
    decoupled=self.config.decoupled,
AttributeError: 'ModelConfig' object has no attribute 'decoupled'

No such file or directory error when model-navigator convert executes

I am trying to convert a TensorFlow SavedModel to an ONNX model, but when I execute model-navigator from the command line, this error occurs:

FileNotFoundError: [Errno 2] No such file or directory: '/opt/tritonserver/workspace/navigator_workspace/convert_results.yaml'

root@my_username:/opt/tritonserver/workspace# model-navigator convert --model-format tf-savedmodel --model-name efficientnetb4  --model-path test_models/tensorflow/saved_models/efficientnetb4 --output-path efficientnetb4.onnx  --override-workspace
2021-08-10 11:04:03 - INFO - model_navigator.utils.docker: Run docker container with image model_navigator_converter:21.05-tf2-py3; using workdir: /opt/tritonserver/workspace
Usage: model-navigator convert [OPTIONS]
Try 'model-navigator convert --help' for help.

Error: Missing option '-n' / '--model-name'.
Traceback (most recent call last):
  File "/usr/local/bin/model-navigator", line 33, in <module>
    sys.exit(load_entry_point('model-navigator', 'console_scripts', 'model-navigator')())
  File "/opt/tritonserver/workspace/model_navigator/model_navigator/cli/main.py", line 49, in main
    cli(max_content_width=160)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1137, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1062, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1668, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 763, in invoke
    return __callback(*args, **kwargs)
  File "/opt/tritonserver/workspace/model_navigator/model_navigator/cli/convert_model.py", line 396, in convert_cmd
    return convert(
  File "/opt/tritonserver/workspace/model_navigator/model_navigator/cli/convert_model.py", line 311, in convert
    conversion_results = _run_in_docker(
  File "/opt/tritonserver/workspace/model_navigator/model_navigator/cli/convert_model.py", line 239, in _run_in_docker
    results = results_store.load("convert", ConversionResult)
  File "/opt/tritonserver/workspace/model_navigator/model_navigator/results.py", line 55, in load
    with results_path.open("r") as results_file:
  File "/usr/lib/python3.8/pathlib.py", line 1222, in open
    return io.open(self, mode, buffering, encoding, errors, newline,
  File "/usr/lib/python3.8/pathlib.py", line 1078, in _opener
    return self._accessor.open(self, flags, mode)
FileNotFoundError: [Errno 2] No such file or directory: '/opt/tritonserver/workspace/navigator_workspace/convert_results.yaml'

So I can't convert. Is there any solution for this? Thank you.

[Perf_Analyzer]: took very long to exit, killing perf_analyzer.

I tried to run model_navigator on my ONNX model and it stops with the following error:

2022-04-22 18:56:36.187 INFO[perf_analyzer.py:214] perf_analyzer took very long to exit, killing perf_analyzer...
2022-04-22 18:56:41.700 INFO[server_local.py:121] Stopped Triton Server.

Traceback (most recent call last):
  File "/usr/local/bin/model-analyzer", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/model_analyzer/entrypoint.py", line 402, in main
    analyzer.profile(client=client, gpus=gpus)
  File "/usr/local/lib/python3.8/dist-packages/model_analyzer/analyzer.py", line 125, in profile
    self._model_manager.run_models(models=[model])
  File "/usr/local/lib/python3.8/dist-packages/model_analyzer/model_manager.py", line 79, in run_models
    while not rcg.is_done() and not self._state_manager.exiting():
  File "/usr/local/lib/python3.8/dist-packages/model_analyzer/config/generate/run_config_generator.py", line 60, in is_done
    return (self._pacg.is_done() and
  File "/usr/local/lib/python3.8/dist-packages/model_analyzer/config/generate/perf_analyzer_config_generator.py", line 84, in is_done
    return self._done_walking() or self._last_results_erroneous()
  File "/usr/local/lib/python3.8/dist-packages/model_analyzer/config/generate/perf_analyzer_config_generator.py", line 176, in _done_walking
    and self._done_walking_concurrencies()
  File "/usr/local/lib/python3.8/dist-packages/model_analyzer/config/generate/perf_analyzer_config_generator.py", line 183, in _done_walking_concurrencies
    1) or not self._throughput_gain_valid()
  File "/usr/local/lib/python3.8/dist-packages/model_analyzer/config/generate/perf_analyzer_config_generator.py", line 194, in _throughput_gain_valid
    valid_gains = [self._calculate_throughput_gain(x) > THROUGHPUT_MINIMUM_GAIN \
  File "/usr/local/lib/python3.8/dist-packages/model_analyzer/config/generate/perf_analyzer_config_generator.py", line 194, in <listcomp>
    valid_gains = [self._calculate_throughput_gain(x) > THROUGHPUT_MINIMUM_GAIN \
  File "/usr/local/lib/python3.8/dist-packages/model_analyzer/config/generate/perf_analyzer_config_generator.py", line 212, in _calculate_throughput_gain
    throughput_after = self._get_throughput(self._all_results[after_index])
  File "/usr/local/lib/python3.8/dist-packages/model_analyzer/config/generate/perf_analyzer_config_generator.py", line 217, in _get_throughput
    return measurement.get_metric_value('perf_throughput')
AttributeError: 'NoneType' object has no attribute 'get_metric_value'

Traceback (most recent call last):
  File "/opt/model-navigator/model_navigator/cli/profile.py", line 130, in profile_cmd
    checkpoint_path = profiler.run()
  File "/opt/model-navigator/model_navigator/model_analyzer/profiler.py", line 100, in run
    analyzer.run(mode=ModelAnalyzerMode.PROFILE, verbose=self._verbose)
  File "/opt/model-navigator/model_navigator/model_analyzer/model_analyzer.py", line 73, in run
    raise ModelNavigatorException(
model_navigator.exceptions.ModelNavigatorException: Running model-analyzer with ['model-analyzer', 'profile', '-f', '/home/darvis-ml3/darvis_ml/xperiments/model_navigator/navigator_workspace/analyzer/config-profile.yaml'] failed with exit status 1 : None

PyTreeMetadata inferred a wrong input mapping, causing a wrong nav.optimize result

Maybe we need a more fool-proof PyTreeMetadata?

Version: 0.7.4
Detailed steps to reproduce the bug:
1. Use the torch bert example in the code repo.
2. Change the related code from DistilBertForMaskedLM to BertModel, which accepts 3 inputs (input_ids, attention_mask, token_type_ids) instead of 2.
3. Run optimize.py and check the generated {model}/reproduce_***.sh; you will see a wrong metadata input argument:
"pytree_metadata": {"metadata": {"input_ids": "input_ids", "token_type_ids": "attention_mask", "attention_mask": "token_type_ids"}, "tensor_type": "torch"}}'
PyTreeMetadata inferred a wrong mapping for "token_type_ids"/"attention_mask", which makes the subsequent conversion/evaluation wrong.

Possible cause?:
https://github.com/triton-inference-server/model_navigator/blob/8baf51016810cada8a750887758eabd5d1e6910d/model_navigator/core/tensor.py#L370C1-L370C1

PyTreeMetadata currently infers the input mapping from the samples (which in this case are dicts: {'input_ids': ..., 'token_type_ids': ..., 'attention_mask': ...}) and relies on the dict keys being in the SAME order as the parameters required by BertModel, which is actually ['input_ids', 'attention_mask', 'token_type_ids'].
Maybe use the dict keys' names as well to infer the metadata?
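
For reference, a minimal sketch of how I would expect the issue to reproduce (untested; it assumes the Hugging Face bert-base-uncased checkpoint and the 0.7.4 nav.torch.optimize API, with the sample dict deliberately ordered the way the tokenizer returns it rather than the way BertModel.forward declares its parameters):

import model_navigator as nav
from transformers import BertModel, BertTokenizer

model = BertModel.from_pretrained("bert-base-uncased").eval()
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

encoded = tokenizer("model navigator pytree metadata test", return_tensors="pt")

# Dict key order follows the tokenizer output (input_ids, token_type_ids, attention_mask),
# which differs from the parameter order of BertModel.forward
# (input_ids, attention_mask, token_type_ids).
sample = {
    "input_ids": encoded["input_ids"],
    "token_type_ids": encoded["token_type_ids"],
    "attention_mask": encoded["attention_mask"],
}

# Any sized iterable of samples should work as a dataloader here.
package = nav.torch.optimize(model=model, dataloader=[sample])
# Afterwards, inspect the generated reproduce_*.sh and look at "pytree_metadata".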

PyTorch Lightning to ONNX conversion

Hi, I am trying to convert the Donut model, which is built on PyTorch Lightning, and it throws the following error.

2022-09-27 15:02:35,649 INFO Navigator API: PyTorch to ONNX export started
2022-09-27 15:02:35,652 WARNING Navigator API: External errors are usually caused by incompatibilites between the model and the target formats and/or runtimes.
2022-09-27 15:02:35,652 WARNING Navigator API: Encountered an error when executing command:
Traceback (most recent call last):
  File "/home/swapnil/anaconda3/envs/model-navigator/lib/python3.8/site-packages/model_navigator/framework_api/commands/export/pyt.py", line 133, in __call__
    "output_names": list(output_metadata.keys()),
AttributeError: 'tuple' object has no attribute 'keys'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/swapnil/anaconda3/envs/model-navigator/lib/python3.8/site-packages/model_navigator/framework_api/commands/core.py", line 91, in transform
    self.output = self.call(**kwargs)
  File "/home/swapnil/anaconda3/envs/model-navigator/lib/python3.8/site-packages/model_navigator/framework_api/commands/export/pyt.py", line 142, in __call__
    context.execute_local_runtime_script(exporters.pytorch2onnx.file, exporters.pytorch2onnx.export, args)
  File "/home/swapnil/anaconda3/envs/model-navigator/lib/python3.8/site-packages/model_navigator/framework_api/exceptions.py", line 64, in __exit__
    raise UserError(exc_value)
model_navigator.framework_api.exceptions.UserError: 'tuple' object has no attribute 'keys'

2022-09-27 15:02:35,652 INFO Navigator API: You can disable error suppression for debugging with flag NAV_DEBUG=1

Can you please shed some light on what I am missing? I am able to load the model in inference mode and run inference without any issue.

Triton Model Navigator failing to convert the TensorFlow model to TensorFlow-TRT.

While trying to convert from the tf-savedmodel format to the tf-trt format, model-navigator throws an error (Please provide a full dataset profile instead of max_batch_size.). We have provided the dataset profile as shown below, but it throws the same error. Models similar to RetinaFace, which have dynamic axes, throw a similar error.

Multiple Dynamic Axes

Model - Retinaface face detection.

Model-Config

target_formats: ["tf-trt"]

model signature

inputs:
  input__0:
    name: data
    shape: [-1, -1, -1, 3]
    dtype: float32

outputs:
  output__0:
    name: face_rpn_bbox_pred_stride16
    shape: [-1, -1, -1, 8]
    dtype: float32
  output__1:
    name: face_rpn_bbox_pred_stride32
    shape: [-1, -1, -1, 8]
    dtype: float32
  output__2:
    name: face_rpn_bbox_pred_stride8
    shape: [-1, -1, -1, 8]
    dtype: float32
  output__3:
    name: face_rpn_landmark_pred_stride16
    shape: [-1, -1, -1, 20]
    dtype: float32
  output__0:
    name: face_rpn_landmark_pred_stride32
    shape: [-1, -1, -1, 20]
    dtype: float32
  output__0:
    name: face_rpn_landmark_pred_stride8
    shape: [-1, -1, -1, 20]
    dtype: float32
  output__0:
    name: tf.compat.v1.transpose_1
    shape: [-1, -1, -1, 4]
    dtype: float32
  output__0:
    name: tf.compat.v1.transpose_3
    shape: [-1, -1, -1, 4]
    dtype: float32
  output__0:
    name: tf.compat.v1.transpose_5
    shape: [-1, -1, -1, 4]
    dtype: float32
  output__0:
    name: face_rpn_bbox_pred_stride16
    shape: [-1, -1, -1, 8]
    dtype: float32

comparator config

atol:
  output__0: 0.01
rtol:
  output__0: 0.1

dataset profile

max_shapes:
  image__0: [-1, -1, -1, 3]
dtypes:
  image__0: float32

Model-Meta-Data

MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['__saved_model_init_op']:
  The given SavedModel SignatureDef contains the following input(s):
  The given SavedModel SignatureDef contains the following output(s):
    outputs['__saved_model_init_op'] tensor_info:
        dtype: DT_INVALID
        shape: unknown_rank
        name: NoOp
  Method name is:

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['data'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, -1, -1, 3)
        name: serving_default_data:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['face_rpn_bbox_pred_stride16'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, -1, -1, 8)
        name: StatefulPartitionedCall:0
    outputs['face_rpn_bbox_pred_stride32'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, -1, -1, 8)
        name: StatefulPartitionedCall:1
    outputs['face_rpn_bbox_pred_stride8'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, -1, -1, 8)
        name: StatefulPartitionedCall:2
    outputs['face_rpn_landmark_pred_stride16'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, -1, -1, 20)
        name: StatefulPartitionedCall:3
    outputs['face_rpn_landmark_pred_stride32'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, -1, -1, 20)
        name: StatefulPartitionedCall:4
    outputs['face_rpn_landmark_pred_stride8'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, -1, -1, 20)
        name: StatefulPartitionedCall:5
    outputs['tf.compat.v1.transpose_1'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, -1, -1, 4)
        name: StatefulPartitionedCall:6
    outputs['tf.compat.v1.transpose_3'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, -1, -1, 4)
        name: StatefulPartitionedCall:7
    outputs['tf.compat.v1.transpose_5'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, -1, -1, 4)
        name: StatefulPartitionedCall:8
  Method name is: tensorflow/serving/predict

Issues

/usr/local/lib/python3.8/dist-packages/numpy/core/getlimits.py:499: UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero.
  setattr(self, word, getattr(machar, word).flat[0])
/usr/local/lib/python3.8/dist-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero.
  return self._float_to_str(self.smallest_subnormal)
/usr/local/lib/python3.8/dist-packages/numpy/core/getlimits.py:499: UserWarning: The value of the smallest subnormal for <class 'numpy.float64'> type is zero.
  setattr(self, word, getattr(machar, word).flat[0])
/usr/local/lib/python3.8/dist-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for <class 'numpy.float64'> type is zero.
  return self._float_to_str(self.smallest_subnormal)

Error: Cannot construct default dataset profile: too many dynamic axes in the model input data: [-1, -1, -1, 3]. Please provide a full dataset profile instead of max_batch_size.

Error: No results found for convert_model
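
For what it is worth, going by the error message the converter appears to expect concrete (non -1) shapes rather than max_batch_size or fully dynamic max_shapes. Below is a rough sketch of what a full dataset profile section could look like for this model; the field names follow my reading of the legacy CLI documentation and may not be exact, the key matches the signature input input__0 (the profile quoted above uses image__0, which I would expect to also cause a mismatch, though I am not certain of that), and the shape values are placeholders to adjust:

min_shapes:
  input__0: [1, 128, 128, 3]
opt_shapes:
  input__0: [8, 640, 640, 3]
max_shapes:
  input__0: [16, 1024, 1024, 3]
value_ranges:
  input__0: [0.0, 1.0]
dtypes:
  input__0: float32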

Triton Navigator Failing At Torch Conversion To TRT-Torch.

I am working with model_navigator to deploy into the Triton server but am facing an issue.
While converting the TorchScript model to the torch-trt framework, it ends up throwing an error that the tensorflow module is missing. Since we are converting from the torchscript format to torch-trt, a missing tensorflow module is unexpected. We have tested the model conversion using the yolov5s model from the official repository (https://github.com/ultralytics/yolov5).

Steps to replicate the issue:

make docker

docker run -it --rm --gpus 1 -v /var/run/docker.sock:/var/run/docker.sock -v : -v : -w --net host --name model-navigator model-navigator /bin/bash

model-navigator convert   --model-name yolov5  --model-format torchscript --model-path /workspace/model-files/yolov5s.pt  --target-formats onnx --gpus all

Error -

2022-07-27 08:32:03 - INFO - model_navigator.utils.docker: Run docker container with image model_navigator_converter:22.06-py3; using workdir: /app/wrkdir

Traceback (most recent call last):
  File "/opt/conda/bin/model-navigator", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/lib/python3.8/site-packages/model_navigator/cli/main.py", line 53, in main
    cli(max_content_width=160)
  File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/model_navigator/cli/convert_model.py", line 476, in convert_cmd
    return convert(
  File "/opt/conda/lib/python3.8/site-packages/model_navigator/cli/convert_model.py", line 398, in convert
    conversion_results = _run_locally(
  File "/opt/conda/lib/python3.8/site-packages/model_navigator/cli/convert_model.py", line 146, in _run_locally
    dataloader = RandomDataloader(
  File "/opt/conda/lib/python3.8/site-packages/model_navigator/converter/dataloader.py", line 190, in __init__
    self._generate_default_profile(model_config, model_signature_config, max_batch_size)
  File "/opt/conda/lib/python3.8/site-packages/model_navigator/converter/dataloader.py", line 223, in _generate_default_profile
    model_signature = extract_model_signature(model_config.model_path)
  File "/opt/conda/lib/python3.8/site-packages/model_navigator/converter/dataloader.py", line 125, in extract_model_signature
    return module._get_tf_signature(model_path)
  File "/opt/conda/lib/python3.8/site-packages/Pyro4/core.py", line 185, in __call__
    return self.__send(self.__name, args, kwargs)
  File "/opt/conda/lib/python3.8/site-packages/Pyro4/utils/flame.py", line 83, in __invoke
    return self.flameserver.invokeModule(module, args, kwargs)
  File "/opt/conda/lib/python3.8/site-packages/Pyro4/core.py", line 185, in __call__
    return self.__send(self.__name, args, kwargs)
  File "/opt/conda/lib/python3.8/site-packages/Pyro4/core.py", line 476, in _pyroInvoke
    raise data  # if you see this in your traceback, you should probably inspect the remote traceback as well
ModuleNotFoundError: No module named 'tensorflow'

Error: No results found for convert_model

Can't install Model Navigator

Issue

I want to install Model Navigator for PyTorch but I am getting the error below.
[image: screenshot of the pip install error from the original issue]

Environment

Ubuntu 18.04.6 LTS
Docker 20.10.13
CUDA Version: 11.7
NVIDIA-SMI 515.65.01
Driver Version: 515.65.01
GPU: Tesla T4

Steps To reproduce

  1. Create Docker File
FROM nvcr.io/nvidia/pytorch:22.10-py3
ENV DEBIAN_FRONTEND=noninteractive

# WAR for PEP660
RUN pip install --no-cache-dir --upgrade pip
RUN pip install janome fugashi ipadic
RUN pip install --extra-index-url https://pypi.ngc.nvidia.com .[pyt]


ENTRYPOINT []   
  2. Build image
docker build -f model_nav.Dockerfile -t model-navigator .

Segfault Immediately After Hifigan Conversion from TorchScript

Steps to reproduce

  1. Get a hifigan model from NeMo and export it to .pt
docker run --rm --gpus '"device=0"' -it --ipc=host \
-v $HOME/:/ext_home \
-v ${PWD}:${PWD} \
 -w ${PWD} \
--name $USER_nemo \
nvcr.io/nvidia/nemo:1.3.0

In the container above, run the following python code:

from nemo.collections.tts.models import HifiGanModel

model = HifiGanModel.from_pretrained(model_name="tts_hifigan")
model.export("./hifigan.pt")
  2. Run model-navigator container
git clone https://github.com/triton-inference-server/model_navigator.git
# Optional
# git checkout v0.2.2
make docker
cd ..

docker run -it --rm \
 --gpus 1 \
 -v /var/run/docker.sock:/var/run/docker.sock \
 -v ${PWD}:${PWD} \
 -w ${PWD} \
 --net host \
 --name model-navigator \
 model-navigator /bin/bash
  3. Get my Model Navigator config from Google Drive

  4. Run model-navigator inside the container:

cp navigator_config.yaml navigator_config_run.yaml; model-navigator run --config-path navigator_config_run.yaml

An Error

It results in a log that ends like this:

2021-10-20 15:57:10 - WARNING - polygraphy.transformers: This conversion should be done on target GPU platform
2021-10-20 15:58:05 - INFO - polygraphy.transformers: Polygraphy onnx2trt succeed.

Segmentation fault (core dumped)

I've reconstructed a backtrace of the segfault, and here it is in textual form.
The coredump itself is too large to share.

It seems like there is some error in the protobuf description of ONNX-ML. Maybe the onnx version used is unstable.

Environment

NVIDIA-SMI 470.57.02 Driver Version: 470.57.02 CUDA Version: 11.4
Tesla V100-PCIE-16GB
Using Linux 4.18.0-305.17.1.el8_4.x86_64 #1 SMP x86_64 x86_64 x86_64 GNU/Linux
CentOS Linux release 8.4.2105
Tested at v0.2.2, commit 988e96d

Appropriate Polygraphy and TensorRT versions for the main branch

Hi Triton Team:

When I ran examples/torch/linear/optimize.py, I encountered the following error:

2023-05-10 17:44:17,576 INFO     Navigator:     [W] Unable to determine GPU memory usage
    [W] CUDA initialization failure with error: 35. Please check your CUDA installation:  http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
    Traceback (most recent call last):
      File "/opt/conda/envs/py38/bin/polygraphy", line 8, in <module>
        sys.exit(main())
      File "/opt/conda/envs/py38/lib/python3.8/site-packages/polygraphy/tools/_main.py", line 70, in main
        status = selected_tool.run(args)
      File "/opt/conda/envs/py38/lib/python3.8/site-packages/polygraphy/tools/base/tool.py", line 171, in run
        status = self.run_impl(args)
      File "/opt/conda/envs/py38/lib/python3.8/site-packages/polygraphy/tools/convert/convert.py", line 98, in run_impl
        with self.arg_groups[TrtLoadEngineBytesArgs].load_engine_bytes() as serialized_engine:
      File "/opt/conda/envs/py38/lib/python3.8/site-packages/polygraphy/tools/args/backend/trt/loader.py", line 575, in load_engine_bytes
        return loader()
      File "/opt/conda/envs/py38/lib/python3.8/site-packages/polygraphy/backend/base/loader.py", line 40, in __call__
        return self.call_impl(*args, **kwargs)
      File "/opt/conda/envs/py38/lib/python3.8/site-packages/polygraphy/util/util.py", line 694, in wrapped
        return func(*args, **kwargs)
      File "/opt/conda/envs/py38/lib/python3.8/site-packages/polygraphy/backend/trt/loader.py", line 492, in call_impl
        ret, owns_network = util.invoke_if_callable(self._network)
      File "/opt/conda/envs/py38/lib/python3.8/site-packages/polygraphy/util/util.py", line 663, in invoke_if_callable
        ret = func(*args, **kwargs)
      File "/opt/conda/envs/py38/lib/python3.8/site-packages/polygraphy/backend/base/loader.py", line 40, in __call__
        return self.call_impl(*args, **kwargs)
      File "/opt/conda/envs/py38/lib/python3.8/site-packages/polygraphy/util/util.py", line 694, in wrapped
        return func(*args, **kwargs)
      File "/opt/conda/envs/py38/lib/python3.8/site-packages/polygraphy/backend/trt/loader.py", line 207, in call_impl
        with util.FreeOnException(super().call_impl()) as (builder, network, parser):
      File "/opt/conda/envs/py38/lib/python3.8/site-packages/polygraphy/util/util.py", line 694, in wrapped
        return func(*args, **kwargs)
      File "/opt/conda/envs/py38/lib/python3.8/site-packages/polygraphy/backend/trt/loader.py", line 127, in call_impl
        with util.FreeOnException(create_network(explicit_batch=self.explicit_batch)) as (builder, network):
      File "<string>", line 3, in create_network
      File "/opt/conda/envs/py38/lib/python3.8/site-packages/polygraphy/backend/base/loader.py", line 40, in __call__
        return self.call_impl(*args, **kwargs)
      File "/opt/conda/envs/py38/lib/python3.8/site-packages/polygraphy/util/util.py", line 694, in wrapped
        return func(*args, **kwargs)
      File "/opt/conda/envs/py38/lib/python3.8/site-packages/polygraphy/backend/trt/loader.py", line 100, in call_impl
        with util.FreeOnException([trt.Builder(trt_util.get_trt_logger())]) as (builder,):
    TypeError: pybind11::init(): factory function returned nullptr

Here are my pip package versions:

polygraphy: 0.47.1
tensorrt: 8.6.1
torch: 2.0.1

Command to reproduce (refer to model_navigator/examples/torch/linear/README.md):

./optimize.py --output-path linear.nav
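
For context (not a confirmed fix): CUDA error 35 corresponds to cudaErrorInsufficientDriver, i.e. the installed driver is older than the CUDA runtime that TensorRT/Polygraphy were built against, so it may be worth checking the driver/runtime pairing before changing package versions. A quick check in the same environment the error came from:

import torch

print(torch.version.cuda)         # CUDA runtime version this torch build expects
print(torch.cuda.is_available())  # False here would be consistent with error 35
print(torch.cuda.get_device_name(0) if torch.cuda.is_available() else "no usable GPU")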

Readme update

This install: pip install -U --extra-index-url https://pypi.ngc.nvidia.com triton-model-navigator[<extras,>]

breaks pip's parser.

This works instead:

pip install -U --extra-index-url https://pypi.ngc.nvidia.com triton-model-navigator[extras]

When inferring with PyTritonAdapter, the results are different.

import onnxruntime as ort  # missing in the original snippet
import model_navigator as nav

package = nav.package.load("package.nav")
pytriton_adapter = nav.pytriton.PyTritonAdapter(
    package=package, strategy=nav.MaxThroughputStrategy()
)  # OnnxCUDARuntime is applied.
model_runner = pytriton_adapter.runner
model_runner.activate()

ort_sess = ort.InferenceSession("navigator_workspace/onnx/model.onnx")

# `data` is a batch from the dataloader used during optimization.
output1 = ort_sess.run(None, {
    "input_values": data[0].numpy(),
})
output2 = model_runner.infer({
    "input_values": data[0].numpy()
})

output1 and output2 are significantly different, even though the ONNX graph comes from the navigator workspace.
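
One thing I would rule out first (a sanity check, not a fix): ort.InferenceSession defaults to the CPU execution provider, while the adapter here reportedly picked OnnxCUDARuntime, so the two runs may not use the same kernels. Assuming onnxruntime-gpu is installed, forcing the standalone session onto CUDA makes the comparison apples-to-apples:

import onnxruntime as ort

# Run the standalone session on the same provider as the Navigator runner.
ort_sess = ort.InferenceSession(
    "navigator_workspace/onnx/model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)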

Where can I export to Triton with a warmup configuration?

Hello, thanks for your awesome Triton automatic config generator. When I export a model to the configuration for Triton Server, some models need more time to initialize. Is there a way to configure a simple warmup step automatically? Something like:

name: "yolov5"
max_batch_size: 2
input {
  name: "input__0"
  data_type: TYPE_FP32
  dims: 3
  dims: 1280
  dims: 1280
}
output {
  name: "output__0"
  data_type: TYPE_FP32
  dims: 100800
  dims: 85
}
instance_group {
  kind: KIND_GPU
}
dynamic_batching {
}
backend: "pytorch"
model_warmup [ # a warmup config field
    {
        name: "warmup_requests"
        batch_size: 1
        inputs: {
            key: "input__0" # input key
            value: {
                random_data: true
                dims: [3 ,1280, 1280] # random data
                data_type: TYPE_FP32
            }
        }
    }
]

The above model_warmup config makes Triton Server execute a warmup step when it launches the model. If the warmup fails, the model fails to launch.

Questions related to TRT conversion and TRT-LLM support

I have 2 separate questions which I could not find an answer to yet, so I am posting them here in the hope someone can answer:

  1. When doing TRT conversion from TorchScript to TRT, would nav call polygraphy surgeon sanitize to do things like constant folding? This is helpful when dealing with larger models. It seems nav uses Polygraphy under the hood, but I want to check whether it also sanitizes.

  2. There's an alpha release of the TRT-LLM tool, which combines TensorRT and FasterTransformer. Is supporting this tool on your roadmap? As a nav user, I like the simpler interface it provides compared to doing compilation/conversion in multiple steps. It would be great to see future support related to LLMs.

Error in exporting keras model

Hi,

I have a Keras model which I loaded and exported to nav. I get an error while exporting, as shown below. Any tips for finding out what could be wrong?

validation data loaded
shape is: (1, 100, 128, 128, 2)
model loaded
2022-06-21 04:59:56 INFO Navigator API: ============================== Config parameters ===================================================
2022-06-21 04:59:56 INFO Navigator API: {'framework': 'tensorflow2', 'model_name': 'brain_tumor_tf2_model', 'workdir': '/opt/model-navigator/navigator_workdir', 'override_workdir': True, 'target_formats': ['onnx'], 'sample_count': 100, 'disable_git_info': False, 'batch_dim': 0, 'seed': 0, 'timestamp': '2022-06-21T03:59:56.321501', '_input_names': ['input_1'], '_output_names': ['conv2d_22'], 'from_source': True, 'max_workspace_size': 8589934592, 'target_precisions': ['fp32', 'fp16'], 'minimum_segment_size': 3, 'target_device': 'cpu', 'opset': 14, 'onnx_runtimes': ['CPUExecutionProvider']}
2022-06-21 04:59:56 INFO Navigator API: ============================== Pipeline TensorFlow 2 pipeline started ==============================
2022-06-21 04:59:56 INFO Navigator API: ============================== Infer input metadata. ===============================================
2022-06-21 04:59:56 INFO Navigator API: ============================== Fetch input model data ==============================================
2022-06-21 04:59:56 WARNING Navigator API: No TRT (min, opt, max) values for axes provided. Using values derived from the dataloader: {'input_1': {0: (1, 1, 1), 1: (100, 100, 100), 2: (128, 128, 128), 3: (128, 128, 128), 4: (2, 2, 2)}}.
2022-06-21 04:59:56 WARNING Navigator API: No dynamic axes provided. Using values derived from the dataloader: defaultdict(<class 'list'>, {})
2022-06-21 04:59:56 INFO Navigator API: ============================== Infer output metadata. ==============================================
2022-06-21 04:59:56 ERROR Navigator API: UserError raised.
2022-06-21 04:59:56 WARNING Navigator API: External errors are usually caused by incompatibilites between the model and the target formats and/or runtimes.
2022-06-21 04:59:56 ERROR Navigator API: Traceback (most recent call last):
  File "/opt/model-navigator/model_navigator/framework_api/commands/infer_metadata.py", line 127, in __call__
    output = runner.infer(profiling_sample)
  File "/usr/local/lib/python3.8/dist-packages/polygraphy/backend/base/runner.py", line 170, in infer
    return self.infer_impl(feed_dict, *args, **kwargs)
  File "/opt/model-navigator/model_navigator/framework_api/runners/tf.py", line 59, in infer_impl
    if isinstance(self.model._saved_model_inputs_spec, Mapping):
AttributeError: 'str' object has no attribute '_saved_model_inputs_spec'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/model-navigator/model_navigator/framework_api/commands/core.py", line 81, in transform
    self.output = self.call(**kwargs)
  File "/opt/model-navigator/model_navigator/framework_api/commands/infer_metadata.py", line 127, in __call__
    output = runner.infer(profiling_sample)
  File "/opt/model-navigator/model_navigator/framework_api/exceptions.py", line 27, in __exit__
    raise UserError(exc_value)
model_navigator.framework_api.exceptions.UserError: 'str' object has no attribute '_saved_model_inputs_spec'
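
A side note on reading the traceback: infer_impl fails because self.model is a str when it accesses _saved_model_inputs_spec, which is what you would see if a path string rather than a loaded tf.keras.Model object ended up being handed to the export. That is only a guess at the cause; a quick way to check it on your side (the path below is a hypothetical placeholder):

import tensorflow as tf

model = tf.keras.models.load_model("/path/to/brain_tumor_tf2_model")  # hypothetical path
assert isinstance(model, tf.keras.Model), f"expected a keras model, got {type(model)}"
# The private attribute the TF runner reads; a plain str has no such attribute.
print(model._saved_model_inputs_spec)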
