allegroai / clearml-serving

ClearML - Model-Serving Orchestration and Repository Solution

Home Page: https://clear.ml

License: Apache License 2.0

Python 96.64% Dockerfile 0.99% Shell 2.37%
machine-learning mlops devops deep-learning kubernetes ai clearml model-serving serving serving-pytorch-models

clearml-serving's Introduction

ClearML Serving - Model deployment made easy

clearml-serving v1.3.1
✨ Model Serving (ML/DL) Made Easy 🎉

🔥 NEW version 1.3 🚀 20% faster!


🌟 ClearML is open-source - Leave a star to support the project! 🌟

clearml-serving is a command line utility for model deployment and orchestration.
It enables model deployment, including serving and preprocessing code, to a Kubernetes cluster or a custom container-based solution.

🔥 NEW 🎊 Take it for a spin with a simple docker-compose command 🪄 ✨

Features:

  • Easy to deploy & configure
    • Support Machine Learning Models (Scikit Learn, XGBoost, LightGBM)
    • Support Deep Learning Models (Tensorflow, PyTorch, ONNX)
    • Customizable RestAPI for serving (i.e. allow per model pre/post-processing for easy integration)
  • Flexible
    • On-line model deployment
    • On-line endpoint model/version deployment (i.e. no need to take the service down)
    • Per model standalone preprocessing and postprocessing python code
  • Scalable
    • Multi model per container
    • Multi models per serving service
    • Multi-service support (fully separated multiple serving services running independently)
    • Multi cluster support
    • Out-of-the-box node auto-scaling based on load/usage
  • Efficient
    • Multi-container resource utilization
    • Support for CPU & GPU nodes
    • Auto-batching for DL models
  • Automatic deployment
    • Automatic model upgrades w/ canary support
    • Programmable API for model deployment
  • Canary A/B deployment
    • Online Canary updates
  • Model Monitoring
    • Usage Metric reporting
    • Metric Dashboard
    • Model performance metric
    • Model performance Dashboard

ClearML Serving Design

ClearML Serving Design Principles

Modular, Scalable, Flexible, Customizable, Open Source

Installation

Prerequisites

  • ClearML-Server : Model repository, Service Health, Control plane
  • Kubernetes / Single-instance Machine : Deploying containers
  • CLI : Configuration & model deployment interface

💅 Initial Setup

  1. Setup your ClearML Server or use the Free tier Hosting
  2. Setup local access (if you haven't already), see instructions here
  3. Install clearml-serving CLI:
pip3 install clearml-serving
  4. Create the Serving Service Controller
  • clearml-serving create --name "serving example"
  • The new serving service UID should be printed New Serving Service created: id=aa11bb22aa11bb22
  5. Write down the Serving Service UID
  6. Clone the clearml-serving repository
git clone https://github.com/allegroai/clearml-serving.git
  7. Edit the environment variables file (docker/example.env) with your clearml-server credentials and Serving Service UID. For example, you should have something like:
cat docker/example.env
  CLEARML_WEB_HOST="https://app.clear.ml"
  CLEARML_API_HOST="https://api.clear.ml"
  CLEARML_FILES_HOST="https://files.clear.ml"
  CLEARML_API_ACCESS_KEY="<access_key_here>"
  CLEARML_API_SECRET_KEY="<secret_key_here>"
  CLEARML_SERVING_TASK_ID="<serving_service_id_here>"
  8. Spin up the clearml-serving containers with docker-compose (or, if running on Kubernetes, use the helm chart)
cd docker && docker-compose --env-file example.env -f docker-compose.yml up 

If you need Triton support (keras/pytorch/onnx etc.), use the triton docker-compose file

cd docker && docker-compose --env-file example.env -f docker-compose-triton.yml up 

💪 If running on a GPU instance w/ Triton support (keras/pytorch/onnx etc.), use the triton gpu docker-compose file

cd docker && docker-compose --env-file example.env -f docker-compose-triton-gpu.yml up 

Notice: Any model that registers with the "Triton" engine will run the pre/post-processing code on the Inference service container, while the model inference itself will be executed on the Triton Engine container.

🌊 Optional: advanced setup - S3/GS/Azure access

To add access credentials and allow the inference containers to download models from your S3/GS/Azure object storage, add the respective environment variables to your env file (example.env). See further details on configuring the storage access here

AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
AWS_DEFAULT_REGION

GOOGLE_APPLICATION_CREDENTIALS

AZURE_STORAGE_ACCOUNT
AZURE_STORAGE_KEY

💁 Concepts

CLI - Secure configuration interface for on-line model upgrade/deployment on running Serving Services

Serving Service Task - Control plane object storing the configuration of all the endpoints. Supports multiple separate instances, deployed on multiple clusters.

Inference Services - Inference containers, performing the model serving pre/post-processing. Also support CPU-based model inference.

Serving Engine Services - Inference engine containers (e.g. Nvidia Triton, TorchServe etc.) used by the Inference Services for heavier model inference.

Statistics Service - Single instance per Serving Service collecting and broadcasting model serving & performance statistics

Time-series DB - Statistics collection service used by the Statistics Service, e.g. Prometheus

Dashboards - Customizable dashboard-ing solution on top of the collected statistics, e.g. Grafana

👉 Toy model (scikit learn) deployment example

  1. Train a toy scikit-learn model
  • Create a new python virtual environment
  • pip3 install -r examples/sklearn/requirements.txt
  • python3 examples/sklearn/train_model.py
  • The model is automatically registered and uploaded into the model repository. For manual model registration see here
  2. Register the new Model on the Serving Service
  • clearml-serving --id <service_id> model add --engine sklearn --endpoint "test_model_sklearn" --preprocess "examples/sklearn/preprocess.py" --name "train sklearn model" --project "serving examples"
  • Notice that the preprocessing python code is packaged and uploaded to the "Serving Service", to be used by any inference container, and downloaded in real time when updated
  3. Spin up the Inference Container
  • Customize the container Dockerfile if needed
  • Build the container: docker build --tag clearml-serving-inference:latest -f clearml_serving/serving/Dockerfile .
  • Spin the inference container: docker run -v ~/clearml.conf:/root/clearml.conf -p 8080:8080 -e CLEARML_SERVING_TASK_ID=<service_id> -e CLEARML_SERVING_POLL_FREQ=5 clearml-serving-inference:latest
  4. Test the new model inference endpoint
  • curl -X POST "http://127.0.0.1:8080/serve/test_model_sklearn" -H "accept: application/json" -H "Content-Type: application/json" -d '{"x0": 1, "x1": 2}'

Notice, now that we have an inference container running, we can add new model inference endpoints directly with the CLI. The inference container will automatically sync once every 5 minutes.

Notice: on the first few requests the inference container needs to download the model file and the preprocessing python code, so the request might take a little longer; once everything is cached, it will return almost immediately.
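
For reference, the same request can be made from Python with the requests library (a minimal sketch, assuming the inference container is reachable on localhost:8080 as in the curl example above):

import requests

# Same payload as the curl example above, sent to the sklearn endpoint
response = requests.post(
    "http://127.0.0.1:8080/serve/test_model_sklearn",
    json={"x0": 1, "x1": 2},
)
print(response.json())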

Notes:

Review the model repository in the ClearML web UI, under the "serving examples" Project on your ClearML account/server (free hosted or self-deployed).

Inference services status, console outputs and machine metrics are available in the ClearML UI in the Serving Service project (default: "DevOps" project)

To learn more on training models and the ClearML model repository, see the ClearML documentation

🐢 Registering & Deploying new models manually

Uploading an existing model file into the model repository can be done via the clearml RestAPI, the python interface, or with the clearml-serving CLI.

To learn more on training models and the ClearML model repository, see the ClearML documentation

  • Local model file on our laptop: 'examples/sklearn/sklearn-model.pkl'
  • Upload the model file to the clearml-server file storage and register it: clearml-serving --id <service_id> model upload --name "manual sklearn model" --project "serving examples" --framework "scikit-learn" --path examples/sklearn/sklearn-model.pkl
  • We now have a new Model in the "serving examples" project, named "manual sklearn model". The CLI output prints the UID of the newly created model; we will use it to register a new endpoint
  • In the clearml web UI we can see the new model listed under the Models tab of the associated project. We can also download the model file itself directly from the web UI
  • Register a new endpoint with the new model: clearml-serving --id <service_id> model add --engine sklearn --endpoint "test_model_sklearn" --preprocess "examples/sklearn/preprocess.py" --model-id <newly_created_model_id_here>

Notice we can also provide a different storage destination for the model, such as S3/GS/Azure, by passing --destination="s3://bucket/folder", gs://bucket/folder, azure://bucket/folder. There is no need to provide a unique path in the destination argument; the location of the model will be a unique path based on the serving service ID and the model name
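
As a rough sketch of the python interface mentioned above, a local model file can also be registered and uploaded with the clearml SDK (the project and model names here simply mirror the CLI example; this is an illustration, not the only way to do it):

from clearml import Task, OutputModel

# A task to own the manually registered model
task = Task.init(project_name="serving examples", task_name="manual model registration")

# Register the local model file and upload it to the clearml-server file storage
model = OutputModel(task=task, name="manual sklearn model", framework="scikit-learn")
model.update_weights(weights_filename="examples/sklearn/sklearn-model.pkl")

# Publish so the model can be picked up by deployment / auto-update rules
model.publish()

The model's UID (model.id) can then be passed to clearml-serving model add via --model-id, exactly as in the CLI flow above.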

🐰 Automatic model deployment

The clearml Serving Service supports automatic model deployment and upgrades, directly connected with the model repository and API. When model auto-deploy is configured, new model versions will be automatically deployed when you "publish" or "tag" a new model in the clearml model repository. This automation interface allows for a simpler CI/CD model deployment process, as a single API call automatically deploys (or removes) a model from the Serving Service.

💡 Automatic model deployment example

  1. Configure the model auto-update on the Serving Service
  • clearml-serving --id <service_id> model auto-update --engine sklearn --endpoint "test_model_sklearn_auto" --preprocess "preprocess.py" --name "train sklearn model" --project "serving examples" --max-versions 2
  2. Deploy the Inference container (if not already deployed)
  3. Publish a new model to the model repository
  • Go to the "serving examples" project in the ClearML web UI, click on the Models tab, search for "train sklearn model", right-click and select "Publish"
  • Use the RestAPI details
  • Use Python interface:
from clearml import Model
Model(model_id="unique_model_id_here").publish()
  4. The new model is available on a new endpoint version (1), test with: curl -X POST "http://127.0.0.1:8080/serve/test_model_sklearn_auto/1" -H "accept: application/json" -H "Content-Type: application/json" -d '{"x0": 1, "x1": 2}'

🐦 Canary endpoint setup

Canary endpoint deployment adds a new endpoint where the actual request is routed to a preconfigured set of endpoints with a pre-provided distribution. For example, let's create a new endpoint "test_model_sklearn_canary"; we can provide a list of endpoints and probabilities (weights).

clearml-serving --id <service_id> model canary --endpoint "test_model_sklearn_canary" --weights 0.1 0.9 --input-endpoints test_model_sklearn/2 test_model_sklearn/1

This means that any request coming to /test_model_sklearn_canary/ will be routed with probability of 90% to /test_model_sklearn/1/ and with probability of 10% to /test_model_sklearn/2/.
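
Conceptually, the weighted routing behaves like the following sketch (an illustration of the probability split only, not ClearML's actual routing code):

import random

# Candidate endpoints and their routing weights, as configured above
endpoints = ["test_model_sklearn/2", "test_model_sklearn/1"]
weights = [0.1, 0.9]

# Each incoming request is forwarded to a single endpoint drawn by weight
chosen = random.choices(endpoints, weights=weights, k=1)[0]
print(f"route request to /{chosen}/")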

Note:

As with any other Serving Service configuration, we can configure the Canary endpoint while the Inference containers are already running and deployed; they will get updated on their next update cycle (default: once every 5 minutes)

We can also prepare a "fixed" canary endpoint, always splitting the load between the last two deployed models:

clearml-serving --id <service_id> model canary --endpoint "test_model_sklearn_canary" --weights 0.1 0.9 --input-endpoints-prefix test_model_sklearn/

This means that if we have two model inference endpoints, /test_model_sklearn/1/ and /test_model_sklearn/2/, the 10% probability (weight 0.1) will match the last endpoint (ordered by version number), i.e. /test_model_sklearn/2/, and the 90% will match /test_model_sklearn/1/. When we add a new model endpoint version, e.g. /test_model_sklearn/3/, the canary distribution will automatically match the 90% probability to /test_model_sklearn/2/ and the 10% to the new endpoint /test_model_sklearn/3/.

Example:

  1. Add two endpoints:
  • clearml-serving --id <service_id> model add --engine sklearn --endpoint "test_model_sklearn" --preprocess "examples/sklearn/preprocess.py" --name "train sklearn model" --version 1 --project "serving examples"
  • clearml-serving --id <service_id> model add --engine sklearn --endpoint "test_model_sklearn" --preprocess "examples/sklearn/preprocess.py" --name "train sklearn model" --version 2 --project "serving examples"
  2. Add the Canary endpoint:
  • clearml-serving --id <service_id> model canary --endpoint "test_model_sklearn_canary" --weights 0.1 0.9 --input-endpoints test_model_sklearn/2 test_model_sklearn/1
  3. Test the Canary endpoint:
  • curl -X POST "http://127.0.0.1:8080/serve/test_model_sklearn_canary" -H "accept: application/json" -H "Content-Type: application/json" -d '{"x0": 1, "x1": 2}'

📊 Model monitoring and performance metrics 🔔

Grafana Screenshot

ClearML serving instances automatically send serving statistics (count/latency) to Prometheus, and Grafana can be used to visualize them and create live dashboards.

The default docker-compose installation is preconfigured with Prometheus and Grafana; do notice that by default the data of both containers is not persistent. To add persistence we recommend adding a volume mount.

You can also add many custom metrics on the inputs/predictions of your models. Once a model endpoint is registered, adding custom metrics can be done using the CLI. For example, assuming we have our mock scikit-learn model deployed on the test_model_sklearn endpoint, we can log the request inputs and outputs (see the examples/sklearn/preprocess.py example):

clearml-serving --id <serving_service_id_here> metrics add --endpoint test_model_sklearn --variable-scalar
x0=0,0.1,0.5,1,10 x1=0,0.1,0.5,1,10 y=0,0.1,0.5,0.75,1

This will create a distribution histogram (buckets specified via a list of less-or-equal values after the = sign) that we will be able to visualize in Grafana. Notice we can also log time-series values with --variable-value x2 or discrete results (e.g. classification strings) with --variable-enum animal=cat,dog,sheep. Additional custom variables can be added in the preprocess and postprocess code with a call to collect_custom_statistics_fn({'new_var': 1.337}); see clearml_serving/preprocess/preprocess_template.py
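
For illustration, a preprocess script could report such custom values as in the sketch below (the method signatures follow the preprocess template referenced above; the variable names are example assumptions):

from typing import Any


class Preprocess(object):
    """Minimal sketch of reporting custom statistics from pre/post-processing"""

    def preprocess(self, body: dict, state: dict, collect_custom_statistics_fn=None) -> Any:
        # Report the raw inputs so they appear as custom metrics (when configured)
        if collect_custom_statistics_fn:
            collect_custom_statistics_fn({"x0": body["x0"], "x1": body["x1"]})
        return [[body["x0"], body["x1"]]]

    def postprocess(self, data: Any, state: dict, collect_custom_statistics_fn=None) -> dict:
        # Report the prediction value as a custom metric as well
        if collect_custom_statistics_fn:
            collect_custom_statistics_fn({"y": float(data[0])})
        return dict(y=data.tolist() if hasattr(data, "tolist") else data)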

With the new metrics logged we can create a visualization dashboard over the latency of the calls, and the output distribution.

Grafana model performance example:

  • browse to http://localhost:3000
  • login with: admin/admin
  • create a new dashboard
  • select Prometheus as data source
  • Add a query: 100 * increase(test_model_sklearn:_latency_bucket[1m]) / increase(test_model_sklearn:_latency_sum[1m])
  • Change the type to heatmap, and on the right-hand side under "Data Format" select "Time series buckets"
  • You now have the latency distribution, over time.
  • Repeat the same process for x0, the query would be 100 * increase(test_model_sklearn:x0_bucket[1m]) / increase(test_model_sklearn:x0_sum[1m])

Notice: If not specified, all serving requests will be logged. To change the default, configure "CLEARML_DEFAULT_METRIC_LOG_FREQ"; for example, CLEARML_DEFAULT_METRIC_LOG_FREQ=0.2 means only 20% of all requests will be logged. You can also specify a per-endpoint log frequency with the clearml-serving CLI. Check the CLI documentation with clearml-serving metrics --help

🔥 Model Serving Examples

  • Scikit-Learn example - random data
  • Scikit-Learn Model Ensemble example - random data
  • XGBoost example - iris dataset
  • LightGBM example - iris dataset
  • PyTorch example - mnist dataset
  • TensorFlow/Keras example - mnist dataset
  • Multi-Model Pipeline example - multiple models
  • Multi-Model ASync Pipeline example - multiple models
  • Custom Model example - custom data

🙏 Status

  • FastAPI integration for inference service
  • multi-process Gunicorn for inference service
  • Dynamic preprocess python code loading (no need for container/process restart)
  • Model files download/caching (http/s3/gs/azure)
  • Scikit-learn, XGBoost, LightGBM integration
  • Custom inference, including dynamic code loading
  • Manual model upload/registration to model repository (http/s3/gs/azure)
  • Canary load balancing
  • Auto model endpoint deployment based on model repository state
  • Machine/Node health metrics
  • Dynamic online configuration
  • CLI configuration tool
  • Nvidia Triton integration
  • GZip request compression
  • TorchServe engine integration
  • Prebuilt Docker containers (dockerhub)
  • Docker-compose deployment (CPU/GPU)
  • Scikit-Learn example
  • XGBoost example
  • LightGBM example
  • PyTorch example
  • TensorFlow/Keras example
  • Model ensemble example
  • Model pipeline example
  • Statistics Service
  • Kafka install instructions
  • Prometheus install instructions
  • Grafana install instructions
  • Kubernetes Helm Chart
  • Intel optimized container (python, numpy, daal, scikit-learn)

Contributing

PRs are always welcomed ❤️ See more details in the ClearML Guidelines for Contributing.

clearml-serving's People

Contributors

allegroai-git, amirhmk, besrym, fawadahmed322, h4dr1en, jkhenning, pollfly, thepycoder


clearml-serving's Issues

Shared functions across preprocess file

Hello. I have some helper functions that are shared across the preprocess.py files, so I'd like to refactor them. However, I'm not sure where I can put them, and how to import them. The pythonpath seems to be /root/clearml, but I can't find any of the files when I start browsing there inside the inference Docker container.

Any insights?

Inconsistent argument syntax in clearml-serving client

Just noticed that the output type argument has a different syntax depending on which clearml-serving model command is run:

clearml-serving --id xxxxxxxxx model auto-update [...] --output-type float32

Returns an error:

clearml-serving: error: unrecognized arguments: --output-type float32

but it works with --output_type.

If you run the clearml-serving model add command, it's the other way around: the argument --output_type throws an error, while --output-type works just fine.

Request for unknown model: 'test_model_pytorch' version 1 is not at ready state

https://github.com/allegroai/clearml-serving/tree/main/examples/pytorch

I'm running examples as per readme.md, but I get the following error.

What should I do?

{"detail":"Error processing request: <_InactiveRpcError of RPC that terminated with:\n\tstatus = StatusCode.UNAVAILABLE\n\tdetails = \"Request for unknown model: 'test_model_pytorch' version 1 is not at ready state\"\n\tdebug_error_string = \"{\"created\":\"@1652700912.192078289\",\"description\":\"Error received from peer ipv4:172.25.0.5:8001\",\"file\":\"src/core/lib/surface/call.cc\",\"file_line\":1069,\"grpc_message\":\"Request for unknown model: 'test_model_pytorch' version 1 is not at ready state\",\"grpc_status\":14}\"\n>"}

Endpoints appear to be normal.


triton model breaks serving instance

We have set up clearml serving on Kubernetes including Triton support. Our Triton instance has no GPU, so deploying a model leads to the following error in the Triton instance:

E0718 07:41:21.083440 30 model_lifecycle.cc:596] failed to load 'distilbert-test2' version 1: Invalid argument: unable to load model 'distilbert-test2', TensorRT backend supports only GPU device

Trying to remove the model again is not possible:
clearml-serving --id 5097f44fe9cb45f7be2a917c6fe8cad9 model remove --endpoint distilbert-test2

yields the following:

`clearml-serving - CLI for launching ClearML serving engine
2023-07-18 09:47:59,260 - clearml.Task - ERROR - Failed reloading task 5097f44fe9cb45f7be2a917c6fe8cad9
2023-07-18 09:47:59,290 - clearml.Task - ERROR - Failed reloading task 5097f44fe9cb45f7be2a917c6fe8cad9

Error: Task ID "5097f44fe9cb45f7be2a917c6fe8cad9" could not be found
`

In general, our observation is that the serving is not resilient against these kinds of problems. A broken model should not break the instance.

AWS S3 storage driver (boto3) not found

Hello! I am trying to use clearml-serving to serve my PyTorch pretrained model.
I deployed a ClearML Server and use S3 Minio on the local network to store artifacts and pretrained weights.

There is no problem with storing and getting models using Input/Output Models. Everything works correctly.
But clearml-serving (particularly the clearml-serving-triton container) cannot work with Minio, as it does not have the boto3 python module.

Following the tutorial, I added S3 credentials to example.env:

CLEARML_WEB_HOST=http://192.168.3.217:8080
CLEARML_API_HOST=http://192.168.3.217:8008
CLEARML_FILES_HOST=http://192.168.3.217:8081
CLEARML_API_ACCESS_KEY=CLEARML_API_ACCESS_KEY
CLEARML_API_SECRET_KEY=CLEARML_API_SECRET_KEY
CLEARML_SERVING_TASK_ID="ccfed15e442242a19338c20772562df2"
AWS_ACCESS_KEY_ID=AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY=AWS_SECRET_ACCESS_KEY

After that it doesn't work, as AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are not passed to docker-compose. I added those variables to the clearml-serving-triton container to solve it:

  clearml-serving-triton:
    image: allegroai/clearml-serving-triton:latest
    container_name: clearml-serving-triton
    restart: unless-stopped
    # optimize perforamnce
    security_opt:
      - seccomp:unconfined
    # ports:
      # - "8001:8001"
    environment:
      CLEARML_WEB_HOST: ${CLEARML_WEB_HOST:-https://app.clear.ml}
      CLEARML_API_HOST: ${CLEARML_API_HOST:-https://api.clear.ml}
      CLEARML_FILES_HOST: ${CLEARML_FILES_HOST:-https://files.clear.ml}
      CLEARML_API_ACCESS_KEY: ${CLEARML_API_ACCESS_KEY}
      CLEARML_API_SECRET_KEY: ${CLEARML_API_SECRET_KEY}
      CLEARML_SERVING_TASK_ID: ${CLEARML_SERVING_TASK_ID:-}
      CLEARML_TRITON_POLL_FREQ: ${CLEARML_TRITON_POLL_FREQ:-1.0}
      CLEARML_TRITON_METRIC_FREQ: ${CLEARML_TRITON_METRIC_FREQ:-1.0}
      AWS_ACCESS_KEY_ID: ${AWS_ACCESS_KEY_ID:-ACCES_KEY}
      AWS_SECRET_ACCESS_KEY: ${AWS_SECRET_ACCESS_KEY:-SECRET_ACCESS_KEY}

But after that there is an error in this container:

clearml-serving-triton        | 2022-12-15 05:05:45,607 - clearml.storage - ERROR - AWS S3 storage driver (boto3) not found. Please install driver using: pip install "boto3>=1.9"

I guess it can be fixed by adding "boto3>=1.9" to the container requirements.txt here:
https://github.com/allegroai/clearml-serving/blob/main/clearml_serving/engines/triton/requirements.txt

After doing this and building a local docker image, I get the following error:

clearml-serving-triton        | 2022-12-15 05:10:54,624 - clearml.storage - ERROR - Could not download s3://192.168.3.217:9000/models/test/RegNet.b04da49b696a472b94677e26762078d1/models/regnet_y_400MF.pt , err: SSL validation failed for https://192.168.3.217:9000/models/test/RegNet.b04da49b696a472b94677e26762078d1/models/regnet_y_400MF.pt [SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:1131)

And I don't have any idea how to disable the secure connection in this container

Error triton helper file

Hello,
I'm trying clearml serving with Nvidia Triton. I'm having trouble with the following error: FileNotFoundError: [WinError 2] The system cannot find the file specified. Can someone help me?
[screenshot: triton helper error]

Triton cannot find downloaded model

Following commit b5f5d72, the fixes regarding the container arguments and the cloud service python SDKs were resolved; however, the Triton server still cannot find the model downloaded from Azure Blob Store locally.

This is because the name of the file is inherited from the Azure filename, rather than the expected "model.pt" that Triton is looking for. The model is placed in the correct folder structure, just not the correct name.

I successfully resolved this in my fork, by placing the following at the end of the triton_model_service_update_step method of the ServingService class.

# note: requires `import os, shutil` and `from pathlib import Path` at module level
new_target_path = Path(os.path.join(target_path.parent), 'model.pt')
shutil.move(target_path.as_posix(), new_target_path.as_posix())

Issue with pytorch preprocess code

Hi, I have encountered an error stating that the model was expecting input [1 28 28] but was given [1 784] when trying out the pytorch example. I think it is due to the flatten() of the array before it is returned by the preprocess method.

Can I also ask

  1. How do we update the preprocess code for an already created endpoint using the command line / code?
  2. When we create the endpoint with the preprocess code, the preprocess.py code is stored on the clearml server. Does the inference container periodically pull it from the clearml server, or does the clearml server push it to the inference container upon any update? May I know where to access the code that manages this behavior, to better understand what's going on behind the scenes?

Thanks.

Multiple TensorRT handling (plan file per GPU)

I have a question regarding the use of multiple TensorRT engines and how ClearML addresses this issue. As you may know, TensorRT plan files need to be optimized based on the compute capability of each GPU. Consequently, each GPU requires a distinct plan file. Triton addresses this by introducing a variable named cc_model_filenames in config.pbtxt, where we specify which model will be used for each GPU, based on the compute capability. However, in ClearML, and specifically within triton_helper.py, it seems that any plan file is renamed to model.plan. This approach appears to be problematic in cases where different GPUs are used. For example, in my configuration, I have:

model-repository
       | -------- Resnet50
                      | -------- config.pbtxt
                      | -------- 1
                                 | -------- resnet50_T4.plan
                                 | -------- resnet50_A100.plan

And my config.pbtxt looks like this:

cc_model_filenames [
  {
    key: "7.5"
    value: "resnet50_T4.plan"
  },
  {
    key: "8.0"
    value: "resnet50_A100.plan"
  }
]

Given the code written in triton_helper.py, is it possible to manage multiple models?

Triton inference server docker container deployment fails due to ports conflict

I've been following the example on Keras, but using a PyTorch model.
I have setup a serving instance with the following command:

clearml-serving triton --project "Caltech Birds/Deployment" --name "ResNet34 Serving"

I then added the model endpoint and the model ID of the model to be served:

clearml-serving triton --endpoint "resent34_cub200" --model-id "57ed24c1011346d292ecc9e797ccb47e"

The model was trained using an experiment script which included the generation of a config.pbtxt configuration file at the time of completion of model training. This was connected to the experiment configuration as per the Keras example, and resulted in the following configuration being added to the experiment:

            platform: "pytorch_libtorch"
            input [
                {
                    name: "input_layer"
                    data_type: TYPE_FP32
                    dims: [ 3, 224, 224 ]
                }
            ]
            output [
                {
                    name: "fc"
                    data_type: TYPE_FP32
                    dims: [ 200 ]
                }
            ]

I then created a queue on a GPU compute node (as the model requires GPU resource):

clearml-agent daemon --queue default --gpus all --detached --docker

The serving endpoint is then started with the following command:

clearml-serving launch -queue default

I can see two items in my deployment sub-project, the service I created, and a triton serving engine inference object.

On execution, the triton serving engine inference fails with the following errors:

2021-06-08 16:28:49
task f2fbb3218e8243be9f6ab37badbb4856 pulled from 2c28e5db27e24f348e1ff06ba93e80c5 by worker ecm-clearml-compute-gpu-002:0
2021-06-08 16:28:49
Running Task f2fbb3218e8243be9f6ab37badbb4856 inside docker: nvcr.io/nvidia/tritonserver:21.03-py3 arguments: ['--ipc=host', '-p', '8000:8000', '-p', '8001:8001', '-p', '8002:8002']
2021-06-08 16:28:50
Executing: ['docker', 'run', '-t', '--gpus', 'all', '--ipc=host', '-p', '8000:8000', '-p', '8001:8001', '-p', '8002:8002', '-e', 'CLEARML_WORKER_ID=ecm-clearml-compute-gpu-002:0', '-e', 'CLEARML_DOCKER_IMAGE=nvcr.io/nvidia/tritonserver:21.03-py3 --ipc=host -p 8000:8000 -p 8001:8001 -p 8002:8002', '-v', '/tmp/.clearml_agent.ft8vulpe.cfg:/root/clearml.conf', '-v', '/tmp/clearml_agent.ssh.j9b8arhf:/root/.ssh', '-v', '/home/edmorris/.clearml/apt-cache:/var/cache/apt/archives', '-v', '/home/edmorris/.clearml/pip-cache:/root/.cache/pip', '-v', '/home/edmorris/.clearml/pip-download-cache:/root/.clearml/pip-download-cache', '-v', '/home/edmorris/.clearml/cache:/clearml_agent_cache', '-v', '/home/edmorris/.clearml/vcs-cache:/root/.clearml/vcs-cache', '--rm', 'nvcr.io/nvidia/tritonserver:21.03-py3', 'bash', '-c', 'apt-get update ; apt-get install -y git ; . /opt/conda/etc/profile.d/conda.sh ; conda activate base ; [ ! -z $LOCAL_PYTHON ] || export LOCAL_PYTHON=python3 ; $LOCAL_PYTHON -m pip install -U "pip<20.2" ; $LOCAL_PYTHON -m pip install -U clearml-agent ; cp /root/clearml.conf /root/default_clearml.conf ; NVIDIA_VISIBLE_DEVICES=all $LOCAL_PYTHON -u -m clearml_agent execute --disable-monitoring  --id f2fbb3218e8243be9f6ab37badbb4856']
2021-06-08 16:28:55
docker: Error response from daemon: driver failed programming external connectivity on endpoint wonderful_galileo (0c2feca5684f2f71b11fa1e8da4550d42b23c456e52ba0069d0aae64cd75f55b): Error starting userland proxy: listen tcp4 0.0.0.0:8001: bind: address already in use.
2021-06-08 16:28:55
Process failed, exit code 125

This could be related to the parameters of the Triton docker container, which include both ipc=host and the specific port mappings ('-p', '8000:8000'). Both appear to be hard coded for the Triton docker container in the ServingService.launch_engine() method of the ServingService class in the clearml-serving package:

def launch_engine(self, queue_name, queue_id=None, verbose=True):
        # type: (Optional[str], Optional[str], bool) -> None
        """
        Launch serving engine on a specific queue
        :param queue_name: Queue name to launch the engine service running the inference on.
        :param queue_id: specify queue id (unique stand stable) instead of queue_name
        :param verbose: If True print progress to console
        """
        # todo: add more engines
        if self._engine_type == 'triton':
            # create the serving engine Task
            engine_task = Task.create(
                project_name=self._task.get_project_name(),
                task_name="triton serving engine",
                task_type=Task.TaskTypes.inference,
                repo="https://github.com/allegroai/clearml-serving.git",
                branch="main",
                commit="ad049c51c146e9b7852f87e2f040e97d88848a1f",
                script="clearml_serving/triton_helper.py",
                working_directory=".",
                docker="nvcr.io/nvidia/tritonserver:21.03-py3 --ipc=host -p 8000:8000 -p 8001:8001 -p 8002:8002",
                argparse_args=[('serving_id', self._task.id), ],
                add_task_init_call=False,
            )
            if verbose:
                print('Launching engine {} on queue {}'.format(self._engine_type, queue_id or queue_name))
            engine_task.enqueue(task=engine_task, queue_name=queue_name, queue_id=queue_id)

Where to find logging for preprocessing Custom model

Trying to create a custom model using Ultralytics' YoloV8, I got this message while using Postman for testing my endpoint.

[screenshot of the error response]

header: [screenshot of the request headers]

body payload:

{
"imgString": "base64encodedImage"
}

The preprocess input would be like this:

def preprocess(self, body: dict, state: dict, collect_custom_statistics_fn=None) -> Any:
        print(body)
        base64String = body.get("imgString")
        print(base64String)
        self._image = cv2.imdecode(np.frombuffer(base64.b64decode(base64String), np.uint8), cv2.IMREAD_COLOR)
        self._scalingH, self._scalingW = self._image.shape[0]/imgSize, self._image.shape[1]/imgSize
        data = cv2.resize(self._image, (imgSize, imgSize))
        return data

The process

def process(
            self,
            data: Any,
            state: dict,
            collect_custom_statistics_fn: Optional[Callable[[dict], None]],
    ) -> Any:  # noqa
        

        # this is where we do the heavy lifting, i.e. run our model.
        results = self._model.predict(data, imgsz = imgSize,
                                      conf = configModel["model-config"]["conf"], iou = configModel["model-config"]["iou"],
                                      save = configModel["model-config"]["save-mode"], save_conf = configModel["model-config"]["save-mode"],
                                      save_crop = configModel["model-config"]["save-mode"], save_txt = configModel["model-config"]["save-mode"],
                                      device = configModel["model-config"]["device-mode"])
        return results

and the postprocess like this.

def postprocess(self, data: Any, state: dict, collect_custom_statistics_fn=None) -> dict:
        results = data
        classes = results[0].names

        imgDict = {}
        finalDict = {}
        dictDataEntity = {}
        for boxes in results[0].boxes:
            for box in boxes:
                labelNo = int(box.cls)

                x1 = int(box.xyxy[0][0]*self._scalingW)
                y1 = int(box.xyxy[0][1]*self._scalingH)
                x2 = int(box.xyxy[0][2]*self._scalingW)
                y2 = int(box.xyxy[0][3]*self._scalingH)

                tempCrop = self._image[y1:y2, x1:x2]

                imgDict.update({labelNo:tempCrop})

        orderedDict = OrderedDict(sorted(imgDict.items()))
        for key, value in orderedDict.items():
            for classKey, classValue in classes.items(): 
                if key == classKey:
                    finalDict[classValue] = value

        img_v_resize = hconcat_resize(finalDict.values(),imgDelimiter) #
        gray_imgResize = get_grayscale(img_v_resize) # call the grayscaling function
        success, encoded_image = cv2.imencode('.jpg', gray_imgResize) # save the image in memory
        BytesImage = encoded_image.tobytes()
        a = cv2.resize(img_v_resize, (960, 540))
        #cv2.imwrite("test.jpg", gray_imgResize)

        text_response = get_text_response_from_path(BytesImage)

        #========== POST PROCESSING ================#
        dataEntity = text_response[0].description.strip() # show only the description info from gvision
        a = [i.split("\n") for i in dataEntity.split('PEMISAH') if i]
        

        value = []
        value.clear()
        for i in a:
            c = [d for d in i if d]
            listToStr = ' '.join([str(elem) for elem in c])
            stripListToStr = listToStr.strip()
            value.append(stripListToStr)

        i = 0

        for entity in classes.values():
            dictDataEntity[entity] = value[i]
            i+=1
            if len(value) == i:
                break

        for label in classes.values():
            if label not in dictDataEntity.keys():
                dictDataEntity[label] = "-"

        return dict(predict=dictDataEntity.tolist())

The problem is that I want to check the logs to find which part of the code is having problems, and I can't find where the log for preprocessing is, because I'm pretty sure the problem is in one of my code lines but I can't find which one. Is there any way to write the log to the docker log or terminal? Thanks

String not supported for Triton

Have been working on model ensembles, continuing the conversation from #53; just thought it may be a better idea to create new issues for the different things that I find along the way. Essentially we want the output of the model to be an S3 path where all the results are saved as a JSON.

However, it doesn't seem like clearml-serving is mapping the object datatype properly? Triton does support strings.

The issue lies here, I believe: np_to_triton_dtype. This currently maps an object to TYPE_BYTES to be written to the config.pbtxt file (which is not a valid type as per the link above), whereas it should be TYPE_STRING.

Unable to load onnx models into Triton

Hi there,

I have been working on deploying our inference pipeline on clearml-serving using the docker-compose approach. I've hashed out most of the issues thus far thanks to the community; now I am facing another issue while loading onnx models.

I am getting the following error:

clearml-serving-triton   | mmdet  | UNAVAILABLE: Internal: **failed to stat file /models/mmdet/1/model.onnx**

I exec'd into the container to see what's inside /models, and under /models/mmdet/1 there was a model.bin but no model.onnx. I created the model using OutputModel. I also tried doing it through the CLI:

clearml-serving --id $SERVING_ID model upload --name "mmdet_cli" --project $PROJECT_NAME --path /mmdet/model.onnx

but the same thing happens. I'm guessing that when the folder structure is getting set up, this file gets renamed to a .bin extension. Should this be happening or am I doing something wrong?

When I download the file from the models section in the portal, it's an onnx file, exactly the one I uploaded. So not sure where this renaming is happening tbh...

Inconsistent inference results from clearml-serving

Hello,
I deployed a model using clearml-serving, but it generates inconsistent results across identical HTTP requests.

To recreate:

  1. I deployed a self-hosted clearml server in my local kubernetes (from docker image allegroai/clearml:1.4.0).
  2. Reused the pytorch MNIST example from https://github.com/allegroai/clearml-serving/tree/main/examples/pytorch.
  3. Went through the model training process.
  4. Installed clearml-serving with helm (helm repo: NAME allegroai/clearml-serving, CHART VERSION 0.4.1, APP VERSION 0.9.0).
  5. Deployed the MNIST model to a serving endpoint.
  6. Tested the endpoint "http://ip:port/serve/test_model_pytorch" using POSTMAN

Everything goes well as the readme.md from https://github.com/allegroai/clearml-serving/tree/main/examples/pytorch instructed.
But mysteriously, the HTTP responses are not consistent! (The MNIST model occasionally returns different "digits" from the same input image)

I'm quite confused here, and have no idea if any random process happens during the model inference.
Thanks for any help!

Deploying Models from Azure Blob

Models which are located on the clearML server (created by Task.init(..., output_uri=True)) run perfectly, while models which are located on azure blob storage produce different problems in different scenarios:

  1. start the docker container, add a model from the clearML server and afterwards add a model located on azure (on the same endpoint) -> no error, http requests are answered properly (but probably the model which was added first is used)
  2. start the docker container with no model added and first add a model from azure -> error: test_model_pytorch': failed to open text file for read /models/test_model_pytorch/config.pbtxt: No such file or directory .
  3. start the docker container where a model from azure was already added before -> error:
clearml-serving-triton        | Error retrieving model ID ca186e8440b84049971a0b623df36783 []
clearml-serving-triton        | Starting server: ['tritonserver', '--model-control-mode=poll', '--model-repository=/models', '--repository-poll-secs=60.0', '--metrics-port=8002', '--allow-metrics=true', '--allow-gpu-metrics=true']
clearml-serving-triton        | Traceback (most recent call last):
clearml-serving-triton        |   File "clearml_serving/engines/triton/triton_helper.py", line 540, in <module>
clearml-serving-triton        |     main()
clearml-serving-triton        |   File "clearml_serving/engines/triton/triton_helper.py", line 532, in main
clearml-serving-triton        |     helper.maintenance_daemon(
clearml-serving-triton        |   File "clearml_serving/engines/triton/triton_helper.py", line 274, in maintenance_daemon
clearml-serving-triton        |     raise ValueError("triton-server process ended with error code {}".format(error_code))
clearml-serving-triton        | ValueError: triton-server process ended with error code 1

Side note: The same problem occurs hosting the containers on Windows and on Linux. All azure credentials are successfully set up as environment variables in the 'clearml-serving-inference', 'clearml-serving-triton' and 'clearml-serving-statistics' containers.

ValueError: dictionary update sequence element #0 has length <>; 2 is required

Hi everyone! I use this command to create an endpoint:

clearml-serving --id "<>" model add --engine triton --endpoint 'conformer_joint' --model-id '<>' --preprocess 'preprocess_joint.py' --input-size '[1, 640]' '[640, 1]' --input-name 'encoder_outputs' 'decoder_outputs' --input-type float32 float32 --output-size '[100]' --output-name 'outputs' --output-type float32 --aux-config name=\"conformer_joint\" max_batch_size=16 dynamic_batching.max_queue_delay_microseconds=100 platform=\"onnxruntime_onnx\" default_model_filename=\"model.bin\"

This command creates a config.pbtxt like this (copied from the logs):

name: "conformer_joint"
platform: "onnxruntime_onnx"
default_model_filename: "model.bin"
input: [{
    dims: [-1, 1, 640]
    data_type: TYPE_FP32
    name: "encoder_outputs"
  },
  {
    dims: [-1, 640, 1]
    data_type: TYPE_FP32
    name: "decoder_outputs"
  }]
output: [{
    dims: [-1, 129]
    data_type: TYPE_FP32
    name: "outputs"
  }]

Logs from k8s:

I0802 22:48:17.274440 53 model_repository_manager.cc:1206] loading: conformer_joint:1
I0802 22:48:17.274536 53 onnxruntime.cc:2560] TRITONBACKEND_ModelInitialize: conformer_joint (version 1)
I0802 22:48:17.274881 53 onnxruntime.cc:666] skipping model configuration auto-complete for 'conformer_joint': inputs and outputs already specified
I0802 22:48:17.276238 53 onnxruntime.cc:2603] TRITONBACKEND_ModelInstanceInitialize: conformer_joint (GPU device 0)
I0802 22:48:17.279143 53 model_repository_manager.cc:1352] successfully loaded 'conformer_joint' version 1

And there are no errors in clearml-serving. But when I try to make a request like this:

import numpy as np
import requests
r = requests.post(f"<URL>", json={"encoder_outputs": np.random.randn(1, 1, 640).tolist(), "decoder_outputs": np.random.randn(1, 640, 1).tolist()})
r.json()

I get this:

[2023-08-02 23:02:51 +0000] [113] [ERROR] Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/fastapi/encoders.py", line 152, in jsonable_encoder
    data = dict(obj)
           ^^^^^^^^^
ValueError: dictionary update sequence element #0 has length 129; 2 is required

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 436, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
    return await self.app(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/fastapi/applications.py", line 276, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/usr/local/lib/python3.11/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
    raise e
  File "/usr/local/lib/python3.11/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 66, in app
    response = await func(request)
               ^^^^^^^^^^^^^^^^^^^
  File "/root/clearml/clearml_serving/serving/main.py", line 31, in custom_route_handler
    return await original_route_handler(request)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/fastapi/routing.py", line 255, in app
    content = await serialize_response(
              ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/fastapi/routing.py", line 152, in serialize_response
    return jsonable_encoder(response_content)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/fastapi/encoders.py", line 117, in jsonable_encoder
    encoded_value = jsonable_encoder(
                    ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/fastapi/encoders.py", line 160, in jsonable_encoder
    raise ValueError(errors) from e
ValueError: [ValueError('dictionary update sequence element #0 has length 129; 2 is required'), TypeError('vars() argument must have __dict__ attribute')]

I think this is because of batch size and maybe I need to add something in config.pbtxt. Any ideas?

Thanks in advance!

Error: Failed loading preprocess code for '<>': No module named 'transformers'

Hi everyone!

I faced a problem with ClearML-serving. I've deployed an onnx model from huggingface in clearml-serving, but "Error processing request: Error: Failed loading pre process code for '<>': No module named 'transformers'" appears when trying to send a request as in the example (https://github.com/allegroai/clearml-serving/tree/main/examples/huggingface).

The preprocessing file is just like in the example.

The transformers package has been installed via the CLEARML_EXTRA_PYTHON_PACKAGES variable in the serving service deployment file.

Do you have any ideas?

Thanks in advance

Behavior of published model

Apologies if I have not understood this well, as there is limited documentation.

From the read me:
"Notice: If we re-run our keras training example and publish a new model in the repository, the engine will automatically update to the new model."

I tested this: when I first ran my training but did not publish, and then started Triton, this version was still available in Triton for inference. Is this correct?

I also tried, after starting Triton with version 1, retraining the same model with the same params. The Triton polling indicated no change, and thus did not pull the model over. Can I ask if this is the intended behavior?

Docker image not up-to-date

Hey there,

I just tried launching a new serving instance as our demands are growing. A few months ago I committed a change that resolved a missing await, allowing us to override the process() method.

However, it seems that when pulling the latest docker image, this change is not reflected, as no new image has been pushed to docker hub. I'm not sure how often you release, but it seems like there are other changes which may not be reflected in the images. Could you please elaborate? Should I just create my own image from the updated source code...?

AttributeError: module 'numpy' has no attribute 'int'

I am trying to install clearml serving on python 3.9.
The problem seems to be related to new releases of numpy.

Here is the full stack trace:

clearml-serving create --name "serving example"

Traceback (most recent call last):
  File "/Users/galleon/.pyenv/versions/maio-serving/bin/clearml-serving", line 5, in <module>
    from clearml_serving.__main__ import main
  File "/Users/galleon/.pyenv/versions/3.9.16/envs/maio-serving/lib/python3.9/site-packages/clearml_serving/__main__.py", line 9, in <module>
    from clearml_serving.serving.model_request_processor import ModelRequestProcessor, CanaryEP
  File "/Users/galleon/.pyenv/versions/3.9.16/envs/maio-serving/lib/python3.9/site-packages/clearml_serving/serving/model_request_processor.py", line 18, in <module>
    from .preprocess_service import BasePreprocessRequest
  File "/Users/galleon/.pyenv/versions/3.9.16/envs/maio-serving/lib/python3.9/site-packages/clearml_serving/serving/preprocess_service.py", line 247, in <module>
    class TritonPreprocessRequest(BasePreprocessRequest):
  File "/Users/galleon/.pyenv/versions/3.9.16/envs/maio-serving/lib/python3.9/site-packages/clearml_serving/serving/preprocess_service.py", line 253, in TritonPreprocessRequest
    np.int: 'int_contents',
  File "/Users/galleon/.pyenv/versions/3.9.16/envs/maio-serving/lib/python3.9/site-packages/numpy/__init__.py", line 305, in __getattr__
    raise AttributeError(__former_attrs__[attr])
AttributeError: module 'numpy' has no attribute 'int'.
`np.int` was a deprecated alias for the builtin `int`. To avoid this error in existing code, use `int` by itself. Doing this will not modify any behavior and is safe. When replacing `np.int`, you may wish to use e.g. `np.int64` or `np.int32` to specify the precision. If you wish to review your current use, check the release note link for additional information.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
    https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations

Installed packages:

pip list
Package            Version
------------------ -----------
attrs              22.2.0
certifi            2022.12.7
charset-normalizer 3.1.0
clearml            1.9.3
clearml-serving    1.2.0
furl               2.1.3
idna               3.4
jsonschema         4.17.3
numpy              1.24.2
orderedmultidict   1.0.1
pathlib2           2.3.7.post1
Pillow             9.4.0
pip                23.0.1
psutil             5.9.4
PyJWT              2.4.0
pyparsing          3.0.9
pyrsistent         0.19.3
python-dateutil    2.8.2
PyYAML             6.0
requests           2.28.2
setuptools         58.1.0
six                1.16.0
urllib3            1.26.15

docker-compose-triton-gpu fails due to "PTX was compiled with an unsupported toolchain"

Describe the bug
After following the docker-compose-triton-gpu.yml instructions for the pytorch example, the server fails to spin up. The service fails due to the following error:

model_repository_manager.cc:1152] failed to load 'test_model_pytorch' version 1: Internal: unable to create stream: the provided PTX was compiled with an unsupported toolchain.

To Reproduce
Steps to reproduce the behavior:

  1. Run the pytorch example in
    # Train and Deploy Keras model with Nvidia Triton Engine

Expected behavior
The service spins up without the model_repository_manager.cc:1152 error message.

Screenshots
n/a

Desktop (please complete the following information):

  • OS: Ubuntu 20.04.1
  • Virtualization version: (docker --version & docker-compose --version)
    docker --version & docker-compose --version [1] 1611412 Docker version 20.10.16, build aa7e414 docker-compose version 1.29.2, build 5becea4c [1]+ Done docker --version

Additional context
See similar issue here: triton-inference-server/server#3877

Serving with custom engine

Hello,

I checked the pipeline example where you use a custom engine, but it is not very complete. What if I want to run normal pytorch inference without any engine?

Is it also possible to implement my own REST API (e.g. Flask), or at least have more control over how I process my inferences? In your README.md, it says: Customizable RestAPI for serving (i.e. allow per model pre/post-processing for easy integration). How can I really customize the RestAPI?

Thanks!
Bruno

Endpoint authorization with API key

Hello,

I have a multi-tenant application and I would like to control who has access to each endpoint with API keys. That is still a bit unclear to me. How can I authorize users before they consume an endpoint?

This question also extends to serving engines in general, like TorchServe. How do people normally control access to the inference APIs?

Thanks,
Bruno

Add on for README

Hello, after some issues were raised, code was added to define Triton engine args (e.g. ports, Triton version), but the usage of these is not yet documented in the README nor in the --help.

Also, the purpose of certain args, like project name and name, is not very clear even after reading the --help.
Perhaps these could be documented in the README, so it is easier to use.

This is also related to my failed attempt to use my own ClearML server and Triton setup with ClearML serving.
I suspect it might be due to unfamiliarity with these args, and there might also be gaps in the implementation. I suggest getting the args documented first, so that I can test further.

clearml.storage - ERROR - Google cloud driver not found

Trying to call a model endpoint where the model is stored on a GCP bucket, I'm getting the error:

clearml.storage - ERROR - Google cloud driver not found. Please install driver using: pip install "google-cloud-storage>=1.13.2"

After installing manually it works.

We use the k8s version - installed with helm

Could not download model in triton container

Hello!

I use ClearML free (the one without configuration vault stuff) + clearml-serving module

When I spun up docker-compose and tried to pull a model from our s3, I got an error in the tritonserver container:

2024-03-13 11:26:56,913 - clearml.storage - WARNING - Failed getting object size: ClientError('An error occurred (403) when calling the HeadObject operation: Forbidden')
2024-03-13 14:26:57
2024-03-13 11:26:57,042 - clearml.storage - ERROR - Could not download s3://<BUCKET>/<FOLDER>/<PROJECT>/<TASK_NAME>.75654091e56141199c9d9594305d6872/models/model_package.zip , err: An error occurred (403) when calling the HeadObject operation: Forbidden

But I've set the env variables in example.env (the AWS_ ones too) and I can find them in the tritonserver container via

$ env | grep CLEARML
$ env | grep AWS

FILES

docker-compose-triton-gpu.yaml

version: "3"

services:
  zookeeper:
    image: bitnami/zookeeper:3.7.0
    container_name: clearml-serving-zookeeper
    # ports:
      # - "2181:2181"
    environment:
      - ALLOW_ANONYMOUS_LOGIN=yes
    networks:
      - clearml-serving-backend

  kafka:
    image: bitnami/kafka:3.1.1
    container_name: clearml-serving-kafka
    # ports:
      # - "9092:9092"
    environment:
      - KAFKA_BROKER_ID=1
      - KAFKA_CFG_LISTENERS=PLAINTEXT://clearml-serving-kafka:9092
      - KAFKA_CFG_ADVERTISED_LISTENERS=PLAINTEXT://clearml-serving-kafka:9092
      - KAFKA_CFG_ZOOKEEPER_CONNECT=clearml-serving-zookeeper:2181
      - ALLOW_PLAINTEXT_LISTENER=yes
      - KAFKA_CREATE_TOPICS="topic_test:1:1"
    depends_on:
      - zookeeper
    networks:
      - clearml-serving-backend

  prometheus:
    image: prom/prometheus:v2.34.0
    container_name: clearml-serving-prometheus
    volumes:
      - ./prometheus.yml:/prometheus.yml
    command:
      - '--config.file=/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/etc/prometheus/console_libraries'
      - '--web.console.templates=/etc/prometheus/consoles'
      - '--storage.tsdb.retention.time=200h'
      - '--web.enable-lifecycle'
    restart: unless-stopped
    # ports:
      # - "9090:9090"
    depends_on:
      - clearml-serving-statistics
    networks:
      - clearml-serving-backend

  alertmanager:
    image: prom/alertmanager:v0.23.0
    container_name: clearml-serving-alertmanager
    restart: unless-stopped
    # ports:
      # - "9093:9093"
    depends_on:
      - prometheus
      - grafana
    networks:
      - clearml-serving-backend

  grafana:
    image: grafana/grafana:8.4.4-ubuntu
    container_name: clearml-serving-grafana
    volumes:
      - './datasource.yml:/etc/grafana/provisioning/datasources/datasource.yaml'
    restart: unless-stopped
    ports:
      - "3001:3000"
    depends_on:
      - prometheus
    networks:
      - clearml-serving-backend


  clearml-serving-inference:
    image: allegroai/clearml-serving-inference:1.3.1-vllm
    build:
      context: ../
      dockerfile: clearml_serving/serving/Dockerfile
    container_name: clearml-serving-inference
    restart: unless-stopped
    # optimize performance
    security_opt:
      - seccomp:unconfined
    ports:
      - "8080:8080"
    environment:
      CLEARML_WEB_HOST: ${CLEARML_WEB_HOST:-https://app.clear.ml}
      CLEARML_API_HOST: ${CLEARML_API_HOST:-https://api.clear.ml}
      CLEARML_FILES_HOST: ${CLEARML_FILES_HOST:-https://files.clear.ml}
      CLEARML_API_ACCESS_KEY: ${CLEARML_API_ACCESS_KEY}
      CLEARML_API_SECRET_KEY: ${CLEARML_API_SECRET_KEY}
      CLEARML_SERVING_TASK_ID: ${CLEARML_SERVING_TASK_ID:-}
      CLEARML_SERVING_PORT: ${CLEARML_SERVING_PORT:-8080}
      CLEARML_SERVING_POLL_FREQ: ${CLEARML_SERVING_POLL_FREQ:-1.0}
      CLEARML_DEFAULT_BASE_SERVE_URL: ${CLEARML_DEFAULT_BASE_SERVE_URL:-http://127.0.0.1:8080/serve}
      CLEARML_DEFAULT_KAFKA_SERVE_URL: ${CLEARML_DEFAULT_KAFKA_SERVE_URL:-clearml-serving-kafka:9092}
      CLEARML_DEFAULT_TRITON_GRPC_ADDR: ${CLEARML_DEFAULT_TRITON_GRPC_ADDR:-clearml-serving-triton:8001}
      CLEARML_USE_GUNICORN: ${CLEARML_USE_GUNICORN:-}
      CLEARML_SERVING_NUM_PROCESS: ${CLEARML_SERVING_NUM_PROCESS:-}
      CLEARML_EXTRA_PYTHON_PACKAGES: ${CLEARML_EXTRA_PYTHON_PACKAGES:-}
      AWS_ACCESS_KEY_ID: ${AWS_ACCESS_KEY_ID:-}
      AWS_SECRET_ACCESS_KEY: ${AWS_SECRET_ACCESS_KEY:-}
      AWS_DEFAULT_REGION: ${AWS_DEFAULT_REGION:-}
      GOOGLE_APPLICATION_CREDENTIALS: ${GOOGLE_APPLICATION_CREDENTIALS:-}
      AZURE_STORAGE_ACCOUNT: ${AZURE_STORAGE_ACCOUNT:-}
      AZURE_STORAGE_KEY: ${AZURE_STORAGE_KEY:-}
    depends_on:
      - kafka
      - clearml-serving-triton
    networks:
      - clearml-serving-backend

  clearml-serving-triton:
    image: allegroai/clearml-serving-triton:1.3.1-vllm
    build:
      context: ../
      dockerfile: clearml_serving/engines/triton/Dockerfile.vllm
    container_name: clearml-serving-triton
    restart: unless-stopped
    # optimize performance
    security_opt:
      - seccomp:unconfined
    # ports:
      # - "8001:8001"
    environment:
      CLEARML_WEB_HOST: ${CLEARML_WEB_HOST:-https://app.clear.ml}
      CLEARML_API_HOST: ${CLEARML_API_HOST:-https://api.clear.ml}
      CLEARML_FILES_HOST: ${CLEARML_FILES_HOST:-https://files.clear.ml}
      CLEARML_API_ACCESS_KEY: ${CLEARML_API_ACCESS_KEY}
      CLEARML_API_SECRET_KEY: ${CLEARML_API_SECRET_KEY}
      CLEARML_SERVING_TASK_ID: ${CLEARML_SERVING_TASK_ID:-}
      CLEARML_TRITON_POLL_FREQ: ${CLEARML_TRITON_POLL_FREQ:-1.0}
      CLEARML_TRITON_METRIC_FREQ: ${CLEARML_TRITON_METRIC_FREQ:-1.0}
      CLEARML_EXTRA_PYTHON_PACKAGES: ${CLEARML_EXTRA_PYTHON_PACKAGES:-}      
      AWS_ACCESS_KEY_ID: ${AWS_ACCESS_KEY_ID:-}
      AWS_SECRET_ACCESS_KEY: ${AWS_SECRET_ACCESS_KEY:-}
      AWS_DEFAULT_REGION: ${AWS_DEFAULT_REGION:-}
      GOOGLE_APPLICATION_CREDENTIALS: ${GOOGLE_APPLICATION_CREDENTIALS:-}
      AZURE_STORAGE_ACCOUNT: ${AZURE_STORAGE_ACCOUNT:-}
      AZURE_STORAGE_KEY: ${AZURE_STORAGE_KEY:-}
    depends_on:
      - kafka
    networks:
      - clearml-serving-backend
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['1']
              capabilities: [gpu]

  clearml-serving-statistics:
    image: allegroai/clearml-serving-statistics:latest
    container_name: clearml-serving-statistics
    restart: unless-stopped
    # optimize performance
    security_opt:
      - seccomp:unconfined
    # ports:
      # - "9999:9999"
    environment:
      CLEARML_WEB_HOST: ${CLEARML_WEB_HOST:-https://app.clear.ml}
      CLEARML_API_HOST: ${CLEARML_API_HOST:-https://api.clear.ml}
      CLEARML_FILES_HOST: ${CLEARML_FILES_HOST:-https://files.clear.ml}
      CLEARML_API_ACCESS_KEY: ${CLEARML_API_ACCESS_KEY}
      CLEARML_API_SECRET_KEY: ${CLEARML_API_SECRET_KEY}
      CLEARML_SERVING_TASK_ID: ${CLEARML_SERVING_TASK_ID:-}
      CLEARML_DEFAULT_KAFKA_SERVE_URL: ${CLEARML_DEFAULT_KAFKA_SERVE_URL:-clearml-serving-kafka:9092}
      CLEARML_SERVING_POLL_FREQ: ${CLEARML_SERVING_POLL_FREQ:-1.0}
    depends_on:
      - kafka
    networks:
      - clearml-serving-backend


networks:
  clearml-serving-backend:
    driver: bridge

example.env

CLEARML_WEB_HOST="[REDACTED]"
CLEARML_API_HOST="[REDACTED]"
CLEARML_FILES_HOST="s3://[REDACTED]"
CLEARML_API_ACCESS_KEY="<access_key_here>"
CLEARML_API_SECRET_KEY="<secret_key_here>"
CLEARML_SERVING_TASK_ID="<serving_service_id_here>"
CLEARML_EXTRA_PYTHON_PACKAGES="boto3"
AWS_ACCESS_KEY_ID="[REDACTED]"
AWS_SECRET_ACCESS_KEY="[REDACTED]"
AWS_DEFAULT_REGION="[REDACTED]"

Dockerfile.vllm:

FROM nvcr.io/nvidia/tritonserver:24.02-vllm-python-py3


ENV LC_ALL=C.UTF-8

COPY clearml_serving /root/clearml/clearml_serving
COPY requirements.txt /root/clearml/requirements.txt
COPY README.md /root/clearml/README.md
COPY setup.py /root/clearml/setup.py

RUN python3 -m pip install --no-cache-dir -r /root/clearml/clearml_serving/engines/triton/requirements.txt
RUN python3 -m pip install --no-cache-dir -U pip -e /root/clearml/

# default serving port
EXPOSE 8001

# environment variable to load Task from CLEARML_SERVING_TASK_ID, CLEARML_SERVING_PORT

WORKDIR /root/clearml/
ENTRYPOINT ["clearml_serving/engines/triton/entrypoint.sh"]

Set Triton version

Hi, currently when I use clearml-serving and deploy the Triton serving service, it is always version 21.03. Is there a way to configure or set it to 21.05? I need a feature from 21.05.

serving stuck because of deleted model

My clearml serving deployment is stuck.

No models are registered,

clearml-serving --id 7303713271b941f7a0b45760d45208dd model list
clearml-serving - CLI for launching ClearML serving engine
List model serving and endpoints, control task id=7303713271b941f7a0b45760d45208dd
Info: syncing model endpoint configuration, state hash=d3290336c62c7fb0bc8eb4046b60bc7f
Endpoints:
{}
Model Monitoring:
{}
Canary:
{}

However, old models are still somehow there:

serving-task:
(screenshot)

There is a leftover model that I am unable to remove:
(screenshot)

Triton-Task:

2023-11-20 16:18:40
ClearML Task: created new task id=9b3460b62f9d4015890c7dd2c0064bcf
2023-11-20 15:18:40,452 - clearml.Task - INFO - No repository found, storing script code instead
ClearML results page: http://clearml-webserver:8080/projects/9b4bbac7f1c248e894793f5771005826/experiments/9b3460b62f9d4015890c7dd2c0064bcf/output/log
2023-11-20 16:18:40
configuration args: Namespace(inference_task_id=None, metric_frequency=1.0, name='triton engine', project=None, serving_id='7303713271b941f7a0b45760d45208dd', t_allow_grpc=None, t_buffer_manager_thread_count=None, t_cuda_memory_pool_byte_size=None, t_grpc_infer_allocation_pool_size=None, t_grpc_port=None, t_http_port=None, t_http_thread_count=None, t_log_verbose=None, t_min_supported_compute_capability=None, t_pinned_memory_pool_byte_size=None, update_frequency=1.0)
String Triton Helper service
{'serving_id': '7303713271b941f7a0b45760d45208dd', 'project': None, 'name': 'triton engine', 'update_frequency': 1.0, 'metric_frequency': 1.0, 'inference_task_id': None, 't_http_port': None, 't_http_thread_count': None, 't_allow_grpc': None, 't_grpc_port': None, 't_grpc_infer_allocation_pool_size': None, 't_pinned_memory_pool_byte_size': None, 't_cuda_memory_pool_byte_size': None, 't_min_supported_compute_capability': None, 't_buffer_manager_thread_count': None, 't_log_verbose': None}
Updating local model folder: /models
2023-11-20 15:18:41,106 - clearml.Model - ERROR - Action failed <400/201: models.get_by_id/v1.0 (Invalid model id (no such public or company model): id=0bbba86c98c54610a14350ba69e2e330, company=d1bd92a3b039400cbafc60a7a5b1e52b)> (model=0bbba86c98c54610a14350ba69e2e330)
2023-11-20 15:18:41,107 - clearml.Model - ERROR - Failed reloading task 0bbba86c98c54610a14350ba69e2e330
2023-11-20 15:18:41,115 - clearml.Model - ERROR - Action failed <400/201: models.get_by_id/v1.0 (Invalid model id (no such public or company model): id=0bbba86c98c54610a14350ba69e2e330, company=d1bd92a3b039400cbafc60a7a5b1e52b)> (model=0bbba86c98c54610a14350ba69e2e330)
2023-11-20 15:18:41,115 - clearml.Model - ERROR - Failed reloading task 0bbba86c98c54610a14350ba69e2e330
2023-11-20 16:18:41
Traceback (most recent call last):
  File "clearml_serving/engines/triton/triton_helper.py", line 540, in <module>
    main()
  File "clearml_serving/engines/triton/triton_helper.py", line 532, in main
    helper.maintenance_daemon(
  File "clearml_serving/engines/triton/triton_helper.py", line 237, in maintenance_daemon
    self.model_service_update_step(model_repository_folder=local_model_repo, verbose=True)
  File "clearml_serving/engines/triton/triton_helper.py", line 146, in model_service_update_step
    print("Error retrieving model ID {} []".format(model_id, model.url if model else ''))
  File "/usr/local/lib/python3.8/dist-packages/clearml/model.py", line 341, in url
    return self._get_base_model().uri
  File "/usr/local/lib/python3.8/dist-packages/clearml/backend_interface/model.py", line 496, in uri
    return self.data.uri
AttributeError: 'NoneType' object has no attribute 'uri'

How can a broken task be fixed without deploying a new serving instance?

[ Setup/examples ] Initial Installation Issues - docker compose errors

Hello clearml team,
Congrats on the release of clearml-serving V2 🎉

I really wanted to check it out, and I'm having difficulties running the basic setup and scikit-learn example commands on my side.
I want to run the Installation and the Toy model (scikit-learn) deployment example.

I have a self-hosted ClearML Server deployed with the Helm chart on Kubernetes.

The environment variables for clearml-serving/docker/docker-compose.yml were defined in the myexemple.env file, which starts like this:

CLEARML_WEB_HOST="http://localhost:8080/"
CLEARML_API_HOST="http://localhost:8008/"
CLEARML_FILES_HOST="http://localhost:8081/"

Upon running docker-compose, both clearml-serving-inference and clearml-serving-statistics return errors:

Retrying (Retry(total=236, connect=236, read=240, redirect=240, status=240)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f4065110310>: Failed to establish a new connection: [Errno 111] Connection refused')': /auth.login

I think the issue comes from the communication with the Kafka service, but I do not know how to solve this.
Has anyone encountered this issue and solved it before, since this is the default installation from the docs?

I haven't found any related issues on any of the GitHub repos.
Thanks for the help 🤖

Serving Scikit-Learn models

I couldn't find any backends or configurations that support Scikit-Learn models (e.g. pickle format).

As ClearML integrates with Scikit-Learn, there should be some option to serve such models.

Please add a workaround to support it.
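
(For what it's worth, the toy scikit-learn example mentioned elsewhere in these issues goes through the same Preprocess interface shown in other reports here; a minimal sketch, assuming a pickled model registered with the sklearn engine and illustrative feature names:)

from typing import Any, Callable, Optional, Union


class Preprocess(object):
    def preprocess(self, body: Union[bytes, dict], state: dict,
                   collect_custom_statistics_fn: Optional[Callable[[dict], None]]) -> Any:
        # Turn the JSON body into the 2D feature array the sklearn model's predict() expects.
        return [[body.get("x0"), body.get("x1")]]

    def postprocess(self, data: Any, state: dict,
                    collect_custom_statistics_fn: Optional[Callable[[dict], None]]) -> dict:
        # Wrap the prediction back into a JSON-serializable dict.
        return {"y": data.tolist() if hasattr(data, "tolist") else list(data)}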

Unable to canonicalize address from Kafka

Getting the following error in the Kafka service when trying to deploy to ECS using docker-compose:

Unable to canonicalize address clearml-serving-zookeeper:2181 because it's not resolvable

I'm also wondering: why are the ports commented out in the docker-compose file?

The zookeeper service seemed to be up and running on the ECS console.

Thanks!

Error during endpoint creation with config.pbtxt

I have created an endpoint like this:

clearml-serving --id "<>" model add --engine triton --endpoint 'conformer_joint' --model-id '<>' --preprocess 'preprocess_joint.py' --aux-config "./config.pbtxt"

config.pbtxt file:

name: "conformer_joint"
default_model_filename: "model.bin"
max_batch_size: 16
dynamic_batching {
    max_queue_delay_microseconds: 100
}
input: [
    {
        name: "encoder_outputs"
        data_type: TYPE_FP32
        dims: [
            1,
            640
        ]
    },
    {
        name: "decoder_outputs"
        data_type: TYPE_FP32
        dims: [
            640,
            1
        ]
    }
]
output: [
    {
        name: "outputs"
        data_type: TYPE_FP32
        dims: [
            129
        ]
    }
]

preprocess_joint.py file:

from typing import Any, Union, Optional, Callable

class Preprocess(object):
    def __init__(self):
        # set internal state, this will be called only once. (i.e. not per request)
        pass

    def preprocess(
            self,
            body: Union[bytes, dict],
            state: dict, 
            collect_custom_statistics_fn: Optional[Callable[[dict], None]]
        ) -> Any:
        return body["encoder_outputs"], body["decoder_outputs"]

    def postprocess(
            self,
            data: Any,
            state: dict, 
            collect_custom_statistics_fn: Optional[Callable[[dict], None]]
        ) -> dict:
        return {"data":data.tolist()}

The triton container and inference container show no errors, and I can find this Triton model with the right config.pbtxt in the folder /models/conformer_joint. But when I try to make a request to the model like this:

import numpy as np
import requests
body={
    "encoder_outputs": [np.random.randn(1, 640).tolist()],
    "decoder_outputs": [np.random.randn(640, 1).tolist()]
}
response = requests.post(f"<>/conformer_joint", json=body)
response.json()

I am getting an error:

Error processing request: object of type 'NoneType' has no len()

Model endpoint in serving task:

conformer_joint {
  engine_type = "triton"
  serving_url = "conformer_joint"
  model_id = "<>"
  preprocess_artifact = "py_code_conformer_joint"
  auxiliary_cfg = """name: "conformer_joint"
default_model_filename: "model.bin"
max_batch_size: 16
dynamic_batching {
    max_queue_delay_microseconds: 100
}
input: [
    {
        name: "encoder_outputs"
        data_type: TYPE_FP32
        dims: [
            1,
            640
        ]
    },
    {
        name: "decoder_outputs"
        data_type: TYPE_FP32
        dims: [
            640,
            1
        ]
    }
]
output: [
    {
        name: "outputs"
        data_type: TYPE_FP32
        dims: [
            129
        ]
    }
]
"""
}

The error occurs in the process function of TritonPreprocessRequest (https://github.com/allegroai/clearml-serving/blob/main/clearml_serving/serving/preprocess_service.py#L358C9-L358C81) because the function uses endpoint params like input_name, input_type and input_size. When we create an endpoint as above, these parameters are placed in the auxiliary_cfg attribute instead.

Is there any chance to fix that error and still create an endpoint like the one above?

Triton server keeps crashing

When I try to follow examples/pytorch, the Triton server crashes, i.e. exits with status code -6.

This is the log from the container:

I1004 17:32:10.693691 41 grpc_server.cc:4375] Started GRPCInferenceService at 0.0.0.0:8001
I1004 17:32:10.693968 41 http_server.cc:3075] Started HTTPService at 0.0.0.0:8000
I1004 17:32:10.736035 41 http_server.cc:178] Started Metrics Service at 0.0.0.0:8002
I1004 17:34:10.746305 41 model_repository_manager.cc:994] loading: test_model_pytorch:1
I1004 17:34:10.848495 41 libtorch.cc:1355] TRITONBACKEND_ModelInitialize: test_model_pytorch (version 1)
I1004 17:34:10.852702 41 libtorch.cc:253] Optimized execution is enabled for model instance 'test_model_pytorch'
I1004 17:34:10.852761 41 libtorch.cc:271] Inference Mode is disabled for model instance 'test_model_pytorch'
I1004 17:34:10.852801 41 libtorch.cc:346] NvFuser is not specified for model instance 'test_model_pytorch'
I1004 17:34:10.856732 41 libtorch.cc:1396] TRITONBACKEND_ModelInstanceInitialize: test_model_pytorch (device 0)
terminate called after throwing an instance of 'c10::Error'
  what():  isTuple()INTERNAL ASSERT FAILED at "/opt/pytorch/pytorch/aten/src/ATen/core/ivalue_inl.h":1910, please report a bug to PyTorch. Expected Tuple but got String
Exception raised from toTupleRef at /opt/pytorch/pytorch/aten/src/ATen/core/ivalue_inl.h:1910 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x6c (0x7f6caf24e11c in /opt/tritonserver/backends/pytorch/libc10.so
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xfa (0x7f6caf22bcb4 in /opt/tri
frame #2: c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x53 (0x7f
frame #3: <unknown function> + 0x368a57a (0x7f6cf239657a in /opt/tritonserver/backends/pytorch/libtorch_cpu.so)
frame #4: <unknown function> + 0x368a6e9 (0x7f6cf23966e9 in /opt/tritonserver/backends/pytorch/libtorch_cpu.so)
frame #5: torch::jit::SourceRange::highlight(std::ostream&) const + 0x48 (0x7f6cefe48678 in /opt/tritonserver/backends/pytorch/libtorch_cpu.so)
frame #6: torch::jit::ErrorReport::what() const + 0x2c3 (0x7f6cefe2eeb3 in /opt/tritonserver/backends/pytorch/libtorch_cpu.so)
frame #7: <unknown function> + 0x102b9 (0x7f6cf91f92b9 in /opt/tritonserver/backends/pytorch/libtriton_pytorch.so)
frame #8: <unknown function> + 0x1d4d2 (0x7f6cf92064d2 in /opt/tritonserver/backends/pytorch/libtriton_pytorch.so)
frame #9: <unknown function> + 0x1d9f2 (0x7f6cf92069f2 in /opt/tritonserver/backends/pytorch/libtriton_pytorch.so)
frame #10: TRITONBACKEND_ModelInstanceInitialize + 0x374 (0x7f6cf9206db4 in /opt/tritonserver/backends/pytorch/libtriton_pytorch.so)
frame #11: <unknown function> + 0x307dee (0x7f6cfb143dee in /opt/tritonserver/bin/../lib/libtritonserver.so)
frame #12: <unknown function> + 0x3093b3 (0x7f6cfb1453b3 in /opt/tritonserver/bin/../lib/libtritonserver.so)
frame #13: <unknown function> + 0x301067 (0x7f6cfb13d067 in /opt/tritonserver/bin/../lib/libtritonserver.so)
frame #14: <unknown function> + 0x18a7ca (0x7f6cfafc67ca in /opt/tritonserver/bin/../lib/libtritonserver.so)
frame #15: <unknown function> + 0x1979b1 (0x7f6cfafd39b1 in /opt/tritonserver/bin/../lib/libtritonserver.so)
frame #16: <unknown function> + 0xd6de4 (0x7f6cfa991de4 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
frame #17: <unknown function> + 0x9609 (0x7f6cfae0f609 in /usr/lib/x86_64-linux-gnu/libpthread.so.0)
frame #18: clone + 0x43 (0x7f6cfa67f293 in /usr/lib/x86_64-linux-gnu/libc.so.6)

Signal (6) received.
 0# 0x000055E2DF079299 in tritonserver
 1# 0x00007F6CFA5A3210 in /usr/lib/x86_64-linux-gnu/libc.so.6
 2# gsignal in /usr/lib/x86_64-linux-gnu/libc.so.6
 3# abort in /usr/lib/x86_64-linux-gnu/libc.so.6
 4# 0x00007F6CFA959911 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
 5# 0x00007F6CFA96538C in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
 6# 0x00007F6CFA964369 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
 7# __gxx_personality_v0 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
 8# 0x00007F6CFA761BEF in /usr/lib/x86_64-linux-gnu/libgcc_s.so.1
 9# _Unwind_Resume in /usr/lib/x86_64-linux-gnu/libgcc_s.so.1
10# 0x00007F6CEFA61C49 in /opt/tritonserver/backends/pytorch/libtorch_cpu.so
11# 0x00007F6CF23966E9 in /opt/tritonserver/backends/pytorch/libtorch_cpu.so
12# torch::jit::SourceRange::highlight(std::ostream&) const in /opt/tritonserver/backends/pytorch/libtorch_cpu.so
13# torch::jit::ErrorReport::what() const in /opt/tritonserver/backends/pytorch/libtorch_cpu.so
14# 0x00007F6CF91F92B9 in /opt/tritonserver/backends/pytorch/libtriton_pytorch.so
15# 0x00007F6CF92064D2 in /opt/tritonserver/backends/pytorch/libtriton_pytorch.so
16# 0x00007F6CF92069F2 in /opt/tritonserver/backends/pytorch/libtriton_pytorch.so
17# TRITONBACKEND_ModelInstanceInitialize in /opt/tritonserver/backends/pytorch/libtriton_pytorch.so
18# 0x00007F6CFB143DEE in /opt/tritonserver/bin/../lib/libtritonserver.so
19# 0x00007F6CFB1453B3 in /opt/tritonserver/bin/../lib/libtritonserver.so
20# 0x00007F6CFB13D067 in /opt/tritonserver/bin/../lib/libtritonserver.so
21# 0x00007F6CFAFC67CA in /opt/tritonserver/bin/../lib/libtritonserver.so
22# 0x00007F6CFAFD39B1 in /opt/tritonserver/bin/../lib/libtritonserver.so
23# 0x00007F6CFA991DE4 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
24# 0x00007F6CFAE0F609 in /usr/lib/x86_64-linux-gnu/libpthread.so.0
25# clone in /usr/lib/x86_64-linux-gnu/libc.so.6

configuration args: Namespace(inference_task_id=None, metric_frequency=1.0, name='triton engine', project=None, serving_id='dd756abf5e8b42efab92dfb0cfa57a5e', t_allow_grpc=None, t_buffer_manager_threa
String Triton Helper service
{'serving_id': 'dd756abf5e8b42efab92dfb0cfa57a5e', 'project': None, 'name': 'triton engine', 'update_frequency': 1.0, 'metric_frequency': 1.0, 'inference_task_id': None, 't_http_port': None, 't_http_t

Starting server: ['tritonserver', '--model-control-mode=poll', '--model-repository=/models', '--repository-poll-secs=60.0', '--metrics-port=8002', '--allow-metrics=true', '--allow-gpu-metrics=true']
Info: syncing models from main serving service
reporting metrics: relative time 60 sec
Info: syncing models from main serving service
Updating local model folder: /models
INFO: target config.pbtxt file for endpoint 'test_model_pytorch':

input: [{
    dims: [1, 28, 28]
    data_type: TYPE_FP32
    name: "INPUT__0"
  }]
output: [{
    dims: [-1, 10]
    data_type: TYPE_FP32
    name: "OUTPUT__0"
  }]
backend: "pytorch"

Update model v1 in /models/test_model_pytorch/1
Info: Models updated from main serving service
reporting metrics: relative time 120 sec
Traceback (most recent call last):
  File "clearml_serving/engines/triton/triton_helper.py", line 515, in <module>
    main()
  File "clearml_serving/engines/triton/triton_helper.py", line 507, in main
    helper.maintenance_daemon(
  File "clearml_serving/engines/triton/triton_helper.py", line 248, in maintenance_daemon
    raise ValueError("triton-server process ended with error code {}".format(error_code))
ValueError: triton-server process ended with error code -6
Stream closed EOF for clearml-serving/clearml-serving-triton-85779b957d-hdx7q (clearml-serving-triton)

clearml-serving not working with newer numpy 1.24

I am unable to use clearml-serving for model deployment on my setup.

OS: Ubuntu 22.04 Server LTS
Python: 3.10.6

Steps:

  1. pip install clearml-serving
  2. clearml-serving create --name "serving example"

I get the following error:

Traceback (most recent call last):
  File "/home/user_65s/.local/bin/clearml-serving", line 5, in <module>
    from clearml_serving.__main__ import main
  File "/home/user_65s/.local/lib/python3.10/site-packages/clearml_serving/__main__.py", line 9, in <module>
    from clearml_serving.serving.model_request_processor import ModelRequestProcessor, CanaryEP
  File "/home/user_65s/.local/lib/python3.10/site-packages/clearml_serving/serving/model_request_processor.py", line 18, in <module>
    from .preprocess_service import BasePreprocessRequest
  File "/home/user_65s/.local/lib/python3.10/site-packages/clearml_serving/serving/preprocess_service.py", line 247, in <module>
    class TritonPreprocessRequest(BasePreprocessRequest):
  File "/home/user_65s/.local/lib/python3.10/site-packages/clearml_serving/serving/preprocess_service.py", line 253, in TritonPreprocessRequest
    np.int: 'int_contents',
  File "/home/user_65s/.local/lib/python3.10/site-packages/numpy/__init__.py", line 284, in __getattr__
    raise AttributeError("module {!r} has no attribute "
AttributeError: module 'numpy' has no attribute 'int'. Did you mean: 'inf'?

It appears you are using np.int internally, which has been deprecated since NumPy 1.20:

1: DeprecationWarning: np.int is a deprecated alias for the builtin int. To silence this warning, use int by itself. Doing this will not modify any behavior and is safe. When replacing np.int, you may wish to use e.g. np.int64 or np.int32 to specify the precision. If you wish to review your current use, check the release note link for additional information.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations

When I downgrade to numpy==1.23.* it works.
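
(For reference, the traceback above points at a dtype-to-protobuf-field mapping that still uses the removed np.int alias; a hedged sketch of the kind of change that avoids the error. Only the np.int entry is visible in the traceback; the other entries below are illustrative assumptions.)

import numpy as np

# Before (breaks on numpy >= 1.24, where the np.int alias was removed):
#     np.int: 'int_contents',
#
# After: key the mapping on the builtin int (or an explicit width such as np.int64) instead.
_type_to_field = {
    int: 'int_contents',          # was np.int
    np.int64: 'int64_contents',   # illustrative extra entry; explicit-width aliases still work
    np.float32: 'fp32_contents',  # illustrative extra entry
}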

better explanation of the scalar type and buckets

Hi there, today I struggled with the (--variable-scalar) argument and the buckets in the clearml-serving metrics add command.

I think the documentation could be improved.

I already got help in the ClearML Slack:

A scalar in buckets is simply a histogram. If you have thousands of requests per second, it makes no sense to display every data point. So scalars can be divided into buckets, and for each minute, for example, we can calculate what percentage of total traffic fell in bucket 1, bucket 2, bucket 3, etc. Then we display this histogram as a single column in a heatmap: the Y axis is the buckets, the color is the value (~% of traffic in that bucket), and X is time.
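
(A small numerical illustration of that explanation; the bucket edges and values below are made up, and the --variable-scalar definition they mimic is an assumption:)

import numpy as np

# Hypothetical scalar values reported during one minute (e.g. a model input "x0").
values = np.random.default_rng(0).normal(loc=0.5, scale=0.2, size=1000)

# Bucket edges, mimicking something like --variable-scalar "x0=0,0.25,0.5,0.75,1".
edges = [0.0, 0.25, 0.5, 0.75, 1.0]

counts, _ = np.histogram(values, bins=edges)
percent = 100.0 * counts / len(values)  # values outside [0, 1] fall outside every bucket

# Each minute becomes one heatmap column: Y = bucket, color = % of traffic in that bucket.
for lo, hi, pct in zip(edges[:-1], edges[1:], percent):
    print(f"bucket [{lo:.2f}, {hi:.2f}): {pct:.1f}% of requests")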

torchserve support?

Hello, I see TorchServe engine support mentioned in the Readme but cannot find any way to actually use it. Is it available?

add aux-config parser error

Hello,
I'm trying to add an ONNX model and specify the platform via clearml-serving model add --aux-config, but I get a parser error:
(screenshot: 2022-06-14 17:12:52)
In triton_helper.py the parser expects an int:

(screenshot: 2022-06-14 17:06:46)

thanks for your work on clearml-serving 🙌

Triton Engine did not auto update to new model after retraining

I saw this line in the readme:
"Notice: If we re-run our keras training example and publish a new model in the repository, the engine will automatically update to the new model."

I have created a serving service. However, when I retrain my model with the same project and task name, and publish the model once training is done, the model version deployed in Triton is not updated.

May I know if this is a bug, or did I misunderstand some steps?

Removing model monitoring in endpoint

From the docs, I can see that there are commands to add a model to an endpoint, and also to add model monitoring via the auto-update command. I can't seem to find any command to remove the model monitoring; I can only remove a model.

(screenshot)

Is there no such capability for now, or are the docs just not updated?

Triton docker container failed to start due to unknown error

The inference task was successfully created after launching the serving services (clearml-serving launch --queue default).

However, it seems that the nvidia container failed to start with the following error:

docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request: unknown.

Using a local worker without a GPU, attached to the publicly hosted ClearML server.

ClearML serving design v2

ClearML serving design document v2.0

Goal: Create a simple interface to serve multiple models with scalable serving engines on top of Kubernetes

Design Diagram: Untitled-2022-02-01 (diagram not embedded)

Features

  • Fully continuous model upgrade/configuration capabilities
  • Separate pre/post processing from model inference (serving engine)
  • Support custom python script per endpoint (pre/post processing)
  • Support multiple model inference serving engine instances
  • Support A/B/Canary testing per Endpoint (i.e. test new versions of the model with a probability distribution; see the sketch after this list)
  • Support model monitoring functions
  • Support for 3rd party monitoring plugins
  • Abstract Serving Engine interface
  • REST API with serving engine
  • gRPC interface between pre-processing python code and model inference
    • More efficient encoding than json encode/decode (both compute and network)
  • Performance (i.e. global latency / throughput and model inference latency) logging
    • Optional custom metric reporting
  • Standalone setup for debugging
    • Pre-process (proxy) code (running on host machine) (launching the “Model Inference”)
    • Model inference (serving engine) inside local container
  • Deployment support for Kubernetes
    • Proxy container (with pre-processing code) has kubectl control
    • Serving engine container (model inference) launched by the proxy container
  • Autoscaling inference model engines based on latency
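
(A generic sketch of the "probability distribution" idea behind the A/B/Canary item above, not the project's actual routing code; version names and weights are illustrative:)

import random
from collections import Counter

# Hypothetical canary configuration: endpoint version -> traffic share.
canary_weights = {
    "model_v1": 0.90,  # stable version keeps 90% of traffic
    "model_v2": 0.10,  # new version receives a 10% canary slice
}


def pick_model_version(weights: dict) -> str:
    """Pick a model version at random, proportionally to its weight."""
    versions = list(weights.keys())
    return random.choices(versions, weights=list(weights.values()), k=1)[0]


# Simulate how 10,000 requests would split between the two versions.
print(Counter(pick_model_version(canary_weights) for _ in range(10_000)))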

Modules

  • ClearML serving container
    • Singleton instance, acting as the proxy & load balancer
  • ClearML serving Task
    • Stores configuration of a single instance of the Serving container
      • 3rd party plugins
      • Kubernetes config
      • Serving Engine configuration
      • Models / Endpoints
  • Serving Engine
    • Standalone container interacting with the ClearML serving instance
    • ClearML Sidecar configuring the Serving Engine (real-time) & sending reports back
  • ClearML model repository
    • Unique ID per model
    • Links to model files
    • Links to model pre/post processing code base (git)
    • Supports Tags / Name
    • General purpose key/value meta-data
    • Queryable
  • Configuration CLI
    • Build containers
    • Configure serving system

Usage Example

  • CLI configuring the ClearML serving Task
    • Select initial set of models / endpoints (i.e. endpoint for specific model)
    • Set Kubernetes pod template YAML
      • Job YAML to be used for launching the serving engine container
  • CLI build Kubernetes Job YAML
    • Build the Kubernetes Job YAML to be used to launch the ClearML serving container
    • Add necessary credentials making sure the “ClearML serving container” will be able to launch serving containers
  • Kubectl launching the “ClearML serving container”
    • The “ClearML serving container” will be launching the serving engine containers
  • Once “ClearML serving container” is up, logs are monitored in the ClearML UI
  • Add additional models to a running “ClearML serving container”
    • Provide the “ClearML serving Task”
    • Add/Remove new model UID

Incorrect shape size PyTorch

I was following the tutorial for PyTorch and was able to create an endpoint successfully, but I wasn't able to get an inference result (using both ways) due to a shape mismatch error.

unexpected shape for input 'INPUT__0' for model 'test_model_pytorch'. Expected [1,28,28], got [1,784]

I then tried setting the INPUT__0 shape to 1 784, but that didn't work either. Then I realized that the preprocess function in preprocess.py flattens the data before returning it, which was causing the error. Removing the flatten() resolved my issue.

This also seems to be the case for the keras example.
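
(For reference, a hedged sketch of a preprocess that keeps the [1, 28, 28] layout the endpoint expects instead of flattening; the "image" key and dtype are assumptions, not the tutorial's exact schema:)

from typing import Any, Callable, Optional, Union

import numpy as np


class Preprocess(object):
    def preprocess(self, body: Union[bytes, dict], state: dict,
                   collect_custom_statistics_fn: Optional[Callable[[dict], None]]) -> Any:
        # Keep the [1, 28, 28] shape Triton expects for INPUT__0 -- no .flatten() here.
        image = np.asarray(body["image"], dtype=np.float32)
        return image.reshape(1, 28, 28)

    def postprocess(self, data: Any, state: dict,
                    collect_custom_statistics_fn: Optional[Callable[[dict], None]]) -> dict:
        # The model outputs [-1, 10] logits; return the predicted digit per sample.
        return {"digit": np.argmax(np.asarray(data), axis=-1).tolist()}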

Serving autoscaling strategy

Hello, ClearML team!

I'm trying to understand how serving auto-scaling works.

From readme:

Scalable
Multi model per container
Multi models per serving service
Multi-service support (fully seperated multiple serving service running independently)
Multi cluster support
Out-of-the-box node auto-scaling based on load/usage <---- *

I found that serving has the ability to auto-scale, but in the Helm charts (Triton, for example) I only found replicas: 1 and didn't find an auto-scale implementation anywhere (like this https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/, for example).

Could you please clarify the clearml-serving scaling strategy and where I can find the configuration files?

Thanks in advance.

Managing published versions

Currently once published, the model status remains as "published" and Triton will only use the latest "published" model and unload the previous version.

But this unloading of the model does not align with the "published" state and can be confusing.
May I suggest expanding the function with an unpublish option, so we can explicitly unload model versions in Triton?
This would also allow multiple published versions of a model to be available in Triton.

Triton inference server fails to load checkpointed PyTorch Ignite model

The Triton server is now able to find the local copy of the model weight pt file and attempts to serve it, following fixes in #3.

The following error occurs when the model is served by the Triton Inference server:

Starting Task Execution:

clearml-serving - Nvidia Triton Engine Helper
ClearML results page: https://clearml-server.westeurope.cloudapp.azure.com/projects/779be4f4d83541d786eb839bb062fa93/experiments/364c73e36a454842a314169d78514034/output/log
String Triton Helper service
{'serving_id': 'b978817fa0544b94b2015b420a96f14c', 'project': 'serving', 'name': 'nvidia-triton', 'update_frequency': 10, 'metric_frequency': 1, 't_http_port': None, 't_http_thread_count': None, 't_allow_grpc': None, 't_grpc_port': None, 't_grpc_infer_allocation_pool_size': None, 't_pinned_memory_pool_byte_size': None, 't_cuda_memory_pool_byte_size': None, 't_min_supported_compute_capability': None, 't_buffer_manager_thread_count': None}

Updating local model folder: /models
[INFO]:: URL: cub200_resnet34 Endpoint: ServingService.EndPoint(serving_url='cub200_resnet34', model_ids=['57ed24c1011346d292ecc9e797ccb47e'], model_project=None, model_name=None, model_tags=None, model_config_blob='\n            platform: "pytorch_libtorch"\n            input [\n                {\n                    name: "input_layer"\n                    data_type: TYPE_FP32\n                    dims: [ 3, 224, 224 ]\n                }\n            ]\n            output [\n                {\n                    name: "fc"\n                    data_type: TYPE_FP32\n                    dims: [ 200 ]\n                }\n            ]\n        ', max_num_revisions=None, versions=OrderedDict())
[INFO]:: Model ID: 57ed24c1011346d292ecc9e797ccb47e Version: 1
[INFO]:: Model ID: 57ed24c1011346d292ecc9e797ccb47e Model URL: azure://clearmllibrary/artefacts/Caltech Birds%2FTraining/TRAIN [Network%3A resnet34, Library%3A torchvision] Ignite Train PyTorch CNN on CUB200.8611ada5be6f4bb6ba09cf730ecd2253/models/cub200_resnet34_ignite_best_model_0.pt
2021-06-10 15:20:54,447 - clearml.storage - INFO - Downloading: 5.00MB / 81.72MB @ 18.80MBs from azure://clearmllibrary/artefacts/Caltech Birds%2FTraining/TRAIN [Network%3A resnet34, Library%3A torchvision] Ignite Train PyTorch CNN on CUB200.8611ada5be6f4bb6ba09cf730ecd2253/models/cub200_resnet34_ignite_best_model_0.pt
2021-06-10 15:20:54,730 - clearml.storage - INFO - Downloading: 13.00MB / 81.72MB @ 28.29MBs from azure://clearmllibrary/artefacts/Caltech Birds%2FTraining/TRAIN [Network%3A resnet34, Library%3A torchvision] Ignite Train PyTorch CNN on CUB200.8611ada5be6f4bb6ba09cf730ecd2253/models/cub200_resnet34_ignite_best_model_0.pt
2021-06-10 15:20:54,741 - clearml.storage - INFO - Downloading: 21.00MB / 81.72MB @ 684.91MBs from azure://clearmllibrary/artefacts/Caltech Birds%2FTraining/TRAIN [Network%3A resnet34, Library%3A torchvision] Ignite Train PyTorch CNN on CUB200.8611ada5be6f4bb6ba09cf730ecd2253/models/cub200_resnet34_ignite_best_model_0.pt
2021-06-10 15:20:54,760 - clearml.storage - INFO - Downloading: 29.00MB / 81.72MB @ 426.19MBs from azure://clearmllibrary/artefacts/Caltech Birds%2FTraining/TRAIN [Network%3A resnet34, Library%3A torchvision] Ignite Train PyTorch CNN on CUB200.8611ada5be6f4bb6ba09cf730ecd2253/models/cub200_resnet34_ignite_best_model_0.pt
2021-06-10 15:20:54,791 - clearml.storage - INFO - Downloading: 37.00MB / 81.72MB @ 258.86MBs from azure://clearmllibrary/artefacts/Caltech Birds%2FTraining/TRAIN [Network%3A resnet34, Library%3A torchvision] Ignite Train PyTorch CNN on CUB200.8611ada5be6f4bb6ba09cf730ecd2253/models/cub200_resnet34_ignite_best_model_0.pt
2021-06-10 15:20:54,806 - clearml.storage - INFO - Downloading: 45.00MB / 81.72MB @ 535.17MBs from azure://clearmllibrary/artefacts/Caltech Birds%2FTraining/TRAIN [Network%3A resnet34, Library%3A torchvision] Ignite Train PyTorch CNN on CUB200.8611ada5be6f4bb6ba09cf730ecd2253/models/cub200_resnet34_ignite_best_model_0.pt
2021-06-10 15:20:54,907 - clearml.storage - INFO - Downloading: 53.00MB / 81.72MB @ 79.03MBs from azure://clearmllibrary/artefacts/Caltech Birds%2FTraining/TRAIN [Network%3A resnet34, Library%3A torchvision] Ignite Train PyTorch CNN on CUB200.8611ada5be6f4bb6ba09cf730ecd2253/models/cub200_resnet34_ignite_best_model_0.pt
2021-06-10 15:20:54,963 - clearml.storage - INFO - Downloading: 61.72MB / 81.72MB @ 155.64MBs from azure://clearmllibrary/artefacts/Caltech Birds%2FTraining/TRAIN [Network%3A resnet34, Library%3A torchvision] Ignite Train PyTorch CNN on CUB200.8611ada5be6f4bb6ba09cf730ecd2253/models/cub200_resnet34_ignite_best_model_0.pt
2021-06-10 15:20:54,968 - clearml.storage - INFO - Downloading: 69.72MB / 81.72MB @ 1502.19MBs from azure://clearmllibrary/artefacts/Caltech Birds%2FTraining/TRAIN [Network%3A resnet34, Library%3A torchvision] Ignite Train PyTorch CNN on CUB200.8611ada5be6f4bb6ba09cf730ecd2253/models/cub200_resnet34_ignite_best_model_0.pt
2021-06-10 15:20:54,979 - clearml.storage - INFO - Downloading: 77.72MB / 81.72MB @ 790.76MBs from azure://clearmllibrary/artefacts/Caltech Birds%2FTraining/TRAIN [Network%3A resnet34, Library%3A torchvision] Ignite Train PyTorch CNN on CUB200.8611ada5be6f4bb6ba09cf730ecd2253/models/cub200_resnet34_ignite_best_model_0.pt
2021-06-10 15:20:54,985 - clearml.storage - INFO - Downloaded 81.72 MB successfully from azure://clearmllibrary/artefacts/Caltech Birds%2FTraining/TRAIN [Network%3A resnet34, Library%3A torchvision] Ignite Train PyTorch CNN on CUB200.8611ada5be6f4bb6ba09cf730ecd2253/models/cub200_resnet34_ignite_best_model_0.pt , saved to /clearml_agent_cache/storage_manager/global/e38f6052e6b887337635fc2821a6b5d4.cub200_resnet34_ignite_best_model_0.pt
[INFO] Local path to the model: /clearml_agent_cache/storage_manager/global/e38f6052e6b887337635fc2821a6b5d4.cub200_resnet34_ignite_best_model_0.pt
Update model v1 in /models/cub200_resnet34/1
[INFO] Target Path:: /models/cub200_resnet34/1/e38f6052e6b887337635fc2821a6b5d4.cub200_resnet34_ignite_best_model_0.pt
[INFO] Local Path:: /clearml_agent_cache/storage_manager/global/e38f6052e6b887337635fc2821a6b5d4.cub200_resnet34_ignite_best_model_0.pt
[INFO] New Target Path:: /models/cub200_resnet34/1/model.pt
Starting server: ['tritonserver', '--model-control-mode=poll', '--model-repository=/models', '--repository-poll-secs=600.0', '--metrics-port=8002', '--allow-metrics=true', '--allow-gpu-metrics=true']
I0610 15:20:55.182775 671 metrics.cc:221] Collecting metrics for GPU 0: Tesla P40
I0610 15:20:55.498654 671 libtorch.cc:940] TRITONBACKEND_Initialize: pytorch
I0610 15:20:55.498688 671 libtorch.cc:950] Triton TRITONBACKEND API version: 1.0
I0610 15:20:55.498699 671 libtorch.cc:956] 'pytorch' TRITONBACKEND API version: 1.0
2021-06-10 15:20:55.688775: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
I0610 15:20:55.729429 671 tensorflow.cc:1880] TRITONBACKEND_Initialize: tensorflow
I0610 15:20:55.729458 671 tensorflow.cc:1890] Triton TRITONBACKEND API version: 1.0
I0610 15:20:55.729464 671 tensorflow.cc:1896] 'tensorflow' TRITONBACKEND API version: 1.0
I0610 15:20:55.729473 671 tensorflow.cc:1920] backend configuration:
{}
I0610 15:20:55.731061 671 onnxruntime.cc:1728] TRITONBACKEND_Initialize: onnxruntime
I0610 15:20:55.731085 671 onnxruntime.cc:1738] Triton TRITONBACKEND API version: 1.0
I0610 15:20:55.731095 671 onnxruntime.cc:1744] 'onnxruntime' TRITONBACKEND API version: 1.0
I0610 15:20:55.756821 671 openvino.cc:1166] TRITONBACKEND_Initialize: openvino
I0610 15:20:55.756848 671 openvino.cc:1176] Triton TRITONBACKEND API version: 1.0
I0610 15:20:55.756854 671 openvino.cc:1182] 'openvino' TRITONBACKEND API version: 1.0
I0610 15:20:56.081773 671 pinned_memory_manager.cc:205] Pinned memory pool is created at '0x7f229c000000' with size 268435456
I0610 15:20:56.082099 671 cuda_memory_manager.cc:103] CUDA memory pool is created on device 0 with size 67108864
I0610 15:20:56.083854 671 model_repository_manager.cc:1065] loading: cub200_resnet34:1
I0610 15:20:56.184287 671 libtorch.cc:989] TRITONBACKEND_ModelInitialize: cub200_resnet34 (version 1)
I0610 15:20:56.185272 671 libtorch.cc:1030] TRITONBACKEND_ModelInstanceInitialize: cub200_resnet34 (device 0)

1623338462128 ecm-clearml-compute-gpu-002:gpuall DEBUG I0610 15:20:59.633139 671 libtorch.cc:1063] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0610 15:20:59.633184 671 libtorch.cc:1012] TRITONBACKEND_ModelFinalize: delete model state
E0610 15:20:59.633206 671 model_repository_manager.cc:1242] failed to load 'cub200_resnet34' version 1: Internal: failed to load model 'cub200_resnet34': [enforce fail at inline_container.cc:227] . file not found: archive/constants.pkl
frame #0: c10::ThrowEnforceNotMet(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, void const*) + 0x68 (0x7f23c6279498 in /opt/tritonserver/backends/pytorch/libc10.so)
frame #1: caffe2::serialize::PyTorchStreamReader::getRecordID(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xda (0x7f23a1a23d4a in /opt/tritonserver/backends/pytorch/libtorch_cpu.so)
frame #2: caffe2::serialize::PyTorchStreamReader::getRecord(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x38 (0x7f23a1a23da8 in /opt/tritonserver/backends/pytorch/libtorch_cpu.so)
frame #3: torch::jit::readArchiveAndTensors(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, c10::optional<std::function<c10::StrongTypePtr (c10::QualifiedName const&)> >, c10::optional<std::function<c10::intrusive_ptr<c10::ivalue::Object, c10::detail::intrusive_target_default_null_type<c10::ivalue::Object> > (c10::StrongTypePtr, c10::IValue)> >, c10::optional<c10::Device>, caffe2::serialize::PyTorchStreamReader&) + 0xab (0x7f23a323508b in /opt/tritonserver/backends/pytorch/libtorch_cpu.so)
frame #4: <unknown function> + 0x3c035e5 (0x7f23a32355e5 in /opt/tritonserver/backends/pytorch/libtorch_cpu.so)
frame #5: <unknown function> + 0x3c05fd0 (0x7f23a3237fd0 in /opt/tritonserver/backends/pytorch/libtorch_cpu.so)
frame #6: torch::jit::load(std::shared_ptr<caffe2::serialize::ReadAdapterInterface>, c10::optional<c10::Device>, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&) + 0x1ab (0x7f23a32391eb in /opt/tritonserver/backends/pytorch/libtorch_cpu.so)
frame #7: torch::jit::load(std::istream&, c10::optional<c10::Device>, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&) + 0xc2 (0x7f23a323b332 in /opt/tritonserver/backends/pytorch/libtorch_cpu.so)
frame #8: torch::jit::load(std::istream&, c10::optional<c10::Device>) + 0x6a (0x7f23a323b41a in /opt/tritonserver/backends/pytorch/libtorch_cpu.so)
frame #9: <unknown function> + 0x104a6 (0x7f23c67d44a6 in /opt/tritonserver/backends/pytorch/libtriton_pytorch.so)
frame #10: <unknown function> + 0x12ac4 (0x7f23c67d6ac4 in /opt/tritonserver/backends/pytorch/libtriton_pytorch.so)
frame #11: <unknown function> + 0x13772 (0x7f23c67d7772 in /opt/tritonserver/backends/pytorch/libtriton_pytorch.so)
frame #12: TRITONBACKEND_ModelInstanceInitialize + 0x374 (0x7f23c67d7b34 in /opt/tritonserver/backends/pytorch/libtriton_pytorch.so)
frame #13: <unknown function> + 0x2f8a99 (0x7f24104a8a99 in /opt/tritonserver/bin/../lib/libtritonserver.so)
frame #14: <unknown function> + 0x2f927c (0x7f24104a927c in /opt/tritonserver/bin/../lib/libtritonserver.so)
frame #15: <unknown function> + 0x2f77ec (0x7f24104a77ec in /opt/tritonserver/bin/../lib/libtritonserver.so)
frame #16: <unknown function> + 0x183c00 (0x7f2410333c00 in /opt/tritonserver/bin/../lib/libtritonserver.so)
frame #17: <unknown function> + 0x191581 (0x7f2410341581 in /opt/tritonserver/bin/../lib/libtritonserver.so)
frame #18: <unknown function> + 0xd6d84 (0x7f240fcead84 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #19: <unknown function> + 0x9609 (0x7f2410185609 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #20: clone + 0x43 (0x7f240f9d8293 in /lib/x86_64-linux-gnu/libc.so.6)

I0610 15:20:59.633540 671 server.cc:500] 
+-----------------... (remainder of log truncated)

Originally posted by @ecm200 in #3 (comment)

Configure options for gRPC (Triton server)

Hello! I am trying to play around with the configs for gRPC for the triton server.

I'm using the docker-compose setup, so I'm not sure if the CLI will work for my use case (perhaps passing them as env variables would work?).

For instance, I’d like to set some variables like this:

[('grpc.max_send_message_length', 512 * 1024 * 1024), ('grpc.max_receive_message_length', 512 * 1024 * 1024)]

Is this possible currently? I’m getting an error from gRPC that my payload is more than the limit (8MB instead of 4MB…)

#2
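
(For context, those are standard gRPC channel options; a generic Python sketch of how such options are applied to a gRPC channel. Whether and how clearml-serving exposes them for its internal Triton client is a separate question, and the target address below is an assumption:)

import grpc

# Raise the default 4MB message-size limits on a gRPC channel.
channel_options = [
    ("grpc.max_send_message_length", 512 * 1024 * 1024),
    ("grpc.max_receive_message_length", 512 * 1024 * 1024),
]

channel = grpc.insecure_channel("clearml-serving-triton:8001", options=channel_options)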
