kserve / modelmesh-runtime-adapter

Unified runtime-adapter image of the sidecar containers which run in the modelmesh pods

License: Apache License 2.0

Dockerfile 1.74% Makefile 1.10% Go 93.26% Shell 3.61% Python 0.29%

modelmesh-runtime-adapter's Introduction

KServe

KServe provides a Kubernetes Custom Resource Definition for serving predictive and generative machine learning (ML) models. It aims to solve production model serving use cases by providing high-abstraction interfaces for TensorFlow, XGBoost, scikit-learn, PyTorch, and Hugging Face Transformer/LLM models using standardized data plane protocols.

It encapsulates the complexity of autoscaling, networking, health checking, and server configuration to bring cutting-edge serving features like GPU Autoscaling, Scale to Zero, and Canary Rollouts to your ML deployments. It enables a simple, pluggable, and complete story for production ML serving, including prediction, pre-processing, post-processing, and explainability. KServe is being used across various organizations.

For more details, visit the KServe website.

KFServing has been rebranded to KServe since v0.7.

Why KServe?

  • KServe is a standard, cloud-agnostic Model Inference Platform for serving predictive and generative AI models on Kubernetes, built for highly scalable use cases.
  • Provides a performant, standardized inference protocol across ML frameworks, including the OpenAI specification for generative models.
  • Supports modern serverless inference workloads with request-based autoscaling, including scale-to-zero on CPU and GPU.
  • Provides high scalability, density packing, and intelligent routing using ModelMesh.
  • Simple and pluggable production serving for inference, pre/post processing, monitoring, and explainability.
  • Advanced deployments for canary rollouts, pipelines, and ensembles with InferenceGraph.

Learn More

To learn more about KServe, how to use the various supported features, and how to participate in the KServe community, please follow the KServe website documentation. Additionally, we have compiled a list of presentations and demos to dive into various details.

๐Ÿ› ๏ธ Installation

Standalone Installation

  • Serverless Installation: KServe by default installs Knative for serverless deployment of InferenceServices.
  • Raw Deployment Installation: Compared to Serverless Installation, this is a more lightweight installation. However, this option does not support canary deployment or request-based autoscaling with scale-to-zero.
  • ModelMesh Installation: You can optionally install ModelMesh to enable high-scale, high-density and frequently-changing model serving use cases.
  • Quick Installation: Install KServe on your local machine.

Kubeflow Installation

KServe is an important add-on component of Kubeflow; please learn more from the Kubeflow KServe documentation. Check out the following guides for running on AWS or on OpenShift Container Platform.

💡 Roadmap

🧰 Developer Guide

โœ๏ธ Contributor Guide

๐Ÿค Adopters

modelmesh-runtime-adapter's People

Contributors

aluu317, amnpandey, anhuong, chinhuang007, ckadner, ddelange, golanlevy, israel-hdez, kserve-oss-bot, lizzzcai, njhill, pvaneck, rafvasq, spolti, tjohnson31415, xvnyv

modelmesh-runtime-adapter's Issues

v0.11.0: Model directory creation and loading error

Model loading fails with the v0.11 images when the model uses an HTTP storageUri:

Error message:
Failed to create model directory and load model {"modelId": $model_name, "error": "Error calling stat on /models/$model_name__isvc-xxx/. : start /models/$model_name/... not a directory"

Downgrading to the v0.10 image eliminates the problem.

Follow up of doc strings pattern

From https://github.com/kserve/modelmesh-runtime-adapter/pull/68/files#r1403679093:

If we update the function doc strings, we should make sure they read correctly, i.e. lower case the word "Returns" here. Same for the changed doc strings below.

As you mentioned, we could do a sweep in a separate PR, to fix the doc comment grammar, but I don't think that would get picked up since there is always more important work to do :-)

I would either leave these doc changes out of this PR, so at least they read well. Or, in this PR, fix the grammar to bring it up to style as in the examples here: https://www.digitalocean.com/community/tutorials/how-to-write-comments-in-go#doc-comments

[Feature Request] Reimplement Load Model of Triton and MLServer

Good afternoon,

Thank you very much for creating this amazing framework.

I see a potentially very valuable feature for doing inference with GPU models. The implementations of the Triton and MLServer adapters use the CalcMemCapacity method to return the model size.

This method returns the model size based on disk size. However, for models executed on GPU it would be better to return the increase in VRAM. Do you think this is doable? @tjohnson31415 @rafvasq @njhill @pvaneck

I am glad to help if you think it is doable. I don't have experience in Go, but I can learn.
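For reference, the disk-size heuristic boils down to something like the following minimal sketch (names and structure are illustrative, not the adapter's exact code); a GPU-aware version would replace the disk walk with a VRAM measurement:

package triton

import (
	"os"
	"path/filepath"
)

// estimateModelSizeBytes walks the model directory, sums the sizes of the
// regular files it finds, and applies the configured multiplier. This mirrors
// the disk-based heuristic described above.
func estimateModelSizeBytes(modelDir string, multiplier float64) (int64, error) {
	var diskSize int64
	err := filepath.Walk(modelDir, func(path string, info os.FileInfo, walkErr error) error {
		if walkErr != nil {
			return walkErr
		}
		if info.Mode().IsRegular() {
			diskSize += info.Size()
		}
		return nil
	})
	if err != nil {
		return 0, err
	}
	return int64(float64(diskSize) * multiplier), nil
}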

MLServer model-settings.json uri conversion

I am experiencing some strange behaviour regarding the uri conversion of the parameters.uri field in the model-settings.json file. I have created a model-settings.json file that is placed in the same S3 folder as my model. In most cases, the uri conversion works as expected.

For example, with my model filename as my-model.joblib, I provided the following model-settings.json file

{
  "implementation": "mlserver_sklearn.SKLearnModel",
  "name": "sklearn-example",
  "parameters": {
    "uri": "my-model.joblib"
  }
}

and the name and parameters.uri fields were correctly rewritten to these values when I checked the file from within the serving runtime pod:

{
  "implementation": "mlserver_sklearn.SKLearnModel",
  "name": "sklearn-mnist-2__isvc-a6929dd134",
  "parameters": {
    "uri": "/models/_mlserver_models/sklearn-mnist-2__isvc-a6929dd134/my-model.joblib"
  }
}

However, the peculiar thing is that if I were to name my model file as mnist.joblib instead and change my model-settings.json to the following,

{
  "implementation": "mlserver_sklearn.SKLearnModel",
  "name": "sklearn-example",
  "parameters": {
    "uri": "mnist.joblib"
  }
}

then my model would fail to load due to this error in the mlserver container:

2023-03-27 09:46:20,423 [mlserver.grpc] ERROR - Invalid URI specified for model sklearn-mnist-2__isvc-a6929dd134 (/models/_mlserver_models/sklearn-mnist-2__isvc-a6929dd134/models/sklearn-mnist-2__isvc-a6929dd134/sklearn-model/mnist.joblib)
Traceback (most recent call last):
  File "/venv/lib/python3.11/site-packages/mlserver/grpc/utils.py", line 44, in _inner
    return await f(self, request, context)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/mlserver/grpc/model_repository.py", line 24, in _inner
    return await f(self, request, context)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/mlserver/grpc/model_repository.py", line 48, in RepositoryModelLoad
    await self._handlers.load(request.model_name)
  File "/venv/lib/python3.11/site-packages/mlserver/handlers/model_repository.py", line 67, in load
    model = await self._model_registry.load(model_settings)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/mlserver/registry.py", line 283, in load
    return await self._models[model_settings.name].load(model_settings)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/mlserver/registry.py", line 141, in load
    await self._load_model(new_model)
  File "/venv/lib/python3.11/site-packages/mlserver/registry.py", line 158, in _load_model
    await model.load()
  File "/venv/lib/python3.11/site-packages/mlserver_sklearn/sklearn.py", line 34, in load
    model_uri = await get_model_uri(
                ^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/mlserver/utils.py", line 44, in get_model_uri
    raise InvalidModelURI(settings.name, full_model_path)
mlserver.errors.InvalidModelURI: Invalid URI specified for model sklearn-mnist-2__isvc-a6929dd134 (/models/_mlserver_models/sklearn-mnist-2__isvc-a6929dd134/models/sklearn-mnist-2__isvc-a6929dd134/sklearn-model/mnist.joblib)

I have checked the model-settings.json file in the serving runtime pod and it appears that the parameters.uri value has indeed been set to /models/_mlserver_models/sklearn-mnist-2__isvc-a6929dd134/models/sklearn-mnist-2__isvc-a6929dd134/sklearn-model/mnist.joblib.
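For comparison, here is a minimal sketch (function and variable names are hypothetical, not the adapter's actual code) of how one would expect a relative parameters.uri to be resolved: simply join it onto the runtime's per-model directory.

package mlserver

import "path/filepath"

// resolveModelSettingsURI illustrates the expected behaviour: a relative uri
// from model-settings.json is joined directly onto the per-model directory,
// while absolute uris are left untouched.
func resolveModelSettingsURI(modelDir, rawURI string) string {
	if filepath.IsAbs(rawURI) {
		return rawURI
	}
	return filepath.Join(modelDir, rawURI)
}

With modelDir set to /models/_mlserver_models/sklearn-mnist-2__isvc-a6929dd134, both my-model.joblib and mnist.joblib would resolve to a file directly under that directory, so the extra models/<modelId>/sklearn-model/ path segments in the failing case suggest the uri is being joined against a different base than in the working case.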

I am currently using MLServer v1.2.3 and kserve v0.9.0.

May I know if anyone is able to reproduce this? If so, are there any clues as to what the issue might be? And if not, what might I be doing wrong that is causing this error?

Azure Blob Storage support

We should add Azure Blob storage provider support for pullman.

This will help fill the storage provider gaps between KServe and ModelMesh.

Some relevant files regarding KServe's Azure provider implementation:

https://github.com/kserve/kserve/tree/master/docs/samples/storage/azure
https://github.com/kserve/kserve/blob/3ac7982bea93dfb3519ef1cd562497d170f1e3ed/python/kserve/kserve/storage.py#L216
https://github.com/kserve/kserve/blob/3ac7982bea93dfb3519ef1cd562497d170f1e3ed/pkg/credentials/azure/azure_secret.go
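As a starting point, a pullman Azure provider would essentially need to download blobs into the local model directory. A minimal, hedged sketch using the azure-sdk-for-go azblob client (the function name, connection-string credential flow, and surrounding pullman wiring are assumptions, not the final design):

package azureprovider

import (
	"context"
	"os"
	"path/filepath"

	"github.com/Azure/azure-sdk-for-go/sdk/storage/azblob"
)

// downloadBlob fetches a single blob into localDir. A full pullman provider
// would also list blobs under a prefix and map credentials from the storage
// secret, similar to the KServe references above.
func downloadBlob(ctx context.Context, connectionString, container, blobName, localDir string) error {
	client, err := azblob.NewClientFromConnectionString(connectionString, nil)
	if err != nil {
		return err
	}
	f, err := os.Create(filepath.Join(localDir, filepath.Base(blobName)))
	if err != nil {
		return err
	}
	defer f.Close()
	// DownloadFile streams the blob contents directly into the local file.
	_, err = client.DownloadFile(ctx, container, blobName, f, nil)
	return err
}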

Reduce size of runtime-adapter image (exclude Python/tensorflow to convert keras models)

The current image is very large (2.14 GB), which slows down the predictor's startup.

Please correct me if I'm wrong, but the only reason the adapter needs to install TensorFlow is to convert Keras models to TensorFlow models, which seems odd to do at runtime rather than in advance. See:

From the Triton adapter:

func convertKerasToTF(kerasFile string, targetPath string, ctx context.Context, loggr logr.Logger) error {
	cmd := exec.Command("python", "/opt/scripts/tf_pb.py", kerasFile, targetPath)

And from the Dockerfile:

# install python to convert keras to tf
pip install tensorflow

COPY --from=build /opt/app/model-mesh-triton-adapter/scripts/tf_pb.py /opt/scripts/

If we remove this option, we can remove the TensorFlow installation, and since Python is needed only for that, the entire Python installation as well.
This reduces the image size from 2.14 GB to 256 MB.

Can we just remove it? If not, can we have two images, the original one and a new slim one?

S3 downloader has a hardcoded limit of 100 files

Currently the S3 downloader has a hardcoded limit of 100 files (see: https://github.com/kserve/modelmesh-runtime-adapter/blob/2d5bb69e9ed19efd74fbe6f8b76ec2e970702e3c/pullman/storageproviders/s3/downloader.go#L79C3-L79C27).

This means that any model containing more than 100 files gets cut off at that arbitrary point, so the model is only partially copied and subsequently fails at runtime. For example, when using a model like the argos-translate model with many languages, you can exceed the 100-file limit.

1.7132796006805687e+09	DEBUG	Triton Adapter.Triton Adapter Server	found objects to download	{"type": "s3", "cacheKey": "s3|xyz", "path": "translate", "count": 100}

This limit seems arbitrary and should be made configurable.
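One way to remove the cap entirely (rather than just raising or configuring it) would be to paginate the listing. A minimal sketch, assuming an aws-sdk-go-v2 client rather than whatever the downloader currently uses:

package s3provider

import (
	"context"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

// listAllObjects pages through ListObjectsV2 results so models with more than
// 100 files are listed completely instead of being truncated.
func listAllObjects(ctx context.Context, client *s3.Client, bucket, prefix string) ([]string, error) {
	var keys []string
	paginator := s3.NewListObjectsV2Paginator(client, &s3.ListObjectsV2Input{
		Bucket: aws.String(bucket),
		Prefix: aws.String(prefix),
	})
	for paginator.HasMorePages() {
		page, err := paginator.NextPage(ctx)
		if err != nil {
			return nil, err
		}
		for _, obj := range page.Contents {
			keys = append(keys, *obj.Key)
		}
	}
	return keys, nil
}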

Use the addHeader tool to standardize the license headers

The goal of this issue is to add a new step that takes care of the license headers as part of the fmt make goal, to keep them consistent across all files where they are present.

This new step will have:

  • a standalone make goal: make add-header
  • it will be called as part of make fmt

Prerequisites:

  • the headers should not have tabs, only spaces.

Tool to use:

  • go install github.com/google/addlicense@latest

Makefile call:

.PHONY: addheaders
addheaders:
	./scripts/addheaders.sh

Feature request: support IAM Roles for Service Accounts

Is your feature request related to a problem? If so, please describe.

Currently, the S3 storage provider uses static credentials pulled from Kubernetes config. This works great for on-premise Kubernetes clusters, but for cluster admins running in AWS EKS the preferred method of authenticating to AWS APIs is IAM Roles for Service Accounts (IRSA). The kserve controller added support for IRSA in 2021 for non-ModelMesh InferenceServices; it would be great if we could use IRSA with ModelMesh as well.

Describe your proposed solution

What I would propose is instead of failing early when the access_key_id is missing, simply allow the S3 client to infer credentials from the environment using the default credential provider chain if the creds are missing from the config.
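A minimal sketch of the proposed fallback, assuming an aws-sdk-go-v2 based client (the actual downloader's config structure may differ): only install a static credentials provider when the secret actually contains keys, and otherwise let the default provider chain, which includes IRSA's web identity token, take over.

package s3provider

import (
	"context"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/credentials"
)

// loadAWSConfig uses static credentials when provided, and otherwise falls
// back to the default credential provider chain (env vars, shared config,
// IRSA web identity tokens, instance metadata, ...).
func loadAWSConfig(ctx context.Context, region, accessKeyID, secretAccessKey string) (aws.Config, error) {
	opts := []func(*config.LoadOptions) error{config.WithRegion(region)}
	if accessKeyID != "" && secretAccessKey != "" {
		opts = append(opts, config.WithCredentialsProvider(
			credentials.NewStaticCredentialsProvider(accessKeyID, secretAccessKey, "")))
	}
	return config.LoadDefaultConfig(ctx, opts...)
}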

Describe alternatives you have considered

For now we will use static credentials since they are the only option. We also considered using a local minio instance as a proxy to S3 since minio does support IRSA, but we decided that was too complex.

Additional context

For context, see this user guide from AWS.

Runtime adapter puller incorrectly estimates model size using PVC

Hi team,

Great work with the PVC integration to modelmesh. It is highly valuable to us and we managed to configure it.

However, it seems that the model size calculation (which heuristically relies on the model size on disk) is incorrect, even though it used to work when we used S3 storage.
The model size was 498706131 bytes using S3, and now it is only 52 bytes using PVC.

We suspect that the puller is estimating the symlink size and not the actual model file size.
The puller is using filepath.Walk, which according to the docs, does not follow symbolic links.
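A minimal sketch of a symlink-aware variant (illustrative only, not the puller's actual code): resolve the root path and stat symlinked files so that the size of the link target is counted rather than the link itself.

package puller

import (
	"os"
	"path/filepath"
)

// diskSizeFollowingLinks sums file sizes under root. filepath.Walk reports
// symlinks via Lstat, so a symlinked model file would otherwise be counted as
// the size of the link (tens of bytes) instead of the target.
func diskSizeFollowingLinks(root string) (int64, error) {
	// Resolve the root first in case the model path itself is a symlink into
	// the PVC mount.
	resolved, err := filepath.EvalSymlinks(root)
	if err != nil {
		return 0, err
	}
	var total int64
	err = filepath.Walk(resolved, func(path string, info os.FileInfo, walkErr error) error {
		if walkErr != nil {
			return walkErr
		}
		if info.Mode()&os.ModeSymlink != 0 {
			// Stat follows the link; count the target's size for regular files.
			if target, statErr := os.Stat(path); statErr == nil && target.Mode().IsRegular() {
				total += target.Size()
			}
			return nil
		}
		if info.Mode().IsRegular() {
			total += info.Size()
		}
		return nil
	})
	return total, err
}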

Adapter logs:

INFO Triton Adapter Starting Triton Adapter Server 
{
	"adapter_config": {
		"Port": 8085,
		"TritonPort": 8001,
		"TritonContainerMemReqBytes": 16106127360,
		"TritonMemBufferBytes": 1073741824,
		"CapacityInBytes": 15032385536,
		"MaxLoadingConcurrency": 10,
		"ModelLoadingTimeoutMS": 90000,
		"DefaultModelSizeInBytes": 536870912,
		"ModelSizeMultiplier": 3,
		"RuntimeVersion": "23.06-py3",
		"LimitModelConcurrency": 0,
		"RootModelDir": "/models/_triton_models",
		"UseEmbeddedPuller": true
	}
}

DEBUG Triton Adapter.Triton Adapter Server The PVC directory is set {
	"type": "pvc",
	"cacheKey": "pvc|",
	"pvcDir": "/pvc_mounts/model-store-claim"
}


DEBUG Triton Adapter.Triton Adapter Server The model path is set 
{
	"type": "pvc",
	"cacheKey": "pvc|",
	"fullModelPath": "/pvc_mounts/model-store-claim/[model-path]/[model-name]"
}


INFO	Triton Adapter.Triton Adapter Server.Load Model	Setting 'SizeInBytes' to a multiple of model disk size	
{
	"model_id": "onnx_v1_6534700162730997825",
	"SizeInBytes": 156,
	"disk_size": 52,
	"multiplier": 3
}

incorrect download for triton model with non-empty config.pbtxt

modelmesh-serving's model-types/advanced-configuration.md shows this is supposed to work as expected, but it doesn't seem to for me.

The Problem

For Triton, a model directory consists of at least two files (in most cases) that need to be pulled.

models
└── coco-onnx
    ├── 1
    │   └── model.onnx
    └── config.pbtxt

Here, if we set the Predictor's storage.path to 's3://models/coco-onnx', it puts the coco-onnx model inside the modelId directory

/models                                             #<-- model root
└── yolo-onnx__ksp-f3a4aa93aa                       #<-- modelId
    └── coco-onnx
        ├── 1
        │   └── model.onnx
        └── config.pbtxt

which Triton cannot load, since it expects it to be

/models
└── yolo-onnx__ksp-f3a4aa93aa
    ├── 1
    │   └── model.onnx
    └── config.pbtxt

instead.

Expected Solution

When storage.path points to a model folder that has a config.pbtxt in it, the contents of that folder should be copied directly into the <modelId> folder (e.g. yolo-onnx__ksp-f3a4aa93aa) instead of the <modelId>/<modelName> folder, as sketched below.
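A minimal sketch of that behaviour (function and directory parameter names are illustrative, not the actual puller code):

package puller

import (
	"os"
	"path/filepath"
)

// promoteTritonLayout checks whether the downloaded storage path is itself a
// Triton model folder (it contains config.pbtxt) and, if so, moves its
// contents directly into the <modelId> directory instead of nesting them
// under an extra <modelName> folder.
func promoteTritonLayout(downloadedDir, modelIDDir string) error {
	if _, err := os.Stat(filepath.Join(downloadedDir, "config.pbtxt")); err != nil {
		// Not a model folder; keep the existing layout.
		return nil
	}
	entries, err := os.ReadDir(downloadedDir)
	if err != nil {
		return err
	}
	if err := os.MkdirAll(modelIDDir, 0o755); err != nil {
		return err
	}
	for _, e := range entries {
		// Move config.pbtxt and the version directories (e.g. "1/") up a level.
		if err := os.Rename(filepath.Join(downloadedDir, e.Name()), filepath.Join(modelIDDir, e.Name())); err != nil {
			return err
		}
	}
	return nil
}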

config.pbtxt used

name: "coco-onnx"
max_batch_size:4
optimization { execution_accelerators {
  gpu_execution_accelerator : [ {
    name : "tensorrt"
    parameters { key: "precision_mode" value: "FP32" }
    parameters { key: "max_workspace_size_bytes" value: "1073741824" }}
  ]
}}
input [
  {
    name: "input"
    data_type: TYPE_UINT8
    dims: [ 3 , -1, -1]
  }
]
output [
  {
    name: "output"
    data_type: TYPE_FP32
    dims: [ -1, -1 ]
  },
]

Predictor definition used

kubectl apply -f - <<EOF
apiVersion: serving.kserve.io/v1alpha1
kind: Predictor
metadata:
  name: yolo-onnx
spec:
  modelType:
    name: custom                   #<-- custom since I was trying to see if putting onnx was the issue. neither works
  runtime:
    name: triton-2.x
  path: triton-yolo-models/coco-onnx
  storage:
    s3:
      secretKey: localMinIO
      bucket: triton-models
EOF

Update go-toolset from 1.17 to 1.19

This task will require a few changes:

  • bump go-toolset to 1.19
  • update the golangci-lint from v1.43.0 to v1.51.1
    • align the deprecated linters
  • io/ioutil calls are deprecated and have moved to the os and io packages (see the sketch after this list).
    • updates are also required where method signatures have changed:
      • ioutil.ReadDir returns []os.FileInfo while os.ReadDir returns []os.DirEntry
  • the Go 1.19 compiler and linter complain about:
    • package modelmesh-runtime-adapter/model-serving-puller/server
      server/server.go:23:2: use of internal package github.com/kserve/modelmesh-runtime-adapter/internal/util not allowed
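For the ioutil migration specifically, the typical change looks like the following sketch (a generic example of the pattern, not a specific file in this repo):

package util

import "os"

// Before (deprecated io/ioutil):
//   infos, err := ioutil.ReadDir(dir)   // returns []os.FileInfo
//   for _, info := range infos { size += info.Size() }
//
// After, using os.ReadDir, which returns the lighter-weight []os.DirEntry:
func totalSize(dir string) (int64, error) {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return 0, err
	}
	var size int64
	for _, entry := range entries {
		// DirEntry does not carry size information; call Info() when needed.
		info, err := entry.Info()
		if err != nil {
			return 0, err
		}
		if !entry.IsDir() {
			size += info.Size()
		}
	}
	return size, nil
}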

A change will be required for this internal module.

Leaving a question here: should it be renamed to something else? If so, any ideas?
If we shouldn't remove it, what are the other options?

Compatibility matrix of runtime adapters and serving runtimes

Hi ModelMesh team,

I found here that the model management API is not finalized:

Note that this is currently subject to change, but we will try to ensure that any changes are backwards-compatible or at least will require minimal change on the runtime side.

ModelMesh currently supports multiple ServingRuntimes. I want to check whether there is a compatibility matrix (similar to the one here) available for users to see which versions of each runtime are supported by ModelMesh. If not, would it be possible to provide one? Thanks.

Support post predictor deployment hook

Deploying an ensemble model, however, is more complex than what model-mesh-triton-adapter handles.

I had tried to discuss it on Triton's GitHub: #4118

A fair solution that avoids complicating model-mesh-triton-adapter is to support running a post-deployment-hook.sh in the model directory, if it exists; the user can then take care of it by adding the logic explained in #4118.

Solution

<storage-path>/
├── config.pbtxt
├── post-deployment-hook.sh
└── <version>/
    └── <model-data>

Once the model is copied and the triton-adapter has done its thing, if a post-deployment-hook.sh exists in the model directory, it should be executed in that directory, for example as sketched below.
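A minimal sketch of what the adapter-side check could look like (the function name, hook filename handling, and shell invocation are assumptions, not a final design):

package triton

import (
	"os"
	"os/exec"
	"path/filepath"
)

// runPostDeploymentHook executes post-deployment-hook.sh from the model
// directory, if it exists, with the model directory as working directory.
func runPostDeploymentHook(modelDir string) error {
	hook := filepath.Join(modelDir, "post-deployment-hook.sh")
	if _, err := os.Stat(hook); os.IsNotExist(err) {
		return nil // no hook provided, nothing to do
	}
	cmd := exec.Command("/bin/sh", hook)
	cmd.Dir = modelDir
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd.Run()
}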

In my case, it would take care of converting my ensemble model M1 (ensemble of M11, M12 and M13)

/models/_triton_models
└── M1
    ├── config.pbtxt
    ├── post-deployment-hook.sh
    ├── M11
    │   ├── 1
    │   │   └── model.dali
    │   └── config.pbtxt
    ├── M12
    │   ├── 1
    │   │   └── model.onnx
    │   └── config.pbtxt
    └── M13
        ├── 1
        │   └── model.py
        └── config.pbtxt

into

/models/_triton_models
├── M1
│   ├── config.pbtxt
│   ├── post-deployment-hook.sh
│   ├── M11
│   │   ├── 1
│   │   │   └── model.dali
│   │   └── config.pbtxt
│   ├── M12
│   │   ├── 1
│   │   │   └── model.onnx
│   │   └── config.pbtxt
│   └── M13
│       ├── 1
│       │   └── model.py
│       └── config.pbtxt
├── M11 -> M1/M11
├── M12 -> M1/M12
└── M13 -> M1/M13

So that the Triton server can load the ensemble model M1, and also load its constituent models M11, M12, and M13.

Triton adapter with PVC mounts

I'm using the Triton adapter with PVC mounts on OpenShift 4.12.

The Triton adapter seems not to have any parameter to set the pvcMountBase, and the default mount point does not exist in the adapter.

Inference service definition:

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: gpt
  annotations:
    serving.kserve.io/deploymentMode: ModelMesh
spec:
  predictor:
    model:
      modelFormat:
        name: pytorch
      runtime: triton-2.x
      storageUri: pvc://models/gpt4/

Log of the triton adapter:

[rosa@bastion ~]$ oc logs modelmesh-serving-triton-2.x-76d757c4fd-g7cwn -c triton-adapter -f
1.683314414790224e+09   INFO    Triton Adapter  Starting Triton Adapter Server  {"adapter_config": {"Port":8085,"TritonPort":8001,"TritonContainerMemReqBytes":1073741824,"TritonMemBufferBytes":134217728,"CapacityInBytes":939524096,"MaxLoadingConcurrency":1,"ModelLoadingTimeoutMS":90000,"DefaultModelSizeInBytes":1000000,"ModelSizeMultiplier":1.25,"RuntimeVersion":"21.06.1-py3","LimitModelConcurrency":0,"RootModelDir":"/models/_triton_models","UseEmbeddedPuller":true}}
1.6833144147902968e+09  INFO    Triton Adapter.Triton Adapter Server    Connecting to Triton... {"port": 8001}
1.6833144147905552e+09  INFO    Triton Adapter.Triton Adapter Server    Initializing Puller     {"Dir": "/models"}
1.6833144147905667e+09  INFO    Triton Adapter.Triton Adapter Server    Triton runtime adapter started
1.683314414790646e+09   INFO    Triton Adapter.Triton Adapter Server.client-cache       starting clean up of cached clients
1.6833144147922897e+09  INFO    Triton Adapter  Adapter will run at port        {"port": 8085, "Triton port": 8001}
1.683314414792354e+09   INFO    Triton Adapter  Adapter gRPC Server registered, now serving
1.6833144263663614e+09  INFO    Triton Adapter.Triton Adapter Server    Using runtime version returned by Triton        {"version": "2.11.0"}
1.683314426366414e+09   INFO    Triton Adapter.Triton Adapter Server    runtimeStatus   {"Status": "status:READY capacityInBytes:939524096 maxLoadingConcurrency:1 modelLoadingTimeoutMs:90000 defaultModelSizeInBytes:1000000 runtimeVersion:\"2.11.0\" methodInfos:{key:\"inference.GRPCInferenceService/ModelInfer\" value:{idInjectionPath:1}} methodInfos:{key:\"inference.GRPCInferenceService/ModelMetadata\" value:{idInjectionPath:1}}"}
1.683314431781367e+09   INFO    Triton Adapter.Triton Adapter Server.Load Model Using model type        {"model_id": "gpt__isvc-205815200d", "model_type": "pytorch"}
1.6833144317814991e+09  DEBUG   Triton Adapter.Triton Adapter Server    Reading storage credentials
1.6833144317815368e+09  DEBUG   Triton Adapter.Triton Adapter Server    creating new repository client  {"type": "pvc", "cacheKey": "pvc|"}
1.683314431781578e+09   ERROR   Triton Adapter.Triton Adapter Server.Load Model Failed to pull model from storage       {"model_id": "gpt__isvc-205815200d", "error": "rpc error: code = Unknown desc = Failed to pull model from storage due to error: could not process pull command: unable to create repository of type 'pvc': the PVC mount base '/pvc_mounts' doesn't exist: stat /pvc_mounts: no such file or directory"}
github.com/kserve/modelmesh-runtime-adapter/internal/proto/mmesh._ModelRuntime_LoadModel_Handler
        /opt/app-root/src/internal/proto/mmesh/model-runtime_grpc.pb.go:181
google.golang.org/grpc.(*Server).processUnaryRPC
        /remote-source/deps/gomod/pkg/mod/google.golang.org/[email protected]/server.go:1301
google.golang.org/grpc.(*Server).handleStream
        /remote-source/deps/gomod/pkg/mod/google.golang.org/[email protected]/server.go:1642
google.golang.org/grpc.(*Server).serveStreams.func1.2
        /remote-source/deps/gomod/pkg/mod/google.golang.org/[email protected]/server.go:938

Enable archive extraction for HTTP storage provider

Currently, with the HTTP storage provider, we don't yet support archive format extractions (e.g. tar.gz, zip), so its use is limited to all-in-one model files (e.g. .joblib files).

This should be investigated and supported so that users can pass in an archive containing all their model assets.
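For the tar.gz case, the extraction itself is straightforward with the standard library; a minimal sketch (without the content-type detection and zip handling the provider would also need):

package httpprovider

import (
	"archive/tar"
	"compress/gzip"
	"io"
	"os"
	"path/filepath"
	"strings"
)

// untarGz extracts a .tar.gz archive into destDir, skipping entries that
// would escape the destination directory.
func untarGz(archivePath, destDir string) error {
	f, err := os.Open(archivePath)
	if err != nil {
		return err
	}
	defer f.Close()
	gz, err := gzip.NewReader(f)
	if err != nil {
		return err
	}
	defer gz.Close()
	tr := tar.NewReader(gz)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
		target := filepath.Join(destDir, hdr.Name)
		if !strings.HasPrefix(target, filepath.Clean(destDir)+string(os.PathSeparator)) {
			continue // skip path traversal attempts
		}
		switch hdr.Typeflag {
		case tar.TypeDir:
			if err := os.MkdirAll(target, 0o755); err != nil {
				return err
			}
		case tar.TypeReg:
			if err := os.MkdirAll(filepath.Dir(target), 0o755); err != nil {
				return err
			}
			out, err := os.OpenFile(target, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, os.FileMode(hdr.Mode))
			if err != nil {
				return err
			}
			if _, err := io.Copy(out, tr); err != nil {
				out.Close()
				return err
			}
			out.Close()
		}
	}
}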

Triton RuntimeStatus.MethodInfos is missing ModelStreamInfer

Triton provides an extension to the standard gRPC inference API for streaming (inference.GRPCInferenceService/ModelStreamInfer); this extension is required to use the vLLM backend with Triton.
However, the Triton runtime adapter currently does not advertise the existence of this gRPC method, and trying to call it results in an error (inference.GRPCInferenceService/ModelStreamInfer: UNIMPLEMENTED: Method not found or not permitted: inference.GRPCInferenceService/ModelStreamInfer).

To resolve this issue, I think the ModelStreamInfer method must be added here:

mis := make(map[string]*mmesh.RuntimeStatusResponse_MethodInfo)
mis[tritonServiceName+"/ModelInfer"] = &mmesh.RuntimeStatusResponse_MethodInfo{IdInjectionPath: path1}
mis[tritonServiceName+"/ModelMetadata"] = &mmesh.RuntimeStatusResponse_MethodInfo{IdInjectionPath: path1}
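That is, something along these lines (assuming the streaming method uses the same id injection path as the unary call, which would need to be verified):

mis[tritonServiceName+"/ModelStreamInfer"] = &mmesh.RuntimeStatusResponse_MethodInfo{IdInjectionPath: path1}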

models-dir exceeds the limit "1536Mi"

We have a 40 GB model that we are trying to run in ModelMesh.

The adapter crashes:

27m         Warning   FailedPreStopHook   pod/modelmesh-serving-triton-2.x-76d757c4fd-tbgpp    Exec lifecycle hook ([/opt/kserve/mmesh/stop.sh wait]) for Container "mm" in Pod "modelmesh-serving-triton-2.x-76d757c4fd-tbgpp_gpt(fb99c2ce-505f-4846-b140-99d482a63b1b)" failed - error: command '/opt/kserve/mmesh/stop.sh wait' exited with 137: , message: "waiting for litelinks process to exit after server shutdown triggered\n"`

We can see this storage limit log:

$ oc get pod modelmesh-serving-triton-2.x-76d757c4fd-g7cwn -oyaml | less
    message: 'Usage of EmptyDir volume "models-dir" exceeds the limit "1536Mi". '
    phase: Failed

We could not find a config option to change this limit.
