
Host a YOLOv8 model on a SageMaker Endpoint

The aim of this project is to host a YOLOv8* PyTorch model on a SageMaker Endpoint and test it by invoking the endpoint. The project uses AWS CloudFormation/CDK to build the stack; once the stack is created, the SageMaker notebooks it provisions are used to create the endpoint and test it.

(*) NOTE: YOLOv8 is distributed under the GPLv3 license.

For deploying a YOLOv5 TensorFlow model on a SageMaker Endpoint, refer to the GitHub repository and the blog post on YOLOv5 on SageMaker Endpoint.

AWS Architecture:

[Diagram: AWS architecture]

AWS CloudFormation Stack Creation

The AWS CloudFormation Stack can be created using 2 methods: (1) using the CloudFormation template or (2) using the AWS CDK. Both methods are described below:

  1. Create Stack using AWS CloudFormation:

    • Choose Launch Stack and (if prompted) log into your AWS account: Launch Stack
    • Select a unique Stack Name, acknowledge creation of IAM resources, create the stack, and wait a few minutes for it to deploy successfully
      [Screenshots: Step1_StackName, Step2_StackIAM, Step3_StackSuccess]
  2. Create Stack using AWS CDK: To create the PyTorch YOLOv8 stack with the AWS CDK, follow the steps highlighted in yolov8-pytorch-cdk:

$ cd yolov8-pytorch-cdk
$ python3 -m venv .venv              # create a Python virtual environment
$ source .venv/bin/activate          # activate it
$ pip3 install -r requirements.txt   # install the CDK app's dependencies
$ cdk synth                          # synthesize the CloudFormation template
$ cdk bootstrap                      # one-time provisioning of CDK deployment resources
$ cdk deploy                         # deploy the stack

YOLOv8 PyTorch model deployment on Amazon SageMaker Endpoints:

  • From AWS Console, go to Amazon SageMaker Notebook Instances
  • Select the Notebook created by the stack and open it
  • Inside the SageMaker Notebook, navigate to the sm-notebook directory and open both notebooks: 1_DeployEndpoint.ipynb & 2_TestEndpoint.ipynb
    1. 1_DeployEndpoint.ipynb: Download the YOLOv8 model, package the inference code and model into model.tar.gz, upload it to S3, and create and deploy a SageMaker endpoint (a sketch of the deployment call follows this list)
    2. 2_TestEndpoint.ipynb: Test the deployed endpoint by running inference on an image and plotting the output; then clean up the endpoint and hosted model
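For orientation, the deployment step boils down to creating a PyTorchModel and deploying it. The sketch below is an approximation, not the notebook verbatim; the S3 path, framework versions, and instance type are assumptions:

import sagemaker
from sagemaker.pytorch import PyTorchModel
from sagemaker.deserializers import JSONDeserializer

# Minimal sketch, assuming model.tar.gz is already uploaded to S3 and the
# notebook runs with a SageMaker execution role. 'my-bucket' is a placeholder.
model = PyTorchModel(
    model_data='s3://my-bucket/yolov8/model.tar.gz',  # hypothetical path
    entry_point='inference.py',
    framework_version='1.12',   # assumption; check the notebook for the exact version
    py_version='py38',
    role=sagemaker.get_execution_role(),
)
predictor = model.deploy(
    initial_instance_count=1,
    instance_type='ml.m5.4xlarge',
    deserializer=JSONDeserializer(),
)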

Contributors:

charyin, ehabib-ft, scottenriquez, shahromil16


host-yolov8-on-sagemaker-endpoint's Issues

Non-standard packaging of model.pt under code/

Conventionally, the actual model artifacts (e.g. yolov8l.pt) would be packaged directly under the model.tar.gz root and outside of the code/ subfolder, which should contain only the inference code module.

This is important because (AFAIK) re-creating a PyTorchModel with a different source_dir argument will "re-pack" the source artifact to replace the contents of code/, generating a new final model.tar.gz for deployment. Deviating from this pattern is confusing for users who start from this sample and then try to deploy a model trained on SageMaker (where the starting point will likely be a model.tar.gz with a .pt file in the root).

Additionally, the model_dir passed to model_fn points to this extracted root folder, so a more typical model-loading function might look like:

import os
from ultralytics import YOLO

def model_fn(model_dir):
    print("Executing model_fn from inference.py ...")
    # YOLOV8_MODEL holds the weights filename, e.g. 'yolov8l.pt'
    model = YOLO(os.path.join(model_dir, os.environ['YOLOV8_MODEL']))
    return model

I'd recommend refactoring this sample to pull yolov8l.pt up to the root level of model.tar.gz... and maybe even demonstrate how an initial "raw" model.tar.gz containing only the model is "re-packed" to include the inference code module when calling PyTorchModel(source_dir="code", entry_point="inference.py", ....); a sketch of that flow follows.
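As a hedged sketch of the repack flow described above (the bucket, paths, and framework versions here are placeholders, not the repo's actual code):

import tarfile
import sagemaker
from sagemaker.pytorch import PyTorchModel

# 1. A "raw" model.tar.gz containing only the weights at the root:
with tarfile.open('model.tar.gz', 'w:gz') as tar:
    tar.add('yolov8l.pt')

raw_uri = sagemaker.Session().upload_data('model.tar.gz', key_prefix='yolov8')

# 2. The SDK re-packs the archive at deploy time, placing the contents of
#    source_dir under code/ alongside the weights:
model = PyTorchModel(
    model_data=raw_uri,
    source_dir='code',          # local folder with inference.py, requirements.txt
    entry_point='inference.py',
    framework_version='1.12',   # assumption
    py_version='py38',
    role=sagemaker.get_execution_role(),
)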

Why is yolov8l.pt not outside of the code folder?

  1. The author said here
    https://aws.amazon.com/jp/blogs/machine-learning/hosting-yolov8-pytorch-model-on-amazon-sagemaker-endpoints/
    that "The model weights yolov8l.pt file must be outside the code/ directory and the main inference python script inference.py"

  2. But according to the code:

from ultralytics import YOLO
import os, sagemaker, subprocess, boto3
from datetime import datetime

## Choose a model:
model_name = 'yolov8l.pt'

YOLO(model_name)                      # downloads the weights if not already present
os.system(f'mv {model_name} code/.')  # moves the weights INTO code/

bashCommand = "tar -cpzf model.tar.gz code/"
process = subprocess.Popen(bashCommand.split(), stdout=subprocess.PIPE)
output, error = process.communicate()

It seems that the model ends up inside the code folder together with inference.py and requirements.txt.
Is there any conflict between the code and the blog's description?
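For contrast, a packaging step that matched the blog's wording would keep the weights at the archive root. This is an illustrative sketch only, not the repo's code:

import tarfile

# Weights at the root; only inference.py and requirements.txt under code/.
with tarfile.open('model.tar.gz', 'w:gz') as tar:
    tar.add('yolov8l.pt')
    tar.add('code')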

nodejs12.x is no longer supported for creating or updating AWS Lambda functions

Hi @kcsong and @rpshah,

Thanks a lot for this script. I got the following error when clicking your "Launch stack" button in the N. Virginia region:

CustomS3AutoDeleteObjectsCustomResourceProviderHandler

Resource handler returned message: "The runtime parameter of nodejs12.x is no longer supported for creating or updating AWS Lambda functions. We recommend you use the new runtime (nodejs18.x) while creating or updating functions. (Service: Lambda, Status Code: 400, Request ID: 42532834-d584-4dae-8feb-8827ed878a0e)" (RequestToken: ec4fa608-8ac9-5375-8694-db3f03b8e431, HandlerErrorCode: InvalidRequest)


Any chance this can be updated so that I can re-run it?

Thank you,
Michael

prediction_output elements are of type ultralytics.engine.results.Results; requirements.txt needs nvgpu

Hi Team,

In the loop

for result in prediction_output:
    pass

each result is of type ultralytics.engine.results.Results, and trying to access result.keys raises an exception.
In order to get the boxes and confidence scores, we need:

for result in prediction_output:
        boxes = result.boxes.xyxy
        conf = result.boxes.conf

As I checked the documentation, results also have the attributes masks, keypoints and probs. The boxes attribute is an ultralytics.engine.results.Boxes object with attributes such as xyxy, xywh, and xyxyn.
boxes.xyxy and boxes.xywh return tensors, so retrieving plain floats requires calling .item() on the elements.
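A minimal sketch of that extraction, assuming prediction_output is the list of Results returned by a YOLO call (attribute names per the ultralytics docs; exact behavior may vary by version):

# Pull plain Python floats out of the tensors on each Results object.
for result in prediction_output:
    for box, conf in zip(result.boxes.xyxy, result.boxes.conf):
        x1, y1, x2, y2 = (coord.item() for coord in box)  # tensor -> float
        print([x1, y1, x2, y2], conf.item())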

In requirements.txt, we need to add nvgpu; the container AWS uses for the deployed endpoint apparently does not have nvgpu installed.

Can I raise a PR fixing this?

stdout MODEL_LOG - FileNotFoundError: [Errno 2] No such file or directory: '/opt/ml/model/code/best.pt'

I am trying to deploy a YOLOv5 model on SageMaker following this notebook. The endpoint deploys successfully, but when I test it using predictor.predict(payload) it shows this error:


ModelError                                Traceback (most recent call last)
Cell In[196], line 1
----> 1 result = predictor.predict(payload)

File ~/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/sagemaker/base_predictor.py:212, in Predictor.predict(self, data, initial_args, target_model, target_variant, inference_id, custom_attributes, component_name)
    209 if inference_component_name:
    210     request_args["InferenceComponentName"] = inference_component_name
--> 212 response = self.sagemaker_session.sagemaker_runtime_client.invoke_endpoint(**request_args)
    213 return self._handle_response(response)

File ~/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/botocore/client.py:565, in ClientCreator._create_api_method.._api_call(self, *args, **kwargs)
    561 raise TypeError(
    562     f"{py_operation_name}() only accepts keyword arguments."
    563 )
    564 # The "self" in this scope is referring to the BaseClient.
--> 565 return self._make_api_call(operation_name, kwargs)

File ~/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/botocore/client.py:1021, in BaseClient._make_api_call(self, operation_name, api_params)
   1017 error_code = error_info.get("QueryErrorCode") or error_info.get(
   1018     "Code"
   1019 )
   1020 error_class = self.exceptions.from_code(error_code)
-> 1021 raise error_class(parsed_response, operation_name)
   1022 else:
   1023     return parsed_response

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (0) from primary with message "Your invocation timed out while waiting for a response from container primary. Review the latency metrics for each container in Amazon CloudWatch, resolve the issue, and try again.". See the CloudWatch log stream at https://us-east-1.console.aws.amazon.com/cloudwatch


When I looked into the error logs, it is showing:
stdout MODEL_LOG - FileNotFoundError: [Errno 2] No such file or directory: '/opt/ml/model/code/best.pt'

The file structure while creating the tar file is:

model.tar.gz
└── code/
    ├── inference.py
    ├── requirements.txt
    └── best.pt

I have even tried it with best.pt outside the code folder, according to this article: https://aws.amazon.com/blogs/machine-learning/hosting-yolov8-pytorch-model-on-amazon-sagemaker-endpoints/

model.tar.gz
├── code/
│   ├── inference.py
│   └── requirements.txt
└── best.pt

but I still faced the same issue.

ModelError when doing inference

The root cause is that the attribute "keys" of ultralytics.engine.results.Results was changed to "_keys". Please refer to https://docs.ultralytics.com/reference/engine/results/#ultralytics.engine.results.Results

So please change the corresponding part of the output_fn function in inference.py.
Below is my modification:

for result in prediction_output:
    if 'boxes' in result._keys and result.boxes is not None:
        infer['boxes'] = result.boxes.numpy().data.tolist()
    if 'masks' in result._keys and result.masks is not None:
        infer['masks'] = result.masks.numpy().data.tolist()
    if 'keypoints' in result._keys and result.keypoints is not None:
        infer['keypoints'] = result.keypoints.numpy().data.tolist()
    if 'probs' in result._keys and result.probs is not None:
        infer['probs'] = result.probs.numpy().data.tolist()

Initial Cloudformation steps don't work

Preface: I'm an AWS noob, so apologies in advance if there's something obvious I'm missing.

The instructions state that the CloudFormation stack can be created either using the CloudFormation "launch stack" link or by following a series of steps using the AWS CDK. I have been unable to get either of these working. The steps I've tried are detailed below.

Cloudformation "launch stack" template

  1. Click the link
  2. Select a name
  3. Click the button for "acknowledge creation of IAM resources"
  4. Click "create stack"
  5. CloudFormation starts creating the stack and then starts rolling back the changes.
  6. Select "detect root cause". This finds that "CustomS3AutoDeleteObjectsCustomResourceProviderHandler" is the likely problem, and that it fails because the nodejs12.x runtime is no longer supported:

Resource handler returned message: "The runtime parameter of nodejs12.x is no longer supported for creating or updating AWS Lambda functions. We recommend you use the new runtime (nodejs18.x) while creating or updating functions. (Service: Lambda, Status Code: 400, Request ID: d611eb01-007a-4154-9fde-b4b7201028fa)" (RequestToken: d13e5b8c-5193-4889-be3d-92d8cccf76a8, HandlerErrorCode: InvalidRequest)

  7. I inspected the YAML file and changed the runtime to nodejs18.x, then repeated steps 2, 3, & 4 with this updated YAML file. This also fails, with a likely root cause of:

Resource handler returned message: "Error occurred while GetObject. S3 Error Code: PermanentRedirect. S3 Error Message: The bucket is in this region: us-east-1. Please use this region to retry the request (Service: Lambda, Status Code: 400, Request ID: e8e95860-729d-453a-8331-d7c3239eae38)" (RequestToken: 333a794a-9ca8-3bd0-bf94-2e387c4350c4, HandlerErrorCode: InvalidRequest)

I initially tried running CloudFormation in us-east-2 because this is the region where I have the largest available quotas. I'm not sure what us-east-1 bucket is being referenced, nor how to change the file to refer to a us-east-2 bucket instead.

  8. I changed CloudFormation to the us-east-1 region and tried re-running the updated YAML file (with nodejs18.x as the runtime) and got the following error:

The account-level service limit 'ml.m5.4xlarge for notebook instance usage' is 0 Instances, with current utilization of 0 Instances and a request delta of 1 Instances. Please use AWS Service Quotas to request an increase for this quota. If AWS Service Quotas is not available, contact AWS support to request an increase for this quota. (Service: AmazonSageMaker; Status Code: 400; Error Code: ResourceLimitExceeded; Request ID: ab79c7e5-36b7-4ea9-8cc0-bcad41e4293f; Proxy: null)

Given the byzantine AWS quota structure, it's not clear to me exactly which quota(s) I would need increased, nor how many days I'd need to wait for the increase. I've requested an increase of "ml.m5.4xlarge for notebook instance usage" within the SageMaker section; hopefully this is the correct quota and I don't need additional ones (e.g., SageMaker endpoint, SageMaker training, Lambda). I don't see any other references to instance types in the YAML file, so hopefully it'll work if/when the quota is increased.

AWS CDK

The steps seem pretty straightforward, but it's not clear to me where I should be running these commands. I tried using a terminal in a SageMaker JupyterLab space, but got "bash: cdk: command not found" when I ran cdk synth. Is there some other place I should be running these commands?

Thanks in advance for any help you can give,
-J

Image request payloads are wrapped in numpy arrays

Hi folks, nice sample!

Problem description

I found an interesting issue when trying to use this example with Endpoint Data Capture or with Async Endpoints:

The example notebooks use the JSONDeserializer to fetch JSON responses, but leave the predictor serializer as the default (which for PyTorch is NumpySerializer). As a result, the payload passed to the endpoint is not actually JPEG file data, but a JPEG byte array packaged in a NumPy array.

This functionally works, but it means that the data on the wire is kind of an awkward format: If you deploy the endpoint as Async, the object that gets saved to S3 is not actually a JPEG/PNG/etc image, but something that needs a bit of magic to open. Likewise if you set up a real-time endpoint with data capture, the format of the captured data is not ideal.

Suggested updates

I suggest updating the notebooks to use a raw data serializer such as the DataSerializer (which can accept either data or filenames), which will also require slightly tweaking the inference.py input_fn code to parse the image data as expected.
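A hedged sketch of that suggestion, using the SageMaker Python SDK's DataSerializer (the deploy arguments here mirror the earlier sketch and are assumptions, not the notebooks' exact code):

from sagemaker.serializers import DataSerializer
from sagemaker.deserializers import JSONDeserializer

predictor = model.deploy(
    initial_instance_count=1,
    instance_type='ml.m5.4xlarge',
    serializer=DataSerializer(content_type='image/jpeg'),  # raw bytes on the wire
    deserializer=JSONDeserializer(),
)
result = predictor.predict('bus.jpg')  # DataSerializer also accepts file paths

With this change, input_fn in inference.py would receive raw JPEG bytes rather than a serialized NumPy array, so its decoding logic needs the matching tweak the issue mentions.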
