
handwriting_recog_aws

Maintainer: [email protected]

Pipeline Overview

  • This pipeline lets a user test their trained ONNX model via AWS Lambda.
  • The user uploads their ONNX model by placing a model.onnx file directly in an Amazon S3 bucket.
  • The user can then test the model by uploading a digit.png file directly to the same S3 bucket.
  • When digit.png is uploaded to Amazon S3, Lambda is triggered and runs the load_model_infer.py script, which loads model.onnx from the S3 bucket, runs inference on the digit.png file, writes the result to results.txt, and uploads results.txt back to the S3 bucket, where the user can download and view it.
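
For reference, the core of load_model_infer.py behaves roughly like the sketch below. This is a simplified illustration rather than the exact script: it assumes the boto3, numpy, Pillow, and onnxruntime packages (the actual contents of "packages" may differ), the 1 x 1 x 28 x 28 input shape of the MNIST model, and the bucket/file names used throughout this README.

```python
# Hypothetical sketch of load_model_infer.py - names and details are illustrative.
import boto3
import numpy as np
import onnxruntime
from PIL import Image

BUCKET = "<your-unique-s3-bucket-name>"  # must match the bucket you create below
s3 = boto3.client("s3")


def lambda_handler(event, context):
    # The S3 PUT trigger passes the uploaded object's key in the event.
    key = event["Records"][0]["s3"]["object"]["key"]  # e.g. "digit.png"

    # Lambda only allows writes under /tmp.
    s3.download_file(BUCKET, "model.onnx", "/tmp/model.onnx")
    s3.download_file(BUCKET, key, "/tmp/digit.png")

    # Convert the uploaded image to the 1 x 1 x 28 x 28 tensor the MNIST model expects.
    img = Image.open("/tmp/digit.png").convert("L").resize((28, 28))
    x = np.asarray(img, dtype=np.float32).reshape(1, 1, 28, 28)

    # Run inference and pick the highest-scoring digit.
    session = onnxruntime.InferenceSession("/tmp/model.onnx")
    input_name = session.get_inputs()[0].name
    scores = session.run(None, {input_name: x})[0].ravel()
    prediction = int(np.argmax(scores))

    # Write the result to results.txt and upload it back to the bucket.
    with open("/tmp/results.txt", "w") as f:
        f.write("Predicted digit: {}\n".format(prediction))
    s3.upload_file("/tmp/results.txt", BUCKET, "results.txt")
    return {"prediction": prediction}
```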

Pipeline Data Flow Diagram

(Data flow diagram image.)

Building the Pipeline Overview

  1. Create "module.zip"
  2. Create an AWS Account
  3. Setup Amazon S3 (AWS)
  4. Setup Amazon Lambda (AWS)
  5. Upload model.onnx and module.zip to Amazon S3
  6. Load module.zip (load_model_infer.py and packages) to Amazon Lambda
  7. Run-through the pipeline

Create "module.zip"

  1. Locate the "load_model_infer.py" file and the "packages" directory in the "files_for_amazon_s3" directory.
  2. Open "load_model_infer.py" and change "<your-unique-s3-bucket-name>" to an S3 bucket name that is (literally) globally unique - we suggest a combination of your unique name and "umich.edu".
  3. Zip "load_model_infer.py" and "packages" together, and rename the .zip file "module.zip" - zip the two items themselves, not a folder containing them (see the sketch below and the note at the end of this README).
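
If you prefer to script the zipping step, a minimal sketch using Python's built-in zipfile module is shown below (run it from inside "files_for_amazon_s3"); the important detail is that load_model_infer.py and the packages sit at the root of the archive rather than inside a parent folder.

```python
# Hypothetical packaging script - run from inside the "files_for_amazon_s3" directory.
import os
import zipfile

with zipfile.ZipFile("module.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    # The script must be at the archive root so Lambda can import "load_model_infer".
    zf.write("load_model_infer.py")
    # Add the bundled dependencies, keeping their paths relative to "packages".
    for root, _dirs, files in os.walk("packages"):
        for name in files:
            zf.write(os.path.join(root, name))
```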

Create an AWS Account

  1. Follow the steps on this page: https://aws.amazon.com/premiumsupport/knowledge-center/create-and-activate-aws-account/

Setup Amazon S3 (AWS)

  1. Go to the "AWS Management Console", search for "S3" in the search field, and select it to go to the Amazon S3 console page.


  2. Click "Create bucket" and specify the bucket name. This walkthrough uses "handwriting-recog-aws-bucket"; whatever name you choose has to match the bucket name you set in load_model_infer.py, since the script looks for that bucket when accessing Amazon S3. Click "Next" - leave everything else at their default values.


  1. Click "Next" - leave everything at their default values.


  4. Un-check the checkbox next to "Block all public access" - this is never a good idea for production systems, but it is done here only to make the pipeline setup go more smoothly. After un-checking the checkbox, click "Next".


  1. Click "Create bucket" - leave everything at their default values.


  6. You should be able to see the name of the bucket that you just created. Click on it to go to the bucket's console/edit page.


  7. This is your Amazon S3 bucket's console/edit page - this is where you will later upload module.zip, model.onnx, and digit.png, and where you will receive the results.txt file from Amazon Lambda.


Setup Amazon Lambda (AWS)

  1. Go to the "AWS Management Console", search for "Lambda" in the search field, and select it to go to the Amazon Lambda console page.


  2. Click "Create function", specify a function name (in this example, we named ours "handwriting-recog-aws-function"), set "Runtime" to "Python 3.6", and click "Create function" - leave everything else at their default values.


  3. Once you click "Create function", you will be taken to the Lambda console page. Start configuring by first clicking "+ Add trigger".


  4. Once you are at the "Add trigger" page, search for S3 in the search field and click on it.


  1. Once the "Trigger configuration" options appear, start configuring them - set "Bucket" to the Amazon S3 bucket you created previously, "Event type" to "PUT", leave "Prefix" blank, "Suffix" to ".png", and check the "Enable trigger" checkbox. Click "Add".


  6. Once you are back at the Lambda console page, click on the "handwriting-recog-aws-function" button at the center of the page to go to the Lambda function editing/configuring page.


  7. Once the Lambda function editing/configuring page has loaded, scroll down to the section named "Execution role" and click on the "View the <default-role-name> role on the IAM console" link that is highlighted in blue.


  8. Once you are at the IAM console page, click on the "Attach Policies" button. Then search for the "AmazonS3FullAccess" policy and click on the "Attach Policy" button - this gives Lambda access to Amazon S3's contents, such as buckets and the objects inside them.
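
For reference, the same policy attachment can be done programmatically with boto3; a small sketch (the role name is whatever default role Lambda created for your function) is:

```python
# Hypothetical boto3 equivalent of attaching AmazonS3FullAccess to the function's role.
import boto3

boto3.client("iam").attach_role_policy(
    RoleName="<default-role-name>",  # the role shown under "Execution role"
    PolicyArn="arn:aws:iam::aws:policy/AmazonS3FullAccess",
)
```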


  1. Once "Execution role" is set, take a look at the "Basic settings" section. Under "Timeout" set the value to 10 seconds. Afterwards, click "Save" button located at the upper right of the Lambda console page.


  10. Awesome! Your Amazon Lambda setup is complete - we will come back to this console later to load module.zip (which contains the load_model_infer.py script and the Python packages needed to run it) from Amazon S3 into Lambda.


Upload model.onnx and module.zip to Amazon S3

  1. To upload model.onnx and module.zip to Amazon S3 (both files are located inside the "files_for_amazon_s3" directory), go back to your Amazon S3 bucket's console/edit page and drag-and-drop model.onnx and module.zip onto the main console page. A pop-up will appear asking you to configure the upload, but there is nothing more to configure, so just click through all of the "Next"s and click "Upload".
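
If you would rather script the upload than drag-and-drop, a boto3 sketch that does the same thing (assuming the files are in your current directory and the bucket name is the one you chose earlier) is:

```python
# Hypothetical upload script - the file names match those in "files_for_amazon_s3".
import boto3

bucket = "handwriting-recog-aws-bucket"  # the bucket name you chose earlier
s3 = boto3.client("s3")
s3.upload_file("model.onnx", bucket, "model.onnx")
s3.upload_file("module.zip", bucket, "module.zip")
```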


  2. Once the files are uploaded, you are completely done with the Amazon S3 setup! :) We will come back to this page later to upload digit.png and run the whole pipeline.


Load module.zip (load_model_infer.py and packages) to Amazon Lambda

  1. To load module.zip (which contains load_model_infer.py and the necessary Python packages) into Lambda, go back to your Lambda function console page. Once you are there, scroll down to the "Function code" section and select "Upload a file from Amazon S3" under "Code entry type". Afterwards, set the "Handler" field to "load_model_infer.lambda_handler" - the value has the form <file name>.<function name>, so Lambda will call the lambda_handler function defined in load_model_infer.py.


  2. Once you select "Upload a file from Amazon S3", you will see the field "Amazon S3 link URL". To get this URL, go back to your Amazon S3 bucket console, click module.zip (this will take you to another page that has details about module.zip), and click "Copy path".


  3. After you get the "Amazon S3 link URL" for module.zip, go back to your Lambda function console page and paste it into the "Amazon S3 link URL" field. Click the "Save" button located at the upper right of the Lambda console to load module.zip into your Lambda function.


  4. Awesome! Your module.zip (load_model_infer.py and packages) is now loaded into your Lambda function. This method of uploading code is used whenever the code and its packages/dependencies are too large to upload or edit directly in the Lambda console.


Run-through the pipeline (Finally!)

Now that your AWS pipeline setup is complete, it's time to test it.

  1. To begin, locate digit.png in the "test_images" directory. Note: before you run an image through the pipeline, make sure to rename the image to "digit.png".
  2. Once you have located digit.png, go to your Amazon S3 bucket console and drag-and-drop digit.png onto the console. Upload the image the way you uploaded files previously - do not configure anything, just click through the "Next"s and click "Upload".


  3. After you upload it, refresh the Amazon S3 bucket console page - you should be able to see results.txt within your Amazon S3 bucket.


  4. Download results.txt to take a look at the inference results made by your ONNX model - in this example, the model predicted that the uploaded digit.png is "1" with probability 0.573457, which is (somewhat) correct!
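
The exact format of results.txt depends on load_model_infer.py, but the reported probability is presumably obtained by applying a softmax to the ten raw scores the MNIST model outputs; a sketch of that step, assuming numpy and dummy scores, is:

```python
# Illustrative only: turning the model's 10 raw output scores into class probabilities.
import numpy as np


def softmax(scores):
    exps = np.exp(scores - np.max(scores))  # subtract the max for numerical stability
    return exps / exps.sum()


# "scores" would be the 10-element output of session.run; dummy values are shown here.
scores = np.array([0.1, 2.3, 0.4, 0.0, -0.5, 0.2, 0.1, 0.9, 0.3, -1.0], dtype=np.float32)
probs = softmax(scores)
prediction = int(np.argmax(probs))     # index of the most likely digit
confidence = float(probs[prediction])  # the probability reported in results.txt
```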

(Screenshots of the results.txt contents and the digit.png image.)

Things to consider while creating and after creating this pipeline

After you've created this pipeline and started testing with different images, you will notice that the results are not as good as the results from this demo: https://microsoft.github.io/onnxjs-demo/#/mnist. That demo uses the exact same model that we are using - the test images included in this repository were also generated from that demo website. Why does this happen?

  1. This could be caused by the way we currently convert raw input images into 28 x 28. In this pipeline, the image conversion happens in "load_model_infer.py" - take a look at the script and see if anything could be changed to get better results. Once you find a way to improve it, test it locally, then deploy the new script to Lambda and test it there.
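
One place to start: the demo site draws light strokes on a dark background, while an arbitrary digit.png may be a dark digit on a light background, so a more careful conversion than a plain resize can help. A hedged sketch of a typical MNIST-style preprocessing routine (assuming Pillow and numpy; the function name is illustrative) follows.

```python
# Illustrative preprocessing - not the exact code in load_model_infer.py.
import numpy as np
from PIL import Image


def prepare_digit(path):
    img = Image.open(path).convert("L")      # grayscale
    arr = np.asarray(img, dtype=np.float32)

    # MNIST digits are light strokes on a dark background; invert if the input is not.
    if arr.mean() > 127:
        arr = 255.0 - arr

    # Resize to 28 x 28 and scale pixel values to [0, 1].
    img = Image.fromarray(arr.astype(np.uint8)).resize((28, 28), Image.LANCZOS)
    x = np.asarray(img, dtype=np.float32) / 255.0

    # Shape expected by the MNIST ONNX model: (batch, channel, height, width).
    return x.reshape(1, 1, 28, 28)
```

Further refinements worth trying include cropping to the digit's bounding box and centering it, which is closer to how the original MNIST images were prepared.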

As you attempt to modify the script and re-deploy it to AWS, you will quickly realize there are various things to consider. You will start to have questions like "how do I test the new script locally?" or "why do I get this error when I upload my new module.zip?". Here are two of the most common questions that come up.

  1. "How do I test the new script locally?"
  • There are many ways to do this, but the key thing is that you need to test the script in an environment that is exactly the same or very similar to the environment in Amazon Lambda. Amazon Lambda is a Linux environment, so what I, personally, did was creating a Linux Docker container and run the script in it - this is one of the fastest ways to get Linux environment setup and up and running quickly.
  1. "Why do I get an error when I try to upload a new module.zip" to Lambda via Amazon S3?"
  • This could be happening for various reasons, but one of the most common causes for this issue is the way you are "zipping" or "compressing" the files (load_model_infer.py and model.onnx) - you should not zip the folder that the files are in, but you should select the two files and zip them instead.
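
As mentioned above, one convenient way to exercise the script outside AWS is to import it and call its handler directly with a hand-built S3 event. A sketch is below; it assumes the script and its packages are importable in your (ideally Linux/Docker) environment and that your local AWS credentials can reach your bucket.

```python
# Hypothetical local test harness - invokes the handler the way the S3 trigger would.
from load_model_infer import lambda_handler

# Minimal stand-in for the event Amazon S3 sends when digit.png is PUT into the bucket.
fake_event = {
    "Records": [
        {"s3": {"bucket": {"name": "handwriting-recog-aws-bucket"},
                "object": {"key": "digit.png"}}}
    ]
}

print(lambda_handler(fake_event, None))
```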
