
microsoftdocs / pipelines-azureml


Example Azure Pipeline to train and deploy a machine learning model using the Azure Machine Learning service

License: Creative Commons Attribution 4.0 International

Python 100.00%

pipelines-azureml's Introduction

Note

This repo uses Azure Machine Learning Python SDK v1 and is not actively maintained. For Azure Machine Learning Python SDK v2 examples, see https://github.com/Azure/azureml-examples.

Introduction

This repo shows an E2E training and deployment pipeline with Azure Machine Learning's CLI. For more info, please visit Azure Machine Learning CLI documentation.

This example requires some familiarity with Azure Pipelines or GitHub Actions. For more information, see here.

Instructions

Detailed Instructions

First, fork (or clone) the repository to your own GitHub account so that you can modify the pipelines. Then follow these instructions to get the whole setup and demo up and running:

📄 Detailed step-by-step setup instructions 📄

Short Instructions

If you are familiar with Azure Machine Learning and Azure DevOps, you can follow these shortened instructions:

  1. Fork or clone this repo
  2. Create an Azure Machine Learning workspace named aml-demo in a resource group named aml-demo
  3. Create a new project in Azure DevOps/Pipelines
  4. Go to Project settings, select Service connections, and create a new connection of type Azure Resource Manager. Select Service principal (automatic) and scope it to the resource group of your Machine Learning workspace. Name it azmldemows. For more details see here or follow the tutorial.
  5. Create a new pipeline for the project and point it to the pipelines/diabetes-train-and-deploy.yml file in your forked GitHub repo. This file defines an example pipeline.
  6. Modify pipelines/diabetes-train-and-deploy.yml and change the ml-rg variable to the Azure resource group that contains your workspace. You may also change the ml-ws variable to the name of your Azure Machine Learning service workspace.
  7. Run the pipeline.

Declare variables for CI/CD pipeline

To use an existing Azure Machine Learning workspace, change the following variables in the example pipeline pipelines/diabetes-train-and-deploy.yml:

 - ml-ws-connection: 'azmldemows'  # Workspace Service Connection name
 - ml-ws: 'aml-demo'               # AML Workspace name
 - ml-rg: 'aml-demo'               # AML resource Group name
 - ml-ct: 'cpu-cluster-1'          # AML Compute cluster name
 - ml-path: 'models/diabetes'      # Model directory path in repo
 - ml-exp: 'exp-test'              # Experiment name
 - ml-model-name: 'diabetes-model' # Model name
 - ml-aks-name: 'aks-prod'         # AKS cluster name

Run CLI scripts to create training compute, train model, register model, deploy model

You can also manually emulate the example pipeline on your machine by running the following commands (make sure to substitute the variables from above):

az extension add -n azure-cli-ml

cd models/diabetes/
az ml folder attach -w $(ml-ws) -g $(ml-rg)
az ml computetarget create amlcompute -n $(ml-ct) --vm-size STANDARD_D2_V2 --max-nodes 1
az ml run submit-script -c config/train --ct $(ml-ct) -e $(ml-exp) -t run.json train.py
az ml model register -n $(ml-model-name) -f run.json --asset-path outputs/ridge_0.95.pkl -t model.json
az ml model deploy -n diabetes-qa-aci -f model.json --ic config/inference-config.yml --dc config/deployment-config-aci.yml --overwrite
az ml computetarget create aks --name $(ml-aks-name) --cluster-purpose DevTest
az ml model deploy --name diabetes-prod-aks --ct $(ml-aks-name) -f model.json --ic config/inference-config.yml --dc config/deployment-config-aks.yml  --overwrite
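
The commands above use the Azure Pipelines macro form $(name). In a bash shell, $(...) is command substitution, so the lines cannot be pasted verbatim and the names have to be replaced with real values first. The following is a minimal Python sketch of that substitution, not part of the repo: the helper name expand_macros is hypothetical, and the values are copied from the variables list in this README.

```python
import re

# Values copied from the variables list in this README; change them to match
# your own workspace. The dictionary itself is an illustration, not repo code.
VARIABLES = {
    "ml-ws": "aml-demo",
    "ml-rg": "aml-demo",
    "ml-ct": "cpu-cluster-1",
    "ml-exp": "exp-test",
    "ml-model-name": "diabetes-model",
    "ml-aks-name": "aks-prod",
}

# Matches the $(name) macro form used in the commands above.
MACRO = re.compile(r"\$\(([A-Za-z0-9_.-]+)\)")


def expand_macros(command: str, variables: dict) -> str:
    """Replace each $(name) with its value; raise KeyError for an unknown name."""
    return MACRO.sub(lambda match: variables[match.group(1)], command)


if __name__ == "__main__":
    print(expand_macros("az ml folder attach -w $(ml-ws) -g $(ml-rg)", VARIABLES))
```

Raising on an unknown name, rather than leaving the macro in place, keeps a half-substituted command from reaching the shell.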

Further notes

If you want to scope your project to your Azure Machine Learning service workspace, you can install the Machine Learning DevOps extension in your Azure DevOps project.

pipelines-azureml's People

Contributors

blackmist, csiebler, jpe316, juliakm, microsoft-github-policy-service[bot], microsoftopensource, msftgits, pksvv, syntaxc4


pipelines-azureml's Issues

Running h2o.ai in Azure ML (Installing Java is a must)

The mcr.microsoft.com/azureml/base:0.2.4 image is minimal and does not include Java, so I tried a few approaches to install it.

  1. Adding a custom base Dockerfile:
script: train.py
arguments: []
framework: Python
environment:
  python:
    userManagedDependencies: false
    interpreterPath: python
    condaDependenciesFile: train-env.yml
  docker:
    enabled: true
    baseDockerfile: Dockerfile

Returns error:

Output from dependency scanning: fatal: not a git repository (or any parent up to mount point /)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).

  2. Adding an argument to the docker command. According to this documentation and this one as well, I can pass an argument to the docker command, so I tried the following:
script: train.py
arguments: []
framework: Python
environment:
  python:
    userManagedDependencies: false
    interpreterPath: python
    condaDependenciesFile: config/train-conda.yml
  docker:
    enabled: true
    baseImage: mcr.microsoft.com/azureml/base:0.2.4
    arguments: ["--run","apt-get install default-jdk"] 

I also tried arguments: "apt-get install default-jdk".

Since there is no documentation about this, I'm having trouble installing Java in the environment. Any help would be appreciated.

Error in Attach folder to workspace step

Hi,
When I run the pipeline, it fails at the Attach folder to workspace step, shown here followed by the error:

- task: AzureCLI@2
  displayName: 'Attach folder to workspace'
  inputs:
    azureSubscription: $(ml-ws-connection)
    workingDirectory: $(ml-path)
    scriptLocation: inlineScript
    scriptType: 'bash'
    inlineScript: 'az ml folder attach -w $(ml-ws) -g $(ml-rg)'

ERROR: ProjectSystemException:
Message: {
"error_details": {
"error": {
"code": "AuthorizationFailed",
"message": "The client 'a43e0215-c079-499e-b242-2c8cdc19e0ec' with object id 'a43e0215-c079-499e-b242-2c8cdc19e0ec' does not have authorization to perform action 'Microsoft.MachineLearningServices/workspaces/read' over scope '/subscriptions/#######-####-####-####-###########/resourceGroups/aml-demo/providers/Microsoft.MachineLearningServices/workspaces/aml-demo' or the scope is invalid. If access was recently granted, please refresh your credentials."
}
},
"status_code": 403,
"url": "https://management.azure.com/subscriptions/ce55f75a-7c5d-4393-ac9e-601083781d51/resourceGroups/aml-demo/providers/Microsoft.MachineLearningServices/workspaces/aml-demo?api-version=2020-01-01"
}
InnerException None
ErrorResponse
{
"error": {
"message": "{\n "error_details": {\n "error": {\n "code": "AuthorizationFailed",\n "message": "The client 'a43e0215-c079-499e-b242-2c8cdc19e0ec' with object id 'a43e0215-c079-499e-b242-2c8cdc19e0ec' does not have authorization to perform action 'Microsoft.MachineLearningServices/workspaces/read' over scope '/subscriptions/#######-####-####-####-###########/resourceGroups/aml-demo/providers/Microsoft.MachineLearningServices/workspaces/aml-demo' or the scope is invalid. If access was recently granted, please refresh your credentials."\n }\n },\n "status_code": 403,\n "url": "https://management.azure.com/subscriptions/#######-####-####-####-###########/resourceGroups/aml-demo/providers/Microsoft.MachineLearningServices/workspaces/aml-demo?api-version=2020-01-01\"\n}"
}
}
##[error]Script failed with exit code: 1

Testing the model

My deployments to AKS and ACI completed successfully. How can I test whether the services are running as expected?
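
One common way to check a deployed web service is to send it a scoring request over HTTP. The following is a minimal sketch using only the Python standard library, not something taken from this repo. It rests on several assumptions: the scoring URI is a placeholder you must replace with your service's URI, the {"data": [...]} payload shape and the ten features per row are guesses about the scoring script, and the Bearer key header applies only if key authentication is enabled on the service.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri, rows, api_key=None):
    """Build a POST request carrying the rows as a JSON body.

    The {"data": rows} shape is an assumption about what the scoring
    script expects; adjust it to match your own score.py.
    """
    body = json.dumps({"data": rows}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Assumed to be needed only when key authentication is enabled.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(scoring_uri, data=body, headers=headers, method="POST")


def score(scoring_uri, rows, api_key=None, timeout=30):
    """Send the request and return the decoded JSON response."""
    request = build_scoring_request(scoring_uri, rows, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Placeholder URI and made-up feature values, for illustration only.
    sample_rows = [[0.0] * 10]
    print(score("http://<your-service-host>/score", sample_rows))
```

A 200 response with a prediction for each input row suggests the service is up and the model loaded; an error status is a reason to look at the service logs.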

Retry pipeline and/or task on failure

I use the Python SDK to develop ML pipelines for Azure ML.

How do I get my PythonScriptStep tasks or the encompassing Pipeline object to simply rerun upon failure?
I reckon it's pretty common for pipelines to temporarily break upon temporary network, storage, etc. issues so a simple rerun / retry seems pretty basic for task orchestration frameworks to provide (see e.g. Apache Airflow).

I've spent a fair amount of time going over the documentation for Azure ML and I just can't figure out how to get "retry upon failure" behaviour.

The closest option I found is the continue_on_step_failure pipeline / task parameter, which doesn't really do what's needed.

Any advice please?
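
For illustration, the behaviour being asked for can be approximated on the client side with a generic retry wrapper. The sketch below is plain Python and not an Azure ML SDK feature: run_with_retries and its parameters are hypothetical names, and action stands for any callable that submits the work and raises an exception on failure.

```python
import time


def run_with_retries(action, max_attempts=3, base_delay_seconds=1.0,
                     retry_on=(Exception,), sleep=time.sleep):
    """Call action() until it succeeds or max_attempts is reached.

    Waits base_delay_seconds, then twice that, and so on between attempts
    (exponential backoff). The last failure is re-raised unchanged.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return action()
        except retry_on:
            if attempt == max_attempts:
                raise
            sleep(base_delay_seconds * 2 ** (attempt - 1))


if __name__ == "__main__":
    attempts = []

    def flaky_submit():
        # Stand-in for a submission that fails twice on a transient error.
        attempts.append(1)
        if len(attempts) < 3:
            raise ConnectionError("temporary network issue")
        return "completed"

    print(run_with_retries(flaky_submit, base_delay_seconds=0.1))
```

Restricting retry_on to transient error types (for example ConnectionError) avoids re-running a pipeline that fails for a deterministic reason such as a bug in the script.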

Model not found in cache or in root at ./diabetes-model

Hello,

Following the different steps of the Azure Pipeline, I got this error:

"message": "Service deployment polling reached non-successful terminal state, current service state: Unhealthy\nOperation ID: e9252f0d-81f8-44e5-bd6d-983076eca1f5\nMore information can be found using '.get_logs()'\nError:\n{\n "code": "DeploymentTimedOut",\n "statusCode": 504,\n "message": "The deployment operation polling has TimedOut. The service creation is taking longer than our normal time. We are still trying to achieve the desired state for the web service. Please check the webservice state for the current webservice health. You can run print(service.state) from the python SDK to retrieve the current state of the webservice."\n}

Looking at the logs returned by get_logs(), I found this message:
Model not found in cache or in root at ./diabetes-model

The az CLI command is the following: az ml model deploy -n diabetes-qa-aci -f model.json --ic config/inference-config.yml --dc config/deployment-config-aci.yml --overwrite -v

model.json is created by the previous step and contains:
{
  "cpu": "",
  "createdTime": "2020-06-09T04:57:54.550301+00:00",
  "description": "",
  "experimentName": "diabetes-exp",
  "framework": "Custom",
  "frameworkVersion": null,
  "gpu": "",
  "id": "diabetes_reg_model:2",
  "memoryInGB": "",
  "name": "diabetes_reg_model",
  "properties": "",
  "runId": "diabetes-exp_1591678184_b25da442",
  "sampleInputDatasetId": "",
  "sampleOutputDatasetId": "",
  "tags": "",
  "version": 2
}

Any ideas?

Unable to delete pipeline drafts?

The Designer UI has a feature to delete pipeline drafts.

This feature is grayed out. There is no ability to select the pipeline draft and delete it either. Is this a defect?


Error in train model

I'm having trouble completing the getting_started example (getting_started.md): the pipeline stalls on the Train model job, which runs for about 60 minutes and is then canceled automatically. Here are the last log lines before the cancellation (the attached file, Complete Logs.txt, contains the full logs):

2022-02-07T00:52:37.0050192Z WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='files.pythonhosted.org', port=443): Read timed out. (read timeout=15)",)': /packages/6b/b2/c0d62a3a91c13641e09af294c13fe16929f88dc5902718388cd9b292217f/azure_mgmt_authorization-0.52.0-py2.py3-none-any.whl
2022-02-07T00:52:37.0052090Z Downloading azure_mgmt_authorization-0.52.0-py2.py3-none-any.whl (112 kB)
2022-02-07T00:52:37.0052735Z
2022-02-07T00:57:40.9228879Z WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='files.pythonhosted.org', port=443): Read timed out. (read timeout=15)",)': /packages/a1/71/9a20913e92771b3c23564f1bea54d376d09fb30a75585087c70b769d75c8/azure_mgmt_authorization-0.51.1-py2.py3-none-any.whl
2022-02-07T00:58:41.5520782Z Downloading azure_mgmt_authorization-0.51.1-py2.py3-none-any.whl (111 kB)
2022-02-07T00:58:41.5521395Z
2022-02-07T00:59:42.2727333Z INFO: This is taking longer than usual. You might need to provide the dependency resolver with stricter constraints to reduce runtime. If you want to abort this run, you can press Ctrl + C to do so. To improve how pip performs, tell us what happened here: https://pip.pypa.io/surveys/backtracking
2022-02-07T01:03:45.8869909Z Downloading azure_mgmt_authorization-0.51.0-py2.py3-none-any.whl (111 kB)
2022-02-07T01:09:52.4374279Z WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='files.pythonhosted.org', port=443): Read timed out. (read timeout=15)",)': /packages/6f/17/55b974603c16be89c7a7c16bac57b7bce48527bf1bebc3f116f7215176e6/azure_mgmt_authorization-0.50.0-py2.py3-none-any.whl
2022-02-07T01:09:52.4376241Z Downloading azure_mgmt_authorization-0.50.0-py2.py3-none-any.whl (81 kB)
2022-02-07T01:09:52.4376835Z
2022-02-07T01:26:07.6809069Z WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='files.pythonhosted.org', port=443): Read timed out. (read timeout=15)",)': /packages/67/e4/b3535daae30db9b3f73046a0c151c5c2ae2d2bff96ba0c28c1f26a21dbf1/azure_mgmt_authorization-0.40.0-py2.py3-none-any.whl
2022-02-07T01:26:07.6811091Z Downloading azure_mgmt_authorization-0.40.0-py2.py3-none-any.whl (38 kB)
2022-02-07T01:26:07.6811445Z
2022-02-07T01:39:04.9650251Z ##[error]The operation was canceled.
2022-02-07T01:39:04.9664245Z ##[section]Finishing: Train model

Any example of model deployment on local compute?

Instead of ACI, what if we want to test our deployment locally via Azure DevOps?

What would the steps be? Could you add an example? So far I have this in deployment-config-local.yml:

computeType: local
port: 13579

and in the pipeline I have

az ml model deploy -n diabetes-qa-local --model diabetes-model:1 --ic config/inference-config.yml --dc config/deployment-config-local.yml

But it returns

Downloading model diabetes-model:1 to C:\Users\mkrdi\AppData\Local\Temp\azureml_s5877b_f\diabetes-model\1
Generating Docker build context.

then it fails

{'Azure-cli-ml Version': '1.4.0', 'Error': WebserviceException:
        Message: Received bad response from service:
Response Code: 400
Headers: {'Date': 'Wed, 06 May 2020 02:01:46 GMT', 'Content-Type': 'application/json', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'Request-Context': 'appId=cid-v1:2d2e8e63-272e-4b3c-8598-4ee570a0e70d', 'x-ms-client-request-id': 'e734f89cdce14741bf8dc8ca879a8bab', 'x-ms-client-session-id': '71665c61-45e2-465a-9b6b-10d23ce6b0f8', 'api-supported-versions': '1.0, 2018-03-01-preview, 2018-11-19', 'Strict-Transport-Security': 'max-age=15724800; includeSubDomains; preload'}
Content: b'{"code":"BadRequest","statusCode":400,"message":"The request is invalid.","details":[{"code":"ServiceModelConflict","message":"Exactly one of the ModelIds or Models must be specified for a service."}],"correlation":{"RequestId":"e734f89cdce14741bf8dc8ca879a8bab"}}'
        InnerException None
        ErrorResponse
{
    "error": {
        "message": "Received bad response from service:\nResponse Code: 400\nHeaders: {'Date': 'Wed, 06 May 2020 02:01:46 GMT', 'Content-Type': 'application/json', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'Request-Context': 'appId=cid-v1:2d2e8e63-272e-4b3c-8598-4ee570a0e70d', 'x-ms-client-request-id': 'e734f89cdce14741bf8dc8ca879a8bab', 'x-ms-client-session-id': '71665c61-45e2-465a-9b6b-10d23ce6b0f8', 'api-supported-versions': '1.0, 2018-03-01-preview, 2018-11-19', 'Strict-Transport-Security': 'max-age=15724800; includeSubDomains; preload'}\nContent: b'{\"code\":\"BadRequest\",\"statusCode\":400,\"message\":\"The request is invalid.\",\"details\":[{\"code\":\"ServiceModelConflict\",\"message\":\"Exactly one of the ModelIds or Models must be specified for a service.\"}],\"correlation\":{\"RequestId\":\"e734f89cdce14741bf8dc8ca879a8bab\"}}'"     
    }
}}

Issue running Azure DevOps pipeline from pipelines/diabetes-train-and-deploy.yml

I've followed the instructions in the readme to set up the repo, created the service connection as directed, and created an Azure DevOps pipeline based on the diabetes-train-and-deploy.yml file. The workspace the pipeline points to is an existing resource that was created prior to finding the pipelines-azureml repo. When I run the pipeline it always fails on the Train Model step with the following error:

"error": {
    "code": "UserError",
    "message": "Image build run on compute failed: User starting the run is not an owner or assigned user to the Compute Instance",
    "details": []
},

I'm able to dig in further to the error in ML Studio and it shows the user calling is the service connection I set up for the pipeline. On the off chance that it might be a permissions issue, I added that user as a contributor to the workspace but I see the same error. I've also tried the powershell commands from the "Run CLI scripts..." section at the bottom of the README.md file and I get the same message running under my Azure account which has the Owner role on the ML Workspace.

The pipeline was able to create the compute cluster, but it seems that it doesn't have access to the cluster after it's created? Another possibility is that our workspace has something locked down that is preventing this pipeline from working properly. Any help is greatly appreciated. Thank you!

Issue with model train command

Hi,

We are getting an error when running the command below:
az ml run submit-script -c config/train --ct $(ml-ct) -e $(ml-exp) -t run.json train.py

Readme instructions broken

Since the last change to the azure-pipelines.yml the instructions in the readme.md are not valid anymore:

Modify the azure-pipelines.yml and change myresourcegroup to the Azure resource group that contains your workspace. You must also change the myworkspace entry to the name of your Azure Machine Learning service workspace.

  • azureSubscription (service connection) is now "build-demo" everywhere instead of "azmldemows"
  • resource group name is now "scottgu-all-hands" instead of "myresourcegroup"
  • ML workspace name is now "build-2019-demo" instead of "myworkspace"

Compute name 'cpu-cluster-1' is invalid

Raising a ticket because the compute name 'cpu-cluster-1' is invalid. My suggestion would be to change it into 'cpu'. See error message below:

Command group 'ml' is experimental and under development. Reference and support levels: https://aka.ms/CLI_refstatus
Creating compute instance...
{'Azure-cli-ml Version': '1.29.0', 'Error': ComputeTargetException:
        Message: Compute name 'cpu-cluster-1' is not available. Reason: Invalid. Message: A name for an Azure ML Compute Instance must be between 3 and 24 characters in length and must use only numbers, letters and minus symbol (-),must start with letters. Numbers cannot be the ending of the name if the previous character is a minus symbol (-). Please specify a different Azure ML Instance name
        InnerException None
        ErrorResponse
{
    "error": {
        "message": "Compute name 'cpu-cluster-1' is not available. Reason: Invalid. Message: A name for an Azure ML Compute Instance must be between 3 and 24 characters in length and must use only numbers, letters and minus symbol (-)\uff0cmust start with letters. Numbers cannot be the ending of the name if the previous character is a minus symbol (-). Please specify a different Azure ML Instance name"
    }
}}
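
The naming rule quoted in the error message can be checked locally before running the pipeline. The following is a minimal sketch of such a check, written from the wording of the message alone: the function name is hypothetical, and reading "numbers cannot be the ending of the name if the previous character is a minus symbol" as "the name must not end in a hyphen followed by digits" is an assumption.

```python
import re

# Letters, digits and hyphens only, starting with a letter.
_ALLOWED = re.compile(r"[A-Za-z][A-Za-z0-9-]*")
# Assumed reading of the last rule: no trailing "-<digits>".
_ENDS_WITH_HYPHEN_DIGITS = re.compile(r"-[0-9]+$")


def is_valid_compute_instance_name(name: str) -> bool:
    """Return True if name satisfies the rule quoted in the error message."""
    if not 3 <= len(name) <= 24:
        return False
    if _ALLOWED.fullmatch(name) is None:
        return False
    return _ENDS_WITH_HYPHEN_DIGITS.search(name) is None


if __name__ == "__main__":
    for candidate in ("cpu-cluster-1", "cpu-cluster", "cpu"):
        print(candidate, is_valid_compute_instance_name(candidate))
```

Under this reading, cpu-cluster-1 is rejected because it ends in "-1", while cpu and cpu-cluster pass.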

Problems executing the pipeline examples

Hello there,
I'm trying to follow the tutorial but when I executed it I got the following error

##[error]No hosted parallelism has been purchased or granted. To request a free parallelism grant, please fill out the following form https://aka.ms/azpipelines-parallelism-request
Pool: Azure Pipelines
Image: Ubuntu-16.04
Started: Just now
Duration: 11s

Job preparation parameters
ContinueOnError: False
TimeoutInMinutes: 60
CancelTimeoutInMinutes: 5
Expand:
  MaxConcurrency: 0
  ########## System Pipeline Decorator(s) ##########

  Begin evaluating template 'system-pre-steps.yml'
Evaluating: eq('true', variables['system.debugContext'])
Expanded: eq('true', Null)
Result: False
Evaluating: resources['repositories']['self']
Expanded: Object
Result: True
Evaluating: not(containsValue(job['steps']['*']['task']['id'], '6d15af64-176c-496d-b583-fd2ae21d4df4'))
Expanded: not(containsValue(Object, '6d15af64-176c-496d-b583-fd2ae21d4df4'))
Result: True
Evaluating: resources['repositories']['self']['checkoutOptions']
Result: Object
Finished evaluating template 'system-pre-steps.yml'
********************************************************************************
Template and static variable resolution complete. Final runtime YAML document:
steps:
- task: 6d15af64-176c-496d-b583-fd2ae21d4df4@1
  inputs:
    repository: self

I found that you now have to request permission from Microsoft. Is there any way to run the pipeline without requesting it?

Thank you
