
MLOps for Azure Databricks Example

This repo is used in a tutorial for learning how to do DevOps for Machine Learning (also called MLOps) using Azure Databricks and Azure ML Services.

The DevOps pipelines are defined in azure-pipelines.yml for Azure DevOps and in main.yml for GitHub Actions.

Using This Sample Project

If you want to run this example in Azure DevOps, you need to prepare your environment with the following steps.

Required Accounts And Resources

This example uses Azure DevOps as a CI/CD toolset and the Microsoft Azure platform to host your trained Machine Learning model.

You can get started with both platforms completely free.

Azure Databricks Workspace

In your Azure subscription, you need to create an Azure Databricks workspace to get started.

NOTE: I recommend placing the Azure Databricks workspace in a new Resource Group, so you can clean everything up more easily afterwards.
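If you prefer to script this step, the workspace can also be created with the Azure CLI. This is a minimal sketch, assuming the databricks CLI extension is available; the resource group name, workspace name, and location are placeholders:

# Install the Databricks extension for the Azure CLI (one-time step)
az extension add --name databricks

# Create a fresh Resource Group and the Databricks workspace inside it
az group create --name MLOps-Databricks-RG --location westeurope
az databricks workspace create --resource-group MLOps-Databricks-RG --name mlops-databricks --location westeurope --sku standard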

Importing This DevOps Project

As soon as you have access to the Azure DevOps platform, you're able to create a project to host your MLOps pipeline.

Once the project is created, you can import this GitHub repository into it.

Set up The Build Pipeline

By importing the GitHub files, you also imported the azure-pipelines.yml file.

This file can be used to create your first Build Pipeline.

NOTE: This Build Pipeline uses a preview feature called "Multi-Stage Pipelines". In order to use it, you need to enable this preview feature in your Azure DevOps settings first.

Connecting Azure Databricks

To be able to run this pipeline, you also need to connect your Azure Databricks Workspace.

To do this, you first need to generate a Databricks access token.

This token must be stored as an encrypted secret in your Azure DevOps Build Pipeline...

[Screenshot: Adding an Azure Pipeline Variable]

NOTE: The variable must be called databricks.token
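If you'd rather script the variable than click through the UI, the Azure DevOps CLI extension offers a way to do it. A sketch, assuming the azure-devops extension is installed and the Build Pipeline already exists; the organization, project, and pipeline name are placeholders:

az pipelines variable create --name databricks.token --secret true --value "<your-databricks-token>" --pipeline-name "MLOps-Databricks" --organization https://dev.azure.com/<your-org> --project <your-project>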

... or your GitHub Project.

[Screenshot: Adding a GitHub Secret]

NOTE: The GitHub Secret must be called DATABRICKS_TOKEN
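Alternatively, the GitHub CLI can store the secret from your terminal. A minimal sketch; the repository slug is a placeholder:

# Prompts interactively for the secret value
gh secret set DATABRICKS_TOKEN --repo <your-user>/mlops-databricks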

Connecting the Azure ML Service Workspace

Step 1: Create Azure AD Service Principal

The Databricks notebooks for serving your model will create an Azure Machine Learning workspace (and other resources) for you.

To grant Azure Databricks access rights to your Azure Subscription, you need to create a Service Principal in your Azure Active Directory.

You can do that directly in the Cloud Shell of the Azure Portal by using one of these two commands:

az ad sp create-for-rbac -n "http://MLOps-Databricks"

Least Privilege Principle: If you want to narrow that down to a specific Resource Group and Azure Role, use the following command:

az ad sp create-for-rbac -n "http://MLOps-Databricks" --role contributor --scopes /subscriptions/{SubID}/resourceGroups/{ResourceGroup1}

Make a note of the result of this command, as you will need it in a later step.
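For reference, the output of az ad sp create-for-rbac looks roughly like the following; all values below are placeholders, and the exact set of fields can vary between Azure CLI versions:

{
  "appId": "00000000-0000-0000-0000-000000000000",
  "displayName": "MLOps-Databricks",
  "name": "http://MLOps-Databricks",
  "password": "<generated-client-secret>",
  "tenant": "00000000-0000-0000-0000-000000000000"
}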

Step 2: Install / Update Databricks CLI

Azure Databricks has its own place to store secrets.

At the time of writing, this store can only be accessed via the Databricks command-line interface (CLI).

Therefore, you should install this CLI on your local machine or in the Azure Cloud Shell.

pip install -U databricks-cli

NOTE: You need Python 2.7.9 or later (or Python 3.6 or later) to install and use the Databricks command-line interface (CLI).
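Before the secrets commands below will work, the CLI has to be connected to your workspace. A minimal sketch; the host URL depends on the region of your workspace:

databricks configure --token
# When prompted, enter:
#   Databricks Host: https://<your-region>.azuredatabricks.net
#   Token: the access token you generated earlier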

Step 3: Store Databricks Secrets

Using the Databricks CLI, you can now create your own section (scope) for your secrets...

databricks secrets create-scope --scope azureml
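If your workspace is not on the Premium tier, creating a scope with the default permissions can fail. In that case, a sketch of a workaround is to create the scope so that all workspace users may manage it:

databricks secrets create-scope --scope azureml --initial-manage-principal users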

... and add the required secrets to the scope.

# Use the "tenant" property from the Azure AD Service Principal command output
databricks secrets put --scope azureml --key tenant_id
# Use the "appId" property from the Azure AD Service Principal command output
databricks secrets put --scope azureml --key client_id
# Use the "password" property from the Azure AD Service Principal command output
databricks secrets put --scope azureml --key client_secret

# Use the ID of your Azure Subscription
databricks secrets put --scope azureml --key subscription_id
# Use the name of the Resource Group in which the Azure ML Workspace should be created
databricks secrets put --scope azureml --key resource_group
# Use the name you want for the Azure ML Workspace
databricks secrets put --scope azureml --key workspace_name
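To verify that everything is in place, you can list the keys stored in the scope (the secret values themselves are never displayed):

databricks secrets list --scope azureml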

OPTIONAL: Pre-Approval Checks (Azure DevOps)

To avoid high costs from the Azure Kubernetes Service, which will be created by the "Deploy To Production" job, I recommend that you set up a Pre-Approval Check for the wine-quality-production environment.

This can be done in the Environments section of your Azure Pipelines.

[Screenshot: Azure Pipeline Environments]
