airflow-snowparkml-demo's Introduction

Intro

Snowpark ML (in public preview) is a Python framework for machine learning workloads with Snowpark. Snowpark ML currently provides a model registry (which stores ML tracking data and models in Snowflake tables and stages), feature engineering primitives similar to scikit-learn (e.g. LabelEncoder, OneHotEncoder), and support for training certain model types and deploying them as user-defined functions (UDFs).
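
To make the scikit-learn comparison concrete, the following is a minimal sketch of what label encoding and one-hot encoding do to a categorical column. It is written in plain Python for illustration and is not the Snowpark ML API; the function names (fit_label_encoder, label_encode, one_hot_encode) and the sample data are hypothetical.

```python
def fit_label_encoder(values):
    """Map each distinct category to an integer, using sorted order."""
    return {category: index for index, category in enumerate(sorted(set(values)))}


def label_encode(values, mapping):
    """Replace every category with its integer label."""
    return [mapping[value] for value in values]


def one_hot_encode(values, mapping):
    """Replace every category with a 0/1 vector that has a single 1."""
    rows = []
    for value in values:
        row = [0] * len(mapping)
        row[mapping[value]] = 1
        rows.append(row)
    return rows


# Hypothetical sample column
colors = ["red", "green", "blue", "green"]
mapping = fit_label_encoder(colors)        # {'blue': 0, 'green': 1, 'red': 2}
labels = label_encode(colors, mapping)     # [2, 1, 0, 1]
one_hot = one_hot_encode(colors, mapping)  # one row per input value
```

In Snowpark ML the equivalent transformations are applied to Snowflake DataFrames, so the data stays in Snowflake rather than in local Python lists.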

This guide demonstrates how to use Apache Airflow to orchestrate a machine learning pipeline leveraging Snowpark ML for feature engineering as well as model training and scoring.

This demo also shows the use of the Snowflake XCom backend, which reinforces security and governance by serializing all task inputs and outputs to Snowflake tables and stages and storing only a URI pointer to the data in the Airflow XCom table.

Prerequisites

Setup

  1. Install Astronomer's Astro CLI. The Astro CLI is an Apache 2.0 licensed, open-source tool for building Airflow instances, and it provides a quick way to get Airflow running locally. Open a terminal window and run:

For macOS

brew install astro

For Linux

curl -sSL install.astronomer.io | sudo bash -s
  2. Clone this repository:
git clone https://github.com/astronomer/airflow-snowparkml-demo
cd airflow-snowparkml-demo
  3. Open the .env file in an editor and update the following variables with your account information. This demo assumes the use of a new Snowflake trial account with admin privileges. A database named 'DEMO' and a schema named 'DEMO' will be created in the DAG. Running this demo without admin privileges, or with an existing database or schema, will require further updates to the .env file.
  • AIRFLOW_CONN_SNOWFLAKE_DEFAULT
    -- login
    -- password
    -- account **

** The Snowflake account field of the connection should use the new ORG_NAME-ACCOUNT_NAME format as per Snowflake Account Identifier policies. The ORG and ACCOUNT names can be found in the confirmation email or in the Snowflake login link (e.g. https://xxxxxxx-yyy11111.snowflakecomputing.com/console/login). Do not specify a region when using this format for accounts.

NOTE: Database and Schema names should be CAPITALIZED due to a bug in Snowpark ML.
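
As a rough sketch of how these fields fit together, the helper below builds an AIRFLOW_CONN_SNOWFLAKE_DEFAULT line in Airflow's JSON connection format and applies the two rules above (no region in the account identifier, uppercase database and schema names). It assumes the .env file stores the connection as JSON; the helper name build_snowflake_conn, the exact set of keys under "extra", and the sample credentials are illustrative assumptions, so compare the result against the .env file shipped with the repository.

```python
import json


def build_snowflake_conn(login, password, account, database="DEMO", schema="DEMO"):
    """Return a .env line for the Snowflake connection (hypothetical helper)."""
    # Rough check for ORG_NAME-ACCOUNT_NAME: a hyphen, and no ".region" suffix.
    if "-" not in account or "." in account:
        raise ValueError("account should look like ORG_NAME-ACCOUNT_NAME, without a region")
    # Database and schema names should be uppercase (see the NOTE above).
    if database != database.upper() or schema != schema.upper():
        raise ValueError("database and schema names should be uppercase")
    conn = {
        "conn_type": "snowflake",
        "login": login,
        "password": password,
        "schema": schema,
        # Assumed layout of the provider-specific fields
        "extra": {"account": account, "database": database},
    }
    # Single quotes keep the JSON intact when the .env file is parsed;
    # a password containing a single quote would need extra escaping.
    return "AIRFLOW_CONN_SNOWFLAKE_DEFAULT='" + json.dumps(conn) + "'"


# Placeholder values, not real credentials
env_line = build_snowflake_conn("my_user", "my_password", "xxxxxxx-yyy11111")
```

Passing an account such as xy12345.us-east-1 raises a ValueError, which surfaces the region mistake before Airflow ever tries to connect.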

  4. Start Apache Airflow:

    astro dev start
  5. Run the Snowpark ML Demo DAG:

astro dev run dags unpause snowpark_ml_demo
astro dev run dags trigger snowpark_ml_demo
  6. Connect to the local Airflow UI and log in with admin/admin.

  7. While waiting for the DAG run to complete, examine the DAG code by opening the file include/dags/snowpark_ml_demo.py. Each function includes a docstring explaining what the task does.

For a more advanced example, see the Customer Analytics Demo.

airflow-snowparkml-demo's People

Contributors

mpgreg
