Distributed Deep Learning Workshop

In this workshop, we will train a deep learning model in a distributed manner using Databricks. We will discuss how to leverage Delta Lake to prepare structured, semi-structured, or unstructured datasets, and how to use Petastorm to distribute datasets efficiently across a cluster. We will also cover how to use Horovod for distributed training on both CPU- and GPU-based hardware. This example aims to serve as a reusable template that you can tailor to your specific modeling needs.

Workshop structure

The workshop consists of a series of Databricks notebooks split into two parts.

In part 1 we look at how we can optimally leverage the parallelism of Spark for training deep learning models in a distributed manner. The notebooks outline the following:

  • Data Prep
    • How to create a Delta table with the Binary file data source reader using JPEG image sources.
  • Single node training
  • Distributed training
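
As a rough illustration of the Data Prep step, the sketch below reads JPEG files with Spark's `binaryFile` data source and saves them as a Delta table. It is not the workshop's notebook code: the function names, the `source_dir` and `delta_path` arguments, and the assumption that each image's parent directory is its class label are all hypothetical.

```python
import posixpath


def label_from_path(path: str) -> str:
    """Return the parent directory name (ASSUMED here to be the class label)."""
    return posixpath.basename(posixpath.dirname(path))


def write_image_delta_table(spark, source_dir: str, delta_path: str) -> None:
    """Read JPEG files as binary records and save them as a Delta table.

    `spark` is an active SparkSession; `source_dir` and `delta_path` are
    placeholder locations chosen by the caller.
    """
    # Imported lazily so label_from_path can be used without Spark installed.
    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    label_udf = F.udf(label_from_path, StringType())

    images = (
        spark.read.format("binaryFile")
        .option("pathGlobFilter", "*.jpg")       # only pick up JPEG files
        .option("recursiveFileLookup", "true")   # descend into label subfolders
        .load(source_dir)
    )
    # binaryFile yields the columns: path, modificationTime, length, content.
    labeled = images.select(
        "path", "content", label_udf(F.col("path")).alias("label")
    )
    labeled.write.format("delta").mode("overwrite").save(delta_path)
```

In a Databricks notebook, where `spark` is already defined, this could be called as `write_image_delta_table(spark, "<image folder>", "<delta table path>")`, with both paths replaced by your own locations.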

In part 2 we look at how we can parallelize both hyperparameter tuning and model inference. We illustrate:

  • Model tuning with Hyperopt
    • Tuning a single node DL model with Hyperopt
    • Tuning a distributed Horovod process with Hyperopt
  • Distributed model inference
    • How to package up a custom Pyfunc with preprocessing/post-processing steps
    • Applying that logged custom Pyfunc in a single node inference setting
    • Applying that logged custom Pyfunc in a distributed inference setting
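
To make the single-node tuning step more concrete, here is a minimal sketch of a Hyperopt search parallelized with `SparkTrials`. It is not the workshop's notebook code: the toy objective stands in for training a model and returning a validation loss, and the `learning_rate` search space, `max_evals`, and `parallelism` values are assumptions chosen for illustration.

```python
import math


def objective(params: dict) -> dict:
    """Toy stand-in for 'train a model and return its validation loss'."""
    lr = params["learning_rate"]
    # Hypothetical loss surface with its minimum at learning_rate = 1e-3.
    loss = (math.log10(lr) + 3.0) ** 2
    # "ok" is the value of hyperopt.STATUS_OK.
    return {"loss": loss, "status": "ok"}


def run_search(max_evals: int = 20, parallelism: int = 4) -> dict:
    """Run a TPE search, evaluating trials in parallel on Spark workers."""
    # Imported lazily so the objective can be exercised without Hyperopt.
    from hyperopt import SparkTrials, fmin, hp, tpe

    space = {
        # Samples exp(uniform(low, high)), i.e. log-uniform over [1e-5, 1e-1].
        "learning_rate": hp.loguniform(
            "learning_rate", math.log(1e-5), math.log(1e-1)
        )
    }
    trials = SparkTrials(parallelism=parallelism)
    return fmin(
        fn=objective,
        space=space,
        algo=tpe.suggest,
        max_evals=max_evals,
        trials=trials,
    )
```

`SparkTrials` fits the single-node case, where each trial trains one model on one worker. When each trial is itself a distributed Horovod job, the trials are usually run one at a time with Hyperopt's default `Trials` class instead.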

Requirements

Databricks ML Runtime 7.3 LTS or later is recommended. Use the Repos feature to clone this repository into your workspace and access the notebooks.
