Required cluster runtime version: 12.2 LTS ML. Each participant should provision their own single-node cluster using this runtime, then clone this repository into a Databricks Repo.
Notebook run order:
- etl: Load raw data and create a Delta table of features.
- eda: Compare/contrast Spark SQL and the DataFrame API.
- models/xgboost: Train an XGBoost model and log to MLflow.
- models/random_forest_hyperopt: Train a Random Forest model with hyperparameter tuning and MLflow logging.
- compare_models: Choose the best model and register it in the Model Registry.
- score: Load the production model from the Model Registry and perform inference.
- No notebook: Follow along with instructor: Deploy the production model as a REST API.
- No notebook: Follow along with instructor: Create and run a multi-task job via the Databricks Jobs UI.
- model_registry_webhook: Watch instructor: Triggering activities based on Model Registry events.
- No notebook: Follow along with instructor: Auto ML, training and comparing models automatically.
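At its core, the compare_models step picks the run with the best validation metric and registers that model. A minimal pure-Python sketch of the selection logic (the run names and the `val_rmse` metric key are invented for illustration; the actual notebook would fetch candidate runs from MLflow, e.g. via `MlflowClient.search_runs`, and then call the registry):

```python
# Hypothetical candidate runs: (run_name, logged_metrics) pairs.
# In the real notebook these would come from the MLflow tracking server.
candidate_runs = [
    ("xgboost_run", {"val_rmse": 3.1}),
    ("random_forest_run", {"val_rmse": 2.7}),
]

def pick_best(runs, metric="val_rmse"):
    """Return the name of the run whose logged metric is smallest."""
    return min(runs, key=lambda run: run[1][metric])[0]

best = pick_best(candidate_runs)
print(best)  # random_forest_run
```

The chosen run's model artifact would then be registered in the Model Registry and promoted, which is what the score notebook later loads as the production model.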
Extras if time permits:
- extras/custom_mlflow_model: Creating and logging your own custom MLflow model.
- extras/feature_store: Integrating the Databricks Feature Store into the model training and inference workflows.
Notebook run order:
- passenger_demographic_features
- passenger_ticket_features
- fit_model
- model_inference
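The feature-store extra computes demographic and ticket features in separate notebooks, and fit_model joins them on the passenger key before training. A rough sketch of that join, with plain dicts standing in for the Delta feature tables (the column names `age` and `fare` are invented for illustration):

```python
# Toy stand-ins for the two feature tables, keyed by passenger id.
demographic = {1: {"age": 29}, 2: {"age": 41}}
ticket = {1: {"fare": 72.5}, 2: {"fare": 13.0}}

def join_features(*tables):
    """Merge feature tables keyed by the same id into one row per key."""
    joined = {}
    for table in tables:
        for key, cols in table.items():
            joined.setdefault(key, {}).update(cols)
    return joined

features = join_features(demographic, ticket)
print(features[1])  # {'age': 29, 'fare': 72.5}
```

In the notebooks the Feature Store client performs this lookup for you at both training and inference time, which is what keeps the two pipelines consistent.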