swiss-ai-center / a-guide-to-mlops

A simple yet complete guide to MLOps tools and practices - from a conventional way to a modern approach of working with ML projects.

Home Page: https://mlops.swiss-ai-center.ch

License: Creative Commons Attribution Share Alike 4.0 International

Topics: best-practices, ml, mlops, bentoml, cml, dvc

a-guide-to-mlops's Introduction

A guide to MLOps

A simple yet complete guide to MLOps tools and practices - from a conventional way to a modern approach of working with ML projects. Website available at https://mlops.swiss-ai-center.ch.

Local development with Docker Compose (recommended)

To improve the documentation locally, run Material for MkDocs with the following commands:

# Build the Docker container
docker compose build

# Start the Docker container
docker compose up serve

You can now access the local development server at http://localhost:8000.

If you make changes to the documentation, the web page should reload.

Local development with Python

To improve the documentation locally, run Material for MkDocs with the following commands:

# Install all dependencies for Material for MkDocs
sudo apt install --yes \
    libcairo2-dev \
    libfreetype6-dev \
    libffi-dev \
    libjpeg-dev \
    libpng-dev \
    libz-dev

# Create the virtual environment
python3.11 -m venv .venv

# Activate the virtual environment
source .venv/bin/activate

# Install the Python dependencies
pip install \
    --requirement requirements.txt \
    --requirement requirements-freeze.txt

# Run Material for MkDocs
mkdocs serve

You can now access the local development server at http://localhost:8000.

If you make changes to the documentation, the web page should reload.

Format the documentation with Docker Compose (recommended)

To format the Markdown documentation, run mdwrap with the following commands:

# Build the Docker container
docker compose build

# Start the Docker container
docker compose up format

Format the documentation with Python

To format the Markdown documentation, run mdwrap with the following commands:

# Create the virtual environment
python3.11 -m venv .venv

# Activate the virtual environment
source .venv/bin/activate

# Install the Python dependencies
pip install \
    --requirement requirements.txt \
    --requirement requirements-all.txt

# Run mdwrap
mdwrap --fmt docs

a-guide-to-mlops's People

Contributors

adrienallemand, bchapuis, florianfrictus, leonardcser, ludelafo, ovich, rmarquis


Forkers

robinfru

a-guide-to-mlops's Issues

Simplify CI/CD introduction, part 2

  • Chapter 8: only dvc repro, to show how the experiment is reproducible everywhere
  • Chapter 9: production CI/CD with a commit of dvc.lock and a CML report step
    • Explain why we need dvc repro --pull --allow-missing to avoid team collaboration side effects (see the sketch below).
  • Chapter 10 (new): Change the params and make the merge request
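
As a hedged illustration for the chapter 9 explanation (the surrounding CI setup is assumed and not shown here), the reproduction step could look like this:

# Sketch of the CI reproduction step: --pull fetches the cached artifacts
# referenced in dvc.lock from the remote instead of failing on missing files,
# and --allow-missing skips stages whose only change is data that is not
# present locally, so a fresh clone does not rerun work already in the cache
dvc repro --pull --allow-missing

# Confirm that nothing actually had to be recomputed
dvc status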

Deploy MinIO: The request signature we calculated does not match the signature you provided. Check your key and signing method.

Hi everyone,

I would like to deploy MinIO as you explain in the guide, but I have some problems.

What I do:

  1. Clone the repo
  2. Go to infrastructure/minio
  3. Create the folder minio-data
  4. Start the container with docker compose up --detach

With my browser, I can go to localhost:9001 and the login page appears. When I enter the default login (user: minio, password: minio123), the web page tells me: The request signature we calculated does not match the signature you provided. Check your key and signing method.

Is it the wrong password and username?

I tried modifying the docker-compose.yml as below, and it works with the username minio and the password minio123:

networks:
  minio:
    name: minio

services:
  minio:
    container_name: minio
    image: quay.io/minio/minio
    restart: unless-stopped
    env_file:
      - minio.env
    environment:
      - MINIO_ROOT_USER=minio
      - MINIO_ROOT_PASSWORD=minio123
    command: server /data --console-address ":9001"
    networks:
      - minio
    ports:
      - 9000:9000
      - 9001:9001
    volumes:
      - ./minio-data:/data:rw

Can you tell me what's going on?

Thank you and have a nice day.

Minor Improvements

The following are my minor suggestions to the MLOps guide I went through.

  • Add a link to the params.yaml file for configuration explanation in Chapter 1
  • Specify DVC default params.yaml file naming convention in Chapter 1
  • Remove imprecise code block titles in Chapters 2, 3, 4, and 5
  • Add .DS_Store to .gitignore in all chapters
  • Clarify the output of gcloud init for new versions in Chapter 3
  • Give example for the GCP Bucket in Chapter 3
  • Update git status output to include data/README.md in Chapter 3
  • Update download path of the GCP credentials in Chapter 6
  • Add explanation for when the Compare & pull request button is not visible in Chapter 7
  • Add code block titles in Chapter 8
  • Update git diff output to include missing output in Chapter 8
  • Add an info box for the FastAPI URL http://localhost:8080/docs in Chapter 8

Chapter 13: Investigate the possibility of using a private Docker image on ghcr.io

This is possible by overriding the resources.yaml.j2 template with the templates_dir.0 flag.

https://dev.to/asizikov/using-github-container-registry-with-kubernetes-38fb

apiVersion: v1
kind: Namespace
metadata:
  name: {{ namespace }}
  labels:
    name: {{ namespace }}

---

apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ image_name }}
  namespace: {{ namespace }}
spec:
  selector:
    matchLabels:
      app: {{ image_name }}
  template:
    metadata:
      labels:
        app: {{ image_name }}
    spec:
      containers:
      - name: {{ image_name }}
        image: {{ image_uri }}
        imagePullPolicy: {{ image_pull_policy }}
        ports:
        - containerPort: {{ port }}
      imagePullSecrets:
      - name: dockerconfigjson-github-com

---

apiVersion: v1
kind: Service
metadata:
  name: {{ image_name }}
  namespace: {{ namespace }}
  labels:
    run: {{ image_name }}
spec:
  ports:
  - port: {{ port }}
    protocol: TCP
    targetPort: {{ port }}
  selector:
    app: {{ image_name }}
  type: {{ service_type }}

---

kind: Secret
type: kubernetes.io/dockerconfigjson
apiVersion: v1
metadata:
  name: dockerconfigjson-github-com
  namespace: {{ namespace }}
  labels:
    app: dockerconfigjson-github-com
data:
  .dockerconfigjson: <SECRET HERE>
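
For illustration, the dockerconfigjson-github-com secret referenced above could be generated with kubectl (a sketch only; the namespace, username, and token are placeholders, and the personal access token needs the read:packages scope):

# Render the registry secret for ghcr.io as YAML (values are placeholders)
kubectl create secret docker-registry dockerconfigjson-github-com \
    --namespace <namespace> \
    --docker-server=ghcr.io \
    --docker-username=<github-username> \
    --docker-password=<github-personal-access-token> \
    --dry-run=client --output=yaml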

Add formatter for markdown

A suggestion might be to use mdformat in conjunction with pre-commit.

These are Python packages and do not require any additional dependencies.

The update would include adding Poetry to the root of the project for the mentioned packages and formatting the whole repo. There also needs to be some documentation on how to set up Poetry and pre-commit for contributors.
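
A rough sketch of the commands involved, assuming mdformat and pre-commit are added as development dependencies (the exact hook configuration is left open):

# Install the formatter and the hook manager
pip install mdformat pre-commit

# Format all Markdown files in the docs folder
mdformat docs/

# Register the Git hook and run it once over the whole repository
pre-commit install
pre-commit run --all-files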

Improve and update the Python code of the experiment

The code branch (https://github.com/csia-pme/a-guide-to-mlops/tree/code) contains the code used in the guide. The original link (https://dvc.org/doc/start/data-management/pipelines) referenced on the branch has been moved to https://dvc.org/doc/start/data-management/data-pipelines.

This code is only a backup of DVC's Get Started: Data Pipelines guide. The code is available at https://github.com/iterative/example-get-started/tree/main and is generated from the repository https://github.com/iterative/example-repos-dev/tree/master/example-get-started.

I think we could improve the code made by DVC with a contribution that improves readability, using a linter and a little cleanup.

Once the contribution to the DVC repository is made, we could update the code in our repository as well, for a clearer and easier approach to the model lifecycle.

Deploy Label Studio on Kubernetes

Now that we have a Kubernetes cluster starting from chapter 12 (still a work in progress in #81), we could use it to deploy Label Studio for part 4.

Incorrect expected value chapter 8

When going through the guide, the expected value for "Is this related to the R programming language?" with the MLEM API in chapter 8 is 1.

However, the actual value when running it is 0.

The solution would be to find a new set of params or a new sentence in order to match the expected value.

Continuous deployment of the model with MLEM and the CI/CD pipeline

Based on the work of #74, add a new 13th chapter "Continuous deployment of the model with MLEM and the CI/CD pipeline" (move the current 13th chapter "Chapter 13 - Train the model on a Kubernetes pod with CML" to "Chapter 14 - Train the model on a Kubernetes pod with CML").

This chapter will deploy the model using MLEM with the CI/CD pipeline for a "true" continuous deployment (CD) pipeline.

Add section about why use poetry

The goal would be to add a section to the guide under Get Started explaining why we used Poetry instead of pip, along with its respective pros.

Add readme template for the guide

Currently, there is no README at the root of the repository when doing the guide. This is not ideal, as it shows an empty repository on GitHub/GitLab.

Clarify checkpoints in repo

It is not really clear from the guide that the checkpoints in the repo correspond to the code found on GitHub, except for chapter 1.

The idea is to clarify at the end of each chapter that the code is what you should have in your own GitHub repo.

Chapters 8-9: Add pip cache to CI/CD

Update the Setup Python Action to the following in order to cache the pip dependencies:

      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'
+         cache: 'pip'

Rewrite the guide in a more formal way

As discussed with @rmarquis, it could be a good idea to rewrite some parts of the guide in a more formal way.

Examples

Chapter 2: Share your ML experiment code with Git

"Instead of relying on ZIP archives, we'll create a Git repository to enable easy collaboration with the rest of the team"
"Instead of relying on ZIP archives, we will create a Git repository to enable easy collaboration with the rest of the team"

Chapter 5: Track model evolutions with DVC

"Once this stage is created, you'll be able to change our model's configuration, evaluate the new configuration and compare its performance with the last commited ones."
"Once this stage is created, you will be able to change our model's configuration, evaluate the new configuration and compare its performance with the last commited ones."

Ignore the evaluation directory from Git and track it with DVC

In "Chapter 2: Share your ML experiment code with Git - Create a .gitignore file" (https://csia-pme.github.io/a-guide-to-mlops/the-guide/chapter-2-share-your-ml-experiment-code-with-git/#create-a-gitignore-file), the evaluation directory is not ignored by Git.

The evaluation directory could be ignored by Git and tracked by DVC so that all evaluation data is stored in S3, cleaning up the repository and helping the review of Pull Requests.

This change might impact some other chapters as well.

Add report screenshots

  • Add evaluation report output image in Chapter 1
  • Add plot diff report output image in Chapter 5
  • Add media folder for the images
  • Add custom style for images

Update the advanced section with poetry

The current guide on how to set up CML with Kubernetes does not use Poetry to install the dependencies.

The goal would be to update the CI to work with Poetry instead.
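
A rough sketch of what the updated dependency installation step in the CI could look like (assuming a pyproject.toml is added to the project; the exact commands are an assumption):

# Install Poetry and the project dependencies, then reproduce the experiment
pip install poetry
poetry install
poetry run dvc repro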

Update the guide

The guide will be updated to provide better insights on the MLOps process.

In discussion with @rmarquis and @leonardcser, we have imagined splitting the guide as follows:

Part 1: Local training and model evaluation

  • Chapter 1: Run a simple ML experiment with Jupyter Notebook
  • Chapter 2: Adapt and move the Jupyter Notebook to Python scripts
  • Chapter 3: Initialize Git and DVC for local training
  • Chapter 4: Reproduce the ML experiment with DVC
  • Chapter 5: Track model evolution with DVC

Part 2: Collaborate online in the cloud

  • Chapter 6: Move the ML experiment data to the cloud
  • Chapter 7: Move the ML experiment code to the cloud
  • Chapter 8: Reproduce the ML experiment in a CI/CD pipeline
  • Chapter 9: Track model evolution in the CI/CD pipeline with CML

Part 3: Serve and deploy the model online

  • Chapter 10: Save and load the model with MLEM
  • Chapter 11: Serve the model locally with MLEM
  • Chapter 12: Deploy and access the model on Kubernetes with MLEM
  • Chapter 13: Train the model on a Kubernetes pod with CML

Part 4: Label new data and retrain the model

  • Chapter 14: Setup Label Studio
  • Chapter 15: Import existing data to Label Studio
  • Chapter 16: Label new data with Label Studio
  • Chapter 17: Retrain the model from new data with DVC Sync
  • Chapter 18: Link the model to Label Studio and get predictions

In a workshop scenario, we think each part could be done in half a day, meaning the entire guide could be done in two days.

A new branch, update-the-guide, will be the starting point of this update, and each chapter will have its own branch and pull request.

Add GitHub Workflows for chapters 1-3

At the moment, a CI/CD pipeline is run for chapters 4-8.

It could be useful to add GitHub Workflows to run the CI/CD pipeline for chapters 1-3 as well, to ensure they keep working over time and to spot the differences between the chapters in how the experiment is run.

Update collapsible admonition to resume chapter from checkpoint

Currently, the guide relies on very long and complex Git commands. We should simplify this by keeping the checkpoints in another branch and moving the desired chapter out of the Git clone (see the sketch after the list below).

  • Create a new branch for the checkpoints
  • Add a gitignore entry for .venv
  • Generate the checkpoints and save them to the branch
  • Document the process of creating checkpoints and saving them to the branch (in the root README.md)
  • For each chapter, add the admonition instructions for starting from a checkpoint (from the previous chapter)
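
A hedged sketch of what the resulting instructions could look like (the checkpoints branch and the chapter folder names are assumptions):

# Clone only the hypothetical checkpoints branch
git clone --branch checkpoints --single-branch \
    https://github.com/swiss-ai-center/a-guide-to-mlops.git a-guide-to-mlops-checkpoints

# Move the desired chapter out of the Git clone and start fresh from it
cp -r a-guide-to-mlops-checkpoints/chapter-7 my-experiment
cd my-experiment
git init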

Improvements to the homepage

  • We should display the logo of the HES-SO on the homepage.
  • We could add a table of contents pointing to the content of the guide between the title ("Classify celestial bodies using MLOps best practices") and the body of text ("Welcome to our comprehensive guide to MLOps!"). My first reflex was to scroll down, which hides the menu.

Update all dependencies

DVC, CML and MLEM have a fast development lifecycle with frequent releases (with some breaking changes).

The guide was written with fixed versions of these tools last year. It would be nice to update those tools and validate that the guide still works with their current versions.

New cleanup chapter

  • Add a new chapter (9), as a guide to clean up the resources and environments in order to avoid unexpected costs.
  • Update the list of used resources in the conclusion
  • Update the mkdocs.yml config to include the new chapter

Improve the CML reports

As discussed with @AdrienAllemand and @rmarquis, we could improve the readability of the CML reports.

Here are a few things we can consider:

  • Overwriting the previous CML comment with the cml comment update command instead of the current cml comment create command (documentation: https://cml.dev/doc/ref/comment#update)
  • Displaying a first block of text containing the differences in parameters used between two runs (dvc metrics diff main --show-md >> report.md) and a second block containing all the plots and images in a collapsible section (only supported on GitHub).

The main reason I didn't enable cml comment update on Pull Requests in the first place is to allow the user to see all differences for each revision on a single page, but it might be cumbersome.
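
For illustration, a report step combining the two blocks described above could look roughly like this (a sketch only; the plot path and the availability of a GitHub token in the workflow are assumptions):

# First block: differences in parameters and metrics against main
echo "## Params and metrics" >> report.md
dvc params diff main --show-md >> report.md
dvc metrics diff main --show-md >> report.md

# Second block: plots in a collapsible section (only supported on GitHub)
echo "<details><summary>Plots</summary>" >> report.md
echo '![](./evaluation/plots/importance.png "Feature importance")' >> report.md
echo "</details>" >> report.md

# Overwrite the previous comment instead of creating a new one
cml comment update report.md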

Add mermaid graphs as visual aid

As discussed with @ludelafo and @leonardcser, it would be great to display a visual aid at the start of each chapter to help readers visualize what is added/improved at each step.

Idea: start with a simple graph representing a Jupyter Notebook executed linearly, which is gradually replaced/improved by components that eventually represent the complete MLOps stack we create. Highlight the new parts added in a specific color.

We can use the embedded mkdocs mermaid.js support, which would make it easy to update the graphs if necessary.

Suggestion to manage dependencies

After having a look at Poetry for Python package management, its integration might expand the scope of the guide too much.

Using pip is much more familiar. A possibility would be to have three requirements files (named accordingly).

  • One with variable versions for internal updates of the packages for the guide
  • Another one, autogenerated by the pipeline or before push, that freezes the internal requirements. This last one would be split into two requirements files:
    • One containing the main packages (which is also referenced in the docs)
    • One containing all the rest of the frozen dependencies

Pros:

  • Update packages over time internally
  • Prevents dependencies from breaking
  • A user of the guides is not cluttered by the pip freeze in the docs

Cons:

  • A user of the guides will need to pip install two requirement files when initialising the environment
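
A sketch of how this could work in practice (the split of the frozen output into two files is omitted here; the file names loosely mirror those already used in this repository's README and are otherwise an assumption):

# Internally (in the pipeline or before push): refresh the frozen dependencies
pip install --requirement requirements.txt
pip freeze > requirements-freeze.txt

# What a user of the guide runs when initialising the environment
pip install \
    --requirement requirements.txt \
    --requirement requirements-freeze.txt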

Add dedicated section about python virtual env basics and pip

Following a discussion with @grafolytics, it has been suggested to add a dedicated chapter (or supplementary page) about the basics of Python virtual environments, Python shell switching, pip, and the need for better/more robust tools. Think of this additional page as a 101: expanded documentation about the need to use Poetry instead of pip/conda.

While not directly part of MLOps, the rationale for this documentation is that many Data Scientists don't actually understand how the underlying Python virtual environments work, or why they are needed in the first place (and having learned most of it the hard way over many years, I'd even go as far as to say most Python users/developers don't either). Not understanding these basics leads to botched attempts to solve issues in the wrong way (such as mixing pip/poetry with conda packages) and utter disappointment when these issues eventually occur.

Some food for thought:

  • Why not tell people to "simply" use pyenv, poetry or anaconda: an article that makes the point that extra tools built on top of what Python provides as standard do indeed solve some shortcomings, but bring an additional layer of complexity and abstraction that makes the issues that do occur harder to solve. The article encourages not using any additional tool and "keeping it simple" to avoid issues, which is obviously not possible here, as these shortcomings need to be solved in the context of MLOps (locking and versioning of transitive dependencies). The point about adding another layer of complexity/confusion is valid, however.
  • Python Virtual Environments: A Primer: an article about virtual environments which might provide a good basis for this additional chapter.

What do you guys think?
