About • Installation • Configuration • How To Use • App Demo • How To Test
The goal of this project was to learn how to implement end-to-end MLOps in practice. As a result, a machine learning pipeline was built that fine-tunes a pre-trained model, along with a web application that detects house sparrows in a photo and interacts with the model via an API. This repository contains the source code of the pipeline (including the web app and the API) and some of its run results, which are needed to reproduce the pipeline and demonstrate how it works. The following diagram shows how MLOps is implemented in the project:
An extended version of the diagram can be viewed in `docs/project-mlops-diagram-extended.svg`.

More information about the dataset and model used can be found in `docs/dataset-card.md` and `docs/model-card.md`, respectively.
The source code was developed on Windows. All the Python modules in the repository, except those in the `deployment` folder, were also run and tested on Linux (Ubuntu) (in Google Colab and using GitHub Actions, respectively).
First, clone this repo and go to its root directory. Then create a virtual environment with Python 3.9 (other versions are untested) and activate it. After that, install either all the project's dependencies by running:

```
$ python -m pip install -r requirements/dev-requirements.txt
```

or only those needed for a specific MLOps task (see the table below).
**Note**: The repo contains two types of ML pipelines: DVC pipelines (including `pipelines/dvc.yaml` and `pipelines/new_data_pipeline/dvc.yaml`) and a Metaflow workflow (`pipelines/mlworkflow.py`), which is not available on Windows. If you plan to use the latter, uncomment the appropriate tool in the `pipe-requirements.txt` file to install it.
| MLOps Component/Task | Requirements | Used Files & Folders | Output Files & Folders |
|---|---|---|---|
| EDA | eda-requirements.txt | notebooks/EDA.ipynb, data/raw | notebooks/EDA.ipynb (outputs), data/prepared (optional) |
| Data Checking | data-check-requirements.txt | data_checks, great_expectations, data, configs/params.yaml | data_checks/data_check_results, great_expectations/uncommitted, pipe.log |
| Model Training | train-requirements.txt | src, data, configs | hyper_opt, configs/best_params.yaml, mlruns, models, outputs, reports/model_report.md, pipe.log |
| Pipeline/Workflow | pipe-requirements.txt | pipelines, data_checks, great_expectations, src, data, configs | .dvc, pipelines (/dvc.lock & /dvc_dag.md) or .metaflow, data_checks/data_check_results, great_expectations, hyper_opt, configs/best_params.yaml, mlruns, models, outputs, reports/model_report.md, pipe.log |
| Model Deployment / API & App | deployment-requirements.txt | deployment (except /demo), src/train/train_inference_fns.py, src/utils.py, mlruns, configs/params.yaml, .streamlit | monitoring/current_deployed_model.yaml, monitoring/data |
| Model Monitoring | monitoring-requirements.txt | monitoring, data, configs/params.yaml | monitoring/deployed_model_check_results, reports/deployed_model_performance_report.html, mon.log |
| Continuous Integration (CI) | ci-requirements.txt | .github, tests (except /webapi), pytest.ini, data_checks, src | - |
| Web App Demo | requirements.txt (in deployment/demo) | deployment/demo | - |
If you used `dev-requirements.txt`, run `pre-commit install` to install git hook scripts from the `.pre-commit-config.yaml` file (including for the DVC project in this repo). If you want to use your own Great Expectations/DVC projects, ensure that they are initialized in the root directory of the repo, or do so by running the `great_expectations init` / `dvc init` commands. For details, refer to the documentation of the respective tools.
The dataset to reproduce the ML pipeline of this project can be found here. To run the pipeline on your own data, it must be organized as described in `docs/dataset-card.md`. If necessary, configure the items of the Great Expectations project according to the new data. See samples of the data used in the `tests/data_samples` folder.
The project is configured to run on a local machine, although this can be changed if necessary. The main MLOps settings for this project are held in the `configs/params.yaml` file.
**Note**: To use remote storage or advanced features, some of the installed Python packages (DVC and MLflow, for example) require additional dependencies. See their documentation.
Below are the CLI commands for the MLOps components that are executed manually in this implementation. Their order matters, because later commands depend on the results of earlier ones. Other components are either already included in the pipelines/workflow (such as data verification/validation, hyperparameter optimization, and model stage transition to production) or are triggered when code is pushed to GitHub (such as tests).
- Run either the pipeline or the workflow (both contain similar stages/steps) to train (fine-tune) an object detection model:
```
# (Optional) Generate a Python script for the 'new_data_expectation_check' step
$ great_expectations checkpoint script new_image_info_and_bbox_ckpt.yml
# Generate a Python script for the 'raw_data_expectation_check' step
$ great_expectations checkpoint script image_info_and_bbox_ckpt.yml
# Run the model training workflow
$ python pipelines/mlworkflow.py run
```

or add the `--production` flag if the trained model will be used in production, regardless of its performance.

**Warning**: The workflow is created with Metaflow, which is not available on Windows.
```
# (Optional) Reproduce the new data check pipeline
$ dvc repro pipelines/new_data_pipeline/dvc.yaml
# Reproduce the model training pipeline without including new data checks
$ dvc repro pipelines/dvc.yaml
```

or use the `--all-pipelines` flag to reproduce all the pipelines for all the `dvc.yaml` files present in the repo. DAGs of the pipelines can be viewed in the `pipelines/dvc_dag.md` file.
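For orientation, a DVC pipeline stage generally has the shape shown below. This fragment is illustrative only: the stage name, command, and paths are assumptions, while the real stage definitions live in `pipelines/dvc.yaml`:

```yaml
# Hypothetical stage; see pipelines/dvc.yaml for the actual definitions.
stages:
  train:
    cmd: python src/train/train.py   # assumed entry point
    deps:                            # inputs DVC watches for changes
      - data/prepared
      - configs/params.yaml
    outs:                            # artifacts DVC tracks and caches
      - models
```

`dvc repro` re-runs a stage only when one of its `deps` has changed, which is what makes the pipelines above reproducible and cheap to repeat.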
- (Optional) View the history of model training runs:

```
$ mlflow ui --backend-store-uri sqlite:///mlruns/mlruns.db
```

**Note**: Change the value of `--backend-store-uri` to match the tracking server URI set for MLflow.
- Use a deployed model via the API in the web app to get its performance data:

```
# Run the API on a uvicorn server
$ python deployment/api.py
# Run the web app on a Streamlit server
$ streamlit run deployment/app.py
```
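The web app is not the only possible client: once the API is up, any HTTP client can send it a photo. The sketch below builds such a request with the standard library; the endpoint path, port, and payload format are assumptions, so check `deployment/api.py` for the actual route and schema:

```python
# Hypothetical sketch of calling the detection API directly instead of
# through the Streamlit app. The URL and content type are assumptions.
import urllib.request

API_URL = "http://127.0.0.1:8000/detection"  # assumed host/port/route


def build_detection_request(image_bytes: bytes) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying a photo to score."""
    return urllib.request.Request(
        API_URL,
        data=image_bytes,
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )


# Sending it only makes sense while `python deployment/api.py` is running:
# with urllib.request.urlopen(build_detection_request(photo_bytes)) as resp:
#     detections = resp.read()
req = build_detection_request(b"...image bytes...")
print(req.get_method(), req.full_url)
```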
- Monitor the performance of the deployed model:

```
$ python monitoring/monitor_deployed_model.py
```
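The actual monitoring logic lives in `monitoring/monitor_deployed_model.py`; as a rough illustration of the idea, a check of this kind compares the deployed model's current score against a reference value with some allowed tolerance. The threshold numbers below are made up for the example:

```python
# Illustrative performance-degradation check; both thresholds are assumptions.
BASELINE_SCORE = 0.80  # assumed score of the model at deployment time
TOLERANCE = 0.05       # assumed acceptable drop before raising an alert


def model_degraded(current_score: float) -> bool:
    """Return True if the deployed model's score fell below the allowed band."""
    return current_score < BASELINE_SCORE - TOLERANCE


print(model_degraded(0.78))  # within tolerance
print(model_degraded(0.70))  # degraded: would warrant investigation/retraining
```

A real check would read the current score from the collected performance data (see `monitoring/data` in the table above) rather than hard-code it.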
After executing these commands, the project directory will have a structure similar to the one presented in the `docs/project-directory-structure.md` file.
Notebooks in this repo, except `EDA.ipynb`, contain trial runs of the data checks and initial experiments for building the model training pipeline.
You can try out the web application from this project on .
If the demo is unavailable or not displayed, it can be seen as a static image in `docs/app-image.pdf`.
If pytest and its required plugins are not installed, run:

```
$ python -m pip install -r requirements/test-requirements.txt
```

Test configurations are held in the `pytest.ini` file.
```
# Run all the tests in the repo except the API ones
$ pytest --ignore=tests/webapi/ tests/
```

```
# Run the uvicorn server and the API
$ python deployment/api.py
# Run the API tests
$ pytest tests/webapi/
```
**Note**: Some of the tests take a long time; they are marked as "slow".

```
# Skip slow tests
$ pytest -m "not slow" tests/
```
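The "slow" label relies on pytest's marker mechanism: a marked test is deselected by `-m "not slow"`. Roughly, a test in this repo carries the marker like the hypothetical example below (the test name and body are assumptions; the marker itself would be registered in `pytest.ini`):

```python
# Hypothetical example of a test tagged "slow", so that
# `pytest -m "not slow"` deselects it while a plain `pytest` run includes it.
import pytest


@pytest.mark.slow
def test_training_one_epoch_runs():
    # A long-running check, e.g. one fine-tuning epoch on a tiny data sample.
    assert True
```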
**Warning**: Sometimes the `integration` tests fail due to the stochastic nature of machine learning algorithms. Try running the tests again.