model-credit-mlflow's Introduction

Credit Card Default Example

MLOps Architecture Used

Key points

For simplicity, each component of the architecture is described as what it "should be", followed by the "current implementation" (limited in scope, just for the demo).

  1. Data Analysis: Data should be extracted from the feature store; in this demo, a simple CSV dataset (credit card default) is processed instead [https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients#]
  2. Model tracking/experimentation: Should be a dedicated in-house MLflow instance; in this demo we spin up an MLflow Docker container. Model tracking is organized as a series of MLflow Projects (each one a data-science component with its own dependencies)
  3. Source code / repository: GitHub in this demo
  4. CI/CD Stage: Done with GitHub Actions; you can see it in the "Actions" tab on GitHub
  5. Automated Pipelines: Implemented as an Inference Pipeline (containing the preprocessing and the ML model). Each step is a distinct MLflow Project, and together they cover the whole machine learning development lifecycle
  6. Model Registry: Done through the MLflow Model Registry; just a versioned model registration
  7. Trained Model / Model Serving: There are various options for implementing it (Flask API, FastAPI, or a batch approach); in this demo it is provided by MLflow model serving
  8. ML Prediction Service: After the model serving stage, it is an active service listening in real time for POST requests (see an example in the "serve/" folder). It could reuse features from the feature store together with additional data/features coming from the request
  9. Performance Monitoring: Should be done by integrating the monitoring tools the company already uses (Grafana, Splunk, and so on); in this demo it is done with evidently-ai orchestrated by Airflow, which generates drift metrics (see the "drift/" folder and the drift dashboard report at "drift/evidently_reports/credit_card_default_data_drift_by_airflow.html")
  10. Alert triggers: Should be done by integrating the alerting tools the company already uses (PagerDuty, Slack)
  11. Retraining: Should reuse the Automated Pipeline created after CI/CD, triggered by a drift detection rule
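
Points 9 to 11 describe a drift detection rule that triggers retraining, but the rule itself is not spelled out. The sketch below shows one possible shape for it, using only the Python standard library: a two-sample Kolmogorov-Smirnov check per feature (with the usual large-sample critical value) and a retraining decision based on the share of drifted features. The function names, the `min_drift_share` threshold, and the column layout are assumptions made for illustration; the demo itself relies on evidently-ai for its drift metrics.

```python
import math


def ks_statistic(reference, sample):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    ref, cur = sorted(reference), sorted(sample)
    n, m = len(ref), len(cur)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(ref[i], cur[j])
        # Advance both pointers past every value <= x (handles ties).
        while i < n and ref[i] <= x:
            i += 1
        while j < m and cur[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def feature_drifted(reference, sample, alpha=0.05):
    """Reject 'same distribution' using the asymptotic KS critical value."""
    n, m = len(reference), len(sample)
    critical = math.sqrt(-0.5 * math.log(alpha / 2)) * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, sample) > critical


def should_retrain(reference_cols, sample_cols, alpha=0.05, min_drift_share=0.5):
    """Hypothetical rule: retrain when enough features have drifted.

    Both arguments map a feature name to a list of numeric values.
    """
    drifted = [
        name
        for name in reference_cols
        if feature_drifted(reference_cols[name], sample_cols[name], alpha)
    ]
    return len(drifted) / len(reference_cols) >= min_drift_share
```

In a setup like this, a scheduled job could call `should_retrain` on the training data and a recent batch of requests, then launch the automated pipeline and notify the alerting tool when it returns `True`.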

Starting the Demo

  • List images (including intermediate ones)

    docker image ls -a
    
  • List containers (including stopped ones)

    docker ps -a
    
  • Move to the Project Repo

    cd Documents/ProjectRepos
    
  • Download this Repo

    git clone https://github.com/lcajachahua/model-credit-mlflow.git
    
  • After downloading the repo, move to the root folder

    cd model-credit-mlflow
    
  • Build the project

    docker build -t model-credit-mlflow .
    
  • Activate the Virtual Environment

    conda activate pipeline_test
    

Step MLOps - MLFlow

  1. Downloading

    mlflow run ./download -P step=download_data -P file_url="https://github.com/lcajachahua/model-credit-mlflow/raw/main/_data/default_of_credit_card_clients.csv?raw=true" -P artifact_name=raw_data.csv -P artifact_description="Pipeline for data downloading" --experiment-name credit_card_default --run-name download_data
    
  2. Preprocessing

    mlflow run ./preprocess -P step=preprocess -P input_step=download_data -P input_artifact=raw_data.csv -P artifact_name=preprocessed_data.csv -P artifact_description="Pipeline for data preprocessing" --experiment-name credit_card_default --run-name preprocess
    
  3. Check/tests

    mlflow run ./check_data -P step=check_data -P input_step=preprocess -P reference_artifact=preprocessed_data.csv -P sample_artifact=preprocessed_data.csv -P ks_alpha=0.05 --experiment-name credit_card_default --run-name check_data
    
  4. Segregation

    mlflow run ./segregate -P step=segregate -P input_step=preprocess -P input_artifact=preprocessed_data.csv -P artifact_root=data -P test_size=0.3 -P stratify=default --experiment-name credit_card_default --run-name segregate
    
  5. Modeling

    mlflow run ./random_forest -P step=random_forest -P input_step=segregate -P train_data=data/data_train.csv -P model_config=rf_config.yaml -P export_artifact=model_export -P random_seed=42 -P val_size=0.3 -P stratify=default --experiment-name credit_card_default --run-name random_forest
    
  6. Evaluate

    mlflow run ./evaluate -P step=evaluate -P input_model_step=random_forest -P model_export=model_export -P input_data_step=segregate -P test_data=data/data_test.csv --experiment-name credit_card_default --run-name evaluate
    
  7. Open the MLflow UI to inspect the runs. To stop the UI, press Ctrl+C

     mlflow ui
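
The six `mlflow run` commands above share the same shape, so they can also be generated and run in order from a small script. The following is a minimal sketch, not part of the repo: the `STEPS` table copies the folders, step names, and parameters from the commands above, while `build_command`, `run_pipeline`, and the `dry_run` flag are hypothetical helpers introduced here. With `dry_run=True` (the default) it only prints the commands.

```python
import shlex
import subprocess

EXPERIMENT = "credit_card_default"

# (project folder, step / run name, parameters), copied from the commands above.
STEPS = [
    ("download", "download_data", {
        "file_url": "https://github.com/lcajachahua/model-credit-mlflow/raw/main/_data/default_of_credit_card_clients.csv?raw=true",
        "artifact_name": "raw_data.csv",
        "artifact_description": "Pipeline for data downloading",
    }),
    ("preprocess", "preprocess", {
        "input_step": "download_data",
        "input_artifact": "raw_data.csv",
        "artifact_name": "preprocessed_data.csv",
        "artifact_description": "Pipeline for data preprocessing",
    }),
    ("check_data", "check_data", {
        "input_step": "preprocess",
        "reference_artifact": "preprocessed_data.csv",
        "sample_artifact": "preprocessed_data.csv",
        "ks_alpha": "0.05",
    }),
    ("segregate", "segregate", {
        "input_step": "preprocess",
        "input_artifact": "preprocessed_data.csv",
        "artifact_root": "data",
        "test_size": "0.3",
        "stratify": "default",
    }),
    ("random_forest", "random_forest", {
        "input_step": "segregate",
        "train_data": "data/data_train.csv",
        "model_config": "rf_config.yaml",
        "export_artifact": "model_export",
        "random_seed": "42",
        "val_size": "0.3",
        "stratify": "default",
    }),
    ("evaluate", "evaluate", {
        "input_model_step": "random_forest",
        "model_export": "model_export",
        "input_data_step": "segregate",
        "test_data": "data/data_test.csv",
    }),
]


def build_command(folder, step, params):
    """Build the argument list for one `mlflow run` call."""
    cmd = ["mlflow", "run", f"./{folder}", "-P", f"step={step}"]
    for key, value in params.items():
        cmd += ["-P", f"{key}={value}"]
    return cmd + ["--experiment-name", EXPERIMENT, "--run-name", step]


def run_pipeline(steps, dry_run=True):
    """Print each command; execute it too when dry_run is False."""
    commands = []
    for folder, step, params in steps:
        cmd = build_command(folder, step, params)
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, check=True)
        commands.append(cmd)
    return commands
```

Calling `run_pipeline(STEPS, dry_run=False)` from the repo root, with the conda environment active, would run the steps one after another and stop at the first failure.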
    

Mlflow Deployment

  • Batch: Download the MLflow model and send it to the production environment

    mlflow artifacts download -r <MODEL-ID>
    
  • Online: Create an endpoint that receives requests (see the example in the /serve/real_time_inference.ipynb notebook)

    mlflow models serve -m <EXPORTED-PATH>/model_export
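
Once the model is being served, a client sends POST requests to the server's `/invocations` endpoint. The sketch below builds such a request with the standard library only. It assumes the `mlflow models serve` defaults (host 127.0.0.1, port 5000) and the `dataframe_split` JSON layout accepted by MLflow 2.x; older MLflow versions expect a different payload format. The helper names and the example columns are illustrative and are not taken from the repo's notebook.

```python
import json
import urllib.request


def build_invocation_request(rows, columns, host="127.0.0.1", port=5000):
    """Build (but do not send) a scoring request for an MLflow model server."""
    payload = {"dataframe_split": {"columns": columns, "data": rows}}
    return urllib.request.Request(
        url=f"http://{host}:{port}/invocations",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(request, timeout=10):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical columns; the served model expects the features it was trained on.
example_request = build_invocation_request(
    rows=[[20000, 24]],
    columns=["LIMIT_BAL", "AGE"],
)
```

With the server running, `predict(example_request)` would return the model's predictions as parsed JSON, provided the columns match those the model was trained on.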

Finishing the Environment

  • Deactivate the Virtual Environment

    conda deactivate
    

Drift detection

CI/CD: Continuous machine learning integration

You can see the CI/CD pipeline for the credit card default model in GitHub Actions. Each commit triggers and runs the pipeline.
