For the sake of simplicity, each component of the architecture is explained as a "should be" followed by its "current implementation" (limited in scope, just for the demo):
- Data Analysis: Should be extracted from a feature store; in this demo a simple CSV dataset (credit card default) is processed instead [https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients#]
- Model tracking/experimentation: Should be a dedicated in-house MLflow instance; in this demo we spin up an MLflow Docker container. Model tracking is organized as a series of MLflow Projects (each a data-science component with its own dependencies)
- Source code / repository: In this demo it is GitHub
- CI/CD Stage: Done with GitHub Actions; you can see it in the "Actions" tab on GitHub
- Automated Pipelines: Implemented as an Inference Pipeline (containing the preprocessing and the ML model). Each step is a distinct MLflow Project, together covering the whole machine learning development lifecycle
- Model Registry: Done through the MLflow Model Registry; in this demo it is limited to versioned model registration
- Trained Model / Model Serving: There are various options for implementing it (Flask API, FastAPI, or a batch approach), but in this demo it is supported with MLflow model serving
- ML Prediction Service: Backed by the model serving stage, it is an active service listening in real time for POST requests (see an example in the "serve/" folder). It could reuse features from the feature store as well as additional data/features coming with the request
- Performance Monitoring: Should be done by integrating the monitoring tools the company uses at the time (Grafana, Splunk, and so on); in this demo it is done with Evidently AI orchestrated by Airflow, which generates drift metrics (see the "drift/" folder and the drift dashboard report at "drift/evidently_reports/credit_card_default_data_drift_by_airflow.html"; a minimal sketch follows this list)
- Alert triggers: Should be done by integrating the alerting tools the company uses at the time (PagerDuty, Slack)
- Retraining: Should reuse the Automated Pipeline created for CI/CD, also triggered by a drift detection rule
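As an illustration of the monitoring step above, here is a minimal sketch of how such a drift report could be generated with Evidently (assuming an Evidently 0.4-style API and hypothetical dataset paths; the actual Airflow-orchestrated job in "drift/" may differ):

```python
import pandas as pd
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

# Hypothetical paths: the real job reads the datasets produced by the pipeline
reference = pd.read_csv("data/data_train.csv")
current = pd.read_csv("data/data_test.csv")

# Compare the current data against the reference data column by column
report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=reference, current_data=current)

# Persist the HTML dashboard, similar to the report shipped in drift/evidently_reports/
report.save_html("credit_card_default_data_drift.html")
```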
-
List Docker images (including intermediate ones)
docker image ls -a
-
List Docker containers (including stopped ones)
docker ps -a
-
Move to the folder where you keep your project repos
cd Documents/ProjectRepos
-
Download this Repo
git clone https://github.com/lcajachahua/model-credit-mlflow.git
-
After downloading the repo, move to the root folder
cd model-credit-mlflow
-
Build the project's Docker image
docker build -t model-credit-mlflow .
-
Activate the Virtual Environment
conda activate pipeline_test
-
Downloading
mlflow run ./download -P step=download_data -P file_url="https://github.com/lcajachahua/model-credit-mlflow/raw/main/_data/default_of_credit_card_clients.csv?raw=true" -P artifact_name=raw_data.csv -P artifact_description="Pipeline for data downloading" --experiment-name credit_card_default --run-name download_data
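For reference, the download step invoked above could look roughly like this (a minimal sketch, assuming requests and the MLflow Python API; the actual code in the "download/" MLflow Project may differ):

```python
import argparse

import mlflow
import requests


def go(file_url: str, artifact_name: str) -> None:
    # Fetch the raw CSV from the given URL and write it under the artifact name
    response = requests.get(file_url, timeout=60)
    response.raise_for_status()
    with open(artifact_name, "wb") as fp:
        fp.write(response.content)

    # Log the raw data as an MLflow artifact so downstream steps can reuse it
    with mlflow.start_run():
        mlflow.log_param("file_url", file_url)
        mlflow.log_artifact(artifact_name)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--file_url", required=True)
    parser.add_argument("--artifact_name", default="raw_data.csv")
    args = parser.parse_args()
    go(args.file_url, args.artifact_name)
```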
-
Preprocessing
mlflow run ./preprocess -P step=preprocess -P input_step=download_data -P input_artifact=raw_data.csv -P artifact_name=preprocessed_data.csv -P artifact_description="Pipeline for data preprocessing" --experiment-name credit_card_default --run-name preprocess
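The preprocessing step could be sketched as follows (hypothetical cleaning logic with pandas; the real "preprocess/" project defines its own transformations):

```python
import pandas as pd


def preprocess(input_path: str = "raw_data.csv",
               output_path: str = "preprocessed_data.csv") -> None:
    df = pd.read_csv(input_path)

    # Basic hygiene: drop exact duplicates and rows with missing values
    df = df.drop_duplicates().dropna()

    # Hypothetical rename: expose the target under the short name "default"
    # used by the later segregate/modeling steps
    df = df.rename(columns={"default payment next month": "default"})

    df.to_csv(output_path, index=False)


if __name__ == "__main__":
    preprocess()
```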
-
Check/tests
mlflow run ./check_data -P step=check_data -P input_step=preprocess -P reference_artifact=preprocessed_data.csv -P sample_artifact=preprocessed_data.csv -P ks_alpha=0.05 --experiment-name credit_card_default --run-name check_data
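The data checks could, for example, apply a two-sample Kolmogorov-Smirnov test per numeric column using the ks_alpha threshold passed above (a minimal sketch with scipy; the actual tests in "check_data/" may be more extensive):

```python
import pandas as pd
from scipy import stats


def check_distributions(reference_path: str,
                        sample_path: str,
                        ks_alpha: float = 0.05) -> None:
    reference = pd.read_csv(reference_path)
    sample = pd.read_csv(sample_path)

    # Compare the distribution of every numeric column between the two datasets
    for column in reference.select_dtypes("number").columns:
        _, p_value = stats.ks_2samp(reference[column], sample[column])
        # Fail the check if the two samples look significantly different
        assert p_value > ks_alpha, f"Possible drift detected in column '{column}'"


if __name__ == "__main__":
    check_distributions("preprocessed_data.csv", "preprocessed_data.csv")
```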
-
Segregation
mlflow run ./segregate -P step=segregate -P input_step=preprocess -P input_artifact=preprocessed_data.csv -P artifact_root=data -P test_size=0.3 -P stratify=default --experiment-name credit_card_default --run-name segregate
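The segregation step essentially performs a stratified train/test split; a minimal sketch with scikit-learn (assuming the target column is named "default", as in the command above):

```python
from pathlib import Path

import pandas as pd
from sklearn.model_selection import train_test_split


def segregate(input_path: str = "preprocessed_data.csv",
              artifact_root: str = "data",
              test_size: float = 0.3,
              stratify_col: str = "default") -> None:
    df = pd.read_csv(input_path)

    # Stratified split so the class balance of the target is preserved
    train, test = train_test_split(
        df, test_size=test_size, stratify=df[stratify_col], random_state=42
    )

    # Written under the artifact root ("data") used by the later steps
    Path(artifact_root).mkdir(exist_ok=True)
    train.to_csv(f"{artifact_root}/data_train.csv", index=False)
    test.to_csv(f"{artifact_root}/data_test.csv", index=False)
```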
-
Modeling
mlflow run ./random_forest -P step=random_forest -P input_step=segregate -P train_data=data/data_train.csv -P model_config=rf_config.yaml -P export_artifact=model_export -P random_seed=42 -P val_size=0.3 -P stratify=default --experiment-name credit_card_default --run-name random_forest
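The modeling step trains a random forest and exports it via MLflow; a minimal sketch (the actual step reads its hyperparameters from rf_config.yaml, which is omitted here):

```python
import mlflow
import mlflow.sklearn
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def train(train_path: str = "data/data_train.csv",
          target: str = "default",
          val_size: float = 0.3,
          random_seed: int = 42) -> None:
    df = pd.read_csv(train_path)
    X, y = df.drop(columns=[target]), df[target]

    # Hold out a validation split, stratified on the target
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=val_size, stratify=y, random_state=random_seed
    )

    model = RandomForestClassifier(random_state=random_seed)
    model.fit(X_train, y_train)

    # Log the validation score and export the model under "model_export"
    with mlflow.start_run():
        mlflow.log_metric("val_accuracy", model.score(X_val, y_val))
        mlflow.sklearn.log_model(model, artifact_path="model_export")
```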
-
Evaluate
mlflow run ./evaluate -P step=evaluate -P input_model_step=random_forest -P model_export=model_export -P input_data_step=segregate -P test_data=data/data_test.csv --experiment-name credit_card_default --run-name evaluate
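The evaluation step loads the exported model and scores it on the held-out test data; a minimal sketch (the model URI is a placeholder and depends on the run that produced model_export):

```python
import mlflow
import mlflow.sklearn
import pandas as pd
from sklearn.metrics import roc_auc_score


def evaluate(model_uri: str,
             test_path: str = "data/data_test.csv",
             target: str = "default") -> None:
    df = pd.read_csv(test_path)
    X, y = df.drop(columns=[target]), df[target]

    # model_uri is e.g. "runs:/<RUN-ID>/model_export"
    model = mlflow.sklearn.load_model(model_uri)
    scores = model.predict_proba(X)[:, 1]

    with mlflow.start_run():
        mlflow.log_metric("test_roc_auc", roc_auc_score(y, scores))
```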
-
See the online environment (MLflow UI). To stop the UI, press Ctrl+C
mlflow ui
-
Download the exported model artifacts
mlflow artifacts download -r <RUN-ID>
-
Online serving: create an endpoint to receive requests (see the example in the /serve/real_time_inference.ipynb notebook)
mlflow models serve -m <EXPORTED-PATH>/model_export
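Once the serving process is running, requests can be sent to its /invocations endpoint. A minimal sketch with the Python requests library (the column names and values below are placeholders and must match the features the model was trained on; the JSON schema shown is the MLflow 2.x "dataframe_split" format, older versions expect a different one):

```python
import requests

# Placeholder features: replace with the actual columns the model expects
payload = {
    "dataframe_split": {
        "columns": ["LIMIT_BAL", "AGE", "PAY_0"],
        "data": [[20000, 35, 1]],
    }
}

# mlflow models serve listens on http://127.0.0.1:5000 by default
response = requests.post(
    "http://127.0.0.1:5000/invocations",
    json=payload,
    timeout=30,
)
print(response.json())
```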
-
Deactivate the Virtual Environment
conda deactivate
-
You can see the CI/CD pipeline for the credit card default model in GitHub Actions. Each commit triggers and executes the CI/CD pipeline.