
[Not Actively Maintained] Whitebox is an open source E2E ML monitoring platform with edge capabilities that plays nicely with kubernetes

Home Page: https://squaredev.io/whitebox/

License: MIT License

Python 99.21% Dockerfile 0.10% Smarty 0.65% CSS 0.03%
machine-learning monitoring observability python explainability explainable-ai ml-monitoring mlflow mlops modelops

whitebox's Introduction

Update June 19, 2023: Whitebox is now prioritizing monitoring LLMs. This repo is no longer maintained, but our commitment to building fair and responsible AI applications remains. If you're passionate about ML or React and want to join us as a founding engineer, reach out to Kostas on Discord.


Whitebox - E2E machine learning monitoring

Whitebox is an open source E2E ML monitoring platform with edge capabilities that plays nicely with kubernetes


Documentation: https://whitebox-ai.github.io/whitebox

Source Code: https://github.com/whitebox-ai/whitebox

Roadmap: https://github.com/whitebox-ai/whitebox/milestones

Issue tracking https://github.com/orgs/whitebox-ai/projects/1/views/3

Discord: https://discord.gg/G5TKJMmGUt


Whitebox is an open source E2E ML monitoring platform with edge capabilities that plays nicely with kubernetes.

The key features are:

  • Classification models metrics
  • Regression models metrics
  • Data / model drift monitoring
  • Alerts

Design guidelines:

  • Easy: Very easy to set up and get started with.
  • Intuitive: Designed to be intuitive and easy to use.
  • Pythonic SDK: Pythonic SDK for building your own monitoring infrastructure.
  • Robust: Get a production-ready MLOps system.
  • Kubernetes: Deploys to Kubernetes, with automatic interactive API documentation.

Installation

Install the server using docker compose. See the docs for more info.

Install the SDK with pip:

pip install whitebox-sdk

How to use

Once the server and the SDK are installed, you can start using Whitebox.

After you get your API key, all you have to do is create an instance of the Whitebox class, passing your host and API key as parameters:

from whitebox import Whitebox

wb = Whitebox(host="127.0.0.1:8000", api_key="some_api_key")

Now you're ready to start using Whitebox! Read the documentation to learn more about the SDK.

Set up locally for development

Whitebox supports Postgres and SQLite; you can use either one. In both cases, point the DATABASE_URL environment variable at your database: for SQLite, the URL of a SQLite database file; for Postgres, the connection URL of a running Postgres instance.
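
For example (a minimal sketch assuming standard SQLAlchemy-style database URLs; the hostnames, credentials, and file path below are placeholders):

# Postgres: assumes a database named "whitebox" reachable at localhost:5432
export DATABASE_URL=postgresql://postgres:postgres@localhost:5432/whitebox

# SQLite: assumes a local file-based database in the current directory
export DATABASE_URL=sqlite:///./whitebox.db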

Install packages:

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
pre-commit install

Run the server:

ENV=dev uvicorn whitebox.main:app --reload

A quick way to start a Postgres database:

docker compose up postgres -d

Tests:

  • Run: ENV=test pytest or ENV=test pytest -s to preserve logs.
  • Watch: ENV=test ptw
  • Run test coverage: ENV=test coverage run -m pytest
  • View the coverage report: coverage report, or coverage html to generate an HTML report; open htmlcov/index.html in your browser to view it.

Docs

Documentation is hosted by GitHub here: https://whitebox-ai.github.io/whitebox

To serve the documentation locally:

mkdocs serve -f docs/mkdocs/mkdocs.yml -a localhost:8001

Deploy Whitebox

Using docker

Whitebox uses Postgres as its database. The Whitebox server and Postgres need to run in the same Docker network. An example docker-compose file is located in the examples folder. Make sure you replace the SECRET_KEY with one of your own. See below for more info.

docker-compose -f examples/docker-compose/docker-compose.yml up

If you just need to run Whitebox, make sure you set the DATABASE_URL in the environment:

docker run -dp 8000:8000 -e DATABASE_URL=postgresql://user:password@host:port/db_name sqdhub/whitebox:main

To store the API key encrypted in the database, provide a SECRET_KEY environment variable consisting of a 16-byte string. You can generate one with:

python -c "from secrets import token_hex; print(token_hex(16))"

Save this token somewhere safe.

The API key can be retrieved directly from the Postgres database:

API_KEY=$(docker exec <postgres_container_id> /bin/sh -c "psql -U postgres -c \"SELECT api_key FROM users WHERE username='admin';\" -tA")

echo $API_KEY

If you've set the SECRET_KEY in the environment, get the decrypted key using:

docker exec <whitebox_container_id> /usr/local/bin/python scripts/decrypt_api_key.py $API_KEY

Using Helm

You can also install the Whitebox server and all of its dependencies in your k8s cluster using Helm:

helm repo add squaredev https://chartmuseum.squaredev.io/
helm repo update
helm install whitebox squaredev/whitebox

Contributing

We happily welcome contributions to Whitebox. You can start by opening a new issue!

whitebox's People

Contributors

gcharis, ksiabani, momegas, nickntamp, renatocmaciel, sinnec, stavrostheocharis


whitebox's Issues

Handle inference sets with some actuals

In order to calculate performance metrics for a model, the actuals from the inference rows are required.
As of now, whitebox only handles the cases where an inference set has all of the actuals or none of them. If a set of inferences has only some actuals, the performance metrics pipeline won't work as expected.

Proposed solution:
Since actuals are optional, filter out those inference rows that don't have actuals and execute the performance metrics pipeline only for those rows that have actuals. In case there are no actuals at all, the pipeline will be skipped (that's already implemented).
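
A minimal sketch of the proposed filtering, assuming the inferences are held in a pandas DataFrame with an "actual" column (the column and function names are illustrative, not Whitebox's actual internals):

import pandas as pd

def filter_rows_with_actuals(inference_df: pd.DataFrame) -> pd.DataFrame:
    """Keep only the inference rows that have an actual value."""
    return inference_df[inference_df["actual"].notna()]

def run_performance_pipeline(inference_df: pd.DataFrame):
    rows_with_actuals = filter_rows_with_actuals(inference_df)
    if rows_with_actuals.empty:
        # No actuals at all: skip the pipeline (already implemented behaviour)
        return None
    # ... compute performance metrics on rows_with_actuals only ...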

Log inferences

Description

As a developer I want to be able to log my model's input data and predictions in prod

Data to be logged

  • features
  • predictions
  • raw inputs
  • actuals

Tasks

  • Inferences basic APIs
  • Inferences batch create

Check model monitor's status before creating alerts

Currently, the status of a monitor (active/inactive) isn't checked before running the alert-creation checks, so even an inactive monitor can create alerts.

Suggested Solutions:

  • Check for the monitor's status before creating the alerts OR
  • Fetch only active monitors from the database in alerts pipeline.
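
A minimal sketch of the second option, fetching only active monitors so inactive ones can never produce alerts (the dict-based monitor structure is illustrative, not Whitebox's actual schema):

def get_active_monitors(monitors: list[dict]) -> list[dict]:
    """Return only the monitors whose status is 'active'."""
    return [m for m in monitors if m.get("status") == "active"]

# Usage: run the alerts pipeline over get_active_monitors(all_monitors) only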

Missing values count should be performed in unprocessed dataset

Since the missing-value count is an indicator of missing values in a feature of a dataset, it is much more valuable to compute it on the unprocessed dataset rather than the processed one; a missing-value handling procedure has most likely already been applied to the processed dataset.

Also, for monitoring and alerting purposes, it is more useful to count missing values as a percentage of the total entries of each feature (e.g. if a feature has 100 entries and 3 missing values, the missing-value count is 3%). This makes setting thresholds for this specific metric much easier.
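
A minimal sketch of the percentage-based count, assuming the unprocessed dataset is a pandas DataFrame (illustrative, not Whitebox's actual pipeline code):

import pandas as pd

def missing_values_percentage(unprocessed_df: pd.DataFrame) -> dict:
    """Return the percentage of missing values per feature, e.g. {'age': 3.0}."""
    return (unprocessed_df.isna().mean() * 100).round(2).to_dict()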

Monitoring / alerting functionality

Description

Implement the ability for a user to create a monitor for a metric and get alerts when conditions are met.

The metrics that can be monitored are:

  • accuracy
  • precision
  • recall
  • f1
  • data drift for a feature
  • concept drift for a feature
  • missing values count for a feature

Proposed solution

We need some REST endpoints to create the monitors (this is quite complex). A CRUD set of endpoints should be enough.

Then, after the analytics pipelines run, we can fetch the monitors from the database and compare the conditions of the monitors with the analytics results. When a condition is met, an alert should be created in the database.
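
A minimal sketch of the comparison step, assuming a monitor stores a metric name, an operator, and a threshold (the structure is illustrative, not Whitebox's actual schema):

import operator

# Supported comparison operators for a monitor condition (illustrative)
OPERATORS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
             ">=": operator.ge, "==": operator.eq}

def condition_is_met(monitor: dict, analytics_results: dict) -> bool:
    """Compare a metric value from the analytics results against the monitor's threshold."""
    value = analytics_results.get(monitor["metric"])
    if value is None:
        return False
    return OPERATORS[monitor["operator"]](value, monitor["threshold"])

def create_alerts(monitors: list[dict], analytics_results: dict) -> list[dict]:
    """Return the alerts that should be stored in the database."""
    return [
        {"monitor_id": m["id"], "metric": m["metric"], "value": analytics_results[m["metric"]]}
        for m in monitors
        if condition_is_met(m, analytics_results)
    ]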

Create a pipeline that calculates data drifts per feature compared to training set

Description

As a developer, I want to be able to track data drift per feature (compared to the training set) that may occur in my ML app

Proposed solution

The pipeline should take the training set and the inference data as input, calculate the drift distance per feature, and return it as output.
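
A minimal sketch of the idea using a two-sample Kolmogorov-Smirnov test per numeric feature (libraries such as those proposed below provide richer drift metrics; this is only an illustration, not the proposed implementation):

import pandas as pd
from scipy.stats import ks_2samp

def data_drift_per_feature(training_set: pd.DataFrame, inference_set: pd.DataFrame) -> dict:
    """Return a drift statistic (KS distance) per shared numeric feature."""
    drift = {}
    for feature in training_set.columns.intersection(inference_set.columns):
        if pd.api.types.is_numeric_dtype(training_set[feature]):
            statistic, _p_value = ks_2samp(
                training_set[feature].dropna(), inference_set[feature].dropna()
            )
            drift[feature] = float(statistic)
    return drift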

Proposed algos

  • Evidently AI
  • NannyML

Acceptance criteria

  • implementation
  • tests

Analytics pipeline for all model metrics

Description

As whitebox gathers the data from the model inferences, we need a way for all the metrics to be calculated every T (time). These metrics are later going to be used for visualizations, and issue #24 (not part of this issue).

Proposed solution

The proposed initial solution is a cron job that gathers all the needed data for every metric and calculates it, then saves it to the database. The time the cron job runs should be configurable from an ENV variable (this can change in later iterations).

The pipelines the cron needs to run are the following:

[Roadmap] Add regression models

Description

Machine Learning Regression is a technique for investigating the relationship between independent variables or features and a dependent variable or outcome. It's used as a method for predictive modelling in machine learning, in which an algorithm is used to predict continuous outcomes.

Whitebox is currently producing metrics for classification models. We need to expand this to produce metrics for regression models as well.

Metrics

Example of metrics:

  1. Mean Squared Error
  2. Root Mean Squared Error
  3. Mean Absolute Error

The outcome should be that the above metrics are stored in the database just like classification problems and then depending on the model the API needs to return the appropriate results.
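
A minimal sketch of how these three metrics could be computed with scikit-learn (illustrative; not Whitebox's actual pipeline code):

import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

def regression_metrics(actuals, predictions) -> dict:
    mse = mean_squared_error(actuals, predictions)
    return {
        "mean_squared_error": mse,
        "root_mean_squared_error": float(np.sqrt(mse)),
        "mean_absolute_error": mean_absolute_error(actuals, predictions),
    }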

Explain model inference functionality

Description

We need to create an endpoint that, given the id of an inference stored in the database, returns its explanation using SHAP values or a similar algorithm. For this to be possible, we need 3 things to be in place.

  1. A training dataset
  2. A decision tree based trained model
  3. The inference on which to perform the explainability (should be in the database).
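
A hedged sketch of the explanation step for a single inference row, assuming a trained decision-tree based model and the shap package (function and argument names are illustrative, not the proposed endpoint implementation):

import pandas as pd
import shap

def explain_inference_row(model, inference_row: pd.DataFrame) -> dict:
    """Return a feature -> SHAP value mapping for a single inference row."""
    # TreeExplainer works with decision-tree based models (e.g. LightGBM, XGBoost)
    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(inference_row)
    # For classification, shap_values may be a list with one array per class;
    # take the first class here purely for illustration
    values = shap_values[0] if isinstance(shap_values, list) else shap_values
    return dict(zip(inference_row.columns, values[0]))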

So we need the following to be completed:

Perform global explainability on the inference dataset

As of now, the pipeline performs explainability per inference row.
Explainability for the whole inference dataset may be useful in the future.
Some related code has already been written:

  • Pipeline
from typing import Dict

import joblib
import lime.lime_tabular
import pandas as pd

# create_multiclass_classification_training_model_pipeline and
# create_binary_classification_training_model_pipeline come from whitebox's
# training pipelines


def create_xai_pipeline_classification_per_inference_dataset(
    training_set: pd.DataFrame,
    target: str,
    inference_set: pd.DataFrame,
    type_of_task: str,
    load_from_path=None,
) -> Dict[str, Dict[str, float]]:

    xai_dataset = training_set.drop(columns=[target])
    explainability_report = {}

    # Make a mapping dict which will be used later to map the explainer index
    # with the feature names
    mapping_dict = {}
    for feature in range(0, len(xai_dataset.columns.tolist())):
        mapping_dict[feature] = xai_dataset.columns.tolist()[feature]

    # The explainer only needs the training data, so create it once regardless
    # of whether the model is loaded from disk or trained on the fly
    explainer = lime.lime_tabular.LimeTabularExplainer(
        xai_dataset.values,
        feature_names=xai_dataset.columns.values.tolist(),
        mode="classification",
        random_state=1,
    )

    # Explainability for both classification tasks.
    # We have to revisit this in the future: if we load the model from the
    # file system we don't care whether it is binary or multiclass.

    if type_of_task == "multiclass_classification":
        # Giving the option of retrieving the local model
        if load_from_path is not None:
            model = joblib.load("{}/lgb_multi.pkl".format(load_from_path))
        else:
            model, _eval = create_multiclass_classification_training_model_pipeline(
                training_set, target
            )

        for inference_row in range(0, len(inference_set)):
            exp = explainer.explain_instance(
                inference_set.values[inference_row], model.predict
            )
            med_report = exp.as_map()
            temp_dict = dict(list(med_report.values())[0])
            map_dict = {mapping_dict[name]: val for name, val in temp_dict.items()}
            explainability_report["row{}".format(inference_row)] = map_dict

    elif type_of_task == "binary_classification":
        # Giving the option of retrieving the local model
        if load_from_path is not None:
            model = joblib.load("{}/lgb_binary.pkl".format(load_from_path))
        else:
            model, _eval = create_binary_classification_training_model_pipeline(
                training_set, target
            )

        for inference_row in range(0, len(inference_set)):
            exp = explainer.explain_instance(
                inference_set.values[inference_row], model.predict_proba
            )
            med_report = exp.as_map()
            temp_dict = dict(list(med_report.values())[0])
            map_dict = {mapping_dict[name]: val for name, val in temp_dict.items()}
            explainability_report["row{}".format(inference_row)] = map_dict

    return explainability_report
  • Unit tests
    def test_create_xai_pipeline_classification_per_inference_dataset(self):
        binary_class_report = create_xai_pipeline_classification_per_inference_dataset(
            df_binary, "target", df_binary_inference, "binary_classification"
        )
        multi_class_report = create_xai_pipeline_classification_per_inference_dataset(
            df_multi, "target", df_multi_inference, "multiclass_classification"
        )
        binary_contribution_check_one = binary_class_report["row0"]["worst perimeter"]
        binary_contribution_check_two = binary_class_report["row2"]["worst texture"]
        multi_contribution_check_one = multi_class_report["row0"]["hue"]
        multi_contribution_check_two = multi_class_report["row9"]["proanthocyanins"]
        assert len(binary_class_report) == len(df_binary_inference)
        assert len(multi_class_report) == len(df_multi_inference)
        assert round(binary_contribution_check_one, 3) == 0.253
        assert round(binary_contribution_check_two, 2) == -0.09
        assert round(multi_contribution_check_one, 2) == -0.08
        assert round(multi_contribution_check_two, 3) == -0.023

Create metrics calculation pipeline for classification models

Create a pipeline that, given features, predictions, actuals, etc., calculates the following:
You can find more here

Simple feature metrics

  • Missing Count
  • Average
  • Minimum
  • Maximum
  • Sum
  • Variance
  • Standard Deviation

Model performance metrics

  • Precision
  • Recall
  • F1
  • Accuracy
  • True Positive Count
  • True Negative Count
  • False Positive Count
  • False Negative Count

Acceptance criteria

  • The pipeline must assume that it gets all the required data as input (actuals may be missing) and returns a result containing the above.
  • Unit test happy and error cases
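
A minimal sketch of both metric groups, using pandas for the feature statistics and scikit-learn for the model performance metrics, assuming a binary classification setup (illustrative; not Whitebox's actual pipeline):

import pandas as pd
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

def feature_metrics(feature: pd.Series) -> dict:
    return {
        "missing_count": int(feature.isna().sum()),
        "average": float(feature.mean()),
        "minimum": float(feature.min()),
        "maximum": float(feature.max()),
        "sum": float(feature.sum()),
        "variance": float(feature.var()),
        "std": float(feature.std()),
    }

def classification_metrics(actuals, predictions) -> dict:
    tn, fp, fn, tp = confusion_matrix(actuals, predictions).ravel()
    return {
        "precision": precision_score(actuals, predictions),
        "recall": recall_score(actuals, predictions),
        "f1": f1_score(actuals, predictions),
        "accuracy": accuracy_score(actuals, predictions),
        "true_positive_count": int(tp),
        "true_negative_count": int(tn),
        "false_positive_count": int(fp),
        "false_negative_count": int(fn),
    }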

Upload the training datasets used for the creation of models

Description

For some functionality to be implemented, the training dataset of the model is needed. We need to create an endpoint that allows the user to upload the training dataset used for the creation of models. The rows should be saved in the database. We also need a way to retrieve them altogether along with the dataset info.

Proposed solution

Create the following endpoints

  • POST v1/dataset/metadata: Creates the dataset metadata entity in the database
  • POST v1/dataset/{dataset_id}/rows: Creates the rows in the database. Accepts an array of the rows as batch
  • GET v1/dataset/metadata: Returns the metadata of the dataset (name, etc)
  • GET v1/dataset/{dataset_id}: Returns the rows of the dataset
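
A hedged sketch of what the rows endpoints could look like in FastAPI, which Whitebox's API is built on (the schema and the in-memory store below are hypothetical stand-ins, not the proposed implementation):

from typing import Dict, List

from fastapi import APIRouter
from pydantic import BaseModel

router = APIRouter(prefix="/v1/dataset")

# In-memory stand-in for the database, purely for illustration
_ROWS: Dict[str, List[dict]] = {}

class DatasetRow(BaseModel):
    # Hypothetical schema: arbitrary feature values keyed by column name
    values: dict

@router.post("/{dataset_id}/rows")
def create_dataset_rows(dataset_id: str, rows: List[DatasetRow]):
    """Accepts a batch of rows and stores them for the given dataset."""
    _ROWS.setdefault(dataset_id, []).extend(row.values for row in rows)
    return {"items_created": len(rows)}

@router.get("/{dataset_id}")
def get_dataset_rows(dataset_id: str):
    """Returns the stored rows of the dataset."""
    return _ROWS.get(dataset_id, [])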

Schema naming pattern and default properties

There's a need to discuss the schema naming pattern and default values, because with constant changes and inconsistencies the situation will get complicated and difficult to fix as the app scales.

Training model pipeline failing with KeyError

The way the API and pipeline are designed right now:

  • A user inserts their dataset rows and when saved to the database the training model pipeline is triggered.
  • If everything is successful the trained model is saved in the user's filesystem.

Here there's a possibility that the pipeline will fail with a KeyError because the target given when creating a model doesn't exist in the training dataset (a user error). The problem is that no trained model is produced while the dataset rows are still saved in the database. The user can't delete them from the SDK or API, and if they insert new, correct ones, it will cause chaos in the metrics calculations due to column mismatches.

Suggested solutions:

  1. Save the dataset rows in the database, train the model and if it fails (Raising a KeyError exception), delete the saved dataset rows from the database so the user can start fresh. OR
  2. Before the dataset rows are saved in the database, check if the target exists in the dataset provided by the user.
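
A minimal sketch of the second suggested solution, a validation step before the rows are saved (the exception type and names are illustrative):

import pandas as pd

def validate_target_exists(dataset_rows: pd.DataFrame, target: str) -> None:
    """Reject the upload early if the model's target column is missing."""
    if target not in dataset_rows.columns:
        raise ValueError(
            f"Target column '{target}' was not found in the uploaded dataset; "
            "rows were not saved."
        )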

Add ability to define custom metrics in the monitoring pipelines

Description

At the moment, Whitebox calculates specific monitoring metrics that are considered a standard in the industry.

Some users though have use cases where these metrics are not enough and would like to integrate their own custom metrics for whitebox to calculate.

Migrate all analytics pipelines to Airflow

Since whitebox workflows are becoming more and more complex, we need a way to orchestrate them. We can use the Airflow Python API to define and execute workflows. In Airflow, a workflow is defined as a directed acyclic graph (DAG): each node in the DAG represents a task, and the edges between nodes represent dependencies between tasks.

The architecture should become roughly as follows:

sequenceDiagram

participant API as API
participant Database as Database
participant Airflow as Airflow

API ->> Database: Store data

loop Cron Workflows
    Database ->> Airflow: Extract data
    Airflow ->> Airflow: Analyze data
    Airflow ->> Database: Store result
end

Some implementation notes:

  • Airflow should be a different deployment than the API. Approach it as a separate service that runs all the workflows.
  • It should use the same SQL instance but have its own database as a backend.
  • All current (and future) metrics calculations should happen inside airflow.
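
A hedged sketch of what one such DAG could look like (the DAG id, task names, and schedule are assumptions; the extract/analyze/store functions are placeholders for the actual pipelines):

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_data():
    # Placeholder: pull inference data from the Whitebox database
    pass

def analyze_data():
    # Placeholder: run the metrics / drift pipelines
    pass

def store_results():
    # Placeholder: write the analytics results back to the database
    pass

with DAG(
    dag_id="whitebox_analytics",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@hourly",  # assumption: would be configurable in practice
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_data", python_callable=extract_data)
    analyze = PythonOperator(task_id="analyze_data", python_callable=analyze_data)
    store = PythonOperator(task_id="store_results", python_callable=store_results)

    extract >> analyze >> store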

[Roadmap] Publish Helm charts

Description

Create Helm chart files so we and other users can deploy whitebox in their environments

Proposed solution

Create the helm chart in this repo in a helm folder in the root of the project

Register model and model data

Description

As a developer, I want to be able to register my model and model data

Objects to be registered

  • Model
  • Training data
  • Test data

Analysis

Upon registering the model, a surrogate model should be created to be used for XAI

Get training and inference datasets from S3

Is your feature request related to a problem? Please describe.
At the moment, whitebox only accepts the training dataset and inferences through the API. In order to load data faster, users should be able to define how whitebox can retrieve this data from external sources such as an S3 bucket.

Describe the solution you'd like
In the UI / API / SDK the user should be able to choose a source for the model training dataset and inferences. A possible flow is the following:

  1. Create the model (either through UI or SDK)
  2. Go to settings and select where the data will be coming from (S3, in the future SQL, etc)
  3. Add the required credentials for S3 bucket and the corresponding file

Describe alternatives you've considered
At the moment, whitebox only accepts the training dataset and inferences through the API. This is an alternative but we need to extend it.

Additional context
Here is an example of how Aporia does something similar: https://docs.aporia.com/introduction/quickstart
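
A hedged sketch of step 3 of the flow above: downloading a training dataset from an S3 bucket with boto3 (the bucket, key, and credential handling are placeholders, not the proposed Whitebox implementation):

import boto3
import pandas as pd

def load_training_set_from_s3(bucket: str, key: str, aws_access_key_id: str,
                              aws_secret_access_key: str) -> pd.DataFrame:
    """Download a CSV training dataset from S3 and load it as a DataFrame."""
    s3 = boto3.client(
        "s3",
        aws_access_key_id=aws_access_key_id,
        aws_secret_access_key=aws_secret_access_key,
    )
    obj = s3.get_object(Bucket=bucket, Key=key)
    return pd.read_csv(obj["Body"])

# Usage (placeholders):
# df = load_training_set_from_s3("my-bucket", "datasets/training.csv", "KEY_ID", "SECRET")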

Inserting timestamps and actuals of inferences through SDK

The way the SDK is designed at the moment, there are two issues concerning inferences:

Timestamps:
The user can insert a timestamp that applies to the whole set of inferences, but not to a specific row individually.

Actuals:
The user cannot insert actuals for the inferences. Even if they do, by placing them in the processed dataframe, the SDK won't handle them and the 'actual' field will always be empty. Since there's no way for the user to update the actuals at a later time, the user will never get the performance metrics.

Content

From megas' chapters

  • Tutorial - user guide (Step by step detailed guide with examples on how to use WB)

Whitebox Grafana dashboard

As whitebox produces metrics, we need a way to display them in a dashboard.

Grafana is a nice and flexible dashboard tool made for exactly this kind of purpose.

The dashboard should display key metrics and statistics related to the performance of machine learning models. This will help users to track the performance of models over time, identify areas for improvement, and ensure that models are serving customers effectively.

Some ideas for what to include in the dashboard:

  • Model accuracy over time
  • Model training and evaluation times
  • Model resource usage (e.g. GPU memory and compute time)
  • Number of model predictions made

In short, we need a dashboard that displays all Whitebox metrics depending on model type.

Note: Since dashboards in Grafana cannot be dynamic (you only import a JSON), we may need an endpoint that produces said JSON so that the user can copy it and import it into Grafana. This should also be doable from the SDK.

Track model's performance

Description

As a developer I want to be able to track my model's performance with specific metrics

Proposed metrics:

  • Mean Squared Error
  • Root Mean Squared Error
  • Mean Absolute Error
  • Precision
  • Recall
  • F1
  • Accuracy
  • True Positive Count
  • True Negative Count
  • False Positive Count
  • False Negative Count

Copy information of a model from MLFlow

Description

Since MLFlow is an industry standard and a lot of people use it, it makes sense for whitebox to integrate with it and use it as a data store (or something similar), providing the missing monitoring functionality in the MLOps field.

[Roadmap] Add example notebooks in the repo and docs and embed them in docs

We want to improve the usability of our documentation by adding example notebooks that demonstrate how to use the various features of our software. These notebooks should be included in the documentation, along with embedded versions that users can interact with. This will allow users to see the code in action and experiment with the examples themselves. In addition, we believe that this will make it easier for users to understand how to use the software and will help them get up and running more quickly.

Value error in ROC AUC score

The error is a ValueError raised during the ROC AUC score calculation (error screenshot in the original issue).

How to reproduce the error:
Create a new small dataset of at most 5 rows with only 2 labels and train a model that always predicts one of the two values.

How to fix the error:
(There are 2 parts that need to be fixed)

  • On Line 67 in analytics/models/pipeline, the position of the ROC score calculation needs to change, based on the documentation of the package (screenshots in the original issue).

  • Also, at the train/test split, we need to ensure that each one of the classes is always represented in the test set, otherwise the same error will occur again.
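
A minimal sketch of the second fix using a stratified split, which keeps every class represented in the test set as long as each class has enough rows (illustrative; not the actual pipeline code):

from sklearn.model_selection import train_test_split

def stratified_split(features, labels, test_size: float = 0.2, random_state: int = 1):
    """Split so that class proportions are preserved in both train and test sets."""
    return train_test_split(
        features,
        labels,
        test_size=test_size,
        random_state=random_state,
        stratify=labels,
    )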

Detect missing feature values

Description

As a developer I want to track feature quality over the data sent to my model and uncover missing feature values.
