lorentzenchr / model-diagnostics

Tools for diagnostics and assessment of (machine learning) models

Home Page: https://lorentzenchr.github.io/model-diagnostics/

License: MIT License

Language: Python 100.00%
Topics: bias-detection, calibration, machine-learning, performance-metrics, python

model-diagnostics's People

Contributors

lorentzenchr, m-maggi, mayer79


model-diagnostics's Issues

Make mkdocstrings filters work

Currently, the mkdocstrings filters in mkdocs.yml do not work (as I would expect). The files for generating the API Reference page are instead filtered in gen_ref_pages.py.

Generalised PAV algo for isotonic regression

Implement isotonic regression for functionals $T$ (expectiles and quantiles) with the generalised PAV algorithm; a minimal sketch of the pooling idea follows.
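A minimal sketch of the generalised pooling idea, assuming only numpy: whenever adjacent blocks violate monotonicity, they are merged and the pooled value is recomputed with the requested functional (np.mean for the mean, np.median or a functools.partial of np.quantile for quantiles). Illustrative only, not the package implementation, and quadratic in the worst case:

import numpy as np

def pav(y, functional=np.mean):
    # Each block holds (fitted value, pooled observations).
    blocks = []
    for yi in map(float, y):
        blocks.append((yi, [yi]))
        # Merge backwards while adjacent fitted values are decreasing.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            _, obs_right = blocks.pop()
            _, obs_left = blocks.pop()
            pooled = obs_left + obs_right
            blocks.append((float(functional(pooled)), pooled))
    return np.concatenate([[v] * len(obs) for v, obs in blocks])

pav(np.array([3.0, 1.0, 2.0, 5.0, 4.0]))  # -> array([2. , 2. , 2. , 4.5, 4.5])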

Wrong axis labels on reliability diagrams for non-mean functionals

The xlabel and ylabel values are currently hardcoded:

if diagram_type == "reliability":
    ylabel = "estimated E(Y|prediction)"
    title = "Reliability Diagram"
else:
    ylabel = "prediction - estimated E(Y|prediction)"
    title = "Bias Reliability Diagram"
ax.set(xlabel="prediction for E(Y|X)", ylabel=ylabel)

In particular, the labels are wrong whenever functional != "mean".
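A hypothetical sketch of functional-dependent labels; functional and level are assumed to be available in the plotting scope, and the dictionary keys are illustrative:

# Hypothetical sketch: build the label from the requested functional
# instead of hardcoding the mean.
symbol = {"mean": "E", "expectile": f"E_{level}", "quantile": f"Q_{level}"}[functional]
if diagram_type == "reliability":
    ylabel = f"estimated {symbol}(Y|prediction)"
    title = "Reliability Diagram"
else:
    ylabel = f"prediction - estimated {symbol}(Y|prediction)"
    title = "Bias Reliability Diagram"
ax.set(xlabel=f"prediction for {symbol}(Y|X)", ylabel=ylabel)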

Score decomposition

Add the score/loss decomposition according to Pohle with the CORP approach (PAV algorithm/isotonic regression) of Dimitriadis et al.; a minimal sketch follows the references below.

M.-O. Pohle. "The Murphy Decomposition and the Calibration-Resolution Principle: A New Perspective on Forecast Evaluation". In: arXiv (2020). [arXiv: 2005.01835](https://arxiv.org/abs/2005.01835) [stat.ME].

T. Dimitriadis, T. Gneiting, and A. I. Jordan. "Stable reliability diagrams for probabilistic classifiers". In: Proceedings of the National Academy of Sciences 118.8 (2021), e2016191118. [doi: 10.1073/pnas.2016191118](https://doi.org/10.1073/pnas.2016191118).
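A hedged sketch of the CORP decomposition for the squared error, using scikit-learn's IsotonicRegression as the PAV step; illustrative, not the planned implementation:

import numpy as np
from sklearn.isotonic import IsotonicRegression

def corp_decomposition(y_obs, y_pred):
    """Mean squared error = MCB - DSC + UNC (CORP, via isotonic recalibration)."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)

    def score(y, z):
        return np.mean((y - z) ** 2)

    recalibrated = IsotonicRegression().fit_transform(y_pred, y_obs)  # PAV step
    marginal = np.full_like(y_obs, y_obs.mean())  # best constant forecast
    mcb = score(y_obs, y_pred) - score(y_obs, recalibrated)   # miscalibration
    dsc = score(y_obs, marginal) - score(y_obs, recalibrated) # discrimination
    unc = score(y_obs, marginal)                              # uncertainty
    return mcb, dsc, unc  # and mcb - dsc + unc == score(y_obs, y_pred)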

Bug? compute_bias raises IndexError in minimal conda environment installation if pyarrow is not installed

Description

I installed model-diagnostics in a new conda environment, and calling compute_bias raised an IndexError (also in the example regression notebook). If I install pyarrow, the error disappears and the result under "Expected result" appears; if I uninstall pyarrow, the IndexError returns.

Traceback (most recent call last):
  File "/workspaces/dev_env/minimal_example.py", line 8, in <module>
    _ = compute_bias(
        ^^^^^^^^^^^^^
  File "/opt/conda/envs/minimal/lib/python3.11/site-packages/model_diagnostics/calibration/identification.py", line 488, in compute_bias
    p_value[(n > 1) & (stderr_ == 0)] = 0
    ~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
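The numpy-level failure can be reproduced in isolation. Presumably, without pyarrow the boolean mask reaches numpy as an object it cannot use as an index; the exact object depends on the polars-to-numpy conversion path, so this stand-in is an assumption:

import numpy as np

p_value = np.array([0.5])

class OpaqueMask:
    """Stand-in for a mask object that numpy cannot use as an index."""

p_value[OpaqueMask()] = 0.0
# IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis
# (`None`) and integer or boolean arrays are valid indices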

Code to reproduce

conda create -n minimal python=3.11
source activate minimal
pip install model-diagnostics
...
Installing collected packages: threadpoolctl, six, pyparsing, polars, pillow, packaging, numpy, kiwisolver, joblib, fonttools, cycler, scipy, python-dateutil, contourpy, scikit-learn, matplotlib, model-diagnostics
Successfully installed contourpy-1.1.1 cycler-0.12.0 fonttools-4.43.0 joblib-1.3.2 kiwisolver-1.4.5 matplotlib-3.8.0 model-diagnostics-1.0.1 numpy-1.26.0 packaging-23.2 pillow-10.0.1 polars-0.19.7 pyparsing-3.1.1 python-dateutil-2.8.2 scikit-learn-1.3.1 scipy-1.11.3 six-1.16.0 threadpoolctl-3.2.0
import numpy as np
from model_diagnostics.calibration.plots import compute_bias

y = np.array([0., 50.])
mx = np.array([0.01, 1.5])

_ = compute_bias(
    y_obs=y,
    y_pred=mx,
    feature=None,
)
print(_)

Expected result

shape: (1, 5)
┌───────────┬────────────┬──────────────┬─────────────┬──────────┐
│ bias_mean ┆ bias_count ┆ bias_weights ┆ bias_stderr ┆ p_value  │
│ ---       ┆ ---        ┆ ---          ┆ ---         ┆ ---      │
│ f64       ┆ u32        ┆ f64          ┆ f64         ┆ f64      │
╞═══════════╪════════════╪══════════════╪═════════════╪══════════╡
│ -24.245   ┆ 2          ┆ 2.0          ┆ 24.255      ┆ 0.500131 │
└───────────┴────────────┴──────────────┴─────────────┴──────────┘

Versions

OS linux-aarch64
model-diagnostics 1.0.1
Numpy 1.26.0
Python 3.11.5.final.0

Import from model-diagnostics.scoring is not working

Trying to use model-diagnostics in two separate environments (Dataiku with Python 3.8, and a basic Mac command line with Python 3.7), I have experienced the same issue with the model-diagnostics package. Please add an example to the docs that shows how to install and import the package in a new environment. Thank you.

python3 -m pip install model-diagnostics
Requirement already satisfied: model-diagnostics
python3

>>> from model_diagnostics.scoring import GammaDeviance, decompose
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'model_diagnostics.scoring'
>>> from model_diagnostics import calibration
>>> dir(calibration)
['__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', 'plot_reliability_diagram', 'plots']
>>> from model_diagnostics import scoring
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name 'scoring' from 'model_diagnostics' (/Users/ebaughm/Library/Python/3.7/lib/python/site-packages/model_diagnostics/__init__.py)
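The dir(calibration) output above suggests an old release that predates the scoring subpackage. A quick way to check the installed version (standard library, Python >= 3.8):

from importlib.metadata import version  # Python >= 3.8

print(version("model-diagnostics"))
# If this predates the release that introduced model_diagnostics.scoring,
# upgrade with: python3 -m pip install -U model-diagnostics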

Bug? scoring.decompose raises ValueError but input arrays meet the required conditions - y is not constant

Similar to #98; however, here it is not related to y being constant.
Maybe one could check in the code whether the domain error happens at the recalibration step and inform the user accordingly (see the sketch after the traceback below)?

Reproducible example:

import numpy as np
from model_diagnostics.scoring import decompose, PoissonDeviance

y = np.array([0., 50.])
w = np.array([0.1, 0.01])
mx = np.array([0.01, 1.5])

_ = decompose(
    y_obs=y,
    y_pred=mx,
    scoring_function=PoissonDeviance(),
    weights=w
)
print(_)
Traceback (most recent call last):
  File "/workspaces/dev_env/minimal_example.py", line 8, in <module>
    _ = decompose(
        ^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/model_diagnostics/scoring/scoring.py", line 869, in decompose
    score_recalibrated = scoring_function(y_o, recalibrated, w)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/model_diagnostics/scoring/scoring.py", line 58, in __call__
    return np.average(self.score_per_obs(y_obs, y_pred), weights=weights)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/model_diagnostics/scoring/scoring.py", line 202, in score_per_obs
    raise ValueError(msg)
ValueError: Valid domain for degree=1 is y_obs >= 0 and y_pred > 0.
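A minimal sketch of that suggestion, reusing the internal names from the traceback (y_o, recalibrated, w); not the package's actual code:

# Hypothetical patch inside decompose(): wrap the scoring of the
# recalibrated predictions so the user can tell a recalibration-induced
# domain error from bad input.
try:
    score_recalibrated = scoring_function(y_o, recalibrated, w)
except ValueError as exc:
    msg = (
        "The scoring function's domain was violated by the recalibrated "
        "predictions (isotonic regression step), not by the input arrays."
    )
    raise ValueError(msg) from exc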

Versions

OS linux-aarch64
model-diagnostics 1.0.1
Numpy 1.25.2
Python 3.11.5.final.0

Make confidence intervals of reliability diagram tighter

The confidence intervals of plot_reliability_diagram are made a bit too wide by

# We make the interval conservatively monotone increasing by applying
# np.maximum.accumulate etc.
lower = -np.minimum.accumulate(-boot.confidence_interval.low)
upper = np.maximum.accumulate(boot.confidence_interval.high)

This is not conservative: it might lead to the conclusion that the model is auto-calibrated within the uncertainty bands, while it actually is not.
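A tiny numpy illustration of how the accumulate step widens a bound:

import numpy as np

upper = np.array([0.20, 0.10, 0.30])
np.maximum.accumulate(upper)  # array([0.2, 0.2, 0.3]): the middle bound grew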

scoring.decompose raises ValueError but input arrays meet the required conditions

Description

There could be situations where calling scoring.decompose raises an exception:

Traceback (most recent call last):
  File "/workspaces/dev_env/minimal_example.py", line 9, in <module>
    decompose(
  File "/opt/conda/lib/python3.11/site-packages/model_diagnostics/scoring/scoring.py", line 848, in decompose
    score_recalibrated = scoring_function(y_obs, recalibrated, w)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/model_diagnostics/scoring/scoring.py", line 58, in __call__
    return np.average(self.score_per_obs(y_obs, y_pred), weights=weights)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/model_diagnostics/scoring/scoring.py", line 202, in score_per_obs
    raise ValueError(msg)
ValueError: Valid domain for degree=1 is y_obs >= 0 and y_pred > 0.

even though the input arrays y_obs and y_pred meet the required conditions.

Code to reproduce

import numpy as np
from model_diagnostics.scoring import decompose, PoissonDeviance

y = np.array([0., 0., 0., 0., 0., 0., 0.])
w = np.array([1000, 1000, 1000, 1000, 1000, 1000, 1000])
mx = np.array([5448.18388438, 50.01519906, 50.01704928, 50.01044007,
               50.01964483, 50.01714894, 5762.89508342])

decompose(
    y_obs=y,
    y_pred=mx,
    scoring_function=PoissonDeviance(),
    weights=w
)

Expected result

To be discussed.

Versions

OS linux-aarch64
model-diagnostics 1.0.0
Numpy 1.25.2
Python 3.11.5.final.0

Additional plotting backend plotly

So far, all the plotting capabilities rely on matplotlib. It would be nice to optionally switch the plotting backend to plotly, maybe with a global option:

import model_diagnostics
from model_diagnostics.calibration import plot_bias

with model_diagnostics.config_context(plot_backend="plotly"):
    plot_bias(...)

# or globally
model_diagnostics.set_config(plot_backend="plotly")
plot_bias(...)

The downside is that this might double the lines of plotting code and bring a lot more maintenance burden.

Add case weights

The current functionalities do not support case weights, aka sample weights. There are, however, many use cases where they are important. Case weights should be supported by:

  • compute_bias
  • plot_bias
  • plot_reliability_diagram

For identification_function, case weights do not make sense. A minimal sketch of the weighted mean bias follows.
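A minimal sketch of the weighted mean bias, assuming a weights argument analogous to scikit-learn's sample_weight; illustrative, not the package API:

import numpy as np

def weighted_mean_bias(y_obs, y_pred, weights=None):
    """Weighted average of y_pred - y_obs (the bias_mean statistic)."""
    return np.average(np.asarray(y_pred) - np.asarray(y_obs), weights=weights)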

Improve plot_bias with Null values

  • omit the x (tick) label for string/categorical Null values (the legend entry and diamond shape are enough)
  • use diamond shapes for string/categorical Null values, always placed to the right
  • add the diamond to the legend

Murphy diagrams

W. Ehm, T. Gneiting, A. I. Jordan, and F. Krüger. “Of quantiles and expectiles: consistent scoring functions, Choquet representations and forecast rankings”. In: Journal of the Royal Statistical Society: Series B (Statistical Methodology) 78.3
(2016), pp. 505–562. doi: 10.1111/rssb.12154. arXiv: 1503.08195 [math.ST].
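A minimal sketch of a Murphy diagram for the median (alpha = 0.5), using an elementary quantile score in the spirit of the cited paper (tie conventions may differ slightly); names and data are illustrative, not the package API:

import numpy as np
import matplotlib.pyplot as plt

def elementary_quantile_score(x, y, theta, alpha=0.5):
    # S_theta(x, y) = (1{y <= x} - alpha) * (1{x > theta} - 1{y > theta})
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return ((y <= x).astype(float) - alpha) * (
        (x > theta).astype(float) - (y > theta).astype(float)
    )

rng = np.random.default_rng(0)
y = rng.gamma(shape=2.0, size=500)
forecast_a = np.full_like(y, np.median(y))  # constant climatological median
forecast_b = 0.9 * y                        # an (unrealistically) informed forecast
thetas = np.linspace(y.min(), y.max(), 101)
for label, x in [("constant median", forecast_a), ("informed", forecast_b)]:
    plt.plot(thetas, [elementary_quantile_score(x, y, t).mean() for t in thetas], label=label)
plt.xlabel(r"$\theta$")
plt.ylabel("mean elementary score")
plt.legend()
plt.show()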

Consistent scoring functions for expectiles and quantiles

Add scoring functions that are (strictly) consistent for expectiles and quantiles (sketches of two of them follow the reference below):

  • homogeneous score (Gneiting Eq. 19)
  • Tweedie deviances (which coincide with homog. score)
  • Asymmetric piecewise linear (Gneiting Eq. 24) for quantiles
  • Asymmetric piecewise quadratic (Gneiting Eq. 27) for expectiles
  • log loss

T. Gneiting. "Making and Evaluating Point Forecasts". In: Journal of the American Statistical Association 106.494 (2011), pp. 746–762. [PDF](http://arxiv.org/pdf/0912.0902.pdf).
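Hedged sketches of two of the listed scores, following the asymmetric piecewise linear and quadratic forms referenced above; function names are illustrative, not the package API:

import numpy as np

def pinball_loss(y_obs, y_pred, alpha):
    """Asymmetric piecewise linear score, consistent for the alpha-quantile."""
    diff = np.asarray(y_obs, dtype=float) - np.asarray(y_pred, dtype=float)
    return np.where(diff >= 0, alpha * diff, (alpha - 1.0) * diff)

def expectile_loss(y_obs, y_pred, tau):
    """Asymmetric piecewise quadratic score, consistent for the tau-expectile."""
    diff = np.asarray(y_obs, dtype=float) - np.asarray(y_pred, dtype=float)
    return np.where(diff >= 0, tau, 1.0 - tau) * diff**2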
