lorentzenchr / model-diagnostics

Tools for diagnostics and assessment of (machine learning) models

Home Page: https://lorentzenchr.github.io/model-diagnostics/

License: MIT License

Language: Python 100.00%
Topics: bias-detection, calibration, machine-learning, performance-metrics, python

model-diagnostics's People

Contributors

lorentzenchr, m-maggi, mayer79


model-diagnostics's Issues

Make mkdocstrings filters work

Currently, the mkdocstrings filters in mkdocs.yml do not work (as I would expect). The files for generating the API Reference page are instead filtered in gen_ref_pages.py.

Generalised PAV algo for isotonic regression

Implement isotonic regression for functionals $T$ (expectiles and quantiles) with the generalised PAV algorithm; a minimal sketch of the pooling idea follows.
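A minimal sketch of the generalised pooling idea, assuming only numpy: whenever adjacent blocks violate monotonicity, they are merged and the pooled value is recomputed with the requested functional (np.mean for the mean, np.median or a functools.partial of np.quantile for quantiles). Illustrative only, not the package implementation, and quadratic in the worst case:

import numpy as np

def pav(y, functional=np.mean):
    # Each block holds (fitted value, pooled observations).
    blocks = []
    for yi in map(float, y):
        blocks.append((yi, [yi]))
        # Merge backwards while adjacent fitted values are decreasing.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            _, obs_right = blocks.pop()
            _, obs_left = blocks.pop()
            pooled = obs_left + obs_right
            blocks.append((float(functional(pooled)), pooled))
    return np.concatenate([[v] * len(obs) for v, obs in blocks])

pav(np.array([3.0, 1.0, 2.0, 5.0, 4.0]))  # -> array([2. , 2. , 2. , 4.5, 4.5])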

Wrong axis labels on reliability diagrams for non-mean functionals

The xlabel and ylabel values are currently hardcoded:

if diagram_type == "reliability":
    ylabel = "estimated E(Y|prediction)"
    title = "Reliability Diagram"
else:
    ylabel = "prediction - estimated E(Y|prediction)"
    title = "Bias Reliability Diagram"
ax.set(xlabel="prediction for E(Y|X)", ylabel=ylabel)

In particular, the labels are wrong whenever functional != "mean".
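A hypothetical sketch of functional-dependent labels; functional and level are assumed to be available in the plotting scope, and the dictionary keys are illustrative:

# Hypothetical sketch: build the label from the requested functional
# instead of hardcoding the mean.
symbol = {"mean": "E", "expectile": f"E_{level}", "quantile": f"Q_{level}"}[functional]
if diagram_type == "reliability":
    ylabel = f"estimated {symbol}(Y|prediction)"
    title = "Reliability Diagram"
else:
    ylabel = f"prediction - estimated {symbol}(Y|prediction)"
    title = "Bias Reliability Diagram"
ax.set(xlabel=f"prediction for {symbol}(Y|X)", ylabel=ylabel)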

Score decomposition

Add the score/loss decomposition according to Pohle with the CORP approach (PAV algorithm/isotonic regression) of Dimitriadis et al.; a minimal sketch follows the references below.

M.-O. Pohle. "The Murphy Decomposition and the Calibration-Resolution Principle: A New Perspective on Forecast Evaluation". In: arXiv (2020). [arXiv: 2005.01835](https://arxiv.org/abs/2005.01835) [stat.ME].

T. Dimitriadis, T. Gneiting, and A. I. Jordan. "Stable reliability diagrams for probabilistic classifiers". In: Proceedings of the National Academy of Sciences 118.8 (2021), e2016191118. [doi: 10.1073/pnas.2016191118](https://doi.org/10.1073/pnas.2016191118).
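A hedged sketch of the CORP decomposition for the squared error, using scikit-learn's IsotonicRegression as the PAV step; illustrative, not the planned implementation:

import numpy as np
from sklearn.isotonic import IsotonicRegression

def corp_decomposition(y_obs, y_pred):
    """Mean squared error = MCB - DSC + UNC (CORP, via isotonic recalibration)."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)

    def score(y, z):
        return np.mean((y - z) ** 2)

    recalibrated = IsotonicRegression().fit_transform(y_pred, y_obs)  # PAV step
    marginal = np.full_like(y_obs, y_obs.mean())  # best constant forecast
    mcb = score(y_obs, y_pred) - score(y_obs, recalibrated)   # miscalibration
    dsc = score(y_obs, marginal) - score(y_obs, recalibrated) # discrimination
    unc = score(y_obs, marginal)                              # uncertainty
    return mcb, dsc, unc  # and mcb - dsc + unc == score(y_obs, y_pred)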

Bug? compute_bias raises IndexError in minimal conda environment installation if pyarrow is not installed

Description

I installed model-diagnostics in a new conda environment, and calling compute_bias raised an IndexError (also in the example regression notebook). If I install pyarrow, the error disappears and the result under "Expected result" appears; if I uninstall pyarrow, the IndexError returns.

Traceback (most recent call last):
  File "/workspaces/dev_env/minimal_example.py", line 8, in <module>
    _ = compute_bias(
        ^^^^^^^^^^^^^
  File "/opt/conda/envs/minimal/lib/python3.11/site-packages/model_diagnostics/calibration/identification.py", line 488, in compute_bias
    p_value[(n > 1) & (stderr_ == 0)] = 0
    ~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
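The numpy-level failure can be reproduced in isolation. Presumably, without pyarrow the boolean mask reaches numpy as an object it cannot use as an index; the exact object depends on the polars-to-numpy conversion path, so this stand-in is an assumption:

import numpy as np

p_value = np.array([0.5])

class OpaqueMask:
    """Stand-in for a mask object that numpy cannot use as an index."""

p_value[OpaqueMask()] = 0.0
# IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis
# (`None`) and integer or boolean arrays are valid indices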

Code to reproduce

conda create -n minimal python=3.11
source activate minimal
pip install model-diagnostics
...
Installing collected packages: threadpoolctl, six, pyparsing, polars, pillow, packaging, numpy, kiwisolver, joblib, fonttools, cycler, scipy, python-dateutil, contourpy, scikit-learn, matplotlib, model-diagnostics
Successfully installed contourpy-1.1.1 cycler-0.12.0 fonttools-4.43.0 joblib-1.3.2 kiwisolver-1.4.5 matplotlib-3.8.0 model-diagnostics-1.0.1 numpy-1.26.0 packaging-23.2 pillow-10.0.1 polars-0.19.7 pyparsing-3.1.1 python-dateutil-2.8.2 scikit-learn-1.3.1 scipy-1.11.3 six-1.16.0 threadpoolctl-3.2.0
import numpy as np
from model_diagnostics.calibration.plots import compute_bias

y = np.array([0., 50.])
mx = np.array([0.01, 1.5])

_ = compute_bias(
    y_obs=y,
    y_pred=mx,
    feature=None,
)
print(_)

Expected result

shape: (1, 5)
┌───────────┬────────────┬──────────────┬─────────────┬──────────┐
│ bias_mean ┆ bias_count ┆ bias_weights ┆ bias_stderr ┆ p_value  │
│ ---       ┆ ---        ┆ ---          ┆ ---         ┆ ---      │
│ f64       ┆ u32        ┆ f64          ┆ f64         ┆ f64      │
╞═══════════╪════════════╪══════════════╪═════════════╪══════════╡
│ -24.245   ┆ 2          ┆ 2.0          ┆ 24.255      ┆ 0.500131 │
└───────────┴────────────┴──────────────┴─────────────┴──────────┘

Versions

OS linux-aarch64
model-diagnostics 1.0.1
Numpy 1.26.0
Python 3.11.5.final.0

Import from model-diagnostics.scoring is not working

Trying to use model-diagnostics in two separate environments (Dataiku with Python 3.8, and a basic Mac command line with Python 3.7), I have experienced the same issue with the model-diagnostics package. Please add an example to the docs that shows how to install and import the package in a new environment. Thank you.

python3 -m pip install model-diagnostics
Requirement already satisfied: model-diagnostics
python3

>>> from model_diagnostics.scoring import GammaDeviance, decompose
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'model_diagnostics.scoring'
>>> from model_diagnostics import calibration
>>> dir(calibration)
['__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', 'plot_reliability_diagram', 'plots']
>>> from model_diagnostics import scoring
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name 'scoring' from 'model_diagnostics' (/Users/ebaughm/Library/Python/3.7/lib/python/site-packages/model_diagnostics/__init__.py)
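The dir(calibration) output above suggests an old release that predates the scoring subpackage. A quick way to check the installed version (standard library, Python >= 3.8):

from importlib.metadata import version  # Python >= 3.8

print(version("model-diagnostics"))
# If this predates the release that introduced model_diagnostics.scoring,
# upgrade with: python3 -m pip install -U model-diagnostics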

Bug? scoring.decompose raises ValueError but input arrays meet the required conditions - y is not constant

Similar to #98; however, here it is not related to y being constant.
Maybe one could check in the code whether the domain error happens at the recalibration step and inform the user accordingly (see the sketch after the traceback below)?

Reproducible example:

import numpy as np
from model_diagnostics.scoring import decompose, PoissonDeviance

y = np.array([0., 50.])
w = np.array([0.1, 0.01])
mx = np.array([0.01, 1.5])

_ = decompose(
    y_obs=y,
    y_pred=mx,
    scoring_function=PoissonDeviance(),
    weights=w
)
print(_)
Traceback (most recent call last):
  File "/workspaces/dev_env/minimal_example.py", line 8, in <module>
    _ = decompose(
        ^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/model_diagnostics/scoring/scoring.py", line 869, in decompose
    score_recalibrated = scoring_function(y_o, recalibrated, w)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/model_diagnostics/scoring/scoring.py", line 58, in __call__
    return np.average(self.score_per_obs(y_obs, y_pred), weights=weights)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/model_diagnostics/scoring/scoring.py", line 202, in score_per_obs
    raise ValueError(msg)
ValueError: Valid domain for degree=1 is y_obs >= 0 and y_pred > 0.
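A minimal sketch of that suggestion, reusing the internal names from the traceback (y_o, recalibrated, w); not the package's actual code:

# Hypothetical patch inside decompose(): wrap the scoring of the
# recalibrated predictions so the user can tell a recalibration-induced
# domain error from bad input.
try:
    score_recalibrated = scoring_function(y_o, recalibrated, w)
except ValueError as exc:
    msg = (
        "The scoring function's domain was violated by the recalibrated "
        "predictions (isotonic regression step), not by the input arrays."
    )
    raise ValueError(msg) from exc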

Versions

OS linux-aarch64
model-diagnostics 1.0.1
Numpy 1.25.2
Python 3.11.5.final.0

Make confidence intervals of reliability diagram tighter

The confidence intervals of plot_reliability_diagram are made a bit too wide by

# We make the interval conservatively monotone increasing by applying
# np.maximum.accumulate etc.
lower = -np.minimum.accumulate(-boot.confidence_interval.low)
upper = np.maximum.accumulate(boot.confidence_interval.high)

This is not conservative: it might lead to the conclusion that the model is auto-calibrated within the uncertainty bands, while it actually is not.
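A tiny numpy illustration of how the accumulate step widens a bound:

import numpy as np

upper = np.array([0.20, 0.10, 0.30])
np.maximum.accumulate(upper)  # array([0.2, 0.2, 0.3]): the middle bound grew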

scoring.decompose raises ValueError but input arrays meet the required conditions

Description

There could be situations where calling scoring.decompose raises an exception:

Traceback (most recent call last):
  File "/workspaces/dev_env/minimal_example.py", line 9, in <module>
    decompose(
  File "/opt/conda/lib/python3.11/site-packages/model_diagnostics/scoring/scoring.py", line 848, in decompose
    score_recalibrated = scoring_function(y_obs, recalibrated, w)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/model_diagnostics/scoring/scoring.py", line 58, in __call__
    return np.average(self.score_per_obs(y_obs, y_pred), weights=weights)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/model_diagnostics/scoring/scoring.py", line 202, in score_per_obs
    raise ValueError(msg)
ValueError: Valid domain for degree=1 is y_obs >= 0 and y_pred > 0.

even though the input arrays y_obs and y_pred meet the required conditions.

Code to reproduce

import numpy as np
from model_diagnostics.scoring import decompose, PoissonDeviance

y = np.array([0., 0., 0., 0., 0., 0., 0.])
w = np.array([1000, 1000, 1000, 1000, 1000, 1000, 1000])
mx = np.array([5448.18388438, 50.01519906, 50.01704928, 50.01044007,
               50.01964483, 50.01714894, 5762.89508342])

decompose(
    y_obs=y,
    y_pred=mx,
    scoring_function=PoissonDeviance(),
    weights=w
)

Expected result

To be discussed.

Versions

OS linux-aarch64
model-diagnostics 1.0.0
Numpy 1.25.2
Python 3.11.5.final.0

Additional plotting backend plotly

So far, all the plotting capabilities rely on matplotlib. It would be nice to optionally switch the plotting backend to plotly, maybe with a global option:

import model_diagnostics
from model_diagnostics.calibration import plot_bias

with model_diagnostics.config_context(plot_backend="plotly"):
    plot_bias(...)

# or globally
model_diagnostics.set_config(plot_backend="plotly")
plot_bias(...)

The downside is that this might double the lines of plotting code and bring a lot more maintenance burden.

Add case weights

The current functionalities do not support case weights, aka sample weights. There are, however, many use cases where they are important. Case weights should be supported by:

  • compute_bias
  • plot_bias
  • plot_reliability_diagram

For identification_function, case weights do not make sense. A minimal sketch of the weighted mean bias follows.
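A minimal sketch of the weighted mean bias, assuming a weights argument analogous to scikit-learn's sample_weight; illustrative, not the package API:

import numpy as np

def weighted_mean_bias(y_obs, y_pred, weights=None):
    """Weighted average of y_pred - y_obs (the bias_mean statistic)."""
    return np.average(np.asarray(y_pred) - np.asarray(y_obs), weights=weights)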

Improve plot_bias with Null values

  • omit the x (tick) label for string/categorical Null values (the legend entry and diamond shape are enough)
  • use diamond shapes for string/categorical Null values, always placed to the right
  • add the diamond to the legend

Murphy diagrams

W. Ehm, T. Gneiting, A. I. Jordan, and F. Krüger. “Of quantiles and expectiles: consistent scoring functions, Choquet representations and forecast rankings”. In: Journal of the Royal Statistical Society: Series B (Statistical Methodology) 78.3
(2016), pp. 505–562. doi: 10.1111/rssb.12154. arXiv: 1503.08195 [math.ST].
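A minimal sketch of a Murphy diagram for the median (alpha = 0.5), using an elementary quantile score in the spirit of the cited paper (tie conventions may differ slightly); names and data are illustrative, not the package API:

import numpy as np
import matplotlib.pyplot as plt

def elementary_quantile_score(x, y, theta, alpha=0.5):
    # S_theta(x, y) = (1{y <= x} - alpha) * (1{x > theta} - 1{y > theta})
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return ((y <= x).astype(float) - alpha) * (
        (x > theta).astype(float) - (y > theta).astype(float)
    )

rng = np.random.default_rng(0)
y = rng.gamma(shape=2.0, size=500)
forecast_a = np.full_like(y, np.median(y))  # constant climatological median
forecast_b = 0.9 * y                        # an (unrealistically) informed forecast
thetas = np.linspace(y.min(), y.max(), 101)
for label, x in [("constant median", forecast_a), ("informed", forecast_b)]:
    plt.plot(thetas, [elementary_quantile_score(x, y, t).mean() for t in thetas], label=label)
plt.xlabel(r"$\theta$")
plt.ylabel("mean elementary score")
plt.legend()
plt.show()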

Consistent scoring functions for expectiles and quantiles

Add scoring functions that are (strictly) consistent for expectiles and quantiles (sketches of two of them follow the reference below):

  • homogeneous score (Gneiting Eq. 19)
  • Tweedie deviances (which coincide with homog. score)
  • Asymmetric piecewise linear (Gneiting Eq. 24) for quantiles
  • Asymmetric piecewise quadratic (Gneiting Eq. 27) for expectiles
  • log loss

T. Gneiting. "Making and Evaluating Point Forecasts". In: Journal of the American Statistical Association 106.494 (2011), pp. 746–762. [PDF](http://arxiv.org/pdf/0912.0902.pdf).
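Hedged sketches of two of the listed scores, following the asymmetric piecewise linear and quadratic forms referenced above; function names are illustrative, not the package API:

import numpy as np

def pinball_loss(y_obs, y_pred, alpha):
    """Asymmetric piecewise linear score, consistent for the alpha-quantile."""
    diff = np.asarray(y_obs, dtype=float) - np.asarray(y_pred, dtype=float)
    return np.where(diff >= 0, alpha * diff, (alpha - 1.0) * diff)

def expectile_loss(y_obs, y_pred, tau):
    """Asymmetric piecewise quadratic score, consistent for the tau-expectile."""
    diff = np.asarray(y_obs, dtype=float) - np.asarray(y_pred, dtype=float)
    return np.where(diff >= 0, tau, 1.0 - tau) * diff**2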
