lorentzenchr / model-diagnostics
Tools for diagnostics and assessment of (machine learning) models
Home Page: https://lorentzenchr.github.io/model-diagnostics/
License: MIT License
Add confidence intervals to the reliability diagram, for instance following:
Dimitriadis, T., Duembgen, L., Henzi, A., Puke, M., & Ziegel, J.F. (2022). Honest calibration assessment for binary outcome predictions. arXiv:2203.04065
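For illustration, one possible way to obtain such intervals (a rough sketch under assumed choices, not necessarily the construction of the paper): bootstrap the (y_obs, y_pred) pairs, refit the isotonic recalibration on every resample, and take pointwise quantiles of the resulting curves.
import numpy as np
from sklearn.isotonic import IsotonicRegression

# Rough sketch only: the resampling scheme and the use of sklearn's
# IsotonicRegression are assumptions, not the library's implementation.
rng = np.random.default_rng(0)
y_pred = rng.uniform(size=500)                # predicted probabilities
y_obs = rng.binomial(1, y_pred)               # observed 0/1 outcomes

grid = np.sort(y_pred)
curves = []
for _ in range(200):                          # bootstrap replications
    idx = rng.integers(0, len(y_obs), size=len(y_obs))
    iso = IsotonicRegression(out_of_bounds="clip").fit(y_pred[idx], y_obs[idx])
    curves.append(iso.predict(grid))
lower, upper = np.quantile(curves, [0.025, 0.975], axis=0)  # pointwise 95% band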
Currently, the mkdocstrings filters in mkdocs.yml do not work (as I would expect), so the files for generating the API Reference page are filtered in gen_ref_pages.py instead.
Implement isotonic regression for functionals
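A minimal sketch of the idea, assuming the pool-adjacent-violators algorithm with the target functional as the pooling rule (for the mean this reproduces classical isotonic regression; the function name and signature are made up for illustration):
import numpy as np

def isotonic_functional(y, functional=np.median):
    """Non-decreasing fit of y (already ordered by the predictor) under a
    generic pooling functional, e.g. np.median or np.mean."""
    blocks = [[v] for v in y]                  # observations pooled per block
    fitted = [functional(b) for b in blocks]
    i = 0
    while i < len(blocks) - 1:
        if fitted[i] > fitted[i + 1]:          # adjacent violators -> pool them
            blocks[i].extend(blocks.pop(i + 1))
            fitted.pop(i + 1)
            fitted[i] = functional(blocks[i])
            i = max(i - 1, 0)                  # re-check against the previous block
        else:
            i += 1
    # Expand the block values back to the original length.
    return np.concatenate([[f] * len(b) for f, b in zip(fitted, blocks)])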
The xlabel and ylabel values are currently hardcoded in model-diagnostics/src/model_diagnostics/calibration/plots.py, lines 208 to 214 in 78ace74. In particular, they are wrong whenever functional != "mean".
Add the score/loss decomposition according to Pohle with the CORP approach (PAV algorithm / isotonic regression) of Dimitriadis et al.
M.-O. Pohle. "The Murphy Decomposition and the Calibration-Resolution Principle: A New Perspective on Forecast Evaluation". In: arXiv (2020). arXiv: 2005.01835 [stat.ME], https://arxiv.org/abs/2005.01835.
T. Dimitriadis, T. Gneiting, and A. I. Jordan. "Stable reliability diagrams for probabilistic classifiers". In: Proceedings of the National Academy of Sciences 118.8 (2021), e2016191118. doi: 10.1073/pnas.2016191118, https://doi.org/10.1073/pnas.2016191118.
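As a rough sketch of the mechanics (assuming squared error; the helper below is illustrative and not the package's decompose): recalibrate with isotonic regression and compare mean scores, which gives score = miscalibration - discrimination + uncertainty.
import numpy as np
from sklearn.isotonic import IsotonicRegression

def corp_decomposition(y_obs, y_pred):
    """Illustrative CORP decomposition of the mean squared error."""
    y_obs = np.asarray(y_obs, dtype=float)
    # Recalibration step: isotonic regression (PAV algorithm) of y_obs on y_pred.
    recalibrated = IsotonicRegression().fit_transform(y_pred, y_obs)
    # Marginal (constant) prediction: the mean of the observations.
    marginal = np.full_like(y_obs, np.mean(y_obs))

    def mean_score(pred):
        return np.mean((y_obs - pred) ** 2)

    miscalibration = mean_score(y_pred) - mean_score(recalibrated)    # MCB >= 0
    discrimination = mean_score(marginal) - mean_score(recalibrated)  # DSC >= 0
    uncertainty = mean_score(marginal)                                # UNC
    # mean_score(y_pred) == miscalibration - discrimination + uncertainty
    return miscalibration, discrimination, uncertainty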
compute_partial_dependence and plot_partial_dependence
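For context, a bare-bones sketch of what such a partial dependence computation could do (the signature is invented for illustration): average the model's predictions over the data with the chosen feature fixed to each grid value.
import numpy as np

def partial_dependence_sketch(predict, X, feature_index, grid):
    """Illustrative partial dependence: mean prediction with one feature fixed."""
    pd_values = []
    for value in grid:
        X_mod = np.array(X, copy=True)
        X_mod[:, feature_index] = value        # fix the feature to the grid value
        pd_values.append(np.mean(predict(X_mod)))
    return np.asarray(pd_values)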
I installed model-diagnostics in a new conda environment, and when calling compute_bias I got an IndexError (also in the example regression notebook). If I install pyarrow, the error disappears and the result shown under "expected result" appears. If I uninstall pyarrow, the IndexError returns.
Traceback (most recent call last):
File "/workspaces/dev_env/minimal_example.py", line 8, in <module>
_ = compute_bias(
^^^^^^^^^^^^^
File "/opt/conda/envs/minimal/lib/python3.11/site-packages/model_diagnostics/calibration/identification.py", line 488, in compute_bias
p_value[(n > 1) & (stderr_ == 0)] = 0
~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
conda create -n minimal python=3.11
source activate minimal
pip install model-diagnostics
...
Installing collected packages: threadpoolctl, six, pyparsing, polars, pillow, packaging, numpy, kiwisolver, joblib, fonttools, cycler, scipy, python-dateutil, contourpy, scikit-learn, matplotlib, model-diagnostics
Successfully installed contourpy-1.1.1 cycler-0.12.0 fonttools-4.43.0 joblib-1.3.2 kiwisolver-1.4.5 matplotlib-3.8.0 model-diagnostics-1.0.1 numpy-1.26.0 packaging-23.2 pillow-10.0.1 polars-0.19.7 pyparsing-3.1.1 python-dateutil-2.8.2 scikit-learn-1.3.1 scipy-1.11.3 six-1.16.0 threadpoolctl-3.2.0
import numpy as np
from model_diagnostics.calibration import compute_bias

# Minimal reproduction: without pyarrow installed, compute_bias raises the
# IndexError above; with pyarrow installed it returns the expected result.
y = np.array([0., 50.])
mx = np.array([0.01, 1.5])
_ = compute_bias(
    y_obs=y,
    y_pred=mx,
    feature=None,
)
print(_)
shape: (1, 5)
┌───────────┬────────────┬──────────────┬─────────────┬──────────┐
│ bias_mean ┆ bias_count ┆ bias_weights ┆ bias_stderr ┆ p_value │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ f64 ┆ u32 ┆ f64 ┆ f64 ┆ f64 │
╞═══════════╪════════════╪══════════════╪═════════════╪══════════╡
│ -24.245 ┆ 2 ┆ 2.0 ┆ 24.255 ┆ 0.500131 │
└───────────┴────────────┴──────────────┴─────────────┴──────────┘
OS linux-aarch64
model-diagnostics 1.0.1
Numpy 1.26.0
Python 3.11.5.final.0
Trying to use model-diagnostics in two separate environments (Dataiku with Python 3.8, and the basic Mac command line with Python 3.7), I experienced the same issue with the model-diagnostics package. Please add an example to the docs that shows how to install and import the package in a new environment. Thank you.
python3 -m pip install model-diagnostics
Requirement already satisfied: model-diagnostics
python3
from model_diagnostics.scoring import GammaDeviance, decompose
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'model_diagnostics.scoring'
from model_diagnostics import calibration
dir(calibration)
['__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', 'plot_reliability_diagram', 'plots']
from model_diagnostics import scoring
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name 'scoring' from 'model_diagnostics' (/Users/ebaughm/Library/Python/3.7/lib/python/site-packages/model_diagnostics/__init__.py)
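A docs example could look roughly like this (illustrative; it only uses the entry points mentioned elsewhere in these issues and assumes a reasonably recent Python and package version):
# In a fresh environment, e.g.:
#   python3 -m venv .venv
#   source .venv/bin/activate
#   python -m pip install -U model-diagnostics
from model_diagnostics.calibration import compute_bias, plot_reliability_diagram
from model_diagnostics.scoring import decompose, GammaDeviance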
Similar to #98, however it does not have to do with y being constant.
Maybe one could check in the code whether the domain error happens at the recalibration step and inform the user accordingly, for instance along the lines of the sketch below?
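A hypothetical sketch (helper name and message invented) of how the recalibrated score could be wrapped so that a domain error points at the recalibration step:
def score_recalibrated_with_hint(scoring_function, y_obs, recalibrated, weights):
    """Hypothetical helper: translate a domain error into a clearer message."""
    try:
        return scoring_function(y_obs, recalibrated, weights)
    except ValueError as exc:
        raise ValueError(
            "The recalibration step (isotonic regression) produced predictions "
            "outside the valid domain of the scoring function, so the score "
            "decomposition is not defined for this input."
        ) from exc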
Reproducible example:
import numpy as np
from model_diagnostics.scoring import decompose, PoissonDeviance
y = np.array([0., 50.])
w = np.array([0.1, 0.01])
mx = np.array([0.01, 1.5])
_ = decompose(
    y_obs=y,
    y_pred=mx,
    scoring_function=PoissonDeviance(),
    weights=w
)
print(_)
Traceback (most recent call last):
File "/workspaces/dev_env/minimal_example.py", line 8, in <module>
_ = decompose(
^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/model_diagnostics/scoring/scoring.py", line 869, in decompose
score_recalibrated = scoring_function(y_o, recalibrated, w)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/model_diagnostics/scoring/scoring.py", line 58, in __call__
return np.average(self.score_per_obs(y_obs, y_pred), weights=weights)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/model_diagnostics/scoring/scoring.py", line 202, in score_per_obs
raise ValueError(msg)
ValueError: Valid domain for degree=1 is y_obs >= 0 and y_pred > 0.
OS linux-aarch64
model-diagnostics 1.0.1
Numpy 1.25.2
Python 3.11.5.final.0
The confidence intervals of plot_reliability_diagram are made a bit too wide by
# We make the interval conservatively monotone increasing by applying
# np.maximum.accumulate etc.
lower = -np.minimum.accumulate(-boot.confidence_interval.low)
upper = np.maximum.accumulate(boot.confidence_interval.high)
This is not conservative because it might lead to the conclusion that the model is auto-calibrated - within the uncertainty - while it actually is not.
There could be situations where calling scoring.decompose raises an exception like
Traceback (most recent call last):
File "/workspaces/dev_env/minimal_example.py", line 9, in <module>
decompose(
File "/opt/conda/lib/python3.11/site-packages/model_diagnostics/scoring/scoring.py", line 848, in decompose
score_recalibrated = scoring_function(y_obs, recalibrated, w)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/model_diagnostics/scoring/scoring.py", line 58, in __call__
return np.average(self.score_per_obs(y_obs, y_pred), weights=weights)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/model_diagnostics/scoring/scoring.py", line 202, in score_per_obs
raise ValueError(msg)
ValueError: Valid domain for degree=1 is y_obs >= 0 and y_pred > 0.
even though the input arrays y_obs and y_pred meet the required conditions. Presumably, the isotonic recalibration step maps the predictions to exactly zero here (all observations are zero), which then violates the y_pred > 0 domain of the Poisson deviance.
import numpy as np
from model_diagnostics.scoring import decompose, PoissonDeviance
y = np.array([0., 0., 0., 0., 0., 0., 0.])
w = np.array([1000, 1000, 1000, 1000, 1000, 1000, 1000])
mx = np.array([5448.18388438, 50.01519906, 50.01704928, 50.01044007,
50.01964483, 50.01714894, 5762.89508342])
decompose(
    y_obs=y,
    y_pred=mx,
    scoring_function=PoissonDeviance(),
    weights=w
)
To be discussed.
OS linux-aarch64
model-diagnostics 1.0.0
Numpy 1.25.2
Python 3.11.5.final.0
So far, all the plotting capabilities rely on matplotlib. It would be nice to be able to change the plotting backend to plotly, maybe with a global option:
import model_diagnostics
from model_diagnostics.calibration import plot_bias

with model_diagnostics.config_context(plot_backend="plotly"):
    plot_bias(...)

# or globally
model_diagnostics.set_config(plot_backend="plotly")
plot_bias(...)
The downside is that this might double the lines of code and bring a lot more maintenance burden.
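A rough sketch of how such a global option could be wired up, assuming a module-level config dict plus a context manager (all names are illustrative):
import contextlib

_config = {"plot_backend": "matplotlib"}

def set_config(plot_backend=None):
    """Set global options; only updates what is passed."""
    if plot_backend is not None:
        _config["plot_backend"] = plot_backend

def get_config():
    """Return a copy of the current global options."""
    return dict(_config)

@contextlib.contextmanager
def config_context(**new_config):
    """Temporarily override global options within a with-block."""
    old_config = dict(_config)
    set_config(**new_config)
    try:
        yield
    finally:
        _config.clear()
        _config.update(old_config)
scikit-learn uses the same set_config/config_context pattern, which keeps the individual plotting functions free of backend arguments.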
This library needs a logo. https://pola.rs and https://numpy.org have nice ones.
Add kernel SHAP as in https://github.com/mayer79/kernelshap.
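For reference, a small sketch of the Kernel SHAP coalition weight from Lundberg & Lee (2017); the sampling of coalitions and the weighted regression around it are omitted:
from math import comb, inf

def kernel_shap_weight(n_features: int, coalition_size: int) -> float:
    """Shapley kernel weight for a coalition of the given size."""
    p, s = n_features, coalition_size
    if s == 0 or s == p:
        # Empty and full coalitions get infinite weight; in practice they are
        # handled as constraints of the weighted regression.
        return inf
    return (p - 1) / (comb(p, s) * s * (p - s))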
The current functionalities do not support case weights, aka sample weights. There are, however, many use cases where they are important. Case weights should be supported by (see the sketch after this list):
identification_function
compute_bias
plot_bias
plot_reliability_diagram
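A minimal sketch of how case weights could enter, e.g., the mean bias and its standard error (illustrative only, not the package's formulas; the effective-sample-size treatment is an assumption):
import numpy as np

def weighted_bias(y_obs, y_pred, weights):
    """Illustrative weighted mean bias with a weighted standard error."""
    bias = np.asarray(y_pred, dtype=float) - np.asarray(y_obs, dtype=float)
    w = np.asarray(weights, dtype=float)
    mean = np.average(bias, weights=w)
    var = np.average((bias - mean) ** 2, weights=w)
    n_eff = w.sum() ** 2 / (w**2).sum()        # effective sample size
    return mean, np.sqrt(var / n_eff)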
Polars version 0.20.4 deprecated the keyword columns in frame.drop(...), see https://github.com/pola-rs/polars/blame/6050a32ac34d00ad1ea82ea61ed4bb3e9a59064e/py-polars/polars/dataframe/frame.py#L6637-L6639 and pola-rs/polars#13460.
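For illustration, the deprecated keyword versus the positional form that newer polars versions accept:
import polars as pl

df = pl.DataFrame({"a": [1], "b": [2], "c": [3]})
# deprecated since polars 0.20.4:
#   df.drop(columns=["a", "b"])
# preferred:
df.drop("a", "b")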
compute_partial_dependence and plot_partial_dependence
W. Ehm, T. Gneiting, A. I. Jordan, and F. Krüger. “Of quantiles and expectiles: consistent scoring functions, Choquet representations and forecast rankings”. In: Journal of the Royal Statistical Society: Series B (Statistical Methodology) 78.3
(2016), pp. 505–562. doi: 10.1111/rssb.12154. arXiv: 1503.08195 [math.ST].
#148 fixed a bug introduced in polars 0.20.20. This was then fixed in polars itself, see pola-rs/polars#15779 and pola-rs/polars#15784, released in v0.20.22. Therefore, we should restrict the necessary collect to polars 0.20.20-0.20.21 only.
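A hypothetical sketch of such a version gate (helper name and placement invented):
import polars as pl
from packaging.version import Version

# Only polars 0.20.20 and 0.20.21 are affected by the bug worked around in #148.
_POLARS_NEEDS_COLLECT = (
    Version("0.20.20") <= Version(pl.__version__) <= Version("0.20.21")
)

def maybe_collect(frame):
    """Force materialization only for the affected polars versions."""
    return frame.collect() if _POLARS_NEEDS_COLLECT else frame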
Add scoring functions that are (strictly) consistent for expectiles and quantiles:
Gneiting, Tilmann. "Making and Evaluating Point Forecasts." Journal of the American Statistical Association 106 (2011): 746-762. PDF: http://arxiv.org/pdf/0912.0902.pdf
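As a starting point, minimal sketches of the classical consistent scoring functions for the alpha-quantile (pinball loss) and the alpha-expectile (asymmetric squared error); the function names are illustrative:
import numpy as np

def pinball_loss(y_obs, y_pred, alpha=0.5):
    """Mean pinball loss, strictly consistent for the alpha-quantile."""
    diff = np.asarray(y_obs) - np.asarray(y_pred)
    return np.mean(np.where(diff >= 0, alpha, 1 - alpha) * np.abs(diff))

def expectile_loss(y_obs, y_pred, alpha=0.5):
    """Mean asymmetric squared error, strictly consistent for the alpha-expectile."""
    diff = np.asarray(y_obs) - np.asarray(y_pred)
    return np.mean(np.where(diff >= 0, alpha, 1 - alpha) * diff**2)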