
classifier-calibration / pycalib

Python library for classifier calibration

Home Page: https://classifier-calibration.github.io/PyCalib/

License: BSD 3-Clause "New" or "Revised" License

Python 99.31% Makefile 0.69%
python machine-learning classifier classifier-training calibration probabilistic-models optimal-decision-making

pycalib's Introduction


PyCalib

Python library for classifier calibration

User installation

The PyCalib package can be installed from PyPI with the command

pip install pycalib
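As a quick smoke test after installing, the snippet below wraps a scikit-learn classifier with one of the calibrators mentioned in the issues further down this page. The CalibratedModel(..., method=...) signature is taken from the issue examples below and should be treated as an assumption; see the documentation for the authoritative API.

from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB
from pycalib.models import CalibratedModel, SigmoidCalibration

# Fit a classifier and calibrate its probabilities (signature assumed).
X, y = make_classification(n_samples=500, random_state=42)
cal = CalibratedModel(GaussianNB(), method=SigmoidCalibration())
cal.fit(X, y)
print(cal.predict_proba(X)[:3])  # calibrated class probabilities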

Documentation

The documentation can be found at https://classifier-calibration.github.io/PyCalib/

Development

There is a Makefile to automate some of the common tasks during development. After downloading the repository, create the virtual environment with the command

make venv

This will create a venv folder in your current directory. The environment cannot be activated from within the Makefile, so load it manually with

source venv/bin/activate

After the environment is loaded, all dependencies can be installed with

make requirements-dev

Unit tests

Unit tests are specified either as doctest examples in simple functions (illustrated below), or as more complex tests in their own Python files whose names start with test_.
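For illustration, a doctest-style test embeds an interactive session in the docstring of a function, which the test runner then executes. The helper below is hypothetical and only shows the pattern:

import numpy as np

def inverse_sigmoid(p):
    """Map a probability back to the log-odds scale.

    Hypothetical helper, used only to illustrate doctests.

    >>> round(inverse_sigmoid(0.5), 4)
    0.0
    """
    return float(np.log(p / (1.0 - p)))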

Run the unit tests with the command

make test

The run prints the unittest results together with the code coverage. Ideally, we want to increase the coverage until most of the library is exercised.

Continuous Integration

Every time a commit is pushed to the master branch, the unit tests are run by the workflow .github/workflows/ci.yml. The CI badge in the README shows whether the tests passed.

Analyse code

We try to follow the same code standards as NumPy and scikit-learn. It is possible to check for PEP 8 and other code conventions with

make code-analysis

Documentation

The documentation can be found at https://www.classifier-calibration.com/PyCalib/, and it is automatically updated after every push to the master branch.

All documentation is written using the Sphinx documentation generator, in reStructuredText (*.rst) files in the docs/source folder. We try to follow the conventions of NumPy and scikit-learn.

The examples with images in docs/source/examples are generated automatically with Sphinx-Gallery from the Python scripts in the examples/ folder whose names start with xmpl_{example_name}.py.
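For reference, Sphinx-Gallery requires each example script to start with a module docstring containing a reStructuredText title, followed by plain Python code. A minimal hypothetical xmpl_ script:

"""
Reliability diagram example
===========================

Minimal hypothetical gallery script; the real examples live in examples/.
"""
import matplotlib.pyplot as plt

# A perfectly calibrated classifier lies on the diagonal.
plt.plot([0, 1], [0, 1], 'k--', label='perfect calibration')
plt.xlabel('Mean predicted probability')
plt.ylabel('Fraction of positives')
plt.legend()
plt.show()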

The documentation can be built with the command

make doc

(Keep in mind that the documentation has its own Makefile inside the docs folder.)

After building the documentation, a new folder should appear in docs/build/ with an index.html that can be opened locally for further exploration.

The documentation is built and deployed automatically every time a commit is pushed to the master branch, via the workflow .github/workflows/documentation.yml.

After building, the docs/build/html folder is pushed to the branch gh-pages.

Check Readme

It is possible to check that the README file passes the checks required by PyPI by running

make check-readme

Upload to PyPi

After checking that the code passes all unit tests and bumping the version in the file pycalib/__init__.py, the package can be published on PyPI with the following command:

make pypi

This may prompt for a username and password if they are not set in a .pypirc file in your home directory:

[pypi]
username = __token__
password = pypi-yourtoken

Contributors

This code has been adapted by Miquel from several earlier codebases. The following people have been involved in some parts of the code.

  • Miquel Perello Nieto
  • Hao Song
  • Telmo Silva Filho
  • Markus Kängsepp

pycalib's People

Contributors

perellonieto, pgijsbers


Forkers

valeman pjmss

pycalib's Issues

Make CalibratedClassifier accept any arbitrary calibrator and classifier

The current CalibratedClassifier was designed only for beta, sigmoid, and isotonic calibration: https://github.com/perellonieto/PyCalib/blob/master/pycalib/models/__init__.py#L552

This class should ideally accept any base calibrator and any base probabilistic classifier.

However, it is not clear whether this class is necessary, as it may be superseded by CalibratedModel: https://github.com/perellonieto/PyCalib/blob/master/pycalib/models/__init__.py#L205
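As a sketch of the requested interface, such a class could take any prefit probabilistic classifier together with any calibrator exposing fit/predict_proba. All names here are hypothetical, not the library implementation:

class GenericCalibratedClassifier:
    """Hypothetical sketch of an arbitrary classifier + calibrator wrapper."""

    def __init__(self, base_estimator, calibrator):
        self.base_estimator = base_estimator  # any prefit probabilistic classifier
        self.calibrator = calibrator          # any calibrator with fit/predict_proba

    def fit(self, X, y):
        scores = self.base_estimator.predict_proba(X)
        self.calibrator.fit(scores, y)
        return self

    def predict_proba(self, X):
        return self.calibrator.predict_proba(self.base_estimator.predict_proba(X))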

Unify redundant ECE metrics

There are multiple functions to compute ECE, for legacy reasons.

The first one had different options to compute confidence-ECE and other variants. I think we can remove the general ECE function and keep only the more specific ones, but this needs to be clarified and done properly.
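For context, the confidence-ECE variant bins samples by their top-class confidence and takes a weighted average of the |accuracy - confidence| gap per bin. A minimal NumPy sketch of that definition (not the library code):

import numpy as np

def confidence_ece(y_true, probs, n_bins=15):
    """Weighted mean of |accuracy - confidence| over equal-width confidence bins."""
    conf = probs.max(axis=1)     # top-class confidence per sample
    pred = probs.argmax(axis=1)  # predicted class per sample
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            acc = (pred[mask] == np.asarray(y_true)[mask]).mean()
            ece += mask.mean() * abs(acc - conf[mask].mean())
    return ece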

Deprecate plot_{binary/multiclass}_reliability_diagram_gaps

Currently, the new implementation plot_reliability_diagram handles both the binary and multiclass cases, with options to show bars and gaps (see the usage sketch below). Once we are happy with the new function, the old plot_binary_reliability_diagram_gaps and plot_multiclass_reliability_diagram_gaps should be removed.
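A hedged usage sketch of the unified function; the import path and the argument names are assumptions based on the issue description, so check the examples gallery for the actual signature:

from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB
from pycalib.visualisations import plot_reliability_diagram  # import path assumed

X, y = make_classification(random_state=0)
clf = GaussianNB().fit(X, y)
# Argument names below are guesses based on the issue description.
fig = plot_reliability_diagram(labels=y, scores=clf.predict_proba(X),
                               show_bars=True, show_gaps=True)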

TypeError: BetaCalibration() takes no arguments error

Hello,

I am testing beta calibration with the following code:

Code

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from pycalib.models import CalibratedClassifierCV
from sklearn.metrics import brier_score_loss

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=42)

base_clf = RandomForestClassifier(n_estimators=50)
base_clf.fit(X_train, y_train)

prob_pos_uncalibrated = base_clf.predict_proba(X_test)[:, 1]
score_uncalibrated = brier_score_loss(y_test, prob_pos_uncalibrated)
print("Brier score loss (uncalibrated):", score_uncalibrated)

# Calibrate the classifier using beta calibration
calibrated_clf = CalibratedClassifierCV(base_clf, method='beta', cv='prefit')
calibrated_clf.fit(X_train, y_train)

# Get the calibrated predicted probabilities
prob_pos_calibrated = calibrated_clf.predict_proba(X_test)[:, 1]
score_calibrated = brier_score_loss(y_test, prob_pos_calibrated)
print("Brier score loss (calibrated):", score_calibrated)

Error

---> 23 calibrated_clf.fit(X_train, y_train)
     25 # Get the calibrated predicted probabilities
     26 prob_pos_calibrated = calibrated_clf.predict_proba(X_test)[:, 1]

File ~/mambaforge/envs/facets/lib/python3.10/site-packages/pycalib/models/__init__.py:467, in CalibratedClassifierCV.fit(self, X, y, sample_weight)
    465         calibrated_classifier.fit(X, y, sample_weight)
    466     else:
--> 467         calibrated_classifier.fit(X, y)
    468     self.calibrated_classifiers_.append(calibrated_classifier)
    469 else:

File ~/mambaforge/envs/facets/lib/python3.10/site-packages/pycalib/models/__init__.py:692, in _CalibratedClassifier.fit(self, X, y, sample_weight)
    690 # TODO Remove BetaCalibration
    691 elif self.method == 'beta':
--> 692     calibrator = BetaCalibration(parameters="abm")
    693 elif self.method == 'beta_am':
    694     calibrator = BetaCalibration(parameters="am")

TypeError: BetaCalibration() takes no arguments

The same error occurred when I tried beta_am and beta_ab. Do you know why I see this error?

OneVsRestCalibrator + IsotonicCalibration output probabilities have a wrong shape

from sklearn.naive_bayes import GaussianNB
from pycalib.models import CalibratedModel, OneVsRestCalibrator, IsotonicCalibration
from sklearn import datasets

X, y = datasets.make_classification(n_classes=3, n_clusters_per_class=1)

cal = CalibratedModel(GaussianNB(), method=OneVsRestCalibrator(IsotonicCalibration()))
cal.fit(X, y)
cal.predict_proba(X).shape

returns the following shape

(2, 100, 3)

While other calibrators (e.g. BinningCalibration, SigmoidCalibration) return the correct shape

(100, 3)
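Until this is fixed, a possible (untested) workaround, continuing the snippet above, is to drop the spurious leading axis; which slice actually holds the calibrated probabilities would still need verification:

proba = cal.predict_proba(X)
if proba.ndim == 3:   # workaround for the (2, 100, 3) output above; untested
    proba = proba[0]  # unclear which slice is correct, verify before relying on it
print(proba.shape)    # expected (100, 3)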

Avoid using a direct adaptation of OneVsRestClassifier from scikit-learn in OneVsRestCalibrator

The current OneVsRestCalibrator https://github.com/perellonieto/PyCalib/blob/master/pycalib/models/multiclass.py#L72 uses an adapted copy of scikit-learn's OneVsRestClassifier https://github.com/scikit-learn/scikit-learn/blob/95119c13a/sklearn/multiclass.py#L138

It would be good to clarify which parts of this code are necessary, and which parts can be refactored so that scikit-learn can be used directly without duplicating code (one possible direction is sketched below).
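One possible direction, sketched without reference to the current internals, is to keep only a thin per-class loop and delegate label binarization to scikit-learn. Names and structure here are illustrative, not the current implementation:

from copy import deepcopy
from sklearn.preprocessing import label_binarize

def fit_one_vs_rest(calibrator, scores, y, classes):
    """Fit one copy of `calibrator` per class on binarized labels (illustrative)."""
    Y = label_binarize(y, classes=classes)  # shape (n_samples, n_classes)
    calibrators = []
    for k in range(len(classes)):
        c = deepcopy(calibrator)
        # Assumes the calibrator accepts a (n_samples, 1) score column.
        c.fit(scores[:, k].reshape(-1, 1), Y[:, k])
        calibrators.append(c)
    return calibrators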
