
classifier-calibration / pycalib

Python library for classifier calibration

Home Page: https://classifier-calibration.github.io/PyCalib/

License: BSD 3-Clause "New" or "Revised" License

Python 99.31% Makefile 0.69%
python machine-learning classifier classifier-training calibration probabilistic-models optimal-decision-making

pycalib's Introduction


PyCalib

Python library for classifier calibration

User installation

The PyCalib package can be installed from PyPI with the command

pip install pycalib
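As a quick smoke test after installing, the snippet below wraps a scikit-learn classifier with one of the calibrators mentioned in the issues further down this page. The CalibratedModel(..., method=...) signature is taken from the issue examples below and should be treated as an assumption; see the documentation for the authoritative API.

from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB
from pycalib.models import CalibratedModel, SigmoidCalibration

# Fit a classifier and calibrate its probabilities (signature assumed).
X, y = make_classification(n_samples=500, random_state=42)
cal = CalibratedModel(GaussianNB(), method=SigmoidCalibration())
cal.fit(X, y)
print(cal.predict_proba(X)[:3])  # calibrated class probabilities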

Documentation

The documentation can be found at https://classifier-calibration.github.io/PyCalib/

Development

There is a Makefile to automate some of the common tasks during development. After downloading the repository, create the virtual environment with the command

make venv

This will create a venv folder in your current directory. The environment cannot be activated from within the Makefile, so load it manually with

source venv/bin/activate

After the environment is loaded, all dependencies can be installed with

make requirements-dev

Unit tests

Unit tests are specified either as doctest examples in simple functions (illustrated below), or as more complex tests in their own Python files whose names start with test_.
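For illustration, a doctest-style test embeds an interactive session in the docstring of a function, which the test runner then executes. The helper below is hypothetical and only shows the pattern:

import numpy as np

def inverse_sigmoid(p):
    """Map a probability back to the log-odds scale.

    Hypothetical helper, used only to illustrate doctests.

    >>> round(inverse_sigmoid(0.5), 4)
    0.0
    """
    return float(np.log(p / (1.0 - p)))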

Run the unit tests with the command

make test

The run prints the unittest results together with the code coverage. Ideally, we want to increase the coverage until most of the library is exercised.

Continuous Integration

Every time a commit is pushed to the master branch, the unit tests are run by the workflow .github/workflows/ci.yml. The CI badge in the README shows whether the tests passed.

Analyse code

We try to follow the same code standards as NumPy and scikit-learn. It is possible to check for PEP 8 and other code conventions with

make code-analysis

Documentation

The documentation can be found at https://www.classifier-calibration.com/PyCalib/, and it is automatically updated after every push to the master branch.

All documentation is written using the Sphinx documentation generator, in reStructuredText (*.rst) files in the docs/source folder. We try to follow the conventions of NumPy and scikit-learn.

The examples with images in docs/source/examples are generated automatically with Sphinx-Gallery from the Python scripts in the examples/ folder whose names start with xmpl_{example_name}.py.
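For reference, Sphinx-Gallery requires each example script to start with a module docstring containing a reStructuredText title, followed by plain Python code. A minimal hypothetical xmpl_ script:

"""
Reliability diagram example
===========================

Minimal hypothetical gallery script; the real examples live in examples/.
"""
import matplotlib.pyplot as plt

# A perfectly calibrated classifier lies on the diagonal.
plt.plot([0, 1], [0, 1], 'k--', label='perfect calibration')
plt.xlabel('Mean predicted probability')
plt.ylabel('Fraction of positives')
plt.legend()
plt.show()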

The documentation can be built with the command

make doc

(Keep in mind that the documentation has its own Makefile inside the docs folder.)

After building the documentation, a new folder should appear in docs/build/ with an index.html that can be opened locally for further exploration.

The documentation is built and deployed automatically every time a commit is pushed to the master branch, via the workflow .github/workflows/documentation.yml.

After building, the docs/build/html folder is pushed to the branch gh-pages.

Check Readme

It is possible to check that the README file passes the checks required by PyPI by running

make check-readme

Upload to PyPi

After checking that the code passes all unit tests and bumping the version in the file pycalib/__init__.py, the package can be published on PyPI with the following command:

make pypi

This may prompt for a username and password if they are not set in a .pypirc file in your home directory:

[pypi]
username = __token__
password = pypi-yourtoken

Contributors

This code has been adapted by Miquel from several earlier codebases. The following people have been involved in some parts of the code.

  • Miquel Perello Nieto
  • Hao Song
  • Telmo Silva Filho
  • Markus Kängsepp

pycalib's People

Contributors

perellonieto, pgijsbers


Forkers

valeman pjmss

pycalib's Issues

Make CalibratedClassifier accept any arbitrary calibrator and classifier

The current CalibratedClassifier was designed only for beta, sigmoid, and isotonic calibration: https://github.com/perellonieto/PyCalib/blob/master/pycalib/models/__init__.py#L552

This class should ideally accept any base calibrator and any base probabilistic classifier.

However, it is not clear whether this class is necessary, as it may be superseded by CalibratedModel: https://github.com/perellonieto/PyCalib/blob/master/pycalib/models/__init__.py#L205
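As a sketch of the requested interface, such a class could take any prefit probabilistic classifier together with any calibrator exposing fit/predict_proba. All names here are hypothetical, not the library implementation:

class GenericCalibratedClassifier:
    """Hypothetical sketch of an arbitrary classifier + calibrator wrapper."""

    def __init__(self, base_estimator, calibrator):
        self.base_estimator = base_estimator  # any prefit probabilistic classifier
        self.calibrator = calibrator          # any calibrator with fit/predict_proba

    def fit(self, X, y):
        scores = self.base_estimator.predict_proba(X)
        self.calibrator.fit(scores, y)
        return self

    def predict_proba(self, X):
        return self.calibrator.predict_proba(self.base_estimator.predict_proba(X))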

Unify redundant ECE metrics

There are multiple functions to compute ECE, for legacy reasons.

The first one had different options to compute confidence-ECE and other variants. I think we can remove the general ECE function and keep only the more specific ones, but this needs to be clarified and done properly.
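For context, the confidence-ECE variant bins samples by their top-class confidence and takes a weighted average of the |accuracy - confidence| gap per bin. A minimal NumPy sketch of that definition (not the library code):

import numpy as np

def confidence_ece(y_true, probs, n_bins=15):
    """Weighted mean of |accuracy - confidence| over equal-width confidence bins."""
    conf = probs.max(axis=1)     # top-class confidence per sample
    pred = probs.argmax(axis=1)  # predicted class per sample
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            acc = (pred[mask] == np.asarray(y_true)[mask]).mean()
            ece += mask.mean() * abs(acc - conf[mask].mean())
    return ece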

Deprecate plot_{binary/multiclass}_reliability_diagram_gaps

Currently, the new implementation plot_reliability_diagram handles both the binary and multiclass cases, with options to show bars and gaps (see the usage sketch below). Once we are happy with the new function, the old plot_binary_reliability_diagram_gaps and plot_multiclass_reliability_diagram_gaps should be removed.
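A hedged usage sketch of the unified function; the import path and the argument names are assumptions based on the issue description, so check the examples gallery for the actual signature:

from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB
from pycalib.visualisations import plot_reliability_diagram  # import path assumed

X, y = make_classification(random_state=0)
clf = GaussianNB().fit(X, y)
# Argument names below are guesses based on the issue description.
fig = plot_reliability_diagram(labels=y, scores=clf.predict_proba(X),
                               show_bars=True, show_gaps=True)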

TypeError: BetaCalibration() takes no arguments error

Hello,

I am testing beta calibration with the following code:

Code

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from pycalib.models import CalibratedClassifierCV
from sklearn.metrics import brier_score_loss

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=42)

base_clf = RandomForestClassifier(n_estimators=50)
base_clf.fit(X_train, y_train)

prob_pos_uncalibrated = base_clf.predict_proba(X_test)[:, 1]
score_uncalibrated = brier_score_loss(y_test, prob_pos_uncalibrated)
print("Brier score loss (uncalibrated):", score_uncalibrated)

# Calibrate the classifier using beta calibration
calibrated_clf = CalibratedClassifierCV(base_clf, method='beta', cv='prefit')
calibrated_clf.fit(X_train, y_train)

# Get the calibrated predicted probabilities
prob_pos_calibrated = calibrated_clf.predict_proba(X_test)[:, 1]
score_calibrated = brier_score_loss(y_test, prob_pos_calibrated)
print("Brier score loss (calibrated):", score_calibrated)

Error

---> 23 calibrated_clf.fit(X_train, y_train)
     25 # Get the calibrated predicted probabilities
     26 prob_pos_calibrated = calibrated_clf.predict_proba(X_test)[:, 1]

File ~/mambaforge/envs/facets/lib/python3.10/site-packages/pycalib/models/__init__.py:467, in CalibratedClassifierCV.fit(self, X, y, sample_weight)
    465         calibrated_classifier.fit(X, y, sample_weight)
    466     else:
--> 467         calibrated_classifier.fit(X, y)
    468     self.calibrated_classifiers_.append(calibrated_classifier)
    469 else:

File ~/mambaforge/envs/facets/lib/python3.10/site-packages/pycalib/models/__init__.py:692, in _CalibratedClassifier.fit(self, X, y, sample_weight)
    690 # TODO Remove BetaCalibration
    691 elif self.method == 'beta':
--> 692     calibrator = BetaCalibration(parameters="abm")
    693 elif self.method == 'beta_am':
    694     calibrator = BetaCalibration(parameters="am")

TypeError: BetaCalibration() takes no arguments

The same error occurred when I tried beta_am and beta_ab. Do you know why I see this error?

OneVsRestCalibrator + IsotonicCalibration output probabilities have a wrong shape

from sklearn.naive_bayes import GaussianNB
from pycalib.models import CalibratedModel, OneVsRestCalibrator, IsotonicCalibration
from sklearn import datasets

X, y = datasets.make_classification(n_classes=3, n_clusters_per_class=1)

cal = CalibratedModel(GaussianNB(), method=OneVsRestCalibrator(IsotonicCalibration()))
cal.fit(X, y)
cal.predict_proba(X).shape

returns the following shape

(2, 100, 3)

While other calibrators (e.g. BinningCalibration, SigmoidCalibration) return the correct shape

(100, 3)
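Until this is fixed, a possible (untested) workaround, continuing the snippet above, is to drop the spurious leading axis; which slice actually holds the calibrated probabilities would still need verification:

proba = cal.predict_proba(X)
if proba.ndim == 3:   # workaround for the (2, 100, 3) output above; untested
    proba = proba[0]  # unclear which slice is correct, verify before relying on it
print(proba.shape)    # expected (100, 3)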

Avoid using a direct adaptation of OneVsRestClassifier from scikit-learn in OneVsRestCalibrator

The current OneVsRestCalibrator https://github.com/perellonieto/PyCalib/blob/master/pycalib/models/multiclass.py#L72 uses an adapted copy of scikit-learn's OneVsRestClassifier https://github.com/scikit-learn/scikit-learn/blob/95119c13a/sklearn/multiclass.py#L138

It would be good to clarify which parts of this code are necessary, and which parts can be refactored so that scikit-learn can be used directly without duplicating code (one possible direction is sketched below).
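One possible direction, sketched without reference to the current internals, is to keep only a thin per-class loop and delegate label binarization to scikit-learn. Names and structure here are illustrative, not the current implementation:

from copy import deepcopy
from sklearn.preprocessing import label_binarize

def fit_one_vs_rest(calibrator, scores, y, classes):
    """Fit one copy of `calibrator` per class on binarized labels (illustrative)."""
    Y = label_binarize(y, classes=classes)  # shape (n_samples, n_classes)
    calibrators = []
    for k in range(len(classes)):
        c = deepcopy(calibrator)
        # Assumes the calibrator accepts a (n_samples, 1) score column.
        c.fit(scores[:, k].reshape(-1, 1), Y[:, k])
        calibrators.append(c)
    return calibrators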
