
Expected cost

How to create a proper environment for this project

  1. Create a conda environment with Python 3.8:
     conda create -n <env_name> python=3.8
  2. Activate the environment:
     conda activate <env_name>
  3. Install the requirements:
     pip install -r requirements.txt

Information

This repository contains versions of the original repositories luferrer/expected_cost and luferrer/psr-calibration, with some modifications to make them work with Python 3.8 and the latest versions of the libraries used.

To make use of this repository, you can create a notebook in the notebooks directory and import the modules from the expected_cost and psrcal directories as follows:

# This should be the first cell in the notebook
import sys
sys.path += ["../../", "../", "../../../"]
from expected_cost import ec, utils
from psrcal.calibration import *

Please note that in this case we import all functions from the calibration module of the psrcal package.

References

READMEs of the original repositories

Expected Cost

Methods for computing the expected cost (EC) on an evaluation dataset, as defined in statistical learning textbooks (e.g., Bishop's "Pattern Recognition and Machine Learning" and Hastie et al.'s "The Elements of Statistical Learning"). Given a matrix of user-defined costs with elements $c_{y\ d}$, where $y$ is the true class of a sample and $d$ is the decision made by the system for that sample, this metric is estimated as the average of the costs across all samples in the dataset. That is:

$EC = \frac{1}{K} \sum_k c_{y_k\ d_k}$

where the sum runs over the $K$ samples in the evaluation set and $c_{y_k\ d_k}$ is the cost incurred at sample $k$.
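
For instance, with a 0-1 cost matrix this average reduces to the error rate. The following is a minimal numpy sketch with made-up labels and decisions (not the repository's own implementation) that computes the EC exactly as defined above:

import numpy as np

# Hypothetical example: true classes y and hard decisions d for K = 5 samples.
y = np.array([0, 1, 1, 0, 1])
d = np.array([0, 1, 0, 0, 1])

# 0-1 cost matrix: rows are true classes, columns are decisions.
# Every error costs 1, every correct decision costs 0.
C = np.array([[0.0, 1.0],
              [1.0, 0.0]])

# EC = (1/K) * sum_k c_{y_k, d_k}
ec_value = C[y, d].mean()
print(ec_value)  # with 0-1 costs this equals 1 - accuracy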

The EC is a generalization of the total error (which, in turn, is 1 minus the accuracy) and the balanced total error (which is 1 minus the balanced accuracy). It generalizes these metrics in two ways: (1) it allows for costs that differ for each type of error, and (2) it allows for decisions that do not correspond one-to-one to the classes (e.g., it allows for the introduction of an "abstain" decision). The EC comes with an elegant theory on how to make optimal decisions given a certain set of costs, and it enables analysis of calibration. For these reasons we believe it is superior to other commonly used classification metrics, like the F-beta score or the Matthews correlation coefficient. All these issues are discussed in detail in:

L. Ferrer, "Analysis and Comparison of Classification Metrics", arXiv:2209.05355

The results in the paper can be replicated with the code in the examples directory in this repository.

The code provides methods for computing the EC when decisions are given by:

  • hard decisions obtained with some external method,

  • Bayes decisions made by optimizing the cost given the scores from a system, assuming they can be used to obtain well-calibrated posteriors (a sketch of this case follows the list), or

  • optimal decisions made by choosing the decision threshold that minimizes the cost. This last option is only applicable for the binary case and the standard square cost function.
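
As a rough illustration of the Bayes-decision case above: given (assumed well-calibrated) posteriors and a cost matrix, the optimal decision for each sample is the one that minimizes the expected cost under the posterior. The repository implements this properly; the numpy sketch below, with made-up numbers and a hypothetical "abstain" column, only shows the underlying idea.

import numpy as np

# Hypothetical posteriors P(class | sample) for 4 samples and 2 classes.
posteriors = np.array([[0.90, 0.10],
                       [0.30, 0.70],
                       [0.55, 0.45],
                       [0.20, 0.80]])

# Cost matrix: rows = true class, columns = decisions (class 0, class 1, abstain).
# Errors cost 1, correct decisions cost 0, abstaining always costs 0.3.
C = np.array([[0.0, 1.0, 0.3],
              [1.0, 0.0, 0.3]])

# Expected cost of each decision for each sample: E[c | d] = sum_y P(y | x) * C[y, d].
expected_costs = posteriors @ C          # shape (num_samples, num_decisions)
bayes_decisions = expected_costs.argmin(axis=1)
print(bayes_decisions)                   # decision index 2 means "abstain"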

The scripts in the examples directory can be used with any dataset of scores and targets. See the examples/data.py file for examples of how to load your own data in the format required by the examples.

The repository provides calibration functionality through a separate repository called psr-calibration. The psr-calibration repository (which requires pytorch) is not needed for the normal functioning of the code. A wrapper of the psr-calibration main calibration method can be found in the expected_cost/calibration.py file. An example of how to use this method can be found in the notebooks/data.py file.

Proper Scoring Rules for calibration

This repository contains the code for doing calibration on multi-class and binary classification using various approaches. It is a work in progress, but the basic functionality is ready. See the experiments directory for examples of how to run the code.
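
As background for what such calibration approaches look like, the sketch below shows plain temperature scaling trained with the cross-entropy (logarithmic) proper scoring rule. It is a generic illustration on random data, not the psrcal API; see the psrcal package and the experiments directory for the actual implementations.

import torch

# Made-up uncalibrated logits and labels for 100 samples, 3 classes.
logits = torch.randn(100, 3)
labels = torch.randint(0, 3, (100,))

# Learn a single temperature by minimizing the log proper scoring rule
# (cross-entropy) on the calibration data.
log_temp = torch.zeros(1, requires_grad=True)
optimizer = torch.optim.LBFGS([log_temp], lr=0.1, max_iter=50)

def closure():
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(logits / log_temp.exp(), labels)
    loss.backward()
    return loss

optimizer.step(closure)

# Calibrated posteriors from the temperature-scaled logits.
calibrated_posteriors = torch.softmax(logits / log_temp.exp(), dim=1)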
