mewto

Model Evaluation with Weighted Threshold Optimization

mewto is an R package for experimenting with classification thresholds in binary classification problems and for interactively visualizing model evaluation metrics, confusion matrices, and the ROC and PR curves. It can also calculate the optimal threshold based on a weighted evaluation criterion and display the related performance metrics.

v1.1.0

What's new?

  • PR curve added to the visualization options
  • UI layout changed to accommodate multiple visualizations
  • code rewritten with optimization in mind: load times significantly reduced
  • minor corrections in function documentation

About mewto

mewto currently consists of two functions:

mewtoApp

This function launches a Shiny application where the user can interactively manipulate the threshold used in binary classification and view the associated metrics, confusion matrix and ROC curve. The app also allows for optimal threshold calculation according to a weighted version of Youden's J-statistic.

In R, simply call the function:

mewtoApp(actuals, probabilities)

actuals - Data of factor type with two levels: "yes" for positive and "no" for negative.

probabilities - Data of numeric type which should represent the probabilities of realization of the positive category.
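Labels are often stored as 0/1 rather than as a "yes"/"no" factor. The following is a minimal sketch of how such data could be recoded into the documented input format. The variable names (raw_labels, scores) and the level order are illustrative assumptions, not part of mewto.

```r
# Hypothetical raw inputs: labels coded 0/1 and model scores in [0, 1]
raw_labels <- c(0, 1, 1, 0, 1)
scores <- c(0.10, 0.85, 0.40, 0.30, 0.95)

# Recode to a two-level factor: "no" = negative, "yes" = positive
actuals <- factor(ifelse(raw_labels == 1, "yes", "no"), levels = c("no", "yes"))

# Basic sanity checks before passing the data to mewtoApp() or mewtoThresh()
stopifnot(
  is.factor(actuals),
  identical(levels(actuals), c("no", "yes")),
  is.numeric(scores),
  all(scores >= 0 & scores <= 1),
  length(actuals) == length(scores)
)
```

The checks stop with an error if the inputs do not match the argument descriptions above.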

mewtoThresh

This function calculates the optimal threshold according to a weighted version of Youden's J-statistic.

mewtoThresh(actuals, probabilities, weight)

actuals - Data of factor type with two levels: "yes" for positive and "no" for negative.

probabilities - Data of numeric type which should represent the probabilities of realization of the positive category.

weight - The importance attributed to sensitivity or, put differently, to maximizing the true positive rate.

Example

This example generates 100,000 observations of actual data (labelled "no" or "yes") and predicted values (ranging between 0 and 1), and stores them in a data frame called "df". The user can then call the functions included in the mewto package to perform exploratory analysis or obtain the optimum threshold value according to the weighted Youden J-statistic.

# Generate dataset
set.seed(123)
nobs <- 100000 # Select the number of observations to be generated
predicted <- runif(nobs, 0, 1) # Probabilities representing the predicted values
thresh <- runif(nobs, 0.2, 0.8) # Intermediary step to generate actuals
df <- data.frame(predicted, thresh) # "predicted" and "thresh" combined in "df"
# Actual data, stored as a factor with levels "no" and "yes" as the functions expect
df$actuals <- factor(c("no", "yes")[(df$predicted >= df$thresh) + 1], levels = c("no", "yes"))

# Call the mewto library
library(mewto)

# Run mewtoApp to launch the visual interface and experiment
mewtoApp(df$actuals, df$predicted)

# Run mewtoThresh with a weight of 0.5 to obtain the optimal threshold according to Youden's original J-statistic
mewtoThresh(df$actuals, df$predicted, weight=0.5)
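To show what a single threshold choice means for the confusion matrix and the derived metrics, here is a minimal base-R sketch that does not call mewto. The cutoff value, the ">=" classification rule, and the smaller sample size are assumptions made for illustration. The package's internal conventions may differ.

```r
# Regenerate a smaller version of the example data
set.seed(123)
nobs <- 1000
predicted <- runif(nobs, 0, 1)
thresh <- runif(nobs, 0.2, 0.8)
actuals <- factor(c("no", "yes")[(predicted >= thresh) + 1], levels = c("no", "yes"))

# Hypothetical cutoff chosen for illustration
cutoff <- 0.5
predicted_class <- factor(ifelse(predicted >= cutoff, "yes", "no"),
                          levels = c("no", "yes"))

# Confusion matrix: rows are predicted classes, columns are actual classes
conf_mat <- table(Predicted = predicted_class, Actual = actuals)

tp <- conf_mat["yes", "yes"]  # true positives
tn <- conf_mat["no", "no"]    # true negatives
fp <- conf_mat["yes", "no"]   # false positives
fn <- conf_mat["no", "yes"]   # false negatives

sensitivity <- tp / (tp + fn)          # true positive rate
specificity <- tn / (tn + fp)          # true negative rate
accuracy <- (tp + tn) / sum(conf_mat)  # share of correct classifications

print(conf_mat)
print(c(sensitivity = sensitivity, specificity = specificity, accuracy = accuracy))
```

Changing cutoff and rerunning the last part of the script mimics, in a non-interactive way, moving the threshold in the app.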

Technical details

In the calculation of the optimal threshold, a weighted version of Youden's J-statistic (Youden, 1950) is employed. For the unweighted statistic, the optimal cut-off is the threshold at which the ROC curve lies farthest above the identity (diagonal) line. The function maximizes the metric:

w * sensitivity + (1 - w) * specificity, where "w" is the "weight" parameter.

Youden's J-statistic has been modified by adding the weighting parameter "w". The weighted statistic takes values in the interval [0, 1]. With a weighting factor of w = 0.5, the weighted criterion selects the same optimal threshold as Youden's original J-statistic. This statistic was chosen because it is well suited to weighting and is also the default criterion used in the R package pROC.
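The weighted criterion can be sketched as a simple grid search in base R. The function below is a hypothetical helper written for illustration. It is not mewto's implementation, and the grid resolution and the ">=" classification rule are assumptions. Because Youden's J equals sensitivity + specificity - 1, which is 2 * (0.5 * sensitivity + 0.5 * specificity) - 1, both criteria are maximized at the same threshold when w = 0.5.

```r
# Hypothetical helper (not part of mewto): search a grid of candidate thresholds
# for the one maximizing w * sensitivity + (1 - w) * specificity
weighted_youden_threshold <- function(actuals, probabilities, weight = 0.5,
                                      grid = seq(0, 1, by = 0.01)) {
  positive <- actuals == "yes"
  score <- vapply(grid, function(t) {
    predicted_positive <- probabilities >= t
    sensitivity <- sum(predicted_positive & positive) / sum(positive)
    specificity <- sum(!predicted_positive & !positive) / sum(!positive)
    weight * sensitivity + (1 - weight) * specificity
  }, numeric(1))
  best <- which.max(score)  # first maximizer if there are ties
  list(threshold = grid[best], score = score[best])
}

# Simulated data in the same style as the example above
set.seed(123)
nobs <- 10000
predicted <- runif(nobs, 0, 1)
thresh <- runif(nobs, 0.2, 0.8)
actuals <- factor(c("no", "yes")[(predicted >= thresh) + 1], levels = c("no", "yes"))

balanced <- weighted_youden_threshold(actuals, predicted, weight = 0.5)
sens_heavy <- weighted_youden_threshold(actuals, predicted, weight = 0.8)

print(balanced$threshold)
print(sens_heavy$threshold)
```

Giving more weight to sensitivity rewards catching positives, so the selected threshold would be expected to move lower than in the balanced case.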

Download and installation

Online, from GitHub:

You can download mewto directly from GitHub. To do so, you need to have the devtools package installed and loaded. Once you are in R, run the following commands:

install.packages("devtools")

library("devtools")

install_github("alexandrumonahov/mewto")

You may encounter download errors from GitHub if you are behind a firewall or if HTTPS downloads are restricted. In that case, try setting one of the following download methods before installing (the "wininet" method is available on Windows only):

options(download.file.method = "libcurl")

options(download.file.method = "wininet")

Offline, by manually downloading and installing the package files:

Alternatively, if you cannot download the package through GitHub, you can download the binary package file from the link below:

https://github.com/alexandrumonahov/zip/blob/main/mewto.zip

Place the downloaded file into the working directory of R. Then do one of the following:

Option 1) Run the following command:

install.packages('mewto_1.0.zip', repos = NULL, type = "win.binary")

Option 2) In RStudio:

Go to the Packages tab in the bottom-right pane and click on "Install". In the pop-up window that appears, click on "Browse" and choose the package mewto.zip that you have just downloaded. Click on "Install".

Once the package is installed, you can load it with the library(mewto) command.
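To confirm that the installation succeeded without attaching the package, a short check along these lines can be used. It is a sketch that relies only on base R functions. The variable name mewto_available is illustrative.

```r
# Check whether mewto can be loaded, without attaching it to the search path
mewto_available <- requireNamespace("mewto", quietly = TRUE)

if (mewto_available) {
  message("mewto version: ", as.character(utils::packageVersion("mewto")))
} else {
  message("mewto is not installed; see the installation steps above.")
}
```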

Special thanks!

I would like to give special thanks to Prof. Stefan Bender, Dr. Jens Mehrhoff, Gabriela Alves Werb and the Bundesbank ICBD team for having inspired the creation of this package.

Version history

v1.0.0

  • mewto's application launch
  • interactive threshold component added
  • weighted optimization algorithm developed based on Youden's J-statistic
  • confusion matrix and performance metrics analysis included
  • ROC curve visualization augmented to display user's threshold selection on the curve

Author details

Alexandru Monahov, 2021
