
performance-metrics-from-scratch

This repository implements common performance metrics (e.g. F1 score, AUC, accuracy) from scratch, without using the scikit-learn library.

Datasets used in this project


  1. sample1.csv : A highly imbalanced classification dataset (number of positive data points >> number of negative data points). It has 2 columns, 'y' and 'proba', which are the actual label of the data point and the predicted probability score respectively.
  2. sample2.csv : A highly imbalanced classification dataset (number of positive data points << number of negative data points). It has the same 2 columns, 'y' and 'proba'.
  3. sample3.csv : A regression dataset, meaning the labels are continuous numbers (97, 101.23, etc.). This dataset is used to calculate metrics like MSE, MAPE, and R squared.

Performance Metrics Explored

1. Confusion Matrix -

The confusion matrix summarizes predictions as four counts: TP (True Positive), FP (False Positive), TN (True Negative), FN (False Negative).
True Positive - the classifier predicted the label as Positive and the actual label is Positive.
False Positive - the classifier predicted the label as Positive but the actual label is Negative.
True Negative - the classifier predicted the label as Negative and the actual label is Negative.
False Negative - the classifier predicted the label as Negative but the actual label is Positive.
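A minimal sketch of how these four counts could be computed in plain Python (the function name and binary 0/1 label encoding are illustrative assumptions, not necessarily the repository's actual code):

```python
def confusion_matrix(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = positive, 0 = negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn
```

These four counts are the building blocks for every classification metric below.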

2. F1 Score -

F1 score is the harmonic mean of precision and recall; the harmonic mean of a and b is 2ab/(a+b).
Precision - out of all the data points predicted positive, the fraction that are actually positive.
Precision = TP/(TP+FP).
Recall - out of all the actually positive data points, the fraction the classifier predicted positive.
Recall = TP/(TP+FN).
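Putting the three formulas together, a hedged sketch (illustrative function name; the zero-denominator guards are an assumption about edge-case handling):

```python
def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall, from raw counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, with 2 true positives, 0 false positives, and 1 false negative, precision is 1.0 and recall is 2/3, giving F1 = 0.8.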

3. Accuracy -

Accuracy is the fraction of all predictions, positive and negative alike, that the classifier gets right.
Accuracy = (TP+TN)/(TP+FP+TN+FN).
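Since TP+TN is simply the number of correct predictions and the denominator is the total count, this reduces to (illustrative sketch, not the repository's exact code):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the actual labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)
```

Note that on the highly imbalanced datasets above, accuracy can look deceptively high, which is why precision, recall, and AUC are also computed.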

4. AUC Score -

AUC (Area Under the ROC Curve) measures classification performance across all possible threshold settings. The ROC curve plots TPR against FPR as the decision threshold varies, and AUC is the area under that curve.
False Positive Rate (FPR) - out of all the actually negative data points, the fraction the classifier incorrectly predicted as positive.
FPR = FP/(FP+TN).
True Positive Rate (TPR) - out of all the actually positive data points, the fraction the classifier correctly predicted as positive; this is identical to recall.
TPR = TP/(TP+FN).
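One common from-scratch approach, sketched below under the assumption that each unique probability score is used as a threshold and the area is computed by the trapezoidal rule (the function name is illustrative):

```python
def auc_score(y_true, y_proba):
    """AUC via trapezoidal integration of the ROC curve, using every
    unique probability score as a classification threshold."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    tpr_list, fpr_list = [0.0], [0.0]  # ROC curve starts at (0, 0)
    for thr in sorted(set(y_proba), reverse=True):
        tp = sum(1 for t, p in zip(y_true, y_proba) if t == 1 and p >= thr)
        fp = sum(1 for t, p in zip(y_true, y_proba) if t == 0 and p >= thr)
        tpr_list.append(tp / pos)
        fpr_list.append(fp / neg)
    # Trapezoidal rule: sum of (width along FPR) * (average TPR height)
    auc = 0.0
    for i in range(1, len(fpr_list)):
        auc += (fpr_list[i] - fpr_list[i - 1]) * (tpr_list[i] + tpr_list[i - 1]) / 2
    return auc
```

A perfectly separating classifier scores 1.0; random scoring hovers around 0.5. This brute-force loop is O(n²); sorting once and sweeping gives O(n log n), which matters for large datasets.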

5. Mean Squared Error (MSE) -

MSE is the average of the squared differences between predicted and actual values. It is used for regression problems, where the labels are continuous, so unlike the classification metrics above its output is not bounded to an interval such as [0, 1].
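The definition translates directly (illustrative sketch):

```python
def mse(y_true, y_pred):
    """Mean of squared prediction errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
```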

6. Mean Absolute Percentage Error (MAPE) -

MAPE is the average of the absolute percentage errors between predicted and actual values. Like MSE, it is used for regression problems where the labels are continuous.
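A sketch of the standard formulation, which divides each error by the actual value (this assumes no actual label is zero; some variants divide by the mean of the actuals instead to avoid that problem):

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error; assumes all actual values are nonzero."""
    errors = (abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return sum(errors) / len(y_true) * 100
```

For example, predicting 110 against an actual of 100 and 180 against 200 gives errors of 10% each, so MAPE is 10.0.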

7. R Squared Error -

R squared compares the residual sum of squares (SSres) with the total sum of squares (SStotal): R² = 1 - SSres/SStotal. It represents the goodness of fit of a regression model.
SSres - the sum of the squared residual errors (differences between actual and predicted values); dividing it by the number of points gives the Mean Squared Error (MSE).
SStotal - the sum of the squared differences between the actual labels and their mean; dividing it by the number of points gives the variance of the labels.
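A minimal sketch of R² = 1 - SSres/SStotal (illustrative function name; assumes the actual labels are not all identical, so SStotal is nonzero):

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SSres / SStotal."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_total = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_total
```

A perfect fit gives 1.0; a model that always predicts the mean gives 0.0, and a model worse than the mean can go negative.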
