Jill and I were tasked with applying machine learning to a real-world challenge: credit card credit risk. Credit risk is an inherently unbalanced classification problem, since good loans far outnumber risky ones. Using a credit card dataset from LendingClub, we oversample the data with the RandomOverSampler and SMOTE algorithms and undersample it with the ClusterCentroids algorithm. We then use a combinatorial approach of over- and undersampling with the SMOTEENN algorithm. Finally, we compare two machine learning models that reduce bias, BalancedRandomForestClassifier and EasyEnsembleClassifier, to predict credit risk.
Naive Random Oversampling produced a balanced accuracy score of 66%, a precision of 99%, and a recall of 60%.
SMOTE Oversampling produced a balanced accuracy score of 66%, a precision of 99%, and a recall of 70%.
Undersampling with ClusterCentroids produced a balanced accuracy score of 54%, a precision of 99%, and a recall of 40%.
Combination Sampling with SMOTEENN produced a balanced accuracy score of 64%, a precision of 99%, and a recall of 59%.
The Balanced Random Forest Classifier produced a balanced accuracy score of 79%, a precision of 99%, and a recall of 87%.
The Easy Ensemble AdaBoost classifier (EasyEnsembleClassifier) produced a balanced accuracy score of 93%, a precision of 99%, and a recall of 93%.
Based on the analysis and testing, the Easy Ensemble AdaBoost classifier performed the best across the metrics. The key metric it led on was recall: it caught 93% of the high-risk cases and also had the best overall average across the metrics. It still lacks precision on the high-risk class, however; the 99% precision reported above is an average dominated by the low-risk majority, so many of the loans the model flags are false positives. Despite that limitation, it is the best model for our data for catching the most high-risk loans.