
brdx88 / trial_deploy_model


Trying to deploy a machine learning model whose API can be accessed from any machine.

Home Page: https://credit-risk-brianic.herokuapp.com/

Python 0.58% Jupyter Notebook 97.61% JavaScript 0.49% CSS 0.31% HTML 1.00%
flask heroku machine-learning weight-of-evidence postman


CREDIT RISK MODEL

Live Credit Risk Predictor


Business Problems

As a financing company, the user wants to build a credit scoring model that predicts whether a loan applicant will default.

Business Goals

Research and develop a model that predicts whether an applicant will default, and find the most suitable evaluation metrics, since this is an imbalanced-class dataset.

Data Dictionary

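    Feature Name                  Description
    ----------------------------  ----------------------------------------------------------
    person_age                    Age (years)
    person_income                 Annual income
    person_home_ownership         Home ownership (e.g. MORTGAGE, OWN, RENT)
    person_emp_length             Employment length (years)
    loan_intent                   Loan intent (e.g. PERSONAL, EDUCATION, VENTURE, HOMEIMPROVEMENT)
    loan_grade                    Loan grade (e.g. E)
    loan_amnt                     Loan amount
    loan_int_rate                 Interest rate (%)
    loan_percent_income           Loan amount as a fraction of annual income
    cb_person_default_on_file     Historical default on file (e.g. N)
    cb_person_cred_hist_length    Credit history length
    loan_status                   Loan status (target: 0 = non-default, 1 = default)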
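The example request further down (loan_amnt 25000, person_income 168000, loan_percent_income 0.15) is consistent with loan_percent_income being the loan amount divided by annual income, rounded to two decimals. A minimal sketch, assuming that definition and that rounding (neither is stated explicitly in the repository; the helper name is hypothetical):

```python
def loan_percent_income(loan_amnt: float, person_income: float, ndigits: int = 2) -> float:
    """Loan amount as a fraction of annual income.

    Assumed definition (loan_amnt / person_income), inferred from the example
    request; the rounding to `ndigits` decimals is also an assumption.
    """
    if person_income <= 0:
        raise ValueError("person_income must be positive")
    return round(loan_amnt / person_income, ndigits)


# Values from the example request body: 25000 / 168000 = 0.1488... -> 0.15
print(loan_percent_income(25000, 168000))
```

Deriving the ratio from loan_amnt and person_income, rather than typing it in separately, keeps the three fields consistent with each other.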

Results

    +----------------------+--------------------------------+--------------------------------+--------------------------------+
    |                      |             Train              |              Test              |         Holdout Sample         |
    | Model                +----------+----------+----------+----------+----------+----------+----------+----------+----------+
    |                      | Recall   | F1-Score | AUC      | Recall   | F1-Score | AUC      | Recall   | F1-Score | AUC      |
    +----------------------+----------+----------+----------+----------+----------+----------+----------+----------+----------+
    | Logistic Regression  | 0.525296 | 0.618740 | 0.737900 | 0.524548 | 0.621112 | 0.738705 | 0.470407 | 0.584248 | 0.717754 |
    | RandomForest         | 0.000000 | 0.000000 | 0.500000 | 0.000000 | 0.000000 | 0.500000 | 0.000000 | 0.000000 | 0.500000 |
    | XGBoost              | 0.695587 | 0.813239 | 0.845633 | 0.689061 | 0.798403 | 0.839225 | 0.696387 | 0.802480 | 0.843304 |
    +----------------------+----------+----------+----------+----------+----------+----------+----------+----------+----------+

Conclusions

Since the dataset is imbalanced (non-default: 77.7%; default: 22.3%), accuracy alone is misleading, so it is worth looking at the AUC and Recall metrics instead, especially Recall. For business purposes, we want to minimize Type II errors, i.e. false negatives (predicting non-default (0) when the client actually defaults (1)). Recall = TP / (TP + FN) falls as false negatives rise, so we optimize for Recall.
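To illustrate why accuracy is set aside, the sketch below implements the metrics from scratch in plain Python (no scikit-learn) and scores a hypothetical naive model that labels every applicant non-default. The labels only mirror the 77.7% / 22.3% split; they and the constant score of 0.1 are made up for illustration.

```python
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple:
    """Return (tp, fp, fn, tn), treating 1 = default as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred) -> float:
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def recall(y_true, y_pred) -> float:
    """TP / (TP + FN): the share of actual defaulters the model catches."""
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if tp + fn else 0.0


def f1(y_true, y_pred) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * rec / (precision + rec) if precision + rec else 0.0


def auc(y_true, scores) -> float:
    """Probability that a random defaulter is scored above a random non-defaulter (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels mirroring the 77.7% / 22.3% class split.
y_true = [0] * 777 + [1] * 223
# Naive model: predicts "non-default" for everyone, with the same score for everyone.
y_pred = [0] * len(y_true)
scores = [0.1] * len(y_true)

print(accuracy(y_true, y_pred))  # 0.777
print(recall(y_true, y_pred))    # 0.0
print(f1(y_true, y_pred))        # 0.0
print(auc(y_true, scores))       # 0.5
```

The naive model reaches 77.7% accuracy while catching no defaulters at all (Recall 0, AUC 0.5). In practice, recall_score, f1_score and roc_auc_score from sklearn.metrics compute the same quantities.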

As the table above shows, XGBoost has the highest and most stable AUC and Recall, with a test AUC of 0.839225 and a test Recall of 0.689061.

In addition, XGBoost's Recall and AUC on the train and test sets are very close (e.g. AUC 0.845633 vs. 0.839225), which suggests the model is 'just right' at classifying target 1 and target 0: neither overfitting nor underfitting.
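To make the comparison concrete, the snippet below copies the train and test Recall/AUC values from the results table, picks a model and computes its train-test gap. The selection rule (highest test AUC, ties broken by test Recall) and the helper names are illustrative assumptions, not necessarily how the notebook chose the model.

```python
# Train/test Recall and AUC copied from the results table.
RESULTS = {
    "Logistic Regression": {"train": {"recall": 0.525296, "auc": 0.737900},
                            "test": {"recall": 0.524548, "auc": 0.738705}},
    "RandomForest": {"train": {"recall": 0.000000, "auc": 0.500000},
                     "test": {"recall": 0.000000, "auc": 0.500000}},
    "XGBoost": {"train": {"recall": 0.695587, "auc": 0.845633},
                "test": {"recall": 0.689061, "auc": 0.839225}},
}


def best_model(results: dict) -> str:
    """Illustrative rule: highest test AUC, ties broken by test Recall."""
    return max(results, key=lambda m: (results[m]["test"]["auc"], results[m]["test"]["recall"]))


def train_test_gap(results: dict, model: str, metric: str) -> float:
    """Absolute train-test difference; a small gap suggests little overfitting."""
    return abs(results[model]["train"][metric] - results[model]["test"][metric])


best = best_model(RESULTS)
print(best)                                               # XGBoost
print(round(train_test_gap(RESULTS, best, "auc"), 6))     # 0.006408
print(round(train_test_gap(RESULTS, best, "recall"), 6))  # 0.006526
```

A small gap is necessary but not sufficient: RandomForest has a gap of exactly zero yet an AUC of 0.5, no better than random ranking, so the gap should be read together with the absolute scores.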

Looking back at the feature importance from Logistic Regression with Lasso (L1) regularization, the selected features seem to make sense. The features that affect loan_status are:

  1. Percentage of Income ('loan_percent_income'),
  2. Loan Amount ('loan_amnt_WOE'),
  3. Employment Length ('person_emp_length_WOE'),
  4. Owning Home ('person_home_ownership_OWN'),
  5. Loan Grade ('loan_grade'),
  6. Intention for Venture ('loan_intent_VENTURE'),
  7. Intention for Education ('loan_intent_EDUCATION'),
  8. Renting home ('person_home_ownership_RENT'),
  9. Age ('person_age_WOE'),
  10. Credit History Length ('cb_person_cred_hist_length_WOE'),
  11. Intention for personal purposes ('loan_intent_PERSONAL'),
  12. Intention for home improvement ('loan_intent_HOMEIMPROVEMENT').
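Several of the selected features carry a '_WOE' suffix, and the repository is tagged weight-of-evidence, so those variables were presumably binned and replaced by their weight of evidence (WOE). The notebook's actual binning is not shown here. The sketch below uses hypothetical employment-length bins and made-up counts, with the common convention WOE = ln(share of non-defaults in the bin / share of defaults in the bin), under which a positive WOE marks a lower-risk bin; some references flip the ratio, which only flips the sign.

```python
import math
from collections import Counter


def weight_of_evidence(bins, labels) -> dict:
    """Return {bin: WOE} with WOE = ln(%non-default / %default).

    labels: 1 = default ("bad"), 0 = non-default ("good").
    """
    good = Counter(b for b, y in zip(bins, labels) if y == 0)
    bad = Counter(b for b, y in zip(bins, labels) if y == 1)
    total_good, total_bad = sum(good.values()), sum(bad.values())
    woe = {}
    for b in sorted(set(bins)):
        # WOE is undefined for a bin with no goods or no bads.
        if good[b] == 0 or bad[b] == 0:
            raise ValueError(f"bin {b!r} has no goods or no bads; merge it with a neighbour")
        woe[b] = math.log((good[b] / total_good) / (bad[b] / total_bad))
    return woe


# Hypothetical employment-length bins and outcomes (not the project's data):
#   "<2y":  30 non-defaults, 20 defaults
#   "2-5y": 30 non-defaults, 10 defaults
#   ">5y":  40 non-defaults, 10 defaults
bins = ["<2y"] * 50 + ["2-5y"] * 40 + [">5y"] * 50
labels = [0] * 30 + [1] * 20 + [0] * 30 + [1] * 10 + [0] * 40 + [1] * 10

for b, w in weight_of_evidence(bins, labels).items():
    print(f"{b:>5}: {w:+.4f}")
```

Here "<2y" gets ln(0.30 / 0.50) < 0 (riskier) and ">5y" gets ln(0.40 / 0.25) > 0 (safer). Raising on empty cells is a deliberate simplification; merging such bins or adding a small smoothing term are common alternatives.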

After tuning the models and computing their metrics, we used them to predict the holdout sample. XGBoost again performs best: on the holdout sample it reaches an AUC of 0.843304 and a Recall of 0.696387. Since these scores are in line with its train and test scores, XGBoost does not appear to be overfitted and predicts the holdout sample well, which makes it our candidate model for production.

GUIDANCE FOR INPUT AND OUTPUT FORMAT

Guidance on the input and output format when accessing the model over the web.

1. Input Format

Endpoint: https://credit-risk-brianic.herokuapp.com/predict-api

Send a 'POST' request containing the variables from the data dictionary above, except loan_status. The request body must be JSON, for example:

{
    "person_age": 24,
    "person_income": 168000,
    "person_home_ownership": "MORTGAGE",
    "person_emp_length": 0.0,
    "loan_intent": "PERSONAL",
    "loan_grade": "E",
    "loan_amnt": 25000,
    "loan_int_rate": 16.45,
    "loan_percent_income": 0.15,
    "cb_person_default_on_file": "N",
    "cb_person_cred_hist_length": 3
}
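A minimal Python client for this endpoint, using only the standard library (urllib.request). The required field list is taken from the example body above; the helper names build_request and predict and the 10-second timeout are hypothetical choices.

```python
import json
import urllib.request

API_URL = "https://credit-risk-brianic.herokuapp.com/predict-api"

# Fields from the example request body (everything in the data dictionary except loan_status).
REQUIRED_FIELDS = {
    "person_age", "person_income", "person_home_ownership", "person_emp_length",
    "loan_intent", "loan_grade", "loan_amnt", "loan_int_rate",
    "loan_percent_income", "cb_person_default_on_file", "cb_person_cred_hist_length",
}


def build_request(payload: dict, url: str = API_URL) -> urllib.request.Request:
    """Check that all fields are present and wrap the payload in a JSON POST request."""
    missing = REQUIRED_FIELDS - set(payload)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def predict(payload: dict, timeout: float = 10.0) -> dict:
    """Send the request (network call) and decode the JSON response."""
    with urllib.request.urlopen(build_request(payload), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


sample = {
    "person_age": 24,
    "person_income": 168000,
    "person_home_ownership": "MORTGAGE",
    "person_emp_length": 0.0,
    "loan_intent": "PERSONAL",
    "loan_grade": "E",
    "loan_amnt": 25000,
    "loan_int_rate": 16.45,
    "loan_percent_income": 0.15,
    "cb_person_default_on_file": "N",
    "cb_person_cred_hist_length": 3,
}
```

Calling predict(sample) sends the request and returns the decoded JSON body, which should have the shape shown under Output Format. Keeping validation in build_request means a malformed payload fails locally instead of producing a server error.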

2. Output Format

The expected response looks like this:

{
    "model": "XGB-Credit-Risk",
    "prediction": "87.76% Non-default",
    "version": "1.0.0"
}
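On the client side, the prediction field can be split into a probability and a label. The "<percent>% <label>" format is inferred from this single example only, and parse_prediction is a hypothetical helper, so the sketch raises an error instead of guessing when the string looks different.

```python
import re

# Assumed format, inferred from the one example response: "<percent>% <label>".
PREDICTION_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)%\s+(.+?)\s*$")


def parse_prediction(response: dict) -> tuple:
    """Split the 'prediction' field into (probability in [0, 1], label)."""
    match = PREDICTION_RE.match(response["prediction"])
    if match is None:
        raise ValueError(f"unexpected prediction format: {response['prediction']!r}")
    return float(match.group(1)) / 100, match.group(2)


response = {"model": "XGB-Credit-Risk", "prediction": "87.76% Non-default", "version": "1.0.0"}
prob, label = parse_prediction(response)
print(f"{prob:.4f} {label}")
```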
