LendingClubML

Claat: https://codelabs-preview.appspot.com/?file_id=17iO3pKXC5vyRMQFh5PLCRtJOvvmEDHGLeW02zzXlzBw#0

Task 1:

-Irs the professor is going to need the whole data set for the study. The professor would like to know the trend and the factors affecting the interest rate of the loan as the students have more data(features) to explore. This strengthens the point that in the industry most of the efforts goes in improving and understanding data. It is more important to have more data when compared to complex algorithms.

-The data had many colors which were empty or had values which can not be used.

-Analysis of the graphs generated from the lending club data. The graphs have been studied keeping in mind the need of the professor.

https://bit.ly/2YhEYQF

Task 2:

The data was cleaned using panda library. The columns with more than 80% of empty data were dropped. The columns with functions not related to prediction were dropped too. Converting the grade from string to numeric value. Changing the type of data and filling the empty cells with the median of the columns.

Used manual feature engineering and auto feature tools (https://github.com/featuretools/featuretools/ )

In manual feature tools, we could change the things according to our needs while in auto we are restricted

Task 3:

Building 3 manual ML models:

-Linear regression

-Random Forest

-Neural Network

We have taken a MAPE of the output to choose the most reliable model. Random Forest has the least MAPE.

Task 4: Hypertuning

Using Hypertuning to increase the efficiency.

a. Regression: Try L1, L2, Elasticnet regularization

b. Neural networks: Change epochs, optimizers, the learning rate

c. Random forest: No of trees, Tree depth

. https://scikitlearn.org/stable/modules/grid_search.html

Libraries Used:

MLPRegressor
numpy
pandas
tensorflow
StandardScaler
Building 3 AutoML models:
TPOT
AutoSklearn
H2O.ai
LinearRegression
RidgeCV
LassoCV
ElasticNet
RandomizedSearchCV
KFold
learning_curve, GridSearchCV
train_test_split

Using AutoML makes it more Interactive and Interpretable. AutoML makes it easy to change the input files without many changes.

It can be used on any input data to get the desired output

Task 5:

Building test cases to check our Random Forest model and Analysing the output.

Citations:

James Max Kanter, Kalyan Veeramachaneni. Deep feature synthesis: Towards automating data science endeavors. IEEE DSAA 2015. [https://dai.lids.mit.edu/wp-content/uploads/2017/10/DSAA_DSM_2015.pdf]
https://cmdlinetips.com/2018/01/7-tips-to-read-a-csv-file-as-pandas-data-frame/
https://github.com/ryanschaub/Predicting-Loan-Interest-Rates/blob/master/WF%20Interest%20Rate%20Predictions.ipynb
Team 4 (Presentation given in the class)
https://www.liebertpub.com/doi/full/10.1089/big.2018.0092

vaibhavi-khamar / assignment3-lendingclub Goto Github PK

assignment3-lendingclub's Introduction

LendingClubML

assignment3-lendingclub's People

Contributors

Watchers

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent