
Scotiabank Data Science Challenge

The Scotiabank Data Science Challenge invites participants to build machine learning models that optimize a large financial institution's loan approval system. The challenge centers on a core banking task: accurately evaluating each applicant's creditworthiness so as to maximize interest revenue while minimizing losses from loan delinquencies.
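The revenue-versus-loss trade-off can be made concrete with a simple expected-value model. The sketch below is illustrative only and is not the challenge's scoring formula. It assumes that a repaid loan earns `principal * interest_rate` and a delinquent loan loses `principal * loss_given_default`. All function names and example numbers are hypothetical.

```python
def expected_profit(p_delinquent: float, principal: float,
                    interest_rate: float, loss_given_default: float) -> float:
    """Expected net profit of approving one loan under a simple hypothetical model."""
    revenue = (1.0 - p_delinquent) * principal * interest_rate  # earned if repaid
    loss = p_delinquent * principal * loss_given_default         # lost if delinquent
    return revenue - loss


def break_even_probability(interest_rate: float, loss_given_default: float) -> float:
    """Delinquency probability at which expected profit is zero: r / (r + L)."""
    return interest_rate / (interest_rate + loss_given_default)


# Hypothetical inputs: 8% interest, 60% of principal lost on delinquency.
threshold = break_even_probability(0.08, 0.60)
print(f"approve if predicted delinquency < {threshold:.2%}")
print(expected_profit(0.05, 10_000, 0.08, 0.60))
```

Under these assumptions, approving an applicant is profitable in expectation only when the predicted delinquency probability is below `r / (r + L)`, where r is the interest rate and L the loss given default.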

Process

Step 1: Data Preprocessing and Cleaning 📊

  • Variable Management: Used the "Manage Variable" node to refine the dataset by assigning new roles and measurement levels to variables, which made the downstream analysis more precise. For example, we changed some numerical variables from a "nominal" level to an "ordinal" level so that their natural ordering was taken into account, which led to a more accurate analysis of the inputs.

  • Feature Machine: Deployed the "Feature Machine" node, a preprocessing tool for feature extraction and data organization. This node handled data irregularities with several techniques, such as imputing missing values with the mean or median. We applied transformation policies targeting Cardinality, Kurtosis, Missingness, Outliers, and Skewness. The number of features per input was set to 8, chosen to maximize net profit without overfitting the model.
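The Feature Machine node performs these steps automatically inside Cortex. To illustrate two of the ideas it applies, the plain-Python sketch below imputes missing values with the median or mean and log-transforms a non-negative input when its skewness is high. The function names and the skewness threshold of 1.0 are our own assumptions, not the node's actual settings.

```python
import math
import statistics
from typing import Optional


def impute(values: list[Optional[float]], strategy: str = "median") -> list[float]:
    """Replace missing values (None) with the median or mean of the observed values."""
    observed = [v for v in values if v is not None]
    if strategy == "median":
        fill = statistics.median(observed)
    elif strategy == "mean":
        fill = statistics.fmean(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]


def skewness(values: list[float]) -> float:
    """Population (biased) Fisher-Pearson skewness coefficient."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return 0.0
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def transform_if_skewed(values: list[float], threshold: float = 1.0) -> list[float]:
    """Apply log1p to a non-negative input whose skewness exceeds the (assumed) threshold."""
    if skewness(values) > threshold and min(values) >= 0:
        return [math.log1p(v) for v in values]
    return values
```

For example, `transform_if_skewed(impute(raw_incomes))` would first fill gaps in a hypothetical income column and then compress its long right tail.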

Step 2: Logistic Regression Model 🧠

  • Model Selection: Chose a logistic regression model because it is a supervised learning algorithm designed for binary classification tasks, such as predicting whether an applicant will become delinquent.
  • Methodology: Applied a generalized logit link, which can handle multi-category response variables and reduces to the standard logit for a binary target, combined with the Stepwise selection method to add or remove features based on their statistical significance.
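Cortex fits the model internally. As a hedged illustration only, the plain-Python sketch below fits a binary logistic regression by batch gradient descent and runs a simplified, forward-only variant of stepwise selection: at each step it adds the feature with the smallest likelihood-ratio test p-value, provided p < 0.05. SAS's Stepwise method can also remove features and uses its own entry and stay criteria; the function names, learning rate, epoch count, and 0.05 threshold here are assumptions.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(rows, y, cols, epochs=500, lr=0.5):
    """Fit an intercept plus one weight per column in `cols` by batch gradient descent."""
    w = [0.0] * (len(cols) + 1)  # w[0] is the intercept
    n = len(y)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for row, target in zip(rows, y):
            x = [1.0] + [row[c] for c in cols]
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - target
            for j, xj in enumerate(x):
                grad[j] += err * xj / n  # gradient of the mean log loss
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w


def log_likelihood(rows, y, cols, w):
    """Bernoulli log-likelihood of the fitted model."""
    ll = 0.0
    for row, target in zip(rows, y):
        x = [1.0] + [row[c] for c in cols]
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # avoid log(0)
        ll += target * math.log(p) + (1 - target) * math.log(1.0 - p)
    return ll


def forward_stepwise(rows, y, n_features, alpha=0.05):
    """Greedy forward selection using a 1-df likelihood-ratio test at each step."""
    selected = []
    current_ll = log_likelihood(rows, y, selected, fit_logistic(rows, y, selected))
    while True:
        best = None  # (p_value, column, log_likelihood)
        for c in range(n_features):
            if c in selected:
                continue
            cols = selected + [c]
            ll = log_likelihood(rows, y, cols, fit_logistic(rows, y, cols))
            stat = max(2.0 * (ll - current_ll), 0.0)
            p_value = math.erfc(math.sqrt(stat / 2.0))  # upper tail of chi-square(1)
            if best is None or p_value < best[0]:
                best = (p_value, c, ll)
        if best is None or best[0] >= alpha:
            return selected
        selected.append(best[1])
        current_ll = best[2]
```

The p-value uses the identity that a chi-square variable with one degree of freedom exceeds x with probability `erfc(sqrt(x / 2))`, which avoids any dependency beyond the standard library.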

Step 3: Data Visualization and Analysis 📈

  • Used histograms to visualize the applicant data, focusing on applicants whose delinquency rates fell within the 0-6% interval. After the analysis, we compiled the selected applicant IDs and exported them to a CSV file.
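The selection and export step might look like the following sketch. It assumes the model's output is available as (applicant ID, predicted delinquency probability) pairs and treats the 0-6% interval as inclusive at both ends; the column name `applicant_id` and the function names are hypothetical.

```python
import csv
import io


def select_applicants(predictions, low=0.0, high=0.06):
    """IDs whose predicted delinquency probability lies in [low, high] (inclusive)."""
    return [app_id for app_id, prob in predictions if low <= prob <= high]


def ids_to_csv(ids, header="applicant_id"):
    """Render the IDs as CSV text with a single (hypothetical) header column."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow([header])
    writer.writerows([app_id] for app_id in ids)
    return buf.getvalue()


def export_ids(ids, path):
    """Write the CSV text to disk; newline='' keeps the csv module's own line endings."""
    with open(path, "w", newline="") as f:
        f.write(ids_to_csv(ids))
```

Keeping the CSV rendering separate from the file write makes the output easy to inspect before anything touches disk.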

Results 🥇

Using Cortex Analytics, powered by SAS, our model achieved the highest net profit and ranked first on the leaderboard.

  • Net Profit: $548,214,371.75
  • Approved Applicants: 756,054

ROC (Receiver Operating Characteristic) Graph Analysis

  • The orange curve represents the training data set, while the purple curve represents the validation data set.
  • The data was partitioned into training and validation sets in a 70:30 ratio, so that performance could be checked on held-out data and overfitting detected.
  • Both curves lie close to the top-left corner, indicating good model performance on both the training and validation data.
  • The minimal gap between the training and validation curves suggests that the model generalizes well without overfitting.
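For readers who want to reproduce this kind of check outside Cortex, the sketch below shows a seeded 70:30 split, the (FPR, TPR) points of an ROC curve, and the area under it (AUC). It is plain Python written for clarity rather than speed (the AUC loop is quadratic), and the function names and seed are assumptions.

```python
import random


def train_validation_split(rows, train_frac=0.7, seed=42):
    """Shuffle a copy of the rows with a fixed seed and split them train:validation."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


def roc_curve(labels, scores):
    """(FPR, TPR) points obtained by lowering the threshold one distinct score at a time."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both positive and negative labels")
    tp = fp = 0
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        for score, label in zip(scores, labels):
            if score == threshold:
                if label == 1:
                    tp += 1
                else:
                    fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("need both positive and negative labels")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

Computing `roc_auc` separately on the training and validation partitions gives a single-number version of the gap between the two curves: similar values suggest the model generalizes, while a much lower validation AUC points to overfitting.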

More Insights 🔎

Contributors

  • donghwui
  • leo-cf-tian
