lead_score_prediction's Introduction

Lead Score Prediction Model v1.0

Problem Statement

We need a model that predicts the probability that a user will make a deposit. We can do this by scoring lead quality based on user behaviours and characteristics.

Definition

i) Lead - Each lead (unique Binary user ID) represents an individual who signs up on our platforms
ii) Lead Score - We score each lead by the predicted probability that the lead makes a first deposit

Methodology

For simplicity, we use each user's activity within 24 hours after signup to predict whether the user will deposit within 14 days after signup. This makes the task a binary classification problem.
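The two time windows can be sketched as below. This is a minimal illustration, not the project's actual ETL: the function names are hypothetical, and whether the window boundaries are inclusive is an assumption, since the README does not say.

```python
from datetime import datetime, timedelta

FEATURE_WINDOW = timedelta(hours=24)  # activity used as model input
LABEL_WINDOW = timedelta(days=14)     # horizon for the first deposit


def in_feature_window(signup_at, event_at):
    """True if an event happened within 24 hours after signup."""
    return signup_at <= event_at < signup_at + FEATURE_WINDOW


def deposit_label(signup_at, first_deposit_at):
    """1 if the first deposit happened within 14 days after signup, else 0."""
    if first_deposit_at is None:  # the user never deposited
        return 0
    return int(signup_at <= first_deposit_at <= signup_at + LABEL_WINDOW)


signup = datetime(2021, 1, 1, 9, 0)
print(deposit_label(signup, datetime(2021, 1, 10, 12, 0)))  # 1
print(deposit_label(signup, datetime(2021, 1, 20, 12, 0)))  # 0
print(deposit_label(signup, None))                          # 0
```

Only events that pass in_feature_window would feed the features, so the model never sees activity from the period it is trying to predict.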

Feature Engineering

We will split feature engineering into different streams based on the nature of the features.
To start with, we will have:
i) Signup Feature - User details at the time of signup
ii) Clickstream - User activities on the website (eg: hits, sessions etc)
iii) Demo Trade Activity - User trading activities with demo accounts (BO and MT5)
iv) CLV - User deposit activities and amount (this will be the dependent variable)

Refer Here for Feature Details

We can do feature engineering on each part separately, and the final dataset will be just the concatenation of these parts.
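One way to picture the concatenation is a left join of every stream onto the signup stream by user ID. The sketch below uses plain dictionaries; the function name, column names, and sample values are hypothetical, and the real pipeline may well do this in SQL instead.

```python
def concat_feature_streams(base, streams):
    """Left-join feature streams onto the base stream by user ID.

    base and each stream map user_id -> {column: value}.
    Users missing from a stream get None for that stream's columns.
    """
    # Collect each stream's columns so absent users still get every column.
    stream_columns = [
        sorted({col for feats in stream.values() for col in feats})
        for stream in streams
    ]
    dataset = {}
    for user_id, base_feats in base.items():
        row = dict(base_feats)
        for stream, columns in zip(streams, stream_columns):
            feats = stream.get(user_id, {})
            for col in columns:
                row[col] = feats.get(col)  # None marks "no activity recorded"
        dataset[user_id] = row
    return dataset


signup = {"u1": {"country": "MY"}, "u2": {"country": "ID"}}
clickstream = {"u1": {"hits": 12, "sessions": 2}}
demo_trade = {"u2": {"demo_trades": 3}}

dataset = concat_feature_streams(signup, [clickstream, demo_trade])
print(dataset["u2"])
# {'country': 'ID', 'hits': None, 'sessions': None, 'demo_trades': 3}
```

Using the signup stream as the base keeps exactly one row per lead, even for users with no clickstream or demo trade activity.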

Model Training

In this version, we train on users who joined from 2021-01-01 to 2021-02-15.
The first model, lead_score_model, is the baseline: a boosted-tree classifier that includes every feature.

We then select the top 20 features from the first model by feature importance and train a DNN classifier on them.
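The selection step can be sketched independently of the training library. The helper name and the importance values below are made up for illustration; in practice the scores would come from the trained baseline model.

```python
def top_k_features(importances, k=20):
    """Return the k feature names with the highest importance scores."""
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


# Hypothetical importance scores from a baseline model.
importances = {
    "hits": 0.31,
    "sessions": 0.22,
    "country": 0.05,
    "demo_trades": 0.42,
}
print(top_k_features(importances, k=2))  # ['demo_trades', 'hits']
```

The second model would then be trained only on the columns this function returns.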

Current best score

PRC AUC: 0.685
ROC AUC: 0.857
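For readers unfamiliar with the two metrics, here is a small pure-Python sketch. ROC AUC is computed as the probability that a random positive is scored above a random negative. For the precision-recall curve, average precision is used as one common estimate of the area; the README does not say which estimator produced the reported figure, so that choice is an assumption. The sample labels and scores are invented, and the average precision function assumes no tied scores.

```python
def roc_auc(labels, scores):
    """ROC AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive example."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this cut-off
    return total / hits


labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.1]
print(round(roc_auc(labels, scores), 4))            # 0.8333
print(round(average_precision(labels, scores), 4))  # 0.8333
```

With an imbalanced target such as deposits, the precision-recall metric is usually the stricter of the two, which is consistent with the gap between the two scores above.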

Further Improvement

More Feature Engineering

Combine Demo Trade Features
The BO and MT5 demo trade features are similar. Most users do not demo trade on the first day after signup, so these features already contain a large number of null values, and splitting them into BO and MT5 makes the issue worse.
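A minimal sketch of the merge for a count-like feature, assuming a missing value means "no activity on that platform". The function name is hypothetical, and summing only suits additive features such as trade counts or volumes; averages or ratios would need a different rule.

```python
def combine_demo_trades(bo_value, mt5_value):
    """Sum BO and MT5 values, treating a missing platform as zero.

    Returns None only when the user has no demo activity on either platform.
    """
    if bo_value is None and mt5_value is None:
        return None
    return (bo_value or 0) + (mt5_value or 0)


print(combine_demo_trades(3, None))     # 3
print(combine_demo_trades(2, 5))        # 7
print(combine_demo_trades(None, None))  # None
```

The combined column is null only when both platform columns are null, so it can never be sparser than either of the originals.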

Include Livechat Data
User activity with the help center might be a good feature for predicting the likelihood of a deposit.

Improve Features ETL Pipeline

Optimize Train Data Table ETL Process
The table is currently updated weekly by a scheduled query that overwrites it, which is around 80+ GB per run. We can reduce this by using the append option.

Optimize Subset Features ETL Process
As the data warehouse gains more summary and aggregated tables, some features can be obtained more easily.
Eg: Most of the Signup Features can be obtained from the User Profile Combined Table

Model Interpretability

One of the major drawbacks of machine learning is its interpretability. I found a technique called weight of evidence, which can help explain the predictive power of an individual independent variable with respect to the dependent variable (classification). This can be implemented in the next version.
Reference Article
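As a rough sketch of the idea for a single categorical feature: for each category, weight of evidence compares its share of depositors with its share of non-depositors, and the information value (IV) sums those differences into one score per feature. The sign convention (depositors in the numerator), the smoothing constant, and the function name are assumptions made for this illustration, not taken from the reference article.

```python
import math
from collections import Counter


def weight_of_evidence(categories, labels, smoothing=0.5):
    """Return ({category: WoE}, information value) for one categorical feature.

    WoE = ln(share of events / share of non-events), where an event is label 1.
    smoothing is added to every count to avoid division by zero and log(0).
    """
    events, non_events = Counter(), Counter()
    for category, label in zip(categories, labels):
        (events if label == 1 else non_events)[category] += 1

    all_categories = set(events) | set(non_events)
    total_events = sum(events.values()) + smoothing * len(all_categories)
    total_non_events = sum(non_events.values()) + smoothing * len(all_categories)

    woe, iv = {}, 0.0
    for category in all_categories:
        p_event = (events[category] + smoothing) / total_events
        p_non_event = (non_events[category] + smoothing) / total_non_events
        woe[category] = math.log(p_event / p_non_event)
        iv += (p_event - p_non_event) * woe[category]
    return woe, iv


# Hypothetical feature: whether the user did any demo trade on day one.
did_demo_trade = ["yes", "yes", "yes", "no", "no", "no", "no", "no"]
deposited = [1, 1, 0, 0, 0, 0, 1, 0]
woe, iv = weight_of_evidence(did_demo_trade, deposited)
print(woe["yes"] > 0 > woe["no"])  # True
```

A positive WoE marks a category where depositors are over-represented, and a feature whose categories all have WoE near zero carries little predictive power on its own.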

Related Visualization

Tableau Dashboard

Lead Score 2.0 usage proposal (N days)

NoteBook

Contributors

chris-tan-binary
