Lead Score Prediction Model v1.0

Problem Statement

We need a model that predicts the probability that a user will make a deposit. We do this by scoring lead quality based on user behaviours and characteristics.

Definition

i) Lead - Each lead (Unique Binary User ID) represents an individual who signed up on our platforms
ii) Lead Score - The score of a lead is the predicted probability that the lead makes a first deposit

Methodology

For simplicity, we use a user's activity within 24 hours after signup to predict whether the user will deposit within 14 days after signup. This makes the task a binary classification.
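The two time windows can be sketched as follows. This is a minimal illustration, not the project's actual labelling code: the function names are hypothetical, and treating both window boundaries as inclusive is an assumption.

```python
from datetime import datetime, timedelta

FEATURE_WINDOW = timedelta(hours=24)  # activity used as model input
LABEL_WINDOW = timedelta(days=14)     # horizon for the first deposit


def in_feature_window(signup_at, event_at):
    """True if an event happened within 24 hours after signup."""
    return signup_at <= event_at <= signup_at + FEATURE_WINDOW


def deposit_label(signup_at, first_deposit_at):
    """Return 1 if the first deposit falls within 14 days after signup, else 0."""
    if first_deposit_at is None:  # user never deposited
        return 0
    return int(signup_at <= first_deposit_at <= signup_at + LABEL_WINDOW)


signup = datetime(2021, 1, 1, 9, 0)
print(deposit_label(signup, datetime(2021, 1, 10)))  # 1: deposit on day 9
print(deposit_label(signup, datetime(2021, 2, 1)))   # 0: deposit after day 14
print(deposit_label(signup, None))                   # 0: no deposit
```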

Feature Engineering

We split feature engineering into different streams based on the nature of the features.
To start with, we will have:
i) Signup Feature - User details at the time of signup
ii) Clickstream - User activities on the website (e.g. hits, sessions)
iii) Demo Trade Activity - User trading activities with demo accounts (BO and MT5)
iv) CLV - User deposit activities and amounts (this will be the dependent variable)

Refer Here for Feature Details

We can do feature engineering on each stream separately; the final dataset is simply the concatenation of these parts.
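As a rough sketch of that concatenation, the snippet below joins per-stream feature dictionaries into one row per user. The function name, the stream names, the feature names and the `stream__feature` column naming are all assumptions made for illustration; the real pipeline may join tables differently.

```python
def concat_feature_streams(user_ids, streams):
    """Join per-stream feature dicts into one row per user.

    streams maps a stream name to {user_id: {feature: value}}.
    Users missing from a stream get None for that stream's features.
    """
    # Collect every column each stream can produce.
    columns = {
        name: sorted({f for feats in rows.values() for f in feats})
        for name, rows in streams.items()
    }
    dataset = {}
    for uid in user_ids:
        row = {}
        for name, rows in streams.items():
            feats = rows.get(uid, {})
            for col in columns[name]:
                # Prefix with the stream name to avoid column clashes.
                row[f"{name}__{col}"] = feats.get(col)
        dataset[uid] = row
    return dataset


streams = {
    "signup": {"u1": {"country": "id"}, "u2": {"country": "br"}},
    "clickstream": {"u1": {"hits": 12, "sessions": 2}},
}
dataset = concat_feature_streams(["u1", "u2"], streams)
print(dataset["u2"])
```

Here `u2` has no clickstream activity, so its clickstream columns are filled with `None` rather than the row being dropped.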

Model Training

In this version, we train on users who joined from 2021-01-01 to 2021-02-15.
The first model, lead_score_model, is the baseline model, which includes every feature and uses a boosted-tree classifier.

We then select the top 20 features from the first model by feature importance and train a DNN classifier on them.
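The selection step can be sketched like this, assuming the baseline model's feature importances are available as a name-to-score mapping. The importance values and feature names below are made up for illustration, and the helper names are hypothetical.

```python
def top_k_features(importance, k=20):
    """Return the k feature names with the highest importance, best first."""
    # Sort by descending importance; break ties by name for stable output.
    ranked = sorted(importance.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:k]]


def keep_features(row, selected):
    """Project one training row onto the selected features."""
    return {name: row.get(name) for name in selected}


# Hypothetical importances from the baseline model.
importance = {"hits": 0.31, "sessions": 0.22, "country": 0.05, "signup_hour": 0.12}
selected = top_k_features(importance, k=2)
print(selected)  # ['hits', 'sessions']
```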

Current best score

PRC AUC: 0.685
ROC AUC: 0.857
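For readers unfamiliar with the two metrics, the sketch below computes them on a toy example. ROC AUC is computed as the probability that a random depositor is scored above a random non-depositor, and the area under the precision-recall curve is approximated by average precision. This is an assumption about the definitions: the tool that produced the scores above may use a different interpolation for PRC AUC. The pairwise loop is quadratic and is meant only as an illustration.

```python
def roc_auc(labels, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Ties between a positive and a negative count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (no tie handling)."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    total_pos = sum(labels)
    hits, ap = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            ap += hits / rank  # precision at this recall step
    return ap / total_pos


labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.1]
print(round(roc_auc(labels, scores), 3))            # 0.833
print(round(average_precision(labels, scores), 3))  # 0.833
```

Because deposits are typically the minority class, the precision-recall figure tends to be the more demanding of the two.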

Further Improvement

More Feature Engineering

Combine Demo Trade Features
The BO and MT5 demo trade features are similar. Most users don't do any demo trading on the first day after signup, so these features already contain a large number of null values, and splitting them into BO and MT5 makes the issue worse. Combining the two should reduce this sparsity.
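One possible way to combine the two platforms is sketched below. It assumes the demo trade features are additive quantities such as counts or volumes; `trade_count` and `trade_volume` are hypothetical feature names, and non-additive features (for example averages) would need a different merge rule.

```python
def combine_demo_trade(bo, mt5):
    """Merge BO and MT5 demo-trade features into platform-agnostic ones.

    Each argument is a dict of additive features, or None when the user
    did not trade on that platform during the feature window.
    """
    if bo is None and mt5 is None:
        return None  # no demo trading at all; the combined row stays null
    bo, mt5 = bo or {}, mt5 or {}
    keys = sorted(set(bo) | set(mt5))
    # A platform with no activity contributes 0 instead of a null.
    return {k: bo.get(k, 0) + mt5.get(k, 0) for k in keys}


print(combine_demo_trade({"trade_count": 3, "trade_volume": 50.0}, None))
print(combine_demo_trade({"trade_count": 3}, {"trade_count": 2}))
print(combine_demo_trade(None, None))
```

A user who traded on only one platform now yields a fully populated row, so nulls remain only for users with no demo trading at all.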

Include Livechat Data
User interactions with the help center might be a good feature for predicting the likelihood of a deposit.

Improve Features ETL Pipeline

Optimize Train Data Table ETL Process
The table is currently updated by a weekly scheduled query (overwrite), which amounts to 80+ GB per run. We can reduce this by using the append option.

Optimize Subset Features ETL Process
As more summary and aggregated tables become available in the data warehouse, some features can be obtained more easily.
E.g. most of the Signup Features can be obtained from the User Profile Combined Table.

Model Interpretability

One of the major drawbacks of machine learning is its limited interpretability. I found that there is a technique called weight of evidence, which can help explain the predictive power of an individual independent variable with respect to the dependent variable (classification). This can be implemented in the next version.
Reference Article
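To make the idea concrete, here is a minimal sketch of weight of evidence (WoE) for a single categorical feature. It assumes the convention WoE = ln(share of depositors in the category / share of non-depositors in the category); some references use the inverse ratio, which only flips the sign. The additive smoothing term is an assumption added to keep the logarithm finite for rare categories, and the toy data is made up.

```python
import math
from collections import Counter


def weight_of_evidence(values, labels, smoothing=0.5):
    """Return ({category: WoE}, information value) for one feature."""
    pos = Counter(v for v, y in zip(values, labels) if y == 1)
    neg = Counter(v for v, y in zip(values, labels) if y == 0)
    cats = sorted(set(pos) | set(neg))
    # Smoothing keeps the log finite when a category has no depositors
    # or no non-depositors.
    pos_total = sum(pos.values()) + smoothing * len(cats)
    neg_total = sum(neg.values()) + smoothing * len(cats)
    woe, iv = {}, 0.0
    for c in cats:
        p = (pos[c] + smoothing) / pos_total  # share of depositors in c
        n = (neg[c] + smoothing) / neg_total  # share of non-depositors in c
        woe[c] = math.log(p / n)
        iv += (p - n) * woe[c]  # information value sums over categories
    return woe, iv


# Toy data: category "a" has more depositors than category "b".
values = ["a", "a", "a", "b", "b", "b"]
labels = [1, 1, 0, 0, 0, 1]
woe, iv = weight_of_evidence(values, labels)
print(woe, iv)
```

A positive WoE means the category is over-represented among depositors, a negative one that it is under-represented. The information value summarises how strongly the feature as a whole separates the two classes.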

Related Visualization

Tableau Dashboard

Lead Score 2.0 usage proposal (N days)

NoteBook
