Predicting Customer Churn using Apache Spark

Table of Contents

Project Motivation
Installation
File Descriptions
Results
Acknowledgements

Project Motivation

This project was created as part of Udacity's Data Scientist for Enterprise nanodegree. I analyzed the log file of a music streaming service, 'Sparkify', and built a machine learning model that predicts customer churn using Apache Spark. The full dataset is 12 GB.

Installation

The final code was run on IBM Watson Studio using the Default Spark Python 3.5 XS runtime.

Libraries used: numpy, pandas, matplotlib, seaborn, pyspark

File Descriptions

Sparkify.ipynb: Contains the detailed analysis, visualizations, and modeling on a subset of the data.
sparkify_app_ibm.ipynb: Contains the final code run on IBM Watson Studio.

Results

The final Random Forest Classifier model achieves an F1-score of 0.88. There is still scope for improvement, such as engineering new features from the data and trying other models and hyperparameters.
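For readers unfamiliar with the metric, here is a minimal sketch of how an F1-score is computed from predictions. It is plain Python rather than the notebook's Spark code, the `f1_score` helper is hypothetical, and the labels are made up for illustration; they are not the project's data. It computes F1 for the positive (churned) class, which may differ from the F1 variant reported above (Spark's `MulticlassClassificationEvaluator`, for example, reports a weighted F1 across classes).

```python
def f1_score(true_labels, predicted_labels, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(true_labels, predicted_labels))
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No correct positive predictions: define F1 as 0 to avoid dividing by zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels for illustration only (1 = churned, 0 = stayed).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

example_f1 = f1_score(y_true, y_pred)
print(round(example_f1, 2))
```

Because churned users are usually a small minority, F1 is more informative than accuracy here: a model that predicts "no churn" for everyone can score high accuracy while its F1 for the churned class is 0.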

For a detailed discussion, check out my blog post

Acknowledgements

Thanks to Udacity for the data and course material.
