udacity-data-scientist-nanodegree-capstone-project

Blog post

https://medium.com/@jsohi/what-type-of-users-cancel-their-music-subscriptions-1a6a42ccebae

Motivation

This is the capstone project for the Udacity Data Scientist Nanodegree program. In this project, I used Spark (PySpark) to predict customer churn for a fictional company called Sparkify (with data provided by Insight Data Science). I went through three ML model iterations until I reached an acceptable F1 score, a measure of model performance defined as the harmonic mean of precision and recall. Traditional accuracy was not a good metric in this case because the dataset is unbalanced: only a few users churn relative to those who do not.
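To illustrate why accuracy misleads on unbalanced data, here is a minimal plain-Python sketch using made-up confusion-matrix counts (100 hypothetical users, 10 of whom churn). The helper names are illustrative and not taken from the project's notebooks; the sketch computes binary F1 for the churn class, and the README does not state which F1 variant the reported scores use.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Binary F1 for the positive (churn) class: harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Fraction of all users classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


# Hypothetical test set: 100 users, 10 churners.
# Model A always predicts "no churn": 0 TP, 0 FP, 10 FN, 90 TN.
print(accuracy(tp=0, fp=0, fn=10, tn=90))    # 0.9
print(f1_score(tp=0, fp=0, fn=10))           # 0.0

# Model B catches 6 of the 10 churners at the cost of 4 false alarms.
print(accuracy(tp=6, fp=4, fn=4, tn=86))     # 0.92
print(round(f1_score(tp=6, fp=4, fn=4), 3))  # 0.6
```

Model A looks strong on accuracy while never identifying a single churner; F1 exposes that immediately, which is why it was the selection metric here.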

File/directory tree

- README.md # this file
- transform_raw_to_user.py # takes the raw event data and transforms it to one row per user for modeling
- Sparkify.ipynb # works through the 1st model iteration with sample data
- final.ipynb # the intermediate model and the final selected model (GBT)

- data # entire folder is gitignored due to large file size
|- mini_sparkify_event_data.json  # raw data to process
|- TRANSFORMED_mini_sparkify_event_data.csv  # data that has been transformed

- models # different model iterations (different ML classification algorithms, parameters, features, etc.)
|- final_model # one folder per model version
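transform_raw_to_user.py collapses the event log into one feature row per user. The README does not list the script's actual features, so the following is only a minimal pure-Python sketch of that kind of aggregation. The field names (`userId`, `sessionId`, `page`), the features, and the assumption that a "Cancellation Confirmation" page event marks churn are all hypothetical here; the project itself does this step in PySpark, where the equivalent would be a `groupBy("userId").agg(...)` over the event DataFrame.

```python
from collections import defaultdict

# Hypothetical event records; field names and page values are assumptions.
events = [
    {"userId": "1", "sessionId": 10, "page": "NextSong"},
    {"userId": "1", "sessionId": 10, "page": "Thumbs Up"},
    {"userId": "1", "sessionId": 11, "page": "NextSong"},
    {"userId": "2", "sessionId": 20, "page": "NextSong"},
    {"userId": "2", "sessionId": 20, "page": "Cancellation Confirmation"},
]

CHURN_PAGE = "Cancellation Confirmation"  # assumed churn marker


def to_user_level(events):
    """Aggregate event-level records into one feature dict per user."""
    users = defaultdict(
        lambda: {"songs_played": 0, "thumbs_up": 0, "sessions": set(), "churn": 0}
    )
    for e in events:
        u = users[e["userId"]]
        u["sessions"].add(e["sessionId"])  # track distinct sessions
        if e["page"] == "NextSong":
            u["songs_played"] += 1
        elif e["page"] == "Thumbs Up":
            u["thumbs_up"] += 1
        elif e["page"] == CHURN_PAGE:
            u["churn"] = 1  # label: user cancelled at some point
    # Replace each session set with its size so every row is a flat feature vector.
    return {
        uid: {**feats, "sessions": len(feats["sessions"])}
        for uid, feats in users.items()
    }


for uid, row in sorted(to_user_level(events).items()):
    print(uid, row)
```

Working at the user level is what makes churn a per-row label, so the classifiers can be trained on one example per user rather than on individual events.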

Libraries used

  • pyspark (v2.4.5)
  • plotly
  • pandas
  • pathlib

Analysis/model results

| Model iteration | Model type             | F1 score (test set) |
|-----------------|------------------------|---------------------|
| 1               | Logistic Regression    | 0.67                |
| 2               | Random Forest          | 0.69                |
| 3               | Gradient Boosted Trees | 0.71                |
