cloudera / cml_amp_churn_prediction


Build a scikit-learn model to predict churn using customer telco data.

License: Apache License 2.0

Python 10.64% Jupyter Notebook 86.51% Shell 0.01% CSS 0.64% JavaScript 0.28% HTML 1.93%
churn-prediction logistic-regression explainability interpretability lime

cml_amp_churn_prediction's Introduction

Churn Modeling with scikit-learn

This repository accompanies the Visual Model Interpretability for Telco Churn blog post and contains the code needed to build all project artifacts on CML. Additionally, this project serves as a working example of the concepts discussed in the Cloudera Fast Forward report on Interpretability, which is freely available for download.

[Image: table_view]

The primary goal of this repo is to build a logistic regression classification model that predicts the probability that a group of customers will churn from a fictitious telecommunications company. In addition, the model is interpreted using a technique called Local Interpretable Model-agnostic Explanations (LIME). Both the logistic regression and LIME models are deployed using CML's real-time model deployment capability, and they are exercised via a basic Flask-based web application that lets users interact with the model and see which factors in the data have the most influence on a customer's probability of churning.
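
To make that workflow concrete, here is a minimal sketch of fitting a logistic regression and explaining a single prediction with LIME. It uses toy data and made-up feature names rather than the project's actual telco schema, so treat it as an illustration of the technique, not the project's code.

# Minimal sketch: logistic regression + LIME on toy churn-like data.
# Feature names and data are hypothetical, not the project's telco schema.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from lime.lime_tabular import LimeTabularExplainer

rng = np.random.default_rng(0)
feature_names = ["tenure_months", "monthly_charges", "support_calls"]
X = rng.normal(size=(500, 3))
y = (X[:, 2] - X[:, 0] + rng.normal(scale=0.5, size=500) > 0).astype(int)  # synthetic churn label

model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X, y)

explainer = LimeTabularExplainer(
    X, feature_names=feature_names, class_names=["no churn", "churn"], mode="classification"
)
explanation = explainer.explain_instance(X[0], model.predict_proba, num_features=3)
print(explanation.as_list())  # [(feature condition, weight), ...]

Running this prints a list of (feature condition, weight) pairs for one customer, which is roughly the kind of per-customer explanation the web application visualizes.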

Project Structure

The project is organized with the following folder structure:

.
├── code/              # Backend scripts and notebooks needed to create project artifacts
├── flask/             # Assets needed to support the front end application
├── images/            # A collection of images referenced in project docs
├── models/            # Directory to hold trained models
├── raw/               # The raw data file used within the project
├── cdsw-build.sh      # Shell script used to build environment for experiments and models
├── model_metrics.db   # SQLite database used to store model drift metrics
├── README.md
└── requirements.txt
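
The model_metrics.db file above is populated when deployed models track metrics. As a rough sketch of that mechanism (this assumes CML's cdsw library, which is only available inside a CML/CDSW session, and uses a placeholder predict function rather than the project's actual scoring code):

# Rough sketch of tracking model metrics from a deployed CML model.
# The predict() body and the metric names are placeholders, not the project's code.
import cdsw

@cdsw.model_metrics          # persists metrics tracked during this model call
def predict(args):
    probability = 0.42       # placeholder; a real model would score args here
    cdsw.track_metric("input", args)               # record the request payload
    cdsw.track_metric("probability", probability)  # record the prediction
    return {"probability": probability}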

By following the notebooks, scripts, and documentation in the code directory, you will understand how to perform similar classification tasks on CML, as well as how to use the platform's major features to your advantage. These features include:

  • Data ingestion and manipulation with Spark (see the sketch after this list)
  • Streamlined model development and experimentation
  • Point-and-click model deployment to a RESTful API endpoint
  • Application hosting for deploying frontend ML applications
  • Model operations including model governance and tracking of model performance metrics
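
As a minimal sketch of the Spark ingestion step referenced in the first bullet (the file path and session name here are illustrative, not the project's exact values):

# Minimal PySpark ingestion sketch; the CSV path is illustrative only.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("telco-churn-ingest").getOrCreate()

# Read the raw CSV into a Spark DataFrame, inferring column types from the data.
telco_df = spark.read.csv("raw/telco-churn-data.csv", header=True, inferSchema=True)
telco_df.printSchema()
telco_df.show(5)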

We will focus our attention on working within CML, using all it has to offer, while glossing over the details that are simply standard data science. We trust that you are familiar with typical data science workflows and do not need detailed explanations of the code.

If you have deployed this project as an Applied ML Prototype (AMP), you will not need to run any of the setup steps outlined in this document, as everything is already installed for you. However, you may still find it instructive to review the documentation and the corresponding files, and in particular to run through code/2_data_exploration.ipynb and code/3_model_building.ipynb in a Jupyter Notebook session to see the process that informed the creation of the final model.

If you are building this project from source code without automatic execution of project setup, then you should follow the steps listed in this document carefully and in order.


cml_amp_churn_prediction's Issues

Script 7A: Discrepancy between README and code

Hi Andrew,

I am running each step in the code folder manually, as if I were a workshop attendee, and I noticed that in script 7A we use the cmlbootstrap lib to grab the latest model id by model name (lines 200-217), assuming the deployed model is named "Churn Model API Endpoint" (line 207).

However, if we create a model by following the instructions in the README in the code folder for part "5 Serve Model", it will be called "explainer", so script 7A doesn't pick up the model correctly and fails.

We are going to use this project for a large workshop on Wednesday 4/21, and I was wondering if you could please fix it by then if possible.

Thanks,
Paul

Need to manually pip install seaborn

Hello,

I was running the churn demo last night and noticed I had to manually pip install seaborn in order to run scripts 7A and 7B, even after pip installing the requirements. Please look into this and let me know if you need anything more.

Thanks!
Paul
