
A customer churn prediction repo that focuses on the software engineering side.



Customer Churn - Engineering


Table of Contents
  1. About The Project
  2. Getting Started
  3. Roadmap

About The Project

The main purpose of this project is to focus on the engineering side rather than the modelling side. We build efficient data pipelines and adhere to coding best practices using tools, languages, and technologies such as Python, Scala, Spark, Docker, and CI/CD tooling.

Note: This repo will be used for testing different technologies.

Built With

Getting Started

The dataset for this project comes from Kaggle. It is a simple customer churn dataset with both numeric and categorical features. The task is binary classification: the target variable is True/1 if a customer has left the company and False/0 otherwise.
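As a minimal sketch of the target encoding described above, the helper below maps a churn label to 1 or 0. The function name `encode_churn` and the accepted string spellings are assumptions for illustration; the labels used in the actual Kaggle file may differ.

```python
def encode_churn(value):
    """Map a churn label to 1 (customer left) or 0 (customer stayed)."""
    # Booleans are handled first so True/False map directly to 1/0.
    if isinstance(value, bool):
        return int(value)
    # Assumed spellings; adjust to match the labels in the real dataset.
    text = str(value).strip().lower()
    if text in {"true", "1"}:
        return 1
    if text in {"false", "0"}:
        return 0
    raise ValueError(f"Unrecognised churn label: {value!r}")


labels = [True, "False", "true", 0]
encoded = [encode_churn(v) for v in labels]
print(encoded)
```

Raising on unknown labels, rather than silently defaulting to 0, keeps a mislabelled column from skewing the class balance unnoticed.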

For the pipeline to work, save the CSV file from Kaggle in the "data" directory (src/python/src/data/) and rename it to "telco_churn.csv".
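The copy-and-rename step can also be scripted. The sketch below is not part of the repo; `install_dataset` is a hypothetical helper, and the path of the downloaded file is whatever name Kaggle gave it on your machine.

```python
import shutil
from pathlib import Path

# Directory and file name the pipeline expects, as stated in this README.
DATA_DIR = Path("src/python/src/data")
TARGET_NAME = "telco_churn.csv"


def install_dataset(downloaded_csv, repo_root="."):
    """Copy a downloaded CSV into the data directory under the expected name."""
    source = Path(downloaded_csv)
    if not source.is_file():
        raise FileNotFoundError(f"No such file: {source}")
    target_dir = Path(repo_root) / DATA_DIR
    # Create the data directory if the clone does not contain it yet.
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / TARGET_NAME
    shutil.copyfile(source, target)
    return target


# Example call (hypothetical download path), run from the repo root:
# install_dataset(Path.home() / "Downloads" / "my_kaggle_download.csv")
```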

To run the project, you need to clone this repo and run the docker/docker-compose-shell.sh script.

This script runs the train phase, the predict phase, or both. To run only the train phase, pass the argument "train" to the script. For the predict phase, pass "predict". To run both, pass "both" or run the script with no arguments.
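The argument handling described above can be modelled as follows. This is only an illustration in Python of the documented behaviour; the real script is a shell script whose internals are not shown here, and `resolve_phases` is a hypothetical name.

```python
VALID_MODES = ("train", "predict", "both")


def resolve_phases(args):
    """Return the phases to run for the given command-line arguments."""
    # No argument behaves the same as "both", as documented above.
    mode = args[0] if args else "both"
    if mode not in VALID_MODES:
        raise ValueError(f"Unknown mode {mode!r}; expected one of {VALID_MODES}")
    if mode == "both":
        return ["train", "predict"]
    return [mode]


print(resolve_phases([]))           # no arguments
print(resolve_phases(["train"]))    # train only
print(resolve_phases(["predict"]))  # predict only
```

How the actual script reacts to an unrecognised argument is not documented; rejecting it is an assumption of this sketch.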

Clone the repo

git clone https://github.com/SteliosGian/churn-engineering.git

Run the script

./docker/docker-compose-shell.sh

Make sure the script has execute permission

chmod +x docker/docker-compose-shell.sh

or run

bash docker/docker-compose-shell.sh

MLflow Server

The project starts a local MLflow server running in the background, which you can access at http://127.0.0.1:5000/.
With MLflow, you can track custom metrics and hyperparameters, and log artifacts such as plots and models.
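A minimal sketch of logging to that server from Python is shown below. It assumes the `mlflow` package is installed and the server is running at the address above. The function name `log_training_run` and the example parameter and metric names are hypothetical and are not taken from the repo's train.py.

```python
def log_training_run(params, metrics, artifact_path=None,
                     tracking_uri="http://127.0.0.1:5000"):
    """Log hyperparameters, metrics, and an optional artifact as one MLflow run."""
    # Imported lazily so this module can be loaded without MLflow installed.
    import mlflow

    mlflow.set_tracking_uri(tracking_uri)
    with mlflow.start_run():
        mlflow.log_params(params)    # dict of hyperparameter name -> value
        mlflow.log_metrics(metrics)  # dict of metric name -> numeric value
        if artifact_path is not None:
            # Uploads a local file such as a plot or a serialized model.
            mlflow.log_artifact(artifact_path)


# Example call with made-up values (requires the MLflow server to be up):
# log_training_run({"n_estimators": 100}, {"accuracy": 0.8})
```

After a call like this, the run should appear in the MLflow UI with its parameters and metrics.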


Prerequisites

Docker must be installed to run the project with the script above. Without Docker, the project can be executed by running the Python scripts (train.py and predict.py) individually.

Notes

Spark is not needed for this project because the dataset is small. However, a small pipeline written in Scala is available in the "spark" branch.

Roadmap

  • Docker ☑
  • Shell scripts ☑
  • TravisCI ☑
  • MLflow ☑
  • Spark ☑
  • API
