sgouda0412 / customer-churn-data-analytics-data-pipeline

This project forked from 3amory99/customer-churn-data-analytics-data-pipeline

Customer Churn Data Analytics Data Pipeline using Apache Airflow, Glue, S3, Redshift, PowerBI

Python 100.00%

Customer Churn Data Analytics Data Pipeline

Welcome to the Customer Churn Data Analytics Data Pipeline project! This Python ETL (Extract, Transform, Load) data engineering project uses Apache Airflow and several AWS services (Glue, S3, and Redshift) to build an end-to-end solution for analyzing customer churn data. PowerBI is used for data visualization.

Project Overview

This hands-on project builds and automates an ETL pipeline. Its key components are:

  • Apache Airflow:

    • Open-source orchestration and scheduling platform.
    • Task automation for seamless workflow execution.

  • AWS Glue:

    • Utilizes Glue Crawler to infer schemas from an AWS S3 bucket.
    • Creates a comprehensive data catalog for efficient data management.
    • Facilitates data loading into an Amazon Redshift data warehouse.

  • AWS S3:

    • Serves as the source for our data, housing the information to be analyzed.

  • Amazon Redshift:

    • Acts as the central data warehouse for storing and managing our processed data.

  • PowerBI:

    • Connects seamlessly to the Redshift cluster for dynamic and interactive data visualization.
    • Provides valuable insights into customer churn patterns.
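As a rough illustration of how these components could be wired together, the sketch below defines an Airflow DAG whose tasks start a Glue Crawler and then a Glue job through boto3. It is not the repository's actual DAG: the names `CRAWLER_NAME`, `GLUE_JOB_NAME`, `TASK_ORDER`, and the DAG id are hypothetical, and it assumes Airflow 2.4 or later (for the `schedule` argument) plus AWS credentials already configured for boto3.

```python
from datetime import datetime

# Hypothetical resource names; the README does not list the real ones.
CRAWLER_NAME = "churn-s3-crawler"
GLUE_JOB_NAME = "churn-s3-to-redshift"

# Task ids in the order the pipeline is expected to run them.
TASK_ORDER = ["run_glue_crawler", "run_glue_load_job"]


def run_glue_crawler():
    """Start the Glue Crawler that scans the S3 bucket and updates the catalog."""
    import boto3  # imported lazily so this module loads without boto3 installed

    boto3.client("glue").start_crawler(Name=CRAWLER_NAME)


def run_glue_load_job():
    """Start the Glue job assumed to load the cataloged data into Redshift."""
    import boto3

    boto3.client("glue").start_job_run(JobName=GLUE_JOB_NAME)


def build_dag():
    """Build the DAG; Airflow is imported here so the sketch can be read without it."""
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    callables = {
        "run_glue_crawler": run_glue_crawler,
        "run_glue_load_job": run_glue_load_job,
    }
    with DAG(
        dag_id="customer_churn_pipeline",  # hypothetical DAG id
        start_date=datetime(2024, 1, 1),
        schedule=None,  # trigger manually; set a cron expression to automate
        catchup=False,
    ) as dag:
        tasks = [
            PythonOperator(task_id=name, python_callable=callables[name])
            for name in TASK_ORDER
        ]
        # Chain the tasks so each one runs after the previous one succeeds.
        for upstream, downstream in zip(tasks, tasks[1:]):
            upstream >> downstream
    return dag
```

In a real DAG file, assign `dag = build_dag()` at module level so the scheduler can discover it. Note that `start_crawler` returns immediately, so a production DAG would also need to wait for the crawler to finish (for example by polling `get_crawler`) before starting the load job.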

Project Workflow

  1. Data Extraction:

    • An AWS Glue Crawler scans the source data in the AWS S3 bucket.
    • Schemas are inferred, and a data catalog is created for easy reference.
  2. Data Transformation:

    • Apache Airflow orchestrates the ETL workflow.
    • Data is cleansed and transformed to prepare it for analysis.
  3. Data Loading:

    • AWS Glue loads the processed data into the Amazon Redshift data warehouse.
  4. Data Visualization:

    • PowerBI connects to the Redshift cluster for interactive data visualization.
    • The visualizations reveal patterns related to customer churn.
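The cleansing part of the transformation step might look like the minimal sketch below. It uses only the standard library, and everything specific in it is an assumption: the column names `Total Charges` and `Churn Label`, the helper names `clean_row` and `clean_csv`, and the two sample rows, which are made up for illustration and are not taken from the dataset.

```python
import csv
import io


def clean_row(row):
    """Return a cleaned copy of one raw CSV record."""
    # Strip stray whitespace from every header and value.
    cleaned = {key.strip(): value.strip() for key, value in row.items()}
    # Blank charges become None so they are not mistaken for zero.
    charges = cleaned.get("Total Charges", "")
    cleaned["Total Charges"] = float(charges) if charges else None
    # Map the Yes/No churn label to a boolean.
    cleaned["Churn Label"] = cleaned.get("Churn Label", "").lower() == "yes"
    return cleaned


def clean_csv(text):
    """Parse CSV text and clean every record."""
    return [clean_row(row) for row in csv.DictReader(io.StringIO(text))]


# Made-up sample input, for illustration only.
sample = "CustomerID,Total Charges,Churn Label\n0001, 108.15 ,Yes\n0002,,No\n"
rows = clean_csv(sample)
```

Keeping the cleansing logic in a plain function like this makes it easy to call from an Airflow task and to test without any AWS resources.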

Project Highlights

  • AWS Cloud Platform:

    • The entire project is executed on the AWS cloud platform, ensuring scalability and reliability.
  • End-to-End Automation:

    • Apache Airflow is employed for the orchestration and automation of the entire ETL pipeline.
  • Comprehensive Data Analysis:

    • Amazon Athena is used to run SQL queries against the data catalog for in-depth analysis.
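An Athena query against the cataloged table could be submitted along the lines of the sketch below. The table name `telco_customer_churn` and the columns `contract` and `churn_label` are assumptions (the names the crawler actually produces may differ), as are the helper names; `start_query_execution` is the boto3 Athena call used to submit the query.

```python
def build_churn_query(table="telco_customer_churn"):
    """Build a SQL query for the churn rate per contract type.

    The table and column names are hypothetical placeholders.
    """
    return (
        "SELECT contract, "
        "COUNT(*) AS customers, "
        "AVG(CASE WHEN churn_label = 'Yes' THEN 1.0 ELSE 0.0 END) AS churn_rate "
        f"FROM {table} "
        "GROUP BY contract "
        "ORDER BY churn_rate DESC"
    )


def start_churn_query(database, output_location, table="telco_customer_churn"):
    """Submit the query to Athena and return its execution id."""
    import boto3  # imported lazily so the query builder works without boto3

    athena = boto3.client("athena")
    response = athena.start_query_execution(
        QueryString=build_churn_query(table),
        QueryExecutionContext={"Database": database},
        # S3 location where Athena writes the query results.
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]
```

Athena runs queries asynchronously, so the returned execution id would then be used to check the query status and fetch the results.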

Getting Started

Dataset

This project uses the dataset of a fictional telco company that provided home phone and Internet services to 7043 customers in California during Q3. The dataset is available on Kaggle; see the link below.

Dataset Overview

  • Number of Customers: 7043
  • Location: California
  • Time Period: Q3

Kaggle Dataset Link

Telco Customer Churn IBM Dataset

To get started with the project, follow these steps:

  1. Clone the repository:

    git clone https://github.com/3amory99/Customer-Churn-Data-Analytics-Data-Pipeline.git
