This project forked from gfelot/dend-data_pipeline_airflow


Learn to build a data pipeline with Airflow to automate wrangling data - An Udacity Data Engineer Nano Degree Project

Home Page: https://eu.udacity.com/course/data-engineer-nanodegree--nd027



DEND-Data_Pipeline_Airflow

Load files from S3 with Airflow and ETL them into Redshift.

The purpose of this project is to load data stored as JSON files, wrangle it into a star schema (see the ERD below), and express the whole pipeline as code with Python and Airflow.

Prerequisite

  1. Install Docker.

  2. Run the project with Docker:

    docker run -d -p 8080:8080 \
        -v /path/to/project/dags:/usr/local/airflow/dags \
        -v /path/to/project/plugins:/usr/local/airflow/plugins \
        -v /path/to/project/requirements.txt:/requirements.txt \
        --name airflow puckel/docker-airflow webserver
    

Everything is now set up to launch Airflow.

  3. You also need to configure your AWS credentials in the Airflow UI:

    We'll use Airflow's UI to configure your AWS credentials and connection to Redshift.

    1. Open the Airflow UI and go to the Admin panel.

    2. Under Connections, select Create.

    3. On the create connection page, enter the following values:

      • Conn Id: Enter aws_credentials.
      • Conn Type: Select Amazon Web Services.
      • Login: Enter your Access key ID from the IAM User credentials you downloaded.
      • Password: Enter your Secret access key from the IAM User credentials you downloaded.

      Once you've entered these values, select Save and Add Another.
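As an alternative to clicking through the UI, Airflow can also pick up connections from environment variables named `AIRFLOW_CONN_<CONN_ID>`, encoded as a URI whose scheme is the connection type. A small helper to build such a URI (the key values below are placeholders, not real credentials):

```python
from urllib.parse import quote

def aws_conn_uri(access_key_id: str, secret_access_key: str) -> str:
    """Build an AIRFLOW_CONN_AWS_CREDENTIALS-style URI.

    Secret access keys often contain '/' or '+', so both parts are
    URL-encoded before being placed in the login:password position.
    """
    return (
        f"aws://{quote(access_key_id, safe='')}"
        f":{quote(secret_access_key, safe='')}@"
    )

# Placeholder credentials for illustration only.
print(aws_conn_uri("AKIAEXAMPLE", "abc/def+ghi"))
# aws://AKIAEXAMPLE:abc%2Fdef%2Bghi@
```

You would export this value as `AIRFLOW_CONN_AWS_CREDENTIALS` in the container's environment instead of creating the connection by hand.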

Main Goal

The company Sparkify needs to analyze its data to better understand how its users (free and paid) use its services. With this data pipeline we can more easily build, schedule, and monitor the ETL of that data.

Data Pipeline

DAG

This data pipeline is easy to read and understand, even for a newcomer to Airflow.
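For readers new to Airflow: a DAG is just a directed acyclic graph of tasks, and Airflow runs each task only after its upstream tasks finish. The ordering idea can be sketched in plain Python (task names here are illustrative, not the project's actual operators):

```python
# Toy dependency graph mirroring how an Airflow DAG orders ETL tasks.
# Keys are tasks; values are the tasks that run directly downstream.
downstream = {
    "begin_execution": ["stage_events", "stage_songs"],
    "stage_events": ["load_songplays_fact"],
    "stage_songs": ["load_songplays_fact"],
    "load_songplays_fact": ["load_dimensions"],
    "load_dimensions": ["run_quality_checks"],
    "run_quality_checks": [],
}

def topological_order(graph):
    """Return one valid execution order (Kahn's algorithm)."""
    indegree = {task: 0 for task in graph}
    for deps in graph.values():
        for d in deps:
            indegree[d] += 1
    ready = [t for t, n in indegree.items() if n == 0]
    order = []
    while ready:
        task = ready.pop(0)
        order.append(task)
        for d in graph[task]:
            indegree[d] -= 1
            if indegree[d] == 0:
                ready.append(d)
    return order

print(topological_order(downstream))
```

Airflow computes this ordering for you; in a real DAG file you only declare the edges (e.g. with the `>>` operator between tasks).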

Data Model

Finally, the pipeline builds the star schema shown below to make data analysis easier.

ERD
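In a star schema, analytical queries join one central fact table to small dimension tables. A minimal sketch of that pattern, using SQLite purely for illustration (the real project targets Redshift, and the table and column names here are simplified from the ERD):

```python
import sqlite3

# In-memory database: one fact table (songplays) joined to two dimensions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, level TEXT);
CREATE TABLE songs (song_id TEXT PRIMARY KEY, title TEXT);
CREATE TABLE songplays (
    songplay_id INTEGER PRIMARY KEY,
    user_id INTEGER REFERENCES users(user_id),
    song_id TEXT REFERENCES songs(song_id)
);
INSERT INTO users VALUES (1, 'free');
INSERT INTO songs VALUES ('S1', 'Example Song');
INSERT INTO songplays VALUES (10, 1, 'S1');
""")

# Star-schema analysis: the fact table joins out to its dimensions.
row = conn.execute("""
    SELECT u.level, s.title
    FROM songplays sp
    JOIN users u ON u.user_id = sp.user_id
    JOIN songs s ON s.song_id = sp.song_id
""").fetchone()
print(row)  # ('free', 'Example Song')
```

The same join shape applies to the project's schema, just with more dimensions (time, artists) hanging off the songplays fact table.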

Run it

A few steps

A few seconds after starting your container, go to the Airflow UI.
