rifa8 / capstone-project-with-dynamic-dag

This project builds an ELT pipeline that consolidates data from multiple sources into a single source of truth in BigQuery. At its core is a dynamic Directed Acyclic Graph (DAG) in Apache Airflow that generates its tasks automatically from predefined configuration files.
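The idea of generating tasks from configuration can be sketched in plain Python, without Airflow itself. This is only an illustration under stated assumptions: the configuration keys (`task_id`, `source`, `depends_on`) and the task names are hypothetical and are not taken from the project's real configuration files. In an Airflow DAG file, each entry would typically become an operator and each dependency an upstream relationship.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical configuration: one entry per task to generate.
# Keys and task names are illustrative, not the project's real schema.
TASK_CONFIGS = [
    {"task_id": "load_users", "source": "postgres", "depends_on": []},
    {"task_id": "load_courses", "source": "csv", "depends_on": []},
    {"task_id": "load_enrollments", "source": "json", "depends_on": []},
    {"task_id": "build_dim_user", "source": "bigquery",
     "depends_on": ["load_users"]},
    {"task_id": "build_fact_enrollment", "source": "bigquery",
     "depends_on": ["load_enrollments", "load_courses", "build_dim_user"]},
]


def build_task_graph(configs):
    """Return {task_id: set of upstream task_ids}, validating the config."""
    known = {cfg["task_id"] for cfg in configs}
    if len(known) != len(configs):
        raise ValueError("duplicate task_id in configuration")
    graph = {}
    for cfg in configs:
        missing = set(cfg["depends_on"]) - known
        if missing:
            raise ValueError(f"{cfg['task_id']} depends on unknown tasks: {missing}")
        graph[cfg["task_id"]] = set(cfg["depends_on"])
    return graph


def execution_order(configs):
    """Return one valid run order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(build_task_graph(configs)).static_order())


if __name__ == "__main__":
    for task_id in execution_order(TASK_CONFIGS):
        print(task_id)
```

Adding a new source then only requires a new configuration entry; the task list and its ordering follow from the data rather than from hand-written code.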

Python 100.00%
elt visualization dynamic-dag

capstone-project-with-dynamic-dag's Introduction

Capstone Project Brief Data Engineering Team D ("The Future")

Constraints

  • The data is spread across multiple sources, such as databases, CSV files, and JSON files.
  • Constraints for each problem will be specifically defined in the project description section.

About the Project

Background

An edu-tech platform called "pinter-skuy" provides online courses facilitated by professional mentors, and anyone can enroll in these courses. As the business gains momentum, management wants to monitor and evaluate its online courses.

To support this, the data currently stored in separate sources will be consolidated into a single source of truth for subsequent analysis.

Tools and Framework

GitHub, Docker, Cloud Shell, Python, PostgreSQL, Airflow, Google BigQuery

ERD

[Figure: ERD]

Project Flowchart

[Figure: project flowchart]

Running the Project

git clone https://github.com/rifa8/capstone-project-with-dynamic-dag
cd capstone-project-with-dynamic-dag
docker compose up -d

Then open http://localhost:8080 to access the Airflow UI.

Username: airflow
Password: airflow

[Screenshot: Airflow UI]

Next, set up the connections in Airflow. In the Airflow UI, go to Admin >> Connections and add a connection. This project uses two connections: to_bq for BigQuery and pg_conn for the PostgreSQL database.

[Screenshot: to_bq connection]

[Screenshot: pg_conn connection]
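As an alternative to the UI, Airflow can also read a connection from an environment variable named AIRFLOW_CONN_<CONN_ID> whose value is a connection URI. The sketch below builds such a variable for pg_conn. The user, password, host, port, and database name are placeholders chosen for illustration, not the project's real settings, and the helper `airflow_conn_env` is hypothetical.

```python
from urllib.parse import quote


def airflow_conn_env(conn_id, scheme, user, password, host, port, database):
    """Return (env var name, URI) for an Airflow connection defined via environment."""
    # Percent-encode the parts that may contain reserved characters such as "@" or "/".
    uri = (
        f"{scheme}://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(database, safe='')}"
    )
    return f"AIRFLOW_CONN_{conn_id.upper()}", uri


if __name__ == "__main__":
    # All values below are placeholders, not the project's real credentials.
    name, uri = airflow_conn_env(
        "pg_conn", "postgres", "airflow", "p@ss/word", "postgres", 5432, "pinter_skuy"
    )
    print(f"{name}={uri}")
```

The resulting line could go into the environment of the Airflow containers. The to_bq connection needs Google Cloud credentials and is easier to configure in the UI as described above.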

Then run the DAG.

TIP: Let the DAGs run on their schedule rather than triggering them manually, so that the ExternalTaskSensor can detect the upstream run automatically.

First, activate the DAG dag_etl_to_dwh and wait until its run succeeds. Then activate the DAG dag_etl_to_datamart. Its ExternalTaskSensor passes automatically because the awaited task in dag_etl_to_dwh already has the status success, which matches allowed_states=['success'] in the script for dag_etl_to_datamart:

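from datetime import timedelta

from airflow.sensors.external_task import ExternalTaskSensor

# ext_task_depen holds the external dependency settings: dag_id, task_id, minutes_delta.
task_wait_ext_task = ExternalTaskSensor(
    task_id=f"wait_{ext_task_depen['dag_id']}_{ext_task_depen['task_id']}",
    external_dag_id=ext_task_depen['dag_id'],
    external_task_id=ext_task_depen['task_id'],
    allowed_states=['success'],
    # Offset between this DAG's logical date and the awaited run's logical date.
    execution_delta=timedelta(minutes=ext_task_depen['minutes_delta']),
)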
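The role of execution_delta can be illustrated with plain date arithmetic. With execution_delta set, the sensor looks for the external task in the run whose logical date equals the sensor's own logical date minus the delta. The helper `upstream_logical_date` and the times below are hypothetical and only assume that dag_etl_to_datamart is scheduled a fixed number of minutes after dag_etl_to_dwh.

```python
from datetime import datetime, timedelta


def upstream_logical_date(sensor_logical_date, minutes_delta):
    """Logical date of the upstream run the sensor checks when execution_delta is set."""
    return sensor_logical_date - timedelta(minutes=minutes_delta)


if __name__ == "__main__":
    # Assumed schedule: the datamart run is stamped 01:00, the dwh run 00:00.
    datamart_run = datetime(2024, 1, 1, 1, 0)
    print(upstream_logical_date(datamart_run, 60))  # 2024-01-01 00:00:00
```

A manually triggered run is usually stamped with its trigger time rather than a scheduled time, so the computed upstream date rarely matches an existing dag_etl_to_dwh run. This is why scheduled runs are recommended above.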

[Screenshot: dwh DAG]

[Screenshot: datamart DAG]

Check the tables in BigQuery

Dataset dwh

[Screenshot: dwh dataset]

Dataset datamart

[Screenshot: datamart dataset]

To view the visualization results (dashboard), you can also access the following link: Looker Studio

Authors

  • Yoga Martafian

  • Karno

  • Muhammad Rifa
