Airflow Lab

Introduction

This lab is designed to help you get familiar with Apache Airflow. You will learn how to create a simple DAG, schedule it, and monitor its execution.

Note: You can also use the Astro CLI to create a new Airflow project. For more information, see the Astro CLI documentation.

Prerequisites

  • Basic knowledge of Python
    • Variables
    • Functions
    • Control Flow
    • *args and **kwargs
  • Basic knowledge of Docker
    • Knowing docker compose up and docker compose down is enough
  • Poetry
    • Run poetry install --no-root to install the dependencies
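PythonOperator forwards its op_args list and op_kwargs dict to the Python callable as positional and keyword arguments, so argument unpacking is worth reviewing before the lab. A plain-Python sketch (the function greet is hypothetical and not part of the lab):

```python
def greet(*args, **kwargs):
    """Collect positional arguments into a tuple and keyword arguments into a dict."""
    names = ", ".join(args)
    greeting = kwargs.get("greeting", "Hello")  # fall back to a default keyword value
    return f"{greeting}, {names}!"


# Unpacking a list and a dict at the call site, similar in spirit to how
# PythonOperator passes op_args and op_kwargs to its python_callable.
op_args = ["Alice", "Bob"]
op_kwargs = {"greeting": "Hi"}
print(greet(*op_args, **op_kwargs))  # Hi, Alice, Bob!
```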

Lab Instructions

  1. Configuration

    • Lightweight Airflow setup with Docker (see docker-compose.lite.yaml)
    • Enable the Test button in the Airflow UI
    • Disable the example DAGs
    • Copy the Airflow configuration
    • Enable the Flower UI
  2. What's Airflow?

    • Workflow Orchestration
    • Data Pipeline
  3. Overview of Airflow UI and concepts

    • Airflow UI
      • Pause/Unpause
      • Trigger DAG
      • Refresh
      • Recent Tasks
      • DAG Runs
      • Graph View
    • DAGs
    • Operators
    • Tasks
  4. Writing your first DAG (Single Operator)

    • Create a new DAG with PythonOperator
    • Define the DAG
      • Schedule
      • Task
    • Test the DAG
  5. Writing your second DAG (Multiple Operators)

    • Create a new DAG with PythonOperator
    • Define dependencies between tasks
    • Test the DAG
  6. Schedule your DAG

    • Fixed Interval
    • Cron Expression
    • Airflow preset schedules (e.g. @daily, @hourly)
  7. Google Drive to GCS

    • Create a new DAG
    • Create a new connection for Google Drive via a service account
    • Use GoogleDriveToGCSOperator to copy files from Google Drive to GCS
    • Test the DAG
  8. Working with Sensors

    • GoogleDriveFileExistenceSensor to wait for a file to be uploaded to Google Drive
  9. Scraping Data from GitHub to Postgres

    • SimpleHttpOperator to get data from the GitHub API
    • PostgresOperator to insert data into Postgres
  10. Trigger Other DAGs

    • Learn how to trigger another DAG
    • Getting to know TriggerDagRunOperator
  11. Task Decorators - Taskflow API

    • A simplified way to define tasks
    • Getting to know the @task decorator
    • Using @task to define tasks, as you would with PythonOperator
  12. Testing - In Progress

    • Unit Testing
    • DAG Integrity Testing
    • dag.test() method
  13. Dataset - Data-aware scheduling - In Progress

    • Trigger a DAG based on data availability
    • Wait for multiple datasets to be available
  14. Celery Executor (Local) - In Progress

    • Monitor task execution with the Flower UI (to enable the Flower UI, see chapter-0)
    • Add more workers to the Celery Executor
      • Duplicate the airflow-worker service in docker-compose.yml and rename it
      • Restart Docker
  15. Dependencies between Tasks - In Progress

    • Basic ways to define dependencies between tasks
    • Fan-in and Fan-out
    • Trigger Rules
    • Conditional Trigger
  16. Managing Complex Tasks with TaskGroup - In Progress

    • Group tasks together
    • Define dependencies between TaskGroups
