This lab is designed to help you get familiar with Apache Airflow. You will learn how to create a simple DAG, schedule it, and monitor its execution.
Note: You can use Astro CLI to create a new Airflow project. For more information, see Astro CLI.
- Basic knowledge of Python
  - Variables
  - Functions
  - Control Flow
  - `*args` and `**kwargs`
- Basic knowledge of Docker
  - `docker compose up` and `docker compose down` are good enough
- Poetry
  - `poetry install --no-root` to install dependencies
- Lightweight Airflow setup with Docker, see `docker-compose.lite.yaml`
  - Enable Test button in Airflow UI
  - Disable Example DAGs
  - Copy Airflow Configuration
  - Enable Flower UI
- Workflow Orchestration
- Data Pipeline
- Overview of Airflow UI and concepts
  - Airflow UI
    - Pause/Unpause
    - Trigger DAG
    - Refresh
    - Recent Tasks
    - DAG Runs
    - Graph View
  - DAGs
  - Operators
  - Tasks
- Writing your first DAG (Single Operator)
  - Create a new DAG with `PythonOperator`
    - Defining the DAG
    - Schedule
    - Task
  - Test the DAG
- Writing your second DAG (Multiple Operators)
  - Create a new DAG with `PythonOperator`
  - Define dependencies between tasks
  - Test the DAG
- Fixed Interval
- Cron Expression
- Preset Airflow Schedules
- Create a new DAG
- Create a new connection for Google Drive via Service Account
- Use `GoogleDriveToGCSOperator` to copy files from Google Drive to GCS
- Test the DAG
- `GoogleDriveFileSensor` to wait for a file to be uploaded to Google Drive
- Scraping Data from GitHub to Postgres
  - `SimpleHttpOperator` to get data from the GitHub API
  - `PostgresOperator` to insert data into Postgres
- Learn how to trigger another DAG
- Getting to know `TriggerDagRunOperator`
- Task Decorators - TaskFlow API
  - Simplified way to define tasks
  - Getting to know the `@task` decorator
  - Using `@task` to define tasks like `PythonOperator`
- Testing - In Progress
  - Unit Testing
  - DAG Integrity Testing
  - `dag.test()` method
- Dataset - Data-aware scheduling - In Progress
  - Trigger DAG based on data availability
  - Wait for many datasets to be available
- Celery Executor (Local) - In Progress
  - Monitor the task execution with Flower UI (To enable Flower UI, see chapter-0)
  - Add more workers to the Celery Executor
    - Duplicate the `airflow-worker` service in `docker-compose.yml` and rename it
    - Restart Docker
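The duplicated worker might look like this fragment, assuming the service layout of the official Airflow Docker Compose file (with its shared `x-airflow-common` anchor); adjust the names to match your own file:

```yaml
# docker-compose.yml (fragment) -- a second worker cloned from airflow-worker.
  airflow-worker-2:
    <<: *airflow-common        # reuse the shared Airflow image/env anchor
    command: celery worker
    restart: always
```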
- Dependencies between Tasks - In Progress
  - Basics of defining dependencies between tasks
  - Fan-in and Fan-out
  - Trigger Rules
  - Conditional Trigger
- Managing Complex Tasks with TaskGroup - In Progress
  - Group tasks together
  - Define dependencies between TaskGroups