Start Data Engineering's Projects
Code for "Advanced data transformations in SQL" free live workshop
Code for my "Efficient Data Processing in SQL" book.
Beginner data engineering project - batch edition
Simple stream processing pipeline
Near real-time ETL to populate a dashboard.
Repo for the CDC with Debezium blog post
Cost Efficient Data Pipelines with DuckDB
Open data for blog content at https://www.startdataengineering.com/
Repository for Data Engineering Interview Series
Data pipeline code generator for portfolio projects
Code for the data quality with Great Expectations blog post
Sample project to demonstrate data engineering best practices
Code to demonstrate data engineering metadata & logging best practices
A template repository to create a data project with IaC, CI/CD, data migrations, & testing
Code to help generate SQL for stakeholders. Code at https://www.startdataengineering.com/post/data-democratize-llm/
Repository showing how to automate data testing as part of CI
Repo to explain the development and CI/CD cycle in dbt
Multi-node Presto cluster on Docker containers
Code for blog at: https://www.startdataengineering.com/post/docker-for-de/
Example repo to create end-to-end tests for a data pipeline.
Code for "Efficient Data Processing in Spark" Course
Simple ETL demonstrated with literate programming
Public file hosting
Making data pipelines idempotent
Profile README
Local development environment for Python data projects, with Docker