This project is about building an end to end ETL data pipeline with Postgres database as the staging area, data is converted into Fact and Dimension Tables and loaded into Mage an open source modern data engineering data pipeline tool for transforming and integrating data, Bigquery as the data warehouse and Lockerstudio for visualization and analysis.
Firstly docker volume and network was created to mount docker files of Postgres, Pgadmin and Mage.ai. The docker files were run and pandas was used to load data into the database [Note: Try using Spark SQL or Spark dataframe to load a very large set of data] Chunk by chunk. The Postgres database servers as the staging area for the dataset. Dataset was converted into Fact and dimension tables , Merged and loaded into Mage.ai tool for transforming and integrating data.The data was further transformed with the Mage transformation tool into dictionary as key and values , then loaded into Bigquery. Data was loaded into BigQuery for Joins and aggregation to be used for analysis and Visualization with LockerStudio.
Dataset was converted into Fact and dimension tables , Merged and loaded into Mage.ai tool for transforming and integrating data.The data was further transformed with the Mage transformation tool into dictionary as key and values , then loaded into Bigquery.
Data was loaded into BigQuery for Joins and aggregation to be used for analysis and Visualization