
data_project's Introduction

ANNUAL INCOME FOR NEW ZEALAND

A data engineering Project

In this project, I download a CSV file from the New Zealand government website and build a dashboard that summarizes the country's income from 2013 to 2020. The project uses:

  • dbt : lets us introduce good software engineering practices by defining a deployment workflow: develop models, test and document models, and deploy models with version control and CI/CD. For more information, see: https://github.com/ziritrion/dataeng-zoomcamp/blob/main/notes/4_analytics.md

  • BigQuery : a Data Warehouse solution offered by Google Cloud Platform.
    - BQ is serverless. There are no servers to manage or database software to install; Google manages this and it is transparent to customers.
    - BQ is scalable and has high availability. Google takes care of the underlying software and infrastructure.
    - BQ has built-in features such as Machine Learning, Geospatial Analysis and Business Intelligence, among others.
    - BQ maximizes flexibility by separating the compute engine that analyzes data from the storage, which lets customers budget accordingly and reduce costs.
    - Some alternatives to BigQuery from other cloud providers are AWS Redshift and Azure Synapse Analytics.

  • Data Studio : a free data visualization tool

We're going to design a pipeline on GCP (Google Cloud Platform) using:

  • Terraform to create resources in GCP
  • Data studio for visualization
  • Airflow for Pipeline Orchestration
  • BigQuery as a Warehouse
  • dbt for transforming data

DATA

Terraform is used to create resources in GCP, with two files:

  • main.tf : the version of Terraform and the GCP credentials
  • variables.tf : the settings for all the resources (bucket, BigQuery dataset, storage type ...)

image

In this repository you will find the data ingestion script in the folder airflow/dags, plus a docker-compose.yaml and a Dockerfile used to install Airflow. I've added some comments to the code. The results in Airflow:

Capture

4 DAGs:

  • Download the dataset (choose your dataset)
  • Format to Parquet: convert the file from CSV to Parquet
  • Local to GCS: upload the data to the DATA LAKE
  • BigQuery: create the table in the DATA WAREHOUSE

Result on GCP:

image

image
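The Parquet conversion and the upload to the data lake described above can be sketched in Python as follows. This is only an illustration, not the code from airflow/dags: the function names, the `raw/` prefix, and the bucket layout are assumptions, and the sketch presumes the `pyarrow` and `google-cloud-storage` packages are installed and GCP credentials are configured.

```python
from pathlib import Path


def parquet_path(csv_path: str) -> str:
    """Return the Parquet file name that corresponds to a CSV file name."""
    return str(Path(csv_path).with_suffix(".parquet"))


def gcs_object_name(local_path: str, prefix: str = "raw") -> str:
    """Build the object name inside the bucket (hypothetical layout)."""
    return f"{prefix}/{Path(local_path).name}"


def format_to_parquet(csv_path: str) -> str:
    """Convert a local CSV file to Parquet and return the new file's path."""
    # Imported lazily so the pure-Python helpers above work without pyarrow.
    import pyarrow.csv as pv
    import pyarrow.parquet as pq

    table = pv.read_csv(csv_path)
    dest = parquet_path(csv_path)
    pq.write_table(table, dest)
    return dest


def local_to_gcs(bucket_name: str, local_path: str) -> str:
    """Upload a local file to a GCS bucket and return the object name."""
    # Imported lazily; requires google-cloud-storage and valid credentials.
    from google.cloud import storage

    client = storage.Client()
    blob = client.bucket(bucket_name).blob(gcs_object_name(local_path))
    blob.upload_from_filename(local_path)
    return blob.name
```

In an Airflow DAG, each of these functions would typically be wrapped in its own task so that a failed upload can be retried without repeating the conversion.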

After all that, we use dbt (cloud version) to turn the raw data into transformed data. I created a table named total income.

Look at the code:

  • Schema.yml : defines our sources
  • mytransformation.sql : my first transformation
  • Total_income.sql : my second one, with a WHERE statement in order to keep only the income
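The idea behind the second model, filtering the raw table with a WHERE clause so that only income rows remain, can be tried locally with a minimal sketch. To stay in Python it uses an in-memory SQLite database instead of BigQuery and dbt. The table name, column names, and sample rows are made up for illustration and do not come from the actual New Zealand dataset.

```python
import sqlite3

# Made-up sample rows: (year, variable, value). The real CSV columns may differ.
RAW_ROWS = [
    (2013, "Total income", 100.0),
    (2013, "Total expenditure", 80.0),
    (2014, "Total income", 120.0),
    (2014, "Total expenditure", 90.0),
]


def build_total_income(rows):
    """Load raw rows, define a filtered view, and return its contents."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE raw_data (year INTEGER, variable TEXT, value REAL)"
    )
    conn.executemany("INSERT INTO raw_data VALUES (?, ?, ?)", rows)
    # Plays the role of a model with a WHERE clause: keep only income rows.
    conn.execute(
        "CREATE VIEW total_income AS "
        "SELECT year, value FROM raw_data WHERE variable = 'Total income'"
    )
    result = conn.execute(
        "SELECT year, value FROM total_income ORDER BY year"
    ).fetchall()
    conn.close()
    return result


income_by_year = build_total_income(RAW_ROWS)
print(income_by_year)
```

In dbt the SELECT statement would live in its own .sql file and read from the source declared in the schema file, rather than being embedded in Python.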

image

dbt

Our new table is ready to be visualized in Data Studio:

image

After creating the table with dbt, I created three tiles in Data Studio:

image
