
data-engineering-portfolio


Hello World! I'm Vedanth.

This is a complete portfolio of the projects I have designed, with a major focus on implementing data engineering tools and cloud services across Azure, AWS and GCP.

Feel free to connect with me 🤝

LinkedIn | GitHub

Brief Overview

In this project, I have used the Random User Generator API to fetch data at regular intervals through Airflow DAGs and store it in a PostgreSQL database. Streaming is handled by an Apache Kafka setup, with ZooKeeper coordinating the Kafka brokers so that messages can be published to and consumed from the queue. Apache Spark runs in a master-worker setup to process the stream. Finally, a Cassandra database listens for the processed stream from Spark and stores it in wide-column tables. The entire project is containerized with Docker.
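The fetch-and-format step an Airflow task performs can be sketched as below. This is a minimal sketch, assuming the public endpoint `https://randomuser.me/api/` and its `results` array; the names `fetch_random_user`, `format_user`, `to_message` and `SAMPLE`, and the choice of fields kept, are hypothetical and not taken from the project. The demo runs offline on an illustrative record; the JSON bytes it produces are the kind of payload that would be stored and streamed downstream.

```python
import json
import urllib.request

# Public Random User Generator endpoint (assumed; documented at randomuser.me).
RANDOM_USER_URL = "https://randomuser.me/api/"

# Illustrative record shaped like one entry of the API's "results" array.
SAMPLE = {
    "gender": "female",
    "name": {"title": "Ms", "first": "Jane", "last": "Doe"},
    "location": {
        "street": {"number": 42, "name": "Example Street"},
        "city": "Springfield",
        "state": "Illinois",
        "country": "United States",
        "postcode": 12345,
    },
    "email": "jane.doe@example.com",
    "login": {"username": "janedoe42"},
    "dob": {"date": "1990-01-01T00:00:00.000Z"},
    "registered": {"date": "2015-06-01T00:00:00.000Z"},
    "phone": "555-0100",
    "picture": {"medium": "https://example.com/jane.jpg"},
}


def fetch_random_user(url: str = RANDOM_USER_URL, timeout: float = 10.0) -> dict:
    """Fetch one user from the API and return the first entry of `results`."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        payload = json.load(response)
    return payload["results"][0]


def format_user(raw: dict) -> dict:
    """Flatten the nested API record into the flat columns a table needs."""
    location = raw["location"]
    street = location["street"]
    return {
        "first_name": raw["name"]["first"],
        "last_name": raw["name"]["last"],
        "gender": raw["gender"],
        "address": f"{street['number']} {street['name']}, "
                   f"{location['city']}, {location['state']}, {location['country']}",
        # The API may return postcodes as numbers or strings; normalize to str.
        "post_code": str(location["postcode"]),
        "email": raw["email"],
        "username": raw["login"]["username"],
        "dob": raw["dob"]["date"],
        "registered_date": raw["registered"]["date"],
        "phone": raw["phone"],
        "picture": raw["picture"]["medium"],
    }


def to_message(record: dict) -> bytes:
    """Serialize a formatted record as UTF-8 JSON, e.g. for a Kafka message value."""
    return json.dumps(record).encode("utf-8")


if __name__ == "__main__":
    # Offline demo on the sample; call fetch_random_user() for a live record.
    print(to_message(format_user(SAMPLE)))
```

Keeping the network call separate from the pure `format_user` transform makes the transform easy to test without hitting the API.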

Solution Architecture

[Architecture diagram]

Tech Stack

  • Apache Airflow: Responsible for orchestrating the pipeline and storing fetched data in a PostgreSQL database.
  • Apache Kafka and Zookeeper: Used for streaming data from PostgreSQL to the processing engine.
  • Control Center and Schema Registry: Help with monitoring and schema management of the Kafka streams.
  • Apache Spark: For data processing with its master and worker nodes.
  • Cassandra: Where the processed data will be stored.

Brief Overview

This is a complete end-to-end Formula 1 race analytics project that covers extracting data from the Ergast API, applying the appropriate schema, and organizing the data into three layers (raw, processed and presentation) using slowly changing dimensions. The data is analysed in Azure Databricks with SQL filters and transformations that make it easier to interpret. Loads are incremental, and the ingestion job runs dynamically every Saturday at 10 PM after the race, with pipeline reruns and email alerts on failure.
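Incremental loads that can be safely rerun are typically implemented as an upsert (in Delta Lake, a `MERGE INTO`) keyed on the columns that identify a result. The plain-Python sketch below only mimics that matched-update / not-matched-insert behaviour so it runs without Spark; the key columns `race_id` and `driver_id` and the function `merge_incremental` are hypothetical stand-ins, not the project's actual schema or code.

```python
from typing import Dict, Iterable, Tuple


def merge_incremental(
    processed: Dict[tuple, dict],
    new_rows: Iterable[dict],
    key_fields: Tuple[str, ...] = ("race_id", "driver_id"),
) -> Tuple[int, int]:
    """Upsert new_rows into processed, mimicking
    MERGE ... WHEN MATCHED UPDATE / WHEN NOT MATCHED INSERT.

    Returns (inserted, updated) counts. Because matching is done on the key,
    re-running the same batch after a failure updates rows instead of
    duplicating them.
    """
    inserted = updated = 0
    for row in new_rows:
        key = tuple(row[field] for field in key_fields)
        if key in processed:
            updated += 1
        else:
            inserted += 1
        # Store a copy so later changes to the input rows do not leak in.
        processed[key] = dict(row)
    return inserted, updated


if __name__ == "__main__":
    processed: Dict[tuple, dict] = {}
    # Illustrative rows, not real race results.
    race_1 = [
        {"race_id": 1, "driver_id": 10, "points": 25},
        {"race_id": 1, "driver_id": 11, "points": 18},
    ]
    print(merge_incremental(processed, race_1))  # (2, 0)
    print(merge_incremental(processed, race_1))  # (0, 2): rerun, no duplicates
    race_2 = [{"race_id": 2, "driver_id": 10, "points": 18}]
    print(merge_incremental(processed, race_2))  # (1, 0)
    print(len(processed))  # 3
```

The second call shows why keyed upserts suit a pipeline with automatic reruns: replaying a batch leaves the row count unchanged.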

Solution Architecture

[Architecture diagram]

Tech Stack

  • Spark SQL
  • Azure Databricks
  • Postman
  • PySpark
  • Azure Blob Storage
  • Databricks Unity Catalog
  • Azure Data Factory
  • Azure DevOps
  • Azure Synapse Studio
  • Delta Lake
  • Power BI

Medal Metrics: Tokyo Olympics Data Alchemy 🤾‍♀️🎖️

Brief Overview

The project uses the Tokyo Olympics dataset from Kaggle, which covers over 11,000 athletes across 47 disciplines and 743 teams that took part in the Tokyo 2020 Olympics (held in 2021). Separate data files for coaches, athletes, medals and teams were first ingested using the Kaggle API, analysed with a variety of Azure services, and finally presented as a dashboard in Synapse Studio and Power BI.
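A typical first analysis of the medals file is a medal table. The sketch below shows the widely used ordering (most golds, then silvers, then bronzes, with identical counts sharing a rank) in plain Python; the `MedalCount` fields, the `medal_table` function and the team names are illustrative assumptions, not the Kaggle dataset's actual columns or the project's code.

```python
from typing import List, NamedTuple, Tuple


class MedalCount(NamedTuple):
    team: str
    gold: int
    silver: int
    bronze: int


def medal_table(rows: List[MedalCount]) -> List[Tuple[int, MedalCount]]:
    """Rank teams by gold, then silver, then bronze.

    Teams with identical medal counts share a rank and the next rank is
    skipped (standard competition ranking, e.g. 1, 2, 3, 3, 5). Tied teams
    are listed alphabetically so the output is deterministic.
    """
    ordered = sorted(rows, key=lambda r: (-r.gold, -r.silver, -r.bronze, r.team))
    table: List[Tuple[int, MedalCount]] = []
    previous = None
    rank = 0
    for position, row in enumerate(ordered, start=1):
        counts = (row.gold, row.silver, row.bronze)
        if counts != previous:
            # New medal profile: rank jumps to the current position.
            rank = position
            previous = counts
        table.append((rank, row))
    return table


if __name__ == "__main__":
    # Illustrative counts, not the real Tokyo 2020 medal table.
    sample = [
        MedalCount("Team C", 1, 4, 2),
        MedalCount("Team A", 2, 1, 0),
        MedalCount("Team D", 1, 4, 2),
        MedalCount("Team B", 2, 0, 5),
    ]
    for rank, row in medal_table(sample):
        print(rank, row.team, row.gold + row.silver + row.bronze)
```

Sorting by a descending tuple key keeps the gold-first rule in one place; a total-medals ranking would only need a different key.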

Solution Architecture

[Architecture diagram]

Tech Stack

  • Azure Data Factory
  • Azure Data Lake Gen 2
  • Azure Blob Storage
  • Azure Databricks
  • Azure Synapse Analytics
  • Power BI

InvestIQ

Tech Stack

  • AWS EC2
  • Apache Airflow
  • RapidAPI
  • AWS Lambda Functions
  • AWS S3 Storage
  • AWS CloudWatch
  • AWS Redshift
  • Power BI

Introducing InvestIQ Metrics: where every market heartbeat finds meaning. InvestIQ Metrics is a dynamic portal into the rhythm of the stock market, offering data-driven insights and analysis. Powered by real-time market data, the project surfaces the trends, patterns and opportunities hidden within the market's fluctuations.

Contributors

vedanthv
