
limagbz / data-mesh-yelp


A complete data product project used to study data engineering concepts such as modeling, collection, and operations.

License: GNU General Public License v3.0

Dockerfile 1.49% Shell 4.03% Mustache 18.61% Smarty 63.49% Makefile 0.05% Go 1.12% Nix 0.26% Jupyter Notebook 9.11% Python 1.83%
data-engineering kubernetes pipelines data-mesh grafana grafana-loki microk8s prometheus devcontainer minio yelp-dataset change-data-capture debezium kafka kafka-connect


Data Mesh Project


This project aims to design and implement a data mesh architecture using the close-to-real business data provided in the Yelp Dataset. This fits well with the Data Mesh concept of modeling analytics around the business. For details about the data, see the Yelp Dataset Documentation.

Note that this is not a production-ready project. It is a lab for deepening my knowledge of Data Engineering, DevOps and, above all, data meshes, so errors and changes will occur as my knowledge evolves. Feel free to contribute to this project by contacting me with suggestions, tips and ways to improve the code (see Contributing for more details).

  1. Logical Architecture
  2. Platform Architecture
  3. Setup your Local Environment
  4. Contributing
  5. References

Logical Architecture

Note

Please refer to Logical Architecture for details about the diagram. For information about each product (including its canvas and interaction map), refer to that product's documentation in the products folder.

Platform Architecture

Note

This architecture (and the diagram) is heavily based on the tech stacks found here. More precisely, it is a mix of Datamesh Architecture: MinIO and Trino and Datamesh Architecture: dbt and Snowflake. Changes are expected as the project evolves. Please refer to the Infra READMEs for more information about the architecture.

Setup your local environment

Deploying the resources requires a Kubernetes cluster. Setting up a local Kubernetes cluster is outside the scope of this project. The code was tested on a MicroK8s-managed cluster. If you choose MicroK8s, enable the following addons, which were the ones enabled during testing:

microk8s enable dns
microk8s enable helm
microk8s enable helm3
microk8s enable hostpath-storage
microk8s enable rbac
microk8s enable registry
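
The same addons can be enabled in one pass. The following is only a minimal sketch, not part of the project: the addon names are copied from the list above, while the `enable_addons` function and the `DRY_RUN` switch are hypothetical names introduced for illustration. By default it only prints the commands; set `DRY_RUN=0` to actually run them on a machine where `microk8s` is installed.

```shell
#!/usr/bin/env bash
# Addon names are copied from the list above; everything else is illustrative.
ADDONS="dns helm helm3 hostpath-storage rbac registry"

# Print each `microk8s enable` command (DRY_RUN=1, the default in this sketch)
# or run it for real (DRY_RUN=0). Stops at the first addon that fails.
enable_addons() {
  local addon
  for addon in $ADDONS; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "microk8s enable $addon"
    else
      microk8s enable "$addon" || return 1
    fi
  done
}

enable_addons
```

Keeping the dry run as the default makes it easy to review the commands before they change the cluster.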

Note

There are many solutions out there to deploy a local cluster (e.g. Minikube, Kind). You can see some examples on Kubernetes: Install Tools.

You also need to download the Yelp Dataset (photos are not required) and extract it into the data folder. To download it, follow the instructions in Yelp Dataset: Download The Data.
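
Before deploying, it can help to confirm that the extracted files are where the project expects them. The sketch below is an assumption-laden example rather than a project script: the file names are the ones commonly found in the Yelp Dataset archive and may differ in your download, and `missing_dataset_files` and `DATA_DIR` are hypothetical names introduced here.

```shell
#!/usr/bin/env bash
# Assumed file names; adjust them to match the archive you downloaded.
DATASET_FILES="yelp_academic_dataset_business.json
yelp_academic_dataset_checkin.json
yelp_academic_dataset_review.json
yelp_academic_dataset_tip.json
yelp_academic_dataset_user.json"

# Print every expected file that is absent from the given directory.
# Returns 0 when nothing is missing and 1 otherwise.
missing_dataset_files() {
  local dir="${1:-data}"
  local rc=0
  local name
  for name in $DATASET_FILES; do
    if [ ! -f "$dir/$name" ]; then
      echo "missing: $dir/$name"
      rc=1
    fi
  done
  return $rc
}

missing_dataset_files "${DATA_DIR:-data}" \
  || echo "Extract the Yelp Dataset into ${DATA_DIR:-data} first."
```

Run it from the repository root, or point `DATA_DIR` at another location if you extracted the dataset elsewhere.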

Setup your development environment

This project includes a full-featured development container for VSCode users, with all the tools, extensions and configuration required to develop the code. If you don't know how dev containers work, please read Visual Studio Code: Developing Inside a Container.

If you do not use VSCode, the Dockerfile contains all the tools used by the project. You can use it as a base for setting up your environment.

Contributing

Since this is a lab project, I am currently the only person developing the code. However, feel free to propose new features or improvements, ask questions, and suggest tips on the discussion tab. For bug reports, use the issues tab (with the bug template).

Note

Please read the CONTRIBUTING Guide for more details about the style guides, best practices and conventions followed by the project.

References

Below are some of the main references used by this project. Feel free to read them for a deeper understanding of the project.
