This project forked from ubisoft/mobydq

Tool to automate data quality checks on data pipelines

License: Apache License 2.0

MobyDQ

MobyDQ is a tool for data engineering teams to automate data quality checks on their data pipeline, capture data quality issues and trigger alerts in case of anomaly, regardless of the data sources they use.

This tool was inspired by an internal project developed at Ubisoft Entertainment to measure and improve the data quality of its Enterprise Data Platform. This open source version has been reworked to improve and simplify its design and to remove technical dependencies on commercial software.


Getting Started

Skip the bla bla and run your data quality indicators by following the Getting Started page. The complete documentation is also available on GitHub Pages: https://mobydq.github.io.


Requirements

Install Docker

This tool has been fully containerized with Docker to ensure easy deployment and portability. To add the Docker repository to your Linux machine (the commands below target Ubuntu), execute the following commands in a terminal window.

$ sudo apt-get update
$ sudo apt-get install apt-transport-https ca-certificates curl software-properties-common
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
$ sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"

Install Docker Community Edition.

$ sudo apt-get update
$ sudo apt-get install docker-ce

Add your user to the docker group to set up its permissions. Make sure to restart your machine after executing this command.

$ sudo usermod -a -G docker <username>
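
Group membership is read when a login session starts, which is why the change is not visible until you restart. The following is a minimal sketch, assuming a Linux machine and using only the Python standard library, to check whether the current session already runs with the docker group. The function name is illustrative and not part of MobyDQ.

```python
import grp
import os


def session_in_group(group_name="docker"):
    """Return True if the current process already runs with the given group."""
    try:
        gid = grp.getgrnam(group_name).gr_gid
    except KeyError:
        # The group does not exist, for example because Docker is not installed yet.
        return False
    # Check both the primary group and the supplementary groups of this process.
    return gid == os.getgid() or gid in os.getgroups()


if session_in_group():
    print("The docker group is active in this session.")
else:
    print("The docker group is not active yet; restart and check again.")
```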

Install Docker Compose

Execute the following command in a terminal window.

$ sudo apt install docker-compose

Setup Your Instance

Create Configuration Files

Based on the template below, create a text file named .env at the root of the project. This file is used by Docker Compose to load configuration parameters into environment variables. This is typically used to manage file paths, logins, passwords, etc. Make sure to update the postgres user password for both POSTGRES_PASSWORD and DATABASE_URL parameters.

# DB
POSTGRES_USER=postgres
POSTGRES_PASSWORD=password

# GRAPHQL
DATABASE_URL=postgres://postgres:password@db:5432/mobydq

# SCRIPTS
GRAPHQL_URL=http://graphql:5433/graphql
MAIL_HOST=smtp.server.org
MAIL_PORT=25
MAIL_SENDER=<sender email address>

# APP PARAMS
NODE_ENV=development
REACT_APP_GRAPHQL_API_URL=http://0.0.0.0:5433/graphql
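
Because the same password appears in two parameters, it is easy to update one and forget the other. The following is a minimal sketch of a consistency check using only the Python standard library. The function names are illustrative assumptions and not part of MobyDQ, and the parser only handles plain KEY=VALUE lines like those in the template.

```python
from urllib.parse import unquote, urlsplit


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def check_env(env):
    """Return a list of problems found between the DB and GRAPHQL settings."""
    problems = [
        f"{key} is missing"
        for key in ("POSTGRES_USER", "POSTGRES_PASSWORD", "DATABASE_URL")
        if key not in env
    ]
    if problems:
        return problems
    url = urlsplit(env["DATABASE_URL"])
    if url.username != env["POSTGRES_USER"]:
        problems.append("DATABASE_URL user differs from POSTGRES_USER")
    # Assumption: special characters in the URL password are percent-encoded.
    if unquote(url.password or "") != env["POSTGRES_PASSWORD"]:
        problems.append("DATABASE_URL password differs from POSTGRES_PASSWORD")
    return problems
```

To check a real file, pass its content to the parser, for example `check_env(parse_env(open(".env").read()))`. An empty list means the two parameters agree.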

Create Docker Network

This custom network connects the different containers to one another. In particular, it connects the ephemeral containers that are run when executing batches of indicators.

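Create it with the following command, replacing <network name> with the name used in docker-compose.yml.

$ docker network create <network name>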


Create Docker Volume

Due to Docker compatibility issues on Windows machines, we recommend manually creating a Docker volume instead of directly mounting external folders in docker-compose.yml. This volume is used to persist the data stored in the PostgreSQL database. Execute the following command, replacing <volume name> with the name used in docker-compose.yml.

$ docker volume create <volume name>
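
The network and the volume only need to be created once, so a setup script may want to skip the step when they already exist. The following is a minimal sketch, assuming the docker CLI is on the PATH. It relies on `docker network inspect` and `docker volume inspect` returning a non-zero exit code for an unknown name. The function name and the `run` parameter are illustrative assumptions, and the resource names you pass must be the ones your docker-compose.yml expects.

```python
import subprocess


def ensure_docker_resource(kind, name, run=subprocess.run):
    """Create a Docker network or volume only if it does not exist yet.

    Returns True if the resource was created, False if it already existed.
    The `run` parameter exists so the function can be exercised without Docker.
    """
    if kind not in ("network", "volume"):
        raise ValueError(f"unsupported resource kind: {kind}")
    # `docker <kind> inspect <name>` exits with a non-zero code if the name is unknown.
    inspected = run(["docker", kind, "inspect", name], capture_output=True)
    if inspected.returncode == 0:
        return False
    run(["docker", kind, "create", name], check=True)
    return True
```

For example, `ensure_docker_resource("network", "<network name>")` followed by `ensure_docker_resource("volume", "<volume name>")` covers both steps and can be run again without failing on existing resources.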


Build Docker Images

Go to the project root and execute the following command in your terminal window.

$ cd mobydq
$ docker-compose build --no-cache

Run Docker Containers

To start all the Docker containers as daemons, go to the project root and execute the following command in your terminal window.

$ cd mobydq
$ docker-compose up -d db graphql api app

Individual components can be accessed at the following addresses:

Note that access to GraphiQL and the PostgreSQL database is restricted by default to avoid intrusions. To access them directly, run their containers with the following command, which opens their ports:

$ cd mobydq
$ docker-compose -f docker-compose.yml -f docker-compose.dev.yml up -d db graphql
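
Once the ports are open, a quick connectivity check can confirm that both services accept connections. The following is a minimal sketch, assuming the ports from the .env template (5432 for PostgreSQL, 5433 for GraphQL) are published on localhost. The host name and the helper names are illustrative assumptions.

```python
import socket

# Ports taken from the .env template; publishing them on localhost is an assumption.
ENDPOINTS = {"PostgreSQL": 5432, "GraphQL": 5433}


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names.
        return False


def check_endpoints(host="localhost"):
    """Map each service name to whether its port accepts connections."""
    return {name: is_port_open(host, port) for name, port in ENDPOINTS.items()}
```

Calling `check_endpoints()` returns a dictionary keyed by service name. A False value suggests the container is not running or its port is still restricted.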

Run Test Cases

To execute all test cases, run the following command from the project repository:

 to be documented

Dependencies

Docker Images

The containers run by docker-compose depend on the following Docker images:

Python Packages

docker (3.5.0)

jinja2 (2.10.0)

Contributors

alexisrolland, sijonelis, lerignoux, pascalhonegger, thomasgassmann, matalie
