vmellos / dataqualitygreatexpectationsspark

This project forked from cicerojmm/dataqualitygreatexpectationsspark

Data quality tests with Great Expectations and Spark

Shell 1.23% Python 96.65% Dockerfile 2.12%

Data Quality with Great Expectations and Spark

Great Expectations is an open source data validation tool that helps ensure data quality.
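
To make the idea of an expectation concrete, here is a minimal, library-free sketch. It is not the Great Expectations API: the function borrows its name from the built-in expectation `expect_column_values_to_not_be_null`, but the row format, the sample data, and the shape of the returned dictionary are assumptions chosen for illustration.

```python
from typing import Any, Dict, List


def expect_column_values_to_not_be_null(
    rows: List[Dict[str, Any]], column: str
) -> Dict[str, Any]:
    """Check that `column` is present and not None in every row.

    Illustrative stand-in only; the real library runs this kind of check
    against Spark, pandas, or SQL data.
    """
    # Collect the positions of rows where the value is missing or None.
    unexpected = [i for i, row in enumerate(rows) if row.get(column) is None]
    return {
        "success": not unexpected,
        "element_count": len(rows),
        "unexpected_count": len(unexpected),
        "unexpected_index_list": unexpected,
    }


# Hypothetical sample data: the second row is missing its price.
sample_rows = [
    {"order_id": 1, "price": 10.5},
    {"order_id": 2, "price": None},
    {"order_id": 3, "price": 7.0},
]

result = expect_column_values_to_not_be_null(sample_rows, "price")
print(result["success"], result["unexpected_count"])
```

A failed expectation does not raise an error in this sketch; it returns a result that the caller can log, report, or use to stop the pipeline.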

Architecture examples

Case 1: Great Expectations with Spark on EMR, orchestrated by Airflow

[Architecture diagram]
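
In this kind of setup, the script is typically submitted to the EMR cluster as a step. The sketch below builds such a step as a plain dictionary in the shape the EMR API accepts (`Name`, `ActionOnFailure`, `HadoopJarStep`). The `build_spark_step` helper, the step name, and the S3 bucket and path are hypothetical, and the project's DAG may define its step differently.

```python
from typing import Any, Dict, List, Optional


def build_spark_step(
    name: str, script_uri: str, script_args: Optional[List[str]] = None
) -> Dict[str, Any]:
    """Build an EMR step that runs a PySpark script through spark-submit."""
    return {
        "Name": name,
        # Valid values include CONTINUE, CANCEL_AND_WAIT, and
        # TERMINATE_CLUSTER; CONTINUE is an arbitrary choice here.
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            # command-runner.jar lets EMR run commands such as spark-submit.
            "Jar": "command-runner.jar",
            "Args": [
                "spark-submit",
                "--deploy-mode",
                "cluster",
                script_uri,
                *(script_args or []),
            ],
        },
    }


# Hypothetical S3 location for the project's entry point.
step = build_spark_step(
    name="data_quality_with_great_expectations",
    script_uri="s3://my-example-bucket/script_pyspark_emr/main.py",
)
print(step["HadoopJarStep"]["Args"])
```

In an Airflow DAG, a dictionary like this would usually be passed in a list to an operator such as `EmrAddStepsOperator` from the Amazon provider package, between the tasks that create and terminate the cluster.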

Main project files


├───airflow
│   ├───airflow_infra
│   │   ├───Dockerfile: contains some settings for the Airflow Docker image
│   │   ├───docker-compose.yml: contains the configuration of all the Airflow services
│   │   └───requirements.txt: contains the Python dependencies needed to run the Airflow DAGs
│   └───dags
│       ├───dag_apply_data_quality_with_ge.py: Airflow DAG that creates the EMR cluster, runs the Great Expectations script, and terminates the cluster
│       ├───bootstrap-great-expectation.sh: EMR bootstrap script that installs the project dependencies
│       └───emr_config.json: EMR configuration used to launch a cluster
└───script_pyspark_emr
    ├───modules
    │   ├───run.py: defines which function will be executed
    │   ├───utils
    │   │   ├───spark_utils.py: contains the logic for creating a Spark instance
    │   │   └───logger_utils.py: contains the logic for managing the application logs
    │   └───jobs
    │       └───job_processed_data.py: main script with the Great Expectations test cases
    └───main.py: starts the execution of the Spark script
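
The split between `main.py` and `run.py` in the tree above suggests a small dispatch layer: the entry point parses a job name, and `run.py` maps it to a function. The following is a sketch of that pattern, not the project's actual code; the `JOBS` registry, the `--job-name` flag, and the placeholder job body are all assumptions.

```python
import argparse
import logging
from typing import Callable, Dict, List, Optional

logger = logging.getLogger("data_quality")


def job_processed_data() -> str:
    """Placeholder for the job that would run the Great Expectations tests."""
    logger.info("Running data quality checks on processed data")
    return "processed_data: done"


# Hypothetical registry mapping job names to callables.
JOBS: Dict[str, Callable[[], str]] = {
    "job_processed_data": job_processed_data,
}


def run(job_name: str) -> str:
    """Look up the requested job and execute it."""
    try:
        job = JOBS[job_name]
    except KeyError:
        raise ValueError(f"Unknown job: {job_name!r}") from None
    return job()


def main(argv: Optional[List[str]] = None) -> str:
    """Parse the command line and hand over to run()."""
    parser = argparse.ArgumentParser(description="Run a Spark data quality job")
    parser.add_argument("--job-name", required=True, choices=sorted(JOBS))
    args = parser.parse_args(argv)
    logging.basicConfig(level=logging.INFO)
    return run(args.job_name)


# Explicit arguments are passed here so the example runs without a real CLI call.
print(main(["--job-name", "job_processed_data"]))
```

Keeping the registry in one place means a new job only needs a function and one dictionary entry, and unknown job names fail early with a clear message.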

Contributors

cicerojmm
