re_data

re_data is a framework for improving data quality in your company, built on top of dbt. With it you can:

  • create a data quality project for your organization to monitor and improve the quality of your data,
  • compute data quality metrics for all your tables and add your own code to compute additional ones,
  • look for anomalies in all your metrics and investigate problematic data

Key features

Data quality metrics

re_data creates a data quality metrics schema in your data warehouse, containing metrics for all your tables (or only those you would like to monitor). The metrics schema contains information about:

  • time since last records were added
  • number of records added
  • number of missing values in columns over time
  • min/max/avg of values in all your columns
  • string lengths in all your columns

Think about it as an INFORMATION_SCHEMA on steroids 💪 And this is just a start: in your project, you can compute many other data quality metrics specific to your organization.
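To make the metrics listed above concrete, here is a minimal Python sketch that computes them for a small in-memory table. It is only an illustration of what each metric means, not re_data's implementation (re_data computes its metrics inside the data warehouse through dbt). The function names, metric keys, and the `orders` sample data are all hypothetical.

```python
from datetime import datetime, timedelta
from statistics import mean


def column_metrics(rows, column):
    """Compute simple quality metrics for one column of a list of dict rows."""
    values = [row.get(column) for row in rows]
    present = [v for v in values if v is not None]
    metrics = {"row_count": len(rows), "nulls_count": len(values) - len(present)}

    # Numeric columns get min/max/avg of their values.
    numbers = [v for v in present
               if isinstance(v, (int, float)) and not isinstance(v, bool)]
    if numbers:
        metrics.update(min=min(numbers), max=max(numbers), avg=mean(numbers))

    # String columns get min/max/avg of their lengths.
    lengths = [len(v) for v in present if isinstance(v, str)]
    if lengths:
        metrics.update(min_length=min(lengths), max_length=max(lengths),
                       avg_length=mean(lengths))
    return metrics


def freshness(rows, time_column, now):
    """Time elapsed since the most recent record was added."""
    return now - max(row[time_column] for row in rows)


# Hypothetical sample table.
now = datetime(2021, 6, 1, 12, 0)
orders = [
    {"amount": 10.0, "status": "paid", "created_at": now - timedelta(hours=5)},
    {"amount": None, "status": "pending", "created_at": now - timedelta(hours=3)},
    {"amount": 30.0, "status": None, "created_at": now - timedelta(hours=1)},
]

amount_metrics = column_metrics(orders, "amount")
status_metrics = column_metrics(orders, "status")
time_since_last = freshness(orders, "created_at", now)
```

Storing results like these per table, per column, and per time window is what lets later steps compare today's values against history.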

Detecting anomalies

re_data looks at the metrics it gathers and alerts when they look suspicious compared to data seen in the past. This means situations like these will be detected:

  • sudden drops or increases in the volume of new records added to your tables
  • longer-than-expected breaks between data arrivals
  • an increase in NULL values in one of your columns
  • unusual maximum/minimum/average values in any table column

All data, including anomalies, is saved directly into your data warehouse, so you can easily integrate any existing alerting with it.
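As a rough illustration of comparing a new metric value against its history, the sketch below flags a value that lies far from the historical mean, measured in standard deviations (a z-score). This is an assumption made for the example: the text above does not say which statistical method re_data uses, and the function name, threshold, and sample counts are hypothetical.

```python
from statistics import mean, stdev


def is_anomalous(history, latest, threshold=3.0):
    """Return True if `latest` is more than `threshold` sample standard
    deviations away from the mean of `history`."""
    if len(history) < 2:
        return False  # not enough past data to judge
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # History is perfectly constant: any change at all is suspicious.
        return latest != mu
    return abs(latest - mu) / sigma > threshold


# Hypothetical daily counts of new records in one table.
daily_row_counts = [1020, 980, 1005, 995, 1010, 990, 1000]

typical_day = is_anomalous(daily_row_counts, 1003)  # close to the usual volume
sudden_drop = is_anomalous(daily_row_counts, 120)   # far below the usual volume
```

The same check applies to any metric stored over time, such as the share of NULL values in a column or the time since the last record arrived.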

Data testing

re_data supports writing data tests by adding the dbt_expectations library (and some of our own test macros) to the dbt project it creates. We recommend using it to test both:

  • tables you are monitoring
  • metrics about your data created by re_data
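The two targets above can be checked with the same kind of test. The Python sketch below mimics the idea behind a range-style expectation: it returns the rows that violate the rule, and the test passes when nothing is returned. In a real project such tests are declared in the dbt project rather than written in Python; the function, table contents, and metric name here are hypothetical.

```python
def failing_rows(rows, column, min_value=None, max_value=None):
    """Return rows whose `column` value lies outside [min_value, max_value].
    Rows where the value is None are skipped."""
    failures = []
    for row in rows:
        value = row.get(column)
        if value is None:
            continue
        too_low = min_value is not None and value < min_value
        too_high = max_value is not None and value > max_value
        if too_low or too_high:
            failures.append(row)
    return failures


# 1. Test a monitored table: order amounts should never be negative.
orders = [{"amount": 25.0}, {"amount": -4.0}, {"amount": 310.0}]
bad_orders = failing_rows(orders, "amount", min_value=0)

# 2. Test a metric about the data: NULL percentage should stay under 10.
metrics = [
    {"metric": "nulls_percent", "value": 0.5},
    {"metric": "nulls_percent", "value": 42.0},
]
bad_metrics = failing_rows(metrics, "value", max_value=10)
```

Testing the metrics as well as the raw tables lets you catch slow degradations, not just individual bad rows.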

Getting started

Follow our getting started toy shop tutorial here! 🎈🚙 🦄

Docs

More details on the tables created by re_data through its dbt package are available in the project's GitHub repository, https://github.com/re-data/dbt-re-data, and in the docs for this package: here

Community

Join our Slack to ask questions about using re_data and to talk with the people building it 🙂

Integrations

We support almost all of the main data warehouses supported by dbt. We plan to add support for Spark (now officially supported by dbt).

Integration     Status
BigQuery        Supported
PostgreSQL      Supported
Redshift        Supported
Snowflake       Supported
Apache Spark    Planned

License

re_data is licensed under the MIT license. See the LICENSE file for licensing information.

Contributing

We love all contributions, big and small.

Check out the current list of issues here and see if anything there interests you. Also, feel free to join our Slack to suggest ideas, or set up a live session here.

And if you got this far and like what we are building, support us! Star https://github.com/re-data/re-data on GitHub 🤩
