Giter Site home page Giter Site logo

philky-m / turnstyl Goto Github PK

View Code? Open in Web Editor NEW

This project forked from oslabs-beta/turnstyl

0.0 0.0 0.0 1.81 MB

Turnstyl is an open-source ETL data integrity monitoring node package built to work with event and messaging brokers like Kafka

TypeScript 99.26% JavaScript 0.74%

turnstyl's Introduction

npm version Last commit Github stars Snyk

Turnstyl is an NPM package built to work with Kafka and Google Big Query to allow you to compare the data schema input at the producer level to what arrives into your data warehouse.

The aim of Turnstyl is to monitor and notify the user of data validity issues in your data pipeline, specifically at the fringe between backend and ETL systems. Turnstyl was primarily designed for use in micro-service architecture where events are dumped into a data warehouse in a generic and schema-agnostic format.

โœ… Pre-requisites

To run this package you'll need to have:

  • A working instance of Kafka or equivalent message broker
  • Be setup and using Google BigQuery as your data warehouse
  • Have activated the Google API and setup a service account

๐Ÿš€ QuickStart

Getting started with Turntstyl is easy:

  1. In your terminal run npm install turnstyl in your target project directory
  2. Create a turnstyl.config.yaml in your project root directory, using the template below to input your Google Big Query project and dataset names, and the path to your Google API service account credentials JSON.

Example turnstyl.config.yaml file:

version: '1.0'

google_service_credentials: './GOOGLE_BIG_QUERY_SERVICE_ACCOUNT_CREDENTIALS.json'
big_query_project_name: 'BIG_QUERY_PROJECT_NAME'
big_query_dataset_name: 'BIG_QUERY_DATASET_NAME'
  1. Import the package into the service where data is being sent to your producer

const { Turnstyl } = require("turnstyl");

  1. Instantiate a new Turnstyl object and invoke the following two methods:

const newTurnstyl = new Turnstyl();

  • newTurnstyl.cacheProducerEvent(topic, message); - Caches your events where they are produced (or in the producer) and with reference to the topic that they will be sent through, parsing the object for it's data types.
  • newTurnstyl.compareProducerToDBSchema(topic); - Compares the events that have arrived into Big Query (assuming the topic name matches your target table) flagging if there is a discrepancy between the two.

โœจ Features

  • Compare data object processed by the producer to the JSON deposited in the data warehouse
  • Integration with Google Cloud Platform (Big Query)

๐Ÿ› ๏ธ Example usage

const { Kafka } = require('kafkajs');
const { Turnstyl } = require('turnstyl');

const newTurnstyl = new Turnstyl();

/**
 * @function producer function that connects to Kafka sends a message then disconnects
 * @param producerName STRING name of producer
 * @param message STRING message that will be sent to Kafka
 * @param topic STRINg name of the topic that message will be posed to on Kafka
 */
const producer = async (
  producerName
  message
  topic
) => {
  //Declare a variable kafka assigned to an instance of kafka (door into the kafka brokerage)
  const kafka = new Kafka({
    clientId: producerName,
    brokers: ['kafka:9092'],
  });
  const producer = kafka.producer();

  newTurnstyl.cacheProducerEvent(topic, message);
  newTurnstyl.compareProducerToDBSchema(topic);

  try {
    // Connect to the producer
    await producer.connect();
  } catch (error) {
    console.log('Producer Connection error: ', error);
  }

  try {
    // Send message
    await producer.send({
      topic: topic,
      messages: [{ value: JSON.stringify(message) }],
    });
    console.log('message is:', message);
  } catch (error) {
    console.log('error in message send', error);
  }
  console.log('Data sent by producer');
  // Close connection to the broker
  producer.disconnect();
};

export { producer };

๐Ÿญ Supported Message brokers

  • Apache Kafka

๐Ÿญ Supported Data Warehouses

  • Google Big Query

๐Ÿ“ธ Screenshots

๐Ÿš— Roadmap

Head to our roadmap to see our upcoming planned features

๐Ÿ‘ Contribute

Want to contribute to Turnstyl? Head over to our contribution page

Contributors:

Jae Kim - LinkedIn

Yolan Arnett - LinkedIn

Dillon Schriver - LinkedIn

Emeric David - LinkedIn

Michael Colley - LinkedIn - Twitter

If you'd like to support the active development of Turnstyl:

  • Add a GitHub Star to the project
  • Tweet about the project on your Twitter.
  • Write a review or tutorial on Medium, Dev.to or personal blog.

๐ŸŽŸ๏ธ License

Turnstyl is free and open-source software licensed under the MIT License.

turnstyl's People

Contributors

michaelecolley avatar jaeklm avatar emeric-gh avatar dillon-schriver avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.