
cqube-ingestion's Introduction

Ingestion Process

[Figure: cQube-LLD-Parser diagram]

The diagram above shows the low-level design of the data ingestion process in cQube. cQube accepts data as CSV files, which must follow strict naming conventions. The CSVs are first processed to produce JSON schemas, and those JSON schemas are then processed to generate the various datasets and insert data into them. Following the flow in the diagram, ingestion starts from the CSV files, which are currently stored in the /ingest folder. There are two types of files:

  1. Dimension files: These define the dimensions that act as the atomic building blocks for the actual dataset files, for example district IDs and school IDs.
  2. Event files: These contain the actual data that is aggregated and stored in the tables.

Each of these two types in turn comes in two forms:

  1. Grammar files: These files define the schema of the table that is created when the data in the corresponding data file is ingested.
  2. Data files: These files contain the actual data that needs to be ingested.
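
The two classifications above (dimension or event, grammar or data) can be sketched as a small filename classifier. This is a minimal illustration only: the pattern `<name>-<kind>.<role>.csv`, the `IngestFile` type, and the `classify` function are assumptions made for the example and are not taken from the cQube naming rules, which are not spelled out in this section.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IngestFile:
    name: str  # entity the file describes, e.g. "district"
    kind: str  # "dimension" or "event"
    role: str  # "grammar" (schema) or "data" (rows to ingest)


def classify(filename: str) -> Optional[IngestFile]:
    """Classify a file name under the HYPOTHETICAL pattern
    <name>-<kind>.<role>.csv; return None if it does not match."""
    parts = filename.split(".")
    if len(parts) != 3 or parts[2] != "csv":
        return None
    stem, role, _ = parts
    if role not in ("grammar", "data"):
        return None
    # Split on the last hyphen so entity names may themselves contain hyphens.
    name, sep, kind = stem.rpartition("-")
    if not sep or not name or kind not in ("dimension", "event"):
        return None
    return IngestFile(name=name, kind=kind, role=role)


if __name__ == "__main__":
    for f in ["district-dimension.grammar.csv",
              "district-dimension.data.csv",
              "attendance-event.data.csv",
              "notes.txt"]:
        print(f, "->", classify(f))
```

Pairing each grammar file with the data file that shares its `name` and `kind` would then give the schema and the rows for one table.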

The diagram below shows how the types of CSVs relate to the final tables that are created. Each event file contains data that combines several dimensions and time dimensions. When an event file is read and processed, multiple datasets are created from it, depending on the number of dimensions and time dimensions and on their combinations. The whitelisted and blacklisted options in the config file control which compound datasets (combinations of more than one dimension) are created and which dimensions are not converted into a dataset.

[Figure: cQube-CSV-Ingestion diagram]
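
To make the fan-out from one event file to several datasets concrete, here is a rough sketch of how the dataset list could be planned. It assumes that the whitelist names the compound combinations to build and that the blacklist names dimensions to skip entirely; the function `plan_datasets`, its parameters, and the sample dimension names are hypothetical and do not reflect the actual config format.

```python
def plan_datasets(dimensions, time_dimensions, whitelist=(), blacklist=()):
    """Return (time_dimension, dimension_tuple) pairs, one per dataset.

    ASSUMPTIONS (not from the cQube config spec):
      - every non-blacklisted dimension gets its own single-dimension dataset
      - compound datasets are created only for whitelisted combinations
      - a combination that uses a blacklisted dimension is dropped
    """
    allowed = [d for d in dimensions if d not in blacklist]
    groups = [(d,) for d in allowed]
    for combo in whitelist:
        combo = tuple(combo)
        if len(combo) > 1 and all(d in allowed for d in combo):
            groups.append(combo)
    # One dataset per time dimension and dimension group.
    return [(t, dims) for t in time_dimensions for dims in groups]


if __name__ == "__main__":
    plans = plan_datasets(
        dimensions=["district", "school", "gender"],
        time_dimensions=["weekly", "monthly"],
        whitelist=[("district", "school"), ("school", "gender")],
        blacklist=["gender"],
    )
    for time_dim, dims in plans:
        print(time_dim, "+".join(dims))
```

With these inputs, `gender` is skipped, so only the `district`, `school`, and `district+school` groups remain, each repeated for the two time dimensions.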


cqube-ingestion's People

Contributors

ansh-sarkar, chakshugautam, dhanush-2397, ramya-k-murthy, techsavvyash
