chirag03k / ver

This project forked from thedatastation/ver

Data Discovery Tools and Systems

License: MIT License

Shell 0.01% JavaScript 1.44% C++ 0.77% Python 83.11% Java 13.37% CSS 0.20% HTML 0.14% Jupyter Notebook 0.92% Dockerfile 0.04%

ver's Introduction

Data Discovery Tools and Systems

[This project is currently a work in progress, and we expect to have all documentation, tests, and a full demo done by the end of the 1st quarter of 2024.]

Data discovery is the problem of identifying and retrieving data that satisfies an information need. See common data discovery problems here and feel free to open issues to suggest other scenarios you know about. This repository contains Ver, a collection of tools designed to address data discovery problems.

Ver is divided into separate components that can be used in isolation to solve point problems, or jointly to address discovery scenarios end to end. Upstream of any pipeline are the data repositories; downstream of any pipeline is an interface. Ver offers several interfaces, including a Python discovery API (Aurum), a view discovery API (also called Ver), and a utility-function-based search (Metam). We are always thinking of new interfaces and components to help more users address their discovery needs.

Structure of the Repository

WIP

Ver Overview and Architecture

A conceptual way of understanding what Ver does is to look at its architecture.

The picture includes each of Ver's components. We give a brief description of each component below. Most components can be used in a standalone manner, and if you inspect the component, you will find a component-specific README that gives you more details.

Discovery Engine and Index Creation

This component builds indices over pathless table collections: i) a join path index, which can be approximate; ii) retrieval indices over table names, attribute names, values, and column similarity. The indices are available online to other components via the Engine's API.
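To make the two kinds of index concrete, here is a minimal sketch in plain Python. It is not Ver's implementation: the toy tables, the function names `build_value_index` and `build_join_index`, and the use of Jaccard overlap with a fixed threshold as the "approximate" join signal are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy collection: {table_name: {column_name: [values]}}
TABLES = {
    "employees": {"emp_id": [1, 2, 3], "dept": ["hr", "eng", "eng"]},
    "departments": {"dept_name": ["hr", "eng", "ops"], "floor": [10, 20, 30]},
}


def build_value_index(tables):
    """Retrieval index: map each normalized value to the (table, column) pairs holding it."""
    index = defaultdict(set)
    for table, columns in tables.items():
        for column, values in columns.items():
            for value in values:
                index[str(value).lower()].add((table, column))
    return index


def build_join_index(tables, threshold=0.5):
    """Approximate join index: link columns of different tables whose
    value sets have Jaccard similarity at or above `threshold`."""
    cols = [
        (table, column, {str(v).lower() for v in values})
        for table, columns in tables.items()
        for column, values in columns.items()
    ]
    edges = []
    for (t1, c1, s1), (t2, c2, s2) in combinations(cols, 2):
        if t1 == t2:
            continue  # only consider joins across tables
        union = s1 | s2
        jaccard = len(s1 & s2) / len(union) if union else 0.0
        if jaccard >= threshold:
            edges.append(((t1, c1), (t2, c2), jaccard))
    return edges


value_index = build_value_index(TABLES)
join_index = build_join_index(TABLES)
```

With this toy data, `employees.dept` and `departments.dept_name` share two of three distinct values, so they are linked as a candidate join. A real index would likely use sketches or hashing rather than exact set comparison, since comparing all column pairs does not scale.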

View Specification

Discovery interfaces include spreadsheet-style, keyword search, APIs, natural language, and combinations of these. The reference architecture supports these interfaces via this component. For query-by-example (QBE) interfaces, as implemented by Ver, the input is a set of examples, and the output of this stage is a set of example attributes and values.
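As an illustration of what "a set of example attributes and values" might look like, the sketch below turns a spreadsheet-style example grid into one record per column. The `ExampleColumn` type and the `parse_qbe` function are hypothetical names, not part of Ver's API; the sketch assumes that both the header and individual cells may be left blank.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ExampleColumn:
    """One column of a query-by-example input (hypothetical structure)."""
    attribute: Optional[str]  # None when the user gave no attribute name
    values: List[str] = field(default_factory=list)


def parse_qbe(header, rows):
    """Convert a header row and example rows into per-column attributes and values."""
    columns = [ExampleColumn(attribute=(name.strip() or None)) for name in header]
    for row in rows:
        for column, cell in zip(columns, row):
            cell = cell.strip()
            if cell:  # skip blank cells
                column.values.append(cell)
    return columns


# The user names the first column but leaves the second unnamed.
spec = parse_qbe(
    header=["state", ""],
    rows=[["Illinois", "Chicago"], ["Texas", ""]],
)
```

Keeping the attribute optional lets downstream stages match on values alone when the user does not know the column name.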

Column Selection

This component selects the subset of tables containing user-provided examples. The output of this component is a collection of candidate tables and columns.
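A minimal sketch of this selection step follows, assuming the collection fits in memory and that matching is exact after lowercasing. The toy tables, the `select_columns` function, and the ranking by match count are assumptions for illustration; in Ver this lookup would go through the discovery indices rather than a full scan.

```python
# Hypothetical toy collection: {table_name: {column_name: [values]}}
TABLES = {
    "cities": {
        "city": ["Chicago", "Austin", "Boston"],
        "state": ["Illinois", "Texas", "Massachusetts"],
    },
    "airports": {"code": ["ORD", "AUS"], "city_name": ["Chicago", "Austin"]},
    "teams": {"team": ["Bulls", "Celtics"], "home": ["Chicago", "Boston"]},
}


def select_columns(tables, example_values, min_matches=1):
    """Return (table, column, match_count) for every column that contains
    at least `min_matches` of the example values, best matches first."""
    wanted = {v.lower() for v in example_values}
    candidates = []
    for table, columns in tables.items():
        for column, values in columns.items():
            present = {str(v).lower() for v in values}
            matches = len(wanted & present)
            if matches >= min_matches:
                candidates.append((table, column, matches))
    # Sort by descending match count, then by name for a stable order.
    return sorted(candidates, key=lambda c: (-c[2], c[0], c[1]))


candidates = select_columns(TABLES, ["Chicago", "Austin"])
strict = select_columns(TABLES, ["Chicago", "Austin"], min_matches=2)
```

Raising `min_matches` trades recall for precision: here it drops `teams.home`, which contains only one of the two examples.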

Join Graph Search

Given a set of candidate tables, an input query, and the discovery index providing join paths, this component identifies all join graphs that, when materialized, produce candidate project-join (PJ) views. The main goal of this component is to cope with the large space of possible join paths. To materialize candidate PJ-views, this component uses a Materializer, a data processing component with the capacity to execute PJ queries.
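The search over join paths can be sketched as a bounded depth-first traversal of the join index. This is a simplification, not Ver's algorithm: it only connects two candidate tables, it enumerates paths rather than general join graphs, and it does not materialize anything. The edge list, `find_join_paths`, and the `max_hops` bound are hypothetical.

```python
from collections import defaultdict

# Hypothetical join index entries: (table_a, column_a, table_b, column_b)
JOIN_EDGES = [
    ("employees", "dept", "departments", "dept_name"),
    ("departments", "building_id", "buildings", "id"),
    ("employees", "building", "buildings", "id"),
]


def build_adjacency(edges):
    """Index join edges by table so they can be followed in either direction."""
    adjacency = defaultdict(list)
    for ta, ca, tb, cb in edges:
        adjacency[ta].append((ca, tb, cb))
        adjacency[tb].append((cb, ta, ca))
    return adjacency


def find_join_paths(edges, source, target, max_hops=2):
    """Enumerate simple join paths from `source` to `target` using at most
    `max_hops` joins. Each hop is (from_table, from_col, to_table, to_col)."""
    adjacency = build_adjacency(edges)
    paths = []

    def dfs(table, visited, path):
        if table == target:
            paths.append(list(path))
            return
        if len(path) == max_hops:
            return  # bound the search to keep the path space manageable
        for column, next_table, next_column in adjacency[table]:
            if next_table not in visited:
                path.append((table, column, next_table, next_column))
                dfs(next_table, visited | {next_table}, path)
                path.pop()

    dfs(source, {source}, [])
    return sorted(paths, key=len)  # shortest join paths first


paths = find_join_paths(JOIN_EDGES, "employees", "buildings")
direct_only = find_join_paths(JOIN_EDGES, "employees", "buildings", max_hops=1)
```

The hop bound is the simplest way to limit the path space; each returned path could then be handed to a component that executes the corresponding project-join query.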

View Distillation

This component groups the candidate PJ-views into categories, including redundant views, views contained in other views, opportunities for unioning views, and more. Some categories can be used to distill (summarize) the views. Others are shared with the downstream component.
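As a rough illustration of such categories, the sketch below compares two views by schema and row sets. The category labels, the `compare_views` function, and the rule that same-schema views with partially overlapping rows count as union candidates are assumptions for this example, not Ver's definitions.

```python
def compare_views(a, b):
    """Classify the relationship between two views.

    Each view is a dict with "columns" (a tuple of names) and "rows"
    (a list of tuples). The labels returned are illustrative only.
    """
    if a["columns"] != b["columns"]:
        return "different-schema"
    rows_a, rows_b = set(a["rows"]), set(b["rows"])
    if rows_a == rows_b:
        return "redundant"          # same content, keep only one
    if rows_a < rows_b:
        return "a-contained-in-b"   # b subsumes a
    if rows_b < rows_a:
        return "b-contained-in-a"   # a subsumes b
    return "unionable"              # same schema, neither contains the other


# Hypothetical candidate views over (city, state).
v1 = {"columns": ("city", "state"), "rows": [("Chicago", "Illinois"), ("Austin", "Texas")]}
v2 = {"columns": ("city", "state"), "rows": [("Austin", "Texas"), ("Chicago", "Illinois")]}
v3 = {"columns": ("city", "state"), "rows": [("Chicago", "Illinois")]}
v4 = {"columns": ("city", "state"), "rows": [("Boston", "Massachusetts")]}
v5 = {"columns": ("city", "code"), "rows": [("Chicago", "ORD")]}
```

Redundant and contained views can be dropped or collapsed to shrink the candidate set, while union candidates can be merged into a single larger view.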

View Presentation

This component uses different question interfaces to elicit information from users via data questions. The questions are designed to narrow down the space until users find the desired view. The component chooses what questions to ask, sequentially, using a bandit-based approach.
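The text only says the approach is bandit-based, so the sketch below uses UCB1, a standard multi-armed bandit strategy, as a stand-in. It is not necessarily the algorithm Ver uses. The class name, the interface names, and the reward definition (for example, the fraction of candidate views eliminated by the user's answer) are all hypothetical.

```python
import math


class UCB1QuestionSelector:
    """Pick which question interface to use next with the UCB1 rule."""

    def __init__(self, interfaces):
        self.interfaces = list(interfaces)
        self.counts = {name: 0 for name in self.interfaces}
        self.rewards = {name: 0.0 for name in self.interfaces}

    def select(self):
        # Try every interface once before comparing them.
        for name in self.interfaces:
            if self.counts[name] == 0:
                return name
        total = sum(self.counts.values())

        def upper_bound(name):
            mean = self.rewards[name] / self.counts[name]
            bonus = math.sqrt(2 * math.log(total) / self.counts[name])
            return mean + bonus  # optimism in the face of uncertainty

        return max(self.interfaces, key=upper_bound)

    def update(self, name, reward):
        """Record the observed reward (assumed to lie in [0, 1])."""
        self.counts[name] += 1
        self.rewards[name] += reward


# Hypothetical question interfaces.
selector = UCB1QuestionSelector(["ask_about_attribute", "show_sample_rows"])
first = selector.select()
selector.update(first, 0.0)    # the answer eliminated no views
second = selector.select()
selector.update(second, 1.0)   # the answer eliminated many views
third = selector.select()
```

After one round each, the interface that produced the more useful answer is chosen again, while the exploration bonus ensures the other is still revisited occasionally as its count falls behind.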

ver's People

Contributors

raulcf, youny626, snowgy, wangsibovictor, rogertangos, kevindharmawan, zaki-indra, sainyam, ygina, yinyanghu, stanleyzhu1, jmftrindade, mansoure, semihyumusak, nato16, florents-tselai, justinanderson, suhailshergill, damienrrb, michaeldh42, svdwoude
