intuit / superglue

Superglue is a lineage-tracking tool built to help visualize the propagation of data through complex pipelines composed of tables, jobs and reports.

License: Apache License 2.0

Shell 0.04% Scala 76.00% Dockerfile 0.04% JavaScript 19.22% HTML 0.11% SCSS 4.59%

superglue's Introduction

Superglue

Join the chat at https://gitter.im/intuit/superglue

Superglue is a lineage-tracking tool to help visualize the propagation of data through complex pipelines.

Superglue demo

Quick Start

Dependencies:

  • JDK 8
  • Docker

The first-time setup takes about five minutes.

Superglue setup

Note: The gifs show superglue being hosted at http://localhost:3000, but that has since changed. Be sure to use http://localhost:8080 instead!

Detailed instructions below!

Launch the development environment with Docker

We've included a docker configuration to set up all of the services that superglue needs to run. To launch the development image, run

docker-compose -f deployments/development/docker-compose.yml up

This launches

  • A MySQL database on port 3314
  • The superglue frontend at http://localhost:8080
  • The superglue backend at http://localhost:8080/api
  • An elasticsearch server at http://localhost:8080/elasticsearch

Note: By default, Docker allocates 2GB of memory for containers. You may need to increase this limit; otherwise Elasticsearch will shut down.
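
Once the containers are up, a quick reachability check can confirm that each HTTP service answers. The sketch below is only an illustration: the URLs come from the list above, while the helper names service_urls and probe are hypothetical, and the status code each endpoint returns is not documented here.

```python
import urllib.error
import urllib.request


def service_urls(base="http://localhost:8080"):
    """HTTP endpoints listed above, keyed by service name."""
    return {
        "frontend": base,
        "backend": base + "/api",
        "elasticsearch": base + "/elasticsearch",
    }


def probe(url, timeout=5.0):
    """Return the HTTP status code for url, or None if nothing answered."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, just not with a 2xx status.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, and similar.
        return None
```

Calling probe(url) for each value of service_urls() shows which services are answering. A None for elasticsearch may point to the memory limit mentioned above.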

Install the command-line client

To install the superglue command-line client, run

./gradlew installDist

This puts the superglue executable into ~/.superglue/bin/. To use it as a command, add this directory to your PATH by appending the following line to your ~/.bashrc:

export PATH="${HOME}/.superglue/bin:${PATH}"

Get started with sample data

We've included a sample SQL script with some dummy statements to illustrate Superglue's usefulness. The next steps will assume you successfully installed the superglue command-line tool and have the docker development containers running.

The first thing we need to do is initialize the database. To do this, we need a configuration file with the database's location and credentials. We've provided one for this exercise in examples/superglue.conf.

cd examples
superglue init --database

Note: The superglue tool automatically searches for a file called superglue.conf in the current directory to use as its configuration.

Next, we need to parse our sample data (in examples/demo.sql) and get it into the database. Our configuration file also lists the files that should be parsed, and again, the command-line tool will automatically use superglue.conf.

# In examples/
superglue parse

If everything works, superglue prints a JSON blurb describing the data it parsed, then pauses for a few seconds while it inserts the data into the database.

The last setup step is to load our data into elasticsearch so that we'll be able to search through the data from the UI.

superglue elastic --load
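
The three commands above must run in order from the examples directory. As a rough sketch, they could be scripted like this; the function name run_setup and the injectable runner parameter are illustrative assumptions, while the commands themselves are exactly the ones shown above.

```python
import subprocess

# The CLI steps from this walkthrough, in the order they must run.
SETUP_STEPS = [
    ["superglue", "init", "--database"],
    ["superglue", "parse"],
    ["superglue", "elastic", "--load"],
]


def run_setup(examples_dir="examples", runner=subprocess.run):
    """Run each step inside examples_dir, stopping at the first failure."""
    for step in SETUP_STEPS:
        # check=True raises CalledProcessError on a non-zero exit code.
        # cwd matters because superglue looks for superglue.conf in the
        # current directory.
        runner(step, cwd=examples_dir, check=True)
```

Passing a different runner makes it possible to dry-run the sequence without invoking the CLI.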

Once all of that's done, head on over to a browser and open up http://localhost:8080. You should be able to start searching for table names, and click one to see its lineage.

Note: The sample data tables are named using Lorem Ipsum, so try searching one of those words.

Tests

To run all of the tests, run:

./gradlew test

To check the code's test coverage, run:

# To just generate a report
./gradlew reportScoverage

# To pass or fail based on coverage threshold (75%)
./gradlew checkScoverage

After running reportScoverage (and also checkScoverage if it passed), you can view the coverage report by opening a module's build/reports/scoverage/index.html file in a browser.

Contributing

If you'd like to contribute to Superglue, be sure to check out our contributing guidelines and feel free to open an issue or pull request!

superglue's People

Contributors

bennettelisa, dependabot[bot], gitter-badger, lingyv-li, mayurmadnani, sambekar15, shashankviswanadha, zaraehhs

superglue's Issues

Superglue Calcite Parser : Upgrade to latest Calcite Version - 1.27.0

Is your feature request related to a problem? Please describe.
Upgrade the Superglue Calcite parser to the latest Calcite version (1.27.0) to take advantage of fixes for RLIKE and DATE functions.
https://calcite.apache.org/downloads/

Describe the solution you'd like
Regression and test cases should pass after upgrading to the latest Calcite version.
SQL statements like the following should parse successfully:
  1. select cola from tableA where MAX(realm_email) RLIKE '.+@.+\..+'
  2. select cola from tableA where MAX(realm_email) not rlike '.+@.+\..+'

Gradle issue "play" plugin

The supplied phased action failed with an exception.
A problem occurred configuring project ':api'.
Build file '.\superglue-master\api\build.gradle' line: 2
Plugin [id: 'play'] was not found in any of the following sources:

Because of the 'play' plugin, Gradle does not build.
I also tried the same with playframework but still face the same issue.
Also, the link provided does not point to a valid page.

Enable traversal of graph on UI by depth

Currently the Superglue UI only allows us to see the lineage of depth 1. However, the backend API has capabilities for different depths (1 through 4, -1 for full lineage), so we want to augment the UI to display those lineages to the user.

Describe the solution you'd like
We want to have a drop down that will let you choose the depth, and make a call to the backend with the new depth. Then the UI can display the new lineage with the new depth once the fetching is complete.
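
As a rough illustration of the call the drop-down would trigger, the sketch below builds the lineage URL for a chosen depth. The path and the bw/fw query parameters follow the lineage API URLs quoted in other issues on this page; the function name, the depth list, and applying one depth to both directions are assumptions.

```python
from urllib.parse import quote, urlencode

# Depth options named in this issue: 1 through 4, and -1 for full lineage.
DEPTH_CHOICES = (1, 2, 3, 4, -1)


def lineage_url(table, depth, base="http://localhost:8080/api"):
    """Build the lineage request URL for one table at the chosen depth."""
    if depth not in DEPTH_CHOICES:
        raise ValueError(f"unsupported depth: {depth!r}")
    # Assumption: the same depth is used backward (bw) and forward (fw).
    query = urlencode({"bw": depth, "fw": depth})
    return f"{base}/v1/lineage/table/{quote(table, safe='')}?{query}"
```

Rejecting values outside the drop-down's list keeps the UI from sending depths the backend was never asked to support.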

Describe alternatives you've considered
We are open to other approaches in displaying different depths on the UI.

Local set up throws 502 error with the url http://localhost:8080

I followed all the setup steps in the README and get a 502 error at http://localhost:8080. The error log is below.

To Reproduce
Steps to reproduce the behavior. For example:

  1. Did the following setup steps:
    docker-compose -f deployments/development/docker-compose.yml up
    ./gradlew installDist
    export PATH="${HOME}/.superglue/bin:${PATH}"
    cd examples
    superglue init --database
    # In examples/
    superglue parse
    superglue elastic --load
  2. The configuration is the same as the default in the repository
  3. Send a request to http://localhost:8080
  4. Error:
    nginx | 172.20.0.1 - - [01/Oct/2020:21:31:55 +0000] "GET /app HTTP/1.1" 499 0 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1 Safari/605.1.15"
    nginx | 172.20.0.1 - - [01/Oct/2020:21:31:56 +0000] "GET / HTTP/1.1" 499 0 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1 Safari/605.1.15"
    nginx | 172.20.0.1 - - [01/Oct/2020:21:31:58 +0000] "GET / HTTP/1.1" 502 158 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1 Safari/605.1.15"
    nginx | 2020/10/01 21:31:58 [error] 7#7: *51 connect() failed (113: No route to host) while connecting to upstream, client: 172.20.0.1, server: localhost, request: "GET / HTTP/1.1", upstream: "http://172.20.0.4:3000/", host: "localhost:8080"

Expected behavior
Should display the UI

Desktop (please complete the following information):

  • OS: MacOS Catalina 10.15.4
  • Browser - both Chrome and Safari
  • Version
    java -version
    openjdk version "11.0.7" 2020-04-14 LTS

Docker:
docker version
Client: Docker Engine - Community
Cloud integration 0.1.18
Version: 19.03.13
API version: 1.40
Go version: go1.13.15
Git commit: 4484c46d9d
Built: Wed Sep 16 16:58:31 2020
OS/Arch: darwin/amd64
Experimental: false

Server: Docker Engine - Community
Engine:
Version: 19.03.13
API version: 1.40 (minimum version 1.12)
Go version: go1.13.15
Git commit: 4484c46d9d
Built: Wed Sep 16 17:07:04 2020
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: v1.3.7
GitCommit: 8fba4e9a7d01810a393d5d25a3621dc101981175
runc:
Version: 1.0.0-rc10
GitCommit: dc9208a3303feef5b3839f4323d9beb36df0a9dd
docker-init:
Version: 0.18.0
GitCommit: fec3683

Superglue Calcite Parser : Use SqlBabelParserImpl as Parser Factory instead of SqlDdlParserImpl while creating calcite config

Is your feature request related to a problem? Please describe.
The Superglue Calcite parser uses the SqlDdlParserImpl parser factory to create the Calcite config. It sets the conformance to SqlConformanceEnum.BABEL, but that is not sufficient. Many of the date/time syntaxes are fixed in SqlBabelParserImpl but are not available in SqlDdlParserImpl.

Describe the solution you'd like
Change the Superglue Calcite parser to use SqlBabelParserImpl as the parser factory instead of SqlDdlParserImpl when creating the Calcite config.
SQL statements such as select date('1990-01-01') from t and select date(x) from t should parse successfully.

[Frontend] Create a button on the lineage screen that returns the users back to the search page

Is your feature request related to a problem? Please describe.
Currently the lineage page only shows the network graph. While the user can click Back on their browser, it is not easy to switch back and forth between the lineage screen and the search screen.

Describe the solution you'd like
It would be better to provide a back button in the app itself that brings the user back to the search page.

Describe alternatives you've considered
Other creative ideas are encouraged such as a top menu bar that has an embedded search component along with other menus that would be useful.

Additional context
Current lineage page:
image

Add logging in the application

Is your feature request related to a problem? Please describe.
Currently the application does not have logging. Errors are printed to the console. Logging is useful for monitoring, debugging, error tracking, and business intelligence.

For example, hitting the API at http://localhost:8080/api/v1/lineage/table/LOREM?bw=-1&fw=-1 returns an empty response,
but the stack trace is not logged. This makes it difficult to debug.

Describe the solution you'd like
Integrate with Play API logging - https://www.playframework.com/documentation/2.8.x/ScalaLogging
Log the error from API call
Desired output structure in log would be :
%d{yy/MM/dd HH:mm:ss}; thread=%thread; level=%p; %marker; class=%c; method=%method; lineNumber=%L; msg=%message%n%xException

Provide configurable ability to plug-in any parser

Is your feature request related to a problem? Please describe.
Currently we have implemented a Calcite parser. We have defined interfaces so that it's easy to plug in any parser (Calcite, gSQLParser), but the choice is hard-coded. Make it configurable.

The current implementation picks the parser based on the kind of file, and the parser type is hard-coded.
For example, this config:

com.intuit.superglue {
  pipeline {
    outputs.database.enabled = true
    inputs.files = [
      {
        base = "/Users/sambekar/GIT/care_analytics/care_analytics"
        kind = "sql"
        includes = ["glob:**/*.sql"]
      },
      {
        base = "/Users/sambekar/GIT/care_analytics/care_analytics"
        kind = "hql"
        includes = ["glob:**/*.hql"]
      },
      {
        base = "/Users/sambekar/GIT/sbg_stable_analyst_scripts/sbg_stable_analyst_scripts"
        kind = "sql"
        includes = ["glob:**/*.sql"]
      }
    ]
  }
}

picks up Calcite only for sql files because that is hard-coded in the ParsingPipeline class, so only sql files are parsed. Calcite is able to parse hql files as well, but this config filters them out.
Instead, make the parser type configurable (for example calcite or gsqlparser) and don't filter on the kind of file.
Proposed config:

com.intuit.superglue {
  pipeline {
    outputs.database.enabled = true
    parserEngine = "calcite"
    inputs.files = [
      {
        base = "/Users/sambekar/GIT/care_analytics/care_analytics"
        kind = "sql"
        includes = ["glob:**/*.sql"]
      },
      {
        base = "/Users/sambekar/GIT/care_analytics/care_analytics"
        kind = "hql"
        includes = ["glob:**/*.hql"]
      },
      {
        base = "/Users/sambekar/GIT/sbg_stable_analyst_scripts/sbg_stable_analyst_scripts"
        kind = "sql"
        includes = ["glob:**/*.sql"]
      }
    ]
  }
}
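
One way to read the proposed parserEngine setting is as a key into a registry of parser implementations. The sketch below is a hypothetical illustration of that idea, not Superglue's actual code: the class names, the parse method, and the registry are all invented here, and only the parserEngine key and the engine names come from the proposal above.

```python
class CalciteParser:
    """Placeholder standing in for the Calcite-backed parser."""

    def parse(self, sql):
        return {"engine": "calcite", "statement": sql}


class GSqlParser:
    """Placeholder standing in for a gSQLParser-backed parser."""

    def parse(self, sql):
        return {"engine": "gsqlparser", "statement": sql}


# Hypothetical registry: config value -> parser class.
PARSER_ENGINES = {"calcite": CalciteParser, "gsqlparser": GSqlParser}


def select_parser(pipeline_config):
    """Pick the parser from pipeline.parserEngine, regardless of file kind."""
    # Assumption: fall back to calcite when the key is absent.
    engine = pipeline_config.get("parserEngine", "calcite").lower()
    try:
        return PARSER_ENGINES[engine]()
    except KeyError:
        raise ValueError(f"unknown parserEngine: {engine!r}") from None
```

With a lookup like this, adding a parser means registering one more entry rather than editing the pipeline's file-kind checks.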

Graph Model for SuperGlue Lineage Services

Is your feature request related to a problem? Please describe.
Currently, Superglue uses an RDBMS for graph modeling, and the lineage services construct the graph from metadata. The challenges with this approach are:

  1. Handling full property-graph features becomes a nightmare and does not scale.
  2. Handling column lineage becomes more complex with the current model.
  3. Graph-property-based indexing is not straightforward.

To address these issues, at Intuit we have started working on a graph model and integration with Neo4j.

Describe the solution you'd like
Come up with a property graph schema and integrate Superglue with Neo4j.

Describe alternatives you've considered
We will be working on an end-to-end design document with all the details on the various approaches.

Additional context
NA

Lineage API response is incorrect when only one of the query parameters (fw or bw) is passed

Problem:
Hitting http://localhost:9000/v1/lineage/table/LOREM?bw=2 returns TableNotFound

The API returns full lineage if query parameters are not passed.
For example:
http://localhost:8080/api/v1/lineage/table/LOREM returns full backward and forward lineage
http://localhost:8080/api/v1/lineage/table/LOREM?bw=0 returns full forward lineage
http://localhost:8080/api/v1/lineage/table/LOREM?fw=0 returns full backward lineage

But requesting any other depth for a single parameter results in an error:
http://localhost:8080/api/v1/lineage/table/LOREM?bw=2 returns Table "LOREM" not found

Instead, http://localhost:8080/api/v1/lineage/table/LOREM?bw=2 should return backward lineage at depth 2 and full forward lineage.

Expectations: Hitting http://localhost:8080/api/v1/lineage/table/LOREM?bw=2 => Backward Lineage at depth 2 and full forward Lineage
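
The expected behavior can be summarized as: a missing bw or fw means full lineage in that direction, and a supplied value limits only that direction. A minimal sketch of that rule follows; the function name and the use of None for "full lineage" are assumptions made for illustration, not the backend's actual representation.

```python
def resolve_depths(query_params):
    """Map raw query parameters to (backward, forward) depths.

    None stands for full (unbounded) lineage in that direction.
    """

    def depth(name):
        raw = query_params.get(name)
        # A missing parameter must not affect the other direction.
        return None if raw is None else int(raw)

    return depth("bw"), depth("fw")
```

With this rule, ?bw=2 resolves to (2, None): backward lineage at depth 2 and full forward lineage.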

dockerfile issue

Describe the bug
Dockerfile:1

1 | >>> FROM java:8
2 |
3 | WORKDIR /superglue/backend

ERROR: failed to solve: java:8: docker.io/library/java:8: not found

Suggested fix: update the Dockerfile to use "FROM openjdk:8".

To Reproduce
run docker-compose command

Expected behavior
the command runs successfully

Desktop (please complete the following information):

  • OS: MacOS

Provide ability to configure SQL dialects and platform for SQL Input Paths.

Is your feature request related to a problem? Please describe.
Currently we use the default MySQL dialect when parsing all queries, whether they are Hive, Vertica, Redshift, or SparkSQL. Provide the ability to configure SQL dialects (for example SparkSQL, Vertica, Redshift, Hive) for SQL input paths.

A dialect can be specified for each kind of SQL input file path.

For example, for this config:
com.intuit.superglue {
  pipeline {
    outputs.database.enabled = true
    inputs.files = [
      {
        base = "/Users/sambekar/GIT/care_analytics/care_analytics"
        kind = "sql"
        includes = ["glob:**/*.sql"]
      },
      {
        base = "/Users/sambekar/GIT/care_analytics/care_analytics"
        kind = "hql"
        includes = ["glob:**/*.hql"]
      },
      {
        base = "/Users/sambekar/GIT/sbg_stable_analyst_scripts/sbg_stable_analyst_scripts"
        kind = "sql"
        includes = ["glob:**/*.sql"]
      }
    ]
  }
  dao {
    backend = "relational"
    relational.db {
      url = "jdbc:mysql://localhost:3314/superglue"
      user = "root"
      password = "superglue_development"
    }
  }
}

Proposed config:

com.intuit.superglue {
  pipeline {
    outputs.database.enabled = true
    inputs.files = [
      {
        base = "/Users/sambekar/GIT/care_analytics/care_analytics"
        kind = "sql"
        includes = ["glob:**/*.sql"]
        dialect = "VERTICA"
      },
      {
        base = "/Users/sambekar/GIT/care_analytics/care_analytics"
        kind = "hql"
        includes = ["glob:**/*.hql"]
        dialect = "SPARKSQL"
      },
      {
        base = "/Users/sambekar/GIT/sbg_stable_analyst_scripts/sbg_stable_analyst_scripts"
        kind = "sql"
        includes = ["glob:**/*.sql"]
        dialect = "REDSHIFT"
      }
    ]
  }
  dao {
    backend = "relational"
    relational.db {
      url = "jdbc:mysql://localhost:3314/superglue"
      user = "root"
      password = "superglue_development"
    }
  }
}
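
To make the proposal concrete, here is a small sketch of how a per-input dialect could be resolved, falling back to MySQL as the current behavior does. The function name and the set of accepted values are assumptions drawn from the dialects named in this issue.

```python
# Dialects mentioned in this issue; the exact accepted spellings are assumed.
KNOWN_DIALECTS = {"MYSQL", "HIVE", "VERTICA", "REDSHIFT", "SPARKSQL"}


def dialect_for(input_entry, default="MYSQL"):
    """Return the SQL dialect for one inputs.files entry."""
    dialect = str(input_entry.get("dialect", default)).upper()
    if dialect not in KNOWN_DIALECTS:
        raise ValueError(f"unsupported dialect: {dialect!r}")
    return dialect
```

Entries without a dialect key keep today's MySQL behavior, so existing configs would continue to work unchanged under this reading.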

Backend and Nginx Docker containers cannot be started

Describe the bug
Backend and Nginx Docker containers cannot be started.

To Reproduce
Steps to reproduce the behavior:

  1. Run docker-compose -f deployments/development/docker-compose.yml up in the project root directory
  2. In the terminal, the following messages are displayed:
    • backend exited with code 127
    • nginx exited with code 1
  3. See errors in Docker Desktop app as well

Expected behavior
All components are expected to start, and the app should be accessible through localhost.

Screenshots
Docker Client screenshot of backend error:
Screen Shot 2021-04-30 at 8 26 10 AM

Docker Client screenshot of nginx error:
Screen Shot 2021-04-30 at 8 25 23 AM

Desktop:

  • OS: MacOS Big Sur

Lineage API depth -1 throws 500 error

Describe the bug
The lineage API throws a 500 error for depth -1. Negative depth is not supported. Instead of throwing an exception, the API should handle the error gracefully.

To Reproduce
GET call on lineage api - http://localhost:8080/api/v1/lineage/table/LOREM?bw=-1&fw=-1
Unexpected error constructing lineage for table "LOREM"

image

Expected behavior
Should return Validation Error (404) status code with formatted json Response
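
A sketch of the graceful handling requested here: validate the depth before building the lineage and return a structured error instead of letting an exception surface as a 500. The function name and the JSON field names are hypothetical; the 404 status is the one this issue asks for.

```python
import json


def validate_depth(name, raw, error_status=404):
    """Return (status, body): (200, depth) or (error_status, JSON error)."""
    try:
        depth = int(raw)
    except (TypeError, ValueError):
        depth = None
    if depth is None or depth < 0:
        # Hypothetical error shape; the real response format is not specified.
        body = json.dumps({
            "error": "ValidationError",
            "message": f"{name} must be a non-negative integer",
            "value": raw,
        })
        return error_status, body
    return 200, depth
```

Keeping the status code as a parameter leaves room to switch to another code later without touching the validation logic.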

Superglue shows a blank page and does not allow searching in the web UI

SuperGlue_blank_page_issue

Describe the bug
After parsing and the Elasticsearch load, the web UI shows a blank page and does not allow searching for anything.

To Reproduce
You can reproduce it by following README.

Expected behavior
NA

Screenshots
Adding the screenshot

Desktop (please complete the following information):

  • OS: MacOS 10.13.6
  • Browser [chrome]

[Frontend] make the lineage and search pages responsive

Is your feature request related to a problem? Please describe.
Currently the modules are stuck at a certain height and won't match the full length of the browsing window.

Describe the solution you'd like
We should aim to make the graph and the table responsive so that there aren't any extraneous scrollbars.

Additional context
scroll
