
The first open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.

Home Page: https://opendatadiscovery.org

License: Apache License 2.0

Shell 0.02% Java 57.83% HTML 0.08% TypeScript 41.41% Python 0.07% JavaScript 0.08% Groovy 0.14% CSS 0.07% Mustache 0.29%
oss data-platform metadata metadata-management data-pipelines data-engineering observability data-catalog datacatalog data-discovery

odd-platform's Introduction


Next-Gen Data Discovery and Data Observability Platform


Website | LinkedIn | Slack | Documentation | Blog | Demo


Demo

Play with our demo app!

Introduction

ODD is an open-source data discovery and observability tool for data teams. It helps democratise data efficiently, power collaboration, and reduce the time spent on data discovery through a modern, user-friendly environment.

Key wins

  • Shorten data discovery phase

  • Have transparency on how and by whom the data is used

  • Foster data culture by continuous compliance and data quality monitoring

  • Accelerate data insights

  • Know the sources of your dashboards and ad hoc reports

  • Deprecate outdated objects responsibly by assessing and mitigating the risks

  • 👉 ODD Platform is a reference implementation of the Open Data Discovery Spec.

Features

Data Discovery and Observability

  • Accumulate scattered data insights in Federated Data catalogue
  • Gain observability through E2E Data objects Lineage
  • Benefit from cutting-edge E2E microservices Lineage feature in tracking your data flow through the whole data landscape
  • Be warned and alerted by Pipeline Monitoring tools
  • Store your metadata
  • Use ODD-native modern lightweight UI

ML First citizen

  • Save the results of your ML experiments by automatically logging their parameters

Data Security & Compliance

  • Manage Tags to prevent any abuse of the data
  • Refer to Tags to stay compliant with data security standards
  • Have full transparency on how and by whom the data is used

Data Quality

  • Utilize advanced Data Quality Dashboard to gain insights into data quality metrics, trends, and issues across your datasets, enabling proactive data quality management
  • Simplify DQ processes by using ODD's compatibility with Great Expectations and dbt tests
  • Integrate ODD with any custom DQ framework

Reference Data Management (Lookup Tables) - a part of Master Data Management (MDM)

  • Manage and store reference data centrally, ensuring a single source of truth for key data elements such as currency codes, country names, and product categories
  • Easily integrate Lookup Tables with data pipelines and transformations, enhancing data enrichment and validation processes
  • Support data governance and compliance efforts by maintaining accurate and consistent reference data across all data assets

Getting Started

Running as a separate container

Set up the PostgreSQL connection details, for example:

export POSTGRES_HOST=172.17.0.1
export POSTGRES_PORT=5432
export POSTGRES_DATABASE=postgres
export POSTGRES_USER=postgres
export POSTGRES_PASSWORD=mysecretpassword
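
The host, port, and database name set above are combined into a single JDBC URL, which is the value passed as SPRING_DATASOURCE_URL when the container is started. As a minimal sketch, a small shell helper can assemble that URL and fail early when a variable is missing. The function name odd_jdbc_url is hypothetical and is not part of ODD.

```shell
# Hypothetical helper (not part of ODD): builds the JDBC URL from the
# POSTGRES_* connection details and refuses to run if any is missing.
odd_jdbc_url() {
  # Fail early if any required connection detail is unset or empty.
  if [ -z "${POSTGRES_HOST:-}" ] || [ -z "${POSTGRES_PORT:-}" ] || [ -z "${POSTGRES_DATABASE:-}" ]; then
    echo "POSTGRES_HOST, POSTGRES_PORT and POSTGRES_DATABASE must be set" >&2
    return 1
  fi
  # Same layout as the URL used in the docker run command.
  printf 'jdbc:postgresql://%s:%s/%s\n' \
    "$POSTGRES_HOST" "$POSTGRES_PORT" "$POSTGRES_DATABASE"
}
```

With the example values above, the helper could replace the inline URL, for instance as -e SPRING_DATASOURCE_URL="$(odd_jdbc_url)". Checking for empty values up front avoids starting a container that would only fail later when it tries to connect to the database.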

Start a new instance of the platform:

docker run -d \
  --name odd-platform \
  -e SPRING_DATASOURCE_URL="jdbc:postgresql://${POSTGRES_HOST}:${POSTGRES_PORT}/${POSTGRES_DATABASE}" \
  -e SPRING_DATASOURCE_USERNAME="${POSTGRES_USER}" \
  -e SPRING_DATASOURCE_PASSWORD="${POSTGRES_PASSWORD}" \
  -p 8080:8080 \
  ghcr.io/opendatadiscovery/odd-platform:latest

If you are running the platform locally, open http://localhost:8080 in your browser.
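
The container may need a little time before the UI responds. The following is a rough sketch of a polling helper that retries a command until it succeeds. The name wait_until is hypothetical, and the idea that a plain HTTP request to http://localhost:8080 succeeds once the platform is ready is an assumption, not something this README states.

```shell
# Hypothetical helper (not part of ODD): runs a command repeatedly until it
# succeeds or the attempt limit is reached.
# Usage: wait_until <attempts> <delay_seconds> <command> [args...]
wait_until() {
  wu_attempts=$1
  wu_delay=$2
  shift 2
  wu_i=1
  while [ "$wu_i" -le "$wu_attempts" ]; do
    # Run the remaining arguments as the probe command.
    if "$@"; then
      return 0
    fi
    sleep "$wu_delay"
    wu_i=$((wu_i + 1))
  done
  # Every attempt failed.
  return 1
}
```

For example, wait_until 30 2 curl -fsS -o /dev/null http://localhost:8080 would probe the port up to 30 times with a 2-second pause between attempts, and return a non-zero status if the platform never answers.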

Running Locally with Docker Compose

docker-compose -f docker/demo.yaml up -d odd-platform-enricher

Deploying to Kubernetes with Helm Charts

Example configurations

There are various example configurations (via docker-compose) in the docker/examples directory.

Contributing

Contributions to ODD Platform are very welcome. For basic contributions, all you need is to be comfortable with GitHub and Git. The best ways to contribute are:

  • Work on new adapters
  • Work on documentation

To ensure equal and positive communication, we adhere to our Code of Conduct. Before starting any interactions with this repository, please read it and make sure to follow it.

Before contributing, please check out our Contributing Guide and the issues labeled "good first issue".


Integrations

OpenDataDiscovery Platform offers comprehensive data source support to meet your needs.

Existing integrations
Proxy Adapter | Airflow | Airflow 2+
Apache Druid | Cassandra | Clickhouse
Elasticsearch | Hive | Kafka
Feast | MSSQL | MySQL
Microsoft ODBC | MongoDB | Neo4j
MariaDB | Oracle | PostgreSQL
Redshift | Snowflake | Vertica
Tarantool | Athena | DynamoDB
Glue | Kinesis | Quicksight
S3 | SageMaker | SageMaker Featurestore
SQS | Delta lake S3 | Tableau
Cube | SuperSet | PowerBi
Trino | Presto | DBT
Redash | Spark | MLflow
Kubeflow | Databricks Unity Catalog | Great Expectations
SQLite | Couchbase | Cockroachdb
Fivetran | Airbyte | Metabase
Mode | BigQuery | Singlestore

ODD Data Model

ODD operates on the following high-level entity types:

  1. Datasets (collections of data: tables, topics, files, feature groups)
  2. Transformers (transformers of data: ETL or ML training jobs, experiments)
  3. Data Consumers (data consumers: ML models or BI dashboards)
  4. Data Quality Tests (data quality tests for datasets)
  5. Data Inputs (sources of data)
  6. Transformer Runs (executions of ETL or ML training jobs)
  7. Quality Test Runs (executions of data quality tests)

For more information, please check specification.md.

Community Support

Join our community if you need help, want to chat or have any other questions for us:

  • GitHub - Discussion forums and issues
  • Slack - Join the conversation! Get all the latest updates and chat to the devs

Contacts

If you have any questions or ideas, please don't hesitate to drop a line to any of us.

Team Member | LinkedIn | GitHub
German Osin | LinkedIn | germanosin
Nikita Dementev | LinkedIn | DementevNikita
Damir Abdullin | LinkedIn | damirabdul
Alexey Kozyurov | LinkedIn | Leshe4ka
Pavel Makarichev | LinkedIn | vixtir
Roman Zabaluev | LinkedIn | Haarolean

License

ODD Platform uses the Apache 2.0 License.

odd-platform's People

Contributors

adyachenkoprov, alinamiryuk, anatolii-yemets, andreynenashev, anna-eg, artyomyus, chismur, codycrossley, damirabdul, dementevnikita, denisalik, dependabot[bot], esmeneev, evanto, flyspot, germanosin, haarolean, leshe4ka, makonakro, maksimtereshin, marselahmetov, narekmat, nelydavtyan, nikitosinar, r81ack, ramandamayeu, rustamgimadiev, shilgam, vixtir, vladysl


odd-platform's Issues

Bugs: Search and Infinite scroll

Search Page:

  • wrong overflowing at the filters section;
  • results section cut off at the bottom of the page;
  • a search is not created when the client navigates directly to the .../search URL

Infinite scroll:

  • when scrolling through the search results and loading new results, the results "jump";
  • on management page infinite scroll doesn't work correctly.

Misc adjustments according to design.

Prepare local development environment

Local dev env should be easy-to-run and consist of:

  1. docker-compose with ODD Platform, ODD Platform Puller, several available adapters and underlying data sources
  2. scripts that would ingest demonstrative data into the adapters so we could test lineage, DataQA, etc

Backend unit test coverage

Cover the backend code with unit tests at a minimum. Integration and e2e tests are welcome but not mandatory, and will be covered in another issue.

Useful information

Here are some useful links and information if you are new to this repository and the OpenDataDiscovery system.

Documentation

We keep all documentation in our GitBook. It contains all kinds of information, such as
the system overview and use cases (why we are doing this), architecture (how we are doing this), etc.
Please take a look at the Developer Guides section, where we explain how to contribute and
how to set up a local environment.

Contact us

If you're struggling with something, please feel free to ask us anything here in the GitHub issue ticket or via our Slack community.

All Pages: Show content skeleton while it is loading.

GitHub Actions for ODD Platform

Add a GitHub Actions build script for ODD Platform:

  1. Build application on pushes/merges into main branch and package into docker image
  2. Push docker image into the private ECR repository
  3. Deploy application in k8s environment
  4. Add a possibility to build and push docker image into the public Github package docker repository (#16) via manual mechanism (GH Actions manual workflow)
