snowplow / iglu

Iglu is a machine-readable, open-source schema repository for JSON Schema from the team at Snowplow

Home Page: http://www.snowplow.io

License: Apache License 2.0

iglu's Introduction

Snowplow Iglu

Overview

Iglu is a machine-readable, open-source schema repository for JSON Schema from the team at Snowplow. A schema repository (also called a registry) is like npm or Maven or git, but holds data schemas instead of software or code.

Iglu is used extensively in Snowplow. For a presentation on how we came to build Iglu, see this blog post.

Where to start?

The documentation is a great place to learn more.

Would rather dive into the code? Then you are already in the right place!


Iglu technology 101

Iglu architecture

The repository structure outlines the interrelations among the architectural components of Iglu. To briefly explain these components:

  • Common: Common libraries and tools of the Iglu ecosystem.
  • Clients: Iglu clients are used for interacting with Iglu server repos and for resolving schemas in embedded and remote Iglu schema repositories.
  • Repositories: Iglu repositories act as stores of data schemas and can be embedded or hosted over HTTP.
  • Infrastructure: Containers (e.g. terraform-modules) bundling infrastructure-as-code configuration for Iglu Server.

About this repository

This repository is an umbrella repository for all loosely-coupled Iglu components and is updated on each component release.

Since August 2022, all components have been extracted into their own dedicated repositories and remain here as git submodules. This repository serves as an entry point and as a historical artifact.

Common

Clients

Repositories

Infrastructure

Copyright and license

Iglu is copyright 2014-2023 Snowplow Analytics Ltd.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this software except in compliance with the License.

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

iglu's People

Contributors

adatzer, alexanderdean, benfradet, chuwy, fblundun, github-actions[bot], gitter-badger, mandelliant, misterpig, ninjabear, oguzhanunlu, oldpa, rzats, saj1th, stanch, szareiangm

iglu's Issues

Ability to test-publish a schema

test-publish simply runs a new putative schema against the existing test suite to see the results. It doesn't actually publish.

Need to come up with a better name than test-publish.

Add validation only route for schemas

Create a new route to validate that a schema is self-describing. The route would be something along these lines:

/api/schemas/validate?json={}

You would get back the validation failure message in case of failure, or a 200 in case of success.
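A minimal sketch of what such a validation-only check might look like. It assumes a self-describing schema carries a top-level "self" object with vendor, name, format and version keys; the function name, the 400 status for failures and the version pattern are illustrative assumptions, not part of the proposal.

```python
import re

# Assumption: a self-describing schema has a top-level "self" object
# with these four keys. Names and status codes here are illustrative.
REQUIRED_SELF_KEYS = ("vendor", "name", "format", "version")
VERSION_PATTERN = re.compile(r"^\d+-\d+-\d+$")  # e.g. 1-0-0


def validate_self_describing(schema):
    """Return (status_code, message) without storing the schema anywhere."""
    if not isinstance(schema, dict):
        return 400, "schema must be a JSON object"
    self_block = schema.get("self")
    if not isinstance(self_block, dict):
        return 400, "missing 'self' object"
    missing = [k for k in REQUIRED_SELF_KEYS if k not in self_block]
    if missing:
        return 400, "missing self keys: " + ", ".join(missing)
    if not VERSION_PATTERN.match(str(self_block["version"])):
        return 400, "version must look like 1-0-0"
    return 200, "ok"
```

The check never writes anything, which is the point of a validation-only route: the caller learns why a schema would be rejected before attempting to publish it.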

Add company account to entity hierarchy

Currently companies are implicit in the vendors that an API key gives you read/write access to.

Let's expand this out a bit:

  • A superuser can create company accounts
  • A company account can have one or more vendor prefixes
  • TBC - should the read/write API keys be associated with the company account, or tied to a specific vendor prefix?

Note: when registering a vendor prefix with a company account, let's validate that the vendor prefix is not itself a prefix of (or identical to) another vendor prefix in the system.

You can add additional vendor prefixes to a company account, but cannot remove them (because schemas are immutable).
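The two rules above (reject a prefix that is a prefix of, or equal to, an existing one; allow adding but never removing) could be sketched roughly as follows. The class and function names are hypothetical, and plain string comparison is an assumption; a real implementation might compare dot-separated segments instead.

```python
def prefix_conflicts(candidate, registered):
    """Return registered prefixes that start with (or equal) the candidate."""
    return [p for p in registered if p.startswith(candidate)]


class CompanyAccount:
    """Hypothetical company account holding append-only vendor prefixes."""

    def __init__(self, name):
        self.name = name
        self._prefixes = []

    @property
    def prefixes(self):
        # Exposed as a tuple so callers cannot remove entries.
        return tuple(self._prefixes)

    def add_prefix(self, candidate, all_registered):
        conflicts = prefix_conflicts(candidate, all_registered)
        if conflicts:
            raise ValueError(
                "prefix %r conflicts with %s" % (candidate, ", ".join(conflicts))
            )
        self._prefixes.append(candidate)
```

There is deliberately no remove method, since schemas already published under a prefix are immutable.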

Add iglu: {} metadata to JSON Schema

The idea is that when you return a JSON Schema, you get a metadata envelope (at the same level as "schema":).

Initial contents:

"iglu": {
  "createdAt": "<date-time>",
  "modifiedAt": "<date-time>"
}

This metadata should be read-only and not included when posting a new schema.
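As a rough illustration of the envelope, the sketch below wraps a schema with the "iglu" block on the way out and drops it on the way in. The function names are hypothetical, and formatting timestamps with isoformat() is an assumption about what "<date-time>" would contain.

```python
from datetime import datetime, timezone


def with_metadata(schema, created_at, modified_at):
    """Build the response envelope: schema plus read-only iglu metadata."""
    return {
        "schema": schema,
        "iglu": {
            "createdAt": created_at.isoformat(),
            "modifiedAt": modified_at.isoformat(),
        },
    }


def strip_metadata(payload):
    """Drop the read-only iglu block from an incoming POST payload."""
    return {k: v for k, v in payload.items() if k != "iglu"}
```

Stripping rather than rejecting is one possible design; a stricter server could instead refuse any POST that includes the "iglu" key.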

Command-line client for Iglu

Great idea by @BenFradet - essentially would allow you to publish, test and fetch schemas from the command-line.

If we write it in node.js, then a) deployment is really easy (vs Ruby or Python), b) it will be responsive (vs the JVM) and c) we can extract out a node.js client.

Figure out a way of supporting tests in Iglu

Fred wrote some nice tests for the core Snowplow schemas when these were a part of snowplow/snowplow. These can be seen here:

https://github.com/snowplow/snowplow/tree/40a5037563e729c67a922a3e2e67c4e5bb917809/0-common/schemas/jsonschema/tests

Fundamentally, tests divide into:

  • Good tests - pass validation
  • Bad tests - fail validation

So we need to think about how to store tests inside of Iglu. Starter for 10 - what about:

com.snowplowanalytics.self-desc/instance/jsonschema/1-0/tests/good
com.snowplowanalytics.self-desc/instance/jsonschema/1-0-0/tests/bad

The idea is that tests that are good for 1-0-0 must also (by definition) be good for 1-0-1, 1-0-2 etc. Tests which are bad for 1-0-0 could be good for 1-0-1 so there's nothing we can reason about there.
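To make the good/bad split concrete, here is a small sketch of a runner. It takes the validator as a plain callable so that no particular JSON Schema library is assumed; the function name and the shape of the returned failures are illustrative only.

```python
def run_suite(is_valid, good, bad):
    """Run good and bad instances through a validator callable.

    is_valid: function taking an instance and returning True or False.
    Returns a list of (kind, instance) pairs that behaved unexpectedly.
    """
    failures = []
    for instance in good:
        if not is_valid(instance):
            failures.append(("good", instance))  # should have passed
    for instance in bad:
        if is_valid(instance):
            failures.append(("bad", instance))  # should have failed
    return failures
```

An empty result means every good test passed validation and every bad test failed it. Running the 1-0-0 good tests against a putative 1-0-1 with the same function is one way to check the compatibility claim above.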

@fblundun thoughts?

Move to absolute paths in the catalog and schema services

Right now when we retrieve, for example, every version of a schema with:

/vendor/name/format

we get back:

[
  {
    "schema": {},
    "version": "1-0-0"
  },
  {
    "schema": {},
    "version": "1-0-1"
  }
]

I think it'd be better to get back:

[
  {
    "schema": {},
    "location": "/vendor/name/format/1-0-0"
  },
  {
    "schema": {},
    "location": "/vendor/name/format/1-0-1"
  }
]

The same goes when retrieving a single schema:

/vendor/name/format/version

A bit of location metadata would be useful here too, imo:

{
  "schema": {},
  "location": "/vendor/name/format/version"
}
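A sketch of the conversion from the current "version" entries to the proposed "location" entries, assuming the path layout shown above. The helper names are hypothetical.

```python
def location(vendor, name, fmt, version):
    """Build the absolute path for one schema version."""
    return "/" + "/".join([vendor, name, fmt, version])


def to_absolute(vendor, name, fmt, entries):
    """Replace each entry's "version" with an absolute "location"."""
    return [
        {
            "schema": entry["schema"],
            "location": location(vendor, name, fmt, entry["version"]),
        }
        for entry in entries
    ]
```

With absolute locations a client can follow each entry directly, without needing to remember which vendor/name/format request produced the list.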

Add support for multi-get

This is an interesting one.

There are Iglu users who want to fetch schemas from Iglu in the browser. In the absence of an iglu-javascript-client, they are handrolling this support for now.

Their requirement on Iglu server is an interesting one: they will not be allowed to do multiple AJAX requests, one per schema, so instead would like a multi-get. This is where we would come up with a syntax for requesting multiple schemas at a time, and then the response would likely be something like this:

[
  {
    "key": "iglu:xxx/xxx/xxx/xxx",
    "schema": { ... }
  },
  {
    "key": "iglu:xxx/xxx/xxx/xxx",
    "schema": { ... }
  },
  ...
]

Then the Iglu user can cache all of those schemas in LocalStorage.
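One way the server side of a multi-get could assemble that response is sketched below. The in-memory dict standing in for the schema store and the choice to silently skip unknown keys are both assumptions; the request syntax is still an open question.

```python
def multi_get(keys, store):
    """Return one {"key", "schema"} entry per requested key found in store.

    keys:  iterable of Iglu keys such as "iglu:vendor/name/format/version".
    store: mapping from key to schema (a stand-in for the real repository).
    Unknown keys are skipped here; a real server might report them instead.
    """
    return [{"key": k, "schema": store[k]} for k in keys if k in store]
```

Because each entry carries its own key, the browser can write the entries into its cache one by one without tracking request order.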

@BenFradet:

  • Do these requirements make sense to you?
  • Any ideas on the syntax for the multi-GET request?
  • Any changes/suggestions on the syntax for the multi-GET response?
  • Shall we treat this as a separate feature or related to the catalog functionality you are working on?

Add simple ownership model

Basic model:

  • Start with vendors (i.e. don't worry about teams, groups or individual users).
  • Each vendor has exclusive permissions on 0-N vendor prefixes
  • Each read key and each write key belongs to a vendor

Example:

  • Vendor called Snowplow Analytics Ltd
  • Has exclusive permissions on com.snowplowanalytics(.*) as schemas
  • Only read keys belonging to Snowplow Analytics Ltd can see schemas with a com.snowplowanalytics(.*) vendor
  • Only write keys belonging to Snowplow Analytics Ltd can register a schema with a com.snowplowanalytics(.*) vendor
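The model above might translate into a permission check like the following sketch. The ApiKey type, the ownership mapping and the decision to deny access to vendors nobody owns are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApiKey:
    owner: str  # the vendor the key belongs to, e.g. "Snowplow Analytics Ltd"
    kind: str   # "read" or "write"


def owner_of(schema_vendor, ownership):
    """Find which vendor owns a schema vendor string, by prefix match."""
    for owner, prefixes in ownership.items():
        if any(schema_vendor.startswith(p) for p in prefixes):
            return owner
    return None  # nobody owns this prefix


def is_allowed(key, action, schema_vendor, ownership):
    """True if the key's kind matches the action and its owner owns the prefix."""
    return key.kind == action and owner_of(schema_vendor, ownership) == key.owner
```

Here a write key does not imply read access; whether it should is left open by the model.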

Thoughts @BenFradet ?

Decide how to do key generation/distribution

As @BenFradet says:

for now I just stored the api keys in a different table on dynamo i guess it's fine for now?
do you think I should write an uuid generator as a part of the scala serv?
I have no idea how we would go about distributing them though
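For the generation half of the question, a UUID-based generator is short in most languages. The sketch below uses Python's standard uuid module purely as an illustration (the server discussed here is Scala); the function name and the returned record shape are hypothetical, and distribution is not addressed.

```python
import uuid


def generate_key_pair(vendor):
    """Create one random read key and one random write key for a vendor."""
    return {
        "vendor": vendor,
        "read": str(uuid.uuid4()),   # random (version 4) UUID
        "write": str(uuid.uuid4()),
    }
```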

Scala Repo Server: create initial version

Should support:

GET /<schemas>/<vendor>/<name>/<jsonschema>/<version>

and authenticated:

POST /schemas

where the payload to the POST determines where the schema ends up.
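The two routes can be modelled in memory as below, to show how the POST payload could determine where a schema ends up. This is an illustration in Python rather than the Scala server itself; the "self" block, the "/schemas" path prefix, the status codes and the class name are assumptions.

```python
class SchemaRepo:
    """In-memory stand-in for the repo server's two routes."""

    def __init__(self, write_keys):
        self._write_keys = set(write_keys)
        self._schemas = {}

    def post(self, api_key, schema):
        """Authenticated POST /schemas: the payload's self block picks the path."""
        if api_key not in self._write_keys:
            return 401, None
        path = "/schemas/{vendor}/{name}/{format}/{version}".format(**schema["self"])
        self._schemas[path] = schema
        return 201, path

    def get(self, path):
        """GET by full path: returns (200, schema) or (404, None)."""
        if path in self._schemas:
            return 200, self._schemas[path]
        return 404, None
```

A caller posts a schema and then fetches it back from the returned path, which is derived entirely from the payload rather than from the request URL.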

Add support for JSON Table

JSON Tables:

  1. Should be available under com.myvendor/mytype/jsontable/1-0-0
  2. Should be validated against the JSON Schema for JSON Table Schemas (haven't found one yet)
