
grafana / thema


A CUE-based framework for portable, evolvable schema

License: Apache License 2.0

CUE 6.70% Go 93.29% Makefile 0.01%
openapi openapi3 cue cuelang grafana schema versioning packaging logic-programming config

thema's Introduction

Thema

Thema is a system for writing schemas. Much like JSON Schema or OpenAPI, it is general-purpose and its most obvious application is as an IDL. However, those systems treat changing schemas as out of scope: a single version of a schema for some object is the atomic unit, and versioning is left to opaque strings in external systems like git or HTTP. Thema, by contrast, makes schema change a first-class system property: the atomic unit is the set of schema for some object, iteratively appended to over time as requirements evolve.

Thema's approach is novel, so an analogy to the familiar may help. "Branching by abstraction" suggests that you refactor large applications not with long-running VCS branches and big-bang merges, but by letting old and new code live side-by-side on main, and choosing between them with logical gates, like feature flags. Thema is "schema versioning by abstraction": all versions of a schema live side-by-side on main, within logical structures Thema defines.

This holistic view allows Thema to act like a typechecker, but for change-safety between schema versions: either schema versions must be backwards compatible, or there must exist logic to translate a valid instance of schema from one schema version to the next. CUE, the language in which Thema schemas are written, allows Thema to mechanically verify these properties.

These capabilities make Thema a general framework for decoupling the evolution of communicating systems. This can be outward-facing: Thema's guardrails allow anyone to create APIs with Stripe's renowned backwards compatibility guarantees. Or it can be inward-facing: Thema lets you change the messages passed in a mesh of microservices without intricately orchestrating deployment.

Learn more in our docs, or in this overview video! (Some things have been renamed since that video, but the logic is unchanged.)

Usage

Thema defines the way schemas are written, organizing each object's history into a "lineage." Once authored, Thema also provides tools for working with lineages via a few basic operations. There are a few different usage patterns, all largely equivalent in capability:

  • CLI: a CLI command that provides access to Thema's basic operations, one lineage per invocation. Use it for fast exploration and testing of schemas, or as a tool in CI.
  • Server: An HTTP server that provides access to Thema's basic operations for a configurable set of lineages. Run it as a stateless sidecar in your infrastructure or microservice mesh.
  • Library: a library, importable in your application code, that provides a convenient interface to Thema's basic operations, as well as helpers for common usage patterns. Naturally the most flexible, and the recommended approach for creating new helpers, such as code generators, API generators, or a whole Kubernetes operator framework. (Currently only for Go; see footnote 1.)

The CLI and server modes are bundled together in the thema command. To install:

go install github.com/grafana/thema/cmd/thema@latest

Maturity

Thema is a young project. The goals are large, but bounded: we will know when the core system is complete. And it mostly is, now - though some breaking changes to how schemas are written are planned before reaching stability.

It is not yet recommended to replace established, stable systems with Thema, but experimenting with doing so is reasonable (and appreciated!). For newer projects, Thema may be a good choice today; the decision is likely to come down to whether the long-term benefit of a simpler architecture for authoring, composing and evolving schema will offset the short-term cost of some incomplete functionality and breaking changes.

Prior/Related Art

A number of systems partially overlap with Thema - for some data, rolling together a set of schema with the relations between those schema.

  • Project Cambria - Thema's closest analogue. Limited in verifiability by (intentionally) being without a notion of linear schema ordering and versioning, and because schema and translations are written in a Turing complete language (Typescript).
  • Kubernetes resources and webhook conversions - Similar goals: multiple versions of resources (schema) and convertibility between them. Limited in verifiability by relying on convention for grouping schemas, and by expressing translation in a Turing complete language (Go).
  • Stripe's HTTP API - exhibits the backwards compatibility properties an API can have that arise from a schema system with translatability.

Footnotes

  1. Using Thema as a library in a language depends on a CUE evaluator for that language. Currently, the only CUE evaluator is written in Go.

thema's People

Contributors

agnestoulet, dependabot[bot], grafanawriter, ifsentient, ishanjainn, joanlopez, joeblubaugh, k-phoen, kylebrandt, radiohead, sdboyer, sh0rez, spinillos, undef1nd


thema's Issues

Change `UnwrapCUE()` to `Underlying()`

It doesn't feel like there's an ideal name here, but UnwrapCUE() is using "wrap" in a way that just doesn't feel right.

Also, i don't like staring at CUE all over my code. It's yelly.

Underlying() seems better, and vaguely reminiscent of Go's notion of the type underlying an interface, which at least feels less wrong than Unwrap. Open to suggestions, though. For the limited window before i just do this :)

@IfSentient this change is coming, and it's obviously breaking. It'll be a trivial refactor though, just a method rename.

Create standard approach to migrating to Thema from other schema systems

A common case that's repeatedly come up is the need to gracefully go from some other schema system into Thema - an onramp of sorts.

Technically, this can be done with the 0.0 schema, and the "real" Thema schemas start on 1.0, but that's not great, as you still have to write a reverse lens back to 0.0, people could be confused and try to add a 0.1, and everyone has to just know in perpetuity to never try to translate back to that version.

It seems reasonable to me that lineages allocate a special, optional space for creating an optional onramp key, which need not follow joinSchema, and can be used to capture all the weird and warty former versions of an object - and then provide a forward-only lens. (Once the data's in, there's no going back.)

Add constraints to `#Lineage.name`

At minimum, we want:

  • strings.MinRunes(1)
  • =~"^[a-zA-Z0-9_]+$"

Probably the ideal target is the set of valid characters for an unquoted CUE label. That covers another known, future use case by disallowing slashes - important to be able to introduce e.g. an optional #Lineage.uri property later, and enforce that its trailing element is == #Lineage.name and be able to know a priori that there is only one trailing element.

Again, we can open this up more later. Opting for restrictiveness initially gives us more options later.
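The two minimum constraints above can be mirrored on the Go side. The following is a minimal sketch only: checkLineageName and validName are hypothetical names, not part of Thema, and the sketch assumes the character class is meant to match the whole name.

```go
package main

import (
	"fmt"
	"regexp"
	"unicode/utf8"
)

// validName is an assumed pattern: the whole name must consist of ASCII
// letters, digits, and underscores. Slashes are rejected as a consequence.
var validName = regexp.MustCompile(`^[a-zA-Z0-9_]+$`)

// checkLineageName is a hypothetical helper, not a Thema API. It applies the
// two minimum constraints listed in the issue.
func checkLineageName(name string) error {
	if utf8.RuneCountInString(name) < 1 {
		return fmt.Errorf("lineage name must contain at least one rune")
	}
	if !validName.MatchString(name) {
		return fmt.Errorf("lineage name %q may only contain letters, digits, and underscores", name)
	}
	return nil
}

func main() {
	for _, n := range []string{"dashboard", "my_lineage2", "", "a/b"} {
		fmt.Printf("%q -> %v\n", n, checkLineageName(n))
	}
}
```

Starting with a narrow pattern like this keeps the option of widening it later without invalidating existing names.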

Make Thema errors excellent: clear, actionable, informative

For anyone not terribly interested in learning about the details of CUE and just wanting Thema to be a schema framework (our core target use case), validation error output is nearly indecipherable for anything outside of trivial cases. I laid out some examples in this grafana issue: grafana/grafana#37859

While there remain underlying things that CUE itself probably ought to fix - in particular, eliminating irrelevant disjunction branches in the presence of a discriminator field - we can actually probably do most or all of that directly in Thema. When it comes to generating comprehensible error messages, our job will always be easier, because we know enough about our inputs to be able to treat one input as schema, and the other as data compared against schema.

This can be done in stages. First will be just getting the basic scalar/enum-type validation failures improved. After that, we can layer on handling for more complex structures, like complex struct disjunctions.

Note that #8 has the potential to make those disjunctions easier - if discriminator fields are a property we can discern from the way that composition itself is declared, then we have all we need to do branch elimination.

  • #45
  • Actually write some test suites, yikes
  • Handle disjunctions

Add flag to tests that allows disabling `SkipBuggyChecks`

i'm quite happy with how the SkipBuggyChecks() option allows us to write Thema code that will automatically gain guarantees as the problems in the lib/CUE are resolved. However, there needs to be a flag that can be passed to go test which causes the option to be ignored. That way, we can trivially see which things are actually buggy without having to edit code.

Some simple package variable trickery ought to be sufficient.
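One way the package variable trick might look is sketched below. Everything here is a stand-in: the flag name force-buggy-checks, the forceBuggyChecks variable, and the bindConfig internals are assumptions made for illustration, and the SkipBuggyChecks body is not Thema's real implementation.

```go
package main

import (
	"flag"
	"fmt"
)

// forceBuggyChecks is a hypothetical package variable. When true, the
// SkipBuggyChecks option is ignored and every check runs.
var forceBuggyChecks bool

// bindConfig and BindOption are simplified stand-ins for option plumbing.
type bindConfig struct{ skipBuggyChecks bool }

type BindOption func(*bindConfig)

// SkipBuggyChecks mirrors the option named in the issue. In this sketch it
// consults the package variable at the moment the option is applied.
func SkipBuggyChecks() BindOption {
	return func(c *bindConfig) {
		c.skipBuggyChecks = !forceBuggyChecks
	}
}

// newConfig applies the given options to a zero-valued config.
func newConfig(opts ...BindOption) bindConfig {
	var c bindConfig
	for _, o := range opts {
		o(&c)
	}
	return c
}

func main() {
	// The flag name is an assumption, not an existing Thema flag.
	flag.BoolVar(&forceBuggyChecks, "force-buggy-checks", false,
		"ignore SkipBuggyChecks and run every check")
	flag.Parse()
	fmt.Println("skipping buggy checks:", newConfig(SkipBuggyChecks()).skipBuggyChecks)
}
```

In a test package, the flag would be registered at package level so that go test can pass it through to the test binary, and no code needs editing to see which checks are actually buggy.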

Put final state of tutorials into easily-fetched form

The tutorials are quite helpful, but they introduce too much code, in too many broken-up parts, to reasonably expect the user to copy/paste or otherwise recreate it all themselves. We need to provide a clonable repo, or some subdirectory within this repo, from which folks can fetch the final state.

Create a `thema` command

It's worth making a thema command. Maybe two, one for CLI and one for server, but it feels like the functionality here is so compact that it would be fine to bundle the two together, just to simplify distribution.

On the CLI side, it would:

  • Given some raw CUE, try to pull a lineage out of it and validate it (necessarily done by all the following commands)
  • Given a valid lineage and some data input, Validate against a specified version
  • Given a valid lineage and some data input, ValidateAny to search for any schema in a specified lineage that validates the data
  • Given a valid lineage and some data input, Translate to a specified version of a specified lineage if the input validates against any schema
  • Given a valid lineage and a specified version, output the schema as e.g. OpenAPI

The same functionality would be offered via a server, but exposed as an HTTP API, and would allow specifying multiple lineages at a time. (That would initially be done via config, but could also have a special endpoint for ingesting new lineages.) This is easy to namespace, because lineages intrinsically require a name, so we can split up the HTTP API by that name, e.g. /<lin.name>/validateany takes a POST with data, with http headers determining content type of input and output. The only state depended on is the input lineage, which will make

It may be worth writing another kernel or two for most of these behaviors...or not, because really, it's just a couple method calls.
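The namespacing idea for the server can be sketched with the standard library alone. This is a rough illustration under stated assumptions: the validator type is a hypothetical stand-in for "does any schema in this lineage accept the data", the status codes are one possible choice, and content-type negotiation via headers is left out.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// validator is a hypothetical stand-in for a ValidateAny-style operation on
// one lineage. It returns the matching version and whether a match was found.
type validator func(data []byte) (version string, ok bool)

// newServer routes POST /<lineage name>/validateany to the validator
// registered under that lineage name.
func newServer(lineages map[string]validator) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		parts := strings.Split(strings.Trim(r.URL.Path, "/"), "/")
		if len(parts) != 2 || parts[1] != "validateany" {
			http.NotFound(w, r)
			return
		}
		v, found := lineages[parts[0]]
		if !found {
			http.Error(w, "unknown lineage", http.StatusNotFound)
			return
		}
		if r.Method != http.MethodPost {
			http.Error(w, "POST required", http.StatusMethodNotAllowed)
			return
		}
		body, err := io.ReadAll(r.Body)
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		version, ok := v(body)
		if !ok {
			http.Error(w, "no schema validates the input", http.StatusUnprocessableEntity)
			return
		}
		fmt.Fprintf(w, "valid against %s\n", version)
	})
}

// post exercises the handler in-process and returns the response status.
func post(h http.Handler, path, body string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodPost, path, strings.NewReader(body)))
	return rec.Code
}

func main() {
	h := newServer(map[string]validator{
		// A toy validator: accepts anything that mentions a "name" key.
		"ship": func(data []byte) (string, bool) {
			return "0.0", strings.Contains(string(data), `"name"`)
		},
	})
	fmt.Println(post(h, "/ship/validateany", `{"name":"Rocinante"}`))
	fmt.Println(post(h, "/ship/validateany", `{}`))
	fmt.Println(post(h, "/other/validateany", `{}`))
}
```

Because the handler holds nothing except the lineage map it was given, it stays stateless in the sense the issue describes.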

Make [de]hydration/default trimming a part of Thema's API

Some time ago, @ying-jeanne wrote logic to trim defaults out of a Thema schema (back when Thema was still inside Grafana and called Scuemata). The basic goal there is very simple: given a valid instance of a schema, produce a new instance of the schema that has all default values {present,absent}. I copied that logic into Thema when initially setting all this up, but kept it un-exported and haven't actually attached it to the API.

It's time to make that a proper part of the Thema API. I think it makes the most sense to attach it to the Instance type as a method with some rendering options, though i may change my mind once i get in there.
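To make the {present,absent} goal concrete, here is a minimal sketch of the two directions over flat maps. It is an illustration only: dehydrate and hydrate are hypothetical names, the defaults are passed in by hand rather than read from a schema, and nested structures are not handled.

```go
package main

import (
	"fmt"
	"reflect"
)

// dehydrate returns a copy of inst without the fields whose value equals the
// corresponding default. It handles top-level fields only.
func dehydrate(inst, defaults map[string]interface{}) map[string]interface{} {
	out := make(map[string]interface{}, len(inst))
	for k, v := range inst {
		if d, ok := defaults[k]; ok && reflect.DeepEqual(v, d) {
			continue
		}
		out[k] = v
	}
	return out
}

// hydrate returns a copy of inst with defaults filled in for absent fields.
// Values already present in inst win over defaults.
func hydrate(inst, defaults map[string]interface{}) map[string]interface{} {
	out := make(map[string]interface{}, len(inst)+len(defaults))
	for k, d := range defaults {
		out[k] = d
	}
	for k, v := range inst {
		out[k] = v
	}
	return out
}

func main() {
	defaults := map[string]interface{}{"refresh": "30s", "editable": true}
	full := map[string]interface{}{"title": "CPU", "refresh": "30s", "editable": false}

	trimmed := dehydrate(full, defaults)
	fmt.Println("dehydrated:", trimmed)
	fmt.Println("rehydrated:", hydrate(trimmed, defaults))
}
```

For a valid instance, hydrating a dehydrated copy should give back the original, which is a useful property to assert in a test suite.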

Some follow-up TODOs

  • Add an option to de/hydrate output to thema data translate command
  • Munge the src name of de/hydrated Instances
  • Add test suite

Add `thema lineage gen` subcommand to CLI

Experience with using Thema within Grafana Labs has clearly shown that having a strong code generation story is the way to get to dopamine fastest with Thema.

Now that we have generics and in particular ConvergentLineage, it's feasible to fashion a simple, opinionated code generator as part of the CLI - thema lineage gen - that should be sufficient for anyone to:

  • Write a lineage
  • thema lineage gen ... to create Go bindings with a ConvergentLineage
  • Get a vmuxer with a single line of handwritten code, and put it to work immediately

Having this will also mean we can eliminate 80% of the existing tutorials, as all of those needs will be automated away.

Initial targets for cli codegen: openapi, json schema, go, typescript

#71 (comment)

Write the invariants doc

It's essential that we communicate clearly about what the invariants of Thema are. They are the truly unique value proposition of the system, and they're referred to all over the place. Being able to learn them, and therefore reason about the guarantees, is a key design premise. As such, they have to be documented, succinctly, all in one place.

This doc is the placeholder for where it all needs to go.

Figure out how to enforce that sequences must be non-empty

We want to treat a lineage as invalid if it contains an empty sequence (zero schema declarations). That's the intent of this bit:

thema/lineage.cue

Lines 31 to 35 in b7b4330

// A Sequence is a non-empty ordered list of schemas, with the property that
// every schema in the sequence is backwards compatible with (subsumes) its
// predecessors.
// #Sequence: [...joinSchema]
#Sequence: [...joinSchema] & list.MinItems(1)

However, this kinda makes everything blow up. Like, with just the above declaration, this happens:

$ cd exemplars
$ go test
--- FAIL: TestExemplarValidity (0.00s)
    --- FAIL: TestExemplarValidity/Bind-defaultchange (0.00s)
        exemplars_test.go:40: defaultchange.l.#Sequence: invalid value [] (does not satisfy list.MinItems(1))
    --- FAIL: TestExemplarValidity/Bind-expand (0.00s)
        exemplars_test.go:40: expand.l.#Sequence: invalid value [] (does not satisfy list.MinItems(1))
    --- FAIL: TestExemplarValidity/Bind-narrowing (0.00s)
        exemplars_test.go:40: narrowing.l.#Sequence: invalid value [] (does not satisfy list.MinItems(1))
    --- FAIL: TestExemplarValidity/Bind-rename (0.00s)
        exemplars_test.go:40: rename.l.#Sequence: invalid value [] (does not satisfy list.MinItems(1))
    --- FAIL: TestExemplarValidity/Bind-single (0.00s)
        exemplars_test.go:40: single.l.#Sequence: invalid value [] (does not satisfy list.MinItems(1))

cue eval also fails with the same kind of error.

(We actually only make it that far if this is commented out, but the above is more meaningful)

To keep all the bad from happening, i've added this so that the list.MinItems() constraint is always satisfied:

thema/lineage.cue

Lines 31 to 46 in b7b4330

// A Sequence is a non-empty ordered list of schemas, with the property that
// every schema in the sequence is backwards compatible with (subsumes) its
// predecessors.
// #Sequence: [...joinSchema]
#Sequence: [...joinSchema] & list.MinItems(1)
// This exists because constraining with list.MinItems(1) isn't able to
// tell the evaluator that it is always safe to reference #Sequence[0],
// resulting in lots of garbage errors.
//
// Unfortunately, this allows empty lineage declarations by making the first
// schema an actual joinSchema, which we do not want to be valid text for
// authors to write.
//
// TODO figure out how to express the constraint without blowing up our Go logic
#Sequence: [joinSchema, ...joinSchema]

But clearly, that runs against the goal of making an instance of #Lineage invalid if it contains an empty sequence.

My relatively middling skill with CUE is really the limit here, i think 😢. i'm just not sure how to express what i want: that an instance of #Lineage is only valid if it contains non-empty sequences. My attempts to do so seem to be getting caught up in how the #Lineage declaration itself is invalid because it does not, itself, satisfy the invariants. Which it shouldn't!

kernel: Go type incorrectly fails validation against schema

Currently, this Go type:

type Example struct {
    Enum string `json:"enum"`
    FlagMap map[string]bool `json:"flagmap"`
}

will fail validation, as performed in kernel.NewInputKernel(), against this Thema schema (lineage scaffolding omitted):

{
    enum: "foo" | "bar"
    flagmap: [string]: bool
}

We don't want that.

We may have to work a bit at getting this really correct, but i think the basic desired check is, "will all valid instances of the schema be assignable to the Go type, and would a round-trip result in the same input." That does likely mean - as in the enum case above - that the Go type will allow some values that the schema does not. But, even with the impending arrival of generics in Go, i see no way around that - CUE is simply a much more precise language for expressing bounds and controls over value spaces.
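The round-trip half of that check can be illustrated for a single instance using plain JSON. This is a sketch under assumptions: roundTrips is a hypothetical helper, it tests one concrete input rather than all valid instances of a schema, and it does not involve CUE at all.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// Example is the Go type from the issue.
type Example struct {
	Enum    string          `json:"enum"`
	FlagMap map[string]bool `json:"flagmap"`
}

// roundTrips reports whether decoding input into Example and encoding it
// again yields JSON equivalent to the input.
func roundTrips(input []byte) (bool, error) {
	var ex Example
	if err := json.Unmarshal(input, &ex); err != nil {
		return false, err
	}
	out, err := json.Marshal(ex)
	if err != nil {
		return false, err
	}
	// Compare as generic values so key order and whitespace do not matter.
	var before, after interface{}
	if err := json.Unmarshal(input, &before); err != nil {
		return false, err
	}
	if err := json.Unmarshal(out, &after); err != nil {
		return false, err
	}
	return reflect.DeepEqual(before, after), nil
}

func main() {
	inputs := []string{
		`{"enum":"foo","flagmap":{"a":true}}`,
		`{"enum":"baz","flagmap":{}}`, // outside the schema's enum, still round-trips
		`{"enum":"foo","flagmap":{"a":true},"extra":1}`,
	}
	for _, in := range inputs {
		ok, err := roundTrips([]byte(in))
		fmt.Println(in, "->", ok, err)
	}
}
```

The second input shows the looseness the issue anticipates: the Go type happily carries an enum value that the schema would reject.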

Convert `thema.Library` to `*thema.Runtime`

thema.Library should be renamed to thema.Context. Its methods should also be converted to requiring a pointer, thereby requiring all existing function signatures to switch to *thema.Context.

I originally named it Library because i thought this was a nice nod to how we basically load a bunch of CUE "functions" in from the thema CUE package, then call them from the various Go methods within thema. And i wanted it to be a value, not a pointer, because i was hoping to make the zero value useful, similar to e.g. sync.Mutex.

However:

  • Library contains a cue.Context, and shuffling that thing around has ended up feeling like the main job of Library when actually using the Thema go package.
  • I'm not thrilled with calling it thema.Context because it sorta muddies the water by not having cancellation capabilities. But my real objection there is with how Go stdlib already muddied those waters by conflating variable bags together with cancellation. Asking people to learn a new term, Library, is a heavy lift for just resolving that ambiguity. Better to just fall in line with CUE itself and use thema.Context.
  • Implementing thema within Grafana has made it pretty clear that allowing an explicit nil is preferable for centralizing use of a single shared context. Adding nil to the accepted value space of a function signature allows the caller to make it clear that they are not specifying a value, and expecting the func impl to grab the central one instead. This could be accomplished without pointers by passing the zero value - thema.Library{} - but that feels like undocumented magic; expressing the absence of a value is, unfortunately, one of the main use cases for nil pointers in Go.

Thus: thema.Library -> *thema.Context.
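The nil-means-shared convention argued for above might look roughly like this. The sketch uses a hypothetical Context type with a single field and hypothetical helper names; it is not the shape of Thema's actual type.

```go
package main

import "fmt"

// Context is a hypothetical stand-in for the proposed pointer type.
type Context struct{ name string }

// shared is the single central context that callers get when they pass nil.
var shared = &Context{name: "shared"}

// orShared returns ctx, or the central context when the caller passed nil.
func orShared(ctx *Context) *Context {
	if ctx == nil {
		return shared
	}
	return ctx
}

// describe stands in for any function that accepts an optional context.
func describe(ctx *Context) string {
	return "using " + orShared(ctx).name + " context"
}

func main() {
	fmt.Println(describe(nil))
	fmt.Println(describe(&Context{name: "custom"}))
}
```

Passing nil makes the caller's intent explicit at the call site, which is the property a zero-value struct cannot express as clearly.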

Make `thema` CLI dynamically support `github.com/grafana/thema` imports

Currently, running thema anywhere other than inside the thema repo itself, or inside a cue.mod module context that has the github.com/grafana/thema codebase within its cue.mod/pkg dir, will result in

Error: import failed: cannot find package "github.com/grafana/thema"

This is clearly a big problem, as it means the thema CLI only works when the user has already set up their fs "correctly"...in a way that is difficult to even explain how to do correctly.

Known issues

Thema has a number of outstanding issues to address. Some are more critical than others, and not all need be addressed before we can call Thema stable. Each one merits its own discussion, but i'm going to use this as a scrappy list centralized in a single, easily-linked-to issue as a TODO list until time allows to get each one written up properly.

  • Adding generic composition, which we've already halfway genericized in Grafana
  • A clear, consistent answer for how object headers (at least the thema version, but possibly more) are attached to schema, and the degrees of freedom for individual lineage authors to choose their object headers
  • Settle on openness/closedness for various components in the system
  • permissibility of creating outside-the-lineage references
  • URIs/a registry/something; is this independent of CUE packaging rules
  • Invariant for the concreteness of a lens relation
  • Is it a mistake to treat lenses as mutable-after-publishing? The tradeoff seems to be right - it's too brittle to assume the translation mechanism is bug-free, because that would mean people could paint themselves into dead-ends. But it also complicates the publishing model
  • nonlinear translation/lens pathing, to avoid more dead-end painting. A good impl here could change the calculus on lens immutability
  • Naming for "sequence number, schema number"
  • really, really need to have a name/uri field for these. But how do we square that with CUE package management?
  • a mechanism for simplifying copying most fields in lens mappings
  • special mechanism for schematizing fields that are expected to contain (string?) references to other thema lineage
  • Lens shape, does it need refactoring to support nuanced default translation
  • need a mode for loading in Go that can ignore certain constraint violations...maybe? this is the list.MinItems(1) is invalid problem. Also, how can we make the non-zero list length available for pseudofuncs that work on a lineage
  • Express all invariants in pure CUE

s/Lacunae/Lacunas/

This was cute as an idea, but Latin pluralization is just not actually fun to have around all the time.

Rewrite initial Thema tutorial to focus on just basic schema authoring, then pivot to codegen right away

Thema's existing tutorial (parts 1, 2, and 3) is outdated and overly verbose.

All of part 2 and most of part 3 are now pretty much done by codegen (thema lineage gen). thema lineage gen gobindings with --bindtype even goes a step beyond part 3, thanks to generics and the new vmux system.

I think a new pass at the tutorials should focus on getting a basic thema-based dev loop going: write a schema, update types and bindings in Go and TS, test with some data.

Subcommands within thema lineage init, thema lineage gen, thema data validate should be sufficient for all this. We can, for now, ignore multiple schemas and lenses.

Some of what's in the existing tutorials could probably be reused, but if we just started from scratch, i wouldn't be unhappy 😄

Actually stand up a docs site

The raw docs under docs/ are passable to consume in github, but what's really needed is a docs site.

https://github.com/bep/docuapi looks like it'd be promising. The tutorials, in particular, have text that ought to appear adjacent to code, rather than above or below it.

Panic when attempting backwards compatibility verification between schemas

Currently, CUE (v0.4.0) panics on attempting to call Subsume to verify backwards [in]compatibility between schemas as part of BindLineage. This means we can't do our backwards compatibility verification within/between sequences - the most basic check Thema promises.

It seems to be tied to the cue.Value created by the Go CUE library's iterators, but i haven't narrowed it down yet. I'm putting up a CUE issue once i've isolated it at least a bit.

Use "major version" and "minor version" for syntactic versions

Thema's versioning system has two digits, e.g. v0.0 - what people coming from semver would be naturally inclined to call a "major version" and a "minor version."

i resisted this naming initially, for reasons i can't even remember. Instead, i've been calling the first number "sequence version" and the second "schema version." But this was an error. For one, "schema version" is confusingly ambiguous - is that the second number, or the version number as a whole? And second, the cost of asking people to learn new terms for version number positions is significantly higher than the risk of having folks transfer possibly-slightly-incorrect associations from semver.

Putting me over the top on this was the fact that OpenMetadata landed on major.minor with exactly the same semantics.

So - all docs need to be updated to use major and minor where appropriate. Variables and fields should be renamed from e.g. seqv and schv to majv and minv.
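A small parser shows how the major/minor naming reads in code. This is a sketch with hypothetical names: version and parseVersion are not Thema's API, and accepting an optional leading "v" is an assumption based on the v0.0 example above.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// version is a hypothetical two-part version using the proposed naming.
type version struct {
	Major uint
	Minor uint
}

// String renders the version in major.minor form.
func (v version) String() string {
	return fmt.Sprintf("%d.%d", v.Major, v.Minor)
}

// parseVersion parses "major.minor", tolerating an optional leading "v".
func parseVersion(s string) (version, error) {
	parts := strings.Split(strings.TrimPrefix(s, "v"), ".")
	if len(parts) != 2 {
		return version{}, fmt.Errorf("want major.minor, got %q", s)
	}
	majv, err := strconv.ParseUint(parts[0], 10, 32)
	if err != nil {
		return version{}, fmt.Errorf("bad major version in %q: %w", s, err)
	}
	minv, err := strconv.ParseUint(parts[1], 10, 32)
	if err != nil {
		return version{}, fmt.Errorf("bad minor version in %q: %w", s, err)
	}
	return version{Major: uint(majv), Minor: uint(minv)}, nil
}

func main() {
	for _, s := range []string{"v0.0", "1.2", "1", "1.x"} {
		v, err := parseVersion(s)
		fmt.Println(s, "->", v, err)
	}
}
```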

Doesn't Build (Or Go get) at least with go1.17

# github.com/grafana/thema
./trim.go:15:40: r.val undefined (type Instance has no field or method val)
./trim.go:25:18: unknown field 'val' in struct literal of type Instance
./trim.go:107:43: r.val undefined (type Instance has no field or method val)
./trim.go:117:18: unknown field 'val' in struct literal of type Instance

(Same with go get)

@ 9013ce8

Improve subsumption/invariant checking error output

In #45, we added a basic error reprocessing layer that leveraged the knowledge we have about the schema<>instance relationship to make validation error messages clearer and contextualized for Thema.

Similar work now needs to be done for error messages coming out of invariant checking.

Actually implement `thema srv`

The README and CLI help docs talk a big game about having an HTTP server mode, but it's not yet implemented. Do that.

There's not a lot of mystery, here - it's very much what the CLI mode does, just moved over to an HTTP query structure.

Constrain sequence lenses so that an absent declaration is not valid

A key invariant of scuemata is that authors must define lenses for all sequences after the first. While it's impossible to generically verify the completeness of the translations in these lenses, or the lacunae they may emit, the bare minimum here is erroring if the author doesn't define a lens, at all.

Unfortunately, when we take the #Lens definition

thema/lineage.cue

Lines 48 to 67 in b7b4330

#Lens: {
    // The last schema in the previous sequence; logical predecessor
    ancestor: joinSchema
    // The first schema in this sequence; logical successor
    descendant: joinSchema
    forward: {
        to: descendant
        from: ancestor
        rel: descendant
        lacunas: [...#Lacuna]
        translated: to & rel
    }
    reverse: {
        to: ancestor
        from: descendant
        rel: ancestor
        lacunas: [...#Lacuna]
        translated: to & rel
    }
}

and reference it directly in seqs where we want lens instances to appear:

thema/lineage.cue

Lines 72 to 80 in b7b4330

seqs: [
    {
        schemas: #Sequence
    },
    ...{
        schemas: #Sequence
        lens: #Lens
    }
]

it means that a lineage is still valid even when the author omits the field. The questions are:

  • What's a reasonable thing to force a constraint failure on? Should #Lens itself be refactored so that it's more amenable to doing this?
  • How do we square wanting this to be incomplete with also wanting to provide helpers that reduce some of the boilerplate of writing lenses?

Maybe this is just yet another place where we need a must() check (cue-lang/cue#943), particularly to emit the right error.

Note that this issue is targeting a relatively minimal check - to climb up to our true-goal invariant, in addition to basic lens existence, we have to be certain that a) the Lens connects the end of the prior sequence to the head of its current sequence, and b) that given a concrete input valid wrt the predecessor schema, the rel of both forward and reverse lenses is sufficient to produce a concrete output of the successor schema.
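Until the constraint can be expressed in CUE, the minimal existence check could in principle be done on the Go side after loading. The sketch below swaps in that different technique and uses a hypothetical in-memory model (sequence, lens, checkLenses); it does not reflect how Thema actually represents lineages in Go.

```go
package main

import "fmt"

// lens is a placeholder for a declared lens. Its contents are irrelevant to
// the existence check.
type lens struct{}

// sequence is a hypothetical in-memory model of one entry in seqs.
type sequence struct {
	schemas []string
	lens    *lens // nil when the author declared no lens
}

// checkLenses enforces only the minimal rule: every sequence after the first
// must declare a lens. It says nothing about the lens being correct.
func checkLenses(seqs []sequence) error {
	for i, s := range seqs {
		if i > 0 && s.lens == nil {
			return fmt.Errorf("sequence %d: must declare a lens from the preceding sequence", i)
		}
	}
	return nil
}

func main() {
	ok := []sequence{
		{schemas: []string{"0.0"}},
		{schemas: []string{"1.0"}, lens: &lens{}},
	}
	missing := []sequence{
		{schemas: []string{"0.0"}},
		{schemas: []string{"1.0"}},
	}
	fmt.Println(checkLenses(ok))
	fmt.Println(checkLenses(missing))
}
```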

Change `#Lineage.joinSchema` from `_` (top) to open struct with version

#Lineage.joinSchema defines the join/least upper bound that must be maintained by all schemas in a lineage. Currently, it starts as top, _:

thema/lineage.cue

Lines 11 to 24 in b7b4330

#Lineage: {
    // joinSchema governs the shape of schema that may be expressed in a
    // lineage. It is the least upper bound, or join, of the acceptable schema
    // value space; the schemas defined in this lineage must be instances of the
    // joinSchema.
    //
    // In the base case, the joinSchema is unconstrained/top - any value may be
    // used as a schema.
    //
    // A lineage's joinSchema may never change as the lineage evolves.
    //
    // TODO should it be an open struct rather than top?
    // TODO can this be a def? should it?
    joinSchema: _

This isn't great:

  • It makes injecting a themaVersion field complicated, since we can't universally assume all schemas are a struct at base (could be scalar or list)
    • We can't even assume that all schemas are structs within a lineage
  • It's plausible that other bits of thema's helper CUE logic may be subtly assuming a struct type here as well
  • I don't see a use case for scalars or lists as a base schema type, for essentially the same reasons that

I'm sure i want it to be an open struct, {...}. i was worried about open/closedness here, but after some poking on the playground, i think we can make the shift without forcing any decisions on openness, which is great - that's a can of worms.

I don't see an immediate reason why this change would be anything other than changing just the one line - i'm mostly writing this issue up for posterity and as a place to dump that link to the playground. (also have it in a gist)

Change "Kernel" to "Mux"

As i was working on a larger design doc encompassing Thema, it occurred to me that a good way of describing a main intended usage pattern for Thema's versioning system is as a "version multiplexer".

From wikipedia:

A multiplexer...also known as a data selector, is a device that selects between several analog or digital input signals and forwards the selected input to a single output line.

This is a totally fair description of what Thema's kernels do. Just, rather than doing them over some set of distinct signals/"conversations"/HTTP handlers, it does it over the set of versions of schemas for a given thing. Seems like a much more descriptive name than "kernel."

This is just a renaming task - no functionality change. I think what needs to change would be:

  • Renaming kernel subpackage to mux
  • Renaming all things with Kernel in their name within that package to some variant of Muxer
  • The word Converge can probably go away since it was a word i picked that is really just the "mux" concept. Like, Mux.Do(b []byte) would make more sense...though that may have other problems
  • Updating the tutorial docs appropriately

It'll take some creativity to come up with the right set of words here, but i'm gonna mark it a "good first issue" anyway. (It won't be for everyone, but someone with a particular kind of inclination towards interface design might find this task a good angle for grokking what Thema does)
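The "version multiplexer" idea can be sketched in a few lines: accept input in any known version, then forward it to a single output version. All names here (object, schema, mux, do, exampleMux) are hypothetical, validation is reduced to toy predicates, and only forward translation to the latest version is shown.

```go
package main

import (
	"errors"
	"fmt"
)

// object is a toy instance representation.
type object map[string]interface{}

// schema is a hypothetical model of one version in a lineage.
type schema struct {
	version  string
	validate func(object) bool
	// up translates an instance of this version to the next one. It is nil
	// for the latest version.
	up func(object) object
}

// mux selects among the versions of one lineage.
type mux struct{ schemas []schema }

// do finds the first schema that validates in, then translates the instance
// forward, one step at a time, to the latest version.
func (m mux) do(in object) (out object, from string, err error) {
	for i, s := range m.schemas {
		if !s.validate(in) {
			continue
		}
		out = in
		for j := i; j < len(m.schemas)-1; j++ {
			out = m.schemas[j].up(out)
		}
		return out, s.version, nil
	}
	return nil, "", errors.New("input does not validate against any schema")
}

// exampleMux builds a two-version lineage in which "name" was renamed to
// "title".
func exampleMux() mux {
	return mux{schemas: []schema{
		{
			version:  "0.0",
			validate: func(o object) bool { _, ok := o["name"]; return ok },
			up:       func(o object) object { return object{"title": o["name"]} },
		},
		{
			version:  "1.0",
			validate: func(o object) bool { _, ok := o["title"]; return ok },
		},
	}}
}

func main() {
	m := exampleMux()
	out, from, err := m.do(object{"name": "CPU usage"})
	fmt.Println(out, from, err)
}
```

Callers downstream of do only ever see the latest shape, whichever version the input arrived in, which is the multiplexing behavior the Wikipedia definition describes.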

Create WASM system for embedding Thema operations in a browser

The thema command offers (or will offer) basic Thema operations via CLI and HTTP. It'd be absolutely amazing if we could also represent those operations as something that was easily embedded in a webpage.

This'd clearly be WASM. I've never worked with WASM and don't know where to start, but if the CUE playground is feasible, this must be, too.

I'm picturing three linked text input boxes:

  1. Lineage input, a text box that accepts raw CUE input, into which a lineage can be pasted
    • Could also support a mode where e.g. the exemplars can be selected from a dropdown menu
    • Or just an arbitrary URL, which it'll try to fetch and populate the lineage input
  2. Data input, a text box that accepts raw YAML or JSON or CUE
    • Or, again, an arbitrary URL to slurp data from
  3. Operation/output, which includes controls over which operation to run (validate, validate-any, translate), and shows the result of performing that operation using the given lineage, with the given data.
    • Or, given that the set of operations is finite, closed, and small, it might be better to just always show the output of all operations. They can all meaningfully share the equivalent of a -v in the same way they do across the thema data subcommands. I suspect that being able to see how all three operations harmoniously relate will reinforce something about how Thema itself works in a way that could be missed when individually running thema data commands.

Eliminate use of Eval() in encoding/cue and any internal helpers

In the formatters introduced in e.g. #56 and #59, there are a number of calls to cue.Value.Eval(). IIRC, i used these originally as a quick hack to eliminate printed references to the lineage's joinSchema.

I knew it was suboptimal at the time and left in a bunch of TODOs to remove, but the immediate impacts of doing it did not occur to me. Now that i've tried it against Grafana's dashboard lineage, though, i realized that the eval call definitely eliminates field/list comprehensions. That changes the semantics of the dashboard, which needs those for its (current approach to) composition.

While that hacky composition approach is almost certain to change with #8 - like, that comprehension should probably not be in the body of the schema - it just emphasized to me just how important it is to preserve the original, textual input from the user for these cases.

Make a kernel that uses generics

Thema's kernels are an ideal place to use generics: Make a kernel with type T; take []byte in, spit T out.

I haven't played enough with generics yet to have a confident vision of exactly what it should look like, but i currently see no reason to believe that we shouldn't just replace the existing InputKernel entirely with this approach.
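
A rough sketch of what "kernel with type T; []byte in, T out" could look like with Go generics. Everything here is an assumption for illustration: `Kernel`, `NewKernel`, `Dashboard` and `passthrough` are made-up names, and the function passed to `NewKernel` merely stands in for the validate-and-translate work a real kernel would do against a lineage.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Kernel is a hypothetical generic kernel: bytes in, T out.
type Kernel[T any] struct {
	// converge stands in for validating the input against a lineage and
	// translating it to the target schema version.
	converge func(raw []byte) ([]byte, error)
}

// NewKernel builds a Kernel that decodes converged data into T.
func NewKernel[T any](converge func([]byte) ([]byte, error)) Kernel[T] {
	return Kernel[T]{converge: converge}
}

// Converge runs the stand-in translation, then decodes the result into T.
func (k Kernel[T]) Converge(raw []byte) (T, error) {
	var zero T
	translated, err := k.converge(raw)
	if err != nil {
		return zero, err
	}
	var out T
	if err := json.Unmarshal(translated, &out); err != nil {
		return zero, fmt.Errorf("decoding into %T: %w", zero, err)
	}
	return out, nil
}

// Dashboard is an example target type.
type Dashboard struct {
	Title string `json:"title"`
}

// passthrough only checks that the input is JSON; a real kernel would
// validate and translate here.
func passthrough(raw []byte) ([]byte, error) {
	if !json.Valid(raw) {
		return nil, errors.New("input is not valid JSON")
	}
	return raw, nil
}

func main() {
	k := NewKernel[Dashboard](passthrough)
	d, err := k.Converge([]byte(`{"title": "cpu"}`))
	fmt.Println(d.Title, err)
}
```

The caller gets a typed value back without needing a type assertion on an interface{} result.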

This is also pretty self-contained - i suspect somebody else could pick it up :)

Add scaffolding support to `thema`

The thema command promises scaffolding support, but currently has none. Implement some.

At minimum, this should include generating an empty lineage with a package and a name. Eventually it'd probably be nice to also add new schemas and lenses this way, too.

Make the input kernel return validation errors based on the `To` schema

Currently, when data to InputKernel.Converge() fails validation, it will simply return a bland validation failed error. This isn't helpful in any real scenario. Instead, let's have it return the validation error that came back from the To schema, as we can be reasonably certain that that's the schema version the author is thinking about, anyway.
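
A minimal sketch of the proposed behavior, under heavy assumptions: schemas are reduced to plain Go validator functions, and `converge`, `validator` and `requireString` are hypothetical names rather than anything in the kernel package. The point is only that, when every schema rejects the input, the error surfaced is the one from the To schema.

```go
package main

import (
	"errors"
	"fmt"
)

// validator is a hypothetical stand-in for validating against one schema.
type validator func(inst map[string]any) error

// converge tries every schema in order and returns the index of the first
// one that accepts inst. When none does, it returns the error reported by
// the schema at index to, instead of a generic failure.
func converge(inst map[string]any, schemas []validator, to int) (int, error) {
	if to < 0 || to >= len(schemas) {
		return -1, fmt.Errorf("no schema at index %d", to)
	}
	var toErr error
	for i, validate := range schemas {
		err := validate(inst)
		if err == nil {
			return i, nil
		}
		if i == to {
			toErr = err
		}
	}
	return -1, fmt.Errorf("validation failed against target schema %d: %w", to, toErr)
}

// requireString is a toy validator: the given key must hold a string.
func requireString(key string) validator {
	return func(inst map[string]any) error {
		if _, ok := inst[key].(string); !ok {
			return errors.New(key + ": string field is required")
		}
		return nil
	}
}

func main() {
	schemas := []validator{requireString("name"), requireString("title")}
	// The field name is misspelled, so both schemas reject the input.
	_, err := converge(map[string]any{"titel": "cpu"}, schemas, 1)
	fmt.Println(err)
}
```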

i didn't do this initially because i wasn't quite sure it would fit the user's mental model, but on seeing folks actually use it, it seems clear that this is the assumption they'd make.

Add field comments to tutorials and exemplars

The tutorials and exemplars should all have well-commented fields to establish in the mind of the reader from the very outset that adding docs to schema fields is both possible and desirable.

Introduce compositional Lineages

As of right now (Jan 3 2022) there is no code, docs, or anything that deals with compositional lineages. The only hint that they're even a thing is the UnarySchema and UnaryLineage struct types.

These absolutely are going to happen. The chief use case that motivated Thema in the first place - Grafana dashboards - absolutely requires them, and i worked out a halfway generic pattern for them back in August (grafana/grafana#38727).

It's "halfway" in the sense that it introduces a basic pattern for composing one lineage into another - a compose key on the lineage, which takes a string-templated key of other lineages, which makes injecting the other lineages pretty easy - but that composition is non-invertible (we have no trivial means of mapping the composed fields back out of the parent schema). Which is a problem for generic translations, and one of three issues to be addressed en route to generic compositional lineages.

Issue 1: Translation de/recomposition

Given an instance of a compositional lineage/schema that we want to translate, we must:

  • Decompose the instance into the parts owned by itself vs. by the lineages it's composed of
  • Map the sub-instance back to the original form of the composed lineage (so, invert this mapping)
  • Perform translation on the original form through the lenses defined on the composed schema
  • Map the translated sub-instance back into the parent, folding any lacunae emitted by composed lineage translation into the accumulator as it proceeds
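
The four steps above can be sketched in Go as follows. This is purely illustrative: `Mapping`, `Lacuna`, `TranslateFunc`, `keyMapping` and `addUnit` are hypothetical, instances are plain maps, and the composed lineage's lenses are replaced by a single function.

```go
package main

import "fmt"

// Instance is a decoded object; a plain map keeps the sketch small.
type Instance = map[string]any

// Lacuna is a hypothetical stand-in for a gap reported during translation.
type Lacuna struct{ Message string }

// Mapping is a hypothetical two-way mapping between a parent instance and
// the sub-instance owned by one composed lineage.
type Mapping struct {
	// Extract splits the parent into the composed sub-instance (in its
	// original form) and the remainder owned by the parent itself.
	Extract func(parent Instance) (sub, rest Instance)
	// Inject folds a sub-instance back into the parent's remainder.
	Inject func(rest, sub Instance) Instance
}

// TranslateFunc stands in for running the composed lineage's lenses.
type TranslateFunc func(Instance) (Instance, []Lacuna)

// translateComposed decomposes, translates the sub-instance, recomposes,
// and appends any emitted lacunae to the accumulator.
func translateComposed(parent Instance, m Mapping, translateSub TranslateFunc, acc []Lacuna) (Instance, []Lacuna) {
	sub, rest := m.Extract(parent)           // steps 1 and 2
	translated, lacunae := translateSub(sub) // step 3
	return m.Inject(rest, translated), append(acc, lacunae...) // step 4
}

// keyMapping composes a sub-instance under a single key of the parent.
func keyMapping(key string) Mapping {
	return Mapping{
		Extract: func(parent Instance) (Instance, Instance) {
			rest := Instance{}
			for k, v := range parent {
				if k != key {
					rest[k] = v
				}
			}
			sub, _ := parent[key].(Instance)
			return sub, rest
		},
		Inject: func(rest, sub Instance) Instance {
			out := Instance{key: sub}
			for k, v := range rest {
				out[k] = v
			}
			return out
		},
	}
}

// addUnit is a toy lens: it adds a placeholder "unit" field when the
// source has none, and reports that as a lacuna.
func addUnit(sub Instance) (Instance, []Lacuna) {
	out := Instance{"unit": "none"}
	for k, v := range sub {
		out[k] = v
	}
	return out, []Lacuna{{Message: "unit: no value in source; used placeholder"}}
}

func main() {
	parent := Instance{"title": "cpu", "panel": Instance{"color": "red"}}
	out, lacunae := translateComposed(parent, keyMapping("panel"), addUnit, nil)
	fmt.Println(out, lacunae)
}
```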

This shouldn't be horrifyingly difficult - it basically means one more layer of indirection and the introduction of a mapping object somewhat like #Lens, so that instead of directly doing and referencing the remapping right there under the compose key, as above, there's a new kind of object that defines how to map in both directions.

The compose key itself needs to stay top-level on the Lineage, as composition objects are essentially arguments to the whole lineage. But these composition mapping objects will likely need to be sequence-level, as the composition mechanism can't change schema-to-schema. That's how we maintain backwards compatibility.

Issue 2: Translation destination

Another key issue to be tackled is control over translation distance with composition. That is, while translating to #Latest or #LatestWithinSequence is easy to map from the parent lineage to composed lineages, a key goal of Thema is that each lineage has its own versioning and history, so mapping #Exact from parent to composed lineage makes no sense.

For example, if i say, "Hey #Translate, take this instance to version [1, 4]", that's fine for the parent, but that version may not exist in the composed lineages. Even if it does, it is completely uncorrelated with the parent version, so whatever motivation i as the caller have for picking [1, 4] in the parent lineage cannot possibly hold for composed lineages. (What if [1, 4] is an upgrade in the parent and a downgrade in the composed?)

I think the solution here is reasonably straightforward, though i haven't thought it all out:

  • #Translate will need to be able to handle both unary and compositional lineages. In cases where the #SearchCriteria represent a user intent that can meaningfully cross the composition boundary (#Latest and #LatestWithinSequence, as explained above), those criteria are passed down directly to the composed lineage. For other #SearchCriteria like #Exact, we simply ignore composed lineages.
  • A new #TranslateComposed type will be introduced, which will allow the caller to explicitly specify behavior for composed lineages.

These two different translation calls will also weave in the first issue, as #Translate will necessarily flatten all emitted lacunae, whereas #TranslateComposed will emit lacunae within a structure that is isomorphic to the parent lineage's input composition structure.
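
Here is a small Go sketch of the plain #Translate rule described above: criteria that can cross the composition boundary are passed down, and #Exact is ignored for composed lineages. The types and functions (`Criteria`, `resolve`, `resolveComposed`) are hypothetical and only mirror the names of the CUE search criteria.

```go
package main

import "fmt"

// Version is a [sequence, schema] pair.
type Version [2]int

// Kind mirrors the three search criteria discussed above.
type Kind int

const (
	Latest Kind = iota
	LatestWithinSequence
	Exact
)

// Criteria is a hypothetical Go rendering of the search criteria.
type Criteria struct {
	Kind  Kind
	Exact Version // only meaningful when Kind == Exact
}

// resolve picks the target version within one lineage, whose versions are
// listed oldest to newest. It reports false when the criteria cannot be met.
func resolve(versions []Version, current Version, c Criteria) (Version, bool) {
	if len(versions) == 0 {
		return Version{}, false
	}
	switch c.Kind {
	case Latest:
		return versions[len(versions)-1], true
	case LatestWithinSequence:
		target, found := Version{}, false
		for _, v := range versions {
			if v[0] == current[0] {
				target, found = v, true
			}
		}
		return target, found
	case Exact:
		for _, v := range versions {
			if v == c.Exact {
				return v, true
			}
		}
	}
	return Version{}, false
}

// resolveComposed applies the parent's criteria to a composed lineage.
// Latest and LatestWithinSequence are passed down; Exact is ignored, so
// the composed instance stays at its current version.
func resolveComposed(versions []Version, current Version, c Criteria) Version {
	if c.Kind == Exact {
		return current
	}
	if v, ok := resolve(versions, current, c); ok {
		return v
	}
	return current
}

func main() {
	composed := []Version{{0, 0}, {0, 1}, {1, 0}}
	exact := Criteria{Kind: Exact, Exact: Version{1, 4}}
	fmt.Println(resolveComposed(composed, Version{0, 0}, exact))
	fmt.Println(resolveComposed(composed, Version{0, 0}, Criteria{Kind: Latest}))
}
```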

Issue 3: Other forms of composition

The above is what's necessary for the one approach to lineage composition i've been focused on so far - the one that Grafana needs. A chat i had recently with @franklinhu reinforced a longstanding suspicion that there are other composition patterns that may be important to support, which may end up changing some of the requirements we're driving at with both of the above.

I'm not overly concerned here, as i think any future composition forms should be easy to accrete onto Thema in a backwards-compatible way. And because i suspect that most things that initially appear to be compositions are either best or at least sufficiently accomplished as layers on top of the base #Lineage. That's how things turned out with CRDs.

Disallow certain CUE constructs within lineage (schema) declarations

There are some logical constructs in CUE that make analysis more complicated, and we'll probably have a much easier time creating the invariants if we just disallow their use within schema declarations. Here's a preliminary list:

  • if statements
  • comprehensions? def yes if composition logic (#8) can be kept entirely outside the schema itself and unified in when called
  • aliases? no specific reason to do this apart from it being suggestive that people are being too fancy

Cats don't like going back in bags. Better to err on the side of being restrictive initially, then open up later.

Add helpers for export of Thema schemas as OpenAPI

It'd be quite valuable to be able to export schema from lineages as OpenAPI. This is true even independent of the goal for using Thema with CRDs/operators - but that's certainly a motivating factor.

There's support for converting CUE to OpenAPI in the cue stdlib, but it's a bit awkward to do - it still uses the old cue.Instance API, and it really wants #-defs for the types it takes, among other things. This is all doable, but it takes some massaging of thema lineage sources in a way that is really not reasonable to expect the average API user to do.

Add typescript gen support as package & in `thema lineage gen`

#72 introduced thema lineage gen, with subcommands for openapi, json schema, and go (types and thema bindings). i'd originally planned on TS as well, but cut it loose for time constraints.

This issue is to finish what i'd originally intended and add the TS output.

Refactor `#Lineage` to support reverse lens-mappings between schema within a sequence

Currently, lenses exist solely at the #Sequence level, and only for sequences after the first:

thema/lineage.cue

Lines 69 to 80 in b7b4330

// seqs is the list of sequences of schema that comprise the overall
// lineage, along with the lenses that allow translation back and forth
// across sequences.
seqs: [
  {
    schemas: #Sequence
  },
  ...{
    schemas: #Sequence
    lens: #Lens
  }
]

This is a relic of older thinking, where i was trying to convince myself that it's sufficient to guarantee translatability to the latest schema. It's not - if the same guarantee doesn't hold in the reverse direction, then a familiar evolutionary coupling results for communicating systems: it's not safe to start sending a new schema version until all receivers are known to have updated to accept it. It's arguably not as bad as the primary case, where receivers can't update until senders are known to support it, as receivers are often the hubs - but it's still an unacceptable degree of coupling.

Backward compatibility does not imply forward compatibility (it is a non-invertible relation), so introducing a requirement for reverse translatability means we have to change how lineages and sequences themselves are declared in order to introduce a place for the in-sequence reverse lenses.

Fixing this requires changing something fundamental about how lineages are declared. The simplest answer would be introducing an additional reverse-lens list onto #Sequence that's of length len(schemas)-1, though i haven't actually tried that to see what other weird warts that produces.

Even if that does work, i'm not sure i'm happy with the ergonomics - lineages are already a deeply intermeshed datastructure, and it's hard enough to keep track of everything going on in them. It's possible that we should use this additional requirement to trigger revisiting the way lineages are structured (in a way that preserves key invariants, obviously). I need to put up a discussion around that.


Although...

At least...i think we have to do this, but i'm not actually sure. I have this nagging voice in the back of my mind, saying that the set of changes allowable under backwards compatibility/subsumption rules actually do imply a lens morphism that can be inferred, at least in almost all cases:

  • If a new field was added, drop the field in the translated instance
    • This is the approach irrespective of whether the added field is optional or required+default
  • If a field contains a disjunction with a concrete default (e.g. field: *"foo" | "bar"), and new branches were added to the disjunction (field: *"foo" | "bar" | "baz"), backwards compatibility rules already dictate that the default (*"foo") must exist in prior versions. If an instance containing a value from a newly-added branch ("baz") is reverse-translated, fall back to the default ("foo")
    • If a field contains a disjunction with a non-concrete default or no default at all, and the newer instance falls outside the acceptable value space for the older schema, we cannot reach a resolution.
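
The inference rules above can be sketched in Go like so. This is a toy model, not a proposal for the real implementation: `fieldSpec` and `schema` are hypothetical and reduce the older schema to "set of allowed string values plus optional concrete default" per field.

```go
package main

import "fmt"

// fieldSpec is a hypothetical, heavily simplified description of one field
// in the older schema.
type fieldSpec struct {
	allowed []string // permitted values; empty means any value is accepted
	def     string   // concrete default, if hasDef is true
	hasDef  bool
}

// schema maps field names to their specs in the older schema version.
type schema map[string]fieldSpec

// contains reports whether val is a string listed in allowed.
func contains(allowed []string, val any) bool {
	s, ok := val.(string)
	if !ok {
		return false
	}
	for _, a := range allowed {
		if a == s {
			return true
		}
	}
	return false
}

// reverseTranslate maps an instance of the newer schema back to the older
// one using only the inferred rules: drop fields the older schema lacks,
// fall back to a concrete default for values from newly added disjunction
// branches, and fail when no concrete default exists.
func reverseTranslate(newer map[string]any, older schema) (map[string]any, error) {
	out := make(map[string]any, len(older))
	for name, val := range newer {
		spec, known := older[name]
		if !known {
			continue // field was added in the newer schema: drop it
		}
		if len(spec.allowed) == 0 || contains(spec.allowed, val) {
			out[name] = val
			continue
		}
		if spec.hasDef {
			out[name] = spec.def // value from a new branch: use the default
			continue
		}
		return nil, fmt.Errorf("field %q: value %v is outside the older schema and no concrete default exists", name, val)
	}
	return out, nil
}

func main() {
	older := schema{"field": {allowed: []string{"foo", "bar"}, def: "foo", hasDef: true}}
	out, err := reverseTranslate(map[string]any{"field": "baz", "added": 1}, older)
	fmt.Println(out, err)
}
```

The error branch corresponds to the last bullet: without a concrete default there is nothing safe to fall back to, so an explicit lens would still be needed.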

If the above is correct, it's not complete enough to obviate the need for a changed structure - though it could mean that automated logic is applied in the absence of a declaration. I'm loath to get too far into lens generation right away, as it's probably a big rabbit hole, but i'd be remiss to not at least mention the possibility.

Add support for generating lineages from various inputs

It would be enormously helpful for getting started with Thema if users could have their lineages initially generated for them: either empty, or by importing an existing schema from a CUE-supported format.

This'll need support both in a library/package form, and from the CLI.

Formalize a "grouped lineage" concept

For Thema's versioning rules to be meaningfully enforceable, we have to retain the basic rule that lineages are self-contained structures - no external references. However, it is clear that there are cases where grouping of multiple "distinct" objects is desirable.

The particular thing that makes the grouped objects "distinct", and yet still desirably a member of the same group is probably important. Maybe even crucial to the right design. I don't have a good general model of it yet, though, so i'll give some examples in order to help work by induction.

One obvious example is the notion of a Kubernetes group-version, where multiple distinct objects are grouped and indeed versioned together. AIUI, this is essentially because

  • The API generated and exposed for interacting with these objects is viewed as the irreducible unit
  • It is undesirable that any objects within that GV should ever have to deal with version skew relative to the other objects within the GV.

In Grafana, we have another kind of case that's come up. Here's some WIP code:

// The slots named and specified in this file are meta-schemas that act as a
// shared contract between Grafana plugins (producers) and coremodel types
// (consumers).
//
// On the consumer side, any coremodel Thema lineage can choose to define a
// standard Thema composition slot that specifies one of these named slots as
// its meta-schema. Such a specification entails that all schemas in any lineage
// placed into that composition slot must adhere to the meta-schema.
//
// On the producer side, Grafana's plugin system enforces that certain plugin
// types are expected to provide Thema lineages for these named slots which
// adhere to the slot meta-schema.
//
// For example, the Panel slot is consumed by the dashboard coremodel, and is
// expected to be produced by panel plugins.
//
// The name given to each slot in this file must be used as the name of the
// slot in the coremodel, and the name of the field under which the lineage
// is provided in a plugin's models.cue file.
//
// Conformance to meta-schema is achieved by Thema's native lineage joinSchema,
// which Thema internals automatically enforce across all schemas in a lineage.

slots: Panel: {
  // Defines plugin-specific options for a panel that should be persisted. Required,
  // though a panel without any options may specify an empty struct.
  PanelOptions: {...}
  // Plugin-specific custom field properties. Optional.
  PanelFieldConfig?: {...}
}

// Meta-schema for the DSOptions slot, as implemented in Grafana datasource plugins.
// DSOptions slot joinSchema. This provides space for both the normal and
// encrypted configuration portions of a datasource plugin's options.
slots: DSOptions: {
  // Normal datasource configuration options.
  Options: {...}
  // Sensitive datasource configuration options that require encryption.
  SecureOptions: {...}
}

// Meta-schema for the Query slot, as implemented in Grafana datasource plugins.
slots: Query: {...}

This is specifying one conventional lineage (Query), and two grouped lineages (Panel, DSOptions). Crucially, it is not expected that there ever exists a literal object instance of the schema, as-written. Rather, the definitions are grouped together because:

  • There is no use case in which it is valuable for them to be versioned independently
  • They are expected to be consumed as a group by the object composing them

These two reasons look a lot like the k8s ones. I don't think that's an accident. Rather, I have a strong feeling that this is a case where finding and applying a few math-y formalisms will lead to an elegant solution, which ultimately might be called a "grouped lineage."

Almost certainly relates to #8, though i'm not sure how yet.

testing framework

Hello,

In my research about Scuemata, I have not seen a test framework being mentioned.

It would be great to be able to provide some data, and validate that schema migrations and lacunae are generated accordingly. Instead of everyone implementing it themselves, an opinionated framework built into scuemata would be useful.

Plan out a path for converting a Thema lineage to a CRD

The Thema tutorials lay out how to map a Thema lineage to a LineageFactory, which forms a bridge from the literal lineage written in CUE to what's written in Go. The tutorials then continue on to create an InputKernel, which is one way of using Thema from Go programs.

However, that's not terribly helpful for the use case of Thema-as-Operator-framework - or at least, it doesn't seem that way right now. In that case, we want to go from a Lineage to a Go expression of a CRD, and the relevant necessary components on the Go side (controllers, Go types...?). i'm reasonably sure this can be done generically, which is why this issue is here, rather than in grafana/grafana.

This is a bit of a counterpart to grafana/grafana#44242, but that's just the tracker for where we're prototyping.

Would welcome input or just a "hey i'd like to help figure this out" from anyone so inclined :)
