
spartanz / schemaz


A purely-functional library for defining type-safe schemas for algebraic data types, providing free generators, SQL queries, JSON codecs, binary codecs, and migrations, all derived from a single schema definition.

Home Page: https://spartanz.github.io/schemaz

License: Apache License 2.0

Scala 99.73% Shell 0.27%

schemaz's People

Contributors

danielyli, gitter-badger, grafblutwurst, insdami, jkobejs, juanpedromoreno, lglo, mijicd, vil1


schemaz's Issues

Protobuf

We want a protobuf module with interpreters into protobuf (de)serializers.

Allow for recursive (self-referent) schemas

We need a way to represent recursive types like:

sealed trait Tree
final case class Node(left: Tree, right: Tree) extends Tree
final case class Leaf(label: Int)              extends Tree

One idea (thx @julienrf) would be to have a dedicated member in the GADT for that:

final class SchemaReference[F[_], A](ref: => F[A]) extends Schema[F, A]

which would allow us to use lazy vals or defs to build such recursive schemas, but would require interpreter implementers (that is, us, most of the time) to handle the laziness properly.

lazy val tree = union(
  "Node" -+>: record(
    "left" -*>: tree :*:
    "right" -*>: tree
  ) :+:
  "Leaf" -+>: record(
    "label" -*>: prim(IntSchema)
  )
)

Another, more "essential" way would be to find an encoding of the Y combinator, or some kind of fixed point, at the schema level.
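
For reference, the textbook fixed-point encoding over a pattern functor looks like this (a generic construction, not code from this repository):

// Fix ties the recursive knot explicitly: F describes one layer of
// structure, and Fix[F] is the type of arbitrarily deep nestings of it.
final case class Fix[F[_]](unfix: F[Fix[F]])

A schema-level equivalent would let interpreters handle recursion in one dedicated place, instead of relying on host-language laziness.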

Discuss with the scalaz-parsers team

Our project and scalaz-parsers seem to be related; we should start a discussion with the team building scalaz-parsers to define how we could best collaborate.

This issue should be solved by "dragging" member(s) of the scalaz-parsers team into this conversation and/or creating a corresponding issue in the scalaz-parsers repository.

JSON (proper)

We currently have a toy example interpreting schemas into JSON encoders, that merely stands as an example/POC for interpreting schemas into contravariant functors.

We need to add modules for working with real-world JSON libraries (prioritising the most widely used).

Candidates are:

  • circe
  • play-json
  • argonaut
  • scalaz-json (when it happens)

Deal with failure scenario caused by choice of sum type encoding

Sums are encoded as lists of prisms, which means it's possible to fail to get any branch in a sum. That is, you can always set a given branch, but you can fail to retrieve any branch at all, if the user didn't specify enough branches to fully represent the sum type.

I think there's a way to fix this using compositional prisms.
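
To make the failure mode concrete, here is a toy illustration (using monocle; Shape is not from the codebase):

import monocle.Prism

sealed trait Shape
final case class Circle(r: Double)    extends Shape
final case class Square(side: Double) extends Shape

// Suppose the user only declares the Circle branch of the sum:
val circle: Prism[Shape, Circle] =
  Prism.partial[Shape, Circle] { case c: Circle => c }(identity)

// Setting through a declared branch always works...
val s: Shape = circle.reverseGet(Circle(1.0))
// ...but for a value of an undeclared branch, no declared prism matches:
val missed: Option[Circle] = circle.getOption(Square(2.0)) // None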

WIP: Use traversal instead of hard coding Schema[List[A]]

We have a schema for a list, but we can generalize this and use a Traversal optic, as in the product (lens) and sum (prism) cases.

Something like this:

// Pairs a Traversal from a collection type C to its element type A
// with the schema for the elements.
case class Collection[C, A](traversal: Traversal[C, A], element: Schema[A])

// The corresponding Schema node: the wrapped Collection's outer type is A,
// and its element type is existential.
final case class ColSchema[A](seq: Collection[A, _]) extends Schema[A]

I think every optic type will have its own term in the Schema sum type to give us the greatest chances for powerful composition.

Note: this is not quite right, because we would like the ability not only to modify and retrieve elements, but also to set them.

Get rid of source dependencies

Source dependencies don't work as expected in sbt, so we should get rid of them until the problem is solved at the sbt level.

Concretely, this means removing all source dependencies (ProjectRef in the build definition and plugins); this will need scalaz/scalaz-sbt#20 to be solved first.

Discuss with the scalaz-analytics team

Our project and scalaz-analytics seem to be related; we should start a discussion with the team building scalaz-analytics to define how we could best collaborate.

This issue should be solved by "dragging" member(s) of the scalaz-analytics team into this conversation and/or creating a corresponding issue in the scalaz-analytics repository.

Schema diff

Implementing automatic data migration implies being able to compute the structural difference between two schemas.

Ideally, this difference would be represented as a sequence of "schema transformations" as defined in #45.

API Polish

Once the core is stabilized and we're confident the internal schema representation won't drastically change again, we need to make the API ready for public consumption by:

  • naming things correctly
  • reducing the API surface
  • removing the dependency on shapeless, or at least hiding it from the user (right now, we only need shapeless to get singleton types for field and branch names).

Iso schemas aren't enough

With only IsoSchema, we cannot define schemas such as "dates represented as JSON strings":

val dateSchema = iso(prim(JsonString), stringToDate)

This doesn't work, because we cannot implement a correct Iso from String to Date: we could go from String to Option[Date] with an isomorphism, but that of course doesn't cover all the use cases.

A simple solution would be to add a PrismSchema node to the schema GADT, identical to IsoSchema but with a Prism instead of an Iso. Interpreters for covariant functors would then use this new node to handle errors, and contravariant ones would use it the same way they use IsoSchema.

If we go down that road, the public API should be adapted carefully. I'd rather not add a prism combinator, because it would be too easily confused with prim. Ideally, I'd like a single combinator for "a schema seen through an optic" that builds an IsoSchema or a PrismSchema depending on whether it is passed an Iso or a Prism.
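
A minimal sketch of the idea, assuming IsoSchema pairs an underlying schema with an Iso (shapes illustrative, not the actual GADT):

import monocle.{Iso, Prism}

trait Schema[F[_], A] // stub standing in for the real GADT

// Assumed shape of the existing node:
final case class IsoSchema[F[_], A, B](base: F[A], iso: Iso[A, B]) extends Schema[F, B]

// Proposed: identical structure, but the forward direction can fail.
final case class PrismSchema[F[_], A, B](base: F[A], prism: Prism[A, B]) extends Schema[F, B]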

Flesh out the Schema AST

The constructors for schemas aren't implemented yet (see this definition for example).

We'll probably need more constructors like the one already defined (for record fields at least).

Also, the monocle dependency (for things like Getter, Optional and so on) isn't declared yet; it needs to be added to the build.sbt.

Add codecov

We need to measure the test coverage and display the badge on the README.
The fact we use testz might (or might not) make that a little bit tricky.

It's pretty much just a matter of installing the scoverage plugin and running coverage and coverageReport at the right places in the .travis.yml.

Avoid going through the "tuple representation"

This should be done after #38.

We should look for a way to avoid going through the "tuple representation" of records in the functors we derive.

Currently, when processing a record like:

final case class Foo(i: Int, s: String, b: Boolean)

The functors we derive from schemas first convert instances of Foo to a value of type (Int, (String, Boolean)); there should be a way of expressing derivations that avoids this unnecessary step.
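
For illustration, with a stand-in Encoder (not the library's type), the derivation currently amounts to this:

// Minimal contravariant stand-in, just to make the indirection visible.
trait Encoder[A] { self =>
  def encode(a: A): String
  def contramap[B](f: B => A): Encoder[B] =
    new Encoder[B] { def encode(b: B): String = self.encode(f(b)) }
}

final case class Foo(i: Int, s: String, b: Boolean)

// 1. an encoder is derived for the nested-tuple representation...
val reprEncoder: Encoder[(Int, (String, Boolean))] =
  new Encoder[(Int, (String, Boolean))] {
    def encode(t: (Int, (String, Boolean))): String =
      s"(${t._1}, (${t._2._1}, ${t._2._2}))"
  }

// 2. ...and Foo's encoder is obtained by contramapping through the record's Iso.
val fooEncoder: Encoder[Foo] =
  reprEncoder.contramap(foo => (foo.i, (foo.s, foo.b)))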

Solving this issue might require changing the type of the iso field in Union and Record, currently an Iso[A0, A].

WIP: Top-level polymorphic schema definition

Since we are using SchemaModule, we may want to create a top-level, polymorphic definition of a schema. Something like this:

trait Schema[A] {
  def apply(m: SchemaModule): m.Schema[A]
}

Then a user can write:

object PersonSchema extends Schema[Person] {
  def apply(m: SchemaModule): m.Schema[Person] = {
    import m._

    // Use everything in `m`
  }
}

If we now had a way to abstract over the primitives required by a schema, it would then be possible to have polymorphic schemas (across, e.g., Scala, JSON, XML, etc.).

Generic derivation of "Isos"

From the user's point of view, the most tedious part of defining a schema is writing the values of the iso field of unions and records.

For instance, in order to define the schema for a class

final case class Foo(i: Int, l: Long, s: String, b: Boolean)

one must implement an Iso[(Int, (Long, (String, Boolean))), Foo], which is 100% boilerplate.
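
Written out by hand (using monocle's Iso), that boilerplate looks like this:

import monocle.Iso

final case class Foo(i: Int, l: Long, s: String, b: Boolean)

// Convert the nested-tuple representation to Foo and back.
val fooIso: Iso[(Int, (Long, (String, Boolean))), Foo] =
  Iso[(Int, (Long, (String, Boolean))), Foo] {
    case (i, (l, (s, b))) => Foo(i, l, s, b)
  }(foo => (foo.i, (foo.l, (foo.s, foo.b))))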

Using generic programming, it is possible to derive automatically at compile time an Iso between any case class (product type) and its "nested tuples" representation. We should provide such a way in the generic module using scalaz-deriving.

This issue must be solved without modifying the existing API for building schemas (namely, the public methods in SchemaModule).

An acceptable solution would be to define some Generic* module extending SchemaModule with two methods:

  • caseClass*, allowing one to define a Record by simply providing a "product of labelled fields" (with automatic derivation of the iso)
  • sealedTrait*, allowing one to define a Union by simply providing a "sum of labelled branches" (with automatic derivation of the iso)

Important note

This issue should be solved while taking into account the results of #44.

*: these names are merely suggested names

Reintroduce compile-time checks for record/union constructors

In the current schema encoding, RecordSchemas (resp. Unions) only make sense when they are built from "labelled products" (resp. "labelled sums"), i.e. products (resp. sums) all of whose members are ProductTerms (resp. SumTerms).

It is necessary to (re)introduce a way to make it impossible to construct records/unions from anything other than labelled products/sums.

This might be done "syntactically", by exposing methods and syntax that allow just that and nothing else (tweaking the methods' signatures in SchemaSyntax for starters), or by going down the "shapeless-like inductive type-level constraints" path (but that might prove cumbersome because of the "pattern-functorish" nature of Schema).

Implement JsonSchema

In src/main/scala/schemas.scala there is an unimplemented JsonSchema object.

We need to give it a proper implementation of type Prim[A], i.e. the "set" of primitive types in JSON.

I think we just need to create an ADT with string, number, boolean and null for that to work.
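
A minimal sketch of what that ADT could look like (names, and the choice of BigDecimal for numbers, are assumptions; JsonString matches the constructor used in the IsoSchema issue above):

// One constructor per JSON primitive, indexed by its Scala representation.
sealed trait JsonPrim[A]
case object JsonString extends JsonPrim[String]
case object JsonNumber extends JsonPrim[BigDecimal]
case object JsonBool   extends JsonPrim[Boolean]
case object JsonNull   extends JsonPrim[Unit]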

Moving out of prototyping phase

Once all other issues labelled needed-for-v1 are done, a few things need to be done in order to move out of the prototyping phase:

  • add file headers and other "administrative stuff" (most of this should be done in sbt-org-policies)
  • add a proper CONTRIBUTING.md
  • merge prototyping into master
  • reconfigure the repository to make master the default branch again (CI, protected branches, etc)

Algebra of schema transformations

We call "schema transformations" all the operations one can apply to a schema that maintain the guarantee that data written with the original schema can be read using the new one (backward compatibility) and vice versa (forward compatibility).

These transformations are implemented in Avro, although they are probably not identified as an algebra by Avro's authors.

So the two steps for solving this issue can be:

  1. look at Avro's source code and documentation to list the candidate operations
  2. formalise these findings to come up with a more fundamental abstraction (there is a fair chance we'll end up with a category of "schemas + transformations", the so-called schema transformations being the morphisms of that category).

Alternatively, searching through the academic literature should also yield interesting results.
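
For a flavour of what step 1 might surface, Avro's schema-resolution rules suggest candidate operations like the following (a speculative sketch, not a worked-out algebra):

sealed trait Transformation
final case class AddField[A](name: String, default: A) extends Transformation
final case class RemoveField(name: String)             extends Transformation
final case class RenameField(from: String, to: String) extends Transformation // Avro expresses renames via aliases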

Generic derivation of schemas

Building upon #47, it seems rather obvious that we should be able to automatically derive the schema of any ADT at compile time (as a matter of fact, this gist achieves just that, using shapeless).

Like #47, this should be provided in the generic (sbt) module, with no impact on the existing API for constructing schemas.

Also, if #34 gets merged before this issue is solved, the documentation must stress that such fully automatic derivation should be used with caution, for it somehow defeats the purpose of having schemas as runtime values (in scenarios where schema evolution is needed, it is OK to fully derive schemas at compile time, as long as the application stores the derived schema somewhere at runtime).

Pull the "isos" out of the schema AST

We can (and should) pull the "isos" out of the SchemaF pattern-functor.

We would end up with a user-facing SchemaZ case class containing an "iso" from a business type T to a generic representation (a sum of products) A, together with a Schema[A].
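
Something like this (names taken from this issue; the exact shape is an assumption):

import monocle.Iso

// T is the business type, A its generic (sums-of-products) representation;
// Schema[A] stands for the library's schema over that representation.
final case class SchemaZ[T, A](iso: Iso[T, A], schema: Schema[A])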

The typeclass derivation would still work the same: it would derive instances for A, and corresponding instances for T would be obtained by mapping or contramapping the relevant half of the "iso".

The impact on migrations is still unknown though.

Flesh out the GenModule

In order to see how far we can go with our current approach, we need to make sure the current Schema AST allows for concrete features like deriving a free ScalaCheck Gen instance from a schema.

That is, provided we have Gen instances for all the primitive types in a schema, we need to produce a Gen for the whole schema.

The (commented out) signature for gen in modules/scalacheck/src/main/scala/GenModule.scala might need to be adapted to fit that purpose.

WIP: Schema mapping using isomorphisms

It'd be nice if, given a Schema[A], you could imap it to a Schema[B] by providing an Iso[A, B]. I don't know if that's possible, but I think it should be, by going through the schema and applying the Iso to all the lenses / prisms / traversals.
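
As a self-contained illustration of the invariant-map idea (Codec is a stand-in here, not the library's Schema):

import monocle.Iso

// Mapping through an Iso uses one direction on the read side
// and the other on the write side.
trait Codec[A] { self =>
  def read(s: String): A
  def write(a: A): String

  def imap[B](iso: Iso[A, B]): Codec[B] = new Codec[B] {
    def read(s: String): B  = iso.get(self.read(s))
    def write(b: B): String = self.write(iso.reverseGet(b))
  }
}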

Add benchmarks

We need a benchmarks sbt module with benchmarks for the functors produced by all our interpreters.

A satisfying first iteration would:

  • define a "meaningful" example schema (complex enough to contain at least one instance of every member of the Schema ADT).
  • for each interpreter, provide an "honest" implementation of the same functor, as it would be written "by hand" in real life; for example, implement a play.api.libs.json.Reads using the Json.reads macro (see the sketch after this list).
  • compare the performance of the derived and the hand-written functors to come up with a performance score.
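
One plausible shape for such a benchmark pair, assuming JMH and play-json (derivedReads is a placeholder for the schema-derived instance):

import org.openjdk.jmh.annotations.{Benchmark, Scope, State}
import play.api.libs.json.{JsValue, Json, Reads}

final case class Foo(i: Int, s: String, b: Boolean)

@State(Scope.Benchmark)
class FooReadsBenchmark {
  // The "honest" baseline, via play-json's macro.
  val honestReads: Reads[Foo] = Json.reads[Foo]
  // Placeholder for the instance derived from a schemaz schema.
  val derivedReads: Reads[Foo] = ???
  val input: JsValue = Json.parse("""{"i": 1, "s": "a", "b": true}""")

  @Benchmark def honest: Any  = honestReads.reads(input)
  @Benchmark def derived: Any = derivedReads.reads(input)
}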

Schema serialisation

Schemas must be readable at runtime. In other words, we must provide a way to read/write schemas from/to the wire.

Schemas composed only of One, :+: and :*: (so-called "essential schemas") are (almost) trivially serialisable. But schemas that contain Isos (namely RecordSchema, Union, SeqSchema, etc.) aren't, since Isos are, roughly, functions, and functions aren't trivially serialisable.

On the other hand, these "iso-based" GADT members are merely a way to build schemas for user-defined Scala classes, which is only a part of our intended use cases. We also want to provide a safe way to deal with "dynamic data", e.g. processing JSON documents (or Avro records, etc.) at runtime without necessarily having to coerce them to a case class.

These "dynamic" use cases are the ones that are most likely to require schema serialisation, so providing a way to serialise "essential schemas" only sounds like an acceptable first step.

Implement a JSON serializer

We should now have enough material to try and implement a first serializer.

The main idea is to implement a function like the following:

def jsonSerializer[A](schema: Schema[A])(implicit prims: ToJson[Prim[A]]): A => JSON = ???

Where ToJson is a simple typeclass needed to convert the module's primitive types to JSON.
The actual JSON representation isn't very important for now; we can just define type JSON = String for the moment.
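
Spelled out, the pieces mentioned above could look like this (following the issue's own simplifications):

object JsonModule {
  // Placeholder representation, as suggested above.
  type JSON = String

  // Typeclass converting primitive values to JSON.
  trait ToJson[A] {
    def toJson(a: A): JSON
  }
}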

Implement ScalaSchema

In src/main/scala/schemas.scala there is an unimplemented ScalaSchema object.

We need to give it a proper implementation of type Prim[A], i.e. the "set" of primitive types in the Scala world. That would be a subset of the kind-* types in the standard library: Int, Boolean, String, etc.

I'm not sure there is a convenient way to encode that, but it's worth a try.
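
One straightforward encoding is a GADT with one constructor per primitive (a sketch; the exact set of primitives is an assumption):

sealed trait ScalaPrim[A]
case object IntPrim     extends ScalaPrim[Int]
case object LongPrim    extends ScalaPrim[Long]
case object BooleanPrim extends ScalaPrim[Boolean]
case object StringPrim  extends ScalaPrim[String]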

Avro

Add an avro module that would provide ways to encode/decode arbitrary data to/from org.apache.avro.GenericContainer given a schema.

This module should also provide ways to encode/decode our Schema GADT to/from org.apache.avro.Schema (this is related to #30).

Three tiered documentation

Before releasing a first version, we must provide a three-tiered documentation, each tier targeting a specific "level" of interaction with the library.

  • User: the general philosophy behind the library, its intended purpose and goals; step-by-step descriptions of the most common use cases (typeclass derivation, schema evolution), for every provided module.
  • Power-user: how to extend the library; how the derivation mechanism works internally and how to create new modules and interpreters; how the migration mechanism works and how to implement new/custom migrations.
  • Contributor: the design decisions that led to the current implementation and the reasons behind them, and more generally anything that is useful to know before contributing to the core of the library.
