spartanz / parserz Goto Github PK
View Code? Open in Web Editor NEWA purely-functional library for creating both parsers, pretty-printers, and grammar definitions from a single, type-safe specification of a grammar
License: Apache License 2.0
A purely-functional library for creating both parsers, pretty-printers, and grammar definitions from a single, type-safe specification of a grammar
License: Apache License 2.0
It runs monad stack more than once
Because I've reinvented the wheel ๐
... because type inference seems not very good in IntelliJ ๐
To describe things that are parts of a predefined set, like chars or strings. To remove boilerplate. It can play nice with bnf rendering too.
Grammar cannot be a Monad, but it may look like it
These are current signatures of parser and printer
def parser[S, E, A](grammar: Grammar[S, S, E, A]): (S, Input) => (S, E \/ (Input, A))
def printer[S, E, A](grammar: Grammar[S, S, E, A]): (S, (Input, A)) => (S, E \/ Input)
They assume chipping off the input (and appending to current output) at every consumption step defined by a user, for example
val spacing: G[Unit] = consumePure(
cs => (cs.dropWhile(c => c == ' ' || c == '\n' || c == '\r'), ()),
{ case (cs, _) => ' ' :: cs }
)
Because it's defined by user it may be arbitrary and inefficient. Because it's mandatory, it is also a source of inefficiency due to allocations that must happen.
The idea is to prototype a solution that allows inner optimizations of accessing the next consumed piece of input, for example using random access, without needing to have user-defined functions and requiring allocations.
This means instead of user-defined functions we have to allow to describe how consumption is supposed to happen. The complexity arises from the fact that Input
is currently user-defined type and "description" would require to have implementation of consumption in the library itself, so we may need to constraint the Input
to only certain types, i.e. specialize on certain types of Input
.
I'm trying to understand the use case and how it can be expressed.
We have
case class Codec[I, A](eq: Equiv[I, (I, A)])
which consumes a bit of I
to produce A
and returns (remainder of I
) + A
.
We already have product operation to combine two codecs
def ~ [B](that: Codec[I, B]): Codec[I, (A, B)]
which consumes a bit of I
to produce A
and then consumes a bit of remainder of I
to produce B
and returns (what is yet remained of I
) + (A, B)
.
I've built (see the difference in that
type parameters)
def >> [B](that: Codec[A, B]): Codec[I, (A, B)]
which consumes a bit of I
to produce A
and then consumes a bit of A
to produce B
and returns (the remainder of I
) + (the remainder of A
) + B
.
Questions:
I
to (A, B)
in both cases, and I believe product is the logical assumption here. Can def >>
be expressed differently?Codec
isn't a monad. Does def >>
has anything to do with monadic composition?Given the idea described in #75, will it be possible to use Magnolia to derive instances like
final case class Person(name: String, age: Int, address: String)
implicit val personEquiv: Equiv[((String, Int), String), Person] =
Equiv.caseClass(Person.apply(_, _, _), Person.unapply)
via a concise syntax
implicit val personEquiv: Equiv[((String, Int), String), Person] = equiv[Person]
?
Use free encoding for library internals
Allow introspection of parsers structure, for example print it as https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_form
The idea is to have documentation provided by the program itself without relying on comments, configurations or external tools that perform code analysis.
Define parsers as
sealed trait Parsing[F[_], G[_], E]
and require ApplicativeError[F, E]
instead of Applicative[F]
Hard question: how it affects parsers that rely on "optional" effect?
Given we have the following combinators
def ~ [SI1 <: SI, SO1 >: SO, E1 >: E, B](that: Grammar[SI1, SO1, E1, B]): Grammar[SI1, SO1, E1, A /\ B]
def | [SI1 <: SI, SO1 >: SO, E1 >: E, B](that: Grammar[SI1, SO1, E1, B]): Grammar[SI1, SO1, E1, A \/ B]
that produce nested structures when chained, e.g.
val g1: Grammar[SI1, SO1, E, ((A, B), C)] = a ~ b ~ c
Explore the concept of building a parser/printer that skips creation of nested structures when executed.
Idea is to use a witness type with 2 parameters
type G[_, _]
where 1st is a phantom type to capture what is produced by using parser combinators and 2nd is some representation of the 1st.
For example
sealed trait Equiv[A, Repr]
object Equiv {
def caseClass1[Z, A](f: A => Z, g: Z => Option[A]): Equiv[A, Z] = ???
def caseClass2[Z, A, B](f: (A, B) => Z, g: Z => Option[(A, B)]): Equiv[(A, B), Z] = ???
def caseClass3[Z, A, B, C](f: (A, B, C) => Z, g: Z => Option[(A, B, C)]): Equiv[((A, B), C), Z] = ???
}
This way a
final case class Person(name: String, age: Int, address: String)
can be isomorphic to
((String, Int), String)
via means of Equiv
.
def to[Z](implicit equiv: Equiv[A, Z]): Grammar[SI, SO, E, Z]
can be used to describe an intent for isomorphic transformation.
Existing program in terms of grammars should be kept to be further optimized into an efficient code that is not doing extra allocations.
To have A
and B
zipped into just A
or just B
ignoring the other side.
Handling a | b | c
is cumbersome due to nested disjunction handling, e.g.
case Left(Left(a)) =>
and requires code changes if combination also changes, e.g. becomes a | b | c | d
.
There must be a way to remove the clutter from handling function.
Requires spartanz/sbt-org-policies#3
It is not possible to use Option
because there is no ApplicativeError
instance for it.
TODO: Use Unit \/ A
in the example.
Currently state is exposed in grammars
sealed abstract class Grammar[-SI, +SO, +E, A]
This results in state container (a Tuple) being allocated and returned at each step of execution
case Grammar.GADT.Produce(a) => (s: S, i: Input) => (s, Right((i, a)))
which is a huge performance overhead.
What if state is only returned at the very end?
So instead of
def parser[S, E, A](grammar: Grammar[S, S, E, A]): (S, Input) => (S, E \/ (Input, A))
we have
def parser[E, A](grammar: Grammar[E, A]): Input => (State, E \/ (Input, A))
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.