spartanz / parserz Goto Github PK

A purely-functional library for creating both parsers, pretty-printers, and grammar definitions from a single, type-safe specification of a grammar

License: Apache License 2.0

Scala 100.00%

scala invertible parser-combinators functional-programming

parserz's Issues

Replace parsers.ProductFunctor with scalaz.Zip

Because I've reinvented the wheel 🙄

Allow recursive grammars without the explicit use of `Delay`

Everywhere in this file, I'd make sure you are aggressively lazy, so we can have recursive grammars. It will require a trick (identity map) to derive the graph from that, but it can be done.

Originally posted by @jdegoes in #52

Do not use overloaded names for combinators

... because type inference seems not very good in IntelliJ 😐

`InSet` combinator

To describe things that are parts of a predefined set, like chars or strings. To remove boilerplate. It can play nice with bnf rendering too.

The M question

Grammar cannot be a Monad, but it may look like it

Do not use remainder of input

These are current signatures of parser and printer

def parser[S, E, A](grammar: Grammar[S, S, E, A]): (S, Input) => (S, E \/ (Input, A))
def printer[S, E, A](grammar: Grammar[S, S, E, A]): (S, (Input, A)) => (S, E \/ Input)

They assume chipping off the input (and appending to current output) at every consumption step defined by a user, for example

  val spacing: G[Unit] = consumePure(
    cs => (cs.dropWhile(c => c == ' ' || c == '\n' || c == '\r'), ()),
    { case (cs, _) => ' ' :: cs }
  )

Because it's defined by user it may be arbitrary and inefficient. Because it's mandatory, it is also a source of inefficiency due to allocations that must happen.

The idea is to prototype a solution that allows inner optimizations of accessing the next consumed piece of input, for example using random access, without needing to have user-defined functions and requiring allocations.
This means instead of user-defined functions we have to allow to describe how consumption is supposed to happen. The complexity arises from the fact that Input is currently user-defined type and "description" would require to have implementation of consumption in the library itself, so we may need to constraint the Input to only certain types, i.e. specialize on certain types of Input.

Composition: use one codec after another

I'm trying to understand the use case and how it can be expressed.

We have

case class Codec[I, A](eq: Equiv[I, (I, A)])

which consumes a bit of I to produce A and returns (remainder of I) + A.

We already have product operation to combine two codecs

def ~ [B](that: Codec[I, B]): Codec[I, (A, B)]

which consumes a bit of I to produce A and then consumes a bit of remainder of I to produce B and returns (what is yet remained of I) + (A, B).

I've built (see the difference in that type parameters)

def >> [B](that: Codec[A, B]): Codec[I, (A, B)]

which consumes a bit of I to produce A and then consumes a bit of A to produce B and returns (the remainder of I) + (the remainder of A) + B.

Questions:

Is the use case real?
The signature is confusing, i.e. going from I to (A, B) in both cases, and I believe product is the logical assumption here. Can def >> be expressed differently?
Obviously Codec isn't a monad. Does def >> has anything to do with monadic composition?

Derivation of Equiv instances

Given the idea described in #75, will it be possible to use Magnolia to derive instances like

final case class Person(name: String, age: Int, address: String)

implicit val personEquiv: Equiv[((String, Int), String), Person] = 
  Equiv.caseClass(Person.apply(_, _, _), Person.unapply)

via a concise syntax

implicit val personEquiv: Equiv[((String, Int), String), Person] = equiv[Person]

V.0.2

Use free encoding for library internals

Print the parsers structure

Allow introspection of parsers structure, for example print it as https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_form

The idea is to have documentation provided by the program itself without relying on comments, configurations or external tools that perform code analysis.

Scalaz7 support

Explicit error type parameter

Define parsers as

sealed trait Parsing[F[_], G[_], E]

and require ApplicativeError[F, E] instead of Applicative[F]

Hard question: how it affects parsers that rely on "optional" effect?

Explore removing nested tuples and eithers

Given we have the following combinators

def ~ [SI1 <: SI, SO1 >: SO, E1 >: E, B](that: Grammar[SI1, SO1, E1, B]): Grammar[SI1, SO1, E1, A /\ B]
def | [SI1 <: SI, SO1 >: SO, E1 >: E, B](that: Grammar[SI1, SO1, E1, B]): Grammar[SI1, SO1, E1, A \/ B]

that produce nested structures when chained, e.g.

val g1: Grammar[SI1, SO1, E, ((A, B), C)] = a ~ b ~ c

Explore the concept of building a parser/printer that skips creation of nested structures when executed.

Idea is to use a witness type with 2 parameters

type G[_, _]

where 1st is a phantom type to capture what is produced by using parser combinators and 2nd is some representation of the 1st.

For example

sealed trait Equiv[A, Repr]
object Equiv {
  def caseClass1[Z, A](f: A => Z, g: Z => Option[A]): Equiv[A, Z] = ???
  def caseClass2[Z, A, B](f: (A, B) => Z, g: Z => Option[(A, B)]): Equiv[(A, B), Z] = ???
  def caseClass3[Z, A, B, C](f: (A, B, C) => Z, g: Z => Option[(A, B, C)]): Equiv[((A, B), C), Z] = ???
}

This way a

final case class Person(name: String, age: Int, address: String)

can be isomorphic to

((String, Int), String)

via means of Equiv.

def to[Z](implicit equiv: Equiv[A, Z]): Grammar[SI, SO, E, Z]

can be used to describe an intent for isomorphic transformation.
Existing program in terms of grammars should be kept to be further optimized into an efficient code that is not doing extra allocations.

ZipR and ZipL combinators

To have A and B zipped into just A or just B ignoring the other side.

`alt` for `A | A`

Handling a | b | c is cumbersome due to nested disjunction handling, e.g.

case Left(Left(a)) =>

and requires code changes if combination also changes, e.g. becomes a | b | c | d.

There must be a way to remove the clutter from handling function.

sealed abstract class Grammar[-SI, +SO, +E, A]

This results in state container (a Tuple) being allocated and returned at each step of execution

case Grammar.GADT.Produce(a)      => (s: S, i: Input) => (s, Right((i, a)))

which is a huge performance overhead.

What if state is only returned at the very end?
So instead of

def parser[S, E, A](grammar: Grammar[S, S, E, A]): (S, Input) => (S, E \/ (Input, A))

we have

def parser[E, A](grammar: Grammar[E, A]): Input => (State, E \/ (Input, A))

spartanz / parserz Goto Github PK

parserz's Issues

Recommend Projects

Recommend Topics

Recommend Org