raabro

A very dumb PEG parser library.

Son to aabro, grandson to neg, great-grandson to parslet. There is also a JavaScript version, jaabro.

a sample parser/rewriter

You use raabro by providing the parsing rules, then some rewrite rules.

The parsing rules make use of the raabro basic parsers seq, alt, str, rex, eseq, ...

The rewrite rules match names passed as first argument to the basic parsers to rewrite the resulting parse trees.

require 'raabro'


module Fun include Raabro

  # parse
  #
  # Last function is the root, "i" stands for "input".

  def pstart(i); rex(nil, i, /\(\s*/); end
  def pend(i); rex(nil, i, /\)\s*/); end
    # parentheses start and end, including trailing white space

  def comma(i); rex(nil, i, /,\s*/); end
    # a comma, including trailing white space

  def num(i); rex(:num, i, /-?[0-9]+\s*/); end
    # name is :num, a positive or negative integer

  def args(i); eseq(nil, i, :pstart, :exp, :comma, :pend); end
    # a set of :exp, beginning with a (, punctuated by commas and ending with )

  def funame(i); rex(nil, i, /[a-z][a-z0-9]*/); end
  def fun(i); seq(:fun, i, :funame, :args); end
    # name is :fun, a function composed of a function name
    # followed by arguments

  def exp(i); alt(nil, i, :fun, :num); end
    # an expression is either (alt) a function or a number

  # rewrite
  #
  # Names above (:num, :fun, ...) get a rewrite_xxx function.
  # "t" stands for "tree".

  def rewrite_exp(t); rewrite(t.children[0]); end
  def rewrite_num(t); t.string.to_i; end

  def rewrite_fun(t)

    funame, args = t.children

    [ funame.string ] +
    args.gather.collect { |e| rewrite(e) }
      #
      # #gather collects all the children in a tree that have
      # a name; in this example, names can be :exp, :num, :fun
  end
end


p Fun.parse('mul(1, 2)')
  # => ["mul", 1, 2]

p Fun.parse('mul(1, add(-2, 3))')
  # => ["mul", 1, ["add", -2, 3]]

p Fun.parse('mul (1, 2)')
  # => nil (doesn't accept a space after the function name)

This sample is available at: doc/readme0.rb.

custom rewrite()

By default, a parser gets a rewrite(t) that looks at the parse tree node names and calls the corresponding rewrite_{node_name}().

It's OK to provide a custom rewrite(t) function.

module Hello include Raabro

  def hello(i); str(:hello, i, 'hello'); end

  def rewrite(t)
    [ :ok, t.string ]
  end
end
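
The name-based dispatch described above can be pictured with a small plain-Ruby sketch. It does not use raabro: `SketchNode` and `DispatchSketch` are hypothetical stand-ins invented for this illustration, and the real default `rewrite(t)` may differ in its details.

```ruby
# Hypothetical stand-in for a parse tree node; this is not Raabro::Tree.
SketchNode = Struct.new(:name, :string, :children)

module DispatchSketch

  # Default-style rewrite: route the node to rewrite_<node name>.
  def self.rewrite(t)
    send("rewrite_#{t.name}", t)
  end

  def self.rewrite_num(t)
    t.string.to_i
  end

  def self.rewrite_fun(t)
    funame, *args = t.children
    [ funame.string ] + args.collect { |c| rewrite(c) }
  end
end

tree =
  SketchNode.new(:fun, 'mul(1, 2)', [
    SketchNode.new(nil, 'mul', []),
    SketchNode.new(:num, '1', []),
    SketchNode.new(:num, '2', []) ])

p DispatchSketch.rewrite(tree)
  # => ["mul", 1, 2]
```

A custom rewrite(t), like the one in Hello above, simply replaces that dispatch step with its own logic.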

basic parsers

One makes a parser by composing basic parsers, for example:

  def args(i); eseq(:args, i, :pa, :exp, :com, :pz); end
  def funame(i); rex(:funame, i, /[a-z][a-z0-9]*/); end
  def fun(i); seq(:fun, i, :funame, :args); end

where the fun parser is a sequence combining the funame parser then the args one. :fun (the first argument to the basic parser seq) will be the name of the resulting (local) parse tree.

Below is a list of the basic parsers provided by Raabro.

The first parameter to the basic parser is the name used by rewrite rules. The second parameter is a Raabro::Input instance, mostly a wrapped string.

def str(name, input, string)
  # matching a string

def rex(name, input, regex_or_string)
  # matching a regexp
  # no need for ^ or \A, checks the match occurs at current offset

def seq(name, input, *parsers)
  # a sequence of parsers

def alt(name, input, *parsers)
  # tries the parsers in order, returns as soon as one succeeds

def altg(name, input, *parsers)
  # tries all the parsers, returns with the longest match

def rep(name, input, parser, min, max=0)
  # repeats the wrapped parser

def nott(name, input, parser)
  # succeeds if the wrapped parser fails, fails if it succeeds

def ren(name, input, parser)
  # renames the output of the wrapped parser

def jseq(name, input, eltpa, seppa)
  #
  # seq(name, input, eltpa, seppa, eltpa, seppa, eltpa, seppa, ...)
  #
  # a sequence of `eltpa` parsers separated (joined) by `seppa` parsers

def eseq(name, input, startpa, eltpa, seppa, endpa)
  #
  # seq(name, input, startpa, eltpa, seppa, eltpa, seppa, ..., endpa)
  #
  # a sequence of `eltpa` parsers separated (joined) by `seppa` parsers
  # preceded by a `startpa` parser and followed by an `endpa` parser
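
The difference between alt (first success wins) and altg (longest success wins) is easy to miss. Here is a minimal plain-Ruby sketch of the two strategies. It does not call raabro: `first_match`, `longest_match` and the two lambdas are hypothetical names made up for this illustration, and each "parser" simply returns the matched prefix or nil.

```ruby
# Hypothetical toy "parsers": return the matched prefix of the input, or nil.
AB  = ->(s) { s.start_with?('ab') ? 'ab' : nil }
ABC = ->(s) { s.start_with?('abc') ? 'abc' : nil }

# alt-like strategy: return as soon as one parser succeeds
def first_match(input, *parsers)
  parsers.each { |pa| m = pa.call(input); return m if m }
  nil
end

# altg-like strategy: try every parser, keep the longest match
def longest_match(input, *parsers)
  parsers.collect { |pa| pa.call(input) }.compact.max_by(&:length)
end

p first_match('abcd', AB, ABC)
  # => "ab"
p longest_match('abcd', AB, ABC)
  # => "abc"
```

With the first strategy the order of the alternatives matters. With the second it does not, at the cost of trying every alternative.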

the seq parser and its quantifiers

seq is special: it understands "quantifiers" ('?', '+' or '*'). They make seq behave a bit like a classical regex.

The '!' (bang, not) quantifier is explained at the end of this section.

module CartParser include Raabro

  def fruit(i)
    rex(:fruit, i, /(tomato|apple|orange)/)
  end
  def vegetable(i)
    rex(:vegetable, i, /(potato|cabbage|carrot)/)
  end

  def cart(i)
    seq(:cart, i, :fruit, '*', :vegetable, '*')
  end
    # zero or more fruits followed by zero or more vegetables
end

(Yes, this sample parser parses strings like "appletomatocabbage". It's not very useful, but I hope you get the point about seq.)
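
To make the regex analogy concrete, here is roughly what the cart rule would look like as a single plain Ruby Regexp. This is only an analogy, not something raabro generates; the `CART` constant is a name chosen for this sketch.

```ruby
# Regexp analogue of: seq(:cart, i, :fruit, '*', :vegetable, '*')
CART = /\A(?:tomato|apple|orange)*(?:potato|cabbage|carrot)*\z/

p CART.match?('appletomatocabbage')
  # => true
p CART.match?('cabbageapple')
  # => false (a fruit may not follow a vegetable)
```

Unlike the Regexp, the raabro version yields a parse tree with one named :fruit or :vegetable node per match, which the rewrite rules can then work on.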

The '!' (bang, not) quantifier is a kind of "negative lookahead".

  def menu(i)
    seq(:menu, i, :mise_en_bouche, :main, :main, '!', :dessert)
  end

Lousy example, but here a main cannot follow a main.

trees

An instance of Raabro::Tree is passed to rewrite() and rewrite_{name}() functions.

The most useful methods of this class are:

class Raabro::Tree

  # Look for the first child or sub-child with the given name.
  # If the given name is nil, looks for the first child with a name (not nil).
  #
  def sublookup(name=nil)

  # Gathers all the children or sub-children with the given name.
  # If the given name is nil, gathers all the children with a name (not nil).
  # When a child matches, does not pursue gathering from the children of the
  # matching child.
  #
  def subgather(name=nil)
end

I'm using "child or sub-child" instead of "descendant" because once a child or sub-child matches, those methods do not consider the children or sub-children of that matching entity.
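
The "stop at the first match" behaviour can be sketched on a toy tree in plain Ruby. `GatherNode` is a hypothetical Struct written for this illustration, not Raabro::Tree, and its subgather is a simplified reimplementation of the behaviour described above.

```ruby
# Hypothetical toy node; label is only there to tell the nodes apart.
GatherNode = Struct.new(:name, :label, :children) do

  # Gathers children or sub-children with the given name (or with any
  # name when name is nil), without descending into a matching node.
  def subgather(name=nil, acc=[])
    children.each do |c|
      if name ? c.name == name : !c.name.nil?
        acc << c
      else
        c.subgather(name, acc)
      end
    end
    acc
  end
end

x = GatherNode.new(:off, 'x', [])
a = GatherNode.new(:index, 'a', [ x ])
b = GatherNode.new(:index, 'b', [])
unnamed = GatherNode.new(nil, 'unnamed', [ a ])
path = GatherNode.new(:path, 'path', [ unnamed, b ])

p path.subgather(:index).collect(&:label)
  # => ["a", "b"]
p path.subgather(:off).collect(&:label)
  # => ["x"]
p path.subgather.collect(&:label)
  # => ["a", "b"] ("x" sits below the matching "a", so it is not reached)
```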

Here is a closeup on the rewrite functions of the sample parser at doc/readme1.rb (extracted from an early version of floraison/dense):

require 'raabro'

module PathParser include Raabro

  # (...)

  def rewrite_name(t); t.string; end
  def rewrite_off(t); t.string.to_i; end
  def rewrite_index(t); rewrite(t.sublookup); end
  def rewrite_path(t); t.subgather(:index).collect { |tt| rewrite(tt) }; end
end

Here, rewrite_index(t) returns the rewrite of its first child or sub-child that has a name, and rewrite_path(t) collects the rewrites of all its children or sub-children named "index".

errors

By default, a parser will return nil when it cannot successfully parse the input.

For example, given the above Fun parser, parsing some truncated input would yield nil:

tree = Sample::Fun.parse('f(a, b')
  # yields `nil`...

One can reparse with error: true and receive an error array with the parse error details:

err = Sample::Fun.parse('f(a, b', error: true)
  # yields:
  # [ line, column, offset, error_message, error_visual ]
[ 1, 4, 3, 'parsing failed .../:exp/:fun/:arg', "f(a, b\n   ^---" ]

When printed out, the last string in the error array looks like this:

f(a, b
   ^---
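
Since the error is a plain Array, it can be destructured directly. The sketch below does not call raabro; it reuses the literal array shown above, and the variable names are just a suggestion.

```ruby
# The error array from the example above, copied as a literal.
err = [ 1, 4, 3, 'parsing failed .../:exp/:fun/:arg', "f(a, b\n   ^---" ]

line, column, offset, message, visual = err

puts "#{message} (line #{line}, column #{column}, offset #{offset})"
puts visual
```

Note that line and column count from 1 while offset counts from 0, as the values above show (column 4, offset 3).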

error when not all is consumed

Consider the following toy parser:

module ToPlus include Raabro

  # parse

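  def to(input); str(nil, input, 'to'); end
    # the :to parser referenced below, matching the string "to"

  def to_plus(input); rep(:tos, input, :to, 1); end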

  # rewrite

  def rewrite(t); [ :ok, t.string ]; end
end

Sample::ToPlus.parse('totota')
  # yields nil since all the input was not parsed, "ta" is remaining

Sample::ToPlus.parse('totota', all: false)
  # yields
[ :ok, "toto" ]
  # and doesn't care about the remaining input "ta"

Sample::ToPlus.parse('totota', error: true)
  # yields
[ 1, 5, 4, "parsing failed, not all input was consumed", "totota\n    ^---" ]

When printed out, the last string in the error array looks like this:

totota
    ^---
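
For readers wondering how the line, column and visual relate to the offset, here is a small plain-Ruby sketch that rebuilds them from an input string and a 0-based offset. `error_position` is a hypothetical helper written for this README, not a raabro method, and raabro may compute these values differently.

```ruby
# Hypothetical helper: derive [ line, column, visual ] from a 0-based offset.
def error_position(input, offset)

  before = input[0, offset]

  line = before.count("\n") + 1
  column = offset - (before.rindex("\n") || -1)
    # both are 1-based

  text = input.lines[line - 1].to_s.chomp
  visual = "#{text}\n#{' ' * (column - 1)}^---"

  [ line, column, visual ]
end

p error_position('totota', 4)
  # => [1, 5, "totota\n    ^---"]
p error_position('f(a, b', 3)
  # => [1, 4, "f(a, b\n   ^---"]
```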

LICENSE

MIT, see LICENSE.txt