cloudsmith-io / grako Goto Github PK

Copyright (C) 2017      by Juancarlo Añez
Copyright (C) 2012-2016 by Juancarlo Añez and Thomas Bragg

        def _somerulename_(self):
            ...
        def somerulename(self, ast):
            return ast
def _default(self, ast):
    ...
    return ast
def _postproc(self, context, ast):
    ...
GRAMMAR = '''
    @@grammar::Calc

    start = expression $ ;

    expression
        =
        | term '+' ~ expression
        | term '-' ~ expression
        | term
        ;

    term
        =
        | factor '*' ~ term
        | factor '/' ~ term
        | factor
        ;

    factor
        =
        | '(' ~ @:expression ')'
        | number
        ;

    number = /\d+/ ;
'''

def main():
    import pprint
    import json
    from grako import parse
    from grako.util import asjson

    ast = parse(GRAMMAR, '3 + 5 * ( 10 - 20 )')
    print('PPRINT')
    pprint.pprint(ast, indent=2, width=20)
    print()

    json_ast = asjson(ast)
    print('JSON')
    print(json.dumps(json_ast, indent=2))
    print()

if __name__ == '__main__':
    main()
PPRINT
[ '3',
  '+',
  [ '5',
    '*',
    [ '10',
      '-',
      '20']]]

JSON
[
  "3",
  "+",
  [
    "5",
    "*",
    [
      "10",
      "-",
      "20"
    ]
  ]
]
$ python -m grako
$ scripts/grako
$ grako
$ python -m grako -h
usage: grako [--generate-parser | --draw | --object-model | --pretty]
            [--color] [--trace] [--no-left-recursion] [--name NAME]
            [--no-nameguard] [--outfile FILE] [--object-model-outfile FILE]
            [--whitespace CHARACTERS] [--help] [--version]
            GRAMMAR

Grako (for "grammar compiler") takes a grammar in a variation of EBNF as
input, and outputs a memoizing PEG/Packrat parser in Python.

positional arguments:
GRAMMAR               the filename of the Grako grammar to parse

optional arguments:
--generate-parser     generate parser code from the grammar (default)
--draw, -d            generate a diagram of the grammar (requires --outfile)
--object-model, -g    generate object model from the class names given as
                        rule arguments
--pretty, -p          generate a prettified version of the input grammar

parse-time options:
--color, -c           use color in traces (requires the colorama library)
--trace, -t           produce verbose parsing output

generation options:
--no-left-recursion, -l
                        turns left-recusion support off
--name NAME, -m NAME  Name for the grammar (defaults to GRAMMAR base name)
--no-nameguard, -n    allow tokens that are prefixes of others
--outfile FILE, --output FILE, -o FILE
                        output file (default is stdout)
--object-model-outfile FILE, -G FILE
                        generate object model and save to FILE
--whitespace CHARACTERS, -w CHARACTERS
                        characters to skip during parsing (use "" to disable)

common options:
--help, -h            show this help message and exit
--version, -v         provide version information and exit
$
from myparser import MyParser

parser = MyParser()
ast = parser.parse('text to parse', rule_name='start')
print(ast)
print(json.dumps(ast, indent=2))  # ASTs are JSON-friendly
model = parser.parse(text, rule_name='start', semantics=MySemantics())
class MySpecialBuffer(MyLanguageBuffer):
    ...

buf = MySpecialBuffer(text)
model = parser.parse(buf, rule_name='start', semantics=MySemantics())
$ python myparser.py inputfile startrule
$ python myparser.py -h
usage: myparser.py [-h] [-c] [-l] [-n] [-t] [-w WHITESPACE] FILE [STARTRULE]

Simple parser for DBD.

positional arguments:
    FILE                  the input file to parse
    STARTRULE             the start rule for parsing

optional arguments:
    -h, --help            show this help message and exit
    -c, --color           use color in traces (requires the colorama library)
    -l, --list            list all rules and exit
    -n, --no-nameguard    disable the 'nameguard' feature
    -t, --trace           output trace information
    -w WHITESPACE, --whitespace WHITESPACE
                        whitespace specification
name = <expre> ;
FRAGMENT = /[a-z]+/ ;
A `|` may be used before the first option if desired:

    choices
        =
        | e1
        | e2
        | e3
        ;

In this example, other options won't be considered if a
parenthesis is parsed:

    atom
        =
          '(' ~ @:expre ')'
        | int
        | bool
        ;
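
The effect of the cut in the `atom` rule above can be sketched with a small hand-written recursive-descent parser. This is only an illustration of the semantics, not the code that **Grako** generates: the names `ParseFailure`, `CutFailure`, and `parse_atom` are hypothetical, and `expre` is simplified to `atom` to keep the sketch short.

```python
import re

INT = re.compile(r'\d+')
BOOL = re.compile(r'true|false')


class ParseFailure(Exception):
    """An option failed; the next option of the choice may be tried."""


class CutFailure(Exception):
    """An option failed after a cut; the whole choice fails."""


def parse_int(text, pos):
    m = INT.match(text, pos)
    if not m:
        raise ParseFailure('expected int at %d' % pos)
    return m.group(), m.end()


def parse_bool(text, pos):
    m = BOOL.match(text, pos)
    if not m:
        raise ParseFailure('expected bool at %d' % pos)
    return m.group(), m.end()


def parse_atom(text, pos=0):
    # Option 1: '(' ~ @:expre ')'
    if text.startswith('(', pos):
        # The cut (~) has been passed: commit to this option.
        try:
            value, pos = parse_atom(text, pos + 1)  # expre simplified to atom
            if not text.startswith(')', pos):
                raise ParseFailure("expected ')' at %d" % pos)
            return value, pos + 1
        except ParseFailure as e:
            # No fallback to int or bool once the parenthesis was seen.
            raise CutFailure(str(e))
    # Options 2 and 3 are tried in order, backtracking on failure.
    for option in (parse_int, parse_bool):
        try:
            return option(text, pos)
        except ParseFailure:
            pass
    raise ParseFailure('expected atom at %d' % pos)
```

With this sketch, `parse_atom('(42')` raises `CutFailure` instead of quietly trying `int` and `bool`, which is what makes the error message point at the missing parenthesis.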

    e {s ~ e}

yet the result is a single list of the form:

    [e, s, e, s, e....]
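
As a rough illustration of that flat shape, the following plain-Python sketch collects elements and separators into one list. It uses only the `re` module, not **Grako**; the helper `parse_joined` and the choice of integers separated by commas are assumptions made for the example.

```python
import re

ELEMENT = re.compile(r'\d+')
SEPARATOR = re.compile(r'\s*,\s*')


def parse_joined(text):
    """Collect elements and separators into one flat list: [e, s, e, ...]."""
    m = ELEMENT.match(text)
    if not m:
        raise ValueError('expected at least one element')
    result, pos = [m.group()], m.end()
    while True:
        s = SEPARATOR.match(text, pos)
        if not s:
            break
        e = ELEMENT.match(text, s.end())
        if not e:
            # After a separator an element is mandatory,
            # mirroring the cut in e {s ~ e}.
            raise ValueError('expected element at %d' % s.end())
        result += [s.group().strip(), e.group()]
        pos = e.end()
    return result
```

Calling `parse_joined('1, 2,3')` yields a single list in which elements and separators alternate, rather than a list of `(separator, element)` pairs.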

Use grouping if `s` is more complex than a *token* or a *pattern*:

    (s t)%{ e }+

It is equivalent to:

    s%{e}+|{}

The expression:

    '+'<{/\d+/}+

Will parse this input:

    1 + 2 + 3 + 4

To this tree:

    (
        '+',
        (
            '+',
            (
                '+',
                '1',
                '2'
            ),
            '3'
        ),
        '4'
    )

The expression:

    '+'>{/\d+/}+

Will parse this input:

    1 + 2 + 3 + 4

To this tree:

    (
        '+',
        '1',
        (
            '+',
            '2',
            (
                '+',
                '3',
                '4'
            )
        )
    )
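
The two tree shapes above differ only in the direction of grouping. A plain-Python model of both, assuming the operands have already been parsed into a list, is a left fold and a right fold. The helpers `fold_left` and `fold_right` are hypothetical names for this sketch and are not part of **Grako**.

```python
from functools import reduce


def fold_left(op, operands):
    """Model of op<{e}+ : group operands to the left."""
    return reduce(lambda acc, e: (op, acc, e), operands)


def fold_right(op, operands):
    """Model of op>{e}+ : group operands to the right."""
    *init, last = operands
    return reduce(lambda acc, e: (op, e, acc), reversed(init), last)


operands = ['1', '2', '3', '4']
left_tree = fold_left('+', operands)
right_tree = fold_right('+', operands)
```

Left grouping suits operators such as subtraction, where `1 - 2 - 3` means `(1 - 2) - 3`; right grouping suits operators such as exponentiation.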

It is equivalent to:

    s.{e}+|{}

Note that if *text* is alphanumeric, then **Grako** will check
that the character following the token is not alphanumeric. This
is done to prevent tokens like *IN* matching when the text ahead
is *INITIALIZE*. This feature can be turned off by passing
`nameguard=False` to the `Parser` or the `Buffer`, or by using a
pattern expression (see below) instead of a token expression.
Alternatively, the `@@nameguard` or `@@namechars` directives may
be specified in the grammar:

    @@nameguard :: False

or to specify additional characters that should also be considered
part of names:

    @@namechars :: '$-.'
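
The check described above can be approximated in a few lines of Python. This is a simplified sketch of the idea, not **Grako**'s implementation; the function `match_token` and its parameters are hypothetical and only borrow the `nameguard` and `namechars` names from the text.

```python
def match_token(text, pos, token, nameguard=True, namechars=''):
    """Return the end position if `token` matches at `pos`, else None."""

    def is_name_char(c):
        return c.isalnum() or c in namechars

    if not text.startswith(token, pos):
        return None
    end = pos + len(token)
    # Guard only tokens that look like names.
    if nameguard and all(is_name_char(c) for c in token):
        if end < len(text) and is_name_char(text[end]):
            return None  # e.g. 'IN' must not match inside 'INITIALIZE'
    return end
```

In this sketch `match_token('INITIALIZE', 0, 'IN')` is rejected, while the same call with `nameguard=False` succeeds; passing `namechars='$-.'` additionally rejects `IN` when it is followed by `$`.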

The *regex* is interpreted as a [Python][] [raw string literal] and
passed *as-is* to the [Python][] [re] module (or to
[regex], if available), using `match()` at the current position in
the text. The matched text is the [AST][Abstract Syntax Tree] for
the expression.

Consecutive patterns are concatenated to form a single one.
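
The behavior of `match()` at a given position can be seen directly with the standard `re` module. In this sketch the input text and the position are made up for illustration, and the concatenation of consecutive patterns is modeled with Python's adjacent string literals, which is an assumption about the effect rather than a description of how **Grako** does it.

```python
import re

text = 'width = 42'
pos = 8  # hypothetical current parse position

# Two consecutive patterns, concatenated into a single regular expression.
pattern = re.compile(r'\d' r'\d*')

# match() is anchored at pos; unlike search(), it does not scan ahead.
m = pattern.match(text, pos)
ast = m.group()    # the matched text becomes the AST
new_pos = m.end()  # parsing continues after the match

# At position 0 the text starts with 'w', so the pattern does not match.
no_match = pattern.match(text, 0)
```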

Constants can be used to inject elements into the concrete and
abstract syntax trees, perhaps avoiding having to write a
semantic action. For example:

    boolean_option = name ['=' (boolean|`true`) ] ;

The following set of declarations:

    includable = exp1 ;

    expanded = exp0 >includable exp2 ;

Has the same effect as defining *expanded* as:

    expanded = exp0 exp1 exp2 ;

Note that the included rule must be defined before the rule that
includes it.

The override operator is useful to recover only part of the right
hand side of a rule without the need to name it, or add a
semantic action.

This is a typical use of the override operator:

    subexp = '(' @:expre ')' ;

The [AST][Abstract Syntax Tree] returned for the `subexp` rule
will be the [AST][Abstract Syntax Tree] recovered from invoking
`expre`.

This operator is convenient in cases such as:

    arglist = '(' @+:arg {',' @+:arg}* ')' ;

In which the delimiting tokens are of no interest.

number = /[0-9]+/ ;
number = number:/[0-9]+/ ;
addition(Add, op='+')
    =
    addend '+' addend
    ;
addition::Add, '+'
    =
    addend '+' addend
    ;
def addition(self, ast, name, op=None):
    ...
def _default(self, ast, *args, **kwargs):
    ...
base::Param = exp1 ;

extended < base = exp2 ;
extended::Param = exp1 exp2 ;
start = ab $;

ab = 'xyz' ;

@override
ab = @:'a' {@:'b'} ;
from collections import namedtuple

ParseInfo = namedtuple(
    'ParseInfo',
    [
        'buffer',
        'rule',
        'pos',
        'endpos',
        'line',
        'endline',
    ]
)
$ grako -m MyLanguage mygrammar.ebnf
class MyLanguageParser(Parser):
    ...
@@grammar :: MyLanguage
parser = MyParser(text, whitespace='\t ')
parser = MyParser(text, whitespace=re.compile(r'[\t ]+'))
parser = MyParser(text, whitespace='')
@@whitespace :: /[\t ]+/
parser = MyParser(text, ignorecase=True)
@@ignorecase :: True
parser = MyParser(text, comments_re=r"\(\*.*?\*\)")
parser = MyParser(
    text,
    comments_re=r"\(\*.*?\*\)",
    eol_comments_re=r"#.*?$"
)
@@comments :: /\(\*.*?\*\)/
@@eol_comments :: /#.*?$/
@@keyword :: if endif
@@keyword :: else elseif
@name
identifier = /(?!\d)\w+/ ;
statements = {!'END' statement}+ ;
class MySemantics(object):
    def some_rule_name(self, ast):
        return ''.join(ast)

    def _default(self, ast):
        pass
def _default(self, ast):
    ...
class MyLanguageSemantics(object):
    def identifier(self, ast):
        if my_lange_module.is_keyword(ast):
            raise FailedSemantics('"%s" is a keyword' % str(ast))
        return ast
myrule = first_part preproc {second_part} ;

preproc = () ;
def preproc(self, ast):
    ...
#include :: "filename"
from grako.model import ModelBuilderSemantics

parser = MyParser(semantics=ModelBuilderSemantics())
addition::AddOperator = left:mulexpre '+' right:addition ;
integer::int = /[0-9]+/ ;
from grako.model import NodeWalker

class MyNodeWalker(NodeWalker):

    def walk_AddOperator(self, node):
        left = self.walk(node.left)
        right = self.walk(node.right)

        print('ADDED', left, right)

model = MyParser(semantics=ModelBuilderSemantics()).parse(input)

walker = MyNodeWalker()
walker.walk(model)
def walk_Node(self, node):
    print('Reached Node', node)

def walk_str(self, s):
    return s

def walk_object(self, o):
    raise Exception('Unexpected type %s walked' % type(o).__name__)
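
The method names above (`walk_Node`, `walk_str`, `walk_object`) suggest dispatch on the class name of the walked object, falling back to base classes. A minimal sketch of that idea follows, assuming the lookup goes through the method resolution order; `SketchWalker`, `PrintingWalker`, and the small `Node` and `AddOperator` classes are hypothetical stand-ins and not **Grako**'s `NodeWalker` or model classes.

```python
class SketchWalker(object):
    """Hypothetical walker: dispatch to walk_<ClassName>, most specific first."""

    def walk(self, node):
        for cls in type(node).__mro__:
            method = getattr(self, 'walk_' + cls.__name__, None)
            if callable(method):
                return method(node)
        raise TypeError('no walker for %s' % type(node).__name__)


class Node(object):
    pass


class AddOperator(Node):
    def __init__(self, left, right):
        self.left, self.right = left, right


class PrintingWalker(SketchWalker):
    def walk_Node(self, node):
        # Reached for any Node subclass without a more specific method.
        return 'Node:' + type(node).__name__

    def walk_str(self, s):
        return s

    def walk_object(self, o):
        # Catch-all for anything that is neither a Node nor a str.
        raise Exception('Unexpected type %s walked' % type(o).__name__)
```

Here an `AddOperator` instance is handled by `walk_Node` because no `walk_AddOperator` is defined, and an `int` ends up in `walk_object`.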
from mymodel import AddOperator, MulOperator

semantics=ModelBuilderSemantics(types=[AddOperator, MulOperator])
additive
    =
    | addition
    | substraction
    ;

addition::AddOperator::Operator
    =
    left:mulexpre op:'+' right:additive
    ;

substraction::SubstractOperator::Operator
    =
    left:mulexpre op:'-' right:additive
    ;
class MyNodeWalker(NodeWalker):
    def walk_Operator(self, node):
        left = self.walk(node.left)
        right = self.walk(node.right)
        op = self.walk(node.op)

        print(type(node).__name__, op, left, right)

class Operator(ModelRenderer):
    template = '{left} {op} {right}'
class Lookahead(ModelRenderer):
    template = '''\
                with self._if():
                {exp:1::}\
                '''
    '''
    {fieldname:ind:sep:fmt}
    '''
indent(sep.join(fmt % render(v) for v in value), ind)
indent(fmt % render(value), ind)
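
The two expressions above can be tried out with stand-in helpers. In this sketch `indent`, `render`, and `render_field` are hypothetical local definitions (four spaces per indentation level, `str()` for rendering) chosen only to make the expressions runnable; they are assumptions and not **Grako**'s own functions.

```python
def indent(text, ind=1, multiplier=4):
    """Prefix every non-empty line with ind * multiplier spaces."""
    prefix = ' ' * (ind * multiplier)
    return '\n'.join(
        prefix + line if line else line
        for line in text.splitlines()
    )


def render(value):
    """Stand-in renderer: plain string conversion."""
    return str(value)


def render_field(value, ind=0, sep='', fmt='%s'):
    """Expand one {fieldname:ind:sep:fmt} field for a given value."""
    if isinstance(value, (list, tuple)):
        # Sequence: format each item, join with sep, then indent.
        return indent(sep.join(fmt % render(v) for v in value), ind)
    # Single value: format, then indent.
    return indent(fmt % render(value), ind)


block = render_field(['a', 'b'], ind=1, sep='\n', fmt='%s;')
single = render_field('x')
```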
parser = MyParser(
    text,
    left_recursion=True,
)
@@left_recursion :: False

Grako

Table of Contents

Rationale

The Generated Parsers

Using the Tool

As a Library

Compiling grammars to Python

Using the Generated Parser

Grammar Syntax

Rules

Expressions

e1 | e2

e1 e2

( e )

[ e ]

{ e } or { e }*

{ e }+

{}

~

s%{ e }+

s%{ e } or s%{ e }*

op<{ e }+

op>{ e }+

s.{ e }+

s.{ e } or s.{ e }*

&e

!e

'text' or "text"

r'text' or r"text"

?"regexp" or ?'regexp'

`constant`

rulename

>rulename

()

!()

name:e

name+:e

@:e

@+:e

$

# comment

Deprecated Expressions

Rules with Arguments

Based Rules

Rule Overrides

Abstract Syntax Trees (ASTs)

Grammar Name

Whitespace

Case Sensitivity

Comments

Reserved Words and Keywords

Semantic Actions

Include Directive

Building Models

Walking Models

Model Class Hierarchies

Templates and Translation

Left Recursion

Examples

Grako

Regex

Calc

antlr2grako

Other open-source Examples

License

Contact and Updates

Credits

Contributors

Changes
