goccmack / gogll Goto Github PK

Generates generalised LL (GLL) and reduced size LR(1) parsers with matching lexers

License: Apache License 2.0

Go 99.93% Makefile 0.07%

compiler-construction compiler-frontend context-free-grammars gll go golang lexer-generator lr-1 parser-generator rust rust-lang rustlang

gogll's Issues

Bug with using "\""

I'm using gogll version:

> gogll -version
gogll v3.2.0

If I run the following simple program:

package "hello"

Aa: "\"";

It fails with:

> gogll jsony.md
panic: runtime error: index out of range [2] with length 1

goroutine 1 [running]:
github.com/goccmack/gogll/ast.(*CharLiteral).Char(0xc00000c5c0, 0x1)
        /home/agus/go/src/github.com/goccmack/gogll/ast/lex.go:171 +0x1d9
github.com/goccmack/gogll/lex/items/event.Subset(0x68cc20, 0xc00000c5c0, 0x68cc20, 0xc00000c5c0, 0xc00000c5c0)
        /home/agus/go/src/github.com/goccmack/gogll/lex/items/event/event.go:105 +0x2a5
github.com/goccmack/gogll/lex/items.(*Set).nextSets(0xc00011e1c0, 0x0, 0x0, 0x0)
        /home/agus/go/src/github.com/goccmack/gogll/lex/items/items.go:179 +0x119
github.com/goccmack/gogll/lex/items.New(0xc00005c280, 0xc00000c5a0)
        /home/agus/go/src/github.com/goccmack/gogll/lex/items/items.go:65 +0x24f
main.main()
        /home/agus/go/src/github.com/goccmack/gogll/main.go:84 +0x21c

If this isn't a bug but a conceptual problem, it should probably fail with a more informative error.

Thanks for the project, it's very nice to play with BNF!
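One way a tool can turn this kind of panic into a readable error is to decode escaped literals through a function that returns an error instead of indexing into the text directly. The sketch below is not gogll's code: `decodeStringLit` is a hypothetical helper, and it assumes Go's own escape rules (via `strconv.Unquote`), which may differ from the escapes gogll accepts.

```go
package main

import (
	"fmt"
	"strconv"
)

// decodeStringLit is a hypothetical helper, not part of gogll. It takes the
// raw text of a double-quoted literal (quotes included) and returns the
// string it denotes, or an error if the literal is malformed.
func decodeStringLit(raw string) (string, error) {
	// strconv.Unquote applies Go's escape rules and reports bad input
	// as an error rather than panicking.
	s, err := strconv.Unquote(raw)
	if err != nil {
		return "", fmt.Errorf("invalid string literal %s: %w", raw, err)
	}
	return s, nil
}

func main() {
	// The last literal is deliberately unterminated.
	for _, raw := range []string{`"\""`, `"\\"`, `"\`} {
		s, err := decodeStringLit(raw)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s decodes to %q\n", raw, s)
	}
}
```

With this shape, a literal the tool cannot decode surfaces as an error that can carry the line and column, rather than as an index-out-of-range panic.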

Question: What's the recommended way of dealing with comments?

I just experimented with gogll, and it's pretty awesome. Thanks for working on this!

As stated in the title, I was wondering what the suggested way is of dealing with code comments. As actual tokens in the grammar? Or would it make sense to allow the unicode.IsSpace() call in lexer.New() to be replaced with a custom version that skips over comments?
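To make the second option concrete, here is a minimal sketch of what a custom skip function could look like. It is an illustration only: `skipSpaceAndComments` is a hypothetical name, the `//` comment syntax is an assumption, and nothing here reflects how the generated lexer is actually structured.

```go
package main

import (
	"fmt"
	"unicode"
)

// skipSpaceAndComments is a hypothetical replacement for a plain
// unicode.IsSpace loop. It returns the index of the first rune at or after
// pos that is neither whitespace nor part of a "//" line comment.
func skipSpaceAndComments(input []rune, pos int) int {
	for pos < len(input) {
		switch {
		case unicode.IsSpace(input[pos]):
			pos++
		case input[pos] == '/' && pos+1 < len(input) && input[pos+1] == '/':
			// Consume up to, but not including, the newline; the
			// whitespace case above then consumes the newline itself.
			for pos < len(input) && input[pos] != '\n' {
				pos++
			}
		default:
			return pos
		}
	}
	return pos
}

func main() {
	src := []rune("  // a comment\n\t// another\nfoo bar")
	i := skipSpaceAndComments(src, 0)
	fmt.Println(string(src[i:])) // prints "foo bar"
}
```

The trade-off is that comments skipped this way never reach the parser, so a tool that needs them (a formatter, for example) would have to treat them as tokens instead.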

Incorrect installation instructions

Because 'go get' is no longer supported outside a module, go get github.com/goccmack/gogll/v3 (as shown in the readme) no longer works (and implies that the code you're pulling in is a library).

go install github.com/goccmack/gogll/v3@latest is the comparable command, reflected in this change, which properly installs the latest version of gogll to $GOPATH/bin .

Error supporting \n as part of the grammar

In BASIC there is no end-of-statement marker (like ; in C). Instead, the end of the line is the marker. Even if I remove \n from the !whitespace characters, gogll still rejects the grammar. Here is a simplified example:

\n is part of the syntax; note that it is not part of !whitespace.

package "BUG_REPPORT"

File
	: DeclList                             							
;

DeclList
	: Stmt
	| DeclList   Stmt                   						
	;

Stmt
	: "Print" string_lit "\n"									 
	;


!line_comment
	: ('R''e''m' | ';') {not "\n" } "\n";

!whitespace : <' ' | '\t' | '\r'  > ;
string_lit 	: (quote {  quote quote | not_quote } quote) ;

This generates the following errors:

Parse Errors:
Parse Error: LexZeroOrMore : ∙{ LexAlternates }  I[33]=string_lit (303,307) "\n" at line 21 col 37
Expected one of: ['[,<,>,any,|,),.,lowcase,not,{,},;,[,],number,tokid,(,char_lit,letter,upcase]
Parse Error: LexAlternates : RegExp ∙| LexAlternates  I[32]=} (301,302) } at line 21 col 35
Expected one of: [|]
Parse Error: RegExp : LexSymbol ∙RegExp  I[32]=} (301,302) } at line 21 col 35
Expected one of: [char_lit,.,<,[,lowcase,not,tokid,{,'[,letter,number,any,upcase,(]
Parse Error: RegExp : ∙LexSymbol  I[29]={ (291,292) { at line 21 col 25
Expected one of: [|,},),;,>,]]
Parse Error: RegExp : LexSymbol ∙RegExp  I[28]=) (289,290) ) at line 21 col 23
Expected one of: [(,upcase,char_lit,[,lowcase,not,tokid,{,'[,.,<,any,letter,number]
Parse Error: LexAlternates : RegExp ∙| LexAlternates  I[28]=) (289,290) ) at line 21 col 23
Expected one of: [|]
Parse Error: LexAlternates : ∙RegExp  I[26]=| (284,285) | at line 21 col 18
Expected one of: [),>,],}]
Parse Error: RegExp : LexSymbol ∙RegExp  I[26]=| (284,285) | at line 21 col 18
Expected one of: [(,upcase,char_lit,lowcase,not,tokid,{,'[,.,<,[,any,letter,number]
Parse Error: RegExp : ∙LexSymbol  I[25]=char_lit (280,283) 'm' at line 21 col 14
Expected one of: [|,},),;,>,]]
Parse Error: RegExp : ∙LexSymbol  I[24]=char_lit (277,280) 'e' at line 21 col 11
Expected one of: [),;,>,],|,}]

The expected result would be a valid grammar.

Generated files are set as executable

As stated in the title, at least on Linux, files generated by gogll are marked as executable even though they are plain source code. Is there a particular reason for this?

This isn't a huge problem, but it does stand out. For reference, here is a screenshot of the file tree. The green files have all been generated by the tool and haven't been tampered with.

Cannot build grammar containing backtick as token

It seems I'm unable to define a token to be the backtick `.

E.g. consider the g1.md grammar:

package "g1"

Exp : Exp Op Exp
    | id
    ;

Op : "&" | "|" ;

id : letter <letter | number> ;

If I were to extend id like so:

id : letter <letter | number | '-'> ;

all would be handled just fine, but if I were to extend it like so:

id : letter <letter | number | '`'> ;

then gogll would no longer be able to build a parser from it:

Parse Errors:
Parse Error: LexAlternates : RegExp | ∙LexAlternates  I[24]=Error (388,391) '`  at line 26 col 32
Expected one of: [.,[,letter,{,(,any,char_lit,lowcase,not,number,upcase,<]
Parse Error: RegExp : LexSymbol ∙RegExp  I[23]=| (386,387) | at line 26 col 30
Expected one of: [any,char_lit,lowcase,number,{,not,upcase,(,.,<,[,letter]
Parse Error: LexAlternates : ∙RegExp  I[23]=| (386,387) | at line 26 col 30
Expected one of: [),>,],}]
Parse Error: RegExp : LexSymbol ∙RegExp  I[21]=| (377,378) | at line 26 col 21
Expected one of: [not,upcase,(,.,<,[,letter,any,char_lit,lowcase,number,{]
Parse Error: LexAlternates : ∙RegExp  I[21]=| (377,378) | at line 26 col 21
Expected one of: [),>,],}]
Parse Error: RegExp : ∙LexSymbol  I[19]=< (369,370) < at line 26 col 13
Expected one of: [>,],|,},),;]
Parse Error: Rules : ∙Rule  I[16]=tokid (357,359) id at line 26 col 1
Expected one of: [EOF]

Fix plain BNF input

The current version of gogll handles markdown grammar files correctly, but plain BNF files don't work. This affects, for example, all x.bnf files in the test directory.

The work-around for now is to use only markdown grammar files.

Add a v3.2.2 tag

Hi Marius, thank you for merging in PR #10.

Would you be able to add a v3.2.2 tag to the repository so we can pin our build to that version of GoGLL?

backslash used as operator (integer division)

In BASIC, the backslash character (\) is used for integer division on float numbers. When I try to use it in a grammar, I get an error. Here is the code:

package "BUG_REPPORT"

Expr
	: number "\\" number;

And here is the message:

Parse Errors:
Parse Error: SyntaxRule : nt : ∙SyntaxAlternates ;  I[4]=number (36,42) number at line 6 col 7
Expected one of: [empty,nt,string_lit,tokid]

I also tried:

package "BUG_REPPORT"

Expr
	: number intDivOp number;      	    								

intDivOp: '\\';

This gives me the same error as above.

Please support the go module correctly.

I followed the documentation and ran go get github.com/goccmack/gogll and it installed v1.0.4 instead of the latest version v3.2.2.

This is because the major version suffix is not given to the module name, even though the major version of gogll is 2 or higher.

For more details, please refer to the following document.
https://golang.org/ref/mod#module-path
https://github.com/golang/go/wiki/Modules#semantic-import-versioning

To fix this, change the module name declared in the go.mod file as follows

module github.com/goccmack/gogll/v3

In addition, make the same change to all import statements.

Is there a char_set?

The Readme.md mentions a char_lit and char_set. But char_set seems undefined. How do I specify 'a'-'z' (a to z)?

Examples fail to parse

After struggling to get gogll to parse a fairly basic grammar I reverted to the examples, and found that neither the GoGLL grammar nor the Json grammar currently parse; the only grammar I could get to parse was boolx?

> curl https://raw.githubusercontent.com/goccmack/gogll/master/examples/json/json.md -o json.md
> gogll json.md
ParseError: Error: Parse Failed right extent=380, m=1047
 Parse Error: CharLiteral : \' ∙\\ anyof("nrt\\'\"") \'  cI=364 I[cI]=" at line 15 col 11
 Parse Error: Sep : SepChar ∙Sep  cI=363 I[cI]=' at line 15 col 10
 Parse Error: Sep : SepChar ∙Sep  cI=361 I[cI]=: at line 15 col 8
 Parse Error: NTChars : NTChar ∙NTChars  cI=360 I[cI]=space at line 15 col 7
 Parse Error: Sep : SepChar ∙Sep  cI=354 I[cI]=s at line 15 col 1
 Parse Error: NTChars : NTChar ∙NTChars  cI=326 I[cI]=; at line 11 col 13
 Parse Error: Symbols : Symbol ∙Sep Symbols  cI=326 I[cI]=; at line 11 col 13
 Parse Error: Alternates : Alternate ∙SepE | SepE Alternates  cI=326 I[cI]=; at line 11 col 13
 Parse Error: Sep : SepChar ∙Sep  cI=321 I[cI]=V at line 11 col 8
 Parse Error: NTChars : NTChar ∙NTChars  cI=319 I[cI]=: at line 11 col 6
 Parse Error: Sep : SepChar ∙Sep  cI=314 I[cI]=G at line 11 col 1
 Parse Error: Sep : SepChar ∙Sep  cI=271 I[cI]=" at line 9 col 9
 Parse Error: Sep : SepChar ∙Sep  cI=263 I[cI]=p at line 9 col 1
Error in BSR: 0 parse trees exist for start symbol GoGLL

empty.md:

ParseError: Error: Parse Failed right extent=33, m=116
 Parse Error: NTChars : NTChar ∙NTChars  cI=32 I[cI]=space at line 3 col 14
 Parse Error: Terminal : l ∙o w c a s e  cI=27 I[cI]=e at line 3 col 9
 Parse Error: Sep : SepChar ∙Sep  cI=26 I[cI]=l at line 3 col 8
 Parse Error: Sep : SepChar ∙Sep  cI=24 I[cI]=: at line 3 col 6
 Parse Error: NTChars : NTChar ∙NTChars  cI=23 I[cI]=space at line 3 col 5
 Parse Error: Sep : SepChar ∙Sep  cI=19 I[cI]=n at line 3 col 1
 Parse Error: Sep : SepChar ∙Sep  cI=8 I[cI]=" at line 1 col 9
Error in BSR: 0 parse trees exist for start symbol GoGLL

and my dumbed down test:

package "test"

GoGLL : Package Options ;

identifier : letter ;

Package: "package" identifier ;

Options : "option" identifier
  | "option" identifier ',' Options
  ;

produces:

Semantic Error: Rule GoGLL is not used at line 3 col 1

generate a parser for ASN1 grammar

I would like to know if I can use gogll to generate a parser from ASN.1 grammar from something like this:
ASN.1 Grammar example

What changes will be needed?

How to get comments?

The lexer's tokens do not contain comments.

Support For Named Unicode Character Classes

I'm fiddling with gogll and I like it. Much cleaner than goyacc. I especially like the ability to embed the grammar in a CommonMark/Markdown file. A lovely way to document the grammar.

It would be nice/useful if gogll supported

- Unicode named character classes
- at least some Unicode character properties (e.g., ID_Start and ID_Continue)
- arbitrary Go regular expressions as a means of defining terminals.

Currently, the only predefined character classes you support are

letter: Unicode character class L|Letter, which comprises the following Unicode character classes:
- Lu|Uppercase Letter
- Ll|Lowercase Letter
- Lt|Titlecase Letter
- Lm|Modifier Letter
- Lo|Other Letter
upcase: Unspecified, but I suspect this is the Unicode character class Lu|Uppercase Letter.
lowcase: Unspecified, but I suspect this is the Unicode character class Ll|Lowercase Letter.
number: Unicode character class N|Number, which comprises the following Unicode character classes:
- Nd|Decimal Digit Number
- Nl|Letter Number
- No|Other Number

This makes writing parsers for some languages difficult (and the resulting parsers brittle should Unicode add new characters).

For instance, the Javascript/Ecmascript specification for an identifier is defined in terms of those Unicode characters having the ID_Start and ID_Continue properties:

And the Go programming language defines the identifier production as

identifier = letter { letter | unicode_digit } .

letter        = unicode_letter | "_" .

unicode_letter = /* a Unicode code point categorized as "Letter" */ .
unicode_digit  = /* a Unicode code point categorized as "Number, decimal digit" */ .

Not having support for Unicode named character classes or properties makes it difficult to write a parser for such languages. The Unicode Number, decimal digit class comprises some 650 [discontiguous] code points. I have no idea how many code points are in the ID_Start category (a lot), and ID_Continue adds to it. From Unicode® Standard Annex #31: Unicode Identifier and Pattern Syntax:

As you can see, adding support for this sort of stuff would be useful.
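For comparison, here is roughly how the Go identifier production quoted above can be checked with the category tables in Go's standard `unicode` package. This is only a sketch of the kind of class lookup being requested: the function names are made up for illustration, and it is not a proposal for how gogll should implement it.

```go
package main

import (
	"fmt"
	"unicode"
)

// isLetter follows the spec's "letter" production: a code point in the
// Unicode category L (Lu, Ll, Lt, Lm, Lo) or an underscore.
func isLetter(r rune) bool {
	return r == '_' || unicode.IsLetter(r)
}

// isUnicodeDigit follows "unicode_digit": category Nd only, which excludes
// the Nl and No code points that the broader N class would admit.
func isUnicodeDigit(r rune) bool {
	return unicode.Is(unicode.Nd, r)
}

// isGoIdentifier reports whether s matches
//   identifier = letter { letter | unicode_digit } .
func isGoIdentifier(s string) bool {
	if s == "" {
		return false
	}
	for i, r := range s {
		if isLetter(r) {
			continue
		}
		if i > 0 && isUnicodeDigit(r) {
			continue
		}
		return false
	}
	return true
}

func main() {
	for _, s := range []string{"_x9", "héllo", "9x", "a-b"} {
		fmt.Printf("%q: %v\n", s, isGoIdentifier(s))
	}
}
```

Because the tables ship with the Go toolchain, a check written this way picks up new code points when the standard library's Unicode data is updated, without hand-maintained ranges.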
