let-def / lrgrep

Menhir polishing toolbox, for experienced druids

Makefile 0.07% OCaml 99.69% Standard ML 0.24%

lrgrep's People

jmid squiddev

lrgrep's Issues

Ideas to improve OCaml's syntax errors

I've come across two ocaml/ocaml issues related to sub-optimal syntax errors and thought I would add them here (before I forget). They are:

Trying to understand the reduction/`!` pattern

I realise everything in this issue is quite involved, and not sure how "production ready" this repository is, so happy if you'd rather just close this :).

I'm currently fiddling with using lrgrep for a Lua parser I'm working on, in an attempt to provide better error messages for some common errors. I've put my current progress on this fork.

One issue I see a lot is people forgetting to put parentheses on zero-argument function calls, and so I'd like to provide an error for this case:

print -- should be print()
print("Hello")

Here parsing will fail on line 2, with an unexpected identifier. At that point, the interpreter trace for this parse looks like the following:

- line 1:0-5 IDENT
   [var: IDENT .]
 ↱ var
   [name: var .]
 ↱ simple_expr
   [name: simple_expr . OSQUARE expr CSQUARE]
   [name: simple_expr . DOT IDENT]
   [call: simple_expr . COLON IDENT call_args]
   [call: simple_expr . call_args]
 ↱ sep_list1(COMMA,name)
   [basic_stmt: sep_list1(COMMA,name) . EQUALS sep_list1(COMMA,expr)]
 ↱ name
   [simple_expr: name .]
   [sep_list1(COMMA,name): name . COMMA sep_list1(COMMA,name)]
   [sep_list1(COMMA,name): name .]
- entrypoint program

I'm currently matching this case with the following rule:

| [basic_stmt: sep_list1(COMMA,name) . EQUALS]; !
  partial { (* ... *) }

However, what I really want to be able to do is determine whether the next token is on the same line (i.e. x 42, in which case it's possible the user wanted to write x(42) or x = 42) or on the next line (i.e. print\nprint(), in which case it's probably just a function call). I tried doing this by putting a capture around the LR item, so I could compare positions:

| ([basic_stmt: sep_list1(COMMA,name) . EQUALS] as names); !
  partial { (* ... *) }

However, while the pattern still matches, names is always None. This is where my understanding begins to break down a little bit - the bytecode executed doesn't contain any Stores, and so I assume nothing is read from menhir's stack?
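
A minimal sketch of the comparison the semantic action is meant to perform, assuming the capture did yield positions. The names `hint`, `classify` and `pos` are hypothetical and not part of lrgrep; the only assumption about the captured value is that it can be turned into ordinary `Lexing.position` records.

```ocaml
(* Hypothetical helper, not part of lrgrep: classify a failed statement by
   comparing the end of the captured names with the start of the offending
   token. Both arguments are assumed to be plain Lexing.position values. *)
type hint =
  | Same_line  (* e.g. `x 42`: maybe `x(42)` or `x = 42` was intended *)
  | Next_line  (* e.g. `print` followed by a new statement: probably a missing `()` *)

let classify ~(names_end : Lexing.position) ~(token_start : Lexing.position) : hint =
  if names_end.Lexing.pos_lnum = token_start.Lexing.pos_lnum then Same_line
  else Next_line

(* Build a position by hand, for illustration only. *)
let pos ~line ~col : Lexing.position =
  { Lexing.pos_fname = "example.lua"; pos_lnum = line; pos_bol = 0; pos_cnum = col }

let () =
  (* `print` ends on line 1; the unexpected IDENT starts on line 2. *)
  assert (classify ~names_end:(pos ~line:1 ~col:5) ~token_start:(pos ~line:2 ~col:0) = Next_line);
  (* `x 42`: both tokens sit on line 1. *)
  assert (classify ~names_end:(pos ~line:1 ~col:1) ~token_start:(pos ~line:1 ~col:2) = Same_line)
```

With `names` always being None, there is currently nothing to pass as `names_end`, which is the crux of the question.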

Is this something you'd expect to work or, if not, is there an alternative way I could look at implementing such a rule?

Again, apologies for such a verbose question!

Reductions with empty productions

Apologies for the slew of issues here (though thank you for all the work you've done on fixing them! ❤️).

I'm continuing to update my current code to use the latest version of lrgrep, and have hit a problem where reductions for empty productions do not appear to be matched.

I've committed a small reproduction case, but to explain what's going on here:

We have the following grammar which accepts any number of characters between two brackets (e.g. (), (a), (aaa)).

let sentence := "(" ; ~ = chars ; ")" ; EOF ; <>

let chars := ~ = list(char) ; <List.rev>
let char := ~ = C ; <>

I'd like to match sentences without a closing bracket, and so have the following rule:

rule error_message = parse error
| OPAREN ; [chars]
{ "Unclosed '('" }

While this does match sentences like (aaa, it does not match just (. Curiously this does work with LRgrep 2 (so using OPAREN ; chars ; !), so I'm assuming this is a regression rather than intentional behaviour?
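
To make the expected behaviour concrete, here is a hand-written stand-in for the toy grammar. It is not how lrgrep matches anything; it only spells out which inputs should be accepted and which should produce the "Unclosed '('" message. The type `outcome` and the function `classify` are hypothetical, and the sketch assumes every character other than '(' and ')' lexes as the token C.

```ocaml
(* Hand-written stand-in for the toy grammar; this is NOT lrgrep's matcher.
   Assumption: every character other than '(' and ')' lexes as the token C. *)
type outcome =
  | Accepted of int  (* number of C tokens between the brackets *)
  | Unclosed         (* saw "(" and zero or more C tokens, then end of input *)
  | Other_error

let classify (s : string) : outcome =
  let n = String.length s in
  if n = 0 || s.[0] <> '(' then Other_error
  else
    (* Skip over the (possibly empty) run of C tokens. *)
    let rec skip i =
      if i < n && s.[i] <> '(' && s.[i] <> ')' then skip (i + 1) else i
    in
    let i = skip 1 in
    if i = n then Unclosed
    else if s.[i] = ')' && i = n - 1 then Accepted (i - 1)
    else Other_error

let () =
  assert (classify "()" = Accepted 0);
  assert (classify "(aaa)" = Accepted 3);
  (* Both of these should report the unclosed bracket, including the
     empty-production case that the rule currently misses. *)
  assert (classify "(aaa" = Unclosed);
  assert (classify "(" = Unclosed)
```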

Sorry I can't provide more info! Still going down the rabbit hole of trying to understand how lrgrep works.

Is Menhir's `really_top` really needed?

I was taking a look at updating my code to use the latest version of lrgrep (I'm still using a version from February, before #8/#9 were merged). However, as of 0cd455b I noticed that lrgrep requires the Menhir parser to export a val really_top : 'a env -> element definition.

While this function is pretty easy to add, there's a bit of me which wonders whether it's needed/useful in the first place. The only time we need to read the very top of the stack in the OCaml test suite is for test_0314.ml:

lrgrep/ocaml/testsuite/test_0314.ml

Line 1 in a6fec51

let lident = lident and false = UIDENT to

lrgrep/ocaml/parse_errors.mlyl

Lines 81 to 84 in a6fec51

    
           | pos=[_* / LET ... . IN ...]
           | pos=[_* / let_bindings(ext) ... . IN ...]
           | pos=[_* / let_bindings(no_ext) ... . IN ...]
             { "Expecting `in' to continue let-binding at " ^ line_and_char $startloc(pos) }

However, in this case, the capture isn't actually very useful: the start/end position of the top of the stack will be the first character in the file, when we probably want to capture the position of the initial LET (or let_bindings(_)). I wonder if startloc should instead capture the position of the first token in the production in this case (and similarly for endloc)?
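
A rough sketch of the suggested semantics, under the assumption that each symbol matched by the pattern carries a (start, end) pair of `Lexing.position` values in left-to-right order. The names `span`, `production_span` and `pos` are hypothetical; this is not lrgrep's implementation, only an illustration of "start of the first symbol, end of the last".

```ocaml
(* Hypothetical sketch, not lrgrep's implementation. Each matched symbol is
   assumed to carry a (start, end) pair of Lexing.position values, listed in
   left-to-right order. *)
type span = Lexing.position * Lexing.position

let production_span (symbols : span list) : span option =
  match symbols with
  | [] -> None  (* nothing matched, so there is no position to report *)
  | (first_start, first_end) :: rest ->
    (* Keep the start of the first symbol and the end of the last one. *)
    let last_end = List.fold_left (fun _ (_, e) -> e) first_end rest in
    Some (first_start, last_end)

(* Build a position by hand, for illustration only. *)
let pos cnum : Lexing.position =
  { Lexing.pos_fname = "test.ml"; pos_lnum = 1; pos_bol = 0; pos_cnum = cnum }

let () =
  (* Two symbols with illustrative offsets: 0-3 and 4-10. *)
  match production_span [ (pos 0, pos 3); (pos 4, pos 10) ] with
  | Some (s, e) -> assert (s.Lexing.pos_cnum = 0 && e.Lexing.pos_cnum = 10)
  | None -> assert false
```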

Apologies if this is already on your radar (or if I'm talking nonsense)! I've not dug too much into the code yet, just felt it was worth asking first.
