Giter Site home page Giter Site logo

Precedence about pikaparser.jl HOT 10 CLOSED

jkrumbiegel avatar jkrumbiegel commented on May 20, 2024
Precedence

from pikaparser.jl.

Comments (10)

exaexa avatar exaexa commented on May 20, 2024

Hi,

predecence with PEGs is often a bit surprising. Original pikaparser basically compiles the precedence integers into "base grammar" by the usual failthrough construction with a base case, which removes much ambiguity and also allows you to nicely specify the fixity of whatever operators you have.

In your case, there's no telling whether the "glue chars together" in :sequence should have lower or higher precedence than the "alternatives" in :either, and what you got is a valid parse with respect to the grammar.

I succeeded with implementing the failthrough gadget manually:

using PikaParser
const P = PikaParser

r = Dict(
    :regex => P.seq(:expr1),  # toplevel
    :expr1 => P.first(:or => P.seq(:expr1, P.token('|'), :expr2), :expr2),  # precedence level 1
    :expr2 => P.one_or_more(:basic),   # precedence level 2
    :basic => P.first(  # easily decidable single-item base cases
        :char => P.satisfy(isletter),
        :parens => P.seq(P.token('('), :expr1, P.token(')')),
    )
)

g = P.make_grammar([:regex], P.flatten(r))

input = collect("abc|d(e|fg|h)ij|kl")

p = P.parse(g, input)

println(P.traverse_match(g, p, P.find_match_at(g, p, :regex, 1), :regex))

(At precedence level 1, you may also try other combinations of :expr1 and :expr2 operands to change the fixity of the |.)

On the string abc|d(e|fg|h)ij|kl it should give (with a bit of juliaformatter) something like:

regex(
    expr1(
        or(
            expr1(
                or(
                    expr1(expr2(basic(char()), basic(char()), basic(char()))),
                    var"or-2"(),
                    expr2(
                        basic(char()),
                        basic(
                            parens(
                                var"parens-1"(),
                                expr1(
                                    or(
                                        expr1(
                                            or(
                                                expr1(expr2(basic(char()))),
                                                var"or-2"(),
                                                expr2(basic(char()), basic(char())),
                                            ),
                                        ),
                                        var"or-2"(),
                                        expr2(basic(char())),
                                    ),
                                ),
                                var"parens-3"(),
                            ),
                        ),
                        basic(char()),
                        basic(char()),
                    ),
                ),
            ),
            var"or-2"(),
            expr2(basic(char()), basic(char())),
        ),
    ),
)

I don't see the immediate issue why your parser would produce the "bad" parsetree, but I expect the problem to be a bad interplay of greedy matching and left-recursion problems, which always gives surprises. In this case, if you'd parse with normal recursive descent, you would expect the :sequence rule to expand to :expr, then try :either which again tries :sequence at the same position and then loops forever. Pikaparser fixes this by reconstructing the parsetree from the end, and prefers matching the b|c as a continuation in ab|c as a higher-priority :either, which is a perfectly valid :expr (even better match than the :char because it doesn't produce failure later), but that doesn't give you the way to glue the chars together. In short, "priority" in first isn't the same thing as "precedence".

You might be able to quickfix your grammar by instead using :chars that greedily matches the whole char group, but I didn't try that.

Also, I should probably add some support for easily creating the precedence chains :D

from pikaparser.jl.

exaexa avatar exaexa commented on May 20, 2024

Bonus: parsing the regex suffixes with normal grammars is a major source of PITA but I guess here you'll have luck with just

...
:expr2 => P.one_or_more(:expr3),
:expr3 => P.first(:one_or_more, :zero_or_more, ..., :basic),
:one_or_more => P.seq(:basic, P.token('+')),
...

from pikaparser.jl.

exaexa avatar exaexa commented on May 20, 2024

Also, with #3 merged you can write the ruleset from above roughly as:

r = Dict(
    P.@precedences (i -> Symbol(:r, i)) c n begin
      :regex => P.seq(n, P.token('|'), c)
      :sequence => P.one_or_more(n)
      P.first(
        :group => P.seq(P.token('('), n, P.token(')')),
        :char => P.satisfy(isletter),
      )
    end
)

(untested)

from pikaparser.jl.

jkrumbiegel avatar jkrumbiegel commented on May 20, 2024

Cool, I will play around with this. It already looked like your previous suggestions worked, and I understood better why it didn't work before, so thanks!

from pikaparser.jl.

exaexa avatar exaexa commented on May 20, 2024

OK. Feel free to report your final grammar, might be useful to add some extra bits from it to tests.

from pikaparser.jl.

jkrumbiegel avatar jkrumbiegel commented on May 20, 2024

If you want you can have a look here: https://github.com/jkrumbiegel/ReadableRegex.jl/blob/explain-regex/explain_regex.jl

I wanted to do the reverse of what the package is doing, translate a given regex into function code that should hopefully be easier to understand.

For example, explain_regex(r"^-?[0-9]{0,2}(\.[0-9]{1,2})?$|^-?(100)(\.[0]{1,2})?$") gives this (with a tiny bit manual indentation to make the either arguments more salient:

either(
    BEGIN *
        maybe('-') *
        between(0, 2, char_in('0':1:'9')) *
        maybe(capture('.' * between(1, 2, char_in('0':1:'9')))) *
        END,
    BEGIN *
        maybe('-') *
        capture("100") *
        maybe(capture('.' * between(1, 2, char_in('0')))) *
        END,
)

from pikaparser.jl.

exaexa avatar exaexa commented on May 20, 2024

Aaaah nice. Regex is a bit write-only tbh, but this should work. If I get it right, you are able to reconstruct actual ReadableRegex structure from any regex pattern?

Anyway, let me know in case you hit any difficulties!

(also, how new is the import Package as P syntax? the last time I tried it didn't work :D :D )

from pikaparser.jl.

jkrumbiegel avatar jkrumbiegel commented on May 20, 2024

Yes, that's the goal. And yeah it's mostly read only, but it's very hard to understand as it is. Like spotting a | in the middle of a long regex etc. So I thought it would be cool to have that split out a bit, and with the ReadableRegex functions you can then more easily try out changes. But anyway it's mostly a didactical thing :)

from pikaparser.jl.

exaexa avatar exaexa commented on May 20, 2024

Thinking about that, is there any regular language matcher in Julia? This way we could have a completely regex-less regular language matching. (Technically you could translate the regular expressions to pikaparser and match them as is, but compared to plain DFA/NFA required for regexes there's an ugly lot of overhead here...)

from pikaparser.jl.

exaexa avatar exaexa commented on May 20, 2024

Closing this because the precedence helpers gonna get released asap. Thanks for input!

from pikaparser.jl.

Related Issues (7)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.