Giter Site home page Giter Site logo

Comments (14)

veelo avatar veelo commented on July 18, 2024 1

Hi @munael.

I think delimited strings is one aspect of the D grammar that is impossible to capture in a typical PEG parser generator:

string hello = q"HELLO
This is a HELLO delimited string.
Double quotes " may be unbalanced!
HELLO";

This is because the delimiter is unknown until parsing is halfway through the expression to be parsed.

However, Pegged is not typical, and I recently found out that you can use a user defined parser for these kinds of language constructs: #342. This is cool!

So nowadays I would nuance myself and say that it may be possible to create a Pegged generated parser for the entire D language, but I can't be certain until somebody does it. Another aspect is mixins, which may be tricky.

However, Pegged is currently not implemented optimally regarding speed, and trumping DMD in terms of speed is a tall order. So if anybody would be interested in working on this, the motivation would be something else than speed. And then there will always be the issue of keeping it up to date.

from pegged.

veelo avatar veelo commented on July 18, 2024

What version of Pegged are you using? If it is a beta, can you compare against v0.4.4 please?

from pegged.

exoticus avatar exoticus commented on July 18, 2024

@veelo i'm on 0.4.4 already

For completeness sake, i tried 0.4.5-beta.2 and ended up with the same behavior, note i am not doing anything fancy, just compiling dgrammer.d

from pegged.

veelo avatar veelo commented on July 18, 2024

note i am not doing anything fancy, just compiling dgrammer.d

This one? https://github.com/PhilippeSigaud/Pegged/blob/master/examples/dgrammar/src/pegged/examples/dgrammar.d

I appended

mixin(grammar(Dgrammar));

unittest
{
    import std.stdio;
    writeln(D(`enum E = "Hello";`));
}

and ran dub test, and it took indeed ca. 5 minutes and 25GB. If your example is longer, I can imagine resource usage will be more like what you are seeing.

But this generates the parser at compile-time, which is probably the reason why this takes so long. With bigger grammars it is better to use asModule, which I will try next.

from pegged.

veelo avatar veelo commented on July 18, 2024

OK, asModule (after the model of the extended_pascal example) does not help with the time consumption. But it allows to take an extra -lowmem compiler option to take effect, which keeps memory usage around 1GB, at the cost of an increased compilation time of 20 minutes! I don't know if that's worth it.

I don't really know what we should do with this example. As it is now, the example is incomplete and merely defines a string enum, and there is not much point to test this in CI. I have no idea why this example is so demanding, extended_pascal is just marginally smaller, and finishes building in 4 seconds.

FYI regarding your observation "leaks memory like crazy": this is considered a feature of the compiler, which refrains from freeing memory by default. The -lowmem option changes that.

from pegged.

exoticus avatar exoticus commented on July 18, 2024

@veelo

extended_pascal is just marginally smaller, and finishes building in 4 seconds.

oh, i thought that like dgrammar.d any large grammar would be like that, tbh it's unusable, any idea where to even start investigating why this happens with this particular grammar, if that's the case?

this is considered a feature of the compiler, which refrains from freeing memory by default. The -lowmem option changes that.

My bad, dmd taking all that time made me think that there's something somehow stuck and using memory, want to note too that the resulting executable is 20Mb, isn't that bizarre?

from pegged.

veelo avatar veelo commented on July 18, 2024

Yes, something is going on here that would be interesting to investigate. However, I myself don't consider it worth my time. I think this example grew out of a curiosity of Philippe to see how far the capabilities of Pegged stretch. It was never complete, is now out of date, and also has some extra experiments (such as macros). It is uncertain to me whether the complete D grammar can in fact be accurately defined as a PEG. And if possible, what point is there now that dmd can be used as a library? In my eyes resources are better spent on improving things for grammars that are in actual use, if necessary.

Pegged is very heavy on templates, and it is well known that templates can result in considerable bloat in the executable. There are individuals that think they can develop a better template engine in the compiler (Stefan Koch) but as usual the problem is time (funding).

Pegged itself also bears signs of unfinished experiments and unneeded complexity, possibly things can be improved a bit by a cleanup.

from pegged.

exoticus avatar exoticus commented on July 18, 2024

@veelo really appreciate the input, when i asked that something might be wrong i meant a problem that prevailed in the dgrammer that will probably reveal itself again once another grammar gets big enough, or maybe hits the same combination that triggered this behavior, but understand that if it's an issue with this particular grammar alone as a result of experimentation it shouldn't be a priority. I am just new to pegged, fetched out that grammar as it's the biggest one so matching my planned grammar in terms of size, yet i was surprised of the time/memory, thought this is how it would get with any other big grammar

from pegged.

veelo avatar veelo commented on July 18, 2024

Your concerns are legitimate, and there is no guarantee that you won't run into problems. As usual with volunteer-run Open Source projects, you may have to put in an unexpected amount of extra work, but at least you have the possibility to do that. I ran into a fundamental problem myself (left-recursion) and had to do what felt like moving a mountain, but succeeded at last. Now left-recursion is not a problem anymore. I ended up looking after Pegged as a maintainer, but it is just one link in a chain of tools that I use and hence does not receive my full attention.

Out of curiosity, is your planned grammar that of an existing language, or will you be developing it as you go? PEGs are not a good match for every language, due to its use of prioritised choice. Pegged does support a longest match alternator to combat that limitation, but that breaks the linear time promise of a pure PEG.

from pegged.

exoticus avatar exoticus commented on July 18, 2024

@veelo totally understand the oss and that no maintainer is obligated to do nothing if he/she doesn't feel like doing it, i was just sharing a concern nothing more, and if i get deep enough into pegged i'll be happy to contribute back, about the language, It's a C-like language that i am developing, the reason i picked PEGs is that coding recursive decent parsers gets really tedious after a while, and you end up with so much moving parts that even small changes become a hassle, so i started looking for a better, context free grammars in general didn't fit because more often than not they have to deal with ambiguity in weird ways, but PEGs on the other hand are recursive decent naturally. One more thing that made me consider it and feel safe that i am not making a stupid choice is that Python is moving to PEG for it's parser/lexer/ast combo and the author made a pretty nice argument for that decision. So yeah will see how it goes

from pegged.

veelo avatar veelo commented on July 18, 2024

Interesting reference, nice to see they tackled the left-recursion challenge as well. Since you are developing a language as a PEG, I bet you'll have a much better experience than trying to convert an existing language into a PEG. Welcome on board! (The D language specification is reverse engineered from a manually written lexer+parser, and converting that into a PEG is a different game.)

from pegged.

munael avatar munael commented on July 18, 2024

Why is PEG unsuitable for the D grammar?

It is uncertain to me whether the complete D grammar can in fact be accurately defined as a PEG.

I'm new to parser generators and combinators, and to PEGs too. My understanding is that the theory behind it models Recursive Descent parsers. Can you share your thoughts from that the time on where in the D grammar a PEG would fail? I've also been looking at the DMD implementation of the lexer/parser but it's quite big.

Thank you :)

cc @veelo

from pegged.

munael avatar munael commented on July 18, 2024

@veelo

That's a good example 👀
Is it the case that the PEG formalism supports only stateless parsers?

Semantic actions suggest no, but semantic actions act after rules and can't create their own captures as far as I understand it. The implementation may support it, but it doesn't look like a proper PEG.

I would've expected delimited strings to be handled at the lexer level, to be honest. A custom parser could act as one, but what would the pre-parsed tokens in a delimited string look like?

from pegged.

veelo avatar veelo commented on July 18, 2024

That's a good example 👀 Is it the case that the PEG formalism supports only stateless parsers?

Yes. https://scg.unibe.ch/assets/download/lectures/cc/CC-09-PEGs.pdf

Semantic actions suggest no, but semantic actions act after rules and can't create their own captures as far as I understand it.

Correct.

I would've expected delimited strings to be handled at the lexer level, to be honest. A custom parser could act as one, but what would the pre-parsed tokens in a delimited string look like?

A PEG parser is a lexer and parser in one, so there is no lexer level.

from pegged.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.