Compiling the dgrammer.d takes close to 10 minutes an

Hi <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

note i am not doing anything fancy, just compiling <code class="notransla

OK, asModule (after the model of the <code class="not

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

is it okay for the D grammar to compile in 10 minutes and use 45 gigs of ram? about pegged HOT 14 OPEN

exoticus commented on July 18, 2024

is it okay for the D grammar to compile in 10 minutes and use 45 gigs of ram?

from pegged.

Comments (14)

veelo commented on July 18, 2024 1

Hi @munael.

I think delimited strings is one aspect of the D grammar that is impossible to capture in a typical PEG parser generator:

string hello = q"HELLO
This is a HELLO delimited string.
Double quotes " may be unbalanced!
HELLO";

This is because the delimiter is unknown until parsing is halfway through the expression to be parsed.

However, Pegged is not typical, and I recently found out that you can use a user defined parser for these kinds of language constructs: #342. This is cool!

So nowadays I would nuance myself and say that it may be possible to create a Pegged generated parser for the entire D language, but I can't be certain until somebody does it. Another aspect is mixins, which may be tricky.

However, Pegged is currently not implemented optimally regarding speed, and trumping DMD in terms of speed is a tall order. So if anybody would be interested in working on this, the motivation would be something else than speed. And then there will always be the issue of keeping it up to date.

from pegged.

veelo commented on July 18, 2024

What version of Pegged are you using? If it is a beta, can you compare against v0.4.4 please?

from pegged.

exoticus commented on July 18, 2024

@veelo i'm on 0.4.4 already

For completeness sake, i tried 0.4.5-beta.2 and ended up with the same behavior, note i am not doing anything fancy, just compiling dgrammer.d

from pegged.

veelo commented on July 18, 2024

note i am not doing anything fancy, just compiling dgrammer.d

This one? https://github.com/PhilippeSigaud/Pegged/blob/master/examples/dgrammar/src/pegged/examples/dgrammar.d

I appended

mixin(grammar(Dgrammar));

unittest
{
    import std.stdio;
    writeln(D(`enum E = "Hello";`));
}

and ran dub test, and it took indeed ca. 5 minutes and 25GB. If your example is longer, I can imagine resource usage will be more like what you are seeing.

But this generates the parser at compile-time, which is probably the reason why this takes so long. With bigger grammars it is better to use asModule, which I will try next.

from pegged.

veelo commented on July 18, 2024

OK, asModule (after the model of the extended_pascal example) does not help with the time consumption. But it allows to take an extra -lowmem compiler option to take effect, which keeps memory usage around 1GB, at the cost of an increased compilation time of 20 minutes! I don't know if that's worth it.

I don't really know what we should do with this example. As it is now, the example is incomplete and merely defines a string enum, and there is not much point to test this in CI. I have no idea why this example is so demanding, extended_pascal is just marginally smaller, and finishes building in 4 seconds.

FYI regarding your observation "leaks memory like crazy": this is considered a feature of the compiler, which refrains from freeing memory by default. The -lowmem option changes that.

from pegged.

exoticus commented on July 18, 2024

@veelo

extended_pascal is just marginally smaller, and finishes building in 4 seconds.

oh, i thought that like dgrammar.d any large grammar would be like that, tbh it's unusable, any idea where to even start investigating why this happens with this particular grammar, if that's the case?

this is considered a feature of the compiler, which refrains from freeing memory by default. The -lowmem option changes that.

My bad, dmd taking all that time made me think that there's something somehow stuck and using memory, want to note too that the resulting executable is 20Mb, isn't that bizarre?

from pegged.

veelo commented on July 18, 2024

Yes, something is going on here that would be interesting to investigate. However, I myself don't consider it worth my time. I think this example grew out of a curiosity of Philippe to see how far the capabilities of Pegged stretch. It was never complete, is now out of date, and also has some extra experiments (such as macros). It is uncertain to me whether the complete D grammar can in fact be accurately defined as a PEG. And if possible, what point is there now that dmd can be used as a library? In my eyes resources are better spent on improving things for grammars that are in actual use, if necessary.

Pegged is very heavy on templates, and it is well known that templates can result in considerable bloat in the executable. There are individuals that think they can develop a better template engine in the compiler (Stefan Koch) but as usual the problem is time (funding).

Pegged itself also bears signs of unfinished experiments and unneeded complexity, possibly things can be improved a bit by a cleanup.

from pegged.

exoticus commented on July 18, 2024

@veelo really appreciate the input, when i asked that something might be wrong i meant a problem that prevailed in the dgrammer that will probably reveal itself again once another grammar gets big enough, or maybe hits the same combination that triggered this behavior, but understand that if it's an issue with this particular grammar alone as a result of experimentation it shouldn't be a priority. I am just new to pegged, fetched out that grammar as it's the biggest one so matching my planned grammar in terms of size, yet i was surprised of the time/memory, thought this is how it would get with any other big grammar

from pegged.

veelo commented on July 18, 2024

Your concerns are legitimate, and there is no guarantee that you won't run into problems. As usual with volunteer-run Open Source projects, you may have to put in an unexpected amount of extra work, but at least you have the possibility to do that. I ran into a fundamental problem myself (left-recursion) and had to do what felt like moving a mountain, but succeeded at last. Now left-recursion is not a problem anymore. I ended up looking after Pegged as a maintainer, but it is just one link in a chain of tools that I use and hence does not receive my full attention.

Out of curiosity, is your planned grammar that of an existing language, or will you be developing it as you go? PEGs are not a good match for every language, due to its use of prioritised choice. Pegged does support a longest match alternator to combat that limitation, but that breaks the linear time promise of a pure PEG.

from pegged.

exoticus commented on July 18, 2024

@veelo totally understand the oss and that no maintainer is obligated to do nothing if he/she doesn't feel like doing it, i was just sharing a concern nothing more, and if i get deep enough into pegged i'll be happy to contribute back, about the language, It's a C-like language that i am developing, the reason i picked PEGs is that coding recursive decent parsers gets really tedious after a while, and you end up with so much moving parts that even small changes become a hassle, so i started looking for a better, context free grammars in general didn't fit because more often than not they have to deal with ambiguity in weird ways, but PEGs on the other hand are recursive decent naturally. One more thing that made me consider it and feel safe that i am not making a stupid choice is that Python is moving to PEG for it's parser/lexer/ast combo and the author made a pretty nice argument for that decision. So yeah will see how it goes

from pegged.

veelo commented on July 18, 2024

Interesting reference, nice to see they tackled the left-recursion challenge as well. Since you are developing a language as a PEG, I bet you'll have a much better experience than trying to convert an existing language into a PEG. Welcome on board! (The D language specification is reverse engineered from a manually written lexer+parser, and converting that into a PEG is a different game.)

from pegged.

munael commented on July 18, 2024

Why is PEG unsuitable for the D grammar?

It is uncertain to me whether the complete D grammar can in fact be accurately defined as a PEG.

I'm new to parser generators and combinators, and to PEGs too. My understanding is that the theory behind it models Recursive Descent parsers. Can you share your thoughts from that the time on where in the D grammar a PEG would fail? I've also been looking at the DMD implementation of the lexer/parser but it's quite big.

Thank you :)

cc @veelo

from pegged.

munael commented on July 18, 2024

@veelo

That's a good example 👀
Is it the case that the PEG formalism supports only stateless parsers?

Semantic actions suggest no, but semantic actions act after rules and can't create their own captures as far as I understand it. The implementation may support it, but it doesn't look like a proper PEG.

I would've expected delimited strings to be handled at the lexer level, to be honest. A custom parser could act as one, but what would the pre-parsed tokens in a delimited string look like?

from pegged.

veelo commented on July 18, 2024

That's a good example 👀 Is it the case that the PEG formalism supports only stateless parsers?

Yes. https://scg.unibe.ch/assets/download/lectures/cc/CC-09-PEGs.pdf

Semantic actions suggest no, but semantic actions act after rules and can't create their own captures as far as I understand it.

Correct.

I would've expected delimited strings to be handled at the lexer level, to be honest. A custom parser could act as one, but what would the pre-parsed tokens in a delimited string look like?

A PEG parser is a lexer and parser in one, so there is no lexer level.

from pegged.

is it okay for the D grammar to compile in 10 minutes and use 45 gigs of ram? about pegged HOT 14 OPEN

Comments (14)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent