Comments (14)
Hi @munael.
I think delimited strings is one aspect of the D grammar that is impossible to capture in a typical PEG parser generator:
string hello = q"HELLO
This is a HELLO delimited string.
Double quotes " may be unbalanced!
HELLO";
This is because the delimiter is unknown until parsing is halfway through the expression to be parsed.
However, Pegged is not typical, and I recently found out that you can use a user defined parser for these kinds of language constructs: #342. This is cool!
So nowadays I would nuance myself and say that it may be possible to create a Pegged generated parser for the entire D language, but I can't be certain until somebody does it. Another aspect is mixins, which may be tricky.
However, Pegged is currently not implemented optimally regarding speed, and trumping DMD in terms of speed is a tall order. So if anybody would be interested in working on this, the motivation would be something else than speed. And then there will always be the issue of keeping it up to date.
from pegged.
What version of Pegged are you using? If it is a beta, can you compare against v0.4.4 please?
from pegged.
@veelo i'm on 0.4.4
already
For completeness sake, i tried 0.4.5-beta.2
and ended up with the same behavior, note i am not doing anything fancy, just compiling dgrammer.d
from pegged.
note i am not doing anything fancy, just compiling
dgrammer.d
I appended
mixin(grammar(Dgrammar));
unittest
{
import std.stdio;
writeln(D(`enum E = "Hello";`));
}
and ran dub test
, and it took indeed ca. 5 minutes and 25GB. If your example is longer, I can imagine resource usage will be more like what you are seeing.
But this generates the parser at compile-time, which is probably the reason why this takes so long. With bigger grammars it is better to use asModule
, which I will try next.
from pegged.
OK, asModule
(after the model of the extended_pascal
example) does not help with the time consumption. But it allows to take an extra -lowmem
compiler option to take effect, which keeps memory usage around 1GB, at the cost of an increased compilation time of 20 minutes! I don't know if that's worth it.
I don't really know what we should do with this example. As it is now, the example is incomplete and merely defines a string enum, and there is not much point to test this in CI. I have no idea why this example is so demanding, extended_pascal
is just marginally smaller, and finishes building in 4 seconds.
FYI regarding your observation "leaks memory like crazy": this is considered a feature of the compiler, which refrains from freeing memory by default. The -lowmem
option changes that.
from pegged.
extended_pascal is just marginally smaller, and finishes building in 4 seconds.
oh, i thought that like dgrammar.d
any large grammar would be like that, tbh it's unusable, any idea where to even start investigating why this happens with this particular grammar, if that's the case?
this is considered a feature of the compiler, which refrains from freeing memory by default. The -lowmem option changes that.
My bad, dmd taking all that time made me think that there's something somehow stuck and using memory, want to note too that the resulting executable is 20Mb, isn't that bizarre?
from pegged.
Yes, something is going on here that would be interesting to investigate. However, I myself don't consider it worth my time. I think this example grew out of a curiosity of Philippe to see how far the capabilities of Pegged stretch. It was never complete, is now out of date, and also has some extra experiments (such as macros). It is uncertain to me whether the complete D grammar can in fact be accurately defined as a PEG. And if possible, what point is there now that dmd can be used as a library? In my eyes resources are better spent on improving things for grammars that are in actual use, if necessary.
Pegged is very heavy on templates, and it is well known that templates can result in considerable bloat in the executable. There are individuals that think they can develop a better template engine in the compiler (Stefan Koch) but as usual the problem is time (funding).
Pegged itself also bears signs of unfinished experiments and unneeded complexity, possibly things can be improved a bit by a cleanup.
from pegged.
@veelo really appreciate the input, when i asked that something might be wrong i meant a problem that prevailed in the dgrammer
that will probably reveal itself again once another grammar gets big enough, or maybe hits the same combination that triggered this behavior, but understand that if it's an issue with this particular grammar alone as a result of experimentation it shouldn't be a priority. I am just new to pegged, fetched out that grammar as it's the biggest one so matching my planned grammar in terms of size, yet i was surprised of the time/memory, thought this is how it would get with any other big grammar
from pegged.
Your concerns are legitimate, and there is no guarantee that you won't run into problems. As usual with volunteer-run Open Source projects, you may have to put in an unexpected amount of extra work, but at least you have the possibility to do that. I ran into a fundamental problem myself (left-recursion) and had to do what felt like moving a mountain, but succeeded at last. Now left-recursion is not a problem anymore. I ended up looking after Pegged as a maintainer, but it is just one link in a chain of tools that I use and hence does not receive my full attention.
Out of curiosity, is your planned grammar that of an existing language, or will you be developing it as you go? PEGs are not a good match for every language, due to its use of prioritised choice. Pegged does support a longest match alternator to combat that limitation, but that breaks the linear time promise of a pure PEG.
from pegged.
@veelo totally understand the oss and that no maintainer is obligated to do nothing if he/she doesn't feel like doing it, i was just sharing a concern nothing more, and if i get deep enough into pegged i'll be happy to contribute back, about the language, It's a C-like language that i am developing, the reason i picked PEGs is that coding recursive decent parsers gets really tedious after a while, and you end up with so much moving parts that even small changes become a hassle, so i started looking for a better, context free grammars in general didn't fit because more often than not they have to deal with ambiguity in weird ways, but PEGs on the other hand are recursive decent naturally. One more thing that made me consider it and feel safe that i am not making a stupid choice is that Python is moving to PEG for it's parser/lexer/ast combo and the author made a pretty nice argument for that decision. So yeah will see how it goes
from pegged.
Interesting reference, nice to see they tackled the left-recursion challenge as well. Since you are developing a language as a PEG, I bet you'll have a much better experience than trying to convert an existing language into a PEG. Welcome on board! (The D language specification is reverse engineered from a manually written lexer+parser, and converting that into a PEG is a different game.)
from pegged.
Why is PEG unsuitable for the D grammar?
It is uncertain to me whether the complete D grammar can in fact be accurately defined as a PEG.
I'm new to parser generators and combinators, and to PEGs too. My understanding is that the theory behind it models Recursive Descent parsers. Can you share your thoughts from that the time on where in the D grammar a PEG would fail? I've also been looking at the DMD implementation of the lexer/parser but it's quite big.
Thank you :)
cc @veelo
from pegged.
That's a good example 👀
Is it the case that the PEG formalism supports only stateless parsers?
Semantic actions suggest no, but semantic actions act after rules and can't create their own captures as far as I understand it. The implementation may support it, but it doesn't look like a proper PEG.
I would've expected delimited strings to be handled at the lexer level, to be honest. A custom parser could act as one, but what would the pre-parsed tokens in a delimited string look like?
from pegged.
That's a good example 👀 Is it the case that the PEG formalism supports only stateless parsers?
Yes. https://scg.unibe.ch/assets/download/lectures/cc/CC-09-PEGs.pdf
Semantic actions suggest no, but semantic actions act after rules and can't create their own captures as far as I understand it.
Correct.
I would've expected delimited strings to be handled at the lexer level, to be honest. A custom parser could act as one, but what would the pre-parsed tokens in a delimited string look like?
A PEG parser is a lexer and parser in one, so there is no lexer level.
from pegged.
Related Issues (20)
- Grammar won't parse whole text, it stops short a few lines... HOT 1
- Comment syntax HOT 7
- Unwanted space consumption in rule parameter
- Syntax wrappers HOT 1
- How does one do Intellisense using a Pegged grammar? HOT 3
- Is there a way to break up a grammar into D classes, each with their own data to parse into & thus subgrammar? HOT 2
- I can't find the Wikipedia ParseTree handling code in the Pegged wiki any more... HOT 1
- Will there be a speed-up when using a Dlang switch-case and shorter pegged variable names? HOT 2
- Can we create a grammar induction algorithm based upon expected Pegged parsing failures? HOT 2
- How do you use dub + pegged + asModule? HOT 1
- Bump new version HOT 7
- Remove obsolete CI.
- How do you branch on complex names such as caseSensitiveLiteral!("let") (they get much more complicated) HOT 1
- Undocumented features of Pegged HOT 2
- Remove Leading and trailing automatic space capture HOT 6
- core.exception.ArraySliceError@pegged\pegged\peg.d(1930): slice [0 .. 18446744073709551612] extends past source array of length 0 HOT 7
- v0.4.9: Invalid grammar errors are not reported HOT 29
- GrammarTester enhancment proposal HOT 5
- Recreating D's identifier-delimited string literals HOT 4
- Status of dgrammar.d? HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from pegged.