I've been playing around with the example markdown grammar[1] when I noticed some pecu


Problems with parsing spaces and tabs (pegged issue, closed)

philippesigaud commented on July 17, 2024

Problems with parsing spaces and tabs

from pegged.

Comments (12)

Valloric commented on July 17, 2024

One more thing I just noticed: if I change the \t in the Spacechar rule to, say, x so that Spacechar <- " " / "x", I get the same wrong output. It doesn't matter what character I use instead of \t; it seems the problem is in /, that is the ordered or.
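For reference, here is a minimal sketch of what an ordered choice over string literals is expected to do, independent of Pegged. The function name `orderedChoice` and its signature are made up for illustration and are not part of Pegged's API.

```d
import std.algorithm.searching : startsWith;
import std.stdio : writeln;

/// Ordered choice over string literals (hypothetical helper, not Pegged code):
/// try each alternative in order at `pos` and return the length of the
/// first one that matches, or -1 if none does.
ptrdiff_t orderedChoice(string input, size_t pos, const string[] alternatives)
{
    foreach (alt; alternatives)
    {
        if (input[pos .. $].startsWith(alt))
            return cast(ptrdiff_t) alt.length;
    }
    return -1;
}

void main()
{
    // Spacechar <- " " / "x"
    auto spacechar = [" ", "x"];
    writeln(orderedChoice("foo bar", 3, spacechar)); // 1: " " matches at index 3
    writeln(orderedChoice("fooxbar", 3, spacechar)); // 1: "x" matches at index 3
    writeln(orderedChoice("foo bar", 0, spacechar)); // -1: neither matches "f"
}
```

Swapping `\t` for any other literal should only change which character the second alternative accepts, which is why the identical wrong output points at the choice operator rather than at the tab.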


PhilippeSigaud commented on July 17, 2024

On Mon, Oct 1, 2012 at 5:46 AM, Val Markovic wrote:

[1] Which is just terrible BTW. I know, it's not your fault, the original
peg-markdown grammar has the bugs, I checked. I'm improving it so that it's
correct and uses the very nice Pegged extensions and I'll pull-request the
new grammar once I'm done.

Thanks a lot, and that's a pull request I find most exciting! I hope
parameterized rules will help in dealing with the dozens of HTML rules. I
recently found a bug in them (param rules), so I'll try to correct it
quickly.

I corrected some bugs in the C and D grammars (some are left in D, though), and
Markdown was next on my list. Apart from the bugs, I find the parse tree
delivered by this grammar to be strangely constructed, due to the way the
grammar was written.

The next step will be to use it to parse the docs themselves. Then, writing
a tree-walking function that delivers LaTeX or raw text (or inserts
examples) from a .md file will be easy to code.


PhilippeSigaud commented on July 17, 2024

On Mon, Oct 1, 2012 at 6:13 AM, Val Markovic wrote:

One more thing I just noticed: if I change the \t in the Spacechar rule
to, say, x so that Spacechar <- " " / "x", I get the same wrong output.
It doesn't matter what character I use instead of \t; it seems the
problem is in /, that is the ordered or.

Maybe there is a repetition somewhere, like (Spacechar*)+, which can
loop indefinitely?
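To illustrate the concern: `Spacechar*` succeeds even when it consumes nothing, so an outer `+` around it can keep succeeding at the same position forever. The sketch below is hand-written D, not Pegged's implementation; the function names are hypothetical. It shows the usual guard, which is to stop repeating once an iteration makes no progress.

```d
import std.stdio : writeln;

/// Spacechar*: consume zero or more spaces/tabs and return the new position.
/// Always succeeds, possibly without consuming anything.
size_t zeroOrMoreSpaces(string input, size_t pos)
{
    while (pos < input.length && (input[pos] == ' ' || input[pos] == '\t'))
        ++pos;
    return pos;
}

/// (Spacechar*)+ with a progress check. Without the check the loop would
/// never end, because the inner expression also succeeds on empty input.
size_t oneOrMoreGuarded(string input, size_t pos)
{
    while (true)
    {
        immutable next = zeroOrMoreSpaces(input, pos);
        if (next == pos)
            break; // nothing consumed: stop repeating
        pos = next;
    }
    return pos;
}

void main()
{
    writeln(oneOrMoreGuarded("  \tfoo", 0)); // 3: two spaces and a tab consumed
    writeln(oneOrMoreGuarded("foo", 0));     // 0: nothing to consume, no hang
}
```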


Valloric commented on July 17, 2024

Maybe there is a repetition somewhere, like (Spacechar*)+, which can
loop indefinitely?

I don't see the repetition in the test case I posted. With this test case
alone I'm experiencing the problem. Are you sure it's not a bug somewhere
in Pegged? Again, this test case is self-contained; the bug is either here
or in Pegged, and I don't see it here.

WRT markdown to LaTeX... I'm writing a markdown converter in D using Pegged
and the example grammar as a basis. I've already made many changes to it;
the final library will provide a ConvertMarkdown(input, output_type)
function that takes in a string of markdown text and the output format
desired (HTML, LaTeX, etc.) and returns the processed string. v1.0 will
include only HTML output, but a LaTeX output type will be easy to add once I get
everything in place for the HTML. Also, I plan to add many, many test cases
for this library; there are already several different sets of markdown test
cases (https://github.com/trentm/python-markdown2/wiki/Testing-Notes) that
the various markdown converters out there are using, and I intend to use as
many of the tests as I can.

And yes, the example grammar builds not only a tree that's very peculiar,
but also incorrect for even very basic cases. But again, I'm fixing the
problems as I find them.

I'll gladly upstream the grammar when I'm done with it.


PhilippeSigaud commented on July 17, 2024

On Mon, Oct 1, 2012 at 7:04 PM, Val Markovic wrote:

Maybe there is a repetition somewhere, like (Spacechar*)+, which can
loop indefinitely?

I don't see the repetition in the test case I posted. With this test case
alone I'm experiencing the problem. Are you sure it's not a bug somewhere
in Pegged? Again, this test case is self-contained; the bug is either here
or in Pegged, and I don't see it here.

Callumenator found a bug in the keyword function (which I just
corrected). Maybe that was it? Could you send me the hanging grammar, please?
(philippe.sigaud and the google mail).

WRT markdown to LaTeX... I'm writing a markdown converter in D using Pegged
and the example grammar as a basis.

That's mightily cool.

I've already made many changes to it;
the final library will provide a ConvertMarkdown(input, output_type)
function that takes in a string of markdown text and the output format
desired (HTML, LaTeX, etc.) and returns the processed string.

Would output_type be an enum or a type?
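As a rough sketch of the enum variant, assuming only the signature described above: the `OutputType` enum and the function body below are hypothetical placeholders and do no real Markdown parsing. They only show how a runtime enum value could select the output format.

```d
import std.stdio : writeln;

/// Hypothetical output selector; the thread only mentions HTML and LaTeX.
enum OutputType { html, latex }

/// Placeholder with the signature described in the thread. It does not
/// parse Markdown; it only wraps the input to show the dispatch.
string ConvertMarkdown(string input, OutputType outputType)
{
    final switch (outputType) // final switch: every enum member must be handled
    {
        case OutputType.html:
            return "<p>" ~ input ~ "</p>";
        case OutputType.latex:
            return input ~ "\n\n"; // a blank line ends a LaTeX paragraph
    }
}

void main()
{
    writeln(ConvertMarkdown("hello", OutputType.html)); // <p>hello</p>
}
```

With an enum, the format can be chosen at run time and `final switch` makes the compiler flag any format that is not handled. Passing a type instead would move the choice to compile time through a template parameter.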

v1.0 will
include only HTML output, but a LaTeX style will be easy to add once I get
everything in place for the HTML.

Yes. The markup done by markdown is quite simple (headers, links, lists...).

Also, I plan to add many, many test cases
for this library; there are already several different sets of markdown test
cases (https://github.com/trentm/python-markdown2/wiki/Testing-Notes) that
the various markdown converters out there are using, and I intend to use as
many of the tests as I can.

That's a pretty good idea.

And yes, the example grammar builds not only a tree that's very peculiar,
but also incorrect for even very basic cases. But again, I'm fixing the
problems as I find them.

What I find strange is that it's supposed to be used by peg-markdown, which
in turn is used by MultiMarkDown. So I don't get how they do that...

I'll gladly upstream the grammar when I'm done with it.

Thanks a lot! Don't forget to include yourself in the attribution for the
grammar and the whole converter.

That might be the basis of a somewhat more general converter (adding a basic
HTML grammar and a very basic LaTeX grammar to translate docs). Then add
Ddoc and we are good.


Valloric commented on July 17, 2024

Callumenator found a bug in the keyword function (which I just
corrected). Maybe that was it? Could you send me the hanging grammar, please?
(philippe.sigaud and the google mail).

Um... read my bug report again. All the information is there. :)

WRT bug in keyword function... I suggest you compile the test case with the
latest Pegged source and try it out.

What I find strange is that it's supposed to be used by peg-markdown, which
in turn is used by MultiMarkDown. So I don't get how they do that...

I find that strange too, but there you go.

Thanks a lot! Don't forget to include yourself in the attribution for the
grammar and the whole converter.

I'm making the converter a separate library which I'm going to host here on
GitHub.


PhilippeSigaud commented on July 17, 2024

Then this is indeed the bug in keywords (triggered when a rule offers only string literals as alternatives, as in your Spacechar rule).

This was corrected a few minutes ago by another commit and works now. I used your original example.

Output:

Test  [0, 12]["foo", " ", "bar", " ", "baz "]
 +-Test.Inlines  [0, 12]["foo", " ", "bar", " ", "baz "]
    +-Test.Inline  [0, 3]["foo"]
    |  +-Test.String  [0, 3]["foo"]
    +-Test.Inline  [3, 4][" "]
    |  +-Test.Spaces  [3, 4][" "]
    +-Test.Inline  [4, 7]["bar"]
    |  +-Test.String  [4, 7]["bar"]
    +-Test.Inline  [7, 8][" "]
    |  +-Test.Spaces  [7, 8][" "]
    +-Test.Inline  [8, 12]["baz "]
       +-Test.String  [8, 12]["baz "]

["foo", " ", "bar", " ", "baz "]


Valloric commented on July 17, 2024

Awesome, thanks for fixing it!

Is the hang-on-leading-space problem also fixed? Again, same test case but change the input to " foo bar baz " (not at my workstation so can't check myself, sorry).


PhilippeSigaud commented on July 17, 2024

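import std.stdio; // writeln

// Test is the parser generated from the grammar in the original bug report (not shown here).
void main()
{
    auto tree = Test(" foo bar baz ");
    writeln(tree);         // full parse tree
    writeln(tree.matches); // flat list of matched strings
}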

Gives

Test  [0, 13][" ", "foo", " ", "bar", " ", "baz "]
 +-Test.Inlines  [0, 13][" ", "foo", " ", "bar", " ", "baz "]
    +-Test.Inline  [0, 1][" "]
    |  +-Test.Spaces  [0, 1][" "]
    +-Test.Inline  [1, 4]["foo"]
    |  +-Test.String  [1, 4]["foo"]
    +-Test.Inline  [4, 5][" "]
    |  +-Test.Spaces  [4, 5][" "]
    +-Test.Inline  [5, 8]["bar"]
    |  +-Test.String  [5, 8]["bar"]
    +-Test.Inline  [8, 9][" "]
    |  +-Test.Spaces  [8, 9][" "]
    +-Test.Inline  [9, 13]["baz "]
       +-Test.String  [9, 13]["baz "]

[" ", "foo", " ", "bar", " ", "baz "]

So it does not hang and handles the leading space correctly, but there is a bug on the last one (look at the baz node: it matches 4 characters, including the trailing space). And using more than one trailing space gives strange results.

OK, back to pegged.peg.keywords; it's still buggy.


PhilippeSigaud commented on July 17, 2024

OK, I found it and corrected it. Dammit, quite a few bugs for such a short template.

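import std.stdio; // writeln

// Test is the parser generated from the grammar in the original bug report (not shown here).
void main()
{
    auto tree = Test(" foo bar baz   ");
    writeln(tree);         // full parse tree
    writeln(tree.matches); // flat list of matched strings
}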

Now correctly gives:

Test  [0, 15][" ", "foo", " ", "bar", " ", "baz", "   "]
 +-Test.Inlines  [0, 15][" ", "foo", " ", "bar", " ", "baz", "   "]
    +-Test.Inline  [0, 1][" "]
    |  +-Test.Spaces  [0, 1][" "]
    +-Test.Inline  [1, 4]["foo"]
    |  +-Test.String  [1, 4]["foo"]
    +-Test.Inline  [4, 5][" "]
    |  +-Test.Spaces  [4, 5][" "]
    +-Test.Inline  [5, 8]["bar"]
    |  +-Test.String  [5, 8]["bar"]
    +-Test.Inline  [8, 9][" "]
    |  +-Test.Spaces  [8, 9][" "]
    +-Test.Inline  [9, 12]["baz"]
    |  +-Test.String  [9, 12]["baz"]
    +-Test.Inline  [12, 15]["   "]
       +-Test.Spaces  [12, 15]["   "]

[" ", "foo", " ", "bar", " ", "baz", "   "]

And the trailing spaces are parsed OK.


Valloric commented on July 17, 2024

Great, thanks again!

On a related note, I've found it useful to add test cases to the test suite
of whatever project I was working on when I found and fixed a bug. The new
test would then test for the absence of the bug I just fixed, to make sure
that the same problem does not occur in the future.

Personally, I've found this workflow to be incredibly useful.
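Something along these lines, using D's built-in `unittest` blocks. This is only a sketch: `splitInlines` is a hand-written, hypothetical stand-in for the generated parser so that the example compiles without Pegged. In the real test suite the assertions would presumably compare `Test(input).matches` instead. The last expected value is the corrected output from this thread; the first two are what a correct parse of those inputs should look like.

```d
/// Stand-in for the generated parser (hypothetical): splits the input into
/// runs of space/tab characters and runs of everything else.
string[] splitInlines(string input)
{
    static bool isSpacechar(char c) { return c == ' ' || c == '\t'; }

    string[] parts;
    size_t start = 0;
    foreach (i; 1 .. input.length + 1)
    {
        // Close the current run at end of input or when the character
        // class (space vs. non-space) changes.
        if (i == input.length || isSpacechar(input[i]) != isSpacechar(input[start]))
        {
            parts ~= input[start .. i];
            start = i;
        }
    }
    return parts;
}

// Regression tests for the leading and trailing space problems.
unittest
{
    // The trailing space must not be swallowed into "baz ".
    assert(splitInlines("foo bar baz ") == ["foo", " ", "bar", " ", "baz", " "]);
    // A leading space must neither hang nor be dropped.
    assert(splitInlines(" foo bar baz ") == [" ", "foo", " ", "bar", " ", "baz", " "]);
    // Several trailing spaces form one Spaces match.
    assert(splitInlines(" foo bar baz   ") == [" ", "foo", " ", "bar", " ", "baz", "   "]);
}

void main() {}
```

Compiling with `dmd -unittest` runs the `unittest` block before `main`, so a reappearance of either bug would fail the build's test run right away.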


Valloric commented on July 17, 2024

Yup, just verified, everything works now. Thanks again!

