Hello! 👋 First of all, I’d like to let y’all know that I really appreciate your work



incremental parsing about common-markup-state-machine HOT 6 CLOSED

micromark commented on May 28, 2024

incremental parsing

from common-markup-state-machine.

Comments (6)

ChristianMurphy commented on May 28, 2024

@Zambonifofex there's a distinction here between implementation and spec.
This spec focuses on completeness and correctness and, as @wooorm mentioned, the CommonMark spec that this builds on has several features that are at odds with incremental parsing,
including reference links, as you mention.

At the implementation level, parsers may subtly diverge from the spec to parse faster by avoiding excessive backtracking or lookahead.
For example, this incremental CommonMark parser, https://github.com/ikatyang/tree-sitter-markdown, notes:

> Note: This grammar is based on the assumption that link label matchings will never fail since reference links can come before their reference definitions, which causes it hard to do incremental parsing without this assumption.


ChristianMurphy commented on May 28, 2024

There was some discussion of incremental parsing at micromark/micromark#8 (comment)
@wooorm is working on a new version of the micromark parser; he may be better placed to comment on whether incremental parsing is still in scope.

This repo is mostly spec, not implementation, it sounds like you are interested in an implementation of an incremental markdown parser?
If so, it may make sense to continue this at https://github.com/micromark/micromark


zamfofex commented on May 28, 2024

> This repo is mostly spec, not implementation, it sounds like you are interested in an implementation of an incremental markdown parser?

I imagined that a change in the spec might have been necessary to accommodate incremental parser implementations, or at least to facilitate them somehow.

If you think this fits better on the micromark repo, you might be able to transfer this issue there depending on your permissions on both repositories.


wooorm commented on May 28, 2024

CommonMark compliance is at odds with incremental parsing, as something on the first line of the document could affect the last line, or something at the start of a paragraph could affect the end. The thread Christian linked to explains this in more detail.

> I imagined that a change in the spec might have been necessary to accommodate incremental parser implementations, or at least to facilitate them somehow.

That would make it no longer compatible with commonmark though 🤔
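
To make the "first line could affect the last line" point concrete, here is a rough sketch of one such case: an unclosed code fence on line 1 turns everything after it into code. This is only an illustration and not necessarily one of the cases from the linked thread. The function name `fenced_flags` is made up for this example, and the fence handling is heavily simplified (unindented backtick fences only; no tildes, fence lengths, or containers).

```python
FENCE = "`" * 3  # three backticks, built this way to keep the example easy to embed


def fenced_flags(lines):
    """Return, for each line, whether it is content inside a backtick fence.

    Simplified assumption: any line starting with three backticks toggles
    the fence. Real CommonMark fence rules are stricter than this.
    """
    inside = False
    flags = []
    for line in lines:
        if line.startswith(FENCE):
            inside = not inside
            flags.append(False)  # the fence line itself is not content
        else:
            flags.append(inside)
    return flags


doc = ["Intro", "", "Last *line*"]

# Classification of every line before and after a one-line edit at the top.
before = fenced_flags(doc)
after = fenced_flags([FENCE] + doc)

# The last line was not edited, yet its interpretation flips to "code".
print(before[-1], after[-1])
```

An incremental parser that only re-examined the edited line would miss that the last line has to be reinterpreted as well.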


zamfofex commented on May 28, 2024

> […] something on the first line of the document could affect the last line.

That’s true, but the cases you had mentioned are kinda edge cases in my opinion. Very rarely will people have documents that look like those, and in those circumstances, it’d make sense and be fine to have to reparse as much as necessary.

> […] something at the start of a paragraph could affect the end.

In my opinion, incremental parsing only really needs to be done at the block level. For the inline level, it feels a bit unnecessary/overkill.

The actually problematic thing (which I didn’t see mentioned in that thread) is related to reference links. Ideally, if someone adds a new reference to the document, a paragraph would only need to be reparsed if it contains potential links whose target would have been that reference (if it had been present earlier, that is).

The problem is that it’s not clear to me whether it’s easy to parse potential link references beforehand that way.
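
As a rough sketch of what I have in mind (not a working parser): keep an index from normalized bracketed labels to the paragraphs that contain them, so that when a definition is added, only those paragraphs are candidates for reparsing. All names here (`index_potential_labels`, `paragraphs_to_reparse`) are hypothetical. The regex is a crude stand-in for real link-label scanning (it ignores escapes, code spans, and nested brackets), and `normalize` only approximates CommonMark label matching.

```python
import re
from collections import defaultdict

# Crude approximation of a link label: bracketed text without nested brackets.
LABEL = re.compile(r"\[([^\[\]]+)\]")


def normalize(label):
    """Approximate label matching: collapse whitespace runs and case-fold."""
    return " ".join(label.split()).casefold()


def index_potential_labels(paragraphs):
    """Map each normalized label to the indices of paragraphs mentioning it."""
    index = defaultdict(set)
    for i, text in enumerate(paragraphs):
        for match in LABEL.finditer(text):
            index[normalize(match.group(1))].add(i)
    return index


def paragraphs_to_reparse(index, new_definition_label):
    """Return the paragraphs that might be affected by a newly added definition."""
    return sorted(index.get(normalize(new_definition_label), set()))


paragraphs = [
    "See [the spec] for details.",
    "Nothing bracketed here.",
    "Both [The  Spec] and [other] appear.",
]
index = index_potential_labels(paragraphs)

# Adding a definition such as "[the spec]: https://example.com" touches
# only the paragraphs that mention a matching label.
affected = paragraphs_to_reparse(index, "the spec")
print(affected)  # [0, 2]
```

Whether the set of potential labels can be collected this cheaply under the real inline rules is exactly the part I'm unsure about.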


wooorm commented on May 28, 2024

Going to close this because the conversation seems resolved.

I don’t think I’ll document incremental parsing here, as it deviates from CommonMark, and I’m not interested in doing that.

As for whether people want to skip this case and implement it differently in their parsers: I’m fine with that, and I find it quite understandable!

