
vsch / flexmark-java

CommonMark/Markdown Java parser with source level AST. CommonMark 0.28, emulation of: pegdown, kramdown, markdown.pl, MultiMarkdown. With HTML to MD, MD to PDF, MD to DOCX conversion modules.

License: BSD 2-Clause "Simplified" License

Languages: Java 97.24%, Shell 0.05%, JavaScript 0.02%, HTML 2.59%, CSS 0.10%
Topics: commonmark, pegdown, java, markdown-parser, markdown-processor, markdown-conversion, markdown-flavors, markdown, markdown-to-html, html-to-markdown

flexmark-java's People

Contributors

1024c, bashtian, benelog, bvn13, chiwanpark, dependabot[bot], derari, gomiguchi, groxx, haumacher, jinneej, jjybdx4il, jochenberger, markkolich, minidigger, niklasf, niksw7, parth, pcj, prayagverma, qwazer, rems, robinst, roxspring, sentyaev, spand, sparksparrow, tobiasstadler, vnaso, vsch

flexmark-java's Issues

Unique id attribute to correspond with AST object id

Consider the following markdown:

# Header 1
Paragraph 1

Paragraph 2, Line 1
Paragraph 2, Line 2
Paragraph 2, Line 3

Paragraph 3, Line 1
# Header 2

What effort would be involved in making this generate unique, sequential IDs that correspond to values in the AST? For example:

<h1 id="1">Header 1</h1>
<p id="2">Paragraph 1</p>

<p id="3">
<span id="4">Paragraph 2, Line 1</span>
<span id="5">Paragraph 2, Line 2</span>
<span id="6">Paragraph 2, Line 3</span>
</p>

<p id="7">Paragraph 3, Line 1</p>
<h1 id="8">Header 2</h1>

When the user is editing the markdown text:

Paragraph 2, Line 2|

The AST id for the caret's position would be 5, which corresponds to the <span id="5"> element in the HTML document.
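A minimal sketch of the requested numbering follows. It assumes a hypothetical SourceNode tree with source offsets and does not use flexmark's own Node class. Ids are assigned in pre-order (document order), which reproduces the numbering in the example above. The caret is mapped to the deepest node whose span contains it. The hand-computed offsets assume the sample markdown uses "# Header 2" on its last line and has no trailing newline.

```java
import java.util.ArrayList;
import java.util.List;

public class CaretToId {
    /** Hypothetical AST node; caret positions start..end (inclusive) fall inside it. Not flexmark's Node. */
    static final class SourceNode {
        final int start;
        final int end;
        final List<SourceNode> children = new ArrayList<>();
        int id;

        SourceNode(int start, int end, SourceNode... kids) {
            this.start = start;
            this.end = end;
            children.addAll(List.of(kids));
        }
    }

    /** Assigns sequential ids in pre-order (document order); returns the next unused id. */
    static int assignIds(SourceNode node, int next) {
        node.id = next++;
        for (SourceNode child : node.children) {
            next = assignIds(child, next);
        }
        return next;
    }

    /** Returns the id of the deepest node whose span contains the caret offset, or -1. */
    static int idAt(SourceNode node, int caret) {
        if (caret < node.start || caret > node.end) {
            return -1;
        }
        for (SourceNode child : node.children) {
            int id = idAt(child, caret);
            if (id != -1) {
                return id;
            }
        }
        return node.id;
    }

    /** Tree for the example markdown; offsets were computed by hand. */
    static SourceNode example() {
        SourceNode root = new SourceNode(0, 115,
                new SourceNode(0, 10),                 // # Header 1
                new SourceNode(11, 22),                // Paragraph 1
                new SourceNode(24, 83,                 // Paragraph 2
                        new SourceNode(24, 43),        //   Line 1
                        new SourceNode(44, 63),        //   Line 2
                        new SourceNode(64, 83)),       //   Line 3
                new SourceNode(85, 104),               // Paragraph 3
                new SourceNode(105, 115));             // # Header 2
        assignIds(root, 0); // the document gets 0, so the first heading gets 1
        return root;
    }

    public static void main(String[] args) {
        // Caret just after "Paragraph 2, Line 2"
        System.out.println(idAt(example(), 63)); // prints 5
    }
}
```

An editor would then only need to emit each node's id as an HTML id attribute and call idAt() with the caret offset.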

No anchor links in HTML

The AnchorLink extension does not add anchor links to rendered HTML, but they are in the AST.

String markdown = "# head1\n\n## head2\n\nsome text";

Parser parser = Parser.builder()
		.extensions(Arrays.asList(AnchorLinkExtension.create()))
		.build();
Node document = parser.parse(markdown);
System.out.println(new AstCollectingVisitor().collectAndGetAstText(document));

HtmlRenderer renderer = HtmlRenderer.builder()
		.extensions(Arrays.asList(AnchorLinkExtension.create()))
		.build();
System.out.println(renderer.render(document));

The AST (with anchor links):

Document[0, 28]
  Heading[0, 7] textOpen:[0, 1, "#"] text:[2, 7, "head1"]
    AnchorLink[2, 7]
      Text[2, 7] chars:[2, 7, "head1"]
  Heading[9, 17] textOpen:[9, 11, "##"] text:[12, 17, "head2"]
    AnchorLink[12, 17]
      Text[12, 17] chars:[12, 17, "head2"]
  Paragraph[19, 28]
    Text[19, 28] chars:[19, 28, "some text"]

The HTML output (no anchor links):

<h1>head1</h1>
<h2>head2</h2>
<p>some text</p>

Add FormatterExtension to Table Extension

When used with the Formatter renderer and default options, it will convert:

day|time|spent
:---|:---:|--:
nov. 2. tue|10:00|4h 40m 
nov. 3. thu|11:00|4h
nov. 7. mon|10:20|4h 20m 
total:|| **13h**

to

| day         | time  |   spent |
|:------------|:-----:|--------:|
| nov. 2. tue | 10:00 |  4h 40m |
| nov. 3. thu | 11:00 |      4h |
| nov. 7. mon | 10:20 |  4h 20m |
| total:             || **13h** |
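The padding and alignment rules in this example can be sketched in plain Java (Java 11+ for String.repeat). This is not flexmark's Formatter. PipeTableFormatter is a hypothetical class that pads each cell to the widest cell in its column and rewrites the separator row. It does not handle column spans (the || in the total row) or escaped pipes.

```java
import java.util.ArrayList;
import java.util.List;

public class PipeTableFormatter {
    enum Align { NONE, LEFT, CENTER, RIGHT }

    /** Splits a pipe-table row into trimmed cells; escaped pipes are not handled. */
    static List<String> cells(String line) {
        String s = line.trim();
        if (s.startsWith("|")) s = s.substring(1);
        if (s.endsWith("|")) s = s.substring(0, s.length() - 1);
        List<String> result = new ArrayList<>();
        for (String cell : s.split("\\|", -1)) result.add(cell.trim());
        return result;
    }

    static Align align(String sep) {
        boolean left = sep.startsWith(":"), right = sep.endsWith(":");
        if (left && right) return Align.CENTER;
        if (right) return Align.RIGHT;
        return left ? Align.LEFT : Align.NONE;
    }

    static String pad(String text, int width, Align a) {
        int extra = width - text.length();
        if (a == Align.RIGHT) return " ".repeat(extra) + text;
        if (a == Align.CENTER) return " ".repeat(extra / 2) + text + " ".repeat(extra - extra / 2);
        return text + " ".repeat(extra);
    }

    /** Separator cell covering the column width plus its two padding spaces. */
    static String separator(int width, Align a) {
        String dashes = "-".repeat(width + 2);
        switch (a) {
            case LEFT:   return ":" + dashes.substring(1);
            case RIGHT:  return dashes.substring(1) + ":";
            case CENTER: return ":" + dashes.substring(2) + ":";
            default:     return dashes;
        }
    }

    /** Formats a table whose second line is the alignment row. */
    static List<String> format(List<String> lines) {
        List<List<String>> rows = new ArrayList<>();
        for (String line : lines) rows.add(cells(line));
        List<String> seps = rows.get(1);
        int cols = seps.size();
        Align[] aligns = new Align[cols];
        int[] widths = new int[cols];
        for (int c = 0; c < cols; c++) {
            aligns[c] = align(seps.get(c));
            widths[c] = 3; // arbitrary minimum column width
        }
        for (int r = 0; r < rows.size(); r++) {
            if (r == 1) continue; // the alignment row does not affect widths
            List<String> row = rows.get(r);
            for (int c = 0; c < Math.min(cols, row.size()); c++) {
                widths[c] = Math.max(widths[c], row.get(c).length());
            }
        }
        List<String> out = new ArrayList<>();
        for (int r = 0; r < rows.size(); r++) {
            List<String> parts = new ArrayList<>();
            for (int c = 0; c < cols; c++) {
                if (r == 1) {
                    parts.add(separator(widths[c], aligns[c]));
                } else {
                    List<String> row = rows.get(r);
                    parts.add(pad(c < row.size() ? row.get(c) : "", widths[c], aligns[c]));
                }
            }
            out.add(r == 1 ? "|" + String.join("|", parts) + "|"
                           : "| " + String.join(" | ", parts) + " |");
        }
        return out;
    }
}
```

Given the six input lines above, the sketch reproduces the first five formatted lines exactly. The total row comes out as three separate cells rather than a spanned one.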

Maven Repository

I'd like to use this in my project to have the same result as the Markdown Navigator plugin.

I don't see it in the Maven repository. Do you plan to publish it there?

Is there any simple example code to generate HTML output for a given Markdown file?
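A minimal sketch of Markdown-to-HTML conversion follows. It assumes flexmark-java 0.50 or later is on the classpath, for example through the com.vladsch.flexmark:flexmark-all artifact. Package names for Node and MutableDataSet differ in older releases. The file path comes from the first command-line argument.

```java
import com.vladsch.flexmark.html.HtmlRenderer;
import com.vladsch.flexmark.parser.Parser;
import com.vladsch.flexmark.util.ast.Node;
import com.vladsch.flexmark.util.data.MutableDataSet;

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class MarkdownToHtml {
    /** Parses markdown with default options and renders it to an HTML fragment. */
    static String toHtml(String markdown) {
        MutableDataSet options = new MutableDataSet(); // extensions and options would be set here
        Parser parser = Parser.builder(options).build();
        HtmlRenderer renderer = HtmlRenderer.builder(options).build();
        Node document = parser.parse(markdown);
        return renderer.render(document);
    }

    public static void main(String[] args) throws IOException {
        String markdown = Files.readString(Path.of(args[0]), StandardCharsets.UTF_8);
        System.out.print(toHtml(markdown));
    }
}
```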

Problem with escapes in links

I have as input: [link](\(foo\)) (see http://spec.commonmark.org/0.27/#example-464).

In my visit(Link node), the reference text I get is \(foo\), which means that the escapes are not processed. Thus when I render this in HTML I get <a href="\(foo\)">... instead of <a href="(foo)">....

It would be nice if flexmark-java could process the escapes and provide a way to get the processed reference from a Link object.

WDYT?
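CommonMark allows any ASCII punctuation character to be backslash-escaped, and the backslash is dropped in link destinations. So [link](\(foo\)) should yield the destination (foo). Until flexmark exposes the processed value, that unescaping step can be sketched as a standalone helper. BackslashUnescape is hypothetical, not a flexmark class. It also ignores entity references, which the spec decodes in destinations too.

```java
public class BackslashUnescape {
    // The ASCII punctuation characters that CommonMark allows to be backslash-escaped.
    private static final String ESCAPABLE = "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~";

    /** Removes backslash escapes; a backslash before any other character is kept literally. */
    static String unescape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length() && ESCAPABLE.indexOf(s.charAt(i + 1)) >= 0) {
                out.append(s.charAt(i + 1)); // keep the escaped character, drop the backslash
                i++;
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```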

Add Support for Wiki Images

New Extension overview for flexmark-ext-wikilink

flexmark-java extension for wiki links

Converts references that are wrapped in [[]] into wiki links, with optional text separated by |.

Will also convert ![[]] to image links if the IMAGE_LINKS extension option is enabled.

Options:

  • DISABLE_RENDERING default false, if true then rendering of wiki links is disabled and they
    will render as plain text of the element node

  • IMAGE_PREFIX default "", prefix to add to wiki link page reference

  • IMAGE_LINKS default false, true will enable ![[]] image link syntax

  • IMAGE_FILE_EXTENSION default "", extension to be added to wiki image file refs

  • LINK_FIRST_SYNTAX default false, if true then [[page ref|link text]] syntax is used,
    otherwise [[link text|page ref]] syntax. Affects both link and image wiki references.

  • LINK_PREFIX default "", prefix to add to wiki link page reference

  • LINK_FILE_EXTENSION default "", extension to be added to wiki link page refs
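The interplay of LINK_FIRST_SYNTAX, LINK_PREFIX and LINK_FILE_EXTENSION described above can be illustrated with a small standalone sketch. WikiRefSketch is hypothetical and is not the extension's implementation. Two behaviours are assumptions: splitting at the first |, and using the page reference as the link text when there is no |.

```java
public class WikiRefSketch {
    /** Page reference and link text taken from the text between [[ and ]]. */
    static final class WikiRef {
        final String pageRef;
        final String text;

        WikiRef(String pageRef, String text) {
            this.pageRef = pageRef;
            this.text = text;
        }
    }

    /** Splits at the first '|'; linkFirstSyntax mirrors the LINK_FIRST_SYNTAX option. */
    static WikiRef split(String inner, boolean linkFirstSyntax) {
        int bar = inner.indexOf('|');
        if (bar < 0) return new WikiRef(inner, inner); // assumption: no text, so show the page ref
        String first = inner.substring(0, bar);
        String second = inner.substring(bar + 1);
        // true: [[page ref|link text]]; false (default): [[link text|page ref]]
        return linkFirstSyntax ? new WikiRef(first, second) : new WikiRef(second, first);
    }

    /** Builds the link target from LINK_PREFIX and LINK_FILE_EXTENSION style values. */
    static String url(WikiRef ref, String linkPrefix, String linkFileExtension) {
        return linkPrefix + ref.pageRef + linkFileExtension;
    }
}
```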

Wrong startOffset in HardLineBreak

The startOffset in HardLineBreak is wrong in two cases:

a) if a backslash is used at the end of a line, the startOffset does not include the backslash

example from ast_spec.md (line 7332):

foo\
bar
.
<p>foo<br />
bar</p>
.
Document[0, 9]
  Paragraph[0, 9]
    Text[0, 3] chars:[0, 3, "foo"]
    HardLineBreak[4, 5]
    Text[5, 8] chars:[5, 8, "bar"]

HardLineBreak starts at 4, but it should start at 3 where the backslash is.

b) if more than two spaces are used at the end of a line, only the last two are included in HardLineBreak; IMO it would make more sense to include all trailing spaces in HardLineBreak

example from ast_spec.md (line 12564):

foo       
baz
.
<p>foo<br />
baz</p>
.
Document[0, 15]
  Paragraph[0, 15]
    Text[0, 3] chars:[0, 3, "foo"]
    HardLineBreak[8, 11]
    Text[11, 14] chars:[11, 14, "baz"]

HardLineBreak starts at 8, but it should start at 3, where the first trailing space is.
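The start offset proposed in this issue can be computed per line, as sketched below. It is either the backslash itself or the first of all trailing spaces. HardBreakSpan is a hypothetical helper that works on a single line without its terminator. An odd number of trailing backslashes is required, because a doubled backslash is an escaped backslash rather than a hard break. Trailing tabs are ignored.

```java
public class HardBreakSpan {
    /**
     * Returns the offset within a line (given without its terminator) where a hard line break
     * would start under the convention proposed above, or -1 if the line does not end in one.
     */
    static int hardBreakStart(String line) {
        // Backslash form: an odd number of trailing backslashes means the last one is unescaped.
        int b = line.length();
        while (b > 0 && line.charAt(b - 1) == '\\') b--;
        if ((line.length() - b) % 2 == 1) return line.length() - 1;

        // Trailing-space form: two or more spaces; the break starts at the first of them.
        int s = line.length();
        while (s > 0 && line.charAt(s - 1) == ' ') s--;
        return line.length() - s >= 2 ? s : -1;
    }
}
```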

Abbreviation node not called when 2 abbreviations

Hi, I'm now adding support for MD abbreviations.

I have added the AbbreviationExtension and registered a handler:

        AbbreviationNodeVisitor abbreviationNodeVisitor = new AbbreviationNodeVisitor(this.visitor, this.listeners);
        this.visitor.addHandlers(
            new VisitHandler<>(Abbreviation.class, abbreviationNodeVisitor::visit)
        );

And in my visit method:

    public void visit(Abbreviation node)
    {
        // Since XWiki doesn't support abbreviations, we generate an HTML <abbr> element.
        String html;
        if (StringUtils.isNotEmpty(node.getAbbreviation())) {
            html = String.format("<abbr title=\"%s\">%s</abbr>", node.getAbbreviation(),
                String.valueOf(node.getChars()));
        } else {
            html = String.format("<abbr>%s</abbr>", String.valueOf(node.getChars()));
        }
        getListener().onRawText(html, Syntax.HTML_4_01);
    }

The problem comes from the following input:

The HTML specification is maintained by the W3C.

*[HTML]: Hyper Text Markup Language
*[W3C]:  World Wide Web Consortium

In my case I get only 1 call to my visit(Abbreviation node) method instead of the 2 I was expecting.

In other words, I don't get a call for the W3C abbreviation.

Any idea?

Thanks again!
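The expected behaviour can be checked with a plain regular-expression scan: one node per occurrence of each defined abbreviation. AbbreviationScan is hypothetical and is not how flexmark's abbreviation extension works internally. It only shows that the sample paragraph contains two occurrences, so two visit(Abbreviation) calls would be expected. The word-boundary (\b) match assumes abbreviations start and end with letters or digits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AbbreviationScan {
    /** Returns every occurrence of any of the abbreviations in text, in document order. */
    static List<String> occurrences(String text, List<String> abbreviations) {
        List<String> sorted = new ArrayList<>(abbreviations);
        sorted.sort((a, b) -> Integer.compare(b.length(), a.length())); // prefer the longest match
        StringBuilder alternation = new StringBuilder();
        for (String abbr : sorted) {
            if (alternation.length() > 0) alternation.append('|');
            alternation.append(Pattern.quote(abbr));
        }
        Pattern pattern = Pattern.compile("\\b(?:" + alternation + ")\\b");
        List<String> found = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) found.add(m.group());
        return found;
    }
}
```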

Table caption support?

Hi,

I'm working on migrating table support in XWiki from pegdown to flexmark-java. I have basic support working, but advanced usages still need some tuning.

For example, the following input used to work with pegdown:

col1   |col2    |
-------|--------|
cell11 | cell12 |
cell21 | cell22 |
[caption]

But with flexmark-java and the tables extension, the caption is treated as text.

I see that table caption is mentioned at

but I couldn't find any test for it in the md file.

Could you let me know if this is implemented and if not, if it's planned?

Note that I do have a handler registered for TableCaption, but my visit(TableCaption node) is not called with the input given above.

Thanks

Out of date documentation for Pegdown Migration Helper

I'm trying to follow this: https://github.com/vsch/flexmark-java#pegdown-migration-helper. However, I don't see the flexmark-profile-pegdown module in maven here: https://search.maven.org/#search%7Cga%7C1%7Cg%3A%22com.vladsch.flexmark%22, and the pom for that module seems like it's kind of old (https://github.com/vsch/flexmark-java/blob/master/flexmark-profile-pegdown/pom.xml) because it references flexmark-java version 0.8.0. Is there a new way to accomplish Pegdown emulation that's not in the README?

HtmlBlock and newline

I have the following input:

hello

<table>
  <tr>
    <td>Foo</td>
  </tr>
</table>

world

And in the following method:

    public void visit(HtmlBlock node)
    {
        getListener().onRawText(String.valueOf(node.getChars()), Syntax.HTML_4_01);
    }

I get the following for String.valueOf(node.getChars()):

<table>\n  <tr>\n    <td>Foo</td>\n  </tr>\n</table>\n

I'm concerned about the last \n. In pegdown I wasn't getting any trailing newline, which seems better IMO since this means the newline char would be issued in another Node.

What's the rationale for including the trailing newline in the HTML node?

Thanks
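A visitor can drop the trailing terminator itself while the rationale is discussed. The sketch below assumes the helper is applied to node.getChars() as in the visit(HtmlBlock) method above; any CharSequence works. HtmlBlockText.stripTrailingEol is a hypothetical helper, not part of flexmark.

```java
public class HtmlBlockText {
    /** Removes a single trailing line terminator ("\n", "\r\n" or "\r"), if present. */
    static String stripTrailingEol(CharSequence chars) {
        String s = chars.toString();
        if (s.endsWith("\r\n")) return s.substring(0, s.length() - 2);
        if (s.endsWith("\n") || s.endsWith("\r")) return s.substring(0, s.length() - 1);
        return s;
    }
}
```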

Documentation about maven dependencies

I tried to use flexmark-java, but simply adding the flexmark-java artifact from Maven Central did not work because it does not contain the MutableDataSet class (used in the README example). Adding flexmark-util solved that problem, but I am still missing ParserEmulationProfile, Parser and HtmlRenderer. How can I use flexmark-java in my project using Maven dependencies?

Unclosed FencedCodeBlock endOffset too small

If a fenced code block is missing the closing marker, then the whole text until the end-of-file is code, but the endOffset in FencedCodeBlock is not at the end-of-file. It is at the end of the opening marker.

You can notice this in your fantastic Markdown Navigator plugin: when you enter ~~~ somewhere, the text below does not get a gray background (but is rendered as code in the preview).

String markdown = "~~~\ncode\ncode2\ncode3";

Parser parser = Parser.builder().build();
Node document = parser.parse(markdown);
System.out.println(new AstCollectingVisitor().collectAndGetAstText(document));

Outputs:

Document[0, 20]
  FencedCodeBlock[0, 4] open:[0, 3, "~~~"] content:[4, 20] lines[3]

BTW, the AST output for FencedCodeBlock always shows lines[3]; the number is always 3. Shouldn't it show the number of lines in the block?
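Per CommonMark, a fenced code block without a closing fence runs to the end of the document (or of its container). So the endOffset in this example would be expected to be 20. The sketch below implements that rule under two assumptions: the opening fence starts at offset 0 of the input, and the block is not inside a container. FenceEnd is hypothetical. Accepting a closing fence with any amount of indentation is a simplification, since the spec allows at most three spaces.

```java
public class FenceEnd {
    /**
     * Returns the end offset of a fenced code block whose opening fence starts at offset 0:
     * the end of the closing fence line, or the end of the text if the block is never closed.
     */
    static int blockEnd(String text) {
        char fenceChar = text.charAt(0);
        int fenceLen = 0;
        while (fenceLen < text.length() && text.charAt(fenceLen) == fenceChar) fenceLen++;
        int nl = text.indexOf('\n'); // end of the opening fence line
        while (nl >= 0) {
            int lineStart = nl + 1;
            int lineEnd = text.indexOf('\n', lineStart);
            if (lineEnd < 0) lineEnd = text.length();
            if (isClosingFence(text.substring(lineStart, lineEnd), fenceChar, fenceLen)) return lineEnd;
            nl = lineEnd < text.length() ? lineEnd : -1;
        }
        return text.length(); // unclosed: the block extends to the end of the document
    }

    /** A closing fence is a run of at least minLen fence characters, surrounded only by whitespace. */
    static boolean isClosingFence(String line, char fenceChar, int minLen) {
        String s = line.strip();
        int n = 0;
        while (n < s.length() && s.charAt(n) == fenceChar) n++;
        return n >= minLen && n == s.length();
    }
}
```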

Link reference definitions indented by spaces not recognized

If link reference definitions are indented by spaces, then only the first link reference definition is recognized (without leading spaces, it works).

String markdown = "aaa [link1] bbb [link2] ccc [link3]\n"
		+ "\n"
		+ "   [link1]: http://link1\n"
		+ "   [link2]: http://link2\n"
		+ "   [link3]: http://link3";
Parser parser = Parser.builder().build();
Node document = parser.parse(markdown);
System.out.println(new AstCollectingVisitor().collectAndGetAstText(document));

Outputs:

Document[0, 111]
  Paragraph[0, 36]
    Text[0, 4] chars:[0, 4, "aaa "]
    LinkRef[4, 11] referenceOpen:[4, 5, "["] reference:[5, 10, "link1"] referenceClose:[10, 11, "]"]
      Text[5, 10] chars:[5, 10, "link1"]
    Text[11, 16] chars:[11, 16, " bbb "]
    LinkRef[16, 23] referenceOpen:[16, 17, "["] reference:[17, 22, "link2"] referenceClose:[22, 23, "]"]
      Text[17, 22] chars:[17, 22, "link2"]
    Text[23, 28] chars:[23, 28, " ccc "]
    LinkRef[28, 35] referenceOpen:[28, 29, "["] reference:[29, 34, "link3"] referenceClose:[34, 35, "]"]
      Text[29, 34] chars:[29, 34, "link3"]
  Reference[40, 61] refOpen:[40, 41, "["] ref:[41, 46, "link1"] refClose:[46, 48, "]:"] url:[49, 61, "http://link1"]
  Paragraph[65, 111]
    LinkRef[65, 72] referenceOpen:[65, 66, "["] reference:[66, 71, "link2"] referenceClose:[71, 72, "]"]
      Text[66, 71] chars:[66, 71, "link2"]
    Text[72, 86] chars:[72, 86, ": htt … link2"]
    SoftLineBreak[86, 87]
    LinkRef[90, 97] referenceOpen:[90, 91, "["] reference:[91, 96, "link3"] referenceClose:[96, 97, "]"]
      Text[91, 96] chars:[91, 96, "link3"]
    Text[97, 111] chars:[97, 111, ": htt … link3"]

Surprisingly, if I add the TaskListExtension, then it works as expected:

Parser parser2 = Parser.builder()
		.extensions(Arrays.asList(TaskListExtension.create()))
		.build();
Node document2 = parser2.parse(markdown);
System.out.println(new AstCollectingVisitor().collectAndGetAstText(document2));

Outputs:

Document[0, 111]
  Paragraph[0, 36]
    Text[0, 4] chars:[0, 4, "aaa "]
    LinkRef[4, 11] referenceOpen:[4, 5, "["] reference:[5, 10, "link1"] referenceClose:[10, 11, "]"]
      Text[5, 10] chars:[5, 10, "link1"]
    Text[11, 16] chars:[11, 16, " bbb "]
    LinkRef[16, 23] referenceOpen:[16, 17, "["] reference:[17, 22, "link2"] referenceClose:[22, 23, "]"]
      Text[17, 22] chars:[17, 22, "link2"]
    Text[23, 28] chars:[23, 28, " ccc "]
    LinkRef[28, 35] referenceOpen:[28, 29, "["] reference:[29, 34, "link3"] referenceClose:[34, 35, "]"]
      Text[29, 34] chars:[29, 34, "link3"]
  Reference[40, 61] refOpen:[40, 41, "["] ref:[41, 46, "link1"] refClose:[46, 48, "]:"] url:[49, 61, "http://link1"]
  Reference[65, 86] refOpen:[65, 66, "["] ref:[66, 71, "link2"] refClose:[71, 73, "]:"] url:[74, 86, "http://link2"]
  Reference[90, 111] refOpen:[90, 91, "["] ref:[91, 96, "link3"] refClose:[96, 98, "]:"] url:[99, 111, "http://link3"]
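CommonMark allows a link reference definition to be indented by up to three spaces, so all three definitions in this example should be recognized. The indentation rule alone can be sketched with a line-oriented regular expression. LinkRefDefs is hypothetical and ignores titles, angle-bracket destinations and case-insensitive label matching. It would also match lines inside code blocks, so it illustrates the rule rather than serving as a parser.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkRefDefs {
    // Up to three leading spaces, a bracketed label, a colon, optional whitespace and a destination.
    private static final Pattern DEFINITION =
            Pattern.compile("^ {0,3}\\[([^\\]]+)\\]:[ \\t]*(\\S+)[ \\t]*$", Pattern.MULTILINE);

    /** Collects label to destination pairs; the first definition of a label wins. */
    static Map<String, String> collect(String markdown) {
        Map<String, String> refs = new LinkedHashMap<>();
        Matcher m = DEFINITION.matcher(markdown);
        while (m.find()) refs.putIfAbsent(m.group(1), m.group(2));
        return refs;
    }
}
```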

Add ability to pass parameters to wiki links and wiki images

Use case 1: be able to support query strings and anchors. The reason for not using reference?a=b&c=d is to make it easy to support ? in wiki page names.

Use case 2: be able to resize the images.

Syntax proposal: `[[label|reference||parameters]]`

Examples:

  • [[label|reference||queryString="a='b'&c=d" anchor="anchor"]]
  • [[image||width="300px"]]

Alternative: something like [[label|reference]](parameters).

Yet another alternative is to make it even more generic and allow passing parameters to any inline or block syntax element (as is possible in the XWiki syntax, for example). In the XWiki syntax you write the following:

(% a="b" c=d %)<element here>

Example1: (% a="b" c=d %)[[reference]]
Example2: Parameter for a list (e.g. if you wish to pass the type of list):

(% style="list-style-type:disc" %)
* item 1
* item 2

Then it's up to the renderer to decide what it'll do with the parameters and what it'll honor.

The last alternative is to do nothing and force the user to use HTML but it's not very nice and I've already had users asking for support for wiki link anchors.

Let me know what you think. I could simply start by implementing support for reference?a=b&c=d for now and provide an escape character for when users want to use ? in the wiki page name (or have them URL-encode it).

Thanks
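The first proposal, [[label|reference||parameters]], could be parsed as sketched below. WikiLinkParams is hypothetical; it is not part of flexmark or of the wikilink extension. It splits the link text at the first || and accepts both key="quoted value" and key=bareValue. Bare values are the form used in the (% a="b" c=d %) example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WikiLinkParams {
    // key="quoted value" or key=bareValue
    private static final Pattern PARAM = Pattern.compile("(\\w+)=(?:\"([^\"]*)\"|([^\\s\"]+))");

    /** Splits the text between [[ and ]] at the first "||" into link part and parameters. */
    static String[] splitLinkAndParams(String inner) {
        int i = inner.indexOf("||");
        return i < 0 ? new String[] {inner, ""} : new String[] {inner.substring(0, i), inner.substring(i + 2)};
    }

    /** Parses the parameter part into an ordered key/value map. */
    static Map<String, String> parse(String parameters) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = PARAM.matcher(parameters);
        while (m.find()) result.put(m.group(1), m.group(2) != null ? m.group(2) : m.group(3));
        return result;
    }
}
```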

NodeVisitor does not visit all children

If I have a VisitHandler on a (block) node, then NodeVisitor does not visit the children of that (block) node.

E.g. with VisitHandlers for BulletList and Emphasis, the Emphasis handler is not invoked for emphasis inside lists.

The only way I've found to do this is to override NodeVisitor.visit(Node):

NodeVisitor visitor = new NodeVisitor(
	new VisitHandler<>(BulletList.class, this::visit),
	new VisitHandler<>(Emphasis.class, this::visit))
{
	@Override
	public void visit(Node node) {
		VisitHandler<?> handler = myCustomHandlersMap.get(node.getClass());
		if (handler != null)
			handler.visit(node);
		visitChildren(node);
	}
};
visitor.visit(astRoot);

Is this intended behavior or a bug?
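The traversal the reporter expected is sketched below: call the handler, then always visit the children. It uses a hypothetical TreeNode type rather than flexmark's Node. It is equivalent in spirit to the overridden visit(Node) above, without the reporter's myCustomHandlersMap. The design question is whether descending is the visitor's job (as here) or each handler's job. In the latter case, a handler that does not descend hides every nested node, which would produce the behaviour described in this issue.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class VisitAllChildren {
    /** Hypothetical minimal tree node identified by a type name. */
    static final class TreeNode {
        final String type;
        final List<TreeNode> children;

        TreeNode(String type, TreeNode... children) {
            this.type = type;
            this.children = List.of(children);
        }
    }

    /** Calls the handler registered for the node's type (if any), then always visits the children. */
    static void visit(TreeNode node, Map<String, Consumer<TreeNode>> handlers) {
        Consumer<TreeNode> handler = handlers.get(node.type);
        if (handler != null) handler.accept(node);
        for (TreeNode child : node.children) visit(child, handlers);
    }
}
```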

How to add attribute 'class' to AutoLink node

Hi,
I'm new to flexmark, and this is not a bug report; I just want to get some help here:

  1. How to add 'class' attribute to AutoLink node?
  2. Is it possible to configure FlexMark to render \n as <br/>? For example, FlexMark currently renders the markdown below:
hello
world

to:

<p>hello\nworld</p>

I want to get the following HTML instead:

<p>hello<br/>world</p>

Thanks in advance.
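For item 2, flexmark's HtmlRenderer.SOFT_BREAK option sets the string emitted for a soft line break, that is, a plain newline inside a paragraph. The sketch below assumes flexmark-java 0.50 or later is on the classpath; package names differ in older releases. Adding a class attribute to AutoLink nodes (item 1) is not covered here.

```java
import com.vladsch.flexmark.html.HtmlRenderer;
import com.vladsch.flexmark.parser.Parser;
import com.vladsch.flexmark.util.data.MutableDataSet;

public class SoftBreakAsBr {
    static String render(String markdown) {
        MutableDataSet options = new MutableDataSet();
        // Emit <br /> for soft line breaks instead of the default "\n"
        options.set(HtmlRenderer.SOFT_BREAK, "<br />\n");
        Parser parser = Parser.builder(options).build();
        HtmlRenderer renderer = HtmlRenderer.builder(options).build();
        return renderer.render(parser.parse(markdown));
    }
}
```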

[Question] @ is missed

Hi,
Not sure whether this is a bug or a feature: the text

@someone

is rendered as

someone

Subscript, superscript, and struck text

Some markdown flavours support subscripts, superscripts, and struck text:

October 31^st^
H~2~O
~~Commonmark~~ Flexmark is amazing.

Flexmark extensions for these would be useful.

IndentedCodeBlock endOffset too large?

I think the endOffset of IndentedCodeBlock is too large. It is at the start of the next paragraph, so the IndentedCodeBlock includes trailing line separators and empty lines.

On the other hand, the end offset in FencedCodeBlock does not include trailing line separators and empty lines.

String markdown = "\tcode\n\nsome text";

Parser parser = Parser.builder().build();
Node document = parser.parse(markdown);
System.out.println(new AstCollectingVisitor().collectAndGetAstText(document));

Outputs:

Document[0, 16]
  IndentedCodeBlock[1, 7]
  Paragraph[7, 16]
    Text[7, 16] chars:[7, 16, "some text"]

Shouldn't the IndentedCodeBlock end at 5?
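The end offset the reporter expects (5) is the end of the last non-blank line of the block, excluding its line terminator. The sketch below trims trailing blank lines from a block span. IndentedBlockEnd.contentEnd is hypothetical and takes the block's current [start, end) offsets as input, such as [1, 7) in the output above.

```java
public class IndentedBlockEnd {
    /**
     * Given a block span [start, end) in text, returns a new end that excludes trailing
     * blank lines and the final line terminator.
     */
    static int contentEnd(String text, int start, int end) {
        int e = end;
        while (e > start) {
            // Drop one line terminator (\n or \r\n) ending at e, if there is one.
            if (text.charAt(e - 1) == '\n') {
                e--;
                if (e > start && text.charAt(e - 1) == '\r') e--;
            }
            // Start of the line that now ends at e, never before the block start.
            int lineStart = Math.max(start, text.lastIndexOf('\n', e - 1) + 1);
            if (!text.substring(lineStart, e).isBlank()) return e;
            e = lineStart; // blank line: drop it and keep going
        }
        return start;
    }
}
```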

DefinitionList extension doesn't seem to work

Hi, I've just tried using the DefinitionExtension extension but my visit(Definition*) methods are not called. Looking at the sources of the extensions, it seems the code is commented out at

public class DefinitionExtension implements Parser.ParserExtension, HtmlRenderer.HtmlRendererExtension {

I need to add support for definition lists which I was handling before with pegdown (for XWiki). Any idea?

Thanks!

Incorrect emphasis close marker source offset

When a closing emphasis delimiter run is partially used by an inner delimiter run, its index needs to be adjusted by the number of delimiters used, so that when it is finally processed the closing sequence reflects the correct position.
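In the CommonMark emphasis algorithm, delimiters are taken from the start of a closing run, so each partial use moves the run's effective start forward. The sketch below shows that bookkeeping with a hypothetical Closer class rather than flexmark's delimiter classes. In **foo *bar***, the run *** starts at offset 10. The inner emphasis takes one * at offset 10, so the strong emphasis must be closed at offset 11, not 10.

```java
public class DelimiterRunSketch {
    /** Hypothetical closing delimiter run: offset of its first unused delimiter and how many remain. */
    static final class Closer {
        int index;
        int count;

        Closer(int index, int count) {
            this.index = index;
            this.count = count;
        }

        /**
         * Consumes delimiters for one match. Closers are used from the start of the run,
         * so the remaining delimiters begin 'used' characters later.
         * Returns the source offset of the consumed closing marker.
         */
        int use(int used) {
            int markerStart = index;
            index += used;
            count -= used;
            return markerStart;
        }
    }
}
```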

Hard line breaks do not work if markdown text/files uses CR LF as line separator

flexmark does not seem to handle CR characters in markdown text/files, which breaks hard line breaks (and maybe other things?). If a file uses CR LF as the line separator, hard line breaks do not work.

With CR:

Parser parser = Parser.builder().build();

Node document = parser.parse("aaa  \r\nbbb\\\r\nccc");
System.out.println(new AstCollectingVisitor().collectAndGetAstText(document));

Outputs (no hard line breaks; CR in text):

Document[0, 16]
  Paragraph[0, 16]
    Text[0, 6] chars:[0, 6, "aaa  \r"]
    SoftLineBreak[6, 7]
    Text[7, 12] chars:[7, 12, "bbb\\r"]
    SoftLineBreak[12, 13]
    Text[13, 16] chars:[13, 16, "ccc"]

Without CR:

Node document2 = parser.parse("aaa  \nbbb\\\nccc");
System.out.println(new AstCollectingVisitor().collectAndGetAstText(document2));

Outputs (correct):

Document[0, 14]
  Paragraph[0, 14]
    Text[0, 3] chars:[0, 3, "aaa"]
    HardLineBreak[3, 6]
    Text[6, 9] chars:[6, 9, "bbb"]
    HardLineBreak[9, 11]
    Text[11, 14] chars:[11, 14, "ccc"]
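A common workaround until CR LF is handled natively is to normalize line endings before parsing. The catch is that node offsets then refer to the normalized text, so an editor that needs original positions must map them back. The sketch below covers both steps. LineEndings is hypothetical, not part of flexmark. Normalizing the input from the first example yields exactly the input of the second one.

```java
public class LineEndings {
    /** Converts CR LF and lone CR line endings to LF. */
    static String normalize(String text) {
        return text.replace("\r\n", "\n").replace('\r', '\n');
    }

    /** Maps an offset in normalize(original) back to the corresponding offset in original. */
    static int toOriginalOffset(String original, int normalizedOffset) {
        int n = 0; // normalized offset corresponding to original index i
        for (int i = 0; i < original.length(); i++) {
            if (n == normalizedOffset) return i;
            boolean crOfCrLf = original.charAt(i) == '\r'
                    && i + 1 < original.length() && original.charAt(i + 1) == '\n';
            if (!crOfCrLf) n++; // the CR of a CR LF pair disappears after normalization
        }
        return original.length();
    }
}
```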

Issue with Image reference AST event order

Hi,

I'm trying to migrate from pegdown to flexmark-java for XWiki (http://xwiki.org) and I'm hitting an issue with the order of the AST events when using the following input:

![image.png][1]

[1]: image.png

In pegdown the following methods were called in that order:

  • visit(ReferenceNode referenceNode)
  • visit(RefImageNode refImageNode)

However in flexmark-java it's the opposite:

  • visit(ImageRef node)
  • visit(Reference node)

The issue is that XWiki's own AST model doesn't support image references, so I was generating an image node by resolving the reference. Now this no longer seems possible, since I only get the Reference node after the method receiving the ImageRef has been called.

Is there a way for me to resolve the reference inside my visit(ImageRef node)?

Thanks
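A reference definition may appear anywhere in the document, including after its use. So a renderer that needs the URL while visiting an ImageRef has to collect definitions before resolving references, whatever order the nodes are visited in. The sketch below shows a two-pass approach with a hypothetical RefResolver and no flexmark API. The first walk over the AST calls define() for every Reference node. A second walk calls resolve() from the ImageRef handler. Label normalization approximates CommonMark's case folding with lower-casing.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class RefResolver {
    private final Map<String, String> urlsByLabel = new HashMap<>();

    /** Trims, collapses internal whitespace and lower-cases a label (approximating case folding). */
    static String normalize(String label) {
        return label.strip().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }

    /** First pass: record a definition such as [1]: image.png; the first definition of a label wins. */
    void define(String label, String url) {
        urlsByLabel.putIfAbsent(normalize(label), url);
    }

    /** Second pass: resolve a link or image reference, or return null if it is undefined. */
    String resolve(String label) {
        return urlsByLabel.get(normalize(label));
    }
}
```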
