commonmark / commonmark-java

Java library for parsing and rendering CommonMark (Markdown)

License: BSD 2-Clause "Simplified" License

Java 99.81% JavaScript 0.08% Shell 0.10%

commonmark java library markdown parser renderer

commonmark-java's Introduction

CommonMark

CommonMark is a rationalized version of Markdown syntax, with a spec and BSD-licensed reference implementations in C and JavaScript.


For more details, see https://commonmark.org.

This repository contains the spec itself, along with tools for running tests against the spec, and for creating HTML and PDF versions of the spec.

The reference implementations live in separate repositories:

https://github.com/commonmark/cmark (C)
https://github.com/commonmark/commonmark.js (JavaScript)

There is a list of third-party libraries in a dozen different languages here.

Running tests against the spec

The spec contains over 500 embedded examples which serve as conformance tests. To run the tests using an executable $PROG:

python3 test/spec_tests.py --program $PROG

If you want to extract the raw test data from the spec without actually running the tests, you can do:

python3 test/spec_tests.py --dump-tests

and you'll get all the tests in JSON format.

JavaScript developers may find it more convenient to use the commonmark-spec npm package, which is published from this repository. It exports an array tests of JSON objects with the format

{
  "markdown": "Foo\nBar\n---\n",
  "html": "<h2>Foo\nBar</h2>\n",
  "section": "Setext headings",
  "number": 65
}

The spec

The source of the spec is spec.txt. This is basically a Markdown file, with code examples written in a shorthand form:

```````````````````````````````` example
Markdown source
.
expected HTML output
````````````````````````````````
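The shorthand above is regular enough to be read with a few lines of code. The following is a minimal sketch, assuming the example blocks look exactly as shown (a fence of 32 backticks followed by " example", a line containing only "." as the separator, and a bare fence to close). The class and method names are made up for illustration and are not part of the spec tooling.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SpecExampleReader {
    /** One embedded example: the Markdown input and the HTML the spec expects. */
    static final class Example {
        final String markdown;
        final String html;

        Example(String markdown, String html) {
            this.markdown = markdown;
            this.html = html;
        }
    }

    // The fence is a run of 32 backticks; the opening fence is followed by " example".
    static final String FENCE = buildFence();

    private static String buildFence() {
        char[] chars = new char[32];
        Arrays.fill(chars, '`');
        return new String(chars);
    }

    static List<Example> read(List<String> lines) {
        List<Example> examples = new ArrayList<>();
        StringBuilder markdown = new StringBuilder();
        StringBuilder html = new StringBuilder();
        int state = 0; // 0 = outside an example, 1 = reading Markdown, 2 = reading HTML
        for (String line : lines) {
            if (state == 0) {
                if (line.equals(FENCE + " example")) {
                    state = 1;
                }
            } else if (line.equals(FENCE)) {
                // Closing fence: store the example and reset the buffers.
                examples.add(new Example(markdown.toString(), html.toString()));
                markdown.setLength(0);
                html.setLength(0);
                state = 0;
            } else if (state == 1 && line.equals(".")) {
                state = 2; // the first "." line separates Markdown from HTML
            } else {
                (state == 1 ? markdown : html).append(line).append('\n');
            }
        }
        return examples;
    }
}
```

Each line is stored with a trailing newline, which matches the shape of the "markdown" and "html" strings in the JSON object shown earlier.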

To build an HTML version of the spec, do make spec.html. To build a PDF version, do make spec.pdf. For both versions, you must have the lua rock lcmark installed: after installing lua and lua rocks, luarocks install lcmark. For the PDF you must also have xelatex installed.

The spec is written from the point of view of the human writer, not the computer reader. It is not an algorithm---an English translation of a computer program---but a declarative description of what counts as a block quote, a code block, and each of the other structural elements that can make up a Markdown document.

Because John Gruber's canonical syntax description leaves many aspects of the syntax undetermined, writing a precise spec requires making a large number of decisions, many of them somewhat arbitrary. In making them, we have appealed to existing conventions and considerations of simplicity, readability, expressive power, and consistency. We have tried to ensure that "normal" documents in the many incompatible existing implementations of Markdown will render, as far as possible, as their authors intended. And we have tried to make the rules for different elements work together harmoniously. In places where different decisions could have been made (for example, the rules governing list indentation), we have explained the rationale for our choices. In a few cases, we have departed slightly from the canonical syntax description, in ways that we think further the goals of Markdown as stated in that description.

For the most part, we have limited ourselves to the basic elements described in Gruber's canonical syntax description, eschewing extensions like footnotes and definition lists. It is important to get the core right before considering such things. However, we have included a visible syntax for line breaks and fenced code blocks.

Differences from original Markdown

There are only a few places where this spec says things that contradict the canonical syntax description:

It allows all punctuation symbols to be backslash-escaped, not just the symbols with special meanings in Markdown. We found that it was just too hard to remember which symbols could be escaped.
It introduces an alternative syntax for hard line breaks, a backslash at the end of the line, supplementing the two-spaces-at-the-end-of-line rule. This is motivated by persistent complaints about the “invisible” nature of the two-space rule.
Link syntax has been made a bit more predictable (in a backwards-compatible way). For example, Markdown.pl allows single quotes around a title in inline links, but not in reference links. This kind of difference is really hard for users to remember, so the spec allows single quotes in both contexts.
The rule for HTML blocks differs, though in most real cases it shouldn't make a difference. (See the section on HTML Blocks for details.) The spec's proposal makes it easy to include Markdown inside HTML block-level tags, if you want to, but also allows you to exclude this. It also makes parsing much easier, avoiding expensive backtracking.

It does not collapse adjacent bird-track blocks into a single blockquote:

> these are two

> blockquotes

> this is a single
>
> blockquote with two paragraphs

Rules for content in lists differ in a few respects, though (as with HTML blocks), most lists in existing documents should render as intended. There is some discussion of the choice points and differences in the subsection of List Items entitled Motivation. We think that the spec's proposal does better than any existing implementation in rendering lists the way a human writer or reader would intuitively understand them. (We could give numerous examples of perfectly natural looking lists that nearly every existing implementation flubs up.)
Changing bullet characters, or changing from bullets to numbers or vice versa, starts a new list. We think that is almost always going to be the writer's intent.
The number that begins an ordered list item may be followed by either . or ). Changing the delimiter style starts a new list.
The start number of an ordered list is significant.
Fenced code blocks are supported, delimited by either backticks (```) or tildes (~~~).
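The first difference in the list (any punctuation symbol may be backslash-escaped) can be illustrated with a short sketch. It assumes "punctuation" means the ASCII punctuation characters and ignores contexts such as code spans, where backslashes are not treated as escapes. The class name is invented for this example.

```java
class BackslashEscapes {
    /** True for the ASCII punctuation characters, from '!' up to '~'. */
    static boolean isAsciiPunctuation(char c) {
        return (c >= '!' && c <= '/') || (c >= ':' && c <= '@')
                || (c >= '[' && c <= '`') || (c >= '{' && c <= '~');
    }

    /** Drops the backslash in front of any ASCII punctuation character. */
    static String unescape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length() && isAsciiPunctuation(s.charAt(i + 1))) {
                out.append(s.charAt(i + 1));
                i++; // skip the escaped character
            } else {
                out.append(c); // a backslash before anything else stays literal
            }
        }
        return out.toString();
    }
}
```

Because the rule covers every punctuation character, a writer does not need to remember a special list: a backslash before a letter or digit simply stays a backslash.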

Contributing

There is a forum for discussing CommonMark; you should use it instead of github issues for questions and possibly open-ended discussions. Use the github issue tracker only for simple, clear, actionable issues.

Authors

The spec was written by John MacFarlane, drawing on

his experience writing and maintaining Markdown implementations in several languages, including the first Markdown parser not based on regular expression substitutions (pandoc) and the first markdown parsers based on PEG grammars (peg-markdown, lunamark)
a detailed examination of the differences between existing Markdown implementations using BabelMark 2, and
extensive discussions with David Greenspan, Jeff Atwood, Vicent Marti, Neil Williams, and Benjamin Dumke-von der Ehe.

Since the first announcement, many people have contributed ideas. Kārlis Gaņģis was especially helpful in refining the rules for emphasis, strong emphasis, links, and images.


commonmark-java's Issues

Can't compile from fresh clone

Not sure if this is some interaction with something on my machine, but this prevents me from compiling the source. I even deleted my ~/.m2 directory and pulled down dependencies from scratch. I had previously been able to build this. Any ideas?

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-enforcer-plugin:1.3.1:enforce 
  (ban-milestones-and-release-candidates) on project commonmark-ext-autolink: 
   Execution ban-milestones-and-release-candidates of goal org.apache.maven.plugins:maven-enforcer-plugin:1.3.1:enforce failed: 
org.apache.maven.shared.dependency.graph.DependencyGraphBuilderException: Could not resolve dependencies for project com.atlassian.commonmark:commonmark-ext-autolink:jar:0.2.1-SNAPSHOT: 
Could not find artifact com.atlassian.commonmark:commonmark:jar:tests:0.2.1-SNAPSHOT -> [Help 1]

mvn compile
[INFO] Scanning for projects...
[INFO] Inspecting build with total of 6 modules...
[INFO] Installing Nexus Staging features:
[INFO]   ... total of 6 executions of maven-deploy-plugin replaced with nexus-staging-maven-plugin
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Build Order:
[INFO] 
[INFO] commonmark-java parent
[INFO] commonmark-java core
[INFO] commonmark-java extension for autolinking
[INFO] commonmark-java extension for strikethrough
[INFO] commonmark-java extension for tables
[INFO] commonmark-java integration tests
[INFO] 
[INFO] Using the builder org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder with a thread count of 1
[INFO]                                                                         
[INFO] ------------------------------------------------------------------------
[INFO] Building commonmark-java parent 0.2.1-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO] 
[INFO] --- maven-enforcer-plugin:1.3.1:enforce (enforce-build-environment) @ commonmark-parent ---
[INFO] 
[INFO] --- maven-enforcer-plugin:1.3.1:enforce (ban-milestones-and-release-candidates) @ commonmark-parent ---
[INFO] 
[INFO] --- maven-resources-plugin:2.6:copy-resources (copy-license) @ commonmark-parent ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 1 resource
[INFO]                                                                         
[INFO] ------------------------------------------------------------------------
[INFO] Building commonmark-java core 0.2.1-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO] 
[INFO] --- maven-enforcer-plugin:1.3.1:enforce (enforce-build-environment) @ commonmark ---
[INFO] 
[INFO] --- maven-enforcer-plugin:1.3.1:enforce (ban-milestones-and-release-candidates) @ commonmark ---
[INFO] 
[INFO] --- maven-resources-plugin:2.6:copy-resources (copy-license) @ commonmark ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 1 resource
[INFO] 
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ commonmark ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 1 resource
[INFO] 
[INFO] --- maven-compiler-plugin:3.2:compile (default-compile) @ commonmark ---
[INFO] Compiling 5 source files to /Users/pcj/pow/opt/commonmark-java/commonmark/target/classes
[INFO]                                                                         
[INFO] ------------------------------------------------------------------------
[INFO] Building commonmark-java extension for autolinking 0.2.1-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO] 
[INFO] --- maven-enforcer-plugin:1.3.1:enforce (enforce-build-environment) @ commonmark-ext-autolink ---
[INFO] 
[INFO] --- maven-enforcer-plugin:1.3.1:enforce (ban-milestones-and-release-candidates) @ commonmark-ext-autolink ---
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] commonmark-java parent ............................ SUCCESS [  0.663 s]
[INFO] commonmark-java core .............................. SUCCESS [  0.588 s]
[INFO] commonmark-java extension for autolinking ......... FAILURE [  0.026 s]
[INFO] commonmark-java extension for strikethrough ....... SKIPPED
[INFO] commonmark-java extension for tables .............. SKIPPED
[INFO] commonmark-java integration tests ................. SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 2.321 s
[INFO] Finished at: 2015-09-21T09:09:55-07:00
[INFO] Final Memory: 18M/222M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-enforcer-plugin:1.3.1:enforce (ban-milestones-and-release-candidates) on project commonmark-ext-autolink: Execution ban-milestones-and-release-candidates of goal org.apache.maven.plugins:maven-enforcer-plugin:1.3.1:enforce failed: org.apache.maven.shared.dependency.graph.DependencyGraphBuilderException: Could not resolve dependencies for project com.atlassian.commonmark:commonmark-ext-autolink:jar:0.2.1-SNAPSHOT: Could not find artifact com.atlassian.commonmark:commonmark:jar:tests:0.2.1-SNAPSHOT -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/PluginExecutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :commonmark-ext-autolink

Html blocks not wrapped in <p/> when escapeHtml set to true

I'm using the Parser/HtmlRenderer combo with escapeHtml(true), but this leaves html blocks without being wrapped in a paragraph tag. Please see the following example. It seems like incorrect behaviour to me.

Parsing code

public String render(String text) {
    Parser parser = Parser.builder().build();
    HtmlRenderer renderer = HtmlRenderer.builder().escapeHtml(true).build();
    Node document = parser.parse(text);
    return renderer.render(document);
}

Input

This is a paragraph.

This is a paragraph.

<div>html here</div>

This is a paragraph.

This is a paragraph.

<strong>html here</strong>

This is a paragraph.

This is a paragraph.

<p>html here</p>

This is a paragraph.

Expected output

<p>This is a paragraph.</p>
<p>This is a paragraph.</p>
<p>&lt;div&gt;html here&lt;/div&gt;</p>
<p>This is a paragraph.</p>
<p>This is a paragraph.</p>
<p>&lt;strong&gt;html here&lt;/strong&gt;</p>
<p>This is a paragraph.</p>
<p>This is a paragraph.</p>
<p>&lt;p&gt;html here&lt;/p&gt;</p>
<p>This is a paragraph.</p>

Actual output

<p>This is a paragraph.</p>
<p>This is a paragraph.</p>
&lt;div&gt;html here&lt;/div&gt;
<p>This is a paragraph.</p>
<p>This is a paragraph.</p>
<p>&lt;strong&gt;html here&lt;/strong&gt;</p>
<p>This is a paragraph.</p>
<p>This is a paragraph.</p>
&lt;p&gt;html here&lt;/p&gt;
<p>This is a paragraph.</p>

Note that the line <strong>html here</strong> actually is wrapped, but the other two aren't.
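One plausible reading of this output is that the lines starting with <div> and <p> are parsed as HTML blocks, while the <strong> line is parsed as a paragraph containing inline HTML, so only the latter gets a paragraph tag. The sketch below is a simplified stand-in for the spec's start condition for HTML blocks introduced by block-level tag names. The tag list is deliberately partial and the class name is hypothetical. It is not the parser's actual code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HtmlBlockCheck {
    // Partial list of block-level tag names; the list in the spec is longer.
    private static final Set<String> BLOCK_TAGS = new HashSet<>(Arrays.asList(
            "div", "p", "table", "ul", "ol", "li", "blockquote", "section", "form",
            "h1", "h2", "h3", "h4", "h5", "h6"));

    // "<" or "</", a tag name, then whitespace, ">", "/>" or the end of the line.
    private static final Pattern OPEN_OR_CLOSE_TAG =
            Pattern.compile("^ {0,3}</?([A-Za-z][A-Za-z0-9]*)(?:\\s|/?>|$)");

    /** True if the line would start an HTML block under this simplified rule. */
    static boolean startsHtmlBlock(String line) {
        Matcher m = OPEN_OR_CLOSE_TAG.matcher(line);
        return m.find() && BLOCK_TAGS.contains(m.group(1).toLowerCase(Locale.ROOT));
    }
}
```

Under this rule the three input lines from the example classify as block, not block, block, which matches the pattern of wrapped and unwrapped lines in the actual output.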

Update to CommonMark spec 0.26

Changes (from here):

empty list items can no longer interrupt a paragraph; this resolves an ambiguity with setext headers
ordered lists can interrupt a paragraph only when beginning with 1
the two-blank-lines-breaks-out-of-lists rule has been removed
the spec for emphasis and strong emphasis has been refined to give more intuitive results in some cases
tabs can be used after the # in an ATX header and between the markers in a thematic break

Spec changelog: http://spec.commonmark.org/changelog.txt
Spec diff: http://spec.commonmark.org/0.26/changes.html

CommonMark/Markdown renderer (round-trip parsing/rendering)

It's useful to be able to render a parsed document back to CommonMark markdown. This is a tracker issue for issues relating to that, so far:

Unable to disambiguate emphasis delimiter #10
Unable to disambiguate list-item type #11

Escaping of pipe symbol in gfm-tables-0.4.1 is not supported

GFM tables allow escaping of the pipe symbol in table cells. For example:

AAA	BBB
a	b

This is not supported in gfm-tables-0.4.1.
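To make the expected behaviour concrete, here is a minimal sketch of splitting a table row into cells while honouring escaped pipes. It is a standalone illustration with an invented class name, not the extension's code, and it ignores other details of GFM tables such as code spans inside cells.

```java
import java.util.ArrayList;
import java.util.List;

class TableRowSplitter {
    /** Splits a row on unescaped pipes; "\|" becomes a literal "|" inside a cell. */
    static List<String> splitCells(String row) {
        String s = row.trim();
        if (s.startsWith("|")) {
            s = s.substring(1); // optional leading pipe
        }
        if (s.endsWith("|") && !s.endsWith("\\|")) {
            s = s.substring(0, s.length() - 1); // optional trailing pipe
        }
        List<String> cells = new ArrayList<>();
        StringBuilder cell = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length() && s.charAt(i + 1) == '|') {
                cell.append('|'); // escaped pipe stays in the cell
                i++;
            } else if (c == '|') {
                cells.add(cell.toString().trim()); // unescaped pipe ends the cell
                cell.setLength(0);
            } else {
                cell.append(c);
            }
        }
        cells.add(cell.toString().trim());
        return cells;
    }
}
```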

Github wiki style markdown issue

As an experiment, I took the wiki from http://github.com/osmdroid/osmdroid, cloned it, ran it through commonmark, and ran into a few issues.

1. It appears that relative links to another wiki page aren't resolved:

2. Walk through the [tutorial](How-to-use-the-osmdroid-library)

The target page is generated but the extension .html is missing. Not sure if there's anything that can be done to work around this.
2. The double bracket syntax doesn't appear to be handled at all:

[[osmdroid thirdparty|osmdroid thirdparty]]

If necessary, I can go back and alter all of the wiki pages to use full URLs, but as always, I'll look for a less painful solution.
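One less painful option might be to rewrite the double bracket links into ordinary Markdown links before parsing. The sketch below is a hypothetical preprocessing step, not something the library provides. It assumes the link text comes before the pipe, that spaces in page names become hyphens, and that generated pages end in .html; all three are assumptions to adjust for the actual wiki.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WikiLinkRewriter {
    // Matches [[Link text|Page name]] as well as [[Page name]].
    private static final Pattern WIKI_LINK =
            Pattern.compile("\\[\\[([^\\]|]+)(?:\\|([^\\]]+))?\\]\\]");

    /** Rewrites wiki-style links into standard inline links. */
    static String rewrite(String markdown) {
        Matcher m = WIKI_LINK.matcher(markdown);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String text = m.group(1).trim();
            String page = (m.group(2) != null ? m.group(2) : m.group(1)).trim();
            // Assumption: page names map to files by replacing spaces and adding ".html".
            String target = page.replace(' ', '-') + ".html";
            m.appendReplacement(out, Matcher.quoteReplacement("[" + text + "](" + target + ")"));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

The same idea could be applied to the relative link problem by appending the extension to link destinations that have no scheme and no file extension.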

Table borders

Is there a way to inject a css style into generated tables or to turn on borders? I'd love a bootstrap styled .table kind of thing

Unable to disambiguate emphasis delimiter

AST nodes do not provide start or end indices into the source, and Emphasis.java and StrongEmphasis.java do not provide delimiter information. Therefore, for the input:

Hello *Italic* **Bold** _Emph_ __Strong__ ~~Strike~~!

It does not currently seem possible to parse this, build an AST, and emit the same markdown back out, as the metadata about the delimiter character is lost.

Add underline support

Hello,

Could we add an UnderlineExtension in a new artifact commonmark-ext-gfm-underline? It would be a sibling of the existing StrikethroughExtension in artifact commonmark-ext-gfm-strikethrough, with only a few changes in the new class and the classes related to it.

If you agree with this proposal, I could submit a pull request. I just need to know which delimiter would be the most appropriate.
I think we should use a character that is easy to type on both QWERTY and AZERTY keyboards (and some others?), written twice. Here are some proposals: -- or && or %%.

Thank you in advance for your feedback.

Support for conditional delimiter processing

Thinking about superscript and subscript: in pandoc, the following is valid 2^10^, but ^a cat^ is not interpreted as superscript, unless one escapes the space (^a\ cat^).

How would the current delimiter parsing scheme handle this?

StringIndexOutOfBoundsException in InlineParserImpl

Attempting to parse a malformed string such as [example.com](http:\\example.com leads to the following exception:

java.lang.StringIndexOutOfBoundsException: length=32; index=32
            at java.lang.String.charAt(Native Method)
            at org.commonmark.internal.InlineParserImpl.parseCloseBracket(InlineParserImpl.java:579)
            at org.commonmark.internal.InlineParserImpl.parseInline(InlineParserImpl.java:303)
            at org.commonmark.internal.InlineParserImpl.parse(InlineParserImpl.java:157)
            at org.commonmark.internal.ParagraphParser.parseInlines(ParagraphParser.java:61)
            at org.commonmark.internal.DocumentParser.processInlines(DocumentParser.java:349)
            at org.commonmark.internal.DocumentParser.finalizeAndProcess(DocumentParser.java:495)
            at org.commonmark.internal.DocumentParser.parse(DocumentParser.java:84)
            at org.commonmark.parser.Parser.parse(Parser.java:61)

Order of Custom Extension Factories

Hi, I'm creating an extension to parse YAML-style metadata in Markdown. Hyphens are generally used to delimit the metadata section. Here is an example:

---
key1: value1
key2: value2

---

markdown document start!

This format is widely used; there are many examples on the web.

But I cannot create an extension to parse the metadata because the core factories precede the custom extension factories. In the above example, --- is treated as a horizontal rule by HorizontalRuleParser.

I think that the custom extension factories should precede the core factories. That would make the parser easier to extend.
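Until the factory order allows such an extension, one workaround is to strip the metadata section before handing the rest to the parser. The following is a minimal sketch under simplifying assumptions: the section must start on the first line, entries are flat "key: value" pairs, and the class name is invented. It is not a YAML parser.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

class FrontMatterSplitter {
    final Map<String, String> metadata = new LinkedHashMap<>();
    final String body;

    private FrontMatterSplitter(String body) {
        this.body = body;
    }

    /** Separates a leading "---" ... "---" section from the Markdown that follows it. */
    static FrontMatterSplitter split(String text) {
        String[] lines = text.split("\n", -1);
        if (!lines[0].trim().equals("---")) {
            return new FrontMatterSplitter(text); // no metadata section
        }
        int close = -1;
        for (int i = 1; i < lines.length && close < 0; i++) {
            if (lines[i].trim().equals("---")) {
                close = i;
            }
        }
        if (close < 0) {
            return new FrontMatterSplitter(text); // unterminated: leave the text alone
        }
        String body = String.join("\n", Arrays.copyOfRange(lines, close + 1, lines.length));
        FrontMatterSplitter result = new FrontMatterSplitter(body);
        for (int i = 1; i < close; i++) {
            int colon = lines[i].indexOf(':');
            if (colon > 0) { // skip blank lines and lines without a key
                result.metadata.put(lines[i].substring(0, colon).trim(),
                        lines[i].substring(colon + 1).trim());
            }
        }
        return result;
    }
}
```

The body field can then be passed to the parser, so the remaining --- lines in the document keep their normal meaning.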

Document if the classes are threadsafe.

Can a Parser or HtmlRenderer be used in a threaded environment - shared across threads?

is it possible to alter laziness in block quote

Just found the library and tried to write an extension to support non-standard syntax.
In short, I want to use the commonmark-java library infrastructure to parse and/or render non-standard syntax too.

Now I'm trying to implement the blockquote rule, but without laziness.
For example:

>Quote line1
>Quote line2
This should not be part of quote

Using the above syntax, I expected a result like the one below (but without the extra line between 'Quote line2' and 'This should not be part of quote'):

Quote line1
Quote line2

This should not be part of quote

Is it possible to alter the laziness for my blockquote rule?
Which file should I read to learn more?
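To pin down the intended rule independently of the library, here is a small standalone sketch of blockquote grouping without lazy continuation: only lines that start with ">" belong to a quote, and any other line ends it. The class is hypothetical and says nothing about how a custom block parser would be registered in commonmark-java.

```java
import java.util.ArrayList;
import java.util.List;

class StrictQuoteSplitter {
    /** A run of consecutive lines that are either all quoted or all unquoted. */
    static final class Block {
        final boolean quote;
        final List<String> lines = new ArrayList<>();

        Block(boolean quote) {
            this.quote = quote;
        }
    }

    static List<Block> split(List<String> lines) {
        List<Block> blocks = new ArrayList<>();
        Block current = null;
        for (String line : lines) {
            boolean quoted = line.startsWith(">");
            if (current == null || current.quote != quoted) {
                current = new Block(quoted); // no laziness: a plain line always ends the quote
                blocks.add(current);
            }
            if (quoted) {
                String content = line.substring(1);
                if (content.startsWith(" ")) {
                    content = content.substring(1); // drop one optional space after ">"
                }
                current.lines.add(content);
            } else {
                current.lines.add(line);
            }
        }
        return blocks;
    }
}
```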

Add support for asymmetric delimiters

org.commonmark.parser.DelimiterProcessor only supports inline elements with symmetric delimiters, like * and _ (or ~~ for strikethrough). I'd like to write an extension for inline elements with asymmetric delimiters (in my case, {}), but this is not possible with the current design.

I have considered using a org.commonmark.parser.PostProcessor, but I'd like my extension to be parsed with the same precedence as other inline elements.

Update to CommonMark spec 0.24

From the diff between 0.22 and 0.23:

header -> heading
horizontal rule -> thematic break
HtmlTag -> HtmlInline (to follow commonmark.js)
ATX heading: must be space (check regex)
Entity or numeric character references in raw HTML
No more optional whitespace after link label

From the diff between 0.23 and 0.24:

Update spec example parsing
Headings with multiple lines
Parentheses inside the link destination may be escaped
No spaces in link destination (even with <>)
Link scheme whitelist removed (link)

TextContent extension for GFM tables

There is an extension for parsing and rendering markdown with GFM tables to HTML, but there is no support for textcontent.

In my use case I want to be able to parse and render Markdown content with GFM tables to plain text and HTML. Especially for inline style GFM tables and empty headers, the formatting could be improved by an extension, in my opinion.

This is a possible text output:

Markdown | Less | Pretty
--- | --- | ---
*Still* | `renders` | **nicely**
1 | 2 | 3

Override rendering functions

Hi. Is there a way to override the rendering functions? For example, I'd like to render a link differently if it links to page on our domain vs one off-domain.

I was thinking to override the RenderVisitor inner class in HtmlRenderer but it has private access. Is there another way to do this?
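Whatever hook ends up being used for the rendering itself, the on-domain versus off-domain decision can be kept in a small helper. This is a minimal sketch using java.net.URI; the class name and the choice to count subdomains and relative links as internal are assumptions made for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

class LinkClassifier {
    /** True if the link destination points at ownHost, one of its subdomains, or is relative. */
    static boolean isInternal(String destination, String ownHost) {
        try {
            URI uri = new URI(destination);
            String host = uri.getHost();
            if (host == null) {
                // Relative links and fragments have neither host nor scheme.
                // Links such as mailto: have a scheme but no host and count as external.
                return uri.getScheme() == null;
            }
            String h = host.toLowerCase(Locale.ROOT);
            String own = ownHost.toLowerCase(Locale.ROOT);
            return h.equals(own) || h.endsWith("." + own);
        } catch (URISyntaxException e) {
            return false; // unparseable destinations are treated as external
        }
    }
}
```

A renderer could then, for example, add a target or rel attribute only when isInternal returns false.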

Update to CommonMark spec 0.22

Make the following succeed:

./etc/update-spec.sh 0.22 && mvn clean test

See changes here: http://spec.commonmark.org/0.22/changes.html

I don't think we handle CR (without following LF) correctly ATM
Note changed conditions of HTML blocks

Using A Visitor

in your visitor example:

Node node = parser.parse("...");
MyVisitor visitor = new MyVisitor();
node.accept(visitor);

class MyVisitor extends AbstractVisitor {
    @Override
    public void visit(Paragraph paragraph) {
        // Do something with paragraph (override other methods for other nodes):
        System.out.println(paragraph);
        // Descend into children:
        visitChildren(paragraph);
    }
}

What should be printed to System.out? I tried a few things, and when visiting both headers and paragraphs all it prints is the type followed by curly braces, like this:

INFO: Visit:(p)- Paragraph{}

I thought maybe it would print out the contents of the node? How do I get to the content of the node?
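The AST dump quoted in the "Strike Literal value inconsistent" issue shows Paragraph{} nodes whose children are Text{literal=...} nodes, which suggests the content lives in the child text nodes rather than in the paragraph itself. The sketch below reproduces that structure with tiny stand-in classes so it compiles on its own. All Mini* names are hypothetical; as far as I know, the corresponding override in the library is visit(Text) together with getLiteral().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the library's node classes.
abstract class MiniNode {
    final List<MiniNode> children = new ArrayList<>();

    abstract void accept(MiniVisitor visitor);
}

class MiniParagraph extends MiniNode {
    @Override
    void accept(MiniVisitor visitor) {
        visitor.visit(this);
    }

    @Override
    public String toString() {
        return "Paragraph{}"; // a container: it has no text of its own
    }
}

class MiniText extends MiniNode {
    final String literal;

    MiniText(String literal) {
        this.literal = literal;
    }

    @Override
    void accept(MiniVisitor visitor) {
        visitor.visit(this);
    }
}

abstract class MiniVisitor {
    void visit(MiniParagraph paragraph) {
        visitChildren(paragraph);
    }

    void visit(MiniText text) {
        visitChildren(text);
    }

    void visitChildren(MiniNode parent) {
        for (MiniNode child : parent.children) {
            child.accept(this);
        }
    }
}

/** Collects the text of a subtree by overriding the visit method for text nodes. */
class TextCollector extends MiniVisitor {
    final StringBuilder content = new StringBuilder();

    @Override
    void visit(MiniText text) {
        content.append(text.literal);
    }
}
```

The point of the design is that printing a container node only shows its type, while descending into the children and handling the text node type is what yields the content.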

generating a pdf

Hi,

I'm trying to generate a PDF with this project by creating a custom org.commonmark.renderer.Renderer. The problem is that the render method returns a String, which does not suit my case. IMHO it would be better if it wrote to an OutputStream or returned a byte[]. Do you have any suggestions?

AttributeProvider not working for TablesExtension and images

As documented in the source code (TableHtmlRenderer#renderBlock()):

// TODO: What about attributes? If we got the renderer instead of the visitor, we could call getAttributes.

To be able to set custom attributes would be very handy especially when working with Tables!

More in general: IMO AttributeProvider should be working for all tags, including images.

Parse backtick quotes (`) failed.

I want to write code containing tags such as <div>, <table>, <pre>, <p>, etc., beginning and ending with a backtick (`).
But it looks like the parser doesn't understand this. It shows a backtick character and then starts a large div block, which doesn't end until the end of the page.

Document NodeRenderer to customize HTML rendering in README

It's too hidden in the Javadoc. We should have a section in the README showing how to use it, and maybe link to the Javadocs for HtmlRenderer.Builder.

Indent after numbers is not consistent with reference commonmark implementation

The following text:

1. foo


     this should not be code

Renders the last line as code, rather than text. I think this is a bug.

This does not match the reference commonmark implementation: http://spec.commonmark.org/dingus/

The majority of other implementations do not show this behaviour: http://johnmacfarlane.net/babelmark2/?normalize=1&text=1.+foo%0A%0A%0A++++bar

I believe the relevant part of the commonmark spec is 5.3 Lists, which says:

A list is a sequence of one or more list items of the same type. The list items may be separated by any number of blank lines.

Note that this was confusingly worded in earlier versions of the spec: https://talk.commonmark.org/t/multiple-blank-lines-inside-a-list/2289

Disable autoconverting HTML entity references

As the title says, is there a way to disable autoconverting HTML entity references, e.g. &alpha; to α? I am saving the converted HTML in a DB which is Windows-1252 encoded (can't be changed right now), so I am having some problems.
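If the conversion itself cannot be switched off, a possible workaround is to post-process the rendered HTML and turn every character the target encoding cannot represent into a numeric character reference. This is a minimal sketch with an invented class name. For the case described here the charset would be Charset.forName("windows-1252"), assuming the runtime provides it.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

class EntityFallback {
    /** Replaces characters the target charset cannot encode with numeric references. */
    static String escapeUnencodable(String html, Charset target) {
        CharsetEncoder encoder = target.newEncoder();
        StringBuilder out = new StringBuilder(html.length());
        for (int i = 0; i < html.length(); ) {
            int codePoint = html.codePointAt(i);
            String s = new String(Character.toChars(codePoint));
            if (encoder.canEncode(s)) {
                out.append(s);
            } else {
                // Decimal numeric character reference, e.g. U+03B1 becomes &#945;
                out.append("&#").append(codePoint).append(';');
            }
            i += Character.charCount(codePoint); // step over surrogate pairs as one unit
        }
        return out.toString();
    }
}
```

The result is plain text in the target encoding that a browser still displays as the original characters.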

Header anchor extension fails on android

When running the renderer on android, I get an IllegalArgumentException:

java.lang.IllegalArgumentException: Unsupported flags: 256
    at java.util.regex.Pattern.<init>(Pattern.java:1320)
    at java.util.regex.Pattern.compile(Pattern.java:971)
    at org.commonmark.ext.heading.anchor.IdGenerator.<init>(IdGenerator.java:14)
    at org.commonmark.ext.heading.anchor.IdGenerator.<init>(IdGenerator.java:13)
    at org.commonmark.ext.heading.anchor.IdGenerator$Builder.build(IdGenerator.java:106)
    at org.commonmark.ext.heading.anchor.internal.HeadingIdAttributeProvider.<init>(HeadingIdAttributeProvider.java:20)
    at org.commonmark.ext.heading.anchor.internal.HeadingIdAttributeProvider.create(HeadingIdAttributeProvider.java:24)
    at org.commonmark.ext.heading.anchor.HeadingAnchorExtension$1.create(HeadingAnchorExtension.java:61)
    at org.commonmark.renderer.html.HtmlRenderer$RendererContext.<init>(HtmlRenderer.java:199)
    at org.commonmark.renderer.html.HtmlRenderer$RendererContext.<init>(HtmlRenderer.java:188)
    at org.commonmark.renderer.html.HtmlRenderer.render(HtmlRenderer.java:62)
    at org.commonmark.renderer.html.HtmlRenderer.render(HtmlRenderer.java:69)

Here is the code I am using to run the renderer

String md = "# markdown";
List<Extension> extensions = Arrays.asList(
        TablesExtension.create(),
        StrikethroughExtension.create(),
        AutolinkExtension.create(),
        HeadingAnchorExtension.create(),
        InsExtension.create());
Parser parser = Parser.builder().extensions(extensions).build();
HtmlRenderer renderer = HtmlRenderer.builder().extensions(extensions).build();
Node document = parser.parse(md);
renderer.render(document);

And here is the relevant section of my gradle config

apply plugin: 'com.android.application'

android {
    compileSdkVersion 25
    buildToolsVersion "25.0.0"

    defaultConfig {
        minSdkVersion 21
        targetSdkVersion 25
        jackOptions {
            enabled true
        }
        ...
    }
    compileOptions {
        sourceCompatibility JavaVersion.VERSION_1_8
        targetCompatibility JavaVersion.VERSION_1_8
    }
    ...
}

dependencies {
    ...
    ext.commonmark = "0.7.1"
    compile "com.atlassian.commonmark:commonmark:$commonmark"
    compile "com.atlassian.commonmark:commonmark-ext-gfm-tables:$commonmark"
    compile "com.atlassian.commonmark:commonmark-ext-autolink:$commonmark"
    compile "com.atlassian.commonmark:commonmark-ext-gfm-strikethrough:$commonmark"
    compile "com.atlassian.commonmark:commonmark-ext-heading-anchor:$commonmark"
    compile "com.atlassian.commonmark:commonmark-ext-ins:$commonmark"
}

This looks very similar to this issue. Any help would be great, thanks!
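The value 256 in the exception message equals Pattern.UNICODE_CHARACTER_CLASS (0x100) on the JDK, so a plausible cause is that the Android runtime in question rejects that flag. A defensive way to compile such a pattern is to fall back to a compile without the flag, as in the sketch below. The helper is hypothetical, and whether IdGenerator actually uses this flag is an assumption based only on the stack trace.

```java
import java.util.regex.Pattern;

class UnicodePatterns {
    /** Compiles with UNICODE_CHARACTER_CLASS where supported, without it otherwise. */
    static Pattern compileUnicodeAware(String regex) {
        try {
            // Flag value 0x100 (256): Unicode-aware \w, \d, \s and POSIX classes.
            return Pattern.compile(regex, Pattern.UNICODE_CHARACTER_CLASS);
        } catch (IllegalArgumentException e) {
            // Runtimes that reject the flag end up here. A genuine syntax error
            // in the regex is thrown again by this second compile.
            return Pattern.compile(regex);
        }
    }
}
```

On a runtime that rejects the flag, the fallback pattern may match a narrower set of characters, so the generated output should be checked on the target platform.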

Escape Raw HTML does not work.

I have tried to render some text that includes raw HTML. Whether I pass true or false to escapeHtml(), it doesn't work. The given text renders like below:

< test > |&| < /test >

<test&gt |&amp| &lt/test&gt

There may be a bug in the detection part. Correct me if I am doing something wrong.

Unexpected success in commonmark-android-test with android-10

In commit 4cfc2e0, I changed

diff --git a/commonmark-android-test/app/build.gradle b/commonmark-android-test/app/build.gradle
index 3ca56fb..5d940d1 100644
--- a/commonmark-android-test/app/build.gradle
+++ b/commonmark-android-test/app/build.gradle
@@ -10,13 +10,13 @@ if (testPropertiesFile.canRead()) {
 }
 
 android {
-    compileSdkVersion 16
-    buildToolsVersion "21.1.1"
+    compileSdkVersion 10
+    buildToolsVersion "21.1.2"
 
     defaultConfig {
         applicationId "com.atlassian.commonmark.android.test"
-        minSdkVersion 16
-        targetSdkVersion 16
+        minSdkVersion 10
+        targetSdkVersion 10
         versionCode 1
         versionName "1.0"

Even though I would expect it to fail, gradle reports

...
:app:packageSnapshotDebugAndroidTest
:app:assembleSnapshotDebugAndroidTest
:app:connectedSnapshotDebugAndroidTest
BUILD SUCCESSFUL
Total time: 1 mins 12.542 secs

and the report contains

<?xml version='1.0' encoding='UTF-8' ?>
<testsuite name="com.atlassian.commonmark.android.test.AndroidSupportTest" tests="8" failures="0" errors="0" skipped="0" time="50.773" timestamp="2017-03-02T17:26:42" hostname="localhost">
  <properties>
    <property name="device" value="android-10(AVD) - 2.3.3" />
    <property name="flavor" value="SNAPSHOT" />
    <property name="project" value="app" />
  </properties>
  <testcase name="parseTest" classname="com.atlassian.commonmark.android.test.AndroidSupportTest" time="6.03" />
  <testcase name="headingAnchorExtensionTest" classname="com.atlassian.commonmark.android.test.AndroidSupportTest" time="7.379" />
  <testcase name="insExtensionTest" classname="com.atlassian.commonmark.android.test.AndroidSupportTest" time="6.549" />
  <testcase name="strikethroughExtensionTest" classname="com.atlassian.commonmark.android.test.AndroidSupportTest" time="6.069" />
  <testcase name="autolinkExtensionTest" classname="com.atlassian.commonmark.android.test.AndroidSupportTest" time="6.423" />
  <testcase name="tablesExtensionTest" classname="com.atlassian.commonmark.android.test.AndroidSupportTest" time="6.019" />
  <testcase name="yamlFrontMatterExtensionTest" classname="com.atlassian.commonmark.android.test.AndroidSupportTest" time="6.07" />
  <testcase name="htmlRendererTest" classname="com.atlassian.commonmark.android.test.AndroidSupportTest" time="6.059" />
</testsuite>

When I copy commonmark-core into a test project (android-10) and build it as part of this project, the build fails as expected.

An option to add automatic IDs to headings

GitHub flavoured markdown adds automatic IDs to headings, and it would be great to see this as an option here (disclaimer: I'm hoping to see the feature in Stash/Bitbucket server).

This is not in CommonMark (and won't be added), though some implementations do this by default.

In the case of GitHub, a heading #foo gets the ID user-generated-foo to avoid clobbering IDs used elsewhere on the page. Some JavaScript then lets you use #foo in your URL instead of #user-generated-foo.

This would allow us to use tools like markdown-toc or doctoc on Bitbucket server.
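
As a rough illustration of the idea, here is a minimal sketch of how prefixed, unique heading IDs might be derived from heading text. The class name HeadingIdGenerator, the ASCII-only slug rule, and the numeric suffix for duplicates are all assumptions made for this example; this is neither GitHub's actual algorithm nor an existing commonmark-java API.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper: derives unique, prefixed anchor IDs from heading text. */
final class HeadingIdGenerator {
    private final String prefix;
    private final Set<String> used = new HashSet<>();

    HeadingIdGenerator(String prefix) {
        this.prefix = prefix;
    }

    String generate(String headingText) {
        // Assumed slug rule: lowercase, runs of non-alphanumerics become one hyphen.
        String slug = headingText.trim().toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z0-9]+", "-")
                .replaceAll("^-+|-+$", "");
        if (slug.isEmpty()) {
            slug = "id"; // fallback for headings with no ASCII letters or digits
        }
        // Append -1, -2, ... until the candidate has not been handed out before.
        String candidate = slug;
        int suffix = 0;
        while (!used.add(candidate)) {
            suffix++;
            candidate = slug + "-" + suffix;
        }
        return prefix + candidate;
    }
}
```

One generator instance would be used per document, so that a repeated heading text gets a distinct ID. The prefix (for example user-generated-) is passed in, which keeps the clobbering protection optional.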

Android support

There's currently a fork with changes for making commonmark-java work on Android. This issue is about merging those changes back, and making sure we don't break things for Android in the future.

Initial discussion here: https://github.com/Doist/commonmark-android/commit/9ff69424f603c6b8c0ddb6419419d651c0de7380#commitcomment-15439453

Strike Literal value inconsistent

I'm just exploring this library a bit, will post issues as I see stuff. Sorry if too trivial. For the string:

Hello *Italic* **Bold** _Emph_ __Strong__ ~~Strike~~!

The printed AST is:

Document{}
.Paragraph{}
..Text{literal=Hello }
..Emphasis{}
...Text{literal=Italic}
..Text{literal= }
..StrongEmphasis{}
...Text{literal=Bold}
..Text{literal= }
..Emphasis{}
...Text{literal=Emph}
..Text{literal= }
..StrongEmphasis{}
...Text{literal=Strong}
..Text{literal= ~~Strike~~!}

Preservation of the double-tilde in the literal attribute appears inconsistent with the other node implementations. Not sure if this has any side-effect, but I thought I'd point it out.

Table rendering is failing

I'm using commonmark-ext-gfm-tables:0.7.0 with the following input, and it produces no output at all:


| Module    |Javadocs   |
| ------    |--------   |
| gradle-fury-validation    | [Javadocs](javadocs/gradle-fury-validation/index.html)    |
| hello-gradhell    | [Javadocs](javadocs/hello-gradhell/index.html)    |
| hello-universe-lib    | [Javadocs](javadocs/hello-universe-lib/index.html)    |
| hello-world-aar   | [debug](javadocs/hello-world-aar/debug/index.html)     |
| hello-world-aar   | [release](javadocs/hello-world-aar/release/index.html)     |
| hello-world-lib   | [Javadocs](javadocs/hello-world-lib/index.html)   |
| hello-world-war   | [Javadocs](javadocs/hello-world-war/index.html)   |

I've tried adding more whitespace, removing the links, removing the leading and trailing pipes. What am I doing wrong?

            List<org.commonmark.Extension> extensions = new ArrayList<>();
            extensions.add(org.commonmark.ext.gfm.tables.TablesExtension.create());
            extensions.add(org.commonmark.ext.gfm.strikethrough.StrikethroughExtension.create());
            extensions.add(org.commonmark.ext.autolink.AutolinkExtension.create());
            Parser parser = Parser.builder().extensions(extensions).build();

            // contents holds the Markdown table shown above
            Node document = parser.parse(contents);
            HtmlRenderer renderer = HtmlRenderer.builder().build();
            String html = renderer.render(document);

Add source position/maps to AST

Having source positions is a useful feature for editors, as it allows linking blocks between the source and the rendered output. commonmark.js supports it; see the highlighted blocks in the dingus preview.

There is some code for adding source positions to blocks, but it's untested and not currently exposed.

Allow rendering to OutputStream/Writer

Similar to #2, rendering to a stream should be possible. It should also be fairly simple to implement.
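
One possible shape for this, sketched under assumptions: the renderer writes to an Appendable (which covers Writer, StringBuilder and PrintStream), and the existing String-returning method becomes a thin wrapper. The class name AppendableRenderSketch is hypothetical, and a plain list of paragraph strings stands in for the real AST.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;

/** Hypothetical sketch: a renderer whose core method streams to an Appendable. */
final class AppendableRenderSketch {

    /** Writes the output piece by piece instead of building one large String. */
    static void render(List<String> paragraphs, Appendable out) {
        try {
            for (String paragraph : paragraphs) {
                out.append("<p>").append(escape(paragraph)).append("</p>\n");
            }
        } catch (IOException e) {
            // Appendable.append declares IOException; rethrow unchecked for brevity.
            throw new UncheckedIOException(e);
        }
    }

    /** Convenience overload that keeps a String-returning API available. */
    static String render(List<String> paragraphs) {
        StringBuilder sb = new StringBuilder();
        render(paragraphs, sb);
        return sb.toString();
    }

    private static String escape(String s) {
        // '&' must be replaced first so that later entities are not double-escaped.
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }
}
```

To write to an OutputStream, a caller could wrap it in an OutputStreamWriter with an explicit charset and pass that as the Appendable.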

Can we have a 0.4.1 release?

I'm currently working on an app that deals with small texts and I figured it would be nice if it had basic CommonMark support.
I added compile 'com.atlassian.commonmark:commonmark:0.4.0' to my dependencies and used the Parser, but running the code throws an exception:

java.util.regex.PatternSyntaxException: U_ILLEGAL_ARGUMENT_ERROR
^\p{IsWhite_Space}

I saw that this issue was fixed (b954f82) right after the 0.4.0 release, so a new version soon would be great! 👍

Allow custom implementations of InlineParsers

Hi there!

Continuing with the theme of extensibility I'd like to be able to specify the inline parser used to parse markdown.

In my particular use case I'd like to disable a few features of markdown (e.g., inline images) and add a few of my own (e.g., @name will be recognized and parsed as something specific).

Proposal:
Add a method to Builder, public Builder inlineParser(InlineParser parser) that allows passing an implementation of InlineParser to Parser.

e.g.,

Parser.builder().inlineParser(new AtMentionParser()).build()
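
To make the @name case concrete, here is a small standalone sketch of the recognition step such a parser would need. It does not implement the InlineParser interface; the class name AtMentionSketch, the mention syntax (letters, digits and underscores) and the text:/mention: token format are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: splits inline text into plain-text and @mention tokens. */
final class AtMentionSketch {
    // Assumed syntax: '@' plus word characters, not directly preceded by a word
    // character, so that e-mail addresses like a@b.c are left alone.
    private static final Pattern MENTION =
            Pattern.compile("(?<![A-Za-z0-9_])@([A-Za-z0-9_]+)");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = MENTION.matcher(input);
        int last = 0;
        while (m.find()) {
            if (m.start() > last) {
                tokens.add("text:" + input.substring(last, m.start()));
            }
            tokens.add("mention:" + m.group(1));
            last = m.end();
        }
        if (last < input.length()) {
            tokens.add("text:" + input.substring(last));
        }
        return tokens;
    }
}
```

In a real implementation the mention tokens would become custom AST nodes rather than strings, and features such as inline images would simply not be handled by the custom parser.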

Update to CommonMark spec 0.27

Changelog: http://spec.commonmark.org/changelog.txt

Diff: http://spec.commonmark.org/0.27/changes.html

TODO:

Add h2..h6 to block tag list
Check link precedence (see commonmark/commonmark-spec#427)

Editing enclosing <pre> tag attributes for an IndentedCodeBlock with AttributeProviderFactory

I'm new to commonmark-java, and am trying to figure out how to apply a CSS class to all <pre> tags surrounding an IndentedCodeBlock. I was successfully able to apply attributes to Header and BlockQuote nodes, but if I match a Node with IndentedCodeBlock and add a class to the attribute map, then it applies it to the <code> tag in the HTML it renders.

I think what's happening here is that <pre> isn't a Node, just an artefact of the way the HtmlRenderer renders the IndentedCodeBlock (and FencedCodeBlock) Nodes, and so I'm never able to override the attributes for the <pre> tag because it never goes through the AttributeProviderContext as a Node.

If I'm understanding this correctly, the only way to do this would be to override the rendering itself for FencedCodeBlocks and IndentedCodeBlocks.

Do you think it makes sense to allow for overriding the attributes of tags where more than 1 set of tags is generated per node? Perhaps the AttributesProvider could provide attributes for the outer tags first with an override for providing attributes inside of the nested structure?

StringIndexOutOfBoundsException on empty ordered list input.

Using com.atlassian.commonmark:commonmark:0.1.0:

import static org.junit.Assert.assertEquals;

import org.commonmark.html.HtmlRenderer;
import org.commonmark.node.Node;
import org.commonmark.parser.Parser;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.JUnit4;

@RunWith(JUnit4.class)
public class MarkdownTest {
  @Test
  public void shouldWork() {
    Parser parser = Parser.builder().build();
    Node root = parser.parse("2.");

    HtmlRenderer renderer = HtmlRenderer.builder().escapeHtml(false).build();
    String html = renderer.render(root);

    assertEquals(
        "<ol start=\"2\">\n" +
            "<li></li>\n" +
            "</ol>",
        html);
  }
}

The test above fails with:

java.lang.StringIndexOutOfBoundsException: String index out of range: 2
  at java.lang.String.charAt(String.java:658)
  at org.commonmark.internal.util.Substring.charAt(Substring.java:32)
  at java.lang.Character.codePointAt(Character.java:4668)
  at org.commonmark.internal.util.Parsing.isLetter(Parsing.java:55)
  at org.commonmark.internal.DocumentParser.incorporateLine(DocumentParser.java:194)
  at org.commonmark.internal.DocumentParser.parse(DocumentParser.java:83)
  at org.commonmark.parser.Parser.parse(Parser.java:45)
  at MarkdownTest.shouldWork(MarkdownTest.java:32)

For reference, here's the output for commonmark.js: http://spec.commonmark.org/dingus/?text=2.

Edit: Fixed rendering of dingus link.

Check GFM extensions against new spec

GitHub just posted on their blog that GitHub-flavored Markdown is now CommonMark + extensions and has a spec: https://githubengineering.com/a-formal-spec-for-github-markdown/

Spec lives here: https://github.github.com/gfm/

Check our implementation of tables, strikethrough (and maybe autolinking) against the spec.

Inline delimiter parser can not be registered more than once, delimiter character: ~

Trying to implement a separate extension for subscript like H~2~O, but I can't use the tilde in conjunction with the existing ext-gfm-strikethrough extension, as they clash on parser registration. A workaround seems to be to create a new extension that bundles both, something like ext-subscript-and-gfm-strikethrough, that works similarly to the EmphasisDelimiterProcessor. This is somewhat ugly, but perhaps reasonable since this should be a fairly rare occasion.

Thoughts?
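
To show what the bundled approach boils down to, here is a standalone sketch in which a single handler owns the tilde and dispatches on the length of the delimiter run: one tilde means subscript, two mean strikethrough. It does not use the library's delimiter processor API; the class name TildeDispatchSketch and the sub/del output tags are assumptions for illustration, and nested inline content is not processed.

```java
/** Hypothetical sketch: one handler for '~' that dispatches on run length. */
final class TildeDispatchSketch {

    static String render(String input) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            if (input.charAt(i) != '~') {
                out.append(input.charAt(i));
                i++;
                continue;
            }
            int run = runLength(input, i);
            // Run length 1 is subscript, 2 is strikethrough, anything else is literal.
            String tag = run == 1 ? "sub" : run == 2 ? "del" : null;
            int close = tag == null ? -1 : findCloser(input, i + run, run);
            if (close < 0) {
                out.append(input, i, i + run); // no matching closer: keep the tildes
                i += run;
                continue;
            }
            out.append('<').append(tag).append('>')
               .append(input, i + run, close)
               .append("</").append(tag).append('>');
            i = close + run;
        }
        return out.toString();
    }

    /** Counts consecutive tildes starting at the given index. */
    private static int runLength(String s, int start) {
        int end = start;
        while (end < s.length() && s.charAt(end) == '~') {
            end++;
        }
        return end - start;
    }

    /** Finds the next tilde run of exactly the given length, or returns -1. */
    private static int findCloser(String s, int from, int length) {
        int i = from;
        while (i < s.length()) {
            if (s.charAt(i) == '~') {
                int run = runLength(s, i);
                if (run == length) {
                    return i;
                }
                i += run;
            } else {
                i++;
            }
        }
        return -1;
    }
}
```

Because the opener and closer must have the same run length, the two syntaxes never compete for the same delimiters, which is the property a combined extension would need.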

Publish Javadoc somewhere

The API documentation should be published somewhere, probably on gh-pages.

Should use a nice theme like doclava.

StringIndexOutOfBoundsException parsing list followed by unfenced code block

This Markdown fragment throws StringIndexOutOfBoundsException from DocumentParser:

    String markdown = "## Do this\n" +
        "- cd to foo\n" +
        "\n" +
        "\tgit clone https://...\n" +
        "\n" +
        "## Next\n";

java.lang.StringIndexOutOfBoundsException: String index out of range: 8
    at org.commonmark.internal.util.Substring.subSequence(Substring.java:50)
    at org.commonmark.internal.DocumentParser.addLine(DocumentParser.java:330)
    at org.commonmark.internal.DocumentParser.incorporateLine(DocumentParser.java:245)
    at org.commonmark.internal.DocumentParser.parse(DocumentParser.java:74)
    at org.commonmark.parser.Parser.parse(Parser.java:61)

I think the parser gets confused when exiting the unfenced code block and leaves columnIsInTab=true, so that afterTab > line.length() and the subSequence call throws:

            // Our column is in a partially consumed tab. Expand the remaining columns (to the next tab stop) to spaces.
            int afterTab = index + 1;
            CharSequence rest = line.subSequence(afterTab, line.length());

Allow InputStream or Reader as Parser input

Currently, the only input type is String. In case the input is read from a stream, this means it has to be read into a String first, and then the parser has another copy of the data as block content.

Accepting an InputStream (or Reader?) instead would make it possible to get rid of one of the copies. DocumentParser already processes the input line by line, so this should be trivial.
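
A minimal sketch of what a Reader-based entry point could look like, assuming the parser exposes some per-line callback. The class name ReaderInputSketch and the LineHandler interface are hypothetical; the method name incorporateLine is borrowed from the DocumentParser stack traces, not from a public API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

/** Hypothetical sketch: feeds a parser line by line from a Reader. */
final class ReaderInputSketch {

    /** Stand-in for whatever per-line entry point the parser offers. */
    interface LineHandler {
        void incorporateLine(String line);
    }

    static void parse(Reader input, LineHandler handler) {
        // Avoid double buffering if the caller already passed a BufferedReader.
        BufferedReader reader = input instanceof BufferedReader
                ? (BufferedReader) input
                : new BufferedReader(input);
        try {
            String line;
            // readLine() treats \n, \r and \r\n as line endings and strips them.
            while ((line = reader.readLine()) != null) {
                handler.incorporateLine(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A Reader is used rather than an InputStream so that the caller decides the charset, for example by wrapping the stream in an InputStreamReader with an explicit encoding.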

Raw inline HTML incorrectly processed

Raw HTML is not handled according to the spec. Example markdown:

test raw html:
<a><bab><c2c>

Actual output from commonmark-java:

<p>test raw html:</p>
<p>&lt;a&gt;&lt;bab&gt;&lt;c2c&gt;</p>

Expected output:

<p>test raw html:</p>
<a><bab><c2c>

See http://spec.commonmark.org/0.22/#example-559

Unable to disambiguate list-item type

Similar to #10, list-item types are lost.

* Item 1
* Item 2
- Dash 1
- Dash 2

The AST carries no information about which list-item marker was used, which prevents round-trip parsing and emitting of Markdown documents.

Could be solved by annotating AST nodes with start/end source positions. Not clear to me how to go about doing that.
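
Short of full source positions, a lighter option might be to record the marker character when a list item is opened. Below is a minimal sketch of just the detection step, assuming at most three spaces of indentation before the marker. The class name BulletMarkerSketch is hypothetical, and thematic breaks such as "* * *" are deliberately not distinguished here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: extracts the bullet character of a list-item line. */
final class BulletMarkerSketch {
    // Up to three spaces, one of - + *, then a space, a tab, or end of line.
    private static final Pattern BULLET =
            Pattern.compile("^ {0,3}([-+*])(?:[ \\t]|$)");

    /** Returns the bullet character, or 0 if the line does not start a bullet item. */
    static char bulletMarker(String line) {
        Matcher m = BULLET.matcher(line);
        return m.find() ? m.group(1).charAt(0) : 0;
    }
}
```

Storing this character on the list node would be enough for an emitter to reproduce the original marker.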

commonmark-ext-gfm-tables single column table

I would expect to be able to render a single-column table, but the following code does not render one:

    final String input = "| First Header |\n" +
            "| ------------- |\n" +
            "| Content Cell |\n" +
            "| Content Cell |\n";
    List<Extension> extensions = Arrays.asList(TablesExtension.create());
    Parser parser = Parser.builder().extensions(extensions).build();
    Node document = parser.parse(input);
    HtmlRenderer renderer = HtmlRenderer.builder().extensions(extensions).build();
    System.out.println(renderer.render(document));

AST visitor pattern?

Great that you are implementing the CommonMark spec in Java.

I like the way pegdown implements the Visitor Pattern to be able to modify or interpret the AST once constructed. Is there a similar or planned facility for commonmark-java? For example, in pegdown:

import org.pegdown.PegDownProcessor;
import org.pegdown.Extensions;
import org.pegdown.ast.RootNode;
import org.pegdown.ast.Visitor;

...

int exts
    = Extensions.DEFINITIONS
    | Extensions.AUTOLINKS
    | Extensions.HARDWRAPS
    | Extensions.TABLES
    | Extensions.STRIKETHROUGH
    | Extensions.SUPPRESS_ALL_HTML
    ;

PegDownProcessor processor = new PegDownProcessor(exts);
RootNode root = processor.parseMarkdown(s.toCharArray());

MyVisitor visitor = new MyVisitor();
root.accept(visitor);

...

class MyVisitor implements Visitor {

    @Override
    public void visit(org.pegdown.ast.Node node) {
        log.debug("visit {}", node);
    }

   ...
}
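
For comparison, here is a minimal sketch of what a typed visitor with double dispatch could look like. The tiny Paragraph and Text classes below are stand-ins invented for this example and are not commonmark-java's actual node classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a typed visitor over a tiny stand-in AST. */
final class VisitorSketch {

    /** One visit method per concrete node type. */
    interface Visitor {
        void visit(Paragraph paragraph);
        void visit(Text text);
    }

    abstract static class Node {
        final List<Node> children = new ArrayList<>();

        /** Double dispatch: each subclass calls the overload for its own type. */
        abstract void accept(Visitor visitor);
    }

    static final class Paragraph extends Node {
        @Override
        void accept(Visitor visitor) {
            visitor.visit(this);
        }
    }

    static final class Text extends Node {
        final String literal;

        Text(String literal) {
            this.literal = literal;
        }

        @Override
        void accept(Visitor visitor) {
            visitor.visit(this);
        }
    }

    /** Example visitor: collects all text literals in document order. */
    static final class TextCollector implements Visitor {
        final StringBuilder text = new StringBuilder();

        @Override
        public void visit(Paragraph paragraph) {
            for (Node child : paragraph.children) {
                child.accept(this);
            }
        }

        @Override
        public void visit(Text text) {
            this.text.append(text.literal);
        }
    }
}
```

Compared with a single visit(Node) method, the typed overloads avoid instanceof checks in the visitor, at the cost of one method per node type.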

Thanks,
Paul

Adding ability to exclude some parsers

Hi! Would it be possible to add a way (via Builder) to disable some parsers? Some people may want to parse everything except lists, for example, while others need to parse only bold and italic. It would be useful to have one library that covers all these cases :)