xoofx / markdig

A fast, powerful, CommonMark compliant, extensible Markdown processor for .NET

License: BSD 2-Clause "Simplified" License

C# 100.00%

commonmark markdown-processor markdown markdown-parser markdown-to-html commonmark-parsing gfm markdown-flavors dotnet dotnetcore

markdig's Introduction

Markdig

Markdig is a fast, powerful, CommonMark compliant, extensible Markdown processor for .NET.

NOTE: The repository is under construction. There will be a dedicated website and proper documentation at some point!

You can try Markdig online and compare it to other implementations on babelmark3

Features

Very fast parser and html renderer (no-regexp), very lightweight in terms of GC pressure. See benchmarks
Abstract Syntax Tree with precise source code location for syntax tree, useful when building a Markdown editor.
- Check out Markdown Editor v2 for Visual Studio 2022, powered by Markdig!
Converter to HTML
Passes 600+ tests from the latest CommonMark specs (0.30)
Includes all the core elements of CommonMark:
- including GFM fenced code blocks.
Extensible architecture
- Even the core Markdown/CommonMark parsing is pluggable, so you can disable built-in Markdown/CommonMark parsing (e.g. disable HTML parsing) or change behaviour (e.g. match headings with @ instead of #)
Roundtrip support: Parses trivia (whitespace, newlines and other characters) to support lossless parse ⭢ render roundtrip. This enables changing markdown documents without introducing undesired trivia changes.
Built-in with 20+ extensions, including:
- 2 kinds of tables:
  - Pipe tables (inspired from GitHub tables and Pandoc - Pipe Tables)
  - Grid tables (inspired from Pandoc - Grid Tables)
- Extra emphasis (inspired from Pandoc - Emphasis and Markdown-it)
  - Strikethrough ~~
  - Subscript ~
  - Superscript ^
  - Inserted ++
  - Marked ==
- Special attributes or attached HTML attributes (inspired from PHP Markdown Extra - Special Attributes)
- Definition lists (inspired from PHP Markdown Extra - Definitions Lists)
- Footnotes (inspired from PHP Markdown Extra - Footnotes)
- Auto-identifiers for headings (similar to Pandoc - Auto Identifiers)
- Auto-links: generates a link when text starts with http:// or https:// or ftp:// or mailto: or www.xxx.yyy
- Task Lists inspired from GitHub Task lists.
- Extra bullet lists, supporting alpha bullet a. b. and roman bullet (i, ii...etc.)
- Media support for media url (youtube, vimeo, mp4...etc.) (inspired from this CommonMark discussion)
- Abbreviations (inspired from PHP Markdown Extra - Abbreviations)
- Citation text by enclosing ""..."" (inspired by this CommonMark discussion )
- Custom containers similar to fenced code block ::: for generating a proper <div>...</div> instead (inspired by this CommonMark discussion )
- Figures (inspired from this CommonMark discussion)
- Footers (inspired from this CommonMark discussion)
- Mathematics/Latex extension by enclosing $$ for block and $ for inline math (inspired from this CommonMark discussion)
- Soft lines as hard lines
- Emoji support (inspired from Markdown-it)
- SmartyPants (inspired from Daring Fireball - SmartyPants)
- Bootstrap class (to output bootstrap class)
- Diagrams extension whenever a fenced code block contains a special keyword, it will be converted to a div block with the content as-is (currently, supports mermaid and nomnoml diagrams)
- YAML Front Matter to parse without evaluating the front matter and to discard it from the HTML output (typically used for previewing without the front matter in MarkdownEditor)
- JIRA links to automatically generate links for JIRA project references (Thanks to @clarkd: https://github.com/clarkd/MarkdigJiraLinker)
Starting with Markdig version 0.20.0+, Markdig is compatible only with NETStandard 2.0, NETStandard 2.1, NETCoreApp 2.1 and NETCoreApp 3.1.

If you are looking for support for an old .NET Framework 3.5 or 4.0, you can download Markdig 0.18.3.

Third Party Extensions

Documentation

While there is no dedicated documentation yet, the specs documentation shows how to use these extensions.

In the meantime, you can read a behind-the-scenes article about Markdig in my blog post "Implementing a Markdown Engine for .NET"

Download

Markdig is available as a NuGet package:

The Markdig.Signed NuGet package also provides signed assemblies.

Usage

The main entry point for the API is the Markdig.Markdown class:

By default, without any options, Markdig uses the plain CommonMark parser:

var result = Markdown.ToHtml("This is a text with some *emphasis*");
Console.WriteLine(result);   // prints: <p>This is a text with some <em>emphasis</em></p>

To activate most of the advanced extensions (all except Emoji, SoftLine as HardLine, Bootstrap, YAML Front Matter, JiraLinks and SmartyPants):

// Configure the pipeline with all advanced extensions active
var pipeline = new MarkdownPipelineBuilder().UseAdvancedExtensions().Build();
var result = Markdown.ToHtml("This is a text with some *emphasis*", pipeline);

Try it online!

You can have a look at the MarkdownExtensions class, which lists all the extensions that can be activated by modifying the MarkdownPipeline.
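As a rough illustration of activating only selected extensions instead of the whole advanced set, the sketch below enables pipe tables and task lists and reuses the built pipeline. `PipelineSketch` and `Render` are illustrative names, not part of Markdig; `UsePipeTables()` and `UseTaskLists()` are the builder methods mentioned elsewhere on this page.

```csharp
using System;
using Markdig;

public static class PipelineSketch
{
    // Build once and reuse; only the extensions listed here are active.
    public static readonly MarkdownPipeline Pipeline = new MarkdownPipelineBuilder()
        .UsePipeTables()
        .UseTaskLists()
        .Build();

    // Illustrative helper: renders Markdown to HTML with the pipeline above.
    public static string Render(string markdown) => Markdown.ToHtml(markdown, Pipeline);
}
```

Calling `PipelineSketch.Render("- [ ] todo")` should then emit a checkbox input for the list item, while extensions that were not added (footnotes, for example) stay inactive.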

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated. For detailed contributing guidelines, please see contributing.md.

Build

In order to build Markdig, you need to install .NET 6.0

License

This software is released under the BSD 2-Clause license.

Benchmarking

The latest benchmark was collected on April 23 2022, against the following implementations:

Markdig (version: 0.30.2): itself
cmark (version: 0.30.2): Reference C implementation of CommonMark, no support for extensions
CommonMark.NET(master) (version: 0.15.1): CommonMark implementation for .NET, no support for extensions, port of cmark, deprecated.
MarkdownSharp (version: 2.0.5): Open source C# implementation of Markdown processor, as featured previously on Stack Overflow, regexp based.

// * Summary *

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.22000
AMD Ryzen 9 5950X, 1 CPU, 32 logical and 16 physical cores
.NET SDK=6.0.202
  [Host]     : .NET 6.0.4 (6.0.422.16404), X64 RyuJIT
  DefaultJob : .NET 6.0.4 (6.0.422.16404), X64 RyuJIT


|            Method |       Mean |     Error |    StdDev |
|------------------ |-----------:|----------:|----------:|
|           markdig |   1.979 ms | 0.0221 ms | 0.0185 ms |
|             cmark |   2.571 ms | 0.0081 ms | 0.0076 ms |
|    CommonMark.NET |   2.016 ms | 0.0169 ms | 0.0158 ms |
|     MarkdownSharp | 221.455 ms | 1.4442 ms | 1.3509 ms |

Markdig is roughly 100 times faster than MarkdownSharp
and roughly 20% faster than the reference cmark C implementation

Donate

If you are using this library and find it useful for your project, please consider a donation for it!

Credits

Thanks to the fantastic work done by John MacFarlane for the CommonMark specs and all the people involved in making Markdown a better standard!

This project would not have been possible without this huge foundation.

Thanks also to the BenchmarkDotNet project, which makes benchmarking so easy to set up!

Some decoding parts (e.g. HTML EntityHelper.cs) have been reused from CommonMark.NET

Thanks to the work done by @clarkd on the JIRA Link extension (https://github.com/clarkd/MarkdigJiraLinker), now included with this project!

Author

Alexandre MUTEL aka xoofx

markdig's Issues

Inlines Span, Column, Line are empty

I noticed the AST's Span, Column and Line properties and was hoping to use the same for inlines, but they are empty.
Here is a repro to show what I mean. The repro is in PowerShell; I hope it's not too much trouble to decipher.

# load dll in the PowerShell session
> Import-Module Markdig.dll
> $ast = [Markdig.Markdown]::Parse("`n`n[foo](bar)")
> $ast                                                                                                   


LastLine                  : 1
Inline                    : {Markdig.Syntax.Inlines.LiteralInline}
ProcessInlines            : True
Parent                    : {Markdig.Syntax.ParagraphBlock}
Parser                    : Markdig.Parsers.ParagraphBlockParser
IsOpen                    : True
IsBreakable               : True
RemoveAfterProcessInlines : False
Column                    : 0
Line                      : 2
Lines                     : 
Span                      : 2-11



> $ast.Inline                                                                                            


Parent          : {Markdig.Syntax.Inlines.LiteralInline}
PreviousSibling : 
NextSibling     : 
IsClosed        : False
Column          : 0
Line            : 0
Content         : foo
Span            : 0-0

Note that in the first case Column, Line and Span are all correctly populated, but for the Inline they are all 0.

[Suggestion] Make markdig renderer-agnostic

Currently Markdig is still coupled with its HTML renderer. This makes writing new renderers very challenging.

One of the issues I encountered recently is when trying to also support extensions in my XAML renderer. For example when configuring the pipeline the following way:

var pipeline = new MarkdownPipelineBuilder()
    .UseTaskLists()
    .Build();

the TaskList extension is adding an HtmlTaskListRenderer. But in my case I would like it to add my own custom XamlTaskListRenderer instead.

Arguably I could duplicate the whole extensions mechanism to inject my own implementations, but I think this defeats the purpose. Ideally all the existing code for the extensions (parsers, inlines), with the exception of the renderer, should stay the same. So it would be nice to have another way to match/replace renderers without breaking the whole extension mechanism.

Going further, the current Markdig could be split into two assemblies: Markdig.Core which is unaware of any specific renderer (e.g. in my case I don't need to ship the HTML rendering specific code) and Markdig.HTML for the HTML rendering (which in my case is replaced by Markdig.Xaml).

[SmartyPants] is incompatible with UsePipeTables

Using SmartyPants like this stops PipeTables from working:

var builder = new MarkdownPipelineBuilder()
    .UseSmartyPants()
    .UseAdvancedExtensions();

Span not set for EmphasisInline and LinkInline

The Span property always returns a 0,0 span for EmphasisInline and LinkInline.

This might be the same for other inline objects, but I've only tested for these two types

MarkdownPipelineBuilder thread safe?

So a MarkdownPipeline is not thread safe, but is a MarkdownPipelineBuilder thread safe?

Problem restarting list after paragraph

The following markdown:

1.   First item

Some text

2.   Second item

should render as:

<ol>
<li>First item</li>
</ol>
<p>Some text</p>
<ol start="2">
<li>Second item</li>
</ol>

but instead renders as:

<ol>
  <li>
    First item
  </li>
</ol>
<p>
  Some text
</p>
<p>
  2. Second item
</p>

This issue is related to the change in CommonMark 0.26 where starting a list after a paragraph is only allowed using 1. to start the list.
For some reason, Markdig is not respecting the blank line after the paragraph to allow continuation.

LinkHelper.Urilize

Tiny oversight in LinkHelper.Urilize - it throws out any and all digits. The easy fix:

LinkHelper.cs, Line 64 in current revision, insert:

else if (c.IsDigit())
{
    headingBuffer.Append(c);
    previousIsSpace = false;
}

And a few complementary, non-exhaustive tests. A few of them have a comment about not being equivalent to what Pandoc does (it doesn't trim trailing special chars). No-one says they have to be, though. 😃

[TestCase("Header identifiers in HTML", "header-identifiers-in-html")]
[TestCase("* Dogs*?--in *my* house?"  , "dogs-in-my-house")] // Not Pandoc equivalent: dogs--in...
[TestCase("[HTML], [S5], or [RTF]?"   , "html-s5-or-rtf")]
[TestCase("3. Applications"           , "applications")]
[TestCase("33"                        , "")]
public void TestUrilizeNonAscii_Pandoc(string input, string expectedResult)
{
    Assert.AreEqual(expectedResult, LinkHelper.Urilize(input, false));
}

[TestCase("abc" , "abc")]
[TestCase("a-c" , "a-c")]
[TestCase("a c" , "a-c")]
[TestCase("a_c" , "a_c")]
[TestCase("a.c" , "a.c")]
[TestCase("a,c" , "ac")]
[TestCase("a--" , "a")] // Not Pandoc-equivalent: a--
[TestCase("a__" , "a")] // Not Pandoc-equivalent: a__
[TestCase("a.." , "a")] // Not Pandoc-equivalent: a..
[TestCase("a??" , "a")]
[TestCase("a  " , "a")]
[TestCase("a--d", "a-d")]
[TestCase("a__d", "a_d")]
[TestCase("a??d", "ad")]
[TestCase("a  d", "a-d")]
[TestCase("a..d", "a..d")]
[TestCase("-bc" , "bc")]
[TestCase("_bc" , "bc")]
[TestCase(" bc" , "bc")]
[TestCase("?bc" , "bc")]
[TestCase(".bc" , "bc")]
[TestCase("a-.-", "a")] // Not Pandoc equivalent: a-.-
public void TestUrilizeOnlyAscii_Simple(string input, string expectedResult)
{
    Assert.AreEqual(expectedResult, LinkHelper.Urilize(input, true));
}

[TestCase("bær", "br")]
[TestCase("bør", "br")]
[TestCase("bΘr", "br")]
[TestCase("四五", "")]
public void TestUrilizeOnlyAscii_NonAscii(string input, string expectedResult)
{
    Assert.AreEqual(expectedResult, LinkHelper.Urilize(input, true));
}

[TestCase("bár"   , "bar")]
[TestCase("àrrivé", "arrive")]
public void TestUrilizeOnlyAscii_Normalization(string input, string expectedResult)
{
    Assert.AreEqual(expectedResult, LinkHelper.Urilize(input, true));
}

[TestCase("123"  , "")]
[TestCase("1,-b" , "b")]
[TestCase("b1,-" , "b1")] // Not Pandoc equivalent: b1-
[TestCase("ab3"  , "ab3")]
[TestCase("ab3de", "ab3de")]
public void TestUrilizeOnlyAscii_Numeric(string input, string expectedResult)
{
    Assert.AreEqual(expectedResult, LinkHelper.Urilize(input, true));
}

[TestCase("一二三四五", "一二三四五")]
[TestCase("一,-b"    , "一-b")]
public void TestUrilizeNonAscii_NonAsciiNumeric(string input, string expectedResult)
{
    Assert.AreEqual(expectedResult, LinkHelper.Urilize(input, false));
}

[TestCase("bær"  , "bær")]
[TestCase("æ5el" , "æ5el")]
[TestCase("-æ5el", "æ5el")]
[TestCase("-frø-", "frø")]
[TestCase("-fr-ø", "fr-ø")]
public void TestUrilizeNonAscii_Simple(string input, string expectedResult)
{
    Assert.AreEqual(expectedResult, LinkHelper.Urilize(input, false));
}

// Just to be sure, test for characters expressly forbidden in URI fragments:
[TestCase("b#r"  , "br")]
[TestCase("b%r"  , "br")] // Invalid except as an escape character
[TestCase("b^r"  , "br")]
[TestCase("b[r"  , "br")]
[TestCase("b]r"  , "br")]
[TestCase("b{r"  , "br")]
[TestCase("b}r"  , "br")]
[TestCase("b<r"  , "br")]
[TestCase("b>r"  , "br")]
[TestCase(@"b\r" , "br")]
[TestCase(@"b""r", "br")]
public void TestUrilizeNonAscii_NonValidCharactersForFragments(string input, string expectedResult)
{
    Assert.AreEqual(expectedResult, LinkHelper.Urilize(input, false));
}
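The digit handling that the proposed fix and the numeric test cases describe can be sketched in isolation. The helper below is a hypothetical, heavily simplified stand-in written for illustration; it is not Markdig's `LinkHelper.Urilize` and it ignores normalization, '_' and '.' handling, and the ASCII-only switch. It only shows the intended rule: digits are kept once a letter has been seen, and dropped while the identifier is still empty.

```csharp
using System;
using System.Text;

public static class SlugSketch
{
    // Hypothetical simplified slug builder (assumption: not the library's algorithm).
    public static string Slugify(string text)
    {
        var buffer = new StringBuilder();
        bool hasLetter = false;

        foreach (char c in text)
        {
            if (char.IsLetter(c))
            {
                buffer.Append(char.ToLowerInvariant(c));
                hasLetter = true;
            }
            else if (char.IsDigit(c))
            {
                // The branch the issue asks for: keep digits, but only after the first letter.
                if (hasLetter) buffer.Append(c);
            }
            else if (c == ' ' || c == '-')
            {
                // Collapse runs of separators into a single '-'.
                if (hasLetter && buffer[buffer.Length - 1] != '-') buffer.Append('-');
            }
            // Every other character is dropped.
        }

        // Trim a trailing separator.
        while (buffer.Length > 0 && buffer[buffer.Length - 1] == '-') buffer.Length--;
        return buffer.ToString();
    }
}
```

With this sketch, "ab3de" keeps its digit, "3. Applications" becomes "applications", and "123" yields an empty string, matching the corresponding expectations in the tests above.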

TaskList missing CSS class

The GitHub flavor for task lists looks like this:

- [ ] test

It renders in Markdig into:

<li><input disabled="disabled" type="checkbox" /> test</li>

GitHub renders it with a class name on the <li> element so that it can be styled without a CSS list-style bullet:

<li class="task-list-item"><input disabled="disabled" type="checkbox" /> test</li>

And the <ul> gets a contains-task-list class attribute

Intercepting builtin tags?

In the readme you mention you can plug into the core parsing:

Even the core Markdown/CommonMark parsing is pluggable, so it allows to disable builtin Markdown/Commonmark parsing (e.g Disable HTML parsing) or change behaviour (e.g change matching # of a headers with @)

Is there an example of this you could share? I'm looking specifically for image and link tags (as I mentioned in your blog post) - I want to rewrite the urls.

The MarkdownSharp way of doing this is to hack the source, for example add an event handler call inside private string DoImages(string text). Given your architecture I'm guessing it's a lot less messy in Markdig.
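One possible approach, sketched under the assumption of a reasonably recent Markdig version, is to parse to the syntax tree, rewrite the `Url` of every `LinkInline` (images are `LinkInline` nodes too), and then render the modified tree. `LinkRewriteSketch`, `ToHtmlWithRewrittenUrls` and the `rewrite` callback are illustrative names, not library API.

```csharp
using System;
using System.IO;
using Markdig;
using Markdig.Renderers;
using Markdig.Syntax;
using Markdig.Syntax.Inlines;

public static class LinkRewriteSketch
{
    public static string ToHtmlWithRewrittenUrls(string markdown, Func<string, string> rewrite)
    {
        var pipeline = new MarkdownPipelineBuilder().Build();
        MarkdownDocument document = Markdown.Parse(markdown, pipeline);

        // Links and images share the LinkInline type (images have IsImage == true).
        foreach (LinkInline link in document.Descendants<LinkInline>())
        {
            if (!string.IsNullOrEmpty(link.Url))
            {
                link.Url = rewrite(link.Url);
            }
        }

        // Render the modified tree with the stock HTML renderer.
        using var writer = new StringWriter();
        var renderer = new HtmlRenderer(writer);
        pipeline.Setup(renderer);
        renderer.Render(document);
        writer.Flush();
        return writer.ToString();
    }
}
```

For example, passing `u => "https://example.com" + u` would turn a relative link such as `[a](/x)` into an absolute one without touching the parser or the renderer.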

Extension request: Automatically convert urls to hyperlinks

Forgive me but with not much documentation and a short trip through the source I didn't see this. If it is there already, feel free to close this. :)

I just found this library and it looks excellent! I'm converting over from MarkdownSharp. One of the things it offers is the ability to automatically convert urls to hyperlinks. I'd love to do the same with this library.

table formatting won't work

The pandoc pipe table example won't render as long as there is a : for right align or centered formatting in the header line.

works:

| abc | def | ghi |
|---|---|---|
| 1 | 2 | 3 |

doesn't work, doesn't render as html

| abc | def | ghi |
|:---:|---|---:|
| 1 | 2 | 3 |

version is 0.5.12

Command line version

Is there a command line version? If not would you accept a pull request for one?

Support for GitHub task list syntax

It would be great if checkboxes could be shown in lists using this syntax:

- [ ] Not implemented yet
- [x] Implemented

GitHub parses that and adds read-only checkboxes to the list items

Rendering issue of code span and header with Advanced extensions

From Markdown Editor issue 4

With advanced extensions, the following markdown is not rendered properly:

See example on babelmark

`https://{domain}/callbacks`
#### HEADING
Paragraph

outputs the invalid html:

<p>
  <code>https://{domain}/callbacks</code>
</p>
<h4 id="section" domain>
</h4>
<p>
  Paragraph
</p>

Any chance of getting syntax highlighting?

Make the MarkdownPipeline thread safe / immutable

The MarkdownPipeline is currently not threadsafe and is not immutable.
Options are to use a MarkdownPipelineBuilder and a .Build() method that would freeze the pipeline into an opaque MarkdownPipeline

What about properties on parsers? We may not make it fully immutable (e.g. we would leave OpeningCharacters mutable to simplify the design and avoid having to introduce a freeze method for parsers), but we should at least prevent parsers from being added or removed after the pipeline has been created. This still has to be determined and documented clearly.

Method naming question on OrderedList

Ref. https://github.com/lunet-io/markdig/blob/v0.5.11/src/Markdig/Helpers/OrderedList.cs#L105

Should this be ReplaceBy or possibly ReplaceWith?

Add precise source location for inner parts of link elements

A follow-up of the issue #2 for some inline elements

Link and Images. They have multiple parts that need to be precisely identified and located in the source code (Label, Url, Title...etc.).
List item: for the number/- part, we need a precise location where it stops

Currently, the parser doesn't keep this information.

Markdig does not appear to be threadsafe

While running inside a Parallel.ForEach

private readonly MarkdownPipeline _markdownPipeline = new MarkdownPipelineBuilder()
    .UseEmojiAndSmiley().UseEmphasisExtras().Build();

// `texts` stands in for the collection being processed, which the report omits
Parallel.ForEach(texts, mytext =>
    Markdown.ToHtml(mytext, _markdownPipeline));

Would throw intermittent exceptions for

The character ~ is already used by another emphasis descriptor,
The given key was not present in the dictionary.

Given the dictionary error message, it appears somewhere you're using a dictionary where either you need to use ConcurrentDictionary or you need to copy the data.

When I switched my usage from the readonly field to

using (var localPipeline = new ThreadLocal<MarkdownPipeline>(() =>
 new MarkdownPipelineBuilder().UseEmojiAndSmiley().UseEmphasisExtras().Build()))

and closing over localPipeline.Value instead of _markdownPipeline the errors went away.
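A fuller version of that workaround might look like the sketch below. It only mirrors what this report describes (one pipeline per thread via `ThreadLocal<MarkdownPipeline>`); `ParallelRenderSketch` and `RenderAll` are illustrative names, and the results are collected in a `ConcurrentBag`, so their order is not preserved.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Markdig;

public static class ParallelRenderSketch
{
    public static IReadOnlyCollection<string> RenderAll(IEnumerable<string> texts)
    {
        var results = new ConcurrentBag<string>();

        // Each worker thread lazily builds its own pipeline instead of sharing one instance.
        using (var localPipeline = new ThreadLocal<MarkdownPipeline>(() =>
            new MarkdownPipelineBuilder().UseEmojiAndSmiley().UseEmphasisExtras().Build()))
        {
            Parallel.ForEach(texts, text =>
                results.Add(Markdown.ToHtml(text, localPipeline.Value)));
        }

        return results;
    }
}
```

The trade-off is one pipeline build per worker thread rather than one overall, which avoids any shared mutable parser state at the cost of some repeated setup.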

Here's a stack trace for when you have tearing that results in "the character..." error:

   at Markdig.Parsers.Inlines.EmphasisInlineParser.Initialize(InlineProcessor processor)
   at Markdig.Parsers.ParserList`2.Initialize(TState initState)
   at Markdig.Parsers.InlineParserList.Initialize(InlineProcessor initState)
   at Markdig.Parsers.InlineProcessor..ctor(StringBuilderCache stringBuilders, MarkdownDocument document, InlineParserList parsers, Boolean preciseSourcelocation)
   at Markdig.Parsers.MarkdownParser..ctor(String text, MarkdownPipeline pipeline)
   at Markdig.Parsers.MarkdownParser.Parse(String text, MarkdownPipeline pipeline)
   at Markdig.Markdown.Parse(String markdown, MarkdownPipeline pipeline)
   at Markdig.Markdown.ToHtml(String markdown, TextWriter writer, MarkdownPipeline pipeline)
   at Markdig.Markdown.ToHtml(String markdown, MarkdownPipeline pipeline)

Includes Extension

It would be cool to be able to include content from another file before processing. There's precedent for this functionality in other Markdown implementations:

This extension for Python-Markdown uses the syntax {!filename!}.
This NPM module uses the syntax #include filename to pre-process Markdown files for inclusion.
Pandoc-Include uses code fences with the include label to indicate inclusion, though it doesn't appear to be general-purpose inclusion since it still keeps the code block.
Here's a Python preprocessor that uses !INCLUDE filename

Of these, I think I prefer the {!filename!} syntax, but don't have a strong preference. I would happily defer on the syntax if you have thoughts on this. A final requirement is that this should work recursively.

I'd also like to see the includes get processed by a swappable I/O abstraction. The default would use the local file system, but an alternate implementation could be provided to the extension. It wouldn't have to be a full file system abstraction. In the simplest form, this interface would just ask for content given a path. Perhaps this flexibility could be easily supported with a delegate that can be specified when activating the extension. Regardless, it's important for use cases like mine (Wyam) where all file system interaction is through the host.
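To make the delegate idea concrete, here is a minimal sketch of a recursive `{!filename!}` pre-processor whose I/O is supplied by the caller. Everything in it (`IncludeSketch`, `Expand`, the `readContent` delegate, the `maxDepth` guard) is hypothetical and written for illustration; it is a plain text pre-processing step, not a Markdig extension.

```csharp
using System;
using System.Text.RegularExpressions;

public static class IncludeSketch
{
    // Matches {!path!} and captures the path.
    private static readonly Regex IncludePattern = new Regex(@"\{!(.+?)!\}");

    // readContent is the swappable I/O abstraction: given a path, return its content.
    public static string Expand(string text, Func<string, string> readContent, int maxDepth = 8)
    {
        if (maxDepth < 0)
        {
            throw new InvalidOperationException("Include nesting too deep (possible cycle).");
        }

        // Replace each include with its content, itself expanded recursively.
        return IncludePattern.Replace(text, match =>
            Expand(readContent(match.Groups[1].Value.Trim()), readContent, maxDepth - 1));
    }
}
```

A host could pass `File.ReadAllText` for the local file system, or a lookup into its own virtual file system, and then hand the expanded text to the Markdown parser. The depth limit is one simple way to stop circular includes.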

I'm going to attempt an implementation and PR for this if you think it's in-scope.

Markdown remains unparsed after pre element

I've updated from 0.2.1 to 0.5.9 and some Markdown rendering has broken. Below is my configuration.

var builder = new MarkdownPipelineBuilder()
    .UseSoftlineBreakAsHardlineBreak() // broken w/ or w/o
    .Build();

Here is a gist of the input and output, https://gist.github.com/roydukkey/630f431a89655ee6bb9c35f117b707e1.

HTML Link with URL text tries to render twice when AutoLink() is on

If you have a link like this when Autolink is enabled:

<a href="http://somewhere.com" target="top">http://somewhere.com</a>

the parser produces:

http://somewhere.com</a>

Looks like AutoLink needs to ensure it's not already inside of a link or link text before expanding a URL.

bug?

so, having only just discovered this... and wanting to see if i could cause issues to the parsers...

I used this:
this is a test of various ~~markdown~~impl^ementations^~~.~~

while yours is admittedly the best (among the various results in babelmark), it only "works" when the period exists at the end (separator between the closing ~~ and ~). I say "works" because i've not actually checked the commonmark spec for how it should be handled, rather it's based on what i'd expect it to do.

I'll let you determine how my example fits into everything... just wanted to mention it. (also, have fun seeing how the other parsers handle it! yours was by far the closest)

otherwise, keep up the good work :)

Strong name?

Hello,

I was looking at using Markdig as a replacement for CommonMark in one project and MarkdownDeep in another, however Markdig isn't strongly named. Do you have any plans on strong naming the Markdig library (or perhaps offering two packages, one signed, one unsigned (the way YamlDotNet does it), which will satisfy both stubborn-as-mules camps)?

Thanks;
Richard Moss

Recommended JS front end editor for MarkDig?

Do you know of any client-side JS library or plugin that plays nicely with Markdig?

Consider passing mergeIdAndProperties as true in GenericAttributesParser

Line 67 of GenericAttributesParser.cs calls the CopyTo method of the HtmlAttributes class.
It passes false as the MergeIdAndProperties argument, which means the existing Id is overwritten, even if there is no id on the instance itself.
If the argument is changed to true, then the Id property will only be overwritten if it has been set on the instance.

The scenario that I'm facing right now, is that I have an extension to automatically create Ids for definition term elements (based on their content) but, if I want to also apply a class using the Special Attributes extension, the id gets overwritten.

InvalidCastException in SmartyPantsInlineParser

No pull request for this one, sorry ---(*) until Github figures out I'm human. 😈 Besides, I'm not quite sure I follow the reasoning in the SmartyPants parser (I know changes were made to fix the table rendering).

However, test case:

        [Test]
        public void TestInvalidCastBug()
        {
            var pipeline = new MarkdownPipelineBuilder()
                .UseSmartyPants()
                .Build();
            var md = "\"Curly\" quotations and dashes --- like that one.";
            Assert.DoesNotThrow(() => Markdown.Parse(md, pipeline));
        }

Markdig throws an InvalidCastException:
Unable to cast object of type 'System.String' to type 'System.Collections.Generic.List`1[Markdig.Extensions.SmartyPants.SmartyPant]

Seems to be due to the dashes logic writing String.Empty, overwriting the list created by the quotation marks, in ParserState[Index].

(*) Written too many m-dashes today, it seems. ;-)

Support for admonition extension

Hi!

Could you add support for the admonition extension? Admonition is simple markup that can be used to mark important parts of documentation. It looks like this:

!!! note
    You should note that the title will be automatically capitalized.

Here is a more detailed description: Admonition Extension. And here is an example of how it should look when rendered on the page:

Example from Read The Docs page .

I've tried to implement this using IMarkdownExtension (and BlockParser), however it's really hard without documentation (and BlockParser is not so obvious).

Best regards,
Marcin.

Converting to Markdown

I'm trying to develop an alternative renderer which should produce exactly the same content as the file input. I'm wondering what's the easiest way to represent inter-block whitespace. If every block kept slicing info (start, offset), it would be easier to derive whitespace info.

Could you give me some hint on what could be the best approach?

Nuget: Unable to resolve dependency 'NETStandard.Library'. targeting .net 4.6.1

Install-Package Markdig
Attempting to gather dependency information for package 'Markdig.0.1.0' with respect to project 'ConsoleApplication1', targeting '.NETFramework,Version=v4.6.1'
Attempting to resolve dependencies for package 'Markdig.0.1.0' with DependencyBehavior 'Lowest'
Install-Package : Unable to resolve dependency 'NETStandard.Library'.

Successful work around is to ignore the dependency:

Install-Package Markdig -IgnoreDependencies

[Emoji Extension] URLs are being injected with emojis

From madskristensen/MarkdownEditor#3

Description

Urls are being injected with emojis.

Steps to recreate

Add a plain url to the document such as http://bing.com

Current behavior

The url is converted to http😕/bing.com

Expected behavior

:/ shouldn't be converted to an emoji

Full table syntax not supported

This syntax isn't treated as a table:

|Milestone|Release Date|
|---------|------------|
|Beta7    | 24 Aug 2015|
|Beta8    | 21 Sep 2015|
|RC1      |    Nov 2015|
|RC2 Pre1 |    May 2016|

Using BabelMark3 I can see that some other parsers render it correctly, like this:

Pipetable left column alignment

The HtmlTableRenderer seems to ignore the left alignment property in the ColumnDefinition. The default user styles for the table header in at least Chrome will default it to center.

Can this be changed to include text-align: left in the <th> element?

HtmlAttributes.CopyTo throws Exception depending on pipeline extension usage

MediaLinkExtension can cause this exception with other extensions (i.e. BootstrapExtension):

var pipeline = new MarkdownPipelineBuilder()
    .UseMediaLinks()
    .UseBootstrap()
    .Build();

var markdown = "![](https://path/to/image)";

Markdown.ToHtml(markdown, pipeline);

The BootstrapExtension makes use of GetAttributes. The latter does not instantiate any complex properties (i.e. Classes or Properties). When the MediaLinksExtension attempts to copy attributes, the share-parameter of CopyTo set to false causes a new List<string> [Classes] or List<KeyValuePair<string, string>> [Properties] to be created and is passed a null Classes/Properties-value to its constructor (ref. https://github.com/lunet-io/markdig/blob/2c3de5688b72aa0995dfddf873df40007c1a7cd5/src/Markdig/Renderers/Html/HtmlAttributes.cs#L130).

ArgumentNullException: Value cannot be null.
 Parameter name: collection
System.Collections.Generic.List`1..ctor(IEnumerable`1 collection)
Markdig.Renderers.Html.HtmlAttributes.CopyTo(HtmlAttributes htmlAttributes, Boolean mergeIdAndProperties, Boolean shared)
Markdig.Extensions.MediaLinks.MediaLinkExtension.TryLinkInlineRenderer(HtmlRenderer renderer, LinkInline linkInline)
Markdig.Renderers.MarkdownObjectRenderer`2.Write(RendererBase renderer, MarkdownObject obj)
Markdig.Renderers.RendererBase.Write[T](T obj)
Markdig.Renderers.RendererBase.WriteChildren(ContainerInline containerInline)
Markdig.Renderers.RendererBase.Write[T](T obj)
Markdig.Renderers.TextRendererBase`1.WriteLeafInline(LeafBlock leafBlock)
Markdig.Renderers.Html.ParagraphRenderer.Write(HtmlRenderer renderer, ParagraphBlock obj)
Markdig.Renderers.RendererBase.Write[T](T obj)
Markdig.Renderers.RendererBase.WriteChildren(ContainerBlock containerBlock)
Markdig.Renderers.RendererBase.Write[T](T obj)
Markdig.Renderers.TextRendererBase.Render(MarkdownObject markdownObject)
Markdig.Markdown.ToHtml(String markdown, TextWriter writer, MarkdownPipeline pipeline)
Markdig.Markdown.ToHtml(String markdown, MarkdownPipeline pipeline)

Extension for mermaid syntax

I would like to propose an extension to support mermaid diagram syntax.
https://github.com/knsv/mermaid

I would also love to try doing a PR for it. But I think it would require a little guidance.

My initial thoughts could be something like:
{mermaid}(
graph TD;
A-->B;
A-->C;
B-->D;
C-->D;
)
Produces:

graph TD;
A-->B;
A-->C;
B-->D;
C-->D;

And then of course the mermaid JS needs to be included in the HTML file.

Setting for not wrapping the parsed HTML in <p> tags

Hi @xoofx ,
Is there a setting or option that I can use so that when I'm doing:
Markdown.ToHtml("This is a text with some *emphasis*"),
I get the parsed output as This is a text with some <em>emphasis</em>
instead of <p>This is a text with some <em>emphasis</em></p>?

Add precise source location for inline elements

The current implementation tracks accurate source positions only for block-level elements (paragraphs, blockquotes, etc.), not for inline elements (links, emphasis, etc.).

To optimize execution, inline parsing is somewhat destructive (it can remove spaces, for example), but an IDE that wanted to use Markdig for precise syntax highlighting would need the parser to be conservative.

Care would have to be taken to make this optional (e.g. PreciseSourceLocation).

Work in progress in the precise_location branch

Something is wrong with table formatting

With the following Pandoc pipe table example, which has only two columns, the parsed result is not what I expected:

| abc | def | 
|---|:---|
| cde| ddd| 
| eee| fff|
| fff | fffff   | 
|gggg  | ffff |

test on babelmark

var pipeline = new MarkdownPipelineBuilder().UseAdvancedExtensions().Build();
var result = Markdown.ToHtml(markdownDocument, pipeline);

version 0.7.4

Try to simplify BlockProcessor main loop/Improve API interaction for BlockParsers

While the top level main loop for parsing blocks in BlockProcessor is quite simple, the underlying methods TryContinueBlocks and TryOpenBlocks are not simple.

This is partly due to the complexity of lazy continuation for Paragraph (see Parsing strategy for blocks in the CommonMark specs).
This is also partly due to the different methods used by the different BlockParsers to create/modify blocks.

Check if we could simplify the code here and the API for block parsers.

Successive Footnotes fail parsing

Well, "fail" is relative here - since footnote syntax doesn't have any exact specification.

However, I'd expect this test to succeed:

[Test]
public void TestSuccessiveFootnotes()
{
    var md = @"Here is a footnote[^1]. And another one[^2]. And a third one[^3]. And a fourth[^4].

[^1]: Footnote 1 text

[^2]: Footnote 2 text

[^3]: Footnote 3 text

[^4]: Footnote 4 text
";
    var document = Markdown.Parse(md, new MarkdownPipelineBuilder().UseFootnotes().Build());

    var group = document.OfType<FootnoteGroup>().Single();
    Assert.AreEqual(4, group.Count); // Fails - group.Count is 2.
}

To be more specific, when using ToHtml(), footnotes 1 and 3 are rendered as footnotes, while 2 and 4 are rendered as paragraphs.

The reason seems to be that when entering FootnoteParser.TryOpen on the second footnote, this will cause a return:

if (processor.IsCodeIndent || processor.CurrentContainer.GetType() != typeof(MarkdownDocument) )
{
    return BlockState.None;
}

... due to CurrentContainer still being the previous footnote. I'm not confident enough with the parser to see where the correct place would be for popping back out to the document container.

From some other parsers, it seems it's even feasible to break out without an empty line (according to babelmark: php-markdown-extra, pandoc, multimarkdown, kramdown, minima, maruku; and in Javascript e.g. remarkable/markdown-it). I.e. this would render the same:

Here is a footnote[^1]. And another one[^2]. And a third one[^3]. And a fourth[^4].

[^1]: Footnote 1 text
[^2]: Footnote 2 text
[^3]: Footnote 3 text
[^4]: Footnote 4 text

... and that always seemed ideal to me for most uses of footnotes - but it's always rather complex to tell in Markdown syntax, what conflicts this might cause with other parsers.

YAML front matter

Enhancement request

Add support for YAML front matter.

The immediate need is for markdig to recognize and ignore any front matter. Don't try to parse it as Markdown text. Perhaps in the future markdig may find some use for the data.

If you can provide some hints about where this would need to happen in the markdig code, I might be able to add it myself and submit a PR.

(For what it's worth, I'm no fan of YAML, but it seems to be the most common way of adding metadata to Markdown files.)
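
Until front matter is recognized by the parser, one workaround is to strip it before handing the text to Markdig. The sketch below is a hypothetical pre-processing helper, not part of Markdig. It assumes the front matter starts with --- on the very first line and ends at the next line containing only --- or ...

```csharp
using System;
using System.IO;

public static class FrontMatter
{
    // Hypothetical helper: returns the Markdown body with a leading YAML
    // front matter block removed. Input without front matter is returned unchanged.
    public static string Strip(string markdown)
    {
        using (var reader = new StringReader(markdown))
        {
            // Front matter must open on the very first line.
            var first = reader.ReadLine();
            if (first == null || first.TrimEnd() != "---")
            {
                return markdown;
            }

            string line;
            while ((line = reader.ReadLine()) != null)
            {
                var trimmed = line.TrimEnd();
                // Assumed closing markers: "---" or "...".
                if (trimmed == "---" || trimmed == "...")
                {
                    // Everything after the closing marker is the Markdown body.
                    return reader.ReadToEnd();
                }
            }
        }

        // No closing marker found: treat the whole input as Markdown.
        return markdown;
    }
}
```

The result can then be passed to Markdown.ToHtml as usual. A document that legitimately opens with a --- thematic break followed by a later --- line would be misread by this sketch, which is one reason proper parser support is preferable.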

DefinitionList Extension does not allow custom attributes on description elements

For our project, I'm writing an extension which sets custom attributes on some Markdown Elements, which should be rendered in the resulting HTML (e.g. for styling).

This works fine for almost all elements (paragraphs, lists, etc.), but fails for the definition list extension, because it always writes an empty <dd> element without honoring any attributes set on the description paragraph.

There is also currently no good workaround for this issue. I would need to copy the complete extension just to change this single part.

Best regards,
Thomas

HtmlBlockType.Comment has incorrect spans

When HtmlBlock.Type == HtmlBlockType.Comment, the span ends at the very end of the MarkdownDocument instead of the comment's closing tag.

Example:

<!-- this is great --> 
**hello**

The span runs from the opening <!-- to the end of the document.

MarkdownDocument.FindClosestLine throws ArgumentOutOfRangeException sometimes

I assume it's when the line number passed into the method exceeds the line count of the MarkdownDocument. There is no way to determine the line count of the document in the API though. In MarkdownEditor I've put a try/catch block around it for now.

Here's the callstack:

System.ArgumentOutOfRangeException: Specified argument was out of the range of valid values.
Parameter name: index
   at Markdig.Syntax.ContainerBlock.get_Item(Int32 index)
   at Markdig.Syntax.BlockExtensions.FindClosestBlock(Block rootBlock, Int32 line)
   at Markdig.Syntax.BlockExtensions.FindClosestBlock(Block rootBlock, Int32 line)
   at Markdig.Syntax.BlockExtensions.FindClosestLine(MarkdownDocument root, Int32 line)
   at MarkdownEditor.Browser.UpdatePosition(Int32 line) in C:\Users\username\Documents\GitHub\MarkdownEditor\src\Margin\Browser.cs:line 121

It may be when the line contains pipe tables, but I'm not sure it's related.
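
The try/catch workaround mentioned above could look roughly like the following sketch. The wrapper class and the fallback parameter are hypothetical. The sketch assumes FindClosestLine is an extension method on MarkdownDocument in the Markdig.Syntax namespace that returns an int, as the call stack suggests.

```csharp
using System;
using Markdig.Syntax;

public static class SafeLineLookup
{
    // Hypothetical wrapper: returns the closest line, or the given fallback
    // when FindClosestLine throws for a line it cannot resolve.
    public static int FindClosestLineOrDefault(MarkdownDocument document, int line, int fallback)
    {
        try
        {
            return document.FindClosestLine(line);
        }
        catch (ArgumentOutOfRangeException)
        {
            return fallback;
        }
    }
}
```

This only hides the symptom in the caller. The underlying bounds check would still need to be fixed in Markdig itself.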

DisableHtml() leaves orphaned > lying around

Input:

this is some text</td></tr>

Output:

this is some text&gt;&gt;

Expected output:

this is some text&lt;/td&gt;&lt;/tr&gt;

Having to pass a pipeline into ToHtml is weird

A simple fix is to provide an extension method on top of the pipeline:

var html = _mycustomPipeline.ToHtml(markdown);

    public static class MarkdigExtensions
    {
        public static string ToHtml(this MarkdownPipeline pipeline, string markdown)
        {
            var html = Markdown.ToHtml(markdown, pipeline);
            return html;
        }
    }

An alternative solution: Build() could return a type other than MarkdownPipeline, one whose instances expose ToHtml() directly.

Empty LiteralInline

The following markdown generates an empty LiteralInline before the line break (note the trailing spaces at the end of the first line, which force a break):

> *some text* 
> some other text

whereas this doesn't:

> some text 
> some other text

As a workaround, in my renderer, I can check if the content of the literal is empty and ignore it in this case. But IMHO it should be the job of the parser to get rid of it in the first place.

if (obj.Content.IsEmpty)
    return;

UrlSpan not assigned for image references

The LinkInline object has UrlSpan == null for image references like this:

[image]: images/image.png

/cc madskristensen/MarkdownEditor@2b3c53e

Requirements to build markdig?

Hi.

I cloned markdig locally and tried to open it with Visual Studio. I'm getting this error and the main project doesn't load:

C:\Projects\markdig\src\Markdig\Markdig.xproj : error : The imported project "C:\Program Files (x86)\MSBuild\Microsoft\VisualStudio\v14.0\DotNet\Microsoft.DotNet.Props" was not found. Confirm that the path in the declaration is correct, and that the file exists on disk. C:\Projects\markdig\src\Markdig\Markdig.xproj

What do I need to do to fix this issue?
Could you list the dependencies needed to build markdig in the README file?

Note: I'm using VS 2015 Community Edition

Multiple calls to Markdown.ToHtml with same pipeline parameter throws an exception

When the Render method of the following class is called more than once, Markdown.ToHtml throws an exception (shown at the bottom). It seems to be related to the _pipeline argument. If I remove it from the arguments, multiple calls work as expected.

Class:

public class MarkdigRenderer : IMarkdownRenderer
{
    private readonly MarkdownPipeline _pipeline;

    public MarkdigRenderer()
    {
        _pipeline = new MarkdownPipeline().UseAllExtensions();
        _pipeline.DebugLog = new StringWriter();
    }

    public string Render(string markdown)
    {
        var html = Markdown.ToHtml(markdown, _pipeline);

        return html;
    }
}

Exception:

System.InvalidOperationException: The character `:` is already used by another emphasis descriptor
   at Markdig.Parsers.Inlines.EmphasisInlineParser.Initialize(InlineProcessor processor)
   at Markdig.Parsers.ParserList`2.Initialize(TState initState)
   at Markdig.Parsers.InlineParserList.Initialize(InlineProcessor initState)
   at Markdig.Parsers.InlineProcessor..ctor(StringBuilderCache stringBuilders, MarkdownDocument document, InlineParserList parsers)
   at Markdig.Parsers.MarkdownParser..ctor(TextReader reader, MarkdownPipeline pipeline)
   at Markdig.Parsers.MarkdownParser.Parse(TextReader reader, MarkdownPipeline pipeline)
   at Markdig.Markdown.ToHtml(TextReader reader, TextWriter writer, MarkdownPipeline pipeline)
   at Markdig.Markdown.ToHtml(TextReader reader, MarkdownPipeline pipeline)
   at Markdig.Markdown.ToHtml(String markdown, MarkdownPipeline pipeline)
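
For comparison, here is a minimal sketch of the same renderer written against the builder-style API used in the table issue above (MarkdownPipelineBuilder with UseAdvancedExtensions and Build). It assumes that a pipeline returned by Build() can be reused across calls. The class name is hypothetical, and the sketch does not reproduce the DebugLog setup from the report.

```csharp
using Markdig;

public class ReusablePipelineRenderer
{
    // Built once. Assumption: a pipeline produced by Build() is safe to
    // share between successive Markdown.ToHtml calls.
    private static readonly MarkdownPipeline Pipeline =
        new MarkdownPipelineBuilder().UseAdvancedExtensions().Build();

    public string Render(string markdown)
    {
        return Markdown.ToHtml(markdown, Pipeline);
    }
}
```

Calling Render repeatedly on one instance is exactly the scenario that fails in the report, so it makes a natural regression check.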