mmark's Introduction

MMark

MMark (read “em-mark”) is a strict markdown processor for writers. “Strict” means that not every input is considered a valid markdown document. Parse errors are possible and even desirable, because they let us spot markup issues without searching for them in the rendered document. If a markdown document passes the MMark parser, it is likely to produce HTML output without quirks. This makes MMark a good choice for writers and bloggers.

MMark in its current state features:

  • A parser that produces high-quality error messages and does not choke on the first parse error. It is capable of reporting several parse errors simultaneously.

  • An extension system that allows us to create extensions that alter a parsed markdown document in some way.

  • A lucid-based renderer.

There is also a blog post announcing the project:

https://markkarpov.com/post/announcing-mmark.html

Quick start: MMark vs GitHub-flavored markdown

It's easy to start using MMark if you're used to GitHub-flavored markdown. There are four main differences:

  1. URIs are not automatically recognized; you must enclose them in < and >.

  2. Block quotes require only one > and they continue as long as the inner content is indented.

    This is OK:

    > Here goes my block quote.
      And this is the second line of the quote.
    

    This produces two block quotes:

    > Here goes my block quote.
    > And this is another block quote!
    
  3. HTML blocks and inline HTML are not supported.

  4. See differences in inline parsing.

MMark and Common Mark

MMark mostly tries to follow the Common Mark specification as given here:

https://spec.commonmark.org/0.28/

However, due to the fact that we do not allow inputs that do not make sense, and also try to guard against common mistakes (like writing ##My header and having it rendered as a paragraph starting with hashes) MMark obviously can't follow the specification precisely. In particular, parsing of inlines differs considerably from Common Mark (see below).

Another difference between Common Mark and MMark is that the latter supports more (pun alert) common markdown extensions out-of-the-box. In particular, MMark supports:

  • parsing of an optional YAML block
  • strikeout using ~~this~~ syntax
  • superscript using ^this^ syntax
  • subscript using ~this~ syntax
  • automatic assignment of ids to headers
  • pipe tables (as on GitHub)

One does not need to enable or tweak anything for these to work; they are built-in features.

Differences in inline parsing

Emphasis and strong emphasis are an especially hairy topic in the Common Mark specification. There are 17 ad-hoc rules defining the interaction between * and _ -based emphasis, and more than half of all Common Mark examples (that's about 300) test just this.

Not only is it hard to implement, it is hard for humans to understand too. For example, this input:

*(*foo*)*

results in the following HTML:

<p><em>(<em>foo</em>)</em></p>

(Note the nested emphasis.)

Could it produce something like this instead?

<p><em>(</em>foo<em>)</em></p>

Well, why not? Without remembering those 17 ad-hoc rules, there are going to be a lot of tricky cases where the user won't be able to tell how markdown will be parsed.

I decided to make parsing of emphasis, strong emphasis, and similar constructs like strikethrough, subscript, and superscript more symmetric and less ad-hoc. In 99% of practical cases it is identical to Common Mark, and normal markdown intuitions will work OK for the users.

Let's start by dividing all characters into four groups:

  • Space characters, including space, tab, newline, carriage return, and other characters like non-breaking space.

  • Markup characters, including the following: *, ~, _, `, ^, [, ]. These are used for markup and whenever they appear in a document, they must form valid markup constructions. To be used as ordinary punctuation characters they must be backslash escaped.

  • Punctuation characters, which include all punctuation characters that are not markup characters.

  • Other characters, which include all characters not falling into the three groups described above.

Next, let's assign levels to all groups but markup characters:

  • Space characters—level 0
  • Punctuation characters—level 1
  • Other characters—level 2

When markup characters or punctuation characters are escaped with backslash they become other characters.

We'll call a run of markup characters placed between a character L and a character R a left-flanking delimiter run if and only if:

level(L) < level(R)

These markup characters sort of hang on the left hand side of a word.

Similarly, we'll call a run of markup characters placed between a character L and a character R a right-flanking delimiter run if and only if:

level(L) > level(R)

These markup characters hang on the right hand side of a word.

Emphasis markup (and other similar things like strikethrough, which we won't mention explicitly anymore for brevity) can start only as a left-flanking delimiter run and end only as a right-flanking delimiter run.

This produces a parse error:

*Something * is not right.
Something __is __ not right.

And this too:

__foo__bar

This means that intra-word emphasis is not supported.

The next example is OK because s is an other character and . is a punctuation character, so level('s') > level('.').

Here it *goes*.

In some rare cases backslash escaping can help get the right result:

Here goes *(something\)*.

We escaped the closing parenthesis ) so that it becomes an other character with level 2, which is greater than the level of the plain punctuation character . that follows the closing asterisk.
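The level rules above can be modelled in a few lines. The following Python sketch is only an illustration of the rules as described in this section, not MMark's actual parser (which is written in Haskell with Megaparsec). The function names are hypothetical, and three simplifications are assumptions of the sketch: the start and end of input are treated as level 0, punctuation means ASCII punctuation plus Unicode "P" categories, and any unbroken sequence of markup characters is treated as a single run.

```python
import string
import unicodedata

MARKUP = set("*~_`^[]")  # markup characters listed in this section


def is_punct(ch):
    # Assumption: ASCII punctuation plus Unicode punctuation categories.
    return ch in string.punctuation or unicodedata.category(ch).startswith("P")


def tokens(text):
    """Return [(char, level)]; level is None for unescaped markup characters."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text) and is_punct(text[i + 1]):
            out.append((text[i + 1], 2))  # escaped: becomes an "other" character
            i += 2
            continue
        if ch in MARKUP:
            out.append((ch, None))
        elif ch.isspace():
            out.append((ch, 0))
        elif is_punct(ch):
            out.append((ch, 1))
        else:
            out.append((ch, 2))
        i += 1
    return out


def delimiter_runs(text):
    """Return [(run, kind)] where kind is "left", "right" or "neither"."""
    toks = tokens(text)
    result = []
    i = 0
    while i < len(toks):
        if toks[i][1] is not None:
            i += 1
            continue
        j = i
        while j < len(toks) and toks[j][1] is None:
            j += 1
        # Assumption: start and end of input count as level 0 (like a space).
        left = toks[i - 1][1] if i > 0 else 0
        right = toks[j][1] if j < len(toks) else 0
        if left < right:
            kind = "left"
        elif left > right:
            kind = "right"
        else:
            kind = "neither"
        result.append(("".join(c for c, _ in toks[i:j]), kind))
        i = j
    return result


for sample in ["Here it *goes*.", "__foo__bar", r"Here goes *(something\)*."]:
    print(sample, delimiter_runs(sample))
```

In this model, a run classified as "neither" can neither open nor close emphasis, which is why the second delimiter in __foo__bar leads to a parse error.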

Other differences

Block-level parsing:

  • If a line starts with hash signs it is expected to be a valid non-empty header (level 1–6 inclusive). If you want to start a paragraph with hashes, just escape the first hash with backslash and that will be enough.
  • Setext headings are not supported for the sake of simplicity.
  • Fenced code blocks must be explicitly closed by a closing fence. They are not closed by the end of document or by start of another block.
  • Lists and block quotes are defined by column at which their content starts. Content belonging to a particular list or block quote should start at the same column (or greater column, up to the column where indented code blocks start). As a consequence of this, block quotes do not feature “laziness”.
  • Block quotes are started by a single > character; it's not necessary to put a > character at the beginning of every line belonging to a quote (in fact, this would make every line a separate block quote).
  • Paragraphs can be interrupted by unordered and ordered lists with any valid starting index.
  • HTML blocks are not supported because the syntax conflicts with autolinks and the feature is a hack to compensate for the lack of extensibility and customization in the original markdown.
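As an illustration of the first rule in the list above (a line that starts with hash signs must be a valid, non-empty header of level 1 to 6), here is a small Python sketch. It is not MMark's implementation. The function name is hypothetical, and the sketch assumes that the hashes start at column 0 and must be followed by at least one space or tab, which matches the ##My header example mentioned earlier but is otherwise an assumption.

```python
import re

# One to six hashes, then whitespace, then a non-empty title.
HEADER = re.compile(r"^(#{1,6})[ \t]+(\S.*)$")


def check_hash_line(line):
    """Return (level, title) for a valid header and None for a non-header line.

    Raises ValueError for a line that starts with hashes but is not a valid
    header, mirroring the idea that such input is a parse error.
    """
    if not line.startswith("#"):
        return None
    match = HEADER.match(line)
    if match is None:
        raise ValueError(f"invalid header: {line!r}")
    return len(match.group(1)), match.group(2).strip()


print(check_hash_line("## My header"))
print(check_hash_line("\\# a paragraph that starts with a hash"))
```

A line such as ##My header raises an error in this sketch instead of silently becoming a paragraph, while escaping the first hash with a backslash turns the line into ordinary paragraph text.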

Inline-level parsing:

  • MMark does not support hard line breaks represented as double space before newline. Nevertheless, hard line breaks in the form of backslash before newline are supported (these are more explicit too).
  • All URI references (in links, images, autolinks, etc.) are parsed as per RFC 3986. No support is provided for escaping or for entity and numeric character references. In addition, when a URI reference is not enclosed in < and >, the closing parenthesis character ) is not considered part of the URI (use <uri> syntax if you want a closing parenthesis as part of a URI). Since the empty string is a valid URI and it may be confusing in some cases, we also force the user to write <> to represent the empty URI.
  • Putting links in the text of another link is not allowed, i.e. nested links are not possible.
  • Putting images in the description of other images is not allowed (similarly to the situation with links).
  • HTML inlines are not supported for the same reason that HTML blocks are not supported.

About MMark-specific extensions

  • A YAML block must start with three hyphens --- and end with three hyphens ---. It can only be placed at the beginning of a markdown document. Trailing white space after the --- sequences is allowed.
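The YAML block rule can be sketched as a small pre-processing step. The Python function below is a hypothetical illustration, not MMark's code: it only separates the block from the markdown body and does not parse the YAML. Raising an error for an unclosed block is a choice made by this sketch.

```python
def split_yaml_block(doc):
    """Return (yaml_text, body); yaml_text is None when there is no YAML block."""
    lines = doc.split("\n")
    # The block is recognized only on the very first line of the document;
    # trailing white space after the hyphens is allowed.
    if lines[0].rstrip() != "---":
        return None, doc
    for i in range(1, len(lines)):
        if lines[i].rstrip() == "---":
            return "\n".join(lines[1:i]), "\n".join(lines[i + 1:])
    # Assumption of this sketch: an unclosed block is reported as an error.
    raise ValueError("YAML block is opened by --- but never closed")


yaml_text, body = split_yaml_block("---\ntitle: Hello\n---  \n# Body\n")
print(repr(yaml_text), repr(body))
```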

Performance

I have compared speed and memory consumption of various Haskell markdown libraries by running them on an identical, big-enough markdown document and by rendering it as HTML:

Library           | Parsing library     | Execution time | Allocated   | Max residency
------------------|---------------------|----------------|-------------|--------------
cmark-0.5.6       | Custom C code       | 323.4 μs       | 228,440     | 9,608
mmark-0.0.5.1     | Megaparsec          | 7.027 ms       | 26,180,272  | 37,792
cheapskate-0.1.1  | Custom Haskell code | 10.76 ms       | 44,686,272  | 799,200
markdown-0.1.16 † | Attoparsec          | 14.13 ms       | 69,261,816  | 699,656
pandoc-2.0.5      | Parsec              | 37.90 ms       | 141,868,840 | 1,471,080

Results are ordered from fastest to slowest.

† The markdown library is sloppy and parses markdown incorrectly. For example, it parses the following *My * text as an inline containing emphasis, while in reality both asterisks must form flanking delimiter runs to create emphasis, like so *My* text. This allowed markdown to get away with a far simpler approach to parsing at the price that it's not really a valid markdown implementation.
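To make the timings easier to compare, the short Python snippet below expresses each execution time from the table relative to cmark. It only re-computes ratios from the numbers reported above and adds no new measurements.

```python
# Execution times from the table above, converted to microseconds.
times_us = {
    "cmark-0.5.6": 323.4,
    "mmark-0.0.5.1": 7.027e3,
    "cheapskate-0.1.1": 10.76e3,
    "markdown-0.1.16": 14.13e3,
    "pandoc-2.0.5": 37.90e3,
}

baseline = times_us["cmark-0.5.6"]
relative = {lib: t / baseline for lib, t in times_us.items()}

for lib, ratio in relative.items():
    print(f"{lib:18} {ratio:6.1f}x")
```

By these numbers mmark is roughly 22 times slower than the C implementation, while being the fastest of the Haskell libraries in the comparison.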

Related packages

  • mmark-ext contains some commonly useful MMark extensions.
  • mmark-cli is a command line interface to MMark.
  • flycheck-mmark is a way to check markdown documents against MMark parser interactively from Emacs.

Contribution

Issues, bugs, and questions may be reported in the GitHub issue tracker for this project.

Pull requests are also welcome.

License

Copyright © 2017–present Mark Karpov

Distributed under the BSD 3-clause license.

mmark's People

Contributors

benkolera, bens, dependabot[bot], mrkkrp, sjakobi

mmark's Issues

MMark is confused by pipes in code spans inside tables

For example, this table is currently considered invalid, while it's perfectly fine:

Thing produced | Quotation syntax | Type
---------------|------------------|----------
Declaration    | `[d| … |]`       | `Q [Dec]`
Expression     | `[e| … |]`       | `Q Exp`
Type           | `[t| … |]`       | `Q Type`
Pattern        | `[p| … |]`       | `Q Pat`

GHCJS support

Would you be open to a PR to support GHCJS?

The yaml dependency doesn't compile on GHCJS.

test/Data/Yaml/IncludeSpec.hs:7:1: warning: [-Wdeprecations]
    Module ‘Data.Yaml’:
      GHCJS is not supported yet (will break at runtime once called).
  |
7 | import           Data.Yaml (ParseException(InvalidYaml))
  | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2 of 4] Compiling Data.Yaml.THSpec ( test/Data/Yaml/THSpec.hs, dist/build/spec/spec-tmp/Data/Yaml/THSpec.js_o )
Linking Template Haskell ()
Linking Template Haskell (Data.Yaml.IncludeSpec,ThRunner1)
uncaught exception in Haskell main thread: ReferenceError: h$yaml_parser_initialize is not defined
ReferenceError: h$yaml_parser_initialize is not defined
 

Compilation breaks on: # Foo\n\n ### Bar \n\n

Replication steps (mmark 2.0.40, xml2rfc 2.22.0):

$ mmark index.md > index.xml
$ xml2rfc index.xml -o index.html
Error: Unable to parse the XML document: index.xml
 <string>: Line 24: Opening and ending tag mismatch: middle line 17 and section
 <string>: Line 24: Opening and ending tag mismatch: middle line 17 and section
 <string>: Line 28: Opening and ending tag mismatch: rfc line 3 and section
 <string>: Line 30: Extra content at the end of the document

More info:
https://mailarchive.ietf.org/arch/msg/xml2rfc/OPep-8cpM-NZFqQZzX_whWpU9iw

Compile fail in Circle CI 2.0 (out of memory error)

I'm using Circle CI (with stack). The build works fine locally, but I get the following error message on Circle CI:

--  While building custom Setup.hs for package mmark-0.0.5.6 using:
      /root/.stack/setup-exe-cache/x86_64-linux-nopie/Cabal-simple_mPHDZzAJ_2.2.0.1_ghc-8.4.3 --builddir=.stack-work/dist/x86_64-linux-nopie/Cabal-2.2.0.1 build lib:mmark --ghc-options " -ddump-hi -ddump-to-file -fdiagnostics-color=always"
    Process exited with code: ExitFailure (-9) (THIS MAY INDICATE OUT OF MEMORY)
    Logs have been written to: /root/project/.stack-work/logs/mmark-0.0.5.6.log

    Configuring mmark-0.0.5.6...
    Preprocessing library for mmark-0.0.5.6..
    Building library for mmark-0.0.5.6..
    [1 of 9] Compiling Text.MMark.Parser.Internal.Type ( Text/MMark/Parser/Internal/Type.hs, .stack-work/dist/x86_64-linux-nopie/Cabal-2.2.0.1/build/Text/MMark/Parser/Internal/Type.o )
    [2 of 9] Compiling Text.MMark.Parser.Internal ( Text/MMark/Parser/Internal.hs, .stack-work/dist/x86_64-linux-nopie/Cabal-2.2.0.1/build/Text/MMark/Parser/Internal.o )
    [3 of 9] Compiling Text.MMark.Type  ( Text/MMark/Type.hs, .stack-work/dist/x86_64-linux-nopie/Cabal-2.2.0.1/build/Text/MMark/Type.o )
    [4 of 9] Compiling Text.MMark.Trans ( Text/MMark/Trans.hs, .stack-work/dist/x86_64-linux-nopie/Cabal-2.2.0.1/build/Text/MMark/Trans.o )
    [5 of 9] Compiling Text.MMark.Util  ( Text/MMark/Util.hs, .stack-work/dist/x86_64-linux-nopie/Cabal-2.2.0.1/build/Text/MMark/Util.o )
    [6 of 9] Compiling Text.MMark.Render ( Text/MMark/Render.hs, .stack-work/dist/x86_64-linux-nopie/Cabal-2.2.0.1/build/Text/MMark/Render.o )
    [7 of 9] Compiling Text.MMark.Parser ( Text/MMark/Parser.hs, .stack-work/dist/x86_64-linux-nopie/Cabal-2.2.0.1/build/Text/MMark/Parser.o )
Exited with code 1

Reproduce this repo.

Expose source position to scanners and extensions

Would it be possible to expose the source position for scanners and extension building functions? I have two use cases that would greatly benefit from this:

  1. Many extensions can fail on bad markup. Currently, when this happens it can be really hard to find out where it happened. For example, I would like to typecheck my code examples.

  2. Wiki-like editing would benefit from this, as it would allow the generated HTML to point at the original source.

I did notice that this was mentioned in the documentation. You can count this issue as a vote for adding the source locations + an offer to do some of the work.

Terminating emphasis

I've run into a few corner cases and I'm trying to understand if I could hack your spec to make them supported. The one that really jumped out at me was that emphasis couldn't be terminated mid word. Is there any way to make this fragment supported? Example:

So in this case just plonking it into _/var/lib/pgsql_ ought to work?
Otherwise pick somewhere \[tempting but not _/tmp_\] to put it.
                                                 ^

was rejected by your mmark CLI tool. (fragment is from a conversation on Gitter which did indeed render according to my intent). In this case, the markup significant punctuation character ], as escaped, shouldn't have been part of the word, and so the emphasis span should have been valid?

I understand and support your idea of a strictly validating Markdown subset. Just trying to see if I can get to the point where I can live with it :)

AfC

Paragraphs within blockquotes are not parsed

> This is first paragraph
  
  This is second.

The above should normally be rendered as two paragraphs within a single blockquote, but mmark renders them joined together as one paragraph in the blockquote.

Support inline autolinks

Add support for inline autolinks using this syntax:

<http://foo.bar.baz>

Section 6.7 of Common Mark spec.

testsuite failing for Stackage Nightly

Test suite failure for package mmark-0.0.7.3:
    Failures:                                                                                                                                                                           
                                                                                                                                                                                        
      tests/Text/MMark/TestUtils.hs:111:37:                                                                                                                                             
      1) parse and render, 6.5 Links, CM488                                                                                                                                             
           expected: "<p><a href=\"/uri\"><img alt=\"moon\" src=\"moon.jpg\"></a></p>\n"                                                                                                
            but got: "<p><a href=\"/uri\"><img src=\"moon.jpg\" alt=\"moon\"></a></p>\n"                                                                                                

      To rerun use: --match "/parse and render/6.5 Links/CM488/"                                                                                                                        
                                                                                                                               
      tests/Text/MMark/TestUtils.hs:111:37:                                                                                    
      2) parse and render, 6.5 Links, CM502                                                                                    
           expected: "<p><a href=\"/uri\"><img alt=\"moon\" src=\"moon.jpg\"></a></p>\n"                                                                                                
            but got: "<p><a href=\"/uri\"><img src=\"moon.jpg\" alt=\"moon\"></a></p>\n"                                                                                                
                                                                                                                               
      To rerun use: --match "/parse and render/6.5 Links/CM502/"                                                                                                                        
                                                                                                                                                                                        
      tests/Text/MMark/TestUtils.hs:111:37:                                                                                                                                             
      3) parse and render, 6.6 Images, CM543                                                                                   
           expected: "<p><img alt=\"foo\" title=\"title\" src=\"/url\"></p>\n"
            but got: "<p><img src=\"/url\" title=\"title\" alt=\"foo\"></p>\n"

      To rerun use: --match "/parse and render/6.6 Images/CM543/"
                                                                                               
      tests/Text/MMark/TestUtils.hs:111:37:                                                 
      4) parse and render, 6.6 Images, CM544                                                   
           expected: "<p><img alt=\"foo bar\" title=\"train &amp; tracks\" src=\"train.jpg\"></p>\n"                                                                                    
            but got: "<p><img src=\"train.jpg\" title=\"train &amp; tracks\" alt=\"foo bar\"></p>\n"                                                                                    
                                                                                                                                                                                        
      To rerun use: --match "/parse and render/6.6 Images/CM544/"                                                                                                                       
                                                                                            
      tests/Text/MMark/TestUtils.hs:111:37:                                                 
      5) parse and render, 6.6 Images, CM546
           expected: "<p><img alt=\"foo bar\" src=\"/url2\"></p>\n"
            but got: "<p><img src=\"/url2\" alt=\"foo bar\"></p>\n"

      To rerun use: --match "/parse and render/6.6 Images/CM546/"
                                              
      tests/Text/MMark/TestUtils.hs:111:37:                                                 
      6) parse and render, 6.6 Images, CM548                                                
           expected: "<p><img alt=\"foo bar\" title=\"train &amp; tracks\" src=\"train.jpg\"></p>\n"                                                                                    
            but got: "<p><img src=\"train.jpg\" title=\"train &amp; tracks\" alt=\"foo bar\"></p>\n"                                                                                    
                                              
      To rerun use: --match "/parse and render/6.6 Images/CM548/"                                                                                                                       
                                              
      tests/Text/MMark/TestUtils.hs:111:37:                                                 
      7) parse and render, 6.6 Images, CM549                                                
           expected: "<p><img alt=\"foo\" src=\"train.jpg\"></p>\n"                                                                                                                     
            but got: "<p><img src=\"train.jpg\" alt=\"foo\"></p>\n"                                                                                                                     
                                              
      To rerun use: --match "/parse and render/6.6 Images/CM549/"                                                                                                                       
                                              
      tests/Text/MMark/TestUtils.hs:111:37:
      8) parse and render, 6.6 Images, CM550                                                
           expected: "<p>My <img alt=\"foo bar\" title=\"title\" src=\"/path/to/train.jpg\"></p>\n"                                                                                     
            but got: "<p>My <img src=\"/path/to/train.jpg\" title=\"title\" alt=\"foo bar\"></p>\n"                                                                                     
                                              
      To rerun use: --match "/parse and render/6.6 Images/CM550/"                                                                                                                       
                                              
      tests/Text/MMark/TestUtils.hs:111:37:                                                 
      9) parse and render, 6.6 Images, CM551                                                
           expected: "<p><img alt=\"foo\" src=\"url\"></p>\n"                               
            but got: "<p><img src=\"url\" alt=\"foo\"></p>\n"                               
                                              
      To rerun use: --match "/parse and render/6.6 Images/CM551/"                                                                                                                       
                                              
      tests/Text/MMark/TestUtils.hs:111:37:                                                 
      10) parse and render, 6.6 Images, CM552                                               
           expected: "<p><img alt src=\"/url\"></p>\n"                                      
            but got: "<p><img src=\"/url\" alt></p>\n"                                      
                                              
      To rerun use: --match "/parse and render/6.6 Images/CM552/"                                                                                                                       
                                              
      tests/Text/MMark/TestUtils.hs:111:37:                                                 
      11) parse and render, 6.6 Images, CM553                                               
           expected: "<p><img alt=\"foo\" src=\"/url\"></p>\n"                              
            but got: "<p><img src=\"/url\" alt=\"foo\"></p>\n"                              
                                              
      To rerun use: --match "/parse and render/6.6 Images/CM553/"                                                                                                                       
                                              
      tests/Text/MMark/TestUtils.hs:111:37:                                                 
      12) parse and render, 6.6 Images, CM554                                               
           expected: "<p><img alt=\"foo\" src=\"/url\"></p>\n"                              
            but got: "<p><img src=\"/url\" alt=\"foo\"></p>\n"                              
                                              
      To rerun use: --match "/parse and render/6.6 Images/CM554/"                                                                                                                       
                                              
      tests/Text/MMark/TestUtils.hs:111:37:                                                 
      13) parse and render, 6.6 Images, CM555                                               
           expected: "<p><img alt=\"foo\" title=\"title\" src=\"/url\"></p>\n"                                                                                                          
            but got: "<p><img src=\"/url\" title=\"title\" alt=\"foo\"></p>\n"                                                                                                          
                                              
      To rerun use: --match "/parse and render/6.6 Images/CM555/"                                                                                                                       
                                              
      tests/Text/MMark/TestUtils.hs:111:37:                                                 
      14) parse and render, 6.6 Images, CM556                                               
           expected: "<p><img alt=\"foo bar\" title=\"title\" src=\"/url\"></p>\n"                                                                                                      
            but got: "<p><img src=\"/url\" title=\"title\" alt=\"foo bar\"></p>\n"                                                                                                      
                                              
      To rerun use: --match "/parse and render/6.6 Images/CM556/"                                                                                                                       
                                              
      tests/Text/MMark/TestUtils.hs:111:37:                                                 
      15) parse and render, 6.6 Images, CM557                                               
           expected: "<p><img alt=\"Foo\" title=\"title\" src=\"/url\"></p>\n"
            but got: "<p><img src=\"/url\" title=\"title\" alt=\"Foo\"></p>\n"                                                                                                          
                                              
      To rerun use: --match "/parse and render/6.6 Images/CM557/"                                                                                                                       
                                              
      tests/Text/MMark/TestUtils.hs:111:37:                                                 
      16) parse and render, 6.6 Images, CM559                                               
           expected: "<p><img alt=\"foo\" title=\"title\" src=\"/url\"></p>\n"                                                                                                          
            but got: "<p><img src=\"/url\" title=\"title\" alt=\"foo\"></p>\n"                                                                                                          
                                              
      To rerun use: --match "/parse and render/6.6 Images/CM559/"                                                                                                                       
                                              
      tests/Text/MMark/TestUtils.hs:111:37:                                                 
      17) parse and render, 6.6 Images, CM560                                               
           expected: "<p><img alt=\"foo bar\" title=\"title\" src=\"/url\"></p>\n"                                                                                                      
            but got: "<p><img src=\"/url\" title=\"title\" alt=\"foo bar\"></p>\n"                                                                                                      
                                              
      To rerun use: --match "/parse and render/6.6 Images/CM560/"                                                                                                                       
                                              
      tests/Text/MMark/TestUtils.hs:111:37:                                                 
      18) parse and render, 6.6 Images, CM562                                               
           expected: "<p><img alt=\"Foo\" title=\"title\" src=\"/url\"></p>\n"                                                                                                          
            but got: "<p><img src=\"/url\" title=\"title\" alt=\"Foo\"></p>\n"                                                                                                          
                                              
      To rerun use: --match "/parse and render/6.6 Images/CM562/"                                                                                                                       
                                              
      tests/Text/MMark/TestUtils.hs:111:37:                                                 
      19) parse and render, given a complete, comprehensive document, outputs expected the HTML fragment                                                                                
           expected: "<h1 id=\"lorem-ipsum\">Lorem ipsum</h1>\n<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Integer varius mi\norci, rhoncus ornare nunc tincidunt nec. A
liquam cursus posuere ornare.\nQuisque posuere euismod nunc, sed pellentesque metus hendrerit eu. Donec\nscelerisque accumsan ante quis interdum. Nullam nec mauris dolor. Lorem\nipsum 
dolor sit amet, consectetur adipiscing elit. Phasellus id porttitor\nnunc, sed laoreet eros. Maecenas ipsum ex, sagittis ut quam quis, vehicula\nfringilla tortor. Vestibulum quis conse
quat mauris, sed porta risus.\nVestibulum nec ornare leo. Cras pharetra, ex sed dapibus pretium, diam\nlectus accumsan enim, at malesuada tellus lorem et orci. Sed condimentum\nvarius 
ex in mollis.</p>\n<p><a href=\"https://example.org\">https://example.org</a></p>\n<p>Ut in imperdiet neque. Etiam iaculis rhoncus nisl vel porta. Praesent velit\norci, laoreet suscipi
t bibendum eu, ornare et orci. Fusce feugiat, felis a\nvehicula pulvinar, nulla purus dictum arcu, et varius urna purus et nibh.\nDuis lobortis fringilla ligula, in aliquet sem maximus
 a. Suspendisse\npotenti. Nullam consequat tellus a lectus vestibulum faucibus. Ut hendrerit\ndolor ut libero efficitur accumsan. Mauris dapibus, leo non porttitor\nlobortis, lectus ip
sum tempor metus, quis iaculis arcu quam malesuada nulla.</p>\n<h2 id=\"nullam-luctus\">Nullam luctus?!</h2>\n<p>Nullam <a href=\"http://example.org/luctus\">luctus</a> placerat nisl i
n dapibus.\nPhasellus id erat eros. Ut gravida risus sit amet massa tempor volutpat.\nVestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere\ncubilia Curae; Quisque d
ictum sapien vel enim tempor, quis ornare justo\nconsequat. Suspendisse porttitor mollis consectetur. Curabitur sodales,\nrisus eu dapibus mattis, tellus dolor condimentum dolor, in ul
tricies nibh\naugue vel nibh. Vivamus imperdiet, orci id posuere sollicitudin, diam purus\nconsequat eros, quis dictum mauris lacus ac lectus. Cras vitae pharetra\nrisus. Maecenas vehi
cula, leo vitae semper tristique, libero urna\nconsectetur massa, eget pharetra magna est nec massa. Vestibulum malesuada\nlobortis lacinia.</p>\n<p><img alt=\"My image\" src=\"https:/
/example.org/image.png\"></p>\n<p>Phasellus tincidunt metus quam, vel mollis turpis ultrices et. Phasellus\nconsequat diam eu turpis sollicitudin tempus. Fusce suscipit bibendum nisl,\
nquis rutrum eros volutpat in. Fusce tempor nisi eu ligula volutpat, eu\nultricies arcu blandit. Pellentesque habitant morbi tristique senectus et\nnetus et malesuada fames ac turpis e
gestas. Duis eleifend malesuada\nvenenatis. Morbi tincidunt quis diam ac aliquam.</p>\n<ul>\n<li>\nFoo\n</li>\n<li>\nBar\n<ol>\n<li>\nOne\n</li>\n<li>\nTwo\n</li>\n<li>\nThree\n</li>\n
</ol>\n</li>\n<li>\nBaz\n</li>\n</ul>\n<h2 id=\"sed-euismod\">Sed euismod</h2>\n<p>Sed euismod nisi lorem, ac tempor nibh venenatis at. Integer porta nibh quis\nmauris vehicula porta. 
Sed vel tellus nec lacus porttitor sollicitudin. Sed\nfacilisis nisl lorem, sed aliquet leo convallis sed. Curabitur vitae aliquet\ndiam, ac commodo ligula. Nulla aliquet odio at tellu
s auctor pellentesque.\nIn sagittis elementum tortor sed lobortis. Fusce nibh turpis, posuere eget\ntristique eget, commodo quis leo.</p>\n<hr>\n<p>Etiam faucibus, ipsum id lobortis mo
lestie, dolor lectus cursus purus, nec\nvolutpat massa odio vitae ligula. Nulla non consectetur ligula. In sem\nfelis, vehicula a convallis ut, pellentesque nec diam. Integer ullamcorp
er\nrutrum nulla. Nam arcu dolor, placerat nec molestie et, eleifend sit amet\nante. Suspendisse laoreet orci sit amet vestibulum varius. In at leo eu\nlorem tincidunt facilisis. Ut el
ementum elit ornare risus convallis, ut\nviverra orci pretium. Vivamus mi orci, lacinia ac ligula a, condimentum\naliquet ligula.</p>\n<h3 id=\"curabitur-ullamcorper\">Curabitur ullamc
orper</h3>\n<blockquote>\n<p>Curabitur ullamcorper, lectus id porttitor vehicula, augue purus ornare\norci, ut consequat tellus mauris ac sem. Cras tincidunt sagittis mi, sit\namet viv
erra erat ultrices vulputate. Donec urna nulla, malesuada non\ncursus et, posuere eu sapien. Fusce cursus mauris odio, id tincidunt felis\ntincidunt sed. Duis vulputate lectus eu tellu
:

There is also a similar failure for mmark-ext-0.2.1.3 for the long text.

Support blockquotes

Add support for blockquotes using this syntax:

> Here goes a quote.

Section 5.1 of Common Mark spec.

Compile time `MMark`

I'd like to embed my markdown content in the Haskell source in a manner similar to file-embed but (obviously) as MMark documents instead of ByteString.

The primary benefit here is that we fail at compile time if any of the markdown files (in the repository) fails to parse.

Would implementing this feature in this primary repo be within its scope, or is it best done as a separate library? I'm only beginning to get familiar with TH, so I'm not yet sure how trivial this would be to implement.

PHP-style footnotes

MMark feature. Something like:

Here we go [^1], ...

[^1]:
Some footnote text.

Parser fails on '_' symbol inside link text

The minimal example of this is:

[http://twitter.com/thought_sync](http://twitter.com/thought_sync)

The error is:

ErrorBundle {bundleErrors = TrivialError 27 (Just (Tokens ('_' :| \\\"\\\"))) (fromList [Tokens (']' :| \\\"\\\"),Label ('i' :| \\\"nline content\\\")]) :| [], bundlePosState = PosState {pstateInput = \\\"[http://twitter.com/thought_sync](http://twitter.com/thought_sync).\\\", pstateOffset = 0, pstateSourcePos = SourcePos {sourceName = \\\"rest\\\", sourceLine = Pos 1, sourceColumn = Pos 1}

If I remove '_' from the link text, the parser handles this fine:

[http://twitter.com/thoughtsync](http://twitter.com/thought_sync)

Enclose sections

I really would like an option similar to pandoc's --section-divs option, so that document sections are properly enclosed, and current selected sections could be properly highlighted.

For example,

# Document

A paragraph

## A subsection

With some text

## Another subsection

### With a child

would result in the following HTML:

<section id="document">
  <h1>Document</h1>
  <p>A paragraph</p>
  <section id="a-subsection">
    <h2>A subsection</h2>
    <p>With some text</p>
  </section>
  <section id="another-subsection">
    <h2>Another subsection</h2>
    <section id="with-a-child">
      <h3>With a child</h3>
    </section>
  </section>
</section>

The current extension system doesn't seem to allow this kind of transformation (or at the very least I wasn't able to figure out how), and because the internals of the library are hidden, I cannot implement this outside of the library by hacking my own rendering function.

Would love some discussion about whether this would be worth adding to the library, and how.
I would perfectly understand if it's outside the scope of MMark.
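
The nesting described above can be sketched independently of the library. This is a minimal sketch on toy types: Block, Section, and sections are hypothetical names chosen for illustration and are not part of MMark's API. It only shows how a flat list of blocks could be grouped by heading level.

```haskell
-- Toy stand-in for parsed blocks; NOT the library's real Block type.
data Block
  = Heading Int String -- heading level and title
  | Para String
  deriving (Eq, Show)

-- A section: heading level, title, its own blocks, and its subsections.
data Section = Section Int String [Block] [Section]
  deriving (Eq, Show)

isHeading :: Block -> Bool
isHeading (Heading _ _) = True
isHeading _ = False

-- Split a flat block list into the blocks that precede any heading
-- and the sections that follow, nesting deeper headings recursively.
sections :: [Block] -> ([Block], [Section])
sections bs =
  let (preamble, rest) = break isHeading bs
   in (preamble, go rest)
  where
    go (Heading n title : xs) =
      -- A section ends at the next heading of the same or a higher level.
      let (inside, after) = break (closes n) xs
          (body, subs) = sections inside
       in Section n title body subs : go after
    go _ = []
    closes n (Heading m _) = m <= n
    closes _ _ = False
```

Applied to the example document, sections yields one level-1 section containing "A paragraph" and two level-2 subsections, the second of which holds the level-3 child. A renderer could then wrap each Section in a section element.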

Inline and block transformations are not applied to nested inlines/blocks

We currently apply such transformations to "top-level" inlines and blocks here:

https://github.com/mmark-md/mmark/blob/master/Text/MMark/Render.hs#L41-L52

However, if an inline is contained inside another inline, e.g. like this:

myInline :: Inline
myInline = Emphasis (Plain "My stuff" :| [])

and our transformation targets Plain inlines, it won't be applied. The situation is pretty much the same with blocks.

The solution is to recursively descend into things like Emphasis and apply the given transformation everywhere.
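
To make the recursive descent concrete, here is a minimal sketch on a toy type. The Inline type below has only three constructors and uses String. It is an assumption for illustration, not the library's real definition, and everywhere and shout are hypothetical names.

```haskell
import Data.List.NonEmpty (NonEmpty (..))

-- Toy stand-in for the library's Inline type (three constructors only).
data Inline
  = Plain String
  | Emphasis (NonEmpty Inline)
  | Strong (NonEmpty Inline)
  deriving (Eq, Show)

-- Apply a transformation to every inline: children first, then the node itself.
everywhere :: (Inline -> Inline) -> Inline -> Inline
everywhere f inline = f $ case inline of
  Plain t -> Plain t
  Emphasis xs -> Emphasis (fmap (everywhere f) xs)
  Strong xs -> Strong (fmap (everywhere f) xs)

-- Example transformation that targets only Plain inlines.
shout :: Inline -> Inline
shout (Plain t) = Plain (t ++ "!")
shout other = other

myInline :: Inline
myInline = Emphasis (Plain "My stuff" :| [])
```

Here shout myInline leaves the value unchanged because the outer constructor is Emphasis, whereas everywhere shout myInline reaches the nested Plain and rewrites it.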

Monadic `blockRender`?

I'd like to render LaTeX diagrams into images that are then inserted into the output. I have a function of the following form:

-- | Render some LaTeX content to an image whose filename is the 
-- hash of the content. Return the path to the image.
renderToFile :: Text -> IO FilePath
renderToFile latex = do
   ...
   pure savePath

I think the interface I would like is to interpret code-blocks with some particular infostring (e.g. render-latex) as LaTeX to be rendered into a PNG. If blockRenderM existed, I could just do

renderLatexExt :: Extension
renderLatexExt = blockRenderM $ \case
  CodeBlock (Just "render-latex") content 
    -> (\uri -> Image undefined uri Nothing) <$> renderToFile content
  x -> pure x

or something.
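
Without a monadic hook, one possible workaround is to split the work into a pure collection pass, an effectful rendering pass, and a pure substitution pass. The sketch below shows that shape on a toy Block type. All names here (Block, latexSnippets, renderAll, substitute) are hypothetical, and whether this maps cleanly onto the library's scanners and block transformations is an assumption, not something confirmed above.

```haskell
-- Toy stand-in for a parsed block; NOT the library's real type.
data Block
  = CodeBlock (Maybe String) String -- info string and content
  | Image FilePath
  | Paragraph String
  deriving (Eq, Show)

-- Pass 1 (pure): collect the content of every render-latex code block.
latexSnippets :: [Block] -> [String]
latexSnippets bs = [c | CodeBlock (Just "render-latex") c <- bs]

-- Pass 2 (effectful): render each snippet and remember the resulting path.
-- The render argument plays the role of renderToFile.
renderAll :: Monad m => (String -> m FilePath) -> [String] -> m [(String, FilePath)]
renderAll render = mapM (\c -> (,) c <$> render c)

-- Pass 3 (pure): replace each render-latex code block with its image.
-- Blocks whose content is missing from the table are left untouched.
substitute :: [(String, FilePath)] -> Block -> Block
substitute table block@(CodeBlock (Just "render-latex") c) =
  maybe block Image (lookup c table)
substitute _ block = block
```

The effectful step runs once between the two pure passes, so the final transformation stays pure.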

The other heading style

I have a significant corpus of material that uses the original Markdown syntax for headings ("setext" style, I think?) with

Level 1
=======

and

Level 2
-------

headings. We value these for the visual strength and structural separation they give when scanning the original text documents. I like the goals you set out for mmark but I'd need to contribute to enable support for these if I was to use this for real. Do you think this would be possible with the code as currently designed?

AfC

HTML blocks

Section 4.6 of Common Mark specification.

Task support

I would like to support tasks, just as we have available here on GitHub:

  • Like this
  • and this

I was hoping to be able to do this using the extension module, but as far as I understand it, that can't be done: - [ ] ends up being parsed as a broken link. Is it possible to do this as an extension?

If not, and if you're interested in having this feature, can you give me some hint on how to add it?

In GFMD it seems there is (somewhat shaky) support for tasks in numbered lists, e.g.

1. [ ] Numbered task
2. not numbered
3 [x] completed 

renders as

  1. Numbered task
  2. not numbered
    3 [x] completed

I had never tried that before just now, and I wouldn't implement it unless I had to.

However, the Block constructor for unordered lists is UnorderedList (NonEmpty [Block a]), and tasks are only really supported in lists. The natural way seems to be wrapping the Block a in something that can tell whether it is a normal block, an unfinished task, or a finished task. Is there a better solution? Would that solution be accepted?
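
The wrapping idea could look roughly like the sketch below. ItemStatus, ListItem, and classify are hypothetical names for illustration. They are not part of the library, and the marker detection works on plain strings rather than on parsed blocks.

```haskell
import Data.List (stripPrefix)

-- Whether a list item is ordinary or a task, and if a task, whether it is done.
data ItemStatus = NotATask | Unfinished | Finished
  deriving (Eq, Show)

-- A list item: its status plus a payload (which would be blocks in a real parser).
data ListItem a = ListItem ItemStatus a
  deriving (Eq, Show)

-- Classify raw item text by its leading task marker and strip the marker.
classify :: String -> ListItem String
classify s
  | Just rest <- stripPrefix "[ ] " s = ListItem Unfinished rest
  | Just rest <- stripPrefix "[x] " s = ListItem Finished rest
  | otherwise = ListItem NotATask s
```

With such a wrapper, a list constructor would carry a non-empty collection of ListItem values instead of bare block lists, so ordinary items and tasks can be mixed in one list.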

Support inline images

Add support for inline images using this syntax:

![foo](/url "title")

Section 6.6 of Common Mark spec.

Test suite failure for package mmark-0.0.7.2

In the Stackage Nightly build:

    Failures:
    
      tests/Text/MMark/TestUtils.hs:113:22: 
      1) parse and render, 6.1 Blackslash escapes, CM297
           expected: "<p><a href=\"http://example.com/?find=*\">http://example.com/?find=*</a></p>\n"
            but got: "<p><a href=\"http://example.com?find=*\">http://example.com?find=*</a></p>\n"
    
      To rerun use: --match "/parse and render/6.1 Blackslash escapes/CM297/"
    
      tests/Text/MMark/TestUtils.hs:113:22: 
      2) parse and render, 6.5 Links, CM472
           expected: "<p><a href=\"#fragment\">link</a></p>\n<p><a href=\"http://example.com/#fragment\">link</a></p>\n<p><a href=\"http://example.com/?foo=3#frag\">link</a></p>\n"
            but got: "<p><a href=\"#fragment\">link</a></p>\n<p><a href=\"http://example.com#fragment\">link</a></p>\n<p><a href=\"http://example.com?foo=3#frag\">link</a></p>\n"
    
      To rerun use: --match "/parse and render/6.5 Links/CM472/"
    
      tests/Text/MMark/TestUtils.hs:113:22: 
      3) parse and render, 6.7 Autolinks, CM565
           expected: "<p><a href=\"http://foo.bar.baz/\">http://foo.bar.baz/</a></p>\n"
            but got: "<p><a href=\"http://foo.bar.baz\">http://foo.bar.baz</a></p>\n"
    
      To rerun use: --match "/parse and render/6.7 Autolinks/CM565/"
    
      tests/Text/MMark/TestUtils.hs:113:22: 
      4) parse and render, 6.7 Autolinks, CM571
           expected: "<p><a href=\"http://../\">http://../</a></p>\n"
            but got: "<p><a href=\"http://..\">http://..</a></p>\n"
    
      To rerun use: --match "/parse and render/6.7 Autolinks/CM571/"
    
      tests/Text/MMark/TestUtils.hs:113:22: 
      5) parse and render, given a complete, comprehensive document, outputs expected the HTML fragment
           expected: "<h1 id=\"lorem-ipsum\">Lorem ipsum</h1>\n<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Integer varius mi\norci, rhoncus ornare nunc tincidunt nec. Aliquam cursus posuere ornare.\nQuisque posuere euismod nunc, sed pellentesque metus hendrerit eu. Donec\nscelerisque accumsan ante quis interdum. Nullam nec mauris dolor. Lorem\nipsum dolor sit amet, consectetur adipiscing elit. Phasellus id porttitor\nnunc, sed laoreet eros. Maecenas ipsum ex, sagittis ut quam quis, vehicula\nfringilla tortor. Vestibulum quis consequat mauris, sed porta risus.\nVestibulum nec ornare leo. Cras pharetra, ex sed dapibus pretium, diam\nlectus accumsan enim, at malesuada tellus lorem et orci. Sed condimentum\nvarius ex in mollis.</p>\n<p><a href=\"https://example.org/\">https://example.org/</a></p>\n<p>Ut in imperdiet neque. Etiam iaculis rhoncus nisl vel porta. Praesent velit\norci, laoreet suscipit...
:

Add CSS classes to rendered elements

What is a good approach to adding CSS classes to elements (e.g. <table>) rendered by mmark?

One approach is to fork the renderer code and do it as an extension, as is being tried by @gillchristian in srid/neuron#100 - but obviously there must be a better way that respects DRY?

Just looking at the code it doesn't seem to be possible in its current state. Is there a possibility of improving the library to support such customization? Or is this better done in a different manner?

Support for the traditional block quote syntax

The block quote syntax we currently support differs considerably from what is expected from a markdown processor. We thus must adjust the parser to make it work more conventionally.

Test failures when building with `modern-uri-0.3.4.3`

I'm building mmark on Gentoo Linux using ghc-9.0.2, and I found that certain tests fail with the latest version bump to modern-uri-0.3.4.3. These tests are passing when I build with modern-uri-0.3.4.2:

Failures:

  tests/Text/MMark/TestUtils.hs:111:37: 
  1) parse and render, 6.2 Entity and numeric character references, CM309
       expected: "<p><a href=\"/f&amp;ouml;&amp;ouml;\" title=\"föö\">foo</a></p>
"
        but got: "<p><a href=\"/f%26ouml%3b%26ouml%3b\" title=\"föö\">foo</a></p>
"

  To rerun use: --match "/parse and render/6.2 Entity and numeric character references/CM309/"

  tests/Text/MMark/TestUtils.hs:111:37: 
  2) parse and render, 6.2 Entity and numeric character references, CM310
       expected: "<p><a href=\"/f&amp;ouml;&amp;ouml;\" title=\"föö\">foo</a></p>
"
        but got: "<p><a href=\"/f%26ouml%3b%26ouml%3b\" title=\"föö\">foo</a></p>
"

  To rerun use: --match "/parse and render/6.2 Entity and numeric character references/CM310/"

  tests/Text/MMark/TestUtils.hs:111:37: 
  3) parse and render, 6.5 Links, CM468
       expected: "<p><a href=\"foo(and(bar\">link</a>))</p>
"
        but got: "<p><a href=\"foo%28and%28bar\">link</a>))</p>
"

  To rerun use: --match "/parse and render/6.5 Links/CM468/"

  tests/Text/MMark/TestUtils.hs:111:37: 
  4) parse and render, 6.5 Links, CM470
       expected: "<p><a href=\"foo(and(bar)\">link</a></p>
"
        but got: "<p><a href=\"foo%28and%28bar%29\">link</a></p>
"

  To rerun use: --match "/parse and render/6.5 Links/CM470/"

  tests/Text/MMark/TestUtils.hs:111:37: 
  5) parse and render, 6.5 Links, CM474
       expected: "<p><a href=\"foo%20b&amp;auml;\">link</a></p>
"
        but got: "<p><a href=\"foo%20b%26auml%3b\">link</a></p>
"

  To rerun use: --match "/parse and render/6.5 Links/CM474/"

  tests/Text/MMark/TestUtils.hs:111:37: 
  6) parse and render, 6.7 Autolinks, CM568
       expected: "<p><a href=\"mailto:FOO@BAR.BAZ\">FOO@BAR.BAZ</a></p>
"
        but got: "<p><a href=\"mailto:FOO%40BAR.BAZ\">FOO@BAR.BAZ</a></p>
"

  To rerun use: --match "/parse and render/6.7 Autolinks/CM568/"

  tests/Text/MMark/TestUtils.hs:111:37: 
  7) parse and render, 6.7 Autolinks, CM570
       expected: "<p><a href=\"made-up-scheme://foo/,bar\">made-up-scheme://foo/,bar</a></p>
"
        but got: "<p><a href=\"made-up-scheme://foo/%2cbar\">made-up-scheme://foo/%2cbar</a></p>
"

  To rerun use: --match "/parse and render/6.7 Autolinks/CM570/"

  tests/Text/MMark/TestUtils.hs:111:37: 
  8) parse and render, 6.7 Autolinks, CM575
       expected: "<p><a href=\"mailto:foo@bar.example.com\">foo@bar.example.com</a></p>
"
        but got: "<p><a href=\"mailto:foo%40bar.example.com\">foo@bar.example.com</a></p>
"

  To rerun use: --match "/parse and render/6.7 Autolinks/CM575/"

  tests/Text/MMark/TestUtils.hs:111:37: 
  9) parse and render, 6.7 Autolinks, CM576
       expected: "<p><a href=\"mailto:foo+special@Bar.baz-bar0.com\">foo+special@Bar.baz-bar0.com</a></p>
"
        but got: "<p><a href=\"mailto:foo%2bspecial%40Bar.baz-bar0.com\">foo+special@Bar.baz-bar0.com</a></p>
"

  To rerun use: --match "/parse and render/6.7 Autolinks/CM576/"

Randomized with seed 1531926179

Finished in 0.1728 seconds
614 examples, 9 failures
Test suite tests: FAIL
Test suite logged to: dist/test/mmark-0.0.7.4-tests.log
0 of 1 test suites (0 of 1 test cases) passed.

Here are the full build logs (2 files).
