erusev / parsedown

14.6K 219.0 1.1K 1.45 MB

Better Markdown Parser in PHP

Home Page: https://parsedown.org

License: MIT License

PHP 75.72% HTML 24.28%

markdown-parser markdown parser php

parsedown's Introduction

Co-creator of nota.md and intellibar.app.

parsedown's Issues

Missing support for reference-style links

Example source which uses this feature extensively: https://github.com/aws/aws-sdk-php/blob/master/README.md

This is part of the original Markdown spec: http://daringfireball.net/projects/markdown/syntax#link

And according to your demo, PHP Markdown 1.3 supports it: http://parsedown.org/explorer/

Add composer / packagist support

Hey Erusev,

Your project sounds interesting, specifically that it should be fast :)

Is it possible to add support for composer and register it with packagist so that projects can include it using composer?

Thanks!

Support spaces as delimiters in inline code spans

From the Markdown spec:

The backtick delimiters surrounding a code span may include spaces — one after the opening, one before the closing. This allows you to place literal backtick characters at the beginning or end of a code span.

Here is a failing test:

A single backtick in a code span: `` ` ``

A backtick-delimited string in a code span: `` `foo` ``

<p>A single backtick in a code span: <code>`</code></p>
<p>A backtick-delimited string in a code span: <code>`foo`</code></p>

Parsedown includes the spaces inside the <code> element. These spaces would eventually be rendered literally by a browser.

<p>A single backtick in a code span: <code> ` </code></p>
<p>A backtick-delimited string in a code span: <code> `foo` </code></p>
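The padding rule quoted from the spec could be sketched roughly like this. `renderCodeSpan()` is a hypothetical helper written for illustration, not Parsedown's actual code; it allows one optional space next to each delimiter and keeps it out of the captured content.

```php
<?php
// Hypothetical helper for illustration; not part of Parsedown.
function renderCodeSpan(string $text): ?string
{
    // (`+) captures the opening run of backticks and \1 requires the same run to close.
    // [ ]? on each side absorbs one space of padding so it stays out of the content.
    // (?<!`) and (?!`) stop the closing run from matching inside a longer run.
    if (!preg_match('/^(`+)[ ]?(.+?)[ ]?(?<!`)\1(?!`)/s', $text, $matches)) {
        return null;
    }

    return '<code>' . htmlspecialchars($matches[2], ENT_NOQUOTES, 'UTF-8') . '</code>';
}

echo renderCodeSpan('`` ` ``'), "\n";     // <code>`</code>
echo renderCodeSpan('`` `foo` ``'), "\n"; // <code>`foo`</code>
```

Note that this sketch also drops a single leading or trailing space from ordinary spans such as `` `foo ` ``, which may or may not be wanted.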

Allow to remove the englobing paragraph

I would like to remove the <p> wrapping the HTML output, since it has not specifically been asked for.

Just add an optional parameter to parse() to remove it:

$parser = new Parsedown();
$output = $parser->parse($input,false);//Without <p></p>
$output = $parser->parse($input,true);//With <p></p>
$output = $parser->parse($input);//With <p></p>

Thanks

Inline tag at paragraph start breaks subsequent parsing

This example converts fine in the official parser but not in Parsedown:

<b>Starting with an inline tag</b> breaks subsequent parsing.

- a
- b
- c

Wrapping the would-be paragraph in a div prevents the issue, as does starting it with plain text.

This may relate to #48

Include version number in header

I'm not using a git submodule, so to track updates I have to check tags / releases. Mind including the latest release number in the header?

Email links support

Parsedown does not support email links.

http://daringfireball.net/projects/markdown/syntax#autolink

According to the spec it also has to support obfuscation of the email address.

Replace the instance function with __construct

Would it work if you replace the instance function with __construct?

Error in adding a link to a code element like [`foo`](yahoo.com)

The following is valid Markdown, but gives me $0 as output in Parsedown.

BEGIN

As can be witnessed by the Github markdown, it works...

Test errors

When I tested this tool in the demo page (http://parsedown.org/explorer/) with the source from http://daringfireball.net/projects/markdown/index.text I can see many errors.

Specify value attribute on ordered list items

This is an extension to #100.

The HTML5 spec specifies a value attribute for li elements inside ol elements.

So a Markdown list like:

5. fifth
6. sixth
7. seventh

could produce HTML like:

<ol>
<li value="5">fifth</li>
<li value="6">sixth</li>
<li value="7">seventh</li>
</ol>

error when pasting html code

When I have <iframe></iframe>

on the same line it doesn't render markup that comes after. When the html code is like

it renders correctly. Can be seen directly on http://parsedown.org/demo.

There might be a minor "syntax" error. If you use double '_' in a function name for example, that part becomes italic: my_special_function (which is not the case here on GitHub). I think you need to check for whitespace before/after the '_' before marking it as italic.

GFM with set_breaks_enabled(true) breaks output

I used this Markdown source to test it.

When breaks are enabled, Parsedown generates a lot of unnecessary <br /> tags. I think this is because of "line by line reading".

If I set_breaks_enabled(false)

If I set_breaks_enabled(true)

Code blocks need to be pre/code not code/pre

I'm using highlight.js and it likes code blocks to be <pre><code> not <code><pre>. Can we add an option to flip those?

Table support

Table support would be really nice and useful, see https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet#wiki-tables

No support for link titles

According to the Markdown syntax specification:

To create an inline link, use a set of regular parentheses immediately after the link text’s closing square bracket. Inside the parentheses, put the URL where you want the link to point, along with an optional title for the link, surrounded in quotes.

Also:

You can put the title attribute on the next line and use extra spaces or tabs for padding, which tends to look better with longer URLs

[URL and title](/url/ "title").

should be

<p><a href="/url/" title="title">URL and title</a>.</p>

And

[URL and title](/url/   "title preceded by a tab").

should be

<p><a href="/url/" title="title preceded by a tab">URL and title</a>.</p>
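A rough sketch of how an inline link with an optional title might be matched. `renderInlineLink()` and its pattern are assumptions made for illustration, not Parsedown's implementation; spaces or tabs are accepted as padding before the quoted title.

```php
<?php
// Hypothetical helper for illustration; not part of Parsedown.
function renderInlineLink(string $text): ?string
{
    // [text](url "optional title"), with spaces or tabs allowed before the title.
    $pattern = '/^\[(.+?)\]\([ \t]*(\S+?)(?:[ \t]+"(.*?)")?[ \t]*\)/';

    if (!preg_match($pattern, $text, $m)) {
        return null;
    }

    $html = '<a href="' . htmlspecialchars($m[2], ENT_QUOTES, 'UTF-8') . '"';

    // $m[3] is only set when a quoted title was present.
    if (isset($m[3])) {
        $html .= ' title="' . htmlspecialchars($m[3], ENT_QUOTES, 'UTF-8') . '"';
    }

    // The link text is emitted as-is here; a real parser would process inline markup in it.
    return $html . '>' . $m[1] . '</a>';
}

echo renderInlineLink('[URL and title](/url/ "title").'), "\n";
echo renderInlineLink("[URL and title](/url/\t\"title preceded by a tab\")."), "\n";
```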

Bad html with unordered list and fenced code

Hi,

First thanks for this amazing tool.

I think I caught a small bug (or something new to document). If you try to convert this markdown block (take the actual code, not the rendered page), everything should be fine (you can use your own demo):

#!/bin/bash

# Just a comment

If you add an unordered list (* Test) just above the fenced code block, then the rendering is bad. I can reproduce it with the GitHub engine, but php-parsedown 1.4 handles it fine.

Note that if there is a new line between the unordered list and the code block then there is no problem.

Thanks in advance.

unset $block['interrupted'] correctly

2. aaaa
   aaaa
   aaaa

3. aaaa
   bbbb
   cccc

4. aaaa
   bbbb
   cccc

----
2. aaaa
aaaa
aaaa

3. aaaa
bbbb
cccc

4. aaaa
bbbb
cccc

aaaa
aaaa
aaaa

aaaa
bbbb
cccc

aaaa
bbbb
cccc

aaaa
aaaa
aaaa

aaaa
bbbb
cccc

aaaa
bbbb
cccc

The GFM renderer and Markdown Extra can render it correctly (except for the BR), but Parsedown can't.
Here is a simple fix for it:

diff --git a/Parsedown.php b/Parsedown.php
index 7af08c9..e58d344 100755
--- a/Parsedown.php
+++ b/Parsedown.php
@@ -206,6 +206,8 @@ class Parsedown
                             $block['lines'] = array(
                                 preg_replace('/^[ ]{0,4}/', '', $matches[3]),
                             );
+
+                            unset($block['interrupted']);
                         }

                         continue 2;

One space linebreak

GFM lets you add a line break by adding just one space at the end of the line. Adding two spaces produces an empty line.

I think this would be a nice thing to have in Parsedown. If nothing else, it will save developers a few seconds by not typing one more space character.

I know the spec says line breaks are created with two or more spaces, but you could make it one or more. Having two spaces produce an empty line isn't that important. This will preserve backward compatibility.

Various list issues

I'd like to suggest a few cosmetic changes to how Parsedown parses lists:

In nested lists, subsequent <ul> tags shouldn't be inline. Right now they appear as </li><ul>; that <ul> should be on a new line.

Indentation of <li> with two spaces from its parent <ul>. For nested lists, this should increase, i.e. a second level <ul> should be indented with two spaces from its parent <li> and second level <li> should be indented with two spaces from its parent second level <ul>. Example:

<ul>
--<li></li>
--<li>
----<ul>
------<li></li>
----</ul>
--</li>
--<li></li>
</ul>

Note: these changes must apply to <ol> as well.

When attempting to add <code> block after <li>, the 4 spaces (or 1 tab) indentation characters are removed. Example:

* this is list
    this is code

will parse as:

<li>this is list
this is code</li>

instead of:

<li>this is list</li>
<pre><code>
this is code
</code></pre>

Additionally, <pre><code> and their closing tags shouldn't be inline. The code will look nicer and will have better human readability.

Links won't get converted

Hi,

Thanks for this great and fast MD parser :) When I used it on this MD file: https://github.com/RexDude/seo42/blob/master/README.md

I get the following PHP warning 6 times (EDIT: due to a README.md update there are now 10), and none of the links get converted.

Warning: preg_match_all(): Compilation failed: internal error: previously-checked referenced subpattern not found at offset 37 in /home/dude/Projekte/Web/AddonFactory/htdocs/redaxo/include/addons/seo42/classes/class.parsedown.inc.php on line 494

Is this something with my MD-File or with the Parser?

Allow adjacent code spans

Here is an example for adjacent code spans:

`code span``another code span`

The spec doesn't say anything about what could follow after a code span. So it could even be another code span.

However, Parsedown currently "sees" a code span only if it is not followed by a backtick. Relevant regex:

parsedown/Parsedown.php

Line 977 in 7a4d3c0

if (preg_match('/^(`+)(.+?)\1(?!`)/', $text, $matches))

I think this should be the actual behavior:

`code span``another code span`

<p><code>code span</code><code>another code span</code></p>

Right now, Parsedown produces:

<p><code>code span`</code>another code span`</p>

And Dingus:

<p><code>code span``another code span</code></p>
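To make the difference concrete, here is a small experiment. `matchCodeSpans()` is a hypothetical helper, and the second pattern (the quoted regex without its trailing lookahead) is only one possible change, not necessarily the fix Parsedown would adopt.

```php
<?php
// Hypothetical helper for illustration: apply a code-span pattern repeatedly
// from the start of $text and collect the captured span contents.
function matchCodeSpans(string $pattern, string $text): array
{
    $spans = [];

    while ($text !== '' && preg_match($pattern, $text, $m)) {
        $spans[] = $m[2];
        // Continue right after the matched span.
        $text = substr($text, strlen($m[0]));
    }

    return $spans;
}

$input = '`code span``another code span`';

// The pattern quoted above, with the negative lookahead.
$withLookahead = '/^(`+)(.+?)\1(?!`)/';
// Assumption: the same pattern with the lookahead removed.
$withoutLookahead = '/^(`+)(.+?)\1/';

print_r(matchCodeSpans($withLookahead, $input));    // one span: "code span`"
print_r(matchCodeSpans($withoutLookahead, $input)); // two spans: "code span", "another code span"
```

The trade-off is that without the lookahead, input such as `` `a``b` `` is split at the first backtick instead of being read as one span containing a double backtick, which is closer to what Dingus does.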

Invalid HTML with bold and italic.

When using italic inside bold text, you get invalid HTML:

$text = 'This is **bold and *italic***.';
echo Parsedown::instance()->parse($text);

Result:

<p>This is <strong>bold and <em>italic</strong></em>.</p>

The other way round does work correctly:

$text = 'This is *italic and **bold***.';
echo Parsedown::instance()->parse($text);

result:

<p>This is <em>italic and <strong>bold</strong></em>.</p>

John Gruber's test suite

This is John Gruber's test suite:
http://daringfireball.net/projects/downloads/MarkdownTest_1.0.zip

The following example is rendered incorrectly by Parsedown.

In Markdown 1.0.0 and earlier. Version
8. This line turns into a list item.
Because a hard-wrapped line in the
middle of a paragraph looked like a
list item.

Here's one with a bullet.
* criminey.

In Markdown 1.0.0 and earlier. Version
8. This line turns into a list item.
Because a hard-wrapped line in the
middle of a paragraph looked like a
list item.

Here's one with a bullet.

criminey.

babelmark2 result:
http://johnmacfarlane.net/babelmark2/?text=In+Markdown+1.0.0+and+earlier.+Version%0A8.+This+line+turns+into+a+list+item.%0ABecause+a+hard-wrapped+line+in+the%0Amiddle+of+a+paragraph+looked+like+a%0Alist+item.%0A%0AHere%27s+one+with+a+bullet.%0A*+criminey.

Missing support for image title text

Markdown is supposed to support adding title text to an image by putting it in quotes after the URL, like the following:

![Alt Text](http://www.google.com/images/srpr/logo9w.png "Title Text")

Using GitHub's Markdown, the code above provides both the alt text and the title text, which Parsedown lacks.

Parsedown is missing support for the title text. In order to follow the standard and allow users to use Markdown to its fullest potential, support for this should be added.

Parsedown needs to expose the version in code

I just wrote a command line .phar version of parsedown: https://github.com/scottchiefbaker/MarkdownCLI

It would be awesome to output the version of the Parsedown library that was used to build the .phar. If there was a php function parsedown_version(), I could include that version in my usage() output.

Double html-entity encoding in code blocks

If the markdown to be parsed comes directly from a user (as in a comment or post), then we must use htmlentities() or htmlspecialchars() prior to parsing it, in order to prevent XSS:

$_POST['markdown'] = <<<'MD'
<script>alert('xss');</script>
~~~php
$foo = true && (1 < 7 ? "bar" : '>');
\~~~
MD;

// Bad!
$markup = Parsedown::instance()->parse($_POST['markdown']);

// Good
$markdown = htmlspecialchars($_POST['markdown'], ENT_QUOTES, 'UTF-8');
$markup = Parsedown::instance()->parse($markdown);

The problem here is that any entities within fenced code blocks then become double-encoded, because these are encoded internally by Parsedown.

My first thought was to simply use strip_tags() instead of encoding; however, this breaks markdown code blocks that contain HTML/XML, as strip_tags() is not context-aware.

My next thought was to change the htmlspecialchars() calls within Parsedown - where we encode the contents of fenced code blocks - to use the 4th parameter ($double_encode) and turn off double-encoding. Unfortunately, that leaves us with the problem of not being able to represent literal encoded entities within code blocks, e.g.:

$foo = '&amp;';
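A quick illustration of that trade-off using only `htmlspecialchars()`; the variable names are made up for the example.

```php
<?php
// A line of user-submitted code that contains a literal entity.
$raw = "\$foo = '&amp;';";

// Default behaviour: every & is encoded, so a browser shows the literal text &amp; again.
$encoded = htmlspecialchars($raw, ENT_NOQUOTES, 'UTF-8');

// With $double_encode = false the existing entity is left alone,
// so a browser would show a bare & and the literal entity is lost.
$notDoubleEncoded = htmlspecialchars($raw, ENT_NOQUOTES, 'UTF-8', false);

echo $encoded, "\n";          // $foo = '&amp;amp;';
echo $notDoubleEncoded, "\n"; // $foo = '&amp;';
```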

We can require the calling code to cherry-pick the parts of the input to encode entities in (everything but the content between code block fences), but that requires a fair bit of markdown parsing outside the actual markdown parser.

The only other thing I can think of would be to use a DOM parser in conjunction with htmlspecialchars_decode() to un-double-encode things inside <code></code> blocks in the parsed markdown, but this seems like it would just be putting a band-aid on the problem instead of actually fixing it.

I don't have a good solution in mind yet; I just wanted to make the issue known and open discussion (maybe I missed something obvious).

Security ?

Hi,
On GitHub, tags like script are escaped. I think it would be a good option to have. Maybe it is as simple as $text = htmlentities($text);, but it would require an options array, and may lead to some additions ...

Thanks in advance

Support title in parentheses for reference-style links

From the Markdown spec:

The following three link definitions are equivalent:

[foo]: http://example.com/  "Optional Title Here"
[foo]: http://example.com/  'Optional Title Here'
[foo]: http://example.com/  (Optional Title Here)

Parsedown is not supporting the third case. Here is a failing test:

[Reference link][link title with double quotes]
[Reference link][link title with single quotes]
[Reference link][link title with parentheses]

[Reference link with a space] [link title with double quotes]
[Reference link with a space] [link title with single quotes]
[Reference link with a space] [link title with parentheses]

[link title with double quotes]: http://example.com/  "Double Quotes Title"
[link title with single quotes]: http://example.com/  'Single Quotes Title'
[link title with parentheses]: http://example.com/  (Parentheses Title)

<p><a href="http://example.com/" title="Double Quotes Title">Reference link</a>
<a href="http://example.com/" title="Single Quotes Title">Reference link</a>
<a href="http://example.com/" title="Parentheses Title">Reference link</a></p>
<p><a href="http://example.com/" title="Double Quotes Title">Reference link with a space</a>
<a href="http://example.com/" title="Single Quotes Title">Reference link with a space</a>
<a href="http://example.com/" title="Parentheses Title">Reference link with a space</a></p>

Currently Parsedown outputs HTML like:

<a href="http://example.com/  (Parentheses Title)">Reference link</a></p>
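One way the three title forms could be recognised in a single pattern. `parseReferenceDefinition()` is a hypothetical helper for illustration only and does not reflect how Parsedown stores its references.

```php
<?php
// Hypothetical helper for illustration; not part of Parsedown.
function parseReferenceDefinition(string $line): ?array
{
    // [id]: url, optionally followed by a title in "double quotes",
    // 'single quotes' or (parentheses).
    $pattern = '/^[ ]{0,3}\[(.+?)\]:[ ]*(\S+)(?:[ ]+(?:"(.*)"|\'(.*)\'|\((.*)\)))?[ ]*$/';

    if (!preg_match($pattern, $line, $m)) {
        return null;
    }

    // Only one of the three title groups can be non-empty.
    $title = null;
    foreach ([3, 4, 5] as $i) {
        if (isset($m[$i]) && $m[$i] !== '') {
            $title = $m[$i];
        }
    }

    return ['id' => strtolower($m[1]), 'url' => $m[2], 'title' => $title];
}

print_r(parseReferenceDefinition('[foo]: http://example.com/  (Optional Title Here)'));
```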

Multi-line list items

I noticed some strange behaviour with multi-line list items:

I don't think these  tags should be created for multi-line list items. http://daringfireball.net/projects/markdown/syntax#list

Setext-style headers

Is it possible to add the Setext-style headers described on Daring Fireball?

Something like this doesn't get parsed like in the specification:

"This is an H1"
"============="

"This is an H2"
"-------------"

(added the " as github parses the headings without)
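For reference, detecting the underline is fairly small. This is a sketch with a hypothetical `renderSetextHeading()` helper that takes the text line and the line below it; it is not taken from Parsedown.

```php
<?php
// Hypothetical helper for illustration; not part of Parsedown.
function renderSetextHeading(string $textLine, string $underline): ?string
{
    // The underline must consist only of "=" (level 1) or only of "-" (level 2).
    if (trim($textLine) === '' || !preg_match('/^(=+|-+)[ ]*$/', $underline, $m)) {
        return null;
    }

    $level = $m[1][0] === '=' ? 1 : 2;
    $text = htmlspecialchars(trim($textLine), ENT_NOQUOTES, 'UTF-8');

    return "<h{$level}>{$text}</h{$level}>";
}

echo renderSetextHeading('This is an H1', '============='), "\n"; // <h1>This is an H1</h1>
echo renderSetextHeading('This is an H2', '-------------'), "\n"; // <h2>This is an H2</h2>
```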

Feature request - Syntax-correct language declarations for fenced code blocks

Would be awesome to have this.

e.g.

```ruby
some.code()
```

would become

<pre><code class="language-ruby">some.code()</code></pre>

in line with the WHATWG Spec and allowing parsedown to be paired up with syntax highlighters.
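Something along these lines, as a sketch. `renderFencedCode()` is a hypothetical helper that handles a single, complete backtick-fenced block and nothing else; it is not Parsedown's code.

```php
<?php
// Hypothetical helper for illustration; not part of Parsedown.
function renderFencedCode(string $markdown): ?string
{
    // Opening fence of three or more backticks, an optional language word,
    // the body, and a closing fence of the same length.
    if (!preg_match('/^(`{3,})[ ]*([\w+-]*)[ ]*\n(.*?)\n\1[ ]*$/s', $markdown, $m)) {
        return null;
    }

    // Only add the class attribute when a language was given.
    $class = $m[2] !== '' ? ' class="language-' . $m[2] . '"' : '';

    return '<pre><code' . $class . '>'
        . htmlspecialchars($m[3], ENT_NOQUOTES, 'UTF-8')
        . '</code></pre>';
}

echo renderFencedCode("```ruby\nsome.code()\n```"), "\n";
// <pre><code class="language-ruby">some.code()</code></pre>
```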

Implant new tags

Hi!
I suggest adding new tags to extend Parsedown.
For example:
^(x) becomes x
_(x) becomes x
[ [ x ] ] (or something else, I'm still looking) becomes x
I don't know about < kbd > tags.
I don't know about < audio > and < video > tags.
PS: Sorry for my ~~english~~ globish!

No way to link to Markdown/Parsedown comparison on parsedown.org

It would be really helpful for bug reports if there was a way to link the Markdown/Parsedown comparison on parsedown.org with a predefined string.

I'd be happy to write the code if you'd open up the code for it. I'm sure it would be pretty simple.

Line Breaks

Nothing big, but Parsedown creates <br/> tags which don't work in certain ancient browsers.
Changing the behavior to creating <br /> fixes the problem.

Allow "extensions"

It might be interesting to allow a more "modular" approach. It is not really easy to create a ParsedownExtra class that extends the main Parsedown class and adds additional parsing.

This could be handy for when the original spec is stable and you want to add the fenced code blocks, and maybe automatic url detection in quick blocks.

Fenced block support

Hey Erusev,

are you planning to include support for the fenced block notation (https://help.github.com/articles/github-flavored-markdown#fenced-code-blocks)?

This is a rather useful feature for my userbase :)

Thanks!

Specify start attribute on ordered lists

According to the Markdown spec any number with a following dot could start an ordered list.

Like so:

5. fifth
6. sixth
7. seventh

In the HTML5 spec there is a start attribute on ol elements which could indicate the start of the list.

So this should produce something like:

<ol start="5">
<li>fifth</li>
<li>sixth</li>
<li>seventh</li>
</ol>

Other implementations which do that:

Thanks should go to @wkpark for revealing it in #99.
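A minimal sketch of the proposal, assuming a hypothetical `renderOrderedList()` helper that receives the list lines one by one. It only shows where the start number would come from and is not Parsedown's list handling.

```php
<?php
// Hypothetical helper for illustration; not part of Parsedown.
function renderOrderedList(array $lines): string
{
    $items = [];
    $start = null;

    foreach ($lines as $line) {
        // "<number>. <text>" marks a list item.
        if (preg_match('/^(\d+)[.][ ]+(.*)$/', $line, $m)) {
            // The number of the first item decides the start attribute.
            if ($start === null) {
                $start = (int) $m[1];
            }
            $items[] = '<li>' . htmlspecialchars($m[2], ENT_NOQUOTES, 'UTF-8') . '</li>';
        }
    }

    // Omit the attribute for lists that start at 1, which is the HTML default.
    $open = ($start === null || $start === 1) ? '<ol>' : '<ol start="' . $start . '">';

    return $open . "\n" . implode("\n", $items) . "\n</ol>";
}

echo renderOrderedList(['5. fifth', '6. sixth', '7. seventh']), "\n";
```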

4 or more asterisks causes differing results

Hi,

I ran this through your demo:

****test****

and got different results from Parsedown vs PHP Markdown 1.3.

Markdown yields: ****test****
Parsedown yields: test

parsedown.org comments

First congrats on the new version of the website!

I have several issues/questions/comments:

You could open source it and get suggestions.
The previous homepage with visualized parsing of Markdown was an incredible design decision. I have even given it out as an example. It is sad to see it go. Would you reconsider?
~~The tests in the website do not include all of the tests used by PHPUnit. Is there any particular reason for that?~~
You could have the tabbed interface even on the homepage.

All of these could have been a different issue/PR and discussed separately with code examples, but since it's not open sourced I am listing them here all at once.

Bold and italic over multiple lines

Judging from other markdown parsers, bold and italic should work over multiple lines.

Example:

**test
dsf
sfd
fsd
fds
fds
sdf
fsd
fds**

Adding the \s flag to the regex seems to fix this.
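Reading the suggestion as the PCRE `s` (dot-all) modifier, which is an assumption on my part, the effect can be checked with a simplified pattern. The pattern below is made up for the example and is not the one Parsedown uses.

```php
<?php
$text = "**test\ndsf\nfds**";

// Simplified bold pattern, assumed for illustration only.
$singleLine = '/^[*]{2}(.+?)[*]{2}/';
// Same pattern with the s modifier, so "." also matches newlines.
$dotAll = '/^[*]{2}(.+?)[*]{2}/s';

var_dump(preg_match($singleLine, $text)); // int(0)
var_dump(preg_match($dotAll, $text));     // int(1)
```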

line break broken

If you try parsing

this
breaks

you should get

<p>
this<br/>
breaks
</p>

but instead you get

<p>
this
breaks
</p>

So the test page on the website is not showing the same thing as the demo page

Problems with HTML

When using HTML, it gets wrapped in <p> tags.

<div class="test">

Nunc accumsan scelerisque pellentesque. Proin sed placerat turpis. Nam quis odio placerat risus molestie varius sit amet quis lorem. Morbi sed elit massa. Aliquam egestas elit quis ligula fringilla, ac placerat libero aliquam.

</div>

Would make:

<p><div class="test"></p>
<p>Nunc accumsan scelerisque pellentesque. Proin sed placerat turpis. Nam quis odio placerat risus molestie varius sit amet quis lorem. Morbi sed elit massa. Aliquam egestas elit quis ligula fringilla, ac placerat libero aliquam.</p>
<p></div></p>

Problem with lists before code blocks

I just encountered an issue where a list directly before an indented code block which begins with a docblock will confuse the parser into thinking the docblock is an extension of the list. Below is an example of the issue I encountered.

list item 1
list item 2

/**
- This confuses parser
  */

It obviously confuses the GitHub Flavored Markdown parser as well. I didn't expect that!

I was able to work around the issue like so.

list item 1
list item 2

Below is a code sample:

/**
 * This confuses parser, unless you add some text
 * after the list and before the code block.
 */

Implementing fenced code blocks (issue #2) would help a lot, as you wouldn't need to indent code blocks.

list item 1
list item 2

/**
 * This confuses parser, but fenced code blocks avoid the issue
 */

HTML entities parsing

Is this desired behavior or bug?

Parts of list item missing in parsed output

The following markdown:

* [[\yii\caching\ApcCache]]: uses PHP [APC](http://php.net/manual/en/book.apc.php) extension. This option can be
  considered as the fastest one when dealing with cache for a centralized thick application (e.g. one
  server, no dedicated load balancers, etc.).

* [[\yii\caching\DbCache]]: uses a database table to store cached data. By default, it will create and use a
  [SQLite3](http://sqlite.org/) database under the runtime directory. You can explicitly specify a database for
  it to use by setting its `db` property.

Renders like this in GFM:

[[\yii\caching\ApcCache]]: uses PHP APC extension. This option can be
considered as the fastest one when dealing with cache for a centralized thick application (e.g. one
server, no dedicated load balancers, etc.).
[[\yii\caching\DbCache]]: uses a database table to store cached data. By default, it will create and use a
SQLite3 database under the runtime directory. You can explicitly specify a database for
it to use by setting its db property.

but in parsedown output there are parts missing from the list item text.

Autolinking feature does not cover last slash of url

This Markdown:
jQuery UI: http://jqueryui.com/

gets converted to this HTML:
jQuery UI: <a href="http://jqueryui.com">http://jqueryui.com</a>/

but it should be:
jQuery UI: <a href="http://jqueryui.com">http://jqueryui.com/</a>

right?
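A sketch of an autolinker that keeps the trailing slash. `autolink()` and its pattern are assumptions for illustration, not Parsedown's code. Unlike the expected output above, this version keeps the slash in the href as well as in the link text.

```php
<?php
// Hypothetical helper for illustration; not part of Parsedown.
function autolink(string $text): string
{
    return preg_replace_callback(
        '~\bhttps?://[^\s<>]+~',
        function (array $m): string {
            // Leave trailing sentence punctuation outside the link,
            // but keep a trailing slash as part of the URL.
            $url = rtrim($m[0], '.,;:!?');
            $rest = substr($m[0], strlen($url));
            $safe = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');

            return '<a href="' . $safe . '">' . $safe . '</a>' . $rest;
        },
        $text
    );
}

echo autolink('jQuery UI: http://jqueryui.com/'), "\n";
// jQuery UI: <a href="http://jqueryui.com/">http://jqueryui.com/</a>
```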

Indentation

Continued from #60...

Note: this issue is not just for lists, everyone is free to suggest which elements and how they should be indented.

I'd like to write some more tests

Tests are free, so I was thinking about writing some tests to cover some more complex use cases. Any objection to me writing more tests, even if they don't address a current known issue?