Co-creator of nota.md and intellibar.app.
erusev / parsedown
Better Markdown Parser in PHP
Home Page: https://parsedown.org
License: MIT License
Example source which uses this feature extensively: https://github.com/aws/aws-sdk-php/blob/master/README.md
This is part of the original Markdown spec: http://daringfireball.net/projects/markdown/syntax#link
And according to your demo, PHP Markdown 1.3 supports it: http://parsedown.org/explorer/
Hey Erusev,
Your project sounds interesting, especially the part about it being fast :)
Is it possible to add Composer support and register it on Packagist so that projects can include it via Composer?
Thanks!
From the Markdown spec:
The backtick delimiters surrounding a code span may include spaces (one after the opening, one before the closing). This allows you to place literal backtick characters at the beginning or end of a code span.
Here is a failing test:
A single backtick in a code span: `` ` ``
A backtick-delimited string in a code span: `` `foo` ``
Expected output:
<p>A single backtick in a code span: <code>`</code></p>
<p>A backtick-delimited string in a code span: <code>`foo`</code></p>
Parsedown includes the spaces inside the <code> element, where a browser would render them literally:
<p>A single backtick in a code span: <code> ` </code></p>
<p>A backtick-delimited string in a code span: <code> `foo` </code></p>
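A minimal sketch of the behaviour the spec describes, assuming a standalone helper (`renderCodeSpans` is a hypothetical name, not Parsedown's API): the span closes at a backtick run of the same length as the opening one, and the padding whitespace next to the delimiters is trimmed before the content is escaped.

```php
<?php
// Hypothetical helper (not Parsedown's API): render `code spans` in one line of text.
function renderCodeSpans(string $text): string
{
    // Opening run of N backticks, lazily matched content, then a closing run of
    // exactly N backticks (the lookarounds stop it matching part of a longer run).
    $pattern = '/(?<!`)(`+)(?!`)(.+?)(?<!`)\1(?!`)/s';

    return preg_replace_callback($pattern, function (array $m): string {
        // Drop the spaces the spec allows just inside the delimiters.
        $code = trim($m[2], " \t");
        return '<code>' . htmlspecialchars($code, ENT_QUOTES, 'UTF-8') . '</code>';
    }, $text);
}

echo renderCodeSpans('A single backtick in a code span: `` ` ``'), "\n";
echo renderCodeSpans('A backtick-delimited string in a code span: `` `foo` ``'), "\n";
```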
I would like to remove the <p></p> wrapping the HTML output, since I have not asked for it.
Just add an optional parameter to parse() to turn it off:
$parser = new Parsedown();
$output = $parser->parse($input, false); // Without <p></p>
$output = $parser->parse($input, true);  // With <p></p>
$output = $parser->parse($input);        // With <p></p> (default)
Thanks
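Until such an option exists, one workaround is to strip the wrapper after parsing. A minimal sketch, assuming a hypothetical `unwrapParagraph` helper: it only unwraps output that is exactly one paragraph and returns anything else unchanged.

```php
<?php
// Hypothetical post-processing helper: remove the <p></p> wrapper when the
// parsed output consists of exactly one paragraph.
function unwrapParagraph(string $html): string
{
    $html = trim($html);
    if (preg_match('#^<p>(.*)</p>$#s', $html, $m) && strpos($m[1], '<p>') === false) {
        return $m[1];
    }
    // Several paragraphs (or other block elements): leave the markup alone.
    return $html;
}
```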
This example converts fine in the official parser but not in Parsedown:
<b>Starting with an inline tag</b> breaks subsequent parsing.
- a
- b
- c
Wrapping the would-be paragraph in a div prevents the issue, as does starting it with plain text.
This may relate to #48
I'm not using a git submodule, so to track updates I have to check tags / releases. Mind including the latest release number in the header?
Parsedown does not support email links.
http://daringfireball.net/projects/markdown/syntax#autolink
According to the spec, the email address should also be obfuscated.
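A sketch of how `<address@example.com>` autolinks could become obfuscated `mailto:` links, in the spirit of Markdown.pl. Assumptions: the function names are hypothetical, and every byte is encoded as a decimal entity for predictability, whereas Markdown.pl randomises the encoding of each character.

```php
<?php
// Encode every byte as a decimal HTML entity so the address is not plain text in the source.
function obfuscateEmail(string $value): string
{
    $encoded = '';
    foreach (str_split($value) as $char) {
        $encoded .= '&#' . ord($char) . ';';
    }
    return $encoded;
}

// Turn <user@example.com> autolinks into obfuscated mailto: links.
function emailAutolinks(string $text): string
{
    return preg_replace_callback('/<([^\s<>@]+@[^\s<>@]+\.[^\s<>@]+)>/', function (array $m): string {
        $href = obfuscateEmail('mailto:' . $m[1]);
        return '<a href="' . $href . '">' . obfuscateEmail($m[1]) . '</a>';
    }, $text);
}
```

Browsers decode the entities, so the link still works, while naive scrapers looking for a literal `@` miss it.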
Would it work if you replace the instance function with __construct?
The following is valid Markdown, but gives me $0 as output in Parsedown.
As GitHub's Markdown rendering shows, it works there.
When I tested this tool on the demo page (http://parsedown.org/explorer/) with the source from http://daringfireball.net/projects/markdown/index.text, I saw many errors.
This is an extension to #100.
The HTML5 spec specifies a value attribute for li elements inside ol elements.
So a Markdown list like:
5. fifth
6. sixth
7. seventh
could produce HTML like:
<ol>
<li value="5">fifth</li>
<li value="6">sixth</li>
<li value="7">seventh</li>
</ol>
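A minimal sketch of the proposed output, assuming one-line items with no nesting or inline markup; `renderOrderedListWithValues` is a hypothetical name.

```php
<?php
// Hypothetical helper: keep each item's own number via the li value attribute.
function renderOrderedListWithValues(string $markdown): string
{
    // "5. fifth" -> number "5", text "fifth"
    preg_match_all('/^(\d+)\.[ \t]+(.*)$/m', $markdown, $items, PREG_SET_ORDER);

    $html = "<ol>\n";
    foreach ($items as [, $number, $text]) {
        // Use the author's number instead of renumbering from 1.
        $html .= sprintf("<li value=\"%d\">%s</li>\n", $number, htmlspecialchars($text, ENT_QUOTES, 'UTF-8'));
    }
    return $html . '</ol>';
}
```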
When I have <iframe></iframe> on the same line, the markup that comes after it is not rendered. When the HTML is written like <iframe></iframe> it renders correctly. This can be seen directly on http://parsedown.org/demo.
There might be a minor "syntax" error. If you use two underscores in a function name, for example, the part between them becomes italic: my_special_function (which is not the case here on GitHub). I think you need to check for whitespace before/after the _ before marking text as italic.
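One way to implement the suggested check, sketched as a hypothetical helper: an underscore only opens or closes emphasis when it is not touching a word character on the outside, so identifiers like `my_special_function` survive. This follows the intraword rule that PHP Markdown Extra and GFM apply to underscores; it is not Parsedown's code.

```php
<?php
// Hypothetical helper: _emphasis_ only when the underscores are not inside a word.
function renderUnderscoreEmphasis(string $text): string
{
    // (?<!\w) and (?!\w) reject underscores with a letter, digit or "_" next to them
    // on the outside; the body must start and end with a non-space character.
    return preg_replace('/(?<!\w)_(?!_)(\S(?:.*?\S)?)_(?!\w)/', '<em>$1</em>', $text);
}
```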
I used this Markdown source to test it.
When line breaks are enabled, Parsedown generates a lot of unnecessary <br> tags. I think this is because it reads the input line by line.
I'm using highlight.js, and it expects code blocks to be <pre><code>, not <code><pre>. Can we add an option to flip those?
Table support would be really nice and useful, see https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet#wiki-tables
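To give an idea of the work involved, here is a minimal sketch of GFM-style pipe-table parsing, assuming a header row, a delimiter row, and body rows with no column alignment, escaped pipes, or inline markup; `renderPipeTable` is a hypothetical name.

```php
<?php
// Hypothetical helper: render a GFM-style pipe table, or return null if the text is not one.
function renderPipeTable(string $markdown): ?string
{
    // Non-empty, trimmed lines only.
    $lines = array_values(array_filter(array_map('trim', explode("\n", trim($markdown))), 'strlen'));

    // Line 2 must be a delimiter row such as |---|---| or | :--- | ---: |
    if (count($lines) < 2 || !preg_match('/^\|?\s*:?-+:?\s*(\|\s*:?-+:?\s*)*\|?$/', $lines[1])) {
        return null;
    }

    // Split a row on pipes, ignoring the optional leading and trailing pipe.
    $cells = function (string $row): array {
        return array_map('trim', explode('|', trim($row, '|')));
    };

    $html = "<table>\n<thead>\n<tr>";
    foreach ($cells($lines[0]) as $cell) {
        $html .= '<th>' . htmlspecialchars($cell, ENT_QUOTES, 'UTF-8') . '</th>';
    }
    $html .= "</tr>\n</thead>\n<tbody>\n";

    foreach (array_slice($lines, 2) as $row) {
        $html .= '<tr>';
        foreach ($cells($row) as $cell) {
            $html .= '<td>' . htmlspecialchars($cell, ENT_QUOTES, 'UTF-8') . '</td>';
        }
        $html .= "</tr>\n";
    }

    return $html . "</tbody>\n</table>";
}
```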
According to the Markdown syntax specification:
To create an inline link, use a set of regular parentheses immediately after the link text's closing square bracket. Inside the parentheses, put the URL where you want the link to point, along with an optional title for the link, surrounded in quotes.
Also:
You can put the title attribute on the next line and use extra spaces or tabs for padding, which tends to look better with longer URLs
So
[URL and title](/url/ "title").
should be
<p><a href="/url/" title="title">URL and title</a>.</p>
And
[URL and title](/url/ "title preceded by a tab").
should be
<p><a href="/url/" title="title preceded by a tab">URL and title</a>.</p>
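A sketch of an inline-link pattern that accepts any run of whitespace (spaces, a tab, or a newline) between the URL and a quoted title. `renderInlineLinks` is a hypothetical name; nested brackets, angle-bracketed URLs, and parenthesised titles are not handled.

```php
<?php
// Hypothetical helper: [text](url "title") with any whitespace before the title.
function renderInlineLinks(string $text): string
{
    // \s+ before the title accepts spaces, tabs, or a line break; group 3 captures the
    // opening quote so that the closing quote must be the same character.
    $pattern = '/\[([^\]]*)\]\(\s*(\S+?)(?:\s+(["\'])(.*?)\3)?\s*\)/s';

    return preg_replace_callback($pattern, function (array $m): string {
        $html = '<a href="' . htmlspecialchars($m[2], ENT_QUOTES, 'UTF-8') . '"';
        if (isset($m[4])) {
            $html .= ' title="' . htmlspecialchars($m[4], ENT_QUOTES, 'UTF-8') . '"';
        }
        return $html . '>' . htmlspecialchars($m[1], ENT_QUOTES, 'UTF-8') . '</a>';
    }, $text);
}
```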
Hi,
First thanks for this amazing tool.
I think I caught a small bug (or something new to document). If you convert this Markdown block (take the raw source, not the rendered page), everything is fine (you can use your own demo):
#!/bin/bash
# Just a comment
If you add an unordered list (* Test) just above the fenced code block, the rendering breaks. I can reproduce it with the GitHub engine, but php-parsedown 1.4 handles it fine.
Note that if there is a new line between the unordered list and the code block then there is no problem.
Thanks in advance.
2. aaaa
aaaa
aaaa
3. aaaa
bbbb
cccc
4. aaaa
bbbb
cccc
----
2. aaaa
aaaa
aaaa
3. aaaa
bbbb
cccc
4. aaaa
bbbb
cccc
- aaaa
aaaa
aaaa- aaaa
bbbb
cccc- aaaa
bbbb
cccc
- aaaa
aaaa
aaaa- aaaa
bbbb
cccc- aaaa
bbbb
cccc
The GFM renderer and PHP Markdown Extra render it correctly (except for the <br> tags), but Parsedown doesn't.
Here is a simple fix for it:
diff --git a/Parsedown.php b/Parsedown.php
index 7af08c9..e58d344 100755
--- a/Parsedown.php
+++ b/Parsedown.php
@@ -206,6 +206,8 @@ class Parsedown
$block['lines'] = array(
preg_replace('/^[ ]{0,4}/', '', $matches[3]),
);
+
+ unset($block['interrupted']);
}
continue 2;
GFM lets you add a line break by ending the line with just one space. Adding two spaces produces an empty line.
I think this would be a nice thing to have in Parsedown. If nothing else, it would save developers the few seconds it takes to type one more space.
I know the spec says line breaks are created with two or more spaces, but you could make it one or more. Two spaces producing an empty line isn't that important. This would preserve backward compatibility.
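The proposed change amounts to lowering a threshold. A sketch, assuming a hypothetical `renderHardBreaks` helper that takes the minimum number of trailing spaces as a parameter (the original Markdown rule corresponds to 2):

```php
<?php
// Hypothetical helper: turn trailing spaces at the end of a line into <br />.
function renderHardBreaks(string $text, int $minSpaces = 2): string
{
    // N or more spaces followed by a newline end the line with a hard break.
    return preg_replace('/ {' . $minSpaces . ',}\n/', "<br />\n", $text);
}
```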
I'd like to suggest a few cosmetic changes to how Parsedown renders lists:
In nested lists, subsequent <ul> tags shouldn't be inline. Right now they appear as </li><ul>, that <ul> should be on new line.
Indentation of <li> with two spaces from its parent <ul>. For nested lists, this should increase, i.e. a second level <ul> should be indented with two spaces from its parent <li> and second level <li> should be indented with two spaces from its parent second level <ul>. Example:
<ul>
--<li></li>
--<li>
----<ul>
------<li></li>
----</ul>
--</li>
--<li></li>
</ul>
Note: these changes must apply to <ol> as well.
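A sketch of a renderer that produces the proposed layout, assuming list items are already parsed into `[text, children]` pairs; `renderList` and that array shape are hypothetical, and nested lists reuse the parent's tag for brevity.

```php
<?php
// Hypothetical renderer: two spaces of indentation per nesting step, nested lists on their own lines.
function renderList(array $items, string $tag = 'ul', int $depth = 0): string
{
    $pad = str_repeat('  ', $depth);
    $html = "{$pad}<{$tag}>\n";
    foreach ($items as [$text, $children]) {
        $liPad = $pad . '  ';
        if ($children === []) {
            $html .= "{$liPad}<li>" . htmlspecialchars($text, ENT_QUOTES, 'UTF-8') . "</li>\n";
            continue;
        }
        // The nested list starts on a new line, two spaces deeper than its parent <li>.
        $html .= "{$liPad}<li>" . htmlspecialchars($text, ENT_QUOTES, 'UTF-8') . "\n";
        $html .= renderList($children, $tag, $depth + 2);
        $html .= "{$liPad}</li>\n";
    }
    return $html . "{$pad}</{$tag}>\n";
}
```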
When I add a code block after an <li>, the 4-space (or 1-tab) indentation is removed. Example:
* this is list
this is code
will parse as:
<li>this is list
this is code</li>
instead of:
<li>this is list</li>
<pre><code>
this is code
</code></pre>
Additionally, <pre><code> and their closing tags shouldn't be inline. The code will look nicer and will have better human readability.
Hi,
thanks for this great and fast Markdown parser :) When I used it on this Markdown file: https://github.com/RexDude/seo42/blob/master/README.md
I get the following PHP warning 6 times (EDIT: due to a README.md update, it is now 10 times), and none of the links get converted.
Warning: preg_match_all(): Compilation failed: internal error: previously-checked referenced subpattern not found at offset 37 in /home/dude/Projekte/Web/AddonFactory/htdocs/redaxo/include/addons/seo42/classes/class.parsedown.inc.php on line 494
Is this a problem with my Markdown file or with the parser?
Here is an example for adjacent code spans:
`code span``another code span`
The spec doesn't say anything about what could follow after a code span. So it could even be another code span.
However, Parsedown currently "sees" a code span only if it is not followed by a backtick (`). Relevant regex: line 977 in 7a4d3c0.
I think this should be the actual behavior:
`code span``another code span`
<p><code>code span</code><code>another code span</code></p>
Right now, Parsedown produces:
<p><code>code span`</code>another code span`</p>
And Dingus:
<p><code>code span``another code span</code></p>
When using italic inside bold text, you get invalid HTML:
$text = 'This is **bold and *italic***.';
echo Parsedown::instance()->parse($text);
Result:
<p>This is <strong>bold and <em>italic</strong></em>.</p>
The other way round does work correctly:
$text = 'This is *italic and **bold***.';
echo Parsedown::instance()->parse($text);
Result:
<p>This is <em>italic and <strong>bold</strong></em>.</p>
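One ordering that yields well-formed HTML for both inputs above, sketched as a hypothetical helper that handles asterisks only (no underscores, escapes, or intraword rules): match `**strong**` first, allowing its body to contain complete `*em*` spans but no stray asterisks, then match `*em*` in what remains.

```php
<?php
// Hypothetical helper: asterisk emphasis that nests <em> inside <strong> correctly.
function renderEmphasis(string $text): string
{
    // Strong first: the body may contain whole *em* spans but no stray asterisks,
    // so "**bold and *italic***" closes the strong right after the inner em.
    $text = preg_replace('/\*\*((?:[^*]|\*[^*]+\*)+)\*\*/', '<strong>$1</strong>', $text);
    // Then single-asterisk emphasis over whatever is left.
    return preg_replace('/\*([^*]+)\*/', '<em>$1</em>', $text);
}
```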
This is from John Gruber's test suite (http://daringfireball.net/projects/downloads/MarkdownTest_1.0.zip), and the following example is rendered incorrectly by Parsedown.
In Markdown 1.0.0 and earlier. Version
8. This line turns into a list item.
Because a hard-wrapped line in the
middle of a paragraph looked like a
list item.
Here's one with a bullet.
* criminey.
In Markdown 1.0.0 and earlier. Version
8. This line turns into a list item.
Because a hard-wrapped line in the
middle of a paragraph looked like a
list item.
Here's one with a bullet.
Markdown is supposed to support adding title text to an image by putting it in quotes after the URL, like the following:
![Alt Text](http://www.google.com/images/srpr/logo9w.png "Title Text")
With GitHub's Markdown, the code above produces both the alt text and the title text; Parsedown omits the title.
Parsedown is missing support for the title text. In order to follow the standard and allow users to use Markdown to its fullest potential, support for this should be added.
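A sketch of image parsing with an optional double-quoted title, assuming a hypothetical `renderImages` helper (single-quoted or parenthesised titles are not handled):

```php
<?php
// Hypothetical helper: ![alt](src "title") with an optional double-quoted title.
function renderImages(string $text): string
{
    $pattern = '/!\[([^\]]*)\]\(\s*(\S+?)(?:\s+"([^"]*)")?\s*\)/';

    return preg_replace_callback($pattern, function (array $m): string {
        $esc = function (string $s): string {
            return htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
        };
        $html = '<img src="' . $esc($m[2]) . '" alt="' . $esc($m[1]) . '"';
        if (isset($m[3])) {
            // Only emitted when a title was given.
            $html .= ' title="' . $esc($m[3]) . '"';
        }
        return $html . ' />';
    }, $text);
}
```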
I just wrote a command line .phar version of parsedown: https://github.com/scottchiefbaker/MarkdownCLI
It would be awesome to output the version of the Parsedown library that was used to build the .phar. If there were a PHP function parsedown_version(), I could include that version in my usage() output.
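A sketch of how a build script could read the version, assuming the library exposes it as a class constant named `version` (recent copies of Parsedown.php appear to define one, but check yours). `FakeParsedown` is a stand-in so the example runs on its own, and `libraryVersion` is a hypothetical helper.

```php
<?php
// Stand-in for the real class so the example is self-contained.
class FakeParsedown
{
    const version = '1.2.3';
}

// Return the class's `version` constant, or 'unknown' if it does not define one.
function libraryVersion(string $class): string
{
    $constant = $class . '::version';
    return defined($constant) ? (string) constant($constant) : 'unknown';
}

echo 'Built with Parsedown ' . libraryVersion('FakeParsedown') . "\n";
```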
If the Markdown to be parsed comes directly from a user (as in a comment or post), then we must use htmlentities() or htmlspecialchars() prior to parsing it, in order to prevent XSS:
$_POST['markdown'] = <<<'MD'
<script>alert('xss');</script>
~~~php
$foo = true && (1 < 7 ? "bar" : '>');
~~~
MD;
// Bad!
$markup = Parsedown::instance()->parse($_POST['markdown']);
// Good
$markdown = htmlspecialchars($_POST['markdown'], ENT_QUOTES, 'UTF-8');
$markup = Parsedown::instance()->parse($markdown);
The problem here is that any entities within fenced code blocks then become double-encoded, because these are encoded internally by Parsedown.
My first thought was to simply use strip_tags() instead of encoding; however, this breaks Markdown code blocks that contain HTML/XML, as strip_tags() is not context-aware.
My next thought was to change the htmlspecialchars() calls within Parsedown (where we encode the contents of fenced code blocks) to use the 4th parameter ($double_encode) and turn off double-encoding. Unfortunately, that leaves us unable to represent literal encoded entities within code blocks, e.g.:
$foo = '&amp;';
We can require the calling code to cherry-pick the parts of the input to encode entities in (everything but the content between code block fences), but that requires a fair bit of markdown parsing outside the actual markdown parser.
The only other thing I can think of would be to use a DOM parser in conjunction with htmlspecialchars_decode() to un-double-encode things inside <code></code> blocks in the parsed Markdown, but this seems like it would just be putting a band-aid on the problem instead of actually fixing it.
I don't have a good solution in mind yet; I just wanted to make the issue known and open discussion (maybe I missed something obvious).
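For reference, the cherry-picking approach mentioned above can be sketched as follows. Assumptions: `escapeOutsideFences` is a hypothetical helper; only fenced blocks are recognised (indented code blocks and inline code spans would still be double-encoded); closing fences are matched by character only; and escaping everything else also encodes the `>` of blockquotes and `<...>` autolinks, so those features are lost.

```php
<?php
// Hypothetical pre-processing step: HTML-encode user input everywhere except inside
// fenced code blocks, which the Markdown parser will encode itself (exactly once).
function escapeOutsideFences(string $markdown): string
{
    $out = [];
    $fenceChar = null; // '`' or '~' while inside a fence, null outside
    foreach (explode("\n", $markdown) as $line) {
        if (preg_match('/^(`{3,}|~{3,})/', $line, $m)) {
            if ($fenceChar === null) {
                $fenceChar = $m[1][0];        // opening fence
            } elseif ($m[1][0] === $fenceChar) {
                $fenceChar = null;            // closing fence of the same kind
            }
            $out[] = $line;
            continue;
        }
        $out[] = $fenceChar === null ? htmlspecialchars($line, ENT_QUOTES, 'UTF-8') : $line;
    }
    return implode("\n", $out);
}
```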
Hi,
On GitHub, tags like script are escaped. I think this could be a good option to have. Maybe it is as simple as $text = htmlentities($text);, but it would require an options array and might lead to further additions...
Thanks in advance
From the Markdown spec:
The following three link definitions are equivalent:
[foo]: http://example.com/ "Optional Title Here"
[foo]: http://example.com/ 'Optional Title Here'
[foo]: http://example.com/ (Optional Title Here)
Parsedown does not support the third case. Here is a failing test (input, then expected output):
[Reference link][link title with double quotes]
[Reference link][link title with single quotes]
[Reference link][link title with parentheses]
[Reference link with a space] [link title with double quotes]
[Reference link with a space] [link title with single quotes]
[Reference link with a space] [link title with parentheses]
[link title with double quotes]: http://example.com/ "Double Quotes Title"
[link title with single quotes]: http://example.com/ 'Single Quotes Title'
[link title with parentheses]: http://example.com/ (Parentheses Title)
<p><a href="http://example.com/" title="Double Quotes Title">Reference link</a>
<a href="http://example.com/" title="Single Quotes Title">Reference link</a>
<a href="http://example.com/" title="Parentheses Title">Reference link</a></p>
<p><a href="http://example.com/" title="Double Quotes Title">Reference link with a space</a>
<a href="http://example.com/" title="Single Quotes Title">Reference link with a space</a>
<a href="http://example.com/" title="Parentheses Title">Reference link with a space</a></p>
Currently Parsedown outputs HTML like:
<a href="http://example.com/ (Parentheses Title)">Reference link</a></p>
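A sketch of a definition pattern that accepts all three title delimiters; `parseReferenceDefinition` and its return shape are hypothetical, and titles placed on the following line are not handled.

```php
<?php
// Hypothetical helper: parse "[id]: url title" where the optional title is wrapped in
// "double quotes", 'single quotes', or (parentheses).
function parseReferenceDefinition(string $line): ?array
{
    $pattern = '/^ {0,3}\[([^\]]+)\]:\s*<?(\S+?)>?(?:\s+(?:"([^"]*)"|\'([^\']*)\'|\(([^)]*)\)))?\s*$/';
    if (!preg_match($pattern, $line, $m)) {
        return null;
    }

    // Groups 3, 4 and 5 hold the title for the three delimiter styles; unused ones are empty or missing.
    $title = null;
    foreach ([3, 4, 5] as $group) {
        if (isset($m[$group]) && $m[$group] !== '') {
            $title = $m[$group];
            break;
        }
    }

    // Reference ids are matched case-insensitively.
    return ['id' => strtolower($m[1]), 'url' => $m[2], 'title' => $title];
}
```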
I noticed some strange behaviour with multi-line list items:
I don't think these <p> tags should be created for multi-line list items: http://daringfireball.net/projects/markdown/syntax#list
As described on Daring Fireball, is it possible to add Setext-style headers?
Something like this isn't parsed the way the specification describes:
"This is an H1"
"============="
"This is an H2"
"-------------"
(I added the quotes because GitHub would otherwise render these as headings.)
Would be awesome to have this.
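A sketch of Setext header detection, assuming a hypothetical `renderSetextHeaders` helper run over the raw text; a full parser would also have to keep this from clashing with lists, code blocks, and `---` rules.

```php
<?php
// Hypothetical helper: turn Setext-style headers into <h1>/<h2>.
function renderSetextHeaders(string $markdown): string
{
    // A non-empty line followed by a line made only of "=" (h1) or "-" (h2).
    return preg_replace_callback('/^(.+)\n(=+|-+)[ \t]*$/m', function (array $m): string {
        $level = $m[2][0] === '=' ? 1 : 2;
        return "<h{$level}>" . htmlspecialchars(trim($m[1]), ENT_QUOTES, 'UTF-8') . "</h{$level}>";
    }, $markdown);
}
```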
e.g.
```ruby
some.code()
```
would become
<pre><code class="language-ruby">some.code()</code></pre>
This is in line with the WHATWG spec and would allow Parsedown to be paired with syntax highlighters.
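A sketch of the requested output, assuming a hypothetical `renderFencedBlocks` helper: the word after the opening fence becomes a `language-` class, the block is emitted as `<pre><code>` (the nesting highlight.js looks for), and the code is HTML-encoded exactly once.

```php
<?php
// Hypothetical helper: fenced code blocks -> <pre><code class="language-xxx">.
function renderFencedBlocks(string $markdown): string
{
    // Opening fence, optional language word, body, and an identical closing fence.
    $pattern = '/^(`{3,}|~{3,})[ \t]*([\w+-]*)[ \t]*\n(.*?)\n\1[ \t]*$/ms';

    return preg_replace_callback($pattern, function (array $m): string {
        $class = $m[2] !== '' ? ' class="language-' . htmlspecialchars($m[2], ENT_QUOTES, 'UTF-8') . '"' : '';
        // <pre> outside <code>; the code itself is encoded once here.
        return "<pre><code{$class}>" . htmlspecialchars($m[3], ENT_QUOTES, 'UTF-8') . '</code></pre>';
    }, $markdown);
}
```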
Hi !
I suggest implementing new tags to extend Parsedown.
For example:
^(x) becomes <sup>x</sup>
_(x) becomes <sub>x</sub>
[[x]] (or another syntax, I'm still looking for one) becomes x
I don't know yet for <kbd> tags.
I don't know yet for <audio> and <video> tags.
PS: Sorry for my Globish English!
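A sketch of the two proposed inline rules, assuming the `^(x)` and `_(x)` syntax from this suggestion (it is not defined by Markdown or Parsedown) and a hypothetical helper name; the captured text is inserted as-is, so it should already be escaped.

```php
<?php
// Hypothetical helper for the proposed syntax: ^(x) -> <sup>x</sup>, _(x) -> <sub>x</sub>.
function renderSupSub(string $text): string
{
    // [^()]* keeps each match inside a single pair of parentheses.
    $text = preg_replace('/\^\(([^()]*)\)/', '<sup>$1</sup>', $text);
    return preg_replace('/_\(([^()]*)\)/', '<sub>$1</sub>', $text);
}
```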
It would be really helpful for bug reports if there were a way to link to the Markdown/Parsedown comparison on parsedown.org with predefined input.
I'd be happy to write the code if you open-sourced it. I'm sure it would be pretty simple.
Nothing big, but Parsedown creates <br/> tags, which don't work in certain ancient browsers. Changing the output to <br /> fixes the problem.
It might be interesting to allow a more "modular" approach. Currently it is not easy to create a ParsedownExtra class that extends the main Parsedown class and adds additional parsing.
This could be handy once the original spec is stable and you want to add fenced code blocks, and maybe automatic URL detection in quick blocks.
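A toy sketch of the kind of structure that would make this easier: block handling lives in a protected method that a subclass can override and fall back from. `MiniParser` and `MiniParserExtra` are hypothetical classes, not Parsedown's actual design, and blocks are split naively on blank lines (so a fenced block containing a blank line would break).

```php
<?php
// Hypothetical base parser with overridable hooks.
class MiniParser
{
    public function text(string $markdown): string
    {
        $html = [];
        foreach (preg_split('/\n{2,}/', trim($markdown)) as $block) {
            $html[] = $this->block($block);
        }
        return implode("\n", $html);
    }

    // Subclasses override this hook to recognise extra block types.
    protected function block(string $block): string
    {
        return '<p>' . $this->inline($block) . '</p>';
    }

    protected function inline(string $text): string
    {
        return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
    }
}

// Adds fenced code blocks, then defers to the parent for everything else.
class MiniParserExtra extends MiniParser
{
    protected function block(string $block): string
    {
        if (preg_match('/^```(\w*)\n(.*)\n```$/s', $block, $m)) {
            return '<pre><code>' . htmlspecialchars($m[2], ENT_QUOTES, 'UTF-8') . '</code></pre>';
        }
        return parent::block($block);
    }
}
```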
Hey Erusev,
are you planning to include support for the fenced block notation (https://help.github.com/articles/github-flavored-markdown#fenced-code-blocks)?
This is a rather useful feature for my userbase :)
Thanks!
According to the Markdown spec any number with a following dot could start an ordered list.
Like so:
5. fifth
6. sixth
7. seventh
In the HTML5 spec there is a start attribute on ol elements which could indicate the start of the list.
So this should produce something like:
<ol start="5">
<li>fifth</li>
<li>sixth</li>
<li>seventh</li>
</ol>
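A minimal sketch, assuming one-line items and a hypothetical `renderOrderedList` helper: the first item's number becomes the `start` attribute, which is omitted when the list begins at 1 because that is the HTML default.

```php
<?php
// Hypothetical helper: ordered list whose start attribute follows the first item's number.
function renderOrderedList(string $markdown): string
{
    preg_match_all('/^(\d+)\.[ \t]+(.*)$/m', $markdown, $m, PREG_SET_ORDER);
    if ($m === []) {
        return '';
    }

    $start = (int) $m[0][1];
    $html = $start !== 1 ? "<ol start=\"{$start}\">\n" : "<ol>\n";
    foreach ($m as $item) {
        $html .= '<li>' . htmlspecialchars($item[2], ENT_QUOTES, 'UTF-8') . "</li>\n";
    }
    return $html . '</ol>';
}
```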
Other implementations which do that:
Hi,
I ran this through your demo:
_test_
and got different results from Parsedown vs PHP Markdown 1.3.
Markdown yields: <p>****test****</p>
Parsedown yields: <p><strong><em></strong>test<strong></em></strong></p>
similar results for:
_test_
Markdown: <p>_test__</p>
Parsedown: <p><strong><em>_test</em></strong></p>
It's my understanding that a run of 4 or more asterisks should result in no formatting and show the asterisks.
Thanks.
First congrats on the new version of the website!
I have several issues/questions/comments:
Each of these could have been a separate issue/PR and discussed separately with code examples, but since the site isn't open source, I am listing them all here at once.
Judging from other markdown parsers, bold and italic should work over multiple lines.
Example:
**test
dsf
sfd
fsd
fds
fds
sdf
fsd
fds**
Adding the s (dot-all) modifier to the regex seems to fix this.
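The difference the `s` modifier makes can be seen with a hypothetical single-rule helper (not the regex from Parsedown itself): without it, `.` does not match a newline, so `**...**` cannot span lines.

```php
<?php
// Hypothetical helper: **strong** with or without the s (dot-all) modifier.
function renderStrong(string $text, bool $dotAll = true): string
{
    // With "s", "." also matches "\n", so the strong span may cross line breaks.
    $pattern = '/\*\*(.+?)\*\*/' . ($dotAll ? 's' : '');
    return preg_replace($pattern, '<strong>$1</strong>', $text);
}
```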
If you try parsing
this
breaks
you should get
<p>
this<br/>
breaks
</p>
but instead you get
<p>
this
breaks
</p>
So the test page on the website does not show the same output as the demo page.
When using HTML, it gets wrapped in <p> tags.
<div class="test">
Nunc accumsan scelerisque pellentesque. Proin sed placerat turpis. Nam quis odio placerat risus molestie varius sit amet quis lorem. Morbi sed elit massa. Aliquam egestas elit quis ligula fringilla, ac placerat libero aliquam.
</div>
Would make:
<p><div class="test"></p>
<p>Nunc accumsan scelerisque pellentesque. Proin sed placerat turpis. Nam quis odio placerat risus molestie varius sit amet quis lorem. Morbi sed elit massa. Aliquam egestas elit quis ligula fringilla, ac placerat libero aliquam.</p>
<p></div></p>
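A sketch of the usual fix, assuming a hypothetical `renderBlock` helper and an illustrative, incomplete list of block-level tags: a block that starts with a block-level opening tag and ends with the matching closing tag is emitted verbatim instead of being wrapped.

```php
<?php
// Hypothetical helper: pass block-level HTML through unchanged, wrap everything else.
function renderBlock(string $block): string
{
    $block = trim($block);
    // Illustrative subset of HTML block-level elements.
    $tags = 'div|table|pre|ul|ol|blockquote|p|h[1-6]|iframe';
    if (preg_match('#^<(' . $tags . ')\b[^>]*>.*</\1>$#is', $block)) {
        return $block;
    }
    return '<p>' . $block . '</p>';
}
```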
I just encountered an issue where a list directly before an indented code block that begins with a docblock confuses the parser into thinking the docblock is a continuation of the list. Below is an example of the issue I encountered.
list item 1
list item 2
/**
It obviously confuses the GitHub Flavored Markdown parser as well. I didn't expect that!
I was able to work around the issue like so.
Below is a code sample:
/**
* This confuses the parser, unless you add some text
* after the list and before the code block.
*/
Implementing fenced code blocks (issue #2) would help a lot, as you wouldn't need to indent code blocks.
/**
* This confuses the parser, but fenced code blocks avoid the issue
*/
Apparently commit 4fecd91 broke parsing of HTML entities. Now they show up as &copy; instead of ©, for example.
Is this intended behavior or a bug?
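If the regression comes from escaping text that already contains entities, one option is to leave existing entities alone when encoding, as in this hypothetical helper built on `htmlspecialchars()` with `double_encode` set to false. It carries the trade-off raised in the XSS discussion above: a literal `&amp;` can then no longer be shown as text.

```php
<?php
// Hypothetical helper: escape text but keep existing entities such as &copy; intact.
function escapeKeepingEntities(string $text): string
{
    // The fourth argument (double_encode = false) stops "&copy;" becoming "&amp;copy;".
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8', false);
}
```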
The following markdown:
* [[\yii\caching\ApcCache]]: uses PHP [APC](http://php.net/manual/en/book.apc.php) extension. This option can be
considered as the fastest one when dealing with cache for a centralized thick application (e.g. one
server, no dedicated load balancers, etc.).
* [[\yii\caching\DbCache]]: uses a database table to store cached data. By default, it will create and use a
[SQLite3](http://sqlite.org/) database under the runtime directory. You can explicitly specify a database for
it to use by setting its `db` property.
This renders correctly in GFM, but in Parsedown's output parts of the list item text are missing.
This Markdown:
jQuery UI: http://jqueryui.com/
get converted to this HTML:
<p>jQuery UI: <a href="http://jqueryui.com">http://jqueryui.com</a>/</p>
but it should be:
<p>jQuery UI: <a href="http://jqueryui.com">http://jqueryui.com/</a></p>
right?
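A sketch of a bare-URL autolinker that keeps a trailing slash but hands back trailing sentence punctuation; `autolinkUrls` and the exact punctuation set are assumptions (stripping `)` would, for example, break URLs that legitimately end in a parenthesis).

```php
<?php
// Hypothetical helper: link bare http(s) URLs, keeping "/" but not a sentence-ending "." or ",".
function autolinkUrls(string $text): string
{
    return preg_replace_callback('#\bhttps?://[^\s<>"]+#i', function (array $m): string {
        // Give back punctuation that most likely ends the sentence rather than the URL.
        $url = rtrim($m[0], '.,;:!?)');
        $rest = substr($m[0], strlen($url));
        $safe = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
        return "<a href=\"{$safe}\">{$safe}</a>" . $rest;
    }, $text);
}
```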
Continued from #60...
Note: this issue is not just for lists, everyone is free to suggest which elements and how they should be indented.
Tests are free, so I was thinking about writing some tests to cover some more complex use cases. Any objection to me writing more tests, even if they don't address a current known issue?