parsedown's People

Contributors

adrilo, aidantwoods, andreybolonin, carusogabriel, cebe, daniel-km, donatj, erusev, garoevans, grahamcampbell, grogy, harikt, henriquemoody, hkdobrev, jbafford, jeanmonod, jmsv, jstanden, kanecohen, kelunik, luizbills, m1guelpf, nanuke, nathanbaulch, paukenba, phrozenbyte, pjona, rhukster, scarwu, wkpark

parsedown's Issues

Add composer / packagist support

Hey Erusev,

Your project sounds interesting, specifically that it should be fast :)

Is it possible to add support for composer and register it with packagist so that projects can include it using composer?

Thanks!

Support spaces as delimiters in inline code spans

From the Markdown spec:

The backtick delimiters surrounding a code span may include spaces: one after the opening, one before the closing. This allows you to place literal backtick characters at the beginning or end of a code span.

Here is a failing test:

A single backtick in a code span: `` ` ``

A backtick-delimited string in a code span: `` `foo` ``
<p>A single backtick in a code span: <code>`</code></p>
<p>A backtick-delimited string in a code span: <code>`foo`</code></p>

Parsedown includes the spaces inside the <code> element. These spaces would eventually be rendered literally by a browser.

<p>A single backtick in a code span: <code> ` </code></p>
<p>A backtick-delimited string in a code span: <code> `foo` </code></p>
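
A minimal sketch of the expected behaviour, assuming a standalone helper (renderCodeSpan is a hypothetical name, not part of Parsedown). It matches a code span at the start of the text and drops one space after the opening delimiter and one before the closing delimiter when both are present:

```php
<?php
// Hypothetical helper for illustration; not Parsedown's actual implementation.
function renderCodeSpan(string $text): ?string
{
    // A run of backticks, the content, then the same run of backticks.
    if (!preg_match('/^(`+)(.+?)\1(?!`)/s', $text, $matches)) {
        return null;
    }

    $content = $matches[2];

    // Strip one padding space from each end, but only when both are present
    // and something would remain.
    if (strlen($content) > 2 && $content[0] === ' ' && substr($content, -1) === ' ') {
        $content = substr($content, 1, -1);
    }

    return '<code>' . htmlspecialchars($content, ENT_NOQUOTES, 'UTF-8') . '</code>';
}

echo renderCodeSpan('`` ` ``'), "\n";
echo renderCodeSpan('`` `foo` ``'), "\n";
```

With the two inputs from the failing test, this should produce the <code> elements without the surrounding spaces.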

Allow to remove the englobing paragraph

I would like to remove the <p></p> englobing the html output as it has not specifically been asked.

Just make an optional parameter to remove it in parse

$parser = new Parsedown();
$output = $parser->parse($input,false);//Without <p></p>
$output = $parser->parse($input,true);//With <p></p>
$output = $parser->parse($input);//With <p></p>

Thanks

Include version number in header

I'm not using a git submodule, so to track updates I have to check tags / releases. Mind including the latest release number in the header?

Problem with italics

There might be a minor "syntax" error. If you use two '_' characters in a function name, for example, the part between them becomes italic: my_special_function (which is not the case here on GitHub). I think you need to check for whitespace before/after the '_' before marking the text as italic.

No support for link titles

According to the Markdown syntax specification:

To create an inline link, use a set of regular parentheses immediately after the link text's closing square bracket. Inside the parentheses, put the URL where you want the link to point, along with an optional title for the link, surrounded in quotes.

Also:

You can put the title attribute on the next line and use extra spaces or tabs for padding, which tends to look better with longer URLs

So

[URL and title](/url/ "title").

should be

<p><a href="/url/" title="title">URL and title</a>.</p>

And

[URL and title](/url/   "title preceded by a tab").

should be

<p><a href="/url/" title="title preceded by a tab">URL and title</a>.</p>
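
One way this could be handled, sketched under assumptions: renderInlineLink and its pattern are hypothetical and are not taken from Parsedown. The pattern accepts an optional double-quoted title separated from the URL by spaces or tabs:

```php
<?php
// Hypothetical helper for illustration; the pattern is an assumption, not Parsedown's code.
function renderInlineLink(string $text): ?string
{
    // [text](url "optional title"), with spaces or tabs before the title.
    $pattern = '/^\[(.+?)\]\(\s*(\S+?)(?:\s+"(.*?)")?\s*\)/';

    if (!preg_match($pattern, $text, $m)) {
        return null;
    }

    $html = '<a href="' . htmlspecialchars($m[2], ENT_QUOTES, 'UTF-8') . '"';

    // Group 3 is only present when a title was matched.
    if (isset($m[3])) {
        $html .= ' title="' . htmlspecialchars($m[3], ENT_QUOTES, 'UTF-8') . '"';
    }

    // The link text is left as-is here; a real parser would process inline markup in it.
    return $html . '>' . $m[1] . '</a>';
}

echo renderInlineLink('[URL and title](/url/ "title").'), "\n";
echo renderInlineLink("[URL and title](/url/\t\"title preceded by a tab\")."), "\n";
```

The sketch only covers double-quoted titles on the same line; titles placed on the following line would need the pattern to run against the joined text.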

Bad html with unordered list and fenced code

Hi,

First thanks for this amazing tool.

I think I caught a small bug (or a new thing to document). If you try to convert this markdown block (take the actual code, not the rendered page) everything should be fine (you can use your own demo):

#!/bin/bash

# Just a comment

If you add an unordered list (* Test) just above the fenced code block, then the rendering is bad. I can reproduce it with the GitHub engine, but php-parsedown 1.4 handles it fine.

Note that if there is a new line between the unordered list and the code block then there is no problem.

Thanks in advance.

unset $block['interrupted'] correctly

2. aaaa
   aaaa
   aaaa

3. aaaa
   bbbb
   cccc

4. aaaa
   bbbb
   cccc

----
2. aaaa
aaaa
aaaa

3. aaaa
bbbb
cccc

4. aaaa
bbbb
cccc
  1. aaaa
    aaaa
    aaaa
  2. aaaa
    bbbb
    cccc
  3. aaaa
    bbbb
    cccc

  1. aaaa
    aaaa
    aaaa
  2. aaaa
    bbbb
    cccc
  3. aaaa
    bbbb
    cccc

The GFM renderer and Markdown Extra can render it correctly (except for the BR), but Parsedown can't.
Here is a simple fix for it:

diff --git a/Parsedown.php b/Parsedown.php
index 7af08c9..e58d344 100755
--- a/Parsedown.php
+++ b/Parsedown.php
@@ -206,6 +206,8 @@ class Parsedown
                             $block['lines'] = array(
                                 preg_replace('/^[ ]{0,4}/', '', $matches[3]),
                             );
+
+                            unset($block['interrupted']);
                         }

                         continue 2;

One space linebreak

GFM lets you add a line break by adding just one space at the end of the line. Adding two spaces produces an empty line.

I think this would be a nice thing to have in Parsedown. If nothing else, it will save developers the few seconds it takes to type one more space character.

I know the spec says line breaks are created with two or more spaces, but you could make it one or more. Having two spaces produce an empty line isn't that important. This would preserve backward compatibility.

Various list issues

I'd like to suggest a few cosmetic changes to how Parsedown parses lists:

In nested lists, subsequent <ul> tags shouldn't be inline. Right now they appear as </li><ul>, that <ul> should be on new line.

Indentation of <li> with two spaces from its parent <ul>. For nested lists, this should increase, i.e. a second level <ul> should be indented with two spaces from its parent <li> and second level <li> should be indented with two spaces from its parent second level <ul>. Example:

<ul>
--<li></li>
--<li>
----<ul>
------<li></li>
----</ul>
--</li>
--<li></li>
</ul>

Note: these changes must apply to <ol> as well.

When attempting to add <code> block after <li>, the 4 spaces (or 1 tab) indentation characters are removed. Example:

* this is list
    this is code

will parse as:

<li>this is list
this is code</li>

instead of:

<li>this is list</li>
<pre><code>
this is code
</code></pre>

Additionally, <pre><code> and their closing tags shouldn't be inline. The code will look nicer and will have better human readability.

Links won't get converted

Hi,

Thanks for this great and fast MD parser :) When I used it on this MD file: https://github.com/RexDude/seo42/blob/master/README.md

I get the following PHP warning 6 times (EDIT: due to a README.md update there are now 10), and none of the links get converted.

Warning: preg_match_all(): Compilation failed: internal error: previously-checked referenced subpattern not found at offset 37 in /home/dude/Projekte/Web/AddonFactory/htdocs/redaxo/include/addons/seo42/classes/class.parsedown.inc.php on line 494

Is this something with my MD-File or with the Parser?

Allow adjacent code spans

Here is an example for adjacent code spans:

`code span``another code span`

The spec doesn't say anything about what could follow after a code span. So it could even be another code span.

However, Parsedown currently "sees" a code span only if it is not followed by a backtick (`). Relevant regex:

if (preg_match('/^(`+)(.+?)\1(?!`)/', $text, $matches))


I think this should be the actual behavior:

`code span``another code span`
<p><code>code span</code><code>another code span</code></p>

Right now, Parsedown produces:

<p><code>code span`</code>another code span`</p>

And Dingus:

<p><code>code span``another code span</code></p>
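
To make the difference concrete, here is a small sketch comparing the pattern quoted above with a relaxed variant that simply drops the negative lookahead. The relaxed pattern and the firstCodeSpan helper are assumptions made for illustration, not a proposed patch:

```php
<?php
// Pattern quoted in the issue: the closing delimiter must not be followed by a backtick.
$strict = '/^(`+)(.+?)\1(?!`)/';
// Hypothetical relaxed variant with the lookahead removed (an assumption, not Parsedown's code).
$relaxed = '/^(`+)(.+?)\1/';

// Returns the content of the first code span at the start of $text, or null.
function firstCodeSpan(string $pattern, string $text): ?string
{
    return preg_match($pattern, $text, $m) ? $m[2] : null;
}

$text = '`code span``another code span`';

echo firstCodeSpan($strict, $text), "\n";  // code span`
echo firstCodeSpan($relaxed, $text), "\n"; // code span
```

Dropping the lookahead lets the first span end at the first closing backtick, so the remaining text can be parsed as a second span. The trade-off is that an input such as `a``b` is then read as two spans rather than one, which is the reading Dingus appears to prefer.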

Invalid HTML with bold and italic.

When using italic inside bold text, you get invalid HTML:

$text = 'This is **bold and *italic***.';
echo Parsedown::instance()->parse($text);

Result:

<p>This is <strong>bold and <em>italic</strong></em>.</p>

The other way round does work correctly:

$text = 'This is *italic and **bold***.';
echo Parsedown::instance()->parse($text);

result:

<p>This is <em>italic and <strong>bold</strong></em>.</p>

John Gruber's test suite

This is John Gruber's test suite:
http://daringfireball.net/projects/downloads/MarkdownTest_1.0.zip

The following example is rendered incorrectly by Parsedown.

In Markdown 1.0.0 and earlier. Version
8. This line turns into a list item.
Because a hard-wrapped line in the
middle of a paragraph looked like a
list item.

Here's one with a bullet.
* criminey.

In Markdown 1.0.0 and earlier. Version
8. This line turns into a list item.
Because a hard-wrapped line in the
middle of a paragraph looked like a
list item.

Here's one with a bullet.

  • criminey.

babelmark2 result:
http://johnmacfarlane.net/babelmark2/?text=In+Markdown+1.0.0+and+earlier.+Version%0A8.+This+line+turns+into+a+list+item.%0ABecause+a+hard-wrapped+line+in+the%0Amiddle+of+a+paragraph+looked+like+a%0Alist+item.%0A%0AHere%27s+one+with+a+bullet.%0A*+criminey.

Missing support for image title text

Markdown is supposed to support adding title text to an image by putting it in quotes after the URL, like the following:

![Alt Text](http://www.google.com/images/srpr/logo9w.png "Title Text")

With GitHub's Markdown, the code above provides both the alt text and the title text, which Parsedown lacks.

Alt Text

Parsedown is missing support for the title text. In order to follow the standard and allow users to use Markdown to its fullest potential, support for this should be added.

Double html-entity encoding in code blocks

If the markdown to be parsed comes directly from a user (as in a comment or post), then we must use htmlentities() or htmlspecialchars() prior to parsing it, in order to prevent XSS:

$_POST['markdown'] = <<<'MD'
<script>alert('xss');</script>
~~~php
$foo = true && (1 < 7 ? "bar" : '>');
~~~
MD;

// Bad!
$markup = Parsedown::instance()->parse($_POST['markdown']);

// Good
$markdown = htmlspecialchars($_POST['markdown'], ENT_QUOTES, 'UTF-8');
$markup = Parsedown::instance()->parse($markdown);

The problem here is that any entities within fenced code blocks then become double-encoded, because these are encoded internally by Parsedown.

My first thought was to simply use strip_tags() instead of encoding; however, this breaks markdown code blocks that contain HTML/XML, as strip_tags() is not context-aware.

My next thought was to change the htmlspecialchars() calls within Parsedown - where we encode the contents of fenced code blocks - to use the 4th parameter ($double_encode) and turn off double-encoding. Unfortunately, that leaves us with the problem of not being able to represent literal encoded entities within code blocks, e.g.:

$foo = '&amp;';

We can require the calling code to cherry-pick the parts of the input to encode entities in (everything but the content between code block fences), but that requires a fair bit of markdown parsing outside the actual markdown parser.

The only other thing I can think of would be to use a DOM parser in conjunction with htmlspecialchars_decode() to un-double-encode things inside <code></code> blocks in the parsed markdown, but this seems like it would just be putting a band-aid on the problem instead of actually fixing it.

I don't have a good solution in mind yet; I just wanted to make the issue known and open discussion (maybe I missed something obvious).
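
The double-encoding effect, and the drawback of switching it off, can be reproduced with htmlspecialchars() alone. This is a small illustration using made-up input strings, not a proposed fix:

```php
<?php
$alreadyEncoded = '1 &lt; 7';

// Default behaviour: the ampersand of the existing entity is encoded again.
echo htmlspecialchars($alreadyEncoded, ENT_QUOTES, 'UTF-8'), "\n";        // 1 &amp;lt; 7

// With $double_encode = false, existing entities are left alone.
echo htmlspecialchars($alreadyEncoded, ENT_QUOTES, 'UTF-8', false), "\n"; // 1 &lt; 7

// The drawback: a literal "&amp;" typed inside a code block is also left alone,
// so a browser shows "&" instead of the "&amp;" the author wrote.
echo htmlspecialchars('&amp;', ENT_QUOTES, 'UTF-8', false), "\n";         // &amp;
```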

Security ?

Hi,
On GitHub, tags like script are escaped. I think it could be a good option to have. Maybe it is as simple as $text = htmlentities($text);, but it would require an options array, and may lead to some additions ...

Thanks in advance

Support title in parentheses for reference-style links

From the Markdown spec:

The following three link definitions are equivalent:

[foo]: http://example.com/  "Optional Title Here"
[foo]: http://example.com/  'Optional Title Here'
[foo]: http://example.com/  (Optional Title Here)

Parsedown does not support the third case. Here is a failing test:

[Reference link][link title with double quotes]
[Reference link][link title with single quotes]
[Reference link][link title with parentheses]

[Reference link with a space] [link title with double quotes]
[Reference link with a space] [link title with single quotes]
[Reference link with a space] [link title with parentheses]

[link title with double quotes]: http://example.com/  "Double Quotes Title"
[link title with single quotes]: http://example.com/  'Single Quotes Title'
[link title with parentheses]: http://example.com/  (Parentheses Title)
<p><a href="http://example.com/" title="Double Quotes Title">Reference link</a>
<a href="http://example.com/" title="Single Quotes Title">Reference link</a>
<a href="http://example.com/" title="Parentheses Title">Reference link</a></p>
<p><a href="http://example.com/" title="Double Quotes Title">Reference link with a space</a>
<a href="http://example.com/" title="Single Quotes Title">Reference link with a space</a>
<a href="http://example.com/" title="Parentheses Title">Reference link with a space</a></p>

Currently Parsedown outputs HTML like:

<a href="http://example.com/  (Parentheses Title)">Reference link</a></p>
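
A sketch of how all three title forms could be recognised in a link definition line. The parseLinkDefinition helper and its pattern are hypothetical and are not Parsedown's code; the pattern relies on a PCRE branch-reset group so that the title always lands in the same capture group:

```php
<?php
// Hypothetical helper for illustration; not Parsedown's actual implementation.
function parseLinkDefinition(string $line): ?array
{
    // (?| ... ) is a PCRE branch-reset group: whichever alternative matches,
    // the title is captured as group 3.
    $pattern = '/^\[(.+?)\]:[ ]*(\S+)(?:[ ]+(?|"(.+)"|\'(.+)\'|\((.+)\)))?[ ]*$/';

    if (!preg_match($pattern, $line, $m)) {
        return null;
    }

    return [
        'id'    => $m[1],
        'url'   => $m[2],
        'title' => $m[3] ?? null, // absent when the definition has no title
    ];
}

print_r(parseLinkDefinition('[foo]: http://example.com/  (Optional Title Here)'));
```

With this approach the parenthesised title is split from the URL instead of ending up inside the href attribute.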

Setext-style headers

As described on Daring Fireball, is it possible to add Setext-style headers?

Something like this doesn't get parsed like in the specification:

"This is an H1"
"============="

"This is an H2"
"-------------"

(I added the " characters because GitHub would otherwise parse the headings.)
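
For reference, a minimal sketch of how Setext-style headers could be detected from a text line and the line beneath it. The setextHeading helper is hypothetical and not part of Parsedown:

```php
<?php
// Hypothetical helper for illustration; not Parsedown's actual implementation.
function setextHeading(string $line, string $underline): ?string
{
    // A blank line cannot be a heading.
    if (trim($line) === '') {
        return null;
    }

    if (preg_match('/^=+[ ]*$/', $underline)) {
        $tag = 'h1'; // underlined with equals signs
    } elseif (preg_match('/^-+[ ]*$/', $underline)) {
        $tag = 'h2'; // underlined with dashes
    } else {
        return null;
    }

    return "<$tag>" . trim($line) . "</$tag>";
}

echo setextHeading('This is an H1', '============='), "\n";
echo setextHeading('This is an H2', '-------------'), "\n";
```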

Implant new tags

Hi!
I suggest implementing new tags to extend Parsedown.
For example:
^(x) becomes < sup >x</ sup >
_(x) becomes < sub >x</ sub >
[ [ x ] ] (or something else, I'm still looking) becomes x
I don't know about < kbd > tags.
I don't know about < audio > and < video > tags.
PS: Sorry for my broken English!

Line Breaks

Nothing big, but Parsedown creates <br/> tags, which don't work in certain ancient browsers.
Changing the behavior to create <br /> instead fixes the problem.

Allow "extensions"

It might be interesting to allow a more "modular" approach. It is not really easy to create a ParsedownExtra class that extends the main Parsedown class and adds additional parsing.

This could be handy for when the original spec is stable and you want to add the fenced code blocks, and maybe automatic url detection in quick blocks.

4 or more asterisks causes differing results

Hi,

I ran this through your demo:

****test****

and got different results from Parsedown vs PHP Markdown 1.3.

Markdown yields: <p>****test****</p>
Parsedown yields: <p><strong><em></strong>test<strong></em></strong></p>

similar results for:

_test_

Markdown: <p>_test__</p>
Parsedown: <p><strong><em>_test</em>
</strong></p>

It's my understanding that anything 4+ asterisks should result in no formatting and show the asterisks.

Thanks.

parsedown.org comments

First congrats on the new version of the website!

I have several issues/questions/comments:

  • You could open source it and get suggestions.
  • The previous homepage with visualized parsing of Markdown was an incredible design decision. I have even given it out as an example. It is sad to see it go. Would you reconsider?
  • The tests in the website do not include all of the tests used by PHPUnit. Is there any particular reason for that?
  • You could have the tabbed interface even on the homepage.

All of these could have been a different issue/PR and discussed separately with code examples, but since it's not open sourced I am listing them here all at once.

Bold and italic over multiple lines

Judging from other markdown parsers, bold and italic should work over multiple lines.

Example:

**test
dsf
sfd
fsd
fds
fds
sdf
fsd
fds**

Adding the s modifier (PCRE_DOTALL) to the regex seems to fix this.
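
The effect of that modifier can be checked in isolation. The pattern below is an illustrative stand-in, not the regex Parsedown actually uses:

```php
<?php
// Illustrative pattern only; Parsedown's real regex may differ.
$text = "**test\ndsf\nfds**";

// Without the s modifier, "." does not match a newline, so the match fails.
var_dump(preg_match('/^\*\*(.+?)\*\*/', $text));  // int(0)

// With the s modifier (PCRE_DOTALL), "." also matches newlines.
var_dump(preg_match('/^\*\*(.+?)\*\*/s', $text)); // int(1)
```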

line break broken

If you try parsing

this
breaks

you should get

<p>
this<br/>
breaks
</p>

but instead you get

<p>
this
breaks
</p>

So the test page on the website is not showing the same thing as the demo page

Problems with HTML

When using html it gets wrapped in <p> tags.

<div class="test">

Nunc accumsan scelerisque pellentesque. Proin sed placerat turpis. Nam quis odio placerat risus molestie varius sit amet quis lorem. Morbi sed elit massa. Aliquam egestas elit quis ligula fringilla, ac placerat libero aliquam.

</div>

Would make:

<p><div class="test"></p>
<p>Nunc accumsan scelerisque pellentesque. Proin sed placerat turpis. Nam quis odio placerat risus molestie varius sit amet quis lorem. Morbi sed elit massa. Aliquam egestas elit quis ligula fringilla, ac placerat libero aliquam.</p>
<p></div></p>

Problem with lists before code blocks

I just encountered an issue where a list directly before an indented code block which begins with a docblock will confuse the parser into thinking the docblock is an extension of the list. Below is an example of the issue I encountered.

  • list item 1

  • list item 2

    /**

    • This confuses parser
      */

It obviously confuses the GitHub Flavored Markdown parser as well. I didn't expect that!

I was able to work around the issue like so.

  • list item 1
  • list item 2

Below is a code sample:

/**
 * This confuses parser, unless you add some text
 * after the list and before the code block.
 */

Implementing fenced code blocks (issue #2) would help a lot, as you wouldn't need to indent code blocks.

  • list item 1
  • list item 2
/**
 * This confuses parser, but fenced code blocks avoid the issue
 */

HTML entities parsing

Apparently commit 4fecd91 broke parsing of HTML entities. Now they just show up as &copy; instead of ©, for example.

Is this desired behavior or a bug?

Parts of list item missing in parsed output

The following markdown:

* [[\yii\caching\ApcCache]]: uses PHP [APC](http://php.net/manual/en/book.apc.php) extension. This option can be
  considered as the fastest one when dealing with cache for a centralized thick application (e.g. one
  server, no dedicated load balancers, etc.).

* [[\yii\caching\DbCache]]: uses a database table to store cached data. By default, it will create and use a
  [SQLite3](http://sqlite.org/) database under the runtime directory. You can explicitly specify a database for
  it to use by setting its `db` property.

Renders like this in GFM:


  • [[\yii\caching\ApcCache]]: uses PHP APC extension. This option can be
    considered as the fastest one when dealing with cache for a centralized thick application (e.g. one
    server, no dedicated load balancers, etc.).
  • [[\yii\caching\DbCache]]: uses a database table to store cached data. By default, it will create and use a
    SQLite3 database under the runtime directory. You can explicitly specify a database for
    it to use by setting its db property.

but in parsedown output there are parts missing from the list item text.

Autolinking feature does not cover last slash of url

This Markdown:
jQuery UI: http://jqueryui.com/

gets converted to this HTML:
<p>jQuery UI: <a href="http://jqueryui.com">http://jqueryui.com</a>/</p>

but it should be:
<p>jQuery UI: <a href="http://jqueryui.com">http://jqueryui.com/</a></p>

right?
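
A rough sketch of an autolinking pattern that keeps the trailing slash inside the link. The autolink helper and its pattern are assumptions made for illustration, not Parsedown's autolink code:

```php
<?php
// Hypothetical helper; the pattern is an assumption, not Parsedown's autolink regex.
function autolink(string $text): string
{
    // Match http(s) URLs up to the next whitespace or "<", so a trailing
    // slash stays inside the link.
    return preg_replace_callback(
        '~\bhttps?://[^\s<]+~',
        function (array $m): string {
            $url = htmlspecialchars($m[0], ENT_QUOTES, 'UTF-8');
            return '<a href="' . $url . '">' . $url . '</a>';
        },
        $text
    );
}

echo autolink('jQuery UI: http://jqueryui.com/'), "\n";
```

Note that a pattern this greedy would also swallow sentence punctuation directly after a URL, so a real implementation would likely need to trim trailing characters such as "." or ",".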

Indentation

Continued from #60...

Note: this issue is not just for lists, everyone is free to suggest which elements and how they should be indented.

I'd like to write some more tests

Tests are free, so I was thinking about writing some tests to cover some more complex use cases. Any objection to me writing more tests, even if they don't address a current known issue?
