python-markdown / markdown



A Python implementation of John Gruber’s Markdown with Extension support.

Home Page: https://python-markdown.github.io/

License: BSD 3-Clause "New" or "Revised" License

Python 99.75% Shell 0.09% Makefile 0.16%

python-markdown markdown markdown-parser markdown-to-html python python3

markdown's Introduction

Python-Markdown

This is a Python implementation of John Gruber's Markdown. It is almost completely compliant with the reference implementation, though there are a few known issues. See Features for information on what exactly is supported and what is not. Additional features are supported by the Available Extensions.

Documentation

pip install markdown

import markdown
html = markdown.markdown(your_text_string)

For more advanced installation and usage documentation, see the docs/ directory of the distribution or the project website at https://Python-Markdown.github.io/.

See the change log at https://python-markdown.github.io/changelog/.

Support

You may report bugs, ask for help, and discuss various other issues on the bug tracker.

Code of Conduct

Everyone interacting in the Python-Markdown project's code bases, issue trackers, and mailing lists is expected to follow the Code of Conduct.

markdown's People


hyde sydney-linux-user-group-dependencies skurfer teepark greghaskins mfiers kgrinberg crodjer ashwoods wilfred bhutley pjankiewicz mekk sourcefrog aleray startling fiesta bitduct rafaelxyz maxking atul-bhouraskar philippbosch fin liupengke vjousse papaeye svetlyak40wt virtix jbenet erikvanzijst netscokhlee bidix grahack slig noah flying-sheep renn999 cyberpython imcj allanderek pablok mizuy luost chalkchisel datadog khan abackstrom flashingpumpkin kevinxucs phihag walkerart benjaoming mpancorbo tylerbutler alainv seavantuuz timonwong shvechikov kausikram mhaslam jakevdp filippopirro davidcorne rfscotteb thevladsoft nai7 peterclemenko forivall-old-repos divyashashank coolono smillaedler adsabs dekue nkabir xpol bmcorser yuehonghui fnd vhf csuft babosi sigvef nemec eichin metrina tmp1994 plounze znanja osamu0329nakamura sevendeadlysins dawsonc cjrd weasyl yonilavi liuzheng volks73 lahwaacz oarodriguez aarumug methane

markdown's Issues

Nested lists require 4 spaces of indent

This issue is copied from Ticket 64 of our old bug tracker. It has been copied as-is:

Nested lists do not nest. I've tried:
* Item 1
  * Item A
  * Item B
I get a flat list.

Tried it here too:
http://babelmark.bobtfish.net/?markdown=*+Item+1%0D%0A++*+Item+A%0D%0A++*+Item+B&compare=on&src=4&dest=4

Comments

By Waylan 7/1/10

Actually, nested lists work fine when you indent with 4 spaces (I changed the title to better fit the actual situation). I realize that the Perl implementation works with 2 spaces of indent, but the syntax rules make no mention of nested lists whatsoever, and all other types of blocks require 4 spaces. Python-Markdown is therefore consistent and requires 4 spaces for all types of nested content in lists (at least the first line of each block must be indented 4 spaces).

Unless someone can convince me otherwise, I'm considering this a bug in the Perl implementation (and in all other implementations that have copied its behavior). This will be marked wontfix in a few days. Please take any discussion on the matter to the mailing list.

SetExt headings require additional character

Ran into this one when upgrading from 1.7 to 2.1: when using Setext-style headings, you now need a minimum of three = or - characters to turn the line above into a heading.

Heading doesn't work
--

Heading does work

---

Code to reproduce:

>>> import markdown
>>> m = markdown.Markdown()  
>>> m.convert("Heading doesn't work\n--\n")
u"<p>Heading doesn't work\n--</p>"
>>> m.convert("Heading does work\n---\n")
u'<h2>Heading does work</h2>'

According to the spec, "Any number of underlining =’s or -’s will work." Markdown.pl and Showdown both accept two characters.

The offending regex is markdown/blockprocessors.py#L434.
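For comparison, a minimal sketch of a Setext check that accepts "any number" (one or more) of underline characters, as the spec quote says. The pattern and the `setext_level` helper are hypothetical illustrations, not Python-Markdown's actual regex; a real block processor also has to tell a `---` underline apart from a horizontal rule.

```python
import re

# Hypothetical pattern: one line of text, then a line of one or more '=' or '-'
# characters, optionally followed by trailing spaces.
SETEXT_RE = re.compile(r"^(?P<text>[^\n]+)\n(?P<rule>=+|-+)[ ]*$")


def setext_level(block):
    """Return (level, text) if block is a Setext heading, else None."""
    m = SETEXT_RE.match(block.rstrip("\n"))
    if m is None:
        return None
    level = 1 if m.group("rule").startswith("=") else 2
    return level, m.group("text").strip()
```

With this rule, `setext_level("Heading doesn't work\n--\n")` returns `(2, "Heading doesn't work")`.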

Better (and documented) Support for markdown="1"

PHP Markdown allows Markdown to be used inside HTML blocks, simply by adding a markdown="1" attribute to the appropriate HTML block.

It would be nice if python markdown also allowed for such a feature.

I ran into this problem after migrating my blog from PHP to Python, and it seems I am not the only one. It is useful in cases like:

<div class="blahblah" markdown="1">
Some *normal* markdown [text][] here.
</div>

<blockquote markdown="1">
Some *markdown text*

<pre name="code" class="python">
# code block which is to stay inside blockquote
</pre>

Yet another *markdown text*
</blockquote>

Add SmartyPants extension as part of Python-Markdown

This is a feature request. It'd be nice if there was a built-in (batteries included) extension to implement SmartyPants quoting by turning on a simple extension.

I notice that someone is already using SmartyPants with Markdown for Python, though not as an extension:
http://byrneswoder.com/blog/one-secret-to-generating-clean-html-from-text/

Add Nlbr extension

This is a small feature request.

I recommend including the "Nlbr" extension as part of the Python-Markdown package, as described here:
http://deathofagremmie.com/2011/05/09/a-newline-to-break-python-markdown-extension/

Do not turn it on by default, because obviously this changes the semantics of Markdown substantially. Still, it's a common change to Markdown (used by GitHub, for example), and it's easy to implement. Making it easy to invoke, when desired, would be very nice.

Thanks!

Support all valid email addresses

This issue is a copy of Ticket 8 in our old bug reporting system. The text has been copied as-is:

This was recently brought up on the Markdown discussion list: there are various characters that are allowed in email addresses that none of the Markdown implementations support. Interestingly, Python-Markdown appears to support the most at this time, but there is room for improvement. Perhaps we should add a test case, as all of the following addresses are valid:
<[email protected]>

<[email protected]>

<[email protected]>

<[email protected]>

<abc+mailbox/[email protected]>

<!#$%&'*+-/=?^_`.{|}[email protected]> (all of these characters are allowed)

<"abc@def"@example.com> (anything goes inside quotation marks)

<"Fred Bloggs"@example.com>
It appears that we only have issues with the last three, although the second-to-last one may be right; I'm not sure how we should treat the quotes. The examples come from Wikipedia.
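A minimal sketch of an autolink pattern that accepts the RFC 5322 "atext" characters in a dot-separated local part, plus quoted local parts such as `"Fred Bloggs"`. `AUTOMAIL_RE` and `find_autolink_email` are hypothetical names, the domain rule is deliberately simplified, and `example.com` stands in for the domains elided above.

```python
import re

# Characters RFC 5322 allows unquoted in a local part ("atext").
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"
# Unquoted local part: runs of atext separated by single dots.
DOT_ATOM = ATEXT + r"+(?:\." + ATEXT + r"+)*"
# Quoted local part: anything except an unescaped quote or backslash.
QUOTED = r'"(?:[^"\\]|\\.)*"'
# Simplified domain: dot-separated labels of letters, digits and hyphens.
DOMAIN = r"[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+"

AUTOMAIL_RE = re.compile("<((?:" + DOT_ATOM + "|" + QUOTED + ")@" + DOMAIN + ")>")


def find_autolink_email(text):
    """Return the address inside the first <...@...> autolink, or None."""
    m = AUTOMAIL_RE.search(text)
    return m.group(1) if m else None
```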

Single character is not emphasized

Consider the following Python-Markdown code (tested with the branch on GitHub):

import markdown
text = "Lorem _a_ ipsum."
print markdown.markdown(text)

It will output:

<p>Lorem _a_ ipsum.</p>

when it should produce emphasis, as follows:

<p>Lorem <em>a</em> ipsum.</p>
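A minimal sketch of an underscore-emphasis rule that accepts a single character between the delimiters, while only letting an underscore appear inside the span when it sits between two word characters. The regex and `emphasize` are hypothetical illustrations built on lookarounds, not the pattern Python-Markdown uses.

```python
import re

# Opening '_' not preceded by a word character; content that starts and ends
# with non-whitespace (a single character is enough) and may contain '_' only
# between two word characters; closing '_' not followed by a word character.
UNDERSCORE_EM_RE = re.compile(
    r"(?<!\w)_(?=\S)((?:[^_]|(?<=\w)_(?=\w))+?)(?<=\S)_(?!\w)"
)


def emphasize(text):
    """Wrap _underscored_ spans in <em> tags."""
    return UNDERSCORE_EM_RE.sub(r"<em>\1</em>", text)
```

With this rule, `_NODE_ENV_` keeps its inner underscore, while `_end()_s` is left untouched because its closing underscore is followed by a letter. Writing `_end()_'s` still produces `<em>end()</em>'s`.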

Emphasis extent wrong when more than one per line

I was trying to figure out which markdown was being used by a project, and so ended up comparing the output of markdown (python-markdown-2.0.3-3.fc15.noarch) with markdown2 (python-markdown2-1.0.1.17-3.fc15.noarch). I found that both got some emphasis wrong. I'll mention both here for orientation.

I diffed the markdown2 and markdown outputs, so the second line is markdown's. Before each, I put the original source line:

To alter the environment we can set the _NODE_ENV_ environment variable, for example:

106,108c81
<
< <p>To alter the environment we can set the <em>NODE</em>ENV_ environment variable, for example:</p>
<

---
> <p>To alter the environment we can set the <em>NODE_ENV</em> environment variable, for example:</p>


Note that this method _end()_s the response, so you will want to use node's _res.write()_ for multiple writes or streaming.

971,973c738
<
< <p>Note that this method <em>end()</em>s the response, so you will want to use node's <em>res.write()</em> for multiple writes or streaming.</p>
<

---
> <p>Note that this method <em>end()_s the response, so you will want to use node's _res.write()</em> for multiple writes or streaming.</p>


connections will be accepted via _INADDR_ANY_.

1490,1491c1130
< connections will be accepted via <em>INADDR</em>ANY_.</p>
<

---
> connections will be accepted via <em>INADDR_ANY</em>.</p>

markdown2 gets it wrong when handling tokens that have embedded '_' characters, such as NODE_ENV and INADDR_ANY. Apparently it is using lazy regexes that exclude '_' characters.

But then markdown gets this line wrong, apparently because it is using greedy RE matching and allowing '_' characters:

Note that this method _end()_s the response, so you will want to use node's _res.write()_ for multiple writes or streaming.

The desired result was obviously what markdown2 produced.

How do you have more than one emphasis span on a line?

Ahh, got it! Changing _end()_s to _end()_'s produces <em>end()</em>'s as desired. That changed the interior '_' into a closing '_'.

You wouldn't believe how many times I've been told not to add 'extra' apostrophes in my writing. Now I know why I want to...

Oh, boo. Changing it to _end()_\s gives you <em>end()</em>s . Since that's closer to what the author wrote, that's better? (So I'm back to "okay, okay, I'll take out the apostrophes!")

Is there some kind of "when things don't work" FAQ you could add this to?

codehilite and fenced_code don't work together

I'm trying to get syntax highlighting to work with fenced code, but it's not cooperating.

Here's the contents of the input file, codetest.md:

~~~~{.r}
# r code
c(1,2,3)
~~~~

And here's what happens when I try to run it. Apparently there's some sort of problem with pygments? This is running on Ubuntu 11.10, python 2.7.2.

Traceback (most recent call last):
  File "/home/winston/.local/bin/markdown_py", line 5, in <module>
    pkg_resources.run_script('Markdown==2.1.0', 'markdown_py')
  File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 467, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 1200, in run_script
    execfile(script_filename, namespace, namespace)
  File "/home/winston/.local/lib/python2.7/site-packages/Markdown-2.1.0-py2.7.egg/EGG-INFO/scripts/markdown_py", line 34, in <module>
    run()
  File "/home/winston/.local/lib/python2.7/site-packages/Markdown-2.1.0-py2.7.egg/markdown/__main__.py", line 81, in run
    markdown.markdownFromFile(**options)
  File "/home/winston/.local/lib/python2.7/site-packages/Markdown-2.1.0-py2.7.egg/markdown/__init__.py", line 416, in markdownFromFile
    kwargs.get('encoding', None))
  File "/home/winston/.local/lib/python2.7/site-packages/Markdown-2.1.0-py2.7.egg/markdown/__init__.py", line 346, in convertFile
    html = self.convert(text)
  File "/home/winston/.local/lib/python2.7/site-packages/Markdown-2.1.0-py2.7.egg/markdown/__init__.py", line 280, in convert
    self.lines = prep.run(self.lines)
  File "/home/winston/.local/lib/python2.7/site-packages/Markdown-2.1.0-py2.7.egg/markdown/extensions/fenced_code.py", line 128, in run
    code = highliter.hilite()
  File "/home/winston/.local/lib/python2.7/site-packages/Markdown-2.1.0-py2.7.egg/markdown/extensions/codehilite.py", line 99, in hilite
    noclasses=self.noclasses)
  File "/usr/lib/python2.7/dist-packages/pygments/formatters/html.py", line 347, in __init__
    self.noclasses = get_bool_opt(options, 'noclasses', False)
  File "/usr/lib/python2.7/dist-packages/pygments/util.py", line 58, in get_bool_opt
    string, optname))
pygments.util.OptionError: Invalid type [False, 'Use inline styles instead of CSS classes - Default false'] for option noclasses; use 1/0, yes/no, true/false, on/off

It works fine if I use just codehilite (with indented code), or if I use fenced_code without codehilite.

An option to embed footnotes earlier than on the end of document

It would be great if the footnote extension allowed one to decide where the footnote text should be placed, allowing one (depending on some config param) to:

put all footnote definitions at the end of the document (default, the way things work currently)
put footnote definitions at the end of the section in which the footnotes are used (= show all referenced footnotes before the next section starts) { extra option: set section level }
put footnote definitions after the current paragraph (= show all referenced footnotes once the current paragraph/table/block/whatever is finished)
expand them inline where referenced, just wrapped in some CSS-identifiable span

Rationale:

a) (my main use case) On an ebook reader (while viewing an EPUB file), navigating to a footnote and back tends to be slow and (on non-touch readers) sometimes requires troublesome navigation. Having the footnote rendered below the current paragraph (likely styled with a smaller font) would make for a much nicer user experience.

b) On web pages, too, placing footnotes close to the text they refer to may make for a better presentation (here some JavaScript instrumentation could make it possible to show them dynamically on mouseover or mouse click).

HeaderId Extension Doesn't ID Underlined Headers

The Header Id extension doesn't work properly on "underlined" headers

Example:

Header 1 {#header1}
========

Header 2 {#header2}
--------

Becomes:

<h1>Header 1 {#header1}</h1>
<h2>Header 2 {#header2}</h2>

Expected:

<h1 id="header1">Header 1</h1>
<h2 id="header2">Header 2</h2>
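One way to get the expected output is to strip a trailing `{#id}` from the heading text after the heading has been recognized, so that ATX and underlined headings share the same code path. A minimal sketch; the pattern and `split_header_id` are hypothetical, not the extension's actual code.

```python
import re

# Hypothetical pattern for heading text ending in a "{#some-id}" marker.
HEADER_ID_RE = re.compile(r"^(?P<text>.*?)[ \t]*\{#(?P<id>[\w-]+)\}[ \t]*$")


def split_header_id(header_text):
    """Return (text, id) with the {#id} marker removed; id is None if absent."""
    m = HEADER_ID_RE.match(header_text)
    if m is None:
        return header_text, None
    return m.group("text"), m.group("id")
```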

toc extension patch

A quick hack to support non-ascii headings :

=== diff -u toc.py.old toc.py >>> ===

   --- toc.py.old   2011-11-20 20:03:03.000000000 +1100
   +++ toc.py   2011-11-20 20:04:22.000000000 +1100
   @@ -76,7 +76,7 @@
                    # Do not override pre-existing ids 
                    if not "id" in c.attrib:
                        id = self.config["slugify"][0](c.text)
   -                    if id in used_ids:
   +                    if ( id == '' ) or ( id in used_ids ):
                            ctr = 1
                            while "%s_%d" % (id, ctr) in used_ids:
                                ctr += 1

=== <<< ===

Basically, the slugify() method produces an empty slug for non-ASCII headings, so we name them "_%d" % (heading_occurrence) instead.
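The id-uniquifying logic the patch arrives at, pulled out into a standalone function as a minimal sketch. `unique_id` is a hypothetical name, and a set stands in for the `used_ids` collection in the patch.

```python
def unique_id(slug, used_ids):
    """Return slug, or slug + "_N" if it is empty or already taken; record it."""
    candidate = slug
    if candidate == "" or candidate in used_ids:
        ctr = 1
        while "%s_%d" % (slug, ctr) in used_ids:
            ctr += 1
        candidate = "%s_%d" % (slug, ctr)
    used_ids.add(candidate)
    return candidate
```

With an empty slug, successive non-ASCII headings get `_1`, `_2`, and so on.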

Clean up logging.

This issue is copied from Ticket 89 of our old bug tracker. It is copied as-is:

We're getting anonymous "We've got a problem header!" emails from our website; turns out it's markdown being royally insane.

In addition I've recently had to dive into the source code and found some calls to sys.exit(). Libraries should never ever call this. We don't want our python interpreter dying on us randomly just because some text failed to render. Instead it should throw exceptions, which can be caught by the caller if necessary.

I found that markdown hardly ever throws exceptions, instead it has a crazy log wrapper which basically just serves to hide where problems are coming from.

I've made a few logging changes in my fork on github - here's a comparison view

It's a backwards incompatible change since I removed a lot of the logging code which custom extensions may be using. (replaced with exceptions which are much more useful). So maybe pull this one for 2.1 rather than 2.0.4.

allow a subset of markdown

While it's nice to have safe mode, as a website owner, you sometimes want to limit the abilities of users even more. For example, on blog articles posted by users, you want all the markdown options, but in comments, you only want links, bold and italic text, and inline code snippets. A way to do that would be nice.

What I propose, is that you can use the markdown function like this:

markdown("**bold** and _italic_",allowed_tags=['b','a','i','u'],safe_mode='escape')
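Something close to this proposal can be approximated by post-filtering the rendered HTML. Here is a minimal sketch using only the standard library's `html.parser`. `TagWhitelist` and `filter_tags` are hypothetical names, and `allowed_tags` is not an existing Python-Markdown option. Note that Python-Markdown emits `<strong>`/`<em>` rather than `<b>`/`<i>`, so those are the names a whitelist would need. Text inside dropped tags is kept as escaped text, so treat this as an illustration rather than a hardened sanitizer.

```python
from html import escape
from html.parser import HTMLParser


class TagWhitelist(HTMLParser):
    """Re-emit HTML, keeping only whitelisted tags.

    All attributes are dropped except href on <a>; text is re-escaped.
    """

    def __init__(self, allowed_tags):
        super().__init__(convert_charrefs=True)
        self.allowed = set(allowed_tags)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in self.allowed:
            return  # drop the tag itself but keep its text content
        kept = "".join(
            ' %s="%s"' % (name, escape(value or "", quote=True))
            for name, value in attrs
            if tag == "a" and name == "href"
        )
        self.out.append("<%s%s>" % (tag, kept))

    def handle_startendtag(self, tag, attrs):
        if tag in self.allowed:
            self.out.append("<%s />" % tag)  # void element such as <br />

    def handle_endtag(self, tag):
        if tag in self.allowed:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def filter_tags(html_text, allowed_tags):
    """Return html_text with every non-whitelisted tag stripped."""
    parser = TagWhitelist(allowed_tags)
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```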

Raw HTML Parsing is slow

See this discussion on the mailing list for details.

Surprising compilation of double-spaced asterisks

This may in fact be what the "spec" (Markdown.pl) tells us is correct behaviour, but it strikes me as odd:

>>> import markdown
>>> markdown.version
'2.0.3'
>>> print markdown.markdown('*  *  *')
<ul>
<li>
<ul>
<li>*</li>
</ul>
</li>
</ul>

Perhaps a more sensible translation would involve a single list item (containing two asterisks) inside a single unordered list.

I'm of the opinion that defining correct translations for "invalid" input is just as important as defining correct translations for valid Markdown.

NameError when Parsing an Email Address

I'm getting a NameError when trying to parse an email address like this:

markdown.markdown("<[email protected]>")

Here's the complete traceback:

>>> import markdown
>>> markdown.markdown("<[email protected]>")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.2/dist-packages/markdown/__init__.py", line 598, in markdown
    return md.convert(text)
  File "/usr/local/lib/python3.2/dist-packages/markdown/__init__.py", line 395, in convert
    newRoot = treeprocessor.run(root)
  File "/usr/local/lib/python3.2/dist-packages/markdown/treeprocessors.py", line 271, in run
    text), child)
  File "/usr/local/lib/python3.2/dist-packages/markdown/treeprocessors.py", line 95, in __handleInline
    data, patternIndex, startIndex)
  File "/usr/local/lib/python3.2/dist-packages/markdown/treeprocessors.py", line 219, in __applyPattern
    node = pattern.handleMatch(match)
  File "/usr/local/lib/python3.2/dist-packages/markdown/inlinepatterns.py", line 363, in handleMatch
    letters = [codepoint2name(ord(letter)) for letter in email]
  File "/usr/local/lib/python3.2/dist-packages/markdown/inlinepatterns.py", line 363, in <listcomp>
    letters = [codepoint2name(ord(letter)) for letter in email]
  File "/usr/local/lib/python3.2/dist-packages/markdown/inlinepatterns.py", line 357, in codepoint2name
    entity = html.entities.codepoint2name.get(code)
NameError: global name 'html' is not defined

Links of the same format, <url>, work well.

Hope this helps to solve the problem. In the meantime, any workarounds would be appreciated.

This is Python Markdown 2.0.3 running on Python 3.2
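The NameError shows that `html.entities.codepoint2name` is referenced without the `html` package ever having been imported. Below is a minimal sketch of the entity-encoding step with the import in place. `obfuscate_email` is a hypothetical helper modelled on the list comprehension in the traceback, not the library's code.

```python
from html.entities import codepoint2name


def obfuscate_email(email):
    """Encode every character of an address as an HTML entity.

    Uses a named entity where HTML 4 defines one (e.g. &amp;),
    and a numeric character reference otherwise.
    """
    parts = []
    for letter in email:
        code = ord(letter)
        name = codepoint2name.get(code)
        parts.append("&%s;" % name if name else "&#%d;" % code)
    return "".join(parts)
```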

Freeze with table not preceded by an empty line

Create a file n.py with the following content:

import markdown

text = """
Lorem ipsum
Lorem ipsum dolor sit amet inceptos | Lorem ipsum | Lorem ipsum
----------------------------------- | :---------: | :---------:
Lorem ipsum                         | Lorem ipsum | Lorem ipsum
"""

print markdown.markdown(text, extensions=['extra'])

Run the file:
```
$ python n.py
```

Actual result: Python-Markdown freezes. If we interrupt the process, we have the following output:

$ python n.py 
^CTraceback (most recent call last):
  File "n.py", line 10, in <module>
    print markdown.markdown(text, extensions=['extra'])
  File "/home/nom/.local/lib/python2.7/site-packages/markdown/__init__.py", line 386, in markdown
    return md.convert(text)
  File "/home/nom/.local/lib/python2.7/site-packages/markdown/__init__.py", line 283, in convert
    root = self.parser.parseDocument(self.lines).getroot()
  File "/home/nom/.local/lib/python2.7/site-packages/markdown/blockparser.py", line 62, in parseDocument
    self.parseChunk(self.root, '\n'.join(lines))
  File "/home/nom/.local/lib/python2.7/site-packages/markdown/blockparser.py", line 77, in parseChunk
    self.parseBlocks(parent, text.split('\n\n'))
  File "/home/nom/.local/lib/python2.7/site-packages/markdown/blockparser.py", line 93, in parseBlocks
    if processor.test(parent, blocks[0]):
  File "/home/nom/.local/lib/python2.7/site-packages/markdown/blockprocessors.py", line 470, in test
    return bool(self.SEARCH_RE.search(block))
KeyboardInterrupt
$

Expected result: no freeze.

More information:

The following code doesn't make Python-Markdown freeze:

import markdown

text = """
Lorem ipsum

Lorem ipsum dolor sit amet inceptos | Lorem ipsum | Lorem ipsum
----------------------------------- | :---------: | :---------:
Lorem ipsum                         | Lorem ipsum | Lorem ipsum
"""

print markdown.markdown(text, extensions=['extra'])

Tested with waylan-Python-Markdown-2.1.0.beta-0-ge8cdb0b.zip.

Incorrect tag ordering inside of lists

The following markdown:

* ### Promo Item 1 ####
Promotext line 1 Lorem ipsum dolor sit amet, consectetur adipisicing elit.

* ### Promo Item 2 ###
Promotext line 2 Lorem ipsum dolor sit amet, consectetur adipisicing elit.

Produces the following markup:

Promotext line 1 Lorem ipsum dolor sit amet, consectetur adipisicing elit.

Promo Item 1
Promo Item 2

Promotext line 2 Lorem ipsum dolor sit amet, consectetur adipisicing elit.

The first h3 is after the p... it should be the other way around. I have confirmed that this works properly with the original Markdown.pl parser.

PrettifyTreeprocessor breaks if tree contains comments

If an extension (e.g. inlinepattern) adds a comment to the tree, then markdown fails with an exception (TypeError) while subsequently running the PrettifyTreeprocessor.

The offending code is in treeprocessors.py in the _prettifyETree(self, elem) method.

The code attempts to verify that the comment is block level by calling markdown.isBlockLevel(e.tag) on the comment. However e.tag evaluates to a function for a comment and not to a string that the isBlockLevel function is expecting, causing the TypeError exception to be raised.

The code might need to explicitly check for comments and ignore them in this processor.
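In ElementTree, a comment's `.tag` is the `Comment` factory function itself, not a string, which is why a string-based block-level check fails on it. Here is a minimal sketch of a prettify pass that skips comments and processing instructions before looking at the tag. `BLOCK_LEVEL`, `is_block_level` and `prettify` are hypothetical stand-ins, not Python-Markdown's code.

```python
import xml.etree.ElementTree as ET

# Hypothetical subset of block-level tag names.
BLOCK_LEVEL = {"p", "div", "ul", "ol", "li", "blockquote", "table",
               "h1", "h2", "h3", "h4", "h5", "h6", "hr"}


def is_block_level(tag):
    """True for block-level tag names; False for non-string tags (comments, PIs)."""
    return isinstance(tag, str) and tag.lower() in BLOCK_LEVEL


def prettify(elem):
    """Put a newline after each block-level child, recursing into it."""
    for child in elem:
        if child.tag is ET.Comment or child.tag is ET.ProcessingInstruction:
            continue  # not a real element: skip instead of inspecting its tag
        if is_block_level(child.tag):
            prettify(child)
            if not child.tail or not child.tail.strip():
                child.tail = "\n"
```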

Blank line required after HTML Blocks

input = """<div>

</div>
## Heading"""

expected output

u'<div>\n\n</div>\n<h2>Heading</h2>'

actual output

u'<div>\n\n</div>\n## Heading'

Link to bug tracker outdated

In docs/README and docs/README.html, the link to the bug tracker is outdated. It's http://www.freewisdom.org/projects/python-markdown/Tickets instead of https://github.com/waylan/Python-Markdown/issues/.

Support fenced code block within lists and blockquotes

As per this discussion on the markdown list, we should support fenced code blocks inside lists and blockquotes. Currently, they only work at the document root.

While we're at it, we might add support for GitHub's syntax as an alternative?

Footnote extension crashes (stack recursion depth) when there are many footnotes

The problem

The current footnotes implementation allows no more than about 1000 footnotes per document. Any more and you get a very long traceback ending with:

       (… the line below repeated many many times …)
       File "/usr/lib/pymodules/python2.6/markdown/extensions/footnotes.py", line 177, in _handleFootnoteDefinitions
         more_plain = self._handleFootnoteDefinitions(theRest)
       File "/usr/lib/pymodules/python2.6/markdown/extensions/footnotes.py", line 176, in _handleFootnoteDefinitions
         + "\n".join(detabbed))
       File "/usr/lib/pymodules/python2.6/markdown/extensions/footnotes.py", line 98, in setFootnote
         self.footnotes[id] = text
       File "/usr/lib/pymodules/python2.6/markdown/odict.py", line 31, in __setitem__
         super(OrderedDict, self).__setitem__(key, value)
     RuntimeError: maximum recursion depth exceeded while calling a Python object

Simple demo

Save the script below as foot_demo.py:

COUNT = 990     # On my machine this is the smallest value that breaks; yours may differ
for i in range(1, COUNT):
    print("Something[^%d]\n" % i)
for i in range(1, COUNT):
    print("[^%d]: Another thing\n" % i)

then run:

python foot_demo.py > foot_demo.txt
markdown -x footnotes foot_demo.txt

The reason

The _handleFootnoteDefinitions method finds the first footnote definition, then
calls itself recursively on the remaining text. This obviously requires
a very deep stack when there are many footnotes, and it is also fairly
inefficient.

The solution

Just rewrite the method so it loops instead of recursing.
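A minimal sketch of the loop-based approach. It assumes one definition per line in the form `[^id]: text`, with continuation lines indented by four spaces. `DEF_RE` and `handle_footnote_definitions` are hypothetical and do not mirror the extension's internals. Because the function walks the lines once in a plain loop, its stack depth no longer grows with the number of footnotes.

```python
import re

# Hypothetical definition line: up to 3 spaces, "[^id]:", then the text.
DEF_RE = re.compile(r"^[ ]{0,3}\[\^([^\]]+)\]:[ ]*(.*)$")


def handle_footnote_definitions(lines):
    """Split lines into (plain_lines, {id: text}) iteratively, without recursion."""
    footnotes = {}
    plain = []
    current_id = None
    for line in lines:
        m = DEF_RE.match(line)
        if m:
            current_id = m.group(1)
            footnotes[current_id] = [m.group(2)]
        elif current_id is not None and line.startswith("    "):
            footnotes[current_id].append(line[4:])  # indented continuation line
        else:
            current_id = None
            plain.append(line)
    return plain, {key: "\n".join(parts) for key, parts in footnotes.items()}
```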

PS: If the original author has no time to work on the code, I can try
working on the patch, but I am not sure whether I understand all
the ideas in the code properly.

A tag `img` alone on a line is not put into a paragraph

Consider the following code:

import markdown

text = """
Paragraphe

<img src="/exemple1.png" alt="Texte alternatif" />

Paragraphe
"""

print markdown.markdown(text)

Python Markdown doesn't put the image into a paragraph:

<p>Paragraphe</p>
<img src="/exemple1.png" alt="Texte alternatif" />

<p>Paragraphe</p>

while PHP Markdown and Markdown.pl do:

<p>Paragraphe</p>

<p><img src="/exemple1.png" alt="Texte alternatif" /></p>

<p>Paragraphe</p>

Links break if they contain underscores

Example:

[a b c](/a_b_c)

Becomes:

<p><a href="/a�klzzwxh:0000�b�klzzwxh:0001�c">a b c</a></p>

Regex for horizontal rules doesn't follow Markdown.pl and PHP Markdown

In Markdown, we can create a horizontal rule with 3 or more proper symbols (hyphens, underscores or asterisks) with 2 spaces maximum between each symbol. The following examples produce 3 horizontal rules with both Markdown.pl and PHP Markdown:

-  --  -

**  *  **

_  _  _

but Python Markdown doesn't create any horizontal rule:

<ul>
<li>--  -</li>
</ul>
<p><strong>  *  </strong></p>
<p>_  _  _</p>

For information, the regex that I use to highlight a horizontal rule in gedit is the following:

^[ ]{0,3}            # Maximum 3 spaces at the beginning of the line.
(
  (-+[ ]{0,2}){3,} | # 3 or more hyphens, with 2 spaces maximum between each hyphen.
  (_+[ ]{0,2}){3,} | # Idem, but with underscores.
  (\*+[ ]{0,2}){3,}  # Idem, but with asterisks.
)
[ \t]*$              # Optional trailing spaces or tabs.
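Here is the same pattern compiled with Python's `re.VERBOSE` flag, which keeps the comments inline. This is the reporter's regex transcribed as-is, not the one Python-Markdown uses; `HR_RE` and `is_hr` are hypothetical names.

```python
import re

# The gedit pattern above; in verbose mode, spaces inside [...] are still literal.
HR_RE = re.compile(r"""
    ^[ ]{0,3}              # up to 3 leading spaces
    (
      (-+[ ]{0,2}){3,}  |  # 3+ hyphens, at most 2 spaces between them
      (_+[ ]{0,2}){3,}  |  # same with underscores
      (\*+[ ]{0,2}){3,}    # same with asterisks
    )
    [ \t]*$                # optional trailing spaces or tabs
""", re.VERBOSE)


def is_hr(line):
    """True if the line would be a horizontal rule under this pattern."""
    return HR_RE.match(line) is not None
```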

slash escape bug in link url?

Hi, just to report a possible bug I found:

In [3]: markdown.markdown('[q=go:GO\\:0000307](/query?q=go:GO\\:0000307)')
Out[3]: u'<p><a href="/query?q=go:GO\x02klzzwxh:0001\x030000307">q=go:GO:0000307</a></p>'

I also tried it with the Dingus on the original Markdown website: http://daringfireball.net/projects/markdown/dingus. The same input generates

<p><a href="/query?q=go:GO\:0000307">q=go:GO\:0000307</a></p>

correctly.

TOC Ext doesn't ignore `[TOC]` in code blocks/spans

It's not possible to get a literal [TOC] in the output when using the (excellent) table of contents extension. It would be nice to be able to write

`[TOC]`

    [TOC]

and just get the literal [TOC] in the output.

TOC crashes on randomly ordered header levels

If there's an invalid URL following a change in indentation of the form:

### bar

# foo

[a][(b)

I get the following crash:

In [1]: from markdown import markdown

In [2]: s = """### bar
   ...: 
   ...: # foo
   ...: 
   ...: [a][(b)"""

In [3]: markdown(s, extensions=['toc'])
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
/home/wilfred/work/potatopedia/<ipython-input-3-5be88cfe066b> in <module>()
----> 1 markdown(s, extensions=['toc'])

/home/wilfred/work/potatopedia/markdown/__init__.py in markdown(text, *args, **kwargs)
    375     """
    376     md = Markdown(*args, **kwargs)
--> 377     return md.convert(text)
    378 
    379 

/home/wilfred/work/potatopedia/markdown/__init__.py in convert(self, source)
    281         # Run the tree-processors

    282         for treeprocessor in self.treeprocessors.values():
--> 283             newRoot = treeprocessor.run(root)
    284             if newRoot:
    285                 root = newRoot

/home/wilfred/work/potatopedia/markdown/extensions/toc.py in run(self, doc)
    106                     c.append(anchor)
    107 
--> 108                 list_stack[-1].append(last_li)
    109 
    110 class TocExtension(markdown.Extension):

IndexError: list index out of range

Many thanks for Python-Markdown.
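The IndexError shows `list_stack` ending up empty, apparently when a heading is shallower than those before it. Below is a minimal sketch of stack-based nesting that never pops the root, so level orders such as `###` followed by `#` are handled. `nest_headers` and its dict layout are hypothetical, not the toc extension's code.

```python
def nest_headers(headers):
    """Nest (level, title) pairs into a tree, whatever order the levels come in.

    Returns a list of {"level", "title", "children"} dicts for the top level.
    """
    root = {"level": 0, "title": None, "children": []}
    stack = [root]
    for level, title in headers:
        node = {"level": level, "title": title, "children": []}
        # Pop until the top of the stack is shallower; the root is never popped.
        while len(stack) > 1 and stack[-1]["level"] >= level:
            stack.pop()
        stack[-1]["children"].append(node)
        stack.append(node)
    return root["children"]
```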

disable guessing in codehilite

I’d like the option to disable syntax guessing. Something like codehilite(guess_syntax=False).

I feel like I’m starting 80% (a completely made up number) of my blocks with “:::text” to avoid weird colors here and there. So I’m adding markup to make it not do something, where I’d prefer to just add it where I want something to happen.

I could attempt this myself and submit a pull request if you prefer. Thanks.

codehilite should allow setting default language preference

If I run markdown on a single set of docs where all code examples are in the same language, I don't see a reason to repeat ...python over and over again. It would be nice if codehilite allowed specifying the common language once.
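Both of these requests come down to not guessing when a block has no `:::lang` first line. Here is a minimal sketch of that decision, using a hypothetical `default_lang` argument. It does not use Pygments or CodeHilite's actual option names; a `None` result means "no language known", and a caller could then fall back to plain text instead of guessing.

```python
import re

# Hypothetical first-line marker such as ":::python" or ":::text".
LANG_MARKER_RE = re.compile(r"^:::+[ \t]*(?P<lang>[\w+#.-]+)[ \t]*$")


def split_language(code, default_lang=None):
    """Return (language, code), using default_lang when no marker line is present."""
    first, _, rest = code.partition("\n")
    m = LANG_MARKER_RE.match(first)
    if m:
        return m.group("lang"), rest
    return default_lang, code
```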

Issue: nl2br extension and lists

When using the nl2br extension, I found the following bug. If you create a Markdown list and then try to write a code snippet, the code does not get put into <pre> or <code> tags.

I believe the cause is that nl2br does not escape out of the list, which produces the following markup:

<ul>
    <li>
        <p>Helloworld</p>
        <p>My code snippet</p>
    </li>
</ul>

Placeholder slips to HTML markup

When in Markdown source an HTML comment contains Markdown's link markup, the resulting HTML contains link placeholder in place of the comment's text:

Example.md:


Please see <!-- [Example][1] -->

[1]: http://example.com/

Example.html (rendered with python-markdown 2.0.3):


<p>Please see <!-- �klzzwxh:0000� --></p>

Actually, there are STX and ETX characters in the output; they are just not shown.
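The placeholder that leaks here (and in the link-with-underscores and backslash-in-URL reports above, where it shows up as `\x02klzzwxh:0001\x03`) has a recognizable shape: STX, `klzzwxh:`, a four-digit index, then ETX. Below is a minimal sketch for spotting leaked placeholders in rendered HTML. The format is inferred from those outputs rather than from the library source, and `leftover_placeholders` is a hypothetical helper.

```python
import re

# STX (\x02), "klzzwxh:", a four-digit index, then ETX (\x03).
PLACEHOLDER_RE = re.compile(r"\x02klzzwxh:(\d{4})\x03")


def leftover_placeholders(html):
    """Return the indices of any placeholders that leaked into the output."""
    return PLACEHOLDER_RE.findall(html)
```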

Collapsed Lists with headers fail to parse inline markup.

* ### Promo Item 1 ####
  Duis aute irure dolor in _reprehenderit_ in voluptate velit esse cillum dolore eu fugiat nulla pariatur. [Learn more](http://www.naz.edu)
* ### Promo Item 2 ####
  Duis aute irure dolor in reprehenderit in **voluptate** velit esse cillum dolore eu fugiat nulla pariatur. [Learn more.](http://www.naz.edu 'Yay')
* ### Promo Item 3 ####
  Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. [Learn more.](http://www.naz.edu 'Yay')
* ### Promo Item 4 ####
  Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. [Learn more.](http://www.naz.edu 'Yay')

Renders as:

<ul>
<li>
<h3>Promo Item 1</h3>
  Duis aute irure dolor in _reprehenderit_ in voluptate velit esse cillum dolore eu fugiat nulla pariatur. [Learn more](http://www.naz.edu)</li>
<li>
<h3>Promo Item 2</h3>
  Duis aute irure dolor in reprehenderit in **voluptate** velit esse cillum dolore eu fugiat nulla pariatur. [Learn more.](http://www.naz.edu 'Yay')</li>
<li>
<h3>Promo Item 3</h3>

  Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. [Learn more.](http://www.naz.edu 'Yay')</li>
<li>
<h3>Promo Item 4</h3>
  Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. [Learn more.](http://www.naz.edu 'Yay')</li>
</ul>

Buggy parsing of empty link text

When a link is written like this:

[](http://example.com/)

the HTML is generated as:

<a href="http://example.com/" />

By way of comparison GitHub generates it as:

<a href="http://example.com/"></a>

And renders it as:

Interestingly, WebKit seems to handle the <a href="" /> form really badly in the rendering and element inspector. It leaves the link open to the start of the next anchor tag and adds multiple extra tags in the inspector.
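The `<a ... />` form is what an XML-style serializer typically produces for an element with no text or children. Here is a minimal sketch with the standard library's `xml.etree.ElementTree` (not Python-Markdown's own serializer) showing the difference. Which switch the library should use is left open.

```python
import xml.etree.ElementTree as ET

# An <a> element with an href but no text or children, as in the report.
link = ET.Element("a", href="http://example.com/")

# XML serialization collapses the empty element into a self-closing tag.
xml_form = ET.tostring(link, encoding="unicode")
# HTML serialization writes an explicit end tag for non-void elements like <a>.
html_form = ET.tostring(link, encoding="unicode", method="html")
# So does XML serialization with short empty elements turned off (Python 3.4+).
explicit_form = ET.tostring(link, encoding="unicode", short_empty_elements=False)
```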

Minimum version of cElementTree

This is a copy of Ticket 86 from our old bug tracker. It has been copied as-is:

The code checks for 1.0, I had 1.0.2 installed. The Comment and PI symbols were not defined. Installed 1.0.5 of cElementTree and it fixed the problem.

Python Markdown fails to process a list of reference links separated by manual line breaks

Python Markdown fails to process the following code (note the trailing spaces for manual line breaks) according to the Markdown syntax:

[link text 1]  
[link text 2][]  
[link text 3][link label 3]  
[link text 4] [link label 4]

[link text 1]: url1
[link text 2]: url2
[link label 3]: url3
[link label 4]: url4

It outputs:

<p><a href="url2">link text 1</a>[]<br />
[link text 3]<a href="url3">link label 3</a><br />
[link text 4] <a href="url4">link label 4</a></p>

while Markdown.pl and PHP Markdown create 4 links:

<p><a href="url1">link text 1</a> <br />
<a href="url2">link text 2</a> <br />
<a href="url3">link text 3</a> <br />
<a href="url4">link text 4</a></p>

More information: if we add something between the first link and its manual line break, Python Markdown successfully creates 4 links. Example:

This small change to the previous input (the word "lorem" added):

[link text 1] lorem  
[link text 2][]  
[link text 3][link label 3]  
[link text 4] [link label 4]

[link text 1]: url1
[link text 2]: url2
[link label 3]: url3
[link label 4]: url4

will create 4 links:

<p><a href="url1">link text 1</a> lorem<br />
<a href="url2">link text 2</a><br />
<a href="url3">link text 3</a><br />
<a href="url4">link text 4</a></p>
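The first output line suggests that "[link text 1]  " plus a newline plus "[link text 2]" is being read as one reference link whose label is "link text 2". The sketch below uses two illustrative patterns; neither is Python-Markdown's actual regex.

- The loose pattern lets any whitespace separate the two bracket pairs.
- The strict pattern follows Markdown.pl's rule: at most one space, then at most one newline (optionally followed by spaces).

The two trailing spaces are enough to defeat the strict pattern.

```python
import re

# Illustrative only: any whitespace, including "  \n", may separate
# the link text from the label.
LOOSE = re.compile(r"\[([^\]]*)\]\s*\[([^\]]*)\]")

# Markdown.pl-style: at most one space, then at most one newline
# (optionally followed by spaces), between the two bracket pairs.
STRICT = re.compile(r"\[([^\]]*)\][ ]?(?:\n[ ]*)?\[([^\]]*)\]")

source = "[link text 1]  \n[link text 2][]  \n"

# The loose pattern pairs link 1's text with link 2's label (-> url2).
print(LOOSE.search(source).groups())   # ('link text 1', 'link text 2')

# The strict pattern skips link 1 and finds "[link text 2][]" instead.
print(STRICT.search(source).groups())  # ('link text 2', '')

# A single space between the pairs is still accepted, as in link 4.
print(STRICT.search("[link text 4] [link label 4]").groups())
# ('link text 4', 'link label 4')
```

Neither pattern covers the shortcut form [link text 1] on its own; that needs a separate rule.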

Add markdown-urlize to included extensions

It would be great to see markdown-urlize included among the bundled extensions, since automatically linking bare URLs is very common functionality to add to Markdown. Would this be possible? The author seems open to it.
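At its core, a urlize feature finds bare URLs in text and wraps them in links. Below is a minimal standard-library sketch of just that step. It is not the markdown-urlize code, and wiring it into Python-Markdown as an extension is not shown. urlize is a hypothetical helper.

```python
import html
import re

# A bare http(s) URL not directly preceded by "<", a double quote, or a
# word character, so <http://...> autolinks and href="..." are left alone.
URL_RE = re.compile(r'(?<![<"\w])(https?://[^\s<>"]+)')

# Punctuation that usually ends the sentence rather than the URL.
TRAILING_PUNCTUATION = ".,;:!?)"

def urlize(text):
    """Wrap bare URLs in <a> tags (hypothetical helper)."""
    def to_link(match):
        url = match.group(1)
        trimmed = url.rstrip(TRAILING_PUNCTUATION)
        tail = url[len(trimmed):]
        escaped = html.escape(trimmed, quote=True)
        return f'<a href="{escaped}">{escaped}</a>{tail}'
    return URL_RE.sub(to_link, text)

print(urlize("Visit http://example.com today."))
print(urlize("See http://example.com."))
```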

Backslashes disappear during python markdown processing

All backslashes (\) are removed during Markdown processing. Backslashes inside code blocks are displayed properly, but those in the body text are not.

For example, parsing "C:\Program Files" returns "C:Program Files".
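In Markdown, a backslash is only an escape when it precedes one of a fixed set of punctuation characters. Before any other character it stays a literal backslash. The sketch below applies that rule using the escapable characters listed in the Markdown syntax documentation. unescape_markdown is a hypothetical helper.

```python
import re

# Characters the Markdown syntax documentation lists as
# backslash-escapable: \ ` * _ { } [ ] ( ) # + - . !
ESCAPE_RE = re.compile(r"\\([\\`*_{}\[\]()#+\-.!])")

def unescape_markdown(text):
    """Drop a backslash only when it escapes a Markdown special character."""
    return ESCAPE_RE.sub(r"\1", text)

print(unescape_markdown(r"C:\Program Files"))   # C:\Program Files
print(unescape_markdown(r"\*not emphasis\*"))   # *not emphasis*
```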

Can't pass extension arguments to Markdown

According to http://www.freewisdom.org/projects/python-markdown/Using_as_a_Module, I should be able to pass extension arguments as follows:

import markdown
md = markdown.Markdown(extensions=['toc'], extension_configs= {'toc' : ('anchorlink', True)},)

However, this produces the following traceback:

ValueError                                Traceback (most recent call last)
/home/wilfred/bleeding_edge/Python-Markdown/<ipython-input-19-31b461a39945> in <module>()
----> 1 md = markdown.Markdown(extensions=['toc'], extension_configs= {'toc' : ('anchorlink', True)},)

/home/wilfred/bleeding_edge/Python-Markdown/markdown/__init__.py in __init__(self, *args, **kwargs)
    132         self.htmlStash = util.HtmlStash()
    133         self.registerExtensions(extensions=kwargs.get('extensions', []),
--> 134                                 configs=kwargs.get('extension_configs', {}))
    135         self.set_output_format(kwargs.get('output_format', 'xhtml1'))
    136         self.reset()

/home/wilfred/bleeding_edge/Python-Markdown/markdown/__init__.py in registerExtensions(self, extensions, configs)
    158         for ext in extensions:
    159             if isinstance(ext, basestring):
--> 160                 ext = self.build_extension(ext, configs.get(ext, []))
    161             if isinstance(ext, Extension):
    162                 # might raise NotImplementedError, but that's the extension author's problem


/home/wilfred/bleeding_edge/Python-Markdown/markdown/__init__.py in build_extension(self, ext_name, configs)
    177 
    178         # Parse extensions config params (ignore the order)
--> 179         configs = dict(configs)
    180         pos = ext_name.find("(") # find the first "("
    181         if pos > 0:

ValueError: dictionary update sequence element #0 has length 10; 2 is required

The call that fails is dict(('anchorlink', True)). Would it make sense for extension_configs to be a dict of dicts? For example:

markdown.Markdown(extensions=['toc'], extension_configs= {'toc' : {'anchorlink': True}},)

The alternative syntax works fine:

markdown.Markdown(extensions=['toc(anchorlink=1)'])

However, according to http://www.freewisdom.org/projects/python-markdown/Table_of_Contents, I should be able to pass booleans:

md = markdown.Markdown(extensions=['toc(anchorlink=True)'])

but this fails slightly later (I'm not sure whether this is the same issue; apologies if not) with:

In [33]: md = markdown.Markdown(extensions=['toc(anchorlink=True)'])

In [34]: md.convert("[TOC]\n\n# foo\n\n## bar")
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/home/wilfred/bleeding_edge/Python-Markdown/<ipython-input-34-aafe6026c2af> in <module>()
----> 1 md.convert("[TOC]\n\n# foo\n\n## bar")

/home/wilfred/bleeding_edge/Python-Markdown/markdown/__init__.pyc in convert(self, source)
    285         # Run the tree-processors
    286         for treeprocessor in self.treeprocessors.values():
--> 287             newRoot = treeprocessor.run(root)
    288             if newRoot:
    289                 root = newRoot

/home/wilfred/bleeding_edge/Python-Markdown/markdown/extensions/toc.pyc in run(self, doc)
    102                 link.attrib["href"] = '#' + id
    103 
--> 104                 if int(self.config["anchorlink"]):
    105                     anchor = etree.Element("a")
    106                     anchor.text = c.text

ValueError: invalid literal for int() with base 10: 'True'
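The two tracebacks show two separate problems:

- dict() iterates over the bare tuple ('anchorlink', True) element by element, so it tries to treat the 10-character string 'anchorlink' as a key/value pair.
- The toc code calls int() on the string 'True'.

Below is a minimal sketch of more forgiving handling. normalize_extension_config and parse_bool are hypothetical helpers, not Python-Markdown functions.

```python
from collections.abc import Mapping

def normalize_extension_config(config):
    """Return extension settings as a dict (hypothetical helper).

    Accepts a mapping such as {'anchorlink': True} or a sequence of
    (key, value) pairs such as [('anchorlink', True)]. A bare pair like
    ('anchorlink', True) is rejected with a clear message.
    """
    if isinstance(config, Mapping):
        return dict(config)
    pairs = list(config)
    if not all(isinstance(p, (tuple, list)) and len(p) == 2 for p in pairs):
        raise TypeError(
            "extension config must be a dict or a sequence of (key, value) "
            f"pairs, got {config!r}"
        )
    return dict(pairs)

TRUE_STRINGS = {"1", "true", "yes", "on"}
FALSE_STRINGS = {"0", "false", "no", "off", ""}

def parse_bool(value):
    """Interpret True/False, 1/0, or strings such as 'True' or '0' as a bool."""
    if isinstance(value, bool):
        return value
    if isinstance(value, int):
        return value != 0
    text = str(value).strip().lower()
    if text in TRUE_STRINGS:
        return True
    if text in FALSE_STRINGS:
        return False
    raise ValueError(f"cannot interpret {value!r} as a boolean")

# A string value such as the one produced by 'toc(anchorlink=True)'.
settings = normalize_extension_config({"anchorlink": "True"})
print(parse_bool(settings["anchorlink"]))  # True
```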

Markdown Extra support doesn't include modifications to middle-word emphasis

PHP Markdown Extra includes modifications to underscore emphasis, but Python Markdown doesn't support them.

According to the Markdown Extra syntax, underscores in the middle of a word do not produce emphasis. There are two cases:

Strong emphasis

With Markdown Extra, the following:

The file name is "my__text__file.txt".

will be displayed as the following:

<p>The file name is "my__text__file.txt".</p>

However, the following Python Markdown code:

import markdown
text = "The file name is \"my__text__file.txt\"."
print(markdown.markdown(text, extensions=['extra']))

will output:

<p>The file name is "my<strong>text</strong>file.txt".</p>

Light emphasis

Setting the constant SMART_EMPHASIS to False is useful for following the official Markdown syntax. However, this constant should only affect standard Markdown, yet it also affects Markdown Extra. According to the Markdown Extra syntax, the following:

The file name is "my_text_file.txt".

will be displayed as the following:

<p>The file name is "my_text_file.txt".</p>

However, the following Python Markdown code (with SMART_EMPHASIS = False):

import markdown
text = "The file name is \"my_text_file.txt\"."
print(markdown.markdown(text, extensions=['extra']))

will output:

<p>The file name is "my<em>text</em>file.txt".</p>
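Markdown Extra's rule amounts to this: an underscore that opens or closes emphasis must not touch a word character on its outer side. The sketch below expresses that rule as regular expressions. The patterns are illustrative, not Python-Markdown's own. Note that in Python's re, \w also matches the underscore itself.

```python
import re

# "__" / "_" may open emphasis only when not preceded by a word character,
# and close it only when not followed by one. The (?=\S) / (?<=\S) checks
# keep the content from starting or ending with whitespace.
STRONG_RE = re.compile(r"(?<!\w)__(?=\S)(.+?)(?<=\S)__(?!\w)")
EM_RE = re.compile(r"(?<!\w)_(?=\S)(.+?)(?<=\S)_(?!\w)")

def underscore_emphasis(text):
    """Apply Extra-style underscore emphasis to a line of text."""
    text = STRONG_RE.sub(r"<strong>\1</strong>", text)
    return EM_RE.sub(r"<em>\1</em>", text)

print(underscore_emphasis('The file name is "my__text__file.txt".'))
print(underscore_emphasis('The file name is "my_text_file.txt".'))
print(underscore_emphasis("Some __strong__ and _light_ emphasis."))
```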

Id attribute for Setext-style header not recognized

HeaderId extension doesn't work with Setext-style headers. The following code:

import markdown
text = """
Setext-style header {#id1}
===================
Setext-style header {#id2}
-------------------
### Atx-style header ### {#id3}
"""
print(markdown.markdown(text, extensions=['extra']))

will output:

<h1>Setext-style header {#id1}</h1>
<h2>Setext-style header {#id2}</h2>
<h3 id="id3">Atx-style header</h3>

but it should output:

<h1 id="id1">Setext-style header</h1>
<h2 id="id2">Setext-style header</h2>
<h3 id="id3">Atx-style header</h3>

Version used is 2.0.3.
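The {#id} suffix sits at the end of the header text in both styles. One approach is therefore to strip it after the header text has been extracted, whichever style produced the header. The sketch below uses the hypothetical helpers split_header_id and parse_setext; this is not the HeaderId extension's code.

```python
import re

# Header text optionally followed by a {#id} suffix.
ID_RE = re.compile(r"^(?P<text>.*?)\s*\{#(?P<id>[\w:.-]+)\}\s*$")

# A Setext header: one line of text, then a line of "=" or "-".
SETEXT_RE = re.compile(r"^(?P<text>.+)\n(?P<underline>=+|-+)[ ]*$")

def split_header_id(text):
    """Return (header_text, id), with id None when there is no {#id} suffix."""
    m = ID_RE.match(text)
    if m:
        return m.group("text"), m.group("id")
    return text.strip(), None

def parse_setext(block):
    """Return (level, text, id) for a Setext header block, else None."""
    m = SETEXT_RE.match(block)
    if not m:
        return None
    level = 1 if m.group("underline")[0] == "=" else 2
    text, header_id = split_header_id(m.group("text"))
    return level, text, header_id

print(parse_setext("Setext-style header {#id1}\n==================="))
print(parse_setext("Setext-style header {#id2}\n-------------------"))
```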

Need to patch Python-Markdown for Cygwin

Python-Markdown does not work as delivered on Cygwin. It installs, but attempting to run it produces this error:

Traceback (most recent call last):
  File "/usr/bin/markdown", line 44, in <module>
    from markdown import COMMAND_LINE_LOGGING_LEVEL
  File "/usr/bin/markdown.py", line 44, in <module>
ImportError: cannot import name COMMAND_LINE_LOGGING_LEVEL

The problem is that a Cygwin system is actually a Windows system underneath, so it requires the Windows patch. However, sys.platform reports 'cygwin', not 'win32', so the patch is never applied. Just change the detection code so that 'cygwin' also enables the Windows workaround, and all is well.

Here's a patch; please add it!

diff -u markdown.py.old markdown.py
--- markdown.py.old     2011-05-26 17:27:46.000000000 -0400
+++ markdown.py 2011-05-26 18:28:43.014140700 -0400
@@ -30,7 +30,7 @@
 """

 import sys, os
-if sys.platform == 'win32':
+if sys.platform in ['win32', 'cygwin']:
         # We have to remove the Scripts dir from path on windows.
         # If we don't, it will try to import itself rather than markdown lib.
         # This appears to *not* be a problem on *nix systems, only Windows.
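As the comment in the patch explains, the workaround removes the script's own directory from the module search path. That way, import markdown finds the library rather than the script. Below is a sketch of that logic as a testable function. strip_script_dir is a hypothetical helper; its platform check mirrors the patch.

```python
import os
import sys

# Platforms on which the patch applies the workaround.
WINDOWS_LIKE = ("win32", "cygwin")

def strip_script_dir(path_list, script_path, platform=sys.platform):
    """Return path_list without the directory containing script_path.

    On Windows and Cygwin, leaving that directory on sys.path lets
    "import markdown" pick up the markdown script itself instead of the
    library package. On other platforms the list is returned unchanged.
    """
    if platform not in WINDOWS_LIKE:
        return list(path_list)
    script_dir = os.path.dirname(os.path.abspath(script_path))
    return [p for p in path_list if os.path.abspath(p) != script_dir]

paths = ["/usr/bin", "/usr/lib/python2.7/site-packages"]
print(strip_script_dir(paths, "/usr/bin/markdown", platform="cygwin"))
print(strip_script_dir(paths, "/usr/bin/markdown", platform="linux2"))
```

Inside the script, this would be applied as sys.path[:] = strip_script_dir(sys.path, __file__).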

TOC: Problem with "Link-Headers"

This issue has been copied from Ticket 84 of our old bug tracker and was reported by "Eugen" (no contact info provided). It has been copied as-is:

Suppose you have got this markdown structure:
[TOC]
...
# [Wikipedia](http://en.wikipedia.org/)
...
I.e., one of the headers is a link to a website.

Then, the generated TOC link to the section "Wikipedia" does not link to the section inside the document but instead to "http://en.wikipedia.org/". I don't know if this is intended behaviour, but I think it is irritating.
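One way to keep the external link out of the TOC is to build each entry from the header's plain text only, discarding any markup inside the header. The sketch below uses the standard library's ElementTree. toc_entry is a hypothetical helper; this is not how the toc extension is implemented.

```python
import xml.etree.ElementTree as etree

def toc_entry(header, anchor_id):
    """Build a TOC link to #anchor_id from the header's text content only.

    itertext() yields the text of the header and all nested elements, so a
    nested <a href="http://..."> contributes its words but not its href.
    """
    text = "".join(header.itertext()).strip()
    link = etree.Element("a", href="#" + anchor_id)
    link.text = text
    return link

header = etree.fromstring(
    '<h1><a href="http://en.wikipedia.org/">Wikipedia</a></h1>'
)
print(etree.tostring(toc_entry(header, "wikipedia"), encoding="unicode"))
# <a href="#wikipedia">Wikipedia</a>
```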

My tabs are converted into spaces

The input →→code (where → is a tab) is expanded into this:

    code

Why aren’t my tabs retained?
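Markdown parsers typically expand tabs to spaces before parsing, so that indentation can be measured in columns. The sketch below assumes a tab width of 4, Python-Markdown's default tab_length. The two tabs become eight spaces. The first four mark the line as a code block and are removed, which leaves four spaces of content.

```python
TAB_LENGTH = 4  # assumed tab width

source = "\t\tcode"

# Step 1: tabs are expanded to the next multiple of TAB_LENGTH columns.
expanded = source.expandtabs(TAB_LENGTH)

# Step 2: one level of indentation (TAB_LENGTH spaces) marks a code block
# and is stripped; whatever indentation remains is part of the code.
code_content = expanded[TAB_LENGTH:]

print(repr(expanded))      # '        code'
print(repr(code_content))  # '    code'
```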

Multiple emphasis not handled correctly

because _I_ wouldn't kill him, the _bunshin_ would

should come out as

<p>because <em>I</em> wouldn't kill him, the <em>bunshin</em> would</p>

(according to the official spec and Dingus) but in 2.0.3 it comes out as

<p>because <em>I_ wouldn't kill him, the _bunshin</em> would</p>
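The 2.0.3 output is exactly what a greedy quantifier produces, because it runs from the first underscore to the last one in the paragraph. The sketch below only demonstrates that behaviour with illustrative patterns. It does not claim that this is the pattern Python-Markdown used.

```python
import re

text = "because _I_ wouldn't kill him, the _bunshin_ would"

# Greedy: ".+" runs to the LAST underscore in the string.
GREEDY = re.compile(r"_(.+)_")

# Lazy, with boundary checks: each "_..._" pair closes at the first
# following underscore that is preceded by a non-space character and
# followed by a non-word character.
LAZY = re.compile(r"(?<!\w)_(?=\S)(.+?)(?<=\S)_(?!\w)")

def emphasize(pattern, s):
    return pattern.sub(r"<em>\1</em>", s)

print(emphasize(GREEDY, text))
# because <em>I_ wouldn't kill him, the _bunshin</em> would
print(emphasize(LAZY, text))
# because <em>I</em> wouldn't kill him, the <em>bunshin</em> would
```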

Traceback when using cmdline tool with Python 3.2

Built and installed markdown 2.1.0 using Python 3.2.1, on Fedora 16 (markdown_py has been renamed to /usr/bin/markdown_py-3.2).

Now:

% /bin/echo -e "### Heading\n\ntest" | markdown_py-3.2
Traceback (most recent call last):
  File "/usr/bin/markdown_py-3.2", line 34, in <module>
    run()
  File "/usr/lib/python3.2/site-packages/markdown/__main__.py", line 81, in run
    markdown.markdownFromFile(**options)
  File "/usr/lib/python3.2/site-packages/markdown/__init__.py", line 416, in markdownFromFile
    kwargs.get('encoding', None))
  File "/usr/lib/python3.2/site-packages/markdown/__init__.py", line 341, in convertFile
    text = input_file.read()
  File "/usr/lib64/python3.2/codecs.py", line 480, in read
    data = self.bytebuffer + newdata
TypeError: can't concat bytes to str

Specifying -e UTF-8 does not help.
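The failing line, data = self.bytebuffer + newdata, is in the codecs module's stream reader, which expects the stream it wraps to return bytes. Under Python 3, sys.stdin already returns str. Layering such a reader over it therefore produces this TypeError; we infer that from the traceback rather than from the source. The sketch below reproduces the error and then decodes the binary stream exactly once instead. read_source is a hypothetical helper.

```python
import codecs
import io

def read_source(binary_stream, encoding="utf-8"):
    """Decode a binary stream (for example sys.stdin.buffer) exactly once."""
    return io.TextIOWrapper(binary_stream, encoding=encoding).read()

# Reproduce the error: a codecs reader over an already-decoded text
# stream ends up adding str data to its internal bytes buffer.
try:
    codecs.getreader("utf-8")(io.StringIO("### Heading\n\ntest")).read()
except TypeError as exc:
    print("TypeError:", exc)

# Decoding the raw bytes instead works as expected.
raw = io.BytesIO("### Heading\n\ntest".encode("utf-8"))
print(read_source(raw))
```

On the command line, this would be read_source(sys.stdin.buffer, encoding), with the encoding taken from the -e option.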

Implement Attribute Lists

See Maruku for details.

Freeze with table rendered without Markdown Extra

Create a file m.py with the following content:

import markdown

text = """
Lorem ipsum                         | Lorem ipsum | Lorem ipsum
----------------------------------- | :---------: | :---------:
Lorem ipsum dolor sit amet inceptos | Lorem ipsum | Lorem ipsum
"""

print(markdown.markdown(text))

Run the file:
```
$ python m.py
```

Actual result: Python-Markdown freezes. If we interrupt the process, we get the following output:

$ python m.py 
^CTraceback (most recent call last):
  File "m.py", line 9, in <module>
    print markdown.markdown(text)
  File "/home/nom/.local/lib/python2.7/site-packages/markdown/__init__.py", line 386, in markdown
    return md.convert(text)
  File "/home/nom/.local/lib/python2.7/site-packages/markdown/__init__.py", line 283, in convert
    root = self.parser.parseDocument(self.lines).getroot()
  File "/home/nom/.local/lib/python2.7/site-packages/markdown/blockparser.py", line 62, in parseDocument
    self.parseChunk(self.root, '\n'.join(lines))
  File "/home/nom/.local/lib/python2.7/site-packages/markdown/blockparser.py", line 77, in parseChunk
    self.parseBlocks(parent, text.split('\n\n'))
  File "/home/nom/.local/lib/python2.7/site-packages/markdown/blockparser.py", line 93, in parseBlocks
    if processor.test(parent, blocks[0]):
  File "/home/nom/.local/lib/python2.7/site-packages/markdown/blockprocessors.py", line 470, in test
    return bool(self.SEARCH_RE.search(block))
KeyboardInterrupt
$

Expected result: no freeze. Both PHP Markdown (no Extra) and Markdown.pl output the following:

<p>Lorem ipsum                         | Lorem ipsum | Lorem ipsum
----------------------------------- | :---------: | :---------:
Lorem ipsum dolor sit amet inceptos | Lorem ipsum | Lorem ipsum</p>

More information:

The bug doesn't occur with the Extra extension:

print(markdown.markdown(text, extensions=['extra']))

Tested with waylan-Python-Markdown-2.1.0.beta-0-ge8cdb0b.zip.
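The interrupted traceback ends inside a block processor's SEARCH_RE.search(block). A common cause of a regex search that appears to hang is catastrophic backtracking. With a nested quantifier such as (-+[ ]{0,2}){3,}, the engine can try exponentially many ways to split a long run of dashes before the line finally fails to match. We have not confirmed that this is the pattern involved here.

The sketch below writes a horizontal-rule test so that each repetition consumes exactly one marker character, which keeps failing lines cheap. HR_RE is illustrative, not Python-Markdown's pattern.

```python
import re

# Each repetition consumes exactly one marker plus up to two spaces, so
# there is only one way to divide a run of dashes into repetitions and a
# non-matching line is rejected in roughly linear time.
HR_RE = re.compile(
    r"^[ ]{0,3}(?:(?:-[ ]{0,2}){3,}|(?:_[ ]{0,2}){3,}|(?:\*[ ]{0,2}){3,})[ ]*$",
    re.MULTILINE,
)

table = (
    "Lorem ipsum                         | Lorem ipsum | Lorem ipsum\n"
    "----------------------------------- | :---------: | :---------:\n"
    "Lorem ipsum dolor sit amet inceptos | Lorem ipsum | Lorem ipsum"
)

print(HR_RE.search(table))           # None: no line is a horizontal rule
print(bool(HR_RE.search("- - -")))   # True
print(bool(HR_RE.search("*****")))   # True
```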
