Hi maintainers I have recently found a combination of raw HTML tag +

Enclosing `fenced-code-blocks` in a `<div>` tag renders incorrect HTML (python-markdown2, closed)

bow commented on May 30, 2024

Enclosing `fenced-code-blocks` in a `<div>` tag renders incorrect HTML

from python-markdown2.

Comments (5)

nicholasserra commented on May 30, 2024

Good catch, thanks! Yeah, if you'd like to dive deeper, please do.

berndbenner commented on May 30, 2024

I am also rendering my documents with version 2.4.3 on Python 3.9, on both Debian and Windows.
Versions 2.4.4 and later fail on the page breaks in my Markdown source.

sub header

It seems that the sub header following the page break is not recognized.

Crozzers commented on May 30, 2024

I took a look into this. The problem lies in the _hash_html_blocks function and the _strict_block_tag_re regex.
Essentially, it attempts to match against HTML block tags (like a div) and then hash them. However, the fenced code block gets put into a nested div, on the same level of indentation, like so:

<div class="enclosing">
<div class="codehilite">
<pre><span></span><code><span class="n">x</span> <span class="o">=</span> <span class="mi">1</span>
</code></pre>
</div>

</div>

And so the regex tries finding <div> blocks by matching against an opening tag and a closing tag. Of course, it matches the closing tag for the nested div and not the second closing tag. This creates something like this, which results in the </div> tag being put into a paragraph:

md5-6c15c5207ae336b3b80cbb077f8b842e


</div>
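
The mismatch described above can be reproduced with a small sketch. The pattern below is a simplified stand-in written for illustration; it is not markdown2's actual `_strict_block_tag_re`, and the `<pre>` content is shortened. It only shows how pairing an opening tag with the first closing tag that follows leaves the outer `</div>` behind.

```python
import re

# Hypothetical, simplified pattern: pair an opening <div> with the NEXT </div>.
# This is an assumption for illustration, not the regex used by markdown2.
naive_div_re = re.compile(r"<div\b[^>]*>.*?</div>", re.DOTALL)

html = (
    '<div class="enclosing">\n'
    '<div class="codehilite">\n'
    '<pre><code>x = 1\n'
    '</code></pre>\n'
    '</div>\n'
    '\n'
    '</div>\n'
)

match = naive_div_re.search(html)
matched = match.group(0)       # what would be treated as one HTML block
leftover = html[match.end():]  # what is left for normal Markdown processing

# The matched block contains two opening tags but only one closing tag,
# so the outer closing tag ends up in the leftover text.
print(matched.count("<div"), matched.count("</div>"))
print(repr(leftover))
```

Because the leftover `</div>` is no longer part of any recognised block, later processing is free to treat it as ordinary text.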

I am currently brainstorming ideas on how to solve this but it's certainly a headache.

@berndbenner could you attach a markdown code snippet for your issue?

bow commented on May 30, 2024

@Crozzers If it helps, in my particular example above, indentation level indeed affects the output.

Indenting the inner fenced code block:

<div class="enclosing">
  ```python
  x = 1
  ```
</div>

resulted in the closing </div> being matched correctly. Looking deeper into #462, trying to undo the newlines added there (or rather, removing a specific combination of them) also rendered the expected HTML.

To be honest, I am a little unsure if I could add a meaningful solution. HTML is not a regular language, and trying to parse these edge cases by piling on more regex seems like a Sisyphean task. Then again, the codebase is also new to me and there are definitely parts that I do not completely understand yet. So 🤞 ~

Crozzers commented on May 30, 2024

I've managed to get a solution mostly working.
My solution is to simply iterate over each line in the text and manually tally up the number of opening/closing tags and then hash the relevant block. It seems to work well but one test is not passing.
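
A minimal sketch of that tallying idea, assuming only `<div>` tags matter and leaving out the hashing step. The names `find_div_blocks`, `OPEN_DIV` and `CLOSE_DIV` are made up for this illustration and are not taken from the eventual PR.

```python
import re

# Hypothetical helpers for illustration; not markdown2's implementation.
OPEN_DIV = re.compile(r"<div\b[^>]*>")
CLOSE_DIV = re.compile(r"</div\s*>")


def find_div_blocks(text):
    """Return (start_line, end_line) index pairs for top-level <div> blocks."""
    blocks = []
    depth = 0
    start = None
    for i, line in enumerate(text.splitlines()):
        opens = len(OPEN_DIV.findall(line))
        closes = len(CLOSE_DIV.findall(line))
        if depth == 0 and opens == 0:
            continue  # outside any block; stray closing tags are ignored
        if depth == 0:
            start = i  # first opening tag of a new top-level block
        depth += opens - closes
        if depth <= 0:
            # Every opening tag seen so far has been closed: block is complete.
            blocks.append((start, i))
            depth = 0
    return blocks


sample = (
    '<div class="enclosing">\n'
    '<div class="codehilite">\n'
    '<pre><code>x = 1\n'
    '</code></pre>\n'
    '</div>\n'
    '\n'
    '</div>\n'
)

print(find_div_blocks(sample))
```

Counting depth this way means the nested `codehilite` div no longer ends the block early; the whole span from the first opening tag to its matching closing tag is reported as one unit.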
The sublist_para test case looks like this:

<p>Some quick thoughts from a coder's perspective:</p>

<ul>
<li><p>The source will be available in a Mercurial ...</p></li>
<li><p>Komodo is a Mozilla-based application...</p>

<ul>
<li>Get a slightly tweaked mozilla build (C++, JavaScript, XUL).</li>
<li>Get a slightly tweaks Python build (C).</li>
<li>Add a bunch of core logic (Python)...</li>
<li>Add Komodo chrome (XUL, JavaScript, CSS, DTDs).</li>
</ul>

<p><p>What this means is that work on and add significant functionality...</p></li>
<li><p>Komodo uses the same extension mechanisms as Firefox...</p></li>
<li><p>Komodo builds and runs on Windows, Linux and ...</p></li>
</ul></p>

But this seems wrong? The final list items should not, in my opinion, be wrapped in an additional <p> tag. When rendering in Firefox it auto corrects to this:

<p></p>
<p>What this means is that work on and add significant functionality...</p></li>
<li><p>Komodo uses the same extension mechanisms as Firefox...</p></li>
<li><p>Komodo builds and runs on Windows, Linux and ...</p></li>
</ul>
<p></p>

So Firefox also does not think the final block should be wrapped in a <p> tag.
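
As a rough way to spot this kind of nesting without a browser, here is a sketch using the standard library's `html.parser`. The class name `NestedParagraphFinder` is made up for this example, and the check is deliberately naive: it only counts `<p>` tags and does not apply the implicit-closing rules a browser uses.

```python
from html.parser import HTMLParser


class NestedParagraphFinder(HTMLParser):
    """Record positions where a <p> opens while another <p> is still open."""

    def __init__(self):
        super().__init__()
        self.open_p = 0
        self.nested = []  # positions as reported by getpos()

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            if self.open_p > 0:
                self.nested.append(self.getpos())
            self.open_p += 1

    def handle_endtag(self, tag):
        if tag == "p" and self.open_p > 0:
            self.open_p -= 1


# Shortened version of the tail of the expected output quoted above.
snippet = (
    "<p><p>What this means...</p></li>\n"
    "<li><p>Komodo uses...</p></li>\n"
    "</ul></p>"
)

finder = NestedParagraphFinder()
finder.feed(snippet)
finder.close()
print(len(finder.nested), "paragraph(s) opened inside another paragraph")
```

Under these assumptions, both inner paragraphs are flagged, since each one opens while the wrapping `<p>` is still unclosed.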

I'll clean up my code a bit and submit a PR with this test case "fixed", and we'll see what happens.
