
nicholasserra commented on May 30, 2024

Good catch, thanks! Yeah, if you'd like to dive deeper, please do.

from python-markdown2.

berndbenner commented on May 30, 2024

I am also rendering my documents with version 2.4.3 under Python 3.9 on Debian and Windows.
Versions 2.4.4 and later fail on the page breaks in my markdown source.

sub header

It seems that the sub header following the page break is not recognized.


Crozzers commented on May 30, 2024

I took a look into this. The problem lies in the `_hash_html_blocks` function and the `_strict_block_tag_re` regex.
Essentially, it tries to match HTML block tags (like a `div`) and then hash them. However, the fenced code block is placed in a nested div at the same indentation level, like so:

```html
<div class="enclosing">
<div class="codehilite">
<pre><span></span><code><span class="n">x</span> <span class="o">=</span> <span class="mi">1</span>
</code></pre>
</div>

</div>
```

The regex looks for `<div>` blocks by pairing an opening tag with a closing tag. Of course, it matches the closing tag of the nested div rather than the second closing tag. This produces something like the following, which results in the final `</div>` tag being put into a paragraph:

```
md5-6c15c5207ae336b3b80cbb077f8b842e


</div>
```
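To make the mismatch concrete, here is a minimal Python sketch. The pattern is a simplified, hypothetical stand-in in the style of Markdown.pl's strict block-tag regex (an opening tag at the start of a line, then the fewest whole lines possible, then a closing tag at the start of a line); it is an assumption, not markdown2's exact `_strict_block_tag_re`. Applied to nested divs at the same indentation level, the match stops at the inner `</div>` and leaves the outer one behind.

```python
import re

# Hypothetical, simplified strict block-tag pattern (not markdown2's exact regex):
# an opening <div> at the start of a line, the fewest whole lines possible,
# then the first </div> that begins a line.
STRICT_DIV_RE = re.compile(
    r"""
    ^<(div)\b          # opening tag at the start of a line; name saved in group 1
    (?:.*\n)*?         # any number of whole lines, matched lazily
    </\1>              # a closing tag that starts a line
    [ \t]*(?=\n|\Z)    # optional trailing spaces, then newline or end of text
    """,
    re.X | re.M,
)

# Nested divs at the same indentation level, shaped like the output above.
html = (
    '<div class="enclosing">\n'
    '<div class="codehilite">\n'
    '<pre><code>x = 1\n'
    '</code></pre>\n'
    '</div>\n'
    '\n'
    '</div>\n'
)

match = STRICT_DIV_RE.search(html)
hashed_part = match.group(0)    # ends at the inner </div>
leftover = html[match.end():]   # the outer </div> is never hashed
print(repr(leftover))           # '\n\n</div>\n'
```

The orphaned `</div>` in `leftover` is what later gets treated as ordinary paragraph text.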

I am currently brainstorming ideas on how to solve this, but it's certainly a headache.

@berndbenner, could you attach a markdown snippet that reproduces your issue?


bow commented on May 30, 2024

@Crozzers If it helps: in my example above, the indentation level does affect the output.

Indenting the inner fenced code block:

````markdown
<div class="enclosing">
  ```python
  x = 1
  ```
</div>
````

resulted in the closing `</div>` being matched correctly. Looking deeper into #462, undoing the newlines added there (or rather, removing a specific combination of them) also produced the expected HTML.
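A simplified, hypothetical block-tag pattern (modeled on the Markdown.pl style, not markdown2's exact regex) suggests why indentation matters: if the closing tag has to begin a line, an indented inner `</div>` can no longer end the match, so the search runs on to the outer `</div>`. The two inputs below are illustrative assumptions about the shape of the text the regex sees, not captured markdown2 output.

```python
import re

# Hypothetical, simplified strict block-tag pattern (not markdown2's exact regex):
# the closing </div> must start a line for the lazy match to stop there.
STRICT_DIV_RE = re.compile(r"^<(div)\b(?:.*\n)*?</\1>[ \t]*(?=\n|\Z)", re.M)


def leftover_after_block(html: str) -> str:
    """Return whatever follows the first strict <div> block match."""
    match = STRICT_DIV_RE.search(html)
    return html[match.end():] if match else html


flat = (
    '<div class="enclosing">\n'
    '<div class="codehilite">\n'
    'x = 1\n'
    '</div>\n'
    '</div>\n'
)
indented = (
    '<div class="enclosing">\n'
    '  <div class="codehilite">\n'
    '  x = 1\n'
    '  </div>\n'
    '</div>\n'
)

print(repr(leftover_after_block(flat)))      # '\n</div>\n' (outer tag orphaned)
print(repr(leftover_after_block(indented)))  # '\n' (whole block matched)
```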

To be honest, I am a little unsure whether I could contribute a meaningful solution. HTML is not a regular language, and trying to handle these edge cases by piling on more regex seems like a Sisyphean task. Then again, the codebase is also new to me, and there are definitely parts that I do not fully understand yet. So 🤞


Crozzers commented on May 30, 2024

I've managed to get a solution mostly working.
My approach is to iterate over each line of the text, manually tally the opening and closing tags, and then hash the relevant block. It seems to work well, but one test is not passing.
The `sublist_para` test case looks like this:

```html
<p>Some quick thoughts from a coder's perspective:</p>

<ul>
<li><p>The source will be available in a Mercurial ...</p></li>
<li><p>Komodo is a Mozilla-based application...</p>

<ul>
<li>Get a slightly tweaked mozilla build (C++, JavaScript, XUL).</li>
<li>Get a slightly tweaks Python build (C).</li>
<li>Add a bunch of core logic (Python)...</li>
<li>Add Komodo chrome (XUL, JavaScript, CSS, DTDs).</li>
</ul>

<p><p>What this means is that work on and add significant functionality...</p></li>
<li><p>Komodo uses the same extension mechanisms as Firefox...</p></li>
<li><p>Komodo builds and runs on Windows, Linux and ...</p></li>
</ul></p>
```

But this seems wrong? In my opinion, the final list items should not be wrapped in an additional `<p>` tag. When rendering it, Firefox auto-corrects the markup to this:

```html
<p></p>
<p>What this means is that work on and add significant functionality...</p></li>
<li><p>Komodo uses the same extension mechanisms as Firefox...</p></li>
<li><p>Komodo builds and runs on Windows, Linux and ...</p></li>
</ul>
<p></p>
```

So Firefox also does not think the final block should be wrapped in a `<p>` tag.
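One quick way to flag this kind of markup is to track open `<p>` elements with Python's standard-library `html.parser`, which reports tags exactly as written instead of applying a browser's auto-correction. The helper below is hypothetical (it is not part of markdown2's test suite); it records the line of every `<p>` that opens while another `<p>` is still open.

```python
from __future__ import annotations

from html.parser import HTMLParser


class NestedParagraphFinder(HTMLParser):
    """Hypothetical checker: records lines where <p> opens inside an open <p>."""

    def __init__(self) -> None:
        super().__init__()
        self.open_p = 0
        self.nested_at: list[int] = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            if self.open_p:
                # getpos() gives (line, offset) of the tag being handled
                self.nested_at.append(self.getpos()[0])
            self.open_p += 1

    def handle_endtag(self, tag):
        # Ignore stray </p> tags that have no matching open <p>
        if tag == "p" and self.open_p:
            self.open_p -= 1


def find_nested_paragraphs(html: str) -> list[int]:
    finder = NestedParagraphFinder()
    finder.feed(html)
    finder.close()
    return finder.nested_at


sample = (
    "<ul>\n"
    "<li><p>Komodo is a Mozilla-based application...</p>\n"
    "\n"
    "<p><p>What this means is that work on and add significant functionality...</p></li>\n"
    "</ul></p>\n"
)
print(find_nested_paragraphs(sample))  # [4]
```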

I'll clean up my code a bit and submit a PR with this test case "fixed", and we'll see what happens.
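The line-by-line tallying approach described in this comment can be sketched roughly as follows. This is a hypothetical illustration, not the code from the eventual PR: `hash_div_blocks` is an invented name, it only tracks `<div>` tags, it only starts a block when the opening tag begins a line, and the `md5-` placeholder key is an assumption about the hashing format.

```python
from __future__ import annotations

import hashlib
import re

OPEN_RE = re.compile(r"<div\b", re.I)
CLOSE_RE = re.compile(r"</div\s*>", re.I)


def hash_div_blocks(text: str) -> tuple[str, dict[str, str]]:
    """Replace each top-level <div>...</div> block with an md5 placeholder.

    Hypothetical sketch: tallies opening/closing tags line by line instead of
    relying on a single regex to find the matching end tag.
    """
    out: list[str] = []
    blocks: dict[str, str] = {}
    pending: list[str] = []
    depth = 0
    for line in text.splitlines(keepends=True):
        if depth == 0 and not OPEN_RE.match(line):
            out.append(line)  # ordinary Markdown line, left untouched
            continue
        pending.append(line)
        depth += len(OPEN_RE.findall(line)) - len(CLOSE_RE.findall(line))
        if depth <= 0:  # tags balanced: the whole block has been seen
            html = "".join(pending)
            key = "md5-" + hashlib.md5(html.encode("utf-8")).hexdigest()
            blocks[key] = html
            out.append(key + "\n")
            pending, depth = [], 0
    out.extend(pending)  # an unbalanced block at end of input is left as-is
    return "".join(out), blocks


source = (
    '<div class="enclosing">\n'
    '<div class="codehilite">\n'
    '<pre><code>x = 1\n'
    '</code></pre>\n'
    '</div>\n'
    '\n'
    '</div>\n'
    '\n'
    'Some *Markdown* after the block.\n'
)
hashed, blocks = hash_div_blocks(source)
print(hashed)  # one md5-... placeholder, a blank line, then the Markdown paragraph
```

Because the tally only reaches zero at the outer `</div>`, the blank line inside the enclosing div no longer splits the block.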

