I am also rendering my documents with version 2.4.3, using Python 3.9 on Debian and Windows.
Versions 2.4.4 and later fail on the page breaks in my Markdown source.
sub header
It seems that the sub header following the page break is not recognized.
I looked into this. The problem lies in the _hash_html_blocks function and the _strict_block_tag_re regex.
Essentially, it tries to match HTML block tags (such as a div) and then hash them. However, the fenced code block is wrapped in a nested div at the same indentation level, like so:
So the regex looks for <div> blocks by pairing an opening tag with a closing tag. However, it stops at the first closing tag it finds, which belongs to the nested div, instead of continuing to the outer div's closing tag. This produces something like the following, where the outer </div> tag ends up inside a paragraph:
</div>
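The mismatch can be reproduced with a simplified stand-in for the block-tag regex. The pattern below is an approximation written for illustration, not copied from markdown2, and the nested "codehilite" div is an assumed shape for the rendered fenced block. Because the lazy (?:.*\n)*? consumes whole lines and stops at the first line that starts with </div>, the inner closing tag ends the match and the outer one is left behind.

```python
import re

# Simplified, hypothetical stand-in for the block-tag regex discussed above:
# an opening <div> at the start of a line, then as FEW whole lines as
# possible (non-greedy), then a </div> that must itself start a line.
BLOCK_RE = re.compile(r"^<(div)\b(?:.*\n)*?</\1>[ \t]*(?=\n+|\Z)", re.M)


def first_block(text: str) -> tuple[str, str]:
    """Return (matched block, text left over after the match)."""
    match = BLOCK_RE.search(text)
    if match is None:
        return "", text
    return match.group(0), text[match.end():]


# Assumed output shape: the fenced block's wrapper div sits at the same
# (zero) indentation as the enclosing div.
source = (
    '<div class="enclosing">\n'
    '<div class="codehilite">\n'
    "<pre>x = 1</pre>\n"
    "</div>\n"
    "</div>\n"
)

block, leftover = first_block(source)
print(block)           # ends at the INNER </div>
print(repr(leftover))  # '\n</div>\n': the outer </div> is left behind
```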
I am currently brainstorming ideas on how to solve this but it's certainly a headache.
@berndbenner could you attach a markdown code snippet for your issue?
@Crozzers If it helps: in my example above, the indentation level does affect the output.
Indenting the inner fenced code block:
<div class="enclosing">
```python
x = 1
```
</div>
resulted in the closing </div> being matched correctly. Looking deeper into #462, undoing the newlines added there (or rather, removing a specific combination of them) also produced the expected HTML.
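One plausible reading of the indentation effect, under the same simplified stand-in regex (an approximation, not markdown2's exact source): since the pattern only accepts a closing tag at the very start of a line, an indented inner </div> is skipped and the match runs on to the outer one. Whether markdown2's rendered output actually keeps the indentation from the Markdown source is an assumption here, not something verified.

```python
import re

# Hypothetical stand-in regex: (?:.*\n)*? only ever consumes whole lines,
# so </div> can only match where it begins a line.
BLOCK_RE = re.compile(r"^<(div)\b(?:.*\n)*?</\1>[ \t]*(?=\n+|\Z)", re.M)


def closing_tags_in_match(text: str) -> int:
    """Count how many </div> tags end up inside the first matched block."""
    match = BLOCK_RE.search(text)
    return match.group(0).count("</div>") if match else 0


flush = '<div class="enclosing">\n<div class="inner">\n</div>\n</div>\n'
indented = '<div class="enclosing">\n    <div class="inner">\n    </div>\n</div>\n'

print(closing_tags_in_match(flush))     # 1: stops at the inner </div>
print(closing_tags_in_match(indented))  # 2: inner </div> skipped, outer matched
```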
To be honest, I am a little unsure whether I could contribute a meaningful solution. HTML is not a regular language, and handling these edge cases by piling on more regex seems like a Sisyphean task. Then again, the codebase is new to me and there are parts I do not fully understand yet. So 🤞 ~
I've managed to get a solution mostly working.
My solution is to iterate over each line of the text, manually tally the opening and closing tags, and then hash the relevant block once they balance. It seems to work well, but one test is not passing.
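The line-by-line tally can be sketched as follows. This is a minimal illustration assuming only div tags are tracked and that a block starts at the beginning of a line; hash_div_blocks and hash_block are hypothetical names, not markdown2's API, and the "md5-" placeholder format is an assumption.

```python
import hashlib
import re

# Hypothetical patterns: only <div> is tracked in this sketch.
OPEN_RE = re.compile(r"<div\b", re.IGNORECASE)
CLOSE_RE = re.compile(r"</div\s*>", re.IGNORECASE)


def hash_block(block: str) -> str:
    """Hypothetical placeholder scheme; markdown2's own hashing may differ."""
    return "md5-" + hashlib.md5(block.encode("utf-8")).hexdigest()


def hash_div_blocks(text: str) -> tuple[str, dict[str, str]]:
    """Replace each top-level <div>...</div> block with a placeholder.

    Opening and closing tags are tallied line by line, so a nested </div>
    only lowers the depth instead of ending the outer block.
    """
    out: list[str] = []
    hashes: dict[str, str] = {}
    block: list[str] = []
    depth = 0
    for line in text.split("\n"):
        # Outside any block, only a line that starts with <div opens one.
        if depth == 0 and not OPEN_RE.match(line):
            out.append(line)
            continue
        block.append(line)
        depth += len(OPEN_RE.findall(line)) - len(CLOSE_RE.findall(line))
        if depth <= 0:
            raw = "\n".join(block)
            key = hash_block(raw)
            hashes[key] = raw
            out.append(key)
            block, depth = [], 0
    # An unclosed block at the end of the input is left as plain text.
    out.extend(block)
    return "\n".join(out), hashes


source = "\n".join([
    "Intro.",
    "",
    '<div class="enclosing">',
    '<div class="codehilite">',
    "<pre>x = 1</pre>",
    "</div>",
    "</div>",
    "",
    "After.",
])

result, hashes = hash_div_blocks(source)
print(result)  # the whole nested block is now a single md5-... placeholder
```

Unlike a lazy regex, the counter keeps nesting balanced. It would still miscount a literal "<div" inside an HTML comment, an attribute value, or a code span, and a fuller version would need the same treatment for every block tag.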
The sublist_para test case looks like this:
<p>Some quick thoughts from a coder's perspective:</p><ul><li><p>The source will be available in a Mercurial ...</p></li><li><p>Komodo is a Mozilla-based application...</p><ul><li>Get a slightly tweaked mozilla build (C++, JavaScript, XUL).</li><li>Get a slightly tweaks Python build (C).</li><li>Add a bunch of core logic (Python)...</li><li>Add Komodo chrome (XUL, JavaScript, CSS, DTDs).</li></ul><p><p>What this means is that work on and add significant functionality...</p></li><li><p>Komodo uses the same extension mechanisms as Firefox...</p></li><li><p>Komodo builds and runs on Windows, Linux and ...</p></li></ul></p>
But this seems wrong? In my opinion, the final list items should not be wrapped in an additional <p> tag. When rendering, Firefox auto-corrects it to this:
<p></p><p>What this means is that work on and add significant functionality...</p></li><li><p>Komodo uses the same extension mechanisms as Firefox...</p></li><li><p>Komodo builds and runs on Windows, Linux and ...</p></li></ul><p></p>
So Firefox also does not think the final block should be wrapped in a <p> tag.
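To spot the same pattern in other expected-output fixtures, a hypothetical checker built on the standard library's html.parser can flag any <p> opened while another <p> is still open. Browsers do not keep such nesting: the HTML parsing algorithm implicitly closes an open p element when a new <p> starts, which is why Firefox rewrites the markup as shown above. This sketch is not part of markdown2's test suite and does not emulate a browser's full error recovery.

```python
from html.parser import HTMLParser

# Elements that never get an end tag, so they must not stay on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class NestedParagraphFinder(HTMLParser):
    """Record every <p> start tag seen while another <p> is still open."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: list[str] = []
        self.nested: list[tuple[int, int]] = []  # (line, column) of each hit

    def handle_starttag(self, tag, attrs):
        if tag == "p" and "p" in self.stack:
            self.nested.append(self.getpos())
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag; ignore strays.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def nested_paragraphs(html: str) -> list[tuple[int, int]]:
    finder = NestedParagraphFinder()
    finder.feed(html)
    finder.close()
    return finder.nested


print(nested_paragraphs("<p>a</p><p>b</p>"))  # []
print(nested_paragraphs("<ul><li><p><p>x</p></li></ul>"))  # one hit
```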
I'll clean up my code a bit and submit a PR with this test case "fixed", and we'll see what happens.
Good catch, thanks! Yeah, if you'd like to dive deeper, please do.