
GitHub.html5 and `<....` about pandoc-goodies (closed)

tajmone avatar tajmone commented on August 16, 2024
GitHub.html5 and `<....`

from pandoc-goodies.

Comments (9)

tajmone avatar tajmone commented on August 16, 2024

This looks more like a general pandoc/markdown problem, and I doubt it is caused by something in the template (I've used it myself with HTML code examples and didn't run into any such problems with angle brackets).

I'm trying to think what the reason for this error might be. One problem with the source you posted above is that the block is declared as being markdown.

Pandoc treats markdown differently from GitHub, and while the above example might work fine in GitHub's documents preview (as above) you must remember that pandoc markdown allows embedding raw HTML tags in a different way than GitHub's GFM.

Also, pandoc will use a different parsing engine for markdown and gfm (since v2.x), so the error might be raised with one format but not the other.

Also, pandoc markdown allows you to backslash-escape any punctuation character.

My understanding is that the angle brackets in the above source are intended as placeholders for example value fields.

The problem is that if you use the ```markdown syntax, pandoc will try to syntax highlight the block as pandoc markdown syntax, and render it as HTML (which then goes through the HTML5 template and its engine).

Technically speaking, in pandoc markdown those angle brackets should be escaped:

# [\<id\>] \<title\>

so they don't get mistaken for HTML tags (not only are they not valid tags, but they also have no closing counterpart in this case). The problem here is that you'd like to see them without the \ in the final result, because you're documenting a usage example.

My advice is to try using escapes, and see if the error goes away and what the final HTML result looks like. Otherwise, you might:

  • enable/disable some pandoc extensions that deal with HTML inside markdown
  • not declare the block as markdown, and just use a verbatim block (no syntax highlighting)
  • use XML entities instead of angle brackets (&lt; and &gt;)
  • try the markdown, gfm and (deprecated) markdown_github formats to see if the error is still there.

I'm sure that by playing/tweaking with these you'll find a way to obtain what you want.
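To make the escaping and entity options above concrete, here is a minimal Python sketch of both transformations applied to a placeholder line. The function names and the placeholder pattern are made up for illustration and are not part of pandoc or of this project.

```python
import html
import re

# Hypothetical pattern: a placeholder is a pair of angle brackets around a
# name without spaces, such as <id> or <attribute1>.
PLACEHOLDER = re.compile(r"<([^<>\s]+)>")


def escape_with_backslashes(text: str) -> str:
    """Backslash-escape angle-bracket placeholders (pandoc markdown style)."""
    return PLACEHOLDER.sub(r"\\<\1\\>", text)


def escape_with_entities(text: str) -> str:
    """Replace &, < and > with their entities (&amp;, &lt;, &gt;)."""
    return html.escape(text, quote=False)


sample = "# [<id>] <title>"
print(escape_with_backslashes(sample))  # # [\<id\>] \<title\>
print(escape_with_entities(sample))     # # [&lt;id&gt;] &lt;title&gt;
```

Which of the two reads better in the final HTML is something to check by converting a small test file with each variant.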


nvoynov avatar nvoynov commented on August 16, 2024

thank you for the answer

The first thing I did was delete the [```markdown] declaration, but that does not help, I think because my file also has many quoted snippets with < and >, like creq doc [-q/--query \<query\>].

The thing is that the standard pandoc html template produces an output file (as do a few others), but this template does not and produces the error mentioned.


tajmone avatar tajmone commented on August 16, 2024

I'm sorry to read that. I'll try to look into it and see what could be the reason for this and how to fix it — but quite frankly, I can't think of any, the template is merely used to receive the converted HTML, injected where the $body$ variable is found.

Can you confirm to me that when you convert your document using the standard pandoc HTML5 template (with --standalone option) you don't get the same error?

Could you please check that your source document file and the GitHub.html5 template are both in UTF-8 encoding, without BOM? (I was thinking that maybe some tool might have changed the encoding or added a BOM.)

Could you please paste here the options by which you invoke pandoc for the document conversion?

And, possibly, the full source file you're trying to convert? You might notice that in your original issue post, the source block got trimmed down by GitHub (because of its odd syntax, mixing markdown and pseudo HTML tags, even GitHub's previewer didn't render it in full).
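The encoding check asked for above can be scripted. Below is a small Python sketch that reports whether a file is valid UTF-8 and whether it starts with a BOM; the helper names are my own, and the file names in the usage comment are simply the ones mentioned in this thread.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """True if the bytes start with the UTF-8 byte order mark (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def is_valid_utf8(data: bytes) -> bool:
    """True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def check_file(path: str) -> dict:
    """Read a file in binary mode and report its UTF-8 and BOM status."""
    with open(path, "rb") as handle:
        data = handle.read()
    return {"utf8": is_valid_utf8(data), "bom": has_utf8_bom(data)}


# Usage (file names as used in this thread):
#   print(check_file("CReq SRS.md"))
#   print(check_file("GitHub.html5"))
```

A result of `{"utf8": True, "bom": False}` for both files would rule out the encoding as a cause.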


nvoynov avatar nvoynov commented on August 16, 2024

The archive below contains the source CReq SRS.md and two outputs converted from it, html and docx. Both were produced without errors, but with the following warning:

[WARNING] Duplicate link reference '[//]' at line 47 column 1

The pandoc options are the same for both: -s --toc --highlight-style=pygments. As you can see, both produce the right result for <> inside `` quotes.

creq.zip


tajmone avatar tajmone commented on August 16, 2024

Thanks, @nvoynov. I've tested converting the markdown file you've provided in the Zip archive, and I've managed to successfully convert it using the GitHub.html5 template (pandoc v2.1.2).

I've used these options (Windows CMD):

pandoc  -f markdown -t html5 ^
        --template=GitHub.html5 ^
        "CReq SRS.md" ^
        -o TEST.html

And the critical part you've mentioned at the beginning of this issue seems correctly rendered (pasted from the final HTML):

# [<id>] <title>
{{
<attribute1>: value1
<attribute2>: value2  
}}
<body>

## [<id>] <title>
<body>

I also get the warning about Duplicate link reference '[//]' at line 47 column 1, but nothing else.

Are you using other tools together with pandoc? Some filters, or other tools you pipe the output to?


nvoynov avatar nvoynov commented on August 16, 2024

Thank you for the investigation. I'm not an advanced pandoc user at the moment, so I missed the -f markdown -t html5 options ...

I don't use any additional tools. I just combine several files into a single one (CReq SRS.md in this case) and then convert it with pandoc through the CLI. Actually Ruby calls the conversion command exactly the same way as through the CLI: pandoc [options] "CReq SRS.md" -o "CReq SRS.html"

I've updated pandoc to 2.1.3 and used the command format provided above, but received the same error. Have you made any changes to GitHub.html5?

>pandoc -f markdown -t html5 --template=GitHub.html5 "CReq SRS.md" -o TEST.html
[WARNING] Duplicate link reference '[//]' at line 47 column 1
"template" (line 857, column 107):
unexpected "<"
expecting letter
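Since the message names a position in "template" (line 857, column 107), one way to narrow things down is to print what the template actually contains there. The Python sketch below does that for any text and 1-based position; `show_location` is a made-up helper, and I'm assuming pandoc counts columns per character, which may not hold if the line contains tabs. In pandoc templates `$` delimits variables and a literal `$` is written `$$`, so a `$` immediately followed by `<` at that spot would be one thing to look for, though that is only a guess and not a confirmed diagnosis.

```python
def show_location(text: str, line: int, column: int, context: int = 30) -> str:
    """Return the text around a 1-based (line, column), with a caret below."""
    lines = text.splitlines()
    if not 1 <= line <= len(lines):
        raise ValueError(f"line {line} is out of range (1..{len(lines)})")
    target = lines[line - 1]
    index = column - 1                      # convert to a 0-based offset
    start = max(0, index - context)
    snippet = target[start:index + context]
    pointer = " " * (index - start) + "^"   # caret under the reported column
    return snippet + "\n" + pointer


# Usage (file name and position taken from the error message above):
#   with open("GitHub.html5", encoding="utf-8") as handle:
#       print(show_location(handle.read(), 857, 107))
```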


tajmone avatar tajmone commented on August 16, 2024

> Actually Ruby calls the conversion command exactly the same way as through CLI

The cause of the problem might be this. You have to be careful when using tools, and many factors have to be taken into consideration. I've stumbled on similar issues when using a pandoc preprocessor (PP) via Bash (for Windows). I had to change some Bash script settings because special characters were not handled properly.

I don't use Ruby actively (just Ruby tools like Sass), so I'm not in a position to advise you where to look exactly, but my guess is that there might be an issue with special character escaping, either at the Ruby level or the shell level. If you're using a *nix style shell, you might resort to some shell commands or tools, like sed. It might have to do with shell expansion, with some of the page's characters being lost during the process and thus breaking the template parsing.

As an example, you can look at how one of the PP macros of this project had to handle this for Bash:

... you'll find comments documenting the differences between the pandoc macro running under Windows CMD and Bash, and how special characters had to be handled in Bash in order to disable shell expansion.

Sometimes it's enough to add some shell options to work around escaping problems, other times it might be necessary to pipe contents through sed or other tools along the line.

From the Ruby snippet you've provided here (pandoc [options] "CReq SRS.md" -o "CReq SRS.html") it looks like you're using Ruby just to invoke pandoc, and the output file is written directly to disk, and not captured by Ruby; so it's unlikely that the document is being corrupted by Ruby strings or I/O handling, but it's worth checking that the [options] are expanded and received by pandoc correctly (sometimes a single corrupted character messes everything up).
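To rule out shell expansion when a script launches pandoc, the command can be passed as a list of arguments so that no shell ever parses it. The sketch below uses Python rather than Ruby purely as an illustration; `build_pandoc_command` and `run_pandoc` are hypothetical helper names, while the pandoc options are the ones already used in this thread.

```python
import subprocess


def build_pandoc_command(source: str, output: str, template: str) -> list[str]:
    """Build the pandoc argument list; each item reaches pandoc unchanged."""
    return [
        "pandoc",
        "-f", "markdown",
        "-t", "html5",
        f"--template={template}",
        source,            # spaces in the file name need no quoting here
        "-o", output,
    ]


def run_pandoc(command: list[str]) -> subprocess.CompletedProcess:
    """Run pandoc without a shell and capture its messages for inspection."""
    # With a list argument and the default shell=False, no shell expansion
    # or quoting rules are applied to the arguments.
    return subprocess.run(command, capture_output=True, text=True)


command = build_pandoc_command("CReq SRS.md", "TEST.html", "GitHub.html5")
print(command)  # inspect exactly what pandoc would receive

# Usage (requires pandoc on the PATH):
#   result = run_pandoc(command)
#   print(result.returncode, result.stderr)
```

Printing the list before running it is a cheap way to confirm that the options arrive intact.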

Under which OS are you testing this? And which shell are you using?

> I'm not an advanced pandoc user at the moment so that I missed -f markdown -t html5 options ...

I had included them for demonstration purposes, but they are actually both optional: the --template= option implicitly sets the output format to HTML5, and the .md extension of the input file is also automatically associated with markdown (pandoc markdown). But it's always good to include the -f option, because different markdown formats are supported, and it also lets you add or remove extensions.
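On adding and removing extensions: pandoc takes them as +name or -name suffixes on the format, for instance -f markdown-raw_html to turn off raw HTML parsing in pandoc markdown. The tiny Python sketch below just composes such a format string; `format_spec` is a made-up helper, and whether disabling raw_html changes anything for this particular error is untested.

```python
def format_spec(base: str, enable=(), disable=()) -> str:
    """Compose a pandoc format string such as 'markdown-raw_html'."""
    enabled = "".join(f"+{name}" for name in enable)
    disabled = "".join(f"-{name}" for name in disable)
    return base + enabled + disabled


print(format_spec("markdown", disable=["raw_html"]))  # markdown-raw_html
print(format_spec("gfm"))                             # gfm

# Usage on the command line (same files as above):
#   pandoc -f markdown-raw_html -t html5 --template=GitHub.html5 "CReq SRS.md" -o TEST.html
```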

Pandoc 2 introduced lots of new features, and I'm also struggling to get used to the changes from pandoc 1.

> Have you made any changes to GitHub.html5

Yes, a few months ago I changed the template so it will work with pandoc v2. After that, only CSS tweaks, no structural changes. But the tests I've carried out were with the current version of the template of this project.

The history of changes of GitHub.html5:

Please update me on this problem and how you solved it. It looks like a common usage-scenario problem that could affect many users, so I think it's worth documenting it in this project (a tips and tricks document of sorts).


tajmone avatar tajmone commented on August 16, 2024

I advise you to try converting the document via a script first (i.e. outside the Ruby app/script) to test if the template conversion works.

Also, try using different shells (Bash, Sh, etc., if under Linux; Bash for Windows, CMD, PowerShell if under Windows) and shell configuration settings (variable expansion, charset, etc.) and see if the problem is at the script or shell level.

Once you've found a working configuration, isolating the problem in your toolchain should be easier.


nvoynov avatar nvoynov commented on August 16, 2024

> it looks like you're using Ruby just to invoke pandoc, and the output file is written directly to disk, and not captured by Ruby

Yes, Ruby just invokes pandoc. So just forget about Ruby, because I get the same error from a bare command line in cmd.exe.

> Under which OS are you testing this? And which shell you're using?

Win10,
cmd.exe,
pandoc 2.1.3 and 2.1.1 the same result

> Also, try use different shells (Bash, Sh, etc.,

will try, thanks

