omikhleia / markdown.sile

Native Markdown and Djot support for the SILE typesetting system

License: MIT License

Lua 100.00%

Contributors: celtic-coder, nawordar, omikhleia

markdown.sile's Issues

Installing "markdown.sile" on WSL

Hi Didier (@Omikhleia),

In the past few days, I've been working on installing both the SILE Typesetter and "markdown.sile" in the Windows Subsystem for Linux (WSL). Through trial and error, I was finally able to get it installed in Ubuntu by working through the following steps. I would be interested to know if I just got it working by accident, or might there be a better way of doing this?

In Section 2.7 of the SILE manual (Installing third-party packages), it mentions checking the version of Lua that SILE uses, so that Luarocks will install the correct version. Since I wanted to install locally, I used:

luarocks --lua-version 5.2 install --dev markdown.sile

However, there were problems downloading the "rockspec" file, so I did it manually using:

wget https://luarocks.org/dev/markdown.sile-dev-1.rockspec

I then tried this command, but it gave an error about Lua header files:

luarocks install --dev --local markdown.sile-dev-1.rockspec

Cloning into 'markdown.sile'...
remote: Enumerating objects: 722, done.
remote: Counting objects: 100% (211/211), done.
remote: Compressing objects: 100% (119/119), done.
remote: Total 722 (delta 106), reused 175 (delta 88), pack-reused 511
Receiving objects: 100% (722/722), 527.47 KiB | 2.14 MiB/s, done.
Resolving deltas: 100% (364/364), done.

Error: Failed finding Lua header files. You may need to install them or configure LUA_INCDIR.

The article Error: Failed finding Lua header files. You may need to install them or configure LUA_INCDIR. - Windows 10 with Lua distribution, mentioned that Luarocks could not find "lua.h", "lualib.h" and other related files.

To find these on my system, I installed locate (sudo apt install locate -y) and updated the database (sudo updatedb --prunepaths='/mnt') and then updated the LUA_INCDIR as follows:

luarocks-5.2 config variables.LUA_INCDIR /usr/include/lua5.1 --local

However, the installation still failed. Running the Luarocks command by itself (luarocks) to get the configuration information, was showing the same error for the "include" directory, even though the folder existed and "lua.h" was present:

LUA_INCDIR : (not found)

I installed Lua 5.2 (sudo apt install lua5.2), but this did not create any further "include" folders. The article Lua 5.3 is installed, but I can't locate the correct lua.h, however, gave a pointer in the correct direction.

I installed liblua5.2-dev and updated the "locate" database:

sudo apt install liblua5.2-dev
sudo updatedb --prunepaths='/mnt'

and then checked for further "include" folders:

locate include | grep lua

This gave the following:

/usr/include/lua5.1
/usr/include/lua5.1/lauxlib.h
/usr/include/lua5.1/lua.h
....
/usr/include/lua5.2
/usr/include/lua5.2/lauxlib.h
/usr/include/lua5.2/lua.h
....

Updating the LUA_INCDIR (luarocks config variables.LUA_INCDIR "/usr/include/lua5.2" --local) gave the following configuration, with the "include" directory finally being recognised:

Lua:
   Version    : 5.2
   Interpreter: /usr/bin/lua5.2 (ok)
   LUA_DIR    : /usr (ok)
   LUA_BINDIR : /usr/bin (ok)
   LUA_INCDIR : /usr/include/lua5.2 (ok)
   LUA_LIBDIR : /usr/lib/x86_64-linux-gnu (ok)

This allowed the installation to work correctly (luarocks-5.2 install --dev --local markdown.sile-dev-1.rockspec):

Cloning into 'markdown.sile'...
remote: Enumerating objects: 722, done.
remote: Counting objects: 100% (211/211), done.
remote: Compressing objects: 100% (119/119), done.
remote: Total 722 (delta 106), reused 175 (delta 88), pack-reused 511
Receiving objects: 100% (722/722), 527.47 KiB | 1.70 MiB/s, done.
Resolving deltas: 100% (364/364), done.
Missing dependencies for markdown.sile dev-1:
   embedders.sile (not installed)
   labelrefs.sile (not installed)
   ptable.sile (not installed)
   smartquotes.sile (not installed)
   textsubsuper.sile (not installed)

markdown.sile dev-1 depends on lua >= 5.1 (5.2-1 provided by VM)
markdown.sile dev-1 depends on embedders.sile (not installed)
Installing https://luarocks.org/dev/embedders.sile-dev-1.rockspec
....
....
markdown.sile dev-1 depends on textsubsuper.sile (not installed)
Installing https://luarocks.org/dev/textsubsuper.sile-dev-1.rockspec
Cloning into 'textsubsuper.sile'...
remote: Enumerating objects: 56, done.
remote: Counting objects: 100% (56/56), done.
remote: Compressing objects: 100% (32/32), done.
remote: Total 56 (delta 23), reused 42 (delta 15), pack-reused 0
Receiving objects: 100% (56/56), 47.15 KiB | 928.00 KiB/s, done.
Resolving deltas: 100% (23/23), done.

textsubsuper.sile dev-1 depends on lua >= 5.1 (5.2-1 provided by VM)
textsubsuper.sile dev-1 is now installed in /home/<user>/.luarocks (license: MIT)

markdown.sile dev-1 is now installed in /home/<user>/.luarocks (license: MIT)

As I said, this was trial-and-error over several days. I wasn't always consistent with the commands and the Lua versions that I used, so I may have been giving myself extra work to get it installed locally.

Some final questions: might there have been some checks that I could have run earlier to find out what was missing? Since I am not familiar with the dependencies between Lua versions, or much about the Lua ecosystem, are there some assumptions that needed to be corrected?

Kind Regards,
Liam

Non-breakable space shall be justifiable

Unicode Line Breaking Algorithm:

... then expanding or compressing interword space according to common typographical practice, only the spaces marked by U+0020 SPACE and U+00A0 NO-BREAK SPACE are subject to compression, and only spaces marked by U+0020 SPACE, U+00A0 NO-BREAK SPACE, and occasionally spaces marked by U+2009 THIN SPACE are subject to expansion. All other space characters normally have fixed width.

The "nbsp" is therefore intended to be stretchable.

On the parsers' side of things.

  • Regarding pandocast, Pandoc expands &nbsp (or \ with appropriate/default extension options) to U+00A0 in the Str AST node.
  • Regarding markdown, Lunamark expands &nbsp to U+00A0 passed to the writer.string() method.
  • Regarding djot, \ invokes a dedicated nbsp AST node (which we expanded to U+00A0 but see below).

On SILE's side, it doesn't seem the typesetter / shaper / whatever is involved [1] considers U+00A0 as shrinkable/stretchable for the purpose of justification... Bah! Regardless, we can filter them in our various inputters, and make the necessary adjustment.

In Djot, moreover, since any content can have attributes, we may accept, e.g. \ {.fixed} for a fixed inter-word space, if need be.
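As a rough illustration of the "filter them in our inputters" idea, here is a minimal sketch in Python (the package itself is Lua, so this only shows the logic). The token names `"text"` and `"nbsp"` are hypothetical; a real inputter would emit whatever AST node ends up producing a stretchable, non-breaking glue.

```python
NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE


def split_nbsp(text):
    """Split a string into ("text", s) and ("nbsp", None) tokens.

    Each U+00A0 becomes its own token, so that a later stage can render it
    as a justifiable but non-breaking space instead of a fixed-width glyph.
    """
    tokens = []
    for i, part in enumerate(text.split(NBSP)):
        if i > 0:
            tokens.append(("nbsp", None))
        if part:
            tokens.append(("text", part))
    return tokens


print(split_nbsp("p.\u00a042"))
```

A Djot `{.fixed}` attribute could then simply be handled by the code that consumes the `"nbsp"` token, choosing a fixed-width space instead.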

Footnotes

  1. IMHO, the SIL and XML inputters should actually be responsible for that, not the typesetters/shapers, etc., i.e. it's perhaps best regarded as something that should be done at AST level. But heh, the SIL inputter doesn't even split paragraphs currently; that is partly done by the typesetter (see the typesetter.parseppattern setting, &c). Perhaps a debatable separation of concerns issue...

Tables at 100% width break when centered

Converting a markdown or djot document and using the resilient.book class with default styling

image

Workaround:

Tweak the style file of the document:

table:
  origin: "resilient.book"
  style:
    paragraph:
      after:
        vbreak: false
      align: "justify" # was "center"
      before:
        indent: false

Analysis:

It has to do with the table being 100% of the line, and the "center" (idem for "left", "right") adding lskip/rskip glues around it.
I'm not sure exactly why, or whether this points towards something deeper.

Attributes with some usual SILE dimensions cannot be used

As reported privately, e.g. { width="50%lw" } or { width="2.5cm" } on an image fail to be parsed.

--> The general issue is that attribute syntax is too restrictive (my bad) - Cf. jgm/lunamark#62

I am not sure I can make sense of Pandoc's Haskell code for these, but at least we could be more tolerant in our vendored copy of lunamark for now, so as to work-around this annoying restriction.

Admonitions (syntax extension)

MkDocs' Markdown has "admonitions" as a syntax extension.

While quoting the "title" seems a bit idiosyncratic, we could support something similar:

!!! keyword Inline content for title
    4-space indented block contents...

Compared to the "fenced divs" syntax, it allows for something simpler for environments needing two arguments (where Markdown can be used in both). I mean, it's doable with fenced divs, but it looks a bit "clumsy" / less straightforward in my opinion, and also makes the task of the writer afterwards less obvious for identifying and extracting the necessary bits.

::: { .keyword }
block contents...
:::: { .title }
Content for title
::::
:::

I would tend to think it would bring a more natural way to typeset some things, with an immediate sensible use -- as shown in the PoC below.

image

I wonder what other writers using Markdown would have to say (or how they'd address that type of need with other means).

(Djot) Incorrect header identifier in nested blocks

Warnings about duplicate identifiers are raised when a header occurs in a block (e.g. a Blockquote, but more realistically a Div used for styling), and then internal links do not point to the correct element.

It's a misunderstanding on how the Djot section AST node behaves: We currently take the identifier to use on the rendered header from the previously seen section, ignoring a possible identifier on the header AST node itself.

However:

  • A section is inserted when a header is found at the document level, and the id is set on the section, not on the header.
  • In nested blocks (e.g. in divs), there's no wrapping section, and the id is set on the header.

Note by the way that with our version of djot.lua, other attributes are always on the header (differing from djot.js and the online Djot playground at the current date)

See also jgm/djot#213 for reference.
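The intended lookup order can be sketched as follows, in Python for illustration only. The node shape (dicts with an `attr` table) and the function name are hypothetical and do not reflect the actual djot.lua AST API; the point is just that the header's own id wins, and the section id is only used when a section directly wraps the header.

```python
def header_identifier(header, wrapping_section=None):
    """Return the identifier to use for a rendered header.

    header           -- hypothetical node, e.g. {"attr": {"id": "intro"}}
    wrapping_section -- the section node directly wrapping this header,
                        or None when the header sits in a nested block.
    """
    own_id = header.get("attr", {}).get("id")
    if own_id:
        # Nested blocks (divs, blockquotes): the id lives on the header.
        return own_id
    if wrapping_section is not None:
        # Document level: the id was set on the section instead.
        return wrapping_section.get("attr", {}).get("id")
    return None
```

Passing the wrapping section explicitly, rather than remembering "the previously seen section", is what avoids reusing a stale id for headers inside divs.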

(Markdown) YAML metadata block (Pandoc extension)

I could possibly see interesting uses for Pandoc's yaml_metadata_block extension, were it supported. We'd first have to define/restrict which bits of such a metadata block to use. It also perhaps needs to distinguish between documents processed from the command-line (where one could want to define, e.g., paper size, main fonts, PDF metadata etc.) and documents \include'd in other documents (where such choices would likely have to be ignored, but where other settings could still be needed, e.g. additional fonts and fallbacks).

Testing framework

I experimented with an AST to SIL reconstruction (see also #15) with the purpose of helping writing tests, e.g. "busted" such as those initiated in file inputters/markdown_spec.lua. It would be nice to have a way (e.g. GitHub action?) to automate the testing:

  • On SILE latest official release or ideally a list of releases...
  • On several versions of Lua.

Correctly implement highlight(mark)/insertions/deletions with styling support

The current implementation for "highlight" a.k.a. "mark" (that is, =inline= in Markdown, {=inline=} in Djot -- additionally also supported as a .mark pseudo-class) is just a placeholder (hard-coded to show the text in red, with a big FIXME/TODO). We'd need real support in the code.

What I'd want here is:

  • Some background color by default as fallback, now that we have "liners" spanning multiple lines (in SILE develop branch for 0.15 and likewise in sile-x).
  • Some style-enabled variant when used in resilient styling context, and the latter needs to be upgraded to accept:
    • Customizing the color via a style
    • Possibly to customize the rendering: rather than a rectangular box, use ptable.sile's "rough" support in order to use some sketchy variants (so as to make the text look as manually highlighted)
      • Needs support in ptable.sile for a rough highlight liner
      • Needs support in resilient.sile for a custom style

Pass attributes on link options to a span

Currently [some link](url){ #id .xxx key=value } is supported, but only a subset of the attributes is used (= the id, if present, and the pseudo-classes for cross-references via empty links, e.g. .page etc.).

Other attributes ought to be wrapped onto the content and be applied, so that e.g. [some link](url){ .underline custom-style=MyStyle } would do the expected things...

A workaround is to wrap the link in another span [[some link](url)]{ .underline custom-style=MyStyle }. It works, but it's ugly :)
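The split between "attributes the link consumes" and "attributes to pass on to a wrapping span" could look roughly like this (Python sketch, for illustration only). The attribute table shape is hypothetical, and `CROSSREF_CLASSES` only lists `.page` because that is the one pseudo-class named above; the real set is larger.

```python
# Pseudo-classes consumed by the link itself (hypothetical, partial list).
CROSSREF_CLASSES = {"page"}


def split_link_attributes(attrs):
    """Split attributes into (link_attrs, span_attrs).

    attrs is assumed to look like:
        {"id": "x", "class": ["underline", "page"], "custom-style": "MyStyle"}
    """
    link_attrs, span_attrs = {}, {}
    for key, value in attrs.items():
        if key == "id":
            link_attrs["id"] = value
        elif key == "class":
            kept = [c for c in value if c in CROSSREF_CLASSES]
            passed = [c for c in value if c not in CROSSREF_CLASSES]
            if kept:
                link_attrs["class"] = kept
            if passed:
                span_attrs["class"] = passed
        else:
            # Everything else is applied to the link content via a span.
            span_attrs[key] = value
    return link_attrs, span_attrs
```

When `span_attrs` is non-empty, the link content would be wrapped in a span carrying them, which is what the manual workaround does by hand.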

Support cross-references (debatable)

"See chapter N, section N.M on page P"

We could use our "labelrefs.sile" package for such things, that would be handy for books/articles needing usual cross-reference links.

However Pandoc never seems to have reached a consensus on how this should be declared in (extended-)Markdown. We could come up with our own syntax extension, but that wouldn't be portable. To be investigated.

Ability to disable Markdown extensions

Currently the native markdown inputter enables all supported extensions before invoking the lunamark reader/writer.

IIRC, SILE's \include can accept extra options and pass them to the inputter, so the user could have a way for disabling extensions they do not want for some input file.

Styling horizontal rules... or dinkus and fleurons?

Markdown has several ways for specifying a horizontal rule: "A line containing a row of three or more *, -, or _ characters (optionally separated by spaces) produces a horizontal rule".

Both Pandoc and Lunamark handle all of these identically - resulting in an undifferentiated horizontal rule. And unless I missed it, Pandoc doesn't have an extension to "style" the rule.

I intend to use Markdown for book-like chapters, and no-one uses a bare "full rule" in real books. But overriding it with a single appropriate "separator" design is not fully satisfying either: books may use different rules for different purposes. E.g. it's common to have a centered dinkus * * * (or an asterism ⁂) within a chapter, and a fleuron (or a combination of a fleuron and line-shaped dinkus) at the end of a chapter.

A few variants, on the other hand, might be fairly sufficient for reasonable use, say up to 3 (more tends to be bad practice, as for sectioning commands, where more than 3-4 levels is often frowned upon). In other terms, Markdown offers enough flexibility for different interpretations, and the issue is "just" that the above-mentioned converters/readers cancel this without passing the information to the rendering engine so as to afford customization points.

It would be quite easy to slightly change Lunamark's parsing logic to return the separator(s), so the user-application could take a decision (e.g. render *** as an asterism, * * * as a dinkus, - - - as an end-of-chapter pendant, while keeping --- for the original basic rule, etc.)
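To make the idea concrete, here is a small Python sketch of such a decision function (illustration only; the actual change would be in Lunamark's Lua parser). The variant names follow the examples above; treating any other valid rule as a plain `"rule"` and stripping all surrounding whitespace are simplifying assumptions.

```python
import re

# Three or more identical *, - or _ characters, optionally space-separated.
RULE = re.compile(r"([*_-])(?: *\1){2,}")

# (character, spaced?) -> variant, following the examples in the text.
VARIANTS = {
    ("*", False): "asterism",
    ("*", True): "dinkus",
    ("-", True): "pendant",
    ("-", False): "rule",
}


def classify_rule(line):
    """Return the separator variant for a rule line, or None if not a rule."""
    stripped = line.strip()
    match = RULE.fullmatch(stripped)
    if match is None:
        return None
    spaced = " " in stripped
    return VARIANTS.get((match.group(1), spaced), "rule")
```

The reader would only need to hand the matched separator string to the writer; the mapping itself could then live in the user application or a style.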

It would be a departure, though, from what Pandoc's AST supports... But heh! Perhaps we can be better here, for the purpose of print-quality books.

I'm interested in any feedback (or other ways it could be addressed).

(pandocast) LuaJson breaks with recent versions of LPeg

If LuaJSON is installed from luarocks as follows, at this date (= installing latest 1.3.4-1 at this point)

luarocks install luajson

Then pandocast fails to process files as LuaJSON fails to be loaded: It is broken due to an improper check on the version of LPeg --> See harningt/luajson#47

Workaround:

Install the dev version.

luarocks install luajson --dev

Language specific smart quotes

Both pandocast and markdown support smart single and double quotes, but they replace " by English quotation marks.

Some languages have different preferences for quotation marks, see e.g. https://en.wikipedia.org/wiki/Quotation_mark#Summary_table

This should perhaps be implemented, i.e. using appropriate single or double quotation marks depending on the (current) language

  • Possibly in a specific dependency package ("smartquotes" ?) for re-usability elsewhere
  • Perhaps using Fluent, but that might be an overkill?

E.g.

::: { lang=de }
Lorem "dolor"
:::

Should output Lorem „dolor“ rather than Lorem “dolor”

This could really improve typesetting (as those characters are not easily available from the keyboard, and this is the very rationale behind "smart" quotes).

Caveat: Some languages allow more than one way, and country code (as used by SILE, e.g. fr) are not always sufficient.
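A naive version of the language-dependent replacement, sketched in Python for illustration: the table only covers English and German, unknown languages fall back to English, and quotes are assumed to be balanced and non-nested. A real package would need a fuller table and smarter opening/closing detection.

```python
# Language code -> (opening, closing) double quotation marks.
DOUBLE_QUOTES = {
    "en": ("\u201c", "\u201d"),  # “ ”
    "de": ("\u201e", "\u201c"),  # „ “
}


def smart_double_quotes(text, lang="en"):
    """Replace straight double quotes, alternating opening and closing marks."""
    opening, closing = DOUBLE_QUOTES.get(lang, DOUBLE_QUOTES["en"])
    out = []
    is_open = False
    for ch in text:
        if ch == '"':
            out.append(closing if is_open else opening)
            is_open = not is_open
        else:
            out.append(ch)
    return "".join(out)


print(smart_double_quotes('Lorem "dolor"', "de"))
```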

Known issues with SILE 0.15

Some issues with SILE 0.15:

  • Lots of deprecation warnings
  • Markdown cannot be processed due to lunamark's dependency on "cosmo" (and cosmo is no longer bundled with SILE 0.15)
    • We use our vendored version of lunamark and don't need "cosmo", so we should check how to split the dependency properly (and perhaps mention that upstream, it's not always a necessary dependency)
    • Workarounds:
      • keep using SILE 0.14 for now ;)
      • install cosmo with luarocks explicitly (but it requires gcc etc.)
      • or mkdir cosmo; touch cosmo/init.lua in your working directory, this will make require('cosmo') happy, and since it's not used in our package, that ought to be fine.

Support unnumbered table/figure captions

Opened further to discussion #34 (reply in thread)

With the resilient.book class, one can do in SIL-language:

\begin{figure}
(some figure with a numbered caption)
\caption{yyy}
\end{figure}

\begin[numbering=false]{figure}
(some figure with an unnumbered caption)
\caption{yyy}
\end{figure}

In markdown, I would naively have expected the following to do the same thing:

![some figure with numbered caption](examples/images/someimage.png){ width=5cm }

![some figure with unnumbered caption](examples/images/someimage.png){ .unnumbered width=5cm }

Apparently I forgot it and we do not propagate the necessary options ;)

Raw blocks stopped working in SILE 0.14.6

$ sile examples/sile-and-markdown-manual.sil -t

SILE v0.14.6.r8-g771d87f-dirty (Lua 5.4)
<examples/sile-and-markdown-manual.sil> as sil
[1] [2] <examples/sile-and-markdown.md> as markdown
[3] [4] <./packages/markdown/commands.lua:304> as sil
<./packages/markdown/commands.lua:304> as sil
[5] <./packages/markdown/commands.lua:304> as sil
[6] ! Underfull frame: 116.55325020005pt stretchiness required to fill but only 72pt available at:
(...)
[7] <./packages/markdown/commands.lua:304> as sil
[8] <./packages/markdown/commands.lua:304> as sil
[9] <./packages/markdown/commands.lua:304> as sil

! Document has more than one parent node that looks like a master document! at:
	./packages/markdown/commands.lua:304: in <snippet>:
		[[For instance, this \em{entire} sentence is typeset in a \em{raw block}, in SILE language.␤]] near examples/sile-and-markdown.md:0:0: in \markdown:internal:rawinline[format="sile"]
	examples/sile-and-markdown.md:0:0: in \markdown:internal:rawinline[format="sile"]
	examples/sile-and-markdown.md: in \markdown:internal:paragraph
	examples/sile-and-markdown.md:0:0: in \markdown:internal:rawblock[format="sile"]
	examples/sile-and-markdown.md: in \texlike_stuff
	examples/sile-and-markdown.md: in <snippet>:
		[[# SILE and Markdown␤␤::: {custom-style=raggedleft}␤"Markdown is intended to be as easy-to-read and e]]
	examples/sile-and-markdown.md: in <snippet>:
		[[# SILE and Markdown␤␤::: {custom-style=raggedleft}␤"Markdown is intended to be as easy-to-read and e]]
	examples/sile-and-markdown-manual.sil:58:1: in \include[src="sile-and-markdown.md"]
	examples/sile-and-markdown-manual.sil: in <snippet>:
		[[\begin[class=resilient.book]{document}␤\use[module=packages.autodoc]␤\use[module=packages.barcodes.e]]


stack traceback:
	[C]: in function 'error'
	/usr/local/share/sile/core/utilities.lua:39: in function 'core.utilities.error'
	/usr/local/share/sile/inputters/sil.lua:211: in function 'inputters.sil.parse'
	/usr/local/share/sile/inputters/base.lua:47: in function 'inputters.markdown.process'
	/usr/local/share/sile/core/sile.lua:300: in function 'core.sile.processString'
	./packages/markdown/commands.lua:304: in field '?'
	/usr/local/share/sile/core/sile.lua:391: in function 'core.sile.call'
	./packages/markdown/commands.lua:320: in function <./packages/markdown/commands.lua:319>
	(...tail calls...)
(...)
error summary:
	Processing at: ./packages/markdown/commands.lua:304: in <snippet>:
		[[For instance, this \em{entire} sentence is typeset in a \em{raw block}, in SILE language.␤]] near examples/sile-and-markdown.md:0:0: in \markdown:internal:rawinline[format="sile"]
	Using code at: /usr/local/share/sile/inputters/sil.lua:211: Document has more than one parent node that looks like a master document!

Blank line inserted between sectioning headers and some content

In 1.2.0 and earlier, we tried to play fair with the default book class from SILE, and when processing a sectioning command (chapter, section, etc.), we insert the identifier label for cross-referencing just after it.

A comment in the code did note that the solution is hardly satisfying:

Somewhat messy. If done before the sectioning, it could end on the previous page.
Within its content, it breaks as TOC entries want a table content, so we can't use a function above...
We are left with doing it after, but that's not perfect either vs. page breaks and indent/noindent...
In the resilient.book class, I added a marker option to sections and reimplemented that part, but here we work with what we have...

Or to re-explain it here:

  • The identifier materializes as two special zero-hboxes (\label --> a \pdf:destination hbox + an \info hbox)
  • We cannot insert these before the sectioning command:
    • It could add a blank line there before the section skip (e.g. if previous content terminates a paragraph)
    • It could end up on a preceding page (e.g. parts and chapters add page breaks, even sections might add goodbreak penalties)
  • We cannot insert them inside the chapter header easily: as part of the title, they then get pushed to the ToC logic, or to running headers, etc., and cause some havoc there...
  • We cannot really insert them after the sectioning command -- although this is what we do
    • It could add a blank line there after the section skip (e.g. if subsequent content is not horizontal content such as text etc.)
    • It could impact other things on the way...

The latter is visible with the resilient.book class when running with -d resilient.styles in (future) v2.0.1:

image

The solution adopted by the resilient classes was to add a marker option to the sectioning commands (e.g. \chapter[marker=id]{Title}), which inserts the identifier just before the title, but after the latter was extracted for the ToC, headers, etc.

Had we used this solution here, the problem would have been solved:

image

... But we can't do that currently with SILE's book class.

Update pandocast for pandoc-types 1.23 (Pandoc 3.1)

Implicit figures have apparently changed in pandoc 3.1 (based on pandoc-types 1.23), generating something such as:

[ Figure
    ( "" , [] , [] )
    (Caption Nothing [ Plain [ Str "text" ] ])
    [ Plain
        [ Image
            ( "" , [] , [] ) [ Str "text" ] ( "uri" , "" )
        ]
    ]
]

.... Instead of a Para with an Image.

I haven't checked yet -- it might relate (or not) to the implicit_figures extension.

We'd need to check and possibly update the pandocast inputter to support the change.
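For orientation, here is a hedged Python sketch of how the new node could be unpacked from Pandoc's JSON output. It assumes the usual JSON encoding of pandoc-types (a node is `{"t": ..., "c": [...]}`, a Caption is `[short, blocks]`, an Image is `[attr, alt, [url, title]]`); that encoding should be checked against real `pandoc -t json` output before relying on it. The function name and the returned dict are made up for the example.

```python
def figure_parts(block):
    """Extract id, caption blocks and image URLs from a Figure block.

    Returns None when the block is not a Figure (e.g. an older Para + Image).
    """
    if block.get("t") != "Figure":
        return None
    attr, caption, content = block["c"]
    _short_caption, caption_blocks = caption
    urls = []
    for inner in content:
        if inner.get("t") in ("Plain", "Para"):
            for inline in inner["c"]:
                if inline.get("t") == "Image":
                    _attr, _alt, (url, _title) = inline["c"]
                    urls.append(url)
    return {"id": attr[0], "caption": caption_blocks, "images": urls}
```

Keeping the old Para + Image path alongside this one would let the inputter work with both older and newer Pandoc versions.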

Using resilient lists instead of standard lists

As mentioned in #56, and more generally, so far the markdown/djot packages have tried to play fair with a SILE setup.
So for ordered and unordered lists, we load the (SILE) lists package.
However, when used with the resilient classes, we rather ought to use the resilient.lists package and benefit from its features, incl. styling...

Fenced graph blocks (e.g. DOT graph, etc.)

Some Markdown converters recognize fenced code blocks of a certain type (infostring/class) as a graph drawing language and render them visually.

For the record, my previous experiment with a Pandoc custom Lua writer did that too, but only when certain attributes are set:

All the above examples (= of fenced code blocks) specified the programming language after the fenced
code block marker (e.g. lua ). For the DOT graph language, this converter also supports an extended syntax {.dot width=... height=...}. When a width and/or a height are specified, the graph is included as an image, instead of the corresponding code.

Things to address

  • Prerequisite We'd need good packages for invoking the appropriate program and retrieving the image (my old quick'n dirty experiment for dot is certainly subpar).
  • So as to render the fenced block as an image rather than text, we'd need some appropriate way for the markdown package to know which supporting packages are available for a given format (e.g. possibly checking loaded converters, but the SILE converters is IMHO a wrong mess...)
  • Proper Markdown syntax. In the above case I suggested using width/height because:
    • It's very likely the user needs to control these depending on his favored page layout/style.
    • There must be a way to still get the code rendered as text if wanted.

Djot language support

Djot (https://djot.net/) is John MacFarlane's experimental "light markup syntax", derived from commonmark, but fixing most of the complex syntax pitfalls, yet with more flexibility on some aspects (e.g. attributes everywhere!)

It has an experimental Lua implementation (https://github.com/jgm/djot.lua).

It sounds pretty cool:
A real quick attempt this week-end shows that its use is quite straightforward (reusing most of the commands for Markdown processing), so I might push it here at some point (after some cleanup and experimentation, and adding to the quick experiment some of the important missing bits, e.g. tables!)

Simplifying percentage syntax in width/height attributes

Whether in Djot or Markdown, for key-value attributes used as dimensions, we currently state "that any unit system supported by SILE is accepted"

Hence, e.g. ![](someimage.png){height="4cm"}

An implication, though, is that we need to use width="50%lw", or height="50%fh" for frame-relative sizing... This is not very user-friendly, and probably not what most users would expect.

At our level, we can't know whether those attributes will eventually end up as SILE measurements/lengths. Of course, it could be better if SILE did it natively, but that's a bunch of packages to address (such as image and svg in the core distribution, my own embedders, ptable etc. -- and probably others I overlooked here...)

A proposal would be to:

  • Automatically convert "50%" to "50%lw" for width (note the "lw" rather than "fw" -- I'd tend to think this is the most correct default interpretation)
  • Automatically convert "50%" to "50%fh" for height
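The proposed conversion is small enough to sketch directly (in Python, for illustration; the package is Lua). The function name is hypothetical, and the rule applied here is the narrowest reading of the proposal: only a bare percentage on `width` or `height` is rewritten, anything else is passed through untouched.

```python
import re

# Default relative unit per attribute, as proposed above.
DEFAULT_UNIT = {"width": "lw", "height": "fh"}

BARE_PERCENT = re.compile(r"\d+(?:\.\d+)?%")


def normalize_dimension(key, value):
    """Turn "50%" into "50%lw" (width) or "50%fh" (height)."""
    unit = DEFAULT_UNIT.get(key)
    if unit is not None and BARE_PERCENT.fullmatch(value):
        return value + unit
    return value
```

Values that already carry a unit, such as `50%fw` or `4cm`, are left alone, so existing documents keep working.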

Rethink how feature checks are performed

You can't make an omelette without breaking eggs

After struggling with making silex.sile v0.4 less aggressive and not replace SILE's internals globally... I realize that markdown.sile actually enforces the replacements indirectly...

This is due to the feature check it does, loading the resilient class (which enforces all of silex):

local ok, ResilientBase = pcall(require, 'classes.resilient.base')
self.isResilient = ok and self.class:is_a(ResilientBase)

So even processing a document using the bare "book" ends up loading the silex replacements at some point (and possibly in the course of a document processing, which then causes havoc)

We'd possibly need to do those checks differently, there's some wrong inversion of logic.

Bibliography citations

Bibliography citations are supported both by lunamark (in the native approach route) and Pandoc (with the Pandoc AST route) -- the latter possibly taking advantage of some "pandoc-filter".

SILE has a basic bibTeX bibliography support, but I haven't investigated yet if it could be used here. See comment below...

Embedding Markdown/Djot in a surrounding inline context

Thoughts from sile-typesetter/sile#1866 (comment)

Currently, the SIL case (Some SIL stuff \raw[type=markdown]{Some Markdown}) won't lead to a single paragraph (as the markdown/djot parsers generate a surrounding paragraph).

But likewise, we cannot have e.g. Some Djot stuff `Some Markdown stuff`{.markdown}

We need to do something different whether we are already in horizontal mode or not, and the underlying parser/inputter would likely need to restrict the possibilities to inline elements only.

(Djot) Handle attributes on note references

Missed it in the initial implementation (actually I didn't think of the possibility at all), but Djot obviously allows attaching attributes to the footnote reference. This would be super useful, e.g.

... footnote call[^djot-fun-note]...
... See note [](#id)

{#id mark="†"}
[^djot-fun-note]: Footnote content

This would allow:

  • Passing attributes to the footnote implementation (such as here the "mark" for resilient.footnotes)
  • Handling the id specifically for cross references

Remove the luajson manual installation for pandocast

The README currently states, in the pandocast usage section:

Prerequisites: The LuaJSON module must be installed and available to your SILE environment. This topic is not covered here.

This ought to be a package dependency, without causing an impediment to the user.

From what I recollect, I had issues with non-"sile" scoped dependencies in 0.14.0. To be checked again with 0.14.4 or later; maybe I was initially wrong too.

Command line conversion fails with SILE 0.14.5

SILEX -u inputters.markdown -u packages.autodoc examples/sile-and-markdown.md 
...
! This isn't a SILE document! at examples/sile-and-markdown.md: in <snippet>:
		[[# SILE and Markdown␤␤::: {custom-style=raggedleft}␤"Markdown is intended to be as easy-to-read and e]]
...

Relates to sile-typesetter/sile#1637

Better integration with resilient styles

When the packages are used with the resilient classes, the custom-style should allow referring to a style name (and not only a SILE command). That would allow better integration with the resilient framework and styling paradigm.

(Djot) Bad priority between alpha- and roman-numbered list items

Djot input:

i. ddd
i. ddd

Observed = interpreted as an alpha list:

i. ddd
j. ddd

Workaround to get roman numbers:

i. ddd
ii. ddd

Nevertheless we need to prioritize roman numerals over alpha-numbering when ambiguous: djot.js was already fixed that way.

See jgm/djot.lua#7 (not yet merged at the time of writing) --> Check if it does the trick and consider backporting it into our vendored local copy of djot.lua.
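The priority rule itself can be illustrated with a short Python sketch (the real fix belongs in djot.lua, and may well look at more context than this). It classifies a single list marker, without its trailing punctuation, and prefers roman numerals whenever the marker is a valid one. Note the simplification: single letters such as "c" or "v" are also read as roman here.

```python
import re

# Valid lowercase roman numeral (non-empty thanks to the lookahead).
ROMAN = re.compile(
    r"(?=[ivxlcdm])m{0,3}(cm|cd|d?c{0,3})(xc|xl|l?x{0,3})(ix|iv|v?i{0,3})"
)


def list_style(marker):
    """Classify a list marker as "decimal", "roman" or "alpha" (else None)."""
    m = marker.lower()
    if m.isdigit():
        return "decimal"
    if ROMAN.fullmatch(m):
        # Roman wins over alpha when the marker is ambiguous (e.g. "i").
        return "roman"
    if len(m) == 1 and m.isalpha():
        return "alpha"
    return None
```

With this ordering, a list starting with `i.` is numbered i, ii, ... rather than i, j, ..., which is the behaviour asked for above.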

Captioned images (figures)

Any good document often needs captioned figures -- and for full support, we have them in the resilient.book class. And as with captioned tables, our packages could have a default fallback for other classes. So the question now is how to implement this.

On one hand, there's Pandoc's implicit_figures extension (enabled by default, IIRC):

"An image with nonempty alt text, occurring by itself in a paragraph, will be rendered as a figure with a caption. The image’s alt text will be used as the caption."

To investigate: This doesn't seem to be a first-class citizen in the Pandoc JSON AST, so it is probably done by the supporting Pandoc writers (?) - If so, I'm not sure we'd have to tweak lunamark here, for the native markdown route.

On the other, an alternative to supporting the Pandoc way would be to use a surrounding div (:::) with some "recognized" option.
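To make the implicit_figures rule quoted above concrete, here is a minimal sketch on a toy AST. The node shape (t, c, alt, src fields) is a simplification invented for illustration; it is neither the Pandoc JSON AST nor what lunamark produces, and implicitFigure is a hypothetical name.

```lua
-- Hypothetical helper on a simplified, made-up AST:
--   paragraph = { t = "Para", c = { inline, ... } }
--   image     = { t = "Image", alt = "caption text", src = "file.png" }
-- Returns a figure node when the paragraph consists of a single image
-- with nonempty alt text, and nil otherwise.
local function implicitFigure (para)
  if para.t ~= "Para" or #para.c ~= 1 then return nil end
  local only = para.c[1]
  if only.t == "Image" and only.alt and only.alt ~= "" then
    -- The alt text becomes the caption, as in the Pandoc rule.
    return { t = "Figure", caption = only.alt, image = only }
  end
  return nil
end
```

A paragraph mixing text and an image, or an image with empty alt text, is left alone and would be rendered as a plain inline image.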

Speaker changes in dialogues

In French (at least), it's common to have speaker changes in dialogues introduced by an em-dash:

— Lorem ipsum dolor sit amet, consectetur adipiscing elit, dit il.
— C’est sûr !

For comparison, English does not use such dashes normally.

“Lorem ipsum dolor sit amet, consectetur adipiscing elit,” he said.
“Sure!”

(English may use em-dashes, but at the end of a quote. Anyway, that's not the point).

The tricky thing here is that the following space ought to be fixed (= not stretched or shrunk by line justification) so that subsequent dialogue lines start identically.

Of course, I need to typeset such dialogues...

I was first tempted to do it in SILE itself (as we do for other fancy Unicode characters and French punctuation, etc.) but it is not completely obvious to do there (I'm pulling my hair out trying, and I don't have much hair left, lol) and it requires awaiting a release...

It would be fairly obvious to do here, intercepting —[ ]+ at the start of a paragraph (whether the em-dash is native or derives from --- in Markdown or Djot, it ends up as — anyway in the SILE AST).

The workaround is to manually enforce it, working in Djot only (attributes on anything, here a nbsp):

---\ {.fixed}Lorem .... 

---\ {.fixed}Sure!

Cumbersome and annoying when copying/pasting text....

I dunno. Try harder to implement it in SILE for any input (incl. SIL), or push my already working attempt here?
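For the record, the interception of —[ ]+ at the start of a paragraph could look roughly like this. It is only a sketch on a plain string, not the working attempt mentioned above: splitDialogueDash is a hypothetical name, and what the caller emits for the fixed-width space is left open.

```lua
-- UTF-8 encoding of U+2014 EM DASH, spelled as bytes so that the pattern
-- does not depend on the encoding of the source file.
local EMDASH = "\226\128\148"

-- Hypothetical helper: if the text starts with an em-dash followed by one
-- or more spaces, return the dash and the remaining text (spaces dropped),
-- so that the caller can insert a fixed-width space between the two.
-- Otherwise return nil and the text unchanged.
local function splitDialogueDash (text)
  local rest = text:match("^" .. EMDASH .. "[ ]+(.*)$")
  if rest then
    return EMDASH, rest
  end
  return nil, text
end
```

Only ordinary spaces are matched here; a non-breaking space after the dash would need its own case.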

Closer look at inline HTML in Markdown

A generalization on #13 with elements of analysis and discussion...

  • Block elements, except <hr>

    • Pandoc has extensions native_divs and markdown_in_html_blocks, both enabled by default
    • As of yet, these are unsupported extensions in Lunamark
    • Lunamark has writer.display_html which should work as in Pandoc with the above extensions disabled. (Should = I didn't test).
    • This encompasses a lot of things (incl. <table> for instance), most of which we cannot easily render in a satisfying way (even with the help of a 3rd party HTML parsing library, such as htmlparser hinted at in #18)
  • Block elements, the special case of <hr>

    • As it is supposed not to have any content (in the W3C HTML specification, but browsers try to accept it), maybe we could have a special handling of it...
  • Inline elements, except br and wbr

    • Inline Markdown is valid in them.
    • Pandoc has extension native_spans enabled by default, for <span> elements
    • With it, spans are transformed into the equivalent of bracketed_spans, respecting the structure (i.e. the content is below the Pandoc.Span element)
      $ pandoc -t json
       <span>content _italic_</span>
      {"pandoc-api-version":[1,22,2],"meta":{},"blocks":[{"t":"Para",  "c":[{"t":"Span","c":[["",[],[]],[{"t":"Str","c":"content"},{"t":"Space"},{"t":"Emph","c":[{"t":"Str","c":"italic"}]}]]}]}]}
      
    • Without it, and for any other inline elements (e.g. <sup> etc.), the HTML is spit out, but flattened. I.e. the structure is lost, one gets at the same level the opening tag, the content and the closing tag
      $ pandoc -t json -f markdown-native_spans
      <span>content _italic_</span>
      {"pandoc-api-version":[1,22,2],"meta":{},"blocks":[{"t":"Para","c":[{"t":"RawInline","c":["html","<span>"]},{"t":"Str","c":"content"},{"t":"Space"},{"t":"Emph","c":[{"t":"Str","c":"italic"}]},{"t":"RawInline","c":["html","</span>"]}]}]}
      
    • Lunamark has writer.inline_html which is technically equivalent to Pandoc's markdown-native_spans
  • Inline elements, the special case of br and wbr

    • As they are supposed not to have any content (in the W3C HTML specification, but browsers try to accept it), maybe we could have a special handling of them...

Preliminary conclusions

  • Block elements are hard to reach...
    • Though <hr/> could be done more easily - but is it worth the effort and test anyway, as Markdown supports horizontal rules and we can even achieve nice things with them (#27)
  • Inline elements are hard to reach due to their "flattening" losing the hierarchy tree (... and reconstructing it is probably not a very clever approach) = We can't have e.g. <sup> working without much additional logic. The fact that they allow Markdown content also makes the use of an HTML parsing library very clumsy...
    • Though <span> could be supported by implementing native_spans in Lunamark - but is it worth the effort?
    • In all cases, <br> and <wbr> would be (decently) easy to support (#13)
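As an illustration of why <br> and <wbr> are the easy case: since they carry no content, a raw inline HTML fragment can be recognised on its own, without rebuilding any hierarchy. The sketch below assumes the writer callback receives the raw tag as a string; classifyVoidInline and the returned labels are hypothetical names.

```lua
-- Hypothetical helper: map a raw inline HTML fragment to an action.
-- Accepts <br>, <br/>, <br /> (and the same for wbr), case-insensitively.
-- Tags with attributes, closing tags and any other element return nil.
local function classifyVoidInline (raw)
  local name = raw:lower():match("^<(%a+)%s*/?>$")
  if name == "br" then return "linebreak" end
  if name == "wbr" then return "wordbreak" end
  return nil
end
```

Everything else (e.g. <sup> or <span>) falls through to nil, which matches the conclusion above that those elements need much more logic.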

Should we include an `outputter` that would just write out the output from `inputter.markdown.parse(…)`?

I added a debug locally to do it but it seems highly useful as most likely some people will want to go from Markdown to SILE on a more permanent basis.

diff --git a/inputters/markdown.lua b/inputters/markdown.lua
index 980e97e..8d15216 100644
--- a/inputters/markdown.lua
+++ b/inputters/markdown.lua
@@ -3,6 +3,7 @@
 -- Using the lunamark library.
 --
 local utils = require("packages.markdown.utils")
+local ast = require("sile.core.ast")
 
 local function simpleCommandWrapper (name)
   -- Simple wrapper argound a SILE command
@@ -334,6 +335,7 @@ function inputter.parse (_, doc)
     escaped_line_breaks = true,
   })
   local tree = parse(doc)
+  SU.debug("markdown", "Parsed tree: " .. ast.astToSil(tree))
   -- The Markdown parsing returns a string or a SILE AST table.
   -- Wrap it in some document structure so we can just process it, and if at
   -- root level, load a default support class.

Math support (a.k.a. tex_math_dollars)

We'd need:

  • Support for Pandoc's tex_math_dollars extension in Lunamark (= jgm/lunamark#50) to extract the formula and invoke the writer.
  • The path then for SILE has to be clarified
    • Either we assume its TeX-like math is kind of OK, though with restrictions (at this time of writing) on the supported syntax subset...
    • Or we go the MathML route (which, at the time of writing again, seems a bit better in SILE), but that would mean converting the (La)TeX formula to MathML. We are certainly not going to reinvent the wheel here[1], so we would likely use something existing, e.g. TEMML, although it implies a dependency on NodeJS and forking an external process... And MathML in SILE also has some restrictions (at the time of writing again, such as sile-typesetter/sile#1604), so I am unsure it's worth the effort going that route.
    • However, SILE math is not even an inputter, though I've a few doubts regarding the actual implementation, esp. with AST dumping and round-tripping... (#15)

Footnotes

  1. The Lua ecosystem is so non-existent that the very choice of Lua for SILE could be questioned. And IMHO, Rust is not even better -- said with a grain of salt, to those in the SILE community who might argue otherwise, heh! 😸

"kpairs" error on examples Markdown file

Hi Didier (@Omikhleia),

I installed the SILE Typesetter and "markdown.sile" on Ubuntu as noted in the Discussions. I cloned the repo and then tried to run SILE on the "introduction.md" file in the "examples" folder:

sile --use inputters.markdown --use packages.autodoc introduction.md

This gave the following error (screenshot "kpairs-error" not reproduced here). The text of the error is:

SILE v0.14.9 (Lua 5.2)

! Unexpected Lua error

error summary:
        Processing at: introduction.md: in <snippet>:
                [[# Introduction␤␤This collection of modules for the [SILE](https://github.com/sile-typesetter/sile) t]]
        Using code at: /usr/share/sile/packages/math/texlike.lua:188: attempt to call field 'kpairs' (a nil value)

Run with --traceback for more detailed trace leading up to errors.

Running the --traceback gives the following extended output:

SILE v0.14.9 (Lua 5.2)

! Unexpected Lua error

stack traceback:
        /usr/share/sile/packages/math/texlike.lua:188: in function 'fold_pairs'
        /usr/share/sile/packages/math/texlike.lua:256: in function 'compileToMathML_aux'
        /usr/share/sile/packages/math/texlike.lua:398: in function 'compileToMathML'
        /usr/share/sile/packages/math/texlike.lua:420: in main chunk
        [C]: in function 'require'
        /usr/share/sile/packages/math/init.lua:10: in function 'init'
        /usr/share/lua/5.2/pl/class.lua:38: in function 'call_ctor'
        /usr/share/lua/5.2/pl/class.lua:171: in function 'pack'
        /usr/share/sile/classes/base.lua:145: in function 'loadPackage'
        ...
        /home/new/.luarocks/share/lua/5.2/sile/classes/markdown.lua:7: in function 'init'
        /usr/share/lua/5.2/pl/class.lua:38: in function 'call_ctor'
        /usr/share/lua/5.2/pl/class.lua:171: in function 'constructor'
        /usr/share/sile/inputters/base.lua:31: in function 'classInit'
        /usr/share/sile/inputters/base.lua:40: in function 'requireClass'
        /usr/share/sile/inputters/base.lua:48: in function 'process'
        /usr/share/sile/core/sile.lua:298: in function 'processString'
        /usr/share/sile/core/sile.lua:330: in function </usr/share/sile/core/sile.lua:303>
        (...tail calls...)
        [C]: in function 'xpcall'
        /usr/bin/sile:127: in main chunk
        [C]: in ?

error summary:
        Processing at: introduction.md: in <snippet>:
                [[# Introduction␤␤This collection of modules for the [SILE](https://github.com/sile-typesetter/sile) t]]
        Using code at: /usr/share/sile/packages/math/texlike.lua:188: attempt to call field 'kpairs' (a nil value)

As noted in the Discussion, I installed everything locally, which is why there is a reference to /home/new/.luarocks/share/lua/5.2/sile/classes/markdown.lua:7: in function 'init' in the above output.

Running cat /home/new/.luarocks/share/lua/5.2/sile/classes/markdown.lua gives:

local book = require("classes.book")
local class = pl.class(book)
class._name = "markdown"

function class:_init (options)
  book._init(self, options)
  self:loadPackage("markdown")
  return self
end

return class

This looks like your classes/markdown.lua file (screenshot "classes-markdown-lua" not reproduced here).

I do not know how classes or packages work in Lua, but I would have expected that the local inputters/markdown.lua would have been part of that trace, since it is specified in the --use on the command line.

The only reference to "inputters" in the traceback are the following lines:

/usr/share/sile/inputters/base.lua:40: in function 'requireClass'
/usr/share/sile/inputters/base.lua:31: in function 'classInit'
/usr/share/sile/inputters/base.lua:48: in function 'process'

Is this expected? Further, when I do a locate "markdown.lua", the first three entries are:

/home/new/.luarocks/share/lua/5.2/sile/classes/markdown.lua
/home/new/.luarocks/share/lua/5.2/sile/inputters/markdown.lua
/home/new/.luarocks/share/lua/5.2/sile/lunamark/reader/markdown.lua

These are the expected "local" entries. Is there possibly an issue with the PATH such that the "classes" is being picked up ahead of the "inputters"? Or have I gone down the wrong rabbit trail completely?

In any case, your assistance would be appreciated!

Kind Regards,
Liam
