Comments (8)

dginev commented on September 4, 2024

This suggestion sounds a little experimental for the usual LaTeX ecosystem, which as you know is where latexml places its main focus. Having structured inputs is lovely, and life would be easier if all of a document's frontmatter/backmatter were already in a standard structured format before the conversion starts. But in LaTeX it generally isn't, and conventions can vary wildly between different class files for \documentclass.

I can certainly imagine a new commonmeta.sty package with its own commonmeta.sty.ltxml support for latexml, which deals with its own kind of metadata.

The current organizational thinking for latexml is to start such experimental extensions outside of the core code, to avoid eclipsing the "usual" LaTeX needs.

from latexml.

castedo commented on September 4, 2024

start such experimental extensions outside of the core code

Yeah, it makes sense to me to keep the core code (in Perl) separate from any code implementing the initial rough feature idea here.

What I'm thinking now is just hashing out some terminology and extra documentation. For instance, for the medium term, LaTeXML could output JATS XML which lacks important metadata by design. Then documentation can describe various ways that this limited JATS can be enhanced to have all kinds of fancy metadata.

In particular, I am now thinking that a quick, cheap recommendation is for an author to literally hand-code the XML that goes into the article meta and then use a trivial XSLT to merge the LaTeXML output JATS with the hand-coded article metadata XML. Then additional documentation can suggest other approaches with third-party software that are fancier and more DRY (like using JSON, YAML, etc.).
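To make the merge idea concrete, here is a minimal sketch of the same step done with Python's standard library instead of XSLT. The function name `merge_article_meta` and both sample documents are made up for illustration; real LaTeXML JATS output will contain more than this, and nothing here is part of LaTeXML itself.

```python
import xml.etree.ElementTree as ET


def merge_article_meta(jats_xml: str, meta_xml: str) -> str:
    """Return the JATS document with its <article-meta> replaced by meta_xml.

    Hypothetical helper: assumes jats_xml has an <article> root and
    meta_xml has an <article-meta> root (JATS elements use no namespace).
    """
    article = ET.fromstring(jats_xml)
    new_meta = ET.fromstring(meta_xml)
    if new_meta.tag != "article-meta":
        raise ValueError("expected an <article-meta> root element")

    front = article.find("front")
    if front is None:
        # <front> comes first inside <article>, so create it at index 0.
        front = ET.Element("front")
        article.insert(0, front)

    old_meta = front.find("article-meta")
    if old_meta is not None:
        # Swap in place so any sibling such as <journal-meta> keeps its position.
        index = list(front).index(old_meta)
        front.remove(old_meta)
        front.insert(index, new_meta)
    else:
        front.append(new_meta)

    return ET.tostring(article, encoding="unicode")


# Stand-in for converter output with minimal metadata (illustrative only).
LATEXML_JATS = (
    "<article><front><article-meta><title-group>"
    "<article-title>Draft</article-title>"
    "</title-group></article-meta></front>"
    "<body><p>Text.</p></body></article>"
)

# Stand-in for the author's hand-coded metadata (illustrative only).
HAND_CODED = (
    "<article-meta><title-group>"
    "<article-title>Final Title</article-title></title-group>"
    "<contrib-group><contrib contrib-type=\"author\"><name>"
    "<surname>Doe</surname><given-names>Jane</given-names>"
    "</name></contrib></contrib-group></article-meta>"
)

merged = merge_article_meta(LATEXML_JATS, HAND_CODED)
print(merged)
```

The body of the article is left untouched; only the metadata element is swapped, which is the whole point of keeping the two sources separate.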

What do you think, @dginev?

castedo commented on September 4, 2024

BTW, what does "backmatter" mean?

dginev commented on September 4, 2024

I think we try to go for the usual meanings. Frontmatter covers the pieces before the main content - various metadata such as title, date, author, affiliation, as well as larger overview lists, such as a table of contents, table of figures...

Backmatter is the same idea at the end of a doc - appendixes, bibliography, glossary, index... I saw a mention of BibTeX at the front page of commonmeta and assumed there are some provisions for the back as well, but on a closer look it seems not.

xworld21 commented on September 4, 2024

I suppose that improving the hyperxmp binding could be a good answer here. hyperxmp lets you add metadata in the tex files for inclusion in the generated PDF, and that metadata overlaps with the JATS metadata. LaTeXML already understands a few hyperxmp commands; it should be easy to fill the remaining gaps and then use the data in the JATS stylesheet. (I feel like I mentioned this already ages ago, in an EPUB context, but the GitHub search doesn't find it.)

dginev commented on September 4, 2024

I'm really not a fan of XMP myself (the old thread I had a comment in was here #1440 (comment))

LaTeXML should of course be able to support hyperxmp for people who want to emit JATS using it, that's a useful idea.

But it is perfectly OK to have a separate package (or several) possibly targeting different metadata sets, with different macro dialects. Each can have a .ltxml binding into the XML schema, and then a shared path into JATS.

I think the easy point of agreement is that the latexml XML schema should be expressive enough to carry through any JATS-endorsed metadata. But the LaTeX macro dialect should be open-ended, similarly to how we support all kinds of variations for \author and friends.

castedo commented on September 4, 2024

I'm not very familiar with the latexml XML schema or .ltxml bindings. But I like the idea of any mechanism that is extremely flexible for authors to tweak.

It is probably worth mentioning the highly unusual environment I am looking at for LaTeXML: a dialect of JATS XML that is part of Baseprint Document Format (BDF) https://baseprints.singlesource.pub/bdf/.

Long-term stable environments:

A) arXiv AutoTeX
B) BDF-to-HTML-to-PDF pipelines ***

Short-term unstable idiosyncratic environments:

C) Makefile/justfile/whatever that generate AutoTeX-compatible files from original "true" source files
D) tools that generate BDF from author source files

LaTeXML could maybe be a component of some cases of D).

I can't speak to what arXiv wants to do, but for B), it does not matter how the source metadata is stored because B) starts with the JATS XML and does not use the original source. For D) I suspect most authors DO NOT want metadata inside LaTeX. For C) I imagine there is a wide variety of preferences.

I'll avoid speculating what makes sense for A), but for the rest I don't see any particular reason article metadata needs to originate from inside a LaTeX file.

[***] Of course, this type of pipeline is under development so it CURRENTLY is not really stable, but that's the long-term goal! 😅

brucemiller commented on September 4, 2024

It is probably the case that most metadata that JATS is interested in has a corresponding markup in some journal class file, though certainly not all of it in any single class file. And obviously not in article or book. Certainly there are \email and \orcid macros in several classes. I think there's a two-part solution here.

Firstly, we should make sure that all the JATS relevant metadata is consistently encoded into LaTeXML's XML by any class bindings that define markup for that data. And then make sure that the JATS stylesheet recognizes and converts those metadata into the appropriate JATS elements.

The second part would be to make provision for such metadata to be supplied outside of the document itself, if desired. You can use --preload=[args]whatever trickery to make it look like \usepackage[args]{whatever} was in the document source. It shouldn't be too hard to write a parser for whatever metadata format you're interested in (YAML, JSON, XMP, commonmetadata...) and there are already (perhaps quirky) tools for inserting that data into the generated document.
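As a rough illustration of how small such a parser can be, here is a sketch in Python that turns a JSON metadata record into a JATS article-meta element. Note that this is not the --preload route described above: it runs outside LaTeXML entirely, and the JSON keys (`title`, `authors`, `surname`, `given`, `orcid`, `email`), the function name, and the sample record are all assumptions chosen for the example, not an existing format.

```python
import json
import xml.etree.ElementTree as ET


def article_meta_from_json(text: str) -> ET.Element:
    """Build a JATS <article-meta> element from a JSON metadata record.

    The JSON layout is hypothetical; adapt the keys to the real source format.
    """
    data = json.loads(text)
    meta = ET.Element("article-meta")

    title_group = ET.SubElement(meta, "title-group")
    ET.SubElement(title_group, "article-title").text = data["title"]

    contrib_group = ET.SubElement(meta, "contrib-group")
    for author in data.get("authors", []):
        contrib = ET.SubElement(
            contrib_group, "contrib", {"contrib-type": "author"}
        )
        # <contrib-id> goes before <name> inside <contrib>.
        if "orcid" in author:
            contrib_id = ET.SubElement(
                contrib, "contrib-id", {"contrib-id-type": "orcid"}
            )
            contrib_id.text = author["orcid"]
        name = ET.SubElement(contrib, "name")
        ET.SubElement(name, "surname").text = author["surname"]
        ET.SubElement(name, "given-names").text = author["given"]
        if "email" in author:
            ET.SubElement(contrib, "email").text = author["email"]
    return meta


# Placeholder record, for illustration only.
SAMPLE = """
{
  "title": "An Example Article",
  "authors": [
    {"surname": "Doe", "given": "Jane",
     "orcid": "https://orcid.org/0000-0002-1825-0097",
     "email": "jane@example.org"},
    {"surname": "Roe", "given": "Richard"}
  ]
}
"""

element = article_meta_from_json(SAMPLE)
xml_text = ET.tostring(element, encoding="unicode")
print(xml_text)
```

The same shape would work for YAML or another input format by swapping the `json.loads` call for the matching parser; the resulting element could then be merged into the converted document.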
