elifesciences / xml-mapping

Mapping the XML to new Continuum requirements and build of a new Kitchen sink

xml-mapping's Introduction

XML-mapping

This repository contains the 'kitchen sink' article XML that captures how eLife models article data in XML.

Updates to the XML are demonstrated here as feature branches to help downstream changes.

Usage of the kitchen sink as a test fixture, along with its dependencies, should be consolidated here.

eLife file name conventions

See: elife_file_naming_2016_08_25.md

kitchen sink XML

elife-1234567890-v1.xml

Forked from elife-00666.xml on 2021-08-12, because the manuscript ID 666 was conflicting with the actual article 666 and causing problems during testing.

The new MSID is hopefully very obviously fake.

elife-1234567890-v2.xml

Forked from elife-1234567890-v1.xml to demonstrate several updates to handling of content, now adapted as XML for articles published under the Publish Review Curate (PRC) model.

elife-1234567890-v3.xml

Forked from elife-1234567890-v1.xml, this XML contains all JATS4R and other related updates for potential future inclusion. It is not indicative of current content.

elife-00666.xml

The original kitchen sink, kept until all organisation references are updated but otherwise archived.

Do not use, to be removed.

elife-00777.xml

Non-research "feature" content.

xml-mapping's People

Contributors

gnott jayalakshmib ayyppan vijayakumarexeter aravindkumarn de-code dhatshayanidayalakumar sharropbmj

xml-mapping's Issues

Tags the parseJATS parser currently doesn't pick up at all (non-exhaustive)

- inline-formula / mml:math
- nested text formatting tags, such as inside a figure caption or in an abstract. The parser either picks it up as plain text (no tags) or full text (XML tags included)
- element-citation tags probably not parsed yet: comment, patent, ext-link, x [we can probably ignore x tags?]
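The "plain text" versus "full text" distinction for nested formatting tags can be illustrated with a minimal sketch using Python's standard `xml.etree.ElementTree`. This is not the parseJATS implementation; the caption fragment and the helper names `plain_text` and `inner_xml` are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical caption fragment with nested formatting tags (not from the kitchen sink).
CAPTION = (
    "<caption><title>Growth of <italic>E. coli</italic> "
    "at 37 <sup>o</sup>C</title></caption>"
)


def plain_text(element):
    """Concatenate all text inside the element, dropping every tag."""
    return "".join(element.itertext())


def inner_xml(element):
    """Serialise the element's content, keeping child tags but not the outer tag."""
    parts = [element.text or ""]
    for child in element:
        # ET.tostring includes the child's tail text, so nothing is lost.
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)


title = ET.fromstring(CAPTION).find("title")
print(plain_text(title))
print(inner_xml(title))
```

A parser that needs to preserve italics or superscripts downstream would want something closer to the second form, possibly restricted to an allowed list of formatting tags.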

Equal contributions

HTML: not seen in rollover samples.
In author info at end of article:
Contributed equally with: Verena Schuenemann

PDF: footnote symbol and "These authors contributed equally to this work"
If there are 2 separate groups, the second group has a footnote symbol and "These authors also contributed equally to this work"

XML currently:

<fn fn-type="con" id="equal-contrib">
    <label>&#x2020;</label>
    <p>These authors contributed equally to this work</p>
</fn>
<fn fn-type="con" id="equal-contrib2">
    <label>&#x2021;</label>
    <p>These authors also contributed equally to this work</p>
</fn>

add guidance/examples on part labels to figures

add figure with specific permission

Group authors: <contrib-id contrib-id-type="group-author-key">group-author-id1</contrib-id>

Look at this – don’t want the id to display, change somehow?

Add reviewer details

Add reviewer details when they agree to have their names revealed

using the attribute “content-type”

For previous versions, instead of using <date date-type>
Still awaiting clarification from PMC

Does correspondence need to be in author notes?

Put email into their aff only?
This would simplify the XML
Need to consider the parser implications

add image credit to feature example

Decision letter box

The website pattern is not taken from the XML and is boilerplated, therefore if andy changes it will need to tell prod and dev

Other footnotes

Example:

<fn fn-type="other" id="fn1">
<p>
Sophien Kamoun, Johannes Krause, Marco Thines, and Detlef Weigel are listed in alphabetical order
</p>
</fn>

<chem-struct-wrap>

Does the new kitchen sink need <chem-struct-wrap> tag examples?

JATS 1.1 change - important commit

abd40e6

Update clinical trial data in XML

Add full details as per new CrossRef and PubMed requirements.

ref with no id attribute

In the kitchen sink as of today,

https://github.com/elifesciences/XML-mapping/blob/master/elife-00666.xml#L1981

The <ref> pointed to above has no id attribute. It is failing jats-scraper because of this.

<ref>
<element-citation publication-type="software">
<person-group person-group-type="author">
<collab>R Development Core Team</collab>
</person-group>
<data-title>
R: a language and environment for statistical computing
</data-title>
<version>3.2.2</version>
<year iso-8601-date="2015">2015</year>
<publisher-loc>Vienna, Austria</publisher-loc>
<publisher-name>R Foundation for Statistical Computing</publisher-name>
<uri xlink:href="http://www.r-project.org/">http://www.r-project.org/</uri>
</element-citation>
</ref>
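A check for this kind of problem could be run over the kitchen sink before it reaches a parser. The following is a minimal sketch using Python's standard library; it is not part of jats-scraper, and the sample reference list and the function name `refs_missing_id` are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed reference list: the second <ref> has no id attribute.
SAMPLE = """<ref-list>
  <ref id="bib1"><element-citation publication-type="journal"/></ref>
  <ref><element-citation publication-type="software"/></ref>
</ref-list>"""


def refs_missing_id(root):
    """Return the 1-based positions of <ref> elements without a non-empty id."""
    return [
        position
        for position, ref in enumerate(root.iter("ref"), start=1)
        if not ref.get("id")
    ]


print(refs_missing_id(ET.fromstring(SAMPLE)))
```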

Adding PMIDs to References

Does it need a url link as well as the PMID?

<fn fn-type="COI-statement">

26208cb

See: https://www.ncbi.nlm.nih.gov/pmc/pmcdoc/tagging-guidelines/article/dobs.html#dob-coi

PMC discussions: corresponding author

By moving the author email to their aff, we are losing the cross-linking footnote, so the display on PMC is not good. But the XML is better.
MH to discuss with PMC

supplementary-material mimetype and mime-subtype

I'm looking specifically at this for parsing,

<media mimetype="xlsx" xlink:href="elife-00666-fig3-figsupp1-data1-v1.xlsx"/>

Would it also have a mime-subtype, with the values switched, so it'd be something like this?

<media mime-subtype="xlsx" mimetype="application" xlink:href="elife-00666-fig3-figsupp1-data1-v1.xlsx"/>
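While the question is open, a parser may have to cope with both forms. The sketch below, using only Python's standard library, reads whichever attributes are present; the function name `media_type` and the tuple return shape are assumptions, not an existing parser API.

```python
import xml.etree.ElementTree as ET

XLINK = 'xmlns:xlink="http://www.w3.org/1999/xlink"'

# The two forms quoted above, with the xlink namespace declared so they parse standalone.
OLD_MEDIA = f'<media {XLINK} mimetype="xlsx" xlink:href="elife-00666-fig3-figsupp1-data1-v1.xlsx"/>'
NEW_MEDIA = (
    f'<media {XLINK} mime-subtype="xlsx" mimetype="application" '
    'xlink:href="elife-00666-fig3-figsupp1-data1-v1.xlsx"/>'
)


def media_type(media):
    """Return (mimetype, mime-subtype); either value is None when the attribute is absent."""
    return media.get("mimetype"), media.get("mime-subtype")


print(media_type(ET.fromstring(OLD_MEDIA)))
print(media_type(ET.fromstring(NEW_MEDIA)))
```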

Values you'd think are numeric are not always numeric

Values you'd think are numeric are not always numeric (which matters if the JSON schema expects a number), such as fpage, volume, issue, year, etc.
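One defensive approach is to convert a value only when it is purely made of ASCII digits and otherwise keep it as text. This is a minimal sketch; the function name `as_number_or_text` and the sample values are hypothetical, and whether the JSON schema should accept strings at all is a separate decision.

```python
def as_number_or_text(value):
    """Return an int for plain ASCII digit strings, otherwise the stripped text."""
    text = value.strip()
    # isascii() guards against characters such as superscript digits,
    # which pass isdigit() but are rejected by int().
    if text.isascii() and text.isdigit():
        return int(text)
    return text


# Hypothetical values of the kind that might appear in fpage, volume or issue.
for sample in ["5", "e00666", "Suppl 1", " 2015 "]:
    print(repr(as_number_or_text(sample)))
```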

add to Kitchen sink: Figure in Appendix with figure supplements

Levels of headings

Only 4 allowed - HTML only allows for 5 and one of these is the article title
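A limit on heading levels could be checked by measuring how deeply <sec> elements are nested. The sketch below assumes that heading level corresponds to <sec> nesting depth, which is an assumption rather than something stated here; the sample body and the function name `max_section_depth` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical body with three levels of nested sections.
BODY = "<body><sec><sec><sec/></sec></sec><sec/></body>"


def max_section_depth(element, depth=0):
    """Return the deepest level of <sec> nesting found below the element."""
    deepest = depth
    for child in element:
        child_depth = depth + 1 if child.tag == "sec" else depth
        deepest = max(deepest, max_section_depth(child, child_depth))
    return deepest


print(max_section_depth(ET.fromstring(BODY)))
```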

mono and pre-formatted - add to xml

Kriya and process implications: Versioning and history information and dates

See 0bc1262

This new requirement has big implications for the workflow.

Exeter will need to know of ALL previous versions of PoA content and add them to the XML (dates of publication and URL links)
Kriya will require the addition of two new fields - publication date for PoA'd content (add the new VoR date)
Kriya will need a new field so production can add a note if publishing a new version of a VoR so they can indicate the changes made

What do we do about ORCIDs for members of group?

Example paper 17044

Author contributions

Change made:
No initials or name added to start of footnote in XML.
HTML will not need to do anything to the content, just pull straight over.
PDF will require some formatting for display

Cross linking to funding - edit but still enabled for profiles

In author aff:
<xref ref-type="other" rid="par-1"/>

In funding info:
<funding-group>
    <award-group id="par-1">
        <funding-source>
            <institution-wrap>
                <institution>Wellcome</institution>
            </institution-wrap>
        </funding-source>
        <principal-award-recipient>
            <name>
                <surname>Harrison</surname>
                <given-names>Melissa</given-names>
            </name>
        </principal-award-recipient>
    </award-group>
</funding-group>

Will the xref be used to build Author profiles?
Can name in funding be used instead and remove the xref link?
We think that is better option for text and data mining - Melissa can verify with T&DM people
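To show what relying on the name in the funding information might look like, here is a minimal sketch that lists recipients per award group without using the xref. It uses only Python's standard library; the function name `recipients_by_award` is hypothetical, and matching people by name alone is an assumption that would break for authors who share a name.

```python
import xml.etree.ElementTree as ET

FUNDING = """<funding-group>
  <award-group id="par-1">
    <funding-source>
      <institution-wrap><institution>Wellcome</institution></institution-wrap>
    </funding-source>
    <principal-award-recipient>
      <name><surname>Harrison</surname><given-names>Melissa</given-names></name>
    </principal-award-recipient>
  </award-group>
</funding-group>"""


def recipients_by_award(funding_group):
    """Map each award-group id to the list of recipient names it contains."""
    result = {}
    for award in funding_group.iter("award-group"):
        names = []
        for name in award.iter("name"):
            given = name.findtext("given-names", default="")
            surname = name.findtext("surname", default="")
            names.append(f"{given} {surname}".strip())
        result[award.get("id")] = names
    return result


print(recipients_by_award(ET.fromstring(FUNDING)))
```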

punctuation between affiliation fields - check PMC and remove?

sub groups within author group?

Proposal:

<contrib contrib-type="author">
    <collab>eLife Working Group
        <contrib-group>
            <role>Production Group</role>
            <contrib>
                <name>
                    <surname>Shearer</surname>
                    <given-names>Alistair</given-names>
                </name>
                <aff>
                    <institution>eLife</institution>,
                    <addr-line>
                        <named-content content-type="city">Cambridge</named-content>
                    </addr-line>
                    <country>United Kingdom</country>
                </aff>
            </contrib>
            <contrib>
                <name>
                    <surname>Caton</surname>
                    <given-names>Hannah</given-names>
                </name>
                <aff>
                    <institution>eLife</institution>,
                    <addr-line>
                        <named-content content-type="city">Cambridge</named-content>
                    </addr-line>
                    <country>United Kingdom</country>
                </aff>
            </contrib>
        </contrib-group>
        <contrib-group>
            <role>Technical Group</role>
            <contrib>
                <name>
                    <surname>Harrison</surname>
                    <given-names>Melissa</given-names>
                </name>
                <aff>
                    <institution>eLife</institution>,
                    <addr-line>
                        <named-content content-type="city">Cambridge</named-content>
                    </addr-line>
                    <country>United Kingdom</country>
                </aff>
            </contrib>
            <contrib>
                <name>
                    <surname>Gilbert</surname>
                    <given-names>James</given-names>
                </name>
                <aff>
                    <institution>eLife</institution>,
                    <addr-line>
                        <named-content content-type="city">Cambridge</named-content>
                    </addr-line>
                    <country>United Kingdom</country>
                </aff>
            </contrib>
        </contrib-group>
    </collab>
</contrib>

Add link to Decision letter and response DOIs in PDF

New license information

Added to XML:
5cbf037

Add related article linking to XML

XML needs to define the alignment of column tables

Correspondence - lose content not within tags

<author-notes>
    <corresp id="cor1">
        <label>*</label>
        For correspondence: <email>[email protected]</email> (MH)
    </corresp>
    <corresp id="cor2">
        <email>[email protected]</email> (CW)
    </corresp>
    <fn fn-type="con" id="equal-contrib">
        <label>†</label>
        These authors contributed equally to this work
    </fn>
</author-notes>
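The content "not within tags" in a corresp is what an XML parser exposes as tail text after each child element. The following minimal sketch separates the tagged values from that loose text; the address `author@example.org` is a placeholder standing in for the obscured email above, and the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Placeholder email: the real address is obscured in the example above.
CORRESP = (
    '<corresp id="cor1"><label>*</label>For correspondence: '
    "<email>author@example.org</email> (MH)</corresp>"
)

corresp = ET.fromstring(CORRESP)

# Content held inside child elements: this survives if untagged text is dropped.
tagged = [child.text for child in corresp]

# Loose text that follows a child element (its "tail"): this is what would be lost.
untagged = [
    child.tail.strip()
    for child in corresp
    if child.tail and child.tail.strip()
]

print(tagged)
print(untagged)
```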

Boxes and feature articles to be added

https://elifesciences.org/content/4/e05519
Box 1 : Text box with a figure and a table!
Box 2: Text box with two figures and a numbered list
Box 3: Simple text box
Box 4: Text box with bullet points

https://elifesciences.org/content/4/e06813
Box 1: Text box with section headings
Box 2: Simple text box

https://elifesciences.org/content/4/e09305
Box 1: Text box with numbered list
Box 2: Text box with bullet points

https://elifesciences.org/content/5/e16800
Box 1: Text box with numbered list and bold headings

add figure suppl to figure in appendix

Beef out appendix

Archive clean up?

On not going back to change the archives: it's good to know the opinion so far. The Python parser we have right now can handle lots of different sorts of JATS (for example, old PoA XML has inline aff tags, while going forward there are separate aff tags linked via xref tags). We'll need to keep this flexibility in the parser, and we can keep test scenarios that run over various XML files from old to new.
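The kind of flexibility described, handling both inline aff tags and aff tags linked via xref, can be sketched as follows. This is not the existing parser's code; the two trimmed samples and the function name `contrib_institutions` are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical old-style XML: the aff sits inside the contrib.
OLD_STYLE = """<contrib-group>
  <contrib><name><surname>Harrison</surname></name>
    <aff><institution>eLife</institution></aff></contrib>
</contrib-group>"""

# Hypothetical new-style XML: the aff is separate and linked through an xref.
NEW_STYLE = """<contrib-group>
  <contrib><name><surname>Harrison</surname></name>
    <xref ref-type="aff" rid="aff1"/></contrib>
  <aff id="aff1"><institution>eLife</institution></aff>
</contrib-group>"""


def contrib_institutions(root):
    """Return one list of institution names per contrib, for either style."""
    affs_by_id = {aff.get("id"): aff for aff in root.iter("aff") if aff.get("id")}
    result = []
    for contrib in root.iter("contrib"):
        affs = list(contrib.findall("aff"))  # inline affiliations
        for xref in contrib.findall("xref"):
            if xref.get("ref-type") == "aff":
                target = affs_by_id.get(xref.get("rid"))
                if target is not None:
                    affs.append(target)  # linked affiliations
        result.append([aff.findtext("institution") for aff in affs])
    return result


print(contrib_institutions(ET.fromstring(OLD_STYLE)))
print(contrib_institutions(ET.fromstring(NEW_STYLE)))
```

Both samples could serve as small test fixtures alongside the full kitchen sink files.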

Add present address to 00666

Position of figure source code/data

I think this was a HighWire requirement:

source code and source data for a figure was contained within the caption of that figure, but the figure supplement is not.
I have been playing with this and it is not necessary and it does not fit logically, to me.

My preference is:
<fig id="fig4" position="float">
    <object-id pub-id-type="doi">10.7554/eLife.13222.010</object-id>
    <label>Figure 4.</label>
    <caption>
        <title>Single figure with source code.</title>
    </caption>
    <graphic xlink:href="elife-00666-fig4-v1"/>
    <supplementary-material id="SD2-data">
        <object-id pub-id-type="doi">10.7554/eLife.00666.011</object-id>
        <label>Figure 4—Source code 1.</label>
        <caption>
            <title>Title of the source code.</title>
            Legend of the source code.
        </caption>
        <media mimetype="xlsx" xlink:href="elife-00666-fig4-data1-v1.xlsx"/>
    </supplementary-material>
</fig>

But our current model is:
<fig id="fig3s1" position="float" specific-use="child-fig">
    <object-id pub-id-type="doi">10.7554/eLife.00666.008</object-id>
    <label>Figure 3—figure supplement 1.</label>
    <caption>
        <title>Title of the figure supplement</title>
        <supplementary-material id="SD1-data">
            <object-id pub-id-type="doi">10.7554/eLife.00666.009</object-id>
            <label>Figure 3—figure supplement 1—Source data 1.</label>
            <caption>
                <title>Title of the figure supplement source data.</title>
                Legend of the figure supplement source data.
            </caption>
            <media mimetype="xlsx" xlink:href="elife-00666-fig3-figsupp1-data1-v1.xlsx"/>
        </supplementary-material>
    </caption>
    <graphic xlink:href="elife-00666-fig3-figsupp1-v1"/>
</fig>

Do you agree with me to change this?
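Whichever position is chosen, a parser that has to read both old and new XML could look in both places. A minimal sketch, assuming Python's standard library and trimmed versions of the two examples above; the function name `figure_source_ids` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed version of the preferred model: supplementary-material is a child of fig.
PREFERRED = """<fig id="fig4">
  <caption><title>Single figure with source code.</title></caption>
  <supplementary-material id="SD2-data"/>
</fig>"""

# Trimmed version of the current model: supplementary-material is inside the caption.
CURRENT = """<fig id="fig3s1">
  <caption><title>Title of the figure supplement</title>
    <supplementary-material id="SD1-data"/>
  </caption>
</fig>"""


def figure_source_ids(fig):
    """Return ids of supplementary-material found directly in fig or in its caption."""
    direct = fig.findall("supplementary-material")
    in_caption = fig.findall("caption/supplementary-material")
    return [item.get("id") for item in direct + in_caption]


print(figure_source_ids(ET.fromstring(PREFERRED)))
print(figure_source_ids(ET.fromstring(CURRENT)))
```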