xml-mapping's Introduction

XML-mapping

This repository contains the 'kitchen sink' article XML that captures how eLife models article data in XML.

Updates to the XML are demonstrated here as feature branches to help downstream changes.

Usage of the kitchen sink as a test fixture, along with its dependencies, should be consolidated here.

eLife file name conventions

See: elife_file_naming_2016_08_25.md

kitchen sink XML

elife-1234567890-v1.xml

Forked from elife-00666.xml on 2021-08-12, because the manuscript ID 666 conflicted with the actual article 666 and caused problems during testing.

The new MSID is hopefully very obviously fake.
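A test helper can pull the MSID and version out of a fixture's file name, which makes it easy to check that tests use the fake MSID. This is a minimal sketch: the pattern `elife-<msid>-v<version>.xml` is an assumption inferred only from the file names listed here (the authoritative rules are in elife_file_naming_2016_08_25.md), and `parse_kitchen_sink_name` and `FAKE_MSID` are hypothetical names.

```python
import re

# Assumed pattern, inferred from the file names in this README only:
# elife-<manuscript id>-v<version>.xml
KITCHEN_SINK_NAME = re.compile(r"elife-(?P<msid>\d+)-v(?P<version>\d+)\.xml")

FAKE_MSID = 1234567890  # hypothetical constant for the current kitchen sink MSID


def parse_kitchen_sink_name(filename):
    """Return (msid, version) as ints, or None if the name does not match."""
    match = KITCHEN_SINK_NAME.fullmatch(filename)
    if match is None:
        return None
    # int() drops leading zeros, which is why 00666 collides with article 666.
    return int(match.group("msid")), int(match.group("version"))


print(parse_kitchen_sink_name("elife-1234567890-v2.xml"))  # (1234567890, 2)
print(parse_kitchen_sink_name("elife-00666.xml"))          # None (no version part)
```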

elife-1234567890-v2.xml

Forked from elife-1234567890-v1.xml to demonstrate several updates to handling of content, now adapted as XML for articles published under the Publish Review Curate (PRC) model.

elife-1234567890-v3.xml

Forked from elife-1234567890-v1.xml, this XML contains all JATS4R and other related updates for potential future inclusion. It is not indicative of current content.

elife-00666.xml

The original kitchen sink, kept until all organisation references are updated but otherwise archived.

Do not use, to be removed.

elife-00777.xml

Non-research "feature" content.

xml-mapping's People

Contributors

ayyppan, fred-atherden, giorgiosironi, gnott, jgilbert-elife, lsh-0, nuclearredeye

xml-mapping's Issues

Tags the parseJATS parser currently doesn't pick up at all (non-exhaustive)

  • inline-formula / mml:math
  • nested text formatting tags, such as inside a figure caption or in an abstract. The parser picks them up either as plain text (no tags) or as full text (XML tags included)
  • element-citation tags probably not parsed yet: comment, patent, ext-link, x [we can probably ignore x tags?]
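For the nested formatting problem, the usual middle ground between "plain text" and "full text" is the element's inner XML: its content with child tags kept but without its own wrapping tag. This is a minimal sketch using Python's standard `xml.etree.ElementTree`, not parseJATS's actual code; `plain_text` and `inner_xml` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET


def plain_text(elem):
    """Text only: nested formatting tags are dropped."""
    return "".join(elem.itertext())


def inner_xml(elem):
    """Content with nested formatting tags kept, minus the element's own tag."""
    parts = [elem.text or ""]
    for child in elem:
        # ElementTree's tostring() also writes the child's tail text,
        # so text that follows a nested tag stays in order.
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)


title = ET.fromstring("<title>Expression in <italic>C. elegans</italic> embryos</title>")
print(plain_text(title))  # Expression in C. elegans embryos
print(inner_xml(title))   # Expression in <italic>C. elegans</italic> embryos
```

Namespaced content such as mml:math would be serialised with generated `ns0:` prefixes unless `ET.register_namespace("mml", ...)` is called with the MathML namespace URI first.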

Equal contributions

HTML: not seen in rollover samples.
In author info at end of article:
Contributed equally with: Verena Schuenemann

PDF: footnote symbol and "These authors contributed equally to this work"
If there are 2 separate groups, the second group gets its own footnote symbol and "These authors also contributed equally to this work"

XML currently:

<fn fn-type="con" id="equal-contrib">
    <label>&#x2020;</label>
    <p>These authors contributed equally to this work</p>
</fn>
<fn fn-type="con" id="equal-contrib2">
    <label>&#x2021;</label>
    <p>These authors also contributed equally to this work</p>
</fn>
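The footnotes alone do not say who contributed equally; to build the HTML "Contributed equally with:" line, a renderer has to collect the authors that point at each footnote. A minimal sketch, assuming each author's `<contrib>` carries an `<xref ref-type="fn" rid="equal-contrib"/>` (the linking markup is not shown above, so this is an assumption); the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def equal_contrib_groups(contrib_group):
    """Map each equal-contribution footnote id to the authors that cite it."""
    groups = defaultdict(list)
    for contrib in contrib_group.iter("contrib"):
        given = contrib.findtext("name/given-names")
        surname = contrib.findtext("name/surname")
        name = " ".join(part for part in (given, surname) if part)
        for xref in contrib.findall("xref"):
            rid = xref.get("rid", "")
            # Assumed convention: footnote ids start with "equal-contrib".
            if xref.get("ref-type") == "fn" and rid.startswith("equal-contrib"):
                groups[rid].append(name)
    return dict(groups)


def contributed_equally_with(groups, author):
    """Co-authors to list after 'Contributed equally with:' for one author."""
    for members in groups.values():
        if author in members:
            return [m for m in members if m != author]
    return []
```

Each footnote id becomes its own group, so a second group linked to `equal-contrib2` stays separate, matching the PDF's second footnote symbol.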

Decision letter box

The website pattern does not take its content from the XML and is boilerplated, so if any changes are made, production and development will need to be told.

Other footnotes

Example:

<fn fn-type="other" id="fn1">
<p>
Sophien Kamoun, Johannes Krause, Marco Thines, and Detlef Weigel are listed in alphabetical order
</p>
</fn>

ref with no id attribute

In the kitchen sink as of today:

https://github.com/elifesciences/XML-mapping/blob/master/elife-00666.xml#L1981

The <ref> linked above has no id attribute, and this is causing it to fail in jats-scraper:

<ref>
<element-citation publication-type="software">
<person-group person-group-type="author">
<collab>R Development Core Team</collab>
</person-group>
<data-title>
R: a language and environment for statistical computing
</data-title>
<version>3.2.2</version>
<year iso-8601-date="2015">2015</year>
<publisher-loc>Vienna, Austria</publisher-loc>
<publisher-name>R Foundation for Statistical Computing</publisher-name>
<uri xlink:href="http://www.r-project.org/">http://www.r-project.org/</uri>
</element-citation>
</ref>
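A check like the following could catch id-less references before they reach a scraper. This is a minimal sketch with Python's standard `xml.etree.ElementTree`; `refs_missing_id` is a hypothetical helper, not part of jats-scraper.

```python
import xml.etree.ElementTree as ET


def refs_missing_id(root):
    """Return (position, title) for every <ref> without an id attribute."""
    problems = []
    for position, ref in enumerate(root.iter("ref"), start=1):
        if ref.get("id") is None:
            # Different citation types use different title elements.
            title = (ref.findtext(".//article-title")
                     or ref.findtext(".//data-title")
                     or ref.findtext(".//source")
                     or "")
            # Collapse the line breaks and indentation found in pretty-printed XML.
            problems.append((position, " ".join(title.split())))
    return problems
```

Run over the kitchen sink, the software reference above would be reported by its data-title, "R: a language and environment for statistical computing".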

PMC discussions: corresponding author

[Screenshot, 2016-08-24]

By moving the author email into their aff, we lose the cross-linking footnote, so the display on PMC is not good. But the XML is better.
MH to discuss with PMC

supplementary-material mimetype and mime-subtype

I'm looking at this specifically for parsing:

<media mimetype="xlsx" xlink:href="elife-00666-fig3-figsupp1-data1-v1.xlsx"/>

Should it have a mime-subtype, with the values switched, so that it would look something like this?

<media mime-subtype="xlsx" mimetype="application" xlink:href="elife-00666-fig3-figsupp1-data1-v1.xlsx"/>
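If the proposal is adopted, existing `<media>` elements could be migrated mechanically. A minimal sketch of the attribute swap in Python's `xml.etree.ElementTree`; `split_mimetype` is a hypothetical helper, and defaulting the top-level type to "application" is an assumption that suits spreadsheets and archives but not, for example, video or images.

```python
import xml.etree.ElementTree as ET


def split_mimetype(media, default_type="application"):
    """Rewrite mimetype="xlsx" as mimetype="application" mime-subtype="xlsx"."""
    mimetype = media.get("mimetype")
    # Leave elements alone if they already carry a mime-subtype.
    if mimetype and media.get("mime-subtype") is None:
        if "/" in mimetype:
            # A full type such as "video/mp4" is split at the slash.
            main, sub = mimetype.split("/", 1)
        else:
            # Assumption: a bare extension belongs under default_type.
            main, sub = default_type, mimetype
        media.set("mimetype", main)
        media.set("mime-subtype", sub)
    return media
```

Note that the registered media type for .xlsx is application/vnd.openxmlformats-officedocument.spreadsheetml.sheet, so mime-subtype="xlsx" follows the proposal above rather than the IANA registration.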

Levels of headings

Only 4 levels are allowed: HTML only allows for 5, and one of these is used for the article title.
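The limit could be enforced with a simple depth check. A minimal sketch, assuming each heading level corresponds to one level of `<sec>` nesting in the body; `max_sec_depth` and `headings_too_deep` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

MAX_HEADING_LEVEL = 4  # from the rule above


def max_sec_depth(elem, depth=0):
    """Deepest nesting of <sec> elements below elem (a top-level sec is 1)."""
    deepest = depth
    for child in elem:
        child_depth = depth + 1 if child.tag == "sec" else depth
        deepest = max(deepest, max_sec_depth(child, child_depth))
    return deepest


def headings_too_deep(body):
    """True if the body nests sections beyond the allowed heading levels."""
    return max_sec_depth(body) > MAX_HEADING_LEVEL
```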

Kriya and process implications: Versioning and history information and dates

See 0bc1262

This new requirement has big implications for the workflow.

  • Exeter will need to know of ALL previous versions of PoA content and add them to the XML (dates of publication and URL links)
  • Kriya will require the addition of two new fields - publication date for PoA'd content (add the new VoR date)
  • Kriya will need a new field so production can add a note if publishing a new version of a VoR so they can indicate the changes made

Author contributions

Change made:
No initials or name are added to the start of the footnote in the XML.
HTML will not need to do anything to the content, just pull it straight over.
PDF will require some formatting for display.

Cross linking to funding - edit but still enabled for profiles

In author aff:
<xref ref-type="other" rid="par-1"/>

In funding info:
<funding-group>
    <award-group id="par-1">
        <funding-source>
            <institution-wrap>
                <institution>Wellcome</institution>
            </institution-wrap>
        </funding-source>
        <principal-award-recipient>
            <name>
                <surname>Harrison</surname>
                <given-names>Melissa</given-names>
            </name>
        </principal-award-recipient>
    </award-group>
</funding-group>

Will the xref be used to build author profiles?
Can the name in the funding information be used instead, so that the xref link can be removed?
We think that is the better option for text and data mining; Melissa can verify with the T&DM people.

Subgroups within the author group?

Proposal:

<contrib contrib-type="author">
    <collab>eLife Working Group
        <contrib-group>
            <role>Production Group</role>
            <contrib>
                <name>
                    <surname>Shearer</surname>
                    <given-names>Alistair</given-names>
                </name>
                <aff> <institution>eLife</institution>, <addr-line> <named-content content-type="city">Cambridge</named-content> </addr-line> <country>United Kingdom</country> </aff>
            </contrib>
            <contrib>
                <name>
                    <surname>Caton</surname>
                    <given-names>Hannah</given-names>
                </name>
                <aff> <institution>eLife</institution>, <addr-line> <named-content content-type="city">Cambridge</named-content> </addr-line> <country>United Kingdom</country> </aff>
            </contrib>
        </contrib-group>
        <contrib-group>
            <role>Technical Group</role>
            <contrib>
                <name>
                    <surname>Harrison</surname>
                    <given-names>Melissa</given-names>
                </name>
                <aff> <institution>eLife</institution>, <addr-line> <named-content content-type="city">Cambridge</named-content> </addr-line> <country>United Kingdom</country> </aff>
            </contrib>
            <contrib>
                <name>
                    <surname>Gilbert</surname>
                    <given-names>James</given-names>
                </name>
                <aff> <institution>eLife</institution>, <addr-line> <named-content content-type="city">Cambridge</named-content> </addr-line> <country>United Kingdom</country> </aff>
            </contrib>
        </contrib-group>
    </collab>
</contrib>
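To check that the proposal is straightforward to consume, here is a minimal sketch that reads the collab name and each role's members from the markup above, using Python's `xml.etree.ElementTree`. `collab_subgroups` is a hypothetical helper, and it assumes exactly one `<collab>` per `<contrib>`.

```python
import xml.etree.ElementTree as ET


def collab_subgroups(contrib):
    """Return (collab name, {role: [member names]}) for the proposed markup."""
    collab = contrib.find("collab")
    # The collab name is the text before the first nested contrib-group.
    name = (collab.text or "").strip()
    roles = {}
    for group in collab.findall("contrib-group"):
        roles[group.findtext("role")] = [
            f"{m.findtext('name/given-names')} {m.findtext('name/surname')}"
            for m in group.findall("contrib")
        ]
    return name, roles
```

For the proposal above this would give the collab name "eLife Working Group" with members listed under "Production Group" and "Technical Group".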

Boxes and feature articles to be added

https://elifesciences.org/content/4/e05519
Box 1: Text box with a figure and a table!
Box 2: Text box with two figures and a numbered list
Box 3: Simple text box
Box 4: Text box with bullet points

https://elifesciences.org/content/4/e06813
Box 1: Text box with section headings
Box 2: Simple text box

https://elifesciences.org/content/4/e09305
Box 1: Text box with numbered list
Box 2: Text box with bullet points

https://elifesciences.org/content/5/e16800
Box 1: Text box with numbered list and bold headings

Archive clean up?

On not going back to change the archives: it's good to know that this is the opinion so far. The Python parser we have right now can handle lots of different sorts of JATS (for example, old PoA XML has inline aff tags, whereas going forward there are separate aff tags linked via xref tags). We'll need to keep this flexibility in the parser, and we can keep test scenarios that run over various XML files, from old to new.
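The aff example shows the kind of flexibility meant here: one function that accepts both markup styles. A minimal sketch in Python's `xml.etree.ElementTree`, assuming the newer style links with `<xref ref-type="aff" rid="...">` to an `<aff id="...">`; `author_affiliations` is a hypothetical helper, not the existing parser's API.

```python
import xml.etree.ElementTree as ET


def _text(elem):
    # Flatten nested tags and collapse whitespace.
    return " ".join("".join(elem.itertext()).split())


def author_affiliations(contrib, root):
    """Affiliation text for one author, from either markup style."""
    # Older PoA style: <aff> inline inside <contrib>.
    affs = [_text(a) for a in contrib.findall("aff")]
    # Newer style: <xref ref-type="aff" rid="..."/> pointing to a separate <aff id="...">.
    by_id = {a.get("id"): a for a in root.iter("aff") if a.get("id")}
    for xref in contrib.findall("xref"):
        if xref.get("ref-type") == "aff" and xref.get("rid") in by_id:
            affs.append(_text(by_id[xref.get("rid")]))
    return affs
```

Test scenarios from old to new could then assert that both styles of the same affiliation produce the same text.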

Position of figure source code/data

I think this was a HighWire requirement:

Source code and source data for a figure were contained within the caption of that figure, whereas the figure supplement is not.
I have been experimenting with this: it is not necessary, and to me it does not fit logically.

My preference is:
<fig id="fig4" position="float">
    <object-id pub-id-type="doi">10.7554/eLife.13222.010</object-id>
    <label>Figure 4.</label>
    <caption>
        <title>Single figure with source code.</title>
    </caption>
    <graphic xlink:href="elife-00666-fig4-v1"/>
    <p>
        <supplementary-material id="SD2-data">
            <object-id pub-id-type="doi">10.7554/eLife.00666.011</object-id>
            <label>Figure 4—Source code 1.</label>
            <caption>
                <title>Title of the source code.</title>
                <p>Legend of the source code.</p>
            </caption>
            <media mimetype="xlsx" xlink:href="elife-00666-fig4-data1-v1.xlsx"/>
        </supplementary-material>
    </p>
</fig>

But our current model is:
<fig id="fig3s1" position="float" specific-use="child-fig">
    <object-id pub-id-type="doi">10.7554/eLife.00666.008</object-id>
    <label>Figure 3—figure supplement 1.</label>
    <caption>
        <title>Title of the figure supplement</title>
        <p>
            <supplementary-material id="SD1-data">
                <object-id pub-id-type="doi">10.7554/eLife.00666.009</object-id>
                <label>Figure 3—figure supplement 1—Source data 1.</label>
                <caption>
                    <title>Title of the figure supplement source data.</title>
                    <p>Legend of the figure supplement source data.</p>
                </caption>
                <media mimetype="xlsx" xlink:href="elife-00666-fig3-figsupp1-data1-v1.xlsx"/>
            </supplementary-material>
        </p>
    </caption>
    <graphic xlink:href="elife-00666-fig3-figsupp1-v1"/>
</fig>

Do you agree that we should change this?
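Whichever placement is chosen, a parser that searches all descendants of `<fig>` can cope with both during a transition. A minimal sketch with Python's standard `xml.etree.ElementTree` (not the actual eLife parser) that also reports which placement each file uses; `figure_source_files` is a hypothetical name, and only the xlink namespace URI is a fixed standard.

```python
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def figure_source_files(fig):
    """Return (label, href, placement) for each supplementary-material in a <fig>.

    placement is "caption" for the current model and "fig" for the proposed one.
    """
    caption = fig.find("caption")
    in_caption = set()
    if caption is not None:
        in_caption = {id(e) for e in caption.iter("supplementary-material")}
    files = []
    # iter() walks all descendants, so both placements are found.
    for supp in fig.iter("supplementary-material"):
        media = supp.find("media")
        href = media.get(XLINK_HREF) if media is not None else None
        placement = "caption" if id(supp) in in_caption else "fig"
        files.append((supp.findtext("label"), href, placement))
    return files
```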
