
schema's Introduction

DataCite Schema Repository


This repository holds the official metadata schemas from DataCite as required by the DataCite Metadata Store.

It contains the schemas themselves along with examples and documentation.

Schemas

Each schema has its own folder under /source/meta, e.g. /source/meta/kernel-2.0/. This directory is allowed to contain only one xsd. The directory structure is as follows:

/source/meta/{schema-name}/{filename}.xsd   root xsd
/source/meta/{schema-name}/include/         referenced xsd files
/source/meta/{schema-name}/example/         example xml files
/source/meta/{schema-name}/doc/             documentation
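The layout rules above can be checked automatically. The following is a minimal Python sketch, not the repository's actual rspec test. The function names are hypothetical. Beyond what the README states, it assumes that every schema must ship at least one example XML file in example/.

```python
from __future__ import annotations

from pathlib import Path


def check_schema_dir(schema_dir: Path) -> list[str]:
    """Return layout problems for one /source/meta/{schema-name}/ folder (empty list = OK)."""
    problems = []
    # The README allows only one xsd directly in the folder; this sketch
    # also treats a folder with no root xsd as an error.
    root_xsds = sorted(p.name for p in schema_dir.glob("*.xsd"))
    if len(root_xsds) != 1:
        problems.append(f"expected exactly one root xsd, found {root_xsds}")
    # Assumption: every schema ships at least one example in example/.
    example_dir = schema_dir / "example"
    if not example_dir.is_dir() or not any(example_dir.glob("*.xml")):
        problems.append("no example xml files in example/")
    return problems


def check_meta_root(meta_root: Path) -> dict[str, list[str]]:
    """Run check_schema_dir on every schema folder and keep only the failing ones."""
    results = {}
    for schema_dir in sorted(p for p in meta_root.iterdir() if p.is_dir()):
        problems = check_schema_dir(schema_dir)
        if problems:
            results[schema_dir.name] = problems
    return results
```

Files under include/ are ignored by the one-xsd rule, because the glob only looks at the top level of each schema folder.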

The /source/meta directory will be published at http://schema.datacite.org, e.g.

http://schema.datacite.org/meta/kernel-2.0/metadata.xsd

Feedback

If you have any questions about the metadata schema, please contact [email protected].

If you have an idea for a change to the DataCite Metadata Schema, let us know through this form: DataCite Metadata Schema Suggestions.

We recommend first reviewing the ideas on the DataCite Metadata Schema Trello board to see if a similar idea has already been proposed. If it has, you can contribute to the discussion by following the instructions on the card.

Learn more about contributing to the DataCite Metadata Schema here: DataCite Schema - Contribute.

Tests

There are tests to check the directory structure, existence of examples, validity of the schemas, and validity of the examples.

You can execute the tests via

rspec

schema's People

Contributors

actions-user, codycooperross, daslerr, dependabot[bot], digitaldogsbody, kellystathis, kjgarza, koelnconcert, lnielsen, mark-saeon, mfenner, nichtich, richardhallett, stefanjakobsson, svogt0511, tmorrell


schema's Issues

Need a distribution concept

Since sizes and formats (and perhaps language, versions, descriptions...) are specific to different distributions, it would make a much better metadata scheme if there were a distribution metadata element, i.e. one aligned with DCAT.

integrate testing of non-valid example

Currently we only check that the provided examples for a schema are valid. It would also be useful to test some examples that are not valid, to confirm that the schema rejects them. This would make the tests more accurate.
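Here is a minimal sketch of such negative tests. It uses Python with lxml rather than the repository's rspec suite. The tiny inline schema and the example documents are hypothetical stand-ins for a real kernel XSD and its example/ folder. Each invalid example is keyed by the reason it is expected to fail.

```python
from lxml import etree

# Hypothetical miniature schema standing in for a real kernel XSD.
SCHEMA = etree.XMLSchema(etree.XML(b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="resource">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="identifier" type="xs:string"/>
        <xs:element name="publicationYear" type="xs:gYear"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""))

VALID_EXAMPLES = [
    b"<resource><identifier>10.1234/foo</identifier>"
    b"<publicationYear>2016</publicationYear></resource>",
]

# Each invalid example is keyed by the reason it should be rejected.
INVALID_EXAMPLES = {
    "missing identifier": b"<resource><publicationYear>2016</publicationYear></resource>",
    "publicationYear is not a year": b"<resource><identifier>10.1234/foo</identifier>"
                                     b"<publicationYear>soon</publicationYear></resource>",
}


def is_valid(xml: bytes) -> bool:
    """Validate one example document against SCHEMA without raising."""
    return SCHEMA.validate(etree.XML(xml))


if __name__ == "__main__":
    for xml in VALID_EXAMPLES:
        assert is_valid(xml), "a valid example was rejected"
    for reason, xml in INVALID_EXAMPLES.items():
        assert not is_valid(xml), f"invalid example accepted: {reason}"
    print("all example checks passed")
```

Keying each invalid example by its reason keeps the failure message meaningful when a schema change accidentally starts accepting it.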

ORCID representation in DataCite XML

Hi All,

I wanted to ask your opinion on 2 issues related to the way ORCIDs are represented in DataCite XML files.

The examples provided on the DataCite website (e.g. https://schema.datacite.org/meta/kernel-4.1/example/datacite-example-full-v4.1.xml) seem to indicate that the ORCID value needs to be represented in the dddd-dddd-dddd-dddd format.

However, the new ORCID guidance (https://support.orcid.org/knowledgebase/articles/116780) says that:

When stored, the ORCID iD should be expressed as a full https URI: https://orcid.org/xxxx-xxxx-xxxx-xxxx, complete with the protocol (https://), and with hyphens in the identifier (xxxx-xxxx-xxxx-xxxx).

Is the value of the ORCID field in the DataCite XML going to continue being expressed in the current format, or do you reckon that the new format suggested by ORCID will be adopted?

The current schemeURI is "http://orcid.org/" - will it be changed to 'https://orcid.org/'? Can the https URI be used?

Thank you,
Michele
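Whatever format DataCite settles on, a converter can accept the bare form or either URI form and emit the https URI that ORCID recommends. The sketch below assumes that goal. normalize_orcid and orcid_check_digit are hypothetical names. The check digit follows the ISO 7064 MOD 11-2 algorithm that ORCID documents for its iDs. 0000-0002-1825-0097 is ORCID's published sample iD.

```python
import re

# Bare iD, or an http/https orcid.org URI wrapping it.
ORCID_RE = re.compile(r"(?:https?://orcid\.org/)?(\d{4}-\d{4}-\d{4}-\d{3}[\dX])")


def orcid_check_digit(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def normalize_orcid(value: str) -> str:
    """Return the https://orcid.org/ URI form of a bare or URI-form ORCID iD."""
    match = ORCID_RE.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"not an ORCID iD: {value!r}")
    orcid = match.group(1)
    digits = orcid.replace("-", "")
    # The last character is the check digit computed over the first 15 digits.
    if orcid_check_digit(digits[:-1]) != digits[-1]:
        raise ValueError(f"checksum mismatch: {value!r}")
    return "https://orcid.org/" + orcid


if __name__ == "__main__":
    print(normalize_orcid("0000-0002-1825-0097"))
    print(normalize_orcid("http://orcid.org/0000-0002-1825-0097"))
```

Both calls print https://orcid.org/0000-0002-1825-0097. A mistyped iD is rejected by the checksum instead of being stored.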

allow optional wrapper elements to be empty

It should be allowed to leave optional wrapper elements like <contributors/> empty.

The benefit: it makes converting to our schema (e.g. via XSLT) more straightforward, because you don't need to handle different cases. And empty elements are totally harmless.
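Until the schema allows empty wrappers, one workaround is a single post-processing step after the XSLT that drops them. The conversion itself then still does not need to special-case anything. Here is a minimal Python sketch. The wrapper names below are taken from kernel-4 element names; treat the list as an assumption to check against the XSD. drop_empty_wrappers is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Assumed set of optional DataCite wrapper elements; verify against the XSD.
OPTIONAL_WRAPPERS = {"subjects", "contributors", "dates", "alternateIdentifiers",
                     "relatedIdentifiers", "sizes", "formats", "rightsList",
                     "descriptions", "geoLocations", "fundingReferences"}


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix ElementTree puts on qualified tags."""
    return tag.rsplit("}", 1)[-1]


def drop_empty_wrappers(resource: ET.Element) -> ET.Element:
    """Remove optional wrapper children of <resource> that contain no child elements."""
    for child in list(resource):  # iterate over a copy, since we remove while looping
        if local_name(child.tag) in OPTIONAL_WRAPPERS and len(child) == 0:
            resource.remove(child)
    return resource


if __name__ == "__main__":
    doc = ET.fromstring(
        '<resource xmlns="http://datacite.org/schema/kernel-4">'
        '<identifier identifierType="DOI">10.1234/foo</identifier>'
        "<contributors/><subjects><subject>Chemistry</subject></subjects>"
        "</resource>"
    )
    drop_empty_wrappers(doc)
    print([local_name(c.tag) for c in doc])  # ['identifier', 'subjects']
```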

add more tests

  • has index.html
  • has documentation
  • html pages valid
  • test for oai schema
  • link checker
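For the link checker item, here is a minimal local-only sketch using Python's standard html.parser. It reports relative links in a generated HTML page whose targets are missing on disk. HrefCollector and broken_local_links are hypothetical names. External URLs and in-page anchors are skipped rather than fetched.

```python
from __future__ import annotations

from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urlparse


class HrefCollector(HTMLParser):
    """Collect href values from <a> and <link> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("a", "link"):
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def broken_local_links(html_file: Path) -> list[str]:
    """Return relative hrefs in html_file whose target does not exist on disk."""
    collector = HrefCollector()
    collector.feed(html_file.read_text(encoding="utf-8"))
    broken = []
    for href in collector.hrefs:
        parsed = urlparse(href)
        # External (http:, mailto:, ...) and in-page (#...) links are out of scope here.
        if parsed.scheme or parsed.netloc or not parsed.path:
            continue
        if not (html_file.parent / parsed.path).exists():
            broken.append(href)
    return broken
```

Site-absolute links (starting with /) resolve against the filesystem root here. A real check would need to map them to the published site root.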

Update Citation in Documentation

The citation example should use the URL form of the DOI, to be consistent with the DataCite homepage and the recommendation of CrossRef.

use another schemalocation for xml namespace

http://www.w3.org/2001/xml.xsd is not static and may change in the future. It also currently takes about 30 seconds to retrieve the xsd, which makes Jenkins take far too long to execute the tests. This might also cause problems with MDS.

We should use http://www.w3.org/2009/01/xml.xsd. You can find more information there.
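A regression test could make sure every XSD imports the xml namespace from the dated location. Here is a minimal Python sketch using only the standard library. It inspects the xs:import elements and does not change how a validator fetches the file. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"
XML_NAMESPACE = "http://www.w3.org/XML/1998/namespace"
PINNED_XML_XSD = "http://www.w3.org/2009/01/xml.xsd"


def xml_namespace_imports(xsd_text: str) -> list:
    """Return the schemaLocation of every xs:import that pulls in the xml: namespace."""
    root = ET.fromstring(xsd_text)
    return [imp.get("schemaLocation")
            for imp in root.iter(XS + "import")
            if imp.get("namespace") == XML_NAMESPACE]


def uses_pinned_xml_xsd(xsd_text: str) -> bool:
    """True if every xml: namespace import points at the dated 2009/01 copy.

    A schema with no such import also passes, since there is nothing to pin.
    """
    return all(loc == PINNED_XML_XSD for loc in xml_namespace_imports(xsd_text))
```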

Validation of identifier

Hi,

I'm submitting this issue in my role as co-chair of the RDA Research Data Repository Interoperability WG. We plan to use DataCite to provide minimal, standardized metadata for our recommendation for an interoperable, BagIt-based exchange format for digital content between repository platforms.

One possible concern about using DataCite, the necessity of a DOI, came up during one of our virtual meetings. As we are not focusing solely on published datasets, the presence of a DOI cannot be guaranteed, probably for the majority of packages created according to our recommendations. Luckily, the DataCite schema documentation states that if "[...] one of the required properties is unavailable [...]" one should "[...] use one of the standard (machine‐recognizable) codes listed in Appendix 3 [...]" (see Section 2.3).

However, according to the XSD schema this does not seem to apply to the identifier. That's why I wanted to ask whether this is a bug/feature in the schema implementation or a misinterpretation/inaccuracy of the schema documentation.

Thanks in advance for the clarification.

Regards,
Thomas

Nokogiri version out of date

Potential security vulnerability: we need to update the Gemfile entry for nokogiri to a version greater than 1.8.1:
gem 'nokogiri', '~> 1.8.1'

We need to check the libraries that depend on it; they might need to be upgraded too.

Add ID to controlled list entries

Mail from Alex Ball, on 2012-02-16:

DataCite Dublin Core Application Profile (DC2AP) ready for release, and a couple of them involve the controlled vocabularies used for some DataCite schema properties. The vocabularies in question are contributorType and resourceType.

I was experimenting with reproducing them in RDF, but a far more sustainable solution would be to reuse the XSDs you are already producing. According to this document:
http://www.w3.org/TR/swbp-xsch-datatypes/
it would be very easy to do this if you were to add ID attributes to the simpleType elements in your included XSDs, e.g.

<xs:simpleType id="contributorType" name="contributorType">

and

<xs:simpleType id="resourceType" name="resourceType">
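The suggested change could be applied mechanically. Here is a minimal Python sketch using the standard ElementTree module. It assumes id should simply mirror name, as in the examples above. add_simple_type_ids is a hypothetical helper. ElementTree drops XML comments by default, so a hand edit or a comment-preserving tool may be preferable for the real XSDs.

```python
import xml.etree.ElementTree as ET

XS_NS = "http://www.w3.org/2001/XMLSchema"
# Keep the xs: prefix on output; attribute values such as base="xs:string"
# are QNames that would break if ElementTree renamed the prefix to ns0.
ET.register_namespace("xs", XS_NS)


def add_simple_type_ids(xsd_text: str) -> str:
    """Set id="<name>" on every named xs:simpleType that has no id yet."""
    root = ET.fromstring(xsd_text)
    for simple_type in root.iter(f"{{{XS_NS}}}simpleType"):
        name = simple_type.get("name")
        if name and simple_type.get("id") is None:
            simple_type.set("id", name)
    return ET.tostring(root, encoding="unicode")
```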

Change wording in documentation

Paragraph 1.2: Change "The first is the release of a second version of the schema, this one being in a Dublin Core application profile format" to: "The first is the release of a second version of the schema, in a Dublin Core application profile format"

Addressing version issues in the documentation

a. Change wording of “Allowed values, examples and other constraints” column for Version to accommodate major.minor tracking; consider using some language from http://wiki.esipfed.org/index.php/Interagency_Data_Stewardship/Citations/provider_guidelines#Note_on_Versioning_and_Locators

b. Add text to introduction regarding dynamic datasets

c. Make sure that one of the example citations includes a version (and this needs to be in sync with website)

Enumerations should reflect the correct casing used in xsd.

When validating DataCite XML against our schema XSD, sometimes errors are generated that are confusing because they do not take into account case sensitivity.

e.g.

"[facet 'enumeration'] the value 'personal' is not an element of the set {'organizational', 'personal'}. at line 6, column 0"

Which of course looks weird; the problem here is that the enumeration values for nameType are actually "Organizational" and "Personal".
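The validator's message itself cannot easily be changed. A tool that wraps validation could, however, add a hint when a value differs from an allowed one only by case. Here is a minimal Python sketch. case_hint is a hypothetical helper, and Organizational and Personal are used as the allowed nameType values.

```python
from typing import Optional, Sequence


def case_hint(value: str, allowed: Sequence[str]) -> Optional[str]:
    """Explain an enumeration failure that is only a difference in letter case."""
    if value in allowed:
        return None  # exact match, nothing to explain
    for candidate in allowed:
        if candidate.casefold() == value.casefold():
            return (f"'{value}' is not allowed; enumeration values are "
                    f"case-sensitive, use '{candidate}'")
    return None  # not a casing problem; report the validator's error as-is


if __name__ == "__main__":
    print(case_hint("personal", ["Organizational", "Personal"]))
```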

Use of xsd:all vs. xsd:sequence

The current XSD for v3.1 uses xsd:sequence instead of xsd:all, e.g. for the creator properties. This means that the elements (creatorName, nameIdentifier and affiliation) must appear in the exact order specified by the schema.

This means that the following is valid according to the schema:

<creator>
  <creatorName>Smith, John</creatorName>
  <nameIdentifier schemeURI="http://orcid.org" nameIdentifierScheme="ORCID">0000-1234-5678-0000</nameIdentifier>
  <affiliation>John College</affiliation>
</creator>

but that this is not (changed order of affiliation/nameIdentifier):

<creator>
  <creatorName>Smith, John</creatorName>
  <affiliation>John College</affiliation>
  <nameIdentifier schemeURI="http://orcid.org" nameIdentifierScheme="ORCID">0000-1234-5678-0000</nameIdentifier>
</creator>

Here's a full Python example that demonstrates the problem:

# Requires lxml; install it first with "pip install lxml"
from lxml import etree

xsd = etree.XMLSchema(file='http://schema.datacite.org/meta/kernel-3/metadata.xsd')

doc1 = """
<resource xmlns="http://datacite.org/schema/kernel-3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://datacite.org/schema/kernel-3 http://schema.datacite.org/meta/kernel-3/metadata.xsd">
  <identifier identifierType="DOI">10.1234/foo</identifier>
  <creators>
    <creator>
      <creatorName>Smith, John</creatorName>
      <affiliation>John College</affiliation>
      <nameIdentifier schemeURI="http://orcid.org" nameIdentifierScheme="ORCID">0000-1234-5678-0000</nameIdentifier>
    </creator>
  </creators>
  <titles>
    <title>Dataset name</title>
  </titles>
  <publisher>Somewhere</publisher>
  <publicationYear>2016</publicationYear>
</resource>
"""

doc2 = """
<resource xmlns="http://datacite.org/schema/kernel-3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://datacite.org/schema/kernel-3 http://schema.datacite.org/meta/kernel-3/metadata.xsd">
  <identifier identifierType="DOI">10.1234/foo</identifier>
  <creators>
    <creator>
      <creatorName>Smith, John</creatorName>
      <nameIdentifier schemeURI="http://orcid.org" nameIdentifierScheme="ORCID">0000-1234-5678-0000</nameIdentifier>
      <affiliation>John College</affiliation>
    </creator>
  </creators>
  <titles>
    <title>Dataset name</title>
  </titles>
  <publisher>Somewhere</publisher>
  <publicationYear>2016</publicationYear>
</resource>
"""

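# doc1 puts affiliation before nameIdentifier, so the xsd:sequence rejects it.
# validate() returns a bool instead of raising, so both documents get checked.
print("doc1 valid:", xsd.validate(etree.XML(doc1)))
print(xsd.error_log)
# doc2 follows the order defined in the schema
print("doc2 valid:", xsd.validate(etree.XML(doc2)))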
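For comparison, here is a minimal lxml sketch of the xsd:all alternative. It uses a hypothetical cut-down creator type rather than the real kernel-3 definition. It shows the trade-off: xsd:all accepts the children in any order. However, in XML Schema 1.0 (the version libxml2, and therefore lxml, implements) each child of xsd:all may occur at most once. This approach therefore only works if nameIdentifier and affiliation do not need to repeat.

```python
from lxml import etree

# Hypothetical cut-down creator type using xs:all (any order) instead of xs:sequence.
ALL_SCHEMA = etree.XMLSchema(etree.XML(b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="creator">
    <xs:complexType>
      <xs:all>
        <xs:element name="creatorName" type="xs:string"/>
        <xs:element name="nameIdentifier" type="xs:string" minOccurs="0"/>
        <xs:element name="affiliation" type="xs:string" minOccurs="0"/>
      </xs:all>
    </xs:complexType>
  </xs:element>
</xs:schema>"""))

NAME = b"<creatorName>Smith, John</creatorName>"
NAME_ID = b"<nameIdentifier>0000-1234-5678-0000</nameIdentifier>"
AFFIL = b"<affiliation>John College</affiliation>"


def creator(*children: bytes):
    """Build a <creator> element from serialized child elements."""
    return etree.XML(b"<creator>" + b"".join(children) + b"</creator>")


if __name__ == "__main__":
    print(ALL_SCHEMA.validate(creator(NAME, NAME_ID, AFFIL)))  # schema order
    print(ALL_SCHEMA.validate(creator(NAME, AFFIL, NAME_ID)))  # swapped order
    print(ALL_SCHEMA.validate(creator(NAME, AFFIL, AFFIL)))    # repeated affiliation
```

Run as a script, it should print True, True and False. Both orders pass, but the repeated affiliation is rejected.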

affiliation xml:lang attribute

Not entirely sure this is the right place to bring this up, but it is related to the DataCite schema. When trying to update records (using DataCite schema 4.1), we are now encountering errors regarding the contributor and creator affiliations. Previously we had two values, each with an xml:lang attribute, for example:
<affiliation xml:lang="en">National Research Council Canada</affiliation>
<affiliation xml:lang="fr">Conseil national de recherches Canada</affiliation>

This no longer validates with the recent updates. Is this intended? Should we just strip the xml:lang attribute (see below), or is there another/better way to convey this?
<affiliation>National Research Council Canada</affiliation>
<affiliation>Conseil national de recherches Canada</affiliation>

Thank you,
Marc Dion

Some errors in 4.1 changelog and docs

  1. In the 4.1 changelog (both on the website and in the PDF) it is stated: "Addition of new “dateInformation” subproperty for dateType". It should be a subproperty of "Date".

  2. The website says: "Addition of a new optional “nameType” attribute for Creator and Contributor. Controlled list: personal, organizational", while the PDF docs say:
    "Addition of a new optional attribute for creatorName and ContributorName: nameType". It's not clear whether nameType is a subproperty of the top-level fields (Creator/Contributor) or of the corresponding creatorName/contributorName subproperties. The table doesn't make it clear either: under 7 Contributor, nameType looks like a direct subproperty (7.7 nameType), whereas under 2 Creator it looks like a subproperty of 2.1 creatorName (2.1.1 nameType). I guess it should be one or the other, kept consistent between the Creator and Contributor fields? From the schema, it looks like nameType is a subproperty of creatorName/contributorName.

  3. "Addition of optional lang attribute to Rights property": "lang" is missing from the table. (Update: I misunderstood this point.)

Documentation has problems with footnotes

Not sure if this is the place to report it, but the current 3.0 PDF (Jan 2014) has a problem with the footnotes.

Page 18: footnote 20, referred to from 18.1 geoLocationPoint, is missing (presumably it's the same as 23). Footnote 19 (referred to from the text) is also missing, as are footnotes 21 and 22 (I couldn't find them in the text).

better pattern in doiType

The currently used pattern [1][0][/.].* matches 10/foo. Probably [1][0][.].* or 10\..* was meant?

I propose a slightly more accurate one: 10\..+/.+, which also checks the general prefix/suffix structure.
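The difference between the two patterns can be checked quickly in Python. XSD patterns are implicitly anchored at both ends, so re.fullmatch is a reasonable approximation here. The two regex dialects differ in some advanced features, but not in anything that matters for these single-line DOI strings. The helper name is hypothetical.

```python
import re

CURRENT_PATTERN = r"[1][0][/.].*"   # pattern quoted in the issue
PROPOSED_PATTERN = r"10\..+/.+"     # proposed replacement


def xsd_pattern_matches(pattern: str, value: str) -> bool:
    """Approximate XSD <xs:pattern> matching: XSD patterns are implicitly anchored."""
    return re.fullmatch(pattern, value) is not None


if __name__ == "__main__":
    for doi in ["10.1234/foo", "10/foo", "10.1234", "10.1234/"]:
        print(doi, xsd_pattern_matches(CURRENT_PATTERN, doi),
              xsd_pattern_matches(PROPOSED_PATTERN, doi))
```

The current pattern accepts all four strings. The proposed one accepts only 10.1234/foo, rejecting the missing prefix dot, the missing slash and the empty suffix.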

Inconsistent order of subproperties 7.3 familyName and 7.4 givenName

There is an inconsistency in the order of subproperties between Creator and Contributor: Creator has 2.2 givenName and 2.3 familyName (i.e. givenName comes before familyName), whereas Contributor has 7.3 familyName and 7.4 givenName. This matters because XML Schema is sensitive to element order: in an XML serialization the subproperties must appear in the prescribed order, otherwise validation fails. In fact, the XSD file lists givenName before familyName in the subelements of contributor.

Admittedly, nothing in the documentation claims that the numbering of subproperties indicates the order in which the elements must appear in the XML file. But it is at least unfortunate that the numbering suggests the wrong order.

change <br> in XSD for version 4.2

I'm forwarding an issue on behalf of a client who ran into difficulties and misunderstandings related to the Description field and the use of the <br> field/code therein. Of course, I already explained that “br” is not a real input field and should be empty. It's stated in the XSD as:


schemaLocation in example should be fully qualified

Currently it is:

xsi:schemaLocation="http://schema.datacite.org/oai/oai-1.0/ oai_datacite.xsd"

it should be:

xsi:schemaLocation="http://schema.datacite.org/oai/oai-1.0/ http://schema.datacite.org/oai/oai-1.0/oai.xsd"
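A test for this could split the xsi:schemaLocation value into its namespace/location pairs and flag any location that is not an absolute URL. Here is a minimal Python sketch; relative_schema_locations is a hypothetical name.

```python
from urllib.parse import urlparse


def relative_schema_locations(schema_location: str) -> list:
    """Return the location halves of an xsi:schemaLocation value that are not absolute URLs."""
    tokens = schema_location.split()
    if len(tokens) % 2:
        raise ValueError("xsi:schemaLocation must contain namespace/location pairs")
    # Tokens alternate: namespace URI, schema location, namespace URI, ...
    return [loc for loc in tokens[1::2] if not urlparse(loc).scheme]
```

Applied to the two values above, it flags oai_datacite.xsd in the current one and nothing in the proposed one.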

Typo in Table 4, Property 12

There is a typo in the documentation of the schema in Table 4 (Expanded DataCite Recommended and Optional Properties), for Property 12.2 relationType: the column "Allowed values, examples, other constraints" contains a controlled list of values. One of these values reads IsDescribed by. I believe it should read IsDescribedBy (note the embedded space and the lowercase "b" in "by").

JSON schema: geoLocationPolygons

The JSON schema allows a geoLocationPolygons array to consist of a single inPolygonPoint. Is this intentional, to distinguish a point-located resource from a polygon-located resource for which only the polygon centroid is reported?

DataCite Schema 4.3

Update XSD, examples, tests, and schema website for schema 4.3. Changes include:

  • Addition of optional "affiliationIdentifier", "affiliationIdentifierScheme", and "schemeURI" for affiliation
  • Addition of optional "schemeURI" for funderIdentifier
  • Addition of "ROR" to allowed values for funderIdentifierType

purpose of LastMetadataUpdate and MetadataVersionNumber?

From the current documentation:

In addition to the metadata that submitters supply with registrations and updates, there are two administrative metadata properties that the managing agency will assign to each DataCite metadata record, shown in Table 3. These properties convey the date on which the metadata description was stored by DataCite (LastMetadataUpdate) and a sequence number assigned to the metadata description by DataCite (MetadataVersionNumber).

This is not what we're currently doing. We do have such fields in MDS, but only in the database. We do not fill these fields in the metadata itself at the moment, and this is not planned. We have to discuss whether we really need them inside the metadata. At the very least it has led to some confusion, because the documentation is inaccurate.

If you use OAI to harvest our metadata, there is a timestamp field holding the value of our LastMetadataUpdate. So it can be used, although it is not in our metadata directly.
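For harvesters, here is a minimal sketch of reading that value. It assumes the field meant is the standard OAI-PMH 2.0 header datestamp, and maps each record identifier in a response to its datestamp. The sample response is made up, and header_datestamps is a hypothetical name.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def header_datestamps(oai_response: str) -> dict:
    """Map each record identifier in an OAI-PMH response to its header <datestamp>."""
    root = ET.fromstring(oai_response)
    return {header.findtext(OAI + "identifier"): header.findtext(OAI + "datestamp")
            for header in root.iter(OAI + "header")}


# Hypothetical ListIdentifiers response; identifier and date are made up.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header>
      <identifier>oai:example.org:10.1234/foo</identifier>
      <datestamp>2012-03-01T10:00:00Z</datestamp>
    </header>
  </ListIdentifiers>
</OAI-PMH>"""

if __name__ == "__main__":
    print(header_datestamps(SAMPLE))
```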
