metadata_converters's People

Contributors

johnjung, reyesj5

metadata_converters's Issues

Request for clarification regarding dc:creator

The spec says to pull dc:creator from the MARC 100 field. Please confirm: should subfields $a-$z be concatenated into a single dc:creator per 100 field, or should I use only subfield $a?
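
The two readings differ only in which subfield codes are kept. A minimal sketch, assuming a MARC data field is represented as a list of (subfield code, value) pairs; that representation and the sample values are hypothetical, not the structure classes.py actually uses.

```python
import string
from typing import List, Tuple

# Hypothetical stand-in for one MARC data field: (subfield code, value) pairs.
Field = List[Tuple[str, str]]
ALPHA = set(string.ascii_lowercase)


def creator_all_alpha(field: Field) -> str:
    """Option 1: concatenate every subfield $a-$z into one dc:creator."""
    return " ".join(v.strip() for c, v in field if c in ALPHA)


def creator_subfield_a(field: Field) -> str:
    """Option 2: use subfield $a only."""
    return " ".join(v.strip() for c, v in field if c == "a")


# Hypothetical 100 field; the numeric $0 is dropped by both options.
field_100 = [
    ("a", "Smith, Jane,"),
    ("d", "1900-1980,"),
    ("e", "cartographer."),
    ("0", "example-authority-id"),
]
print(creator_all_alpha(field_100))   # Smith, Jane, 1900-1980, cartographer.
print(creator_subfield_a(field_100))  # Smith, Jane,
```

Either option leaves trailing ISBD punctuation in place, so a trimming step may be needed whichever reading is confirmed.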

Images for MEPA?

Once we have all of the images for MEPA, please let me know so I can assign each item an ARK and store its image data in the pairtree. (ARK identifiers are also required for the final EDM output.)

DC converter and EDM documentation mismatch?

The spec includes the following:

<[NOID]/[path/to/providedCHO]>
For dc:coverage, use MARCXML 651 fields with second indicator of 7 and subfield $2 with value “fast”. E.g.:
651 7 |a Illinois |z Chicago |z Lower West Side. |2 fast |0 http://id.worldcat.org/fast/01926123
Each subfield should generate a separate dc:coverage element.
dc:coverage [651 $a and $z subfields from MARCXML];

Right now, the DC metadata converter pulls dc:coverage from every subfield of the 255 field.

In general, should all DC output for maps match the DC fields that are specified in this document? If so, I'll make a maps-specific subclass of the DC metadata converter.
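
A maps-specific subclass could implement the coverage rule quoted from the spec roughly as follows. This is a sketch, assuming a hypothetical DataField stand-in for a parsed MARCXML datafield; the second and third sample fields are invented to show what gets skipped. Values are kept as-is, so the trailing period in "Lower West Side." survives, matching the spec's example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class DataField:
    """Hypothetical stand-in for a parsed MARCXML datafield."""
    tag: str
    ind1: str
    ind2: str
    subfields: List[Tuple[str, str]] = field(default_factory=list)


def fast_coverage(fields: List[DataField]) -> List[str]:
    """Return one dc:coverage value per $a or $z of each FAST 651 field.

    Only 651 fields with second indicator 7 and $2 "fast" qualify.
    """
    values = []
    for df in fields:
        if df.tag != "651" or df.ind2 != "7":
            continue
        if not any(c == "2" and v.strip() == "fast" for c, v in df.subfields):
            continue
        values.extend(v.strip() for c, v in df.subfields if c in ("a", "z"))
    return values


record = [
    DataField("651", " ", "7", [
        ("a", "Illinois"), ("z", "Chicago"), ("z", "Lower West Side."),
        ("2", "fast"), ("0", "http://id.worldcat.org/fast/01926123"),
    ]),
    DataField("651", " ", "0", [("a", "Chicago (Ill.)")]),   # not FAST: skipped
    DataField("255", " ", " ", [("a", "Scale not given.")]),  # 255 is not a source under the spec
]
print(fast_coverage(record))  # ['Illinois', 'Chicago', 'Lower West Side.']
```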

add MarcXmlToOpenGraph

See the Open Graph protocol documentation. Website visitors may share links to items from our digital collections, e.g. an image from the Photoarchive. When they share these links on Facebook, the Library would like to supply Open Graph metadata for Facebook's link preview.

Add a MarcXmlToOpenGraph class to classes.py. Look at MarcToDc and MarcXmlToSchemaDotOrg in that file for an example of how to set up the class, including how a mappings variable describes the way Open Graph properties map to MARC fields. Be sure to add Google-style docstrings for Sphinx.

The __str__() method should return a series of meta tags that can be included in an HTML page. Those meta tags should include values like these:

Be sure to add unit tests.
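
A minimal sketch of such a class, assuming the input is a single MARCXML record element and using a hypothetical crosswalk (245 $a$b for og:title, 520 $a for og:description). The url and image_url constructor parameters and the og:type value "website" are also assumptions. It always emits the four basic Open Graph properties (og:title when a 245 is present, og:type, og:image, og:url) and adds og:description only when a 520 exists.

```python
import html
import xml.etree.ElementTree as ET
from typing import Optional

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}


class MarcXmlToOpenGraph:
    """Render Open Graph meta tags for a MARCXML record.

    Args:
        marcxml: A serialized MARCXML ``record`` element.
        url: Canonical URL of the item's page (hypothetical parameter).
        image_url: URL of a representative image (hypothetical parameter).
    """

    # Hypothetical crosswalk: Open Graph property -> (MARC tag, subfield codes).
    mappings = {
        "og:title": ("245", "ab"),
        "og:description": ("520", "a"),
    }

    def __init__(self, marcxml: str, url: str, image_url: str) -> None:
        self.record = ET.fromstring(marcxml)
        self.url = url
        self.image_url = image_url

    def _value(self, tag: str, codes: str) -> Optional[str]:
        """Join the requested subfields of the first matching datafield."""
        wanted = set(codes)
        for df in self.record.findall(f"marc:datafield[@tag='{tag}']", MARC_NS):
            parts = [
                (sf.text or "").strip()
                for sf in df.findall("marc:subfield", MARC_NS)
                if sf.get("code") in wanted
            ]
            if any(parts):
                return " ".join(p for p in parts if p)
        return None

    def __str__(self) -> str:
        props = {"og:type": "website", "og:url": self.url, "og:image": self.image_url}
        for prop, (tag, codes) in self.mappings.items():
            value = self._value(tag, codes)
            if value:
                props[prop] = value
        # Open Graph tags use the "property" attribute rather than "name".
        return "\n".join(
            f'<meta property="{p}" content="{html.escape(v, quote=True)}">'
            for p, v in props.items()
        )
```

Escaping every content value with html.escape keeps titles containing quotes or ampersands from breaking the surrounding HTML.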

set MEPA ResourceMap creator to library or repository?

Currently, our Social Scientists Maps EDM transformation outputs https://library.uchicago.edu as the dcterms:creator of a resource map; see https://ark.lib.uchicago.edu/ark:/61001/b2kg6jc3941j/file.ttl.

This matches the Social Scientists Maps EDM spec (https://docs.google.com/document/d/11QaNUMEtjp9DMkwkttvm1zMFF6rV1KrfVe2SYr1brVA/edit) but not the MEPA metadata template (https://docs.google.com/document/d/1Nrh31iFaqHX2rwFWF7lNA-2u6Ib2tXOpIUir6GP50DQ/edit), which specifies http://repository.lib.uchicago.edu/.

Should we update the MEPA metadata template to match the Social Scientists Maps template?

Add MarcXmlToTwitterCard

If website visitors share a link to an item from a digital collection via Twitter (e.g. an image from the Photoarchive), the Library would like that tweet to include a Twitter Card to better promote our materials. See Twitter's documentation on cards, and on summary cards with large images, for background. Because many of our materials are still images, large image cards make sense for us.

Add a MarcXmlToTwitterCard class to classes.py. Look at MarcToDc and MarcXmlToSchemaDotOrg in that file for an example of how to set up the class, including how a mappings variable describes the way Twitter Card attributes map to MARC fields. Be sure to add Google-style docstrings for Sphinx.

The __str__() method should return a series of meta tags that can be included in an HTML page. Those meta tags should include values like these:

<meta name="twitter:card" content="summary_large_image">
<meta name="twitter:site" content="@UChicagoLibrary">
<meta name="twitter:title" content="...">
<meta name="twitter:description" content="...">
<meta name="twitter:image" content="http://example.com/example.jpg">
<meta name="twitter:image:alt" content="...">

Once the class is finished, be sure the output validates.

Be sure to add unit tests.
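
A minimal sketch of the class, assuming a simplified, already-parsed record (a dict from MARC tag to subfield pairs) rather than raw MARCXML, and a hypothetical crosswalk of 245 $a$b to twitter:title and 520 $a to twitter:description. The card type and the @UChicagoLibrary handle come from the issue; image_alt is a hypothetical parameter. Tags are emitted in the order listed above.

```python
import html
from typing import Dict, List, Optional, Tuple

# Hypothetical simplified record: MARC tag -> (subfield code, value) pairs
# of that field. The real class would parse MARCXML instead.
Record = Dict[str, List[Tuple[str, str]]]


class MarcXmlToTwitterCard:
    """Render Twitter Card meta tags for a digital collections item.

    Args:
        record: Simplified MARC record (hypothetical structure).
        image_url: URL of a large representative image.
        image_alt: Optional alt text for the image (hypothetical parameter).
    """

    # Hypothetical crosswalk: Twitter Card attribute -> (MARC tag, subfield codes).
    mappings = {
        "twitter:title": ("245", "ab"),
        "twitter:description": ("520", "a"),
    }

    def __init__(self, record: Record, image_url: str,
                 image_alt: Optional[str] = None) -> None:
        self.record = record
        self.image_url = image_url
        self.image_alt = image_alt

    def _value(self, tag: str, codes: str) -> Optional[str]:
        """Join the requested subfields of one field, or return None."""
        wanted = set(codes)
        parts = [v.strip() for c, v in self.record.get(tag, []) if c in wanted]
        return " ".join(p for p in parts if p) or None

    def __str__(self) -> str:
        # Fixed values taken from the issue text.
        tags = {"twitter:card": "summary_large_image", "twitter:site": "@UChicagoLibrary"}
        for attr, (tag, codes) in self.mappings.items():
            value = self._value(tag, codes)
            if value:
                tags[attr] = value
        tags["twitter:image"] = self.image_url
        if self.image_alt:
            tags["twitter:image:alt"] = self.image_alt
        return "\n".join(
            f'<meta name="{n}" content="{html.escape(v, quote=True)}">'
            for n, v in tags.items()
        )
```

Because the output order is deterministic, a unit test can compare the rendered lines directly against the expected list.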

Namespace for SiteInstitutionBuilding?

The spec contains the following note:

VCat has a custom Location field with type Repository for “SiteInstitutionBuilding ID”. For all objects in this collection, the ID value will be “5136”, which resolves to the string “Special Collections Research Center, University of Chicago Library”.

I don't see the string "5136" in my sample data. Do we have a namespace for this identifier that would make it globally unique? Otherwise, I'd prefer to use the "Special Collections..." string.

MEPA SPEC: use of created/modified triples

In the MEPA spec, aggregations have dcterms:created and dcterms:modified triples, resource maps have dcterms:created triples, and cultural heritage objects have neither.

In the Maps spec, aggregations have both, resource maps have both, and cultural heritage objects have neither.

Can we regularize this? Should we add a dcterms:modified triple to resource maps for MEPA? Is it still correct to omit created and modified dates for the cultural heritage objects themselves?

exclude numeric subfields by default

Find a compact, easy-to-maintain way to describe the crosswalk when numeric subfields are excluded by default. Perhaps write the possible subfields as a string and break it into a list, e.g. [c for c in 'abcdefg']?
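
One way this could look, sketched under assumptions: the CROSSWALK dict, ALL_SUBFIELDS, and the subfield_codes/select helpers are hypothetical names, not existing code. Iterating over a string already yields its characters, so list('abcdefg') is equivalent to [c for c in 'abcdefg'], and the "numeric excluded by default" rule lives in one helper instead of in every crosswalk entry.

```python
import string
from typing import List, Tuple

# "All subfields" written compactly as one string.
ALL_SUBFIELDS = string.ascii_lowercase + string.digits

# Hypothetical crosswalk entries: DC element -> {MARC tag: subfield string}.
CROSSWALK = {
    "dc:creator": {"100": ALL_SUBFIELDS, "110": ALL_SUBFIELDS, "111": ALL_SUBFIELDS},
    "dc:coverage": {"651": "az"},
}


def subfield_codes(spec: str, include_numeric: bool = False) -> List[str]:
    """Expand a subfield string into a list, dropping digits by default."""
    return [c for c in spec if include_numeric or not c.isdigit()]


def select(field: List[Tuple[str, str]], spec: str,
           include_numeric: bool = False) -> List[str]:
    """Return the values of one field's subfields allowed by ``spec``."""
    wanted = set(subfield_codes(spec, include_numeric))
    return [v for c, v in field if c in wanted]
```

With this layout, an entry that really needs a numeric subfield can pass include_numeric=True, while the common case stays a short string.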

DC converter for MEPA?

Would it be useful to have a separate VRA Core-to-DC converter, analogous to the MARC-to-DC converter for the Social Scientists Maps? (The spec refers to the MARC-to-DC converter.)

Definitive list of MEPA items

Because the EDM triples include references to ARK identifiers, it would be helpful to have a complete list of the items in MEPA so that I can assign those identifiers as soon as possible. Doing this will help me create and maintain the relationships between entities in our triples.

Request for clarification regarding dc:description

The dc:description field is spec'ed out differently in different places. In the base MARC-to-DC transform, and in the maps-specific transform, should every 5xx$a-$z field and subfield be concatenated into a single dc:description element? (Alternatively, should each 5xx produce an individual dc:description? Or should just the 500 produce a dc:description?)
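
The three alternatives can be compared side by side. A sketch, assuming a hypothetical simplified record that maps each MARC tag to a list of its occurrences (each a list of subfield pairs); the mode names are illustrative labels, not options of the existing converter.

```python
import string
from typing import Dict, List, Tuple

# Hypothetical simplified record: MARC tag -> list of occurrences of that
# field, each a list of (subfield code, value) pairs.
Record = Dict[str, List[List[Tuple[str, str]]]]
ALPHA = set(string.ascii_lowercase)


def _alpha_text(field: List[Tuple[str, str]]) -> str:
    """Join subfields $a-$z of one field."""
    return " ".join(v.strip() for c, v in field if c in ALPHA)


def dc_descriptions(record: Record, mode: str) -> List[str]:
    """Build dc:description values under one reading of the spec.

    mode: "single" (all 5xx in one element), "per_field" (one per 5xx),
    or "500_only" (one per 500).
    """
    # Assumes three-character tags; sorting keeps 500 before 520, etc.
    notes = [f for tag in sorted(record) if tag.startswith("5") for f in record[tag]]
    if mode == "single":
        text = " ".join(t for t in map(_alpha_text, notes) if t)
        return [text] if text else []
    if mode == "per_field":
        return [t for t in map(_alpha_text, notes) if t]
    if mode == "500_only":
        return [t for t in map(_alpha_text, record.get("500", [])) if t]
    raise ValueError(f"unknown mode: {mode!r}")
```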

Mismatch between spec and place types in database output

The MEPA metadata template spec lists the following place-related predicates for a CHO:

vra:Continent "[FMField:Continent]";
vra:Country "[FMField:Country]";
vra:State "[FMField:State/Region]";
vra:City "[FMField:City or County]";
vra:Place "[FMField:Site OR Building]";

This doesn't quite line up with what I'm getting from work elements in the VRA Core output from FileMaker:

<location type="creation">
<name type="geographic" vocab="TGN" refid="7001188" extent="inhabited place">Alexandria</name>
<name type="geographic" vocab="TGN" refid="7001413" extent="region">Urban</name>
<name type="geographic" vocab="TGN" refid="7016833" extent="nation">Egypt</name>
<name type="geographic" vocab="TGN" refid="7001242 " extent="continent">Africa</name>
</location>

<location type="creation">
<name type="geographic" vocab="TGN" refid="7002543" extent="inhabited place">İzmir</name>
<name type="geographic" vocab="TGN" refid="1001053" extent="province">İzmir</name>
<name type="geographic" vocab="TGN" refid="1000144" extent="nation">Turkey</name>
<name type="geographic" vocab="TGN" refid="1000004" extent="continent">Asia</name>
</location>

All sample records contain continents, nations, and "inhabited places"; some contain provinces; none appear to contain sites or buildings. The data also contains "regions", which have no corresponding predicate in the spec.

I assume we should omit the province/state and site/building triples wherever those values don't occur in the data. Should I map "inhabited place" to City or County in our output?
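
The mapping under discussion can be expressed as a lookup table on the TGN extent attribute. A sketch, assuming the location element has no XML namespace (as in the excerpts above; if the real export declares one, findall would need it). The "province" to vra:State and "inhabited place" to vra:City entries encode the assumptions this issue asks about, not settled decisions.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Assumed mapping from TGN "extent" values to the MEPA template's predicates.
EXTENT_TO_PREDICATE = {
    "continent": "vra:Continent",
    "nation": "vra:Country",
    "province": "vra:State",
    "inhabited place": "vra:City",
}


def place_statements(location_xml: str) -> List[Tuple[str, str]]:
    """Return (predicate, literal) pairs for one VRA Core location element.

    Extents without a predicate (e.g. "region") are skipped, and predicates
    whose extent is absent from the data are simply not emitted.
    """
    location = ET.fromstring(location_xml)
    statements = []
    for name in location.findall("name"):
        predicate = EXTENT_TO_PREDICATE.get(name.get("extent", ""))
        if predicate and name.text:
            statements.append((predicate, name.text.strip()))
    return statements
```

For the Alexandria excerpt this yields vra:City, vra:Country, and vra:Continent statements and no vra:State, which matches the "omit where absent" assumption.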

Need updated DC spec

dbietila is working on a spec for an updated MARC-to-Dublin Core transform for the Social Scientists Maps. It should override the base MARC-to-DC transform as little as possible.

DC fields that appear in linked data triples will be updated after this spec is complete. Questions for specific DC fields are included below:

dc:creator

Per dbietila, dc:creator should include the alphabetic subfields of MARC 100, 110, and 111. The base transform currently also pulls in 245$c. Should that stay, or should it be removed?
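
Keeping the 245$c behavior behind a flag would let either answer be applied without restructuring the transform. A sketch, assuming the same kind of hypothetical simplified record (tag to list of field occurrences) and an illustrative include_245c parameter; sample values are invented.

```python
import string
from typing import Dict, List, Tuple

# Hypothetical simplified record: MARC tag -> list of occurrences of that
# field, each a list of (subfield code, value) pairs.
Record = Dict[str, List[List[Tuple[str, str]]]]
ALPHA = set(string.ascii_lowercase)
CREATOR_TAGS = ("100", "110", "111")


def dc_creators(record: Record, include_245c: bool = False) -> List[str]:
    """One dc:creator per 100/110/111 field, built from subfields $a-$z.

    include_245c reproduces the base transform's extra 245$c line.
    """
    creators = []
    for tag in CREATOR_TAGS:
        for field in record.get(tag, []):
            text = " ".join(v.strip() for c, v in field if c in ALPHA)
            if text:
                creators.append(text)
    if include_245c:
        for field in record.get("245", []):
            creators.extend(v.strip() for c, v in field if c == "c" and v.strip())
    return creators
```

One consideration: 245 $c is a transcribed statement of responsibility, so it can repeat a 1xx name in uncontrolled form, as the sample record in the test does.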

dc:description

The dc:description field is spec'ed out differently in different places. In the base MARC-to-DC transform, and in the maps-specific transform, should every 5xx$a-$z field and subfield be concatenated into a single dc:description element? (Alternatively, should each 5xx produce an individual dc:description? Or should just the 500 produce a dc:description?)
