scriptotek / mc2skos

Command line script for converting Marc21 Classification and Authority records to SKOS/RDF

License: The Unlicense

mc2skos's Introduction

Python script for converting MARC 21 Classification and MARC 21 Authority records (serialized as MARCXML) to SKOS concepts.

Initially developed to support the project "Felles terminologi for klassifikasjon med Dewey", for converting Dewey Decimal Classification (DDC) records. Issues and suggestions for generalizations and improvements are welcome!

See mapping schema for MARC21 Classification and for MARC21 Authority below.

Installation

Releases can be installed from the command line with pip:

$ pip install --upgrade mc2skos             # with virtualenv or as root
$ pip install --upgrade --user mc2skos      # install to ~/.local
  • Works with both Python 2.7 and 3.4+. See Travis for details on tested Python versions.
  • If lxml fails to install on Windows, try the Windows installer from PyPI.
  • If lxml fails to install on Unix, install the system packages python-dev and libxml2-dev.
  • Make sure the Python scripts folder has been added to your PATH.

To use a version directly from the source code repository:

$ git clone https://github.com/scriptotek/mc2skos.git
$ cd mc2skos
$ pip install -e .

Usage

mc2skos infile.xml outfile.ttl      # from file to file
mc2skos infile.xml > outfile.ttl    # from file to standard output

Run mc2skos --help or mc2skos -h for options.

URIs

URIs are generated automatically for known concept schemes, identified from 084 $a for classification records and from 008[11] / 040 $f for authority records. To list known concept schemes:

$ mc2skos -l

To add more vocabularies, you can edit vocabularies.yml. Pull requests for adding more vocabularies are very welcome!

URIs can also be generated on the fly from a URI template specified with option --uri. The following template parameters are recognized:

  • {control_number} is the control number from 001, 010 or 016. The current approach is to use 010 or 016 if defined, otherwise 001. If you find examples where this approach fails, please add them to issue #42.
  • {collection} is "class", "table" or "scheme"
  • {object} is a member of the classification scheme and part of a {collection}, such as a specific class or table. Spaces in the URI are replaced by hyphens or another character configured with option --whitespace.
  • {edition} is taken from 084 $c (with language code stripped)
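
The template expansion described above can be sketched as follows. This is only an illustration, not the code mc2skos actually uses: the function name expand_uri_template and its whitespace keyword are hypothetical, and the sketch assumes that whitespace replacement is applied to each parameter value before substitution.

```python
def expand_uri_template(template, whitespace="-", **params):
    """Fill a URI template such as 'http://dewey.info/{collection}/{object}/{edition}/'.

    Hypothetical helper: spaces in parameter values are replaced by `whitespace`.
    """
    # Replace spaces in every parameter value before substitution
    cleaned = {key: str(value).replace(" ", whitespace) for key, value in params.items()}
    return template.format(**cleaned)


uri = expand_uri_template(
    "http://dewey.info/{collection}/{object}/{edition}/",
    collection="class",
    object="6--982",
    edition="e21",
)
print(uri)  # http://dewey.info/class/6--982/e21/
```

Using plain str.format keeps the template syntax identical to the {parameter} notation listed above.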

To add skos:inScheme statements to all records, a URI template can be specified with option --scheme. Otherwise, it will be derived from a default template if the concept scheme is known.

To add an additional skos:inScheme statement to table records, a URI template can be specified with option --table_scheme. Otherwise, it will be derived from a default template if the concept scheme is known.

The following example is generated from a DDC table record:

<http://dewey.info/class/6--982/e21/> a skos:Concept ;
    skos:inScheme <http://dewey.info/scheme/edition/e21/>,
                  <http://dewey.info/table/6/e21/> ;
    skos:notation "T6--982" ;
    skos:prefLabel "Chibchan and Paezan languages"@en .

Mapping schema for MARC21 Classification

Only a small part of the MARC21 Classification data model is converted, and the conversion follows a rather pragmatic approach, exemplified by the mapping of the 7XX fields to skos:altLabel.

MARC21XML field | Description | RDF property
001 | Control Number (see note above on 001, 010 & 016) | dcterms:identifier
005 | Date and time of latest transaction | dcterms:modified
008[0:6] | Date entered on file | dcterms:created
008[8]="d" or "e" | Classification validity | owl:deprecated
010 | Control Number (see note above on 001, 010 & 016) | dcterms:identifier
016 | Control Number (see note above on 001, 010 & 016) | dcterms:identifier
153 $a, $c, $z | Classification number | skos:notation
153 $j | Caption | skos:prefLabel
153 $e, $f, $z | Classification number hierarchy | skos:broader
253 | Complex See Reference | skos:editorialNote
353 | Complex See Also Reference | skos:editorialNote
680 | Scope Note | skos:scopeNote
683 | Application Instruction Note | skos:editorialNote
684 | Auxiliary Instruction Note | skos:editorialNote
685 | History Note | skos:historyNote
700 | Index Term-Personal Name | skos:altLabel
710 | Index Term-Corporate Name | skos:altLabel
711 | Index Term-Meeting Name | skos:altLabel
730 | Index Term-Uniform Title | skos:altLabel
748 | Index Term-Chronological | skos:altLabel
750 | Index Term-Topical | skos:altLabel
751 | Index Term-Geographic Name | skos:altLabel
753 | Index Term-Uncontrolled | skos:altLabel
765 | Synthesized Number Components | mads:componentList (see below)

Synthesized number components

Components of synthesized numbers explicitly described in 765 fields are expressed using the mads:componentList property, and to preserve the order of the components, we use RDF lists. Example:

@prefix mads: <http://www.loc.gov/mads/rdf/v1#> .

<http://dewey.info/class/001.30973/e23/> a skos:Concept ;
    mads:componentList (
        <http://dewey.info/class/001.3/e23/>
        <http://dewey.info/class/1--09/e23/>
        <http://dewey.info/class/2--73/e23/>
    ) ;
    skos:notation "001.30973" .

Retrieving list members in order is surprisingly hard with SPARQL. Retrieving ordered pairs is the best solution I've come up with so far:

PREFIX mads: <http://www.loc.gov/mads/rdf/v1#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT ?c1_notation ?c1_label ?c2_notation ?c2_label
WHERE { GRAPH <http://localhost/ddc23no> {

    <http://dewey.info/class/001.30973/e23/> mads:componentList ?l .
        ?l rdf:rest* ?sl .
        ?sl rdf:first ?e1 .
        ?sl rdf:rest ?sln .
        ?sln rdf:first ?e2 .

        ?e1 skos:notation ?c1_notation .
        ?e2 skos:notation ?c2_notation .

        OPTIONAL {
            ?e1 skos:prefLabel ?c1_label .
        }
        OPTIONAL {
            ?e2 skos:prefLabel ?c2_label .
        }
}}

c1_notation | c1_label | c2_notation | c2_label
"001.3" | "Humaniora"@nb | "T1--09" | "Historie, geografisk behandling, biografier"@nb
"T1--09" | "Historie, geografisk behandling, biografier"@nb | "T2--73" | "USA"@nb

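Outside SPARQL, the order of the components can be recovered by walking the rdf:first / rdf:rest chain directly. The sketch below does this over plain (subject, predicate, object) tuples so that it needs no RDF library. The function name list_members and the blank node labels are made up for the illustration. If the data is loaded with rdflib, its rdflib.collection.Collection class should offer a similar traversal.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_FIRST, RDF_REST, RDF_NIL = RDF + "first", RDF + "rest", RDF + "nil"


def list_members(triples, head):
    """Return the members of the RDF list starting at `head`, in order."""
    # Index the triples by (subject, predicate) for direct lookups
    index = {(s, p): o for s, p, o in triples}
    members = []
    seen = set()
    node = head
    while node != RDF_NIL and node not in seen:
        seen.add(node)  # guard against malformed, cyclic lists
        members.append(index[(node, RDF_FIRST)])
        node = index[(node, RDF_REST)]
    return members


# The component list of 001.30973 from the example above, with hypothetical blank node labels
triples = [
    ("_:b1", RDF_FIRST, "http://dewey.info/class/001.3/e23/"),
    ("_:b1", RDF_REST, "_:b2"),
    ("_:b2", RDF_FIRST, "http://dewey.info/class/1--09/e23/"),
    ("_:b2", RDF_REST, "_:b3"),
    ("_:b3", RDF_FIRST, "http://dewey.info/class/2--73/e23/"),
    ("_:b3", RDF_REST, RDF_NIL),
]
components = list_members(triples, "_:b1")
print(components)
```
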
Additional conversion rules for WebDewey data

The script comes with a few extra rules for distinguishing between different types of notes in WebDewey records and extracting entities from these. The entity extraction rules (marked with [*] below) use a non-standard namespace and are not enabled by default. Specify the --webdewey flag to use them.

MARC21XML field | Description | RDF property
680 having $9 ess=ndf | Definition note | skos:definition
680 having $9 ess=nvn | Variant name note | wd:variantName [*] for each subfield $t
680 having $9 ess=nch | Class here note | wd:classHere [*] for each subfield $t
680 having $9 ess=nin | Including note | wd:including [*] for each subfield $t
680 having $9 ess=nph | Former heading | wd:formerHeading [*] for each subfield $t
694 having $9 ess=nml | ??? | skos:editorialNote
7XX having $9 ess=isCaption | Relative index term to use as caption | skos:prefLabel

Notes that are currently not treated in any special way:

  • 253 having $9 ess=nsx Do-not-use.
  • 253 having $9 ess=nce Class-elsewhere
  • 253 having $9 ess=ncw Class-elsewhere-manual
  • 253 having $9 ess=nse See.
  • 253 having $9 ess=nsw See-manual.
  • 353 having $9 ess=nsa See-also
  • 683 having $9 ess=nbu Preference note
  • 683 having $9 ess=nop Options note
  • 683 having $9 ess=non Options note
  • 684 having $9 ess=nsm Manual note
  • 685 having $9 ess=ndp Discontinued partial
  • 685 having $9 ess=nrp Relocation
  • 689 having $9 ess=nru Sist brukt i... ("Last used in...")

Mapping schema for MARC21 Authority

Only a small part of the MARC21 Authority data model is converted.

MARC21XML field | Description | RDF property
001 | Control Number | dcterms:identifier
005 | Date and time of latest transaction | dcterms:modified
008[0:6] | Date entered on file | dcterms:created
065 | Other Classification Number | skos:exactMatch (see below)
080 | Universal Decimal Classification Number | skos:exactMatch (see below)
083 | Dewey Decimal Classification Number | skos:exactMatch (see below)
1XX | Headings | skos:prefLabel
4XX | See From Tracings | skos:altLabel
5XX | See Also From Tracings | skos:related, skos:broader or skos:narrower (see below)
667 | Nonpublic General Note | skos:editorialNote
670 | Source Data Found | skos:note
677 | Definition | skos:definition
678 | Biographical or Historical Data | skos:note
680 | Public General Note | skos:note
681 | Subject Example Tracing Note | skos:example
682 | Deleted Heading Information | skos:changeNote
688 | Application History Note | skos:historyNote
7XX | Heading Linking Entries | skos:xxxMatch (see below)

Notes:

  • Mappings are generated for 065, 080 and 083 only if a URI pattern for the classification scheme has been defined in the config.
  • SKOS relations are generated from 5XX fields if the fields contain a $0 subfield containing either a control number or a URI for the related record. The relationship type is skos:broader if $w=g, skos:narrower if $w=h, and skos:related otherwise. If $w=r and $4 contains a URI, that URI is used as the relationship type. Note that $4 must precede $0 (since both subfields can be repeated).
  • Mappings/relationships are generated for 7XX headings if the fields contain a $0 subfield containing either the control number or the URI of the related record. If $0 contains a control number, a URI pattern for the vocabulary (found in indicator 2 or $2) must be defined in mc2skos.record.CONFIG. If $4 contains a URI, that URI is used as the relationship type. Otherwise, if $4 contains one of the ISO 25964 relations, the corresponding SKOS relation is used. Otherwise, the default value skos:closeMatch is used. Note that $4 must precede $0 (since both subfields can be repeated).
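
The 5XX rule in the notes above can be read as a small decision function. The following is a minimal sketch of that rule only, not the project's implementation: relation_for_5xx and its parameter names are hypothetical, and the SKOS properties are returned as prefixed names for readability.

```python
def relation_for_5xx(sf_w=None, sf_4=None):
    """Pick the relationship type for a 5XX field from its $w and $4 values."""
    if sf_w == "g":
        return "skos:broader"
    if sf_w == "h":
        return "skos:narrower"
    # $4 is only consulted for $w=r, and only if it holds a URI
    if sf_w == "r" and isinstance(sf_4, str) and sf_4.startswith(("http://", "https://")):
        return sf_4
    # Everything else falls back to the generic relation
    return "skos:related"


print(relation_for_5xx("g"))  # skos:broader
print(relation_for_5xx("r", "http://d-nb.info/standards/elementset/gnd#broaderTermGeneral"))
```

A field without $w, or with $w=r but without a usable $4, ends up as skos:related in this sketch.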

mc2skos's People

Contributors

captsolo, danmichaelo, nichtich

mc2skos's Issues

Support entity types

GND has entity types such as work, person, organization... in field 075. This should be mapped to additional RDF statements with rdf:type to support selecting authority records by type of entity.

The GND entity types can be found in GND ontology, e.g. gndgen:

Include $d from 1XX headings

As long as it's common tradition to include the $d dates in presentation, I guess it makes sense to include them in the SKOS labels as well. For instance, by using "Schneider, Birgit (1971–)" as the label for 03470a3#diff-69bc818a87939d320f5974936d9d818d

Of course, there are the usual complications related to ISBD punctuation. We have to handle both of these:

$aSchneider, Birgit$d1971–
$aSchneider, Birgit,$d1971–

Whether ISBD punctuation is used can be defined in LDR/18, but it seems like LC doesn't necessarily do that: http://id.loc.gov/authorities/names/nb2003041477.marcxml.xml , so I guess we have to resort to testing whether there is a comma or semicolon at the end of $a.
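
A minimal sketch of that test, assuming the label format "Name (dates)" suggested above. The function heading_label is hypothetical and only strips a trailing comma or semicolon from $a before appending $d.

```python
def heading_label(sf_a, sf_d=None):
    """Build a label from 1XX $a and an optional $d, tolerating ISBD punctuation."""
    # Drop a comma or semicolon (and surrounding whitespace) at the end of $a
    name = sf_a.rstrip().rstrip(",;").rstrip()
    if not sf_d:
        return name
    return "{} ({})".format(name, sf_d.strip())


# Both punctuation variants give the same label
print(heading_label("Schneider, Birgit", "1971–"))
print(heading_label("Schneider, Birgit,", "1971–"))
```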

Support URIs in fld 856

Currently, mc2skos does not export URIs that appear in field 856.

  <datafield tag="856" ind1="4" ind2="0">
    <subfield code="u">http://viaf.org/viaf/113418055</subfield>
    <subfield code="y">VIAF ID</subfield>
  </datafield>

Would it be possible to add support for exporting these URIs (when ind2 is 0, indicating that this is the same resource elsewhere)?

I could try adding it myself if someone could tell me what files / functions / ... need to be modified.
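
As a starting point for discussion, the extraction step could look like the sketch below. It only collects the $u values of 856 fields with ind2="0". Which RDF property they should be mapped to is left open. The function name same_resource_uris is hypothetical, and the sketch assumes un-namespaced MARCXML as in the snippet above (real files often use the http://www.loc.gov/MARC21/slim namespace, which would need to be handled).

```python
import xml.etree.ElementTree as ET


def same_resource_uris(record_xml):
    """Return the $u values of all 856 fields whose second indicator is 0."""
    root = ET.fromstring(record_xml)
    uris = []
    for field in root.iter("datafield"):
        if field.get("tag") == "856" and field.get("ind2") == "0":
            for subfield in field.findall("subfield"):
                if subfield.get("code") == "u" and subfield.text:
                    uris.append(subfield.text.strip())
    return uris


record = """
<record>
  <datafield tag="856" ind1="4" ind2="0">
    <subfield code="u">http://viaf.org/viaf/113418055</subfield>
    <subfield code="y">VIAF ID</subfield>
  </datafield>
</record>
"""
print(same_resource_uris(record))  # ['http://viaf.org/viaf/113418055']
```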

Support LC Names and Genre/Form Terms

Both use 008/11="a" (Library of Congress Subject Headings), but LCGNF (example record) also uses 040 $f="lcgft", so it can be detected in the normal way.

LCCN (example record), however, is missing 040 $f, so mc2skos currently uses the URI pattern of LCSH and assigns skos:inScheme <http://id.loc.gov/authorities/subjects> rather than skos:inScheme <http://id.loc.gov/authorities/names>.

How to manage multilingual labels

Is there a way to put labels in multiple languages into one MARC record, e.g. repeat field 153? If not, should mc2skos provide a method to compare and merge multiple MARC files of the same classification in different languages?

Include examples other than DDC

The MARC 21 Classification format specification includes examples of Library of Congress Classification (LCC), National Library of Medicine Classification (NLM) and Universal Decimal Classification (UDC). The latter is an interesting but also complex use case next to DDC. An outline of UDC has been published as Linked Data for comparison. The next UDC Seminar will be in 2017, an opportunity to present and discuss. I suppose that UDC in RDF is created from SQL, but we can ask about availability of its data in MARCXML.

AttributeError when mapping 5XX fields

The program fails with an AttributeError when mapping 5XX fields if subfield "4" is not present:

  File "virtual-env/to-SKOS/lib/python3.7/site-packages/mc2skos/record.py", line 597, in __init__
    super(AuthorityRecord, self).__init__(record, options)
  File "virtual-env/to-SKOS/lib/python3.7/site-packages/mc2skos/record.py", line 70, in __init__
    self.parse(options or {})
  File "virtual-env/to-SKOS/lib/python3.7/site-packages/mc2skos/record.py", line 691, in parse
    elif sf_w == 'r' and is_uri(sf_4):
  File "virtual-env/to-SKOS/lib/python3.7/site-packages/mc2skos/util.py", line 2, in is_uri
    return value.startswith('http://') or value.startswith('https://')
AttributeError: 'NoneType' object has no attribute 'startswith'

Would it be possible to "fall back" to some default scheme (e.g. the scheme supplied in cmd line parameters) instead of failing with error in cases when subfield "0" does not have a URI and there is no subfield "4"?
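
Independent of the fallback question, the crash itself comes from calling startswith on None. A None-safe variant of the is_uri check shown in the traceback could look like this sketch (assuming Python 3, where all text is str):

```python
def is_uri(value):
    """Return True only for strings that start with an http(s) scheme."""
    # isinstance() guards against None when the subfield is missing
    return isinstance(value, str) and value.startswith(("http://", "https://"))


print(is_uri(None))  # False
print(is_uri("http://d-nb.info/gnd/4711780-1"))  # True
```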

Support converting GND authority file record

GND records provided by the German National Library (e.g. http://d-nb.info/1020118989/about/marcxml) cannot be converted because:

  • Field 008[11] is set to n (Not applicable) instead of z (Other)
  • gnd is not specified in 040$f

The first may be fixed in mc2skos by also allowing n but the second needs to be done by German National Library.

After both changes (an example is included in https://github.com/gbv/mc2skos/blob/gnd/examples/gnd-1020118989.xml), the converted RDF contains an empty string triple:

skos:note ""@de ;

This should better be fixed in mc2skos.

Finally, GND records contain mappings in field 024 that should be converted too (#55).

Include tables as concept (sub)schemes

According to Mitchell et al (2013) tables have a URI of their own, so in addition to

<http://dewey.info/class/2--162/> skos:inScheme <http://dewey.info/scheme/>

this should also be stated:

<http://dewey.info/class/2--162/> skos:inScheme <http://dewey.info/table/2/>

Tables are part of the full scheme but table entries are never topConceptOf the DDC. Unfortunately we cannot check dewey.info how it was originally modelled.

Language tags

At the moment we have @nb hardcoded. We could perhaps use the value from 040 $b, and default to English.

I'm a bit puzzled by the definition of 040 $b in MARC 21 Classification though: “MARC code for the language used in the textual portions of the record in the Note fields (6XX).” (https://www.loc.gov/marc/classification/cd040.html). I wonder why they chose to limit it to note fields. In MARC 21 Bibliographic the definition is more general.

For converting from ISO 639-2 to 639-1 I've used iso-639 before.
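
A rough sketch of the proposal, with a deliberately tiny hand-written code table standing in for a proper ISO 639 library. Both ISO_639_2_TO_1 and language_tag are hypothetical names, and unknown codes simply fall back to the default.

```python
# Hypothetical, incomplete mapping from ISO 639-2 (three-letter) to ISO 639-1 (two-letter) codes
ISO_639_2_TO_1 = {
    "eng": "en",
    "nob": "nb",
    "nno": "nn",
    "ger": "de",
    "deu": "de",
    "fre": "fr",
    "fra": "fr",
}


def language_tag(sf_040_b=None, default="en"):
    """Derive the RDF language tag from 040 $b, defaulting to English."""
    if not sf_040_b:
        return default
    return ISO_639_2_TO_1.get(sf_040_b.strip().lower(), default)


print(language_tag("nob"))  # nb
print(language_tag(None))   # en
```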

New release with CHANGES.md

@danmichaelo could you release a new version including the just merged addition of rvk, please? I'd also include a CHANGES.md, just copy this:

## 0.10.2 (2018-0...

* Natively support processing of RVK

## 0.10.1 (2018-04-12)

## 0.10.0 (2018-04-05)

## 0.9.0 (2018-01-28)

## 0.8.0 (2017-12-04)

## 0.7.3 (2017-10-29)

## 0.7.2 (2017-10-16)

## 0.7.1 (2017-07-26)

## 0.7.0 (2017-07-22)

## 0.6.0 (2017-06-28)

## 0.5.1 (2017-02-10)

## 0.5.0 (2017-02-09)

## 0.4.0 (2017-01-17)

## 0.3.1 (2017-08-15)

* First release as package

Add support of MARC authority records format

MADS is used for authority files other than classification systems. There are mappings from MARC 21 Format for Authority Data to MADS and from MADS to RDF (http://www.loc.gov/standards/mads/rdf/) with further mappings from mads-rdf to skos-rdf so the conversion should not be difficult.

I also found this tool to convert MADS to SKOS, but it does not seem to be maintained anymore. Having one reliable tool for all MARC formats to SKOS would be better than the current variety of distributed scripts.

Some authority files that provide MARC data for testing (some of them only single records for human inspection):

153 parsing failure

153 $a 469.9 $e 469 $g * $j Galisisk $i [tidligere $x 469.71 $c 469.72 $i , $x 469.794 $i ]

gives notation $469.9-469.72

Need to take into account that the last $c does not follow $a.

Missing broader relations from GND

The GND record https://d-nb.info/040034232/about/marcxml contains field 550 with multiple $0 and $4 which are interpreted differently:

    <datafield tag="550" ind1=" " ind2=" ">
      <subfield code="0">(DE-101)965844773</subfield>
      <subfield code="0">(DE-588)4711780-1</subfield>
      <subfield code="0">http://d-nb.info/gnd/4711780-1</subfield>
      <subfield code="a">Moderne Physik</subfield>
      <subfield code="4">obal</subfield>
      <subfield code="4">http://d-nb.info/standards/elementset/gnd#broaderTermGeneral</subfield>
      <subfield code="w">r</subfield>
      <subfield code="i">Oberbegriff allgemein</subfield>
    </datafield>

In mc2skos 0.11.0 this is ignored. The documentation says "$4 must precede $0 (since both subfields can be repeated)" but this is not true. I found a related bug in the code but it will only work if URIs are preferred when subfield $0 (and/or $4) is repeated. This will work (subfields to be ignored commented out):

    <datafield tag="550" ind1=" " ind2=" ">
      <!--subfield code="0">(DE-101)965844773</subfield-->
      <!--subfield code="0">(DE-588)4711780-1</subfield-->
      <subfield code="0">http://d-nb.info/gnd/4711780-1</subfield>
      <subfield code="a">Moderne Physik</subfield>
      <!--subfield code="4">obal</subfield-->
      <subfield code="4">http://d-nb.info/standards/elementset/gnd#broaderTermGeneral</subfield>
      <subfield code="w">r</subfield>
      <subfield code="i">Oberbegriff allgemein</subfield>
    </datafield>
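
The "prefer URIs" behaviour could be sketched as below. This is an assumption about how the selection might work rather than the fix itself, and preferred_value is a hypothetical helper that takes all values of one repeated subfield.

```python
def preferred_value(values):
    """From the values of a repeated subfield, return the first URI, else the first value."""
    for value in values:
        if value.startswith(("http://", "https://")):
            return value
    # No URI present: fall back to the first value, or None for an empty list
    return values[0] if values else None


sf_0 = ["(DE-101)965844773", "(DE-588)4711780-1", "http://d-nb.info/gnd/4711780-1"]
sf_4 = ["obal", "http://d-nb.info/standards/elementset/gnd#broaderTermGeneral"]
print(preferred_value(sf_0))  # http://d-nb.info/gnd/4711780-1
print(preferred_value(sf_4))  # http://d-nb.info/standards/elementset/gnd#broaderTermGeneral
```

With this selection the order of $0 and $4 within the field no longer matters.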

Decide on URI format from dewey.info

Related to #5 and #2.

In my opinion the default (if no option --scheme was given) should be to emit URIs as they were given at dewey.info. To document the choice of URIs I wrote the following paragraph. I am posting it here to discuss whether we can agree on this choice.


DDC URI format

The choice of common URIs is the most important decision because it allows connection of data from multiple sources. The URIs for DDC, as described by Mitchell and Panzer (2013), follow a general pattern with URIs for classes, tables, table numbers, and editions among other parts. URIs for classes and table numbers can further be refined with an edition or date (e.g. http://dewey.info/class/641/2009/, http://dewey.info/class/641/e22/, and http://dewey.info/class/641/e23/2012-08/) and some URIs have an alternative form (e.g. http://dewey.info/class/T2--162/). To minimize the number of possible URIs, only the following URIs should be used:

Note that language editions have no distinct URIs on purpose. The URI format for particular editions of DDC only works until the 23rd edition because this will probably be the last. When OCLC moves from editions to another numbering scheme, this needs to be reflected in an update of this guideline.


Use URIs from 7XX subfield $1

Would it be possible to export relations based on 7XX subfield $1 ?

We are using it to link our authority records to LCSH and subfield $1 looked like a good choice where to record LCSH URI. Unfortunately, mc2skos currently does not handle URIs in this subfield.

Example data:

  <datafield tag="750" ind1=" " ind2="0">
    <subfield code="a">Gods, Greek, in art</subfield>
    <subfield code="1">http://id.loc.gov/authorities/subjects/sh85055623</subfield>
    <subfield code="u">http://id.loc.gov/authorities/subjects/sh85055623</subfield>
    <subfield code="2">LCSH</subfield>
    <subfield code="4">N</subfield>
  </datafield>

Don't stop processing if captions are found in 153

In https://github.com/scriptotek/mc2skos/blob/master/mc2skos/record.py#L513 processing of field 153 is stopped when a caption is found. I have some classification records with subfields j followed by e so they cannot be processed. I changed in my branch to

if code in ['h', 'j', 'k', '6', '8']:
    # Ignore captions
    continue

elif code not in ['a', 'c', 'e', 'f', 'z', 'y']:
    # We expect everything else to be captions or notes, like in the example in
    # test_153::TestParse153::testComplexEntryWithUndocumentStuff
    break

and this passes the current tests. But why break at all? Could we change the break to continue and remove the test?

Rename --uri to --concept or add alias

The concept URI template is specified with concept in vocabularies.yml and with --uri on the command line. The latter should be renamed to --concept to match names.

rdflib-jsonld package has been deprecated

The rdflib-jsonld package has been deprecated / merged into rdflib.

When trying to launch a newly installed mc2skos application an error message is displayed:

.../python3.7/site-packages/rdflib_jsonld/__init__.py:12: DeprecationWarning: 
The rdflib-jsonld package has been integrated into rdflib as of rdflib==6.0.1.  
Please remove rdflib-jsonld from your project's dependencies.

[...]

  File ".../mc2skos/mc2skos.py", line 19, in <module>
    import rdflib_jsonld.serializer as json_ld
ModuleNotFoundError: No module named 'rdflib_jsonld.serializer'

Version numbers:

  • rdflib==6.1.1
  • rdflib-jsonld==0.6.2

See also: https://github.com/RDFLib/rdflib-jsonld

Fix release with YAML and vocabularies.yml

The current release 0.7.1 does not require pyaml and does not include vocabularies.yml so it throws an error if installed on a new system (at least this just happened to my colleague).

How to select number spans and table entries?

I'd like to filter DDC number spans and table entries which both differ from normal DDC classes. Internally we introduced the URIs http://dewey.info/type/NumberSpan and http://dewey.info/type/TableEntry in addition to skos:Concept. See gbv@8af8323 for an example.

Map 084 to concept scheme and namespace

084 gives classification scheme and edition:

  <mx:datafield tag="084" ind2=" " ind1="0">
    <mx:subfield code="a">ddc</mx:subfield>
    <mx:subfield code="c">23no</mx:subfield>
    <mx:subfield code="e">nob</mx:subfield>
  </mx:datafield>

We could perhaps have a config file that contains a map from the 084 values to namespace, scheme, etc.:

{
    "classification_schemes":
    {
        "ddc": {
            "23no": {
                "uri": "http://data.ub.uio.no/ddc/{class_no}",
                "scheme": "http://data.ub.uio.no/ddc/",
                "sameas": ["http://dewey.info/class/{class_no}/e23/"]
            }
        }
    }
}

while still allowing command line arguments to override the config file values.
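
A sketch of how such a lookup with command line overrides might work. The names CONFIG and resolve_scheme and the overrides argument are hypothetical. The dictionary mirrors the proposed config file, and a value of None in overrides stands for "option not given".

```python
CONFIG = {
    "classification_schemes": {
        "ddc": {
            "23no": {
                "uri": "http://data.ub.uio.no/ddc/{class_no}",
                "scheme": "http://data.ub.uio.no/ddc/",
                "sameas": ["http://dewey.info/class/{class_no}/e23/"],
            }
        }
    }
}


def resolve_scheme(config, scheme_code, edition, overrides=None):
    """Look up settings for 084 $a / $c and let command line values win."""
    schemes = config["classification_schemes"]
    # Copy so the caller's config is never modified
    settings = dict(schemes.get(scheme_code, {}).get(edition, {}))
    for key, value in (overrides or {}).items():
        if value is not None:
            settings[key] = value
    return settings


settings = resolve_scheme(CONFIG, "ddc", "23no", overrides={"scheme": None, "uri": None})
print(settings["uri"].format(class_no="001.3"))  # http://data.ub.uio.no/ddc/001.3
```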

Handling of links in 5XX fields

5XX: See Also From Tracings is currently mapped to skos:related or skos:broader depending on $w but the value of $w can also indicate skos:narrower and other kinds of (non)relations. In particular subfield $4 was added to directly specify the relation with a URI if $w has value r. Possible values of $w:

a - Earlier heading	h - Narrower term
b - Later heading	i - Reference instruction phrase in subfield $i
d - Acronym	        n - Not applicable
f - Musical composition	r - Relationship designation in $i or $4
g - Broader term	t - Immediate parent body

Invalid JSKOS output

examples/gnd-1020118989.xml results in

  • altLabel having a string instead of array
  • note having a string instead of array

Fix weird failing tests at travis-ci

I run exactly the same version as release 0.4.0 at travis-ci. This was passing 21 days ago:

https://travis-ci.org/scriptotek/mc2skos/builds/192751396

but now it fails:

https://travis-ci.org/gbv/mc2skos/builds/199175954

The error is caused by

tests/test_process_record.py::TestRecord::testSynthesizedNumberComponentsIncludingAddTable

which should be skipped but it isn't.

This does not seem to be the only weird fail at travis-ci. This build even fails to install although no crucial changes are included:

https://travis-ci.org/gbv/mc2skos/builds/199180795

Maybe this is caused by some caching or installation problems at travis-ci?

Use preferred relative index term for WebDewey

When using the --webdewey flag, we should support 750 $9 ess=isCaption, which tells us which relative index term is to be used as class heading substitute when presenting the class.

Example:

  <mx:datafield tag="750" ind2="7" ind1=" ">
    <mx:subfield code="a">Personlige datamaskiner</mx:subfield>
    <mx:subfield code="x">grafikkprogrammer</mx:subfield>
    <mx:subfield code="0">(OCoLC-D)1226b03f-c205-420e-ae21-34d41be81715</mx:subfield>
    <mx:subfield code="2">ddcri</mx:subfield>
    <mx:subfield code="9">ps=PE</mx:subfield>
    <mx:subfield code="9">ess=isCaption</mx:subfield>
  </mx:datafield>

Handling of 072

The NAL thesaurus uses 072, which does not seem to be mapped by the marcauth-2-madsrdf tool. In their own conversion, they convert these to concept schemes:

<skos:ConceptScheme rdf:about="http://lod.nal.usda.gov/nalt//P">
	<rdfs:label xml:lang="en">Natural Resources, Earth and Environmental Sciences</rdfs:label>
	<rdfs:label xml:lang="es">Tierra, Ambiente y Recursos Naturales</rdfs:label>
	<skos:hasTopConcept rdf:resource="http://lod.nal.usda.gov/nalt//1556"/>
         ...
</skos:ConceptScheme>

but I wonder if skos:Collection is more appropriate.

In the MARCXML, a member of the collection "P Natural Resources, Earth and Environmental Sciences" looks like this:

 <marc:record>
      <marc:leader>00664nz  a2200205n  4500</marc:leader>
      <marc:controlfield tag="003">DNAL</marc:controlfield>
      <marc:controlfield tag="005">20161208094706.0</marc:controlfield>
      <marc:controlfield tag="008">161208 neazdnnbabn           a ana      </marc:controlfield>
      <marc:datafield tag="016" ind1="7" ind2=" ">
         <marc:subfield code="a">nalt00276029</marc:subfield>
         <marc:subfield code="2">DNAL</marc:subfield>
      </marc:datafield>
      <marc:datafield tag="035" ind1=" " ind2=" ">
         <marc:subfield code="a">(DNAL) nalt00276029</marc:subfield>
      </marc:datafield>
      <marc:datafield tag="040" ind1=" " ind2=" ">
         <marc:subfield code="a">DNAL</marc:subfield>
         <marc:subfield code="c">DNAL</marc:subfield>
      </marc:datafield>
      <marc:datafield tag="072" ind1=" " ind2=" ">
         <marc:subfield code="a">P Natural Resources, Earth and Environmental Sciences</marc:subfield>
      </marc:datafield>
      <marc:datafield tag="150" ind1=" " ind2=" ">
         <marc:subfield code="a">necromass</marc:subfield>
      </marc:datafield>
      <marc:datafield tag="550" ind1=" " ind2=" ">
         <marc:subfield code="a">biological resources</marc:subfield>
         <marc:subfield code="w">g</marc:subfield>
      </marc:datafield>
      ...
      <marc:datafield tag="750" ind1=" " ind2="7">
         <marc:subfield code="a">necromasa</marc:subfield>
         <marc:subfield code="0">tesa00276029</marc:subfield>
         <marc:subfield code="2">TESA</marc:subfield>
      </marc:datafield>
   </marc:record>

and the collection itself:

 <marc:record>
      <marc:leader>01004nz  a2200325n  4500</marc:leader>
      <marc:controlfield tag="001">54870</marc:controlfield>
      <marc:controlfield tag="003">DNAL</marc:controlfield>
      <marc:controlfield tag="005">20161208094706.0</marc:controlfield>
      <marc:controlfield tag="008">161208 neazdnnbabn           a ana      </marc:controlfield>
      <marc:datafield tag="016" ind1="7" ind2=" ">
         <marc:subfield code="a">nalt00127305</marc:subfield>
         <marc:subfield code="2">DNAL</marc:subfield>
      </marc:datafield>
      <marc:datafield tag="035" ind1=" " ind2=" ">
         <marc:subfield code="a">(DNAL) nalt00127305</marc:subfield>
      </marc:datafield>
      <marc:datafield tag="040" ind1=" " ind2=" ">
         <marc:subfield code="a">DNAL</marc:subfield>
         <marc:subfield code="c">DNAL</marc:subfield>
      </marc:datafield>
      <marc:datafield tag="072" ind1=" " ind2=" ">
         <marc:subfield code="a">P Natural Resources, Earth and Environmental Sciences</marc:subfield>
      </marc:datafield>
      <marc:datafield tag="150" ind1=" " ind2=" ">
         <marc:subfield code="a">Natural Resources, Earth and Environmental Sciences</marc:subfield>
      </marc:datafield>
      <marc:datafield tag="550" ind1=" " ind2=" ">
         <marc:subfield code="a">atmospheric sciences</marc:subfield>
         <marc:subfield code="w">h</marc:subfield>
      </marc:datafield>
      ...
      <marc:datafield tag="750" ind1=" " ind2="7">
         <marc:subfield code="a">Tierra, Ambiente y Recursos Naturales</marc:subfield>
         <marc:subfield code="0">tesa00127305</marc:subfield>
         <marc:subfield code="2">TESA</marc:subfield>
      </marc:datafield>
   </marc:record>

But how do we identify the latter as a collection?

Support number spans with additional spaces

RVK notations have spaces around the - of number spans, e.g:

  <datafield tag="153" ind1=" " ind2=" ">
    <subfield code="a">MC 7700</subfield>
    <subfield code="c">MC 7773</subfield>
  </datafield>

should result in notation MC 7700 - MC 7773 instead of MC 7700-MC 7773. I added a test case at https://github.com/gbv/mc2skos/tree/rvk-span/examples but no fix so far. This is likely yet another configuration of one particular KOS.

P.S: I'm not sure whether URIs should also have _-_ instead of -, need to check back at UB Regensburg.
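
One way to make this configurable per KOS would be a separator setting for number spans. The sketch below is only an illustration of that idea: span_notation and its separator parameter are hypothetical and not existing mc2skos options.

```python
def span_notation(start, end=None, separator="-"):
    """Join 153 $a and $c into a span notation using a scheme-specific separator."""
    if not end:
        # Not a span: the notation is just $a
        return start
    return "{}{}{}".format(start, separator, end)


print(span_notation("MC 7700", "MC 7773", separator=" - "))  # MC 7700 - MC 7773
```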
