dspace-csv-import's Introduction

DSpace CSV import tools

DSpace allows for items to be imported in bulk using a format called DSpace Simple Archive Format. The format boils down to a directory structure and a metadata XML format.

Unfortunately, the DSpace Simple Archive Format is not well suited to end users, mainly because they are often not comfortable creating or editing XML documents.

At the University of Guelph we have developed a small script that lets the metadata be supplied as CSV instead. CSV files can be created or edited in Excel, a tool with which many users are comfortable.

Generate a sample CSV file

A sample, empty CSV file can be printed to STDOUT with

./dspace_csv.py -t

To save this to a file just use a standard redirect

./dspace_csv.py -t > sample.csv

This can be useful for sending to a user and requesting that they fill it in with default metadata that will apply to the whole import. dc.language.iso would be an excellent candidate to fill in at this time.

Anything that is set at this stage can be overridden on a per-item basis, so it is still worth doing even if you have something like 500 English items and 12 French items.

Generate an empty archive structure

A sample archive structure can be generated with something like

./dspace_csv.py -a [directory-name]

but you will probably want to include the -n and -s options too:

./dspace_csv.py -a [directory-name] -n [number-of-items] -s [sample-csv-file-from-above]

The -n option specifies the number of item_ subdirectories to create. The current maximum for this process is 1000 items per import, but this might increase in the future.

The -s option provides the name of a CSV file to be used as a starting point for each item. Typically you would want to use a file generated using the -t option as shown above that has been modified by the end user.

By automating the directory creation and copying of the initial CSV file we can save a lot of time. There will also be a file called README.html in the directory root containing some basic instructions for the end user.
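As a rough illustration of what the -a, -n and -s options automate, here is a minimal Python sketch. It is not the script's actual code: the function name make_archive_skeleton, the zero-padded item_0001 naming, and the choice of dublin_core.csv as the copied file's name are all assumptions, and the README.html file is not generated here.

```python
import shutil
from pathlib import Path

MAX_ITEMS = 1000  # per-import limit quoted in the text above


def make_archive_skeleton(root, count, sample_csv=None):
    """Create `count` item_ subdirectories under `root`.

    If `sample_csv` is given, copy it into every item directory as
    dublin_core.csv (assumed file name) so each item starts from the
    same default metadata.
    """
    if not 1 <= count <= MAX_ITEMS:
        raise ValueError(f"count must be between 1 and {MAX_ITEMS}")

    root = Path(root)
    created = []
    for index in range(1, count + 1):
        # Zero-padded names are an assumption; they keep listings sorted.
        item_dir = root / f"item_{index:04d}"
        item_dir.mkdir(parents=True, exist_ok=False)
        if sample_csv is not None:
            shutil.copyfile(sample_csv, item_dir / "dublin_core.csv")
        created.append(item_dir)
    return created
```

Calling make_archive_skeleton("thesis-batch", 25, "sample.csv") would create 25 item directories, each holding its own copy of the defaults for the end user to edit.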

Zip up the empty archive

Create a zip file containing the archive with something like

zip -r archive.zip directory-name

and send it to the end user. Direct them to unzip the file and refer to the README.html file.
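If the zip utility is not available (for example on a Windows workstation), the same archive can be produced from Python's standard library. This is only a sketch; the helper name zip_archive and the default archive name are assumptions.

```python
import shutil
from pathlib import Path


def zip_archive(directory, zip_stem="archive"):
    """Zip `directory` into <zip_stem>.zip next to it.

    The directory itself is kept as the single top-level folder inside
    the zip, mirroring `zip -r archive.zip directory-name`.
    Returns the path of the zip file that was written.
    """
    directory = Path(directory).resolve()
    return shutil.make_archive(
        str(directory.parent / zip_stem),  # make_archive appends ".zip"
        "zip",
        root_dir=str(directory.parent),
        base_dir=directory.name,
    )
```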

Clean up the returned archive

Once the end user has updated the metadata files and copied in the item files, they should zip the directory back up and send it to you.

First, unzip the directory with

unzip archive.zip

It is probably a good idea to eyeball the CSV data on some items at this point. Once you are satisfied that the data is sound, run the cleanup script on the directory.

./dspace_csv.py -c [directory-name]

This does two things:

It parses the CSV metadata files and generates matching XML metadata files that DSpace can understand. The existing CSV files are not modified or deleted.

It generates contents files inside each item subdirectory listing the files that should be imported, as required by DSpace. It assumes that any files inside the item directory that aren't named dublin_core.csv, dublin_core.xml, metadata_[anything].csv, metadata_[anything].xml or contents should be included.
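The file-selection rule for the contents file can be sketched in Python as follows. This is an illustration of the rule as described here, not the script's real implementation; the names EXCLUDED_PATTERNS, is_payload and write_contents are hypothetical, and the alphabetical ordering is an assumption. An existing contents file is left untouched.

```python
import fnmatch
from pathlib import Path

# File names that are metadata or bookkeeping rather than item files.
EXCLUDED_PATTERNS = (
    "dublin_core.csv",
    "dublin_core.xml",
    "metadata_*.csv",
    "metadata_*.xml",
    "contents",
)


def is_payload(name):
    """Return True if `name` should be listed in the contents file."""
    return not any(fnmatch.fnmatchcase(name, pat) for pat in EXCLUDED_PATTERNS)


def write_contents(item_dir):
    """Write one file name per line to <item_dir>/contents.

    Returns the list of names written, or None if a contents file
    already existed and was therefore left alone.
    """
    item_dir = Path(item_dir)
    contents = item_dir / "contents"
    if contents.exists():
        return None
    names = sorted(
        p.name for p in item_dir.iterdir() if p.is_file() and is_payload(p.name)
    )
    contents.write_text("".join(name + "\n" for name in names), encoding="utf-8")
    return names
```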

If you want more control over the XML metadata or the contents file itself you can create them manually. It is safe to run the clean script on a directory containing these files: if they exist they will not be overwritten.
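To make the CSV-to-XML step and the no-overwrite behaviour concrete, here is a hedged Python sketch. The CSV layout is an assumption made for illustration (one field,value pair per row, such as dc.language.iso,en); the files produced by the -t option may be laid out differently. The function name csv_to_dublin_core is hypothetical. The output uses the dcvalue elements of the Simple Archive Format's dublin_core.xml, and rows for schemas other than dc are simply skipped in this sketch.

```python
import csv
import xml.etree.ElementTree as ET
from pathlib import Path


def csv_to_dublin_core(csv_path, xml_path):
    """Convert a field,value CSV into dublin_core.xml.

    Returns False without touching anything if `xml_path` already
    exists, so hand-written XML is never overwritten.
    """
    xml_path = Path(xml_path)
    if xml_path.exists():
        return False

    root = ET.Element("dublin_core")
    # utf-8-sig strips the byte-order mark that Excel may add.
    with open(csv_path, newline="", encoding="utf-8-sig") as handle:
        for row in csv.reader(handle):
            if len(row) < 2 or not row[1].strip():
                continue  # blank row or field left empty
            parts = row[0].strip().split(".")
            if len(parts) < 2 or parts[0] != "dc":
                continue  # other schemas are out of scope here
            qualifier = parts[2] if len(parts) > 2 else "none"
            dcvalue = ET.SubElement(
                root, "dcvalue", element=parts[1], qualifier=qualifier
            )
            dcvalue.text = row[1].strip()

    ET.ElementTree(root).write(str(xml_path), encoding="utf-8", xml_declaration=True)
    return True
```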

Import into DSpace

Once the clean script has run successfully, the archive directory should be almost ready for DSpace to import.

If it is not there already, place the directory somewhere the dspace user can both read and write. I recommend putting the directory into /home/dspace/imported-data/ and leaving it there so that the mapfile can be found easily if it is needed later, e.g. to remove or modify imported data. One way to do this is

sudo cp -r [directory-name] /home/dspace/imported-data/
sudo chown -R dspace:dspace /home/dspace/imported-data/[directory-name]

Now we are ready to use the import command that comes with DSpace. Be sure to run this command as the dspace user. Something like

[dspace]/bin/import --add --eperson=[importer's email address] --collection=[collection handle] --source=[directory-name] --mapfile=[directory-name]/mapfile

will add the items in the directory to the requested collection. Please refer to the DSpace documentation for more information about the DSpace Simple Archive Format or the import/export commands.
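If you run imports regularly, it can help to assemble the command in one place so that the mapfile always lands inside the source directory. The Python sketch below only builds the argument list, using the flags shown above; the function name build_import_command is hypothetical, and nothing here is part of DSpace itself.

```python
def build_import_command(dspace_dir, eperson, collection, source_dir):
    """Return the import command as an argument list.

    The mapfile path is derived from the source directory so the two
    are always kept together.
    """
    source = str(source_dir).rstrip("/")
    return [
        f"{dspace_dir}/bin/import",
        "--add",
        f"--eperson={eperson}",
        f"--collection={collection}",
        f"--source={source}",
        f"--mapfile={source}/mapfile",
    ]
```

The resulting list can be passed to subprocess.run(..., check=True) from a session running as the dspace user.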

The --mapfile argument is particularly important, and the file that gets generated should be kept along with the rest of the source directory. This file is required for deleting or modifying the imported files using the command-line tools.
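To check which handle each item received, the mapfile can be read back with a few lines of Python. This sketch assumes that each line of the mapfile holds an item directory name followed by whitespace and the assigned handle; verify that against a mapfile from your own DSpace version before relying on it. The function name read_mapfile is hypothetical.

```python
def read_mapfile(path):
    """Return {item directory name: handle} from a DSpace mapfile.

    Assumes one "directory handle" pair per line, separated by
    whitespace; lines that do not fit that shape are ignored.
    """
    mapping = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            parts = line.split()
            if len(parts) >= 2:
                mapping[parts[0]] = parts[1]
    return mapping
```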
