Giter Site home page Giter Site logo

osf-pigeon's Introduction

osf-pigeon

A utility for archiving OSF data to archive.org

Purpose

This is a mirco-service that takes an OSF registration and mirrors it's files and metadata at Internet Archive.

Use

This should be able to export registration data from the OSF to archive.org assuming the registration is fully public and the DOI has been minted by the start of the archive job and the registration isn't withdrawn.

Install/Run

Simply set your add your environment settings in local.py file and run with bash command:

    pip3 install -r requirements.txt
    python3 -m osf_python

That's it! Your OSF-Pigeon server should be up and running.

Running in development

The osf.io and archive.org use a constellation of services for both testing and live environments here are recommended settings for each environment:

Setting an OSF_BEARER_TOKEN for a registration is not necessary for permissions, but is recommended to avoid rate limiting. Credentials for the Datacite and Internet Archive should be otained via your institution.

Tests

Running tests are easy enough by just running the bash command:

 pip3 install -r dev.txt
 python3 -m pytest . 

Overview

When a registration is made public on the OSF the platform will begin to upload that registrations data and metadata to archive.org in order to save the registration for posterity. This involves uploading the registration's raw archived data to archive.org as well as supplementary JSON/XML metadata files describing that registration. To aid with searchability and we are also updating the metadata associated archive.org storage item to reflect the registration it corresponds to.

Metadata Syncing Details

When a registration is made public on the OSF the platform will start sending syncing requests to Pigeon to sync metadata with it's Internet Archive item.

There are two types of metadata being sent from OSF registrations, typical registration metadata which is set once on creation and editable metadata which changes continually as registrations are edited. Here is the list of attributes that are synced with IA and their implementation details:

  • Metadata set once on creation:

    • publisher
      • Should always be set to "Center for Open Science"
      • This is an IA recommended keyword.
    • creator
      • These are the biblographic contributors for a registration.
      • This is an IA recommended keyword.
    • date
      • The date the registration was registered.
      • This is an IA recommended keyword.
    • osf_registry
      • This is the title of the Registration Provider for each registration.
      • IA recommended we add the osf_ prefix to this to assert our brand.
    • osf_registration_schema
      • This is the title of the registration schema used.
      • IA recommended we add the osf_ prefix to this to assert our brand.
    • osf_registration_doi
      • This is the DOI of the registration that has been archived, this is not a DOI reffering to any other published article or document, that is the article_doi
      • IA recommended we add the osf_ prefix to this to assert our brand.
    • source
      • A url to the OSF registration
      • This is an IA recommended keyword.
    • parent
      • A link to any parent registrations linked to that item
    • children
      • A link to any child registrations/components linked to that item
  • Editable Metadata (synced continually)

    • title
      • This is the registration's title
    • description
      • This is the registration's description.
    • osf_category
      • This is the registration's category.
      • IA recommended we add the osf_ prefix to this to assert our brand.
    • osf_subjects
      • These are a list the titles of a registration's subjects, usually the scientific discipline that registration.
      • IA recommended we add the osf_ prefix to this to assert our brand.
    • osf_tags
      • These are a list of tags to aid in the searchability of the registration.
      • IA recommended we add the osf_ prefix to this to assert our brand.
    • article_doi
      • This is a user created DOI that is supplemental to the registration, not a DOI created for that archived registration, that is the osf_registration_doi
    • license
      • This is a url to the license for the registration if it has one.
    • affiliated_institutions
      • A list titles of the institutions affiliated with the registration.

JSON/XML Metadata Details

Each archived registration includes four json files and one xml file with metadata pertaining to the archived registration, these files are:

  • registration.json
    • General metadata for the registration, title, description and links to all public relationships.
  • wiki.json
    • The text and metadata details associated with that registration's wiki.
  • contributors.json
    • The list of contributor to the registration including extra information about their ORCID identifiers and affiliated institutions.
  • logs.json
    • A list of all that registrations logs.
  • datacite.xml
    • This contains the datacite's metadata for the DOI corresponding to that registration,

osf-pigeon's People

Contributors

johnetordoff avatar corbinsanders avatar brianjgeiger avatar jwalz avatar rastogiujj12 avatar

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.