ArchivesSpace Sitemap Generation for the PUI

Getting started

Download and unpack the latest release of the plugin into your ArchivesSpace plugins directory:

    $ curl ...
    $ cd /path/to/archivesspace/plugins
    $ unzip ...

Add the plugin name to the list of enabled plugins in config/config.rb:

    AppConfig[:plugins] = ['some_plugin','aspace_sitemap']

Note

If you are running an ArchivesSpace version older than v2.6.0, note that slug-related options are not available.

Institutions with large numbers of published objects may need to increase the memory allotted to the application. For guidance, see http://archivesspace.github.io/archivesspace/user/tuning-archivesspace/

The sitemap job relies on the Solr index to check for unpublished ancestors, so run it only after the indexer has completed its first full indexing round.

What does it do?

The plugin adds a new background job that generates a sitemap for the PUI (at least one sitemap file plus a sitemap index). The files can either be downloaded and placed on a server of your choice for submission to search engines, or saved to the local filesystem and served at {pui_host}/sitemap-index.xml. There are two configuration options.
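The sitemap index is an XML file that lists the URL of each individual sitemap file. As a rough illustration of that format (as defined by the sitemaps.org protocol), here is a minimal Ruby sketch that builds such an index. It is not taken from the plugin; the method names and the example URLs are hypothetical.

```ruby
# Hypothetical illustration of the sitemaps.org sitemap index format;
# not the plugin's implementation.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

XML_ESCAPES = {
  "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;", "'" => "&apos;"
}.freeze

# Escape the characters the sitemaps protocol requires to be entity-escaped.
def xml_escape(text)
  text.to_s.gsub(/[&<>"']/, XML_ESCAPES)
end

# Build a sitemap index with one <sitemap><loc> entry per sitemap file URL.
def build_sitemap_index(sitemap_urls)
  entries = sitemap_urls.map { |url| "  <sitemap><loc>#{xml_escape(url)}</loc></sitemap>" }
  [
    '<?xml version="1.0" encoding="UTF-8"?>',
    %(<sitemapindex xmlns="#{SITEMAP_NS}">),
    *entries,
    "</sitemapindex>"
  ].join("\n") + "\n"
end

# Usage with made-up file names; the plugin's own file naming may differ.
puts build_sitemap_index([
  "https://pui.example.edu/sitemap_1.xml",
  "https://pui.example.edu/sitemap_2.xml"
])
```

Search engines fetch the index first and then each listed sitemap, which is why only the index URL needs to be submitted.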

Configuration

Configure the plugin by adding the following entries to your config.rb file, modified as appropriate. You need to set them if you are submitting the sitemap through the tools provided by Google or Bing.

  1. Google requires verification that you own the site. One way to verify ownership is with a verification meta tag:

    # set the meta tag from Google to verify site ownership
    AppConfig[:google_verification_meta_tag] = "your_verification_meta_tag"

  2. Bing also requires verification that you own the site, which can likewise be done with a verification meta tag:

    # set the meta tag from Bing to verify site ownership
    AppConfig[:bing_verification_meta_tag] = "your_verification_meta_tag"

How to Use

For users with access to Background Jobs, there is a new entry in the Create Jobs menu called "ArchivesSpace PUI Sitemap". Once selected, the job asks for several inputs:

  1. What types of objects to include in the sitemap. At least one is required.
  2. The update frequency. For most institutions, yearly is probably fine.
  3. Use human-readable slugs. If available, slugs (whether entered by a user or generated by the application) are used in the <loc> field. Requires v2.6.0 or later.
  4. Write to local filesystem. Sitemaps are written to a static location and to the root of the PUI webspace: the generated files are stored in AppConfig[:data_directory]/pui_sitemaps and served from the root of the site, i.e. {pui_host}/sitemap-index.xml. The job also adds the sitemap entry to the PUI's robots.txt file. On startup, any existing sitemaps are copied to the PUI webroot and, if sitemap files exist, robots.txt is updated. Uncheck this option and fill in the sitemap index base URL (below) if you want to host the sitemaps on an external server.
  5. The sitemap index base URL. This is the location where you will host the sitemaps. It is ignored if "Write to local filesystem" (above) is selected.
  6. The limit on the number of entries per sitemap file. You should be able to leave this at the default of 50000.
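To make the per-file limit and update frequency concrete, the following Ruby sketch splits a list of record URLs into sitemap files of at most limit entries each, writes a changefreq value but no priority (see Notes), and appends a Sitemap line to robots.txt content, roughly mirroring what the job options above describe. It is an illustration under assumptions, not the plugin's code: the method names and example URLs are hypothetical, and the 50,000 cap reflects the sitemaps.org limit per sitemap file.

```ruby
# Hypothetical sketch; method names and URLs are illustrative only.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# changefreq values allowed by the sitemaps.org protocol.
VALID_CHANGEFREQ = %w[always hourly daily weekly monthly yearly never].freeze
# The sitemaps.org protocol caps a single sitemap file at 50,000 URLs.
MAX_URLS_PER_FILE = 50_000

XML_ESCAPES = {
  "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;", "'" => "&apos;"
}.freeze

def xml_escape(text)
  text.to_s.gsub(/[&<>"']/, XML_ESCAPES)
end

# Split URLs into <urlset> documents of at most `limit` entries each.
# No <priority> element is written.
def build_sitemaps(urls, changefreq: "yearly", limit: MAX_URLS_PER_FILE)
  unless VALID_CHANGEFREQ.include?(changefreq)
    raise ArgumentError, "unknown changefreq: #{changefreq}"
  end
  unless (1..MAX_URLS_PER_FILE).cover?(limit)
    raise ArgumentError, "limit must be between 1 and #{MAX_URLS_PER_FILE}"
  end

  urls.each_slice(limit).map do |slice|
    entries = slice.map do |url|
      "  <url><loc>#{xml_escape(url)}</loc><changefreq>#{changefreq}</changefreq></url>"
    end
    [
      '<?xml version="1.0" encoding="UTF-8"?>',
      %(<urlset xmlns="#{SITEMAP_NS}">),
      *entries,
      "</urlset>"
    ].join("\n") + "\n"
  end
end

# Return robots.txt content with a Sitemap directive appended, unless present.
def add_sitemap_to_robots(robots_txt, sitemap_index_url)
  directive = "Sitemap: #{sitemap_index_url}"
  return robots_txt if robots_txt.lines.map(&:strip).include?(directive)

  needs_newline = !robots_txt.empty? && !robots_txt.end_with?("\n")
  (needs_newline ? robots_txt + "\n" : robots_txt) + directive + "\n"
end

# Usage: 5 made-up record URLs with a limit of 2 produce 3 sitemap files.
urls = (1..5).map { |i| "https://pui.example.edu/repositories/2/resources/#{i}" }
sitemaps = build_sitemaps(urls, limit: 2)
puts sitemaps.size # prints 3
puts add_sitemap_to_robots("User-agent: *\n", "https://pui.example.edu/sitemap-index.xml")
```

Making the robots.txt update idempotent matters if it runs on every startup, since repeated runs should not add duplicate Sitemap lines.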

Notes

  1. The 'priority' attribute is not used in the sitemap, since the staff interface has no mechanism for assigning a priority to objects. Given the large number of objects that are typically published, it seems unlikely that 'priority' would be widely used. Google has also indicated that its algorithm does not use the priority attribute.
  2. The option to use slugs (human-readable URLs) is somewhat risky, since slugs are based on metadata that can change; if a slug changes, the URL listed in a previously submitted sitemap may become stale.

Joshua Shaw
Digital Library Technologies Group
Dartmouth College Library
