johnjung / digital_collection_validators

Validate file naming conventions and content, and apply automated fixes, for digital collections projects.

License: GNU General Public License v3.0

Python 100.00%

digital_collection_validators's Issues

Make an MvolOwnCloudSSH class.

Make an MvolOwnCloudSSH class, which extends OwnCloudSSH in a way that's analogous to ApfOwnCloudSSH. Move the validate() method out of OwnCloudSSH into this new class, along with any other methods that don't make sense on their own in OwnCloudSSH. This will require changes in other files: be sure to check the 'mvol' command and test.py to be sure they still work correctly after making this change.
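
The split described above might look roughly like the following. This is only a sketch: the constructor argument, the `validate()` signature, and the empty method bodies are assumptions made for illustration, not the repository's actual code.

```python
class OwnCloudSSH:
    """Shared base class: only helpers that make sense for every project."""

    def __init__(self, client=None):
        # `client` stands in for whatever SSH/SFTP handle classes.py holds (assumed).
        self.client = client


class ApfOwnCloudSSH(OwnCloudSSH):
    """Photo archive (apf) specifics."""

    def validate(self, identifier):
        # Hypothetical signature; returns a list of error strings.
        return []


class MvolOwnCloudSSH(OwnCloudSSH):
    """Campus Publications (mvol) specifics, including validate()."""

    def validate(self, identifier):
        # The logic currently in OwnCloudSSH.validate() would move here.
        return []
```

With this layout the base class no longer exposes `validate()`, so any caller that still uses it on `OwnCloudSSH` fails immediately, which makes stale references in the 'mvol' command and test.py easier to spot.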

Write a validator for files from the photo archive

The University of Chicago Photographic Archive website is here. Master files (TIFFs) for the project live in Owncloud, in IIIF_Files/apf. There are stub functions in apf, classes.py, and tests.py.

Write a validator for the files in this directory. In the "1" directory, all filenames should begin with "apf1". In the "2" directory, all filenames should begin with "apf2", etc. We should be able to validate the names of files using a regular expression, so that every filename begins with "apf" and then includes a number, a dash, a sequence of exactly five digits, and ends with the lowercase "tif" extension.
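
A minimal sketch of the filename rule, assuming the extension is written as ".tif" and that the number after "apf" must equal the name of the containing directory. The function name and the error-message wording are hypothetical.

```python
import re

# "apf", a number, a dash, exactly five digits, lowercase ".tif".
APF_FILENAME = re.compile(r"apf([0-9]+)-([0-9]{5})\.tif")


def validate_apf_filename(directory, filename):
    """Return a list of error strings for one file in IIIF_Files/apf/<directory>."""
    errors = []
    match = APF_FILENAME.fullmatch(filename)
    if match is None:
        errors.append(f"{filename}: name does not look like apfN-NNNNN.tif")
    elif match.group(1) != directory:
        errors.append(f"{filename}: expected a name beginning with apf{directory}")
    return errors
```

For example, `validate_apf_filename("2", "apf1-00001.tif")` reports a prefix mismatch, while `validate_apf_filename("1", "apf1-00001.tif")` returns an empty list. `fullmatch()` is used so that trailing characters or an uppercase extension are rejected.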

Research a python module (or code something yourself) to validate the TIFF files themselves.
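
If coding something yourself, a first check could be the 8-byte TIFF header: a byte-order mark ("II" or "MM"), the magic number 42, and the offset of the first image file directory. The sketch below uses only the standard library and does not decode image data, so it is a starting point rather than a full validator. The function names are hypothetical, and BigTIFF files (magic number 43) are reported as errors here.

```python
import struct


def check_tiff_header(header):
    """Return a list of error strings for the first 8 bytes of a file."""
    if len(header) < 8:
        return ["shorter than the 8-byte TIFF header"]
    if header[:2] == b"II":
        order = "<"  # little-endian
    elif header[:2] == b"MM":
        order = ">"  # big-endian
    else:
        return ["missing II or MM byte-order mark"]

    errors = []
    magic, ifd_offset = struct.unpack(order + "HI", header[2:8])
    if magic != 42:
        errors.append(f"unexpected magic number {magic}")
    if ifd_offset < 8:
        errors.append(f"first IFD offset {ifd_offset} points inside the header")
    return errors


def check_tiff_file(path):
    """Read the header from disk and check it."""
    with open(path, "rb") as f:
        return check_tiff_header(f.read(8))
```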

Get the username and password for the Owncloud web interface from John. There is an existing validator for files from the Campus Publications project: please model this validator on the Campus Publications validation code, and set up an "apf" command line tool with similar functionality to the mvol command line tool. Be sure to add documentation to each function, and add tests.

Some things we use: Python virtual environments, docopt, paramiko.

Set up a development environment on Windows

This code requires access to several library servers via public key authentication. Set up a working environment on Windows that allows us to authenticate to these remote servers for development. The method that connects to remote servers using Paramiko is in classes.py. On a Mac, I was able to connect by omitting the **paramiko_kwargs parameter from the ssh.connect() call.

Investigate other error messages, and find a good way to set up a development environment for this repo on Windows. This may involve using Cygwin or VirtualBox.

Code cleanup

  • Be sure all tests run.
  • classes.py currently includes a reference to the SSH class, which has been renamed. Fix that issue, and check that old classes aren't being referenced elsewhere as well. Evaluate whether it makes sense to add new tests to catch cases like this; if it does, please add those tests.
  • Add docstrings for all methods.
  • Use pycodestyle to be sure all the code in the project is formatted in a regular way.

Move destructive mvol options into a new command.

Several options on the mvol command are destructive, meaning they make changes to the files in OwnCloud:

  • mvol put_dc_xml
  • mvol regularize_mets
  • mvol regularize_pdf
  • mvol regularize_struct
  • mvol regularize_txt
  • mvol rename_altos
  • mvol rename_jpegs
  • mvol rename_tiffs

In order to make it clear to the user that these options change files on disk, move them into a new command called "mvol_fix".

This command should use docopt for argument parsing, just like mvol, and it should include the destructive options listed above. Those options should be removed from the mvol command.
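
A rough sketch of what the new command's docopt usage string could look like. The `<identifier>` argument and the helper names are assumptions; the real mvol command may take different arguments. docopt is imported inside `main()` so the usage text and helper can be checked without the third-party package installed.

```python
# Usage string for the hypothetical mvol_fix command, parsed by docopt.
USAGE = """Usage:
    mvol_fix put_dc_xml <identifier>
    mvol_fix regularize_mets <identifier>
    mvol_fix regularize_pdf <identifier>
    mvol_fix regularize_struct <identifier>
    mvol_fix regularize_txt <identifier>
    mvol_fix rename_altos <identifier>
    mvol_fix rename_jpegs <identifier>
    mvol_fix rename_tiffs <identifier>
    mvol_fix (-h | --help)
"""

COMMANDS = (
    "put_dc_xml", "regularize_mets", "regularize_pdf", "regularize_struct",
    "regularize_txt", "rename_altos", "rename_jpegs", "rename_tiffs",
)


def selected_command(arguments):
    """Return the name of the subcommand docopt marked as True, if any."""
    for name in COMMANDS:
        if arguments.get(name):
            return name
    return None


def main(argv=None):
    from docopt import docopt  # third-party; already used by mvol

    arguments = docopt(USAGE, argv=argv)
    command = selected_command(arguments)
    # Dispatch to the matching (destructive) method here; omitted in this sketch.
    print(f"would run {command} on {arguments['<identifier>']}")
```

The script would call `main()` from its entry point. Keeping the command names in one tuple makes it easy to confirm that every destructive option removed from mvol has a counterpart in mvol_fix.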

Modify "mvol validate" and "mvol ls" so they can work with files on a local drive in addition to files on OwnCloud.

Vendors ship files for digital collections projects to the library on hard drives. Before the preservation department uploads these files to Owncloud, they copy them from a hard drive to a local disk to perform quality control.

Modify the validate() and recursive_ls() methods of MvolOwnCloudSSH so that they will work if files are on a local drive. Be sure they still work with files in OwnCloud as well. You will want to add command line options to the mvol command so that it can be pointed at files on a local disk, and you'll need to extend the code in classes so that it can either work over SSH or locally.

You'll want to look at Python's os module, specifically os.listdir(), which works similarly to Paramiko's listdir() method. This will add some complexity to the code, so try to find a way to organize it clearly. When you're done we'll review the code and see whether there are other ways to organize things.
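
One possible way to organize this, offered as a sketch rather than the intended design: put the filesystem calls behind a small backend object, so the listing logic is written once. The class names, the `join()` helper, and this standalone `recursive_ls()` signature are hypothetical. `SFTPBackend` expects an object with Paramiko-style `listdir()` and `stat()` methods, such as an SFTP client.

```python
import os
import posixpath
import stat


class LocalBackend:
    """Files on a local drive."""

    def listdir(self, path):
        return os.listdir(path)

    def isdir(self, path):
        return os.path.isdir(path)

    def join(self, directory, name):
        return os.path.join(directory, name)


class SFTPBackend:
    """Files reached over SSH through a Paramiko-style SFTP client."""

    def __init__(self, sftp):
        self.sftp = sftp

    def listdir(self, path):
        return self.sftp.listdir(path)

    def isdir(self, path):
        return stat.S_ISDIR(self.sftp.stat(path).st_mode)

    def join(self, directory, name):
        # Remote paths always use forward slashes.
        return posixpath.join(directory, name)


def recursive_ls(backend, path):
    """Yield every file path under `path`, in sorted order per directory."""
    for name in sorted(backend.listdir(path)):
        full = backend.join(path, name)
        if backend.isdir(full):
            yield from recursive_ls(backend, full)
        else:
            yield full
```

A command line option on mvol could then choose between `LocalBackend()` and `SFTPBackend(...)`, and `validate()` could be written against the same three methods.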
