djarchive-client

This is 'djarchive-client', a library for interfacing with DataJoint Neuro's data publication service.

Status: WIP; currently for internal/prototyping use.

Installation & Setup

To install and configure the package:

  1. Install the package:

    $ pip install .

  2. If applicable, configure datajoint with appropriate dj.config['custom'] values:

    Admin usage expects dj.config['custom'] values for:

    • djarchive.access_key
    • djarchive.secret_key

    Client and admin usage allow overriding dj.config['custom'] defaults for:

    • djarchive.bucket
    • djarchive.endpoint

    This should only be needed for development or custom server configurations.
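As a rough sketch of the configuration described above, the snippet below assembles the dj.config['custom'] entries as a plain dictionary. The helper name `djarchive_custom_config` and all example values are hypothetical; only the four key names come from the list above.

```python
def djarchive_custom_config(access_key=None, secret_key=None,
                            bucket=None, endpoint=None):
    """Build the 'custom' settings dict, skipping values left unset."""
    candidates = {
        'djarchive.access_key': access_key,   # admin usage only
        'djarchive.secret_key': secret_key,   # admin usage only
        'djarchive.bucket': bucket,           # optional override
        'djarchive.endpoint': endpoint,       # optional override
    }
    return {k: v for k, v in candidates.items() if v is not None}


# Placeholder values (hypothetical); substitute real credentials.
custom = djarchive_custom_config(access_key='EXAMPLE-ACCESS-KEY',
                                 secret_key='EXAMPLE-SECRET-KEY')

# With datajoint installed, the result could then be applied with:
#   import datajoint as dj
#   dj.config['custom'] = custom   # replaces any existing 'custom' section
```

Leaving bucket and endpoint unset keeps the built-in defaults, which matches the note that overrides are only needed for development or custom servers.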

Usage via 'djarchive' utility script

The 'djarchive' utility script can be used for CLI interactions with the data archive. Usage is described in the following subsections.

Browsing & Retrieving Datasets

The following example shows a minimal usage synopsis to navigate the available datasets and to download a single dataset.

$ djarchive datasets # list datasets
some-set
$ djarchive revisions # list all datasets & revisions
some-set,000
other-set,001
$ djarchive revisions some-set # list revisions for dataset 'some-set'
some-set,000
$ djarchive download some-set 000 ./some-set-000 # download some-set rev. 000

File Integrity, Download Caching & Error Handling

The djarchive client provides integrity protection in the form of a file manifest, called 'djarchive-manifest.csv', which is a CSV list of file sizes, sha1 hashes, and file subpaths for a given dataset.
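To illustrate what such a manifest records, here is a minimal sketch that computes size, sha1, and subpath for each file under a directory and writes them as CSV. The function names, the column order, and the absence of a header row are assumptions made for illustration; the actual manifest layout is defined by the djarchive_client code.

```python
import csv
import hashlib
import io
import os
import tempfile


def sha1_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha1()
    with open(path, 'rb') as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def manifest_rows(root):
    """Yield (size, sha1, subpath) for each file under root, in a stable order."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # makes the traversal order deterministic
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            subpath = os.path.relpath(full, root).replace(os.sep, '/')
            yield os.path.getsize(full), sha1_of_file(full), subpath


# Small demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, 'a.txt'), 'wb') as fh:
        fh.write(b'hello')
    rows = list(manifest_rows(root))

# Serialize the rows as CSV text (assumed column order: size, sha1, subpath).
buf = io.StringIO()
csv.writer(buf, lineterminator='\n').writerows(rows)
manifest_text = buf.getvalue()
```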

By default, files having path components beginning with '.' are not considered for inclusion within archives. This behavior can be overridden by setting the dj.config.filename_filter property to a regular expression containing the desired filtering pattern, or the special string "''", which is treated as matching the empty string ('^$') and so should never match a file.
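The filtering rule can be sketched as a small predicate. This is an illustration only: `DEFAULT_FILTER` and `is_excluded` are hypothetical names, the default pattern is a guess at a regular expression with the described effect, and the sketch assumes that a path matching the pattern is excluded.

```python
import re

# Assumed default: exclude any path with a component starting with '.'.
DEFAULT_FILTER = r'(^|/)\.'


def is_excluded(subpath, pattern=DEFAULT_FILTER):
    """Return True if subpath should be left out of the archive."""
    if pattern == "''":
        pattern = '^$'   # special string: can only match the empty string
    return re.search(pattern, subpath) is not None


hidden = is_excluded('.git/config')            # leading dot component
nested = is_excluded('data/.cache/blob')       # dot component deeper in the path
plain = is_excluded('data/v1.0/file.txt')      # dots inside names are fine
disabled = is_excluded('.git/config', "''")    # filtering switched off
```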

Retrieval of a dataset begins with the download of this manifest, which is then used to direct the remaining downloads and to verify that they complete successfully.

To allow for resuming of partially downloaded datasets, local files in the target directory of the download are compared against the manifest. If a local file matches the manifest contents, it is not re-retrieved, and if a local file does not match the manifest contents, it is re-retrieved without confirmation.
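The resume check described above can be sketched as follows. The function names and the in-memory manifest representation (a list of size, sha1, subpath tuples) are assumptions; the sketch only decides which files would need to be fetched and performs no downloads.

```python
import hashlib
import os
import tempfile


def file_matches(path, size, sha1):
    """True if a local file exists with the expected size and sha1."""
    if not os.path.isfile(path) or os.path.getsize(path) != size:
        return False
    digest = hashlib.sha1()
    with open(path, 'rb') as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b''):
            digest.update(chunk)
    return digest.hexdigest() == sha1


def files_to_fetch(manifest, target_dir):
    """Return subpaths that are missing locally or differ from the manifest."""
    return [subpath for size, sha1, subpath in manifest
            if not file_matches(os.path.join(target_dir, subpath), size, sha1)]


# Demonstration: one intact file, one corrupted file, one missing file.
manifest = [
    (5, hashlib.sha1(b'hello').hexdigest(), 'ok.txt'),
    (5, hashlib.sha1(b'world').hexdigest(), 'bad.txt'),
    (3, hashlib.sha1(b'new').hexdigest(), 'missing.txt'),
]
with tempfile.TemporaryDirectory() as target:
    with open(os.path.join(target, 'ok.txt'), 'wb') as fh:
        fh.write(b'hello')
    with open(os.path.join(target, 'bad.txt'), 'wb') as fh:
        fh.write(b'w0rld')   # same size, different content
    pending = files_to_fetch(manifest, target)
```

Checking the size before hashing avoids reading files that are obviously incomplete, which is the common case after an interrupted download.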

Uploading Datasets

Datasets can be uploaded with the djarchive client's 'upload' command. In short:

$ djarchive upload my-dataset 1 /path/to/my-dataset

If a djarchive manifest is not present in /path/to/my-dataset, it will be generated as part of the upload process, with the manifest uploaded as the last step to indicate that the dataset is complete.

If a djarchive manifest is present in /path/to/my-dataset, files will be checked against the manifest as part of the upload process, and an error will be signalled if files exist in the dataset folder which are not tracked in the manifest, or if files do not match the expected manifest values.
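A minimal sketch of this pre-upload check, under stated assumptions: `check_against_manifest` is a hypothetical name, the manifest is given as a list of (size, sha1, subpath) tuples, the manifest file is assumed not to list itself, and the filename filter is not applied. The real client may signal errors differently.

```python
import hashlib
import os
import tempfile


def check_against_manifest(root, manifest,
                           manifest_name='djarchive-manifest.csv'):
    """Raise ValueError if root has untracked or mismatching files."""
    expected = {subpath: (size, sha1) for size, sha1, subpath in manifest}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            subpath = os.path.relpath(full, root).replace(os.sep, '/')
            if subpath == manifest_name:
                continue  # assumption: the manifest does not list itself
            if subpath not in expected:
                raise ValueError(f'untracked file: {subpath}')
            size, sha1 = expected[subpath]
            with open(full, 'rb') as fh:
                data = fh.read()  # fine for a sketch; hash in chunks for big files
            if len(data) != size or hashlib.sha1(data).hexdigest() != sha1:
                raise ValueError(f'manifest mismatch: {subpath}')


with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, 'a.txt'), 'wb') as fh:
        fh.write(b'hello')
    manifest = [(5, hashlib.sha1(b'hello').hexdigest(), 'a.txt')]
    check_against_manifest(root, manifest)   # passes silently

    # Adding a file that the manifest does not track triggers an error.
    with open(os.path.join(root, 'extra.txt'), 'wb') as fh:
        fh.write(b'x')
    try:
        check_against_manifest(root, manifest)
        error = None
    except ValueError as exc:
        error = str(exc)
```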

The djarchive manifest file can also be generated as a separate step via the 'manifest' command:

$ djarchive manifest /path/to/my-dataset

Usage via 'djarchive_client' Python package

The djarchive_client Python package contains the logic for interacting with the archive. The best reference is the code and its docstrings; a minimal usage example follows:

>>> from djarchive_client import client
>>> c = client()
>>> c.datasets()
<generator object DJArchiveClient.datasets at 0x7faf65ec4ba0>
>>> list(c.datasets())
['some-set']
>>> c.revisions()
<generator object DJArchiveClient.revisions at 0x7faf65ec4ba0>
>>> list(c.revisions())
[('some-set', '000'), ('other-set', '001')]
>>> list(c.revisions('some-set'))
[('some-set', '000')]
>>> c.download('some-set', '000', './some-set-000', create_target=True)

Logging & Output

Functions in the djarchive_client library and the djarchive utility currently do not produce output in the normal case. In some cases, such as for viewing per-file download status messages, more detailed output may be desired.

In the script case, setting dj.config['loglevel'] or the environment variable DJARCHIVE_LOGLEVEL to DEBUG will increase logging output. Additionally, output can be logged to a file by setting dj.config['custom']['logfile'] to a string containing the path to the desired logfile.

For library usage, setting the Python logging level for the djarchive_client module to logging.DEBUG will enable more verbose output. See the function logsetup in scripts/djarchive for more details.
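For library usage, a setup along these lines should work with the standard logging module. It is a sketch that assumes the library logs under a logger named after the package ('djarchive_client'); the handler and message format are illustrative choices and are not taken from logsetup.

```python
import logging

# Logger name assumed to match the package name 'djarchive_client'.
log = logging.getLogger('djarchive_client')
log.setLevel(logging.DEBUG)

# Attach a handler so that DEBUG records are actually emitted somewhere
# (StreamHandler writes to stderr by default).
handler = logging.StreamHandler()
handler.setFormatter(
    logging.Formatter('%(asctime)s %(name)s %(levelname)s: %(message)s'))
log.addHandler(handler)
```

Swapping the StreamHandler for logging.FileHandler with a path of your choice would send the same output to a file instead.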
