unCVMFS
=======

unCVMFS is a tool for unpacking an entire CVMFS repository into a local
directory in a time efficient manner.


Configuration Files
-------------------
/etc/uncvmfs.conf - The main configuration file.
/etc/sysconfig/uncvmfs - Settings for the cron job (uncvmfs_cron).
/etc/cron.d/uncvmfs - The main cron entry.
uncvmfs@.timer - Systemd timer unit (CentOS7+)


Basic Principles & Usage
------------------------

unCVMFS manages three directories: the repo, store and database dirs. The repo
dir is the main output directory; it is populated with the standard contents of
the target CVMFS repository (i.e. on completion this dir will look like
/cvmfs/<repo_name> would on a standard CVMFS system). The store dir is where the
main data files for the filesystem are kept. It uses a format similar to that of
a CVMFS server repository and is generally of little interest. As the repo is
populated, files are downloaded into the store dir and, if possible, hardlinked
into the repo dir. This allows for inherent deduplication, but means that the
repo and store dirs _must_ be on the same host file system. Finally, the
database (db) dir holds the current unCVMFS (catalog.db) and CVMFS (various
<sha1_sum>.db) catalogs in sqlite3 format.
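
The hardlinking described above can be illustrated with a minimal Python
sketch. This is not unCVMFS's own code: the function names, directory layout and
file names are made up for the illustration, and only standard library calls
are used. It shows why the two directories must share a file system (hard links
cannot cross devices) and how several repo paths can share one stored object.

```python
import os
import tempfile


def same_filesystem(path_a, path_b):
    """Return True if both paths are on the same device (needed for hard links)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def link_into_repo(store_file, repo_file):
    """Hard-link a stored object to a path in the repo tree; no data is copied."""
    os.makedirs(os.path.dirname(repo_file), exist_ok=True)
    os.link(store_file, repo_file)


# Demonstration in a throwaway directory; every path here is hypothetical.
base = tempfile.mkdtemp()
store_dir = os.path.join(base, "store")
repo_dir = os.path.join(base, "repo")
os.makedirs(store_dir)
os.makedirs(repo_dir)

store_file = os.path.join(store_dir, "object-0001")
with open(store_file, "w") as handle:
    handle.write("payload\n")

assert same_filesystem(store_dir, repo_dir)

# Two repo paths with identical content share the single stored object.
first = os.path.join(repo_dir, "a", "file.txt")
second = os.path.join(repo_dir, "b", "copy.txt")
link_into_repo(store_file, first)
link_into_repo(store_file, second)

# One directory entry in the store plus two in the repo.
link_count = os.stat(store_file).st_nlink
print(link_count)
```

If the two directories were on different file systems, os.link would raise
OSError, which is why the same-file-system check comes first.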

unCVMFS uses a single config file to describe the repositories to download. The
config file is in INI/ConfigParser format with each section representing a
repository. See the provided default config file in /etc/uncvmfs.conf for the
full config file documentation.
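
As a rough illustration of the one-section-per-repository layout, the sketch
below parses a sample with Python's standard configparser module. The sample is
an assumption made for this example: the option name "example_option" is
invented, and the second section name is only illustrative. The real option
names are documented in /etc/uncvmfs.conf.

```python
import configparser

# Hypothetical sample: each section name is a repository. The option shown is
# invented for this illustration and is not a real unCVMFS setting.
SAMPLE = """
[cms.cern.ch]
example_option = value-a

[atlas.cern.ch]
example_option = value-b
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE)

# Section names come back in file order.
repos = parser.sections()
for repo in repos:
    print(repo, dict(parser[repo]))
```

The section name is what is passed on the uncvmfs command line to select a
repository.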

unCVMFS would normally run regularly from a Cron job or systemd timer to keep
the given repo directory up-to-date. The provided example crontab file has a
default run time of every two hours for the repositories listed in
/etc/sysconfig/uncvmfs. This file enables no repositories by default as most of
the configuration will be site-specific.

For systemd-based platforms, the timer can be enabled with "systemctl enable
uncvmfs@<repo>.timer", which by default will run uncvmfs every two hours for
the given repo.

On a clean installation, it's recommended that the initialisation of the
repository is done manually until the repository is synced and then the cron
job/timer can be enabled. The initial run on a large repository, such as ATLAS
or CMS (.cern.ch) may take one to two days. It is also recommended that unCVMFS
be run twice for the initialisation just to ensure that all files are correctly
initialised (downloading millions of files over HTTP is seemingly prone to
bursts of errors every so often). The synchronisation process can be interrupted
with ctrl+c and restarted at any time; it will also stop with a suitable message
if an unrecoverable error (such as running out of disk space) occurs. Once a
repository is successfully synced, the cron job/timer should be enabled to keep
it up-to-date.

An example of running unCVMFS to sync the CMS area using 4 threads is (assuming
cms.cern.ch is a section in the config file):
  uncvmfs -vv -n4 /etc/uncvmfs.conf cms.cern.ch

Use of a multiplexing terminal manager such as screen is advised for operations
that may take a long time over remote connections.


Maintenance & Problems
----------------------

A tool for repository maintenance, uncvmfs_tool, is included. Care should be
taken to disable any cron jobs/timers that may access the database while using
this, although the database will be locked to avoid any accidental damage.
There are two main maintenance operations that can be done - fsck & tidy; other
informational operations are also available (see uncvmfs_tool --help for a
complete list).

The uncvmfs_tool file system checker (fsck) checks through the repo directory
and the catalog in parallel looking for any inconsistencies. Any differences
found will be corrected or (in the case of missing files) marked for repair on
the next main uncvmfs run. unCVMFS should be run on the repository after an
uncvmfs_tool fsck operation completes (if errors were reported). In general
there should be no need to run fsck unless it is suspected that the repo has
been modified directly or if there are problems with repo files.

The tidy operation scans through the store directory and removes any files
which have only one link. These occur when a file is superseded in the
catalog and can't be deleted immediately for performance reasons.
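
The idea behind tidy can be sketched in a few lines of Python. This is an
illustration only, not the uncvmfs_tool implementation: the function name and
the demo paths are made up, and the sketch only lists candidates rather than
deleting anything. A stored file whose link count is one has no remaining hard
link from the repo dir.

```python
import os
import stat
import tempfile


def find_single_link_files(store_dir):
    """Return sorted paths of regular files under store_dir with one hard link."""
    found = []
    for root, _dirs, files in os.walk(store_dir):
        for name in files:
            path = os.path.join(root, name)
            info = os.lstat(path)
            if stat.S_ISREG(info.st_mode) and info.st_nlink == 1:
                found.append(path)
    return sorted(found)


# Demonstration with hypothetical paths in a throwaway directory.
base = tempfile.mkdtemp()
store_dir = os.path.join(base, "store")
repo_dir = os.path.join(base, "repo")
os.makedirs(store_dir)
os.makedirs(repo_dir)

in_use = os.path.join(store_dir, "in-use")
orphan = os.path.join(store_dir, "orphan")
for path in (in_use, orphan):
    with open(path, "w") as handle:
        handle.write("data\n")

# Only one of the two stored files is still referenced from the repo tree.
os.link(in_use, os.path.join(repo_dir, "file.txt"))

candidates = find_single_link_files(store_dir)
print(candidates)
```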

If any further repository corruption occurs unCVMFS may exit with an sqlite
exception. This indicates a problem with the hardware running unCVMFS or a bug
in the code. If a hardware problem is not suspected then please report it to the
developers. Running uncvmfs with the debug level at maximum (-vv flags) should
return the sha1 hash of the faulty catalog. Remove the <sha1_sum>.db file from
the database directory and the matching entry from the catalogs table of
catalogs.db to recover. If the problem persists then this indicates a
faulty/unexpected upstream catalog.
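
Before removing a row by hand it can help to look at the layout of the catalogs
table. The sketch below is a read-only helper built on Python's standard
sqlite3 module; it is not part of unCVMFS. The helper name is made up, and the
demo table with its "hash" and "path" columns is invented purely so the example
runs; the real table may look quite different. Any cron jobs or timers should
be disabled first, as for uncvmfs_tool.

```python
import os
import sqlite3
import tempfile


def table_columns(db_path, table):
    """Return the column names of a table using SQLite's table_info pragma."""
    # PRAGMA arguments cannot be bound as parameters, so validate the name.
    if not table.isidentifier():
        raise ValueError("unexpected table name: %r" % table)
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute("PRAGMA table_info(%s)" % table).fetchall()
    finally:
        conn.close()
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return [row[1] for row in rows]


# Demonstration against a throwaway database. The two columns are invented
# for the demo and say nothing about the real unCVMFS schema.
demo_db = os.path.join(tempfile.mkdtemp(), "demo.db")
conn = sqlite3.connect(demo_db)
conn.execute("CREATE TABLE catalogs (hash TEXT, path TEXT)")
conn.commit()
conn.close()

columns = table_columns(demo_db, "catalogs")
print(columns)
```

With the column names known, the row matching the faulty <sha1_sum> can be
identified and removed with an ordinary DELETE statement.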
