Parallel Distributed Backup
- GNU Parallel
- DAR (version 2.5.12 or newer)
- Globus CLI
- Note: Be sure to install the optional
globus-cli[delegate-proxy]
package.- This is necessary to activate gridftp endpoints using an x509 certificate proxy, which is needed until gridftp supports OAuth natively.
- See also: Globus CLI Changelog, Version 1.1.1
- Note: Be sure to install the optional
lockfile-progs
pkg from EPEL repository- Python 3
- The filesystem to be backed up implements snapshots AND snapshots are actively being produced.
There are some other git projects this code makes use of. They are included as
submodules and will be included automatically when this repo is cloned (hence
the --recursive
option to clone). They are documented here to give credit to
the authors of those projects.
cd <INSTALL PATH>
git clone --recursive https://github.com/ncsa/pdbkup.git
- Update PDBKUP_BASE with value of INSTALL_PATH in file
bin/run.sh
- Review settings in
conf/settings.ini
(see the Configuration section below)
# SCAN FILESYSTEM, CREATE FILELIST, CREATE PARALLEL JOBLIST
[root@lsst-backup01 ~]# /gpfs/fs0/DR/pdbkup/bin/run.sh bkup init
# START WORKERS ON MULTIPLE NODES
[root@adm01 ~]# xdsh backup01,test[09-10] /gpfs/fs0/DR/pdbkup/bin/run.sh bkup startworker
# SEE WHAT WORKERS ARE DOING
[root@adm01 ~]# xdsh backup01,test[09-10] /gpfs/fs0/DR/pdbkup/bin/run.sh bkup ps
# CHECK OVERALL PROGRESS
[root@lsst-backup01 ~]# /gpfs/fs0/DR/pdbkup/bin/run.sh bkup status
# LIST CURRENT AND HISTORICAL BACKUPS
[root@lsst-backup01 ~]# /gpfs/fs0/DR/pdbkup/bin/run.sh bkup ls
# CHECK PROGRESS OF PARALLEL TASKS
[root@lsst-backup01 ~]# /gpfs/fs0/DR/pdbkup/bin/run.sh bkup dbstatus
# PREPARE RESTORE INFO & LOGS FOR LONG TERM STORAGE
[root@lsst-backup01 ~]# /gpfs/fs0/DR/pdbkup/bin/run.sh bkup wrapup
# GET USAGE SYNOPSIS
[root@lsst-backup01 ~]# /gpfs/fs0/DR/pdbkup/bin/run.sh bkup
# START NEW TRANSFER TASK FOR ALL FILES READY TO TRANSFER
[root@lsst-backup01 ~]# /gpfs/fs0/DR/pdbkup/bin/run.sh txfr startnew
# MONITOR ACTIVE TRANSFERS
[root@lsst-backup01 ~]# /gpfs/fs0/DR/pdbkup/bin/run.sh txfr ls
# RENEW ENDPOINT CREDENTIALS (if needed)
[root@lsst-backup01 ~]# /gpfs/fs0/DR/pdbkup/bin/run.sh txfr update-credentials
# VIEW TXFR USAGE SYNOPSIS
[root@lsst-backup01 ~]# /gpfs/fs0/DR/pdbkup/bin/run.sh txfr
# CLEANUP COMPLETED TRANSFERS
[root@lsst-backup01 ~]# /gpfs/fs0/DR/pdbkup/bin/run.sh txfr clean
# REMOVE SUCCESSFULLY TRANSFERRED FILES
[root@lsst-backup01 ~]# /gpfs/fs0/DR/pdbkup/bin/run.sh bkup purge
# LIST FILE COUNTS, FILES IN "ERROR" DIRECTORIES HAD PROBLEMS
[root@lsst-backup01 ~]# /gpfs/fs0/DR/pdbkup/bin/run.sh bkup tree
# TRANSFER FROM LONG TERM STORAGE
# this must be an externally initiated and monitored process
# Further steps assume files are transferred to a local filesystem at
# $DR_LANDING_LOCATION
# EXTRACT INFO ARCHIVE
[root@lsst-backup01 ~]# cd $DR_LANDING_LOCATION
[root@lsst-backup01 home]# dar -x $( basename $( ls *_INFO.1.dar ) '.1.dar' )
# EXTRACT DR ARCHIVE CONTENTS
[root@lsst-backup01 home]# ./xtract_dar
# VERIFY RESTORED DATA
# This is only possible as a test scenario.
# Requires that the original snapshot is available to verify against.
[root@lsst-backup01 home]# ./verify_restore
All configuration is managed through the single file config/settings.ini
.
Below is a description of each section and related settings.
- DATADIR
- A directory structure will be created here inside of which will be stored all the temporary DAR archive files (before they transfer to long term storage), task queues (that will be used by parallel). Ensure DATADIR is a location that has enough capacity; which should be at least as large as all the data that will be backed up.
- INFODIR
- A directory structure will be created here that will store all the permanent information about each backup. Everything below here defines state or status of each backup as well as information necessary for restores.
- ENDPOINT_LOCAL
- Name or UUID of local endpoint
- local endpoint should have access to the
DATADIR
from theGeneral
section (above)
- ENDPOINT_REMOTE
- Name or UUID of remote endpoint
- The offsite location where disaster recovery files will be transferred to
- BASEDIR_REMOTE
- Directory under which the archives should be stored
- On the remote side, directories will be created as necessary to create a logical structure for the archive files.
- USERNAME
- Access globus as this user
- CLI
- Path to local
globus
command
- Path to local
GNU Parallel uses a database for a task queue. This section defines how to interact with the database. Not all fields will be used; leave blank any irrelevant fields. For instance, if using sqlite3, then user, pass, host, port will be empty (likewise for csv).
- DB_VENDOR
- DB_USER
- DB_PASS
- DB_HOST
- DB_PORT
- DB_DBNAME
- DB_TABLE
- WORKDIR
- Directory in which csv or sqlite file will be stored.
- Will be automatically created inside
DATADIR
(from theGeneral
section above)
- MAX_PROCS
- How many processes to run in parallel on each node
- MIN_VERSION = 20170222
- Minimum required version of parallel.
- Options
--sqlmaster
and--sqlworker
are relatively new and don't work well before Jan 2017. Change this only at your own risk.
- PATH = /usr/local/bin
- Update this if parallel was installed to a custom location
- NOTE: parallel also installs the
sql
command and expects it to be accessible in the PATH
For more information and for a list of valid strings for DB_VENDOR, see:
- CMD
- Path to dar binary/executable No need to change anything else in this section.
Unused for now. Possible future enhancement.
No need to change anything in this section.
No need to change anything in this section.
These are defaults that apply to the DIRS that will be backed up. Most often, all the dirs to be backed up are all part of the same filesystem or same type of filesystem, so it suffices to set relevant settings in one place.
- SNAPDIR_DATE_FORMAT
- If each snapshot has a date as part of it's name, specify that format here
- Syntax is same as used by the
date
command
- ARCHIVE_MAX_SIZE
- Size limit, in Bytes, of a single archive file
- ARCHIVE_MAX_FILES
- maximum number of files in a single archive
- Adjust this based on median file size in the DIR so that archive files can reach ARCHIVE_MAX_SIZE
- Make this higher for filesystems with lots of small files
Any of these defaults can be overridden on a per DIR basis by creating
a section matching the name of the KEY in the DIRS
section and then put the
new setting and value in that section. It will apply only to that DIRKEY
- Each key in [DIRS] is a DIRKEY
- Create a new DIRKEY for each filesystem / mountpoint / directory to be backed up
- Each DIRKEY must be set to one of:
- enabled -> backups are enabled
- disabled -> do not perform new backups (reporting is still enabled)
- Foreach entry in DIRS,
- Create a new section with name matching DIRKEY (from [DIRS] section)
- Assign SNAPDIR
- Assign PATH (PATH can be empty if backing up entire filesystem)
- Backups will start from SNAPDIR/date/PATH, where "date" is the most recent snapshot
- repeat any key from section [DEFAULTS] to override the default value
There are additional comments and examples in the default conf/settings.ini
file.