
zfs_uploader's People

Contributors

alexanderlieret, ddebeau, erisa, jdeluyck, jrothrock, rdelcorro


zfs_uploader's Issues

Create SnapshotDB object

Code like snapshot_name = f'{self._file_system}@{backup_time}' and list_snapshots() should really be part of a larger SnapshotDB object so we can document and reuse code efficiently. The Backup object should also have a snapshot parameter so we can easily restore from a backup.
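
A rough sketch of what such an object might look like (names and methods here are illustrative, not the final API):

import subprocess

class SnapshotDB:
    """Illustrative sketch only: tracks snapshots for one file system."""

    def __init__(self, file_system):
        self._file_system = file_system

    def create_snapshot(self, backup_time):
        # e.g. 'tank/data@20210607_020000'
        name = f'{self._file_system}@{backup_time}'
        subprocess.run(['zfs', 'snapshot', name], check=True)
        return name

    def list_snapshots(self):
        out = subprocess.run(
            ['zfs', 'list', '-H', '-t', 'snapshot', '-o', 'name',
             '-r', self._file_system],
            capture_output=True, text=True, check=True)
        return out.stdout.splitlines()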

backup_info functions should be moved from ZFSjob to their own object

It would improve the readability and organization of the code to move the backup_info functions to their own object. It would also make it easier to support compatibility between old and new backup_info formats.

def _read_backup_info(self):
    info_object = self._s3.Object(self._bucket,
                                  f'{self._filesystem}/backup.info')
    try:
        with BytesIO() as f:
            info_object.download_fileobj(f)
            f.seek(0)
            return json.load(f)
    except ClientError:
        return {}

def _write_backup_info(self, backup_info):
    info_object = self._s3.Object(self._bucket,
                                  f'{self._filesystem}/backup.info')
    with BytesIO() as f:
        f.write(json.dumps(backup_info).encode('utf-8'))
        f.seek(0)
        info_object.upload_fileobj(f)

def _set_backup_info(self, key, file_system, backup_time, backup_type):
    backup_info = self._read_backup_info()
    backup_info[key] = {'file_system': file_system,
                        'backup_time': backup_time,
                        'backup_type': backup_type}
    self._write_backup_info(backup_info)

def _del_backup_info(self, key):
    backup_info = self._read_backup_info()
    backup_info.pop(key)
    self._write_backup_info(backup_info)

Adaptive storageclass

Hi, I'm the author of https://github.com/andaag/zfs-to-glacier/ and I occasionally go hunting for alternatives to see if someone has invested more in this problem than I have, so I can stop maintaining mine.

One thing I see you are missing is choosing the storage class based on size. I have a fairly ugly hack here: https://github.com/andaag/zfs-to-glacier/blob/main/src/main.rs#L95 that switches the storage class to STANDARD when the file is too small for Glacier (in those cases you pay a premium, and Standard storage is actually cheaper).

Glacier's minimum billable object size is 128 KB. Quite a lot of incremental backups can be around 1 KB, so Standard storage is a very clear winner there!
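
For reference, a minimal sketch of the idea in Python (the threshold constant and the fall-back class are assumptions taken from the comment above, not existing zfs_uploader behaviour):

GLACIER_MIN_BILLABLE_BYTES = 128 * 1024  # assumed threshold from the issue text

def pick_storage_class(stream_size_bytes, preferred='DEEP_ARCHIVE'):
    # Fall back to STANDARD when the object is too small for Glacier pricing.
    if stream_size_bytes < GLACIER_MIN_BILLABLE_BYTES:
        return 'STANDARD'
    return preferred

The awkward part is that the size of a zfs send stream isn't known up front, so an estimate (e.g. from a dry-run zfs send -nvP) would be needed before the upload starts.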

Send stream could be sent compressed if the dataset is compressed

If the dataset being backed up has its compression property set to anything other than off, the default behaviour of zfs send is to decompress the data on the fly and send the full uncompressed dataset.

Simply adding the -c (--compressed) flag to zfs send sends the stream compressed instead, which takes up significantly less space on the remote. In my case this reduced a full backup of a PostgreSQL database from 56 GB to 24 GB.

I added this flag to my personal fork in Erisa@c192333 and noticed no regressions or repercussions; however, since users may not have compression enabled on their dataset, or may not want this behaviour to change between versions, I believe the best way forward is to add a zfs_uploader config variable that enables the compressed flag.
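
A hedged sketch of how the flag could be gated behind a config option (the function name and the send_compressed option are illustrative; the real open_snapshot_stream helper may be structured differently):

def build_send_command(file_system, backup_time, send_compressed=False):
    # Illustrative only: build the `zfs send` argument list, optionally
    # adding -c so blocks are sent compressed as they are stored on disk.
    cmd = ['zfs', 'send']
    if send_compressed:
        cmd.append('-c')
    cmd.append(f'{file_system}@{backup_time}')
    return cmd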

CLI entrypoint doesn't work

The CLI entry point is installed as executable-name instead of zfsup, and the command returns this traceback:

executable-name --version
Traceback (most recent call last):
  File "<redacted>/.env/zfs_uploader/bin/executable-name", line 5, in <module>
    from zfs_uploader.__main__ import main
ImportError: cannot import name 'main' from 'zfs_uploader.__main__' (<redacted>/.env/zfs_uploader/lib/python3.7/site-packages/zfs_uploader/__main__.py)

Encrypt snapshots

Hello!

Are there any plans to implement encryption of snapshots from unencrypted pools / datasets before uploading to S3?

Example:

zfs send pool/dataset@snapshot-name | gpg --symmetric --cipher-algo AES256 -o /path/to/encrypted-snapshot.gpg
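
If this were implemented, one approach would be to chain the two processes in Python, roughly like this (illustrative only; the helper name, gpg options, and pipeline wiring are assumptions rather than anything zfs_uploader does today):

import subprocess

def open_encrypted_snapshot_stream(snapshot, passphrase_file):
    # zfs send ... | gpg --symmetric --cipher-algo AES256 ...
    send = subprocess.Popen(['zfs', 'send', snapshot],
                            stdout=subprocess.PIPE)
    gpg = subprocess.Popen(['gpg', '--symmetric', '--cipher-algo', 'AES256',
                            '--batch', '--passphrase-file', passphrase_file,
                            '-o', '-'],
                           stdin=send.stdout, stdout=subprocess.PIPE)
    send.stdout.close()  # let zfs send receive SIGPIPE if gpg exits early
    return gpg.stdout    # encrypted stream, ready to upload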

Update pruning

Are there any plans to add a pruning schedule, similar to borg's prune command with its --keep-hourly, --keep-daily, etc. options?
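
A minimal sketch of the keep-N-per-period idea (illustrative only; borg's actual rules are more involved, and real pruning would also have to keep any full backup that incrementals still depend on):

def prune(backup_times, keep_daily=7):
    # Keep the newest backup for each of the last `keep_daily` distinct days.
    # `backup_times` is an iterable of datetime objects; returns the set to keep.
    keep = {}
    for t in sorted(backup_times, reverse=True):  # newest first
        day = t.date()
        if day not in keep:
            keep[day] = t
        if len(keep) == keep_daily:
            break
    return set(keep.values())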

FreeBSD

Can this be installed/used on FreeBSD?

zfs receive does not work if the destination filesystem has been changed since the most recent snapshot

Error:

cat 20210607_020000.inc | zfs receive <filesystem>@20210607_020000
cannot receive incremental stream: destination <filesystem> has been modified
since most recent snapshot

zfsup restore <filesystem> 20210607_020000 doesn't return anything. If you wrap the restore command in a try/except statement you get a BrokenPipeError. We'll want to catch that and tell the user what the problem likely is. The -F option of zfs receive forces a rollback to the most recent snapshot; we should add an option for that, as sketched below.
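
A sketch of both ideas, catching the broken pipe and exposing a force-rollback option (the function and parameter names are illustrative, not the current code):

import subprocess

def restore_backup(s3_object, snapshot, force_rollback=False):
    # Illustrative only: stream an S3 object into `zfs receive`, optionally
    # with -F so the destination is rolled back to its most recent snapshot.
    cmd = ['zfs', 'receive']
    if force_rollback:
        cmd.append('-F')
    cmd.append(snapshot)
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    body = s3_object.get()['Body']
    try:
        for chunk in iter(lambda: body.read(1024 * 1024), b''):
            proc.stdin.write(chunk)
        proc.stdin.close()
    except BrokenPipeError:
        # zfs receive exited early; surface its stderr instead of a bare traceback.
        stderr = proc.stderr.read().decode('utf-8')
        raise RuntimeError(
            f'zfs receive failed: {stderr.strip()} '
            '(the destination was likely modified since the most recent '
            'snapshot; retry with force_rollback=True, i.e. zfs receive -F).')
    if proc.wait():
        raise RuntimeError(proc.stderr.read().decode('utf-8'))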

Delta backups done against full rather than incrementals

Hi @ddebeau, I am seeing in my AWS console that the incrementals created are based on the full backup rather than on the previous incremental.

[screenshot: AWS console object listing showing the incremental backups growing in size]

If you look at my backups, they are getting bigger and bigger and duplicating a lot of data. I can't create a new full backup since the current one needs to be stored for 6 months due to Deep Archive constraints. I am open to creating a PR to solve this, but I wanted your opinion on the subject first.

This affects:

  • Backups: they need to use the latest incremental as the base rather than the full backup
  • Restores: they need to restore the full backup and the whole chain of incrementals up to the point in time the user specifies

The savings would be massive over time.

Thanks
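
For reference, the chaining boils down to passing the previous backup's snapshot (full or incremental) as the base of zfs send -i rather than always using the full backup's snapshot. A sketch, with an illustrative helper name:

import subprocess

def open_incremental_stream(file_system, base_snapshot_time, backup_time):
    # `zfs send -i <base> <new>`: <base> is the snapshot of the *previous*
    # backup in the chain, not necessarily the full backup's snapshot.
    base = f'{file_system}@{base_snapshot_time}'
    new = f'{file_system}@{backup_time}'
    return subprocess.Popen(['zfs', 'send', '-i', base, new],
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)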

Recursive snapshots

First of all, thank you very much for sharing this project! I'm a newcomer to ZFS and I was able to get started in minutes thanks to the excellent documentation!

As I said before, I'm fairly new to ZFS. I've managed to put together a server with the following datasets:

$ zfs list
NAME                    USED  AVAIL     REFER  MOUNTPOINT
zfspool                 244G  5.09T      467M  /zfspool
zfspool/bulkstorage     145G  5.09T      145G  /zfspool/bulkstorage
zfspool/vm-100-disk-0     3M  5.09T      120K  -
zfspool/vm-100-disk-1  33.0G  5.12T     3.30G  -
zfspool/vm-102-disk-0  66.0G  5.13T     20.6G  -
zfspool/vm-102-disk-1     3M  5.09T      120K  -

My config file looks like this (with redacted info styled <like this>):

[DEFAULT]
bucket_name = <bucket>
region = <region>
access_key = <access key>
secret_key = <secret key>
storage_class = STANDARD
endpoint = <endpoint>

[zfspool]
cron = 0 2 * * *
max_snapshots = 7
max_incremental_backups_per_full = 6
max_backups = 7

I naively expected zfs_uploader to recursively snapshot and upload backups of each dataset within zfspool; however, it only did so for the data stored directly in the zfspool mount point, not for any of the child datasets.

Is this supported by zfs_uploader, or do I need to specify each dataset manually, e.g. [zfspool/bulkstorage], [zfspool/vm-100-disk-0], etc.?

Also related, I did find that ZFS supports recursive snapshots, but I haven't tried it yet: https://docs.oracle.com/cd/E19253-01/819-5461/gdfdt/index.html
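
On the ZFS side, recursive snapshots are a single command, so at minimum the snapshot step could be wrapped like the sketch below (illustrative only; zfs_uploader may or may not grow such an option). Uploading the child datasets would presumably also need either per-dataset sends or a replication stream (zfs send -R), which is a bigger change.

import subprocess

def snapshot(file_system, backup_time, recursive=False):
    # `zfs snapshot -r pool@name` snapshots the dataset and all descendants.
    cmd = ['zfs', 'snapshot']
    if recursive:
        cmd.append('-r')
    cmd.append(f'{file_system}@{backup_time}')
    subprocess.run(cmd, check=True)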

Part number limit is reached when uploading large snapshots

The following traceback occurs when uploading large snapshots:

botocore.exceptions.ClientError: An error occurred (InvalidArgument) when calling the UploadPart operation: Part number must be an integer between 1 and 10000, inclusive

The error occurs when the part number limit (10,000) is reached. We'll need to set the part size ourselves instead of letting Boto pick it.

https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpuoverview.html
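
A minimal sketch of sizing the parts ourselves, assuming an estimated stream size is available (the 10,000-part and 5 MiB minimums below are S3's documented multipart limits):

import math
from boto3.s3.transfer import TransferConfig

S3_MAX_PARTS = 10_000
S3_MIN_PART_SIZE = 5 * 1024 * 1024  # 5 MiB minimum part size

def transfer_config_for(estimated_size_bytes):
    # Pick a chunk size large enough to stay under the part-number limit,
    # but never below S3's minimum part size.
    chunk = max(S3_MIN_PART_SIZE,
                math.ceil(estimated_size_bytes / S3_MAX_PARTS))
    return TransferConfig(multipart_chunksize=chunk)

Since the exact size of a zfs send stream isn't known up front, the estimate could come from a dry run (e.g. zfs send -nvP) or be padded generously.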

Add GitHub Actions testing

We should be able to run the test suite on the standard GitHub Actions Linux runner, since zpool can create pools backed by plain files.
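
A rough fixture sketch for the file-backed pool idea (names and sizes are illustrative; the runner needs ZFS installed and root privileges):

import subprocess
import tempfile

def create_test_pool(name='zfs_uploader_test', size_mb=128):
    # zpool accepts a plain file as a vdev, so no block device is needed.
    backing = tempfile.NamedTemporaryFile(delete=False, suffix='.img')
    backing.truncate(size_mb * 1024 * 1024)  # sparse backing file
    backing.close()
    subprocess.run(['zpool', 'create', name, backing.name], check=True)
    return name, backing.name

def destroy_test_pool(name):
    subprocess.run(['zpool', 'destroy', name], check=True)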

Add basic CLI

A basic CLI should be added with the following functions (a minimal sketch follows the list):

  • Display version
  • Display help
  • List backups
  • Backup
  • Restore
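
A minimal argparse sketch covering those commands (illustrative only; the real CLI may use a different framework, command names, and flags):

import argparse

def main():
    parser = argparse.ArgumentParser(prog='zfsup',
                                     description='Upload ZFS snapshots to S3.')
    parser.add_argument('--version', action='version', version='0.0.0')  # placeholder
    sub = parser.add_subparsers(dest='command', required=True)
    sub.add_parser('list', help='List backups')
    sub.add_parser('backup', help='Run backup jobs')
    restore = sub.add_parser('restore', help='Restore a backup')
    restore.add_argument('filesystem')
    restore.add_argument('backup_time')
    args = parser.parse_args()
    # dispatch on args.command here ...

if __name__ == '__main__':
    main()

Help output comes for free from argparse, which covers the "Display help" item.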

pip install fails due to bad script entry point

ERROR: For req: zfs_uploader. Invalid script entry point: <ExportEntry zfsup = zfs_uploader.__main__:None []> - A callable suffix is required. Cf https://packaging.python.org/specifications/entry-points/#use-for-scripts for more information.
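
The fix would be to point the console script at an actual callable, assuming main exists in zfs_uploader.__main__ (which matches the traceback in the CLI entrypoint issue above), e.g. in setup.py:

# setup.py (excerpt)
from setuptools import setup

setup(
    entry_points={
        'console_scripts': [
            'zfsup = zfs_uploader.__main__:main',
        ],
    },
)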

Move S3 upload code to BackupDB

We should move the S3 upload code to BackupDB so that, like SnapshotDB, it can create the objects it references. The job code should only handle the more complicated logic that involves both backups and snapshots.

s3_key = f'{self._file_system}/{backup_time}.full'
self._logger.info(f'[{s3_key}] Starting full backup.')

with open_snapshot_stream(self.filesystem, backup_time, 'r') as f:
    self._bucket.upload_fileobj(f.stdout,
                                s3_key,
                                Config=self._s3_transfer_config,
                                ExtraArgs={
                                    'StorageClass': self._storage_class
                                })
    stderr = f.stderr.read().decode('utf-8')

if f.returncode:
    raise ZFSError(stderr)

self._check_backup(s3_key)
self._backup_db.create_backup(backup_time, 'full', s3_key)

Configuring different max part numbers for some S3 providers

When setting up zfs_uploader against Scaleway Object Storage (specifically their GLACIER tier), everything worked as expected except for one caveat: the maximum part number on Scaleway is 1,000 rather than the 10,000 used by AWS.

This resulted in an error when uploading with the default setup, since it calculated the part sizes based on 10,000 parts and eventually failed due to exceeding Scaleway's limit of 1,000 parts.

I resolved this for my use case by simply modifying a number in job.py (Erisa@20ed42f); however, I feel that going forward it would be a good idea to make this value configurable in the zfs_uploader configuration file and to document it in the README.

You could also detect and change the value based on predefined provider limits; even then, it would still be nice to have it in a user-configurable place.
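
For example, a hypothetical key in the existing config format (max_multipart_parts is an illustrative name, not a current option):

[DEFAULT]
...
max_multipart_parts = 1000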

Restore full backup only when necessary

When restoring an incremental backup, we shouldn't restore the full backup if the snapshot used for the full backup still exists on the system.

elif backup_type == 'inc':
    # restore full backup first
    backup_full = self._backup_db.get_backup(backup.dependency)
    self._restore_snapshot(backup_full)

    self._restore_snapshot(backup)
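
A sketch of the check (assuming a SnapshotDB as proposed in the other issue, and a snapshot_name attribute on the backup record; both are assumptions, not current code):

elif backup_type == 'inc':
    backup_full = self._backup_db.get_backup(backup.dependency)
    # Skip re-downloading the full backup if its snapshot still exists locally.
    local_snapshots = self._snapshot_db.list_snapshots()
    if backup_full.snapshot_name not in local_snapshots:
        self._restore_snapshot(backup_full)
    self._restore_snapshot(backup)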
