
Comments (28)

briceburg commented on May 13, 2024

Nick,

Right -- totally agree re: new rclone command for this. The precursor is getting bucket-to-bucket functionality.

Just FYI, I'm super excited about rclone && am loving your work. For now, I'm going to go with an incredibly simple approach to backups with rclone.

Basically I'll have two targets, one that is synced weekly and one that is synced daily. E.g. my cron will look like:

10     2     *     *     *  rclone sync ~/VAULT google:vault/nesta/daily
10     4     *     *     0  rclone sync ~/VAULT google:vault/nesta/weekly

This will [hopefully] preserve deleted files in the weekly snapshot. Could also add a monthly &c.

I think this will work for now, but certainly interested in helping with bucket-to-bucket and incremental strategies. If I can help, please let me know. May need to learn Go :)

ncw commented on May 13, 2024

So, removing file1 results in removal from current and a copy stored in backup1, right?

That is correct.

This should move any old file from current to the last hour, and keep the current backup in current, so if file1 is removed now (15:10), on the next run (16:00) current will lose file1 but the 15:00 directory will keep it, right?

Yes that sounds correct too.

There is a huge drawback: the backup-dir will only hold changes, not the full tree like rsnapshot does via links

I intend to fix this with a dedicated backup command at some point but we are not there yet.

rclone sync remote:current remote:1-hour-ago
rclone sync /path/to/local remote:current

Yes, that would work. The first rclone command would use server-side copies, so it would be relatively quick too. It does use a lot more space though. Some might say that is a good thing, as you then have two genuinely independent backups.

guestisp commented on May 13, 2024

So, removing file1 results in removal from current and a copy stored in backup1, right?

I'm trying to figure out a proper naming scheme; for example, I would like to create hourly backups. Currently I'm testing this:

BACKUP_DIR=$(/bin/date +'%F_%R' -d '1 hour ago')
rclone sync $dir amazon_s3:mybucket/current$dir --backup-dir amazon_s3:mybucket/${BACKUP_DIR}$dir

in an hourly cron.
This should move any old file from current to the last hour, and keep the current backup in current, so if file1 is removed now (15:10), on the next run (16:00) current will lose file1 but the 15:00 directory will keep it, right?

There is a huge drawback: the backup-dir will only hold changes, not the full tree like rsnapshot does via links. Probably, the following would create something more similar to rsnapshot:

rclone sync remote:current remote:1-hour-ago
rclone sync /path/to/local remote:current

but using much more space.

briceburg commented on May 13, 2024

My initial thought is to use remote-to-remote for this, e.g.

First Backup ("base")

rclone copy /path/to/backup remote:/backups/base

Subsequent Backups

date=`date "+%Y%m%d_%H:%M:%S"`
rclone sync remote:/backups/base remote:/backups/$date
rclone sync /path/to/backup remote:/backups/$date

Not sure about the efficiency of the remote-to-remote. Bad idea?

Also, the README indicates:

[sync] Deletes any files that exist in source that don't exist in destination.

I'm used to behavior that deletes files in the destination that do not exist in the source. I'm worried that the rclone behavior would remove [new] files from /path/to/backup ...

Thanks!

ncw commented on May 13, 2024

Your idea for the remote to remote copy is how I would approach it.

This has one disadvantage with rclone as it stands today, in that it will effectively download the data and re-upload it. However, I have been thinking about allowing bucket-to-bucket copies, which would be exactly what you want. S3, Swift and GCS all allow this. Here are the docs for GCS.

So if I were to implement that, then the copy-to-backup-first approach would work quite well, I think.

As for

[sync] Deletes any files that exist in source that don't exist in destination.

I think it is badly worded; it "deletes files in the destination that don't exist in the source", as you would expect. I'll fix the wording.

briceburg commented on May 13, 2024

Nick,

Great! This is pretty exciting. Bucket-to-bucket copying sounds promising. What about this approach as well:

rclone sync /path/to/backup remote:/backups/base remote:/backups/changes.2015-01-19

Where rclone would compare /path/to/backup against remote:/backups/base and copy changes to remote:/backups/changes.2015-01-19.

Obviously this would mess with the delete behavior, which could be dealt with by adding a flag that removes deleted files from remote:/backups/base and optionally preserves them elsewhere (e.g. copying them to remote:/backups/deleted-files). We could then run a janitorial command that removes files older than X days from remote:/backups/deleted-files ... and also take advantage of bucket-to-bucket copying without incurring the cost of doubling storage space with each snapshot.
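
The janitorial command could presumably lean on rclone's --min-age filter; a minimal sketch, assuming the hypothetical remote:/backups/deleted-files layout above:

# prune anything that has been parked in deleted-files for more than 30 days
rclone delete remote:/backups/deleted-files --min-age 30d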

ncw commented on May 13, 2024

Interesting idea!

I think I'd simplify the logic slightly and make it a new rclone command

rclone sync3 /path/to/backup remote:/backups/base remote:/backups/changes.2015-01-19
  • for every file in /path/to/backup
    • if it is in base unchanged - skip
    • if it is modified in base
      • copy the file from base to changes if it exists in base
      • upload the file to base
  • for every file in base but not in backup
    • move it from base to changes

This would mean that base would end up with a proper sync of backup, but changes would have any old files which changed or were deleted. It would then effectively be a delta, and you would have all the files at both points in time.

You could re-create the old filesystem easily, except that if you uploaded new files into base there would be no way of telling, just by looking at base and changes, whether those files were new or just unchanged old files. This may or may not be a problem!
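
For what it's worth, these sync3 semantics are close to what the --backup-dir flag (discussed further down this thread) ended up providing; a rough equivalent as a sketch, reusing the paths above:

# base becomes a faithful sync of the source; changed or deleted files are
# moved (server-side where possible) into the dated changes directory
rclone sync /path/to/backup remote:/backups/base --backup-dir remote:/backups/changes.2015-01-19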

briceburg commented on May 13, 2024

I've added Ansible scripts to 1) install rclone and 2) implement the above backup strategy on a crontab-based system (still need to make a systemd-timer-compatible version for Arch Linux &c). Sharing for fun.

Install rclone:

---

- name: install rclone 
  hosts: all
  sudo: true
  sudo_user: root

  vars:
    # check http://rclone.org/downloads/ for latest...
    rclone_version: 1.07
    rclone_vstr: rclone-v{{ rclone_version }}-linux-amd64
    rclone_target: /opt/rclone/{{ rclone_vstr }}

  pre_tasks:
    - stat: path={{ rclone_target }}
      register: stat_rclone 

  tasks:
    - name: download rclone
      uri:
        dest=/tmp/
        follow_redirects=all
        url=http://downloads.rclone.org/{{ rclone_vstr }}.zip
      when: not stat_rclone.stat.exists

    - name: unpack rclone
      command: unzip /tmp/{{ rclone_vstr }}.zip -d /opt/rclone
        creates={{ rclone_target }}

    - name: add rclone to path
      file:
        state=link
        dest=/usr/local/bin/rclone
        src={{ rclone_target }}/rclone

Backup Strategy

---
- name: vault backup 
  hosts: all

  vars:
    vault_base: "google:iceburg-vault/{{ TARGET_USER }}"
    vault_daily: "{{ vault_base }}/daily" 
    vault_weekly: "{{ vault_base }}/weekly"

  tasks:
    - name: $HOME/.rclone.conf
      file:
        state=link
        dest={{ TARGET_USER_HOME }}/.rclone.conf
        src={{ DOTFILES_DIR }}/.rclone.conf
        force={{ FORCE_LINKS }}

    - name: fetch vault
      command: rclone copy {{ vault_daily }} ~/VAULT
        creates=~/VAULT

    - name: schedule daily vault backup
      cron:
        name="daily vault backup"
        minute=40
        hour=4
        job="rclone sync ~/VAULT {{ vault_daily }}"

    - name: schedule weekly vault backup
      cron:
        name="weekly vault backup"
        minute=40
        hour=5
        job="rclone sync ~/VAULT {{ vault_weekly }}"

briceburg commented on May 13, 2024

Nick,

I've been playing with Syncthing of late. It uses the very cool idea of "versions", I believe derived from Dropbox and/or BitTorrent Sync. Versus the incremental ideas outlined above -- perhaps an incremental versioning scheme is preferred and easier to implement?

The "simple" Versioning scheme in Syncthing allows you to specify a folder name and number of copies you would like to preserve. E.g.

  1. During a sync, if a file is changed, copy the original version to the "versioned" folder.
    E.g. <remote>:/.versions/<path>/filename.
  2. If more than X versions of a file exist, delete the oldest.

So for the sync

rclone sync-versioned /path/to/backup remote:/backups

If remote:/backups/apache/virtualhost.a was FOUND, but deleted or changed in /path/to/backup/apache/virtualhost.a, rclone would

  • make sure remote:/backups/.versions/apache folder exists (assuming .versions is the configured folder name)
  • copy remote:/backups/apache/virtualhost.a to remote:/backups/.versions/apache/virtualhost.a
    • if remote:/backups/.versions/apache/virtualhost.a exists, apply versioning scheme. E.g. rename older backups to remote:/backups/.versions/apache/virtualhost.a.[1-4] if configured to preserve 5 versions of a file.

Personally I think versions may be more accessible, and they don't involve deltas. What do you think?
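
For what it's worth, current rclone can approximate the single "versioned" folder with --backup-dir plus --suffix (and --suffix-keep-extension to keep Windows-friendly names); keeping exactly X copies per file would still need extra scripting. A minimal sketch, with an illustrative remote:versions path (rclone requires the backup dir not to overlap the destination, so it can't literally live inside remote:backups):

# changed/deleted files are moved to remote:versions with a timestamp suffix
rclone sync /path/to/backup remote:backups --backup-dir remote:versions --suffix .$(date +%Y%m%d-%H%M) --suffix-keep-extension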

ncw commented on May 13, 2024

Sorry, I missed your last comment.

Yes, Versions sounds like it would be simpler for people to understand.

The renaming scheme needs a bit of thought - Windows doesn't deal well with files with funny extensions.

Implementation wise, it is quite similar to the schemes above.

briceburg commented on May 13, 2024

@ncw OK. If time allows I'll learn Go and submit a PR :) Will keep an eye on the project in the meantime! Thanks.

ncw commented on May 13, 2024

I'll just note that rclone now has bucket-to-bucket copy and sync, which may be helpful!
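
E.g. (a minimal sketch; the copy is done server-side where the backend supports it):

rclone sync remote:source-bucket remote:destination-bucket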

leocrawford commented on May 13, 2024

A feature along the lines of #18 or #98 would be very welcome. I agree that it is desirable to store full files rather than diffs, for simplicity and ease of restoration, but I wonder if we could improve on the versioned-folders idea?

The main drawback of this is that when a file is moved (or repeatedly removed and created) we get a lot of copies of the same file. Instead, if we treated the .backup directory as content-addressable storage, such that each backed-up file was stored using its md5 hash as a filename, we would only need a little metadata stored to allow a restore.

I'd suggest that what we would need to store for each version is a JSON file that contains a line for each filesystem change, along the lines of:

operation, metadata, blob

where:

  • Operation would be add, delete, mkdir or similar (probably to match operations in fs)
  • Metadata would contain chmod, date, etc.
  • blob would be an md5 hash of the file in question

I'd suggest that the version file itself is named as the md5 of its contents and contains a reference (probably in the first line) to the previous backup. The most recent backup would probably be recorded by writing its md5 to a file called HEAD in the .backup directory; this would be the only file that would ever need to change. (In effect we're creating a Merkle tree.)

The advantage of this approach is that, as well as restoring files, we can restore other changes readily by returning to any arbitrary point in the history (including deleted files, metadata, etc.), and it could cope with multi-way syncing with a little work. I also believe this approach could support a full two-way sync more readily than simple versioning, as the metadata allows us to determine what changes have been made since the last sync, improving our ability to determine which update to propagate, rather than simply having to mark a potential conflict.

In practice the easiest way of doing a restore is to allow the source to have an optional version specified (either the md5 hash or simply an integer representing the number of steps back to go), so a restore would simply be a copy from the (old) destination.

One interesting way to implement this would be to provide SourceVersionWrapper and DestinationVersionWrapper, which wrap any existing fs object; SourceVersionWrapper would allow an arbitrary version to be specified, and DestinationVersionWrapper would simply create the .backup metadata and blobs.

The advantage of this would be that if you did implement FUSE support (#494) then you would in effect have created a versioned filesystem for free. :-)

thibaultmol commented on May 13, 2024

New feature from Backblaze for B2: https://www.backblaze.com/blog/backblaze-b2-lifecycle-rules/

(might be relevant)

robinrosenstock commented on May 13, 2024

More than half a year later... @ncw, any status update?
This backup feature would make rclone a possible backup solution, especially with external drives (no cloud), wouldn't it?

ncw commented on May 13, 2024

rclone now supports --backup-dir which with a tiny amount of scripting gives all the tools necessary for incremental backups.

I keep meaning to wrap this into an rclone backup command, but I haven't got round to it yet!
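
The "tiny amount of scripting" can be as small as a dated --backup-dir in a cron job; a minimal sketch with illustrative paths:

#!/bin/sh
# keep the latest tree in current/; changed or deleted files are moved into a dated archive dir
STAMP=$(date +%Y-%m-%d_%H%M)
rclone sync /path/to/data remote:backups/current --backup-dir remote:backups/archive/$STAMP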

navotera commented on May 13, 2024

Hi @ncw,
I'm curious what --backup-dir=DIR does...
Is it doing a copy on the server side from the base folder, OR
is it uploading the file from local to the backup-dir (so there are two upload operations: 1. local sync/copy, i.e. upload to the remote base folder; 2. local sync/copy to the backup folder on the remote)?

Thank you

ncw commented on May 13, 2024

@navotera --backup-dir does a server-side move (or possibly a server-side copy followed by a delete, if server-side move isn't available).

guestisp commented on May 13, 2024

So, by using something like:

rclone sync /path/to/local remote:current --backup-dir remote:$(date)

remote:current will hold the latest backup (thus, the "current" version of the files) and every change between the current version and the previous one would be stored in "remote:$(date)", resulting in something like rsnapshot?

In other words, if yesterday I had a file called "foo" that was deleted today, then with today's rclone run this file will be removed from the current remote and placed in the remote with yesterday's date, right?

Isn't it easier to run a remote copy before a new sync? Like the following:

rclone sync remote:current remote:yesterday
rclone sync /path/to/local remote:current

Exactly like rsnapshot

ncw commented on May 13, 2024

@guestisp

So, by using something like:

rclone sync /path/to/local remote:current --backup-dir remote:$(date)

remote:current will hold the latest backup (thus, the "current" version of the files) and every change between the current version and the previous one would be stored in "remote:$(date)", resulting in something like rsnapshot?

Yes that is right

In other words, if yesterday I had a file called "foo" that was deleted today, then with today's rclone run this file will be removed from the current remote and placed in the remote with yesterday's date, right?

Yes.

Isn't it easier to run a remote copy before a new sync? Like the following:

That will use a lot more storage - you'll have a complete copy for yesterday and a complete copy for current.

guestisp commented on May 13, 2024

But with --backup-dir, do I have to search for a file in every backup directory, or is each one a complete copy like with rsnapshot and hardlinks?

ncw commented on May 13, 2024

But with --backup-dir, do I have to search for a file in every backup directory, or is each one a complete copy like with rsnapshot and hardlinks?

Yes, searching will be necessary, as not many cloud providers support hard links. (A few do, like Google Drive.)

I do intend to make an rclone backup command which hides this from the user at some point, though.
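
In the meantime, finding which dated directory holds a given file can be done with a recursive listing plus a filter; a minimal sketch, assuming all the backups live under remote:mybucket:

rclone lsf -R remote:mybucket --include "file1"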

guestisp commented on May 13, 2024

I'm trying to use the suggested method (--backup-dir) but something is not working as expected.

This is a simple script that I'm running:

#!/bin/sh

BACKUP_DIR=$(/bin/date +'%F_%R')
for dir in /etc /var/www /var/backups /var/spool/backups; do 
   rclone sync $dir amazon_s3:mybuket/current/$dir --backup-dir amazon_s3:mybuket/${BACKUP_DIR} --exclude '*/storage/logs/*' --stats 2s --log-level ERROR
done

I would expect that on the first run everything would be synced into mybuket/current (and this is working properly), and then on every subsequent run changed files should be moved to mybucket/${BACKUP_DIR}, but this is not working. Files are still synced into current.

I would like to have something like rsnapshot: current should hold the latest sync, and any changes between the latest sync and the previous one should be moved to the backup-dir.
For example, yesterday I had file1 and file2. These are synced into current. Today I remove file2 and change file1's content. On the next run, today's versions should be synced into current and yesterday's versions should be moved into 20180817_0930.

ncw commented on May 13, 2024

What should happen is that any files which are changed or deleted get moved to the backup-dir, which I think is what you are asking for.

Here is a simple example

$ tree src
src
└── file1

0 directories, 1 file
$ rclone sync src dst/current --backup-dir dst/backup1
$ tree dst
dst
└── current
    └── file1

1 directory, 1 file
$ date > src/file1
$ date > src/file2
$ rclone sync src dst/current --backup-dir dst/backup1
$ tree dst
dst
├── backup1
│   └── file1
└── current
    ├── file1
    └── file2

2 directories, 3 files
$ rm src/file1
$ rclone sync src dst/current --backup-dir dst/backup2
$ tree dst
dst
├── backup1
│   └── file1
├── backup2
│   └── file1
└── current
    └── file2

3 directories, 3 files
$ 

I would say also that amazon_s3:mybuket/${BACKUP_DIR} in your script should be amazon_s3:mybuket/${BACKUP_DIR}/$dir to fit in with the naming scheme.
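
Putting that together, the loop would become something like this (the only change is the per-directory backup path):

#!/bin/sh
BACKUP_DIR=$(/bin/date +'%F_%R')
for dir in /etc /var/www /var/backups /var/spool/backups; do
   rclone sync $dir amazon_s3:mybuket/current/$dir --backup-dir amazon_s3:mybuket/${BACKUP_DIR}/$dir --exclude '*/storage/logs/*' --stats 2s --log-level ERROR
done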

balupton commented on May 13, 2024

Found this issue, and rclone, as I've been looking for an alternative to https://www.arqbackup.com that supports Linux. Arq is like rclone, but specific to backup use cases. I emailed their dev, but they have no timeline for a Linux client/app.

That said, this conglomeration could work:

  1. Raspberry Pi running Ubuntu Server has a 12TB drive (with 2 partitions) attached to it available via Samba
  2. Arq running on macOS on a MacBook backs up the first partition to the second partition via Samba.
  3. The Raspberry Pi backs up the second partition to the cloud via rclone.

That way the Raspberry Pi does the cloud backup via rclone (the slow thing) as it is always on, and the MacBook does the occasional local snapshots (the quick thing) while it is powered on and available.

ivandeex commented on May 13, 2024

@ncw
The last comment here is 3 years old
Do you think that rclone backup is still a viable idea?

hmoffatt commented on May 13, 2024

Could you use --compare-dest with a list of all the directories since the last full backup in order to make an incremental backup?

Full backup: possibly use --copy-dest from all of the previous incrementals to avoid uploading again
Incremental backup: --compare-dest all the incrementals + the last full
Differential backup: --compare-dest the last full backup only
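
A sketch of what that might look like (hedged: whether --compare-dest/--copy-dest accept more than one directory depends on the rclone version, and the remote:backup paths here are illustrative):

# differential: upload only files that differ from the last full backup
rclone copy /path/to/data remote:backup/diff-$(date +%F) --compare-dest remote:backup/full-latest

# next full: reuse already-uploaded data via server-side copy where possible
rclone copy /path/to/data remote:backup/full-$(date +%F) --copy-dest remote:backup/full-latest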
