buttersink's Issues

--part-size improvements

There are a few problems with the --part-size setting:

  1. The argparse help could be improved. It gives no indication of what the default size is, or that the amount is expressed in MB rather than bytes. I had to read the source code to find out.
  2. The default size of 20 MB is, IMHO, way too small, doesn't offer any real benefit, and significantly slows down uploads. Amazon recommends 100 MB.
  3. S3 has a hard limit of 10,000 parts in a multipart upload. I discovered that when everything fell over after uploading 200 GB of data (aaaargh). Ideally, part-size should automatically adjust itself upwards when uploading large files (after printing a message), so that the limit is never hit (see the sketch below).
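
For illustration, here is a minimal sketch (not buttersink's actual code; the function name is made up) of how the part size could be bumped automatically so the 10,000-part limit is never hit:

    S3_MAX_PARTS = 10000  # hard S3 limit on parts per multipart upload

    def adjust_part_size(part_size_bytes, total_bytes):
        """Return a part size large enough to stay under S3's part limit."""
        min_part = -(-total_bytes // S3_MAX_PARTS)  # ceiling division
        if min_part > part_size_bytes:
            print("Increasing part size from %d to %d bytes to stay under %d parts"
                  % (part_size_bytes, min_part, S3_MAX_PARTS))
            return min_part
        return part_size_bytes

    # 200 GiB with the 20 MiB default would need 10,240 parts, so this bumps it:
    print(adjust_part_size(20 << 20, 200 << 30))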

delete before transfer

Hi,

Thanks for an awesome Python script, it's really handy.

Recently I had a problem where my backup disk ran out of space. Wouldn't it make sense to be able to "delete" the old subvolumes before the transfer, so that space is freed up first?

Ben

Automatically set ro property on source, and keep source property on destination

Instead of creating new ro snapshots, and in order to keep the rw property on the destination, would it be possible to:

  • automatically set the ro property on the source before transfer
  • automatically set the ro/rw property back on the source after transfer
  • set the destination property the same as the source after transfer

That way we could recursively transfer snapshots. For the moment, as destination snapshots are ro, we can't recursively transfer snapshots into other snapshots.

this:

btrfs sub list -qu --sort ogen /source_dir /| awk '{ print $13}' | while read X; do echo $X; buttersink $X ssh://root@server${X%/*}; done

won't work if there are nested snapshots.
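
For reference, a minimal sketch of toggling the read-only flag around a transfer using btrfs-progs (this is not something buttersink does today; the helper name and paths are made up):

    import subprocess

    def set_readonly(subvol_path, readonly):
        """Flip the read-only property on an existing subvolume (hypothetical helper)."""
        value = "true" if readonly else "false"
        subprocess.check_call(
            ["btrfs", "property", "set", "-ts", subvol_path, "ro", value])

    # e.g. make the source read-only for the send, then writable again:
    # set_readonly("/source_dir/snap", True)
    # ... transfer the snapshot here ...
    # set_readonly("/source_dir/snap", False)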

Remove destination folder if it exists

After the server was restored from backup, snapshot numbering was reset.
Thus new snapshots have the same paths as those stored on the backup server, but different UUIDs.
Buttersink tries to create directories that already exist, causing an error.

    ERROR:buttersink.py[275]: ERROR: Path /var/backups/myhost/rootfs/1/snapshot exists, can't receive 508f9147-20fc-4b4c-9d23-ea622fd2c230.                                                                             
Traceback (most recent call last):
  File "/root/buttersink/buttersink.py", line 257, in main
    diff.sendTo(dest, chunkSize=args.part_size << 20)
  File "/root/buttersink/Store.py", line 346, in sendTo
    receiveContext = dest.receive(self, paths)
  File "/root/buttersink/ButterStore.py", line 179, in receive
    "Path %s exists, can't receive %s" % (path, diff.toUUID)

Is it safe to silently delete destination folders if they already exist?
I would like to start a discussion here.

Progress output reports weird results

Here is a part of synchronization progress output of a fresh synchronization session:

0:28:01.679565: Sent 28.11 GiB of 30.58 GiB (91%) ETA: 0:02:28.066585 (144 Mbps )
0:28:02.911090: Sent 28.12 GiB of 30.58 GiB (91%) ETA: 0:02:26.903429 (144 Mbps )
0:28:04.528255: Sent 28.14 GiB of 30.58 GiB (92%) ETA: 0:02:25.773551 (144 Mbps )    

It's very helpful. We have 30.58 GiB of data to send and we sent 28.14 GiB of it, which is 28.14 / 30.58 = 92%.

Now I take another snapshot (in which I expect only a few MiB to have changed) and here are its progress results:

0:00:33.005008: Sent 880 MiB of 395.4 MiB (222%) ETA: None (224 Mbps )          
0:00:33.675362: Sent 900 MiB of 395.4 MiB (227%) ETA: None (224 Mbps )          
0:00:34.304221: Sent 920 MiB of 395.4 MiB (232%) ETA: None (225 Mbps )          
0:00:34.936035: Sent 940 MiB of 395.4 MiB (237%) ETA: None (226 Mbps )          
0:00:35.584806: Sent 960 MiB of 395.4 MiB (242%) ETA: None (226 Mbps )          
0:00:35.603220: Sent 960.5 MiB of 395.4 MiB (242%) ETA: None (226 Mbps )        
0:00:35.603273: Sent 960.5 MiB ETA: None (226 Mbps )                     

  measured size (960.5 MiB), estimated size (395.4 MiB)

How many MiB are there to send? How many MiB have been sent?

Obviously, it's confusing during the transfers.

Proposal

The estimated size could be left out to prevent confusion; the estimate is mostly wrong anyway.
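
As a minimal sketch of what I mean (not buttersink's actual code), the progress line could stop showing a percentage once the amount sent passes the estimate:

    def format_progress(sent_bytes, estimated_bytes):
        """Report a percentage only while the estimate is still plausible."""
        if estimated_bytes and sent_bytes <= estimated_bytes:
            pct = 100.0 * sent_bytes / estimated_bytes
            return "Sent %.1f MiB of %.1f MiB (%d%%)" % (
                sent_bytes / 2.0 ** 20, estimated_bytes / 2.0 ** 20, pct)
        # Estimate exceeded (or unknown): don't show a misleading percentage.
        return "Sent %.1f MiB" % (sent_bytes / 2.0 ** 20)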

Poor heuristic choices for sending incremental diffs

I have a read-write "current" subvolume, from which I create read-only snapshots every day.
If I didn't do anything significant on a given day, btrfs send <today> -p <yesterday> will produce a stream worth kilobytes.

buttersink doesn't seem to realise this, and tries to do extremely expensive transfers from a much older snapshot.

To another btrfs hard disk:

# buttersink   -n /btrfs/crusaderky/ /mnt/ext_hdd/crusaderky/
  Waiting for btrfs quota usage scan...
  Optimal synchronization:
  36.92 GiB from 3 diffs in btrfs /btrfs/crusaderky
  452.5 GiB from 1 diffs in btrfs /mnt/ext_hdd/crusaderky
  489.4 GiB from 4 diffs in TOTAL
  Keep: ca37...2c4b /mnt/ext_hdd/crusaderky/20170902-130702 from None (452.5 GiB)
  WOULD: Xfer: f970...da46 /btrfs/crusaderky/20171218-223900 from ca37...2c4b /btrfs/crusaderky/20170902-130702 (9.201 GiB)
  WOULD: Xfer: de7c...accf /btrfs/crusaderky/20180104-000001 from ca37...2c4b /btrfs/crusaderky/20170902-130702 (13.86 GiB)
  WOULD: Xfer: 5d72...eef2 /btrfs/crusaderky/20180103-013041 from ca37...2c4b /btrfs/crusaderky/20170902-130702 (13.86 GiB)

To s3:

# buttersink   -n /btrfs/crusaderky/ s3://crusaderky-buttersink/crusaderky/
  Listing S3 Bucket "crusaderky-buttersink" contents...
  measured size (27.72 GiB), estimated size (27.72 GiB)
  Optimal synchronization:
  462.9 GiB from 2 diffs in S3 Bucket "crusaderky-buttersink"
  27.72 GiB from 2 diffs in btrfs /btrfs/crusaderky
  490.7 GiB from 4 diffs in TOTAL
  Keep: ca37...2c4b /crusaderky/20170902-130702 from None (453.7 GiB)
  Keep: f970...da46 /crusaderky/20171218-223900 from ca37...2c4b /crusaderky/20170902-130702 (9.201 GiB)
  WOULD: Xfer: de7c...accf /btrfs/crusaderky/20180104-000001 from ca37...2c4b /btrfs/crusaderky/20170902-130702 (13.86 GiB)
  WOULD: Xfer: 5d72...eef2 /btrfs/crusaderky/20180103-013041 from ca37...2c4b /btrfs/crusaderky/20170902-130702 (13.86 GiB)

In the above situation,

  • the send of 20180103-013041 should use 20171218-223900 as a parent (4.7 GB) and not 20170902-130702 (13.86 GB).
  • the send of 20180104-000001 should use 20180103-013041 as a parent (< 1 MB) and not 20170902-130702 (13.86 GB); see the sketch below.
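
A minimal sketch of the parent choice I would expect (hypothetical code, not buttersink's BestDiffs logic, which also weighs estimated sizes): prefer the newest snapshot older than the one being sent, among those already present on the destination.

    def pick_parent(target_ts, available):
        """Pick the newest available snapshot strictly older than the target.

        available maps snapshot timestamp -> snapshot id for snapshots that
        already exist on the destination (a hypothetical structure).
        """
        older = [ts for ts in available if ts < target_ts]
        return available[max(older)] if older else None

    transferred = {"20170902-130702": "ca37...2c4b",
                   "20171218-223900": "f970...da46",
                   "20180103-013041": "5d72...eef2"}
    print(pick_parent("20180104-000001", transferred))  # -> "5d72...eef2"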

Catch interrupt signal

If you press Ctrl+C while transferring data you get Python errors:

  Xfer: 138e...0423 /.snapshots/37/snapshot from f84d...a4db /.snapshots/40/snapshot (~3.033 MiB)
^C:11:03.480866: Sent 1.719 GiB of 3.033 MiB (58029%) ETA: None (22.3 Mbps )                     
  btrfs receive errors
Traceback (most recent call last):
  File "/usr/bin/buttersink", line 11, in <module>
    load_entry_point('buttersink==0.6.8', 'console_scripts', 'buttersink')()
  File "/usr/lib/python2.7/site-packages/buttersink/buttersink.py", line 257, in main
    diff.sendTo(dest, chunkSize=args.part_size << 20)
  File "/usr/lib/python2.7/site-packages/buttersink/Store.py", line 354, in sendTo
    transfer(sendContext, receiveContext, chunkSize)
  File "/usr/lib/python2.7/site-packages/buttersink/Store.py", line 276, in transfer
    writer.write(data)
KeyboardInterrupt
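
A minimal sketch of catching the interrupt so the transfer loop can clean up instead of spilling a traceback (hypothetical names; the abort() cleanup hook does not exist in buttersink and only marks where cleanup would go):

    def transfer(reader, writer, chunk_size):
        """Copy a send stream, exiting quietly on Ctrl+C (sketch)."""
        try:
            while True:
                data = reader.read(chunk_size)
                if not data:
                    break
                writer.write(data)
        except KeyboardInterrupt:
            # Hypothetical cleanup hook: stop the receive side and rename or
            # remove the partial snapshot instead of printing a traceback.
            writer.abort()
            print("Transfer interrupted by user; partial data discarded.")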

Migrate to python3

All major projects have moved to Python 3; buttersink should too, or be deprecated.

Python 3 provides good type annotation semantics that make a large code base more maintainable. That's much better than annotations in docstrings.
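
For example, compare the docstring style with Python 3 annotations (illustrative only, not actual buttersink signatures):

    # Python 2 style: types live only in the docstring.
    def send_to(dest, chunk_size):
        """Send this diff to dest.

        :type dest: Store
        :type chunk_size: int
        """

    # Python 3 style: types are part of the signature and checkable with mypy.
    def send_to(dest: "Store", chunk_size: int) -> None:
        ...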

Snapshot not deleted.

When syncing, buttersink is able to make a new snapshot.
When it needs to delete one I get:
Delete subvolume 162/snapshot
ERROR: Device hasn't been succesfully opened. Use 'with' statement..

Regards,

Simon

stuck and eats 100% cpu

Sometimes it gets stuck in an infinite loop and eats 100% CPU.

I can't kill it (kill -9 does nothing), and strace shows nothing, so it seems it is not making any syscalls at all.

synchronize with USB disk

Hi,

I was under the impression that it should also be possible to synchronize snapshots between local btrfs filesystems; however, when I try this I get an error, which seems to be due to the fact that both the source and destination have the same rootid.

$sudo ./buttersink.py -n btrfs:///home/.snapshot/ btrfs:///media/BackupDisk1/ -l log.txt

Traceback (most recent call last):
  File "./buttersink.py", line 168, in main
    dest = parseSink(args.dest, source is not None, args.dry_run)
  File "./buttersink.py", line 152, in parseSink
    return Sinks[parts['method']](parts['host'], parts['path'], isDest, dryrun)
  File "/home/jschrod/Downloads/System/BackupProgs/buttersink/buttersink/ButterStore.py", line 53, in __init__
    self._fillVolumesAndPaths()
  File "/home/jschrod/Downloads/System/BackupProgs/buttersink/buttersink/ButterStore.py", line 71, in _fillVolumesAndPaths
    for bv in mount.subvolumes:
  File "/home/jschrod/Downloads/System/BackupProgs/buttersink/buttersink/btrfs.py", line 485, in     subvolumes
    self._getRoots()
  File "/home/jschrod/Downloads/System/BackupProgs/buttersink/buttersink/btrfs.py", line 647, in _getRoots
    info,
  File "/home/jschrod/Downloads/System/BackupProgs/buttersink/buttersink/btrfs.py", line 371, in __init__
    assert rootid not in Volume.volumes, rootid
AssertionError: 5

here's the last lines from the log file:

2015-02-08 11:22:06,452:   DEBUG:btrfs.py[552] _getMounts(): /: /media/BackupDisk1
2015-02-08 11:22:06,452:   DEBUG:btrfs.py[595] _walkTree(): Reading 14 nodes from 3992 bytes
2015-02-08 11:22:06,453:   DEBUG:btrfs.py[352] __init__(): Volume 5/0:    StructureTuple(inode=StructureTuple(generation=1, transid=0, size=3, nbytes=16384, block_group=0, nlink=1, uid=0, gid=0, mode=16877, rdev=0, flags=18446744071562067968L, sequence=0, reserved='', atime=StructureTuple(sec=0, nsec=0), ctime=StructureTuple(sec=0, nsec=0), mtime=StructureTuple(sec=0, nsec=0), otime=StructureTuple(sec=0, nsec=0)), generation=28, root_dirid=256, bytenr=31227904, byte_limit=0, bytes_used=16384, last_snapshot=0, flags=0, refs=1, drop_progress=StructureTuple(objectid=0, type=0, offset=0), drop_level=0, level=0, generation_v2=28, uuid=None, parent_uuid=None, received_uuid=None, ctransid=28, otransid=0, stransid=0, rtransid=0, ctime=StructureTuple(sec=1423352843, nsec=480916699), otime=StructureTuple(sec=0, nsec=0), stime=StructureTuple(sec=0, nsec=0), rtime=StructureTuple(sec=0, nsec=0), reserved='')
2015-02-08 11:22:06,453:   ERROR:buttersink.py[222] main(): 
Traceback (most recent call last):
  File "./buttersink.py", line 168, in main
    dest = parseSink(args.dest, source is not None, args.dry_run)
  File "./buttersink.py", line 152, in parseSink
    return Sinks[parts['method']](parts['host'], parts['path'], isDest, dryrun)
  File "/home/jschrod/Downloads/System/BackupProgs/buttersink/buttersink/ButterStore.py", line 53, in __init__
    self._fillVolumesAndPaths()
  File "/home/jschrod/Downloads/System/BackupProgs/buttersink/buttersink/ButterStore.py", line 71, in _fillVolumesAndPaths
    for bv in mount.subvolumes:
  File "/home/jschrod/Downloads/System/BackupProgs/buttersink/buttersink/btrfs.py", line 485, in subvolumes
    self._getRoots()
  File "/home/jschrod/Downloads/System/BackupProgs/buttersink/buttersink/btrfs.py", line 647, in _getRoots
    info,
  File "/home/jschrod/Downloads/System/BackupProgs/buttersink/buttersink/btrfs.py", line 371, in  __init__
    assert rootid not in Volume.volumes, rootid
AssertionError: 5                    

$sudo btrfs subvolume list /media/BackupDisk1/

ID 257 gen 18 top level 5 path @ButtersinkBackups
ID 258 gen 19 top level 257 path @ButtersinkBackups/@home
ID 259 gen 11 top level 257 path @ButtersinkBackups/@jschrod-Pictures
ID 260 gen 12 top level 257 path @ButtersinkBackups/@jschrod-Uni
ID 261 gen 13 top level 257 path @ButtersinkBackups/@jschrod-Misc 
ID 262 gen 14 top level 257 path @ButtersinkBackups/@jschrod-Documents
ID 263 gen 15 top level 257 path @ButtersinkBackups/@chris-Pictures
ID 264 gen 16 top level 257 path @ButtersinkBackups/@chris-Documents
ID 265 gen 17 top level 257 path @ButtersinkBackups/@chris-Desktop
ID 266 gen 18 top level 257 path @ButtersinkBackups/@chris-Downloads
ID 267 gen 28 top level 5 path @home

$sudo btrfs subvolume list /home/

ID 325 gen 1996 top level 5 path .snapshot/monthly_2015-01-15_12:30:40
ID 341 gen 2720 top level 5 path .snapshot/weekly_2015-01-18_06:57:16
ID 425 gen 8387 top level 5 path .snapshot/weekly_2015-01-25_08:46:03
ID 529 gen 11064 top level 5 path .snapshot/weekly_2015-02-01_09:45:57
ID 539 gen 11567 top level 5 path .snapshot/daily_2015-02-02_10:19:18
ID 548 gen 12033 top level 5 path .snapshot/daily_2015-02-03_09:22:20
ID 555 gen 12549 top level 5 path .snapshot/daily_2015-02-04_09:56:22
ID 574 gen 13593 top level 5 path .snapshot/daily_2015-02-05_11:59:55
ID 586 gen 14452 top level 5 path .snapshot/daily_2015-02-06_12:16:58
ID 595 gen 14784 top level 5 path .snapshot/daily_2015-02-07_06:53:20
ID 597 gen 14896 top level 5 path .snapshot/hourly_2015-02-07_08:17:01
ID 598 gen 14986 top level 5 path .snapshot/hourly_2015-02-07_09:17:01
ID 599 gen 15071 top level 5 path .snapshot/hourly_2015-02-07_17:17:01
ID 600 gen 15073 top level 5 path .snapshot/hourly_2015-02-07_18:17:01
ID 601 gen 15076 top level 5 path .snapshot/hourly_2015-02-07_19:17:01
ID 602 gen 15079 top level 5 path .snapshot/hourly_2015-02-07_20:17:01
ID 603 gen 15110 top level 5 path .snapshot/hourly_2015-02-07_21:17:01
ID 604 gen 15151 top level 5 path .snapshot/hourly_2015-02-07_22:17:01
ID 605 gen 15168 top level 5 path .snapshot/daily_2015-02-08_06:23:32
ID 606 gen 15177 top level 5 path .snapshot/weekly_2015-02-08_06:28:51
ID 607 gen 15259 top level 5 path .snapshot/hourly_2015-02-08_07:17:01
ID 608 gen 15296 top level 5 path .snapshot/hourly_2015-02-08_09:17:01
ID 609 gen 15336 top level 5 path .snapshot/hourly_2015-02-08_10:17:02
ID 610 gen 15406 top level 5 path .snapshot/hourly_2015-02-08_11:17:01

Am I doing something wrong?

buttersink and btrfs quota issues

When I run buttersink on a directory with snapshots, it hangs with the message "Waiting for btrfs quota usage scan"

When I run "btrfs quota rescan -s /", it seems as if the quota scan never seems to end, as the ID never changes.

Is there any way I can abort a quota scan, or perhaps force it to rescan?
I'm guessing this process is rather essential for buttersink's functionality?

Remove CRC?

Hi,
I'm currently trying to install buttersink on Arch Linux from the AUR. However, it has another AUR dependency, the crcmod package. I have not looked at the code, but can't you use something other than CRC, like SHA-512 sums? Wouldn't this also be safer? I might be totally wrong though.

Edit: the dependency list is also missing from the README. It would be cool if you could add that and mark which deps are optional.

Truncate error during btrfs send

This is a new issue for the problems discussed in #14.

Youam,

You may be running into one of various btrfs tools and kernel bugs, for example:

http://forum.rockstor.com/t/apparent-issues-with-quotas-and-snapshots/252
http://www.spinics.net/lists/linux-btrfs/msg48968.html
https://www.mail-archive.com/[email protected]&q=subject:%22Re%3A+btrfs+send+and+kernel+3.17%22&o=newest&f=1
markfasheh/duperemove#50
http://www.spinics.net/lists/linux-btrfs/msg42279.html

These errors are not fully understood yet, but here are a few things to try to see if you can narrow down the problem:

  • Do a rebalance: sudo btrfs bal start <path>
  • Turn quota off: sudo btrfs quota disable <path>
    (of course buttersink will turn quota back on the next time it's run)
  • Try a scrub: sudo btrfs scrub start <path>
  • Try updating your kernel or btrfs tools

I'm opening a new issue for this.

Snapshot source search controls [Feature Request]

Add flags that control how the search for snapshots in the source directory is done (a rough argparse sketch follows the lists below).

Some of the flags from tar/rsync such as:

  • --exclude
  • --exclude-from
  • --include
  • --include-from

Some specific things like:

  • --no-recurse (since recurse is the default) only take snapshots that are direct children of the source path
  • Stop at subvolume boundaries (e.g. don't take snapshots that are inside another subvolume)
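
A rough argparse sketch of what those flags might look like (hypothetical option names, mirroring tar/rsync; not an existing buttersink interface):

    import argparse

    parser = argparse.ArgumentParser(prog="buttersink")
    parser.add_argument("--exclude", action="append", default=[], metavar="PATTERN",
                        help="skip snapshots matching PATTERN")
    parser.add_argument("--exclude-from", metavar="FILE",
                        help="read exclude patterns from FILE")
    parser.add_argument("--include", action="append", default=[], metavar="PATTERN",
                        help="only take snapshots matching PATTERN")
    parser.add_argument("--no-recurse", action="store_true",
                        help="only take snapshots that are direct children of the source")

    args = parser.parse_args(["--exclude", "hourly_*", "--no-recurse"])
    print(args.exclude, args.no_recurse)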

Feature request: easier Glacier integration

Writing transition rules to Amazon Glacier is problematic.
The main issue is with .bs files, as the glacier transition rules can't easily avoid them.

I see a few options:

  1. do not create the .bs files at all. What are they to begin with, and what is their benefit?
  2. implement logic to gracefully handle a .bs file that has been archived to Glacier
  3. automatically tag all snapshot files, but not the .bs files, with a tag, e.g. "archivable=True". This way one can create a rule that only transitions the snapshots to Glacier (see the sketch below).
  4. ?
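
For option 3, a rough boto3 sketch of tagging every snapshot object except the .bs index files, plus a lifecycle rule keyed on that tag; the bucket name is made up and I don't know whether buttersink's layout makes this safe:

    import boto3

    s3 = boto3.client("s3")
    bucket = "my-buttersink-bucket"  # hypothetical bucket name

    # Tag everything except the .bs index files as archivable.
    for page in s3.get_paginator("list_objects_v2").paginate(Bucket=bucket):
        for obj in page.get("Contents", []):
            if obj["Key"].endswith(".bs"):
                continue
            s3.put_object_tagging(
                Bucket=bucket, Key=obj["Key"],
                Tagging={"TagSet": [{"Key": "archivable", "Value": "True"}]})

    # A lifecycle rule filtered on the tag then moves only snapshots to Glacier.
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration={"Rules": [{
            "ID": "snapshots-to-glacier",
            "Status": "Enabled",
            "Filter": {"Tag": {"Key": "archivable", "Value": "True"}},
            "Transitions": [{"Days": 60, "StorageClass": "GLACIER"}],
        }]})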

Diffs are not calculated correctly

I have two distinct subvolumes and took about 15 snapshots per subvolume. Their sizes are 40 GiB and 450 GiB respectively.

When I take another snapshot, since only a few MiB of data have changed and the earlier snapshots are kept, only the changes are sent. The next snapshot goes the same way.

Sometimes (like minutes ago) I take another snapshot and expect only about 100 MiB of data to have changed, but buttersink decides to send nearly every snapshot again. The weird part is that not every snapshot is really being sent: even though the progress output says "1.5 GiB sent", my laptop's wifi card is not that fast (it reports 5+ Gbps).

How can I debug this situation?

Transfer size indication is broken

I have invoked buttersink to transfer snapshots from one filesystem to the other as follows:

$ sudo ./buttersink.py /mnt/mem/snapshots/ /mnt/fah/BACKUP/snapshots/mem/

The current partial output is:

  Optimal synchronization:
  2.4 TiB from 2 diffs in btrfs /mnt/fah/BACKUP/snapshots/mem
  1.082 TiB from 374 diffs in btrfs /mnt/mem/snapshots
  3.482 TiB from 376 diffs in TOTAL
  Keep: aeda...1b4d /mnt/fah/BACKUP/snapshots/mem/btrfs-mem-backup-1459375237 from None (1.2 TiB)
  Keep: fcf5...a56e /mnt/fah/BACKUP/snapshots/mem/btrfs-mem-backup-1459374875 from None (1.2 TiB)
  Xfer: 7b69...2383 /mnt/mem/snapshots/snapshot-1460156461 from aeda...1b4d /mnt/mem/snapshots/btrfs-mem-backup-1459375237 (~439.2 MiB)
 0:13:42.781178: Sent 5.524 GiB of 439.2 MiB (1287%) ETA: None (57.7 Mbps )                           
  Xfer: 419b...5bcf /mnt/mem/snapshots/snapshot-1459512061 from aeda...1b4d /mnt/mem/snapshots/btrfs-mem-backup-1459375237 (~107.4 MiB)
 0:04:09.215211: Sent 5.139 GiB of 107.4 MiB (4901%) ETA: None (177 Mbps )                            
  Xfer: e897...4e4a /mnt/mem/snapshots/snapshot-1460250061 from aeda...1b4d /mnt/mem/snapshots/btrfs-mem-backup-1459375237 (~4.398 GiB)

The percentages end up being larger than 100%, so something is wrong somewhere. (The percentages get incremented and go over 100% while progress information is being displayed.)

(I should also point out that buttersink's heuristics do not seem to be performing so well in this case. The snapshots here were created by a crontab, with diff sizes much smaller than the total volume size, so I think the optimal plan would be to transfer the diffs successively. However, buttersink is apparently trying to transfer the delta against a much older state of the data because it already exists on the destination filesystem.)

Transfer fails if source snapshots are stored in subdirectories

Snapper stores snapshots in numbered subdirectories.
For example, my remote server's /root snapshot paths are matched by the regex:

/.snapshots/[0-9]+/snapshot

The backup process is initiated by the remote data storage host.

buttersink -eq ssh://example.com/.snapshots/ btrfs:///var/backups/example.com/rootfs/

Snapshots are copied only on the first run, with an empty destination folder.
Subsequent runs yield an error like:

  Duplicate effective uuid 77e29850-6f0a-5d42-911f-6a38a65324e1 in '/var/backups/example.com/rootfs/snapshot' and '/var/backups/example.com/rootfs/86/snapshot'
  ERROR: [Errno 17] File exists: u'/var/backups/example.com/rootfs'.

I also note that buttersink creates a snapshot directory /var/backups/example.com/rootfs/snapshot/ which is not present on the source system.

buttersink -V                                                                                        
buttersink 0.6.8

No snapshots found when btrfs is bind mounted

I have a btrfs root mounted at /mnt/btrfs, and then I have a bind mount of /mnt/btrfs at /srv/nfs4/main. If I run something like buttersink /mnt/btrfs/snapshots it returns "No snapshots in source", despite the snapshots being there. Instead, I have to run buttersink /srv/nfs4/main/snapshots.

Are quotas required?

I was about to start experimenting with buttersink. I first tried btrfslist but I cancelled it when it said it was doing a quota scan. I don't need quotas and don't have them enabled. And from the warnings from btrfs people, I don't want them enabled, or anything to do with them.

I then wondered whether buttersink requires quotas.

Details of "Intelligent selection of full and incremental transfers"

I can see from using buttersink that not all snapshots are dependent on previous snapshots when uploaded to S3: scrolling down the list of files in the S3 file explorer, I can see that there are breaks in the places where a .bs file exists. I assume these breaks are what make the backups more resilient, with occasional full backups rather than every snapshot constantly depending on the last.

My reason for asking is that I want to set up long-term retention in Glacier storage. Once moved to Glacier, the files cannot be directly accessed via S3. Are there any guarantees as to when full backups will be made? Can I safely move data older than 60 days to Glacier and still be able to access, say, the last day, week, or month of backups?

I am interested in the exact system behind these decisions and hope there could be some documentation on this, either in the README or the wiki.

It would also be interesting/useful to know exactly how the data is structured on the S3 side. For example, would it be possible to restore from S3 without buttersink, using btrfs receive and piping the files stored on S3 into it (obviously by first following the snapshot dependencies backwards and then importing them in forward order)?
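
As a sketch of the kind of manual restore I have in mind, assuming each snapshot object is a plain btrfs send stream that can be replayed in dependency order (I don't know buttersink's actual S3 layout, and the key names here are invented):

    import subprocess
    import boto3

    def restore(bucket, keys, dest_dir):
        """Pipe S3 objects into btrfs receive, full snapshot first, then diffs (sketch)."""
        s3 = boto3.client("s3")
        for key in keys:
            body = s3.get_object(Bucket=bucket, Key=key)["Body"]
            recv = subprocess.Popen(["btrfs", "receive", dest_dir],
                                    stdin=subprocess.PIPE)
            for chunk in iter(lambda: body.read(1 << 20), b""):
                recv.stdin.write(chunk)
            recv.stdin.close()
            if recv.wait() != 0:
                raise RuntimeError("btrfs receive failed for %s" % key)

    # restore("my-bucket", ["snap-full", "snap-2018-01-03", "snap-2018-01-04"],
    #         "/mnt/restore")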

"No snapshots in source" across mountpoints

Whenever I try to use buttersink across mount points I get the error: "No snapshots in source."

Example command:

buttersink /mnt/ssd/.snapshots/ /mnt/hdd/.snapshots/

But it works fine inside the same mountpoint:

buttersink /mnt/ssd/.snapshots/ /mnt/hdd/.snapshots-2/

In these examples, /mnt/ssd and /mnt/hdd are the mountpoints of 2 separate disks.

The issue even occurs across mount points of the same disk, if (for example) /dev/sda was mounted in both /mnt/sda1 and /mnt/sda2/ and I were to run:

buttersink /mnt/sda1/.snapshots/ /mnt/sda2/.snapshots-2/

I would get the same error.

Unclear output meaning

What is the difference between these two lines?

Keep: ff80...9f8d /media/BACKUP/owncloud/daily_2015-09-29_07:41:09 from None (237.5 MiB)
Keep: 1164...42ef /media/BACKUP/owncloud/daily_2015-10-02_07:58:12 from b9b3...ab1e /media/BACKUP/owncloud/daily_2015-10-01_08:11:30 (~3.551 MiB)

after calling

buttersink -d /media/RAID/owncloud/.snapshot/ /media/BACKUP/owncloud/

In particular, the "from" value of the 2nd line is confusing me.

Psutil error in Ubuntu 14.04 LTS

Looks like a newer version of the library is required for that ionice call... Perhaps it'd be a good idea to make it optional and throw a warning if the system version of psutil doesn't support that? (If you think that might be a good idea, I can fork and then submit a pull request with the changes at some stage...)

Traceback:

zeta@zetaX220:~$ sudo buttersink /media/bulkdata/test/.snapshots/ ssh://root@zetaserver/mnt/pool/testing_subvol/
root@zetaserver's password:
Remote version: {u'btrfs': u'Btrfs v3.14.1', u'buttersink': u'0.6', u'linux': u'Linux-3.13.0-48-generic-x86_64-with-Ubuntu-14.04-trusty'}

Traceback (most recent call last):
  File "build/bdist.linux-x86_64/egg/buttersink/buttersink.py", line 233, in main
    best.analyze(args.part_size << 20, source, dest)
  File "build/bdist.linux-x86_64/egg/buttersink/BestDiffs.py", line 143, in analyze
    edge.sink.measureSize(edge, chunkSize)
  File "build/bdist.linux-x86_64/egg/buttersink/ButterStore.py", line 209, in measureSize
    allowDryRun=False,
  File "build/bdist.linux-x86_64/egg/buttersink/Butter.py", line 109, in send
    ps.ionice(psutil.IOPRIO_CLASS_IDLE)
AttributeError: 'Process' object has no attribute 'ionice'

Python and psutil versions:

zeta@zetaX220:~/repos/projects/buttersink$ python
Python 2.7.6 (default, Mar 22 2014, 22:59:56)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.

>>> import psutil
>>> psutil.__version__
'1.2.1'

(As a secondary note, I've already updated btrfs-tools to 3.14, as the repository version in 14.04 is only 3.12.)
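
Something like this guard is what I had in mind (sketch only, not the actual Butter.py code):

    import logging
    import psutil

    def lower_io_priority(pid):
        """Best-effort: drop to idle I/O priority if this psutil supports it."""
        ps = psutil.Process(pid)
        try:
            ps.ionice(psutil.IOPRIO_CLASS_IDLE)
        except AttributeError:
            # Old psutil (e.g. 1.2.1 on 14.04) has no Process.ionice().
            logging.warning("psutil %s has no ionice(); continuing without idle I/O priority",
                            getattr(psutil, "__version__", "unknown"))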

ERROR: empty stream is not considered valid

The send operation completed, but then it failed with the error above. Here's the full output from the operation:

  Optimal synchronization:
  16.16 GiB from 27 diffs in btrfs /srv/nfs4/main/snapshots
  16.16 GiB from 27 diffs in TOTAL
  Xfer: 2a29...de86 /srv/nfs4/main/snapshots/monthly.11 from None (5.262 GiB)
 0:03:08.499894: Sent 5.225 GiB of 5.262 GiB (99%) ETA: 0:00:01.338814 (238 Mbps )                     
  btrfs receive errors
At subvol monthly.11
ERROR: empty stream is not considered valid
  ERROR: receive /mnt/backups/monthly.11 returned error 1..

Here's the relevant end from the log:

2017-01-27 21:15:46,512:    INFO:buttersink.py[245]: Optimal synchronization:
2017-01-27 21:15:46,512:    INFO:buttersink.py[250]: 16.16 GiB from 27 diffs in btrfs /srv/nfs4/main/snapshots
2017-01-27 21:15:46,513:    INFO:buttersink.py[250]: 16.16 GiB from 27 diffs in TOTAL
2017-01-27 21:15:46,513:    INFO:Store.py[343]: Xfer: 2a29...de86 /srv/nfs4/main/snapshots/monthly.11 from None (5.262 GiB)
2017-01-27 21:15:48,586:   DEBUG:Store.py[124]: [u'monthly.11']
2017-01-27 21:15:48,586:   DEBUG:Butter.py[87]: Command: ['btrfs', 'receive', '-e', u'/mnt/backups']
2017-01-27 21:15:50,592:   DEBUG:Butter.py[110]: Command: ['btrfs', 'send', u'/srv/nfs4/main/snapshots/monthly.11']
2017-01-27 21:15:51,246:   DEBUG:send.py[180]: Setting received 2a29abae-fa50-8b44-8ee3-e0b907b2de86/32890 and parent None/0
2017-01-27 21:15:51,261:   DEBUG:send.py[190]: Version: 1
2017-01-27 21:15:51,261:   DEBUG:send.py[198]: Command: 1
2017-01-27 21:15:51,261:   DEBUG:send.py[268]: Subvol: 2a29abae-fa50-8b44-8ee3-e0b907b2de86/32890 monthly.11
2017-01-27 21:15:51,263:   DEBUG:send.py[180]: Setting received 2a29abae-fa50-8b44-8ee3-e0b907b2de86/32890 and parent None/0
2017-01-27 21:15:51,272:   DEBUG:send.py[190]: Version: 1
2017-01-27 21:15:51,272:   DEBUG:send.py[198]: Command: 1
2017-01-27 21:15:51,273:   DEBUG:send.py[268]: Subvol: 2a29abae-fa50-8b44-8ee3-e0b907b2de86/32890 monthly.11
2017-01-27 21:18:59,095:   DEBUG:Butter.py[226]: Waiting for send process to finish...
2017-01-27 21:18:59,096:   DEBUG:Butter.py[147]: Waiting for receive process to finish...
2017-01-27 21:18:59,103:   ERROR:Butter.py[161]: btrfs receive errors
2017-01-27 21:18:59,103:   DEBUG:Butter.py[174]: Renamed /mnt/backups/monthly.11 to /mnt/backups/monthly.11_2017-01-27T21:18:59.103487.part
2017-01-27 21:18:59,104:   DEBUG:buttersink.py[274]: Trace information for debugging
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/buttersink/buttersink.py", line 257, in main
    diff.sendTo(dest, chunkSize=args.part_size << 20)
  File "/usr/local/lib/python2.7/dist-packages/buttersink/Store.py", line 354, in sendTo
    transfer(sendContext, receiveContext, chunkSize)
  File "/usr/local/lib/python2.7/dist-packages/buttersink/Store.py", line 276, in transfer
    writer.write(data)
  File "/usr/local/lib/python2.7/dist-packages/buttersink/Butter.py", line 179, in __exit__
    % (self.path, self.process.returncode, )
Exception: receive /mnt/backups/monthly.11 returned error 1.
2017-01-27 21:18:59,106:   ERROR:buttersink.py[275]: ERROR: receive /mnt/backups/monthly.11 returned error 1..

I'm not sure if it completely transferred, as there is a slight discrepancy in size.

Here's the original:

sudo btrfs fi du -s monthly.11
     Total   Exclusive  Set shared  Filename
   5.21GiB       0.00B     5.21GiB  monthly.11

Here's what was transferred:

sudo btrfs fi du -s monthly.11_2017-01-27T21:18:59.103487.part
     Total   Exclusive  Set shared  Filename
   5.19GiB     5.19GiB       0.00B  monthly.11_2017-01-27T21:18:59.103487.part

I am using kernel 4.8.0 with btrfs-progs 4.9 on Debian Stable.

Couple of "no such file" errors

Hi!

I mainly use your programme to sync my snapshots to a local drive. Here are some commands that produce errors:

$ buttersink -n /home /var/run/media/username/Backup_Drive
[Errno 2] No such file or directory: '/'
Waiting for btrfs quota usage scan...
No snapshots in source.
Try adding a '/' to '/home'.

Next:

$ buttersink -n /home /dev/mapper/luks-800349b4-b1e6-4b96-b9d7-5a050b191f86
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/buttersink/buttersink.py", line 201, in main
    dest = parseSink(args.dest, source is not None, args.delete, args.dry_run)
  File "/usr/lib/python2.7/site-packages/buttersink/buttersink.py", line 183, in parseSink
    return Sinks[parts['method']](host, path, mode, dryrun)
  File "/usr/lib/python2.7/site-packages/buttersink/ButterStore.py", line 43, in __init__
    raise Exception("'%s' is not an existing directory" % (self.userPath))
Exception: '/dev/mapper/luks-800349b4-b1e6-4b96-b9d7-5a050b191f86' is not an existing directory

In this example, I simply made a mistake and passed the LUKS container (dm-crypt/LUKS) as the second argument instead of the btrfs partition inside it. But it probably shouldn't fail like that.

Some more information:

$ ls -l /dev/mapper/luks-800349b4-b1e6-4b96-b9d7-5a050b191f86
lrwxrwxrwx 1 root root 7 2015-04-17 14:07 /dev/mapper/luks-800349b4-b1e6-4b96-b9d7-5a050b191f86 -> ../dm-0

But:

$ ls -l /dev/dm-0 
brw-rw---- 1 root disk 254, 0 2015-04-17 14:07 /dev/dm-0

Thanks anyway for this tool!

Buttersink fails with an error

Hello,

I'm trying to copy a dirvish archive to another host, but it fails with an error...

root@giles:/media# buttersink -l /tmp/buttersink.log /media/btrfs/ ssh://viki.net.e-net.sk/mnt/btrfs/
Remote version: {u'btrfs': u'Btrfs v3.17', u'buttersink': u'0.6.8', u'linux': u'Linux-3.16.0-4-amd64-x86_64-with-glibc2.4'}
measured size (477.4 GiB), estimated size (477.4 GiB)
Optimal synchronization:
477.4 GiB from 140 diffs in btrfs /media/btrfs
477.4 GiB from 140 diffs in TOTAL
Xfer: 09b1...e2be /media/btrfs/dirvish/iptv-mcast/20160729-0244/tree from None (8.527 MiB)
0:00:00.500888: Sent 11.75 MiB of 8.527 MiB (137%) ETA: None (197 Mbps )
ERROR: {u'traceback': u' File "/usr/local/lib/python2.7/dist-packages/crcmod/_crcfunpy.py", line 73, in _crc32r', u'errorType': u'TypeError', u'command': u'write', u'server': True, u'error': u'ord() expected string of length 1, but int found'}.

The debug log is attached. The messages are too cryptic for me ... is it possible to tell from the debug log file what went wrong, please?

Cheers,
Martin

buttersink_log.zip

Source and Destination relative to block device [Feature Request]

I am not sure if this is something that would be considered useful, or within the scope of this project, but I would like to see the ability to reference locations relative to the root of a block device so that buttersink can be used on unmounted disks.

The format for a location would be:

[btrfs://]<block dev>//<path relative to root of block dev>/[snapshot]

In terms of implementation, it would probably require mounting the block device at a temporary location (which is unmounted and deleted once the operation completes).

I have implemented this using a wrapper script for my dockerisation of ButterSink, and it works for my purposes. Please take a look at the bash script for rough pseudocode of what would be implemented, bearing in mind that:

  • You would want to mount at a temporary location (not /mnt/<block dev> as I do): in a Docker container I can guarantee that /mnt/<block dev> is not in use, but the same cannot be said when running on a host machine.
  • I haven't implemented unmounting, as everything is unmounted at the end of the container's lifetime.

If you think the feature would be useful but don't have time to make the changes, I can do this and send a PR; if not, I understand that this is probably a very niche scenario.
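
A rough sketch of the temporary-mount approach in Python (made-up helper; real code would need more error handling and privilege checks):

    import contextlib
    import os
    import subprocess
    import tempfile

    @contextlib.contextmanager
    def mounted(block_dev, subpath=""):
        """Mount a block device at a throwaway directory for the duration."""
        mountpoint = tempfile.mkdtemp(prefix="buttersink-")
        subprocess.check_call(["mount", block_dev, mountpoint])
        try:
            yield os.path.join(mountpoint, subpath)
        finally:
            subprocess.check_call(["umount", mountpoint])
            os.rmdir(mountpoint)

    # with mounted("/dev/sdb1", "snapshots") as src:
    #     ... run the transfer against src ...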

dry-run doesn't list subvolumes which would be deleted in <dest> if -d is given

--dry-run

sudo buttersink -n -d /media/RAID/owncloud/.snapshot/ /media/BACKUP/owncloud/
Optimal synchronization:
5.857 GiB from 2 diffs in btrfs /media/BACKUP/owncloud
5.857 GiB from 2 diffs in TOTAL
Keep: ade2...2203 /media/BACKUP/owncloud/daily_2015-09-26_07:37:40 from None (360 MiB)
Keep: 057e...c4f3 /media/BACKUP/owncloud/daily_2015-09-27_07:56:30 from None (360 MiB)

"normal run" (without --dry-run)

sudo buttersink -d /media/RAID/owncloud/.snapshot/ /media/BACKUP/owncloud/
Optimal synchronization:
5.857 GiB from 2 diffs in btrfs /media/BACKUP/owncloud
5.857 GiB from 2 diffs in TOTAL
Keep: ade2...2203 /media/BACKUP/owncloud/daily_2015-09-26_07:37:40 from None (360 MiB)
Keep: 057e...c4f3 /media/BACKUP/owncloud/daily_2015-09-27_07:56:30 from None (360 MiB)
Delete subvolume weekly_2015-09-10_07:41:02

Btrfs <-> Btrfs transfer optimization and multiple sources

I'm personally interested in the btrfs <-> btrfs transfer scenario.
btrfs send has an option, -c <clone-src>, that allows an unlimited number of snapshots to be used as data sources for CoW.
This option provides many more opportunities for diff-size optimization and simplifies the algorithms, since the filesystem does the job itself as far as I can see.
Buttersink does not support the concept of multiple sources for now, and probably never will, but I write it here for further consideration.
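
For reference, the kind of invocation -c enables, sketched with subprocess (paths are made up; not something buttersink does today):

    import subprocess

    # Send new_snap while letting btrfs clone data from several existing
    # snapshots that are present on both sides, not just one parent.
    clone_sources = ["/mnt/src/snap1", "/mnt/src/snap2"]
    cmd = ["btrfs", "send"]
    for c in clone_sources:
        cmd += ["-c", c]
    cmd += ["-p", "/mnt/src/snap2", "/mnt/src/new_snap"]

    send = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    recv = subprocess.Popen(["btrfs", "receive", "/mnt/dest"], stdin=send.stdout)
    send.stdout.close()
    recv.wait()
    send.wait()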

Resolve .local hostnames or IP address

Using either a HOST.local address or an IP address results in a message like this:

ssh: Could not resolve hostname 192.168.X.X:: Name or service not known

even though ssh will happily connect to the host given the IP address.

buttersink seems to give wrong results when quota rescan is in progress

I have a btrfs filesystem where I enabled quota with btrfs quota enable PATH.

The quota rescan is in progress, as evidenced by:

$ sudo btrfs quota rescan PATH
ERROR: quota rescan failed: Operation now in progress

However, running ./buttersink.py PATH/subvolumes (where subvolumes is a folder of the filesystem containing snapshots) happily succeeds and reports a size for each subvolume. The sizes are bogus, and they keep increasing each time I run the command.

Shouldn't buttersink wait for the quota rescan to complete, the same way it does when it takes the initiative of enabling quota itself?

empty directory for subvolume not copied over

I have two subvolumes: root and /usr/portage.
When I use btrfs to make a snapshot of /, there is an empty directory at /usr/portage.
When I copy this snapshot over with buttersink, this empty directory is gone.

Keep folder structure [Feature Request]

Currently, it seems that buttersink recurses through the source directory looking for snapshots at any depth, and then outputs them as a flat hierarchy in the destination. It would be nice if there were a --keep-structure flag that kept the structure relative to the source when copying to the destination. Where folders don't exist in the destination, they could be created; and where there are subvolumes within the source, it might also be useful to have a --keep-subvolumes flag that creates a subvolume (rather than an ordinary folder) to match the source (only applicable where the destination is btrfs).
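
A minimal sketch of the path mapping such a flag could use (hypothetical helper, not existing buttersink code):

    import os

    def dest_path(src_root, snap_path, dest_root, keep_structure):
        """Where to place a received snapshot if --keep-structure existed (sketch)."""
        if keep_structure:
            rel = os.path.relpath(snap_path, src_root)
            return os.path.join(dest_root, rel)  # mirror the source layout
        # Current behaviour as described above: flatten into the destination.
        return os.path.join(dest_root, os.path.basename(snap_path))

    print(dest_path("/mnt/ssd/.snapshots", "/mnt/ssd/.snapshots/37/snapshot",
                    "/mnt/hdd/.snapshots", keep_structure=True))
    # -> /mnt/hdd/.snapshots/37/snapshot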

Path /dir/of/dest/ exists, can't receive

Hi,

Trying my first buttersink using command:
buttersink.py /dir/of/src/ ssh://user@dest_server/dir/of/dest/

[user@src_server]# python /usr/lib/python2.7/site-packages/buttersink/buttersink.py /dir/of/src ssh://user@dest_server/dir/of/dest/
user@src_server's password:
Remote version: {u'btrfs': u'btrfs-progs v4.3.1', u'buttersink': u'0.6.7', u'linux': u'Linux-4.3.3-1.el7.elrepo.x86_64-x86_64-with-redhat-3.8.10-Core'}
measured size (7.317 GiB), estimated size (7.317 GiB)
Optimal synchronization:
7.317 GiB from 1 diffs in btrfs /dir/of/src
7.317 GiB from 1 diffs in TOTAL
Xfer: edab...a1f1 /dir/of/src from None (7.317 GiB)
ERROR: {u'traceback': u' File "/usr/lib/python2.7/site-packages/buttersink/ButterStore.py", line 179, in receive', u'errorType': u'Exception', u'command': u'receive', u'server': True, u'error': u"Path /dir/of/dest exists, can't receive edabab74-824e-ff48-bf2b-7f6a1de0a1f1"}

If I delete the destination path, it complains "S|ERROR: '/dir/of/dest' is not an existing directory". Any ideas?

How can we automatically delete partial transfers?

When a synchronization is interrupted, the following error is thrown on the next run:

...
/mnt/erik/snapshots/rootfs/rootfs.20170526T0150 (1.268 GiB)
  ERROR: {u'traceback': u'  File "build/bdist.linux-x86_64/egg/buttersink/ButterStore.py", line 179, in receive', u'errorType': u'Exception', u'command': u'receive', u'server': True, u'error': u"Path /mnt/aea3/snapshots/rootfs/rootfs.20170526T1708 exists, can't receive b118c47c-3128-fc40-8999-1da4c89c4dec"}.

How can I tell buttersink to remove the problematic transfer automatically (maybe delete or mark as "to be deleted")?

How does ssh syncing work?

I'm failing to understand whether this requires an existing snapshot to be on the destination in order to sync, or whether it will handle the initial sync. I'm trying to do a sync using ssh: if I select a pre-existing snapshot on the destination, it tells me it exists so it cannot receive; if I select a non-existing name (for an initial sync), it tells me it cannot because it doesn't exist. I really need an example that works using ssh.

t420-ssd t420 # buttersink -d boot.20150915/ ssh://[email protected]/btrfs/
[email protected]'s password: 
  Remote version: {u'btrfs': u'Btrfs v3.17', u'buttersink': u'0.6.7', u'linux': u'Linux-4.4.1-sunxi-armv7l-with-debian-8.0'}
  measured size (110.3 MiB), estimated size (110.3 MiB)
  Optimal synchronization:
  110.3 MiB from 1 diffs in btrfs /btrfs/t420/boot.20150915
  110.3 MiB from 1 diffs in TOTAL
  Xfer: bcc4...9870 /btrfs/t420/boot.20150915 from None (110.3 MiB)
  ERROR: {u'traceback': u'  File "build/bdist.linux-armv7l/egg/buttersink/ButterStore.py", line 179, in receive', u'errorType': u'Exception', u'command': u'receive', u'server': True, u'error': u"Path /btrfs exists, can't receive bcc4fe8d-3573-2743-a16b-dd77b7279870"}.
t420-ssd t420 # buttersink -d boot.20150915/ ssh://[email protected]/btrfs/test
[email protected]'s password: 
S|ERROR: '/btrfs/test' is not an existing directory.

  ERROR: Fatal remote ssh server error.
t420-ssd t420 # buttersink -d boot.20150915/ ssh://[email protected]/btrfs/test/
[email protected]'s password: 
S|ERROR: '/btrfs/test' is not an existing directory.

  ERROR: Fatal remote ssh server error.
t420-ssd t420 # buttersink -d boot.20150915/ ssh://[email protected]/btrfs/cubie-home.20160413
[email protected]'s password: 
S|ERROR: '/btrfs/cubie-home.20160413' is not an existing directory.

  ERROR: Fatal remote ssh server error.
t420-ssd t420 # buttersink -d boot.20150915/ ssh://[email protected]/btrfs/cubie-home.20150413/
[email protected]'s password: 
  Remote version: {u'btrfs': u'Btrfs v3.17', u'buttersink': u'0.6.7', u'linux': u'Linux-4.4.1-sunxi-armv7l-with-debian-8.0'}
  measured size (110.3 MiB), estimated size (110.3 MiB)
  Optimal synchronization:
  110.3 MiB from 1 diffs in btrfs /btrfs/t420/boot.20150915
  110.3 MiB from 1 diffs in TOTAL
  Xfer: bcc4...9870 /btrfs/t420/boot.20150915 from None (110.3 MiB)
  ERROR: {u'traceback': u'  File "build/bdist.linux-armv7l/egg/buttersink/ButterStore.py", line 179, in receive', u'errorType': u'Exception', u'command': u'receive', u'server': True, u'error': u"Path /btrfs/cubie-home.20150413 exists, can't receive bcc4fe8d-3573-2743-a16b-dd77b7279870"}.

buttersink uses the same snapshot as basis

I have a series of snapshots made with snapper.
Basically the idea is to make a snapshot of root every hour. The first one is snapshot 1, the second one snapshot 2, etc.
Imagine I have 5 snapshots, nr 1 to 5. It seems like buttersink copies one over (say nr 3) and then, for the other ones, sends over the diffs against nr 3.
To me it makes more sense to send nr 1 over, then diff 2 against 1, diff 3 against 2, etc. That way the diffs are smaller.
To make it more general: when different snapshots have the same parent, handle them alphabetically or numerically in ascending order.

Hope this makes sense to you.

Regards,

Simon

Fails on snapshot transmission with type error

I'm trying to run buttersink for the first time, but it aborts when copying the snapshot. Running it with the dry-run flag finishes successfully, but the actual transmission of the snapshot fails, with the following log snippet:

2015-11-13 21:35:03,800: DEBUG:Butter.py[226]: Waiting for send process to finish...
2015-11-13 21:35:03,801: ERROR:Butter.py[230]: btrfs send errors
2015-11-13 21:35:03,802: DEBUG:Butter.py[147]: Waiting for receive process to finish...
2015-11-13 21:35:03,805: DEBUG:buttersink.py[268]: Trace information for debugging
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/buttersink/buttersink.py", line 251, in main
    diff.sendTo(dest, chunkSize=args.part_size << 20)
  File "/usr/local/lib/python2.7/dist-packages/buttersink/Store.py", line 352, in sendTo
    transfer(sendContext, receiveContext, chunkSize)
  File "/usr/local/lib/python2.7/dist-packages/buttersink/Store.py", line 264, in transfer
    data = reader.read(chunkSize)
  File "/usr/local/lib/python2.7/dist-packages/buttersink/Butter.py", line 250, in read
    self.diff.fromGen,
  File "/usr/local/lib/python2.7/dist-packages/buttersink/send.py", line 237, in replaceIDs
    crc = calcCRC()
  File "/usr/local/lib/python2.7/dist-packages/buttersink/send.py", line 223, in calcCRC
    crc = crc32c(btrfs_cmd_header.write(header), crc)
  File "/usr/local/lib/python2.7/dist-packages/crcmod/crcmod.py", line 450, in crcfun
    return xorOut ^ fun(data, xorOut ^ crc, table)
  File "/usr/local/lib/python2.7/dist-packages/crcmod/_crcfunpy.py", line 73, in _crc32r
    crc = table[ord(x) ^ int(crc & 0xFFL)] ^ (crc >> 8)
TypeError: ord() expected string of length 1, but int found
2015-11-13 21:35:03,812: ERROR:buttersink.py[269]: ERROR: ord() expected string of length 1, but int found.

buttersink is running from yesterday's 'pip' install; the system is running a current Debian jessie with kernel 4.2.0-0.bpo.1-amd64. Upgrading btrfs-tools from jessie's 3.17-1.1 to stretch's 4.3 did not change the error.

@ char not recognized on ssh destination

Can't do a transfer if the destination is ssh and the path contains @:

buttersink -n /Source_Backup//@home.20161128 ssh://root@server/dest/@backup/

=> S|ERROR: '/dest/%40backup' is not an existing directory

Add optional compression and resume support to remote (with ssh) btrfs send/receive

Hi, thanks for this good software, it is very useful.
I found some possible improvements to make it optimal also with big remote transfert with very low bandwidth connections and possibile errors.
I think can be very useful add for remote case optional compression send with resume and bandwidth limit support.
This seems possible saving btrfs send output to file, optionally compress it, send it with rsync that can resume partial files and have also bandwidth limit support (that can be also added optional), on remote destination uncompress it (if needed) and do btrfs receive of the file. If fails maintain the "btrfs send" file and resume the transfert (with rsync).
I did some quick manual tests to check whether and how this works:
I did btrfs send with -f to save to a file instead.
I compressed it with gzip, which seems a good compromise between space and the resources/time used.
Then I sent it with rsync --partial for resume support (including simulating common problems), for example:
rsync --partial --progress --rsh=ssh fullsend.gz remotehost:/mnt/btests/fullsend.gz
It is also possible to add a bandwidth limit by adding --bwlimit=.
Then I uncompressed it and did btrfs receive on the destination.
Everything is OK except for the UUID of the snapshot, which is needed to use it for diffs with buttersink; I did not find a btrfs-tools command to change the UUID the way buttersink does, but I suppose that if these improvements were included in buttersink it would be possible while keeping the current buttersink features.

I don't have experience with Python; could someone with experience add these features, which I think are very useful (or essential in many cases)?
The optimal result, I suppose, would be these additional parameters:
-r --resume-support: save the btrfs send/receive data to a file and make it possible to transfer it with resume support; useful for remote transfers, avoiding redoing everything from the start in case of an error or a needed "transfer pause"
-c --compress: compress the btrfs send/receive data with gzip; useful for low bandwidth; requires -r
-b --bandwidth-limit: limit the remote transfer rate; requires -r

Thanks for any reply, and sorry for my bad English.
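
To make the proposed pipeline concrete, here is a rough sketch of the manual procedure described above (paths are made up; a real implementation inside buttersink would look different):

    import subprocess

    snapshot = "/mnt/btrfs/snap-2017-05-01"   # hypothetical read-only snapshot
    staging = "/var/tmp/fullsend.gz"
    remote = "remotehost:/mnt/btests/fullsend.gz"

    # 1. Save the send stream to a compressed file instead of piping it directly.
    with open(staging, "wb") as out:
        send = subprocess.Popen(["btrfs", "send", snapshot], stdout=subprocess.PIPE)
        subprocess.check_call(["gzip", "-c"], stdin=send.stdout, stdout=out)
        send.stdout.close()
        if send.wait() != 0:
            raise RuntimeError("btrfs send failed")

    # 2. Transfer with resume support and an optional bandwidth cap (KB/s);
    #    rerun the same command to resume after an interruption.
    subprocess.check_call(["rsync", "--partial", "--progress", "--bwlimit=500",
                           "--rsh=ssh", staging, remote])

    # 3. On the destination: gunzip the file, then run btrfs receive -f on it.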

ERROR: failed to dump stream. Broken pipe

Hi,

I'm trying to set up a 2nd host that needs to copy all data on a btrfs volume from a 1st host.
Installation went fine, but when running I get the following error:

root@host02-sync:/mnt/urbackup-btrfs# buttersink -n -d ssh://host02/mnt/urbackup-btrfs/ /mnt/urbackup-btrfs/
  Remote version: {u'btrfs': u'Btrfs v3.17', u'buttersink': u'0.6.8', u'linux': u'Linux-3.16.0-4-amd64-x86_64-with-debian-8.7'}
S|Measuring 296d...3b0f /mnt/urbackup-btrfs/srvmedia/170502-1128 from b06f...ba7d /mnt/urbackup-btrfs/srvmedia/170506-1114 (~160.4 MiB)

S|btrfs send errors
At subvol /mnt/urbackup-btrfs/srvmedia/170502-1128
ERROR: failed to dump stream. Broken pipe
  ERROR: {u'traceback': u'  File "/usr/local/lib/python2.7/dist-packages/crcmod/_crcfunpy.py", line 73, in _crc32r', u'errorType': u'TypeError', u'command': u'measure', u'server': True, u'error': u'ord() expected string of length 1, but int found'}.

Just listing the snapshots does seem to work:

root@host02-sync:/mnt/urbackup-btrfs# buttersink -n -d ssh://host02/mnt/urbackup-btrfs/
  Remote version: {u'btrfs': u'Btrfs v3.17', u'buttersink': u'0.6.8', u'linux': u'Linux-3.16.0-4-amd64-x86_64-with-debian-8.7'}
296d14b6-c0a8-7d4b-a56b-f17d98e63b0f /mnt/urbackup-btrfs/srvmedia/170502-1128 (7.215 GiB 2.066 MiB exclusive)
f72540be-e48a-9848-b2d3-ed52e5761174 /mnt/urbackup-btrfs/srvmedia/170502-1632 (7.214 GiB 2.113 MiB exclusive)
8885e787-8604-5549-a640-36023bc1f806 /mnt/urbackup-btrfs/srvmedia/170502-2135 (7.206 GiB 2.473 MiB exclusive)

Am I doing something wrong?
thanks,
Stijn

buttersink depends on python-psutil and python-crcmod but does not install them

buttersink installs various dependencies when you do make, but when I run it afterwards I get:

Traceback (most recent call last):
  File "./buttersink.py", line 22, in <module>
    import ButterStore
  File "/home/a3nm/apps/buttersink/buttersink/ButterStore.py", line 10, in <module>
    import Butter
  File "/home/a3nm/apps/buttersink/buttersink/Butter.py", line 13, in <module>
    import psutil
ImportError: No module named psutil

I think python-psutil should be added to apt.txt. Likewise, python-crcmod should be added as a dependency.

Fails on Debian Wheezy 64-bit

Hi, I'm using Debian Wheezy 64-bit on my server. I run Linux kernel 3.19 with the PF patchset.
btrfs tools version 3.14.1

When I try to do anything with buttersink, it crashes with a bunch of Python error messages.
For example, simply calling:
"buttersink /Path/To/SnapShots/"

Produces the following output.

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/buttersink-0.6-py2.7.egg/buttersink/buttersink.py", line 214, in main
    with source:
  File "/usr/local/lib/python2.7/dist-packages/buttersink-0.6-py2.7.egg/buttersink/Store.py", line 74, in __enter__
    self._fillVolumesAndPaths(self.paths)
  File "/usr/local/lib/python2.7/dist-packages/buttersink-0.6-py2.7.egg/buttersink/ButterStore.py", line 67, in _fillVolumesAndPaths
    for bv in mount.subvolumes:
  File "/usr/local/lib/python2.7/dist-packages/buttersink-0.6-py2.7.egg/buttersink/btrfs.py", line 483, in subvolumes
    self._getRoots()
  File "/usr/local/lib/python2.7/dist-packages/buttersink-0.6-py2.7.egg/buttersink/btrfs.py", line 635, in _getRoots
    info = btrfs_root_item.read(buf)
  File "/usr/local/lib/python2.7/dist-packages/buttersink-0.6-py2.7.egg/buttersink/ioctl.py", line 226, in read
    args = list(self._struct.unpack_from(data, offset))
TypeError: unpack_from() argument 1 must be string or read-only buffer, not memoryview

Any idea what might be the cause? Off the top of my head, I would assume some libs are too old in Debian Wheezy (Stable), since I had similar problems with a btrfs dedup tool.

buttersink uses wrong sizes when finding best sync path

I've got a number of snapshots, which are daily rsyncs of a directory tree, where sometimes I delete the rsync destination. So for most days, snapN and snapN+1 are nearly identical; but when I want a "full" rsync backup, snapN and snapN+1 contain nearly-identical files, yet because I rsynced against an empty destination instead of yesterday's, the actual btrfs file extents are completely different. For example:

name   size   diff to yesterday
day01  500MB  ./.
day02  510MB  50MB
day03  505MB  40MB
day04  490MB  490MB  <<<
day05  505MB  40MB

Now, I'd think the best diff algorithm would figure out that there are two different clusters, day01..day03 and day04..day05, where all diffs between two days in a cluster are fairly small, and diffs from one cluster to the next are fairly large and should be avoided. But that's not what I see. Instead, buttersink selects one day (which looks fairly random to me), e.g. day04, and transmits that in full. Then it selects some other day and diffs that against the transmitted day04, e.g. day05. Those two days are in the same cluster, so everything is fine; it transfers about 40 MB.

But when it figures out the diff size between days in one cluster and the other, something wrong happens. The size I'm shown looks like it comes from an actual /usr/bin/diff between the two snapshots, instead of a btrfs send, e.g. an estimated size of ~30 MB between day04 and day03 instead of the actual 505 MB, since day03 and day04 don't share any extents.

Of course, that's throwing the full algorithm off, and I end up with lots of wasted (time and) space on the destination filesystem:

 Xfer: 5e6d...4d48 /srv/backup/snapshot/youam.de/cailin.int.youam.de/2015-11-04T03:32:25+0000 from a676...44b4 /srv/backup/snapshot/youam.de/cailin.int.youam.de/2015-11-03T03:38:03+0000 (~5.242 MiB)
0:00:01.387560: Sent 4.566 MiB of 5.242 MiB (87%) ETA: 0:00:00.205553 (27.6 Mbps )                     
 Xfer: d27b...2f40 /srv/backup/snapshot/youam.de/cailin.int.youam.de/2015-11-11T03:45:01+0000 from aba1...e172 /srv/backup/snapshot/youam.de/cailin.int.youam.de/2015-11-09T05:36:57+0000 (~2.196 MiB)
0:00:19.540641: Sent 145.5 MiB of 2.196 MiB (6624%) ETA: None (62.5 Mbps )                     

I'm seeing this happen both with the --estimate flag and without it. What am I doing wrong here? (btrfs quota was not used at snapshot creation time and was only enabled by buttersink itself, if that matters)
