welbymcroberts / cloudfiles-sync

A script to sync a set of files / directories with Rackspace Cloud Files (or any OpenStack storage provider)

Home Page: https://github.com/welbymcroberts/cloudfiles-sync/wiki

Python 100.00%

cloudfiles-sync's People

Contributors

mancdaz, mstevens, welbymcroberts


cloudfiles-sync's Issues

allow passing of directory as an argument

Rather than manually issuing a "find" statement to pass a file list via stdin, make the script accept an argument which is either a directory or a file. If a directory, issue a find on the contents of the directory. If a file, issue a find on the file itself.
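
A minimal sketch of the requested behaviour (gather_files and its structure are illustrative, not taken from cfsync.py):

import os
import sys

def gather_files(path):
    # If given a directory, walk it (the equivalent of `find path -type f`);
    # if given a regular file, yield just that one file.
    if os.path.isdir(path):
        for root, dirs, names in os.walk(path):
            for name in names:
                yield os.path.join(root, name)
    elif os.path.isfile(path):
        yield path

# Fall back to the existing stdin behaviour when no argument is given.
if len(sys.argv) > 1:
    files = gather_files(sys.argv[1])
else:
    files = (line.rstrip('\n') for line in sys.stdin)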

make the parsed directory the 'basename' for uploads

Whether the directory contents are supplied manually via an issued 'find' or as an argument to the script, always make the supplied directory the base directory for uploads.

e.g.
find pics/ -type f | python /home/xbmc/cloudfiles-sync/cfsync.py
find /mnt/ter1/pics/ -type f | python /home/xbmc/cloudfiles-sync/cfsync.py

Both of these should result in the following structure in cloudfiles:

pics/filename.1
pics/filename.2
pics/directory1/filename.3

etc
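
A sketch of the requested mapping (remote_name and its arguments are illustrative names, not from the script):

import os

def remote_name(base_dir, local_path):
    # Strip everything above the supplied directory, so the directory
    # itself becomes the top-level prefix of the object name.
    parent = os.path.dirname(os.path.abspath(base_dir))
    return os.path.relpath(os.path.abspath(local_path), parent)

# Both invocations from the example then map to the same object name:
#   remote_name('pics', 'pics/directory1/filename.3')
#     -> 'pics/directory1/filename.3'
#   remote_name('/mnt/ter1/pics', '/mnt/ter1/pics/directory1/filename.3')
#     -> 'pics/directory1/filename.3'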

ResponseError: 401: Unauthorized after a long run

I've been trying to use it to back up a large (200 GB, 40,000 files) directory:

./cloud-sync.py -d DEBUG -T 1 -f /tmp/log -u xxxxxxxxxxxxxxxx -k xxxxxxxx --authurl https://auth.storage.memset.com/v1.0 /home/xxxxxxxxxxxx/xxxxxx/ swift://xxxxxx

It works, but after a while it produces this error:

2011-10-17 16:28:52,986 cloud-sync   DEBUG    5345565 completed of 5349661 - 99%
2011-10-17 16:28:52,986 cloud-sync   DEBUG    5345565 completed of 5349661 - 99%
2011-10-17 16:28:53,259 cloud-sync   INFO     xxxx/xxxxxxxxxxxxxxxxxxxxxx/xxxxxxxxxxxxxxxxxxxxx.xxx completed
2011-10-17 16:28:53,259 cloud-sync   DEBUG    Returning Connection to the pool
2011-10-17 16:28:53,259 cloud-sync   DEBUG    Run 2497
2011-10-17 16:28:53,260 cloud-sync   DEBUG    Getting Connection
2011-10-17 16:28:53,260 cloud-sync   INFO     Saving cf://xxxxxx:xxxx/xxxxxxxxxxxxxxxxxxxxxx/xxxxxxxxxxxxxxxxx.xxx to /home/xxxxxxxxxxxx/xxxxxx/xxxx/xxxxxxxxxxxxxxxxxxxxxx/xxxxxxxxxxxxxxxxx.xxx
2011-10-17 16:28:54,500 cloud-sync   DEBUG    Returning Connection to the pool
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 552, in __bootstrap_inner
    self.run()
  File "./cloud-sync.py", line 22, in run
    self.work()
  File "./cloud-sync.py", line 34, in work
    quote(task['file'],'/'))
  File "/home/ncw/Code/cloudfiles-sync/cloud_providers/swift.py", line 179, in put
    connection.get_container(container).create_object(remote).load_from_filename(local,callback=self.callback)
  File "/usr/lib/pymodules/python2.7/cloudfiles/connection.py", line 341, in get_container
    raise ResponseError(response.status, response.reason)
ResponseError: 401: Unauthorized

I think this is happening when the auth token expires after one hour. The python-cloudfiles module has code to get a new token when this happens but I suspect either your use of threading or ConnectionPool is breaking it. I'm using version 1.7.9.2 of the python-cloudfiles module.
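
A minimal sketch of one possible fix, assuming cloudfiles.get_connection accepts an authurl keyword (the put_with_reauth helper and its retry policy are illustrative; cloudfiles-sync's worker threads would need equivalent handling around the ConnectionPool):

import cloudfiles
from cloudfiles.errors import ResponseError

def put_with_reauth(username, api_key, authurl, container_name, remote, local):
    # Retry the upload once with a freshly authenticated connection if the
    # (roughly one-hour) auth token has expired mid-run.
    connection = cloudfiles.get_connection(username, api_key, authurl=authurl)
    for attempt in (0, 1):
        try:
            obj = connection.get_container(container_name).create_object(remote)
            obj.load_from_filename(local)
            return
        except ResponseError as e:
            if e.status == 401 and attempt == 0:
                # Token expired: authenticate again rather than reusing the
                # stale connection.
                connection = cloudfiles.get_connection(
                    username, api_key, authurl=authurl)
            else:
                raise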

add option to disable md5 hash compare

Add an option to the option file to allow disabling the MD5 hash comparison, so that resuming a sync of a massive number of files (if interrupted for any reason) doesn't take an age running through the hashes again.

Or you might just be happy with a file-date comparison.
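
A sketch of what the comparison could look like with such an option (needs_upload and skip_md5 are illustrative names, not from the option file):

import hashlib
import os

def needs_upload(local_path, remote_size, remote_etag, skip_md5=False):
    # With skip_md5 set, a resumed sync compares only the file size,
    # avoiding a full re-read of every local file.
    if skip_md5:
        return os.path.getsize(local_path) != remote_size
    md5 = hashlib.md5()
    with open(local_path, 'rb') as f:
        for chunk in iter(lambda: f.read(1 << 20), b''):
            md5.update(chunk)
    return md5.hexdigest() != remote_etag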

Speed up file comparison

Rather than iterating over remote_files for every local file on every run, change this to something much more efficient.
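
For example (local_files, remote_name_for, differs and upload stand in for the script's existing data and helpers), build a lookup table once instead of scanning remote_files per file:

remote_index = dict((obj.name, obj) for obj in remote_files)

for local_file in local_files:
    remote = remote_index.get(remote_name_for(local_file))
    if remote is None or differs(local_file, remote):
        upload(local_file)

This turns a scan proportional to (local files x remote files) into a single pass over remote_files plus constant-time lookups.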

Feature request: keep a local cache of filenames and MD5SUMs

When I sync a big directory to Cloud Files, it spends ages going through every file and computing its MD5SUM, even though nothing has changed, before it transfers a byte of data.

It would be great if cloudfiles-sync could cache the MD5SUMs along with the path, size, and modification time of each file. Then it could read this file quickly to see if any MD5SUMs needed updating, which would save an awful lot of disk thrashing!
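
A minimal sketch of such a cache (the file location and cached_md5 helper are illustrative): re-hash a file only when its size or mtime has changed since the last run.

import hashlib
import json
import os

CACHE_PATH = os.path.expanduser('~/.cfsync-md5cache.json')  # illustrative

def cached_md5(path, cache):
    # Reuse the stored MD5SUM when size and mtime are unchanged.
    st = os.stat(path)
    entry = cache.get(path)
    if entry and entry['size'] == st.st_size and entry['mtime'] == st.st_mtime:
        return entry['md5']
    md5 = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(1 << 20), b''):
            md5.update(chunk)
    cache[path] = {'size': st.st_size, 'mtime': st.st_mtime,
                   'md5': md5.hexdigest()}
    return cache[path]['md5']

# Load the cache once per run and save it at the end:
#   cache = json.load(open(CACHE_PATH)) if os.path.exists(CACHE_PATH) else {}
#   ... cached_md5(path, cache) for each file ...
#   json.dump(cache, open(CACHE_PATH, 'w'))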
