welbymcroberts / cloudfiles-sync

A script to sync a set of files / directories with Rackspace Cloud Files (or any OpenStack storage provider)

Home Page: https://github.com/welbymcroberts/cloudfiles-sync/wiki

Python 100.00%

cloudfiles-sync's People

Contributors

mancdaz, mstevens, welbymcroberts


cloudfiles-sync's Issues

allow passing of directory as an argument

Rather than manually issuing a "find" statement to pass a file list via stdin, make the script accept an argument which is either a directory or a file. If a directory, issue a find on the contents of the directory. If a file, issue a find on the file itself.
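
A minimal sketch of the requested behaviour (gather_files and its structure are illustrative, not taken from cfsync.py):

import os
import sys

def gather_files(path):
    # If given a directory, walk it (the equivalent of `find path -type f`);
    # if given a regular file, yield just that one file.
    if os.path.isdir(path):
        for root, dirs, names in os.walk(path):
            for name in names:
                yield os.path.join(root, name)
    elif os.path.isfile(path):
        yield path

# Fall back to the existing stdin behaviour when no argument is given.
if len(sys.argv) > 1:
    files = gather_files(sys.argv[1])
else:
    files = (line.rstrip('\n') for line in sys.stdin)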

make the parsed directory the 'basename' for uploads

Whether the directory contents are supplied manually via an issued 'find' or as an argument to the script, always make the supplied directory the base directory for uploads.

e.g.
find pics/ -type f | python /home/xbmc/cloudfiles-sync/cfsync.py
find /mnt/ter1/pics/ -type f | python /home/xbmc/cloudfiles-sync/cfsync.py

Both of these should result in the following structure in cloudfiles:

pics/filename.1
pics/filename.2
pics/directory1/filename.3

etc
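
A sketch of the requested mapping (remote_name and its arguments are illustrative names, not from the script):

import os

def remote_name(base_dir, local_path):
    # Strip everything above the supplied directory, so the directory
    # itself becomes the top-level prefix of the object name.
    parent = os.path.dirname(os.path.abspath(base_dir))
    return os.path.relpath(os.path.abspath(local_path), parent)

# Both invocations from the example then map to the same object name:
#   remote_name('pics', 'pics/directory1/filename.3')
#     -> 'pics/directory1/filename.3'
#   remote_name('/mnt/ter1/pics', '/mnt/ter1/pics/directory1/filename.3')
#     -> 'pics/directory1/filename.3'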

ResponseError: 401: Unauthorized after a long run

I've been trying to use it to back up a large (200 GB, 40,000 files) directory:

./cloud-sync.py -d DEBUG -T 1 -f /tmp/log -u xxxxxxxxxxxxxxxx -k xxxxxxxx --authurl https://auth.storage.memset.com/v1.0 /home/xxxxxxxxxxxx/xxxxxx/ swift://xxxxxx

It works, but after a while it produces this error:

2011-10-17 16:28:52,986 cloud-sync   DEBUG    5345565 completed of 5349661 - 99%
2011-10-17 16:28:52,986 cloud-sync   DEBUG    5345565 completed of 5349661 - 99%
2011-10-17 16:28:53,259 cloud-sync   INFO     xxxx/xxxxxxxxxxxxxxxxxxxxxx/xxxxxxxxxxxxxxxxxxxxx.xxx completed
2011-10-17 16:28:53,259 cloud-sync   DEBUG    Returning Connection to the pool
2011-10-17 16:28:53,259 cloud-sync   DEBUG    Run 2497
2011-10-17 16:28:53,260 cloud-sync   DEBUG    Getting Connection
2011-10-17 16:28:53,260 cloud-sync   INFO     Saving cf://xxxxxx:xxxx/xxxxxxxxxxxxxxxxxxxxxx/xxxxxxxxxxxxxxxxx.xxx to /home/xxxxxxxxxxxx/xxxxxx/xxxx/xxxxxxxxxxxxxxxxxxxxxx/xxxxxxxxxxxxxxxxx.xxx
2011-10-17 16:28:54,500 cloud-sync   DEBUG    Returning Connection to the pool
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 552, in __bootstrap_inner
    self.run()
  File "./cloud-sync.py", line 22, in run
    self.work()
  File "./cloud-sync.py", line 34, in work
    quote(task['file'],'/'))
  File "/home/ncw/Code/cloudfiles-sync/cloud_providers/swift.py", line 179, in put
    connection.get_container(container).create_object(remote).load_from_filename(local,callback=self.callback)
  File "/usr/lib/pymodules/python2.7/cloudfiles/connection.py", line 341, in get_container
    raise ResponseError(response.status, response.reason)
ResponseError: 401: Unauthorized

I think this is happening when the auth token expires after one hour. The python-cloudfiles module has code to get a new token when this happens but I suspect either your use of threading or ConnectionPool is breaking it. I'm using version 1.7.9.2 of the python-cloudfiles module.
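
A minimal sketch of one possible fix, assuming cloudfiles.get_connection accepts an authurl keyword (the put_with_reauth helper and its retry policy are illustrative; cloudfiles-sync's worker threads would need equivalent handling around the ConnectionPool):

import cloudfiles
from cloudfiles.errors import ResponseError

def put_with_reauth(username, api_key, authurl, container_name, remote, local):
    # Retry the upload once with a freshly authenticated connection if the
    # (roughly one-hour) auth token has expired mid-run.
    connection = cloudfiles.get_connection(username, api_key, authurl=authurl)
    for attempt in (0, 1):
        try:
            obj = connection.get_container(container_name).create_object(remote)
            obj.load_from_filename(local)
            return
        except ResponseError as e:
            if e.status == 401 and attempt == 0:
                # Token expired: authenticate again rather than reusing the
                # stale connection.
                connection = cloudfiles.get_connection(
                    username, api_key, authurl=authurl)
            else:
                raise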

add option to disable md5 hash compare

Add an option to the option file to allow disabling the MD5 hash comparison, so that resuming a sync of a massive number of files (if interrupted for any reason) doesn't take an age running through the hashes again.

Or you might just be happy with a file-date comparison.
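
A sketch of what the comparison could look like with such an option (needs_upload and skip_md5 are illustrative names, not from the option file):

import hashlib
import os

def needs_upload(local_path, remote_size, remote_etag, skip_md5=False):
    # With skip_md5 set, a resumed sync compares only the file size,
    # avoiding a full re-read of every local file.
    if skip_md5:
        return os.path.getsize(local_path) != remote_size
    md5 = hashlib.md5()
    with open(local_path, 'rb') as f:
        for chunk in iter(lambda: f.read(1 << 20), b''):
            md5.update(chunk)
    return md5.hexdigest() != remote_etag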

Speed up file comparison

Rather than iterating over remote_files for every local file on every run, change this to something much more efficient.
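
For example (local_files, remote_name_for, differs and upload stand in for the script's existing data and helpers), build a lookup table once instead of scanning remote_files per file:

remote_index = dict((obj.name, obj) for obj in remote_files)

for local_file in local_files:
    remote = remote_index.get(remote_name_for(local_file))
    if remote is None or differs(local_file, remote):
        upload(local_file)

This turns a scan proportional to (local files x remote files) into a single pass over remote_files plus constant-time lookups.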

Feature request: keep a local cache of filenames and MD5SUMs

When I sync a big directory to Cloud Files, it spends ages going through every file and computing its MD5SUM, even though nothing has changed, before it transfers a byte of data.

It would be great if cloudfiles-sync could cache the MD5SUMs along with the path, size, and modification time of each file. Then it could read this file quickly to see if any MD5SUMs needed updating, which would save an awful lot of disk thrashing!
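
A minimal sketch of such a cache (the file location and cached_md5 helper are illustrative): re-hash a file only when its size or mtime has changed since the last run.

import hashlib
import json
import os

CACHE_PATH = os.path.expanduser('~/.cfsync-md5cache.json')  # illustrative

def cached_md5(path, cache):
    # Reuse the stored MD5SUM when size and mtime are unchanged.
    st = os.stat(path)
    entry = cache.get(path)
    if entry and entry['size'] == st.st_size and entry['mtime'] == st.st_mtime:
        return entry['md5']
    md5 = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(1 << 20), b''):
            md5.update(chunk)
    cache[path] = {'size': st.st_size, 'mtime': st.st_mtime,
                   'md5': md5.hexdigest()}
    return cache[path]['md5']

# Load the cache once per run and save it at the end:
#   cache = json.load(open(CACHE_PATH)) if os.path.exists(CACHE_PATH) else {}
#   ... cached_md5(path, cache) for each file ...
#   json.dump(cache, open(CACHE_PATH, 'w'))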
