
tablesnap's Introduction

MAINTAINERS WANTED

Tablesnap

Theory of Operation

Tablesnap is a script that uses inotify to monitor a directory for IN_MOVED_TO events and reacts to them by spawning a new thread to upload that file to Amazon S3, along with a JSON-formatted list of what other files were in the directory at the time of the copy.

When running a Cassandra cluster, this behavior can be quite useful as it allows for automated point-in-time backups of SSTables. Theoretically, tablesnap should work for any application where files are written to some temporary location, then moved into their final location once the data is written to disk. Tablesnap also makes the assumption that files are immutable once written.
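A minimal sketch of this mechanism using pyinotify, the library tablesnap is built on (the handler below is a simplified stand-in, not tablesnap's actual code):

import pyinotify

class Handler(pyinotify.ProcessEvent):
    def process_IN_MOVED_TO(self, event):
        # tablesnap would hand the file off to an S3 upload thread here,
        # together with a JSON listing of the directory's current contents.
        print('would upload %s' % event.pathname)

wm = pyinotify.WatchManager()
wm.add_watch('/var/lib/cassandra/data', pyinotify.IN_MOVED_TO)
pyinotify.Notifier(wm, Handler()).loop()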

Installation

The simplest way to install tablesnap is from the Python Package Index (PyPI): https://pypi.python.org/pypi/tablesnap

pip install tablesnap

This distribution provides a debian/ source directory, allowing it to be built as a standard Debian/Ubuntu package and stored in a repository. The Debian package includes an init script that can run and daemonize tablesnap for you. Tablesnap does not daemonize itself. This is best left to tools like init, supervisord, daemontools, etc.

We do not currently maintain binary packages of tablesnap. To build the debian package from source, assuming you have a working pbuilder environment:

git checkout debian
git-buildpackage --git-upstream-branch=master --git-debian-branch=debian --git-builder='pdebuild'

The daemonized version of the Debian/Ubuntu package uses syslog for logging. The messages are sent to the DAEMON logging facility and tagged with tablesnap. If you want to redirect the log output to a log file other than /var/log/daemon.log, you can filter by this tag. For example, if you are using syslog-ng you could add

# tablesnap
filter f_tablesnap { filter(f_daemon) and match("tablesnap" value("PROGRAM")); };
destination d_tablesnap { file("/var/log/tablesnap.log"); };
log { source(s_src); filter(f_tablesnap); destination(d_tablesnap); flags(final); };

to /etc/syslog-ng/syslog-ng.conf.

If you are not a Debian/Ubuntu user or do not wish to install the tablesnap package, you may copy the tablesnap script anywhere you'd like and run it from there. Tablesnap depends on the pyinotify and boto Python packages. These are available via pip (pip install pyinotify; pip install boto) or as packages from most common Linux distributions.

Configuration

All configuration for tablesnap happens on the command line. If you are using the Debian package, you'll set these options in the DAEMON_OPTS variable in /etc/default/tablesnap.
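For example, a hypothetical /etc/default/tablesnap might look like this (the credentials, bucket, and path are placeholders):

DAEMON_OPTS="-k AAAAAAAAAAAAAAAA -s BBBBBBBBBBBBBBBB -r -a -B me.synack.sstables /var/lib/cassandra/data"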

usage: tablesnap [-h] -k AWS_KEY -s AWS_SECRET [-r] [-a] [-B] [-p PREFIX]
                 [--without-index] [--keyname-separator KEYNAME_SEPARATOR]
                 [-t THREADS] [-n NAME] [-e EXCLUDE | -i INCLUDE]
                 [--listen-events {IN_MOVED_TO,IN_CLOSE_WRITE,IN_CREATE}]
                 [--max-upload-size MAX_UPLOAD_SIZE]
                 [--multipart-chunk-size MULTIPART_CHUNK_SIZE]
                 bucket paths [paths ...]

Tablesnap is a script that uses inotify to monitor a directory for events and
reacts to them by spawning a new thread to upload that file to Amazon S3,
along with a JSON-formatted list of what other files were in the directory at
the time of the copy.

positional arguments:
  bucket                S3 bucket
  paths                 Paths to be watched

optional arguments:
  -h, --help            show this help message and exit
  -k AWS_KEY, --aws-key AWS_KEY
  -s AWS_SECRET, --aws-secret AWS_SECRET
  -r, --recursive       Recursively watch the given path(s) for new SSTables
  -a, --auto-add        Automatically start watching new subdirectories within
                        path(s)
  -B, --backup          Backup existing files to S3 if they are not already
                        there
  -p PREFIX, --prefix PREFIX
                        Set a string prefix for uploaded files in S3
  --without-index       Do not store a JSON representation of the current
                        directory listing in S3 when uploading a file to S3.
  --keyname-separator KEYNAME_SEPARATOR
                        Separator for the keyname between name and path.
  -t THREADS, --threads THREADS
                        Number of writer threads
  -n NAME, --name NAME  Use this name instead of the FQDN to identify the
                        files from this host
  -e EXCLUDE, --exclude EXCLUDE
                        Exclude files matching this regular expression from
                        upload. WARNING: If neither exclude nor include are
                        defined, then all files matching "-tmp" are excluded.
  -i INCLUDE, --include INCLUDE
                        Include only files matching this regular expression
                        into upload. WARNING: If neither exclude nor include
                        are defined, then all files matching "-tmp" are
                        excluded.
  --listen-events {IN_MOVED_TO,IN_CLOSE_WRITE,IN_CREATE}
                        Which events to listen on, can be specified multiple
                        times. Values: IN_MOVED_TO, IN_CLOSE_WRITE, IN_CREATE
                        (default: IN_MOVED_TO, IN_CLOSE_WRITE)
  --max-upload-size MAX_UPLOAD_SIZE
                        Max size for files to be uploaded before doing
                        multipart (default 5120M)
  --multipart-chunk-size MULTIPART_CHUNK_SIZE
                        Chunk size for multipart uploads (default: 256M or 10%
                        of free memory if default is not available)

For example:

$ tablesnap -k AAAAAAAAAAAAAAAA -s BBBBBBBBBBBBBBBB me.synack.sstables /var/lib/cassandra/data/GiantKeyspace

This would cause tablesnap to use the given Amazon Web Services credentials to back up the SSTables for my GiantKeyspace to the S3 bucket named me.synack.sstables.

Questions, Comments, and Help

The fine folks in #cassandra-ops on irc.freenode.net are an excellent resource for getting tablesnap up and running, and also for solving more general Cassandra issues.

tablesnap's People

Contributors

ameng, amorton, anonymoose, avandendorpe, ctavan, ethan-eb, eyalr, jeremygrosser, juiceblender, mheffner, motivator, qhartman, riccardo-ferrari, robingustafsson, russss, tamsky, thekad

tablesnap's Issues

Upload paths differ between initial and inotify-upload

When passing a relative path to tablesnap, the initial backup_files upload will not use absolute paths for the files in the S3 bucket, whereas the inotify-triggered uploads will.

Say you are in /home/foo and start tablesnap bucket bar/; the initial files will be uploaded as hostname:bar/file, whereas the inotify-triggered files will be uploaded as hostname:/home/foo/bar/file.

Which one is the intended behavior?
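For reference, a minimal illustration of the difference described above (hypothetical values, assuming the working directory is /home/foo):

import os

relative = 'bar/file'
print(relative)                   # key path per the initial backup pass: bar/file
print(os.path.abspath(relative))  # pathname carried by the inotify event: /home/foo/bar/file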

tablesnap crashes with md5 mismatch error from S3

Hi There!

I have tablesnap running on Ubuntu with Cassandra 3.7, and it is crashing frequently with the error below (not always on the same file).

Is this normal?

Thank you for your help!

Nov 16 19:18:08 ip-10-0-0-29 tablesnap[14931]: Traceback (most recent call last):
Nov 16 19:18:08 ip-10-0-0-29 tablesnap[14931]:   File "/home/ubuntu/.local/bin/tablesnap", line 133, in worker
Nov 16 19:18:08 ip-10-0-0-29 tablesnap[14931]:     self.upload_sstable(bucket, keyname, f)
Nov 16 19:18:08 ip-10-0-0-29 tablesnap[14931]:   File "/home/ubuntu/.local/bin/tablesnap", line 394, in upload_sstable
Nov 16 19:18:08 ip-10-0-0-29 tablesnap[14931]:     encrypt_key=self.with_sse)
Nov 16 19:18:08 ip-10-0-0-29 tablesnap[14931]:   File "/home/ubuntu/.local/lib/python2.7/site-packages/boto/s3/key.py", line 1374, in set_contents_from_filename
Nov 16 19:18:08 ip-10-0-0-29 tablesnap[14931]:     encrypt_key=encrypt_key)
Nov 16 19:18:08 ip-10-0-0-29 tablesnap[14931]:   File "/home/ubuntu/.local/lib/python2.7/site-packages/boto/s3/key.py", line 1305, in set_contents_from_file
Nov 16 19:18:08 ip-10-0-0-29 tablesnap[14931]:     chunked_transfer=chunked_transfer, size=size)
Nov 16 19:18:08 ip-10-0-0-29 tablesnap[14931]:   File "/home/ubuntu/.local/lib/python2.7/site-packages/boto/s3/key.py", line 762, in send_file
Nov 16 19:18:08 ip-10-0-0-29 tablesnap[14931]:     chunked_transfer=chunked_transfer, size=size)
Nov 16 19:18:08 ip-10-0-0-29 tablesnap[14931]:   File "/home/ubuntu/.local/lib/python2.7/site-packages/boto/s3/key.py", line 963, in _send_file_internal
Nov 16 19:18:08 ip-10-0-0-29 tablesnap[14931]:     query_args=query_args
Nov 16 19:18:08 ip-10-0-0-29 tablesnap[14931]:   File "/home/ubuntu/.local/lib/python2.7/site-packages/boto/s3/connection.py", line 671, in make_request
Nov 16 19:18:08 ip-10-0-0-29 tablesnap[14931]:     retry_handler=retry_handler
Nov 16 19:18:08 ip-10-0-0-29 tablesnap[14931]:   File "/home/ubuntu/.local/lib/python2.7/site-packages/boto/connection.py", line 1071, in make_request
Nov 16 19:18:08 ip-10-0-0-29 tablesnap[14931]:     retry_handler=retry_handler)
Nov 16 19:18:08 ip-10-0-0-29 tablesnap[14931]:   File "/home/ubuntu/.local/lib/python2.7/site-packages/boto/connection.py", line 940, in _mexe
Nov 16 19:18:08 ip-10-0-0-29 tablesnap[14931]:     request.body, request.headers)
Nov 16 19:18:08 ip-10-0-0-29 tablesnap[14931]:   File "/home/ubuntu/.local/lib/python2.7/site-packages/boto/s3/key.py", line 896, in sender
Nov 16 19:18:08 ip-10-0-0-29 tablesnap[14931]:     response.status, response.reason, body)
Nov 16 19:18:08 ip-10-0-0-29 tablesnap[14931]: S3ResponseError: S3ResponseError: 400 Bad Request
Nov 16 19:18:08 ip-10-0-0-29 tablesnap[14931]: <?xml version="1.0" encoding="UTF-8"?>
Nov 16 19:18:08 ip-10-0-0-29 tablesnap[14931]: <Error><Code>BadDigest</Code><Message>The Content-MD5 you specified did not match what we received.<

Tablechop connection reset

I'm seeing occasional failures when tablechop attempts to load -listdir.json contents from S3. Since tablechop is meant to run nightly, this error should just be ignored; the next run should pick up the files that didn't get cleaned up.

I can fix this since I'm the only one using tablechop afaik, just needed to log the issue for the next time I'm in there:

Traceback (most recent call last):
  File "/data/syseng/cassandra/eb_tablechop.py", line 67, in <module>
    tablechop.clean_backups(duck_args, log)
  File "/usr/bin/tablechop", line 72, in clean_backups
    jdict = json.loads(ky.get_contents_as_string())
  File "/usr/local/lib/python2.6/dist-packages/boto/s3/key.py", line 1427, in get_contents_as_string
    response_headers=response_headers)
  File "/usr/local/lib/python2.6/dist-packages/boto/s3/key.py", line 1319, in get_contents_to_file
    response_headers=response_headers)
  File "/usr/local/lib/python2.6/dist-packages/boto/s3/key.py", line 1215, in get_file
    for bytes in self:
  File "/usr/local/lib/python2.6/dist-packages/boto/s3/key.py", line 248, in next
    data = self.resp.read(self.BufferSize)
  File "/usr/local/lib/python2.6/dist-packages/boto/connection.py", line 397, in read
    return httplib.HTTPResponse.read(self, amt)
  File "/usr/lib/python2.6/httplib.py", line 538, in read
    s = self.fp.read(amt)
  File "/usr/lib/python2.6/socket.py", line 353, in read
    data = self._sock.recv(left)
  File "/usr/lib/python2.6/ssl.py", line 96, in <lambda>
    self.recv = lambda buflen=1024, flags=0: SSLSocket.recv(self, buflen, flags)
  File "/usr/lib/python2.6/ssl.py", line 217, in recv
    return self.read(buflen)
  File "/usr/lib/python2.6/ssl.py", line 136, in read
    return self._sslobj.read(len)
socket.error: [Errno 104] Connection reset by peer

Tablesnap crash III - INFO Error uploading part 28 / error: [Errno 104] Connection reset by peer

2015-11-02 09:30:23,951 INFO Error uploading part 28
2015-11-02 09:30:24,165 CRITICAL Failed to upload file contents.
2015-11-02 09:30:24,239 ERROR Error uploading servername:/opt/cassandra/data/keyspace/tablenameXXX-ic-6261-Data.db
Traceback (most recent call last):
  File "/usr/bin/tablesnap", line 349, in upload_sstable
    mp.upload_part_from_file(chunk, part)
  File "/usr/lib/python2.6/site-packages/boto/s3/multipart.py", line 260, in upload_part_from_file
    query_args=query_args, size=size)
  File "/usr/lib/python2.6/site-packages/boto/s3/key.py", line 1293, in set_contents_from_file
    chunked_transfer=chunked_transfer, size=size)
  File "/usr/lib/python2.6/site-packages/boto/s3/key.py", line 750, in send_file
    chunked_transfer=chunked_transfer, size=size)
  File "/usr/lib/python2.6/site-packages/boto/s3/key.py", line 951, in _send_file_internal
    query_args=query_args
  File "/usr/lib/python2.6/site-packages/boto/s3/connection.py", line 664, in make_request
    retry_handler=retry_handler
  File "/usr/lib/python2.6/site-packages/boto/connection.py", line 1071, in make_request
    retry_handler=retry_handler)
  File "/usr/lib/python2.6/site-packages/boto/connection.py", line 1030, in _mexe
    raise ex
error: [Errno 104] Connection reset by peer

2015-11-02 09:30:24,239 CRITICAL Failed uploading /opt/cassandra/data/keyspace/tablenameXXX-ic-6261-Data.db. Aborting.
Traceback (most recent call last):
  File "/usr/bin/tablesnap", line 123, in worker
    self.upload_sstable(bucket, keyname, f)
  File "/usr/bin/tablesnap", line 349, in upload_sstable
    mp.upload_part_from_file(chunk, part)
  File "/usr/lib/python2.6/site-packages/boto/s3/multipart.py", line 260, in upload_part_from_file
    query_args=query_args, size=size)
  File "/usr/lib/python2.6/site-packages/boto/s3/key.py", line 1293, in set_contents_from_file
    chunked_transfer=chunked_transfer, size=size)
  File "/usr/lib/python2.6/site-packages/boto/s3/key.py", line 750, in send_file
    chunked_transfer=chunked_transfer, size=size)
  File "/usr/lib/python2.6/site-packages/boto/s3/key.py", line 951, in _send_file_internal
    query_args=query_args
  File "/usr/lib/python2.6/site-packages/boto/s3/connection.py", line 664, in make_request
    retry_handler=retry_handler
  File "/usr/lib/python2.6/site-packages/boto/connection.py", line 1071, in make_request
    retry_handler=retry_handler)
  File "/usr/lib/python2.6/site-packages/boto/connection.py", line 1030, in _mexe
    raise ex
error: [Errno 104] Connection reset by peer

tableslurp fails when keyspaces prefix one another

I'm having an error when downloading my backups using tableslurp. I've used tablesnap to back up 2 directories.

/mnt/cassandra/data/Usergrid

and

/mnt/cassandra/data/Usergrid_Applications

When attempting to restore the 'Usergrid' keyspace via this command:

python tableslurp -k -s -n usergrid-dev-2012-09-25 usergrid-dev-sstables /mnt/cassandra/data/Usergrid ~/Downloads/data/restore/Usergrid

I receive this stacktrace

Traceback (most recent call last):
  File "tableslurp", line 286, in <module>
    sys.exit(main())
  File "tableslurp", line 282, in main
    dh = DownloadHandler(args)
  File "tableslurp", line 87, in __init__
    (owner, group) = self._build_file_set(args.file)
  File "tableslurp", line 132, in _build_file_set
    self.fileset = json_data[self.origin]
KeyError: '/mnt/cassandra/data/Usergrid'

Upon further inspection of the json_data object, I see that the fileset returned is not the one specified ('Usergrid') but rather the longer 'Usergrid_Applications'.

Below is a snippet from the logging output of the json_data object:

tableslurp [2012-09-26 13:28:33,065] INFO json data {u'/mnt/cassandra/data/Usergrid_Applications': [u'Entity_Id_Sets-hd-8-Digest.sha1', ...']}

As you can see, the key is '/mnt/cassandra/data/Usergrid_Applications', not '/mnt/cassandra/data/Usergrid' as expected.

Seg fault in Docker on CoreOs

When running on CoreOS in a Docker container I get
Segmentation fault (core dumped)
Is it supposed to work in this environment?

tableslurp usage?

I'm having a hard time restoring more than one table at a time via tableslurp. Is this possible? My tablesnap invocation is:

tablesnap --recursive --auto-add --backup --exclude=/snapshots/\|-tmp-\|cassandra.log --with-sse --name node-name --prefix my-prefix/ my-bucket /cassandra-data

When I invoke tableslurp with the "origin" argument pointing to a keyspace, I get the error Cannot find anything to restore from my-bucket:my-prefix:/path. If I add the table to the origin, that seems to restore fine.

I'm not sure if it's relevant, but my tablesnap-uploaded files in S3 have -listdir.json files only for directories with files in them; directories containing only other subdirectories do not have these files.

Tablesnap doesn't upload *-Summary.db files by default

Cassandra 2.x adds another file per sstable ending in "-Summary.db" to the data directory. If these aren't restored I don't believe it's a fatal error - they can get rebuilt - but it adds overhead to startup because all the indexes have to be scanned.

This file isn't uploaded by tablesnap by default because it's written to the directory in-situ rather than being moved into it. This is easily fixable by setting the command line option to listen for IN_CLOSE_WRITE as well, so I wonder if it would be worth listening for that by default?
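Assuming the --listen-events option behaves as documented above (it can be specified multiple times), a workaround today would be something along these lines (credentials, bucket, and path are placeholders):

tablesnap -k AWS_KEY -s AWS_SECRET --listen-events IN_MOVED_TO --listen-events IN_CLOSE_WRITE my-bucket /var/lib/cassandra/data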

Does Tablesnap delete obsolete SSTable backups?

Sorry to file an issue for this, but I'm not sure the folks in #cassandra-ops know about it.

After compaction, SSTables become obsolete. What happens to those uploaded SSTables (in S3)?
Do I need to manually delete them?

Thanks

tablesnap allows a custom path separator, but tableslurp assumes ':'

Compare https://github.com/JeremyGrosser/tablesnap/blob/master/tableslurp#L85 and https://github.com/JeremyGrosser/tablesnap/blob/master/tablesnap#L459 .

Tablesnap's interface allows a user to configure a custom path separator, defaulting to ':' but also accepting arbitrary separators or '' for no separator at all.

Tableslurp assumes a ':' separator specifically, and so is incompatible with any Tablesnap deployment that uses a custom keyname separator.

Maintainer

Hi @JeremyGrosser, I'm interested in being a maintainer to review the issues, merge PRs, and add some features such as pluggable storage backends (S3, GCS, etc.).

How could we go about that?

tableslurp does not work after STS support

Commit 9ae85f01cefa9aa73263d07e0e0381dcf03718a1 adds STS support to all the components (snap, slurp, chop); however, tableslurp lacks proper initialization of the token variable and throws an exception when used with a plain key + secret.

Store data compressed

Each PUT to S3 costs money. It would be better to group files into a batch and compress the data before pushing to S3.

Limit bandwidth of tablesnap

It would be really helpful to be able to limit tablesnap's upload bandwidth, so that when it runs on a production machine which needs to prioritise Cassandra's outbound bandwidth, tablesnap does not consume as much bandwidth as it can.

Secret key and token may not be required if the IAM profile is configured

Currently the secret key and token are passed to the boto.s3 API, but in the case of an IAM profile with S3 access configured for the instance, we do not need the secret key/token. The boto API documentation says we need not pass the key/token.

But tablesnap and the related scripts fail, as they always try to pass the key and token. I think that if the secret key is not provided, we should just call the boto.s3.connect_to_region(self.region) API.
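A minimal sketch of that idea, assuming boto falls back to the instance profile credentials when no key/secret is passed (the region name is a placeholder):

import boto.s3

def get_connection(region, key=None, secret=None):
    if key and secret:
        return boto.s3.connect_to_region(region,
                                         aws_access_key_id=key,
                                         aws_secret_access_key=secret)
    # No explicit credentials: let boto resolve them from the IAM
    # instance profile via the metadata service.
    return boto.s3.connect_to_region(region)

conn = get_connection('us-east-1')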

tablesnap crash II - SSLError: [Errno 8] _ssl.c:490: EOF occurred in violation of protocol

2015-10-23 11:38:52,420 INFO Performing multipart upload for /opt/cassandra/dataXXX_production/service_last_downloads/XXX_production-service_last_downloads-ic-8588-Index.db
2015-10-23 11:45:53,206 INFO Error uploading part 1
2015-10-23 11:45:53,457 CRITICAL Failed to upload file contents.
2015-10-23 11:45:53,526 ERROR Error uploading prod-cass-r01.ihost.XXX.com:/opt/cassandra/data/XXX_production/service_last_downloads/XXX_production-service_last_downloads-ic-8588-Index.db
Traceback (most recent call last):
  File "/usr/bin/tablesnap", line 349, in upload_sstable
    mp.upload_part_from_file(chunk, part)
  File "/usr/lib/python2.6/site-packages/boto/s3/multipart.py", line 260, in upload_part_from_file
    query_args=query_args, size=size)
  File "/usr/lib/python2.6/site-packages/boto/s3/key.py", line 1293, in set_contents_from_file
    chunked_transfer=chunked_transfer, size=size)
  File "/usr/lib/python2.6/site-packages/boto/s3/key.py", line 750, in send_file
    chunked_transfer=chunked_transfer, size=size)
  File "/usr/lib/python2.6/site-packages/boto/s3/key.py", line 951, in _send_file_internal
    query_args=query_args
  File "/usr/lib/python2.6/site-packages/boto/s3/connection.py", line 664, in make_request
    retry_handler=retry_handler
  File "/usr/lib/python2.6/site-packages/boto/connection.py", line 1071, in make_request
    retry_handler=retry_handler)
  File "/usr/lib/python2.6/site-packages/boto/connection.py", line 1030, in _mexe
    raise ex
SSLError: [Errno 8] _ssl.c:490: EOF occurred in violation of protocol

2015-10-23 11:45:53,526 CRITICAL Failed uploading /opt/cassandra/data/XX/service_last_downloads/XXX_production-service_last_downloads-ic-8588-Index.db. Aborting.
Traceback (most recent call last):
  File "/usr/bin/tablesnap", line 123, in worker
    self.upload_sstable(bucket, keyname, f)
  File "/usr/bin/tablesnap", line 349, in upload_sstable
    mp.upload_part_from_file(chunk, part)
  File "/usr/lib/python2.6/site-packages/boto/s3/multipart.py", line 260, in upload_part_from_file
    query_args=query_args, size=size)
  File "/usr/lib/python2.6/site-packages/boto/s3/key.py", line 1293, in set_contents_from_file
    chunked_transfer=chunked_transfer, size=size)
  File "/usr/lib/python2.6/site-packages/boto/s3/key.py", line 750, in send_file
    chunked_transfer=chunked_transfer, size=size)
  File "/usr/lib/python2.6/site-packages/boto/s3/key.py", line 951, in _send_file_internal
    query_args=query_args
  File "/usr/lib/python2.6/site-packages/boto/s3/connection.py", line 664, in make_request
    retry_handler=retry_handler
  File "/usr/lib/python2.6/site-packages/boto/connection.py", line 1071, in make_request
    retry_handler=retry_handler)
  File "/usr/lib/python2.6/site-packages/boto/connection.py", line 1030, in _mexe
    raise ex
SSLError: [Errno 8] _ssl.c:490: EOF occurred in violation of protocol

tablechop deletes files which are currently used by Cassandra

tablechop checks only the last-modified date of the index_key file to decide whether a backed-up file will be deleted:

    for index_key in index_keys:
        if days_ago(index_key.last_modified) > args.age:
            break
        index_files_to_keep.add(index_key.name)

In my understanding, tablechop should only delete those files which are no longer necessary for restoring the backup within the specified retention time. Depending on the table size and the compaction strategy, some SSTables can live for weeks or months without being compacted. These files are mandatory for restoring the table, regardless of how long they have been in S3. Currently active SSTables should not be deleted (unless a force parameter is used).
Meanwhile there can exist small backed-up SSTables which have already been compacted away and are no longer required for restoring.

I'd be grateful if we could add an additional verification of whether the file is currently being used by Cassandra / exists on the filesystem.
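A rough sketch of that check, assuming tablechop runs on the node that owns the backups and that keynames have the form <host><separator><absolute path> as described elsewhere on this page (this is not tablechop's actual code):

import os

def local_path_for(keyname, separator=':'):
    # Everything after the first separator is the original absolute path.
    return keyname.split(separator, 1)[1]

def safe_to_delete(keyname, files_to_keep):
    # Never delete a backup whose SSTable still exists locally, i.e. is
    # presumably still live in Cassandra.
    if os.path.exists(local_path_for(keyname)):
        return False
    return keyname not in files_to_keep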

Infrequent Access storage class

I'd like to add the ability to use the Infrequent Access S3 storage class for backups. This requires upgrading to boto3. Would that be an acceptable change?
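For illustration, with boto3 the storage class can be chosen per upload; a sketch of the proposed change, with placeholder bucket and key names:

import boto3

s3 = boto3.client('s3')
s3.upload_file(
    '/var/lib/cassandra/data/ks/table/example-Data.db',
    'my-sstable-bucket',
    'hostname:/var/lib/cassandra/data/ks/table/example-Data.db',
    ExtraArgs={'StorageClass': 'STANDARD_IA'},
)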

Tableslurp performance expectation

I regularly tell people, after 25 years in data centers, that nobody cares about backups. The only thing people actually care about is restores.

To that end I have been testing tablesnap with a small keyspace. The current test dataset is ~28 GB and has a fairly decent rate of churn; I have been running tablesnap for two weeks and have tablechop set to prune the data at 7 days. The S3 bucket currently holds about 145 GB.

I invoked tableslurp like so:

tableslurp -k <Key> -s <secret> --aws-region us-west-2 -r -n ip-10-14-193-47 my-cassandra-backups-us-west-2 /data/cassandra/data/test_keyspace ./cassandra/data/test_keyspace

After letting that run for 6 hours, tableslurp had created directories for 5 of my 179 tables and had not yet downloaded any files. I killed that process, and restarted with -t 50, to give me 50 threads, and went to bed.

12 hours later, tableslurp had created directories for 33 of the 179 tables, and still had downloaded exactly zero files.

Is this expected behavior? If not, what have I got wrong?

tablesnap crash

My daemonized tablesnap crashed after running for a while.

2015-10-20 03:57:02,873 CRITICAL Failed to lookup keyname after 1 retries
Traceback (most recent call last):
  File "/usr/bin/tablesnap", line 148, in key_exists
    key = bucket.get_key(keyname)
  File "/usr/lib/python2.6/site-packages/boto/s3/bucket.py", line 192, in get_key
    key, resp = self._get_key_internal(key_name, headers, query_args_l)
  File "/usr/lib/python2.6/site-packages/boto/s3/bucket.py", line 199, in _get_key_internal
    query_args=query_args)
  File "/usr/lib/python2.6/site-packages/boto/s3/connection.py", line 664, in make_request
    retry_handler=retry_handler
  File "/usr/lib/python2.6/site-packages/boto/connection.py", line 1071, in make_request
    retry_handler=retry_handler)
  File "/usr/lib/python2.6/site-packages/boto/connection.py", line 1030, in _mexe
    raise ex
error: [Errno 110] Connection timed out

2015-10-20 03:57:02,874 CRITICAL Failed uploading Aborting.
Traceback (most recent call last):
  File "/usr/bin/tablesnap", line 123, in worker
    self.upload_sstable(bucket, keyname, f)
  File "/usr/bin/tablesnap", line 272, in upload_sstable
    if self.key_exists(bucket, keyname, filename, stat):
  File "/usr/bin/tablesnap", line 148, in key_exists
    key = bucket.get_key(keyname)
  File "/usr/lib/python2.6/site-packages/boto/s3/bucket.py", line 192, in get_key
    key, resp = self._get_key_internal(key_name, headers, query_args_l)
  File "/usr/lib/python2.6/site-packages/boto/s3/bucket.py", line 199, in _get_key_internal
    query_args=query_args)
  File "/usr/lib/python2.6/site-packages/boto/s3/connection.py", line 664, in make_request
    retry_handler=retry_handler
  File "/usr/lib/python2.6/site-packages/boto/connection.py", line 1071, in make_request
    retry_handler=retry_handler)
  File "/usr/lib/python2.6/site-packages/boto/connection.py", line 1030, in _mexe
    raise ex
error: [Errno 110] Connection timed out

tablechop doesn't list deleted files with -l, --list-deletes

tablechop only outputs the number of deleted files and not which files have been deleted.

tablechop -l --aws-region <aws_region> <bucket_name> <keyspace_path> 1
tablechop [2017-02-16 11:53:43,703] INFO Connected to S3, getting keys ...
tablechop [2017-02-16 11:53:43,752] INFO Found 2 index files and 16 data files
tablechop [2017-02-16 11:53:43,754] INFO Keeping 1/2 index files
tablechop [2017-02-16 11:53:43,950] INFO Keeping 16/16 data files
tablechop [2017-02-16 11:53:43,950] INFO Deleting 1/18 files

I get the same output with --list-deletes

tablechop --list-deletes --aws-region <aws_region> <bucket_name> <keyspace_path> 1
tablechop [2017-02-16 11:50:55,010] INFO Connected to S3, getting keys ...
tablechop [2017-02-16 11:50:55,094] INFO Found 6 index files and 62 data files
tablechop [2017-02-16 11:50:55,096] INFO Keeping 5/6 index files
tablechop [2017-02-16 11:50:55,424] INFO Keeping 40/62 data files
tablechop [2017-02-16 11:50:55,424] INFO Deleting 25/68 files

tableslurp (on master) is throwing an exception...?

I'm testing out the latest tableslurp code (I want/need --recursive) but it seems to be failing with an exception that I don't really understand...

I'm invoking with:
tableslurp -n backups/cassandra/dp/staging-data-01 --recursive --aws-region us-west-1 my-ec2-shared /var/lib/cassandra/data /var/lib/cassandra/data/

That seems to progress well for a while but then throws an exception and bails:

tableslurp [2018-06-14 05:45:45,565] INFO Thread #1 finished processing
tableslurp [2018-06-14 05:45:45,570] INFO Thread #2 finished processing
tableslurp [2018-06-14 05:45:45,577] INFO Downloading backups/cassandra/dp/staging-data-01:/var/lib/cassandra/data/dp/fi_b/dp-fi_b-ka-415-Summary.db from my-ec2-shared to /var/lib/cassandra/data/dp/fi_b/dp-fi_b-ka-415-Summary.db
tableslurp [2018-06-14 05:45:45,614] INFO My job is done.
Exception in thread Thread-112 (most likely raised during interpreter shutdown):
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
  File "/usr/lib/python2.7/threading.py", line 763, in run
  File "/usr/local/bin/tableslurp", line 317, in _worker
<type 'exceptions.AttributeError'>: 'NoneType' object has no attribute 'info'

I'm a bit of a python novice but if I'm reading that stack trace correctly it maps to the following line in tableslurp (line 317):
log.info('Thread #%d finished processing' % (idx,))

That seems to suggest that 'log' is undefined...? However, I see several other log lines that show "Thread #2 finished processing", so I'm super confused...

My python novice Google foo suggests it might be something wrong with threading... Anyone have any ideas?

Error using tableslurp - object has no attribute 'token'

I recently installed tablesnap via pip and the copy to S3 process appears to be working - however when I run the tableslurp command I get the following:
Traceback (most recent call last):
  File "/usr/local/bin/tableslurp", line 295, in <module>
    sys.exit(main())
  File "/usr/local/bin/tableslurp", line 291, in main
    dh = DownloadHandler(args)
  File "/usr/local/bin/tableslurp", line 86, in __init__
    (owner, group) = self._build_file_set(args.file)
  File "/usr/local/bin/tableslurp", line 114, in _build_file_set
    bucket = self._get_bucket()
  File "/usr/local/bin/tableslurp", line 105, in _get_bucket
    security_token=self.token)
AttributeError: 'DownloadHandler' object has no attribute 'token'

I set AWS_SECURITY_TOKEN= in the environment, but it's not clear if there are additional steps I am missing (such as using STS to get a token for the requests?).

My command is basically this:
tableslurp -n server.localdomain bucket /mnt/cassandra/data /mnt/cassandra/data

I tried an upgrade via pip but no luck yet.

Appreciate the help.

Logging disabled for newest Debian init.d

Rev 465e4bb switches from daemon to start-stop-daemon; it leaves $LOGDIR in place but does not use it.

There is no equivalent to --output in start-stop-daemon. One idea is to put tablesnap in a wrapper that does the redirect to a log file, but that seems clunky. Another is to have tablesnap do its own logging, and make it configurable so that it can be part of $DAEMON_OPTS.

Missing root key in list returned by boto

Hi, tablesnap works like a charm.

But when I run

 tableslurp -k -s -n hostname -p  bucket_name 
/var/lib/cassandra/data/keyspace_name ~/Downloads/

I get

KeyError: '/var/lib/cassandra/data/keyspace_name' 

Printing the list, I can see that only keys with subdirectories can be found, for example:

/var/lib/cassandra/data/keyspace_name/table_name/keyspace_name-table_name-jb....-listdir.json 

When I try to retrieve these subdirectories, I get a LookupError: Cannot find anything to restore from.

Am I doing something wrong?

only allows single --exclude option

As far as I can tell, tablesnap only allows a single use of the --exclude option, whereas I would like to exclude both the snapshots/ and backups/ sub-folders.

Using tablesnap 0.7.3-2 from the Ubuntu 14.04 repo.
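As a possible workaround, since --exclude takes a regular expression, a single alternation may cover both directories (untested; credentials, bucket, and path are placeholders):

tablesnap -k AWS_KEY -s AWS_SECRET -r --exclude '/snapshots/|/backups/' my-bucket /var/lib/cassandra/data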

Tableslurp to be able to restore CLogs

Hi,

Just wondering what people thought about making tableslurp able to restore CLogs. An example output I have is here:

tableslurp -n 172.31.10.6 --aws-region ap-northeast-2 --commitlogs /cassandra/commitlog_archive lerhconsultest /cassandra/data backup-test/
tableslurp [2017-10-19 00:22:21,218] INFO Building fileset
tableslurp [2017-10-19 00:22:21,724] INFO Will now try to test writing to the target dir backup-test/
tableslurp [2017-10-19 00:22:21,724] INFO Will write to backup-test/
tableslurp [2017-10-19 00:22:21,724] INFO Running
tableslurp [2017-10-19 00:22:21,724] INFO Pushing file CommitLog-6-1508276606523.log onto queue
tableslurp [2017-10-19 00:22:21,724] INFO Pushing file CommitLog-6-1508276606522.log onto queue
tableslurp [2017-10-19 00:22:21,725] INFO Pushing file CommitLog-6-1508276606521.log onto queue
tableslurp [2017-10-19 00:22:21,725] INFO Thread #0 processing items
tableslurp [2017-10-19 00:22:21,726] INFO Thread #1 processing items
tableslurp [2017-10-19 00:22:21,728] INFO Thread #2 processing items
tableslurp [2017-10-19 00:22:21,729] INFO Thread #3 processing items
tableslurp [2017-10-19 00:22:21,766] INFO Thread #3 finished processing
tableslurp [2017-10-19 00:22:21,772] INFO Downloading 172.31.10.6:/cassandra/commitlog_archive/CommitLog-6-1508276606523.log from lerhconsultest to backup-test/CommitLog-6-1508276606523.log
tableslurp [2017-10-19 00:22:21,791] INFO Downloading 172.31.10.6:/cassandra/commitlog_archive/CommitLog-6-1508276606522.log from lerhconsultest to backup-test/CommitLog-6-1508276606522.log
tableslurp [2017-10-19 00:22:21,796] INFO Downloading 172.31.10.6:/cassandra/commitlog_archive/CommitLog-6-1508276606521.log from lerhconsultest to backup-test/CommitLog-6-1508276606521.log
tableslurp [2017-10-19 00:22:22,334] INFO Thread #2 finished processing
tableslurp [2017-10-19 00:22:22,431] INFO Thread #0 finished processing
tableslurp [2017-10-19 00:22:22,531] INFO Thread #1 finished processing
tableslurp [2017-10-19 00:22:22,531] INFO My job is done.

In this case, the argument passed to --commitlogs is the key in the bucket where the commitlogs live; the positional arguments are still the same as before: bucket name, key in the bucket where the SSTables live, and the target location to download to.

The logic here is that, because commitlog filenames are timestamped, we will do the following:

  • Scan every single KS/Table combo for their list of listdir.jsons, and find their newest listdir.json, respectively. Add that listdir.json to a master list.
  • Once we have populated the master list (which now contains the newest listdir.json for every single table), find the oldest listdir.json in that list and get its timestamp (which we will designate as oldest_timestamp). We need to replay CommitLogs from that time to the current time in UTC.
  • Find all the commitlogs in the commitlog directory passed in by the user (one can also use tablesnap to upload commitlogs). Their filename would look like this:

CommitLog-6-1508276606523.log, where 1508276606523 is the UTC time in milliseconds. We will order the CommitLog files in descending order and keep downloading them; the moment we find a CLog whose timestamp is not larger than oldest_timestamp, we stop (a sketch of this selection step follows below).
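A small sketch of that selection step (a hypothetical helper, not existing tableslurp code; oldest_timestamp is in milliseconds, as in the filenames):

import re

def commitlogs_to_restore(filenames, oldest_timestamp):
    # Filenames look like CommitLog-6-1508276606523.log; the trailing number
    # is a UTC timestamp in milliseconds. Walk them newest-first and stop at
    # the first one that is not newer than oldest_timestamp.
    def ts(name):
        return int(re.search(r'-(\d+)\.log$', name).group(1))
    selected = []
    for name in sorted(filenames, key=ts, reverse=True):
        if ts(name) <= oldest_timestamp:
            break
        selected.append(name)
    return selected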

What do people think? Would it be something useful to have in tablesnap? You should rarely ever need CLogs in a restore because tablesnap already uploads files to S3 the moment they even show up in the watched directories, but in cases where absolute consistency is needed (in my case), it could be useful.

My reservation with such a thing is that if we have a table which is rarely written to (and hence rarely flushed), its listdir.json will be very old. This means we would have to download a great many commitlogs! (But if the requirements are there, we have no choice, because a CLog may hold that single transaction that wasn't flushed into an SSTable...)

PPA repository not up to date

The PPA repository mentioned in the README is out of date; the most recent version of tablesnap available there is from 2012-11-07.

If it's not easy to keep the PPA up to date, I'd suggest removing the references to it.

IAM folder-level permissions cause tablesnap to fail

I've set up folder-level permissions in AWS IAM following this guide, so my dev user has all permissions on s3://tablesnap/backups/dev/, for example.

Invoking tablesnap with --prefix backups/dev/ fails silently. The only entry in syslog is "tablesnap: Starting up" and no files are transferred. I can read, write, and delete in the folder using the awscli, so I know the permissions are correct.

Add start/stop script for Red Hat OS

I'd be grateful if we could add a start/stop script for tablesnap so that we can use it as a service.

Afterwards we could add tablesnap to autostart, check its status, or start and stop the service.
