gianlucaborello / cassandradump

A data exporting tool for Cassandra inspired from mysqldump, with some additional slice and dice capabilities

License: GNU General Public License v2.0

Python 84.22% Shell 7.63% Makefile 8.15%
python cassandra nosql

cassandradump's Issues

Syntax error

File "cassandradump.py", line 63
for has_counter, columns in itertools.groupby(tableval.columns.iteritems(), lambda (k, v): v.data_type.typename == 'counter')
^
SyntaxError: invalid syntax
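
The report does not state the interpreter version, but `lambda (k, v): ...` uses tuple parameter unpacking, which is a syntax error on Python 3 (as is the Python 2-only `iteritems()`). A minimal sketch of a Python 3-compatible grouping follows; the `Column` and `DataType` stand-ins and the sample column names are hypothetical, not the driver's real classes.

```python
import itertools
from collections import namedtuple

# Hypothetical stand-ins for the driver's column metadata objects.
DataType = namedtuple("DataType", "typename")
Column = namedtuple("Column", "data_type")

columns = {
    "id": Column(DataType("uuid")),
    "hits": Column(DataType("counter")),
}

def is_counter(item):
    # Unpack the (name, column) pair inside the function body instead of
    # in the parameter list, which Python 3 no longer allows.
    _name, column = item
    return column.data_type.typename == "counter"

groups = {}
# groupby only merges adjacent items, so sort by the same key first.
ordered = sorted(columns.items(), key=is_counter)
for has_counter, cols in itertools.groupby(ordered, key=is_counter):
    groups[has_counter] = [name for name, _ in cols]
```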

Ignore system keyspaces

When no keyspace is passed, the tool tries to export all keyspaces, including system_schema, system_auth, and system. These can't be overwritten on import, and I get errors like:

"system_schema keyspace is not user-modifiable." or "Cannot CREATE <keyspace system_auth>"

I am not familiar with Cassandra, but even if there is a way to overwrite system keyspaces, it doesn't sound like we should be doing it.
A flag like --ignore-system, used without --keyspaces, should copy all keyspaces except the system ones.
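
A minimal sketch of the filtering such a flag could apply. The function name and the `ignore_system` parameter are hypothetical; the set below adds `system_traces` and `system_distributed`, which are not mentioned in the report but are also Cassandra system keyspaces.

```python
# Keyspaces managed by Cassandra itself. The first three come from the
# report; the last two are added here as an assumption.
SYSTEM_KEYSPACES = frozenset({
    "system",
    "system_schema",
    "system_auth",
    "system_traces",
    "system_distributed",
})

def keyspaces_to_export(all_keyspaces, ignore_system=True):
    """Return the keyspace names to export, in their original order."""
    if not ignore_system:
        return list(all_keyspaces)
    return [name for name in all_keyspaces if name not in SYSTEM_KEYSPACES]

selected = keyspaces_to_export(["system", "engine", "system_auth", "keyspace1"])
```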

There is no version specified for cassandra-driver in requires

When you install it, cassandra-driver==3.0.0c1 is installed as a requirement, which causes errors when exporting:

Traceback (most recent call last):
  File "/usr/local/bin/cassandradump", line 9, in <module>
    load_entry_point('cassandradump==0.0.1', 'console_scripts', 'cassandradump')()
  File "/usr/local/lib/python2.7/dist-packages/cassandradump.py", line 350, in main
    export_data(session)
  File "/usr/local/lib/python2.7/dist-packages/cassandradump.py", line 230, in export_data
    table_to_cqlfile(session, keyname, tablename, None, tableval, f)
  File "/usr/local/lib/python2.7/dist-packages/cassandradump.py", line 94, in table_to_cqlfile
    value_encoders = make_value_encoders(tableval)
  File "/usr/local/lib/python2.7/dist-packages/cassandradump.py", line 58, in make_value_encoders
    return dict((to_utf8(k), make_value_encoder(v.data_type.typename)) for k, v in tableval.columns.iteritems())
  File "/usr/local/lib/python2.7/dist-packages/cassandradump.py", line 58, in <genexpr>
    return dict((to_utf8(k), make_value_encoder(v.data_type.typename)) for k, v in tableval.columns.iteritems())
AttributeError: 'ColumnMetadata' object has no attribute 'data_type'

Manually reverting cassandra driver to 2.6.0 solves the problem.

schema dump ";;"

python ~/cassandradump/cassandradump.py --no-insert --export-file ./table2.cql --cf=keyspace1.table2

... AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';;

The export ends with a double ";" at the end of the file.

When I try to import the schema:

cqlsh localhost < ./table2.cql
:24:SyntaxException: <Error from server: code=2000 [Syntax error in CQL query] message="line 1:0 no viable alternative at input ';' ([;])">
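
One way an exporter could guard against this is to normalize the terminator before writing each statement. This is a sketch only; `terminate_statement` is a hypothetical helper, not a function from cassandradump.

```python
def terminate_statement(cql):
    """Return the statement with exactly one trailing semicolon."""
    # Strip any run of trailing semicolons and whitespace, then add one ';'.
    return cql.rstrip("; \t\r\n") + ";"

fixed = terminate_statement("AND speculative_retry = '99PERCENTILE';;")
```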

Anything I can do to speed up a keyspace dump?

Hello,

I have been running cassandradump on our local test clusters to move keyspaces around via export and import.

However, on larger databases on multi-datacenter clusters, export jobs run for days before all the data is exported.

I'm talking about a keyspace with replication factor 3, spread across 4 nodes, with around 200 GB of data in total across all nodes.

That would be fine given the amount of data, but I see barely any load on the machines holding the relevant data: no high CPU or I/O usage, nor any other abnormal behavior.

With that in mind, is there anything I can tune to speed up exports of these larger databases?

Any suggestions appreciated!

Thanks,

Exception on import large database

Hello, when I tried to import a large database (600k+ lines), it threw an exception:

Traceback (most recent call last):
  File "../cassandradump/cassandradump.py", line 271, in <module>
    main()
  File "../cassandradump/cassandradump.py", line 263, in main
    import_data(session)
  File "../cassandradump/cassandradump.py", line 65, in import_data
    session.execute(line)
  File "/usr/local/lib/python2.7/dist-packages/cassandra/cluster.py", line 1405, in execute
    result = future.result(timeout)
  File "/usr/local/lib/python2.7/dist-packages/cassandra/cluster.py", line 2976, in result
    raise self._final_exception
cassandra.protocol.SyntaxException: <ErrorMessage code=2000 [Syntax error in CQL query] message="line 0:-1 no viable alternative at input ''">

The script imported most of the dump, but for the last table, which has 500k lines, only 11k were imported.

Thanks anyway.
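
The message reports an empty input (`''`), which would be consistent with a blank or partial line being passed to `session.execute(line)`. That reading is an assumption; the report does not confirm it. Under that assumption, a sketch of reading the dump statement by statement rather than line by line could look like this (`iter_statements` is a hypothetical helper, and it does not handle a `;` at the end of a line inside a string literal):

```python
def iter_statements(lines):
    """Yield complete CQL statements, never an empty string."""
    buffer = []
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines instead of executing them
        buffer.append(stripped)
        if stripped.endswith(";"):
            # A trailing ';' is treated as the end of the statement.
            yield "\n".join(buffer)
            buffer = []
    if buffer:
        # Trailing statement that was not terminated with ';'.
        yield "\n".join(buffer)

sample = [
    "INSERT INTO t (a) VALUES (1);\n",
    "\n",
    "CREATE TABLE t (\n",
    "    a int PRIMARY KEY\n",
    ");\n",
]
statements = list(iter_statements(sample))
```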

Does not work with database containing unicode data.

Traceback (most recent call last):
  File "cassandradump.py", line 247, in <module>
    main()
  File "cassandradump.py", line 241, in main
    export_data(session)
  File "cassandradump.py", line 142, in export_data
    table_to_cqlfile(session, keyname, tablename, None, tableval, f)
  File "cassandradump.py", line 44, in table_to_cqlfile
    filep.write('INSERT INTO "' + keyspace + '"."' + tablename + '" (' + ', '.join(row.keys()) + ') VALUES (' + ', '.join(values) + ')\n')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 206: ordinal not in range(128)
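
Byte 0xf0 is the lead byte of a four-byte UTF-8 sequence, so the failure looks like byte strings being mixed with text while the line is concatenated. A minimal sketch of the usual remedy: build the line as text and let a file opened with an explicit UTF-8 encoding do the encoding. `write_insert` and the sample keyspace, table, and values are hypothetical; `io.open` is used because it accepts `encoding` on both Python 2 and 3.

```python
import io
import os
import tempfile

def write_insert(filep, keyspace, tablename, columns, values):
    # Build the whole line as text; the file object encodes it once.
    line = u'INSERT INTO "{0}"."{1}" ({2}) VALUES ({3})\n'.format(
        keyspace, tablename, u", ".join(columns), u", ".join(values))
    filep.write(line)

path = os.path.join(tempfile.mkdtemp(), "dump.cql")
with io.open(path, "w", encoding="utf-8") as f:
    # U+1F600 encodes to four UTF-8 bytes starting with 0xf0.
    write_insert(f, u"ks", u"users", [u"id", u"name"], [u"1", u"'\U0001F600'"])

with io.open(path, "rb") as f:
    raw = f.read()
```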

AttributeError: 'ColumnMetadata' object has no attribute 'data_type'

Hi,
I get an error while exporting this table with Cassandra 2.1:

CREATE TABLE slots (
    type text,
    host text,
    count int,
    PRIMARY KEY (type,host)
 );
Exporting all keyspaces
Exporting schema for keyspace engine
Exporting data for column family engine.slots
Traceback (most recent call last):
  File "cassandradump.py", line 351, in <module>
    main()
  File "cassandradump.py", line 345, in main
    export_data(session)
  File "cassandradump.py", line 225, in export_data
    table_to_cqlfile(session, keyname, tablename, None, tableval, f)
  File "cassandradump.py", line 94, in table_to_cqlfile
    value_encoders = make_value_encoders(tableval)
  File "cassandradump.py", line 58, in make_value_encoders
    return dict((to_utf8(k), make_value_encoder(v.data_type.typename)) for k, v in tableval.columns.ite
  File "cassandradump.py", line 58, in <genexpr>
    return dict((to_utf8(k), make_value_encoder(v.data_type.typename)) for k, v in tableval.columns.ite
AttributeError: 'ColumnMetadata' object has no attribute 'data_type'
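
A sketch of a lookup that tolerates both metadata layouts. It assumes, as in the 3.x Python driver, that `ColumnMetadata` exposes the type as a `cql_type` string in place of the older `data_type.typename`; the stand-in classes below are hypothetical and exist only to make the example self-contained.

```python
def column_type_name(column):
    """Return the CQL type name for old- and new-style column metadata."""
    data_type = getattr(column, "data_type", None)
    if data_type is not None:
        return data_type.typename  # older layout, as used by the script
    return column.cql_type  # assumed newer layout (driver 3.x)

# Hypothetical stand-ins for the two metadata shapes.
class OldType(object):
    typename = "counter"

class OldColumn(object):
    data_type = OldType()

class NewColumn(object):
    cql_type = "int"

old_name = column_type_name(OldColumn())
new_name = column_type_name(NewColumn())
```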

Connection Failed

I get a connection error when executing:
python dumper.py --export-file fxm_test.cql --host 172.31.5.30

I'm running Cassandra 2.1.5 and Python 2.7.6
cqlsh 5.0.1 | Cassandra 2.1.5 | CQL spec 3.2.0 | Native protocol v3

Traceback (most recent call last):
  File "dumper.py", line 356, in <module>
    main()
  File "dumper.py", line 345, in main
    session = setup_cluster()
  File "dumper.py", line 300, in setup_cluster
    session = cluster.connect()
  File "/usr/local/lib/python2.7/dist-packages/cassandra/cluster.py", line 839, in connect
    self.control_connection.connect()
  File "/usr/local/lib/python2.7/dist-packages/cassandra/cluster.py", line 2075, in connect
    self._set_new_connection(self._reconnect_internal())
  File "/usr/local/lib/python2.7/dist-packages/cassandra/cluster.py", line 2110, in _reconnect_internal
    raise NoHostAvailable("Unable to connect to any servers", errors)
cassandra.cluster.NoHostAvailable: ('Unable to connect to any servers', {'172.31.5.30': ConnectionException(u'Failed to initialize new connection to 172.31.5.30: code=0000 [Server error] message="io.netty.handler.codec.DecoderException: org.apache.cassandra.transport.ProtocolException: Invalid or unsupported protocol version: 4"',)})

The Cassandra service is running, but I'm not sure why the script can't connect. Any ideas?
Thanks!
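
The server rejects protocol version 4, while the cqlsh banner above reports native protocol v3. A sketch of pinning the version on the client follows, using the driver's `protocol_version` argument to `Cluster`. The helper names are hypothetical, and whether the script exposes an option for this is not stated in the report.

```python
def cluster_kwargs(host, protocol_version=3):
    """Build keyword arguments for cassandra.cluster.Cluster."""
    return {
        "contact_points": [host],
        # Cassandra 2.1 speaks native protocol v3 at most.
        "protocol_version": protocol_version,
    }

def connect(host, protocol_version=3):
    # Imported here so the helper above works without the driver installed.
    from cassandra.cluster import Cluster
    cluster = Cluster(**cluster_kwargs(host, protocol_version))
    return cluster.connect()

kwargs = cluster_kwargs("172.31.5.30")
```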

Export incorrectly quotes Maps

Export of table definitions creates:

    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'

Should be:

    AND caching = {'keys':'ALL', 'rows_per_partition':'NONE'}

This only seems to affect the caching line; all other maps are generated correctly.
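
A sketch of converting the JSON-style caching value into the CQL map literal shown above. `caching_to_cql_map` is a hypothetical helper that takes the JSON text without its outer single quotes, and it assumes all keys and values are strings.

```python
import json

def caching_to_cql_map(json_text):
    """Turn '{"keys":"ALL", ...}' JSON text into a CQL map literal."""
    options = json.loads(json_text)

    def quote(value):
        # CQL string literals escape a single quote by doubling it.
        return "'" + str(value).replace("'", "''") + "'"

    pairs = ", ".join(
        "{0}:{1}".format(quote(key), quote(value))
        for key, value in options.items()
    )
    return "{" + pairs + "}"

cql_map = caching_to_cql_map('{"keys":"ALL", "rows_per_partition":"NONE"}')
```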
