gianlucaborello / cassandradump

A data exporting tool for Cassandra inspired from mysqldump, with some additional slice and dice capabilities

License: GNU General Public License v2.0

Python 84.22% Shell 7.63% Makefile 8.15%

python cassandra nosql

cassandradump's Issues

Syntax error

File "cassandradump.py", line 63
for has_counter, columns in itertools.groupby(tableval.columns.iteritems(), lambda (k, v): v.data_type.typename == 'counter')
^
SyntaxError: invalid syntax
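
This is the error Python 3 reports for `lambda (k, v): ...`: tuple parameters in function signatures were removed in Python 3 (PEP 3113), and `dict.iteritems()` no longer exists there either. Below is a minimal sketch of an equivalent grouping that runs on Python 3. The `DataType` and `Column` stand-ins and the `split_counter_columns` name are hypothetical and exist only to make the example self-contained; they are not taken from cassandradump.

```python
import itertools
from collections import namedtuple

# Hypothetical stand-ins for the driver's column metadata objects.
DataType = namedtuple("DataType", ["typename"])
Column = namedtuple("Column", ["data_type"])


def is_counter(item):
    # item is a (name, column) pair; index it instead of unpacking it
    # in the function signature, which Python 3 does not allow.
    return item[1].data_type.typename == "counter"


def split_counter_columns(columns):
    """Group consecutive (name, column) pairs by whether they are counters."""
    groups = []
    for has_counter, pairs in itertools.groupby(columns.items(), is_counter):
        groups.append((has_counter, [name for name, _ in pairs]))
    return groups


columns = {
    "id": Column(DataType("uuid")),
    "hits": Column(DataType("counter")),
    "misses": Column(DataType("counter")),
}
print(split_counter_columns(columns))
```

Note that `itertools.groupby` only groups adjacent items, so the result depends on the iteration order of `columns`.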

Ignore system keyspaces

I've found that when you don't pass a keyspace, the tool tries to export all keyspaces, including system ones such as "system_schema", "system_auth" and "system". These can't be overwritten on import, and I get errors like:

system_schema keyspace is not user-modifiable. or Cannot CREATE <keyspace system_auth>

I am not familiar with Cassandra, but even if there were a way to overwrite system keyspaces, it doesn't sound like something we should be doing.
A flag like --ignore-system, used without --keyspaces, should copy all keyspaces except the system ones.
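
A rough sketch of the filtering such a flag could perform. The `--ignore-system` flag is only the proposal from this issue, and `SYSTEM_KEYSPACES` and `user_keyspaces` are hypothetical names. The list of system keyspaces is an assumption and varies between Cassandra versions.

```python
# Keyspaces that Cassandra manages itself. This set is an assumption;
# check it against the Cassandra version being exported.
SYSTEM_KEYSPACES = frozenset([
    "system",
    "system_schema",
    "system_auth",
    "system_distributed",
    "system_traces",
])


def user_keyspaces(keyspace_names, ignore_system=True):
    """Return the keyspaces to export, optionally skipping system ones."""
    if not ignore_system:
        return list(keyspace_names)
    return [name for name in keyspace_names if name not in SYSTEM_KEYSPACES]


print(user_keyspaces(["system", "system_auth", "engine", "system_schema"]))
```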

There is no version specified for cassandra-driver in requires

Installing cassandradump pulls in cassandra-driver==3.0.0c1 as a requirement, which causes errors during export:

Traceback (most recent call last):
  File "/usr/local/bin/cassandradump", line 9, in <module>
    load_entry_point('cassandradump==0.0.1', 'console_scripts', 'cassandradump')()
  File "/usr/local/lib/python2.7/dist-packages/cassandradump.py", line 350, in main
    export_data(session)
  File "/usr/local/lib/python2.7/dist-packages/cassandradump.py", line 230, in export_data
    table_to_cqlfile(session, keyname, tablename, None, tableval, f)
  File "/usr/local/lib/python2.7/dist-packages/cassandradump.py", line 94, in table_to_cqlfile
    value_encoders = make_value_encoders(tableval)
  File "/usr/local/lib/python2.7/dist-packages/cassandradump.py", line 58, in make_value_encoders
    return dict((to_utf8(k), make_value_encoder(v.data_type.typename)) for k, v in tableval.columns.iteritems())
  File "/usr/local/lib/python2.7/dist-packages/cassandradump.py", line 58, in <genexpr>
    return dict((to_utf8(k), make_value_encoder(v.data_type.typename)) for k, v in tableval.columns.iteritems())
AttributeError: 'ColumnMetadata' object has no attribute 'data_type'

Manually reverting cassandra driver to 2.6.0 solves the problem.

schema dump ";;"

python ~/cassandradump/cassandradump.py --no-insert --export-file ./table2.cql --cf=keyspace1.table2

... AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';;

The export ends the file with a double ";".

When I try to import the schema:

cqlsh localhost < ./table2.cql
<stdin>:24:SyntaxException: <Error from server: code=2000 [Syntax error in CQL query] message="line 1:0 no viable alternative at input ';' ([;])">
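
As a workaround until the export is fixed, the extra terminator can be stripped before importing. This is a minimal sketch; `normalize_terminator` is a hypothetical helper, not part of cassandradump, and it assumes it is handed one complete statement at a time.

```python
def normalize_terminator(statement):
    """Return the statement with exactly one trailing semicolon."""
    # Drop any run of semicolons and whitespace at the end of the statement.
    body = statement.rstrip("; \t\r\n")
    # An input made only of semicolons and whitespace is not a statement.
    return body + ";" if body else ""


print(normalize_terminator("AND speculative_retry = '99PERCENTILE';;"))
```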

Anything I can do to speed up keyspace dump?

Hello,

I have been running cassandradump on our local test clusters to move keyspaces around by exporting and importing them.

However, on larger databases on multi-datacenter clusters, export jobs run for days before all the data is exported.

This is for a keyspace with replication factor 3, spread across 4 nodes, with around 200 GB of data in total across all nodes.

That would be fine given the amount of data involved, except that I am barely seeing any load on the machines holding the relevant data: no high CPU or IO usage, and no abnormal behavior.

With that in mind, is there anything I can tune to speed up exports of these larger databases?

Any suggestions appreciated!

Thanks,
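
One generic approach, which nothing in this thread says cassandradump supports, is to split the token ring into sub-ranges and run one query per range from several workers. The sketch below only computes the ranges and the query text. It assumes the cluster uses Murmur3Partitioner (tokens from -2^63 to 2^63 - 1); `split_token_ring`, `range_queries`, the table name and the partition key name are all hypothetical.

```python
MIN_TOKEN = -2 ** 63      # lower bound of the Murmur3 token ring (assumed partitioner)
MAX_TOKEN = 2 ** 63 - 1   # upper bound of the Murmur3 token ring


def split_token_ring(parts):
    """Split the full token ring into `parts` contiguous (start, end) ranges."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    span = MAX_TOKEN - MIN_TOKEN
    bounds = [MIN_TOKEN + span * i // parts for i in range(parts)]
    bounds.append(MAX_TOKEN)
    return list(zip(bounds[:-1], bounds[1:]))


def range_queries(table, partition_key, parts):
    """Build one SELECT per token range; together they cover the whole ring."""
    queries = []
    for index, (start, end) in enumerate(split_token_ring(parts)):
        # The first range includes its lower bound so that no token is skipped.
        lower = ">=" if index == 0 else ">"
        queries.append(
            "SELECT * FROM {0} WHERE token({1}) {2} {3} AND token({1}) <= {4}".format(
                table, partition_key, lower, start, end
            )
        )
    return queries


for query in range_queries("keyspace1.table2", "id", 4):
    print(query)
```

Each query could then be handed to its own worker. Whether this helps in practice depends on the cluster and on where the time is actually being spent.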

It would be great to be able to rename keyspaces as part of the export.

This way I wouldn't have to worry about clobbering production data by accident.

Exception on import large database

Hello, when I tried to import a large database (600k+ lines), the import threw an exception:

Traceback (most recent call last):
  File "../cassandradump/cassandradump.py", line 271, in <module>
    main()
  File "../cassandradump/cassandradump.py", line 263, in main
    import_data(session)
  File "../cassandradump/cassandradump.py", line 65, in import_data
    session.execute(line)
  File "/usr/local/lib/python2.7/dist-packages/cassandra/cluster.py", line 1405, in execute
    result = future.result(timeout)
  File "/usr/local/lib/python2.7/dist-packages/cassandra/cluster.py", line 2976, in result
    raise self._final_exception
cassandra.protocol.SyntaxException: <ErrorMessage code=2000 [Syntax error in CQL query] >message="line 0:-1 no viable alternative at input ''">

The script imported most of the data, but for the last table, which has 500k rows, only 11k were imported.

Thanks anyway.
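
The message `no viable alternative at input ''` suggests that an empty string reached `session.execute`, although the trace alone does not confirm this. Below is a minimal sketch of a guard, assuming the dump holds one statement per line, as the `session.execute(line)` call in the trace implies. `iter_statements` is a hypothetical helper.

```python
import io


def iter_statements(lines):
    """Yield non-empty statements, skipping blank or whitespace-only lines."""
    for line in lines:
        statement = line.strip()
        if statement:
            yield statement


# Stand-in for the dump file; the statements are made up for illustration.
dump = io.StringIO(
    u"INSERT INTO t (a) VALUES (1)\n"
    u"\n"
    u"   \n"
    u"INSERT INTO t (a) VALUES (2)\n"
)
for statement in iter_statements(dump):
    # In the import loop this is where session.execute(statement) would go.
    print(statement)
```

If a text value in the dump contains a newline, a line-by-line reader would also see a truncated statement, so this guard covers only one possible cause.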

Does not work with database containing unicode data.

Traceback (most recent call last):
  File "cassandradump.py", line 247, in <module>
    main()
  File "cassandradump.py", line 241, in main
    export_data(session)
  File "cassandradump.py", line 142, in export_data
    table_to_cqlfile(session, keyname, tablename, None, tableval, f)
  File "cassandradump.py", line 44, in table_to_cqlfile
    filep.write('INSERT INTO "' + keyspace + '"."' + tablename + '" (' + ', '.join(row.keys()) + ') VALUES (' + ', '.join(values) + ')\n')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 206: ordinal not in range(128)
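
On Python 2, concatenating byte strings with unicode strings triggers an implicit ASCII decode, which fails on bytes such as 0xf0 (the lead byte of a four-byte UTF-8 sequence). A minimal sketch of one way around it is to build each line as text and let the file object do the encoding. `format_insert` and `write_inserts` are hypothetical names, and the sketch assumes every piece passed in is already text rather than bytes.

```python
import io


def format_insert(keyspace, table, columns, values):
    """Build one INSERT line as text; all arguments are expected to be text."""
    return u'INSERT INTO "{0}"."{1}" ({2}) VALUES ({3})\n'.format(
        keyspace, table, u", ".join(columns), u", ".join(values)
    )


def write_inserts(path, lines):
    """Write text lines to `path`, encoding them as UTF-8 on the way out."""
    with io.open(path, "w", encoding="utf-8") as filep:
        for line in lines:
            filep.write(line)


line = format_insert(u"ks", u"users", [u"id", u"bio"], [u"1", u"'hello \U0001F600'"])
print(repr(line.encode("utf-8")))
```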

AttributeError: 'ColumnMetadata' object has no attribute 'data_type'

Hi,
I get an error while exporting this table with Cassandra 2.1:

CREATE TABLE slots (
    type text,
    host text,
    count int,
    PRIMARY KEY (type,host)
 );

Exporting all keyspaces
Exporting schema for keyspace engine
Exporting data for column family engine.slots
Traceback (most recent call last):
  File "cassandradump.py", line 351, in <module>
    main()
  File "cassandradump.py", line 345, in main
    export_data(session)
  File "cassandradump.py", line 225, in export_data
    table_to_cqlfile(session, keyname, tablename, None, tableval, f)
  File "cassandradump.py", line 94, in table_to_cqlfile
    value_encoders = make_value_encoders(tableval)
  File "cassandradump.py", line 58, in make_value_encoders
    return dict((to_utf8(k), make_value_encoder(v.data_type.typename)) for k, v in tableval.columns.ite
  File "cassandradump.py", line 58, in <genexpr>
    return dict((to_utf8(k), make_value_encoder(v.data_type.typename)) for k, v in tableval.columns.ite
AttributeError: 'ColumnMetadata' object has no attribute 'data_type'
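
In cassandra-driver 3.x the column metadata appears to expose the type as a `cql_type` string instead of a `data_type` object, which would explain this error. Treat that as an assumption to verify against the installed driver. A minimal sketch of a lookup that tolerates both shapes follows; `column_type_name` and the two stand-in classes are hypothetical.

```python
from collections import namedtuple

# Hypothetical stand-ins for the two metadata shapes.
OldType = namedtuple("OldType", ["typename"])
OldColumn = namedtuple("OldColumn", ["data_type"])   # driver 2.x style
NewColumn = namedtuple("NewColumn", ["cql_type"])    # driver 3.x style (assumed)


def column_type_name(column):
    """Return the column's type name for either metadata shape."""
    data_type = getattr(column, "data_type", None)
    if data_type is not None:
        return data_type.typename
    # Assumed 3.x attribute; for collections the string may include
    # type arguments, e.g. "list<text>".
    return column.cql_type


print(column_type_name(OldColumn(OldType("counter"))))
print(column_type_name(NewColumn("int")))
```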

Connection Failed

I get a connection problem when executing:
python dumper.py --export-file fxm_test.cql --host 172.31.5.30

I'm running Cassandra 2.1.5 and Python 2.7.6
cqlsh 5.0.1 | Cassandra 2.1.5 | CQL spec 3.2.0 | Native protocol v3

Traceback (most recent call last):
  File "dumper.py", line 356, in <module>
    main()
  File "dumper.py", line 345, in main
    session = setup_cluster()
  File "dumper.py", line 300, in setup_cluster
    session = cluster.connect()
  File "/usr/local/lib/python2.7/dist-packages/cassandra/cluster.py", line 839, in connect
    self.control_connection.connect()
  File "/usr/local/lib/python2.7/dist-packages/cassandra/cluster.py", line 2075, in connect
    self._set_new_connection(self._reconnect_internal())
  File "/usr/local/lib/python2.7/dist-packages/cassandra/cluster.py", line 2110, in _reconnect_internal
    raise NoHostAvailable("Unable to connect to any servers", errors)
cassandra.cluster.NoHostAvailable: ('Unable to connect to any servers', {'172.31.5.30': ConnectionException(u'Failed to initialize new connection to 172.31.5.30: code=0000 [Server error] message="io.netty.handler.codec.DecoderException: org.apache.cassandra.transport.ProtocolException: Invalid or unsupported protocol version: 4"',)})

The Cassandra service is running, but I'm not sure why the script can't connect. Any ideas?
Thanks!
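
The server message says protocol version 4 is unsupported, and the cqlsh banner above shows the server speaking native protocol v3, so the driver is most likely negotiating a newer protocol than Cassandra 2.1.5 accepts. A minimal sketch that pins the version through the driver's `protocol_version` argument follows. `cluster_options` and `connect` are hypothetical helpers, not functions from the script.

```python
def cluster_options(host, protocol_version=3):
    """Build keyword arguments for cassandra.cluster.Cluster."""
    return {"contact_points": [host], "protocol_version": protocol_version}


def connect(host, protocol_version=3):
    """Open a session with the protocol version pinned."""
    # Imported here so cluster_options can be used without the driver installed.
    from cassandra.cluster import Cluster

    cluster = Cluster(**cluster_options(host, protocol_version))
    return cluster.connect()


print(cluster_options("172.31.5.30"))
```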

Export incorrectly quotes Maps

Export of table definitions creates:

    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'

Should be:

    AND caching = {'keys':'ALL', 'rows_per_partition':'NONE'}

This only seems to affect the caching line; all other maps are generated correctly.
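
A minimal sketch of the conversion, assuming the caching option arrives as a JSON string as in the output above. `cql_quote` and `json_to_cql_map` are hypothetical helpers; keys are sorted only to make the output deterministic.

```python
import json


def cql_quote(text):
    """Quote a string as a CQL literal, doubling embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


def json_to_cql_map(json_text):
    """Turn a JSON object string into a CQL map literal."""
    mapping = json.loads(json_text)
    pairs = [
        cql_quote(str(key)) + ":" + cql_quote(str(value))
        for key, value in sorted(mapping.items())
    ]
    return "{" + ", ".join(pairs) + "}"


print("AND caching = " + json_to_cql_map('{"keys":"ALL", "rows_per_partition":"NONE"}'))
```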

Not working with case-sensitive column family names and columns
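
CQL folds unquoted identifiers to lowercase, so names that contain uppercase characters only survive when they are double-quoted. Below is a minimal sketch of quoting every identifier in a generated statement; `quote_identifier`, `qualified_table` and `column_list` are hypothetical helpers, and the names in the example are made up.

```python
def quote_identifier(name):
    """Double-quote a CQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def qualified_table(keyspace, table):
    """Return the quoted keyspace.table reference."""
    return quote_identifier(keyspace) + "." + quote_identifier(table)


def column_list(columns):
    """Return a comma-separated list of quoted column names."""
    return ", ".join(quote_identifier(column) for column in columns)


print("INSERT INTO " + qualified_table("MyKs", "UserEvents")
      + " (" + column_list(["userId", "eventType"]) + ") VALUES (1, 'login')")
```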

UDTs (User-Defined Types) are not encoded correctly

For example, a UDT Point{x int; y int} is encoded as:
"Point(x=1.234, y=3.231)"

However, for it to be CQL compliant, it should be encoded as:
"{x: 1.234, y: 3.231}"
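
A minimal sketch of such an encoder, assuming the UDT value reaches the exporter as a namedtuple-like object, as the `Point(x=..., y=...)` output suggests. `encode_value` is a hypothetical helper, and the `Point` type here is a local stand-in.

```python
from collections import namedtuple


def encode_value(value):
    """Encode a value as a CQL literal; namedtuple-like values become UDT literals."""
    if value is None:
        return "null"
    if hasattr(value, "_fields"):
        # Namedtuple-like object: emit {field: value, ...}, recursing so
        # that nested UDTs are handled too.
        fields = [
            "{0}: {1}".format(name, encode_value(getattr(value, name)))
            for name in value._fields
        ]
        return "{" + ", ".join(fields) + "}"
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    # Fallback that is only adequate for plain numbers.
    return repr(value)


Point = namedtuple("Point", ["x", "y"])
print(encode_value(Point(x=1.234, y=3.231)))
```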
