gianlucaborello / cassandradump
A data exporting tool for Cassandra inspired from mysqldump, with some additional slice and dice capabilities
License: GNU General Public License v2.0
File "cassandradump.py", line 63
for has_counter, columns in itertools.groupby(tableval.columns.iteritems(), lambda (k, v): v.data_type.typename == 'counter')
^
SyntaxError: invalid syntax
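This error is consistent with running the script under Python 3: a lambda that unpacks a tuple in its parameter list, as on line 63, is only valid in Python 2 (the syntax was removed by PEP 3113). A minimal sketch of an equivalent grouping that parses on both versions, assuming the columns mapping behaves like a dict. The `Column` and `DataType` stand-ins and the helper names are hypothetical and only mimic the `data_type.typename` attribute seen in the traceback:

```python
import itertools
from collections import namedtuple

# Hypothetical stand-ins for the driver's column metadata objects.
DataType = namedtuple("DataType", "typename")
Column = namedtuple("Column", "data_type")

def is_counter(item):
    # item is a (name, column) pair; index it instead of unpacking
    # it in the function signature, which Python 3 rejects.
    return item[1].data_type.typename == "counter"

def group_by_counter(columns):
    # dict.items() exists on both Python 2 and 3, unlike iteritems().
    # groupby only merges consecutive items, as in the original line.
    return [(has_counter, [name for name, _ in group])
            for has_counter, group in itertools.groupby(columns.items(), is_counter)]

columns = {"id": Column(DataType("text")), "hits": Column(DataType("counter"))}
groups = group_by_counter(columns)
```

This only addresses the one line in the traceback; the rest of the script may contain other Python 2 only constructs.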
I've found that when you don't pass a keyspace, this will try to export all keyspaces, including system_schema, system_auth, system, etc., which can't be overwritten, as I get errors like:
system_schema keyspace is not user-modifiable.
or Cannot CREATE <keyspace system_auth>
I am not familiar with Cassandra, but even if we find a way to overwrite system keyspaces, it doesn't sound like we should be doing it.
A flag like --ignore-system, when used without --keyspaces, should be able to copy all keyspaces except the system ones.
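A minimal sketch of the filtering such a flag could perform. Nothing in this issue indicates that `--ignore-system` exists in cassandradump; the function name and the list of system keyspace names below are assumptions (the list reflects the system keyspaces of recent Cassandra releases and may need adjusting per version):

```python
# Keyspaces managed by Cassandra itself (assumed list; adjust per version).
SYSTEM_KEYSPACES = frozenset([
    "system", "system_schema", "system_auth",
    "system_distributed", "system_traces",
])

def user_keyspaces(keyspace_names, ignore_system=True):
    """Return the keyspace names to export, optionally dropping system ones."""
    if not ignore_system:
        return list(keyspace_names)
    return [name for name in keyspace_names if name not in SYSTEM_KEYSPACES]

selected = user_keyspaces(["system", "engine", "system_auth", "keyspace1"])
```

Filtering by an explicit name set, rather than by a `system` prefix, avoids dropping a user keyspace that merely happens to start with that word.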
Installing cassandradump pulls in cassandra-driver==3.0.0c1 as a requirement, which causes errors when doing exports:
Traceback (most recent call last):
File "/usr/local/bin/cassandradump", line 9, in <module>
load_entry_point('cassandradump==0.0.1', 'console_scripts', 'cassandradump')()
File "/usr/local/lib/python2.7/dist-packages/cassandradump.py", line 350, in main
export_data(session)
File "/usr/local/lib/python2.7/dist-packages/cassandradump.py", line 230, in export_data
table_to_cqlfile(session, keyname, tablename, None, tableval, f)
File "/usr/local/lib/python2.7/dist-packages/cassandradump.py", line 94, in table_to_cqlfile
value_encoders = make_value_encoders(tableval)
File "/usr/local/lib/python2.7/dist-packages/cassandradump.py", line 58, in make_value_encoders
return dict((to_utf8(k), make_value_encoder(v.data_type.typename)) for k, v in tableval.columns.iteritems())
File "/usr/local/lib/python2.7/dist-packages/cassandradump.py", line 58, in <genexpr>
return dict((to_utf8(k), make_value_encoder(v.data_type.typename)) for k, v in tableval.columns.iteritems())
AttributeError: 'ColumnMetadata' object has no attribute 'data_type'
Manually reverting cassandra driver to 2.6.0 solves the problem.
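As an alternative to pinning the driver, the type lookup could tolerate both metadata layouts. The sketch below rests on an assumption that should be checked against the driver's upgrade notes: that 3.x `ColumnMetadata` objects expose the type as a `cql_type` string in place of `data_type`. The helper name and the two stand-in classes are hypothetical:

```python
def column_type_name(column):
    """Return a column's type name under either metadata layout."""
    data_type = getattr(column, "data_type", None)
    if data_type is not None:
        # Layout seen in the traceback (driver 2.x).
        return data_type.typename
    # Assumed 3.x layout: the type is a plain string such as 'counter'.
    # Collection types would carry parameters here, e.g. 'map<text, int>'.
    return column.cql_type

class OldStyleType(object):
    typename = "counter"

class OldStyleColumn(object):
    data_type = OldStyleType()

class NewStyleColumn(object):
    cql_type = "counter"

old_name = column_type_name(OldStyleColumn())
new_name = column_type_name(NewStyleColumn())
```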
python ~/cassandradump/cassandradump.py --no-insert --export-file ./table2.cql --cf=keyspace1.table2
... AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';;
The export writes a double ";" at the end of the file.
When I try to import the schema:
cqlsh localhost < ./table2.cql
:24:SyntaxException: <Error from server: code=2000 [Syntax error in CQL query] message="line 1:0 no viable alternative at input ';' ([;])">
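One way the exporter could guard against this is to normalize the terminator before writing each statement. A minimal sketch, assuming statements are handled as plain strings; `terminate_statement` is a hypothetical helper, not a function in cassandradump:

```python
def terminate_statement(statement):
    """Return the statement ending in exactly one semicolon."""
    # Strip any mix of trailing semicolons and whitespace, then add one ';'.
    return statement.rstrip("; \t\r\n") + ";"

fixed = terminate_statement("AND speculative_retry = '99PERCENTILE';;")
plain = terminate_statement("SELECT now() FROM system.local")
```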
Hello,
I have been running cassandradump on our local test clusters to export and import keyspaces.
However, on larger databases on multi-datacenter clusters, I see export jobs running for days before they finish exporting all the data.
I'm talking about a keyspace with replication factor 3, spread across 4 nodes, with around 200GB of data in total across all nodes.
That may be fine considering the amount of data we are dealing with, but I am barely seeing any load on the machines that hold the relevant pieces of data: no high CPU or IO usage, or any abnormal behavior really.
With that in mind, I was wondering if there's anything I can tune to further improve the speed of exporting those larger databases?
Any suggestions appreciated !
Thanks,
This way I wouldn't have to worry about clobbering production data by accident.
Hello, when I tried to import a large database (600k+ lines), it threw an exception:
Traceback (most recent call last):
File "../cassandradump/cassandradump.py", line 271, in
main()
File "../cassandradump/cassandradump.py", line 263, in main
import_data(session)
File "../cassandradump/cassandradump.py", line 65, in import_data
session.execute(line)
File "/usr/local/lib/python2.7/dist-packages/cassandra/cluster.py", line 1405, in execute
result = future.result(timeout)
File "/usr/local/lib/python2.7/dist-packages/cassandra/cluster.py", line 2976, in result
raise self._final_exception
cassandra.protocol.SyntaxException: <ErrorMessage code=2000 [Syntax error in CQL query] >message="line 0:-1 no viable alternative at input ''">
The script imported most of the data, but for the last table, which has 500k lines, it imported only 11k.
Thanks anyway.
Traceback (most recent call last):
File "cassandradump.py", line 247, in <module>
main()
File "cassandradump.py", line 241, in main
export_data(session)
File "cassandradump.py", line 142, in export_data
table_to_cqlfile(session, keyname, tablename, None, tableval, f)
File "cassandradump.py", line 44, in table_to_cqlfile
filep.write('INSERT INTO "' + keyspace + '"."' + tablename + '" (' + ', '.join(row.keys()) + ') VALUES (' + ', '.join(values) + ')\n')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 206: ordinal not in range(128)
Hi,
I get an error while exporting this table with Cassandra 2.1:
CREATE TABLE slots (
type text,
host text,
count int,
PRIMARY KEY (type,host)
);
Exporting all keyspaces
Exporting schema for keyspace engine
Exporting data for column family engine.slots
Traceback (most recent call last):
File "cassandradump.py", line 351, in <module>
main()
File "cassandradump.py", line 345, in main
export_data(session)
File "cassandradump.py", line 225, in export_data
table_to_cqlfile(session, keyname, tablename, None, tableval, f)
File "cassandradump.py", line 94, in table_to_cqlfile
value_encoders = make_value_encoders(tableval)
File "cassandradump.py", line 58, in make_value_encoders
return dict((to_utf8(k), make_value_encoder(v.data_type.typename)) for k, v in tableval.columns.ite
File "cassandradump.py", line 58, in <genexpr>
return dict((to_utf8(k), make_value_encoder(v.data_type.typename)) for k, v in tableval.columns.ite
AttributeError: 'ColumnMetadata' object has no attribute 'data_type'
I got a connection problem when executing
python dumper.py --export-file fxm_test.cql --host 172.31.5.30
I'm running Cassandra 2.1.5 and Python 2.7.6
cqlsh 5.0.1 | Cassandra 2.1.5 | CQL spec 3.2.0 | Native protocol v3
Traceback (most recent call last):
File "dumper.py", line 356, in <module>
main()
File "dumper.py", line 345, in main
session = setup_cluster()
File "dumper.py", line 300, in setup_cluster
session = cluster.connect()
File "/usr/local/lib/python2.7/dist-packages/cassandra/cluster.py", line 839, in connect
self.control_connection.connect()
File "/usr/local/lib/python2.7/dist-packages/cassandra/cluster.py", line 2075, in connect
self._set_new_connection(self._reconnect_internal())
File "/usr/local/lib/python2.7/dist-packages/cassandra/cluster.py", line 2110, in _reconnect_internal
raise NoHostAvailable("Unable to connect to any servers", errors)
cassandra.cluster.NoHostAvailable: ('Unable to connect to any servers', {'172.31.5.30': ConnectionException(u'Failed to initialize new connection to 172.31.5.30: code=0000 [Server error] message="io.netty.handler.codec.DecoderException: org.apache.cassandra.transport.ProtocolException: Invalid or unsupported protocol version: 4"',)})
The Cassandra service is running, but I'm not sure why the script couldn't connect. Any ideas?
Thanks!
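The error message says the server rejected native protocol version 4, while the cqlsh banner above shows this Cassandra 2.1.5 node speaking native protocol v3. The DataStax Python driver's `Cluster` constructor accepts a `protocol_version` argument, so one likely fix is to request v3 explicitly. A minimal sketch of choosing that value; `max_protocol_version` is a hypothetical helper and the version table is stated from memory of the Cassandra release history:

```python
def max_protocol_version(cassandra_version):
    """Highest native protocol version for a Cassandra release string."""
    major, minor = (int(part) for part in cassandra_version.split(".")[:2])
    if (major, minor) >= (4, 0):
        return 5
    if (major, minor) >= (2, 2):
        return 4
    if (major, minor) >= (2, 1):
        return 3
    if (major, minor) >= (2, 0):
        return 2
    return 1

version = max_protocol_version("2.1.5")

# With the driver installed, the value would be passed like this:
#   from cassandra.cluster import Cluster
#   cluster = Cluster(["172.31.5.30"], protocol_version=version)
#   session = cluster.connect()
```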
Export of table definitions creates:
AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
Should be:
AND caching = {'keys':'ALL', 'rows_per_partition':'NONE'}
It only seems to affect the caching line; all other maps are generated correctly.
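A minimal sketch of the conversion, assuming the schema metadata hands the caching option back as a JSON string, as the exported line suggests. `json_to_cql_map` is a hypothetical helper and it only handles flat objects:

```python
import json

def json_to_cql_map(text):
    """Convert a flat JSON object string into a CQL map literal."""
    def quote(value):
        # CQL string literals use single quotes; embedded quotes are doubled.
        return "'" + str(value).replace("'", "''") + "'"
    items = json.loads(text).items()
    return "{" + ", ".join(quote(k) + ":" + quote(v) for k, v in items) + "}"

literal = json_to_cql_map('{"keys":"ALL", "rows_per_partition":"NONE"}')
```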
For example, a UDT Point{x int; y int} is encoded as:
"Point(x=1.234, y=3.231)"
However, for it to be CQL compliant, it should be encoded as:
"{x: 1.234, y: 3.231}"
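A minimal sketch of such an encoder, assuming the driver hands UDT values back as namedtuple-like objects, which the `Point(x=..., y=...)` representation above suggests. `encode_udt` is a hypothetical helper and it only covers numeric fields; text fields would additionally need CQL single-quote escaping:

```python
from collections import namedtuple

def encode_udt(value):
    """Render a namedtuple-like UDT value as a CQL user-type literal."""
    # _fields lists the field names of a namedtuple in declaration order.
    fields = ("%s: %r" % (name, getattr(value, name)) for name in value._fields)
    return "{" + ", ".join(fields) + "}"

Point = namedtuple("Point", ["x", "y"])
encoded = encode_udt(Point(x=1.234, y=3.231))
```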