scylladb / gemini
Test data integrity by comparing against an Oracle running in parallel
License: Apache License 2.0
Currently, if you launch gemini in the wrong directory, the tool complains that:
cannot create schema: open schema.json: no such file or directory
Let's add a command line option to specify the schema configuration file.
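A minimal sketch of such an option, using only Go's standard flag package. The flag name "schema" matches the --schema parameter that appears in a later report on this page; the default value "schema.json" and the helper name parseSchemaPath are assumptions made for illustration, not Gemini's actual code.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// parseSchemaPath parses the command line arguments and returns the path of
// the schema configuration file. The default "schema.json" is an assumption
// that mirrors the file name in the error message above.
func parseSchemaPath(args []string) (string, error) {
	fs := flag.NewFlagSet("gemini", flag.ContinueOnError)
	path := fs.String("schema", "schema.json", "path to the schema configuration file")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	return *path, nil
}

func main() {
	path, err := parseSchemaPath(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	// Report the path that was actually tried, so a wrong working
	// directory is easy to spot.
	if _, err := os.Stat(path); err != nil {
		fmt.Fprintf(os.Stderr, "cannot open schema file %q: %v\n", path, err)
		os.Exit(1)
	}
	fmt.Println("using schema file:", path)
}
```

The flag package accepts both -schema and --schema, so the option can be passed in either form.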
I am seeing the following error:
Failed! Mutation 'INSERT INTO ks1.table1 (pk0,ck0,ck1,ck2,col0) VALUES (?,?,?,?,?)' (values=[ZhtcLiAhrobsZP7Y5f9kFocDbDhPo/9jeHrF059htGLYGB126jCxOYiet 5 1994-03-12 14:31:51 +0200 EET 0cTaXQ5la32tSTixuiyUEknNFRqn5VNL3b5hh+hWEF+n7zvhpCMSu8jkfzz Uwtgq0ax18drFRuRCDPtlE6ApdMRegzcEgDUoplOWbzWgjIBSB7FhO]) caused an error: 'can not marshal time.Time into int [cluster = test, query = 'INSERT INTO ks1.table1 (pk0,ck0,ck1,ck2,col0) VALUES (?,?,?,?,?)']'
git bisect blames the following commit:
13d308e4fafaa967ce0d97992cf682c244a0cc1b is the first bad commit
commit 13d308e4fafaa967ce0d97992cf682c244a0cc1b
Author: Henrik Johansson <[email protected]>
Date: Tue Mar 12 14:07:46 2019 +0100
schema: Clustering keys can now have the same types as partition keys.
:100644 100644 a6d99a514ad1231420c0ff3e1c19e2b9531a15e8 1285d83f682e5fe9ac325f666a11de776d74299c M CHANGELOG.md
:100644 100644 9bbefe97444f4404a2ea50f5d6e74f994823d89e d4cdada91f145b4e880f42f15c3d407717be69e5 M go.mod
:100644 100644 9ca24e213568b35b43472ae8298e84b324ae567e 87967a39f921bdaacc732dd09606e2e09da59467 M go.sum
:100644 100644 084890688da9a70a822c951092855a9aa81c9c34 5d3084a942089a890bbb10142fe368d464e3f186 M schema.go
Add support for CQL SELECT JSON statements:
http://cassandra.apache.org/doc/latest/cql/json.html#select-json
We need at least a "getting started guide" and perhaps something that describes the design of the tool to advanced users.
We currently generate rather few columns, and there are more types than fit in those columns.
Having this configurable would be neat to allow for more diverse data types being generated.
If we don't do this, the validation threads will step on each other's toes in the same way as when we have overlapping partitions for non-int partition keys.
We restricted partition key type to int in commit 0f8e0dc because we suspect a bug in how Gemini partitions keyspace between threads.
The session.Check() looks pretty broken. For example, if testIter.MapScan() fails in the middle of a result set, we bail out with a bogus ErrReadNoDataReturned error, which is treated as a successful case.
We have a reporter routine that handles this and we just need a flexible way to pass additional information.
Add support for CQL INSERT JSON statements:
http://cassandra.apache.org/doc/latest/cql/json.html#insert-json
Passing the --non-interactive flag to gemini:
$ ./scripts/gemini-launcher --non-interactive
causes the following panic:
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x38 pc=0x7250a0]
goroutine 150 [running]:
main.runJob.func1(0xc00026ac40, 0xc0002629c0, 0x8416c0, 0xc000271240, 0xc0002f6180, 0x8bb2c97000)
/home/penberg/go/src/github.com/scylladb/gemini/cmd/gemini/root.go:195 +0x2f0
created by main.runJob
/home/penberg/go/src/github.com/scylladb/gemini/cmd/gemini/root.go:171 +0x36b
Reported by @aleksbykov.
We currently support the following subset of CQL primitive types:
int
bigint
blob
uuid
text
varchar
timestamp
Let's add support for the rest of the primitive types enumerated here:
http://cassandra.apache.org/doc/latest/cql/types.html#native-types
There is heavy contention on the shared global lock.
When new dependencies are added, gemini-launcher does not seem to fetch them automatically:
[penberg@nero gemini]$ ./scripts/gemini-launcher --duration 10s --drop-schema
session.go:14:2: cannot find package "go.uber.org/multierr" in any of:
/usr/lib/golang/src/go.uber.org/multierr (from $GOROOT)
/home/penberg/go/src/go.uber.org/multierr (from $GOPATH)
Compilation failed
Gemini currently uses the default QUORUM consistency level. Let's add support for different consistency levels.
Creating string-based queries is error-prone and hard to change. With a flexible builder we could make this much safer and easier.
Example:
--concurrency=1
--max-tests=10000
The result is:
thread 0: write ops: 1 | read ops: 0 | write errors: 0 | read errors: 0
thread 0: write ops: 529 | read ops: 0 | write errors: 0 | read errors: 0
thread 0: write ops: 1017 | read ops: 0 | write errors: 0 | read errors: 0
thread 0: write ops: 1515 | read ops: 0 | write errors: 0 | read errors: 0
thread 0: write ops: 2010 | read ops: 0 | write errors: 0 | read errors: 0
thread 0: write ops: 2498 | read ops: 0 | write errors: 0 | read errors: 0
thread 0: write ops: 2998 | read ops: 0 | write errors: 0 | read errors: 0
thread 0: write ops: 3490 | read ops: 0 | write errors: 0 | read errors: 0
thread 0: write ops: 3978 | read ops: 0 | write errors: 0 | read errors: 0
thread 0: write ops: 4486 | read ops: 0 | write errors: 0 | read errors: 0
Results:
write ops: 4988
read ops: 0
write errors: 0
read errors: 0
This seems way too short; essentially half of the requested run is cut off.
Make it easier for machines to parse Gemini results by supporting result output to a file.
We get these errors:
Clustering column "ck1" cannot be restricted (preceding column "ck0" is restricted by a non-EQ relation
This is a result of queries like:
SELECT * FROM ks1.table1 WHERE pk0 = ? AND ck0 > ? AND ck0 < ? AND ck1 > ? AND ck1 < ? AND ck2 > ? AND ck2 < ?
We need to be smarter and generate a variety of queries with proper EQ restrictions on the leading (leftmost) clustering columns. Many combinations can and should be generated.
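The rule behind the error can be sketched as follows: every clustering column before the range-restricted one gets an EQ restriction, one column gets the range, and the columns after it are left unrestricted. The function name clusteringRangeQuery and its parameters are hypothetical; this is an illustration of the rule, not Gemini's generator.

```go
package main

import (
	"fmt"
	"strings"
)

// clusteringRangeQuery builds a SELECT in which all partition keys and the
// clustering columns before rangeIdx are restricted with "=", and only the
// clustering column at rangeIdx is restricted with a range.
func clusteringRangeQuery(table string, pks, cks []string, rangeIdx int) (string, error) {
	if rangeIdx < 0 || rangeIdx >= len(cks) {
		return "", fmt.Errorf("range index %d out of bounds for %d clustering columns", rangeIdx, len(cks))
	}
	var conds []string
	for _, pk := range pks {
		conds = append(conds, pk+" = ?")
	}
	// EQ restrictions on the leading clustering columns.
	for _, ck := range cks[:rangeIdx] {
		conds = append(conds, ck+" = ?")
	}
	// A single range restriction; later clustering columns stay unrestricted.
	conds = append(conds, cks[rangeIdx]+" > ?", cks[rangeIdx]+" < ?")
	return "SELECT * FROM " + table + " WHERE " + strings.Join(conds, " AND "), nil
}

func main() {
	cks := []string{"ck0", "ck1", "ck2"}
	// Print one valid query per choice of the range-restricted column.
	for i := range cks {
		q, err := clusteringRangeQuery("ks1.table1", []string{"pk0"}, cks, i)
		if err != nil {
			panic(err)
		}
		fmt.Println(q)
	}
}
```

A generator could pick rangeIdx at random per query, which yields the different combinations while never placing a restriction after a non-EQ relation.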
The reason we haven't seen them before is because of missing error handling which we should also fix.
With the introduction of the complex types, the schema input file was left behind. It needs to be tidied up and made understandable. The complex types need a better and more descriptive JSON notation.
This function returns the number of rows available in the current page, not the count of the entire result set.
This can be very frustrating to say the least.
I passed the --fail-fast option to gemini, but after a timeout error, the tool keeps running.
This is a convenient thing and can possibly help with bootstrapping issues.
Both Scylla and Cassandra have Docker images, which can be used for testing. Let's add a launcher script to Gemini, which automates docker run and IP address discovery to simplify Gemini setup.
We need a blacklist or something similar for types for which this is not supported.
It should simplify query building, make the code more readable, and provide dogfooding for our own projects.
Add support for generating schema and queries with materialized views.
If run several times against the same nodes, gemini returns the following results:
{
"result": {
"write_ops": 0,
"write_errors": 0,
"read_ops": 0,
"read_errors": 0,
"errors": null
}
}
The command that was run:
./gemini -d --duration 60s -c 20 -p 100 -m mixed -f -v --test-cluster=34.201.251.194 --oracle-cluster=3.93.183.245
The gemini version is 0.9.2.
Meanwhile, the debug log shows that a lot of operations were executed:
INSERT INTO ks1.table1 (pk0,ck0,ck1,ck2,col0,col1,col2,col3,col4,col5) VALUES (?,?,?,?,?,?,?,?,?,(?,?)) (values=[861 2010-10-14 831 aAQ1fWYPK map[udt_672245080_3:52 udt_672245080_0:X19J2m9kx udt_672245080_1:true udt_672245080_2:5594391364288805258] 2006-12-15 22:54:25 +0600 +06 0.812 c51db500-53ed-11d4-99b8-b06ebf2a6a60 map[false:14h17m0s true:14h22m0s] TavRBi29 859])
INSERT INTO ks1.table1 (pk0,ck0,ck1,ck2,col0,col1,col2,col3,col4,col5) VALUES (?,?,?,?,?,?,?,?,?,(?,?)) (values=[561 2010-10-14 531 aAQ1fWzfc map[udt_672245080_0:XsJs udt_672245080_1:false udt_672245080_2:8730011856623748374 udt_672245080_3:5831394a326e684678] 2006-12-15 22:54:25 +0600 +06 0.512 c51db500-53ed-11d4-99b9-b06ebf2a6a60 map[false:9h17m0s true:9h22m0s] TavRBi7K 559])
INSERT INTO ks1.table1 (pk0,ck0,ck1,ck2,col0,col1,col2,col3,col4,col5) VALUES (?,?,?,?,?,?,?,?,?,(?,?)) (values=[6 2010-10-14 31 aAQ1fRxcU map[udt_672245080_1:true udt_672245080_2:5594391364288805258 udt_672245080_3:52 udt_672245080_0:X19J2obzb] 2006-12-15 22:54:25 +0600 +06 0.012 c51db500-53ed-11d4-99ba-b06ebf2a6a60 map[true:1h2m0s false:57m0s] TavRBmqU 3])
INSERT INTO ks1.table1 (pk0,ck0,ck1,ck2,col0,col1,col2,col3,col4,col5) VALUES (?,?,?,?,?,?,?,?,?,(?,?)) (values=[1061 2010-10-14 1031 aAQ1fUMZ1 map[udt_672245080_1:true udt_672245080_2:5594391364288805258 udt_672245080_3:52 udt_672245080_0:X19J2kB7o] 2006-12-15 22:54:25 +0600 +06 1.012 c51db500-53ed-11d4-99bb-b06ebf2a6a60 map[false:17h37m0s true:17h42m0s] TavRBnWw 1059])
INSERT INTO ks1.table1 (pk0,ck0,ck1,ck2,col0,col1,col2,col3,col4,col5) VALUES (?,?,?,?,?,?,?,?,?,(?,?)) (values=[961 2010-10-14 931 aAQ1fTj4G map[udt_672245080_0:XsJs udt_672245080_1:false udt_672245080_2:8730011856623748374 udt_672245080_3:5831394a326d465349] 2006-12-15 22:54:25 +0600 +06 0.912 c51db500-53ed-11d4-99bc-b06ebf2a6a60 map[false:15h57m0s true:16h2m0s] TavRBjXB 959])
INSERT INTO ks1.table1 (pk0,ck0,ck1,ck2,col0,col1,col2,col3,col4,col5) VALUES (?,?,?,?,?,?,?,?,?,(?,?)) (values=[1661 2010-10-14 1631 aAQ1fVynJ map[udt_672245080_2:8730011856623748374 udt_672245080_3:5831394a3270337261 udt_672245080_0:XsJs udt_672245080_1:false] 2006-12-15 22:54:25 +0600 +06 1.612 c51db500-53ed-11d4-99bd-b06ebf2a6a60 map[false:27h37m0s true:27h42m0s] TavRBl7O 1659])
INSERT INTO ks1.table1 (pk0,ck0,ck1,ck2,col0,col1,col2,col3,col4,col5) VALUES (?,?,?,?,?,?,?,?,?,(?,?)) (values=[1961 2010-10-14 1931 aAQ1fVIYV map[udt_672245080_1:false udt_672245080_2:8730011856623748374 udt_672245080_3:5831394a326e45684a udt_672245080_0:XsJs] 2006-12-15 22:54:25 +0600 +06 1.912 c51db500-53ed-11d4-99be-b06ebf2a6a60 map[false:32h37m0s true:32h42m0s] TavRBggJ 1959])
SELECT * FROM ks1.table1 WHERE pk0 IN (?) AND col2=? (values=[0.416])
SELECT * FROM ks1.table1 WHERE pk0 IN (?) AND col2=? (values=[1.716])
SELECT * FROM ks1.table1 WHERE pk0 IN (?) AND col2=? (values=[0.716])
SELECT * FROM ks1.table1 WHERE pk0 IN (?) AND col2=? (values=[0.616])
{
"result": {
"write_ops": 0,
"write_errors": 0,
"read_ops": 0,
"read_errors": 0,
"errors": null
}
}
Test run completed. Exiting.
It is good to use our own software and more and more users are using it over the regular driver.
When building for the first time, all the dependencies are missing. By using Go modules, this can quite easily be fixed so that all the needed dependencies are downloaded automatically.
Add a command line option to run the tool for a specified duration.
Let's generate schema updates (e.g. ALTER TABLE) as part of the test run to find Scylla bugs in that area.
This is very convenient and allows for easy inclusion of the version in reports.
Add support for generating queries that use the ALLOW FILTERING option.
Currently each job is intended to work on an isolated set of partitions. This brings good things like easy coordination and program state management. It also comes with problems, such as range queries that cross partitions not working. This is an unacceptable limitation in the long run, so we simply have to find another way to handle this.
When a Gemini test fails, the tool prints the CQL statement as follows:
SELECT * FROM ks1.table1 WHERE pk0 = ?' (values=[W]
Let's make the CQL statement executable from cqlsh without cumbersome copy-paste by substituting the question marks as follows:
SELECT * FROM ks1.table1 WHERE pk0 = textasblob('W');
(In this scenario, the type of pk0 is blob, hence the textasblob conversion function.)
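A rough sketch of the substitution, under two stated assumptions: the statement contains "?" only as bind markers (never inside a string literal), and only a handful of Go value types need to be covered. Blobs are rendered here as CQL hexadecimal constants (0x...) rather than with textasblob, because that also works for bytes that are not printable text. The names cqlLiteral and inlineValues are hypothetical.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

// cqlLiteral renders a single bound value as a CQL literal.
// Only a few types are covered in this sketch.
func cqlLiteral(v interface{}) (string, error) {
	switch x := v.(type) {
	case string:
		// Single quotes inside CQL strings are escaped by doubling them.
		return "'" + strings.ReplaceAll(x, "'", "''") + "'", nil
	case []byte:
		return "0x" + hex.EncodeToString(x), nil
	case int, int32, int64:
		return fmt.Sprintf("%d", x), nil
	case bool:
		return fmt.Sprintf("%t", x), nil
	default:
		return "", fmt.Errorf("unsupported value type %T", v)
	}
}

// inlineValues replaces each "?" in stmt with the literal of the matching
// value and appends ";" so the result can be pasted into cqlsh.
func inlineValues(stmt string, values []interface{}) (string, error) {
	var b strings.Builder
	next := 0
	for _, r := range stmt {
		if r != '?' {
			b.WriteRune(r)
			continue
		}
		if next >= len(values) {
			return "", fmt.Errorf("statement has more bind markers than values (%d)", len(values))
		}
		lit, err := cqlLiteral(values[next])
		if err != nil {
			return "", err
		}
		b.WriteString(lit)
		next++
	}
	if next != len(values) {
		return "", fmt.Errorf("%d values supplied but only %d bind markers found", len(values), next)
	}
	return b.String() + ";", nil
}

func main() {
	q, err := inlineValues("SELECT * FROM ks1.table1 WHERE pk0 = ?", []interface{}{[]byte("W")})
	if err != nil {
		panic(err)
	}
	fmt.Println(q) // prints: SELECT * FROM ks1.table1 WHERE pk0 = 0x57;
}
```

Returning an error on a marker/value count mismatch keeps a malformed log line from silently producing a statement that looks runnable but is not.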
Add support for generating schema and query with secondary indexes.
If the --schema parameter is used, gemini stops with an error:
./gemini -d --duration 60s -c 20 -p 100 -m mixed -f --test-cluster=34.201.251.194 --oracle-cluster=3.93.183.245 --schema scheme.json
Seed: 1
Maximum duration: 1m0s
Concurrency: 20
Number of partitions per thread: 100
Test cluster: 34.201.251.194
Oracle cluster: 3.93.183.245
Output file:
cannot create schema: json: cannot unmarshal string into Go struct field ColumnDef.Type of type gemini.Type
The schema file is located here: https://s3.amazonaws.com/scylla-gemini/Binaries/schema.json
We generate range queries for both partition and clustering keys, which limits the types of the columns to int due to limitations in value generation.
Make Gemini generate schema using different compaction strategies.
Users can already specify the schema via a configuration file, but let's add schema generation as part of the tool to increase test coverage.
Triggering bugs in Scylla can require gemini to be run for a long time. When a problem happens, all the interesting information about IP addresses and schema is lost because of noisy progress reporting.