scylladb / gemini

Test data integrity by comparing against an Oracle running in parallel

License: Apache License 2.0

Go 99.03% Shell 0.51% Makefile 0.44% Dockerfile 0.01%

gemini's Issues

Command line option for schema configuration file

Currently, if you launch gemini in the wrong directory, the tool complains that:

cannot create schema: open schema.json: no such file or directory

Let's add a command line option to specify the schema configuration file.
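A minimal sketch of such an option, using only Go's standard flag package. The helper name parseSchemaPath and the default value are assumptions for illustration, not Gemini's actual implementation; the flag name mirrors the --schema usage that appears in a later report.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// parseSchemaPath parses args and returns the schema file path.
// The default "schema.json" matches the file name in the error above;
// the function itself is a hypothetical helper.
func parseSchemaPath(args []string) (string, error) {
	fs := flag.NewFlagSet("gemini", flag.ContinueOnError)
	path := fs.String("schema", "schema.json", "path to the schema configuration file")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	return *path, nil
}

func main() {
	path, err := parseSchemaPath(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	fmt.Println("using schema file:", path)
}
```

With this in place the tool could be started from any directory, for example with --schema /path/to/schema.json.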

Insertion failures because of time.Time marshaling problem

I am seeing the following error:

Failed! Mutation 'INSERT INTO ks1.table1 (pk0,ck0,ck1,ck2,col0) VALUES (?,?,?,?,?)' (values=[ZhtcLiAhrobsZP7Y5f9kFocDbDhPo/9jeHrF059htGLYGB126jCxOYiet 5 1994-03-12 14:31:51 +0200 EET 0cTaXQ5la32tSTixuiyUEknNFRqn5VNL3b5hh+hWEF+n7zvhpCMSu8jkfzz Uwtgq0ax18drFRuRCDPtlE6ApdMRegzcEgDUoplOWbzWgjIBSB7FhO]) caused an error: 'can not marshal time.Time into int [cluster = test, query = 'INSERT INTO ks1.table1 (pk0,ck0,ck1,ck2,col0) VALUES (?,?,?,?,?)']'

git bisect blames the following commit:

13d308e4fafaa967ce0d97992cf682c244a0cc1b is the first bad commit
commit 13d308e4fafaa967ce0d97992cf682c244a0cc1b
Author: Henrik Johansson <[email protected]>
Date:   Tue Mar 12 14:07:46 2019 +0100

    schema: Clustering keys can now have the same types as partition keys.

:100644 100644 a6d99a514ad1231420c0ff3e1c19e2b9531a15e8 1285d83f682e5fe9ac325f666a11de776d74299c M      CHANGELOG.md
:100644 100644 9bbefe97444f4404a2ea50f5d6e74f994823d89e d4cdada91f145b4e880f42f15c3d407717be69e5 M      go.mod
:100644 100644 9ca24e213568b35b43472ae8298e84b324ae567e 87967a39f921bdaacc732dd09606e2e09da59467 M      go.sum
:100644 100644 084890688da9a70a822c951092855a9aa81c9c34 5d3084a942089a890bbb10142fe368d464e3f186 M      schema.go

CQL SELECT JSON support

Add support for CQL SELECT JSON statements:

http://cassandra.apache.org/doc/latest/cql/json.html#select-json

Improve documentation

We need at least a "getting started guide" and perhaps something that describes the design of the tool to advanced users.

Column count configuration

We currently generate rather few columns, fewer than the number of supported types, so not every type can appear in a table.
Making the column count configurable would allow more diverse data types to be generated.

Restrict secondary index queries to a partition

If we don't do this, validations will step on each other's toes in the same way as when we have overlapping partitions for non-int partition keys.

Partition key type is restricted to int

We restricted partition key type to int in commit 0f8e0dc because we suspect a bug in how Gemini partitions keyspace between threads.

Test result validation is broken

session.Check() looks pretty broken. For example, if testIter.MapScan() fails in the middle of a result set, we bail out with a bogus ErrReadNoDataReturned error, which is treated as a successful case.

Extend error details propagation

We have a reporter routine that handles this and we just need a flexible way to pass additional information.

CQL INSERT JSON support

Add support for CQL INSERT JSON statements:

http://cassandra.apache.org/doc/latest/cql/json.html#insert-json

Passing --non-interactive flag causes a panic

Passing the --non-interactive flag to gemini:

$ ./scripts/gemini-launcher --non-interactive

causes the following panic:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x38 pc=0x7250a0]

goroutine 150 [running]:
main.runJob.func1(0xc00026ac40, 0xc0002629c0, 0x8416c0, 0xc000271240, 0xc0002f6180, 0x8bb2c97000)
        /home/penberg/go/src/github.com/scylladb/gemini/cmd/gemini/root.go:195 +0x2f0
created by main.runJob
        /home/penberg/go/src/github.com/scylladb/gemini/cmd/gemini/root.go:171 +0x36b

Reported by @aleksbykov.

Complete primitive CQL type support

We currently support the following subset of CQL primitive types:

int
bigint
blob
uuid
text
varchar
timestamp

Let's add support for the rest of the primitive types enumerated here:

http://cassandra.apache.org/doc/latest/cql/types.html#native-types

Goroutine local random number generator

There is heavy contention on the shared global lock.
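One way to avoid the shared lock, sketched below, is to give every goroutine its own generator. This assumes the contention comes from the package-level math/rand functions, which share a single source; the worker function and the seed derivation are illustrative only.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
)

// worker draws n values from a generator that it owns exclusively,
// so it never touches the source shared by the package-level functions.
func worker(seed int64, n int) []int {
	rng := rand.New(rand.NewSource(seed)) // not safe for sharing; one per goroutine
	out := make([]int, n)
	for i := range out {
		out[i] = rng.Intn(1000)
	}
	return out
}

func main() {
	const workers = 4
	results := make([][]int, workers)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			// Derive a distinct seed per goroutine from a base seed of 1.
			results[id] = worker(int64(1+id), 3)
		}(i)
	}
	wg.Wait()
	for id, r := range results {
		fmt.Printf("worker %d: %v\n", id, r)
	}
}
```

Because each generator is seeded deterministically, a run stays reproducible for a given base seed and goroutine count.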

Gemini launcher does not fetch new dependencies

When new dependencies are added, gemini-launcher does not seem to fetch them automatically:

[penberg@nero gemini]$ ./scripts/gemini-launcher --duration 10s --drop-schema
session.go:14:2: cannot find package "go.uber.org/multierr" in any of:
	/usr/lib/golang/src/go.uber.org/multierr (from $GOROOT)
	/home/penberg/go/src/go.uber.org/multierr (from $GOPATH)
Compilation failed

Support different consistency levels

Gemini currently uses the default QUORUM consistency level. Let's add support for different consistency levels.

refactoring: create query builder

Creating string based queries is error prone and hard to change. With a flexible builder we could make this much safer and easier.
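To illustrate the idea, here is a minimal sketch of a builder that keeps the statement text and the bound values together. SelectBuilder and its methods are hypothetical names, not an existing Gemini or driver API.

```go
package main

import (
	"fmt"
	"strings"
)

// SelectBuilder is a hypothetical, minimal builder for SELECT statements.
type SelectBuilder struct {
	table  string
	conds  []string
	values []interface{}
}

// Select starts a SELECT * statement on the given table.
func Select(table string) *SelectBuilder {
	return &SelectBuilder{table: table}
}

// Where appends "column op ?" and records the value bound to the placeholder.
func (b *SelectBuilder) Where(column, op string, value interface{}) *SelectBuilder {
	b.conds = append(b.conds, column+" "+op+" ?")
	b.values = append(b.values, value)
	return b
}

// Build returns the statement text and the values in placeholder order.
func (b *SelectBuilder) Build() (string, []interface{}) {
	stmt := "SELECT * FROM " + b.table
	if len(b.conds) > 0 {
		stmt += " WHERE " + strings.Join(b.conds, " AND ")
	}
	return stmt, b.values
}

func main() {
	stmt, values := Select("ks1.table1").
		Where("pk0", "=", 1).
		Where("ck0", ">", 10).
		Build()
	fmt.Println(stmt, values)
}
```

Keeping placeholders and values in one place means they cannot drift out of order, which is the main hazard of hand-assembled query strings.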

max-tests seems to not be honored

Example:

--concurrency=1
--max-tests=10000

The result is:

thread 0: write ops: 1 | read ops: 0 | write errors: 0 | read errors: 0
thread 0: write ops: 529 | read ops: 0 | write errors: 0 | read errors: 0
thread 0: write ops: 1017 | read ops: 0 | write errors: 0 | read errors: 0
thread 0: write ops: 1515 | read ops: 0 | write errors: 0 | read errors: 0
thread 0: write ops: 2010 | read ops: 0 | write errors: 0 | read errors: 0
thread 0: write ops: 2498 | read ops: 0 | write errors: 0 | read errors: 0
thread 0: write ops: 2998 | read ops: 0 | write errors: 0 | read errors: 0
thread 0: write ops: 3490 | read ops: 0 | write errors: 0 | read errors: 0
thread 0: write ops: 3978 | read ops: 0 | write errors: 0 | read errors: 0
thread 0: write ops: 4486 | read ops: 0 | write errors: 0 | read errors: 0
Results:
	write ops:    4988
	read ops:     0
	write errors: 0
	read errors:  0

This seems way too short; essentially half of the requested run is cut off.

Result output to a file

Make it easier for machines to parse Gemini results by supporting result output to a file.
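A possible shape for this, sketched with encoding/json. The field names follow the JSON result shown in a later report on this page; the Go type names, the element type of errors, and the output path are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// Result mirrors the counters printed at the end of a run.
type Result struct {
	WriteOps    int      `json:"write_ops"`
	WriteErrors int      `json:"write_errors"`
	ReadOps     int      `json:"read_ops"`
	ReadErrors  int      `json:"read_errors"`
	Errors      []string `json:"errors"` // element type is an assumption
}

// Report wraps Result under a top-level "result" key.
type Report struct {
	Result Result `json:"result"`
}

// encodeReport renders the report as indented JSON.
func encodeReport(r Report) ([]byte, error) {
	return json.MarshalIndent(r, "", "  ")
}

// writeReport writes the encoded report to path, replacing any existing file.
func writeReport(path string, r Report) error {
	data, err := encodeReport(r)
	if err != nil {
		return err
	}
	return os.WriteFile(path, data, 0o644)
}

func main() {
	report := Report{Result: Result{WriteOps: 4988}}
	path := filepath.Join(os.TempDir(), "gemini-result.json") // illustrative location
	if err := writeReport(path, report); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("wrote", path)
}
```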

Invalid clustering range queries generated

We get these errors:
Clustering column "ck1" cannot be restricted (preceding column "ck0" is restricted by a non-EQ relation)

This is a result of queries like:
SELECT * FROM ks1.table1 WHERE pk0 = ? AND ck0 > ? AND ck0 < ? AND ck1 > ? AND ck1 < ? AND ck2 > ? AND ck2 < ?

We need to be smarter and generate a variety of queries with proper EQ restrictions on the leading clustering columns. Many combinations can and should be generated.
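The rule can be sketched as follows: restrict a prefix of the clustering columns with equality, optionally put a range on the next column, and leave the rest unrestricted. The function clusteringClause is a hypothetical helper for illustration, not Gemini code.

```go
package main

import (
	"fmt"
	"strings"
)

// clusteringClause restricts the first eqCount clustering columns with "=",
// and, if withRange is set, the following column with a range. Columns after
// the range stay unrestricted, so no column follows a non-EQ relation.
func clusteringClause(cks []string, eqCount int, withRange bool) string {
	if eqCount < 0 {
		eqCount = 0
	}
	if eqCount > len(cks) {
		eqCount = len(cks)
	}
	var conds []string
	for i := 0; i < eqCount; i++ {
		conds = append(conds, cks[i]+" = ?")
	}
	if withRange && eqCount < len(cks) {
		conds = append(conds, cks[eqCount]+" > ?", cks[eqCount]+" < ?")
	}
	return strings.Join(conds, " AND ")
}

func main() {
	cks := []string{"ck0", "ck1", "ck2"}
	// Enumerate every prefix length, with and without a trailing range.
	for eq := 0; eq <= len(cks); eq++ {
		for _, withRange := range []bool{false, true} {
			stmt := "SELECT * FROM ks1.table1 WHERE pk0 = ?"
			if c := clusteringClause(cks, eq, withRange); c != "" {
				stmt += " AND " + c
			}
			fmt.Println(stmt)
		}
	}
}
```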

The reason we haven't seen them before is because of missing error handling which we should also fix.

JSON Schema file input too complex

With the introduction of the complex types, the schema input file was left behind. It needs to be tidied up and made understandable. The complex types need a better and more descriptive JSON notation.

Validation can't rely on NumRows()

This function returns the number of rows available in the current page, not the count of the entire result set.

gemini-launcher silently continues even if compilation fails

This can be very frustrating to say the least.

User-defined type (UDT) CQL support

Refined fail-fast option

I passed the --fail-fast option to gemini, but after a timeout error, the tool keeps running.

Support list of hosts for the clusters

This is a convenient thing and can possibly help with bootstrapping issues.

Gemini launcher for Docker images

Both Scylla and Cassandra have Docker images, which can be used for testing. Let's add a launcher script to Gemini, which automates docker run and IP address discovery to simplify Gemini setup.

Secondary indexes not supported for durations

We need a blacklist or something similar for types for which secondary indexes are not supported.

Use Gocqlx

It should simplify query building, make the code more readable, and provide dogfooding for our own projects.

Materialized views support

Add support for generating schema and queries with materialized views.

Result of gemini contains only 0

If gemini is run several times against the same nodes, it returns the following result:
{
  "result": {
    "write_ops": 0,
    "write_errors": 0,
    "read_ops": 0,
    "read_errors": 0,
    "errors": null
  }
}
The command used was:
./gemini -d --duration 60s -c 20 -p 100 -m mixed -f -v --test-cluster=34.201.251.194 --oracle-cluster=3.93.183.245
The gemini version is 0.9.2.
The debug log, however, shows that a lot of operations were executed:
INSERT INTO ks1.table1 (pk0,ck0,ck1,ck2,col0,col1,col2,col3,col4,col5) VALUES (?,?,?,?,?,?,?,?,?,(?,?)) (values=[861 2010-10-14 831 aAQ1fWYPK map[udt_672245080_3:52 udt_672245080_0:X19J2m9kx udt_672245080_1:true udt_672245080_2:5594391364288805258] 2006-12-15 22:54:25 +0600 +06 0.812 c51db500-53ed-11d4-99b8-b06ebf2a6a60 map[false:14h17m0s true:14h22m0s] TavRBi29 859])
INSERT INTO ks1.table1 (pk0,ck0,ck1,ck2,col0,col1,col2,col3,col4,col5) VALUES (?,?,?,?,?,?,?,?,?,(?,?)) (values=[561 2010-10-14 531 aAQ1fWzfc map[udt_672245080_0:XsJs udt_672245080_1:false udt_672245080_2:8730011856623748374 udt_672245080_3:5831394a326e684678] 2006-12-15 22:54:25 +0600 +06 0.512 c51db500-53ed-11d4-99b9-b06ebf2a6a60 map[false:9h17m0s true:9h22m0s] TavRBi7K 559])
INSERT INTO ks1.table1 (pk0,ck0,ck1,ck2,col0,col1,col2,col3,col4,col5) VALUES (?,?,?,?,?,?,?,?,?,(?,?)) (values=[6 2010-10-14 31 aAQ1fRxcU map[udt_672245080_1:true udt_672245080_2:5594391364288805258 udt_672245080_3:52 udt_672245080_0:X19J2obzb] 2006-12-15 22:54:25 +0600 +06 0.012 c51db500-53ed-11d4-99ba-b06ebf2a6a60 map[true:1h2m0s false:57m0s] TavRBmqU 3])
INSERT INTO ks1.table1 (pk0,ck0,ck1,ck2,col0,col1,col2,col3,col4,col5) VALUES (?,?,?,?,?,?,?,?,?,(?,?)) (values=[1061 2010-10-14 1031 aAQ1fUMZ1 map[udt_672245080_1:true udt_672245080_2:5594391364288805258 udt_672245080_3:52 udt_672245080_0:X19J2kB7o] 2006-12-15 22:54:25 +0600 +06 1.012 c51db500-53ed-11d4-99bb-b06ebf2a6a60 map[false:17h37m0s true:17h42m0s] TavRBnWw 1059])
INSERT INTO ks1.table1 (pk0,ck0,ck1,ck2,col0,col1,col2,col3,col4,col5) VALUES (?,?,?,?,?,?,?,?,?,(?,?)) (values=[961 2010-10-14 931 aAQ1fTj4G map[udt_672245080_0:XsJs udt_672245080_1:false udt_672245080_2:8730011856623748374 udt_672245080_3:5831394a326d465349] 2006-12-15 22:54:25 +0600 +06 0.912 c51db500-53ed-11d4-99bc-b06ebf2a6a60 map[false:15h57m0s true:16h2m0s] TavRBjXB 959])
INSERT INTO ks1.table1 (pk0,ck0,ck1,ck2,col0,col1,col2,col3,col4,col5) VALUES (?,?,?,?,?,?,?,?,?,(?,?)) (values=[1661 2010-10-14 1631 aAQ1fVynJ map[udt_672245080_2:8730011856623748374 udt_672245080_3:5831394a3270337261 udt_672245080_0:XsJs udt_672245080_1:false] 2006-12-15 22:54:25 +0600 +06 1.612 c51db500-53ed-11d4-99bd-b06ebf2a6a60 map[false:27h37m0s true:27h42m0s] TavRBl7O 1659])
INSERT INTO ks1.table1 (pk0,ck0,ck1,ck2,col0,col1,col2,col3,col4,col5) VALUES (?,?,?,?,?,?,?,?,?,(?,?)) (values=[1961 2010-10-14 1931 aAQ1fVIYV map[udt_672245080_1:false udt_672245080_2:8730011856623748374 udt_672245080_3:5831394a326e45684a udt_672245080_0:XsJs] 2006-12-15 22:54:25 +0600 +06 1.912 c51db500-53ed-11d4-99be-b06ebf2a6a60 map[false:32h37m0s true:32h42m0s] TavRBggJ 1959])
SELECT * FROM ks1.table1 WHERE pk0 IN (?) AND col2=? (values=[0.416])
SELECT * FROM ks1.table1 WHERE pk0 IN (?) AND col2=? (values=[1.716])
SELECT * FROM ks1.table1 WHERE pk0 IN (?) AND col2=? (values=[0.716])
SELECT * FROM ks1.table1 WHERE pk0 IN (?) AND col2=? (values=[0.616])
{ "result": { "write_ops": 0, "write_errors": 0, "read_ops": 0, "read_errors": 0, "errors": null } }
Test run completed. Exiting.

Switch to scylla/gocql driver

It is good to use our own software and more and more users are using it over the regular driver.

Make gemini use modules

When building for the first time, all the dependencies are missing. Switching to Go modules would fix this fairly easily, because the needed dependencies are then downloaded automatically.

Duration limit

Add a command line option to run the tool for a specified duration.
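A minimal sketch of a duration-bounded run loop, assuming a context deadline is an acceptable way to stop the workers. The runFor helper, the default value, and the stand-in operation are illustrative; the flag name mirrors the --duration usage seen in commands elsewhere on this page.

```go
package main

import (
	"context"
	"flag"
	"fmt"
	"time"
)

// runFor calls op repeatedly until d has elapsed and returns the number of
// completed calls. An operation already in flight is allowed to finish.
func runFor(d time.Duration, op func()) int {
	ctx, cancel := context.WithTimeout(context.Background(), d)
	defer cancel()
	n := 0
	for ctx.Err() == nil {
		op()
		n++
	}
	return n
}

func main() {
	// The default of 200ms is illustrative only.
	d := flag.Duration("duration", 200*time.Millisecond, "maximum run time, e.g. 10s")
	flag.Parse()

	// time.Sleep stands in for one test operation.
	ops := runFor(*d, func() { time.Sleep(time.Millisecond) })
	fmt.Printf("ran %d operations in at most %v\n", ops, *d)
}
```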

Add version number to gemini binary

Generate schema updates

Let's generate schema updates (e.g. ALTER TABLE) as part of the test run to find Scylla bugs in that area.

Gemini version in the result file

This is very convenient and allows for easy inclusion of the version in reports.

Query filtering support

Add support for generating queries that use the ALLOW FILTERING option.

Refactor concurrency model

Currently each job is intended to work on an isolated set of partitions. This brings good things like easy coordination and program state management. It also comes with problems, such as range queries that cross partitions not working. This is an unacceptable limitation in the long run, so we simply have to find another way to handle this.

Tuple CQL type support

Make printed out CQL statements executable from cqlsh

When Gemini test fails, the tool prints the CQL statement as follows:

SELECT * FROM ks1.table1 WHERE pk0 = ?' (values=[W]

Let's make the CQL statement executable from cqlsh without cumbersome copy-paste by substituting the question marks as follows:

SELECT * FROM ks1.table1 WHERE pk0 = textasblob('W');

(In this scenario, the type of pk0 is a blob hence the textasblob conversion function.)
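The substitution could look roughly like the sketch below. It assumes the caller has already rendered each value as a CQL literal appropriate for its column type, and that no question mark occurs inside a quoted string in the statement; inline and quoteText are hypothetical helper names.

```go
package main

import (
	"fmt"
	"strings"
)

// quoteText renders s as a CQL string literal, doubling embedded single quotes.
func quoteText(s string) string {
	return "'" + strings.ReplaceAll(s, "'", "''") + "'"
}

// inline replaces each "?" in stmt with the next literal, in order, and
// appends a terminating semicolon so the result can be pasted into cqlsh.
func inline(stmt string, literals []string) (string, error) {
	if got := strings.Count(stmt, "?"); got != len(literals) {
		return "", fmt.Errorf("statement has %d placeholders, got %d values", got, len(literals))
	}
	var b strings.Builder
	next := 0
	for _, r := range stmt {
		if r == '?' {
			b.WriteString(literals[next])
			next++
			continue
		}
		b.WriteRune(r)
	}
	b.WriteString(";")
	return b.String(), nil
}

func main() {
	// pk0 is a blob column here, so the text value is wrapped in textasblob.
	literal := "textasblob(" + quoteText("W") + ")"
	out, err := inline("SELECT * FROM ks1.table1 WHERE pk0 = ?", []string{literal})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```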

Secondary index support

Add support for generating schema and query with secondary indexes.

Error upon using schema parameter

If the schema parameter is used, gemini stops with an error:
./gemini -d --duration 60s -c 20 -p 100 -m mixed -f --test-cluster=34.201.251.194 --oracle-cluster=3.93.183.245 --schema scheme.json
Seed: 1
Maximum duration: 1m0s
Concurrency: 20
Number of partitions per thread: 100
Test cluster: 34.201.251.194
Oracle cluster: 3.93.183.245
Output file:
cannot create schema: json: cannot unmarshal string into Go struct field ColumnDef.Type of type gemini.Type

The schema file is located here: https://s3.amazonaws.com/scylla-gemini/Binaries/schema.json