gemini's Introduction

Scylla

What is Scylla?

Scylla is the real-time big data database that is API-compatible with Apache Cassandra and Amazon DynamoDB. Scylla embraces a shared-nothing approach that increases throughput and storage capacity to realize order-of-magnitude performance improvements and reduce hardware costs.

For more information, please see the ScyllaDB web site.

Build Prerequisites

Scylla is fairly fussy about its build environment: it requires a very recent C++20 compiler and recent versions of many libraries. The document HACKING.md includes detailed information on building and developing Scylla, but to get Scylla building quickly on (almost) any build machine, Scylla offers a frozen toolchain. This is a pre-configured Docker image which includes recent versions of all the required compilers, libraries and build tools. Using the frozen toolchain allows you to avoid changing anything in your build machine to meet Scylla's requirements - you just need to meet the frozen toolchain's prerequisites (mostly, Docker or Podman being available).

Building Scylla

Building Scylla with the frozen toolchain dbuild is as easy as:

$ git submodule update --init --force --recursive
$ ./tools/toolchain/dbuild ./configure.py
$ ./tools/toolchain/dbuild ninja build/release/scylla

Running Scylla

To start Scylla server, run:

$ ./tools/toolchain/dbuild ./build/release/scylla --workdir tmp --smp 1 --developer-mode 1

This will start a Scylla node with one CPU core allocated to it and data files stored in the tmp directory. The --developer-mode option is needed to disable the various checks Scylla performs at startup to ensure the machine is configured for maximum performance (not relevant on development workstations). Please note that you need to run Scylla with dbuild if you built it with the frozen toolchain.

For more run options, run:

$ ./tools/toolchain/dbuild ./build/release/scylla --help

Testing

See test.py manual.

Scylla APIs and compatibility

By default, Scylla is compatible with Apache Cassandra and its APIs - CQL and Thrift. There is also support for the API of Amazon DynamoDB™, which needs to be enabled and configured in order to be used. For more information on how to enable the DynamoDB™ API in Scylla, and the current compatibility of this feature as well as Scylla-specific extensions, see Alternator and Getting started with Alternator.

Documentation

Documentation can be found here. Seastar documentation can be found here. User documentation can be found here.

Training

Training material and online courses can be found at Scylla University. The courses are free, self-paced and include hands-on examples. They cover a variety of topics including Scylla data modeling, administration, architecture, basic NoSQL concepts, using drivers for application development, Scylla setup, failover, compactions, multi-datacenters and how Scylla integrates with third-party applications.

Contributing to Scylla

If you want to report a bug or submit a pull request or a patch, please read the contribution guidelines.

If you are a developer working on Scylla, please read the developer guidelines.

Contact

  • The community forum and Slack channel are for users to discuss configuration, management, and operations of the ScyllaDB open source.
  • The developers mailing list is for developers and people interested in following the development of ScyllaDB to discuss technical topics.

gemini's People

Contributors

aboudreault, aleksbykov, cvybhu, dahankzter, dkropachev, dorlaor, grighetto, illia-li, kbr-scylla, larisau, nuivall, penberg, roydahan, soyacz, vponomaryov

gemini's Issues

Refactor concurrency model

Currently, each job is intended to work on an isolated set of partitions. This brings benefits such as easy coordination and simple program state management. It also comes with problems: for example, range queries that cross partitions do not work. This is an unacceptable limitation in the long run, so we simply have to find another way to handle this.

Make gemini use modules

When building for the first time, all the dependencies are missing. With Go modules this can quite easily be fixed so that all the needed dependencies are downloaded automatically.

JSON Schema file input too complex

With the introduction of the complex types, the schema input file was left behind. It needs to be tidied up and made understandable. The complex types need a better and more descriptive JSON notation.

Schema generation

Users can already specify the schema via a configuration file, but let's add schema generation as part of the tool to increase test coverage.

Use Gocqlx

It should simplify query building, make the code more readable, and provide dogfooding for our own projects.

Refined fail-fast option

I passed the --fail-fast option to gemini, but after a timeout error, the tool keeps running.

Result output to a file

Make it easier for machines to parse Gemini results by supporting result output to a file.

Make printed out CQL statements executable from cqlsh

When Gemini test fails, the tool prints the CQL statement as follows:

SELECT * FROM ks1.table1 WHERE pk0 = ?' (values=[W]

Let's make the CQL statement executable from cqlsh without cumbersome copy-paste by substituting the question marks as follows:

SELECT * FROM ks1.table1 WHERE pk0 = textasblob('W');

(In this scenario, the type of pk0 is a blob hence the textasblob conversion function.)
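
A minimal sketch of the substitution, assuming the values have already been rendered as CQL literals. The inlineValues and blobLiteral helpers are hypothetical names introduced here, not functions that exist in gemini. The sketch does not parse CQL, so a question mark inside a string literal of the statement itself would also be replaced.

```go
package main

import (
	"fmt"
	"strings"
)

// inlineValues replaces each '?' placeholder in stmt, left to right, with
// the matching pre-rendered CQL literal and appends a terminating ';'.
// It returns an error when placeholders and literals do not line up.
func inlineValues(stmt string, literals []string) (string, error) {
	var b strings.Builder
	next := 0
	for _, r := range stmt {
		if r != '?' {
			b.WriteRune(r)
			continue
		}
		if next >= len(literals) {
			return "", fmt.Errorf("more placeholders than values (%d)", len(literals))
		}
		b.WriteString(literals[next])
		next++
	}
	if next != len(literals) {
		return "", fmt.Errorf("statement uses %d of %d values", next, len(literals))
	}
	return b.String() + ";", nil
}

// blobLiteral renders a text value for a blob column via textasblob,
// doubling single quotes so the literal stays valid CQL.
func blobLiteral(s string) string {
	return "textasblob('" + strings.ReplaceAll(s, "'", "''") + "')"
}

func main() {
	out, err := inlineValues(
		"SELECT * FROM ks1.table1 WHERE pk0 = ?",
		[]string{blobLiteral("W")},
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // SELECT * FROM ks1.table1 WHERE pk0 = textasblob('W');
}
```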

Progress reporting is too noisy

Triggering bugs in Scylla can require gemini to be run for a long time. When a problem happens, all the interesting information about IP addresses and schema is lost because of noisy progress reporting.

Gemini launcher does not fetch new dependencies

When new dependencies are added, gemini-launcher does not seem to fetch them automatically:

[penberg@nero gemini]$ ./scripts/gemini-launcher --duration 10s --drop-schema
session.go:14:2: cannot find package "go.uber.org/multierr" in any of:
	/usr/lib/golang/src/go.uber.org/multierr (from $GOROOT)
	/home/penberg/go/src/go.uber.org/multierr (from $GOPATH)
Compilation failed

Improve documentation

We need at least a "getting started guide" and perhaps something that describes the design of the tool to advanced users.

Duration limit

Add a command line option to run the tool for a specified duration.

max-tests seems to not be honored

Example:

--concurrency=1
--max-tests=10000

The result is:

thread 0: write ops: 1 | read ops: 0 | write errors: 0 | read errors: 0
thread 0: write ops: 529 | read ops: 0 | write errors: 0 | read errors: 0
thread 0: write ops: 1017 | read ops: 0 | write errors: 0 | read errors: 0
thread 0: write ops: 1515 | read ops: 0 | write errors: 0 | read errors: 0
thread 0: write ops: 2010 | read ops: 0 | write errors: 0 | read errors: 0
thread 0: write ops: 2498 | read ops: 0 | write errors: 0 | read errors: 0
thread 0: write ops: 2998 | read ops: 0 | write errors: 0 | read errors: 0
thread 0: write ops: 3490 | read ops: 0 | write errors: 0 | read errors: 0
thread 0: write ops: 3978 | read ops: 0 | write errors: 0 | read errors: 0
thread 0: write ops: 4486 | read ops: 0 | write errors: 0 | read errors: 0
Results:
	write ops:    4988
	read ops:     0
	write errors: 0
	read errors:  0

This seems way too short: essentially half of the requested run is cut.

Command line option for schema configuration file

Currently, if you launch gemini in the wrong directory, the tool complains that:

cannot create schema: open schema.json: no such file or directory

Let's add a command line option to specify the schema configuration file.
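
As a rough sketch, the option could be parsed with the standard flag package. The parseSchemaPath helper is a hypothetical name, and the default of schema.json is inferred from the error message above; the flag name matches the --schema option used in a later report.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// parseSchemaPath extracts the schema file location from the arguments,
// falling back to schema.json in the current directory.
func parseSchemaPath(args []string) (string, error) {
	fs := flag.NewFlagSet("gemini", flag.ContinueOnError)
	path := fs.String("schema", "schema.json", "path to the schema configuration file")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	return *path, nil
}

func main() {
	path, err := parseSchemaPath(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	data, err := os.ReadFile(path)
	if err != nil {
		fmt.Fprintf(os.Stderr, "cannot create schema: %v\n", err)
		os.Exit(1)
	}
	fmt.Printf("read %d bytes of schema from %s\n", len(data), path)
}
```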

Test result validation is broken

The session.Check() function looks pretty broken. For example, if testIter.MapScan() fails in the middle of a result set, we bail out with a bogus ErrReadNoDataReturned error, which is treated as a successful case.

Generate schema updates

Let's generate schema updates (e.g. ALTER TABLE) as part of the test run to find Scylla bugs in that area.

Gemini launcher for Docker images

Both Scylla and Cassandra have Docker images, which can be used for testing. Let's add a launcher script to Gemini, which automates docker run and IP address discovery to simplify Gemini setup.

Insertion failures because of time.Time marshaling problem

I am seeing the following error:

Failed! Mutation 'INSERT INTO ks1.table1 (pk0,ck0,ck1,ck2,col0) VALUES (?,?,?,?,?)' (values=[ZhtcLiAhrobsZP7Y5f9kFocDbDhPo/9jeHrF059htGLYGB126jCxOYiet 5 1994-03-12 14:31:51 +0200 EET 0cTaXQ5la32tSTixuiyUEknNFRqn5VNL3b5hh+hWEF+n7zvhpCMSu8jkfzz Uwtgq0ax18drFRuRCDPtlE6ApdMRegzcEgDUoplOWbzWgjIBSB7FhO]) caused an error: 'can not marshal time.Time into int [cluster = test, query = 'INSERT INTO ks1.table1 (pk0,ck0,ck1,ck2,col0) VALUES (?,?,?,?,?)']'

git bisect blames the following commit:

13d308e4fafaa967ce0d97992cf682c244a0cc1b is the first bad commit
commit 13d308e4fafaa967ce0d97992cf682c244a0cc1b
Author: Henrik Johansson <[email protected]>
Date:   Tue Mar 12 14:07:46 2019 +0100

    schema: Clustering keys can now have the same types as partition keys.

:100644 100644 a6d99a514ad1231420c0ff3e1c19e2b9531a15e8 1285d83f682e5fe9ac325f666a11de776d74299c M      CHANGELOG.md
:100644 100644 9bbefe97444f4404a2ea50f5d6e74f994823d89e d4cdada91f145b4e880f42f15c3d407717be69e5 M      go.mod
:100644 100644 9ca24e213568b35b43472ae8298e84b324ae567e 87967a39f921bdaacc732dd09606e2e09da59467 M      go.sum
:100644 100644 084890688da9a70a822c951092855a9aa81c9c34 5d3084a942089a890bbb10142fe368d464e3f186 M      schema.go

Error upon using schema parameter

If the schema parameter is used, gemini stops with an error:
./gemini -d --duration 60s -c 20 -p 100 -m mixed -f --test-cluster=34.201.251.194 --oracle-cluster=3.93.183.245 --schema scheme.json
Seed: 1
Maximum duration: 1m0s
Concurrency: 20
Number of partitions per thread: 100
Test cluster: 34.201.251.194
Oracle cluster: 3.93.183.245
Output file:
cannot create schema: json: cannot unmarshal string into Go struct field ColumnDef.Type of type gemini.Type

The schema file is located here: https://s3.amazonaws.com/scylla-gemini/Binaries/schema.json

Result of gemini contains only 0

If gemini is run several times against the same nodes, it returns the following results:
{
  "result": {
    "write_ops": 0,
    "write_errors": 0,
    "read_ops": 0,
    "read_errors": 0,
    "errors": null
  }
}
The command used was: ./gemini -d --duration 60s -c 20 -p 100 -m mixed -f -v --test-cluster=34.201.251.194 --oracle-cluster=3.93.183.245
The gemini version is 0.9.2.
The debug log, however, shows that a lot of operations were performed:
INSERT INTO ks1.table1 (pk0,ck0,ck1,ck2,col0,col1,col2,col3,col4,col5) VALUES (?,?,?,?,?,?,?,?,?,(?,?)) (values=[861 2010-10-14 831 aAQ1fWYPK map[udt_672245080_3:52 udt_672245080_0:X19J2m9kx udt_672245080_1:true udt_672245080_2:5594391364288805258] 2006-12-15 22:54:25 +0600 +06 0.812 c51db500-53ed-11d4-99b8-b06ebf2a6a60 map[false:14h17m0s true:14h22m0s] TavRBi29 859])
INSERT INTO ks1.table1 (pk0,ck0,ck1,ck2,col0,col1,col2,col3,col4,col5) VALUES (?,?,?,?,?,?,?,?,?,(?,?)) (values=[561 2010-10-14 531 aAQ1fWzfc map[udt_672245080_0:XsJs udt_672245080_1:false udt_672245080_2:8730011856623748374 udt_672245080_3:5831394a326e684678] 2006-12-15 22:54:25 +0600 +06 0.512 c51db500-53ed-11d4-99b9-b06ebf2a6a60 map[false:9h17m0s true:9h22m0s] TavRBi7K 559])
INSERT INTO ks1.table1 (pk0,ck0,ck1,ck2,col0,col1,col2,col3,col4,col5) VALUES (?,?,?,?,?,?,?,?,?,(?,?)) (values=[6 2010-10-14 31 aAQ1fRxcU map[udt_672245080_1:true udt_672245080_2:5594391364288805258 udt_672245080_3:52 udt_672245080_0:X19J2obzb] 2006-12-15 22:54:25 +0600 +06 0.012 c51db500-53ed-11d4-99ba-b06ebf2a6a60 map[true:1h2m0s false:57m0s] TavRBmqU 3])
INSERT INTO ks1.table1 (pk0,ck0,ck1,ck2,col0,col1,col2,col3,col4,col5) VALUES (?,?,?,?,?,?,?,?,?,(?,?)) (values=[1061 2010-10-14 1031 aAQ1fUMZ1 map[udt_672245080_1:true udt_672245080_2:5594391364288805258 udt_672245080_3:52 udt_672245080_0:X19J2kB7o] 2006-12-15 22:54:25 +0600 +06 1.012 c51db500-53ed-11d4-99bb-b06ebf2a6a60 map[false:17h37m0s true:17h42m0s] TavRBnWw 1059])
INSERT INTO ks1.table1 (pk0,ck0,ck1,ck2,col0,col1,col2,col3,col4,col5) VALUES (?,?,?,?,?,?,?,?,?,(?,?)) (values=[961 2010-10-14 931 aAQ1fTj4G map[udt_672245080_0:XsJs udt_672245080_1:false udt_672245080_2:8730011856623748374 udt_672245080_3:5831394a326d465349] 2006-12-15 22:54:25 +0600 +06 0.912 c51db500-53ed-11d4-99bc-b06ebf2a6a60 map[false:15h57m0s true:16h2m0s] TavRBjXB 959])
INSERT INTO ks1.table1 (pk0,ck0,ck1,ck2,col0,col1,col2,col3,col4,col5) VALUES (?,?,?,?,?,?,?,?,?,(?,?)) (values=[1661 2010-10-14 1631 aAQ1fVynJ map[udt_672245080_2:8730011856623748374 udt_672245080_3:5831394a3270337261 udt_672245080_0:XsJs udt_672245080_1:false] 2006-12-15 22:54:25 +0600 +06 1.612 c51db500-53ed-11d4-99bd-b06ebf2a6a60 map[false:27h37m0s true:27h42m0s] TavRBl7O 1659])
INSERT INTO ks1.table1 (pk0,ck0,ck1,ck2,col0,col1,col2,col3,col4,col5) VALUES (?,?,?,?,?,?,?,?,?,(?,?)) (values=[1961 2010-10-14 1931 aAQ1fVIYV map[udt_672245080_1:false udt_672245080_2:8730011856623748374 udt_672245080_3:5831394a326e45684a udt_672245080_0:XsJs] 2006-12-15 22:54:25 +0600 +06 1.912 c51db500-53ed-11d4-99be-b06ebf2a6a60 map[false:32h37m0s true:32h42m0s] TavRBggJ 1959])
SELECT * FROM ks1.table1 WHERE pk0 IN (?) AND col2=? (values=[0.416])
SELECT * FROM ks1.table1 WHERE pk0 IN (?) AND col2=? (values=[1.716])
SELECT * FROM ks1.table1 WHERE pk0 IN (?) AND col2=? (values=[0.716])
SELECT * FROM ks1.table1 WHERE pk0 IN (?) AND col2=? (values=[0.616])
{ "result": { "write_ops": 0, "write_errors": 0, "read_ops": 0, "read_errors": 0, "errors": null } }
Test run completed. Exiting.

refactoring: create query builder

Creating string-based queries is error-prone and makes the queries hard to change. With a flexible builder we could make this much safer and easier.

Invalid clustering range queries generated

We get these errors:
Clustering column "ck1" cannot be restricted (preceding column "ck0" is restricted by a non-EQ relation)

This is a result of queries like:
SELECT * FROM ks1.table1 WHERE pk0 = ? AND ck0 > ? AND ck0 < ? AND ck1 > ? AND ck1 < ? AND ck2 > ? AND ck2 < ?

We need to be smarter and generate a variety of queries with proper EQ restrictions on the leading (leftmost) clustering columns. Many combinations can and should be generated.

The reason we haven't seen them before is because of missing error handling which we should also fix.
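
The rule behind the error can be sketched as a generator: restrict a prefix of the clustering columns, use EQ on every column except the last restricted one, and allow either EQ or a range on that last one. The clusteringRestrictions function below is a hypothetical helper written for this illustration, not gemini code, and it covers only a small part of the possible combinations.

```go
package main

import (
	"fmt"
	"strings"
)

// clusteringRestrictions returns, for every non-empty prefix of cks, one
// WHERE fragment with EQ on all columns and one with EQ on the leading
// columns plus a range on the last column of the prefix.
func clusteringRestrictions(cks []string) []string {
	var out []string
	for n := 1; n <= len(cks); n++ {
		// EQ restrictions on all columns before the last restricted one.
		eq := make([]string, 0, n)
		for _, ck := range cks[:n-1] {
			eq = append(eq, ck+" = ?")
		}
		last := cks[n-1]
		join := func(tail ...string) string {
			parts := append(append([]string{}, eq...), tail...)
			return strings.Join(parts, " AND ")
		}
		out = append(out, join(last+" = ?"))
		out = append(out, join(last+" > ?", last+" < ?"))
	}
	return out
}

func main() {
	for _, r := range clusteringRestrictions([]string{"ck0", "ck1", "ck2"}) {
		fmt.Println("SELECT * FROM ks1.table1 WHERE pk0 = ? AND " + r)
	}
}
```

For three clustering columns this yields six queries, none of which places a restriction after a range.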

Column count configuration

We currently generate rather few columns, and there are more types than fit in those columns.
Making the column count configurable would allow more diverse data types to be generated.

Passing --non-interactive flag causes a panic

Passing the --non-interactive flag to gemini:

$ ./scripts/gemini-launcher --non-interactive

causes the following panic:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x38 pc=0x7250a0]

goroutine 150 [running]:
main.runJob.func1(0xc00026ac40, 0xc0002629c0, 0x8416c0, 0xc000271240, 0xc0002f6180, 0x8bb2c97000)
        /home/penberg/go/src/github.com/scylladb/gemini/cmd/gemini/root.go:195 +0x2f0
created by main.runJob
        /home/penberg/go/src/github.com/scylladb/gemini/cmd/gemini/root.go:171 +0x36b

Reported by @aleksbykov.
