uber / storagetapper

StorageTapper is a scalable realtime MySQL change data streaming, logical backup and logical replication service

License: MIT License

Shell 2.68% Makefile 0.18% Go 97.14%

mysql kafka avro cdc etl json msgpack hdfs s3 postgresql

storagetapper's Issues

Check shell scripts with shellcheck

Add a shellcheck build target that runs shellcheck scripts/*.sh.
Make the test target depend on shellcheck so that it runs along with the other tests.

Add support for nats streaming

Add support of clusters endpoint in client

Add TLS support for both replication and snapshot connections

Both the replication and snapshot libraries support configuring a custom TLS connection.
We need to add support for custom TLS configuration per StorageTapper instance and per individual cluster and table.

Kafka tests are unstable on Travis-ci

Tests often fail with "kafka server: Request was for a topic or partition that does not exist on this broker or metadata is out of date" errors.

Fix datarace in CheckMySQLVersion

Access to cachedStateMySQLVersion needs to be synchronized.
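
One common way to remove this kind of data race is to guard the cached value with a mutex. The sketch below is illustrative only: the versionCache type, its fields, and the probe callback are hypothetical and do not reflect how CheckMySQLVersion or cachedStateMySQLVersion are actually defined.

```go
package main

import (
	"fmt"
	"sync"
)

// versionCache guards a cached check result with a mutex so that concurrent
// callers never read and write it at the same time.
// Hypothetical type, not StorageTapper's actual code.
type versionCache struct {
	mu      sync.Mutex
	checked bool
	ok      bool
}

// check returns the cached result and calls probe only on the first call.
func (c *versionCache) check(probe func() bool) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if !c.checked {
		c.ok = probe()
		c.checked = true
	}
	return c.ok
}

func main() {
	var cache versionCache
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// The probe stands in for a real version query against MySQL.
			cache.check(func() bool { return true })
		}()
	}
	wg.Wait()
	// The cached result is reused; this probe is never called.
	fmt.Println("version ok:", cache.check(func() bool { return false }))
}
```

Running a test that exercises this path with go test -race should confirm whether the warning is gone.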

Implement snapshot retention policies

Add support of config endpoint in client

Fix Travis-ci build

streamer/streamer_test.go is timing out on Travis-ci

Implement file, hdfs and s3 changelog and snapshot reader plugins

This is required for logical backup restore functionality.

We need to be able to ingest the data we produced to file, hdfs or s3.

panic: test timed out after 5m0s

When running the last test (changelog), I got the error shown in the title.
My test environment is a KVM VM.

I have set the following for mysqld:
innodb_lock_wait_timeout=500
transaction-isolation = READ-COMMITTED

But this did not fix the issue.
Then I changed the test timeout to 3000s (50m) to check whether the test would pass.

First I got some data race warnings,
then
2019-07-24T15:36:13.563+0800 ERROR changelog/mysql.go:505 Error 1205: Lock wait timeout exceeded; try restarting transaction
2019-07-24T15:44:34.629+0800 ERROR changelog/mysql.go:841 Error 1205: Lock wait timeout exceeded; try restarting transaction {"cluster": "test_cluster1"}
then
2019-07-24T15:44:34.733+0800 DEBUG runtime/asm_amd64.s:1337 Finished binlog worker in test
panic: test timed out after 50m0s

Do you have any suggestions?
Detailed log at https://paste.ubuntu.com/p/HxJ9bNmzrn/

Add support of schema endpoint in client

Make connection resolver more generic

The state connection resolver was implemented to resolve MySQL connection details specifically.

It needs to be extended so that it can also resolve Kafka, HDFS and S3 connection details.

Update readme and wiki

Add support for command line config options

Currently instances can be configured with decentralized config files or centralized config in the state database.

We need to add the ability to configure instances via command line parameters.

Sanity check table registration parameters earlier

We need to check table parameters during the registration endpoint call, instead of checking them asynchronously during the state sync loop as is done currently.

This way, issues such as non-existent users on the source cluster would be obvious and easy to debug.

Set up GitHub build action

Set up a build action to run make test on push.

Resolver should expand databases and tables wildcard

It should be possible to specify "*" for the database and table name fields when adding a table.
The builtin enumerator should be able to expand those wildcards by logging in to the cluster and enumerating the databases, and the tables in each database, using "SHOW DATABASES" and "SHOW TABLES" queries.
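
The expansion described above can be sketched roughly as follows. This is a minimal illustration, not the builtin enumerator itself: the Enumerator interface, the TableRef type, the Expand function and the in-memory fakeCluster are all hypothetical names, and a real implementation would presumably back Databases and Tables with the "SHOW DATABASES" and "SHOW TABLES" queries.

```go
package main

import (
	"fmt"
	"sort"
)

// Enumerator lists databases and tables on a cluster (hypothetical interface).
type Enumerator interface {
	Databases() ([]string, error)
	Tables(db string) ([]string, error)
}

// TableRef names one concrete table.
type TableRef struct{ DB, Table string }

// Expand resolves "*" in db and/or table into concrete (db, table) pairs.
func Expand(e Enumerator, db, table string) ([]TableRef, error) {
	dbs := []string{db}
	if db == "*" {
		var err error
		if dbs, err = e.Databases(); err != nil {
			return nil, err
		}
	}
	var out []TableRef
	for _, d := range dbs {
		tables := []string{table}
		if table == "*" {
			var err error
			if tables, err = e.Tables(d); err != nil {
				return nil, err
			}
		}
		for _, t := range tables {
			out = append(out, TableRef{DB: d, Table: t})
		}
	}
	return out, nil
}

// fakeCluster is an in-memory stand-in for a MySQL cluster.
type fakeCluster map[string][]string

func (f fakeCluster) Databases() ([]string, error) {
	dbs := make([]string, 0, len(f))
	for d := range f {
		dbs = append(dbs, d)
	}
	sort.Strings(dbs) // map iteration order is random, so sort for stable output
	return dbs, nil
}

func (f fakeCluster) Tables(db string) ([]string, error) { return f[db], nil }

func main() {
	c := fakeCluster{"db1": {"t1", "t2"}, "db2": {"t3"}}
	refs, err := Expand(c, "*", "*")
	if err != nil {
		panic(err)
	}
	for _, r := range refs {
		fmt.Printf("%s.%s\n", r.DB, r.Table)
	}
}
```

Keeping the enumeration behind an interface lets the expansion logic be tested without a live cluster.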

Add stdout output support

Having stdout output and configuration through command line options will allow StorageTapper to be used as a command line utility.

Migrate to HDFS library branch with Kerberos support

Currently we are using the v1 branch of a fork of the https://github.com/colinmarc/hdfs library. The fork includes reliability fixes, but the v1 branch doesn't have Kerberos support.
We need to port the reliability fixes to the master branch of the library and migrate StorageTapper to that branch.

The fork with the fixes: https://github.com/efirs/hdfs

Use connection resolver to resolve output pipes connections

Currently, resolvers are used to resolve source connection information only.
For output destinations, the configuration is hard-coded in config files,
so it's difficult to configure output destinations per stream.

Builtin cluster resolver should respect connection type

Currently, the builtin resolver doesn't use the connType resolver parameter and returns the first host matching service and db, no matter whether it is a slave or a master.

Once fixed, it will help to distribute load:

Use a slave as the source of snapshot events so as not to overload the master.
Use the master as the source of incremental changes for better latency.
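
As a rough sketch of the intended behavior, a resolver that respects the connection type could filter on it alongside service and db. Everything here is an assumption made for illustration: the ConnType constants, the Host struct, the Resolve function and the addresses are hypothetical and are not taken from the builtin resolver.

```go
package main

import (
	"errors"
	"fmt"
)

// ConnType distinguishes master and slave hosts (hypothetical type).
type ConnType int

const (
	Master ConnType = iota
	Slave
)

// Host is one entry in a hypothetical cluster registry.
type Host struct {
	Service, DB, Addr string
	Type              ConnType
}

// ErrNoHost is returned when no host matches all three criteria.
var ErrNoHost = errors.New("no matching host")

// Resolve returns the first host matching service, db AND connection type,
// instead of the first host matching service and db only.
func Resolve(hosts []Host, service, db string, ct ConnType) (string, error) {
	for _, h := range hosts {
		if h.Service == service && h.DB == db && h.Type == ct {
			return h.Addr, nil
		}
	}
	return "", ErrNoHost
}

func main() {
	hosts := []Host{
		{Service: "svc1", DB: "db1", Addr: "master1:3306", Type: Master},
		{Service: "svc1", DB: "db1", Addr: "slave1:3306", Type: Slave},
	}
	snapshot, _ := Resolve(hosts, "svc1", "db1", Slave)
	changes, _ := Resolve(hosts, "svc1", "db1", Master)
	fmt.Println("snapshot source:", snapshot)
	fmt.Println("changelog source:", changes)
}
```

Whether to fall back to the master when no slave is registered is a design choice this sketch leaves open by returning an error.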

Create debian package for client utility

Package the stcli client utility as a separate Debian package, storagetapper-client.

This Avro API may have changed

setter *goavro.RecordSetter

r, err := goavro.NewRecord(*e.setter)

Please update, thanks!

Migrate to goavro v2

Reimplement client utility to support named parameters instead of positional

Named parameters allow more flexible usage of the utility, for example:
./st_client add -d db1 -t t1 -c c1 -s svc1 -i mysql -o kafka -f avro -v 1
instead of fixed positional parameters:
./stcli add svc1 t1 db1 clst1 mysql kafka avro 1 "{}"
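
A minimal sketch of how the named form could be parsed with Go's standard flag package is shown below. The mapping of each flag to a field (for example -s to service and -v to version) is inferred by lining up the two command lines above and is an assumption, as are the AddParams type and the parseAdd function.

```go
package main

import (
	"flag"
	"fmt"
)

// AddParams holds the parameters of a hypothetical "add" subcommand.
type AddParams struct {
	Service, Cluster, DB, Table string
	Input, Output, Format       string
	Version                     int
}

// parseAdd parses named parameters in any order.
func parseAdd(args []string) (AddParams, error) {
	var p AddParams
	fs := flag.NewFlagSet("add", flag.ContinueOnError)
	fs.StringVar(&p.DB, "d", "", "database name")
	fs.StringVar(&p.Table, "t", "", "table name")
	fs.StringVar(&p.Cluster, "c", "", "cluster name")
	fs.StringVar(&p.Service, "s", "", "service name")
	fs.StringVar(&p.Input, "i", "", "input type")
	fs.StringVar(&p.Output, "o", "", "output type")
	fs.StringVar(&p.Format, "f", "", "output format")
	fs.IntVar(&p.Version, "v", 0, "version")
	err := fs.Parse(args)
	return p, err
}

func main() {
	args := []string{"-d", "db1", "-t", "t1", "-c", "c1", "-s", "svc1",
		"-i", "mysql", "-o", "kafka", "-f", "avro", "-v", "1"}
	p, err := parseAdd(args)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", p)
}
```

Because each value is tied to a flag name, parameters can be given in any order and omitted ones fall back to their defaults.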