apache / incubator-horaedb-meta Goto Github PK

View Code? Open in Web Editor NEW

27.0 10.0 16.0 972 KB

Meta service of HoraeDB cluster.

Home Page: https://horaedb.apache.org

License: Apache License 2.0

Go 99.22% Makefile 0.20% Shell 0.34% Dockerfile 0.24%

cloud-native database distributed-database golang horaedb iot-database sql timeseries-analysis timeseries-database tsdb

incubator-horaedb-meta's Introduction

⚠️ This repository has been deprecated at 2024-01-25, further development will move to here.

HoraeMeta

HoraeMeta is the meta service for managing the HoraeDB cluster.

Status

The project is in a very early stage.

Quick Start

Build HoraeMeta binary

make build

Standalone Mode

Although HoraeMeta is designed to deployed as a cluster with three or more instances, it can also be started standalone:

# HoraeMeta0
mkdir /tmp/meta0
./bin/horaemeta-server --config ./config/example-standalone.toml

Cluster mode

Here is an example for starting HoraeMeta in cluster mode (three instances) on single machine by using different ports:

# Create directories.
mkdir /tmp/meta0
mkdir /tmp/meta1
mkdir /tmp/meta2

# horaemeta0
./bin/horaemeta-server --config ./config/exampl-cluster0.toml

# horaemeta1
./bin/horaemeta-server --config ./config/exampl-cluster1.toml

# horaemeta2
./bin/horaemeta-server --config ./config/exampl-cluster2.toml

Acknowledgment

HoraeMeta refers to the excellent project pd in design and some module and codes are forked from pd, thanks to the TiKV team.

Contributing

The project is under rapid development so that any contribution is welcome. Check our Contributing Guide and make your first contribution!

License

HoraeMeta is under Apache License 2.0.

incubator-horaedb-meta's People

Contributors

Stargazers

Watchers

Forkers

shikaiwi archerny li-jin-gou chunshao90 baojinri zuliangwang rachelint mrrtree zouxiang1993 michaelleehz baojinchang web-logs2 lengxxx tisonkun

incubator-horaedb-meta's Issues

Do not return an error when dropping a table that exists in ceresmeta but does not exist in ceresdb

Description
Do not return an error when dropping a table that exists in ceresmeta but does not exist in ceresdb.

Proposal

handle_drop_table_on_shard in meta_event service support if exists.

Additional context

Send events through heartbeat response to CeresDB node

Description

Send events through heartbeat response to CeresDB node, CeresDB node can change it's state by those events.

Proposal

One sub task of #10

Additional context

CeresMeta

Describe this problem

Steps to reproduce

Expected behavior

Additional Information

feat: impl MetaStorage with etcd kv

Description

impl MetaStorage with etcd kv

Proposal
refer to https://github.com/CeresDB/ceresmeta/blob/615113cf1bb0ee1f002dd0448194f4aed7cb3e99/server/storage/storage_impl.go#L26

Additional context

Occurs data race in CI

Describe this problem
Occurs data race in CI

==================
WARNING: DATA RACE
Write at 0x00c00033f4d0 by goroutine 115:
  github.com/CeresDB/ceresmeta/server/member.TestWatchLeaderSingle()
      /home/runner/work/ceresmeta/ceresmeta/server/member/watch_leader_test.go:56 +0x572
  testing.tRunner()
      /opt/hostedtoolcache/go/1.19.3/x64/src/testing/testing.go:1446 +0x216
  testing.(*T).Run.func1()
      /opt/hostedtoolcache/go/1.19.3/x64/src/testing/testing.go:1493 +0x47

Previous read at 0x00c00033f4d0 by goroutine 169:
  github.com/CeresDB/ceresmeta/server/member.TestWatchLeaderSingle.func1()
      /home/runner/work/ceresmeta/ceresmeta/server/member/watch_leader_test.go:48 +0x44

Goroutine 115 (running) created at:
  testing.(*T).Run()
      /opt/hostedtoolcache/go/1.19.3/x64/src/testing/testing.go:1493 +0x75d
  testing.runTests.func1()
      /opt/hostedtoolcache/go/1.19.3/x64/src/testing/testing.go:1846 +0x99
  testing.tRunner()
      /opt/hostedtoolcache/go/1.19.3/x64/src/testing/testing.go:1446 +0x216
  testing.runTests()
      /opt/hostedtoolcache/go/1.19.3/x64/src/testing/testing.go:1844 +0x7ec
  testing.(*M).Run()
      /opt/hostedtoolcache/go/1.19.3/x64/src/testing/testing.go:1726 +0xa84
  main.main()
      _testmain.go:101 +0x3bc

Goroutine [169](https://github.com/CeresDB/ceresmeta/actions/runs/3540880881/jobs/5945078085#step:4:170) (running) created at:
  github.com/CeresDB/ceresmeta/server/member.TestWatchLeaderSingle()
      /home/runner/work/ceresmeta/ceresmeta/server/member/watch_leader_test.go:47 +0x519
  testing.tRunner()
      /opt/hostedtoolcache/go/1.19.3/x64/src/testing/testing.go:1446 +0x216
  testing.(*T).Run.func1()
      /opt/hostedtoolcache/go/1.19.3/x64/src/testing/testing.go:1493 +0x47

Steps to reproduce

Expected behavior

Additional Information

Support nightly docker image

Description
The latest docker image is needed for testing with ceresdb cluster.

Proposal
Support build the nightly docker image.

Additional context

Reduce the number of operations written to the etcd storage in a single transaction

In current etcd storage implementation, it will write and query the metadata of the cluster and shard with transaction containing multiple operations, like this:

	for _, shardView := range req.ShardViews {
		opCreateShardTopologiesAndLatestVersion = append(opCreateShardTopologiesAndLatestVersion, clientv3.OpPut(key, string(value)), clientv3.OpPut(latestVersionKey, fmtID(shardView.Version)))
	}
	resp, err := s.client.Txn(ctx).
		If(keysMissing...).
		Then(opCreateShardTopologiesAndLatestVersion...).
		Commit()

An error occurs when the number of operations exceeds the maximum limit of etcd.
Try to optimize these writing methods to reduce the number of operations under the same data volume.

Refactor the cluster metadata manager.

Description
With the addition of the cluster scheduling module, the current cluster metadata management is somewhat confusing.

Proposal

Define the memory data structure corresponding to pb.
Refactor MetaStorage interface with the memory data structure instead of pb.
Modify naming of struct in storage pb to make it easier to understand.
Extract cluster metadata management into a separate module, simplifying cluster module operations.

Additional context

Fix shard version inconsistent verification

Describe this problem
In the current implementation of CeresMeta, the scheduler's verification of shard version inconsistency will result in the shard being closed. However, due to the backward heartbeat information during the table creation process, there is a probability of version inconsistency, which will lead to the shard being unavailable during the table creation process.

Steps to reproduce
Create table multiple times.

Expected behavior
All tables are created normally.

Additional Information
Because the current design cannot solve this problem, we plan to close the verification of version inconsistency first, and solve it after reconstructing the distributed model of CeresMeta.

Some MetaStorgae interface implementations should be batched

Description
Some MetaStorgae interface implementations should be batched

Proposal

Additional context

Support flow limiter persistence & flow limit of the standby machine

Description
We only support the limit of the main machine now, we need to support the flow limit of the standby machine.

Proposal

support flow limiter persistence.
master-slave synchronization.

Additional context

Dispatch database creation event to all servers

Description

This is a sub-task of apache/horaedb#597

Proposal

Two points

When a ceresdb server first register with meta, it should return all databases back to it.
When receive database create request, it should notify other servers about this new database.

Additional context

Make the ceresmeta runnable

Description
Now ceresmeta only can be compiled but it is not runnable. For further joint development with ceresdb, ceresmeta must be runnable.

Proposal
Solve the problems when starting ceresmeta cluster.

Additional context

CeresMeta follower node did not forward create/drop table request to leader

Describe this problem
When CeresDB is started in dynamic cluster mode, CeresMeta node will not forward create/drop table request after receiving it, but directly process it. Because follower node does not have complete memory data, it will get error like this:

io.ceresdb.http.errors.ManagementException: Execute sql [CREATE TABLE MY_FIRST_TABLE31(ts TIMESTAMP NOT NULL,c1 STRING TAG NOT NULL,c2 STRING TAG NOT NULL,c3 DOUBLE NULL,c4 STRING NULL,c5 INT64 NULL,c6 FLOAT NULL,c7 INT32 NULL,c8 INT16 NULL,c9 INT8 NULL,c10 BOOLEAN NULL,c11 UINT64 NULL,c12 UINT32 NULL,c13 UINT16 NULL,c14 UINT8 NULL,c15 TIMESTAMP NULL,c16 VARBINARY NULL,TIMESTAMP KEY(ts)) ENGINE=Analytic] error from server 127.0.0.1:5440, err_code=500, err_msg=Internal Server Error, detail_msg={"code":500,"message":"Failed to handle request, err:Failed to execute interpreter, query:CREATE TABLE MY_FIRST_TABLE31(ts TIMESTAMP NOT NULL,c1 STRING TAG NOT NULL,c2 STRING TAG NOT NULL,c3 DOUBLE NULL,c4 STRING NULL,c5 INT64 NULL,c6 FLOAT NULL,c7 INT32 NULL,c8 INT16 NULL,c9 INT8 NULL,c10 BOOLEAN NULL,c11 UINT64 NULL,c12 UINT32 NULL,c13 UINT16 NULL,c14 UINT8 NULL,c15 TIMESTAMP NULL,c16 VARBINARY NULL,TIMESTAMP KEY(ts)) ENGINE=Analytic, err:Failed to execute create table, err:Failed to create table by table manipulator, err:Failed to create table, msg:failed to create table by meta client, req:CreateTableRequest { schema_name: \"public\", name: \"MY_FIRST_TABLE31\", encoded_schema: [0, 10, 8, 10, 2, 116, 115, 16, 1, 32, 1, 10, 10, 10, 4, 116, 115, 105, 100, 16, 5, 32, 2, 10, 10, 10, 2, 99, 49, 16, 4, 32, 3, 40, 1, 10, 11, 10, 3, 99, 49, 48, 16, 14, 24, 1, 32, 4, 10, 11, 10, 3, 99, 49, 49, 16, 5, 24, 1, 32, 5, 10, 11, 10, 3, 99, 49, 50, 16, 11, 24, 1, 32, 6, 10, 11, 10, 3, 99, 49, 51, 16, 12, 24, 1, 32, 7, 10, 11, 10, 3, 99, 49, 52, 16, 13, 24, 1, 32, 8, 10, 11, 10, 3, 99, 49, 53, 16, 1, 24, 1, 32, 9, 10, 11, 10, 3, 99, 49, 54, 16, 3, 24, 1, 32, 10, 10, 10, 10, 2, 99, 50, 16, 4, 32, 11, 40, 1, 10, 10, 10, 2, 99, 51, 16, 2, 24, 1, 32, 12, 10, 10, 10, 2, 99, 52, 16, 4, 24, 1, 32, 13, 10, 10, 10, 2, 99, 53, 16, 7, 24, 1, 32, 14, 10, 10, 10, 2, 99, 54, 16, 6, 24, 1, 32, 15, 10, 10, 10, 2, 99, 55, 16, 8, 24, 1, 32, 16, 10, 10, 10, 2, 99, 56, 16, 9, 24, 1, 32, 17, 10, 10, 10, 2, 99, 57, 16, 10, 24, 1, 32, 18, 16, 1, 24, 2, 40, 1], engine: \"Analytic\", create_if_not_exist: false, options: {} }, err:Bad response, resp code:404, msg:create table(#404)cluster not found, cause:<nil>."}

Steps to reproduce

Set the startup parameter meta of CeresDB cluster meta_addr to CeresMeta follower node address.
Start CeresDB in cluster mode.
Send create table request by client.

CeresDB error info:

2022-11-04 14:17:06.043 ERRO [server/src/http.rs:140] Http service Failed to handle sql, err:Failed to execute interpreter, query:CREATE TABLE MY_FIRST_TABLE31(ts TIMESTAMP NOT NULL,c1 STRING TAG NOT NULL,c2 STRING TAG NOT NULL,c3 DOUBLE NULL,c4 STRING NULL,c5 INT64 NULL,c6 FLOAT NULL,c7 INT32 NULL,c8 INT16 NULL,c9 INT8 NULL,c10 BOOLEAN NULL,c11 UINT64 NULL,c12 UINT32 NULL,c13 UINT16 NULL,c14 UINT8 NULL,c15 TIMESTAMP NULL,c16 VARBINARY NULL,TIMESTAMP KEY(ts)) ENGINE=Analytic, err:Failed to execute create table, err:Failed to create table by table manipulator, err:Failed to create table, msg:failed to create table by meta client, req:CreateTableRequest { schema_name: "public", name: "MY_FIRST_TABLE31", encoded_schema: [0, 10, 8, 10, 2, 116, 115, 16, 1, 32, 1, 10, 10, 10, 4, 116, 115, 105, 100, 16, 5, 32, 2, 10, 10, 10, 2, 99, 49, 16, 4, 32, 3, 40, 1, 10, 11, 10, 3, 99, 49, 48, 16, 14, 24, 1, 32, 4, 10, 11, 10, 3, 99, 49, 49, 16, 5, 24, 1, 32, 5, 10, 11, 10, 3, 99, 49, 50, 16, 11, 24, 1, 32, 6, 10, 11, 10, 3, 99, 49, 51, 16, 12, 24, 1, 32, 7, 10, 11, 10, 3, 99, 49, 52, 16, 13, 24, 1, 32, 8, 10, 11, 10, 3, 99, 49, 53, 16, 1, 24, 1, 32, 9, 10, 11, 10, 3, 99, 49, 54, 16, 3, 24, 1, 32, 10, 10, 10, 10, 2, 99, 50, 16, 4, 32, 11, 40, 1, 10, 10, 10, 2, 99, 51, 16, 2, 24, 1, 32, 12, 10, 10, 10, 2, 99, 52, 16, 4, 24, 1, 32, 13, 10, 10, 10, 2, 99, 53, 16, 7, 24, 1, 32, 14, 10, 10, 10, 2, 99, 54, 16, 6, 24, 1, 32, 15, 10, 10, 10, 2, 99, 55, 16, 8, 24, 1, 32, 16, 10, 10, 10, 2, 99, 56, 16, 9, 24, 1, 32, 17, 10, 10, 10, 2, 99, 57, 16, 10, 24, 1, 32, 18, 16, 1, 24, 2, 40, 1], engine: "Analytic", create_if_not_exist: false, options: {} }, err:Bad response, resp code:404, msg:create table(#404)cluster not found, cause:<nil>.
Backtrace:
  0 backtrace::backtrace::libunwind::trace::h27a550175f4658ea
    /Users/zulliangwang/.cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.66/src/backtrace/libunwind.rs:93
    backtrace::backtrace::trace_unsynchronized::hc0bc2725eecb7c13
    /Users/zulliangwang/.cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.66/src/backtrace/mod.rs:66
  1 backtrace::backtrace::trace::hfe93c015b76a8a27
    /Users/zulliangwang/.cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.66/src/backtrace/mod.rs:53
  2 backtrace::capture::Backtrace::create::h2dd73475d25bc41c

CeresMeta error info:

2022-11-04T14:17:05.373+0800    error   grpc/service.go:121     fail to create table    {"error": "(#404)cluster not found, cause:<nil>"}
github.com/CeresDB/ceresmeta/server/service/grpc.(*Service).CreateTable
        /Users/zulliangwang/code/ceres/ceresmeta/server/service/grpc/service.go:121
github.com/CeresDB/ceresdbproto/pkg/metaservicepb._CeresmetaRpcService_CreateTable_Handler.func1
        /Users/zulliangwang/go/pkg/mod/github.com/!ceres!d!b/[email protected]/pkg/metaservicepb/meta_service_grpc.pb.go:206
github.com/grpc-ecosystem/go-grpc-prometheus.(*ServerMetrics).UnaryServerInterceptor.func1
        /Users/zulliangwang/go/pkg/mod/github.com/grpc-ecosystem/[email protected]/server_metrics.go:107
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1
        /Users/zulliangwang/go/pkg/mod/github.com/grpc-ecosystem/[email protected]/chain.go:25
go.etcd.io/etcd/server/v3/etcdserver/api/v3rpc.newUnaryInterceptor.func1
        /Users/zulliangwang/go/pkg/mod/go.etcd.io/etcd/server/[email protected]/etcdserver/api/v3rpc/interceptor.go:71
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1
        /Users/zulliangwang/go/pkg/mod/github.com/grpc-ecosystem/[email protected]/chain.go:25
go.etcd.io/etcd/server/v3/etcdserver/api/v3rpc.newLogUnaryInterceptor.func1
        /Users/zulliangwang/go/pkg/mod/go.etcd.io/etcd/server/[email protected]/etcdserver/api/v3rpc/interceptor.go:78
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1
        /Users/zulliangwang/go/pkg/mod/github.com/grpc-ecosystem/[email protected]/chain.go:25
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1
        /Users/zulliangwang/go/pkg/mod/github.com/grpc-ecosystem/[email protected]/chain.go:34
github.com/CeresDB/ceresdbproto/pkg/metaservicepb._CeresmetaRpcService_CreateTable_Handler
        /Users/zulliangwang/go/pkg/mod/github.com/!ceres!d!b/[email protected]/pkg/metaservicepb/meta_service_grpc.pb.go:208
google.golang.org/grpc.(*Server).processUnaryRPC
        /Users/zulliangwang/go/pkg/mod/google.golang.org/[email protected]/server.go:1283
google.golang.org/grpc.(*Server).handleStream
        /Users/zulliangwang/go/pkg/mod/google.golang.org/[email protected]/server.go:1620
google.golang.org/grpc.(*Server).serveStreams.func1.2
        /Users/zulliangwang/go/pkg/mod/google.golang.org/[email protected]/server.go:922

Expected behavior
The create/drop table request sent by the client can be processed normally

Additional Information

Support flow limiter information reading

Description
We have no way of knowing the configuration information of the flow limiter while the flow limiter is running.

Proposal
Add interface to read flow limiter information.

Additional context

Basic failover capability of CeresDB cluster

Description
We implemented the cluster management capability of CeresDB with Procedure, but Procedure only provides shard's scheduling functionality, and it does not actively check CeresDB's cluster state. The ability to failover is still lacking.

Proposal
Implement the simplest failover of CeresDB cluster mode. After CeresDB node crash, the faulty node is automatically removed and the routing relationship is adjusted.
This should includes these functions:

Check whether the node crashed based on heartbeat.
When node is confirmed to be crashed, remove it from the metadata and transfer leader by invoke TransferLeaderProcedure.

Additional context

When there are a lot of route table requests, the CPU usage of the node is extremely high

Description
CeresMeta provides route table GRPC interface to CeresDB, we found that when there are a lot of route table requests, the CPU usage of the node is extremely high.

Proposal
Found the reason that the CPU usage of the node is extremely high, there may be hot code that needs to be optimized for performance.

Additional context

Support startup by joinning cluster

Description
Now the startup of meta cluster needs a static config describing all the members which is required by the ETCD. However, this is not practical in the production enviroment because of node replacement or expansion.

A dynamical way of startup allowing new members joining is necessary.

Proposal
The proposal has not been decided yet.

Additional context
The pd has the same machanism which we can refer to.

feat: impl ID Allocator

Description

impl ID Allocator with etcdKV

Proposal

can refer to https://github.com/tikv/pd/blob/0c3303e765f2c3580b3ec80ea89982e20cfe5219/server/id/id.go#L31
need to use KV interface, avoid to use etcd client directly

Additional context

Start and stop the cluster manager when the leader changes

Description

Start and stop the cluster manager when the leader changes.

Proposal

When the leader changes, the cluster manage is needed to be started or stopped.
Consider to avoid data race, when the leader changes. For example: schemasCache is not right.

Additional context

Publish docker image

Description
For the convenience of deployment, a docker image is necessary.

Proposal

Build and publish docker image when released.
Make all things automated by GitHub actions.

Additional context

Fmt check failed in go 1.19

Describe this problem

Make build failed due to fmt check in go 1.19.

Steps to reproduce

Install go 1.19.
Try to build.

Expected behavior

Build success.

Additional Information

Open shards concurrently

Description
Currently, shards are opened one by one, which is too slow when restarting the whole cluster.

Proposal
Open the shards concurrently.

Additional context

Implement rollback operation for procedure

Description
After realizing the basic functions of each procedure, we need to implement fault tolerance for the procedure, add rollback and cancel for the procedure.

Proposal

Implement RollbackProcedure. It will close the failed shard and reinitialize it.
Add implementation of cancel for every Procedure

Additional context

Add procedure manager to maintain procedure

Description
We defined and implemented some procedure to manage shard operation. However, procedure still lacks a unified external entry. It should provide the interface to submit procedure and retry procedure when node is crash and restart.

Proposal
Implementation ProcedureManager , it contains such function:

submit new procedure
cancel running procedure
retry all unfinished procedure

Additional context
None.

feat: define MetaStorage interface

Description

define MetaStorage interface to persist ceresdb cluster info

Proposal
ceresdb cluster info refer to https://github.com/CeresDB/ceresdbproto/blob/main/protos/meta/cluster.proto

Additional context

Timed out when creating a partitioned table with a large number of partitions

Describe this problem
When create partition table with a large number of partitions, the client's request will time out.

Steps to reproduce
Execute create partitioned table like this:

CREATE TABLE `demo` (`name` string TAG, `value` double NOT NULL, `t` timestamp NOT NULL, TIMESTAMP KEY(t))
    PARTITION BY KEY(name) PARTITIONS 512
    ENGINE=Analytic with (enable_ttl='false')

Expected behavior
Successfully completed the creation of the partition table in a short time.

Additional Information
In the current implementation, the sub-table creation of the partition table is serial, maybe we need to optimize the sub-table creation process.

The leader node of CeresMeta will OOM when the CeresDB node restarts, and it will also fail when electing new leader.

Describe this problem
The leader node of CeresMeta will OOM when the CeresDB node restarts, and it will also fail when electing new leader.

Steps to reproduce

Deploy version 1.0 of CeresDB and CeresMeta.
There are a large number of read and write requests.
Restart some CeresDB nodes.

Expected behavior

CeresMeta will not experience OOM when CeresDB node is restarted.
Under no circumstances should it affect the normal election of CeresMeta.

Additional Information
The preliminary reason for the current positioning is that under the implementation of the current Java Client, when Route fails, it will cause all Routes to be refreshed and send a Route request to CeresDB. This will result in CeresMeta receiving a large number of Route requests in a short period of time, and CeresMeta currently does not have buffering for the Route interface. Each request will read etcd, which causes this issue with etcd.

Expose ETCD API

Description

ETCD API is necessary for the following development about clustering, which will be used for election of shard leader across CeresDB instances.

Proposal
Just expose the api of ETCD service.

Additional context

attach backtrace (callstack) to the leaf errors

Description
Now the leaf error generated by coderr pkg has no inforamtion about the backtrace (callstack) which is important for trobleshooting problems.

Proposal
Generate the call stacks and attach it to the leaf error.
We may refer to implementations of errors pkg.

Additional context

Avoid open table in multiple shard.

Description
In the current implementation, we cannot guarantee that a table cannot be opened on multiple shards, maybe we need a mechanism similar to ShardLock to solve this problem

Proposal
Avoid open table in multiple shard

Additional context

Implement various procedures to manage shards

Description
To manage Shard, we need to implement some different kinds of Procedure. Each procedure corresponds to one Shard operation type, it contains the following types:

TransferLeader
Migrate
Split
Merge
Scatter

Proposal
Implement the above Procedure.

Additional context
To simplify code review, the implementation of each Procedure should be submitted as a separate pull request

Handle heartbeat from CeresDB node

Description
Implement machanism of handling heartbeat from CeresDB node, through which CeresMeta can do health check of CeresDB and push/receive events to/from CeresDB.

Proposal
The work should can be divided into these:

Support grpc service #11
Manage heartbeat streams from different nodes #12
Handle the heartbeat request

Additional context

Unit tests TestManagerSingle* are broken

Describe this problem
Current some unit tests are broken, whose names are TestManager*.

Steps to reproduce
Just run these unit tests;

Expected behavior
No error when run these UTs.

Additional Information

Make the returned information clearer during dynamic scheduling

Description
During dynamic scheduling, querying table routing will return empty routing information, which is easy to cause confusion.

Proposal
During dynamic scheduling, return error?

Additional context

Ensure the consistency between CeresDB nodes with metadata

Description
We implemented basic dynamic cluster mode in version 0.4. However, the consistency and correctness of the cluster cannot be guaranteed. We need a solution that ensures that clusters are consistent even in extreme situations, so we decided to adapt the CeresMeta implementation according to the following principles:

Procedure for the same cluster is executed strictly serially and no concurrency is allowed.
When a procedure is running, it is not allowed to create a new procedure.
Before procedure running, must ensure shards version in metadata is the same as shards version in real nodes.
If procedure running failed, it will not be rollback and no more new procedure can be submit before cluster state is reset to stable by manual.

Proposal
Refactor the procedure module according to the above principles, it contains following changes:

ProcedureManager needs to ensure that only one procedure can run at any one time.
ProcedureFactory cannot create a new procedure while a procedure is running.
Every Procedure should compare shard version in metadata and nodes, refused to running when they are not equal.
When a Procedure is running failed, ProcedureManager cannot submit new procedure until the failed procedure is canceled by manual.

Additional context

Refactor the fsm of procedure

Description
Now, the ddl and operation fsm in procedure are not clearly, and the error messages of fsm are mixed together.

Proposal

Sort out the state flow of ddl and operation
Use CanceledError to distinct error message, refer: https://pkg.go.dev/github.com/looplab/fsm#CanceledError

Additional context

Format http return results to increase readability

Describe This Problem

When I use the http service to get metadata, the returned result is as follows, which is not very readable.

curl --location --request POST 'http://127.0.0.1:8080/api/v1/getShardTables' \
--header 'Content-Type: application/json' \
--data-raw '{
    "clusterName":"defaultCluster",
    "nodeName":"127.0.0.1:8832",
    "shardIDs": [0,1,2,3,4,5,6,7]
}'

{
    "status": "success",
    "data": "map[0:{Shard:{ID:0 Role:0 Version:0} Tables:[]} 1:{Shard:{ID:0 Role:0 Version:0} Tables:[]} 2:{Shard:{ID:0 Role:0 Version:0} Tables:[]} 3:{Shard:{ID:0 Role:0 Version:0} Tables:[]} 4:{Shard:{ID:4 Role:1 Version:2} Tables:[{ID:0 Name:demo SchemaID:0 SchemaName:public PartitionInfo:key:{partition_definitions:{name:\"0\"} partition_definitions:{name:\"1\"} partition_key:\"name\"}} {ID:1 Name:____demo_0 SchemaID:0 SchemaName:public PartitionInfo:key:{partition_definitions:{name:\"0\"} partition_definitions:{name:\"1\"} partition_key:\"name\"}}]} 5:{Shard:{ID:5 Role:1 Version:0} Tables:[]} 6:{Shard:{ID:6 Role:1 Version:0} Tables:[]} 7:{Shard:{ID:7 Role:1 Version:0} Tables:[]}]"
}

Proposal

Use json to format http return results to increase readability.

Additional Context

No response

Rebuild cluster topology based on metadata when restarted

Description
When leader node restarted, all cluster information will be lost cause cluster topology will not be rebuild.

Proposal
Rebuild cluster topology based on metadata when restarted, recover to stable state.

Additional context

CI complains with warnings but won't fail

Describe this problem
The problem can be found in https://github.com/CeresDB/ceresmeta/actions/runs/3140555696/jobs/5102069441.

Steps to reproduce

Write some codes with no use;
Trigger the CI.
The CI doesn't fail.

Expected behavior
The CI should fail.

Additional Information

Redirect requests to leader

Description
The cluster of ceresmeta provides service as a whole for ceresdb nodes, that is to say, one request can be received by any ceresmeta instance.
However, some service, e.g. ID generator, actually can only be provided by the leader. So the followers must can redirect the traffic into the leaders for requests of specific types.

Proposal
(TODO)

Additional context

Support static cluster topology

Description
Cluster package manages ceresdb cluster meta data.

Proposal
support static cluster topology

MetaStorage with etcd #8
ID Allocator #22
Cluster and ClusterManager #23
static cluster topology

Additional context
todo: support dynamic cluster topology

Add data persistence implementation for procedure

Description
In order to achieve disaster recovery of shard operation, we need to support the persistent storage of Procedure. When CeresMeta restarts each time, the data of Procedure is loaded and scheduled.

Proposal

Implementation of procedure storage based on etcd, and expose appropriate interfaces externally.
Use the interface provided by storage in Procedure to realize data persistence.
Reload and continue to execute Procedure when the CeresMeta node restarts.

Additional context

Input address format mistake when get grpc client connection

Describe this problem
CeresMeta node need to create grpc client in the following two scenarios:

Forward request to CeresMeta leader node, the leader node address is http format like http://127.0.0.1:22380.
Dispatch event to CeresDB node, CeresDB node address is endpoint with port like 127.0.0.1:2379.

But CeresMeta current only support the second scenario, when forward request to leader node, the following error occurs:
ERRO [cluster/src/cluster_impl.rs:80] Send heartbeat to meta failed, err:Failed to send heartbeat, cluster:defaultCluster, err:status: Unavailable, message: "connection error: desc = \"transport: Error while dialing dial tcp: address http://127.0.0.1:22379: too many colons in address\"", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc"} }

Steps to reproduce

Start CeresMeta and CeresDB in cluster mode.
Close CeresMeta leader node.

Expected behavior
CeresMeta node can create grpc client in both cases.

Additional Information

Tracking issue: refactor cluster mode

Description
The cluster mode of CeresDB has many defects at present, and we need a better distributed solution. We also need to refactor the current code to make the cluster mode of CeresDB more robust.

Proposal

Support shard to transfer leader based on distributed lock using etcd. The modifications to the CeresDB section are in this pull request. apache/horaedb#706
Refactor scheduler module, based on the new interaction model, use scheduler to manage the creation of procedures. #157
Refactor procedure manager, support executing procedures in shard-level serial manner. (Need to support scenarios where one procedure is associated with multiple shards.) #157
Refactor all types of procedures to remove internal modifications to shard metadata and instead update them through heartbeat. #157

Additional context

Disable create tables before the shards of the cluster are fully allocated

Description
In the current implementation, shards will start to be allocated when the number of nodes reaches the threshold, but only some shards are available for a period of time from the beginning to the completion of shards allocation. If there is a table creation request at this time, tables will be created on the allocated shards, it may result in uneven distribution of tables.

Proposal
Only after all the shards in the cluster are allocated can the table creation request be processed.

Additional context

Balance the shard distribution on servers

Description

Now the distribution of shard between servers may be not even, which cause high load on some server while low load on others. For example in my cluster, there are 128 shards and 30 servers, the shard count in every server like this:

We can observe the following:

The maximum count of shards on a single server is 8.
The minimum count of shards on a single server is 2.

Proposal

Make the distribution of shards more even.

Additional context

Support for automatic creation of default cluster by config

Description

Support for automatic creation of default cluster by configuration when meta is first run.

Proposal

Create default cluster by the leader of meta.

Additional context

Impl http service to support creating cluster.

#ceresmeta

Out of memory when runing golangci-lint

Describe this problem
When runing golangci-lint in apple M2 chip, it will out of memory.

Steps to reproduce

go version 1.20
golangci-lint version 1.49
golangci-lint run

Expected behavior
When running golangci-lint is normal

Additional Information
downgrade go version to 1.19 or update golangci-lint version to 1.51 can solve this problem.

apache / incubator-horaedb-meta Goto Github PK

incubator-horaedb-meta's Introduction

HoraeMeta

Status

Quick Start

Build HoraeMeta binary

Standalone Mode

Cluster mode

Acknowledgment

Contributing

License

incubator-horaedb-meta's People

Contributors

Stargazers

Watchers

Forkers

incubator-horaedb-meta's Issues

Describe This Problem

Proposal

Additional Context

Recommend Projects

Recommend Topics

Recommend Org