ph14 / fdb-zk

ZooKeeper server on top of FoundationDB

License: MIT License

Java 98.86% Starlark 1.14%

fdb-zk's Issues

implement quotas

ZooKeeper has administrative commands to set byte-size and node-count quotas on a subtree. We can either store these as additional information at the directory level, or follow the exact same schema as ZooKeeper, which uses special child nodes.
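If fdb-zk follows ZooKeeper's own schema, the quota for a path would live in child nodes under a reserved subtree. Here is a minimal sketch of that layout, assuming the stock ZooKeeper names (`/zookeeper/quota`, `zookeeper_limits`, `zookeeper_stats`) and the convention that a limit of -1 means "no limit". The class and method names are hypothetical, and nothing here touches FDB.

```java
/** Hypothetical helper mirroring ZooKeeper's quota node layout. */
final class QuotaPathsSketch {
    static final String QUOTA_ROOT = "/zookeeper/quota";
    static final String LIMIT_NODE = "zookeeper_limits";
    static final String STAT_NODE = "zookeeper_stats";

    /** Node that would hold the configured limits for znodePath. */
    static String limitPath(String znodePath) {
        return QUOTA_ROOT + znodePath + "/" + LIMIT_NODE;
    }

    /** Node that would hold the current usage for znodePath. */
    static String statPath(String znodePath) {
        return QUOTA_ROOT + znodePath + "/" + STAT_NODE;
    }

    /** Assumption: a negative limit means the quota is not set. */
    static boolean exceeds(long limit, long actual) {
        return limit >= 0 && actual > limit;
    }

    public static void main(String[] args) {
        System.out.println(limitPath("/app/config"));
        System.out.println(statPath("/app/config"));
        System.out.println(exceeds(10, 11));
    }
}
```

The directory-level alternative would store the same two numbers (count and bytes) next to the node's other metadata instead of in child nodes.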

check client session expiry on the right ops

Most ZK client ops require the server to verify that the session is still active. We need to tie this behavior into FdbZooKeeperImpl.

refactor directory usage

We might want to have the module pull down the directories at startup and then inject them.

Should there be a root directory for the whole app?

implement ZK watch semantics via changefeeds

We're currently using FDB watches to mimic ZK watches, but the guarantees are different. An FDB watch fires asynchronously when the watched key's value changes, but it is also allowed not to fire on a quick ABA update (the value changes and then changes back).

ZK has pretty specific semantics (per https://zookeeper.apache.org/doc/r3.3.3/zookeeperProgrammers.html#ch_zkWatches):

Watches are ordered with respect to other events, other watches, and asynchronous replies. The ZooKeeper client libraries ensures that everything is dispatched in order.

A client will see a watch event for a znode it is watching before seeing the new data that corresponds to that znode.

The order of watch events from ZooKeeper corresponds to the order of the updates as seen by the ZooKeeper service.

We need to make sure the fdb-zk client dispatches all updates in order across both reads and watches. We can use FDB watches in conjunction with a changefeed of all updates to actively watched nodes. On read operations, we check this feed to see if any events still need to be returned (because the FDB watch hasn't fired yet). If so, we yield those events first and then continue with the read.

When a node is written:

 Check if `active_watches : zknode : watch_type : client_id` exists
 If so, for each client:
   Remove `active_watches : zknode : watch_type : *`
   Insert `watches_changefeed : client_id : versionstamp : watch_type` --> `path`
   Increment `watch_trigger : client_id : watch_type`

The read op & watch_trigger callback flow is then:

Read all `watches_changefeed : client_id : *` and remove all entries
Return watches to client in ascending versionstamp order
For read ops: continue on with the request
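The write and read flows above can be modeled in memory to check the ordering logic. This is only a sketch: each map stands in for an FDB subspace, a simple counter stands in for the commit versionstamp, and all class and method names are hypothetical. In FDB, each method body would run inside a single transaction.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** In-memory model of the watch changefeed; each map stands in for an FDB subspace. */
final class WatchChangefeedSketch {
    // active_watches : zknode : watch_type -> client ids
    private final Map<String, Set<String>> activeWatches = new HashMap<>();
    // watches_changefeed : client_id -> (version -> "watch_type path")
    private final Map<String, TreeMap<Long, String>> changefeed = new HashMap<>();
    // watch_trigger : client_id -> counter (in FDB, the client would watch this key)
    private final Map<String, Long> watchTrigger = new HashMap<>();
    // Stand-in for the commit versionstamp: strictly increasing per write.
    private long nextVersion = 1;

    private static String key(String path, String watchType) {
        return path + "|" + watchType;
    }

    void addWatch(String clientId, String path, String watchType) {
        activeWatches.computeIfAbsent(key(path, watchType), k -> new TreeSet<>()).add(clientId);
    }

    /** Write path: move every active watch on the node into its client's changefeed. */
    void onNodeWritten(String path, String watchType) {
        long version = nextVersion++;
        // ZK watches are one-shot, so the active entries are removed when they fire.
        Set<String> clients = activeWatches.remove(key(path, watchType));
        if (clients == null) {
            return;
        }
        for (String clientId : clients) {
            changefeed.computeIfAbsent(clientId, k -> new TreeMap<>())
                    .put(version, watchType + " " + path);
            watchTrigger.merge(clientId, 1L, Long::sum);
        }
    }

    /** Read path: remove and return pending events in ascending version order. */
    List<String> drainEvents(String clientId) {
        TreeMap<Long, String> pending = changefeed.remove(clientId);
        List<String> events = new ArrayList<>();
        if (pending != null) {
            events.addAll(pending.values());
        }
        return events;
    }

    long triggerCount(String clientId) {
        return watchTrigger.getOrDefault(clientId, 0L);
    }

    public static void main(String[] args) {
        WatchChangefeedSketch feed = new WatchChangefeedSketch();
        feed.addWatch("client-1", "/a", "data");
        feed.addWatch("client-1", "/b", "data");
        feed.onNodeWritten("/b", "data");
        feed.onNodeWritten("/a", "data");
        System.out.println(feed.drainEvents("client-1")); // [data /b, data /a]
    }
}
```

Because the feed is keyed by version, a client that drains it before every read sees watch events in the order the writes committed, even when the FDB watch on its trigger key has not fired yet.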

Initial discussion in https://forums.foundationdb.org/t/fdb-zk-rough-cut-of-zookeeper-api-layer/1278

client session management

We can store client sessions in FDB, rather than in memory as ZK does. Since we can rely on FDB as the source of truth, I'm pretty sure we can avoid having to think about which server is the session owner, SessionMovedExceptions, and the like (since every client will see the same view of the world).

Session IDs are longs assigned by the server. Conveniently, the transaction version embedded in an FDB versionstamp is an 8-byte value that fits in a long, so we can simply write a versionstamped key and use its transaction version as the session ID.
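A sketch of deriving the long, assuming the 10-byte transaction versionstamp layout FDB documents: an 8-byte big-endian commit version followed by a 2-byte order within the commit batch. The class and method names are hypothetical. Note that this drops the trailing 2 bytes, so sessions created in the same commit batch could in principle share a commit version; whether that matters here is an open assumption to verify.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper: turns a 10-byte transaction versionstamp into a session id. */
final class SessionIdSketch {
    static long sessionIdFrom(byte[] transactionVersionstamp) {
        if (transactionVersionstamp.length != 10) {
            throw new IllegalArgumentException("expected a 10-byte versionstamp");
        }
        // ByteBuffer reads big-endian by default, matching the versionstamp byte order.
        return ByteBuffer.wrap(transactionVersionstamp, 0, 8).getLong();
    }

    public static void main(String[] args) {
        // Commit version 0x0102 = 258, batch order 5.
        byte[] stamp = {0, 0, 0, 0, 0, 0, 1, 2, 0, 5};
        System.out.println(sessionIdFrom(stamp)); // 258
    }
}
```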

Clients initially pass in a timeout when connecting. Clients regularly send in heartbeats to update their timeout.

We can use two subspaces updated transactionally to store this data: session_ids and sessions_by_timeout. session_ids will map a given session id to its next timeout, and sessions_by_timeout will order all session ids by their next timeout, so that we can efficiently identify expired sessions.

On session creation:

Set: `(sessions_by_timeout, next_timeout_timestamp, incomplete_versionstamp)`
Set: `(session_ids, incomplete_versionstamp) --> next_timeout_timestamp`
Return versionstamp transaction id as session id to client

On heartbeat:

Read: `(session_ids, session_id_versionstamp) --> timeout timestamp` to find the next timeout
  If timestamp doesn't exist or is expired:
    Remove: `(session_ids, session_id_versionstamp)`
    Remove: `(sessions_by_timeout, timeout, session_id_versionstamp)`
    Return SessionExpired to client

  If timestamp is within range:
    Remove: `(sessions_by_timeout, old_timeout, session_id_versionstamp)`
    Set: `(sessions_by_timeout, next_timeout_timestamp, session_id_versionstamp)`
    Set: `(session_ids, session_id_versionstamp) --> next_timeout_timestamp`

To find expired sessions (which is a prerequisite for #4):

Scan: `(sessions_by_timeout, stale_timeouts)` for any timestamps that are now considered stale
Clean up their ephemeral nodes
Range delete the scan range from before
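The three flows above (creation, heartbeat, expiry scan) can be sketched with two in-memory indexes that are always updated together. This is a model under stated assumptions, not the fdb-zk implementation: a `HashMap` stands in for `session_ids`, a `TreeMap` for `sessions_by_timeout`, a counter for the versionstamp, and a timeout equal to the current time counts as expired. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** In-memory model of the session_ids and sessions_by_timeout subspaces. */
final class SessionStoreSketch {
    // session_ids: session id -> next timeout timestamp (ms)
    private final Map<Long, Long> sessionIds = new HashMap<>();
    // sessions_by_timeout: next timeout timestamp -> session ids expiring then
    private final TreeMap<Long, Set<Long>> sessionsByTimeout = new TreeMap<>();
    // Stand-in for the versionstamp-derived session id.
    private long nextSessionId = 1;

    long createSession(long nowMs, long timeoutMs) {
        long id = nextSessionId++;
        index(id, nowMs + timeoutMs);
        return id;
    }

    /** Returns false when the session is unknown or expired (SessionExpired). */
    boolean heartbeat(long sessionId, long nowMs, long timeoutMs) {
        Long oldTimeout = sessionIds.get(sessionId);
        if (oldTimeout == null) {
            return false;
        }
        unindex(sessionId, oldTimeout);
        if (oldTimeout <= nowMs) {
            return false; // expired: both entries stay removed
        }
        index(sessionId, nowMs + timeoutMs);
        return true;
    }

    /** Scans all timeouts <= nowMs, returns those sessions, then range-deletes them. */
    List<Long> expireStale(long nowMs) {
        NavigableMap<Long, Set<Long>> stale = sessionsByTimeout.headMap(nowMs, true);
        List<Long> expired = new ArrayList<>();
        for (Set<Long> ids : stale.values()) {
            expired.addAll(ids);
        }
        for (Long id : expired) {
            sessionIds.remove(id); // ephemeral node cleanup would happen here
        }
        stale.clear(); // the view writes through, like a range delete
        return expired;
    }

    private void index(long sessionId, long timeout) {
        sessionIds.put(sessionId, timeout);
        sessionsByTimeout.computeIfAbsent(timeout, k -> new TreeSet<>()).add(sessionId);
    }

    private void unindex(long sessionId, long timeout) {
        sessionIds.remove(sessionId);
        Set<Long> ids = sessionsByTimeout.get(timeout);
        if (ids != null) {
            ids.remove(sessionId);
            if (ids.isEmpty()) {
                sessionsByTimeout.remove(timeout);
            }
        }
    }

    public static void main(String[] args) {
        SessionStoreSketch store = new SessionStoreSketch();
        long a = store.createSession(0, 100);
        long b = store.createSession(0, 200);
        store.heartbeat(a, 50, 100);                 // a now expires at 150
        System.out.println(store.expireStale(160));  // only a is stale
        System.out.println(store.heartbeat(b, 170, 100));
    }
}
```

Keeping the timeout as the leading key element of `sessions_by_timeout` is what lets the expiry scan be a single range read followed by a single range delete.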

clean up ephemeral nodes

I think there are two broad pieces here, neither being too crazy:

1. We need to write the code to clean up ephemeral nodes.
2. We need to figure out how we want to choose a client to execute (1).

For the latter piece, there were a few suggestions in https://forums.foundationdb.org/t/fdb-zk-rough-cut-of-zookeeper-api-layer/1278 to build out a simple leader election, or a cooperative clock and electing a given client to do the work for a given period of time. I think it could also make sense to optionally offload some of this work onto a dedicated client, so that app clients aren't cycling through unrelated background work.
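One way to read "electing a given client to do the work for a given period of time" is a time-bounded lease. The sketch below is an in-memory model with hypothetical names. In FDB, the holder and expiry would be a single key that is read and written in one transaction, and the caller-supplied clock is an assumption standing in for whatever cooperative clock the clients agree on.

```java
/** In-memory model of a cleanup lease; the fields stand in for one FDB key. */
final class CleanupLeaseSketch {
    private String holder;
    private long expiresAtMs;

    /** Grants or renews the lease if it is free, expired, or already held by clientId. */
    synchronized boolean tryAcquire(String clientId, long nowMs, long leaseMs) {
        boolean free = holder == null || nowMs >= expiresAtMs;
        if (free || holder.equals(clientId)) {
            holder = clientId;
            expiresAtMs = nowMs + leaseMs;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        CleanupLeaseSketch lease = new CleanupLeaseSketch();
        System.out.println(lease.tryAcquire("client-a", 0, 1000));    // true
        System.out.println(lease.tryAcquire("client-b", 500, 1000));  // false
        System.out.println(lease.tryAcquire("client-b", 1000, 1000)); // true
    }
}
```

A dedicated cleanup client fits the same shape: it would simply be the only process that ever calls `tryAcquire`.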

handle ACL permissioning

This is overall simple to do but possibly gross.

On the storage side, we store a ZNode's ACLs in `(node_ss, acls) --> [ACLs]`.

On the client auth side, we hook into the ZK server up until the first RequestProcessor to avoid having to deal with client connections & auth. For now at least, this means we can skip worrying about how the connection acquires its auth ids, and just trust that it does based on how ZK would normally do things.

The piece that's not currently done is actually enforcing the ACLs. For each operation, we must check the connection's auth ids against the ACLs of the requested node or its parent, for the requested action. All this logic is straightforward, but unfortunately ZK-proper has its ACL-related methods marked package-private and private (https://github.com/apache/zookeeper/blob/branch-3.4.14/zookeeper-server/src/main/java/org/apache/zookeeper/server/PrepRequestProcessor.java#L273-L306), so this might come down to a good ol' case of copypasta.
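A simplified sketch of the kind of check that would need to be copied, using ZooKeeper's permission bits (READ=1, WRITE=2, CREATE=4, DELETE=8, ADMIN=16). The `Id` and `Acl` classes here are hypothetical stand-ins for the ZK data types, and matching is reduced to exact scheme/id equality; real ZooKeeper delegates matching to per-scheme authentication providers, which this does not model.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Simplified ACL check; Id and Acl are stand-ins for ZooKeeper's own types. */
final class AclCheckSketch {
    static final int READ = 1, WRITE = 2, CREATE = 4, DELETE = 8, ADMIN = 16;

    static final class Id {
        final String scheme;
        final String id;
        Id(String scheme, String id) { this.scheme = scheme; this.id = id; }
    }

    static final class Acl {
        final int perms;
        final Id id;
        Acl(int perms, Id id) { this.perms = perms; this.id = id; }
    }

    /** Returns true when any of authIds is granted perm by nodeAcls. */
    static boolean isAllowed(List<Acl> nodeAcls, int perm, List<Id> authIds) {
        if (nodeAcls == null || nodeAcls.isEmpty()) {
            return true; // no ACLs stored: nothing to enforce
        }
        for (Id authId : authIds) {
            if ("super".equals(authId.scheme)) {
                return true; // superuser bypasses ACLs
            }
        }
        for (Acl acl : nodeAcls) {
            if ((acl.perms & perm) == 0) {
                continue; // this entry does not grant the requested permission
            }
            if ("world".equals(acl.id.scheme) && "anyone".equals(acl.id.id)) {
                return true;
            }
            for (Id authId : authIds) {
                // Assumption: exact match instead of provider-specific matching.
                if (authId.scheme.equals(acl.id.scheme) && authId.id.equals(acl.id.id)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Acl> acls = Arrays.asList(
                new Acl(READ, new Id("world", "anyone")),
                new Acl(READ | WRITE, new Id("digest", "alice:hash")));
        List<Id> alice = Collections.singletonList(new Id("digest", "alice:hash"));
        List<Id> nobody = Collections.emptyList();
        System.out.println(isAllowed(acls, WRITE, alice));  // true
        System.out.println(isAllowed(acls, WRITE, nobody)); // false
        System.out.println(isAllowed(acls, READ, nobody));  // true
    }
}
```

For create and delete, the caller would pass the parent node's ACLs with `CREATE` or `DELETE`; for reads and writes, the node's own ACLs.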

multi-op transactions
