ph14 / fdb-zk
ZooKeeper server on top of FoundationDB
License: MIT License
There are administrative commands to set up byte-size / subtree node-count quotas. This can either be additional information at the directory level, or follow the exact same schema as ZooKeeper, which uses special child nodes.
Most ZK client ops require the server to verify the session is still active; we need to tie this behavior into FdbZooKeeperImpl.
We might want to have the module pull down the directories at startup and then inject them.
Should there be a root directory for the whole app?
We're currently using FDB watches to mimic ZK watches, but the guarantees are different. An FDB watch fires asynchronously when the watched key's value changes, but it is also allowed not to fire in the case of a quick ABA update (the value changes and then changes back).
ZK has pretty specific semantics (per https://zookeeper.apache.org/doc/r3.3.3/zookeeperProgrammers.html#ch_zkWatches):
Watches are ordered with respect to other events, other watches, and asynchronous replies. The ZooKeeper client libraries ensures that everything is dispatched in order.
A client will see a watch event for a znode it is watching before seeing the new data that corresponds to that znode.
The order of watch events from ZooKeeper corresponds to the order of the updates as seen by the ZooKeeper service.
We need to make sure the `fdb-zk` client dispatches all updates in order across both reads and watches. We can use FDB watches in conjunction with a changefeed of all updates to actively watched nodes. On read operations, we check this feed to see if any events need to be returned (the FDB watch didn't fire yet), and if so, yield those results first, and then continue with the read.
When a node is written:
Check if `active_watches : zknode : watch_type : client_id` exists
If so, for each client:
Remove `active_watches : zknode : watch_type : *`
Insert `watches_changefeed : client_id : versionstamp : watch_type` --> `path`
Increment `watch_trigger : client_id : watch_type`
The read op & `watch_trigger` callback flow is then:
read all `watches_changefeed : client_id : *` and remove all entries
return watches to client in ascending versionstamp order
For read ops: continue on with the request
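The write and read flows above can be sketched roughly as follows. This is an in-memory stand-in, not the fdb-zk implementation: the `WatchStore` class, its method names, and the integer counter that plays the role of the versionstamp are all hypothetical, and in FDB each method body would run inside a single transaction.

```python
import itertools
from collections import defaultdict


class WatchStore:
    """In-memory stand-in for the three keyspaces (hypothetical helper)."""

    def __init__(self):
        self._version = itertools.count(1)      # stand-in for FDB versionstamps
        self.active_watches = defaultdict(set)  # (path, watch_type) -> {client_id}
        self.changefeed = defaultdict(list)     # client_id -> [(version, watch_type, path)]
        self.trigger = defaultdict(int)         # (client_id, watch_type) -> counter

    def add_watch(self, path, watch_type, client_id):
        self.active_watches[(path, watch_type)].add(client_id)

    def on_write(self, path, watch_type):
        """Meant to run in the same transaction as the node write."""
        version = next(self._version)
        # ZK watches are one-shot, so the registrations are removed here.
        clients = self.active_watches.pop((path, watch_type), set())
        for client_id in clients:
            self.changefeed[client_id].append((version, watch_type, path))
            self.trigger[(client_id, watch_type)] += 1

    def drain(self, client_id):
        """Return pending events in ascending version order and clear them."""
        return sorted(self.changefeed.pop(client_id, []))

    def read(self, client_id, path, data):
        """Read op: hand back pending watch events before the read result."""
        return self.drain(client_id), data.get(path)
```

A read that arrives before the FDB watch has fired still sees the pending events first, because `read` drains the changefeed before returning the value.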
Initial discussion in https://forums.foundationdb.org/t/fdb-zk-rough-cut-of-zookeeper-api-layer/1278
We can store client sessions in FDB, rather than in-memory like in ZK. Since we can rely on FDB as the source of truth, I'm pretty sure we can avoid having to think about which server is the session owner, `SessionMovedException`s, and the like (since every client will see the same view of the world).
Session IDs are `long`s assigned by the server. Conveniently, the transaction version in an FDB versionstamp is also 8 bytes, so we can simply write a key and use its transaction version as the session id.
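As a rough illustration, pulling the session id out of a versionstamp could look like the sketch below. It assumes the 10-byte versionstamp layout (an 8-byte big-endian transaction version followed by a 2-byte batch order); the function name is hypothetical. One thing to keep in mind is that transactions committed in the same batch share the 8-byte version, so this alone may not be enough to guarantee unique ids.

```python
import struct


def session_id_from_versionstamp(versionstamp: bytes) -> int:
    """Extract the 8-byte transaction version (hypothetical helper)."""
    if len(versionstamp) != 10:
        raise ValueError("expected a 10-byte versionstamp")
    # ">QH": big-endian unsigned 64-bit version, then unsigned 16-bit batch order.
    transaction_version, _batch_order = struct.unpack(">QH", versionstamp)
    return transaction_version
```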
Clients initially pass in a timeout when connecting. Clients regularly send in heartbeats to update their timeout.
We can use two subspaces updated transactionally to store this data: `session_ids` and `sessions_by_timeout`. `session_ids` will map a given session id to its next timeout, and `sessions_by_timeout` will order all session ids by their next timeout, so that we can efficiently identify expired sessions.
On session creation:
Set: `(sessions_by_timeout, next_timeout_timestamp, incomplete_versionstamp)`
Set: `(session_ids, incomplete_versionstamp) --> next_timeout_timestamp`
Return versionstamp transaction id as session id to client
On heartbeat:
Read: `(session_ids, session_id_versionstamp) --> timeout timestamp` to find the next timeout
If timestamp doesn't exist or is expired:
Remove: `(session_ids, session_id_versionstamp)`
Remove: `(sessions_by_timeout, timeout, session_id_versionstamp)`
Return SessionExpired to client
If timestamp is within range:
Remove: `(sessions_by_timeout, old_timeout, session_id_versionstamp)`
Set: `(sessions_by_timeout, next_timeout_timestamp, session_id_versionstamp)`
Set: `(session_ids, session_id_versionstamp) --> next_timeout_timestamp`
To find expired sessions (which is a prereq for #4):
Scan: `(sessions_by_timeout, stale_timeouts)` for any timestamps that are now considered stale
Clean up their ephemeral nodes
Range delete the scan range from before
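The three flows above (creation, heartbeat, expiry scan) can be sketched with an in-memory model. Everything here is a stand-in: `SessionStore` and its methods are hypothetical, a counter replaces the versionstamp, a dict plays `session_ids`, and a sorted list plays the ordered `sessions_by_timeout` subspace. In FDB each method would be one transaction.

```python
import bisect
import itertools


class SessionStore:
    """In-memory stand-in for the two session subspaces (hypothetical)."""

    def __init__(self):
        self._next_id = itertools.count(1)  # stand-in for versionstamp ids
        self.session_ids = {}               # session_id -> next timeout
        self.sessions_by_timeout = []       # sorted [(timeout, session_id)]

    def _index_remove(self, timeout, session_id):
        entry = (timeout, session_id)
        i = bisect.bisect_left(self.sessions_by_timeout, entry)
        if i < len(self.sessions_by_timeout) and self.sessions_by_timeout[i] == entry:
            del self.sessions_by_timeout[i]

    def _index_put(self, session_id, deadline):
        self.session_ids[session_id] = deadline
        bisect.insort(self.sessions_by_timeout, (deadline, session_id))

    def create(self, now, timeout_ms):
        session_id = next(self._next_id)
        self._index_put(session_id, now + timeout_ms)
        return session_id

    def heartbeat(self, session_id, now, timeout_ms):
        """Return False when the caller should answer SessionExpired."""
        old = self.session_ids.pop(session_id, None)
        if old is not None:
            self._index_remove(old, session_id)
        if old is None or old <= now:
            return False
        self._index_put(session_id, now + timeout_ms)
        return True

    def expire(self, now):
        """Scan and range-delete every entry whose timeout is <= now."""
        cut = bisect.bisect_right(self.sessions_by_timeout, (now, float("inf")))
        expired = [sid for _, sid in self.sessions_by_timeout[:cut]]
        del self.sessions_by_timeout[:cut]
        for sid in expired:
            # Assumption: also drop the session_ids entry here; the steps
            # above leave that to the heartbeat path.
            self.session_ids.pop(sid, None)
        return expired  # caller cleans up these sessions' ephemeral nodes
```

Because the index is ordered by timeout, the expiry scan only touches the stale prefix, which mirrors the range scan and range delete described above.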
I think there are two broad pieces here, neither being too crazy:
For the latter piece, there were a few suggestions in https://forums.foundationdb.org/t/fdb-zk-rough-cut-of-zookeeper-api-layer/1278 to build out a simple leader election, or a cooperative clock and electing a given client to do the work for a given period of time. I think it could also make sense to optionally offload some of this work onto a dedicated client, so that app clients aren't cycling through unrelated background work.
This is overall simple to do but possibly gross.
On the storage side, we store a ZNode's ACLs in `(node_ss, acls) --> [ACLs]`.
On the client auth side, we hook into the ZK server up until the first `RequestProcessor` to avoid having to deal with client connections & auth. For now at least, this means we can skip worrying about how the connection acquires its auth ids, and just trust that it does based on how ZK would normally do things.
The piece that's not currently done is actually enforcing the ACLs. For each operation, we must check the connection's auth ids against the ACLs of the requested node or its parent, depending on the action. All this logic is straightforward, but unfortunately ZK-proper has its ACL-related methods marked package-private and private (https://github.com/apache/zookeeper/blob/branch-3.4.14/zookeeper-server/src/main/java/org/apache/zookeeper/server/PrepRequestProcessor.java#L273-L306), so this might come down to a good ol' case of copypasta.
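For a sense of how small the check is, here is a simplified sketch of the logic, loosely modeled on the linked ZK method. It is not a port: the `Id`, `ACL`, and `NoAuth` types are hypothetical, and auth ids are matched by exact equality, whereas ZK delegates matching to a per-scheme authentication provider. The permission bit values follow ZooKeeper's `Perms` constants.

```python
from typing import List, NamedTuple

# ZooKeeper permission bits.
READ, WRITE, CREATE, DELETE, ADMIN = 1, 2, 4, 8, 16


class Id(NamedTuple):
    scheme: str
    id: str


class ACL(NamedTuple):
    perms: int
    id: Id


class NoAuth(Exception):
    """Raised when no ACL entry grants the requested permission."""


def check_acl(acls: List[ACL], perm: int, auth_ids: List[Id]) -> None:
    if not acls:
        return  # no ACLs stored: nothing to enforce
    if any(auth.scheme == "super" for auth in auth_ids):
        return  # superuser bypasses the check
    for acl in acls:
        if (acl.perms & perm) == 0:
            continue  # this entry does not cover the requested permission
        if acl.id == Id("world", "anyone"):
            return
        # Assumption: exact match instead of provider-specific matching.
        if acl.id in auth_ids:
            return
    raise NoAuth(f"permission {perm} denied")
```

Each operation would then call this with the permission it needs, for example `CREATE` against the parent's ACLs when creating a node, or `WRITE` against the node's own ACLs when setting data.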