fdb-zk's Issues

implement quotas

There are administrative commands to set up byte-size / subtree node-count quotas. This can either be stored as additional information at the directory level, or follow the exact same schema as ZooKeeper, which uses special child nodes.
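For the second option, a minimal sketch of what following the ZooKeeper layout could look like. It assumes the ZooKeeper convention of a `zookeeper_limits` child under `/zookeeper/quota<path>` whose data looks like `count=5,bytes=-1` (with `-1` meaning unlimited); the helper names are hypothetical and nothing here is taken from fdb-zk itself.

```python
QUOTA_ROOT = "/zookeeper/quota"   # assumed ZooKeeper convention
LIMITS_NODE = "zookeeper_limits"  # assumed ZooKeeper convention


def quota_limits_path(path):
    """Path of the special child node holding the limits for `path`."""
    return QUOTA_ROOT + path + "/" + LIMITS_NODE


def parse_limits(data):
    """Parse 'count=5,bytes=-1' into {'count': 5, 'bytes': -1}."""
    limits = {}
    for part in data.split(","):
        key, value = part.split("=")
        limits[key.strip()] = int(value)
    return limits


def over_quota(limits, count, size):
    """Return the names of the limits that the current usage exceeds."""
    usage = {"count": count, "bytes": size}
    exceeded = []
    for key in ("count", "bytes"):
        limit = limits.get(key, -1)
        if limit >= 0 and usage[key] > limit:  # -1 is treated as unlimited
            exceeded.append(key)
    return exceeded
```

Whether exceeding a quota should reject the write or only log a warning is left open here.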

refactor directory usage

might want to have the module pull down the directories at start up time and then inject them

should there be a root directory for the whole app?

implement ZK watch semantics via changefeeds

We're currently using FDB watches to mimic ZK watches, but the guarantees are different. An FDB watch fires asynchronously when the watched key's value changes, but it is allowed not to fire on a quick ABA update (the value changes and then changes back before the watch notices).

ZK has pretty specific semantics (per https://zookeeper.apache.org/doc/r3.3.3/zookeeperProgrammers.html#ch_zkWatches):

Watches are ordered with respect to other events, other watches, and asynchronous replies. The ZooKeeper client libraries ensures that everything is dispatched in order.

A client will see a watch event for a znode it is watching before seeing the new data that corresponds to that znode.

The order of watch events from ZooKeeper corresponds to the order of the updates as seen by the ZooKeeper service.

We need to make sure the fdb-zk client dispatches all updates in order across both reads and watches. We can use FDB watches in conjunction with a changefeed of all updates to actively watched nodes. On read operations, we check this feed to see if any events need to be returned (the FDB watch didn't fire yet), and if so, yield those results first, and then continue with the read.

When a node is written:

 Check if `active_watches : zknode : watch_type : client_id` exists
 If so, for each client:
   Remove `active_watches : zknode : watch_type : *`
   Insert `watches_changefeed : client_id : versionstamp : watch_type` --> `path`
   Increment `watch_trigger : client_id : watch_type`

The read op & watch_trigger callback flow is then:

Read all `watches_changefeed : client_id : *` and remove all entries
Return watches to client in ascending versionstamp order
For read ops: continue on with the request
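
The write and read flows above can be sketched with an in-memory model. This is only an illustration under assumptions: the dictionaries stand in for the three subspaces, a plain counter stands in for the FDB versionstamp, and the class and method names (`WatchStore`, `write_node`, `drain`) are hypothetical rather than part of fdb-zk.

```python
import itertools
from collections import defaultdict


class WatchStore:
    """In-memory stand-in for the subspaces above (no real FDB involved)."""

    def __init__(self):
        # active_watches : zknode : watch_type -> set of client ids
        self.active_watches = defaultdict(set)
        # watches_changefeed : client_id -> [(versionstamp, watch_type, path)]
        self.changefeed = defaultdict(list)
        # watch_trigger : (client_id, watch_type) -> counter
        self.watch_trigger = defaultdict(int)
        # Assumption: a monotonic counter replaces the FDB versionstamp.
        self._versions = itertools.count(1)

    def add_watch(self, path, watch_type, client_id):
        self.active_watches[(path, watch_type)].add(client_id)

    def write_node(self, path, watch_type):
        """Write flow: move every active watch on the node into the feeds."""
        version = next(self._versions)
        # Removing the whole entry makes the watches one-shot, as in ZK.
        clients = self.active_watches.pop((path, watch_type), set())
        for client_id in sorted(clients):
            self.changefeed[client_id].append((version, watch_type, path))
            self.watch_trigger[(client_id, watch_type)] += 1
        return version

    def drain(self, client_id):
        """Read and clear the client's feed, in ascending version order."""
        return sorted(self.changefeed.pop(client_id, []))

    def read(self, client_id, path, data):
        """Read op: pending watch events first, then the requested value."""
        return self.drain(client_id), data.get(path)
```

In a real layer each method body would have to run inside a single FDB transaction so that the watch removal, the changefeed insert, and the trigger increment commit together.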

Initial discussion in https://forums.foundationdb.org/t/fdb-zk-rough-cut-of-zookeeper-api-layer/1278

client session management

We can store client sessions in FDB, rather than in-memory like in ZK. Since we can rely on FDB as the source of truth, I'm pretty sure we can avoid having to think about which server is the session owner, SessionMovedExceptions, and the like (since every client will see the same view of the world).

Session IDs are longs assigned by the server. Conveniently, the transaction version inside an FDB versionstamp is also 8 bytes, so we can simply write a key and use its transaction version as the session id.

Clients initially pass in a timeout when connecting. Clients regularly send in heartbeats to update their timeout.

We can use two subspaces updated transactionally to store this data: session_ids and sessions_by_timeout. session_ids will map a given session id to its next timeout, and sessions_by_timeout will order all session ids by their next timeout, so that we can efficiently identify expired sessions.

On session creation:

Set: `(sessions_by_timeout, next_timeout_timestamp, incomplete_versionstamp)`
Set: `(session_ids, incomplete_versionstamp) --> next_timeout_timestamp`
Return versionstamp transaction id as session id to client

On heartbeat:

Read: `(session_ids, session_id_versionstamp) --> timeout timestamp` to find the next timeout
  If timestamp doesn't exist or is expired:
    Remove: `(session_ids, session_id_versionstamp)`
    Remove: `(sessions_by_timeout, timeout, session_id_versionstamp)`
    Return SessionExpired to client

  If timestamp is within range:
    Remove: `(sessions_by_timeout, old_timeout, session_id_versionstamp)`
    Set: `(sessions_by_timeout, next_timeout_timestamp, session_id_versionstamp)`
    Set: `(session_ids, session_id_versionstamp) --> next_timeout_timestamp`
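
A minimal in-memory sketch of the creation and heartbeat steps above. The dictionary and set stand in for the `session_ids` and `sessions_by_timeout` subspaces, a counter stands in for the versionstamp, and `SessionStore` and `SessionExpired` are hypothetical names used only for illustration.

```python
import itertools


class SessionExpired(Exception):
    """Raised when a heartbeat arrives for a missing or expired session."""


class SessionStore:
    def __init__(self):
        self.session_ids = {}               # session_id -> next timeout
        self.sessions_by_timeout = set()    # {(next timeout, session_id)}
        # Assumption: a counter replaces the versionstamp transaction id.
        self._versions = itertools.count(1)

    def create(self, now, timeout):
        session_id = next(self._versions)
        deadline = now + timeout
        self.sessions_by_timeout.add((deadline, session_id))
        self.session_ids[session_id] = deadline
        return session_id

    def heartbeat(self, session_id, now, timeout):
        deadline = self.session_ids.get(session_id)
        if deadline is None or deadline <= now:
            # Missing or expired: remove both entries and report expiry.
            self.session_ids.pop(session_id, None)
            if deadline is not None:
                self.sessions_by_timeout.discard((deadline, session_id))
            raise SessionExpired(session_id)
        # Still live: move the session to its new position in the index.
        new_deadline = now + timeout
        self.sessions_by_timeout.discard((deadline, session_id))
        self.sessions_by_timeout.add((new_deadline, session_id))
        self.session_ids[session_id] = new_deadline
        return new_deadline
```

Both structures are always updated together here, which mirrors the requirement that the two subspaces be updated in the same transaction.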

To find expired sessions (which is a prereq for #4):

Scan: `(sessions_by_timeout, stale_timeouts)` for any timestamps that are now considered stale
Clean up their ephemeral nodes
Range delete the scan range from before
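
The scan can be sketched as a prefix cut on an ordered index. This assumes the `sessions_by_timeout` subspace is modeled as a sorted list of `(timeout, session_id)` tuples; `sweep_expired` and the `cleanup` callback are hypothetical names, and the callback stands in for the ephemeral node cleanup.

```python
import bisect


def sweep_expired(sessions_by_timeout, now, cleanup):
    """Clean up and drop every session whose timeout is <= now.

    sessions_by_timeout must be a sorted list of (timeout, session_id).
    """
    # Everything before `cut` has a timeout at or before `now`.
    cut = bisect.bisect_right(sessions_by_timeout, (now, float("inf")))
    expired = sessions_by_timeout[:cut]
    for _timeout, session_id in expired:
        cleanup(session_id)           # e.g. delete the session's ephemerals
    del sessions_by_timeout[:cut]     # the "range delete" of the scan range
    return [session_id for _timeout, session_id in expired]
```

Because the index is ordered by timeout, the sweep only touches the stale prefix and never reads live sessions.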

clean up ephemeral nodes

I think there are two broad pieces here, neither being too crazy:

  1. we need to write the code to clean up ephemeral nodes
  2. we need to figure out how we want to choose a client to execute (1)

For the latter piece, there were a few suggestions in https://forums.foundationdb.org/t/fdb-zk-rough-cut-of-zookeeper-api-layer/1278 to build out a simple leader election, or a cooperative clock and electing a given client to do the work for a given period of time. I think it could also make sense to optionally offload some of this work onto a dedicated client, so that app clients aren't cycling through unrelated background work.
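
One way to picture the "elect a client for a given period of time" idea is a lease. The sketch below is an assumption-laden illustration, not the design from the forum thread: `LeaseStore` is a hypothetical in-memory stand-in for a single lease key updated transactionally, and `now` is assumed to come from whatever shared clock the clients agree on.

```python
class LeaseStore:
    """In-memory stand-in for one lease key (owner, expiry)."""

    def __init__(self):
        self.owner = None
        self.expires_at = 0

    def try_acquire(self, client_id, now, lease_seconds):
        """Take or renew the lease; return True if client_id holds it."""
        free = self.owner is None or self.expires_at <= now
        if free or self.owner == client_id:
            self.owner = client_id
            self.expires_at = now + lease_seconds
            return True
        return False
```

The lease holder would run the ephemeral node cleanup until its lease lapses; a dedicated cleanup client could simply be the only one that ever calls `try_acquire`.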

handle ACL permissioning

This is overall simple to do but possibly gross.

On the storage side, we store a ZNode's ACLs in (node_ss, acls) --> [ACLs].

On the client auth side, we hook into the ZK server up until the first RequestProcessor to avoid having to deal with client connections & auth. For now at least, this means we can skip worrying about how the connection acquires its auth ids, and just trust that it does based on how ZK would normally do things.

The piece that's not currently done is actually enforcing the ACLs. For each operation, we must check the connection's auth ids against the ACLs of the requested node or its parent, depending on the action. All this logic is straightforward, but unfortunately ZK-proper has its ACL-related methods marked package-private and private (https://github.com/apache/zookeeper/blob/branch-3.4.14/zookeeper-server/src/main/java/org/apache/zookeeper/server/PrepRequestProcessor.java#L273-L306), so this might come down to a good ol' case of copypasta.
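
A simplified sketch of the kind of check that would need to be copied. The permission bits match ZooKeeper's `READ`/`WRITE`/`CREATE`/`DELETE`/`ADMIN` values, but the rest is an assumption for illustration: ids are compared by exact equality instead of going through ZK's authentication providers, an empty ACL list is treated as open, and `check_acl`, `Id`, `ACL`, and `NoAuth` are hypothetical Python stand-ins.

```python
from dataclasses import dataclass

# ZooKeeper permission bits.
READ, WRITE, CREATE, DELETE, ADMIN = 1, 2, 4, 8, 16


@dataclass(frozen=True)
class Id:
    scheme: str
    id: str


@dataclass(frozen=True)
class ACL:
    perms: int
    id: Id


class NoAuth(Exception):
    """Raised when no ACL entry grants the requested permission."""


ANYONE = Id("world", "anyone")


def check_acl(acls, perm, auth_ids):
    """Return normally if `perm` is granted to any of `auth_ids`."""
    if not acls:
        return  # assumption: no ACLs means no restriction
    for acl in acls:
        if acl.perms & perm == 0:
            continue  # this entry does not cover the requested action
        if acl.id == ANYONE or acl.id in auth_ids:
            return
    raise NoAuth(f"permission {perm} denied")
```

Each operation would call this with the stored `(node_ss, acls)` value of the node or parent node and the permission bit for the action, for example `CREATE` against the parent when creating a child.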
