fdb-zk

fdb-zk is a FoundationDB layer that mimics the behavior of ZooKeeper. It is installed as a local service alongside an application and replaces both the connection to and the operation of a ZooKeeper cluster.

While the core operations are implemented, fdb-zk has not yet been vetted for production use.

Talk & Slides

Learn about how the layer works in greater detail:

Architecture

Similar to the FoundationDB Document Layer, fdb-zk is hosted locally and translates ZooKeeper requests into FoundationDB transactions.

Applications can continue to use their preferred ZooKeeper clients (a minimal connection sketch follows the diagram below):

┌──────────────────────┐     ┌──────────────────────┐
│ ┌──────────────────┐ │     │ ┌──────────────────┐ │
│ │   Application    │ │     │ │   Application    │ │
│ └──────────────────┘ │     │ └──────────────────┘ │
│           │          │     │           │          │
│           │          │     │           │          │
│       ZooKeeper      │     │       ZooKeeper      │
│        protocol      │     │        protocol      │
│           │          │     │           │          │
│           │          │     │           │          │
│           ▼          │     │           ▼          │
│ ┌──────────────────┐ │     │ ┌──────────────────┐ │
│ │  fdb-zk service  │ │     │ │  fdb-zk service  │ │
│ └──────────────────┘ │     │ └──────────────────┘ │
└──────────────────────┘     └──────────────────────┘
            │                            │
         FDB ops                      FDB ops
            │                            │
            ▼                            ▼
┌───────────────────────────────────────────────────┐
│                   FoundationDB                    │
└───────────────────────────────────────────────────┘
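Because the layer speaks the ZooKeeper wire protocol, a stock ZooKeeper client can connect to the local fdb-zk service unchanged. A minimal connection sketch, assuming the service listens on a local port (the `localhost:2181` address and the timeout below are placeholders, not fdb-zk defaults):

```java
import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class ConnectExample {
    public static void main(String[] args) throws Exception {
        CountDownLatch connected = new CountDownLatch(1);
        // Address and session timeout are illustrative, not fdb-zk defaults.
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30_000, event -> {
            if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
                connected.countDown();
            }
        });
        connected.await();
        System.out.println("connected, session id 0x" + Long.toHexString(zk.getSessionId()));
        zk.close();
    }
}
```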

Features

fdb-zk implements the core ZooKeeper 3.4.6 API (exercised in the sketch after these lists):

  • create
  • exists
  • delete
  • getData
  • setData
  • getChildren
  • watches
  • session management

It partially implements:

  • multi transactions (reads work, but without read-your-writes semantics within the transaction)

It does not yet implement:

  • getACL/setACL
  • quotas
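As a rough illustration of the implemented surface, the sketch below exercises it through a plain ZooKeeper client; `zk` is assumed to be a connected client as in the connection sketch above, and the paths and payloads are made up:

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class CoreOpsExample {
    static void run(ZooKeeper zk) throws Exception {
        // create / exists
        zk.create("/config", "v1".getBytes(), ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        Stat stat = zk.exists("/config", false);

        // setData / getData (passing true leaves a data watch on /config)
        zk.setData("/config", "v2".getBytes(), stat.getVersion());
        byte[] data = zk.getData("/config", true, null);
        System.out.println(new String(data));

        // getChildren / delete (-1 skips the version check)
        System.out.println(zk.getChildren("/", false));
        zk.delete("/config", -1);
    }
}
```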

Initial Design Discussion

https://forums.foundationdb.org/t/fdb-zk-rough-cut-of-zookeeper-api-layer/1278/

Building with Bazel

  • Compiling: `bazel build //:fdb_zk`
  • Testing: `bazel test //:fdb_zk_test`
  • Dependencies: `bazel query @maven//:all --output=build`

License

fdb-zk is under the Apache 2.0 license. See the LICENSE file for details.

Contributors

claudiouzelac, ph14

Issues

implement ZK watch semantics via changefeeds

We're currently using FDB watches to mimic ZK watches, but the guarantees are different. An FDB watch fires asynchronously whenever the watched key's value changes, but it is also allowed not to fire in the case of a quick ABA update (where the value changes and then changes back).

ZK has pretty specific semantics (per https://zookeeper.apache.org/doc/r3.3.3/zookeeperProgrammers.html#ch_zkWatches):

Watches are ordered with respect to other events, other watches, and asynchronous replies. The ZooKeeper client libraries ensures that everything is dispatched in order.

A client will see a watch event for a znode it is watching before seeing the new data that corresponds to that znode.

The order of watch events from ZooKeeper corresponds to the order of the updates as seen by the ZooKeeper service.

We need to make sure the fdb-zk client dispatches all updates in order across both reads and watches. We can use FDB watches in conjunction with a changefeed of all updates to actively watched nodes. On read operations, we check this feed to see if any events need to be returned (in case the FDB watch hasn't fired yet), and if so, yield those results first, and then continue with the read.

When a node is written:

 Check if `active_watches : zknode  : watch_type : client_id` exists
 If so, for each client:
   Remove `active_watches : zknode : watch_type : *`
   Insert `watches_changefeed : client_id : versionstamp : watch_type` --> `path`
   Increment `watch_trigger : client_id : watch_type`

The read op & watch_trigger callback flow is then:

  Read all `watches_changefeed : client_id : *` and remove all entries
  Return watches to the client in ascending versionstamp order
  For read ops: continue on with the request
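A rough Java sketch of the write-side half of this scheme, using the FDB Java bindings. The subspace names mirror the keys above, but the layout and the `publishWatchEvent` helper are hypothetical, not the layer's actual code:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.concurrent.CompletableFuture;
import com.apple.foundationdb.KeyValue;
import com.apple.foundationdb.MutationType;
import com.apple.foundationdb.Transaction;
import com.apple.foundationdb.subspace.Subspace;
import com.apple.foundationdb.tuple.Tuple;
import com.apple.foundationdb.tuple.Versionstamp;

public class WatchChangefeed {
    // Hypothetical subspaces; the real layer's key layout may differ.
    private static final Subspace ACTIVE_WATCHES = new Subspace(Tuple.from("active_watches"));
    private static final Subspace CHANGEFEED = new Subspace(Tuple.from("watches_changefeed"));
    private static final Subspace WATCH_TRIGGER = new Subspace(Tuple.from("watch_trigger"));

    /** Called inside the write transaction that mutates {@code path}. */
    static CompletableFuture<Void> publishWatchEvent(Transaction tr, String path, String watchType) {
        Subspace watchers = ACTIVE_WATCHES.subspace(Tuple.from(path, watchType));
        return tr.getRange(watchers.range()).asList().thenAccept(kvs -> {
            for (KeyValue kv : kvs) {
                long clientId = watchers.unpack(kv.getKey()).getLong(0);

                // Insert `watches_changefeed : client_id : versionstamp : watch_type --> path`
                // with an incomplete versionstamp so events sort in commit order.
                byte[] feedKey = CHANGEFEED.packWithVersionstamp(
                        Tuple.from(clientId, Versionstamp.incomplete(), watchType));
                tr.mutate(MutationType.SET_VERSIONSTAMPED_KEY, feedKey, Tuple.from(path).pack());

                // Increment `watch_trigger : client_id : watch_type`; the fdb-zk service
                // holds an FDB watch on this key and wakes up when it changes.
                byte[] one = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN).putLong(1).array();
                tr.mutate(MutationType.ADD, WATCH_TRIGGER.pack(Tuple.from(clientId, watchType)), one);
            }
            // Remove `active_watches : path : watch_type : *` -- ZK watches are one-shot.
            tr.clear(watchers.range());
        });
    }
}
```

Bumping the trigger key with an atomic ADD keeps concurrent writers from conflicting with each other on that key, while still waking the FDB watch the service holds on it.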

Initial discussion in https://forums.foundationdb.org/t/fdb-zk-rough-cut-of-zookeeper-api-layer/1278

handle ACL permissioning

This is overall simple to do but possibly gross.

On the storage side, we store a ZNode's ACLs in `(node_ss, acls) --> [ACLs]`.

On the client auth side, we hook into the ZK server up until the first RequestProcessor to avoid having to deal with client connections & auth. For now at least, this means we can skip worrying about how the connection acquires its auth ids, and just trust that it does based on how ZK would normally do things.

The piece that's not currently done is actually enforcing the ACLs. For each operation, we must check the connection's auth ids against the requested action and the node's (or parent node's) ACLs. All this logic is straightforward, but unfortunately ZK-proper has its ACL-related methods marked package-private and private (https://github.com/apache/zookeeper/blob/branch-3.4.14/zookeeper-server/src/main/java/org/apache/zookeeper/server/PrepRequestProcessor.java#L273-L306), so this might come down to a good ol' case of copypasta.
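If it does come to copying, the check itself is small. A sketch of what a copied-over check could look like (simplified: real ZK delegates id matching to each scheme's AuthenticationProvider, which is elided here, and the class and method names are ours):

```java
import java.util.List;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.data.ACL;
import org.apache.zookeeper.data.Id;

public class AclCheck {
    // Rough, simplified stand-in for ZK's package-private PrepRequestProcessor.checkACL,
    // e.g. checkACL(ZooDefs.Perms.WRITE, nodeAcls, connectionAuthIds).
    static void checkACL(int perm, List<ACL> acls, List<Id> authIds) throws KeeperException {
        if (acls == null || acls.isEmpty()) {
            return; // no ACLs stored for the node: open access
        }
        for (ACL acl : acls) {
            if ((acl.getPerms() & perm) == 0) {
                continue; // this ACL doesn't grant the requested permission
            }
            Id id = acl.getId();
            if ("world".equals(id.getScheme()) && "anyone".equals(id.getId())) {
                return; // world:anyone grants it to everybody
            }
            for (Id authId : authIds) {
                // Real ZK matches via the scheme's AuthenticationProvider;
                // exact equality is a simplification here.
                if (authId.getScheme().equals(id.getScheme()) && authId.getId().equals(id.getId())) {
                    return;
                }
            }
        }
        throw new KeeperException.NoAuthException();
    }
}
```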

refactor directory usage

We might want to have the module pull down the directories at startup and then inject them.

Should there be a root directory for the whole app?
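For illustration, resolving the directories once at startup with the FDB directory layer might look roughly like this; the `fdb-zk` root, the child names, and the `Directories` holder are made up for the example:

```java
import java.util.Arrays;
import com.apple.foundationdb.Database;
import com.apple.foundationdb.directory.DirectoryLayer;
import com.apple.foundationdb.directory.DirectorySubspace;

public class Directories {
    final DirectorySubspace nodes;
    final DirectorySubspace sessions;

    // Resolve the layer's directories once at startup so they can be injected
    // into the components that need them. The "fdb-zk" root and the child
    // names are illustrative only.
    Directories(Database db) {
        DirectoryLayer dl = DirectoryLayer.getDefault();
        this.nodes = db.run(tr -> dl.createOrOpen(tr, Arrays.asList("fdb-zk", "nodes")).join());
        this.sessions = db.run(tr -> dl.createOrOpen(tr, Arrays.asList("fdb-zk", "sessions")).join());
    }
}
```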

implement quotas

There are administrative commands to set up byte-size / subtree node-count quotas. This can either be additional information at the directory level, or follow the exact same schema as ZooKeeper, which uses special child nodes.

clean up ephemeral nodes

I think there are two broad pieces here, neither being too crazy:

  1. we need to write the code to clean up ephemeral nodes
  2. we need to figure out how we want to choose a client to execute (1)

For the latter piece, there were a few suggestions in https://forums.foundationdb.org/t/fdb-zk-rough-cut-of-zookeeper-api-layer/1278 to build out a simple leader election, or a cooperative clock that elects a given client to do the work for a given period of time. I think it could also make sense to optionally offload some of this work onto a dedicated client, so that app clients aren't cycling through unrelated background work.

client session management

We can store client sessions in FDB, rather than in-memory as in ZK. Since we can rely on FDB as the source of truth, I'm pretty sure we can avoid having to think about which server is the session owner, SessionMovedExceptions, and the like (since every client will see the same view of the world).

Session IDs are longs assigned by the server. Conveniently, the commit-version portion of an FDB Versionstamp fits in a long, so we can simply write a versionstamped key and use its transaction version as the session id.

Clients initially pass in a timeout when connecting. Clients regularly send in heartbeats to update their timeout.

We can use two subspaces updated transactionally to store this data: session_ids and sessions_by_timeout. session_ids will map a given session id to its next timeout, and sessions_by_timeout will order all session ids by their next timeout, so that we can efficiently identify expired sessions.

On session creation:

Set: `(sessions_by_timeout, next_timeout_timestamp, incomplete_versionstamp)`
Set: `(session_ids, incomplete_versionstamp) --> next_timeout_timestamp`
Return versionstamp transaction id as session id to client
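A sketch of that flow with the FDB Java bindings. The subspace names are hypothetical, and the session id is taken from the first 8 bytes (the commit version) of the 10-byte versionstamp:

```java
import java.util.concurrent.CompletableFuture;
import com.apple.foundationdb.Database;
import com.apple.foundationdb.MutationType;
import com.apple.foundationdb.subspace.Subspace;
import com.apple.foundationdb.tuple.Tuple;
import com.apple.foundationdb.tuple.Versionstamp;

public class SessionCreate {
    // Hypothetical subspaces matching the schema above.
    static final Subspace SESSION_IDS = new Subspace(Tuple.from("session_ids"));
    static final Subspace SESSIONS_BY_TIMEOUT = new Subspace(Tuple.from("sessions_by_timeout"));

    /** Creates a session and returns its id, taken from the commit versionstamp. */
    static long createSession(Database db, long timeoutMillis) {
        long nextTimeout = System.currentTimeMillis() + timeoutMillis;

        CompletableFuture<byte[]> vsFuture = db.run(tr -> {
            // (sessions_by_timeout, next_timeout, <versionstamp>) --> ""
            tr.mutate(MutationType.SET_VERSIONSTAMPED_KEY,
                    SESSIONS_BY_TIMEOUT.packWithVersionstamp(
                            Tuple.from(nextTimeout, Versionstamp.incomplete())),
                    new byte[0]);
            // (session_ids, <versionstamp>) --> next_timeout
            tr.mutate(MutationType.SET_VERSIONSTAMPED_KEY,
                    SESSION_IDS.packWithVersionstamp(Tuple.from(Versionstamp.incomplete())),
                    Tuple.from(nextTimeout).pack());
            // The versionstamp is only known once the transaction commits.
            return tr.getVersionstamp();
        });

        // First 8 bytes of the 10-byte versionstamp are the commit version; use it as the session id.
        byte[] vs = vsFuture.join();
        return java.nio.ByteBuffer.wrap(vs, 0, 8).getLong();
    }
}
```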

On heartbeat:

Read: `(session_ids, session_id_versionstamp) --> timeout timestamp` to find the next timeout
  If timestamp doesn't exist or is expired:
    Remove: `(session_ids, session_id_versionstamp)`
    Remove: `(sessions_by_timeout, timeout, session_id_versionstamp)`
    Return SessionExpired to client

  If timestamp is within range:
    Remove: `(sessions_by_timeout, old_timeout, session_id_versionstamp)`
    Set: `(sessions_by_timeout, next_timeout_timestamp, session_id_versionstamp)`
    Set: `(session_ids, session_id_versionstamp) --> next_timeout_timestamp`
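A matching heartbeat sketch. To keep it short it takes the session's full `Versionstamp` rather than the 8-byte long id (mapping the long back to the exact key is left out):

```java
import com.apple.foundationdb.Database;
import com.apple.foundationdb.subspace.Subspace;
import com.apple.foundationdb.tuple.Tuple;
import com.apple.foundationdb.tuple.Versionstamp;

public class SessionHeartbeat {
    static final Subspace SESSION_IDS = new Subspace(Tuple.from("session_ids"));
    static final Subspace SESSIONS_BY_TIMEOUT = new Subspace(Tuple.from("sessions_by_timeout"));

    /** Returns true if the session was renewed, false if it had already expired. */
    static boolean heartbeat(Database db, Versionstamp session, long timeoutMillis) {
        return db.run(tr -> {
            byte[] idKey = SESSION_IDS.pack(Tuple.from(session));
            byte[] value = tr.get(idKey).join();
            long now = System.currentTimeMillis();

            if (value == null) {
                return false; // unknown session: already expired and cleaned up
            }
            long oldTimeout = Tuple.fromBytes(value).getLong(0);

            if (oldTimeout < now) {
                // Expired: remove both entries and report SessionExpired to the client.
                tr.clear(idKey);
                tr.clear(SESSIONS_BY_TIMEOUT.pack(Tuple.from(oldTimeout, session)));
                return false;
            }

            // Still live: slide both entries forward to the new timeout.
            long nextTimeout = now + timeoutMillis;
            tr.clear(SESSIONS_BY_TIMEOUT.pack(Tuple.from(oldTimeout, session)));
            tr.set(SESSIONS_BY_TIMEOUT.pack(Tuple.from(nextTimeout, session)), new byte[0]);
            tr.set(idKey, Tuple.from(nextTimeout).pack());
            return true;
        });
    }
}
```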

To find expired sessions (which is a prereq for #4):

Scan: `(sessions_by_timeout, stale_timeouts)` for any timestamps that are now considered stale
Clean up their ephemeral nodes
Range delete the scan range from before
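A sketch of that scan; the ephemeral-node cleanup itself belongs to the "clean up ephemeral nodes" issue above, so it is only a placeholder here:

```java
import java.util.List;
import com.apple.foundationdb.Database;
import com.apple.foundationdb.KeyValue;
import com.apple.foundationdb.Range;
import com.apple.foundationdb.subspace.Subspace;
import com.apple.foundationdb.tuple.Tuple;
import com.apple.foundationdb.tuple.Versionstamp;

public class SessionReaper {
    static final Subspace SESSIONS_BY_TIMEOUT = new Subspace(Tuple.from("sessions_by_timeout"));

    /** Finds sessions whose timeout has passed, cleans them up, and removes the index entries. */
    static void reapExpiredSessions(Database db) {
        db.run(tr -> {
            long now = System.currentTimeMillis();
            // Everything ordered before `now` in the index is stale.
            Range stale = new Range(SESSIONS_BY_TIMEOUT.range().begin,
                                    SESSIONS_BY_TIMEOUT.pack(Tuple.from(now)));
            List<KeyValue> expired = tr.getRange(stale).asList().join();
            for (KeyValue kv : expired) {
                Versionstamp session = SESSIONS_BY_TIMEOUT.unpack(kv.getKey()).getVersionstamp(1);
                // Hypothetical hook: clean up this session's ephemeral nodes (see the
                // "clean up ephemeral nodes" issue) and remove its `session_ids` entry.
                System.out.println("expiring session " + session);
            }
            tr.clear(stale); // range-delete the scanned entries
            return null;
        });
    }
}
```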
