Giter Site home page Giter Site logo

akd's Introduction

akd Build Status

An implementation of an auditable key directory (also known as a verifiable registry or authenticated dictionary).

Auditable key directories can be used to help provide key transparency for end-to-end encrypted messaging.

This implementation is based off of the protocols described in SEEMless, with ideas incorporated from Parakeet.

This library provides a stateless API for an auditable key directory, meaning that a consumer of this library must provide their own solution for the storage of the entries of the directory.

Documentation

The API can be found here along with an example for usage. To learn more about the technical details behind how the directory is constructed, see here.

Installation

Add the following line to the dependencies of your Cargo.toml:

akd = "0.12.0-pre.3"

Minimum Supported Rust Version

Rust 1.51 or higher.

Top-Level Directory Organization

Subfolder On crates.io? Description
akd โœ“ Main implementation of AKD which a service provider that manages the underlying directory would need to run. A good starting point for diving into this implementation.
akd_core โœ“ Minimal library consisting of core operations in AKD.
examples Contains various examples for using AKD, along with utilities such as locally verifying audit proofs that are produced by WhatsApp's key transparency deployment. More details are contained here.
xtask Used for running the code coverage pipeline.

Audit

This library was audited by NCC Group in August of 2023. The audit was sponsored by Meta for its use in WhatsApp's key transparency deployment.

The audit found issues in release v0.9.0, and the fixes were subsequently incorporated into release v0.11.0. See the full audit report here.

Contributors

The original authors of this code are Evan Au (@afterdusk), Alex Chernyak (@alexme22), Dillon George (@dillonrg), Sean Lawlor (@slawlor), Kevin Lewi (@kevinlewi), Jasleen Malvai (@jasleen1), and Ercan Ozturk (@eozturk1). To learn more about contributing to this project, see this document.

License

This project is dual-licensed under either the MIT license or the Apache License, Version 2.0. You may select, at your option, one of the above-listed licenses.

akd's People

Contributors

afterdusk avatar alexme22 avatar asonnino avatar bren2010 avatar cryo28 avatar davidjattfb avatar dependabot[bot] avatar dillonrg avatar divyatalesra-2001 avatar eozturk1 avatar haochenuw avatar jasleen1 avatar kevinlewi avatar slawlor avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

akd's Issues

Unbounded ```HistoryTreeNode``` field

HistoryTreeNode has an epochs field that is used to identify the epochs which an update took place.
As it stands now, the field can grow to an unbounded size. This will pose an issue when the corresponding DB record hits either soft or hard limits imposed by the DB schema or architecture.

ESTIMATED LIMITS

Epochs are u64 = 8 bytes.

  • For mysql, with the current VARBINARY(2000) definition, 2KB / 8B = ~ 250 values (we could always raise the max value in the definition, this is just for illustration)
  • For some alternate key-value db (KVDB) with say 100KB recommended soft limit for record size, ignoring other overhead, 100KB / 8B = ~ 12500 values

SOLUTION?

The associated epochs for a HistoryTreeNode can be obtained directly from the HistoryNodeState table given the node label, since each update corresponds to a HistoryNodeState.

  • For mysql, this can be done through one call with a join, or a separate, concurrent call to the DB.
  • For the alternate KVDB, one call is also sufficient. Instead of a "get" operation to fetch a singular HistoryTreeNode record, we can use a "scan" operation to fetch both the HistoryTreeNode record AND associated HistoryNodeState records at one go, provided we have a good sharding setup.

Accumulating the data from separate tables will incur some computation time on the database server and we will need to profile and assess the impact. However, I think this is necessary to avoid the unbounded record size issue. Are there any other potential solutions I've missed?

Storage contract for values between epoch range

With adding the limited_key_history call in directory.rs we now need the ability to select a series of user values given a minimum epoch (start_epoch). This should be a new call on the storage contract which doesn't exist today.

For usage, see directory.rs:L366-368.

AkdKey and Values should be standardized

The types AkdKey and Values should be standardized in their naming, definition and usage. They're a little hazard at the moment. (one is Akd... the other is just ..., etc).

Additionally we should migrate to binary (&[u8]) instead of strings to save on storage sizing for at least the value, if not the key also. All our operations are done on the binary values, so it's just for readability that it's strings. This will induce data-layer changes in the akd_mysql crate as well, as we'll be changing data-layer types.

Call multiple lookup operations async

To improve the efficiency of bulk/multi lookups, we could call lookups async and await on the results of all lookups -- rather than awaiting on them individually.

Handle absence of ValueState

In a production system, data in storage is pruned to manage storage size and adhere to privacy commitments. For AKD, the only prunable record is ValueState, which contains the data payload (key material for E2EE messaging).

Today, operations fail when ValueState cannot be retrieved. We need to modify the library to handle the scenario gracefully and propagate the information to the caller.

Invalidate cache on new Azks epoch

In a distributed deployment, the cache may contain stale records. We want the cache to be invalidated when a new Azks record with a higher epoch number is detected, so that stale HistoryTreeNodes are not used to generate proofs. This involves remembering the epoch of the Azks record in the cache, and at TTL expiration checking it against the newly fetched Azks record to see if the epoch number changed. If it has, the entire cache should be dumped.

MySQL error `HistoryTreeNode`s `batch_get`

For big-sized trees, mysql_tests are failing in batch_get due to errors similar to

ERROR HY000 (1366): Incorrect integer value: '??%????\u{3}?\\??N??\u{4}a??H??~~S??!DU9D' for column 'label_val' at row 1

Simplifying code

The code in history_tree_node.rs and append_only_zks.rs can be further simplified. A few suggestions:

  • Remove any remaining instances of unwrap().
  • Look for repeated code among conditionals and abstract it out into a separate function.

Make set_node_child_without_hash return nothing

Right now, when trying to migrate over state maps to use Storage, we are running into an issue that I think might be caused by this line:

https://github.com/novifinancial/SEEMless/blob/main/src/history_tree_node.rs#L173-L175

Here, we are potentially copying a child, implicitly cloning its state map. This gets problematic when we move to global storage. This could be solved by ensuring that set_node_child_without_hash doesn't return anything, and we find a better way to structure the code to achieve the same thing (without having to clone a child).

Adding benchmarks

Once #17 is closed, we should write benchmarks for its functions and various component parts so we have some sort of profile on what, if anything, needs to be optimised further and how this compares to the Java implementation in the paper.

AKD client verification crate with limited dependencies

Due to the high likelihood that client's will verify the proofs with limited resources, it would be helpful to have a separate crate with as few dependencies as possible for verification of the proofs provided from the AKD. This would then simply be a dependency of the main AKD crate or could be maintained independently.

Differentiate between private and non-private mode

Some applications may be amenable to a lower privacy level, such as certificate transparency. For these examples, we do not need a VRF and can thus be more efficient. We should allow both private and non-private modes for this crate. Perhaps using features.

Creating commits for values

At the moment the values being committed in the tree for prototyping are all dummies. We would like to replace the value_to_bytes function in directory.rs to return actual commitments instead. This involves 3 main steps:

  • Changing the function value_to_bytes to return a byte array derived from a digest instead.
  • To make the commitment hiding, generating and storing a nonce for each committed value.
  • Including the nonce in the proofs sent to the client, and having the client verify these nonces.

Creating and storing persistent VRF public and secret keys.

Now that we have a PR implementing the VRFs, we should find a way to securely generate, store and access the VRF secret key and to publish a VRF public key.

The vrf crate could probably support key generation but still it needs to be stored somewhere.

Adding VRF-based computation of labels

We want to use a VRF to compute labels in seemless_directory.rs and then verify these VRFs in lookup_verify, key_history_verify etc.

See: https://crates.io/crates/vrf for one option. Below is a starting checklist of things to consider:

  • We will need to convert the output of the VRF into NodeLabel type. We may want to change the NodeLabel type as a result, or implement a trait instead.
  • VRF keys need to be stored in some sort of config file, so that the server can generate VRF values.
  • In the Java implementation of SEEMless, we used BouncyCastle, and at the time, we had to hash to the curve from strings/bytearrays. We need to hash to the correct space in this case as well.
  • If we hash to a curve and later hash the output value, we should be careful about the security guarantees offered by the various parameter sizes. Would be helpful to add documentation for this.

AKD Quorum distributed change auditor

Overview

To prevent a split-view attack the root hash of every epoch needs to have a signature signed by a private key which cannot be leaked from the quorum (via shared-secret methodology). This quorum participates to independently validate the append-only proof of the key directory and each participant provides their partial shard of the quorum signing key and when enough participants agree, the changes are signed off on and stored in persistent storage.

That way a proof only needs to give the root hash, and the signature on it to ascertain the quorum has agreed on the changes, and the AKD (or any other 3rd party) cannot generate its own signatures.

Eventually these auditors can be participants from external entities who can participate in the quorum vote.

Requirements

  1. The Quorum Key (QK) is long-lived and securely managed such that individual (or a few) participants cannot generate their own signatures and invalidate the trust of the system
  2. The quorum participants can evolve over time (i.e. removal in the face of failures, and additions for future collaborative growth or upgrades).
  3. Communication channels are secure, such that 3rd parties cannot sniff the key shares over the wire and compromise the QK.
  4. The public key of the QK is publicly distributed such that it's easy to client-side validate the validity of the epochs without additional I/O operations to a potentially malicious acting key directory.

Initial design overview

A Quorum Key is generated at the beginning of the life of the quorum, and the private key is broken into "shards" via Shamir secret sharing.
These shards are transmitted to the quorum participants who hold them in secure storage. The shards are generated with the following properties

  1. There are 3 f + 1 nodes in the quorum
  2. 2 f + 1 nodes need to agree to reconstruct the signing key (quorum key)
  3. The public key is given publicly to every client which will need to verify the signature

Communcation with the quorum

The collection of nodes in this quorum receive commands from an external 3rd party (key directory epoch notification or admin interface for example).
The messages they can receive are the following

  1. New epoch publication - The AKD has published a new epoch and the quorum should validate the changes and generate a signature
  2. Quorum member enrollment - The quorum should attempt to add the new member to the set. A series of independent tests will be
    performed against the new member and if they pass, a new shard-set will be generated and the node enrolled
  3. Quorum member removal - The quorum has a member which should be removed. If 2 f + 1 nodes agree, the Quorum Key is reconstructed and new shards are generated and
    distributed to the remaining membership.

For any of these messages, whichever node receives the request is denoted as the leader. We do not need the full RAFT
protocol, since we have no need for a persistent leader and the nodes are effectively stateless in between operations.
The temporary request leader is responsible for communicating with the other quorum members and gathering
their votes, reconstructing the shards, and either signing the commitment or enrolling the new member (re-generating shards and transmitting them).

Inter-node messages:

  1. Vote - args: (proof and hash information)
    a. Reply: Success(shard) or Failure (errors)
  2. Generate enrollment test - args: (member information, public key, etc)
    a. Reply: A "test" of enrollment for a new quorum member, with the result temporarily stored in the local machine
  3. Process enrollment test result - args: (member reply to specific proof)
    a. Reply: If test successful, return the shard to the "leader". If not, return failure and don't disclose the shard
  4. Remove member - args: (node_id)
    a. Reply: Agree(shard) or Disagree (error)
  5. Update stored shard - args: (new_shard, commitment_over_shard_signed_by_quorum_key)
    a. We need to provide a commitment, so the membership doesn't replace shards with invalid information which is potentially a different quorum key and
    corrupting the quorum

Retrieval of the latest commitment

Since the public key is available to anyone who cares, external parties can read directly from the storage layer to minimize I/O to the quorum, as they
will likely be resource constrained with validating the commitments between epochs and processing membership changes.

NOTE: If the storage layer is mutated directly, then the quorum will start failing commitments and signatures will not match so the system fault will be detectable.

Batching reads and writes

We'd like to batch reads and writes so that there is a balance between the amount of required serialization and the amount of cache used to store deserialized structs in memory.

Cleanup errors in the library

There's a lot of error codes in the errors.rs library which are incorrectly utilized or not utilized at all. These codes should be cleaned up to only what's necessary and not duplicated in multiple structures.

Optimize for size-bounded cache

Currently we have a timed cache with unbounded space. Lifetime of the items in the cache is 30 seconds.

With the growing tree size and increasing number of requested operations (e.g., lookups), the size of the cache may become a limiting factor for how many operations we can handle. In that case, we will need to manage the cache more effectively.

To this end, some initial ideas are:

  • Varying expiration times for nodes depending on their use frequency.
  • Prioritize some (e.g., most used?) nodes during batch_get operations.

cc @afterdusk

Upgrade winter_* to 0.2.0

winter_crypto and associated libs (_math, _utils) seem to have released a new version, with better handling of bytes and big namespace cleanups. It'd be worthwhile to update to the latest version for these improvements

Bulk lookup proofs

In production, under high-load we might need the ability to perform "bulk" lookup proofs. This means that we should do the same breadth-first-search for a collection of users as is done under the Directory::publish operation (as is done in append_only_zks.rs:191).

This means, when doing a lookup we should calculate all the prefixes and pre-load (via BFS walk of batch-retrievals) all the nodes along the path into the cache. Then we can generate the proofs without doing many individual retrievals.

[long-running] Changing the storage contract requires 2x publishes to crates.io

The problem

Due to the change in the storage contract, there are delays in the time between publishing akd to crates.io and the new version being available to cargo to build akd_mysql.

This happens because when breaking the contract between akd and akd_mysql, akd_mysql will rely on the latest version of akd which doesn't exist in crates.io until it's published. For example, if both crates are at v1.0.0, and we change the contract, and publish v2.0.0, akd_mysql depends on akdv2.0.0 which was _just_published. There appears to be a race in crates.io that the package isn't instantly available after publish, so the cargo publish --dry-run balks saying akdv2.0.0 can't be found but since we don't have a multi-publish, akdv2.0.0 is published and cannot be reverted. Therefore we're in a "broken" state where half of the crates are published.

Current workaround

The current workaround is we need to publish akd with the update, akd_mysql will fail, and we version bump both and publish again. The 2nd time crates.io has updated and the index is updated, akd_mysql can build and publish.

See PRs tagged with publish_race to identify instances where this is needed.

Reason to fix

This is ugly and just clutters crates.io with partial versions of akd with sometimes matching akd_mysql crates.

Improve debugging by adding error contexts

We have errors that only include the error string in the error. When these errors surface up, they don't have any context on where and why they occurred (e.g., a MySQL error). We should improve these by adding

  • More context to the errors, e.g., GetData error in batch_get should mention that the operation was batch get.
  • Stack trace info. --> Since the stack trace leading to the error is cleared up when the error is caught, we lose this info.
  • Line numbers and file names when the error occurred.

Proof structs should have some properties merged into tuples

In MembershipProof, the sibling hashes and sibling labels are a 1-1 mapping and therefore should be a Vec<(NodeLabel, Digest)> rather than 2 separate properties.

Same thing in a NonMembershipProof with longest_prefix_children*, they can be merged into a single property of a tuple.

This syntax is warranted as it's already used in AppendOnlyProof with the inserted and unchanged nodes.

Remove calls to `unwrap()`

We need to strip the repo of any panicking calls, such as unwrap. This may also include changing return types for various messages.

Option to initialize AKD without write ability

Currently, when AKD is initialized, it will fetch the Azks record in storage. If it is unable to find the record, it will attempt to write the root node (and its epoch 0 state) along with the Azks record.

In a distributed deployment with dedicated readers and writers, we do not want the readers to perform this write on startup. Instead, they should fail on being unable to retrieve the Azks record.

Serialized storage

We would like for the Storage to always get string values and get string values. The specific implementation of SEEMless could then serialize and deserialize accordingly.

  • Create an enum that encompasses all the types of data a SEEMless directory may ever need to store.
  • Implement serialization and deserialization for said enum. This would also include writing a wrapper around the function which gets keys, so that the location in the statemap is deterministic based on the type of object being written.

Reduce the number of things that are "pub"

Right now, there are way too many functions / structs that are pub, which I think do not need to be. We should reduce this as much as possible, preferring pub(crate).

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.