
IPA

A collaborative effort to create a prototype of the helper (or server) components of the Interoperable Private Attribution (IPA) proposal.

IPA enables attribution, providing information about how advertising campaigns are delivering value to advertisers, while giving users strong privacy assurances. IPA uses multi-party computation (MPC) to achieve this goal. IPA relies on three helper nodes (servers) that are trusted to faithfully execute the protocol without conspiring with other helper nodes to violate user privacy.

This Project

This project is intended to be a functional, performant, and comprehensible implementation of the core IPA protocol. This should allow for validation of the implementation and should enable performance measurement.

The eventual goal is to provide the core of an implementation that could be deployed and used. This will require additional infrastructure in order to meet the privacy and security goals of the project.

This is very much a work in progress; input is welcome. However, see our contribution guidelines for some important notices regarding how to participate in the project.

Getting Started

Installation

First, clone this repo. If you have the GitHub CLI installed:

gh repo clone private-attribution/ipa && cd ipa

or just with Git:

git clone https://github.com/private-attribution/ipa && cd ipa

Check to make sure you have a recent version of Rust with

rustc -V

If you do not have Rust installed, see the rust-lang instructions.

Building IPA

To build the project, run:

cargo build

The first time, it will download the necessary packages (crates) and compile the project.

If you're just running tests/benchmarks, it will build automatically and you can skip this step.

Running tests

To run the test suite, run

cargo test

Running Benchmarks

There are a handful of benchmarks that can be run, but oneshot_ipa will run the whole protocol locally. On an M1 MacBook Pro, this takes a couple of minutes.

cargo bench --bench oneshot_ipa --no-default-features --features="enable-benches compact-gate"

Other benchmarks you can run:

Sorting:

cargo bench --bench oneshot_sort --no-default-features --features="enable-benches compact-gate"

Arithmetic gates:

cargo bench --bench oneshot_arithmetic --no-default-features --features="enable-benches compact-gate" -- --width 1 --depth 1

You can adjust the width and depth of the gates at the expense of a longer benchmarking run.

Other:

cargo bench --bench criterion_arithmetic --no-default-features --features="enable-benches compact-gate"
cargo bench --bench iai_arithmetic --no-default-features --features="enable-benches compact-gate"

(NOTE: The iai_arithmetic benchmark only works on Linux. If you are on macOS, an error is expected.)

Issues

Use PGO for benchmarking

If we're really serious about performance, we could use PGO. It's a tiny bit of a pain to set up, but it can improve performance quite a bit.
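
For reference, the standard rustc PGO workflow applied to the oneshot_ipa benchmark would look roughly like the commands below (illustrative paths; llvm-profdata ships with an LLVM toolchain):

RUSTFLAGS="-Cprofile-generate=/tmp/pgo-data" cargo bench --bench oneshot_ipa --no-default-features --features="enable-benches compact-gate"
llvm-profdata merge -o /tmp/pgo-data/merged.profdata /tmp/pgo-data
RUSTFLAGS="-Cprofile-use=/tmp/pgo-data/merged.profdata" cargo bench --bench oneshot_ipa --no-default-features --features="enable-benches compact-gate"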

Upgrading IPA infrastructure to support malicious and semi-honest protocols

This issue exists to solicit some early feedback on upgrading the infrastructure to seamlessly support malicious and semi-honest protocols. The current approach is that protocols are generic over secret share types and behave the same regardless of whether we use regular replicated secret sharing or the malicious one. That model has a lot of benefits: the protocol code is the same, so it only needs to be written once. Verification happens behind the scenes and protocols don't know about it.

I've been working on this for some time, and here is the model I propose we use. I am not 100% confident that I will be able to implement this (the change is WIP), but I would love some feedback, especially if someone thinks this is not going to work anyway: @benjaminsavage, @martinthomson

/// New trait that all secret sharing models must implement. So far it seems the only
/// things we can do with them are addition and multiplication.
trait SecretShare<F: Field>: Add<Output = Self> + Mul<Output = Self> + Sized {}

/// Protocol context is now generic over share type and field
struct ProtocolContext<S: SecretShare<F>, F: Field> {
    s: PhantomData<S>,
    f: PhantomData<F>,
    // other fields like prss go here
}

/// Both semi-honest and malicious replicated sharings implement the secret share trait.
impl <F: Field> SecretShare<F> for Replicated<F> {}
impl <F: Field> SecretShare<F> for MaliciousReplicated<F> {}

// Generic multiplication now looks different. It requires providing everything at
// once, i.e. record id, a, b.
#[async_trait::async_trait]
trait Multiply<F: Field> {
    type Share: SecretShare<F>;

    async fn multiply(self, record_id: RecordId, a: Self::Share, b: Self::Share) -> Result<Self::Share, BoxError>;
}


/// For protocols that use replicated secret sharing we use semi-honest multiplication
#[async_trait::async_trait]
impl <F: Field> Multiply<F> for ProtocolContext<Replicated<F>, F> {
    type Share = Replicated<F>;

    async fn multiply(self, record_id: RecordId, a: Self::Share, b: Self::Share) -> Result<Self::Share, BoxError> {
        todo!()
    }
}

/// For protocols that use malicious secret sharing we use malicious multiplication
#[async_trait::async_trait]
impl <F: Field> Multiply<F> for ProtocolContext<MaliciousReplicated<F>, F> {
    type Share = MaliciousReplicated<F>;

    async fn multiply(self, record_id: RecordId, a: Self::Share, b: Self::Share) -> Result<Self::Share, BoxError> {
        todo!()
    }
}

The core idea is to make ProtocolContext generic over the secret sharing type. That requires a new trait that can be implemented by any of the secret sharing models we use now or may use in the future. Looking at Shamir and Replicated, there is not much that each sharing type can do: addition and multiplication. Protocols don't seem to care about this at all, as they only pass the shares down to the arithmetic circuit, which will be specialized.

Then, there is a new trait Multiply that is generic over a field, but for which secret sharing is an associated type. That allows implementing generic semi-honest multiplication over any field, but only for replicated secret shares. Similarly, malicious multiplication has malicious secret shares as its input, and that will be enforced at compile time.

The biggest question I don't have a good answer for is whether we need to pretend that MaliciousReplicated<F> is Replicated<F>. Looking at the Reveal protocol, having this assumption may help, because we only need to reveal the x part of the maliciously secure share, leaving r*x hidden. But adding this backward compatibility may relax compiler checks and make it possible to execute semi-honest multiplication using MaliciousReplicated sharing, which is not great.

Right now I am in the middle of this big refactoring where everything appears to be broken, so I will post the outcome here to let everyone know whether this model is realistic. But it would be great if we could start a conversation about it earlier.

Make SecurityValidator work with malicious secret sharing scheme

Currently SecurityValidator accepts a semi-honest context. That does not make much sense, because validation can be performed on MaliciousReplicated<F> only, so the input should rather be ProtocolContext<'_, MaliciousReplicated<F>, F> instead.

The reason for this behavior is that the Reveal and CheckZero protocols are not generic over any secret sharing scheme, but they should be. Once that is done, SecurityValidator should be changed to accept a malicious context.

  • Change Reveal to be generic over any secret sharing scheme
  • Change CheckZero to be generic over any secret sharing scheme
  • Change SecurityValidator to accept malicious context

Invalid Instruction when running MPC locally

Hey, I would like to create a few tests using the research-prototype, but I get the following error when trying to run MPC locally. Do you know how to solve it?

Running /home/ubuntu/MP-SPDZ/Scripts/../replicated-ring-party.x 0 -R 32 ipae2e -pn 19933 -h localhost
Running /home/ubuntu/MP-SPDZ/Scripts/../replicated-ring-party.x 1 -R 32 ipae2e -pn 19933 -h localhost
Running /home/ubuntu/MP-SPDZ/Scripts/../replicated-ring-party.x 2 -R 32 ipae2e -pn 19933 -h localhost
Using security parameter 40
Trying to run 32-bit computation
terminate called after throwing an instance of 'Invalid_Instruction'
what(): Processor-Error : Invalid instruction 0x178 at 5/0x64

The complications of dynamic role allocation

We assume that shares are allocated in a ring as follows:

  • H1 gets s1, s2
  • H2 gets s2, s3
  • H3 gets s3, s1

However, clients and record collectors will identify helpers by their identity (for instance: A, B, C) and roles will be allocated by the leader.

If the leader allocates (A, B, C) -> (H1, H2, H3), (B, C, A), or (C, A, B), then the order of allocations will ensure that the shares are properly assigned. These allocations preserve the order of helpers, so that A is to the left of B, which is to the left of C, which is to the left of A.

However, the other three possible allocations: (A, C, B), (B, A, C), and (C, B, A) reverse the ring order. This works, but only if each helper knows that this is the case, so that it swaps the left and right values.

I think that we need something in the helper party network configuration that sets a canonical ring order; otherwise we'll potentially end up with a swapped order. A failure here would either be disastrous for all of our security guarantees or would cause an immediate failure. I'm not sure which: all of our algorithms make assumptions about how shares are replicated. A sketch of the check is below.
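
To illustrate, a canonical-order check could be as simple as the following sketch (the types are hypothetical, not the repo's API):

#[derive(Clone, Copy, Debug)]
enum Identity {
    A,
    B,
    C,
}

/// `assignment[i]` is the identity that was given role H(i+1).
fn preserves_ring_order(assignment: [Identity; 3]) -> bool {
    use Identity::*;
    // Rotations of (A, B, C) preserve the ring; the other three
    // permutations are reflections and reverse it.
    matches!(assignment, [A, B, C] | [B, C, A] | [C, A, B])
}

fn main() {
    assert!(preserves_ring_order([Identity::B, Identity::C, Identity::A]));
    assert!(!preserves_ring_order([Identity::A, Identity::C, Identity::B])); // reversed ring
}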

Thoughts on making Step and ProtocolContext cheaper

We don't really know whether these structures are going to impose significant overheads just yet, but we have identified reasons for both needing some attention.

Warning: lots of rust code that I just typed into a text editor. No guarantees that any of this works.

ProtocolContext

I'm advocating for ProtocolContext being cheap so that we can make copies for every step and every record. This should improve API ergonomics. The structure (as of writing) is:

pub struct ProtocolContext<'a, N> {
    role: Identity,
    step: UniqueStepId, // or Step
    prss: &'a PrssEndpoint,
    gateway: &'a Gateway<N>,
}

That's a very small enum for role (1 bit possible, 8 bits likely), an owned string (yikes), and two references (64 bits each, maybe, though this might be compressed on some platforms).

The primary offender here is step, which we'll get back to, but it seems like there is an obvious fix here. Moving the shared state into another object could make this nicer. One way to do that is to do something like this:

trait HelperState {  // name TBD
  type Network: crate::protocol::Network;
  fn role(&self) -> Identity;
  fn gateway(&self) -> &Gateway<Self::Network>;
  fn prss(&self) -> &PrssEndpoint;
}

pub struct ProtocolContext<'a, H> {
  global: &'a H,
  step: UniqueStepId, // or Step
}

impl<H: HelperState> ProtocolContext<'_, H> {
  // This can access self.global.prss() rather than self.prss
}

This style of indirection might be necessary to allow the in-memory networking to refer to a single global state and a real networking setup to refer to a single helper-wide object. Either way, the overall cost is then dominated by the step identifier. The interface is no more complex. The only cost is following a pointer when we want something.

Step

In debugging and testing, we want something that is descriptive of the process. Our current approach describes steps with strings and then composes those together into something like a path.

We can use that process happily in debugging, but step identifiers need to be passed around a LOT and that might hurt performance when we need to shift a lot of data. What we need is a process that produces an object that can be represented in memory as a simple integer.

One way to do that is to map the path-like structure we naturally produce onto an integer, assigning each node in the path (including "directories") an integer value.

protocol => 0
protocol/a => 1
protocol/a/x => 2
protocol/a/y => 3
protocol/b => 4
protocol/c => 5
protocol/c/x => 6

Then we have a way to represent any step in that specific process with a simple and small number [1]. The obvious way to do this is to collect all of the steps that are used (including intermediate nodes), printing them out, (maybe) sorting them, and assigning each a number in sequence.

We can't do the number thing until we have a complete understanding of all of the steps involved in a protocol execution. For that, we can simply assemble a complete protocol, run it using generic steps, and collect the unique steps that are created during execution. From that, we can develop a complete set of states and a mapping to a number. This might be turned into a distinct type using either code generation (in build.rs say), a macro, or some combination of the two [2].

From this we can produce a wrapper around an integer type that encodes this information efficiently based on calls to the Step::narrow() function. Recall that each use of narrow() takes an arbitrary enum type as input. We can then define a type that captures this information.

Taking the example above, there are two types involved in narrowing, which we might say are enums SubstepABCD and SubstepXY.

First, we establish a general interface that allows the new type we are defining to accept sub steps:

trait StepNarrow<SS> {
  fn narrow(self, sub_step: SS) -> Self;
}

Then, using whatever code generation method we decide, we define a new type of step.

trait Step: AsRef<str> {}
struct ExampleStep(u8);
impl Step for ExampleStep {}

Then we define the state transitions that are valid as follows:

impl StepNarrow<SubstepXY> for ExampleStep {
  fn narrow(self, ss: SubstepXY) -> Self {
    Self(match (self.0, &ss) {
      (1, SubstepXY::X) => 2,
      (1, SubstepXY::Y) => 3,
      (5, SubstepXY::X) => 6,
      _ => panic!("cannot narrow with {} from state {}", ss.as_ref(), self.as_ref()),
    })
  }
}

impl StepNarrow<SubstepABCD> for ExampleStep {
  fn narrow(self, ss: SubstepABCD) -> Self {
    assert_eq!(self.0, 0, "cannot narrow with {} from state {}", ss.as_ref(), self.as_ref());
    Self(match ss {
      SubstepABCD::A => 1,
      SubstepABCD::B => 4,
      SubstepABCD::C => 5,
    }) // this form might be more common, but it might not be worth implementing this way
       // because a single form is probably easier; I only did this because I'm hand-rolling code.
  }
}

For debugging purposes we would still implement AsRef<str> for the new type (plus ensuring that it is implemented for each sub-step). For the step type, this can use a macro that just maps back to the debug-friendly strings that we use.

This requires that we make the entire protocol context generic on the protocol again, which is a bit annoying, but it should only manifest in the type definition for the protocol context. For this, we define a Step like so:

pub trait Step: AsRef<str> + Default {}
pub trait Substep: AsRef<str> {}

These replace the struct we currently have. Then we can define a blanket implementation of StepNarrow for a generic step type.

#[derive(Default)] // or implement it, whatever suits best
pub struct GenericStep {
  s: String,
  // + all the debug stuff
}
impl<'a, SS: Substep + ?Sized> StepNarrow<&'a SS> for GenericStep {
  fn narrow(self, substep: &'a SS) -> Self {
    Self { s: format!("{}/{}", self.s, substep.as_ref()), /* etc... */ }
  }
}

We can also retain the implementation of Substep for str and String to make testing simpler, though this will necessarily only work for GenericStep.

When we instantiate a version of the protocol in tests we can do something clean like this:

  let world = make_world();
  let (c0, c1, c2) = make_contexts(&world);
  // cN has a type of `ProtocolContext<'a, Arc<InMemoryEndpoint>, GenericStep>`,
  // or something along those lines.  The type won't need to be exposed in the interface.
  // This works because the test code always uses `GenericStep`

And for a real helper where we care about performance, we can instantiate a ProtocolContext using ExampleStep or the FullIpaStep or whatever we call it [3]. Our performance tests can define a full step mapping so that they aren't burdened with strings.

Then the protocol a) won't compile if narrow() is passed the wrong types, b) will crash if we pass the wrong values in the wrong places, and c) will not have much overhead.

Footnotes

  1. Note that we could also define an enum for this, but that might be complicated as you effectively need nodes to be Option<T> so that you can represent the parent node in the tree. Also, it's not clear that Rust will happily collapse the representation of that enum without extensive coaxing. That might be something we can explore later. That said, defining a flat enum is a distinct possibility.

  2. I tend to think that a macro is better because we can tune the list to remove nodes that are never used. Rust macros let you define a very clean interface in whatever format you prefer. The compiler provides some amount of type checking here as well, which is nice.

  3. We'll probably need a turbofish for that as we don't need to pass the starting value for the step if we require Default.

Shuttle and unit tests should use common code when possible

Currently there is duplicate code used in shuttle and unit tests when we test higher-level subprotocols/protocols. Some examples include ipa and sort.
Now that we are moving towards performance improvements, it would make development faster if we could cross-test and improve benchmarking results while ensuring correctness, ideally using the same tests.
This issue is to ensure we use the same code whenever possible between the two scenarios.

Custom type to represent sizes

Since Rust's usize has spooky platform-dependent behavior, we want to avoid using it to represent sizes, such as content-length, offset, or any other usize-like data. As such, it would be good to have an alias/type wrapper for a u32. That would give us a way to mark which data represents sizes, while giving us the option to later upgrade to a u64 if we start approaching the limits of a u32.

It probably doesn't need to be complicated; something as simple as

struct Size(u32)

with the appropriate traits implemented (like Deref) and easy conversion to a usize when the standard library expects that (*Size(1) as usize works) will probably be sufficient.
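
A minimal sketch of that wrapper (assuming only Deref and a usize conversion are needed; names are illustrative):

use std::ops::Deref;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Size(u32);

impl Deref for Size {
    type Target = u32;

    fn deref(&self) -> &u32 {
        &self.0
    }
}

impl From<Size> for usize {
    fn from(size: Size) -> usize {
        size.0 as usize
    }
}

fn main() {
    let len = Size(1);
    assert_eq!(*len as usize, 1); // `*Size(1) as usize` works, as described above
    let buf = vec![0u8; usize::from(len)];
    assert_eq!(buf.len(), 1);
}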

PRSS validation in debug runs

Currently we have a Participant interface that provides the entry point into shared randomness generation. Clients can request a dedicated space of randomness (basically a PRF with a fixed key) from Participant by using an index value. There is currently no enforcement for this index to be unique.

It is easy to make a mistake right now and incorrectly implement SpaceIndex for a given step, creating security violations in IPA code.

This issue is to discuss what can be done to avert that. Can we create a proc macro that implements SpaceIndex for any enum? Is it possible to use Bloom filters to validate values generated by all PRSS instances in end-to-end tests and alert when the program generates the same value for the same index? A simpler debug-only check is sketched below.
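
As one possibility, a debug-only guard (illustrative, not the repo's API) could record every index handed out and panic on reuse:

use std::collections::HashSet;
use std::sync::Mutex;

/// Records every PRSS index that has been handed out; debug builds panic on reuse.
#[derive(Default)]
struct IndexGuard {
    used: Mutex<HashSet<u128>>,
}

impl IndexGuard {
    fn check(&self, index: u128) {
        // debug_assert! compiles away in release builds, so this costs nothing there.
        debug_assert!(
            self.used.lock().unwrap().insert(index),
            "PRSS index {index} was used more than once"
        );
    }
}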

Use `<[_; N]>::each_ref()` when available

There are a few places where we construct a fixed-size array of objects, then run it through map() to produce another fixed-size array. For example:

        let bit_ctx: [_; 3] = top_ctx
            .iter()
            .map(|ctx| ctx.narrow(&format!("bit{}", bit)))
            .collect::<Vec<_>>()
            .try_into()
            .unwrap();

This is super awkward, as you have to iterate the slice, map, collect into a Vec<_>, then use try_into() to maybe create the array, then unwrap that. This would be cleaner once each_ref() becomes available in mainline Rust:

        let bit_ctx = top_ctx.each_ref().map(|ctx| ctx.narrow(&format!("bit{}", bit)));

Batching for PRSS

On some machines, it is far more efficient to invoke AES on a block of inputs. This could make using the PRSS functions more efficient.

However, we currently don't allow the PRSS references to be mutable. This simplifies access (with Arc for instance) and usage, but that means that the object can't easily maintain a cache. If we are going to have concurrent access to the PRSS instance across threads, then there might not be any gains to be had from caching either as the overhead of managing that concurrent access might outweigh any advantages. This might need some investigation.

Channel lifetime management

Currently, we have the ability to open a communication channel per step between two helper parties, but we lack a mechanism to close it. Once a channel is open, it stays open for the duration of the query on both the sending and receiving sides, consuming resources and CPU time if we poll them.

This problem is not trivial to solve, because the decision to close a channel should be made on the protocol side (only that code knows when the channel is no longer in use), but the decision must be propagated through all the layers on both the sending and receiving sides, so that both can actually destroy all the resources they hold to keep the channel open.

It may be the same solution as the one for PRSS (to avoid re-using the same index more than once), or we could invent another mechanism. Drop guards sound really nice if we can make them work: channels would be closed automatically after protocols terminate. But that is likely not the only possible solution here.

Explore whether cloning context can be made more efficient

Our software design for IPA requires protocol contexts to be cheaply cloneable because we clone them a lot.

pub struct SemiHonestContext<'a, F: Field> {
    inner: Arc<ContextInner<'a>>,
    step: Step,
}

Making step cloning inexpensive is tracked in #139; this issue focuses on improvements that can be made on protocol contexts, specifically on the ContextInner clone. It looks innocent (an atomic load/store on each clone), but when done often it may lead to false-sharing issues in CPU caches. There may be a better way to do this if we look at the structure of the ContextInner struct:

pub(super) struct ContextInner<'a> {
    pub role: Role,
    pub prss: &'a PrssEndpoint,
    pub gateway: &'a Gateway,
}

There is a role bit and two references that never change no matter how many times we clone the context. There might be an opportunity to cache those values (not references) in one place. The root-level future that is created to run the protocol may be a good place to do that, and the Tokio crate provides some support for it: https://docs.rs/tokio/latest/tokio/task/struct.LocalKey.html.

10000 foot view

This is how query processing might look:

  • Helpers receive a request to process a new query. They dynamically allocate roles among themselves and set up PRSS.
  • Each helper stores its role, PRSS endpoint, and Gateway inside a task-local data cell (Tokio LocalKey)
  • Query processing is launched by spawning a Tokio task (or any other runtime task). In the case of Tokio it would look like
   tokio::spawn(async {
        CONTEXT_INNER.scope((prss, gateway, role), async move {
              let context = SemiHonestContext::new("step");  // no need to pass PRSS or Gateway anymore
         }).await;
   });

   impl SemiHonestContext {
          pub fn prss(&self) -> &PrssEndpoint {
              CONTEXT_INNER.with(|inner| inner.0) // not sure if that works fine
          }
   }

This design does not require changing protocol code; everything is contained within the Infra layer.

Complications

  • Our plan is to run the IPA software on Cloudflare Workers, which do not support Tokio: https://github.com/cloudflare/workers-rs. The LocalKey implementation, however, does not depend on the Tokio runtime, so there may be no issues
  • This change will require changes in our test infrastructure. Every test needs to set up per-task data before running any computation. TestWorld::contexts() usage becomes problematic, but luckily most of our tests use the Runner interface, which can easily hide that complexity inside.

Enum for event structure

Don't you want to use an enum for this? That is, something like:

pub enum Event {
  Source {
    event: EventBase,
    breakdown_key: Number,
  },
  Trigger {
    event: EventBase,
    value: Number,
  },
}

I realize that you want the binary encoding of source and trigger events to put the breakdown_key and value fields in different positions, but for that you will want to implement Serialize and Deserialize manually anyway.

Originally posted by @martinthomson in https://github.com/martinthomson/raw-ipa/pull/58#discussion_r935112482

[Bug] wrong permutation indices when task scheduling reordering occurs

Running the concurrency test on sort revealed that we have a reordering issue somewhere.

    shuttle::check_random(
        || {
            shuttle::future::block_on(execute_sort()).unwrap();
        },
        5,
    );

fails with

test panicked in task 'main-thread'
index out of bounds: the len is 50 but the index is 1942298774
thread 'tests::randomized::sort' panicked at 'index out of bounds: the len is 50 but the index is 1942298774', src/protocol/sort/apply.rs:23:24
stack backtrace:
   0: rust_begin_unwind
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/panicking.rs:584:5
   1: core::panicking::panic_fmt
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/panicking.rs:142:14
   2: core::panicking::panic_bounds_check
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/panicking.rs:84:5
   3: core::slice::<impl [T]>::swap
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/slice/mod.rs:619:36
   4: raw_ipa::protocol::sort::apply::apply
             at ./src/protocol/sort/apply.rs:23:17
   5: raw_ipa::protocol::sort::compose::compose::{{closure}}
             at ./src/protocol/sort/compose.rs:50:5
   6: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/future/mod.rs:91:19
   7: raw_ipa::protocol::sort::generate_sort_permutation::generate_sort_permutation::{{closure}}
             at ./src/protocol/sort/generate_sort_permutation.rs:69:9
   8: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/future/mod.rs:91:19
   9: <F as futures_core::future::TryFuture>::try_poll
             at /Users/koshelev/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-core-0.3.25/src/future.rs:82:9
  10: <futures_util::future::try_future::into_future::IntoFuture<Fut> as core::future::future::Future>::poll
             at /Users/koshelev/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-util-0.3.25/src/future/try_future/into_future.rs:34:9
  11: <F as futures_core::future::TryFuture>::try_poll
             at /Users/koshelev/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-core-0.3.25/src/future.rs:82:9
  12: <futures_util::future::try_maybe_done::TryMaybeDone<Fut> as core::future::future::Future>::poll
             at /Users/koshelev/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-util-0.3.25/src/future/try_maybe_done.rs:79:57
  13: <F as futures_core::future::TryFuture>::try_poll
             at /Users/koshelev/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-core-0.3.25/src/future.rs:82:9
  14: <futures_util::future::try_join_all::TryJoinAll<F> as core::future::future::Future>::poll
             at /Users/koshelev/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-util-0.3.25/src/future/try_join_all.rs:148:27
  15: raw_ipa::test_fixture::sort::execute_sort::{{closure}}
             at ./src/test_fixture/sort.rs:53:9
  16: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/future/mod.rs:91:19
  17: shuttle::future::block_on
             at /Users/koshelev/.cargo/registry/src/github.com-1ecc6299db9ec823/shuttle-0.4.1/src/future/mod.rs:148:15
  18: raw_ipa::tests::randomized::sort::{{closure}}
             at ./src/tests/randomized.rs:10:13

The permutation slice contains some weird values:

permutation = [
    29,
    1,
    7,
    37,
    434506957,
    16,
    14,
    44,
    776628359,
    41,
    13,
    31,
    8,
    23,
    42,
    35,
    27,
    40,
    5,
    12,
    36,
    19,
    25,
    49,
    17,
    9,
    33,
    39,
    15,
    1942298774,
    34,
    38,
    4,
    21,
    43,
    0,
    28,
    2,
    20,
    32,
    3,
    1141533336,
    48,
    26,
    6,
    45,
    11,
    22,
    30,
    10,
]

Centrally manage conversions between IPA types and network-layer types

We have various pieces of information that need to be conveyed across a network boundary, such as QueryId, UniqueStepId, RecordId, and MessageChunks, and each needs a conversion to/from its Rust representation and its HTTP representation. As of now, we are relying on the Axum-style conversions, which necessitate being paired with the type, wherever it lives.

Since we are moving away from Axum, we should take control of those conversions and, while we're at it, centralize all of them in one place (likely within the net module) so that they are easier to manage. This will likely look like a conversions mod that has a discrete function to/from the HTTP representation for each type we care about.

Address in use errors

---- net::transport::e2e_tests::fails_if_not_subscribed stdout ----
thread 'net::transport::e2e_tests::fails_if_not_subscribed' panicked at 'Failed to serve: Os { code: 98, kind: AddrInUse, message: "Address already in use" }', src/net/server/mod.rs:77:26
thread 'net::transport::e2e_tests::fails_if_not_subscribed' panicked at 'Failed to bind server to a port', src/net/server/mod.rs:95:14

Expand definition of `Network` layer into `Transport` layer

We currently have the concept of a Network layer that handles the communication between helpers. It interacts directly with the Infra layer to move data for running protocols.

the Network trait currently looks like this:

/// Network interface for components that require communication.
pub trait Network: Sync {
    /// Type of the channel that is used to send/receive messages to/from other helpers
    type Sink: futures::Sink<MessageChunks, Error = Error> + Send + Unpin + 'static;
    type MessageStream: Stream<Item = MessageChunks> + Send + Unpin + 'static;

    /// Returns a sink that accepts data to be sent to other helper parties.
    fn sink(&self) -> Self::Sink;

    /// Returns a stream to receive messages that have arrived from other helpers. Note that
    /// some implementations may panic if this method is called more than once.
    fn recv_stream(&self) -> Self::MessageStream;
}

There's a way to "send" data to another helper via sink(), and "receive" data from the other helpers via recv_stream(). This works for an actively running query, but we don't have a way for helpers to communicate with each other about starting queries.

Here's how that workflow would look, in short:

  • Report Collector wants to start a new query on data that it has
  • Report Collector calls an externally available "Create Query" API to one of the helpers
  • That helper becomes the de facto "leader" of the request, generates a query id, and sends a "Prepare Query" message to the "follower" helpers; then, it tells the Report Collector what the new query id is
  • Report Collector sends all its data to the leader using a "Send Data" API
  • Leader disseminates that data to the 2 followers, then they all coordinate starting the protocol at the same time

So there is a mix of externally available APIs to initiate queries, as well as helper-to-helper communication that needs to happen before a query is even started. All of this communication should be abstracted such that it could be implemented for HTTP, InMemory, maybe protobuf, or anything else. For that, I want to propose a new trait, Transport:

pub trait Transport: Sync {
    type CommandStream: Stream<Item = TransportCommand> + Send + Unpin + 'static;
    type Sink: futures::Sink<TransportCommand, Error = TransportError> + Send + Unpin + 'static;

    /// To be called by an entity which will handle the events as indicated by the
    /// [`SubscriptionType`]. There should be only 1 subscriber per type.
    /// # Panics
    /// May panic if attempt to subscribe to the same [`SubscriptionType`] twice
    fn subscribe(&self, subscription_type: SubscriptionType) -> Self::CommandStream;

    /// To be called when an entity wants to send commands to the `Transport`.
    async fn send(&self, command: TransportCommand) -> Result<(), TransportError>;
}

The primary difference from the old Network trait is the idea of a TransportCommand. Instead of just MessageChunks, you can send a command, which is defined as an enum of all possible actions a Transport might want to take. That would include commands like CreateQuery, PrepareQuery, SendData, StartProtocol, etc.

It would also include NetworkEvent, which contains MessageChunks and would be the command used to cover the functionality the old Network provided; I talk more about this below.

The Transport trait simply exposes a subscribe function that allows you to receive a Stream of incoming commands, and a send function that allows you to send commands.

The last piece to discuss is the SubscriptionType (part of the subscribe method) and how the Transport interacts with the infra layer. Under the existing implementation, infra used the Network to grab a Sink and Stream to send/receive data with. In order to minimize the impact on and number of changes to infra, I'm also proposing a new Network struct that would take its place. It would still deal in MessageChunks exclusively, and would look as follows:

/// Given any implementation of [`Transport`], a `Network` is able to send and receive 
/// [`MessageChunks`] for a specific query id. The [`Transport`] will receive `NetworkEvents` 
/// containing the `MessageChunks`
pub struct Network<T> {
    transport: T,
    query_id: QueryId,
}

with functions:

impl<T: Transport> Network<T> {
    /// sends a [`NetworkEvent`] containing [`MessageChunks`] on the underlying [`Transport`]
    pub async fn send(&self, message_chunks: MessageChunks) -> Result<(), Error>;

    /// returns a [`Stream`] of [`MessageChunks`]s from the underlying [`Transport`]
    /// # Panics
    /// if called more than once during the execution of a query.
    pub fn recv_stream(&self) -> impl Stream<Item = MessageChunks>;
}

So the major differences are

  • This is a struct, not a trait
  • Instead of a Sink, infra would use the send function

Hopefully, this would act as an almost drop-in replacement of the existing Network for infra.

Finally, returning to SubscriptionType, this has 2 options:

/// Users of a [`Transport`] must subscribe to a specific type of command, and so must pass this
/// type as argument to the `subscribe` function
pub enum SubscriptionType {
    /// Commands for managing queries
    Administration,
    /// Commands intended for a running query
    Query(QueryId),
}

So the administration piece (we are working on this in parallel) would subscribe to Administration commands, and the Network struct would use the Query subscription type to only subscribe to commands for a particular running query.

Feedback/ideas/improvements would be appreciated

Consider adding context-aware error handling library

It is often useful to provide additional context when converting error types, which is impossible when using the standard From and TryFrom implementations. For example, it is useful to record the destination address when a send request fails. Often, the error types provided by the underlying libraries do not carry this information.

One way to handle this is to use map_err; however, code written that way tends to be repetitive and carries details that are not that important.

async fn send_request(dest: IpAddr) -> Result<(), MyError> {
    let destination_ip = "127.0.0.1:80".try_into().unwrap();
    send("hello", destination_ip)
        .await
        .map_err(|e| MyError::new(format!("Failed to send to {destination_ip}: {e}")))
}

More elegant code that uses a context-aware error handling library might look like this:

async fn send_request(dest: IpAddr) -> Result<(), MyError> {
    let destination_ip = "127.0.0.1:80".try_into().unwrap();
    send("hello", destination_ip).context(dest).await?;
    Ok(())
}

impl From<Context<IpAddr, SendError>> for MyError {
    fn from(source: Context<IpAddr, SendError>) -> Self {
        // `Context` (a sketch) pairs the added context (the address) with the underlying error
        MyError::new(format!("Failed to send to {}: {}", source.context, source.error))
    }
}

While the standard library does not provide an API for that (though it is on the roadmap), several different libraries implement this pattern; we can consider using one of them.

Fabric interface for message processing

Overview

Currently we have a messaging layer implemented only for tests; see mock. There seems to be no perceived difference in how to handle communication in test vs. prod. The only difference is that in tests we don't want to send messages over the wire (even through the loopback), meaning that there should exist two different fabrics that can be connected to a unified message processing interface. Tests will use an in-memory fabric that operates on pointers, while the network fabric will send and receive bytes from the socket as efficiently as possible.

Network fabric interface is being created in #79

This task is to track the unification of communication layers between test and prod use cases.

Work

  • Segregate in-memory fabric interface from the existing mock module
  • Integrate with #79 once the general shape of fabric interface is decided and deemed non-controversial.
  • Benchmark message communication layer under stress to identify hot spots and plan improvements.

Create a wrapper around `Vec` that supports indexing by `u32`

see the discussion here: https://github.com/martinthomson/raw-ipa/pull/108#discussion_r992621416

It would be great to have a type inside the IPA repo that defines an upper bound for IPA input size. We aim to handle 1B events, so a type alias (RecordIndex/RecordId?) for u32 is enough for now. It probably makes sense to have a vector type that allows indexing by this type without casting to usize and back; that should remove a lot of useless conversions from the protocol code.

tldr

   type RecordIndex = u32;
   struct RecordsVec<T> {
       inner: Vec<T>
   }

   impl<T> TryFrom<Vec<T>> for RecordsVec<T> { ... }
   impl<T> Index<u32> for RecordsVec<T> { ... }
   impl<T> AsRef<[T]> for RecordsVec<T> { ... }
   impl<T> AsMut<[T]> for RecordsVec<T> { ... }
   impl<T> Borrow<[T]> for RecordsVec<T> { ... }
   impl<T> BorrowMut<[T]> for RecordsVec<T> { ... }
   impl<T> IntoIterator for RecordsVec<T> { ... }
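
Filling in the indexing piece as a concrete, illustrative example of what one of those impls could look like:

use std::ops::Index;

type RecordIndex = u32;

struct RecordsVec<T> {
    inner: Vec<T>,
}

impl<T> Index<RecordIndex> for RecordsVec<T> {
    type Output = T;

    fn index(&self, index: RecordIndex) -> &T {
        // the single place where the u32 -> usize cast happens
        &self.inner[index as usize]
    }
}

fn main() {
    let v = RecordsVec { inner: vec![10, 20, 30] };
    assert_eq!(v[2u32], 30);
}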

Enable additional context on errors

Right now, we have a way to construct errors using the immediate context around them. But there are some cases where we need context from a different scope to fully express the error. One example is when creating a crate::helpers::error::Error::PollSendError. This error can occur if the receiving side of a sender is closed and poll_ready is called on the sender.

We use PollSenders to send data to other helpers, and ideally we'd annotate the error with the helper we attempted to send to. But we don't have that information in the context of poll_ready, and so we are left simply returning the message:

"An error occurred while sending data to unknown helper"

Obviously, this is less than ideal; the information about which helper we are sending to does exist, just not in this specific context. We should look into a structured way to add information to errors in different contexts.

I did a cursory look at some rust crates, and found:

  • error context which adds a .with_context function
  • snafu which is a fuller error solution that would theoretically replace thiserror, and has a mechanism for adding context as well

New step model rejects attempts to use the same step on multiple records

After we landed #122, our previous model of interaction between steps no longer works. We thought that using a finite number of steps to define the circuit and then executing this circuit for many records was optimal, but the debug assertions added in the mentioned PR make that no longer possible.

Demo program demonstrating the issue:

#[cfg(test)]
mod toy_demo {
    use crate::field::{Field, Fp31};
    use crate::helpers::fabric::Network;
    use crate::protocol::context::ProtocolContext;
    use crate::protocol::{QueryId, RecordId, Step};
    use crate::secret_sharing::Replicated;
    use futures::StreamExt;
    use futures_util::stream::FuturesOrdered;
    use rand::thread_rng;
    use crate::test_fixture::{make_contexts, make_world, share};


    /// Takes a collection as input and executes `MultiplyTwice` protocol on every input row.
    #[derive(Debug)]
    struct ToyProtocol<'a, F> {
        input: &'a [(Replicated<F>, Replicated<F>)]
    }

    /// This protocol runs per row and just multiplies a*b*a
    struct MultiplyTwice<F> {
        a: Replicated<F>,
        b: Replicated<F>,
        record_id: RecordId
    }

    enum ToyStep {
        Step1,
        Step2
    }

    impl <'a, F: Field> ToyProtocol<'a, F> {
        async fn execute<N: Network>(self, ctx: &ProtocolContext<'a, N>) -> Vec<Replicated<F>> {

            let mut futures = FuturesOrdered::new();

            // for every input row, execute `MultiplyTwice`
            for (record_id, r) in self.input.iter().enumerate() {
                let record_id = RecordId::from(record_id as u32);
                let sub_protocol = MultiplyTwice { a: r.0, b: r.1, record_id };
                futures.push_back(sub_protocol.execute(&ctx));
            }

            let res = futures.collect::<Vec<_>>().await;
            res
        }
    }

    impl <F: Field> MultiplyTwice<F> {
        async fn execute<N: Network>(self, ctx: &ProtocolContext<'_, N>) -> Replicated<F> {
            // error occurs here because we narrow the same context more than once (once per record)
            let ctx = ctx.narrow(&ToyStep::Step1);
            let r = ctx.multiply(self.record_id).await.execute(self.a, self.b).await.unwrap();

            let ctx = ctx.narrow(&ToyStep::Step2);
            ctx.multiply(self.record_id).await.execute(self.a, r).await.unwrap()
        }
    }

    impl Step for ToyStep {}

    impl AsRef<str> for ToyStep {
        fn as_ref(&self) -> &str {
            match self {
                ToyStep::Step1 => "toy_step_1",
                ToyStep::Step2 => "toy_step_2",
            }
        }
    }

    #[tokio::test]
    pub async fn test() {
        let world = make_world(QueryId);
        let contexts = make_contexts(&world);
        let mut rng = thread_rng();
        let data = vec![
            (Fp31::from(10u128), Fp31::from(2u128)),
            (Fp31::from(5u128), Fp31::from(3u128)),
        ];

        // inputs for 3 helpers
        let mut inputs = [vec![], vec![], vec![]];
        for row in data {
            let left = share(row.0, &mut rng);
            let right = share(row.1, &mut rng);

            for i in 0..3 {
                inputs[i].push((left[i], right[i]));
            }
        }

        let protocols: [_; 3] = inputs.iter()
            .map(|v| ToyProtocol { input: v })
            .collect::<Vec<_>>().try_into().unwrap();

        let v = contexts.iter()
            .zip(protocols)
            .map(|(ctx, protocol)| protocol.execute(ctx))
            .collect::<FuturesOrdered<_>>()
            .collect::<Vec<_>>().await;
    }
}

This program fails with the following error

Refined 'protocol' with step 'toy_step_1' twice
thread 'protocol::tests::test' panicked at 'Refined 'protocol' with step 'toy_step_1' twice', src/protocol/mod.rs:92:13
stack backtrace:

The error occurs because we attempt to run the same circuit for more than one input row (record id). This should be supported by the infrastructure, IMO, because otherwise folks are forced to model protocols with unique steps created per row (see #108 and #110). Not only do "artificial" steps look weird, they also make infra inefficient, as all the batching benefits are gone (a step per row effectively means a single record per channel).

Potential solutions

  1. Move checks from narrow to send. The send method has access to both step and record id, so it can validate that there is at most one send per step, per record, per query id.
  • Pros: fixes the problem for message
  • Cons: the prss method cannot benefit from this check, so we are still vulnerable to misuse of prss unless we implement something similar for it
  2. Add record_id to the narrow method. Enforce uniqueness of step + record_id rather than just step.
  • Pros: fixes the issue for both prss and infra
  • Cons: it may feel unnatural to pass both record_id and step for composite protocols that operate on collections rather than a single record. For example: shuffle wants to define a step that groups 3 sub-protocols, so it attempts to narrow the context, but it needs to invent a record_id out of thin air.

any other ideas?

share_of_one is in an odd place

This is currently attached to the Mesh interface, which doesn't make a whole lot of sense. It probably makes more sense to attach this to the Replicated interface.

Record metrics inside infrastructure layer that allows comparing different protocols

Things that would help engineers evaluate which protocol is better:

  • number of field values sent (total/per channel/per helper).
  • % of send-buffer capacity sent to the network layer. The higher the value, the better: if it is close to 100%, the protocol fully utilizes parallel communication. However, it shouldn't be 100% all the time, as that will lead to congestion.

There might be something else.

Replicated probably shouldn't implement Copy

Though we currently implement Copy on Field, we really don't want to be copying collections of field values all over the place. A single field value needs to be sent over channels and all of that business, but we shouldn't need to send Replicated instances anywhere.

I found a couple of places where we relied on having Copy implemented, so this might not be trivial to fix; hence I'm opening an issue.

Use `Stream`s for the network layer

When MPC helpers receive messages from other helpers, we need a workflow to bubble those messages up to the protocol layer and land in the correct steps of the protocol. This issue will focus specifically on the lowest-level network layer and how it interacts with the infra layer. It will not specifically cover the interface between the infra and protocol layers.

Today, we have an existing solution that works, but doesn't quite align with the model that we think will work long term. I'll briefly cover today's solution, and mention a couple of its shortcomings:

Currently, the infra and network layers send and receive discrete chunks of data between them. When a helper is sending data to another helper, that means that the infra layer takes a Stream of data from the protocol, chunks it into Vecs of some size, and passes those chunks to the network layer. The network layer will then make a request with that chunk as its body, and also pass metadata describing the ChannelId associated with that chunk, as well as the record id of the first element in that chunk. On the receiving side, it accepts the chunk, inspects the metadata to assign record ids to each element, and passes the records within the chunk to the infra layer.

Under this model, the network layer makes many requests for any given Stream of data, based on the chunking that the infra layer passes to it. While this model will work, it may have issues at scale due to how many requests will be made, even if they reuse the same HTTP/2 connection. It's also less ergonomic: Hyper, the infra layer, and the protocol layer already deal in Streams, so it would be easier to understand the end-to-end flow if the network layer also dealt in Streams. Thus, we should rewrite the network layer to use Streams directly.

Using Streams

At a high level, the Streams version of network layer should have the following behavior:

On send:

  • infra passes Stream of all records and ChannelId to network layer
  • network layer makes a POST to the receiving helper, where the body is the Stream of records

On receive:

  • network layer receives a POST from another helper
  • it takes the body as a Stream and adds it to a map of ChannelId -> Stream body
  • when infra requests data from a different helper, network layer associates it in a map of ChannelId -> request for Stream body
  • when both the Stream body itself, and its request are placed in the map, then infra's request is completed and the Stream body is passed on.

Specifically for receive, there's an open question about the best approach for linking up an incoming Stream body with the request from the infra layer. The tricky part is the order of operations: does the request for the Stream body arrive first, or does the Stream body itself?

If the Stream body arrives first, then when the request comes in, the network layer can respond with that Stream body from the map right away, and call it a day.

If the request arrives first, how does the network layer respond? The interface for the request expects a Stream, so ideally we would return a Stream that will return Poll::Pending until the Stream body arrives.

I've brainstormed two possible solutions with @akoshelev:

  • The network layer will create a channel and return its receiver (which implements Stream) to the infra layer. It will store the sender in the map of ChannelId -> request for Stream body. When the body actually arrives, spawn a task that sends all values of the body on the channel.
    • there would be an extra task spawned per stream, which means an extra context switch for every element of the stream; this will add latency
  • The network layer will respond with a custom Stream implementation which holds an Arc<Mutex<Option<Stream>>> initialized to None. The poll_next() function of this stream will return Poll::Pending for as long as the Option is None. When the body actually arrives, set the Option to Some(body_stream), and from that point on the custom Stream will transparently expose the Stream it's holding (see the sketch below).
    • On every call to poll_next, it will have to lock the mutex
    • For each Stream, there will be only one chance of lock contention, and that's when the network layer is setting Some(body_stream). So the frequency of real-world contention will be vanishingly small
    • What is the performance implication of frequently locking a mutex that has guaranteed little to no contention?
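
A minimal sketch of the second option, assuming the futures crate (names are illustrative, and a real implementation must also store the waker so the task is woken when the body arrives):

use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll};

use futures::stream::Stream;

/// A stream that stays pending until a body stream is installed.
struct PendingStream<S> {
    inner: Arc<Mutex<Option<S>>>,
}

impl<S: Stream + Unpin> Stream for PendingStream<S> {
    type Item = S::Item;

    fn poll_next(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Self::Item>> {
        let mut guard = self.inner.lock().unwrap();
        match guard.as_mut() {
            // The body has not arrived yet. (A real implementation must stash
            // cx.waker() here and wake it when the body is installed, or this
            // task will never be polled again.)
            None => Poll::Pending,
            // The body has arrived; delegate to it transparently.
            Some(body) => Pin::new(body).poll_next(cx),
        }
    }
}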

add a warning on unclosed channels

@andyleiserson - can you add some kind of warning to help us catch these types of cases? This time we caught it, but I can easily see us missing this in the future.
If a channel is still open and expecting, like, one more record... and it's been like 30 seconds and it hasn't come... maybe log a warning or something?

Originally posted by @benjaminsavage in #497 (comment)

sum_of_products should accept slice of shares and not slice of reference of shares

          This is stupid.

This is here because sum_of_products really wants to accept a slice of references to shares... what we have is vectors of shares. So this converts them into vectors of references...

I attempted to eliminate this, and make sum_of_products just accept a slice of shares... but unfortunately, while this is easy for the semi-honest one, this isn't easy for the malicious version. That's because it attempts to call the semi-honest version, but it doesn't have vectors of replicated shares - it has vectors of malicious replicated...

I couldn't fix this without copy-pasting a lot of code. We can return to this later. There is another optimisation waiting here.

Originally posted by @benjaminsavage in #402 (comment)

[Tracking issue] Sort sometimes hangs when running on large fields

This is the tracking issue for sporadic failures I am observing while running the following program (bench). It is almost an exact copy of the generate_sort_permutation test, but it runs faster under the bench profile and therefore has a higher chance of reproducing the issue:

use std::time::Instant;
use futures_util::future::try_join_all;
use rand::Rng;
use raw_ipa::error::BoxError;
use raw_ipa::ff::Fp32BitPrime;
use raw_ipa::protocol::QueryId;
use raw_ipa::test_fixture::{make_contexts, make_world_with_config, TestWorldConfig, validate_and_reconstruct};
use raw_ipa::protocol::sort::generate_sort_permutation::GenerateSortPermutation;
use raw_ipa::ff::Field;

#[tokio::main(flavor = "multi_thread", worker_threads = 3)]
async fn main() -> Result<(), BoxError> {
    let mut config = TestWorldConfig::default();
    config.gateway_config.send_buffer_config.items_in_batch = 1;
    config.gateway_config.send_buffer_config.batch_count = 1000;
    let world = make_world_with_config(QueryId, config);
    let [ctx0, ctx1, ctx2] = make_contexts::<Fp32BitPrime>(&world);
    let num_bits = 64;
    let mut rng = rand::thread_rng();

    let batchsize = 100;

    let mut match_keys: Vec<u64> = Vec::new();
    for _ in 0..batchsize {
        match_keys.push(rng.gen::<u64>());
    }

    let input_len = match_keys.len();
    let mut shares = [
        Vec::with_capacity(input_len),
        Vec::with_capacity(input_len),
        Vec::with_capacity(input_len),
    ];
    for match_key in match_keys.clone() {
        let share_0 = rng.gen::<u64>();
        let share_1 = rng.gen::<u64>();
        let share_2 = match_key ^ share_0 ^ share_1;

        shares[0].push((share_0, share_1));
        shares[1].push((share_1, share_2));
        shares[2].push((share_2, share_0));
    }

    let start = Instant::now();
    let result = try_join_all(vec![
        GenerateSortPermutation::new(&shares[0], num_bits).execute(ctx0),
        GenerateSortPermutation::new(&shares[1], num_bits).execute(ctx1),
        GenerateSortPermutation::new(&shares[2], num_bits).execute(ctx2),
    ]).await?;
    let duration = start.elapsed().as_secs_f32();
    println!("benchmark complete after {duration}s");

    assert_eq!(result[0].len(), input_len);
    assert_eq!(result[1].len(), input_len);
    assert_eq!(result[2].len(), input_len);

    let mut mpc_sorted_list: Vec<u128> = (0..input_len).map(|i| i as u128).collect();
    for (i, match_key) in match_keys.iter().enumerate() {
        let index = validate_and_reconstruct((result[0][i], result[1][i], result[2][i]));
        mpc_sorted_list[index.as_u128() as usize] = u128::from(*match_key);
    }

    let mut sorted_match_keys = match_keys.clone();
    sorted_match_keys.sort_unstable();
    for i in 0..input_len {
        assert_eq!(u128::from(sorted_match_keys[i]), mpc_sorted_list[i]);
    }

    Ok(())
}

A noticeable difference is the batch_count setting, which is set to 1000. That allows larger buffers, but still, given that items_in_batch is set to 1, Infra sends messages one by one. There should be no choking in the system with these settings, but it clearly happens at some point.

Continuous validation of inputs

As inputs are provided to helpers, it might make sense for them to be continuously validated.

Conveniently, each helper receives replicated secrets. That means that all values that are received arrive at two helpers. Each helper can hash all of the left shares it receives and periodically compare that hash with one computed over the right shares of the helper to its left. If these disagree, the process can abort with an error (for now). We can discuss recovery options at some later point.
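
A rough sketch of the idea, assuming the sha2 crate and u128-encoded share values (both illustrative):

use sha2::{Digest, Sha256};

/// Hash a sequence of share values into a fixed-size digest.
fn digest_shares(shares: &[u128]) -> [u8; 32] {
    let mut hasher = Sha256::new();
    for s in shares {
        hasher.update(s.to_le_bytes());
    }
    hasher.finalize().into()
}

/// Each helper periodically compares the digest of the left shares it received
/// with the digest of the right shares held by the helper to its left
/// (exchanged over the network in a real deployment).
fn validate(my_left: &[u128], neighbors_right: &[u128]) -> Result<(), &'static str> {
    if digest_shares(my_left) == digest_shares(neighbors_right) {
        Ok(())
    } else {
        Err("input shares disagree; aborting")
    }
}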

Python pre-commit fails

Obtaining mp_spdz_compiler from git+ssh://****@github.com/eriktaubeneck/MP-SPDZ.git@39b6d7e22dedc5798823d306741d33ff6635bc25#egg=mp_spdz_compiler (from -r /home/martin/code/raw-ipa/requirements.txt (line 6))
  Cloning ssh://****@github.com/eriktaubeneck/MP-SPDZ.git (to revision 39b6d7e22dedc5798823d306741d33ff6635bc25) to ./.venv/src/mp-spdz-compiler
  Running command git clone --filter=blob:none --quiet 'ssh://****@github.com/eriktaubeneck/MP-SPDZ.git' /home/martin/code/raw-ipa/.venv/src/mp-spdz-compiler
  [email protected]: Permission denied (publickey).
  fatal: Could not read from remote repository.

  Please make sure you have the correct access rights
  and the repository exists.
  error: subprocess-exited-with-error
  
  × git clone --filter=blob:none --quiet 'ssh://****@github.com/eriktaubeneck/MP-SPDZ.git' /home/martin/code/raw-ipa/.venv/src/mp-spdz-compiler did not run successfully.
  │ exit code: 128
  ╰─> See above for output.

Cloning using SSH here seems to be the issue.
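
One possible fix, assuming the repository is publicly readable over HTTPS, is to change the requirements.txt entry so the clone does not need an SSH key:

git+https://github.com/eriktaubeneck/MP-SPDZ.git@39b6d7e22dedc5798823d306741d33ff6635bc25#egg=mp_spdz_compiler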

Write benchmarks for messaging layer

Overview

As discussed in #84 and offline, we want to identify weak spots in the messaging layer implementation. Here is the proposed setup for the benchmark:

  • Circuit width is 1M and depth 64. That should give 64M multiplications.
  • Use criterion for benchmark.
  • Use cargo-flamegraph to generate a flamegraph

The desired outcome is to understand the baseline and identify areas where a significant performance improvement can be made.
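
A minimal criterion harness along these lines might look like the sketch below. The loop body is a local stand-in (plain wrapping multiplies), not the real MPC multiplication, and all names are placeholders:

use criterion::{black_box, criterion_group, criterion_main, Criterion};

const WIDTH: usize = 1_000_000;
const DEPTH: usize = 64;

fn messaging_circuit(c: &mut Criterion) {
    c.bench_function("width 1M x depth 64", |b| {
        b.iter(|| {
            // 64M multiplications in total; a real benchmark would drive
            // these through the messaging layer instead.
            let mut wires = vec![1u64; WIDTH];
            for _ in 0..DEPTH {
                for w in wires.iter_mut() {
                    *w = black_box(w.wrapping_mul(3));
                }
            }
            wires
        })
    });
}

criterion_group!(benches, messaging_circuit);
criterion_main!(benches);

cargo-flamegraph can then be pointed at the same bench target to produce the flamegraph.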

Limiting usage of macros in IPA code

Opening an issue to start a conversation about what is considered appropriate usage of macros and what is not.

I will share my thoughts and would love some feedback before we give official guidance. My thinking is that, generally, we should try to avoid adding macros.

  1. Macros are expensive to compile.
  2. Macro support is very limited in many, if not all, modern IDEs.
  3. Macro syntax is not intuitive to read.
  4. Macro compile errors are not easy to understand and fix: the compiler points to generated code that does not exist in the editor.

On the other hand, in some places macros are the only way to achieve the desired outcome, and adding a macro is totally fine in those cases. For instance:

  • Providing a blanket From/Into implementation for custom struct type conversions where the implementation is the same.
  • Implementing a trait (finite fields are an example; see the sketch below).
  • Any sort of repetitive code that cannot be factored out into a function.
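
To make the trait case concrete, here is the kind of repetitive implementation a small declarative macro removes. The trait and field types are illustrative only:

// One macro invocation replaces several identical trait implementations.
macro_rules! impl_field_display {
    ($($t:ty),* $(,)?) => {
        $(
            impl std::fmt::Display for $t {
                fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
                    write!(f, "{}", self.0)
                }
            }
        )*
    };
}

struct Fp31(u8);
struct Fp32BitPrime(u32);

impl_field_display!(Fp31, Fp32BitPrime);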

Use LTO for benchmarking

LTO really slows down the build, but it seems to greatly improve performance. I don't know how people feel about the trade-offs, but it seems like I can reduce the overall run time of the IPA test by about 10-15% using it.
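
If we adopt this, one option is enabling LTO only for benchmark builds so that ordinary builds stay fast; a sketch of the Cargo.toml change (thin vs. full LTO is a further trade-off to measure):

[profile.bench]
lto = "thin"   # or `true` for full LTO: slower builds, potentially faster code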

actions-rs actions are effectively unmaintained

We're getting warnings about the Node 12 deprecation, so these will stop working soon-ish.

It looks like we can just run cargo audit ourselves. We don't get the fancy reporting and whatnot, but that's probably tolerable. We can also probably consolidate the audits, since the steps taken are the same.
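
A minimal sketch of such a consolidated workflow (the workflow name and triggers are placeholders):

name: audit
on: [push, pull_request]
jobs:
  audit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: cargo install cargo-audit
      - run: cargo audit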

Ring feedback

Just looking at the code in ring.rs and I'm liking it a fair bit.

There are some things that I think need some consideration though. Some of this only affects the test implementation.

  1. For Message: Send + Serialize: I don't see why you would need Send (or 'static, for that matter). If you implement Serialize, you can serialize on the same thread where the send() call is invoked, and then you only need to transfer a buffer of bytes (see the sketch after this list).

  2. Box<[u8]> == Vec<u8>.

  3. Buffering. There is buffering in the mpsc::channel. Why do you need to hold messages in a separate one-item buffer (the HashMap)? Given that the receiver is single-threaded (though it is Send), you can just pull from that channel synchronously when calling receive(). The single-item buffer already essentially assumes that there are no intermixed messages, so you don't even need to keep the type annotation on each message; just let deserialization fail.

  4. Using Option<Sender> for peers. Better to have a setup object instead; see how I did PRSS. That will allow you to lose the input queue also.
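
Regarding point 1, a minimal sketch of serialize-then-send, assuming serde plus bincode and a tokio channel of byte buffers (all names here are illustrative, not the actual ring.rs API):

use serde::Serialize;
use tokio::sync::mpsc::Sender;

/// Serialize on the calling thread, so only bytes cross the channel;
/// the message type itself never needs to be Send or 'static.
async fn send<M: Serialize>(
    tx: &Sender<Vec<u8>>,
    msg: &M,
) -> Result<(), Box<dyn std::error::Error>> {
    let bytes = bincode::serialize(msg)?; // runs where send() is invoked
    tx.send(bytes).await?;                // only the buffer is transferred
    Ok(())
}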

Sorting might take values

Not sure if there's a reason these functions take a reference to sort_keys. It looks like the callers have an owned Vec anyway, and the upgrade functions want to take the raw input by value, so it may be possible to change the argument type of this function and get rid of this clone and several like it.

Originally posted by @andyleiserson in #396 (comment)

Transposing bit-decomposed match-keys is inefficient

Sort wants match keys like this:

[
    [0, 1, 0, 1, 0], // 0th bit of all the match keys
    [1, 1, 0, 1, 0], // 1st bit of all the match keys
    [0, 0, 1, 0, 0], // 2nd bit of all the match keys
...
    [1, 0, 1, 0, 1], // 40th bit of all the match keys
]

This is the current output of the method convert_all_bits.

But we want the transposed matrix for the purpose of computing the "helper bits":

[
    [0, 1, 0, /* lot of bits */, 1] // ALL the bits of the 0th match key
    [1, 1, 0, /* lot of bits */, 0] // ALL the bits of the 1st match key  
    [0, 0, 1, /* lot of bits */, 1] // ALL the bits of the 2nd match key
    [1, 1, 0, /* lot of bits */, 0] // ALL the bits of the 3rd match key
    [0, 0, 0, /* lot of bits */, 1] // ALL the bits of the 4th match key
]

I currently call transpose in the IPA protocol. This is inefficient.
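
For reference, the naive transpose in question looks something like this (illustrative, not the exact code): every element is cloned into a freshly allocated row, which is the cost we would like to avoid:

fn transpose<T: Clone>(m: &[Vec<T>]) -> Vec<Vec<T>> {
    let rows = m.len();
    let cols = m.first().map_or(0, Vec::len);
    (0..cols)
        .map(|c| (0..rows).map(|r| m[r][c].clone()).collect())
        .collect()
}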

Ideally we would do something like what @martinthomson has proposed:

I think that ultimately what we'll want to do is provide the sort functions with entire input rows, with a means of it keying into that input row with a bit index (or radix N index, I guess).

Update encryption

I just changed the proposal to include the origin of the match key provider. We also need to change from site origin to site registrable domain (which it should have been all along). There are lots of rules about how you construct these inputs, but we can just take them as strings.

Compile tests to use different field implementations

Let's look at this holistically. In a real protocol execution, we'll primarily be operating in one fixed field (Fp31 maybe in tests, but obviously something larger in reality). But sometimes we will need to do some operations in Fp2 specifically, or maybe we need to upgrade to a larger field for some reason. For that, I think that we can (and maybe should) offer a function that can create a Context<...Fp2...> from a Context<...Fp31...>. For those tests that use Fp2, this is a tiny bit more work, but it should keep things clearer.

One thing I've been thinking of doing is conditionally compiling tests to use a different field implementation so that we can be confident that the different fields are working correctly.

That is, something like this:

#[cfg(feature = "test-fpm31")]
type TestField = FpM31;
#[cfg(not(feature = "test-fpm31"))]
type TestField = Fp31;

With all mentions of Fp31 in tests replaced with TestField.

Originally posted by @martinthomson in #184 (comment)

s/Identity/Role

The term "identity" is pretty heavily loaded already, but what we really want this type to represent is the role that the helper is assuming in the execution.

This might reflect the fact that a helper might get assigned different roles for different queries.

Besides, it's a shorter string to type.

Channel-per-step for sending

Right now, all sending through the gateway goes through a single MPSC channel. What I would like to move to is a separate sending structure for each ChannelId (i.e., Step + destination Role). That means having a collection of these senders at the context layer.

The problem is that this relies on a context-wide structure. We are currently creating channels on demand, so concurrent access will be challenging. Ideally, accesses to this structure are cheap, but the process for creating new values will complicate that. On the sending end, we need to ensure that multiple channels don't get created in parallel. And on the receiving end of this channel, we need to spawn a task that will receive messages (see also #360).

Alex suggested that we could pre-generate channels from what we know about a protocol, which is a good idea that might build on #139 nicely. That might allow us to avoid more expensive structures and locking just to access the channels. We don't have to build that straight away though, because...

For tests, we'll need something a little more flexible. For that, I was thinking that we put a HashMap at the context layer that is guarded by a Mutex; we could simply lock the structure and create the necessary objects as needed.
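
A minimal sketch of that test-oriented structure, with hypothetical names (ChannelId stands in for Step + destination Role, and a tokio runtime is assumed):

use std::collections::HashMap;
use std::sync::Mutex;
use tokio::sync::mpsc;

#[derive(Clone, PartialEq, Eq, Hash)]
struct ChannelId {
    role: usize,  // destination Role
    step: String, // Step
}

/// Lazily populated map of per-channel senders, guarded by a Mutex.
struct SendChannels {
    map: Mutex<HashMap<ChannelId, mpsc::Sender<Vec<u8>>>>,
}

impl SendChannels {
    fn get_or_create(&self, id: ChannelId) -> mpsc::Sender<Vec<u8>> {
        let mut map = self.map.lock().unwrap();
        map.entry(id)
            .or_insert_with(|| {
                let (tx, mut rx) = mpsc::channel::<Vec<u8>>(16);
                // A draining task stands in for the real receive side.
                tokio::spawn(async move { while rx.recv().await.is_some() {} });
                tx
            })
            .clone()
    }
}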

In doing this, I want to remove Mesh, or at least start to. It doesn't add much value and a move to separate sending and receiving channels will make it cost more to invoke than we want (want to send left? you get a receiver as well, whether you want it or not). A similar structure might be needed to encapsulate the sender's view of the channel, just as there might be a receiver object. That means we'll end up with an API something like:

gateway.send_channel(role, step).send(message).await;

...which can be wrapped up by the context implementation.

We'll need a similar structure for receiving as well, I think, but let's do this one step at a time. I think we are able to safely move sending without disturbing the receiving side too much.

(cc @andyleiserson and @akoshelev)

Deprecate Axum

After running into a few problems working with Axum, as well as hearing about @akoshelev's understanding of current best practices, I think we should consider replacing Axum with an alternative framework.

At a high level, we are noticing that Axum is designed for simple and easy-to-use request/response handling. For our use case, where we are very cognizant of connection throughput, we sometimes need to go under the hood to directly interface with the connection, or to manipulate the timing and ordering of the stages of a request. For example, we need to determine when we can accept the next request: we first need to check the headers of a request, then compare them with internal state for throughput availability, and only then pull the body off of the network buffer. Axum makes this fairly difficult to do correctly, and there are some subtle pieces that feel like taking advantage of UB.

As a starting point, we might consider simply using Tower directly, in combination with h2 for the web server/client.
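
As a rough feasibility sketch (assuming the h2, http, and tokio crates; hypothetical address, no error handling), the header-first pattern looks like this when talking to the connection directly:

use tokio::net::TcpListener;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let listener = TcpListener::bind("127.0.0.1:3000").await?;
    loop {
        let (socket, _) = listener.accept().await?;
        tokio::spawn(async move {
            let mut conn = h2::server::handshake(socket).await.unwrap();
            while let Some(Ok((request, mut respond))) = conn.accept().await {
                // Headers are available here *before* the body is pulled off
                // the connection, so throughput checks can happen first.
                let _headers = request.headers();
                let response = http::Response::new(());
                let _ = respond.send_response(response, true);
            }
        });
    }
}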
