emp-toolkit / emp-agmpc Goto Github PK

View Code? Open in Web Editor NEW

40.0 40.0 23.0 40 KB

Global-Scale Secure Multiparty Computation

License: Other

CMake 1.73% Shell 0.37% C++ 97.30% C 0.60%

emp-agmpc's People

Contributors

Stargazers

Watchers

emp-agmpc's Issues

Typo in fpremp.h?

In fpremp.h, each party generates random AND triplets. To do that, each party has authenticated shares of [x] and [y] (the 3*i and 3*i+1 bits in the tr array in fpremp.h), and they engage in a garbled-circuit like protocol to compute a tiny AND gate to then compute the shares of [v]/[z] (will be 3*i+2 bits in the tr array in fpremp.h).

Currently, fpremp.h has every party generate a random tr, and then make the tr zero:

      prg.random_bool(tr, length*bucket_size*3+3*ssp);
      memset(tr, false, length*bucket_size*3+3*ssp);

I kind of feel that the second line is unnecessary and should be removed?

An experiment shows that if the second line makes the evaluate function in fpremp.h always deals with the first column of the garbled table. If we remove the second line, then other columns have the hope of being touched.

Removing the second line still preserves all MACes are correct, as confirmed by running AG-MPC with debug on.

If one wants semi-honest security, should this person just mute a few checks?

It is known that AG-MPC has similar performance with the prior work in semi-honest MPC; however, AG-MPC achieves malicious security.

One who wants the fastest semi-honest MPC may want to start from AG-MPC and turn AG-MPC into semi-honest.

The shortcut is thus to find invocations to SHA256 that are unnecessary for the semi-honest setting.

My observation is that:

No need to cut calls to H outside \Pi_{Pre}, we cannot simply delete any H, but at most think to reduce the hash output size of H from 64 bytes to 48 bytes (and then removing MAC). However, H is a SHA256 hash function supported by CPU instructions, so shrinking the output does not save anything.
Cut the checking in \Pi_{aShare}, basically, set \rho to 0. This saves the invocation to commitment schemes. That is to say, in this implementation, cut check2 in abitmp.h.
Cut the triple checking in \Pi_{LaAND}. This is some code in fpremp.h, from https://github.com/emp-toolkit/emp-agmpc/blob/master/emp-agmpc/fpremp.h#L134 to 227, I guess. This again cuts a few calls to H and commitments.
There is a check in \Pi_{aBit}, yet this seems not involving any invocation to H. This part of the code should be check1 in abitmp.h. Again, we can set \rho to 0.

Any thoughts or advice?

identifiable abort?

One desired property in malicious MPC is identifiable abort, in which if some servers cause the protocol to abort, these servers will be identified. Attempts to make SPDZ with such properties have led to significant overhead.

The AG-MPC protocol, in principle, is authenticated, which is somehow close to identifiability.

Thus, this is a short question: Does AG-MPC have identifiable abort? If not, any high-level idea where it is not identifiable?

storage overhead in the offline-online model

Not really an issue, but likely a direction. The offline-online model allows us to reduce online latency, but it also means that the servers must store the offline preprocessed result (e.g., the many garbled circuits in AG-MPC).

This result is still a little bit big. For 10 parties, the first server stores ~400MB per SHA-256 circuit, mostly the garbled circuits. Other servers only need to store little data, such as the MAC keys for the input/output wires.

So it seems that applications might desire a way to "compress" the offline preprocessed result somehow, such as an alternative offline phase that has fewer copies of circuits, since it seems unavoidable but also kind of unideal for the first party to keep $N-1$ circuits.

Replace fixed key for the random linear combination in aBit

This is likely left as a future TODO.

To obtain maliciously secure n-party authenticated bits, the parties, in a pairwise manner, run COT. This is followed by a random validity check.

discussed in Step 3, Figure 14 of the AG-MPC paper: https://eprint.iacr.org/2017/189.pdf
implemented in fpremp.h#L194

In practice, this linear combination has to be generated online, which requires a coin-tossing functionality.

This can be implemented by having each party commit to the random numbers (or a PRNG seed), broadcast the commitments, and open the commitments. See Appendix A.1 in https://eprint.iacr.org/2019/1104.pdf for more details.

AG-MPC with FerretCOT?

It seems that FerretCOT architecture has been stabilized. Would AG-MPC use FerretCOT by default? Any suggestions on the number of threads to be used? (How major the computation cost is in FerretCOT?)

I would cc @carlweng here!

A candidate implementation for flexible input/output for BristolFormat

Ryan Deng (@ryandeng1) and I implemented a fork of EMP-AGMPC that supports flexible input/output, which suffices for reactive computation.

Fork: https://github.com/n-for-1-auth/emp-agmpc-flex-in-out

Features

Basically, inputs and outputs are flexible in the following ways.

Each bit of the input can be in one of four modes:

Secret-shared, un-authenticated. (denoted by -2) Each party provides a bit, and the actual input to the GC is the XOR of all the parties' bit. This is useful for providing a random tape.
Secret-shared, authenticated. (denoted by -1) Parties provide an authenticated bit (bit, mac, key), likely from a previous execution under the same delta key.
Public (denoted by 0). All party enforces this input bit to be something agreed publicly.
Private (denoted by party ID). One party provides this bit.

Each bit of the output can be in one of three modes:

Secret-shared, authenticated. (denoted by -1) Parties obtain an authenticated bit representing the output. This authenticated bit can then be fed into another AGMPC execution, if under the same delta key.
Public (denoted by 0). All party sees this output.
Private (denoted by party ID). One party sees this bit.

API

The input array is replaced by FlexIn<nP>. Parties first agree on the mode (here, "party") of each bit.

FlexIn<nP> in(cf.n1 + cf.n2, party);
for(int i = 0; i < 64; i++) {
	in.assign_party(i, ALICE);
}
for(int i = 64; i < cf.n1; i++) {
	in.assign_party(i, BOB);
}

And the party can feed a bit or an authenticated bit into it.

for(int i = 0; i < cf.n1; i++){
	in.assign_plaintext_bit(i, i % 2 == 0);
}
for(int i = 0; i < cf.n2; i++) {
	AuthBitShare<nP> mabit = out.get_authenticated_bitshare(i);
	in.assign_authenticated_bitshare(cf.n1 + i, &mabit);
}

The output array is replaced by FlexOut<nP>, which works in a similar fashion.

FlexOut<nP> out(cf.n3, party);
for(int i = 0; i < 32; i++) {
	out.assign_party(i, ALICE);
}
for(int i = 32; i < 64; i++) {
	out.assign_party(i, BOB);
}
for(int i = 64; i < cf.n3; i++) {
	out.assign_party(i, 0);
}

One can get an authenticated bit like this, after the online phase.

out.get_authenticated_bitshare(i);

Or, if the party is allowed to see the plaintext value, it can do:

if(party == ALICE) {
	for (int i = 0; i < 32; i++) {
		cout << out.get_plaintext_bit(i) << " ";
	}
	for (int i = 32; i < 64; i++) {
		cout << "x" << " ";
	}
	for (int i = 64; i < cf.n3; i++) {
		cout << out.get_plaintext_bit(i) << " ";
	}
}

Test

The code has been tested for 2 parties over the AES circuit. This test script tries all the functionalities and compares them with the original input/output framework.

https://github.com/n-for-1-auth/emp-agmpc-flex-in-out/blob/master/test/test_in_out.cpp

Potential improvement

There are two potential improvements to the code.

Due to laziness, the current implementation sends out a lot of dummy data (and they are all zeroes) during the input/output. Careful implementation of the broadcast will get rid of such communication.
Due to laziness, the MAC checking is not batched. That is to say, one party actually sends out the MAC of an authenticated bit for MAC testing. We know this could be reduced by using a hash over the MAC.

We feel this should be fine since input/output is often small, compared with the circuit size. But this may not always be the case.

Roadmap to merging

Though the commit is small, there are two obstacles.

The memory corruption issue of #10.
In this implementation, passing authenticated bits between different CMPC objects requires the same delta key. Our fork hardcodes this delta key, which is insecure. We might need a way to generate and use the same delta key across a number of instances that require such reactive computation (and not do so when not needed). Ferret's COT has a setup phase that seems relevant.

So, the code is currently left as a fork.

Potential next steps

Note that EMP-AGMPC currently only supports BristolFormat, but, especially given this framework, BristolFashion should work as well. A possible direction is to allow easy conversion from BristolFormat to BristolFashion or merge their API.

Credits

This implementation is made when working on https://eprint.iacr.org/2021/342.

We also leveraged some code snippets of Senate by @podcastinator and @sukritkalra.

output MAC check?

It seems that the implementation does not check the MAC of the outputs (even with __debug turned on). I feel that it is okay for benchmark purposes since the overhead is comparably small.

But, should the code for output MAC check be added (which would be invoking check_MAC)? Or should we leave as it is?

My local test, which tampers party 2's value before https://github.com/emp-toolkit/emp-agmpc/blob/master/emp-agmpc/mpc.h#L416, suggests that one malicious party can, however, tamper with the output since the party 1 does not check the MAC.

Doesn't build on clean Ubuntu 18.04

cmake -DENABLE_FLOAT=False . as seen in the install script does not work on a clean Ubuntu 18.04 install.

CMake Error at cmake_install.cmake:41 (file):
   file INSTALL cannot find "/root/emp-agmpc/cmake".

All you need to do is remove
the line

install(DIRECTORY cmake/ DESTINATION cmake)

from the CMakeLists.txt file

Amortized fun-indep would be very different from non-amortized fun-indep

This is likely a note for reference of performance evaluation: amortized cost can be significantly smaller than the non-amortized cost for the function-independent phase.

Example: If parties are going to run a lot of small circuits (e.g., run AES circuits for one million times), they can:

use amortization. Do a big function-independent phase that is sufficient for one million invocations (or one thousand, also sufficient). Note that these circuits would need to use the same delta key.
not use amortization. Do a function-independent phase for each invocation.

The latter will show performance numbers that are suboptimal.

Benefits of amortization: Based on some evaluations, it seems that amortization affects the evaluation in many ways.

First, and most obviously and severely, the network latency, as already noted in [WRK17].

The function-independent phase has a lot of rounds. To generate 10k AND triples, using this repo (aka IKNP) and 2Gbit/s network, with two parties,

For RTT = 20 ms, per AND triple cost is 38 us.
For RTT = 4 ms, per AND triple cost is 9 us.

If one instead generates 10^7 AND triple, then one can get similar numbers that are insensitive to network latency: 3.9 us.

Second, also obviously and severely when bandwidth is limited, the authenticated bit cost if using Ferret.

This one goes without saying. Ferret generates a large batch of AND triples at the same time. And one would miss the opportunities if not all AND triples are consumed. Ferret's benefits are more obvious when machines are powerful and when the network is slow.

Third, the bucket size needed for removing the leakage in leaky AND triples.

Most common circuits would use more than 3100 AND gates, so bucket_size = 4. But for a bucket with size 280000 or more, bucket_size = 3, and this is indeed a non-negligible saving (if network latency is handled).

This means that if one can do a batch function-independent phase, which does not need to be large---10^7 AND gates are not that expensive---then one could always enjoy the bucket size of 3.

Implications for benchmark: This may suggest that benchmarks on one invocation of small circuits would need significant cautions, as network latency may dominate.

Furthermore, the function-dependent phase shows a similar situation---if the circuit is very small, the network latency may dominate.

Implications for the platform: This may suggest that for benchmark purposes, people may want to get good numbers, and reasonable amortization would be considered.

Currently, emp-agmpc does not provide a separation between online/offline, nor a useful tool for benchmarking that considers amortization at heart. In the future, this may be an interesting topic on its own, as it benefits paper writers. E.g., how to store the offline phase result efficiently is another problem (#11).

inProdTableBlock should be {zero_block, one_block}?

This is related to the inProd function in this repo.
https://github.com/emp-toolkit/emp-agmpc/blob/master/emp-agmpc/helper.h#L15

It uses the following selection pad:

const static block inProdTableBlock[] = {zero_block, makeBlock(0xFFFFFFFFULL, 0xFFFFFFFFULL)};

However, the one pad here is not fully one. In fact, a fully one pad should be:

const block all_one_block = makeBlock(0xFFFFFFFFFFFFFFFF,0xFFFFFFFFFFFFFFFF);

Note that emp-tool's 'block.h offered const block select_mask[2] = {zero_block, all_one_block};, which could be a drop-in replacement here.

Installation Command

The following installation command worked for me as opposed to single-dash parameters provided in the README.md.

python install.py -install --tool --ot --agmpc

double_block -> sigma?

It seems that emp-toolkit is replacing double_block with sigma.

The double_block is removed from emp-tool in this commit emp-toolkit/emp-tool@d707eac

However, emp-agmpc still uses double_block, as in here: https://github.com/emp-toolkit/emp-agmpc/blob/master/emp-agmpc/mpc.h#L443

Are we going to see a new version of emp-agmpc recently, which would use sigma?

threshold AG-MPC?

AG-MPC is full-threshold, meaning that one honest party is sufficient for security.

In some settings, one may desire threshold security in exchange for better performance, e.g., assuming 20 out of 100 parties are honest, while the rest are malicious.

For arithmetic settings, SCALE-MAMBA provides such a threshold option, using Shamir secret sharing.

But for Boolean circuit settings, a very flexible threshold setting seems missing. However, circuit representations matter, such as AES is commonly expressed in Boolean circuits.

Are any ongoing projects over emp-toolkit aimed for such a threshold setting? Any recommended readings over the malicious + threshold setting for Boolean circuits?

Thanks!