zkbob / fawkes-crypto Goto Github PK

View Code? Open in Web Editor NEW

This project forked from zeropoolnetwork/fawkes-crypto

2.0 1.0 1.0 628 KB

Fawkes-Crypto - zkSNARKs framework

Home Page: https://github.com/zeropoolnetwork/fawkes-crypto

License: Apache License 2.0

Rust 100.00%

fawkes-crypto's Introduction

Fawkes-Crypto - zkSNARKs framework

Abstract

Fawkes-Crypto is a lightweight framework for building circuits in bellman, using groth16 proving system and BN254 curve.

The framework is targeted to use best practices for circuit building from circom and sapling-crypto.

Final fields and circuit math are wrapped and operators are implemented, so, in most cases if you want to type a+b, you can do it.

Example

Here is an example, how Merkle tree implementation is working. Also you may check the rollup here.

#[derive(Clone, Signal)]
#[Value="MerkleProof<CS::F, L>"]
pub struct CMerkleProof<'a, CS: ConstraintSystem, const L: usize> {
    pub sibling: SizedVec<CNum<'a, CS>, L>,
    pub path: SizedVec<CBool<'a, CS>, L>
}


pub fn c_poseidon_merkle_proof_root<'a, CS: ConstraintSystem, const L: usize>(
    leaf:  &CNum<'a, CS>, 
    proof: &CMerkleProof<'a, CS, L>,
    params: &PoseidonParams<CS::F>
) -> CNum<'a, CS> {
    let mut root = leaf.clone();
    for (p, s) in proof.path.iter().zip(proof.sibling.iter()) {
        let first = s.switch(p, &root); 
        let second = &root + s - &first;
        root = c_poseidon( [first, second].as_ref(), params);
    }
    root
}

Signal is a sparse linear combination of inputs, based on ordered linked list, so we perform arithmetics with Signal with U(N) complexity. With Signal bellman will allocate additional inputs only when you really need it (for example, in the case when you multiply two nonconstant Signal). If you perform multiplication with constant or zero Signal, no additional inputs will be allocated.

Benchmarks

Circuit	Constraints	Per bit
poseidon hash (4, 8, 54)	255	0.33
jubjub oncurve+subgroup check	19
ecmul_const 254 bits	513	2.02
ecmul 254 bits	2296	9.04
poseidon merkle proof 32	7328
poseidon eddsa	3860
rollup 1024 txs, 2^32 set	35695616

At i9-9900K rollup is proved for 628 seconds.

Source code of the rollup is available at https://github.com/snjax/fawkes-rollup.

Circuit improvements

We are using indeterministic subgroup checks, performing most part of computations as witness-only and perform cofactor multiplication at the circuit.
ecmul and ecmul_cost operations are working assuming that the base point is in the subgroup. This allows us to use Montgomery (0, 0) point as adder initial state. Then the adder never reaches zero point and subgroup point, because (0, 0) is not in subgroup and we can use cheap montgomery_add circuit safely.
improved compconstant circuit. The same PR into circomlib available here

See more as ethresear.ch here.

Authors

Igor Gulamov

Disclaimer

Fawkes-Crypto has not been audited and is provided as is, use at your own risk.

License

Fawkes-Crypto is available under Apache License 2.0 license or MIT as your choice.

fawkes-crypto's People

Contributors

Stargazers

Watchers

Forkers

cleancoindev

fawkes-crypto's Issues

Move gates parsing from proving

I have profiled our proving process and discovered that a significant portion of the process involves decompressing and parsing gates.

We can decompress and parse the gates during the parameters initialization phase, which means we can move this process from the proving phase to the initialization phase. Since parameter initialization now takes place in the background, preprocessing will mostly be finished before the first proving in most cases.

I've tried to implement it, and here are some results of the decreased proving time:

Laptop (WASM) (Intel(R) Core(TM) i7-10750H): -18 %
Laptop (Native) (Intel(R) Core(TM) i7-10750H): -35 %
Laptop (WASM) (Intel Core i7 (2.9 GHz, 4 cores\8 threads)): -14 %
MAC M1 (WASM): -16 %
Android (WASM) (Galaxy S23): -18 %
iOS (WASM) (iPhone 11): -18 %

But the size of the decompressed gates is huge, it's about 260 MB. Here are some potential drawbacks of this modification:

Parameter initialization time is likely to increase due to the need to decompress and parse the gates. However, it is important to note that this is a one-time cost during the initialization phase and it can result in significant time savings during the proving phase.
The RAM usage of the application will increase from 420 MB to 680 MB.
WebAssembly.Memory.grow(..) is extremely slow on iOS, so we need to compute the necessary memory size and allocate it with a single call. This results in some time overhead during the parameter initialization phase.
Mobile versions of Chrome and Safari have a strict limit on memory usage (500-800 mb, it depends on fragmentation), and attempting to exceed that limit can cause the application to crash. So this optimization is generally not feasible for mobile devices.

I think that this optimization makes sense for delegated prover and cloud environments, and may also make sense for PC. However, it is definitely not viable for mobile devices due to their limited memory capacity.

Related PRs:

Note: It is necessary to update zkbob-cloud and zkbob-prover to include the changes introduced in this PR.

Parallel proof creation in ZeroPool

zkcrypto/bellman#69
zkcrypto/bellman#71
make https://github.com/zeropoolnetwork/phase2-bn254/blob/master/bellman/src/multicore.rs wasm compatible as opposed to
https://github.com/zeropoolnetwork/phase2-bn254/blob/master/bellman/src/multicore.rs

Optimize full-width u64 multiplications for wasm target

The main bottleneck of the cryptography we're using is the full-width u64 multiplication. In the case of wasm it compiles into __multi3 function that is relatively slow. One easy way to target this problem (probably not the best though) is to replace:

(x as u128) * (y as u128)

with

// x = x_1 * 2^32 + x_0
let x_0 = (x as u32) as u64;
let x_1 = x >> 32;

// y = y_1 * 2^32 + y_0
let y_0 = (y as u32) as u64;
let y_1 = y >> 32;

// x * y = (x_1 * 2^32 + x_0)(y_1 * 2^32 + y_0) = x_1*y_1 * 2^64 + (x_1*y_0 + x_0 * y_1)*2^32 + x_0*y_0 = 
// = z_2 * 2^64 + z_1 * 2^32 + z_0
let z_0 = x_0 * y_0;
let z_2 = x_1 * y_1;

let z_1 = u128::from(x_0 * y_1) + u128::from(x_1 * y_0);

(u128::from(z_2) << 64) + (z_1 << 32) + u128::from(z_0)

for wasm target.

It may not look clear but this way we're replacing one u128 multiplication with four u64 multiplications and a couple additions.

This optimization in the fawkes-crypto leads to speeding up synchronization time by ~15 % and improves proving time a bit. Similarly, we can implement it in phase2-bn254 and it can improve proving time by ~ 13 %.

Optimize native poseidon implementation

An efficient implementation of the poseidon is presented in Supplementary Material B of https://eprint.iacr.org/2019/458.pdf.
This optimization can't be applied to circuit and works only for native implementation.

zkbob / fawkes-crypto Goto Github PK

fawkes-crypto's Introduction

Fawkes-Crypto - zkSNARKs framework

Abstract

Example

Benchmarks

Circuit improvements

Authors

Disclaimer

License

fawkes-crypto's People

Contributors

Stargazers

Watchers

Forkers

fawkes-crypto's Issues

Recommend Projects

Recommend Topics

Recommend Org