Giter Site home page Giter Site logo

yuewuo / fusion-blossom Goto Github PK

View Code? Open in Web Editor NEW
58.0 2.0 15.0 5.42 MB

A fast minimum-weight perfect matching solver for quantum error correction

License: MIT License

C++ 0.04% Rust 55.69% HTML 3.57% JavaScript 4.89% Python 32.95% Gnuplot 2.69% Shell 0.03% TeX 0.02% Makefile 0.12%

fusion-blossom's Introduction

Build Status Test Status

Fusion Blossom

A fast Minimum-Weight Perfect Matching (MWPM) solver for Quantum Error Correction (QEC)

Please see our tutorial for a quick explanation and some Python demos.

Our paper is out on arXiv! https://arxiv.org/abs/2305.08307

Key Features

  • Correctness: This is an exact MWPM solver, verified against the Blossom V library with millions of randomized test cases .
  • Linear Complexity: The average decoding time is roughly $O(N)$ given small physical error rate, proportional to the number of defect vertices $N$.
  • Parallelism: A single MWPM decoding problem can be partitioned and solved in parallel, then fused together to find an exact global MWPM solution.
  • Simple Interface: The graph problem is abstracted and easy-to-use for QEC applications. Especially, it supports decoding erasure errors (by setting some edge weights to 0 on the fly).

Benchmark Highlights

  • In circuit-level noise model with $p$ = 0.001, code distance $d$ = 21, rotated CSS code, 100000 measurement rounds
    • single-thread: 3.9us per defect vertex or 13us per measurement round
    • 64-threads: 95ns per defect vertex or 0.3us per measurement round
    • in streaming mode, the latency is 0.7ms regardless of rounds of measurement

Background and Key Ideas

MWPM decoders are widely known for its high accuracy [1] and several optimizations that further improves its accuracy [2]. However, there weren't many publications that improve the speed of the MWPM decoder over the past 10 years. Fowler implemented an $O(N)$ asymptotic complexity MWPM decoder in [3] and proposed an $O(1)$ complexity parallel MWPM decoder in [4], but none of these are publicly available to our best knowledge. Higgott implemented a fast but approximate MWPM decoder (namely "local matching") with roughly $O(N)$ complexity in [5]. With recent experiments of successful QEC on real hardware, it's time for a fast and accurate MWPM decoder to become available to the community.

Our idea comes from our study on the Union-Find (UF) decoder [6]. UF decoder is a fast decoder with $O(N)$ worst-case time complexity, at the cost of being less accurate compared to the MWPM decoder. Inspired by the Fowler's diagram [3], we found a relationship between the UF decoder [7]. This nice animation (press space to trigger animation) could help people see the analogy between UF and MWPM decoders. With this interpretation, we're able to combind the strength of UF and MWPM decoders together.

  • From the UF decoder, we learnt to use a sparse decoding graph representation for fast speed
  • From the MWPM decoder, we learnt to find an exact minimum-weight perfect matching for high accuracy

We shall note that Oscar Higgott and Craig Gidney independently reach the same approach of using sparse decoding graph to solve MWPM more efficiently in PyMatching V2 [8]. It's impressive to see that they have much better single-thread performance than fusion-blossom (about 5x-20x speedup). They use more optimizations like priority queue that appears in existing literature of blossom algorithms (check their paper at [9]). Also, they use C++ language rather than Rust programming language. Rust enforces memory safety which adds some runtime overhead (like vector boundary check and unnecessary locks in parallel programming). We use unsafe Rust to improve the speed by 1.5x-3x but these features are not yet enabled in the Python library and only available when using the native Rust library. In fact, PyMatching V2's optimizations and fusion-blossom's parallel computation (fusion operation) are compatible with each other. With their impressive work, we're looking forward to another 5x-20x speed of the parallel MWPM decoder if combined together. For now, user should use PyMatching for better CPU efficiency in large-scale simulations, and use fusion-blossom library as an educational tool with visualization tool and entry-level tutorials.

Demo

We highly suggest you watch through several demos here to get a sense of how the algorithm works. All our demos are captured from real algorithm execution. In fact, we're showing you the visualized debugger tool we wrote for fusion blossom. The demo is a 3D website and you can control the view point as you like.

Click the demo image below to view the corresponding demo

Serial Execution

Parallel Execution (Shown in Serial For Better Visual)

Usage

Our code is written in Rust programming language for speed and memory safety, but it's hardly an easy language to learn. To make the decoder more accessible, we bind the library to Python and user can simply install the library using pip3 install fusion-blossom.

We have several Python demos at the tutorial website . Also there is a simple example for decoder, and you can run it by cloning the project and run python3 scripts/demo.py.

For parallel solver, it needs user to provide a partition strategy. Please check our paper for a thorough description of how partition works.

Interface

Sparse Decoding Graph and Integer Weights

The weights in QEC decoding graph are computed by taking the log of error probability, e.g. $w_e = \log{(1-p)/p}$ or roughly $w_e = -\log{p}$, we can safely use integers to save weights by e.g. multiplying the weights by 1e6 and truncate to nearest integer. In this way, the truncation error $\Delta w_e = 1$ of integer weights corresponds to relative error $\Delta p /{p}=10^{-6}$ which is small enough. Suppose physical error rate $p$ is in the range of a positive f64 variable (2.2e-308 to 1), the maximum weight is 7e7,which is well below the maximum value of a u32 variable (4.3e9). Since weights only sum up in our algorithm (no multiplication), u32 is large enough and accurate enough. By default we use usize which is platform dependent (usually 64 bits), but you can

We use integer also for ease of migrating to FPGA implementation. In order to fit more vertices into a single FPGA, it's necessary to reduce the resource usage for each vertex. Integers are much cheaper than floating-point numbers, and also it allows flexible trade-off between resource usage and accuracy, e.g. if all weights are equal, we can simply use a 2 bit integer.

Note that other libraries of MWPM solver like Blossom V also default to integer weights as well. Although one can change the macro to use floating-point weights, it's not recommended because "the code may even get stuck due to rounding errors".

Tests

In order to test the correctness of our MWPM solver, we need a ground-truth MWPM solver. Blossom V is widely-used in existing MWPM decoders, but according to the license we cannot embed it in this library.To run the test cases with ground truth comparison or enable the functions like blossom_v_mwpm, you need to download this library at this website to a folder named blossomV at the root directory of this git repo.

wget -c https://pub.ist.ac.at/~vnk/software/blossom5-v2.05.src.tar.gz -O - | tar -xz
cp -r blossom5-v2.05.src/* blossomV/
rm -r blossom5-v2.05.src

Visualization

Visualize the solving procedure of a single decoding problem

To start a server, run the following

cd visualize
python3 server.py  # server

# to view it in a browser interactively, you can run the following command to get url
cd ..
cargo test visualize_paper_weighted_union_find_decoder -- --nocapture

# alternatively, you can choose to render locally
npm install  # to download packages
node index.js <url> <width> <height>  # local render

Visualize parallel execution of multiple decoding problems

In order to understand the bottleneck of parallel execution, we wrote a visualization tool to display the execution windows of base partitions and fusion operations on multiple threads. Fusion operation only scales with the size of the fusion boundary and the depth of active partitions, irrelevant to the base partition's size. We study different partition and fusion strategies in our paper. Below shows the parallel execution on 64 threads. Blue blocks are base partitions, each consists of 49 rounds of measurement. Green blocks are fusion operations, a single round of measurement sandwiched by two neighbor base partitions. You can click the image which jumps to this interactive visualization tool.

TODOs

  • support erasures in parallel solver

Acknowledgements

This project is funded by NSF MRI: Development of PARAGON: Control Instrument for Post NISQ Quantum Computing

References

[1] Fowler, Austin G., et al. "Topological code autotune." Physical Review X 2.4 (2012): 041003.

[2] Criger, Ben, and Imran Ashraf. "Multi-path summation for decoding 2D topological codes." Quantum 2 (2018): 102.

[3] Fowler, Austin G., Adam C. Whiteside, and Lloyd CL Hollenberg. "Towards practical classical processing for the surface code: timing analysis." Physical Review A 86.4 (2012): 042313.

[4] Fowler, Austin G. "Minimum weight perfect matching of fault-tolerant topological quantum error correction in average $ O (1) $ parallel time." arXiv preprint arXiv:1307.1740 (2013).

[5] Higgott, Oscar. "PyMatching: A Python package for decoding quantum codes with minimum-weight perfect matching." ACM Transactions on Quantum Computing 3.3 (2022): 1-16.

[6] Delfosse, Nicolas, and Naomi H. Nickerson. "Almost-linear time decoding algorithm for topological codes." Quantum 5 (2021): 595.

[7] Wu, Yue. APS 2022 March Meeting Talk "Interpretation of Union-Find Decoder on Weighted Graphs and Application to XZZX Surface Code". Slides: https://us.wuyue98.cn/aps2022, Video: https://youtu.be/BbhqUHKPdQk

[8] Higgott, Oscar. PyMatching v2 https://github.com/oscarhiggott/PyMatching

[9] Higgott, Oscar, and Craig Gidney. "Sparse Blossom: correcting a million errors per core second with minimum-weight matching." arXiv preprint arXiv:2303.15933 (2023).

fusion-blossom's People

Contributors

atomgardner avatar yangliuwillow avatar yuewuo avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

fusion-blossom's Issues

Better visualization tool that supports Jupyter notebook

Currently the visualization tool needs a local server to host the exported visualization json file, and the user needs to open it in a separate browser tab. We have explored a way to embed the whole visualization tool using IPython and thus can use within Jupyter notebook. See https://github.com/yuewuo/jupyter-embedded-visualizer-example for such an example.

The goal is to migrate the visualization tool to the new technique, and thus enabling easy visualization when using jupyter notebook.

Missing field 'dynamic weights' in the generated .syndrome files

Hello,

When running the tests in ./benchmark/paper_fusion_circuit_level/decoding_latency_T, we encountered an error that says:

thread 'main' panicked at 'called Result::unwrap() on an Err value: Error("missing field dynamic_weights", line: 1, column: 275)', src\example_codes.rs:1200:94

It refers to the absence of a 'dynamic weights' field in the generated ".syndrome" file. Therefore, we manually added the 'dynamic weights' part in these files and leave the content empty. Eventually these tests run successfully, and we obtain latency results close to the results in your paper. But we are still curious if the way we change the file affects the final results.

It would be great if you could check :>

understanding the accuracy implication of `max_tree_size`

As explain in [1,2], the only difference between the MWPM decoder and the Union-Find decoder is whether they maintain detailed structures within each cluster. Recently, I added a parameter called max_tree_size which allows one to control how much detail they want to preserve inside each cluster. When max_tree_size = 0, it's equivalent to UF decoder; when max_tree_size = infinity, it's equivalent to MWPM decoder. This parameter actually generates a full spectrum of decoders between MWPM and UF decoder.

Now, the question is, what parameter should I choose to reach the same decoding accuracy of MWPM? For example, if the logical error rate of MWPM decoder is $p_M$ and the logical error rate of UF decoder is $p_U$, generally $p_U &gt; p_M$. By tuning this parameters, we can create a decoder with logical error rate of $p_L$ which has $P_U \ge p_L \ge p_M$. Say, if we want to reach at most 20% more logical error rates in exchange for faster decoding, what max_tree_size should I choose? That is, what is the minimum max_tree_size value that satisfies $p_L \le p_M \times 1.2$?

Furthermore, does this parameter changes with the size of the code? For example, do we need to increase the minimum max_tree_size with code distance $d$?

It is your choice to start with any code, and it's quite easy to benchmark fusion blossom logical error rate using QEC-Playground. For example, you can run this command in QEC-Playground to obtain the logical error rate:

# for parameters:
cargo run --release -- tool benchmark --help
# code capacity noise model: 2D graph
cargo run --release -- tool benchmark '[5]' '[0]' '[0.05]' --decoder fusion -p10
cargo run --release -- tool benchmark '[5]' '[0]' '[0.05]' --decoder fusion --decoder-config '{"max_tree_size":0}' -p10
cargo run --release -- tool benchmark '[5]' '[0]' '[0.05]' --decoder fusion --decoder-config '{"max_tree_size":3}' -p10
# circuit-level noise model (the same with `stim`)
cargo run --release -- tool benchmark '[5]' '[5]' '[0.005]' --noise-model stim-noise-model --decoder fusion -p10
cargo run --release -- tool benchmark '[5]' '[5]' '[0.005]' --noise-model stim-noise-model --decoder fusion --decoder-config '{"max_tree_size":0}' -p10

[1] Wu, Yue, and Lin Zhong. "Fusion blossom: Fast mwpm decoders for qec." 2023 IEEE International Conference on Quantum Computing and Engineering (QCE). Vol. 1. IEEE, 2023.
[2] Wu, Yue, Namitha Liyanage, and Lin Zhong. "An interpretation of union-find decoder on weighted graphs." arXiv preprint arXiv:2211.03288 (2022).

optimize dual module with priority queue

Sparse Blossom is another open-source implementation of MWPM decoder [1] that achieves faster single-core execution that Fusion Blossom [2]. The key ingredient of their speedup is the use of priority queue to maintain the future events so that we don't have to iterate over each cluster to check if an "obstacle" occurs. This speedup is especially important when the weights in the graph is diverse, e.g., in complex noise models or running behind a belief-propagation decoder (aka BP-MWPM or belief-matching decoder).

Can we combine these two papers to have a decoder that runs fastest in single core as well as providing parallel decoding? As a first step, let's optimize the dual module to embed priority queue optimization. This should be another dual module implementation called dual_module_pq.rs as an alternative to our existing implementation of dual_module_serial.rs. No parallel support is needed for now, but it's good to have that in mind.

[1] Higgott, Oscar, and Craig Gidney. "Sparse Blossom: correcting a million errors per core second with minimum-weight matching." arXiv preprint arXiv:2303.15933 (2023).
[2] Wu, Yue, and Lin Zhong. "Fusion blossom: Fast mwpm decoders for qec." 2023 IEEE International Conference on Quantum Computing and Engineering (QCE). Vol. 1. IEEE, 2023.

documentation of configuring parallel parameters

The tutorial documentation of parallel computing is missing. Write the documentation of configuration graph partitions and fusion plans according to the descriptions in the paper [1]. The documentation should be placed in tutorial/src.

Note that in order to improve speed, it's required that each partition contains the vertices with continuous indices. This is always achievable by reordering vertices. By designing a better interface in Python, maybe we could hide this complexity, by always letting user to describe partitions as a collection of vertices instead of a continuous indices region?

[1] Wu, Yue, and Lin Zhong. "Fusion blossom: Fast mwpm decoders for qec." 2023 IEEE International Conference on Quantum Computing and Engineering (QCE). Vol. 1. IEEE, 2023.

Rust programming: fix build error in latest rust compiler

In fusion blossom implementation, we use an optional feature dangerous_pointer to enable raw pointer in the graph instead of RwLock to improve the performance (by almost 2x). However, compiling the code with the latest Rust compiler will give the following error.

error: casting `&T` to `&mut T` is undefined behavior, even if the reference is unused, consider instead using an `UnsafeCell`

This could lead to undefined behavior, so please fix this with an implementation using UnsafeCell. For details please check https://doc.rust-lang.org/book/ch15-05-interior-mutability.html and https://doc.rust-lang.org/std/cell/struct.UnsafeCell.html.

To test your modification, modify rust-toolchain.toml to channel = "nightly" (removing the version). Also you need to install the latest Rust compiler by running rustup update.

Rust crate

Hello, I was wondering if a rust crate is accessible (on crates.io).

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.