This is already a lot, but not everything; I'm posting it to kickstart the discussion.
Definition
In quantum compilation, the term "decomposition" is quite overloaded since many techniques and transformations can fit the idea of "breaking down" (transforming) one gate into a sequence of other gates. Hence, it will be necessary for us to pinpoint what we mean by "decomposition" in the scope of the compiler. One tentative definition is the following:
We define "decomposition" as systematically breaking down high-level gates into a series of lower-level ones. A decomposition technique builds this lower-level sequence by applying some construction rule(s). We emphasize "systematic" because this characteristic differentiates decomposition from other transformation techniques. For example, compared to synthesis, decomposition techniques should be faster and produce predictable results, i.e., we know the number of operations and the number of qubits in the result in advance.
Note that this definition still has some loose ends; specifically, it needs more clarity about which gates are considered high-level and low-level. As we will see next, such clarification requires more context.
Role in compilation
While compiling a technology-independent quantum program, high-level operations must often be broken down into lower-level ones. For example, transforming a two-control Z operation into a sequence of CNOTs and Ts:
┌───┐
───●──── ──────────────●───────────────────●──────●─────────●──┤ T ├
│ │ │ │ │ └───┘
│ │ │ ┌─┴─┐┌───┐┌─┴─┐┌───┐
───●─── = ────●─────────┼─────────●─────────┼────┤ X ├┤ ┴ ├┤ X ├┤ T ├
│ │ │ │ │ └───┘└───┘└───┘└───┘
┌─┴─┐ ┌─┴─┐┌───┐┌─┴─┐┌───┐┌─┴─┐┌───┐┌─┴─┐ ┌───┐
─┤ z ├─ ──┤ X ├┤ ┴ ├┤ X ├┤ T ├┤ X ├┤ ┴ ├┤ X ├─────────────────┤ T ├
└───┘ └───┘└───┘└───┘└───┘└───┘└───┘└───┘ └───┘
(Note: ┴ denotes the adjoint of T.)
We call the above a decomposition pattern; applying it to a circuit breaks down high-level operations into sequences of low-level ones. (This is always the case when breaking down multi-control operations.) However, what is considered high-level (or low-level) changes depending on the target device. For example, suppose we want to run a program on two different devices. In one of these devices, CNOT is a primitive operation, while in the other it is not, but CZ is. Hence, for the first device, we need to decompose CZ but not CNOT, while for the second it is the other way around. We could use the following patterns:
CZ decomposition:

───●───  =  ────────●────────
   │                │
 ┌─┴─┐       ┌───┐┌─┴─┐┌───┐
─┤ Z ├─     ─┤ H ├┤ X ├┤ H ├─
 └───┘       └───┘└───┘└───┘

CNOT decomposition:

───●───  =  ────────●────────
   │                │
 ┌─┴─┐       ┌───┐┌─┴─┐┌───┐
─┤ X ├─     ─┤ H ├┤ Z ├┤ H ├─
 └───┘       └───┘└───┘└───┘
Ultimately, the compilation process must implement the program using only the primitive unitary operators supported by the specific quantum device. In this sense, the compiler cannot blindly decompose operations; the process must move towards these primitive operations and, hopefully, not get lost or enter a loop. For example, if we were decomposing a two-control Z operation (CCZ) while targeting the second device, we could first apply the pattern for CCZ and then use the pattern for CZ. In other words, we compose decomposition patterns: applying one pattern creates a sequence of gates that might itself be further decomposed. Note that we would enter a loop if we tried to apply both the CZ and CNOT decomposition patterns.
Before moving on, let's further suppose that H is not a primitive operation in these devices, but Rx is. We would need to transform these Hadamard gates into Rx ones. This, however, is a 1-to-1 transformation and doesn't fit our decomposition definition because it doesn't "break down" an operation.
Implementation
This RFC discusses two alternative ways of implementing decomposition in our compilation flow. However, these approaches might not be mutually exclusive; they could be complementary.
Quake or QTX
First, we should address "where" decomposition must happen in the compilation flow. The compiler provides two dialects to represent quantum programs, Quake and QTX, and both have a similar level of abstraction regarding quantum operations. For example, both can represent an X operation with an arbitrary number of controls. The difference lies elsewhere: Quake uses memory (reference) semantics and thus is not an SSA representation, while QTX is (in fact, QTX has a few quirks concerning arrays to keep them in SSA form). The following example shows this difference:
Quake:

func.func @foo() {
  %q0 = quake.alloca : !quake.qref
  %q1 = quake.alloca : !quake.qref
  quake.h %q0
  quake.x [%q0] %q1
  ...
}

QTX:

qtx.circuit @foo() {
  %w0_0 = qtx.alloca : !qtx.wire
  %w1_0 = qtx.alloca : !qtx.wire
  %w0_1 = qtx.h %w0_0 : !qtx.wire
  %w1_1 = qtx.x [%w0_1] %w1_0 : !qtx.wire
  ...
}
In Quake, quantum operations do not create new values, so both X and H act on the same qubit reference %q0. If Quake were SSA, this would imply that changing the order of X and H would not affect the result, which is not the case. In QTX, quantum operations return new values for their targets but not for their controls; this aligns with the definition of SSA and shows that the order of two operations that share control wires, but not targets, can be changed without affecting the result.
The question remains:
- Should we decompose in Quake or in QTX? (TLDR: most likely QTX)
Doing decomposition in Quake is more straightforward, as the discussion above suggests. Indeed, we would rewrite the operation using a specific decomposition pattern, and we can more or less directly rely on existing MLIR infrastructure to do it.
Decomposition in QTX is more complicated: a simple pattern such as the "CZ" one wouldn't be problematic since it only uses one target and produces one new value. However, the "CCZ" pattern complicates things: when using this pattern, we substitute an operation that generates one new value with a sequence of operations that creates three, because the controls become targets. This means that when decomposing a CCZ, say %new_t = z [%c0, %c1] %t, we need to change every use of %c0 and %c1 that comes after this operation, but not the ones that come before. (Unfortunately, MLIR does not seem to be built to handle this sort of rewriting, so it requires a bit more work.)
Despite the complications, I think decomposition should still happen in QTX. The reasoning is that decompositions do not occur in isolation, and we should retain high-level operations until optimization passes have run. These optimization passes are more likely to exist in QTX: since data dependencies are explicit in the program structure, the dialect gives specific optimizations a great advantage, and it is therefore the dialect used for optimization. The more likely compilation flow thus looks like this:
- AST to Quake
- Quake to QTX
- Optimize
- Decompose
- Optimize more
If we decompose too early, we will hinder the ability to optimize high-level operations: once they are decomposed, it is less likely that the compiler will be able to optimize them out completely.
Alternatively, we could keep converting between Quake and QTX:
- AST to Quake
- Quake to QTX
- Optimize
- QTX to Quake
- Decompose
- Quake to QTX
- Optimize more
To consider this idea, we need to further investigate how costly this conversion is and whether it can scale. An interesting advantage of doing decomposition in Quake is that we can keep short-circuiting the compiler when optimizations are turned off, since there would be no need to use QTX.
Compiler rewrite patterns or library function
Now we turn to whether we should implement decomposition using compiler rewrite patterns or library functions. I will use the following toy example to guide the discussion:
CUDA Quantum:

void foo() __qpu__ {
  cudaq::qubit c0, c1, t;
  x<cudaq::ctrl>(c0, c1, t);
}

QTX:

qtx.circuit @foo() {
  %c0 = qtx.alloca : !qtx.wire
  %c1 = qtx.alloca : !qtx.wire
  %t = qtx.alloca : !qtx.wire
  %t_0 = x [%c0, %c1] %t : [!qtx.wire, !qtx.wire] !qtx.wire
}
We want to decompose the CCX gate into a sequence of CXs and Ts. The first step could be decomposing CCX into CCZ and H, which would then be followed by the decomposition of CCZ.
───●──── ────────●────────
│ │
│ │
───●─── = ────────●────────
│ │
┌─┴─┐ ┌───┐┌─┴─┐┌───┐
─┤ x ├─ ─┤ H ├┤ Z ├┤ H ├─
└───┘ └───┘└───┘└───┘
Let's look at how a decomposition pass would differ in both cases. For reference, here is how the rewrite pattern and library function would look:
Rewrite pattern:

struct XOpDecomposition : public OpRewritePattern<qtx::XOp> {
  using OpRewritePattern<qtx::XOp>::OpRewritePattern;
  LogicalResult matchAndRewrite(
      qtx::XOp op, PatternRewriter &rewriter) const override {
    Location loc = op->getLoc();
    Value t = op.getTarget();
    // `createOp` is a helper that builds the operation and returns the
    // new value created for its target wire.
    t = createOp<qtx::HOp>(rewriter, loc, t);
    t = createOp<qtx::ZOp>(rewriter, loc, op.getControls(), t);
    t = createOp<qtx::HOp>(rewriter, loc, t);
    op.getResult(0).replaceAllUsesWith(t);
    rewriter.eraseOp(op);
    return success();
  }
};

Library function:

void x_decomp(cudaq::qubit &c0,
              cudaq::qubit &c1,
              cudaq::qubit &t) {
  h(t);
  z<cudaq::ctrl>(c0, c1, t);
  h(t);
}
(Note: I am assuming that decomposition is done on QTX. I will try to enumerate the differences if it were done in Quake; for one, the rewrite patterns would be simpler.)
Bird's eye view
The result of using a compiler rewriter pattern is straightforward. The decomposition pass will walk the IR and rewrite, in place, all operations that have a registered decomposition pattern:
qtx.circuit @foo() {
%c0 = qtx.alloca : !qtx.wire
%c1 = qtx.alloca : !qtx.wire
%t = qtx.alloca : !qtx.wire
%t_0 = h %t : !qtx.wire
%t_1 = z [%c0, %c1] %t_0 : [!qtx.wire, !qtx.wire] !qtx.wire
%t_2 = h %t_1 : !qtx.wire
}
When using library functions, the decomposition pass will instead walk the IR and substitute all operations that have a registered decomposition with a call to the library function that implements it:
qtx.circuit @x_decomp(%c0: !qtx.wire, %c1: !qtx.wire, %t: !qtx.wire) -> (!qtx.wire, !qtx.wire, !qtx.wire) {
%t_0 = h %t : !qtx.wire
%t_1 = z [%c0, %c1] %t_0 : [!qtx.wire, !qtx.wire] !qtx.wire
%t_2 = h %t_1 : !qtx.wire
return %c0, %c1, %t_2 : !qtx.wire, !qtx.wire, !qtx.wire
}
qtx.circuit @foo() {
%c0 = qtx.alloca : !qtx.wire
%c1 = qtx.alloca : !qtx.wire
%t = qtx.alloca : !qtx.wire
%c0_0, %c1_0, %t_0 = qtx.call @x_decomp(%c0, %c1, %t) : ...
}
In this simple example, an extra inlining step is necessary to get us to the same result as the solution based on rewrite patterns.
Details
The discussion until this point served the purpose of painting a broader picture of what decomposition is, its role in the compilation, and a bird's eye view of the two alternative implementations. At this point, these alternatives might seem similar. However, choosing one over the other will have nontrivial implications for the compiler and the compilation process.
I will assume that using the rewrite patterns is the baseline ("natural") way of implementing decomposition. This is because the rewrite patterns and library functions will look similar (minus the MLIR boilerplate, which we can minimize). Hence, from the point of view of someone implementing decomposition patterns, there is little difference (code-wise). Also, we already have decomposition implemented as rewrite patterns, so I presume the implications of this method are understood and will focus on the details of using library functions.
The main disadvantage of the rewrite patterns method is that adding or changing decompositions requires changing the compiler. This disadvantage primarily motivates the use of library functions.
The first couple of questions that arise for using library functions are:
- Where will the decomposition functions live?
- How do we inject these functions into the translation unit?
We could place these functions in a header file, say cudaq/decomposition.h, that would be included in the cudaq header, guaranteeing that the decomposition functions are available to the compiler. The downside is that the compiler will do the work of parsing and compiling all decomposition functions in every translation unit. Also, note that the compiler will need to know the names of these functions, so adding new decomposition patterns would still require changing the compiler unless we pass these names as configuration options.
Once we settle these questions, we turn to what will happen during a compilation that relies on library functions. The careful reader might have noticed that in our previous example, the rewrite pattern handles X operations with an arbitrary number of controls, while the library function is limited to operations with exactly two controls. We can fix this by changing the library function:
void x_decomp(cudaq::qreg &controls, cudaq::qubit &t) {
h(t);
z<cudaq::ctrl>(controls, t);
h(t);
}
This new function complicates decomposition in QTX because the number of controls is only known at decomposition time. Specifically, the compiler won't be able to lower this function to QTX, so before doing decomposition, we will have something like this:
func.func @x_decomp(%controls: !quake.vec<?>, %t: !quake.qref) {
h %t
z [%controls] %t
h %t
}
qtx.circuit @foo() {
%c0 = qtx.alloca : !qtx.wire
%c1 = qtx.alloca : !qtx.wire
%t = qtx.alloca : !qtx.wire
%t_0 = x [%c0, %c1] %t : [!qtx.wire, !qtx.wire] !qtx.wire
}
Now, the compiler needs to deal with the following problems:
- The user code uses separate qubits as controls, not a vector (or array in the case of QTX).
- The compiler must specialize the function for the appropriate number of controls. (This would be less of a problem if decomposition was done using Quake.)
After dealing with these issues, we are still left with the problem that likely burdens the compiler the most. When using a library function, the decomposition process does not straightforwardly generate the sequence of operations that implements the gate; instead, it injects a function that, if executed, would do so. We then rely on the compiler's optimization capabilities to recover this sequence. In our toy example, this was a simple inlining of the function; however, not all decompositions are this simple.
Summary
It seems to me that using rewrite patterns is a better solution. We already have simple decompositions implemented using them, and we understand the technical implications. Furthermore, from a compiler developer perspective, the difference between writing a pattern and a decomposition function seems minimal, so there is little gain. (This assumes all infrastructure required to support using library functions for decomposition is done.) From a user perspective, library functions could enable an easy way to change decompositions without changing the compiler, but what percentage of users would want to do that? And since there is an extra cost for allowing this, are we willing to impose it on all users?