Giter Site home page Giter Site logo

regalloc2's Introduction

regalloc2: another register allocator

This is a register allocator that started life as, and is about 50% still, a port of IonMonkey's backtracking register allocator to Rust. In many regards, it has been generalized, optimized, and improved since the initial port.

In addition, it contains substantial amounts of testing infrastructure (fuzzing harnesses and checkers) that does not exist in the original IonMonkey allocator.

See the design overview for (much!) more detail on how the allocator works.

License

This crate is licensed under the Apache 2.0 License with LLVM Exception. This license text can be found in the file LICENSE.

Parts of the code are derived from regalloc.rs: in particular, src/checker.rs and src/domtree.rs. This crate has the same license as regalloc.rs, so the license on these files does not differ.

regalloc2's People

Contributors

a1phyr avatar afonso360 avatar alexcrichton avatar amanieu avatar bnjbvr avatar cfallin avatar crockagile avatar elliottt avatar felixonmars avatar fitzgen avatar jameysharp avatar johan-mi avatar lengyijun avatar numas13 avatar teymour-aldridge avatar zhiqiangxu avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

regalloc2's Issues

Adapt regalloc2-tool to work with creduce

The fuzzing strategy we use for regalloc2 doesn't minimize well, which can make isolating a failure somewhat tedious. If the textual format emitted by regalloc2-tool looked enough like c to be used with creduce, we could write a script to automate minimization by using creduce to shrink the input.

Limit operand to a set of physical registers

It seems like there are currently two options when expressing which registers an Operand can use:

  • use reg_def/reg_use/..., in this case the operand can be allocated to any physical register.
  • use reg_fixed_def/reg_fixed_use, in this case the operand can only be a single physical register.

This does not seem to cover the case where only of a limited set of physical registers can be used as an operand.

This is often the case on x86 (32-bit), where some of the general purpose registers don't expose their lower 8 bits as a pseudoregister. This means that setcc and other instructions dealing with a byte can only use the first 4 registers, not eg. esi.

My current workaround is to always use fixed operands for these instructions, but that is a bit too limiting. The other solution is to completely remove the additional registers from the MachineEnv, but that has even bigger drawbacks. Is there a better solution? It seems none of the ISAs supported by wasmtime have this issue, so I can't steal any ideas there.

looks like a alias bug!


TRACE - Parsed run command: run: %atomic_cas_little_i16(305419896, 2, 4660, -21555) == -1412606344
TRACE - ABI: func signature Signature { params: [AbiParam { value_type: types::I32, purpose: Normal, extension: None, legalized_to_pointer: false }, AbiParam { value_type: types::I64, purpose: Normal, extension: None, legalized_to_pointer: false }, AbiParam { value_type: types::I16, purpose: Normal, extension: None, legalized_to_pointer: false }, AbiParam { value_type: types::I16, purpose: Normal, extension: None, legalized_to_pointer: false }], returns: [AbiParam { value_type: types::I32, purpose: Normal, extension: None, legalized_to_pointer: false }], call_conv: WindowsFastcall }
TRACE - ABISig: sig Signature { params: [AbiParam { value_type: types::I32, purpose: Normal, extension: None, legalized_to_pointer: false }, AbiParam { value_type: types::I64, purpose: Normal, extension: None, legalized_to_pointer: false }, AbiParam { value_type: types::I16, purpose: Normal, extension: None, legalized_to_pointer: false }, AbiParam { value_type: types::I16, purpose: Normal, extension: None, legalized_to_pointer: false }], returns: [AbiParam { value_type: types::I32, purpose: Normal, extension: None, legalized_to_pointer: false }], call_conv: WindowsFastcall } => args = [Slots { slots: [Reg { reg: p10i, ty: types::I32, extension: None }], purpose: Normal }, Slots { slots: [Reg { reg: p11i, ty: types::I64, extension: None }], purpose: Normal }, Slots { slots: [Reg { reg: p12i, ty: types::I16, extension: None }], purpose: Normal }, Slots { slots: [Reg { reg: p13i, ty: types::I16, extension: None }], purpose: Normal }] rets = [Slots { slots: [Reg { reg: p10i, ty: types::I32, extension: None }], purpose: Normal }] arg stack = 0 ret stack = 0 stack_ret_arg = None
TRACE - BlockLoweringOrder: function body function %atomic_cas_littl(i32, i64, i16, i16) -> i32 windows_fastcall {
    ss0 = explicit_slot 4

block0(v0: i32, v1: i64, v2: i16, v3: i16):
    v4 = stack_addr.i64 ss0
    store little v0, v4
    v5 = iadd v4, v1
    v6 = atomic_cas v5, v2, v3
    v7 = load.i32 little v4
    return v7
}

TRACE - BlockLoweringOrder: BlockLoweringOrder { lowered_order: [Orig { block: block0 }], lowered_succs: [], lowered_succ_indices: [], lowered_succ_ranges: [(0, 0)], orig_map: SecondaryMap { elems: [Some(Block(0))], default: None, unused: PhantomData }, cold_blocks: {} }
TRACE - bb block0 param v0: regs ValueRegs { parts: [v128, v2097151] }
TRACE - bb block0 param v1: regs ValueRegs { parts: [v129, v2097151] }
TRACE - bb block0 param v2: regs ValueRegs { parts: [v130, v2097151] }
TRACE - bb block0 param v3: regs ValueRegs { parts: [v131, v2097151] }
TRACE - bb block0 inst inst0 (StackLoad { opcode: StackAddr, stack_slot: ss0, offset: Offset32(0) }): result v4 regs ValueRegs { parts: [v132, v2097151] }
TRACE - bb block0 inst inst2 (Binary { opcode: Iadd, args: [v4, v1] }): result v5 regs ValueRegs { parts: [v133, v2097151] }
TRACE - bb block0 inst inst3 (AtomicCas { opcode: AtomicCas, args: [v5, v2, v3], flags: MemFlags { bits: 8 } }): result v6 regs ValueRegs { parts: [v134, v2097151] }
TRACE - bb block0 inst inst4 (Load { opcode: Load, arg: v4, flags: MemFlags { bits: 8 }, offset: Offset32(0) }): result v7 regs ValueRegs { parts: [v135, v2097151] }
TRACE - retval gets regs ValueRegs { parts: [v136, v2097151] }
TRACE - bb block0 inst inst0 has color 1
TRACE - bb block0 inst inst1 has color 1
TRACE -  -> side-effecting; incrementing color for next inst
TRACE - bb block0 inst inst2 has color 2
TRACE - bb block0 inst inst3 has color 2
TRACE -  -> side-effecting; incrementing color for next inst
TRACE - bb block0 inst inst4 has color 3
TRACE -  -> side-effecting; incrementing color for next inst
TRACE - bb block0 inst inst5 has color 4
TRACE -  -> side-effecting; incrementing color for next inst
TRACE - arg v0 used, old state Unused, new Once
TRACE - arg v4 used, old state Unused, new Once
TRACE - arg v4 used, old state Once, new Multiple
TRACE -  -> pushing args for v4 onto stack
TRACE - arg v1 used, old state Unused, new Once
TRACE - arg v5 used, old state Unused, new Once
TRACE - arg v2 used, old state Unused, new Once
TRACE - arg v3 used, old state Unused, new Once
TRACE - arg v4 used, old state Multiple, new Multiple
TRACE - arg v7 used, old state Unused, new Once
DEBUG - timing: Starting VCode lowering, (during Processing test file)
TRACE - about to lower function: function %atomic_cas_littl(i32, i64, i16, i16) -> i32 windows_fastcall {
    ss0 = explicit_slot 4

block0(v0: i32, v1: i64, v2: i16, v3: i16):
    v4 = stack_addr.i64 ss0
    store little v0, v4
    v5 = iadd v4, v1
    v6 = atomic_cas v5, v2, v3
    v7 = load.i32 little v4
    return v7
}

TRACE - lower_clif_block: block block0 inst inst5 (MultiAry { opcode: Return, args: EntityList { index: 25, unused: PhantomData } }) is_branch false side_effect true value_needed false
TRACE - lowering: inst inst5: MultiAry { opcode: Return, args: EntityList { index: 25, unused: PhantomData } }
TRACE - get_input_for_val: val v7 at cur_inst Some(inst5) cur_scan_entry_color Some(InstColor(4))
TRACE -  -> src inst inst4
TRACE -  -> has lowering side effect: true
TRACE -  -> side-effecting op inst4 for val v7: use state Once
TRACE - put_value_in_regs: val v7
TRACE -  -> regs ValueRegs { parts: [v135, v2097151] }
TRACE - emit: Mov { rd: Writable { reg: v136 }, rm: v135, ty: types::I32 }
TRACE - emit: Mov { rd: Writable { reg: p10i }, rm: v136, ty: types::I32 }
TRACE - emit: Ret
TRACE - lower_clif_block: block block0 inst inst4 (Load { opcode: Load, arg: v4, flags: MemFlags { bits: 8 }, offset: Offset32(0) }) is_branch false side_effect true value_needed true
TRACE - lowering: inst inst4: Load { opcode: Load, arg: v4, flags: MemFlags { bits: 8 }, offset: Offset32(0) }
TRACE - get_input_for_val: val v4 at cur_inst Some(inst4) cur_scan_entry_color Some(InstColor(3))
TRACE -  -> src inst inst0
TRACE -  -> has lowering side effect: false
TRACE - put_value_in_regs: val v4
TRACE -  -> regs ValueRegs { parts: [v132, v2097151] }
TRACE - emit: Load { rd: Writable { reg: v135 }, op: Lw, flags: MemFlags { bits: 8 }, from: RegOffset(v132, 0, types::I64) }
TRACE - lower_clif_block: block block0 inst inst3 (AtomicCas { opcode: AtomicCas, args: [v5, v2, v3], flags: MemFlags { bits: 8 } }) is_branch false side_effect true value_needed false
TRACE - lowering: inst inst3: AtomicCas { opcode: AtomicCas, args: [v5, v2, v3], flags: MemFlags { bits: 8 } }
TRACE - put_value_in_regs: val v5
TRACE -  -> regs ValueRegs { parts: [v133, v2097151] }
TRACE - put_value_in_regs: val v2
TRACE -  -> regs ValueRegs { parts: [v130, v2097151] }
TRACE - put_value_in_regs: val v3
TRACE -  -> regs ValueRegs { parts: [v131, v2097151] }
TRACE - emit: AtomicCas { t0: Writable { reg: v137 }, dst: Writable { reg: v134 }, e: v130, addr: v133, v: v131, ty: types::I16 }
TRACE - lower_clif_block: block block0 inst inst2 (Binary { opcode: Iadd, args: [v4, v1] }) is_branch false side_effect false value_needed true
TRACE - lowering: inst inst2: Binary { opcode: Iadd, args: [v4, v1] }
TRACE - put_value_in_regs: val v4
TRACE -  -> regs ValueRegs { parts: [v132, v2097151] }
TRACE - put_value_in_regs: val v1
TRACE -  -> regs ValueRegs { parts: [v129, v2097151] }
TRACE - emit: AluRRR { alu_op: Add, rd: Writable { reg: v138 }, rs1: v132, rs2: v129 }
TRACE - set vreg alias: from v133 to v138
TRACE - lower_clif_block: block block0 inst inst1 (Store { opcode: Store, args: [v0, v4], flags: MemFlags { bits: 8 }, offset: Offset32(0) }) is_branch false side_effect true value_needed false
TRACE - lowering: inst inst1: Store { opcode: Store, args: [v0, v4], flags: MemFlags { bits: 8 }, offset: Offset32(0) }
TRACE - get_input_for_val: val v0 at cur_inst Some(inst1) cur_scan_entry_color Some(InstColor(1))
TRACE - put_value_in_regs: val v0
TRACE -  -> regs ValueRegs { parts: [v128, v2097151] }
TRACE - get_input_for_val: val v4 at cur_inst Some(inst1) cur_scan_entry_color Some(InstColor(1))
TRACE -  -> src inst inst0
TRACE -  -> has lowering side effect: false
TRACE - put_value_in_regs: val v4
TRACE -  -> regs ValueRegs { parts: [v132, v2097151] }
TRACE - emit: Store { to: RegOffset(v132, 0, types::I64), op: Sw, flags: MemFlags { bits: 8 }, src: v128 }
TRACE - lower_clif_block: block block0 inst inst0 (StackLoad { opcode: StackAddr, stack_slot: ss0, offset: Offset32(0) }) is_branch false side_effect false value_needed true
TRACE - lowering: inst inst0: StackLoad { opcode: StackAddr, stack_slot: ss0, offset: Offset32(0) }
TRACE - emit: LoadAddr { rd: Writable { reg: v132 }, mem: NominalSPOffset(0, types::I8) }
TRACE - gen_arg_setup: entry BB block0 args are:
[v0, v1, v2, v3]
TRACE - emit: Mov { rd: Writable { reg: v128 }, rm: p10i, ty: types::I32 }
TRACE - emit: Mov { rd: Writable { reg: v129 }, rm: p11i, ty: types::I64 }
TRACE - emit: Mov { rd: Writable { reg: v130 }, rm: p12i, ty: types::I16 }
TRACE - emit: Mov { rd: Writable { reg: v131 }, rm: p13i, ty: types::I16 }
TRACE - gen_retval_area_setup: not needed
TRACE - built vcode: VCode {
  Entry block: 0
  v133 := v138
Block 0:
    (original IR block: block0)
    (instruction range: 0 .. 12)
  Inst 0: mov v128,a0
  Inst 1: mov v129,a1
  Inst 2: mov v130,a2
  Inst 3: mov v131,a3
  Inst 4: load_addr v132,0(nominal_sp)
  Inst 5: sw v128,0(v132)
  Inst 6: add v138,v132,v129
  Inst 7: atomic_cas v134,v130,v131,(v133);; t0=v137
  Inst 8: lw v135,0(v132)
  Inst 9: mov v136,v135
  Inst 10: mov a0,v136
  Inst 11: ret
}

DEBUG - timing: Ending VCode lowering
TRACE - vcode from lowering:
VCode {
  Entry block: 0
  v133 := v138
Block 0:
    (original IR block: block0)
    (instruction range: 0 .. 12)
  Inst 0: mov v128,a0
  Inst 1: mov v129,a1
  Inst 2: mov v130,a2
  Inst 3: mov v131,a3
  Inst 4: load_addr v132,0(nominal_sp)
  Inst 5: sw v128,0(v132)
  Inst 6: add v138,v132,v129
  Inst 7: atomic_cas v134,v130,v131,(v133);; t0=v137
  Inst 8: lw v135,0(v132)
  Inst 9: mov v136,v135
  Inst 10: mov a0,v136
  Inst 11: ret
}

DEBUG - timing: Starting Register allocation, (during Processing test file)
INFO - === REGALLOC RESULTS ===
INFO - block0: [succs [] preds []]
INFO -   inst0-pre:  <<< start v2 in p2i (range10) (bundle4294967295)
INFO -   inst0-pre:  <<< start v11 in p11i (range13) (bundle4294967295)
INFO -   inst0-pre:  <<< start v12 in p12i (range12) (bundle4294967295)
INFO -   inst0-pre:  <<< start v13 in p13i (range11) (bundle4294967295)
INFO -   inst0-pre:  <<< start v128 in p10i (range9) (bundle0)
INFO -   inst0-pre:  <<< start v129 in p5i (range14) (bundle13)
INFO -   inst0-pre:  <<< start v130 in p28i (range15) (bundle11)
INFO -   inst0-pre:  <<< start v131 in p7i (range16) (bundle15)
INFO -   inst0: op Def: v128i reg [none], Use: v10i reg [none]
INFO -   inst0-post:      end   v129 in p5i (range14) (bundle13) >>>
INFO -   inst0-post:      end   v130 in p28i (range15) (bundle11) >>>
INFO -   inst0-post:      end   v131 in p7i (range16) (bundle15) >>>
INFO -   inst1-pre:      end   v11 in p11i (range13) (bundle4294967295) >>>
INFO -   inst1: op Def: v129i reg [none], Use: v11i reg [none]
INFO -   inst1-post:  <<< start v129 in p11i (range8) (bundle12)
INFO -   inst2-pre:      end   v12 in p12i (range12) (bundle4294967295) >>>
INFO -   inst2: op Def: v130i reg [none], Use: v12i reg [none]
INFO -   inst2-post:  <<< start v130 in p12i (range5) (bundle10)
INFO -   inst3-pre:      end   v13 in p13i (range11) (bundle4294967295) >>>
INFO -   inst3: op Def: v131i reg [none], Use: v13i reg [none]
INFO -   inst3-post:  <<< start v131 in p13i (range7) (bundle14)
INFO -   inst4: op Def: v132i reg [p15i], Use: v2i reg [p2i]
INFO -   inst4-post:      end   v2 in p2i (range10) (bundle4294967295) >>>
INFO -   inst4-post:  <<< start v132 in p15i (range2) (bundle4)
INFO -   inst5: op Use: v132i reg [p15i], Use: v128i reg [p10i]
INFO -   inst5-post:      end   v128 in p10i (range9) (bundle0) >>>
INFO -   inst6: op Def: v138i reg [p7i], Use: v132i reg [p15i], Use: v129i reg [p11i]
INFO -   inst6-post:      end   v129 in p11i (range8) (bundle12) >>>
INFO -   inst6-post:  <<< start v138 in p7i (range6) (bundle9)
INFO -   inst7: op Def: v137i reg [p7i], Def: v134i reg [p29i], Use: v130i reg [p12i], Use: v138i reg [p7i], Use: v131i reg [p13i]
INFO -   inst7-post:      end   v130 in p12i (range5) (bundle10) >>>
INFO -   inst7-post:      end   v131 in p13i (range7) (bundle14) >>>
INFO -   inst7-post:  <<< start v134 in p29i (range4) (bundle5)
INFO -   inst7-post:  <<< start v137 in p7i (range3) (bundle8)
INFO -   inst7-post:      end   v138 in p7i (range6) (bundle9) >>>
INFO -   inst8-pre:      end   v134 in p29i (range4) (bundle5) >>>
INFO -   inst8-pre:      end   v137 in p7i (range3) (bundle8) >>>
INFO -   inst8: op Use: v132i reg [p15i], Def: v135i reg [p10i]
INFO -   inst8-post:      end   v132 in p15i (range2) (bundle4) >>>
INFO -   inst8-post:  <<< start v135 in p10i (range1) (bundle6)
INFO -   inst9: op Def: v136i reg [none], Use: v135i reg [none]
INFO -   inst9-post:  prog-move v135 (Any) -> v136 (Any)
INFO -   inst10-pre:      end   v135 in p10i (range1) (bundle6) >>>
INFO -   inst10-pre:  <<< start v136 in p10i (range0) (bundle6)
INFO -   inst10: op Def: v10i reg [none], Use: v136i reg [none]
INFO -   inst11-pre:      end   v136 in p10i (range0) (bundle6) >>>
INFO -   inst11: ret
DEBUG - timing: Ending Register allocation
DEBUG - timing: Starting VCode emission, (during Processing test file)
TRACE - MachBuffer: first 1 labels are for blocks
TRACE - MachBuffer: next 0 labels are for constants
TRACE - emitting block Block(0)
TRACE -  -> entry block
TRACE - MachBuffer: put 32-bit word @ 0: ff810113
TRACE - MachBuffer: put 32-bit word @ 4: 813023
TRACE - MachBuffer: put 32-bit word @ 8: 16413
TRACE - MachBuffer: put 32-bit word @ 12: ff010113
TRACE - MachBuffer: bind label MachLabel(0) at offset 16
TRACE - enter optimize_branches:
 b = []
 l = [MachLabel(0)]
 f = []
TRACE - leave optimize_branches:
 b = []
 l = [MachLabel(0)]
 f = []
TRACE - MachBuffer: put 32-bit word @ 16: 10793
TRACE - MachBuffer: put 32-bit word @ 20: a7a023
TRACE - MachBuffer: put 32-bit word @ 24: b783b3
TRACE - MachBuffer: new label -> MachLabel(1)
TRACE - MachBuffer: new label -> MachLabel(2)
TRACE - MachBuffer: bind label MachLabel(2) at offset 28
TRACE - enter optimize_branches:
 b = []
 l = [MachLabel(2)]
 f = []
TRACE - leave optimize_branches:
 b = []
 l = [MachLabel(2)]
 f = []
TRACE - MachBuffer: put 32-bit word @ 28: 1403a3af
TRACE - MachBuffer: put data @ 32: len 4
TRACE - MachBuffer: use_label_at_offset: offset 36 label MachLabel(1) kind Jal20
TRACE - MachBuffer: put 32-bit word @ 36: 6f
TRACE - MachBuffer: put 32-bit word @ 40: 1ad3aeaf
TRACE - MachBuffer: use_label_at_offset: offset 44 label MachLabel(2) kind B12
TRACE - MachBuffer: put 32-bit word @ 44: de9063
TRACE - MachBuffer: bind label MachLabel(1) at offset 48
TRACE - enter optimize_branches:
 b = [MachBranch { start: 36, end: 40, target: MachLabel(1), fixup: 0, inverted: None, labels_at_this_branch: [] }, MachBranch { start: 44, end: 48, target: MachLabel(2), fixup: 1, inverted: Some([99, 128, 222, 0]), labels_at_this_branch: [] }]
 l = [MachLabel(1)]
 f = [MachLabelFixup { label: MachLabel(1), offset: 36, kind: Jal20 }, MachLabelFixup { label: MachLabel(2), offset: 44, kind: B12 }]
TRACE - optimize_branches: last branch MachBranch { start: 44, end: 48, target: MachLabel(2), fixup: 1, inverted: Some([99, 128, 222, 0]), labels_at_this_branch: [] } at off 48
TRACE - leave optimize_branches:
 b = [MachBranch { start: 36, end: 40, target: MachLabel(1), fixup: 0, inverted: None, labels_at_this_branch: [] }, MachBranch { start: 44, end: 48, target: MachLabel(2), fixup: 1, inverted: Some([99, 128, 222, 0]), labels_at_this_branch: [] }]
 l = [MachLabel(1)]
 f = [MachLabelFixup { label: MachLabel(1), offset: 36, kind: Jal20 }, MachLabelFixup { label: MachLabel(2), offset: 44, kind: B12 }]
TRACE - MachBuffer: put 32-bit word @ 48: 7a503
TRACE - Epilogue: [AjustSp { amount: 16 }, Load { rd: Writable { reg: p8i }, op: Ld, flags: MemFlags { bits: 3 }, from: SPOffset(0, types::I64) }, AjustSp { amount: 8 }, Ret]
TRACE - MachBuffer: put 32-bit word @ 52: 1010113
TRACE - MachBuffer: put 32-bit word @ 56: 13403
TRACE - MachBuffer: put 32-bit word @ 60: 810113
TRACE - MachBuffer: put 32-bit word @ 64: 8067
DEBUG - timing: Ending VCode emission
DEBUG - timing: Starting VCode emission finalization, (during Processing test file)
TRACE - emit_island: fixup MachLabelFixup { label: MachLabel(1), offset: 36, kind: Jal20 }
TRACE -  -> label_offset = 48, known, required = false (pos 2097150 neg 2097152)
TRACE - patching in-range!
TRACE - emit_island: fixup MachLabelFixup { label: MachLabel(2), offset: 44, kind: B12 }
TRACE -  -> label_offset = 28, known, required = false (pos 8190 neg 8192)
TRACE - patching in-range!
DEBUG - timing: Ending VCode emission finalization
INFO - compiler code:  addi sp,-8
  sd fp,0(sp)
  mov fp,sp
  addi sp,-16
block0:
  load_addr a5,0(nominal_sp)
  sw a0,0(a5)
  add t2,a5,a1
  atomic_cas t4,a2,a3,(t2);; t0=t2
  lw a0,0(a5)
  addi sp,16
  ld fp,0(sp)
  addi sp,8
  ret


the alias v133 := v138 cause v133 def and not use ,liferange been shorted to it's def,I have v133,c137,v138 go to the same register "t2" .

Investigate better bitpacking for Operand and Use

Two core data-structure elements, Operand and Use, are both designed to fit a relatively large amount of information in one u32. This is a performance optimization that we have found to be relatively impactful; expanding even to a u64 has a measurable impact (of at least a few percent) on compilation time.

Unfortunately, the scarcity of bits means that certain limits are lower than we would prefer. For example, we support only a 5-bit index for physical registers in each register class (so 32 integer registers and 32 float/vector registers), which may not be enough for some use-cases (though it can work for aarch64 and x64 at least). This also limits the VReg count to 1M (2^20).

We should investigate ways of, e.g., out-of-lining infrequently-used information (such as fixed-PReg constraints) to raise the limits on VRegs, PRegs, instruction count, and the like and provide enough headroom for any reasonably-imaginable use case.

Fuzzer test failure

While running the fuzzer overnight, I got this failure (reproduces on the main branch):

thread '<unnamed>' panicked at 'Could not allocate minimal bundle, but the allocation problem should be possible to solve', /home/amanieu/code/regalloc2/src/ion/process.rs:1018:17

The fuzz testcase which reproduces this issue is here (base64-encoded, use base64 -d to decode).

The input program is:

TestCase {
    func: {
      REF: v0
      REF: v2
      REF: v3
      REF: v4
      REF: v5
      REF: v6
      REF: v7
      REF: v8
      REF: v10
      REF: v14
      REF: v15
      REF: v49
      REF: v50
      REF: v51
      REF: v57
      REF: v58
      REF: v82
      REF: v83
      REF: v84
      REF: v85
      REF: v86
      REF: v87
      REF: v88
      REF: v89
      REF: v90
      REF: v91
      REF: v92
      REF: v95
      REF: v96
      REF: v97
      REF: v98
      REF: v99
      REF: v100
      REF: v101
      REF: v102
      REF: v105
      REF: v106
      REF: v107
      REF: v108
      REF: v109
      REF: v110
      REF: v111
      REF: v112
      REF: v113
      REF: v114
      REF: v115
      REF: v116
      REF: v117
      REF: v118
      REF: v119
      REF: v120
      REF: v121
      REF: v122
      REF: v123
      REF: v124
      REF: v126
      REF: v127
      REF: v128
      REF: v129
      REF: v130
      REF: v131
      REF: v132
      REF: v133
      REF: v134
      REF: v135
      REF: v136
      REF: v137
      REF: v138
      REF: v139
      REF: v140
      REF: v141
      REF: v142
      REF: v143
      REF: v144
      REF: v145
      REF: v146
      REF: v147
      REF: v153
      REF: v154
      REF: v155
      REF: v156
      REF: v157
      REF: v158
      REF: v159
      REF: v160
      REF: v170
      REF: v187
      REF: v188
      REF: v189
      REF: v190
      REF: v191
      REF: v209
      REF: v210
      REF: v211
      REF: v212
      REF: v213
      REF: v214
      REF: v215
      REF: v216
      REF: v217
      REF: v218
      REF: v219
      REF: v246
      REF: v247
      REF: v248
      REF: v249
      REF: v250
      REF: v251
      REF: v271
      REF: v272
      REF: v273
      REF: v274
      REF: v275
      block0(): # succs:[1] preds:[]
        inst0: Op ops:[Def: v0i any] clobber:[]
        inst1: Op ops:[Def: v1i any] clobber:[]
        inst2: Op ops:[Def: v2i any] clobber:[]
        inst3: Op ops:[Def: v3i any] clobber:[]
        inst4: Op ops:[Def: v4i any] clobber:[]
        inst5: Op ops:[Def: v5i any] clobber:[]
        inst6: Op ops:[Def: v6i any] clobber:[]
        inst7: Op ops:[Def: v7i any] clobber:[]
        inst8: Branch ops:[] clobber:[]
      block1(): # succs:[2] preds:[0]
        inst9: Op ops:[Def: v8i any] clobber:[]
        inst10: Op ops:[Def: v9i any] clobber:[]
        inst11: Op ops:[Def: v10i any] clobber:[]
        inst12: Op ops:[Def: v11i any] clobber:[]
        inst13: Op ops:[Def: v12i any] clobber:[]
        inst14: Op ops:[Def: v13i any] clobber:[]
        inst15: Op ops:[Def: v14i any, Use: v11i reg, Use: v11i reg, Use: v11i reg] clobber:[]
        inst16: Op ops:[Def: v15i reg] clobber:[]
        inst17: Branch ops:[] clobber:[]
      block2(): # succs:[3] preds:[1, 20]
        inst18: Op ops:[Def: v16i reuse(1), Use: v15i reg, Use: v9i reg, Use: v9i reg] clobber:[]
        -- SAFEPOINT --
        inst19: Op ops:[Def: v17i reuse(2), Use: v16i reg, Use: v16i reg, Use: v16i reg] clobber:[]
        -- SAFEPOINT --
        inst20: Op ops:[Def: v18i reuse(2), Use: v17i reg, Use: v17i reg, Use: v17i reg] clobber:[]
        inst21: Op ops:[Def@Early: v19i reg, Use: v17i any, Use: v4i any, Use: v4i any] clobber:[]
        inst22: Op ops:[Def: v20i any] clobber:[]
        inst23: Branch ops:[] clobber:[]
      block3(): # succs:[4] preds:[2, 17]
        -- SAFEPOINT --
        inst24: Op ops:[Def: v21i reuse(1), Use: v19i reg] clobber:[]
        -- SAFEPOINT --
        inst25: Op ops:[Def@Early: v22i reg] clobber:[]
        -- SAFEPOINT --
        inst26: Op ops:[Def: v23i reuse(1), Use: v22i reg] clobber:[]
        -- SAFEPOINT --
        inst27: Op ops:[Def: v24i reuse(1), Use: v23i reg] clobber:[]
        -- SAFEPOINT --
        inst28: Op ops:[Def: v25i reuse(1), Use: v22i reg] clobber:[]
        -- SAFEPOINT --
        inst29: Op ops:[Def: v26i reuse(1), Use: v22i reg] clobber:[]
        inst30: Op ops:[Def: v27i reuse(2), Use: v5i reg, Use: v24i reg] clobber:[]
        inst31: Branch ops:[] clobber:[]
      block4(): # succs:[5] preds:[3, 23]
        -- SAFEPOINT --
        inst32: Op ops:[Def: v28i reuse(1), Use: v21i reg] clobber:[]
        -- SAFEPOINT --
        inst33: Op ops:[Def: v29i reuse(1), Use: v28i reg] clobber:[]
        -- SAFEPOINT --
        inst34: Op ops:[Def: v30i reuse(1), Use: v29i reg] clobber:[]
        -- SAFEPOINT --
        inst35: Op ops:[Def: v31i reuse(2), Use: v29i reg, Use: v29i reg, Use: v29i reg] clobber:[]
        -- SAFEPOINT --
        inst36: Op ops:[Def: v32i reuse(2), Use: v31i reg, Use: v31i reg, Use: v31i reg] clobber:[]
        -- SAFEPOINT --
        inst37: Op ops:[Def: v33i reuse(1), Use: v29i reg, Use: v32i reg, Use: v31i any] clobber:[]
        -- SAFEPOINT --
        inst38: Op ops:[Def: v34i reuse(1), Use: v31i reg] clobber:[]
        inst39: Branch ops:[] clobber:[]
      block5(): # succs:[6, 7] preds:[4]
        -- SAFEPOINT --
        inst40: Op ops:[Def: v35i reuse(1), Use: v28i reg] clobber:[]
        -- SAFEPOINT --
        inst41: Op ops:[Def: v36i reuse(1), Use: v35i reg] clobber:[]
        -- SAFEPOINT --
        inst42: Op ops:[Def: v37i reuse(1), Use: v36i reg] clobber:[]
        -- SAFEPOINT --
        inst43: Op ops:[Def: v38i reuse(1), Use: v37i reg] clobber:[]
        -- SAFEPOINT --
        inst44: Op ops:[Def: v39i reuse(1), Use: v36i reg] clobber:[]
        -- SAFEPOINT --
        inst45: Op ops:[Def: v40i reuse(1), Use: v36i reg] clobber:[]
        -- SAFEPOINT --
        inst46: Op ops:[Def: v41i reuse(1), Use: v40i reg] clobber:[]
        inst47: Branch ops:[Use: v28i reg] clobber:[]
      block6(v45): # succs:[31] preds:[5]
        -- SAFEPOINT --
        inst48: Op ops:[Def: v42i reuse(1), Use: v35i reg] clobber:[]
        -- SAFEPOINT --
        inst49: Op ops:[Def: v43i reuse(1), Use: v45i reg] clobber:[]
        inst50: Op ops:[Def: v44i reuse(3), Use: v45i any, Use: v45i reg, Use: v45i reg] clobber:[]
        inst51: Op ops:[Def@Early: v46i reg, Use: v45i any, Use: v44i any, Use: v44i any] clobber:[]
        -- SAFEPOINT --
        inst52: Op ops:[Def: v47i reuse(1), Use: v46i reg] clobber:[]
        -- SAFEPOINT --
        inst53: Op ops:[Def: v48i reuse(1), Use: v44i reg] clobber:[]
        inst54: Branch ops:[] clobber:[]
      block7(): # succs:[8] preds:[5]
        inst55: Op ops:[Def: v49i reuse(1), Use: v35i reg] clobber:[]
        inst56: Op ops:[Def: v50i reg] clobber:[]
        inst57: Op ops:[Def@Early: v51i reg] clobber:[]
        inst58: Op ops:[Def: v52i reuse(3), Use: v51i any, Use: v50i reg, Use: v51i reg] clobber:[]
        inst59: Op ops:[Def@Early: v53i reg, Use: v49i fixed(p0i)] clobber:[]
        inst60: Op ops:[Def: v54i reuse(1), Use: v51i reg] clobber:[]
        inst61: Op ops:[Def@Early: v55i reg, Use: v52i any] clobber:[]
        inst62: Op ops:[Def: v56i reuse(1), Use: v49i reg] clobber:[]
        inst63: Branch ops:[] clobber:[]
      block8(): # succs:[9] preds:[7, 29]
        inst64: Op ops:[Def: v57i reuse(1), Use: v35i reg] clobber:[]
        inst65: Op ops:[Def: v58i reuse(1), Use: v57i reg] clobber:[]
        inst66: Op ops:[Def: v59i reuse(1), Use: v58i reg] clobber:[]
        -- SAFEPOINT --
        inst67: Op ops:[Def: v60i reuse(1), Use: v59i reg] clobber:[]
        inst68: Op ops:[Def: v61i reuse(1), Use: v58i reg] clobber:[]
        inst69: Op ops:[Def@Early: v62i reg, Use: v58i reg] clobber:[]
        inst70: Op ops:[Def: v63i reuse(1), Use: v60i reg] clobber:[]
        inst71: Op ops:[Def: v64i reuse(3), Use: v57i reg, Use: v63i reg, Use: v57i reg] clobber:[]
        inst72: Op ops:[Def: v65i reuse(1), Use: v57i reg, Use: v63i reg] clobber:[]
        -- SAFEPOINT --
        inst73: Op ops:[Def: v66i any] clobber:[]
        inst74: Branch ops:[] clobber:[]
      block9(): # succs:[10] preds:[8, 32]
        -- SAFEPOINT --
        inst75: Op ops:[Def: v67i reuse(1), Use: v35i reg] clobber:[]
        -- SAFEPOINT --
        inst76: Op ops:[Def: v68i reuse(1), Use: v67i reg] clobber:[]
        -- SAFEPOINT --
        inst77: Op ops:[Def: v69i reuse(1), Use: v68i reg] clobber:[]
        -- SAFEPOINT --
        inst78: Op ops:[Def@Early: v70i reg] clobber:[]
        -- SAFEPOINT --
        inst79: Op ops:[Def: v71i reuse(1), Use: v68i reg] clobber:[]
        inst80: Branch ops:[] clobber:[]
      block10(): # succs:[11, 12] preds:[9]
        -- SAFEPOINT --
        inst81: Op ops:[Def: v72i reuse(1), Use: v70i reg] clobber:[]
        -- SAFEPOINT --
        inst82: Op ops:[Def: v73i reuse(1), Use: v72i reg] clobber:[]
        inst83: Op ops:[Def: v74i reuse(1), Use: v73i reg] clobber:[]
        -- SAFEPOINT --
        inst84: Op ops:[Def: v75i reuse(1), Use: v74i reg] clobber:[]
        -- SAFEPOINT --
        inst85: Op ops:[Def: v76i reuse(1), Use: v73i reg] clobber:[]
        inst86: Branch ops:[Use: v67i reg, Use: v68i reg, Use: v67i reg] clobber:[]
      block11(): # succs:[31] preds:[10]
        -- SAFEPOINT --
        inst87: Op ops:[Def: v77i reuse(1), Use: v73i reg] clobber:[]
        -- SAFEPOINT --
        inst88: Op ops:[Def: v78i reg] clobber:[]
        inst89: Op ops:[Def: v79i reuse(1), Use: v78i reg] clobber:[]
        inst90: Op ops:[Def@Early: v80i any, Use: v77i reg, Use: v77i reg, Use: v77i any] clobber:[]
        inst91: Op ops:[Def: v81i any] clobber:[]
        inst92: Branch ops:[] clobber:[]
      block12(v85, v86, v88): # succs:[13] preds:[10]
        inst93: Op ops:[Def: v82i any] clobber:[]
        inst94: Op ops:[Def: v83i any] clobber:[]
        inst95: Op ops:[Def: v84i any] clobber:[]
        inst96: Op ops:[Def: v87i any] clobber:[]
        inst97: Op ops:[Def: v89i any] clobber:[]
        inst98: Op ops:[Def: v90i any] clobber:[]
        inst99: Op ops:[Def: v91i any] clobber:[]
        inst100: Op ops:[Def: v92i any] clobber:[]
        inst101: Op ops:[Def: v93i any] clobber:[]
        inst102: Op ops:[Def: v94i reuse(1), Use: v82i reg, Use: v91i reg, Use: v87i any] clobber:[]
        inst103: Branch ops:[] clobber:[]
      block13(): # succs:[14, 15] preds:[12]
        inst104: Op ops:[Def: v95i reuse(1), Use: v90i reg] clobber:[]
        inst105: Op ops:[Def: v96i reg] clobber:[]
        inst106: Op ops:[Def: v97i any] clobber:[]
        inst107: Op ops:[Def: v98i any] clobber:[PReg(hw = 3, class = Int, index = 3), PReg(hw = 0, class = Int, index = 0)]
        inst108: Op ops:[Def@Early: v99i reg] clobber:[]
        inst109: Op ops:[Def: v100i any] clobber:[]
        inst110: Op ops:[Def: v101i any] clobber:[]
        inst111: Op ops:[Def: v102i any] clobber:[]
        inst112: Branch ops:[] clobber:[]
      block14(): # succs:[31] preds:[13]
        inst113: Op ops:[Def: v103i any] clobber:[]
        inst114: Op ops:[Def: v104i any] clobber:[]
        inst115: Op ops:[Def: v105i any] clobber:[]
        inst116: Op ops:[Def: v106i any] clobber:[]
        inst117: Op ops:[Def: v107i any] clobber:[]
        inst118: Op ops:[Def: v108i any] clobber:[]
        inst119: Op ops:[Def: v109i any] clobber:[]
        inst120: Op ops:[Def: v110i any] clobber:[]
        inst121: Branch ops:[] clobber:[]
      block15(): # succs:[16] preds:[13]
        inst122: Op ops:[Def: v111i reuse(1), Use: v102i reg, Use: v96i reg, Use: v101i any] clobber:[]
        inst123: Op ops:[Def: v112i reuse(1), Use: v111i reg] clobber:[]
        inst124: Op ops:[Def@Early: v113i reg, Use: v112i reg, Use: v0i any, Use: v0i any] clobber:[]
        inst125: Op ops:[Def: v114i any] clobber:[]
        inst126: Op ops:[Def: v115i any] clobber:[]
        inst127: Op ops:[Def: v116i reuse(1), Use: v114i reg] clobber:[]
        inst128: Op ops:[Def@Early: v117i reg, Use: v114i reg] clobber:[]
        inst129: Op ops:[Def: v118i any, Use: v2i any, Use: v2i any] clobber:[]
        inst130: Branch ops:[] clobber:[]
      block16(): # succs:[17, 18] preds:[15]
        inst131: Op ops:[Def: v119i reuse(2), Use: v2i any, Use: v70i reg] clobber:[]
        inst132: Op ops:[Def: v120i reuse(1), Use: v119i reg] clobber:[]
        inst133: Op ops:[Def: v121i reuse(1), Use: v120i reg] clobber:[]
        inst134: Op ops:[Def: v122i any] clobber:[]
        inst135: Op ops:[Def: v123i any] clobber:[]
        inst136: Op ops:[Def@Early: v124i any] clobber:[]
        inst137: Op ops:[Def: v125i any] clobber:[]
        inst138: Op ops:[Def: v126i any] clobber:[]
        inst139: Branch ops:[Use: v123i reg] clobber:[]
      block17(v127): # succs:[3] preds:[16]
        inst140: Op ops:[Def: v128i any] clobber:[]
        inst141: Op ops:[Def: v129i any] clobber:[]
        inst142: Op ops:[Def: v130i any] clobber:[]
        inst143: Op ops:[Def: v131i any] clobber:[PReg(hw = 0, class = Int, index = 0)]
        inst144: Op ops:[Def: v132i reuse(1), Use: v127i reg, Use: v127i reg, Use: v127i reg] clobber:[]
        inst145: Op ops:[Def: v133i reuse(1), Use: v130i reg, Use: v130i reg, Use: v130i reg] clobber:[]
        inst146: Op ops:[Def: v134i any] clobber:[]
        inst147: Op ops:[Def: v135i any] clobber:[]
        inst148: Op ops:[Def: v136i any] clobber:[]
        inst149: Op ops:[Def: v137i any] clobber:[]
        inst150: Op ops:[Def: v138i any] clobber:[]
        inst151: Op ops:[Def: v139i reuse(1), Use: v128i reg] clobber:[]
        inst152: Branch ops:[] clobber:[]
      block18(): # succs:[19] preds:[16]
        inst153: Op ops:[Def: v140i reuse(1), Use: v120i reg] clobber:[]
        inst154: Op ops:[Def: v141i reuse(1), Use: v140i reg] clobber:[]
        inst155: Op ops:[Def: v142i reuse(1), Use: v141i reg] clobber:[]
        inst156: Op ops:[Def: v143i reuse(1), Use: v142i reg] clobber:[]
        inst157: Op ops:[Def: v144i reuse(1), Use: v141i reg] clobber:[]
        inst158: Op ops:[Def@Early: v145i reg, Use: v141i reg] clobber:[]
        inst159: Op ops:[Def: v146i reuse(2), Use: v145i reg, Use: v143i reg] clobber:[]
        inst160: Op ops:[Def: v147i reuse(1), Use: v140i reg] clobber:[]
        inst161: Branch ops:[] clobber:[]
      block19(): # succs:[20, 21] preds:[18, 38, 41]
        -- SAFEPOINT --
        inst162: Op ops:[Def@Early: v148i reg] clobber:[]
        -- SAFEPOINT --
        inst163: Op ops:[Def: v149i reuse(1), Use: v148i reg] clobber:[]
        -- SAFEPOINT --
        inst164: Op ops:[Def: v150i reuse(2), Use: v149i reg, Use: v149i reg, Use: v149i reg] clobber:[]
        -- SAFEPOINT --
        inst165: Op ops:[Def: v151i reuse(2), Use: v149i reg, Use: v149i reg, Use: v149i reg] clobber:[]
        -- SAFEPOINT --
        inst166: Op ops:[Def: v152i reuse(3), Use: v11i any, Use: v151i reg, Use: v149i reg] clobber:[]
        inst167: Branch ops:[] clobber:[]
      block20(): # succs:[2] preds:[19]
        inst168: Op ops:[Def@Early: v153i reg] clobber:[]
        inst169: Op ops:[Def: v154i reuse(1), Use: v153i reg] clobber:[]
        inst170: Op ops:[Def: v155i reuse(1), Use: v154i reg] clobber:[]
        inst171: Op ops:[Def: v156i reuse(1), Use: v155i reg] clobber:[]
        inst172: Op ops:[Def: v157i reuse(1), Use: v154i reg] clobber:[]
        inst173: Op ops:[Def: v158i reuse(1), Use: v154i reg] clobber:[]
        inst174: Op ops:[Def: v159i reuse(1), Use: v156i reg] clobber:[]
        inst175: Op ops:[Def: v160i reuse(1), Use: v153i reg] clobber:[]
        inst176: Branch ops:[] clobber:[]
      block21(): # succs:[22] preds:[19]
        -- SAFEPOINT --
        inst177: Op ops:[Def: v161i reuse(1), Use: v149i reg] clobber:[]
        -- SAFEPOINT --
        inst178: Op ops:[Def: v162i reuse(1), Use: v152i reg] clobber:[]
        -- SAFEPOINT --
        inst179: Op ops:[Def: v163i reuse(1), Use: v35i reg] clobber:[]
        -- SAFEPOINT --
        inst180: Op ops:[Def: v164i reg, Use: v161i reg, Use: v163i any, Use: v163i reg] clobber:[]
        -- SAFEPOINT --
        inst181: Op ops:[Def: v165i reuse(3), Use: v20i reg, Use: v162i reg, Use: v151i reg] clobber:[]
        -- SAFEPOINT --
        inst182: Op ops:[Def: v166i reuse(1), Use: v164i reg] clobber:[]
        -- SAFEPOINT --
        inst183: Op ops:[Def@Early: v167i reg] clobber:[]
        -- SAFEPOINT --
        inst184: Op ops:[Def: v168i reuse(1), Use: v161i reg] clobber:[]
        -- SAFEPOINT --
        inst185: Op ops:[Def: v169i reuse(1), Use: v162i reg] clobber:[]
        inst186: Op ops:[Def: v170i reuse(1), Use: v161i reg] clobber:[]
        -- SAFEPOINT --
        inst187: Op ops:[Def@Early: v171i reg] clobber:[]
        inst188: Branch ops:[] clobber:[]
      block22(): # succs:[23, 24] preds:[21, 35]
        -- SAFEPOINT --
        inst189: Op ops:[Def: v172i reuse(1), Use: v35i reg] clobber:[]
        -- SAFEPOINT --
        inst190: Op ops:[Def: v173i reuse(1), Use: v172i reg] clobber:[]
        -- SAFEPOINT --
        inst191: Op ops:[Def: v174i reuse(1), Use: v173i reg] clobber:[]
        -- SAFEPOINT --
        inst192: Op ops:[Def: v175i reg, Use: v172i fixed(p0i)] clobber:[]
        -- SAFEPOINT --
        inst193: Op ops:[Def: v176i reuse(1), Use: v173i reg] clobber:[]
        inst194: Branch ops:[] clobber:[]
      block23(): # succs:[4] preds:[22]
        -- SAFEPOINT --
        inst195: Op ops:[Def: v177i reuse(1), Use: v176i reg] clobber:[]
        inst196: Op ops:[Def@Early: v178i reg, Use: v177i reg, Use: v177i any, Use: v177i any] clobber:[]
        inst197: Op ops:[Def: v179i reuse(1), Use: v177i reg, Use: v178i any] clobber:[]
        -- SAFEPOINT --
        inst198: Op ops:[Def@Early: v180i any, Use: v177i any, Use: v177i fixed(p0i)] clobber:[]
        -- SAFEPOINT --
        inst199: Op ops:[Def: v181i reuse(1), Use: v178i reg] clobber:[]
        inst200: Branch ops:[] clobber:[]
      block24(): # succs:[25] preds:[22]
        -- SAFEPOINT --
        inst201: Op ops:[Def: v182i reuse(1), Use: v172i reg] clobber:[]
        -- SAFEPOINT --
        inst202: Op ops:[Def: v183i reuse(1), Use: v182i reg] clobber:[]
        -- SAFEPOINT --
        inst203: Op ops:[Def: v184i reuse(1), Use: v183i reg] clobber:[]
        -- SAFEPOINT --
        inst204: Op ops:[Def: v185i reuse(1), Use: v184i reg] clobber:[]
        -- SAFEPOINT --
        inst205: Op ops:[Def: v186i reuse(1), Use: v183i reg] clobber:[]
        inst206: Branch ops:[Use: v172i reg] clobber:[]
      block25(v187): # succs:[26, 27] preds:[24]
        inst207: Op ops:[Def: v188i reuse(3), Use: v9i reg, Use: v187i reg, Use: v187i reg] clobber:[]
        inst208: Op ops:[Def: v189i reuse(1), Use: v188i reg] clobber:[]
        inst209: Op ops:[Def: v190i reuse(1), Use: v189i reg] clobber:[]
        inst210: Op ops:[Def: v191i reuse(1), Use: v188i reg] clobber:[]
        inst211: Branch ops:[] clobber:[]
      block26(): # succs:[40] preds:[25]
        inst212: Op ops:[Def: v192i reuse(1), Use: v188i reg] clobber:[]
        -- SAFEPOINT --
        inst213: Op ops:[Def: v193i reuse(1), Use: v192i reg, Use: v192i reg, Use: v192i reg] clobber:[]
        -- SAFEPOINT --
        inst214: Op ops:[Def: v194i reuse(2), Use: v193i reg, Use: v193i reg, Use: v193i reg] clobber:[]
        -- SAFEPOINT --
        inst215: Op ops:[Def: v195i reuse(1), Use: v192i reg, Use: v192i reg, Use: v192i reg] clobber:[]
        -- SAFEPOINT --
        inst216: Op ops:[Def@Early: v196i reg, Use: v195i fixed(p0i), Use: v195i reg, Use: v195i reg] clobber:[]
        -- SAFEPOINT --
        inst217: Op ops:[Def: v197i any] clobber:[PReg(hw = 0, class = Int, index = 0)]
        -- SAFEPOINT --
        inst218: Op ops:[Def: v198i reuse(1), Use: v195i reg] clobber:[]
        inst219: Branch ops:[] clobber:[]
      block27(): # succs:[28] preds:[25]
        inst220: Op ops:[Def: v199i reuse(1), Use: v187i reg] clobber:[]
        -- SAFEPOINT --
        inst221: Op ops:[Def: v200i reuse(1), Use: v199i reg] clobber:[]
        -- SAFEPOINT --
        inst222: Op ops:[Def: v201i reuse(1), Use: v200i reg] clobber:[]
        -- SAFEPOINT --
        inst223: Op ops:[Def: v202i reuse(1), Use: v201i reg] clobber:[]
        -- SAFEPOINT --
        inst224: Op ops:[Def: v203i reuse(1), Use: v200i reg] clobber:[]
        inst225: Branch ops:[] clobber:[]
      block28(): # succs:[29, 30] preds:[27]
        inst226: Op ops:[Def: v204i reuse(1), Use: v200i reg] clobber:[]
        -- SAFEPOINT --
        inst227: Op ops:[Def: v205i reuse(2), Use: v204i any, Use: v204i reg] clobber:[]
        -- SAFEPOINT --
        inst228: Op ops:[Def: v206i reuse(1), Use: v205i reg, Use: v205i reg, Use: v205i reg] clobber:[]
        -- SAFEPOINT --
        inst229: Op ops:[Def: v207i reuse(3), Use: v204i reg, Use: v204i reg, Use: v205i reg] clobber:[]
        inst230: Op ops:[Def@Early: v208i reg, Use: v205i reg, Use: v205i reg] clobber:[PReg(hw = 1, class = Int, index = 1), PReg(hw = 23, class = Int, index = 23)]
        inst231: Branch ops:[] clobber:[]
      block29(): # succs:[8] preds:[28]
        inst232: Op ops:[Def: v209i any] clobber:[]
        inst233: Op ops:[Def: v210i reuse(1), Use: v209i reg] clobber:[]
        inst234: Op ops:[Def: v211i reuse(1), Use: v210i reg] clobber:[]
        inst235: Op ops:[Def: v212i reg] clobber:[]
        inst236: Op ops:[Def: v213i reuse(1), Use: v210i reg] clobber:[]
        inst237: Branch ops:[] clobber:[]
      block30(): # succs:[31] preds:[28]
        inst238: Op ops:[Def: v214i reuse(1), Use: v29i reg, Use: v189i reg, Use: v204i reg] clobber:[]
        inst239: Op ops:[Def: v215i reuse(1), Use: v214i reg] clobber:[]
        inst240: Op ops:[Def@Early: v216i reg, Use: v215i reg] clobber:[]
        inst241: Op ops:[Def: v217i any, Use: v215i any, Use: v0i any] clobber:[]
        inst242: Op ops:[Def: v218i reuse(1), Use: v215i reg] clobber:[]
        inst243: Op ops:[Def@Early: v219i reg, Use: v0i any] clobber:[]
        inst244: Op ops:[Def: v220i any] clobber:[]
        inst245: Branch ops:[] clobber:[]
      block31(): # succs:[32, 33] preds:[6, 11, 14, 30]
        inst246: Op ops:[Def: v221i any] clobber:[]
        inst247: Op ops:[Def: v222i any] clobber:[]
        inst248: Op ops:[Def: v223i reuse(1), Use: v222i reg] clobber:[]
        -- SAFEPOINT --
        inst249: Op ops:[Def: v224i reuse(1), Use: v221i reg] clobber:[]
        inst250: Op ops:[Def@Early: v225i reg] clobber:[]
        inst251: Branch ops:[] clobber:[]
      block32(): # succs:[9] preds:[31]
        inst252: Op ops:[Def: v226i any] clobber:[]
        -- SAFEPOINT --
        inst253: Op ops:[Def: v227i any] clobber:[]
        inst254: Op ops:[Def: v228i reg] clobber:[]
        inst255: Op ops:[Def: v229i any] clobber:[]
        inst256: Op ops:[Def: v230i any] clobber:[]
        inst257: Branch ops:[] clobber:[]
      block33(): # succs:[34] preds:[31]
        inst258: Op ops:[Def: v231i any] clobber:[]
        inst259: Op ops:[Def: v232i any] clobber:[]
        inst260: Op ops:[Def: v233i any] clobber:[]
        inst261: Op ops:[Def: v234i any] clobber:[]
        inst262: Op ops:[Def: v235i any] clobber:[]
        inst263: Branch ops:[] clobber:[]
      block34(): # succs:[35, 36] preds:[33]
        inst264: Op ops:[Def: v236i any] clobber:[]
        inst265: Op ops:[Def: v237i any] clobber:[]
        inst266: Op ops:[Def: v238i any] clobber:[]
        -- SAFEPOINT --
        inst267: Op ops:[Def: v239i reuse(1), Use: v237i reg] clobber:[]
        -- SAFEPOINT --
        inst268: Op ops:[Def: v240i reuse(1), Use: v237i reg] clobber:[]
        inst269: Branch ops:[Use: v235i reg] clobber:[]
      block35(): # succs:[22] preds:[34]
        inst270: Op ops:[Def: v241i any] clobber:[]
        inst271: Op ops:[Def: v242i any] clobber:[]
        inst272: Op ops:[Def: v243i any] clobber:[]
        inst273: Op ops:[Def: v244i any] clobber:[]
        inst274: Op ops:[Def: v245i any] clobber:[]
        inst275: Branch ops:[] clobber:[]
      block36(v246): # succs:[37] preds:[34]
        inst276: Op ops:[Def: v247i reuse(1), Use: v246i reg] clobber:[]
        inst277: Op ops:[Def@Early: v248i reg] clobber:[]
        inst278: Op ops:[Def: v249i any] clobber:[]
        inst279: Op ops:[Def: v250i any] clobber:[]
        inst280: Op ops:[Def: v251i any] clobber:[]
        inst281: Branch ops:[] clobber:[]
      block37(): # succs:[38, 39] preds:[36]
        inst282: Op ops:[Def: v252i any] clobber:[]
        inst283: Op ops:[Def: v253i any] clobber:[]
        inst284: Op ops:[Def: v254i any] clobber:[]
        inst285: Op ops:[Def: v255i any] clobber:[]
        inst286: Op ops:[Def: v256i any] clobber:[]
        inst287: Op ops:[Def: v257i any] clobber:[]
        inst288: Op ops:[Def: v258i any] clobber:[]
        inst289: Op ops:[Def: v259i any] clobber:[]
        inst290: Op ops:[Def: v260i any] clobber:[]
        inst291: Branch ops:[] clobber:[]
      block38(): # succs:[19] preds:[37]
        inst292: Op ops:[Def: v261i any] clobber:[]
        inst293: Op ops:[Def: v262i any] clobber:[]
        inst294: Op ops:[Def: v263i any] clobber:[]
        inst295: Op ops:[Def: v264i any] clobber:[]
        inst296: Op ops:[Def: v265i any] clobber:[]
        inst297: Branch ops:[] clobber:[]
      block39(): # succs:[40] preds:[37]
        inst298: Op ops:[Def: v266i any] clobber:[]
        inst299: Op ops:[Def: v267i any] clobber:[]
        -- SAFEPOINT --
        inst300: Op ops:[Def: v268i any] clobber:[]
        -- SAFEPOINT --
        inst301: Op ops:[Def: v269i reuse(1), Use: v267i reg] clobber:[]
        inst302: Op ops:[Def@Early: v270i reg, Use: v267i reg] clobber:[]
        inst303: Branch ops:[] clobber:[]
      block40(): # succs:[41, 42] preds:[26, 39]
        inst304: Op ops:[Def: v271i any] clobber:[]
        inst305: Op ops:[Def: v272i any] clobber:[]
        inst306: Op ops:[Def: v273i any] clobber:[]
        inst307: Op ops:[Def: v274i any] clobber:[]
        inst308: Op ops:[Def: v275i any] clobber:[]
        inst309: Op ops:[Def: v276i any] clobber:[]
        inst310: Op ops:[Def: v277i any] clobber:[]
        inst311: Op ops:[Def: v278i any] clobber:[]
        inst312: Branch ops:[] clobber:[]
      block41(): # succs:[19] preds:[40]
        inst313: Op ops:[Def: v279i any] clobber:[]
        inst314: Op ops:[Def: v280i any] clobber:[]
        inst315: Op ops:[Def: v281i any] clobber:[]
        inst316: Op ops:[Def: v282i any] clobber:[]
        inst317: Op ops:[Def: v283i any] clobber:[]
        inst318: Branch ops:[] clobber:[]
      block42(): # succs:[43] preds:[40]
        inst319: Op ops:[Def: v284i any] clobber:[]
        inst320: Op ops:[Def: v285i any] clobber:[]
        inst321: Op ops:[Def: v286i any] clobber:[]
        inst322: Op ops:[Def: v287i any] clobber:[]
        inst323: Op ops:[Def: v288i any] clobber:[]
        inst324: Branch ops:[] clobber:[]
      block43(): # succs:[44] preds:[42]
        inst325: Op ops:[Def: v289i any] clobber:[]
        inst326: Op ops:[Def: v290i any] clobber:[]
        inst327: Op ops:[Def: v291i any] clobber:[]
        inst328: Op ops:[Def: v292i any] clobber:[]
        inst329: Op ops:[Def: v293i any] clobber:[]
        inst330: Branch ops:[] clobber:[]
      block44(): # succs:[] preds:[43]
        inst331: Op ops:[Def: v294i any] clobber:[]
        inst332: Op ops:[Def: v295i any] clobber:[]
        inst333: Op ops:[Def: v296i any] clobber:[]
        inst334: Op ops:[Def: v297i any] clobber:[]
        inst335: Op ops:[Def: v298i any] clobber:[]
        inst336: Op ops:[Def: v299i any] clobber:[]
        inst337: Op ops:[Def: v300i any] clobber:[]
        -- SAFEPOINT --
        inst338: Op ops:[Def: v301i any] clobber:[]
        inst339: Op ops:[Def@Early: v302i reg] clobber:[]
        -- SAFEPOINT --
        inst340: Op ops:[Def: v303i reuse(1), Use: v302i reg] clobber:[]
        -- SAFEPOINT --
        inst341: Op ops:[Def: v304i reuse(1), Use: v301i reg] clobber:[]
        inst342: Ret ops:[] clobber:[]
    }
    ,
}

Support instructions that clobber all registers and have non-fixed uses

There are currently special cases for when an instruction uses a fixed register and clobbers it, but when an instruction uses an unconstrained register and clobbers all registers there is no special case and we get a TooManyLiveRegisters error. This can be avoided by adding an unnecessary fixed register constraint so that the first special case is hit, but one shouldn't have to constrain the register allocation in that way, regalloc2 should just handle this case.

Replace manual use of index newtypes `bundles[bundle.index()]` with `Index` impls and typed `Vec` wrappers

Right now, regalloc2 uses an entity-component-system sort of pattern with toplevel Vecs of LiveBundle, VRegData, and the like, and newtype'd index wrappers like LiveBundleIndex, VRegIndex, etc. We have a whole bunch of instances of self.bundles[bundle.index()]....

Ideally we would make bundles a Vec-wrapper type that has an Index implementation that natively takes LiveBundleIndex, and then we could make all of these sites slightly less verbose.

Fuzzbug: checker mismatch after recent changes

Recently came up via OSS-Fuzz (private bugtracker link). Almost certainly has to do with one of my cleanup PRs yesterday (#108, #109). Investigating!

base64-encoded fuzz input for `ion_checker` target DWz///+RUFBQUP//UP///wWGBP//////////////////YmJiECAgICAgICAgICDvICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICCwsLCwsLCwsLCwICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIAAAAAC6AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICArICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAg//////8gIAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAB4gILCwsLCwsLCwsLCwsLCwsLCwsLCwsLCwsLCwsLCwsLCwsLCwsACAAE8v5+cAAADn5wALAEFp/wIAAAAAAAByw8PDw8PDw8PDw8PDw8PHw8PDw8PDw8PDw8PDw8PDw8PDw8PDw8PDw8PDw8PDw8PDw8PDw8PDw8PDw8M/AI2NjY2NjY2NjY1pQWn/AgA9PQElJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSWlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUl2+ElJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJiUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSWlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUkJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUl2+ElJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJdvhJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSYlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlpSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJCUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJdvhJSUlJSUlJdHaJSUlJSUlJSUlJSUlLSUlBwAAACUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlO3R5cG4vbi9uL24vbg==

Support for cold blocks

It would be useful to be able to mark some blocks as "cold" which means that they are rarely taken cold paths. The register allocator should prefer placing spills and moves in cold blocks if possible.

It turns out that very little needs to be done if we take advantage of the block ordering by requiring all cold blocks to be after normal blocks in terms of block index and instruction indices. This has the following consequences:

  • Due to the sorting of blockparams_out, bundle merging will attempt to merge all branch parameters coming from normal blocks before attempting to merge ones coming from cold blocks.
  • If a requirement conflict exists within a bundle, the bundle is split at the first conflict point. Since ranges are iterated in order, this will place the split in a cold block if there are no conflicts within normal blocks.
  • If register allocation requires splitting a bundle, the first conflict point is used as the split point. Same as above.

My only concern is that the block order will no longer be in RPO which is the ordering recommended by the documentation. While regalloc2 will still function properly, I am less sure of the impact it may have on the heuristics.

Note that there are no requirements related to the ordering of blocks, and there is no requirement that the control flow be reducible. Some heuristics used by the allocator will perform better if the code is reducible and ordered in reverse postorder (RPO), however: in particular, (1) this interacts better with the contiguous-range-of-instruction-indices live range representation that we use, and (2) the "approximate loop depth" metric will actually be exact if both these conditions are met.

Conflict between Stack and Reg uses of the same vreg

Two uses of the same vreg on the same instruction causes an assertion failure if one of them is Stack and the other is Reg or FixedReg since it results in a minimal bundle that cannot be split.

We already handle the case of multiple conflicting FixedReg constraints, should we extend this to also cover the case of Stack/Reg conflicts?

How do I run this programme?

I'm a newbie to rust, and after I clone the code, I run the cargo run directly, and he gives me an error!
error: a bin target must be available for cargo run

What should I do?

Remove the need for dedicated, non-allocatable scratch registers per class

Currently, regalloc2 requires the embedder to set aside one register from each class as a dedicated scratch register, and exclude this from the list of allocatable registers.

The scratch register for a given class allows "parallel moves" to work in all cases: when we have a cycle of moves that semantically happen concurrently on an edge (due to arguments to blockparams, aka ฯ†-nodes), we need one more storage space than the cycle length in order to lower into a sequence of individual moves.

However, we could take one of several other strategies:

  • We could look at the PReg allocation maps at the given location, finding a PReg that is not in use and using it as the scratch for this one cyclic move;
  • Or, we could create a scratch register by temporarily saving a register that doesn't participate in the cycle into a stackslot;
  • Or, we could use a stackslot as the temporary (this is the only possibility if the cycle involves all registers in the class).

Some care must be taken to handle stack-to-stack moves (we currently guarantee that we don't generate these by using the scratch register as well, saving it off temporarily in a sort of two-level scheme).

The benefit of this would be the addition of one register per class, reducing register pressure and improving runtime performance.

Come up with a better scheme for fixed-register usages

In many cases, a regalloc2 client will want to generate instructions that refer to fixed registers for uses/defs. There are multiple kinds of these register references: (i) uses/defs of a vreg that needs to be in a certain fixed register just at one instruction (e.g. for a call argument or return value), (ii) uses/defs of a machine register that is not allocatable but is used in an operand slot that can also take allocatable registers.

The former is well-covered by OperandConstraint::FixedReg, but the latter is interesting; for now our answer is to use pinned vregs, but we should have a better way of "passing through" non-allocatable registers, and we should clearly define what the semantics are.

Support more than two register classes

Riscv architecture have three category of registers.
integer registers are "x0-x31",
float registers are "f0-f31".
vector registers are "v0-v31".
riscv registers are not so general purpose as other platform.

Support explicit stack slots

This is useful for functions which take many arguments and return many values, some of which are in registers while the rest are in ABI-specified stack slots.

I think the best way to add support for fixed stack slots is to add an extra bit to PReg to add 32 integer stack slots and 32 float stack slots.

Implementation-wise, a fixed stack slot is very similar to Operand::Fixed:

  • The same fixups are required in case of multiple Fixed uses of the same vreg.
  • Live ranges assigned to a fixed stack slot need to tracked just like a PReg.
  • Allocation into fixed stack slots must be done in process_bundle: we may need to split a bundle in case of conflicts or evict another conflicting bundle.
  • Requirement computation only needs minor changes to merge Requirement::Stack with Requirement::Fixed(preg) when preg refers to a fixed stack slot.
  • A spill weight of 2000 still makes sense for fixed stack slots: we need expensive moves if we don't get the stack slot that we want.

Type confusion between `Int` and `Float` in the move resolver

This only happens when both Int and Float classes have the same spillslot size.

The root of the problem is that a single spillslot can be used (separately) to hold values of either class. This is because spill slot allocation only looks at slots_by_size and ignores the register class of the spillset. The class of the SpillSlot seems to be whichever spillset first allocates this spillslot (source).

This seems to be intentional since the move resolution code specifically ignores the register class on spillslots (1 2 3).

However this causes problems when constructing the move lists because the register class is obtained from m.from_alloc.class() which may be incorrect if from_alloc is a spillslot (i.e. it won't match the class of the other registers in the move list).

I can see 2 ways we might fix this:

  1. Require that a spillslot only be used with a single register class. This may end up increasing stack usage.
  2. Completely remove the register class from SpillSlot. But then the move resolver needs some way to know which scratch register to use for resolving stack-to-stack moves.

Rematerialization

As an alternative to spilling and reloading, a vreg value can sometimes be recomputed from other available values. There are two ways this can be supported in regalloc2:

Simple rematerialization

Supporting rematerialization for constants, including runtime constants which only depend on pinned registers (e.g. VMContext), is relatively straightforward. The client needs to track rematerialization metadata for each VReg, after which we can elide spills for this vreg (unless there is a stack use) and replace reloads with rematerializations.

trait Function {
    fn can_rematerialize(&self, vreg: VReg) -> bool;
}

enum Edit {
    /// The code sequence for rematerialization is only allowed to
    /// use `dest` and the dedicated scratch register.
    Rematerialize {
        vreg: VReg,
        // This is not an Allocation because rematerializing
        // into a stack slot doesn't make sense.
        dest: PReg,
    }
}

Complex rematerialization

Rematerialization involving values in other vregs is more complex and probably not worth the effort of implementing. It has been done before as part of a research project on V8 but AFAIK this has not made it into the main V8 codebase.

Remove support for non-SSA code

regalloc2 was initially designed for SSA inputs only (with blockparams rather than the more usual phi-nodes).

In the course of porting Cranelift to use regalloc2, we decided to build a shim in regalloc.rs on top of regalloc2, and to make this work, we added support for non-SSA code. This includes:

  • Support for more than one def of a given vreg
  • Support for "mod" (modify) operands that read and then write a vreg
  • More efficient support for move instructions, which are used heavily by Cranelift to lower phis internally before passing code to regalloc

Once we adapt Cranelift's backends to pass SSA to regalloc2 directly instead of using the regalloc.rs API (which will be a significant effort on its own), we will be able to remove this functionality and simplify the code in various places, mostly in liverange computation and in move-generation. We should verify that we obtain equivalent code quality without the move coalescing in particular.

Matching block params to allocations

Currently, checker.rs has to perform a dataflow analysis to derive the mapping from block params to allocations, but the block params are an input and conceivably the register allocator could provide this information as part of its output. In my compiler, I need this information as part of regular compilation, and while I could do a similar dataflow analysis, I imagine that the register allocator could provide this information more efficiently than if I were to reconstruct it.

The API I am thinking of is something like:

  • Output contains a field block_params: Vec<Allocation> and block_param_offsets: Vec<u32>, encoding a list of slices similar to instructions
  • The [Allocation] slice for a given block bl is parallel to f.block_params(bl)
  • The Allocation for a given block parameter is either None, or a Reg or Stack value according to where it happens to be stored at the moment.

This would enable an alternative non-fixpoint implementation of checker.rs, where the checker state is initialized directly from the block params: each mapping of vreg -> alloc (where alloc is not None) implies an entry alloc -> CheckerValue::Reg(vreg, reftyped(vreg)) in the initial state. There is no distinction between Unknown and Conflicted here, but the constraint is that at the bottom of each basic block, the state at the successor blocks is obtained from this state by setting some allocations to None. Unlike the fixpoint algorithm this does not require that the output of the checker is a least fixpoint, only a fixpoint, but it is a witness of correctness either way.

Thoughts? I imagine that some compilers don't want this information and/or don't want to pay for constructing this output data, so it can be behind a flag (a runtime flag would be useful for applications like the checker, and the cost should be minimal if the flag is off, since it's just two more empty vectors in Output, but it could also be a compile time flag). But for some applications this data is quite relevant for connecting pre-regalloc data to post-regalloc data when generating auxiliary information about the code (or even the code itself, if the ISA requires some kind of capabilities for the used registers, although this seems unlikely).

Impossible constraints when a def and use have same fixed-reg constraint and use is live-out

regalloc2 crashes with: 'Could not allocate minimal bundle, but the allocation problem should be possible to solve', regalloc2-0.2.0/src/ion/process.rs:1006:17

It seems to be related to 128-bit multiplication when targeting x86_64. The following is a manually reduced test-case:

function %regalloc_crash(i64, i64, i64) {
block0(v0: i64, v1: i64, v2: i64):
    v3 = uextend.i128 v1
    v4 = uextend.i128 v2

    v5 = imul.i128 v3, v4
    store.i128 v5, v0

    store.i64 v1, v0
    return
 }

Potentially related to #19 but continues to crash with the latest version.

Relicense fully under Apache-2.0 WITH LLVM-exception

Currently, regalloc2 contains some MPL code. This code was a result of an initial translation effort that started from IonMonkey's register allocator; IonMonkey (as part of SpiderMonkey and thus the Firefox codebase) is licensed under the MPL. This is a perfectly fine open-source license, but ideally, we would like to relicense this code to fall under Apache-2.0 WITH LLVM-exception, in alignment with other BA codebases, if all of the relevant parties agree and are happy with this.

cc @tschneidereit, who has been in contact with various folks about this and is working hard to obtain the appropriate signoffs (thanks!).

Support branch instructions with operands

Currently branch instructions that take blockparams (which implies that the target block has multiple predecessors) are not allowed to have any operands. However on many targets a temporary register is needed to synthesize a jump whose target is outside the range of a single instruction.

For example on RISC-V any jump to a target outside a +/- 1MB range requires a auipc tmp, %hi(target); jr tmp, %lo(target) instruction sequence. I would like to be able to use an EarlyDef with a temporary vreg to tell the register allocator to select a free temporary register for tmp. Would it be possible to make this work?

is `a0` not been treated as return value????

I am doing some test with wasmtime.
there is a very wired problem.
p10i is the a0 register.

test compile precise-output
set unwind_info=false
target riscv64
function %xxxxxxxxxx(i64 vmctx, i64) -> i64, f32, i64, f64, i32, f32, i64, i32 wasmtime_system_v {
    gv0 = vmctx
    gv1 = load.i64 notrap aligned readonly gv0+8
    gv2 = load.i64 notrap aligned gv1
    gv3 = vmctx
    sig0 = (i64 vmctx, i64) -> i64, f32, i64, f64, i32, f32, i64, i32 wasmtime_system_v
    fn0 = u0:0 sig0
    stack_limit = gv2

block0(v0: i64, v1: i64):
    v10 = global_value.i64 gv3
    v11 = load.i64 notrap aligned v10+72
    v12 = load.i64 notrap aligned v10+80
    v13, v14, v15, v16, v17, v18, v19, v20 = call_indirect sig0, v11(v12, v0)
    jump block1(v13, v14, v15, v16, v17, v18, v19, v20)

block1(v2: i64, v3: f32, v4: i64, v5: f64, v6: i32, v7: f32, v8: i64, v9: i32):
    return v2, v3, v4, v5, v6, v7, v8, v9
}

;   add sp,-16
;   sd ra,8(sp)
;   sd fp,0(sp)
;   unwind PushFrameRegs { offset_upward_to_caller_sp: 16 }
;   mv fp,sp
;   ld t6,8(a0)
;   ld t6,0(t6)
;   trap_ifc stk_ovf##(sp ult t6)
;   unwind DefineNewFrame { offset_upward_to_caller_sp: 16, offset_downward_to_clobbers: 16 }
;   unwind SaveReg { clobber_offset: 8, reg: p18i }
;   sd s2,-8(sp)
;   add sp,-16
; block0:
;   mv s2,a2
;   ld t3,72(a0)
;   ld t4,80(a0)
;   mv a1,a0
;   add sp,-48
;   virtual_sp_offset_adj +48
;   mv a0,t4
;   load_addr a2,0(sp)
;   callind t3
;   flw ft8,0(sp)
;   ld a2,8(sp)
;   fld ft0,16(sp)
;   lw a6,24(sp)
;   flw ft4,28(sp)
;   ld t0,32(sp)
;   lw t2,40(sp)
;   add sp,+48
;   virtual_sp_offset_adj -48
;   j label1
; block1:
;   mv a0,s2
;   fsw ft8,0(a0)
;   sd a2,8(a0)
;   fsd ft0,16(a0)
;   sw a6,24(a0)
;   fsw ft4,28(a0)
;   sd t0,32(a0)
;   sw t2,40(a0)
;   add sp,+16
;   ld s2,-8(sp)
;   ld ra,8(sp)
;   ld fp,0(sp)
;   add sp,+16
;   ret

the problem with this code is that register a0 , a0 contain the first result for the function. should not used for instruction like sd a2,8(a0), a0 is been overwrite to some address for store operation.

Alternative regalloc backends

The regalloc2 crate provides a good API (VReg, PReg, Operand, Function, etc) for interacting with a register allocator. It would be nice if other register allocator backends could be made available with the same API to simplify integration on the embedder side.

What I specifically have in mind is a "fast" backend that simply allocates registers in a single pass with the goal of minimizing compilation time. This will have a huge impact on compilation time since my compiler spends 80% of its time in register allocation. In practice I expect to use this for a tiering JIT where hot code can later be re-compiled using the slow regalloc backend that produces better code.

Make regalloc2 panic-clean: always return errors when impossible constraints occur

Right now, the allocator can panic if the client provides impossible constraints, such as requiring two different operands to be placed in the same PReg.

While this represents a programming error in the client and shouldn't arise from an invalid input program, it is always better to bubble up errors; we should reserve panic!() and assert!() for conditions that can only be violated due to errors in regalloc2 itself.

regalloc2 makes too many memory allocations

I used heaptrack to analyze the memory allocation patterns of my compiler which uses regalloc2. I used a large benchmark (LLVM library) which has 1257955 basic blocks in 104913 functions. Out of a total of 46780667 allocations, 46779011 of them come from regalloc2 and only 1656 come from other parts of the compiler.

My compiler uses a similar structure 1 to Cranelift where a Context structure allows memory allocations to be reused across compilation units. Specifically, this allows various Vecs and HashMaps to grow to a sufficient size only once while being reused many times for compiling multiple functions.

regalloc2 unfortunately doesn't follow this design:

  • A fresh Env structure is allocated and freed for each function. No memory allocations are preserved.
  • Heavy use of temporary SmallVecs and HashSets. Heaptrack reports that 19865290 allocations (42% of the total) are "temporary". These are allocations which are immediately followed by a free, with no other allocations in between.
  • Heavy use of BTreeMap which needs to allocate many nodes separately and cannot reuse memory unlike HashMap and Vec.

Heaptrack results

Here are the major allocation hotspots shown by heaptrack:

In try_to_allocate_bundle_to_reg:

  • 16158324 temporary allocations from the FxHashSet.
  • 9677435 allocations from inserting liveranges into PRegData::allocations (LiveRangeSet backed by a BTreeMap).

In merge_bundles:

  • 4478881 allocations from pushing a VReg to SpillSet::vregs (SmallVec).
  • 1895443 allocations from appending LiveRangeListEntrys to LiveBundle::ranges (SmallVec).

In insert_use_into_liverange:

  • 2578957 allocations from pushing a Use to LiveRange::uses (SmallVec).

In add_liverange_to_vreg:

  • 1432853 allocations from pushing a LiveRangeListEntry to VRegData::ranges (SmallVec).

In create_pregs_and_vregs:

  • 948582 allocations from pushing to Env::vregs and Env::allocs and Env::inst_alloc_offsets. (Vec)

In Env::new:

  • 1154043 allocations from initializing the various Vecs with Vec::with_capacity.

Performance impact

Benchmarking shows that approximately 10% of the time is spent in memory allocation functions (malloc, realloc, free). Additionally, another 10% of the time is spent in memcpy/memmove which could be due to vector reallocation.

Compiling with multiple threads can cause additional contention on the global allocator. This was noticed on musl targets where the global allocator doesn't have thread-local caches: compiler performance on 6 threads (on a 6-core machine) was reduced to that of a single thread whereas other allocators allow almost-perfect performance scaling.

Proposed solution

I believe the best way to improve regalloc2 would be to use a design similar to Cranelift with a Context structure that can preserve memory allocation across multiple invocations of the register allocator.

  • Temporary SmallVecs and HashSets should be replaced with a single instance in Context.
  • The many BTreeMaps should be replaced with cranelift-bforest which maintains a pool of nodes that can be reused.
  • The many SmallVecs should be replaced with EntityList from cranelift-entity and a ListPool in Context.

Since this would require a dependency on the cranelift-entity and cranelift-bforest crates, this is also a good opportunity to resolve #62 and switch to using proper indexed types internally.

Footnotes

  1. In fact I am using the cranelift-entity crate directly, it's very well written. โ†ฉ

Feedback/questions/review on the checker

Here are my notes, questions, and suggestions while reading over the regalloc2
checker. They are pretty raw, feel free to ask for clarification if anything
doesn't make sense.


The analysis lattice is:

                     Top (V)
                        |
                       ๐’ซ(V)   // the Powerset of the set of virtual regs
                        |
                Bottom ( โˆ… )     // the empty set

Doesn't P(V) already include the empty set and also a top element (V itself)?
Is this because we have an explicit representation of the universe value that
isn't the actual set V but just universe: true?


/// Abstract state for an allocation.
///
/// Equivalent to a set of virtual register names, with the
/// universe-set as top and empty set as bottom lattice element. The
/// meet-function is thus set intersection.
#[derive(Clone, Debug, PartialEq, Eq)]
struct CheckerValue {
    /// This value is the "universe set".
    universe: bool,
    /// The VRegs that this value is equal to.
    vregs: FxHashSet<VReg>,
}

If universe == true then the hash set isn't really ever used, right? If my
understanding is correct, then I think this would be slightly cleaner rewritten
as an enum:

enum CheckerValue {
    Universe,
    VRegs(FxHashSet<VReg>),
}

Now, not only are we not distracted by having a hash set accessible when
universe == true that could potentially lead to bugs if we forget to check
universe and only look at the hash set, but we can't be wasting heap space
on the hash set's elements when universe == true because of the type
definition.


The top level comment is very nice and helpful.


/// State that steps through program points as we scan over the instruction stream.
#[derive(Clone, Debug, PartialEq, Eq)]
struct CheckerState {
    top: bool,
    allocations: FxHashMap<Allocation, CheckerValue>,
}

Similar to CheckerValue above, I think this could be better rewritten as:

enum CheckerState {
    Top,
    Allocations(FxHashMap<Allocation, CheckerValue>),
}

That's it! Everything else looks good to me!

Support running checker with pinned vregs.

Right now, the checker does not support the semantics implied by pinned vregs. There is no particular reason it can't; we just didn't need this while fuzzing regalloc2 independently.

However, supporting this could allow embedders that use pinned vregs to run the checker on actual allocation runs in the embedding. For example, Cranelift will need this before we can support checker runs on register-allocated VCode.

I'm not too worried about fuzz-coverage implications here, as pinned vregs are handled early in the pipeline and the core of the allocator has no idea they exist (their uses become fixed-reg constraints). However closing this last gap would be quite nice for real-world applications.

Use spill-bundles directly for stack-constrained uses

In bytecodealliance/wasmtime#3785, we discovered a case where an unnecessary register-to-register move occurs. The pattern is like:

  • Def of value, constrained to a caller-saved register;
  • Callsite, clobbering all caller-saves;
  • Safepoint, requiring above value on stack.

Currently, we start a liverange for the value, and split just before the safepoint (due to conflict in requirements). We then chop off the tail of the tail of the first liverange and put it in the spillbundle instead. The first liverange gets the register from its constraint; the last gets a stackslot; and the spillbundle gets a callee-save register, because it's live across a callsite.

In this particular case, it would be better to move straight to the stackslot before the callsite. We could probably most easily achieve this by (i) putting all stack-constrained uses directly on the spillbundle, and (ii) in such cases, skipping second-chance register allocation for the spillbundle.

This would require some careful evaluation to ensure that it doesn't regress cases where there happens to be a stack constraint in one place but we mostly want to benefit from second-chance allocation for the spillbundle.

A more general solution is for the spillbundle allocation to take hint(s) from the allocations of all the other LRs it is tied to, possibly weighted somehow. However, in this case it would be pretty fragile because one could equally take a hint from either the first LR (constrained to a reg that is clobbered by the callsite) or the last LR; we somehow need to know to prefer the last. And whatever logic we come up with for the above, one could imagine the inverse case (safepoint first, then callsite, then use in register) where we would want the opposite hinting preference. So, this more general solution likely needs a lot more careful thought re: cost model if it were to be viable.

Include useful diagnostic information in errors

Right now, we just give error messages like BB(Block(0)), meaning that there is some error somewhere in block 0.

If one enables logging, then one gets messages like "block 0 does not end in a terminator" or something like that.

It would be nice if we put that information in the actual error itself, so folks didn't have to enable logging to figure out what went wrong.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.