Typically, most of the existing callbr tests are in the form:
callbr ... to label %fallthrough [label %indirect_target] ... fallthrough: ... indirect_target: ...
Optimizations such as loop-sink and mergeicmps have been observed to
produce callbr's in the form:
callbr ... to label %fallthrough [label %indirect_target] ... indirect_target: ... fallthrough: ...
Note the order of the BasicBlocks is reversed. This generally isn't a
problem, until we encounter register pressure in this non-canonical
form.
Under register pressure, it has been observed that the greedy
register allocator will spill the registers defined in INLINEASM_BR
MachineInstrs that whose LiveIntervals cross into the copy blocks
split during pre-RA ISel.
This is bad. Having a register spill (ie. a COPY) post a Terminal
MachineInstr is a violation of the MachineVerifiers invariant that
subsequent Terminal MachineInstrs in a MachineBasicBlock after the first
must also be Terminal.
Further, this leads to a cascaded failure where much later in
branch-folder calls to TargetInstructionInfo::analyzeBranch() fail to
properly classify a MachineBasicBlock. This triggers calls to
MachineBasicBlock::CorrectCFGEdges() which removes the indirect
successors of the MachineBasicBlock containing the INDIRECTASM_BR.
Finally, branch-folder removes the INDIRECTASM_BR indirect target
MachineBasicBlocks, as they appear to have no predecessors. This results
in references to labels that have been removed.
We should try harder to force these outputs to stay in registers and not
be spilled.
Some alternatives considered:
- We could teach loop-sink and mergeicmps to not create "non-canonical" callbrs. I prefer to allow LLVM IR to be flexible; I don't think we should canonicalize an order to the BasicBlocks.
- We could "canonicalize" the MachineBasicBlocks during pre-RA ISel, such that the MachineBasicBlock following the split block following a block terminated by an INLINEASM_BR was the fallthrough successor. So three MachineBasicBlocks that fallthrough in order. I prefer to allow arbitrary ordering of MachineBasicBlocks leading up to regalloc, which makes regalloc able to handle more generic cases.
It might seem bad to take away freedom from the register allocator, but
in this case and especially when in the canonical form, the
LiveIntervals are quite small, and we generally don't want a register
spill between the INLINEASM_BR and it's first COPY to GPR. Later passes
typically re-order the fallthrough case anyways.
A helpful command I used to observe regalloc in action for debugging
this:
llc -debug-only=calcspillweights -debug-only=spill-code-placement \ -debug-only=greedy -debug-only=regalloc -regalloc=greedy \ -stop-after=greedy -verify-regalloc -verify-machineinstrs \ llvm/test/CodeGen/X86/callbr-asm-regalloc.ll
This test is way too complicated. This really needs a MIR test