The current version of CriticalAntiDepBreaker makes conservative decisions:
#1 When we see a definition of a register r, the scan function conservatively sets Classes[SuperReg(r)] = -1. Later, in the pre-scan function, for a register operand r of the current instruction, if Classes[a] is non-null for some alias a of r, we set Classes[r] = -1, which prevents r from being renamed.
#2 We also initialize all aliases of live-out registers as live, even when they are never used in the program. This may prevent renaming of registers that overlap with such aliases.
This may result in the following missed opportunities.
Assume r0 and r1 are subregisters of a super-register R0.
Case 1: We can only rename the last definition of a register r0 that has a super-register, because the super-register is marked as live once r0’s definition has been processed.
This stops r0 from being renamed later, until R0 is actually defined (at which point Classes[R0] is set to nullptr).
Case 2: If r0 is defined after r1, then we can’t rename even the last definition of r1, because r0’s definition sets Classes[R0] = -1, which prevents both r0 and r1 from being renamed.
Case 3: If r0 is live-in and live-out and is not modified in a basic block, and r1 participates in a critical anti-dependence, then that anti-dependence can’t be broken, because we initialize all super-registers of r0 (live-out) as live.
Case 4: We can’t use R0 for renaming a live range that does not overlap with another subsequent live range involving r0.
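To make Case 2 concrete, here is a minimal toy model of the conservative marking described in #1. It is a sketch based only on the description above, not the actual CriticalAntiDepBreaker code: the `Reg` and `Mark` enums, `aliasesOf`, and the `ConservativeModel` struct are hypothetical names introduced for illustration, and r0 and r1 are assumed to be disjoint subregisters of R0.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Hypothetical register file: r0 and r1 are disjoint subregisters of R0.
enum Reg { r0, r1, R0, NumRegs };

// Stand-ins for the values stored in Classes[]:
//   Null      ~ nullptr (nothing recorded for the register)
//   Renamable ~ a real register class (candidate for renaming)
//   Blocked   ~ -1 (must not be renamed)
enum Mark { Null, Renamable, Blocked };

// Aliases of a register, excluding the register itself (assumed layout).
std::vector<Reg> aliasesOf(Reg r) {
  if (r == R0)
    return {r0, r1};
  return {R0};
}

struct ConservativeModel {
  std::array<Mark, NumRegs> classes{}; // every entry starts as Null

  // Scan step: a definition of r clears r's own entry and conservatively
  // blocks its super-register.
  void scanDef(Reg r) {
    assert(r < NumRegs && "unknown register");
    classes[r] = Null;
    if (r != R0)
      classes[R0] = Blocked;
  }

  // Pre-scan step: an operand r is blocked if any alias has a non-null entry.
  void prescanOperand(Reg r) {
    assert(r < NumRegs && "unknown register");
    classes[r] = Renamable;
    for (Reg a : aliasesOf(r))
      if (classes[a] != Null)
        classes[r] = Blocked;
  }

  bool canRename(Reg r) const { return classes[r] == Renamable; }
};
```

In this model, calling `scanDef(r0)` and then `prescanOperand(r1)` leaves r1 blocked, because the definition of r0 has already blocked R0. Without the preceding `scanDef(r0)`, r1 remains renamable.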
Case 1 is exercised, for instance, in the v16f32_two_step function of llvm/test/CodeGen/X86/recip-fastmath.ll, where the following anti-dependence involving $ymm2 is not broken:
renamable $ymm0 = nnan ninf nsz arcp contract afn reassoc nofpexcept VFNMADDPS4Yrr killed renamable $ymm2, killed renamable $ymm0, renamable $ymm2, implicit $mxcsr //broken with ymm5 in this patch
renamable $ymm2 = nnan ninf nsz arcp contract afn reassoc VRCPPSYr renamable $ymm1
Case 4 is exercised in the Part_Create function in llvm/test/CodeGen/X86/critical-anti-dep-breaker.ll.
In the existing version, r10 is used to break the following anti-dependence. In the new version, rsi is used (rsi is considered before r10).
MOV64mr $rsp, 1, $noreg, 8, $noreg, killed renamable $rax :: (store 8 into %ir.Vchunk) // broken by rsi in this patch
renamable $rax = MOV64rm $rip, 1, $noreg, target-flags(x86-gotpcrel) @PartClass, $noreg :: (load 8 from got)
At a high level, this patch makes the following changes:
- When we process a register’s definition, don’t update liveness info of aliased registers eagerly.
- Keep liveness info separate for aliases as much as possible.
- Be careful when choosing a free register for renaming, or when considering replacing a register, by considering liveness information of all aliases.
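The three points above can be illustrated with a small sketch: liveness is tracked per register, a definition does not touch aliased registers, and aliases are only consulted when a candidate register is chosen for renaming. This is an assumption-laden toy, not the patch's implementation: `PerRegisterLiveness`, `overlapping`, and the register layout (r0 and r1 as disjoint subregisters of R0) are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Hypothetical register file: r0 and r1 are disjoint subregisters of R0.
enum Reg { r0, r1, R0, NumRegs };

// All registers that overlap r, including r itself (assumed layout).
std::vector<Reg> overlapping(Reg r) {
  if (r == R0)
    return {R0, r0, r1};
  return {r, R0};
}

struct PerRegisterLiveness {
  std::array<bool, NumRegs> live{}; // every register starts as not live

  // A use makes only the register itself live.
  void markUse(Reg r) {
    assert(r < NumRegs && "unknown register");
    live[r] = true;
  }

  // In a bottom-up scan, a definition ends the live range of the register
  // itself; aliased registers are deliberately left untouched.
  void markDef(Reg r) {
    assert(r < NumRegs && "unknown register");
    live[r] = false;
  }

  // A candidate is usable for renaming only if no overlapping register
  // is live.
  bool isFreeForRenaming(Reg candidate) const {
    for (Reg a : overlapping(candidate))
      if (live[a])
        return false;
    return true;
  }
};
```

With r0 live, this model reports R0 and r0 as unavailable but still allows r1 to be used, which is the kind of opportunity the conservative scheme misses.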
This patch updates the test cases as well, to account for the broken anti-dependencies.
This comment still mentions R10, but the check line now uses RSI.
Probably best to regenerate the whole test with update_llc_test_checks.py and pre-commit it. Then rebase this patch to show how the whole function changes. It is hard to understand the effect when only one instruction is visible.