SplitEditor::defFromParent() can create a register copy. If the register is a tuple of other registers and not all lanes are used, the copy is still done on the full tuple. Later, the register unit for an unused lane is considered free, and another overlapping register tuple can be assigned to a different value even though the first register is live at that point. This happens because interference checking only looks at liveness info, while a full register copy clobbers all lanes, even unused ones.
This patch fixes the copy to cover only the used lanes.
This is how it happens in the app I was debugging:
Before Virtual Register Rewriter:
187344B %vreg16749:sub0<def,read-undef> = V_ADD_I32_e32 %vreg12357, %vreg16754, %VCC<imp-def>, %EXEC<imp-use>; VReg_64:%vreg16749 SReg_32_XM0:%vreg12357 VGPR_32:%vreg16754
187360B %vreg16749:sub1<def> = V_ADDC_U32_e32 %vreg16742, %vreg15898, %VCC<imp-def,dead>, %VCC<imp-use>, %EXEC<imp-use>; VReg_64:%vreg16749 VGPR_32:%vreg16742,%vreg15898
...
197648B %vreg21888<def> = COPY %vreg21887; VReg_64:%vreg21888,%vreg21887
...
197988B %vreg12551<def> = FLAT_LOAD_DWORDX2 %vreg16749, 0, 0, 0, %EXEC<imp-use>, %FLAT_SCR<imp-use>; mem:LD8[%arrayidx112.i18013006(addrspace=2)](tbaa=!11) VReg_64:%vr
After Virtual Register Rewriter:
187344B %VGPR119<def> = V_ADD_I32_e32 %SGPR3, %VGPR1<kill>, %VCC<imp-def>, %EXEC<imp-use>
187360B %VGPR120<def> = V_ADDC_U32_e32 %VGPR8<kill>, %VGPR19, %VCC<imp-def,dead>, %VCC<imp-use>, %EXEC<imp-use>
...
197648B %VGPR118_VGPR119<def> = COPY %VGPR33_VGPR34
...
197988B %VGPR52_VGPR53<def> = FLAT_LOAD_DWORDX2 %VGPR119_VGPR120, 0, 0, 0, %EXEC<imp-use>, %FLAT_SCR<imp-use>; mem:LD8[%arrayidx112.i18013006(addrspace=2)](tbaa=!11)
The RA debug log excerpt:
selectOrSplit VReg_64:%vreg21888 [197648r,200388r:0) 0@197648r L00000001 [197648r,200388r:0) 0@197648r w=4.824841e-04
assigning %vreg21888 to %VGPR118_VGPR119: VGPR118 [197648r,200388r:0) 0@197648r
selectOrSplit VReg_64:%vreg16749 [187344r,187360r:0)[187360r,197988r:1) 0@187344r 1@187360r L00000001 [187344r,197988r:0) 0@187344r L00000002 [187360r,197988r:0) 0@187360r
assigning %vreg16749 to %VGPR119_VGPR120: VGPR119 [187344r,197988r:0) 0@187344r VGPR120 [187360r,197988r:0) 0@187360r
[%vreg16749 -> %VGPR119_VGPR120] VReg_64
[%vreg21888 -> %VGPR118_VGPR119] VReg_64
One can see that the live intervals of vreg21888 and vreg16749 overlap, but only lane 0 of %vreg21888 is used, so VGPR119 is considered free. This allows the register pair 119~120 to be assigned to vreg16749. VGPR119 is then clobbered by %VGPR118_VGPR119<def> = COPY %VGPR33_VGPR34.
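The mismatch can be reproduced in a small standalone model. This is only a sketch, not LLVM code: Segment, UnitLiveness, interferes() and copyClobbers() are hypothetical names made up for illustration, VGPRs are modeled as plain unit numbers, and the slot indexes are taken from the RA log above with the "r" suffix dropped.

```cpp
#include <map>
#include <vector>

// Toy model only: none of these types or functions exist in LLVM.
// A half-open live segment [start, end) in slot-index units.
struct Segment {
  unsigned start;
  unsigned end;
};

// Register unit number (here simply the VGPR number) -> live segments.
using UnitLiveness = std::map<unsigned, std::vector<Segment>>;

inline bool overlaps(const Segment &a, const Segment &b) {
  return a.start < b.end && b.start < a.end;
}

// Liveness-based interference: two assignments conflict only if some
// shared register unit is live in both at the same time.
bool interferes(const UnitLiveness &a, const UnitLiveness &b) {
  for (const auto &entry : a) {
    auto it = b.find(entry.first);
    if (it == b.end())
      continue;
    for (const Segment &sa : entry.second)
      for (const Segment &sb : it->second)
        if (overlaps(sa, sb))
          return true;
  }
  return false;
}

// Does an instruction at slot `at` that writes `writtenUnits` overwrite a
// unit that is live in `other` at that slot?
bool copyClobbers(const std::vector<unsigned> &writtenUnits, unsigned at,
                  const UnitLiveness &other) {
  for (unsigned unit : writtenUnits) {
    auto it = other.find(unit);
    if (it == other.end())
      continue;
    for (const Segment &s : it->second)
      if (s.start <= at && at < s.end)
        return true;
  }
  return false;
}

// %vreg21888 -> VGPR118_VGPR119, but only VGPR118 carries liveness.
const UnitLiveness kVreg21888 = {{118, {{197648, 200388}}}};
// %vreg16749 -> VGPR119_VGPR120, both units live.
const UnitLiveness kVreg16749 = {{119, {{187344, 197988}}},
                                 {120, {{187360, 197988}}}};
```

In this model interferes(kVreg21888, kVreg16749) is false, so both assignments look legal, yet copyClobbers({118, 119}, 197648, kVreg16749) is true for the full-tuple COPY. Restricting the written units to {118}, which corresponds to a copy of the used lane only, makes the clobber disappear.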
After the fix the copy becomes %vreg21888:sub0<def,read-undef> = COPY %vreg21887:sub0.
I'm struggling to create a small and robust testcase so far. The original testcase is more than 12000 lines of IR, and MIR does not give a reproducible result. If/when I have a smaller and better testcase I will add it.