1. This patch concerns the following machine IR (MIR):
3504B bb.16.for.body62:
; predecessors: %bb.15, %bb.16
successors: %bb.17(0x04000000), %bb.16(0x7c000000); %bb.17(3.12%), %bb.16(96.88%)
3872B %72.ssub_0:v64_regs = Vector_Calc_Op %72.ssub_0:v64_regs
3904B %239:v32_regs = Vector_Calc_Op %239:v32_regs
4256B BNE %263:scalar_regs, $r0, %bb.16
4272B bb.17.for.cond71.preheader:
; predecessors: %bb.16
successors: %bb.19; %bb.19(100.00%)
4304B %72.ssub_1:v64_regs = COPY %239:v32_regs
4320B %237:v64_regs = IMPLICIT_DEF
4336B %238:v64_regs = COPY %72:v64_regs
When RegisterCoalescer::joinVirtRegs() tries to coalesce the following instruction:
4304B %72.ssub_1:v64_regs = COPY %239:v32_regs
JoinVals::mapValues() returns false because the live intervals of %72 and %239 interfere. More specifically, the interfering segments are 3872B:4272B of %72 and 3904B:4304B of %239. Note that the 3872B:4272B segment belongs to a PHI-def value.
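The interference between the two segments can be checked by hand. Below is a minimal sketch that assumes slot indices are treated as plain integers (3872B becomes 3872) and that segments are half-open [start, end), the way LLVM prints live segments. The Segment type and the overlaps() helper are hypothetical illustrations, not LLVM API:

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical model of a live segment: half-open [Start, End) over slot
// indices treated as plain integers (e.g. 3872B -> 3872).
struct Segment {
  unsigned Start;
  unsigned End;
};

// Two half-open segments overlap iff the later start precedes the earlier end.
bool overlaps(const Segment &A, const Segment &B) {
  return std::max(A.Start, B.Start) < std::min(A.End, B.End);
}
```

For the segments above, overlaps({3872, 4272}, {3904, 4304}) is true, and the shared part is [3904, 4272).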
However, %239 only conflicts with %72.ssub_0, not with %72.ssub_1, so the interfering live ranges cover different lanes. Since the conflicting subregister is ssub_0 while the COPY writes ssub_1, it should be possible to coalesce %72.ssub_1:v64_regs = COPY %239:v32_regs. The analysis follows.
Following this reasoning, I think the coalescer should return CR_Replace at this condition check:
if ((V.WriteLanes & OtherV.ValidLanes).none()) return CR_Replace;
Here V is the LHSVals value of %239:v32_regs (segment 3904B:4304B), and OtherV is the conflicting RHSVals value of %72.ssub_1:v64_regs (segment 3872B:4272B). V.WriteLanes is 0x0003000 and OtherV.ValidLanes is 0xFFFFFFFF, so the two masks intersect and the lanes are reported as conflicting.
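The lane arithmetic behind this check, as a minimal sketch: lane masks are modeled as plain 32-bit integers rather than LLVM's LaneBitmask, and writesNoValidLane() is a hypothetical helper that only mirrors the shape of the condition above.

```cpp
#include <cassert>
#include <cstdint>

// Lane masks modeled as plain 32-bit integers (LLVM itself uses LaneBitmask).
using Lanes = std::uint32_t;

// Mirrors the shape of
//   if ((V.WriteLanes & OtherV.ValidLanes).none()) return CR_Replace;
// Returns true when V writes no lane that is valid in OtherV.
bool writesNoValidLane(Lanes WriteLanes, Lanes OtherValidLanes) {
  return (WriteLanes & OtherValidLanes) == 0;
}
```

With the values above, 0x0003000 & 0xFFFFFFFF is 0x3000, which is non-zero, so the CR_Replace branch is not taken.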
OtherV (3872B:4272B) is a PHI-def value, and for PHI-defs all lanes are currently, and conservatively, assumed to be valid. However, over 3872B:4272B this PHI-def value is only live in the ssub_0 subrange. So when TrackSubRegLiveness is true and we check interference on the main range, the valid lane mask of this PHI-def value should be the lane mask of ssub_0. More generally, the valid lanes of a PHI-def value should be the union of the lane masks of the subranges in which that value is live.
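A minimal sketch of the proposed rule, under simplified assumptions: a subrange is modeled by a hypothetical SubRange struct holding a lane mask and a list of half-open segments over integer slot indices, not by LLVM's LiveInterval::SubRange. The names SubRange, liveAt() and phiDefValidLanes() are illustrative only. The valid lanes of a PHI-def value are taken as the union of the lane masks of the subranges that are live at the value's def index.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

using Lanes = std::uint32_t;

// Hypothetical stand-in for a subrange: its lane mask plus its live
// segments, each a half-open [start, end) pair of integer slot indices.
struct SubRange {
  Lanes LaneMask;
  std::vector<std::pair<unsigned, unsigned>> Segments;

  bool liveAt(unsigned Idx) const {
    for (const auto &S : Segments)
      if (S.first <= Idx && Idx < S.second)
        return true;
    return false;
  }
};

// Proposed rule: instead of assuming every lane of a PHI-def value is valid,
// take the union of the lane masks of the subranges live at the def index.
Lanes phiDefValidLanes(const std::vector<SubRange> &SubRanges, unsigned DefIdx) {
  Lanes Valid = 0;
  for (const SubRange &SR : SubRanges)
    if (SR.liveAt(DefIdx))
      Valid |= SR.LaneMask;
  return Valid;
}
```

If only the ssub_0 subrange is live at the PHI-def index, the result is exactly the ssub_0 lane mask; provided that mask does not overlap V.WriteLanes (0x0003000), the condition (V.WriteLanes & OtherV.ValidLanes).none() would then hold.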
2. I ran into a case in AMDGPU: llvm/test/CodeGen/AMDGPU/coalescer-subranges-prune-kill-copy.mir
IR:
bb.0:
undef %0.sub0:vreg_128 = IMPLICIT_DEF
%0.sub1:vreg_128 = IMPLICIT_DEF
%1:vreg_128 = COPY %0
%2:vreg_128 = COPY killed %0
S_BRANCH %bb.2
Analysis:
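%1 and %2 should be recognized as identical values, since both are full copies of %0.
However, %0 has subranges, so JoinVals::followCopyChain() queries the subranges of %0 and returns early at the check if (LRQ.valueIn() && ValueIn != LRQ.valueIn()), because the matching subranges yield different incoming values.
In that case the function returns TrackReg, which is %1 and %2 respectively, so the two copies are not recognized as the same value.
When %0 does not have subranges, the function instead follows the copy and returns SrcReg, i.e. %0, in both cases.
I think the subrange branch should also return the final def: it should return SrcReg, with VNI updated to LRQ.valueIn().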
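A toy model of one step of the subrange branch, comparing the current behavior with one possible reading of the proposal. Everything here is hypothetical: values and registers are plain ints, and SubRangeValuesIn stands for the values that the matching subranges of SrcReg carry into the COPY (std::nullopt for undef). It is loosely modeled on JoinVals::followCopyChain(), not taken from it.

```cpp
#include <cassert>
#include <optional>
#include <utility>
#include <vector>

// Toy identifiers for values (VNInfo) and virtual registers.
using ValueId = int;
using RegId = int;

// One step of the subrange branch: VNI/TrackReg describe the COPY's own def,
// SubRangeValuesIn the value each matching subrange of SrcReg carries into it.
std::pair<ValueId, RegId>
followOneCopy(ValueId VNI, RegId TrackReg, RegId SrcReg,
              const std::vector<std::optional<ValueId>> &SubRangeValuesIn,
              bool ProposedFix) {
  std::optional<ValueId> ValueIn;
  for (const auto &In : SubRangeValuesIn) {
    if (!ValueIn) {
      ValueIn = In;
      continue;
    }
    if (In && *In != *ValueIn) {
      // Current behavior: stop and report the COPY's own def and register.
      if (!ProposedFix)
        return {VNI, TrackReg};
      // Proposed reading: report SrcReg with the value found in this subrange.
      return {*In, SrcReg};
    }
  }
  if (!ValueIn)
    return {VNI, TrackReg}; // All matching subranges undef: stop here.
  return {*ValueIn, SrcReg}; // All subranges agree: step to the source.
}
```

For %1 = COPY %0 and %2 = COPY killed %0, with sub0 and sub1 of %0 carrying different values (say 10 and 11), the current path returns each COPY's own value and register, so the two results differ; under the proposed path both calls return (11, %0) and compare equal. Note that in this reading the chosen value depends on the order in which subranges are visited.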