When a register operand is used for the last time and all of its lanes are defined, VirtRegRewriter may fail to add the killed flag if subregister liveness is being tracked. On PPC, this can cause unnecessary code to be generated.
For example, for the following code:
0B bb.0 (%ir-block.0):
  liveins: $v2, $v3, $v4, $v5
16B undef %14.sub_vsx1:vsrprc_with_sub_64_in_vfrc = COPY $v5
32B %14.sub_vsx0:vsrprc_with_sub_64_in_vfrc = COPY $v4
48B undef %8.sub_vsx1:vsrprc_with_sub_64_in_vfrc = COPY $v3
64B %8.sub_vsx0:vsrprc_with_sub_64_in_vfrc = COPY $v2
80B %4:g8rc_and_g8rc_nox0 = LD 0, %fixed-stack.0 :: (load 8 from %fixed-stack.0, align 16)
160B %8:vsrprc_with_sub_64_in_vfrc = KILL_PAIR %8:vsrprc_with_sub_64_in_vfrc(tied-def 0)
176B undef %15.sub_pair0:uaccrc = COPY %8:vsrprc_with_sub_64_in_vfrc
256B %14:vsrprc_with_sub_64_in_vfrc = KILL_PAIR %14:vsrprc_with_sub_64_in_vfrc(tied-def 0)
288B %15.sub_pair1:uaccrc = COPY %14:vsrprc_with_sub_64_in_vfrc
304B %28:accrc = BUILD_UACC %15:uaccrc
336B %28:accrc = XXMTACC %28:accrc(tied-def 0)
340B SPILL_ACC %28:accrc, 0, %stack.0 :: (store 64 into %stack.0, align 16)
352B ADJCALLSTACKDOWN 32, 0, implicit-def dead $r1, implicit $r1
368B BL8_NOTOC @foo, <regmask $cr2 $cr3 $cr4 $f14 $f15 $f16 $f17 $f18 $f19 $f20 $f21 $f22 $f23 $f24 $f25 $f26 $f27 $f28 $f29 $f30 $f31 $r14 $r15 $r16 $r17 $r18 $r19 $r20 $r21 $r22 $r23 $r24 $r25 and 60 more...>, implicit-def dead $lr8, implicit $rm, implicit-def $r1
384B ADJCALLSTACKUP 32, 0, implicit-def dead $r1, implicit $r1
408B %25:accrc = RESTORE_ACC 0, %stack.0 :: (load 64 from %stack.0, align 16)
416B %25:accrc = XXMFACC %25:accrc(tied-def 0)
448B STXV %25.sub_vsx0:accrc, 48, %4:g8rc_and_g8rc_nox0 :: (store 16 into %ir.2 + 48)
480B STXV %25.sub_vsx1:accrc, 32, %4:g8rc_and_g8rc_nox0 :: (store 16 into %ir.2 + 32, align 32)
528B STXV %25.sub_pair1_then_sub_vsx0:accrc, 16, %4:g8rc_and_g8rc_nox0 :: (store 16 into %ir.2 + 16)
536B undef %26.sub_pair1_then_sub_vsx1:accrc = COPY %25.sub_pair1_then_sub_vsx1:accrc
560B STXVX %26.sub_pair1_then_sub_vsx1:accrc, $zero8, %4:g8rc_and_g8rc_nox0 :: (store 16 into %ir.2, align 64)
576B BLR8 implicit $lr8, implicit $rm
LiveIntervals::addKillFlags should add a killed flag to the %28 operand of the SPILL_ACC instruction (340B).
The live interval for %28 is:
%28 [304r,336r:0)[336r,340r:1) 0@304r 1@336r
  L0000000000000080 [304r,336r:0)[336r,340r:1) 0@304r 1@336r
  L0000000000000040 [304r,336r:0)[336r,340r:1) 0@304r 1@336r
  L0000000000000002 [304r,336r:0)[336r,340r:1) 0@304r 1@336r
  L0000000000000100 [304r,336r:0)[336r,340r:1) 0@304r 1@336r
  weight:INF
RegMasks: 368r
We see that lanes L0000000000000080, L0000000000000040, L0000000000000002 and L0000000000000100 are all live until 340B.
The full lane mask of the accumulator registers (accrc) is L00000000000001C2, which is exactly the union of those four lanes. All the lanes used by the SPILL_ACC are therefore defined, and the killed flag should be added.
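The lane arithmetic above can be checked with a small standalone sketch. It does not use LLVM's LaneBitmask class; the `LaneMask` alias and the helper names `unionOfLanes` and `coversAllLanes` are hypothetical and exist only for illustration.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a lane bitmask: one bit per lane.
using LaneMask = std::uint64_t;

// OR together the lane masks of every subrange that is live at the use.
LaneMask unionOfLanes(const std::vector<LaneMask> &SubRangeMasks) {
  LaneMask Result = 0;
  for (LaneMask M : SubRangeMasks)
    Result |= M;
  return Result;
}

// True when no lane of the register class is missing from Defined.
bool coversAllLanes(LaneMask Defined, LaneMask ClassMask) {
  return (ClassMask & ~Defined) == 0;
}
```

With the four subrange masks of %28 (0x80, 0x40, 0x02, 0x100), the union is 0x1C2, which matches the accrc mask quoted above.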
This patch fixes two things:
- The way the mask for defined lanes is computed. We currently compute the mask L0000000000000000 instead of L00000000000001C2. To fix that, we go through all the segments before the instruction using the register operand to see which lanes are defined and killed at this instruction.
- The way the mask for used lanes is computed. In case we use a register that is not a subregister, we currently use the conservative default mask LFFFFFFFFFFFFFFFF to represent used lanes. However, this mask is not necessarily accurate. As shown above, on PPC, the mask should be L00000000000001C2 for accumulators. So we try to retrieve this mask from the register class instead of using the default mask.
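The interaction between the two masks can be sketched as follows. This is a simplified model under stated assumptions, not the code in LiveIntervals::addKillFlags: `Segment`, `SubRange`, `lanesKilledAt`, `usedLanes` and `canAddKill` are hypothetical names, slot indexes are plain integers, and a subrange is treated as killed at the use when its last segment ends there.

```cpp
#include <cstdint>
#include <vector>

using LaneMask = std::uint64_t; // hypothetical stand-in, one bit per lane

struct Segment {
  unsigned Start; // slot index where the segment begins
  unsigned End;   // slot index where the segment ends
};

struct SubRange {
  LaneMask Mask;                 // lanes covered by this subrange
  std::vector<Segment> Segments; // ordered by slot index
};

// Collect the lanes whose subrange stops being live at UseSlot.
LaneMask lanesKilledAt(const std::vector<SubRange> &SubRanges,
                       unsigned UseSlot) {
  LaneMask Defined = 0;
  for (const SubRange &SR : SubRanges)
    if (!SR.Segments.empty() && SR.Segments.back().End == UseSlot)
      Defined |= SR.Mask;
  return Defined;
}

// A subregister use reads only its own lanes; a full-register use reads
// the lanes of its register class rather than an all-ones default.
LaneMask usedLanes(bool HasSubReg, LaneMask SubRegMask, LaneMask ClassMask) {
  return HasSubReg ? SubRegMask : ClassMask;
}

// The use can be marked killed only if every lane it reads dies here.
bool canAddKill(LaneMask Used, LaneMask Defined) {
  return Used != 0 && (Used & ~Defined) == 0;
}
```

In this model, the %28 operand of SPILL_ACC at 340B has defined lanes 0x1C2 and used lanes 0x1C2, so the kill is allowed. With the all-ones default as the used mask, the same check fails because lanes outside 0x1C2 are never defined.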
I also need someone familiar with the AMDGPU backend to check if the test cases are still correct after these changes.
What was the point of copying out the subranges here before?