This is an archive of the discontinued LLVM Phabricator instance.

Do not rename registers that do not start an independent live range
ClosedPublic

Authored by kparzysz on May 25 2016, 8:52 AM.

Details

Reviewers
qcolombet
MatzeB
Summary

With subregister liveness tracking enabled, the aggressive anti-dependency breaker can rename only a part of a live super-register.

This situation happened on Hexagon while experimenting with the subregister liveness tracking.

This is the relevant code before the post-RA scheduler, which the interesting instructions marked:

BB#9: derived from LLVM BB %for.end30
    Live Ins: %R16 %R17 %R19 %R22 %R23
    Predecessors according to CFG: BB#0 BB#8
        [...]
-->     %R2<def> = A2_add %R23, %R17<kill>
        %R6<def> = M2_mpyi %R16, %R16
        %R22<def,tied1> = M2_accii %R22<tied0>, %R2<kill>, 2
        %R7<def> = A2_tfrsi <ga:@.str>
        %R3<def> = A2_tfr %R16<kill>
        %D2<def> = A2_tfrp %D0<kill>
        %R2<def> = L2_loadri_io %R29, 28; mem:LD4[FixedStack5]
        %R2<def> = M2_mpyi %R6<kill>, %R2<kill>
-->     %R23<def> = S2_asr_i_r %R22, 31
        S2_storeri_io %R29<kill>, 0, %R7<kill>; mem:ST4[Stack]
-->     %D0<def> = A2_tfrp %D11<kill>
        J2_call <ga:@check>  ; imp-refs removed for brevity

The register D11 is a pair R23:R22. What happens next is that the anti-dependency breaker identifies the anti-dependency on R23 between the first instruction and the S2_asr_i_r. It will then proceed to rename D11 to D13 (i.e. R27:R26) and R23 to R27. The problem is that R22 is still live at this point and is not renamed (to R26).

The relevant part of the DAG before renaming:

SU(0):   %R2<def> = A2_add %R23, %R17<kill>
  # preds left       : 0
  # succs left       : 3
  # rdefs left       : 0
  Latency            : 1
  Depth              : 0
  Height             : 5
  Successors:
   out SU(6): Latency=1 Reg=%R2
   val SU(2): Latency=1 Reg=%R2
   antiSU(8): Latency=0 Reg=%R23

[...]

SU(8):   %R23<def> = S2_asr_i_r %R22, 31
  # preds left       : 2
  # succs left       : 1
  # rdefs left       : 0
  Latency            : 1
  Depth              : 2
  Height             : 2
  Predecessors:
   val SU(2): Latency=1 Reg=%R22
   antiSU(0): Latency=0 Reg=%R23
  Successors:
   val SU(10): Latency=1 Reg=%D11

[...]

SU(10):   %D0<def> = A2_tfrp %D11<kill>
  # preds left       : 3
  # succs left       : 1
  # rdefs left       : 0
  Latency            : 1
  Depth              : 3
  Height             : 1
  Predecessors:
   val SU(8): Latency=1 Reg=%D11
   antiSU(5): Latency=0 Reg=%D0
   val SU(2): Latency=1 Reg=%D11
  Successors:
   ch  SU(4294967295) *: Latency=1

Subsequently:

===== Aggressive anti-dependency breaking
Available regs:  ...
Anti:   %D0<def> = A2_tfrp %D11<kill>
        Def Groups: D0=g0->g0(via R0)->g0(via R1)
        Antidep reg: D0 (zero group)
        Use Groups: D11=g0->g415(last-use) R22->g416(last-use) R23->g417(last-use)
Anti:   S2_storeri_io %R29<kill>, 0, %R7<kill>; mem:ST4[Stack]
        Def Groups:
        Use Groups: R29=g0->g418(last-use) R7=g0->g419(last-use)
Anti:   %R23<def> = S2_asr_i_r %R22, 31
        Def Groups: R23=g417->g415(via D11)
        Antidep reg: R23
        Rename Candidates for Group g415:
                D11: DoubleRegs :: D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 D10 D11 D12 D13
                R23: IntRegs :: R0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 R11 R12 R13 R14 R15 R16 R17 R18 R19 R20 R21 R22 R23 R24 R25 R26 R27 R28
        Find Registers: [D13: D13 R27]
        Breaking anti-dependence edge on R23: D11->D13(1 refs) R23->R27(1 refs)
        Use Groups: R22=g416

The final code is this. Note that R22 has not been renamed, even though its super-register use (i.e. D11 in the last instruction before the call) has been renamed to D13:

BB#9: derived from LLVM BB %for.end30
    Live Ins: %R16 %R17 %R19 %R22 %R23
    Predecessors according to CFG: BB#0 BB#8
        [...]
        %R2<def> = A2_add %R23<kill>, %R17<kill>
        %D2<def> = A2_tfrp %D0<kill>
        %R6<def> = M2_mpyi %R16, %R16
        %R0<def> = L2_loadri_io %R29, 28; mem:LD4[FixedStack5]
        %R22<def,tied1> = M2_accii %R22<tied0>, %R2<kill>, 2
        %R2<def> = M2_mpyi %R6<kill>, %R0<kill>
        %R3<def> = A2_tfr %R16<kill>
        %R7<def> = A2_tfrsi <ga:@.str>
-->     %R27<def> = S2_asr_i_r %R22<kill>, 31
        S2_storeri_io %R29<kill>, 0, %R7<kill>; mem:ST4[Stack]
-->     %D0<def> = A2_tfrp %D13<kill>
        J2_call <ga:@check>

The proposed solution is to avoid breaking of anti-dependencies on registers that do not start independent live ranges. If the definition of the anti-dep register has successors based on any alias of the anti-dep register, that alias must be completely covered by the anti-dep register.

Diff Detail

Repository
rL LLVM

Event Timeline

kparzysz updated this revision to Diff 58433.May 25 2016, 8:52 AM
kparzysz retitled this revision from to Do not rename registers that do not start an independent live range.
kparzysz updated this object.
kparzysz added reviewers: qcolombet, MatzeB.
kparzysz set the repository for this revision to rL LLVM.
kparzysz added a subscriber: llvm-commits.
kparzysz added inline comments.May 25 2016, 10:05 AM
lib/CodeGen/AggressiveAntiDepBreaker.cpp
920

An output dependency could still defeat this test (i.e. we'd need to make sure that the later def in the output dependency starts a new live range), but I'll wait for more comments before making this change.

kparzysz added inline comments.May 25 2016, 10:08 AM
lib/CodeGen/AggressiveAntiDepBreaker.cpp
920

On the second thought, if the only aliased dependency was an output dependency on a register covered by the anti-dep register, renaming should be fine...

MatzeB edited edge metadata.May 25 2016, 3:16 PM

First: I have no experience with this pass and (independently of this patch) the code in this pass is complex and confusing...

It looks like you forbid renaming of schedule nodes if they have any earlier anti-,output- or data- dependencies of the same register. This seems overly conservative, shoulnd't you rather be checking that if you have a def of register R that all uses (data dep successors) only read R or a subregister but not a super-register of R?

lib/CodeGen/AggressiveAntiDepBreaker.cpp
909–911

This seems expensive. I would at least move the Aliases variable out of the loop and simply clear it so the we do not need to allocate the memory again each iteration.

919–920

Isn't the R == AntiDepReg || TRI->isSubRegister(AntiDepReg, R) redundant after checking the Aliases bitset you construct here?

Just out of interested: It seems to me that we only enable anti dependenc breakers for PPC and X86 by overriding getAntiDepBreakMode(), why does this code affect hexagon?

The node PathSU is the def in the use->def anti-dependency. AntiDepReg is the register causing that anti-dependency. The code goes over all successors of PathSU to see if any of them are dependent on a register that is "larger" than AntiDepReg. I think this is what your suggestion is.

lib/CodeGen/AggressiveAntiDepBreaker.cpp
909–911

Sure, I can do that.

919–920

In the failing case, AntiDepReg is R23, which is a subregister of D11. This loop goes over the successors of the def of R23: if all successors are either dependent on R23 or any subregister of R23, they are fine, because the def of R23 will cover them. Only if any of them depends on an alias of R23 that is not contained within R23 (such as D11, for example), the loop resets the anti-dep register and prevents renaming.
The check is really "if (R is not contained within AntiDepReg) and (R is aliased to AntiDepReg), then don't rename".

Just out of interested: It seems to me that we only enable anti dependenc breakers for PPC and X86 by overriding getAntiDepBreakMode(), why does this code affect hexagon?

Hexagon will use the aggressive anti-dependency breaker. In fact, our production compiler has it enabled (as well as the post-RA scheduler). I want to switch to using subregister liveness tracking, and it exposed this issue. The production compiler uses some intricate code to keep the liveness up to date using implicit operands and I want to eliminate it in favor of the subregister liveness. After I get that to work, I'll make upstream changes to enable both post-RA scheduler and anti-dependency breaker for Hexagon. In theory I could upstream that "intricate code", but I don't think there is a point if the next step will be to delete it.

MatzeB accepted this revision.May 25 2016, 6:43 PM
MatzeB edited edge metadata.

LGTM (with Aliases variable moved as discussed).

This revision is now accepted and ready to land.May 25 2016, 6:43 PM
MatzeB requested changes to this revision.May 25 2016, 6:48 PM
MatzeB edited edge metadata.

Actually I have to withdraw my approval: There is not test. Could you try coming up with a .mir testcase that does llc -run-pass post-ra-sched or so?

This revision now requires changes to proceed.May 25 2016, 6:48 PM

Sure, let me rerun the test suite with the upstream compiler (with relevant passes enabled). The original testcase doesn't fail with the upstream compiler due to a difference in register allocation, so I'll need to come up with another one. I've never used mir, so this is likely the fastest route for me to get a testcase.

kparzysz updated this revision to Diff 58631.May 26 2016, 9:20 AM
kparzysz edited edge metadata.

Added testcase, moved BitVector allocation to the top level.

kparzysz updated this revision to Diff 58632.May 26 2016, 9:21 AM
kparzysz edited edge metadata.

Forgot to commit a change before last update.

MatzeB accepted this revision.May 26 2016, 11:24 AM
MatzeB edited edge metadata.

LGTM

This revision is now accepted and ready to land.May 26 2016, 11:24 AM
kparzysz closed this revision.May 26 2016, 11:33 AM

Committed in r270885.