This is an archive of the discontinued LLVM Phabricator instance.

RegisterCoalescer: Fix joinReservedPhysReg()
ClosedPublic

Authored by MatzeB on Feb 1 2017, 6:38 PM.

Diff Detail

Repository
rL LLVM

Event Timeline

MatzeB created this revision.Feb 1 2017, 6:38 PM
kparzysz added inline comments.Feb 2 2017, 9:54 AM
lib/CodeGen/RegisterCoalescer.cpp
1825

I think starting from the next index is somewhat dangerous. Consider this scenario: Hexagon has an instruction "J4_jumpsetr Rd, Rs, #bb", which is equivalent to "Rd = Rs; jump #bb". This one will not cause a problem here, but it inspired me to invent a new hypothetical instruction where the branch is indirect: "J4_jumprsetr Rd, Rs, Rt", defined as "Rd = Rs; jumpr Rt".

I converted the testcase to Hexagon (use R30 instead of FP), with the change that it now uses the invented instruction to define virtual register: "J4_jumprsetr %0, R0, R30". Before coalescing, this would be equivalent to "%0 = R0; jumpr R30", after coalescing it becomes "R30 = R0; jumpr R30". While with the Hexagon semantics the R30 in the target of the branch would still be the old value, I'm not sure that this can be said about any potential architecture (which would support this kind of instructions). Reserved physical registers may not obey the same use/def ordering, and there is no guarantee that the branch to R30 above would not be redirected to the newly defined value (or that is still a legal instruction to begin with).

Here's the modified testcase:

# RUN: llc -mtriple=hexagon -run-pass=simple-register-coalescing %s -o - | FileCheck %s
--- |
  define void @func1() { ret void }
...
---
name: func1
body: |
  bb.0:
    successors: %bb.3, %bb.4
    J2_jumpt undef %p0, %bb.3, implicit def %pc
    J2_jump %bb.4, implicit-def %pc

  bb.1:
    successors: %bb.2, %bb.5
    %r30 = COPY %0
    J2_jumpt undef %p0, %bb.2, implicit-def %pc
    J2_jump %bb.5, implicit-def %pc

  bb.2:
    successors: %bb.6
    %r30 = COPY %r0; outside the lifetime of %0, so shouldn't matter
    J2_jump %bb.6, implicit-def %pc

  bb.3:
    %r0 = COPY %r30 ; outside the lifetime of %0, so shouldn't matter
    J2_jumpr %r31, implicit-def %pc

  bb.5:
    S2_storeri_io %r30, 0, %r30
    J2_jumpr %r31, implicit-def %pc

  bb.6:
    J2_jumpr %r31, implicit-def %pc

  bb.4:
    successors: %bb.1
    %0 : intregs = J4_jumprsetr %r0, %r30, implicit-def %pc
...

You'd need to add a definition of it to the Hexagon target for the testcase to work, but you can change the testcase to use J4_jumpsetr (no "r" in the middle) to still get an instruction that uses R30.

Here's the output:

********** SIMPLE REGISTER COALESCING **********
********** Function: func1
********** JOINING INTERVALS ***********
(null):
64B     %R30<def> = COPY %vreg0; IntRegs:%vreg0
        Considering merging %vreg0 with %R30
                RHS = %vreg0 [48B,64r:0)[304r,320B:0)  0@304r
                Removing phys reg def of %R30 at 64B
                updated: 304B   %R30<def> = J4_jumprsetr %R0, %R30, %PC<imp-def>
        Success: %vreg0 -> %R30
        Result = %R30
(null):
(null):
128B    %R30<def> = COPY %R0
        Not coalescable.
(null):
(null):
176B    %R0<def> = COPY %R30
        Not coalescable.
(null):
(null):
Trying to inflate 0 regs.

and we end up with

288B    BB#6:
            Predecessors according to CFG: BB#0
304B            %R30<def> = J4_jumprsetr %R0, %R30, %PC<imp-def>
            Successors according to CFG: BB#1(0x80000000 / 0x80000000 = 100.00%)
MatzeB added inline comments.Feb 2 2017, 6:01 PM
lib/CodeGen/RegisterCoalescer.cpp
1825

Sorry, I am getting confused here. For my understand: Your concern is about the defining instruction of the virtual register. You think of "R30 = R0; jumpr R30" as execution in parallel (VLIW style). IN terms of register allocation/semantics this would be: read R0 and R30, the execute the two instructions; then write R30? That would match the semantics we model in our liveranges. If this is not the case we would need to set the definition to the earlyclobber register slot. You are right that this case is not handled correctly in the patch right now. I'll update.

MatzeB updated this revision to Diff 86932.Feb 2 2017, 6:38 PM

Addressed Krzysztofs comments. I will try to extend the tests for those sneaky corner cases tomorrow.

MatzeB added a comment.Feb 2 2017, 6:39 PM

I will try to update the tests tomorrow. BTW: For testing regalloc infrastructure I found that you do not need to invent new instructions, just take an existing one (like NOP) and add implicit def and use operands in your .mir test case :)

MatzeB updated this revision to Diff 87031.Feb 3 2017, 2:48 PM

Add tests for early-clobber corner cases.

kparzysz accepted this revision.Feb 6 2017, 6:48 AM
kparzysz added inline comments.
lib/CodeGen/RegisterCoalescer.cpp
1825

Yes, although I missed the earlyclobber solution (I was thinking about the concept of "late use" and the earlyclobber flag eluded me here).

This revision is now accepted and ready to land.Feb 6 2017, 6:48 AM
wmi edited edge metadata.Feb 6 2017, 3:19 PM

Sorry to chime in late.

test/CodeGen/AArch64/regcoal-physreg.mir
125–132

If bb.4 has another successor to bb.5, after %0 = ADRP 0 is changed to %fp = ADRP 0, it may change the value of %fp in bb.5. So the original check that the def of physical register has no overlap with the interval of SrcReg is not enough in the case of complex control flow?

bb.4:
  successors: %bb.1(0x40000000), %bb.5(0x40000000)

  %fp = COPY %x3
  %0 : gpr64 = ADRP 0
  CBZX undef %x0, %bb.5
  B %bb.1

bb.5:
  STRXui %fp, %fp, 0
  RET_ReallyLR
MatzeB added a comment.Feb 6 2017, 4:54 PM

Good catch Wei! Looks like the only sensible way to fix this today is to require the liverange to be whithin a single basic block.

I put this simple fix into a different review: https://reviews.llvm.org/D29436 that we can hopefully pick into the llvm 4.0 branch.
I will update this review as a followup patch to D29436 as it has grown some interesting features like proper support for earlyclobber defs and multiple uses of the vreg and some code cleanups (they are however not critical enough that they must go into the 4.0 branch).

MatzeB updated this revision to Diff 87343.Feb 6 2017, 5:35 PM
MatzeB edited the summary of this revision. (Show Details)

Addressed review comments. Note that I put the fix of the original problem into D29611 now.

This revision was automatically updated to reflect the committed changes.