This is an archive of the discontinued LLVM Phabricator instance.

MachineLICM: Add new condition for hoisting of caller preserved registers
ClosedPublic

Authored by lei on May 25 2017, 1:08 PM.

Details

Summary

On PPC, global variable access is done through the table of contents (TOC) which is always in register X2. The ABI reserves this register in any functions that have calls or access global variables. Furthermore, a call through a function pointer involves saving, changing and restoring the register around the call.

Currently MachineLICM is not able to hoist instructions used to get the TOC-based addresses out of a loop if there is a virtual function call within that loop. This is because X2 is (visibly) saved/restored by the caller around the call. MachineLICM sees the use of the X2 register but since it is NOT a constant physical register in this machine function, it deemed it unsafe and does not hoist it out of the loop. Of course, overriding isConstantPhysicalRegister() for X2 is not valid since there are visible defs of the register.

What is needed is a way to communicate to MachineLICM that even though the register isn't constant in the function, it is preserved between uses - i.e. ensure that any clobber of X2 will be followed by a restore before the next use. This patch adds such a query to the MachineRegisterInfo/TargetRegisterInfo.

SPEC perf run showed negligible differences in speedups.

Diff Detail

Event Timeline

lei created this revision.May 25 2017, 1:08 PM
lei updated this revision to Diff 100448.May 26 2017, 12:23 PM

Specific to 64bit LE

iteratee accepted this revision.May 26 2017, 12:56 PM

This looks fine to me.

This revision is now accepted and ready to land.May 26 2017, 12:56 PM

Looks ok to me too, Matthias?

-eric

MatzeB edited edge metadata.May 26 2017, 2:19 PM

Instead of introducing yet another target callback, how about some simple heuristic like MRI.hasOneDef(Reg) && !MRI.getUsedPhysRegsMask().test(Reg)[1] allowing hoisting up to the point of that definition (which you can find out with MRI.def_instr_begin(Reg) or similar).

[1] We really oughta rename UsedPhysRegMask to something that makes sense (it's simply the union of call regmask operands in the function).

nemanjai edited edge metadata.May 28 2017, 10:25 PM

Instead of introducing yet another target callback, how about some simple heuristic like MRI.hasOneDef(Reg) && !MRI.getUsedPhysRegsMask().test(Reg)[1] allowing hoisting up to the point of that definition (which you can find out with MRI.def_instr_begin(Reg) or similar).

[1] We really oughta rename UsedPhysRegMask to something that makes sense (it's simply the union of call regmask operands in the function).

If that does the job, I think it would be great to not have to add another target hook. But to be honest, it doesn't really sound like it will solve this problem. Namely, the call through a function pointer does clobber X2 (but restores it on return). So a call of that sort being in the loop will presumably still prevent the global address calculation (which uses X2) from being hoisted out of the loop.

Instead of introducing yet another target callback, how about some simple heuristic like MRI.hasOneDef(Reg) && !MRI.getUsedPhysRegsMask().test(Reg)[1] allowing hoisting up to the point of that definition (which you can find out with MRI.def_instr_begin(Reg) or similar).

[1] We really oughta rename UsedPhysRegMask to something that makes sense (it's simply the union of call regmask operands in the function).

If that does the job, I think it would be great to not have to add another target hook. But to be honest, it doesn't really sound like it will solve this problem. Namely, the call through a function pointer does clobber X2 (but restores it on return). So a call of that sort being in the loop will presumably still prevent the global address calculation (which uses X2) from being hoisted out of the loop.

You talk about a call instruction? Is X2 saved and restored in the called function? Then it's just a CSR and should not be mentioned in the clobber list so no problem with my proposal above.

You talk about a call instruction? Is X2 saved and restored in the called function? Then it's just a CSR and should not be mentioned in the clobber list so no problem with my proposal above.

But it is the caller that saves and restores it, not the callee. The sequence is essentially this (all in the caller of course):

  • Save X2 to it's stack slot
  • Update X2 prior to the call
  • Call the function through a pointer
  • Restore X2 immediately after the call
lei accepted this revision.May 30 2017, 10:15 AM

Matthias (@MatzeB), do you have any additional comments for this patch?

You talk about a call instruction? Is X2 saved and restored in the called function? Then it's just a CSR and should not be mentioned in the clobber list so no problem with my proposal above.

But it is the caller that saves and restores it, not the callee. The sequence is essentially this (all in the caller of course):

  • Save X2 to it's stack slot
  • Update X2 prior to the call
  • Call the function through a pointer
  • Restore X2 immediately after the call

But all of that occurs via the linker.
X2 is saved by a trampoline if necessary, and restored in the nop slot if necessary.
As far as the surrounding code is concerned, the trampoline is the callee, (the nop slot may be considered part of the trampoline.)
Treating X2 as callee saved makes perfect sense as far as llvm is concerned.

You talk about a call instruction? Is X2 saved and restored in the called function? Then it's just a CSR and should not be mentioned in the clobber list so no problem with my proposal above.

But it is the caller that saves and restores it, not the callee. The sequence is essentially this (all in the caller of course):

  • Save X2 to it's stack slot
  • Update X2 prior to the call
  • Call the function through a pointer
  • Restore X2 immediately after the call

But all of that occurs via the linker.
X2 is saved by a trampoline if necessary, and restored in the nop slot if necessary.
As far as the surrounding code is concerned, the trampoline is the callee, (the nop slot may be considered part of the trampoline.)
Treating X2 as callee saved makes perfect sense as far as llvm is concerned.

For a normal call the linker inserts the save and restore. When calling through a function pointer the compiler inserts the save/restore of the TOC pointer, and that is what is causing the address calculations not to get hoisted.

MatzeB added inline comments.May 30 2017, 11:28 AM
include/llvm/CodeGen/MachineRegisterInfo.h
584–586 ↗(On Diff #100448)

No need to add this to MachineRegisterInfo.

include/llvm/Target/TargetRegisterInfo.h
500–501

Please improve upon the description. Currently this sounds like a check where PhysReg is a CSR register, which you just explained to me in the review is not the case!

test/CodeGen/PowerPC/licm-tocReg.ll
2
  • Can you write a .mir test that is better targetted at the problem?
  • Please add a sentence or two to describe what you are testing.

You talk about a call instruction? Is X2 saved and restored in the called function? Then it's just a CSR and should not be mentioned in the clobber list so no problem with my proposal above.

But it is the caller that saves and restores it, not the callee. The sequence is essentially this (all in the caller of course):

  • Save X2 to it's stack slot
  • Update X2 prior to the call
  • Call the function through a pointer
  • Restore X2 immediately after the call

But all of that occurs via the linker.
X2 is saved by a trampoline if necessary, and restored in the nop slot if necessary.
As far as the surrounding code is concerned, the trampoline is the callee, (the nop slot may be considered part of the trampoline.)
Treating X2 as callee saved makes perfect sense as far as llvm is concerned.

For a normal call the linker inserts the save and restore. When calling through a function pointer the compiler inserts the save/restore of the TOC pointer, and that is what is causing the address calculations not to get hoisted.

Just for reference... Here's what Machine LICM sees for the call through a function ptr:

BCTRL8_LDinto_toc 24, %X1, <regmask %CR2 %CR3 %CR4 %F14 %F15 %F16 %F17 %F18 %F19 %F20 %F21 %F22 %F23 %F24 %F25 %F26 %F27 %F28 %F29 %F30 %F31 %R14 %R15 %R16 %R17 %R18 %R19 %R20 %R21 %R22 %R23 %R24 %R25 %R26 %R27 %R28 %R29 %R30 %R31 %V20 %V21 %V22 %V23 %V24 %V25 %V26 %V27 %V28 %V29 %V30 %V31 %VF20 %VF21 %VF22 %VF23 %VF24 %VF25 %VF26 %VF27 %VF28 %VF29 %VF30 %VF31 %X14 %X15 %X16 %X17 %X18 %X19 %X20 %X21 %X22 %X23 %X24 %X25 %X26 %X27 %X28 %X29 %X30 %X31 %CR2EQ %CR3EQ %CR4EQ %CR2GT %CR3GT %CR4GT %CR2LT %CR3LT %CR4LT %CR2UN %CR3UN %CR4UN>, %LR8<imp-def,dead>, %X2<imp-def,dead>, %CTR8<imp-use>, %RM<imp-use>, %X3<imp-use>, %X12<imp-use>, %X2<imp-use>, %R1<imp-def>, %X3<imp-def>

And of course, this "instruction" will emit both the branch on the count register and the load into X2 to restore the TOC pointer. Of course, previously to this, prepareCall() will have emitted a store of the TOC pointer to the stack.

So it really is the caller that saves and restores the TOC ptr. So X2 is not a CSR - the caller modifies it so it is of course not a "constant physical register". The only thing that makes this situation special is that what needs to be communicated to Machine LICM is that this modification is guaranteed to give X2 the same value it had before the call.

lei added inline comments.May 31 2017, 8:24 AM
include/llvm/CodeGen/MachineRegisterInfo.h
584–586 ↗(On Diff #100448)

k

include/llvm/Target/TargetRegisterInfo.h
500–501

k

test/CodeGen/PowerPC/licm-tocReg.ll
2

When I tried to create a .mir test I get machine verifier errors. Since there are no passes invoked that identify the TOC access in the function, we don't reserve X2.
The problem is that MachineFunctionInfo isn't saved/dumped as part of emitting .mir. So any passes that rely on a previously populated MFI run the risk of failing. Specifically, this is an issue with TOC access which is what I'm trying to test. The proper solution would be to emit MFI data .mir, but that is beyond the scope of this patch.

Would an IR test with -stop-before/-stop-after checks be a reasonable compromise?

MatzeB added inline comments.May 31 2017, 10:51 AM
test/CodeGen/PowerPC/licm-tocReg.ll
2

That's unfortunate. In that case I'd rather leave the test as is (this is about late MachineLICM so there's not that many passes coming after it anyway that could destabilize the test).
My point about documenting the test a bit still stands though (for example add a comment, which of the instructions was hoisted out of the loop and which call was blocking it before this patch).

lei added inline comments.May 31 2017, 11:55 AM
test/CodeGen/PowerPC/licm-tocReg.ll
2

Will do

lei updated this revision to Diff 101049.Jun 1 2017, 10:35 AM

Addressed all comments.

lei marked 6 inline comments as done.Jun 1 2017, 10:36 AM
lei added a comment.Jun 6 2017, 8:34 AM

Hi Matthias (@MatzeB), any additional comments?

MatzeB accepted this revision.Jun 14 2017, 4:34 PM

LGTM with nitpicks.

include/llvm/Target/TargetRegisterInfo.h
505

The last sentence is not true anymore.

lib/CodeGen/MachineLICM.cpp
895–898

Could you move this check below the !isConstantPhysReg check as I expect this to be a rare case.

lei added inline comments.Jun 15 2017, 8:22 AM
include/llvm/Target/TargetRegisterInfo.h
505

will remove.

lib/CodeGen/MachineLICM.cpp
895–898

This needs to be above !isConstantPhysReg which is an early function exit check where as this is just an loop exit check. MRI->isConstantPhysReg(X2) will return false which will cause this function to exit with false. I could combine the two into:

if (!MRI->isConstantPhysReg(Reg) && !(TRI->isCallerPreservedPhysReg(Reg, *I.getParent()->getParent())))
   return false;
lei updated this revision to Diff 102688.Jun 15 2017, 10:41 AM

Update as per comments

lei marked 4 inline comments as done.Jun 15 2017, 10:41 AM
MatzeB added inline comments.Jun 15 2017, 11:29 AM
lib/CodeGen/MachineLICM.cpp
895–898

yes of course, my point was to check the ConstantPhysReg condition first and then the isCallerPreservedPhysReg which this does fine.

This revision was automatically updated to reflect the committed changes.