This is an archive of the discontinued LLVM Phabricator instance.

[ShrinkWrap] check for PPC's non-callee-saved LR
ClosedPublic

Authored by nickdesaulniers on Dec 30 2021, 2:30 PM.

Details

Summary

As pointed out in https://reviews.llvm.org/D115688#inline-1108193, we
don't want to sink the save point past an INLINEASM_BR; otherwise the
PrologEpilogInserter pass may incorrectly sink a prologue past the MBB
containing an INLINEASM_BR and into the wrong MBB.

ShrinkWrap gets this wrong because LR is not in the list of callee-saved
registers. Specifically, ShrinkWrap::useOrDefCSROrFI calls
RegisterClassInfo::getLastCalleeSavedAlias, which reads
CalleeSavedAliases, which is populated by
RegisterClassInfo::runOnMachineFunction by iterating the list of
MCPhysReg returned from MachineRegisterInfo::getCalleeSavedRegs.

Because PPC's LR is non-allocatable, it is NOT considered callee saved.
Add an interface to TargetRegisterInfo for such a case and use it in
ShrinkWrap to ensure we don't sink a prologue past an INLINEASM or
INLINEASM_BR that clobbers LR.
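
A minimal sketch of the resulting shape (hedged; the hook name follows the
discussion below, and the committed signatures may differ):

// New TargetRegisterInfo hook; the base implementation conservatively
// reports nothing, so only targets that opt in change behavior.
virtual bool isNonallocatableRegisterCalleeSave(MCRegister Reg) const {
  return false;
}

ShrinkWrap::useOrDefCSROrFI then treats a physical register flagged by this
hook like a CSR use/def, which keeps the save point from sinking past the
inline asm that mentions it.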

Diff Detail

Event Timeline

nickdesaulniers requested review of this revision. Dec 30 2021, 2:30 PM
Herald added a project: Restricted Project. Dec 30 2021, 2:30 PM
nickdesaulniers planned changes to this revision. Jan 4 2022, 10:33 AM
nickdesaulniers added inline comments.
llvm/lib/CodeGen/ShrinkWrap.cpp
515–519

There's something simpler going on here. The same exact input works as expected for other architectures with link registers, such as arm and aarch64. It seems that useOrDefCSROrFI is returning true for the INLINEASM_BR for those architectures, but not powerpc64le.

I would expect that useOrDefCSROrFI should encompass any behavior we care about; in general, there isn't any reason why we can't sink save points past an INLINEASM_BR. It's possible we need to special-case non-allocatable registers in certain cases, or something like that.

llvm/lib/CodeGen/ShrinkWrap.cpp
515–519

In aarch64 and arm, the MachineOperand that useOrDefCSROrFI is interested
in is marked dead, but for ppc64le it's not.

implicit-def dead early-clobber $lr (arm, aarch64)
implicit-def early-clobber $lr (ppc64le)

For arm and aarch64, the MachineOperand is marked dead by the livevars
pass. For ppc64le, it is not.


Also, within useOrDefCSROrFI, the call to RCI.getLastCalleeSavedAlias returns 0 on ppc, unlike aarch64 and arm.
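
For context, CalleeSavedAliases is populated roughly like this (paraphrased
from RegisterClassInfo::runOnMachineFunction; simplified, not the exact
code):

// Only registers that alias something in getCalleeSavedRegs() get an
// entry; anything else (like PPC's LR here) maps to 0.
for (const MCPhysReg *I = MF.getRegInfo().getCalleeSavedRegs(); *I; ++I)
  for (MCRegAliasIterator AI(*I, TRI, /*IncludeSelf=*/true); AI.isValid(); ++AI)
    CalleeSavedAliases[*AI] = *I;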

efriedma added inline comments. Jan 4 2022, 10:59 AM
llvm/lib/CodeGen/ShrinkWrap.cpp
515–519

The PowerPC "lr" register is not a general-purpose register, and can only be accessed by a few specialized instructions. Because of this, it isn't marked allocatable (which is why it isn't marked "dead"). Not sure this is actually the right decision; if we want various interactions here to work well, it probably should be allocatable.

That said, given the current state, the best incremental step might be something like isNonallocatableRegisterCalleeSave.

I assume this issue applies to both INLINEASM and INLINEASM_BR.

efriedma added inline comments. Jan 4 2022, 11:02 AM
llvm/lib/CodeGen/ShrinkWrap.cpp
515–519

That said, given the current state, the best incremental step might be something like isNonallocatableRegisterCalleeSave.

I mean, adding a new target hook named something like isNonallocatableRegisterCalleeSave

llvm/lib/CodeGen/ShrinkWrap.cpp
515–519

Shouldn't LR be listed explicitly in llvm/lib/Target/PowerPC/PPCCallingConv.td def CSR_PPC64's list of CalleeSavedRegs? LR is explicitly listed in llvm/lib/Target/AArch64/AArch64CallingConvention.td def CSR_AArch64_AAPCS's CalleeSavedRegs.

llvm/lib/CodeGen/ShrinkWrap.cpp
515–519
diff --git a/llvm/lib/Target/PowerPC/PPCCallingConv.td b/llvm/lib/Target/PowerPC/PPCCallingConv.td
index 1e81276f1de3..f9162a3716fe 100644
--- a/llvm/lib/Target/PowerPC/PPCCallingConv.td
+++ b/llvm/lib/Target/PowerPC/PPCCallingConv.td
@@ -298,7 +298,7 @@ def CSR_PPC64   : CalleeSavedRegs<(add X14, X15, X16, X17, X18, X19, X20,
                                         X21, X22, X23, X24, X25, X26, X27, X28,
                                         X29, X30, X31, F14, F15, F16, F17, F18,
                                         F19, F20, F21, F22, F23, F24, F25, F26,
-                                        F27, F28, F29, F30, F31, CR2, CR3, CR4
+                                        F27, F28, F29, F30, F31, CR2, CR3, CR4, LR
                                    )>;

appears to fix shrinkwrap, but later asserts:

Unknown regclass!

If I use LR8 instead, then shrink-wrap doesn't work; I think the
implicit-def early-clobber $lr MachineOperand might need to be emitted
using $lr8, i.e. as implicit-def early-clobber $lr8?

llvm/lib/CodeGen/ShrinkWrap.cpp
515–519

If I use LR8 in llvm/lib/Target/PowerPC/PPCCallingConv.td AND lr8 in the
clobber list, then that works... IIUC, lr8 is the 64b alias of the 32b lr?
Then shouldn't lr in an inline asm clobber list be emitted as LR8 rather
than LR?

efriedma added inline comments. Jan 4 2022, 12:04 PM
llvm/lib/CodeGen/ShrinkWrap.cpp
515–519

That might work? Might need some additional adjustments. I don't think any other targets put reserved/non-allocatable registers in the CalleeSavedRegs list.

efriedma added inline comments. Jan 4 2022, 12:12 PM
llvm/lib/CodeGen/ShrinkWrap.cpp
515–519

On 64-bit "lr" should translate to LR8, I think, yes.

nickdesaulniers retitled this revision from [ShrinkWrap] don't sink Save points past INLINEASM_BR MachineInstrs to [PowerPC] add LR to CalleeSavedRegs.
nickdesaulniers edited the summary of this revision. (Show Details)
  • rewrite; add LR to CalleeSavedRegs
nickdesaulniers added inline comments. Jan 4 2022, 2:04 PM
llvm/lib/Target/PowerPC/PPCCallingConv.td
301 (On Diff #397396)

I'm guessing we probably also want to add LR to the 32b CSR list?

llvm/test/CodeGen/PowerPC/ppc-shrink-wrapping.ll
7 (On Diff #397396)

Using update_llc_test_checks.py totally massacred this test. I'm happy to
update it by hand if the reviewers otherwise agree with the approach of
this patch.

llvm/test/CodeGen/PowerPC/ppc64-inlineasm-clobber.ll
75

Note to reviewers: this test case is the one I'm trying to fix here.
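
Roughly, the shape of that case (a reduced sketch from memory, not the
verbatim test):

define void @ClobberLR_BR() {
entry:
  ; A callbr (asm goto) whose clobber list names the link register.
  callbr void asm sideeffect "", "!i,~{lr}"()
          to label %fallthrough [label %indirect]

fallthrough:
  ret void

indirect:
  ret void
}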

llvm/test/CodeGen/PowerPC/xray-conditional-return.ll
1 (On Diff #397396)

I'm happy to pre-commit this conversion if reviewers are generally ok with this approach.

nickdesaulniers added a reviewer: Restricted Project. Jan 4 2022, 2:04 PM
  • add LR and LR8 to a few other CallingConventions

From the test changes, it looks like the current patch completely disables shrink-wrapping in all cases? Or am I missing something?

From the test changes, it looks like the current patch completely disables shrink-wrapping in all cases? Or am I missing something?

Doesn't @loopInfoRestoreOutsideLoop in llvm/test/CodeGen/PowerPC/ppc-shrink-wrapping.ll show that that's NOT the case? The mflr/mtlr pair are sunk+raised respectively since the entry block is conditional and doesn't use the stack.

@foo in llvm/test/CodeGen/PowerPC/ppc-shrink-wrapping.ll also has an early return rather than an unconditional prolog. bgelr (branch to lr if greater than or equal to). Compared to the codegen with shrink wrap disabled, the mflr/mtlr pair is unconditional.

Though I'm less convinced about @shrinkwrap in llvm/test/CodeGen/PowerPC/ppc64-rop-protection.ll and @test in llvm/test/CodeGen/PowerPC/pr43527.ll.

I would expect that useOrDefCSROrFI should encompass any behavior we care about; in general, there isn't any reason why we can't sink save points past an INLINEASM_BR. It's possible we need to special-case non-allocatable registers in certain cases, or something like that.
That said, given the current state, the best incremental step might be something like isNonallocatableRegisterCalleeSave.
I mean, adding a new target hook named something like isNonallocatableRegisterCalleeSave

Perhaps that's another approach I can try, but I suspect it will have the same results on tests. Let me see.

From the test changes, it looks like the current patch completely disables shrink-wrapping in all cases? Or am I missing something?

Doesn't @loopInfoRestoreOutsideLoop in llvm/test/CodeGen/PowerPC/ppc-shrink-wrapping.ll show that that's NOT the case? The mflr/mtlr pair are sunk+raised respectively since the entry block is conditional and doesn't use the stack.

@foo in llvm/test/CodeGen/PowerPC/ppc-shrink-wrapping.ll also has an early return rather than an unconditional prolog. bgelr (branch to lr if greater than or equal to). Compared to the codegen with shrink wrap disabled, the mflr/mtlr pair is unconditional.

Though I'm less convinced about @shrinkwrap in llvm/test/CodeGen/PowerPC/ppc64-rop-protection.ll and @test in llvm/test/CodeGen/PowerPC/pr43527.ll.

Maybe verify that llvm/test/CodeGen/PowerPC/ppc-shrink-wrapping.ll shrink-wraps in both 32-bit and 64-bit modes?

In any case, the patch is clearly having some effect besides the one you're trying for.

From the test changes, it looks like the current patch completely disables shrink-wrapping in all cases? Or am I missing something?

Doesn't @loopInfoRestoreOutsideLoop in llvm/test/CodeGen/PowerPC/ppc-shrink-wrapping.ll show that that's NOT the case? The mflr/mtlr pair are sunk+raised respectively since the entry block is conditional and doesn't use the stack.

@foo in llvm/test/CodeGen/PowerPC/ppc-shrink-wrapping.ll also has an early return rather than an unconditional prolog. bgelr (branch to lr if greater than or equal to). Compared to the codegen with shrink wrap disabled, the mflr/mtlr pair is unconditional.

Though I'm less convinced about @shrinkwrap in llvm/test/CodeGen/PowerPC/ppc64-rop-protection.ll and @test in llvm/test/CodeGen/PowerPC/pr43527.ll.

Maybe verify that llvm/test/CodeGen/PowerPC/ppc-shrink-wrapping.ll shrink-wraps in both 32-bit and 64-bit modes?

In any case, the patch is clearly having some effect besides the one you're trying for.

diff --git a/llvm/lib/CodeGen/ShrinkWrap.cpp b/llvm/lib/CodeGen/ShrinkWrap.cpp
index f89069e9f728..01ec62dc02a0 100644
--- a/llvm/lib/CodeGen/ShrinkWrap.cpp
+++ b/llvm/lib/CodeGen/ShrinkWrap.cpp
@@ -288,8 +288,13 @@ bool ShrinkWrap::useOrDefCSROrFI(const MachineInstr &MI,
       // separately. An SP mentioned by a call instruction, we can ignore,
       // though, as it's harmless and we do not want to effectively disable tail
       // calls by forcing the restore point to post-dominate them.
+      const MachineFunction *MF = MI.getParent()->getParent();
+      const TargetRegisterInfo *TRI = MF->getSubtarget().getRegisterInfo();
       UseOrDefCSR = (!MI.isCall() && PhysReg == SP) ||
-                    RCI.getLastCalleeSavedAlias(PhysReg);
+                    RCI.getLastCalleeSavedAlias(PhysReg) ||
+                    (MO.isEarlyClobber() &&
+                     !TRI->isCalleeSavedPhysReg(PhysReg, *MF) &&
+                     !TRI->isInAllocatableClass(PhysReg));
     } else if (MO.isRegMask()) {
       // Check if this regmask clobbers any of the CSRs.
       for (unsigned Reg : getCurrentCSRs(RS)) {

This fixes just the intended case of ClobberLR_BR in llvm/test/CodeGen/PowerPC/ppc64-inlineasm-clobber.ll without regressing any other tests and without any other changes necessary.

It's written assuming that early-clobber is only used in the clobber lists of TargetOpcode::INLINEASM and TargetOpcode::INLINEASM_BR. IDK if that's actually the case (i.e. can other MachineInstrs have early-clobber MachineOperands?); perhaps I should check for either of those two opcodes instead? The second part of the check, !TRI->isCalleeSavedPhysReg(PhysReg, *MF) && !TRI->isInAllocatableClass(PhysReg), could instead be converted into a virtual method on TargetRegisterInfo called isNonallocatableRegisterCalleeSave that returns false and is overridden by PPCRegisterInfo to return true for LR and LR8 parameters?
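
The opcode-based variant would look something like this (untested sketch):

// Alternative: key on the inline asm opcodes rather than early-clobber.
const bool IsInlineAsm = MI.getOpcode() == TargetOpcode::INLINEASM ||
                         MI.getOpcode() == TargetOpcode::INLINEASM_BR;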

nickdesaulniers retitled this revision from [PowerPC] add LR to CalleeSavedRegs to [ShrinkWrap] check for PPC's non-callee-saved LR.
nickdesaulniers edited the summary of this revision. (Show Details)
  • rewrite with new TargetRegisterInfo interface
  • restore previous short circuiting logic

Do we need coverage for lr use and/or def, in addition to clobber?

Do we need coverage for lr use and/or def, in addition to clobber?

Can you help me come up with test cases? :-3

I would have guessed for a use+def:

void x (void) {
    register int y asm("lr");
    asm (""::"r"(y));
}

but that actually asserts during post-RA pseudo expansion: Impossible reg-to-reg copy (probably we're missing support for mfspr?).

I also would have guessed there are intrinsics for accessing the lr on PPC; instead it seems that mfspr is used to read the PPC Special Purpose Registers (SPRs) into the General Purpose Registers (GPRs), IIUC.
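
For example, the usual idiom for reading it (a hand-written sketch; mflr is
the extended mnemonic for mfspr from SPR 8):

unsigned long current_lr(void) {
  unsigned long lr;
  /* mflr copies the link register into a general purpose register. */
  asm("mflr %0" : "=r"(lr));
  return lr;
}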

llvm/lib/CodeGen/ShrinkWrap.cpp
297

FWIW: this special case is also implicit-def. The full MachineOperand is implicit-def early-clobber $lr.

llvm/lib/Target/PowerPC/PPCRegisterInfo.h
190

I should probably assert that PPC::LR and PPC::LR8 are not callee saved, since that's the whole point of this interface.
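
Something like this is what I have in mind (sketch; exact assert placement
TBD):

bool PPCRegisterInfo::isNonallocatableRegisterCalleeSave(MCRegister Reg) const {
  // LR/LR8 are non-allocatable and deliberately absent from the CSR lists;
  // an assert to that effect would guard this assumption.
  return Reg == PPC::LR || Reg == PPC::LR8;
}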

Do we need coverage for lr use and/or def, in addition to clobber?

Can you help me come up with test cases? :-3

Oh, I assumed it was working based on other testcases in this file. I guess it actually isn't. (There are in-tree testcases with an "lr" output, but they discard the output value.)

I'm a little concerned that isEarlyClobber() won't handle this correctly if it ever does get implemented, but I guess we can leave that for a followup, if anyone hacking on the PPC backend is interested.

I would have guessed for a use+def:

void x (void) {
    register int y asm("lr");
    asm (""::"r"(y));
}

but that actually asserts during post-RA pseudo expansion: Impossible reg-to-reg copy (probably we're missing support for mfspr?).

Yes, probably copies just aren't implemented.

but I guess we can leave that for a followup, if anyone hacking on the PPC backend is interested.

Care to accept the revision then? Otherwise I could still modify isNonallocatableRegisterCalleeSave to be a combination of MCRegister.isAllocatable() && isCalleeSaved for the base case (which the PPC version will defer to after first checking for lr+lr8).

efriedma accepted this revision. Jan 5 2022, 2:00 PM

LGTM, but give it a few days in case a PPC backend expert wants to comment.

This revision is now accepted and ready to land. Jan 5 2022, 2:00 PM
  • add some asserts, make base isNonallocatableRegisterCalleeSave actually do work.
nickdesaulniers marked 4 inline comments as done. Jan 5 2022, 2:43 PM
nickdesaulniers added inline comments.
llvm/lib/Target/PowerPC/PPCRegisterInfo.h
190

I've modified TargetRegisterInfo::isNonallocatableRegisterCalleeSave since you accepted this patch, @efriedma, PTAL.

efriedma added inline comments. Jan 5 2022, 3:06 PM
llvm/include/llvm/CodeGen/TargetRegisterInfo.h
1107

What's the motivation for making the target-independent function non-trivial?

nickdesaulniers added inline comments. Jan 5 2022, 3:14 PM
llvm/include/llvm/CodeGen/TargetRegisterInfo.h
1107

Twofold:

  1. Given the method name isNonallocatableRegisterCalleeSave, we can actually tell whether a given register is allocatable or callee saved, i.e. do work that more closely matches the identifier.
  2. The method accepts a const MachineFunction * that is only really used to validate the callee-saved registers list. Otherwise, in PPCRegisterInfo::isNonallocatableRegisterCalleeSave() I need (void)MF; (I think) to silence unused-parameter warnings (or I could just revert the whole latest change).

Perhaps there's a more concise name than isNonallocatableRegisterCalleeSave if it's not actually going to tell you whether a register is BOTH non-allocatable AND callee saved (if we revert back to the target-independent base method just returning false)? I recognize I'm in bikeshed territory here and probably should just get back to work.

llvm/include/llvm/CodeGen/TargetRegisterInfo.h
1107

Thinking about this overnight, I think I should revert back to the previously approved diff (Diff 397682). Either PPC should make LR/LR8 callee saved, or it should not. I don't see that changing in the future if it doesn't do so now.

  • revert back to Diff 397682 (trivial target-independent function)
nickdesaulniers marked an inline comment as done. Jan 6 2022, 10:57 AM
efriedma accepted this revision. Jan 6 2022, 11:37 AM
nemanjai accepted this revision. Jan 7 2022, 5:28 AM

LGTM. Thanks for fixing this and sorry that our strange handling of the LR caused all the trouble. There are certainly reasons for the LR being weird on PPC.

jyknight added inline comments. Jan 7 2022, 12:07 PM
llvm/lib/CodeGen/ShrinkWrap.cpp
297

It's not clear to me why this checks "earlyclobber" -- surely it'd be wrong to shrinkwrap around any other uses of LR, as well, not just ones from inlineasm clobbers?

Thus, I'd suggest removing " && MO.isEarlyClobber()" here. Hopefully that won't cause any more test diffs. (And if it does, either that's showing examples of other non-asm things we're getting wrong, or else it shows that I don't understand how it's supposed to work...)

nickdesaulniers added inline comments. Jan 7 2022, 2:47 PM
llvm/lib/CodeGen/ShrinkWrap.cpp
297

My motivation was that this was probably tied to TargetOpcode::INLINEASM and TargetOpcode::INLINEASM_BR. I suspect (but don't know) that they are the only MachineInstrs that can have earlyclobber MachineOperands. That may not be correct; maybe it would better match my intent to just check for those 2 opcodes.

That said, if I removed the additional condition on MO.isEarlyClobber(), the following tests regress under llvm/test/:

Failed Tests (6):
  LLVM :: CodeGen/PowerPC/ppc-shrink-wrapping.ll
  LLVM :: CodeGen/PowerPC/ppc64-rop-protection-aix.ll
  LLVM :: CodeGen/PowerPC/ppc64-rop-protection.ll
  LLVM :: CodeGen/PowerPC/pr43527.ll
  LLVM :: CodeGen/PowerPC/remove-redundant-load-imm.ll
  LLVM :: CodeGen/PowerPC/xray-conditional-return.ll

(These were ones I had touched in an earlier version of this CL.) Playing with llvm/utils/update_llc_test_checks.py on some of those, it looks like we're blocking shrink wrapping with that approach.

llvm/lib/CodeGen/ShrinkWrap.cpp
297

Ah, because $lr8 is an operand of each return on ppc: BLR8 implicit $lr8, implicit $rm. Technically it is a "use", but now we're preventing the restore point from being raised (when I remove the condition on MO.isEarlyClobber()).

aarch64 (technically I think it's "ARMv8" and newer) has ret instructions; I'm guessing armv7's bx lr would also have this issue. It looks like in MIR, BX_RET 14, $noreg, implicit $r0 is lowered to bx lr, so the MIR equivalent of bx lr doesn't list $lr as an operand (this is specific to arm). Should those instructions have implicit uses of their link register? Or should ppc's BLR8 not have an implicit use of $lr8?

Perhaps changing the condition to MO.isDef() (from MO.isEarlyClobber()) or MO.isImp() would be more appropriate? Or only checking the MI.getOpcode() == TargetOpcode::INLINEASM || MI.getOpcode() == TargetOpcode::INLINEASM_BR?

jyknight added inline comments. Jan 10 2022, 2:56 PM
llvm/lib/CodeGen/ShrinkWrap.cpp
297

Maybe excluding MI.isReturn() from the check?
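
That is, something like (untested sketch of the adjusted condition in
useOrDefCSROrFI):

UseOrDefCSR = (!MI.isCall() && PhysReg == SP) ||
              RCI.getLastCalleeSavedAlias(PhysReg) ||
              (!MI.isReturn() &&
               TRI->isNonallocatableRegisterCalleeSave(PhysReg));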

  • check for !MI.isReturn() rather than MO.isEarlyClobber
nickdesaulniers marked 2 inline comments as done. Jan 10 2022, 3:00 PM
nickdesaulniers added inline comments.
llvm/lib/CodeGen/ShrinkWrap.cpp
297

Hah! I was literally testing that same idea! Yeah, it works. Uploaded that version, PTAL.

nickdesaulniers marked an inline comment as done.
  • rebase
jyknight accepted this revision. Jan 11 2022, 7:53 AM
This revision was automatically updated to reflect the committed changes.