This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range
ClosedPublic

Authored by RamNalamothu on Nov 26 2021, 10:37 AM.

Details

Summary

Currently the return address ABI registers s[30:31], which fall in the call
clobbered register range, are added as a live-in on the function entry to
preserve its value when we have calls so that it gets saved and restored
around the calls.

But the DWARF unwind information (CFI) needs to track where the return address
resides in a frame and the above approach makes it difficult to track the
return address when the CFI information is emitted during the frame lowering,
due to the involvment of understanding the control flow.

This patch moves the return address ABI registers s[30:31] into callee saved
registers range and stops adding live-in for return address registers, so that
the CFI machinery will know where the return address resides when CSR
save/restore happen during the frame lowering.

And doing the above poses an issue that now the return instruction uses undefined
register sgpr30_sgpr31. This is resolved by hiding the return address register
use by the return instruction through the SI_RETURN pseudo instruction, which
doesn't take any input operands, until the SI_RETURN pseudo gets lowered to the
S_SETPC_B64_return during the expandPostRAPseudo().

As an added benefit, this patch simplifies overall return instruction handling.

Note: The AMDGPU CFI changes are there only in the downstream code and another
version of this patch will be posted for review for the downstream code.

Diff Detail

Event Timeline

RamNalamothu created this revision.Nov 26 2021, 10:37 AM
RamNalamothu requested review of this revision.Nov 26 2021, 10:37 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 26 2021, 10:37 AM
sebastian-ne added inline comments.Nov 26 2021, 3:28 PM
llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp
94–96

Can the MVT be deduced from getRegSizeInBits(*getPhysRegClass(Reg))? I think that would look cleaner.

232

Why is the return address only marked as callee-save if this function has calls? I assume it can’t it be overwritten if there are no calls because it is marked as reserved?

Coming back here after reading the tests, I think s[30:31] should also be marked as callee-save if there are no calls. It shouldn’t cause harm if there are no calls (the register is reserved anyway) and it should fix the tests.

llvm/test/CodeGen/AMDGPU/branch-relax-spill.ll
897

This test is not working with this patch. It has no calls, so s[30:31] are not marked as callee-save. But they are overwritten in inline assembly.
Before this patch, they were saved and restored. With this patch this is not the case and the return will jump to an invalid address.

llvm/test/CodeGen/AMDGPU/call-preserved-registers.ll
75–84

I think this test is supposed to clobber the return address in s[30:31].

llvm/test/CodeGen/AMDGPU/callee-frame-setup.ll
419

I think these tests should continue to work without the need to introduce a call.

llvm/test/CodeGen/AMDGPU/gfx-callable-preserved-registers.ll
127

I think this test purposefully clobbered the return address in s[30:31].

sebastian-ne added inline comments.Nov 26 2021, 3:28 PM
llvm/test/CodeGen/AMDGPU/gfx-callable-return-types.ll
26–49

This codes gets quite a bit worse. But I’m not sure if we can do anything about it.

RamNalamothu added inline comments.Nov 26 2021, 8:42 PM
llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp
94–96

Can the MVT be deduced from getRegSizeInBits(*getPhysRegClass(Reg))? I think that would look cleaner.

Will check, thank you.

232

Why is the return address only marked as callee-save if this function has calls? I assume it can’t it be overwritten if there are no calls because it is marked as reserved?

Yes.

Coming back here after reading the tests, I think s[30:31] should also be marked as callee-save if there are no calls. It shouldn’t cause harm if there are no calls (the register is reserved anyway) and it should fix the tests.

AFAIK that's the eventual goal and there is a discussion to pick the ABI registers for return address from the callee saved range. And these changes should work with no/minimal changes.

However, I don't think I understand how the tests will get fixed automatically if the return address registers are marked as callee saved because even the SGPR CSR save/restore happens during SGPRSpillLowering which is where I am manually adding s[30:31] to the CSR list.

llvm/test/CodeGen/AMDGPU/callee-frame-setup.ll
419

I think these tests should continue to work without the need to introduce a call.

Thanks. I will take a re-look at it. I don't recall now why I added the call.

llvm/test/CodeGen/AMDGPU/gfx-callable-preserved-registers.ll
127

I think this test purposefully clobbered the return address in s[30:31].

I believe the intent here is to test the behavior when non callee saved registers are clobbered inside inline asm but not particularly s[30:31] and I guess s[30:31] happened to be used since they are the boundary of non callee save range. The same goes for the other tests which were clobbering s[30:31].

As of now there is no policy on how the inline asm could use the ABI registers. I spoke to @arsenm and he is aware of this test changes.

llvm/test/CodeGen/AMDGPU/gfx-callable-return-types.ll
26–49

This codes gets quite a bit worse. But I’m not sure if we can do anything about it.

Yeah. This is a known scenario where previously the s[30:31] were being spilled to another SGPR pair, which was used to return, but now since the return address should always be in s[30:31] it has to be spilled into either VGPR or memory.

For now, we will have to live with it.

May be we could do something during SGPRSpillLowering to use the available SGPRs, not only for this case but in general?
I didn't check if that capability is already in place and s[30:31] is a corner case which is not handled.

scott.linder added inline comments.Nov 29 2021, 3:22 PM
llvm/test/CodeGen/AMDGPU/gfx-callable-return-types.ll
26–49

I'm not sure I understand how the old code is correct to begin with; does it rely on knowledge about the called function?

I think in a "fixed function ABI" world we can't just clobber (in this example) s[36:37], right?

sebastian-ne added inline comments.Nov 30 2021, 12:01 AM
llvm/test/CodeGen/AMDGPU/gfx-callable-return-types.ll
26–49

As you said, it relies on knowledge about the called function.
The IPRA (Inter Procedural Register Allocation) pass looks at direct callees and removes registers from the call-clobbered list if the called function does not touch them.

arsenm added inline comments.Dec 1 2021, 5:57 AM
llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp
232

I don't think this register should be reserved, which avoids the need to do this

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
686–689

I don't think it's necessary to reserve this

Rebase and fix tests in llvm/test/CodeGen/AMDGPU/callee-frame-setup.ll.

RamNalamothu marked 2 inline comments as done.Dec 1 2021, 6:23 AM
RamNalamothu added inline comments.
llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp
94–96

getRegSizeInBits(*getPhysRegClass(Reg)) is not useful as getPhysRegClass(Reg) gives base register class i.e. SReg_32.

/// Return the 'base' register class for this register.
/// e.g. SGPR0 => SReg_32, VGPR => VGPR_32 SGPR0_SGPR1 -> SReg_32, etc.
const TargetRegisterClass *getPhysRegClass(MCRegister Reg) const;
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
686–689

But otherwise the register allocation thinks it is available for allocation as s[30:31] are scratch registers at the moment and its usage with return instruction is hidden with using SI_RETURN through the register allocation and we don't emit Live-ins anymore.

And we will have to reserve it here even after these registers actually get picked from CSR range because we want to emit save/restore exclusively so that the unwind info machinery knows it and can generate CFI as needed for return address. That's the reason why all of this got started right.

llvm/test/CodeGen/AMDGPU/callee-frame-setup.ll
419

My bad. I understood them wrongly.

Yes, you are right. Removed the calls and fixed the tests accordingly.

arsenm added inline comments.Dec 1 2021, 7:26 AM
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
686–689

I think something special happens with the return address we are ignoring, but I haven't followed it all the way through.

AArch64 passes LR to the TargetRegisterInfo constructor, where we're passing the dummy PC_REG. If you replace that with the real return address, does that help?

arsenm added inline comments.Dec 1 2021, 7:36 AM
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
686–689

Even if not, AArch64 does not reserve the return address

RamNalamothu added inline comments.Dec 1 2021, 8:41 AM
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
686–689

I looked at it already and it appears there is nothing magical about it except getting MCRegisterInfo::getRARegister() to return that register from the constructor.

And we get the same by hardcoding s[30:31] in SIRegisterInfo::getReturnAddressReg().

Though I didn't go through everything that Hexagon does, but they also reserve the return address register.

def retflag : SDNode<"HexagonISD::RET_FLAG", SDTNone,
                      [SDNPHasChain, SDNPOptInGlue, SDNPVariadic]>;
def: Pat<(retflag),   (PS_jmpret (i32 R31))>;
sebastian-ne added inline comments.Dec 1 2021, 9:07 AM
llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp
232

Why is the return address not callee-save if there are no calls?

Does removing the && MFI.hasCalls() condition have any bad effects?

RamNalamothu added inline comments.Dec 1 2021, 9:19 AM
llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp
232

No functional issues but its just that it will be unnecessary to save/restore when the registers are not clobbered i.e. when we have no calls.

sebastian-ne added inline comments.Dec 2 2021, 1:15 AM
llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp
232

But if we have no calls and the register is not clobbered, marking the registers as callee-save shouldn’t save and restore them?

I think it doesn’t hurt to remove this condition. On the contrary, I expect that removing the hasCalls condition will fix the case where s30 or s31 are overwritten in inline-assembly.

RamNalamothu added inline comments.Dec 2 2021, 2:36 AM
llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp
232

Sorry, I didn't get your point well.

Are you saying we should save/restore s[30:31] even when we don't have calls i.e. even when s[30:31] is not modified in the current function? Does that help in any way, other than the inline asm scenario?

For the inline asm scenario, as I already mentioned. I heard that we currently don't have any policy and the ABI registers cannot be clobbered like any other scratch register right?
And even if we want to handle that scenario, I guess a blanket removal of hasCalls() condition is not the best/optimal IMO because we will be emitting unnecessary save/restore if the code inside inline asm does not actually modify s[30:31].

sebastian-ne added inline comments.Dec 2 2021, 3:45 AM
llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp
232

Ups, ignore my previous comment. I should have read the code again before making that comment.

What I thought this code would do, is add s[30:31] to the callee-save register list if this condition is satisfied. I did not see that it is already spilling it, based on the condition.

Let me try what happens if s[30:31] is (unconditionally) added to the CSR list and the code here is removed.

sebastian-ne added inline comments.Dec 2 2021, 5:37 AM
llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp
232

I tested it and I think marking s[30:31] as callee-save register instead of explicitly saving them here is better.

  1. We don’t need hasCall to check if s[30:31] may be overwritten.
  2. We shouldn’t need to mark the register as reserved (didn’t test this).
  3. It “just works”, even if there is inline assembly overwriting s30 or s31 (and there are no unnecessary saves/restores if it isn’t overwritten).

The last point also means you don’t need to add work-arounds to tests by introducing artificial calls or changing the registers in inline assembly, because it will keep working as is.

See this sketched commit: https://github.com/Flakebi/amdvlk-llvm-project/commit/240040601e0f8849acb85141142408b3e6a32ff9
E.g. it fixed the test here: https://github.com/Flakebi/amdvlk-llvm-project/commit/240040601e0f8849acb85141142408b3e6a32ff9#diff-25d99c15c21155a13557b29a1c0edfa792c19040f02a0d7cd676865b46275713L75-R88

RamNalamothu added inline comments.Dec 2 2021, 6:35 AM
llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp
232

What you did there is you just took a part of this patch and doing that is incorrect. If you still want to do those experiments, please take the full patch and try making your changes on top of that.

This patch makes changes to return instruction and its use of return address during the instruction selection. I think you missed to fully understand that aspect.

Please read again my earlier response to @arsenm which I copied here.

I don't think it's necessary to reserve this

But otherwise the register allocation thinks it is available for allocation as s[30:31] are scratch registers at the moment and its usage with return instruction is hidden with using SI_RETURN through the register allocation and we don't emit Live-ins anymore.

And we will have to reserve it here even after these registers actually get picked from CSR range because we want to emit save/restore exclusively so that the unwind info machinery knows it and can generate CFI as needed for return address. That's the reason why all of this got started right.

sebastian-ne added inline comments.Dec 2 2021, 7:01 AM
llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp
232

I used arc patch to get the diff (https://github.com/Flakebi/amdvlk-llvm-project/commit/240040601e0f8849acb85141142408b3e6a32ff9) and based my commit on top of that, so I hope it got all of your changes?

Unfortunately I’m not familiar with CFI (I guess it refers to call-frame-information and not control-flow-integrity?). Are there more upcoming patches that need the explicit spill here and can’t rely on the llvm infrastructure for CSRs?

I like that this patch simplifies the call-related code in amdgpu, but as you probably noticed, I’m not that happy that inline assembly that touches s[30:31] now ends up with non-working code.

If this is an intended change (and I don’t feel authoritative to say if this is fine or not), I think it should be mentioned in the commit message that s[30:31] is not saved anymore if used in inline assembly. And the tests that intend to test that (like @void_func_void_clobber_s30_s31) should probably just be removed.

RamNalamothu added inline comments.Dec 2 2021, 7:53 AM
llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp
232

Ah, my bad. I didn't see the parent commit.

It is call frame information and there will be additional CFI changes but only in the downstream. They are not/can't be up-streamed yet.

This patch is written with the assumption that s[30:31] can't be moved to CSR range right now. Like I mentioned in the beginning, there is already a discussion to pick a different register pair (i.e. s[28:29]) for return address which will be part of the (new) CSR range.

Yes, the condition check in SILowerSGPRSpills can be removed if we can move s[30:31] to CSR range now. @t-tye @arsenm what do you say?

However s[30:31] spill should not come from the register allocation and must be generated only during SGPR/frame lowering for the unwind info (CFI) generation. For this, I still believe s[30:31] has to be reserved so that the register allocation won't interfere with that. Or am I missing something here @scott.linder ?

RamNalamothu added inline comments.Dec 6 2021, 6:20 AM
llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp
232
RamNalamothu retitled this revision from [AMDGPU] Treat call clobbered return address registers s[30:31] as callee saved to [AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range.
RamNalamothu edited the summary of this revision. (Show Details)

Rebase onto ToT and also made the following changes to address review feedback.

  • Move s[30:31] into CSR range
  • Do not reserve s[30:31] during register allocation

Thanks for making these changes!
The patch looks good to me, I’ll leave it for others to accept.

arsenm added inline comments.Dec 15 2021, 4:40 PM
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
2048

If we're going to just hardcode this to s[30:31], probably should do away with getReturnAddressReg

2051–2056

There's a copyImplicitOperands helper for this

RamNalamothu marked an inline comment as done.

Rebase onto ToT and use copyImplicitOps.

RamNalamothu marked an inline comment as done.Dec 20 2021, 5:58 AM

I think I have addressed all the outstanding feedback. Is this good to go now?

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
2048

I believe using getReturnAddressReg() doesn't hurt and any future updates can be made easily without changing all the places.

arsenm accepted this revision.Dec 21 2021, 6:42 PM
arsenm added inline comments.
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
2052

Should break or return true, I don't like switch fallthrough here

This revision is now accepted and ready to land.Dec 21 2021, 6:42 PM
This revision was landed with ongoing or failed builds.Dec 22 2021, 7:21 AM
This revision was automatically updated to reflect the committed changes.
RamNalamothu marked an inline comment as done.
RamNalamothu marked an inline comment as done.Dec 22 2021, 7:24 AM
RamNalamothu added inline comments.
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
2052

Ah, thanks for catching it. Fixed in the committed changes.

RamNalamothu reopened this revision.Jan 14 2022, 11:21 AM
RamNalamothu marked an inline comment as done.
This revision is now accepted and ready to land.Jan 14 2022, 11:21 AM

Fixed an issue exposed by OpenMP LIT test failures.

determineCalleeSavesSGPR() now explicitly adds S30 and S31 to CSR list
when we have calls and this is needed for cases like D117243.

arsenm accepted this revision.Jan 14 2022, 1:39 PM
ronlieb accepted this revision.Jan 17 2022, 3:24 AM

Fixed the blender application failure seen in the downstream testing.

Added the test llvm/test/CodeGen/AMDGPU/ipra-return-address-save-restore.ll to catch that scenario.

Herald added a project: Restricted Project. · View Herald TranscriptMar 7 2022, 4:48 PM
Herald added a subscriber: arichardson. · View Herald Transcript
arsenm added inline comments.Mar 7 2022, 5:21 PM
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1351 ↗(On Diff #413656)

Referring to phabricator reviews in a comment is weird

1359 ↗(On Diff #413656)

Can just directly check isPhysRegModified(RetAddrReg)

Address review feedback.

RamNalamothu marked 2 inline comments as done.Mar 8 2022, 2:39 AM
RamNalamothu added inline comments.
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1351 ↗(On Diff #413656)

I have seen this in many places of clang/llvm and in fact in AMDGPU's source/tests as well. But if you say this is not a recommended practice, I will remove those references while landing the patch.

1359 ↗(On Diff #413656)

Fixed. Thanks.

arsenm accepted this revision.Mar 8 2022, 8:15 AM

I'm not sure I understand why a separate isPhysRegUsed check is needed since ultimately that's what IPRA is doing, but I don't see how it could hurt

RamNalamothu marked 2 inline comments as done.Mar 8 2022, 8:43 AM

I'm not sure I understand why a separate isPhysRegUsed check is needed since ultimately that's what IPRA is doing, but I don't see how it could hurt

The separate isPhysRegModified check is needed because we will have the SI_RETURN pseudo, which doesn't have return address operand/use, when IPRA's register usage collection runs and that hides the return address use.
Since the caller function save/restores return address registers, as the call instruction clobbers them, and return address registers use in the callee function is hidden, the IPRA/register allocator concludes that return address registers can be clobbered without save/restoring them in the callee function if the callee function doesn't have any function calls.

Hope that helps.

This revision was landed with ongoing or failed builds.Mar 8 2022, 10:48 PM
This revision was automatically updated to reflect the committed changes.