This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU][SIFrameLowering] Use the right frame register in CSR spills
ClosedPublic

Authored by cdevadas on Sep 30 2022, 4:43 AM.

Details

Summary

Unlike the callee-saved VGPR spill instructions emitted by
PEI::spillCalleeSavedRegs, the CS VGPR spills inserted during
emitPrologue/emitEpilogue require the exec bits flipping to avoid
clobbering the inactive lanes of VGPRs used for SGPR spilling.
Currently, these spill instructions are referenced from the SP at
function entry and when the callee performs a stack realignment,
they ended up getting incorrect stack offsets. Even if we try to
adjust the offsets, the FP-SP becomes a runtime entity with dynamic
stack realignment and the offsets would still be inaccurate.

To fix it, use FP as the frame base in the spill instructions
whenever the function has FP. The offsets obtained for the CS
objects would always be the right values from FP.

Diff Detail

Event Timeline

cdevadas created this revision.Sep 30 2022, 4:43 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 30 2022, 4:43 AM
cdevadas requested review of this revision.Sep 30 2022, 4:43 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 30 2022, 4:43 AM
arsenm added inline comments.Sep 30 2022, 5:05 AM
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
343

I don't understand what this is supposed to mean

352

I don't understand ignore, but checkIgnore is a weird getter name. getIgnored or isIgnored

cdevadas updated this revision to Diff 464231.Sep 30 2022, 5:35 AM

Changed the flag to IsIgnored.

arsenm added inline comments.Oct 27 2022, 7:50 AM
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1126–1127

Don't understand why this is special cased when you also call hasFP below

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
343

I still don’t understand what ignored really man’s or why it would occur

arsenm added inline comments.Oct 27 2022, 7:52 AM
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1088

It kind of doesn't matter at this point

cdevadas added inline comments.Oct 27 2022, 9:54 AM
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1114

Here is the case why we need IsIgnored. When FP itself is spilled and the SGPRSaveKind is COPY_TO_SCRATCH_SGPR, I avoid the temporary copy to preserve the old value of FP. Instead, I spill FP up front (a copy to the selected scratch SGPR) and mark it as ignored while processing all other spills in the usual way. I could have deleted the entry from CustomSGPRSpills Map. But the entry is needed for emitting epilogue spill.

You can see the test case AMDGPU/GlobalISel/call-outgoing-stack-args.ll ( in func_caller_stack) FP is moved to s4 in the prolog and later copied back from s4 to FP in the epilog.
But in AMDGPU/GlobalISel/localizer.ll (sink_null_insert_pt), FP is temporarily moved to s16 before bumping it and then s16 is spilled into VGPR41 lane 0. Here the temporary copy is needed because the spill to VGPR41 can be inserted only after spilling VGPR41 to preserve its value.

Another situation where I use the IsIgnored field is in the downstream copy while emitting CFI for the EXEC MASK. There the spill is needed only in the prolog and not in the epilog. So I mark it ignored after emitting the EXEC prolog spill.

1126–1127

When FP is used, the spill instructions are going to use FP as the frame register. In that case, the spill instructions should be emitted only after FP is assigned the new value. Otherwise, the spill instructions use SP as before and they are all inserted before SP is bumped to the new value.
I just moved out TRI.hasStackRealignment(MF) from the condition below, updated HasFP earlier, and then used it to see whether FP is not going to be used in the function along with !hasFP(MF) check.

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
343

This is needed to handle the case when FP itself is spilled while FP is used as the frame register in the spill instructions (buffer_load/store).
Better explained with a test case reference in the other comment.

The term "custom" here is problematic. Can we be more precise?

The term "custom" here is problematic. Can we be more precise?

Yes, will avoid the use of "custom" here.
Will change the title into [AMDGPU][SIFrameLowering] Use the right frame register in CSR spills.
Also, I will fix the summary.

cdevadas updated this revision to Diff 472287.Nov 1 2022, 6:59 AM
cdevadas retitled this revision from [AMDGPU] Use the right frame register in custom CSR spills to [AMDGPU][SIFrameLowering] Use the right frame register in CSR spills.
cdevadas edited the summary of this revision. (Show Details)

Fixed the use of prefix "Custom" while referring the SGPR spills during PEI.
Also, fixed the review summary and title.

arsenm added inline comments.Nov 14 2022, 1:17 PM
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1114

This needs to go in a comment on the field (It also might be clearer to just special case these registers cases directly)

scott.linder added inline comments.Nov 14 2022, 1:48 PM
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1115

I agree with others that this Ignored mode is hard to reason about, although now that I think I understand it I'm not certain I have a suggestion to improve it.

One thought I had was to try to make the case it is needed for "just work" by changing the order SGPR->SGPR spills occur relative to the FP setup. Is there a reason we can't just emit all of the COPY_TO_SCRATCH_SGPR spills before we set up FP? In that version the only special case remaining is for managing FramePtrRegScratchCopy, and I don't see any way to simplify that further.

The whole thing would look like:

<SGPR->SGPR spills>
if (<FP is spilled, and the kind of that spill is not COPY_TO_SCRATCH_SGPR>) {
  <Do the FramePtrRegScratchCopy scavenging>
}
<FP and SP setup>
<All remaining (i.e. non SGPR->SGPR) spills, respecting FramePtrRegScratchCopy>

For the downstream case, can we just erase the entry in the map after emitting the spill?

For the downstream case, can we just erase the entry in the map after emitting the spill?

Can do that.

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1115

I will remove the Ignored flag entirely and can handle it differently.

We should be able to use SP relative offset to spill CSRs before adjusting the SP at function entry. It currently needs some effort in the AMDGPU frame-lowering implementation to adjust the offsets assigned for the CSR objects before inserting the spill code for various cases, especially when stack realignment is needed. Some targets handle it better (I believe AArch64).
It's already in the pipeline and should be implemented soon.

cdevadas updated this revision to Diff 475429.Nov 15 2022, 4:46 AM

Removed the IsIgnored flag and special-handled FP spill & restore cases.

arsenm accepted this revision.Nov 17 2022, 2:33 PM

LGTM

This revision is now accepted and ready to land.Nov 17 2022, 2:33 PM
cdevadas updated this revision to Diff 482889.Dec 14 2022, 9:14 AM

code rebase

arsenm accepted this revision.Dec 14 2022, 9:55 AM
This revision was landed with ongoing or failed builds.Dec 16 2022, 10:23 PM
This revision was automatically updated to reflect the committed changes.
llvm/test/CodeGen/AMDGPU/gfx-call-non-gfx-func.ll