This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Separate out SGPR spills to VGPR lanes during PEI
ClosedPublic

Authored by cdevadas on Apr 21 2022, 12:16 PM.

Details

Summary

SILowerSGPRSpills pass handles the lowering of SGPR spills
into VGPR lanes. Some SGPR spills are handled later during
PEI. There is a common function used in both places to find
the free VGPR lane. This patch eliminates that dependency to
find the free VGPR by handling it separately for PEI. It is a
prerequisite patch for a future work to allow SGPR spills to
virtual VGPR lanes during SILowerSGPRSpills.

Diff Detail

Event Timeline

cdevadas created this revision.Apr 21 2022, 12:16 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 21 2022, 12:16 PM
cdevadas requested review of this revision.Apr 21 2022, 12:16 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 21 2022, 12:16 PM
foad added inline comments.Apr 22 2022, 3:05 AM
llvm/test/CodeGen/AMDGPU/GlobalISel/assert-align.ll
12

Seems like a regression. Does this get fixed by a later patch?

cdevadas added inline comments.Apr 25 2022, 4:20 AM
llvm/test/CodeGen/AMDGPU/GlobalISel/assert-align.ll
12

Yes, it is.
With spilling SGPRs into virtual VPGR lanes, it won't directly be possible to track the unused lanes of the physical VGPR allocated for the last virtual register created during SILowerSGPRSpills pass.
Going to insert a custom pass in the VGPR regalloc pipeline to map the physReg from virtRegMap. In that way, we can reuse the VGPR for any custom SGPR spills during PEI if free lanes are available.
However, this regression can only be avoided for higher optimization levels. The regallocfastdoesn't provide a way to correctly map a virtual to PhysReg and we can't avoid this extra VGPR usage when compiled for -O0.

arsenm added inline comments.Apr 25 2022, 1:44 PM
llvm/test/CodeGen/AMDGPU/GlobalISel/assert-align.ll
12

I'm not sure a separate pass using VirtRegMap is the right solution to merging spill VGPRs of different SGPRs, but either way this is a separate optimization that needs to be re-implemented.

cdevadas added inline comments.Apr 26 2022, 3:45 AM
llvm/test/CodeGen/AMDGPU/GlobalISel/assert-align.ll
12

It's worth implementing when it comes to saving a VGPR.
Yep, planning it as a separate patch.

cdevadas updated this revision to Diff 440292.Jun 27 2022, 10:05 AM

Code rebase

arsenm added inline comments.Jun 28 2022, 11:51 AM
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
442

Needs to document what "custom" means. Also the fact that it's not serialized makes me nervous. However, that should be OK since this is only set and read in PEI so it should be OK. Ideally we would have somewhere else to put it

cdevadas updated this revision to Diff 440730.Jun 28 2022, 12:51 PM

Added a meaningful comment for SGPRToVGPRCustomSpills.

arsenm added inline comments.Jun 28 2022, 3:41 PM
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
449

Isn't this count implied by SGPRToVGPRCustomSpills.size()? I'd like to avoid multiplying the number of unserialized fields

cdevadas added inline comments.Sep 23 2022, 5:37 AM
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
449

Currently, the function SIMachineFunctionInfo::allocateSGPRSpillToVGPR needs this variable to choose the num spills between the SILowerSGPRSpills pass and the custom spills later during FrameLowering. I'm planning to move these functions entirely out of SIMachineFunctionInfo and can avoid these variables entirely.

Almost everything surrounding allocateSGPRSpillToVGPR and its companion methods is horrible from a coding style perspective. Now, a lot of this horribleness is pre-existing; that said, can you please get it while you're working on this anyway? Couple of things come to mind:

  • allocateVGPRForSGPRSpills and allocateVGPRForCustomSGPRSpills are very specialized and interact in uncomfortable ways with ambient state. They absolutely must not be public.
  • They are both basically the same method, and their existence only makes sense in the context of the basically identically named allocateSGPRSpillToVGPR. Please just merge those methods and inline them, so only allocateSGPRSpillToVGPR remains.
  • How about renaming allocateSGPRSpillToVGPR to allocateSGPRSpillToVGPRLanes? This accounts for the fact that an SGPR spill is neither allocated to an entire VGPR, nor is an SGPR spill necessarily allocated to a single VGPR (it could cross multiple VGPRs depending on how the lane allocation works out)
  • Relying on WWMSpills.back() to get the most recently used spill VGPR makes me rather uncomfortable. Since there's persistent tracking of allocated lanes already in the form of NumVGPR[Custom]SpillLanes, please just track the currently "open" VGPR explicitly as well.
  • A lot of data is tied to the "custom / non-custom" distinction (which does need a better name). How about defining a struct SIMachineFunctionInfo::SGPRToVGPRSpills and moving all the related data in there (FrameIndex to (VGPR,lane) map, next (VGPR, lane) pair)? That will make the case distinction flow more nicely.
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
461–470

Simplify this further to a simpler for loop and finally SGPRToVGPRSpills.clear()

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
440–441

s/wave index/lane index/?

442

Agree that "custom" is a bad name. It seems to be for prolog/epilog purposes, name it accordingly.

The patch is reasonable in terms of what it does, by the way, just that the code is a mess and I think it should be cleaned up reasonably while it's being touched anyway.

Almost everything surrounding allocateSGPRSpillToVGPR and its companion methods is horrible from a coding style perspective. Now, a lot of this horribleness is pre-existing; that said, can you please get it while you're working on this anyway? Couple of things come to mind:

  • allocateVGPRForSGPRSpills and allocateVGPRForCustomSGPRSpills are very specialized and interact in uncomfortable ways with ambient state. They absolutely must not be public.
  • They are both basically the same method, and their existence only makes sense in the context of the basically identically named allocateSGPRSpillToVGPR. Please just merge those methods and inline them, so only allocateSGPRSpillToVGPR remains.
  • How about renaming allocateSGPRSpillToVGPR to allocateSGPRSpillToVGPRLanes? This accounts for the fact that an SGPR spill is neither allocated to an entire VGPR, nor is an SGPR spill necessarily allocated to a single VGPR (it could cross multiple VGPRs depending on how the lane allocation works out)
  • Relying on WWMSpills.back() to get the most recently used spill VGPR makes me rather uncomfortable. Since there's persistent tracking of allocated lanes already in the form of NumVGPR[Custom]SpillLanes, please just track the currently "open" VGPR explicitly as well.
  • A lot of data is tied to the "custom / non-custom" distinction (which does need a better name). How about defining a struct SIMachineFunctionInfo::SGPRToVGPRSpills and moving all the related data in there (FrameIndex to (VGPR,lane) map, next (VGPR, lane) pair)? That will make the case distinction flow more nicely.

Ideally, D124195 and D132436 are mostly code-refactor and enabler patches for spilling SGPRs into virtual VGPR lanes and they both should have gone in the final patch D124196 that does the spill to virtual VGPRs. I want to use the convention SGPRSpillToVirtVGPRLanes (for SILowerSGPRSpills) and SGPRSpillToPhysVGPRLanes (for SIFrameLowering) for the two maps that track the spill info. But combining them into a single review would make the patch more complex with too many things in one place. So, I have split them into separate reviews. At this point, the SGPR spills at both places go into physical VGPR lanes and I can’t use the aforementioned names for the maps. The original plan was to have a code clean-up after all these patches landed. Yes, SIMachineFunctionInfo is currently in a bad shape. I want to move out the spill related tables and methods and place them into SILowerSGPRSpills and SIFrameLowering passes. Yes, planning to introduce a structure (just like SIMachineFunctionInfo::SGPRToVGPRSpills). I can incorporate all the suggestions you mentioned here in the post-cleanup patch.
At this point, there is a lot of common code for spill handling. But after they become spill to Virtual vs Physical VGPRs, the bookkeeping differs, and we can have a better cleanup.
Hope that would be ok. For now, I will change the term “custom” in this review and can use a better name.
I don’t either like the name “custom”. But couldn’t find a better short name.
How about PrologEpilogSGPRSpillToVGPRLanes instead of SGPRToVGPRCustomSpills?

Almost everything surrounding allocateSGPRSpillToVGPR and its companion methods is horrible from a coding style perspective. Now, a lot of this horribleness is pre-existing; that said, can you please get it while you're working on this anyway? Couple of things come to mind:

  • allocateVGPRForSGPRSpills and allocateVGPRForCustomSGPRSpills are very specialized and interact in uncomfortable ways with ambient state. They absolutely must not be public.
  • They are both basically the same method, and their existence only makes sense in the context of the basically identically named allocateSGPRSpillToVGPR. Please just merge those methods and inline them, so only allocateSGPRSpillToVGPR remains.
  • How about renaming allocateSGPRSpillToVGPR to allocateSGPRSpillToVGPRLanes? This accounts for the fact that an SGPR spill is neither allocated to an entire VGPR, nor is an SGPR spill necessarily allocated to a single VGPR (it could cross multiple VGPRs depending on how the lane allocation works out)
  • Relying on WWMSpills.back() to get the most recently used spill VGPR makes me rather uncomfortable. Since there's persistent tracking of allocated lanes already in the form of NumVGPR[Custom]SpillLanes, please just track the currently "open" VGPR explicitly as well.
  • A lot of data is tied to the "custom / non-custom" distinction (which does need a better name). How about defining a struct SIMachineFunctionInfo::SGPRToVGPRSpills and moving all the related data in there (FrameIndex to (VGPR,lane) map, next (VGPR, lane) pair)? That will make the case distinction flow more nicely.

Ideally, D124195 and D132436 are mostly code-refactor and enabler patches for spilling SGPRs into virtual VGPR lanes and they both should have gone in the final patch D124196 that does the spill to virtual VGPRs. I want to use the convention SGPRSpillToVirtVGPRLanes (for SILowerSGPRSpills) and SGPRSpillToPhysVGPRLanes (for SIFrameLowering) for the two maps that track the spill info. But combining them into a single review would make the patch more complex with too many things in one place. So, I have split them into separate reviews. At this point, the SGPR spills at both places go into physical VGPR lanes and I can’t use the aforementioned names for the maps. The original plan was to have a code clean-up after all these patches landed. Yes, SIMachineFunctionInfo is currently in a bad shape. I want to move out the spill related tables and methods and place them into SILowerSGPRSpills and SIFrameLowering passes. Yes, planning to introduce a structure (just like SIMachineFunctionInfo::SGPRToVGPRSpills). I can incorporate all the suggestions you mentioned here in the post-cleanup patch.
At this point, there is a lot of common code for spill handling. But after they become spill to Virtual vs Physical VGPRs, the bookkeeping differs, and we can have a better cleanup.

Hmm, I suppose we can live with that. I keep wishing for better ways to review patch series like this one. I miss e-mail based reviews :(

Hope that would be ok. For now, I will change the term “custom” in this review and can use a better name.
I don’t either like the name “custom”. But couldn’t find a better short name.
How about PrologEpilogSGPRSpillToVGPRLanes instead of SGPRToVGPRCustomSpills?

Yeah, it's long but that name is at least precise :) I guess longer term it just becomes spill-to-virtual and spill-to-physical as you said?

Yeah, it's long but that name is at least precise :) I guess longer term it just becomes spill-to-virtual and spill-to-physical as you said?

That's right.

cdevadas updated this revision to Diff 472236.Nov 1 2022, 1:45 AM
cdevadas retitled this revision from [AMDGPU] Separate out custom SGPR spills to VGPR during PEI to [AMDGPU] Separate out SGPR spills to VGPR lanes during PEI.
cdevadas edited the summary of this revision. (Show Details)
cdevadas added a reviewer: nhaehnle.

Removed the prefix "Custom" from the SGPR spills during PrologEpilogInserter and used a meaningful name instead.

cdevadas updated this revision to Diff 472278.Nov 1 2022, 6:39 AM

Made allocateVGPRForSGPRSpills & allocateVGPRForPrologEpilogSGPRSpills methods private.

  • How about renaming allocateSGPRSpillToVGPR to allocateSGPRSpillToVGPRLanes? This accounts for the fact that an SGPR spill is neither allocated to an entire VGPR, nor is an SGPR spill necessarily allocated to a single VGPR (it could cross multiple VGPRs depending on how the lane allocation works out)

I believe this is actually an optimization we're regressing on with the switch to spilling to virtual VGPRs. It will need to be reimplemented as a new optimization

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
348–351

Can we defer this until after all the spills are handled?

cdevadas updated this revision to Diff 472474.Nov 1 2022, 6:44 PM

Deferred adding lane VGPR into BBLiveIns until all SGPR spills are handled.

arsenm added inline comments.Nov 1 2022, 6:59 PM
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1296–1302

Actually, do we really need to do this anymore? If they were allocated from virtual registers, they should have correct livens lists already

cdevadas added inline comments.Nov 1 2022, 7:27 PM
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1296–1302

They are needed for prolog/epilog spill insertion. If we don't mark them liveIn, there will be a MIR verifier error indicating the use of undefined registers in spill instructions.

arsenm added inline comments.Nov 7 2022, 8:00 AM
llvm/test/CodeGen/AMDGPU/callee-frame-setup.ll
415

Why the behavior change? Is this restored in a later patch?

cdevadas added inline comments.Nov 7 2022, 8:23 AM
llvm/test/CodeGen/AMDGPU/callee-frame-setup.ll
415

It's already been discussed. Jay earlier asked about the same in this review.
I'm planning a follow-up patch to regain it. Using the VRM map, the unused lanes of the last allocated VGPR virtual register for SGPR spilling can be tracked and can use later during FrameLowering while trying to spill FP/BP.

arsenm added inline comments.Nov 14 2022, 11:54 AM
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1296–1302

This feels too coarse grain. The whole point of doing this was to allocate these like normal virtual registers, which should then have naturally set liveins already. Is this only handling the prolog/epilog cases? It should only need to do anything for those

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
307–314

I think this referenced error cannot happen anymore

349

IsPEI feels like the wrong name. IsPrologEpilog would be a bit better but not great

cdevadas added inline comments.Nov 14 2022, 10:08 PM
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1296–1302

Yes, they are needed only for prolog/epilog spill cases.

cdevadas updated this revision to Diff 475339.Nov 14 2022, 10:14 PM

Renamed IsPEI to IsPrologEpilog & removed the unwanted comment.

arsenm added inline comments.Nov 14 2022, 10:53 PM
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1296–1302

But getWWMSpills covers everything? this is adding excess live ins?

cdevadas added inline comments.Nov 14 2022, 11:06 PM
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1296–1302

It doesn't necessarily add the live-ins at the entry block. We insert the spill to a virt-VGPR at a block properly adding the IMPLICIT_DEF at its dominator block. The physical VGPR allocated for this virt-VGPR should be added to the prolog block live-ins otherwise verifier would complain about its spill store for using an undefined register.

arsenm accepted this revision.Nov 17 2022, 5:33 PM
This revision is now accepted and ready to land.Nov 17 2022, 5:33 PM
cdevadas updated this revision to Diff 482885.Dec 14 2022, 9:06 AM

code rebase

arsenm accepted this revision.Dec 14 2022, 10:10 AM
This revision was landed with ongoing or failed builds.Dec 16 2022, 10:20 PM
This revision was automatically updated to reflect the committed changes.
llvm/test/CodeGen/AMDGPU/gfx-call-non-gfx-func.ll