SILowerSGPRSpills pass handles the lowering of SGPR spills
into VGPR lanes. Some SGPR spills are handled later during
PEI. There is a common function used in both places to find
the free VGPR lane. This patch eliminates that dependency to
find the free VGPR by handling it separately for PEI. It is a
prerequisite patch for a future work to allow SGPR spills to
virtual VGPR lanes during SILowerSGPRSpills.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Event Timeline
llvm/test/CodeGen/AMDGPU/GlobalISel/assert-align.ll | ||
---|---|---|
12 | Seems like a regression. Does this get fixed by a later patch? |
llvm/test/CodeGen/AMDGPU/GlobalISel/assert-align.ll | ||
---|---|---|
12 | Yes, it is. |
llvm/test/CodeGen/AMDGPU/GlobalISel/assert-align.ll | ||
---|---|---|
12 | I'm not sure a separate pass using VirtRegMap is the right solution to merging spill VGPRs of different SGPRs, but either way this is a separate optimization that needs to be re-implemented. |
llvm/test/CodeGen/AMDGPU/GlobalISel/assert-align.ll | ||
---|---|---|
12 | It's worth implementing when it comes to saving a VGPR. |
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h | ||
---|---|---|
465 | Needs to document what "custom" means. Also the fact that it's not serialized makes me nervous. However, that should be OK since this is only set and read in PEI so it should be OK. Ideally we would have somewhere else to put it |
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h | ||
---|---|---|
469 | Isn't this count implied by SGPRToVGPRCustomSpills.size()? I'd like to avoid multiplying the number of unserialized fields |
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h | ||
---|---|---|
469 | Currently, the function SIMachineFunctionInfo::allocateSGPRSpillToVGPR needs this variable to choose the num spills between the SILowerSGPRSpills pass and the custom spills later during FrameLowering. I'm planning to move these functions entirely out of SIMachineFunctionInfo and can avoid these variables entirely. |
Almost everything surrounding allocateSGPRSpillToVGPR and its companion methods is horrible from a coding style perspective. Now, a lot of this horribleness is pre-existing; that said, can you please get it while you're working on this anyway? Couple of things come to mind:
- allocateVGPRForSGPRSpills and allocateVGPRForCustomSGPRSpills are very specialized and interact in uncomfortable ways with ambient state. They absolutely must not be public.
- They are both basically the same method, and their existence only makes sense in the context of the basically identically named allocateSGPRSpillToVGPR. Please just merge those methods and inline them, so only allocateSGPRSpillToVGPR remains.
- How about renaming allocateSGPRSpillToVGPR to allocateSGPRSpillToVGPRLanes? This accounts for the fact that an SGPR spill is neither allocated to an entire VGPR, nor is an SGPR spill necessarily allocated to a single VGPR (it could cross multiple VGPRs depending on how the lane allocation works out)
- Relying on WWMSpills.back() to get the most recently used spill VGPR makes me rather uncomfortable. Since there's persistent tracking of allocated lanes already in the form of NumVGPR[Custom]SpillLanes, please just track the currently "open" VGPR explicitly as well.
- A lot of data is tied to the "custom / non-custom" distinction (which does need a better name). How about defining a struct SIMachineFunctionInfo::SGPRToVGPRSpills and moving all the related data in there (FrameIndex to (VGPR,lane) map, next (VGPR, lane) pair)? That will make the case distinction flow more nicely.
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp | ||
---|---|---|
466–469 | Simplify this further to a simpler for loop and finally SGPRToVGPRSpills.clear() | |
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h | ||
462–463 | s/wave index/lane index/? | |
465 | Agree that "custom" is a bad name. It seems to be for prolog/epilog purposes, name it accordingly. |
The patch is reasonable in terms of what it does, by the way, just that the code is a mess and I think it should be cleaned up reasonably while it's being touched anyway.
Ideally, D124195 and D132436 are mostly code-refactor and enabler patches for spilling SGPRs into virtual VGPR lanes and they both should have gone in the final patch D124196 that does the spill to virtual VGPRs. I want to use the convention SGPRSpillToVirtVGPRLanes (for SILowerSGPRSpills) and SGPRSpillToPhysVGPRLanes (for SIFrameLowering) for the two maps that track the spill info. But combining them into a single review would make the patch more complex with too many things in one place. So, I have split them into separate reviews. At this point, the SGPR spills at both places go into physical VGPR lanes and I can’t use the aforementioned names for the maps. The original plan was to have a code clean-up after all these patches landed. Yes, SIMachineFunctionInfo is currently in a bad shape. I want to move out the spill related tables and methods and place them into SILowerSGPRSpills and SIFrameLowering passes. Yes, planning to introduce a structure (just like SIMachineFunctionInfo::SGPRToVGPRSpills). I can incorporate all the suggestions you mentioned here in the post-cleanup patch.
At this point, there is a lot of common code for spill handling. But after they become spill to Virtual vs Physical VGPRs, the bookkeeping differs, and we can have a better cleanup.
Hope that would be ok. For now, I will change the term “custom” in this review and can use a better name.
I don’t either like the name “custom”. But couldn’t find a better short name.
How about PrologEpilogSGPRSpillToVGPRLanes instead of SGPRToVGPRCustomSpills?
Hmm, I suppose we can live with that. I keep wishing for better ways to review patch series like this one. I miss e-mail based reviews :(
Hope that would be ok. For now, I will change the term “custom” in this review and can use a better name.
I don’t either like the name “custom”. But couldn’t find a better short name.
How about PrologEpilogSGPRSpillToVGPRLanes instead of SGPRToVGPRCustomSpills?
Yeah, it's long but that name is at least precise :) I guess longer term it just becomes spill-to-virtual and spill-to-physical as you said?
Yeah, it's long but that name is at least precise :) I guess longer term it just becomes spill-to-virtual and spill-to-physical as you said?
That's right.
Removed the prefix "Custom" from the SGPR spills during PrologEpilogInserter and used a meaningful name instead.
Made allocateVGPRForSGPRSpills & allocateVGPRForPrologEpilogSGPRSpills methods private.
I believe this is actually an optimization we're regressing on with the switch to spilling to virtual VGPRs. It will need to be reimplemented as a new optimization
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp | ||
---|---|---|
339–342 | Can we defer this until after all the spills are handled? |
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp | ||
---|---|---|
1283–1289 | Actually, do we really need to do this anymore? If they were allocated from virtual registers, they should have correct livens lists already |
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp | ||
---|---|---|
1283–1289 | They are needed for prolog/epilog spill insertion. If we don't mark them liveIn, there will be a MIR verifier error indicating the use of undefined registers in spill instructions. |
llvm/test/CodeGen/AMDGPU/callee-frame-setup.ll | ||
---|---|---|
415 | Why the behavior change? Is this restored in a later patch? |
llvm/test/CodeGen/AMDGPU/callee-frame-setup.ll | ||
---|---|---|
415 | It's already been discussed. Jay earlier asked about the same in this review. |
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp | ||
---|---|---|
1283–1289 | This feels too coarse grain. The whole point of doing this was to allocate these like normal virtual registers, which should then have naturally set liveins already. Is this only handling the prolog/epilog cases? It should only need to do anything for those | |
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp | ||
298–305 | I think this referenced error cannot happen anymore | |
378 | IsPEI feels like the wrong name. IsPrologEpilog would be a bit better but not great |
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp | ||
---|---|---|
1283–1289 | Yes, they are needed only for prolog/epilog spill cases. |
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp | ||
---|---|---|
1283–1289 | But getWWMSpills covers everything? this is adding excess live ins? |
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp | ||
---|---|---|
1283–1289 | It doesn't necessarily add the live-ins at the entry block. We insert the spill to a virt-VGPR at a block properly adding the IMPLICIT_DEF at its dominator block. The physical VGPR allocated for this virt-VGPR should be added to the prolog block live-ins otherwise verifier would complain about its spill store for using an undefined register. |
Actually, do we really need to do this anymore? If they were allocated from virtual registers, they should have correct livens lists already