This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Add WWM reserved VGPRs to WWMSpills
ClosedPublic

Authored by cdevadas on Apr 21 2022, 12:15 PM.

Details

Summary

The custom VGPR spills inserted during frame lowering
maintain a separate list for WWM reserved registers.
Added them into WWMSpills that already tracks such
reserved registers. It unifies the spill insertion.

Diff Detail

Event Timeline

cdevadas created this revision.Apr 21 2022, 12:15 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 21 2022, 12:15 PM
cdevadas requested review of this revision.Apr 21 2022, 12:15 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 21 2022, 12:15 PM
arsenm added inline comments.Apr 21 2022, 1:32 PM
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1252

If we're just going to add these to lane VGPRs, is there any real reason to distinguish the WWM registers? Could we just add them to LaneVGPRs to start with?

1252–1253

Theoretically we could want to call determineCalleeSaves multiple times, so I'm not sure accumulating state here is a good idea.

llvm/lib/Target/AMDGPU/SIPreAllocateWWMRegs.cpp
151 ↗(On Diff #424259)

Drop this comment, it's not unconditionally true. The relevant problem is addressed in frame lowering

cdevadas added inline comments.Apr 25 2022, 4:15 AM
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1252

The separate list was maintained for Serialize info. We could directly add them into LaneVGPRs otherwise.

1252–1253

The best place to fill in the LaneVGPRs (both writelane and WWM reserved VGPRs) would be in processFunctionBeforeFrameFinalized. But that is invoked after determineCalleeSaves where we mask these spills off the CSR list so that the default spill insertion in PEI would defer them, and we could later custom insert them during FrameLowering.

At this moment I don't see a better place to move it. We could add a condition to enable this code only once if determineCalleeSaves needs to be called multiple times.

cdevadas updated this revision to Diff 425169.Apr 26 2022, 3:45 AM

Removed the unwanted comment.

arsenm added inline comments.May 17 2022, 2:55 PM
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1252

The serialization is supposed to reflect whatever's here. It's not worth preserving for its own sake

It makes sense to unify that handling, but I object to the term "lane VGPRs". In a way, it's the other way around: the registers that are used for SGPR->VGPR spills are de facto "WWM VGPRs", because they're used in a wave-wide manner, which is the opposite of lane-based usage.

cdevadas updated this revision to Diff 440285.Jun 27 2022, 9:54 AM
cdevadas retitled this revision from [AMDGPU] Add WWM reserved VGPRs to lane VGPRs list to [AMDGPU] Add WWM reserved VGPRs to existing WWMVGPRs list.
cdevadas edited the summary of this revision. (Show Details)

Code rebase + review title and summary fixup.

arsenm added inline comments.Jun 27 2022, 5:25 PM
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
474–483

I'm confused by the relationship between "WWMReservedRegs" and WWMVGPRs.

cdevadas updated this revision to Diff 440471.Jun 27 2022, 11:04 PM

Improved the comment for WWMReservedRegs to have more details especially to differentiate it from the superset WWMVGPRs.

arsenm added inline comments.Jun 28 2022, 11:02 AM
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1252

allocateWWMSpill would be a better name

cdevadas updated this revision to Diff 440725.Jun 28 2022, 12:41 PM
cdevadas retitled this revision from [AMDGPU] Add WWM reserved VGPRs to existing WWMVGPRs list to [AMDGPU] Add WWM reserved VGPRs to WWMSpills.
cdevadas edited the summary of this revision. (Show Details)

Code rebase.

arsenm added inline comments.Jun 28 2022, 1:17 PM
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1252–1253

With the rename it's clearer what's going on here now. I'd prefer to keep the frame index creation in processFunctionBeforeFrameFinalized

cdevadas updated this revision to Diff 440938.Jun 29 2022, 4:37 AM

Moved WWM reserved registers frame index creation into processFunctionBeforeFrameFinalized.

@sebastian-ne Is there a case when SIPreAllocateWWMRegs pass allocates a CSR VGPR to operands in WWM instructions?
If it is true, we may have a bug in the current handling that we failed to reset the WWMReservedRegs from the SavedVGPRs during determineCalleeSaves that allow us later to custom insert these spills in the emitPrologue/emitEpilogue functions.

Here is what we do for VGPRs used for SGPR spills.
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AMDGPU/SIFrameLowering.cpp#L1278

It doesn't come under the scope of this patch. But if this case can arise, we must fix it.

@sebastian-ne Is there a case when SIPreAllocateWWMRegs pass allocates a CSR VGPR to operands in WWM instructions?
If it is true, we may have a bug in the current handling that we failed to reset the WWMReservedRegs from the SavedVGPRs during determineCalleeSaves that allow us later to custom insert these spills in the emitPrologue/emitEpilogue functions.

Here is what we do for VGPRs used for SGPR spills.
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AMDGPU/SIFrameLowering.cpp#L1278

It doesn't come under the scope of this patch. But if this case can arise, we must fix it.

Sorry for getting back to you so late. You are correct, WWMReservedRegs should be removed from the SavedVGPRs during determineCalleeSaves.

Currently, if WWM registers fall on callee-saves, they are spilled twice, once as CSR, once as WWM register. At least it’s not incorrect.
Here’s a test demonstrating this: https://github.com/Flakebi/amdvlk-llvm-project/commit/0b0e056c5be110096a92c8175d6650d8535d4e33#diff-eddae1d72701ad027181cd89a7409a0b653da8debae655c3379c5b4d97934a73R1241-R1246

This revision is now accepted and ready to land.Oct 28 2022, 7:28 AM
arsenm accepted this revision.Nov 1 2022, 2:37 PM
cdevadas updated this revision to Diff 482879.Dec 14 2022, 8:59 AM

Code rebase

arsenm accepted this revision.Dec 14 2022, 10:10 AM
This revision was landed with ongoing or failed builds.Dec 16 2022, 10:18 PM
This revision was automatically updated to reflect the committed changes.