This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Implement whole wave register spill
ClosedPublic

Authored by cdevadas on Feb 10 2023, 9:30 AM.

Details

Summary

To reduce the register pressure during allocation,
when the allocator spills a virtual register that
corresponds to a whole wave mode operation, the
spill loads and restores should be activated for
all lanes by temporarily flipping all bits in exec
register to one just before the spills. It is not
implemented in the compiler as of today and this
patch enables the necessary support.

This is a pre-patch before the SGPR spill to virtual
VGPR lanes that would eventually causes the whole
wave register spills during allocation.

Diff Detail

Event Timeline

cdevadas created this revision.Feb 10 2023, 9:30 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 10 2023, 9:30 AM
cdevadas requested review of this revision.Feb 10 2023, 9:30 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 10 2023, 9:30 AM
yassingh updated this revision to Diff 523313.May 18 2023, 3:22 AM
  • Rebase over new TARGET::PRED_COPY approach
arsenm requested changes to this revision.Jun 19 2023, 6:15 AM
arsenm added inline comments.
llvm/lib/Target/AMDGPU/SIDefines.h
934

I assumed the value would be 1. Change this to be the pre-shifted values rather than using 1 << flag in the getter?

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
13430–13436

Move this up with the other reserved register logic

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
549

Capitalization is weird, Capitalize clone

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
651

Why are the inbounds checks needed? The analagous MRI functions do not tolerate invalid register values

656–657

This should assert (like setFlag already does)

659

Ditto, why range check?

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
2151–2152

Braces and indent under if

2160

Indent off

This revision now requires changes to proceed.Jun 19 2023, 6:15 AM
cdevadas added inline comments.Jun 19 2023, 6:53 AM
llvm/lib/Target/AMDGPU/SIDefines.h
934

Will do.

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
549

Will do.

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
651

VRegFlags will be grown each time (if the size exceeds the initial size set) when a new virtual register is created/cloned. The MRI's 'VRegToType' has a similar bound check.

656–657

I guess it's already been discussed in the downstream review. There are instances when Physical registers are taken to spill - fastAlloc, CSR spills during PEI. To handle such instances, an early return was inserted intentionally for the PhysRegs instead of an assertion.

arsenm added inline comments.Jun 19 2023, 6:55 AM
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
651

VRegToType isn't really analagous. Untyped and typed registers can coexist, I'd expect it to be unnecessary here

arsenm added inline comments.Jun 19 2023, 6:56 AM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
13430–13436

Actually, why is this not in getReservedRegs?

yassingh updated this revision to Diff 532851.Jun 20 2023, 3:19 AM

Addressed Matt's comments:

  • Updated flag
  • Formatting
yassingh added inline comments.Jun 20 2023, 3:23 AM
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
651

Getting rid of inbounds check causes multiple test failures. (In spill192.mir, spill224.mir, spill288.mir, spill320.mir, etc).

yassingh added inline comments.Jun 20 2023, 3:25 AM
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
2151–2152

Missed indenting, will fix it.

cdevadas added inline comments.Jun 21 2023, 6:51 AM
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
2151–2152

Update the patch after clang-format.

arsenm added inline comments.Jun 21 2023, 6:53 AM
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
647

D149775 is adding another reserved register, but I'm pretty sure they cannot contextually overlap. After both patches are in they should be consolidated to reuse the same reserved register

arsenm added inline comments.Jun 21 2023, 7:12 AM
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1467

ST.getWaveMaskRegClass

1469

This misses VCC as a possibility, which is fixed by using getWaveMaskRegClass

llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp
338

Same, use getWaveMaskRegClass

yassingh updated this revision to Diff 533300.Jun 21 2023, 9:36 AM
  • Review comments (use getWaveMaskRegClass(), getExec() functions)
yassingh added inline comments.Jun 21 2023, 9:37 AM
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
647

Will update once both lands. Added FIXME for now.

2151–2152

Oh right, clang-format reverts the indentation. Going to leave it as it is, else clang-format will keep trying.

lgtm besides a few nits

llvm/include/llvm/CodeGen/MachineRegisterInfo.h
60

Didn't realize this already existed

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
4985

ST is already a member

4986

This is just this already, you don't need to query it

4987–4988
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
2218–2219

Indentation's off

2226

Indentation's off

yassingh updated this revision to Diff 533866.Jun 22 2023, 11:28 PM
  • review comments
yassingh updated this revision to Diff 533868.Jun 22 2023, 11:31 PM

Reuploading diff

yassingh added inline comments.Jun 22 2023, 11:33 PM
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
2218–2219

clang format as well arc undos this indentation.

2226

same

arsenm accepted this revision.Jun 23 2023, 5:41 AM

lgtm with nits

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
70

Documentation comment for what IncludeScratchCopy is

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
2218–2219

Well it's clearly wrong, so just manually fix it

2226

Well it's clearly wrong, so just manually fix it

This revision is now accepted and ready to land.Jun 23 2023, 5:41 AM
yassingh updated this revision to Diff 534586.Jun 26 2023, 8:50 AM

Review comments and fix formatting

cdevadas added inline comments.Jun 28 2023, 6:32 AM
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
2151

No need for braces in case of a single instruction.

2159

Wrong indentation.

2217

Avoid braces.

yassingh updated this revision to Diff 535427.Jun 28 2023, 8:33 AM

rebase and formatting

cdevadas accepted this revision.Jun 28 2023, 8:55 AM
arsenm added inline comments.Jun 29 2023, 4:17 AM
llvm/lib/Target/AMDGPU/SIInstructions.td
936

Should be marked convergent

yassingh updated this revision to Diff 536981.Jul 4 2023, 1:17 AM

Mark SI_SPILL_WWM_V32 convergent, rebase.

yassingh added inline comments.
llvm/test/CodeGen/MIR/AMDGPU/long-branch-reg-all-sgpr-used.ll
42 ↗(On Diff #536981)

We are using sgpr100, sgpr101 in our code below using inline asm calls why not sure what do they mean here?
cc @cdevadas

307 ↗(On Diff #536981)

@crobeck I believe this test is now not serving the purpose you added it for. I tried freeing up s98 and s99(see below changes in the same file) expecting them to be used for 'longBranchReservedReg', but didn't work. Any pointers on why it might be?

yassingh updated this revision to Diff 538186.Jul 7 2023, 10:20 AM

Rebase before merge

This revision was landed with ongoing or failed builds.Jul 7 2023, 10:22 AM
This revision was automatically updated to reflect the committed changes.