This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Implement whole wave register spill
ClosedPublic

Authored by cdevadas on Feb 10 2023, 9:30 AM.

Details

Summary

To reduce the register pressure during allocation,
when the allocator spills a virtual register that
corresponds to a whole wave mode operation, the
spill loads and restores should be activated for
all lanes by temporarily flipping all bits in exec
register to one just before the spills. It is not
implemented in the compiler as of today and this
patch enables the necessary support.

This is a pre-patch before the SGPR spill to virtual
VGPR lanes that would eventually causes the whole
wave register spills during allocation.

Diff Detail

Event Timeline

cdevadas created this revision.Feb 10 2023, 9:30 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 10 2023, 9:30 AM
cdevadas requested review of this revision.Feb 10 2023, 9:30 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 10 2023, 9:30 AM
yassingh updated this revision to Diff 523313.May 18 2023, 3:22 AM
  • Rebase over new TARGET::PRED_COPY approach
arsenm requested changes to this revision.Jun 19 2023, 6:15 AM
arsenm added inline comments.
llvm/lib/Target/AMDGPU/SIDefines.h
920

I assumed the value would be 1. Change this to be the pre-shifted values rather than using 1 << flag in the getter?

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
12603–12609

Move this up with the other reserved register logic

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
546

Capitalization is weird, Capitalize clone

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
646

Why are the inbounds checks needed? The analagous MRI functions do not tolerate invalid register values

651–652

This should assert (like setFlag already does)

654

Ditto, why range check?

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
2139–2140

Braces and indent under if

2148

Indent off

This revision now requires changes to proceed.Jun 19 2023, 6:15 AM
cdevadas added inline comments.Jun 19 2023, 6:53 AM
llvm/lib/Target/AMDGPU/SIDefines.h
920

Will do.

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
546

Will do.

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
646

VRegFlags will be grown each time (if the size exceeds the initial size set) when a new virtual register is created/cloned. The MRI's 'VRegToType' has a similar bound check.

651–652

I guess it's already been discussed in the downstream review. There are instances when Physical registers are taken to spill - fastAlloc, CSR spills during PEI. To handle such instances, an early return was inserted intentionally for the PhysRegs instead of an assertion.

arsenm added inline comments.Jun 19 2023, 6:55 AM
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
646

VRegToType isn't really analagous. Untyped and typed registers can coexist, I'd expect it to be unnecessary here

arsenm added inline comments.Jun 19 2023, 6:56 AM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
12603–12609

Actually, why is this not in getReservedRegs?

yassingh updated this revision to Diff 532851.Jun 20 2023, 3:19 AM

Addressed Matt's comments:

  • Updated flag
  • Formatting
yassingh added inline comments.Jun 20 2023, 3:23 AM
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
646

Getting rid of inbounds check causes multiple test failures. (In spill192.mir, spill224.mir, spill288.mir, spill320.mir, etc).

yassingh added inline comments.Jun 20 2023, 3:25 AM
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
2139–2140

Missed indenting, will fix it.

cdevadas added inline comments.Jun 21 2023, 6:51 AM
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
2139–2140

Update the patch after clang-format.

arsenm added inline comments.Jun 21 2023, 6:53 AM
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
657

D149775 is adding another reserved register, but I'm pretty sure they cannot contextually overlap. After both patches are in they should be consolidated to reuse the same reserved register

arsenm added inline comments.Jun 21 2023, 7:12 AM
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1470

ST.getWaveMaskRegClass

1472

This misses VCC as a possibility, which is fixed by using getWaveMaskRegClass

llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp
338

Same, use getWaveMaskRegClass

yassingh updated this revision to Diff 533300.Jun 21 2023, 9:36 AM
  • Review comments (use getWaveMaskRegClass(), getExec() functions)
yassingh added inline comments.Jun 21 2023, 9:37 AM
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
657

Will update once both lands. Added FIXME for now.

2139–2140

Oh right, clang-format reverts the indentation. Going to leave it as it is, else clang-format will keep trying.

lgtm besides a few nits

llvm/include/llvm/CodeGen/MachineRegisterInfo.h
60 ↗(On Diff #533300)

Didn't realize this already existed

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
4955

ST is already a member

4956

This is just this already, you don't need to query it

4957–4958
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
2206–2207

Indentation's off

2213

Indentation's off

yassingh updated this revision to Diff 533866.Jun 22 2023, 11:28 PM
  • review comments
yassingh updated this revision to Diff 533868.Jun 22 2023, 11:31 PM

Reuploading diff

yassingh added inline comments.Jun 22 2023, 11:33 PM
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
2206–2207

clang format as well arc undos this indentation.

2213

same

arsenm accepted this revision.Jun 23 2023, 5:41 AM

lgtm with nits

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
70

Documentation comment for what IncludeScratchCopy is

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
2206–2207

Well it's clearly wrong, so just manually fix it

2213

Well it's clearly wrong, so just manually fix it

This revision is now accepted and ready to land.Jun 23 2023, 5:41 AM
yassingh updated this revision to Diff 534586.Jun 26 2023, 8:50 AM

Review comments and fix formatting

cdevadas added inline comments.Jun 28 2023, 6:32 AM
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
2139

No need for braces in case of a single instruction.

2147

Wrong indentation.

2205

Avoid braces.

yassingh updated this revision to Diff 535427.Jun 28 2023, 8:33 AM

rebase and formatting

cdevadas accepted this revision.Jun 28 2023, 8:55 AM
arsenm added inline comments.Jun 29 2023, 4:17 AM
llvm/lib/Target/AMDGPU/SIInstructions.td
901

Should be marked convergent

yassingh updated this revision to Diff 536981.Jul 4 2023, 1:17 AM

Mark SI_SPILL_WWM_V32 convergent, rebase.

yassingh added inline comments.
llvm/test/CodeGen/MIR/AMDGPU/long-branch-reg-all-sgpr-used.ll
42 ↗(On Diff #536981)

We are using sgpr100, sgpr101 in our code below using inline asm calls why not sure what do they mean here?
cc @cdevadas

307 ↗(On Diff #536981)

@crobeck I believe this test is now not serving the purpose you added it for. I tried freeing up s98 and s99(see below changes in the same file) expecting them to be used for 'longBranchReservedReg', but didn't work. Any pointers on why it might be?

yassingh updated this revision to Diff 538186.Jul 7 2023, 10:20 AM

Rebase before merge

This revision was landed with ongoing or failed builds.Jul 7 2023, 10:22 AM
This revision was automatically updated to reflect the committed changes.