This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Avoid use of V_READLANE into EXEC in SGPR spills
ClosedPublic

Authored by critson on Jun 16 2020, 1:32 AM.

Details

Summary

Depends on D81224.
Always prefer to clobber input SGPRs and restore them after the
spill. This applies to both spills to VGPRs and scratch.

Diff Detail

Event Timeline

critson created this revision. Jun 16 2020, 1:32 AM
Herald added a project: Restricted Project. Jun 16 2020, 1:32 AM

Can you try to constrain the class of the virtual register earlier to prevent this from happening instead? I think that should stop this case from coming up (e.g. we already do MRI.constrainRegClass(SrcReg, &AMDGPU::SReg_32_XM0RegClass) to try to avoid m0 spills, plus a similar constraint in foldMemoryOperandImpl to avoid another problem when spills are optimized out).

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
871

I'm pretty sure this used to be true, so this should depend on the subtarget?

critson updated this revision to Diff 271333. Jun 17 2020, 4:36 AM

Update spill-special-sgpr.mir for the auto-updated version.

> Can you try to constrain the class of the virtual register earlier to prevent this from happening instead? I think that should stop this case from coming up (e.g. we already do MRI.constrainRegClass(SrcReg, &AMDGPU::SReg_32_XM0RegClass) to try to avoid m0 spills, plus a similar constraint in foldMemoryOperandImpl to avoid another problem when spills are optimized out).

I could look at that, but it seems advantageous to be able to spill and restore EXEC directly when we are at the point of spilling SGPRs?

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
871

We could make this depend on the subtarget if we can establish at what point it became unsupported, but it seems it does not work on anything recent.
I have verified that it does not work on gfx900 or gfx1010.
If I get a chance I will test gfx7.

critson updated this revision to Diff 271339. Jun 17 2020, 5:13 AM

Fix error in test update.

> Can you try to constrain the class of the virtual register earlier to prevent this from happening instead? I think that should stop this case from coming up (e.g. we already do MRI.constrainRegClass(SrcReg, &AMDGPU::SReg_32_XM0RegClass) to try to avoid m0 spills, plus a similar constraint in foldMemoryOperandImpl to avoid another problem when spills are optimized out).

> I could look at that, but it seems advantageous to be able to spill and restore EXEC directly when we are at the point of spilling SGPRs?

As a reserved register, spilling of exec should be impossible. The only situations where it "spills" are when a copy of another register that spilled was coalesced into it. We should be able to satisfy everything with register class constraints. Even if that weren't the case, I would prefer to proceed by first constraining the register classes, and then applying the more complex fix if it turns out there are still cases that cannot be avoided.

critson updated this revision to Diff 271575. Jun 17 2020, 10:02 PM

This is the simplified version, which blocks spill/reload of exec from occurring.

arsenm added inline comments. Jun 18 2020, 8:19 AM
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
7034–7035 (On Diff #271575)

This is now way more complicated. You shouldn't need to loop, or consider specific physical registers. You should be able to just change the register classes that were used here.

I also think this is missing a test to stress foldMemoryOperandImpl; can you add one following the example of the test added in d9e0a2942ac71327166a3a597e8383192fd19b17?

critson updated this revision to Diff 271959. Jun 19 2020, 2:08 AM
  • Add test for foldMemoryOperandImpl
  • Rework code in foldMemoryOperandImpl to remove references to specific physical registers
arsenm added inline comments. Jun 19 2020, 5:29 AM
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
7038–7041 (On Diff #271959)

This shouldn't consider the register class of the physical register. Only the register class of the virtual register matters (plus getPhysRegClass is really slow), and you shouldn't need to get the size or check whether it's an SGPR.

Something like:

    RC = MRI.getRegClass(VirtReg);
    if (RC->contains(m0) || RC->contains(exec) ...)

or alternatively:

    RC->hasSuperClassEq(AMDGPU::SReg_64RegClass)
    ...
    RC->hasSuperClassEq(AMDGPU::SReg_32RegClass)

critson updated this revision to Diff 272041. Jun 19 2020, 6:17 AM

Address comments on foldMemoryOperandImpl.

arsenm accepted this revision. Jun 19 2020, 6:33 AM
This revision is now accepted and ready to land. Jun 19 2020, 6:33 AM
This revision was automatically updated to reflect the committed changes.