This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Fix clobbering SCC when expanding large offset spill pseudos
ClosedPublic

Authored by arsenm on Dec 8 2021, 2:04 PM.

Details

Summary

If we had a large offset which required materializing in a register,
we would emit an s_add_i32, clobbering SCC. Start checking if SCC is
live, and instead use a VGPR offset. For MUBUF, we switch to using
offen. We would do this anyway in a normal load/store with a frame
index, but not for spills.

The same problem still exists in other contexts where we expand frame
indices.

The nasty edge case is when SGPRs are spilled to memory at a large
frame offset where SCC is also clobbered. This requires a second
scavenging index, and also required several patches in the scavenger
to correctly handle multiple recursive scavenge indexes.

An even nastier edge case we still don't support is if we don't have
any free SGPRs. If SCC is live and we don't have any free SGPRs to
save exec, we have no way of flipping exec back and forth without also
clobbering SCC.

Fixes: SWDEV-309419

Diff Detail

Event Timeline

arsenm created this revision.Dec 8 2021, 2:04 PM
arsenm requested review of this revision.Dec 8 2021, 2:04 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 8 2021, 2:04 PM
Herald added a subscriber: wdng. · View Herald Transcript
sebastian-ne added inline comments.Dec 14 2021, 4:28 AM
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
189

What is the difference between TmpVGPRUsed and TmpVGPRLive? It looks like the same thing to me?

228

Isn’t TmpVGPR still used in the second buildVGPRSpillLoadStore? How does this work if it’s killed here?

llvm/test/CodeGen/AMDGPU/swdev309419-agpr.mir
1588–1593

This looks incorrect. It’s trying to use v0 as offset and agpr intermediate at the same time.

llvm/test/CodeGen/AMDGPU/swdev309419.mir
8

Typo: we the offset

1074–1078

I guess this is expected to still incorrectly overwrite $scc?
Maybe add a fixme?

arsenm added inline comments.Feb 3 2022, 8:17 AM
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
189

Not sure how I ended up with this, everything seems to work with it removed

228

It turns out this doesn't do anything, and is in the path that doesn't work. I also don't hit this in any existing test if I add unreachable

llvm/test/CodeGen/AMDGPU/swdev309419-agpr.mir
1588–1593

Yes, this is broken. I spent most of the time on this patch trying to get this case to work. I thought I succeeded but I guess not. Given that we're going to be forced to reserve a scratch VGPR for dealing with AGPRs as part of D118415, I think I can fix this case by reusing the same register in a follow up patch.

arsenm updated this revision to Diff 405652.Feb 3 2022, 8:18 AM

Drop redundant variable

rampitec added inline comments.Feb 3 2022, 11:07 AM
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
1379

This probably needs to be a proper diagnostics instead of assert.

llvm/test/CodeGen/AMDGPU/swdev309419-agpr.mir
2

Can you rename file to something meaningful? Like accvgpr-spill-scc-clober.mir.

4

Probably add run line with flat scratch and another with -mattr=+architected-flat-scratch.

llvm/test/CodeGen/AMDGPU/swdev309419-sgpr-to-vmem.mir
2

Same here, I am not sure we need swdev here.

llvm/test/CodeGen/AMDGPU/swdev309419-unhandled.mir
2

Same here and below.

arsenm marked 3 inline comments as done.Feb 3 2022, 1:22 PM
arsenm added inline comments.
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
1379

Real diagnostics are for things which can be hit by a user or a mir test at least. I don't think this can ever happen. This path is only reached from the prolog/epilog code where there should always be free VGPRs available

llvm/test/CodeGen/AMDGPU/swdev309419-agpr.mir
4

Doesn't architected flat scratch just changed the prolog initialization? Why does it matter here?

arsenm updated this revision to Diff 405767.Feb 3 2022, 1:22 PM

Rename tests, add flat scr run lines

rampitec added inline comments.Feb 3 2022, 1:56 PM
llvm/test/CodeGen/AMDGPU/swdev309419-agpr.mir
4

Technically it also eliminates wave scratch offset. I doubt you can run into a problem here since flat scratch path shall not use it after initialization, just a precaution.

rampitec accepted this revision.Feb 3 2022, 2:00 PM

It still can fail, but I guess it is better than what we have now. LGTM.

This revision is now accepted and ready to land.Feb 3 2022, 2:00 PM