This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Spill more than wavesize CSR SGPRs
ClosedPublic

Authored by saiislam on Jun 24 2020, 7:02 AM.

Details

Summary

In case of more than wavesize CSR SGPR spills, lanes of reserved VGPR were getting
overwritten due to wrap around.

Reserve a VGPR (when NumVGPRSpillLanes = 0, WaveSize, 2*WaveSize, ..) and when one
of the two conditions is true:

  1. One reserved VGPR being tracked by VGPRReservedForSGPRSpill is not yet reserved.
  2. All spill lanes of reserved VGPR(s) are full and another spill lane is required.

Diff Detail

Event Timeline

saiislam created this revision.Jun 24 2020, 7:02 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 24 2020, 7:02 AM
arsenm added inline comments.Jun 24 2020, 7:48 AM
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
312

I think it might be clearer to split out the logic for the special reserved case.

Could you do something like

if (FuncInfo->VGPRReservedForSGPRSpill && NumVGPRSpillLanes <= WaveSize) {
  assert(VGPRResevedForSGPRSpill == SpillVGPRs.back().VGPR);
  LaneVGPR = VGPRResevedForSGPRSpill;
} else (VGPRIndex == 0) {
 ....
} else {
  LaneVGPR = SpillVGPRs.back().VGPR
}
llvm/test/CodeGen/AMDGPU/spill_more_than_wavesize_csr_sgprs.ll
11–12

I'm pretty sure this isn't broken if you do not have a stack object, but these should be unrelated so I think something is still wrong here

saiislam updated this revision to Diff 273119.Jun 24 2020, 11:50 AM
  1. Split the VGPR allocation logic as suggested
arsenm added inline comments.Jun 24 2020, 1:13 PM
llvm/test/CodeGen/AMDGPU/spill_more_than_wavesize_csr_sgprs.ll
1

Maybe also try with and without the reserved VGPR flag?

10

You removed the stack object, but my point was without your patch, this already works:

v_writelane_b32 v0, s98, 63
v_writelane_b32 v1, s99, 0
v_writelane_b32 v1, s100, 1

So that means this test should include with and without the stack object. We also need to solve the mystery of why this stack object is needed

kerbowa added inline comments.Jun 24 2020, 11:26 PM
llvm/test/CodeGen/AMDGPU/spill_more_than_wavesize_csr_sgprs.ll
10

Without the stack object FuncInfo->VGPRReservedForSGPRSpill is 0.

kerbowa added inline comments.Jun 24 2020, 11:40 PM
llvm/test/CodeGen/AMDGPU/spill_more_than_wavesize_csr_sgprs.ll
10

No VGPR is pre-reserved when there are no stack objects.
https://github.com/llvm/llvm-project/blob/master/llvm/lib/Target/AMDGPU/SIISelLowering.cpp#L11283

This patch will fix some current Navi issues as well where it's more common to run out of lanes in the pre-reserved register.

saiislam updated this revision to Diff 273353.Jun 25 2020, 7:34 AM

Updated the test case to check for spilling in the second vgpr with and without stack object present, once all lanes of first vgpr has been used.

kerbowa added inline comments.Jun 26 2020, 8:58 AM
llvm/test/CodeGen/AMDGPU/spill_more_than_wavesize_csr_sgprs.ll
31

I think you need "store volatile i32 0, i32 addrspace(5)* %alloca" as well or there is no stack object.

saiislam updated this revision to Diff 273760.Jun 26 2020, 9:25 AM

Corrected the test case

kerbowa accepted this revision.Jun 29 2020, 2:08 PM

LGTM

This revision is now accepted and ready to land.Jun 29 2020, 2:08 PM
arsenm accepted this revision.Jun 29 2020, 2:19 PM
arsenm added inline comments.
llvm/test/CodeGen/AMDGPU/spill_more_than_wavesize_csr_sgprs.ll
10

This doesn't really make sense though; why would this be a factor in the reservation logic?

saiislam marked an inline comment as done.Jun 30 2020, 7:08 AM
saiislam added inline comments.
llvm/test/CodeGen/AMDGPU/spill_more_than_wavesize_csr_sgprs.ll
32

With this patch applied over recent trunk, I am seeing following behavior:

  • without volatile in store instruction, correct behavior (using next vgpr for spilling). stack object instruction gets optimized out
  • with volatile in store instruction, spills are still getting overwritten
  • With -O0, with and without volatile spills are getting overwritten.

So, whenever stack object is present, wrong behavior is getting triggered somehow.

arsenm added inline comments.Jun 30 2020, 9:40 AM
llvm/test/CodeGen/AMDGPU/spill_more_than_wavesize_csr_sgprs.ll
32

Did you mean without the patch?

saiislam marked an inline comment as done.Jun 30 2020, 9:49 AM
saiislam added inline comments.
llvm/test/CodeGen/AMDGPU/spill_more_than_wavesize_csr_sgprs.ll
32

No, I mean with the patch. Something changed in last 3-4 days.
Earlier, this patch used to work correctly for volatile store instruction as well.

kerbowa added inline comments.Jun 30 2020, 5:21 PM
llvm/test/CodeGen/AMDGPU/spill_more_than_wavesize_csr_sgprs.ll
32

It's because you reversed the logic. It should be 'NumVGPRSpillLanes < WaveSize'.

saiislam updated this revision to Diff 274684.Jun 30 2020, 10:10 PM

Corrected condition on NumVGPRSpillLanes and updated the test case.

kerbowa accepted this revision.Jun 30 2020, 10:12 PM
This revision was automatically updated to reflect the committed changes.