This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU/SI: fix scratch resource register allocation on Tonga & Iceland
ClosedPublic

Authored by nhaehnle on Dec 22 2015, 1:23 PM.

Diff Detail

Event Timeline

nhaehnle updated this revision to Diff 43475.Dec 22 2015, 1:23 PM
nhaehnle retitled this revision from to AMDGPU/SI: fix scratch resource register allocation on Tonga & Iceland.
nhaehnle updated this object.
nhaehnle added reviewers: arsenm, tstellarAMD.
nhaehnle added a subscriber: llvm-commits.
arsenm edited edge metadata.Dec 22 2015, 1:53 PM

Do you have a test that hits this?

This is also repeated for the ScratchWaveOffsetReg below, so that probably also needs this.

I don't really remember why I did it this way. At this point we should know exactly which registers vcc/flat_scr/xnack should be in

lib/Target/AMDGPU/SIFrameLowering.cpp
124

This should also account for flat_scr. I'm not sure we have a way to check if flat_scr is needed yet

Yes, I actually ran into this with a real shader, but it's a very big one :) To hit this, all SGPRs _and_ VGPR spilling have to be used. Perhaps a long sequence of loads followed by some kind of barrier intrinsics could work?

And you're right, the code below also looks like it needs to be adapted, but somehow the bug got fixed even without it. I'll look into it.

nhaehnle updated this revision to Diff 43764.Dec 29 2015, 3:15 PM
nhaehnle edited edge metadata.

I have distilled a test that hits the issue. It's a pretty subtle balancing
to ensure the right number of SGPRs are used to trigger the bug without
running into the "Ran out of VGPRs for SGPR spilling" issue.

Two logically separate changes are required to make this new test pass:

  1. Just skip the moving around of scratch resource registers on chips with the

SGPR init bug; after all, there's nothing to be gained by the move. This is a
simpler approach to what I tried before.

  1. Separately, make getSIProgramInfo assume that XNACK_MASK is unused; this is

not really a long-term solution, but the other register calculations I've seen
(in particular getReservedRegs) make the same assumption.

  • AMDGPU: add test/CodeGen/AMDGPU/spill-stress.ll
  • AMDGPU/SI: Do not move scratch resource register on Tonga & Iceland
  • AMDGPU/SI: Assume XNACK_MASK is not used in getSIProgramInfo
nhaehnle updated this revision to Diff 43765.Dec 29 2015, 3:17 PM

Reverted an incorrect hunk in getSIProgramInfo

  • AMDGPU/SI: Assume XNACK_MASK is not used in getSIProgramInfo
tstellarAMD added inline comments.Jan 4 2016, 6:58 AM
lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
426–428 ↗(On Diff #43765)

I would prefer if we future proofed this by adding the XNACK registers and checking for them like we do VCC and Flat.

test/CodeGen/AMDGPU/spill-stress.ll
1 ↗(On Diff #43765)

You may be able to write a better test using inline assembly. Take a look at the flat-scratch-reg.ll test for an example.

tstellarAMD requested changes to this revision.Jan 4 2016, 6:58 AM
tstellarAMD edited edge metadata.
This revision now requires changes to proceed.Jan 4 2016, 6:58 AM
nhaehnle updated this revision to Diff 43920.Jan 4 2016, 1:48 PM
nhaehnle edited edge metadata.

Thanks for the hint regarding inline assembly, that makes the test much nicer.

I dropped the part about XNACK from this patch. With only this patch applied,
the new test should actually still fail, though with a different error message.
The test passes when D15869 is applied as well.

  • AMDGPU: add test/CodeGen/AMDGPU/spill-stress.ll
  • AMDGPU/SI: Do not move scratch resource register on Tonga & Iceland
tstellarAMD accepted this revision.Jan 5 2016, 10:50 AM
tstellarAMD edited edge metadata.

LGTM.

This revision is now accepted and ready to land.Jan 5 2016, 10:50 AM
nhaehnle closed this revision.Jan 5 2016, 1:51 PM

This is r256870.

Not sure why it wasn't closed automatically, I did add the Differential Revision comment after squashing the patches I had locally. It seems SVN is having problems right now, perhaps that's why the bot hasn't picked it up.