The correct number of available SGPRs needs to be taken into account here.
Details
Diff Detail
Event Timeline
Do you have a test that hits this?
This is also repeated for the ScratchWaveOffsetReg below, so that probably also needs this.
I don't really remember why I did it this way. At this point we should know exactly which registers vcc/flat_scr/xnack should be in
lib/Target/AMDGPU/SIFrameLowering.cpp | ||
---|---|---|
107–108 | This should also account for flat_scr. I'm not sure we have a way to check if flat_scr is needed yet |
Yes, I actually ran into this with a real shader, but it's a very big one :) To hit this, all SGPRs _and_ VGPR spilling have to be used. Perhaps a long sequence of loads followed by some kind of barrier intrinsics could work?
And you're right, the code below also looks like it needs to be adapted, but somehow the bug got fixed even without it. I'll look into it.
I have distilled a test that hits the issue. It's a pretty subtle balancing
to ensure the right number of SGPRs are used to trigger the bug without
running into the "Ran out of VGPRs for SGPR spilling" issue.
Two logically separate changes are required to make this new test pass:
- Just skip the moving around of scratch resource registers on chips with the
SGPR init bug; after all, there's nothing to be gained by the move. This is a
simpler approach to what I tried before.
- Separately, make getSIProgramInfo assume that XNACK_MASK is unused; this is
not really a long-term solution, but the other register calculations I've seen
(in particular getReservedRegs) make the same assumption.
- AMDGPU: add test/CodeGen/AMDGPU/spill-stress.ll
- AMDGPU/SI: Do not move scratch resource register on Tonga & Iceland
- AMDGPU/SI: Assume XNACK_MASK is not used in getSIProgramInfo
Reverted an incorrect hunk in getSIProgramInfo
- AMDGPU/SI: Assume XNACK_MASK is not used in getSIProgramInfo
lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp | ||
---|---|---|
426–428 | I would prefer if we future proofed this by adding the XNACK registers and checking for them like we do VCC and Flat. | |
test/CodeGen/AMDGPU/spill-stress.ll | ||
1 | You may be able to write a better test using inline assembly. Take a look at the flat-scratch-reg.ll test for an example. |
Thanks for the hint regarding inline assembly, that makes the test much nicer.
I dropped the part about XNACK from this patch. With only this patch applied,
the new test should actually still fail, though with a different error message.
The test passes when D15869 is applied as well.
- AMDGPU: add test/CodeGen/AMDGPU/spill-stress.ll
- AMDGPU/SI: Do not move scratch resource register on Tonga & Iceland
This is r256870.
Not sure why it wasn't closed automatically, I did add the Differential Revision comment after squashing the patches I had locally. It seems SVN is having problems right now, perhaps that's why the bot hasn't picked it up.
I would prefer if we future proofed this by adding the XNACK registers and checking for them like we do VCC and Flat.