This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Implement SGPR spilling with scalar stores
ClosedPublic

Authored by arsenm on Oct 13 2016, 3:22 AM.

Details

Reviewers
tstellarAMD
Summary

This avoids the nasty problems caused by using
memory instructions that read the exec mask while
spilling / restoring registers used for control flow
masking, but only for VI when these were added.

This always uses the scalar stores when enabled currently,
but it may be better to still try to spill to a VGPR
and use this on the fallback memory path.

The cache also needs to be flushed before wave termination
if a scalar store is used.

Diff Detail

Event Timeline

arsenm updated this revision to Diff 74489.Oct 13 2016, 3:22 AM
arsenm retitled this revision from to AMDGPU: Implement SGPR spilling with scalar stores.
arsenm updated this object.
arsenm added a subscriber: llvm-commits.
nhaehnle added inline comments.Oct 14 2016, 1:45 AM
lib/Target/AMDGPU/SIInsertWaits.cpp
626–627

Please also handle the SI_RETURN case here and below.

arsenm added inline comments.Oct 14 2016, 3:51 AM
lib/Target/AMDGPU/SIInsertWaits.cpp
626–627

I was specifically not handling that, but I guess it isn't a normal function return

nhaehnle added inline comments.Oct 14 2016, 4:29 AM
lib/Target/AMDGPU/SIInsertWaits.cpp
626–627

Yes. The way it's used, we're just concatenating the binaries of multiple shader parts. Only the middle part should ever contain register spills (well, unless you compile with -O0, but we never do that), so it makes sense that all the necessary handling is confined to the shader part where it happens.

arsenm updated this revision to Diff 74711.Oct 14 2016, 10:29 AM
arsenm edited edge metadata.

Handle si_return

arsenm updated this revision to Diff 76270.Oct 28 2016, 4:12 PM
arsenm edited edge metadata.

Fix if offset is 0

One question about the offsets at which spills happen, though this could stay a TODO for now. Apart from that, LGTM.

test/CodeGen/AMDGPU/si-spill-sgpr-stack.ll
27–37

I'm a bit surprised by the offsets as they seem too far apart. I guess the offset allocation assumes VGPR spilling, and this is a TODO to be fixed later? Should probably be mentioned here in the test and in the appropriate location in the code.

arsenm added inline comments.Nov 7 2016, 11:24 AM
test/CodeGen/AMDGPU/si-spill-sgpr-stack.ll
27–37

The offsets need to be multiplied by the wave size, so they end up looking big. It would be a code size improvement to use the previous one when splitting the spill (although this is mitigated by using the wider instructions in the follow up patch)

I guess the comment on D26005 about leaving 63 elements wasted belongs here? It would be nice to have a TODO about this here, since it seems to be just an arbitrary limitation of how the spill slots are currently allocated.

I guess the comment on D26005 about leaving 63 elements wasted belongs here? It would be nice to have a TODO about this here, since it seems to be just an arbitrary limitation of how the spill slots are currently allocated.

D26104 is the patch to use the wide stores which fixes this

tstellarAMD accepted this revision.Nov 11 2016, 4:23 PM
tstellarAMD edited edge metadata.

LGTM.

This revision is now accepted and ready to land.Nov 11 2016, 4:23 PM
arsenm closed this revision.Nov 13 2016, 10:35 AM

r286766