This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Add SI_EARLY_TERMINATE_SCC0 for early terminating shader
ClosedPublic

Authored by critson on Oct 3 2020, 1:10 AM.

Details

Summary

Add pseudo instruction to allow early termination of pixel shader
anywhere based on the value of SCC. The intention is to use this
when a mask of live lanes is updated, e.g. live lanes in WQM pass.
This facilitates early termination of shaders even when EXEC is
incomplete, e.g. in non-uniform control flow.

Diff Detail

Event Timeline

critson created this revision.Oct 3 2020, 1:10 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 3 2020, 1:10 AM
critson requested review of this revision.Oct 3 2020, 1:10 AM
critson updated this revision to Diff 301238.Oct 28 2020, 4:36 AM
  • Remove restrictions on types of shader where early terminate can occur.
arsenm added inline comments.Oct 28 2020, 7:02 AM
llvm/lib/Target/AMDGPU/SIInsertSkips.cpp
378

const reference

LGTM modulo the inline comment.

critson updated this revision to Diff 303703.Nov 8 2020, 2:46 AM
  • Address reviewer comments.
  • Rebase.
foad added a subscriber: foad.Jan 8 2021, 2:08 AM

Patch looks fine to me. Why is it useful to terminate on scc0 instead of scc1, or is it an arbitrary choice? Could you give a slightly more realistic example of how it would be used? Your tests all have:

S_CMP_EQ_U32 killed $sgpr0, 0, implicit-def $scc
SI_EARLY_TERMINATE_SCC0 implicit $scc, implicit $exec

which will terminate early if sgpr0 is not 0. Is the idea that this would be used when we're just about to AND sgpr0 into exec?

llvm/lib/Target/AMDGPU/SIInsertSkips.cpp
494

Could erase here unconditionally, to avoid having an erase in earlyTerm as well.

llvm/lib/Target/AMDGPU/SIInstructions.td
328

Am I right in thinking that this is not a terminator, so it can appear in the middle of an MBB? Is that normal for this kind of pseudo?

arsenm added inline comments.Jan 8 2021, 6:08 AM
llvm/lib/Target/AMDGPU/SIInstructions.td
328

If this does any branching to any point in the program, this should be a terminator. If it behaves more like a trap/abort, it's OK if it's not a terminator

critson marked an inline comment as done.Jan 11 2021, 1:53 AM

Patch looks fine to me. Why is it useful to terminate on scc0 instead of scc1, or is it an arbitrary choice? Could you give a slightly more realistic example of how it would be used? Your tests all have:

S_CMP_EQ_U32 killed $sgpr0, 0, implicit-def $scc
SI_EARLY_TERMINATE_SCC0 implicit $scc, implicit $exec

which will terminate early if sgpr0 is not 0. Is the idea that this would be used when we're just about to AND sgpr0 into exec?

The expectation is that this is used after updating a stored EXEC mask (not EXEC itself).
So SCC will be set from S_AND or S_ANDN2 in most cases.
SCC0 means mask is empty and program can be terminated.

llvm/lib/Target/AMDGPU/SIInsertSkips.cpp
494

Yes.

llvm/lib/Target/AMDGPU/SIInstructions.td
328

I think it is reasonable to consider this an abort. It always branches to terminate the program.

critson updated this revision to Diff 315734.Jan 11 2021, 2:06 AM
critson marked an inline comment as done.
  • Address reviewer comments
  • Change tests to use more indicative S_AND_B32
foad accepted this revision.Jan 11 2021, 2:47 AM

LGTM.

This revision is now accepted and ready to land.Jan 11 2021, 2:47 AM
This revision was landed with ongoing or failed builds.Jan 12 2021, 8:29 PM
This revision was automatically updated to reflect the committed changes.