Add a pseudo instruction that allows early termination of a pixel shader
at any point, based on the value of SCC. It is intended for use
when a mask of live lanes is updated, e.g. the live lanes in a WQM pass.
This facilitates early termination of shaders even when EXEC is
incomplete, e.g. in non-uniform control flow.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Target/AMDGPU/SIInsertSkips.cpp | ||
---|---|---|
394 | const reference |
Patch looks fine to me. Why is it useful to terminate on scc0 instead of scc1, or is it an arbitrary choice? Could you give a slightly more realistic example of how it would be used? Your tests all have:
    S_CMP_EQ_U32 killed $sgpr0, 0, implicit-def $scc
    SI_EARLY_TERMINATE_SCC0 implicit $scc, implicit $exec
which will terminate early if sgpr0 is not 0. Is the idea that this would be used when we're just about to AND sgpr0 into exec?
llvm/lib/Target/AMDGPU/SIInsertSkips.cpp | ||
---|---|---|
511 | Could erase here unconditionally, to avoid having an erase in earlyTerm as well. | |
llvm/lib/Target/AMDGPU/SIInstructions.td | ||
364 | Am I right in thinking that this is not a terminator, so it can appear in the middle of an MBB? Is that normal for this kind of pseudo? |
llvm/lib/Target/AMDGPU/SIInstructions.td | ||
---|---|---|
364 | If this does any branching to any point in the program, this should be a terminator. If it behaves more like a trap/abort, it's OK if it's not a terminator |
The expectation is that this is used after updating a stored EXEC mask (not EXEC itself),
so in most cases SCC will be set by an S_AND or S_ANDN2.
SCC = 0 then means the updated mask is empty and the program can be terminated.
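To make the intended usage concrete, here is a minimal C++ sketch that models the scalar state rather than the patch itself. It assumes S_ANDN2_B64 semantics (dst = a & ~b, with SCC set when the result is non-zero). The names `ScalarState`, `sAndN2`, `earlyTerminateScc0`, and `firstTerminatingKill` are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model of the scalar state involved (hypothetical, not LLVM code).
struct ScalarState {
    std::uint64_t liveMask; // stored mask of live lanes, not EXEC itself
    bool scc;               // scalar condition code
};

// Models S_ANDN2_B64 on the stored mask: mask &= ~killed, SCC = (mask != 0).
inline void sAndN2(ScalarState &s, std::uint64_t killed) {
    s.liveMask &= ~killed;
    s.scc = (s.liveMask != 0);
}

// Models the condition tested by the pseudo: terminate when SCC is 0.
inline bool earlyTerminateScc0(const ScalarState &s) {
    return !s.scc;
}

// Applies the kill masks in order and returns the index of the first kill
// after which the shader would terminate, or -1 if lanes remain live.
inline int firstTerminatingKill(std::uint64_t initialLive,
                                const std::vector<std::uint64_t> &kills) {
    ScalarState s{initialLive, initialLive != 0};
    for (std::size_t i = 0; i < kills.size(); ++i) {
        sAndN2(s, kills[i]);
        if (earlyTerminateScc0(s))
            return static_cast<int>(i);
    }
    return -1;
}

// Usage sketch: four live lanes, killed two at a time.
inline void example() {
    assert(firstTerminatingKill(0xF, {0x3, 0xC}) == 1);
}
```

In this model the termination check follows every mask update, which mirrors placing the pseudo directly after the S_AND or S_ANDN2 that defines SCC.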
llvm/lib/Target/AMDGPU/SIInsertSkips.cpp | ||
---|---|---|
511 | Yes. | |
llvm/lib/Target/AMDGPU/SIInstructions.td | ||
364 | I think it is reasonable to consider this an abort. It always branches to terminate the program. |