Depends on D77544.
Generate a single early exit block out-of-line and branch to it
if all lanes are killed. This avoids taking a branch while any lanes are still active.
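The idea in the summary can be sketched with a toy model. This is not the SIInsertSkips implementation and uses no LLVM APIs: `Block`, `insertEarlyExits`, and the instruction mnemonics (`kill`, `branch_if_exec_zero`, `null_export`, `end_program`) are hypothetical names chosen only for illustration. The sketch assumes that each kill is followed by a conditional branch that is taken only when no lanes remain, and that all such branches share one exit block appended at the end of the function.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model (hypothetical): a block is a name plus a list of mnemonics.
struct Block {
  std::string name;
  std::vector<std::string> insts;
};

// After every "kill", insert a conditional branch to one shared early-exit
// block. The exit block is created once, out-of-line, at the end of the
// function, and only if at least one branch was inserted.
// Returns the number of branches inserted.
std::size_t insertEarlyExits(std::vector<Block> &func) {
  const std::string exitName = "early_exit";
  std::size_t inserted = 0;

  for (Block &block : func) {
    // The sketch assumes the exit block does not exist yet.
    assert(block.name != exitName);
    std::vector<std::string> &insts = block.insts;
    for (std::size_t i = 0; i < insts.size(); ++i) {
      if (insts[i] != "kill")
        continue;
      // Taken only when every lane is dead; active lanes fall through.
      insts.insert(insts.begin() + static_cast<std::ptrdiff_t>(i + 1),
                   "branch_if_exec_zero " + exitName);
      ++i; // Skip over the branch that was just inserted.
      ++inserted;
    }
  }

  if (inserted > 0)
    func.push_back(Block{exitName, {"null_export", "end_program"}});
  return inserted;
}
```

In this model the common path (some lanes still active) falls straight through after each kill, and the exit code exists only once no matter how many kills the function contains.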
Repository: rG LLVM Github Monorepo
Event Timeline
This is a stepping stone to adding early termination in more places because it means termination can be added anywhere in a basic block (without splitting basic blocks).
Will need to be revised once D77544 is committed.
We are trying to handle the kill intrinsic early, during instruction selection.
I can try to incorporate all of it there, including the unify block code (if I can't find a better place for it now).
I will also update D77544 if this patch goes upstream.
This pass seems to be the logical place for this right now.
Is there any code or ongoing discussion I can refer to for moving kill handling to instruction selection?
I can see a few problems with doing that, but that might just be because I do not know what is planned.
I am fine with this getting blown away when SIInsertSkips is replaced by a better solution, and contributing again to that solution.
I think this patch can be simplified if D77544 is submitted first and the order of SIInsertSkips and SIPreEmitPeephole is swapped.
Rework this on top of D77544.
Make insertion of skips near the beginning of blocks work correctly by splitting blocks.
Whether this is a good idea depends on what is meant by it :) Kill intrinsics interact with WQM for how pixel shaders are defined, and we really need to move the WQM pass later because it adds instructions that interfere with scheduling in a bad way. Setting up the CFG to be the right shape earlier makes sense to me.
Mostly looks good to me, with some minor inline comments. Most importantly: please add a test case in skip-if-dead.ll with a function that triggers the early exit and returns a value (the only test with a return right now is @test_kill_control_flow, but it doesn't trigger this early exit).
llvm/lib/Target/AMDGPU/SIInsertSkips.cpp | |
---|---|
167 | Should probably be renamed to generatePsEndPgm. |
222–223 | Why the update to NextBBI? The old code didn't do that, I believe on purpose because while it doesn't break correctness, there simply shouldn't be a need to inspect the EarlyExitBlock. |
I see this may be covered by llvm/test/CodeGen/AMDGPU/transform-block-with-return-to-epilog.ll?
llvm/lib/Target/AMDGPU/SIInsertSkips.cpp | |
---|---|
222–223 | This update is not part of the iteration. |
- Rebase
- Rename generateEndPgm
- Edit comment for clarity
- Add explicit test to skip-if-dead.ll