This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Tidy up conditional branches from early termination
AbandonedPublic

Authored by critson on Jan 14 2021, 10:05 PM.

Details

Reviewers
foad
arsenm
Summary

Replace conditional branches with unconditional ones where the
value of SCC is known apriori.

Diff Detail

Event Timeline

critson created this revision.Jan 14 2021, 10:05 PM
critson requested review of this revision.Jan 14 2021, 10:05 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 14 2021, 10:05 PM
foad added inline comments.Jan 15 2021, 6:03 AM
llvm/lib/Target/AMDGPU/SIInsertSkips.cpp
228

I'm confused by this. Why can't you generate an unconditional branch to the early exit block from a BB that happened to have two or more successors?

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.wqm.demote.ll
11

The v_cmp is dead, and so is the whole %bb.1 block. Would you need to run another MachineDCE to get rid of them?

Alternatively could we get rid of dead code at the IR level? Maybe the AMDGPU instCombineIntrinsic could spot that the argument is false and call setDoesNotReturn() on the call?

critson updated this revision to Diff 317129.Jan 15 2021, 5:38 PM
  • Incorporate code that should have been in this patch.

I'm not sure this is the right pass for this (I've been hoping to move everything out of this pass eventually)

llvm/lib/Target/AMDGPU/SIInsertSkips.cpp
195

Comment what the point of this is

201–202

Invert and early return?

205–206

Comment for the instruction pattern

227–237

Why isn't BranchFolding handling this case?

I'm not sure this is the right pass for this (I've been hoping to move everything out of this pass eventually)

I agree, I think SIPreEmitPeephole is probably a better place for this.

llvm/lib/Target/AMDGPU/SIInsertSkips.cpp
228

I am not sure why I wrote this either.
I retested without it and could not produce any problems.

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.wqm.demote.ll
11

Adding a late MachineDCE does clean up the v_cmp, but not %bb.1. I notice that in some of the other examples it returns other unreachable instructions. It might be interesting to test results of this on a large collection of shaders.

I am not sure if setting "does not return" is valid, as this will only terminate the whole shader if every lane is demoted. This is true in uniform control flow, but this intrinsic is valid in non-uniform control flow too.

critson abandoned this revision.Jan 31 2021, 6:43 PM

While the pattern this optimises for occurs a fair bit in trivial lit test code, after evaluating game shaders I can find only a handful of examples (4 in 10000 shaders).
So I do not think it is worth adding code for.