This patch allows AMDGPUUnifyDivergenceExitNodes pass
to transform a function whose PDT has exactly one root
and ends in a branch instruction. Fixes
https://github.com/llvm/llvm-project/issues/58861.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Thanks for working on this, but I think it is more reasonable to fix the issue early. The function is showing a problem in our cfg lowering passes for AMDGPU. The cfg lowering are being done by several passes. I know it is very tricky and fragile now, so it is not easy to determine what the right fix should be. For this specific case, the input IR does not have a return block, thus bypass the structurizeCFG pass. The AMDGPUUnifyDivergentExits happened before StructurizeCFG already has some ability to handle infinite loops, but this case was skipped because of the check if (PDT.root_size() <= 1). In this case, BB3 is a root in PDT but it is not a true exit. I would suggest we change this. If there is only one root in PostDominatorTree, please check whether it ends with return/unreachable instruction before we can skip the transformation. If the single root in PDT ends with a branch instruction, we should continue the transformation to insert a dummy return block and insert a control flow edge that is never taken with trick like br i1 true, ..., DummyReturn. By doing this, we should be able to lower the cfg correctly.
@ruiling PDT.getRoot() yields BB4. So for PDT with exactly one root, we should only allow the transformation when its terminator is a branch instruction? This causes the following tests to fail due to insertion of dummy blocks:
LLVM :: CodeGen/AMDGPU/agpr-copy-no-free-registers.ll
LLVM :: CodeGen/AMDGPU/branch-relaxation.ll LLVM :: CodeGen/AMDGPU/cf-loop-on-constant.ll LLVM :: CodeGen/AMDGPU/control-flow-optnone.ll LLVM :: CodeGen/AMDGPU/infinite-loop.ll LLVM :: CodeGen/AMDGPU/kill-infinite-loop.ll LLVM :: CodeGen/AMDGPU/loop-live-out-copy-undef-subrange.ll LLVM :: CodeGen/AMDGPU/optimize-negated-cond.ll LLVM :: CodeGen/AMDGPU/unstructured-cfg-def-use-issue.ll LLVM :: CodeGen/AMDGPU/vgpr-descriptor-waterfall-loop-idom-update.ll
Yes.
This causes the following tests to fail due to insertion of dummy blocks:
I think you just need to update the test checks. It is expected there will be changes in generated code.
llvm/test/CodeGen/AMDGPU/agpr-copy-no-free-registers.ll | ||
---|---|---|
657 | Why there is no DummyReturnBlock for GFX908? |
I haven't looked closely but some of these test changes look worse
llvm/lib/Target/AMDGPU/AMDGPUUnifyDivergentExitNodes.cpp | ||
---|---|---|
192 | What about switches? |
We need this change to correctly structurize the function that ends with an infinite loop. Most test changes are minor except unstructured-cfg-def-use-issue.ll. This is mainly because it is compiled incorrectly before. We have to structurize a function if there is a divergent branch.
I don't think just checking for switch terminator would work. My suggestion is don't handle switch here. The pass has never been designed to work with switch terminator, and it needs non-trivial work to support switch terminator in this pass. As we already lower switch terminator before this pass, I think it is not important to support switch terminator in this pass now.
llvm/lib/Target/AMDGPU/AMDGPUUnifyDivergentExitNodes.cpp | ||
---|---|---|
109 | We should have a required LowerSwitchID too |
llvm/lib/Target/AMDGPU/AMDGPUUnifyDivergentExitNodes.cpp | ||
---|---|---|
109 | I think for function pass dependency or pass ordering, I still prefer they are managed by compiler developer. If I remember correctly, the new pass manager does not support dependency between function passes? | |
llvm/test/CodeGen/AMDGPU/agpr-copy-no-free-registers.ll | ||
657 | Did you try to get the answer for the question? It sounds strange we get different behavior for gfx908 and gfx90A here. |
llvm/lib/Target/AMDGPU/AMDGPUUnifyDivergentExitNodes.cpp | ||
---|---|---|
109 | The important part is verification. We shouldn't have arbitrary pass contracts not enforced by a verifier |
llvm/lib/Target/AMDGPU/AMDGPUUnifyDivergentExitNodes.cpp | ||
---|---|---|
109 | I agree that pass contracts or assumption should be enforced by verification. For this specific issue, can we verify within this pass that a terminator should not be SwitchInst? |
llvm/test/CodeGen/AMDGPU/agpr-copy-no-free-registers.ll | ||
---|---|---|
657 | In gfx908, the block is eliminated much later in the pipeline which does not happen in gfx90a. |
I think it is good to go. Thanks!
llvm/test/CodeGen/AMDGPU/agpr-copy-no-free-registers.ll | ||
---|---|---|
657 | Thanks for taking a second look! |
llvm/lib/Target/AMDGPU/AMDGPUUnifyDivergentExitNodes.cpp | ||
---|---|---|
109 | That's what I was asking for for switch handling (should also worry about indirectbr, caller and invoke) |
We should have a required LowerSwitchID too