This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Unify divergent nodes if the PostDom tree has one root
ClosedPublic

Authored by gandhi21299 on Dec 10 2022, 5:33 PM.

Details

Summary

This patch allows AMDGPUUnifyDivergenceExitNodes pass
to transform a function whose PDT has exactly one root
and ends in a branch instruction. Fixes
https://github.com/llvm/llvm-project/issues/58861.

Diff Detail

Event Timeline

gandhi21299 created this revision.Dec 10 2022, 5:33 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 10 2022, 5:33 PM
gandhi21299 requested review of this revision.Dec 10 2022, 5:33 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 10 2022, 5:33 PM
gandhi21299 added a project: Restricted Project.

refactored patch

Thanks for working on this, but I think it is more reasonable to fix the issue early. The function is showing a problem in our cfg lowering passes for AMDGPU. The cfg lowering are being done by several passes. I know it is very tricky and fragile now, so it is not easy to determine what the right fix should be. For this specific case, the input IR does not have a return block, thus bypass the structurizeCFG pass. The AMDGPUUnifyDivergentExits happened before StructurizeCFG already has some ability to handle infinite loops, but this case was skipped because of the check if (PDT.root_size() <= 1). In this case, BB3 is a root in PDT but it is not a true exit. I would suggest we change this. If there is only one root in PostDominatorTree, please check whether it ends with return/unreachable instruction before we can skip the transformation. If the single root in PDT ends with a branch instruction, we should continue the transformation to insert a dummy return block and insert a control flow edge that is never taken with trick like br i1 true, ..., DummyReturn. By doing this, we should be able to lower the cfg correctly.

gandhi21299 added a comment.EditedDec 12 2022, 9:19 PM

@ruiling PDT.getRoot() yields BB4. So for PDT with exactly one root, we should only allow the transformation when its terminator is a branch instruction? This causes the following tests to fail due to insertion of dummy blocks:

LLVM :: CodeGen/AMDGPU/agpr-copy-no-free-registers.ll

LLVM :: CodeGen/AMDGPU/branch-relaxation.ll
LLVM :: CodeGen/AMDGPU/cf-loop-on-constant.ll
LLVM :: CodeGen/AMDGPU/control-flow-optnone.ll
LLVM :: CodeGen/AMDGPU/infinite-loop.ll
LLVM :: CodeGen/AMDGPU/kill-infinite-loop.ll
LLVM :: CodeGen/AMDGPU/loop-live-out-copy-undef-subrange.ll
LLVM :: CodeGen/AMDGPU/optimize-negated-cond.ll
LLVM :: CodeGen/AMDGPU/unstructured-cfg-def-use-issue.ll
LLVM :: CodeGen/AMDGPU/vgpr-descriptor-waterfall-loop-idom-update.ll

@ruiling PDT.getRoot() yields BB4. So for PDT with exactly one root, we should only allow the transformation when its terminator is a branch instruction?

Yes.

This causes the following tests to fail due to insertion of dummy blocks:

I think you just need to update the test checks. It is expected there will be changes in generated code.

corrected solution as suggested by reviewer, updated tests accordingly

gandhi21299 retitled this revision from [AMDGPU] Annotate control flow on visited blocks to [AMDGPU] Unify divergent nodes if the PostDom tree has one root.Dec 13 2022, 11:14 PM
gandhi21299 edited the summary of this revision. (Show Details)

corrected failing test different-addrspace-crash.ll

ruiling added a comment.EditedDec 14 2022, 11:07 PM
This comment has been deleted.
llvm/test/CodeGen/AMDGPU/agpr-copy-no-free-registers.ll
657

Why there is no DummyReturnBlock for GFX908?

I haven't looked closely but some of these test changes look worse

llvm/lib/Target/AMDGPU/AMDGPUUnifyDivergentExitNodes.cpp
192

What about switches?

I haven't looked closely but some of these test changes look worse

We need this change to correctly structurize the function that ends with an infinite loop. Most test changes are minor except unstructured-cfg-def-use-issue.ll. This is mainly because it is compiled incorrectly before. We have to structurize a function if there is a divergent branch.

  • inserted a check for switch terminator, corrected tests accordingly
gandhi21299 marked an inline comment as done.Jan 2 2023, 4:41 PM

ran update_test_checks

  • inserted a check for switch terminator, ...

I don't think just checking for switch terminator would work. My suggestion is don't handle switch here. The pass has never been designed to work with switch terminator, and it needs non-trivial work to support switch terminator in this pass. As we already lower switch terminator before this pass, I think it is not important to support switch terminator in this pass now.

Sure, I will revert to the previous revision then.

arsenm accepted this revision.Jan 3 2023, 9:34 AM
arsenm added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUUnifyDivergentExitNodes.cpp
109

We should have a required LowerSwitchID too

This revision is now accepted and ready to land.Jan 3 2023, 9:34 AM
  • Rebased and removed switch inst check

Thanks Matt for the approval! How does the patch look @ruiling ?

llvm/lib/Target/AMDGPU/AMDGPUUnifyDivergentExitNodes.cpp
109

I will have a seperate patch for that, it seems to be causing difficulties when the pass manager schedules UnifyDivergentExitNodes.

ruiling added inline comments.Jan 3 2023, 9:02 PM
llvm/lib/Target/AMDGPU/AMDGPUUnifyDivergentExitNodes.cpp
109

I think for function pass dependency or pass ordering, I still prefer they are managed by compiler developer. If I remember correctly, the new pass manager does not support dependency between function passes?

llvm/test/CodeGen/AMDGPU/agpr-copy-no-free-registers.ll
657

Did you try to get the answer for the question? It sounds strange we get different behavior for gfx908 and gfx90A here.

arsenm added inline comments.Jan 3 2023, 9:07 PM
llvm/lib/Target/AMDGPU/AMDGPUUnifyDivergentExitNodes.cpp
109

The important part is verification. We shouldn't have arbitrary pass contracts not enforced by a verifier

ruiling added inline comments.Jan 3 2023, 10:07 PM
llvm/lib/Target/AMDGPU/AMDGPUUnifyDivergentExitNodes.cpp
109

I agree that pass contracts or assumption should be enforced by verification. For this specific issue, can we verify within this pass that a terminator should not be SwitchInst?

gandhi21299 added inline comments.Jan 3 2023, 10:16 PM
llvm/test/CodeGen/AMDGPU/agpr-copy-no-free-registers.ll
657

In gfx908, the block is eliminated much later in the pipeline which does not happen in gfx90a.

ruiling accepted this revision.Jan 3 2023, 11:16 PM

I think it is good to go. Thanks!

llvm/test/CodeGen/AMDGPU/agpr-copy-no-free-registers.ll
657

Thanks for taking a second look!

arsenm accepted this revision.Jan 4 2023, 7:19 AM
arsenm added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUUnifyDivergentExitNodes.cpp
109

That's what I was asking for for switch handling (should also worry about indirectbr, caller and invoke)

This revision was landed with ongoing or failed builds.Jan 4 2023, 9:45 AM
This revision was automatically updated to reflect the committed changes.