This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Unify unreachable intrinsics
ClosedPublic

Authored by yaxunl on Aug 4 2022, 10:18 AM.

Details

Summary

si-annotate-control-flow does depth first traversal of BB's of
a function to insert amdgcn if intrinsics for conditional
branches so that isel can generate correct instructions later.

si-annotate-control-flow checks whether the successor BB for the 'else'
branch of a conditional branch has been visited. If it has been
visited, si-annotate-control-flow assumes the conditional
branch has been handled and will not try to insert if intrinsic
for it.

This assumption is not correct when the IR contains multiple
unreachable BB's. Then if intrinscs are not inserted and incorrect
ISA are generated.

This patch fixes the issue by let amdgpu-unify-divergent-exit-nodes
unify unreachables even if they are uniformly reached. In this say
the IR will not contain multiple exits, and structurizer is able to
structurize the IR containing one unified exit.

Diff Detail

Event Timeline

yaxunl created this revision.Aug 4 2022, 10:18 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 4 2022, 10:18 AM
yaxunl requested review of this revision.Aug 4 2022, 10:18 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 4 2022, 10:18 AM
Herald added a subscriber: wdng. · View Herald Transcript

Is this to fix the same issue as D131150

Is this to fix the same issue as D131150

No. They are fixing different issues. I tried D131150. It does not add the missing if intrinsic needed by this issue.

I don't think this is the right fix for the failure test. The attached test uncover some issues in passes running before annotate-cf. The test escaped structurization because there are multiple function exit blocks. This is because unify-function-exit did not unify UnreachableBlock that is uniformly reachable from entry(here: %cond.false). Then structurizecfg simply skip the whole function which is wrong. Two possible directions I can see are: a.) make the unify-divergent-exit also unify UnreachableBlock even it is uniformly reachable, this sounds more like a quick fix. b.) teach StructurizeCFG also work on function with multiple (more specifically uniformly reachable) exits.

yaxunl added a comment.Aug 5 2022, 6:09 AM

I don't think this is the right fix for the failure test. The attached test uncover some issues in passes running before annotate-cf. The test escaped structurization because there are multiple function exit blocks. This is because unify-function-exit did not unify UnreachableBlock that is uniformly reachable from entry(here: %cond.false). Then structurizecfg simply skip the whole function which is wrong. Two possible directions I can see are: a.) make the unify-divergent-exit also unify UnreachableBlock even it is uniformly reachable, this sounds more like a quick fix. b.) teach StructurizeCFG also work on function with multiple (more specifically uniformly reachable) exits.

I doubt that would help. In the lit test si-annotate-cf-else-visited.ll, the issue before this fix is that the if.else BB misses if intrinsic. This is because the depth-first BB traversal follows the order entry -> if.then -> if.end6.sink.split -> if.end6 -> cond.false ->if.else. When it visits if.else, since if.end6 has been visited, it does not insert if intrinsic. Unifying cond.false and cond.false.i8 does not change the traversal order between if.end6 and if.else. The issue is still there.

I doubt that would help.

You can have a try to remove isUniformlyReached() check for UnreachableBlock (i.e. remove the line https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AMDGPU/AMDGPUUnifyDivergentExitNodes.cpp#L214) to see what would happen.

In the lit test si-annotate-cf-else-visited.ll, the issue before this fix is that the if.else BB misses if intrinsic.

We also need a matching endif intrinsic in the post-dominator of if.else to make the threads diverged at if.else reconverged by kind of restoring execution mask. And the control flow edge (if.then -> if.end6.sink.split) is breaking the structure requirement for divergent control flow, which should be rewired by structurizeCFG.

yaxunl added a comment.Aug 8 2022, 9:18 AM

I doubt that would help.

You can have a try to remove isUniformlyReached() check for UnreachableBlock (i.e. remove the line https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AMDGPU/AMDGPUUnifyDivergentExitNodes.cpp#L214) to see what would happen.

Removing isUniformlyReached() fixes the issue. Thanks. However, I got a few regressions:

LLVM :: CodeGen/AMDGPU/GlobalISel/bool-legalization.ll
LLVM :: CodeGen/AMDGPU/si-annotate-control-flow-condition-common-blocks.ll
LLVM :: CodeGen/AMDGPU/skip-if-dead.ll
LLVM :: CodeGen/AMDGPU/switch-default-block-unreachable.ll

I am checking whether these are real issues or need an update.

yaxunl updated this revision to Diff 450935.Aug 8 2022, 2:05 PM
yaxunl retitled this revision from [AMDGPU] Fix si-annotate-control-flow for visited else branch to [AMDGPU] Unify unreachable intrinsics.
yaxunl edited the summary of this revision. (Show Details)
yaxunl added a reviewer: ruiling.

Revised by Ruiling's comments.

Thanks for working on the change! Please check the inline comment to see whether it make sense to you. I want to make it clear this is a short-term solution to workaround the limitation of structurizer, I have been working on re-writing most of the stuff in StructurizeCFG, especially the re-wire process, hope I can solve this issue in the new version. I will revert this change if the case is correctly handled.

llvm/lib/Target/AMDGPU/AMDGPUUnifyDivergentExitNodes.cpp
214–219

Please add some TODO comment to make sure this will not be missed: Actually we don't need to unify UnreachableBlocks that are uniformly reachable, we are unifying them for now to workaround the limitation of structurizer, which could not handle multiple function exits.

llvm/test/CodeGen/AMDGPU/si-unify-exit-multiple-unreachables.ll
10

Maybe something like: This used to bypass the structurization process because structurizer is unable to handle multiple-exits CFG, this should be correctly structurized.

llvm/test/CodeGen/AMDGPU/switch-default-block-unreachable.ll
3

no need to add extra check lines. you can just update the test.

yaxunl marked 3 inline comments as done.Aug 8 2022, 7:25 PM

I doubt that would help.

You can have a try to remove isUniformlyReached() check for UnreachableBlock (i.e. remove the line https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AMDGPU/AMDGPUUnifyDivergentExitNodes.cpp#L214) to see what would happen.

Removing isUniformlyReached() fixes the issue. Thanks. However, I got a few regressions:

LLVM :: CodeGen/AMDGPU/GlobalISel/bool-legalization.ll
LLVM :: CodeGen/AMDGPU/si-annotate-control-flow-condition-common-blocks.ll
LLVM :: CodeGen/AMDGPU/skip-if-dead.ll
LLVM :: CodeGen/AMDGPU/switch-default-block-unreachable.ll

I am checking whether these are real issues or need an update.

llvm/lib/Target/AMDGPU/AMDGPUUnifyDivergentExitNodes.cpp
214–219

will do

214–219

will do

llvm/test/CodeGen/AMDGPU/si-unify-exit-multiple-unreachables.ll
10

will do

llvm/test/CodeGen/AMDGPU/switch-default-block-unreachable.ll
3

will remove

yaxunl updated this revision to Diff 451028.Aug 8 2022, 7:27 PM
yaxunl marked 3 inline comments as done.

revised by Ruiling's comments

ruiling accepted this revision.Aug 8 2022, 7:32 PM

LGTM

This revision is now accepted and ready to land.Aug 8 2022, 7:32 PM
This revision was landed with ongoing or failed builds.Aug 9 2022, 7:24 AM
This revision was automatically updated to reflect the committed changes.