This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU][SIAnnotateCF] Support Conditions Using a Common Basic Block
AbandonedPublic

Authored by Pierre-vh on Aug 4 2022, 2:30 AM.

Details

Reviewers
arsenm
ruiling
Summary

Allows popping the same basic block multiple times off the stack.
Such scenarios can happen if two conditional branches use the same destination
basic block, e.g., the "false" basic block is the same.

Diff Detail

Event Timeline

Pierre-vh created this revision.Aug 4 2022, 2:30 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 4 2022, 2:30 AM
Pierre-vh requested review of this revision.Aug 4 2022, 2:30 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 4 2022, 2:30 AM
arsenm added inline comments.Aug 4 2022, 6:56 AM
llvm/lib/Target/AMDGPU/SIAnnotateControlFlow.cpp
354

Could you construct a situation where the block appears more times, and not consecutively? It may be better to change the stack to a SetVector

llvm/test/CodeGen/AMDGPU/si-annotate-control-flow-condition-common-blocks.ll
4

Add a brief comment explaining what this tests. Also, can you add an llc run line? The control flow pass pipeline is a bit fragile and subject to change

arsenm added inline comments.Aug 4 2022, 7:12 AM
llvm/lib/Target/AMDGPU/SIAnnotateControlFlow.cpp
354

The answer is probably no since this is supposed to be structurized already

Pierre-vh added inline comments.Aug 5 2022, 12:58 AM
llvm/lib/Target/AMDGPU/SIAnnotateControlFlow.cpp
354

I managed to create a similarly problematic situation by changing the function below so the second branch uses the common basic block for the "true" instead of the false. We end up with BB22 buried under BB21 and it cant pop it off the stack because of that:
Not sure how we can approach this? I'll look into it but any tips are appreciated

Pierre-vh added inline comments.Aug 5 2022, 2:10 AM
llvm/lib/Target/AMDGPU/SIAnnotateControlFlow.cpp
354

To be more specific, I wrote two additional tests:

  • One has the first branch use the common block for the True, and the second branch uses it for the False
  • The second is the opposite.

The first test is completely fine, but the second fails because bb22 gets buried under bb21 as the latter gets visited before it. bb22 gets trapped on the stack, never popped, and the pass fails.

Pierre-vh updated this revision to Diff 450767.Aug 8 2022, 5:06 AM

I added a special case to "bury" BBs under the last stack entry if the "then" is top of the stack. I feel like it's cheating a bit but I'm not sure how else we can deal with this.
All tests pass now with this change at least.

ruiling added a subscriber: ruiling.Aug 8 2022, 8:13 AM

Hi @Pierre-vh Thanks for working on this. I have to say that the si-annotate-control-flow was not designed to run separately. The control flow lowering process for AMDGPU was completed by several llvm passes running one by one. Also as @arsenm said, the control flow lowering process might change in the future, so a llc test is more helpful.

For this specific issue, I think this might share the same root-cause with D131181. I have leaved some comments there. My point is the si-annotate-control-flow assume the CFG of the input function should already being structurized, but this case happened to skip the structurization process because the unify-function-divergent-exit tries to optimize for uniformly reachable Unreachable block. I am not sure whether we have any workload depend on such optimization. Changing the behavior of unify-function-divergent-exit might regress quality of generated code in some cases. But teaching the existing structurizer to work on such case might need some non-trivial effort. Any comment? @arsenm

Hi @Pierre-vh Thanks for working on this. I have to say that the si-annotate-control-flow was not designed to run separately. The control flow lowering process for AMDGPU was completed by several llvm passes running one by one. Also as @arsenm said, the control flow lowering process might change in the future, so a llc test is more helpful.

For this specific issue, I think this might share the same root-cause with D131181. I have leaved some comments there. My point is the si-annotate-control-flow assume the CFG of the input function should already being structurized, but this case happened to skip the structurization process because the unify-function-divergent-exit tries to optimize for uniformly reachable Unreachable block. I am not sure whether we have any workload depend on such optimization. Changing the behavior of unify-function-divergent-exit might regress quality of generated code in some cases. But teaching the existing structurizer to work on such case might need some non-trivial effort. Any comment? @arsenm

I can remove the opt line then, that's okay. The test is fine either way (fails before the patch, passes after)

If I understand correctly, you think this patch might make codegen worse in some cases? I am not sure what you mean by unify-function-divergent-exit
Does D131181 make this patch obsolete?

Hi @Pierre-vh, If you confirm D131181 help solve your problem, I think may be you can just drop the change.

Pierre-vh abandoned this revision.Aug 12 2022, 1:56 AM

resolved by D131181