This is an archive of the discontinued LLVM Phabricator instance.

[SimplifyCFG] Transform for redirecting phis between unmergeable BB and SuccBB
ClosedPublic

Authored by XChy on Jul 21 2023, 4:10 AM.

Details

Summary

This patch extends function TryToSimplifyUncondBranchFromEmptyBlock to handle the similar cases below.

define i8 @src(i8 noundef %arg) {
start:
  switch i8 %arg, label %unreachable [
    i8 0, label %case012
    i8 1, label %case1
    i8 2, label %case2
    i8 3, label %end
  ]

unreachable:
  unreachable

case1:
  br label %case012

case2:
  br label %case012

case012:
  %phi1 = phi i8 [ 3, %case2 ], [ 2, %case1 ], [ 1, %start ]
  br label %end

end:
  %phi2 = phi i8 [ %phi1, %case012 ], [ 4, %start ]
  ret i8 %phi2
}

The phis here should be merged into one phi, so that we can better optimize it:

define i8 @tgt(i8 noundef %arg) {
start:
  switch i8 %arg, label %unreachable [
    i8 0, label %end
    i8 1, label %case1
    i8 2, label %case2
    i8 3, label %case3
  ]

unreachable:
  unreachable

case1:
  br label %end

case2:
  br label %end

case3:
  br label %end

end:
  %phi = phi i8 [ 4, %case3 ], [ 3, %case2 ], [ 2, %case1 ], [ 1, %start ]
  ret i8 %phi
}

Proof:
normal
multiple stages
multiple stages 2
multiple phi combinations

And lookup table optimization should convert it into add %arg 1.
This patch just match similar CFG structure and merge the phis in different cases.

Maybe such transform can be applied to other situations besides switch,
but I'm not sure whether it's better than not merging. Therefore, I only try it in switch,

Related issue:
Switch not fully optimized #63876

Diff Detail

Event Timeline

XChy created this revision.Jul 21 2023, 4:10 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 21 2023, 4:10 AM
XChy requested review of this revision.Jul 21 2023, 4:10 AM
XChy updated this revision to Diff 542860.Jul 21 2023, 4:19 AM
XChy updated this revision to Diff 542908.Jul 21 2023, 7:24 AM

Avoid infinite loops

XChy updated this revision to Diff 542912.Jul 21 2023, 7:39 AM
  • Preserve domtree
XChy updated this revision to Diff 543242.Jul 22 2023, 7:39 PM

Fix the bugs of merging the phis from the BasicBlock out of switch.

nikic added a comment.Jul 23 2023, 1:22 PM

Just some nits for now, I need to go over the logic carefully another time.

llvm/lib/Transforms/Utils/SimplifyCFG.cpp
284 ↗(On Diff #543242)

nit: wrong & formatting

6750 ↗(On Diff #543242)

drop "the"

6771 ↗(On Diff #543242)

Missing period at end of sentence.

6776 ↗(On Diff #543242)

Omit braces

6778 ↗(On Diff #543242)

Using successors() instead of cases() would be more direct here?

6791 ↗(On Diff #543242)

Move this logic into a separate function, so you can use early return instead of IsMatched.

6825 ↗(On Diff #543242)

Just because something has a single successor, does not mean it's a plain branch.

6832 ↗(On Diff #543242)

CasePhi->use_empty()?

XChy updated this revision to Diff 543392.Jul 23 2023, 11:28 PM
XChy marked 7 inline comments as done.
XChy set the repository for this revision to rG LLVM Github Monorepo.
  • Fix all the nits.
  • Extract the logic of canMergePhisOfCase
XChy marked an inline comment as done.Jul 23 2023, 11:30 PM
nikic added a comment.Jul 24 2023, 7:30 AM

Okay, I think the logic here is mostly correct, though I'm not sure whether the overall framing for the transform is the right one.

I think the core of the transform is the following:

  1. We have a block A containing only phi nodes.
  2. With an unconditional branch to a block B that uses the phi nodes from A only in phi nodes.
  3. For edges PA -> A, PB -> B we have two cases:

3a. PA != PB. This is the straightforward case, where we can just redirect PA -> B without causing a phi conflict.
3b. PA == PB. This is the interesting case. We keep both edges here, and keep the phi node in A. If there is just one such conflict, then the phi node in A because a single-element phi, and we can remove it entirely. Otherwise, we should have to keep the phi. In that case the transform is probably(?) not worthwhile, so we should limit to the case where A and B have a single common predecessor.

This makes the root of the transform the "uncond branch with only phis" block, and has no direct relation to switch instructions. This would be pretty close to TryToSimplifyUncondBranchFromEmptyBlock(), just for the case where we have a conflict on one predecessor, and as such cannot actually remove the block completely, but can rewrite in order to remove the phi nodes.

Does this framing make sense to you?

llvm/lib/Transforms/Utils/SimplifyCFG.cpp
6752 ↗(On Diff #543392)

SmallPtrSetImpl for arguments

6767–6773 ↗(On Diff #543392)
6780 ↗(On Diff #543392)

Omit braces

6784 ↗(On Diff #543392)

Unless I'm missing something, nothing here prevents CaseBB from dominating SuccBB, in which case there might be uses of phis from CaseBB outside of phis in SuccBB. Is the transform valid in that case?

6837 ↗(On Diff #543392)

2 *

XChy marked 5 inline comments as done.Jul 24 2023, 10:35 AM

Thanks for your review! I understand your framing and it clear my mind. This framing is much general, not just for switch. And as you analyse, actually, a large part of the logic of this patch is copied from TryToSimplifyUncondBranchFromEmptyBlock, specified for switch. I restrict this fold in a narrow situation. Though this transformation is not directly related to switch, but I didn't apply such mergence to all BasicBlocks, since it's not always worthwhile as you say.

In switch, there are often empty case successors with only phis that can be merged into CommonDest, and such phis mergence

  • Can be much more friendly to LookupTable optimization or something else.
  • May prevent multiple calls to getCaseResults, I think.
  • Bring better IR.
  • Can serve as a canonicalization of switch.

But for other situation, I'm not sure if this mergence is always a good transform. I have concerns here:

  • Too much inefficient mergeablity checks.
  • Such mergence may not help with other transforms (Quite a few transforms are based on two branches, but not multiple branches)

Maybe it's possible to refactor TryToSimplifyUncondBranchFromEmptyBlock to implement the same thing, but it's too large and too hard for me.
I'll highly appreciate it if you have other suggestions or some knowledge about the stuff above.

llvm/lib/Transforms/Utils/SimplifyCFG.cpp
6784 ↗(On Diff #543392)

CaseBB doesn't dominate SuccBB here. Because SuccBB is a case successor of switch, SuccBB has at least two predecessors (CaseBB, Switch Block). And what dominates SuccBB should be Switch Block or other predecessors, but not CaseBB.
CaseBBs.contains(SuccBB) guarantees it.

XChy updated this revision to Diff 543634.Jul 24 2023, 10:37 AM
XChy marked an inline comment as done.
  • Fix the nits
  • Clear the logic
XChy added a comment.Jul 25 2023, 12:14 AM

Here is an optimized example related to the lookup table optimization in the issue in summary:
add 1

You may need to run git clang-format HEAD~1.
The jump-table-islands.ll test case looks like it needs updating.

First time reviewer, please let me know if anything is wrong.

llvm/lib/Transforms/Utils/SimplifyCFG.cpp
6206 ↗(On Diff #543634)

Unnecessary debugging logs?

6758 ↗(On Diff #543634)

SuccBB looks like it could be a parameter?

6775 ↗(On Diff #543634)

Performing this check first may reduce unnecessary CasePredBBs updates?

6781 ↗(On Diff #543634)

Does it make sense to use a Changed variable to update multiple Cases at once?

DianQK added inline comments.Jul 25 2023, 6:53 AM
llvm/lib/Transforms/Utils/SimplifyCFG.cpp
6762 ↗(On Diff #543634)

Can these checks be used to create a small enough CaseBBs at MergePhisInSwitch?

XChy updated this revision to Diff 543964.Jul 25 2023, 7:30 AM
XChy marked 4 inline comments as done.
XChy set the repository for this revision to rG LLVM Github Monorepo.
  • Fix the nits
  • Change the order of checks to increase efficiency.

jump-table-islands.ll belongs to ARM's CodeGen test and it seems to be handwritten. Could anyone familiar with it suggest me how to update it?

XChy added inline comments.Jul 25 2023, 7:32 AM
llvm/lib/Transforms/Utils/SimplifyCFG.cpp
6758 ↗(On Diff #543634)

Sure.

6762 ↗(On Diff #543634)

I don't think it's necessary to create a small enough CaseBBs here. Instead, CaseBBs should include the successors of all cases, in order to guarantee correct checks like All predecessors of CaseBB is also the successor of one case. SuccBB is the successor of one case.

6775 ↗(On Diff #543634)

Thanks. That's more efficient

6781 ↗(On Diff #543634)

It may be much more complex to update them at once. There may be some diamond-like CFG or loops in Cases. Updating at once would bring some conflicts here.

XChy updated this revision to Diff 544253.Jul 26 2023, 1:28 AM

Omit jump-table-islands.ll test temporarily. (Sorry, I'm not familiar with ARM and unable to update it)
Contact related committers to update if this patch is approved.

nikic added inline comments.Jul 27 2023, 5:25 AM
llvm/test/CodeGen/ARM/jump-table-islands.ll
1

You can probably avoid the test change using -arm-atomic-cfg-tidy=0.

nikic added a comment.EditedJul 27 2023, 5:42 AM

Such mergence may not help with other transforms (Quite a few transforms are based on two branches, but not multiple branches)

Do you have any particular issues this could cause in mind?

I think even if we don't always perform the transform, we should still express it generically, and then have a profitability heuristic on top: E.g. say that the one common predecessor must have a switch terminator. That way we can clearly separate correctness from profitability requirements.

Edit: Just for reference, this is an example of the non-switch CFG I have in mind where such a transform should be possible: https://gist.github.com/nikic/8f22ef4e366fd3041c7e150c228f9673

XChy added a comment.Jul 27 2023, 8:00 AM

Do you have any particular issues this could cause in mind?

Sure, something like FoldTwoEntryPHINode/HoistThenElseToIf may miss/delay. Or some Phi to Select may be delayed/sunk. I haven't constructed counter-examples here. (Maybe it's my intuitive fault, I actually constructed some examples optimized well).
I think the probitability can be tested by the existing testcases here. I'll try to generalize TryToSimplifyUncondBranchFromEmptyBlock to perform the tests.

I think even if we don't always perform the transform, we should still express it generically, and then have a profitability heuristic on top: E.g. say that the one common predecessor must have a switch terminator. That way we can clearly separate correctness from profitability requirements.

I agree with you. The generalization is indeed necessary. It may take some time to complete it. At that time can we determine what profitability heuristic shoud be.

Edit: Just for reference, this is an example of the non-switch CFG I have in mind where such a transform should be possible: https://gist.github.com/nikic/8f22ef4e366fd3041c7e150c228f9673

Thanks! That could serve as a testcase to be expanded/reduced.

XChy updated this revision to Diff 544785.Jul 27 2023, 8:30 AM
XChy marked an inline comment as done.

Add -arm-atomic-cfg-tidy=0 option.

XChy updated this revision to Diff 545192.Jul 28 2023, 9:26 AM
  • Refactor TryToSimplifyUncondBranchFromEmptyBlock to merge phis between BB and Succ with a common predecessor. Logic correctly embedded (OK in Local tests), but not efficient currently.
  • A regression found, in switch-simplify-crash2.ll. But InstCombine would solve it.
  • Testcases not added now, incoming. And ARM/jump-islands-table.ll doesn't test well still.
llvm/test/CodeGen/ARM/jump-table-islands.ll
1

Fail still

XChy updated this revision to Diff 545346.Jul 29 2023, 2:05 AM
  • Add complete positive tests
  • Fix bugs about preserving domtree

Enhance further possible: If BB and Succ share more than one predecessor and combine some phis meanwhile, we can also simplify CFG.
However, we are not able to erase the phi in BB completely here. Does it make sense still?
example

XChy updated this revision to Diff 545365.Jul 29 2023, 6:20 AM

Update codesize-loop.ll

XChy updated this revision to Diff 545418.Jul 30 2023, 3:53 AM

[Clang-format]
It seems perform correctly and well in most cases.

XChy updated this revision to Diff 545568.Jul 31 2023, 3:06 AM

Update AArch64's codegen tests.

XChy retitled this revision from [SimplifyCFG] Transform for merging the combination of phis in switch to [SimplifyCFG] Transform for redirecting phis between unmergeable BB and SuccBB.Aug 12 2023, 11:59 PM
XChy edited the summary of this revision. (Show Details)

ping.

XChy updated this revision to Diff 555262.Aug 31 2023, 10:42 PM

Remove unnecessary helper function

DianQK requested changes to this revision.Aug 31 2023, 11:45 PM

Too much irrelevant formatting. It causes review to become complicated. It also affects other people blame related changes.
Could you ignore this formatting?

In fact, I'm not sure why. It could be that some commits are not formatted, or it could be a change in formatting rules. Or something else.
My formatting workflow is:

  • Ignore all irrelevant formatting if present.
  • Add and commit.
  • git clang-format HEAD~1
  • Since the latest submission is my own. Then I just run git add . and git clang-format HEAD~1.

And why didi stlxrb w9, w1, [x2] became stlxrb wzr, w1, [x2]?

This revision now requires changes to proceed.Aug 31 2023, 11:45 PM
XChy updated this revision to Diff 555409.Sep 1 2023, 9:11 AM

Drop unnecessary formatting.

Allen added a subscriber: Allen.Sep 1 2023, 9:32 AM
XChy marked an inline comment as done.Sep 1 2023, 9:48 AM

Too much irrelevant formatting. It causes review to become complicated. It also affects other people blame related changes.
Could you ignore this formatting?

In fact, I'm not sure why. It could be that some commits are not formatted, or it could be a change in formatting rules. Or something else.
My formatting workflow is:

  • Ignore all irrelevant formatting if present.
  • Add and commit.
  • git clang-format HEAD~1
  • Since the latest submission is my own. Then I just run git add . and git clang-format HEAD~1.

Sorry for bringing irrelevant formatting, which may be caused by my incautious global formatting.
You workflow advice is highly appreicated.

And why didi stlxrb w9, w1, [x2] became stlxrb wzr, w1, [x2]?

The cause is that we remove dead PHI nodes before elimination of mostly empty blocks. Actually the up-to-date llvm-repo has implemented it. You can see details in D158503.
I would make this patch up-to-date later.

XChy updated this revision to Diff 555430.Sep 1 2023, 10:21 AM
DianQK added a comment.Sep 3 2023, 7:47 PM

I read the history of jump-table-islands.ll.
This test checks for oversized jump tables jumping over each other.
You transform changes from

end:                                              ; preds = %complex, %simple
  %val = phi i8500 [ %l, %complex ], [ -1, %simple ]
  br label %common.ret

to

end:                                              ; preds = %complex
  br label %common.ret

The test was to check disappears.
Increasing BigInt makes no sense.
Sorry I don't know how to create a friendly diff display.
You could remove -arm-atomic-cfg-tidy=0 and change the end bb to:

end:
  %val = phi %BigInt [ %l, %complex ], [ -1, %simple ]
  %val2 = add %BigInt %val, 1
  ret %BigInt %val2
}

https://reviews.llvm.org/differential/diff/555646/

I feel the SKIP_TABLE name is confusing and should be changed to CONTINUE_TABLE/ISLAND_JUMP?

XChy added a comment.Sep 4 2023, 12:22 AM

I read the history of jump-table-islands.ll.
This test checks for oversized jump tables jumping over each other.
You transform changes from

end:                                              ; preds = %complex, %simple
  %val = phi i8500 [ %l, %complex ], [ -1, %simple ]
  br label %common.ret

to

end:                                              ; preds = %complex
  br label %common.ret

The test was to check disappears.
Increasing BigInt makes no sense.
Sorry I don't know how to create a friendly diff display.
You could remove -arm-atomic-cfg-tidy=0 and change the end bb to:

end:
  %val = phi %BigInt [ %l, %complex ], [ -1, %simple ]
  %val2 = add %BigInt %val, 1
  ret %BigInt %val2
}

https://reviews.llvm.org/differential/diff/555646/

I feel the SKIP_TABLE name is confusing and should be changed to CONTINUE_TABLE/ISLAND_JUMP?

Thanks for your information and suggestions. It's OK to apply IR below to avoid related transform:

declare void @use(%BigInt)
......
end:
  %val = phi %BigInt [ %l, %complex ], [ -1, %simple ]
  call void @use(%BigInt %val)
  ret %BigInt %val

If possible, however, it would be better if there is any other available option that does not modify the original IR.

XChy updated this revision to Diff 555666.Sep 4 2023, 12:26 AM

Insert use function into jump-table-islands.ll to avoid unrelated transform.

DianQK accepted this revision.Sep 8 2023, 7:46 AM

LGTM. But before landing, you'd better request @nikic approval for this.

If possible, however, it would be better if there is any other available option that does not modify the original IR.

I don't know. The purpose of your transformation is to tweak this test case.

llvm/lib/Transforms/Utils/Local.cpp
886

Irrelevant formatting.

1004

Should clang-tidy recommend lowercase beginnings?

1148

Irrelevant formatting.

1152

Again.

llvm/test/CodeGen/ARM/jump-table-islands.ll
39

“The purpose is to prevent SimplifyCFG from simplifying the phi node above.”
Would this description be better?

This revision is now accepted and ready to land.Sep 8 2023, 7:46 AM
XChy added a comment.Sep 19 2023, 4:15 AM

LGTM. But before landing, you'd better request @nikic approval for this.

If possible, however, it would be better if there is any other available option that does not modify the original IR.

I don't know. The purpose of your transformation is to tweak this test case.

Sure, thanks for your review!

XChy updated this revision to Diff 557030.Sep 19 2023, 4:21 AM
XChy marked 4 inline comments as done.

Reduce irrelevant formatting

nikic added inline comments.Sep 19 2023, 8:13 AM
llvm/lib/Transforms/Utils/Local.cpp
1124–1135

This sentence did not parse to me. Maybe something like this?

1128

Is it intentional that if BBKillable is true we still call CanRedirectPredsOfEmptyBBToSucc? Should this be !BBKillable &&?

1179

Why is this set no longer needed? Can no predecessors appear multiple times in this code?

1315

use -> U

XChy updated this revision to Diff 557089.Sep 19 2023, 8:35 PM
XChy marked 2 inline comments as done.

Polish comment and fix when to print log.

nikic added inline comments.Sep 19 2023, 10:47 PM
llvm/lib/Transforms/Utils/Local.cpp
1128

In particular, what worries me here is that we set BBKillable = true, then call CanRedirectPredsOfEmptyBBToSucc() which may set CommonPred (even if the function ultimately returns false) and then various code in the following will do things differently based on CommonPred. It's not clear to me whether this will result in misbehavior or not.

(With that in mind, I think it may also be cleaner if the BBPhisMergeable flag is removed and only CommonPred left, so that CommonPred != nullptr is equivalent to BBPhisMergeable and we don't leave dangling CommonPred if CanRedirectPredsOfEmptyBBToSucc exits early.)

XChy added inline comments.Sep 19 2023, 11:55 PM
llvm/lib/Transforms/Utils/Local.cpp
1004

Yes. But I just follow the original style like CanPropagatePredecessorsForPHIs.

1124–1135

Thanks for polishing my comment.

1128

Both BBKillable || redirect() and !BBKillable && redirect() do not call CanRedirectPredsOfEmptyBBToSucc when BBKillable is true.
What is different is , that when BBKillable is true, the former BBPhisMergeable is true while the latter is false.
Actually it's logically OK, since when BBKillable is true, whether BBPhisMergeable doesn't have any impact.
I write BBKillable || redirect() based on semantic meaning : Phis are mergeable if BB can be killed.

1179

BBPreds is a SmallPtrSet, on which we would not iterate the same predecessor multiple times.

nikic added inline comments.Sep 20 2023, 12:12 AM
llvm/lib/Transforms/Utils/Local.cpp
1179

Oh, I missed that this does the updates based on SmallPtrSet now. This is not allowed, because it makes the updates non-deterministic (order matters here). You should replace SmallPtrSet with SmallSetVector to keep order deterministic.

XChy updated this revision to Diff 557098.EditedSep 20 2023, 12:37 AM
XChy marked an inline comment as done.

Keep order when updating dom-tree.

XChy marked an inline comment as done.Sep 20 2023, 12:39 AM
XChy added inline comments.
llvm/lib/Transforms/Utils/Local.cpp
1128

What you're worried about make sense to me. But as far as I know, the assumption "BBKillable || redirect() will call redirect() when BBKillable is true" is wrong。

1179

Sure, restore to original codem though I don't know why order matters here. Because the algorithm for updating dom-tree?

nikic accepted this revision.Sep 24 2023, 7:00 AM

LG

llvm/lib/Transforms/Utils/Local.cpp
1128

Yeah, sorry, I got confused here.

1179

The structure of the dominator tree will be the same regardless of update order, but the order of the children in the tree may be different, which can lead to optimization differences.

XChy marked an inline comment as done.Sep 24 2023, 6:31 PM

Thanks for @DianQK and @nikic! This patch will be submitted as Github PR (It seems that doc
about submitting by Phabricator has been removed).

XChy closed this revision.Sep 24 2023, 7:17 PM