Page MenuHomePhabricator

Please use GitHub pull requests for new patches. Phabricator shutdown timeline

[SimplifyCFG] Transform for redirecting phis between unmergeable BB and SuccBB
AcceptedPublic

Authored by XChy on Jul 21 2023, 4:10 AM.

Details

Summary

This patch extends function TryToSimplifyUncondBranchFromEmptyBlock to handle the similar cases below.

define i8 @src(i8 noundef %arg) {
start:
  switch i8 %arg, label %unreachable [
    i8 0, label %case012
    i8 1, label %case1
    i8 2, label %case2
    i8 3, label %end
  ]

unreachable:
  unreachable

case1:
  br label %case012

case2:
  br label %case012

case012:
  %phi1 = phi i8 [ 3, %case2 ], [ 2, %case1 ], [ 1, %start ]
  br label %end

end:
  %phi2 = phi i8 [ %phi1, %case012 ], [ 4, %start ]
  ret i8 %phi2
}

The phis here should be merged into one phi, so that we can better optimize it:

define i8 @tgt(i8 noundef %arg) {
start:
  switch i8 %arg, label %unreachable [
    i8 0, label %end
    i8 1, label %case1
    i8 2, label %case2
    i8 3, label %case3
  ]

unreachable:
  unreachable

case1:
  br label %end

case2:
  br label %end

case3:
  br label %end

end:
  %phi = phi i8 [ 4, %case3 ], [ 3, %case2 ], [ 2, %case1 ], [ 1, %start ]
  ret i8 %phi
}

Proof:
normal
multiple stages
multiple stages 2
multiple phi combinations

And lookup table optimization should convert it into add %arg 1.
This patch just match similar CFG structure and merge the phis in different cases.

Maybe such transform can be applied to other situations besides switch,
but I'm not sure whether it's better than not merging. Therefore, I only try it in switch,

Related issue:
Switch not fully optimized #63876

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
XChy added a comment.Jul 25 2023, 12:14 AM

Here is an optimized example related to the lookup table optimization in the issue in summary:
add 1

You may need to run git clang-format HEAD~1.
The jump-table-islands.ll test case looks like it needs updating.

First time reviewer, please let me know if anything is wrong.

llvm/lib/Transforms/Utils/SimplifyCFG.cpp
6206 ↗(On Diff #543634)

Unnecessary debugging logs?

6758 ↗(On Diff #543634)

SuccBB looks like it could be a parameter?

6762 ↗(On Diff #543634)

Can these checks be used to create a small enough CaseBBs at MergePhisInSwitch?

6775 ↗(On Diff #543634)

Performing this check first may reduce unnecessary CasePredBBs updates?

6781 ↗(On Diff #543634)

Does it make sense to use a Changed variable to update multiple Cases at once?

XChy updated this revision to Diff 543964.Jul 25 2023, 7:30 AM
XChy marked 4 inline comments as done.
XChy set the repository for this revision to rG LLVM Github Monorepo.
  • Fix the nits
  • Change the order of checks to increase efficiency.

jump-table-islands.ll belongs to ARM's CodeGen test and it seems to be handwritten. Could anyone familiar with it suggest me how to update it?

XChy added inline comments.Jul 25 2023, 7:32 AM
llvm/lib/Transforms/Utils/SimplifyCFG.cpp
6758 ↗(On Diff #543634)

Sure.

6762 ↗(On Diff #543634)

I don't think it's necessary to create a small enough CaseBBs here. Instead, CaseBBs should include the successors of all cases, in order to guarantee correct checks like All predecessors of CaseBB is also the successor of one case. SuccBB is the successor of one case.

6775 ↗(On Diff #543634)

Thanks. That's more efficient

6781 ↗(On Diff #543634)

It may be much more complex to update them at once. There may be some diamond-like CFG or loops in Cases. Updating at once would bring some conflicts here.

XChy updated this revision to Diff 544253.Jul 26 2023, 1:28 AM

Omit jump-table-islands.ll test temporarily. (Sorry, I'm not familiar with ARM and unable to update it)
Contact related committers to update if this patch is approved.

nikic added inline comments.Jul 27 2023, 5:25 AM
llvm/test/CodeGen/ARM/jump-table-islands.ll
1

You can probably avoid the test change using -arm-atomic-cfg-tidy=0.

nikic added a comment.EditedJul 27 2023, 5:42 AM

Such mergence may not help with other transforms (Quite a few transforms are based on two branches, but not multiple branches)

Do you have any particular issues this could cause in mind?

I think even if we don't always perform the transform, we should still express it generically, and then have a profitability heuristic on top: E.g. say that the one common predecessor must have a switch terminator. That way we can clearly separate correctness from profitability requirements.

Edit: Just for reference, this is an example of the non-switch CFG I have in mind where such a transform should be possible: https://gist.github.com/nikic/8f22ef4e366fd3041c7e150c228f9673

XChy added a comment.Jul 27 2023, 8:00 AM

Do you have any particular issues this could cause in mind?

Sure, something like FoldTwoEntryPHINode/HoistThenElseToIf may miss/delay. Or some Phi to Select may be delayed/sunk. I haven't constructed counter-examples here. (Maybe it's my intuitive fault, I actually constructed some examples optimized well).
I think the probitability can be tested by the existing testcases here. I'll try to generalize TryToSimplifyUncondBranchFromEmptyBlock to perform the tests.

I think even if we don't always perform the transform, we should still express it generically, and then have a profitability heuristic on top: E.g. say that the one common predecessor must have a switch terminator. That way we can clearly separate correctness from profitability requirements.

I agree with you. The generalization is indeed necessary. It may take some time to complete it. At that time can we determine what profitability heuristic shoud be.

Edit: Just for reference, this is an example of the non-switch CFG I have in mind where such a transform should be possible: https://gist.github.com/nikic/8f22ef4e366fd3041c7e150c228f9673

Thanks! That could serve as a testcase to be expanded/reduced.

XChy updated this revision to Diff 544785.Jul 27 2023, 8:30 AM
XChy marked an inline comment as done.

Add -arm-atomic-cfg-tidy=0 option.

XChy updated this revision to Diff 545192.Jul 28 2023, 9:26 AM
  • Refactor TryToSimplifyUncondBranchFromEmptyBlock to merge phis between BB and Succ with a common predecessor. Logic correctly embedded (OK in Local tests), but not efficient currently.
  • A regression found, in switch-simplify-crash2.ll. But InstCombine would solve it.
  • Testcases not added now, incoming. And ARM/jump-islands-table.ll doesn't test well still.
llvm/test/CodeGen/ARM/jump-table-islands.ll
1

Fail still

XChy updated this revision to Diff 545346.Jul 29 2023, 2:05 AM
  • Add complete positive tests
  • Fix bugs about preserving domtree

Enhance further possible: If BB and Succ share more than one predecessor and combine some phis meanwhile, we can also simplify CFG.
However, we are not able to erase the phi in BB completely here. Does it make sense still?
example

XChy updated this revision to Diff 545365.Jul 29 2023, 6:20 AM

Update codesize-loop.ll

XChy updated this revision to Diff 545418.Jul 30 2023, 3:53 AM

[Clang-format]
It seems perform correctly and well in most cases.

XChy updated this revision to Diff 545568.Jul 31 2023, 3:06 AM

Update AArch64's codegen tests.

XChy retitled this revision from [SimplifyCFG] Transform for merging the combination of phis in switch to [SimplifyCFG] Transform for redirecting phis between unmergeable BB and SuccBB.Aug 12 2023, 11:59 PM
XChy edited the summary of this revision. (Show Details)

ping.

XChy updated this revision to Diff 555262.Thu, Aug 31, 10:42 PM

Remove unnecessary helper function

DianQK requested changes to this revision.Thu, Aug 31, 11:45 PM

Too much irrelevant formatting. It causes review to become complicated. It also affects other people blame related changes.
Could you ignore this formatting?

In fact, I'm not sure why. It could be that some commits are not formatted, or it could be a change in formatting rules. Or something else.
My formatting workflow is:

  • Ignore all irrelevant formatting if present.
  • Add and commit.
  • git clang-format HEAD~1
  • Since the latest submission is my own. Then I just run git add . and git clang-format HEAD~1.

And why didi stlxrb w9, w1, [x2] became stlxrb wzr, w1, [x2]?

This revision now requires changes to proceed.Thu, Aug 31, 11:45 PM
XChy updated this revision to Diff 555409.Fri, Sep 1, 9:11 AM

Drop unnecessary formatting.

Allen added a subscriber: Allen.Fri, Sep 1, 9:32 AM
XChy marked an inline comment as done.Fri, Sep 1, 9:48 AM

Too much irrelevant formatting. It causes review to become complicated. It also affects other people blame related changes.
Could you ignore this formatting?

In fact, I'm not sure why. It could be that some commits are not formatted, or it could be a change in formatting rules. Or something else.
My formatting workflow is:

  • Ignore all irrelevant formatting if present.
  • Add and commit.
  • git clang-format HEAD~1
  • Since the latest submission is my own. Then I just run git add . and git clang-format HEAD~1.

Sorry for bringing irrelevant formatting, which may be caused by my incautious global formatting.
You workflow advice is highly appreicated.

And why didi stlxrb w9, w1, [x2] became stlxrb wzr, w1, [x2]?

The cause is that we remove dead PHI nodes before elimination of mostly empty blocks. Actually the up-to-date llvm-repo has implemented it. You can see details in D158503.
I would make this patch up-to-date later.

XChy updated this revision to Diff 555430.Fri, Sep 1, 10:21 AM
DianQK added a comment.Sun, Sep 3, 7:47 PM

I read the history of jump-table-islands.ll.
This test checks for oversized jump tables jumping over each other.
You transform changes from

end:                                              ; preds = %complex, %simple
  %val = phi i8500 [ %l, %complex ], [ -1, %simple ]
  br label %common.ret

to

end:                                              ; preds = %complex
  br label %common.ret

The test was to check disappears.
Increasing BigInt makes no sense.
Sorry I don't know how to create a friendly diff display.
You could remove -arm-atomic-cfg-tidy=0 and change the end bb to:

end:
  %val = phi %BigInt [ %l, %complex ], [ -1, %simple ]
  %val2 = add %BigInt %val, 1
  ret %BigInt %val2
}

https://reviews.llvm.org/differential/diff/555646/

I feel the SKIP_TABLE name is confusing and should be changed to CONTINUE_TABLE/ISLAND_JUMP?

XChy added a comment.Mon, Sep 4, 12:22 AM

I read the history of jump-table-islands.ll.
This test checks for oversized jump tables jumping over each other.
You transform changes from

end:                                              ; preds = %complex, %simple
  %val = phi i8500 [ %l, %complex ], [ -1, %simple ]
  br label %common.ret

to

end:                                              ; preds = %complex
  br label %common.ret

The test was to check disappears.
Increasing BigInt makes no sense.
Sorry I don't know how to create a friendly diff display.
You could remove -arm-atomic-cfg-tidy=0 and change the end bb to:

end:
  %val = phi %BigInt [ %l, %complex ], [ -1, %simple ]
  %val2 = add %BigInt %val, 1
  ret %BigInt %val2
}

https://reviews.llvm.org/differential/diff/555646/

I feel the SKIP_TABLE name is confusing and should be changed to CONTINUE_TABLE/ISLAND_JUMP?

Thanks for your information and suggestions. It's OK to apply IR below to avoid related transform:

declare void @use(%BigInt)
......
end:
  %val = phi %BigInt [ %l, %complex ], [ -1, %simple ]
  call void @use(%BigInt %val)
  ret %BigInt %val

If possible, however, it would be better if there is any other available option that does not modify the original IR.

XChy updated this revision to Diff 555666.Mon, Sep 4, 12:26 AM

Insert use function into jump-table-islands.ll to avoid unrelated transform.

DianQK accepted this revision.Fri, Sep 8, 7:46 AM

LGTM. But before landing, you'd better request @nikic approval for this.

If possible, however, it would be better if there is any other available option that does not modify the original IR.

I don't know. The purpose of your transformation is to tweak this test case.

llvm/lib/Transforms/Utils/Local.cpp
886

Irrelevant formatting.

1004

Should clang-tidy recommend lowercase beginnings?

1148

Irrelevant formatting.

1152

Again.

llvm/test/CodeGen/ARM/jump-table-islands.ll
39

“The purpose is to prevent SimplifyCFG from simplifying the phi node above.”
Would this description be better?

This revision is now accepted and ready to land.Fri, Sep 8, 7:46 AM
XChy added a comment.Tue, Sep 19, 4:15 AM

LGTM. But before landing, you'd better request @nikic approval for this.

If possible, however, it would be better if there is any other available option that does not modify the original IR.

I don't know. The purpose of your transformation is to tweak this test case.

Sure, thanks for your review!

XChy updated this revision to Diff 557030.Tue, Sep 19, 4:21 AM
XChy marked 4 inline comments as done.

Reduce irrelevant formatting

nikic added inline comments.Tue, Sep 19, 8:13 AM
llvm/lib/Transforms/Utils/Local.cpp
1119–1135

This sentence did not parse to me. Maybe something like this?

1123

Is it intentional that if BBKillable is true we still call CanRedirectPredsOfEmptyBBToSucc? Should this be !BBKillable &&?

1179

Why is this set no longer needed? Can no predecessors appear multiple times in this code?

1323

use -> U

XChy updated this revision to Diff 557089.Tue, Sep 19, 8:35 PM
XChy marked 2 inline comments as done.

Polish comment and fix when to print log.

nikic added inline comments.Tue, Sep 19, 10:47 PM
llvm/lib/Transforms/Utils/Local.cpp
1123

In particular, what worries me here is that we set BBKillable = true, then call CanRedirectPredsOfEmptyBBToSucc() which may set CommonPred (even if the function ultimately returns false) and then various code in the following will do things differently based on CommonPred. It's not clear to me whether this will result in misbehavior or not.

(With that in mind, I think it may also be cleaner if the BBPhisMergeable flag is removed and only CommonPred left, so that CommonPred != nullptr is equivalent to BBPhisMergeable and we don't leave dangling CommonPred if CanRedirectPredsOfEmptyBBToSucc exits early.)

XChy added inline comments.Tue, Sep 19, 11:55 PM
llvm/lib/Transforms/Utils/Local.cpp
1004

Yes. But I just follow the original style like CanPropagatePredecessorsForPHIs.

1119–1135

Thanks for polishing my comment.

1123

Both BBKillable || redirect() and !BBKillable && redirect() do not call CanRedirectPredsOfEmptyBBToSucc when BBKillable is true.
What is different is , that when BBKillable is true, the former BBPhisMergeable is true while the latter is false.
Actually it's logically OK, since when BBKillable is true, whether BBPhisMergeable doesn't have any impact.
I write BBKillable || redirect() based on semantic meaning : Phis are mergeable if BB can be killed.

1179

BBPreds is a SmallPtrSet, on which we would not iterate the same predecessor multiple times.

nikic added inline comments.Wed, Sep 20, 12:12 AM
llvm/lib/Transforms/Utils/Local.cpp
1179

Oh, I missed that this does the updates based on SmallPtrSet now. This is not allowed, because it makes the updates non-deterministic (order matters here). You should replace SmallPtrSet with SmallSetVector to keep order deterministic.

XChy updated this revision to Diff 557098.EditedWed, Sep 20, 12:37 AM
XChy marked an inline comment as done.

Keep order when updating dom-tree.

XChy marked an inline comment as done.Wed, Sep 20, 12:39 AM
XChy added inline comments.
llvm/lib/Transforms/Utils/Local.cpp
1123

What you're worried about make sense to me. But as far as I know, the assumption "BBKillable || redirect() will call redirect() when BBKillable is true" is wrong。

1179

Sure, restore to original codem though I don't know why order matters here. Because the algorithm for updating dom-tree?