This is an archive of the discontinued LLVM Phabricator instance.

[SimplifyCFG] `FoldBranchToCommonDest()`: deal with mismatched IV's in PHI's in common successor block
ClosedPublic

Authored by lebedev.ri on Dec 4 2022, 10:06 AM.

Details

Summary

This tries to approach the problem noted by @arsenm:
terrible codegen for __builtin_fpclassify():
https://godbolt.org/z/388zqdE37

Just because the PHI in the common successor happens to have different
incoming values for these two blocks, doesn't mean we have to give up.
It's quite easy to deal with this, we just need to produce a select:
https://alive2.llvm.org/ce/z/000srb

Now, the cost model for this transform is rather overly strict,
so this will basically never fire. We tally all (over all preds)
the selects needed to the NumBonusInsts

Diff Detail

Event Timeline

lebedev.ri created this revision.Dec 4 2022, 10:06 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 4 2022, 10:06 AM
lebedev.ri requested review of this revision.Dec 4 2022, 10:06 AM
arsenm added a comment.Dec 5 2022, 8:59 AM

Thanks

llvm/lib/Transforms/Utils/SimplifyCFG.cpp
3737

How can you be sure this won't constant fold to a ConstantExpr?

lebedev.ri updated this revision to Diff 480252.Dec 5 2022, 2:40 PM
lebedev.ri marked an inline comment as done.

@arsenm thank you for taking a look!
I believe this addresses your concern.

I've tried to take even more cautious approach here,
because we *do* want this xform to happen,
but at the same time, if we produce something silly,
we won't undo it soon enough, so it seems like we should
pay extra care to canonicalize all the "selects" we "form".

lebedev.ri planned changes to this revision.Dec 5 2022, 3:29 PM
lebedev.ri updated this revision to Diff 480285.Dec 5 2022, 4:53 PM

Don't forget to remap just-created instruction,
and don't mishandle the case where we invert the cond via not.
I *think* this is it.

lebedev.ri planned changes to this revision.Dec 5 2022, 5:00 PM
lebedev.ri updated this revision to Diff 480298.Dec 5 2022, 5:55 PM

@arsenm apologies noise :)
Now(C) this should be non-miscompiling.
Forgot to add truly exhaustive test coverage, and thus
didn't deal with all combinations branch flow. All good now.

lebedev.ri planned changes to this revision.Dec 6 2022, 1:23 PM
lebedev.ri updated this revision to Diff 480615.Dec 6 2022, 2:14 PM

Actually you know what, let's keep this as simple as possible for the first step.
I've dropped all of InstSimplify stuff, and ensured that llvm test-suite (+one more thing) passes.

arsenm added a comment.Dec 6 2022, 2:27 PM

I can't well which of these test changes is the new example. Is one of them literally what you get out of __builtin_fpclassify?

llvm/lib/Transforms/Utils/SimplifyCFG.cpp
3570

Recepie?

I can't well which of these test changes is the new example. Is one of them literally what you get out of __builtin_fpclassify?

Ctrl+F for test_builtin_fpclassify, it's at the end of the diff for llvm/test/Transforms/SimplifyCFG/fold-branch-to-common-dest.ll.
We don't quite manage to deal with it, but it's an improvement...

lebedev.ri updated this revision to Diff 480626.Dec 6 2022, 2:31 PM
lebedev.ri marked an inline comment as done.

@arsenm thank you for taking a look!
Fix typo.

arsenm added inline comments.Dec 6 2022, 3:11 PM
llvm/lib/Transforms/Utils/SimplifyCFG.cpp
3575

Comment what a SelectCache is?

3740

Comment what this is trying to do

3742–3743

std::get_if?

3754–3755

dyn_cast instead of isa + cast

3887

I'm not really sure what this is doing; should there be a vector variant of the fpclassify test?

lebedev.ri updated this revision to Diff 480684.Dec 6 2022, 4:34 PM
lebedev.ri marked 5 inline comments as done.

@arsenm thank you for taking a look!
Test added @incompatible_ivs_of_two_phis.vec, without SawVectorSelect |= PN.getType()->isVectorTy(); it would diverge into CHEAP and COSTLY variants.

llvm/lib/Transforms/Utils/SimplifyCFG.cpp
3887

These two lines are cost modelling.
We need to know how many selects we'll need,
and the cost model logic also rewards (allows bigger budget)
the cases where some of the instructions operate on vectors.

nikic added a subscriber: nikic.Dec 8 2022, 3:28 AM
nikic added inline comments.
llvm/test/Transforms/SimplifyCFG/X86/SpeculativeExec.ll
15

Looks like this test, and many similar tests regress, where we go from two selects to three selects. And we never seem to recover from this in either the middle end or the back end: https://alive2.llvm.org/ce/z/GjCXkB

nikic added inline comments.Dec 8 2022, 3:41 AM
llvm/test/Transforms/SimplifyCFG/X86/SpeculativeExec.ll
15

Filed https://github.com/llvm/llvm-project/issues/59393. We should at least add an InstCombine fold, but ideally we'd avoid regressing this in SimplifyCFG.

lebedev.ri added inline comments.Dec 8 2022, 6:03 AM
llvm/test/Transforms/SimplifyCFG/X86/SpeculativeExec.ll
15

I'll deal with the InstCombine side of this.
I agree that this is a regression, that obviously affects further speculation,
but i'm not sure how much we can avoid it during speculation.
I can take a look, but not in this patch,
so i hope this patch isn't going to be blocked on that.

lebedev.ri marked 2 inline comments as done.

Thanks for taking a look!

Added requested instcombine fold (9ddff66d0c9c3e18d56e6b20aa26a2a8cdfb6d2b), rebased.
Does anyone else have any other comments here?

llvm/test/Transforms/SimplifyCFG/X86/SpeculativeExec.ll
15

Added InstCombine fold (9ddff66d0c9c3e18d56e6b20aa26a2a8cdfb6d2b), after which we get:

define i32 @test1(i32 %a, i32 %b, i32 %c) {
entry:
  %t1 = icmp eq i32 %b, 0
  %t2 = icmp sgt i32 %c, 1
  %t3 = zext i1 %t2 to i32
  %t4.sel = add i32 %t3, %a
  %t4 = select i1 %t1, i32 %t4.sel, i32 %b
  %t5 = add i32 %t4, -1
  ret i32 %t5
}

... so one of the two LHS selects was redundant also.
We may be able to improve this, but i really would prefer not to conflate things.

lebedev.ri added inline comments.Dec 12 2022, 5:49 AM
llvm/test/Transforms/SimplifyCFG/X86/SpeculativeExec.ll
15

I'm going to make a guess that if that feedback
was meant to be blocking, it would have been marked as such,
and proceed with the change, acknowledging the possible further improvements.

nikic added inline comments.Dec 12 2022, 6:12 AM
llvm/test/Transforms/SimplifyCFG/X86/SpeculativeExec.ll
15

This is not blocking, but I did not review the implementation and only glanced over test diffs. The patch looks pretty complicated for what it does (in the sense of a minimal baseline implementation on top of which incremental changes may be stacked).

lebedev.ri marked an inline comment as done.Dec 12 2022, 6:30 AM
lebedev.ri added inline comments.
llvm/test/Transforms/SimplifyCFG/X86/SpeculativeExec.ll
15

This is not blocking

but I did not review the implementation and only glanced over test diffs.

Right, i did not imply that you did.

The patch looks pretty complicated for what it does (in the sense of a minimal baseline implementation on top of which incremental changes may be stacked).

Right, the non-test side of things could be much smaller if not for the abstractons.
I'm hoping to make use of them in other places we speculate.

This revision was not accepted when it landed; it landed in state Needs Review.Dec 12 2022, 7:20 AM
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
lebedev.ri marked an inline comment as done.

@nathanchance is reporting that this is breaking PPC32 Linux kernel boots.
https://github.com/ClangBuiltLinux/linux/issues/1770
We'll work on trying to get a reproducer or more info, but "heads up."

@nathanchance is reporting that this is breaking PPC32 Linux kernel boots.
https://github.com/ClangBuiltLinux/linux/issues/1770
We'll work on trying to get a reproducer or more info, but "heads up."

We are also experiencing some test failures that root cause to this commit. Here is a reproducer:
https://godbolt.org/z/aq6nYffxv

The function returns different results with the two different compiler versions.

alexfh added a subscriber: alexfh.Dec 16 2022, 8:17 AM

I will revert this patch. Please ensure the provided test case works fine before recommitting.

I will revert this patch. Please ensure the provided test case works fine before recommitting.

Thanks for the reproducer. Will revert in a sec.

I will revert this patch. Please ensure the provided test case works fine before recommitting.

Thanks for the reproducer. Will revert in a sec.

I've done that already. I had a revert prepared with conflict resolutions for the opaque pointers migration.

I will revert this patch. Please ensure the provided test case works fine before recommitting.

Thanks for the reproducer. Will revert in a sec.

I've done that already. I had a revert prepared with conflict resolutions for the opaque pointers migration.

So. There is a problem.
Before the patch, this is what we got: https://alive2.llvm.org/ce/z/WMZXQu <-- good
With the patch, we'd get: https://alive2.llvm.org/ce/z/oBg872 <-- miscompile
BUT. If we just run simplifycfg on the old result, we get: https://alive2.llvm.org/ce/z/zb7bsb <-- good!
So the actual miscompile is elsewhere.

-- 49. InstCombinePass

----------------------------------------
define float @_Z1ff(float noundef %0) nofree noundef willreturn memory(none) {
%1:
  %2 = fcmp ugt float noundef %0, 0.000000
  %3 = fcmp ult float noundef %0, 1.000000
  %or.cond = and i1 %2, %3
  %4 = fcmp uge float noundef %0, 0.200000
  %5 = xor i1 %2, %or.cond
  %.0.sel6 = select i1 %5, float 0.100000, float 0.000000
  %or.cond7 = and i1 %4, %or.cond
  %6 = fadd float noundef %0, -0.100000
  %.0 = select i1 %or.cond7, float %6, float %.0.sel6
  ret float %.0
}
=>
define float @_Z1ff(float noundef %0) nofree noundef willreturn memory(none) {
%1:
  %2 = fcmp ugt float noundef %0, 0.000000
  %3 = fcmp ult float noundef %0, 1.000000
  %or.cond = and i1 %2, %3
  %4 = fcmp uge float noundef %0, 0.200000
  %5 = fadd float noundef %0, -0.100000
  %.0.sel6 = select i1 %4, float %5, float 0.100000
  %.0 = select i1 %or.cond, float %.0.sel6, float 0.000000
  ret float %.0
}
Transformation doesn't verify! (unsound)
ERROR: Value mismatch

Example:
float noundef %0 = #x7f800000 (+oo)

Source:
i1 %2 = #x1 (1)
i1 %3 = #x0 (0)
i1 %or.cond = #x0 (0)
i1 %4 = #x1 (1)
i1 %5 = #x1 (1)
float %.0.sel6 = #x3dcccccd (0.100000001490?)
i1 %or.cond7 = #x0 (0)
float %6 = #x7f800000 (+oo)
float %.0 = #x3dcccccd (0.100000001490?)

Target:
i1 %2 = #x1 (1)
i1 %3 = #x0 (0)
i1 %or.cond = #x0 (0)
i1 %4 = #x1 (1)
float %5 = #x7f800000 (+oo)
float %.0.sel6 = #x7f800000 (+oo)
float %.0 = #x00000000 (+0.0)
Source value: #x3dcccccd (0.100000001490?)
Target value: #x00000000 (+0.0)

Pass: InstCombinePass
Command line: '/builddirs/llvm-project/build-Clang15/bin/opt' '-load=/repositories/alive2/build-Clang-release/tv/tv.so' '-load-pass-plugin=/repositories/alive2/build-Clang-release/tv/tv.so' '-tv-exit-on-error' '-O1' '/tmp/test.ll' '-tv-smt-to=20000' '-tv-report-dir=/repositories/alive2/build-Clang-release/logs' '-tv-smt-stats'

just wanted to note that this did increase binary size in some places (https://bugs.chromium.org/p/chromium/issues/detail?id=1401820), even though the description claims it will basically never fire. (I'm not blocking anything on that, just a note)

separately, IMO this patch is non-trivial enough to deserve an lgtm before landing

just wanted to note that this did increase binary size in some places (https://bugs.chromium.org/p/chromium/issues/detail?id=1401820),

It's almost assuredly second-order effects.

even though the description claims it will basically never fire. (I'm not blocking anything on that, just a note)

That description is based off of it's current cost model, which is extremely, extremely, conservative.

separately, IMO this patch is non-trivial enough to deserve an lgtm before landing

There are several things to consider in this cases:

  • Trivial-ness is different for everyone. I think, this is reasonably straight-forward.
  • Review is useful to catch issues, BUT it is even more important to learn from previous issues to not repeat them, both to reduce review cycles, especially so when no adequate review can be requested/acquired.
  • We seem to not have many active contributors in the areas i touch, so if i always just wait for review, i will be waiting a long while. And at the same time, not learn from mistakes, and not be able to review other's code.

FYI, this change caused widespread performance regressions in our testing. We haven't investigated further since the change was reverted.

I can re-run these tests and do some analysis on the next revision once you decide to reapply.

FYI, this change caused widespread performance regressions in our testing. We haven't investigated further since the change was reverted.

I can re-run these tests and do some analysis on the next revision once you decide to reapply.

At this point this is waiting on @alexfh's reproducer mostly.
I'm not sure what sequence of events you have in mind,
but presumably it may be good to have an idea
about at least some of those regressions before this is relanded?

samtebbs added a subscriber: samtebbs.EditedJan 5 2023, 3:27 AM

We've also seen lots of large regressions for a variety of ARM CPUs with the original patch and that weren't fixed by any updates. At the moment it's not clear if downstream changes are to blame so I'll sort out a reproducer as soon as I can.

FYI, this change caused widespread performance regressions in our testing. We haven't investigated further since the change was reverted.

I can re-run these tests and do some analysis on the next revision once you decide to reapply.

At this point this is waiting on @alexfh's reproducer mostly.

IIUC, the reproducer you mentioned is the one here: https://reviews.llvm.org/rG9ddff66d0c9c3e18d56e6b20aa26a2a8cdfb6d2b#1158810. Given that there was a bit of confusion and the problem had been fixed in ToT before the issue was reported, there's no concerns on our side about relanding this commit.

Just fyi, our performance testing shows around 20% improvements of the mediabench-g721 benchmark from the LLVM's MultiSource test-suite associated with the reland of this patch (428f36401b1b695fd501ebfdc8773bed8ced8d4e) and no regressions on any other benchmarks we run.

@apilipenko @samtebbs please, can you clarify, do you wish to block relanding of this patch until those issues you saw are addressed, or at least reduced and reported?

FYI, this change caused widespread performance regressions in our testing. We haven't investigated further since the change was reverted.

I can re-run these tests and do some analysis on the next revision once you decide to reapply.

At this point this is waiting on @alexfh's reproducer mostly.

IIUC, the reproducer you mentioned is the one here: https://reviews.llvm.org/rG9ddff66d0c9c3e18d56e6b20aa26a2a8cdfb6d2b#1158810. Given that there was a bit of confusion and the problem had been fixed in ToT before the issue was reported, there's no concerns on our side about relanding this commit.

Just fyi, our performance testing shows around 20% improvements of the mediabench-g721 benchmark from the LLVM's MultiSource test-suite associated with the reland of this patch (428f36401b1b695fd501ebfdc8773bed8ced8d4e) and no regressions on any other benchmarks we run.

Thank you!

samtebbs added a comment.EditedJan 17 2023, 8:28 AM

After investigating it looks like the regressions are due to something downstream so I don't want to block this relanding based on that. Thank you for waiting.

After investigating it looks like the regressions are due to something downstream so I don't want to block this relanding based on that. Thank you for waiting.

Thanks! @apilipenko ?

Hi @lebedev.ri , after some more investigation it looks like the regressions are due to this patch breaking canonical loop form and therefore preventing optimisations later on in the pipeline. I don't think that the additional comparisons and selects that come from this transformation are ideal in all cases, either. I've attached a reproducer.