This is an archive of the discontinued LLVM Phabricator instance.

[SimplifyCFG] Don't speculatively execute preductably-taken block
AbandonedPublic

Authored by lebedev.ri on Jan 24 2022, 12:43 PM.

Details

Summary

Back in D106650 / D106717 i've added these profile-guided bailouts,
but i've kept the speculation in the case where there is
a single block to speculate, and we predict it to be taken.

But now, i'm having second thoughts.
If we predict it to be taken, isn't flattening it,
and incurring the cost of a select, a pessimization?

Diff Detail

Event Timeline

lebedev.ri created this revision.Jan 24 2022, 12:43 PM
lebedev.ri requested review of this revision.Jan 24 2022, 12:43 PM
nikic added a comment.Jan 25 2022, 1:03 AM

Not sure on this. I think in theory you may be right, but in practice I suspect that converting a separate block into a select will on average be beneficial, because of the followup optimizations it enables. We'll be able to do more with a select than a whole basic block.

lkail added a subscriber: lkail.Jan 25 2022, 1:04 AM

This change implies that speculation is not beneficial regardless whether block is predicted to be taken or untaken. That essentially means that the optimization is not beneficial in most cases of unpredicted (at compile time) branch as well since hardware will actually predict well in most cases. While there is an explicit note that it's hard to do proper cost modeling on IR level there are still some simple cases which are beneficial. So I think we will regress in some cases.

Another thing to consider is (AFAIU) there is a more aggressive version of this optimization on machine level. Thus there could be cases when by not doing the optimization on IR we will still end up with transformation applied on MIR but will miss potential benefit from another IR level optimizations.

To summarize. I think this change may cause improvements and regressions as well. To better asses overall impact we need perf data.
We can start with the motivating example clearly showing benefit of not doing the transform and understand why and how general is the pattern ....

lkail added a comment.EditedJan 25 2022, 10:49 PM

If we predict it to be taken, isn't flattening it, and incurring the cost of a select

I think the cost of branch instruction(not including mispredict cost) can't be ignored either. The fallthrough basicblock doesn't need branch instruction, but the other does. If we got multiple selects, branch might be a win.

I agree with nikic and ebrevnov in that it is better to avoid converting more selects to branches in SimplifyCFG since they might block other IR-level optimizations.

In general, I think we need to do the inverse. Flatten the CFG more aggressively, allow a simpler control flow that will facilitate IR-level optimizations and then at the end of middle-end make the decision on whether to convert the selects to branches.
I have a RFC on cmov/branch decision-making that proposes extending the logic on CodeGenPrepare or having a pass just before it (essentially at the end of middle-end) .
In the current proposal, I did not change SimplifyCFG but from some preliminary experiments, preventing SimplifyCFG from making such decisions and deferring this for later further improves performance.
Besides, isn't SimplifyCFG supposed to be a canonicalization pass that enables others, rather than an optimization pass.

In general, I think we need to do the inverse. Flatten the CFG more aggressively, allow a simpler control flow that will facilitate IR-level optimizations

But you can expect perf problems due to missing optimizations with select over phi.

https://groups.google.com/g/llvm-dev/c/VcJnLMI7Deg/m/TshM3BkHCAAJ

(Or maybe GVN is now smarter..)

But you can expect perf problems due to missing optimizations with select over phi.

https://groups.google.com/g/llvm-dev/c/VcJnLMI7Deg/m/TshM3BkHCAAJ

(Or maybe GVN is now smarter..)

Thanks for pointing this out. I guess it is debatable what the canonical form should be (selects or phis).
I would assume that the current canonical form is the one that enables the most subsequent optimizations.
I thought that this was selects (but not sure). Despite some passes that would prefer phis (such as GVNs perhaps) there are still a lot of other ones that prefer selects. So, some regressions seem currently unavoidable but overall I would hope that one is better.

It would be great for others to weigh-in of what's the current state and whether either selects or phis are more canonical (i.e., on average bigger enablers).

Besides this debate, shouldn't SimplifyCFG convert to this canonical form (whichever is better) and no try to optimize on its own? It seems hacky to force SimplifyCFG to make such decisions but maybe if neither form is significantly better then this ad-hoc decision-making helps overall.

In general, I believe we should canonicalize to select, fix missing transforms where it makes sense, then treat optimal branch vs cmov lowering as a backend problem.

(Quick comment, may be missing context in the discussion which is important, don't let me derail if so.)

Matt added a subscriber: Matt.Jun 1 2022, 4:28 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 1 2022, 4:28 PM
lebedev.ri abandoned this revision.Oct 18 2022, 5:46 PM