Page MenuHomePhabricator

[SimplifyCFG] don't sink common insts too soon (PR34603)
ClosedPublic

Authored by spatel on Oct 4 2017, 3:41 PM.

Details

Summary

This is a minimal patch to solve PR34603:
https://bugs.llvm.org/show_bug.cgi?id=34603

As discussed there, we need to prevent SimplifyCFG from performing aggressive flattening before early-cse, GVN, and potentially other passes have had a chance to run because we may not be able to see through those transformations. In the example from the original bug report (PR28343), this prevents a whole series of optimizations: recognizing a 'max', vectorizing that max, unrolling a loop, etc.

I've opted to just keep going with the current -simplifycfg / -latesimplifycfg separation, but we could lift the individual options to the pass manager layer if that seems better.

Diff Detail

Event Timeline

spatel created this revision.Oct 4 2017, 3:41 PM
hfinkel edited edge metadata.Oct 4 2017, 6:16 PM

This is a minimal patch to solve PR34603:
https://bugs.llvm.org/show_bug.cgi?id=34603

Can you please summarize here what's going on and how you arrived at this solution? Last I recall, in PR34603, we were discussing select formation, and how it might not be desirable to form selects early. By "aggressive flattening" do you mean select formation? What you're doing here seems like more than just disabling select formation, but sinking? I also think that we might not want to sink early, at least some things (because we can't later prove it safe to re-hoist), but it's not immediately clear how all of these things are related.

spatel added a comment.Oct 5 2017, 7:04 AM

This is a minimal patch to solve PR34603:
https://bugs.llvm.org/show_bug.cgi?id=34603

Can you please summarize here what's going on and how you arrived at this solution? Last I recall, in PR34603, we were discussing select formation, and how it might not be desirable to form selects early. By "aggressive flattening" do you mean select formation? What you're doing here seems like more than just disabling select formation, but sinking? I also think that we might not want to sink early, at least some things (because we can't later prove it safe to re-hoist), but it's not immediately clear how all of these things are related.

Sorry for the invented vocabulary. 'Aggressive flattening' was meant to include the select formation, but that's not actually the root of the problem. The sinking is the root of the problem, so that's why I'm proposing to disable that in the early incarnation of SimplifyCFG. My intent (probably needing refinement) is to delay the transform that was added with rL279460 ( SinkThenElseCodeToEnd ) because that's where we regressed the examples in the bug reports. Let me illustrate using the example in the test cases (it's a slightly simplified form of the motivating example in the bug report):

  1. When we start, we have redundant values across multiple blocks:
entry:
%xi_ptr = getelementptr double, double* %x, i64 %i
%yi_ptr = getelementptr double, double* %y, i64 %i
%xi = load double, double* %xi_ptr
%yi = load double, double* %yi_ptr

if:
%xi_ptr_again = getelementptr double, double* %x, i64 %i
%xi_again = load double, double* %xi_ptr_again

else:
%yi_ptr_again = getelementptr double, double* %y, i64 %i
%yi_again = load double, double* %yi_ptr_again

early-cse can kill the redundant ops, so we just have the gep+load in 'entry' when it's done.

  1. simplifycfg currently runs before early-cse in all -O# pipelines, and SinkThenElseCodeToEnd() applies this transform to create sunk ops in the 'end' block
entry:
  %xi_ptr = getelementptr double, double* %x, i64 %i
  %yi_ptr = getelementptr double, double* %y, i64 %i
  %xi = load double, double* %xi_ptr, align 8
  %yi = load double, double* %yi_ptr, align 8
  %cmp = fcmp ogt double %xi, %yi
  br i1 %cmp, label %if, label %end

if:                                          
  br label %end

end:                                           
  %y.sink = phi double* [ %x, %if ], [ %y, %entry ]    <--- sunk phi of gep base pointers
  %new_gep = getelementptr double, double* %y.sink, i64 %i  <--- redundant geps got killed
  %new_load = load double, double* %new_gep, align 8  <--- redundant loads got killed
  ret double %new_load
  1. At this point, there are no CSE opportunities. The initial geps/loads are not the same as the 'common' sunk ops. SimplifyCFG carries on and turns the 2-case phi into a select:
%xi_ptr = getelementptr double, double* %x, i64 %i
%yi_ptr = getelementptr double, double* %y, i64 %i
%xi = load double, double* %xi_ptr
%yi = load double, double* %yi_ptr
%cmp = fcmp ogt double %xi, %yi
%y.sink = select i1 %cmp, double* %x, double* %y
%new_gep = getelementptr double, double* %y.sink, i64 %i
%new_load = load double, double* %new_gep
ret double %new_load
  1. Based on https://bugs.llvm.org/show_bug.cgi?id=34603#c19 , I think we agreed that this is practically impossible to recover from in later passes. Therefore, I'm proposing to delay the initial sinking that creates 'common' ops out of potentially redundant ops. It's possible that we also, independently, want to delay select formation in other cases. It's also possible that I've used a sledgehammer where a chisel would've worked. :)

This is a minimal patch to solve PR34603:
https://bugs.llvm.org/show_bug.cgi?id=34603

Can you please summarize here what's going on and how you arrived at this solution? Last I recall, in PR34603, we were discussing select formation, and how it might not be desirable to form selects early. By "aggressive flattening" do you mean select formation? What you're doing here seems like more than just disabling select formation, but sinking? I also think that we might not want to sink early, at least some things (because we can't later prove it safe to re-hoist), but it's not immediately clear how all of these things are related.

Sorry for the invented vocabulary. 'Aggressive flattening' was meant to include the select formation, but that's not actually the root of the problem. The sinking is the root of the problem, so that's why I'm proposing to disable that in the early incarnation of SimplifyCFG. My intent (probably needing refinement) is to delay the transform that was added with rL279460 ( SinkThenElseCodeToEnd ) because that's where we regressed the examples in the bug reports. Let me illustrate using the example in the test cases (it's a slightly simplified form of the motivating example in the bug report):

Thanks for the explanation. So, to be clear, we're only delaying the "sinking through PHIs" type of sinking until late?

I'm a bit concerned about the fact that LateSimplyCFG currently runs right after the SLP vectorizer, but we probably want to do select formation aggressively before the SLP vectorizer. We might, in fact, want to form selects more aggressively before the loop vectorizer too. In short, I'm not sure that this variety of "late" is the same as our existing concept of "late" (which is something that interferes with loop structure, and thus, should come after vectorization (at least loop vectorization)).

spatel added a comment.Oct 5 2017, 1:11 PM

Thanks for the explanation. So, to be clear, we're only delaying the "sinking through PHIs" type of sinking until late?

Correct - everything is still here, but we're using the existing def of "late" to limit when it happens.

I'm a bit concerned about the fact that LateSimplyCFG currently runs right after the SLP vectorizer, but we probably want to do select formation aggressively before the SLP vectorizer. We might, in fact, want to form selects more aggressively before the loop vectorizer too. In short, I'm not sure that this variety of "late" is the same as our existing concept of "late" (which is something that interferes with loop structure, and thus, should come after vectorization (at least loop vectorization)).

Ok - it sounds like I should proceed with using the optional bits within the pass manager(s). I'll make it so that we're only preventing the sinking on the first instantiation of simplifycfg (the one before early-cse).

spatel updated this revision to Diff 120830.Oct 30 2017, 8:50 AM

Patch updated:
Now that we have SimplifyCFGOptions exposed to the pass managers, we can use a precise disablement of sinking common instructions to only the first instantiation of -simplifycfg (the one that runs before early-cse).

Depending on your perspective, this is either a redefinition of "canonical form" or we've created a new "pre-canonical form".

Given the comments in PR34603, I suspect we want to limit simplifycfg sinking even more, but that should be a follow-up patch with more phase ordering tests, so we know exactly what kind of patterns we're affecting.

Patch updated:
Now that we have SimplifyCFGOptions exposed to the pass managers, we can use a precise disablement of sinking common instructions to only the first instantiation of -simplifycfg (the one that runs before early-cse).

Why? The selects also hurt GVN, etc. I thought we were going to delay this until the end of the canonicalization phase. Only then might be have more "phases" in this sense.

Depending on your perspective, this is either a redefinition of "canonical form" or we've created a new "pre-canonical form".

Given the comments in PR34603, I suspect we want to limit simplifycfg sinking even more, but that should be a follow-up patch with more phase ordering tests, so we know exactly what kind of patterns we're affecting.

Patch updated:
Now that we have SimplifyCFGOptions exposed to the pass managers, we can use a precise disablement of sinking common instructions to only the first instantiation of -simplifycfg (the one that runs before early-cse).

Why? The selects also hurt GVN, etc. I thought we were going to delay this until the end of the canonicalization phase. Only then might be have more "phases" in this sense.

I might be overcautious here, but as I wrote in the last comment, I'd like to at least try to manufacture a GVN test case to show that difference. Also, I was hoping to get this diff in independently to reduce the risk of revert from unintended consequences from changing behavior in the later simplifycfg instantiations. If you feel more confident, I can flip the parameter in more places (or invert the default setting) in this patch.

Depending on your perspective, this is either a redefinition of "canonical form" or we've created a new "pre-canonical form".

Given the comments in PR34603, I suspect we want to limit simplifycfg sinking even more, but that should be a follow-up patch with more phase ordering tests, so we know exactly what kind of patterns we're affecting.

spatel added a comment.Nov 6 2017, 7:25 AM

I might be overcautious here, but as I wrote in the last comment, I'd like to at least try to manufacture a GVN test case to show that difference. Also, I was hoping to get this diff in independently to reduce the risk of revert from unintended consequences from changing behavior in the later simplifycfg instantiations. If you feel more confident, I can flip the parameter in more places (or invert the default setting) in this patch.

As evidence that my fear is not completely delusional, the preliminary patches for this change were recently reverted:
https://reviews.llvm.org/rL317444
...this even erased the commit that added the innocent bystander test case for this patch (sigh).

As evidence that my fear is not completely delusional, the preliminary patches for this change were recently reverted:
https://reviews.llvm.org/rL317444
...this even erased the commit that added the innocent bystander test case for this patch (sigh).

I recommitted the preliminary patches at:
https://reviews.llvm.org/rL318299
https://reviews.llvm.org/rL318300

...so ping?

davide added a subscriber: davide.Nov 15 2017, 11:18 AM

The proliferation of cl::opt(s) maybe hints SimplifyCFG is growing a little too much? & might need to be split in separate passes.
SimplifyCFG is a pretty fundamental canonicalization pass & should be almost free to run. For sinking, we have a separate pass, so maybe it's worth taking a look at whether that's fixable instead of patching SimplifyCFG behaviour? I really would like to prevent SimplifyCFG to become the next Instcombine.

Until ~ 1 year ago, SimplifyCFG was a single pass, and everything was fine (and IMHO, that's how it should be).
Then we found out it had bad interactions with inlining, so we decided to split in two passes. That's OK, as long as it's una tantum.
Keeping adding knobs to tweak the 4-5 different SimplifyCFG invocations in-tree is a slippery rope.

Until ~ 1 year ago, SimplifyCFG was a single pass, and everything was fine (and IMHO, that's how it should be).
Then we found out it had bad interactions with inlining, so we decided to split in two passes. That's OK, as long as it's una tantum.
Keeping adding knobs to tweak the 4-5 different SimplifyCFG invocations in-tree is a slippery rope.

I'm not opposed to splitting it up, but that's a larger effort / follow-up to this.
Creating a struct of options for this pass was discussed as a reasonable way to solve PR34603, so that's what I've implemented:
https://bugs.llvm.org/show_bug.cgi?id=34603#c20

One issue with the struct approach. Suppose I opt-bisect to a failure caused by one of these simplifyCFG passes. How do I invoke that specific version of simplifycfg from opt to debug it or even produce a reduced test case.

One issue with the struct approach. Suppose I opt-bisect to a failure caused by one of these simplifyCFG passes. How do I invoke that specific version of simplifycfg from opt to debug it or even produce a reduced test case.

I haven't tried opt-bisect. Does it show the sequence of passes leading up to the failure, or just a single pass where the IR goes wrong? If you have the sequence, then you could cross-reference that to the pipeline manager's instantiation to see how it got invoked and mimic that with the cl::opt. That's not ideal of course, but this is a general problem for any pass that takes options, right?

One issue with the struct approach. Suppose I opt-bisect to a failure caused by one of these simplifyCFG passes. How do I invoke that specific version of simplifycfg from opt to debug it or even produce a reduced test case.

You need to remember the (now) 4, and possibly more flags and try all the combinations (at least this is what I did when I bisected https://bugs.llvm.org/show_bug.cgi?id=35246).
It's a little inconvenient, to put it lightly. That very same bug shows that several part of SimplifyCFG don't play nice with each other, FWIW.

Until ~ 1 year ago, SimplifyCFG was a single pass, and everything was fine (and IMHO, that's how it should be).
Then we found out it had bad interactions with inlining, so we decided to split in two passes. That's OK, as long as it's una tantum.
Keeping adding knobs to tweak the 4-5 different SimplifyCFG invocations in-tree is a slippery rope.

I'm not opposed to splitting it up, but that's a larger effort / follow-up to this.
Creating a struct of options for this pass was discussed as a reasonable way to solve PR34603, so that's what I've implemented:
https://bugs.llvm.org/show_bug.cgi?id=34603#c20

Once you end up adding n knobs people start relying on them and splitting the pass becomes more complicated, so "doing that as a follow-up" doesn't really work unless you have a clear plan in mind on how to split. If the majority thinks this is OK, I'm not going to argue further, but this combinatorial explosion of options introduced is clearly a symptom of a larger problem, IMHO.

Ping.

Note that there's another bug that I think will be helped by this patch - https://bugs.llvm.org/show_bug.cgi?id=35354 .
D40304 is proposing to fix that in instcombine.

davide removed a subscriber: davide.Nov 27 2017, 12:53 PM

Ping * 2.

Can you please summarize where we are now with this? I still prefer to have only one form of SimplifyCFG during canonicalization. I feel that doing otherwise is a mistake, in part to make the canonical form understandable, and in part because canonicalization is iterative (now for devirtualization, but maybe in other ways in the future), and so it's much better if we don't think about phases within canonicalization itself.

Ping * 2.

Can you please summarize where we are now with this? I still prefer to have only one form of SimplifyCFG during canonicalization. I feel that doing otherwise is a mistake, in part to make the canonical form understandable, and in part because canonicalization is iterative (now for devirtualization, but maybe in other ways in the future), and so it's much better if we don't think about phases within canonicalization itself.

Sure - the patch is identical to when we discussed this around Oct 30,31. The preliminary steps were blamed for causing a bug ( https://bugs.llvm.org/show_bug.cgi?id=35210 ) and reverted, but subsequently the patches were exonerated, and the bug was fixed independently of this.

There was also concern that having options to passes makes it harder to track down bugs and is a sign of bad design. I don't disagree with that view, but we've already come this far, so I view a change of direction as a follow-up because that's significantly more work at this point. Also, this patch fixes a known regression (a significant performance hit), and I want to close that hole ASAP.

So we have the bare minimum to avoid the bug in the motivating case. But I see your point - having multiple canonical forms/phases will make things more complicated. I thought I might have a test case for the larger patch, but I haven't found an example yet. I'll update the patch as you've requested. If I can come up with a test for it, I'll include it. If not, hopefully, we can just proceed with the one test that's already here.

spatel updated this revision to Diff 126573.Dec 12 2017, 10:16 AM

Patch updated:
As suggested, turn off sinking of common instructions by default because that's known to be potentially harmful to both early-cse and GVN. Enable that option after GVN has run, in the former "latesimplifycfg" invocation, and when invoked in the target-specific backend pipelines for ARM/AArch64 (there is an AArch64 regression test in test/CodeGen/AArch64/rm_redundant_cmp.ll that requires the sinking transform).

hfinkel added inline comments.Dec 12 2017, 8:20 PM
lib/Passes/PassBuilder.cpp
460

I don't think that we want this here. This is still during canonicalization. You already have this right after SLP vectorization (we might actually want this right before SLP vectorization - do we SLP vectorize things we'll turn into selects otherwise?). Is it bad if you don't have this here for some test case?

spatel added inline comments.Dec 13 2017, 8:31 AM
lib/Passes/PassBuilder.cpp
460

There are no regression test diffs if I remove the sinking from that invocation (here and the equivalent in the old pass manager).

But there is no SimplifyCFG ahead of the vectorizers in buildModuleOptimizationPipeline() (around line 685). Should I just add a SimplifyCFG with sinkCommonInsts ahead of the loop passes? I agree that we have the potential to miss vectorizations if we have not converted into select form before we get to the vectorization passes.

I tried to manufacture an -O2 PhaseOrdering test that would show the potential regression for SLP, but no luck yet.

spatel updated this revision to Diff 126770.Dec 13 2017, 9:13 AM

Patch updated:

  1. Removed the sinking option from the SimplifyCFG invocations that are part of the canonicalization phase.
  2. Speculatively implemented what I suggested in the last comment - added a SimplifyCFG pass ahead of vectorization to allow the sinking transform (help the vectorizers by creating larger blocks with selects).

Patch updated:

  1. Removed the sinking option from the SimplifyCFG invocations that are part of the canonicalization phase.
  2. Speculatively implemented what I suggested in the last comment - added a SimplifyCFG pass ahead of vectorization to allow the sinking transform (help the vectorizers by creating larger blocks with selects).

Thanks. I'm pretty sure that you want it after the loop vectorizer and before the SLP vectorizer. The reason is that the loop vectorizer does its own if conversion, and moreover, the selects on addresses tend to mess up vectorization (or, at least, I'm pretty sure that's what @craig.topper told me last month). Otherwise, I think this is good.

spatel updated this revision to Diff 126999.Dec 14 2017, 11:13 AM

Patch updated:
This version just moves what was known as 'latesimplifycfg' before the SLP vectorizer (no extra passes are added as in the last rev).
SLP preserves the CFG, so there's no reason to wait to run SimplifyCFG after that.

hfinkel accepted this revision.Dec 14 2017, 12:16 PM

Patch updated:
This version just moves what was known as 'latesimplifycfg' before the SLP vectorizer (no extra passes are added as in the last rev).
SLP preserves the CFG, so there's no reason to wait to run SimplifyCFG after that.

Thanks. LGTM.

This revision is now accepted and ready to land.Dec 14 2017, 12:16 PM
This revision was automatically updated to reflect the committed changes.