We can sometimes replace a select with a Phi node if all of its values
are available on respective incoming edges.
Details
Diff Detail
Event Timeline
Recent work by @lebedev.ri went opposite way; phi -> select (mainly to exploit cmov on x86) so I am wondering why should we go this way..
I'm not sure what direction we want to take this, but forming phis out of selects may conflict with SimplifyCFG. It can create select from phi in cases like this:
declare void @call(i32) define i32 @select_or_phi(i1 %cond, i32 %x, i32 %y) { entry: br i1 %cond, label %if.true, label %if.false if.true: call void @call(i32 %x) br label %merge if.false: call void @call(i32 %y) br label %merge merge: %r1 = phi i32 [ %y, %if.false ], [ %x, %if.true ] %r2 = select i1 %cond, i32 %x, i32 %y %r = add i32 %r1, %r2 ret i32 %r }
$ opt -O1 phi.ll -S declare void @call(i32) define i32 @select_or_phi(i1 %cond, i32 %x, i32 %y) { entry: %y.sink = select i1 %cond, i32 %x, i32 %y %r1 = select i1 %cond, i32 %x, i32 %y call void @call(i32 %y.sink) %r2 = select i1 %cond, i32 %x, i32 %y %r = add i32 %r1, %r2 ret i32 %r }
Canonicalization between phis and select in either direction are pretty dangerous I think, I remember seeing discussions around this a couple times on llvm-dev, for example https://groups.google.com/d/msg/llvm-dev/VcJnLMI7Deg/20b3kBGMCgAJ.
We might want to look at some of the specific issues that D81375 ran into. Here is one: https://llvm.godbolt.org/z/KN9GFh We should be replacing %s with %c, but nothing in the pipeline does that. This looks like something we should be addressing anyway, because I don't think this patch does.
If you look at the test select_dominating_cond_and_sink at select.ll, it shows an opportunity how we can reduce the amount of work on each execution path by sinking TrueValue and FalseValue to different branches. Maybe it will be even more obvious if TrueValue and FalseValue will be not as simple as mul and add, but some huge formulas, such that moving them to one of branches gives great benefit. Compared to this benefit, advantage of having a cmov is negligible.
If we turn Phis into Selects, we will always have all those values on common path. If we think more of it, Phis open up great opportunities for jump threading and PRE while selects don't.
If the only reason of preferring selects over phis is their better lowering on some platforms, then there should be a backend transform that does it. From optimizer's perspective, Phi has more advantages.
There is no contradiction here. It's OK to turn Phi into select if we also reduce number of blocks. Add volatile stores to every branch of your example and SimplifyCFG will stop working.
But Select into a Phi provides opportunity for redundancy elimination (see my comment above) and will potentially help Jump Threading.
In general, I'm pretty sure that InstCombine is just a wrong place to do any backend-targeted optimizations (we've seen these in transforms that turn analyzable comparison conditions into weird mixture of shifts and xors that no other pass understands).
@mkazantsev I don't think this is about backend-specific optimizations (which indeed InstCombine should not be concerned about), phis and selects generally have different weak points in mid-level optimization. While you are right that the use of phis enables better jump threading, phis will usually not work with redundancy elimination and code motion optimizations. E.g. I don't think CSE or GVN do anything with phis right now.
That said, I can definitely see the argument that if you already have a branch, it is beneficial to represent a select as a phi, and leave the decision on whether to turn the whole branch into a sequence of selects to SimplifyCFG.
Well, my patch doesn't break any existing test. I also have tests that get improved and may be potentially improved further. If you say that it might *not* be a profitable transform, please come up with a particular example.
Actually, GVN does *a lot* about Phis (and mostly Phis) in PRE. And EarlyCSE does not walk above merge points anyways.
This discussion reminds me of:
https://bugs.llvm.org/show_bug.cgi?id=34603
cc @efriedma @hfinkel @reames
This patch converts an explicit use of the condition value into an implicit use via control-flow, so that seems like a good choice if we're early in optimization. Given that instcombine is used at every stage of the 'opt -Ox' pipeline though, I'm not sure what the overall implications will be.
So I don't object to this patch, but it would be good to hear from others if there are concerns.
I think this makes sense as a canonicalization: if we could express a select idiom using either a PHI or a select, the PHI probably makes sense as the canonical form. We clearly wouldn't want to canonicalize the other way. (The more general discussion of select vs. control flow is more centered around whether we should have the control flow in the first place; that's outside the scope of instcombine.)
The test coverage here seems a little slim; I'd like to see more coverage for interesting cases (in particular, cases where the PHI node doesn't have exactly two incoming edges).
We could try to generalize this transform to try to insert the PHI in blocks other than the block that contains the select; maybe leave that as a followup, though.
I agree. We did discuss this general question in https://bugs.llvm.org/show_bug.cgi?id=34603#c10 and I think the conclusion is still valid: Assuming we don't cause regressions because of passes that don't handle control flow (and don't cause significant compile-time slowdowns or memory overhead), canonicalizing in favor of PHIs is where we should be aiming.
Some more test cases I would suggest:
- An incoming multi-edge (same predecessor occurring multiple times) from a switch.
- A select operand coming from the terminator of a predecessor block, e.g. an invoke result.
- A select with a very obviously non-dominating input. All the existing examples look "salvageable" in some way (via phi translation or hoisting).
llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp | ||
---|---|---|
2466 | This looks like dead code, as we canonicalize branch of not by swapping successors. Might change with D81089, but for now it should be fine to just drop it. | |
2494 | Nit: Unnecessary braces. |
llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp | ||
---|---|---|
2466 | WDYM dead code? This code is needed to initialize variables TrueSucc and FalseSucc. |
llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp | ||
---|---|---|
2466 | By the moment when we process this Phi, we did not process its dominator block yet. So no, this branch will not be canonicalized when we are handling this one. |
An incoming multi-edge (same predecessor occurring multiple times) from a switch.
Not sure why it is important, but ok, will do.
A select operand coming from the terminator of a predecessor block, e.g. an invoke result.
We do not support that. Do you want a negative test?
A select with a very obviously non-dominating input.
A select cannot have a non-dominating input.
Thanks!
A select operand coming from the terminator of a predecessor block, e.g. an invoke result.
We do not support that. Do you want a negative test?
What I actually had in mind here is something along these lines:
declare i32 @__gxx_personality_v0(...) declare i32 @foo() define i32 @test_invoke_neg(i1 %cond, i32 %x, i32 %y) nounwind uwtable ssp personality i8* bitcast (i32 (...)* @__gxx_personality_v0 to i8*) { ; CHECK-LABEL: @test_invoke_neg( ; CHECK-NEXT: entry: ; CHECK-NEXT: br i1 [[COND:%.*]], label [[IF_TRUE:%.*]], label [[IF_FALSE:%.*]] ; CHECK: if.true: ; CHECK-NEXT: br label [[MERGE:%.*]] ; CHECK: if.false: ; CHECK-NEXT: [[RESULT:%.*]] = invoke i32 @foo() ; CHECK-NEXT: to label [[MERGE]] unwind label [[LPAD:%.*]] ; CHECK: merge: ; CHECK-NEXT: [[PHI:%.*]] = phi i32 [ 0, [[IF_TRUE]] ], [ [[RESULT]], [[IF_FALSE]] ] ; CHECK-NEXT: [[SEL:%.*]] = select i1 [[COND]], i32 1, i32 [[PHI]] ; CHECK-NEXT: ret i32 [[SEL]] ; CHECK: lpad: ; CHECK-NEXT: [[LP:%.*]] = landingpad { i1, i32 } ; CHECK-NEXT: filter [0 x i1] zeroinitializer ; CHECK-NEXT: unreachable ; entry: br i1 %cond, label %if.true, label %if.false if.true: br label %merge if.false: %result = invoke i32 @foo() to label %merge unwind label %lpad merge: %phi = phi i32 [ 0, %if.true ], [ %result, %if.false ] %sel = select i1 %cond, i32 1, i32 %phi ret i32 %sel lpad: %lp = landingpad { i1, i32 } filter [0 x i1] zeroinitializer unreachable }
That is, a case where the select input exactly dominates the terminator of the predecessor. But now that I wrote that, it's not really applicable to this patch, because there's the phi node sitting in between, so this is more a test for D82072 than this one.
A select with a very obviously non-dominating input.
A select cannot have a non-dominating input.
Sorry, what I meant here is that the input does not dominate the terminator of the predecessor. You do have some tests of this kind, but only as a side-effect of instructions being sunk into the block that contains the select, while the original test cases could be transformed. The suggestion here is to add one true-negative test case that will not transform even once the possible extensions (phi translation, hoisting instructions back) to this patch are implemented. Something like:
define i32 @test(i1 %cond, i32 %a, i32 %b) { ; CHECK-LABEL: @test( ; CHECK-NEXT: entry: ; CHECK-NEXT: br i1 [[COND:%.*]], label [[IF_TRUE:%.*]], label [[IF_FALSE:%.*]] ; CHECK: if.true: ; CHECK-NEXT: br label [[MERGE:%.*]] ; CHECK: if.false: ; CHECK-NEXT: br label [[MERGE]] ; CHECK: merge: ; CHECK-NEXT: [[PHI:%.*]] = phi i32 [ [[A:%.*]], [[IF_TRUE]] ], [ [[B:%.*]], [[IF_FALSE]] ] ; CHECK-NEXT: [[ADD:%.*]] = add i32 [[PHI]], 1 ; CHECK-NEXT: [[S:%.*]] = select i1 [[COND]], i32 0, i32 [[ADD]] ; CHECK-NEXT: ret i32 [[S]] ; entry: br i1 %cond, label %if.true, label %if.false if.true: br label %merge if.false: br label %merge merge: %phi = phi i32 [%a, %if.true], [%b, %if.false] %add = add i32 %phi, 1 %s = select i1 %cond, i32 0, i32 %add ret i32 %s }
In any case, this LGTM.
llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp | ||
---|---|---|
2466 | Hm, you're right. If I add an assert(0) in here, I do get some test-suite failures. D75362 should guarantee that the branch is canonicalized first, but until then it's not dead code indeed (though it should still picked up by fixpoint iteration, but best not rely on that). |
llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp | ||
---|---|---|
2466 | It's because InstCombine iterates blocks in post-order, so dominators go after dominated blocks. |
Ok, thanks. I'll add the test with invoke as one of selected values. As for another one (select argument does not dominate terminator), it already has coverage in exsting tests.
commit 9bff376e5c10ea384a6eee93f7d7668d670a66e7 Author: Max Kazantsev <mkazantsev@azul.com> Date: Tue Jun 23 11:46:59 2020 +0700 [InstCombine] Replace selects with Phis We can sometimes replace a select with a Phi node if all of its values are available on respective incoming edges. Differential Revision: https://reviews.llvm.org/D82005 Reviewed By: nikic
Hi,
We've noticed a crash that started occuring with this patch.
Running
opt -S -o - bbi-50827.ll -instcombine
results in
opt: ../include/llvm/ADT/DenseMap.h:1241: llvm::DenseMapIterator::pointer llvm::DenseMapIterator<llvm::BasicBlock *, std::unique_ptr<llvm::DomTreeNodeBase<llvm::BasicBlock>, std::default_delete<llvm::DomTreeNodeBase<llvm::BasicBlock> > >, llvm::DenseMapInfo<llvm::BasicBlock *>, llvm::detail::DenseMapPair<llvm::BasicBlock *, std::unique_ptr<llvm::DomTreeNodeBase<llvm::BasicBlock>, std::default_delete<llvm::DomTreeNodeBase<llvm::BasicBlock> > > >, true>::operator->() const [KeyT = llvm::BasicBlock *, ValueT = std::unique_ptr<llvm::DomTreeNodeBase<llvm::BasicBlock>, std::default_delete<llvm::DomTreeNodeBase<llvm::BasicBlock> > >, KeyInfoT = llvm::DenseMapInfo<llvm::BasicBlock *>, Bucket = llvm::detail::DenseMapPair<llvm::BasicBlock *, std::unique_ptr<llvm::DomTreeNodeBase<llvm::BasicBlock>, std::default_delete<llvm::DomTreeNodeBase<llvm::BasicBlock> > > >, IsConst = true]: Assertion `Ptr != End && "dereferencing end() iterator"' failed. PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace.
with bbi-50827.ll being
@a = external global i1, align 1 define i16 @g() { entry: %.b = load i1, i1* @a, align 1 %0 = select i1 %.b, i16 0, i16 5 br label %for.body.i for.body.i: ; preds = %for.body.i, %entry br label %for.body.i f.exit: ; No predecessors! ret i16 %0 }
Note the use of %0 in f.exit, which is not reachable from entry.
This looks similar to the problem in 94f6d365 ( https://llvm.org/PR48369 ). I'll take a look.
This should handle the test shown: 4f051fe
If there are still variations of this bug present, we might want to try a different fix in instcombine and/or domtree.
This looks like dead code, as we canonicalize branch of not by swapping successors. Might change with D81089, but for now it should be fine to just drop it.