This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Replace selects with Phis
ClosedPublic

Authored by mkazantsev on Jun 17 2020, 5:23 AM.

Details

Summary

We can sometimes replace a select with a Phi node if all of its values
are available on respective incoming edges.

Diff Detail

Event Timeline

mkazantsev created this revision.Jun 17 2020, 5:23 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 17 2020, 5:23 AM

Recent work by @lebedev.ri went opposite way; phi -> select (mainly to exploit cmov on x86) so I am wondering why should we go this way..

spatel added a subscriber: spatel.Jun 17 2020, 7:51 AM

I'm not sure what direction we want to take this, but forming phis out of selects may conflict with SimplifyCFG. It can create select from phi in cases like this:

declare void @call(i32)

define i32 @select_or_phi(i1 %cond, i32 %x, i32 %y) {
entry:
  br i1 %cond, label %if.true, label %if.false

if.true:
  call void @call(i32 %x)
  br label %merge

if.false:
  call void @call(i32 %y)
  br label %merge

merge:
  %r1 = phi i32 [ %y, %if.false ], [ %x, %if.true ]
  %r2 = select i1 %cond, i32 %x, i32 %y
  %r = add i32 %r1, %r2
  ret i32 %r
}
$ opt -O1 phi.ll -S

declare void @call(i32)

define i32 @select_or_phi(i1 %cond, i32 %x, i32 %y) {
entry:
  %y.sink = select i1 %cond, i32 %x, i32 %y
  %r1 = select i1 %cond, i32 %x, i32 %y
  call void @call(i32 %y.sink)
  %r2 = select i1 %cond, i32 %x, i32 %y
  %r = add i32 %r1, %r2
  ret i32 %r
}

I believe selects are better for other optimizations than phis

Canonicalization between phis and select in either direction are pretty dangerous I think, I remember seeing discussions around this a couple times on llvm-dev, for example https://groups.google.com/d/msg/llvm-dev/VcJnLMI7Deg/20b3kBGMCgAJ.

We might want to look at some of the specific issues that D81375 ran into. Here is one: https://llvm.godbolt.org/z/KN9GFh We should be replacing %s with %c, but nothing in the pipeline does that. This looks like something we should be addressing anyway, because I don't think this patch does.

mkazantsev added a comment.EditedJun 17 2020, 9:14 PM

Recent work by @lebedev.ri went opposite way; phi -> select (mainly to exploit cmov on x86) so I am wondering why should we go this way..

I believe selects are better for other optimizations than phis

If you look at the test select_dominating_cond_and_sink at select.ll, it shows an opportunity how we can reduce the amount of work on each execution path by sinking TrueValue and FalseValue to different branches. Maybe it will be even more obvious if TrueValue and FalseValue will be not as simple as mul and add, but some huge formulas, such that moving them to one of branches gives great benefit. Compared to this benefit, advantage of having a cmov is negligible.

If we turn Phis into Selects, we will always have all those values on common path. If we think more of it, Phis open up great opportunities for jump threading and PRE while selects don't.

If the only reason of preferring selects over phis is their better lowering on some platforms, then there should be a backend transform that does it. From optimizer's perspective, Phi has more advantages.

mkazantsev added a comment.EditedJun 17 2020, 9:18 PM

I'm not sure what direction we want to take this, but forming phis out of selects may conflict with SimplifyCFG. It can create select from phi in cases like this:

There is no contradiction here. It's OK to turn Phi into select if we also reduce number of blocks. Add volatile stores to every branch of your example and SimplifyCFG will stop working.
But Select into a Phi provides opportunity for redundancy elimination (see my comment above) and will potentially help Jump Threading.

mkazantsev added a comment.EditedJun 17 2020, 9:54 PM

In general, I'm pretty sure that InstCombine is just a wrong place to do any backend-targeted optimizations (we've seen these in transforms that turn analyzable comparison conditions into weird mixture of shifts and xors that no other pass understands).

nikic added a comment.EditedJun 18 2020, 12:36 AM

@mkazantsev I don't think this is about backend-specific optimizations (which indeed InstCombine should not be concerned about), phis and selects generally have different weak points in mid-level optimization. While you are right that the use of phis enables better jump threading, phis will usually not work with redundancy elimination and code motion optimizations. E.g. I don't think CSE or GVN do anything with phis right now.

That said, I can definitely see the argument that if you already have a branch, it is beneficial to represent a select as a phi, and leave the decision on whether to turn the whole branch into a sequence of selects to SimplifyCFG.

mkazantsev added a comment.EditedJun 18 2020, 3:23 AM

@mkazantsev I don't think this is about backend-specific optimizations (which indeed InstCombine should not be concerned about), phis and selects generally have different weak points in mid-level optimization. While you are right that the use of phis enables better jump threading, phis will usually not work with redundancy elimination and code motion optimizations. E.g. I don't think CSE or GVN do anything with phis right now.

That said, I can definitely see the argument that if you already have a branch, it is beneficial to represent a select as a phi, and leave the decision on whether to turn the whole branch into a sequence of selects to EarlyCSE.

Well, my patch doesn't break any existing test. I also have tests that get improved and may be potentially improved further. If you say that it might *not* be a profitable transform, please come up with a particular example.

Actually, GVN does *a lot* about Phis (and mostly Phis) in PRE. And EarlyCSE does not walk above merge points anyways.

@mkazantsev I don't think this is about backend-specific optimizations (which indeed InstCombine should not be concerned about), phis and selects generally have different weak points in mid-level optimization. While you are right that the use of phis enables better jump threading, phis will usually not work with redundancy elimination and code motion optimizations. E.g. I don't think CSE or GVN do anything with phis right now.

That said, I can definitely see the argument that if you already have a branch, it is beneficial to represent a select as a phi, and leave the decision on whether to turn the whole branch into a sequence of selects to EarlyCSE.

Well, my patch doesn't break any existing test. I also have tests that get improved and may be potentially improved further. If you say that it might *not* be a profitable transform, please come up with a particular example.

Actually, GVN does *a lot* about Phis (and mostly Phis) in PRE. And EarlyCSE does not walk above merge points anyways.

This discussion reminds me of:
https://bugs.llvm.org/show_bug.cgi?id=34603
cc @efriedma @hfinkel @reames

This patch converts an explicit use of the condition value into an implicit use via control-flow, so that seems like a good choice if we're early in optimization. Given that instcombine is used at every stage of the 'opt -Ox' pipeline though, I'm not sure what the overall implications will be.

So I don't object to this patch, but it would be good to hear from others if there are concerns.

I think this makes sense as a canonicalization: if we could express a select idiom using either a PHI or a select, the PHI probably makes sense as the canonical form. We clearly wouldn't want to canonicalize the other way. (The more general discussion of select vs. control flow is more centered around whether we should have the control flow in the first place; that's outside the scope of instcombine.)

The test coverage here seems a little slim; I'd like to see more coverage for interesting cases (in particular, cases where the PHI node doesn't have exactly two incoming edges).

We could try to generalize this transform to try to insert the PHI in blocks other than the block that contains the select; maybe leave that as a followup, though.

I think this makes sense as a canonicalization: if we could express a select idiom using either a PHI or a select, the PHI probably makes sense as the canonical form. We clearly wouldn't want to canonicalize the other way. (The more general discussion of select vs. control flow is more centered around whether we should have the control flow in the first place; that's outside the scope of instcombine.)

The test coverage here seems a little slim; I'd like to see more coverage for interesting cases (in particular, cases where the PHI node doesn't have exactly two incoming edges).

We could try to generalize this transform to try to insert the PHI in blocks other than the block that contains the select; maybe leave that as a followup, though.

I agree. We did discuss this general question in https://bugs.llvm.org/show_bug.cgi?id=34603#c10 and I think the conclusion is still valid: Assuming we don't cause regressions because of passes that don't handle control flow (and don't cause significant compile-time slowdowns or memory overhead), canonicalizing in favor of PHIs is where we should be aiming.

Added multi-entry block tests.

nikic added a comment.Jun 20 2020, 3:05 AM

Some more test cases I would suggest:

  • An incoming multi-edge (same predecessor occurring multiple times) from a switch.
  • A select operand coming from the terminator of a predecessor block, e.g. an invoke result.
  • A select with a very obviously non-dominating input. All the existing examples look "salvageable" in some way (via phi translation or hoisting).
llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
2466

This looks like dead code, as we canonicalize branch of not by swapping successors. Might change with D81089, but for now it should be fine to just drop it.

2494

Nit: Unnecessary braces.

mkazantsev marked an inline comment as done.Jun 21 2020, 9:39 PM
mkazantsev added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
2466

WDYM dead code? This code is needed to initialize variables TrueSucc and FalseSucc.

mkazantsev marked an inline comment as done.Jun 21 2020, 9:41 PM
mkazantsev added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
2466

By the moment when we process this Phi, we did not process its dominator block yet. So no, this branch will not be canonicalized when we are handling this one.

An incoming multi-edge (same predecessor occurring multiple times) from a switch.

Not sure why it is important, but ok, will do.

A select operand coming from the terminator of a predecessor block, e.g. an invoke result.

We do not support that. Do you want a negative test?

A select with a very obviously non-dominating input.

A select cannot have a non-dominating input.

Added test with duplicating preds and negative test with invokes. Removed braces.

mkazantsev marked an inline comment as done.Jun 21 2020, 10:22 PM
nikic accepted this revision.Jun 22 2020, 12:28 AM

An incoming multi-edge (same predecessor occurring multiple times) from a switch.

Not sure why it is important, but ok, will do.

Thanks!

A select operand coming from the terminator of a predecessor block, e.g. an invoke result.

We do not support that. Do you want a negative test?

What I actually had in mind here is something along these lines:

declare i32 @__gxx_personality_v0(...)
declare i32 @foo()

define i32 @test_invoke_neg(i1 %cond, i32 %x, i32 %y) nounwind uwtable ssp personality i8* bitcast (i32 (...)* @__gxx_personality_v0 to i8*) {
; CHECK-LABEL: @test_invoke_neg(
; CHECK-NEXT:  entry:
; CHECK-NEXT:    br i1 [[COND:%.*]], label [[IF_TRUE:%.*]], label [[IF_FALSE:%.*]]
; CHECK:       if.true:
; CHECK-NEXT:    br label [[MERGE:%.*]]
; CHECK:       if.false:
; CHECK-NEXT:    [[RESULT:%.*]] = invoke i32 @foo()
; CHECK-NEXT:    to label [[MERGE]] unwind label [[LPAD:%.*]]
; CHECK:       merge:
; CHECK-NEXT:    [[PHI:%.*]] = phi i32 [ 0, [[IF_TRUE]] ], [ [[RESULT]], [[IF_FALSE]] ]
; CHECK-NEXT:    [[SEL:%.*]] = select i1 [[COND]], i32 1, i32 [[PHI]]
; CHECK-NEXT:    ret i32 [[SEL]]
; CHECK:       lpad:
; CHECK-NEXT:    [[LP:%.*]] = landingpad { i1, i32 }
; CHECK-NEXT:    filter [0 x i1] zeroinitializer
; CHECK-NEXT:    unreachable
;
entry:
  br i1 %cond, label %if.true, label %if.false

if.true:
  br label %merge

if.false:
  %result = invoke i32 @foo()
  to label %merge unwind label %lpad

merge:
  %phi = phi i32 [ 0, %if.true ], [ %result, %if.false ]
  %sel = select i1 %cond, i32 1, i32 %phi
  ret i32 %sel

lpad:
  %lp = landingpad { i1, i32 }
  filter [0 x i1] zeroinitializer
  unreachable
}

That is, a case where the select input exactly dominates the terminator of the predecessor. But now that I wrote that, it's not really applicable to this patch, because there's the phi node sitting in between, so this is more a test for D82072 than this one.

A select with a very obviously non-dominating input.

A select cannot have a non-dominating input.

Sorry, what I meant here is that the input does not dominate the terminator of the predecessor. You do have some tests of this kind, but only as a side-effect of instructions being sunk into the block that contains the select, while the original test cases could be transformed. The suggestion here is to add one true-negative test case that will not transform even once the possible extensions (phi translation, hoisting instructions back) to this patch are implemented. Something like:

define i32 @test(i1 %cond, i32 %a, i32 %b) {
; CHECK-LABEL: @test(
; CHECK-NEXT:  entry:
; CHECK-NEXT:    br i1 [[COND:%.*]], label [[IF_TRUE:%.*]], label [[IF_FALSE:%.*]]
; CHECK:       if.true:
; CHECK-NEXT:    br label [[MERGE:%.*]]
; CHECK:       if.false:
; CHECK-NEXT:    br label [[MERGE]]
; CHECK:       merge:
; CHECK-NEXT:    [[PHI:%.*]] = phi i32 [ [[A:%.*]], [[IF_TRUE]] ], [ [[B:%.*]], [[IF_FALSE]] ]
; CHECK-NEXT:    [[ADD:%.*]] = add i32 [[PHI]], 1
; CHECK-NEXT:    [[S:%.*]] = select i1 [[COND]], i32 0, i32 [[ADD]]
; CHECK-NEXT:    ret i32 [[S]]
;
entry:
  br i1 %cond, label %if.true, label %if.false

if.true:
  br label %merge

if.false:
  br label %merge

merge:
  %phi = phi i32 [%a, %if.true], [%b, %if.false]
  %add = add i32 %phi, 1
  %s = select i1 %cond, i32 0, i32 %add
  ret i32 %s
}

In any case, this LGTM.

llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
2466

Hm, you're right. If I add an assert(0) in here, I do get some test-suite failures. D75362 should guarantee that the branch is canonicalized first, but until then it's not dead code indeed (though it should still picked up by fixpoint iteration, but best not rely on that).

This revision is now accepted and ready to land.Jun 22 2020, 12:28 AM
mkazantsev marked an inline comment as done.Jun 22 2020, 9:36 PM
mkazantsev added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
2466

It's because InstCombine iterates blocks in post-order, so dominators go after dominated blocks.

Ok, thanks. I'll add the test with invoke as one of selected values. As for another one (select argument does not dominate terminator), it already has coverage in exsting tests.

mkazantsev closed this revision.Jun 22 2020, 10:31 PM
commit 9bff376e5c10ea384a6eee93f7d7668d670a66e7
Author: Max Kazantsev <mkazantsev@azul.com>
Date:   Tue Jun 23 11:46:59 2020 +0700

    [InstCombine] Replace selects with Phis

    We can sometimes replace a select with a Phi node if all of its values
    are available on respective incoming edges.

    Differential Revision: https://reviews.llvm.org/D82005
    Reviewed By: nikic

Hi,

We've noticed a crash that started occuring with this patch.
Running

opt -S -o - bbi-50827.ll -instcombine

results in

opt: ../include/llvm/ADT/DenseMap.h:1241: llvm::DenseMapIterator::pointer llvm::DenseMapIterator<llvm::BasicBlock *, std::unique_ptr<llvm::DomTreeNodeBase<llvm::BasicBlock>, std::default_delete<llvm::DomTreeNodeBase<llvm::BasicBlock> > >, llvm::DenseMapInfo<llvm::BasicBlock *>, llvm::detail::DenseMapPair<llvm::BasicBlock *, std::unique_ptr<llvm::DomTreeNodeBase<llvm::BasicBlock>, std::default_delete<llvm::DomTreeNodeBase<llvm::BasicBlock> > > >, true>::operator->() const [KeyT = llvm::BasicBlock *, ValueT = std::unique_ptr<llvm::DomTreeNodeBase<llvm::BasicBlock>, std::default_delete<llvm::DomTreeNodeBase<llvm::BasicBlock> > >, KeyInfoT = llvm::DenseMapInfo<llvm::BasicBlock *>, Bucket = llvm::detail::DenseMapPair<llvm::BasicBlock *, std::unique_ptr<llvm::DomTreeNodeBase<llvm::BasicBlock>, std::default_delete<llvm::DomTreeNodeBase<llvm::BasicBlock> > > >, IsConst = true]: Assertion `Ptr != End && "dereferencing end() iterator"' failed.
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace.

with bbi-50827.ll being

@a = external global i1, align 1

define i16 @g()  {
entry:
  %.b = load i1, i1* @a, align 1
  %0 = select i1 %.b, i16 0, i16 5
  br label %for.body.i

for.body.i:                                       ; preds = %for.body.i, %entry
  br label %for.body.i

f.exit:                                           ; No predecessors!
  ret i16 %0
}

Note the use of %0 in f.exit, which is not reachable from entry.

Note the use of %0 in f.exit, which is not reachable from entry.

This looks similar to the problem in 94f6d365 ( https://llvm.org/PR48369 ). I'll take a look.

Note the use of %0 in f.exit, which is not reachable from entry.

This looks similar to the problem in 94f6d365 ( https://llvm.org/PR48369 ). I'll take a look.

This should handle the test shown: 4f051fe
If there are still variations of this bug present, we might want to try a different fix in instcombine and/or domtree.

This should handle the test shown: 4f051fe

Yes, thanks!