This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Simplify inverted predicates in 'or'
AbandonedPublic

Authored by modocache on May 14 2017, 10:50 AM.

Details

Summary

Addresses PR#32791 (https://bugs.llvm.org//show_bug.cgi?id=32791).

When attempting to simplify an 'or' instruction, check whether its operands are
the results of 'select' instructions, and whether those instructions are the
inverse of one another (that is, their predicates are inverses of one another,
but their true and false operands are identical). If they are, the 'or' can
be simplified to a single 'select' instruction.

Event Timeline

modocache created this revision.May 14 2017, 10:50 AM
milseman added inline comments.
lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2288

You should probably follow the style of the surrounding code by using the pattern matchers. See PatternMatch.h, and an example at e.g. lines 2261 above.

modocache updated this revision to Diff 98935.May 14 2017, 3:43 PM

Use PatternMatch.h. Also, actually perform checks for zero operands (whoops!).

modocache marked an inline comment as done.May 14 2017, 3:43 PM
spatel added inline comments.May 14 2017, 4:48 PM
lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2269–2270

This comment didn't read clearly to me. The 'or' (not a comparison) is based on the selects.

Have the formula in the comment line up with the variable names in the code:
or (select (cmp Pred1, A, B), C, 0), (select (cmp Pred2, A, B), D, 0) --> select (cmp Pred1, A, B), C, D
when Pred1 is the inverse of Pred2.

2285–2290

Use another pattern matcher here (warning: untested code):

if (match(X, m_Cmp(Pred1, m_Value(A), m_Value(B))) &&
    match(Y, m_Cmp(Pred2, m_Specific(A), m_Specific(B))) &&
    CmpInst::getInversePredicate(Pred1) == Pred2)
test/Transforms/InstCombine/logical-select.ll
67

We need more tests:

  1. Check the case where the 0 operands are on the left instead of the right.
  2. The code should work with fcmps, so we need a test for that.
  3. The code should work with vectors, so we need a test for that.
modocache updated this revision to Diff 98939.May 14 2017, 6:24 PM

Use a match() in the nested condition, update comment for clarity. To come: additional tests.

modocache marked 2 inline comments as done.May 14 2017, 6:24 PM
modocache updated this revision to Diff 98942.May 14 2017, 8:26 PM

Add tests, and fix a bug that the 'reversed' tests uncovered.

modocache marked an inline comment as done.May 14 2017, 8:28 PM

Thanks again, @spatel! I think I've addressed all of your feedback.

I stared a bit at this patch, and I think it's correct.
OTOH, I have a feeling of perplexity that arises from the way we're looking at this.
In particular, it seems like we're widening the peephole a little too much -> or (select (cmp.
I'm fairly sure we do the same in other places in InstCombine, so there's a precedent.
I don't have a strong opinion on this, but I'd like it if @majnemer or @efriedma could take a look.

davide requested changes to this revision.EditedMay 14 2017, 10:51 PM

After giving this some more careful thought, I'm not necessarily sure InstCombine is the right pass to solve this problem (or at least, there's an alternative solution).
I've been convincing myself this is a value-range propagation/copy-propagation problem, and in fact, this is how GCC optimizes this code.
If I understand correctly, you start with some C code that looks like this: https://godbolt.org/g/0HBds4

Now you have some pseudo-IR that looks like this (actually, this is GIMPLE, so we can map to something real :))

<bb 2>:
if (a_3(D) < b_4(D))
  goto <bb 4>;
else
  goto <bb 3>;

<bb 3>:

<bb 4>:
# iftmp.0_1 = PHI <c_5(D)(2), 0(3)>
if (a_3(D) >= b_4(D))
  goto <bb 6>;
else
  goto <bb 5>;

<bb 5>:

<bb 6>:
# iftmp.1_2 = PHI <d_6(D)(4), 0(5)>
_7 = iftmp.0_1 | iftmp.1_2;
return _7;

your VRP algorithm can prove after SSA replacement (where LHS replaces RHS)

a_9 -> { a_3(D) }
a_10 -> { a_3(D) }
b_11 -> { b_4(D) }
b_12 -> { b_4(D) }

that your SSA values have the following ranges:

a_9: [-INF, b_4(D) + -1]
a_10: [b_4(D), +INF]
b_11: [a_9 + 1, +INF]
b_12: [-INF, a_10]

therefore removing the second compare and transforming your IR into:

<bb 2>:
if (a_3(D) < b_4(D))
  goto <bb 5>;
else
  goto <bb 4>;

<bb 3>:
# iftmp.1_2 = PHI <0(5), d_6(D)(4)>
# iftmp.0_15 = PHI <iftmp.0_14(5), iftmp.0_13(4)>
_7 = iftmp.0_15 | iftmp.1_2;
return _7;

<bb 4>:
# iftmp.0_13 = PHI <0(2)>
goto <bb 3>;

<bb 5>:
# iftmp.0_14 = PHI <c_5(D)(2)>
goto <bb 3>;

(at that point you have a bunch of trivial phi nodes that probably copy propagation or something else could clean up, you get the idea).

This revision now requires changes to proceed.May 14 2017, 10:51 PM
spatel edited edge metadata.May 15 2017, 8:11 AM

I agree that we're missing some kind of more general transform. Eg:

  int goo(int x, int y, int r) {
    if (x > y) r += 42;
    if (x <= y) r += 12;
    return r;
  }

define i32 @goo(i32, i32, i32) {
  %4 = icmp sgt i32 %0, %1
  %5 = add nsw i32 %2, 42
  %6 = select i1 %4, i32 %5, i32 %2
  %7 = add nsw i32 %6, 12
  %8 = select i1 %4, i32 %5, i32 %7
  ret i32 %8
}

https://godbolt.org/g/tszo72

It's worth noting that one motivation for PR32791 (although I forgot to write it there) is that I think we should canonicalize all:
not (cmp P, X, Y) --> cmp P', X, Y
Ie, invert the predicate to eliminate an explicit 'not' (xor) because then we have eliminated a dependency in the IR. This was discussed in D32725.

So we should look at how the existing InstCombine test ("@poo" is just above the new test here) fails if we implement that canonicalization:

define i32 @poo5(i32 %a, i32 %b, i32 %c, i32 %d) {
  %cmp = icmp slt i32 %a, %b
  %and1 = select i1 %cmp, i32 %c, i32 0
  %cmpnot = xor i1 %cmp, -1   <--- this can be replaced by "icmp sge i32 %a, %b", and we have the first testcase in this patch (the test from PR32791)
  %and2 = select i1 %cmpnot, i32 %d, i32 0
  %or = or i32 %and1, %and2
  ret i32 %or
}

define i32 @poo6(i32 %a, i32 %b, i32 %c, i32 %d) {
  %cmp = icmp slt i32 %a, %b
  %and1 = select i1 %cmp, i32 %c, i32 0
  %and2 = select i1 %cmp, i32 0, i32 %d  <--- we fold a 'not' compare by swapping the select operands
  %or = or i32 %and1, %and2
  ret i32 %or
}

For this pattern, there's a fold in InstCombiner::SimplifyUsingDistributiveLaws():

// (op (select (a, c, b)), (select (a, d, b))) -> (select (a, (op c, d), 0))
// (op (select (a, b, c)), (select (a, b, d))) -> (select (a, 0, (op c, d)))

Note: the implementation for that fold seems bogus. We create a new binop and a new select for *both* of the current selects in this example and then don't use one pair.

So if we want a more general InstCombine solution than what is currently proposed here, I think we should enhance that to account for an inverted predicate as well as matching predicates:
binop (select P, A, B), (select P', C, D) --> binop (select P, A, B), (select P, D, C)
and let the existing fold take it from there.

Swapping select operands to eliminate an inverted compare seems like a generally good fold for some other pass too. Would that be GVN?

Not entirely sure I follow this last one.
NewGVN finds whether two expressions e1 and e2 are equivalent (actually, not general equivalence as checking equivalence of program expressions is undecidable, so we approximate with Herbrand equivalence). If you have an example (IR), that will help (and I can take a look).

declare i32 @bar(i32, i32)

define i32 @foo(i32 %a, i32 %b, i32 %c, i32 %d) {
  %cmp = icmp slt i32 %a, %b
  %sel1 = select i1 %cmp, i32 %c, i32 0
  %cmpnot = icmp sge i32 %a, %b
  %sel2 = select i1 %cmpnot, i32 %d, i32 0
  %call = tail call i32 @bar(i32 %sel1, i32 %sel2)
  ret i32 %call
}

I think this should be canonicalized to:

define i32 @foo(i32 %a, i32 %b, i32 %c, i32 %d) {
  %cmp = icmp slt i32 %a, %b
  %sel1 = select i1 %cmp, i32 %c, i32 0
  %sel2 = select i1 %cmp, i32 0,  i32 %d
  %call = tail call i32 @bar(i32 %sel1, i32 %sel2)
  ret i32 %call
}

The general problem is that we should recognize when a compare value used by a select already exists in inverted form. If it does, swap the select operands and eliminate the inverted usage.

The problem may be more general than that. It's really about recognizing inverted compares, so let's take selects out of the equation (using div ops to thwart hoisting/sinking here, but that's just to produce a minimal IR example):

define i32 @inverted_compares(i32 %a, i32 %b, i32 %c) {
entry:
  %cmp = icmp slt i32 %a, %b
  br i1 %cmp, label %t1, label %f1

t1:
  %div1 = sdiv i32 42, %c
  br label %endif1

f1:
  %div2 = srem i32 43, %c
  br label %endif1

endif1:
  %phi1 = phi i32 [ %div1, %t1 ], [ %div2, %f1 ]
  %cmpnot = icmp sge i32 %a, %b
  br i1 %cmpnot, label %t2, label %f2

t2:
  %div3 = sdiv i32 44, %phi1
  br label %endif2

f2:
  %div4 = srem i32 45, %phi1
  br label %endif2

endif2:
  %phi2 = phi i32 [ %div3, %t2 ], [ %div4, %f2 ]
  %call = call i32 @bar(i32 %phi1, i32 %phi2)

  ret i32 %call
}

Because of instcombine predicate canonicalization rules, we optimize this to one compare:

define i32 @inverted_compares_optimized(i32 %a, i32 %b, i32 %c) {
entry:
  %cmp = icmp slt i32 %a, %b
  br i1 %cmp, label %f2, label %t2

t2:               
  %div2 = srem i32 43, %c
  %div3 = udiv i32 44, %div2
  br label %endif2

f2:                               
  %div1 = sdiv i32 42, %c
  %div4 = srem i32 45, %div1
  br label %endif2

endif2:                               
  %phi12 = phi i32 [ %div2, %t2 ], [ %div1, %f2 ]
  %phi2 = phi i32 [ %div3, %t2 ], [ %div4, %f2 ]
  %call = tail call i32 @bar(i32 %phi12, i32 %phi2)
  ret i32 %call
}

But if the compare has an extra use, the instcombine fold doesn't fire, and this won't optimize at -O2:

define i32 @inverted_compares_and_one(i32 %a, i32 %b, i32 %c) {
entry:
  %cmp = icmp slt i32 %a, %b
  br i1 %cmp, label %t1, label %f1

t1:
  %div1 = sdiv i32 42, %c
  br label %endif1

f1:
  %div2 = srem i32 43, %c
  br label %endif1

endif1:
  %phi1 = phi i32 [ %div1, %t1 ], [ %div2, %f1 ]
  %cmpnot = icmp sge i32 %a, %b
  br i1 %cmpnot, label %t2, label %f2

t2:
  %div3 = sdiv i32 44, %phi1
  br label %endif2

f2:
  %div4 = srem i32 45, %phi1
  br label %endif2

endif2:
  %phi2 = phi i32 [ %div3, %t2 ], [ %div4, %f2 ]
  %call = call i32 @bar_extra_cmp_use(i32 %phi1, i32 %phi2, i1 %cmpnot)
  ret i32 %call
}

We could have gotten this to:

define i32 @inverted_compares_optimized(i32 %a, i32 %b, i32 %c) {
entry:
  %cmp = icmp slt i32 %a, %b
  br i1 %cmp, label %f2, label %t2

t2:               
  %div2 = srem i32 43, %c
  %div3 = udiv i32 44, %div2
  br label %endif2

f2:                               
  %div1 = sdiv i32 42, %c
  %div4 = srem i32 45, %div1
  br label %endif2

endif2:                               
  %phi12 = phi i32 [ %div2, %t2 ], [ %div1, %f2 ]
  %phi2 = phi i32 [ %div3, %t2 ], [ %div4, %f2 ]
  %cmpnot = icmp sge i32 %a, %b
  %call = call i32 @bar_extra_cmp_use(i32 %phi12, i32 %phi2, i1 %cmpnot)
  ret i32 %call
}

If you really want to canonicalize (and whether we should still needs to be evaluated), I think you might consider instead canonicalizing to:

declare i32 @bar(i32, i32)

define i32 @foo(i32 %a, i32 %b, i32 %c, i32 %d) {
  %cmp = icmp slt i32 %a, %b
  %sel1 = select i1 %cmp, i32 %c, i32 0
  %cmpnot = icmp slt i32 %a, %b
  %sel2 = select i1 %cmpnot, i32 0, i32 %d
  %call = tail call i32 @bar(i32 %sel1, i32 %sel2)
  ret i32 %call
}

and then let NewGVN discover that %cmpnot and %cmp are actually in the same congruence class (GVN will also get to the same conclusion, but it doesn't have a real analysis/notion of congruence class).

A couple of updates:

  1. Added the tests included here to show the current IR with: rL303133 / rL303185
  2. Refactored some code that may be relevant for this transform: rL303261

I think the predicate canonicalization that we do for icmp+branch is worthwhile because it increases the likelihood of creating duplicate values (fcmp needs fixes IMO). It also can make pattern matching for other transforms easier if we know that half of the possible predicates are eliminated by canonicalization (at least for the single-use case). For the same reasons, I favor making that canonicalization apply to cmp+select as well. Ie, I would invert the predicates / swap the select operands shown in the tests here. This probably requires preliminary patches to avoid regressions though.

modocache updated this revision to Diff 99665.May 20 2017, 1:59 AM
modocache edited edge metadata.

Rebased onto rL303133. Sorry, I'm a new contributor, so I'm having trouble understanding how to proceed here. Is there anything specifically I should change in this diff? Should I be using the isCanonicalPredicate helper added in rL303261? Or should I be using a different approach altogether?

It turns out this simple patch raises deep philosophical questions for LLVM. :)
http://lists.llvm.org/pipermail/llvm-dev/2017-May/113184.html

I agreed with Davide that the patch as written is too specific. We should handle this problem more generally, and likely that's a shortcoming of some other pass. IMO and regardless of that, InstCombine is still missing a canonicalization of the cmp predicate that is feeding the select. I did not see any other comments to know if anyone else agrees or disagrees with that position. If you disagree with that, then I assume you would also want to see the related cmp+branch canonicalization removed? I suspect that would require some thorough perf analysis to justify.

But since we have that canonicalization today, I think the immediate problem will be solved either here in InstCombine or somewhere else using existing logic.

There's also a question about whether the existing fold that I pointed out:
// (op (select (a, c, b)), (select (a, d, b))) -> (select (a, (op c, d), 0))
should exist in InstCombine. Again, whether you think that belongs in InstCombine or not, IMO is irrelevant to the immediate goal of solving PR32791 because the fold already exists, and there's no evidence that some other pass handles that case. If you advocate removing that, then you have to come up with a better solution...which could be a major undertaking at this point.

So to answer the question: yes, I think we should use the helper added in rL303261 to canonicalize cmps used by selects. However, doing that likely requires not introducing any known regressions to instcombine, and I already know there will be one. :)

Let me see if I can solve that one non-controversially and report back here on that.

There are at least a couple of current patches that delve into the same philosophical territory, so I think it's worth following the discussions here too:
https://reviews.llvm.org/D33342
https://reviews.llvm.org/D33338

Ah OK, thanks! I'll keep this open and try to follow along with the discussions as best I can. Thanks for the reviews!

Nothing's easy with this one...the transform that I thought could incur a regression may not even be legal...but it's not clear if anyone knows for certain. :)
http://lists.llvm.org/pipermail/llvm-dev/2017-May/113261.html

modocache abandoned this revision.Jun 24 2017, 8:36 PM

This is a bit beyond my depth at this point, sorry! Going to abandon this one :)