Download Raw Diff

Details

Reviewers

t.p.northover
reames
majnemer
gberry
sanjoy
llvm-commits
hfinkel
mcrosier

Commits

rGcf3bcb2625a7: [InstCombine] Canonicalize icmp instructions based on dominating conditions.
rL268521: [InstCombine] Canonicalize icmp instructions based on dominating conditions.

Summary

This patch canonicalizes conditions based on the constant range information
of the dominating branch condition.
For example:

  %cmp = icmp slt i64 %a, 0
  br i1 %cmp, label %land.lhs.true, label %lor.rhs
  lor.rhs:
    %cmp2 = icmp sgt i64 %a, 0

Would now be canonicalized into:

  %cmp = icmp slt i64 %a, 0
  br i1 %cmp, label %land.lhs.true, label %lor.rhs
  lor.rhs:
    %cmp2 = icmp ne i64 %a, 0

Diff Detail

Repository: rL LLVM

Event Timeline

bmakam updated this revision to Diff 52839.Apr 6 2016, 12:53 PM

bmakam retitled this revision from to [InstCombine] Canonicalize icmp instructions based on dominating conditions..

bmakam updated this object.

bmakam added reviewers: mcrosier, gberry, t.p.northover, llvm-commits.

Herald added a subscriber: mcrosier. · View Herald TranscriptApr 6 2016, 12:53 PM

Is this the best pass for this change to be in? ISTM that something like jump threading would be more appropriate.

In D18841#393646, @majnemer wrote:

Is this the best pass for this change to be in? ISTM that something like jump threading would be more appropriate.

I see no reason to only solve this one way. This could be handled at instsimplify or early-cse or jumpthreading or instcombine or gvn. I felt instcombine to be better place because we have dominator tree info and there already exists a function isSignBitCheck that checks if the comparison only checks the sign bit.

Fixed a bug. We need to guarantee that the dominating branch condition holds at the compare instruction, so it is not enough to check the immediate dominator. We also need to ensure that the edge dominates the compare instruction.

Addressed few minor nits to reduce the amount of work done and also added another test for an interesting case found in spec2006/povray.
FWIW, this change has 8% speedup in spec2006/soplex on Kryo and some small wins in other CINT2006 benchmarks.

Ping?

Ping #2?
I plan to improve this patch to also handle PR23333, but wanted to get consensus on the current approach. I believe this can be also solved in JumpThreading too but it will have to depend on LazyValueInfo which is known to have compile-time implications. Please let me know if this approach is reasonable.

bmakam added reviewers: reames, sanjoy.Apr 14 2016, 4:00 AM

bmakam added a reviewer: hfinkel.Apr 14 2016, 4:16 AM

bmakam mentioned this in D18905: [ValueTracking] Improve isImpliedCondition for conditions with matching operands..Apr 15 2016, 10:40 AM

Hi,

I think this is an overtly specific solution. Ideally we'd do
something that would generalize to cases like

if (a >= 5)
  if (a < 6) {
    FOO;
  }

if (a == 5) {
  FOO;
}

as well.

IMO the right layering for all the predicate logic you have here is in
ConstantRange. Just like we have makeAllowedICmpRegion and
makeSatisfyingICmpRegion, we should have a makeExactICmpRegion
where the RHS is not a ConstantRange but an APInt and have it
return *exactly* the set of values that satisfy the predicate (i.e. a
value satisfies the predicate *iff* it is in that set). Then the
current algorithm could become:

For a dominating icmp instruction "Var Pred RHS",
    compute R_Dom = makeAllowedICmpRegion(Pred, getRange(RHS));
If the current icmp is of the form "Var Pred ConstRHS" then
    compute R_Self = makeExactICmpRegion(Pred, ConstRHS)
    if R_Self.intersect(R_Dom) is empty:
      replace current icmp with false
    if R_Self.intersect(R_Dom) has exactly one element E:
      replace current icmp with "Var == E"
    if R_Self.intersect(R_Dom).complement() has exact one element E:
      replace current icmp with "Var != E"

lib/Transforms/InstCombine/InstCombineCompares.cpp
3263 ↗	(On Diff #53296)	If you're going to restrict yourself to this case, I don't think you should depend on `DT`. Instead you can just check that the only predecessor of `Parent` is `TrueBB` or `FalseBB` (depending on `TrueIfSigned`) and be done with it, since the only value of `IDom` you'll succeed with here is `Parent->getSinglePredecessor()`.

Thanks for the review Sanjoy. Please let me know if this looks reasonable.

sanjoy requested changes to this revision.Apr 21 2016, 11:37 PM

sanjoy edited edge metadata.

sanjoy added inline comments.

lib/Transforms/InstCombine/InstCombineCompares.cpp
3284 ↗	(On Diff #54417)	I think you also need to check that `TrueBB` and `FalseBB` are not equal.
3288 ↗	(On Diff #54417)	I think what you want is `makeAllowedICmpRegion(getInversePredicate(Pred), CI2)`. In this case (where `Other` is a singleton set) what you have is fine, but generally (i.e. if the RHS was not a constant integer, but a range) it is not. Actually, it might be clearer to introduce a new helper: `ConstantRange::makeExactICmpRegion` that takes an `APInt` as RHS, and returns a set that is equal to both the "allowed" and "satisfying" icmp range.
3290 ↗	(On Diff #54417)	This is confusingly named -- by "RHS", I'd normally understand `CI` or `CI2`; but what you mean here is the dominated LHS. How about calling it just that: `DominatedLHSRange`?
3295 ↗	(On Diff #54417)	Why do you care about `isSignBit`?
test/Transforms/InstCombine/icmp.ll
2038 ↗	(On Diff #54417)	Can you add a dominated check with, say, `10` or something just to demonstrate that we're general here?

This revision now requires changes to proceed.Apr 21 2016, 11:37 PM

Addressed Sanjoy's comments and added more tests.

lib/Transforms/InstCombine/InstCombineCompares.cpp
3284 ↗	(On Diff #54417)	I think we won't match this pattern if TrueBB == FalseBB because instcombine will already have transformed such IR into br i1 undef, label %dest, label %dest. I will add an assert to make sure they aren't equal.
3288 ↗	(On Diff #54417)	I am not sure if I follow this correctly, could you please give an example?. If the new helper takes an APInt as RHS isn't it a singleton set in this case? Taking a general example if Pred is ult and Other is i8 [2, 5) makeAllowedICmpRegion would return [0, 4) and make SatisfyingICmpRegion would return [0, 2). Isn't a set that is equal to both same as the "satisfying" icmp range i.e [0, 2) in this example?
3290 ↗	(On Diff #54417)	How about using CR and DominatingCR corresponding to CI and CI2?
3295 ↗	(On Diff #54417)	If it is a SignBit, canonicalizing could transform for example an icmp slt 0 into icmp eq/ne 0 which we want to avoid in situations where the icmp is only used as input to a branch because it will pessimize the codegen by generating a compare and branch on zero instruction (cbz) instead of a test and branch(tbz). tbz has better branch displacement than a cbz. If the icmp is an input to anything other than branches such as a select we would be better of canonicalizing it to eq/ne. Added a helper function isZBranchCheck to restrict it further for this case and added testcases.

sanjoy requested changes to this revision.Apr 22 2016, 8:13 PM

sanjoy edited edge metadata.

sanjoy added inline comments.

lib/Transforms/InstCombine/InstCombineCompares.cpp
118 ↗	(On Diff #54645)	I'd use an architecture neutral name here, if possible.
3293 ↗	(On Diff #54645)	My bias is that it is better to have these transforms be correct in isolation, and not rely on ordering across a large pass like -instcombine. But if you're sure that the "branch on undef" optimization is guaranteed to kick in before this optimization (i.e. that borders around "obvious"), then feel free to add an assert instead of checking and bailing out.
3297 ↗	(On Diff #54645)	Yeah, I wasn't as clear as I should have been. The "problem" here is that the logic is structured in a way that will not generalize to RHS being a non-singleton range. For instance, if we have if (lhs u< "[2, 5)") { ... } else { // This is what you're trying to optimize if (lhs u< 3) print("hello"); } then currently you will infer the range of `lhs` on the else branch to be `[0, 4).inverse()` == `[4, 0)`, and fold `lhs u< 3` to `false`. But `lhs u< 3` can be true also, if the RHS (the value we know is in `[2, 5)`) was `2` and `lhs` was `2`. In other words, in the above IR you know that in the "then" case `lhs` cannot be unsigned-greater than 4, but that does not mean that in the else case `lhs` has to be unsigned greater than 4. Now I think this won't be an observable bug if RHS is always a singleton set, but I'd like to have some measures in place that make it clear what's going on. There are two ways we can do that: For a singleton RHS, I think (but please verify) `makeAllowedICmpRegion` and `makeSatisfyingICmpRegion` will return the same exact set, which contains _exactly_ the value that the predicate is true for. If this assumption is true, my original suggestion was to add a `makeExactICmpRegion` method as: makeExactICmpRegion(...) { assert(makeAllowedICmpRegion(...) == makeSatisfyingICmpRegion(...)); return makeAllowedICmpRegion(...); } and to use that here instead of `makeAllowedICmpRegion`. It would take an `APInt` as RHS, and if someone comes along and tries to generalize this code to work on non-singleton RHSs, they'll immediately know that it isn't a 100% straightforward. The second possibility is to generalize this logic to work with a non-singleton RHS range. To do that, we can start by allowing the RHS to be a load instruction with `!range` metadata, and eventually move over to even more sophisticated analyses.
3299 ↗	(On Diff #54645)	SGTM

This revision now requires changes to proceed.Apr 22 2016, 8:13 PM

Addressed Sanjoy's comments.

lib/Transforms/InstCombine/InstCombineCompares.cpp
118 ↗	(On Diff #54645)	Does 'isBranchOnSignBitCheck' sound good?
3293 ↗	(On Diff #54645)	Since instcombine visits the instruction in topological order, we will always visit the dominating branch before visting the icmp instruction and so it is guranteed that the branch condition is undef when the condition is irrelevant.
3297 ↗	(On Diff #54645)	For a singleton RHS, I think (but please verify) makeAllowedICmpRegion and makeSatisfyingICmpRegion will return the same exact set, which contains _exactly_ the value that the predicate is true for. I think this assumption is not always true, see my previous example. then currently you will infer the range of lhs on the else branch to be [0, 4).inverse() == [4, 0) I see your point here, I have already modified this to 'makeAllowedICmpRegion(getInversePredicate(Pred), CI2)' based on your earlier suggestion. The second possibility is to generalize this logic to work with a non-singleton RHS range. I have no idea if non-singleton RHS ranges occur enough to be worthwhile to generalize. Do you mind if I defer this suggestion?

correctly update the use of isBranchOnSignBitCheck()

Hi Balaram,

I've replied to your inline comment, but it is possible that we're just talking past each other. I'm usually on #llvm as sanjoyd on PDT daytime, we can chat there in realtime if you think that'll help.

lib/Transforms/InstCombine/InstCombineCompares.cpp
3294 ↗	(On Diff #54879)	Add a comment here on why that is true.
3297 ↗	(On Diff #54879)	I think this assumption is not always true, see my previous example. Are you talking about the "Taking a general example if Pred is ult and Other is i8 [2, 5) makeAllowedICmpRegion would return [0, 4) and make SatisfyingICmpRegion would return [0, 2). Isn't a set that is equal to both same as the "satisfying" icmp range i.e [0, 2) in this example?" bit? In that case isn't the RHS (i.e. `Other`) `[2, 5)`, specifically not a singleton set? If you're talking about another example, then I must have missed it. I have already modified this to 'makeAllowedICmpRegion(getInversePredicate(Pred), CI2)' based on your earlier suggestion. That's not the only place where the current code will not generalize to a non-singleton RHS. For instance, the `Intersection.isSingleElement()` check won't generalize either -- you cannot transform if (a u< "[2, 5)") { // AllowedRegion = [0, 4) if (a >=u "[3, 5)") { // AllowedRegion = [3, 0) print(); } else abort(); } to if (a u< "[2, 5)") if (a == 3) print(); else abort(); since for `"[2, 5)"` == `4` and `"[3, 5)"` == `4` and `a` == `3`, the first program would have called `abort()` but the new program will call `print()`. On the other hand, for the `Intersection.isEmptySet()` you do need to call `AllowedICmpRegion` on both the conditions. There is a general pattern here -- `makeAllowedICmpRegion` only lets you "rule out" certain values (e.g. if `a >=u "[3, 5)"` then `a` definitely cannot be in {`0`, `1`, `2`}, but it can be anything else), but it does not guarantee that the comparison will fold one way or the other. IOW, it is the "only if" part of the constraint (the comparison succeeds only if the LHS in the returned set), but not the "if" part (just because LHS is in the returned set does not mean the comparison will succeed). I have no idea if non-singleton RHS ranges occur enough to be worthwhile to generalize. Do you mind if I defer this suggestion? Sure, but in that case we need to have the logic phrased in a way that makes it obvious that the current logic is not fully general (i.e. with a `makeExactICmpRegion`). I don't think non-singletom RHS ranges are too rare: if (a u< 100) if (b u< a) if (b >u 500) In the inner condition `b u< a`, you know the RHS is in `[0, 100)`, and you can use that information to fold the innermost condition to `false`. However, it is fine to not handle these cases as a starting point.

majnemer added inline comments.Apr 25 2016, 11:38 PM

lib/Transforms/InstCombine/InstCombineCompares.cpp
3293 ↗	(On Diff #54879)	This is only true for the very first iteration of InstCombine. InstCombine adds the users of all instructions that it modifies to a worklist, this worklist is not toplogically sorted.

In D18841#411704, @sanjoy wrote:

Hi Balaram,

I've replied to your inline comment, but it is possible that we're just talking past each other. I'm usually on #llvm as sanjoyd on PDT daytime, we can chat there in realtime if you think that'll help.

Thanks Sanjoy, sure I will be available on #llvm as bmakam. I think I have misinterpreted what you mean by non-singleton RHS. However, I think the current logic as is should be able to generalize to what you define as non-singleton RHS ranges.

since for "[2, 5)" == 4 and "[3, 5)" == 4 and a == 3, the first program would have called abort() but the new program will call print().

The condition here is u< [2,5) so the allowed range is [0,4) which means a can never be == 4 for the first/dominating if condition and the program will abort() as expected.

If it helps to clarify, I think of the if conditions allowed ranges as set A for the dominating if condition and set B for the second(Parent) if condition. If dominating if condition is true, then the second if condition is true iff A ∩ B is true. So if {A ∩ B} == {ε} then it is sufficient to canonicalize the second if condition to icmp eq ε. OTOH, the second if condition is false iff A\B (aka relative complement) is true. So if {A\B} == {ε} then it is sufficient to canonicalize the second if condition to icmp ne ε. Similarly, if the dominating if condition is false, then the second if condition is true iff ~A ∩ B is true and the second if condition is false iff ~A\B is true. For non-singleton RHS ranges, the satisfying range is ⊆ allowed range, so it is sufficient to check for allowed range.

In D18841#412078, @bmakam wrote:

In D18841#411704, @sanjoy wrote:

Hi Balaram,

I've replied to your inline comment, but it is possible that we're just talking past each other. I'm usually on #llvm as sanjoyd on PDT daytime, we can chat there in realtime if you think that'll help.

Thanks Sanjoy, sure I will be available on #llvm as bmakam. I think I have misinterpreted what you mean by non-singleton RHS. However, I think the current logic as is should be able to generalize to what you define as non-singleton RHS ranges.

since for "[2, 5)" == 4 and "[3, 5)" == 4 and a == 3, the first program would have called abort() but the new program will call print().

The condition here is u< [2,5) so the allowed range is [0,4) which
means a can never be == 4 for the first/dominating if condition and
the program will abort() as expected.

I didn't mean to say a was 4. By "[2, 5)" == 4 I meant the
RHS we know is in "[2, 5)" is 4 (similar for the other time I used
that notation). a itself is 3 in the above example.

[Edit: added this example]IOW, at runtime the ranged values resolve as

if (a u< 4) {
  // AllowedRegion = [0, 4)
  if (a >=u 4") {
    // AllowedRegion = [3, 0)
    print();
  } else
    abort();
}

If it helps to clarify, I think of the if conditions allowed ranges
as set A for the dominating if condition and set B for the
second(Parent) if condition. If dominating if condition is true,
then the second if condition is true iff A ∩ B is true.

As I said in my previous email, the second condition is true only
if A ∩ B is true, not iff A ∩ B is true (follows from the
fact that the second condition is true only if B is true).
I.e. the second condition implies A ∩ B but not the other way around.

makeAllowedICmpRegion gives you a set with the only if property.
makeSatisfyingICmpRegion gives you a set with the if property.
If they return equal sets, then you have iff on that set (or if
you take an intersection of the sets they return).

makeAllowedICmpRegion gives you a set with the only if property.
makeSatisfyingICmpRegion gives you a set with the if property.
If they return equal sets, then you have iff on that set (or if
you take an intersection of the sets they return).

Add a helper makeExactICmpRegion that will return the region if makeSatisfyingICmpRegion is exactly same as makeAllowedICmpRegion.

bmakam marked an inline comment as done.Apr 26 2016, 12:51 PM

bmakam added inline comments.

lib/Transforms/InstCombine/InstCombineCompares.cpp
3307 ↗	(On Diff #55071)	if Difference is empty then the inner condition is always true.

bmakam added inline comments.Apr 26 2016, 12:56 PM

lib/Transforms/InstCombine/InstCombineCompares.cpp
3293 ↗	(On Diff #55071)	This is only true for the very first iteration of InstCombine. InstCombine adds the users of all instructions that it modifies to a worklist, this worklist is not toplogically sorted. Since it isn't very obvious, to be on the safer side I have changed the assert back into a check.

sanjoy requested changes to this revision.Apr 27 2016, 3:19 PM

sanjoy edited edge metadata.

sanjoy added inline comments.

lib/IR/ConstantRange.cpp
132 ↗	(On Diff #55071)	Can you add a comment on why this assert is valid (i.e. that we're relying on knowledge of how the two `makeAllowedXXXRegion` s are implemented).
lib/Transforms/InstCombine/InstCombineCompares.cpp
118 ↗	(On Diff #55071)	We no longer repeat function names in comments.
119 ↗	(On Diff #55071)	This isn't an exploded icmp right?
3308 ↗	(On Diff #55071)	Add a comment on why you need `isBranchOnSignBitCheck` here.
3309 ↗	(On Diff #55071)	These are better written as: if (auto AI = Intersection.getSingleElement()) return new ICmpInst(..., Builder->getInt(AI)); etc.

This revision now requires changes to proceed.Apr 27 2016, 3:19 PM

Thanks Sanjoy, please let me know if the comments added make sense.

bmakam marked 3 inline comments as done.Apr 28 2016, 6:00 AM

bmakam added inline comments.

lib/Transforms/InstCombine/InstCombineCompares.cpp
118 ↗	(On Diff #55404)	Right, I have corrected the comment.

Generally speaking, InstCombine canonicalization helps out down stream analysis by simplifying the IR or exposes further opportunities for canonicalization. i'm not entirely sure that this transform fits that bill.

I wonder if a more appropriate place for this to live is in CodeGenPrepare seeing as how sensitive this transform is to the peculiarities of the target ISA, perhaps with a TargetLowering check which verifies that it would make sense to perform on such a target.

In D18841#415959, @majnemer wrote:

Generally speaking, InstCombine canonicalization helps out down stream analysis by simplifying the IR or exposes further opportunities for canonicalization. i'm not entirely sure that this transform fits that bill.

FWIW, the examples listed in the testcases demonstrate that it helps out by simplifying the IR. In my tests on spec2006 benchmarks it has reduced more than 3000 static instructions in xalancbmk and 100s of static instructions in gcc and several other benchmarks. This is similar to the transformations done in FoldAndOfICmps except that here we consider dominating condition as the LHS. I have constrained this only to the first predecessor to find the dominating icmp and could still remove several static instructions which should improve the code quality and compile time.

I wonder if a more appropriate place for this to live is in CodeGenPrepare seeing as how sensitive this transform is to the peculiarities of the target ISA, perhaps with a TargetLowering check which verifies that it would make sense to perform on such a target.

I am not totally against moving this to the backend and infact I started off by implementing it at the very end of the pipeline in D18572, but realized that it would catch more such cases and might help other downstream optimizations when we do this transform in instcombine. What do you think?

In D18841#416085, @bmakam wrote:

In D18841#415959, @majnemer wrote:

Generally speaking, InstCombine canonicalization helps out down stream analysis by simplifying the IR or exposes further opportunities for canonicalization. i'm not entirely sure that this transform fits that bill.

FWIW, the examples listed in the testcases demonstrate that it helps out by simplifying the IR. In my tests on spec2006 benchmarks it has reduced more than 3000 static instructions in xalancbmk and 100s of static instructions in gcc and several other benchmarks. This is similar to the transformations done in FoldAndOfICmps except that here we consider dominating condition as the LHS. I have constrained this only to the first predecessor to find the dominating icmp and could still remove several static instructions which should improve the code quality and compile time.

Does this result in instructions removed at the LLVM IR or does it result in fewer instructions emitted in the backend?

If it results in fewer IR instructions, can you determine why this is the case? If it results in the same number of IR instructions, then performing this transform might make more sense in CodeGenPrepare.

I wonder if a more appropriate place for this to live is in CodeGenPrepare seeing as how sensitive this transform is to the peculiarities of the target ISA, perhaps with a TargetLowering check which verifies that it would make sense to perform on such a target.

I am not totally against moving this to the backend and infact I started off by implementing it at the very end of the pipeline in D18572, but realized that it would catch more such cases and might help other downstream optimizations when we do this transform in instcombine. What do you think?

I would have done this optimization a little earlier at around the CodeGenPrepare level.

Does this result in instructions removed at the LLVM IR or does it result in fewer instructions emitted in the backend?

If it results in fewer IR instructions, can you determine why this is the case? If it results in the same number of IR instructions, then performing this transform might make more sense in CodeGenPrepare.

I generated the IR after LTO using -Wl,plugin-opt=emit-llvm for xalancbmk and with this patch the number of IR instructions were 1059942. Compared to the baseline without my patch the number of IR instructions were 1067179. So there was a net reduction of 7237 IR instructions which translated into 3531 fewer instructions emitted in the backend. Over 1000 add instructions, and branch instructions were removed. Most of the cases were similar to the tests I have added in this patch.

gentle ping.

This looks okay to me (pending the one inline comment), assuming @majnemer is fine with too.

Btw, looks like instcombine already canonicalizes towards equality, for instance:

define i1 @f(i32 %x) {
  %v = icmp ugt i32 %x, 0
  ret i1 %v
}

is instcombined to

define i1 @f(i32 %x) {
  %v = icmp ne i32 %x, 0
  ret i1 %v
}

and this change looks like it is doing more of the same.

lib/Transforms/InstCombine/InstCombineCompares.cpp
3287 ↗	(On Diff #55404)	I think you can just use `dyn_cast` here. For a well formed CFG, `getTerminator` should always return `nonnull`.

This revision is now accepted and ready to land.May 2 2016, 2:19 PM

Thank you very much for the review Sanjoy.

@majnemer, Is it reasonable to commit this patch?

majnemer added inline comments.May 4 2016, 10:18 AM

lib/Transforms/InstCombine/InstCombineCompares.cpp
3306–3317 ↗	(On Diff #55968)	The prior two replacements seem like pure goodness: they make conditional branches branch on true or false. These two seem weird to have in InstCombine. Seeing as how you are concerned with pessimization, can we move these to CGP and have them operate on a TargetLowering hook?

majnemer accepted this revision.May 4 2016, 10:20 AM

majnemer added a reviewer: majnemer.

majnemer added inline comments.

lib/Transforms/InstCombine/InstCombineCompares.cpp
3306–3317 ↗	(On Diff #55968)	On second thought, this is a nice canonicalization regardless because it makes the comparisons easier to analyze. LGTM.

Thank you very much David! I will commit this patch shortly.

Closed by commit rL268521: [InstCombine] Canonicalize icmp instructions based on dominating conditions. (authored by bmakam). · Explain WhyMay 4 2016, 10:40 AM

This revision was automatically updated to reflect the committed changes.

bmakam mentioned this in D19550: [InstCombine] Determine the result of a select based on a dominating condition..May 4 2016, 2:28 PM

Diff 56172

llvm/trunk/include/llvm/IR/ConstantRange.h

Show First 20 Lines • Show All 76 Lines • ▼ Show 20 Lines	public:
/// 'intersection over all y in Other . { x : icmp op x y is true }'. If the		/// 'intersection over all y in Other . { x : icmp op x y is true }'. If the
/// exact answer is not representable as a ConstantRange, the return value		/// exact answer is not representable as a ConstantRange, the return value
/// will be a proper subset of the above.		/// will be a proper subset of the above.
///		///
/// Example: Pred = ult and Other = i8 [2, 5) returns [0, 2)		/// Example: Pred = ult and Other = i8 [2, 5) returns [0, 2)
static ConstantRange makeSatisfyingICmpRegion(CmpInst::Predicate Pred,		static ConstantRange makeSatisfyingICmpRegion(CmpInst::Predicate Pred,
const ConstantRange &Other);		const ConstantRange &Other);

		/// Produce the exact range such that all values in the returned range satisfy
		/// the given predicate with any value contained within Other. Formally, this
		/// returns the exact answer when the superset of 'union over all y in Other
		/// is exactly same as the subset of intersection over all y in Other.
		/// { x : icmp op x y is true}'.
		///
		/// Example: Pred = ult and Other = i8 3 returns [0, 3)
		static ConstantRange makeExactICmpRegion(CmpInst::Predicate Pred,
		const APInt &Other);

/// Return the largest range containing all X such that "X BinOpC Y" is		/// Return the largest range containing all X such that "X BinOpC Y" is
/// guaranteed not to wrap (overflow) for all Y in Other.		/// guaranteed not to wrap (overflow) for all Y in Other.
///		///
/// NB! The returned set does not contain all possible values of X for		/// NB! The returned set does not contain all possible values of X for
/// which "X BinOpC Y" does not wrap -- some viable values of X may be		/// which "X BinOpC Y" does not wrap -- some viable values of X may be
/// missing, so you cannot use this to contrain X's range. E.g. in the last		/// missing, so you cannot use this to contrain X's range. E.g. in the last
/// example, "(-2) + 1" is both nsw and nuw (so the "X" could be -2), but (-2)		/// example, "(-2) + 1" is both nsw and nuw (so the "X" could be -2), but (-2)
/// is not in the set returned.		/// is not in the set returned.
▲ Show 20 Lines • Show All 216 Lines • Show Last 20 Lines

llvm/trunk/lib/IR/ConstantRange.cpp

Show First 20 Lines • Show All 121 Lines • ▼ Show 20 Lines	ConstantRange ConstantRange::makeSatisfyingICmpRegion(CmpInst::Predicate Pred,
// Follows from De-Morgan's laws:		// Follows from De-Morgan's laws:
//		//
// ~(~A union ~B) == A intersect B.		// ~(~A union ~B) == A intersect B.
//		//
return makeAllowedICmpRegion(CmpInst::getInversePredicate(Pred), CR)		return makeAllowedICmpRegion(CmpInst::getInversePredicate(Pred), CR)
.inverse();		.inverse();
}		}

		ConstantRange ConstantRange::makeExactICmpRegion(CmpInst::Predicate Pred,
		const APInt &C) {
		// Computes the exact range that is equal to both the constant ranges returned
		// by makeAllowedICmpRegion and makeSatisfyingICmpRegion. This is always true
		// when RHS is a singleton such as an APInt and so the assert is valid.
		// However for non-singleton RHS, for example ult [2,5) makeAllowedICmpRegion
		// returns [0,4) but makeSatisfyICmpRegion returns [0,2).
		//
		assert(makeAllowedICmpRegion(Pred, C) == makeSatisfyingICmpRegion(Pred, C));
		return makeAllowedICmpRegion(Pred, C);
		}

ConstantRange		ConstantRange
ConstantRange::makeGuaranteedNoWrapRegion(Instruction::BinaryOps BinOp,		ConstantRange::makeGuaranteedNoWrapRegion(Instruction::BinaryOps BinOp,
const ConstantRange &Other,		const ConstantRange &Other,
unsigned NoWrapKind) {		unsigned NoWrapKind) {
typedef OverflowingBinaryOperator OBO;		typedef OverflowingBinaryOperator OBO;

// Computes the intersection of CR0 and CR1. It is different from		// Computes the intersection of CR0 and CR1. It is different from
// intersectWith in that the ConstantRange returned will only contain elements		// intersectWith in that the ConstantRange returned will only contain elements
▲ Show 20 Lines • Show All 721 Lines • Show Last 20 Lines

llvm/trunk/lib/Transforms/InstCombine/InstCombineCompares.cpp

Show First 20 Lines • Show All 108 Lines • ▼ Show 20 Lines	if (VectorType *VTy = dyn_cast<VectorType>(In1->getType())) {
return false;		return false;
}		}

return HasSubOverflow(cast<ConstantInt>(Result),		return HasSubOverflow(cast<ConstantInt>(Result),
cast<ConstantInt>(In1), cast<ConstantInt>(In2),		cast<ConstantInt>(In1), cast<ConstantInt>(In2),
IsSigned);		IsSigned);
}		}

		/// Given an icmp instruction, return true if any use of this comparison is a
		/// branch on sign bit comparison.
		static bool isBranchOnSignBitCheck(ICmpInst &I, bool isSignBit) {
		for (auto *U : I.users())
		if (isa<BranchInst>(U))
		return isSignBit;
		return false;
		}

/// isSignBitCheck - Given an exploded icmp instruction, return true if the		/// isSignBitCheck - Given an exploded icmp instruction, return true if the
/// comparison only checks the sign bit. If it only checks the sign bit, set		/// comparison only checks the sign bit. If it only checks the sign bit, set
/// TrueIfSigned if the result of the comparison is true when the input value is		/// TrueIfSigned if the result of the comparison is true when the input value is
/// signed.		/// signed.
static bool isSignBitCheck(ICmpInst::Predicate pred, ConstantInt *RHS,		static bool isSignBitCheck(ICmpInst::Predicate pred, ConstantInt *RHS,
bool &TrueIfSigned) {		bool &TrueIfSigned) {
switch (pred) {		switch (pred) {
case ICmpInst::ICMP_SLT: // True if LHS s< 0		case ICmpInst::ICMP_SLT: // True if LHS s< 0
▲ Show 20 Lines • Show All 3,154 Lines • ▼ Show 20 Lines	if (I.isEquality()) {
return Inst;		return Inst;
}		}
}		}

// If this comparison is a normal comparison, it demands all		// If this comparison is a normal comparison, it demands all
// bits, if it is a sign bit comparison, it only demands the sign bit.		// bits, if it is a sign bit comparison, it only demands the sign bit.
bool UnusedBit;		bool UnusedBit;
isSignBit = isSignBitCheck(I.getPredicate(), CI, UnusedBit);		isSignBit = isSignBitCheck(I.getPredicate(), CI, UnusedBit);

		// Canonicalize icmp instructions based on dominating conditions.
		BasicBlock *Parent = I.getParent();
		BasicBlock *Dom = Parent->getSinglePredecessor();
		auto *BI = Dom ? dyn_cast<BranchInst>(Dom->getTerminator()) : nullptr;
		ICmpInst::Predicate Pred;
		BasicBlock TrueBB, FalseBB;
		ConstantInt *CI2;
		if (BI && match(BI, m_Br(m_ICmp(Pred, m_Specific(Op0), m_ConstantInt(CI2)),
		TrueBB, FalseBB)) &&
		TrueBB != FalseBB) {
		ConstantRange CR = ConstantRange::makeAllowedICmpRegion(I.getPredicate(),
		CI->getValue());
		ConstantRange DominatingCR =
		(Parent == TrueBB)
		? ConstantRange::makeExactICmpRegion(Pred, CI2->getValue())
		: ConstantRange::makeExactICmpRegion(
		CmpInst::getInversePredicate(Pred), CI2->getValue());
		ConstantRange Intersection = DominatingCR.intersectWith(CR);
		ConstantRange Difference = DominatingCR.difference(CR);
		if (Intersection.isEmptySet())
		return replaceInstUsesWith(I, Builder->getFalse());
		if (Difference.isEmptySet())
		return replaceInstUsesWith(I, Builder->getTrue());
		// Canonicalizing a sign bit comparison that gets used in a branch,
		// pessimizes codegen by generating branch on zero instruction instead
		// of a test and branch. So we avoid canonicalizing in such situations
		// because test and branch instruction has better branch displacement
		// than compare and branch instruction.
		if (!isBranchOnSignBitCheck(I, isSignBit) && !I.isEquality()) {
		if (auto *AI = Intersection.getSingleElement())
		return new ICmpInst(ICmpInst::ICMP_EQ, Op0, Builder->getInt(*AI));
		if (auto *AD = Difference.getSingleElement())
		return new ICmpInst(ICmpInst::ICMP_NE, Op0, Builder->getInt(*AD));
		}
		}
}		}

// See if we can fold the comparison based on range information we can get		// See if we can fold the comparison based on range information we can get
// by checking whether bits are known to be zero or one in the input.		// by checking whether bits are known to be zero or one in the input.
if (BitWidth != 0) {		if (BitWidth != 0) {
APInt Op0KnownZero(BitWidth, 0), Op0KnownOne(BitWidth, 0);		APInt Op0KnownZero(BitWidth, 0), Op0KnownOne(BitWidth, 0);
APInt Op1KnownZero(BitWidth, 0), Op1KnownOne(BitWidth, 0);		APInt Op1KnownZero(BitWidth, 0), Op1KnownOne(BitWidth, 0);

▲ Show 20 Lines • Show All 1,332 Lines • Show Last 20 Lines

llvm/trunk/test/Transforms/InstCombine/icmp.ll

	Show First 20 Lines • Show All 1,973 Lines • ▼ Show 20 Lines
	; CHECK-NEXT: [[CMP:%.*]] = icmp ne i32 [[TMP1]], -43			; CHECK-NEXT: [[CMP:%.*]] = icmp ne i32 [[TMP1]], -43
	; CHECK-NEXT: ret i1 [[CMP]]			; CHECK-NEXT: ret i1 [[CMP]]
	;			;
	%or = or i32 %x, 42			%or = or i32 %x, 42
	%cmp = icmp ne i32 %or, -1			%cmp = icmp ne i32 %or, -1
	ret i1 %cmp			ret i1 %cmp
	}			}

				; CHECK-LABEL: @idom_sign_bit_check_edge_dominates
				define void @idom_sign_bit_check_edge_dominates(i64 %a) {
				entry:
				%cmp = icmp slt i64 %a, 0
				br i1 %cmp, label %land.lhs.true, label %lor.rhs

				land.lhs.true: ; preds = %entry
				br label %lor.end

				; CHECK-LABEL: lor.rhs:
				; CHECK-NOT: icmp sgt i64 %a, 0
				; CHECK: icmp eq i64 %a, 0
				lor.rhs: ; preds = %entry
				%cmp2 = icmp sgt i64 %a, 0
				br i1 %cmp2, label %land.rhs, label %lor.end

				land.rhs: ; preds = %lor.rhs
				br label %lor.end

				lor.end: ; preds = %land.rhs, %lor.rhs, %land.lhs.true
				ret void
				}

				; CHECK-LABEL: @idom_sign_bit_check_edge_not_dominates
				define void @idom_sign_bit_check_edge_not_dominates(i64 %a) {
				entry:
				%cmp = icmp slt i64 %a, 0
				br i1 %cmp, label %land.lhs.true, label %lor.rhs

				land.lhs.true: ; preds = %entry
				br i1 undef, label %lor.end, label %lor.rhs

				; CHECK-LABEL: lor.rhs:
				; CHECK: icmp sgt i64 %a, 0
				; CHECK-NOT: icmp eq i64 %a, 0
				lor.rhs: ; preds = %land.lhs.true, %entry
				%cmp2 = icmp sgt i64 %a, 0
				br i1 %cmp2, label %land.rhs, label %lor.end

				land.rhs: ; preds = %lor.rhs
				br label %lor.end

				lor.end: ; preds = %land.rhs, %lor.rhs, %land.lhs.true
				ret void
				}

				; CHECK-LABEL: @idom_sign_bit_check_edge_dominates_select
				define void @idom_sign_bit_check_edge_dominates_select(i64 %a, i64 %b) {
				entry:
				%cmp = icmp slt i64 %a, 5
				br i1 %cmp, label %land.lhs.true, label %lor.rhs

				land.lhs.true: ; preds = %entry
				br label %lor.end

				; CHECK-LABEL: lor.rhs:
				; CHECK-NOT: [[B:%.*]] = icmp sgt i64 %a, 5
				; CHECK: [[C:%.*]] = icmp eq i64 %a, %b
				; CHECK-NOT: [[D:%.*]] = select i1 [[B]], i64 %a, i64 5
				; CHECK-NOT: icmp ne i64 [[D]], %b
				; CHECK-NEXT: br i1 [[C]], label %lor.end, label %land.rhs
				lor.rhs: ; preds = %entry
				%cmp2 = icmp sgt i64 %a, 5
				%select = select i1 %cmp2, i64 %a, i64 5
				%cmp3 = icmp ne i64 %select, %b
				br i1 %cmp3, label %land.rhs, label %lor.end

				land.rhs: ; preds = %lor.rhs
				br label %lor.end

				lor.end: ; preds = %land.rhs, %lor.rhs, %land.lhs.true
				ret void
				}

				; CHECK-LABEL: @idom_zbranch
				define void @idom_zbranch(i64 %a) {
				entry:
				%cmp = icmp sgt i64 %a, 0
				br i1 %cmp, label %lor.end, label %lor.rhs

				; CHECK-LABEL: lor.rhs:
				; CHECK: icmp slt i64 %a, 0
				; CHECK-NOT: icmp eq i64 %a, 0
				lor.rhs: ; preds = %entry
				%cmp2 = icmp slt i64 %a, 0
				br i1 %cmp2, label %land.rhs, label %lor.end

				land.rhs: ; preds = %lor.rhs
				br label %lor.end

				lor.end: ; preds = %land.rhs, %lor.rhs
				ret void
				}

				; CHECK-LABEL: @idom_not_zbranch
				define void @idom_not_zbranch(i32 %a, i32 %b) {
				entry:
				%cmp = icmp sgt i32 %a, 0
				br i1 %cmp, label %return, label %if.end

				; CHECK-LABEL: if.end:
				; CHECK-NOT: [[B:%.*]] = icmp slt i32 %a, 0
				; CHECK: [[C:%.*]] = icmp eq i32 %a, %b
				; CHECK-NOT: [[D:%.*]] = select i1 [[B]], i32 %a, i32 0
				; CHECK-NOT: icmp ne i32 [[D]], %b
				; CHECK-NEXT: br i1 [[C]], label %return, label %if.then3
				if.end: ; preds = %entry
				%cmp1 = icmp slt i32 %a, 0
				%a. = select i1 %cmp1, i32 %a, i32 0
				%cmp2 = icmp ne i32 %a., %b
				br i1 %cmp2, label %if.then3, label %return

				if.then3: ; preds = %if.end
				br label %return

				return: ; preds = %if.end, %entry, %if.then3
				ret void
				}

This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Canonicalize icmp instructions based on dominating conditions.
ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 56172

llvm/trunk/include/llvm/IR/ConstantRange.h

llvm/trunk/lib/IR/ConstantRange.cpp

llvm/trunk/lib/Transforms/InstCombine/InstCombineCompares.cpp

llvm/trunk/test/Transforms/InstCombine/icmp.ll

This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Canonicalize icmp instructions based on dominating conditions.ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 56172

llvm/trunk/include/llvm/IR/ConstantRange.h

llvm/trunk/lib/IR/ConstantRange.cpp

llvm/trunk/lib/Transforms/InstCombine/InstCombineCompares.cpp

llvm/trunk/test/Transforms/InstCombine/icmp.ll

[InstCombine] Canonicalize icmp instructions based on dominating conditions.
ClosedPublic