This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Fold nuw left-shifts in `ugt`/`ule` comparisons.
ClosedPublic

Authored by bryant on Oct 24 2016, 10:04 AM.

Details

Summary

This transforms

%a = shl nuw %x, c1
%b = icmp {ugt|ule} %a, c0

into

%b = icmp {ugt|ule} %x, (c0 >> c1)

z3:

(declare-const x (_ BitVec 64))
(declare-const c0 (_ BitVec 64))
(declare-const c1 (_ BitVec 64))

(push)
(assert (= x (bvlshr (bvshl x c1) c1)))  ; nuw
(assert (not (= (bvugt (bvshl x c1) c0)
                (bvugt x
                       (bvlshr c0 c1)))))
(check-sat)
(get-model)
(pop)

(push)
(assert (= x (bvlshr (bvshl x c1) c1)))  ; nuw
(assert (not (= (bvule (bvshl x c1) c0)
                (bvule x
                       (bvlshr c0 c1)))))
(check-sat)
(get-model)
(pop)

Diff Detail

Repository
rL LLVM

Event Timeline

bryant updated this revision to Diff 75607.Oct 24 2016, 10:04 AM
bryant retitled this revision from to [InstCombine] Fold nuw left-shifts in `ugt`/`ule` comparisons..
bryant updated this object.
bryant added reviewers: spatel, RKSimon.
bryant set the repository for this revision to rL LLVM.
bryant added a subscriber: llvm-commits.
spatel edited edge metadata.Oct 24 2016, 11:11 AM

The patch will crash for vectors:
define <2 x i1> @icmp_ugt_16_vec(<2 x i64> %x) {

%c = shl nuw <2 x i64> %x, <i64 16, i64 16>
%d = icmp ugt <2 x i64> %c, <i64 65535, i64 65535>
ret <2 x i1> %d

}

Splat vectors should be handled the same as scalars. You can look at the fold just above yours for a model of how to fix it.

lib/Transforms/InstCombine/InstCombineCompares.cpp
1969–1970 ↗(On Diff #75607)

Nit: start sentence with capital letter.

test/Transforms/InstCombine/icmp-shl-zext-simplify.ll
3–13 ↗(On Diff #75607)
  1. You can simplify the tests in all cases by:

a. Remove the leading zext
b. Add 'nuw' to the shl
c. Remove the trailing zext (just return the i1)

  1. Please use the python script at utils/update_test_checks.py to auto-generate the check lines.
  1. I'd prefer to see tests that check different constant values rather than different type sizes. Ie, we can be pretty sure that InstCombine folds work for arbitrary sizes, so those tests don't add a lot of value, but we need to make sure that the new constant is created correctly for the new fold:
define i1 @icmp_ugt_16trunc(i64 %x) {
  %shl = shl nuw i64 %x, 16
  %cmp = icmp ugt i64 %shl, 65537
  ret i1 %cmp
}
bryant updated this revision to Diff 75631.Oct 24 2016, 12:52 PM
bryant edited edge metadata.
  • Handle the vector case properly.
  • Remove redundant tests.
  • Test vector and non-zero cases.
bryant marked 2 inline comments as done.Oct 24 2016, 12:59 PM
bryant added inline comments.
test/Transforms/InstCombine/icmp-shl-zext-simplify.ll
37 ↗(On Diff #75631)

This trunc bothers me. On X86, I think it generates an extra pand:

.LCPI3_0:
        .short  65535                   # 0xffff
        .short  65535                   # 0xffff
        .short  65535                   # 0xffff
        .short  0                       # 0x0
        .short  65535                   # 0xffff
        .short  65535                   # 0xffff
        .short  65535                   # 0xffff
        .short  0                       # 0x0
        .text
        .globl  icmp_ule_i64x2
        .p2align        4, 0x90
        .type   icmp_ule_i64x2,@function
icmp_ule_i64x2:                         # @icmp_ule_i64x2
        .cfi_startproc
# BB#0:
        pand    .LCPI3_0(%rip), %xmm0      # <=======
        pxor    %xmm1, %xmm1
        pcmpeqd %xmm0, %xmm1
        pshufd  $177, %xmm1, %xmm0      # xmm0 = xmm1[1,0,3,2]
        pand    %xmm1, %xmm0
        retq

The same goes for icmp_ule_64, but no extra X86 op is generated in that case.

bryant updated this revision to Diff 75634.Oct 24 2016, 1:09 PM
  • Handle uge, ult.
  • Prioritize over case that outputs trunc.
spatel added inline comments.Oct 24 2016, 1:32 PM
test/Transforms/InstCombine/icmp-shl-zext-simplify.ll
37 ↗(On Diff #75631)

I noticed that example too and agree it doesn't look like a good fold. We shouldn't be creating illegal types in InstCombine without a fallback in a later pass to make sure the backend is not harmed.

But this is the trunc fold above yours that is firing, right? Ie, this test does not change with your patch.
Please add a vector test that is affected by your patch.

You can add all of the tests with the current output based on trunk, then rebase this patch after that commit, so we'll see just the diffs in the resulting IR.

bryant added inline comments.Oct 24 2016, 2:28 PM
test/Transforms/InstCombine/icmp-shl-zext-simplify.ll
37 ↗(On Diff #75631)

I've already moved this patch above the trunc fold, though.

bryant updated this revision to Diff 75649.Oct 24 2016, 2:32 PM

Add tests to handle uge (icmp_uge_8x2) and ult (icmp_ult_8). Also add vector test directly affected by this patch (icmp_ugt_16x2).

spatel added inline comments.Oct 25 2016, 8:13 AM
test/Transforms/InstCombine/icmp-shl-zext-simplify.ll
37 ↗(On Diff #75631)

Your last update came in while I was writing my last comment, so I hadn't seen it.
Now there's even more reason to add the tests with current trunk output before the code patch because we should document the current behavior that your patch is going to alter. Please do that and rebase.

bryant updated this revision to Diff 75752.Oct 25 2016, 11:52 AM

Rebase onto D25955.

Rebase onto D25955.

Thanks for the preliminary patches.

As you can see from the current trunk transforms, we always canonicalize the compare predicates to ugt/ult. Unless you see a case where that doesn't happen, I think you can adjust this patch to only fire when we have ugt/ult rather than ugt/ule and remove the "pre-transform" step.

bryant updated this revision to Diff 75787.Oct 25 2016, 1:47 PM

Saved a few slocs.

bryant updated this revision to Diff 75793.Oct 25 2016, 2:10 PM

Merge uge/ule with ugt/ult, respectively.

spatel added inline comments.Oct 25 2016, 2:49 PM
lib/Transforms/InstCombine/InstCombineCompares.cpp
1962–1963 ↗(On Diff #75793)

Why return a ULE icmp here rather than ULT?

bryant added inline comments.Oct 25 2016, 4:42 PM
lib/Transforms/InstCombine/InstCombineCompares.cpp
1962–1963 ↗(On Diff #75793)

Are you suggesting to emit x u< [(c - 1) >> s] - 1? That just seems messier.

bryant added inline comments.Oct 26 2016, 6:29 AM
lib/Transforms/InstCombine/InstCombineCompares.cpp
1962–1963 ↗(On Diff #75793)

+ 1, not - 1.

spatel added inline comments.Oct 26 2016, 11:27 AM
lib/Transforms/InstCombine/InstCombineCompares.cpp
1955 ↗(On Diff #75793)

">u" should be ">=u"

1961 ↗(On Diff #75793)

I don't think you need to check that the constant is ugt(0) here because:
icmp ult %x, 0
must always be simplified to 'false' ahead of this.
Let me know if I've misunderstood that check.

1962–1963 ↗(On Diff #75793)

If you don't change the predicate, I think it makes the code easier to read (as well as short-circuiting the subsequent canonicalization to ULT).

Could look more like this?

if (Shl->hasNoUnsignedWrap() &&
    (Pred == ICmpInst::ICMP_UGT || Pred == ICmpInst::ICMP_ULT)) {
  Type *CTy = IntegerType::get(Cmp.getContext(), C->getBitWidth());
  if (X->getType()->isVectorTy())
    CTy = VectorType::get(CTy, X->getType()->getVectorNumElements());

  APInt ShiftedC = Pred == ICmpInst::ICMP_ULT ? (*C - 1).lshr(*ShiftAmt) + 1
                                              : C->lshr(*ShiftAmt);
  return new ICmpInst(Pred, X, ConstantInt::get(CTy, ShiftedC));
}
test/Transforms/InstCombine/icmp-shl-zext-simplify.ll
39–42 ↗(On Diff #75793)

Please change the constant in this test so we have more test coverage to show the correct non-zero constant for a ule/ult comparison. Eg, 196608 (0x30000) --> 4 ?

bryant added inline comments.Oct 26 2016, 12:34 PM
lib/Transforms/InstCombine/InstCombineCompares.cpp
1955 ↗(On Diff #75793)

So, just to recap: This transformation only works for all c when ugt and ule (cf. the z3 in my opening post) and fails for uge:

z3
(push)
(declare-const s (_ BitVec 8))
(declare-const c (_ BitVec 8))
(declare-const x (_ BitVec 8))

(assert (= (bvlshr (bvshl x s) s) x))

(assert (not (= (bvuge (bvshl x s) c) (bvuge x (bvlshr c s)))))
(check-sat)
(get-model)
(pop)

sat
(model 
  (define-fun s () (_ BitVec 8)
    #x10)
  (define-fun x () (_ BitVec 8)
    #x00)
  (define-fun c () (_ BitVec 8)
    #x01)
)

But the original rule can be transformed from ugt to uge:

(x << s) ugt c = x ugt (c >> s)
(x << s) uge (c + 1) = x ugt (c >> s) if c < -1
(x << s) uge (c + 1) = x uge (c >> s) + 1 if c < -1
(x << s) uge c = x uge ((c - 1) >> s) + 1 if c > 0

The exact same holds for ule to ult, but c has to be non-zero.

Maybe I'll add this derivation to the comments.

1961 ↗(On Diff #75793)

It's a safety check that doesn't seem so expensive. If C is zero by some chance for whatever reason, then this transformation is incorrect and really mustn't happen. Are you really sure that I should remove it? Better safe than slow.

Yes, adding the derivation to the comments would be good.

I wasn't really concerned about the speed of the extra check for *C != 0; it just simplified the code to not have the extra check for that half of the transform. Since I've gotten that sort of assumption wrong before, I'm cc'ing David and Eli to see if they have a better idea.

bryant updated this revision to Diff 75939.Oct 26 2016, 1:30 PM
  • replaced non-zero check for c with note in comments.
  • added derivation of ult case in comments.
  • use icmp-shl-nuw.ll instead of icmp-shl-zext-simplify.ll
  • added non-zero constant vector and non-pow2 shift test cases.
efriedma edited edge metadata.Oct 26 2016, 2:21 PM

"X <u 0" should be reliably folded to "false" before you reach this point, if that's the question... InstSimplify always runs before any other InstCombine transform. Maybe add an assert to be on the safe side.

Not sure about "X >u 0" -> "X != 0"; I think that actually happens later, although I guess it doesn't matter in this context.

lib/Transforms/InstCombine/InstCombineCompares.cpp
1963 ↗(On Diff #75939)

"// (X << S) <u (C + 1) = X <u (C >> S) + 1 if C < - 1"

This comment is confusing... when exactly is "C < - 1" true?

bryant added inline comments.Oct 26 2016, 3:23 PM
lib/Transforms/InstCombine/InstCombineCompares.cpp
1963 ↗(On Diff #75939)

-1 == all ones. Assuming that addition wraps, X <=u C is equivalent to X <=u (C + 1) if C + 1 doesn't wrap, or if C is less than all ones.

bryant added inline comments.Oct 26 2016, 3:24 PM
lib/Transforms/InstCombine/InstCombineCompares.cpp
1963 ↗(On Diff #75939)

Replace the second <=u with <u.

efriedma added inline comments.Oct 26 2016, 3:25 PM
lib/Transforms/InstCombine/InstCombineCompares.cpp
1963 ↗(On Diff #75939)

Ah, okay. You could probably write that in a way that's a little more obvious.

bryant added inline comments.Oct 28 2016, 7:05 AM
lib/Transforms/InstCombine/InstCombineCompares.cpp
1963 ↗(On Diff #75939)

C < -1u? C <u -1? C <u ~0u?

efriedma added inline comments.Oct 28 2016, 10:08 AM
lib/Transforms/InstCombine/InstCombineCompares.cpp
1963 ↗(On Diff #75939)

I would probably go with "C <u -1u", but any of those is fine.

bryant updated this revision to Diff 76222.Oct 28 2016, 11:33 AM
bryant edited edge metadata.
  • Comment fixes. -1 => ~0u.
  • Add sanity assertion for ult.
bryant marked 4 inline comments as done.Oct 28 2016, 11:34 AM
bryant added a comment.Nov 1 2016, 9:51 AM

Sanjay: Would you mind landing this if there are no further objections? Thanks in advance. (:

Sanjay: Would you mind landing this if there are no further objections? Thanks in advance. (:

Sure - let's ping Eli to make sure there are no other suggestions, but the code looks good to me.

test/Transforms/InstCombine/icmp-shl-nuw.ll
54–55 ↗(On Diff #76222)

"16x2" --> "12x2" ?

efriedma accepted this revision.Nov 1 2016, 10:14 AM
efriedma edited edge metadata.

Just one minor comment.

lib/Transforms/InstCombine/InstCombineCompares.cpp
1960 ↗(On Diff #76222)

Isn't CTy always equal to "X->getType()"?

This revision is now accepted and ready to land.Nov 1 2016, 10:14 AM
bryant added inline comments.Nov 1 2016, 10:44 AM
lib/Transforms/InstCombine/InstCombineCompares.cpp
1960 ↗(On Diff #76222)

Since we're morphing (x << s) ugt c into x ugt (c >> s), CTy needs to be enough to hold c >> s.

c's type would be sufficient thanks to the right shift.

But unlike other binops, shift nodes place no constraint between the types of shiftee and shift amount (I'm really just inferring from my memory of TargetSelectionDAG.td, so I could be wrong).

So X-getType() might not be wide enough to hold c >> s.

1970 ↗(On Diff #76222)

ShiftC. Ugh.

test/Transforms/InstCombine/icmp-shl-nuw.ll
54–55 ↗(On Diff #76222)

Thanks for catching this.

efriedma added inline comments.Nov 1 2016, 10:48 AM
lib/Transforms/InstCombine/InstCombineCompares.cpp
1960 ↗(On Diff #76222)

In IR the two operands of an shl always have the same type. (The rules change once you get to SelectionDAG.)

bryant updated this revision to Diff 76598.Nov 1 2016, 10:58 AM
bryant edited edge metadata.
  • Re-worded the main comment to flow better.
  • ShiftC => ShiftedC.
  • Fix test case name.
bryant updated this revision to Diff 76607.Nov 1 2016, 11:23 AM

Simplify since ICmp implies identically typed lhs + rhs.

bryant marked 3 inline comments as done.Nov 1 2016, 11:24 AM
bryant added inline comments.
lib/Transforms/InstCombine/InstCombineCompares.cpp
1960 ↗(On Diff #76222)

You're right, just checked LangRef. And on top of that, icmp obviously needs both sides to be the same. Fixed.

spatel accepted this revision.Nov 1 2016, 11:31 AM
spatel edited edge metadata.

Simplify since ICmp implies identically typed lhs + rhs.

Yes - that looks nicer. Unlike the DAG, IR has actual documentation that we can reference. :)
"Both arguments to the ‘shl‘ instruction must be the same integer or vector of integer type."
http://llvm.org/docs/LangRef.html#shl-instruction

Sorry for the earlier bogus suggestion to grab the code from the trunc fold.

This revision was automatically updated to reflect the committed changes.