This is an archive of the discontinued LLVM Phabricator instance.

Differential D62818

[InstCombine] Introduce fold for icmp pred (and X, (sh signbit, Y)), 0.
Abandoned · Public

Authored by huihuiz on Jun 3 2019, 10:23 AM.

Details

Reviewers

spatel
craig.topper
efriedma
lebedev.ri

Summary

Fold:

(X & (signbit l>> Y)) ==/!= 0 -> (X << Y) s>=/s< 0
(X & (signbit << Y)) ==/!= 0 -> (X l>> Y) s>=/s< 0
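
The two folds in the summary can be sanity-checked by brute force on a narrow type. The sketch below is not part of the patch: it is standalone C++ that models i8 with `std::uint8_t`, assumes shift amounts below the bit width (larger amounts are poison in LLVM IR), and uses hypothetical function names such as `signbitFoldsHoldForI8`.

```cpp
#include <cassert>
#include <cstdint>

// Left-hand side of the first fold on i8: (X & (signbit l>> Y)) == 0.
bool lshrMaskedIsZero(std::uint8_t x, unsigned y) {
  assert(y < 8 && "shift amount must be below the bit width");
  const std::uint8_t mask = static_cast<std::uint8_t>(0x80u >> y);
  return (x & mask) == 0;
}

// Left-hand side of the second fold on i8: (X & (signbit << Y)) == 0.
bool shlMaskedIsZero(std::uint8_t x, unsigned y) {
  assert(y < 8 && "shift amount must be below the bit width");
  const std::uint8_t mask = static_cast<std::uint8_t>(0x80u << y);
  return (x & mask) == 0;
}

// "V s>= 0" on i8, expressed as "sign bit clear" to avoid signed conversions.
bool isNonNegativeI8(std::uint8_t v) { return (v & 0x80u) == 0; }

// Returns true if both folds hold for every i8 X and every Y in [0, 8).
bool signbitFoldsHoldForI8() {
  for (unsigned x = 0; x < 256; ++x) {
    for (unsigned y = 0; y < 8; ++y) {
      const std::uint8_t X = static_cast<std::uint8_t>(x);
      const std::uint8_t shl = static_cast<std::uint8_t>(X << y);
      const std::uint8_t lshr = static_cast<std::uint8_t>(X >> y);
      if (lshrMaskedIsZero(X, y) != isNonNegativeI8(shl)) return false;
      if (shlMaskedIsZero(X, y) != isNonNegativeI8(lshr)) return false;
    }
  }
  return true;
}
```

Only the `==` forms are compared, since the `!=` forms are their negations. An exhaustive i8 check is evidence rather than a proof for wider types.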

Diff Detail

Repository: rL LLVM

Event Timeline

huihuiz created this revision. Jun 3 2019, 10:23 AM

Herald added a project: Restricted Project. Jun 3 2019, 10:23 AM

For Thumb targets, this optimization allows more compact instructions to be generated.
Run: clang -mcpu=cortex-m0 -target armv6m-none-eabi icmp-shl-and.ll -O2 -S -o t.s

@ %bb.0:                                @ %entry
        subs    r0, r0, #1
        lsls    r1, r0
        cmp     r1, #0
        blt     .LBB0_2
@ %bb.1:                                @ %entry
        mov     r2, r3
.LBB0_2:                                @ %entry
        mov     r0, r2
        bx      lr

Without it, more instructions are generated to shift the sign mask:

@ %bb.0:                                @ %entry
        .save   {r4, lr}
        push    {r4, lr}
        subs    r0, r0, #1
        movs    r4, #1
        lsls    r4, r4, #31
        lsrs    r4, r0
        tst     r4, r1
        beq     .LBB0_2
@ %bb.1:                                @ %entry
        mov     r3, r2
.LBB0_2:                                @ %entry
        mov     r0, r3
        pop     {r4, pc}

ARM and Thumb2 targets allow a flexible second operand, in this case a test-bit instruction with a shifted operand. This optimization does not affect the performance of the generated instructions.
Run: clang -mcpu=cortex-a53 -target armv8-none-musleabi icmp-shl-and.ll -O2 -S -o t.s

With this optimization:

@ %bb.0:                                @ %entry
        sub     r0, r0, #1
        lsl     r0, r1, r0
        cmp     r0, #0
        movge   r2, r3
        mov     r0, r2
        bx      lr

Without this optimization:

@ %bb.0:                                @ %entry
        sub     r12, r0, #1
        mov     r0, #-2147483648
        tst     r1, r0, lsr r12
        moveq   r2, r3
        mov     r0, r2
        bx      lr

This looks like a missing backend-level transform, either a generic one in DAGCombiner, or in ARMISelLowering.cpp.

This fix is not the right thing to do because even if you disable this fold,
you can still receive this 'bad' IR you are trying to avoid here,
and will still end up generating bad ASM.

This revision now requires changes to proceed. Jun 3 2019, 10:54 AM

Though this transform is also bad for X86: https://godbolt.org/z/KFM3gQ
That is, when the C2 << Y isn't being hoisted out of the loop, of course.

So we're missing an undo fold: https://rise4fun.com/Alive/w25
Not sure if it should be guarded by a TTI hook; I would expect it to be always beneficial.
(That doesn't mean the original fold is never beneficial, though.)
I'll try to take a look.

On the instcombine side, one thing worth noting which isn't called out in the commit message is the interaction with other instcombine patterns. In the testcase, note that the final IR actually doesn't contain any mask; instead, it checks icmp slt i32 [[SHL]], 0. Huihui, please update the commit message to make this clear.

It's possible we should also implement the related pattern to transform (x & (signbit >> y)) != 0 to (x << y) < 0, sure.

In terms of whether it's universally profitable, I'm not sure... I guess if somehow "icmp ne X, 0" is free, but "icmp slt X, 0" isn't, it could be an issue, but I don't think that applies to any architecture I can think of.

I'm about to post a DAGCombiner undo-fold, hold on..

In D62818#1528149, @efriedma wrote:

On the instcombine side, one thing worth noting which isn't called out in the commit message is the interaction with other instcombine patterns. In the testcase, note that the final IR actually doesn't contain any mask; instead, it checks icmp slt i32 [[SHL]], 0. Huihui, please update the commit message to make this clear.

It's possible we should also implement the related pattern to transform (x & (signbit >> y)) != 0 to (x << y) < 0, sure.

Yes, now that would be a good patch, +see inline comment.

In terms of whether it's universally profitable, I'm not sure... I guess if somehow "icmp ne X, 0" is free, but "icmp slt X, 0" isn't, it could be an issue, but I don't think that applies to any architecture I can think of.

I think there may be some confusion here. We are in the middle-end. Other than through TTI,
we don't really care about what any particular backend/target may find troubling or unprofitable.
We only care about producing the simplest IR, the one best suited to further transforms.
That new IR may or may not be optimal for any particular target.
If the IR is not optimal for a backend, then the opposite transform should be present in that backend.

lib/Transforms/InstCombine/InstCombineCompares.cpp
1606–1611 ↗	(On Diff #202749)	There should also be a sibling fold with swapped shift directions

lebedev.ri added inline comments. Jun 3 2019, 3:26 PM

test/Transforms/InstCombine/icmp-shl-and.ll
12–13 ↗	(On Diff #202749)	Hmm, this already should be folding: https://godbolt.org/z/77mvnv I guess the order of folds is wrong.

Diffusion mentioned this in rL362494: [NFC][Codegen] D62818 - also add tests with X being constant. Jun 4 2019, 4:41 AM

lebedev.ri mentioned this in rG2e49e8196dab: [NFC][Codegen] D62818 - also add tests with X being constant. Jun 4 2019, 4:43 AM

And posted: D62871

As for the instcombine side,
i guess i would recommend a new differential,
with actual folds, not this blacklisting.

The other approach could be changing the order of folding. Move foldICmpBinOpEqualityWithConstant to the very beginning of foldICmpInstWithConstant.
foldICmpBinOpEqualityWithConstant has rules to replace (and X, (1 << size(X)-1) != 0) with x s< 0.
Let me know if this approach is more preferable?

In D62818#1529921, @huihuiz wrote:

The other approach could be changing the order of folding. Move foldICmpBinOpEqualityWithConstant to the very beginning of foldICmpInstWithConstant.
foldICmpBinOpEqualityWithConstant has rules to replace (and X, (1 << size(X)-1) != 0) with x s< 0.
Let me know if this approach is more preferable?

You want (1+1*2)*2 = 6 folds: https://rise4fun.com/Alive/Y8Ct

huihuiz updated this revision to Diff 203024. Jun 4 2019, 2:28 PM

huihuiz retitled this revision from [InstCombine] Allow ((X << Y) & SignMask) != 0 to be optimized as (X << Y) s< 0. to [InstCombine] Change order of ICmp fold..

huihuiz edited the summary of this revision.

Yes, changing the order would allow these folds.

(X & (signbit >> Y)) != 0  ->  (X << Y) s< 0
(X & (signbit >> Y)) == 0  ->  (X << Y) s>= 0
((X << Y) & signbit) != 0  ->  (X << Y) s< 0
((X << Y) & signbit) == 0  ->  (X << Y) s>= 0

lebedev.ri added inline comments. Jun 4 2019, 3:11 PM

lib/Transforms/InstCombine/InstCombineCompares.cpp

1762–1778 ↗

(On Diff #203024)

Eww, this looks too much like backend pattern matching :)
Here you want something more like

// (V0 & (signbit l>> V1)) ==/!= 0 -> (V0 << V1) >=/< 0
// (V0 & (signbit << V1)) ==/!= 0 -> (V0 l>> V1) >=/< 0
Value *V0, *V1, *Shift, *Zero;
ICmpInst::Predicate Pred;
if (match(&Cmp,
          m_ICmp(Pred,
                 m_OneUse(m_c_And(
                     m_CombineAnd(
                         m_CombineAnd(m_Shift(m_SignMask(), m_Value(V1)),
                                      m_Value(Shift)),
                         m_CombineOr(m_Shl(m_Value(), m_Value()),
                                     m_LShr(m_Value(), m_Value()))),
                     m_Value(V0))),
                 m_CombineAnd(m_Zero(), m_Value(Zero)))) &&
    Cmp.isEquality(Pred)) {
  Value *NewShift = cast<Instruction>(Shift)->getOpcode() == Instruction::LShr
                        ? Builder.CreateShl(V0, V1)
                        : Builder.CreateLShr(V0, V1);
  ICmpInst::Predicate NewPred =
      Pred == CmpInst::ICMP_EQ ? CmpInst::ICMP_SGE : CmpInst::ICMP_SLT;
  return new ICmpInst(NewPred, NewShift, Zero);
}
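
The selection logic at the end of the suggested snippet (flip the shift direction, map eq/ne to sge/slt) can be isolated from the LLVM matchers. The following is a minimal sketch that assumes nothing about LLVM's API: `ShiftKind`, `Pred`, `Rewrite`, and `chooseRewrite` are hypothetical stand-ins introduced only for illustration.

```cpp
#include <cassert>

// Hypothetical stand-ins for the LLVM opcodes and predicates involved.
enum class ShiftKind { Shl, LShr };
enum class Pred { EQ, NE, SGE, SLT };

struct Rewrite {
  ShiftKind newShift;  // shift applied to V0 after the fold
  Pred newPred;        // signed predicate comparing the new shift against 0
};

// Given the shift that produced the mask and the equality predicate,
// pick the opposite shift for V0 and the matching signed predicate.
Rewrite chooseRewrite(ShiftKind maskShift, Pred pred) {
  assert((pred == Pred::EQ || pred == Pred::NE) &&
         "only equality predicates are folded");
  Rewrite r;
  r.newShift =
      (maskShift == ShiftKind::LShr) ? ShiftKind::Shl : ShiftKind::LShr;
  r.newPred = (pred == Pred::EQ) ? Pred::SGE : Pred::SLT;
  return r;
}
```

For example, a mask built with `lshr` and compared with `eq` becomes a `shl` of V0 compared with `sge`, which matches the first comment line of the snippet.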

lebedev.ri added inline comments. Jun 4 2019, 3:14 PM

lib/Transforms/InstCombine/InstCombineCompares.cpp
2664–2666 ↗	(On Diff #203024)	I'm not looking forward to seeing the fallout of this move. I will be extremely surprised if, while fixing the target problem, this won't expose numerous other fold order issues. Can you instead simply follow the `TODO`, and simply refactor the single interesting fold out of `foldICmpBinOpEqualityWithConstant()` into `foldICmpAndConstant()` i guess?

Test cases in icmp-shift-and-signbit.ll show that the updated fold order can generate better IR.

huihuiz marked 2 inline comments as done. Jun 5 2019, 10:46 PM

Nice, getting closer.
Could you please split this up:

A patch that adds your original motivational testcase that shows that the fold order is wrong.
The move of the // Replace (and X, (1 << size(X)-1) != 0) with x s< 0 fold
A patch with just new test/Transforms/InstCombine/icmp-shift-and-signbit.ll
The fold itself, showing the changes to the check lines

lib/Transforms/InstCombine/InstCombineCompares.cpp
1646 ↗	(On Diff #203284)	Here the codegen is irrelevant. We do this because it results in simpler IR. Not sure if that new comment adds anything useful
1660 ↗	(On Diff #203284)	`C2->negate().isPowerOf2()`
2791–2792 ↗	(On Diff #203284)	Uhm, where did this check that we were comparing with `0`?
test/Transforms/InstCombine/icmp-shift-and-signbit.ll
2 ↗	(On Diff #203284)	Please move this to a new differential.
In that same patch, re-add your initial motivational pattern, that shows that fold reordering did something.
Use `llvm/utils/update_test_checks.py` to generate check lines.
Rebase this diff on top of that new patch, so this diff shows how the check lines change.
13 ↗	(On Diff #203284)	`select` is not relevant for this pattern, drop it
68 ↗	(On Diff #203284)	You also want a few extra tests:
A trivial vector test with `<i32 -2147483648, i32 -2147483648>` and `<i32 0, i32 0>`
3 vector tests with undefs:
`<i32 -2147483648, i32 undef, i32 -2147483648>` and `<i32 0, i32 0, i32 0>`
`<i32 -2147483648, i32 -2147483648, i32 -2147483648>` and `<i32 0, i32 undef, i32 0>`
`<i32 -2147483648, i32 undef, i32 -2147483648>` and `<i32 0, i32 undef, i32 0>`
Tests to verify single-use constraints:
a test with extra use on `%shr` (should get folded, but not others)
a test with extra use on `%and`
a test with extra use on `%shr` and `%and`.
For how to introduce extra uses, see e.g. `llvm/test/Transforms/InstCombine/unfold-masked-merge-with-const-mask-scalar.ll`
test/Transforms/InstCombine/pr17827.ll
66 ↗	(On Diff #203284)	These don't look like improvements to me. Looks like that reordering exposes yet another missing fold.

huihuiz mentioned this in D63025: [InstCombine] Add tests to show missing fold opportunity for "icmp and shift" (nfc). Jun 7 2019, 1:13 PM

huihuiz mentioned this in D63026: [InstCombine] Fold icmp eq/ne (and %x, signbit), 0 -> %x s>=/s< 0 earlier. Jun 7 2019, 2:12 PM

Thank you so much for all the review feedback, really appreciate it! :)

lib/Transforms/InstCombine/InstCombineCompares.cpp
1660 ↗	(On Diff #203284)	Should not call C2->negate(). If the negated C2 is not a power of 2, calling negate() will still have replaced C2 with its negation. C2 should not be modified.
2791–2792 ↗	(On Diff #203284)	What happened was: C can be 0, signbit, or another number.
We are OK with 0.
If C is signbit, consider the test X & signbit == signbit. The folds
X & -C == -C -> X >u ~C
X & -C != -C -> X <=u ~C
and the fold (for i32)
x >u 2147483647 -> x <s 0 -> true if sign bit set
are scheduled before the fold
(and X, (1 << size(X)-1) != 0) with x s< 0.
If C is another number, SimplifyICmpInst will do its job.
test/Transforms/InstCombine/pr17827.ll
66 ↗	(On Diff #203284)	In D63026 I am moving the fold ((X & ~7) == 0) --> X < 8 ahead. If X is (BinOp Y, C3), we should allow other rules to fold C3 with C2, e.g. (X >> C3) & C2 != C1 -> (X & (C2 << C3)) != (C1 << C3)
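
The pair of folds mentioned in the reply above, `X & -C == -C -> X >u ~C` followed by `x >u 2147483647 -> x <s 0`, can be checked on i8 by brute force. This is an illustrative sketch only: it assumes C is a power of two (the reply does not state the precondition explicitly) and uses hypothetical function names.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if, for every i8 X and every power-of-two C,
//   (X & -C) == -C   agrees with   X >u ~C.
bool negatedPow2MaskFoldHoldsForI8() {
  for (unsigned k = 0; k < 8; ++k) {
    const std::uint8_t C = static_cast<std::uint8_t>(1u << k);
    assert((C & (C - 1)) == 0 && "C is assumed to be a power of two");
    const std::uint8_t negC = static_cast<std::uint8_t>(0u - C);
    const std::uint8_t notC = static_cast<std::uint8_t>(~C);
    for (unsigned x = 0; x < 256; ++x) {
      const std::uint8_t X = static_cast<std::uint8_t>(x);
      const bool lhs = static_cast<std::uint8_t>(X & negC) == negC;
      const bool rhs = X > notC;  // unsigned comparison
      if (lhs != rhs) return false;
    }
  }
  return true;
}

// With C == signbit, "X >u ~C" is "X >u 127" on i8, which holds exactly
// when the sign bit of X is set (X s< 0).
bool unsignedAboveSignedMaxMeansNegative(std::uint8_t x) {
  return (x > 0x7F) == ((x & 0x80u) != 0);
}
```

The second helper mirrors, at 8 bits, the i32 step `x >u 2147483647 -> x <s 0` quoted above.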

huihuiz mentioned this in D63028: [InstCombine] Add tests for missing fold icmp pred (and X, (sh signbit, Y)), 0. Jun 7 2019, 2:32 PM

D62818 is now split into D63025, D63026, D63028 and D62818

More signum, sgn patterns
https://godbolt.org/z/tE00f4

In D62818#1534806, @xbolva00 wrote:

More signum, sgn patterns
https://godbolt.org/z/tE00f4

Hey @xbolva00, I don't see much difference between the codegen of x86 clang and x86 gcc.
Let's focus on the missing folds we are trying to resolve here:

(X & (signbit l>> Y)) ==/!= 0 -> (X << Y) s>=/s< 0
(X & (signbit << Y)) ==/!= 0 -> (X l>> Y) s>=/s< 0

and the fold order issue of

((X << Y) & signbit) ==/!= 0 -> (X << Y) s>=/s< 0;
((X << Y) & ~C) ==/!= 0 -> (X << Y) u</u>= C+1, where C+1 is a power of 2;
and
((X << Y) & C) == 0 -> (X & (C l>> Y)) == 0.
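
The last two fold-order rules listed above can be exercised the same way on i8. The sketch below is an assumption-laden illustration, not code from any of the patches: the comparison against C+1 is taken to be unsigned, the right shift of C is taken to be logical, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// ((X << Y) & ~C) == 0   vs   (X << Y) u< C+1, with C+1 a power of two.
bool lowMaskFoldHoldsForI8() {
  for (unsigned k = 0; k < 8; ++k) {
    const unsigned cPlus1 = 1u << k;  // 1, 2, ..., 128
    assert((cPlus1 & (cPlus1 - 1)) == 0 && "C+1 must be a power of two");
    const std::uint8_t C = static_cast<std::uint8_t>(cPlus1 - 1);
    const std::uint8_t notC = static_cast<std::uint8_t>(~C);
    for (unsigned x = 0; x < 256; ++x) {
      for (unsigned y = 0; y < 8; ++y) {
        const std::uint8_t shl = static_cast<std::uint8_t>(x << y);
        const bool lhs = (shl & notC) == 0;
        const bool rhs = static_cast<unsigned>(shl) < cPlus1;
        if (lhs != rhs) return false;
      }
    }
  }
  return true;
}

// ((X << Y) & C) == 0   vs   (X & (C l>> Y)) == 0, for any 8-bit C.
bool maskShiftSwapHoldsForI8() {
  for (unsigned c = 0; c < 256; ++c) {
    for (unsigned x = 0; x < 256; ++x) {
      for (unsigned y = 0; y < 8; ++y) {
        const std::uint8_t shl = static_cast<std::uint8_t>(x << y);
        const bool lhs = (shl & c) == 0;
        const bool rhs = (x & (c >> y)) == 0;
        if (lhs != rhs) return false;
      }
    }
  }
  return true;
}
```

Both checks restrict Y to in-range shift amounts, since out-of-range shifts are poison in LLVM IR.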

Oh, i thought i commented on these reviews, apparently not :(
I still see random changes to test coverage (new tests being added) in non-NFC patches.
Let me rephrase: can you put all the test updates, new tests into *ONE* review, and the rest of the patches should not add new/change existing tests?

The original test cases are added in D63025. Hopefully that gives good coverage :)
D63026 fixes the fold order issue.
This differential introduces the new fold for icmp pred (and X, (sh signbit, Y)), 0.

Is this the only remaining patch?
I don't think i should review my own code, perhaps @spatel can take a look?

lib/Transforms/InstCombine/InstCombineCompares.cpp
1796 ↗	(On Diff #204436)	I'm not sure why i have added `m_OneUse()` here, it should not be here.

spatel added inline comments. Jun 26 2019, 9:23 AM

lib/Transforms/InstCombine/InstCombineCompares.cpp
1792–1795 ↗	(On Diff #204436)	m_LogicalShift() ?
test/Transforms/InstCombine/signbit-shl-and-icmpeq-zero.ll
180 ↗	(On Diff #204436)	I didn't step through the transforms, but it seems wrong to call this a 'negative test'. This patch must have fired and allowed further simplification?

huihuiz mentioned this in rL364497: [InstCombine][NFCI] Fix test comments. Jun 26 2019, 10:46 PM

huihuiz mentioned this in rG9f69052394a4: [InstCombine][NFCI] Fix test comments..

I simplified the pattern-matching code to make it more readable.

Herald added a subscriber: hiraditya. Jun 26 2019, 11:30 PM

huihuiz added inline comments. Jun 26 2019, 11:33 PM

lib/Transforms/InstCombine/InstCombineCompares.cpp
1796 ↗	(On Diff #204436)	I agree that m_OneUse() should not be in the pattern matching. V0 might be a constant value, which will have more than one use outside of its current function. Actually I added it, not you, sorry about that.
There is a regression, see test file test/Transforms/InstCombine/onehot_merge.ll:
define i1 @foo1_and(i32 %k, i32 %c1, i32 %c2) {
bb:
%tmp = shl i32 1, %c1
%tmp4 = lshr i32 -2147483648, %c2
%tmp1 = and i32 %tmp, %k
%tmp2 = icmp eq i32 %tmp1, 0
%tmp5 = and i32 %tmp4, %k
%tmp6 = icmp eq i32 %tmp5, 0
%or = or i1 %tmp2, %tmp6
ret i1 %or
}
This failed to fold (iszero(A&K1) | iszero(A&K2)) -> (A&(K1|K2)) != (K1|K2), where K1 and K2 are 'one-hot' (only one bit is on). Here K1 is one, K2 is signbit. I am still thinking about how to get over this regression.
test/Transforms/InstCombine/signbit-shl-and-icmpeq-zero.ll
180 ↗	(On Diff #204436)	Yes, X being constant is positive case. Fold happened, and allowed further simplification.

lebedev.ri added inline comments. Jun 27 2019, 1:33 AM

llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
1790–1792	`m_Value(V0)` will always match, it's best to swap them.
llvm/test/Transforms/InstCombine/onehot_merge.ll
18	Can you please regenerate the original test?
18–23	I'm not sure what's on the LHS of the diff, but ignoring the instruction count this looks like improvement to me.

lebedev.ri added inline comments. Jun 27 2019, 2:49 AM

llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
1785–1786	// (V0 & (signbit l>> V1)) ==/!= 0 -> (V0 << V1) s>=/s< 0
// (V0 & (signbit << V1)) ==/!= 0 -> (V0 l>> V1) s>=/s< 0
llvm/test/Transforms/InstCombine/onehot_merge.ll
18	Ah, you also want to str-replace `%tmp` with `%t`; it likely confuses the update script.

spatel added inline comments. Jun 27 2019, 7:25 AM

llvm/test/Transforms/InstCombine/onehot_merge.ll
18	rL364546

Rebased patch, and addressed review comments.

llvm/test/Transforms/InstCombine/onehot_merge.ll
18–23	Actually we missed the fold for ((k & (1 << C1)) == 0) || ((k & (signbit l>> C2)) == 0) --> ((k & ((1 << C1) | (signbit l>> C2))) != ((1 << C1) | (signbit l>> C2)))
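
The one-hot merge fold discussed in this thread, (iszero(A&K1) | iszero(A&K2)) -> (A&(K1|K2)) != (K1|K2), can be checked exhaustively on i8 with K1 built as `1 << C1` and K2 as `signbit l>> C2`, as in the `foo1_and` test. This is a sketch under those assumptions, with a hypothetical function name; it says nothing about how InstCombine matches the pattern.

```cpp
#include <cassert>

// Checks (A & K1) == 0 || (A & K2) == 0   against
//        (A & (K1 | K2)) != (K1 | K2)
// for every 8-bit A and every pair of one-hot masks K1, K2.
bool oneHotMergeHoldsForI8() {
  for (unsigned c1 = 0; c1 < 8; ++c1) {
    for (unsigned c2 = 0; c2 < 8; ++c2) {
      const unsigned k1 = 1u << c1;     // one shifted left
      const unsigned k2 = 0x80u >> c2;  // i8 sign bit shifted right
      assert((k1 & (k1 - 1)) == 0 && (k2 & (k2 - 1)) == 0 &&
             "both masks must be one-hot");
      const unsigned both = k1 | k2;
      for (unsigned a = 0; a < 256; ++a) {
        const bool unmerged = (a & k1) == 0 || (a & k2) == 0;
        const bool merged = (a & both) != both;
        if (unmerged != merged) return false;
      }
    }
  }
  return true;
}
```

The merged form compares against the combined mask rather than against zero: at least one of the two bits is clear exactly when the masked value differs from the mask.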

lebedev.ri added inline comments. Jun 27 2019, 11:21 AM

llvm/test/Transforms/InstCombine/onehot_merge.ll
18–23	Thanks for the analysis! @spatel does this fall into the nowadays reasoning that we shouldn't be doing too many folds into bitmath here in instcombine? I'm almost tempted to say that this isn't a regression, but the original fold that now no longer happens should be removed instead.

huihuiz added a child revision: D63903: [InstCombine][NFCI] Update test cases in onehot_merge.ll. Jun 27 2019, 4:10 PM

For onehot_merge.ll, mathematically speaking,

(signbit l>> C)

is equivalent to

(one l<< (bitwidth - C - 1))

In D63903, I updated the test input so that we are still checking the folds for 'or' of ICmps and 'and' of ICmps.
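
The equivalence stated above, (signbit l>> C) versus (one l<< (bitwidth - C - 1)), is easy to confirm for a 32-bit width. The snippet below is a small sketch with hypothetical function names; it assumes C is an in-range shift amount, that is, less than 32.

```cpp
#include <cassert>
#include <cstdint>

// True if (signbit l>> c) == (1 << (32 - c - 1)) for one shift amount.
bool signbitLshrIsOneShl(unsigned c) {
  assert(c < 32 && "shift amount must be below the bit width");
  const std::uint32_t signbit = 0x80000000u;
  const std::uint32_t one = 1u;
  return (signbit >> c) == (one << (32 - c - 1));
}

// True if the equivalence holds for every in-range shift amount.
bool signbitLshrIsOneShlForAllShifts() {
  for (unsigned c = 0; c < 32; ++c)
    if (!signbitLshrIsOneShl(c)) return false;
  return true;
}
```

This is why a test written with `lshr i32 -2147483648, %c2` can be restated with a shifted one while still exercising the one-hot merge folds.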

spatel added inline comments. Jun 28 2019, 10:58 AM

llvm/test/Transforms/InstCombine/onehot_merge.ll
18–23	If we say that the longer IR sequence is more canonical, then we'd want to add a transform to create that longer sequence starting from the shorter sequence. Are we willing to do that to improve analysis in IR? As a practical matter, we probably also want to look at asm output for the alternatives on a few targets to see how much backend logic is required to do/undo this.

spatel added inline comments. Jun 28 2019, 11:05 AM

llvm/test/Transforms/InstCombine/onehot_merge.ll
18–23	Sorry - I haven't followed this patch and its friends closely; scrolling back through the comments, I think the backend questions are covered by D62871.

lebedev.ri mentioned this in D63829: [InstCombine] Shift amount reassociation in bittest (PR42399). Jul 1 2019, 11:58 AM

This isn't specific to sign bit, the more general pattern is https://rise4fun.com/Alive/2zpl
I'm apparently working on it..

Diffusion mentioned this in rL365056: [NFC][InstCombine] onehot_merge.ll: add last few tests in the state they…. Jul 3 2019, 9:50 AM

lebedev.ri mentioned this in rG826db453d1fc: [NFC][InstCombine] onehot_merge.ll: add last few tests in the state they…. Jul 3 2019, 9:50 AM

lebedev.ri added inline comments. Jul 3 2019, 1:11 PM

llvm/test/Transforms/InstCombine/onehot_merge.ll
106–107	Looks like to support this pattern, `InstCombiner::foldAndOrOfICmpsOfAndWithPow2()` will need to be generalized.

huihuiz marked 2 inline comments as done. Jul 3 2019, 10:26 PM

huihuiz added inline comments.

llvm/test/Transforms/InstCombine/onehot_merge.ll
106–107	I am looking into this, hold on.

Generalize InstCombiner::foldAndOrOfICmpsOfAndWithPow2() in D64275

huihuiz added a child revision: D64275: [InstCombine] Generalize InstCombiner::foldAndOrOfICmpsOfAndWithPow2(). Jul 5 2019, 8:55 PM

Diffusion mentioned this in rL366955: [Codegen] (X & (C l>>/<< Y)) ==/!= 0 --> ((X <</l>> Y) & C) ==/!= 0 fold. Jul 24 2019, 3:58 PM

lebedev.ri mentioned this in rG017e272c3add: [Codegen] (X & (C l>>/<< Y)) ==/!= 0 --> ((X <</l>> Y) & C) ==/!= 0 fold. Jul 24 2019, 3:59 PM

lebedev.ri requested changes to this revision. Aug 1 2019, 3:09 PM

This revision now requires changes to proceed. Aug 1 2019, 3:09 PM

This review seems to be stuck/dead, consider abandoning if no longer relevant.

This revision now requires review to proceed. Jan 12 2023, 4:43 PM

Herald added a project: Restricted Project. Jan 12 2023, 4:43 PM

Herald added a subscriber: StephenFan.

huihuiz abandoned this revision. Jan 13 2023, 9:23 AM

Revision Contents

Path	Size
llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp	17 lines
llvm/test/Transforms/InstCombine/onehot_merge.ll	44 lines
llvm/test/Transforms/InstCombine/signbit-lshr-and-icmpeq-zero.ll	39 lines
llvm/test/Transforms/InstCombine/signbit-shl-and-icmpeq-zero.ll	44 lines

Diff 206784

llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp

(1,776 lines not shown)
if (ExactLogBase2 != -1 && DL.isLegalInteger(ExactLogBase2 + 1)) {		if (ExactLogBase2 != -1 && DL.isLegalInteger(ExactLogBase2 + 1)) {
NTy = VectorType::get(NTy, And->getType()->getVectorNumElements());		NTy = VectorType::get(NTy, And->getType()->getVectorNumElements());
Value *Trunc = Builder.CreateTrunc(X, NTy);		Value *Trunc = Builder.CreateTrunc(X, NTy);
auto NewPred = Cmp.getPredicate() == CmpInst::ICMP_EQ ? CmpInst::ICMP_SGE		auto NewPred = Cmp.getPredicate() == CmpInst::ICMP_EQ ? CmpInst::ICMP_SGE
: CmpInst::ICMP_SLT;		: CmpInst::ICMP_SLT;
return new ICmpInst(NewPred, Trunc, Constant::getNullValue(NTy));		return new ICmpInst(NewPred, Trunc, Constant::getNullValue(NTy));
}		}
}		}

		// (V0 & (signbit l>> V1)) ==/!= 0 -> (V0 << V1) >=/< 0
		// (V0 & (signbit << V1)) ==/!= 0 -> (V0 l>> V1) >=/< 0
		Value *V0, *V1, *Shift;
		if (C.isNullValue() &&
		match(And, m_OneUse(m_c_And(
		m_Value(V0),
		m_CombineAnd(m_LogicalShift(m_SignMask(), m_Value(V1)),
		m_Value(Shift)))))) {
		Value *NewShift = cast<Instruction>(Shift)->getOpcode() == Instruction::LShr
		? Builder.CreateShl(V0, V1)
		: Builder.CreateLShr(V0, V1);
		return new ICmpInst(Cmp.getPredicate() == CmpInst::ICMP_EQ
		? ICmpInst::ICMP_SGE
		: ICmpInst::ICMP_SLT,
		NewShift, Cmp.getOperand(1));
		}

return nullptr;		return nullptr;
}		}

/// Fold icmp (or X, Y), C.		/// Fold icmp (or X, Y), C.
Instruction *InstCombiner::foldICmpOrConstant(ICmpInst &Cmp, BinaryOperator *Or,		Instruction *InstCombiner::foldICmpOrConstant(ICmpInst &Cmp, BinaryOperator *Or,
const APInt &C) {		const APInt &C) {
ICmpInst::Predicate Pred = Cmp.getPredicate();		ICmpInst::Predicate Pred = Cmp.getPredicate();
if (C.isOneValue()) {		if (C.isOneValue()) {
(3,868 lines not shown)

llvm/test/Transforms/InstCombine/onehot_merge.ll

(9 lines not shown)
bb:		bb:
%tmp2 = icmp eq i32 %tmp1, 0		%tmp2 = icmp eq i32 %tmp1, 0
%tmp5 = and i32 8, %k		%tmp5 = and i32 8, %k
%tmp6 = icmp eq i32 %tmp5, 0		%tmp6 = icmp eq i32 %tmp5, 0
%or = or i1 %tmp2, %tmp6		%or = or i1 %tmp2, %tmp6
ret i1 %or		ret i1 %or
}		}

;CHECK: @foo1_and		;CHECK: @foo1_and
;CHECK: shl i32 1, %c1		;CHECK: [[TMP:%.*]] = shl i32 1, %c1
;CHECK-NEXT: lshr i32 -2147483648, %c2		;CHECK-NEXT: [[TMP1:%.*]] = and i32 [[TMP]], %k
;CHECK-NEXT: or i32		;CHECK-NEXT: icmp eq i32 [[TMP1]], 0
;CHECK-NEXT: and i32		;CHECK-NEXT: [[TMP2:%.*]] = shl i32 %k, %c2
;CHECK-NEXT: icmp ne i32 %1, %0		;CHECK-NEXT: icmp sgt i32 [[TMP2]], -1
		;CHECK-NEXT: or i1
;CHECK: ret		;CHECK: ret
define i1 @foo1_and(i32 %k, i32 %c1, i32 %c2) {		define i1 @foo1_and(i32 %k, i32 %c1, i32 %c2) {
bb:		bb:
%tmp = shl i32 1, %c1		%tmp = shl i32 1, %c1
%tmp4 = lshr i32 -2147483648, %c2		%tmp4 = lshr i32 -2147483648, %c2
%tmp1 = and i32 %tmp, %k		%tmp1 = and i32 %tmp, %k
%tmp2 = icmp eq i32 %tmp1, 0		%tmp2 = icmp eq i32 %tmp1, 0
%tmp5 = and i32 %tmp4, %k		%tmp5 = and i32 %tmp4, %k
%tmp6 = icmp eq i32 %tmp5, 0		%tmp6 = icmp eq i32 %tmp5, 0
%or = or i1 %tmp2, %tmp6		%or = or i1 %tmp2, %tmp6
ret i1 %or		ret i1 %or
}		}

; Same as above but with operands commuted one of the ands, but not the other.		; Same as above but with operands commuted one of the ands, but not the other.
define i1 @foo1_and_commuted(i32 %k, i32 %c1, i32 %c2) {		define i1 @foo1_and_commuted(i32 %k, i32 %c1, i32 %c2) {
; CHECK-LABEL: @foo1_and_commuted(		; CHECK-LABEL: @foo1_and_commuted(
; CHECK-NEXT: [[K2:%.*]] = mul i32 [[K:%.*]], [[K]]		; CHECK-NEXT: [[K2:%.*]] = mul i32 [[K:%.*]], [[K]]
; CHECK-NEXT: [[TMP:%.*]] = shl i32 1, [[C1:%.*]]		; CHECK-NEXT: [[TMP:%.*]] = shl i32 1, [[C1:%.*]]
; CHECK-NEXT: [[TMP4:%.*]] = lshr i32 -2147483648, [[C2:%.*]]		; CHECK-NEXT: [[TMP1:%.*]] = and i32 [[K2]], [[TMP]]
; CHECK-NEXT: [[TMP0:%.*]] = or i32 [[TMP]], [[TMP4]]		; CHECK-NEXT: [[TMP2:%.*]] = icmp eq i32 [[TMP1]], 0
; CHECK-NEXT: [[TMP1:%.*]] = and i32 [[K2]], [[TMP0]]		; CHECK-NEXT: [[TMP1:%.*]] = shl i32 [[K2]], [[C2:%.*]]
; CHECK-NEXT: [[TMP2:%.*]] = icmp ne i32 [[TMP1]], [[TMP0]]		; CHECK-NEXT: [[TMP6:%.*]] = icmp sgt i32 [[TMP1]], -1
; CHECK-NEXT: ret i1 [[TMP2]]		; CHECK-NEXT: [[OR:%.*]] = or i1 [[TMP2]], [[TMP6]]
		; CHECK-NEXT: ret i1 [[OR]]
;		;
%k2 = mul i32 %k, %k ; to trick the complexity sorting		%k2 = mul i32 %k, %k ; to trick the complexity sorting
%tmp = shl i32 1, %c1		%tmp = shl i32 1, %c1
%tmp4 = lshr i32 -2147483648, %c2		%tmp4 = lshr i32 -2147483648, %c2
%tmp1 = and i32 %k2, %tmp		%tmp1 = and i32 %k2, %tmp
%tmp2 = icmp eq i32 %tmp1, 0		%tmp2 = icmp eq i32 %tmp1, 0
%tmp5 = and i32 %tmp4, %k2		%tmp5 = and i32 %tmp4, %k2
%tmp6 = icmp eq i32 %tmp5, 0		%tmp6 = icmp eq i32 %tmp5, 0
(13 lines not shown)
;		;
%tmp6 = icmp ne i32 %tmp5, 0		%tmp6 = icmp ne i32 %tmp5, 0
%or = and i1 %tmp2, %tmp6		%or = and i1 %tmp2, %tmp6
ret i1 %or		ret i1 %or
}		}

define i1 @foo1_or(i32 %k, i32 %c1, i32 %c2) {		define i1 @foo1_or(i32 %k, i32 %c1, i32 %c2) {
; CHECK-LABEL: @foo1_or(		; CHECK-LABEL: @foo1_or(
; CHECK-NEXT: [[TMP:%.*]] = shl i32 1, [[C1:%.*]]		; CHECK-NEXT: [[TMP:%.*]] = shl i32 1, [[C1:%.*]]
; CHECK-NEXT: [[TMP4:%.*]] = lshr i32 -2147483648, [[C2:%.*]]		; CHECK-NEXT: [[TMP1:%.*]] = and i32 [[TMP]], [[K:%.*]]
; CHECK-NEXT: [[TMP1:%.*]] = or i32 [[TMP]], [[TMP4]]		; CHECK-NEXT: [[TMP2:%.*]] = icmp ne i32 [[TMP1]], 0
; CHECK-NEXT: [[TMP2:%.*]] = and i32 [[TMP1]], [[K:%.*]]		; CHECK-NEXT: [[TMP1:%.*]] = shl i32 [[K]], [[C2:%.*]]
; CHECK-NEXT: [[TMP3:%.*]] = icmp eq i32 [[TMP2]], [[TMP1]]		; CHECK-NEXT: [[TMP6:%.*]] = icmp slt i32 [[TMP1]], 0
; CHECK-NEXT: ret i1 [[TMP3]]		; CHECK-NEXT: [[OR:%.*]] = and i1 [[TMP2]], [[TMP6]]
		; CHECK-NEXT: ret i1 [[OR]]
;		;
%tmp = shl i32 1, %c1		%tmp = shl i32 1, %c1
%tmp4 = lshr i32 -2147483648, %c2		%tmp4 = lshr i32 -2147483648, %c2
%tmp1 = and i32 %tmp, %k		%tmp1 = and i32 %tmp, %k
%tmp2 = icmp ne i32 %tmp1, 0		%tmp2 = icmp ne i32 %tmp1, 0
%tmp5 = and i32 %tmp4, %k		%tmp5 = and i32 %tmp4, %k
%tmp6 = icmp ne i32 %tmp5, 0		%tmp6 = icmp ne i32 %tmp5, 0
%or = and i1 %tmp2, %tmp6		%or = and i1 %tmp2, %tmp6
ret i1 %or		ret i1 %or
}		}

; Same as above but with operands commuted one of the ors, but not the other.		; Same as above but with operands commuted one of the ors, but not the other.
define i1 @foo1_or_commuted(i32 %k, i32 %c1, i32 %c2) {		define i1 @foo1_or_commuted(i32 %k, i32 %c1, i32 %c2) {
; CHECK-LABEL: @foo1_or_commuted(		; CHECK-LABEL: @foo1_or_commuted(
; CHECK-NEXT: [[K2:%.*]] = mul i32 [[K:%.*]], [[K]]		; CHECK-NEXT: [[K2:%.*]] = mul i32 [[K:%.*]], [[K]]
; CHECK-NEXT: [[TMP:%.*]] = shl i32 1, [[C1:%.*]]		; CHECK-NEXT: [[TMP:%.*]] = shl i32 1, [[C1:%.*]]
; CHECK-NEXT: [[TMP4:%.*]] = lshr i32 -2147483648, [[C2:%.*]]		; CHECK-NEXT: [[TMP1:%.*]] = and i32 [[K2]], [[TMP]]
; CHECK-NEXT: [[TMP1:%.*]] = or i32 [[TMP]], [[TMP4]]		; CHECK-NEXT: [[TMP2:%.*]] = icmp ne i32 [[TMP1]], 0
; CHECK-NEXT: [[TMP2:%.*]] = and i32 [[K2]], [[TMP1]]		; CHECK-NEXT: [[TMP1:%.*]] = shl i32 [[K2]], [[C2:%.*]]
; CHECK-NEXT: [[TMP3:%.*]] = icmp eq i32 [[TMP2]], [[TMP1]]		; CHECK-NEXT: [[TMP6:%.*]] = icmp slt i32 [[TMP1]], 0
; CHECK-NEXT: ret i1 [[TMP3]]		; CHECK-NEXT: [[OR:%.*]] = and i1 [[TMP2]], [[TMP6]]
		; CHECK-NEXT: ret i1 [[OR]]
;		;
%k2 = mul i32 %k, %k ; to trick the complexity sorting		%k2 = mul i32 %k, %k ; to trick the complexity sorting
%tmp = shl i32 1, %c1		%tmp = shl i32 1, %c1
%tmp4 = lshr i32 -2147483648, %c2		%tmp4 = lshr i32 -2147483648, %c2
%tmp1 = and i32 %k2, %tmp		%tmp1 = and i32 %k2, %tmp
%tmp2 = icmp ne i32 %tmp1, 0		%tmp2 = icmp ne i32 %tmp1, 0
%tmp5 = and i32 %tmp4, %k2		%tmp5 = and i32 %tmp4, %k2
%tmp6 = icmp ne i32 %tmp5, 0		%tmp6 = icmp ne i32 %tmp5, 0
%or = and i1 %tmp2, %tmp6		%or = and i1 %tmp2, %tmp6
ret i1 %or		ret i1 %or
}		}

llvm/test/Transforms/InstCombine/signbit-lshr-and-icmpeq-zero.ll

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt %s -instcombine -S | FileCheck %s		; RUN: opt %s -instcombine -S | FileCheck %s

; For pattern (X & (signbit l>> Y)) ==/!= 0		; For pattern (X & (signbit l>> Y)) ==/!= 0
; it may be optimal to fold into (X << Y) >=/< 0		; it may be optimal to fold into (X << Y) >=/< 0

; Scalar tests		; Scalar tests

define i1 @scalar_i8_signbit_lshr_and_eq(i8 %x, i8 %y) {		define i1 @scalar_i8_signbit_lshr_and_eq(i8 %x, i8 %y) {
; CHECK-LABEL: @scalar_i8_signbit_lshr_and_eq(		; CHECK-LABEL: @scalar_i8_signbit_lshr_and_eq(
; CHECK-NEXT: [[LSHR:%.*]] = lshr i8 -128, [[Y:%.*]]		; CHECK-NEXT: [[TMP1:%.*]] = shl i8 [[X:%.*]], [[Y:%.*]]
; CHECK-NEXT: [[AND:%.*]] = and i8 [[LSHR]], [[X:%.*]]		; CHECK-NEXT: [[R:%.*]] = icmp sgt i8 [[TMP1]], -1
; CHECK-NEXT: [[R:%.*]] = icmp eq i8 [[AND]], 0
; CHECK-NEXT: ret i1 [[R]]		; CHECK-NEXT: ret i1 [[R]]
;		;
%lshr = lshr i8 128, %y		%lshr = lshr i8 128, %y
%and = and i8 %lshr, %x		%and = and i8 %lshr, %x
%r = icmp eq i8 %and, 0		%r = icmp eq i8 %and, 0
ret i1 %r		ret i1 %r
}		}

define i1 @scalar_i16_signbit_lshr_and_eq(i16 %x, i16 %y) {		define i1 @scalar_i16_signbit_lshr_and_eq(i16 %x, i16 %y) {
; CHECK-LABEL: @scalar_i16_signbit_lshr_and_eq(		; CHECK-LABEL: @scalar_i16_signbit_lshr_and_eq(
; CHECK-NEXT: [[LSHR:%.*]] = lshr i16 -32768, [[Y:%.*]]		; CHECK-NEXT: [[TMP1:%.*]] = shl i16 [[X:%.*]], [[Y:%.*]]
; CHECK-NEXT: [[AND:%.*]] = and i16 [[LSHR]], [[X:%.*]]		; CHECK-NEXT: [[R:%.*]] = icmp sgt i16 [[TMP1]], -1
; CHECK-NEXT: [[R:%.*]] = icmp eq i16 [[AND]], 0
; CHECK-NEXT: ret i1 [[R]]		; CHECK-NEXT: ret i1 [[R]]
;		;
%lshr = lshr i16 32768, %y		%lshr = lshr i16 32768, %y
%and = and i16 %lshr, %x		%and = and i16 %lshr, %x
%r = icmp eq i16 %and, 0		%r = icmp eq i16 %and, 0
ret i1 %r		ret i1 %r
}		}

define i1 @scalar_i32_signbit_lshr_and_eq(i32 %x, i32 %y) {		define i1 @scalar_i32_signbit_lshr_and_eq(i32 %x, i32 %y) {
; CHECK-LABEL: @scalar_i32_signbit_lshr_and_eq(		; CHECK-LABEL: @scalar_i32_signbit_lshr_and_eq(
; CHECK-NEXT: [[LSHR:%.*]] = lshr i32 -2147483648, [[Y:%.*]]		; CHECK-NEXT: [[TMP1:%.*]] = shl i32 [[X:%.*]], [[Y:%.*]]
; CHECK-NEXT: [[AND:%.*]] = and i32 [[LSHR]], [[X:%.*]]		; CHECK-NEXT: [[R:%.*]] = icmp sgt i32 [[TMP1]], -1
; CHECK-NEXT: [[R:%.*]] = icmp eq i32 [[AND]], 0
; CHECK-NEXT: ret i1 [[R]]		; CHECK-NEXT: ret i1 [[R]]
;		;
%lshr = lshr i32 2147483648, %y		%lshr = lshr i32 2147483648, %y
%and = and i32 %lshr, %x		%and = and i32 %lshr, %x
%r = icmp eq i32 %and, 0		%r = icmp eq i32 %and, 0
ret i1 %r		ret i1 %r
}		}

define i1 @scalar_i64_signbit_lshr_and_eq(i64 %x, i64 %y) {		define i1 @scalar_i64_signbit_lshr_and_eq(i64 %x, i64 %y) {
; CHECK-LABEL: @scalar_i64_signbit_lshr_and_eq(		; CHECK-LABEL: @scalar_i64_signbit_lshr_and_eq(
; CHECK-NEXT: [[LSHR:%.*]] = lshr i64 -9223372036854775808, [[Y:%.*]]		; CHECK-NEXT: [[TMP1:%.*]] = shl i64 [[X:%.*]], [[Y:%.*]]
; CHECK-NEXT: [[AND:%.*]] = and i64 [[LSHR]], [[X:%.*]]		; CHECK-NEXT: [[R:%.*]] = icmp sgt i64 [[TMP1]], -1
; CHECK-NEXT: [[R:%.*]] = icmp eq i64 [[AND]], 0
; CHECK-NEXT: ret i1 [[R]]		; CHECK-NEXT: ret i1 [[R]]
;		;
%lshr = lshr i64 9223372036854775808, %y		%lshr = lshr i64 9223372036854775808, %y
%and = and i64 %lshr, %x		%and = and i64 %lshr, %x
%r = icmp eq i64 %and, 0		%r = icmp eq i64 %and, 0
ret i1 %r		ret i1 %r
}		}

define i1 @scalar_i32_signbit_lshr_and_ne(i32 %x, i32 %y) {		define i1 @scalar_i32_signbit_lshr_and_ne(i32 %x, i32 %y) {
; CHECK-LABEL: @scalar_i32_signbit_lshr_and_ne(		; CHECK-LABEL: @scalar_i32_signbit_lshr_and_ne(
; CHECK-NEXT: [[LSHR:%.*]] = lshr i32 -2147483648, [[Y:%.*]]		; CHECK-NEXT: [[TMP1:%.*]] = shl i32 [[X:%.*]], [[Y:%.*]]
; CHECK-NEXT: [[AND:%.*]] = and i32 [[LSHR]], [[X:%.*]]		; CHECK-NEXT: [[R:%.*]] = icmp slt i32 [[TMP1]], 0
; CHECK-NEXT: [[R:%.*]] = icmp ne i32 [[AND]], 0
; CHECK-NEXT: ret i1 [[R]]		; CHECK-NEXT: ret i1 [[R]]
;		;
%lshr = lshr i32 2147483648, %y		%lshr = lshr i32 2147483648, %y
%and = and i32 %lshr, %x		%and = and i32 %lshr, %x
%r = icmp ne i32 %and, 0 ; check 'ne' predicate		%r = icmp ne i32 %and, 0 ; check 'ne' predicate
ret i1 %r		ret i1 %r
}		}

; Vector tests		; Vector tests

define <4 x i1> @vec_4xi32_signbit_lshr_and_eq(<4 x i32> %x, <4 x i32> %y) {		define <4 x i1> @vec_4xi32_signbit_lshr_and_eq(<4 x i32> %x, <4 x i32> %y) {
; CHECK-LABEL: @vec_4xi32_signbit_lshr_and_eq(		; CHECK-LABEL: @vec_4xi32_signbit_lshr_and_eq(
; CHECK-NEXT: [[LSHR:%.*]] = lshr <4 x i32> <i32 -2147483648, i32 -2147483648, i32 -2147483648, i32 -2147483648>, [[Y:%.*]]		; CHECK-NEXT: [[TMP1:%.*]] = shl <4 x i32> [[X:%.*]], [[Y:%.*]]
; CHECK-NEXT: [[AND:%.*]] = and <4 x i32> [[LSHR]], [[X:%.*]]		; CHECK-NEXT: [[R:%.*]] = icmp sgt <4 x i32> [[TMP1]], <i32 -1, i32 -1, i32 -1, i32 -1>
; CHECK-NEXT: [[R:%.*]] = icmp eq <4 x i32> [[AND]], zeroinitializer
; CHECK-NEXT: ret <4 x i1> [[R]]		; CHECK-NEXT: ret <4 x i1> [[R]]
;		;
%lshr = lshr <4 x i32> <i32 2147483648, i32 2147483648, i32 2147483648, i32 2147483648>, %y		%lshr = lshr <4 x i32> <i32 2147483648, i32 2147483648, i32 2147483648, i32 2147483648>, %y
%and = and <4 x i32> %lshr, %x		%and = and <4 x i32> %lshr, %x
%r = icmp eq <4 x i32> %and, <i32 0, i32 0, i32 0, i32 0>		%r = icmp eq <4 x i32> %and, <i32 0, i32 0, i32 0, i32 0>
ret <4 x i1> %r		ret <4 x i1> %r
}		}

Show All 39 Lines
; Extra use		; Extra use

; Fold happened		; Fold happened
define i1 @scalar_i32_signbit_lshr_and_eq_extra_use_lshr(i32 %x, i32 %y, i32 %z, i32* %p) {		define i1 @scalar_i32_signbit_lshr_and_eq_extra_use_lshr(i32 %x, i32 %y, i32 %z, i32* %p) {
; CHECK-LABEL: @scalar_i32_signbit_lshr_and_eq_extra_use_lshr(		; CHECK-LABEL: @scalar_i32_signbit_lshr_and_eq_extra_use_lshr(
; CHECK-NEXT: [[LSHR:%.*]] = lshr i32 -2147483648, [[Y:%.*]]		; CHECK-NEXT: [[LSHR:%.*]] = lshr i32 -2147483648, [[Y:%.*]]
; CHECK-NEXT: [[XOR:%.*]] = xor i32 [[LSHR]], [[Z:%.*]]		; CHECK-NEXT: [[XOR:%.*]] = xor i32 [[LSHR]], [[Z:%.*]]
; CHECK-NEXT: store i32 [[XOR]], i32* [[P:%.*]], align 4		; CHECK-NEXT: store i32 [[XOR]], i32* [[P:%.*]], align 4
; CHECK-NEXT: [[AND:%.*]] = and i32 [[LSHR]], [[X:%.*]]		; CHECK-NEXT: [[TMP1:%.*]] = shl i32 [[X:%.*]], [[Y]]
; CHECK-NEXT: [[R:%.*]] = icmp eq i32 [[AND]], 0		; CHECK-NEXT: [[R:%.*]] = icmp sgt i32 [[TMP1]], -1
; CHECK-NEXT: ret i1 [[R]]		; CHECK-NEXT: ret i1 [[R]]
;		;
%lshr = lshr i32 2147483648, %y		%lshr = lshr i32 2147483648, %y
%xor = xor i32 %lshr, %z ; extra use of lshr		%xor = xor i32 %lshr, %z ; extra use of lshr
store i32 %xor, i32* %p		store i32 %xor, i32* %p
%and = and i32 %lshr, %x		%and = and i32 %lshr, %x
%r = icmp eq i32 %and, 0		%r = icmp eq i32 %and, 0
ret i1 %r		ret i1 %r
Show All 36 Lines	;
%r = icmp eq i32 %and, 0		%r = icmp eq i32 %and, 0
ret i1 %r		ret i1 %r
}		}

; X is constant		; X is constant

define i1 @scalar_i32_signbit_lshr_and_eq_X_is_constant1(i32 %y) {		define i1 @scalar_i32_signbit_lshr_and_eq_X_is_constant1(i32 %y) {
; CHECK-LABEL: @scalar_i32_signbit_lshr_and_eq_X_is_constant1(		; CHECK-LABEL: @scalar_i32_signbit_lshr_and_eq_X_is_constant1(
; CHECK-NEXT: [[LSHR:%.*]] = lshr i32 -2147483648, [[Y:%.*]]		; CHECK-NEXT: [[TMP1:%.*]] = shl i32 12345, [[Y:%.*]]
; CHECK-NEXT: [[AND:%.*]] = and i32 [[LSHR]], 12345		; CHECK-NEXT: [[R:%.*]] = icmp sgt i32 [[TMP1]], -1
; CHECK-NEXT: [[R:%.*]] = icmp eq i32 [[AND]], 0
; CHECK-NEXT: ret i1 [[R]]		; CHECK-NEXT: ret i1 [[R]]
;		;
%lshr = lshr i32 2147483648, %y		%lshr = lshr i32 2147483648, %y
%and = and i32 %lshr, 12345		%and = and i32 %lshr, 12345
%r = icmp eq i32 %and, 0		%r = icmp eq i32 %and, 0
ret i1 %r		ret i1 %r
}		}

▲ Show 20 Lines • Show All 42 Lines • Show Last 20 Lines
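
The fold exercised by the lshr tests above, `(X & (signbit l>> Y)) ==/!= 0` into `(X << Y) s>=/s< 0`, can be sanity-checked by brute force. The following is a minimal C++ sketch under stated assumptions, not code from the patch: it models the i8 case only, restricts Y to in-range shift amounts (0..7), and uses hypothetical function names.

```cpp
#include <cstdint>

// Original pattern on 8-bit values: (X & (signbit l>> Y)) == 0.
bool lshrMaskedIsZero(uint8_t x, unsigned y) {
  uint8_t mask = static_cast<uint8_t>(0x80u >> y);  // signbit l>> Y
  return (x & mask) == 0;
}

// Folded pattern: (X << Y) s>= 0, i.e. the sign bit of the shifted value is clear.
bool shlIsNonNegative(uint8_t x, unsigned y) {
  uint8_t shifted = static_cast<uint8_t>(x << y);   // X << Y, truncated to 8 bits
  return static_cast<int8_t>(shifted) >= 0;         // signed comparison
}

// Returns true when both forms agree for every X and every Y in 0..7.
// The '!=' / 's<' variant follows by negating both sides.
bool lshrFoldHoldsForI8() {
  for (unsigned x = 0; x < 256; ++x)
    for (unsigned y = 0; y < 8; ++y)
      if (lshrMaskedIsZero(static_cast<uint8_t>(x), y) !=
          shlIsNonNegative(static_cast<uint8_t>(x), y))
        return false;
  return true;
}
```

Both forms test bit `7 - Y` of X: the original selects it with a shifted mask, the folded form moves it into the sign position. Out-of-range shift amounts are excluded because they produce poison in LLVM IR.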

llvm/test/Transforms/InstCombine/signbit-shl-and-icmpeq-zero.ll

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt %s -instcombine -S | FileCheck %s		; RUN: opt %s -instcombine -S | FileCheck %s

; For pattern (X & (signbit << Y)) ==/!= 0		; For pattern (X & (signbit << Y)) ==/!= 0
; it may be optimal to fold into (X l>> Y) >=/< 0		; it may be optimal to fold into (X l>> Y) >=/< 0

; Scalar tests		; Scalar tests

define i1 @scalar_i8_signbit_shl_and_eq(i8 %x, i8 %y) {		define i1 @scalar_i8_signbit_shl_and_eq(i8 %x, i8 %y) {
; CHECK-LABEL: @scalar_i8_signbit_shl_and_eq(		; CHECK-LABEL: @scalar_i8_signbit_shl_and_eq(
; CHECK-NEXT: [[SHL:%.*]] = shl i8 -128, [[Y:%.*]]		; CHECK-NEXT: [[TMP1:%.*]] = lshr i8 [[X:%.*]], [[Y:%.*]]
; CHECK-NEXT: [[AND:%.*]] = and i8 [[SHL]], [[X:%.*]]		; CHECK-NEXT: [[R:%.*]] = icmp sgt i8 [[TMP1]], -1
; CHECK-NEXT: [[R:%.*]] = icmp eq i8 [[AND]], 0
; CHECK-NEXT: ret i1 [[R]]		; CHECK-NEXT: ret i1 [[R]]
;		;
%shl = shl i8 128, %y		%shl = shl i8 128, %y
%and = and i8 %shl, %x		%and = and i8 %shl, %x
%r = icmp eq i8 %and, 0		%r = icmp eq i8 %and, 0
ret i1 %r		ret i1 %r
}		}

define i1 @scalar_i16_signbit_shl_and_eq(i16 %x, i16 %y) {		define i1 @scalar_i16_signbit_shl_and_eq(i16 %x, i16 %y) {
; CHECK-LABEL: @scalar_i16_signbit_shl_and_eq(		; CHECK-LABEL: @scalar_i16_signbit_shl_and_eq(
; CHECK-NEXT: [[SHL:%.*]] = shl i16 -32768, [[Y:%.*]]		; CHECK-NEXT: [[TMP1:%.*]] = lshr i16 [[X:%.*]], [[Y:%.*]]
; CHECK-NEXT: [[AND:%.*]] = and i16 [[SHL]], [[X:%.*]]		; CHECK-NEXT: [[R:%.*]] = icmp sgt i16 [[TMP1]], -1
; CHECK-NEXT: [[R:%.*]] = icmp eq i16 [[AND]], 0
; CHECK-NEXT: ret i1 [[R]]		; CHECK-NEXT: ret i1 [[R]]
;		;
%shl = shl i16 32768, %y		%shl = shl i16 32768, %y
%and = and i16 %shl, %x		%and = and i16 %shl, %x
%r = icmp eq i16 %and, 0		%r = icmp eq i16 %and, 0
ret i1 %r		ret i1 %r
}		}

define i1 @scalar_i32_signbit_shl_and_eq(i32 %x, i32 %y) {		define i1 @scalar_i32_signbit_shl_and_eq(i32 %x, i32 %y) {
; CHECK-LABEL: @scalar_i32_signbit_shl_and_eq(		; CHECK-LABEL: @scalar_i32_signbit_shl_and_eq(
; CHECK-NEXT: [[SHL:%.*]] = shl i32 -2147483648, [[Y:%.*]]		; CHECK-NEXT: [[TMP1:%.*]] = lshr i32 [[X:%.*]], [[Y:%.*]]
; CHECK-NEXT: [[AND:%.*]] = and i32 [[SHL]], [[X:%.*]]		; CHECK-NEXT: [[R:%.*]] = icmp sgt i32 [[TMP1]], -1
; CHECK-NEXT: [[R:%.*]] = icmp eq i32 [[AND]], 0
; CHECK-NEXT: ret i1 [[R]]		; CHECK-NEXT: ret i1 [[R]]
;		;
%shl = shl i32 2147483648, %y		%shl = shl i32 2147483648, %y
%and = and i32 %shl, %x		%and = and i32 %shl, %x
%r = icmp eq i32 %and, 0		%r = icmp eq i32 %and, 0
ret i1 %r		ret i1 %r
}		}

define i1 @scalar_i64_signbit_shl_and_eq(i64 %x, i64 %y) {		define i1 @scalar_i64_signbit_shl_and_eq(i64 %x, i64 %y) {
; CHECK-LABEL: @scalar_i64_signbit_shl_and_eq(		; CHECK-LABEL: @scalar_i64_signbit_shl_and_eq(
; CHECK-NEXT: [[SHL:%.*]] = shl i64 -9223372036854775808, [[Y:%.*]]		; CHECK-NEXT: [[TMP1:%.*]] = lshr i64 [[X:%.*]], [[Y:%.*]]
; CHECK-NEXT: [[AND:%.*]] = and i64 [[SHL]], [[X:%.*]]		; CHECK-NEXT: [[R:%.*]] = icmp sgt i64 [[TMP1]], -1
; CHECK-NEXT: [[R:%.*]] = icmp eq i64 [[AND]], 0
; CHECK-NEXT: ret i1 [[R]]		; CHECK-NEXT: ret i1 [[R]]
;		;
%shl = shl i64 9223372036854775808, %y		%shl = shl i64 9223372036854775808, %y
%and = and i64 %shl, %x		%and = and i64 %shl, %x
%r = icmp eq i64 %and, 0		%r = icmp eq i64 %and, 0
ret i1 %r		ret i1 %r
}		}

define i1 @scalar_i32_signbit_shl_and_ne(i32 %x, i32 %y) {		define i1 @scalar_i32_signbit_shl_and_ne(i32 %x, i32 %y) {
; CHECK-LABEL: @scalar_i32_signbit_shl_and_ne(		; CHECK-LABEL: @scalar_i32_signbit_shl_and_ne(
; CHECK-NEXT: [[SHL:%.*]] = shl i32 -2147483648, [[Y:%.*]]		; CHECK-NEXT: [[TMP1:%.*]] = lshr i32 [[X:%.*]], [[Y:%.*]]
; CHECK-NEXT: [[AND:%.*]] = and i32 [[SHL]], [[X:%.*]]		; CHECK-NEXT: [[R:%.*]] = icmp slt i32 [[TMP1]], 0
; CHECK-NEXT: [[R:%.*]] = icmp ne i32 [[AND]], 0
; CHECK-NEXT: ret i1 [[R]]		; CHECK-NEXT: ret i1 [[R]]
;		;
%shl = shl i32 2147483648, %y		%shl = shl i32 2147483648, %y
%and = and i32 %shl, %x		%and = and i32 %shl, %x
%r = icmp ne i32 %and, 0 ; check 'ne' predicate		%r = icmp ne i32 %and, 0 ; check 'ne' predicate
ret i1 %r		ret i1 %r
}		}

; Vector tests		; Vector tests

define <4 x i1> @vec_4xi32_signbit_shl_and_eq(<4 x i32> %x, <4 x i32> %y) {		define <4 x i1> @vec_4xi32_signbit_shl_and_eq(<4 x i32> %x, <4 x i32> %y) {
; CHECK-LABEL: @vec_4xi32_signbit_shl_and_eq(		; CHECK-LABEL: @vec_4xi32_signbit_shl_and_eq(
; CHECK-NEXT: [[SHL:%.*]] = shl <4 x i32> <i32 -2147483648, i32 -2147483648, i32 -2147483648, i32 -2147483648>, [[Y:%.*]]		; CHECK-NEXT: [[TMP1:%.*]] = lshr <4 x i32> [[X:%.*]], [[Y:%.*]]
; CHECK-NEXT: [[AND:%.*]] = and <4 x i32> [[SHL]], [[X:%.*]]		; CHECK-NEXT: [[R:%.*]] = icmp sgt <4 x i32> [[TMP1]], <i32 -1, i32 -1, i32 -1, i32 -1>
; CHECK-NEXT: [[R:%.*]] = icmp eq <4 x i32> [[AND]], zeroinitializer
; CHECK-NEXT: ret <4 x i1> [[R]]		; CHECK-NEXT: ret <4 x i1> [[R]]
;		;
%shl = shl <4 x i32> <i32 2147483648, i32 2147483648, i32 2147483648, i32 2147483648>, %y		%shl = shl <4 x i32> <i32 2147483648, i32 2147483648, i32 2147483648, i32 2147483648>, %y
%and = and <4 x i32> %shl, %x		%and = and <4 x i32> %shl, %x
%r = icmp eq <4 x i32> %and, <i32 0, i32 0, i32 0, i32 0>		%r = icmp eq <4 x i32> %and, <i32 0, i32 0, i32 0, i32 0>
ret <4 x i1> %r		ret <4 x i1> %r
}		}

Show All 39 Lines
; Extra use		; Extra use

; Fold happened		; Fold happened
define i1 @scalar_i32_signbit_shl_and_eq_extra_use_shl(i32 %x, i32 %y, i32 %z, i32* %p) {		define i1 @scalar_i32_signbit_shl_and_eq_extra_use_shl(i32 %x, i32 %y, i32 %z, i32* %p) {
; CHECK-LABEL: @scalar_i32_signbit_shl_and_eq_extra_use_shl(		; CHECK-LABEL: @scalar_i32_signbit_shl_and_eq_extra_use_shl(
; CHECK-NEXT: [[SHL:%.*]] = shl i32 -2147483648, [[Y:%.*]]		; CHECK-NEXT: [[SHL:%.*]] = shl i32 -2147483648, [[Y:%.*]]
; CHECK-NEXT: [[XOR:%.*]] = xor i32 [[SHL]], [[Z:%.*]]		; CHECK-NEXT: [[XOR:%.*]] = xor i32 [[SHL]], [[Z:%.*]]
; CHECK-NEXT: store i32 [[XOR]], i32* [[P:%.*]], align 4		; CHECK-NEXT: store i32 [[XOR]], i32* [[P:%.*]], align 4
; CHECK-NEXT: [[AND:%.*]] = and i32 [[SHL]], [[X:%.*]]		; CHECK-NEXT: [[TMP1:%.*]] = lshr i32 [[X:%.*]], [[Y]]
; CHECK-NEXT: [[R:%.*]] = icmp eq i32 [[AND]], 0		; CHECK-NEXT: [[R:%.*]] = icmp sgt i32 [[TMP1]], -1
; CHECK-NEXT: ret i1 [[R]]		; CHECK-NEXT: ret i1 [[R]]
;		;
%shl = shl i32 2147483648, %y		%shl = shl i32 2147483648, %y
%xor = xor i32 %shl, %z ; extra use of shl		%xor = xor i32 %shl, %z ; extra use of shl
store i32 %xor, i32* %p		store i32 %xor, i32* %p
%and = and i32 %shl, %x		%and = and i32 %shl, %x
%r = icmp eq i32 %and, 0		%r = icmp eq i32 %and, 0
ret i1 %r		ret i1 %r
Show All 36 Lines	;
%r = icmp eq i32 %and, 0		%r = icmp eq i32 %and, 0
ret i1 %r		ret i1 %r
}		}

; X is constant		; X is constant

define i1 @scalar_i32_signbit_shl_and_eq_X_is_constant1(i32 %y) {		define i1 @scalar_i32_signbit_shl_and_eq_X_is_constant1(i32 %y) {
; CHECK-LABEL: @scalar_i32_signbit_shl_and_eq_X_is_constant1(		; CHECK-LABEL: @scalar_i32_signbit_shl_and_eq_X_is_constant1(
; CHECK-NEXT: [[SHL:%.*]] = shl i32 -2147483648, [[Y:%.*]]		; CHECK-NEXT: ret i1 true
; CHECK-NEXT: [[AND:%.*]] = and i32 [[SHL]], 12345
; CHECK-NEXT: [[R:%.*]] = icmp eq i32 [[AND]], 0
; CHECK-NEXT: ret i1 [[R]]
;		;
%shl = shl i32 2147483648, %y		%shl = shl i32 2147483648, %y
%and = and i32 %shl, 12345		%and = and i32 %shl, 12345
%r = icmp eq i32 %and, 0		%r = icmp eq i32 %and, 0
ret i1 %r		ret i1 %r
}		}

define i1 @scalar_i32_signbit_shl_and_eq_X_is_constant2(i32 %y) {		define i1 @scalar_i32_signbit_shl_and_eq_X_is_constant2(i32 %y) {
; CHECK-LABEL: @scalar_i32_signbit_shl_and_eq_X_is_constant2(		; CHECK-LABEL: @scalar_i32_signbit_shl_and_eq_X_is_constant2(
; CHECK-NEXT: [[SHL:%.*]] = shl i32 -2147483648, [[Y:%.*]]		; CHECK-NEXT: ret i1 true
; CHECK-NEXT: [[AND:%.*]] = and i32 [[SHL]], 1
; CHECK-NEXT: [[R:%.*]] = icmp eq i32 [[AND]], 0
; CHECK-NEXT: ret i1 [[R]]
;		;
%shl = shl i32 2147483648, %y		%shl = shl i32 2147483648, %y
%and = and i32 %shl, 1		%and = and i32 %shl, 1
%r = icmp eq i32 %and, 0		%r = icmp eq i32 %and, 0
ret i1 %r		ret i1 %r
}		}

; Negative tests		; Negative tests
Show All 30 Lines
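
The shl tests above cover the second fold, `(X & (signbit << Y)) ==/!= 0` into `(X l>> Y) s>=/s< 0`, and the "X is constant" cases collapse to `ret i1 true`. The following is a minimal C++ sketch that checks both observations by brute force. It is not code from the patch: it models the general fold on i8 with Y in 0..7, checks the constant cases on 32-bit values with Y in 0..31, and all function names are hypothetical.

```cpp
#include <cstdint>

// Original pattern on 8-bit values: (X & (signbit << Y)) == 0.
// For Y > 0 the sign bit is shifted out, so the mask becomes 0.
bool shlMaskedIsZero(uint8_t x, unsigned y) {
  uint8_t mask = static_cast<uint8_t>(0x80u << y);  // signbit << Y, truncated
  return (x & mask) == 0;
}

// Folded pattern: (X l>> Y) s>= 0.
bool lshrIsNonNegative(uint8_t x, unsigned y) {
  uint8_t shifted = static_cast<uint8_t>(x >> y);   // logical shift right
  return static_cast<int8_t>(shifted) >= 0;         // signed comparison
}

// Returns true when both forms agree for every X and every Y in 0..7.
bool shlFoldHoldsForI8() {
  for (unsigned x = 0; x < 256; ++x)
    for (unsigned y = 0; y < 8; ++y)
      if (shlMaskedIsZero(static_cast<uint8_t>(x), y) !=
          lshrIsNonNegative(static_cast<uint8_t>(x), y))
        return false;
  return true;
}

// Constant case: (C l>> Y) s>= 0 for every in-range Y. This holds whenever
// C has a clear sign bit, which is why a constant X can fold to 'true'.
bool constantFoldsToTrue(uint32_t c) {
  for (unsigned y = 0; y < 32; ++y)
    if (static_cast<int32_t>(c >> y) < 0)
      return false;
  return true;
}
```

With the constants used in the tests (12345 and 1) the sign bit is clear, so a logical right shift can never produce a negative value. A constant with the sign bit set would not fold this way, since Y = 0 leaves it negative.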