This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] dropRedundantMaskingOfLeftShiftInput(): pat. a/b with mask (PR42563)
ClosedPublic

Authored by lebedev.ri on Sep 17 2019, 1:11 PM.

Details

Summary

And this is finally the interesting part of that fold!

If we have a pattern (x & (~(-1 << maskNbits))) << shiftNbits,
we already have a fold that will drop the & (~(-1 << maskNbits))
mask iff (maskNbits+shiftNbits) u>= bitwidth(x).
But that is needlessly restrictive; there is a more general fold here:

In this pattern, (maskNbits+shiftNbits) is the number of low bits
of x that can survive into the final value.
So even if (maskNbits+shiftNbits) u< bitwidth(x), we can still
fold; we just need to apply a constant mask afterwards:

Name: a, normal+mask
  %onebit = shl i32 -1, C1
  %mask = xor i32 %onebit, -1
  %masked = and i32 %mask, %x
  %r = shl i32 %masked, C2
=>
  %n0 = shl i32 %x, C2
  %n1 = add i32 C1, C2
  %n2 = zext i32 %n1 to i64
  %n3 = shl i64 -1, %n2
  %n4 = xor i64 %n3, -1
  %n5 = trunc i64 %n4 to i32
  %r = and i32 %n0, %n5

https://rise4fun.com/Alive/F5R
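
As a standalone sanity check of the arithmetic, here is one scalar instance of the fold in C++ (a sketch, not part of the patch; the constants are arbitrary):

  #include <cassert>
  #include <cstdint>

  int main() {
    // Pattern a: (x & ~(-1 << C1)) << C2, with C1 + C2 < 32,
    // so the result still needs a constant mask after the fold.
    const uint32_t x = 0xDEADBEEF;
    const uint32_t C1 = 20, C2 = 8;

    uint32_t before = (x & ~(uint32_t(-1) << C1)) << C2;

    // New form: shift first, compute the mask in a type twice as wide,
    // then truncate it back and apply it.
    uint64_t wideMask = ~(uint64_t(-1) << (C1 + C2));
    uint32_t after = (x << C2) & uint32_t(wideMask);

    assert(before == after); // both are 0x0DBEEF00
    return 0;
  }

For lanes where C1 + C2 is u>= 32, the wide mask truncates back to all-ones, i.e. such lanes need no extra masking at all.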

Naturally, the old %masked will have to be one-use.
A similar fold exists for patterns c, d, e; I will post that patch later.

https://bugs.llvm.org/show_bug.cgi?id=42563

Diff Detail

Repository
rL LLVM

Event Timeline

lebedev.ri created this revision.Sep 17 2019, 1:11 PM

Upload correct diff with no noise in the tests.

lebedev.ri edited the summary of this revision.

Rebased, NFC.

spatel added inline comments.Sep 19 2019, 8:42 AM
llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
176–180 ↗(On Diff #220712)

Is there a test showing that we need this ext+trunc complexity?

See if I've botched this Alive somehow, but the simpler constant mask appears to work:
https://rise4fun.com/Alive/ArQC

lebedev.ri added inline comments.Sep 19 2019, 9:19 AM
llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
176–180 ↗(On Diff #220712)

Hmm. The reason I went with ext/trunc is: https://rise4fun.com/Alive/o5l
In your example Alive does not complain because those are constants, so somehow the usual poison rules don't apply?
Are we sure this is the correct behavior and not an Alive limitation?

lebedev.ri marked an inline comment as done.Sep 19 2019, 9:20 AM
lebedev.ri added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
176–180 ↗(On Diff #220712)

I.e., I don't think I checked what happens if ConstantExpr::getShl() is called with such an out-of-bounds shift amount. Does it handle it correctly, or will UBSan complain, etc.?

lebedev.ri marked 3 inline comments as done.Sep 19 2019, 10:33 AM
lebedev.ri added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
176–180 ↗(On Diff #220712)

Tried it. No, we can't do this; it defeats the whole point of losslessly handling the lanes that need no extra masking.

diff --git a/llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp b/llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
index 3f466495c5e..8db01b4d4bd 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
@@ -171,21 +171,10 @@ dropRedundantMaskingOfLeftShiftInput(BinaryOperator *OuterShift,
       // But for a mask we need to get rid of old masking instruction.
       if (!Masked->hasOneUse())
         return nullptr; // Else we can't perform the fold.
-      // We should produce compute the mask in wider type, and truncate later!
-      // Get type twice as wide element-wise (same number of elements!).
-      Type *ExtendedScalarTy = Type::getIntNTy(Ty->getContext(), 2 * BitWidth);
-      Type *ExtendedTy =
-          Ty->isVectorTy()
-              ? VectorType::get(ExtendedScalarTy, Ty->getVectorNumElements())
-              : ExtendedScalarTy;
-      auto *ExtendedSumOfShAmts =
-          ConstantExpr::getZExt(SumOfShAmts, ExtendedTy);
       // And compute the mask as usual: ~(-1 << (SumOfShAmts))
-      auto *ExtendedAllOnes = ConstantExpr::getAllOnesValue(ExtendedTy);
-      auto *ExtendedInvertedMask =
-          ConstantExpr::getShl(ExtendedAllOnes, ExtendedSumOfShAmts);
-      auto *ExtendedMask = ConstantExpr::getNot(ExtendedInvertedMask);
-      NewMask = ConstantExpr::getTrunc(ExtendedMask, Ty);
+      auto *AllOnes = ConstantExpr::getAllOnesValue(Ty);
+      auto *InvertedMask = ConstantExpr::getShl(AllOnes, SumOfShAmts);
+      NewMask = ConstantExpr::getNot(InvertedMask);
     } else
       NewMask = nullptr; // No mask needed.
     // All good, we can do this fold.
diff --git a/llvm/test/Transforms/InstCombine/partally-redundant-left-shift-input-masking-variant-a.ll b/llvm/test/Transforms/InstCombine/partally-redundant-left-shift-input-masking-variant-a.ll
index 5445275ad1c..235e152d2fe 100644
--- a/llvm/test/Transforms/InstCombine/partally-redundant-left-shift-input-masking-variant-a.ll
+++ b/llvm/test/Transforms/InstCombine/partally-redundant-left-shift-input-masking-variant-a.ll
@@ -82,7 +82,7 @@ define <8 x i32> @t2_vec_nonsplat(<8 x i32> %x, <8 x i32> %nbits) {
 ; CHECK-NEXT:    call void @use8xi32(<8 x i32> [[T2]])
 ; CHECK-NEXT:    call void @use8xi32(<8 x i32> [[T4]])
 ; CHECK-NEXT:    [[TMP1:%.*]] = shl <8 x i32> [[X:%.*]], [[T4]]
-; CHECK-NEXT:    [[T5:%.*]] = and <8 x i32> [[TMP1]], <i32 undef, i32 0, i32 1, i32 2147483647, i32 -1, i32 -1, i32 -1, i32 undef>
+; CHECK-NEXT:    [[T5:%.*]] = and <8 x i32> [[TMP1]], <i32 undef, i32 0, i32 1, i32 2147483647, i32 undef, i32 undef, i32 undef, i32 undef>
 ; CHECK-NEXT:    ret <8 x i32> [[T5]]
 ;
   %t0 = add <8 x i32> %nbits, <i32 -33, i32 -32, i32 -31, i32 -1, i32 0, i32 1, i32 31, i32 32>
diff --git a/llvm/test/Transforms/InstCombine/partally-redundant-left-shift-input-masking-variant-b.ll b/llvm/test/Transforms/InstCombine/partally-redundant-left-shift-input-masking-variant-b.ll
index 6165b579661..0a7c0e5d030 100644
--- a/llvm/test/Transforms/InstCombine/partally-redundant-left-shift-input-masking-variant-b.ll
+++ b/llvm/test/Transforms/InstCombine/partally-redundant-left-shift-input-masking-variant-b.ll
@@ -82,7 +82,7 @@ define <8 x i32> @t2_vec_nonsplat(<8 x i32> %x, <8 x i32> %nbits) {
 ; CHECK-NEXT:    call void @use8xi32(<8 x i32> [[T2]])
 ; CHECK-NEXT:    call void @use8xi32(<8 x i32> [[T4]])
 ; CHECK-NEXT:    [[TMP1:%.*]] = shl <8 x i32> [[X:%.*]], [[T4]]
-; CHECK-NEXT:    [[T5:%.*]] = and <8 x i32> [[TMP1]], <i32 undef, i32 0, i32 1, i32 2147483647, i32 -1, i32 -1, i32 -1, i32 undef>
+; CHECK-NEXT:    [[T5:%.*]] = and <8 x i32> [[TMP1]], <i32 undef, i32 0, i32 1, i32 2147483647, i32 undef, i32 undef, i32 undef, i32 undef>
 ; CHECK-NEXT:    ret <8 x i32> [[T5]]
 ;
   %t0 = add <8 x i32> %nbits, <i32 -33, i32 -32, i32 -31, i32 -1, i32 0, i32 1, i32 31, i32 32>
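
In other words, the lanes that change from -1 to undef in the test diff above are exactly those where the sum of shift amounts is u>= the bit width: in the original type the replacement mask for such a lane would need an over-wide shl, while in the doubled width it folds cleanly and truncates back to all-ones. A rough standalone C++ illustration of the mask computation (not part of the patch):

  #include <cstdint>
  #include <cstdio>

  int main() {
    // Compute the replacement mask the way the patch does: in a type
    // twice as wide as i32, then truncate back down.
    const uint32_t sums[] = {28, 31, 32, 33, 40}; // candidate SumOfShAmts values
    for (uint32_t sum : sums) {
      uint64_t wide = ~(uint64_t(-1) << sum); // shift amount is always < 64 here
      uint32_t mask = uint32_t(wide);         // trunc i64 -> i32
      // For sum >= 32 this prints 0xFFFFFFFF ("keep every bit"); computing
      // ~(-1 << sum) directly in 32 bits would need a shift by >= the bit
      // width, which is UB in C++ and poison in LLVM IR.
      printf("sum=%2u mask=0x%08X\n", (unsigned)sum, (unsigned)mask);
    }
    return 0;
  }
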
spatel accepted this revision.Sep 20 2019, 10:42 AM

LGTM

llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
176–180 ↗(On Diff #220712)

Ok, thanks for checking. Can we make the reasoning clearer in the code comment? Something like:

// The mask must be computed in a type twice as wide to ensure
// that no bits are lost if the sum-of-shifts is wider than the base type.
This revision is now accepted and ready to land.Sep 20 2019, 10:42 AM

LGTM

Thank you for the review.

I'm kinda worried about the single-use check there.
I don't know for a fact that it is "bad", but I suspect it may be.
But that is something for later.

llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
176–180 ↗(On Diff #220712)

Yes, I will improve the comment here.

@spatel will you want to review the sibling patch D67725, or should I "self-review" it, since it's essentially identical to this one?

@spatel will you want to review the sibling patch D67725, or should I "self-review" it, since it's essentially identical to this one?

Sorry for the delay - I'll take a look today.

This revision was automatically updated to reflect the committed changes.