This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Fix SSE2/AVX2 vector shift by constant
ClosedPublic

Authored by RKSimon on Aug 5 2015, 3:55 AM.


Details

Reviewers

qcolombet
andreadb
mkuper

Commits

rG3815c16bf86c: [InstCombine] Fix SSE2/AVX2 vector logical shift by constant
rL244341: [InstCombine] Fix SSE2/AVX2 vector logical shift by constant

Summary

This patch fixes the SSE2/AVX2 vector shift-by-constant InstCombine fold so that it correctly handles the fact that the shift amount is formed from the entire lower 64 bits of the count operand, and not just from the lowest element as the code currently assumes.

e.g.

%1 = tail call <4 x i32> @llvm.x86.sse2.psrl.d(<4 x i32> %v, <4 x i32> <i32 15, i32 15, i32 15, i32 15>)

In this case, (V)PSRLD doesn't perform a lshr by 15 but in fact attempts to shift by 64424509455 ((15 << 32) | 15) - giving a zero result.
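
The arithmetic above can be checked with a small standalone sketch. It uses plain 64-bit integers rather than LLVM's APInt, and the helper name `lowQwordShiftCount` is hypothetical (it is not part of the patch). It assumes element 0 of the count vector occupies the least significant bits of the low 64 bits.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper, for illustration only: builds the 64-bit shift count
// read from the low 64 bits of the count operand.
// `elts` holds the constant vector elements (element 0 first);
// `bitWidth` is the element width in bits (16, 32 or 64).
uint64_t lowQwordShiftCount(const std::vector<uint64_t> &elts,
                            unsigned bitWidth) {
  assert(bitWidth != 0 && 64 % bitWidth == 0 && "unexpected element width");
  unsigned numSubElts = 64 / bitWidth;
  assert(elts.size() >= numSubElts && "count vector too short");

  // Mask that keeps only the low `bitWidth` bits of each element.
  uint64_t mask = bitWidth == 64 ? ~0ULL : ((1ULL << bitWidth) - 1);

  uint64_t count = 0;
  for (unsigned i = 0; i != numSubElts; ++i) {
    // Walk from the highest sub-element down to element 0.
    unsigned idx = (numSubElts - 1) - i;
    // A 64-bit shift of a 64-bit value is undefined in C++, so guard it.
    count = bitWidth == 64 ? 0 : (count << bitWidth);
    count |= elts[idx] & mask;
  }
  return count;
}
```

Under these assumptions, a <4 x i32> splat of 15 produces 64424509455, whereas <i32 15, i32 0, i32 9999, i32 9999> produces 15, because only the two lowest elements contribute.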

In addition, this review adds support for the SSE2/AVX2 ashr shift-by-constant and also recognizes shift-by-zero from a ConstantAggregateZero type (PR23821). I can commit these changes separately if necessary.

Diff Detail

Repository: rL LLVM

Event Timeline

RKSimon updated this revision to Diff 31342. Aug 5 2015, 3:55 AM

RKSimon retitled this revision to [InstCombine] Fix SSE2/AVX2 vector shift by constant.

RKSimon updated this object.

RKSimon added reviewers: qcolombet, mkuper, andreadb.

RKSimon set the repository for this revision to rL LLVM.

RKSimon added a subscriber: llvm-commits.

Hi Simon,

As pointed out in the comments below, I suggest splitting this patch into two separate patches.
I'd like you to move the code that "combines" packed arithmetic shifts into a separate patch. That patch will also have to remove the target-specific DAG combiner rules in the x86 backend (your patch will make those rules redundant).

In my opinion it is okay if this patch also fixes PR23821 (that fix is very small, and it probably makes sense to just have it in this patch).

If you address the (minor) comments below, then the patch LGTM.

Thanks Simon!

lib/Transforms/InstCombine/InstCombineCalls.cpp
213 ↗	(On Diff #31342)	I would probably be more specific and explicitly quote the architecture manual, which says: "only the first 64-bits of a 128-bit count operand are checked to compute the count". But it is up to you; that comment is probably already good enough :-).
787–799 ↗	(On Diff #31342)	Can this be committed in a separate patch? For simplicity I would prefer that this patch just fix the problem with the shift count. You can add rules for combining arithmetic shifts in a follow-up patch. That new patch will also have to get rid of the (horrible) target-specific DAG combine rules on psra(i) intrinsics that we currently run on x86 as part of 'PerformINTRINSIC_WO_CHAINCombine'.
test/Transforms/InstCombine/x86-vector-shifts.ll
9–10 ↗	(On Diff #31342)	We should check that we don't have any instruction before the return. We want to make sure that a shift-by-zero is folded away. In this case you can check that no shift is generated before the return statement (and that the tail call to the intrinsic function is no longer in the code). You should do the same for all the other tests that check the instcombine behavior for shift-by-zero.

This revision is now accepted and ready to land. Aug 5 2015, 5:46 AM

Updated patch. I'll submit the ASHR fix as a separate review.

LGTM. Thanks!

Closed by commit rL244341: [InstCombine] Fix SSE2/AVX2 vector logical shift by constant (authored by RKSimon). Aug 7 2015, 11:23 AM

This revision was automatically updated to reflect the committed changes.

RKSimon mentioned this in D11886: [InstCombine] Move SSE2/AVX2 arithmetic vector shift folding to instcombiner. Aug 9 2015, 6:16 AM

RKSimon mentioned this in rL244495: [InstCombine] Move SSE2/AVX2 arithmetic vector shift folding to instcombiner. Aug 10 2015, 1:22 PM

Revision Contents

Path	Size
llvm/trunk/lib/Transforms/InstCombine/InstCombineCalls.cpp	55 lines
llvm/trunk/test/Transforms/InstCombine/x86-vector-shifts.ll	309 lines

Diff 31532

llvm/trunk/lib/Transforms/InstCombine/InstCombineCalls.cpp

(first 194 lines not shown; context: Instruction *InstCombiner::SimplifyMemSet(MemSetInst *MI) {)

   }

   return nullptr;
 }

 static Value *SimplifyX86immshift(const IntrinsicInst &II,
                                   InstCombiner::BuilderTy &Builder,
                                   bool ShiftLeft) {
-  // Simplify if count is constant. To 0 if >= BitWidth,
-  // otherwise to shl/lshr.
-  auto CDV = dyn_cast<ConstantDataVector>(II.getArgOperand(1));
-  auto CInt = dyn_cast<ConstantInt>(II.getArgOperand(1));
-  if (!CDV && !CInt)
+  // Simplify if count is constant.
+  auto Arg1 = II.getArgOperand(1);
+  auto CAZ = dyn_cast<ConstantAggregateZero>(Arg1);
+  auto CDV = dyn_cast<ConstantDataVector>(Arg1);
+  auto CInt = dyn_cast<ConstantInt>(Arg1);
+  if (!CAZ && !CDV && !CInt)
     return nullptr;
-  ConstantInt *Count;
-  if (CDV)
-    Count = cast<ConstantInt>(CDV->getElementAsConstant(0));
-  else
-    Count = CInt;
+
+  APInt Count(64, 0);
+  if (CDV) {
+    // SSE2/AVX2 uses all the first 64-bits of the 128-bit vector
+    // operand to compute the shift amount.
+    auto VT = cast<VectorType>(CDV->getType());
+    unsigned BitWidth = VT->getElementType()->getPrimitiveSizeInBits();
+    assert((64 % BitWidth) == 0 && "Unexpected packed shift size");
+    unsigned NumSubElts = 64 / BitWidth;
+
+    // Concatenate the sub-elements to create the 64-bit value.
+    for (unsigned i = 0; i != NumSubElts; ++i) {
+      unsigned SubEltIdx = (NumSubElts - 1) - i;
+      auto SubElt = cast<ConstantInt>(CDV->getElementAsConstant(SubEltIdx));
+      Count = Count.shl(BitWidth);
+      Count |= SubElt->getValue().zextOrTrunc(64);
+    }
+  }
+  else if (CInt)
+    Count = CInt->getValue();

   auto Vec = II.getArgOperand(0);
   auto VT = cast<VectorType>(Vec->getType());
   auto SVT = VT->getElementType();
-  if (Count->getZExtValue() > (SVT->getPrimitiveSizeInBits() - 1))
-    return ConstantAggregateZero::get(VT);
-
   unsigned VWidth = VT->getNumElements();
+  unsigned BitWidth = SVT->getPrimitiveSizeInBits();
+
+  // If shift-by-zero then just return the original value.
+  if (Count == 0)
+    return Vec;
+
+  // Handle cases when Shift >= BitWidth - just return zero.
+  if (Count.uge(BitWidth))
+    return ConstantAggregateZero::get(VT);

   // Get a constant vector of the same type as the first operand.
-  auto VTCI = ConstantInt::get(VT->getElementType(), Count->getZExtValue());
+  auto ShiftAmt = ConstantInt::get(SVT, Count.zextOrTrunc(BitWidth));
+  auto ShiftVec = Builder.CreateVectorSplat(VWidth, ShiftAmt);

   if (ShiftLeft)
-    return Builder.CreateShl(Vec, Builder.CreateVectorSplat(VWidth, VTCI));
+    return Builder.CreateShl(Vec, ShiftVec);

-  return Builder.CreateLShr(Vec, Builder.CreateVectorSplat(VWidth, VTCI));
+  return Builder.CreateLShr(Vec, ShiftVec);
 }

 static Value *SimplifyX86extend(const IntrinsicInst &II,
                                 InstCombiner::BuilderTy &Builder,
                                 bool SignExtend) {
   VectorType *SrcTy = cast<VectorType>(II.getArgOperand(0)->getType());
   VectorType *DstTy = cast<VectorType>(II.getType());
   unsigned NumDstElts = DstTy->getNumElements();

(remaining 1,723 lines not shown)
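
As a rough illustration of the three outcomes the updated function distinguishes once the 64-bit count is known, the following standalone sketch classifies a count against the element width. It does not use LLVM types; `FoldKind`, `FoldResult` and `classifyShift` are hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names, for illustration only (not part of InstCombine).
enum class FoldKind {
  Identity, // shift-by-zero: the original vector is returned
  Zero,     // count >= element width: the result is all zeros
  Shift     // otherwise: a generic shl/lshr by a splat of the count
};

struct FoldResult {
  FoldKind kind;
  unsigned amount; // only meaningful when kind == FoldKind::Shift
};

// Classifies a 64-bit shift count against the element width in bits.
FoldResult classifyShift(uint64_t count, unsigned bitWidth) {
  if (count == 0)
    return {FoldKind::Identity, 0};
  if (count >= bitWidth)
    return {FoldKind::Zero, 0};
  // count < bitWidth <= 64 here, so the narrowing cast is lossless.
  return {FoldKind::Shift, static_cast<unsigned>(count)};
}
```

For instance, the count 64424509455 from the summary is classified as `Zero` for 32-bit elements, while a count of 15 is classified as `Shift` with amount 15.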

llvm/trunk/test/Transforms/InstCombine/x86-vector-shifts.ll

	; RUN: opt < %s -instcombine -S | FileCheck %s			; RUN: opt < %s -instcombine -S | FileCheck %s
	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

	;			;
	; LSHR - Immediate			; LSHR - Immediate
	;			;

	define <8 x i16> @sse2_psrli_w_0(<8 x i16> %v) nounwind readnone uwtable {			define <8 x i16> @sse2_psrli_w_0(<8 x i16> %v) nounwind readnone uwtable {
	; CHECK-LABEL: @sse2_psrli_w_0			; CHECK-LABEL: @sse2_psrli_w_0
	; CHECK: ret <8 x i16> %v			; CHECK-NEXT: ret <8 x i16> %v
	%1 = tail call <8 x i16> @llvm.x86.sse2.psrli.w(<8 x i16> %v, i32 0)			%1 = tail call <8 x i16> @llvm.x86.sse2.psrli.w(<8 x i16> %v, i32 0)
	ret <8 x i16> %1			ret <8 x i16> %1
	}			}

	define <8 x i16> @sse2_psrli_w_15(<8 x i16> %v) nounwind readnone uwtable {			define <8 x i16> @sse2_psrli_w_15(<8 x i16> %v) nounwind readnone uwtable {
	; CHECK-LABEL: @sse2_psrli_w_15			; CHECK-LABEL: @sse2_psrli_w_15
	; CHECK: %1 = lshr <8 x i16> %v, <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>			; CHECK-NEXT: %1 = lshr <8 x i16> %v, <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>
	; CHECK: ret <8 x i16> %1			; CHECK-NEXT: ret <8 x i16> %1
	%1 = tail call <8 x i16> @llvm.x86.sse2.psrli.w(<8 x i16> %v, i32 15)			%1 = tail call <8 x i16> @llvm.x86.sse2.psrli.w(<8 x i16> %v, i32 15)
	ret <8 x i16> %1			ret <8 x i16> %1
	}			}

	define <8 x i16> @sse2_psrli_w_64(<8 x i16> %v) nounwind readnone uwtable {			define <8 x i16> @sse2_psrli_w_64(<8 x i16> %v) nounwind readnone uwtable {
	; CHECK-LABEL: @sse2_psrli_w_64			; CHECK-LABEL: @sse2_psrli_w_64
	; CHECK: ret <8 x i16> zeroinitializer			; CHECK-NEXT: ret <8 x i16> zeroinitializer
	%1 = tail call <8 x i16> @llvm.x86.sse2.psrli.w(<8 x i16> %v, i32 64)			%1 = tail call <8 x i16> @llvm.x86.sse2.psrli.w(<8 x i16> %v, i32 64)
	ret <8 x i16> %1			ret <8 x i16> %1
	}			}

	define <4 x i32> @sse2_psrli_d_0(<4 x i32> %v) nounwind readnone uwtable {			define <4 x i32> @sse2_psrli_d_0(<4 x i32> %v) nounwind readnone uwtable {
	; CHECK-LABEL: @sse2_psrli_d_0			; CHECK-LABEL: @sse2_psrli_d_0
	; CHECK: ret <4 x i32> %v			; CHECK-NEXT: ret <4 x i32> %v
	%1 = tail call <4 x i32> @llvm.x86.sse2.psrli.d(<4 x i32> %v, i32 0)			%1 = tail call <4 x i32> @llvm.x86.sse2.psrli.d(<4 x i32> %v, i32 0)
	ret <4 x i32> %1			ret <4 x i32> %1
	}			}

	define <4 x i32> @sse2_psrli_d_15(<4 x i32> %v) nounwind readnone uwtable {			define <4 x i32> @sse2_psrli_d_15(<4 x i32> %v) nounwind readnone uwtable {
	; CHECK-LABEL: @sse2_psrli_d_15			; CHECK-LABEL: @sse2_psrli_d_15
	; CHECK: %1 = lshr <4 x i32> %v, <i32 15, i32 15, i32 15, i32 15>			; CHECK-NEXT: %1 = lshr <4 x i32> %v, <i32 15, i32 15, i32 15, i32 15>
	; CHECK: ret <4 x i32> %1			; CHECK-NEXT: ret <4 x i32> %1
	%1 = tail call <4 x i32> @llvm.x86.sse2.psrli.d(<4 x i32> %v, i32 15)			%1 = tail call <4 x i32> @llvm.x86.sse2.psrli.d(<4 x i32> %v, i32 15)
	ret <4 x i32> %1			ret <4 x i32> %1
	}			}

	define <4 x i32> @sse2_psrli_d_64(<4 x i32> %v) nounwind readnone uwtable {			define <4 x i32> @sse2_psrli_d_64(<4 x i32> %v) nounwind readnone uwtable {
	; CHECK-LABEL: @sse2_psrli_d_64			; CHECK-LABEL: @sse2_psrli_d_64
	; CHECK: ret <4 x i32> zeroinitializer			; CHECK-NEXT: ret <4 x i32> zeroinitializer
	%1 = tail call <4 x i32> @llvm.x86.sse2.psrli.d(<4 x i32> %v, i32 64)			%1 = tail call <4 x i32> @llvm.x86.sse2.psrli.d(<4 x i32> %v, i32 64)
	ret <4 x i32> %1			ret <4 x i32> %1
	}			}

	define <2 x i64> @sse2_psrli_q_0(<2 x i64> %v) nounwind readnone uwtable {			define <2 x i64> @sse2_psrli_q_0(<2 x i64> %v) nounwind readnone uwtable {
	; CHECK-LABEL: @sse2_psrli_q_0			; CHECK-LABEL: @sse2_psrli_q_0
	; CHECK: ret <2 x i64> %v			; CHECK-NEXT: ret <2 x i64> %v
	%1 = tail call <2 x i64> @llvm.x86.sse2.psrli.q(<2 x i64> %v, i32 0)			%1 = tail call <2 x i64> @llvm.x86.sse2.psrli.q(<2 x i64> %v, i32 0)
	ret <2 x i64> %1			ret <2 x i64> %1
	}			}

	define <2 x i64> @sse2_psrli_q_15(<2 x i64> %v) nounwind readnone uwtable {			define <2 x i64> @sse2_psrli_q_15(<2 x i64> %v) nounwind readnone uwtable {
	; CHECK-LABEL: @sse2_psrli_q_15			; CHECK-LABEL: @sse2_psrli_q_15
	; CHECK: %1 = lshr <2 x i64> %v, <i64 15, i64 15>			; CHECK-NEXT: %1 = lshr <2 x i64> %v, <i64 15, i64 15>
	; CHECK: ret <2 x i64> %1			; CHECK-NEXT: ret <2 x i64> %1
	%1 = tail call <2 x i64> @llvm.x86.sse2.psrli.q(<2 x i64> %v, i32 15)			%1 = tail call <2 x i64> @llvm.x86.sse2.psrli.q(<2 x i64> %v, i32 15)
	ret <2 x i64> %1			ret <2 x i64> %1
	}			}

	define <2 x i64> @sse2_psrli_q_64(<2 x i64> %v) nounwind readnone uwtable {			define <2 x i64> @sse2_psrli_q_64(<2 x i64> %v) nounwind readnone uwtable {
	; CHECK-LABEL: @sse2_psrli_q_64			; CHECK-LABEL: @sse2_psrli_q_64
	; CHECK: ret <2 x i64> zeroinitializer			; CHECK-NEXT: ret <2 x i64> zeroinitializer
	%1 = tail call <2 x i64> @llvm.x86.sse2.psrli.q(<2 x i64> %v, i32 64)			%1 = tail call <2 x i64> @llvm.x86.sse2.psrli.q(<2 x i64> %v, i32 64)
	ret <2 x i64> %1			ret <2 x i64> %1
	}			}

	define <16 x i16> @avx2_psrli_w_0(<16 x i16> %v) nounwind readnone uwtable {			define <16 x i16> @avx2_psrli_w_0(<16 x i16> %v) nounwind readnone uwtable {
	; CHECK-LABEL: @avx2_psrli_w_0			; CHECK-LABEL: @avx2_psrli_w_0
	; CHECK: ret <16 x i16> %v			; CHECK-NEXT: ret <16 x i16> %v
	%1 = tail call <16 x i16> @llvm.x86.avx2.psrli.w(<16 x i16> %v, i32 0)			%1 = tail call <16 x i16> @llvm.x86.avx2.psrli.w(<16 x i16> %v, i32 0)
	ret <16 x i16> %1			ret <16 x i16> %1
	}			}

	define <16 x i16> @avx2_psrli_w_15(<16 x i16> %v) nounwind readnone uwtable {			define <16 x i16> @avx2_psrli_w_15(<16 x i16> %v) nounwind readnone uwtable {
	; CHECK-LABEL: @avx2_psrli_w_15			; CHECK-LABEL: @avx2_psrli_w_15
	; CHECK: %1 = lshr <16 x i16> %v, <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>			; CHECK-NEXT: %1 = lshr <16 x i16> %v, <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>
	; CHECK: ret <16 x i16> %1			; CHECK-NEXT: ret <16 x i16> %1
	%1 = tail call <16 x i16> @llvm.x86.avx2.psrli.w(<16 x i16> %v, i32 15)			%1 = tail call <16 x i16> @llvm.x86.avx2.psrli.w(<16 x i16> %v, i32 15)
	ret <16 x i16> %1			ret <16 x i16> %1
	}			}

	define <16 x i16> @avx2_psrli_w_64(<16 x i16> %v) nounwind readnone uwtable {			define <16 x i16> @avx2_psrli_w_64(<16 x i16> %v) nounwind readnone uwtable {
	; CHECK-LABEL: @avx2_psrli_w_64			; CHECK-LABEL: @avx2_psrli_w_64
	; CHECK: ret <16 x i16> zeroinitializer			; CHECK-NEXT: ret <16 x i16> zeroinitializer
	%1 = tail call <16 x i16> @llvm.x86.avx2.psrli.w(<16 x i16> %v, i32 64)			%1 = tail call <16 x i16> @llvm.x86.avx2.psrli.w(<16 x i16> %v, i32 64)
	ret <16 x i16> %1			ret <16 x i16> %1
	}			}

	define <8 x i32> @avx2_psrli_d_0(<8 x i32> %v) nounwind readnone uwtable {			define <8 x i32> @avx2_psrli_d_0(<8 x i32> %v) nounwind readnone uwtable {
	; CHECK-LABEL: @avx2_psrli_d_0			; CHECK-LABEL: @avx2_psrli_d_0
	; CHECK: ret <8 x i32> %v			; CHECK-NEXT: ret <8 x i32> %v
	%1 = tail call <8 x i32> @llvm.x86.avx2.psrli.d(<8 x i32> %v, i32 0)			%1 = tail call <8 x i32> @llvm.x86.avx2.psrli.d(<8 x i32> %v, i32 0)
	ret <8 x i32> %1			ret <8 x i32> %1
	}			}

	define <8 x i32> @avx2_psrli_d_15(<8 x i32> %v) nounwind readnone uwtable {			define <8 x i32> @avx2_psrli_d_15(<8 x i32> %v) nounwind readnone uwtable {
	; CHECK-LABEL: @avx2_psrli_d_15			; CHECK-LABEL: @avx2_psrli_d_15
	; CHECK: %1 = lshr <8 x i32> %v, <i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15>			; CHECK-NEXT: %1 = lshr <8 x i32> %v, <i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15>
	; CHECK: ret <8 x i32> %1			; CHECK-NEXT: ret <8 x i32> %1
	%1 = tail call <8 x i32> @llvm.x86.avx2.psrli.d(<8 x i32> %v, i32 15)			%1 = tail call <8 x i32> @llvm.x86.avx2.psrli.d(<8 x i32> %v, i32 15)
	ret <8 x i32> %1			ret <8 x i32> %1
	}			}

	define <8 x i32> @avx2_psrli_d_64(<8 x i32> %v) nounwind readnone uwtable {			define <8 x i32> @avx2_psrli_d_64(<8 x i32> %v) nounwind readnone uwtable {
	; CHECK-LABEL: @avx2_psrli_d_64			; CHECK-LABEL: @avx2_psrli_d_64
	; CHECK: ret <8 x i32> zeroinitializer			; CHECK-NEXT: ret <8 x i32> zeroinitializer
	%1 = tail call <8 x i32> @llvm.x86.avx2.psrli.d(<8 x i32> %v, i32 64)			%1 = tail call <8 x i32> @llvm.x86.avx2.psrli.d(<8 x i32> %v, i32 64)
	ret <8 x i32> %1			ret <8 x i32> %1
	}			}

	define <4 x i64> @avx2_psrli_q_0(<4 x i64> %v) nounwind readnone uwtable {			define <4 x i64> @avx2_psrli_q_0(<4 x i64> %v) nounwind readnone uwtable {
	; CHECK-LABEL: @avx2_psrli_q_0			; CHECK-LABEL: @avx2_psrli_q_0
	; CHECK: ret <4 x i64> %v			; CHECK-NEXT: ret <4 x i64> %v
	%1 = tail call <4 x i64> @llvm.x86.avx2.psrli.q(<4 x i64> %v, i32 0)			%1 = tail call <4 x i64> @llvm.x86.avx2.psrli.q(<4 x i64> %v, i32 0)
	ret <4 x i64> %1			ret <4 x i64> %1
	}			}

	define <4 x i64> @avx2_psrli_q_15(<4 x i64> %v) nounwind readnone uwtable {			define <4 x i64> @avx2_psrli_q_15(<4 x i64> %v) nounwind readnone uwtable {
	; CHECK-LABEL: @avx2_psrli_q_15			; CHECK-LABEL: @avx2_psrli_q_15
	; CHECK: %1 = lshr <4 x i64> %v, <i64 15, i64 15, i64 15, i64 15>			; CHECK-NEXT: %1 = lshr <4 x i64> %v, <i64 15, i64 15, i64 15, i64 15>
	; CHECK: ret <4 x i64> %1			; CHECK-NEXT: ret <4 x i64> %1
	%1 = tail call <4 x i64> @llvm.x86.avx2.psrli.q(<4 x i64> %v, i32 15)			%1 = tail call <4 x i64> @llvm.x86.avx2.psrli.q(<4 x i64> %v, i32 15)
	ret <4 x i64> %1			ret <4 x i64> %1
	}			}

	define <4 x i64> @avx2_psrli_q_64(<4 x i64> %v) nounwind readnone uwtable {			define <4 x i64> @avx2_psrli_q_64(<4 x i64> %v) nounwind readnone uwtable {
	; CHECK-LABEL: @avx2_psrli_q_64			; CHECK-LABEL: @avx2_psrli_q_64
	; CHECK: ret <4 x i64> zeroinitializer			; CHECK-NEXT: ret <4 x i64> zeroinitializer
	%1 = tail call <4 x i64> @llvm.x86.avx2.psrli.q(<4 x i64> %v, i32 64)			%1 = tail call <4 x i64> @llvm.x86.avx2.psrli.q(<4 x i64> %v, i32 64)
	ret <4 x i64> %1			ret <4 x i64> %1
	}			}

	;			;
	; SHL - Immediate			; SHL - Immediate
	;			;

	define <8 x i16> @sse2_pslli_w_0(<8 x i16> %v) nounwind readnone uwtable {			define <8 x i16> @sse2_pslli_w_0(<8 x i16> %v) nounwind readnone uwtable {
	; CHECK-LABEL: @sse2_pslli_w_0			; CHECK-LABEL: @sse2_pslli_w_0
	; CHECK: ret <8 x i16> %v			; CHECK-NEXT: ret <8 x i16> %v
	%1 = tail call <8 x i16> @llvm.x86.sse2.pslli.w(<8 x i16> %v, i32 0)			%1 = tail call <8 x i16> @llvm.x86.sse2.pslli.w(<8 x i16> %v, i32 0)
	ret <8 x i16> %1			ret <8 x i16> %1
	}			}

	define <8 x i16> @sse2_pslli_w_15(<8 x i16> %v) nounwind readnone uwtable {			define <8 x i16> @sse2_pslli_w_15(<8 x i16> %v) nounwind readnone uwtable {
	; CHECK-LABEL: @sse2_pslli_w_15			; CHECK-LABEL: @sse2_pslli_w_15
	; CHECK: %1 = shl <8 x i16> %v, <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>			; CHECK-NEXT: %1 = shl <8 x i16> %v, <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>
	; CHECK: ret <8 x i16> %1			; CHECK-NEXT: ret <8 x i16> %1
	%1 = tail call <8 x i16> @llvm.x86.sse2.pslli.w(<8 x i16> %v, i32 15)			%1 = tail call <8 x i16> @llvm.x86.sse2.pslli.w(<8 x i16> %v, i32 15)
	ret <8 x i16> %1			ret <8 x i16> %1
	}			}

	define <8 x i16> @sse2_pslli_w_64(<8 x i16> %v) nounwind readnone uwtable {			define <8 x i16> @sse2_pslli_w_64(<8 x i16> %v) nounwind readnone uwtable {
	; CHECK-LABEL: @sse2_pslli_w_64			; CHECK-LABEL: @sse2_pslli_w_64
	; CHECK: ret <8 x i16> zeroinitializer			; CHECK-NEXT: ret <8 x i16> zeroinitializer
	%1 = tail call <8 x i16> @llvm.x86.sse2.pslli.w(<8 x i16> %v, i32 64)			%1 = tail call <8 x i16> @llvm.x86.sse2.pslli.w(<8 x i16> %v, i32 64)
	ret <8 x i16> %1			ret <8 x i16> %1
	}			}

	define <4 x i32> @sse2_pslli_d_0(<4 x i32> %v) nounwind readnone uwtable {			define <4 x i32> @sse2_pslli_d_0(<4 x i32> %v) nounwind readnone uwtable {
	; CHECK-LABEL: @sse2_pslli_d_0			; CHECK-LABEL: @sse2_pslli_d_0
	; CHECK: ret <4 x i32> %v			; CHECK-NEXT: ret <4 x i32> %v
	%1 = tail call <4 x i32> @llvm.x86.sse2.pslli.d(<4 x i32> %v, i32 0)			%1 = tail call <4 x i32> @llvm.x86.sse2.pslli.d(<4 x i32> %v, i32 0)
	ret <4 x i32> %1			ret <4 x i32> %1
	}			}

	define <4 x i32> @sse2_pslli_d_15(<4 x i32> %v) nounwind readnone uwtable {			define <4 x i32> @sse2_pslli_d_15(<4 x i32> %v) nounwind readnone uwtable {
	; CHECK-LABEL: @sse2_pslli_d_15			; CHECK-LABEL: @sse2_pslli_d_15
	; CHECK: %1 = shl <4 x i32> %v, <i32 15, i32 15, i32 15, i32 15>			; CHECK-NEXT: %1 = shl <4 x i32> %v, <i32 15, i32 15, i32 15, i32 15>
	; CHECK: ret <4 x i32> %1			; CHECK-NEXT: ret <4 x i32> %1
	%1 = tail call <4 x i32> @llvm.x86.sse2.pslli.d(<4 x i32> %v, i32 15)			%1 = tail call <4 x i32> @llvm.x86.sse2.pslli.d(<4 x i32> %v, i32 15)
	ret <4 x i32> %1			ret <4 x i32> %1
	}			}

	define <4 x i32> @sse2_pslli_d_64(<4 x i32> %v) nounwind readnone uwtable {			define <4 x i32> @sse2_pslli_d_64(<4 x i32> %v) nounwind readnone uwtable {
	; CHECK-LABEL: @sse2_pslli_d_64			; CHECK-LABEL: @sse2_pslli_d_64
	; CHECK: ret <4 x i32> zeroinitializer			; CHECK-NEXT: ret <4 x i32> zeroinitializer
	%1 = tail call <4 x i32> @llvm.x86.sse2.pslli.d(<4 x i32> %v, i32 64)			%1 = tail call <4 x i32> @llvm.x86.sse2.pslli.d(<4 x i32> %v, i32 64)
	ret <4 x i32> %1			ret <4 x i32> %1
	}			}

	define <2 x i64> @sse2_pslli_q_0(<2 x i64> %v) nounwind readnone uwtable {			define <2 x i64> @sse2_pslli_q_0(<2 x i64> %v) nounwind readnone uwtable {
	; CHECK-LABEL: @sse2_pslli_q_0			; CHECK-LABEL: @sse2_pslli_q_0
	; CHECK: ret <2 x i64> %v			; CHECK-NEXT: ret <2 x i64> %v
	%1 = tail call <2 x i64> @llvm.x86.sse2.pslli.q(<2 x i64> %v, i32 0)			%1 = tail call <2 x i64> @llvm.x86.sse2.pslli.q(<2 x i64> %v, i32 0)
	ret <2 x i64> %1			ret <2 x i64> %1
	}			}

	define <2 x i64> @sse2_pslli_q_15(<2 x i64> %v) nounwind readnone uwtable {			define <2 x i64> @sse2_pslli_q_15(<2 x i64> %v) nounwind readnone uwtable {
	; CHECK-LABEL: @sse2_pslli_q_15			; CHECK-LABEL: @sse2_pslli_q_15
	; CHECK: %1 = shl <2 x i64> %v, <i64 15, i64 15>			; CHECK-NEXT: %1 = shl <2 x i64> %v, <i64 15, i64 15>
	; CHECK: ret <2 x i64> %1			; CHECK-NEXT: ret <2 x i64> %1
	%1 = tail call <2 x i64> @llvm.x86.sse2.pslli.q(<2 x i64> %v, i32 15)			%1 = tail call <2 x i64> @llvm.x86.sse2.pslli.q(<2 x i64> %v, i32 15)
	ret <2 x i64> %1			ret <2 x i64> %1
	}			}

	define <2 x i64> @sse2_pslli_q_64(<2 x i64> %v) nounwind readnone uwtable {			define <2 x i64> @sse2_pslli_q_64(<2 x i64> %v) nounwind readnone uwtable {
	; CHECK-LABEL: @sse2_pslli_q_64			; CHECK-LABEL: @sse2_pslli_q_64
	; CHECK: ret <2 x i64> zeroinitializer			; CHECK-NEXT: ret <2 x i64> zeroinitializer
	%1 = tail call <2 x i64> @llvm.x86.sse2.pslli.q(<2 x i64> %v, i32 64)			%1 = tail call <2 x i64> @llvm.x86.sse2.pslli.q(<2 x i64> %v, i32 64)
	ret <2 x i64> %1			ret <2 x i64> %1
	}			}

	define <16 x i16> @avx2_pslli_w_0(<16 x i16> %v) nounwind readnone uwtable {			define <16 x i16> @avx2_pslli_w_0(<16 x i16> %v) nounwind readnone uwtable {
	; CHECK-LABEL: @avx2_pslli_w_0			; CHECK-LABEL: @avx2_pslli_w_0
	; CHECK: ret <16 x i16> %v			; CHECK-NEXT: ret <16 x i16> %v
	%1 = tail call <16 x i16> @llvm.x86.avx2.pslli.w(<16 x i16> %v, i32 0)			%1 = tail call <16 x i16> @llvm.x86.avx2.pslli.w(<16 x i16> %v, i32 0)
	ret <16 x i16> %1			ret <16 x i16> %1
	}			}

	define <16 x i16> @avx2_pslli_w_15(<16 x i16> %v) nounwind readnone uwtable {			define <16 x i16> @avx2_pslli_w_15(<16 x i16> %v) nounwind readnone uwtable {
	; CHECK-LABEL: @avx2_pslli_w_15			; CHECK-LABEL: @avx2_pslli_w_15
	; CHECK: %1 = shl <16 x i16> %v, <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>			; CHECK-NEXT: %1 = shl <16 x i16> %v, <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>
	; CHECK: ret <16 x i16> %1			; CHECK-NEXT: ret <16 x i16> %1
	%1 = tail call <16 x i16> @llvm.x86.avx2.pslli.w(<16 x i16> %v, i32 15)			%1 = tail call <16 x i16> @llvm.x86.avx2.pslli.w(<16 x i16> %v, i32 15)
	ret <16 x i16> %1			ret <16 x i16> %1
	}			}

	define <16 x i16> @avx2_pslli_w_64(<16 x i16> %v) nounwind readnone uwtable {			define <16 x i16> @avx2_pslli_w_64(<16 x i16> %v) nounwind readnone uwtable {
	; CHECK-LABEL: @avx2_pslli_w_64			; CHECK-LABEL: @avx2_pslli_w_64
	; CHECK: ret <16 x i16> zeroinitializer			; CHECK-NEXT: ret <16 x i16> zeroinitializer
	%1 = tail call <16 x i16> @llvm.x86.avx2.pslli.w(<16 x i16> %v, i32 64)			%1 = tail call <16 x i16> @llvm.x86.avx2.pslli.w(<16 x i16> %v, i32 64)
	ret <16 x i16> %1			ret <16 x i16> %1
	}			}

	define <8 x i32> @avx2_pslli_d_0(<8 x i32> %v) nounwind readnone uwtable {			define <8 x i32> @avx2_pslli_d_0(<8 x i32> %v) nounwind readnone uwtable {
	; CHECK-LABEL: @avx2_pslli_d_0			; CHECK-LABEL: @avx2_pslli_d_0
	; CHECK: ret <8 x i32> %v			; CHECK-NEXT: ret <8 x i32> %v
	%1 = tail call <8 x i32> @llvm.x86.avx2.pslli.d(<8 x i32> %v, i32 0)			%1 = tail call <8 x i32> @llvm.x86.avx2.pslli.d(<8 x i32> %v, i32 0)
	ret <8 x i32> %1			ret <8 x i32> %1
	}			}

	define <8 x i32> @avx2_pslli_d_15(<8 x i32> %v) nounwind readnone uwtable {			define <8 x i32> @avx2_pslli_d_15(<8 x i32> %v) nounwind readnone uwtable {
	; CHECK-LABEL: @avx2_pslli_d_15			; CHECK-LABEL: @avx2_pslli_d_15
	; CHECK: %1 = shl <8 x i32> %v, <i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15>			; CHECK-NEXT: %1 = shl <8 x i32> %v, <i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15>
	; CHECK: ret <8 x i32> %1			; CHECK-NEXT: ret <8 x i32> %1
	%1 = tail call <8 x i32> @llvm.x86.avx2.pslli.d(<8 x i32> %v, i32 15)			%1 = tail call <8 x i32> @llvm.x86.avx2.pslli.d(<8 x i32> %v, i32 15)
	ret <8 x i32> %1			ret <8 x i32> %1
	}			}

	define <8 x i32> @avx2_pslli_d_64(<8 x i32> %v) nounwind readnone uwtable {			define <8 x i32> @avx2_pslli_d_64(<8 x i32> %v) nounwind readnone uwtable {
	; CHECK-LABEL: @avx2_pslli_d_64			; CHECK-LABEL: @avx2_pslli_d_64
	; CHECK: ret <8 x i32> zeroinitializer			; CHECK-NEXT: ret <8 x i32> zeroinitializer
	%1 = tail call <8 x i32> @llvm.x86.avx2.pslli.d(<8 x i32> %v, i32 64)			%1 = tail call <8 x i32> @llvm.x86.avx2.pslli.d(<8 x i32> %v, i32 64)
	ret <8 x i32> %1			ret <8 x i32> %1
	}			}

	define <4 x i64> @avx2_pslli_q_0(<4 x i64> %v) nounwind readnone uwtable {			define <4 x i64> @avx2_pslli_q_0(<4 x i64> %v) nounwind readnone uwtable {
	; CHECK-LABEL: @avx2_pslli_q_0			; CHECK-LABEL: @avx2_pslli_q_0
	; CHECK: ret <4 x i64> %v			; CHECK-NEXT: ret <4 x i64> %v
	%1 = tail call <4 x i64> @llvm.x86.avx2.pslli.q(<4 x i64> %v, i32 0)			%1 = tail call <4 x i64> @llvm.x86.avx2.pslli.q(<4 x i64> %v, i32 0)
	ret <4 x i64> %1			ret <4 x i64> %1
	}			}

	define <4 x i64> @avx2_pslli_q_15(<4 x i64> %v) nounwind readnone uwtable {			define <4 x i64> @avx2_pslli_q_15(<4 x i64> %v) nounwind readnone uwtable {
	; CHECK-LABEL: @avx2_pslli_q_15			; CHECK-LABEL: @avx2_pslli_q_15
	; CHECK: %1 = shl <4 x i64> %v, <i64 15, i64 15, i64 15, i64 15>			; CHECK-NEXT: %1 = shl <4 x i64> %v, <i64 15, i64 15, i64 15, i64 15>
	; CHECK: ret <4 x i64> %1			; CHECK-NEXT: ret <4 x i64> %1
	%1 = tail call <4 x i64> @llvm.x86.avx2.pslli.q(<4 x i64> %v, i32 15)			%1 = tail call <4 x i64> @llvm.x86.avx2.pslli.q(<4 x i64> %v, i32 15)
	ret <4 x i64> %1			ret <4 x i64> %1
	}			}

	define <4 x i64> @avx2_pslli_q_64(<4 x i64> %v) nounwind readnone uwtable {			define <4 x i64> @avx2_pslli_q_64(<4 x i64> %v) nounwind readnone uwtable {
	; CHECK-LABEL: @avx2_pslli_q_64			; CHECK-LABEL: @avx2_pslli_q_64
	; CHECK: ret <4 x i64> zeroinitializer			; CHECK-NEXT: ret <4 x i64> zeroinitializer
	%1 = tail call <4 x i64> @llvm.x86.avx2.pslli.q(<4 x i64> %v, i32 64)			%1 = tail call <4 x i64> @llvm.x86.avx2.pslli.q(<4 x i64> %v, i32 64)
	ret <4 x i64> %1			ret <4 x i64> %1
	}			}

	;			;
	; LSHR - Constant Vector			; LSHR - Constant Vector
	;			;

				define <8 x i16> @sse2_psrl_w_0(<8 x i16> %v) nounwind readnone uwtable {
				; CHECK-LABEL: @sse2_psrl_w_0
				; CHECK-NEXT: ret <8 x i16> %v
				%1 = tail call <8 x i16> @llvm.x86.sse2.psrl.w(<8 x i16> %v, <8 x i16> zeroinitializer)
				ret <8 x i16> %1
				}

	define <8 x i16> @sse2_psrl_w_15(<8 x i16> %v) nounwind readnone uwtable {			define <8 x i16> @sse2_psrl_w_15(<8 x i16> %v) nounwind readnone uwtable {
	; CHECK-LABEL: @sse2_psrl_w_15			; CHECK-LABEL: @sse2_psrl_w_15
	; CHECK: %1 = lshr <8 x i16> %v, <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>			; CHECK-NEXT: %1 = lshr <8 x i16> %v, <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>
	; CHECK: ret <8 x i16> %1			; CHECK-NEXT: ret <8 x i16> %1
	%1 = tail call <8 x i16> @llvm.x86.sse2.psrl.w(<8 x i16> %v, <8 x i16> <i16 15, i16 0, i16 0, i16 0, i16 9999, i16 9999, i16 9999, i16 9999>)			%1 = tail call <8 x i16> @llvm.x86.sse2.psrl.w(<8 x i16> %v, <8 x i16> <i16 15, i16 0, i16 0, i16 0, i16 9999, i16 9999, i16 9999, i16 9999>)
	ret <8 x i16> %1			ret <8 x i16> %1
	}			}

				define <8 x i16> @sse2_psrl_w_15_splat(<8 x i16> %v) nounwind readnone uwtable {
				; CHECK-LABEL: @sse2_psrl_w_15_splat
				; CHECK-NEXT: ret <8 x i16> zeroinitializer
				%1 = tail call <8 x i16> @llvm.x86.sse2.psrl.w(<8 x i16> %v, <8 x i16> <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>)
				ret <8 x i16> %1
				}

	define <8 x i16> @sse2_psrl_w_64(<8 x i16> %v) nounwind readnone uwtable {			define <8 x i16> @sse2_psrl_w_64(<8 x i16> %v) nounwind readnone uwtable {
	; CHECK-LABEL: @sse2_psrl_w_64			; CHECK-LABEL: @sse2_psrl_w_64
	; CHECK: ret <8 x i16> zeroinitializer			; CHECK-NEXT: ret <8 x i16> zeroinitializer
	%1 = tail call <8 x i16> @llvm.x86.sse2.psrl.w(<8 x i16> %v, <8 x i16> <i16 64, i16 0, i16 0, i16 0, i16 9999, i16 9999, i16 9999, i16 9999>)			%1 = tail call <8 x i16> @llvm.x86.sse2.psrl.w(<8 x i16> %v, <8 x i16> <i16 64, i16 0, i16 0, i16 0, i16 9999, i16 9999, i16 9999, i16 9999>)
	ret <8 x i16> %1			ret <8 x i16> %1
	}			}

				define <4 x i32> @sse2_psrl_d_0(<4 x i32> %v) nounwind readnone uwtable {
				; CHECK-LABEL: @sse2_psrl_d_0
				; CHECK-NEXT: ret <4 x i32> %v
				%1 = tail call <4 x i32> @llvm.x86.sse2.psrl.d(<4 x i32> %v, <4 x i32> zeroinitializer)
				ret <4 x i32> %1
				}

	define <4 x i32> @sse2_psrl_d_15(<4 x i32> %v) nounwind readnone uwtable {			define <4 x i32> @sse2_psrl_d_15(<4 x i32> %v) nounwind readnone uwtable {
	; CHECK-LABEL: @sse2_psrl_d_15			; CHECK-LABEL: @sse2_psrl_d_15
	; CHECK: %1 = lshr <4 x i32> %v, <i32 15, i32 15, i32 15, i32 15>			; CHECK-NEXT: %1 = lshr <4 x i32> %v, <i32 15, i32 15, i32 15, i32 15>
	; CHECK: ret <4 x i32> %1			; CHECK-NEXT: ret <4 x i32> %1
	%1 = tail call <4 x i32> @llvm.x86.sse2.psrl.d(<4 x i32> %v, <4 x i32> <i32 15, i32 0, i32 9999, i32 9999>)			%1 = tail call <4 x i32> @llvm.x86.sse2.psrl.d(<4 x i32> %v, <4 x i32> <i32 15, i32 0, i32 9999, i32 9999>)
	ret <4 x i32> %1			ret <4 x i32> %1
	}			}

				define <4 x i32> @sse2_psrl_d_15_splat(<4 x i32> %v) nounwind readnone uwtable {
				; CHECK-LABEL: @sse2_psrl_d_15_splat
				; CHECK-NEXT: ret <4 x i32> zeroinitializer
				%1 = tail call <4 x i32> @llvm.x86.sse2.psrl.d(<4 x i32> %v, <4 x i32> <i32 15, i32 15, i32 15, i32 15>)
				ret <4 x i32> %1
				}

	define <4 x i32> @sse2_psrl_d_64(<4 x i32> %v) nounwind readnone uwtable {			define <4 x i32> @sse2_psrl_d_64(<4 x i32> %v) nounwind readnone uwtable {
	; CHECK-LABEL: @sse2_psrl_d_64			; CHECK-LABEL: @sse2_psrl_d_64
	; CHECK: ret <4 x i32> zeroinitializer			; CHECK-NEXT: ret <4 x i32> zeroinitializer
	%1 = tail call <4 x i32> @llvm.x86.sse2.psrl.d(<4 x i32> %v, <4 x i32> <i32 64, i32 0, i32 9999, i32 9999>)			%1 = tail call <4 x i32> @llvm.x86.sse2.psrl.d(<4 x i32> %v, <4 x i32> <i32 64, i32 0, i32 9999, i32 9999>)
	ret <4 x i32> %1			ret <4 x i32> %1
	}			}

				define <2 x i64> @sse2_psrl_q_0(<2 x i64> %v) nounwind readnone uwtable {
				; CHECK-LABEL: @sse2_psrl_q_0
				; CHECK-NEXT: ret <2 x i64> %v
				%1 = tail call <2 x i64> @llvm.x86.sse2.psrl.q(<2 x i64> %v, <2 x i64> zeroinitializer)
				ret <2 x i64> %1
				}

	define <2 x i64> @sse2_psrl_q_15(<2 x i64> %v) nounwind readnone uwtable {			define <2 x i64> @sse2_psrl_q_15(<2 x i64> %v) nounwind readnone uwtable {
	; CHECK-LABEL: @sse2_psrl_q_15			; CHECK-LABEL: @sse2_psrl_q_15
	; CHECK: %1 = lshr <2 x i64> %v, <i64 15, i64 15>			; CHECK-NEXT: %1 = lshr <2 x i64> %v, <i64 15, i64 15>
	; CHECK: ret <2 x i64> %1			; CHECK-NEXT: ret <2 x i64> %1
	%1 = tail call <2 x i64> @llvm.x86.sse2.psrl.q(<2 x i64> %v, <2 x i64> <i64 15, i64 9999>)			%1 = tail call <2 x i64> @llvm.x86.sse2.psrl.q(<2 x i64> %v, <2 x i64> <i64 15, i64 9999>)
	ret <2 x i64> %1			ret <2 x i64> %1
	}			}

	define <2 x i64> @sse2_psrl_q_64(<2 x i64> %v) nounwind readnone uwtable {			define <2 x i64> @sse2_psrl_q_64(<2 x i64> %v) nounwind readnone uwtable {
	; CHECK-LABEL: @sse2_psrl_q_64			; CHECK-LABEL: @sse2_psrl_q_64
	; CHECK: ret <2 x i64> zeroinitializer			; CHECK-NEXT: ret <2 x i64> zeroinitializer
	%1 = tail call <2 x i64> @llvm.x86.sse2.psrl.q(<2 x i64> %v, <2 x i64> <i64 64, i64 9999>)			%1 = tail call <2 x i64> @llvm.x86.sse2.psrl.q(<2 x i64> %v, <2 x i64> <i64 64, i64 9999>)
	ret <2 x i64> %1			ret <2 x i64> %1
	}			}

				define <16 x i16> @avx2_psrl_w_0(<16 x i16> %v) nounwind readnone uwtable {
				; CHECK-LABEL: @avx2_psrl_w_0
				; CHECK-NEXT: ret <16 x i16> %v
				%1 = tail call <16 x i16> @llvm.x86.avx2.psrl.w(<16 x i16> %v, <8 x i16> zeroinitializer)
				ret <16 x i16> %1
				}

	define <16 x i16> @avx2_psrl_w_15(<16 x i16> %v) nounwind readnone uwtable {			define <16 x i16> @avx2_psrl_w_15(<16 x i16> %v) nounwind readnone uwtable {
	; CHECK-LABEL: @avx2_psrl_w_15			; CHECK-LABEL: @avx2_psrl_w_15
	; CHECK: %1 = lshr <16 x i16> %v, <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>			; CHECK-NEXT: %1 = lshr <16 x i16> %v, <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>
	; CHECK: ret <16 x i16> %1			; CHECK-NEXT: ret <16 x i16> %1
	%1 = tail call <16 x i16> @llvm.x86.avx2.psrl.w(<16 x i16> %v, <8 x i16> <i16 15, i16 0, i16 0, i16 0, i16 9999, i16 9999, i16 9999, i16 9999>)			%1 = tail call <16 x i16> @llvm.x86.avx2.psrl.w(<16 x i16> %v, <8 x i16> <i16 15, i16 0, i16 0, i16 0, i16 9999, i16 9999, i16 9999, i16 9999>)
	ret <16 x i16> %1			ret <16 x i16> %1
	}			}

				define <16 x i16> @avx2_psrl_w_15_splat(<16 x i16> %v) nounwind readnone uwtable {
				; CHECK-LABEL: @avx2_psrl_w_15_splat
				; CHECK-NEXT: ret <16 x i16> zeroinitializer
				%1 = tail call <16 x i16> @llvm.x86.avx2.psrl.w(<16 x i16> %v, <8 x i16> <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>)
				ret <16 x i16> %1
				}

	define <16 x i16> @avx2_psrl_w_64(<16 x i16> %v) nounwind readnone uwtable {			define <16 x i16> @avx2_psrl_w_64(<16 x i16> %v) nounwind readnone uwtable {
	; CHECK-LABEL: @avx2_psrl_w_64			; CHECK-LABEL: @avx2_psrl_w_64
	; CHECK: ret <16 x i16> zeroinitializer			; CHECK-NEXT: ret <16 x i16> zeroinitializer
	%1 = tail call <16 x i16> @llvm.x86.avx2.psrl.w(<16 x i16> %v, <8 x i16> <i16 64, i16 0, i16 0, i16 0, i16 9999, i16 9999, i16 9999, i16 9999>)			%1 = tail call <16 x i16> @llvm.x86.avx2.psrl.w(<16 x i16> %v, <8 x i16> <i16 64, i16 0, i16 0, i16 0, i16 9999, i16 9999, i16 9999, i16 9999>)
	ret <16 x i16> %1			ret <16 x i16> %1
	}			}

				define <8 x i32> @avx2_psrl_d_0(<8 x i32> %v) nounwind readnone uwtable {
				; CHECK-LABEL: @avx2_psrl_d_0
				; CHECK-NEXT: ret <8 x i32> %v
				%1 = tail call <8 x i32> @llvm.x86.avx2.psrl.d(<8 x i32> %v, <4 x i32> zeroinitializer)
				ret <8 x i32> %1
				}

	define <8 x i32> @avx2_psrl_d_15(<8 x i32> %v) nounwind readnone uwtable {			define <8 x i32> @avx2_psrl_d_15(<8 x i32> %v) nounwind readnone uwtable {
	; CHECK-LABEL: @avx2_psrl_d_15			; CHECK-LABEL: @avx2_psrl_d_15
	; CHECK: %1 = lshr <8 x i32> %v, <i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15>			; CHECK-NEXT: %1 = lshr <8 x i32> %v, <i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15>
	; CHECK: ret <8 x i32> %1			; CHECK-NEXT: ret <8 x i32> %1
	%1 = tail call <8 x i32> @llvm.x86.avx2.psrl.d(<8 x i32> %v, <4 x i32> <i32 15, i32 0, i32 9999, i32 9999>)			%1 = tail call <8 x i32> @llvm.x86.avx2.psrl.d(<8 x i32> %v, <4 x i32> <i32 15, i32 0, i32 9999, i32 9999>)
	ret <8 x i32> %1			ret <8 x i32> %1
	}			}

				define <8 x i32> @avx2_psrl_d_15_splat(<8 x i32> %v) nounwind readnone uwtable {
				; CHECK-LABEL: @avx2_psrl_d_15_splat
				; CHECK-NEXT: ret <8 x i32> zeroinitializer
				%1 = tail call <8 x i32> @llvm.x86.avx2.psrl.d(<8 x i32> %v, <4 x i32> <i32 15, i32 15, i32 15, i32 15>)
				ret <8 x i32> %1
				}

	define <8 x i32> @avx2_psrl_d_64(<8 x i32> %v) nounwind readnone uwtable {			define <8 x i32> @avx2_psrl_d_64(<8 x i32> %v) nounwind readnone uwtable {
	; CHECK-LABEL: @avx2_psrl_d_64			; CHECK-LABEL: @avx2_psrl_d_64
	; CHECK: ret <8 x i32> zeroinitializer			; CHECK-NEXT: ret <8 x i32> zeroinitializer
	%1 = tail call <8 x i32> @llvm.x86.avx2.psrl.d(<8 x i32> %v, <4 x i32> <i32 64, i32 0, i32 9999, i32 9999>)			%1 = tail call <8 x i32> @llvm.x86.avx2.psrl.d(<8 x i32> %v, <4 x i32> <i32 64, i32 0, i32 9999, i32 9999>)
	ret <8 x i32> %1			ret <8 x i32> %1
	}			}

				define <4 x i64> @avx2_psrl_q_0(<4 x i64> %v) nounwind readnone uwtable {
				; CHECK-LABEL: @avx2_psrl_q_0
				; CHECK-NEXT: ret <4 x i64> %v
				%1 = tail call <4 x i64> @llvm.x86.avx2.psrl.q(<4 x i64> %v, <2 x i64> zeroinitializer)
				ret <4 x i64> %1
				}

	define <4 x i64> @avx2_psrl_q_15(<4 x i64> %v) nounwind readnone uwtable {			define <4 x i64> @avx2_psrl_q_15(<4 x i64> %v) nounwind readnone uwtable {
	; CHECK-LABEL: @avx2_psrl_q_15			; CHECK-LABEL: @avx2_psrl_q_15
	; CHECK: %1 = lshr <4 x i64> %v, <i64 15, i64 15, i64 15, i64 15>			; CHECK-NEXT: %1 = lshr <4 x i64> %v, <i64 15, i64 15, i64 15, i64 15>
	; CHECK: ret <4 x i64> %1			; CHECK-NEXT: ret <4 x i64> %1
	%1 = tail call <4 x i64> @llvm.x86.avx2.psrl.q(<4 x i64> %v, <2 x i64> <i64 15, i64 9999>)			%1 = tail call <4 x i64> @llvm.x86.avx2.psrl.q(<4 x i64> %v, <2 x i64> <i64 15, i64 9999>)
	ret <4 x i64> %1			ret <4 x i64> %1
	}			}

	define <4 x i64> @avx2_psrl_q_64(<4 x i64> %v) nounwind readnone uwtable {			define <4 x i64> @avx2_psrl_q_64(<4 x i64> %v) nounwind readnone uwtable {
	; CHECK-LABEL: @avx2_psrl_q_64			; CHECK-LABEL: @avx2_psrl_q_64
	; CHECK: ret <4 x i64> zeroinitializer			; CHECK-NEXT: ret <4 x i64> zeroinitializer
	%1 = tail call <4 x i64> @llvm.x86.avx2.psrl.q(<4 x i64> %v, <2 x i64> <i64 64, i64 9999>)			%1 = tail call <4 x i64> @llvm.x86.avx2.psrl.q(<4 x i64> %v, <2 x i64> <i64 64, i64 9999>)
	ret <4 x i64> %1			ret <4 x i64> %1
	}			}

	;			;
	; SHL - Constant Vector			; SHL - Constant Vector
	;			;

				define <8 x i16> @sse2_psll_w_0(<8 x i16> %v) nounwind readnone uwtable {
				; CHECK-LABEL: @sse2_psll_w_0
				; CHECK-NEXT: ret <8 x i16> %v
				%1 = tail call <8 x i16> @llvm.x86.sse2.psll.w(<8 x i16> %v, <8 x i16> zeroinitializer)
				ret <8 x i16> %1
				}

	define <8 x i16> @sse2_psll_w_15(<8 x i16> %v) nounwind readnone uwtable {			define <8 x i16> @sse2_psll_w_15(<8 x i16> %v) nounwind readnone uwtable {
	; CHECK-LABEL: @sse2_psll_w_15			; CHECK-LABEL: @sse2_psll_w_15
	; CHECK: %1 = shl <8 x i16> %v, <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>			; CHECK-NEXT: %1 = shl <8 x i16> %v, <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>
	; CHECK: ret <8 x i16> %1			; CHECK-NEXT: ret <8 x i16> %1
	%1 = tail call <8 x i16> @llvm.x86.sse2.psll.w(<8 x i16> %v, <8 x i16> <i16 15, i16 0, i16 0, i16 0, i16 9999, i16 9999, i16 9999, i16 9999>)			%1 = tail call <8 x i16> @llvm.x86.sse2.psll.w(<8 x i16> %v, <8 x i16> <i16 15, i16 0, i16 0, i16 0, i16 9999, i16 9999, i16 9999, i16 9999>)
	ret <8 x i16> %1			ret <8 x i16> %1
	}			}

				define <8 x i16> @sse2_psll_w_15_splat(<8 x i16> %v) nounwind readnone uwtable {
				; CHECK-LABEL: @sse2_psll_w_15_splat
				; CHECK-NEXT: ret <8 x i16> zeroinitializer
				%1 = tail call <8 x i16> @llvm.x86.sse2.psll.w(<8 x i16> %v, <8 x i16> <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>)
				ret <8 x i16> %1
				}

	define <8 x i16> @sse2_psll_w_64(<8 x i16> %v) nounwind readnone uwtable {			define <8 x i16> @sse2_psll_w_64(<8 x i16> %v) nounwind readnone uwtable {
	; CHECK-LABEL: @sse2_psll_w_64			; CHECK-LABEL: @sse2_psll_w_64
	; CHECK: ret <8 x i16> zeroinitializer			; CHECK-NEXT: ret <8 x i16> zeroinitializer
	%1 = tail call <8 x i16> @llvm.x86.sse2.psll.w(<8 x i16> %v, <8 x i16> <i16 64, i16 0, i16 0, i16 0, i16 9999, i16 9999, i16 9999, i16 9999>)			%1 = tail call <8 x i16> @llvm.x86.sse2.psll.w(<8 x i16> %v, <8 x i16> <i16 64, i16 0, i16 0, i16 0, i16 9999, i16 9999, i16 9999, i16 9999>)
	ret <8 x i16> %1			ret <8 x i16> %1
	}			}

				define <4 x i32> @sse2_psll_d_0(<4 x i32> %v) nounwind readnone uwtable {
				; CHECK-LABEL: @sse2_psll_d_0
				; CHECK-NEXT: ret <4 x i32> %v
				%1 = tail call <4 x i32> @llvm.x86.sse2.psll.d(<4 x i32> %v, <4 x i32> zeroinitializer)
				ret <4 x i32> %1
				}

	define <4 x i32> @sse2_psll_d_15(<4 x i32> %v) nounwind readnone uwtable {			define <4 x i32> @sse2_psll_d_15(<4 x i32> %v) nounwind readnone uwtable {
	; CHECK-LABEL: @sse2_psll_d_15			; CHECK-LABEL: @sse2_psll_d_15
	; CHECK: %1 = shl <4 x i32> %v, <i32 15, i32 15, i32 15, i32 15>			; CHECK-NEXT: %1 = shl <4 x i32> %v, <i32 15, i32 15, i32 15, i32 15>
	; CHECK: ret <4 x i32> %1			; CHECK-NEXT: ret <4 x i32> %1
	%1 = tail call <4 x i32> @llvm.x86.sse2.psll.d(<4 x i32> %v, <4 x i32> <i32 15, i32 0, i32 9999, i32 9999>)			%1 = tail call <4 x i32> @llvm.x86.sse2.psll.d(<4 x i32> %v, <4 x i32> <i32 15, i32 0, i32 9999, i32 9999>)
	ret <4 x i32> %1			ret <4 x i32> %1
	}			}

				define <4 x i32> @sse2_psll_d_15_splat(<4 x i32> %v) nounwind readnone uwtable {
				; CHECK-LABEL: @sse2_psll_d_15_splat
				; CHECK-NEXT: ret <4 x i32> zeroinitializer
				%1 = tail call <4 x i32> @llvm.x86.sse2.psll.d(<4 x i32> %v, <4 x i32> <i32 15, i32 15, i32 15, i32 15>)
				ret <4 x i32> %1
				}

	define <4 x i32> @sse2_psll_d_64(<4 x i32> %v) nounwind readnone uwtable {			define <4 x i32> @sse2_psll_d_64(<4 x i32> %v) nounwind readnone uwtable {
	; CHECK-LABEL: @sse2_psll_d_64			; CHECK-LABEL: @sse2_psll_d_64
	; CHECK: ret <4 x i32> zeroinitializer			; CHECK-NEXT: ret <4 x i32> zeroinitializer
	%1 = tail call <4 x i32> @llvm.x86.sse2.psll.d(<4 x i32> %v, <4 x i32> <i32 64, i32 0, i32 9999, i32 9999>)			%1 = tail call <4 x i32> @llvm.x86.sse2.psll.d(<4 x i32> %v, <4 x i32> <i32 64, i32 0, i32 9999, i32 9999>)
	ret <4 x i32> %1			ret <4 x i32> %1
	}			}

				define <2 x i64> @sse2_psll_q_0(<2 x i64> %v) nounwind readnone uwtable {
				; CHECK-LABEL: @sse2_psll_q_0
				; CHECK-NEXT: ret <2 x i64> %v
				%1 = tail call <2 x i64> @llvm.x86.sse2.psll.q(<2 x i64> %v, <2 x i64> zeroinitializer)
				ret <2 x i64> %1
				}

	define <2 x i64> @sse2_psll_q_15(<2 x i64> %v) nounwind readnone uwtable {			define <2 x i64> @sse2_psll_q_15(<2 x i64> %v) nounwind readnone uwtable {
	; CHECK-LABEL: @sse2_psll_q_15			; CHECK-LABEL: @sse2_psll_q_15
	; CHECK: %1 = shl <2 x i64> %v, <i64 15, i64 15>			; CHECK-NEXT: %1 = shl <2 x i64> %v, <i64 15, i64 15>
	; CHECK: ret <2 x i64> %1			; CHECK-NEXT: ret <2 x i64> %1
	%1 = tail call <2 x i64> @llvm.x86.sse2.psll.q(<2 x i64> %v, <2 x i64> <i64 15, i64 9999>)			%1 = tail call <2 x i64> @llvm.x86.sse2.psll.q(<2 x i64> %v, <2 x i64> <i64 15, i64 9999>)
	ret <2 x i64> %1			ret <2 x i64> %1
	}			}

	define <2 x i64> @sse2_psll_q_64(<2 x i64> %v) nounwind readnone uwtable {			define <2 x i64> @sse2_psll_q_64(<2 x i64> %v) nounwind readnone uwtable {
	; CHECK-LABEL: @sse2_psll_q_64			; CHECK-LABEL: @sse2_psll_q_64
	; CHECK: ret <2 x i64> zeroinitializer			; CHECK-NEXT: ret <2 x i64> zeroinitializer
	%1 = tail call <2 x i64> @llvm.x86.sse2.psll.q(<2 x i64> %v, <2 x i64> <i64 64, i64 9999>)			%1 = tail call <2 x i64> @llvm.x86.sse2.psll.q(<2 x i64> %v, <2 x i64> <i64 64, i64 9999>)
	ret <2 x i64> %1			ret <2 x i64> %1
	}			}

				define <16 x i16> @avx2_psll_w_0(<16 x i16> %v) nounwind readnone uwtable {
				; CHECK-LABEL: @avx2_psll_w_0
				; CHECK-NEXT: ret <16 x i16> %v
				%1 = tail call <16 x i16> @llvm.x86.avx2.psll.w(<16 x i16> %v, <8 x i16> zeroinitializer)
				ret <16 x i16> %1
				}

	define <16 x i16> @avx2_psll_w_15(<16 x i16> %v) nounwind readnone uwtable {			define <16 x i16> @avx2_psll_w_15(<16 x i16> %v) nounwind readnone uwtable {
	; CHECK-LABEL: @avx2_psll_w_15			; CHECK-LABEL: @avx2_psll_w_15
	; CHECK: %1 = shl <16 x i16> %v, <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>			; CHECK-NEXT: %1 = shl <16 x i16> %v, <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>
	; CHECK: ret <16 x i16> %1			; CHECK-NEXT: ret <16 x i16> %1
	%1 = tail call <16 x i16> @llvm.x86.avx2.psll.w(<16 x i16> %v, <8 x i16> <i16 15, i16 0, i16 0, i16 0, i16 9999, i16 9999, i16 9999, i16 9999>)			%1 = tail call <16 x i16> @llvm.x86.avx2.psll.w(<16 x i16> %v, <8 x i16> <i16 15, i16 0, i16 0, i16 0, i16 9999, i16 9999, i16 9999, i16 9999>)
	ret <16 x i16> %1			ret <16 x i16> %1
	}			}

				define <16 x i16> @avx2_psll_w_15_splat(<16 x i16> %v) nounwind readnone uwtable {
				; CHECK-LABEL: @avx2_psll_w_15_splat
				; CHECK-NEXT: ret <16 x i16> zeroinitializer
				%1 = tail call <16 x i16> @llvm.x86.avx2.psll.w(<16 x i16> %v, <8 x i16> <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>)
				ret <16 x i16> %1
				}

	define <16 x i16> @avx2_psll_w_64(<16 x i16> %v) nounwind readnone uwtable {			define <16 x i16> @avx2_psll_w_64(<16 x i16> %v) nounwind readnone uwtable {
	; CHECK-LABEL: @avx2_psll_w_64			; CHECK-LABEL: @avx2_psll_w_64
	; CHECK: ret <16 x i16> zeroinitializer			; CHECK-NEXT: ret <16 x i16> zeroinitializer
	%1 = tail call <16 x i16> @llvm.x86.avx2.psll.w(<16 x i16> %v, <8 x i16> <i16 64, i16 0, i16 0, i16 0, i16 9999, i16 9999, i16 9999, i16 9999>)			%1 = tail call <16 x i16> @llvm.x86.avx2.psll.w(<16 x i16> %v, <8 x i16> <i16 64, i16 0, i16 0, i16 0, i16 9999, i16 9999, i16 9999, i16 9999>)
	ret <16 x i16> %1			ret <16 x i16> %1
	}			}

				define <8 x i32> @avx2_psll_d_0(<8 x i32> %v) nounwind readnone uwtable {
				; CHECK-LABEL: @avx2_psll_d_0
				; CHECK-NEXT: ret <8 x i32> %v
				%1 = tail call <8 x i32> @llvm.x86.avx2.psll.d(<8 x i32> %v, <4 x i32> zeroinitializer)
				ret <8 x i32> %1
				}

	define <8 x i32> @avx2_psll_d_15(<8 x i32> %v) nounwind readnone uwtable {			define <8 x i32> @avx2_psll_d_15(<8 x i32> %v) nounwind readnone uwtable {
	; CHECK-LABEL: @avx2_psll_d_15			; CHECK-LABEL: @avx2_psll_d_15
	; CHECK: %1 = shl <8 x i32> %v, <i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15>			; CHECK-NEXT: %1 = shl <8 x i32> %v, <i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15>
	; CHECK: ret <8 x i32> %1			; CHECK-NEXT: ret <8 x i32> %1
	%1 = tail call <8 x i32> @llvm.x86.avx2.psll.d(<8 x i32> %v, <4 x i32> <i32 15, i32 0, i32 9999, i32 9999>)			%1 = tail call <8 x i32> @llvm.x86.avx2.psll.d(<8 x i32> %v, <4 x i32> <i32 15, i32 0, i32 9999, i32 9999>)
	ret <8 x i32> %1			ret <8 x i32> %1
	}			}

				define <8 x i32> @avx2_psll_d_15_splat(<8 x i32> %v) nounwind readnone uwtable {
				; CHECK-LABEL: @avx2_psll_d_15_splat
				; CHECK-NEXT: ret <8 x i32> zeroinitializer
				%1 = tail call <8 x i32> @llvm.x86.avx2.psll.d(<8 x i32> %v, <4 x i32> <i32 15, i32 15, i32 15, i32 15>)
				ret <8 x i32> %1
				}

	define <8 x i32> @avx2_psll_d_64(<8 x i32> %v) nounwind readnone uwtable {			define <8 x i32> @avx2_psll_d_64(<8 x i32> %v) nounwind readnone uwtable {
	; CHECK-LABEL: @avx2_psll_d_64			; CHECK-LABEL: @avx2_psll_d_64
	; CHECK: ret <8 x i32> zeroinitializer			; CHECK-NEXT: ret <8 x i32> zeroinitializer
	%1 = tail call <8 x i32> @llvm.x86.avx2.psll.d(<8 x i32> %v, <4 x i32> <i32 64, i32 0, i32 9999, i32 9999>)			%1 = tail call <8 x i32> @llvm.x86.avx2.psll.d(<8 x i32> %v, <4 x i32> <i32 64, i32 0, i32 9999, i32 9999>)
	ret <8 x i32> %1			ret <8 x i32> %1
	}			}

				define <4 x i64> @avx2_psll_q_0(<4 x i64> %v) nounwind readnone uwtable {
				; CHECK-LABEL: @avx2_psll_q_0
				; CHECK-NEXT: ret <4 x i64> %v
				%1 = tail call <4 x i64> @llvm.x86.avx2.psll.q(<4 x i64> %v, <2 x i64> zeroinitializer)
				ret <4 x i64> %1
				}

	define <4 x i64> @avx2_psll_q_15(<4 x i64> %v) nounwind readnone uwtable {			define <4 x i64> @avx2_psll_q_15(<4 x i64> %v) nounwind readnone uwtable {
	; CHECK-LABEL: @avx2_psll_q_15			; CHECK-LABEL: @avx2_psll_q_15
	; CHECK: %1 = shl <4 x i64> %v, <i64 15, i64 15, i64 15, i64 15>			; CHECK-NEXT: %1 = shl <4 x i64> %v, <i64 15, i64 15, i64 15, i64 15>
	; CHECK: ret <4 x i64> %1			; CHECK-NEXT: ret <4 x i64> %1
	%1 = tail call <4 x i64> @llvm.x86.avx2.psll.q(<4 x i64> %v, <2 x i64> <i64 15, i64 9999>)			%1 = tail call <4 x i64> @llvm.x86.avx2.psll.q(<4 x i64> %v, <2 x i64> <i64 15, i64 9999>)
	ret <4 x i64> %1			ret <4 x i64> %1
	}			}

	define <4 x i64> @avx2_psll_q_64(<4 x i64> %v) nounwind readnone uwtable {			define <4 x i64> @avx2_psll_q_64(<4 x i64> %v) nounwind readnone uwtable {
	; CHECK-LABEL: @avx2_psll_q_64			; CHECK-LABEL: @avx2_psll_q_64
	; CHECK: ret <4 x i64> zeroinitializer			; CHECK-NEXT: ret <4 x i64> zeroinitializer
	%1 = tail call <4 x i64> @llvm.x86.avx2.psll.q(<4 x i64> %v, <2 x i64> <i64 64, i64 9999>)			%1 = tail call <4 x i64> @llvm.x86.avx2.psll.q(<4 x i64> %v, <2 x i64> <i64 64, i64 9999>)
	ret <4 x i64> %1			ret <4 x i64> %1
	}			}

	;			;
	; Constant Folding			; Constant Folding
	;			;

	[187 lines not shown]
	declare <8 x i32> @llvm.x86.avx2.psll.d(<8 x i32>, <4 x i32>) #1			declare <8 x i32> @llvm.x86.avx2.psll.d(<8 x i32>, <4 x i32>) #1
	declare <16 x i16> @llvm.x86.avx2.psll.w(<16 x i16>, <8 x i16>) #1			declare <16 x i16> @llvm.x86.avx2.psll.w(<16 x i16>, <8 x i16>) #1
	declare <2 x i64> @llvm.x86.sse2.pslli.q(<2 x i64>, i32) #1			declare <2 x i64> @llvm.x86.sse2.pslli.q(<2 x i64>, i32) #1
	declare <4 x i32> @llvm.x86.sse2.pslli.d(<4 x i32>, i32) #1			declare <4 x i32> @llvm.x86.sse2.pslli.d(<4 x i32>, i32) #1
	declare <8 x i16> @llvm.x86.sse2.pslli.w(<8 x i16>, i32) #1			declare <8 x i16> @llvm.x86.sse2.pslli.w(<8 x i16>, i32) #1
	declare <2 x i64> @llvm.x86.sse2.psll.q(<2 x i64>, <2 x i64>) #1			declare <2 x i64> @llvm.x86.sse2.psll.q(<2 x i64>, <2 x i64>) #1
	declare <4 x i32> @llvm.x86.sse2.psll.d(<4 x i32>, <4 x i32>) #1			declare <4 x i32> @llvm.x86.sse2.psll.d(<4 x i32>, <4 x i32>) #1
	declare <8 x i16> @llvm.x86.sse2.psll.w(<8 x i16>, <8 x i16>) #1			declare <8 x i16> @llvm.x86.sse2.psll.w(<8 x i16>, <8 x i16>) #1

	declare <4 x i64> @llvm.x86.avx2.psrli.q(<4 x i64>, i32) #1			declare <4 x i64> @llvm.x86.avx2.psrli.q(<4 x i64>, i32) #1
	declare <8 x i32> @llvm.x86.avx2.psrli.d(<8 x i32>, i32) #1			declare <8 x i32> @llvm.x86.avx2.psrli.d(<8 x i32>, i32) #1
	declare <16 x i16> @llvm.x86.avx2.psrli.w(<16 x i16>, i32) #1			declare <16 x i16> @llvm.x86.avx2.psrli.w(<16 x i16>, i32) #1
	declare <4 x i64> @llvm.x86.avx2.psrl.q(<4 x i64>, <2 x i64>) #1			declare <4 x i64> @llvm.x86.avx2.psrl.q(<4 x i64>, <2 x i64>) #1
	declare <8 x i32> @llvm.x86.avx2.psrl.d(<8 x i32>, <4 x i32>) #1			declare <8 x i32> @llvm.x86.avx2.psrl.d(<8 x i32>, <4 x i32>) #1
	declare <16 x i16> @llvm.x86.avx2.psrl.w(<16 x i16>, <8 x i16>) #1			declare <16 x i16> @llvm.x86.avx2.psrl.w(<16 x i16>, <8 x i16>) #1
	declare <2 x i64> @llvm.x86.sse2.psrli.q(<2 x i64>, i32) #1			declare <2 x i64> @llvm.x86.sse2.psrli.q(<2 x i64>, i32) #1
	declare <4 x i32> @llvm.x86.sse2.psrli.d(<4 x i32>, i32) #1			declare <4 x i32> @llvm.x86.sse2.psrli.d(<4 x i32>, i32) #1
	declare <8 x i16> @llvm.x86.sse2.psrli.w(<8 x i16>, i32) #1			declare <8 x i16> @llvm.x86.sse2.psrli.w(<8 x i16>, i32) #1
	declare <2 x i64> @llvm.x86.sse2.psrl.q(<2 x i64>, <2 x i64>) #1			declare <2 x i64> @llvm.x86.sse2.psrl.q(<2 x i64>, <2 x i64>) #1
	declare <4 x i32> @llvm.x86.sse2.psrl.d(<4 x i32>, <4 x i32>) #1			declare <4 x i32> @llvm.x86.sse2.psrl.d(<4 x i32>, <4 x i32>) #1
	declare <8 x i16> @llvm.x86.sse2.psrl.w(<8 x i16>, <8 x i16>) #1			declare <8 x i16> @llvm.x86.sse2.psrl.w(<8 x i16>, <8 x i16>) #1

	attributes #1 = { nounwind readnone }			attributes #1 = { nounwind readnone }