This is an archive of the discontinued LLVM Phabricator instance.

Differential D48401

[InstCombine] fold vector select of binops with constant ops to 1 binop (PR37806)
ClosedPublic

Authored by spatel on Jun 20 2018, 3:51 PM.

Details

Reviewers

efriedma
lebedev.ri
RKSimon

Commits

rGa76b70069d46: [InstCombine] fold vector select of binops with constant ops to 1 binop…
rL335283: [InstCombine] fold vector select of binops with constant ops to 1 binop…

Summary

This is the simplest case from PR37806:
https://bugs.llvm.org/show_bug.cgi?id=37806

If a pair of binops share a common variable operand, each has a vector constant as its other operand, and their results are vector-selected together, then we can constant-fold the shuffle of the two constant vectors and eliminate the shuffle instruction.
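As a rough illustration of the constant shuffle described above, here is a minimal standalone sketch in plain C++ (no LLVM headers). The names `Lane`, `Vec`, `isSelectMask`, and `shuffleConstants` are hypothetical and exist only for this example; undef lanes are modeled as `std::nullopt` and undef mask elements as `-1`.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical model: a lane is either a known integer or undef (nullopt).
using Lane = std::optional<int>;
using Vec = std::vector<Lane>;

// A shuffle mask is "select-like" when lane I reads lane I of either input:
// mask[I] is I (first input), I + N (second input), or -1 (undef).
bool isSelectMask(const std::vector<int> &mask) {
  const int n = static_cast<int>(mask.size());
  for (int i = 0; i < n; ++i)
    if (mask[i] != -1 && mask[i] != i && mask[i] != i + n)
      return false;
  return true;
}

// Shuffle two constant vectors with a select mask. This models computing C'
// for the fold: shuffle (op X, C0), (op X, C1), M --> op X, C'
Vec shuffleConstants(const Vec &c0, const Vec &c1,
                     const std::vector<int> &mask) {
  assert(c0.size() == c1.size() && c0.size() == mask.size());
  assert(isSelectMask(mask));
  const int n = static_cast<int>(mask.size());
  Vec result(mask.size());
  for (int i = 0; i < n; ++i) {
    if (mask[i] == -1)
      result[i] = std::nullopt; // undef mask element gives an undef lane
    else
      result[i] = mask[i] < n ? c0[mask[i]] : c1[mask[i] - n];
  }
  return result;
}
```

With the constants from the `@add` test, `shuffleConstants({1, 2, 3, 4}, {5, 6, 7, 8}, {0, 5, 2, 7})` yields the lanes 1, 6, 3, 8, which is the constant the new single `add` uses.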

This has some tricky parts that are hopefully addressed in the tests and their respective comments:

If the shuffle mask contains an undef element, then that lane of the result is undef:

http://llvm.org/docs/LangRef.html#shufflevector-instruction

Therefore, we can replace the constant in that lane with an undef value except for div/rem. With div/rem, an undef in the divisor would cause the whole op to be undef. So I'm using the same hack as in D47686 - replace the undefs with '1'. (Making the code from that patch into an InstCombine utility function could be a preliminary NFC patch if that's desired.)
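The undef replacement for div/rem can be sketched as follows. This is a standalone model, not the utility from the patch: `makeSafeForDivRem` is a hypothetical name, and the vector constant is modeled as `std::optional<int>` lanes where `std::nullopt` stands for undef.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical model: nullopt stands for an undef lane.
using Lane = std::optional<int>;

// Replace every undef lane with 1 so that a div/rem using this constant
// never sees an undef element. Defined lanes are passed through unchanged.
std::vector<int> makeSafeForDivRem(const std::vector<Lane> &c) {
  std::vector<int> safe;
  safe.reserve(c.size());
  for (const Lane &lane : c)
    safe.push_back(lane.value_or(1));
  return safe;
}
```

For the `@sdiv` test, the shuffled constant has lanes undef, 2, 7, undef, so the safe version is 1, 2, 7, 1.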

Intersect the wrapping and FMF of the original binops for the new binop. There should be no extra poison or fast-math potential in the new binop that wasn't possible in the original code.
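A small sketch of the flag intersection, assuming a made-up bit encoding (the `Flag` values and `intersectFlags` are illustrative only and do not match LLVM's internal representation). It mirrors the copy-then-and sequence: start from the first binop's flags and keep only those the second binop also has.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit encoding for a few IR flags; the values are illustrative.
enum Flag : std::uint8_t { NUW = 1, Exact = 2, NNaN = 4, NInf = 8, ARcp = 16 };

// Keep only the flags that both source binops carry.
constexpr std::uint8_t intersectFlags(std::uint8_t b0, std::uint8_t b1) {
  std::uint8_t result = b0; // start from the first binop's flags
  result &= b1;             // drop anything the second binop lacks
  return result;
}
```

Under this model, `exact` on only one `lshr` is dropped, while two `shl nuw` ops keep `nuw`, matching the `@lshr` and `@shl` tests.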

Disregard other uses. Given that we're eliminating uses (shortening the dependency chain), I think that's always the right IR canonicalization. But I purposely chose the udiv test to demonstrate the scenario where both intermediate values have other uses because that seems likely worse for codegen with an expensive math op. This seems like a very rare possibility to me, so I don't think it requires a backend patch first, but if we must avoid that, then we can limit the transform based on uses.

Diff Detail

Repository: rL LLVM

Event Timeline

spatel created this revision. Jun 20 2018, 3:51 PM

Herald added a subscriber: mcrosier. Jun 20 2018, 3:51 PM

lebedev.ri added inline comments. Jun 20 2018, 11:45 PM

lib/Transforms/InstCombine/InstCombineVectorOps.cpp
1152 ↗	(On Diff #152178)	Do you envision adding other top-level `if`s later? Otherwise you could use an early return.
1170–1171 ↗	(On Diff #152178)	Hm, are you sure there is a test for this in `test/Transforms/InstCombine/shuffle_select.ll`?

spatel added inline comments. Jun 21 2018, 5:25 AM

lib/Transforms/InstCombine/InstCombineVectorOps.cpp
1152 ↗	(On Diff #152178)	Yes, there are 2 potential near-term follow-ups discussed in the bug report: (1) match 2 variable vectors rather than 1 repeated vector; (2) match different opcodes with special constants (example: mul and shl). Assuming these are all good combines, they might be big enough that they each deserve their own helper, so I'll adjust this.
1170–1171 ↗	(On Diff #152178)	Yes - the sdiv and urem tests have undef elements in the shuffle mask, so they would both fail without this condition.

Patch updated:
No functional change from the previous rev, but use early exits to minimize indents and add TODO comments for the planned enhancements.

This doesn't look immediately broken to me,
but you probably want to wait a bit for more definitive review.

test/Transforms/InstCombine/shuffle_select.ll
160–199 ↗	(On Diff #152178)	Oh, somehow I did not notice this. There seems to be no `udiv`+`undef` test, though.

This revision is now accepted and ready to land. Jun 21 2018, 5:50 AM

LGTM

lib/Transforms/InstCombine/InstructionCombining.cpp
1428 ↗	(On Diff #152263)	Might as well separate this off as an NFC pre-commit.

spatel mentioned this in rL335242: [InstCombine] make div/rem vector constant utility function; NFCI. Jun 21 2018, 8:04 AM

Patch updated:
Committed the helper function change with rL335242, so this is only the shuffle fold now.

Closed by commit rL335283: [InstCombine] fold vector select of binops with constant ops to 1 binop… (authored by spatel). Jun 21 2018, 1:19 PM

This revision was automatically updated to reflect the committed changes.

spatel mentioned this in D48485: [InstCombine] allow shl+mul combos with shuffle (select) fold (PR37806). Jun 22 2018, 7:04 AM

spatel mentioned this in D48678: [InstCombine] enhance shuffle-of-binops to allow different variable ops (PR37806). Jun 27 2018, 3:05 PM

spatel mentioned this in rL335888: [InstCombine] allow shl+mul combos with shuffle (select) fold (PR37806). Jun 28 2018, 10:52 AM

spatel mentioned this in rL335974: [InstCombine] enhance shuffle-of-binops to allow different variable ops…. Jun 29 2018, 6:48 AM

spatel mentioned this in D48830: [InstCombine] fold shuffle-with-binop and common value. Jul 2 2018, 6:59 AM

spatel mentioned this in rL336196: [InstCombine] fold shuffle-with-binop and common value. Jul 3 2018, 6:49 AM

Revision Contents

Path	Size
llvm/trunk/lib/Transforms/InstCombine/InstCombineVectorOps.cpp	51 lines
llvm/trunk/test/Transforms/InstCombine/shuffle_select.ll	66 lines

Diff 152366

llvm/trunk/lib/Transforms/InstCombine/InstCombineVectorOps.cpp

	static bool isShuffleExtractingFromLHS(ShuffleVectorInst &SVI,
if (BegIdx > EndIdx || EndIdx >= LHSElems || EndIdx - BegIdx != MaskElems - 1)		if (BegIdx > EndIdx || EndIdx >= LHSElems || EndIdx - BegIdx != MaskElems - 1)
return false;		return false;
for (unsigned I = 0; I != MaskElems; ++I)		for (unsigned I = 0; I != MaskElems; ++I)
if (static_cast<unsigned>(Mask[I]) != BegIdx + I)		if (static_cast<unsigned>(Mask[I]) != BegIdx + I)
return false;		return false;
return true;		return true;
}		}

		static Instruction *foldSelectShuffles(ShuffleVectorInst &Shuf) {
		  // Folds under here require the equivalent of a vector select.
		  if (!Shuf.isSelect())
		    return nullptr;

		  BinaryOperator *B0, *B1;
		  if (!match(Shuf.getOperand(0), m_BinOp(B0)) ||
		      !match(Shuf.getOperand(1), m_BinOp(B1)))
		    return nullptr;

		  // TODO: There are potential folds where the opcodes do not match (mul+shl).
		  if (B0->getOpcode() != B1->getOpcode())
		    return nullptr;

		  // TODO: Fold the case with different variable operands (requires creating a
		  // new shuffle and checking number of uses).
		  Value *X;
		  Constant *C0, *C1;
		  if (!match(B0, m_c_BinOp(m_Value(X), m_Constant(C0))) ||
		      !match(B1, m_c_BinOp(m_Specific(X), m_Constant(C1))))
		    return nullptr;

		  // If all operands are constants, let constant folding remove the binops.
		  if (isa<Constant>(X))
		    return nullptr;

		  // Remove a binop and the shuffle by rearranging the constant:
		  // shuffle (op X, C0), (op X, C1), M --> op X, C'
		  // shuffle (op C0, X), (op C1, X), M --> op C', X
		  Constant *NewC = ConstantExpr::getShuffleVector(C0, C1, Shuf.getMask());

		  // If the shuffle mask contains undef elements, then the new constant
		  // vector will have undefs in those lanes. This could cause the entire
		  // binop to be undef.
		  if (B0->isIntDivRem())
		    NewC = getSafeVectorConstantForIntDivRem(NewC);

		  BinaryOperator::BinaryOps Opc = B0->getOpcode();
		  bool Op0IsConst = isa<Constant>(B0->getOperand(0));
		  Instruction *NewBO = Op0IsConst ? BinaryOperator::Create(Opc, NewC, X) :
		                                    BinaryOperator::Create(Opc, X, NewC);

		  // Flags are intersected from the 2 source binops.
		  NewBO->copyIRFlags(B0);
		  NewBO->andIRFlags(B1);
		  return NewBO;
		}

Instruction *InstCombiner::visitShuffleVectorInst(ShuffleVectorInst &SVI) {		Instruction *InstCombiner::visitShuffleVectorInst(ShuffleVectorInst &SVI) {
Value *LHS = SVI.getOperand(0);		Value *LHS = SVI.getOperand(0);
Value *RHS = SVI.getOperand(1);		Value *RHS = SVI.getOperand(1);
SmallVector<int, 16> Mask = SVI.getShuffleMask();		SmallVector<int, 16> Mask = SVI.getShuffleMask();
Type *Int32Ty = Type::getInt32Ty(SVI.getContext());		Type *Int32Ty = Type::getInt32Ty(SVI.getContext());

if (auto *V = SimplifyShuffleVectorInst(		if (auto *V = SimplifyShuffleVectorInst(
LHS, RHS, SVI.getMask(), SVI.getType(), SQ.getWithInstruction(&SVI)))		LHS, RHS, SVI.getMask(), SVI.getType(), SQ.getWithInstruction(&SVI)))
return replaceInstUsesWith(SVI, V);		return replaceInstUsesWith(SVI, V);

		if (Instruction *I = foldSelectShuffles(SVI))
		return I;

bool MadeChange = false;		bool MadeChange = false;
unsigned VWidth = SVI.getType()->getVectorNumElements();		unsigned VWidth = SVI.getType()->getVectorNumElements();

APInt UndefElts(VWidth, 0);		APInt UndefElts(VWidth, 0);
APInt AllOnesEltMask(APInt::getAllOnesValue(VWidth));		APInt AllOnesEltMask(APInt::getAllOnesValue(VWidth));
if (Value *V = SimplifyDemandedVectorElts(&SVI, AllOnesEltMask, UndefElts)) {		if (Value *V = SimplifyDemandedVectorElts(&SVI, AllOnesEltMask, UndefElts)) {
if (V != &SVI)		if (V != &SVI)
return replaceInstUsesWith(SVI, V);		return replaceInstUsesWith(SVI, V);

llvm/trunk/test/Transforms/InstCombine/shuffle_select.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -instcombine -S | FileCheck %s			; RUN: opt < %s -instcombine -S | FileCheck %s

	; Try to eliminate binops and shuffles when the shuffle is a select in disguise:			; Try to eliminate binops and shuffles when the shuffle is a select in disguise:
	; PR37806 - https://bugs.llvm.org/show_bug.cgi?id=37806			; PR37806 - https://bugs.llvm.org/show_bug.cgi?id=37806

	define <4 x i32> @add(<4 x i32> %v0) {			define <4 x i32> @add(<4 x i32> %v0) {
	; CHECK-LABEL: @add(			; CHECK-LABEL: @add(
	; CHECK-NEXT: [[T1:%.*]] = add <4 x i32> [[V0:%.*]], <i32 1, i32 undef, i32 3, i32 undef>			; CHECK-NEXT: [[T3:%.*]] = add <4 x i32> [[V0:%.*]], <i32 1, i32 6, i32 3, i32 8>
	; CHECK-NEXT: [[T2:%.*]] = add <4 x i32> [[V0]], <i32 undef, i32 6, i32 undef, i32 8>
	; CHECK-NEXT: [[T3:%.*]] = shufflevector <4 x i32> [[T1]], <4 x i32> [[T2]], <4 x i32> <i32 0, i32 5, i32 2, i32 7>
	; CHECK-NEXT: ret <4 x i32> [[T3]]			; CHECK-NEXT: ret <4 x i32> [[T3]]
	;			;
	%t1 = add <4 x i32> %v0, <i32 1, i32 2, i32 3, i32 4>			%t1 = add <4 x i32> %v0, <i32 1, i32 2, i32 3, i32 4>
	%t2 = add <4 x i32> %v0, <i32 5, i32 6, i32 7, i32 8>			%t2 = add <4 x i32> %v0, <i32 5, i32 6, i32 7, i32 8>
	%t3 = shufflevector <4 x i32> %t1, <4 x i32> %t2, <4 x i32> <i32 0, i32 5, i32 2, i32 7>			%t3 = shufflevector <4 x i32> %t1, <4 x i32> %t2, <4 x i32> <i32 0, i32 5, i32 2, i32 7>
	ret <4 x i32> %t3			ret <4 x i32> %t3
	}			}

	; Constant operand 0 (LHS) also works.			; Constant operand 0 (LHS) also works.

	define <4 x i32> @sub(<4 x i32> %v0) {			define <4 x i32> @sub(<4 x i32> %v0) {
	; CHECK-LABEL: @sub(			; CHECK-LABEL: @sub(
	; CHECK-NEXT: [[T1:%.*]] = sub <4 x i32> <i32 1, i32 2, i32 3, i32 undef>, [[V0:%.*]]			; CHECK-NEXT: [[T3:%.*]] = sub <4 x i32> <i32 1, i32 2, i32 3, i32 8>, [[V0:%.*]]
	; CHECK-NEXT: [[T2:%.*]] = sub <4 x i32> <i32 undef, i32 undef, i32 undef, i32 8>, [[V0]]
	; CHECK-NEXT: [[T3:%.*]] = shufflevector <4 x i32> [[T1]], <4 x i32> [[T2]], <4 x i32> <i32 0, i32 1, i32 2, i32 7>
	; CHECK-NEXT: ret <4 x i32> [[T3]]			; CHECK-NEXT: ret <4 x i32> [[T3]]
	;			;
	%t1 = sub <4 x i32> <i32 1, i32 2, i32 3, i32 4>, %v0			%t1 = sub <4 x i32> <i32 1, i32 2, i32 3, i32 4>, %v0
	%t2 = sub <4 x i32> <i32 5, i32 6, i32 7, i32 8>, %v0			%t2 = sub <4 x i32> <i32 5, i32 6, i32 7, i32 8>, %v0
	%t3 = shufflevector <4 x i32> %t1, <4 x i32> %t2, <4 x i32> <i32 0, i32 1, i32 2, i32 7>			%t3 = shufflevector <4 x i32> %t1, <4 x i32> %t2, <4 x i32> <i32 0, i32 1, i32 2, i32 7>
	ret <4 x i32> %t3			ret <4 x i32> %t3
	}			}

	; If any element of the shuffle mask operand is undef, that element of the result is undef.			; If any element of the shuffle mask operand is undef, that element of the result is undef.
	; The shuffle is eliminated in this transform, but we can replace a constant element with undef.			; The shuffle is eliminated in this transform, but we can replace a constant element with undef.

	define <4 x i32> @mul(<4 x i32> %v0) {			define <4 x i32> @mul(<4 x i32> %v0) {
	; CHECK-LABEL: @mul(			; CHECK-LABEL: @mul(
	; CHECK-NEXT: [[T1:%.*]] = mul <4 x i32> [[V0:%.*]], <i32 undef, i32 undef, i32 3, i32 undef>			; CHECK-NEXT: [[T3:%.*]] = mul <4 x i32> [[V0:%.*]], <i32 undef, i32 6, i32 3, i32 8>
	; CHECK-NEXT: [[T2:%.*]] = mul <4 x i32> [[V0]], <i32 undef, i32 6, i32 undef, i32 8>
	; CHECK-NEXT: [[T3:%.*]] = shufflevector <4 x i32> [[T1]], <4 x i32> [[T2]], <4 x i32> <i32 undef, i32 5, i32 2, i32 7>
	; CHECK-NEXT: ret <4 x i32> [[T3]]			; CHECK-NEXT: ret <4 x i32> [[T3]]
	;			;
	%t1 = mul <4 x i32> %v0, <i32 1, i32 2, i32 3, i32 4>			%t1 = mul <4 x i32> %v0, <i32 1, i32 2, i32 3, i32 4>
	%t2 = mul <4 x i32> %v0, <i32 5, i32 6, i32 7, i32 8>			%t2 = mul <4 x i32> %v0, <i32 5, i32 6, i32 7, i32 8>
	%t3 = shufflevector <4 x i32> %t1, <4 x i32> %t2, <4 x i32> <i32 undef, i32 5, i32 2, i32 7>			%t3 = shufflevector <4 x i32> %t1, <4 x i32> %t2, <4 x i32> <i32 undef, i32 5, i32 2, i32 7>
	ret <4 x i32> %t3			ret <4 x i32> %t3
	}			}

	; Preserve flags when possible.			; Preserve flags when possible.

	define <4 x i32> @shl(<4 x i32> %v0) {			define <4 x i32> @shl(<4 x i32> %v0) {
	; CHECK-LABEL: @shl(			; CHECK-LABEL: @shl(
	; CHECK-NEXT: [[T1:%.*]] = shl nuw <4 x i32> [[V0:%.*]], <i32 1, i32 2, i32 3, i32 4>			; CHECK-NEXT: [[T3:%.*]] = shl nuw <4 x i32> [[V0:%.*]], <i32 undef, i32 6, i32 3, i32 undef>
	; CHECK-NEXT: [[T2:%.*]] = shl nuw <4 x i32> [[V0]], <i32 5, i32 6, i32 7, i32 8>
	; CHECK-NEXT: [[T3:%.*]] = shufflevector <4 x i32> [[T1]], <4 x i32> [[T2]], <4 x i32> <i32 undef, i32 5, i32 2, i32 undef>
	; CHECK-NEXT: ret <4 x i32> [[T3]]			; CHECK-NEXT: ret <4 x i32> [[T3]]
	;			;
	%t1 = shl nuw <4 x i32> %v0, <i32 1, i32 2, i32 3, i32 4>			%t1 = shl nuw <4 x i32> %v0, <i32 1, i32 2, i32 3, i32 4>
	%t2 = shl nuw <4 x i32> %v0, <i32 5, i32 6, i32 7, i32 8>			%t2 = shl nuw <4 x i32> %v0, <i32 5, i32 6, i32 7, i32 8>
	%t3 = shufflevector <4 x i32> %t1, <4 x i32> %t2, <4 x i32> <i32 undef, i32 5, i32 2, i32 undef>			%t3 = shufflevector <4 x i32> %t1, <4 x i32> %t2, <4 x i32> <i32 undef, i32 5, i32 2, i32 undef>
	ret <4 x i32> %t3			ret <4 x i32> %t3
	}			}

	; Can't propagate the flag here.			; Can't propagate the flag here.

	define <4 x i32> @lshr(<4 x i32> %v0) {			define <4 x i32> @lshr(<4 x i32> %v0) {
	; CHECK-LABEL: @lshr(			; CHECK-LABEL: @lshr(
	; CHECK-NEXT: [[T1:%.*]] = lshr exact <4 x i32> <i32 1, i32 2, i32 3, i32 4>, [[V0:%.*]]			; CHECK-NEXT: [[T3:%.*]] = lshr <4 x i32> <i32 5, i32 6, i32 3, i32 8>, [[V0:%.*]]
	; CHECK-NEXT: [[T2:%.*]] = lshr <4 x i32> <i32 5, i32 6, i32 7, i32 8>, [[V0]]
	; CHECK-NEXT: [[T3:%.*]] = shufflevector <4 x i32> [[T1]], <4 x i32> [[T2]], <4 x i32> <i32 4, i32 5, i32 2, i32 7>
	; CHECK-NEXT: ret <4 x i32> [[T3]]			; CHECK-NEXT: ret <4 x i32> [[T3]]
	;			;
	%t1 = lshr exact <4 x i32> <i32 1, i32 2, i32 3, i32 4>, %v0			%t1 = lshr exact <4 x i32> <i32 1, i32 2, i32 3, i32 4>, %v0
	%t2 = lshr <4 x i32> <i32 5, i32 6, i32 7, i32 8>, %v0			%t2 = lshr <4 x i32> <i32 5, i32 6, i32 7, i32 8>, %v0
	%t3 = shufflevector <4 x i32> %t1, <4 x i32> %t2, <4 x i32> <i32 4, i32 5, i32 2, i32 7>			%t3 = shufflevector <4 x i32> %t1, <4 x i32> %t2, <4 x i32> <i32 4, i32 5, i32 2, i32 7>
	ret <4 x i32> %t3			ret <4 x i32> %t3
	}			}

	; Try weird types.			; Try weird types.

	define <3 x i32> @ashr(<3 x i32> %v0) {			define <3 x i32> @ashr(<3 x i32> %v0) {
	; CHECK-LABEL: @ashr(			; CHECK-LABEL: @ashr(
	; CHECK-NEXT: [[T1:%.*]] = ashr <3 x i32> [[V0:%.*]], <i32 1, i32 2, i32 3>			; CHECK-NEXT: [[T3:%.*]] = ashr <3 x i32> [[V0:%.*]], <i32 4, i32 2, i32 3>
	; CHECK-NEXT: [[T2:%.*]] = ashr <3 x i32> [[V0]], <i32 4, i32 5, i32 6>
	; CHECK-NEXT: [[T3:%.*]] = shufflevector <3 x i32> [[T1]], <3 x i32> [[T2]], <3 x i32> <i32 3, i32 1, i32 2>
	; CHECK-NEXT: ret <3 x i32> [[T3]]			; CHECK-NEXT: ret <3 x i32> [[T3]]
	;			;
	%t1 = ashr <3 x i32> %v0, <i32 1, i32 2, i32 3>			%t1 = ashr <3 x i32> %v0, <i32 1, i32 2, i32 3>
	%t2 = ashr <3 x i32> %v0, <i32 4, i32 5, i32 6>			%t2 = ashr <3 x i32> %v0, <i32 4, i32 5, i32 6>
	%t3 = shufflevector <3 x i32> %t1, <3 x i32> %t2, <3 x i32> <i32 3, i32 1, i32 2>			%t3 = shufflevector <3 x i32> %t1, <3 x i32> %t2, <3 x i32> <i32 3, i32 1, i32 2>
	ret <3 x i32> %t3			ret <3 x i32> %t3
	}			}

	define <3 x i42> @and(<3 x i42> %v0) {			define <3 x i42> @and(<3 x i42> %v0) {
	; CHECK-LABEL: @and(			; CHECK-LABEL: @and(
	; CHECK-NEXT: [[T1:%.*]] = and <3 x i42> [[V0:%.*]], <i42 1, i42 undef, i42 undef>			; CHECK-NEXT: [[T3:%.*]] = and <3 x i42> [[V0:%.*]], <i42 1, i42 5, i42 undef>
	; CHECK-NEXT: [[T2:%.*]] = and <3 x i42> [[V0]], <i42 undef, i42 5, i42 undef>
	; CHECK-NEXT: [[T3:%.*]] = shufflevector <3 x i42> [[T1]], <3 x i42> [[T2]], <3 x i32> <i32 0, i32 4, i32 undef>
	; CHECK-NEXT: ret <3 x i42> [[T3]]			; CHECK-NEXT: ret <3 x i42> [[T3]]
	;			;
	%t1 = and <3 x i42> %v0, <i42 1, i42 2, i42 3>			%t1 = and <3 x i42> %v0, <i42 1, i42 2, i42 3>
	%t2 = and <3 x i42> %v0, <i42 4, i42 5, i42 6>			%t2 = and <3 x i42> %v0, <i42 4, i42 5, i42 6>
	%t3 = shufflevector <3 x i42> %t1, <3 x i42> %t2, <3 x i32> <i32 0, i32 4, i32 undef>			%t3 = shufflevector <3 x i42> %t1, <3 x i42> %t2, <3 x i32> <i32 0, i32 4, i32 undef>
	ret <3 x i42> %t3			ret <3 x i42> %t3
	}			}

	; It doesn't matter if the intermediate ops have extra uses.			; It doesn't matter if the intermediate ops have extra uses.

	declare void @use_v4i32(<4 x i32>)			declare void @use_v4i32(<4 x i32>)

	define <4 x i32> @or(<4 x i32> %v0) {			define <4 x i32> @or(<4 x i32> %v0) {
	; CHECK-LABEL: @or(			; CHECK-LABEL: @or(
	; CHECK-NEXT: [[T1:%.*]] = or <4 x i32> [[V0:%.*]], <i32 1, i32 2, i32 3, i32 4>			; CHECK-NEXT: [[T1:%.*]] = or <4 x i32> [[V0:%.*]], <i32 1, i32 2, i32 3, i32 4>
	; CHECK-NEXT: [[T2:%.*]] = or <4 x i32> [[V0]], <i32 5, i32 6, i32 undef, i32 undef>			; CHECK-NEXT: [[T3:%.*]] = or <4 x i32> [[V0]], <i32 5, i32 6, i32 3, i32 4>
	; CHECK-NEXT: [[T3:%.*]] = shufflevector <4 x i32> [[T1]], <4 x i32> [[T2]], <4 x i32> <i32 4, i32 5, i32 2, i32 3>
	; CHECK-NEXT: call void @use_v4i32(<4 x i32> [[T1]])			; CHECK-NEXT: call void @use_v4i32(<4 x i32> [[T1]])
	; CHECK-NEXT: ret <4 x i32> [[T3]]			; CHECK-NEXT: ret <4 x i32> [[T3]]
	;			;
	%t1 = or <4 x i32> %v0, <i32 1, i32 2, i32 3, i32 4>			%t1 = or <4 x i32> %v0, <i32 1, i32 2, i32 3, i32 4>
	%t2 = or <4 x i32> %v0, <i32 5, i32 6, i32 7, i32 8>			%t2 = or <4 x i32> %v0, <i32 5, i32 6, i32 7, i32 8>
	%t3 = shufflevector <4 x i32> %t1, <4 x i32> %t2, <4 x i32> <i32 4, i32 5, i32 2, i32 3>			%t3 = shufflevector <4 x i32> %t1, <4 x i32> %t2, <4 x i32> <i32 4, i32 5, i32 2, i32 3>
	call void @use_v4i32(<4 x i32> %t1)			call void @use_v4i32(<4 x i32> %t1)
	ret <4 x i32> %t3			ret <4 x i32> %t3
	}			}

	define <4 x i32> @xor(<4 x i32> %v0) {			define <4 x i32> @xor(<4 x i32> %v0) {
	; CHECK-LABEL: @xor(			; CHECK-LABEL: @xor(
	; CHECK-NEXT: [[T1:%.*]] = xor <4 x i32> [[V0:%.*]], <i32 1, i32 undef, i32 3, i32 4>			; CHECK-NEXT: [[T2:%.*]] = xor <4 x i32> [[V0:%.*]], <i32 5, i32 6, i32 7, i32 8>
	; CHECK-NEXT: [[T2:%.*]] = xor <4 x i32> [[V0]], <i32 5, i32 6, i32 7, i32 8>			; CHECK-NEXT: [[T3:%.*]] = xor <4 x i32> [[V0]], <i32 1, i32 6, i32 3, i32 4>
	; CHECK-NEXT: [[T3:%.*]] = shufflevector <4 x i32> [[T1]], <4 x i32> [[T2]], <4 x i32> <i32 0, i32 5, i32 2, i32 3>
	; CHECK-NEXT: call void @use_v4i32(<4 x i32> [[T2]])			; CHECK-NEXT: call void @use_v4i32(<4 x i32> [[T2]])
	; CHECK-NEXT: ret <4 x i32> [[T3]]			; CHECK-NEXT: ret <4 x i32> [[T3]]
	;			;
	%t1 = xor <4 x i32> %v0, <i32 1, i32 2, i32 3, i32 4>			%t1 = xor <4 x i32> %v0, <i32 1, i32 2, i32 3, i32 4>
	%t2 = xor <4 x i32> %v0, <i32 5, i32 6, i32 7, i32 8>			%t2 = xor <4 x i32> %v0, <i32 5, i32 6, i32 7, i32 8>
	%t3 = shufflevector <4 x i32> %t1, <4 x i32> %t2, <4 x i32> <i32 0, i32 5, i32 2, i32 3>			%t3 = shufflevector <4 x i32> %t1, <4 x i32> %t2, <4 x i32> <i32 0, i32 5, i32 2, i32 3>
	call void @use_v4i32(<4 x i32> %t2)			call void @use_v4i32(<4 x i32> %t2)
	ret <4 x i32> %t3			ret <4 x i32> %t3
	}			}

	define <4 x i32> @udiv(<4 x i32> %v0) {			define <4 x i32> @udiv(<4 x i32> %v0) {
	; CHECK-LABEL: @udiv(			; CHECK-LABEL: @udiv(
	; CHECK-NEXT: [[T1:%.*]] = udiv <4 x i32> <i32 1, i32 2, i32 3, i32 4>, [[V0:%.*]]			; CHECK-NEXT: [[T1:%.*]] = udiv <4 x i32> <i32 1, i32 2, i32 3, i32 4>, [[V0:%.*]]
	; CHECK-NEXT: [[T2:%.*]] = udiv <4 x i32> <i32 5, i32 6, i32 7, i32 8>, [[V0]]			; CHECK-NEXT: [[T2:%.*]] = udiv <4 x i32> <i32 5, i32 6, i32 7, i32 8>, [[V0]]
	; CHECK-NEXT: [[T3:%.*]] = shufflevector <4 x i32> [[T1]], <4 x i32> [[T2]], <4 x i32> <i32 0, i32 1, i32 2, i32 7>			; CHECK-NEXT: [[T3:%.*]] = udiv <4 x i32> <i32 1, i32 2, i32 3, i32 8>, [[V0]]
	; CHECK-NEXT: call void @use_v4i32(<4 x i32> [[T1]])			; CHECK-NEXT: call void @use_v4i32(<4 x i32> [[T1]])
	; CHECK-NEXT: call void @use_v4i32(<4 x i32> [[T2]])			; CHECK-NEXT: call void @use_v4i32(<4 x i32> [[T2]])
	; CHECK-NEXT: ret <4 x i32> [[T3]]			; CHECK-NEXT: ret <4 x i32> [[T3]]
	;			;
	%t1 = udiv <4 x i32> <i32 1, i32 2, i32 3, i32 4>, %v0			%t1 = udiv <4 x i32> <i32 1, i32 2, i32 3, i32 4>, %v0
	%t2 = udiv <4 x i32> <i32 5, i32 6, i32 7, i32 8>, %v0			%t2 = udiv <4 x i32> <i32 5, i32 6, i32 7, i32 8>, %v0
	%t3 = shufflevector <4 x i32> %t1, <4 x i32> %t2, <4 x i32> <i32 0, i32 1, i32 2, i32 7>			%t3 = shufflevector <4 x i32> %t1, <4 x i32> %t2, <4 x i32> <i32 0, i32 1, i32 2, i32 7>
	call void @use_v4i32(<4 x i32> %t1)			call void @use_v4i32(<4 x i32> %t1)
	call void @use_v4i32(<4 x i32> %t2)			call void @use_v4i32(<4 x i32> %t2)
	ret <4 x i32> %t3			ret <4 x i32> %t3
	}			}

	; Div/rem need special handling if the shuffle has undef elements.			; Div/rem need special handling if the shuffle has undef elements.

	define <4 x i32> @sdiv(<4 x i32> %v0) {			define <4 x i32> @sdiv(<4 x i32> %v0) {
	; CHECK-LABEL: @sdiv(			; CHECK-LABEL: @sdiv(
	; CHECK-NEXT: [[T1:%.*]] = sdiv <4 x i32> [[V0:%.*]], <i32 1, i32 2, i32 3, i32 4>			; CHECK-NEXT: [[T3:%.*]] = sdiv <4 x i32> [[V0:%.*]], <i32 1, i32 2, i32 7, i32 1>
	; CHECK-NEXT: [[T2:%.*]] = sdiv <4 x i32> [[V0]], <i32 5, i32 6, i32 7, i32 8>
	; CHECK-NEXT: [[T3:%.*]] = shufflevector <4 x i32> [[T1]], <4 x i32> [[T2]], <4 x i32> <i32 undef, i32 1, i32 6, i32 undef>
	; CHECK-NEXT: ret <4 x i32> [[T3]]			; CHECK-NEXT: ret <4 x i32> [[T3]]
	;			;
	%t1 = sdiv <4 x i32> %v0, <i32 1, i32 2, i32 3, i32 4>			%t1 = sdiv <4 x i32> %v0, <i32 1, i32 2, i32 3, i32 4>
	%t2 = sdiv <4 x i32> %v0, <i32 5, i32 6, i32 7, i32 8>			%t2 = sdiv <4 x i32> %v0, <i32 5, i32 6, i32 7, i32 8>
	%t3 = shufflevector <4 x i32> %t1, <4 x i32> %t2, <4 x i32> <i32 undef, i32 1, i32 6, i32 undef>			%t3 = shufflevector <4 x i32> %t1, <4 x i32> %t2, <4 x i32> <i32 undef, i32 1, i32 6, i32 undef>
	ret <4 x i32> %t3			ret <4 x i32> %t3
	}			}

	define <4 x i32> @urem(<4 x i32> %v0) {			define <4 x i32> @urem(<4 x i32> %v0) {
	; CHECK-LABEL: @urem(			; CHECK-LABEL: @urem(
	; CHECK-NEXT: [[T1:%.*]] = urem <4 x i32> <i32 1, i32 2, i32 3, i32 4>, [[V0:%.*]]			; CHECK-NEXT: [[T3:%.*]] = urem <4 x i32> <i32 1, i32 2, i32 7, i32 1>, [[V0:%.*]]
	; CHECK-NEXT: [[T2:%.*]] = urem <4 x i32> <i32 5, i32 6, i32 7, i32 8>, [[V0]]
	; CHECK-NEXT: [[T3:%.*]] = shufflevector <4 x i32> [[T1]], <4 x i32> [[T2]], <4 x i32> <i32 0, i32 1, i32 6, i32 undef>
	; CHECK-NEXT: ret <4 x i32> [[T3]]			; CHECK-NEXT: ret <4 x i32> [[T3]]
	;			;
	%t1 = urem <4 x i32> <i32 1, i32 2, i32 3, i32 4>, %v0			%t1 = urem <4 x i32> <i32 1, i32 2, i32 3, i32 4>, %v0
	%t2 = urem <4 x i32> <i32 5, i32 6, i32 7, i32 8>, %v0			%t2 = urem <4 x i32> <i32 5, i32 6, i32 7, i32 8>, %v0
	%t3 = shufflevector <4 x i32> %t1, <4 x i32> %t2, <4 x i32> <i32 0, i32 1, i32 6, i32 undef>			%t3 = shufflevector <4 x i32> %t1, <4 x i32> %t2, <4 x i32> <i32 0, i32 1, i32 6, i32 undef>
	ret <4 x i32> %t3			ret <4 x i32> %t3
	}			}

	define <4 x i32> @srem(<4 x i32> %v0) {			define <4 x i32> @srem(<4 x i32> %v0) {
	; CHECK-LABEL: @srem(			; CHECK-LABEL: @srem(
	; CHECK-NEXT: [[T1:%.*]] = srem <4 x i32> <i32 1, i32 2, i32 3, i32 4>, [[V0:%.*]]			; CHECK-NEXT: [[T3:%.*]] = srem <4 x i32> <i32 1, i32 2, i32 7, i32 4>, [[V0:%.*]]
	; CHECK-NEXT: [[T2:%.*]] = srem <4 x i32> <i32 5, i32 6, i32 7, i32 8>, [[V0]]
	; CHECK-NEXT: [[T3:%.*]] = shufflevector <4 x i32> [[T1]], <4 x i32> [[T2]], <4 x i32> <i32 0, i32 1, i32 6, i32 3>
	; CHECK-NEXT: ret <4 x i32> [[T3]]			; CHECK-NEXT: ret <4 x i32> [[T3]]
	;			;
	%t1 = srem <4 x i32> <i32 1, i32 2, i32 3, i32 4>, %v0			%t1 = srem <4 x i32> <i32 1, i32 2, i32 3, i32 4>, %v0
	%t2 = srem <4 x i32> <i32 5, i32 6, i32 7, i32 8>, %v0			%t2 = srem <4 x i32> <i32 5, i32 6, i32 7, i32 8>, %v0
	%t3 = shufflevector <4 x i32> %t1, <4 x i32> %t2, <4 x i32> <i32 0, i32 1, i32 6, i32 3>			%t3 = shufflevector <4 x i32> %t1, <4 x i32> %t2, <4 x i32> <i32 0, i32 1, i32 6, i32 3>
	ret <4 x i32> %t3			ret <4 x i32> %t3
	}			}

	; Try FP ops/types.			; Try FP ops/types.

	define <4 x float> @fadd(<4 x float> %v0) {			define <4 x float> @fadd(<4 x float> %v0) {
	; CHECK-LABEL: @fadd(			; CHECK-LABEL: @fadd(
	; CHECK-NEXT: [[T1:%.*]] = fadd <4 x float> [[V0:%.*]], <float 1.000000e+00, float 2.000000e+00, float 3.000000e+00, float 4.000000e+00>			; CHECK-NEXT: [[T3:%.*]] = fadd <4 x float> [[V0:%.*]], <float 1.000000e+00, float 2.000000e+00, float 7.000000e+00, float 8.000000e+00>
	; CHECK-NEXT: [[T2:%.*]] = fadd <4 x float> [[V0]], <float 5.000000e+00, float 6.000000e+00, float 7.000000e+00, float 8.000000e+00>
	; CHECK-NEXT: [[T3:%.*]] = shufflevector <4 x float> [[T1]], <4 x float> [[T2]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
	; CHECK-NEXT: ret <4 x float> [[T3]]			; CHECK-NEXT: ret <4 x float> [[T3]]
	;			;
	%t1 = fadd <4 x float> %v0, <float 1.0, float 2.0, float 3.0, float 4.0>			%t1 = fadd <4 x float> %v0, <float 1.0, float 2.0, float 3.0, float 4.0>
	%t2 = fadd <4 x float> %v0, <float 5.0, float 6.0, float 7.0, float 8.0>			%t2 = fadd <4 x float> %v0, <float 5.0, float 6.0, float 7.0, float 8.0>
	%t3 = shufflevector <4 x float> %t1, <4 x float> %t2, <4 x i32> <i32 0, i32 1, i32 6, i32 7>			%t3 = shufflevector <4 x float> %t1, <4 x float> %t2, <4 x i32> <i32 0, i32 1, i32 6, i32 7>
	ret <4 x float> %t3			ret <4 x float> %t3
	}			}

	define <4 x double> @fsub(<4 x double> %v0) {			define <4 x double> @fsub(<4 x double> %v0) {
	; CHECK-LABEL: @fsub(			; CHECK-LABEL: @fsub(
	; CHECK-NEXT: [[T1:%.*]] = fsub <4 x double> <double 1.000000e+00, double 2.000000e+00, double 3.000000e+00, double 4.000000e+00>, [[V0:%.*]]			; CHECK-NEXT: [[T3:%.*]] = fsub <4 x double> <double undef, double 2.000000e+00, double 7.000000e+00, double 8.000000e+00>, [[V0:%.*]]
	; CHECK-NEXT: [[T2:%.*]] = fsub <4 x double> <double 5.000000e+00, double 6.000000e+00, double 7.000000e+00, double 8.000000e+00>, [[V0]]
	; CHECK-NEXT: [[T3:%.*]] = shufflevector <4 x double> [[T1]], <4 x double> [[T2]], <4 x i32> <i32 undef, i32 1, i32 6, i32 7>
	; CHECK-NEXT: ret <4 x double> [[T3]]			; CHECK-NEXT: ret <4 x double> [[T3]]
	;			;
	%t1 = fsub <4 x double> <double 1.0, double 2.0, double 3.0, double 4.0>, %v0			%t1 = fsub <4 x double> <double 1.0, double 2.0, double 3.0, double 4.0>, %v0
	%t2 = fsub <4 x double> <double 5.0, double 6.0, double 7.0, double 8.0>, %v0			%t2 = fsub <4 x double> <double 5.0, double 6.0, double 7.0, double 8.0>, %v0
	%t3 = shufflevector <4 x double> %t1, <4 x double> %t2, <4 x i32> <i32 undef, i32 1, i32 6, i32 7>			%t3 = shufflevector <4 x double> %t1, <4 x double> %t2, <4 x i32> <i32 undef, i32 1, i32 6, i32 7>
	ret <4 x double> %t3			ret <4 x double> %t3
	}			}

	; Intersect any FMF.			; Intersect any FMF.

	define <4 x float> @fmul(<4 x float> %v0) {			define <4 x float> @fmul(<4 x float> %v0) {
	; CHECK-LABEL: @fmul(			; CHECK-LABEL: @fmul(
	; CHECK-NEXT: [[T1:%.*]] = fmul nnan ninf <4 x float> [[V0:%.*]], <float 1.000000e+00, float 2.000000e+00, float 3.000000e+00, float 4.000000e+00>			; CHECK-NEXT: [[T3:%.*]] = fmul nnan ninf <4 x float> [[V0:%.*]], <float 1.000000e+00, float 6.000000e+00, float 7.000000e+00, float 8.000000e+00>
	; CHECK-NEXT: [[T2:%.*]] = fmul nnan ninf <4 x float> [[V0]], <float 5.000000e+00, float 6.000000e+00, float 7.000000e+00, float 8.000000e+00>
	; CHECK-NEXT: [[T3:%.*]] = shufflevector <4 x float> [[T1]], <4 x float> [[T2]], <4 x i32> <i32 0, i32 5, i32 6, i32 7>
	; CHECK-NEXT: ret <4 x float> [[T3]]			; CHECK-NEXT: ret <4 x float> [[T3]]
	;			;
	%t1 = fmul nnan ninf <4 x float> %v0, <float 1.0, float 2.0, float 3.0, float 4.0>			%t1 = fmul nnan ninf <4 x float> %v0, <float 1.0, float 2.0, float 3.0, float 4.0>
	%t2 = fmul nnan ninf <4 x float> %v0, <float 5.0, float 6.0, float 7.0, float 8.0>			%t2 = fmul nnan ninf <4 x float> %v0, <float 5.0, float 6.0, float 7.0, float 8.0>
	%t3 = shufflevector <4 x float> %t1, <4 x float> %t2, <4 x i32> <i32 0, i32 5, i32 6, i32 7>			%t3 = shufflevector <4 x float> %t1, <4 x float> %t2, <4 x i32> <i32 0, i32 5, i32 6, i32 7>
	ret <4 x float> %t3			ret <4 x float> %t3
	}			}

	define <4 x double> @fdiv(<4 x double> %v0) {			define <4 x double> @fdiv(<4 x double> %v0) {
	; CHECK-LABEL: @fdiv(			; CHECK-LABEL: @fdiv(
	; CHECK-NEXT: [[T1:%.*]] = fdiv fast <4 x double> <double 1.000000e+00, double 2.000000e+00, double 3.000000e+00, double 4.000000e+00>, [[V0:%.*]]			; CHECK-NEXT: [[T3:%.*]] = fdiv nnan arcp <4 x double> <double undef, double 2.000000e+00, double 7.000000e+00, double 8.000000e+00>, [[V0:%.*]]
	; CHECK-NEXT: [[T2:%.*]] = fdiv nnan arcp <4 x double> <double 5.000000e+00, double 6.000000e+00, double 7.000000e+00, double 8.000000e+00>, [[V0]]
	; CHECK-NEXT: [[T3:%.*]] = shufflevector <4 x double> [[T1]], <4 x double> [[T2]], <4 x i32> <i32 undef, i32 1, i32 6, i32 7>
	; CHECK-NEXT: ret <4 x double> [[T3]]			; CHECK-NEXT: ret <4 x double> [[T3]]
	;			;
	%t1 = fdiv fast <4 x double> <double 1.0, double 2.0, double 3.0, double 4.0>, %v0			%t1 = fdiv fast <4 x double> <double 1.0, double 2.0, double 3.0, double 4.0>, %v0
	%t2 = fdiv nnan arcp <4 x double> <double 5.0, double 6.0, double 7.0, double 8.0>, %v0			%t2 = fdiv nnan arcp <4 x double> <double 5.0, double 6.0, double 7.0, double 8.0>, %v0
	%t3 = shufflevector <4 x double> %t1, <4 x double> %t2, <4 x i32> <i32 undef, i32 1, i32 6, i32 7>			%t3 = shufflevector <4 x double> %t1, <4 x double> %t2, <4 x i32> <i32 undef, i32 1, i32 6, i32 7>
	ret <4 x double> %t3			ret <4 x double> %t3
	}			}