This is an archive of the discontinued LLVM Phabricator instance.

[X86][SSE] psrl(w/d/q) and psll(w/d/q) bit shifts for SSE2
Closed, Public

Authored by RKSimon on Dec 13 2014, 8:09 AM.


Details

Reviewers

qcolombet
chandlerc
andreadb

Commits

rG46cd4f74005b: [X86][SSE] psrl(w/d/q) and psll(w/d/q) bit shifts for SSE2
rL228047: [X86][SSE] psrl(w/d/q) and psll(w/d/q) bit shifts for SSE2

Summary

Patch to match cases where shuffle masks can be reduced to bit shifts. Similar to byte shift shuffle matching from D5699.

This applies to integer vector shuffles where lanes are moved to the left/right within short groups and zeros are inserted. Each integer type can be shifted safely using any wider type (so i8 -> i16/i32/i64, i16 -> i32/i64, i32 -> i64).
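To make the equivalence concrete, here is a minimal C++ model (an illustration only, not LLVM code). A 128-bit register is represented as eight 16-bit lanes with lane 0 least significant, as on little-endian x86. The names V8x16, applyShuffle, psrlAsWider and kZero are hypothetical. The sketch shows that a v8i16 shuffle such as [1, zz, 3, zz, 5, zz, 7, zz] produces the same lanes as shifting each 32-bit element right by 16 bits (PSRLD), i.e. an i16 shuffle realized with a wider i32 shift.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Illustrative model only (not LLVM code): a 128-bit vector as eight
// 16-bit lanes, lane 0 being the least significant (little-endian order).
using V8x16 = std::array<uint16_t, 8>;
constexpr int kZero = -1; // hypothetical marker: this result lane is zero

// Apply a single-input shuffle mask; kZero lanes are filled with 0.
V8x16 applyShuffle(const V8x16 &V, const std::array<int, 8> &Mask) {
  V8x16 R{};
  for (int i = 0; i < 8; ++i)
    R[i] = Mask[i] == kZero ? 0 : V[Mask[i]];
  return R;
}

// Treat each group of Scale lanes (Scale = 2 or 4) as one 32- or 64-bit
// integer and shift it right logically by Shift lanes, i.e. by 16 * Shift
// bits, the way PSRLD / PSRLQ would. Requires 0 < Shift < Scale.
V8x16 psrlAsWider(const V8x16 &V, int Scale, int Shift) {
  V8x16 R{};
  for (int Base = 0; Base < 8; Base += Scale) {
    uint64_t Wide = 0;
    for (int j = Scale - 1; j >= 0; --j) // assemble the group little-endian
      Wide = (Wide << 16) | V[Base + j];
    Wide >>= 16 * Shift; // zeros are shifted in from the top
    for (int j = 0; j < Scale; ++j) {
      R[Base + j] = static_cast<uint16_t>(Wide & 0xFFFF);
      Wide >>= 16;
    }
  }
  return R;
}
```

With Scale = 4 the same model corresponds to PSRLQ: for example, the mask [2, 3, zz, zz, 6, 7, zz, zz] is a 64-bit right shift by 32 bits.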

I have an upcoming patch that will fix the combine-or.ll domain mismatch.

I kept to just providing the immediate versions of the SSE2 logical bit shifts - there may be a case for adding support for AVX2 per-lane shifts but I don't have the hardware to test this.

Theoretically I think in the future this could be generalised (endian fixes) and moved to DAGCombine?

Diff Detail

Repository: rL LLVM

Event Timeline

RKSimon updated this revision to Diff 17256. (Dec 13 2014, 8:09 AM)

RKSimon retitled this revision from to [X86][SSE] psrl(w/d/q) and psll(w/d/q) bit shifts for SSE2.

RKSimon updated this object.

RKSimon edited the test plan for this revision.

RKSimon added reviewers: chandlerc, qcolombet, andreadb.

RKSimon set the repository for this revision to rL LLVM.

RKSimon added a subscriber: Unknown Object (MLST).

chandlerc added inline comments. (Dec 15 2014, 10:56 AM)

lib/Target/X86/X86ISelLowering.cpp
7762–7777	Is this clang-formatted? I would have expected a slightly different layout, but I don't want to quibble with whatever clang-format chooses.
7779–7790	Have you checked whether we already have such a predicate? I thought we did, but maybe this is slightly different. Either way, a comment clarifying the exact thing it is checking would be really helpful I think. At the very least, the argument set is quite surprising. Also, I've been consistently naming lambdas like variables, so this would be IsSequential.
7794	Does a range based for loop not work here?
7795	Please use early exit patterns to reduce indentation, and test the inverse and continue.
7797–7798	I would prefer to just use integers to represent numbers unless you explicitly want modular arithmetic (here and throughout this patch)
7800	I think it would be really helpful to actually comment on the algorithm you're using to search for bit-shift equivalent shuffle masks. It makes it a bit easier to check that the code is actually doing what is intended.
7814–7815	Rather than use variables, maybe sink this into the predicate function so that you can early-exit from the loops?
7819	80-columns.
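Chandler's request at 7797–7798 to use int rather than unsigned is about not getting modular arithmetic by accident. A small illustrative C++ sketch of the two pitfalls that matter for shuffle-mask code; the function names are made up for this example:

```cpp
#include <cassert>
#include <climits>

// Illustrative only: unsigned arithmetic is modular, int arithmetic is not.
// With unsigned operands a "negative" count such as Size - Shift wraps
// around to a huge value instead of becoming negative.
unsigned remainingUnsigned(unsigned Size, unsigned Shift) {
  return Size - Shift; // wraps modulo 2^N when Shift > Size
}
int remainingSigned(int Size, int Shift) { return Size - Shift; }

// Mixing the two is another trap: shuffle masks use -1 for undef, and a
// signed/unsigned comparison converts -1 to UINT_MAX before comparing.
bool undefBelowIndexSigned(int Index) { return -1 < Index; }
bool undefBelowIndexUnsigned(unsigned Index) {
  return static_cast<unsigned>(-1) < Index; // what `-1 < Index` does here
}
```

A loop bound computed as remainingUnsigned(2, 3) would run billions of iterations rather than zero, which is the kind of silent bug that keeping mask arithmetic in int avoids.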

Thanks Chandler - I'm creating an updated patch now.

lib/Target/X86/X86ISelLowering.cpp
7762–7777	I tried that but I thought clang-format's indentation was rather extreme - it pushes all the entries to the far right (all that whitespace!). But to avoid confusion I'll use it again.
7779–7790	Turns out there is a usable predicate (isSequentialOrUndefInRange), and whatever reason I had for not using it originally in lowerVectorShuffleAsByteShift was mistaken - that patch went through a lot of edits. I've changed it for both lowerVectorShuffleAsByteShift and lowerVectorShuffleAsBitShift.
7794	Not sure what you mean here - have you got any examples?
7814–7815	This doesn't really work now that I can remove the predicate and use isSequentialOrUndefInRange directly.

Updated patch based on Chandler's comments.

Replaced the custom isSequential predicate with isSequentialOrUndefInRange in both lowerVectorShuffleAsByteShift and lowerVectorShuffleAsBitShift.
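The replacement can be sanity-checked outside LLVM. The sketch below restates isSequentialOrUndefInRange and the removed isSequential lambda over std::vector<int> (a stand-in for ArrayRef<int>; the free-function name isSequentialLambda is hypothetical). Under those assumptions, a right shift maps isSequential(Shift, 0, Size - Shift, Offset, Mask) to isSequentialOrUndefInRange(Mask, 0, Size - Shift, Shift + Offset), and a left shift maps isSequential(-Shift, Shift, Size, Offset, Mask) to isSequentialOrUndefInRange(Mask, Shift, Size - Shift, Offset), where Offset is 0 for V1 and Size for V2.

```cpp
#include <cassert>
#include <vector>

// Standalone restatement (std::vector instead of ArrayRef) of the
// isSequentialOrUndefInRange helper shown in the diff: each of
// Mask[Pos], ..., Mask[Pos + Size - 1] must equal Low, Low + 1, ...
// or be undef (negative).
bool isSequentialOrUndefInRange(const std::vector<int> &Mask, unsigned Pos,
                                unsigned Size, int Low) {
  for (unsigned i = Pos, e = Pos + Size; i != e; ++i, ++Low)
    if (!(Mask[i] < 0 || Mask[i] == Low))
      return false;
  return true;
}

// The custom lambda removed from lowerVectorShuffleAsByteShift by this
// patch, rewritten as a free function for comparison.
bool isSequentialLambda(int Base, int StartIndex, int EndIndex, int MaskOffset,
                        const std::vector<int> &Mask) {
  for (int i = StartIndex; i < EndIndex; i++) {
    if (Mask[i] < 0)
      continue; // undef lanes match anything
    if (i + Base != Mask[i] - MaskOffset)
      return false;
  }
  return true;
}
```

For the PSRLDQ example [5, 6, 7, zz, ...] (Size = 8, Shift = 5), both isSequentialLambda(5, 0, 3, 0, Mask) and isSequentialOrUndefInRange(Mask, 0, 3, 5) accept the mask.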

Updated + rebased shuffle -> bit shift lowering patch based on feedback.

Chandler - are there any outstanding issues you are concerned with please?

PING

Rebased - I've covered all of Chandler's comments now.

Quentin/Andrea - do you have any additional comments please?

Really sorry Simon, I lost track of this over the holidays as I became distracted with SROA stuff.

lib/Target/X86/X86ISelLowering.cpp
7709–7713	Please go ahead and commit these bits and the comment update as a separate change. This change should focus on adding bit-shift variants.
7779–7781	The first thing I think needs to be re-worked here is to change how you're computing the types to work with. The table solution with a simple continue is really bad IMO. I had to spend some time thinking to see the best way to do this. At first, I thought the best approach would be to build a function or an actual map from VT to the range of possible other vector types that we could use in combination with a shift. Then the outer loop would just be over the specific candidate types. But then it occurred to me that you don't need to hard-code any of these things. Given an integer VT with N elements and M bits, you can just divide N by 2 and multiply M by 2 on each iteration as long as N > 1 and M <= 64. That will walk over all possible re-slicings of the vector type. For x86, where the full permutation of types is legal, I would just add an assert that the resulting vector type is legal. This completely removes the need for a table or SSE2 vs AVX2 distinctions. And it leaves you with a single loop with no continues.
7787–7790	[IGNORE THIS: phabricator won't let me delete the comment]
7798–7813	You can merge all of these tests into a single predicate function which has an outer loop containing an inner loop and two calls to isSequentialOrUndefInRange. Then I suspect you can merge this predicate with the Left variant by passing as a parameter which range of each group to examine: [Scale - Shift, Scale) for right, and [0, Shift) for left. You might need one other parameter, but still, I think it's worth folding them. By putting them into a predicate function, you don't need 3 boolean variables, and you can early exit as soon as you determine "no".
7816–7821	Once you make everything a nice re-usable predicate for testing both right and left shifts, you can write this code once and just have a conditional to select which instruction (right or left) is used.
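Chandler's type-walking suggestion at 7779–7781 might look roughly like the following C++ sketch. It uses plain (element count, element bits) pairs in place of MVT, so the legality assert he mentions is left out; widerShiftTypes is a hypothetical name. Starting from v16i8 it yields v8i16, v4i32 and v2i64, and from v32i8 it yields v16i16, v8i32 and v4i64, which matches the rows of the ShiftMapping table in the diff.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Sketch of the suggested loop: starting from an integer vector type with
// NumElts elements of EltBits bits, repeatedly halve the element count and
// double the element width while at least two elements remain and the new
// width is at most 64 bits. Each step is a re-slicing of the same register
// that a PSRL/PSLL of that element width could operate on.
std::vector<std::pair<int, int>> widerShiftTypes(int NumElts, int EltBits) {
  std::vector<std::pair<int, int>> Types;
  while (NumElts > 1 && EltBits * 2 <= 64) {
    NumElts /= 2;
    EltBits *= 2;
    Types.push_back({NumElts, EltBits}); // (element count, element bits)
  }
  return Types;
}
```

Because the candidate types are derived rather than tabulated, the same loop covers the SSE2 (128-bit) and AVX2 (256-bit) cases without any special-casing.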

Merged left / right shift matching code into single predicate - I'm a little dubious as to whether the predicate is actually needed at this point....
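The merged predicate itself is not reproduced in this thread, so the following is only a hypothetical C++ sketch of the folding described at 7798–7813: one function parameterized on direction that checks the shifted-in lanes ([0, Shift) for left, [Scale - Shift, Scale) for right) and then tries V1 and V2, with early exits instead of boolean accumulators. std::vector stands in for ArrayRef and SmallBitVector; matchBitShift is a made-up name.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch (not the committed LLVM code) of one predicate that
// handles both shift directions. Mask follows LLVM's conventions: indices
// >= NumElts select from V2 and negative entries are undef. Zeroable[i]
// says result lane i is known to be zero. Lanes are grouped into chunks of
// Scale narrow elements (one wider integer each); Shift counts narrow
// elements. Returns 0 if V1 shifted matches, 1 if V2 does, -1 if neither.
int matchBitShift(const std::vector<int> &Mask,
                  const std::vector<bool> &Zeroable, int Scale, int Shift,
                  bool Left) {
  int NumElts = static_cast<int>(Mask.size());
  // Lanes that must be shifted-in zeros: [0, Shift) for a left shift,
  // [Scale - Shift, Scale) for a right shift.
  int ZeroBegin = Left ? 0 : Scale - Shift;
  // The surviving lanes start at Shift (left) or 0 (right) in each chunk.
  int MovedBegin = Left ? Shift : 0;
  for (int i = 0; i < NumElts; i += Scale)
    for (int j = ZeroBegin; j < ZeroBegin + Shift; ++j)
      if (!Zeroable[i + j])
        return -1; // early exit: a shifted-in lane is not zero

  for (int Input = 0; Input < 2; ++Input) {
    bool Match = true;
    for (int i = 0; i < NumElts && Match; i += Scale) {
      // Source index expected for the first surviving lane of this chunk.
      int Low = i + (Left ? 0 : Shift) + Input * NumElts;
      for (int j = 0; j < Scale - Shift; ++j) {
        int M = Mask[i + MovedBegin + j];
        if (M >= 0 && M != Low + j) {
          Match = false;
          break;
        }
      }
    }
    if (Match)
      return Input;
  }
  return -1;
}
```

For the v4i32 mask [1, zz, 3, zz] viewed as two 64-bit chunks (Scale = 2, Shift = 1), the right-shift query returns 0 (a PSRLQ by 32 of V1), while the left-shift query fails at the first zeroable check.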

Improved some byte shift tests as they were matching against bit shifts instead and we were losing test coverage.

ping

Hi Simon,

Chandler did all the work of the review so I’ll leave it the final LGTM.
Here are my comments.

Thanks for your patience.
-Quentin

test/CodeGen/X86/vec_insert-5.ll
68 ↗	(On Diff #17861)	I guess you modified the input IR to have additional coverage. If that is the case, I believe a new test case with that input would be preferable, i.e., having both tests.
77 ↗	(On Diff #17861)	Same here.

Thanks Quentin - I've added the extra tests to vec_insert-5.ll - the existing tests now match the bit shifts and the (readded) modified old tests match the byte shifts again.

Chandler / Quentin - are you happy with the changes?

Sorry I've not gotten back to this. I'm going to try to sweep through all the outstanding vector shuffle reviews tomorrow.

ping

This is much better, but I still think there is a simpler way to write the inner loops. Trying to explain this with prose isn't helping though, so I'll try to patch and modify it. If it works I can submit the hybrid result, and if not I'll understand much better why not. =]

RKSimon mentioned this in D7256: [X86][SSE] Added general integer shuffle matching for MOVQ instruction. (Jan 29 2015, 8:48 AM)

It'd be great if you want to have a go at improving this - it does reek of being able to be reduced further but so far I've failed to find the right predicate magic.

The only other thing that I've thought of is adding all_of / any_of / none_of range tests to the (Small)BitVector classes to tidy up those zeroable tests, which are becoming more common in the X86 shuffle code now.
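As an illustration of the idea, the ZeroableRight/ZeroableLeft accumulation loops could be expressed as a single range query. This C++ sketch uses the standard std::all_of over a std::vector<bool> rather than any (Small)BitVector API; allZeroable is a hypothetical helper name.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: true if every lane in [Begin, End) is zeroable.
// std::vector<bool> stands in for SmallBitVector; an empty range is
// vacuously all-zeroable.
bool allZeroable(const std::vector<bool> &Zeroable, int Begin, int End) {
  return std::all_of(Zeroable.begin() + Begin, Zeroable.begin() + End,
                     [](bool Z) { return Z; });
}
```

With such a helper, the byte-shift check would become bool ZeroableRight = allZeroable(Zeroable, Size - Shift, Size); and the left-hand check allZeroable(Zeroable, 0, Shift).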

BTW - I've just put up a patch (D7256) which fixes the combine-or.ll issue in conjunction with this patch.

I've had another go at this, and hope I've cracked it this time. I've refactored the predicate to actually create the bit shift node (+ surrounding casts) instead of just reporting whether a left/right shift is possible.

Moved AVX2 bitshift matching after VPSHUFD detection to stop loss of a foldable shuffle

rebased + updated tests

I think this code is pretty nice.

I have a minor simplification of it, but I'm happy to just apply that in a follow-up patch. Go ahead and submit this one as-is.

This revision is now accepted and ready to land. (Feb 3 2015, 12:51 PM)

Are you working on committing this, Simon? If not, I'd like to commit it for you so that I don't then cause you a bunch of merge conflicts when updating test cases as I flip defaults around on flags.

Thanks Chandler, I'll get this committed shortly - just got distracted with the release build compile warnings from my other commits tonight.

Closed by commit rL228047: [X86][SSE] psrl(w/d/q) and psll(w/d/q) bit shifts for SSE2 (authored by RKSimon). (Feb 3 2015, 2:00 PM)

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                        Size
lib/Target/X86/X86ISelLowering.cpp          180 lines
test/CodeGen/X86/combine-or.ll              22 lines
test/CodeGen/X86/vector-idiv.ll             88 lines
test/CodeGen/X86/vector-shuffle-128-v16.ll  108 lines
test/CodeGen/X86/vector-shuffle-128-v4.ll   38 lines
test/CodeGen/X86/vector-shuffle-128-v8.ll   121 lines
test/CodeGen/X86/vector-shuffle-256-v16.ll  110 lines
test/CodeGen/X86/vector-shuffle-256-v32.ll  148 lines
test/CodeGen/X86/vector-shuffle-256-v8.ll   42 lines

Diff 17474

lib/Target/X86/X86ISelLowering.cpp


@@ (first 3,845 lines hidden) @@ bool X86TargetLowering::shouldConvertConstantLoadToIntImm(const APInt &Imm,
   assert(Ty->isIntegerTy());

   unsigned BitSize = Ty->getPrimitiveSizeInBits();
   if (BitSize == 0 || BitSize > 64)
     return false;
   return true;
 }

 bool X86TargetLowering::isExtractSubvectorCheap(EVT ResVT,
                                                 unsigned Index) const {
   if (!isOperationLegalOrCustom(ISD::EXTRACT_SUBVECTOR, ResVT))
     return false;

   return (Index == 0 || Index == ResVT.getVectorNumElements());
 }

 /// isUndefOrInRange - Return true if Val is undef or if its value falls within
 /// the specified range (L, H].
 static bool isUndefOrInRange(int Val, int Low, int Hi) {
   return (Val < 0) || (Val >= Low && Val < Hi);
 }

 /// isUndefOrEqual - Val is either less than zero (undef) or equal to the
 /// specified value.
 static bool isUndefOrEqual(int Val, int CmpVal) {
   return (Val < 0 || Val == CmpVal);
 }

 /// isSequentialOrUndefInRange - Return true if every element in Mask, beginning
 /// from position Pos and ending in Pos+Size, falls within the specified
-/// sequential range (L, L+Pos]. or is undef.
+/// sequential range (L, L+Size]. or is undef.
 static bool isSequentialOrUndefInRange(ArrayRef<int> Mask,
                                        unsigned Pos, unsigned Size, int Low) {
   for (unsigned i = Pos, e = Pos+Size; i != e; ++i, ++Low)
     if (!isUndefOrEqual(Mask[i], Low))
       return false;
   return true;
 }

@@ (2,146 unchanged lines hidden) @@ if (LDBase->hasAnyUseOfValue(1)) {
                                      SDValue(NewLd.getNode(), 1));
       DAG.ReplaceAllUsesOfValueWith(SDValue(LDBase, 1), NewChain);
       DAG.UpdateNodeOperands(NewChain.getNode(), SDValue(LDBase, 1),
                              SDValue(NewLd.getNode(), 1));
     }

     return NewLd;
   }

   //TODO: The code below fires only for for loading the low v2i32 / v2f32
   //of a v4i32 / v4f32. It's probably worth generalizing.
   if (NumElems == 4 && LastLoadedElt == 1 && (EltVT.getSizeInBits() == 32) &&
       DAG.getTargetLoweringInfo().isTypeLegal(MVT::v2i64)) {
     SDVTList Tys = DAG.getVTList(MVT::v2i64, MVT::Other);
     SDValue Ops[] = { LDBase->getChain(), LDBase->getBasePtr() };
     SDValue ResNode =
         DAG.getMemIntrinsicNode(X86ISD::VZEXT_LOAD, DL, Tys, Ops, MVT::i64,
@@ (970 unchanged lines hidden) @@ X86TargetLowering::LowerBUILD_VECTOR(SDValue Op, SelectionDAG &DAG) const {
   if (VT.is256BitVector() || VT.is512BitVector()) {
     SmallVector<SDValue, 64> V;
     for (unsigned i = 0; i != NumElems; ++i)
       V.push_back(Op.getOperand(i));

     // Check for a build vector of consecutive loads.
     if (SDValue LD = EltsFromConsecutiveLoads(VT, V, dl, DAG, false))
       return LD;

     EVT HVT = EVT::getVectorVT(*DAG.getContext(), ExtVT, NumElems/2);

     // Build both the lower and upper subvector.
     SDValue Lower = DAG.getNode(ISD::BUILD_VECTOR, dl, HVT,
                                 makeArrayRef(&V[0], NumElems/2));
     SDValue Upper = DAG.getNode(ISD::BUILD_VECTOR, dl, HVT,
                                 makeArrayRef(&V[NumElems / 2], NumElems/2));

@@ (653 unchanged lines hidden) @@ static SDValue lowerVectorShuffleAsByteShift(SDLoc DL, MVT VT, SDValue V1,
                                              SelectionDAG &DAG) {
   assert(!isNoopShuffleMask(Mask) && "We shouldn't lower no-op shuffles!");

   SmallBitVector Zeroable = computeZeroableShuffleElements(Mask, V1, V2);

   int Size = Mask.size();
   int Scale = 16 / Size;

-  auto isSequential = [](int Base, int StartIndex, int EndIndex, int MaskOffset,
-                         ArrayRef<int> Mask) {
-    for (int i = StartIndex; i < EndIndex; i++) {
-      if (Mask[i] < 0)
-        continue;
-      if (i + Base != Mask[i] - MaskOffset)
-        return false;
-    }
-    return true;
-  };
-
   for (int Shift = 1; Shift < Size; Shift++) {
     int ByteShift = Shift * Scale;

     // PSRLDQ : (little-endian) right byte shift
     // [ 5, 6, 7, zz, zz, zz, zz, zz]
     // [ -1, 5, 6, 7, zz, zz, zz, zz]
     // [ 1, 2, -1, -1, -1, -1, zz, zz]
     bool ZeroableRight = true;
     for (int i = Size - Shift; i < Size; i++) {
       ZeroableRight &= Zeroable[i];
     }

     if (ZeroableRight) {
-      bool ValidShiftRight1 = isSequential(Shift, 0, Size - Shift, 0, Mask);
-      bool ValidShiftRight2 = isSequential(Shift, 0, Size - Shift, Size, Mask);
+      bool ValidShiftRight1 =
+          isSequentialOrUndefInRange(Mask, 0, Size - Shift, Shift);
+      bool ValidShiftRight2 =
+          isSequentialOrUndefInRange(Mask, 0, Size - Shift, Size + Shift);

       if (ValidShiftRight1 || ValidShiftRight2) {
         // Cast the inputs to v2i64 to match PSRLDQ.
         SDValue &TargetV = ValidShiftRight1 ? V1 : V2;
         SDValue V = DAG.getNode(ISD::BITCAST, DL, MVT::v2i64, TargetV);
         SDValue Shifted = DAG.getNode(X86ISD::VSRLDQ, DL, MVT::v2i64, V,
                                       DAG.getConstant(ByteShift * 8, MVT::i8));
         return DAG.getNode(ISD::BITCAST, DL, VT, Shifted);
       }
     }

     // PSLLDQ : (little-endian) left byte shift
     // [ zz, 0, 1, 2, 3, 4, 5, 6]
     // [ zz, zz, -1, -1, 2, 3, 4, -1]
     // [ zz, zz, zz, zz, zz, zz, -1, 1]
     bool ZeroableLeft = true;
     for (int i = 0; i < Shift; i++) {
       ZeroableLeft &= Zeroable[i];
     }

     if (ZeroableLeft) {
-      bool ValidShiftLeft1 = isSequential(-Shift, Shift, Size, 0, Mask);
-      bool ValidShiftLeft2 = isSequential(-Shift, Shift, Size, Size, Mask);
+      bool ValidShiftLeft1 =
+          isSequentialOrUndefInRange(Mask, Shift, Size - Shift, 0);
+      bool ValidShiftLeft2 =
+          isSequentialOrUndefInRange(Mask, Shift, Size - Shift, Size);

       if (ValidShiftLeft1 || ValidShiftLeft2) {
         // Cast the inputs to v2i64 to match PSLLDQ.
         SDValue &TargetV = ValidShiftLeft1 ? V1 : V2;
         SDValue V = DAG.getNode(ISD::BITCAST, DL, MVT::v2i64, TargetV);
         SDValue Shifted = DAG.getNode(X86ISD::VSHLDQ, DL, MVT::v2i64, V,
                                       DAG.getConstant(ByteShift * 8, MVT::i8));
         return DAG.getNode(ISD::BITCAST, DL, VT, Shifted);
       }
     }
   }

   return SDValue();
 }

+/// \brief Try to lower a vector shuffle as a bit shift (shifts in zeros).
+///
+/// Attempts to match a shuffle mask against the PSRL(W/D/Q) and PSLL(W/D/Q)
+/// SSE2 and AVX2 logical bit-shift instructions. The function matches
+/// elements from one of the input vectors shuffled to the left or right
+/// with zeroable elements 'shifted in'.
+static SDValue lowerVectorShuffleAsBitShift(SDLoc DL, MVT VT, SDValue V1,
+                                            SDValue V2, ArrayRef<int> Mask,
+                                            SelectionDAG &DAG) {
+  const MVT::SimpleValueType ShiftMapping[][2] = {// SSE2
+                                                  {MVT::v4i32, MVT::v2i64},
+                                                  {MVT::v8i16, MVT::v4i32},
+                                                  {MVT::v8i16, MVT::v2i64},
+                                                  {MVT::v16i8, MVT::v8i16},
+                                                  {MVT::v16i8, MVT::v4i32},
+                                                  {MVT::v16i8, MVT::v2i64},
+                                                  // AVX2
+                                                  {MVT::v8i32, MVT::v4i64},
+                                                  {MVT::v16i16, MVT::v8i32},
+                                                  {MVT::v16i16, MVT::v4i64},
+                                                  {MVT::v32i8, MVT::v16i16},
+                                                  {MVT::v32i8, MVT::v8i32},
+                                                  {MVT::v32i8, MVT::v4i64}};
+
+  SmallBitVector Zeroable = computeZeroableShuffleElements(Mask, V1, V2);
+
+  for (auto map : ShiftMapping) {
+    if (VT.SimpleTy != map[0])
+      continue;
+
+    MVT ShiftVT = MVT(map[1]);
+    int Size = ShiftVT.getVectorNumElements();
+    int Scale = VT.getVectorNumElements() / Size;
+
+    // We can shift the elements of the integer vector by whole multiples of
+    // their width within the elements of the larger integer vector. Test each
+    // multiple to see if we can find a match with the moved element indices
+    // and that the shifted in elements are all zeroable.
+    for (int Shift = 1; Shift != Scale; Shift++) {
+      int ShiftAmt = Shift * VT.getScalarSizeInBits();
+
+      // PSRL : (little-endian) right bit shift.
+      // [ 1, zz, 3, zz]
+      // [ -1, -1, 7, zz]
+      bool ZeroableRight = true;
+      for (int i = 0, e = Size * Scale; i != e; i += Scale) {
+        for (int j = Scale - Shift; j < Scale; j++) {
+          ZeroableRight &= Zeroable[i + j];
+        }
+      }
+
+      if (ZeroableRight) {
+        bool ValidShiftRight1 = true;
+        bool ValidShiftRight2 = true;
+
+        for (unsigned i = 0, e = Size * Scale; i != e; i += Scale) {
+          ValidShiftRight1 &= isSequentialOrUndefInRange(
+              Mask, i, Scale - Shift, i + Shift);
+          ValidShiftRight2 &= isSequentialOrUndefInRange(
+              Mask, i, Scale - Shift, i + Shift + Mask.size());
+        }
+
+        if (ValidShiftRight1 || ValidShiftRight2) {
+          // Cast the inputs to ShiftVT to match VSRLI and then back again.
+          SDValue &TargetV = ValidShiftRight1 ? V1 : V2;
+          SDValue V = DAG.getNode(ISD::BITCAST, DL, ShiftVT, TargetV);
+          SDValue Shifted = DAG.getNode(X86ISD::VSRLI, DL, ShiftVT, V,
+                                        DAG.getConstant(ShiftAmt, MVT::i8));
+          return DAG.getNode(ISD::BITCAST, DL, VT, Shifted);
+        }
+      }
+
+      // PSHL : (little-endian) left bit shift.
+      // [ zz, 0, zz, 2 ]
+      // [ -1, 4, zz, -1 ]
+      bool ZeroableLeft = true;
+      for (int i = 0, e = Size * Scale; i != e; i += Scale) {
+        for (int j = 0; j < Shift; j++) {
+          ZeroableLeft &= Zeroable[i + j];
+        }
+      }
+
+      if (ZeroableLeft) {
+        bool ValidShiftLeft1 = true;
+        bool ValidShiftLeft2 = true;
+
+        for (int i = 0, e = Size * Scale; i != e; i += Scale) {
+          ValidShiftLeft1 &= isSequentialOrUndefInRange(
+              Mask, i + Shift, Scale - Shift, i);
+          ValidShiftLeft2 &= isSequentialOrUndefInRange(
+              Mask, i + Shift, Scale - Shift, i + Mask.size());
+        }
+
+        if (ValidShiftLeft1 || ValidShiftLeft2) {
+          // Cast the inputs to ShiftVT to match VSHLI and then back again.
+          SDValue &TargetV = ValidShiftLeft1 ? V1 : V2;
+          SDValue V = DAG.getNode(ISD::BITCAST, DL, ShiftVT, TargetV);
+          SDValue Shifted = DAG.getNode(X86ISD::VSHLI, DL, ShiftVT, V,
+                                        DAG.getConstant(ShiftAmt, MVT::i8));
+          return DAG.getNode(ISD::BITCAST, DL, VT, Shifted);
+        }
+      }
+    }
+  }
+
+  return SDValue();
+}
+
 /// \brief Lower a vector shuffle as a zero or any extension.
 ///
 /// Given a specific number of elements, element bit width, and extension
 /// stride, produce either a zero or any extension based on the available
 /// features of the subtarget.
 static SDValue lowerVectorShuffleAsSpecificZeroOrAnyExtend(
     SDLoc DL, MVT VT, int NumElements, int Scale, bool AnyExt, SDValue InputV,
     const X86Subtarget *Subtarget, SelectionDAG &DAG) {
@@ (757 unchanged lines hidden) @@ return DAG.getNode(X86ISD::PSHUFD, DL, MVT::v4i32, V1,
                        getV4X86ShuffleImm8ForMask(Mask, DAG));
   }

   // Try to use byte shift instructions.
   if (SDValue Shift = lowerVectorShuffleAsByteShift(
           DL, MVT::v4i32, V1, V2, Mask, DAG))
     return Shift;

+  // Try to use bit shift instructions.
+  if (SDValue Shift = lowerVectorShuffleAsBitShift(
+          DL, MVT::v4i32, V1, V2, Mask, DAG))
+    return Shift;
+
   // There are special ways we can lower some single-element blends.
   if (NumV2Elements == 1)
     if (SDValue V = lowerVectorShuffleAsElementInsertion(MVT::v4i32, DL, V1, V2,
                                                          Mask, Subtarget, DAG))
       return V;

   // Use dedicated unpack instructions for masks that match their pattern.
   if (isShuffleEquivalent(Mask, 0, 4, 1, 5))
@@ (69 unchanged lines hidden) @@ if (SDValue Broadcast = lowerVectorShuffleAsBroadcast(MVT::v8i16, DL, V,
                                                         Mask, Subtarget, DAG))
     return Broadcast;

   // Try to use byte shift instructions.
   if (SDValue Shift = lowerVectorShuffleAsByteShift(
           DL, MVT::v8i16, V, V, Mask, DAG))
     return Shift;

+  // Try to use bit shift instructions.
+  if (SDValue Shift = lowerVectorShuffleAsBitShift(
+          DL, MVT::v8i16, V, V, Mask, DAG))
+    return Shift;
+
   // Use dedicated unpack instructions for masks that match their pattern.
   if (isShuffleEquivalent(Mask, 0, 0, 1, 1, 2, 2, 3, 3))
     return DAG.getNode(X86ISD::UNPCKL, DL, MVT::v8i16, V, V);
   if (isShuffleEquivalent(Mask, 4, 4, 5, 5, 6, 6, 7, 7))
     return DAG.getNode(X86ISD::UNPCKH, DL, MVT::v8i16, V, V);

   // Try to use byte rotation instructions.
   if (SDValue Rotate = lowerVectorShuffleAsByteRotate(
… 601 lines hidden …	static SDValue lowerV8I16VectorShuffle(SDValue Op, SDValue V1, SDValue V2,
assert(NumV1Inputs > 0 && "All single-input shuffles should be canonicalized "		assert(NumV1Inputs > 0 && "All single-input shuffles should be canonicalized "
"to be V1-input shuffles.");		"to be V1-input shuffles.");

// Try to use byte shift instructions.		// Try to use byte shift instructions.
if (SDValue Shift = lowerVectorShuffleAsByteShift(		if (SDValue Shift = lowerVectorShuffleAsByteShift(
DL, MVT::v8i16, V1, V2, Mask, DAG))		DL, MVT::v8i16, V1, V2, Mask, DAG))
return Shift;		return Shift;

		// Try to use bit shift instructions.
		if (SDValue Shift = lowerVectorShuffleAsBitShift(
		DL, MVT::v8i16, V1, V2, Mask, DAG))
		return Shift;

// There are special ways we can lower some single-element blends.		// There are special ways we can lower some single-element blends.
if (NumV2Inputs == 1)		if (NumV2Inputs == 1)
if (SDValue V = lowerVectorShuffleAsElementInsertion(MVT::v8i16, DL, V1, V2,		if (SDValue V = lowerVectorShuffleAsElementInsertion(MVT::v8i16, DL, V1, V2,
Mask, Subtarget, DAG))		Mask, Subtarget, DAG))
return V;		return V;

// Use dedicated unpack instructions for masks that match their pattern.		// Use dedicated unpack instructions for masks that match their pattern.
if (isShuffleEquivalent(Mask, 0, 8, 1, 9, 2, 10, 3, 11))		if (isShuffleEquivalent(Mask, 0, 8, 1, 9, 2, 10, 3, 11))
… 140 lines hidden …	static SDValue lowerV16I8VectorShuffle(SDValue Op, SDValue V1, SDValue V2,
ArrayRef<int> OrigMask = SVOp->getMask();		ArrayRef<int> OrigMask = SVOp->getMask();
assert(OrigMask.size() == 16 && "Unexpected mask size for v16 shuffle!");		assert(OrigMask.size() == 16 && "Unexpected mask size for v16 shuffle!");

// Try to use byte shift instructions.		// Try to use byte shift instructions.
if (SDValue Shift = lowerVectorShuffleAsByteShift(		if (SDValue Shift = lowerVectorShuffleAsByteShift(
DL, MVT::v16i8, V1, V2, OrigMask, DAG))		DL, MVT::v16i8, V1, V2, OrigMask, DAG))
return Shift;		return Shift;

		// Try to use bit shift instructions.
		if (SDValue Shift = lowerVectorShuffleAsBitShift(
		DL, MVT::v16i8, V1, V2, OrigMask, DAG))
		return Shift;

// Try to use byte rotation instructions.		// Try to use byte rotation instructions.
if (SDValue Rotate = lowerVectorShuffleAsByteRotate(		if (SDValue Rotate = lowerVectorShuffleAsByteRotate(
DL, MVT::v16i8, V1, V2, OrigMask, Subtarget, DAG))		DL, MVT::v16i8, V1, V2, OrigMask, Subtarget, DAG))
return Rotate;		return Rotate;

// Try to use a zext lowering.		// Try to use a zext lowering.
if (SDValue ZExt = lowerVectorShuffleAsZeroOrAnyExtend(		if (SDValue ZExt = lowerVectorShuffleAsZeroOrAnyExtend(
DL, MVT::v16i8, V1, V2, OrigMask, Subtarget, DAG))		DL, MVT::v16i8, V1, V2, OrigMask, Subtarget, DAG))
… 994 lines hidden …	static SDValue lowerV8I32VectorShuffle(SDValue Op, SDValue V1, SDValue V2,
SDLoc DL(Op);		SDLoc DL(Op);
assert(V1.getSimpleValueType() == MVT::v8i32 && "Bad operand type!");		assert(V1.getSimpleValueType() == MVT::v8i32 && "Bad operand type!");
assert(V2.getSimpleValueType() == MVT::v8i32 && "Bad operand type!");		assert(V2.getSimpleValueType() == MVT::v8i32 && "Bad operand type!");
ShuffleVectorSDNode *SVOp = cast<ShuffleVectorSDNode>(Op);		ShuffleVectorSDNode *SVOp = cast<ShuffleVectorSDNode>(Op);
ArrayRef<int> Mask = SVOp->getMask();		ArrayRef<int> Mask = SVOp->getMask();
assert(Mask.size() == 8 && "Unexpected mask size for v8 shuffle!");		assert(Mask.size() == 8 && "Unexpected mask size for v8 shuffle!");
assert(Subtarget->hasAVX2() && "We can only lower v8i32 with AVX2!");		assert(Subtarget->hasAVX2() && "We can only lower v8i32 with AVX2!");

		// Try to use bit shift instructions.
		if (SDValue Shift = lowerVectorShuffleAsBitShift(
		DL, MVT::v8i32, V1, V2, Mask, DAG))
		return Shift;

if (SDValue Blend = lowerVectorShuffleAsBlend(DL, MVT::v8i32, V1, V2, Mask,		if (SDValue Blend = lowerVectorShuffleAsBlend(DL, MVT::v8i32, V1, V2, Mask,
Subtarget, DAG))		Subtarget, DAG))
return Blend;		return Blend;

// Check for being able to broadcast a single element.		// Check for being able to broadcast a single element.
if (SDValue Broadcast = lowerVectorShuffleAsBroadcast(MVT::v8i32, DL, V1,		if (SDValue Broadcast = lowerVectorShuffleAsBroadcast(MVT::v8i32, DL, V1,
Mask, Subtarget, DAG))		Mask, Subtarget, DAG))
return Broadcast;		return Broadcast;
… 53 lines hidden …	static SDValue lowerV16I16VectorShuffle(SDValue Op, SDValue V1, SDValue V2,
assert(Mask.size() == 16 && "Unexpected mask size for v16 shuffle!");		assert(Mask.size() == 16 && "Unexpected mask size for v16 shuffle!");
assert(Subtarget->hasAVX2() && "We can only lower v16i16 with AVX2!");		assert(Subtarget->hasAVX2() && "We can only lower v16i16 with AVX2!");

// Check for being able to broadcast a single element.		// Check for being able to broadcast a single element.
if (SDValue Broadcast = lowerVectorShuffleAsBroadcast(MVT::v16i16, DL, V1,		if (SDValue Broadcast = lowerVectorShuffleAsBroadcast(MVT::v16i16, DL, V1,
Mask, Subtarget, DAG))		Mask, Subtarget, DAG))
return Broadcast;		return Broadcast;

		// Try to use bit shift instructions.
		if (SDValue Shift = lowerVectorShuffleAsBitShift(
		DL, MVT::v16i16, V1, V2, Mask, DAG))
		return Shift;

if (SDValue Blend = lowerVectorShuffleAsBlend(DL, MVT::v16i16, V1, V2, Mask,		if (SDValue Blend = lowerVectorShuffleAsBlend(DL, MVT::v16i16, V1, V2, Mask,
Subtarget, DAG))		Subtarget, DAG))
return Blend;		return Blend;

// Use dedicated unpack instructions for masks that match their pattern.		// Use dedicated unpack instructions for masks that match their pattern.
if (isShuffleEquivalent(Mask,		if (isShuffleEquivalent(Mask,
// First 128-bit lane:		// First 128-bit lane:
0, 16, 1, 17, 2, 18, 3, 19,		0, 16, 1, 17, 2, 18, 3, 19,
… 59 lines hidden …	static SDValue lowerV32I8VectorShuffle(SDValue Op, SDValue V1, SDValue V2,
assert(Mask.size() == 32 && "Unexpected mask size for v32 shuffle!");		assert(Mask.size() == 32 && "Unexpected mask size for v32 shuffle!");
assert(Subtarget->hasAVX2() && "We can only lower v32i8 with AVX2!");		assert(Subtarget->hasAVX2() && "We can only lower v32i8 with AVX2!");

// Check for being able to broadcast a single element.		// Check for being able to broadcast a single element.
if (SDValue Broadcast = lowerVectorShuffleAsBroadcast(MVT::v32i8, DL, V1,		if (SDValue Broadcast = lowerVectorShuffleAsBroadcast(MVT::v32i8, DL, V1,
Mask, Subtarget, DAG))		Mask, Subtarget, DAG))
return Broadcast;		return Broadcast;

		// Try to use bit shift instructions.
		if (SDValue Shift = lowerVectorShuffleAsBitShift(
		DL, MVT::v32i8, V1, V2, Mask, DAG))
		return Shift;

if (SDValue Blend = lowerVectorShuffleAsBlend(DL, MVT::v32i8, V1, V2, Mask,		if (SDValue Blend = lowerVectorShuffleAsBlend(DL, MVT::v32i8, V1, V2, Mask,
Subtarget, DAG))		Subtarget, DAG))
return Blend;		return Blend;

// Use dedicated unpack instructions for masks that match their pattern.		// Use dedicated unpack instructions for masks that match their pattern.
// Note that these are repeated 128-bit lane unpacks, not unpacks across all		// Note that these are repeated 128-bit lane unpacks, not unpacks across all
// 256-bit lanes.		// 256-bit lanes.
if (isShuffleEquivalent(		if (isShuffleEquivalent(
… 6,257 lines hidden …	static SDValue getVectorMaskingNode(SDValue Op, SDValue Mask,
return DAG.getNode(ISD::VSELECT, dl, VT, VMask, Op, PreservedSrc);		return DAG.getNode(ISD::VSELECT, dl, VT, VMask, Op, PreservedSrc);
}		}

/// \brief Creates an SDNode for a predicated scalar operation.		/// \brief Creates an SDNode for a predicated scalar operation.
/// \returns (X86vselect \p Mask, \p Op, \p PreservedSrc).		/// \returns (X86vselect \p Mask, \p Op, \p PreservedSrc).
/// The mask is coming as MVT::i8 and it should be truncated		/// The mask is coming as MVT::i8 and it should be truncated
/// to MVT::i1 while lowering masking intrinsics.		/// to MVT::i1 while lowering masking intrinsics.
/// The main difference between ScalarMaskingNode and VectorMaskingNode is using		/// The main difference between ScalarMaskingNode and VectorMaskingNode is using
/// "X86select" instead of "vselect". We just can't create the "vselect" node for		/// "X86select" instead of "vselect". We just can't create the "vselect" node for
/// a scalar instruction.		/// a scalar instruction.
static SDValue getScalarMaskingNode(SDValue Op, SDValue Mask,		static SDValue getScalarMaskingNode(SDValue Op, SDValue Mask,
SDValue PreservedSrc,		SDValue PreservedSrc,
const X86Subtarget *Subtarget,		const X86Subtarget *Subtarget,
SelectionDAG &DAG) {		SelectionDAG &DAG) {
if (isAllOnes(Mask))		if (isAllOnes(Mask))
return Op;		return Op;

… 5,758 lines hidden …	if (ExtractedElements != 15)
return SDValue();		return SDValue();

// Ok, we've now decided to do the transformation.		// Ok, we've now decided to do the transformation.
// If 64-bit shifts are legal, use the extract-shift sequence,		// If 64-bit shifts are legal, use the extract-shift sequence,
// otherwise bounce the vector off the cache.		// otherwise bounce the vector off the cache.
const TargetLowering &TLI = DAG.getTargetLoweringInfo();		const TargetLowering &TLI = DAG.getTargetLoweringInfo();
SDValue Vals[4];		SDValue Vals[4];
SDLoc dl(InputVector);		SDLoc dl(InputVector);

if (TLI.isOperationLegal(ISD::SRA, MVT::i64)) {		if (TLI.isOperationLegal(ISD::SRA, MVT::i64)) {
SDValue Cst = DAG.getNode(ISD::BITCAST, dl, MVT::v2i64, InputVector);		SDValue Cst = DAG.getNode(ISD::BITCAST, dl, MVT::v2i64, InputVector);
EVT VecIdxTy = DAG.getTargetLoweringInfo().getVectorIdxTy();		EVT VecIdxTy = DAG.getTargetLoweringInfo().getVectorIdxTy();
SDValue BottomHalf = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, dl, MVT::i64, Cst,		SDValue BottomHalf = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, dl, MVT::i64, Cst,
DAG.getConstant(0, VecIdxTy));		DAG.getConstant(0, VecIdxTy));
SDValue TopHalf = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, dl, MVT::i64, Cst,		SDValue TopHalf = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, dl, MVT::i64, Cst,
DAG.getConstant(1, VecIdxTy));		DAG.getConstant(1, VecIdxTy));

SDValue ShAmt = DAG.getConstant(32,		SDValue ShAmt = DAG.getConstant(32,
DAG.getTargetLoweringInfo().getShiftAmountTy(MVT::i64));		DAG.getTargetLoweringInfo().getShiftAmountTy(MVT::i64));
Vals[0] = DAG.getNode(ISD::TRUNCATE, dl, MVT::i32, BottomHalf);		Vals[0] = DAG.getNode(ISD::TRUNCATE, dl, MVT::i32, BottomHalf);
Vals[1] = DAG.getNode(ISD::TRUNCATE, dl, MVT::i32,		Vals[1] = DAG.getNode(ISD::TRUNCATE, dl, MVT::i32,
DAG.getNode(ISD::SRA, dl, MVT::i64, BottomHalf, ShAmt));		DAG.getNode(ISD::SRA, dl, MVT::i64, BottomHalf, ShAmt));
Vals[2] = DAG.getNode(ISD::TRUNCATE, dl, MVT::i32, TopHalf);		Vals[2] = DAG.getNode(ISD::TRUNCATE, dl, MVT::i32, TopHalf);
Vals[3] = DAG.getNode(ISD::TRUNCATE, dl, MVT::i32,		Vals[3] = DAG.getNode(ISD::TRUNCATE, dl, MVT::i32,
DAG.getNode(ISD::SRA, dl, MVT::i64, TopHalf, ShAmt));		DAG.getNode(ISD::SRA, dl, MVT::i64, TopHalf, ShAmt));
} else {		} else {
… last 3,726 lines hidden …

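The `lowerVectorShuffleAsBitShift` calls added above try to recognise a shuffle mask as a per-lane logical bit shift before falling back to blends, unpacks and rotates. As a rough illustration of that matching idea (not the patch's implementation), here is a standalone C++17 sketch. It assumes a simplified mask encoding in which `-1` is undef and any index `>= N` selects from an all-zero second operand, as in the tests below; `BitShiftMatch` and `matchBitShift` are hypothetical names. For each wider lane of `Scale` elements and each element count `Shift`, it checks that every lane moves its elements up (left shift) or down (right shift) by `Shift` positions, with zeros filling the vacated slots.

```cpp
#include <cassert>
#include <initializer_list>
#include <optional>
#include <vector>

// Hypothetical result type for this sketch (not an LLVM API).
struct BitShiftMatch {
  bool IsLeft;   // true -> psll{w,d,q}, false -> psrl{w,d,q}
  int LaneBits;  // width of the shifted integer lane: 16, 32 or 64
  int ShiftBits; // immediate shift amount in bits
};

// Mask encoding assumed here: -1 = undef, 0..N-1 = element of the shifted
// operand, N..2N-1 = element of an all-zero second operand.
std::optional<BitShiftMatch> matchBitShift(const std::vector<int> &Mask,
                                           int EltBits) {
  assert(EltBits == 8 || EltBits == 16 || EltBits == 32 || EltBits == 64);
  const int N = static_cast<int>(Mask.size());
  auto IsZeroable = [N](int M) { return M < 0 || M >= N; };
  auto IsElt = [](int M, int Expected) { return M < 0 || M == Expected; };

  // Each element type may be shifted as any wider integer type up to i64
  // (SSE2 has no 128-bit bit shift). Try the narrowest lane first.
  for (int Scale = 2; EltBits * Scale <= 64; Scale *= 2) {
    if (N % Scale != 0)
      continue;
    for (int Shift = 1; Shift < Scale; ++Shift) {
      for (bool Left : {true, false}) {
        bool Ok = true;
        for (int I = 0; I < N && Ok; ++I) {
          const int Pos = I % Scale; // element position inside its lane
          const int Base = I - Pos;  // first element of the lane
          if (Left) // low Shift elements are zero, the rest move up
            Ok = Pos < Shift ? IsZeroable(Mask[I])
                             : IsElt(Mask[I], Base + Pos - Shift);
          else // high Shift elements are zero, the rest move down
            Ok = Pos >= Scale - Shift ? IsZeroable(Mask[I])
                                      : IsElt(Mask[I], Base + Pos + Shift);
        }
        if (Ok)
          return BitShiftMatch{Left, EltBits * Scale, EltBits * Shift};
      }
    }
  }
  return std::nullopt;
}
```

With this encoding, `matchBitShift({4, 0, 4, -1}, 32)` reports a left shift by 32 bits in 64-bit lanes (the `psllq $32` expected by `shuffle_v4i32_z0zX`), and `matchBitShift({1, 4, 3, 4}, 32)` reports the matching right shift (`psrlq $32`, as in `shuffle_v4i32_1z3z`). Trying the narrowest lane first is a choice of this sketch; it only matters when undef elements let more than one lane width fit.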
test/CodeGen/X86/combine-or.ll

… first 197 lines hidden …	; CHECK-NEXT: retq
%or = or <2 x i64> %shuf1, %shuf2		%or = or <2 x i64> %shuf1, %shuf2
ret <2 x i64> %or		ret <2 x i64> %or
}		}


; Verify that the dag-combiner does not fold a OR of two shuffles into a single		; Verify that the dag-combiner does not fold a OR of two shuffles into a single
; shuffle instruction when the shuffle indexes are not compatible.		; shuffle instruction when the shuffle indexes are not compatible.

define <4 x i32> @test17(<4 x i32> %a, <4 x i32> %b) {		define <4 x i32> @test17(<4 x i32> %a, <4 x i32> %b) {
; CHECK-LABEL: test17:		; CHECK-LABEL: test17:
; CHECK: # BB#0:		; CHECK: # BB#0:
; CHECK-NEXT: xorps %xmm2, %xmm2		; CHECK-NEXT: psllq $32, %xmm0
; CHECK-NEXT: shufps {{.*#+}} xmm1 = xmm1[0,1],xmm2[0,0]		; CHECK-NEXT: xorps %xmm2, %xmm2
; CHECK-NEXT: shufps {{.*#+}} xmm2 = xmm2[0,0],xmm0[0,2]		; CHECK-NEXT: shufps {{.*#+}} xmm1 = xmm1[0,1],xmm2[0,0]
; CHECK-NEXT: shufps {{.*#+}} xmm2 = xmm2[0,2,1,3]		; CHECK-NEXT: por %xmm1, %xmm0
; CHECK-NEXT: orps %xmm1, %xmm2		; CHECK-NEXT: retq
; CHECK-NEXT: movaps %xmm2, %xmm0		%shuf1 = shufflevector <4 x i32> %a, <4 x i32> zeroinitializer, <4 x i32><i32 4, i32 0, i32 4, i32 2>
; CHECK-NEXT: retq		%shuf2 = shufflevector <4 x i32> %b, <4 x i32> zeroinitializer, <4 x i32><i32 0, i32 1, i32 4, i32 4>
%shuf1 = shufflevector <4 x i32> %a, <4 x i32> zeroinitializer, <4 x i32><i32 4, i32 0, i32 4, i32 2>
%shuf2 = shufflevector <4 x i32> %b, <4 x i32> zeroinitializer, <4 x i32><i32 0, i32 1, i32 4, i32 4>
%or = or <4 x i32> %shuf1, %shuf2		%or = or <4 x i32> %shuf1, %shuf2
ret <4 x i32> %or		ret <4 x i32> %or
}		}


define <4 x i32> @test18(<4 x i32> %a, <4 x i32> %b) {		define <4 x i32> @test18(<4 x i32> %a, <4 x i32> %b) {
; CHECK-LABEL: test18:		; CHECK-LABEL: test18:
; CHECK: # BB#0:		; CHECK: # BB#0:
… last 71 lines hidden …

test/CodeGen/X86/vector-idiv.ll

	… first 100 lines hidden …
	; SSE-NEXT: shufps {{.*#+}} xmm2 = xmm2[1,3],xmm3[1,3]			; SSE-NEXT: shufps {{.*#+}} xmm2 = xmm2[1,3],xmm3[1,3]
	; SSE-NEXT: shufps {{.*#+}} xmm2 = xmm2[0,2,1,3]			; SSE-NEXT: shufps {{.*#+}} xmm2 = xmm2[0,2,1,3]
	; SSE-NEXT: psubd %xmm2, %xmm1			; SSE-NEXT: psubd %xmm2, %xmm1
	; SSE-NEXT: psrld $1, %xmm1			; SSE-NEXT: psrld $1, %xmm1
	; SSE-NEXT: paddd %xmm2, %xmm1			; SSE-NEXT: paddd %xmm2, %xmm1
	; SSE-NEXT: psrld $2, %xmm1			; SSE-NEXT: psrld $2, %xmm1
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; AVX-LABEL: test2:			; AVX-LABEL: test2:
	; AVX: # BB#0:			; AVX: # BB#0:
	; AVX-NEXT: vpbroadcastd {{.*}}(%rip), %ymm1			; AVX-NEXT: vpbroadcastd {{.*}}(%rip), %ymm1
	; AVX-NEXT: vpshufd {{.*#+}} ymm2 = ymm1[1,1,3,3,5,5,7,7]			; AVX-NEXT: vpsrlq $32, %ymm1, %ymm2
	; AVX-NEXT: vpshufd {{.*#+}} ymm3 = ymm0[1,1,3,3,5,5,7,7]			; AVX-NEXT: vpsrlq $32, %ymm0, %ymm3
	; AVX-NEXT: vpmuludq %ymm2, %ymm3, %ymm2			; AVX-NEXT: vpmuludq %ymm2, %ymm3, %ymm2
	; AVX-NEXT: vpmuludq %ymm1, %ymm0, %ymm1			; AVX-NEXT: vpmuludq %ymm1, %ymm0, %ymm1
	; AVX-NEXT: vpshufd {{.*#+}} ymm1 = ymm1[1,1,3,3,5,5,7,7]			; AVX-NEXT: vpsrlq $32, %ymm1, %ymm1
	; AVX-NEXT: vpblendd {{.*#+}} ymm1 = ymm1[0],ymm2[1],ymm1[2],ymm2[3],ymm1[4],ymm2[5],ymm1[6],ymm2[7]			; AVX-NEXT: vpblendd {{.*#+}} ymm1 = ymm1[0],ymm2[1],ymm1[2],ymm2[3],ymm1[4],ymm2[5],ymm1[6],ymm2[7]
	; AVX-NEXT: vpsubd %ymm1, %ymm0, %ymm0			; AVX-NEXT: vpsubd %ymm1, %ymm0, %ymm0
	; AVX-NEXT: vpsrld $1, %ymm0, %ymm0			; AVX-NEXT: vpsrld $1, %ymm0, %ymm0
	; AVX-NEXT: vpaddd %ymm1, %ymm0, %ymm0			; AVX-NEXT: vpaddd %ymm1, %ymm0, %ymm0
	; AVX-NEXT: vpsrld $2, %ymm0, %ymm0			; AVX-NEXT: vpsrld $2, %ymm0, %ymm0
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%div = udiv <8 x i32> %a, <i32 7, i32 7, i32 7, i32 7,i32 7, i32 7, i32 7, i32 7>			%div = udiv <8 x i32> %a, <i32 7, i32 7, i32 7, i32 7,i32 7, i32 7, i32 7, i32 7>
	ret <8 x i32> %div			ret <8 x i32> %div
	}			}

	define <8 x i16> @test3(<8 x i16> %a) {			define <8 x i16> @test3(<8 x i16> %a) {
	… 823 lines hidden …
	; SSE-NEXT: psubd %xmm3, %xmm1			; SSE-NEXT: psubd %xmm3, %xmm1
	; SSE-NEXT: paddd %xmm2, %xmm1			; SSE-NEXT: paddd %xmm2, %xmm1
	; SSE-NEXT: movdqa %xmm1, %xmm2			; SSE-NEXT: movdqa %xmm1, %xmm2
	; SSE-NEXT: psrld $31, %xmm2			; SSE-NEXT: psrld $31, %xmm2
	; SSE-NEXT: psrad $2, %xmm1			; SSE-NEXT: psrad $2, %xmm1
	; SSE-NEXT: paddd %xmm2, %xmm1			; SSE-NEXT: paddd %xmm2, %xmm1
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; AVX-LABEL: test9:			; AVX-LABEL: test9:
	; AVX: # BB#0:			; AVX: # BB#0:
	; AVX-NEXT: vpbroadcastd {{.*}}(%rip), %ymm1			; AVX-NEXT: vpbroadcastd {{.*}}(%rip), %ymm1
	; AVX-NEXT: vpshufd {{.*#+}} ymm2 = ymm1[1,1,3,3,5,5,7,7]			; AVX-NEXT: vpsrlq $32, %ymm1, %ymm2
	; AVX-NEXT: vpshufd {{.*#+}} ymm3 = ymm0[1,1,3,3,5,5,7,7]			; AVX-NEXT: vpsrlq $32, %ymm0, %ymm3
	; AVX-NEXT: vpmuldq %ymm2, %ymm3, %ymm2			; AVX-NEXT: vpmuldq %ymm2, %ymm3, %ymm2
	; AVX-NEXT: vpmuldq %ymm1, %ymm0, %ymm1			; AVX-NEXT: vpmuldq %ymm1, %ymm0, %ymm1
	; AVX-NEXT: vpshufd {{.*#+}} ymm1 = ymm1[1,1,3,3,5,5,7,7]			; AVX-NEXT: vpsrlq $32, %ymm1, %ymm1
	; AVX-NEXT: vpblendd {{.*#+}} ymm1 = ymm1[0],ymm2[1],ymm1[2],ymm2[3],ymm1[4],ymm2[5],ymm1[6],ymm2[7]			; AVX-NEXT: vpblendd {{.*#+}} ymm1 = ymm1[0],ymm2[1],ymm1[2],ymm2[3],ymm1[4],ymm2[5],ymm1[6],ymm2[7]
	; AVX-NEXT: vpaddd %ymm0, %ymm1, %ymm0			; AVX-NEXT: vpaddd %ymm0, %ymm1, %ymm0
	; AVX-NEXT: vpsrld $31, %ymm0, %ymm1			; AVX-NEXT: vpsrld $31, %ymm0, %ymm1
	; AVX-NEXT: vpsrad $2, %ymm0, %ymm0			; AVX-NEXT: vpsrad $2, %ymm0, %ymm0
	; AVX-NEXT: vpaddd %ymm1, %ymm0, %ymm0			; AVX-NEXT: vpaddd %ymm1, %ymm0, %ymm0
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%div = sdiv <8 x i32> %a, <i32 7, i32 7, i32 7, i32 7,i32 7, i32 7, i32 7, i32 7>			%div = sdiv <8 x i32> %a, <i32 7, i32 7, i32 7, i32 7,i32 7, i32 7, i32 7, i32 7>
	ret <8 x i32> %div			ret <8 x i32> %div
	}			}

	define <8 x i32> @test10(<8 x i32> %a) {			define <8 x i32> @test10(<8 x i32> %a) {
	… 64 lines hidden …
	; SSE-NEXT: pshufd {{.*#+}} xmm2 = xmm4[1,1,3,3]			; SSE-NEXT: pshufd {{.*#+}} xmm2 = xmm4[1,1,3,3]
	; SSE-NEXT: pmuludq %xmm3, %xmm4			; SSE-NEXT: pmuludq %xmm3, %xmm4
	; SSE-NEXT: pmuludq %xmm3, %xmm2			; SSE-NEXT: pmuludq %xmm3, %xmm2
	; SSE-NEXT: shufps {{.*#+}} xmm4 = xmm4[0,2],xmm2[0,2]			; SSE-NEXT: shufps {{.*#+}} xmm4 = xmm4[0,2],xmm2[0,2]
	; SSE-NEXT: shufps {{.*#+}} xmm4 = xmm4[0,2,1,3]			; SSE-NEXT: shufps {{.*#+}} xmm4 = xmm4[0,2,1,3]
	; SSE-NEXT: psubd %xmm4, %xmm1			; SSE-NEXT: psubd %xmm4, %xmm1
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; AVX-LABEL: test10:			; AVX-LABEL: test10:
	; AVX: # BB#0:			; AVX: # BB#0:
	; AVX-NEXT: vpbroadcastd {{.*}}(%rip), %ymm1			; AVX-NEXT: vpbroadcastd {{.*}}(%rip), %ymm1
	; AVX-NEXT: vpshufd {{.*#+}} ymm2 = ymm1[1,1,3,3,5,5,7,7]			; AVX-NEXT: vpsrlq $32, %ymm1, %ymm2
	; AVX-NEXT: vpshufd {{.*#+}} ymm3 = ymm0[1,1,3,3,5,5,7,7]			; AVX-NEXT: vpsrlq $32, %ymm0, %ymm3
	; AVX-NEXT: vpmuludq %ymm2, %ymm3, %ymm2			; AVX-NEXT: vpmuludq %ymm2, %ymm3, %ymm2
	; AVX-NEXT: vpmuludq %ymm1, %ymm0, %ymm1			; AVX-NEXT: vpmuludq %ymm1, %ymm0, %ymm1
	; AVX-NEXT: vpshufd {{.*#+}} ymm1 = ymm1[1,1,3,3,5,5,7,7]			; AVX-NEXT: vpsrlq $32, %ymm1, %ymm1
	; AVX-NEXT: vpblendd {{.*#+}} ymm1 = ymm1[0],ymm2[1],ymm1[2],ymm2[3],ymm1[4],ymm2[5],ymm1[6],ymm2[7]			; AVX-NEXT: vpblendd {{.*#+}} ymm1 = ymm1[0],ymm2[1],ymm1[2],ymm2[3],ymm1[4],ymm2[5],ymm1[6],ymm2[7]
	; AVX-NEXT: vpsubd %ymm1, %ymm0, %ymm2			; AVX-NEXT: vpsubd %ymm1, %ymm0, %ymm2
	; AVX-NEXT: vpsrld $1, %ymm2, %ymm2			; AVX-NEXT: vpsrld $1, %ymm2, %ymm2
	; AVX-NEXT: vpaddd %ymm1, %ymm2, %ymm1			; AVX-NEXT: vpaddd %ymm1, %ymm2, %ymm1
	; AVX-NEXT: vpsrld $2, %ymm1, %ymm1			; AVX-NEXT: vpsrld $2, %ymm1, %ymm1
	; AVX-NEXT: vpbroadcastd {{.*}}(%rip), %ymm2			; AVX-NEXT: vpbroadcastd {{.*}}(%rip), %ymm2
	; AVX-NEXT: vpmulld %ymm2, %ymm1, %ymm1			; AVX-NEXT: vpmulld %ymm2, %ymm1, %ymm1
	; AVX-NEXT: vpsubd %ymm1, %ymm0, %ymm0			; AVX-NEXT: vpsubd %ymm1, %ymm0, %ymm0
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%rem = urem <8 x i32> %a, <i32 7, i32 7, i32 7, i32 7,i32 7, i32 7, i32 7, i32 7>			%rem = urem <8 x i32> %a, <i32 7, i32 7, i32 7, i32 7,i32 7, i32 7, i32 7, i32 7>
	ret <8 x i32> %rem			ret <8 x i32> %rem
	… 82 lines hidden …
	; SSE-NEXT: pshufd {{.*#+}} xmm3 = xmm2[1,1,3,3]			; SSE-NEXT: pshufd {{.*#+}} xmm3 = xmm2[1,1,3,3]
	; SSE-NEXT: pmuludq %xmm4, %xmm2			; SSE-NEXT: pmuludq %xmm4, %xmm2
	; SSE-NEXT: pmuludq %xmm4, %xmm3			; SSE-NEXT: pmuludq %xmm4, %xmm3
	; SSE-NEXT: shufps {{.*#+}} xmm2 = xmm2[0,2],xmm3[0,2]			; SSE-NEXT: shufps {{.*#+}} xmm2 = xmm2[0,2],xmm3[0,2]
	; SSE-NEXT: shufps {{.*#+}} xmm2 = xmm2[0,2,1,3]			; SSE-NEXT: shufps {{.*#+}} xmm2 = xmm2[0,2,1,3]
	; SSE-NEXT: psubd %xmm2, %xmm1			; SSE-NEXT: psubd %xmm2, %xmm1
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; AVX-LABEL: test11:			; AVX-LABEL: test11:
	; AVX: # BB#0:			; AVX: # BB#0:
	; AVX-NEXT: vpbroadcastd {{.*}}(%rip), %ymm1			; AVX-NEXT: vpbroadcastd {{.*}}(%rip), %ymm1
	; AVX-NEXT: vpshufd {{.*#+}} ymm2 = ymm1[1,1,3,3,5,5,7,7]			; AVX-NEXT: vpsrlq $32, %ymm1, %ymm2
	; AVX-NEXT: vpshufd {{.*#+}} ymm3 = ymm0[1,1,3,3,5,5,7,7]			; AVX-NEXT: vpsrlq $32, %ymm0, %ymm3
	; AVX-NEXT: vpmuldq %ymm2, %ymm3, %ymm2			; AVX-NEXT: vpmuldq %ymm2, %ymm3, %ymm2
	; AVX-NEXT: vpmuldq %ymm1, %ymm0, %ymm1			; AVX-NEXT: vpmuldq %ymm1, %ymm0, %ymm1
	; AVX-NEXT: vpshufd {{.*#+}} ymm1 = ymm1[1,1,3,3,5,5,7,7]			; AVX-NEXT: vpsrlq $32, %ymm1, %ymm1
	; AVX-NEXT: vpblendd {{.*#+}} ymm1 = ymm1[0],ymm2[1],ymm1[2],ymm2[3],ymm1[4],ymm2[5],ymm1[6],ymm2[7]			; AVX-NEXT: vpblendd {{.*#+}} ymm1 = ymm1[0],ymm2[1],ymm1[2],ymm2[3],ymm1[4],ymm2[5],ymm1[6],ymm2[7]
	; AVX-NEXT: vpaddd %ymm0, %ymm1, %ymm1			; AVX-NEXT: vpaddd %ymm0, %ymm1, %ymm1
	; AVX-NEXT: vpsrld $31, %ymm1, %ymm2			; AVX-NEXT: vpsrld $31, %ymm1, %ymm2
	; AVX-NEXT: vpsrad $2, %ymm1, %ymm1			; AVX-NEXT: vpsrad $2, %ymm1, %ymm1
	; AVX-NEXT: vpaddd %ymm2, %ymm1, %ymm1			; AVX-NEXT: vpaddd %ymm2, %ymm1, %ymm1
	; AVX-NEXT: vpbroadcastd {{.*}}(%rip), %ymm2			; AVX-NEXT: vpbroadcastd {{.*}}(%rip), %ymm2
	; AVX-NEXT: vpmulld %ymm2, %ymm1, %ymm1			; AVX-NEXT: vpmulld %ymm2, %ymm1, %ymm1
	; AVX-NEXT: vpsubd %ymm1, %ymm0, %ymm0			; AVX-NEXT: vpsubd %ymm1, %ymm0, %ymm0
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%rem = srem <8 x i32> %a, <i32 7, i32 7, i32 7, i32 7,i32 7, i32 7, i32 7, i32 7>			%rem = srem <8 x i32> %a, <i32 7, i32 7, i32 7, i32 7,i32 7, i32 7, i32 7, i32 7>
	ret <8 x i32> %rem			ret <8 x i32> %rem
	… last 78 lines hidden …

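In the AVX2 `udiv`/`sdiv`/`urem`/`srem` by 7 tests above, `vpsrlq $32` now replaces `vpshufd` with `[1,1,3,3,5,5,7,7]` when moving the odd i32 elements into the even slots around `vpmuludq`/`vpmuldq`. The two instructions leave different values in the upper half of each 64-bit lane (the shuffle duplicates the odd element, the shift zero-fills it), but `pmuludq` and `pmuldq` only read the low 32 bits of each 64-bit lane, so the products are unchanged. After the even-element multiply, the following `vpblendd` only takes the even i32 slots from that register, so the same reasoning applies there. The scalar model below checks the first case for a single 64-bit lane; the helper names are hypothetical and only the unsigned `pmuludq` is modelled.

```cpp
#include <cstdint>

// One 64-bit lane seen as i32 elements {Even, Odd}. x86 is little-endian,
// so the even element occupies the low 32 bits.
constexpr uint64_t makeLane(uint32_t Even, uint32_t Odd) {
  return (static_cast<uint64_t>(Odd) << 32) | Even;
}

// Model of one lane of pmuludq: only the low unsigned 32 bits of each
// operand are read, and the result is the full 64-bit product.
constexpr uint64_t pmuludqLane(uint64_t A, uint64_t B) {
  return static_cast<uint64_t>(static_cast<uint32_t>(A)) *
         static_cast<uint32_t>(B);
}

// Old lowering: pshufd [1,1,3,3] copies the odd element into both slots.
constexpr uint64_t oddViaShuffle(uint64_t Lane) {
  return makeLane(static_cast<uint32_t>(Lane >> 32),
                  static_cast<uint32_t>(Lane >> 32));
}

// New lowering: psrlq $32 moves the odd element down and zero-fills the top.
constexpr uint64_t oddViaShift(uint64_t Lane) { return Lane >> 32; }
```

The register contents after `oddViaShuffle` and `oddViaShift` differ, yet `pmuludqLane` returns the same product for both, which is why the substitution is safe in these sequences.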
test/CodeGen/X86/vector-shuffle-128-v16.ll

	… first 1,079 lines hidden …
	; AVX-NEXT: vmovaps %xmm0, (%rdi)			; AVX-NEXT: vmovaps %xmm0, (%rdi)
	; AVX-NEXT: vmovaps %xmm0, (%rsi)			; AVX-NEXT: vmovaps %xmm0, (%rsi)
	; AVX-NEXT: retq			; AVX-NEXT: retq
	entry:			entry:
	%weird_zero = bitcast <4 x i32> zeroinitializer to <16 x i8>			%weird_zero = bitcast <4 x i32> zeroinitializer to <16 x i8>
	%shuffle.i = shufflevector <16 x i8> <i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 0, i8 0, i8 0, i8 0>, <16 x i8> %weird_zero, <16 x i32> <i32 12, i32 13, i32 14, i32 15, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27>			%shuffle.i = shufflevector <16 x i8> <i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 0, i8 0, i8 0, i8 0>, <16 x i8> %weird_zero, <16 x i32> <i32 12, i32 13, i32 14, i32 15, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27>
	%weirder_zero = bitcast <16 x i8> %shuffle.i to <4 x i32>			%weirder_zero = bitcast <16 x i8> %shuffle.i to <4 x i32>
	store <4 x i32> %weirder_zero, <4 x i32>* %ptr1, align 16			store <4 x i32> %weirder_zero, <4 x i32>* %ptr1, align 16
	store <4 x i32> zeroinitializer, <4 x i32>* %ptr2, align 16			store <4 x i32> zeroinitializer, <4 x i32>* %ptr2, align 16
	ret void			ret void
	}			}

				;
				; Shuffle to logical bit shifts
				;

				define <16 x i8> @shuffle_v16i8_zz_00_zz_02_zz_04_zz_06_zz_08_zz_10_zz_12_zz_14(<16 x i8> %a, <16 x i8> %b) {
				; SSE-LABEL: shuffle_v16i8_zz_00_zz_02_zz_04_zz_06_zz_08_zz_10_zz_12_zz_14:
				; SSE: # BB#0:
				; SSE-NEXT: psllw $8, %xmm0
				; SSE-NEXT: retq
				;
				; AVX-LABEL: shuffle_v16i8_zz_00_zz_02_zz_04_zz_06_zz_08_zz_10_zz_12_zz_14:
				; AVX: # BB#0:
				; AVX-NEXT: vpsllw $8, %xmm0
				; AVX-NEXT: retq
				%shuffle = shufflevector <16 x i8> %a, <16 x i8> zeroinitializer, <16 x i32><i32 16, i32 0, i32 16, i32 2, i32 16, i32 4, i32 16, i32 6, i32 16, i32 8, i32 16, i32 10, i32 16, i32 12, i32 16, i32 14>
				ret <16 x i8> %shuffle
				}

				define <16 x i8> @shuffle_v16i8_zz_zz_zz_00_zz_zz_zz_04_zz_zz_zz_08_zz_zz_zz_12(<16 x i8> %a, <16 x i8> %b) {
				; SSE-LABEL: shuffle_v16i8_zz_zz_zz_00_zz_zz_zz_04_zz_zz_zz_08_zz_zz_zz_12:
				; SSE: # BB#0:
				; SSE-NEXT: pslld $24, %xmm0
				; SSE-NEXT: retq
				;
				; AVX-LABEL: shuffle_v16i8_zz_zz_zz_00_zz_zz_zz_04_zz_zz_zz_08_zz_zz_zz_12:
				; AVX: # BB#0:
				; AVX-NEXT: vpslld $24, %xmm0
				; AVX-NEXT: retq
				%shuffle = shufflevector <16 x i8> %a, <16 x i8> zeroinitializer, <16 x i32><i32 16, i32 16, i32 16, i32 0, i32 16, i32 16, i32 16, i32 4, i32 16, i32 16, i32 16, i32 8, i32 16, i32 16, i32 16, i32 12>
				ret <16 x i8> %shuffle
				}

				define <16 x i8> @shuffle_v16i8_zz_zz_zz_zz_zz_zz_zz_00_zz_zz_zz_zz_zz_zz_zz_08(<16 x i8> %a, <16 x i8> %b) {
				; SSE-LABEL: shuffle_v16i8_zz_zz_zz_zz_zz_zz_zz_00_zz_zz_zz_zz_zz_zz_zz_08:
				; SSE: # BB#0:
				; SSE-NEXT: psllq $56, %xmm0
				; SSE-NEXT: retq
				;
				; AVX-LABEL: shuffle_v16i8_zz_zz_zz_zz_zz_zz_zz_00_zz_zz_zz_zz_zz_zz_zz_08:
				; AVX: # BB#0:
				; AVX-NEXT: vpsllq $56, %xmm0
				; AVX-NEXT: retq
				%shuffle = shufflevector <16 x i8> %a, <16 x i8> zeroinitializer, <16 x i32><i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 0, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 8>
				ret <16 x i8> %shuffle
				}

				define <16 x i8> @shuffle_v16i8_zz_00_uu_02_03_uu_05_06_zz_08_09_uu_11_12_13_14(<16 x i8> %a, <16 x i8> %b) {
				; SSE-LABEL: shuffle_v16i8_zz_00_uu_02_03_uu_05_06_zz_08_09_uu_11_12_13_14:
				; SSE: # BB#0:
				; SSE-NEXT: psllq $8, %xmm0
				; SSE-NEXT: retq
				;
				; AVX-LABEL: shuffle_v16i8_zz_00_uu_02_03_uu_05_06_zz_08_09_uu_11_12_13_14:
				; AVX: # BB#0:
				; AVX-NEXT: vpsllq $8, %xmm0
				; AVX-NEXT: retq
				%shuffle = shufflevector <16 x i8> %a, <16 x i8> zeroinitializer, <16 x i32><i32 16, i32 0, i32 undef, i32 2, i32 3, i32 undef, i32 5, i32 6, i32 16, i32 8, i32 9, i32 undef, i32 11, i32 12, i32 13, i32 14>
				ret <16 x i8> %shuffle
				}

				define <16 x i8> @shuffle_v16i8_01_uu_uu_uu_uu_zz_uu_zz_uu_zz_11_zz_13_zz_15_zz(<16 x i8> %a, <16 x i8> %b) {
				; SSE-LABEL: shuffle_v16i8_01_uu_uu_uu_uu_zz_uu_zz_uu_zz_11_zz_13_zz_15_zz:
				; SSE: # BB#0:
				; SSE-NEXT: psrlw $8, %xmm0
				; SSE-NEXT: retq
				;
				; AVX-LABEL: shuffle_v16i8_01_uu_uu_uu_uu_zz_uu_zz_uu_zz_11_zz_13_zz_15_zz:
				; AVX: # BB#0:
				; AVX-NEXT: vpsrlw $8, %xmm0
				; AVX-NEXT: retq
				%shuffle = shufflevector <16 x i8> %a, <16 x i8> zeroinitializer, <16 x i32><i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 16, i32 undef, i32 16, i32 undef, i32 16, i32 11, i32 16, i32 13, i32 16, i32 15, i32 16>
				ret <16 x i8> %shuffle
				}

				define <16 x i8> @shuffle_v16i8_02_03_zz_zz_06_07_uu_uu_uu_uu_uu_uu_14_15_zz_zz(<16 x i8> %a, <16 x i8> %b) {
				; SSE-LABEL: shuffle_v16i8_02_03_zz_zz_06_07_uu_uu_uu_uu_uu_uu_14_15_zz_zz:
				; SSE: # BB#0:
				; SSE-NEXT: psrld $16, %xmm0
				; SSE-NEXT: retq
				;
				; AVX-LABEL: shuffle_v16i8_02_03_zz_zz_06_07_uu_uu_uu_uu_uu_uu_14_15_zz_zz:
				; AVX: # BB#0:
				; AVX-NEXT: vpsrld $16, %xmm0
				; AVX-NEXT: retq
				%shuffle = shufflevector <16 x i8> %a, <16 x i8> zeroinitializer, <16 x i32><i32 2, i32 3, i32 16, i32 16, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 14, i32 15, i32 16, i32 16>
				ret <16 x i8> %shuffle
				}

				define <16 x i8> @shuffle_v16i8_07_zz_zz_zz_zz_zz_uu_uu_15_uu_uu_uu_uu_uu_zz_zz(<16 x i8> %a, <16 x i8> %b) {
				; SSE-LABEL: shuffle_v16i8_07_zz_zz_zz_zz_zz_uu_uu_15_uu_uu_uu_uu_uu_zz_zz:
				; SSE: # BB#0:
				; SSE-NEXT: psrlq $56, %xmm0
				; SSE-NEXT: retq
				;
				; AVX-LABEL: shuffle_v16i8_07_zz_zz_zz_zz_zz_uu_uu_15_uu_uu_uu_uu_uu_zz_zz:
				; AVX: # BB#0:
				; AVX-NEXT: vpsrlq $56, %xmm0
				; AVX-NEXT: retq
				%shuffle = shufflevector <16 x i8> %a, <16 x i8> zeroinitializer, <16 x i32><i32 7, i32 16, i32 16, i32 16, i32 16, i32 16, i32 undef, i32 undef, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 16, i32 16>
				ret <16 x i8> %shuffle
				}

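Each of the new `vector-shuffle-128-v16.ll` tests pairs a shuffle against `zeroinitializer` with a single `psll`/`psrl` instruction. One way to sanity-check such an expectation outside of LLVM is to apply both to a sample 16-byte vector and compare every defined (non-undef) byte. The sketch below is a scalar reference model, not part of the patch or of any LLVM API; the helper names are hypothetical, and it assumes a little-endian host so that `std::memcpy` reproduces the x86 lane layout.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

using V16 = std::array<uint8_t, 16>;
using Mask16 = std::array<int, 16>;

// Applies a v16i8 shuffle whose second operand is zeroinitializer.
// Mask encoding: -1 = undef, 0..15 = byte of A, 16..31 = zero byte.
// Defined[I] is cleared for undef elements so callers can ignore them.
V16 applyShuffle(const V16 &A, const Mask16 &Mask,
                 std::array<bool, 16> &Defined) {
  V16 R{};
  for (int I = 0; I < 16; ++I) {
    Defined[I] = Mask[I] >= 0;
    R[I] = (Mask[I] >= 0 && Mask[I] < 16) ? A[Mask[I]] : 0;
  }
  return R;
}

// Models psll/psrl{w,d,q} with an immediate: every LaneBytes-wide lane is
// read as an integer, shifted with zero fill, and written back.
V16 logicalShift(const V16 &A, int LaneBytes, int ShiftBits, bool Left) {
  assert((LaneBytes == 2 || LaneBytes == 4 || LaneBytes == 8) &&
         ShiftBits > 0 && ShiftBits < LaneBytes * 8);
  V16 R{};
  for (int Off = 0; Off < 16; Off += LaneBytes) {
    uint64_t Lane = 0;
    std::memcpy(&Lane, A.data() + Off, LaneBytes); // little-endian host
    Lane = Left ? Lane << ShiftBits : Lane >> ShiftBits;
    // Copying back only LaneBytes bytes drops bits shifted past the lane.
    std::memcpy(R.data() + Off, &Lane, LaneBytes);
  }
  return R;
}

// True if the shift reproduces the shuffle in every defined element.
bool shiftMatchesShuffle(const V16 &A, const Mask16 &Mask, int LaneBytes,
                         int ShiftBits, bool Left) {
  std::array<bool, 16> Defined{};
  const V16 Want = applyShuffle(A, Mask, Defined);
  const V16 Got = logicalShift(A, LaneBytes, ShiftBits, Left);
  for (int I = 0; I < 16; ++I)
    if (Defined[I] && Want[I] != Got[I])
      return false;
  return true;
}
```

For instance, with `A = {1, 2, ..., 16}` the mask of `shuffle_v16i8_zz_00_zz_02_zz_04_zz_06_zz_08_zz_10_zz_12_zz_14` agrees with `logicalShift(A, 2, 8, true)` (`psllw $8`) but not with the right shift of the same width.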
test/CodeGen/X86/vector-shuffle-128-v4.ll

	… first 1,350 lines hidden …
	; SSE-NEXT: shufps {{.*#+}} xmm0 = xmm0[3,2,1,0]			; SSE-NEXT: shufps {{.*#+}} xmm0 = xmm0[3,2,1,0]
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; AVX-LABEL: shuffle_mem_v4f32_3210:			; AVX-LABEL: shuffle_mem_v4f32_3210:
	; AVX: # BB#0:			; AVX: # BB#0:
	; AVX-NEXT: vpermilps {{.*#+}} xmm0 = mem[3,2,1,0]			; AVX-NEXT: vpermilps {{.*#+}} xmm0 = mem[3,2,1,0]
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%a = load <4 x float>* %ptr			%a = load <4 x float>* %ptr
	%shuffle = shufflevector <4 x float> %a, <4 x float> undef, <4 x i32> <i32 3, i32 2, i32 1, i32 0>			%shuffle = shufflevector <4 x float> %a, <4 x float> undef, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
	ret <4 x float> %shuffle			ret <4 x float> %shuffle
	}			}

				;
				; Shuffle to logical bit shifts
				;

				define <4 x i32> @shuffle_v4i32_z0zX(<4 x i32> %a) {
				; SSE-LABEL: shuffle_v4i32_z0zX:
				; SSE: # BB#0:
				; SSE-NEXT: psllq $32, %xmm0
				; SSE-NEXT: retq
				;
				; AVX-LABEL: shuffle_v4i32_z0zX:
				; AVX: # BB#0:
				; AVX-NEXT: vpsllq $32, %xmm0
				; AVX-NEXT: retq
				%shuffle = shufflevector <4 x i32> %a, <4 x i32> zeroinitializer, <4 x i32> <i32 4, i32 0, i32 4, i32 undef>
				ret <4 x i32> %shuffle
				}

				define <4 x i32> @shuffle_v4i32_1z3z(<4 x i32> %a) {
				; SSE-LABEL: shuffle_v4i32_1z3z:
				; SSE: # BB#0:
				; SSE-NEXT: psrlq $32, %xmm0
				; SSE-NEXT: retq
				;
				; AVX-LABEL: shuffle_v4i32_1z3z:
				; AVX: # BB#0:
				; AVX-NEXT: vpsrlq $32, %xmm0
				; AVX-NEXT: retq
				%shuffle = shufflevector <4 x i32> %a, <4 x i32> zeroinitializer, <4 x i32> <i32 1, i32 4, i32 3, i32 4>
				ret <4 x i32> %shuffle
				}

test/CodeGen/X86/vector-shuffle-128-v8.ll

	… first 1,908 lines hidden …
	; SSE41: # BB#0:			; SSE41: # BB#0:
	; SSE41-NEXT: pmovzxwd %xmm0, %xmm0			; SSE41-NEXT: pmovzxwd %xmm0, %xmm0
	; SSE41-NEXT: retq			; SSE41-NEXT: retq
	;			;
	; AVX-LABEL: shuffle_v8i16_0z1z2z3z:			; AVX-LABEL: shuffle_v8i16_0z1z2z3z:
	; AVX: # BB#0:			; AVX: # BB#0:
	; AVX-NEXT: vpmovzxwd %xmm0, %xmm0			; AVX-NEXT: vpmovzxwd %xmm0, %xmm0
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%shuffle = shufflevector <8 x i16> %a, <8 x i16> zeroinitializer, <8 x i32> <i32 0, i32 9, i32 1, i32 11, i32 2, i32 13, i32 3, i32 15>			%shuffle = shufflevector <8 x i16> %a, <8 x i16> zeroinitializer, <8 x i32> <i32 0, i32 9, i32 1, i32 11, i32 2, i32 13, i32 3, i32 15>
	ret <8 x i16> %shuffle			ret <8 x i16> %shuffle
	}			}

				;
				; Shuffle to logical bit shifts
				;
				define <8 x i16> @shuffle_v8i16_z0z2z4z6(<8 x i16> %a) {
				; SSE-LABEL: shuffle_v8i16_z0z2z4z6:
				; SSE: # BB#0:
				; SSE-NEXT: pslld $16, %xmm0
				; SSE-NEXT: retq
				;
				; AVX-LABEL: shuffle_v8i16_z0z2z4z6:
				; AVX: # BB#0:
				; AVX-NEXT: vpslld $16, %xmm0
				; AVX-NEXT: retq
				%shuffle = shufflevector <8 x i16> %a, <8 x i16> zeroinitializer, <8 x i32><i32 8, i32 0, i32 8, i32 2, i32 8, i32 4, i32 8, i32 6>
				ret <8 x i16> %shuffle
				}

				define <8 x i16> @shuffle_v8i16_zzz0zzz4(<8 x i16> %a) {
				; SSE-LABEL: shuffle_v8i16_zzz0zzz4:
				; SSE: # BB#0:
				; SSE-NEXT: psllq $48, %xmm0
				; SSE-NEXT: retq
				;
				; AVX-LABEL: shuffle_v8i16_zzz0zzz4:
				; AVX: # BB#0:
				; AVX-NEXT: vpsllq $48, %xmm0
				; AVX-NEXT: retq
				%shuffle = shufflevector <8 x i16> %a, <8 x i16> zeroinitializer, <8 x i32> <i32 8, i32 8, i32 8, i32 0, i32 8, i32 8, i32 8, i32 4>
				ret <8 x i16> %shuffle
				}

				define <8 x i16> @shuffle_v8i16_zz01zX4X(<8 x i16> %a) {
				; SSE-LABEL: shuffle_v8i16_zz01zX4X:
				; SSE: # BB#0:
				; SSE-NEXT: psllq $32, %xmm0
				; SSE-NEXT: retq
				;
				; AVX-LABEL: shuffle_v8i16_zz01zX4X:
				; AVX: # BB#0:
				; AVX-NEXT: vpsllq $32, %xmm0
				; AVX-NEXT: retq
				%shuffle = shufflevector <8 x i16> %a, <8 x i16> zeroinitializer, <8 x i32> <i32 8, i32 8, i32 0, i32 1, i32 8, i32 undef, i32 4, i32 undef>
				ret <8 x i16> %shuffle
				}

				define <8 x i16> @shuffle_v8i16_z0X2z456(<8 x i16> %a) {
				; SSE-LABEL: shuffle_v8i16_z0X2z456:
				; SSE: # BB#0:
				; SSE-NEXT: psllq $16, %xmm0
				; SSE-NEXT: retq
				;
				; AVX-LABEL: shuffle_v8i16_z0X2z456:
				; AVX: # BB#0:
				; AVX-NEXT: vpsllq $16, %xmm0
				; AVX-NEXT: retq
				%shuffle = shufflevector <8 x i16> %a, <8 x i16> zeroinitializer, <8 x i32> <i32 8, i32 0, i32 undef, i32 2, i32 8, i32 4, i32 5, i32 6>
				ret <8 x i16> %shuffle
				}

				define <8 x i16> @shuffle_v8i16_1z3zXz7z(<8 x i16> %a) {
				; SSE-LABEL: shuffle_v8i16_1z3zXz7z:
				; SSE: # BB#0:
				; SSE-NEXT: psrld $16, %xmm0
				; SSE-NEXT: retq
				;
				; AVX-LABEL: shuffle_v8i16_1z3zXz7z:
				; AVX: # BB#0:
				; AVX-NEXT: vpsrld $16, %xmm0
				; AVX-NEXT: retq
				%shuffle = shufflevector <8 x i16> %a, <8 x i16> zeroinitializer, <8 x i32> <i32 1, i32 8, i32 3, i32 8, i32 undef, i32 8, i32 7, i32 8>
				ret <8 x i16> %shuffle
				}

				define <8 x i16> @shuffle_v8i16_1X3z567z(<8 x i16> %a) {
				; SSE-LABEL: shuffle_v8i16_1X3z567z:
				; SSE: # BB#0:
				; SSE-NEXT: psrlq $16, %xmm0
				; SSE-NEXT: retq
				;
				; AVX-LABEL: shuffle_v8i16_1X3z567z:
				; AVX: # BB#0:
				; AVX-NEXT: vpsrlq $16, %xmm0
				; AVX-NEXT: retq
				%shuffle = shufflevector <8 x i16> %a, <8 x i16> zeroinitializer, <8 x i32> <i32 1, i32 undef, i32 3, i32 8, i32 5, i32 6, i32 7, i32 8>
				ret <8 x i16> %shuffle
				}

				define <8 x i16> @shuffle_v8i16_23zz67zz(<8 x i16> %a) {
				; SSE-LABEL: shuffle_v8i16_23zz67zz:
				; SSE: # BB#0:
				; SSE-NEXT: psrlq $32, %xmm0
				; SSE-NEXT: retq
				;
				; AVX-LABEL: shuffle_v8i16_23zz67zz:
				; AVX: # BB#0:
				; AVX-NEXT: vpsrlq $32, %xmm0
				; AVX-NEXT: retq
				%shuffle = shufflevector <8 x i16> %a, <8 x i16> zeroinitializer, <8 x i32> <i32 2, i32 3, i32 8, i32 8, i32 6, i32 7, i32 8, i32 8>
				ret <8 x i16> %shuffle
				}

				define <8 x i16> @shuffle_v8i16_3zXXXzzz(<8 x i16> %a) {
				; SSE-LABEL: shuffle_v8i16_3zXXXzzz:
				; SSE: # BB#0:
				; SSE-NEXT: psrlq $48, %xmm0
				; SSE-NEXT: retq
				;
				; AVX-LABEL: shuffle_v8i16_3zXXXzzz:
				; AVX: # BB#0:
				; AVX-NEXT: vpsrlq $48, %xmm0
				; AVX-NEXT: retq
				%shuffle = shufflevector <8 x i16> %a, <8 x i16> zeroinitializer, <8 x i32> <i32 3, i32 8, i32 undef, i32 undef, i32 undef, i32 8, i32 8, i32 8>
				ret <8 x i16> %shuffle
				}
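The masks in this file can be recognised mechanically. For each wider integer lane (per the summary, i16 elements can be shifted as i32 or i64), check whether every defined mask entry is either a zero shifted in at one end of the lane or the source element a fixed distance away. What follows is a minimal sketch of such a matcher under stated assumptions. `match_shuffle_as_bit_shift` and `_fits` are hypothetical names, not the code in X86ISelLowering.cpp. The mask is assumed to use `shufflevector` numbering with `-1` standing in for `undef`, and the second operand is assumed to be all zeros.

```python
def _fits(mask, scale, shift, left):
    """True if every defined mask entry agrees with a shift by `shift`
    elements inside groups of `scale` elements (one wider lane each)."""
    n = len(mask)
    for base in range(0, n, scale):
        for j in range(scale):
            m = mask[base + j]
            if m < 0:
                continue  # undef matches anything
            if left:
                src = base + j - shift if j >= shift else None
            else:
                src = base + j + shift if j < scale - shift else None
            if src is None:
                if m < n:  # a shifted-in position must read the zero operand
                    return False
            elif m != src:
                return False
    return True


def match_shuffle_as_bit_shift(mask, elt_bits, max_lane_bits=64):
    """Return (is_left, lane_bits, shift_bits) for the first matching shift,
    trying narrower lanes first, or None if no logical shift fits."""
    n = len(mask)
    scale = 2
    while scale * elt_bits <= max_lane_bits and n % scale == 0:
        for shift in range(1, scale):
            for left in (True, False):
                if _fits(mask, scale, shift, left):
                    return left, scale * elt_bits, shift * elt_bits
        scale *= 2
    return None


print(match_shuffle_as_bit_shift([8, 0, 8, 2, 8, 4, 8, 6], 16))    # (True, 32, 16)
print(match_shuffle_as_bit_shift([3, 8, -1, -1, -1, 8, 8, 8], 16))  # (False, 64, 48)
```

The results line up with the checks above: `(True, 32, 16)` is `pslld $16` and `(False, 64, 48)` is `psrlq $48`. Capping lanes at 64 bits reflects that `psllq`/`psrlq` are the widest logical bit shifts. A mask such as `<4, 0, 1, 2>` moves elements across 64-bit lanes, so no bit shift can produce it and the sketch returns `None`.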

test/CodeGen/X86/vector-shuffle-256-v16.ll

	; AVX1-NEXT: retq
	;
	; AVX2-LABEL: shuffle_v16i16_00_16_01_17_02_18_03_19_04_20_05_21_06_22_07_23:
	; AVX2: # BB#0:
	; AVX2-NEXT: vpunpckhwd {{.*#+}} xmm2 = xmm0[4],xmm1[4],xmm0[5],xmm1[5],xmm0[6],xmm1[6],xmm0[7],xmm1[7]
	; AVX2-NEXT: vpunpcklwd {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3]
	; AVX2-NEXT: vinserti128 $1, %xmm2, %ymm0, %ymm0
	; AVX2-NEXT: retq
	%shuffle = shufflevector <16 x i16> %a, <16 x i16> %b, <16 x i32> <i32 0, i32 16, i32 1, i32 17, i32 2, i32 18, i32 3, i32 19, i32 4, i32 20, i32 5, i32 21, i32 6, i32 22, i32 7, i32 23>
	ret <16 x i16> %shuffle
	}

				;
				; Shuffle to logical bit shifts
				;

				define <16 x i16> @shuffle_v16i16_zz_00_zz_02_zz_04_zz_06_zz_08_zz_10_zz_12_zz_14(<16 x i16> %a) {
				; AVX1-LABEL: shuffle_v16i16_zz_00_zz_02_zz_04_zz_06_zz_08_zz_10_zz_12_zz_14:
				; AVX1: # BB#0:
				; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm1
				; AVX1-NEXT: vmovdqa {{.*#+}} xmm2 = [0,1,4,5,8,9,12,13,8,9,12,13,12,13,14,15]
				; AVX1-NEXT: vpshufb %xmm2, %xmm1, %xmm1
				; AVX1-NEXT: vpxor %xmm3, %xmm3, %xmm3
				; AVX1-NEXT: vpshuflw {{.*#+}} xmm3 = xmm3[0,0,0,0,4,5,6,7]
				; AVX1-NEXT: vpunpcklwd {{.*#+}} xmm1 = xmm3[0],xmm1[0],xmm3[1],xmm1[1],xmm3[2],xmm1[2],xmm3[3],xmm1[3]
				; AVX1-NEXT: vpshufb %xmm2, %xmm0, %xmm0
				; AVX1-NEXT: vpunpcklwd {{.*#+}} xmm0 = xmm3[0],xmm0[0],xmm3[1],xmm0[1],xmm3[2],xmm0[2],xmm3[3],xmm0[3]
				; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
				; AVX1-NEXT: retq
				;
				; AVX2-LABEL: shuffle_v16i16_zz_00_zz_02_zz_04_zz_06_zz_08_zz_10_zz_12_zz_14:
				; AVX2: # BB#0:
				; AVX2-NEXT: vpslld $16, %ymm0
				; AVX2-NEXT: retq
				%shuffle = shufflevector <16 x i16> %a, <16 x i16> zeroinitializer, <16 x i32> <i32 16, i32 0, i32 16, i32 2, i32 16, i32 4, i32 16, i32 6, i32 16, i32 8, i32 16, i32 10, i32 16, i32 12, i32 16, i32 14>
				ret <16 x i16> %shuffle
				}

				define <16 x i16> @shuffle_v16i16_zz_zz_zz_00_zz_zz_zz_04_zz_zz_zz_08_zz_zz_zz_12(<16 x i16> %a) {
				; AVX1-LABEL: shuffle_v16i16_zz_zz_zz_00_zz_zz_zz_04_zz_zz_zz_08_zz_zz_zz_12:
				; AVX1: # BB#0:
				; AVX1-NEXT: vpxor %xmm1, %xmm1, %xmm1
				; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm2
				; AVX1-NEXT: vpshufd {{.*#+}} xmm2 = xmm2[0,2,2,3]
				; AVX1-NEXT: vpshuflw {{.*#+}} xmm2 = xmm2[0,2,2,3,4,5,6,7]
				; AVX1-NEXT: vpunpcklwd {{.*#+}} xmm2 = xmm1[0],xmm2[0],xmm1[1],xmm2[1],xmm1[2],xmm2[2],xmm1[3],xmm2[3]
				; AVX1-NEXT: vpshuflw {{.*#+}} xmm2 = xmm2[0,1,0,3,4,5,6,7]
				; AVX1-NEXT: vpshuflw {{.*#+}} xmm3 = xmm1[0,0,0,0,4,5,6,7]
				; AVX1-NEXT: vpunpcklwd {{.*#+}} xmm2 = xmm3[0],xmm2[0],xmm3[1],xmm2[1],xmm3[2],xmm2[2],xmm3[3],xmm2[3]
				; AVX1-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]
				; AVX1-NEXT: vpshuflw {{.*#+}} xmm0 = xmm0[0,2,2,3,4,5,6,7]
				; AVX1-NEXT: vpunpcklwd {{.*#+}} xmm0 = xmm1[0],xmm0[0],xmm1[1],xmm0[1],xmm1[2],xmm0[2],xmm1[3],xmm0[3]
				; AVX1-NEXT: vpshuflw {{.*#+}} xmm0 = xmm0[0,1,0,3,4,5,6,7]
				; AVX1-NEXT: vpunpcklwd {{.*#+}} xmm0 = xmm3[0],xmm0[0],xmm3[1],xmm0[1],xmm3[2],xmm0[2],xmm3[3],xmm0[3]
				; AVX1-NEXT: vinsertf128 $1, %xmm2, %ymm0, %ymm0
				; AVX1-NEXT: retq
				;
				; AVX2-LABEL: shuffle_v16i16_zz_zz_zz_00_zz_zz_zz_04_zz_zz_zz_08_zz_zz_zz_12:
				; AVX2: # BB#0:
				; AVX2-NEXT: vpsllq $48, %ymm0
				; AVX2-NEXT: retq
				%shuffle = shufflevector <16 x i16> %a, <16 x i16> zeroinitializer, <16 x i32> <i32 16, i32 16, i32 16, i32 0, i32 16, i32 16, i32 16, i32 4, i32 16, i32 16, i32 16, i32 8, i32 16, i32 16, i32 16, i32 12>
				ret <16 x i16> %shuffle
				}

				define <16 x i16> @shuffle_v16i16_01_zz_03_zz_05_zz_07_zz_09_zz_11_zz_13_zz_15_zz(<16 x i16> %a) {
				; AVX1-LABEL: shuffle_v16i16_01_zz_03_zz_05_zz_07_zz_09_zz_11_zz_13_zz_15_zz:
				; AVX1: # BB#0:
				; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm1
				; AVX1-NEXT: vmovdqa {{.*#+}} xmm2 = [2,3,6,7,10,11,14,15,14,15,10,11,12,13,14,15]
				; AVX1-NEXT: vpshufb %xmm2, %xmm1, %xmm1
				; AVX1-NEXT: vpxor %xmm3, %xmm3, %xmm3
				; AVX1-NEXT: vpshuflw {{.*#+}} xmm3 = xmm3[0,0,0,0,4,5,6,7]
				; AVX1-NEXT: vpunpcklwd {{.*#+}} xmm1 = xmm1[0],xmm3[0],xmm1[1],xmm3[1],xmm1[2],xmm3[2],xmm1[3],xmm3[3]
				; AVX1-NEXT: vpshufb %xmm2, %xmm0, %xmm0
				; AVX1-NEXT: vpunpcklwd {{.*#+}} xmm0 = xmm0[0],xmm3[0],xmm0[1],xmm3[1],xmm0[2],xmm3[2],xmm0[3],xmm3[3]
				; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
				; AVX1-NEXT: retq
				;
				; AVX2-LABEL: shuffle_v16i16_01_zz_03_zz_05_zz_07_zz_09_zz_11_zz_13_zz_15_zz:
				; AVX2: # BB#0:
				; AVX2-NEXT: vpsrld $16, %ymm0
				; AVX2-NEXT: retq
				%shuffle = shufflevector <16 x i16> %a, <16 x i16> zeroinitializer, <16 x i32> <i32 1, i32 16, i32 3, i32 16, i32 5, i32 16, i32 7, i32 16, i32 9, i32 16, i32 11, i32 16, i32 13, i32 16, i32 15, i32 16>
				ret <16 x i16> %shuffle
				}

				define <16 x i16> @shuffle_v16i16_02_03_zz_zz_06_07_zz_zz_10_11_zz_zz_14_15_zz_zz(<16 x i16> %a) {
				; AVX1-LABEL: shuffle_v16i16_02_03_zz_zz_06_07_zz_zz_10_11_zz_zz_14_15_zz_zz:
				; AVX1: # BB#0:
				; AVX1-NEXT: vpxor %xmm1, %xmm1, %xmm1
				; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm2
				; AVX1-NEXT: vpshufd {{.*#+}} xmm3 = xmm2[2,3,0,1]
				; AVX1-NEXT: vpunpcklwd {{.*#+}} xmm3 = xmm3[0],xmm1[0],xmm3[1],xmm1[1],xmm3[2],xmm1[2],xmm3[3],xmm1[3]
				; AVX1-NEXT: vmovdqa {{.*#+}} xmm4 = [8,9,12,13,2,3,2,3,8,9,12,13,12,13,14,15]
				; AVX1-NEXT: vpshufb %xmm4, %xmm3, %xmm3
				; AVX1-NEXT: vpunpcklwd {{.*#+}} xmm2 = xmm2[0],xmm1[0],xmm2[1],xmm1[1],xmm2[2],xmm1[2],xmm2[3],xmm1[3]
				; AVX1-NEXT: vpshufb %xmm4, %xmm2, %xmm2
				; AVX1-NEXT: vpunpcklqdq {{.*#+}} xmm2 = xmm2[0],xmm3[0]
				; AVX1-NEXT: vpshufd {{.*#+}} xmm3 = xmm0[2,3,0,1]
				; AVX1-NEXT: vpunpcklwd {{.*#+}} xmm3 = xmm3[0],xmm1[0],xmm3[1],xmm1[1],xmm3[2],xmm1[2],xmm3[3],xmm1[3]
				; AVX1-NEXT: vpshufb %xmm4, %xmm3, %xmm3
				; AVX1-NEXT: vpunpcklwd {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3]
				; AVX1-NEXT: vpshufb %xmm4, %xmm0, %xmm0
				; AVX1-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm3[0]
				; AVX1-NEXT: vinsertf128 $1, %xmm2, %ymm0, %ymm0
				; AVX1-NEXT: retq
				;
				; AVX2-LABEL: shuffle_v16i16_02_03_zz_zz_06_07_zz_zz_10_11_zz_zz_14_15_zz_zz:
				; AVX2: # BB#0:
				; AVX2-NEXT: vpsrlq $32, %ymm0
				; AVX2-NEXT: retq
				%shuffle = shufflevector <16 x i16> %a, <16 x i16> zeroinitializer, <16 x i32> <i32 2, i32 3, i32 16, i32 16, i32 6, i32 7, i32 16, i32 16, i32 10, i32 11, i32 16, i32 16, i32 14, i32 15, i32 16, i32 16>
				ret <16 x i16> %shuffle
				}
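Once a lane width and shift amount are known, the instruction in the AVX2 checks follows from a small table: the w, d and q suffixes cover 16-, 32- and 64-bit lanes. AVX1 has no 256-bit integer shift instructions, which is consistent with the AVX1 checks above keeping their multi-instruction shuffle sequences. Below is a hypothetical formatter, `shift_asm`, written for illustration. It assumes AT&T syntax and assumes the value lives in register 0, as in these tests.

```python
LANE_SUFFIX = {16: "w", 32: "d", 64: "q"}


def shift_asm(is_left, lane_bits, shift_bits, vec_bits=128, vex=False):
    """Format a shift-by-immediate of register 0 (hypothetical helper).

    vex=True selects the VEX-encoded AVX/AVX2 form; 256-bit needs AVX2.
    """
    if vec_bits == 256 and not vex:
        raise ValueError("256-bit integer shifts need the VEX (AVX2) form")
    reg = "%ymm0" if vec_bits == 256 else "%xmm0"
    op = ("psll" if is_left else "psrl") + LANE_SUFFIX[lane_bits]
    if vex:
        # VEX forms name a separate source and destination register.
        return f"v{op} ${shift_bits}, {reg}, {reg}"
    return f"{op} ${shift_bits}, {reg}"


print(shift_asm(True, 32, 16))                                # pslld $16, %xmm0
print(shift_asm(False, 64, 32, vec_bits=256, vex=True))       # vpsrlq $32, %ymm0, %ymm0
```

The VEX forms print three operands. The check lines in this diff spell out only the first two, and they still match because FileCheck matches a pattern anywhere within the line.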

test/CodeGen/X86/vector-shuffle-256-v32.ll

	; AVX1-NEXT: retq
	;
	; AVX2-LABEL: shuffle_v32i8_00_32_01_33_02_34_03_35_04_36_05_37_06_38_07_39_08_40_09_41_10_42_11_43_12_44_13_45_14_46_15_47:
	; AVX2: # BB#0:
	; AVX2-NEXT: vpunpckhbw {{.*#+}} xmm2 = xmm0[8],xmm1[8],xmm0[9],xmm1[9],xmm0[10],xmm1[10],xmm0[11],xmm1[11],xmm0[12],xmm1[12],xmm0[13],xmm1[13],xmm0[14],xmm1[14],xmm0[15],xmm1[15]
	; AVX2-NEXT: vpunpcklbw {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3],xmm0[4],xmm1[4],xmm0[5],xmm1[5],xmm0[6],xmm1[6],xmm0[7],xmm1[7]
	; AVX2-NEXT: vinserti128 $1, %xmm2, %ymm0, %ymm0
	; AVX2-NEXT: retq
	%shuffle = shufflevector <32 x i8> %a, <32 x i8> %b, <32 x i32> <i32 0, i32 32, i32 1, i32 33, i32 2, i32 34, i32 3, i32 35, i32 4, i32 36, i32 5, i32 37, i32 6, i32 38, i32 7, i32 39, i32 8, i32 40, i32 9, i32 41, i32 10, i32 42, i32 11, i32 43, i32 12, i32 44, i32 13, i32 45, i32 14, i32 46, i32 15, i32 47>
	ret <32 x i8> %shuffle
	}

				;
				; Shuffle to logical bit shifts
				;

				define <32 x i8> @shuffle_v32i8_zz_00_zz_02_zz_04_zz_06_zz_08_zz_10_zz_12_zz_14_zz_16_zz_18_zz_20_zz_22_zz_24_zz_26_zz_28_zz_30(<32 x i8> %a) {
				; AVX1-LABEL: shuffle_v32i8_zz_00_zz_02_zz_04_zz_06_zz_08_zz_10_zz_12_zz_14_zz_16_zz_18_zz_20_zz_22_zz_24_zz_26_zz_28_zz_30:
				; AVX1: # BB#0:
				; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm1
				; AVX1-NEXT: vmovdqa {{.*#+}} xmm2 = <0,2,4,6,8,10,12,14,u,u,u,u,u,u,u,u>
				; AVX1-NEXT: vpshufb %xmm2, %xmm1, %xmm1
				; AVX1-NEXT: vpxor %xmm3, %xmm3, %xmm3
				; AVX1-NEXT: vpunpcklbw {{.*#+}} xmm3 = xmm3[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
				; AVX1-NEXT: vpshuflw $0, %xmm3, %xmm3 # xmm3 = xmm3[0,0,0,0,4,5,6,7]
				; AVX1-NEXT: vpunpcklbw {{.*#+}} xmm1 = xmm3[0],xmm1[0],xmm3[1],xmm1[1],xmm3[2],xmm1[2],xmm3[3],xmm1[3],xmm3[4],xmm1[4],xmm3[5],xmm1[5],xmm3[6],xmm1[6],xmm3[7],xmm1[7]
				; AVX1-NEXT: vpshufb %xmm2, %xmm0, %xmm0
				; AVX1-NEXT: vpunpcklbw {{.*#+}} xmm0 = xmm3[0],xmm0[0],xmm3[1],xmm0[1],xmm3[2],xmm0[2],xmm3[3],xmm0[3],xmm3[4],xmm0[4],xmm3[5],xmm0[5],xmm3[6],xmm0[6],xmm3[7],xmm0[7]
				; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
				; AVX1-NEXT: retq
				;
				; AVX2-LABEL: shuffle_v32i8_zz_00_zz_02_zz_04_zz_06_zz_08_zz_10_zz_12_zz_14_zz_16_zz_18_zz_20_zz_22_zz_24_zz_26_zz_28_zz_30:
				; AVX2: # BB#0:
				; AVX2-NEXT: vpsllw $8, %ymm0
				; AVX2-NEXT: retq
				%shuffle = shufflevector <32 x i8> %a, <32 x i8> zeroinitializer, <32 x i32> <i32 32, i32 0, i32 32, i32 2, i32 32, i32 4, i32 32, i32 6, i32 32, i32 8, i32 32, i32 10, i32 32, i32 12, i32 32, i32 14, i32 32, i32 16, i32 32, i32 18, i32 32, i32 20, i32 32, i32 22, i32 32, i32 24, i32 32, i32 26, i32 32, i32 28, i32 32, i32 30>
				ret <32 x i8> %shuffle
				}

				define <32 x i8> @shuffle_v32i8_zz_zz_00_01_zz_zz_04_05_zz_zz_08_09_zz_zz_12_13_zz_zz_16_17_zz_zz_20_21_zz_zz_24_25_zz_zz_28_29(<32 x i8> %a) {
				; AVX1-LABEL: shuffle_v32i8_zz_zz_00_01_zz_zz_04_05_zz_zz_08_09_zz_zz_12_13_zz_zz_16_17_zz_zz_20_21_zz_zz_24_25_zz_zz_28_29:
				; AVX1: # BB#0:
				; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm1
				; AVX1-NEXT: vmovdqa {{.*#+}} xmm2 = [128,128,0,1,128,128,4,5,128,128,8,9,128,128,12,13]
				; AVX1-NEXT: vpshufb %xmm2, %xmm1, %xmm1
				; AVX1-NEXT: vpxor %xmm3, %xmm3, %xmm3
				; AVX1-NEXT: vpshufb {{.*#+}} xmm3 = xmm3[0,0],zero,zero,xmm3[0,0],zero,zero,xmm3[0,0],zero,zero,xmm3[0,0],zero,zero
				; AVX1-NEXT: vpor %xmm1, %xmm3, %xmm1
				; AVX1-NEXT: vpshufb %xmm2, %xmm0, %xmm0
				; AVX1-NEXT: vpor %xmm0, %xmm3, %xmm0
				; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
				; AVX1-NEXT: retq
				;
				; AVX2-LABEL: shuffle_v32i8_zz_zz_00_01_zz_zz_04_05_zz_zz_08_09_zz_zz_12_13_zz_zz_16_17_zz_zz_20_21_zz_zz_24_25_zz_zz_28_29:
				; AVX2: # BB#0:
				; AVX2-NEXT: vpslld $16, %ymm0
				; AVX2-NEXT: retq
				%shuffle = shufflevector <32 x i8> %a, <32 x i8> zeroinitializer, <32 x i32> <i32 32, i32 32, i32 0, i32 1, i32 32, i32 32, i32 4, i32 5, i32 32, i32 32, i32 8, i32 9, i32 32, i32 32, i32 12, i32 13, i32 32, i32 32, i32 16, i32 17, i32 32, i32 32, i32 20, i32 21, i32 32, i32 32, i32 24, i32 25, i32 32, i32 32, i32 28, i32 29>
				ret <32 x i8> %shuffle
				}

				define <32 x i8> @shuffle_v32i8_zz_zz_zz_zz_zz_zz_00_01_zz_zz_zz_zz_zz_zz_08_09_zz_zz_zz_zz_zz_zz_16_17_zz_zz_zz_zz_zz_zz_24_25(<32 x i8> %a) {
				; AVX1-LABEL: shuffle_v32i8_zz_zz_zz_zz_zz_zz_00_01_zz_zz_zz_zz_zz_zz_08_09_zz_zz_zz_zz_zz_zz_16_17_zz_zz_zz_zz_zz_zz_24_25:
				; AVX1: # BB#0:
				; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm1
				; AVX1-NEXT: vmovdqa {{.*#+}} xmm2 = [128,128,128,128,128,128,0,1,128,128,128,128,128,128,8,9]
				; AVX1-NEXT: vpshufb %xmm2, %xmm1, %xmm1
				; AVX1-NEXT: vpxor %xmm3, %xmm3, %xmm3
				; AVX1-NEXT: vpshufb {{.*#+}} xmm3 = xmm3[0,0,0,0,0,0],zero,zero,xmm3[0,0,0,0,0,0],zero,zero
				; AVX1-NEXT: vpor %xmm1, %xmm3, %xmm1
				; AVX1-NEXT: vpshufb %xmm2, %xmm0, %xmm0
				; AVX1-NEXT: vpor %xmm0, %xmm3, %xmm0
				; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
				; AVX1-NEXT: retq
				;
				; AVX2-LABEL: shuffle_v32i8_zz_zz_zz_zz_zz_zz_00_01_zz_zz_zz_zz_zz_zz_08_09_zz_zz_zz_zz_zz_zz_16_17_zz_zz_zz_zz_zz_zz_24_25:
				; AVX2: # BB#0:
				; AVX2-NEXT: vpsllq $48, %ymm0
				; AVX2-NEXT: retq
				%shuffle = shufflevector <32 x i8> %a, <32 x i8> zeroinitializer, <32 x i32> <i32 32, i32 32, i32 32, i32 32, i32 32, i32 32, i32 0, i32 1, i32 32, i32 32, i32 32, i32 32, i32 32, i32 32, i32 8, i32 9, i32 32, i32 32, i32 32, i32 32, i32 32, i32 32, i32 16, i32 17, i32 32, i32 32, i32 32, i32 32, i32 32, i32 32, i32 24, i32 25>
				ret <32 x i8> %shuffle
				}

				define <32 x i8> @shuffle_v32i8_01_zz_03_zz_05_zz_07_zz_09_zz_11_zz_13_zz_15_zz_17_zz_19_zz_21_zz_23_zz_25_zz_27_zz_29_zz_31_zz(<32 x i8> %a) {
				; AVX1-LABEL: shuffle_v32i8_01_zz_03_zz_05_zz_07_zz_09_zz_11_zz_13_zz_15_zz_17_zz_19_zz_21_zz_23_zz_25_zz_27_zz_29_zz_31_zz:
				; AVX1: # BB#0:
				; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm1
				; AVX1-NEXT: vmovdqa {{.*#+}} xmm2 = <1,3,5,7,9,11,13,15,u,u,u,u,u,u,u,u>
				; AVX1-NEXT: vpshufb %xmm2, %xmm1, %xmm1
				; AVX1-NEXT: vpxor %xmm3, %xmm3, %xmm3
				; AVX1-NEXT: vpunpcklbw {{.*#+}} xmm3 = xmm3[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
				; AVX1-NEXT: vpshuflw $0, %xmm3, %xmm3 # xmm3 = xmm3[0,0,0,0,4,5,6,7]
				; AVX1-NEXT: vpunpcklbw {{.*#+}} xmm1 = xmm1[0],xmm3[0],xmm1[1],xmm3[1],xmm1[2],xmm3[2],xmm1[3],xmm3[3],xmm1[4],xmm3[4],xmm1[5],xmm3[5],xmm1[6],xmm3[6],xmm1[7],xmm3[7]
				; AVX1-NEXT: vpshufb %xmm2, %xmm0, %xmm0
				; AVX1-NEXT: vpunpcklbw {{.*#+}} xmm0 = xmm0[0],xmm3[0],xmm0[1],xmm3[1],xmm0[2],xmm3[2],xmm0[3],xmm3[3],xmm0[4],xmm3[4],xmm0[5],xmm3[5],xmm0[6],xmm3[6],xmm0[7],xmm3[7]
				; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
				; AVX1-NEXT: retq
				;
				; AVX2-LABEL: shuffle_v32i8_01_zz_03_zz_05_zz_07_zz_09_zz_11_zz_13_zz_15_zz_17_zz_19_zz_21_zz_23_zz_25_zz_27_zz_29_zz_31_zz:
				; AVX2: # BB#0:
				; AVX2-NEXT: vpsrlw $8, %ymm0
				; AVX2-NEXT: retq
				%shuffle = shufflevector <32 x i8> %a, <32 x i8> zeroinitializer, <32 x i32> <i32 1, i32 32, i32 3, i32 32, i32 5, i32 32, i32 7, i32 32, i32 9, i32 32, i32 11, i32 32, i32 13, i32 32, i32 15, i32 32, i32 17, i32 32, i32 19, i32 32, i32 21, i32 32, i32 23, i32 32, i32 25, i32 32, i32 27, i32 32, i32 29, i32 32, i32 31, i32 32>
				ret <32 x i8> %shuffle
				}

				define <32 x i8> @shuffle_v32i8_02_03_zz_zz_06_07_zz_zz_10_11_zz_zz_14_15_zz_zz_18_19_zz_zz_22_23_zz_zz_26_27_zz_zz_30_31_zz_zz(<32 x i8> %a) {
				; AVX1-LABEL: shuffle_v32i8_02_03_zz_zz_06_07_zz_zz_10_11_zz_zz_14_15_zz_zz_18_19_zz_zz_22_23_zz_zz_26_27_zz_zz_30_31_zz_zz:
				; AVX1: # BB#0:
				; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm1
				; AVX1-NEXT: vmovdqa {{.*#+}} xmm2 = [2,3,128,128,6,7,128,128,10,11,128,128,14,15,128,128]
				; AVX1-NEXT: vpshufb %xmm2, %xmm1, %xmm1
				; AVX1-NEXT: vpxor %xmm3, %xmm3, %xmm3
				; AVX1-NEXT: vpshufb {{.*#+}} xmm3 = zero,zero,xmm3[0,0],zero,zero,xmm3[0,0],zero,zero,xmm3[0,0],zero,zero,xmm3[0,0]
				; AVX1-NEXT: vpor %xmm3, %xmm1, %xmm1
				; AVX1-NEXT: vpshufb %xmm2, %xmm0, %xmm0
				; AVX1-NEXT: vpor %xmm3, %xmm0, %xmm0
				; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
				; AVX1-NEXT: retq
				;
				; AVX2-LABEL: shuffle_v32i8_02_03_zz_zz_06_07_zz_zz_10_11_zz_zz_14_15_zz_zz_18_19_zz_zz_22_23_zz_zz_26_27_zz_zz_30_31_zz_zz:
				; AVX2: # BB#0:
				; AVX2-NEXT: vpsrld $16, %ymm0
				; AVX2-NEXT: retq
				%shuffle = shufflevector <32 x i8> %a, <32 x i8> zeroinitializer, <32 x i32> <i32 2, i32 3, i32 32, i32 32, i32 6, i32 7, i32 32, i32 32, i32 10, i32 11, i32 32, i32 32, i32 14, i32 15, i32 32, i32 32, i32 18, i32 19, i32 32, i32 32, i32 22, i32 23, i32 32, i32 32, i32 26, i32 27, i32 32, i32 32, i32 30, i32 31, i32 32, i32 32>
				ret <32 x i8> %shuffle
				}

				define <32 x i8> @shuffle_v32i8_07_zz_zz_zz_zz_zz_zz_zz_15_zz_zz_zz_zz_zz_zz_zz_23_zz_zz_zz_zz_zz_zz_zz_31_zz_zz_zz_zz_zz_zz_zz(<32 x i8> %a) {
				; AVX1-LABEL: shuffle_v32i8_07_zz_zz_zz_zz_zz_zz_zz_15_zz_zz_zz_zz_zz_zz_zz_23_zz_zz_zz_zz_zz_zz_zz_31_zz_zz_zz_zz_zz_zz_zz:
				; AVX1: # BB#0:
				; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm1
				; AVX1-NEXT: vmovdqa {{.*#+}} xmm2 = <7,128,128,128,15,128,128,128,u,u,u,u,u,u,u,u>
				; AVX1-NEXT: vpshufb %xmm2, %xmm1, %xmm1
				; AVX1-NEXT: vpxor %xmm3, %xmm3, %xmm3
				; AVX1-NEXT: vpshufb {{.*#+}} xmm4 = zero,xmm3[0,0,0],zero,xmm3[0,0,0,u,u,u,u,u,u,u,u]
				; AVX1-NEXT: vpor %xmm1, %xmm4, %xmm1
				; AVX1-NEXT: vpunpcklbw {{.*#+}} xmm3 = xmm3[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
				; AVX1-NEXT: vpshuflw {{.*#+}} xmm3 = xmm3[0,0,0,0,4,5,6,7]
				; AVX1-NEXT: vpunpcklbw {{.*#+}} xmm1 = xmm1[0],xmm3[0],xmm1[1],xmm3[1],xmm1[2],xmm3[2],xmm1[3],xmm3[3],xmm1[4],xmm3[4],xmm1[5],xmm3[5],xmm1[6],xmm3[6],xmm1[7],xmm3[7]
				; AVX1-NEXT: vpshufb %xmm2, %xmm0, %xmm0
				; AVX1-NEXT: vpor %xmm0, %xmm4, %xmm0
				; AVX1-NEXT: vpunpcklbw {{.*#+}} xmm0 = xmm0[0],xmm3[0],xmm0[1],xmm3[1],xmm0[2],xmm3[2],xmm0[3],xmm3[3],xmm0[4],xmm3[4],xmm0[5],xmm3[5],xmm0[6],xmm3[6],xmm0[7],xmm3[7]
				; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
				; AVX1-NEXT: retq
				;
				; AVX2-LABEL: shuffle_v32i8_07_zz_zz_zz_zz_zz_zz_zz_15_zz_zz_zz_zz_zz_zz_zz_23_zz_zz_zz_zz_zz_zz_zz_31_zz_zz_zz_zz_zz_zz_zz:
				; AVX2: # BB#0:
				; AVX2-NEXT: vpsrlq $56, %ymm0
				; AVX2-NEXT: retq
				%shuffle = shufflevector <32 x i8> %a, <32 x i8> zeroinitializer, <32 x i32> <i32 7, i32 32, i32 32, i32 32, i32 32, i32 32, i32 32, i32 32, i32 15, i32 32, i32 32, i32 32, i32 32, i32 32, i32 32, i32 32, i32 23, i32 32, i32 32, i32 32, i32 32, i32 32, i32 32, i32 32, i32 31, i32 32, i32 32, i32 32, i32 32, i32 32, i32 32, i32 32>
				ret <32 x i8> %shuffle
				}
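Going the other way, each byte-element mask in this file is a per-lane shift of i16, i32 or i64 lanes written out as a `shufflevector` mask. The sketch below regenerates such masks, which may help when writing more tests of this kind. `shift_to_mask` is a hypothetical helper. It assumes whole-element shift amounts and emits no undef entries, whereas tests such as `zz01zX4X` deliberately replace some entries with `undef`.

```python
def shift_to_mask(num_elts, elt_bits, lane_bits, shift_bits, is_left):
    """Build the shufflevector mask (second operand zeroinitializer, so
    index num_elts means zero) that a logical per-lane shift implements."""
    scale = lane_bits // elt_bits      # elements per lane
    shift = shift_bits // elt_bits     # whole elements moved
    zero = num_elts
    mask = []
    for base in range(0, num_elts, scale):
        for j in range(scale):
            if is_left:
                mask.append(base + j - shift if j >= shift else zero)
            else:
                mask.append(base + j + shift if j < scale - shift else zero)
    return mask


# vpsllw $8 on <32 x i8>: the zz_00_zz_02_... pattern
print(shift_to_mask(32, 8, 16, 8, is_left=True)[:8])  # [32, 0, 32, 2, 32, 4, 32, 6]
```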

test/CodeGen/X86/vector-shuffle-256-v8.ll

	; AVX1-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[0,0,0,0]
	; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0
	; AVX1-NEXT: retq
	;
	; AVX2-LABEL: splat_v8f32:
	; AVX2: # BB#0:
	; AVX2-NEXT: vbroadcastss %xmm0, %ymm0
	; AVX2-NEXT: retq
	%1 = shufflevector <4 x float> %r, <4 x float> undef, <8 x i32> zeroinitializer
	ret <8 x float> %1
	}

				;
				; Shuffle to logical bit shifts
				;

				define <8 x i32> @shuffle_v8i32_z0U2zUz6(<8 x i32> %a) {
				; AVX1-LABEL: shuffle_v8i32_z0U2zUz6:
				; AVX1: # BB#0:
				; AVX1-NEXT: vxorps %xmm1, %xmm1, %xmm1
				; AVX1-NEXT: vpermilps {{.*#+}} ymm0 = ymm0[0,0,2,2,4,4,6,6]
				; AVX1-NEXT: vblendps {{.*#+}} ymm0 = ymm1[0],ymm0[1,2,3],ymm1[4],ymm0[5],ymm1[6],ymm0[7]
				; AVX1-NEXT: retq
				;
				; AVX2-LABEL: shuffle_v8i32_z0U2zUz6:
				; AVX2: # BB#0:
				; AVX2-NEXT: vpsllq $32, %ymm0
				; AVX2-NEXT: retq
				%shuffle = shufflevector <8 x i32> %a, <8 x i32> zeroinitializer, <8 x i32> <i32 8, i32 0, i32 undef, i32 2, i32 8, i32 undef, i32 8, i32 6>
				ret <8 x i32> %shuffle
				}

				define <8 x i32> @shuffle_v8i32_1U3z5zUU(<8 x i32> %a) {
				; AVX1-LABEL: shuffle_v8i32_1U3z5zUU:
				; AVX1: # BB#0:
				; AVX1-NEXT: vxorps %xmm1, %xmm1, %xmm1
				; AVX1-NEXT: vpermilps {{.*#+}} ymm0 = ymm0[1,1,3,3,5,5,7,7]
				; AVX1-NEXT: vblendps {{.*#+}} ymm0 = ymm0[0,1,2],ymm1[3],ymm0[4],ymm1[5],ymm0[6,7]
				; AVX1-NEXT: retq
				;
				; AVX2-LABEL: shuffle_v8i32_1U3z5zUU:
				; AVX2: # BB#0:
				; AVX2-NEXT: vpsrlq $32, %ymm0
				; AVX2-NEXT: retq
				%shuffle = shufflevector <8 x i32> %a, <8 x i32> zeroinitializer, <8 x i32> <i32 1, i32 undef, i32 3, i32 8, i32 5, i32 8, i32 undef, i32 undef>
				ret <8 x i32> %shuffle
				}
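For these `<8 x i32>` tests, the AVX1 and AVX2 sequences need not produce identical vectors, only identical values in the lanes the mask defines. The small emulation below reads the `vpermilps` indices and `vblendps` zero lanes straight from the `shuffle_v8i32_z0U2zUz6` checks. All function names are hypothetical. The emulation models only these two instructions and the 64-bit shift, not the full x86 semantics.

```python
def permute(v, idx):
    """vpermilps with the absolute indices printed in the check line."""
    return [v[i] for i in idx]


def blend_zero(v, zero_lanes):
    """vblendps against ymm1, which vxorps has cleared to zero."""
    return [0 if i in zero_lanes else x for i, x in enumerate(v)]


def qword_shift32(v, is_left):
    """vpsllq/vpsrlq $32 on i32 elements: each i64 lane holds (lo, hi)."""
    out = []
    for k in range(0, len(v), 2):
        out += [0, v[k]] if is_left else [v[k + 1], 0]
    return out


def agrees(result, mask, v):
    """Compare only lanes the mask defines; undef (-1) lanes may differ."""
    n = len(v)
    return all(r == (v[m] if m < n else 0)
               for r, m in zip(result, mask) if m >= 0)


v = [10, 11, 12, 13, 14, 15, 16, 17]
mask = [8, 0, -1, 2, 8, -1, 8, 6]                      # shuffle_v8i32_z0U2zUz6
avx1 = blend_zero(permute(v, [0, 0, 2, 2, 4, 4, 6, 6]), {0, 4, 6})
avx2 = qword_shift32(v, is_left=True)
print(agrees(avx1, mask, v), agrees(avx2, mask, v), avx1 == avx2)  # True True False
```

In this run, the AVX1 result keeps element 2 in lane 2 while the shift writes zero there. Both are acceptable because the mask marks that lane undef (the `U` in the test name).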