This is an archive of the discontinued LLVM Phabricator instance.

[X86][SSE] Shuffle mask decode support for zero extend, scalar float/double moves and integer load instructions
ClosedPublic

Authored by RKSimon on Jan 28 2015, 9:01 AM.

Details

Reviewers

spatel
qcolombet
chandlerc
andreadb

Commits

rG9c76b4746984: [X86][SSE] Shuffle mask decode support for zero extend, scalar float/double…
rL227688: [X86][SSE] Shuffle mask decode support for zero extend, scalar float/double…

Summary

This patch adds shuffle mask decodes for integer zero extensions (pmovzx** and movq xmm,xmm) and for scalar float/double loads and moves (movss/movsd).

It also adds shuffle mask decodes for integer loads (movd/movq).

Diff Detail

Repository: rL LLVM

Event Timeline

RKSimon updated this revision to Diff 18896. Jan 28 2015, 9:01 AM

RKSimon retitled this revision to [X86][SSE] Shuffle mask decode support for zero extend, scalar float/double moves and integer load instructions.

RKSimon updated this object.

RKSimon edited the test plan for this revision.

RKSimon added reviewers: chandlerc, qcolombet, andreadb, spatel.

RKSimon set the repository for this revision to rL LLVM.

RKSimon added a subscriber: Unknown Object (MLST).

qcolombet added inline comments. Jan 28 2015, 5:33 PM

lib/Target/X86/InstPrinter/X86InstComments.cpp
24	Instead of returning SrcVT, I think, based on the uses of that function, that it would be better to return what you called the scale. If you want to stick with SrcVT, see my next comment.
34	The SrcVT is confusing to me. The used input type is v8i8, not v16i8, although the “register” type is v16i8. If you want to use SrcVT, please clarify what the expected output of this function is. Note: I like the use of SrcVT instead of scale, but I found the types set here confusing. So my suggestion is to do one of the following: (1) return Scale; (2) add a comment explaining that the returned SrcVT is the content of the whole register, not just what is used; or (3) set SrcVT to what is actually used.
lib/Target/X86/Utils/X86ShuffleDecode.cpp
428	Format of the comments. Capital letter and period.

RKSimon mentioned this in D7251: [X86][AVX2] Added support for 256-bit zero extension shuffle matching. Jan 29 2015, 7:25 AM

RKSimon mentioned this in D7256: [X86][SSE] Added general integer shuffle matching for MOVQ instruction. Jan 29 2015, 8:48 AM

Thanks Quentin, I've updated the comments in the patch to explain that the SrcVT type represents the whole register, not just the lower elements that will be zero extended in the DstVT.
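
To make the SrcVT/scale discussion concrete: for pmovzxbw the patch reports SrcVT = v16i8 (the whole xmm register) and DstVT = v8i16, yet only the low 8 bytes are read. A minimal sketch, using a hypothetical VecType struct in place of llvm::MVT, of how both the scale qcolombet suggested returning and the number of source elements actually consumed follow from those two types:

```cpp
#include <cassert>

// Hypothetical stand-in for llvm::MVT: element count and element width only.
struct VecType {
  unsigned NumElts;
  unsigned EltBits;
};

// Zero-extension scale: how many source-sized slots each widened element
// occupies (i8 -> i16 gives 2, i8 -> i64 gives 8).
unsigned zeroExtendScale(VecType Src, VecType Dst) {
  assert(Src.EltBits < Dst.EltBits && "zero extension must widen elements");
  return Dst.EltBits / Src.EltBits;
}

// Source elements actually read. SrcVT describes the whole source register,
// so only its low NumDstElts elements take part in the extension.
unsigned usedSourceElts(VecType Src, VecType Dst) {
  assert(Src.NumElts >= Dst.NumElts && "too many zero extension lanes");
  return Dst.NumElts;
}
```

With the whole-register convention the scale is still recoverable, which is why documenting SrcVT (the option that was taken) is enough for the decoder.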

Hi Simon,

LGTM.

Thanks,
-Quentin

This revision is now accepted and ready to land. Jan 30 2015, 9:48 AM

Closed by commit rL227688: [X86][SSE] Shuffle mask decode support for zero extend, scalar float/double… (authored by RKSimon). Jan 31 2015, 6:11 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

All files are diffed against revision 227566.

Path: Size
lib/Target/X86/InstPrinter/X86InstComments.cpp: 172 lines
lib/Target/X86/Utils/X86ShuffleDecode.h: 10 lines
lib/Target/X86/Utils/X86ShuffleDecode.cpp: 32 lines
lib/Target/X86/X86ISelLowering.cpp: 15 lines
test/CodeGen/X86/vector-shuffle-128-v16.ll: 28 lines
test/CodeGen/X86/vector-shuffle-128-v2.ll: 90 lines
test/CodeGen/X86/vector-shuffle-128-v4.ll: 68 lines
test/CodeGen/X86/vector-shuffle-128-v8.ll: 16 lines
test/CodeGen/X86/vector-shuffle-256-v4.ll: 8 lines
test/CodeGen/X86/vector-shuffle-256-v8.ll: 6 lines

Diff 19028

lib/Target/X86/InstPrinter/X86InstComments.cpp

[15 unchanged lines not shown]
#include "MCTargetDesc/X86MCTargetDesc.h"		#include "MCTargetDesc/X86MCTargetDesc.h"
#include "Utils/X86ShuffleDecode.h"		#include "Utils/X86ShuffleDecode.h"
#include "llvm/MC/MCInst.h"		#include "llvm/MC/MCInst.h"
#include "llvm/CodeGen/MachineValueType.h"		#include "llvm/CodeGen/MachineValueType.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"

using namespace llvm;		using namespace llvm;

		/// \brief Extracts the src/dst types for a given zero extension instruction.
		/// \note While the number of elements in the DstVT type is correct, the
		/// number in the SrcVT type is expanded to fill the src xmm register and the
		/// upper elements may not be included in the dst xmm/ymm register.
		static void getZeroExtensionTypes(const MCInst *MI, MVT &SrcVT, MVT &DstVT) {
		switch (MI->getOpcode()) {
		default:
		llvm_unreachable("Unknown zero extension instruction");
		// i8 zero extension
		case X86::PMOVZXBWrm:
		case X86::PMOVZXBWrr:
		case X86::VPMOVZXBWrm:
		case X86::VPMOVZXBWrr:
		SrcVT = MVT::v16i8;
		DstVT = MVT::v8i16;
		break;
		case X86::VPMOVZXBWYrm:
		case X86::VPMOVZXBWYrr:
		SrcVT = MVT::v16i8;
		DstVT = MVT::v16i16;
		break;
		case X86::PMOVZXBDrm:
		case X86::PMOVZXBDrr:
		case X86::VPMOVZXBDrm:
		case X86::VPMOVZXBDrr:
		SrcVT = MVT::v16i8;
		DstVT = MVT::v4i32;
		break;
		case X86::VPMOVZXBDYrm:
		case X86::VPMOVZXBDYrr:
		SrcVT = MVT::v16i8;
		DstVT = MVT::v8i32;
		break;
		case X86::PMOVZXBQrm:
		case X86::PMOVZXBQrr:
		case X86::VPMOVZXBQrm:
		case X86::VPMOVZXBQrr:
		SrcVT = MVT::v16i8;
		DstVT = MVT::v2i64;
		break;
		case X86::VPMOVZXBQYrm:
		case X86::VPMOVZXBQYrr:
		SrcVT = MVT::v16i8;
		DstVT = MVT::v4i64;
		break;
		// i16 zero extension
		case X86::PMOVZXWDrm:
		case X86::PMOVZXWDrr:
		case X86::VPMOVZXWDrm:
		case X86::VPMOVZXWDrr:
		SrcVT = MVT::v8i16;
		DstVT = MVT::v4i32;
		break;
		case X86::VPMOVZXWDYrm:
		case X86::VPMOVZXWDYrr:
		SrcVT = MVT::v8i16;
		DstVT = MVT::v8i32;
		break;
		case X86::PMOVZXWQrm:
		case X86::PMOVZXWQrr:
		case X86::VPMOVZXWQrm:
		case X86::VPMOVZXWQrr:
		SrcVT = MVT::v8i16;
		DstVT = MVT::v2i64;
		break;
		case X86::VPMOVZXWQYrm:
		case X86::VPMOVZXWQYrr:
		SrcVT = MVT::v8i16;
		DstVT = MVT::v4i64;
		break;
		// i32 zero extension
		case X86::PMOVZXDQrm:
		case X86::PMOVZXDQrr:
		case X86::VPMOVZXDQrm:
		case X86::VPMOVZXDQrr:
		SrcVT = MVT::v4i32;
		DstVT = MVT::v2i64;
		break;
		case X86::VPMOVZXDQYrm:
		case X86::VPMOVZXDQYrr:
		SrcVT = MVT::v4i32;
		DstVT = MVT::v4i64;
		break;
		}
		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Top Level Entrypoint		// Top Level Entrypoint
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

/// EmitAnyX86InstComments - This function decodes x86 instructions and prints		/// EmitAnyX86InstComments - This function decodes x86 instructions and prints
/// newline terminated strings to the specified string if desired. This		/// newline terminated strings to the specified string if desired. This
/// information is shown in disassembly dumps when verbose assembly is enabled.		/// information is shown in disassembly dumps when verbose assembly is enabled.
bool llvm::EmitAnyX86InstComments(const MCInst *MI, raw_ostream &OS,		bool llvm::EmitAnyX86InstComments(const MCInst *MI, raw_ostream &OS,
[713 unchanged lines not shown; context: case X86::VPERMPDYri:]
// FALL THROUGH.		// FALL THROUGH.
case X86::VPERMQYmi:		case X86::VPERMQYmi:
case X86::VPERMPDYmi:		case X86::VPERMPDYmi:
if(MI->getOperand(MI->getNumOperands()-1).isImm())		if(MI->getOperand(MI->getNumOperands()-1).isImm())
DecodeVPERMMask(MI->getOperand(MI->getNumOperands()-1).getImm(),		DecodeVPERMMask(MI->getOperand(MI->getNumOperands()-1).getImm(),
ShuffleMask);		ShuffleMask);
DestName = getRegName(MI->getOperand(0).getReg());		DestName = getRegName(MI->getOperand(0).getReg());
break;		break;

		case X86::MOVSDrr:
		case X86::VMOVSDrr:
		Src2Name = getRegName(MI->getOperand(2).getReg());
		// FALL THROUGH.
		case X86::MOVSDrm:
		case X86::VMOVSDrm:
		DecodeScalarMoveMask(MVT::v2f64, nullptr == Src2Name, ShuffleMask);
		Src1Name = getRegName(MI->getOperand(1).getReg());
		DestName = getRegName(MI->getOperand(0).getReg());
		break;
		case X86::MOVSSrr:
		case X86::VMOVSSrr:
		Src2Name = getRegName(MI->getOperand(2).getReg());
		// FALL THROUGH.
		case X86::MOVSSrm:
		case X86::VMOVSSrm:
		DecodeScalarMoveMask(MVT::v4f32, nullptr == Src2Name, ShuffleMask);
		Src1Name = getRegName(MI->getOperand(1).getReg());
		DestName = getRegName(MI->getOperand(0).getReg());
		break;

		case X86::MOVPQI2QIrr:
		case X86::MOVZPQILo2PQIrr:
		case X86::VMOVPQI2QIrr:
		case X86::VMOVZPQILo2PQIrr:
		Src1Name = getRegName(MI->getOperand(1).getReg());
		// FALL THROUGH.
		case X86::MOVQI2PQIrm:
		case X86::MOVZQI2PQIrm:
		case X86::MOVZPQILo2PQIrm:
		case X86::VMOVQI2PQIrm:
		case X86::VMOVZQI2PQIrm:
		case X86::VMOVZPQILo2PQIrm:
		DecodeZeroMoveLowMask(MVT::v2i64, ShuffleMask);
		DestName = getRegName(MI->getOperand(0).getReg());
		break;
		case X86::MOVDI2PDIrm:
		case X86::VMOVDI2PDIrm:
		DecodeZeroMoveLowMask(MVT::v4i32, ShuffleMask);
		DestName = getRegName(MI->getOperand(0).getReg());
		break;

		case X86::PMOVZXBWrr:
		case X86::PMOVZXBDrr:
		case X86::PMOVZXBQrr:
		case X86::PMOVZXWDrr:
		case X86::PMOVZXWQrr:
		case X86::PMOVZXDQrr:
		case X86::VPMOVZXBWrr:
		case X86::VPMOVZXBDrr:
		case X86::VPMOVZXBQrr:
		case X86::VPMOVZXWDrr:
		case X86::VPMOVZXWQrr:
		case X86::VPMOVZXDQrr:
		case X86::VPMOVZXBWYrr:
		case X86::VPMOVZXBDYrr:
		case X86::VPMOVZXBQYrr:
		case X86::VPMOVZXWDYrr:
		case X86::VPMOVZXWQYrr:
		case X86::VPMOVZXDQYrr:
		Src1Name = getRegName(MI->getOperand(1).getReg());
		// FALL THROUGH.
		case X86::PMOVZXBWrm:
		case X86::PMOVZXBDrm:
		case X86::PMOVZXBQrm:
		case X86::PMOVZXWDrm:
		case X86::PMOVZXWQrm:
		case X86::PMOVZXDQrm:
		case X86::VPMOVZXBWrm:
		case X86::VPMOVZXBDrm:
		case X86::VPMOVZXBQrm:
		case X86::VPMOVZXWDrm:
		case X86::VPMOVZXWQrm:
		case X86::VPMOVZXDQrm:
		case X86::VPMOVZXBWYrm:
		case X86::VPMOVZXBDYrm:
		case X86::VPMOVZXBQYrm:
		case X86::VPMOVZXWDYrm:
		case X86::VPMOVZXWQYrm:
		case X86::VPMOVZXDQYrm: {
		MVT SrcVT, DstVT;
		getZeroExtensionTypes(MI, SrcVT, DstVT);
		DecodeZeroExtendMask(SrcVT, DstVT, ShuffleMask);
		DestName = getRegName(MI->getOperand(0).getReg());
		} break;
}		}

// The only comments we decode are shuffles, so give up if we were unable to		// The only comments we decode are shuffles, so give up if we were unable to
// decode a shuffle mask.		// decode a shuffle mask.
if (ShuffleMask.empty())		if (ShuffleMask.empty())
return false;		return false;

if (!DestName) DestName = Src1Name;		if (!DestName) DestName = Src1Name;
[50 unchanged lines not shown]
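
getZeroExtensionTypes is written as one large switch, but the table it encodes follows a simple rule: the first suffix letter gives the source element width, the second gives the destination element width, the source type always fills a 128-bit xmm register, and the Y forms produce a 256-bit result. A compact sketch of that rule, as a hypothetical helper keyed on the suffix string (not how the patch is written), that reproduces the same SrcVT/DstVT pairs:

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for llvm::MVT.
struct VecTy {
  unsigned NumElts;
  unsigned EltBits;
  bool operator==(const VecTy &O) const {
    return NumElts == O.NumElts && EltBits == O.EltBits;
  }
};

// Element width encoded by one PMOVZX suffix letter.
unsigned suffixBits(char C) {
  switch (C) {
  case 'B': return 8;
  case 'W': return 16;
  case 'D': return 32;
  case 'Q': return 64;
  }
  assert(false && "unknown PMOVZX suffix letter");
  return 0;
}

// Derive (SrcVT, DstVT) from a two-letter suffix such as "BW", "BQ" or "DQ"
// and whether the instruction is the 256-bit Y form. SrcVT always describes
// the full 128-bit source register, matching the \note in the patch.
void zeroExtensionTypes(const std::string &Suffix, bool IsYmm, VecTy &Src,
                        VecTy &Dst) {
  assert(Suffix.size() == 2 && "expected a two-letter suffix");
  unsigned SrcBits = suffixBits(Suffix[0]);
  unsigned DstBits = suffixBits(Suffix[1]);
  Src = {128 / SrcBits, SrcBits};
  Dst = {(IsYmm ? 256u : 128u) / DstBits, DstBits};
}
```

For example, "BQ" with IsYmm set gives v16i8 and v4i64, the pair assigned to VPMOVZXBQYrr/VPMOVZXBQYrm in the diff.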

lib/Target/X86/Utils/X86ShuffleDecode.h

	[84 unchanged lines not shown]

	/// DecodeVPERMMask - this decodes the shuffle masks for VPERMQ/VPERMPD.			/// DecodeVPERMMask - this decodes the shuffle masks for VPERMQ/VPERMPD.
	/// No VT provided since it only works on 256-bit, 4 element vectors.			/// No VT provided since it only works on 256-bit, 4 element vectors.
	void DecodeVPERMMask(unsigned Imm, SmallVectorImpl<int> &ShuffleMask);			void DecodeVPERMMask(unsigned Imm, SmallVectorImpl<int> &ShuffleMask);

	/// \brief Decode a VPERMILP variable mask from an IR-level vector constant.			/// \brief Decode a VPERMILP variable mask from an IR-level vector constant.
	void DecodeVPERMILPMask(const Constant *C, SmallVectorImpl<int> &ShuffleMask);			void DecodeVPERMILPMask(const Constant *C, SmallVectorImpl<int> &ShuffleMask);

				/// \brief Decode a zero extension instruction as a shuffle mask.
				void DecodeZeroExtendMask(MVT SrcVT, MVT DstVT,
				SmallVectorImpl<int> &ShuffleMask);

				/// \brief Decode a move lower and zero upper instruction as a shuffle mask.
				void DecodeZeroMoveLowMask(MVT VT, SmallVectorImpl<int> &ShuffleMask);

				/// \brief Decode a scalar float move instruction as a shuffle mask.
				void DecodeScalarMoveMask(MVT VT, bool IsLoad,
				SmallVectorImpl<int> &ShuffleMask);
	} // llvm namespace			} // llvm namespace

	#endif			#endif
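
The IsLoad parameter of DecodeScalarMoveMask is not read from the instruction directly: in X86InstComments.cpp the printer passes nullptr == Src2Name, because only the register forms (MOVSSrr/MOVSDrr and their VEX versions) record a second source name before falling through. A small sketch of that selection for 4-lane (movss) and 2-lane (movsd) masks; the std::string pointer and the ZeroLane value are assumptions standing in for the printer's state and SM_SentinelZero:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Local stand-in for SM_SentinelZero; the value is an assumption of this sketch.
constexpr int ZeroLane = -2;

// Register forms pass a non-null second source name; memory forms pass null,
// and a missing name is what selects the zero-upper "load" mask.
std::vector<int> scalarMoveMaskFor(unsigned NumElts,
                                   const std::string *Src2Name) {
  const bool IsLoad = (Src2Name == nullptr);
  // Lane 0 always comes from the second source (index NumElts).
  std::vector<int> Mask{static_cast<int>(NumElts)};
  for (unsigned i = 1; i < NumElts; ++i)
    Mask.push_back(IsLoad ? ZeroLane : static_cast<int>(i)); // zeroed or kept
  return Mask;
}
```

A register move therefore keeps the upper lanes of the first source, while a load zeroes them, which is the architectural difference between movss xmm,xmm and movss xmm,mem.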

lib/Target/X86/Utils/X86ShuffleDecode.cpp

[393 unchanged lines not shown; context: for (int i = 0; i < NumElements; ++i) {]
uint64_t Element = cast<ConstantInt>(COp)->getZExtValue();		uint64_t Element = cast<ConstantInt>(COp)->getZExtValue();
// Only the least significant 2 bits of the integer are used.		// Only the least significant 2 bits of the integer are used.
int Index = Base + (Element & 0x3);		int Index = Base + (Element & 0x3);
ShuffleMask.push_back(Index);		ShuffleMask.push_back(Index);
}		}
}		}
}		}

		void DecodeZeroExtendMask(MVT SrcVT, MVT DstVT, SmallVectorImpl<int> &Mask) {
		unsigned NumSrcElts = SrcVT.getVectorNumElements();
		unsigned NumDstElts = DstVT.getVectorNumElements();
		unsigned SrcScalarBits = SrcVT.getScalarSizeInBits();
		unsigned DstScalarBits = DstVT.getScalarSizeInBits();
		unsigned Scale = DstScalarBits / SrcScalarBits;
		assert(SrcScalarBits < DstScalarBits &&
		"Expected zero extension mask to increase scalar size");
		assert(NumSrcElts >= NumDstElts && "Too many zero extension lanes");

		for (unsigned i = 0; i != NumDstElts; i++) {
		Mask.push_back(i);
		for (unsigned j = 1; j != Scale; j++)
		Mask.push_back(SM_SentinelZero);
		}
		}

		void DecodeZeroMoveLowMask(MVT VT, SmallVectorImpl<int> &ShuffleMask) {
		unsigned NumElts = VT.getVectorNumElements();
		ShuffleMask.push_back(0);
		for (unsigned i = 1; i < NumElts; i++)
		ShuffleMask.push_back(SM_SentinelZero);
		}

		void DecodeScalarMoveMask(MVT VT, bool IsLoad, SmallVectorImpl<int> &Mask) {
		// First element comes from the first element of second source.
		// Remaining elements: Load zero extends / Move copies from first source.
		unsigned NumElts = VT.getVectorNumElements();
		Mask.push_back(NumElts);
		for (unsigned i = 1; i < NumElts; i++)
		Mask.push_back(IsLoad ? SM_SentinelZero : i);
		}
} // llvm namespace		} // llvm namespace
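
For readers without an LLVM build, a self-contained sketch of the three new decoders. It replaces MVT with a hypothetical VecTy {NumElts, EltBits} pair, SmallVectorImpl<int> with std::vector<int>, and SM_SentinelZero with a local ZeroLane constant whose value is an assumption; the mask logic follows the X86ShuffleDecode.cpp diff.

```cpp
#include <cassert>
#include <vector>

constexpr int ZeroLane = -2; // plays the role of SM_SentinelZero

// Hypothetical stand-in for llvm::MVT.
struct VecTy {
  unsigned NumElts;
  unsigned EltBits;
};

// pmovzx*: the mask is in source-element units. Each of the low NumDstElts
// source elements is followed by (Scale - 1) zero lanes.
std::vector<int> decodeZeroExtendMask(VecTy Src, VecTy Dst) {
  assert(Src.EltBits < Dst.EltBits && "zero extension must widen elements");
  assert(Src.NumElts >= Dst.NumElts && "too many zero extension lanes");
  unsigned Scale = Dst.EltBits / Src.EltBits;
  std::vector<int> Mask;
  for (unsigned i = 0; i != Dst.NumElts; ++i) {
    Mask.push_back(static_cast<int>(i));
    for (unsigned j = 1; j != Scale; ++j)
      Mask.push_back(ZeroLane);
  }
  return Mask;
}

// movq xmm,xmm and movd/movq loads: keep element 0, zero the rest.
std::vector<int> decodeZeroMoveLowMask(unsigned NumElts) {
  std::vector<int> Mask{0};
  for (unsigned i = 1; i < NumElts; ++i)
    Mask.push_back(ZeroLane);
  return Mask;
}

// movss/movsd: element 0 comes from the second source (index NumElts); the
// other elements are zeroed by a load or copied from the first source by a
// register move.
std::vector<int> decodeScalarMoveMask(unsigned NumElts, bool IsLoad) {
  std::vector<int> Mask{static_cast<int>(NumElts)};
  for (unsigned i = 1; i < NumElts; ++i)
    Mask.push_back(IsLoad ? ZeroLane : static_cast<int>(i));
  return Mask;
}
```

For example, decodeZeroExtendMask({16, 8}, {2, 64}) (pmovzxbq) yields 0 followed by seven zero lanes, then 1 followed by seven zero lanes, which is the xmm0[0],zero,...,xmm0[1],zero,... comment the updated tests expect.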

lib/Target/X86/X86ISelLowering.cpp

[5,511 unchanged lines not shown; context: case X86ISD::PSHUFB: {]
return false;		return false;
}		}
case X86ISD::VPERMI:		case X86ISD::VPERMI:
ImmN = N->getOperand(N->getNumOperands()-1);		ImmN = N->getOperand(N->getNumOperands()-1);
DecodeVPERMMask(cast<ConstantSDNode>(ImmN)->getZExtValue(), Mask);		DecodeVPERMMask(cast<ConstantSDNode>(ImmN)->getZExtValue(), Mask);
IsUnary = true;		IsUnary = true;
break;		break;
case X86ISD::MOVSS:		case X86ISD::MOVSS:
case X86ISD::MOVSD: {		case X86ISD::MOVSD:
// The index 0 always comes from the first element of the second source,		DecodeScalarMoveMask(VT, /* IsLoad */ false, Mask);
// this is why MOVSS and MOVSD are used in the first place. The other
// elements come from the other positions of the first source vector
Mask.push_back(NumElems);
for (unsigned i = 1; i != NumElems; ++i) {
Mask.push_back(i);
}
break;		break;
}
case X86ISD::VPERM2X128:		case X86ISD::VPERM2X128:
ImmN = N->getOperand(N->getNumOperands()-1);		ImmN = N->getOperand(N->getNumOperands()-1);
DecodeVPERM2X128Mask(VT, cast<ConstantSDNode>(ImmN)->getZExtValue(), Mask);		DecodeVPERM2X128Mask(VT, cast<ConstantSDNode>(ImmN)->getZExtValue(), Mask);
if (Mask.empty()) return false;		if (Mask.empty()) return false;
break;		break;
case X86ISD::MOVSLDUP:		case X86ISD::MOVSLDUP:
DecodeMOVSLDUPMask(VT, Mask);		DecodeMOVSLDUPMask(VT, Mask);
IsUnary = true;		IsUnary = true;
[19,216 unchanged lines not shown; context: else {]
SmallVector<SDValue, 16> Ops(NumConcat);		SmallVector<SDValue, 16> Ops(NumConcat);
SDValue ZeroVal = DAG.getConstant(0, Mask.getValueType());		SDValue ZeroVal = DAG.getConstant(0, Mask.getValueType());
Ops[0] = Mask;		Ops[0] = Mask;
for (unsigned i = 1; i != NumConcat; ++i)		for (unsigned i = 1; i != NumConcat; ++i)
Ops[i] = ZeroVal;		Ops[i] = ZeroVal;

NewMask = DAG.getNode(ISD::CONCAT_VECTORS, dl, NewMaskVT, Ops);		NewMask = DAG.getNode(ISD::CONCAT_VECTORS, dl, NewMaskVT, Ops);
}		}

SDValue WideLd = DAG.getMaskedLoad(WideVecVT, dl, Mld->getChain(),		SDValue WideLd = DAG.getMaskedLoad(WideVecVT, dl, Mld->getChain(),
Mld->getBasePtr(), NewMask, WideSrc0,		Mld->getBasePtr(), NewMask, WideSrc0,
Mld->getMemoryVT(), Mld->getMemOperand(),		Mld->getMemoryVT(), Mld->getMemOperand(),
ISD::NON_EXTLOAD);		ISD::NON_EXTLOAD);
SDValue NewVec = DAG.getNode(X86ISD::VSEXT, dl, VT, WideLd);		SDValue NewVec = DAG.getNode(X86ISD::VSEXT, dl, VT, WideLd);
return DCI.CombineTo(N, NewVec, WideLd.getValue(1), true);		return DCI.CombineTo(N, NewVec, WideLd.getValue(1), true);

}		}
[13 unchanged lines not shown; context: static SDValue PerformMSTORECombine(SDNode *N, SelectionDAG &DAG,]
unsigned FromSz = VT.getVectorElementType().getSizeInBits();		unsigned FromSz = VT.getVectorElementType().getSizeInBits();
unsigned ToSz = StVT.getVectorElementType().getSizeInBits();		unsigned ToSz = StVT.getVectorElementType().getSizeInBits();

// From, To sizes and ElemCount must be pow of two		// From, To sizes and ElemCount must be pow of two
assert (isPowerOf2_32(NumElems * FromSz * ToSz) &&		assert (isPowerOf2_32(NumElems * FromSz * ToSz) &&
"Unexpected size for truncating masked store");		"Unexpected size for truncating masked store");
// We are going to use the original vector elt for storing.		// We are going to use the original vector elt for storing.
// Accumulated smaller vector elements must be a multiple of the store size.		// Accumulated smaller vector elements must be a multiple of the store size.
assert (((NumElems * FromSz) % ToSz) == 0 &&		assert (((NumElems * FromSz) % ToSz) == 0 &&
"Unexpected ratio for truncating masked store");		"Unexpected ratio for truncating masked store");

unsigned SizeRatio = FromSz / ToSz;		unsigned SizeRatio = FromSz / ToSz;
assert(SizeRatio * NumElems * ToSz == VT.getSizeInBits());		assert(SizeRatio * NumElems * ToSz == VT.getSizeInBits());

// Create a type on which we perform the shuffle		// Create a type on which we perform the shuffle
EVT WideVecVT = EVT::getVectorVT(*DAG.getContext(),		EVT WideVecVT = EVT::getVectorVT(*DAG.getContext(),
StVT.getScalarType(), NumElems*SizeRatio);		StVT.getScalarType(), NumElems*SizeRatio);
[2,007 unchanged lines not shown]
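
With this change the X86ISD::MOVSS/MOVSD case in X86ISelLowering.cpp calls DecodeScalarMoveMask(VT, /* IsLoad */ false, Mask) instead of building the mask inline, so the two must agree for every element count. A minimal check of that equivalence, with both versions simplified (as an assumption) to take an element count rather than an MVT and to return a std::vector<int>:

```cpp
#include <cassert>
#include <vector>

// The inline loop removed from the X86ISD::MOVSS/MOVSD case.
std::vector<int> oldInlineMovsMask(unsigned NumElems) {
  std::vector<int> Mask;
  Mask.push_back(static_cast<int>(NumElems));
  for (unsigned i = 1; i != NumElems; ++i)
    Mask.push_back(static_cast<int>(i));
  return Mask;
}

// Simplified stand-in for DecodeScalarMoveMask(VT, /*IsLoad=*/false, Mask).
std::vector<int> scalarMoveRegMask(unsigned NumElts) {
  std::vector<int> Mask{static_cast<int>(NumElts)};
  for (unsigned i = 1; i < NumElts; ++i)
    Mask.push_back(static_cast<int>(i));
  return Mask;
}

// True if the refactor preserves the mask for the given element count.
bool refactorPreservesMask(unsigned NumElems) {
  return oldInlineMovsMask(NumElems) == scalarMoveRegMask(NumElems);
}
```

Only the register (IsLoad = false) form is reachable here, since X86ISD::MOVSS/MOVSD nodes take two vector operands.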

test/CodeGen/X86/vector-shuffle-128-v16.ll

	[344 unchanged lines not shown]
	; SSE2: # BB#0:			; SSE2: # BB#0:
	; SSE2-NEXT: pxor %xmm2, %xmm2			; SSE2-NEXT: pxor %xmm2, %xmm2
	; SSE2-NEXT: movdqa %xmm1, %xmm3			; SSE2-NEXT: movdqa %xmm1, %xmm3
	; SSE2-NEXT: punpcklbw {{.*#+}} xmm3 = xmm3[0],xmm2[0],xmm3[1],xmm2[1],xmm3[2],xmm2[2],xmm3[3],xmm2[3],xmm3[4],xmm2[4],xmm3[5],xmm2[5],xmm3[6],xmm2[6],xmm3[7],xmm2[7]			; SSE2-NEXT: punpcklbw {{.*#+}} xmm3 = xmm3[0],xmm2[0],xmm3[1],xmm2[1],xmm3[2],xmm2[2],xmm3[3],xmm2[3],xmm3[4],xmm2[4],xmm3[5],xmm2[5],xmm3[6],xmm2[6],xmm3[7],xmm2[7]
	; SSE2-NEXT: pshufhw {{.*#+}} xmm3 = xmm3[0,1,2,3,7,6,5,4]			; SSE2-NEXT: pshufhw {{.*#+}} xmm3 = xmm3[0,1,2,3,7,6,5,4]
	; SSE2-NEXT: movdqa %xmm0, %xmm4			; SSE2-NEXT: movdqa %xmm0, %xmm4
	; SSE2-NEXT: punpckhbw {{.*#+}} xmm4 = xmm4[8],xmm2[8],xmm4[9],xmm2[9],xmm4[10],xmm2[10],xmm4[11],xmm2[11],xmm4[12],xmm2[12],xmm4[13],xmm2[13],xmm4[14],xmm2[14],xmm4[15],xmm2[15]			; SSE2-NEXT: punpckhbw {{.*#+}} xmm4 = xmm4[8],xmm2[8],xmm4[9],xmm2[9],xmm4[10],xmm2[10],xmm4[11],xmm2[11],xmm4[12],xmm2[12],xmm4[13],xmm2[13],xmm4[14],xmm2[14],xmm4[15],xmm2[15]
	; SSE2-NEXT: pshuflw {{.*#+}} xmm4 = xmm4[3,2,1,0,4,5,6,7]			; SSE2-NEXT: pshuflw {{.*#+}} xmm4 = xmm4[3,2,1,0,4,5,6,7]
	; SSE2-NEXT: movsd %xmm4, %xmm3			; SSE2-NEXT: movsd {{.*#+}} xmm3 = xmm4[0],xmm3[1]
	; SSE2-NEXT: punpckhbw {{.*#+}} xmm1 = xmm1[8],xmm2[8],xmm1[9],xmm2[9],xmm1[10],xmm2[10],xmm1[11],xmm2[11],xmm1[12],xmm2[12],xmm1[13],xmm2[13],xmm1[14],xmm2[14],xmm1[15],xmm2[15]			; SSE2-NEXT: punpckhbw {{.*#+}} xmm1 = xmm1[8],xmm2[8],xmm1[9],xmm2[9],xmm1[10],xmm2[10],xmm1[11],xmm2[11],xmm1[12],xmm2[12],xmm1[13],xmm2[13],xmm1[14],xmm2[14],xmm1[15],xmm2[15]
	; SSE2-NEXT: pshufhw {{.*#+}} xmm1 = xmm1[0,1,2,3,7,6,5,4]			; SSE2-NEXT: pshufhw {{.*#+}} xmm1 = xmm1[0,1,2,3,7,6,5,4]
	; SSE2-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1],xmm0[2],xmm2[2],xmm0[3],xmm2[3],xmm0[4],xmm2[4],xmm0[5],xmm2[5],xmm0[6],xmm2[6],xmm0[7],xmm2[7]			; SSE2-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1],xmm0[2],xmm2[2],xmm0[3],xmm2[3],xmm0[4],xmm2[4],xmm0[5],xmm2[5],xmm0[6],xmm2[6],xmm0[7],xmm2[7]
	; SSE2-NEXT: pshuflw {{.*#+}} xmm0 = xmm0[3,2,1,0,4,5,6,7]			; SSE2-NEXT: pshuflw {{.*#+}} xmm0 = xmm0[3,2,1,0,4,5,6,7]
	; SSE2-NEXT: movsd %xmm0, %xmm1			; SSE2-NEXT: movsd {{.*#+}} xmm1 = xmm0[0],xmm1[1]
	; SSE2-NEXT: packuswb %xmm3, %xmm1			; SSE2-NEXT: packuswb %xmm3, %xmm1
	; SSE2-NEXT: movdqa %xmm1, %xmm0			; SSE2-NEXT: movdqa %xmm1, %xmm0
	; SSE2-NEXT: retq			; SSE2-NEXT: retq
	;			;
	; SSSE3-LABEL: shuffle_v16i8_03_02_01_00_31_30_29_28_11_10_09_08_23_22_21_20:			; SSSE3-LABEL: shuffle_v16i8_03_02_01_00_31_30_29_28_11_10_09_08_23_22_21_20:
	; SSSE3: # BB#0:			; SSSE3: # BB#0:
	; SSSE3-NEXT: pshufb {{.*#+}} xmm1 = zero,zero,zero,zero,xmm1[15,14,13,12],zero,zero,zero,zero,xmm1[7,6,5,4]			; SSSE3-NEXT: pshufb {{.*#+}} xmm1 = zero,zero,zero,zero,xmm1[15,14,13,12],zero,zero,zero,zero,xmm1[7,6,5,4]
	; SSSE3-NEXT: pshufb {{.*#+}} xmm0 = xmm0[3,2,1,0],zero,zero,zero,zero,xmm0[11,10,9,8],zero,zero,zero,zero			; SSSE3-NEXT: pshufb {{.*#+}} xmm0 = xmm0[3,2,1,0],zero,zero,zero,zero,xmm0[11,10,9,8],zero,zero,zero,zero
	[428 unchanged lines not shown]
	;			;
	; SSSE3-LABEL: shuffle_v16i8_00_uu_uu_uu_uu_uu_uu_uu_01_uu_uu_uu_uu_uu_uu_uu:			; SSSE3-LABEL: shuffle_v16i8_00_uu_uu_uu_uu_uu_uu_uu_01_uu_uu_uu_uu_uu_uu_uu:
	; SSSE3: # BB#0:			; SSSE3: # BB#0:
	; SSSE3-NEXT: pshufb {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,zero,zero,zero,zero,xmm0[1],zero,zero,zero,zero,zero,zero,zero			; SSSE3-NEXT: pshufb {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,zero,zero,zero,zero,xmm0[1],zero,zero,zero,zero,zero,zero,zero
	; SSSE3-NEXT: retq			; SSSE3-NEXT: retq
	;			;
	; SSE41-LABEL: shuffle_v16i8_00_uu_uu_uu_uu_uu_uu_uu_01_uu_uu_uu_uu_uu_uu_uu:			; SSE41-LABEL: shuffle_v16i8_00_uu_uu_uu_uu_uu_uu_uu_01_uu_uu_uu_uu_uu_uu_uu:
	; SSE41: # BB#0:			; SSE41: # BB#0:
	; SSE41-NEXT: pmovzxbq %xmm0, %xmm0			; SSE41-NEXT: pmovzxbq {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,zero,zero,zero,zero,xmm0[1],zero,zero,zero,zero,zero,zero,zero
	; SSE41-NEXT: retq			; SSE41-NEXT: retq
	;			;
	; AVX-LABEL: shuffle_v16i8_00_uu_uu_uu_uu_uu_uu_uu_01_uu_uu_uu_uu_uu_uu_uu:			; AVX-LABEL: shuffle_v16i8_00_uu_uu_uu_uu_uu_uu_uu_01_uu_uu_uu_uu_uu_uu_uu:
	; AVX: # BB#0:			; AVX: # BB#0:
	; AVX-NEXT: vpmovzxbq %xmm0, %xmm0			; AVX-NEXT: vpmovzxbq {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,zero,zero,zero,zero,xmm0[1],zero,zero,zero,zero,zero,zero,zero
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%shuffle = shufflevector <16 x i8> %a, <16 x i8> zeroinitializer, <16 x i32> <i32 0, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			%shuffle = shufflevector <16 x i8> %a, <16 x i8> zeroinitializer, <16 x i32> <i32 0, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	ret <16 x i8> %shuffle			ret <16 x i8> %shuffle
	}			}
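
The updated CHECK lines rely on the printer rendering each decoded mask as a comment. A sketch of that rendering, written only to reproduce the strings in these tests and not taken from EmitAnyX86InstComments: zero lanes print as "zero", and consecutive lanes from the same source are grouped under one register name. It does not attempt the same-register folding visible in other CHECK lines (for example xmm0[0,0,1,1,...]), and printing undef as "u" is this sketch's own choice.

```cpp
#include <cassert>
#include <string>
#include <vector>

constexpr int ZeroLane = -2; // local stand-in for SM_SentinelZero (assumed value)

// Indices [0, N) refer to Src1 and [N, 2N) to Src2, where N is the mask
// length; runs of lanes from the same source share one "reg[...]" group.
std::string formatShuffleComment(const std::vector<int> &Mask,
                                 const std::string &Src1,
                                 const std::string &Src2) {
  const int N = static_cast<int>(Mask.size());
  std::string Out;
  int i = 0;
  while (i < N) {
    if (!Out.empty())
      Out += ',';
    if (Mask[i] < 0) {
      // Zero lanes print as "zero"; any other negative (undef) prints as "u".
      Out += (Mask[i] == ZeroLane) ? "zero" : "u";
      ++i;
      continue;
    }
    const bool FromSrc2 = Mask[i] >= N;
    Out += FromSrc2 ? Src2 : Src1;
    Out += '[';
    bool First = true;
    // Extend the group while lanes keep coming from the same source.
    while (i < N && Mask[i] >= 0 && (Mask[i] >= N) == FromSrc2) {
      if (!First)
        Out += ',';
      Out += std::to_string(Mask[i] % N);
      First = false;
      ++i;
    }
    Out += ']';
  }
  return Out;
}
```

Fed the pmovzxbq mask, it produces the xmm0[0],zero,...,xmm0[1],zero,... string checked above; fed the movsd register mask {2, 1} with Src1 = xmm3 and Src2 = xmm4, it produces xmm4[0],xmm3[1].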

	define <16 x i8> @shuffle_v16i8_00_zz_zz_zz_zz_zz_zz_zz_01_zz_zz_zz_zz_zz_zz_zz(<16 x i8> %a) {			define <16 x i8> @shuffle_v16i8_00_zz_zz_zz_zz_zz_zz_zz_01_zz_zz_zz_zz_zz_zz_zz(<16 x i8> %a) {
	; SSE2-LABEL: shuffle_v16i8_00_zz_zz_zz_zz_zz_zz_zz_01_zz_zz_zz_zz_zz_zz_zz:			; SSE2-LABEL: shuffle_v16i8_00_zz_zz_zz_zz_zz_zz_zz_01_zz_zz_zz_zz_zz_zz_zz:
	; SSE2: # BB#0:			; SSE2: # BB#0:
	; SSE2-NEXT: pxor %xmm1, %xmm1			; SSE2-NEXT: pxor %xmm1, %xmm1
	; SSE2-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3],xmm0[4],xmm1[4],xmm0[5],xmm1[5],xmm0[6],xmm1[6],xmm0[7],xmm1[7]			; SSE2-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3],xmm0[4],xmm1[4],xmm0[5],xmm1[5],xmm0[6],xmm1[6],xmm0[7],xmm1[7]
	; SSE2-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3]			; SSE2-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3]
	; SSE2-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]			; SSE2-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
	; SSE2-NEXT: retq			; SSE2-NEXT: retq
	;			;
	; SSSE3-LABEL: shuffle_v16i8_00_zz_zz_zz_zz_zz_zz_zz_01_zz_zz_zz_zz_zz_zz_zz:			; SSSE3-LABEL: shuffle_v16i8_00_zz_zz_zz_zz_zz_zz_zz_01_zz_zz_zz_zz_zz_zz_zz:
	; SSSE3: # BB#0:			; SSSE3: # BB#0:
	; SSSE3-NEXT: pshufb {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,zero,zero,zero,zero,xmm0[1],zero,zero,zero,zero,zero,zero,zero			; SSSE3-NEXT: pshufb {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,zero,zero,zero,zero,xmm0[1],zero,zero,zero,zero,zero,zero,zero
	; SSSE3-NEXT: retq			; SSSE3-NEXT: retq
	;			;
	; SSE41-LABEL: shuffle_v16i8_00_zz_zz_zz_zz_zz_zz_zz_01_zz_zz_zz_zz_zz_zz_zz:			; SSE41-LABEL: shuffle_v16i8_00_zz_zz_zz_zz_zz_zz_zz_01_zz_zz_zz_zz_zz_zz_zz:
	; SSE41: # BB#0:			; SSE41: # BB#0:
	; SSE41-NEXT: pmovzxbq %xmm0, %xmm0			; SSE41-NEXT: pmovzxbq {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,zero,zero,zero,zero,xmm0[1],zero,zero,zero,zero,zero,zero,zero
	; SSE41-NEXT: retq			; SSE41-NEXT: retq
	;			;
	; AVX-LABEL: shuffle_v16i8_00_zz_zz_zz_zz_zz_zz_zz_01_zz_zz_zz_zz_zz_zz_zz:			; AVX-LABEL: shuffle_v16i8_00_zz_zz_zz_zz_zz_zz_zz_01_zz_zz_zz_zz_zz_zz_zz:
	; AVX: # BB#0:			; AVX: # BB#0:
	; AVX-NEXT: vpmovzxbq %xmm0, %xmm0			; AVX-NEXT: vpmovzxbq {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,zero,zero,zero,zero,xmm0[1],zero,zero,zero,zero,zero,zero,zero
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%shuffle = shufflevector <16 x i8> %a, <16 x i8> zeroinitializer, <16 x i32> <i32 0, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 1, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>			%shuffle = shufflevector <16 x i8> %a, <16 x i8> zeroinitializer, <16 x i32> <i32 0, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 1, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
	ret <16 x i8> %shuffle			ret <16 x i8> %shuffle
	}			}
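
The _uu_ and _zz_ tests differ only in whether the padding lanes are undef or explicitly taken from zeroinitializer, and both now print the same pmovzxbq comment. A hedged sketch of why the decoded mask describes both IR shuffles: it compares each shufflevector mask element with the decoded pmovzxbq mask, treating indices 16 to 31 (lanes of the zero vector) as zero and undef as a wildcard. The function name and sentinel values are illustrative assumptions.

```cpp
#include <cassert>
#include <vector>

constexpr int UndefLane = -1; // stands in for an undef IR mask element
constexpr int ZeroLane = -2;  // decoded zero lane (stand-in for SM_SentinelZero)

// IRMask indexes the concatenation of %a and zeroinitializer, so any index
// >= NumElts selects a zero; it matches a decoded ZeroLane. Undef matches
// anything; other indices must match the decoded mask exactly.
bool matchesDecodedMask(const std::vector<int> &IRMask,
                        const std::vector<int> &Decoded) {
  if (IRMask.size() != Decoded.size())
    return false;
  const int NumElts = static_cast<int>(IRMask.size());
  for (int i = 0; i < NumElts; ++i) {
    const int M = IRMask[i];
    if (M == UndefLane)
      continue;
    if (M >= NumElts) {
      if (Decoded[i] != ZeroLane)
        return false;
    } else if (M != Decoded[i]) {
      return false;
    }
  }
  return true;
}
```

Swapping elements 0 and 1 of the IR mask breaks the match, as expected for a zero extension, which only reads source lanes in order.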

	define <16 x i8> @shuffle_v16i8_00_uu_uu_uu_01_uu_uu_uu_02_uu_uu_uu_03_uu_uu_uu(<16 x i8> %a) {			define <16 x i8> @shuffle_v16i8_00_uu_uu_uu_01_uu_uu_uu_02_uu_uu_uu_03_uu_uu_uu(<16 x i8> %a) {
	; SSE2-LABEL: shuffle_v16i8_00_uu_uu_uu_01_uu_uu_uu_02_uu_uu_uu_03_uu_uu_uu:			; SSE2-LABEL: shuffle_v16i8_00_uu_uu_uu_01_uu_uu_uu_02_uu_uu_uu_03_uu_uu_uu:
	; SSE2: # BB#0:			; SSE2: # BB#0:
	; SSE2-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]			; SSE2-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
	; SSE2-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3]			; SSE2-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3]
	; SSE2-NEXT: retq			; SSE2-NEXT: retq
	;			;
	; SSSE3-LABEL: shuffle_v16i8_00_uu_uu_uu_01_uu_uu_uu_02_uu_uu_uu_03_uu_uu_uu:			; SSSE3-LABEL: shuffle_v16i8_00_uu_uu_uu_01_uu_uu_uu_02_uu_uu_uu_03_uu_uu_uu:
	; SSSE3: # BB#0:			; SSSE3: # BB#0:
	; SSSE3-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]			; SSSE3-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
	; SSSE3-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3]			; SSSE3-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3]
	; SSSE3-NEXT: retq			; SSSE3-NEXT: retq
	;			;
	; SSE41-LABEL: shuffle_v16i8_00_uu_uu_uu_01_uu_uu_uu_02_uu_uu_uu_03_uu_uu_uu:			; SSE41-LABEL: shuffle_v16i8_00_uu_uu_uu_01_uu_uu_uu_02_uu_uu_uu_03_uu_uu_uu:
	; SSE41: # BB#0:			; SSE41: # BB#0:
	; SSE41-NEXT: pmovzxbd %xmm0, %xmm0			; SSE41-NEXT: pmovzxbd {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero
	; SSE41-NEXT: retq			; SSE41-NEXT: retq
	;			;
	; AVX-LABEL: shuffle_v16i8_00_uu_uu_uu_01_uu_uu_uu_02_uu_uu_uu_03_uu_uu_uu:			; AVX-LABEL: shuffle_v16i8_00_uu_uu_uu_01_uu_uu_uu_02_uu_uu_uu_03_uu_uu_uu:
	; AVX: # BB#0:			; AVX: # BB#0:
	; AVX-NEXT: vpmovzxbd %xmm0, %xmm0			; AVX-NEXT: vpmovzxbd {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%shuffle = shufflevector <16 x i8> %a, <16 x i8> zeroinitializer, <16 x i32> <i32 0, i32 undef, i32 undef, i32 undef, i32 1, i32 undef, i32 undef, i32 undef, i32 2, i32 undef, i32 undef, i32 undef, i32 3, i32 undef, i32 undef, i32 undef>			%shuffle = shufflevector <16 x i8> %a, <16 x i8> zeroinitializer, <16 x i32> <i32 0, i32 undef, i32 undef, i32 undef, i32 1, i32 undef, i32 undef, i32 undef, i32 2, i32 undef, i32 undef, i32 undef, i32 3, i32 undef, i32 undef, i32 undef>
	ret <16 x i8> %shuffle			ret <16 x i8> %shuffle
	}			}

	define <16 x i8> @shuffle_v16i8_00_zz_zz_zz_01_zz_zz_zz_02_zz_zz_zz_03_zz_zz_zz(<16 x i8> %a) {			define <16 x i8> @shuffle_v16i8_00_zz_zz_zz_01_zz_zz_zz_02_zz_zz_zz_03_zz_zz_zz(<16 x i8> %a) {
	; SSE2-LABEL: shuffle_v16i8_00_zz_zz_zz_01_zz_zz_zz_02_zz_zz_zz_03_zz_zz_zz:			; SSE2-LABEL: shuffle_v16i8_00_zz_zz_zz_01_zz_zz_zz_02_zz_zz_zz_03_zz_zz_zz:
	; SSE2: # BB#0:			; SSE2: # BB#0:
	; SSE2-NEXT: pxor %xmm1, %xmm1			; SSE2-NEXT: pxor %xmm1, %xmm1
	; SSE2-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3],xmm0[4],xmm1[4],xmm0[5],xmm1[5],xmm0[6],xmm1[6],xmm0[7],xmm1[7]			; SSE2-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3],xmm0[4],xmm1[4],xmm0[5],xmm1[5],xmm0[6],xmm1[6],xmm0[7],xmm1[7]
	; SSE2-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3]			; SSE2-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3]
	; SSE2-NEXT: retq			; SSE2-NEXT: retq
	;			;
	; SSSE3-LABEL: shuffle_v16i8_00_zz_zz_zz_01_zz_zz_zz_02_zz_zz_zz_03_zz_zz_zz:			; SSSE3-LABEL: shuffle_v16i8_00_zz_zz_zz_01_zz_zz_zz_02_zz_zz_zz_03_zz_zz_zz:
	; SSSE3: # BB#0:			; SSSE3: # BB#0:
	; SSSE3-NEXT: pxor %xmm1, %xmm1			; SSSE3-NEXT: pxor %xmm1, %xmm1
	; SSSE3-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3],xmm0[4],xmm1[4],xmm0[5],xmm1[5],xmm0[6],xmm1[6],xmm0[7],xmm1[7]			; SSSE3-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3],xmm0[4],xmm1[4],xmm0[5],xmm1[5],xmm0[6],xmm1[6],xmm0[7],xmm1[7]
	; SSSE3-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3]			; SSSE3-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3]
	; SSSE3-NEXT: retq			; SSSE3-NEXT: retq
	;			;
	; SSE41-LABEL: shuffle_v16i8_00_zz_zz_zz_01_zz_zz_zz_02_zz_zz_zz_03_zz_zz_zz:			; SSE41-LABEL: shuffle_v16i8_00_zz_zz_zz_01_zz_zz_zz_02_zz_zz_zz_03_zz_zz_zz:
	; SSE41: # BB#0:			; SSE41: # BB#0:
	; SSE41-NEXT: pmovzxbd %xmm0, %xmm0			; SSE41-NEXT: pmovzxbd {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero
	; SSE41-NEXT: retq			; SSE41-NEXT: retq
	;			;
	; AVX-LABEL: shuffle_v16i8_00_zz_zz_zz_01_zz_zz_zz_02_zz_zz_zz_03_zz_zz_zz:			; AVX-LABEL: shuffle_v16i8_00_zz_zz_zz_01_zz_zz_zz_02_zz_zz_zz_03_zz_zz_zz:
	; AVX: # BB#0:			; AVX: # BB#0:
	; AVX-NEXT: vpmovzxbd %xmm0, %xmm0			; AVX-NEXT: vpmovzxbd {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%shuffle = shufflevector <16 x i8> %a, <16 x i8> zeroinitializer, <16 x i32> <i32 0, i32 17, i32 18, i32 19, i32 1, i32 21, i32 22, i32 23, i32 2, i32 25, i32 26, i32 27, i32 3, i32 29, i32 30, i32 31>			%shuffle = shufflevector <16 x i8> %a, <16 x i8> zeroinitializer, <16 x i32> <i32 0, i32 17, i32 18, i32 19, i32 1, i32 21, i32 22, i32 23, i32 2, i32 25, i32 26, i32 27, i32 3, i32 29, i32 30, i32 31>
	ret <16 x i8> %shuffle			ret <16 x i8> %shuffle
	}			}

	define <16 x i8> @shuffle_v16i8_00_uu_01_uu_02_uu_03_uu_04_uu_05_uu_06_uu_07_uu(<16 x i8> %a) {			define <16 x i8> @shuffle_v16i8_00_uu_01_uu_02_uu_03_uu_04_uu_05_uu_06_uu_07_uu(<16 x i8> %a) {
	; SSE2-LABEL: shuffle_v16i8_00_uu_01_uu_02_uu_03_uu_04_uu_05_uu_06_uu_07_uu:			; SSE2-LABEL: shuffle_v16i8_00_uu_01_uu_02_uu_03_uu_04_uu_05_uu_06_uu_07_uu:
	; SSE2: # BB#0:			; SSE2: # BB#0:
	; SSE2-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]			; SSE2-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
	; SSE2-NEXT: retq			; SSE2-NEXT: retq
	;			;
	; SSSE3-LABEL: shuffle_v16i8_00_uu_01_uu_02_uu_03_uu_04_uu_05_uu_06_uu_07_uu:			; SSSE3-LABEL: shuffle_v16i8_00_uu_01_uu_02_uu_03_uu_04_uu_05_uu_06_uu_07_uu:
	; SSSE3: # BB#0:			; SSSE3: # BB#0:
	; SSSE3-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]			; SSSE3-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
	; SSSE3-NEXT: retq			; SSSE3-NEXT: retq
	;			;
	; SSE41-LABEL: shuffle_v16i8_00_uu_01_uu_02_uu_03_uu_04_uu_05_uu_06_uu_07_uu:			; SSE41-LABEL: shuffle_v16i8_00_uu_01_uu_02_uu_03_uu_04_uu_05_uu_06_uu_07_uu:
	; SSE41: # BB#0:			; SSE41: # BB#0:
	; SSE41-NEXT: pmovzxbw %xmm0, %xmm0			; SSE41-NEXT: pmovzxbw {{.*#+}} xmm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero
	; SSE41-NEXT: retq			; SSE41-NEXT: retq
	;			;
	; AVX-LABEL: shuffle_v16i8_00_uu_01_uu_02_uu_03_uu_04_uu_05_uu_06_uu_07_uu:			; AVX-LABEL: shuffle_v16i8_00_uu_01_uu_02_uu_03_uu_04_uu_05_uu_06_uu_07_uu:
	; AVX: # BB#0:			; AVX: # BB#0:
	; AVX-NEXT: vpmovzxbw %xmm0, %xmm0			; AVX-NEXT: vpmovzxbw {{.*#+}} xmm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%shuffle = shufflevector <16 x i8> %a, <16 x i8> zeroinitializer, <16 x i32> <i32 0, i32 undef, i32 1, i32 undef, i32 2, i32 undef, i32 3, i32 undef, i32 4, i32 undef, i32 5, i32 undef, i32 6, i32 undef, i32 7, i32 undef>			%shuffle = shufflevector <16 x i8> %a, <16 x i8> zeroinitializer, <16 x i32> <i32 0, i32 undef, i32 1, i32 undef, i32 2, i32 undef, i32 3, i32 undef, i32 4, i32 undef, i32 5, i32 undef, i32 6, i32 undef, i32 7, i32 undef>
	ret <16 x i8> %shuffle			ret <16 x i8> %shuffle
	}			}

	define <16 x i8> @shuffle_v16i8_00_zz_01_zz_02_zz_03_zz_04_zz_05_zz_06_zz_07_zz(<16 x i8> %a) {			define <16 x i8> @shuffle_v16i8_00_zz_01_zz_02_zz_03_zz_04_zz_05_zz_06_zz_07_zz(<16 x i8> %a) {
	; SSE2-LABEL: shuffle_v16i8_00_zz_01_zz_02_zz_03_zz_04_zz_05_zz_06_zz_07_zz:			; SSE2-LABEL: shuffle_v16i8_00_zz_01_zz_02_zz_03_zz_04_zz_05_zz_06_zz_07_zz:
	; SSE2: # BB#0:			; SSE2: # BB#0:
	; SSE2-NEXT: pxor %xmm1, %xmm1			; SSE2-NEXT: pxor %xmm1, %xmm1
	; SSE2-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3],xmm0[4],xmm1[4],xmm0[5],xmm1[5],xmm0[6],xmm1[6],xmm0[7],xmm1[7]			; SSE2-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3],xmm0[4],xmm1[4],xmm0[5],xmm1[5],xmm0[6],xmm1[6],xmm0[7],xmm1[7]
	; SSE2-NEXT: retq			; SSE2-NEXT: retq
	;			;
	; SSSE3-LABEL: shuffle_v16i8_00_zz_01_zz_02_zz_03_zz_04_zz_05_zz_06_zz_07_zz:			; SSSE3-LABEL: shuffle_v16i8_00_zz_01_zz_02_zz_03_zz_04_zz_05_zz_06_zz_07_zz:
	; SSSE3: # BB#0:			; SSSE3: # BB#0:
	; SSSE3-NEXT: pxor %xmm1, %xmm1			; SSSE3-NEXT: pxor %xmm1, %xmm1
	; SSSE3-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3],xmm0[4],xmm1[4],xmm0[5],xmm1[5],xmm0[6],xmm1[6],xmm0[7],xmm1[7]			; SSSE3-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3],xmm0[4],xmm1[4],xmm0[5],xmm1[5],xmm0[6],xmm1[6],xmm0[7],xmm1[7]
	; SSSE3-NEXT: retq			; SSSE3-NEXT: retq
	;			;
	; SSE41-LABEL: shuffle_v16i8_00_zz_01_zz_02_zz_03_zz_04_zz_05_zz_06_zz_07_zz:			; SSE41-LABEL: shuffle_v16i8_00_zz_01_zz_02_zz_03_zz_04_zz_05_zz_06_zz_07_zz:
	; SSE41: # BB#0:			; SSE41: # BB#0:
	; SSE41-NEXT: pmovzxbw %xmm0, %xmm0			; SSE41-NEXT: pmovzxbw {{.*#+}} xmm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero
	; SSE41-NEXT: retq			; SSE41-NEXT: retq
	;			;
	; AVX-LABEL: shuffle_v16i8_00_zz_01_zz_02_zz_03_zz_04_zz_05_zz_06_zz_07_zz:			; AVX-LABEL: shuffle_v16i8_00_zz_01_zz_02_zz_03_zz_04_zz_05_zz_06_zz_07_zz:
	; AVX: # BB#0:			; AVX: # BB#0:
	; AVX-NEXT: vpmovzxbw %xmm0, %xmm0			; AVX-NEXT: vpmovzxbw {{.*#+}} xmm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%shuffle = shufflevector <16 x i8> %a, <16 x i8> zeroinitializer, <16 x i32> <i32 0, i32 17, i32 1, i32 19, i32 2, i32 21, i32 3, i32 23, i32 4, i32 25, i32 5, i32 27, i32 6, i32 29, i32 7, i32 31>			%shuffle = shufflevector <16 x i8> %a, <16 x i8> zeroinitializer, <16 x i32> <i32 0, i32 17, i32 1, i32 19, i32 2, i32 21, i32 3, i32 23, i32 4, i32 25, i32 5, i32 27, i32 6, i32 29, i32 7, i32 31>
	ret <16 x i8> %shuffle			ret <16 x i8> %shuffle
	}			}
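The new `pmovzxbd`/`pmovzxbw` CHECK lines show the zero extension as a byte-level shuffle. The C++ sketch below shows how such a decode could produce those comments. It assumes a 128-bit destination register. `decodeZeroExtendMask`, `printMask` and `kSentinelZero` are hypothetical names, not the patch's actual API, and the sentinel value is only modeled on LLVM's "zero lane" convention.

The mask is derived from the extension scale, which is the destination element width divided by the source element width. This is the quantity the review suggests returning in place of `SrcVT`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sentinel meaning "this lane is zero" (assumption, modeled on
// LLVM's SM_SentinelZero convention rather than taken from the patch).
constexpr int kSentinelZero = -2;

// Decode a pmovzx-style zero extension into a mask counted in source-element
// lanes of the destination register.
//   srcBits: width of each source element (8, 16 or 32)
//   dstBits: width of each extended element (16, 32 or 64)
//   regBits: width of the destination register (128 for an xmm register)
std::vector<int> decodeZeroExtendMask(unsigned srcBits, unsigned dstBits,
                                      unsigned regBits = 128) {
  assert(dstBits > srcBits && dstBits % srcBits == 0);
  const unsigned scale = dstBits / srcBits;      // lanes per extended element
  const unsigned numDstElts = regBits / dstBits; // extended elements produced
  std::vector<int> mask;
  for (unsigned i = 0; i != numDstElts; ++i) {
    mask.push_back(static_cast<int>(i)); // low lane: source element i
    for (unsigned j = 1; j != scale; ++j)
      mask.push_back(kSentinelZero);     // upper lanes: zero-filled
  }
  return mask;
}

// Render the mask in the style of the asm comments, printing each lane
// individually (runs from the same register are not grouped here).
std::string printMask(const std::vector<int> &mask, const std::string &src) {
  std::string out;
  for (std::size_t i = 0; i < mask.size(); ++i) {
    if (i != 0)
      out += ",";
    if (mask[i] == kSentinelZero)
      out += "zero";
    else
      out += src + "[" + std::to_string(mask[i]) + "]";
  }
  return out;
}
```

For example, `printMask(decodeZeroExtendMask(8, 32), "xmm0")` yields the `pmovzxbd` comment expected by the SSE41 and AVX checks above.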

	define <16 x i8> @shuffle_v16i8_uu_10_02_07_22_14_07_02_18_03_01_14_18_09_11_00(<16 x i8> %a, <16 x i8> %b) {			define <16 x i8> @shuffle_v16i8_uu_10_02_07_22_14_07_02_18_03_01_14_18_09_11_00(<16 x i8> %a, <16 x i8> %b) {
	; SSE2-LABEL: shuffle_v16i8_uu_10_02_07_22_14_07_02_18_03_01_14_18_09_11_00:			; SSE2-LABEL: shuffle_v16i8_uu_10_02_07_22_14_07_02_18_03_01_14_18_09_11_00:
	; SSE2: # BB#0: # %entry			; SSE2: # BB#0: # %entry

test/CodeGen/X86/vector-shuffle-128-v2.ll

; AVX-NEXT: vmovhlps {{.*#+}} xmm0 = xmm1[1,1]		; AVX-NEXT: vmovhlps {{.*#+}} xmm0 = xmm1[1,1]
; AVX-NEXT: retq		; AVX-NEXT: retq
%shuffle = shufflevector <2 x double> %a, <2 x double> %b, <2 x i32> <i32 3, i32 3>		%shuffle = shufflevector <2 x double> %a, <2 x double> %b, <2 x i32> <i32 3, i32 3>
ret <2 x double> %shuffle		ret <2 x double> %shuffle
}		}
define <2 x double> @shuffle_v2f64_03(<2 x double> %a, <2 x double> %b) {		define <2 x double> @shuffle_v2f64_03(<2 x double> %a, <2 x double> %b) {
; SSE2-LABEL: shuffle_v2f64_03:		; SSE2-LABEL: shuffle_v2f64_03:
; SSE2: # BB#0:		; SSE2: # BB#0:
; SSE2-NEXT: movsd %xmm0, %xmm1		; SSE2-NEXT: movsd {{.*#+}} xmm1 = xmm0[0],xmm1[1]
; SSE2-NEXT: movaps %xmm1, %xmm0		; SSE2-NEXT: movaps %xmm1, %xmm0
; SSE2-NEXT: retq		; SSE2-NEXT: retq
;		;
; SSE3-LABEL: shuffle_v2f64_03:		; SSE3-LABEL: shuffle_v2f64_03:
; SSE3: # BB#0:		; SSE3: # BB#0:
; SSE3-NEXT: movsd %xmm0, %xmm1		; SSE3-NEXT: movsd {{.*#+}} xmm1 = xmm0[0],xmm1[1]
; SSE3-NEXT: movaps %xmm1, %xmm0		; SSE3-NEXT: movaps %xmm1, %xmm0
; SSE3-NEXT: retq		; SSE3-NEXT: retq
;		;
; SSSE3-LABEL: shuffle_v2f64_03:		; SSSE3-LABEL: shuffle_v2f64_03:
; SSSE3: # BB#0:		; SSSE3: # BB#0:
; SSSE3-NEXT: movsd %xmm0, %xmm1		; SSSE3-NEXT: movsd {{.*#+}} xmm1 = xmm0[0],xmm1[1]
; SSSE3-NEXT: movaps %xmm1, %xmm0		; SSSE3-NEXT: movaps %xmm1, %xmm0
; SSSE3-NEXT: retq		; SSSE3-NEXT: retq
;		;
; SSE41-LABEL: shuffle_v2f64_03:		; SSE41-LABEL: shuffle_v2f64_03:
; SSE41: # BB#0:		; SSE41: # BB#0:
; SSE41-NEXT: blendpd {{.*#+}} xmm0 = xmm0[0],xmm1[1]		; SSE41-NEXT: blendpd {{.*#+}} xmm0 = xmm0[0],xmm1[1]
; SSE41-NEXT: retq		; SSE41-NEXT: retq
;		;
; AVX-LABEL: shuffle_v2f64_03:		; AVX-LABEL: shuffle_v2f64_03:
; AVX: # BB#0:		; AVX: # BB#0:
; AVX-NEXT: vblendpd {{.*#+}} xmm0 = xmm0[0],xmm1[1]		; AVX-NEXT: vblendpd {{.*#+}} xmm0 = xmm0[0],xmm1[1]
; AVX-NEXT: retq		; AVX-NEXT: retq
%shuffle = shufflevector <2 x double> %a, <2 x double> %b, <2 x i32> <i32 0, i32 3>		%shuffle = shufflevector <2 x double> %a, <2 x double> %b, <2 x i32> <i32 0, i32 3>
ret <2 x double> %shuffle		ret <2 x double> %shuffle
}		}
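In the register form, `movsd %xmm0, %xmm1` is now commented as `xmm1 = xmm0[0],xmm1[1]`: lane 0 comes from the source and the remaining lanes keep the destination. The C++ sketch below models that decode, and also the load form that zeroes the upper lanes. Runs from the same register are grouped, so a 4-lane `movss` prints as `xmm0[0],xmm1[1,2,3]`.

`decodeScalarMoveMask`, `printMask` and `kSentinelZero` are hypothetical names. The indexing convention (destination lanes `0..n-1`, source lanes `n..2n-1`) is an assumption made for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sentinel meaning "this lane is zero".
constexpr int kSentinelZero = -2;

// Sketch of a movss/movsd decode. Mask entries index the concatenation
// [dst lanes 0..n-1, src lanes n..2n-1]:
//  - register form: lane 0 comes from src, lanes 1..n-1 keep dst;
//  - load form: lane 0 comes from memory, lanes 1..n-1 are zeroed.
std::vector<int> decodeScalarMoveMask(int numElts, bool isLoad) {
  std::vector<int> mask;
  mask.push_back(numElts); // element 0 of src (or of the memory operand)
  for (int i = 1; i < numElts; ++i)
    mask.push_back(isLoad ? kSentinelZero : i);
  return mask;
}

// Print the mask, grouping consecutive lanes read from the same register.
std::string printMask(const std::vector<int> &mask, const std::string &dst,
                      const std::string &src) {
  const int n = static_cast<int>(mask.size());
  std::string out;
  std::size_t i = 0;
  while (i < mask.size()) {
    if (!out.empty())
      out += ",";
    if (mask[i] == kSentinelZero) {
      out += "zero";
      ++i;
      continue;
    }
    const bool fromSrc = mask[i] >= n;
    out += (fromSrc ? src : dst) + "[";
    bool first = true;
    while (i < mask.size() && mask[i] != kSentinelZero &&
           (mask[i] >= n) == fromSrc) {
      if (!first)
        out += ",";
      out += std::to_string(fromSrc ? mask[i] - n : mask[i]);
      first = false;
      ++i;
    }
    out += "]";
  }
  return out;
}
```

For the load form, passing `"mem"` as the source name reproduces comments such as `xmm0 = mem[0],zero` on `movsd (%rdi), %xmm0`.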
define <2 x double> @shuffle_v2f64_21(<2 x double> %a, <2 x double> %b) {		define <2 x double> @shuffle_v2f64_21(<2 x double> %a, <2 x double> %b) {
; SSE2-LABEL: shuffle_v2f64_21:		; SSE2-LABEL: shuffle_v2f64_21:
; SSE2: # BB#0:		; SSE2: # BB#0:
; SSE2-NEXT: movsd %xmm1, %xmm0		; SSE2-NEXT: movsd {{.*#+}} xmm0 = xmm1[0],xmm0[1]
; SSE2-NEXT: retq		; SSE2-NEXT: retq
;		;
; SSE3-LABEL: shuffle_v2f64_21:		; SSE3-LABEL: shuffle_v2f64_21:
; SSE3: # BB#0:		; SSE3: # BB#0:
; SSE3-NEXT: movsd %xmm1, %xmm0		; SSE3-NEXT: movsd {{.*#+}} xmm0 = xmm1[0],xmm0[1]
; SSE3-NEXT: retq		; SSE3-NEXT: retq
;		;
; SSSE3-LABEL: shuffle_v2f64_21:		; SSSE3-LABEL: shuffle_v2f64_21:
; SSSE3: # BB#0:		; SSSE3: # BB#0:
; SSSE3-NEXT: movsd %xmm1, %xmm0		; SSSE3-NEXT: movsd {{.*#+}} xmm0 = xmm1[0],xmm0[1]
; SSSE3-NEXT: retq		; SSSE3-NEXT: retq
;		;
; SSE41-LABEL: shuffle_v2f64_21:		; SSE41-LABEL: shuffle_v2f64_21:
; SSE41: # BB#0:		; SSE41: # BB#0:
; SSE41-NEXT: blendpd {{.*#+}} xmm0 = xmm1[0],xmm0[1]		; SSE41-NEXT: blendpd {{.*#+}} xmm0 = xmm1[0],xmm0[1]
; SSE41-NEXT: retq		; SSE41-NEXT: retq
;		;
; AVX-LABEL: shuffle_v2f64_21:		; AVX-LABEL: shuffle_v2f64_21:
; AVX-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm1[0],xmm2[0]		; AVX-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm1[0],xmm2[0]
; AVX-NEXT: retq		; AVX-NEXT: retq
%shuffle = shufflevector <2 x i64> %a, <2 x i64> %b, <2 x i32> <i32 0, i32 2>		%shuffle = shufflevector <2 x i64> %a, <2 x i64> %b, <2 x i32> <i32 0, i32 2>
ret <2 x i64> %shuffle		ret <2 x i64> %shuffle
}		}
define <2 x i64> @shuffle_v2i64_03(<2 x i64> %a, <2 x i64> %b) {		define <2 x i64> @shuffle_v2i64_03(<2 x i64> %a, <2 x i64> %b) {
; SSE2-LABEL: shuffle_v2i64_03:		; SSE2-LABEL: shuffle_v2i64_03:
; SSE2: # BB#0:		; SSE2: # BB#0:
; SSE2-NEXT: movsd %xmm0, %xmm1		; SSE2-NEXT: movsd {{.*#+}} xmm1 = xmm0[0],xmm1[1]
; SSE2-NEXT: movaps %xmm1, %xmm0		; SSE2-NEXT: movaps %xmm1, %xmm0
; SSE2-NEXT: retq		; SSE2-NEXT: retq
;		;
; SSE3-LABEL: shuffle_v2i64_03:		; SSE3-LABEL: shuffle_v2i64_03:
; SSE3: # BB#0:		; SSE3: # BB#0:
; SSE3-NEXT: movsd %xmm0, %xmm1		; SSE3-NEXT: movsd {{.*#+}} xmm1 = xmm0[0],xmm1[1]
; SSE3-NEXT: movaps %xmm1, %xmm0		; SSE3-NEXT: movaps %xmm1, %xmm0
; SSE3-NEXT: retq		; SSE3-NEXT: retq
;		;
; SSSE3-LABEL: shuffle_v2i64_03:		; SSSE3-LABEL: shuffle_v2i64_03:
; SSSE3: # BB#0:		; SSSE3: # BB#0:
; SSSE3-NEXT: movsd %xmm0, %xmm1		; SSSE3-NEXT: movsd {{.*#+}} xmm1 = xmm0[0],xmm1[1]
; SSSE3-NEXT: movaps %xmm1, %xmm0		; SSSE3-NEXT: movaps %xmm1, %xmm0
; SSSE3-NEXT: retq		; SSSE3-NEXT: retq
;		;
; SSE41-LABEL: shuffle_v2i64_03:		; SSE41-LABEL: shuffle_v2i64_03:
; SSE41: # BB#0:		; SSE41: # BB#0:
; SSE41-NEXT: pblendw {{.*#+}} xmm0 = xmm0[0,1,2,3],xmm1[4,5,6,7]		; SSE41-NEXT: pblendw {{.*#+}} xmm0 = xmm0[0,1,2,3],xmm1[4,5,6,7]
; SSE41-NEXT: retq		; SSE41-NEXT: retq
;		;
; AVX1-LABEL: shuffle_v2i64_03:		; AVX1-LABEL: shuffle_v2i64_03:
; AVX1: # BB#0:		; AVX1: # BB#0:
; AVX1-NEXT: vpblendw {{.*#+}} xmm0 = xmm0[0,1,2,3],xmm1[4,5,6,7]		; AVX1-NEXT: vpblendw {{.*#+}} xmm0 = xmm0[0,1,2,3],xmm1[4,5,6,7]
; AVX1-NEXT: retq		; AVX1-NEXT: retq
;		;
; AVX2-LABEL: shuffle_v2i64_03:		; AVX2-LABEL: shuffle_v2i64_03:
; AVX2: # BB#0:		; AVX2: # BB#0:
; AVX2-NEXT: vpblendd {{.*#+}} xmm0 = xmm0[0,1],xmm1[2,3]		; AVX2-NEXT: vpblendd {{.*#+}} xmm0 = xmm0[0,1],xmm1[2,3]
; AVX2-NEXT: retq		; AVX2-NEXT: retq
%shuffle = shufflevector <2 x i64> %a, <2 x i64> %b, <2 x i32> <i32 0, i32 3>		%shuffle = shufflevector <2 x i64> %a, <2 x i64> %b, <2 x i32> <i32 0, i32 3>
ret <2 x i64> %shuffle		ret <2 x i64> %shuffle
}		}
define <2 x i64> @shuffle_v2i64_03_copy(<2 x i64> %nonce, <2 x i64> %a, <2 x i64> %b) {		define <2 x i64> @shuffle_v2i64_03_copy(<2 x i64> %nonce, <2 x i64> %a, <2 x i64> %b) {
; SSE2-LABEL: shuffle_v2i64_03_copy:		; SSE2-LABEL: shuffle_v2i64_03_copy:
; SSE2: # BB#0:		; SSE2: # BB#0:
; SSE2-NEXT: movsd %xmm1, %xmm2		; SSE2-NEXT: movsd {{.*#+}} xmm2 = xmm1[0],xmm2[1]
; SSE2-NEXT: movaps %xmm2, %xmm0		; SSE2-NEXT: movaps %xmm2, %xmm0
; SSE2-NEXT: retq		; SSE2-NEXT: retq
;		;
; SSE3-LABEL: shuffle_v2i64_03_copy:		; SSE3-LABEL: shuffle_v2i64_03_copy:
; SSE3: # BB#0:		; SSE3: # BB#0:
; SSE3-NEXT: movsd %xmm1, %xmm2		; SSE3-NEXT: movsd {{.*#+}} xmm2 = xmm1[0],xmm2[1]
; SSE3-NEXT: movaps %xmm2, %xmm0		; SSE3-NEXT: movaps %xmm2, %xmm0
; SSE3-NEXT: retq		; SSE3-NEXT: retq
;		;
; SSSE3-LABEL: shuffle_v2i64_03_copy:		; SSSE3-LABEL: shuffle_v2i64_03_copy:
; SSSE3: # BB#0:		; SSSE3: # BB#0:
; SSSE3-NEXT: movsd %xmm1, %xmm2		; SSSE3-NEXT: movsd {{.*#+}} xmm2 = xmm1[0],xmm2[1]
; SSSE3-NEXT: movaps %xmm2, %xmm0		; SSSE3-NEXT: movaps %xmm2, %xmm0
; SSSE3-NEXT: retq		; SSSE3-NEXT: retq
;		;
; SSE41-LABEL: shuffle_v2i64_03_copy:		; SSE41-LABEL: shuffle_v2i64_03_copy:
; SSE41: # BB#0:		; SSE41: # BB#0:
; SSE41-NEXT: pblendw {{.*#+}} xmm1 = xmm1[0,1,2,3],xmm2[4,5,6,7]		; SSE41-NEXT: pblendw {{.*#+}} xmm1 = xmm1[0,1,2,3],xmm2[4,5,6,7]
; SSE41-NEXT: movdqa %xmm1, %xmm0		; SSE41-NEXT: movdqa %xmm1, %xmm0
; SSE41-NEXT: retq		; SSE41-NEXT: retq
; AVX-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm2[0],xmm1[0]		; AVX-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm2[0],xmm1[0]
; AVX-NEXT: retq		; AVX-NEXT: retq
%shuffle = shufflevector <2 x i64> %a, <2 x i64> %b, <2 x i32> <i32 2, i32 0>		%shuffle = shufflevector <2 x i64> %a, <2 x i64> %b, <2 x i32> <i32 2, i32 0>
ret <2 x i64> %shuffle		ret <2 x i64> %shuffle
}		}
define <2 x i64> @shuffle_v2i64_21(<2 x i64> %a, <2 x i64> %b) {		define <2 x i64> @shuffle_v2i64_21(<2 x i64> %a, <2 x i64> %b) {
; SSE2-LABEL: shuffle_v2i64_21:		; SSE2-LABEL: shuffle_v2i64_21:
; SSE2: # BB#0:		; SSE2: # BB#0:
; SSE2-NEXT: movsd %xmm1, %xmm0		; SSE2-NEXT: movsd {{.*#+}} xmm0 = xmm1[0],xmm0[1]
; SSE2-NEXT: retq		; SSE2-NEXT: retq
;		;
; SSE3-LABEL: shuffle_v2i64_21:		; SSE3-LABEL: shuffle_v2i64_21:
; SSE3: # BB#0:		; SSE3: # BB#0:
; SSE3-NEXT: movsd %xmm1, %xmm0		; SSE3-NEXT: movsd {{.*#+}} xmm0 = xmm1[0],xmm0[1]
; SSE3-NEXT: retq		; SSE3-NEXT: retq
;		;
; SSSE3-LABEL: shuffle_v2i64_21:		; SSSE3-LABEL: shuffle_v2i64_21:
; SSSE3: # BB#0:		; SSSE3: # BB#0:
; SSSE3-NEXT: movsd %xmm1, %xmm0		; SSSE3-NEXT: movsd {{.*#+}} xmm0 = xmm1[0],xmm0[1]
; SSSE3-NEXT: retq		; SSSE3-NEXT: retq
;		;
; SSE41-LABEL: shuffle_v2i64_21:		; SSE41-LABEL: shuffle_v2i64_21:
; SSE41: # BB#0:		; SSE41: # BB#0:
; SSE41-NEXT: pblendw {{.*#+}} xmm0 = xmm1[0,1,2,3],xmm0[4,5,6,7]		; SSE41-NEXT: pblendw {{.*#+}} xmm0 = xmm1[0,1,2,3],xmm0[4,5,6,7]
; SSE41-NEXT: retq		; SSE41-NEXT: retq
;		;
; AVX1-LABEL: shuffle_v2i64_21:		; AVX1-LABEL: shuffle_v2i64_21:
; AVX1: # BB#0:		; AVX1: # BB#0:
; AVX1-NEXT: vpblendw {{.*#+}} xmm0 = xmm1[0,1,2,3],xmm0[4,5,6,7]		; AVX1-NEXT: vpblendw {{.*#+}} xmm0 = xmm1[0,1,2,3],xmm0[4,5,6,7]
; AVX1-NEXT: retq		; AVX1-NEXT: retq
;		;
; AVX2-LABEL: shuffle_v2i64_21:		; AVX2-LABEL: shuffle_v2i64_21:
; AVX2: # BB#0:		; AVX2: # BB#0:
; AVX2-NEXT: vpblendd {{.*#+}} xmm0 = xmm1[0,1],xmm0[2,3]		; AVX2-NEXT: vpblendd {{.*#+}} xmm0 = xmm1[0,1],xmm0[2,3]
; AVX2-NEXT: retq		; AVX2-NEXT: retq
%shuffle = shufflevector <2 x i64> %a, <2 x i64> %b, <2 x i32> <i32 2, i32 1>		%shuffle = shufflevector <2 x i64> %a, <2 x i64> %b, <2 x i32> <i32 2, i32 1>
ret <2 x i64> %shuffle		ret <2 x i64> %shuffle
}		}
define <2 x i64> @shuffle_v2i64_21_copy(<2 x i64> %nonce, <2 x i64> %a, <2 x i64> %b) {		define <2 x i64> @shuffle_v2i64_21_copy(<2 x i64> %nonce, <2 x i64> %a, <2 x i64> %b) {
; SSE2-LABEL: shuffle_v2i64_21_copy:		; SSE2-LABEL: shuffle_v2i64_21_copy:
; SSE2: # BB#0:		; SSE2: # BB#0:
; SSE2-NEXT: movsd %xmm2, %xmm1		; SSE2-NEXT: movsd {{.*#+}} xmm1 = xmm2[0],xmm1[1]
; SSE2-NEXT: movaps %xmm1, %xmm0		; SSE2-NEXT: movaps %xmm1, %xmm0
; SSE2-NEXT: retq		; SSE2-NEXT: retq
;		;
; SSE3-LABEL: shuffle_v2i64_21_copy:		; SSE3-LABEL: shuffle_v2i64_21_copy:
; SSE3: # BB#0:		; SSE3: # BB#0:
; SSE3-NEXT: movsd %xmm2, %xmm1		; SSE3-NEXT: movsd {{.*#+}} xmm1 = xmm2[0],xmm1[1]
; SSE3-NEXT: movaps %xmm1, %xmm0		; SSE3-NEXT: movaps %xmm1, %xmm0
; SSE3-NEXT: retq		; SSE3-NEXT: retq
;		;
; SSSE3-LABEL: shuffle_v2i64_21_copy:		; SSSE3-LABEL: shuffle_v2i64_21_copy:
; SSSE3: # BB#0:		; SSSE3: # BB#0:
; SSSE3-NEXT: movsd %xmm2, %xmm1		; SSSE3-NEXT: movsd {{.*#+}} xmm1 = xmm2[0],xmm1[1]
; SSSE3-NEXT: movaps %xmm1, %xmm0		; SSSE3-NEXT: movaps %xmm1, %xmm0
; SSSE3-NEXT: retq		; SSSE3-NEXT: retq
;		;
; SSE41-LABEL: shuffle_v2i64_21_copy:		; SSE41-LABEL: shuffle_v2i64_21_copy:
; SSE41: # BB#0:		; SSE41: # BB#0:
; SSE41-NEXT: pblendw {{.*#+}} xmm1 = xmm2[0,1,2,3],xmm1[4,5,6,7]		; SSE41-NEXT: pblendw {{.*#+}} xmm1 = xmm2[0,1,2,3],xmm1[4,5,6,7]
; SSE41-NEXT: movdqa %xmm1, %xmm0		; SSE41-NEXT: movdqa %xmm1, %xmm0
; SSE41-NEXT: retq		; SSE41-NEXT: retq
; AVX-NEXT: retq		; AVX-NEXT: retq
%shuffle = shufflevector <2 x i64> %a, <2 x i64> %b, <2 x i32> <i32 3, i32 1>		%shuffle = shufflevector <2 x i64> %a, <2 x i64> %b, <2 x i32> <i32 3, i32 1>
ret <2 x i64> %shuffle		ret <2 x i64> %shuffle
}		}

define <2 x i64> @shuffle_v2i64_0z(<2 x i64> %a) {		define <2 x i64> @shuffle_v2i64_0z(<2 x i64> %a) {
; SSE-LABEL: shuffle_v2i64_0z:		; SSE-LABEL: shuffle_v2i64_0z:
; SSE: # BB#0:		; SSE: # BB#0:
; SSE-NEXT: movq %xmm0, %xmm0		; SSE-NEXT: movq {{.*#+}} xmm0 = xmm0[0],zero
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; AVX-LABEL: shuffle_v2i64_0z:		; AVX-LABEL: shuffle_v2i64_0z:
; AVX: # BB#0:		; AVX: # BB#0:
; AVX-NEXT: vmovq %xmm0, %xmm0		; AVX-NEXT: vmovq {{.*#+}} xmm0 = xmm0[0],zero
; AVX-NEXT: retq		; AVX-NEXT: retq
%shuffle = shufflevector <2 x i64> %a, <2 x i64> zeroinitializer, <2 x i32> <i32 0, i32 3>		%shuffle = shufflevector <2 x i64> %a, <2 x i64> zeroinitializer, <2 x i32> <i32 0, i32 3>
ret <2 x i64> %shuffle		ret <2 x i64> %shuffle
}		}
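In `shuffle_v2i64_0z`, the IR mask `<i32 0, i32 3>` against `zeroinitializer` shows up as the comment `xmm0[0],zero` on `movq xmm,xmm`. The C++ sketch below illustrates that correspondence: any index that lands in the all-zeros operand prints as `zero`.

`zeroShuffleToComment` is a hypothetical helper, not part of the patch. It deliberately ignores undef lanes and does not group runs, so it only reproduces comments of the lane-by-lane form used in these tests.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Sketch: translate an IR shufflevector mask whose second operand is
// zeroinitializer into the "reg[i],zero" notation of the new comments.
// Indices 0..n-1 select lanes of the first operand; indices n..2n-1 select
// lanes of the all-zeros vector and therefore print as "zero".
std::string zeroShuffleToComment(const std::vector<int> &irMask,
                                 const std::string &reg) {
  const int n = static_cast<int>(irMask.size());
  std::string out;
  for (int i = 0; i < n; ++i) {
    assert(irMask[i] >= 0 && irMask[i] < 2 * n && "undef lanes not handled");
    if (i != 0)
      out += ",";
    if (irMask[i] >= n)
      out += "zero";
    else
      out += reg + "[" + std::to_string(irMask[i]) + "]";
  }
  return out;
}
```

The same helper also maps the 16-lane mask in `shuffle_v16i8_00_zz_zz_zz_01_zz_zz_zz_02_zz_zz_zz_03_zz_zz_zz` onto the `pmovzxbd` comment checked there.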

define <2 x i64> @shuffle_v2i64_1z(<2 x i64> %a) {		define <2 x i64> @shuffle_v2i64_1z(<2 x i64> %a) {
; SSE-LABEL: shuffle_v2i64_1z:		; SSE-LABEL: shuffle_v2i64_1z:
; SSE: # BB#0:		; SSE: # BB#0:
; AVX-NEXT: retq
%shuffle = shufflevector <2 x i64> %a, <2 x i64> zeroinitializer, <2 x i32> <i32 2, i32 0>		%shuffle = shufflevector <2 x i64> %a, <2 x i64> zeroinitializer, <2 x i32> <i32 2, i32 0>
ret <2 x i64> %shuffle		ret <2 x i64> %shuffle
}		}

define <2 x i64> @shuffle_v2i64_z1(<2 x i64> %a) {		define <2 x i64> @shuffle_v2i64_z1(<2 x i64> %a) {
; SSE2-LABEL: shuffle_v2i64_z1:		; SSE2-LABEL: shuffle_v2i64_z1:
; SSE2: # BB#0:		; SSE2: # BB#0:
; SSE2-NEXT: xorps %xmm1, %xmm1		; SSE2-NEXT: xorps %xmm1, %xmm1
; SSE2-NEXT: movsd %xmm1, %xmm0		; SSE2-NEXT: movsd {{.*#+}} xmm0 = xmm1[0],xmm0[1]
; SSE2-NEXT: retq		; SSE2-NEXT: retq
;		;
; SSE3-LABEL: shuffle_v2i64_z1:		; SSE3-LABEL: shuffle_v2i64_z1:
; SSE3: # BB#0:		; SSE3: # BB#0:
; SSE3-NEXT: xorps %xmm1, %xmm1		; SSE3-NEXT: xorps %xmm1, %xmm1
; SSE3-NEXT: movsd %xmm1, %xmm0		; SSE3-NEXT: movsd {{.*#+}} xmm0 = xmm1[0],xmm0[1]
; SSE3-NEXT: retq		; SSE3-NEXT: retq
;		;
; SSSE3-LABEL: shuffle_v2i64_z1:		; SSSE3-LABEL: shuffle_v2i64_z1:
; SSSE3: # BB#0:		; SSSE3: # BB#0:
; SSSE3-NEXT: xorps %xmm1, %xmm1		; SSSE3-NEXT: xorps %xmm1, %xmm1
; SSSE3-NEXT: movsd %xmm1, %xmm0		; SSSE3-NEXT: movsd {{.*#+}} xmm0 = xmm1[0],xmm0[1]
; SSSE3-NEXT: retq		; SSSE3-NEXT: retq
;		;
; SSE41-LABEL: shuffle_v2i64_z1:		; SSE41-LABEL: shuffle_v2i64_z1:
; SSE41: # BB#0:		; SSE41: # BB#0:
; SSE41-NEXT: pxor %xmm1, %xmm1		; SSE41-NEXT: pxor %xmm1, %xmm1
; SSE41-NEXT: pblendw {{.*#+}} xmm0 = xmm1[0,1,2,3],xmm0[4,5,6,7]		; SSE41-NEXT: pblendw {{.*#+}} xmm0 = xmm1[0,1,2,3],xmm0[4,5,6,7]
; SSE41-NEXT: retq		; SSE41-NEXT: retq
;		;
; AVX2-NEXT: retq		; AVX2-NEXT: retq
%shuffle = shufflevector <2 x i64> %a, <2 x i64> zeroinitializer, <2 x i32> <i32 2, i32 1>		%shuffle = shufflevector <2 x i64> %a, <2 x i64> zeroinitializer, <2 x i32> <i32 2, i32 1>
ret <2 x i64> %shuffle		ret <2 x i64> %shuffle
}		}

define <2 x double> @shuffle_v2f64_0z(<2 x double> %a) {		define <2 x double> @shuffle_v2f64_0z(<2 x double> %a) {
; SSE-LABEL: shuffle_v2f64_0z:		; SSE-LABEL: shuffle_v2f64_0z:
; SSE: # BB#0:		; SSE: # BB#0:
; SSE-NEXT: movq %xmm0, %xmm0		; SSE-NEXT: movq {{.*#+}} xmm0 = xmm0[0],zero
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; AVX-LABEL: shuffle_v2f64_0z:		; AVX-LABEL: shuffle_v2f64_0z:
; AVX: # BB#0:		; AVX: # BB#0:
; AVX-NEXT: vmovq %xmm0, %xmm0		; AVX-NEXT: vmovq {{.*#+}} xmm0 = xmm0[0],zero
; AVX-NEXT: retq		; AVX-NEXT: retq
%shuffle = shufflevector <2 x double> %a, <2 x double> zeroinitializer, <2 x i32> <i32 0, i32 3>		%shuffle = shufflevector <2 x double> %a, <2 x double> zeroinitializer, <2 x i32> <i32 0, i32 3>
ret <2 x double> %shuffle		ret <2 x double> %shuffle
}		}

define <2 x double> @shuffle_v2f64_1z(<2 x double> %a) {		define <2 x double> @shuffle_v2f64_1z(<2 x double> %a) {
; SSE-LABEL: shuffle_v2f64_1z:		; SSE-LABEL: shuffle_v2f64_1z:
; SSE: # BB#0:		; SSE: # BB#0:
; AVX-NEXT: retq
%shuffle = shufflevector <2 x double> %a, <2 x double> zeroinitializer, <2 x i32> <i32 2, i32 0>		%shuffle = shufflevector <2 x double> %a, <2 x double> zeroinitializer, <2 x i32> <i32 2, i32 0>
ret <2 x double> %shuffle		ret <2 x double> %shuffle
}		}

define <2 x double> @shuffle_v2f64_z1(<2 x double> %a) {		define <2 x double> @shuffle_v2f64_z1(<2 x double> %a) {
; SSE2-LABEL: shuffle_v2f64_z1:		; SSE2-LABEL: shuffle_v2f64_z1:
; SSE2: # BB#0:		; SSE2: # BB#0:
; SSE2-NEXT: xorps %xmm1, %xmm1		; SSE2-NEXT: xorps %xmm1, %xmm1
; SSE2-NEXT: movsd %xmm1, %xmm0		; SSE2-NEXT: movsd {{.*#+}} xmm0 = xmm1[0],xmm0[1]
; SSE2-NEXT: retq		; SSE2-NEXT: retq
;		;
; SSE3-LABEL: shuffle_v2f64_z1:		; SSE3-LABEL: shuffle_v2f64_z1:
; SSE3: # BB#0:		; SSE3: # BB#0:
; SSE3-NEXT: xorps %xmm1, %xmm1		; SSE3-NEXT: xorps %xmm1, %xmm1
; SSE3-NEXT: movsd %xmm1, %xmm0		; SSE3-NEXT: movsd {{.*#+}} xmm0 = xmm1[0],xmm0[1]
; SSE3-NEXT: retq		; SSE3-NEXT: retq
;		;
; SSSE3-LABEL: shuffle_v2f64_z1:		; SSSE3-LABEL: shuffle_v2f64_z1:
; SSSE3: # BB#0:		; SSSE3: # BB#0:
; SSSE3-NEXT: xorps %xmm1, %xmm1		; SSSE3-NEXT: xorps %xmm1, %xmm1
; SSSE3-NEXT: movsd %xmm1, %xmm0		; SSSE3-NEXT: movsd {{.*#+}} xmm0 = xmm1[0],xmm0[1]
; SSSE3-NEXT: retq		; SSSE3-NEXT: retq
;		;
; SSE41-LABEL: shuffle_v2f64_z1:		; SSE41-LABEL: shuffle_v2f64_z1:
; SSE41: # BB#0:		; SSE41: # BB#0:
; SSE41-NEXT: xorpd %xmm1, %xmm1		; SSE41-NEXT: xorpd %xmm1, %xmm1
; SSE41-NEXT: blendpd {{.*#+}} xmm0 = xmm1[0],xmm0[1]		; SSE41-NEXT: blendpd {{.*#+}} xmm0 = xmm1[0],xmm0[1]
; SSE41-NEXT: retq		; SSE41-NEXT: retq
;		;
; AVX-NEXT: retq
%v = insertelement <2 x i64> undef, i64 %a, i32 0		%v = insertelement <2 x i64> undef, i64 %a, i32 0
%shuffle = shufflevector <2 x i64> %v, <2 x i64> zeroinitializer, <2 x i32> <i32 0, i32 3>		%shuffle = shufflevector <2 x i64> %v, <2 x i64> zeroinitializer, <2 x i32> <i32 0, i32 3>
ret <2 x i64> %shuffle		ret <2 x i64> %shuffle
}		}

define <2 x i64> @insert_mem_and_zero_v2i64(i64* %ptr) {		define <2 x i64> @insert_mem_and_zero_v2i64(i64* %ptr) {
; SSE-LABEL: insert_mem_and_zero_v2i64:		; SSE-LABEL: insert_mem_and_zero_v2i64:
; SSE: # BB#0:		; SSE: # BB#0:
; SSE-NEXT: movq (%rdi), %xmm0		; SSE-NEXT: movq {{.*#+}} xmm0 = mem[0],zero
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; AVX-LABEL: insert_mem_and_zero_v2i64:		; AVX-LABEL: insert_mem_and_zero_v2i64:
; AVX: # BB#0:		; AVX: # BB#0:
; AVX-NEXT: vmovq (%rdi), %xmm0		; AVX-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero
; AVX-NEXT: retq		; AVX-NEXT: retq
%a = load i64* %ptr		%a = load i64* %ptr
%v = insertelement <2 x i64> undef, i64 %a, i32 0		%v = insertelement <2 x i64> undef, i64 %a, i32 0
%shuffle = shufflevector <2 x i64> %v, <2 x i64> zeroinitializer, <2 x i32> <i32 0, i32 3>		%shuffle = shufflevector <2 x i64> %v, <2 x i64> zeroinitializer, <2 x i32> <i32 0, i32 3>
ret <2 x i64> %shuffle		ret <2 x i64> %shuffle
}		}
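The patch summary also covers integer loads. In `insert_mem_and_zero_v2i64`, `movq (%rdi), %xmm0` is now commented as `mem[0],zero`. A `movd` load has the same shape at 32-bit granularity (`mem[0],zero,zero,zero`).

The C++ sketch below models that decode under the assumption of a 128-bit destination. `decodeIntegerLoadMask`, `printLoadMask` and `kSentinelZero` are hypothetical names, not the patch's API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sentinel meaning "this lane is zero".
constexpr int kSentinelZero = -2;

// Sketch: a movd (32-bit) or movq (64-bit) load into a 128-bit register
// writes the loaded value to element 0 and zeroes every higher element.
std::vector<int> decodeIntegerLoadMask(unsigned loadBits,
                                       unsigned regBits = 128) {
  assert((loadBits == 32 || loadBits == 64) && "only movd/movq widths");
  std::vector<int> mask(regBits / loadBits, kSentinelZero);
  mask[0] = 0; // element 0 of the memory operand
  return mask;
}

// Print the mask with "mem" as the source name, as the load comments do.
std::string printLoadMask(const std::vector<int> &mask) {
  std::string out;
  for (std::size_t i = 0; i < mask.size(); ++i) {
    if (i != 0)
      out += ",";
    out += (mask[i] == kSentinelZero)
               ? std::string("zero")
               : "mem[" + std::to_string(mask[i]) + "]";
  }
  return out;
}
```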

define <2 x double> @insert_reg_and_zero_v2f64(double %a) {		define <2 x double> @insert_reg_and_zero_v2f64(double %a) {
; SSE-LABEL: insert_reg_and_zero_v2f64:		; SSE-LABEL: insert_reg_and_zero_v2f64:
; SSE: # BB#0:		; SSE: # BB#0:
; SSE-NEXT: movq %xmm0, %xmm0		; SSE-NEXT: movq %xmm0, %xmm0 {{.*#+}} xmm0 = xmm0[0],zero
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; AVX-LABEL: insert_reg_and_zero_v2f64:		; AVX-LABEL: insert_reg_and_zero_v2f64:
; AVX: # BB#0:		; AVX: # BB#0:
; AVX-NEXT: vmovq %xmm0, %xmm0		; AVX-NEXT: vmovq %xmm0, %xmm0 {{.*#+}} xmm0 = xmm0[0],zero
; AVX-NEXT: retq		; AVX-NEXT: retq
%v = insertelement <2 x double> undef, double %a, i32 0		%v = insertelement <2 x double> undef, double %a, i32 0
%shuffle = shufflevector <2 x double> %v, <2 x double> zeroinitializer, <2 x i32> <i32 0, i32 3>		%shuffle = shufflevector <2 x double> %v, <2 x double> zeroinitializer, <2 x i32> <i32 0, i32 3>
ret <2 x double> %shuffle		ret <2 x double> %shuffle
}		}

define <2 x double> @insert_mem_and_zero_v2f64(double* %ptr) {		define <2 x double> @insert_mem_and_zero_v2f64(double* %ptr) {
; SSE-LABEL: insert_mem_and_zero_v2f64:		; SSE-LABEL: insert_mem_and_zero_v2f64:
; SSE: # BB#0:		; SSE: # BB#0:
; SSE-NEXT: movsd (%rdi), %xmm0		; SSE-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; AVX-LABEL: insert_mem_and_zero_v2f64:		; AVX-LABEL: insert_mem_and_zero_v2f64:
; AVX: # BB#0:		; AVX: # BB#0:
; AVX-NEXT: vmovsd (%rdi), %xmm0		; AVX-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
; AVX-NEXT: retq		; AVX-NEXT: retq
%a = load double* %ptr		%a = load double* %ptr
%v = insertelement <2 x double> undef, double %a, i32 0		%v = insertelement <2 x double> undef, double %a, i32 0
%shuffle = shufflevector <2 x double> %v, <2 x double> zeroinitializer, <2 x i32> <i32 0, i32 3>		%shuffle = shufflevector <2 x double> %v, <2 x double> zeroinitializer, <2 x i32> <i32 0, i32 3>
ret <2 x double> %shuffle		ret <2 x double> %shuffle
}		}

define <2 x i64> @insert_reg_lo_v2i64(i64 %a, <2 x i64> %b) {		define <2 x i64> @insert_reg_lo_v2i64(i64 %a, <2 x i64> %b) {
; SSE2-LABEL: insert_reg_lo_v2i64:		; SSE2-LABEL: insert_reg_lo_v2i64:
; SSE2: # BB#0:		; SSE2: # BB#0:
; SSE2-NEXT: movd %rdi, %xmm1		; SSE2-NEXT: movd %rdi, %xmm1
; SSE2-NEXT: movsd %xmm1, %xmm0		; SSE2-NEXT: movsd {{.*#+}} xmm0 = xmm1[0],xmm0[1]
; SSE2-NEXT: retq		; SSE2-NEXT: retq
;		;
; SSE3-LABEL: insert_reg_lo_v2i64:		; SSE3-LABEL: insert_reg_lo_v2i64:
; SSE3: # BB#0:		; SSE3: # BB#0:
; SSE3-NEXT: movd %rdi, %xmm1		; SSE3-NEXT: movd %rdi, %xmm1
; SSE3-NEXT: movsd %xmm1, %xmm0		; SSE3-NEXT: movsd {{.*#+}} xmm0 = xmm1[0],xmm0[1]
; SSE3-NEXT: retq		; SSE3-NEXT: retq
;		;
; SSSE3-LABEL: insert_reg_lo_v2i64:		; SSSE3-LABEL: insert_reg_lo_v2i64:
; SSSE3: # BB#0:		; SSSE3: # BB#0:
; SSSE3-NEXT: movd %rdi, %xmm1		; SSSE3-NEXT: movd %rdi, %xmm1
; SSSE3-NEXT: movsd %xmm1, %xmm0		; SSSE3-NEXT: movsd {{.*#+}} xmm0 = xmm1[0],xmm0[1]
; SSSE3-NEXT: retq		; SSSE3-NEXT: retq
;		;
; SSE41-LABEL: insert_reg_lo_v2i64:		; SSE41-LABEL: insert_reg_lo_v2i64:
; SSE41: # BB#0:		; SSE41: # BB#0:
; SSE41-NEXT: movd %rdi, %xmm1		; SSE41-NEXT: movd %rdi, %xmm1
; SSE41-NEXT: pblendw {{.*#+}} xmm0 = xmm1[0,1,2,3],xmm0[4,5,6,7]		; SSE41-NEXT: pblendw {{.*#+}} xmm0 = xmm1[0,1,2,3],xmm0[4,5,6,7]
; SSE41-NEXT: retq		; SSE41-NEXT: retq
;		;
;		;
; SSSE3-LABEL: insert_mem_lo_v2i64:		; SSSE3-LABEL: insert_mem_lo_v2i64:
; SSSE3: # BB#0:		; SSSE3: # BB#0:
; SSSE3-NEXT: movlpd (%rdi), %xmm0		; SSSE3-NEXT: movlpd (%rdi), %xmm0
; SSSE3-NEXT: retq		; SSSE3-NEXT: retq
;		;
; SSE41-LABEL: insert_mem_lo_v2i64:		; SSE41-LABEL: insert_mem_lo_v2i64:
; SSE41: # BB#0:		; SSE41: # BB#0:
; SSE41-NEXT: movq (%rdi), %xmm1		; SSE41-NEXT: movq {{.*#+}} xmm1 = mem[0],zero
; SSE41-NEXT: pblendw {{.*#+}} xmm0 = xmm1[0,1,2,3],xmm0[4,5,6,7]		; SSE41-NEXT: pblendw {{.*#+}} xmm0 = xmm1[0,1,2,3],xmm0[4,5,6,7]
; SSE41-NEXT: retq		; SSE41-NEXT: retq
;		;
; AVX1-LABEL: insert_mem_lo_v2i64:		; AVX1-LABEL: insert_mem_lo_v2i64:
; AVX1: # BB#0:		; AVX1: # BB#0:
; AVX1-NEXT: vmovq (%rdi), %xmm1		; AVX1-NEXT: vmovq {{.*#+}} xmm1 = mem[0],zero
; AVX1-NEXT: vpblendw {{.*#+}} xmm0 = xmm1[0,1,2,3],xmm0[4,5,6,7]		; AVX1-NEXT: vpblendw {{.*#+}} xmm0 = xmm1[0,1,2,3],xmm0[4,5,6,7]
; AVX1-NEXT: retq		; AVX1-NEXT: retq
;		;
; AVX2-LABEL: insert_mem_lo_v2i64:		; AVX2-LABEL: insert_mem_lo_v2i64:
; AVX2: # BB#0:		; AVX2: # BB#0:
; AVX2-NEXT: vmovq (%rdi), %xmm1		; AVX2-NEXT: vmovq {{.*#+}} xmm1 = mem[0],zero
; AVX2-NEXT: vpblendd {{.*#+}} xmm0 = xmm1[0,1],xmm0[2,3]		; AVX2-NEXT: vpblendd {{.*#+}} xmm0 = xmm1[0,1],xmm0[2,3]
; AVX2-NEXT: retq		; AVX2-NEXT: retq
%a = load i64* %ptr		%a = load i64* %ptr
%v = insertelement <2 x i64> undef, i64 %a, i32 0		%v = insertelement <2 x i64> undef, i64 %a, i32 0
%shuffle = shufflevector <2 x i64> %v, <2 x i64> %b, <2 x i32> <i32 0, i32 3>		%shuffle = shufflevector <2 x i64> %v, <2 x i64> %b, <2 x i32> <i32 0, i32 3>
ret <2 x i64> %shuffle		ret <2 x i64> %shuffle
}		}

; AVX-NEXT: retq
%v = insertelement <2 x i64> undef, i64 %a, i32 0		%v = insertelement <2 x i64> undef, i64 %a, i32 0
%shuffle = shufflevector <2 x i64> %v, <2 x i64> %b, <2 x i32> <i32 2, i32 0>		%shuffle = shufflevector <2 x i64> %v, <2 x i64> %b, <2 x i32> <i32 2, i32 0>
ret <2 x i64> %shuffle		ret <2 x i64> %shuffle
}		}

define <2 x i64> @insert_mem_hi_v2i64(i64* %ptr, <2 x i64> %b) {		define <2 x i64> @insert_mem_hi_v2i64(i64* %ptr, <2 x i64> %b) {
; SSE-LABEL: insert_mem_hi_v2i64:		; SSE-LABEL: insert_mem_hi_v2i64:
; SSE: # BB#0:		; SSE: # BB#0:
; SSE-NEXT: movq (%rdi), %xmm1		; SSE-NEXT: movq {{.*#+}} xmm1 = mem[0],zero
; SSE-NEXT: punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]		; SSE-NEXT: punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; AVX-LABEL: insert_mem_hi_v2i64:		; AVX-LABEL: insert_mem_hi_v2i64:
; AVX: # BB#0:		; AVX: # BB#0:
; AVX-NEXT: vmovq (%rdi), %xmm1		; AVX-NEXT: vmovq {{.*#+}} xmm1 = mem[0],zero
; AVX-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]		; AVX-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
; AVX-NEXT: retq		; AVX-NEXT: retq
%a = load i64* %ptr		%a = load i64* %ptr
%v = insertelement <2 x i64> undef, i64 %a, i32 0		%v = insertelement <2 x i64> undef, i64 %a, i32 0
%shuffle = shufflevector <2 x i64> %v, <2 x i64> %b, <2 x i32> <i32 2, i32 0>		%shuffle = shufflevector <2 x i64> %v, <2 x i64> %b, <2 x i32> <i32 2, i32 0>
ret <2 x i64> %shuffle		ret <2 x i64> %shuffle
}		}

define <2 x double> @insert_reg_lo_v2f64(double %a, <2 x double> %b) {		define <2 x double> @insert_reg_lo_v2f64(double %a, <2 x double> %b) {
; SSE-LABEL: insert_reg_lo_v2f64:		; SSE-LABEL: insert_reg_lo_v2f64:
; SSE: # BB#0:		; SSE: # BB#0:
; SSE-NEXT: movsd %xmm0, %xmm1		; SSE-NEXT: movsd {{.*#+}} xmm1 = xmm0[0],xmm1[1]
; SSE-NEXT: movaps %xmm1, %xmm0		; SSE-NEXT: movaps %xmm1, %xmm0
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; AVX-LABEL: insert_reg_lo_v2f64:		; AVX-LABEL: insert_reg_lo_v2f64:
; AVX: # BB#0:		; AVX: # BB#0:
; AVX-NEXT: vmovsd %xmm0, %xmm1, %xmm0		; AVX-NEXT: vmovsd {{.*#+}} xmm0 = xmm0[0],xmm1[1]
; AVX-NEXT: retq		; AVX-NEXT: retq
%v = insertelement <2 x double> undef, double %a, i32 0		%v = insertelement <2 x double> undef, double %a, i32 0
%shuffle = shufflevector <2 x double> %v, <2 x double> %b, <2 x i32> <i32 0, i32 3>		%shuffle = shufflevector <2 x double> %v, <2 x double> %b, <2 x i32> <i32 0, i32 3>
ret <2 x double> %shuffle		ret <2 x double> %shuffle
}		}
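Register-to-register `movsd`/`movss` is a two-operand shuffle. The low lane comes from the AT&T source register and the upper lanes pass through from the destination, which gives `xmm0[0],xmm1[1]` for `movsd %xmm0, %xmm1`. The AVX form `vmovsd %xmm0, %xmm1, %xmm0` uses `xmm1` as the pass-through operand and produces the same comment. The following Python sketch models this. `decode_scalar_move` and `format_mask` are hypothetical helpers, and the `-2` zero sentinel is an assumption.

```python
ZERO = -2  # assumed sentinel for a zeroed lane


def decode_scalar_move(num_elts):
    """Mask for register-to-register movss/movsd as a two-operand shuffle.

    Operand 0 is the register whose upper lanes pass through; operand 1
    supplies the scalar, so index num_elts means "operand 1, lane 0".
    """
    return [num_elts] + list(range(1, num_elts))


def format_mask(mask, names, num_elts):
    """Render like the asm comments: same-source runs share one bracket group."""
    parts, prev = [], None
    for idx in mask:
        if idx == ZERO:
            parts.append("zero")
            prev = None
            continue
        src, lane = divmod(idx, num_elts)
        if src == prev:
            parts[-1] = parts[-1][:-1] + ",%d]" % lane
        else:
            parts.append("%s[%d]" % (names[src], lane))
        prev = src
    return ",".join(parts)


# movsd %xmm0, %xmm1: scalar from xmm0, upper lane kept from xmm1.
movsd_comment = format_mask(decode_scalar_move(2), ["xmm1", "xmm0"], 2)
# movss %xmm0, %xmm1: same idea with four 32-bit lanes.
movss_comment = format_mask(decode_scalar_move(4), ["xmm1", "xmm0"], 4)
```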

define <2 x double> @insert_mem_lo_v2f64(double* %ptr, <2 x double> %b) {		define <2 x double> @insert_mem_lo_v2f64(double* %ptr, <2 x double> %b) {
; SSE-LABEL: insert_mem_lo_v2f64:		; SSE-LABEL: insert_mem_lo_v2f64:
(72 lines not shown)
; AVX-NEXT: retq		; AVX-NEXT: retq
%v = insertelement <2 x double> undef, double %a, i32 0		%v = insertelement <2 x double> undef, double %a, i32 0
%shuffle = shufflevector <2 x double> %v, <2 x double> undef, <2 x i32> <i32 0, i32 0>		%shuffle = shufflevector <2 x double> %v, <2 x double> undef, <2 x i32> <i32 0, i32 0>
ret <2 x double> %shuffle		ret <2 x double> %shuffle
}		}
define <2 x double> @insert_dup_mem_v2f64(double* %ptr) {		define <2 x double> @insert_dup_mem_v2f64(double* %ptr) {
; SSE2-LABEL: insert_dup_mem_v2f64:		; SSE2-LABEL: insert_dup_mem_v2f64:
; SSE2: # BB#0:		; SSE2: # BB#0:
; SSE2-NEXT: movsd (%rdi), %xmm0		; SSE2-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero
; SSE2-NEXT: movlhps {{.*#+}} xmm0 = xmm0[0,0]		; SSE2-NEXT: movlhps {{.*#+}} xmm0 = xmm0[0,0]
; SSE2-NEXT: retq		; SSE2-NEXT: retq
;		;
; SSE3-LABEL: insert_dup_mem_v2f64:		; SSE3-LABEL: insert_dup_mem_v2f64:
; SSE3: # BB#0:		; SSE3: # BB#0:
; SSE3-NEXT: movddup (%rdi), %xmm0		; SSE3-NEXT: movddup (%rdi), %xmm0
; SSE3-NEXT: retq		; SSE3-NEXT: retq
;		;
(35 lines not shown)

test/CodeGen/X86/vector-shuffle-128-v4.ll

(435 lines not shown)
; AVX-NEXT: retq		; AVX-NEXT: retq
%shuffle = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 4, i32 0, i32 1, i32 5>		%shuffle = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 4, i32 0, i32 1, i32 5>
ret <4 x i32> %shuffle		ret <4 x i32> %shuffle
}		}

define <4 x float> @shuffle_v4f32_4zzz(<4 x float> %a) {		define <4 x float> @shuffle_v4f32_4zzz(<4 x float> %a) {
; SSE2-LABEL: shuffle_v4f32_4zzz:		; SSE2-LABEL: shuffle_v4f32_4zzz:
; SSE2: # BB#0:		; SSE2: # BB#0:
; SSE2-NEXT: xorps %xmm1, %xmm1		; SSE2-NEXT: xorps %xmm1, %xmm1
; SSE2-NEXT: movss %xmm0, %xmm1		; SSE2-NEXT: movss {{.*#+}} xmm1 = xmm0[0],xmm1[1,2,3]
; SSE2-NEXT: movaps %xmm1, %xmm0		; SSE2-NEXT: movaps %xmm1, %xmm0
; SSE2-NEXT: retq		; SSE2-NEXT: retq
;		;
; SSE3-LABEL: shuffle_v4f32_4zzz:		; SSE3-LABEL: shuffle_v4f32_4zzz:
; SSE3: # BB#0:		; SSE3: # BB#0:
; SSE3-NEXT: xorps %xmm1, %xmm1		; SSE3-NEXT: xorps %xmm1, %xmm1
; SSE3-NEXT: movss %xmm0, %xmm1		; SSE3-NEXT: movss {{.*#+}} xmm1 = xmm0[0],xmm1[1,2,3]
; SSE3-NEXT: movaps %xmm1, %xmm0		; SSE3-NEXT: movaps %xmm1, %xmm0
; SSE3-NEXT: retq		; SSE3-NEXT: retq
;		;
; SSSE3-LABEL: shuffle_v4f32_4zzz:		; SSSE3-LABEL: shuffle_v4f32_4zzz:
; SSSE3: # BB#0:		; SSSE3: # BB#0:
; SSSE3-NEXT: xorps %xmm1, %xmm1		; SSSE3-NEXT: xorps %xmm1, %xmm1
; SSSE3-NEXT: movss %xmm0, %xmm1		; SSSE3-NEXT: movss {{.*#+}} xmm1 = xmm0[0],xmm1[1,2,3]
; SSSE3-NEXT: movaps %xmm1, %xmm0		; SSSE3-NEXT: movaps %xmm1, %xmm0
; SSSE3-NEXT: retq		; SSSE3-NEXT: retq
;		;
; SSE41-LABEL: shuffle_v4f32_4zzz:		; SSE41-LABEL: shuffle_v4f32_4zzz:
; SSE41: # BB#0:		; SSE41: # BB#0:
; SSE41-NEXT: xorps %xmm1, %xmm1		; SSE41-NEXT: xorps %xmm1, %xmm1
; SSE41-NEXT: blendps {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]		; SSE41-NEXT: blendps {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
; SSE41-NEXT: retq		; SSE41-NEXT: retq
(189 lines not shown)
; AVX-NEXT: retq		; AVX-NEXT: retq
%shuffle = shufflevector <4 x float> zeroinitializer, <4 x float> %a, <4 x i32> <i32 0, i32 6, i32 2, i32 3>		%shuffle = shufflevector <4 x float> zeroinitializer, <4 x float> %a, <4 x i32> <i32 0, i32 6, i32 2, i32 3>
ret <4 x float> %shuffle		ret <4 x float> %shuffle
}		}

define <4 x i32> @shuffle_v4i32_4zzz(<4 x i32> %a) {		define <4 x i32> @shuffle_v4i32_4zzz(<4 x i32> %a) {
; SSE2-LABEL: shuffle_v4i32_4zzz:		; SSE2-LABEL: shuffle_v4i32_4zzz:
; SSE2: # BB#0:		; SSE2: # BB#0:
; SSE2-NEXT: xorps %xmm1, %xmm1		; SSE2-NEXT: xorps %xmm1, %xmm1
; SSE2-NEXT: movss %xmm0, %xmm1		; SSE2-NEXT: movss {{.*#+}} xmm1 = xmm0[0],xmm1[1,2,3]
; SSE2-NEXT: movaps %xmm1, %xmm0		; SSE2-NEXT: movaps %xmm1, %xmm0
; SSE2-NEXT: retq		; SSE2-NEXT: retq
;		;
; SSE3-LABEL: shuffle_v4i32_4zzz:		; SSE3-LABEL: shuffle_v4i32_4zzz:
; SSE3: # BB#0:		; SSE3: # BB#0:
; SSE3-NEXT: xorps %xmm1, %xmm1		; SSE3-NEXT: xorps %xmm1, %xmm1
; SSE3-NEXT: movss %xmm0, %xmm1		; SSE3-NEXT: movss {{.*#+}} xmm1 = xmm0[0],xmm1[1,2,3]
; SSE3-NEXT: movaps %xmm1, %xmm0		; SSE3-NEXT: movaps %xmm1, %xmm0
; SSE3-NEXT: retq		; SSE3-NEXT: retq
;		;
; SSSE3-LABEL: shuffle_v4i32_4zzz:		; SSSE3-LABEL: shuffle_v4i32_4zzz:
; SSSE3: # BB#0:		; SSSE3: # BB#0:
; SSSE3-NEXT: xorps %xmm1, %xmm1		; SSSE3-NEXT: xorps %xmm1, %xmm1
; SSSE3-NEXT: movss %xmm0, %xmm1		; SSSE3-NEXT: movss {{.*#+}} xmm1 = xmm0[0],xmm1[1,2,3]
; SSSE3-NEXT: movaps %xmm1, %xmm0		; SSSE3-NEXT: movaps %xmm1, %xmm0
; SSSE3-NEXT: retq		; SSSE3-NEXT: retq
;		;
; SSE41-LABEL: shuffle_v4i32_4zzz:		; SSE41-LABEL: shuffle_v4i32_4zzz:
; SSE41: # BB#0:		; SSE41: # BB#0:
; SSE41-NEXT: pxor %xmm1, %xmm1		; SSE41-NEXT: pxor %xmm1, %xmm1
; SSE41-NEXT: pblendw {{.*#+}} xmm0 = xmm0[0,1],xmm1[2,3,4,5,6,7]		; SSE41-NEXT: pblendw {{.*#+}} xmm0 = xmm0[0,1],xmm1[2,3,4,5,6,7]
; SSE41-NEXT: retq		; SSE41-NEXT: retq
;		;
; AVX-LABEL: shuffle_v4i32_4zzz:		; AVX-LABEL: shuffle_v4i32_4zzz:
; AVX: # BB#0:		; AVX: # BB#0:
; AVX-NEXT: vpxor %xmm1, %xmm1, %xmm1		; AVX-NEXT: vpxor %xmm1, %xmm1, %xmm1
; AVX-NEXT: vpblendw {{.*#+}} xmm0 = xmm0[0,1],xmm1[2,3,4,5,6,7]		; AVX-NEXT: vpblendw {{.*#+}} xmm0 = xmm0[0,1],xmm1[2,3,4,5,6,7]
; AVX-NEXT: retq		; AVX-NEXT: retq
%shuffle = shufflevector <4 x i32> zeroinitializer, <4 x i32> %a, <4 x i32> <i32 4, i32 1, i32 2, i32 3>		%shuffle = shufflevector <4 x i32> zeroinitializer, <4 x i32> %a, <4 x i32> <i32 4, i32 1, i32 2, i32 3>
ret <4 x i32> %shuffle		ret <4 x i32> %shuffle
}		}

define <4 x i32> @shuffle_v4i32_z4zz(<4 x i32> %a) {		define <4 x i32> @shuffle_v4i32_z4zz(<4 x i32> %a) {
; SSE2-LABEL: shuffle_v4i32_z4zz:		; SSE2-LABEL: shuffle_v4i32_z4zz:
; SSE2: # BB#0:		; SSE2: # BB#0:
; SSE2-NEXT: xorps %xmm1, %xmm1		; SSE2-NEXT: xorps %xmm1, %xmm1
; SSE2-NEXT: movss %xmm0, %xmm1		; SSE2-NEXT: movss {{.*#+}} xmm1 = xmm0[0],xmm1[1,2,3]
; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm1[1,0,1,1]		; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm1[1,0,1,1]
; SSE2-NEXT: retq		; SSE2-NEXT: retq
;		;
; SSE3-LABEL: shuffle_v4i32_z4zz:		; SSE3-LABEL: shuffle_v4i32_z4zz:
; SSE3: # BB#0:		; SSE3: # BB#0:
; SSE3-NEXT: xorps %xmm1, %xmm1		; SSE3-NEXT: xorps %xmm1, %xmm1
; SSE3-NEXT: movss %xmm0, %xmm1		; SSE3-NEXT: movss {{.*#+}} xmm1 = xmm0[0],xmm1[1,2,3]
; SSE3-NEXT: pshufd {{.*#+}} xmm0 = xmm1[1,0,1,1]		; SSE3-NEXT: pshufd {{.*#+}} xmm0 = xmm1[1,0,1,1]
; SSE3-NEXT: retq		; SSE3-NEXT: retq
;		;
; SSSE3-LABEL: shuffle_v4i32_z4zz:		; SSSE3-LABEL: shuffle_v4i32_z4zz:
; SSSE3: # BB#0:		; SSSE3: # BB#0:
; SSSE3-NEXT: xorps %xmm1, %xmm1		; SSSE3-NEXT: xorps %xmm1, %xmm1
; SSSE3-NEXT: movss %xmm0, %xmm1		; SSSE3-NEXT: movss {{.*#+}} xmm1 = xmm0[0],xmm1[1,2,3]
; SSSE3-NEXT: pshufd {{.*#+}} xmm0 = xmm1[1,0,1,1]		; SSSE3-NEXT: pshufd {{.*#+}} xmm0 = xmm1[1,0,1,1]
; SSSE3-NEXT: retq		; SSSE3-NEXT: retq
;		;
; SSE41-LABEL: shuffle_v4i32_z4zz:		; SSE41-LABEL: shuffle_v4i32_z4zz:
; SSE41: # BB#0:		; SSE41: # BB#0:
; SSE41-NEXT: pxor %xmm1, %xmm1		; SSE41-NEXT: pxor %xmm1, %xmm1
; SSE41-NEXT: pblendw {{.*#+}} xmm1 = xmm0[0,1],xmm1[2,3,4,5,6,7]		; SSE41-NEXT: pblendw {{.*#+}} xmm1 = xmm0[0,1],xmm1[2,3,4,5,6,7]
; SSE41-NEXT: pshufd {{.*#+}} xmm0 = xmm1[1,0,1,1]		; SSE41-NEXT: pshufd {{.*#+}} xmm0 = xmm1[1,0,1,1]
; SSE41-NEXT: retq		; SSE41-NEXT: retq
;		;
; AVX-LABEL: shuffle_v4i32_z4zz:		; AVX-LABEL: shuffle_v4i32_z4zz:
; AVX: # BB#0:		; AVX: # BB#0:
; AVX-NEXT: vpxor %xmm1, %xmm1, %xmm1		; AVX-NEXT: vpxor %xmm1, %xmm1, %xmm1
; AVX-NEXT: vpblendw {{.*#+}} xmm0 = xmm0[0,1],xmm1[2,3,4,5,6,7]		; AVX-NEXT: vpblendw {{.*#+}} xmm0 = xmm0[0,1],xmm1[2,3,4,5,6,7]
; AVX-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[1,0,1,1]		; AVX-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[1,0,1,1]
; AVX-NEXT: retq		; AVX-NEXT: retq
%shuffle = shufflevector <4 x i32> zeroinitializer, <4 x i32> %a, <4 x i32> <i32 2, i32 4, i32 3, i32 0>		%shuffle = shufflevector <4 x i32> zeroinitializer, <4 x i32> %a, <4 x i32> <i32 2, i32 4, i32 3, i32 0>
ret <4 x i32> %shuffle		ret <4 x i32> %shuffle
}		}
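Once each instruction carries a decoded mask, a multi-instruction lowering can be checked by composing the masks. The Python sketch below does this for the SSE2 sequence of `shuffle_v4i32_z4zz` (`xorps`, `movss`, `pshufd`) and compares the result with the IR `shufflevector`. It uses symbolic lane names. The helpers are hypothetical models of the instructions, and the `pshufd` immediate is taken as already decoded into lane indices.

```python
def shufflevector(v1, v2, mask):
    """IR shufflevector on concrete lanes: index into the concatenation v1 ++ v2."""
    concat = list(v1) + list(v2)
    return [concat[i] for i in mask]


def movss(dst, src):
    """Register-to-register movss: low lane from src, upper lanes kept from dst."""
    return [src[0]] + list(dst[1:])


def pshufd(src, lanes):
    """pshufd with its 8-bit immediate already decoded into four lane indices."""
    return [src[i] for i in lanes]


a = ["a0", "a1", "a2", "a3"]  # %a arrives in xmm0
zero = ["0"] * 4

# xorps %xmm1, %xmm1 -> xmm1 = zero
# movss              -> xmm1 = xmm0[0],xmm1[1,2,3]
# pshufd             -> xmm0 = xmm1[1,0,1,1]
xmm1 = movss(zero, a)
lowered = pshufd(xmm1, [1, 0, 1, 1])

# shufflevector <4 x i32> zeroinitializer, <4 x i32> %a, <2, 4, 3, 0>
expected = shufflevector(zero, a, [2, 4, 3, 0])
```

The SSE41 variant swaps `movss` for `pblendw xmm1 = xmm0[0,1],xmm1[2,3,4,5,6,7]`. That is the same selection expressed in 16-bit words.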

define <4 x i32> @shuffle_v4i32_zz4z(<4 x i32> %a) {		define <4 x i32> @shuffle_v4i32_zz4z(<4 x i32> %a) {
; SSE2-LABEL: shuffle_v4i32_zz4z:		; SSE2-LABEL: shuffle_v4i32_zz4z:
; SSE2: # BB#0:		; SSE2: # BB#0:
; SSE2-NEXT: xorps %xmm1, %xmm1		; SSE2-NEXT: xorps %xmm1, %xmm1
; SSE2-NEXT: movss %xmm0, %xmm1		; SSE2-NEXT: movss {{.*#+}} xmm1 = xmm0[0],xmm1[1,2,3]
; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm1[1,1,0,1]		; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm1[1,1,0,1]
; SSE2-NEXT: retq		; SSE2-NEXT: retq
;		;
; SSE3-LABEL: shuffle_v4i32_zz4z:		; SSE3-LABEL: shuffle_v4i32_zz4z:
; SSE3: # BB#0:		; SSE3: # BB#0:
; SSE3-NEXT: xorps %xmm1, %xmm1		; SSE3-NEXT: xorps %xmm1, %xmm1
; SSE3-NEXT: movss %xmm0, %xmm1		; SSE3-NEXT: movss {{.*#+}} xmm1 = xmm0[0],xmm1[1,2,3]
; SSE3-NEXT: pshufd {{.*#+}} xmm0 = xmm1[1,1,0,1]		; SSE3-NEXT: pshufd {{.*#+}} xmm0 = xmm1[1,1,0,1]
; SSE3-NEXT: retq		; SSE3-NEXT: retq
;		;
; SSSE3-LABEL: shuffle_v4i32_zz4z:		; SSSE3-LABEL: shuffle_v4i32_zz4z:
; SSSE3: # BB#0:		; SSSE3: # BB#0:
; SSSE3-NEXT: xorps %xmm1, %xmm1		; SSSE3-NEXT: xorps %xmm1, %xmm1
; SSSE3-NEXT: movss %xmm0, %xmm1		; SSSE3-NEXT: movss {{.*#+}} xmm1 = xmm0[0],xmm1[1,2,3]
; SSSE3-NEXT: pshufd {{.*#+}} xmm0 = xmm1[1,1,0,1]		; SSSE3-NEXT: pshufd {{.*#+}} xmm0 = xmm1[1,1,0,1]
; SSSE3-NEXT: retq		; SSSE3-NEXT: retq
;		;
; SSE41-LABEL: shuffle_v4i32_zz4z:		; SSE41-LABEL: shuffle_v4i32_zz4z:
; SSE41: # BB#0:		; SSE41: # BB#0:
; SSE41-NEXT: pxor %xmm1, %xmm1		; SSE41-NEXT: pxor %xmm1, %xmm1
; SSE41-NEXT: pblendw {{.*#+}} xmm1 = xmm0[0,1],xmm1[2,3,4,5,6,7]		; SSE41-NEXT: pblendw {{.*#+}} xmm1 = xmm0[0,1],xmm1[2,3,4,5,6,7]
; SSE41-NEXT: pshufd {{.*#+}} xmm0 = xmm1[1,1,0,1]		; SSE41-NEXT: pshufd {{.*#+}} xmm0 = xmm1[1,1,0,1]
(265 lines not shown)
;		;
; SSSE3-LABEL: shuffle_v4i32_0u1u:		; SSSE3-LABEL: shuffle_v4i32_0u1u:
; SSSE3: # BB#0:		; SSSE3: # BB#0:
; SSSE3-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,1,1,3]		; SSSE3-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,1,1,3]
; SSSE3-NEXT: retq		; SSSE3-NEXT: retq
;		;
; SSE41-LABEL: shuffle_v4i32_0u1u:		; SSE41-LABEL: shuffle_v4i32_0u1u:
; SSE41: # BB#0:		; SSE41: # BB#0:
; SSE41-NEXT: pmovzxdq %xmm0, %xmm0		; SSE41-NEXT: pmovzxdq {{.*#+}} xmm0 = xmm0[0],zero,xmm0[1],zero
; SSE41-NEXT: retq		; SSE41-NEXT: retq
;		;
; AVX-LABEL: shuffle_v4i32_0u1u:		; AVX-LABEL: shuffle_v4i32_0u1u:
; AVX: # BB#0:		; AVX: # BB#0:
; AVX-NEXT: vpmovzxdq %xmm0, %xmm0		; AVX-NEXT: vpmovzxdq {{.*#+}} xmm0 = xmm0[0],zero,xmm0[1],zero
; AVX-NEXT: retq		; AVX-NEXT: retq
%shuffle = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 0, i32 undef, i32 1, i32 undef>		%shuffle = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 0, i32 undef, i32 1, i32 undef>
ret <4 x i32> %shuffle		ret <4 x i32> %shuffle
}		}

define <4 x i32> @shuffle_v4i32_0z1z(<4 x i32> %a) {		define <4 x i32> @shuffle_v4i32_0z1z(<4 x i32> %a) {
; SSE2-LABEL: shuffle_v4i32_0z1z:		; SSE2-LABEL: shuffle_v4i32_0z1z:
; SSE2: # BB#0:		; SSE2: # BB#0:
(10 lines not shown)
; SSSE3-LABEL: shuffle_v4i32_0z1z:		; SSSE3-LABEL: shuffle_v4i32_0z1z:
; SSSE3: # BB#0:		; SSSE3: # BB#0:
; SSSE3-NEXT: pxor %xmm1, %xmm1		; SSSE3-NEXT: pxor %xmm1, %xmm1
; SSSE3-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]		; SSSE3-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
; SSSE3-NEXT: retq		; SSSE3-NEXT: retq
;		;
; SSE41-LABEL: shuffle_v4i32_0z1z:		; SSE41-LABEL: shuffle_v4i32_0z1z:
; SSE41: # BB#0:		; SSE41: # BB#0:
; SSE41-NEXT: pmovzxdq %xmm0, %xmm0		; SSE41-NEXT: pmovzxdq {{.*#+}} xmm0 = xmm0[0],zero,xmm0[1],zero
; SSE41-NEXT: retq		; SSE41-NEXT: retq
;		;
; AVX-LABEL: shuffle_v4i32_0z1z:		; AVX-LABEL: shuffle_v4i32_0z1z:
; AVX: # BB#0:		; AVX: # BB#0:
; AVX-NEXT: vpmovzxdq %xmm0, %xmm0		; AVX-NEXT: vpmovzxdq {{.*#+}} xmm0 = xmm0[0],zero,xmm0[1],zero
; AVX-NEXT: retq		; AVX-NEXT: retq
%shuffle = shufflevector <4 x i32> %a, <4 x i32> zeroinitializer, <4 x i32> <i32 0, i32 5, i32 1, i32 7>		%shuffle = shufflevector <4 x i32> %a, <4 x i32> zeroinitializer, <4 x i32> <i32 0, i32 5, i32 1, i32 7>
ret <4 x i32> %shuffle		ret <4 x i32> %shuffle
}		}
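The `pmovzx*` comments follow one pattern, parameterized by a scale factor (destination width / source width). Only the low source elements are read, and each one is followed by `scale - 1` zeroed source-width lanes. The Python sketch below shows that decode. The `decode_zero_extend` name, its parameters and the `-2` sentinel are assumptions, and the mask is expressed in source-width lanes to match how the comments are printed.

```python
ZERO = -2  # assumed sentinel for a zeroed lane


def decode_zero_extend(src_bits, dst_bits, reg_bits=128):
    """Mask, in src_bits-wide lanes, for a pmovzx* widening src_bits -> dst_bits."""
    scale = dst_bits // src_bits   # e.g. 2 for pmovzxdq, 4 for pmovzxwq
    num_dst = reg_bits // dst_bits  # number of destination elements produced
    mask = []
    for i in range(num_dst):
        mask.append(i)                     # source element i in the low part
        mask.extend([ZERO] * (scale - 1))  # zero-filled high part of the lane
    return mask


def format_single_source(mask, name):
    """Per-lane rendering; it never merges consecutive lanes, which is fine
    for pmovzx masks because every source lane is followed by a zero."""
    return ",".join("zero" if m == ZERO else "%s[%d]" % (name, m) for m in mask)


pmovzxdq = format_single_source(decode_zero_extend(32, 64), "xmm0")
pmovzxwq = format_single_source(decode_zero_extend(16, 64), "xmm0")
pmovzxwd = format_single_source(decode_zero_extend(16, 32), "xmm0")
```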

define <4 x i32> @insert_reg_and_zero_v4i32(i32 %a) {		define <4 x i32> @insert_reg_and_zero_v4i32(i32 %a) {
; SSE-LABEL: insert_reg_and_zero_v4i32:		; SSE-LABEL: insert_reg_and_zero_v4i32:
; SSE: # BB#0:		; SSE: # BB#0:
; SSE-NEXT: movd %edi, %xmm0		; SSE-NEXT: movd %edi, %xmm0
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; AVX-LABEL: insert_reg_and_zero_v4i32:		; AVX-LABEL: insert_reg_and_zero_v4i32:
; AVX: # BB#0:		; AVX: # BB#0:
; AVX-NEXT: vmovd %edi, %xmm0		; AVX-NEXT: vmovd %edi, %xmm0
; AVX-NEXT: retq		; AVX-NEXT: retq
%v = insertelement <4 x i32> undef, i32 %a, i32 0		%v = insertelement <4 x i32> undef, i32 %a, i32 0
%shuffle = shufflevector <4 x i32> %v, <4 x i32> zeroinitializer, <4 x i32> <i32 0, i32 5, i32 6, i32 7>		%shuffle = shufflevector <4 x i32> %v, <4 x i32> zeroinitializer, <4 x i32> <i32 0, i32 5, i32 6, i32 7>
ret <4 x i32> %shuffle		ret <4 x i32> %shuffle
}		}

define <4 x i32> @insert_mem_and_zero_v4i32(i32* %ptr) {		define <4 x i32> @insert_mem_and_zero_v4i32(i32* %ptr) {
; SSE-LABEL: insert_mem_and_zero_v4i32:		; SSE-LABEL: insert_mem_and_zero_v4i32:
; SSE: # BB#0:		; SSE: # BB#0:
; SSE-NEXT: movd (%rdi), %xmm0		; SSE-NEXT: movd {{.*#+}} xmm0 = mem[0],zero,zero,zero
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; AVX-LABEL: insert_mem_and_zero_v4i32:		; AVX-LABEL: insert_mem_and_zero_v4i32:
; AVX: # BB#0:		; AVX: # BB#0:
; AVX-NEXT: vmovd (%rdi), %xmm0		; AVX-NEXT: vmovd {{.*#+}} xmm0 = mem[0],zero,zero,zero
; AVX-NEXT: retq		; AVX-NEXT: retq
%a = load i32* %ptr		%a = load i32* %ptr
%v = insertelement <4 x i32> undef, i32 %a, i32 0		%v = insertelement <4 x i32> undef, i32 %a, i32 0
%shuffle = shufflevector <4 x i32> %v, <4 x i32> zeroinitializer, <4 x i32> <i32 0, i32 5, i32 6, i32 7>		%shuffle = shufflevector <4 x i32> %v, <4 x i32> zeroinitializer, <4 x i32> <i32 0, i32 5, i32 6, i32 7>
ret <4 x i32> %shuffle		ret <4 x i32> %shuffle
}		}

define <4 x float> @insert_reg_and_zero_v4f32(float %a) {		define <4 x float> @insert_reg_and_zero_v4f32(float %a) {
; SSE2-LABEL: insert_reg_and_zero_v4f32:		; SSE2-LABEL: insert_reg_and_zero_v4f32:
; SSE2: # BB#0:		; SSE2: # BB#0:
; SSE2-NEXT: xorps %xmm1, %xmm1		; SSE2-NEXT: xorps %xmm1, %xmm1
; SSE2-NEXT: movss %xmm0, %xmm1		; SSE2-NEXT: movss {{.*#+}} xmm1 = xmm0[0],xmm1[1,2,3]
; SSE2-NEXT: movaps %xmm1, %xmm0		; SSE2-NEXT: movaps %xmm1, %xmm0
; SSE2-NEXT: retq		; SSE2-NEXT: retq
;		;
; SSE3-LABEL: insert_reg_and_zero_v4f32:		; SSE3-LABEL: insert_reg_and_zero_v4f32:
; SSE3: # BB#0:		; SSE3: # BB#0:
; SSE3-NEXT: xorps %xmm1, %xmm1		; SSE3-NEXT: xorps %xmm1, %xmm1
; SSE3-NEXT: movss %xmm0, %xmm1		; SSE3-NEXT: movss {{.*#+}} xmm1 = xmm0[0],xmm1[1,2,3]
; SSE3-NEXT: movaps %xmm1, %xmm0		; SSE3-NEXT: movaps %xmm1, %xmm0
; SSE3-NEXT: retq		; SSE3-NEXT: retq
;		;
; SSSE3-LABEL: insert_reg_and_zero_v4f32:		; SSSE3-LABEL: insert_reg_and_zero_v4f32:
; SSSE3: # BB#0:		; SSSE3: # BB#0:
; SSSE3-NEXT: xorps %xmm1, %xmm1		; SSSE3-NEXT: xorps %xmm1, %xmm1
; SSSE3-NEXT: movss %xmm0, %xmm1		; SSSE3-NEXT: movss {{.*#+}} xmm1 = xmm0[0],xmm1[1,2,3]
; SSSE3-NEXT: movaps %xmm1, %xmm0		; SSSE3-NEXT: movaps %xmm1, %xmm0
; SSSE3-NEXT: retq		; SSSE3-NEXT: retq
;		;
; SSE41-LABEL: insert_reg_and_zero_v4f32:		; SSE41-LABEL: insert_reg_and_zero_v4f32:
; SSE41: # BB#0:		; SSE41: # BB#0:
; SSE41-NEXT: xorps %xmm1, %xmm1		; SSE41-NEXT: xorps %xmm1, %xmm1
; SSE41-NEXT: blendps {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]		; SSE41-NEXT: blendps {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
; SSE41-NEXT: retq		; SSE41-NEXT: retq
;		;
; AVX-LABEL: insert_reg_and_zero_v4f32:		; AVX-LABEL: insert_reg_and_zero_v4f32:
; AVX: # BB#0:		; AVX: # BB#0:
; AVX-NEXT: vxorps %xmm1, %xmm1, %xmm1		; AVX-NEXT: vxorps %xmm1, %xmm1, %xmm1
; AVX-NEXT: vmovss %xmm0, %xmm1, %xmm0		; AVX-NEXT: vmovss {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
; AVX-NEXT: retq		; AVX-NEXT: retq
%v = insertelement <4 x float> undef, float %a, i32 0		%v = insertelement <4 x float> undef, float %a, i32 0
%shuffle = shufflevector <4 x float> %v, <4 x float> zeroinitializer, <4 x i32> <i32 0, i32 5, i32 6, i32 7>		%shuffle = shufflevector <4 x float> %v, <4 x float> zeroinitializer, <4 x i32> <i32 0, i32 5, i32 6, i32 7>
ret <4 x float> %shuffle		ret <4 x float> %shuffle
}		}

define <4 x float> @insert_mem_and_zero_v4f32(float* %ptr) {		define <4 x float> @insert_mem_and_zero_v4f32(float* %ptr) {
; SSE-LABEL: insert_mem_and_zero_v4f32:		; SSE-LABEL: insert_mem_and_zero_v4f32:
; SSE: # BB#0:		; SSE: # BB#0:
; SSE-NEXT: movss (%rdi), %xmm0		; SSE-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; AVX-LABEL: insert_mem_and_zero_v4f32:		; AVX-LABEL: insert_mem_and_zero_v4f32:
; AVX: # BB#0:		; AVX: # BB#0:
; AVX-NEXT: vmovss (%rdi), %xmm0		; AVX-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
; AVX-NEXT: retq		; AVX-NEXT: retq
%a = load float* %ptr		%a = load float* %ptr
%v = insertelement <4 x float> undef, float %a, i32 0		%v = insertelement <4 x float> undef, float %a, i32 0
%shuffle = shufflevector <4 x float> %v, <4 x float> zeroinitializer, <4 x i32> <i32 0, i32 5, i32 6, i32 7>		%shuffle = shufflevector <4 x float> %v, <4 x float> zeroinitializer, <4 x i32> <i32 0, i32 5, i32 6, i32 7>
ret <4 x float> %shuffle		ret <4 x float> %shuffle
}		}

define <4 x i32> @insert_reg_lo_v4i32(i64 %a, <4 x i32> %b) {		define <4 x i32> @insert_reg_lo_v4i32(i64 %a, <4 x i32> %b) {
; SSE2-LABEL: insert_reg_lo_v4i32:		; SSE2-LABEL: insert_reg_lo_v4i32:
; SSE2: # BB#0:		; SSE2: # BB#0:
; SSE2-NEXT: movd %rdi, %xmm1		; SSE2-NEXT: movd %rdi, %xmm1
; SSE2-NEXT: movsd %xmm1, %xmm0		; SSE2-NEXT: movsd {{.*#+}} xmm0 = xmm1[0],xmm0[1]
; SSE2-NEXT: retq		; SSE2-NEXT: retq
;		;
; SSE3-LABEL: insert_reg_lo_v4i32:		; SSE3-LABEL: insert_reg_lo_v4i32:
; SSE3: # BB#0:		; SSE3: # BB#0:
; SSE3-NEXT: movd %rdi, %xmm1		; SSE3-NEXT: movd %rdi, %xmm1
; SSE3-NEXT: movsd %xmm1, %xmm0		; SSE3-NEXT: movsd {{.*#+}} xmm0 = xmm1[0],xmm0[1]
; SSE3-NEXT: retq		; SSE3-NEXT: retq
;		;
; SSSE3-LABEL: insert_reg_lo_v4i32:		; SSSE3-LABEL: insert_reg_lo_v4i32:
; SSSE3: # BB#0:		; SSSE3: # BB#0:
; SSSE3-NEXT: movd %rdi, %xmm1		; SSSE3-NEXT: movd %rdi, %xmm1
; SSSE3-NEXT: movsd %xmm1, %xmm0		; SSSE3-NEXT: movsd {{.*#+}} xmm0 = xmm1[0],xmm0[1]
; SSSE3-NEXT: retq		; SSSE3-NEXT: retq
;		;
; SSE41-LABEL: insert_reg_lo_v4i32:		; SSE41-LABEL: insert_reg_lo_v4i32:
; SSE41: # BB#0:		; SSE41: # BB#0:
; SSE41-NEXT: movd %rdi, %xmm1		; SSE41-NEXT: movd %rdi, %xmm1
; SSE41-NEXT: pblendw {{.*#+}} xmm0 = xmm1[0,1,2,3],xmm0[4,5,6,7]		; SSE41-NEXT: pblendw {{.*#+}} xmm0 = xmm1[0,1,2,3],xmm0[4,5,6,7]
; SSE41-NEXT: retq		; SSE41-NEXT: retq
;		;
(27 lines not shown)
;		;
; SSSE3-LABEL: insert_mem_lo_v4i32:		; SSSE3-LABEL: insert_mem_lo_v4i32:
; SSSE3: # BB#0:		; SSSE3: # BB#0:
; SSSE3-NEXT: movlpd (%rdi), %xmm0		; SSSE3-NEXT: movlpd (%rdi), %xmm0
; SSSE3-NEXT: retq		; SSSE3-NEXT: retq
;		;
; SSE41-LABEL: insert_mem_lo_v4i32:		; SSE41-LABEL: insert_mem_lo_v4i32:
; SSE41: # BB#0:		; SSE41: # BB#0:
; SSE41-NEXT: movq (%rdi), %xmm1		; SSE41-NEXT: movq {{.*#+}} xmm1 = mem[0],zero
; SSE41-NEXT: pblendw {{.*#+}} xmm0 = xmm1[0,1,2,3],xmm0[4,5,6,7]		; SSE41-NEXT: pblendw {{.*#+}} xmm0 = xmm1[0,1,2,3],xmm0[4,5,6,7]
; SSE41-NEXT: retq		; SSE41-NEXT: retq
;		;
; AVX1-LABEL: insert_mem_lo_v4i32:		; AVX1-LABEL: insert_mem_lo_v4i32:
; AVX1: # BB#0:		; AVX1: # BB#0:
; AVX1-NEXT: vmovq (%rdi), %xmm1		; AVX1-NEXT: vmovq {{.*#+}} xmm1 = mem[0],zero
; AVX1-NEXT: vpblendw {{.*#+}} xmm0 = xmm1[0,1,2,3],xmm0[4,5,6,7]		; AVX1-NEXT: vpblendw {{.*#+}} xmm0 = xmm1[0,1,2,3],xmm0[4,5,6,7]
; AVX1-NEXT: retq		; AVX1-NEXT: retq
;		;
; AVX2-LABEL: insert_mem_lo_v4i32:		; AVX2-LABEL: insert_mem_lo_v4i32:
; AVX2: # BB#0:		; AVX2: # BB#0:
; AVX2-NEXT: vmovq (%rdi), %xmm1		; AVX2-NEXT: vmovq {{.*#+}} xmm1 = mem[0],zero
; AVX2-NEXT: vpblendd {{.*#+}} xmm0 = xmm1[0,1],xmm0[2,3]		; AVX2-NEXT: vpblendd {{.*#+}} xmm0 = xmm1[0,1],xmm0[2,3]
; AVX2-NEXT: retq		; AVX2-NEXT: retq
%a = load <2 x i32>* %ptr		%a = load <2 x i32>* %ptr
%v = shufflevector <2 x i32> %a, <2 x i32> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>		%v = shufflevector <2 x i32> %a, <2 x i32> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
%shuffle = shufflevector <4 x i32> %v, <4 x i32> %b, <4 x i32> <i32 0, i32 1, i32 6, i32 7>		%shuffle = shufflevector <4 x i32> %v, <4 x i32> %b, <4 x i32> <i32 0, i32 1, i32 6, i32 7>
ret <4 x i32> %shuffle		ret <4 x i32> %shuffle
}		}
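The lowerings in this function mix lane widths. `movq` is decoded on 64-bit lanes (`mem[0],zero`), `pblendw` on 16-bit words, and the IR mask on 32-bit lanes. Rescaling a mask to narrower lanes makes them comparable. The sketch below is a hypothetical Python helper with the same assumed `-2` zero sentinel. It checks the SSE41 sequence against the IR's `<0, 1, 6, 7>` mask. Symbolic word names stand in for real data, and `"u"` marks the undef upper half of `%v`.

```python
ZERO = -2  # assumed sentinel for a zeroed lane


def scale_mask(mask, factor):
    """Re-express a shuffle mask in lanes `factor` times narrower.

    Wide lane m becomes narrow lanes m*factor .. m*factor + factor - 1,
    and a zeroed wide lane becomes `factor` zeroed narrow lanes.
    """
    out = []
    for m in mask:
        if m == ZERO:
            out.extend([ZERO] * factor)
        else:
            out.extend(m * factor + k for k in range(factor))
    return out


# movq    xmm1 = mem[0],zero                  (64-bit lanes)
# pblendw xmm0 = xmm1[0,1,2,3],xmm0[4,5,6,7]  (16-bit words)
movq_words = scale_mask([0, ZERO], 4)
xmm1 = ["0" if m == ZERO else "mem.w%d" % m for m in movq_words]
b = ["b.w%d" % i for i in range(8)]  # %b arrives in xmm0
blended = [xmm1[i] if i < 4 else b[i] for i in range(8)]

# IR: shufflevector %v, %b, <0, 1, 6, 7> on 32-bit lanes, viewed as words.
ir_words = scale_mask([0, 1, 6, 7], 2)
concat = ["mem.w%d" % i for i in range(4)] + ["u"] * 4 + b
expected = [concat[i] for i in ir_words]
```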

(13 lines not shown)
; AVX-NEXT: retq		; AVX-NEXT: retq
%v = shufflevector <2 x i32> %a.cast, <2 x i32> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>		%v = shufflevector <2 x i32> %a.cast, <2 x i32> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
%shuffle = shufflevector <4 x i32> %v, <4 x i32> %b, <4 x i32> <i32 4, i32 5, i32 0, i32 1>		%shuffle = shufflevector <4 x i32> %v, <4 x i32> %b, <4 x i32> <i32 4, i32 5, i32 0, i32 1>
ret <4 x i32> %shuffle		ret <4 x i32> %shuffle
}		}

define <4 x i32> @insert_mem_hi_v4i32(<2 x i32>* %ptr, <4 x i32> %b) {		define <4 x i32> @insert_mem_hi_v4i32(<2 x i32>* %ptr, <4 x i32> %b) {
; SSE-LABEL: insert_mem_hi_v4i32:		; SSE-LABEL: insert_mem_hi_v4i32:
; SSE: # BB#0:		; SSE: # BB#0:
; SSE-NEXT: movq (%rdi), %xmm1		; SSE-NEXT: movq {{.*#+}} xmm1 = mem[0],zero
; SSE-NEXT: punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]		; SSE-NEXT: punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; AVX-LABEL: insert_mem_hi_v4i32:		; AVX-LABEL: insert_mem_hi_v4i32:
; AVX: # BB#0:		; AVX: # BB#0:
; AVX-NEXT: vmovq (%rdi), %xmm1		; AVX-NEXT: vmovq {{.*#+}} xmm1 = mem[0],zero
; AVX-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]		; AVX-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
; AVX-NEXT: retq		; AVX-NEXT: retq
%a = load <2 x i32>* %ptr		%a = load <2 x i32>* %ptr
%v = shufflevector <2 x i32> %a, <2 x i32> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>		%v = shufflevector <2 x i32> %a, <2 x i32> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
%shuffle = shufflevector <4 x i32> %v, <4 x i32> %b, <4 x i32> <i32 4, i32 5, i32 0, i32 1>		%shuffle = shufflevector <4 x i32> %v, <4 x i32> %b, <4 x i32> <i32 4, i32 5, i32 0, i32 1>
ret <4 x i32> %shuffle		ret <4 x i32> %shuffle
}		}

define <4 x float> @insert_reg_lo_v4f32(double %a, <4 x float> %b) {		define <4 x float> @insert_reg_lo_v4f32(double %a, <4 x float> %b) {
; SSE-LABEL: insert_reg_lo_v4f32:		; SSE-LABEL: insert_reg_lo_v4f32:
; SSE: # BB#0:		; SSE: # BB#0:
; SSE-NEXT: movsd %xmm0, %xmm1		; SSE-NEXT: movsd {{.*#+}} xmm1 = xmm0[0],xmm1[1]
; SSE-NEXT: movaps %xmm1, %xmm0		; SSE-NEXT: movaps %xmm1, %xmm0
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; AVX-LABEL: insert_reg_lo_v4f32:		; AVX-LABEL: insert_reg_lo_v4f32:
; AVX: # BB#0:		; AVX: # BB#0:
; AVX-NEXT: vmovsd %xmm0, %xmm1, %xmm0		; AVX-NEXT: vmovsd {{.*#+}} xmm0 = xmm0[0],xmm1[1]
; AVX-NEXT: retq		; AVX-NEXT: retq
%a.cast = bitcast double %a to <2 x float>		%a.cast = bitcast double %a to <2 x float>
%v = shufflevector <2 x float> %a.cast, <2 x float> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>		%v = shufflevector <2 x float> %a.cast, <2 x float> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
%shuffle = shufflevector <4 x float> %v, <4 x float> %b, <4 x i32> <i32 0, i32 1, i32 6, i32 7>		%shuffle = shufflevector <4 x float> %v, <4 x float> %b, <4 x i32> <i32 0, i32 1, i32 6, i32 7>
ret <4 x float> %shuffle		ret <4 x float> %shuffle
}		}

define <4 x float> @insert_mem_lo_v4f32(<2 x float>* %ptr, <4 x float> %b) {		define <4 x float> @insert_mem_lo_v4f32(<2 x float>* %ptr, <4 x float> %b) {
(63 lines not shown)

test/CodeGen/X86/vector-shuffle-128-v8.ll

(1,823 lines not shown)
	; SSSE3-LABEL: shuffle_v8i16_0uuu1uuu:			; SSSE3-LABEL: shuffle_v8i16_0uuu1uuu:
	; SSSE3: # BB#0:			; SSSE3: # BB#0:
	; SSSE3-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,1,0,3]			; SSSE3-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,1,0,3]
	; SSSE3-NEXT: pshufhw {{.*#+}} xmm0 = xmm0[0,1,2,3,5,5,6,7]			; SSSE3-NEXT: pshufhw {{.*#+}} xmm0 = xmm0[0,1,2,3,5,5,6,7]
	; SSSE3-NEXT: retq			; SSSE3-NEXT: retq
	;			;
	; SSE41-LABEL: shuffle_v8i16_0uuu1uuu:			; SSE41-LABEL: shuffle_v8i16_0uuu1uuu:
	; SSE41: # BB#0:			; SSE41: # BB#0:
	; SSE41-NEXT: pmovzxwq %xmm0, %xmm0			; SSE41-NEXT: pmovzxwq {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero
	; SSE41-NEXT: retq			; SSE41-NEXT: retq
	;			;
	; AVX-LABEL: shuffle_v8i16_0uuu1uuu:			; AVX-LABEL: shuffle_v8i16_0uuu1uuu:
	; AVX: # BB#0:			; AVX: # BB#0:
	; AVX-NEXT: vpmovzxwq %xmm0, %xmm0			; AVX-NEXT: vpmovzxwq {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%shuffle = shufflevector <8 x i16> %a, <8 x i16> zeroinitializer, <8 x i32> <i32 0, i32 undef, i32 undef, i32 undef, i32 1, i32 undef, i32 undef, i32 undef>			%shuffle = shufflevector <8 x i16> %a, <8 x i16> zeroinitializer, <8 x i32> <i32 0, i32 undef, i32 undef, i32 undef, i32 1, i32 undef, i32 undef, i32 undef>
	ret <8 x i16> %shuffle			ret <8 x i16> %shuffle
	}			}

	define <8 x i16> @shuffle_v8i16_0zzz1zzz(<8 x i16> %a) {			define <8 x i16> @shuffle_v8i16_0zzz1zzz(<8 x i16> %a) {
	; SSE2-LABEL: shuffle_v8i16_0zzz1zzz:			; SSE2-LABEL: shuffle_v8i16_0zzz1zzz:
	; SSE2: # BB#0:			; SSE2: # BB#0:
	; SSE2-NEXT: pxor %xmm1, %xmm1			; SSE2-NEXT: pxor %xmm1, %xmm1
	; SSE2-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3]			; SSE2-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3]
	; SSE2-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]			; SSE2-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
	; SSE2-NEXT: retq			; SSE2-NEXT: retq
	;			;
	; SSSE3-LABEL: shuffle_v8i16_0zzz1zzz:			; SSSE3-LABEL: shuffle_v8i16_0zzz1zzz:
	; SSSE3: # BB#0:			; SSSE3: # BB#0:
	; SSSE3-NEXT: pxor %xmm1, %xmm1			; SSSE3-NEXT: pxor %xmm1, %xmm1
	; SSSE3-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3]			; SSSE3-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3]
	; SSSE3-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]			; SSSE3-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
	; SSSE3-NEXT: retq			; SSSE3-NEXT: retq
	;			;
	; SSE41-LABEL: shuffle_v8i16_0zzz1zzz:			; SSE41-LABEL: shuffle_v8i16_0zzz1zzz:
	; SSE41: # BB#0:			; SSE41: # BB#0:
	; SSE41-NEXT: pmovzxwq %xmm0, %xmm0			; SSE41-NEXT: pmovzxwq {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero
	; SSE41-NEXT: retq			; SSE41-NEXT: retq
	;			;
	; AVX-LABEL: shuffle_v8i16_0zzz1zzz:			; AVX-LABEL: shuffle_v8i16_0zzz1zzz:
	; AVX: # BB#0:			; AVX: # BB#0:
	; AVX-NEXT: vpmovzxwq %xmm0, %xmm0			; AVX-NEXT: vpmovzxwq {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%shuffle = shufflevector <8 x i16> %a, <8 x i16> zeroinitializer, <8 x i32> <i32 0, i32 9, i32 10, i32 11, i32 1, i32 13, i32 14, i32 15>			%shuffle = shufflevector <8 x i16> %a, <8 x i16> zeroinitializer, <8 x i32> <i32 0, i32 9, i32 10, i32 11, i32 1, i32 13, i32 14, i32 15>
	ret <8 x i16> %shuffle			ret <8 x i16> %shuffle
	}			}

	define <8 x i16> @shuffle_v8i16_0u1u2u3u(<8 x i16> %a) {			define <8 x i16> @shuffle_v8i16_0u1u2u3u(<8 x i16> %a) {
	; SSE2-LABEL: shuffle_v8i16_0u1u2u3u:			; SSE2-LABEL: shuffle_v8i16_0u1u2u3u:
	; SSE2: # BB#0:			; SSE2: # BB#0:
	; SSE2-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3]			; SSE2-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3]
	; SSE2-NEXT: retq			; SSE2-NEXT: retq
	;			;
	; SSSE3-LABEL: shuffle_v8i16_0u1u2u3u:			; SSSE3-LABEL: shuffle_v8i16_0u1u2u3u:
	; SSSE3: # BB#0:			; SSSE3: # BB#0:
	; SSSE3-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3]			; SSSE3-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3]
	; SSSE3-NEXT: retq			; SSSE3-NEXT: retq
	;			;
	; SSE41-LABEL: shuffle_v8i16_0u1u2u3u:			; SSE41-LABEL: shuffle_v8i16_0u1u2u3u:
	; SSE41: # BB#0:			; SSE41: # BB#0:
	; SSE41-NEXT: pmovzxwd %xmm0, %xmm0			; SSE41-NEXT: pmovzxwd {{.*#+}} xmm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero
	; SSE41-NEXT: retq			; SSE41-NEXT: retq
	;			;
	; AVX-LABEL: shuffle_v8i16_0u1u2u3u:			; AVX-LABEL: shuffle_v8i16_0u1u2u3u:
	; AVX: # BB#0:			; AVX: # BB#0:
	; AVX-NEXT: vpmovzxwd %xmm0, %xmm0			; AVX-NEXT: vpmovzxwd {{.*#+}} xmm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%shuffle = shufflevector <8 x i16> %a, <8 x i16> zeroinitializer, <8 x i32> <i32 0, i32 undef, i32 1, i32 undef, i32 2, i32 undef, i32 3, i32 undef>			%shuffle = shufflevector <8 x i16> %a, <8 x i16> zeroinitializer, <8 x i32> <i32 0, i32 undef, i32 1, i32 undef, i32 2, i32 undef, i32 3, i32 undef>
	ret <8 x i16> %shuffle			ret <8 x i16> %shuffle
	}			}

	define <8 x i16> @shuffle_v8i16_0z1z2z3z(<8 x i16> %a) {			define <8 x i16> @shuffle_v8i16_0z1z2z3z(<8 x i16> %a) {
	; SSE2-LABEL: shuffle_v8i16_0z1z2z3z:			; SSE2-LABEL: shuffle_v8i16_0z1z2z3z:
	; SSE2: # BB#0:			; SSE2: # BB#0:
	; SSE2-NEXT: pxor %xmm1, %xmm1			; SSE2-NEXT: pxor %xmm1, %xmm1
	; SSE2-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3]			; SSE2-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3]
	; SSE2-NEXT: retq			; SSE2-NEXT: retq
	;			;
	; SSSE3-LABEL: shuffle_v8i16_0z1z2z3z:			; SSSE3-LABEL: shuffle_v8i16_0z1z2z3z:
	; SSSE3: # BB#0:			; SSSE3: # BB#0:
	; SSSE3-NEXT: pxor %xmm1, %xmm1			; SSSE3-NEXT: pxor %xmm1, %xmm1
	; SSSE3-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3]			; SSSE3-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3]
	; SSSE3-NEXT: retq			; SSSE3-NEXT: retq
	;			;
	; SSE41-LABEL: shuffle_v8i16_0z1z2z3z:			; SSE41-LABEL: shuffle_v8i16_0z1z2z3z:
	; SSE41: # BB#0:			; SSE41: # BB#0:
	; SSE41-NEXT: pmovzxwd %xmm0, %xmm0			; SSE41-NEXT: pmovzxwd {{.*#+}} xmm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero
	; SSE41-NEXT: retq			; SSE41-NEXT: retq
	;			;
	; AVX-LABEL: shuffle_v8i16_0z1z2z3z:			; AVX-LABEL: shuffle_v8i16_0z1z2z3z:
	; AVX: # BB#0:			; AVX: # BB#0:
	; AVX-NEXT: vpmovzxwd %xmm0, %xmm0			; AVX-NEXT: vpmovzxwd {{.*#+}} xmm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%shuffle = shufflevector <8 x i16> %a, <8 x i16> zeroinitializer, <8 x i32> <i32 0, i32 9, i32 1, i32 11, i32 2, i32 13, i32 3, i32 15>			%shuffle = shufflevector <8 x i16> %a, <8 x i16> zeroinitializer, <8 x i32> <i32 0, i32 9, i32 1, i32 11, i32 2, i32 13, i32 3, i32 15>
	ret <8 x i16> %shuffle			ret <8 x i16> %shuffle
	}			}
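The decoded `pmovzx*` masks can be related back to the IR in these tests. When a `shufflevector`'s second operand is `zeroinitializer`, any index at or above the element count selects zero, and an `undef` index places no constraint. The Python sketch below applies those two rules to check a decoded mask against an IR mask. The names are hypothetical, the zero sentinel `-2` is assumed, and `undef` is modeled as `None`.

```python
ZERO = -2     # assumed sentinel for a zeroed lane
UNDEF = None  # models an undef entry in an IR shuffle mask


def decode_zero_extend(src_bits, dst_bits, reg_bits=128):
    """pmovzx* mask in src_bits lanes: element i, then scale - 1 zero lanes."""
    scale = dst_bits // src_bits
    mask = []
    for i in range(reg_bits // dst_bits):
        mask += [i] + [ZERO] * (scale - 1)
    return mask


def canonicalize_ir_mask(ir_mask, num_elts):
    """Indices >= num_elts pick from zeroinitializer, so they become ZERO."""
    return [m if m is UNDEF or m < num_elts else ZERO for m in ir_mask]


def is_lowering_of(decoded, ir_mask, num_elts):
    """True if every defined lane of the IR mask agrees with the decoded mask."""
    canon = canonicalize_ir_mask(ir_mask, num_elts)
    return len(decoded) == len(canon) and all(
        c is UNDEF or c == d for c, d in zip(canon, decoded))


pmovzxwq = decode_zero_extend(16, 64)
pmovzxwd = decode_zero_extend(16, 32)
```

`shuffle_v8i16_0zzz1zzz` and `shuffle_v8i16_0uuu1uuu` both accept the `pmovzxwq` mask. `shuffle_v8i16_0z1z2z3z` needs `pmovzxwd`.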

test/CodeGen/X86/vector-shuffle-256-v4.ll

(788 lines not shown)
; AVX2-NEXT: retq		; AVX2-NEXT: retq
%v = insertelement <4 x i64> undef, i64 %a, i64 0		%v = insertelement <4 x i64> undef, i64 %a, i64 0
%shuffle = shufflevector <4 x i64> %v, <4 x i64> zeroinitializer, <4 x i32> <i32 0, i32 5, i32 6, i32 7>		%shuffle = shufflevector <4 x i64> %v, <4 x i64> zeroinitializer, <4 x i32> <i32 0, i32 5, i32 6, i32 7>
ret <4 x i64> %shuffle		ret <4 x i64> %shuffle
}		}

define <4 x i64> @insert_mem_and_zero_v4i64(i64* %ptr) {		define <4 x i64> @insert_mem_and_zero_v4i64(i64* %ptr) {
; AVX1-LABEL: insert_mem_and_zero_v4i64:		; AVX1-LABEL: insert_mem_and_zero_v4i64:
; AVX1: # BB#0:		; AVX1: # BB#0:
; AVX1-NEXT: vmovq (%rdi), %xmm0		; AVX1-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero
; AVX1-NEXT: vxorpd %ymm1, %ymm1, %ymm1		; AVX1-NEXT: vxorpd %ymm1, %ymm1, %ymm1
; AVX1-NEXT: vblendpd {{.*#+}} ymm0 = ymm0[0],ymm1[1,2,3]		; AVX1-NEXT: vblendpd {{.*#+}} ymm0 = ymm0[0],ymm1[1,2,3]
; AVX1-NEXT: retq		; AVX1-NEXT: retq
;		;
; AVX2-LABEL: insert_mem_and_zero_v4i64:		; AVX2-LABEL: insert_mem_and_zero_v4i64:
; AVX2: # BB#0:		; AVX2: # BB#0:
; AVX2-NEXT: vmovq (%rdi), %xmm0		; AVX2-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero
; AVX2-NEXT: vpxor %ymm1, %ymm1, %ymm1		; AVX2-NEXT: vpxor %ymm1, %ymm1, %ymm1
; AVX2-NEXT: vpblendd {{.*#+}} ymm0 = ymm0[0,1],ymm1[2,3,4,5,6,7]		; AVX2-NEXT: vpblendd {{.*#+}} ymm0 = ymm0[0,1],ymm1[2,3,4,5,6,7]
; AVX2-NEXT: retq		; AVX2-NEXT: retq
%a = load i64* %ptr		%a = load i64* %ptr
%v = insertelement <4 x i64> undef, i64 %a, i64 0		%v = insertelement <4 x i64> undef, i64 %a, i64 0
%shuffle = shufflevector <4 x i64> %v, <4 x i64> zeroinitializer, <4 x i32> <i32 0, i32 5, i32 6, i32 7>		%shuffle = shufflevector <4 x i64> %v, <4 x i64> zeroinitializer, <4 x i32> <i32 0, i32 5, i32 6, i32 7>
ret <4 x i64> %shuffle		ret <4 x i64> %shuffle
}		}
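In both the AVX1 and AVX2 paths, `vmovq (%rdi), %xmm0` now carries the comment `mem[0],zero`. The 64-bit load fills lane 0 of the v2i64 register and clears lane 1. Below is a minimal sketch of that zero-move-low decode. It assumes a hypothetical `decodeZeroMoveLowMask` helper and a `kZero` sentinel for cleared lanes, and neither is the patch's actual API.

```cpp
#include <vector>

constexpr int kZero = -2;  // lane forced to zero (hypothetical sentinel)

// Hypothetical helper: a MOVD/MOVQ into an XMM register keeps element 0 and
// clears every higher element of the destination. numElts must be >= 1.
std::vector<int> decodeZeroMoveLowMask(unsigned numElts) {
  std::vector<int> mask(numElts, kZero);  // start with every lane zeroed
  mask[0] = 0;                            // lane 0 comes from the source
  return mask;
}
```

`decodeZeroMoveLowMask(2)` gives `{0, kZero}`, which prints as `mem[0],zero`. Under the same assumption, a 32-bit `movd` load into four i32 lanes would give `{0, kZero, kZero, kZero}`.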

define <4 x double> @insert_reg_and_zero_v4f64(double %a) {		define <4 x double> @insert_reg_and_zero_v4f64(double %a) {
; ALL-LABEL: insert_reg_and_zero_v4f64:		; ALL-LABEL: insert_reg_and_zero_v4f64:
; ALL: # BB#0:		; ALL: # BB#0:
; ALL-NEXT: vxorps %xmm1, %xmm1, %xmm1		; ALL-NEXT: vxorps %xmm1, %xmm1, %xmm1
; ALL-NEXT: vmovsd %xmm0, %xmm1, %xmm0		; ALL-NEXT: vmovsd {{.*#+}} xmm0 = xmm0[0],xmm1[1]
; ALL-NEXT: retq		; ALL-NEXT: retq
%v = insertelement <4 x double> undef, double %a, i32 0		%v = insertelement <4 x double> undef, double %a, i32 0
%shuffle = shufflevector <4 x double> %v, <4 x double> zeroinitializer, <4 x i32> <i32 0, i32 5, i32 6, i32 7>		%shuffle = shufflevector <4 x double> %v, <4 x double> zeroinitializer, <4 x i32> <i32 0, i32 5, i32 6, i32 7>
ret <4 x double> %shuffle		ret <4 x double> %shuffle
}		}

define <4 x double> @insert_mem_and_zero_v4f64(double* %ptr) {		define <4 x double> @insert_mem_and_zero_v4f64(double* %ptr) {
; ALL-LABEL: insert_mem_and_zero_v4f64:		; ALL-LABEL: insert_mem_and_zero_v4f64:
; ALL: # BB#0:		; ALL: # BB#0:
; ALL-NEXT: vmovsd (%rdi), %xmm0		; ALL-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
; ALL-NEXT: retq		; ALL-NEXT: retq
%a = load double* %ptr		%a = load double* %ptr
%v = insertelement <4 x double> undef, double %a, i32 0		%v = insertelement <4 x double> undef, double %a, i32 0
%shuffle = shufflevector <4 x double> %v, <4 x double> zeroinitializer, <4 x i32> <i32 0, i32 5, i32 6, i32 7>		%shuffle = shufflevector <4 x double> %v, <4 x double> zeroinitializer, <4 x i32> <i32 0, i32 5, i32 6, i32 7>
ret <4 x double> %shuffle		ret <4 x double> %shuffle
}		}
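The two `vmovsd` tests above exercise the register and load forms of the scalar-move decode. In AT&T operand order, `vmovsd %xmm0, %xmm1, %xmm0` takes lane 0 from `%xmm0` and lane 1 from `%xmm1`. The load form instead zeroes the upper lane. The sketch below models both forms as one two-source mask and prints it. `decodeScalarMoveMask`, `formatMask` and `kZero` are hypothetical names. The sketch also assumes that indices `0..n-1` refer to the first source and `n..2n-1` refer to the second source, which supplies the scalar.

```cpp
#include <cstddef>
#include <string>
#include <vector>

constexpr int kZero = -2;  // lane forced to zero (hypothetical sentinel)

// Hypothetical helper: decode MOVSS/MOVSD. Lane 0 comes from the second
// source; upper lanes are kept from the first source (register form) or
// zeroed (load form).
std::vector<int> decodeScalarMoveMask(unsigned numElts, bool isLoad) {
  std::vector<int> mask;
  mask.push_back(static_cast<int>(numElts));  // lane 0 <- second source[0]
  for (unsigned i = 1; i != numElts; ++i)
    mask.push_back(isLoad ? kZero : static_cast<int>(i));
  return mask;
}

// Render a two-source mask, grouping runs of lanes from the same source.
std::string formatMask(const std::vector<int> &mask, const std::string &src1,
                       const std::string &src2) {
  const int n = static_cast<int>(mask.size());
  std::string out;
  std::size_t i = 0;
  while (i != mask.size()) {
    if (!out.empty()) out += ",";
    if (mask[i] == kZero) { out += "zero"; ++i; continue; }
    bool fromSrc1 = mask[i] < n;
    out += (fromSrc1 ? src1 : src2) + "[";
    bool first = true;
    while (i != mask.size() && mask[i] != kZero && (mask[i] < n) == fromSrc1) {
      if (!first) out += ",";
      out += std::to_string(fromSrc1 ? mask[i] : mask[i] - n);
      first = false;
      ++i;
    }
    out += "]";
  }
  return out;
}
```

`formatMask(decodeScalarMoveMask(2, false), "xmm1", "xmm0")` yields `xmm0[0],xmm1[1]`. `formatMask(decodeScalarMoveMask(2, true), "xmm0", "mem")` yields `mem[0],zero`. Both strings match the new check lines. Grouping runs by source is what makes a 4-lane `movss` register form print as `xmm1[0],xmm0[1,2,3]` rather than as four separate entries.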

define <4 x double> @splat_mem_v4f64(double* %ptr) {		define <4 x double> @splat_mem_v4f64(double* %ptr) {

test/CodeGen/X86/vector-shuffle-256-v8.ll

	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	%1 = shufflevector <4 x float> %r, <4 x float> undef, <8 x i32> zeroinitializer			%1 = shufflevector <4 x float> %r, <4 x float> undef, <8 x i32> zeroinitializer
	ret <8 x float> %1			ret <8 x float> %1
	}			}

	define <8x float> @concat_v2f32_1(<2 x float>* %tmp64, <2 x float>* %tmp65) {			define <8x float> @concat_v2f32_1(<2 x float>* %tmp64, <2 x float>* %tmp65) {
	; ALL-LABEL: concat_v2f32_1:			; ALL-LABEL: concat_v2f32_1:
	; ALL: # BB#0: # %entry			; ALL: # BB#0: # %entry
	; ALL-NEXT: vmovq (%rdi), %xmm0			; ALL-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero
	; ALL-NEXT: vmovhpd (%rsi), %xmm0, %xmm0			; ALL-NEXT: vmovhpd (%rsi), %xmm0, %xmm0
	; ALL-NEXT: retq			; ALL-NEXT: retq
	entry:			entry:
	%tmp74 = load <2 x float>* %tmp65, align 8			%tmp74 = load <2 x float>* %tmp65, align 8
	%tmp72 = load <2 x float>* %tmp64, align 8			%tmp72 = load <2 x float>* %tmp64, align 8
	%tmp73 = shufflevector <2 x float> %tmp72, <2 x float> undef, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			%tmp73 = shufflevector <2 x float> %tmp72, <2 x float> undef, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	%tmp75 = shufflevector <2 x float> %tmp74, <2 x float> undef, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			%tmp75 = shufflevector <2 x float> %tmp74, <2 x float> undef, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	%tmp76 = shufflevector <8 x float> %tmp73, <8 x float> %tmp75, <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef>			%tmp76 = shufflevector <8 x float> %tmp73, <8 x float> %tmp75, <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef>
	ret <8 x float> %tmp76			ret <8 x float> %tmp76
	}			}

	define <8x float> @concat_v2f32_2(<2 x float>* %tmp64, <2 x float>* %tmp65) {			define <8x float> @concat_v2f32_2(<2 x float>* %tmp64, <2 x float>* %tmp65) {
	; ALL-LABEL: concat_v2f32_2:			; ALL-LABEL: concat_v2f32_2:
	; ALL: # BB#0: # %entry			; ALL: # BB#0: # %entry
	; ALL-NEXT: vmovq (%rdi), %xmm0			; ALL-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero
	; ALL-NEXT: vmovhpd (%rsi), %xmm0, %xmm0			; ALL-NEXT: vmovhpd (%rsi), %xmm0, %xmm0
	; ALL-NEXT: retq			; ALL-NEXT: retq
	entry:			entry:
	%tmp74 = load <2 x float>* %tmp65, align 8			%tmp74 = load <2 x float>* %tmp65, align 8
	%tmp72 = load <2 x float>* %tmp64, align 8			%tmp72 = load <2 x float>* %tmp64, align 8
	%tmp76 = shufflevector <2 x float> %tmp72, <2 x float> %tmp74, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>			%tmp76 = shufflevector <2 x float> %tmp72, <2 x float> %tmp74, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
	ret <8 x float> %tmp76			ret <8 x float> %tmp76
	}			}

	define <8x float> @concat_v2f32_3(<2 x float>* %tmp64, <2 x float>* %tmp65) {			define <8x float> @concat_v2f32_3(<2 x float>* %tmp64, <2 x float>* %tmp65) {
	; ALL-LABEL: concat_v2f32_3:			; ALL-LABEL: concat_v2f32_3:
	; ALL: # BB#0: # %entry			; ALL: # BB#0: # %entry
	; ALL-NEXT: vmovq (%rdi), %xmm0			; ALL-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero
	; ALL-NEXT: vmovhpd (%rsi), %xmm0, %xmm0			; ALL-NEXT: vmovhpd (%rsi), %xmm0, %xmm0
	; ALL-NEXT: retq			; ALL-NEXT: retq
	entry:			entry:
	%tmp74 = load <2 x float>* %tmp65, align 8			%tmp74 = load <2 x float>* %tmp65, align 8
	%tmp72 = load <2 x float>* %tmp64, align 8			%tmp72 = load <2 x float>* %tmp64, align 8
	%tmp76 = shufflevector <2 x float> %tmp72, <2 x float> %tmp74, <4 x i32> <i32 0, i32 1, i32 2, i32 3>			%tmp76 = shufflevector <2 x float> %tmp72, <2 x float> %tmp74, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
	%res = shufflevector <4 x float> %tmp76, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>			%res = shufflevector <4 x float> %tmp76, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
	ret <8 x float> %res			ret <8 x float> %res
	}			}