This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner] Convert constant AND masks to shuffle clear masks down to the byte level
ClosedPublic

Authored by RKSimon on Jul 26 2015, 2:43 PM.

Details

Reviewers

qcolombet
chandlerc
andreadb

Commits

rG503a2594c344: [DAGCombiner] Convert constant AND masks to shuffle clear masks down to the…
rL243831: [DAGCombiner] Convert constant AND masks to shuffle clear masks down to the…

Summary

The XformToShuffleWithZero method currently checks AND masks at the per-lane level for all-one and all-zero constants and attempts to convert them to legal shuffle clear masks.

This patch generalises XformToShuffleWithZero, splitting and checking the sub-lanes of the constants down to the byte level to see if any legal shuffle clear masks are possible. This allows a lot of masks (often from legalization or truncation) to be folded into existing shuffle patterns and removes a lot of constant mask loading.
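
The sub-lane check described above can be illustrated with a small standalone model. This is only a sketch, not the LLVM code: it assumes 32-bit lanes, ignores undef elements, and skips the target legality query (isVectorClearMaskLegal). The names buildClearMask and findClearMask are hypothetical and chosen for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical standalone model of the clear-mask check; this is not LLVM code.
// Elts is the constant AND mask, one 32-bit value per vector lane.
// On success, Indices describes a shuffle of (source, zero vector) at sub-lane
// granularity: index i keeps sub-element i of the source, and index
// i + NumSubElts reads the zero vector instead.
bool buildClearMask(const std::vector<uint32_t> &Elts, unsigned Split,
                    bool BigEndian, std::vector<int> &Indices) {
  const unsigned EltBits = 32;
  assert(Split != 0 && EltBits % Split == 0 && "Split must divide lane width");
  const unsigned NumSubBits = EltBits / Split;
  const unsigned NumSubElts = static_cast<unsigned>(Elts.size()) * Split;
  const uint64_t AllOnes = (1ULL << NumSubBits) - 1;

  Indices.clear();
  for (unsigned i = 0; i != NumSubElts; ++i) {
    const unsigned EltIdx = i / Split;
    const unsigned SubIdx = i % Split;
    // Sub-element 0 is the least significant chunk on little-endian targets
    // and the most significant chunk on big-endian targets.
    const unsigned Shift =
        (BigEndian ? Split - SubIdx - 1 : SubIdx) * NumSubBits;
    const uint64_t Bits =
        (static_cast<uint64_t>(Elts[EltIdx]) >> Shift) & AllOnes;
    if (Bits == AllOnes)
      Indices.push_back(static_cast<int>(i));
    else if (Bits == 0)
      Indices.push_back(static_cast<int>(i + NumSubElts));
    else
      return false; // Mixed bits: not expressible at this granularity.
  }
  return true;
}

// Try whole lanes first, then finer splits down to bytes. Returns the split
// factor that worked, or 0 if no granularity produced a clear mask.
unsigned findClearMask(const std::vector<uint32_t> &Elts, bool BigEndian,
                       std::vector<int> &Indices) {
  const unsigned MaxSplit = 32 / 8; // byte-level masking
  for (unsigned Split = 1; Split <= MaxSplit; ++Split)
    if (32 % Split == 0 && buildClearMask(Elts, Split, BigEndian, Indices))
      return Split;
  Indices.clear();
  return 0;
}
```

For example, the little-endian mask <0x0000FFFF, 0xFFFFFFFF> fails at the lane level (Split = 1) but succeeds at Split = 2 with indices {0, 5, 2, 3}: the upper half of the first lane is taken from the zero vector and everything else is kept.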

The patch also includes a number of minor tweaks to improve codegen. I can commit these separately or post them as separate patches for extra review if any of you wish:

XformToShuffleWithZero is only attempted if constant folding has failed.
A lot more X86 byte vector blends are now generated. I've added a stage to the VPBLENDVB lowering in lowerVectorShuffleAsBlend to attempt to lower back to an AND mask using lowerVectorShuffleAsBitMask if possible. VPAND is a lot faster than VPBLENDVB.
X86 v8i16 shuffle lowering now attempts to use lowerVectorShuffleAsBitMask before resorting to VPSHUFB (matches v16i8 shuffle lowering). VPAND is a lot faster than VPSHUFB.
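
As a rough illustration of why a blend against zero can be lowered back to an AND, the sketch below models the idea in plain C++. It is a simplification and not the lowerVectorShuffleAsBitMask implementation: it assumes the second shuffle input is an all-zero byte vector, and the helper names (shuffleToAndMask, shuffleWithZero, andWithMask) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified model; assumes the second shuffle input is an all-zero vector.
// Mask[i] == i keeps byte i of V1, Mask[i] >= Size reads the zero vector,
// and Mask[i] < 0 is undef, which this sketch chooses to zero.
bool shuffleToAndMask(const std::vector<int> &Mask,
                      std::vector<uint8_t> &AndMask) {
  const int Size = static_cast<int>(Mask.size());
  AndMask.assign(Mask.size(), 0x00);
  for (int i = 0; i < Size; ++i) {
    if (Mask[i] < 0 || Mask[i] >= Size)
      continue; // Undef or zero byte: leave the mask byte clear.
    if (Mask[i] != i)
      return false; // A real permutation, so not a clear mask.
    AndMask[i] = 0xFF;
  }
  return true;
}

// Reference shuffle of (V1, zero vector), used only to check the equivalence.
std::vector<uint8_t> shuffleWithZero(const std::vector<uint8_t> &V1,
                                     const std::vector<int> &Mask) {
  assert(V1.size() == Mask.size());
  const int Size = static_cast<int>(Mask.size());
  std::vector<uint8_t> Out(Mask.size(), 0);
  for (int i = 0; i < Size; ++i)
    if (Mask[i] >= 0 && Mask[i] < Size)
      Out[i] = V1[static_cast<std::size_t>(Mask[i])];
  return Out;
}

// Byte-wise AND of V1 with the constant mask.
std::vector<uint8_t> andWithMask(const std::vector<uint8_t> &V1,
                                 const std::vector<uint8_t> &AndMask) {
  assert(V1.size() == AndMask.size());
  std::vector<uint8_t> Out(V1.size());
  for (std::size_t i = 0; i < V1.size(); ++i)
    Out[i] = V1[i] & AndMask[i];
  return Out;
}
```

When shuffleToAndMask succeeds, andWithMask and shuffleWithZero produce the same bytes, which is the property that lets a single AND stand in for the blend or byte shuffle.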

There are still a few examples of poor shuffle lowering exposed by this patch that we can clean up in future patches (e.g. x86 legalized v8i8 zero extension uses PMOVZX+AND+AND instead of AND+PMOVZX).

Diff Detail

Repository: rL LLVM

Event Timeline

RKSimon updated this revision to Diff 30667. Jul 26 2015, 2:43 PM

RKSimon retitled this revision to [DAGCombiner] Convert constant AND masks to shuffle clear masks down to the byte level.

RKSimon updated this object.

RKSimon added reviewers: chandlerc, qcolombet, andreadb.

RKSimon set the repository for this revision to rL LLVM.

RKSimon added a subscriber: llvm-commits.

By and large, really, really nice.

Comments in-line.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
12991–12994	How annoying would it be to fully split this out into a helper function? The indentation is already reasonably significant here.
12999–13001	I would just consistently use 'int' here.
13118–13121	Is there a particular reason to need to adjust this in this way?
lib/Target/X86/X86ISelLowering.cpp
6884–6885	Please send a separate review with just this (the bit mask lowering) change? It seems likely it should be testable independently.
test/CodeGen/X86/sse3.ll
272–275	This seems like a pretty bad regression. Any hope of recovering it?
test/CodeGen/X86/vec_cast2.ll
49–54	Same question as above...
test/CodeGen/X86/vec_int_to_fp.ll
1762–1765	and here, wow!

Thanks Chandler - I've replied to your comments and I'll get the 'sub patches' done first.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
12991–12994	Moderately annoying - the clear mask test is the only test in XformToShuffleWithZero. I could replace the if (RHS.getOpcode() == ISD::BUILD_VECTOR) block with an if (RHS.getOpcode() != ISD::BUILD_VECTOR) early-out to remove one level of indentation instead?
13118–13121	We carry on constant folding of binops - in this case (AND c1 c2) - a lot later in the lowering stages than we do constant folding of bitcasts and shuffles, so a number of tests started failing as they'd been converted to bitcasted shuffles, resulting in a lot of unnecessary code.
lib/Target/X86/X86ISelLowering.cpp
6884–6885	Easily done (and understandable...) - I'll do a NFC commit beforehand to move the unaltered function so that the diff for a subpatch is as clear as possible.
test/CodeGen/X86/sse3.ll
272–275	Yes - it requires DAGCombiner::visitVECTOR_SHUFFLE to be able to peek through shuffles of bitcasts of two inputs, so far it only handles the one input case. That should fix both the domain crossing and number of instructions. I've moved it up my todo list.
test/CodeGen/X86/vec_cast2.ll
49–54	This is the zero-extend issue that I mentioned in the summary. Yak shaving......
test/CodeGen/X86/vec_int_to_fp.ll
1762–1765	This is the zero-extend issue that I mentioned in the summary.

RKSimon mentioned this in rL243264: [X86] Reordered lowerVectorShuffleAsBitMask before lowerVectorShuffleAsBlend. Jul 27 2015, 5:37 AM

RKSimon mentioned this in D11541: [X86][SSE] Use bitmasks instead of shuffles where possible. Jul 27 2015, 2:29 PM

RKSimon mentioned this in rL243395: [X86][SSE] Use bitmasks instead of shuffles where possible. Jul 28 2015, 1:55 AM

rebased now that D11541 has been committed

rebased

Chandler - would you be happy for this patch to be submitted? I've started looking at the remaining issues (shuffle through bitcasts, extension + masks and merging 2x128 bitops -> 1x256 mask), all of them are more easily exposed after this patch and I'm working on improvements right now.

Yea, LGTM and looking forward to the two-input fix and zext fix. Awesome work.

This revision is now accepted and ready to land. Jul 31 2015, 2:13 AM

Closed by commit rL243831: [DAGCombiner] Convert constant AND masks to shuffle clear masks down to the… (authored by RKSimon). Aug 1 2015, 3:02 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
lib/CodeGen/SelectionDAG/DAGCombiner.cpp	100 lines
lib/Target/X86/X86ISelLowering.cpp	269 lines
test/CodeGen/X86/avx2-conversions.ll	3 lines
test/CodeGen/X86/masked_memop.ll	2 lines
test/CodeGen/X86/sse2-vector-shifts.ll	1 line

6 lines

10 lines

70 lines

281 lines

12 lines

4 lines

Diff 30667

lib/CodeGen/SelectionDAG/DAGCombiner.cpp

[403 lines elided]	bool MergeStoresOfConstantsOrVecElts(SmallVectorImpl<MemOpLink> &StoreNodes,
bool IsConstantSrc, bool UseVector);		bool IsConstantSrc, bool UseVector);

/// This is a helper function for MergeConsecutiveStores.		/// This is a helper function for MergeConsecutiveStores.
/// Stores that may be merged are placed in StoreNodes.		/// Stores that may be merged are placed in StoreNodes.
/// Loads that may alias with those stores are placed in AliasLoadNodes.		/// Loads that may alias with those stores are placed in AliasLoadNodes.
void getStoreMergeAndAliasCandidates(		void getStoreMergeAndAliasCandidates(
StoreSDNode* St, SmallVectorImpl<MemOpLink> &StoreNodes,		StoreSDNode* St, SmallVectorImpl<MemOpLink> &StoreNodes,
SmallVectorImpl<LSBaseSDNode*> &AliasLoadNodes);		SmallVectorImpl<LSBaseSDNode*> &AliasLoadNodes);

/// Merge consecutive store operations into a wide store.		/// Merge consecutive store operations into a wide store.
/// This optimization uses wide integers or vectors when possible.		/// This optimization uses wide integers or vectors when possible.
/// \return True if some memory operations were changed.		/// \return True if some memory operations were changed.
bool MergeConsecutiveStores(StoreSDNode *N);		bool MergeConsecutiveStores(StoreSDNode *N);

/// \brief Try to transform a truncation where C is a constant:		/// \brief Try to transform a truncation where C is a constant:
/// (trunc (and X, C)) -> (and (trunc X), (trunc C))		/// (trunc (and X, C)) -> (and (trunc X), (trunc C))
///		///
[5,141 lines elided]
}		}

SDValue DAGCombiner::visitSETCC(SDNode *N) {		SDValue DAGCombiner::visitSETCC(SDNode *N) {
return SimplifySetCC(N->getValueType(0), N->getOperand(0), N->getOperand(1),		return SimplifySetCC(N->getValueType(0), N->getOperand(0), N->getOperand(1),
cast<CondCodeSDNode>(N->getOperand(2))->get(),		cast<CondCodeSDNode>(N->getOperand(2))->get(),
SDLoc(N));		SDLoc(N));
}		}

/// Try to fold a sext/zext/aext dag node into a ConstantSDNode or		/// Try to fold a sext/zext/aext dag node into a ConstantSDNode or
/// a build_vector of constants.		/// a build_vector of constants.
/// This function is called by the DAGCombiner when visiting sext/zext/aext		/// This function is called by the DAGCombiner when visiting sext/zext/aext
/// dag nodes (see for example method DAGCombiner::visitSIGN_EXTEND).		/// dag nodes (see for example method DAGCombiner::visitSIGN_EXTEND).
/// Vector extends are not folded if operations are legal; this is to		/// Vector extends are not folded if operations are legal; this is to
/// avoid introducing illegal build_vector dag nodes.		/// avoid introducing illegal build_vector dag nodes.
static SDNode *tryToFoldExtendOfConstant(SDNode *N, const TargetLowering &TLI,		static SDNode *tryToFoldExtendOfConstant(SDNode *N, const TargetLowering &TLI,
SelectionDAG &DAG, bool LegalTypes,		SelectionDAG &DAG, bool LegalTypes,
bool LegalOperations) {		bool LegalOperations) {
[2,819 lines elided]
SDValue DAGCombiner::visitFSQRT(SDNode *N) {		SDValue DAGCombiner::visitFSQRT(SDNode *N) {
if (!DAG.getTarget().Options.UnsafeFPMath || TLI.isFsqrtCheap())		if (!DAG.getTarget().Options.UnsafeFPMath || TLI.isFsqrtCheap())
return SDValue();		return SDValue();

// Compute this as X * (1/sqrt(X)) = X * (X ** -0.5)		// Compute this as X * (1/sqrt(X)) = X * (X ** -0.5)
SDValue RV = BuildRsqrtEstimate(N->getOperand(0));		SDValue RV = BuildRsqrtEstimate(N->getOperand(0));
if (!RV)		if (!RV)
return SDValue();		return SDValue();

EVT VT = RV.getValueType();		EVT VT = RV.getValueType();
SDLoc DL(N);		SDLoc DL(N);
RV = DAG.getNode(ISD::FMUL, DL, VT, N->getOperand(0), RV);		RV = DAG.getNode(ISD::FMUL, DL, VT, N->getOperand(0), RV);
AddToWorklist(RV.getNode());		AddToWorklist(RV.getNode());

// Unfortunately, RV is now NaN if the input was exactly 0.		// Unfortunately, RV is now NaN if the input was exactly 0.
// Select out this case and force the answer to 0.		// Select out this case and force the answer to 0.
SDValue Zero = DAG.getConstantFP(0.0, DL, VT);		SDValue Zero = DAG.getConstantFP(0.0, DL, VT);
[2,430 lines elided]	bool DAGCombiner::MergeConsecutiveStores(StoreSDNode* St) {
SDValue Chain = SDValue(St, 0);		SDValue Chain = SDValue(St, 0);
if (Chain->hasOneUse() && Chain->use_begin()->getOpcode() == ISD::STORE)		if (Chain->hasOneUse() && Chain->use_begin()->getOpcode() == ISD::STORE)
return false;		return false;

// Save the LoadSDNodes that we find in the chain.		// Save the LoadSDNodes that we find in the chain.
// We need to make sure that these nodes do not interfere with		// We need to make sure that these nodes do not interfere with
// any of the store nodes.		// any of the store nodes.
SmallVector<LSBaseSDNode*, 8> AliasLoadNodes;		SmallVector<LSBaseSDNode*, 8> AliasLoadNodes;

// Save the StoreSDNodes that we find in the chain.		// Save the StoreSDNodes that we find in the chain.
SmallVector<MemOpLink, 8> StoreNodes;		SmallVector<MemOpLink, 8> StoreNodes;

getStoreMergeAndAliasCandidates(St, StoreNodes, AliasLoadNodes);		getStoreMergeAndAliasCandidates(St, StoreNodes, AliasLoadNodes);

// Check if there is anything to merge.		// Check if there is anything to merge.
if (StoreNodes.size() < 2)		if (StoreNodes.size() < 2)
return false;		return false;

// Sort the memory operands according to their distance from the base pointer.		// Sort the memory operands according to their distance from the base pointer.
std::sort(StoreNodes.begin(), StoreNodes.end(),		std::sort(StoreNodes.begin(), StoreNodes.end(),
[](MemOpLink LHS, MemOpLink RHS) {		[](MemOpLink LHS, MemOpLink RHS) {
return LHS.OffsetFromBase < RHS.OffsetFromBase ||		return LHS.OffsetFromBase < RHS.OffsetFromBase ||
[2,113 lines elided]	SDValue DAGCombiner::XformToShuffleWithZero(SDNode *N) {

if (N->getOpcode() != ISD::AND)		if (N->getOpcode() != ISD::AND)
return SDValue();		return SDValue();

if (RHS.getOpcode() == ISD::BITCAST)		if (RHS.getOpcode() == ISD::BITCAST)
RHS = RHS.getOperand(0);		RHS = RHS.getOperand(0);

if (RHS.getOpcode() == ISD::BUILD_VECTOR) {		if (RHS.getOpcode() == ISD::BUILD_VECTOR) {
SmallVector<int, 8> Indices;		EVT RVT = RHS.getValueType();
unsigned NumElts = RHS.getNumOperands();		unsigned NumElts = RHS.getNumOperands();

for (unsigned i = 0; i != NumElts; ++i) {		// Attempt to create a valid clear mask, splitting the mask into
SDValue Elt = RHS.getOperand(i);		// sub elements and checking to see if each is
if (isAllOnesConstant(Elt))		// all zeros or all ones - suitable for shuffle masking.
		auto BuildClearMask = [&](unsigned Split) {
		chandlerc (inline comment): How annoying would it be to fully split this out into a helper function? The indentation is already reasonably significant here.
		RKSimon (inline comment): Moderately annoying - the clear mask test is the only test in XformToShuffleWithZero. I could replace the if (RHS.getOpcode() == ISD::BUILD_VECTOR) block with an if (RHS.getOpcode() != ISD::BUILD_VECTOR) early-out to remove one level of indentation instead?
		unsigned NumSubElts = NumElts * Split;
		unsigned NumSubBits = RVT.getScalarSizeInBits() / Split;

		SmallVector<int, 8> Indices;
		for (unsigned i = 0; i != NumSubElts; ++i) {
		unsigned EltIdx = i / Split;
		unsigned SubIdx = i % Split;
		chandlerc (inline comment): I would just consistently use 'int' here.
		SDValue Elt = RHS.getOperand(EltIdx);
		if (Elt.getOpcode() == ISD::UNDEF) {
		Indices.push_back(-1);
		continue;
		}

		APInt Bits;
		if (isa<ConstantSDNode>(Elt))
		Bits = cast<ConstantSDNode>(Elt)->getAPIntValue();
		else if(isa<ConstantFPSDNode>(Elt))
		Bits = cast<ConstantFPSDNode>(Elt)->getValueAPF().bitcastToAPInt();
		else
		return SDValue();

		// Extract the sub element from the constant bit mask.
		if (DAG.getDataLayout().isBigEndian()) {
		Bits = Bits.lshr((Split - SubIdx - 1) * NumSubBits);
		} else {
		Bits = Bits.lshr(SubIdx * NumSubBits);
		}

		if (Split > 1)
		Bits = Bits.trunc(NumSubBits);

		if (Bits.isAllOnesValue())
Indices.push_back(i);		Indices.push_back(i);
else if (isNullConstant(Elt))		else if (Bits == 0)
Indices.push_back(NumElts+i);		Indices.push_back(i + NumSubElts);
else		else
return SDValue();		return SDValue();
}		}

// Let's see if the target supports this vector_shuffle.		// Let's see if the target supports this vector_shuffle.
EVT RVT = RHS.getValueType();		EVT ClearSVT = EVT::getIntegerVT(*DAG.getContext(), NumSubBits);
if (!TLI.isVectorClearMaskLegal(Indices, RVT))		EVT ClearVT = EVT::getVectorVT(*DAG.getContext(), ClearSVT, NumSubElts);
		if (!TLI.isVectorClearMaskLegal(Indices, ClearVT))
return SDValue();		return SDValue();

// Return the new VECTOR_SHUFFLE node.		SDValue Zero = DAG.getConstant(0, dl, ClearVT);
EVT EltVT = RVT.getVectorElementType();		return DAG.getBitcast(
SmallVector<SDValue,8> ZeroOps(RVT.getVectorNumElements(),		VT, DAG.getVectorShuffle(ClearVT, dl, DAG.getBitcast(ClearVT, LHS),
DAG.getConstant(0, dl, EltVT));		Zero, &Indices[0]));
SDValue Zero = DAG.getNode(ISD::BUILD_VECTOR, dl, RVT, ZeroOps);		};
LHS = DAG.getNode(ISD::BITCAST, dl, RVT, LHS);
SDValue Shuf = DAG.getVectorShuffle(RVT, dl, LHS, Zero, &Indices[0]);		// Determine maximum split level (byte level masking).
return DAG.getNode(ISD::BITCAST, dl, VT, Shuf);		unsigned MaxSplit = 1;
		if (RVT.getScalarSizeInBits() % 8 == 0)
		MaxSplit = RVT.getScalarSizeInBits() / 8;

		for (unsigned Split = 1; Split <= MaxSplit; ++Split)
		if (RVT.getScalarSizeInBits() % Split == 0)
		if (SDValue S = BuildClearMask(Split))
		return S;
}		}

return SDValue();		return SDValue();
}		}

/// Visit a binary vector operation, like ADD.		/// Visit a binary vector operation, like ADD.
SDValue DAGCombiner::SimplifyVBinOp(SDNode *N) {		SDValue DAGCombiner::SimplifyVBinOp(SDNode *N) {
assert(N->getValueType(0).isVector() &&		assert(N->getValueType(0).isVector() &&
"SimplifyVBinOp only works on vectors!");		"SimplifyVBinOp only works on vectors!");

SDValue LHS = N->getOperand(0);		SDValue LHS = N->getOperand(0);
SDValue RHS = N->getOperand(1);		SDValue RHS = N->getOperand(1);

if (SDValue Shuffle = XformToShuffleWithZero(N))
return Shuffle;

// If the LHS and RHS are BUILD_VECTOR nodes, see if we can constant fold		// If the LHS and RHS are BUILD_VECTOR nodes, see if we can constant fold
// this operation.		// this operation.
if (LHS.getOpcode() == ISD::BUILD_VECTOR &&		if (LHS.getOpcode() == ISD::BUILD_VECTOR &&
RHS.getOpcode() == ISD::BUILD_VECTOR) {		RHS.getOpcode() == ISD::BUILD_VECTOR) {
// Check if both vectors are constants. If not bail out.		// Check if both vectors are constants. If not bail out.
if (!(cast<BuildVectorSDNode>(LHS)->isConstant() &&		if (!(cast<BuildVectorSDNode>(LHS)->isConstant() &&
cast<BuildVectorSDNode>(RHS)->isConstant()))		cast<BuildVectorSDNode>(RHS)->isConstant()))
return SDValue();		return SDValue();
[34 lines elided]	for (unsigned i = 0, e = LHS.getNumOperands(); i != e; ++i) {
Ops.push_back(FoldOp);		Ops.push_back(FoldOp);
AddToWorklist(FoldOp.getNode());		AddToWorklist(FoldOp.getNode());
}		}

if (Ops.size() == LHS.getNumOperands())		if (Ops.size() == LHS.getNumOperands())
return DAG.getNode(ISD::BUILD_VECTOR, SDLoc(N), LHS.getValueType(), Ops);		return DAG.getNode(ISD::BUILD_VECTOR, SDLoc(N), LHS.getValueType(), Ops);
}		}

		// Try to convert a constant mask AND into a shuffle clear mask.
		if (SDValue Shuffle = XformToShuffleWithZero(N))
		return Shuffle;

		chandlerc (inline comment): Is there a particular reason to need to adjust this in this way?
		RKSimon (inline comment): We carry on constant folding of binops - in this case (AND c1 c2) - a lot later in the lowering stages than we do constant folding of bitcasts and shuffles, so a number of tests started failing as they'd been converted to bitcasted shuffles, resulting in a lot of unnecessary code.
// Type legalization might introduce new shuffles in the DAG.		// Type legalization might introduce new shuffles in the DAG.
// Fold (VBinOp (shuffle (A, Undef, Mask)), (shuffle (B, Undef, Mask)))		// Fold (VBinOp (shuffle (A, Undef, Mask)), (shuffle (B, Undef, Mask)))
// -> (shuffle (VBinOp (A, B)), Undef, Mask).		// -> (shuffle (VBinOp (A, B)), Undef, Mask).
if (LegalTypes && isa<ShuffleVectorSDNode>(LHS) &&		if (LegalTypes && isa<ShuffleVectorSDNode>(LHS) &&
isa<ShuffleVectorSDNode>(RHS) && LHS.hasOneUse() && RHS.hasOneUse() &&		isa<ShuffleVectorSDNode>(RHS) && LHS.hasOneUse() && RHS.hasOneUse() &&
LHS.getOperand(1).getOpcode() == ISD::UNDEF &&		LHS.getOperand(1).getOpcode() == ISD::UNDEF &&
RHS.getOperand(1).getOpcode() == ISD::UNDEF) {		RHS.getOperand(1).getOpcode() == ISD::UNDEF) {
ShuffleVectorSDNode *SVN0 = cast<ShuffleVectorSDNode>(LHS);		ShuffleVectorSDNode *SVN0 = cast<ShuffleVectorSDNode>(LHS);
[997 lines elided]

lib/Target/X86/X86ISelLowering.cpp

[6,496 lines elided]	static SDValue lowerVectorShuffleAsBitBlend(SDLoc DL, MVT VT, SDValue V1,
// We have to cast V2 around.		// We have to cast V2 around.
MVT MaskVT = MVT::getVectorVT(MVT::i64, VT.getSizeInBits() / 64);		MVT MaskVT = MVT::getVectorVT(MVT::i64, VT.getSizeInBits() / 64);
V2 = DAG.getBitcast(VT, DAG.getNode(X86ISD::ANDNP, DL, MaskVT,		V2 = DAG.getBitcast(VT, DAG.getNode(X86ISD::ANDNP, DL, MaskVT,
DAG.getBitcast(MaskVT, V1Mask),		DAG.getBitcast(MaskVT, V1Mask),
DAG.getBitcast(MaskVT, V2)));		DAG.getBitcast(MaskVT, V2)));
return DAG.getNode(ISD::OR, DL, VT, V1, V2);		return DAG.getNode(ISD::OR, DL, VT, V1, V2);
}		}

/// \brief Try to emit a blend instruction for a shuffle.
///
/// This doesn't do any checks for the availability of instructions for blending
/// these values. It relies on the availability of the X86ISD::BLENDI pattern to
/// be matched in the backend with the type given. What it does check for is
/// that the shuffle mask is in fact a blend.
static SDValue lowerVectorShuffleAsBlend(SDLoc DL, MVT VT, SDValue V1,
SDValue V2, ArrayRef<int> Mask,
const X86Subtarget *Subtarget,
SelectionDAG &DAG) {
unsigned BlendMask = 0;
for (int i = 0, Size = Mask.size(); i < Size; ++i) {
if (Mask[i] >= Size) {
if (Mask[i] != i + Size)
return SDValue(); // Shuffled V2 input!
BlendMask |= 1u << i;
continue;
}
if (Mask[i] >= 0 && Mask[i] != i)
return SDValue(); // Shuffled V1 input!
}
switch (VT.SimpleTy) {
case MVT::v2f64:
case MVT::v4f32:
case MVT::v4f64:
case MVT::v8f32:
return DAG.getNode(X86ISD::BLENDI, DL, VT, V1, V2,
DAG.getConstant(BlendMask, DL, MVT::i8));

case MVT::v4i64:
case MVT::v8i32:
assert(Subtarget->hasAVX2() && "256-bit integer blends require AVX2!");
// FALLTHROUGH
case MVT::v2i64:
case MVT::v4i32:
// If we have AVX2 it is faster to use VPBLENDD when the shuffle fits into
// that instruction.
if (Subtarget->hasAVX2()) {
// Scale the blend by the number of 32-bit dwords per element.
int Scale = VT.getScalarSizeInBits() / 32;
BlendMask = 0;
for (int i = 0, Size = Mask.size(); i < Size; ++i)
if (Mask[i] >= Size)
for (int j = 0; j < Scale; ++j)
BlendMask |= 1u << (i * Scale + j);

MVT BlendVT = VT.getSizeInBits() > 128 ? MVT::v8i32 : MVT::v4i32;
V1 = DAG.getBitcast(BlendVT, V1);
V2 = DAG.getBitcast(BlendVT, V2);
return DAG.getBitcast(
VT, DAG.getNode(X86ISD::BLENDI, DL, BlendVT, V1, V2,
DAG.getConstant(BlendMask, DL, MVT::i8)));
}
// FALLTHROUGH
case MVT::v8i16: {
// For integer shuffles we need to expand the mask and cast the inputs to
// v8i16s prior to blending.
int Scale = 8 / VT.getVectorNumElements();
BlendMask = 0;
for (int i = 0, Size = Mask.size(); i < Size; ++i)
if (Mask[i] >= Size)
for (int j = 0; j < Scale; ++j)
BlendMask |= 1u << (i * Scale + j);

V1 = DAG.getBitcast(MVT::v8i16, V1);
V2 = DAG.getBitcast(MVT::v8i16, V2);
return DAG.getBitcast(VT,
DAG.getNode(X86ISD::BLENDI, DL, MVT::v8i16, V1, V2,
DAG.getConstant(BlendMask, DL, MVT::i8)));
}

case MVT::v16i16: {
assert(Subtarget->hasAVX2() && "256-bit integer blends require AVX2!");
SmallVector<int, 8> RepeatedMask;
if (is128BitLaneRepeatedShuffleMask(MVT::v16i16, Mask, RepeatedMask)) {
// We can lower these with PBLENDW which is mirrored across 128-bit lanes.
assert(RepeatedMask.size() == 8 && "Repeated mask size doesn't match!");
BlendMask = 0;
for (int i = 0; i < 8; ++i)
if (RepeatedMask[i] >= 16)
BlendMask |= 1u << i;
return DAG.getNode(X86ISD::BLENDI, DL, MVT::v16i16, V1, V2,
DAG.getConstant(BlendMask, DL, MVT::i8));
}
}
// FALLTHROUGH
case MVT::v16i8:
case MVT::v32i8: {
assert((VT.getSizeInBits() == 128 || Subtarget->hasAVX2()) &&
"256-bit byte-blends require AVX2 support!");

// Scale the blend by the number of bytes per element.
int Scale = VT.getScalarSizeInBits() / 8;

// This form of blend is always done on bytes. Compute the byte vector
// type.
MVT BlendVT = MVT::getVectorVT(MVT::i8, VT.getSizeInBits() / 8);

// Compute the VSELECT mask. Note that VSELECT is really confusing in the
// mix of LLVM's code generator and the x86 backend. We tell the code
// generator that boolean values in the elements of an x86 vector register
// are -1 for true and 0 for false. We then use the LLVM semantics of 'true'
// mapping a select to operand #1, and 'false' mapping to operand #2. The
// reality in x86 is that vector masks (pre-AVX-512) use only the high bit
// of the element (the remaining are ignored) and 0 in that high bit would
// mean operand #1 while 1 in the high bit would mean operand #2. So while
// the LLVM model for boolean values in vector elements gets the relevant
// bit set, it is set backwards and over constrained relative to x86's
// actual model.
SmallVector<SDValue, 32> VSELECTMask;
for (int i = 0, Size = Mask.size(); i < Size; ++i)
for (int j = 0; j < Scale; ++j)
VSELECTMask.push_back(
Mask[i] < 0 ? DAG.getUNDEF(MVT::i8)
: DAG.getConstant(Mask[i] < Size ? -1 : 0, DL,
MVT::i8));

V1 = DAG.getBitcast(BlendVT, V1);
V2 = DAG.getBitcast(BlendVT, V2);
return DAG.getBitcast(VT, DAG.getNode(ISD::VSELECT, DL, BlendVT,
DAG.getNode(ISD::BUILD_VECTOR, DL,
BlendVT, VSELECTMask),
V1, V2));
}

default:
llvm_unreachable("Not a supported integer vector type!");
}
}

/// \brief Try to lower as a blend of elements from two inputs followed by		/// \brief Try to lower as a blend of elements from two inputs followed by
/// a single-input permutation.		/// a single-input permutation.
///		///
/// This matches the pattern where we can blend elements from two inputs and		/// This matches the pattern where we can blend elements from two inputs and
/// then reduce the shuffle to a single-input permutation.		/// then reduce the shuffle to a single-input permutation.
static SDValue lowerVectorShuffleAsBlendAndPermute(SDLoc DL, MVT VT, SDValue V1,		static SDValue lowerVectorShuffleAsBlendAndPermute(SDLoc DL, MVT VT, SDValue V1,
SDValue V2,		SDValue V2,
ArrayRef<int> Mask,		ArrayRef<int> Mask,
[272 lines elided]	static SDValue lowerVectorShuffleAsBitMask(SDLoc DL, MVT VT, SDValue V1,

SDValue VMask = DAG.getNode(ISD::BUILD_VECTOR, DL, VT, VMaskOps);		SDValue VMask = DAG.getNode(ISD::BUILD_VECTOR, DL, VT, VMaskOps);
V = DAG.getNode(VT.isFloatingPoint()		V = DAG.getNode(VT.isFloatingPoint()
? (unsigned) X86ISD::FAND : (unsigned) ISD::AND,		? (unsigned) X86ISD::FAND : (unsigned) ISD::AND,
DL, VT, V, VMask);		DL, VT, V, VMask);
return V;		return V;
}		}

		/// \brief Try to emit a blend instruction for a shuffle.
		///
		/// This doesn't do any checks for the availability of instructions for blending
		/// these values. It relies on the availability of the X86ISD::BLENDI pattern to
		/// be matched in the backend with the type given. What it does check for is
		/// that the shuffle mask is in fact a blend.
		static SDValue lowerVectorShuffleAsBlend(SDLoc DL, MVT VT, SDValue V1,
		SDValue V2, ArrayRef<int> Mask,
		const X86Subtarget *Subtarget,
		SelectionDAG &DAG) {
		unsigned BlendMask = 0;
		for (int i = 0, Size = Mask.size(); i < Size; ++i) {
		if (Mask[i] >= Size) {
		if (Mask[i] != i + Size)
		return SDValue(); // Shuffled V2 input!
		BlendMask |= 1u << i;
		continue;
		}
		if (Mask[i] >= 0 && Mask[i] != i)
		return SDValue(); // Shuffled V1 input!
		}
		switch (VT.SimpleTy) {
		case MVT::v2f64:
		case MVT::v4f32:
		case MVT::v4f64:
		case MVT::v8f32:
		return DAG.getNode(X86ISD::BLENDI, DL, VT, V1, V2,
		DAG.getConstant(BlendMask, DL, MVT::i8));

		case MVT::v4i64:
		case MVT::v8i32:
		assert(Subtarget->hasAVX2() && "256-bit integer blends require AVX2!");
		// FALLTHROUGH
		case MVT::v2i64:
		case MVT::v4i32:
		// If we have AVX2 it is faster to use VPBLENDD when the shuffle fits into
		// that instruction.
		if (Subtarget->hasAVX2()) {
		// Scale the blend by the number of 32-bit dwords per element.
		int Scale = VT.getScalarSizeInBits() / 32;
		BlendMask = 0;
		for (int i = 0, Size = Mask.size(); i < Size; ++i)
		if (Mask[i] >= Size)
		for (int j = 0; j < Scale; ++j)
		BlendMask |= 1u << (i * Scale + j);

		MVT BlendVT = VT.getSizeInBits() > 128 ? MVT::v8i32 : MVT::v4i32;
		V1 = DAG.getBitcast(BlendVT, V1);
		V2 = DAG.getBitcast(BlendVT, V2);
		return DAG.getBitcast(
		VT, DAG.getNode(X86ISD::BLENDI, DL, BlendVT, V1, V2,
		DAG.getConstant(BlendMask, DL, MVT::i8)));
		}
		// FALLTHROUGH
		case MVT::v8i16: {
		// For integer shuffles we need to expand the mask and cast the inputs to
		// v8i16s prior to blending.
		int Scale = 8 / VT.getVectorNumElements();
		BlendMask = 0;
		for (int i = 0, Size = Mask.size(); i < Size; ++i)
		if (Mask[i] >= Size)
		for (int j = 0; j < Scale; ++j)
		BlendMask |= 1u << (i * Scale + j);

		V1 = DAG.getBitcast(MVT::v8i16, V1);
		V2 = DAG.getBitcast(MVT::v8i16, V2);
		return DAG.getBitcast(VT,
		DAG.getNode(X86ISD::BLENDI, DL, MVT::v8i16, V1, V2,
		DAG.getConstant(BlendMask, DL, MVT::i8)));
		}

		case MVT::v16i16: {
		assert(Subtarget->hasAVX2() && "256-bit integer blends require AVX2!");
		SmallVector<int, 8> RepeatedMask;
		if (is128BitLaneRepeatedShuffleMask(MVT::v16i16, Mask, RepeatedMask)) {
		// We can lower these with PBLENDW which is mirrored across 128-bit lanes.
		assert(RepeatedMask.size() == 8 && "Repeated mask size doesn't match!");
		BlendMask = 0;
		for (int i = 0; i < 8; ++i)
		if (RepeatedMask[i] >= 16)
		BlendMask |= 1u << i;
		return DAG.getNode(X86ISD::BLENDI, DL, MVT::v16i16, V1, V2,
		DAG.getConstant(BlendMask, DL, MVT::i8));
		}
		}
		// FALLTHROUGH
		case MVT::v16i8:
		case MVT::v32i8: {
		assert((VT.getSizeInBits() == 128 || Subtarget->hasAVX2()) &&
		"256-bit byte-blends require AVX2 support!");

		if (SDValue Masked = lowerVectorShuffleAsBitMask(DL, VT, V1, V2, Mask, DAG))
		return Masked;
		chandlerc (inline comment): Please send a separate review with just this (the bit mask lowering) change? It seems likely it should be testable independently.
		RKSimon (inline comment): Easily done (and understandable...) - I'll do a NFC commit beforehand to move the unaltered function so that the diff for a subpatch is as clear as possible.

		// Scale the blend by the number of bytes per element.
		int Scale = VT.getScalarSizeInBits() / 8;

		// This form of blend is always done on bytes. Compute the byte vector
		// type.
		MVT BlendVT = MVT::getVectorVT(MVT::i8, VT.getSizeInBits() / 8);

		// Compute the VSELECT mask. Note that VSELECT is really confusing in the
		// mix of LLVM's code generator and the x86 backend. We tell the code
		// generator that boolean values in the elements of an x86 vector register
		// are -1 for true and 0 for false. We then use the LLVM semantics of 'true'
		// mapping a select to operand #1, and 'false' mapping to operand #2. The
		// reality in x86 is that vector masks (pre-AVX-512) use only the high bit
		// of the element (the remaining are ignored) and 0 in that high bit would
		// mean operand #1 while 1 in the high bit would mean operand #2. So while
		// the LLVM model for boolean values in vector elements gets the relevant
		// bit set, it is set backwards and over constrained relative to x86's
		// actual model.
		SmallVector<SDValue, 32> VSELECTMask;
		for (int i = 0, Size = Mask.size(); i < Size; ++i)
		for (int j = 0; j < Scale; ++j)
		VSELECTMask.push_back(
		Mask[i] < 0 ? DAG.getUNDEF(MVT::i8)
		: DAG.getConstant(Mask[i] < Size ? -1 : 0, DL,
		MVT::i8));

		V1 = DAG.getBitcast(BlendVT, V1);
		V2 = DAG.getBitcast(BlendVT, V2);
		return DAG.getBitcast(VT, DAG.getNode(ISD::VSELECT, DL, BlendVT,
		DAG.getNode(ISD::BUILD_VECTOR, DL,
		BlendVT, VSELECTMask),
		V1, V2));
		}

		default:
		llvm_unreachable("Not a supported integer vector type!");
		}
		}

/// \brief Try to lower a vector shuffle as a bit shift (shifts in zeros).		/// \brief Try to lower a vector shuffle as a bit shift (shifts in zeros).
///		///
/// Attempts to match a shuffle mask against the PSLL(W/D/Q/DQ) and		/// Attempts to match a shuffle mask against the PSLL(W/D/Q/DQ) and
/// PSRL(W/D/Q/DQ) SSE2 and AVX2 logical bit-shift instructions. The function		/// PSRL(W/D/Q/DQ) SSE2 and AVX2 logical bit-shift instructions. The function
/// matches elements from one of the input vectors shuffled to the left or		/// matches elements from one of the input vectors shuffled to the left or
/// right with zeroable elements 'shifted in'. It handles both the strictly		/// right with zeroable elements 'shifted in'. It handles both the strictly
/// bit-wise element shifts and the byte shift across an entire 128-bit double		/// bit-wise element shifts and the byte shift across an entire 128-bit double
/// quad word lane.		/// quad word lane.
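
As a rough illustration of the matching described in the comment above, the sketch below handles only the simplest case: a single-input shuffle that moves whole elements left or right across the entire vector, with zeroable elements filling the vacated slots. It is not the matcher from X86ISelLowering.cpp. It ignores 128-bit lane boundaries and the per-element bit shifts, and `ShiftMatch`, `matchesShift` and `matchWholeVectorShift` are hypothetical names. Undef elements are assumed to be encoded as -1.

```cpp
#include <cassert>
#include <vector>

// Hypothetical result type for this sketch.
struct ShiftMatch {
  bool Matched;
  bool Left;  // true: elements move to higher indices
  int Amount; // shift distance in elements
};

// Check whether Mask equals "input shifted by Amount elements, zeros shifted in".
static bool matchesShift(const std::vector<int> &Mask,
                         const std::vector<bool> &Zeroable, int Amount,
                         bool Left) {
  const int Size = static_cast<int>(Mask.size());
  for (int i = 0; i < Size; ++i) {
    int Src = Left ? i - Amount : i + Amount;
    if (Src < 0 || Src >= Size) {
      if (!Zeroable[i])
        return false; // a vacated slot must be known zero
      continue;
    }
    if (Mask[i] >= 0 && Mask[i] != Src)
      return false; // undef (-1) is compatible with any source
  }
  return true;
}

// Try every non-trivial shift distance in both directions.
ShiftMatch matchWholeVectorShift(const std::vector<int> &Mask,
                                 const std::vector<bool> &Zeroable) {
  assert(Mask.size() == Zeroable.size());
  const int Size = static_cast<int>(Mask.size());
  for (int Amount = 1; Amount < Size; ++Amount)
    for (bool Left : {true, false})
      if (matchesShift(Mask, Zeroable, Amount, Left))
        return {true, Left, Amount};
  return {false, false, 0};
}
```

For example, an 8-element mask {8, 8, 0, 1, 2, 3, 4, 5} whose first two elements are zeroable matches a left shift by two elements.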
[2,151 lines not shown]	auto tryToWidenViaDuplication = [&]() -> SDValue {
MVT::v16i8,		MVT::v16i8,
DAG.getVectorShuffle(MVT::v8i16, DL, DAG.getBitcast(MVT::v8i16, V1),		DAG.getVectorShuffle(MVT::v8i16, DL, DAG.getBitcast(MVT::v8i16, V1),
DAG.getUNDEF(MVT::v8i16), PostDupI16Shuffle));		DAG.getUNDEF(MVT::v8i16), PostDupI16Shuffle));
};		};
if (SDValue V = tryToWidenViaDuplication())		if (SDValue V = tryToWidenViaDuplication())
return V;		return V;
}		}

		if (SDValue Masked =
		lowerVectorShuffleAsBitMask(DL, MVT::v16i8, V1, V2, Mask, DAG))
		return Masked;

// Use dedicated unpack instructions for masks that match their pattern.		// Use dedicated unpack instructions for masks that match their pattern.
if (isShuffleEquivalent(V1, V2, Mask, {// Low half.		if (isShuffleEquivalent(V1, V2, Mask, {// Low half.
0, 16, 1, 17, 2, 18, 3, 19,		0, 16, 1, 17, 2, 18, 3, 19,
// High half.		// High half.
4, 20, 5, 21, 6, 22, 7, 23}))		4, 20, 5, 21, 6, 22, 7, 23}))
return DAG.getNode(X86ISD::UNPCKL, DL, MVT::v16i8, V1, V2);		return DAG.getNode(X86ISD::UNPCKL, DL, MVT::v16i8, V1, V2);
if (isShuffleEquivalent(V1, V2, Mask, {// Low half.		if (isShuffleEquivalent(V1, V2, Mask, {// Low half.
8, 24, 9, 25, 10, 26, 11, 27,		8, 24, 9, 25, 10, 26, 11, 27,
[6,301 lines not shown]	case INTR_TYPE_4OP:
return DAG.getNode(IntrData->Opc0, dl, Op.getValueType(), Op.getOperand(1),		return DAG.getNode(IntrData->Opc0, dl, Op.getValueType(), Op.getOperand(1),
Op.getOperand(2), Op.getOperand(3), Op.getOperand(4));		Op.getOperand(2), Op.getOperand(3), Op.getOperand(4));
case INTR_TYPE_1OP_MASK_RM: {		case INTR_TYPE_1OP_MASK_RM: {
SDValue Src = Op.getOperand(1);		SDValue Src = Op.getOperand(1);
SDValue PassThru = Op.getOperand(2);		SDValue PassThru = Op.getOperand(2);
SDValue Mask = Op.getOperand(3);		SDValue Mask = Op.getOperand(3);
SDValue RoundingMode;		SDValue RoundingMode;
// We allways add rounding mode to the Node.		// We allways add rounding mode to the Node.
// If the rounding mode is not specified, we add the		// If the rounding mode is not specified, we add the
// "current direction" mode.		// "current direction" mode.
if (Op.getNumOperands() == 4)		if (Op.getNumOperands() == 4)
RoundingMode =		RoundingMode =
DAG.getConstant(X86::STATIC_ROUNDING::CUR_DIRECTION, dl, MVT::i32);		DAG.getConstant(X86::STATIC_ROUNDING::CUR_DIRECTION, dl, MVT::i32);
else		else
RoundingMode = Op.getOperand(4);		RoundingMode = Op.getOperand(4);
unsigned IntrWithRoundingModeOpcode = IntrData->Opc1;		unsigned IntrWithRoundingModeOpcode = IntrData->Opc1;
if (IntrWithRoundingModeOpcode != 0)		if (IntrWithRoundingModeOpcode != 0)
[10,963 lines not shown]
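
The lowerVectorShuffleAsBitMask call added in the v16i8 hunk above relies on a simple observation: if every result element is either the same-position element of one input or known zero, the shuffle is just an AND with a constant. A minimal sketch of that test follows, assuming a single input and byte elements. `shuffleAsAndMask` is a hypothetical name, not the LLVM function, and the caller is assumed to mark undef elements as zeroable.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Returns true and fills AndMask if the shuffle equals (V1 & AndMask).
// Mask[i] in [0, Size) picks element Mask[i] of V1.
// Zeroable[i] says result element i is known to be zero (or undef).
bool shuffleAsAndMask(const std::vector<int> &Mask,
                      const std::vector<bool> &Zeroable,
                      std::vector<uint8_t> &AndMask) {
  assert(Mask.size() == Zeroable.size());
  const int Size = static_cast<int>(Mask.size());
  AndMask.assign(Mask.size(), 0x00);
  for (int i = 0; i < Size; ++i) {
    if (Zeroable[i])
      continue; // leave 0x00 so the AND clears this element
    if (Mask[i] != i)
      return false; // the element would have to move, so AND cannot do it
    AndMask[i] = 0xFF; // keep the element in place
  }
  return true;
}
```

A mask that keeps elements 0 and 2 and zeroes elements 1 and 3 yields the constant {0xFF, 0x00, 0xFF, 0x00}. Any mask that moves an element is rejected and would fall through to the later lowering strategies.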

test/CodeGen/X86/avx2-conversions.ll

[56 lines not shown]	; CHECK-NEXT: retq
%B = zext <8 x i16> %A to <8 x i32>		%B = zext <8 x i16> %A to <8 x i32>
ret <8 x i32>%B		ret <8 x i32>%B
}		}

define <8 x i32> @zext_8i8_8i32(<8 x i8> %A) nounwind {		define <8 x i32> @zext_8i8_8i32(<8 x i8> %A) nounwind {
; CHECK-LABEL: zext_8i8_8i32:		; CHECK-LABEL: zext_8i8_8i32:
; CHECK: ## BB#0:		; CHECK: ## BB#0:
; CHECK-NEXT: vpmovzxwd {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero		; CHECK-NEXT: vpmovzxwd {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero
; CHECK-NEXT: vpbroadcastd {{.*}}(%rip), %ymm1		; CHECK-NEXT: vpand {{.*}}(%rip), %ymm0, %ymm0
; CHECK-NEXT: vpand %ymm1, %ymm0, %ymm0
; CHECK-NEXT: retq		; CHECK-NEXT: retq
%B = zext <8 x i8> %A to <8 x i32>		%B = zext <8 x i8> %A to <8 x i32>
ret <8 x i32>%B		ret <8 x i32>%B
}		}

define <16 x i16> @zext_16i8_16i16(<16 x i8> %z) {		define <16 x i16> @zext_16i8_16i16(<16 x i8> %z) {
; CHECK-LABEL: zext_16i8_16i16:		; CHECK-LABEL: zext_16i8_16i16:
; CHECK: ## BB#0:		; CHECK: ## BB#0:
[78 lines not shown]

test/CodeGen/X86/masked_memop.ll

[186 lines not shown]	define void @test14(<2 x i32> %trigger, <2 x float>* %addr, <2 x float> %val) {
ret void		ret void
}		}

; AVX2-LABEL: test15		; AVX2-LABEL: test15
; AVX2: vpmaskmovd		; AVX2: vpmaskmovd

; SKX-LABEL: test15:		; SKX-LABEL: test15:
; SKX: ## BB#0:		; SKX: ## BB#0:
; SKX-NEXT: vpandq {{.*}}(%rip), %xmm0, %xmm0
; SKX-NEXT: vpxor %xmm2, %xmm2, %xmm2		; SKX-NEXT: vpxor %xmm2, %xmm2, %xmm2
		; SKX-NEXT: vpblendd {{.*#+}} xmm0 = xmm0[0],xmm2[1],xmm0[2],xmm2[3]
; SKX-NEXT: vpcmpeqq %xmm2, %xmm0, %k1		; SKX-NEXT: vpcmpeqq %xmm2, %xmm0, %k1
; SKX-NEXT: vpmovqd %xmm1, (%rdi) {%k1}		; SKX-NEXT: vpmovqd %xmm1, (%rdi) {%k1}
; SKX-NEXT: retq		; SKX-NEXT: retq
define void @test15(<2 x i32> %trigger, <2 x i32>* %addr, <2 x i32> %val) {		define void @test15(<2 x i32> %trigger, <2 x i32>* %addr, <2 x i32> %val) {
%mask = icmp eq <2 x i32> %trigger, zeroinitializer		%mask = icmp eq <2 x i32> %trigger, zeroinitializer
call void @llvm.masked.store.v2i32(<2 x i32>%val, <2 x i32>* %addr, i32 4, <2 x i1>%mask)		call void @llvm.masked.store.v2i32(<2 x i32>%val, <2 x i32>* %addr, i32 4, <2 x i1>%mask)
ret void		ret void
}		}
[61 lines not shown]

test/CodeGen/X86/sse2-vector-shifts.ll

	[307 lines not shown]
	define <4 x i32> @shl_srl_v4i32(<4 x i32> %x) nounwind {			define <4 x i32> @shl_srl_v4i32(<4 x i32> %x) nounwind {
	%shl0 = lshr <4 x i32> %x, <i32 2, i32 2, i32 2, i32 2>			%shl0 = lshr <4 x i32> %x, <i32 2, i32 2, i32 2, i32 2>
	%shl1 = shl <4 x i32> %shl0, <i32 5, i32 5, i32 5, i32 5>			%shl1 = shl <4 x i32> %shl0, <i32 5, i32 5, i32 5, i32 5>
	ret <4 x i32> %shl1			ret <4 x i32> %shl1
	}			}

	; CHECK-LABEL: @shl_zext_srl_v4i32			; CHECK-LABEL: @shl_zext_srl_v4i32
	; CHECK: andps			; CHECK: andps
				; CHECK: andps
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	define <4 x i32> @shl_zext_srl_v4i32(<4 x i16> %x) nounwind {			define <4 x i32> @shl_zext_srl_v4i32(<4 x i16> %x) nounwind {
	%srl = lshr <4 x i16> %x, <i16 2, i16 2, i16 2, i16 2>			%srl = lshr <4 x i16> %x, <i16 2, i16 2, i16 2, i16 2>
	%zext = zext <4 x i16> %srl to <4 x i32>			%zext = zext <4 x i16> %srl to <4 x i32>
	%shl = shl <4 x i32> %zext, <i32 2, i32 2, i32 2, i32 2>			%shl = shl <4 x i32> %zext, <i32 2, i32 2, i32 2, i32 2>
	ret <4 x i32> %shl			ret <4 x i32> %shl
	}			}

	[44 lines not shown]

test/CodeGen/X86/sse3.ll

[263 lines not shown]	entry:
%tmp9 = shufflevector <16 x i8> %tmp8, <16 x i8> %T0, <16 x i32> < i32 0, i32 1, i32 2, i32 17, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef , i32 undef >		%tmp9 = shufflevector <16 x i8> %tmp8, <16 x i8> %T0, <16 x i32> < i32 0, i32 1, i32 2, i32 17, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef , i32 undef >
ret <16 x i8> %tmp9		ret <16 x i8> %tmp9
}		}

; rdar://8520311		; rdar://8520311
define <4 x i32> @t17() nounwind {		define <4 x i32> @t17() nounwind {
; X64-LABEL: t17:		; X64-LABEL: t17:
; X64: ## BB#0: ## %entry		; X64: ## BB#0: ## %entry
; X64-NEXT: movddup {{.*#+}} xmm0 = mem[0,0]		; X64-NEXT: movaps (%rax), %xmm0
; X64-NEXT: andpd {{.*}}(%rip), %xmm0		; X64-NEXT: unpcklps {{.*#+}} xmm0 = xmm0[0,0,1,1]
		; X64-NEXT: pxor %xmm1, %xmm1
		; X64-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
		chandlerc: This seems like a pretty bad regression. Any hope of recovering it?
		RKSimon (Author): Yes - it requires DAGCombiner::visitVECTOR_SHUFFLE to be able to peek through shuffles of bitcasts of two inputs, so far it only handles the one input case. That should fix both the domain crossing and number of instructions. I've moved it up my todo list.
; X64-NEXT: retq		; X64-NEXT: retq
entry:		entry:
%tmp1 = load <4 x float>, <4 x float>* undef, align 16		%tmp1 = load <4 x float>, <4 x float>* undef, align 16
%tmp2 = shufflevector <4 x float> %tmp1, <4 x float> undef, <4 x i32> <i32 4, i32 1, i32 2, i32 3>		%tmp2 = shufflevector <4 x float> %tmp1, <4 x float> undef, <4 x i32> <i32 4, i32 1, i32 2, i32 3>
%tmp3 = load <4 x float>, <4 x float>* undef, align 16		%tmp3 = load <4 x float>, <4 x float>* undef, align 16
%tmp4 = shufflevector <4 x float> %tmp2, <4 x float> undef, <4 x i32> <i32 undef, i32 undef, i32 0, i32 1>		%tmp4 = shufflevector <4 x float> %tmp2, <4 x float> undef, <4 x i32> <i32 undef, i32 undef, i32 0, i32 1>
%tmp5 = bitcast <4 x float> %tmp3 to <4 x i32>		%tmp5 = bitcast <4 x float> %tmp3 to <4 x i32>
%tmp6 = shufflevector <4 x i32> %tmp5, <4 x i32> undef, <4 x i32> <i32 undef, i32 undef, i32 0, i32 1>		%tmp6 = shufflevector <4 x i32> %tmp5, <4 x i32> undef, <4 x i32> <i32 undef, i32 undef, i32 0, i32 1>
%tmp7 = and <4 x i32> %tmp6, <i32 undef, i32 undef, i32 -1, i32 0>		%tmp7 = and <4 x i32> %tmp6, <i32 undef, i32 undef, i32 -1, i32 0>
ret <4 x i32> %tmp7		ret <4 x i32> %tmp7
}		}

test/CodeGen/X86/vec_cast2.ll

	[40 lines not shown]
	; CHECK-WIDE-NEXT: retl			; CHECK-WIDE-NEXT: retl
	%res = sitofp <4 x i8> %src to <4 x float>			%res = sitofp <4 x i8> %src to <4 x float>
	ret <4 x float> %res			ret <4 x float> %res
	}			}

	define <8 x float> @foo2_8(<8 x i8> %src) {			define <8 x float> @foo2_8(<8 x i8> %src) {
	; CHECK-LABEL: foo2_8:			; CHECK-LABEL: foo2_8:
	; CHECK: ## BB#0:			; CHECK: ## BB#0:
	; CHECK-NEXT: vpmovzxwd {{.*#+}} xmm1 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero			; CHECK-NEXT: vpunpckhwd {{.*#+}} xmm1 = xmm0[4,4,5,5,6,6,7,7]
	; CHECK-NEXT: vpunpckhwd {{.*#+}} xmm0 = xmm0[4,4,5,5,6,6,7,7]			; CHECK-NEXT: vmovdqa {{.*#+}} xmm2 = [255,0,0,0,255,0,0,0,255,0,0,0,255,0,0,0]
	; CHECK-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0			; CHECK-NEXT: vpand %xmm2, %xmm1, %xmm1
	; CHECK-NEXT: vandps LCPI2_0, %ymm0, %ymm0			; CHECK-NEXT: vpmovzxwd {{.*#+}} xmm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero
				; CHECK-NEXT: vpand %xmm2, %xmm0, %xmm0
				; CHECK-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
				chandlerc: Same question as above...
				RKSimon (Author): This is the zero-extend issue that I mentioned in the summary. Yak shaving......
	; CHECK-NEXT: vcvtdq2ps %ymm0, %ymm0			; CHECK-NEXT: vcvtdq2ps %ymm0, %ymm0
	; CHECK-NEXT: retl			; CHECK-NEXT: retl
	;			;
	; CHECK-WIDE-LABEL: foo2_8:			; CHECK-WIDE-LABEL: foo2_8:
	; CHECK-WIDE: ## BB#0:			; CHECK-WIDE: ## BB#0:
	; CHECK-WIDE-NEXT: vpmovzxbd {{.*#+}} xmm1 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero			; CHECK-WIDE-NEXT: vpmovzxbd {{.*#+}} xmm1 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero
	; CHECK-WIDE-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[4],zero,zero,zero,xmm0[5],zero,zero,zero,xmm0[6],zero,zero,zero,xmm0[7],zero,zero,zero			; CHECK-WIDE-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[4],zero,zero,zero,xmm0[5],zero,zero,zero,xmm0[6],zero,zero,zero,xmm0[7],zero,zero,zero
	; CHECK-WIDE-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0			; CHECK-WIDE-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0
	[89 lines not shown]
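
The constant `[255,0,0,0,255,0,0,0,...]` loaded in the foo2_8 checks above is the kind of AND mask this patch tries to classify at byte granularity. The sketch below shows the byte-level step only, under simplifying assumptions: 32-bit little-endian lanes, no legality check on the resulting shuffle, and no attempt at wider sub-lane splits first. `andMaskToByteClearMask` is a hypothetical name, not the code in XformToShuffleWithZero.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Split each 32-bit lane of a constant AND mask into bytes. If every byte is
// 0x00 or 0xFF, emit a byte shuffle mask where index i keeps input byte i and
// index NumBytes + i selects byte i of an all-zero second operand.
bool andMaskToByteClearMask(const std::vector<uint32_t> &Lanes,
                            std::vector<int> &ClearMask) {
  assert(!Lanes.empty() && "expected at least one lane");
  const int NumBytes = static_cast<int>(Lanes.size()) * 4;
  ClearMask.clear();
  ClearMask.reserve(static_cast<std::size_t>(NumBytes));
  for (int i = 0; i < NumBytes; ++i) {
    uint8_t Byte = static_cast<uint8_t>(Lanes[i / 4] >> (8 * (i % 4)));
    if (Byte == 0xFF)
      ClearMask.push_back(i); // keep this byte
    else if (Byte == 0x00)
      ClearMask.push_back(NumBytes + i); // take a zero byte instead
    else
      return false; // partial byte mask: not expressible as a clear mask
  }
  return true;
}
```

Four lanes of value 255 give the clear mask {0, 17, 18, 19, 4, 21, 22, 23, ...}: the low byte of each lane is kept and the other three are taken from the zero vector. A lane such as 0x0F has a partially set byte, so it is rejected and must stay an AND.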

test/CodeGen/X86/vec_int_to_fp.ll

[449 lines not shown]	; AVX2-NEXT: retq
%cvt = uitofp <4 x i32> %a to <4 x double>		%cvt = uitofp <4 x i32> %a to <4 x double>
%shuf = shufflevector <4 x double> %cvt, <4 x double> undef, <2 x i32> <i32 0, i32 1>		%shuf = shufflevector <4 x double> %cvt, <4 x double> undef, <2 x i32> <i32 0, i32 1>
ret <2 x double> %shuf		ret <2 x double> %shuf
}		}

define <2 x double> @uitofp_2i16_to_2f64(<8 x i16> %a) {		define <2 x double> @uitofp_2i16_to_2f64(<8 x i16> %a) {
; SSE-LABEL: uitofp_2i16_to_2f64:		; SSE-LABEL: uitofp_2i16_to_2f64:
; SSE: # BB#0:		; SSE: # BB#0:
; SSE-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,1,0,3]		; SSE-NEXT: pxor %xmm1, %xmm1
; SSE-NEXT: pshufhw {{.*#+}} xmm0 = xmm0[0,1,2,3,5,5,6,7]		; SSE-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3]
; SSE-NEXT: pand {{.*}}(%rip), %xmm0
; SSE-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]
; SSE-NEXT: cvtdq2pd %xmm0, %xmm0		; SSE-NEXT: cvtdq2pd %xmm0, %xmm0
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; AVX-LABEL: uitofp_2i16_to_2f64:		; AVX-LABEL: uitofp_2i16_to_2f64:
; AVX: # BB#0:		; AVX: # BB#0:
; AVX-NEXT: vpmovzxwq {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero		; AVX-NEXT: vpmovzxwd {{.*#+}} xmm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero
; AVX-NEXT: vpand {{.*}}(%rip), %xmm0, %xmm0
; AVX-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]
; AVX-NEXT: vcvtdq2pd %xmm0, %xmm0		; AVX-NEXT: vcvtdq2pd %xmm0, %xmm0
; AVX-NEXT: retq		; AVX-NEXT: retq
%shuf = shufflevector <8 x i16> %a, <8 x i16> undef, <2 x i32> <i32 0, i32 1>		%shuf = shufflevector <8 x i16> %a, <8 x i16> undef, <2 x i32> <i32 0, i32 1>
%cvt = uitofp <2 x i16> %shuf to <2 x double>		%cvt = uitofp <2 x i16> %shuf to <2 x double>
ret <2 x double> %cvt		ret <2 x double> %cvt
}		}

define <2 x double> @uitofp_8i16_to_2f64(<8 x i16> %a) {		define <2 x double> @uitofp_8i16_to_2f64(<8 x i16> %a) {
[20 lines not shown]	; AVX2-NEXT: retq
%cvt = uitofp <8 x i16> %a to <8 x double>		%cvt = uitofp <8 x i16> %a to <8 x double>
%shuf = shufflevector <8 x double> %cvt, <8 x double> undef, <2 x i32> <i32 0, i32 1>		%shuf = shufflevector <8 x double> %cvt, <8 x double> undef, <2 x i32> <i32 0, i32 1>
ret <2 x double> %shuf		ret <2 x double> %shuf
}		}

define <2 x double> @uitofp_2i8_to_2f64(<16 x i8> %a) {		define <2 x double> @uitofp_2i8_to_2f64(<16 x i8> %a) {
; SSE-LABEL: uitofp_2i8_to_2f64:		; SSE-LABEL: uitofp_2i8_to_2f64:
; SSE: # BB#0:		; SSE: # BB#0:
; SSE-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]		; SSE-NEXT: pxor %xmm1, %xmm1
; SSE-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3]		; SSE-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3],xmm0[4],xmm1[4],xmm0[5],xmm1[5],xmm0[6],xmm1[6],xmm0[7],xmm1[7]
; SSE-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0,0,1,1]		; SSE-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3]
; SSE-NEXT: pand {{.*}}(%rip), %xmm0
; SSE-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]
; SSE-NEXT: cvtdq2pd %xmm0, %xmm0		; SSE-NEXT: cvtdq2pd %xmm0, %xmm0
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; AVX-LABEL: uitofp_2i8_to_2f64:		; AVX-LABEL: uitofp_2i8_to_2f64:
; AVX: # BB#0:		; AVX: # BB#0:
; AVX-NEXT: vpmovzxbq {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,zero,zero,zero,zero,xmm0[1],zero,zero,zero,zero,zero,zero,zero		; AVX-NEXT: vpmovzxbd {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero
; AVX-NEXT: vpand {{.*}}(%rip), %xmm0, %xmm0
; AVX-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]
; AVX-NEXT: vcvtdq2pd %xmm0, %xmm0		; AVX-NEXT: vcvtdq2pd %xmm0, %xmm0
; AVX-NEXT: retq		; AVX-NEXT: retq
%shuf = shufflevector <16 x i8> %a, <16 x i8> undef, <2 x i32> <i32 0, i32 1>		%shuf = shufflevector <16 x i8> %a, <16 x i8> undef, <2 x i32> <i32 0, i32 1>
%cvt = uitofp <2 x i8> %shuf to <2 x double>		%cvt = uitofp <2 x i8> %shuf to <2 x double>
ret <2 x double> %cvt		ret <2 x double> %cvt
}		}

define <2 x double> @uitofp_16i8_to_2f64(<16 x i8> %a) {		define <2 x double> @uitofp_16i8_to_2f64(<16 x i8> %a) {
[100 lines not shown]
; AVX2-NEXT: retq		; AVX2-NEXT: retq
%cvt = uitofp <4 x i64> %a to <4 x double>		%cvt = uitofp <4 x i64> %a to <4 x double>
ret <4 x double> %cvt		ret <4 x double> %cvt
}		}

define <4 x double> @uitofp_4i32_to_4f64(<4 x i32> %a) {		define <4 x double> @uitofp_4i32_to_4f64(<4 x i32> %a) {
; SSE-LABEL: uitofp_4i32_to_4f64:		; SSE-LABEL: uitofp_4i32_to_4f64:
; SSE: # BB#0:		; SSE: # BB#0:
		; SSE-NEXT: movdqa %xmm0, %xmm2
; SSE-NEXT: pxor %xmm1, %xmm1		; SSE-NEXT: pxor %xmm1, %xmm1
; SSE-NEXT: pshufd {{.*#+}} xmm2 = xmm0[2,2,3,3]
; SSE-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]		; SSE-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
; SSE-NEXT: movdqa {{.*#+}} xmm3 = [1127219200,1160773632,0,0]		; SSE-NEXT: movdqa {{.*#+}} xmm3 = [1127219200,1160773632,0,0]
; SSE-NEXT: pshufd {{.*#+}} xmm1 = xmm0[2,3,0,1]		; SSE-NEXT: pshufd {{.*#+}} xmm4 = xmm0[2,3,0,1]
; SSE-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm3[0],xmm0[1],xmm3[1]		; SSE-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm3[0],xmm0[1],xmm3[1]
; SSE-NEXT: movapd {{.*#+}} xmm4 = [4.503600e+15,1.934281e+25]		; SSE-NEXT: movapd {{.*#+}} xmm5 = [4.503600e+15,1.934281e+25]
; SSE-NEXT: subpd %xmm4, %xmm0		; SSE-NEXT: subpd %xmm5, %xmm0
; SSE-NEXT: pshufd {{.*#+}} xmm5 = xmm0[2,3,0,1]		; SSE-NEXT: pshufd {{.*#+}} xmm6 = xmm0[2,3,0,1]
; SSE-NEXT: addpd %xmm5, %xmm0		; SSE-NEXT: addpd %xmm6, %xmm0
; SSE-NEXT: punpckldq {{.*#+}} xmm1 = xmm1[0],xmm3[0],xmm1[1],xmm3[1]		; SSE-NEXT: punpckldq {{.*#+}} xmm4 = xmm4[0],xmm3[0],xmm4[1],xmm3[1]
; SSE-NEXT: subpd %xmm4, %xmm1		; SSE-NEXT: subpd %xmm5, %xmm4
; SSE-NEXT: pshufd {{.*#+}} xmm5 = xmm1[2,3,0,1]		; SSE-NEXT: pshufd {{.*#+}} xmm6 = xmm4[2,3,0,1]
; SSE-NEXT: addpd %xmm1, %xmm5		; SSE-NEXT: addpd %xmm4, %xmm6
; SSE-NEXT: unpcklpd {{.*#+}} xmm0 = xmm0[0],xmm5[0]		; SSE-NEXT: unpcklpd {{.*#+}} xmm0 = xmm0[0],xmm6[0]
; SSE-NEXT: pand {{.*}}(%rip), %xmm2		; SSE-NEXT: punpckhdq {{.*#+}} xmm2 = xmm2[2],xmm1[2],xmm2[3],xmm1[3]
; SSE-NEXT: pshufd {{.*#+}} xmm5 = xmm2[2,3,0,1]		; SSE-NEXT: pshufd {{.*#+}} xmm4 = xmm2[2,3,0,1]
; SSE-NEXT: punpckldq {{.*#+}} xmm2 = xmm2[0],xmm3[0],xmm2[1],xmm3[1]		; SSE-NEXT: punpckldq {{.*#+}} xmm2 = xmm2[0],xmm3[0],xmm2[1],xmm3[1]
; SSE-NEXT: subpd %xmm4, %xmm2		; SSE-NEXT: subpd %xmm5, %xmm2
; SSE-NEXT: pshufd {{.*#+}} xmm1 = xmm2[2,3,0,1]		; SSE-NEXT: pshufd {{.*#+}} xmm1 = xmm2[2,3,0,1]
; SSE-NEXT: addpd %xmm2, %xmm1		; SSE-NEXT: addpd %xmm2, %xmm1
; SSE-NEXT: punpckldq {{.*#+}} xmm5 = xmm5[0],xmm3[0],xmm5[1],xmm3[1]		; SSE-NEXT: punpckldq {{.*#+}} xmm4 = xmm4[0],xmm3[0],xmm4[1],xmm3[1]
; SSE-NEXT: subpd %xmm4, %xmm5		; SSE-NEXT: subpd %xmm5, %xmm4
; SSE-NEXT: pshufd {{.*#+}} xmm2 = xmm5[2,3,0,1]		; SSE-NEXT: pshufd {{.*#+}} xmm2 = xmm4[2,3,0,1]
; SSE-NEXT: addpd %xmm5, %xmm2		; SSE-NEXT: addpd %xmm4, %xmm2
; SSE-NEXT: unpcklpd {{.*#+}} xmm1 = xmm1[0],xmm2[0]		; SSE-NEXT: unpcklpd {{.*#+}} xmm1 = xmm1[0],xmm2[0]
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; AVX1-LABEL: uitofp_4i32_to_4f64:		; AVX1-LABEL: uitofp_4i32_to_4f64:
; AVX1: # BB#0:		; AVX1: # BB#0:
; AVX1-NEXT: vpand {{.*}}(%rip), %xmm0, %xmm1		; AVX1-NEXT: vpand {{.*}}(%rip), %xmm0, %xmm1
; AVX1-NEXT: vcvtdq2pd %xmm1, %ymm1		; AVX1-NEXT: vcvtdq2pd %xmm1, %ymm1
; AVX1-NEXT: vpsrld $16, %xmm0, %xmm0		; AVX1-NEXT: vpsrld $16, %xmm0, %xmm0
[1,052 lines not shown]

define <8 x float> @uitofp_8i16_to_8f32(<8 x i16> %a) {		define <8 x float> @uitofp_8i16_to_8f32(<8 x i16> %a) {
; SSE-LABEL: uitofp_8i16_to_8f32:		; SSE-LABEL: uitofp_8i16_to_8f32:
; SSE: # BB#0:		; SSE: # BB#0:
; SSE-NEXT: pxor %xmm1, %xmm1		; SSE-NEXT: pxor %xmm1, %xmm1
; SSE-NEXT: movdqa %xmm0, %xmm2		; SSE-NEXT: movdqa %xmm0, %xmm2
; SSE-NEXT: punpcklwd {{.*#+}} xmm2 = xmm2[0],xmm1[0],xmm2[1],xmm1[1],xmm2[2],xmm1[2],xmm2[3],xmm1[3]		; SSE-NEXT: punpcklwd {{.*#+}} xmm2 = xmm2[0],xmm1[0],xmm2[1],xmm1[1],xmm2[2],xmm1[2],xmm2[3],xmm1[3]
; SSE-NEXT: cvtdq2ps %xmm2, %xmm2		; SSE-NEXT: cvtdq2ps %xmm2, %xmm2
; SSE-NEXT: punpckhwd {{.*#+}} xmm0 = xmm0[4,4,5,5,6,6,7,7]		; SSE-NEXT: punpckhwd {{.*#+}} xmm0 = xmm0[4],xmm1[4],xmm0[5],xmm1[5],xmm0[6],xmm1[6],xmm0[7],xmm1[7]
; SSE-NEXT: pand {{.*}}(%rip), %xmm0
; SSE-NEXT: cvtdq2ps %xmm0, %xmm1		; SSE-NEXT: cvtdq2ps %xmm0, %xmm1
; SSE-NEXT: movaps %xmm2, %xmm0		; SSE-NEXT: movaps %xmm2, %xmm0
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; AVX1-LABEL: uitofp_8i16_to_8f32:		; AVX1-LABEL: uitofp_8i16_to_8f32:
; AVX1: # BB#0:		; AVX1: # BB#0:
; AVX1-NEXT: vpxor %xmm1, %xmm1, %xmm1		; AVX1-NEXT: vpxor %xmm1, %xmm1, %xmm1
; AVX1-NEXT: vpunpckhwd {{.*#+}} xmm1 = xmm0[4],xmm1[4],xmm0[5],xmm1[5],xmm0[6],xmm1[6],xmm0[7],xmm1[7]		; AVX1-NEXT: vpunpckhwd {{.*#+}} xmm1 = xmm0[4],xmm1[4],xmm0[5],xmm1[5],xmm0[6],xmm1[6],xmm0[7],xmm1[7]
[24 lines not shown]
; SSE-NEXT: pand {{.*}}(%rip), %xmm0		; SSE-NEXT: pand {{.*}}(%rip), %xmm0
; SSE-NEXT: cvtdq2ps %xmm0, %xmm1		; SSE-NEXT: cvtdq2ps %xmm0, %xmm1
; SSE-NEXT: movaps %xmm2, %xmm0		; SSE-NEXT: movaps %xmm2, %xmm0
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; AVX1-LABEL: uitofp_8i8_to_8f32:		; AVX1-LABEL: uitofp_8i8_to_8f32:
; AVX1: # BB#0:		; AVX1: # BB#0:
; AVX1-NEXT: vpmovzxbw {{.*#+}} xmm1 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero		; AVX1-NEXT: vpmovzxbw {{.*#+}} xmm1 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero
; AVX1-NEXT: vpmovzxbd {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero
; AVX1-NEXT: vpunpckhwd {{.*#+}} xmm1 = xmm1[4,4,5,5,6,6,7,7]		; AVX1-NEXT: vpunpckhwd {{.*#+}} xmm1 = xmm1[4,4,5,5,6,6,7,7]
		; AVX1-NEXT: vmovdqa {{.*#+}} xmm2 = [255,0,0,0,255,0,0,0,255,0,0,0,255,0,0,0]
		; AVX1-NEXT: vpand %xmm2, %xmm1, %xmm1
		; AVX1-NEXT: vpmovzxbd {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero
		; AVX1-NEXT: vpand %xmm2, %xmm0, %xmm0
		chandlerc: and here, wow!
		RKSimon (Author): This is the zero-extend issue that I mentioned in the summary.
; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0		; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
; AVX1-NEXT: vandps {{.*}}(%rip), %ymm0, %ymm0
; AVX1-NEXT: vcvtdq2ps %ymm0, %ymm0		; AVX1-NEXT: vcvtdq2ps %ymm0, %ymm0
; AVX1-NEXT: retq		; AVX1-NEXT: retq
;		;
; AVX2-LABEL: uitofp_8i8_to_8f32:		; AVX2-LABEL: uitofp_8i8_to_8f32:
; AVX2: # BB#0:		; AVX2: # BB#0:
; AVX2-NEXT: vpmovzxbd {{.*#+}} ymm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero,xmm0[4],zero,zero,zero,xmm0[5],zero,zero,zero,xmm0[6],zero,zero,zero,xmm0[7],zero,zero,zero		; AVX2-NEXT: vpmovzxbd {{.*#+}} ymm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero,xmm0[4],zero,zero,zero,xmm0[5],zero,zero,zero,xmm0[6],zero,zero,zero,xmm0[7],zero,zero,zero
; AVX2-NEXT: vpbroadcastd {{.*}}(%rip), %ymm1		; AVX2-NEXT: vpand {{.*}}(%rip), %ymm0, %ymm0
; AVX2-NEXT: vpand %ymm1, %ymm0, %ymm0
; AVX2-NEXT: vcvtdq2ps %ymm0, %ymm0		; AVX2-NEXT: vcvtdq2ps %ymm0, %ymm0
; AVX2-NEXT: retq		; AVX2-NEXT: retq
%shuf = shufflevector <16 x i8> %a, <16 x i8> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		%shuf = shufflevector <16 x i8> %a, <16 x i8> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
%cvt = uitofp <8 x i8> %shuf to <8 x float>		%cvt = uitofp <8 x i8> %shuf to <8 x float>
ret <8 x float> %cvt		ret <8 x float> %cvt
}		}

define <8 x float> @uitofp_16i8_to_8f32(<16 x i8> %a) {		define <8 x float> @uitofp_16i8_to_8f32(<16 x i8> %a) {
[84 lines not shown]

test/CodeGen/X86/vector-zext.ll

[33 lines not shown]

; PR17654		; PR17654
define <16 x i16> @zext_16i8_to_16i16(<16 x i8> %A) {		define <16 x i16> @zext_16i8_to_16i16(<16 x i8> %A) {
; SSE2-LABEL: zext_16i8_to_16i16:		; SSE2-LABEL: zext_16i8_to_16i16:
; SSE2: # BB#0: # %entry		; SSE2: # BB#0: # %entry
; SSE2-NEXT: movdqa %xmm0, %xmm1		; SSE2-NEXT: movdqa %xmm0, %xmm1
; SSE2-NEXT: pxor %xmm2, %xmm2		; SSE2-NEXT: pxor %xmm2, %xmm2
; SSE2-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1],xmm0[2],xmm2[2],xmm0[3],xmm2[3],xmm0[4],xmm2[4],xmm0[5],xmm2[5],xmm0[6],xmm2[6],xmm0[7],xmm2[7]		; SSE2-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1],xmm0[2],xmm2[2],xmm0[3],xmm2[3],xmm0[4],xmm2[4],xmm0[5],xmm2[5],xmm0[6],xmm2[6],xmm0[7],xmm2[7]
; SSE2-NEXT: punpckhbw {{.*#+}} xmm1 = xmm1[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15]		; SSE2-NEXT: punpckhbw {{.*#+}} xmm1 = xmm1[8],xmm2[8],xmm1[9],xmm2[9],xmm1[10],xmm2[10],xmm1[11],xmm2[11],xmm1[12],xmm2[12],xmm1[13],xmm2[13],xmm1[14],xmm2[14],xmm1[15],xmm2[15]
; SSE2-NEXT: pand {{.*}}(%rip), %xmm1
; SSE2-NEXT: retq		; SSE2-NEXT: retq
;		;
; SSSE3-LABEL: zext_16i8_to_16i16:		; SSSE3-LABEL: zext_16i8_to_16i16:
; SSSE3: # BB#0: # %entry		; SSSE3: # BB#0: # %entry
; SSSE3-NEXT: movdqa %xmm0, %xmm1		; SSSE3-NEXT: movdqa %xmm0, %xmm1
; SSSE3-NEXT: pxor %xmm2, %xmm2		; SSSE3-NEXT: pxor %xmm2, %xmm2
; SSSE3-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1],xmm0[2],xmm2[2],xmm0[3],xmm2[3],xmm0[4],xmm2[4],xmm0[5],xmm2[5],xmm0[6],xmm2[6],xmm0[7],xmm2[7]		; SSSE3-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1],xmm0[2],xmm2[2],xmm0[3],xmm2[3],xmm0[4],xmm2[4],xmm0[5],xmm2[5],xmm0[6],xmm2[6],xmm0[7],xmm2[7]
; SSSE3-NEXT: punpckhbw {{.*#+}} xmm1 = xmm1[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15]		; SSSE3-NEXT: punpckhbw {{.*#+}} xmm1 = xmm1[8],xmm2[8],xmm1[9],xmm2[9],xmm1[10],xmm2[10],xmm1[11],xmm2[11],xmm1[12],xmm2[12],xmm1[13],xmm2[13],xmm1[14],xmm2[14],xmm1[15],xmm2[15]
; SSSE3-NEXT: pand {{.*}}(%rip), %xmm1
; SSSE3-NEXT: retq		; SSSE3-NEXT: retq
;		;
; SSE41-LABEL: zext_16i8_to_16i16:		; SSE41-LABEL: zext_16i8_to_16i16:
; SSE41: # BB#0: # %entry		; SSE41: # BB#0: # %entry
; SSE41-NEXT: movdqa %xmm0, %xmm1		; SSE41-NEXT: movdqa %xmm0, %xmm1
		; SSE41-NEXT: pxor %xmm2, %xmm2
; SSE41-NEXT: pmovzxbw {{.*#+}} xmm0 = xmm1[0],zero,xmm1[1],zero,xmm1[2],zero,xmm1[3],zero,xmm1[4],zero,xmm1[5],zero,xmm1[6],zero,xmm1[7],zero		; SSE41-NEXT: pmovzxbw {{.*#+}} xmm0 = xmm1[0],zero,xmm1[1],zero,xmm1[2],zero,xmm1[3],zero,xmm1[4],zero,xmm1[5],zero,xmm1[6],zero,xmm1[7],zero
; SSE41-NEXT: punpckhbw {{.*#+}} xmm1 = xmm1[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15]		; SSE41-NEXT: punpckhbw {{.*#+}} xmm1 = xmm1[8],xmm2[8],xmm1[9],xmm2[9],xmm1[10],xmm2[10],xmm1[11],xmm2[11],xmm1[12],xmm2[12],xmm1[13],xmm2[13],xmm1[14],xmm2[14],xmm1[15],xmm2[15]
; SSE41-NEXT: pand {{.*}}(%rip), %xmm1
; SSE41-NEXT: retq		; SSE41-NEXT: retq
;		;
; AVX1-LABEL: zext_16i8_to_16i16:		; AVX1-LABEL: zext_16i8_to_16i16:
; AVX1: # BB#0: # %entry		; AVX1: # BB#0: # %entry
; AVX1-NEXT: vpxor %xmm1, %xmm1, %xmm1		; AVX1-NEXT: vpxor %xmm1, %xmm1, %xmm1
; AVX1-NEXT: vpunpckhbw {{.*#+}} xmm1 = xmm0[8],xmm1[8],xmm0[9],xmm1[9],xmm0[10],xmm1[10],xmm0[11],xmm1[11],xmm0[12],xmm1[12],xmm0[13],xmm1[13],xmm0[14],xmm1[14],xmm0[15],xmm1[15]		; AVX1-NEXT: vpunpckhbw {{.*#+}} xmm1 = xmm0[8],xmm1[8],xmm0[9],xmm1[9],xmm0[10],xmm1[10],xmm0[11],xmm1[11],xmm0[12],xmm1[12],xmm0[13],xmm1[13],xmm0[14],xmm1[14],xmm0[15],xmm1[15]
; AVX1-NEXT: vpmovzxbw {{.*#+}} xmm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero		; AVX1-NEXT: vpmovzxbw {{.*#+}} xmm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero
; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0		; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
[51 lines not shown]
; SSE2-NEXT: retq		; SSE2-NEXT: retq
;		;
; SSSE3-LABEL: zext_16i8_to_8i32:		; SSSE3-LABEL: zext_16i8_to_8i32:
; SSSE3: # BB#0: # %entry		; SSSE3: # BB#0: # %entry
; SSSE3-NEXT: movdqa %xmm0, %xmm1		; SSSE3-NEXT: movdqa %xmm0, %xmm1
; SSSE3-NEXT: pxor %xmm2, %xmm2		; SSSE3-NEXT: pxor %xmm2, %xmm2
; SSSE3-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1],xmm0[2],xmm2[2],xmm0[3],xmm2[3],xmm0[4],xmm2[4],xmm0[5],xmm2[5],xmm0[6],xmm2[6],xmm0[7],xmm2[7]		; SSSE3-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1],xmm0[2],xmm2[2],xmm0[3],xmm2[3],xmm0[4],xmm2[4],xmm0[5],xmm2[5],xmm0[6],xmm2[6],xmm0[7],xmm2[7]
; SSSE3-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1],xmm0[2],xmm2[2],xmm0[3],xmm2[3]		; SSSE3-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1],xmm0[2],xmm2[2],xmm0[3],xmm2[3]
; SSSE3-NEXT: punpcklbw {{.*#+}} xmm1 = xmm1[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]		; SSSE3-NEXT: pshufb {{.*#+}} xmm1 = xmm1[4],zero,zero,zero,xmm1[5],zero,zero,zero,xmm1[6],zero,zero,zero,xmm1[7],zero,zero,zero
; SSSE3-NEXT: punpckhwd {{.*#+}} xmm1 = xmm1[4,4,5,5,6,6,7,7]
; SSSE3-NEXT: pand {{.*}}(%rip), %xmm1
; SSSE3-NEXT: retq		; SSSE3-NEXT: retq
;		;
; SSE41-LABEL: zext_16i8_to_8i32:		; SSE41-LABEL: zext_16i8_to_8i32:
; SSE41: # BB#0: # %entry		; SSE41: # BB#0: # %entry
; SSE41-NEXT: movdqa %xmm0, %xmm1		; SSE41-NEXT: movdqa %xmm0, %xmm1
; SSE41-NEXT: pmovzxbd {{.*#+}} xmm0 = xmm1[0],zero,zero,zero,xmm1[1],zero,zero,zero,xmm1[2],zero,zero,zero,xmm1[3],zero,zero,zero		; SSE41-NEXT: pmovzxbd {{.*#+}} xmm0 = xmm1[0],zero,zero,zero,xmm1[1],zero,zero,zero,xmm1[2],zero,zero,zero,xmm1[3],zero,zero,zero
; SSE41-NEXT: punpcklbw {{.*#+}} xmm1 = xmm1[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]		; SSE41-NEXT: pshufb {{.*#+}} xmm1 = xmm1[4],zero,zero,zero,xmm1[5],zero,zero,zero,xmm1[6],zero,zero,zero,xmm1[7],zero,zero,zero
; SSE41-NEXT: punpckhwd {{.*#+}} xmm1 = xmm1[4,4,5,5,6,6,7,7]
; SSE41-NEXT: pand {{.*}}(%rip), %xmm1
; SSE41-NEXT: retq		; SSE41-NEXT: retq
;		;
; AVX1-LABEL: zext_16i8_to_8i32:		; AVX1-LABEL: zext_16i8_to_8i32:
; AVX1: # BB#0: # %entry		; AVX1: # BB#0: # %entry
; AVX1-NEXT: vpmovzxbw {{.*#+}} xmm1 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero		; AVX1-NEXT: vpmovzxbw {{.*#+}} xmm1 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero
; AVX1-NEXT: vpmovzxbd {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero
; AVX1-NEXT: vpunpckhwd {{.*#+}} xmm1 = xmm1[4,4,5,5,6,6,7,7]		; AVX1-NEXT: vpunpckhwd {{.*#+}} xmm1 = xmm1[4,4,5,5,6,6,7,7]
		; AVX1-NEXT: vmovdqa {{.*#+}} xmm2 = [255,0,0,0,255,0,0,0,255,0,0,0,255,0,0,0]
		; AVX1-NEXT: vpand %xmm2, %xmm1, %xmm1
		; AVX1-NEXT: vpmovzxbd {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero
		; AVX1-NEXT: vpand %xmm2, %xmm0, %xmm0
; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0		; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
; AVX1-NEXT: vandps {{.*}}(%rip), %ymm0, %ymm0
; AVX1-NEXT: retq		; AVX1-NEXT: retq
;		;
; AVX2-LABEL: zext_16i8_to_8i32:		; AVX2-LABEL: zext_16i8_to_8i32:
; AVX2: # BB#0: # %entry		; AVX2: # BB#0: # %entry
; AVX2-NEXT: vpmovzxbd {{.*#+}} ymm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero,xmm0[4],zero,zero,zero,xmm0[5],zero,zero,zero,xmm0[6],zero,zero,zero,xmm0[7],zero,zero,zero		; AVX2-NEXT: vpmovzxbd {{.*#+}} ymm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero,xmm0[4],zero,zero,zero,xmm0[5],zero,zero,zero,xmm0[6],zero,zero,zero,xmm0[7],zero,zero,zero
; AVX2-NEXT: vpbroadcastd {{.*}}(%rip), %ymm1		; AVX2-NEXT: vpand {{.*}}(%rip), %ymm0, %ymm0
; AVX2-NEXT: vpand %ymm1, %ymm0, %ymm0
; AVX2-NEXT: retq		; AVX2-NEXT: retq
entry:		entry:
%B = shufflevector <16 x i8> %A, <16 x i8> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		%B = shufflevector <16 x i8> %A, <16 x i8> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
%C = zext <8 x i8> %B to <8 x i32>		%C = zext <8 x i8> %B to <8 x i32>
ret <8 x i32> %C		ret <8 x i32> %C
}		}

define <2 x i64> @zext_16i8_to_2i64(<16 x i8> %A) nounwind uwtable readnone ssp {		define <2 x i64> @zext_16i8_to_2i64(<16 x i8> %A) nounwind uwtable readnone ssp {
; SSE2-LABEL: zext_16i8_to_2i64:		; SSE2-LABEL: zext_16i8_to_2i64:
; SSE2: # BB#0: # %entry		; SSE2: # BB#0: # %entry
; SSE2-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]		; SSE2-NEXT: pxor %xmm1, %xmm1
; SSE2-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3]		; SSE2-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3],xmm0[4],xmm1[4],xmm0[5],xmm1[5],xmm0[6],xmm1[6],xmm0[7],xmm1[7]
; SSE2-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0,0,1,1]		; SSE2-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3]
; SSE2-NEXT: pand {{.*}}(%rip), %xmm0		; SSE2-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
; SSE2-NEXT: retq		; SSE2-NEXT: retq
;		;
; SSSE3-LABEL: zext_16i8_to_2i64:		; SSSE3-LABEL: zext_16i8_to_2i64:
; SSSE3: # BB#0: # %entry		; SSSE3: # BB#0: # %entry
; SSSE3-NEXT: pshufb {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,zero,zero,zero,zero,xmm0[1],zero,zero,zero,zero,zero,zero,zero		; SSSE3-NEXT: pshufb {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,zero,zero,zero,zero,xmm0[1],zero,zero,zero,zero,zero,zero,zero
; SSSE3-NEXT: pand {{.*}}(%rip), %xmm0
; SSSE3-NEXT: retq		; SSSE3-NEXT: retq
;		;
; SSE41-LABEL: zext_16i8_to_2i64:		; SSE41-LABEL: zext_16i8_to_2i64:
; SSE41: # BB#0: # %entry		; SSE41: # BB#0: # %entry
; SSE41-NEXT: pmovzxbq {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,zero,zero,zero,zero,xmm0[1],zero,zero,zero,zero,zero,zero,zero		; SSE41-NEXT: pmovzxbq {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,zero,zero,zero,zero,xmm0[1],zero,zero,zero,zero,zero,zero,zero
; SSE41-NEXT: pand {{.*}}(%rip), %xmm0
; SSE41-NEXT: retq		; SSE41-NEXT: retq
;		;
; AVX-LABEL: zext_16i8_to_2i64:		; AVX-LABEL: zext_16i8_to_2i64:
; AVX: # BB#0: # %entry		; AVX: # BB#0: # %entry
; AVX-NEXT: vpmovzxbq {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,zero,zero,zero,zero,xmm0[1],zero,zero,zero,zero,zero,zero,zero		; AVX-NEXT: vpmovzxbq {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,zero,zero,zero,zero,xmm0[1],zero,zero,zero,zero,zero,zero,zero
; AVX-NEXT: vpand {{.*}}(%rip), %xmm0, %xmm0
; AVX-NEXT: retq		; AVX-NEXT: retq
entry:		entry:
%B = shufflevector <16 x i8> %A, <16 x i8> undef, <2 x i32> <i32 0, i32 1>		%B = shufflevector <16 x i8> %A, <16 x i8> undef, <2 x i32> <i32 0, i32 1>
%C = zext <2 x i8> %B to <2 x i64>		%C = zext <2 x i8> %B to <2 x i64>
ret <2 x i64> %C		ret <2 x i64> %C
}		}

define <4 x i64> @zext_16i8_to_4i64(<16 x i8> %A) nounwind uwtable readnone ssp {		define <4 x i64> @zext_16i8_to_4i64(<16 x i8> %A) nounwind uwtable readnone ssp {
; SSE2-LABEL: zext_16i8_to_4i64:		; SSE2-LABEL: zext_16i8_to_4i64:
; SSE2: # BB#0: # %entry		; SSE2: # BB#0: # %entry
		; SSE2-NEXT: pxor %xmm1, %xmm1
; SSE2-NEXT: movdqa %xmm0, %xmm2		; SSE2-NEXT: movdqa %xmm0, %xmm2
; SSE2-NEXT: punpcklbw {{.*#+}} xmm2 = xmm2[0],xmm0[0],xmm2[1],xmm0[1],xmm2[2],xmm0[2],xmm2[3],xmm0[3],xmm2[4],xmm0[4],xmm2[5],xmm0[5],xmm2[6],xmm0[6],xmm2[7],xmm0[7]		; SSE2-NEXT: punpcklbw {{.*#+}} xmm2 = xmm2[0],xmm1[0],xmm2[1],xmm1[1],xmm2[2],xmm1[2],xmm2[3],xmm1[3],xmm2[4],xmm1[4],xmm2[5],xmm1[5],xmm2[6],xmm1[6],xmm2[7],xmm1[7]
; SSE2-NEXT: punpcklwd {{.*#+}} xmm2 = xmm2[0],xmm0[0],xmm2[1],xmm0[1],xmm2[2],xmm0[2],xmm2[3],xmm0[3]		; SSE2-NEXT: punpcklwd {{.*#+}} xmm2 = xmm2[0],xmm1[0],xmm2[1],xmm1[1],xmm2[2],xmm1[2],xmm2[3],xmm1[3]
; SSE2-NEXT: punpckldq {{.*#+}} xmm2 = xmm2[0],xmm0[0],xmm2[1],xmm0[1]		; SSE2-NEXT: punpckldq {{.*#+}} xmm2 = xmm2[0],xmm1[0],xmm2[1],xmm1[1]
; SSE2-NEXT: movdqa {{.*#+}} xmm3 = [255,255]
; SSE2-NEXT: pand %xmm3, %xmm2
; SSE2-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]		; SSE2-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,1,2,1]		; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,1,2,1]
; SSE2-NEXT: pshuflw {{.*#+}} xmm0 = xmm0[2,1,2,3,4,5,6,7]		; SSE2-NEXT: pshuflw {{.*#+}} xmm0 = xmm0[2,1,2,3,4,5,6,7]
; SSE2-NEXT: pshufhw {{.*#+}} xmm1 = xmm0[0,1,2,3,7,5,6,7]		; SSE2-NEXT: pshufhw {{.*#+}} xmm1 = xmm0[0,1,2,3,7,5,6,7]
; SSE2-NEXT: pand %xmm3, %xmm1		; SSE2-NEXT: pand {{.*}}(%rip), %xmm1
; SSE2-NEXT: movdqa %xmm2, %xmm0		; SSE2-NEXT: movdqa %xmm2, %xmm0
; SSE2-NEXT: retq		; SSE2-NEXT: retq
;		;
; SSSE3-LABEL: zext_16i8_to_4i64:		; SSSE3-LABEL: zext_16i8_to_4i64:
; SSSE3: # BB#0: # %entry		; SSSE3: # BB#0: # %entry
; SSSE3-NEXT: movdqa %xmm0, %xmm2		; SSSE3-NEXT: movdqa %xmm0, %xmm1
; SSSE3-NEXT: pshufb {{.*#+}} xmm2 = xmm2[0],zero,zero,zero,zero,zero,zero,zero,xmm2[1],zero,zero,zero,zero,zero,zero,zero		; SSSE3-NEXT: pshufb {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,zero,zero,zero,zero,xmm0[1],zero,zero,zero,zero,zero,zero,zero
; SSSE3-NEXT: movdqa {{.*#+}} xmm1 = [255,255]		; SSSE3-NEXT: pshufb {{.*#+}} xmm1 = xmm1[2],zero,zero,zero,zero,zero,zero,zero,xmm1[3],zero,zero,zero,zero,zero,zero,zero
; SSSE3-NEXT: pand %xmm1, %xmm2
; SSSE3-NEXT: pshufb {{.*#+}} xmm0 = xmm0[2,2,1,1,2,2,3,3,3,3,5,5,2,2,3,3]
; SSSE3-NEXT: pand %xmm0, %xmm1
; SSSE3-NEXT: movdqa %xmm2, %xmm0
; SSSE3-NEXT: retq		; SSSE3-NEXT: retq
;		;
; SSE41-LABEL: zext_16i8_to_4i64:		; SSE41-LABEL: zext_16i8_to_4i64:
; SSE41: # BB#0: # %entry		; SSE41: # BB#0: # %entry
; SSE41-NEXT: pmovzxbq {{.*#+}} xmm2 = xmm0[0],zero,zero,zero,zero,zero,zero,zero,xmm0[1],zero,zero,zero,zero,zero,zero,zero		; SSE41-NEXT: movdqa %xmm0, %xmm1
; SSE41-NEXT: movdqa {{.*#+}} xmm1 = [255,255]		; SSE41-NEXT: pmovzxbq {{.*#+}} xmm0 = xmm1[0],zero,zero,zero,zero,zero,zero,zero,xmm1[1],zero,zero,zero,zero,zero,zero,zero
; SSE41-NEXT: pand %xmm1, %xmm2		; SSE41-NEXT: pshufb {{.*#+}} xmm1 = xmm1[2],zero,zero,zero,zero,zero,zero,zero,xmm1[3],zero,zero,zero,zero,zero,zero,zero
; SSE41-NEXT: pshufb {{.*#+}} xmm0 = xmm0[2,2,1,1,2,2,3,3,3,3,5,5,2,2,3,3]
; SSE41-NEXT: pand %xmm0, %xmm1
; SSE41-NEXT: movdqa %xmm2, %xmm0
; SSE41-NEXT: retq		; SSE41-NEXT: retq
;		;
; AVX1-LABEL: zext_16i8_to_4i64:		; AVX1-LABEL: zext_16i8_to_4i64:
; AVX1: # BB#0: # %entry		; AVX1: # BB#0: # %entry
; AVX1-NEXT: vpmovzxbd {{.*#+}} xmm1 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero		; AVX1-NEXT: vpmovzxbd {{.*#+}} xmm1 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero
; AVX1-NEXT: vpmovzxbq {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,zero,zero,zero,zero,xmm0[1],zero,zero,zero,zero,zero,zero,zero
; AVX1-NEXT: vpshufd {{.*#+}} xmm1 = xmm1[2,2,3,3]		; AVX1-NEXT: vpshufd {{.*#+}} xmm1 = xmm1[2,2,3,3]
		; AVX1-NEXT: vmovdqa {{.*#+}} xmm2 = [255,0,0,0,0,0,0,0,255,0,0,0,0,0,0,0]
		; AVX1-NEXT: vpand %xmm2, %xmm1, %xmm1
		; AVX1-NEXT: vpmovzxbq {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,zero,zero,zero,zero,xmm0[1],zero,zero,zero,zero,zero,zero,zero
		; AVX1-NEXT: vpand %xmm2, %xmm0, %xmm0
; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0		; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
; AVX1-NEXT: vandps {{.*}}(%rip), %ymm0, %ymm0
; AVX1-NEXT: retq		; AVX1-NEXT: retq
;		;
; AVX2-LABEL: zext_16i8_to_4i64:		; AVX2-LABEL: zext_16i8_to_4i64:
; AVX2: # BB#0: # %entry		; AVX2: # BB#0: # %entry
; AVX2-NEXT: vpmovzxbq {{.*#+}} ymm0 = xmm0[0],zero,zero,zero,zero,zero,zero,zero,xmm0[1],zero,zero,zero,zero,zero,zero,zero,xmm0[2],zero,zero,zero,zero,zero,zero,zero,xmm0[3],zero,zero,zero,zero,zero,zero,zero		; AVX2-NEXT: vpmovzxbq {{.*#+}} ymm0 = xmm0[0],zero,zero,zero,zero,zero,zero,zero,xmm0[1],zero,zero,zero,zero,zero,zero,zero,xmm0[2],zero,zero,zero,zero,zero,zero,zero,xmm0[3],zero,zero,zero,zero,zero,zero,zero
; AVX2-NEXT: vpbroadcastq {{.*}}(%rip), %ymm1		; AVX2-NEXT: vpand {{.*}}(%rip), %ymm0, %ymm0
; AVX2-NEXT: vpand %ymm1, %ymm0, %ymm0
; AVX2-NEXT: retq		; AVX2-NEXT: retq
entry:		entry:
%B = shufflevector <16 x i8> %A, <16 x i8> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		%B = shufflevector <16 x i8> %A, <16 x i8> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
%C = zext <4 x i8> %B to <4 x i64>		%C = zext <4 x i8> %B to <4 x i64>
ret <4 x i64> %C		ret <4 x i64> %C
}		}

define <4 x i32> @zext_8i16_to_4i32(<8 x i16> %A) nounwind uwtable readnone ssp {		define <4 x i32> @zext_8i16_to_4i32(<8 x i16> %A) nounwind uwtable readnone ssp {
[... 25 lines not shown ...]
}		}

define <8 x i32> @zext_8i16_to_8i32(<8 x i16> %A) nounwind uwtable readnone ssp {		define <8 x i32> @zext_8i16_to_8i32(<8 x i16> %A) nounwind uwtable readnone ssp {
; SSE2-LABEL: zext_8i16_to_8i32:		; SSE2-LABEL: zext_8i16_to_8i32:
; SSE2: # BB#0: # %entry		; SSE2: # BB#0: # %entry
; SSE2-NEXT: movdqa %xmm0, %xmm1		; SSE2-NEXT: movdqa %xmm0, %xmm1
; SSE2-NEXT: pxor %xmm2, %xmm2		; SSE2-NEXT: pxor %xmm2, %xmm2
; SSE2-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1],xmm0[2],xmm2[2],xmm0[3],xmm2[3]		; SSE2-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1],xmm0[2],xmm2[2],xmm0[3],xmm2[3]
; SSE2-NEXT: punpckhwd {{.*#+}} xmm1 = xmm1[4,4,5,5,6,6,7,7]		; SSE2-NEXT: punpckhwd {{.*#+}} xmm1 = xmm1[4],xmm2[4],xmm1[5],xmm2[5],xmm1[6],xmm2[6],xmm1[7],xmm2[7]
; SSE2-NEXT: pand {{.*}}(%rip), %xmm1
; SSE2-NEXT: retq		; SSE2-NEXT: retq
;		;
; SSSE3-LABEL: zext_8i16_to_8i32:		; SSSE3-LABEL: zext_8i16_to_8i32:
; SSSE3: # BB#0: # %entry		; SSSE3: # BB#0: # %entry
; SSSE3-NEXT: movdqa %xmm0, %xmm1		; SSSE3-NEXT: movdqa %xmm0, %xmm1
; SSSE3-NEXT: pxor %xmm2, %xmm2		; SSSE3-NEXT: pxor %xmm2, %xmm2
; SSSE3-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1],xmm0[2],xmm2[2],xmm0[3],xmm2[3]		; SSSE3-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1],xmm0[2],xmm2[2],xmm0[3],xmm2[3]
; SSSE3-NEXT: punpckhwd {{.*#+}} xmm1 = xmm1[4,4,5,5,6,6,7,7]		; SSSE3-NEXT: punpckhwd {{.*#+}} xmm1 = xmm1[4],xmm2[4],xmm1[5],xmm2[5],xmm1[6],xmm2[6],xmm1[7],xmm2[7]
; SSSE3-NEXT: pand {{.*}}(%rip), %xmm1
; SSSE3-NEXT: retq		; SSSE3-NEXT: retq
;		;
; SSE41-LABEL: zext_8i16_to_8i32:		; SSE41-LABEL: zext_8i16_to_8i32:
; SSE41: # BB#0: # %entry		; SSE41: # BB#0: # %entry
; SSE41-NEXT: movdqa %xmm0, %xmm1		; SSE41-NEXT: movdqa %xmm0, %xmm1
		; SSE41-NEXT: pxor %xmm2, %xmm2
; SSE41-NEXT: pmovzxwd {{.*#+}} xmm0 = xmm1[0],zero,xmm1[1],zero,xmm1[2],zero,xmm1[3],zero		; SSE41-NEXT: pmovzxwd {{.*#+}} xmm0 = xmm1[0],zero,xmm1[1],zero,xmm1[2],zero,xmm1[3],zero
; SSE41-NEXT: punpckhwd {{.*#+}} xmm1 = xmm1[4,4,5,5,6,6,7,7]		; SSE41-NEXT: punpckhwd {{.*#+}} xmm1 = xmm1[4],xmm2[4],xmm1[5],xmm2[5],xmm1[6],xmm2[6],xmm1[7],xmm2[7]
; SSE41-NEXT: pand {{.*}}(%rip), %xmm1
; SSE41-NEXT: retq		; SSE41-NEXT: retq
;		;
; AVX1-LABEL: zext_8i16_to_8i32:		; AVX1-LABEL: zext_8i16_to_8i32:
; AVX1: # BB#0: # %entry		; AVX1: # BB#0: # %entry
; AVX1-NEXT: vpxor %xmm1, %xmm1, %xmm1		; AVX1-NEXT: vpxor %xmm1, %xmm1, %xmm1
; AVX1-NEXT: vpunpckhwd {{.*#+}} xmm1 = xmm0[4],xmm1[4],xmm0[5],xmm1[5],xmm0[6],xmm1[6],xmm0[7],xmm1[7]		; AVX1-NEXT: vpunpckhwd {{.*#+}} xmm1 = xmm0[4],xmm1[4],xmm0[5],xmm1[5],xmm0[6],xmm1[6],xmm0[7],xmm1[7]
; AVX1-NEXT: vpmovzxwd {{.*#+}} xmm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero		; AVX1-NEXT: vpmovzxwd {{.*#+}} xmm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero
; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0		; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
; AVX1-NEXT: retq		; AVX1-NEXT: retq
;		;
; AVX2-LABEL: zext_8i16_to_8i32:		; AVX2-LABEL: zext_8i16_to_8i32:
; AVX2: # BB#0: # %entry		; AVX2: # BB#0: # %entry
; AVX2-NEXT: vpmovzxwd {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero		; AVX2-NEXT: vpmovzxwd {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero
; AVX2-NEXT: retq		; AVX2-NEXT: retq
entry:		entry:
%B = zext <8 x i16> %A to <8 x i32>		%B = zext <8 x i16> %A to <8 x i32>
ret <8 x i32>%B		ret <8 x i32>%B
}		}

define <2 x i64> @zext_8i16_to_2i64(<8 x i16> %A) nounwind uwtable readnone ssp {		define <2 x i64> @zext_8i16_to_2i64(<8 x i16> %A) nounwind uwtable readnone ssp {
; SSE2-LABEL: zext_8i16_to_2i64:		; SSE2-LABEL: zext_8i16_to_2i64:
; SSE2: # BB#0: # %entry		; SSE2: # BB#0: # %entry
; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,1,0,3]		; SSE2-NEXT: pxor %xmm1, %xmm1
; SSE2-NEXT: pshufhw {{.*#+}} xmm0 = xmm0[0,1,2,3,5,5,6,7]		; SSE2-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3]
; SSE2-NEXT: pand {{.*}}(%rip), %xmm0		; SSE2-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
; SSE2-NEXT: retq		; SSE2-NEXT: retq
;		;
; SSSE3-LABEL: zext_8i16_to_2i64:		; SSSE3-LABEL: zext_8i16_to_2i64:
; SSSE3: # BB#0: # %entry		; SSSE3: # BB#0: # %entry
; SSSE3-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,1,0,3]		; SSSE3-NEXT: pxor %xmm1, %xmm1
; SSSE3-NEXT: pshufhw {{.*#+}} xmm0 = xmm0[0,1,2,3,5,5,6,7]		; SSSE3-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3]
; SSSE3-NEXT: pand {{.*}}(%rip), %xmm0		; SSSE3-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
; SSSE3-NEXT: retq		; SSSE3-NEXT: retq
;		;
; SSE41-LABEL: zext_8i16_to_2i64:		; SSE41-LABEL: zext_8i16_to_2i64:
; SSE41: # BB#0: # %entry		; SSE41: # BB#0: # %entry
; SSE41-NEXT: pmovzxwq {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero		; SSE41-NEXT: pmovzxwq {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero
; SSE41-NEXT: pand {{.*}}(%rip), %xmm0
; SSE41-NEXT: retq		; SSE41-NEXT: retq
;		;
; AVX-LABEL: zext_8i16_to_2i64:		; AVX-LABEL: zext_8i16_to_2i64:
; AVX: # BB#0: # %entry		; AVX: # BB#0: # %entry
; AVX-NEXT: vpmovzxwq {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero		; AVX-NEXT: vpmovzxwq {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero
; AVX-NEXT: vpand {{.*}}(%rip), %xmm0, %xmm0
; AVX-NEXT: retq		; AVX-NEXT: retq
entry:		entry:
%B = shufflevector <8 x i16> %A, <8 x i16> undef, <2 x i32> <i32 0, i32 1>		%B = shufflevector <8 x i16> %A, <8 x i16> undef, <2 x i32> <i32 0, i32 1>
%C = zext <2 x i16> %B to <2 x i64>		%C = zext <2 x i16> %B to <2 x i64>
ret <2 x i64> %C		ret <2 x i64> %C
}		}

define <4 x i64> @zext_8i16_to_4i64(<8 x i16> %A) nounwind uwtable readnone ssp {		define <4 x i64> @zext_8i16_to_4i64(<8 x i16> %A) nounwind uwtable readnone ssp {
; SSE2-LABEL: zext_8i16_to_4i64:		; SSE2-LABEL: zext_8i16_to_4i64:
; SSE2: # BB#0: # %entry		; SSE2: # BB#0: # %entry
; SSE2-NEXT: pshufd {{.*#+}} xmm1 = xmm0[0,1,0,3]		; SSE2-NEXT: pxor %xmm1, %xmm1
; SSE2-NEXT: pshufhw {{.*#+}} xmm2 = xmm1[0,1,2,3,5,5,6,7]		; SSE2-NEXT: pshufd {{.*#+}} xmm2 = xmm0[0,1,2,1]
; SSE2-NEXT: movdqa {{.*#+}} xmm3 = [65535,65535]		; SSE2-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3]
; SSE2-NEXT: pand %xmm3, %xmm2		; SSE2-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,1,2,1]		; SSE2-NEXT: pshuflw {{.*#+}} xmm1 = xmm2[2,1,2,3,4,5,6,7]
; SSE2-NEXT: pshuflw {{.*#+}} xmm0 = xmm0[2,1,2,3,4,5,6,7]		; SSE2-NEXT: pshufhw {{.*#+}} xmm1 = xmm1[0,1,2,3,7,5,6,7]
; SSE2-NEXT: pshufhw {{.*#+}} xmm1 = xmm0[0,1,2,3,7,5,6,7]		; SSE2-NEXT: pand {{.*}}(%rip), %xmm1
; SSE2-NEXT: pand %xmm3, %xmm1
; SSE2-NEXT: movdqa %xmm2, %xmm0
; SSE2-NEXT: retq		; SSE2-NEXT: retq
;		;
; SSSE3-LABEL: zext_8i16_to_4i64:		; SSSE3-LABEL: zext_8i16_to_4i64:
; SSSE3: # BB#0: # %entry		; SSSE3: # BB#0: # %entry
; SSSE3-NEXT: movdqa %xmm0, %xmm1		; SSSE3-NEXT: movdqa %xmm0, %xmm1
; SSSE3-NEXT: pshufd {{.*#+}} xmm0 = xmm1[0,1,0,3]		; SSSE3-NEXT: pxor %xmm2, %xmm2
; SSSE3-NEXT: pshufb {{.*#+}} xmm1 = xmm1[4,5,2,3,4,5,6,7,6,7,10,11,4,5,6,7]		; SSSE3-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1],xmm0[2],xmm2[2],xmm0[3],xmm2[3]
; SSSE3-NEXT: movdqa {{.*#+}} xmm2 = [65535,65535]		; SSSE3-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1]
; SSSE3-NEXT: pand %xmm2, %xmm1		; SSSE3-NEXT: pshufb {{.*#+}} xmm1 = xmm1[4,5],zero,zero,zero,zero,zero,zero,xmm1[6,7],zero,zero,zero,zero,zero,zero
; SSSE3-NEXT: pshufhw {{.*#+}} xmm0 = xmm0[0,1,2,3,5,5,6,7]
; SSSE3-NEXT: pand %xmm2, %xmm0
; SSSE3-NEXT: retq		; SSSE3-NEXT: retq
;		;
; SSE41-LABEL: zext_8i16_to_4i64:		; SSE41-LABEL: zext_8i16_to_4i64:
; SSE41: # BB#0: # %entry		; SSE41: # BB#0: # %entry
; SSE41-NEXT: pmovzxwq {{.*#+}} xmm2 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero		; SSE41-NEXT: pmovzxwq {{.*#+}} xmm2 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero
; SSE41-NEXT: movdqa {{.*#+}} xmm1 = [65535,65535]
; SSE41-NEXT: pand %xmm1, %xmm2
; SSE41-NEXT: pshufb {{.*#+}} xmm0 = xmm0[4,5,2,3,4,5,6,7,6,7,10,11,4,5,6,7]		; SSE41-NEXT: pshufb {{.*#+}} xmm0 = xmm0[4,5,2,3,4,5,6,7,6,7,10,11,4,5,6,7]
; SSE41-NEXT: pand %xmm0, %xmm1		; SSE41-NEXT: pxor %xmm1, %xmm1
		; SSE41-NEXT: pblendw {{.*#+}} xmm1 = xmm0[0],xmm1[1,2,3],xmm0[4],xmm1[5,6,7]
; SSE41-NEXT: movdqa %xmm2, %xmm0		; SSE41-NEXT: movdqa %xmm2, %xmm0
; SSE41-NEXT: retq		; SSE41-NEXT: retq
;		;
; AVX1-LABEL: zext_8i16_to_4i64:		; AVX1-LABEL: zext_8i16_to_4i64:
; AVX1: # BB#0: # %entry		; AVX1: # BB#0: # %entry
; AVX1-NEXT: vpmovzxwd {{.*#+}} xmm1 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero		; AVX1-NEXT: vpmovzxwd {{.*#+}} xmm1 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero
; AVX1-NEXT: vpmovzxwq {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero
; AVX1-NEXT: vpshufd {{.*#+}} xmm1 = xmm1[2,2,3,3]		; AVX1-NEXT: vpshufd {{.*#+}} xmm1 = xmm1[2,2,3,3]
		; AVX1-NEXT: vpxor %xmm2, %xmm2, %xmm2
		; AVX1-NEXT: vpblendw {{.*#+}} xmm1 = xmm1[0],xmm2[1,2,3],xmm1[4],xmm2[5,6,7]
		; AVX1-NEXT: vpmovzxwq {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero
		; AVX1-NEXT: vpblendw {{.*#+}} xmm0 = xmm0[0],xmm2[1,2,3],xmm0[4],xmm2[5,6,7]
; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0		; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
; AVX1-NEXT: vandps {{.*}}(%rip), %ymm0, %ymm0
; AVX1-NEXT: retq		; AVX1-NEXT: retq
;		;
; AVX2-LABEL: zext_8i16_to_4i64:		; AVX2-LABEL: zext_8i16_to_4i64:
; AVX2: # BB#0: # %entry		; AVX2: # BB#0: # %entry
; AVX2-NEXT: vpmovzxwq {{.*#+}} ymm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero		; AVX2-NEXT: vpmovzxwq {{.*#+}} ymm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero
; AVX2-NEXT: vpbroadcastq {{.*}}(%rip), %ymm1		; AVX2-NEXT: vpxor %ymm1, %ymm1, %ymm1
; AVX2-NEXT: vpand %ymm1, %ymm0, %ymm0		; AVX2-NEXT: vpblendw {{.*#+}} ymm0 = ymm0[0],ymm1[1,2,3],ymm0[4],ymm1[5,6,7],ymm0[8],ymm1[9,10,11],ymm0[12],ymm1[13,14,15]
; AVX2-NEXT: retq		; AVX2-NEXT: retq
entry:		entry:
%B = shufflevector <8 x i16> %A, <8 x i16> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		%B = shufflevector <8 x i16> %A, <8 x i16> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
%C = zext <4 x i16> %B to <4 x i64>		%C = zext <4 x i16> %B to <4 x i64>
ret <4 x i64> %C		ret <4 x i64> %C
}		}

define <2 x i64> @zext_4i32_to_2i64(<4 x i32> %A) nounwind uwtable readnone ssp {		define <2 x i64> @zext_4i32_to_2i64(<4 x i32> %A) nounwind uwtable readnone ssp {
; SSE2-LABEL: zext_4i32_to_2i64:		; SSE2-LABEL: zext_4i32_to_2i64:
; SSE2: # BB#0: # %entry		; SSE2: # BB#0: # %entry
; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,1,1,3]		; SSE2-NEXT: pxor %xmm1, %xmm1
; SSE2-NEXT: pand {{.*}}(%rip), %xmm0		; SSE2-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
; SSE2-NEXT: retq		; SSE2-NEXT: retq
;		;
; SSSE3-LABEL: zext_4i32_to_2i64:		; SSSE3-LABEL: zext_4i32_to_2i64:
; SSSE3: # BB#0: # %entry		; SSSE3: # BB#0: # %entry
; SSSE3-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,1,1,3]		; SSSE3-NEXT: pxor %xmm1, %xmm1
; SSSE3-NEXT: pand {{.*}}(%rip), %xmm0		; SSSE3-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
; SSSE3-NEXT: retq		; SSSE3-NEXT: retq
;		;
; SSE41-LABEL: zext_4i32_to_2i64:		; SSE41-LABEL: zext_4i32_to_2i64:
; SSE41: # BB#0: # %entry		; SSE41: # BB#0: # %entry
; SSE41-NEXT: pmovzxdq {{.*#+}} xmm0 = xmm0[0],zero,xmm0[1],zero		; SSE41-NEXT: pmovzxdq {{.*#+}} xmm0 = xmm0[0],zero,xmm0[1],zero
; SSE41-NEXT: pand {{.*}}(%rip), %xmm0
; SSE41-NEXT: retq		; SSE41-NEXT: retq
;		;
; AVX-LABEL: zext_4i32_to_2i64:		; AVX-LABEL: zext_4i32_to_2i64:
; AVX: # BB#0: # %entry		; AVX: # BB#0: # %entry
; AVX-NEXT: vpmovzxdq {{.*#+}} xmm0 = xmm0[0],zero,xmm0[1],zero		; AVX-NEXT: vpmovzxdq {{.*#+}} xmm0 = xmm0[0],zero,xmm0[1],zero
; AVX-NEXT: vpand {{.*}}(%rip), %xmm0, %xmm0
; AVX-NEXT: retq		; AVX-NEXT: retq
entry:		entry:
%B = shufflevector <4 x i32> %A, <4 x i32> undef, <2 x i32> <i32 0, i32 1>		%B = shufflevector <4 x i32> %A, <4 x i32> undef, <2 x i32> <i32 0, i32 1>
%C = zext <2 x i32> %B to <2 x i64>		%C = zext <2 x i32> %B to <2 x i64>
ret <2 x i64> %C		ret <2 x i64> %C
}		}

define <4 x i64> @zext_4i32_to_4i64(<4 x i32> %A) nounwind uwtable readnone ssp {		define <4 x i64> @zext_4i32_to_4i64(<4 x i32> %A) nounwind uwtable readnone ssp {
; SSE2-LABEL: zext_4i32_to_4i64:		; SSE2-LABEL: zext_4i32_to_4i64:
; SSE2: # BB#0: # %entry		; SSE2: # BB#0: # %entry
; SSE2-NEXT: pshufd {{.*#+}} xmm2 = xmm0[0,1,1,3]		; SSE2-NEXT: movdqa %xmm0, %xmm1
; SSE2-NEXT: movdqa {{.*#+}} xmm3 = [4294967295,4294967295]		; SSE2-NEXT: pxor %xmm2, %xmm2
; SSE2-NEXT: pand %xmm3, %xmm2		; SSE2-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1]
; SSE2-NEXT: pshufd {{.*#+}} xmm1 = xmm0[2,2,3,3]		; SSE2-NEXT: punpckhdq {{.*#+}} xmm1 = xmm1[2],xmm2[2],xmm1[3],xmm2[3]
; SSE2-NEXT: pand %xmm3, %xmm1
; SSE2-NEXT: movdqa %xmm2, %xmm0
; SSE2-NEXT: retq		; SSE2-NEXT: retq
;		;
; SSSE3-LABEL: zext_4i32_to_4i64:		; SSSE3-LABEL: zext_4i32_to_4i64:
; SSSE3: # BB#0: # %entry		; SSSE3: # BB#0: # %entry
; SSSE3-NEXT: pshufd {{.*#+}} xmm2 = xmm0[0,1,1,3]		; SSSE3-NEXT: movdqa %xmm0, %xmm1
; SSSE3-NEXT: movdqa {{.*#+}} xmm3 = [4294967295,4294967295]		; SSSE3-NEXT: pxor %xmm2, %xmm2
; SSSE3-NEXT: pand %xmm3, %xmm2		; SSSE3-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1]
; SSSE3-NEXT: pshufd {{.*#+}} xmm1 = xmm0[2,2,3,3]		; SSSE3-NEXT: punpckhdq {{.*#+}} xmm1 = xmm1[2],xmm2[2],xmm1[3],xmm2[3]
; SSSE3-NEXT: pand %xmm3, %xmm1
; SSSE3-NEXT: movdqa %xmm2, %xmm0
; SSSE3-NEXT: retq		; SSSE3-NEXT: retq
;		;
; SSE41-LABEL: zext_4i32_to_4i64:		; SSE41-LABEL: zext_4i32_to_4i64:
; SSE41: # BB#0: # %entry		; SSE41: # BB#0: # %entry
; SSE41-NEXT: pmovzxdq {{.*#+}} xmm2 = xmm0[0],zero,xmm0[1],zero		; SSE41-NEXT: movdqa %xmm0, %xmm1
; SSE41-NEXT: movdqa {{.*#+}} xmm3 = [4294967295,4294967295]		; SSE41-NEXT: pxor %xmm2, %xmm2
; SSE41-NEXT: pand %xmm3, %xmm2		; SSE41-NEXT: pmovzxdq {{.*#+}} xmm0 = xmm1[0],zero,xmm1[1],zero
; SSE41-NEXT: pshufd {{.*#+}} xmm1 = xmm0[2,2,3,3]		; SSE41-NEXT: punpckhdq {{.*#+}} xmm1 = xmm1[2],xmm2[2],xmm1[3],xmm2[3]
; SSE41-NEXT: pand %xmm3, %xmm1
; SSE41-NEXT: movdqa %xmm2, %xmm0
; SSE41-NEXT: retq		; SSE41-NEXT: retq
;		;
; AVX1-LABEL: zext_4i32_to_4i64:		; AVX1-LABEL: zext_4i32_to_4i64:
; AVX1: # BB#0: # %entry		; AVX1: # BB#0: # %entry
; AVX1-NEXT: vpxor %xmm1, %xmm1, %xmm1		; AVX1-NEXT: vpxor %xmm1, %xmm1, %xmm1
; AVX1-NEXT: vpunpckhdq {{.*#+}} xmm1 = xmm0[2],xmm1[2],xmm0[3],xmm1[3]		; AVX1-NEXT: vpunpckhdq {{.*#+}} xmm1 = xmm0[2],xmm1[2],xmm0[3],xmm1[3]
; AVX1-NEXT: vpmovzxdq {{.*#+}} xmm0 = xmm0[0],zero,xmm0[1],zero		; AVX1-NEXT: vpmovzxdq {{.*#+}} xmm0 = xmm0[0],zero,xmm0[1],zero
; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0		; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
[... 75 lines not shown ...]

define <4 x i64> @load_zext_4i8_to_4i64(<4 x i8> *%ptr) {		define <4 x i64> @load_zext_4i8_to_4i64(<4 x i8> *%ptr) {
; SSE2-LABEL: load_zext_4i8_to_4i64:		; SSE2-LABEL: load_zext_4i8_to_4i64:
; SSE2: # BB#0: # %entry		; SSE2: # BB#0: # %entry
; SSE2-NEXT: movd {{.*#+}} xmm1 = mem[0],zero,zero,zero		; SSE2-NEXT: movd {{.*#+}} xmm1 = mem[0],zero,zero,zero
; SSE2-NEXT: punpcklbw {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1],xmm1[2],xmm0[2],xmm1[3],xmm0[3],xmm1[4],xmm0[4],xmm1[5],xmm0[5],xmm1[6],xmm0[6],xmm1[7],xmm0[7]		; SSE2-NEXT: punpcklbw {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1],xmm1[2],xmm0[2],xmm1[3],xmm0[3],xmm1[4],xmm0[4],xmm1[5],xmm0[5],xmm1[6],xmm0[6],xmm1[7],xmm0[7]
; SSE2-NEXT: punpcklwd {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1],xmm1[2],xmm0[2],xmm1[3],xmm0[3]		; SSE2-NEXT: punpcklwd {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1],xmm1[2],xmm0[2],xmm1[3],xmm0[3]
; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm1[0,1,1,3]		; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm1[0,1,1,3]
; SSE2-NEXT: movdqa {{.*#+}} xmm2 = [255,255]		; SSE2-NEXT: movdqa {{.*#+}} xmm2 = [255,0,0,0,0,0,0,0,255,0,0,0,0,0,0,0]
; SSE2-NEXT: pand %xmm2, %xmm0		; SSE2-NEXT: pand %xmm2, %xmm0
; SSE2-NEXT: pshufd {{.*#+}} xmm1 = xmm1[2,2,3,3]		; SSE2-NEXT: pshufd {{.*#+}} xmm1 = xmm1[2,2,3,3]
; SSE2-NEXT: pand %xmm2, %xmm1		; SSE2-NEXT: pand %xmm2, %xmm1
; SSE2-NEXT: retq		; SSE2-NEXT: retq
;		;
; SSSE3-LABEL: load_zext_4i8_to_4i64:		; SSSE3-LABEL: load_zext_4i8_to_4i64:
; SSSE3: # BB#0: # %entry		; SSSE3: # BB#0: # %entry
; SSSE3-NEXT: movd {{.*#+}} xmm1 = mem[0],zero,zero,zero		; SSSE3-NEXT: movd {{.*#+}} xmm1 = mem[0],zero,zero,zero
; SSSE3-NEXT: punpcklbw {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1],xmm1[2],xmm0[2],xmm1[3],xmm0[3],xmm1[4],xmm0[4],xmm1[5],xmm0[5],xmm1[6],xmm0[6],xmm1[7],xmm0[7]		; SSSE3-NEXT: punpcklbw {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1],xmm1[2],xmm0[2],xmm1[3],xmm0[3],xmm1[4],xmm0[4],xmm1[5],xmm0[5],xmm1[6],xmm0[6],xmm1[7],xmm0[7]
; SSSE3-NEXT: punpcklwd {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1],xmm1[2],xmm0[2],xmm1[3],xmm0[3]		; SSSE3-NEXT: punpcklwd {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1],xmm1[2],xmm0[2],xmm1[3],xmm0[3]
; SSSE3-NEXT: pshufd {{.*#+}} xmm0 = xmm1[0,1,1,3]		; SSSE3-NEXT: movdqa %xmm1, %xmm0
; SSSE3-NEXT: movdqa {{.*#+}} xmm2 = [255,255]		; SSSE3-NEXT: pshufb {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,zero,zero,zero,zero,xmm0[4],zero,zero,zero,zero,zero,zero,zero
; SSSE3-NEXT: pand %xmm2, %xmm0		; SSSE3-NEXT: pshufb {{.*#+}} xmm1 = xmm1[8],zero,zero,zero,zero,zero,zero,zero,xmm1[12],zero,zero,zero,zero,zero,zero,zero
; SSSE3-NEXT: pshufd {{.*#+}} xmm1 = xmm1[2,2,3,3]
; SSSE3-NEXT: pand %xmm2, %xmm1
; SSSE3-NEXT: retq		; SSSE3-NEXT: retq
;		;
; SSE41-LABEL: load_zext_4i8_to_4i64:		; SSE41-LABEL: load_zext_4i8_to_4i64:
; SSE41: # BB#0: # %entry		; SSE41: # BB#0: # %entry
; SSE41-NEXT: pmovzxbq {{.*#+}} xmm0 = mem[0],zero,zero,zero,zero,zero,zero,zero,mem[1],zero,zero,zero,zero,zero,zero,zero		; SSE41-NEXT: pmovzxbq {{.*#+}} xmm0 = mem[0],zero,zero,zero,zero,zero,zero,zero,mem[1],zero,zero,zero,zero,zero,zero,zero
; SSE41-NEXT: pmovzxbq {{.*#+}} xmm1 = mem[0],zero,zero,zero,zero,zero,zero,zero,mem[1],zero,zero,zero,zero,zero,zero,zero		; SSE41-NEXT: pmovzxbq {{.*#+}} xmm1 = mem[0],zero,zero,zero,zero,zero,zero,zero,mem[1],zero,zero,zero,zero,zero,zero,zero
; SSE41-NEXT: retq		; SSE41-NEXT: retq
;		;
[... 46 lines not shown ...]

define <8 x i32> @load_zext_8i8_to_8i32(<8 x i8> *%ptr) {		define <8 x i32> @load_zext_8i8_to_8i32(<8 x i8> *%ptr) {
; SSE2-LABEL: load_zext_8i8_to_8i32:		; SSE2-LABEL: load_zext_8i8_to_8i32:
; SSE2: # BB#0: # %entry		; SSE2: # BB#0: # %entry
; SSE2-NEXT: movq {{.*#+}} xmm1 = mem[0],zero		; SSE2-NEXT: movq {{.*#+}} xmm1 = mem[0],zero
; SSE2-NEXT: punpcklbw {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1],xmm1[2],xmm0[2],xmm1[3],xmm0[3],xmm1[4],xmm0[4],xmm1[5],xmm0[5],xmm1[6],xmm0[6],xmm1[7],xmm0[7]		; SSE2-NEXT: punpcklbw {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1],xmm1[2],xmm0[2],xmm1[3],xmm0[3],xmm1[4],xmm0[4],xmm1[5],xmm0[5],xmm1[6],xmm0[6],xmm1[7],xmm0[7]
; SSE2-NEXT: movdqa %xmm1, %xmm0		; SSE2-NEXT: movdqa %xmm1, %xmm0
; SSE2-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3]		; SSE2-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3]
; SSE2-NEXT: movdqa {{.*#+}} xmm2 = [255,255,255,255]		; SSE2-NEXT: movdqa {{.*#+}} xmm2 = [255,0,0,0,255,0,0,0,255,0,0,0,255,0,0,0]
; SSE2-NEXT: pand %xmm2, %xmm0		; SSE2-NEXT: pand %xmm2, %xmm0
; SSE2-NEXT: punpckhwd {{.*#+}} xmm1 = xmm1[4,4,5,5,6,6,7,7]		; SSE2-NEXT: punpckhwd {{.*#+}} xmm1 = xmm1[4,4,5,5,6,6,7,7]
; SSE2-NEXT: pand %xmm2, %xmm1		; SSE2-NEXT: pand %xmm2, %xmm1
; SSE2-NEXT: retq		; SSE2-NEXT: retq
;		;
; SSSE3-LABEL: load_zext_8i8_to_8i32:		; SSSE3-LABEL: load_zext_8i8_to_8i32:
; SSSE3: # BB#0: # %entry		; SSSE3: # BB#0: # %entry
; SSSE3-NEXT: movq {{.*#+}} xmm1 = mem[0],zero		; SSSE3-NEXT: movq {{.*#+}} xmm1 = mem[0],zero
; SSSE3-NEXT: punpcklbw {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1],xmm1[2],xmm0[2],xmm1[3],xmm0[3],xmm1[4],xmm0[4],xmm1[5],xmm0[5],xmm1[6],xmm0[6],xmm1[7],xmm0[7]		; SSSE3-NEXT: punpcklbw {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1],xmm1[2],xmm0[2],xmm1[3],xmm0[3],xmm1[4],xmm0[4],xmm1[5],xmm0[5],xmm1[6],xmm0[6],xmm1[7],xmm0[7]
; SSSE3-NEXT: movdqa %xmm1, %xmm0		; SSSE3-NEXT: movdqa %xmm1, %xmm0
; SSSE3-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3]		; SSSE3-NEXT: pshufb {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[4],zero,zero,zero,xmm0[6],zero,zero,zero
; SSSE3-NEXT: movdqa {{.*#+}} xmm2 = [255,255,255,255]		; SSSE3-NEXT: pshufb {{.*#+}} xmm1 = xmm1[8],zero,zero,zero,xmm1[10],zero,zero,zero,xmm1[12],zero,zero,zero,xmm1[14],zero,zero,zero
; SSSE3-NEXT: pand %xmm2, %xmm0
; SSSE3-NEXT: punpckhwd {{.*#+}} xmm1 = xmm1[4,4,5,5,6,6,7,7]
; SSSE3-NEXT: pand %xmm2, %xmm1
; SSSE3-NEXT: retq		; SSSE3-NEXT: retq
;		;
; SSE41-LABEL: load_zext_8i8_to_8i32:		; SSE41-LABEL: load_zext_8i8_to_8i32:
; SSE41: # BB#0: # %entry		; SSE41: # BB#0: # %entry
; SSE41-NEXT: pmovzxbd {{.*#+}} xmm0 = mem[0],zero,zero,zero,mem[1],zero,zero,zero,mem[2],zero,zero,zero,mem[3],zero,zero,zero		; SSE41-NEXT: pmovzxbd {{.*#+}} xmm0 = mem[0],zero,zero,zero,mem[1],zero,zero,zero,mem[2],zero,zero,zero,mem[3],zero,zero,zero
; SSE41-NEXT: pmovzxbd {{.*#+}} xmm1 = mem[0],zero,zero,zero,mem[1],zero,zero,zero,mem[2],zero,zero,zero,mem[3],zero,zero,zero		; SSE41-NEXT: pmovzxbd {{.*#+}} xmm1 = mem[0],zero,zero,zero,mem[1],zero,zero,zero,mem[2],zero,zero,zero,mem[3],zero,zero,zero
; SSE41-NEXT: retq		; SSE41-NEXT: retq
;		;
[... 16 lines not shown ...]

define <16 x i16> @load_zext_16i8_to_16i16(<16 x i8> *%ptr) {		define <16 x i16> @load_zext_16i8_to_16i16(<16 x i8> *%ptr) {
; SSE2-LABEL: load_zext_16i8_to_16i16:		; SSE2-LABEL: load_zext_16i8_to_16i16:
; SSE2: # BB#0: # %entry		; SSE2: # BB#0: # %entry
; SSE2-NEXT: movdqa (%rdi), %xmm1		; SSE2-NEXT: movdqa (%rdi), %xmm1
; SSE2-NEXT: pxor %xmm2, %xmm2		; SSE2-NEXT: pxor %xmm2, %xmm2
; SSE2-NEXT: movdqa %xmm1, %xmm0		; SSE2-NEXT: movdqa %xmm1, %xmm0
; SSE2-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1],xmm0[2],xmm2[2],xmm0[3],xmm2[3],xmm0[4],xmm2[4],xmm0[5],xmm2[5],xmm0[6],xmm2[6],xmm0[7],xmm2[7]		; SSE2-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1],xmm0[2],xmm2[2],xmm0[3],xmm2[3],xmm0[4],xmm2[4],xmm0[5],xmm2[5],xmm0[6],xmm2[6],xmm0[7],xmm2[7]
; SSE2-NEXT: punpckhbw {{.*#+}} xmm1 = xmm1[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15]		; SSE2-NEXT: punpckhbw {{.*#+}} xmm1 = xmm1[8],xmm2[8],xmm1[9],xmm2[9],xmm1[10],xmm2[10],xmm1[11],xmm2[11],xmm1[12],xmm2[12],xmm1[13],xmm2[13],xmm1[14],xmm2[14],xmm1[15],xmm2[15]
; SSE2-NEXT: pand {{.*}}(%rip), %xmm1
; SSE2-NEXT: retq		; SSE2-NEXT: retq
;		;
; SSSE3-LABEL: load_zext_16i8_to_16i16:		; SSSE3-LABEL: load_zext_16i8_to_16i16:
; SSSE3: # BB#0: # %entry		; SSSE3: # BB#0: # %entry
; SSSE3-NEXT: movdqa (%rdi), %xmm1		; SSSE3-NEXT: movdqa (%rdi), %xmm1
; SSSE3-NEXT: pxor %xmm2, %xmm2		; SSSE3-NEXT: pxor %xmm2, %xmm2
; SSSE3-NEXT: movdqa %xmm1, %xmm0		; SSSE3-NEXT: movdqa %xmm1, %xmm0
; SSSE3-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1],xmm0[2],xmm2[2],xmm0[3],xmm2[3],xmm0[4],xmm2[4],xmm0[5],xmm2[5],xmm0[6],xmm2[6],xmm0[7],xmm2[7]		; SSSE3-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1],xmm0[2],xmm2[2],xmm0[3],xmm2[3],xmm0[4],xmm2[4],xmm0[5],xmm2[5],xmm0[6],xmm2[6],xmm0[7],xmm2[7]
; SSSE3-NEXT: punpckhbw {{.*#+}} xmm1 = xmm1[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15]		; SSSE3-NEXT: punpckhbw {{.*#+}} xmm1 = xmm1[8],xmm2[8],xmm1[9],xmm2[9],xmm1[10],xmm2[10],xmm1[11],xmm2[11],xmm1[12],xmm2[12],xmm1[13],xmm2[13],xmm1[14],xmm2[14],xmm1[15],xmm2[15]
; SSSE3-NEXT: pand {{.*}}(%rip), %xmm1
; SSSE3-NEXT: retq		; SSSE3-NEXT: retq
;		;
; SSE41-LABEL: load_zext_16i8_to_16i16:		; SSE41-LABEL: load_zext_16i8_to_16i16:
; SSE41: # BB#0: # %entry		; SSE41: # BB#0: # %entry
; SSE41-NEXT: pmovzxbw {{.*#+}} xmm0 = mem[0],zero,mem[1],zero,mem[2],zero,mem[3],zero,mem[4],zero,mem[5],zero,mem[6],zero,mem[7],zero		; SSE41-NEXT: pmovzxbw {{.*#+}} xmm0 = mem[0],zero,mem[1],zero,mem[2],zero,mem[3],zero,mem[4],zero,mem[5],zero,mem[6],zero,mem[7],zero
; SSE41-NEXT: pmovzxbw {{.*#+}} xmm1 = mem[0],zero,mem[1],zero,mem[2],zero,mem[3],zero,mem[4],zero,mem[5],zero,mem[6],zero,mem[7],zero		; SSE41-NEXT: pmovzxbw {{.*#+}} xmm1 = mem[0],zero,mem[1],zero,mem[2],zero,mem[3],zero,mem[4],zero,mem[5],zero,mem[6],zero,mem[7],zero
; SSE41-NEXT: retq		; SSE41-NEXT: retq
;		;
}		}

define <4 x i64> @load_zext_4i16_to_4i64(<4 x i16> *%ptr) {		define <4 x i64> @load_zext_4i16_to_4i64(<4 x i16> *%ptr) {
; SSE2-LABEL: load_zext_4i16_to_4i64:		; SSE2-LABEL: load_zext_4i16_to_4i64:
; SSE2: # BB#0: # %entry		; SSE2: # BB#0: # %entry
; SSE2-NEXT: movq {{.*#+}} xmm1 = mem[0],zero		; SSE2-NEXT: movq {{.*#+}} xmm1 = mem[0],zero
; SSE2-NEXT: punpcklwd {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1],xmm1[2],xmm0[2],xmm1[3],xmm0[3]		; SSE2-NEXT: punpcklwd {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1],xmm1[2],xmm0[2],xmm1[3],xmm0[3]
; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm1[0,1,1,3]		; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm1[0,1,1,3]
; SSE2-NEXT: movdqa {{.*#+}} xmm2 = [65535,65535]		; SSE2-NEXT: movdqa {{.*#+}} xmm2 = [65535,0,0,0,65535,0,0,0]
; SSE2-NEXT: pand %xmm2, %xmm0		; SSE2-NEXT: pand %xmm2, %xmm0
; SSE2-NEXT: pshufd {{.*#+}} xmm1 = xmm1[2,2,3,3]		; SSE2-NEXT: pshufd {{.*#+}} xmm1 = xmm1[2,2,3,3]
; SSE2-NEXT: pand %xmm2, %xmm1		; SSE2-NEXT: pand %xmm2, %xmm1
; SSE2-NEXT: retq		; SSE2-NEXT: retq
;		;
; SSSE3-LABEL: load_zext_4i16_to_4i64:		; SSSE3-LABEL: load_zext_4i16_to_4i64:
; SSSE3: # BB#0: # %entry		; SSSE3: # BB#0: # %entry
; SSSE3-NEXT: movq {{.*#+}} xmm1 = mem[0],zero		; SSSE3-NEXT: movq {{.*#+}} xmm1 = mem[0],zero
; SSSE3-NEXT: punpcklwd {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1],xmm1[2],xmm0[2],xmm1[3],xmm0[3]		; SSSE3-NEXT: punpcklwd {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1],xmm1[2],xmm0[2],xmm1[3],xmm0[3]
; SSSE3-NEXT: pshufd {{.*#+}} xmm0 = xmm1[0,1,1,3]		; SSSE3-NEXT: movdqa %xmm1, %xmm0
; SSSE3-NEXT: movdqa {{.*#+}} xmm2 = [65535,65535]		; SSSE3-NEXT: pshufb {{.*#+}} xmm0 = xmm0[0,1],zero,zero,zero,zero,zero,zero,xmm0[4,5],zero,zero,zero,zero,zero,zero
; SSSE3-NEXT: pand %xmm2, %xmm0		; SSSE3-NEXT: pshufb {{.*#+}} xmm1 = xmm1[8,9],zero,zero,zero,zero,zero,zero,xmm1[12,13],zero,zero,zero,zero,zero,zero
; SSSE3-NEXT: pshufd {{.*#+}} xmm1 = xmm1[2,2,3,3]
; SSSE3-NEXT: pand %xmm2, %xmm1
; SSSE3-NEXT: retq		; SSSE3-NEXT: retq
;		;
; SSE41-LABEL: load_zext_4i16_to_4i64:		; SSE41-LABEL: load_zext_4i16_to_4i64:
; SSE41: # BB#0: # %entry		; SSE41: # BB#0: # %entry
; SSE41-NEXT: pmovzxwq {{.*#+}} xmm0 = mem[0],zero,zero,zero,mem[1],zero,zero,zero		; SSE41-NEXT: pmovzxwq {{.*#+}} xmm0 = mem[0],zero,zero,zero,mem[1],zero,zero,zero
; SSE41-NEXT: pmovzxwq {{.*#+}} xmm1 = mem[0],zero,zero,zero,mem[1],zero,zero,zero		; SSE41-NEXT: pmovzxwq {{.*#+}} xmm1 = mem[0],zero,zero,zero,mem[1],zero,zero,zero
; SSE41-NEXT: retq		; SSE41-NEXT: retq
;		;

define <8 x i32> @load_zext_8i16_to_8i32(<8 x i16> *%ptr) {		define <8 x i32> @load_zext_8i16_to_8i32(<8 x i16> *%ptr) {
; SSE2-LABEL: load_zext_8i16_to_8i32:		; SSE2-LABEL: load_zext_8i16_to_8i32:
; SSE2: # BB#0: # %entry		; SSE2: # BB#0: # %entry
; SSE2-NEXT: movdqa (%rdi), %xmm1		; SSE2-NEXT: movdqa (%rdi), %xmm1
; SSE2-NEXT: pxor %xmm2, %xmm2		; SSE2-NEXT: pxor %xmm2, %xmm2
; SSE2-NEXT: movdqa %xmm1, %xmm0		; SSE2-NEXT: movdqa %xmm1, %xmm0
; SSE2-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1],xmm0[2],xmm2[2],xmm0[3],xmm2[3]		; SSE2-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1],xmm0[2],xmm2[2],xmm0[3],xmm2[3]
; SSE2-NEXT: punpckhwd {{.*#+}} xmm1 = xmm1[4,4,5,5,6,6,7,7]		; SSE2-NEXT: punpckhwd {{.*#+}} xmm1 = xmm1[4],xmm2[4],xmm1[5],xmm2[5],xmm1[6],xmm2[6],xmm1[7],xmm2[7]
; SSE2-NEXT: pand {{.*}}(%rip), %xmm1
; SSE2-NEXT: retq		; SSE2-NEXT: retq
;		;
; SSSE3-LABEL: load_zext_8i16_to_8i32:		; SSSE3-LABEL: load_zext_8i16_to_8i32:
; SSSE3: # BB#0: # %entry		; SSSE3: # BB#0: # %entry
; SSSE3-NEXT: movdqa (%rdi), %xmm1		; SSSE3-NEXT: movdqa (%rdi), %xmm1
; SSSE3-NEXT: pxor %xmm2, %xmm2		; SSSE3-NEXT: pxor %xmm2, %xmm2
; SSSE3-NEXT: movdqa %xmm1, %xmm0		; SSSE3-NEXT: movdqa %xmm1, %xmm0
; SSSE3-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1],xmm0[2],xmm2[2],xmm0[3],xmm2[3]		; SSSE3-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1],xmm0[2],xmm2[2],xmm0[3],xmm2[3]
; SSSE3-NEXT: punpckhwd {{.*#+}} xmm1 = xmm1[4,4,5,5,6,6,7,7]		; SSSE3-NEXT: punpckhwd {{.*#+}} xmm1 = xmm1[4],xmm2[4],xmm1[5],xmm2[5],xmm1[6],xmm2[6],xmm1[7],xmm2[7]
; SSSE3-NEXT: pand {{.*}}(%rip), %xmm1
; SSSE3-NEXT: retq		; SSSE3-NEXT: retq
;		;
; SSE41-LABEL: load_zext_8i16_to_8i32:		; SSE41-LABEL: load_zext_8i16_to_8i32:
; SSE41: # BB#0: # %entry		; SSE41: # BB#0: # %entry
; SSE41-NEXT: pmovzxwd {{.*#+}} xmm0 = mem[0],zero,mem[1],zero,mem[2],zero,mem[3],zero		; SSE41-NEXT: pmovzxwd {{.*#+}} xmm0 = mem[0],zero,mem[1],zero,mem[2],zero,mem[3],zero
; SSE41-NEXT: pmovzxwd {{.*#+}} xmm1 = mem[0],zero,mem[1],zero,mem[2],zero,mem[3],zero		; SSE41-NEXT: pmovzxwd {{.*#+}} xmm1 = mem[0],zero,mem[1],zero,mem[2],zero,mem[3],zero
; SSE41-NEXT: retq		; SSE41-NEXT: retq
;		;
%Y = zext <2 x i32> %X to <2 x i64>		%Y = zext <2 x i32> %X to <2 x i64>
ret <2 x i64> %Y		ret <2 x i64> %Y
}		}

define <4 x i64> @load_zext_4i32_to_4i64(<4 x i32> *%ptr) {		define <4 x i64> @load_zext_4i32_to_4i64(<4 x i32> *%ptr) {
; SSE2-LABEL: load_zext_4i32_to_4i64:		; SSE2-LABEL: load_zext_4i32_to_4i64:
; SSE2: # BB#0: # %entry		; SSE2: # BB#0: # %entry
; SSE2-NEXT: movdqa (%rdi), %xmm1		; SSE2-NEXT: movdqa (%rdi), %xmm1
; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm1[0,1,1,3]		; SSE2-NEXT: pxor %xmm2, %xmm2
; SSE2-NEXT: movdqa {{.*#+}} xmm2 = [4294967295,4294967295]		; SSE2-NEXT: movdqa %xmm1, %xmm0
; SSE2-NEXT: pand %xmm2, %xmm0		; SSE2-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1]
; SSE2-NEXT: pshufd {{.*#+}} xmm1 = xmm1[2,2,3,3]		; SSE2-NEXT: punpckhdq {{.*#+}} xmm1 = xmm1[2],xmm2[2],xmm1[3],xmm2[3]
; SSE2-NEXT: pand %xmm2, %xmm1
; SSE2-NEXT: retq		; SSE2-NEXT: retq
;		;
; SSSE3-LABEL: load_zext_4i32_to_4i64:		; SSSE3-LABEL: load_zext_4i32_to_4i64:
; SSSE3: # BB#0: # %entry		; SSSE3: # BB#0: # %entry
; SSSE3-NEXT: movdqa (%rdi), %xmm1		; SSSE3-NEXT: movdqa (%rdi), %xmm1
; SSSE3-NEXT: pshufd {{.*#+}} xmm0 = xmm1[0,1,1,3]		; SSSE3-NEXT: pxor %xmm2, %xmm2
; SSSE3-NEXT: movdqa {{.*#+}} xmm2 = [4294967295,4294967295]		; SSSE3-NEXT: movdqa %xmm1, %xmm0
; SSSE3-NEXT: pand %xmm2, %xmm0		; SSSE3-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1]
; SSSE3-NEXT: pshufd {{.*#+}} xmm1 = xmm1[2,2,3,3]		; SSSE3-NEXT: punpckhdq {{.*#+}} xmm1 = xmm1[2],xmm2[2],xmm1[3],xmm2[3]
; SSSE3-NEXT: pand %xmm2, %xmm1
; SSSE3-NEXT: retq		; SSSE3-NEXT: retq
;		;
; SSE41-LABEL: load_zext_4i32_to_4i64:		; SSE41-LABEL: load_zext_4i32_to_4i64:
; SSE41: # BB#0: # %entry		; SSE41: # BB#0: # %entry
; SSE41-NEXT: pmovzxdq {{.*#+}} xmm0 = mem[0],zero,mem[1],zero		; SSE41-NEXT: pmovzxdq {{.*#+}} xmm0 = mem[0],zero,mem[1],zero
; SSE41-NEXT: pmovzxdq {{.*#+}} xmm1 = mem[0],zero,mem[1],zero		; SSE41-NEXT: pmovzxdq {{.*#+}} xmm1 = mem[0],zero,mem[1],zero
; SSE41-NEXT: retq		; SSE41-NEXT: retq
;		;
ret <4 x i64> %Y		ret <4 x i64> %Y
}		}

define <8 x i32> @zext_8i8_to_8i32(<8 x i8> %z) {		define <8 x i32> @zext_8i8_to_8i32(<8 x i8> %z) {
; SSE2-LABEL: zext_8i8_to_8i32:		; SSE2-LABEL: zext_8i8_to_8i32:
; SSE2: # BB#0: # %entry		; SSE2: # BB#0: # %entry
; SSE2-NEXT: movdqa %xmm0, %xmm2		; SSE2-NEXT: movdqa %xmm0, %xmm2
; SSE2-NEXT: punpcklwd {{.*#+}} xmm2 = xmm2[0],xmm0[0],xmm2[1],xmm0[1],xmm2[2],xmm0[2],xmm2[3],xmm0[3]		; SSE2-NEXT: punpcklwd {{.*#+}} xmm2 = xmm2[0],xmm0[0],xmm2[1],xmm0[1],xmm2[2],xmm0[2],xmm2[3],xmm0[3]
; SSE2-NEXT: movdqa {{.*#+}} xmm1 = [255,255,255,255]		; SSE2-NEXT: movdqa {{.*#+}} xmm1 = [255,0,0,0,255,0,0,0,255,0,0,0,255,0,0,0]
; SSE2-NEXT: pand %xmm1, %xmm2		; SSE2-NEXT: pand %xmm1, %xmm2
; SSE2-NEXT: punpckhwd {{.*#+}} xmm0 = xmm0[4,4,5,5,6,6,7,7]		; SSE2-NEXT: punpckhwd {{.*#+}} xmm0 = xmm0[4,4,5,5,6,6,7,7]
; SSE2-NEXT: pand %xmm0, %xmm1		; SSE2-NEXT: pand %xmm0, %xmm1
; SSE2-NEXT: movdqa %xmm2, %xmm0		; SSE2-NEXT: movdqa %xmm2, %xmm0
; SSE2-NEXT: retq		; SSE2-NEXT: retq
;		;
; SSSE3-LABEL: zext_8i8_to_8i32:		; SSSE3-LABEL: zext_8i8_to_8i32:
; SSSE3: # BB#0: # %entry		; SSSE3: # BB#0: # %entry
; SSSE3-NEXT: movdqa %xmm0, %xmm2		; SSSE3-NEXT: movdqa %xmm0, %xmm2
; SSSE3-NEXT: punpcklwd {{.*#+}} xmm2 = xmm2[0],xmm0[0],xmm2[1],xmm0[1],xmm2[2],xmm0[2],xmm2[3],xmm0[3]		; SSSE3-NEXT: punpcklwd {{.*#+}} xmm2 = xmm2[0],xmm0[0],xmm2[1],xmm0[1],xmm2[2],xmm0[2],xmm2[3],xmm0[3]
; SSSE3-NEXT: movdqa {{.*#+}} xmm1 = [255,255,255,255]		; SSSE3-NEXT: movdqa {{.*#+}} xmm1 = [255,0,0,0,255,0,0,0,255,0,0,0,255,0,0,0]
; SSSE3-NEXT: pand %xmm1, %xmm2		; SSSE3-NEXT: pand %xmm1, %xmm2
; SSSE3-NEXT: punpckhwd {{.*#+}} xmm0 = xmm0[4,4,5,5,6,6,7,7]		; SSSE3-NEXT: punpckhwd {{.*#+}} xmm0 = xmm0[4,4,5,5,6,6,7,7]
; SSSE3-NEXT: pand %xmm0, %xmm1		; SSSE3-NEXT: pand %xmm0, %xmm1
; SSSE3-NEXT: movdqa %xmm2, %xmm0		; SSSE3-NEXT: movdqa %xmm2, %xmm0
; SSSE3-NEXT: retq		; SSSE3-NEXT: retq
;		;
; SSE41-LABEL: zext_8i8_to_8i32:		; SSE41-LABEL: zext_8i8_to_8i32:
; SSE41: # BB#0: # %entry		; SSE41: # BB#0: # %entry
; SSE41-NEXT: pmovzxwd {{.*#+}} xmm2 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero		; SSE41-NEXT: pmovzxwd {{.*#+}} xmm2 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero
; SSE41-NEXT: movdqa {{.*#+}} xmm1 = [255,255,255,255]		; SSE41-NEXT: movdqa {{.*#+}} xmm1 = [255,0,0,0,255,0,0,0,255,0,0,0,255,0,0,0]
; SSE41-NEXT: pand %xmm1, %xmm2		; SSE41-NEXT: pand %xmm1, %xmm2
; SSE41-NEXT: punpckhwd {{.*#+}} xmm0 = xmm0[4,4,5,5,6,6,7,7]		; SSE41-NEXT: punpckhwd {{.*#+}} xmm0 = xmm0[4,4,5,5,6,6,7,7]
; SSE41-NEXT: pand %xmm0, %xmm1		; SSE41-NEXT: pand %xmm0, %xmm1
; SSE41-NEXT: movdqa %xmm2, %xmm0		; SSE41-NEXT: movdqa %xmm2, %xmm0
; SSE41-NEXT: retq		; SSE41-NEXT: retq
;		;
; AVX1-LABEL: zext_8i8_to_8i32:		; AVX1-LABEL: zext_8i8_to_8i32:
; AVX1: # BB#0: # %entry		; AVX1: # BB#0: # %entry
; AVX1-NEXT: vpmovzxwd {{.*#+}} xmm1 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero		; AVX1-NEXT: vpunpckhwd {{.*#+}} xmm1 = xmm0[4,4,5,5,6,6,7,7]
; AVX1-NEXT: vpunpckhwd {{.*#+}} xmm0 = xmm0[4,4,5,5,6,6,7,7]		; AVX1-NEXT: vmovdqa {{.*#+}} xmm2 = [255,0,0,0,255,0,0,0,255,0,0,0,255,0,0,0]
; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0		; AVX1-NEXT: vpand %xmm2, %xmm1, %xmm1
; AVX1-NEXT: vandps {{.*}}(%rip), %ymm0, %ymm0		; AVX1-NEXT: vpmovzxwd {{.*#+}} xmm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero
		; AVX1-NEXT: vpand %xmm2, %xmm0, %xmm0
		; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
; AVX1-NEXT: retq		; AVX1-NEXT: retq
;		;
; AVX2-LABEL: zext_8i8_to_8i32:		; AVX2-LABEL: zext_8i8_to_8i32:
; AVX2: # BB#0: # %entry		; AVX2: # BB#0: # %entry
; AVX2-NEXT: vpmovzxwd {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero		; AVX2-NEXT: vpmovzxwd {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero
; AVX2-NEXT: vpbroadcastd {{.*}}(%rip), %ymm1		; AVX2-NEXT: vpand {{.*}}(%rip), %ymm0, %ymm0
; AVX2-NEXT: vpand %ymm1, %ymm0, %ymm0
; AVX2-NEXT: retq		; AVX2-NEXT: retq
entry:		entry:
%t = zext <8 x i8> %z to <8 x i32>		%t = zext <8 x i8> %z to <8 x i32>
ret <8 x i32> %t		ret <8 x i32> %t
}		}
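Note on the constant change in zext_8i8_to_8i32 above: the mask printed as `[255,255,255,255]` on the left and as `[255,0,0,0,255,0,0,0,255,0,0,0,255,0,0,0]` on the right appears to be the same 128-bit value, shown first as four 32-bit lanes and then as sixteen bytes. The sketch below checks that reading. It is illustration only, not part of the patch; the helper name `lanes_to_bytes` is hypothetical and little-endian lane layout (as on x86) is assumed.

```python
def lanes_to_bytes(lanes, lane_bits):
    """Reinterpret a vector constant as a flat list of bytes.

    Assumes little-endian layout within each lane (hypothetical helper,
    written only to illustrate the test diff above).
    """
    if lane_bits % 8 != 0:
        raise ValueError("lane width must be a whole number of bytes")
    lane_mask = (1 << lane_bits) - 1
    out = []
    for value in lanes:
        # int.to_bytes yields the low byte first when byteorder is "little".
        out.extend((value & lane_mask).to_bytes(lane_bits // 8, "little"))
    return out


# The <4 x i32> mask from the left column, viewed as <16 x i8>.
byte_view = lanes_to_bytes([255, 255, 255, 255], 32)
print(byte_view)
```

Under those assumptions the byte view matches the right-hand constant, so the printed difference reflects the element type of the mask rather than a different value.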

define <8 x i32> @shuf_zext_8i16_to_8i32(<8 x i16> %A) nounwind uwtable readnone ssp {		define <8 x i32> @shuf_zext_8i16_to_8i32(<8 x i16> %A) nounwind uwtable readnone ssp {
; SSE2-LABEL: shuf_zext_8i16_to_8i32:		; SSE2-LABEL: shuf_zext_8i16_to_8i32:
entry:
%B = shufflevector <4 x i32> %A, <4 x i32> zeroinitializer, <8 x i32> <i32 0, i32 4, i32 1, i32 4, i32 2, i32 4, i32 3, i32 4>		%B = shufflevector <4 x i32> %A, <4 x i32> zeroinitializer, <8 x i32> <i32 0, i32 4, i32 1, i32 4, i32 2, i32 4, i32 3, i32 4>
%Z = bitcast <8 x i32> %B to <4 x i64>		%Z = bitcast <8 x i32> %B to <4 x i64>
ret <4 x i64> %Z		ret <4 x i64> %Z
}		}

define <8 x i32> @shuf_zext_8i8_to_8i32(<8 x i8> %A) {		define <8 x i32> @shuf_zext_8i8_to_8i32(<8 x i8> %A) {
; SSE2-LABEL: shuf_zext_8i8_to_8i32:		; SSE2-LABEL: shuf_zext_8i8_to_8i32:
; SSE2: # BB#0: # %entry		; SSE2: # BB#0: # %entry
; SSE2-NEXT: pand {{.*}}(%rip), %xmm0		; SSE2-NEXT: movdqa %xmm0, %xmm1
; SSE2-NEXT: packuswb %xmm0, %xmm0		; SSE2-NEXT: pand {{.*}}(%rip), %xmm1
; SSE2-NEXT: pxor %xmm1, %xmm1		; SSE2-NEXT: packuswb %xmm1, %xmm1
; SSE2-NEXT: movdqa %xmm0, %xmm2		; SSE2-NEXT: pxor %xmm2, %xmm2
; SSE2-NEXT: punpcklbw {{.*#+}} xmm2 = xmm2[0],xmm1[0],xmm2[1],xmm1[1],xmm2[2],xmm1[2],xmm2[3],xmm1[3],xmm2[4],xmm1[4],xmm2[5],xmm1[5],xmm2[6],xmm1[6],xmm2[7],xmm1[7]		; SSE2-NEXT: movdqa %xmm1, %xmm0
; SSE2-NEXT: punpcklwd {{.*#+}} xmm2 = xmm2[0],xmm1[0],xmm2[1],xmm1[1],xmm2[2],xmm1[2],xmm2[3],xmm1[3]		; SSE2-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1],xmm0[2],xmm2[2],xmm0[3],xmm2[3],xmm0[4],xmm2[4],xmm0[5],xmm2[5],xmm0[6],xmm2[6],xmm0[7],xmm2[7]
; SSE2-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]		; SSE2-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1],xmm0[2],xmm2[2],xmm0[3],xmm2[3]
; SSE2-NEXT: punpckhwd {{.*#+}} xmm0 = xmm0[4,4,5,5,6,6,7,7]		; SSE2-NEXT: punpcklbw {{.*#+}} xmm1 = xmm1[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
; SSE2-NEXT: movdqa {{.*#+}} xmm1 = [0,255,255,255,0,255,255,255,0,255,255,255,0,255,255,255]		; SSE2-NEXT: punpckhwd {{.*#+}} xmm1 = xmm1[4,4,5,5,6,6,7,7]
; SSE2-NEXT: pandn %xmm0, %xmm1		; SSE2-NEXT: pand {{.*}}(%rip), %xmm1
; SSE2-NEXT: movdqa %xmm2, %xmm0
; SSE2-NEXT: retq		; SSE2-NEXT: retq
;		;
; SSSE3-LABEL: shuf_zext_8i8_to_8i32:		; SSSE3-LABEL: shuf_zext_8i8_to_8i32:
; SSSE3: # BB#0: # %entry		; SSSE3: # BB#0: # %entry
; SSSE3-NEXT: movdqa %xmm0, %xmm1		; SSSE3-NEXT: movdqa %xmm0, %xmm1
; SSSE3-NEXT: pshufb {{.*#+}} xmm1 = xmm1[0,2,4,6,8,10,12,14,u,u,u,u,u,u,u,u]		; SSSE3-NEXT: pshufb {{.*#+}} xmm1 = xmm1[0,2,4,6,8,10,12,14,u,u,u,u,u,u,u,u]
; SSSE3-NEXT: pxor %xmm2, %xmm2		; SSSE3-NEXT: pxor %xmm2, %xmm2
; SSSE3-NEXT: movdqa %xmm1, %xmm0		; SSSE3-NEXT: movdqa %xmm1, %xmm0
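Note on the vector-zext.ll changes above: several AND constants such as `[65535,65535]` in load_zext_4i16_to_4i64 disappear on the right-hand side in favour of shuffles with zero. A minimal sketch of the byte-level check that makes this possible follows. It is not the DAGCombiner code: the function name is hypothetical, only the byte granularity is modelled (no coarser sub-lane sizes and no target legality query), and the index convention (byte `i` kept as `i`, cleared byte encoded as `i + num_bytes` to select from an all-zero second operand) is an assumption.

```python
def and_mask_to_byte_clear_shuffle(lanes, lane_bits):
    """Turn a constant AND mask into a byte-level shuffle 'clear' mask.

    Returns a list of shuffle indices, or None if some byte of the mask is
    neither 0x00 nor 0xFF (then the AND cannot be a shuffle with zero).
    Hypothetical helper; little-endian lanes are assumed.
    """
    if lane_bits % 8 != 0:
        raise ValueError("lane width must be a whole number of bytes")
    bytes_per_lane = lane_bits // 8
    num_bytes = len(lanes) * bytes_per_lane
    indices = []
    for lane_idx, value in enumerate(lanes):
        value &= (1 << lane_bits) - 1
        for b in range(bytes_per_lane):
            byte = (value >> (8 * b)) & 0xFF
            pos = lane_idx * bytes_per_lane + b
            if byte == 0xFF:
                indices.append(pos)              # keep the source byte
            elif byte == 0x00:
                indices.append(pos + num_bytes)  # take a byte from the zero vector
            else:
                return None                      # partial byte: not a clear mask
    return indices


# <2 x i64> mask [65535, 65535]: keep bytes 0-1 and 8-9, clear the rest.
print(and_mask_to_byte_clear_shuffle([65535, 65535], 64))
# A mask with a partially set byte cannot be expressed this way.
print(and_mask_to_byte_clear_shuffle([0x0F], 8))
```

In this model a per-lane check would reject `[65535,65535]` because neither 64-bit lane is all-ones or all-zero, while the byte-level view accepts it.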

test/CodeGen/X86/vselect-avx.ll

	; to be optimized into a and. In that case, the conditional mask was wrong.			; to be optimized into a and. In that case, the conditional mask was wrong.
	;			;
	; Make sure that the and is fed by the original mask.			; Make sure that the and is fed by the original mask.
	;			;
	; <rdar://problem/18819506>			; <rdar://problem/18819506>

	; CHECK-LABEL: test3:			; CHECK-LABEL: test3:
	; Compute the mask.			; Compute the mask.
	; CHECK: vpcmpeqd {{%xmm[0-9]+}}, {{%xmm[0-9]+}}, [[MASK:%xmm[0-9]+]]			; CHECK: vpcmpeqd {{%xmm[0-9]+}}, {{%xmm[0-9]+}}, [[MASK:%xmm[0-9]+]]
	; Do not shrink the bit of the mask.			; Do not shrink the bit of the mask.
	; CHECK-NOT: vpslld $31, [[MASK]], {{%xmm[0-9]+}}			; CHECK-NOT: vpslld $31, [[MASK]], {{%xmm[0-9]+}}
	; Use the mask in the blend.			; Use the mask in the blend.
	; CHECK-NEXT: vblendvps [[MASK]], %xmm{{[0-9]+}}, %xmm{{[0-9]+}}, %xmm{{[0-9]+}}			; CHECK-NEXT: vblendvps [[MASK]], %xmm{{[0-9]+}}, %xmm{{[0-9]+}}, %xmm{{[0-9]+}}
	; Use the mask in the and.			; Shuffle mask to truncate.
	; CHECK-NEXT: vpand LCPI2_2(%rip), [[MASK]], {{%xmm[0-9]+}}			; CHECK-NEXT: vmovdqa {{.*#+}} xmm2 = [0,1,4,5,8,9,12,13,8,9,12,13,12,13,14,15]
				; CHECK: vpshufb %xmm{{[0-9]+}}, %xmm{{[0-9]+}}, %xmm{{[0-9]+}}
				; CHECK: vpshufb %xmm{{[0-9]+}}, %xmm{{[0-9]+}}, %xmm{{[0-9]+}}
	; CHECK: retq			; CHECK: retq
	define void @test3(<4 x i32> %induction30, <4 x i16>* %tmp16, <4 x i16>* %tmp17, <4 x i16> %tmp3, <4 x i16> %tmp12) {			define void @test3(<4 x i32> %induction30, <4 x i16>* %tmp16, <4 x i16>* %tmp17, <4 x i16> %tmp3, <4 x i16> %tmp12) {
	%tmp6 = srem <4 x i32> %induction30, <i32 3, i32 3, i32 3, i32 3>			%tmp6 = srem <4 x i32> %induction30, <i32 3, i32 3, i32 3, i32 3>
	%tmp7 = icmp eq <4 x i32> %tmp6, zeroinitializer			%tmp7 = icmp eq <4 x i32> %tmp6, zeroinitializer
	%predphi = select <4 x i1> %tmp7, <4 x i16> %tmp3, <4 x i16> %tmp12			%predphi = select <4 x i1> %tmp7, <4 x i16> %tmp3, <4 x i16> %tmp12
	%predphi31 = select <4 x i1> %tmp7, <4 x i16> <i16 -1, i16 -1, i16 -1, i16 -1>, <4 x i16> zeroinitializer			%predphi31 = select <4 x i1> %tmp7, <4 x i16> <i16 -1, i16 -1, i16 -1, i16 -1>, <4 x i16> zeroinitializer

	store <4 x i16> %predphi31, <4 x i16>* %tmp16, align 8			store <4 x i16> %predphi31, <4 x i16>* %tmp16, align 8

test/CodeGen/X86/widen_load-2.ll

	; CHECK-NEXT: movw %[[R0]]x, (%[[PTR0:.*]])			; CHECK-NEXT: movw %[[R0]]x, (%[[PTR0:.*]])
	; CHECK-NEXT: movb $-98, 2(%[[PTR0]])			; CHECK-NEXT: movb $-98, 2(%[[PTR0]])
	; CHECK-NEXT: movdqa {{.*}}, %[[CONSTANT1:xmm[0-9]+]]			; CHECK-NEXT: movdqa {{.*}}, %[[CONSTANT1:xmm[0-9]+]]
	; CHECK-NEXT: pshufb %[[SHUFFLE_MASK]], %[[CONSTANT1]]			; CHECK-NEXT: pshufb %[[SHUFFLE_MASK]], %[[CONSTANT1]]
	; CHECK-NEXT: pmovzxwq %[[CONSTANT1]], %[[CONSTANT1]]			; CHECK-NEXT: pmovzxwq %[[CONSTANT1]], %[[CONSTANT1]]
	; CHECK-NEXT: movd %[[CONSTANT1]], %e[[R1:[abcd]]]x			; CHECK-NEXT: movd %[[CONSTANT1]], %e[[R1:[abcd]]]x
	; CHECK-NEXT: movw %[[R1]]x, (%[[PTR1:.*]])			; CHECK-NEXT: movw %[[R1]]x, (%[[PTR1:.*]])
	; CHECK-NEXT: movb $1, 2(%[[PTR1]])			; CHECK-NEXT: movb $1, 2(%[[PTR1]])
	; CHECK-NEXT: movl (%[[PTR0]]), [[TMP1:%e[abcd]+x]]			; CHECK-NEXT: pmovzxbd (%[[PTR0]]), %[[X0:xmm[0-9]+]]
	; CHECK-NEXT: movl [[TMP1]], [[TMP2:.*]]
	; CHECK-NEXT: pmovzxbd [[TMP2]], %[[X0:xmm[0-9]+]]
	; CHECK-NEXT: movdqa %[[X0]], %[[X1:xmm[0-9]+]]			; CHECK-NEXT: movdqa %[[X0]], %[[X1:xmm[0-9]+]]
	; CHECK-NEXT: psrld $1, %[[X1]]			; CHECK-NEXT: psrld $1, %[[X1]]
	; CHECK-NEXT: pblendw $192, %[[X0]], %[[X1]]			; CHECK-NEXT: pblendw $192, %[[X0]], %[[X1]]
	; CHECK-NEXT: pextrb $8, %[[X1]], 2(%{{.*}})			; CHECK-NEXT: pextrb $8, %[[X1]], 2(%{{.*}})
	; CHECK-NEXT: pshufb %[[SHUFFLE_MASK]], %[[X1]]			; CHECK-NEXT: pshufb %[[SHUFFLE_MASK]], %[[X1]]
	; CHECK-NEXT: pmovzxwq %[[X1]], %[[X3:xmm[0-9]+]]			; CHECK-NEXT: pmovzxwq %[[X1]], %[[X3:xmm[0-9]+]]
	; CHECK-NEXT: movd %[[X3]], %e[[R0:[abcd]]]x			; CHECK-NEXT: movd %[[X3]], %e[[R0:[abcd]]]x
	; CHECK-NEXT: movw %[[R0]]x, (%{{.*}})			; CHECK-NEXT: movw %[[R0]]x, (%{{.*}})