This is an archive of the discontinued LLVM Phabricator instance.


Differential D56784

[X86][SSE] Use PSLLDQ/PSRLDQ to mask out zeroable ends of a shuffle
ClosedPublic

Authored by RKSimon on Jan 16 2019, 7:25 AM.


Details

Reviewers

craig.topper
lebedev.ri
spatel
andreadb

Commits

rG85184017e9f7: [X86][SSE] Use PSLLDQ/PSRLDQ to mask out zeroable ends of a shuffle
rL352883: [X86][SSE] Use PSLLDQ/PSRLDQ to mask out zeroable ends of a shuffle

Summary

As suggested on PR40318, this patch uses PSLLDQ/PSRLDQ to lower shuffles to zero out the ends of a vector, leaving a sequential inner section.

For pre-SSSE3 we do this for shuffles with zeros at either end (requiring up to 3 shifts), but once PSHUFB is available I've limited this to shuffles with a single zeroable end (2 shifts).
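As a rough illustration of the summary above, the following sketch models PSLLDQ/PSRLDQ on a plain 16-byte array (index 0 is the least significant byte) and shows how two whole-register byte shifts keep one sequential run and zero everything else. The names `Vec`, `pslldq`, `psrldq` and `keepLowRun` are made up for this example and are not part of the patch; it is a model of the instruction semantics, not the lowering code itself.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Model of an XMM register as 16 bytes; index 0 is the least significant byte.
using Vec = std::array<std::uint8_t, 16>;

// PSLLDQ model: shift the whole register left by n bytes, filling with zeros.
Vec pslldq(const Vec &v, unsigned n) {
  Vec r{};
  for (unsigned i = n; i < 16; ++i)
    r[i] = v[i - n];
  return r;
}

// PSRLDQ model: shift the whole register right by n bytes, filling with zeros.
Vec psrldq(const Vec &v, unsigned n) {
  Vec r{};
  for (unsigned i = 0; i + n < 16; ++i)
    r[i] = v[i + n];
  return r;
}

// Hypothetical helper: keep `len` bytes starting at source byte `first`,
// place them at the low end of the result, and zero every other byte.
Vec keepLowRun(const Vec &v, unsigned first, unsigned len) {
  assert(len > 0 && first + len <= 16 && "run must lie inside the vector");
  unsigned last = first + len - 1;
  Vec t = pslldq(v, 15 - last); // pushes the bytes above `last` out of the top
  return psrldq(t, 16 - len);   // pushes the bytes below `first` out of the bottom
}
```

With `first = 1` and `len = 6` this performs a left shift by 9 bytes followed by a right shift by 10 bytes, which are the shift amounts visible in the `shuffle_v16i8_01_02_03_04_05_06_zz_...` test checks in this revision.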

Diff Detail

Repository: rL LLVM

Event Timeline

RKSimon created this revision. Jan 16 2019, 7:25 AM

The double-shift cases look good, but I'm skeptical about the triple-shift. Wouldn't those always be better with an 'and' mask followed by shift? We reduce the dependent chain of vector ops and instruction count for the cost of a speculatable constant pool load.

In D56784#1360580, @spatel wrote:

The double-shift cases look good, but I'm skeptical about the triple-shift. Wouldn't those always be better with an 'and' mask followed by shift? We reduce the dependent chain of vector ops and instruction count for the cost of a speculatable constant pool load.

I did consider that but then we contradict the "3 op limit" for older machines (like pre-SSSE3) before using "variable" shuffle masks - which includes AND masks.

In D56784#1361280, @RKSimon wrote:

In D56784#1360580, @spatel wrote:

The double-shift cases look good, but I'm skeptical about the triple-shift. Wouldn't those always be better with an 'and' mask followed by shift? We reduce the dependent chain of vector ops and instruction count for the cost of a speculatable constant pool load.

I did consider that but then we contradict the "3 op limit" for older machines (like pre-SSSE3) before using "variable" shuffle masks - which includes AND masks.

Ah, so we would expect an even later transform (combineX86ShufflesRecursively?) to squash that. Worth adding a TODO comment about that? Or maybe nobody cares about pre-SSSE3 perf that much to bother.
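To make the trade-off in this exchange concrete, here is a small sketch comparing the two alternatives for a shuffle with zeros at both ends (the `zz_zz_zz_zz_01_02_03_04_05_06_zz_...` mask from the tests). It models the instructions on a byte array rather than using real SSE intrinsics, and the function names are hypothetical. It only demonstrates that both sequences compute the same result and says nothing about their relative cost.

```cpp
#include <array>
#include <cstdint>

// Model of an XMM register as 16 bytes; index 0 is the least significant byte.
using Vec = std::array<std::uint8_t, 16>;

// PSLLDQ model: whole-register left shift by n bytes.
Vec pslldq(const Vec &v, unsigned n) {
  Vec r{};
  for (unsigned i = n; i < 16; ++i)
    r[i] = v[i - n];
  return r;
}

// PSRLDQ model: whole-register right shift by n bytes.
Vec psrldq(const Vec &v, unsigned n) {
  Vec r{};
  for (unsigned i = 0; i + n < 16; ++i)
    r[i] = v[i + n];
  return r;
}

// Three dependent byte shifts, no constant needed.
Vec tripleShift(const Vec &v) {
  return pslldq(psrldq(pslldq(v, 9), 10), 4);
}

// One AND with a constant mask (a constant-pool load in real code) and one shift.
Vec andThenShift(const Vec &v) {
  Vec masked{};
  for (unsigned i = 1; i <= 6; ++i)
    masked[i] = v[i] & 0xFF; // mask keeps source bytes 1..6 only
  return pslldq(masked, 3);  // move them to result bytes 4..9
}
```

Both functions place source bytes 1 to 6 at result bytes 4 to 9 and zero the rest; the first uses three shift operations, the second two operations plus a mask constant.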

lib/Target/X86/X86ISelLowering.cpp
10939–10940	The pair part of the comment is over-specific for the top-level - move it below where we have the example sequences?
10960–10963	Could we do the simpler check/assert that V2 has been canonicalized to a zero constant?

lebedev.ri added inline comments. Jan 24 2019, 12:31 AM

test/CodeGen/X86/buildvec-extract.ll
408	I'm having trouble reading this pretty-print. Shouldn't it be something more like psrldq {{.*#+}} xmm0 = zero,zero,zero,zero,zero,zero,xmm0[14,15],zero,zero,zero,zero,zero,zero,zero,zero ? Otherwise to me it reads as-if it wasn't `zext i16 to 16`, but `zext i16 to i64 + shl (64-16)` (i.e. zeros are not in MSB, but LSB)

craig.topper added inline comments. Jan 24 2019, 2:30 PM

test/CodeGen/X86/buildvec-extract.ll
408	The pretty printer prints LSB first. So the pslldq is putting bytes 0, 1, 2, and 3 of the original vector in the MSBs. Then the psrldq takes bytes 14 and 15 from that, which are really bytes 2 and 3 of the input, and moves them to bytes 0 and 1 of the output.

RKSimon marked an inline comment as done. Jan 29 2019, 4:27 AM

RKSimon added inline comments.

test/CodeGen/X86/buildvec-extract.ll
408	The joys of x86 idiosyncrasies..... This is a shuffle asm comment and that is the shuffle that is created.
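The reading of the shuffle comment discussed in this thread can be checked with a small model. The sketch below treats the register as 16 bytes listed least significant byte first, applies the two shifts from the `extract1_i16_zext_insert0_i64_undef` checks (left by 12 bytes, right by 14 bytes), and reads the low 64 bits back as a little-endian integer. All names are invented for the example.

```cpp
#include <array>
#include <cstdint>

// Model of an XMM register as 16 bytes; index 0 is the least significant byte,
// which is also the first element printed in the asm shuffle comment.
using Vec = std::array<std::uint8_t, 16>;

// PSLLDQ model: whole-register left shift by n bytes.
Vec pslldq(const Vec &v, unsigned n) {
  Vec r{};
  for (unsigned i = n; i < 16; ++i)
    r[i] = v[i - n];
  return r;
}

// PSRLDQ model: whole-register right shift by n bytes.
Vec psrldq(const Vec &v, unsigned n) {
  Vec r{};
  for (unsigned i = 0; i + n < 16; ++i)
    r[i] = v[i + n];
  return r;
}

// Read bytes 0..7 as a little-endian 64-bit value.
std::uint64_t lowQword(const Vec &v) {
  std::uint64_t r = 0;
  for (int i = 7; i >= 0; --i)
    r = (r << 8) | v[static_cast<unsigned>(i)];
  return r;
}

// The two-shift sequence from the test: bytes 2 and 3 of the input (i16
// element 1) end up in bytes 0 and 1, and every other byte is zero.
Vec extract1ZextInsert0(const Vec &v) {
  return psrldq(pslldq(v, 12), 14);
}
```

If i16 element 1 of the input holds 0xBEEF, the low quadword of the result is 0x000000000000BEEF, so the zeros sit in the most significant bytes as a zero extension requires.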

RKSimon marked an inline comment as done. Jan 29 2019, 5:36 AM

RKSimon added inline comments.

lib/Target/X86/X86ISelLowering.cpp
10960–10963	Not easily, our shuffle mask canonicalization doesn't try to account for zero ops as it regresses shuffles in cases where we can't make use of it being zero.

rebased and updated comments

ping?

Herald added a project: Restricted Project. Feb 1 2019, 1:26 AM

LGTM - but I think we should have a bug report and/or a TODO comment somewhere about the triple-shift if it's not there already.

This revision is now accepted and ready to land. Feb 1 2019, 6:09 AM

Closed by commit rL352883: [X86][SSE] Use PSLLDQ/PSRLDQ to mask out zeroable ends of a shuffle (authored by RKSimon). Feb 1 2019, 8:02 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
lib/Target/X86/X86ISelLowering.cpp (revision 352491)	72 lines
test/CodeGen/X86/buildvec-extract.ll (revision 352491)	59 lines
test/CodeGen/X86/vector-shuffle-128-v16.ll (revision 352491)	117 lines
test/CodeGen/X86/vector-shuffle-128-v8.ll (revision 352491)	135 lines
test/CodeGen/X86/vector-shuffle-sse4a.ll (revision 352491)	5 lines

Diff 184065

lib/Target/X86/X86ISelLowering.cpp

[10,928 lines not shown]	static SDValue lowerShuffleAsRotate(const SDLoc &DL, MVT VT, SDValue V1,
int Rotation = matchShuffleAsRotate(Lo, Hi, Mask);		int Rotation = matchShuffleAsRotate(Lo, Hi, Mask);
if (Rotation <= 0)		if (Rotation <= 0)
return SDValue();		return SDValue();

return DAG.getNode(X86ISD::VALIGN, DL, VT, Lo, Hi,		return DAG.getNode(X86ISD::VALIGN, DL, VT, Lo, Hi,
DAG.getConstant(Rotation, DL, MVT::i8));		DAG.getConstant(Rotation, DL, MVT::i8));
}		}

		/// Try to lower a vector shuffle as a byte shift sequence.
		static SDValue lowerVectorShuffleAsByteShiftMask(
		    const SDLoc &DL, MVT VT, SDValue V1, SDValue V2, ArrayRef<int> Mask,
		    const APInt &Zeroable, const X86Subtarget &Subtarget, SelectionDAG &DAG) {
		  assert(!isNoopShuffleMask(Mask) && "We shouldn't lower no-op shuffles!");
		  assert(VT.is128BitVector() && "Only 128-bit vectors supported");

		  // We need a shuffle that has zeros at one/both ends and a sequential
		  // shuffle from one source within.
		  unsigned ZeroLo = Zeroable.countTrailingOnes();
		  unsigned ZeroHi = Zeroable.countLeadingOnes();
		  if (!ZeroLo && !ZeroHi)
		    return SDValue();

		  unsigned NumElts = Mask.size();
		  unsigned Len = NumElts - (ZeroLo + ZeroHi);
		  if (!isSequentialOrUndefInRange(Mask, ZeroLo, Len, Mask[ZeroLo]))
		    return SDValue();

		  unsigned Scale = VT.getScalarSizeInBits() / 8;
		  ArrayRef<int> StubMask = Mask.slice(ZeroLo, Len);
		  if (!isUndefOrInRange(StubMask, 0, NumElts) &&
		      !isUndefOrInRange(StubMask, NumElts, 2 * NumElts))
		    return SDValue();

		  SDValue Res = Mask[ZeroLo] < (int)NumElts ? V1 : V2;
		  Res = DAG.getBitcast(MVT::v16i8, Res);

		  // Use VSHLDQ/VSRLDQ ops to zero the ends of a vector and leave an
		  // inner sequential set of elements, possibly offset:
		  // 01234567 --> zzzzzz01 --> 1zzzzzzz
		  // 01234567 --> 4567zzzz --> zzzzz456
		  // 01234567 --> z0123456 --> 3456zzzz --> zz3456zz
		  if (ZeroLo == 0) {
		    unsigned Shift = (NumElts - 1) - (Mask[ZeroLo + Len - 1] % NumElts);
		    Res = DAG.getNode(X86ISD::VSHLDQ, DL, MVT::v16i8, Res,
		                      DAG.getConstant(Scale * Shift, DL, MVT::i8));
		    Res = DAG.getNode(X86ISD::VSRLDQ, DL, MVT::v16i8, Res,
		                      DAG.getConstant(Scale * ZeroHi, DL, MVT::i8));
		  } else if (ZeroHi == 0) {
		    unsigned Shift = Mask[ZeroLo] % NumElts;
		    Res = DAG.getNode(X86ISD::VSRLDQ, DL, MVT::v16i8, Res,
		                      DAG.getConstant(Scale * Shift, DL, MVT::i8));
		    Res = DAG.getNode(X86ISD::VSHLDQ, DL, MVT::v16i8, Res,
		                      DAG.getConstant(Scale * ZeroLo, DL, MVT::i8));
		  } else if (!Subtarget.hasSSSE3()) {
		    // If we don't have PSHUFB then it's worth avoiding an AND constant mask
		    // by performing 3 byte shifts. Shuffle combining can kick in above that.
		    unsigned Shift = (NumElts - 1) - (Mask[ZeroLo + Len - 1] % NumElts);
		    Res = DAG.getNode(X86ISD::VSHLDQ, DL, MVT::v16i8, Res,
		                      DAG.getConstant(Scale * Shift, DL, MVT::i8));
		    Shift += Mask[ZeroLo] % NumElts;
		    Res = DAG.getNode(X86ISD::VSRLDQ, DL, MVT::v16i8, Res,
		                      DAG.getConstant(Scale * Shift, DL, MVT::i8));
		    Res = DAG.getNode(X86ISD::VSHLDQ, DL, MVT::v16i8, Res,
		                      DAG.getConstant(Scale * ZeroLo, DL, MVT::i8));
		  } else
		    return SDValue();

		  return DAG.getBitcast(VT, Res);
		}
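
The shift amounts chosen by the three branches of lowerVectorShuffleAsByteShiftMask can be restated outside the SelectionDAG machinery. The following is a minimal standalone sketch, assuming a fully sequential inner run with no undef mask elements; `ByteShift` and `planByteShifts` are hypothetical names, and `firstSrc` stands for `Mask[ZeroLo] % NumElts`. It is an illustration of the arithmetic only, not code from the patch.

```cpp
#include <cassert>
#include <vector>

// One whole-register byte shift: left means PSLLDQ, otherwise PSRLDQ.
struct ByteShift {
  bool left;
  unsigned bytes;
};

// Hypothetical restatement of the shift-amount arithmetic. Returns an empty
// plan when the pattern is not handled (no zeroable end, or zeros at both
// ends while PSHUFB is available).
std::vector<ByteShift> planByteShifts(unsigned numElts, unsigned scale,
                                      unsigned zeroLo, unsigned zeroHi,
                                      unsigned firstSrc, bool hasPSHUFB) {
  if (zeroLo == 0 && zeroHi == 0)
    return {};
  assert(zeroLo + zeroHi < numElts && "need at least one kept element");
  unsigned len = numElts - (zeroLo + zeroHi);
  unsigned lastSrc = firstSrc + len - 1;
  assert(lastSrc < numElts && "run must come from a single source vector");
  unsigned toTop = (numElts - 1) - lastSrc; // moves the last kept element to the top

  if (zeroLo == 0) // zeros only at the high end: 2 shifts
    return {{true, scale * toTop}, {false, scale * zeroHi}};
  if (zeroHi == 0) // zeros only at the low end: 2 shifts
    return {{false, scale * firstSrc}, {true, scale * zeroLo}};
  if (!hasPSHUFB)  // zeros at both ends, pre-SSSE3 only: 3 shifts
    return {{true, scale * toTop},
            {false, scale * (toTop + firstSrc)},
            {true, scale * zeroLo}};
  return {};
}
```

For the v16i8 mask `zz_zz_zz_zz_01_02_03_04_05_06_zz_...` without SSSE3 (`zeroLo = 4`, `zeroHi = 6`, `firstSrc = 1`) this yields left 9, right 10, left 4, which are the amounts in the SSE2 check lines of vector-shuffle-128-v16.ll in this revision.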

/// Try to lower a vector shuffle as a bit shift (shifts in zeros).		/// Try to lower a vector shuffle as a bit shift (shifts in zeros).
///		///
/// Attempts to match a shuffle mask against the PSLL(W/D/Q/DQ) and		/// Attempts to match a shuffle mask against the PSLL(W/D/Q/DQ) and
/// PSRL(W/D/Q/DQ) SSE2 and AVX2 logical bit-shift instructions. The function		/// PSRL(W/D/Q/DQ) SSE2 and AVX2 logical bit-shift instructions. The function
/// matches elements from one of the input vectors shuffled to the left or		/// matches elements from one of the input vectors shuffled to the left or
/// right with zeroable elements 'shifted in'. It handles both the strictly		/// right with zeroable elements 'shifted in'. It handles both the strictly
/// bit-wise element shifts and the byte shift across an entire 128-bit double		/// bit-wise element shifts and the byte shift across an entire 128-bit double
/// quad word lane.		/// quad word lane.
[2,388 lines not shown]	static SDValue lowerV8I16Shuffle(const SDLoc &DL, ArrayRef<int> Mask,
if (SDValue Rotate = lowerShuffleAsByteRotate(DL, MVT::v8i16, V1, V2, Mask,		if (SDValue Rotate = lowerShuffleAsByteRotate(DL, MVT::v8i16, V1, V2, Mask,
Subtarget, DAG))		Subtarget, DAG))
return Rotate;		return Rotate;

if (SDValue BitBlend =		if (SDValue BitBlend =
lowerShuffleAsBitBlend(DL, MVT::v8i16, V1, V2, Mask, DAG))		lowerShuffleAsBitBlend(DL, MVT::v8i16, V1, V2, Mask, DAG))
return BitBlend;		return BitBlend;

		// Try to use byte shift instructions to mask.
		if (SDValue V = lowerVectorShuffleAsByteShiftMask(
		DL, MVT::v8i16, V1, V2, Mask, Zeroable, Subtarget, DAG))
		return V;

// Try to lower by permuting the inputs into an unpack instruction.		// Try to lower by permuting the inputs into an unpack instruction.
if (SDValue Unpack = lowerShuffleAsPermuteAndUnpack(DL, MVT::v8i16, V1, V2,		if (SDValue Unpack = lowerShuffleAsPermuteAndUnpack(DL, MVT::v8i16, V1, V2,
Mask, Subtarget, DAG))		Mask, Subtarget, DAG))
return Unpack;		return Unpack;

// If we can't directly blend but can use PSHUFB, that will be better as it		// If we can't directly blend but can use PSHUFB, that will be better as it
// can both shuffle and set up the inefficient blend.		// can both shuffle and set up the inefficient blend.
if (!IsBlendSupported && Subtarget.hasSSSE3()) {		if (!IsBlendSupported && Subtarget.hasSSSE3()) {
[233 lines not shown]	static SDValue lowerV16I8Shuffle(const SDLoc &DL, ArrayRef<int> Mask,
if (SDValue Masked = lowerShuffleAsBitMask(DL, MVT::v16i8, V1, V2, Mask,		if (SDValue Masked = lowerShuffleAsBitMask(DL, MVT::v16i8, V1, V2, Mask,
Zeroable, DAG))		Zeroable, DAG))
return Masked;		return Masked;

// Use dedicated unpack instructions for masks that match their pattern.		// Use dedicated unpack instructions for masks that match their pattern.
if (SDValue V = lowerShuffleWithUNPCK(DL, MVT::v16i8, Mask, V1, V2, DAG))		if (SDValue V = lowerShuffleWithUNPCK(DL, MVT::v16i8, Mask, V1, V2, DAG))
return V;		return V;

		// Try to use byte shift instructions to mask.
		if (SDValue V = lowerVectorShuffleAsByteShiftMask(
		DL, MVT::v16i8, V1, V2, Mask, Zeroable, Subtarget, DAG))
		return V;

// Check for SSSE3 which lets us lower all v16i8 shuffles much more directly		// Check for SSSE3 which lets us lower all v16i8 shuffles much more directly
// with PSHUFB. It is important to do this before we attempt to generate any		// with PSHUFB. It is important to do this before we attempt to generate any
// blends but after all of the single-input lowerings. If the single input		// blends but after all of the single-input lowerings. If the single input
// lowerings can find an instruction sequence that is faster than a PSHUFB, we		// lowerings can find an instruction sequence that is faster than a PSHUFB, we
// want to preserve that and we can DAG combine any longer sequences into		// want to preserve that and we can DAG combine any longer sequences into
// a PSHUFB in the end. But once we start blending from multiple inputs,		// a PSHUFB in the end. But once we start blending from multiple inputs,
// the complexity of DAG combining bad patterns back into PSHUFB is too high,		// the complexity of DAG combining bad patterns back into PSHUFB is too high,
// and there are very few patterns that would actually be faster than the		// and there are very few patterns that would actually be faster than the
[29,437 lines not shown]

test/CodeGen/X86/buildvec-extract.ll

[396 lines not shown]
; AVX-NEXT: retq		; AVX-NEXT: retq
%e = extractelement <8 x i16> %x, i32 0		%e = extractelement <8 x i16> %x, i32 0
%z = zext i16 %e to i64		%z = zext i16 %e to i64
%r = insertelement <2 x i64> zeroinitializer, i64 %z, i32 0		%r = insertelement <2 x i64> zeroinitializer, i64 %z, i32 0
ret <2 x i64> %r		ret <2 x i64> %r
}		}

define <2 x i64> @extract1_i16_zext_insert0_i64_undef(<8 x i16> %x) {		define <2 x i64> @extract1_i16_zext_insert0_i64_undef(<8 x i16> %x) {
; SSE2-LABEL: extract1_i16_zext_insert0_i64_undef:		; SSE-LABEL: extract1_i16_zext_insert0_i64_undef:
; SSE2: # %bb.0:		; SSE: # %bb.0:
; SSE2-NEXT: psrld $16, %xmm0		; SSE-NEXT: pslldq {{.*#+}} xmm0 = zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,xmm0[0,1,2,3]
; SSE2-NEXT: pxor %xmm1, %xmm1		; SSE-NEXT: psrldq {{.*#+}} xmm0 = xmm0[14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
; SSE2-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3]		; SSE-NEXT: retq
; SSE2-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
; SSE2-NEXT: retq
;
; SSE41-LABEL: extract1_i16_zext_insert0_i64_undef:
; SSE41: # %bb.0:
; SSE41-NEXT: psrld $16, %xmm0
; SSE41-NEXT: pmovzxwq {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero
; SSE41-NEXT: retq
;		;
; AVX-LABEL: extract1_i16_zext_insert0_i64_undef:		; AVX-LABEL: extract1_i16_zext_insert0_i64_undef:
; AVX: # %bb.0:		; AVX: # %bb.0:
; AVX-NEXT: vpsrld $16, %xmm0, %xmm0		; AVX-NEXT: vpslldq {{.*#+}} xmm0 = zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,xmm0[0,1,2,3]
; AVX-NEXT: vpmovzxwq {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero		; AVX-NEXT: vpsrldq {{.*#+}} xmm0 = xmm0[14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
; AVX-NEXT: retq		; AVX-NEXT: retq
%e = extractelement <8 x i16> %x, i32 1		%e = extractelement <8 x i16> %x, i32 1
%z = zext i16 %e to i64		%z = zext i16 %e to i64
%r = insertelement <2 x i64> undef, i64 %z, i32 0		%r = insertelement <2 x i64> undef, i64 %z, i32 0
ret <2 x i64> %r		ret <2 x i64> %r
}		}

define <2 x i64> @extract1_i16_zext_insert0_i64_zero(<8 x i16> %x) {		define <2 x i64> @extract1_i16_zext_insert0_i64_zero(<8 x i16> %x) {
[10 lines not shown]
; AVX-NEXT: retq		; AVX-NEXT: retq
%e = extractelement <8 x i16> %x, i32 1		%e = extractelement <8 x i16> %x, i32 1
%z = zext i16 %e to i64		%z = zext i16 %e to i64
%r = insertelement <2 x i64> zeroinitializer, i64 %z, i32 0		%r = insertelement <2 x i64> zeroinitializer, i64 %z, i32 0
ret <2 x i64> %r		ret <2 x i64> %r
}		}

define <2 x i64> @extract2_i16_zext_insert0_i64_undef(<8 x i16> %x) {		define <2 x i64> @extract2_i16_zext_insert0_i64_undef(<8 x i16> %x) {
; SSE2-LABEL: extract2_i16_zext_insert0_i64_undef:		; SSE-LABEL: extract2_i16_zext_insert0_i64_undef:
; SSE2: # %bb.0:		; SSE: # %bb.0:
; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,2,3]		; SSE-NEXT: pslldq {{.*#+}} xmm0 = zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,xmm0[0,1,2,3,4,5]
; SSE2-NEXT: pxor %xmm1, %xmm1		; SSE-NEXT: psrldq {{.*#+}} xmm0 = xmm0[14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
; SSE2-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3]		; SSE-NEXT: retq
; SSE2-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
; SSE2-NEXT: retq
;
; SSE41-LABEL: extract2_i16_zext_insert0_i64_undef:
; SSE41: # %bb.0:
; SSE41-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,2,3]
; SSE41-NEXT: pmovzxwq {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero
; SSE41-NEXT: retq
;		;
; AVX-LABEL: extract2_i16_zext_insert0_i64_undef:		; AVX-LABEL: extract2_i16_zext_insert0_i64_undef:
; AVX: # %bb.0:		; AVX: # %bb.0:
; AVX-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[1,1,2,3]		; AVX-NEXT: vpslldq {{.*#+}} xmm0 = zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,xmm0[0,1,2,3,4,5]
; AVX-NEXT: vpmovzxwq {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero		; AVX-NEXT: vpsrldq {{.*#+}} xmm0 = xmm0[14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
; AVX-NEXT: retq		; AVX-NEXT: retq
%e = extractelement <8 x i16> %x, i32 2		%e = extractelement <8 x i16> %x, i32 2
%z = zext i16 %e to i64		%z = zext i16 %e to i64
%r = insertelement <2 x i64> undef, i64 %z, i32 0		%r = insertelement <2 x i64> undef, i64 %z, i32 0
ret <2 x i64> %r		ret <2 x i64> %r
}		}

define <2 x i64> @extract2_i16_zext_insert0_i64_zero(<8 x i16> %x) {		define <2 x i64> @extract2_i16_zext_insert0_i64_zero(<8 x i16> %x) {
[46 lines not shown]	; AVX-NEXT: retq
%z = zext i16 %e to i64		%z = zext i16 %e to i64
%r = insertelement <2 x i64> zeroinitializer, i64 %z, i32 0		%r = insertelement <2 x i64> zeroinitializer, i64 %z, i32 0
ret <2 x i64> %r		ret <2 x i64> %r
}		}

define <2 x i64> @extract0_i16_zext_insert1_i64_undef(<8 x i16> %x) {		define <2 x i64> @extract0_i16_zext_insert1_i64_undef(<8 x i16> %x) {
; SSE2-LABEL: extract0_i16_zext_insert1_i64_undef:		; SSE2-LABEL: extract0_i16_zext_insert1_i64_undef:
; SSE2: # %bb.0:		; SSE2: # %bb.0:
; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,1,0,1]		; SSE2-NEXT: pslldq {{.*#+}} xmm0 = zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,xmm0[0,1]
; SSE2-NEXT: pand {{.*}}(%rip), %xmm0		; SSE2-NEXT: psrldq {{.*#+}} xmm0 = xmm0[14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
		; SSE2-NEXT: pslldq {{.*#+}} xmm0 = zero,zero,zero,zero,zero,zero,zero,zero,xmm0[0,1,2,3,4,5,6,7]
; SSE2-NEXT: retq		; SSE2-NEXT: retq
;		;
; SSE41-LABEL: extract0_i16_zext_insert1_i64_undef:		; SSE41-LABEL: extract0_i16_zext_insert1_i64_undef:
; SSE41: # %bb.0:		; SSE41: # %bb.0:
; SSE41-NEXT: pshufd {{.*#+}} xmm1 = xmm0[0,1,0,1]		; SSE41-NEXT: pshufd {{.*#+}} xmm1 = xmm0[0,1,0,1]
; SSE41-NEXT: pxor %xmm0, %xmm0		; SSE41-NEXT: pxor %xmm0, %xmm0
; SSE41-NEXT: pblendw {{.*#+}} xmm0 = xmm0[0,1,2,3],xmm1[4],xmm0[5,6,7]		; SSE41-NEXT: pblendw {{.*#+}} xmm0 = xmm0[0,1,2,3],xmm1[4],xmm0[5,6,7]
; SSE41-NEXT: retq		; SSE41-NEXT: retq
[71 lines not shown]	; AVX-NEXT: retq
%z = zext i16 %e to i64		%z = zext i16 %e to i64
%r = insertelement <2 x i64> zeroinitializer, i64 %z, i32 1		%r = insertelement <2 x i64> zeroinitializer, i64 %z, i32 1
ret <2 x i64> %r		ret <2 x i64> %r
}		}

define <2 x i64> @extract2_i16_zext_insert1_i64_undef(<8 x i16> %x) {		define <2 x i64> @extract2_i16_zext_insert1_i64_undef(<8 x i16> %x) {
; SSE2-LABEL: extract2_i16_zext_insert1_i64_undef:		; SSE2-LABEL: extract2_i16_zext_insert1_i64_undef:
; SSE2: # %bb.0:		; SSE2: # %bb.0:
; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,1,1,3]		; SSE2-NEXT: pslldq {{.*#+}} xmm0 = zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,xmm0[0,1,2,3,4,5]
; SSE2-NEXT: pand {{.*}}(%rip), %xmm0		; SSE2-NEXT: psrldq {{.*#+}} xmm0 = xmm0[14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
		; SSE2-NEXT: pslldq {{.*#+}} xmm0 = zero,zero,zero,zero,zero,zero,zero,zero,xmm0[0,1,2,3,4,5,6,7]
; SSE2-NEXT: retq		; SSE2-NEXT: retq
;		;
; SSE41-LABEL: extract2_i16_zext_insert1_i64_undef:		; SSE41-LABEL: extract2_i16_zext_insert1_i64_undef:
; SSE41: # %bb.0:		; SSE41: # %bb.0:
; SSE41-NEXT: pmovzxdq {{.*#+}} xmm1 = xmm0[0],zero,xmm0[1],zero		; SSE41-NEXT: pmovzxdq {{.*#+}} xmm1 = xmm0[0],zero,xmm0[1],zero
; SSE41-NEXT: pxor %xmm0, %xmm0		; SSE41-NEXT: pxor %xmm0, %xmm0
; SSE41-NEXT: pblendw {{.*#+}} xmm0 = xmm0[0,1,2,3],xmm1[4],xmm0[5,6,7]		; SSE41-NEXT: pblendw {{.*#+}} xmm0 = xmm0[0,1,2,3],xmm1[4],xmm0[5,6,7]
; SSE41-NEXT: retq		; SSE41-NEXT: retq
[28 lines not shown]	; AVX-NEXT: retq
%z = zext i16 %e to i64		%z = zext i16 %e to i64
%r = insertelement <2 x i64> zeroinitializer, i64 %z, i32 1		%r = insertelement <2 x i64> zeroinitializer, i64 %z, i32 1
ret <2 x i64> %r		ret <2 x i64> %r
}		}

define <2 x i64> @extract3_i16_zext_insert1_i64_undef(<8 x i16> %x) {		define <2 x i64> @extract3_i16_zext_insert1_i64_undef(<8 x i16> %x) {
; SSE2-LABEL: extract3_i16_zext_insert1_i64_undef:		; SSE2-LABEL: extract3_i16_zext_insert1_i64_undef:
; SSE2: # %bb.0:		; SSE2: # %bb.0:
; SSE2-NEXT: pslldq {{.*#+}} xmm0 = zero,zero,xmm0[0,1,2,3,4,5,6,7,8,9,10,11,12,13]		; SSE2-NEXT: pslldq {{.*#+}} xmm0 = zero,zero,zero,zero,zero,zero,zero,zero,xmm0[0,1,2,3,4,5,6,7]
; SSE2-NEXT: pand {{.*}}(%rip), %xmm0		; SSE2-NEXT: psrldq {{.*#+}} xmm0 = xmm0[14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
		; SSE2-NEXT: pslldq {{.*#+}} xmm0 = zero,zero,zero,zero,zero,zero,zero,zero,xmm0[0,1,2,3,4,5,6,7]
; SSE2-NEXT: retq		; SSE2-NEXT: retq
;		;
; SSE41-LABEL: extract3_i16_zext_insert1_i64_undef:		; SSE41-LABEL: extract3_i16_zext_insert1_i64_undef:
; SSE41: # %bb.0:		; SSE41: # %bb.0:
; SSE41-NEXT: pslldq {{.*#+}} xmm0 = zero,zero,xmm0[0,1,2,3,4,5,6,7,8,9,10,11,12,13]		; SSE41-NEXT: pslldq {{.*#+}} xmm0 = zero,zero,xmm0[0,1,2,3,4,5,6,7,8,9,10,11,12,13]
; SSE41-NEXT: pxor %xmm1, %xmm1		; SSE41-NEXT: pxor %xmm1, %xmm1
; SSE41-NEXT: pblendw {{.*#+}} xmm0 = xmm1[0,1,2,3],xmm0[4],xmm1[5,6,7]		; SSE41-NEXT: pblendw {{.*#+}} xmm0 = xmm1[0,1,2,3],xmm0[4],xmm1[5,6,7]
; SSE41-NEXT: retq		; SSE41-NEXT: retq
[33 lines not shown]

test/CodeGen/X86/vector-shuffle-128-v16.ll

[1,529 lines not shown]
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%shuffle = shufflevector <16 x i8> %a, <16 x i8> zeroinitializer, <16 x i32><i32 7, i32 16, i32 16, i32 16, i32 16, i32 16, i32 undef, i32 undef, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 16, i32 16>			%shuffle = shufflevector <16 x i8> %a, <16 x i8> zeroinitializer, <16 x i32><i32 7, i32 16, i32 16, i32 16, i32 16, i32 16, i32 undef, i32 undef, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 16, i32 16>
	ret <16 x i8> %shuffle			ret <16 x i8> %shuffle
	}			}

	define <16 x i8> @shuffle_v16i8_zz_zz_zz_zz_01_02_03_04_05_06_zz_zz_zz_zz_zz_zz(<16 x i8> %a) {			define <16 x i8> @shuffle_v16i8_zz_zz_zz_zz_01_02_03_04_05_06_zz_zz_zz_zz_zz_zz(<16 x i8> %a) {
	; SSE2-LABEL: shuffle_v16i8_zz_zz_zz_zz_01_02_03_04_05_06_zz_zz_zz_zz_zz_zz:			; SSE2-LABEL: shuffle_v16i8_zz_zz_zz_zz_01_02_03_04_05_06_zz_zz_zz_zz_zz_zz:
	; SSE2: # %bb.0:			; SSE2: # %bb.0:
	; SSE2-NEXT: pand {{.*}}(%rip), %xmm0			; SSE2-NEXT: pslldq {{.*#+}} xmm0 = zero,zero,zero,zero,zero,zero,zero,zero,zero,xmm0[0,1,2,3,4,5,6]
	; SSE2-NEXT: pxor %xmm1, %xmm1			; SSE2-NEXT: psrldq {{.*#+}} xmm0 = xmm0[10,11,12,13,14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
	; SSE2-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3],xmm0[4],xmm1[4],xmm0[5],xmm1[5],xmm0[6],xmm1[6],xmm0[7],xmm1[7]			; SSE2-NEXT: pslldq {{.*#+}} xmm0 = zero,zero,zero,zero,xmm0[0,1,2,3,4,5,6,7,8,9,10,11]
	; SSE2-NEXT: pshufhw {{.*#+}} xmm1 = xmm0[0,1,2,3,6,5,6,7]
	; SSE2-NEXT: pshufd {{.*#+}} xmm1 = xmm1[0,2,0,3]
	; SSE2-NEXT: pshuflw {{.*#+}} xmm1 = xmm1[3,2,0,0,4,5,6,7]
	; SSE2-NEXT: pshufhw {{.*#+}} xmm1 = xmm1[0,1,2,3,4,4,4,4]
	; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[3,1,2,0]
	; SSE2-NEXT: pshufhw {{.*#+}} xmm0 = xmm0[0,1,2,3,4,7,6,7]
	; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,3,2,1]
	; SSE2-NEXT: pshuflw {{.*#+}} xmm0 = xmm0[2,2,2,2,4,5,6,7]
	; SSE2-NEXT: pshufhw {{.*#+}} xmm0 = xmm0[0,1,2,3,5,6,7,4]
	; SSE2-NEXT: packuswb %xmm1, %xmm0
	; SSE2-NEXT: retq			; SSE2-NEXT: retq
	;			;
	; SSSE3-LABEL: shuffle_v16i8_zz_zz_zz_zz_01_02_03_04_05_06_zz_zz_zz_zz_zz_zz:			; SSSE3-LABEL: shuffle_v16i8_zz_zz_zz_zz_01_02_03_04_05_06_zz_zz_zz_zz_zz_zz:
	; SSSE3: # %bb.0:			; SSSE3: # %bb.0:
	; SSSE3-NEXT: pshufb {{.*#+}} xmm0 = zero,zero,zero,zero,xmm0[1,2,3,4,5,6],zero,zero,zero,zero,zero,zero			; SSSE3-NEXT: pshufb {{.*#+}} xmm0 = zero,zero,zero,zero,xmm0[1,2,3,4,5,6],zero,zero,zero,zero,zero,zero
	; SSSE3-NEXT: retq			; SSSE3-NEXT: retq
	;			;
	; SSE41-LABEL: shuffle_v16i8_zz_zz_zz_zz_01_02_03_04_05_06_zz_zz_zz_zz_zz_zz:			; SSE41-LABEL: shuffle_v16i8_zz_zz_zz_zz_01_02_03_04_05_06_zz_zz_zz_zz_zz_zz:
	; SSE41: # %bb.0:			; SSE41: # %bb.0:
	; SSE41-NEXT: pshufb {{.*#+}} xmm0 = zero,zero,zero,zero,xmm0[1,2,3,4,5,6],zero,zero,zero,zero,zero,zero			; SSE41-NEXT: pshufb {{.*#+}} xmm0 = zero,zero,zero,zero,xmm0[1,2,3,4,5,6],zero,zero,zero,zero,zero,zero
	; SSE41-NEXT: retq			; SSE41-NEXT: retq
	;			;
	; AVX-LABEL: shuffle_v16i8_zz_zz_zz_zz_01_02_03_04_05_06_zz_zz_zz_zz_zz_zz:			; AVX-LABEL: shuffle_v16i8_zz_zz_zz_zz_01_02_03_04_05_06_zz_zz_zz_zz_zz_zz:
	; AVX: # %bb.0:			; AVX: # %bb.0:
	; AVX-NEXT: vpshufb {{.*#+}} xmm0 = zero,zero,zero,zero,xmm0[1,2,3,4,5,6],zero,zero,zero,zero,zero,zero			; AVX-NEXT: vpshufb {{.*#+}} xmm0 = zero,zero,zero,zero,xmm0[1,2,3,4,5,6],zero,zero,zero,zero,zero,zero
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%shuffle = shufflevector <16 x i8> %a, <16 x i8> <i8 0, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef>, <16 x i32> <i32 16, i32 16, i32 16, i32 16, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16>			%shuffle = shufflevector <16 x i8> %a, <16 x i8> <i8 0, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef>, <16 x i32> <i32 16, i32 16, i32 16, i32 16, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16>
	ret <16 x i8> %shuffle			ret <16 x i8> %shuffle
	}			}

	define <16 x i8> @shuffle_v16i8_01_02_03_04_05_06_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz(<16 x i8> %a) {			define <16 x i8> @shuffle_v16i8_01_02_03_04_05_06_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz(<16 x i8> %a) {
	; SSE2-LABEL: shuffle_v16i8_01_02_03_04_05_06_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz:			; SSE-LABEL: shuffle_v16i8_01_02_03_04_05_06_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz:
	; SSE2: # %bb.0:			; SSE: # %bb.0:
	; SSE2-NEXT: pand {{.*}}(%rip), %xmm0			; SSE-NEXT: pslldq {{.*#+}} xmm0 = zero,zero,zero,zero,zero,zero,zero,zero,zero,xmm0[0,1,2,3,4,5,6]
	; SSE2-NEXT: pxor %xmm1, %xmm1			; SSE-NEXT: psrldq {{.*#+}} xmm0 = xmm0[10,11,12,13,14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
	; SSE2-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3],xmm0[4],xmm1[4],xmm0[5],xmm1[5],xmm0[6],xmm1[6],xmm0[7],xmm1[7]			; SSE-NEXT: retq
	; SSE2-NEXT: pshuflw {{.*#+}} xmm1 = xmm0[0,0,2,3,4,5,6,7]
	; SSE2-NEXT: pshufd {{.*#+}} xmm1 = xmm1[0,0,0,0]
	; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[3,1,2,0]
	; SSE2-NEXT: pshufhw {{.*#+}} xmm0 = xmm0[0,1,2,3,6,5,4,7]
	; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[3,1,2,0]
	; SSE2-NEXT: pshuflw {{.*#+}} xmm0 = xmm0[1,2,3,0,4,5,6,7]
	; SSE2-NEXT: pshufhw {{.*#+}} xmm0 = xmm0[0,1,2,3,5,6,4,4]
	; SSE2-NEXT: packuswb %xmm1, %xmm0
	; SSE2-NEXT: retq
	;			;
	; SSSE3-LABEL: shuffle_v16i8_01_02_03_04_05_06_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz:			; AVX1-LABEL: shuffle_v16i8_01_02_03_04_05_06_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz:
	; SSSE3: # %bb.0:			; AVX1: # %bb.0:
	; SSSE3-NEXT: pshufb {{.*#+}} xmm0 = xmm0[1,2,3,4,5,6],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero			; AVX1-NEXT: vpslldq {{.*#+}} xmm0 = zero,zero,zero,zero,zero,zero,zero,zero,zero,xmm0[0,1,2,3,4,5,6]
	; SSSE3-NEXT: retq			; AVX1-NEXT: vpsrldq {{.*#+}} xmm0 = xmm0[10,11,12,13,14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
				; AVX1-NEXT: retq
	;			;
	; SSE41-LABEL: shuffle_v16i8_01_02_03_04_05_06_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz:			; AVX2-SLOW-LABEL: shuffle_v16i8_01_02_03_04_05_06_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz:
	; SSE41: # %bb.0:			; AVX2-SLOW: # %bb.0:
	; SSE41-NEXT: pshufb {{.*#+}} xmm0 = xmm0[1,2,3,4,5,6],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero			; AVX2-SLOW-NEXT: vpslldq {{.*#+}} xmm0 = zero,zero,zero,zero,zero,zero,zero,zero,zero,xmm0[0,1,2,3,4,5,6]
	; SSE41-NEXT: retq			; AVX2-SLOW-NEXT: vpsrldq {{.*#+}} xmm0 = xmm0[10,11,12,13,14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
				; AVX2-SLOW-NEXT: retq
	;			;
	; AVX-LABEL: shuffle_v16i8_01_02_03_04_05_06_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz:			; AVX2-FAST-LABEL: shuffle_v16i8_01_02_03_04_05_06_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz:
	; AVX: # %bb.0:			; AVX2-FAST: # %bb.0:
	; AVX-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[1,2,3,4,5,6],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero			; AVX2-FAST-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[1,2,3,4,5,6],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
	; AVX-NEXT: retq			; AVX2-FAST-NEXT: retq
				;
				; AVX512VL-LABEL: shuffle_v16i8_01_02_03_04_05_06_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz:
				; AVX512VL: # %bb.0:
				; AVX512VL-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[1,2,3,4,5,6],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
				; AVX512VL-NEXT: retq
	%shuffle = shufflevector <16 x i8> %a, <16 x i8> <i8 0, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef>, <16 x i32> <i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16>			%shuffle = shufflevector <16 x i8> %a, <16 x i8> <i8 0, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef>, <16 x i32> <i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16>
	ret <16 x i8> %shuffle			ret <16 x i8> %shuffle
	}			}

	define <16 x i8> @shuffle_v16i8_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_01_02_03_04_05_06(<16 x i8> %a) {			define <16 x i8> @shuffle_v16i8_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_01_02_03_04_05_06(<16 x i8> %a) {
	; SSE2-LABEL: shuffle_v16i8_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_01_02_03_04_05_06:			; SSE-LABEL: shuffle_v16i8_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_01_02_03_04_05_06:
	; SSE2: # %bb.0:			; SSE: # %bb.0:
	; SSE2-NEXT: pand {{.*}}(%rip), %xmm0			; SSE-NEXT: psrldq {{.*#+}} xmm0 = xmm0[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15],zero
	; SSE2-NEXT: pxor %xmm1, %xmm1			; SSE-NEXT: pslldq {{.*#+}} xmm0 = zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,xmm0[0,1,2,3,4,5]
	; SSE2-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3],xmm0[4],xmm1[4],xmm0[5],xmm1[5],xmm0[6],xmm1[6],xmm0[7],xmm1[7]			; SSE-NEXT: retq
	; SSE2-NEXT: pshuflw {{.*#+}} xmm1 = xmm0[0,0,2,3,4,5,6,7]
	; SSE2-NEXT: pshufd {{.*#+}} xmm1 = xmm1[0,0,0,0]
	; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[3,1,2,0]
	; SSE2-NEXT: pshuflw {{.*#+}} xmm0 = xmm0[0,3,2,3,4,5,6,7]
	; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[3,1,2,0]
	; SSE2-NEXT: pshuflw {{.*#+}} xmm0 = xmm0[0,0,1,2,4,5,6,7]
	; SSE2-NEXT: pshufhw {{.*#+}} xmm0 = xmm0[0,1,2,3,7,4,5,6]
	; SSE2-NEXT: packuswb %xmm0, %xmm1
	; SSE2-NEXT: movdqa %xmm1, %xmm0
	; SSE2-NEXT: retq
	;			;
	; SSSE3-LABEL: shuffle_v16i8_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_01_02_03_04_05_06:			; AVX1-LABEL: shuffle_v16i8_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_01_02_03_04_05_06:
	; SSSE3: # %bb.0:			; AVX1: # %bb.0:
	; SSSE3-NEXT: pshufb {{.*#+}} xmm0 = zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,xmm0[1,2,3,4,5,6]			; AVX1-NEXT: vpsrldq {{.*#+}} xmm0 = xmm0[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15],zero
	; SSSE3-NEXT: retq			; AVX1-NEXT: vpslldq {{.*#+}} xmm0 = zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,xmm0[0,1,2,3,4,5]
				; AVX1-NEXT: retq
	;			;
	; SSE41-LABEL: shuffle_v16i8_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_01_02_03_04_05_06:			; AVX2-SLOW-LABEL: shuffle_v16i8_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_01_02_03_04_05_06:
	; SSE41: # %bb.0:			; AVX2-SLOW: # %bb.0:
	; SSE41-NEXT: pshufb {{.*#+}} xmm0 = zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,xmm0[1,2,3,4,5,6]			; AVX2-SLOW-NEXT: vpsrldq {{.*#+}} xmm0 = xmm0[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15],zero
	; SSE41-NEXT: retq			; AVX2-SLOW-NEXT: vpslldq {{.*#+}} xmm0 = zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,xmm0[0,1,2,3,4,5]
				; AVX2-SLOW-NEXT: retq
	;			;
	; AVX-LABEL: shuffle_v16i8_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_01_02_03_04_05_06:			; AVX2-FAST-LABEL: shuffle_v16i8_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_01_02_03_04_05_06:
	; AVX: # %bb.0:			; AVX2-FAST: # %bb.0:
	; AVX-NEXT: vpshufb {{.*#+}} xmm0 = zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,xmm0[1,2,3,4,5,6]			; AVX2-FAST-NEXT: vpshufb {{.*#+}} xmm0 = zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,xmm0[1,2,3,4,5,6]
	; AVX-NEXT: retq			; AVX2-FAST-NEXT: retq
				;
				; AVX512VL-LABEL: shuffle_v16i8_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_01_02_03_04_05_06:
				; AVX512VL: # %bb.0:
				; AVX512VL-NEXT: vpshufb {{.*#+}} xmm0 = zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,xmm0[1,2,3,4,5,6]
				; AVX512VL-NEXT: retq
	%shuffle = shufflevector <16 x i8> %a, <16 x i8> <i8 0, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef>, <16 x i32> <i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6>			%shuffle = shufflevector <16 x i8> %a, <16 x i8> <i8 0, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef>, <16 x i32> <i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6>
	ret <16 x i8> %shuffle			ret <16 x i8> %shuffle
	}			}

	define <16 x i8> @PR12412(<16 x i8> %inval1, <16 x i8> %inval2) {			define <16 x i8> @PR12412(<16 x i8> %inval1, <16 x i8> %inval2) {
	; SSE2-LABEL: PR12412:			; SSE2-LABEL: PR12412:
	; SSE2: # %bb.0: # %entry			; SSE2: # %bb.0: # %entry
	; SSE2-NEXT: movdqa {{.*#+}} xmm2 = [255,255,255,255,255,255,255,255]			; SSE2-NEXT: movdqa {{.*#+}} xmm2 = [255,255,255,255,255,255,255,255]

test/CodeGen/X86/vector-shuffle-128-v8.ll

	; AVX-NEXT: retq			; AVX-NEXT: retq
	%shuffle = shufflevector <8 x i16> %a, <8 x i16> zeroinitializer, <8 x i32> <i32 8, i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 undef>			%shuffle = shufflevector <8 x i16> %a, <8 x i16> zeroinitializer, <8 x i32> <i32 8, i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 undef>

	ret <8 x i16> %shuffle			ret <8 x i16> %shuffle
	}			}

	; PR40306			; PR40306
	define <8 x i16> @shuffle_v8i16_9zzzuuuu(<8 x i16> %x) {			define <8 x i16> @shuffle_v8i16_9zzzuuuu(<8 x i16> %x) {
	; SSE2-LABEL: shuffle_v8i16_9zzzuuuu:			; SSE-LABEL: shuffle_v8i16_9zzzuuuu:
	; SSE2: # %bb.0:			; SSE: # %bb.0:
	; SSE2-NEXT: psrld $16, %xmm0			; SSE-NEXT: pslldq {{.*#+}} xmm0 = zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,xmm0[0,1,2,3]
	; SSE2-NEXT: pxor %xmm1, %xmm1			; SSE-NEXT: psrldq {{.*#+}} xmm0 = xmm0[14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
	; SSE2-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3]			; SSE-NEXT: retq
	; SSE2-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
	; SSE2-NEXT: retq
	;			;
	; SSSE3-LABEL: shuffle_v8i16_9zzzuuuu:			; AVX1-LABEL: shuffle_v8i16_9zzzuuuu:
	; SSSE3: # %bb.0:			; AVX1: # %bb.0:
	; SSSE3-NEXT: pshufb {{.*#+}} xmm0 = xmm0[2,3],zero,zero,zero,zero,zero,zero,xmm0[u,u,u,u,u,u,u,u]			; AVX1-NEXT: vpslldq {{.*#+}} xmm0 = zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,xmm0[0,1,2,3]
	; SSSE3-NEXT: retq			; AVX1-NEXT: vpsrldq {{.*#+}} xmm0 = xmm0[14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
				; AVX1-NEXT: retq
	;			;
	; SSE41-LABEL: shuffle_v8i16_9zzzuuuu:			; AVX2-SLOW-LABEL: shuffle_v8i16_9zzzuuuu:
	; SSE41: # %bb.0:			; AVX2-SLOW: # %bb.0:
	; SSE41-NEXT: psrld $16, %xmm0			; AVX2-SLOW-NEXT: vpslldq {{.*#+}} xmm0 = zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,xmm0[0,1,2,3]
	; SSE41-NEXT: pmovzxwq {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero			; AVX2-SLOW-NEXT: vpsrldq {{.*#+}} xmm0 = xmm0[14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
	; SSE41-NEXT: retq			; AVX2-SLOW-NEXT: retq
	;			;
	; AVX-LABEL: shuffle_v8i16_9zzzuuuu:			; AVX2-FAST-LABEL: shuffle_v8i16_9zzzuuuu:
	; AVX: # %bb.0:			; AVX2-FAST: # %bb.0:
	; AVX-NEXT: vpsrld $16, %xmm0, %xmm0			; AVX2-FAST-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[2,3],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
	; AVX-NEXT: vpmovzxwq {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero			; AVX2-FAST-NEXT: retq
	; AVX-NEXT: retq			;
				; AVX512VL-SLOW-LABEL: shuffle_v8i16_9zzzuuuu:
				; AVX512VL-SLOW: # %bb.0:
				; AVX512VL-SLOW-NEXT: vpslldq {{.*#+}} xmm0 = zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,xmm0[0,1,2,3]
				; AVX512VL-SLOW-NEXT: vpsrldq {{.*#+}} xmm0 = xmm0[14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
				; AVX512VL-SLOW-NEXT: retq
				;
				; AVX512VL-FAST-LABEL: shuffle_v8i16_9zzzuuuu:
				; AVX512VL-FAST: # %bb.0:
				; AVX512VL-FAST-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[2,3],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
				; AVX512VL-FAST-NEXT: retq
	%r = shufflevector <8 x i16> zeroinitializer, <8 x i16> %x, <8 x i32> <i32 9, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>			%r = shufflevector <8 x i16> zeroinitializer, <8 x i16> %x, <8 x i32> <i32 9, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
	ret <8 x i16> %r			ret <8 x i16> %r
	}			}

	; PR40318			; PR40318
	define <8 x i16> @shuffle_v8i16_2zzzuuuu(<8 x i16> %x) {			define <8 x i16> @shuffle_v8i16_2zzzuuuu(<8 x i16> %x) {
	; SSE2-LABEL: shuffle_v8i16_2zzzuuuu:			; SSE-LABEL: shuffle_v8i16_2zzzuuuu:
	; SSE2: # %bb.0:			; SSE: # %bb.0:
	; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,2,3]			; SSE-NEXT: pslldq {{.*#+}} xmm0 = zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,xmm0[0,1,2,3,4,5]
	; SSE2-NEXT: pxor %xmm1, %xmm1			; SSE-NEXT: psrldq {{.*#+}} xmm0 = xmm0[14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
	; SSE2-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3]			; SSE-NEXT: retq
	; SSE2-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
	; SSE2-NEXT: retq
	;			;
	; SSSE3-LABEL: shuffle_v8i16_2zzzuuuu:			; AVX1-LABEL: shuffle_v8i16_2zzzuuuu:
	; SSSE3: # %bb.0:			; AVX1: # %bb.0:
	; SSSE3-NEXT: pshufb {{.*#+}} xmm0 = xmm0[4,5],zero,zero,zero,zero,zero,zero,xmm0[u,u,u,u,u,u,u,u]			; AVX1-NEXT: vpslldq {{.*#+}} xmm0 = zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,xmm0[0,1,2,3,4,5]
	; SSSE3-NEXT: retq			; AVX1-NEXT: vpsrldq {{.*#+}} xmm0 = xmm0[14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
				; AVX1-NEXT: retq
	;			;
	; SSE41-LABEL: shuffle_v8i16_2zzzuuuu:			; AVX2-SLOW-LABEL: shuffle_v8i16_2zzzuuuu:
	; SSE41: # %bb.0:			; AVX2-SLOW: # %bb.0:
	; SSE41-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,2,3]			; AVX2-SLOW-NEXT: vpslldq {{.*#+}} xmm0 = zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,xmm0[0,1,2,3,4,5]
	; SSE41-NEXT: pmovzxwq {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero			; AVX2-SLOW-NEXT: vpsrldq {{.*#+}} xmm0 = xmm0[14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
	; SSE41-NEXT: retq			; AVX2-SLOW-NEXT: retq
	;			;
	; AVX-LABEL: shuffle_v8i16_2zzzuuuu:			; AVX2-FAST-LABEL: shuffle_v8i16_2zzzuuuu:
	; AVX: # %bb.0:			; AVX2-FAST: # %bb.0:
	; AVX-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[1,1,2,3]			; AVX2-FAST-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[4,5],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
	; AVX-NEXT: vpmovzxwq {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero			; AVX2-FAST-NEXT: retq
	; AVX-NEXT: retq			;
				; AVX512VL-SLOW-LABEL: shuffle_v8i16_2zzzuuuu:
				; AVX512VL-SLOW: # %bb.0:
				; AVX512VL-SLOW-NEXT: vpslldq {{.*#+}} xmm0 = zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,xmm0[0,1,2,3,4,5]
				; AVX512VL-SLOW-NEXT: vpsrldq {{.*#+}} xmm0 = xmm0[14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
				; AVX512VL-SLOW-NEXT: retq
				;
				; AVX512VL-FAST-LABEL: shuffle_v8i16_2zzzuuuu:
				; AVX512VL-FAST: # %bb.0:
				; AVX512VL-FAST-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[4,5],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
				; AVX512VL-FAST-NEXT: retq
	%r = shufflevector <8 x i16> %x, <8 x i16> zeroinitializer, <8 x i32> <i32 2, i32 9, i32 10, i32 11, i32 undef, i32 undef, i32 undef, i32 undef>			%r = shufflevector <8 x i16> %x, <8 x i16> zeroinitializer, <8 x i32> <i32 2, i32 9, i32 10, i32 11, i32 undef, i32 undef, i32 undef, i32 undef>
	ret <8 x i16> %r			ret <8 x i16> %r
	}			}
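
Note on the checks above: the new PSLLDQ/PSRLDQ pairs can be sanity-checked by modelling the two whole-register byte shifts on a 16-entry list. The sketch below is only an illustration, not part of the patch; the helper names `pslldq` and `psrldq` are hypothetical Python stand-ins for the instructions, and `None` is an assumed marker for a zeroed byte. It reproduces the byte patterns that FileCheck expects for `shuffle_v8i16_2zzzuuuu` and `shuffle_v16i8_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_01_02_03_04_05_06`.

```python
# Hypothetical model of the 128-bit byte shifts; None stands for a zeroed byte,
# an integer stands for the index of the source byte that ended up in that lane.
ZERO = None


def pslldq(v, n):
    """Shift left by n bytes: data moves to higher indices, zeros enter at index 0."""
    return [ZERO] * n + v[: 16 - n]


def psrldq(v, n):
    """Shift right by n bytes: data moves to lower indices, zeros enter at index 15."""
    return v[n:] + [ZERO] * n


src = list(range(16))  # byte i of xmm0 is labelled i

# shuffle_v8i16_2zzzuuuu: pslldq by 10, then psrldq by 14.
# Only source bytes 4 and 5 (i16 element 2) survive, in the lowest lanes.
r1 = psrldq(pslldq(src, 10), 14)

# shuffle_v16i8_zz_..._01_02_03_04_05_06: psrldq by 1, then pslldq by 10.
# Source bytes 1..6 land in the top six lanes, everything below is zero.
r2 = pslldq(psrldq(src, 1), 10)

if __name__ == "__main__":
    print(r1)
    print(r2)
```

In both cases the first shift discards the unwanted bytes at one end and the second shift both discards the other end and places the surviving run at its final position, which is why two shifts suffice when the kept bytes are sequential.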

	define <8 x i16> @shuffle_v8i16_3uu6zzzz(<8 x i16> %x) {			define <8 x i16> @shuffle_v8i16_3uu6zzzz(<8 x i16> %x) {
	; SSE2-LABEL: shuffle_v8i16_3uu6zzzz:			; SSE-LABEL: shuffle_v8i16_3uu6zzzz:
	; SSE2: # %bb.0:			; SSE: # %bb.0:
	; SSE2-NEXT: psrldq {{.*#+}} xmm0 = xmm0[6,7,8,9,10,11,12,13,14,15],zero,zero,zero,zero,zero,zero			; SSE-NEXT: pslldq {{.*#+}} xmm0 = zero,zero,xmm0[0,1,2,3,4,5,6,7,8,9,10,11,12,13]
	; SSE2-NEXT: movq {{.*#+}} xmm0 = xmm0[0],zero			; SSE-NEXT: psrldq {{.*#+}} xmm0 = xmm0[8,9,10,11,12,13,14,15],zero,zero,zero,zero,zero,zero,zero,zero
	; SSE2-NEXT: retq			; SSE-NEXT: retq
	;
	; SSSE3-LABEL: shuffle_v8i16_3uu6zzzz:
	; SSSE3: # %bb.0:
	; SSSE3-NEXT: pshufb {{.*#+}} xmm0 = xmm0[6,7,u,u,u,u,12,13],zero,zero,zero,zero,zero,zero,zero,zero
	; SSSE3-NEXT: retq
	;
	; SSE41-LABEL: shuffle_v8i16_3uu6zzzz:
	; SSE41: # %bb.0:
	; SSE41-NEXT: psrldq {{.*#+}} xmm0 = xmm0[6,7,8,9,10,11,12,13,14,15],zero,zero,zero,zero,zero,zero
	; SSE41-NEXT: movq {{.*#+}} xmm0 = xmm0[0],zero
	; SSE41-NEXT: retq
	;			;
	; AVX1-LABEL: shuffle_v8i16_3uu6zzzz:			; AVX1-LABEL: shuffle_v8i16_3uu6zzzz:
	; AVX1: # %bb.0:			; AVX1: # %bb.0:
	; AVX1-NEXT: vpsrldq {{.*#+}} xmm0 = xmm0[6,7,8,9,10,11,12,13,14,15],zero,zero,zero,zero,zero,zero			; AVX1-NEXT: vpslldq {{.*#+}} xmm0 = zero,zero,xmm0[0,1,2,3,4,5,6,7,8,9,10,11,12,13]
	; AVX1-NEXT: vmovq {{.*#+}} xmm0 = xmm0[0],zero			; AVX1-NEXT: vpsrldq {{.*#+}} xmm0 = xmm0[8,9,10,11,12,13,14,15],zero,zero,zero,zero,zero,zero,zero,zero
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-SLOW-LABEL: shuffle_v8i16_3uu6zzzz:			; AVX2-SLOW-LABEL: shuffle_v8i16_3uu6zzzz:
	; AVX2-SLOW: # %bb.0:			; AVX2-SLOW: # %bb.0:
	; AVX2-SLOW-NEXT: vpsrldq {{.*#+}} xmm0 = xmm0[6,7,8,9,10,11,12,13,14,15],zero,zero,zero,zero,zero,zero			; AVX2-SLOW-NEXT: vpslldq {{.*#+}} xmm0 = zero,zero,xmm0[0,1,2,3,4,5,6,7,8,9,10,11,12,13]
	; AVX2-SLOW-NEXT: vmovq {{.*#+}} xmm0 = xmm0[0],zero			; AVX2-SLOW-NEXT: vpsrldq {{.*#+}} xmm0 = xmm0[8,9,10,11,12,13,14,15],zero,zero,zero,zero,zero,zero,zero,zero
	; AVX2-SLOW-NEXT: retq			; AVX2-SLOW-NEXT: retq
	;			;
	; AVX2-FAST-LABEL: shuffle_v8i16_3uu6zzzz:			; AVX2-FAST-LABEL: shuffle_v8i16_3uu6zzzz:
	; AVX2-FAST: # %bb.0:			; AVX2-FAST: # %bb.0:
	; AVX2-FAST-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[6,7,8,9,10,11,12,13],zero,zero,zero,zero,zero,zero,zero,zero			; AVX2-FAST-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[6,7,8,9,10,11,12,13],zero,zero,zero,zero,zero,zero,zero,zero
	; AVX2-FAST-NEXT: retq			; AVX2-FAST-NEXT: retq
	;			;
	; AVX512VL-SLOW-LABEL: shuffle_v8i16_3uu6zzzz:			; AVX512VL-SLOW-LABEL: shuffle_v8i16_3uu6zzzz:
	; AVX512VL-SLOW: # %bb.0:			; AVX512VL-SLOW: # %bb.0:
	; AVX512VL-SLOW-NEXT: vpsrldq {{.*#+}} xmm0 = xmm0[6,7,8,9,10,11,12,13,14,15],zero,zero,zero,zero,zero,zero			; AVX512VL-SLOW-NEXT: vpslldq {{.*#+}} xmm0 = zero,zero,xmm0[0,1,2,3,4,5,6,7,8,9,10,11,12,13]
	; AVX512VL-SLOW-NEXT: vmovq {{.*#+}} xmm0 = xmm0[0],zero			; AVX512VL-SLOW-NEXT: vpsrldq {{.*#+}} xmm0 = xmm0[8,9,10,11,12,13,14,15],zero,zero,zero,zero,zero,zero,zero,zero
	; AVX512VL-SLOW-NEXT: retq			; AVX512VL-SLOW-NEXT: retq
	;			;
	; AVX512VL-FAST-LABEL: shuffle_v8i16_3uu6zzzz:			; AVX512VL-FAST-LABEL: shuffle_v8i16_3uu6zzzz:
	; AVX512VL-FAST: # %bb.0:			; AVX512VL-FAST: # %bb.0:
	; AVX512VL-FAST-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[6,7,8,9,10,11,12,13],zero,zero,zero,zero,zero,zero,zero,zero			; AVX512VL-FAST-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[6,7,8,9,10,11,12,13],zero,zero,zero,zero,zero,zero,zero,zero
	; AVX512VL-FAST-NEXT: retq			; AVX512VL-FAST-NEXT: retq
	%r = shufflevector <8 x i16> %x, <8 x i16> zeroinitializer, <8 x i32> <i32 3, i32 undef, i32 undef, i32 6, i32 8, i32 8, i32 8, i32 8>			%r = shufflevector <8 x i16> %x, <8 x i16> zeroinitializer, <8 x i32> <i32 3, i32 undef, i32 undef, i32 6, i32 8, i32 8, i32 8, i32 8>
	ret <8 x i16> %r			ret <8 x i16> %r

test/CodeGen/X86/vector-shuffle-sse4a.ll

	; BTVER2-NEXT: retq			; BTVER2-NEXT: retq
	%1 = shufflevector <16 x i8> %v, <16 x i8> zeroinitializer, <16 x i32> <i32 undef, i32 0, i32 5, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			%1 = shufflevector <16 x i8> %v, <16 x i8> zeroinitializer, <16 x i32> <i32 undef, i32 0, i32 5, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	ret <16 x i8> %1			ret <16 x i8> %1
	}			}

	define <16 x i8> @shuffle_uu_16_4_16_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu(<16 x i8> %v) {			define <16 x i8> @shuffle_uu_16_4_16_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu(<16 x i8> %v) {
	; AMD10H-LABEL: shuffle_uu_16_4_16_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu:			; AMD10H-LABEL: shuffle_uu_16_4_16_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu:
	; AMD10H: # %bb.0:			; AMD10H: # %bb.0:
	; AMD10H-NEXT: psrlq $16, %xmm0			; AMD10H-NEXT: pslldq {{.*#+}} xmm0 = zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,xmm0[0,1,2,3,4]
	; AMD10H-NEXT: pand {{.*}}(%rip), %xmm0			; AMD10H-NEXT: psrldq {{.*#+}} xmm0 = xmm0[15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
				; AMD10H-NEXT: pslldq {{.*#+}} xmm0 = zero,zero,xmm0[0,1,2,3,4,5,6,7,8,9,10,11,12,13]
	; AMD10H-NEXT: retq			; AMD10H-NEXT: retq
	;			;
	; BTVER1-LABEL: shuffle_uu_16_4_16_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu:			; BTVER1-LABEL: shuffle_uu_16_4_16_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu:
	; BTVER1: # %bb.0:			; BTVER1: # %bb.0:
	; BTVER1-NEXT: pshufb {{.*#+}} xmm0 = xmm0[u],zero,xmm0[4],zero,xmm0[u,u,u,u,u,u,u,u,u,u,u,u]			; BTVER1-NEXT: pshufb {{.*#+}} xmm0 = xmm0[u],zero,xmm0[4],zero,xmm0[u,u,u,u,u,u,u,u,u,u,u,u]
	; BTVER1-NEXT: retq			; BTVER1-NEXT: retq
	;			;
	; BTVER2-LABEL: shuffle_uu_16_4_16_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu:			; BTVER2-LABEL: shuffle_uu_16_4_16_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu: