This is an archive of the discontinued LLVM Phabricator instance.

[AVX-512] Add support for lowering shuffles to VALIGND/VALIGNQ
Closed, Public

Authored by craig.topper on Nov 3 2016, 11:21 PM.


Details

Reviewers

RKSimon
delena

Commits

rG5cb13062d277: [AVX-512] Add support for lowering shuffles to VALIGND/VALIGNQ
rL286709: [AVX-512] Add support for lowering shuffles to VALIGND/VALIGNQ

Summary

VALIGND and VALIGNQ are similar to PALIGNR, but instead of rotating bytes within each 128-bit lane, they rotate whole 32-bit or 64-bit elements across the entire vector register. This change reuses the shuffle rotation detection code already used for PALIGNR to detect these cases.
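To make the difference concrete, here is a small scalar model (not LLVM code; the helper names valignd and palignrDwords are hypothetical) of a 256-bit vector as eight 32-bit elements. Upper and Lower only describe where each input lands in the concatenation, not an instruction operand order, and the PALIGNR model is restricted to byte immediates that are multiples of 4 so both operations can be compared on dwords.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// A 256-bit vector modeled as eight 32-bit elements; element 0 is the lowest.
using V8 = std::array<uint32_t, 8>;

// VALIGND-style rotation (hypothetical helper): concatenate Upper:Lower, with
// Lower in the low half, across the WHOLE register, then take eight elements
// starting at element Imm.
V8 valignd(const V8 &Upper, const V8 &Lower, unsigned Imm) {
  assert(Imm < 8 && "rotation amount out of range for this model");
  V8 R{};
  for (unsigned i = 0; i < 8; ++i) {
    unsigned j = i + Imm; // index into the 16-element concatenation
    R[i] = j < 8 ? Lower[j] : Upper[j - 8];
  }
  return R;
}

// PALIGNR-style rotation restricted to whole dwords (byte immediate = 4 * Imm):
// the same concatenate-and-shift, but performed independently inside each
// 128-bit lane (four 32-bit elements per lane).
V8 palignrDwords(const V8 &Upper, const V8 &Lower, unsigned Imm) {
  assert(Imm < 4 && "rotation amount out of range for this model");
  V8 R{};
  for (unsigned Lane = 0; Lane < 2; ++Lane) {
    unsigned Base = Lane * 4;
    for (unsigned i = 0; i < 4; ++i) {
      unsigned j = i + Imm; // index into this lane's 8-element concatenation
      R[Base + i] = j < 4 ? Lower[Base + j] : Upper[Base + j - 4];
    }
  }
  return R;
}
```

With A = {0, ..., 7} standing in for %a and B = {8, ..., 15} for %b, valignd(B, A, 1) gives {1, 2, 3, 4, 5, 6, 7, 8}, the element order in the ymm0[1,2,3,4,5,6,7],ymm1[0] check lines of the tests below, while palignrDwords(B, A, 1) gives {1, 2, 3, 8, 5, 6, 7, 12} because each 128-bit lane wraps around on its own.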

Diff Detail

Repository: rL LLVM

Event Timeline

craig.topper updated this revision to Diff 76911. Nov 3 2016, 11:21 PM

craig.topper retitled this revision from to [AVX-512] Add support for lowering shuffles to VALIGND/VALIGNQ.

craig.topper updated this object.

craig.topper added reviewers: delena, RKSimon.

craig.topper added subscribers: llvm-commits, Farhana.

Please can you add the new tests to trunk with the existing lowering?

For 128-bit vectors, we favor VALIGND/Q over PALIGNR so that we naturally match the vector element size in case the shuffle is part of a masked operation.

Would we be better off putting this logic into combineSelect() and sticking with PALIGNR? That is, detect mask/maskz selects that take bitcasted target shuffles/logicals/etc. and attempt to narrow/widen them back? I'm not certain this would actually work in general; it's just a thought.

That sounds like a good idea. We can leave it as PALIGNR and not even try to use VALIGND/Q for 128-bit vectors until the combineSelect approach is in place. Less code in shuffle lowering is probably best.
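The element-size argument can be sketched with a scalar model (hypothetical helpers, not LLVM code). An AVX-512 write mask has one bit per destination element, so a masked select over v4i32 lines up directly with a dword rotation. If the same rotation is instead expressed as a byte-granular PALIGNR (lowerVectorShuffleAsByteRotate works on a byte vector type and bitcasts back), the select has to look through a bitcast, and the 4-bit dword mask corresponds to a 16-bit byte mask with each bit repeated four times; that narrowing/widening is what the combineSelect idea would have to recognize. The model assumes merge masking, where masked-off elements keep the pass-through value.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// A 128-bit vector modeled as four 32-bit elements; element 0 is the lowest.
using V4 = std::array<uint32_t, 4>;

// Rotate the concatenation Upper:Lower (Lower in the low half) right by Imm
// dwords across the whole register, VALIGND-style.
V4 rotateDwords(const V4 &Upper, const V4 &Lower, unsigned Imm) {
  assert(Imm < 4 && "rotation amount out of range for this model");
  V4 R{};
  for (unsigned i = 0; i < 4; ++i) {
    unsigned j = i + Imm; // index into the 8-element concatenation
    R[i] = j < 4 ? Lower[j] : Upper[j - 4];
  }
  return R;
}

// Merge masking with one mask bit per 32-bit element: bit i of K selects
// Val[i], otherwise PassThru[i] is kept.
V4 mergeMaskDwords(const V4 &Val, const V4 &PassThru, uint8_t K) {
  V4 R{};
  for (unsigned i = 0; i < 4; ++i)
    R[i] = ((K >> i) & 1) ? Val[i] : PassThru[i];
  return R;
}

// The byte-granular mask a v16i8 select would need to express the same dword
// mask: each of the low four bits of K is replicated into four byte bits.
uint16_t widenDwordMaskToBytes(uint8_t K) {
  uint16_t Bytes = 0;
  for (unsigned i = 0; i < 4; ++i)
    if ((K >> i) & 1)
      Bytes |= static_cast<uint16_t>(0xFu << (4 * i));
  return Bytes;
}
```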

Removing 128-bit support at Simon's suggestion.

Add current codegen to trunk to show the diffs in the tests?

lib/Target/X86/X86ISelLowering.cpp
Line 12215 (on Diff #77144): Will this ever fire?

Updating tests to only show difference in codegen with this patch.

lib/Target/X86/X86ISelLowering.cpp
Line 12215 (on Diff #77144): VALIGND/Q work on the whole vector. PALIGNR works on 128-bit lanes, so they should cover different cases.
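A simplified, standalone model of the two checks shows why both paths can fire. The names matchRotate and isLaneRepeated are hypothetical; they operate on plain integer masks with -1 for undef and are modeled on, not copied from, matchVectorShuffleAsRotate and is128BitLaneRepeatedShuffleMask. A v8i32 mask such as [1,2,3,4,5,6,7,8] crosses 128-bit lanes, so only the whole-vector (VALIGND) rotation matches; [1,2,3,8,5,6,7,12] rotates each lane independently, so only the lane-repeated (PALIGNR) form matches.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified model of the rotate matcher. Masks use -1 for undef;
// inputs are numbered 0 (first shuffle operand) and 1 (second operand).
struct RotateMatch {
  int Rotation = -1; // -1 means "not a rotation"
  int Lo = -1;       // input whose leading (low) elements appear in the result
  int Hi = -1;       // input whose trailing (high) elements appear in the result
};

RotateMatch matchRotate(const std::vector<int> &Mask) {
  int NumElts = static_cast<int>(Mask.size());
  RotateMatch R;
  R.Rotation = 0;
  for (int i = 0; i < NumElts; ++i) {
    int M = Mask[i];
    if (M < 0)
      continue; // undef elements match anything
    // Where would a rotated copy of this input have started?
    int StartIdx = i - (M % NumElts);
    if (StartIdx == 0)
      return {}; // the identity rotation is not interesting
    int Candidate = StartIdx < 0 ? -StartIdx : NumElts - StartIdx;
    if (R.Rotation == 0)
      R.Rotation = Candidate;
    else if (R.Rotation != Candidate)
      return {}; // elements disagree on the rotation amount
    int Src = M < NumElts ? 0 : 1;
    int &Target = StartIdx < 0 ? R.Hi : R.Lo;
    if (Target < 0)
      Target = Src;
    else if (Target != Src)
      return {}; // inputs interleaved in an unsupported way
  }
  if (R.Rotation == 0)
    return {}; // all-undef mask
  if (R.Lo < 0)
    R.Lo = R.Hi; // single-input rotation: both halves come from one input
  if (R.Hi < 0)
    R.Hi = R.Lo;
  return R;
}

// Hypothetical model of the 128-bit-lane repetition check. EltsPerLane is the
// number of elements in one 128-bit lane (4 for i32, 2 for i64). On success,
// Repeated holds the per-lane mask, with second-input indices rebased to start
// at EltsPerLane.
bool isLaneRepeated(const std::vector<int> &Mask, int EltsPerLane,
                    std::vector<int> &Repeated) {
  int Size = static_cast<int>(Mask.size());
  Repeated.assign(EltsPerLane, -1);
  for (int i = 0; i < Size; ++i) {
    int M = Mask[i];
    if (M < 0)
      continue;
    if ((M % Size) / EltsPerLane != i / EltsPerLane)
      return false; // element crosses a 128-bit lane
    int Local = M % EltsPerLane + (M < Size ? 0 : EltsPerLane);
    int &Slot = Repeated[i % EltsPerLane];
    if (Slot < 0)
      Slot = Local;
    else if (Slot != Local)
      return false; // lanes do not all use the same pattern
  }
  return true;
}
```

In this model the whole-vector mask [1,...,8] matches with Rotation 1, Lo = input 1 and Hi = input 0, and a single-input mask such as [1,...,7,0] matches with both halves taken from input 0.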

LGTM - please can you create a PR about the AVX512 combineSelect shuffle/logical widening/narrowing idea (unless you intend to do it very soon).

I'll see if we can add matchVectorShuffleAsRotate to matchBinaryPermuteVectorShuffle.

This revision is now accepted and ready to land. Nov 11 2016, 10:56 AM

Closed by commit rL286709: [AVX-512] Add support for lowering shuffles to VALIGND/VALIGNQ (authored by ctopper). Nov 11 2016, 9:15 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/trunk/lib/Target/X86/X86ISelLowering.cpp	124 lines
llvm/trunk/test/CodeGen/X86/vector-shuffle-256-v4.ll	3 lines
llvm/trunk/test/CodeGen/X86/vector-shuffle-256-v8.ll	6 lines
llvm/trunk/test/CodeGen/X86/vector-shuffle-512-v16.ll	6 lines
llvm/trunk/test/CodeGen/X86/vector-shuffle-512-v8.ll	12 lines

Diff 77709

llvm/trunk/lib/Target/X86/X86ISelLowering.cpp


... 7,782 unchanged lines not shown ...	if (SDValue BlendPerm =
lowerVectorShuffleAsBlendAndPermute(DL, VT, V1, V2, Mask, DAG))		lowerVectorShuffleAsBlendAndPermute(DL, VT, V1, V2, Mask, DAG))
return BlendPerm;		return BlendPerm;

V1 = DAG.getVectorShuffle(VT, DL, V1, DAG.getUNDEF(VT), V1Mask);		V1 = DAG.getVectorShuffle(VT, DL, V1, DAG.getUNDEF(VT), V1Mask);
V2 = DAG.getVectorShuffle(VT, DL, V2, DAG.getUNDEF(VT), V2Mask);		V2 = DAG.getVectorShuffle(VT, DL, V2, DAG.getUNDEF(VT), V2Mask);
return DAG.getVectorShuffle(VT, DL, V1, V2, BlendMask);		return DAG.getVectorShuffle(VT, DL, V1, V2, BlendMask);
}		}

/// \brief Try to lower a vector shuffle as a byte rotation.		/// \brief Try to lower a vector shuffle as a rotation.
///
/// SSSE3 has a generic PALIGNR instruction in x86 that will do an arbitrary
/// byte-rotation of the concatenation of two vectors; pre-SSSE3 can use
/// a PSRLDQ/PSLLDQ/POR pattern to get a similar effect. This routine will
/// try to generically lower a vector shuffle through such an pattern. It
/// does not check for the profitability of lowering either as PALIGNR or
/// PSRLDQ/PSLLDQ/POR, only whether the mask is valid to lower in that form.
/// This matches shuffle vectors that look like:
///		///
/// v8i16 [11, 12, 13, 14, 15, 0, 1, 2]		/// This is used for support PALIGNR for SSSE3 or VALIGND/Q for AVX512.
///		static int matchVectorShuffleAsRotate(SDValue &V1, SDValue &V2,
/// Essentially it concatenates V1 and V2, shifts right by some number of
/// elements, and takes the low elements as the result. Note that while this is
/// specified as a right shift because x86 is little-endian, it is a *left
/// rotate* of the vector lanes.
static int matchVectorShuffleAsByteRotate(MVT VT, SDValue &V1, SDValue &V2,
ArrayRef<int> Mask) {		ArrayRef<int> Mask) {
// Don't accept any shuffles with zero elements.		int NumElts = Mask.size();
if (any_of(Mask, [](int M) { return M == SM_SentinelZero; }))
return -1;

// PALIGNR works on 128-bit lanes.
SmallVector<int, 16> RepeatedMask;
if (!is128BitLaneRepeatedShuffleMask(VT, Mask, RepeatedMask))
return -1;

int NumElts = RepeatedMask.size();

// We need to detect various ways of spelling a rotation:		// We need to detect various ways of spelling a rotation:
// [11, 12, 13, 14, 15, 0, 1, 2]		// [11, 12, 13, 14, 15, 0, 1, 2]
// [-1, 12, 13, 14, -1, -1, 1, -1]		// [-1, 12, 13, 14, -1, -1, 1, -1]
// [-1, -1, -1, -1, -1, -1, 1, 2]		// [-1, -1, -1, -1, -1, -1, 1, 2]
// [ 3, 4, 5, 6, 7, 8, 9, 10]		// [ 3, 4, 5, 6, 7, 8, 9, 10]
// [-1, 4, 5, 6, -1, -1, 9, -1]		// [-1, 4, 5, 6, -1, -1, 9, -1]
// [-1, 4, 5, 6, -1, -1, -1, -1]		// [-1, 4, 5, 6, -1, -1, -1, -1]
int Rotation = 0;		int Rotation = 0;
SDValue Lo, Hi;		SDValue Lo, Hi;
for (int i = 0; i < NumElts; ++i) {		for (int i = 0; i < NumElts; ++i) {
int M = RepeatedMask[i];		int M = Mask[i];
assert((M == SM_SentinelUndef || (0 <= M && M < (2*NumElts))) &&		assert((M == SM_SentinelUndef || (0 <= M && M < (2*NumElts))) &&
"Unexpected mask index.");		"Unexpected mask index.");
if (M < 0)		if (M < 0)
continue;		continue;

// Determine where a rotated vector would have started.		// Determine where a rotated vector would have started.
int StartIdx = i - (M % NumElts);		int StartIdx = i - (M % NumElts);
if (StartIdx == 0)		if (StartIdx == 0)
... 35 unchanged lines not shown ...	static int matchVectorShuffleAsRotate(SDValue &V1, SDValue &V2,
if (!Lo)		if (!Lo)
Lo = Hi;		Lo = Hi;
else if (!Hi)		else if (!Hi)
Hi = Lo;		Hi = Lo;

V1 = Lo;		V1 = Lo;
V2 = Hi;		V2 = Hi;

		return Rotation;
		}

		/// \brief Try to lower a vector shuffle as a byte rotation.
		///
		/// SSSE3 has a generic PALIGNR instruction in x86 that will do an arbitrary
		/// byte-rotation of the concatenation of two vectors; pre-SSSE3 can use
		/// a PSRLDQ/PSLLDQ/POR pattern to get a similar effect. This routine will
		/// try to generically lower a vector shuffle through such an pattern. It
		/// does not check for the profitability of lowering either as PALIGNR or
		/// PSRLDQ/PSLLDQ/POR, only whether the mask is valid to lower in that form.
		/// This matches shuffle vectors that look like:
		///
		/// v8i16 [11, 12, 13, 14, 15, 0, 1, 2]
		///
		/// Essentially it concatenates V1 and V2, shifts right by some number of
		/// elements, and takes the low elements as the result. Note that while this is
		/// specified as a right shift because x86 is little-endian, it is a *left
		/// rotate* of the vector lanes.
		static int matchVectorShuffleAsByteRotate(MVT VT, SDValue &V1, SDValue &V2,
		ArrayRef<int> Mask) {
		// Don't accept any shuffles with zero elements.
		if (any_of(Mask, [](int M) { return M == SM_SentinelZero; }))
		return -1;

		// PALIGNR works on 128-bit lanes.
		SmallVector<int, 16> RepeatedMask;
		if (!is128BitLaneRepeatedShuffleMask(VT, Mask, RepeatedMask))
		return -1;

		int Rotation = matchVectorShuffleAsRotate(V1, V2, RepeatedMask);
		if (Rotation <= 0)
		return -1;

// PALIGNR rotates bytes, so we need to scale the		// PALIGNR rotates bytes, so we need to scale the
// rotation based on how many bytes are in the vector lane.		// rotation based on how many bytes are in the vector lane.
		int NumElts = RepeatedMask.size();
int Scale = 16 / NumElts;		int Scale = 16 / NumElts;
return Rotation * Scale;		return Rotation * Scale;
}		}

static SDValue lowerVectorShuffleAsByteRotate(const SDLoc &DL, MVT VT,		static SDValue lowerVectorShuffleAsByteRotate(const SDLoc &DL, MVT VT,
SDValue V1, SDValue V2,		SDValue V1, SDValue V2,
ArrayRef<int> Mask,		ArrayRef<int> Mask,
const X86Subtarget &Subtarget,		const X86Subtarget &Subtarget,
... 34 unchanged lines not shown ...	static SDValue lowerVectorShuffleAsByteRotate(const SDLoc &DL, MVT VT,
SDValue LoShift = DAG.getNode(X86ISD::VSHLDQ, DL, MVT::v16i8, Lo,		SDValue LoShift = DAG.getNode(X86ISD::VSHLDQ, DL, MVT::v16i8, Lo,
DAG.getConstant(LoByteShift, DL, MVT::i8));		DAG.getConstant(LoByteShift, DL, MVT::i8));
SDValue HiShift = DAG.getNode(X86ISD::VSRLDQ, DL, MVT::v16i8, Hi,		SDValue HiShift = DAG.getNode(X86ISD::VSRLDQ, DL, MVT::v16i8, Hi,
DAG.getConstant(HiByteShift, DL, MVT::i8));		DAG.getConstant(HiByteShift, DL, MVT::i8));
return DAG.getBitcast(VT,		return DAG.getBitcast(VT,
DAG.getNode(ISD::OR, DL, MVT::v16i8, LoShift, HiShift));		DAG.getNode(ISD::OR, DL, MVT::v16i8, LoShift, HiShift));
}		}

		/// \brief Try to lower a vector shuffle as a dword/qword rotation.
		///
		/// AVX512 has a VALIGND/VALIGNQ instructions that will do an arbitrary
		/// rotation of the concatenation of two vectors; This routine will
		/// try to generically lower a vector shuffle through such an pattern.
		///
		/// Essentially it concatenates V1 and V2, shifts right by some number of
		/// elements, and takes the low elements as the result. Note that while this is
		/// specified as a right shift because x86 is little-endian, it is a *left
		/// rotate* of the vector lanes.
		static SDValue lowerVectorShuffleAsRotate(const SDLoc &DL, MVT VT,
		SDValue V1, SDValue V2,
		ArrayRef<int> Mask,
		const X86Subtarget &Subtarget,
		SelectionDAG &DAG) {
		assert((VT.getScalarType() == MVT::i32 || VT.getScalarType() == MVT::i64) &&
		"Only 32-bit and 64-bit elements are supported!");

		// 128/256-bit vectors are only supported with VLX.
		assert((Subtarget.hasVLX() || (!VT.is128BitVector() && !VT.is256BitVector()))
		&& "VLX required for 128/256-bit vectors");

		SDValue Lo = V1, Hi = V2;
		int Rotation = matchVectorShuffleAsRotate(Lo, Hi, Mask);
		if (Rotation <= 0)
		return SDValue();

		return DAG.getNode(X86ISD::VALIGN, DL, VT, Lo, Hi,
		DAG.getConstant(Rotation, DL, MVT::i8));
		}

/// \brief Try to lower a vector shuffle as a bit shift (shifts in zeros).		/// \brief Try to lower a vector shuffle as a bit shift (shifts in zeros).
///		///
/// Attempts to match a shuffle mask against the PSLL(W/D/Q/DQ) and		/// Attempts to match a shuffle mask against the PSLL(W/D/Q/DQ) and
/// PSRL(W/D/Q/DQ) SSE2 and AVX2 logical bit-shift instructions. The function		/// PSRL(W/D/Q/DQ) SSE2 and AVX2 logical bit-shift instructions. The function
/// matches elements from one of the input vectors shuffled to the left or		/// matches elements from one of the input vectors shuffled to the left or
/// right with zeroable elements 'shifted in'. It handles both the strictly		/// right with zeroable elements 'shifted in'. It handles both the strictly
/// bit-wise element shifts and the byte shift across an entire 128-bit double		/// bit-wise element shifts and the byte shift across an entire 128-bit double
/// quad word lane.		/// quad word lane.
... 3,558 unchanged lines not shown ...	return DAG.getNode(X86ISD::VPERMI, DL, MVT::v4i64, V1,
getV4X86ShuffleImm8ForMask(Mask, DL, DAG));		getV4X86ShuffleImm8ForMask(Mask, DL, DAG));
}		}

// Try to use shift instructions.		// Try to use shift instructions.
if (SDValue Shift = lowerVectorShuffleAsShift(DL, MVT::v4i64, V1, V2, Mask,		if (SDValue Shift = lowerVectorShuffleAsShift(DL, MVT::v4i64, V1, V2, Mask,
Zeroable, Subtarget, DAG))		Zeroable, Subtarget, DAG))
return Shift;		return Shift;

		// If we have VLX support, we can use VALIGN.
		if (Subtarget.hasVLX())
		if (SDValue Rotate = lowerVectorShuffleAsRotate(DL, MVT::v4i64, V1, V2,
		Mask, Subtarget, DAG))
		return Rotate;

		// Try to use PALIGNR.
if (SDValue Rotate = lowerVectorShuffleAsByteRotate(DL, MVT::v4i64, V1, V2,		if (SDValue Rotate = lowerVectorShuffleAsByteRotate(DL, MVT::v4i64, V1, V2,
Mask, Subtarget, DAG))		Mask, Subtarget, DAG))
return Rotate;		return Rotate;

// Use dedicated unpack instructions for masks that match their pattern.		// Use dedicated unpack instructions for masks that match their pattern.
if (SDValue V =		if (SDValue V =
lowerVectorShuffleWithUNPCK(DL, MVT::v4i64, Mask, V1, V2, DAG))		lowerVectorShuffleWithUNPCK(DL, MVT::v4i64, Mask, V1, V2, DAG))
return V;		return V;
... 145 unchanged lines not shown ...	if (SDValue V =
return V;		return V;
}		}

// Try to use shift instructions.		// Try to use shift instructions.
if (SDValue Shift = lowerVectorShuffleAsShift(DL, MVT::v8i32, V1, V2, Mask,		if (SDValue Shift = lowerVectorShuffleAsShift(DL, MVT::v8i32, V1, V2, Mask,
Zeroable, Subtarget, DAG))		Zeroable, Subtarget, DAG))
return Shift;		return Shift;

		// If we have VLX support, we can use VALIGN.
		if (Subtarget.hasVLX())
		if (SDValue Rotate = lowerVectorShuffleAsRotate(DL, MVT::v8i32, V1, V2,
		Mask, Subtarget, DAG))
		return Rotate;

// Try to use byte rotation instructions.		// Try to use byte rotation instructions.
if (SDValue Rotate = lowerVectorShuffleAsByteRotate(		if (SDValue Rotate = lowerVectorShuffleAsByteRotate(
DL, MVT::v8i32, V1, V2, Mask, Subtarget, DAG))		DL, MVT::v8i32, V1, V2, Mask, Subtarget, DAG))
return Rotate;		return Rotate;

// Try to create an in-lane repeating shuffle mask and then shuffle the		// Try to create an in-lane repeating shuffle mask and then shuffle the
// results into the target lanes.		// results into the target lanes.
if (SDValue V = lowerShuffleAsRepeatedMaskAndLanePermute(		if (SDValue V = lowerShuffleAsRepeatedMaskAndLanePermute(
... 412 unchanged lines not shown ...	if (is256BitLaneRepeatedShuffleMask(MVT::v8i64, Mask, Repeated256Mask))
getV4X86ShuffleImm8ForMask(Repeated256Mask, DL, DAG));		getV4X86ShuffleImm8ForMask(Repeated256Mask, DL, DAG));
}		}

// Try to use shift instructions.		// Try to use shift instructions.
if (SDValue Shift = lowerVectorShuffleAsShift(DL, MVT::v8i64, V1, V2, Mask,		if (SDValue Shift = lowerVectorShuffleAsShift(DL, MVT::v8i64, V1, V2, Mask,
Zeroable, Subtarget, DAG))		Zeroable, Subtarget, DAG))
return Shift;		return Shift;

		// Try to use VALIGN.
		if (SDValue Rotate = lowerVectorShuffleAsRotate(DL, MVT::v8i64, V1, V2,
		Mask, Subtarget, DAG))
		return Rotate;

		// Try to use PALIGNR.
if (SDValue Rotate = lowerVectorShuffleAsByteRotate(DL, MVT::v8i64, V1, V2,		if (SDValue Rotate = lowerVectorShuffleAsByteRotate(DL, MVT::v8i64, V1, V2,
Mask, Subtarget, DAG))		Mask, Subtarget, DAG))
return Rotate;		return Rotate;

if (SDValue Unpck =		if (SDValue Unpck =
lowerVectorShuffleWithUNPCK(DL, MVT::v8i64, Mask, V1, V2, DAG))		lowerVectorShuffleWithUNPCK(DL, MVT::v8i64, Mask, V1, V2, DAG))
return Unpck;		return Unpck;

... 33 unchanged lines not shown ...	if (SDValue V =
return V;		return V;
}		}

// Try to use shift instructions.		// Try to use shift instructions.
if (SDValue Shift = lowerVectorShuffleAsShift(DL, MVT::v16i32, V1, V2, Mask,		if (SDValue Shift = lowerVectorShuffleAsShift(DL, MVT::v16i32, V1, V2, Mask,
Zeroable, Subtarget, DAG))		Zeroable, Subtarget, DAG))
return Shift;		return Shift;

		// Try to use VALIGN.
		if (SDValue Rotate = lowerVectorShuffleAsRotate(DL, MVT::v16i32, V1, V2,
		Mask, Subtarget, DAG))
		return Rotate;

// Try to use byte rotation instructions.		// Try to use byte rotation instructions.
if (Subtarget.hasBWI())		if (Subtarget.hasBWI())
if (SDValue Rotate = lowerVectorShuffleAsByteRotate(		if (SDValue Rotate = lowerVectorShuffleAsByteRotate(
DL, MVT::v16i32, V1, V2, Mask, Subtarget, DAG))		DL, MVT::v16i32, V1, V2, Mask, Subtarget, DAG))
return Rotate;		return Rotate;

return lowerVectorShuffleWithPERMV(DL, MVT::v16i32, Mask, V1, V2, DAG);		return lowerVectorShuffleWithPERMV(DL, MVT::v16i32, Mask, V1, V2, DAG);
}		}
... 20,943 unchanged lines not shown ...
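The two lowering paths above also encode the rotation differently. The PALIGNR path scales the rotation found on the 128-bit-lane repeated mask by 16 / NumElts to get a byte count, while the VALIGN path passes the whole-vector element rotation through unchanged, since VALIGND/VALIGNQ immediates count 32-bit or 64-bit elements. A minimal sketch of that arithmetic, with hypothetical helper names:

```cpp
#include <cassert>

// Byte immediate for PALIGNR: Rotation is measured in elements of the
// 128-bit-lane repeated mask, which has NumEltsPerLane entries.
int palignrByteImm(int Rotation, int NumEltsPerLane) {
  assert(NumEltsPerLane > 0 && 16 % NumEltsPerLane == 0 && "bad lane shape");
  int Scale = 16 / NumEltsPerLane; // bytes per element
  return Rotation * Scale;
}

// Element immediate for VALIGND/VALIGNQ: the whole-vector rotation is used
// as-is, with no scaling.
int valignImm(int Rotation) { return Rotation; }
```

For instance, a per-lane v8i32 rotation by one element becomes byte immediate 4 on the PALIGNR path, whereas a whole-vector rotation by one element is emitted as VALIGND with immediate 1.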

llvm/trunk/test/CodeGen/X86/vector-shuffle-256-v4.ll

	... 1,456 unchanged lines not shown ...
	; AVX2-LABEL: shuffle_v4i64_1234:			; AVX2-LABEL: shuffle_v4i64_1234:
	; AVX2: # BB#0:			; AVX2: # BB#0:
	; AVX2-NEXT: vpblendd {{.*#+}} ymm0 = ymm1[0,1],ymm0[2,3,4,5,6,7]			; AVX2-NEXT: vpblendd {{.*#+}} ymm0 = ymm1[0,1],ymm0[2,3,4,5,6,7]
	; AVX2-NEXT: vpermq {{.*#+}} ymm0 = ymm0[1,2,3,0]			; AVX2-NEXT: vpermq {{.*#+}} ymm0 = ymm0[1,2,3,0]
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512VL-LABEL: shuffle_v4i64_1234:			; AVX512VL-LABEL: shuffle_v4i64_1234:
	; AVX512VL: # BB#0:			; AVX512VL: # BB#0:
	; AVX512VL-NEXT: vpblendd {{.*#+}} ymm0 = ymm1[0,1],ymm0[2,3,4,5,6,7]			; AVX512VL-NEXT: valignq {{.*#+}} ymm0 = ymm0[1,2,3],ymm1[0]
	; AVX512VL-NEXT: vpermq {{.*#+}} ymm0 = ymm0[1,2,3,0]
	; AVX512VL-NEXT: retq			; AVX512VL-NEXT: retq
	%shuffle = shufflevector <4 x i64> %a, <4 x i64> %b, <4 x i32> <i32 1, i32 2, i32 3, i32 4>			%shuffle = shufflevector <4 x i64> %a, <4 x i64> %b, <4 x i32> <i32 1, i32 2, i32 3, i32 4>
	ret <4 x i64> %shuffle			ret <4 x i64> %shuffle
	}			}

	define <4 x i64> @shuffle_v4i64_1230(<4 x i64> %a) {			define <4 x i64> @shuffle_v4i64_1230(<4 x i64> %a) {
	; AVX1-LABEL: shuffle_v4i64_1230:			; AVX1-LABEL: shuffle_v4i64_1230:
	; AVX1: # BB#0:			; AVX1: # BB#0:
	... 16 unchanged lines not shown ...

llvm/trunk/test/CodeGen/X86/vector-shuffle-256-v8.ll

	... 2,556 unchanged lines not shown ...
	; AVX2: # BB#0:			; AVX2: # BB#0:
	; AVX2-NEXT: vpblendd {{.*#+}} ymm0 = ymm1[0],ymm0[1,2,3,4,5,6,7]			; AVX2-NEXT: vpblendd {{.*#+}} ymm0 = ymm1[0],ymm0[1,2,3,4,5,6,7]
	; AVX2-NEXT: vmovdqa {{.*#+}} ymm1 = [1,2,3,4,5,6,7,0]			; AVX2-NEXT: vmovdqa {{.*#+}} ymm1 = [1,2,3,4,5,6,7,0]
	; AVX2-NEXT: vpermd %ymm0, %ymm1, %ymm0			; AVX2-NEXT: vpermd %ymm0, %ymm1, %ymm0
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512VL-LABEL: shuffle_v8i32_12345678:			; AVX512VL-LABEL: shuffle_v8i32_12345678:
	; AVX512VL: # BB#0:			; AVX512VL: # BB#0:
	; AVX512VL-NEXT: vmovdqa32 {{.*#+}} ymm2 = [1,2,3,4,5,6,7,8]			; AVX512VL-NEXT: valignd {{.*#+}} ymm0 = ymm0[1,2,3,4,5,6,7],ymm1[0]
	; AVX512VL-NEXT: vpermt2d %ymm1, %ymm2, %ymm0
	; AVX512VL-NEXT: retq			; AVX512VL-NEXT: retq
	%shuffle = shufflevector <8 x i32> %a, <8 x i32> %b, <8 x i32> <i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8>			%shuffle = shufflevector <8 x i32> %a, <8 x i32> %b, <8 x i32> <i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8>
	ret <8 x i32> %shuffle			ret <8 x i32> %shuffle
	}			}

	define <8 x i32> @shuffle_v8i32_12345670(<8 x i32> %a) {			define <8 x i32> @shuffle_v8i32_12345670(<8 x i32> %a) {
	; AVX1-LABEL: shuffle_v8i32_12345670:			; AVX1-LABEL: shuffle_v8i32_12345670:
	; AVX1: # BB#0:			; AVX1: # BB#0:
	; AVX1-NEXT: vperm2f128 {{.*#+}} ymm1 = ymm0[2,3,0,1]			; AVX1-NEXT: vperm2f128 {{.*#+}} ymm1 = ymm0[2,3,0,1]
	; AVX1-NEXT: vshufps {{.*#+}} ymm1 = ymm1[0,0],ymm0[3,0],ymm1[4,4],ymm0[7,4]			; AVX1-NEXT: vshufps {{.*#+}} ymm1 = ymm1[0,0],ymm0[3,0],ymm1[4,4],ymm0[7,4]
	; AVX1-NEXT: vshufps {{.*#+}} ymm0 = ymm0[1,2],ymm1[2,0],ymm0[5,6],ymm1[6,4]			; AVX1-NEXT: vshufps {{.*#+}} ymm0 = ymm0[1,2],ymm1[2,0],ymm0[5,6],ymm1[6,4]
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-LABEL: shuffle_v8i32_12345670:			; AVX2-LABEL: shuffle_v8i32_12345670:
	; AVX2: # BB#0:			; AVX2: # BB#0:
	; AVX2-NEXT: vmovdqa {{.*#+}} ymm1 = [1,2,3,4,5,6,7,0]			; AVX2-NEXT: vmovdqa {{.*#+}} ymm1 = [1,2,3,4,5,6,7,0]
	; AVX2-NEXT: vpermd %ymm0, %ymm1, %ymm0			; AVX2-NEXT: vpermd %ymm0, %ymm1, %ymm0
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512VL-LABEL: shuffle_v8i32_12345670:			; AVX512VL-LABEL: shuffle_v8i32_12345670:
	; AVX512VL: # BB#0:			; AVX512VL: # BB#0:
	; AVX512VL-NEXT: vmovdqa32 {{.*#+}} ymm1 = [1,2,3,4,5,6,7,0]			; AVX512VL-NEXT: valignd {{.*#+}} ymm0 = ymm0[1,2,3,4,5,6,7,0]
	; AVX512VL-NEXT: vpermd %ymm0, %ymm1, %ymm0
	; AVX512VL-NEXT: retq			; AVX512VL-NEXT: retq
	%shuffle = shufflevector <8 x i32> %a, <8 x i32> undef, <8 x i32> <i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 0>			%shuffle = shufflevector <8 x i32> %a, <8 x i32> undef, <8 x i32> <i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 0>
	ret <8 x i32> %shuffle			ret <8 x i32> %shuffle
	}			}

llvm/trunk/test/CodeGen/X86/vector-shuffle-512-v16.ll

	... 336 unchanged lines not shown ...
	; ALL-NEXT: retq			; ALL-NEXT: retq
	%shuffle = shufflevector <16 x i32> zeroinitializer, <16 x i32> %a, <16 x i32> <i32 16, i32 0, i32 17, i32 0, i32 18, i32 0, i32 19, i32 0, i32 20, i32 0, i32 21, i32 0, i32 22, i32 0, i32 23, i32 0>			%shuffle = shufflevector <16 x i32> zeroinitializer, <16 x i32> %a, <16 x i32> <i32 16, i32 0, i32 17, i32 0, i32 18, i32 0, i32 19, i32 0, i32 20, i32 0, i32 21, i32 0, i32 22, i32 0, i32 23, i32 0>
	ret <16 x i32> %shuffle			ret <16 x i32> %shuffle
	}			}

	define <16 x i32> @shuffle_v16i32_01_02_03_04_05_06_07_08_09_10_11_12_13_14_15_16(<16 x i32> %a, <16 x i32> %b) {			define <16 x i32> @shuffle_v16i32_01_02_03_04_05_06_07_08_09_10_11_12_13_14_15_16(<16 x i32> %a, <16 x i32> %b) {
	; ALL-LABEL: shuffle_v16i32_01_02_03_04_05_06_07_08_09_10_11_12_13_14_15_16:			; ALL-LABEL: shuffle_v16i32_01_02_03_04_05_06_07_08_09_10_11_12_13_14_15_16:
	; ALL: # BB#0:			; ALL: # BB#0:
	; ALL-NEXT: vmovdqa32 {{.*#+}} zmm2 = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16]			; ALL-NEXT: valignd {{.*#+}} zmm0 = zmm0[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15],zmm1[0]
	; ALL-NEXT: vpermt2d %zmm1, %zmm2, %zmm0
	; ALL-NEXT: retq			; ALL-NEXT: retq
	%shuffle = shufflevector <16 x i32> %a, <16 x i32> %b, <16 x i32><i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16>			%shuffle = shufflevector <16 x i32> %a, <16 x i32> %b, <16 x i32><i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16>
	ret <16 x i32> %shuffle			ret <16 x i32> %shuffle
	}			}

	define <16 x i32> @shuffle_v16i32_01_02_03_04_05_06_07_08_09_10_11_12_13_14_15_00(<16 x i32> %a) {			define <16 x i32> @shuffle_v16i32_01_02_03_04_05_06_07_08_09_10_11_12_13_14_15_00(<16 x i32> %a) {
	; ALL-LABEL: shuffle_v16i32_01_02_03_04_05_06_07_08_09_10_11_12_13_14_15_00:			; ALL-LABEL: shuffle_v16i32_01_02_03_04_05_06_07_08_09_10_11_12_13_14_15_00:
	; ALL: # BB#0:			; ALL: # BB#0:
	; ALL-NEXT: vmovdqa32 {{.*#+}} zmm1 = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,0]			; ALL-NEXT: valignd {{.*#+}} zmm0 = zmm0[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,0]
	; ALL-NEXT: vpermd %zmm0, %zmm1, %zmm0
	; ALL-NEXT: retq			; ALL-NEXT: retq
	%shuffle = shufflevector <16 x i32> %a, <16 x i32> undef, <16 x i32><i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 0>			%shuffle = shufflevector <16 x i32> %a, <16 x i32> undef, <16 x i32><i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 0>
	ret <16 x i32> %shuffle			ret <16 x i32> %shuffle
	}			}

llvm/trunk/test/CodeGen/X86/vector-shuffle-512-v8.ll

... 2,274 unchanged lines not shown ...	; AVX512F-32-NEXT: retl
%shuffle = shufflevector <8 x double> %a, <8 x double> zeroinitializer, <8 x i32> <i32 0, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8>		%shuffle = shufflevector <8 x double> %a, <8 x double> zeroinitializer, <8 x i32> <i32 0, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8>
ret <8 x double> %shuffle		ret <8 x double> %shuffle
}		}

define <8 x i64> @shuffle_v8i64_12345678(<8 x i64> %a, <8 x i64> %b) {		define <8 x i64> @shuffle_v8i64_12345678(<8 x i64> %a, <8 x i64> %b) {
;		;
; AVX512F-LABEL: shuffle_v8i64_12345678:		; AVX512F-LABEL: shuffle_v8i64_12345678:
; AVX512F: # BB#0:		; AVX512F: # BB#0:
; AVX512F-NEXT: vmovdqa64 {{.*#+}} zmm2 = [1,2,3,4,5,6,7,8]		; AVX512F-NEXT: valignq {{.*#+}} zmm0 = zmm0[1,2,3,4,5,6,7],zmm1[0]
; AVX512F-NEXT: vpermt2q %zmm1, %zmm2, %zmm0
; AVX512F-NEXT: retq		; AVX512F-NEXT: retq
;		;
; AVX512F-32-LABEL: shuffle_v8i64_12345678:		; AVX512F-32-LABEL: shuffle_v8i64_12345678:
; AVX512F-32: # BB#0:		; AVX512F-32: # BB#0:
; AVX512F-32-NEXT: vmovdqa64 {{.*#+}} zmm2 = [1,0,2,0,3,0,4,0,5,0,6,0,7,0,8,0]		; AVX512F-32-NEXT: valignq {{.*#+}} zmm0 = zmm0[1,2,3,4,5,6,7],zmm1[0]
; AVX512F-32-NEXT: vpermt2q %zmm1, %zmm2, %zmm0
; AVX512F-32-NEXT: retl		; AVX512F-32-NEXT: retl
%shuffle = shufflevector <8 x i64> %a, <8 x i64> %b, <8 x i32> <i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8>		%shuffle = shufflevector <8 x i64> %a, <8 x i64> %b, <8 x i32> <i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8>
ret <8 x i64> %shuffle		ret <8 x i64> %shuffle
}		}

define <8 x i64> @shuffle_v8i64_12345670(<8 x i64> %a) {		define <8 x i64> @shuffle_v8i64_12345670(<8 x i64> %a) {
;		;
; AVX512F-LABEL: shuffle_v8i64_12345670:		; AVX512F-LABEL: shuffle_v8i64_12345670:
; AVX512F: # BB#0:		; AVX512F: # BB#0:
; AVX512F-NEXT: vmovdqa64 {{.*#+}} zmm1 = [1,2,3,4,5,6,7,0]		; AVX512F-NEXT: valignq {{.*#+}} zmm0 = zmm0[1,2,3,4,5,6,7,0]
; AVX512F-NEXT: vpermq %zmm0, %zmm1, %zmm0
; AVX512F-NEXT: retq		; AVX512F-NEXT: retq
;		;
; AVX512F-32-LABEL: shuffle_v8i64_12345670:		; AVX512F-32-LABEL: shuffle_v8i64_12345670:
; AVX512F-32: # BB#0:		; AVX512F-32: # BB#0:
; AVX512F-32-NEXT: vmovdqa64 {{.*#+}} zmm1 = [1,0,2,0,3,0,4,0,5,0,6,0,7,0,0,0]		; AVX512F-32-NEXT: valignq {{.*#+}} zmm0 = zmm0[1,2,3,4,5,6,7,0]
; AVX512F-32-NEXT: vpermq %zmm0, %zmm1, %zmm0
; AVX512F-32-NEXT: retl		; AVX512F-32-NEXT: retl
%shuffle = shufflevector <8 x i64> %a, <8 x i64> undef, <8 x i32> <i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 0>		%shuffle = shufflevector <8 x i64> %a, <8 x i64> undef, <8 x i32> <i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 0>
ret <8 x i64> %shuffle		ret <8 x i64> %shuffle
}		}