This is an archive of the discontinued LLVM Phabricator instance.

Differential D19228

[X86][AVX2] Prefer VPERMQ/VPERMPD over VINSERTI128/VINSERTF128 for unary shuffles
ClosedPublic

Authored by RKSimon on Apr 18 2016, 9:52 AM.


Details

Reviewers

spatel
andreadb
mkuper

Commits

rG32b1c9fe7fd9: [X86][AVX2] Prefer VPERMQ/VPERMPD over VINSERTI128/VINSERTF128 for unary…
rL266728: [X86][AVX2] Prefer VPERMQ/VPERMPD over VINSERTI128/VINSERTF128 for unary…

Summary

Using VPERMQ/VPERMPD allows memory folding of the (repeated) input, whereas VINSERTI128/VINSERTF128 cannot fold it.
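As a rough illustration of the summary, the sketch below models VPERMQ/VPERMPD on a 256-bit value held as four 64-bit elements. The `Ymm` alias and the helper names are hypothetical, not LLVM or intrinsic APIs. It shows how a unary shuffle of the two 128-bit lanes maps onto the instruction's 8-bit immediate. Because VPERMQ/VPERMPD take that whole input as a single register-or-memory source operand, the repeated input can be folded as a load.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model of a 256-bit register: four 64-bit elements, element 0 lowest.
using Ymm = std::array<uint64_t, 4>;

// VPERMQ/VPERMPD semantics: destination element i takes the source element
// selected by immediate bits [2i+1:2i]. There is only one source operand.
Ymm vpermq(const Ymm &src, uint8_t imm) {
  Ymm dst{};
  for (int i = 0; i < 4; ++i)
    dst[i] = src[(imm >> (2 * i)) & 3];
  return dst;
}

// Build the immediate for a unary 128-bit lane shuffle: lo/hi pick which
// source lane (0 or 1) fills the low/high destination lane. Lane L covers
// 64-bit elements {2L, 2L+1}.
uint8_t lanePermuteImm(int lo, int hi) {
  assert((lo == 0 || lo == 1) && (hi == 0 || hi == 1) && "lane out of range");
  const int elts[4] = {2 * lo, 2 * lo + 1, 2 * hi, 2 * hi + 1};
  uint8_t imm = 0;
  for (int i = 0; i < 4; ++i)
    imm |= static_cast<uint8_t>(elts[i] << (2 * i));
  return imm;
}
```

For shuffle_v8f32_45670123 the lane mask is (1, 0), which gives immediate 0x4E, the ymm0[2,3,0,1] permute shown in the AVX2 check lines.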

Diff Detail

Repository: rL LLVM

Event Timeline

RKSimon updated this revision to Diff 54076 · Apr 18 2016, 9:52 AM

RKSimon retitled this revision to [X86][AVX2] Prefer VPERMQ/VPERMPD over VPERM2I128/VPERM2F128 for unary shuffles.

RKSimon updated this object.

RKSimon added reviewers: mkuper, andreadb, spatel.

RKSimon set the repository for this revision to rL LLVM.

RKSimon added a subscriber: llvm-commits.

lib/Target/X86/X86ISelLowering.cpp
10584–10587	I may have missed it, but the advantage shown in the test changes is just that we get to use an instruction with a single input operand. Add a test to show the load folding win?

RKSimon mentioned this in rL266662: [X86][AVX] Added extra memory folding tests for D19228 · Apr 18 2016, 12:54 PM

Cheers Sanjay, my previous explanation was wrong: the actual problem is with the lowerV2X128VectorShuffle cases that use vinsertf128/vinserti128. The vperm2f128/vperm2i128 cases already fold correctly for unary shuffles. I've added tests to demonstrate this.

I've updated the patch title/description accordingly.
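A sketch of why the swap is value-preserving in the unary vinsertf128/vinserti128 cases, using hypothetical register models rather than any LLVM API: when VINSERTI128 inserts a register's own low lane into its high lane, the result equals VPERMQ with immediate 0x44 (ymm0[0,1,0,1] in the test output). Only the VPERMQ form reads its whole 256-bit input through one register-or-memory operand.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical register models: 256 bits as four 64-bit elements (element 0
// lowest) and 128 bits as two 64-bit elements.
using Ymm = std::array<uint64_t, 4>;
using Xmm = std::array<uint64_t, 2>;

// VINSERTI128/VINSERTF128 semantics: copy the 256-bit source, then overwrite
// 128-bit lane (imm & 1) with the 128-bit operand.
Ymm vinserti128(const Ymm &src, const Xmm &sub, uint8_t imm) {
  Ymm dst = src;
  const int lane = imm & 1;
  dst[2 * lane] = sub[0];
  dst[2 * lane + 1] = sub[1];
  return dst;
}

// VPERMQ/VPERMPD semantics: element i = src[imm bits 2i+1:2i].
Ymm vpermq(const Ymm &src, uint8_t imm) {
  Ymm dst{};
  for (int i = 0; i < 4; ++i)
    dst[i] = src[(imm >> (2 * i)) & 3];
  return dst;
}

// The low 128 bits of a ymm register (the aliased xmm register).
Xmm lowLane(const Ymm &y) { return Xmm{y[0], y[1]}; }

// "vinserti128 $1, %xmm0, %ymm0, %ymm0" versus "vpermq $0x44, %ymm0, %ymm0".
bool insertMatchesPermute(const Ymm &y) {
  return vinserti128(y, lowLane(y), 1) == vpermq(y, 0x44);
}
```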

spatel added inline comments · Apr 18 2016, 2:04 PM

test/CodeGen/X86/avx-vperm2x128.ll
65–66	So this one could be 'vperm2f128' with a memop, couldn't it? Any idea why that didn't happen?

RKSimon added inline comments · Apr 18 2016, 2:49 PM

test/CodeGen/X86/avx-vperm2x128.ll
65–66	The vinsertf128 pattern is used instead for cases where we're inserting the lower half (so no extract is needed) and the other half is already in place. According to Agner Fog's instruction tables this is the better choice on pre-AVX2 targets, especially AMD targets, which are weak on 128-bit lane crossings. Fixing this in the memory-folding code would be tricky: the folding logic sees the input split into two, assumes it can't be folded, and so never reaches foldMemoryOperandImpl.

LGTM. Thanks!

This revision is now accepted and ready to land · Apr 18 2016, 3:20 PM

Closed by commit rL266728: [X86][AVX2] Prefer VPERMQ/VPERMPD over VINSERTI128/VINSERTF128 for unary… (authored by RKSimon) · Apr 19 2016, 5:32 AM

This revision was automatically updated to reflect the committed changes.

Thanks Sanjay. In the commit I was able to move the change inside the vinsertf128 lowering code, which means that AVX2 targets still use vperm2f128/vperm2i128 for some unary shuffles.

RKSimon mentioned this in rL275411: [X86][AVX2] Allow VPERMPD/VPERMQ shuffles to call combineShuffle · Jul 14 2016, 6:36 AM

RKSimon mentioned this in rL275497: [X86][AVX2] Allow VPERMPD/VPERMQ shuffles to call combineShuffle (reapplied) · Jul 14 2016, 4:12 PM

Revision Contents

lib/Target/X86/X86ISelLowering.cpp (revision 266660): 18 lines
test/CodeGen/X86/avx-vperm2x128.ll (revision 266662): 121 lines
test/CodeGen/X86/vector-shuffle-256-v16.ll (revision 266660): 14 lines
test/CodeGen/X86/vector-shuffle-256-v32.ll (revision 266660): 12 lines
test/CodeGen/X86/vector-shuffle-256-v4.ll (revision 266660): 8 lines
test/CodeGen/X86/vector-shuffle-256-v8.ll (revision 266660): 56 lines
test/CodeGen/X86/vector-shuffle-combining.ll (revision 266660): 2 lines

Diff 54107

lib/Target/X86/X86ISelLowering.cpp


(10,570 unchanged lines not shown)
Context: static SDValue lowerV2X128VectorShuffle(SDLoc DL, MVT VT, SDValue V1,
// the zero vector has only one use, we could use a VPERM2X128 to save the		// the zero vector has only one use, we could use a VPERM2X128 to save the
// instruction bytes needed to explicitly generate the zero vector.		// instruction bytes needed to explicitly generate the zero vector.

// Blends are faster and handle all the non-lane-crossing cases.		// Blends are faster and handle all the non-lane-crossing cases.
if (SDValue Blend = lowerVectorShuffleAsBlend(DL, VT, V1, V2, Mask,		if (SDValue Blend = lowerVectorShuffleAsBlend(DL, VT, V1, V2, Mask,
Subtarget, DAG))		Subtarget, DAG))
return Blend;		return Blend;

		// If either input operand is a zero vector, use VPERM2X128 because its mask
		// allows us to replace the zero input with an implicit zero.
bool IsV1Zero = ISD::isBuildVectorAllZeros(V1.getNode());		bool IsV1Zero = ISD::isBuildVectorAllZeros(V1.getNode());
bool IsV2Zero = ISD::isBuildVectorAllZeros(V2.getNode());		bool IsV2Zero = ISD::isBuildVectorAllZeros(V2.getNode());

// If either input operand is a zero vector, use VPERM2X128 because its mask		// With AVX2 we should use VPERMQ/VPERMPD to allow memory folding.
// allows us to replace the zero input with an implicit zero.		if (Subtarget.hasAVX2() && isSingleInputShuffleMask(Mask) && !IsV1Zero)
		return SDValue();

if (!IsV1Zero && !IsV2Zero) {		if (!IsV1Zero && !IsV2Zero) {
// Check for patterns which can be matched with a single insert of a 128-bit		// Check for patterns which can be matched with a single insert of a 128-bit
// subvector.		// subvector.
bool OnlyUsesV1 = isShuffleEquivalent(V1, V2, Mask, {0, 1, 0, 1});		bool OnlyUsesV1 = isShuffleEquivalent(V1, V2, Mask, {0, 1, 0, 1});
if (OnlyUsesV1 || isShuffleEquivalent(V1, V2, Mask, {0, 1, 4, 5})) {		if (OnlyUsesV1 || isShuffleEquivalent(V1, V2, Mask, {0, 1, 4, 5})) {
MVT SubVT = MVT::getVectorVT(VT.getVectorElementType(),		MVT SubVT = MVT::getVectorVT(VT.getVectorElementType(),
VT.getVectorNumElements() / 2);		VT.getVectorNumElements() / 2);
SDValue LoV = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, SubVT, V1,		SDValue LoV = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, SubVT, V1,
(441 unchanged lines not shown)
Context: static SDValue lowerV4F64VectorShuffle(SDValue Op, SDValue V1, SDValue V2,
assert(V1.getSimpleValueType() == MVT::v4f64 && "Bad operand type!");		assert(V1.getSimpleValueType() == MVT::v4f64 && "Bad operand type!");
assert(V2.getSimpleValueType() == MVT::v4f64 && "Bad operand type!");		assert(V2.getSimpleValueType() == MVT::v4f64 && "Bad operand type!");
ShuffleVectorSDNode *SVOp = cast<ShuffleVectorSDNode>(Op);		ShuffleVectorSDNode *SVOp = cast<ShuffleVectorSDNode>(Op);
ArrayRef<int> Mask = SVOp->getMask();		ArrayRef<int> Mask = SVOp->getMask();
assert(Mask.size() == 4 && "Unexpected mask size for v4 shuffle!");		assert(Mask.size() == 4 && "Unexpected mask size for v4 shuffle!");

SmallVector<int, 4> WidenedMask;		SmallVector<int, 4> WidenedMask;
if (canWidenShuffleElements(Mask, WidenedMask))		if (canWidenShuffleElements(Mask, WidenedMask))
return lowerV2X128VectorShuffle(DL, MVT::v4f64, V1, V2, Mask, Subtarget,		if (SDValue V = lowerV2X128VectorShuffle(DL, MVT::v4f64, V1, V2, Mask,
DAG);		Subtarget, DAG))
		return V;

if (isSingleInputShuffleMask(Mask)) {		if (isSingleInputShuffleMask(Mask)) {
// Check for being able to broadcast a single element.		// Check for being able to broadcast a single element.
if (SDValue Broadcast = lowerVectorShuffleAsBroadcast(		if (SDValue Broadcast = lowerVectorShuffleAsBroadcast(
DL, MVT::v4f64, V1, V2, Mask, Subtarget, DAG))		DL, MVT::v4f64, V1, V2, Mask, Subtarget, DAG))
return Broadcast;		return Broadcast;

// Use low duplicate instructions for masks that match their pattern.		// Use low duplicate instructions for masks that match their pattern.
(77 unchanged lines not shown)
Context: static SDValue lowerV4I64VectorShuffle(SDValue Op, SDValue V1, SDValue V2,
assert(V2.getSimpleValueType() == MVT::v4i64 && "Bad operand type!");		assert(V2.getSimpleValueType() == MVT::v4i64 && "Bad operand type!");
ShuffleVectorSDNode *SVOp = cast<ShuffleVectorSDNode>(Op);		ShuffleVectorSDNode *SVOp = cast<ShuffleVectorSDNode>(Op);
ArrayRef<int> Mask = SVOp->getMask();		ArrayRef<int> Mask = SVOp->getMask();
assert(Mask.size() == 4 && "Unexpected mask size for v4 shuffle!");		assert(Mask.size() == 4 && "Unexpected mask size for v4 shuffle!");
assert(Subtarget.hasAVX2() && "We can only lower v4i64 with AVX2!");		assert(Subtarget.hasAVX2() && "We can only lower v4i64 with AVX2!");

SmallVector<int, 4> WidenedMask;		SmallVector<int, 4> WidenedMask;
if (canWidenShuffleElements(Mask, WidenedMask))		if (canWidenShuffleElements(Mask, WidenedMask))
return lowerV2X128VectorShuffle(DL, MVT::v4i64, V1, V2, Mask, Subtarget,		if (SDValue V = lowerV2X128VectorShuffle(DL, MVT::v4i64, V1, V2, Mask,
DAG);		Subtarget, DAG))
		return V;

if (SDValue Blend = lowerVectorShuffleAsBlend(DL, MVT::v4i64, V1, V2, Mask,		if (SDValue Blend = lowerVectorShuffleAsBlend(DL, MVT::v4i64, V1, V2, Mask,
Subtarget, DAG))		Subtarget, DAG))
return Blend;		return Blend;

// Check for being able to broadcast a single element.		// Check for being able to broadcast a single element.
if (SDValue Broadcast = lowerVectorShuffleAsBroadcast(DL, MVT::v4i64, V1, V2,		if (SDValue Broadcast = lowerVectorShuffleAsBroadcast(DL, MVT::v4i64, V1, V2,
Mask, Subtarget, DAG))		Mask, Subtarget, DAG))
(19,421 unchanged lines not shown)
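The early exit added to lowerV2X128VectorShuffle can be summarized as a small decision function. This is a hypothetical, simplified model: `chooseLowering`, `isSingleInputMask` and `maskMatches` are illustrative names, the blend check is omitted, and the zero-input VPERM2X128 path is lumped into one fallback. It reflects this diff (Diff 54107), not the committed variant, which moved the check into the insert path.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Masks use the v4 (64-bit element) form from the diff:
// 0-3 select from V1, 4-7 select from V2, -1 is undef.

// Simplified stand-in for isSingleInputShuffleMask: every defined index
// refers to V1.
bool isSingleInputMask(const std::vector<int> &mask) {
  const int size = static_cast<int>(mask.size());
  for (int m : mask)
    if (m >= size)
      return false;
  return true;
}

// Simplified stand-in for isShuffleEquivalent: undef entries match anything.
bool maskMatches(const std::vector<int> &mask,
                 const std::vector<int> &expected) {
  assert(mask.size() == expected.size() && "mask size mismatch");
  for (std::size_t i = 0; i < mask.size(); ++i)
    if (mask[i] >= 0 && mask[i] != expected[i])
      return false;
  return true;
}

std::string chooseLowering(const std::vector<int> &mask, bool hasAVX2,
                           bool isV1Zero, bool isV2Zero) {
  // The new early exit: returning SDValue() lets the v4f64/v4i64 lowering
  // continue, where a unary lane shuffle typically becomes VPERMPD/VPERMQ.
  if (hasAVX2 && isSingleInputMask(mask) && !isV1Zero)
    return "defer to VPERMQ/VPERMPD";
  // Single 128-bit insert patterns (neither input is a zero vector).
  if (!isV1Zero && !isV2Zero &&
      (maskMatches(mask, {0, 1, 0, 1}) || maskMatches(mask, {0, 1, 4, 5})))
    return "VINSERTF128/VINSERTI128";
  // Everything else, including zero inputs, in this simplified model.
  return "VPERM2F128/VPERM2I128";
}
```

With `hasAVX2` set, {2, 3, 0, 1} and {0, 1, 0, 1} both defer, in line with the AVX2 check lines that switch from vperm2f128/vinsertf128 to vpermpd/vpermq, while the two-input {0, 1, 4, 5} still takes the insert path.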

test/CodeGen/X86/avx-vperm2x128.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=x86_64-apple-darwin -mattr=+avx | FileCheck %s --check-prefix=ALL --check-prefix=AVX1			; RUN: llc < %s -mtriple=x86_64-apple-darwin -mattr=+avx | FileCheck %s --check-prefix=ALL --check-prefix=AVX1
	; RUN: llc < %s -mtriple=x86_64-apple-darwin -mattr=+avx2 | FileCheck %s --check-prefix=ALL --check-prefix=AVX2			; RUN: llc < %s -mtriple=x86_64-apple-darwin -mattr=+avx2 | FileCheck %s --check-prefix=ALL --check-prefix=AVX2

	define <8 x float> @shuffle_v8f32_45670123(<8 x float> %a, <8 x float> %b) nounwind uwtable readnone ssp {			define <8 x float> @shuffle_v8f32_45670123(<8 x float> %a, <8 x float> %b) nounwind uwtable readnone ssp {
	; ALL-LABEL: shuffle_v8f32_45670123:			; AVX1-LABEL: shuffle_v8f32_45670123:
	; ALL: ## BB#0: ## %entry			; AVX1: ## BB#0: ## %entry
	; ALL-NEXT: vperm2f128 {{.*#+}} ymm0 = ymm0[2,3,0,1]			; AVX1-NEXT: vperm2f128 {{.*#+}} ymm0 = ymm0[2,3,0,1]
	; ALL-NEXT: retq			; AVX1-NEXT: retq
				;
				; AVX2-LABEL: shuffle_v8f32_45670123:
				; AVX2: ## BB#0: ## %entry
				; AVX2-NEXT: vpermpd {{.*#+}} ymm0 = ymm0[2,3,0,1]
				; AVX2-NEXT: retq
	entry:			entry:
	%shuffle = shufflevector <8 x float> %a, <8 x float> %b, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 0, i32 1, i32 2, i32 3>			%shuffle = shufflevector <8 x float> %a, <8 x float> %b, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 0, i32 1, i32 2, i32 3>
	ret <8 x float> %shuffle			ret <8 x float> %shuffle
	}			}

	define <8 x float> @shuffle_v8f32_45670123_mem(<8 x float>* %pa, <8 x float>* %pb) nounwind uwtable readnone ssp {			define <8 x float> @shuffle_v8f32_45670123_mem(<8 x float>* %pa, <8 x float>* %pb) nounwind uwtable readnone ssp {
	; ALL-LABEL: shuffle_v8f32_45670123_mem:			; AVX1-LABEL: shuffle_v8f32_45670123_mem:
	; ALL: ## BB#0: ## %entry			; AVX1: ## BB#0: ## %entry
	; ALL-NEXT: vperm2f128 {{.*#+}} ymm0 = mem[2,3,0,1]			; AVX1-NEXT: vperm2f128 {{.*#+}} ymm0 = mem[2,3,0,1]
	; ALL-NEXT: retq			; AVX1-NEXT: retq
				;
				; AVX2-LABEL: shuffle_v8f32_45670123_mem:
				; AVX2: ## BB#0: ## %entry
				; AVX2-NEXT: vpermpd {{.*#+}} ymm0 = mem[2,3,0,1]
				; AVX2-NEXT: retq
	entry:			entry:
	%a = load <8 x float>, <8 x float>* %pa			%a = load <8 x float>, <8 x float>* %pa
	%b = load <8 x float>, <8 x float>* %pb			%b = load <8 x float>, <8 x float>* %pb
	%shuffle = shufflevector <8 x float> %a, <8 x float> %b, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 0, i32 1, i32 2, i32 3>			%shuffle = shufflevector <8 x float> %a, <8 x float> %b, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 0, i32 1, i32 2, i32 3>
	ret <8 x float> %shuffle			ret <8 x float> %shuffle
	}			}

	define <8 x float> @shuffle_v8f32_0123cdef(<8 x float> %a, <8 x float> %b) nounwind uwtable readnone ssp {			define <8 x float> @shuffle_v8f32_0123cdef(<8 x float> %a, <8 x float> %b) nounwind uwtable readnone ssp {
	; ALL-LABEL: shuffle_v8f32_0123cdef:			; ALL-LABEL: shuffle_v8f32_0123cdef:
	; ALL: ## BB#0: ## %entry			; ALL: ## BB#0: ## %entry
	; ALL-NEXT: vblendpd {{.*#+}} ymm0 = ymm0[0,1],ymm1[2,3]			; ALL-NEXT: vblendpd {{.*#+}} ymm0 = ymm0[0,1],ymm1[2,3]
	; ALL-NEXT: retq			; ALL-NEXT: retq
	entry:			entry:
	%shuffle = shufflevector <8 x float> %a, <8 x float> %b, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>			%shuffle = shufflevector <8 x float> %a, <8 x float> %b, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>
	ret <8 x float> %shuffle			ret <8 x float> %shuffle
	}			}

	define <8 x float> @shuffle_v8f32_01230123(<8 x float> %a, <8 x float> %b) nounwind uwtable readnone ssp {			define <8 x float> @shuffle_v8f32_01230123(<8 x float> %a, <8 x float> %b) nounwind uwtable readnone ssp {
	; ALL-LABEL: shuffle_v8f32_01230123:			; AVX1-LABEL: shuffle_v8f32_01230123:
	; ALL: ## BB#0: ## %entry			; AVX1: ## BB#0: ## %entry
	; ALL-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0			; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0
	; ALL-NEXT: retq			; AVX1-NEXT: retq
				;
				; AVX2-LABEL: shuffle_v8f32_01230123:
				; AVX2: ## BB#0: ## %entry
				; AVX2-NEXT: vpermpd {{.*#+}} ymm0 = ymm0[0,1,0,1]
				; AVX2-NEXT: retq
	entry:			entry:
	%shuffle = shufflevector <8 x float> %a, <8 x float> %b, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3>			%shuffle = shufflevector <8 x float> %a, <8 x float> %b, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3>
	ret <8 x float> %shuffle			ret <8 x float> %shuffle
	}			}

	define <8 x float> @shuffle_v8f32_01230123_mem(<8 x float>* %pa, <8 x float>* %pb) nounwind uwtable readnone ssp {			define <8 x float> @shuffle_v8f32_01230123_mem(<8 x float>* %pa, <8 x float>* %pb) nounwind uwtable readnone ssp {
	; ALL-LABEL: shuffle_v8f32_01230123_mem:			; AVX1-LABEL: shuffle_v8f32_01230123_mem:
	; ALL: ## BB#0: ## %entry			; AVX1: ## BB#0: ## %entry
	; ALL-NEXT: vmovaps (%rdi), %ymm0			; AVX1-NEXT: vmovaps (%rdi), %ymm0
	; ALL-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0			; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0
	; ALL-NEXT: retq			; AVX1-NEXT: retq
				;
				; AVX2-LABEL: shuffle_v8f32_01230123_mem:
				; AVX2: ## BB#0: ## %entry
				; AVX2-NEXT: vpermpd {{.*#+}} ymm0 = mem[0,1,0,1]
				; AVX2-NEXT: retq
	entry:			entry:
	%a = load <8 x float>, <8 x float>* %pa			%a = load <8 x float>, <8 x float>* %pa
	%b = load <8 x float>, <8 x float>* %pb			%b = load <8 x float>, <8 x float>* %pb
	%shuffle = shufflevector <8 x float> %a, <8 x float> %b, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3>			%shuffle = shufflevector <8 x float> %a, <8 x float> %b, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3>
	ret <8 x float> %shuffle			ret <8 x float> %shuffle
	}			}

	define <8 x float> @shuffle_v8f32_45674567(<8 x float> %a, <8 x float> %b) nounwind uwtable readnone ssp {			define <8 x float> @shuffle_v8f32_45674567(<8 x float> %a, <8 x float> %b) nounwind uwtable readnone ssp {
	; ALL-LABEL: shuffle_v8f32_45674567:			; AVX1-LABEL: shuffle_v8f32_45674567:
	; ALL: ## BB#0: ## %entry			; AVX1: ## BB#0: ## %entry
	; ALL-NEXT: vperm2f128 {{.*#+}} ymm0 = ymm0[2,3,2,3]			; AVX1-NEXT: vperm2f128 {{.*#+}} ymm0 = ymm0[2,3,2,3]
	; ALL-NEXT: retq			; AVX1-NEXT: retq
				;
				; AVX2-LABEL: shuffle_v8f32_45674567:
				; AVX2: ## BB#0: ## %entry
				; AVX2-NEXT: vpermpd {{.*#+}} ymm0 = ymm0[2,3,2,3]
				; AVX2-NEXT: retq
	entry:			entry:
	%shuffle = shufflevector <8 x float> %a, <8 x float> %b, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 4, i32 5, i32 6, i32 7>			%shuffle = shufflevector <8 x float> %a, <8 x float> %b, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 4, i32 5, i32 6, i32 7>
	ret <8 x float> %shuffle			ret <8 x float> %shuffle
	}			}

	define <32 x i8> @shuffle_v32i8_2323(<32 x i8> %a, <32 x i8> %b) nounwind uwtable readnone ssp {			define <32 x i8> @shuffle_v32i8_2323(<32 x i8> %a, <32 x i8> %b) nounwind uwtable readnone ssp {
	; ALL-LABEL: shuffle_v32i8_2323:			; AVX1-LABEL: shuffle_v32i8_2323:
	; ALL: ## BB#0: ## %entry			; AVX1: ## BB#0: ## %entry
	; ALL-NEXT: vperm2f128 {{.*#+}} ymm0 = ymm0[2,3,2,3]			; AVX1-NEXT: vperm2f128 {{.*#+}} ymm0 = ymm0[2,3,2,3]
	; ALL-NEXT: retq			; AVX1-NEXT: retq
				;
				; AVX2-LABEL: shuffle_v32i8_2323:
				; AVX2: ## BB#0: ## %entry
				; AVX2-NEXT: vpermq {{.*#+}} ymm0 = ymm0[2,3,2,3]
				; AVX2-NEXT: retq
	entry:			entry:
	%shuffle = shufflevector <32 x i8> %a, <32 x i8> %b, <32 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>			%shuffle = shufflevector <32 x i8> %a, <32 x i8> %b, <32 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
	ret <32 x i8> %shuffle			ret <32 x i8> %shuffle
	}			}

	define <32 x i8> @shuffle_v32i8_2323_domain(<32 x i8> %a, <32 x i8> %b) nounwind uwtable readnone ssp {			define <32 x i8> @shuffle_v32i8_2323_domain(<32 x i8> %a, <32 x i8> %b) nounwind uwtable readnone ssp {
	; AVX1-LABEL: shuffle_v32i8_2323_domain:			; AVX1-LABEL: shuffle_v32i8_2323_domain:
	; AVX1: ## BB#0: ## %entry			; AVX1: ## BB#0: ## %entry
	; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm0			; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm0
	; AVX1-NEXT: vpaddb {{.*}}(%rip), %xmm0, %xmm0			; AVX1-NEXT: vpaddb {{.*}}(%rip), %xmm0, %xmm0
	; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0			; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0
	; AVX1-NEXT: vperm2f128 {{.*#+}} ymm0 = ymm0[2,3,2,3]			; AVX1-NEXT: vperm2f128 {{.*#+}} ymm0 = ymm0[2,3,2,3]
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-LABEL: shuffle_v32i8_2323_domain:			; AVX2-LABEL: shuffle_v32i8_2323_domain:
	; AVX2: ## BB#0: ## %entry			; AVX2: ## BB#0: ## %entry
	; AVX2-NEXT: vpaddb {{.*}}(%rip), %ymm0, %ymm0			; AVX2-NEXT: vpaddb {{.*}}(%rip), %ymm0, %ymm0
	; AVX2-NEXT: vperm2i128 {{.*#+}} ymm0 = ymm0[2,3,2,3]			; AVX2-NEXT: vpermq {{.*#+}} ymm0 = ymm0[2,3,2,3]
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	entry:			entry:
	; add forces execution domain			; add forces execution domain
	%a2 = add <32 x i8> %a, <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>			%a2 = add <32 x i8> %a, <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
	%shuffle = shufflevector <32 x i8> %a2, <32 x i8> %b, <32 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>			%shuffle = shufflevector <32 x i8> %a2, <32 x i8> %b, <32 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
	ret <32 x i8> %shuffle			ret <32 x i8> %shuffle
	}			}

	(100 unchanged lines not shown)
	; ALL-NEXT: vperm2f128 {{.*#+}} ymm0 = ymm0[2,3],ymm1[0,1]			; ALL-NEXT: vperm2f128 {{.*#+}} ymm0 = ymm0[2,3],ymm1[0,1]
	; ALL-NEXT: retq			; ALL-NEXT: retq
	entry:			entry:
	%shuffle = shufflevector <8 x float> %a, <8 x float> %b, <8 x i32> <i32 undef, i32 undef, i32 6, i32 7, i32 undef, i32 9, i32 undef, i32 11>			%shuffle = shufflevector <8 x float> %a, <8 x float> %b, <8 x i32> <i32 undef, i32 undef, i32 6, i32 7, i32 undef, i32 9, i32 undef, i32 11>
	ret <8 x float> %shuffle			ret <8 x float> %shuffle
	}			}

	define <8 x float> @shuffle_v8f32_uu67uu67(<8 x float> %a, <8 x float> %b) nounwind uwtable readnone ssp {			define <8 x float> @shuffle_v8f32_uu67uu67(<8 x float> %a, <8 x float> %b) nounwind uwtable readnone ssp {
	; ALL-LABEL: shuffle_v8f32_uu67uu67:			; AVX1-LABEL: shuffle_v8f32_uu67uu67:
	; ALL: ## BB#0: ## %entry			; AVX1: ## BB#0: ## %entry
	; ALL-NEXT: vperm2f128 {{.*#+}} ymm0 = ymm0[2,3,2,3]			; AVX1-NEXT: vperm2f128 {{.*#+}} ymm0 = ymm0[2,3,2,3]
	; ALL-NEXT: retq			; AVX1-NEXT: retq
				;
				; AVX2-LABEL: shuffle_v8f32_uu67uu67:
				; AVX2: ## BB#0: ## %entry
				; AVX2-NEXT: vpermpd {{.*#+}} ymm0 = ymm0[0,3,2,3]
				; AVX2-NEXT: retq
	entry:			entry:
	%shuffle = shufflevector <8 x float> %a, <8 x float> %b, <8 x i32> <i32 undef, i32 undef, i32 6, i32 7, i32 undef, i32 undef, i32 6, i32 7>			%shuffle = shufflevector <8 x float> %a, <8 x float> %b, <8 x i32> <i32 undef, i32 undef, i32 6, i32 7, i32 undef, i32 undef, i32 6, i32 7>
	ret <8 x float> %shuffle			ret <8 x float> %shuffle
	}			}

	define <8 x float> @shuffle_v8f32_uu67uuab(<8 x float> %a, <8 x float> %b) nounwind uwtable readnone ssp {			define <8 x float> @shuffle_v8f32_uu67uuab(<8 x float> %a, <8 x float> %b) nounwind uwtable readnone ssp {
	; ALL-LABEL: shuffle_v8f32_uu67uuab:			; ALL-LABEL: shuffle_v8f32_uu67uuab:
	; ALL: ## BB#0: ## %entry			; ALL: ## BB#0: ## %entry
	(10 unchanged lines not shown)
	; ALL-NEXT: vperm2f128 {{.*#+}} ymm0 = ymm0[2,3],ymm1[2,3]			; ALL-NEXT: vperm2f128 {{.*#+}} ymm0 = ymm0[2,3],ymm1[2,3]
	; ALL-NEXT: retq			; ALL-NEXT: retq
	entry:			entry:
	%shuffle = shufflevector <8 x float> %a, <8 x float> %b, <8 x i32> <i32 undef, i32 undef, i32 6, i32 7, i32 undef, i32 undef, i32 14, i32 15>			%shuffle = shufflevector <8 x float> %a, <8 x float> %b, <8 x i32> <i32 undef, i32 undef, i32 6, i32 7, i32 undef, i32 undef, i32 14, i32 15>
	ret <8 x float> %shuffle			ret <8 x float> %shuffle
	}			}

	define <8 x float> @shuffle_v8f32_uu674567(<8 x float> %a, <8 x float> %b) nounwind uwtable readnone ssp {			define <8 x float> @shuffle_v8f32_uu674567(<8 x float> %a, <8 x float> %b) nounwind uwtable readnone ssp {
	; ALL-LABEL: shuffle_v8f32_uu674567:			; AVX1-LABEL: shuffle_v8f32_uu674567:
	; ALL: ## BB#0: ## %entry			; AVX1: ## BB#0: ## %entry
	; ALL-NEXT: vperm2f128 {{.*#+}} ymm0 = ymm0[2,3,2,3]			; AVX1-NEXT: vperm2f128 {{.*#+}} ymm0 = ymm0[2,3,2,3]
	; ALL-NEXT: retq			; AVX1-NEXT: retq
				;
				; AVX2-LABEL: shuffle_v8f32_uu674567:
				; AVX2: ## BB#0: ## %entry
				; AVX2-NEXT: vpermpd {{.*#+}} ymm0 = ymm0[0,3,2,3]
				; AVX2-NEXT: retq
	entry:			entry:
	%shuffle = shufflevector <8 x float> %a, <8 x float> %b, <8 x i32> <i32 undef, i32 undef, i32 6, i32 7, i32 4, i32 5, i32 6, i32 7>			%shuffle = shufflevector <8 x float> %a, <8 x float> %b, <8 x i32> <i32 undef, i32 undef, i32 6, i32 7, i32 4, i32 5, i32 6, i32 7>
	ret <8 x float> %shuffle			ret <8 x float> %shuffle
	}			}

	define <8 x float> @shuffle_v8f32_uu6789ab(<8 x float> %a, <8 x float> %b) nounwind uwtable readnone ssp {			define <8 x float> @shuffle_v8f32_uu6789ab(<8 x float> %a, <8 x float> %b) nounwind uwtable readnone ssp {
	; ALL-LABEL: shuffle_v8f32_uu6789ab:			; ALL-LABEL: shuffle_v8f32_uu6789ab:
	; ALL: ## BB#0: ## %entry			; ALL: ## BB#0: ## %entry
	; ALL-NEXT: vperm2f128 {{.*#+}} ymm0 = ymm0[2,3],ymm1[0,1]			; ALL-NEXT: vperm2f128 {{.*#+}} ymm0 = ymm0[2,3],ymm1[0,1]
	; ALL-NEXT: retq			; ALL-NEXT: retq
	entry:			entry:
	%shuffle = shufflevector <8 x float> %a, <8 x float> %b, <8 x i32> <i32 undef, i32 undef, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11>			%shuffle = shufflevector <8 x float> %a, <8 x float> %b, <8 x i32> <i32 undef, i32 undef, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11>
	ret <8 x float> %shuffle			ret <8 x float> %shuffle
	}			}

	define <8 x float> @shuffle_v8f32_4567uu67(<8 x float> %a, <8 x float> %b) nounwind uwtable readnone ssp {			define <8 x float> @shuffle_v8f32_4567uu67(<8 x float> %a, <8 x float> %b) nounwind uwtable readnone ssp {
	; ALL-LABEL: shuffle_v8f32_4567uu67:			; AVX1-LABEL: shuffle_v8f32_4567uu67:
	; ALL: ## BB#0: ## %entry			; AVX1: ## BB#0: ## %entry
	; ALL-NEXT: vperm2f128 {{.*#+}} ymm0 = ymm0[2,3,2,3]			; AVX1-NEXT: vperm2f128 {{.*#+}} ymm0 = ymm0[2,3,2,3]
	; ALL-NEXT: retq			; AVX1-NEXT: retq
				;
				; AVX2-LABEL: shuffle_v8f32_4567uu67:
				; AVX2: ## BB#0: ## %entry
				; AVX2-NEXT: vpermpd {{.*#+}} ymm0 = ymm0[2,3,2,3]
				; AVX2-NEXT: retq
	entry:			entry:
	%shuffle = shufflevector <8 x float> %a, <8 x float> %b, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 6, i32 7>			%shuffle = shufflevector <8 x float> %a, <8 x float> %b, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 6, i32 7>
	ret <8 x float> %shuffle			ret <8 x float> %shuffle
	}			}

	define <8 x float> @shuffle_v8f32_4567uuef(<8 x float> %a, <8 x float> %b) nounwind uwtable readnone ssp {			define <8 x float> @shuffle_v8f32_4567uuef(<8 x float> %a, <8 x float> %b) nounwind uwtable readnone ssp {
	; ALL-LABEL: shuffle_v8f32_4567uuef:			; ALL-LABEL: shuffle_v8f32_4567uuef:
	; ALL: ## BB#0: ## %entry			; ALL: ## BB#0: ## %entry
	(363 unchanged lines not shown)
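The test names above encode v8f32 shuffle masks, and the lowering code in the diff calls canWidenShuffleElements to check whether a v4 mask can be treated as a shuffle of two 128-bit lanes. The following is a simplified sketch of that widening idea, written in the spirit of canWidenShuffleElements but not LLVM's actual code (`widenMask` is a hypothetical name; it needs C++17 for `std::optional`). Applied twice, it turns shuffle_v8f32_45670123's mask into the 64-bit mask [2,3,0,1] printed in the AVX2 check line, and then into the lane mask [1,0].

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Merge each adjacent pair of mask elements into one element of twice the
// width; -1 means undef. Returns std::nullopt if any pair is not an aligned,
// consecutive pair (undef halves are allowed to match anything).
std::optional<std::vector<int>> widenMask(const std::vector<int> &mask) {
  if (mask.size() % 2 != 0)
    return std::nullopt;
  std::vector<int> wide;
  for (std::size_t i = 0; i < mask.size(); i += 2) {
    const int lo = mask[i], hi = mask[i + 1];
    if (lo < 0 && hi < 0)
      wide.push_back(-1);      // both halves undef
    else if (lo < 0 && hi % 2 == 1)
      wide.push_back(hi / 2);  // only the odd (upper) half is defined
    else if (hi < 0 && lo % 2 == 0)
      wide.push_back(lo / 2);  // only the even (lower) half is defined
    else if (lo >= 0 && lo % 2 == 0 && hi == lo + 1)
      wide.push_back(lo / 2);  // aligned consecutive pair
    else
      return std::nullopt;
  }
  return wide;
}
```

A pair such as (1, 0) cannot be widened, so `widenMask({1, 0, 2, 3})` returns `std::nullopt`.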

test/CodeGen/X86/vector-shuffle-256-v16.ll

	(449 unchanged lines not shown)
	; AVX1: # BB#0:			; AVX1: # BB#0:
	; AVX1-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[0,1,0,1,0,1,0,1,0,1,0,1,2,3,0,1]			; AVX1-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[0,1,0,1,0,1,0,1,0,1,0,1,2,3,0,1]
	; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0			; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-LABEL: shuffle_v16i16_00_00_00_00_00_00_01_00_00_00_00_00_00_00_01_00:			; AVX2-LABEL: shuffle_v16i16_00_00_00_00_00_00_01_00_00_00_00_00_00_00_01_00:
	; AVX2: # BB#0:			; AVX2: # BB#0:
	; AVX2-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[0,1,0,1,0,1,0,1,0,1,0,1,2,3,0,1]			; AVX2-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[0,1,0,1,0,1,0,1,0,1,0,1,2,3,0,1]
	; AVX2-NEXT: vinserti128 $1, %xmm0, %ymm0, %ymm0			; AVX2-NEXT: vpermq {{.*#+}} ymm0 = ymm0[0,1,0,1]
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	%shuffle = shufflevector <16 x i16> %a, <16 x i16> %b, <16 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 1, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 1, i32 0>			%shuffle = shufflevector <16 x i16> %a, <16 x i16> %b, <16 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 1, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 1, i32 0>
	ret <16 x i16> %shuffle			ret <16 x i16> %shuffle
	}			}

	define <16 x i16> @shuffle_v16i16_00_00_00_00_00_02_00_00_00_00_00_00_00_02_00_00(<16 x i16> %a, <16 x i16> %b) {			define <16 x i16> @shuffle_v16i16_00_00_00_00_00_02_00_00_00_00_00_00_00_02_00_00(<16 x i16> %a, <16 x i16> %b) {
	; AVX1-LABEL: shuffle_v16i16_00_00_00_00_00_02_00_00_00_00_00_00_00_02_00_00:			; AVX1-LABEL: shuffle_v16i16_00_00_00_00_00_02_00_00_00_00_00_00_00_02_00_00:
	; AVX1: # BB#0:			; AVX1: # BB#0:
	; AVX1-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[0,1,0,1,0,1,0,1,0,1,4,5,0,1,0,1]			; AVX1-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[0,1,0,1,0,1,0,1,0,1,4,5,0,1,0,1]
	; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0			; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-LABEL: shuffle_v16i16_00_00_00_00_00_02_00_00_00_00_00_00_00_02_00_00:			; AVX2-LABEL: shuffle_v16i16_00_00_00_00_00_02_00_00_00_00_00_00_00_02_00_00:
	; AVX2: # BB#0:			; AVX2: # BB#0:
	; AVX2-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[0,1,0,1,0,1,0,1,0,1,4,5,0,1,0,1]			; AVX2-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[0,1,0,1,0,1,0,1,0,1,4,5,0,1,0,1]
	; AVX2-NEXT: vinserti128 $1, %xmm0, %ymm0, %ymm0			; AVX2-NEXT: vpermq {{.*#+}} ymm0 = ymm0[0,1,0,1]
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	%shuffle = shufflevector <16 x i16> %a, <16 x i16> %b, <16 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 2, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 2, i32 0, i32 0>			%shuffle = shufflevector <16 x i16> %a, <16 x i16> %b, <16 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 2, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 2, i32 0, i32 0>
	ret <16 x i16> %shuffle			ret <16 x i16> %shuffle
	}			}

	define <16 x i16> @shuffle_v16i16_00_00_00_00_03_00_00_00_00_00_00_00_03_00_00_00(<16 x i16> %a, <16 x i16> %b) {			define <16 x i16> @shuffle_v16i16_00_00_00_00_03_00_00_00_00_00_00_00_03_00_00_00(<16 x i16> %a, <16 x i16> %b) {
	; AVX1-LABEL: shuffle_v16i16_00_00_00_00_03_00_00_00_00_00_00_00_03_00_00_00:			; AVX1-LABEL: shuffle_v16i16_00_00_00_00_03_00_00_00_00_00_00_00_03_00_00_00:
	; AVX1: # BB#0:			; AVX1: # BB#0:
	; AVX1-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[0,1,0,1,0,1,0,1,6,7,0,1,0,1,0,1]			; AVX1-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[0,1,0,1,0,1,0,1,6,7,0,1,0,1,0,1]
	; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0			; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-LABEL: shuffle_v16i16_00_00_00_00_03_00_00_00_00_00_00_00_03_00_00_00:			; AVX2-LABEL: shuffle_v16i16_00_00_00_00_03_00_00_00_00_00_00_00_03_00_00_00:
	; AVX2: # BB#0:			; AVX2: # BB#0:
	; AVX2-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[0,1,0,1,0,1,0,1,6,7,0,1,0,1,0,1]			; AVX2-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[0,1,0,1,0,1,0,1,6,7,0,1,0,1,0,1]
	; AVX2-NEXT: vinserti128 $1, %xmm0, %ymm0, %ymm0			; AVX2-NEXT: vpermq {{.*#+}} ymm0 = ymm0[0,1,0,1]
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	%shuffle = shufflevector <16 x i16> %a, <16 x i16> %b, <16 x i32> <i32 0, i32 0, i32 0, i32 0, i32 3, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 3, i32 0, i32 0, i32 0>			%shuffle = shufflevector <16 x i16> %a, <16 x i16> %b, <16 x i32> <i32 0, i32 0, i32 0, i32 0, i32 3, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 3, i32 0, i32 0, i32 0>
	ret <16 x i16> %shuffle			ret <16 x i16> %shuffle
	}			}

	define <16 x i16> @shuffle_v16i16_00_00_00_04_00_00_00_00_00_00_00_04_00_00_00_00(<16 x i16> %a, <16 x i16> %b) {			define <16 x i16> @shuffle_v16i16_00_00_00_04_00_00_00_00_00_00_00_04_00_00_00_00(<16 x i16> %a, <16 x i16> %b) {
	; AVX1-LABEL: shuffle_v16i16_00_00_00_04_00_00_00_00_00_00_00_04_00_00_00_00:			; AVX1-LABEL: shuffle_v16i16_00_00_00_04_00_00_00_00_00_00_00_04_00_00_00_00:
	; AVX1: # BB#0:			; AVX1: # BB#0:
	; AVX1-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[0,1,0,1,0,1,8,9,0,1,0,1,0,1,0,1]			; AVX1-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[0,1,0,1,0,1,8,9,0,1,0,1,0,1,0,1]
	; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0			; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-LABEL: shuffle_v16i16_00_00_00_04_00_00_00_00_00_00_00_04_00_00_00_00:			; AVX2-LABEL: shuffle_v16i16_00_00_00_04_00_00_00_00_00_00_00_04_00_00_00_00:
	; AVX2: # BB#0:			; AVX2: # BB#0:
	; AVX2-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[0,1,0,1,0,1,8,9,0,1,0,1,0,1,0,1]			; AVX2-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[0,1,0,1,0,1,8,9,0,1,0,1,0,1,0,1]
	; AVX2-NEXT: vinserti128 $1, %xmm0, %ymm0, %ymm0			; AVX2-NEXT: vpermq {{.*#+}} ymm0 = ymm0[0,1,0,1]
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	%shuffle = shufflevector <16 x i16> %a, <16 x i16> %b, <16 x i32> <i32 0, i32 0, i32 0, i32 4, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 4, i32 0, i32 0, i32 0, i32 0>			%shuffle = shufflevector <16 x i16> %a, <16 x i16> %b, <16 x i32> <i32 0, i32 0, i32 0, i32 4, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 4, i32 0, i32 0, i32 0, i32 0>
	ret <16 x i16> %shuffle			ret <16 x i16> %shuffle
	}			}

	define <16 x i16> @shuffle_v16i16_00_00_05_00_00_00_00_00_00_00_05_00_00_00_00_00(<16 x i16> %a, <16 x i16> %b) {			define <16 x i16> @shuffle_v16i16_00_00_05_00_00_00_00_00_00_00_05_00_00_00_00_00(<16 x i16> %a, <16 x i16> %b) {
	; AVX1-LABEL: shuffle_v16i16_00_00_05_00_00_00_00_00_00_00_05_00_00_00_00_00:			; AVX1-LABEL: shuffle_v16i16_00_00_05_00_00_00_00_00_00_00_05_00_00_00_00_00:
	; AVX1: # BB#0:			; AVX1: # BB#0:
	; AVX1-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[0,1,0,1,10,11,0,1,0,1,0,1,0,1,0,1]			; AVX1-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[0,1,0,1,10,11,0,1,0,1,0,1,0,1,0,1]
	; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0			; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-LABEL: shuffle_v16i16_00_00_05_00_00_00_00_00_00_00_05_00_00_00_00_00:			; AVX2-LABEL: shuffle_v16i16_00_00_05_00_00_00_00_00_00_00_05_00_00_00_00_00:
	; AVX2: # BB#0:			; AVX2: # BB#0:
	; AVX2-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[0,1,0,1,10,11,0,1,0,1,0,1,0,1,0,1]			; AVX2-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[0,1,0,1,10,11,0,1,0,1,0,1,0,1,0,1]
	; AVX2-NEXT: vinserti128 $1, %xmm0, %ymm0, %ymm0			; AVX2-NEXT: vpermq {{.*#+}} ymm0 = ymm0[0,1,0,1]
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	%shuffle = shufflevector <16 x i16> %a, <16 x i16> %b, <16 x i32> <i32 0, i32 0, i32 5, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 5, i32 0, i32 0, i32 0, i32 0, i32 0>			%shuffle = shufflevector <16 x i16> %a, <16 x i16> %b, <16 x i32> <i32 0, i32 0, i32 5, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 5, i32 0, i32 0, i32 0, i32 0, i32 0>
	ret <16 x i16> %shuffle			ret <16 x i16> %shuffle
	}			}

	define <16 x i16> @shuffle_v16i16_00_06_00_00_00_00_00_00_00_06_00_00_00_00_00_00(<16 x i16> %a, <16 x i16> %b) {			define <16 x i16> @shuffle_v16i16_00_06_00_00_00_00_00_00_00_06_00_00_00_00_00_00(<16 x i16> %a, <16 x i16> %b) {
	; AVX1-LABEL: shuffle_v16i16_00_06_00_00_00_00_00_00_00_06_00_00_00_00_00_00:			; AVX1-LABEL: shuffle_v16i16_00_06_00_00_00_00_00_00_00_06_00_00_00_00_00_00:
	; AVX1: # BB#0:			; AVX1: # BB#0:
	; AVX1-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[0,1,12,13,0,1,0,1,0,1,0,1,0,1,0,1]			; AVX1-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[0,1,12,13,0,1,0,1,0,1,0,1,0,1,0,1]
	; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0			; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-LABEL: shuffle_v16i16_00_06_00_00_00_00_00_00_00_06_00_00_00_00_00_00:			; AVX2-LABEL: shuffle_v16i16_00_06_00_00_00_00_00_00_00_06_00_00_00_00_00_00:
	; AVX2: # BB#0:			; AVX2: # BB#0:
	; AVX2-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[0,1,12,13,0,1,0,1,0,1,0,1,0,1,0,1]			; AVX2-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[0,1,12,13,0,1,0,1,0,1,0,1,0,1,0,1]
	; AVX2-NEXT: vinserti128 $1, %xmm0, %ymm0, %ymm0			; AVX2-NEXT: vpermq {{.*#+}} ymm0 = ymm0[0,1,0,1]
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	%shuffle = shufflevector <16 x i16> %a, <16 x i16> %b, <16 x i32> <i32 0, i32 6, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 6, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>			%shuffle = shufflevector <16 x i16> %a, <16 x i16> %b, <16 x i32> <i32 0, i32 6, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 6, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>
	ret <16 x i16> %shuffle			ret <16 x i16> %shuffle
	}			}

	define <16 x i16> @shuffle_v16i16_07_00_00_00_00_00_00_00_07_00_00_00_00_00_00_00(<16 x i16> %a, <16 x i16> %b) {			define <16 x i16> @shuffle_v16i16_07_00_00_00_00_00_00_00_07_00_00_00_00_00_00_00(<16 x i16> %a, <16 x i16> %b) {
	; AVX1-LABEL: shuffle_v16i16_07_00_00_00_00_00_00_00_07_00_00_00_00_00_00_00:			; AVX1-LABEL: shuffle_v16i16_07_00_00_00_00_00_00_00_07_00_00_00_00_00_00_00:
	; AVX1: # BB#0:			; AVX1: # BB#0:
	; AVX1-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[14,15,0,1,0,1,0,1,0,1,0,1,0,1,0,1]			; AVX1-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[14,15,0,1,0,1,0,1,0,1,0,1,0,1,0,1]
	; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0			; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-LABEL: shuffle_v16i16_07_00_00_00_00_00_00_00_07_00_00_00_00_00_00_00:			; AVX2-LABEL: shuffle_v16i16_07_00_00_00_00_00_00_00_07_00_00_00_00_00_00_00:
	; AVX2: # BB#0:			; AVX2: # BB#0:
	; AVX2-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[14,15,0,1,0,1,0,1,0,1,0,1,0,1,0,1]			; AVX2-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[14,15,0,1,0,1,0,1,0,1,0,1,0,1,0,1]
	; AVX2-NEXT: vinserti128 $1, %xmm0, %ymm0, %ymm0			; AVX2-NEXT: vpermq {{.*#+}} ymm0 = ymm0[0,1,0,1]
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	%shuffle = shufflevector <16 x i16> %a, <16 x i16> %b, <16 x i32> <i32 7, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 7, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>			%shuffle = shufflevector <16 x i16> %a, <16 x i16> %b, <16 x i32> <i32 7, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 7, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>
	ret <16 x i16> %shuffle			ret <16 x i16> %shuffle
	}			}

	define <16 x i16> @shuffle_v16i16_00_17_02_19_04_21_06_23_08_25_10_27_12_29_14_31(<16 x i16> %a, <16 x i16> %b) {			define <16 x i16> @shuffle_v16i16_00_17_02_19_04_21_06_23_08_25_10_27_12_29_14_31(<16 x i16> %a, <16 x i16> %b) {
	; AVX1-LABEL: shuffle_v16i16_00_17_02_19_04_21_06_23_08_25_10_27_12_29_14_31:			; AVX1-LABEL: shuffle_v16i16_00_17_02_19_04_21_06_23_08_25_10_27_12_29_14_31:
	; AVX1: # BB#0:			; AVX1: # BB#0:
	… 2,920 lines not shown …
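In the AVX2 check lines above, `vinserti128 $1, %xmm0, %ymm0, %ymm0` is replaced by `vpermq {{.*#+}} ymm0 = ymm0[0,1,0,1]`. Both copy the low 128-bit lane into the high lane. The difference, per the revision summary, is that `vpermq` reads its repeated input as a single 256-bit operand, which can be folded from memory. The following is a minimal sketch, assuming a register is modelled as four 64-bit qwords (lowest first), that checks the two forms produce the same value. It models data movement only (no memory operands), and the helper names are hypothetical, not LLVM APIs.

```python
# Model a 256-bit YMM register as a list of four 64-bit qwords, lowest first.

def vinserti128_hi(base, src):
    """vinserti128 $1, %xmm(src), %ymm(base), %ymm(dst): keep base's low
    128 bits (qwords 0-1) and replace the high 128 bits (qwords 2-3)
    with the low 128 bits of src."""
    return [base[0], base[1], src[0], src[1]]


def vpermq(src, imm8):
    """vpermq $imm8, %ymm(src), %ymm(dst): destination qword i is
    src[(imm8 >> 2*i) & 3]."""
    return [src[(imm8 >> (2 * i)) & 3] for i in range(4)]


def perm_imm(indices):
    """Pack the qword indices printed in the check comments, e.g.
    ymm0[0,1,0,1], into the 8-bit vpermq immediate."""
    imm = 0
    for i, idx in enumerate(indices):
        imm |= (idx & 3) << (2 * i)
    return imm


ymm0 = [0x10, 0x11, 0x12, 0x13]             # arbitrary distinct values
old = vinserti128_hi(ymm0, ymm0)            # vinserti128 $1, %xmm0, %ymm0, %ymm0
new = vpermq(ymm0, perm_imm([0, 1, 0, 1]))  # vpermq ymm0 = ymm0[0,1,0,1]
print(hex(perm_imm([0, 1, 0, 1])), old == new)  # 0x44 True
```

The same model covers the `vperm2i128 ... ymm0[2,3,2,3]` to `vpermq ... ymm0[2,3,2,3]` changes further down. That permutation broadcasts the high lane instead and packs to immediate `0xEE`.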

test/CodeGen/X86/vector-shuffle-256-v32.ll

	… 812 lines not shown …
	; AVX1: # BB#0:			; AVX1: # BB#0:
	; AVX1-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0]			; AVX1-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0]
	; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0			; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-LABEL: shuffle_v32i8_00_00_00_00_00_00_00_00_00_00_00_00_00_00_01_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_01_00:			; AVX2-LABEL: shuffle_v32i8_00_00_00_00_00_00_00_00_00_00_00_00_00_00_01_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_01_00:
	; AVX2: # BB#0:			; AVX2: # BB#0:
	; AVX2-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0]			; AVX2-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0]
	; AVX2-NEXT: vinserti128 $1, %xmm0, %ymm0, %ymm0			; AVX2-NEXT: vpermq {{.*#+}} ymm0 = ymm0[0,1,0,1]
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	%shuffle = shufflevector <32 x i8> %a, <32 x i8> %b, <32 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 1, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 1, i32 0>			%shuffle = shufflevector <32 x i8> %a, <32 x i8> %b, <32 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 1, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 1, i32 0>
	ret <32 x i8> %shuffle			ret <32 x i8> %shuffle
	}			}

	define <32 x i8> @shuffle_v32i8_00_00_00_00_00_00_00_00_00_00_00_00_00_02_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_02_00_00(<32 x i8> %a, <32 x i8> %b) {			define <32 x i8> @shuffle_v32i8_00_00_00_00_00_00_00_00_00_00_00_00_00_02_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_02_00_00(<32 x i8> %a, <32 x i8> %b) {
	; AVX1-LABEL: shuffle_v32i8_00_00_00_00_00_00_00_00_00_00_00_00_00_02_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_02_00_00:			; AVX1-LABEL: shuffle_v32i8_00_00_00_00_00_00_00_00_00_00_00_00_00_02_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_02_00_00:
	; AVX1: # BB#0:			; AVX1: # BB#0:
	; AVX1-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0]			; AVX1-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0]
	; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0			; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-LABEL: shuffle_v32i8_00_00_00_00_00_00_00_00_00_00_00_00_00_02_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_02_00_00:			; AVX2-LABEL: shuffle_v32i8_00_00_00_00_00_00_00_00_00_00_00_00_00_02_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_02_00_00:
	; AVX2: # BB#0:			; AVX2: # BB#0:
	; AVX2-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0]			; AVX2-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0]
	; AVX2-NEXT: vinserti128 $1, %xmm0, %ymm0, %ymm0			; AVX2-NEXT: vpermq {{.*#+}} ymm0 = ymm0[0,1,0,1]
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	%shuffle = shufflevector <32 x i8> %a, <32 x i8> %b, <32 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 2, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 2, i32 0, i32 0>			%shuffle = shufflevector <32 x i8> %a, <32 x i8> %b, <32 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 2, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 2, i32 0, i32 0>
	ret <32 x i8> %shuffle			ret <32 x i8> %shuffle
	}			}

	define <32 x i8> @shuffle_v32i8_00_00_00_00_00_00_00_00_07_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_07_00_00_00_00_00_00_00(<32 x i8> %a, <32 x i8> %b) {			define <32 x i8> @shuffle_v32i8_00_00_00_00_00_00_00_00_07_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_07_00_00_00_00_00_00_00(<32 x i8> %a, <32 x i8> %b) {
	; AVX1-LABEL: shuffle_v32i8_00_00_00_00_00_00_00_00_07_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_07_00_00_00_00_00_00_00:			; AVX1-LABEL: shuffle_v32i8_00_00_00_00_00_00_00_00_07_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_07_00_00_00_00_00_00_00:
	; AVX1: # BB#0:			; AVX1: # BB#0:
	; AVX1-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[0,0,0,0,0,0,0,0,7,0,0,0,0,0,0,0]			; AVX1-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[0,0,0,0,0,0,0,0,7,0,0,0,0,0,0,0]
	; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0			; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-LABEL: shuffle_v32i8_00_00_00_00_00_00_00_00_07_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_07_00_00_00_00_00_00_00:			; AVX2-LABEL: shuffle_v32i8_00_00_00_00_00_00_00_00_07_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_07_00_00_00_00_00_00_00:
	; AVX2: # BB#0:			; AVX2: # BB#0:
	; AVX2-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[0,0,0,0,0,0,0,0,7,0,0,0,0,0,0,0]			; AVX2-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[0,0,0,0,0,0,0,0,7,0,0,0,0,0,0,0]
	; AVX2-NEXT: vinserti128 $1, %xmm0, %ymm0, %ymm0			; AVX2-NEXT: vpermq {{.*#+}} ymm0 = ymm0[0,1,0,1]
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	%shuffle = shufflevector <32 x i8> %a, <32 x i8> %b, <32 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 7, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 7, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>			%shuffle = shufflevector <32 x i8> %a, <32 x i8> %b, <32 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 7, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 7, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>
	ret <32 x i8> %shuffle			ret <32 x i8> %shuffle
	}			}

	define <32 x i8> @shuffle_v32i8_00_00_00_00_00_00_00_08_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_08_00_00_00_00_00_00_00_00(<32 x i8> %a, <32 x i8> %b) {			define <32 x i8> @shuffle_v32i8_00_00_00_00_00_00_00_08_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_08_00_00_00_00_00_00_00_00(<32 x i8> %a, <32 x i8> %b) {
	; AVX1-LABEL: shuffle_v32i8_00_00_00_00_00_00_00_08_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_08_00_00_00_00_00_00_00_00:			; AVX1-LABEL: shuffle_v32i8_00_00_00_00_00_00_00_08_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_08_00_00_00_00_00_00_00_00:
	; AVX1: # BB#0:			; AVX1: # BB#0:
	; AVX1-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[0,0,0,0,0,0,0,8,0,0,0,0,0,0,0,0]			; AVX1-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[0,0,0,0,0,0,0,8,0,0,0,0,0,0,0,0]
	; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0			; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-LABEL: shuffle_v32i8_00_00_00_00_00_00_00_08_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_08_00_00_00_00_00_00_00_00:			; AVX2-LABEL: shuffle_v32i8_00_00_00_00_00_00_00_08_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_08_00_00_00_00_00_00_00_00:
	; AVX2: # BB#0:			; AVX2: # BB#0:
	; AVX2-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[0,0,0,0,0,0,0,8,0,0,0,0,0,0,0,0]			; AVX2-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[0,0,0,0,0,0,0,8,0,0,0,0,0,0,0,0]
	; AVX2-NEXT: vinserti128 $1, %xmm0, %ymm0, %ymm0			; AVX2-NEXT: vpermq {{.*#+}} ymm0 = ymm0[0,1,0,1]
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	%shuffle = shufflevector <32 x i8> %a, <32 x i8> %b, <32 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 8, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 8, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>			%shuffle = shufflevector <32 x i8> %a, <32 x i8> %b, <32 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 8, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 8, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>
	ret <32 x i8> %shuffle			ret <32 x i8> %shuffle
	}			}

	define <32 x i8> @shuffle_v32i8_00_14_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_14_00_00_00_00_00_00_00_00_00_00_00_00_00_00(<32 x i8> %a, <32 x i8> %b) {			define <32 x i8> @shuffle_v32i8_00_14_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_14_00_00_00_00_00_00_00_00_00_00_00_00_00_00(<32 x i8> %a, <32 x i8> %b) {
	; AVX1-LABEL: shuffle_v32i8_00_14_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_14_00_00_00_00_00_00_00_00_00_00_00_00_00_00:			; AVX1-LABEL: shuffle_v32i8_00_14_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_14_00_00_00_00_00_00_00_00_00_00_00_00_00_00:
	; AVX1: # BB#0:			; AVX1: # BB#0:
	; AVX1-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[0,14,0,0,0,0,0,0,0,0,0,0,0,0,0,0]			; AVX1-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[0,14,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
	; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0			; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-LABEL: shuffle_v32i8_00_14_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_14_00_00_00_00_00_00_00_00_00_00_00_00_00_00:			; AVX2-LABEL: shuffle_v32i8_00_14_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_14_00_00_00_00_00_00_00_00_00_00_00_00_00_00:
	; AVX2: # BB#0:			; AVX2: # BB#0:
	; AVX2-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[0,14,0,0,0,0,0,0,0,0,0,0,0,0,0,0]			; AVX2-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[0,14,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
	; AVX2-NEXT: vinserti128 $1, %xmm0, %ymm0, %ymm0			; AVX2-NEXT: vpermq {{.*#+}} ymm0 = ymm0[0,1,0,1]
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	%shuffle = shufflevector <32 x i8> %a, <32 x i8> %b, <32 x i32> <i32 0, i32 14, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 14, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>			%shuffle = shufflevector <32 x i8> %a, <32 x i8> %b, <32 x i32> <i32 0, i32 14, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 14, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>
	ret <32 x i8> %shuffle			ret <32 x i8> %shuffle
	}			}

	define <32 x i8> @shuffle_v32i8_15_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_15_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00(<32 x i8> %a, <32 x i8> %b) {			define <32 x i8> @shuffle_v32i8_15_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_15_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00(<32 x i8> %a, <32 x i8> %b) {
	; AVX1-LABEL: shuffle_v32i8_15_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_15_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00:			; AVX1-LABEL: shuffle_v32i8_15_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_15_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00:
	; AVX1: # BB#0:			; AVX1: # BB#0:
	; AVX1-NEXT: movl $15, %eax			; AVX1-NEXT: movl $15, %eax
	; AVX1-NEXT: vmovd %eax, %xmm1			; AVX1-NEXT: vmovd %eax, %xmm1
	; AVX1-NEXT: vpshufb %xmm1, %xmm0, %xmm0			; AVX1-NEXT: vpshufb %xmm1, %xmm0, %xmm0
	; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0			; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-LABEL: shuffle_v32i8_15_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_15_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00:			; AVX2-LABEL: shuffle_v32i8_15_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_15_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00:
	; AVX2: # BB#0:			; AVX2: # BB#0:
	; AVX2-NEXT: movl $15, %eax			; AVX2-NEXT: movl $15, %eax
	; AVX2-NEXT: vmovd %eax, %xmm1			; AVX2-NEXT: vmovd %eax, %xmm1
	; AVX2-NEXT: vpshufb %xmm1, %xmm0, %xmm0			; AVX2-NEXT: vpshufb %xmm1, %xmm0, %xmm0
	; AVX2-NEXT: vinserti128 $1, %xmm0, %ymm0, %ymm0			; AVX2-NEXT: vpermq {{.*#+}} ymm0 = ymm0[0,1,0,1]
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	%shuffle = shufflevector <32 x i8> %a, <32 x i8> %b, <32 x i32> <i32 15, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 15, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>			%shuffle = shufflevector <32 x i8> %a, <32 x i8> %b, <32 x i32> <i32 15, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 15, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>
	ret <32 x i8> %shuffle			ret <32 x i8> %shuffle
	}			}

	define <32 x i8> @shuffle_v32i8_00_33_02_35_04_37_06_39_08_41_10_43_12_45_14_47_16_49_18_51_20_53_22_55_24_57_26_59_28_61_30_63(<32 x i8> %a, <32 x i8> %b) {			define <32 x i8> @shuffle_v32i8_00_33_02_35_04_37_06_39_08_41_10_43_12_45_14_47_16_49_18_51_20_53_22_55_24_57_26_59_28_61_30_63(<32 x i8> %a, <32 x i8> %b) {
	; AVX1-LABEL: shuffle_v32i8_00_33_02_35_04_37_06_39_08_41_10_43_12_45_14_47_16_49_18_51_20_53_22_55_24_57_26_59_28_61_30_63:			; AVX1-LABEL: shuffle_v32i8_00_33_02_35_04_37_06_39_08_41_10_43_12_45_14_47_16_49_18_51_20_53_22_55_24_57_26_59_28_61_30_63:
	; AVX1: # BB#0:			; AVX1: # BB#0:
	… 1,236 lines not shown …

test/CodeGen/X86/vector-shuffle-256-v4.ll

	… 843 lines not shown …
	; AVX1: # BB#0:			; AVX1: # BB#0:
	; AVX1-NEXT: vunpckhpd {{.*#+}} xmm2 = xmm1[1],xmm0[1]			; AVX1-NEXT: vunpckhpd {{.*#+}} xmm2 = xmm1[1],xmm0[1]
	; AVX1-NEXT: vunpcklpd {{.*#+}} xmm0 = xmm0[0],xmm1[0]			; AVX1-NEXT: vunpcklpd {{.*#+}} xmm0 = xmm0[0],xmm1[0]
	; AVX1-NEXT: vinsertf128 $1, %xmm2, %ymm0, %ymm0			; AVX1-NEXT: vinsertf128 $1, %xmm2, %ymm0, %ymm0
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-LABEL: shuffle_v4i64_0451:			; AVX2-LABEL: shuffle_v4i64_0451:
	; AVX2: # BB#0:			; AVX2: # BB#0:
	; AVX2-NEXT: vinserti128 $1, %xmm0, %ymm0, %ymm0
	; AVX2-NEXT: vpermq {{.*#+}} ymm1 = ymm1[0,0,1,3]			; AVX2-NEXT: vpermq {{.*#+}} ymm1 = ymm1[0,0,1,3]
				; AVX2-NEXT: vpermq {{.*#+}} ymm0 = ymm0[0,1,2,1]
	; AVX2-NEXT: vpblendd {{.*#+}} ymm0 = ymm0[0,1],ymm1[2,3,4,5],ymm0[6,7]			; AVX2-NEXT: vpblendd {{.*#+}} ymm0 = ymm0[0,1],ymm1[2,3,4,5],ymm0[6,7]
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512VL-LABEL: shuffle_v4i64_0451:			; AVX512VL-LABEL: shuffle_v4i64_0451:
	; AVX512VL: # BB#0:			; AVX512VL: # BB#0:
	; AVX512VL-NEXT: vinserti32x4 $1, %xmm0, %ymm0, %ymm0
	; AVX512VL-NEXT: vpermq {{.*#+}} ymm1 = ymm1[0,0,1,3]			; AVX512VL-NEXT: vpermq {{.*#+}} ymm1 = ymm1[0,0,1,3]
				; AVX512VL-NEXT: vpermq {{.*#+}} ymm0 = ymm0[0,1,2,1]
	; AVX512VL-NEXT: vpblendd {{.*#+}} ymm0 = ymm0[0,1],ymm1[2,3,4,5],ymm0[6,7]			; AVX512VL-NEXT: vpblendd {{.*#+}} ymm0 = ymm0[0,1],ymm1[2,3,4,5],ymm0[6,7]
	; AVX512VL-NEXT: retq			; AVX512VL-NEXT: retq
	%shuffle = shufflevector <4 x i64> %a, <4 x i64> %b, <4 x i32> <i32 0, i32 4, i32 5, i32 1>			%shuffle = shufflevector <4 x i64> %a, <4 x i64> %b, <4 x i32> <i32 0, i32 4, i32 5, i32 1>
	ret <4 x i64> %shuffle			ret <4 x i64> %shuffle
	}			}

	define <4 x i64> @shuffle_v4i64_4501(<4 x i64> %a, <4 x i64> %b) {			define <4 x i64> @shuffle_v4i64_4501(<4 x i64> %a, <4 x i64> %b) {
	; AVX1-LABEL: shuffle_v4i64_4501:			; AVX1-LABEL: shuffle_v4i64_4501:
	… 19 lines not shown …
	; AVX1: # BB#0:			; AVX1: # BB#0:
	; AVX1-NEXT: vunpckhpd {{.*#+}} xmm2 = xmm0[1],xmm1[1]			; AVX1-NEXT: vunpckhpd {{.*#+}} xmm2 = xmm0[1],xmm1[1]
	; AVX1-NEXT: vunpcklpd {{.*#+}} xmm0 = xmm1[0],xmm0[0]			; AVX1-NEXT: vunpcklpd {{.*#+}} xmm0 = xmm1[0],xmm0[0]
	; AVX1-NEXT: vinsertf128 $1, %xmm2, %ymm0, %ymm0			; AVX1-NEXT: vinsertf128 $1, %xmm2, %ymm0, %ymm0
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-LABEL: shuffle_v4i64_4015:			; AVX2-LABEL: shuffle_v4i64_4015:
	; AVX2: # BB#0:			; AVX2: # BB#0:
	; AVX2-NEXT: vinserti128 $1, %xmm1, %ymm1, %ymm1			; AVX2-NEXT: vpermq {{.*#+}} ymm1 = ymm1[0,1,2,1]
	; AVX2-NEXT: vpermq {{.*#+}} ymm0 = ymm0[0,0,1,3]			; AVX2-NEXT: vpermq {{.*#+}} ymm0 = ymm0[0,0,1,3]
	; AVX2-NEXT: vpblendd {{.*#+}} ymm0 = ymm1[0,1],ymm0[2,3,4,5],ymm1[6,7]			; AVX2-NEXT: vpblendd {{.*#+}} ymm0 = ymm1[0,1],ymm0[2,3,4,5],ymm1[6,7]
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512VL-LABEL: shuffle_v4i64_4015:			; AVX512VL-LABEL: shuffle_v4i64_4015:
	; AVX512VL: # BB#0:			; AVX512VL: # BB#0:
	; AVX512VL-NEXT: vinserti32x4 $1, %xmm1, %ymm1, %ymm1			; AVX512VL-NEXT: vpermq {{.*#+}} ymm1 = ymm1[0,1,2,1]
	; AVX512VL-NEXT: vpermq {{.*#+}} ymm0 = ymm0[0,0,1,3]			; AVX512VL-NEXT: vpermq {{.*#+}} ymm0 = ymm0[0,0,1,3]
	; AVX512VL-NEXT: vpblendd {{.*#+}} ymm0 = ymm1[0,1],ymm0[2,3,4,5],ymm1[6,7]			; AVX512VL-NEXT: vpblendd {{.*#+}} ymm0 = ymm1[0,1],ymm0[2,3,4,5],ymm1[6,7]
	; AVX512VL-NEXT: retq			; AVX512VL-NEXT: retq
	%shuffle = shufflevector <4 x i64> %a, <4 x i64> %b, <4 x i32> <i32 4, i32 0, i32 1, i32 5>			%shuffle = shufflevector <4 x i64> %a, <4 x i64> %b, <4 x i32> <i32 4, i32 0, i32 1, i32 5>
	ret <4 x i64> %shuffle			ret <4 x i64> %shuffle
	}			}

	define <4 x i64> @shuffle_v4i64_2u35(<4 x i64> %a, <4 x i64> %b) {			define <4 x i64> @shuffle_v4i64_2u35(<4 x i64> %a, <4 x i64> %b) {
	… 574 lines not shown …
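In `shuffle_v4i64_4015`, the new `vpermq` immediate `ymm1[0,1,2,1]` does not reproduce the old `vinserti128` result (`[0,1,0,1]`) exactly. This is fine because the following `vpblendd` reads only qwords 0 and 3 of that register, so qwords 1 and 2 are don't-cares. `shuffle_v4i64_0451` is the mirror case, with `%a` and `%b` swapped. The sketch below checks both sequences against the IR `shufflevector` mask. It assumes `vpblendd` can be modelled per qword, because the dword masks in these tests always move whole pairs. The helper names are hypothetical.

```python
# Qword-level model of the AVX2 sequences for shuffle_v4i64_4015 / _0451.
# Registers are lists of four 64-bit qwords, lowest first.

def vinserti128_hi(base, src):
    # vinserti128 $1: high 128 bits of base replaced by low 128 bits of src.
    return [base[0], base[1], src[0], src[1]]


def vpermq(src, idx):
    # idx lists the source qword for each destination qword, e.g. ymm1[0,1,2,1].
    return [src[i] for i in idx]


def vpblendd_qwords(x, y, take_y):
    # Assumption: dword blend mask moves whole qwords; take_y[i] picks y's qword i.
    return [y[i] if t else x[i] for i, t in enumerate(take_y)]


def shufflevector(a, b, mask):
    # IR semantics: indices 0-3 select from %a, 4-7 select from %b.
    both = a + b
    return [both[m] for m in mask]


a = ["a0", "a1", "a2", "a3"]
b = ["b0", "b1", "b2", "b3"]
expected = shufflevector(a, b, [4, 0, 1, 5])

perm_a = vpermq(a, [0, 0, 1, 3])  # vpermq ymm0 = ymm0[0,0,1,3]
# vpblendd ymm0 = ymm1[0,1],ymm0[2,3,4,5],ymm1[6,7]:
# qwords 0 and 3 come from the rewritten %ymm1, qwords 1 and 2 from perm_a.
from_a = [False, True, True, False]

old = vpblendd_qwords(vinserti128_hi(b, b), perm_a, from_a)
new = vpblendd_qwords(vpermq(b, [0, 1, 2, 1]), perm_a, from_a)
print(expected, old == expected, new == expected)
```

When run, this prints `['b0', 'a0', 'a1', 'b1'] True True`. The model shows only that the two sequences compute the same result. Any performance or folding benefit comes from the review discussion, not from this sketch.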

test/CodeGen/X86/vector-shuffle-256-v8.ll

	… 665 lines not shown …
	; AVX1-NEXT: vblendps {{.*#+}} ymm0 = ymm1[0],ymm0[1,2,3,4,5,6],ymm1[7]			; AVX1-NEXT: vblendps {{.*#+}} ymm0 = ymm1[0],ymm0[1,2,3,4,5,6],ymm1[7]
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-LABEL: shuffle_v8f32_f511235a:			; AVX2-LABEL: shuffle_v8f32_f511235a:
	; AVX2: # BB#0:			; AVX2: # BB#0:
	; AVX2-NEXT: vmovaps {{.*#+}} ymm2 = <u,5,1,1,2,3,5,u>			; AVX2-NEXT: vmovaps {{.*#+}} ymm2 = <u,5,1,1,2,3,5,u>
	; AVX2-NEXT: vpermps %ymm0, %ymm2, %ymm0			; AVX2-NEXT: vpermps %ymm0, %ymm2, %ymm0
	; AVX2-NEXT: vpermilps {{.*#+}} ymm1 = ymm1[3,1,2,2,7,5,6,6]			; AVX2-NEXT: vpermilps {{.*#+}} ymm1 = ymm1[3,1,2,2,7,5,6,6]
	; AVX2-NEXT: vperm2f128 {{.*#+}} ymm1 = ymm1[2,3,0,1]			; AVX2-NEXT: vpermpd {{.*#+}} ymm1 = ymm1[2,1,2,1]
	; AVX2-NEXT: vblendps {{.*#+}} ymm0 = ymm1[0],ymm0[1,2,3,4,5,6],ymm1[7]			; AVX2-NEXT: vblendps {{.*#+}} ymm0 = ymm1[0],ymm0[1,2,3,4,5,6],ymm1[7]
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	%shuffle = shufflevector <8 x float> %a, <8 x float> %b, <8 x i32> <i32 15, i32 5, i32 1, i32 1, i32 2, i32 3, i32 5, i32 10>			%shuffle = shufflevector <8 x float> %a, <8 x float> %b, <8 x i32> <i32 15, i32 5, i32 1, i32 1, i32 2, i32 3, i32 5, i32 10>
	ret <8 x float> %shuffle			ret <8 x float> %shuffle
	}			}

	define <8 x float> @shuffle_v8f32_32103210(<8 x float> %a, <8 x float> %b) {			define <8 x float> @shuffle_v8f32_32103210(<8 x float> %a, <8 x float> %b) {
	; ALL-LABEL: shuffle_v8f32_32103210:			; AVX1-LABEL: shuffle_v8f32_32103210:
	; ALL: # BB#0:			; AVX1: # BB#0:
	; ALL-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[3,2,1,0]			; AVX1-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[3,2,1,0]
	; ALL-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0			; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0
	; ALL-NEXT: retq			; AVX1-NEXT: retq
				;
				; AVX2-LABEL: shuffle_v8f32_32103210:
				; AVX2: # BB#0:
				; AVX2-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[3,2,1,0]
				; AVX2-NEXT: vpermpd {{.*#+}} ymm0 = ymm0[0,1,0,1]
				; AVX2-NEXT: retq
	%shuffle = shufflevector <8 x float> %a, <8 x float> %b, <8 x i32> <i32 3, i32 2, i32 1, i32 0, i32 3, i32 2, i32 1, i32 0>			%shuffle = shufflevector <8 x float> %a, <8 x float> %b, <8 x i32> <i32 3, i32 2, i32 1, i32 0, i32 3, i32 2, i32 1, i32 0>
	ret <8 x float> %shuffle			ret <8 x float> %shuffle
	}			}

	define <8 x float> @shuffle_v8f32_76547654(<8 x float> %a, <8 x float> %b) {			define <8 x float> @shuffle_v8f32_76547654(<8 x float> %a, <8 x float> %b) {
	; ALL-LABEL: shuffle_v8f32_76547654:			; AVX1-LABEL: shuffle_v8f32_76547654:
	; ALL: # BB#0:			; AVX1: # BB#0:
	; ALL-NEXT: vpermilps {{.*#+}} ymm0 = ymm0[3,2,1,0,7,6,5,4]			; AVX1-NEXT: vpermilps {{.*#+}} ymm0 = ymm0[3,2,1,0,7,6,5,4]
	; ALL-NEXT: vperm2f128 {{.*#+}} ymm0 = ymm0[2,3,2,3]			; AVX1-NEXT: vperm2f128 {{.*#+}} ymm0 = ymm0[2,3,2,3]
	; ALL-NEXT: retq			; AVX1-NEXT: retq
				;
				; AVX2-LABEL: shuffle_v8f32_76547654:
				; AVX2: # BB#0:
				; AVX2-NEXT: vpermilps {{.*#+}} ymm0 = ymm0[3,2,1,0,7,6,5,4]
				; AVX2-NEXT: vpermpd {{.*#+}} ymm0 = ymm0[2,3,2,3]
				; AVX2-NEXT: retq
	%shuffle = shufflevector <8 x float> %a, <8 x float> %b, <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 7, i32 6, i32 5, i32 4>			%shuffle = shufflevector <8 x float> %a, <8 x float> %b, <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 7, i32 6, i32 5, i32 4>
	ret <8 x float> %shuffle			ret <8 x float> %shuffle
	}			}

	define <8 x float> @shuffle_v8f32_76543210(<8 x float> %a, <8 x float> %b) {			define <8 x float> @shuffle_v8f32_76543210(<8 x float> %a, <8 x float> %b) {
	; ALL-LABEL: shuffle_v8f32_76543210:			; AVX1-LABEL: shuffle_v8f32_76543210:
	; ALL: # BB#0:			; AVX1: # BB#0:
	; ALL-NEXT: vpermilps {{.*#+}} ymm0 = ymm0[3,2,1,0,7,6,5,4]			; AVX1-NEXT: vpermilps {{.*#+}} ymm0 = ymm0[3,2,1,0,7,6,5,4]
	; ALL-NEXT: vperm2f128 {{.*#+}} ymm0 = ymm0[2,3,0,1]			; AVX1-NEXT: vperm2f128 {{.*#+}} ymm0 = ymm0[2,3,0,1]
	; ALL-NEXT: retq			; AVX1-NEXT: retq
				;
				; AVX2-LABEL: shuffle_v8f32_76543210:
				; AVX2: # BB#0:
				; AVX2-NEXT: vpermilps {{.*#+}} ymm0 = ymm0[3,2,1,0,7,6,5,4]
				; AVX2-NEXT: vpermpd {{.*#+}} ymm0 = ymm0[2,3,0,1]
				; AVX2-NEXT: retq
	%shuffle = shufflevector <8 x float> %a, <8 x float> %b, <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>			%shuffle = shufflevector <8 x float> %a, <8 x float> %b, <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
	ret <8 x float> %shuffle			ret <8 x float> %shuffle
	}			}

	define <8 x float> @shuffle_v8f32_3210ba98(<8 x float> %a, <8 x float> %b) {			define <8 x float> @shuffle_v8f32_3210ba98(<8 x float> %a, <8 x float> %b) {
	; ALL-LABEL: shuffle_v8f32_3210ba98:			; ALL-LABEL: shuffle_v8f32_3210ba98:
	; ALL: # BB#0:			; ALL: # BB#0:
	; ALL-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0			; ALL-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
	… 1,044 lines not shown …
	; AVX1: # BB#0:			; AVX1: # BB#0:
	; AVX1-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[3,2,1,0]			; AVX1-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[3,2,1,0]
	; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0			; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-LABEL: shuffle_v8i32_32103210:			; AVX2-LABEL: shuffle_v8i32_32103210:
	; AVX2: # BB#0:			; AVX2: # BB#0:
	; AVX2-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[3,2,1,0]			; AVX2-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[3,2,1,0]
	; AVX2-NEXT: vinserti128 $1, %xmm0, %ymm0, %ymm0			; AVX2-NEXT: vpermq {{.*#+}} ymm0 = ymm0[0,1,0,1]
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	%shuffle = shufflevector <8 x i32> %a, <8 x i32> %b, <8 x i32> <i32 3, i32 2, i32 1, i32 0, i32 3, i32 2, i32 1, i32 0>			%shuffle = shufflevector <8 x i32> %a, <8 x i32> %b, <8 x i32> <i32 3, i32 2, i32 1, i32 0, i32 3, i32 2, i32 1, i32 0>
	ret <8 x i32> %shuffle			ret <8 x i32> %shuffle
	}			}

	define <8 x i32> @shuffle_v8i32_76547654(<8 x i32> %a, <8 x i32> %b) {			define <8 x i32> @shuffle_v8i32_76547654(<8 x i32> %a, <8 x i32> %b) {
	; AVX1-LABEL: shuffle_v8i32_76547654:			; AVX1-LABEL: shuffle_v8i32_76547654:
	; AVX1: # BB#0:			; AVX1: # BB#0:
	; AVX1-NEXT: vpermilps {{.*#+}} ymm0 = ymm0[3,2,1,0,7,6,5,4]			; AVX1-NEXT: vpermilps {{.*#+}} ymm0 = ymm0[3,2,1,0,7,6,5,4]
	; AVX1-NEXT: vperm2f128 {{.*#+}} ymm0 = ymm0[2,3,2,3]			; AVX1-NEXT: vperm2f128 {{.*#+}} ymm0 = ymm0[2,3,2,3]
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-LABEL: shuffle_v8i32_76547654:			; AVX2-LABEL: shuffle_v8i32_76547654:
	; AVX2: # BB#0:			; AVX2: # BB#0:
	; AVX2-NEXT: vpshufd {{.*#+}} ymm0 = ymm0[3,2,1,0,7,6,5,4]			; AVX2-NEXT: vpshufd {{.*#+}} ymm0 = ymm0[3,2,1,0,7,6,5,4]
	; AVX2-NEXT: vperm2i128 {{.*#+}} ymm0 = ymm0[2,3,2,3]			; AVX2-NEXT: vpermq {{.*#+}} ymm0 = ymm0[2,3,2,3]
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	%shuffle = shufflevector <8 x i32> %a, <8 x i32> %b, <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 7, i32 6, i32 5, i32 4>			%shuffle = shufflevector <8 x i32> %a, <8 x i32> %b, <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 7, i32 6, i32 5, i32 4>
	ret <8 x i32> %shuffle			ret <8 x i32> %shuffle
	}			}

	define <8 x i32> @shuffle_v8i32_76543210(<8 x i32> %a, <8 x i32> %b) {			define <8 x i32> @shuffle_v8i32_76543210(<8 x i32> %a, <8 x i32> %b) {
	; AVX1-LABEL: shuffle_v8i32_76543210:			; AVX1-LABEL: shuffle_v8i32_76543210:
	; AVX1: # BB#0:			; AVX1: # BB#0:
	; AVX1-NEXT: vpermilps {{.*#+}} ymm0 = ymm0[3,2,1,0,7,6,5,4]			; AVX1-NEXT: vpermilps {{.*#+}} ymm0 = ymm0[3,2,1,0,7,6,5,4]
	; AVX1-NEXT: vperm2f128 {{.*#+}} ymm0 = ymm0[2,3,0,1]			; AVX1-NEXT: vperm2f128 {{.*#+}} ymm0 = ymm0[2,3,0,1]
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-LABEL: shuffle_v8i32_76543210:			; AVX2-LABEL: shuffle_v8i32_76543210:
	; AVX2: # BB#0:			; AVX2: # BB#0:
	; AVX2-NEXT: vpshufd {{.*#+}} ymm0 = ymm0[3,2,1,0,7,6,5,4]			; AVX2-NEXT: vpshufd {{.*#+}} ymm0 = ymm0[3,2,1,0,7,6,5,4]
	; AVX2-NEXT: vperm2i128 {{.*#+}} ymm0 = ymm0[2,3,0,1]			; AVX2-NEXT: vpermq {{.*#+}} ymm0 = ymm0[2,3,0,1]
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	%shuffle = shufflevector <8 x i32> %a, <8 x i32> %b, <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>			%shuffle = shufflevector <8 x i32> %a, <8 x i32> %b, <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
	ret <8 x i32> %shuffle			ret <8 x i32> %shuffle
	}			}

	define <8 x i32> @shuffle_v8i32_3210ba98(<8 x i32> %a, <8 x i32> %b) {			define <8 x i32> @shuffle_v8i32_3210ba98(<8 x i32> %a, <8 x i32> %b) {
	; AVX1-LABEL: shuffle_v8i32_3210ba98:			; AVX1-LABEL: shuffle_v8i32_3210ba98:
	; AVX1: # BB#0:			; AVX1: # BB#0:
	… 463 lines not shown …

test/CodeGen/X86/vector-shuffle-combining.ll

	… 2,662 lines not shown …
	; AVX1-NEXT: vpermilps {{.*#+}} ymm0 = ymm0[3,2,1,0,7,6,5,4]			; AVX1-NEXT: vpermilps {{.*#+}} ymm0 = ymm0[3,2,1,0,7,6,5,4]
	; AVX1-NEXT: vperm2f128 {{.*#+}} ymm0 = ymm0[2,3,2,3]			; AVX1-NEXT: vperm2f128 {{.*#+}} ymm0 = ymm0[2,3,2,3]
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-LABEL: combine_unneeded_subvector1:			; AVX2-LABEL: combine_unneeded_subvector1:
	; AVX2: # BB#0:			; AVX2: # BB#0:
	; AVX2-NEXT: vpaddd {{.*}}(%rip), %ymm0, %ymm0			; AVX2-NEXT: vpaddd {{.*}}(%rip), %ymm0, %ymm0
	; AVX2-NEXT: vpshufd {{.*#+}} ymm0 = ymm0[3,2,1,0,7,6,5,4]			; AVX2-NEXT: vpshufd {{.*#+}} ymm0 = ymm0[3,2,1,0,7,6,5,4]
	; AVX2-NEXT: vperm2i128 {{.*#+}} ymm0 = ymm0[2,3,2,3]			; AVX2-NEXT: vpermq {{.*#+}} ymm0 = ymm0[2,3,2,3]
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	%b = add <8 x i32> %a, <i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8>			%b = add <8 x i32> %a, <i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8>
	%c = shufflevector <8 x i32> %b, <8 x i32> undef, <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 7, i32 6, i32 5, i32 4>			%c = shufflevector <8 x i32> %b, <8 x i32> undef, <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 7, i32 6, i32 5, i32 4>
	ret <8 x i32> %c			ret <8 x i32> %c
	}			}

	define <8 x i32> @combine_unneeded_subvector2(<8 x i32> %a, <8 x i32> %b) {			define <8 x i32> @combine_unneeded_subvector2(<8 x i32> %a, <8 x i32> %b) {
	; SSE-LABEL: combine_unneeded_subvector2:			; SSE-LABEL: combine_unneeded_subvector2:
	… 297 lines not shown …