This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner] Combine shuffles of BUILD_VECTOR and SCALAR_TO_VECTOR
Closed, Public

Authored by RKSimon on Mar 22 2015, 7:10 AM.


Details

Reviewers

spatel
qcolombet
chandlerc
andreadb
resistor

Commits

rGed2ba33ba0b3: [DAGCombiner] Combine shuffles of BUILD_VECTOR and SCALAR_TO_VECTOR
rL234004: [DAGCombiner] Combine shuffles of BUILD_VECTOR and SCALAR_TO_VECTOR

Summary

This patch attempts to optimize the shuffling of 'scalar source' inputs: BUILD_VECTOR and SCALAR_TO_VECTOR nodes. This folds away a lot of unnecessary shuffle nodes and allows quite a bit of constant folding that was previously being missed.

At the moment the inputs are only combined if they are only being used once - I'm interested in extending this so that constant inputs are always combined. It would create more constant data but would remove more shuffles (which may be introducing their own constant data for masks anyhow). Comments please.
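To make the combine concrete, here is a minimal C++ model of it, written outside LLVM. It is a sketch, not the patch: `ToyInput`, `combineScalarSourceShuffle` and `wouldEmitScalarToVector` are hypothetical names, a vector is just a list of lane values with `std::nullopt` for undef, and the patch's opcode and `hasOneUse()` checks are collapsed into a single `isScalarSource` flag. As in the Diff 22419 code, a negative mask entry is an undef lane, entries below the element count select from the first input, the rest select from the second, and the fold bails out as soon as a referenced input is not a scalar source.

```cpp
#include <cassert>
#include <optional>
#include <vector>

using Lanes = std::vector<std::optional<int>>; // std::nullopt == undef lane

// Hypothetical stand-in for a shuffle operand. A SCALAR_TO_VECTOR is modelled
// as a vector whose only defined lane is lane 0.
struct ToyInput {
  bool isScalarSource; // BUILD_VECTOR or SCALAR_TO_VECTOR with a single use
  Lanes elts;
};

// Returns the operand list of the replacement BUILD_VECTOR, or std::nullopt
// if a lane referenced by the mask comes from an input we cannot look through.
std::optional<Lanes> combineScalarSourceShuffle(const ToyInput &n0,
                                                const ToyInput &n1,
                                                const std::vector<int> &mask) {
  const int numElts = static_cast<int>(mask.size());
  assert(n0.elts.size() == mask.size() && n1.elts.size() == mask.size());
  Lanes ops;
  for (int m : mask) {
    std::optional<int> op; // undef unless the mask selects a lane
    if (m >= 0) {
      const ToyInput &src = (m < numElts) ? n0 : n1;
      if (!src.isScalarSource)
        return std::nullopt; // corresponds to the patch's early `break`
      op = src.elts[m % numElts];
    }
    ops.push_back(op);
  }
  return ops;
}

// Mirrors the Diff 22419 check that emits SCALAR_TO_VECTOR instead of
// BUILD_VECTOR when lane 0 is the only defined lane.
bool wouldEmitScalarToVector(const Lanes &ops) {
  assert(!ops.empty());
  int numDefElts = 0;
  for (const auto &op : ops)
    if (op.has_value())
      ++numDefElts;
  return numDefElts == 1 && ops[0].has_value();
}
```

For example, shuffling a BUILD_VECTOR `<1, 2, 3, 4>` with a SCALAR_TO_VECTOR of `9` under the mask `<0, 4, 5, 3>` yields the operands `<1, 9, undef, 4>`, so the shuffle node disappears entirely. Lanes that the mask never references are not inspected, so an unrelated second input does not block the fold.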

Also removed an x86 insertps test that was testing the old shuffle lowering system.

Diff Detail

Repository: rL LLVM

Event Timeline

RKSimon updated this revision to Diff 22419. Mar 22 2015, 7:10 AM

RKSimon retitled this revision to [DAGCombiner] Combine shuffles of BUILD_VECTOR and SCALAR_TO_VECTOR.

RKSimon updated this object.

RKSimon edited the test plan for this revision.

RKSimon added reviewers: qcolombet, chandlerc, andreadb, spatel.

RKSimon set the repository for this revision to rL LLVM.

RKSimon added a subscriber: Unknown Object (MLST).

At the moment the inputs are only combined if they are only being used once - I'm interested in extending this so that constant inputs are always combined. It would create more constant data but would remove more shuffles (which may be introducing their own constant data for masks anyhow). Comments please.

We need to be careful here. Constant loads can be much more expensive than shuffles on some targets.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
12019	Add space after if.
12029	When would this be false?
12042	It might be better to use ZExt, depending on the target/type. Might be better to use something like this: Op = isZExtFree(Op.getValueType(), SVT) ? DAG.getZExtOrTrunc(Op, SDLoc(N), SVT) : DAG.getSExtOrTrunc(Op, SDLoc(N), SVT);
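Hal's suggestion works because the choice between sign and zero extension only affects the high bits. Assuming the ISD::BUILD_VECTOR rule that integer operands wider than the vector element type are implicitly truncated, either extension produces the same lane values, so the decision is purely about which is cheaper on the target (hence `isZExtFree`). The self-contained sketch below checks this for an 8-bit element widened to 32 bits; `sextTo32`, `zextTo32`, `truncTo8` and `checkExtensionsAgreeOnLowBits` are illustrative helpers, not LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>

// Widen an 8-bit lane value to 32 bits by sign extension.
uint32_t sextTo32(uint8_t v) {
  return static_cast<uint32_t>(static_cast<int32_t>(static_cast<int8_t>(v)));
}

// Widen an 8-bit lane value to 32 bits by zero extension.
uint32_t zextTo32(uint8_t v) { return static_cast<uint32_t>(v); }

// Model of the implicit truncation back to the 8-bit element type.
uint8_t truncTo8(uint32_t v) { return static_cast<uint8_t>(v & 0xFFu); }

// Exhaustively confirm that both extensions round-trip through truncation.
void checkExtensionsAgreeOnLowBits() {
  for (unsigned i = 0; i < 256; ++i) {
    const uint8_t v = static_cast<uint8_t>(i);
    assert(truncTo8(sextTo32(v)) == v);
    assert(truncTo8(zextTo32(v)) == v);
  }
}
```

For `0xFF`, `sextTo32` gives `0xFFFFFFFF` and `zextTo32` gives `0x000000FF`; the widened values differ, but the low byte that survives truncation is `0xFF` in both cases.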

Thanks for the review Hal, I'll update the patch later.

In D8516#145467, @hfinkel wrote:

At the moment the inputs are only combined if they are only being used once - I'm interested in extending this so that constant inputs are always combined. It would create more constant data but would remove more shuffles (which may be introducing their own constant data for masks anyhow). Comments please.

We need to be careful here. Constant loads can be much more expensive than shuffles on some targets.

Yes, that was my concern as well - there isn't an easy way to determine how expensive the shuffle we're trying to remove is. I'll leave it as it is for now and only combine when the inputs are only used once.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
12029	We have an early out from the loop if an active operand turns out to be something other than BUILD_VECTOR or SCALAR_TO_VECTOR. I'll move this logic to the opening if() entry into the block.
12042	Easy enough to add.

Updated patch based on Hal's review.

Updated patch

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
12030	I ended up keeping this in as it makes for much easier understanding than all the conditions I would have to add to the if() test. I've added a comment explaining the 'bail out' to make it clearer.

PING

ping * 2

Looks good to me.

This revision is now accepted and ready to land. Apr 1 2015, 9:48 AM

andreadb added inline comments. Apr 1 2015, 10:03 AM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
12031–12033	On x86, a build_vector with all operands undef except the first is legalized to a scalar_to_vector. I am not sure whether other targets do the same; if not, I would suggest removing this code and just emitting a build_vector. Do you have an example that relies on this check? A shuffle with only one non-undef element should already have been canonicalized/simplified to a shuffle where one of the operands is undef.
test/CodeGen/X86/mmx-bitcast.ll
74	I don't think this is related to your patch. However, this looks like a bug to me. Shouldn't this be a 'movq'?
test/CodeGen/X86/sse41.ll
1028–1029	My question is: do we have an equivalent test for insertps somewhere else? If so, then I think it is OK to remove it. Otherwise I would keep it.
test/CodeGen/X86/vector-shuffle-128-v16.ll
642	Shouldn't this be 'vmovd'?

Thanks Andrea. Are you happy with me submitting with these changes (after tests), or would you prefer another review?

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
12031–12033	No specific need - it's easy enough to always create a BUILD_VECTOR and rely on the target's lowering logic to do the right thing.
test/CodeGen/X86/mmx-bitcast.ll
74	movd deals with 32- and 64-bit GPR <-> vector moves. movq does vector loads/stores and vector <-> vector moves. It's a funny old world.
test/CodeGen/X86/sse41.ll
1028–1029	It's a reduced test that, with this patch, folds to a store of a constant. I'll change it to use non-constant vector data and see if that works.
test/CodeGen/X86/vector-shuffle-128-v16.ll
642	Fixed.

RKSimon added inline comments. Apr 1 2015, 11:34 AM

test/CodeGen/X86/mmx-bitcast.ll
74	Actually, scrub that: this does appear to be a bug. It is a (v)movq instruction, but encoded like (v)movd and completely separate from the vector (v)movq.
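The prefix confusion is easier to see in the raw bytes. The table below uses the register forms documented in the Intel SDM: `movd %xmm0, %eax` is `66 0F 7E /r`, the 64-bit GPR form (which the test prints as `movd ..., %rax`) is the same `66 0F 7E` opcode with a REX.W prefix, and the vector-to-vector `movq` is a separate `F3 0F 7E /r` opcode. `scanPrefixes` is a deliberately simplified, hypothetical decoder that only understands the 66/F2/F3 and REX prefixes used here (no VEX), so treat it as an illustration rather than a disassembler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct Encoding {
  std::string asmText;        // AT&T syntax
  std::vector<uint8_t> bytes; // ModRM 0xC0: register operands xmm0 / eax / rax
};

const std::vector<Encoding> kMoves = {
    {"movd %xmm0, %eax", {0x66, 0x0F, 0x7E, 0xC0}},       // 66 0F 7E /r
    {"movq %xmm0, %rax", {0x66, 0x48, 0x0F, 0x7E, 0xC0}}, // 66 REX.W 0F 7E /r
    {"movq %xmm0, %xmm0", {0xF3, 0x0F, 0x7E, 0xC0}},      // F3 0F 7E /r
};

struct PrefixInfo {
  uint8_t mandatory = 0; // 0x66, 0xF2, 0xF3, or 0 if none
  bool rexW = false;     // REX prefix with the W bit set
  std::size_t opcodeStart = 0;
};

// Simplified prefix scan: only 66/F2/F3 and REX (0x40-0x4F) are recognised.
PrefixInfo scanPrefixes(const std::vector<uint8_t> &b) {
  PrefixInfo p;
  std::size_t i = 0;
  while (i < b.size()) {
    const uint8_t x = b[i];
    if (x == 0x66 || x == 0xF2 || x == 0xF3) {
      p.mandatory = x;
    } else if ((x & 0xF0) == 0x40) {
      p.rexW = (x & 0x08) != 0;
    } else {
      break;
    }
    ++i;
  }
  assert(i + 1 < b.size() && "expected a two-byte 0F xx opcode");
  p.opcodeStart = i;
  return p;
}
```

Scanning the three entries shows that the first two share the `66` prefix and the `0F 7E` opcode and differ only in REX.W, while the vector `movq` uses `F3`; that is the sense in which the GPR `movq` is encoded like `movd`.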

In D8516#150644, @RKSimon wrote:

Thanks Andrea. Are you happy with me submitting with these changes (after tests), or would you prefer another review?

The patch looks good to me.
Thanks!

test/CodeGen/X86/mmx-bitcast.ll
74	Anyway, this problem is not related to your patch. If you like, you can raise a bug for it.

spatel added inline comments. Apr 1 2015, 12:31 PM

test/CodeGen/X86/mmx-bitcast.ll
74	I had a similar question in http://reviews.llvm.org/D8691. Looks like some weirdness due to opcode prefixes; one of the movq versions won't work on a 64-bit system.

Closed by commit rL234004: [DAGCombiner] Combine shuffles of BUILD_VECTOR and SCALAR_TO_VECTOR (authored by RKSimon). Apr 3 2015, 3:05 AM

This revision was automatically updated to reflect the committed changes.

RKSimon mentioned this in D8885: [CodeGen] Combine small-element shuffles of scalar_to_vector in terms of the wider scalar. Apr 8 2015, 11:18 AM

Revision Contents

Path	Size
lib/CodeGen/SelectionDAG/DAGCombiner.cpp	46 lines
test/CodeGen/AArch64/arm64-neon-copy.ll	14 lines
test/CodeGen/AArch64/arm64-vshuffle.ll	155 lines
test/CodeGen/PowerPC/vperm-lowering.ll	87 lines
test/CodeGen/X86/…	27 lines
test/CodeGen/X86/…	51 lines
test/CodeGen/X86/…	26 lines
test/CodeGen/X86/…	26 lines
test/CodeGen/X86/…	14 lines
test/CodeGen/X86/vector-shuffle-128-v16.ll	193 lines
test/CodeGen/X86/vector-shuffle-128-v8.ll	148 lines

Diff 22419

lib/CodeGen/SelectionDAG/DAGCombiner.cpp


... 7,574 lines not shown ...
if (N0.getOpcode() == ISD::FMUL) {
// Fold scalars or any vector constants (not just splats).		// Fold scalars or any vector constants (not just splats).
// This fold is done in general by InstCombine, but extra fmul insts		// This fold is done in general by InstCombine, but extra fmul insts
// may have been generated during lowering.		// may have been generated during lowering.
SDValue N00 = N0.getOperand(0);		SDValue N00 = N0.getOperand(0);
SDValue N01 = N0.getOperand(1);		SDValue N01 = N0.getOperand(1);
auto *BV1 = dyn_cast<BuildVectorSDNode>(N1);		auto *BV1 = dyn_cast<BuildVectorSDNode>(N1);
auto *BV00 = dyn_cast<BuildVectorSDNode>(N00);		auto *BV00 = dyn_cast<BuildVectorSDNode>(N00);
auto *BV01 = dyn_cast<BuildVectorSDNode>(N01);		auto *BV01 = dyn_cast<BuildVectorSDNode>(N01);

// Check 1: Make sure that the first operand of the inner multiply is NOT		// Check 1: Make sure that the first operand of the inner multiply is NOT
// a constant. Otherwise, we may induce infinite looping.		// a constant. Otherwise, we may induce infinite looping.
if (!(isConstOrConstSplatFP(N00) || (BV00 && BV00->isConstant()))) {		if (!(isConstOrConstSplatFP(N00) || (BV00 && BV00->isConstant()))) {
// Check 2: Make sure that the second operand of the inner multiply and		// Check 2: Make sure that the second operand of the inner multiply and
// the second operand of the outer multiply are constants.		// the second operand of the outer multiply are constants.
if ((N1CFP && isConstOrConstSplatFP(N01)) ||		if ((N1CFP && isConstOrConstSplatFP(N01)) ||
(BV1 && BV01 && BV1->isConstant() && BV01->isConstant())) {		(BV1 && BV01 && BV1->isConstant() && BV01->isConstant())) {
SDLoc SL(N);		SDLoc SL(N);
... 4,405 lines not shown ...
if (N0.getOpcode() == ISD::CONCAT_VECTORS &&
(N1.getOpcode() == ISD::CONCAT_VECTORS &&		(N1.getOpcode() == ISD::CONCAT_VECTORS &&
N0.getOperand(0).getValueType() == N1.getOperand(0).getValueType()))) {		N0.getOperand(0).getValueType() == N1.getOperand(0).getValueType()))) {
SDValue V = partitionShuffleOfConcats(N, DAG);		SDValue V = partitionShuffleOfConcats(N, DAG);

if (V.getNode())		if (V.getNode())
return V;		return V;
}		}

		// Attempt to combine a shuffle of 2 inputs of 'scalar sources' -
		// BUILD_VECTOR or SCALAR_TO_VECTOR into a single BUILD_VECTOR or
		// SCALAR_TO_VECTOR operation.
		if (Level < AfterLegalizeVectorOps && TLI.isTypeLegal(VT)) {
		int NumDefElts = 0;
		SmallVector<SDValue, 8> Ops;
		for (int M : SVN->getMask()) {
		SDValue Op = DAG.getUNDEF(VT.getScalarType());
		if (M >= 0) {
		int Idx = M % NumElts;
		SDValue &S = (M < (int)NumElts ? N0 : N1);
		if (S.getOpcode() == ISD::BUILD_VECTOR && S.hasOneUse()) {
		Op = S.getOperand(Idx);
		} else if (S.getOpcode() == ISD::SCALAR_TO_VECTOR && S.hasOneUse()) {
		if(Idx == 0)
		Op = S.getOperand(0);
		} else {
		break;
		}
		}
		if (Op.getOpcode() != ISD::UNDEF)
		NumDefElts++;
		Ops.push_back(Op);
		}
		if (Ops.size() == VT.getVectorNumElements()) {
		// Create SCALAR_TO_VECTOR if the only defined input is input[0].
		if (1 == NumDefElts && Ops[0].getOpcode() != ISD::UNDEF)
		return DAG.getNode(ISD::SCALAR_TO_VECTOR, SDLoc(N), VT, Ops[0]);

		// BUILD_VECTOR requires all inputs to be of the same type, find the
		// maximum type and extend them all.
		EVT SVT = VT.getScalarType();
		if (SVT.isInteger())
		for (SDValue &Op : Ops)
		SVT = (SVT.bitsLT(Op.getValueType()) ? Op.getValueType() : SVT);
		if (SVT != VT.getScalarType())
		for (SDValue &Op : Ops)
		Op = DAG.getSExtOrTrunc(Op, SDLoc(N), SVT);
		return DAG.getNode(ISD::BUILD_VECTOR, SDLoc(N), VT, Ops);
		}
		}

// If this shuffle only has a single input that is a bitcasted shuffle,		// If this shuffle only has a single input that is a bitcasted shuffle,
// attempt to merge the 2 shuffles and suitably bitcast the inputs/output		// attempt to merge the 2 shuffles and suitably bitcast the inputs/output
// back to their original types.		// back to their original types.
if (N0.getOpcode() == ISD::BITCAST && N0.hasOneUse() &&		if (N0.getOpcode() == ISD::BITCAST && N0.hasOneUse() &&
N1.getOpcode() == ISD::UNDEF && Level < AfterLegalizeVectorOps &&		N1.getOpcode() == ISD::UNDEF && Level < AfterLegalizeVectorOps &&
TLI.isTypeLegal(VT)) {		TLI.isTypeLegal(VT)) {

// Peek through the bitcast only if there is one user.		// Peek through the bitcast only if there is one user.
... 253 lines not shown ...
/// e.g. AND V, <0xffffffff, 0, 0xffffffff, 0>. ==>		/// e.g. AND V, <0xffffffff, 0, 0xffffffff, 0>. ==>
/// vector_shuffle V, Zero, <0, 4, 2, 4>		/// vector_shuffle V, Zero, <0, 4, 2, 4>
SDValue DAGCombiner::XformToShuffleWithZero(SDNode *N) {		SDValue DAGCombiner::XformToShuffleWithZero(SDNode *N) {
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);
SDValue LHS = N->getOperand(0);		SDValue LHS = N->getOperand(0);
SDValue RHS = N->getOperand(1);		SDValue RHS = N->getOperand(1);
SDLoc dl(N);		SDLoc dl(N);

// Make sure we're not running after operation legalization where it		// Make sure we're not running after operation legalization where it
// may have custom lowered the vector shuffles.		// may have custom lowered the vector shuffles.
if (LegalOperations)		if (LegalOperations)
return SDValue();		return SDValue();

if (N->getOpcode() != ISD::AND)		if (N->getOpcode() != ISD::AND)
return SDValue();		return SDValue();

if (RHS.getOpcode() == ISD::BITCAST)		if (RHS.getOpcode() == ISD::BITCAST)
... 1,095 lines not shown ...

test/CodeGen/AArch64/arm64-neon-copy.ll

	... 1,077 lines not shown ...
	entry:			entry:
	%0 = extractelement <2 x i32> %a, i32 0			%0 = extractelement <2 x i32> %a, i32 0
	%vecinit.i = insertelement <2 x i32> undef, i32 %0, i32 0			%vecinit.i = insertelement <2 x i32> undef, i32 %0, i32 0
	%vecinit1.i = insertelement <2 x i32> %vecinit.i, i32 %0, i32 1			%vecinit1.i = insertelement <2 x i32> %vecinit.i, i32 %0, i32 1
	ret <2 x i32> %vecinit1.i			ret <2 x i32> %vecinit1.i
	}			}

	define <2 x i32> @test_concat_diff_v1i32_v1i32(i32 %a, i32 %b) {			define <2 x i32> @test_concat_diff_v1i32_v1i32(i32 %a, i32 %b) {
	; CHECK-LABEL: test_concat_diff_v1i32_v1i32:			; CHECK-LABEL: test_concat_diff_v1i32_v1i32:
	; CHECK: sqabs s{{[0-9]+}}, s{{[0-9]+}}			; CHECK: sqabs s{{[0-9]+}}, s{{[0-9]+}}
	; CHECK: sqabs s{{[0-9]+}}, s{{[0-9]+}}			; CHECK: sqabs s{{[0-9]+}}, s{{[0-9]+}}
	; CHECK-NEXT: zip1 {{v[0-9]+}}.2s, {{v[0-9]+}}.2s, {{v[0-9]+}}.2s			; CHECK: ins {{v[0-9]+}}.s[1], w{{[0-9]+}}
	entry:			entry:
	%c = tail call i32 @llvm.aarch64.neon.sqabs.i32(i32 %a)			%c = tail call i32 @llvm.aarch64.neon.sqabs.i32(i32 %a)
	%d = insertelement <2 x i32> undef, i32 %c, i32 0			%d = insertelement <2 x i32> undef, i32 %c, i32 0
	%e = tail call i32 @llvm.aarch64.neon.sqabs.i32(i32 %b)			%e = tail call i32 @llvm.aarch64.neon.sqabs.i32(i32 %b)
	%f = insertelement <2 x i32> undef, i32 %e, i32 0			%f = insertelement <2 x i32> undef, i32 %e, i32 0
	%h = shufflevector <2 x i32> %d, <2 x i32> %f, <2 x i32> <i32 0, i32 2>			%h = shufflevector <2 x i32> %d, <2 x i32> %f, <2 x i32> <i32 0, i32 2>
	ret <2 x i32> %h			ret <2 x i32> %h
	}			}

	define <16 x i8> @test_concat_v16i8_v16i8_v16i8(<16 x i8> %x, <16 x i8> %y) #0 {			define <16 x i8> @test_concat_v16i8_v16i8_v16i8(<16 x i8> %x, <16 x i8> %y) #0 {
	; CHECK-LABEL: test_concat_v16i8_v16i8_v16i8:			; CHECK-LABEL: test_concat_v16i8_v16i8_v16i8:
	... 345 lines not shown ...

test/CodeGen/AArch64/arm64-vshuffle.ll

	; RUN: llc < %s -mtriple=arm64-apple-ios7.0 -mcpu=cyclone | FileCheck %s			; RUN: llc < %s -mtriple=arm64-apple-ios7.0 -mcpu=cyclone | FileCheck %s


	; The mask:			; CHECK: test1
	; CHECK: lCPI0_0:			; CHECK: movi d[[REG0:[0-9]+]], #0000000000000000
	; CHECK: .byte 2 ; 0x2			define <8 x i1> @test1() {
	; CHECK: .byte 255 ; 0xff			entry:
	; CHECK: .byte 6 ; 0x6			%Shuff = shufflevector <8 x i1> <i1 0, i1 1, i1 2, i1 3, i1 4, i1 5, i1 6,
	; CHECK: .byte 255 ; 0xff
	; The second vector is legalized to undef and the elements of the first vector
	; are used instead.
	; CHECK: .byte 2 ; 0x2
	; CHECK: .byte 4 ; 0x4
	; CHECK: .byte 6 ; 0x6
	; CHECK: .byte 0 ; 0x0
	; CHECK: test1
	; CHECK: ldr d[[REG0:[0-9]+]], [{{.*}}, lCPI0_0
	; CHECK: movi.8h v[[REG1:[0-9]+]], #0x1, lsl #8
	; CHECK: tbl.8b v{{[0-9]+}}, { v[[REG1]] }, v[[REG0]]
	define <8 x i1> @test1() {
	entry:
	%Shuff = shufflevector <8 x i1> <i1 0, i1 1, i1 2, i1 3, i1 4, i1 5, i1 6,
	i1 7>,			i1 7>,
	<8 x i1> <i1 0, i1 1, i1 2, i1 3, i1 4, i1 5, i1 6,			<8 x i1> <i1 0, i1 1, i1 2, i1 3, i1 4, i1 5, i1 6,
	i1 7>,			i1 7>,
	<8 x i32> <i32 2, i32 undef, i32 6, i32 undef, i32 10,			<8 x i32> <i32 2, i32 undef, i32 6, i32 undef, i32 10,
	i32 12, i32 14, i32 0>			i32 12, i32 14, i32 0>
	ret <8 x i1> %Shuff			ret <8 x i1> %Shuff
	}			}

	; CHECK: lCPI1_0:			; CHECK: lCPI1_0:
	; CHECK: .byte 0 ; 0x0			; CHECK: .byte 0 ; 0x0
	; CHECK: .byte 255 ; 0xff			; CHECK: .byte 0 ; 0x0
	; CHECK: .byte 2 ; 0x2			; CHECK: .byte 0 ; 0x0
	; CHECK: .byte 255 ; 0xff			; CHECK: .byte 0 ; 0x0
	; CHECK: .byte 10 ; 0xa			; CHECK: .byte 1 ; 0x1
	; CHECK: .byte 12 ; 0xc			; CHECK: .byte 0 ; 0x0
	; CHECK: .byte 14 ; 0xe			; CHECK: .byte 0 ; 0x0
	; CHECK: .byte 7 ; 0x7			; CHECK: .byte 0 ; 0x0
	; CHECK: test2			; CHECK: test2
	; CHECK: ldr d[[REG0:[0-9]+]], [{{.*}}, lCPI1_0@PAGEOFF]			; CHECK: adrp x[[REG2:[0-9]+]], lCPI1_0@PAGE
	; CHECK: adrp x[[REG2:[0-9]+]], lCPI1_1@PAGE			; CHECK: ldr d[[REG1:[0-9]+]], [x[[REG2]], lCPI1_0@PAGEOFF]
	; CHECK: ldr q[[REG1:[0-9]+]], [x[[REG2]], lCPI1_1@PAGEOFF]			define <8 x i1>@test2() {
	; CHECK: tbl.8b v{{[0-9]+}}, { v[[REG1]] }, v[[REG0]]			bb:
	define <8 x i1>@test2() {			%Shuff = shufflevector <8 x i1> zeroinitializer,
	bb:
	%Shuff = shufflevector <8 x i1> zeroinitializer,
	<8 x i1> <i1 0, i1 1, i1 1, i1 0, i1 0, i1 1, i1 0, i1 0>,			<8 x i1> <i1 0, i1 1, i1 1, i1 0, i1 0, i1 1, i1 0, i1 0>,
	<8 x i32> <i32 2, i32 undef, i32 6, i32 undef, i32 10, i32 12, i32 14,			<8 x i32> <i32 2, i32 undef, i32 6, i32 undef, i32 10, i32 12, i32 14,
	i32 0>			i32 0>
	ret <8 x i1> %Shuff			ret <8 x i1> %Shuff
	}			}

	; CHECK: lCPI2_0:			; CHECK: test3
	; CHECK: .byte 2 ; 0x2			; CHECK: movi.4s v{{[0-9]+}}, #0x1
	; CHECK: .byte 255 ; 0xff			define <16 x i1> @test3(i1* %ptr, i32 %v) {
	; CHECK: .byte 6 ; 0x6			bb:
	; CHECK: .byte 255 ; 0xff			%Shuff = shufflevector <16 x i1> <i1 0, i1 1, i1 1, i1 0, i1 0, i1 1, i1 0, i1 0, i1 0, i1 1, i1 1, i1 0, i1 0, i1 1, i1 0, i1 0>, <16 x i1> undef,
	; CHECK: .byte 10 ; 0xa
	; CHECK: .byte 12 ; 0xc
	; CHECK: .byte 14 ; 0xe
	; CHECK: .byte 0 ; 0x0
	; CHECK: .byte 2 ; 0x2
	; CHECK: .byte 255 ; 0xff
	; CHECK: .byte 6 ; 0x6
	; CHECK: .byte 255 ; 0xff
	; CHECK: .byte 10 ; 0xa
	; CHECK: .byte 12 ; 0xc
	; CHECK: .byte 14 ; 0xe
	; CHECK: .byte 0 ; 0x0
	; CHECK: test3
	; CHECK: adrp x[[REG3:[0-9]+]], lCPI2_0@PAGE
	; CHECK: ldr q[[REG0:[0-9]+]], [x[[REG3]], lCPI2_0@PAGEOFF]
	; CHECK: ldr q[[REG1:[0-9]+]], [x[[REG3]], lCPI2_1@PAGEOFF]
	; CHECK: tbl.16b v{{[0-9]+}}, { v[[REG1]] }, v[[REG0]]
	define <16 x i1> @test3(i1* %ptr, i32 %v) {
	bb:
	%Shuff = shufflevector <16 x i1> <i1 0, i1 1, i1 1, i1 0, i1 0, i1 1, i1 0, i1 0, i1 0, i1 1, i1 1, i1 0, i1 0, i1 1, i1 0, i1 0>, <16 x i1> undef,
	<16 x i32> <i32 2, i32 undef, i32 6, i32 undef, i32 10, i32 12, i32 14,			<16 x i32> <i32 2, i32 undef, i32 6, i32 undef, i32 10, i32 12, i32 14,
	i32 0, i32 2, i32 undef, i32 6, i32 undef, i32 10, i32 12,			i32 0, i32 2, i32 undef, i32 6, i32 undef, i32 10, i32 12,
	i32 14, i32 0>			i32 14, i32 0>
	ret <16 x i1> %Shuff			ret <16 x i1> %Shuff
	}			}
	; CHECK: lCPI3_1:			; CHECK: lCPI3_0:
	; CHECK: .byte 0 ; 0x0			; CHECK: .byte 0 ; 0x0
	; CHECK: .byte 1 ; 0x1			; CHECK: .byte 0 ; 0x0
	; CHECK: .byte 2 ; 0x2			; CHECK: .byte 0 ; 0x0
	; CHECK: .byte 18 ; 0x12			; CHECK: .byte 1 ; 0x1
	; CHECK: .byte 4 ; 0x4			; CHECK: .byte 0 ; 0x0
	; CHECK: .byte 5 ; 0x5			; CHECK: .byte 0 ; 0x0
	; CHECK: .byte 6 ; 0x6			; CHECK: .byte 0 ; 0x0
	; CHECK: .byte 7 ; 0x7			; CHECK: .byte 0 ; 0x0
	; CHECK: .byte 8 ; 0x8			; CHECK: .byte 0 ; 0x0
	; CHECK: .byte 31 ; 0x1f			; CHECK: .byte 0 ; 0x0
	; CHECK: .byte 10 ; 0xa			; CHECK: .byte 0 ; 0x0
	; CHECK: .byte 30 ; 0x1e			; CHECK: .byte 0 ; 0x0
	; CHECK: .byte 12 ; 0xc			; CHECK: .byte 0 ; 0x0
	; CHECK: .byte 13 ; 0xd			; CHECK: .byte 0 ; 0x0
	; CHECK: .byte 14 ; 0xe			; CHECK: .byte 0 ; 0x0
	; CHECK: .byte 15 ; 0xf			; CHECK: .byte 0 ; 0x0
	; CHECK: _test4:			; CHECK: _test4:
	; CHECK: ldr q[[REG1:[0-9]+]]			; CHECK: adrp x[[REG3:[0-9]+]], lCPI3_0@PAGE
	; CHECK: movi.2d v[[REG0:[0-9]+]], #0000000000000000			; CHECK: ldr q[[REG2:[0-9]+]], [x[[REG3]], lCPI3_0@PAGEOFF]
	; CHECK: adrp x[[REG3:[0-9]+]], lCPI3_1@PAGE			define <16 x i1> @test4(i1* %ptr, i32 %v) {
	; CHECK: ldr q[[REG2:[0-9]+]], [x[[REG3]], lCPI3_1@PAGEOFF]			bb:
	; CHECK: tbl.16b v{{[0-9]+}}, { v[[REG0]], v[[REG1]] }, v[[REG2]]			%Shuff = shufflevector <16 x i1> zeroinitializer,
	define <16 x i1> @test4(i1* %ptr, i32 %v) {
	bb:
	%Shuff = shufflevector <16 x i1> zeroinitializer,
	<16 x i1> <i1 0, i1 1, i1 1, i1 0, i1 0, i1 1, i1 0, i1 0, i1 0, i1 1,			<16 x i1> <i1 0, i1 1, i1 1, i1 0, i1 0, i1 1, i1 0, i1 0, i1 0, i1 1,
	i1 1, i1 0, i1 0, i1 1, i1 0, i1 0>,			i1 1, i1 0, i1 0, i1 1, i1 0, i1 0>,
	<16 x i32> <i32 2, i32 1, i32 6, i32 18, i32 10, i32 12, i32 14, i32 0,			<16 x i32> <i32 2, i32 1, i32 6, i32 18, i32 10, i32 12, i32 14, i32 0,
	i32 2, i32 31, i32 6, i32 30, i32 10, i32 12, i32 14, i32 0>			i32 2, i32 31, i32 6, i32 30, i32 10, i32 12, i32 14, i32 0>
	ret <16 x i1> %Shuff			ret <16 x i1> %Shuff
	}			}
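With the patch, test2 and test4 no longer need a `tbl` and a separate mask: the shuffle of two constant vectors is folded into a single constant. The sketch below reproduces the new `lCPI1_0` and `lCPI3_0` bytes by applying the mask lane by lane. `foldConstantShuffle` is a hypothetical helper, lanes are stored as bytes even though the IR type is `i1`, and filling undef lanes with 0 is an assumption that happens to match these CHECK lines (an undef lane may legally take any value).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Fold a shuffle of two constant vectors into one constant vector.
// Mask entries < n select from `a`, entries >= n select from `b`,
// and -1 (undef) lanes are filled with `undefFill`.
std::vector<uint8_t> foldConstantShuffle(const std::vector<uint8_t> &a,
                                         const std::vector<uint8_t> &b,
                                         const std::vector<int> &mask,
                                         uint8_t undefFill = 0) {
  assert(a.size() == b.size());
  const int n = static_cast<int>(a.size());
  std::vector<uint8_t> out;
  out.reserve(mask.size());
  for (int m : mask) {
    if (m < 0) {
      out.push_back(undefFill);
    } else {
      assert(m < 2 * n);
      out.push_back(m < n ? a[m] : b[m - n]);
    }
  }
  return out;
}
```

In test4, for instance, only mask entry 18 (element 2 of the second operand) selects a 1, which is why the new constant has a single 1 in lane 3.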

test/CodeGen/PowerPC/vperm-lowering.ll

	; RUN: llc -O0 -fast-isel=false -mcpu=ppc64 < %s | FileCheck %s			; RUN: llc -O0 -fast-isel=false -mcpu=ppc64 < %s | FileCheck %s

	target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v128:128:128-n32:64"			target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v128:128:128-n32:64"
	target triple = "powerpc64le-unknown-linux-gnu"			target triple = "powerpc64le-unknown-linux-gnu"

	define <16 x i8> @foo() nounwind ssp {			define <16 x i8> @foo() nounwind ssp {
	%1 = shufflevector <16 x i8> <i8 0, i8 1, i8 2, i8 3, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15>, <16 x i8> <i8 16, i8 17, i8 18, i8 19, i8 20, i8 21, i8 22, i8 23, i8 24, i8 25, i8 26, i8 27, i8 28, i8 29, i8 30, i8 31>, <16 x i32> <i32 0, i32 5, i32 10, i32 15, i32 20, i32 25, i32 30, i32 3, i32 8, i32 13, i32 18, i32 23, i32 28, i32 1, i32 6, i32 11>			%1 = shufflevector <16 x i8> <i8 0, i8 1, i8 2, i8 3, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15>, <16 x i8> <i8 16, i8 17, i8 18, i8 19, i8 20, i8 21, i8 22, i8 23, i8 24, i8 25, i8 26, i8 27, i8 28, i8 29, i8 30, i8 31>, <16 x i32> <i32 0, i32 5, i32 10, i32 15, i32 20, i32 25, i32 30, i32 3, i32 8, i32 13, i32 18, i32 23, i32 28, i32 1, i32 6, i32 11>
	ret <16 x i8> %1			ret <16 x i8> %1
	}			}

	; CHECK: .LCPI0_0:			; CHECK: .LCPI0_0:
	; CHECK: .byte 31			; CHECK: .byte 0
	; CHECK: .byte 26			; CHECK: .byte 5
	; CHECK: .byte 21			; CHECK: .byte 10
	; CHECK: .byte 16			; CHECK: .byte 15
	; CHECK: .byte 11			; CHECK: .byte 20
	; CHECK: .byte 6			; CHECK: .byte 25
	; CHECK: .byte 1			; CHECK: .byte 30
	; CHECK: .byte 28			; CHECK: .byte 3
	; CHECK: .byte 23			; CHECK: .byte 8
	; CHECK: .byte 18			; CHECK: .byte 13
	; CHECK: .byte 13			; CHECK: .byte 18
	; CHECK: .byte 8			; CHECK: .byte 23
	; CHECK: .byte 3			; CHECK: .byte 28
	; CHECK: .byte 30			; CHECK: .byte 1
	; CHECK: .byte 25			; CHECK: .byte 6
	; CHECK: .byte 20			; CHECK: .byte 11
	; CHECK: .LCPI0_1:			; CHECK: foo:
	; CHECK: .byte 0			; CHECK: addis [[REG1:[0-9]+]], 2, .LCPI0_0@toc@ha
	; CHECK: .byte 1			; CHECK: addi [[REG2:[0-9]+]], [[REG1]], .LCPI0_0@toc@l
	; CHECK: .byte 2			; CHECK: lvx [[REG3:[0-9]+]], 0, [[REG2]]
	; CHECK: .byte 3
	; CHECK: .byte 4
	; CHECK: .byte 5
	; CHECK: .byte 6
	; CHECK: .byte 7
	; CHECK: .byte 8
	; CHECK: .byte 9
	; CHECK: .byte 10
	; CHECK: .byte 11
	; CHECK: .byte 12
	; CHECK: .byte 13
	; CHECK: .byte 14
	; CHECK: .byte 15
	; CHECK: .LCPI0_2:
	; CHECK: .byte 16
	; CHECK: .byte 17
	; CHECK: .byte 18
	; CHECK: .byte 19
	; CHECK: .byte 20
	; CHECK: .byte 21
	; CHECK: .byte 22
	; CHECK: .byte 23
	; CHECK: .byte 24
	; CHECK: .byte 25
	; CHECK: .byte 26
	; CHECK: .byte 27
	; CHECK: .byte 28
	; CHECK: .byte 29
	; CHECK: .byte 30
	; CHECK: .byte 31
	; CHECK: foo:
	; CHECK: addis [[REG1:[0-9]+]], 2, .LCPI0_2@toc@ha
	; CHECK: addi [[REG2:[0-9]+]], [[REG1]], .LCPI0_2@toc@l
	; CHECK: lvx [[REG3:[0-9]+]], 0, [[REG2]]
	; CHECK: vperm {{[0-9]+}}, [[REG3]], {{[0-9]+}}, {{[0-9]+}}
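In `@foo` the two shuffle inputs are the constant bytes 0..15 and 16..31, so folding the shuffle simply reproduces the mask, which is exactly the new `.LCPI0_0`. The old output instead loaded both inputs plus a `vperm` control vector whose bytes are `31 - m` for each mask byte `m`; this looks like the little-endian `vperm` lowering (the old CHECK lines also pass the 16..31 constant as the first `vperm` source, consistent with swapped inputs). The hypothetical helper below checks the `31 - m` observation against the old CHECK lines; it is a sketch of the pattern, not the PowerPC backend's code.

```cpp
#include <cassert>
#include <vector>

// Little-endian vperm control bytes as they appear in the old output:
// each shuffle-mask byte m (0..31) becomes 31 - m.
std::vector<int> leVpermControl(const std::vector<int> &mask) {
  std::vector<int> ctl;
  ctl.reserve(mask.size());
  for (int m : mask) {
    assert(m >= 0 && m < 32);
    ctl.push_back(31 - m);
  }
  return ctl;
}
```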

test/CodeGen/X86/mmx-bitcast.ll

... 63 lines not shown ...
entry:
%tmp2 = bitcast <1 x i64> %A to x86_mmx		%tmp2 = bitcast <1 x i64> %A to x86_mmx
%tmp3 = bitcast <1 x i64> %B to x86_mmx		%tmp3 = bitcast <1 x i64> %B to x86_mmx
%tmp7 = tail call x86_mmx @llvm.x86.mmx.paddus.w(x86_mmx %tmp2, x86_mmx %tmp3)		%tmp7 = tail call x86_mmx @llvm.x86.mmx.paddus.w(x86_mmx %tmp2, x86_mmx %tmp3)
store x86_mmx %tmp7, x86_mmx* @R		store x86_mmx %tmp7, x86_mmx* @R
tail call void @llvm.x86.mmx.emms()		tail call void @llvm.x86.mmx.emms()
ret void		ret void
}		}

define i64 @t5(i32 %a, i32 %b) nounwind readnone {		define i64 @t5(i32 %a, i32 %b) nounwind readnone {
; CHECK-LABEL: t5:		; CHECK-LABEL: t5:
; CHECK: ## BB#0:		; CHECK: ## BB#0:
; CHECK-NEXT: movd		; CHECK-NEXT: movd
; CHECK-NEXT: movd		; CHECK-NEXT: movd
; CHECK-NEXT: punpckldq {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1]		; CHECK-NEXT: punpckldq {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1]
; CHECK-NEXT: pshufd {{.*#+}} xmm0 = xmm1[0,1,1,3]		; CHECK-NEXT: movd %xmm1, %rax
; CHECK-NEXT: movd %xmm0, %rax		; CHECK-NEXT: retq
; CHECK-NEXT: retq		%v0 = insertelement <2 x i32> undef, i32 %a, i32 0
%v0 = insertelement <2 x i32> undef, i32 %a, i32 0		%v1 = insertelement <2 x i32> %v0, i32 %b, i32 1
%v1 = insertelement <2 x i32> %v0, i32 %b, i32 1		%conv = bitcast <2 x i32> %v1 to i64
%conv = bitcast <2 x i32> %v1 to i64		ret i64 %conv
ret i64 %conv		}
}

declare x86_mmx @llvm.x86.mmx.pslli.q(x86_mmx, i32)		declare x86_mmx @llvm.x86.mmx.pslli.q(x86_mmx, i32)

define <1 x i64> @t6(i64 %t) {		define <1 x i64> @t6(i64 %t) {
; CHECK-LABEL: t6:		; CHECK-LABEL: t6:
; CHECK: ## BB#0:		; CHECK: ## BB#0:
; CHECK-NEXT: movd		; CHECK-NEXT: movd
; CHECK-NEXT: psllq $48, %mm0		; CHECK-NEXT: psllq $48, %mm0
... 16 lines not shown ...

test/CodeGen/X86/sse41.ll

	... 1,013 lines not shown ...
	; X32: ## BB#0:			; X32: ## BB#0:
	; X32-NEXT: movl {{[0-9]+}}(%esp), %eax			; X32-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X32-NEXT: insertps {{.*#+}} xmm0 = xmm0[0],zero,xmm0[2],mem[2]			; X32-NEXT: insertps {{.*#+}} xmm0 = xmm0[0],zero,xmm0[2],mem[2]
	; X32-NEXT: retl			; X32-NEXT: retl
	;			;
	; X64-LABEL: pr20087:			; X64-LABEL: pr20087:
	; X64: ## BB#0:			; X64: ## BB#0:
	; X64-NEXT: insertps {{.*#+}} xmm0 = xmm0[0],zero,xmm0[2],mem[2]			; X64-NEXT: insertps {{.*#+}} xmm0 = xmm0[0],zero,xmm0[2],mem[2]
	; X64-NEXT: retq			; X64-NEXT: retq
	%load = load <4 x float> , <4 x float> *%ptr			%load = load <4 x float> , <4 x float> *%ptr
	%ret = shufflevector <4 x float> %load, <4 x float> %a, <4 x i32> <i32 4, i32 undef, i32 6, i32 2>			%ret = shufflevector <4 x float> %load, <4 x float> %a, <4 x i32> <i32 4, i32 undef, i32 6, i32 2>
	ret <4 x float> %ret			ret <4 x float> %ret
	}			}

	; Edge case for insertps where we end up with a shuffle with mask=<0, 7, -1, -1>			define <4 x float> @insertps_4(<4 x float> %A, <4 x float> %B) {
	define void @insertps_pr20411(i32* noalias nocapture %RET) #1 {			; X32-LABEL: insertps_4:
	; X32-LABEL: insertps_pr20411:			; X32: ## BB#0: ## %entry
	; X32: ## BB#0:			; X32-NEXT: insertps {{.*#+}} xmm0 = xmm0[0],zero,xmm1[2],zero
	; X32-NEXT: movl {{[0-9]+}}(%esp), %eax			; X32-NEXT: retl
	; X32-NEXT: pshufd {{.*#+}} xmm0 = mem[2,3,0,1]			;
	; X32-NEXT: pshufd {{.*#+}} xmm1 = mem[3,1,2,3]
	; X32-NEXT: pblendw {{.*#+}} xmm1 = xmm1[0,1],xmm0[2,3],xmm1[4,5,6,7]
	; X32-NEXT: movdqu %xmm1, (%eax)
	; X32-NEXT: retl
	;
	; X64-LABEL: insertps_pr20411:
	; X64: ## BB#0:
	; X64-NEXT: pshufd {{.*#+}} xmm0 = mem[2,3,0,1]
	; X64-NEXT: pshufd {{.*#+}} xmm1 = mem[3,1,2,3]
	; X64-NEXT: pblendw {{.*#+}} xmm1 = xmm1[0,1],xmm0[2,3],xmm1[4,5,6,7]
	; X64-NEXT: movdqu %xmm1, (%rdi)
	; X64-NEXT: retq
	%gather_load = shufflevector <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>, <8 x i32> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
	%shuffle109 = shufflevector <4 x i32> <i32 4, i32 5, i32 6, i32 7>, <4 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3> ; 4 5 6 7
	%shuffle116 = shufflevector <8 x i32> %gather_load, <8 x i32> undef, <4 x i32> <i32 3, i32 undef, i32 undef, i32 undef> ; 3 x x x
	%shuffle117 = shufflevector <4 x i32> %shuffle109, <4 x i32> %shuffle116, <4 x i32> <i32 4, i32 3, i32 undef, i32 undef> ; 3 7 x x
	%ptrcast = bitcast i32* %RET to <4 x i32>*
	store <4 x i32> %shuffle117, <4 x i32>* %ptrcast, align 4
	ret void
	}
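RKSimon's inline comment says this reduced test folds to a store of a constant once the patch is applied. Evaluating the IR shuffle chain by hand shows why: every non-undef input is a constant, so the stored value is fully determined as <3, 7, undef, undef>, which matches the `; 3 7 x x` annotation. The sketch below is a minimal Python model of `shufflevector` semantics, assuming None stands for an undef lane; the helper name is hypothetical and this is not the DAGCombiner code from the patch.

```python
# Model of LLVM IR `shufflevector` on constant vectors; None marks an undef lane.
# Illustrative only: this follows the IR semantics, not the DAGCombiner code.

def shufflevector(v1, v2, mask):
    """Mask indices < len(v1) select from v1, the rest from v2; None -> undef."""
    n = len(v1)
    out = []
    for m in mask:
        if m is None:
            out.append(None)
        elif m < n:
            out.append(v1[m])
        else:
            out.append(v2[m - n])
    return out

U = None
# The shuffle chain from @insertps_pr20411; every non-undef input is a constant.
gather_load = shufflevector(list(range(8)), [U] * 8, list(range(8)))
shuffle109 = shufflevector([4, 5, 6, 7], [U] * 4, [0, 1, 2, 3])
shuffle116 = shufflevector(gather_load, [U] * 8, [3, U, U, U])
shuffle117 = shufflevector(shuffle109, shuffle116, [4, 3, U, U])

print(shuffle117)  # [3, 7, None, None]
```

Because nothing in the chain depends on %RET's contents, the whole computation can collapse to a constant store, which is why the test no longer exercises insertps.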

	define <4 x float> @insertps_4(<4 x float> %A, <4 x float> %B) {
	; X32-LABEL: insertps_4:
	; X32: ## BB#0: ## %entry
	; X32-NEXT: insertps {{.*#+}} xmm0 = xmm0[0],zero,xmm1[2],zero
	; X32-NEXT: retl
	;
	; X64-LABEL: insertps_4:			; X64-LABEL: insertps_4:
	; X64: ## BB#0: ## %entry			; X64: ## BB#0: ## %entry
	; X64-NEXT: insertps {{.*#+}} xmm0 = xmm0[0],zero,xmm1[2],zero			; X64-NEXT: insertps {{.*#+}} xmm0 = xmm0[0],zero,xmm1[2],zero
	; X64-NEXT: retq			; X64-NEXT: retq
	entry:			entry:
	%vecext = extractelement <4 x float> %A, i32 0			%vecext = extractelement <4 x float> %A, i32 0
	%vecinit = insertelement <4 x float> undef, float %vecext, i32 0			%vecinit = insertelement <4 x float> undef, float %vecext, i32 0
	%vecinit1 = insertelement <4 x float> %vecinit, float 0.000000e+00, i32 1			%vecinit1 = insertelement <4 x float> %vecinit, float 0.000000e+00, i32 1

test/CodeGen/X86/vec_insert-5.ll

	; RUN: llc < %s -march=x86 -mattr=+sse2,+ssse3 \| FileCheck %s			; RUN: llc < %s -march=x86 -mattr=+sse2,+ssse3 \| FileCheck %s
	; There are no MMX operations in @t1			; There are no MMX operations in @t1

	define void @t1(i32 %a, x86_mmx* %P) nounwind {			define void @t1(i32 %a, x86_mmx* %P) nounwind {
	; CHECK-LABEL: t1:			; CHECK-LABEL: t1:
	; CHECK: # BB#0:			; CHECK: # BB#0:
	; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax			; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax
	; CHECK-NEXT: movl {{[0-9]+}}(%esp), %ecx			; CHECK-NEXT: movl {{[0-9]+}}(%esp), %ecx
	; CHECK-NEXT: shll $12, %ecx			; CHECK-NEXT: shll $12, %ecx
	; CHECK-NEXT: movd %ecx, %xmm0			; CHECK-NEXT: movd %ecx, %xmm0
	; CHECK-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,0,0,1]			; CHECK-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,0,1,1]
	; CHECK-NEXT: movlpd %xmm0, (%eax)			; CHECK-NEXT: movlpd %xmm0, (%eax)
	; CHECK-NEXT: retl			; CHECK-NEXT: retl
	%tmp12 = shl i32 %a, 12			%tmp12 = shl i32 %a, 12
	%tmp21 = insertelement <2 x i32> undef, i32 %tmp12, i32 1			%tmp21 = insertelement <2 x i32> undef, i32 %tmp12, i32 1
	%tmp22 = insertelement <2 x i32> %tmp21, i32 0, i32 0			%tmp22 = insertelement <2 x i32> %tmp21, i32 0, i32 0
	%tmp23 = bitcast <2 x i32> %tmp22 to x86_mmx			%tmp23 = bitcast <2 x i32> %tmp22 to x86_mmx
	store x86_mmx %tmp23, x86_mmx* %P			store x86_mmx %tmp23, x86_mmx* %P
	ret void			ret void
	}			}
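The only change in @t1 (and likewise in vec_insert-mmx.ll @t0) is the PSHUFD mask, [1,0,0,1] becoming [1,0,1,1]. Both are correct because `movlpd` stores only the low 64 bits, so lanes 2 and 3 are never observed. A small Python sketch, assuming the PSHUFD immediate is written as a lane list rather than its real imm8 encoding and using a hypothetical stand-in value for %tmp12:

```python
def pshufd(src, lanes):
    """PSHUFD with the immediate written as a lane list: dst[i] = src[lanes[i]]."""
    return [src[i] for i in lanes]

x = 0x7FF000  # hypothetical stand-in for %tmp12 = shl i32 %a, 12 (any i32 works)
xmm0 = [x, 0, 0, 0]  # movd %ecx, %xmm0 zeroes lanes 1-3

old = pshufd(xmm0, [1, 0, 0, 1])  # mask before the patch
new = pshufd(xmm0, [1, 0, 1, 1])  # mask after the patch

# movlpd stores only the low quadword (lanes 0 and 1), i.e. <2 x i32> <0, %tmp12>.
assert old[:2] == new[:2] == [0, x]
print(old, new)
```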

	define <4 x float> @t2(<4 x float>* %P) nounwind {			define <4 x float> @t2(<4 x float>* %P) nounwind {
	; CHECK-LABEL: t2:			; CHECK-LABEL: t2:
	; CHECK: # BB#0:			; CHECK: # BB#0:
	; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax			; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax

test/CodeGen/X86/vec_insert-mmx.ll

	; RUN: llc < %s -mtriple=i686-darwin -mattr=+mmx,+sse2 \| FileCheck %s -check-prefix=X86-32			; RUN: llc < %s -mtriple=i686-darwin -mattr=+mmx,+sse2 \| FileCheck %s -check-prefix=X86-32
	; RUN: llc < %s -mtriple=x86_64-darwin -mattr=+mmx,+sse4.1 \| FileCheck %s -check-prefix=X86-64			; RUN: llc < %s -mtriple=x86_64-darwin -mattr=+mmx,+sse4.1 \| FileCheck %s -check-prefix=X86-64

	; This is not an MMX operation; promoted to XMM.			; This is not an MMX operation; promoted to XMM.
	define x86_mmx @t0(i32 %A) nounwind {			define x86_mmx @t0(i32 %A) nounwind {
	; X86-32-LABEL: t0:			; X86-32-LABEL: t0:
	; X86-32: ## BB#0:			; X86-32: ## BB#0:
	; X86-32: movd {{[0-9]+}}(%esp), %xmm0			; X86-32: movd {{[0-9]+}}(%esp), %xmm0
	; X86-32-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,0,0,1]			; X86-32-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,0,1,1]
	; X86-32-NEXT: movlpd %xmm0, (%esp)			; X86-32-NEXT: movlpd %xmm0, (%esp)
	; X86-32-NEXT: movq (%esp), %mm0			; X86-32-NEXT: movq (%esp), %mm0
	; X86-32-NEXT: addl $12, %esp			; X86-32-NEXT: addl $12, %esp
	; X86-32-NEXT: retl			; X86-32-NEXT: retl
	%tmp3 = insertelement <2 x i32> < i32 0, i32 undef >, i32 %A, i32 1			%tmp3 = insertelement <2 x i32> < i32 0, i32 undef >, i32 %A, i32 1
	%tmp4 = bitcast <2 x i32> %tmp3 to x86_mmx			%tmp4 = bitcast <2 x i32> %tmp3 to x86_mmx
	ret x86_mmx %tmp4			ret x86_mmx %tmp4
	}			}

	define <8 x i8> @t1(i8 zeroext %x) nounwind {			define <8 x i8> @t1(i8 zeroext %x) nounwind {
	; X86-32-LABEL: t1:			; X86-32-LABEL: t1:
	; X86-32: ## BB#0:			; X86-32: ## BB#0:
	; X86-32-NOT: movl			; X86-32-NOT: movl
	; X86-32-NEXT: movd {{[0-9]+}}(%esp), %xmm0			; X86-32-NEXT: movd {{[0-9]+}}(%esp), %xmm0

test/CodeGen/X86/vec_zero_cse.ll

	; RUN: llc < %s -relocation-model=static -mtriple=i686-unknown -mattr=+mmx,+sse3 \| FileCheck %s			; RUN: llc < %s -relocation-model=static -mtriple=i686-unknown -mattr=+mmx,+sse3 \| FileCheck %s
	; 64-bit stores here do not use MMX.			; 64-bit stores here do not use MMX.

	@M1 = external global <1 x i64>			@M1 = external global <1 x i64>
	@M2 = external global <2 x i32>			@M2 = external global <2 x i32>

	@S1 = external global <2 x i64>			@S1 = external global <2 x i64>
	@S2 = external global <4 x i32>			@S2 = external global <4 x i32>

	define void @test1() {			define void @test1() {
	;CHECK-LABEL: @test1			;CHECK-LABEL: @test1
	;CHECK: xorpd			;CHECK: xorpd
	store <1 x i64> zeroinitializer, <1 x i64>* @M1			store <1 x i64> zeroinitializer, <1 x i64>* @M1
	store <2 x i32> zeroinitializer, <2 x i32>* @M2			store <2 x i32> zeroinitializer, <2 x i32>* @M2
	ret void			ret void
	}			}

	define void @test2() {			define void @test2() {
	;CHECK-LABEL: @test2			;CHECK-LABEL: @test2
	;CHECK: pshufd			;CHECK: pcmpeqd
	store <1 x i64> < i64 -1 >, <1 x i64>* @M1			store <1 x i64> < i64 -1 >, <1 x i64>* @M1
	store <2 x i32> < i32 -1, i32 -1 >, <2 x i32>* @M2			store <2 x i32> < i32 -1, i32 -1 >, <2 x i32>* @M2
	ret void			ret void
	}			}
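In @test2 the two stored constants, <1 x i64> <i64 -1> and <2 x i32> <i32 -1, i32 -1>, have the same 64-bit all-ones bit pattern, so a single register value can serve both stores; the new CHECK line expects it to come from `pcmpeqd`, which produces all-ones from any register without a constant load. A Python sketch of both facts, assuming little-endian layout and modelling PCMPEQD lane-wise on four 32-bit lanes (the register contents are arbitrary):

```python
import struct

# <1 x i64> <i64 -1> and <2 x i32> <i32 -1, i32 -1> as little-endian bytes.
m1 = struct.pack("<q", -1)
m2 = struct.pack("<2i", -1, -1)
assert m1 == m2 == b"\xff" * 8  # identical bit patterns, so one value can be reused

def pcmpeqd(a, b):
    """Lane-wise 32-bit compare: all-ones where equal, zero otherwise."""
    return [0xFFFFFFFF if x == y else 0 for x, y in zip(a, b)]

reg = [0x12345678, 0, 0xDEADBEEF, 7]  # arbitrary register contents
# Comparing a register with itself sets every lane to all-ones.
print(pcmpeqd(reg, reg))
```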

	define void @test3() {			define void @test3() {
	;CHECK-LABEL: @test3			;CHECK-LABEL: @test3
	;CHECK: xorps			;CHECK: xorps
	store <2 x i64> zeroinitializer, <2 x i64>* @S1			store <2 x i64> zeroinitializer, <2 x i64>* @S1
	store <4 x i32> zeroinitializer, <4 x i32>* @S2			store <4 x i32> zeroinitializer, <4 x i32>* @S2
	ret void			ret void

test/CodeGen/X86/vector-shuffle-128-v16.ll

	; SSE41-LABEL: PR20540:			; SSE41-LABEL: PR20540:
	; SSE41: # BB#0:			; SSE41: # BB#0:
	; SSE41-NEXT: pshufb {{.*#+}} xmm0 = xmm0[0,2,4,6,8,10,12,14],zero,zero,zero,zero,zero,zero,zero,zero			; SSE41-NEXT: pshufb {{.*#+}} xmm0 = xmm0[0,2,4,6,8,10,12,14],zero,zero,zero,zero,zero,zero,zero,zero
	; SSE41-NEXT: retq			; SSE41-NEXT: retq
	;			;
	; AVX-LABEL: PR20540:			; AVX-LABEL: PR20540:
	; AVX: # BB#0:			; AVX: # BB#0:
	; AVX-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[0,2,4,6,8,10,12,14],zero,zero,zero,zero,zero,zero,zero,zero			; AVX-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[0,2,4,6,8,10,12,14],zero,zero,zero,zero,zero,zero,zero,zero
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%shuffle = shufflevector <8 x i8> %a, <8 x i8> zeroinitializer, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8>			%shuffle = shufflevector <8 x i8> %a, <8 x i8> zeroinitializer, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8>
	ret <16 x i8> %shuffle			ret <16 x i8> %shuffle
	}			}

	define <16 x i8> @shuffle_v16i8_16_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz(i8 %i) {			define <16 x i8> @shuffle_v16i8_16_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz(i8 %i) {
	; SSE2-LABEL: shuffle_v16i8_16_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz:			; SSE-LABEL: shuffle_v16i8_16_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz:
	; SSE2: # BB#0:			; SSE: # BB#0:
	; SSE2-NEXT: movzbl %dil, %eax			; SSE-NEXT: movzbl %dil, %eax
	; SSE2-NEXT: movd %eax, %xmm0			; SSE-NEXT: movd %eax, %xmm0
	; SSE2-NEXT: retq			; SSE-NEXT: retq
	;			;
				andreadb: Shouldn't this be 'vmovd'?
				RKSimon (author): Fixed.
	; SSSE3-LABEL: shuffle_v16i8_16_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz:			; AVX-LABEL: shuffle_v16i8_16_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz:
	; SSSE3: # BB#0:			; AVX: # BB#0:
	; SSSE3-NEXT: movd %edi, %xmm0			; AVX-NEXT: movzbl %dil, %eax
	; SSSE3-NEXT: pshufb {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero			; AVX-NEXT: movd %eax, %xmm0
	; SSSE3-NEXT: retq			; AVX-NEXT: retq
	;			%a = insertelement <16 x i8> undef, i8 %i, i32 0
	; SSE41-LABEL: shuffle_v16i8_16_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz:			%shuffle = shufflevector <16 x i8> zeroinitializer, <16 x i8> %a, <16 x i32> <i32 16, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
	; SSE41: # BB#0:			ret <16 x i8> %shuffle
	; SSE41-NEXT: movd %edi, %xmm0			}
	; SSE41-NEXT: pshufb {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
	; SSE41-NEXT: retq			define <16 x i8> @shuffle_v16i8_zz_zz_zz_zz_zz_16_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz(i8 %i) {
	;			; SSE-LABEL: shuffle_v16i8_zz_zz_zz_zz_zz_16_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz:
	; AVX-LABEL: shuffle_v16i8_16_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz:			; SSE: # BB#0:
	; AVX: # BB#0:			; SSE-NEXT: shll $8, %edi
	; AVX-NEXT: vmovd %edi, %xmm0			; SSE-NEXT: pxor %xmm0, %xmm0
	; AVX-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero			; SSE-NEXT: pinsrw $2, %edi, %xmm0
	; AVX-NEXT: retq			; SSE-NEXT: retq
	%a = insertelement <16 x i8> undef, i8 %i, i32 0
	%shuffle = shufflevector <16 x i8> zeroinitializer, <16 x i8> %a, <16 x i32> <i32 16, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>			; AVX-LABEL: shuffle_v16i8_zz_zz_zz_zz_zz_16_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz:
	ret <16 x i8> %shuffle			; AVX: # BB#0:
	}			; AVX-NEXT: shll $8, %edi
				; AVX-NEXT: vpxor %xmm0, %xmm0
	define <16 x i8> @shuffle_v16i8_zz_zz_zz_zz_zz_16_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz(i8 %i) {			; AVX-NEXT: vpinsrw $2, %edi, %xmm0
	; SSE2-LABEL: shuffle_v16i8_zz_zz_zz_zz_zz_16_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz:			; AVX-NEXT: retq
	; SSE2: # BB#0:			%a = insertelement <16 x i8> undef, i8 %i, i32 0
	; SSE2-NEXT: movzbl %dil, %eax			%shuffle = shufflevector <16 x i8> zeroinitializer, <16 x i8> %a, <16 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 16, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>
	; SSE2-NEXT: movd %eax, %xmm0			ret <16 x i8> %shuffle
	; SSE2-NEXT: pslldq {{.*#+}} xmm0 = zero,zero,zero,zero,zero,xmm0[0,1,2,3,4,5,6,7,8,9,10]			}
	; SSE2-NEXT: retq
	;			define <16 x i8> @shuffle_v16i8_zz_uu_uu_zz_uu_uu_zz_zz_zz_zz_zz_zz_zz_zz_zz_16(i8 %i) {
	; SSSE3-LABEL: shuffle_v16i8_zz_zz_zz_zz_zz_16_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz:			; SSE-LABEL: shuffle_v16i8_zz_uu_uu_zz_uu_uu_zz_zz_zz_zz_zz_zz_zz_zz_zz_16:
	; SSSE3: # BB#0:			; SSE: # BB#0:
	; SSSE3-NEXT: movd %edi, %xmm0			; SSE-NEXT: shll $8, %edi
	; SSSE3-NEXT: pshufb {{.*#+}} xmm0 = zero,zero,zero,zero,zero,xmm0[0],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero			; SSE-NEXT: pxor %xmm0, %xmm0
	; SSSE3-NEXT: retq			; SSE-NEXT: pinsrw $7, %edi, %xmm0
	;			; SSE-NEXT: retq
	; SSE41-LABEL: shuffle_v16i8_zz_zz_zz_zz_zz_16_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz:			;
	; SSE41: # BB#0:			; AVX-LABEL: shuffle_v16i8_zz_uu_uu_zz_uu_uu_zz_zz_zz_zz_zz_zz_zz_zz_zz_16:
	; SSE41-NEXT: movd %edi, %xmm0			; AVX: # BB#0:
	; SSE41-NEXT: pshufb {{.*#+}} xmm0 = zero,zero,zero,zero,zero,xmm0[0],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero			; AVX-NEXT: shll $8, %edi
	; SSE41-NEXT: retq			; AVX-NEXT: vpxor %xmm0, %xmm0
	;			; AVX-NEXT: vpinsrw $7, %edi, %xmm0
	; AVX-LABEL: shuffle_v16i8_zz_zz_zz_zz_zz_16_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz:			; AVX-NEXT: retq
	; AVX: # BB#0:			%a = insertelement <16 x i8> undef, i8 %i, i32 0
	; AVX-NEXT: vmovd %edi, %xmm0			%shuffle = shufflevector <16 x i8> zeroinitializer, <16 x i8> %a, <16 x i32> <i32 0, i32 undef, i32 undef, i32 3, i32 undef, i32 undef, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 16>
	; AVX-NEXT: vpshufb {{.*#+}} xmm0 = zero,zero,zero,zero,zero,xmm0[0],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero			ret <16 x i8> %shuffle
	; AVX-NEXT: retq			}
	%a = insertelement <16 x i8> undef, i8 %i, i32 0
	%shuffle = shufflevector <16 x i8> zeroinitializer, <16 x i8> %a, <16 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 16, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>			define <16 x i8> @shuffle_v16i8_zz_zz_19_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz(i8 %i) {
	ret <16 x i8> %shuffle			; SSE-LABEL: shuffle_v16i8_zz_zz_19_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz:
	}			; SSE: # BB#0:
				; SSE-NEXT: movzbl %dil, %eax
	define <16 x i8> @shuffle_v16i8_zz_uu_uu_zz_uu_uu_zz_zz_zz_zz_zz_zz_zz_zz_zz_16(i8 %i) {			; SSE-NEXT: pxor %xmm0, %xmm0
	; SSE-LABEL: shuffle_v16i8_zz_uu_uu_zz_uu_uu_zz_zz_zz_zz_zz_zz_zz_zz_zz_16:			; SSE-NEXT: pinsrw $1, %eax, %xmm0
	; SSE: # BB#0:			; SSE-NEXT: retq
	; SSE-NEXT: movd %edi, %xmm0			;
	; SSE-NEXT: pslldq {{.*#+}} xmm0 = zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,xmm0[0]			; AVX-LABEL: shuffle_v16i8_zz_zz_19_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz:
	; SSE-NEXT: retq			; AVX: # BB#0:
	;			; AVX-NEXT: movzbl %dil, %eax
	; AVX-LABEL: shuffle_v16i8_zz_uu_uu_zz_uu_uu_zz_zz_zz_zz_zz_zz_zz_zz_zz_16:			; AVX-NEXT: vpxor %xmm0, %xmm0
	; AVX: # BB#0:			; AVX-NEXT: vpinsrw $1, %eax, %xmm0
	; AVX-NEXT: vmovd %edi, %xmm0			; AVX-NEXT: retq
	; AVX-NEXT: vpslldq {{.*#+}} xmm0 = zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,xmm0[0]			%a = insertelement <16 x i8> undef, i8 %i, i32 3
	; AVX-NEXT: retq			%shuffle = shufflevector <16 x i8> zeroinitializer, <16 x i8> %a, <16 x i32> <i32 0, i32 1, i32 19, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
	%a = insertelement <16 x i8> undef, i8 %i, i32 0			ret <16 x i8> %shuffle
	%shuffle = shufflevector <16 x i8> zeroinitializer, <16 x i8> %a, <16 x i32> <i32 0, i32 undef, i32 undef, i32 3, i32 undef, i32 undef, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 16>			}
	ret <16 x i8> %shuffle
	}

	define <16 x i8> @shuffle_v16i8_zz_zz_19_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz(i8 %i) {
	; SSE2-LABEL: shuffle_v16i8_zz_zz_19_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz:
	; SSE2: # BB#0:
	; SSE2-NEXT: movzbl %dil, %eax
	; SSE2-NEXT: movd %eax, %xmm0
	; SSE2-NEXT: pslldq {{.*#+}} xmm0 = zero,zero,xmm0[0,1,2,3,4,5,6,7,8,9,10,11,12,13]
	; SSE2-NEXT: retq
	;
	; SSSE3-LABEL: shuffle_v16i8_zz_zz_19_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz:
	; SSSE3: # BB#0:
	; SSSE3-NEXT: movd %edi, %xmm0
	; SSSE3-NEXT: pslld $24, %xmm0
	; SSSE3-NEXT: pshufb {{.*#+}} xmm0 = zero,zero,xmm0[3],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
	; SSSE3-NEXT: retq
	;
	; SSE41-LABEL: shuffle_v16i8_zz_zz_19_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz:
	; SSE41: # BB#0:
	; SSE41-NEXT: movd %edi, %xmm0
	; SSE41-NEXT: pslld $24, %xmm0
	; SSE41-NEXT: pshufb {{.*#+}} xmm0 = zero,zero,xmm0[3],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
	; SSE41-NEXT: retq
	;
	; AVX-LABEL: shuffle_v16i8_zz_zz_19_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz:
	; AVX: # BB#0:
	; AVX-NEXT: vmovd %edi, %xmm0
	; AVX-NEXT: vpslld $24, %xmm0, %xmm0
	; AVX-NEXT: vpshufb {{.*#+}} xmm0 = zero,zero,xmm0[3],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
	; AVX-NEXT: retq
	%a = insertelement <16 x i8> undef, i8 %i, i32 3
	%shuffle = shufflevector <16 x i8> zeroinitializer, <16 x i8> %a, <16 x i32> <i32 0, i32 1, i32 19, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
	ret <16 x i8> %shuffle
	}
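In these v16i8 tests the shuffle of `zeroinitializer` with an `insertelement` of %i now folds into a single BUILD_VECTOR, and the new CHECK lines insert the byte as a 16-bit word with `pinsrw`. The sketch below models that fold and derives the word lane from the byte lane: an odd byte is the high half of its word (hence `shll $8`), an even byte is the low half (hence `movzbl`). It is a reading of the CHECK lines under the assumption that None is undef and "I" stands for %i; the helper names are hypothetical, and the lane-0 test, which uses `movzbl` + `movd` instead, is not modelled.

```python
U = None  # undef lane
I = "I"   # placeholder for the scalar argument %i

def fold(lhs, rhs, mask):
    """Fold a shuffle of two BUILD_VECTORs into one BUILD_VECTOR."""
    n = len(lhs)
    return [U if m is U else (lhs[m] if m < n else rhs[m - n]) for m in mask]

def insert_scalar(n, k):
    """insertelement <n x i8> undef, i8 %i, i32 k"""
    v = [U] * n
    v[k] = I
    return v

def byte_insert_plan(lanes):
    """Return (PINSRW word lane, whether the byte needs `shll $8` first)."""
    [byte] = [i for i, v in enumerate(lanes) if v == I]
    return byte // 2, byte % 2 == 1

cases = {  # keys abbreviate the shuffle_v16i8_* test names: (insert index, mask)
    "zz_zz_zz_zz_zz_16": (0, [0] * 5 + [16] + [0] * 10),
    "zz_uu_uu_zz_uu_uu_zz_16": (0, [0, U, U, 3, U, U] + list(range(6, 15)) + [16]),
    "zz_zz_19": (3, [0, 1, 19] + list(range(3, 16))),
}
plans = {name: byte_insert_plan(fold([0] * 16, insert_scalar(16, k), mask))
         for name, (k, mask) in cases.items()}
for name, plan in plans.items():
    print(name, plan)
```

The results line up with the new SSE/AVX output: `shll $8` + `pinsrw $2`, `shll $8` + `pinsrw $7`, and `movzbl` + `pinsrw $1` respectively.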

	define <16 x i8> @shuffle_v16i8_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_16_uu_18_uu(<16 x i8> %a) {			define <16 x i8> @shuffle_v16i8_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_16_uu_18_uu(<16 x i8> %a) {
	; SSE-LABEL: shuffle_v16i8_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_16_uu_18_uu:			; SSE-LABEL: shuffle_v16i8_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_16_uu_18_uu:
	; SSE: # BB#0:			; SSE: # BB#0:
	; SSE-NEXT: pslldq {{.*#+}} xmm0 = zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,xmm0[0,1,2,3]			; SSE-NEXT: pslldq {{.*#+}} xmm0 = zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,xmm0[0,1,2,3]
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; AVX-LABEL: shuffle_v16i8_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_16_uu_18_uu:			; AVX-LABEL: shuffle_v16i8_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_16_uu_18_uu:
	; AVX: # BB#0:			; AVX: # BB#0:

test/CodeGen/X86/vector-shuffle-128-v8.ll

	;			;
	; AVX-LABEL: shuffle_v8i16_8zzzzzzz:			; AVX-LABEL: shuffle_v8i16_8zzzzzzz:
	; AVX: # BB#0:			; AVX: # BB#0:
	; AVX-NEXT: movzwl %di, %eax			; AVX-NEXT: movzwl %di, %eax
	; AVX-NEXT: vmovd %eax, %xmm0			; AVX-NEXT: vmovd %eax, %xmm0
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%a = insertelement <8 x i16> undef, i16 %i, i32 0			%a = insertelement <8 x i16> undef, i16 %i, i32 0
	%shuffle = shufflevector <8 x i16> zeroinitializer, <8 x i16> %a, <8 x i32> <i32 8, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>			%shuffle = shufflevector <8 x i16> zeroinitializer, <8 x i16> %a, <8 x i32> <i32 8, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
	ret <8 x i16> %shuffle			ret <8 x i16> %shuffle
	}			}

	define <8 x i16> @shuffle_v8i16_z8zzzzzz(i16 %i) {			define <8 x i16> @shuffle_v8i16_z8zzzzzz(i16 %i) {
	; SSE-LABEL: shuffle_v8i16_z8zzzzzz:			; SSE-LABEL: shuffle_v8i16_z8zzzzzz:
	; SSE: # BB#0:			; SSE: # BB#0:
	; SSE-NEXT: movzwl %di, %eax			; SSE-NEXT: pxor %xmm0, %xmm0
	; SSE-NEXT: movd %eax, %xmm0			; SSE-NEXT: pinsrw $1, %edi, %xmm0
	; SSE-NEXT: pslldq {{.*#+}} xmm0 = zero,zero,xmm0[0,1,2,3,4,5,6,7,8,9,10,11,12,13]			; SSE-NEXT: retq
	; SSE-NEXT: retq			;
	;			; AVX-LABEL: shuffle_v8i16_z8zzzzzz:
	; AVX-LABEL: shuffle_v8i16_z8zzzzzz:			; AVX: # BB#0:
	; AVX: # BB#0:			; AVX-NEXT: vpxor %xmm0, %xmm0
	; AVX-NEXT: movzwl %di, %eax			; AVX-NEXT: vpinsrw $1, %edi, %xmm0
	; AVX-NEXT: vmovd %eax, %xmm0			; AVX-NEXT: retq
	; AVX-NEXT: vpslldq {{.*#+}} xmm0 = zero,zero,xmm0[0,1,2,3,4,5,6,7,8,9,10,11,12,13]			%a = insertelement <8 x i16> undef, i16 %i, i32 0
	; AVX-NEXT: retq			%shuffle = shufflevector <8 x i16> zeroinitializer, <8 x i16> %a, <8 x i32> <i32 2, i32 8, i32 3, i32 7, i32 6, i32 5, i32 4, i32 3>
	%a = insertelement <8 x i16> undef, i16 %i, i32 0			ret <8 x i16> %shuffle
	%shuffle = shufflevector <8 x i16> zeroinitializer, <8 x i16> %a, <8 x i32> <i32 2, i32 8, i32 3, i32 7, i32 6, i32 5, i32 4, i32 3>			}
	ret <8 x i16> %shuffle
	}			define <8 x i16> @shuffle_v8i16_zzzzz8zz(i16 %i) {
				; SSE-LABEL: shuffle_v8i16_zzzzz8zz:
	define <8 x i16> @shuffle_v8i16_zzzzz8zz(i16 %i) {			; SSE: # BB#0:
	; SSE-LABEL: shuffle_v8i16_zzzzz8zz:			; SSE-NEXT: pxor %xmm0, %xmm0
	; SSE: # BB#0:			; SSE-NEXT: pinsrw $5, %edi, %xmm0
	; SSE-NEXT: movzwl %di, %eax			; SSE-NEXT: retq
	; SSE-NEXT: movd %eax, %xmm0			;
	; SSE-NEXT: pslldq {{.*#+}} xmm0 = zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,xmm0[0,1,2,3,4,5]			; AVX-LABEL: shuffle_v8i16_zzzzz8zz:
	; SSE-NEXT: retq			; AVX: # BB#0:
	;			; AVX-NEXT: vpxor %xmm0, %xmm0
	; AVX-LABEL: shuffle_v8i16_zzzzz8zz:			; AVX-NEXT: vpinsrw $5, %edi, %xmm0
	; AVX: # BB#0:			; AVX-NEXT: retq
	; AVX-NEXT: movzwl %di, %eax			%a = insertelement <8 x i16> undef, i16 %i, i32 0
	; AVX-NEXT: vmovd %eax, %xmm0			%shuffle = shufflevector <8 x i16> zeroinitializer, <8 x i16> %a, <8 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 8, i32 0, i32 0>
	; AVX-NEXT: vpslldq {{.*#+}} xmm0 = zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,xmm0[0,1,2,3,4,5]			ret <8 x i16> %shuffle
	; AVX-NEXT: retq			}
	%a = insertelement <8 x i16> undef, i16 %i, i32 0
	%shuffle = shufflevector <8 x i16> zeroinitializer, <8 x i16> %a, <8 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 8, i32 0, i32 0>			define <8 x i16> @shuffle_v8i16_zuuzuuz8(i16 %i) {
	ret <8 x i16> %shuffle			; SSE-LABEL: shuffle_v8i16_zuuzuuz8:
	}			; SSE: # BB#0:
				; SSE-NEXT: pxor %xmm0, %xmm0
	define <8 x i16> @shuffle_v8i16_zuuzuuz8(i16 %i) {			; SSE-NEXT: pinsrw $7, %edi, %xmm0
	; SSE-LABEL: shuffle_v8i16_zuuzuuz8:			; SSE-NEXT: retq
	; SSE: # BB#0:			;
	; SSE-NEXT: movd %edi, %xmm0			; AVX-LABEL: shuffle_v8i16_zuuzuuz8:
	; SSE-NEXT: pslldq {{.*#+}} xmm0 = zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,xmm0[0,1]			; AVX: # BB#0:
	; SSE-NEXT: retq			; AVX-NEXT: vpxor %xmm0, %xmm0
	;			; AVX-NEXT: vpinsrw $7, %edi, %xmm0
	; AVX-LABEL: shuffle_v8i16_zuuzuuz8:			; AVX-NEXT: retq
	; AVX: # BB#0:			%a = insertelement <8 x i16> undef, i16 %i, i32 0
	; AVX-NEXT: vmovd %edi, %xmm0			%shuffle = shufflevector <8 x i16> zeroinitializer, <8 x i16> %a, <8 x i32> <i32 0, i32 undef, i32 undef, i32 3, i32 undef, i32 undef, i32 6, i32 8>
	; AVX-NEXT: vpslldq {{.*#+}} xmm0 = zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,xmm0[0,1]			ret <8 x i16> %shuffle
	; AVX-NEXT: retq			}
	%a = insertelement <8 x i16> undef, i16 %i, i32 0
	%shuffle = shufflevector <8 x i16> zeroinitializer, <8 x i16> %a, <8 x i32> <i32 0, i32 undef, i32 undef, i32 3, i32 undef, i32 undef, i32 6, i32 8>			define <8 x i16> @shuffle_v8i16_zzBzzzzz(i16 %i) {
	ret <8 x i16> %shuffle			; SSE-LABEL: shuffle_v8i16_zzBzzzzz:
	}			; SSE: # BB#0:
				; SSE-NEXT: pxor %xmm0, %xmm0
	define <8 x i16> @shuffle_v8i16_zzBzzzzz(i16 %i) {			; SSE-NEXT: pinsrw $2, %edi, %xmm0
	; SSE-LABEL: shuffle_v8i16_zzBzzzzz:			; SSE-NEXT: retq
	; SSE: # BB#0:			;
	; SSE-NEXT: movzwl %di, %eax			; AVX-LABEL: shuffle_v8i16_zzBzzzzz:
	; SSE-NEXT: movd %eax, %xmm0			; AVX: # BB#0:
	; SSE-NEXT: pslldq {{.*#+}} xmm0 = zero,zero,zero,zero,xmm0[0,1,2,3,4,5,6,7,8,9,10,11]			; AVX-NEXT: vpxor %xmm0, %xmm0
	; SSE-NEXT: retq			; AVX-NEXT: vpinsrw $2, %edi, %xmm0
	;			; AVX-NEXT: retq
	; AVX-LABEL: shuffle_v8i16_zzBzzzzz:			%a = insertelement <8 x i16> undef, i16 %i, i32 3
	; AVX: # BB#0:			%shuffle = shufflevector <8 x i16> zeroinitializer, <8 x i16> %a, <8 x i32> <i32 0, i32 1, i32 11, i32 3, i32 4, i32 5, i32 6, i32 7>
	; AVX-NEXT: movzwl %di, %eax			ret <8 x i16> %shuffle
	; AVX-NEXT: vmovd %eax, %xmm0			}
	; AVX-NEXT: vpslldq {{.*#+}} xmm0 = zero,zero,zero,zero,xmm0[0,1,2,3,4,5,6,7,8,9,10,11]
	; AVX-NEXT: retq
	%a = insertelement <8 x i16> undef, i16 %i, i32 3
	%shuffle = shufflevector <8 x i16> zeroinitializer, <8 x i16> %a, <8 x i32> <i32 0, i32 1, i32 11, i32 3, i32 4, i32 5, i32 6, i32 7>
	ret <8 x i16> %shuffle
	}
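For the v8i16 tests the new CHECK lines are `pxor` followed by `pinsrw $N`. Folding the shuffle of `zeroinitializer` and the inserted scalar into one BUILD_VECTOR shows where N comes from, and why the undef lanes in @shuffle_v8i16_zuuzuuz8 are harmless: an undef lane may take any value, so the zero from `pxor` is a legal choice. A Python sketch under the same assumptions as a plain model (None is undef, "I" stands for %i, helper names hypothetical):

```python
U = None  # undef lane
I = "I"   # placeholder for the scalar argument %i

def fold(lhs, rhs, mask):
    """Fold a shuffle of two BUILD_VECTORs into one BUILD_VECTOR."""
    n = len(lhs)
    return [U if m is U else (lhs[m] if m < n else rhs[m - n]) for m in mask]

def insert_scalar(n, k):
    """insertelement <n x i16> undef, i16 %i, i32 k"""
    v = [U] * n
    v[k] = I
    return v

def pinsrw_lane(lanes):
    """Lane holding %i, provided every other lane is 0 or undef (undef may be zeroed)."""
    hits = [i for i, v in enumerate(lanes) if v == I]
    others = [v for v in lanes if v != I]
    if len(hits) != 1 or any(v not in (0, U) for v in others):
        raise ValueError("not a zero vector with a single inserted lane")
    return hits[0]

cases = {  # shuffle_v8i16_* name: (insertelement index, shuffle mask)
    "z8zzzzzz": (0, [2, 8, 3, 7, 6, 5, 4, 3]),
    "zzzzz8zz": (0, [0, 0, 0, 0, 0, 8, 0, 0]),
    "zuuzuuz8": (0, [0, U, U, 3, U, U, 6, 8]),
    "zzBzzzzz": (3, [0, 1, 11, 3, 4, 5, 6, 7]),
}
lanes = {name: pinsrw_lane(fold([0] * 8, insert_scalar(8, k), mask))
         for name, (k, mask) in cases.items()}
print(lanes)
```

The computed lanes (1, 5, 7 and 2) are the `pinsrw` immediates in the new CHECK lines.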

	define <8 x i16> @shuffle_v8i16_def01234(<8 x i16> %a, <8 x i16> %b) {			define <8 x i16> @shuffle_v8i16_def01234(<8 x i16> %a, <8 x i16> %b) {
	; SSE2-LABEL: shuffle_v8i16_def01234:			; SSE2-LABEL: shuffle_v8i16_def01234:
	; SSE2: # BB#0:			; SSE2: # BB#0:
	; SSE2-NEXT: psrldq {{.*#+}} xmm1 = xmm1[10,11,12,13,14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero			; SSE2-NEXT: psrldq {{.*#+}} xmm1 = xmm1[10,11,12,13,14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
	; SSE2-NEXT: pslldq {{.*#+}} xmm0 = zero,zero,zero,zero,zero,zero,xmm0[0,1,2,3,4,5,6,7,8,9]			; SSE2-NEXT: pslldq {{.*#+}} xmm0 = zero,zero,zero,zero,zero,zero,xmm0[0,1,2,3,4,5,6,7,8,9]
	; SSE2-NEXT: por %xmm1, %xmm0			; SSE2-NEXT: por %xmm1, %xmm0
	; SSE2-NEXT: retq			; SSE2-NEXT: retq
	;			;