This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner] Combine shuffles of BUILD_VECTOR and SCALAR_TO_VECTOR
ClosedPublic

Authored by RKSimon on Mar 22 2015, 7:10 AM.


Details

Reviewers

spatel
qcolombet
chandlerc
andreadb
resistor

Commits

rGed2ba33ba0b3: [DAGCombiner] Combine shuffles of BUILD_VECTOR and SCALAR_TO_VECTOR
rL234004: [DAGCombiner] Combine shuffles of BUILD_VECTOR and SCALAR_TO_VECTOR

Summary

This patch attempts to optimize the shuffling of 'scalar source' inputs - BUILD_VECTOR and SCALAR_TO_VECTOR nodes. This folds away a lot of unnecessary shuffle nodes, and allows quite a bit of constant folding that was being missed.

At the moment the inputs are only combined if they are only being used once - I'm interested in extending this so that constant inputs are always combined. It would create more constant data but would remove more shuffles (which may be introducing their own constant data for masks anyhow). Comments please.

Also removed an x86 insertps test that was testing for the old shuffle lowering system.
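As a rough illustration of the fold described in the summary, the following is a minimal standalone sketch, not the DAGCombiner code itself. It assumes both shuffle inputs are plain lists of scalars (the BUILD_VECTOR case); the names Scalar and foldShuffle are hypothetical, and std::nullopt stands in for undef.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical stand-in for a scalar operand; std::nullopt models undef.
using Scalar = std::optional<int>;

// Fold shuffle(A, B, Mask) into a single list of scalars.
// Mask entries < 0 give undef, entries in [0, N) read from A, and
// entries in [N, 2N) read from B, where N is the input vector length.
std::vector<Scalar> foldShuffle(const std::vector<Scalar> &A,
                                const std::vector<Scalar> &B,
                                const std::vector<int> &Mask) {
  const int N = static_cast<int>(A.size());
  std::vector<Scalar> Out;
  Out.reserve(Mask.size());
  for (int M : Mask) {
    if (M < 0) {
      Out.push_back(std::nullopt); // undef mask element
      continue;
    }
    const std::vector<Scalar> &Src = (M < N) ? A : B;
    Out.push_back(Src[M % N]);
  }
  return Out;
}
```

With inputs <c, undef> and <e, undef> and mask <0, 2>, this yields <c, e>: the shuffle disappears and only the scalars remain.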

Diff Detail

Repository: rL LLVM

Event Timeline

RKSimon updated this revision to Diff 22419. Mar 22 2015, 7:10 AM

RKSimon retitled this revision from to [DAGCombiner] Combine shuffles of BUILD_VECTOR and SCALAR_TO_VECTOR.

RKSimon updated this object.

RKSimon edited the test plan for this revision.

RKSimon added reviewers: qcolombet, chandlerc, andreadb, spatel.

RKSimon set the repository for this revision to rL LLVM.

RKSimon added a subscriber: Unknown Object (MLST).

At the moment the inputs are only combined if they are only being used once - I'm interested in extending this so that constant inputs are always combined. It would create more constant data but would remove more shuffles (which may be introducing their own constant data for masks anyhow). Comments please.

We need to be careful here. Constant loads can be much more expensive than shuffles on some targets.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
12019 ↗	(On Diff #22419)	Add space after if.
12029 ↗	(On Diff #22419)	When would this be false?
12042 ↗	(On Diff #22419)	It might be better to use ZExt, depending on the target/type. Might be better to use something like this: Op = isZExtFree(Op.getValueType(), SVT) ? DAG.getZExtOrTrunc(Op, SDLoc(N), SVT) : DAG.getSExtOrTrunc(Op, SDLoc(N), SVT);
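To make the ZExt versus SExt suggestion concrete, here is a small standalone sketch on plain integers. It assumes 8-bit scalars widened to 32 bits; widen and its ZExtFree flag are hypothetical stand-ins for the TLI.isZExtFree selection quoted above, not LLVM API. The two extensions differ only in the upper bits, so truncating back to the element width gives the same value either way, which is why the cheaper one can be picked.

```cpp
#include <cassert>
#include <cstdint>

// Zero extension: the upper 24 bits become 0.
inline uint32_t zext8to32(uint8_t V) { return static_cast<uint32_t>(V); }

// Sign extension: the upper 24 bits copy bit 7 of the input.
inline uint32_t sext8to32(uint8_t V) {
  return static_cast<uint32_t>(static_cast<int32_t>(static_cast<int8_t>(V)));
}

// Hypothetical model of the suggested selection: use zero extension when
// the target reports it as free, otherwise fall back to sign extension.
inline uint32_t widen(uint8_t V, bool ZExtFree) {
  return ZExtFree ? zext8to32(V) : sext8to32(V);
}

// Truncation keeps only the low 8 bits.
inline uint8_t trunc32to8(uint32_t V) { return static_cast<uint8_t>(V); }
```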

Thanks for the review Hal, I'll update the patch later.

In D8516#145467, @hfinkel wrote:

At the moment the inputs are only combined if they are only being used once - I'm interested in extending this so that constant inputs are always combined. It would create more constant data but would remove more shuffles (which may be introducing their own constant data for masks anyhow). Comments please.

We need to be careful here. Constant loads can be much more expensive than shuffles on some targets.

Yes, that was my concern as well - there isn't an easy way to determine how expensive the shuffle that we're trying to remove is. I'll leave it as it is for now and only combine when the inputs are only used once.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
12029 ↗	(On Diff #22419)	We have an early out from the loop if an active operand turns out to be something other than BUILD_VECTOR or SCALAR_TO_VECTOR. I'll move this logic to the opening if() entry into the block.
12042 ↗	(On Diff #22419)	Easy enough to add.

Updated patch based on Hal's review.

Updated patch

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
12032 ↗	(On Diff #22555)	I ended up keeping this in as it makes for much easier understanding than all the conditions I would have to add to the if() test. I've added a comment explaining the 'bail out' to make it clearer.

PING

ping * 2

Looks good to me.

This revision is now accepted and ready to land. Apr 1 2015, 9:48 AM

andreadb added inline comments. Apr 1 2015, 10:03 AM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
12033–12035 ↗	(On Diff #22555)	On x86, a build_vector whose operands are all undef except the first is legalized to a scalar_to_vector. I am not sure whether other targets do the same; if so, I would suggest removing this code and just emitting a build_vector. Do you have an example that relies on this check? A shuffle with only one non-undef element should have been canonicalized/simplified to a shuffle where one of the operands is undef.
test/CodeGen/X86/mmx-bitcast.ll
78 ↗	(On Diff #22555)	I don't think this is related to your patch. However, this looks like a bug to me. Shouldn't this be a 'movq'?
test/CodeGen/X86/sse41.ll
1028–1029 ↗	(On Diff #22555)	My question is: do we have an equivalent test for insertps somewhere else? If so, then I think it is OK to remove it. Otherwise I would keep it.
test/CodeGen/X86/vector-shuffle-128-v16.ll
646 ↗	(On Diff #22555)	Shouldn't this be 'vmovd'?

Thanks Andrea. Are you happy with me submitting with these changes (after tests) or would you prefer another review?

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
12033–12035 ↗	(On Diff #22555)	No specific need - it's easy enough to always create a BUILD_VECTOR and rely on a target's lowering logic to do the right thing.
test/CodeGen/X86/mmx-bitcast.ll
78 ↗	(On Diff #22555)	movd deals with 32 and 64-bit gprs <-> vector moves. movq does vector load/stores and vector <-> vector moves. It's a funny old world.
test/CodeGen/X86/sse41.ll
1028–1029 ↗	(On Diff #22555)	It's a reduced test that, with this patch, folds to a store of a constant. I'll change it to use non-constant vector data and see if that works.
test/CodeGen/X86/vector-shuffle-128-v16.ll
646 ↗	(On Diff #22555)	Fixed.

RKSimon added inline comments. Apr 1 2015, 11:34 AM

test/CodeGen/X86/mmx-bitcast.ll
78 ↗	(On Diff #22555)	Actually scrub that - this does appear to be a bug. It is a (v)movq instruction, but encoded similarly to the (v)movd and completely separate from the (v)movq vector version.

In D8516#150644, @RKSimon wrote:

Thanks Andrea. Are you happy with me submitting with these changes (after tests) or would you prefer another review?

The patch looks good to me.
Thanks!

test/CodeGen/X86/mmx-bitcast.ll
78 ↗	(On Diff #22555)	Anyway, this problem is not related to your patch. If needed, you can raise a bug for it.

spatel added inline comments. Apr 1 2015, 12:31 PM

test/CodeGen/X86/mmx-bitcast.ll
78 ↗	(On Diff #22555)	I had a similar question in: http://reviews.llvm.org/D8691 Looks like some weirdness due to opcode prefixes; one of the movq versions won't work on a 64-bit system.

Closed by commit rL234004: [DAGCombiner] Combine shuffles of BUILD_VECTOR and SCALAR_TO_VECTOR (authored by RKSimon). Apr 3 2015, 3:05 AM

This revision was automatically updated to reflect the committed changes.

RKSimon mentioned this in D8885: [CodeGen] Combine small-element shuffles of scalar_to_vector in terms of the wider scalar. Apr 8 2015, 11:18 AM

Revision Contents

Path

Size

llvm/

trunk/

lib/

CodeGen/

SelectionDAG/

DAGCombiner.cpp

37 lines

test/

CodeGen/

AArch64/

arm64-neon-copy.ll

2 lines

arm64-vshuffle.ll

95 lines

PowerPC/

vperm-lowering.ll

57 lines

X86/

3 lines

29 lines

2 lines

2 lines

2 lines

vector-shuffle-128-v16.ll

99 lines

vector-shuffle-128-v8.ll

38 lines

Diff 23209

llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp


   if (N0.getOpcode() == ISD::CONCAT_VECTORS &&
       (N1.getOpcode() == ISD::CONCAT_VECTORS &&
        N0.getOperand(0).getValueType() == N1.getOperand(0).getValueType()))) {
     SDValue V = partitionShuffleOfConcats(N, DAG);

     if (V.getNode())
       return V;
   }

+  // Attempt to combine a shuffle of 2 inputs of 'scalar sources' -
+  // BUILD_VECTOR or SCALAR_TO_VECTOR into a single BUILD_VECTOR.
+  if (Level < AfterLegalizeVectorOps && TLI.isTypeLegal(VT)) {
+    SmallVector<SDValue, 8> Ops;
+    for (int M : SVN->getMask()) {
+      SDValue Op = DAG.getUNDEF(VT.getScalarType());
+      if (M >= 0) {
+        int Idx = M % NumElts;
+        SDValue &S = (M < (int)NumElts ? N0 : N1);
+        if (S.getOpcode() == ISD::BUILD_VECTOR && S.hasOneUse()) {
+          Op = S.getOperand(Idx);
+        } else if (S.getOpcode() == ISD::SCALAR_TO_VECTOR && S.hasOneUse()) {
+          if (Idx == 0)
+            Op = S.getOperand(0);
+        } else {
+          // Operand can't be combined - bail out.
+          break;
+        }
+      }
+      Ops.push_back(Op);
+    }
+    if (Ops.size() == VT.getVectorNumElements()) {
+      // BUILD_VECTOR requires all inputs to be of the same type, find the
+      // maximum type and extend them all.
+      EVT SVT = VT.getScalarType();
+      if (SVT.isInteger())
+        for (SDValue &Op : Ops)
+          SVT = (SVT.bitsLT(Op.getValueType()) ? Op.getValueType() : SVT);
+      if (SVT != VT.getScalarType())
+        for (SDValue &Op : Ops)
+          Op = TLI.isZExtFree(Op.getValueType(), SVT)
+                   ? DAG.getZExtOrTrunc(Op, SDLoc(N), SVT)
+                   : DAG.getSExtOrTrunc(Op, SDLoc(N), SVT);
+      return DAG.getNode(ISD::BUILD_VECTOR, SDLoc(N), VT, Ops);
+    }
+  }

   // If this shuffle only has a single input that is a bitcasted shuffle,
   // attempt to merge the 2 shuffles and suitably bitcast the inputs/output
   // back to their original types.
   if (N0.getOpcode() == ISD::BITCAST && N0.hasOneUse() &&
       N1.getOpcode() == ISD::UNDEF && Level < AfterLegalizeVectorOps &&
       TLI.isTypeLegal(VT)) {

     // Peek through the bitcast only if there is one user.

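The control flow of the added block in DAGCombiner.cpp (per-element lookup, the one-use restriction, and the bail-out) can be modelled outside of SelectionDAG. This is only a sketch under stated assumptions: Node, Kind, Elt and combine are hypothetical names, NumUses == 1 stands in for hasOneUse(), std::nullopt as an element models undef, and a std::nullopt return models "no combine performed". The integer type widening step is left out.

```cpp
#include <cassert>
#include <optional>
#include <vector>

enum class Kind { BuildVector, ScalarToVector, Other };

// Hypothetical stand-in for a shuffle input node.
struct Node {
  Kind K;
  std::vector<int> Ops; // scalar operands; ScalarToVector only has Ops[0]
  int NumUses;          // NumUses == 1 models hasOneUse()
};

using Elt = std::optional<int>; // std::nullopt models undef

// Returns the operands of the combined vector, or std::nullopt if any
// element actually referenced by the mask comes from an input that is not
// a single-use BuildVector/ScalarToVector.
std::optional<std::vector<Elt>> combine(const Node &N0, const Node &N1,
                                        const std::vector<int> &Mask) {
  const int NumElts = static_cast<int>(Mask.size());
  std::vector<Elt> Ops;
  for (int M : Mask) {
    Elt Op; // undef unless a scalar is found
    if (M >= 0) {
      const int Idx = M % NumElts;
      const Node &S = (M < NumElts) ? N0 : N1;
      if (S.K == Kind::BuildVector && S.NumUses == 1) {
        Op = S.Ops[Idx];
      } else if (S.K == Kind::ScalarToVector && S.NumUses == 1) {
        if (Idx == 0) // only element 0 of a scalar_to_vector is defined
          Op = S.Ops[0];
      } else {
        return std::nullopt; // operand can't be combined - bail out
      }
    }
    Ops.push_back(Op);
  }
  return Ops;
}
```

Note that an input which the mask never references does not trigger the bail-out, which matches the "active operand" wording used earlier in the review.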
llvm/trunk/test/CodeGen/AArch64/arm64-neon-copy.ll

 entry:
   %vecinit1.i = insertelement <2 x i32> %vecinit.i, i32 %0, i32 1
   ret <2 x i32> %vecinit1.i
 }

 define <2 x i32> @test_concat_diff_v1i32_v1i32(i32 %a, i32 %b) {
 ; CHECK-LABEL: test_concat_diff_v1i32_v1i32:
 ; CHECK: sqabs s{{[0-9]+}}, s{{[0-9]+}}
 ; CHECK: sqabs s{{[0-9]+}}, s{{[0-9]+}}
-; CHECK-NEXT: zip1 {{v[0-9]+}}.2s, {{v[0-9]+}}.2s, {{v[0-9]+}}.2s
+; CHECK: ins {{v[0-9]+}}.s[1], w{{[0-9]+}}
 entry:
   %c = tail call i32 @llvm.aarch64.neon.sqabs.i32(i32 %a)
   %d = insertelement <2 x i32> undef, i32 %c, i32 0
   %e = tail call i32 @llvm.aarch64.neon.sqabs.i32(i32 %b)
   %f = insertelement <2 x i32> undef, i32 %e, i32 0
   %h = shufflevector <2 x i32> %d, <2 x i32> %f, <2 x i32> <i32 0, i32 2>
   ret <2 x i32> %h
 }

llvm/trunk/test/CodeGen/AArch64/arm64-vshuffle.ll

	; RUN: llc < %s -mtriple=arm64-apple-ios7.0 -mcpu=cyclone | FileCheck %s			; RUN: llc < %s -mtriple=arm64-apple-ios7.0 -mcpu=cyclone | FileCheck %s


	; The mask:
	; CHECK: lCPI0_0:
	; CHECK: .byte 2 ; 0x2
	; CHECK: .byte 255 ; 0xff
	; CHECK: .byte 6 ; 0x6
	; CHECK: .byte 255 ; 0xff
	; The second vector is legalized to undef and the elements of the first vector
	; are used instead.
	; CHECK: .byte 2 ; 0x2
	; CHECK: .byte 4 ; 0x4
	; CHECK: .byte 6 ; 0x6
	; CHECK: .byte 0 ; 0x0
	; CHECK: test1			; CHECK: test1
	; CHECK: ldr d[[REG0:[0-9]+]], [{{.*}}, lCPI0_0			; CHECK: movi d[[REG0:[0-9]+]], #0000000000000000
	; CHECK: movi.8h v[[REG1:[0-9]+]], #0x1, lsl #8
	; CHECK: tbl.8b v{{[0-9]+}}, { v[[REG1]] }, v[[REG0]]
	define <8 x i1> @test1() {			define <8 x i1> @test1() {
	entry:			entry:
	%Shuff = shufflevector <8 x i1> <i1 0, i1 1, i1 2, i1 3, i1 4, i1 5, i1 6,			%Shuff = shufflevector <8 x i1> <i1 0, i1 1, i1 2, i1 3, i1 4, i1 5, i1 6,
	i1 7>,			i1 7>,
	<8 x i1> <i1 0, i1 1, i1 2, i1 3, i1 4, i1 5, i1 6,			<8 x i1> <i1 0, i1 1, i1 2, i1 3, i1 4, i1 5, i1 6,
	i1 7>,			i1 7>,
	<8 x i32> <i32 2, i32 undef, i32 6, i32 undef, i32 10,			<8 x i32> <i32 2, i32 undef, i32 6, i32 undef, i32 10,
	i32 12, i32 14, i32 0>			i32 12, i32 14, i32 0>
	ret <8 x i1> %Shuff			ret <8 x i1> %Shuff
	}			}

	; CHECK: lCPI1_0:			; CHECK: lCPI1_0:
	; CHECK: .byte 0 ; 0x0			; CHECK: .byte 0 ; 0x0
	; CHECK: .byte 255 ; 0xff			; CHECK: .byte 0 ; 0x0
	; CHECK: .byte 2 ; 0x2			; CHECK: .byte 0 ; 0x0
	; CHECK: .byte 255 ; 0xff			; CHECK: .byte 0 ; 0x0
	; CHECK: .byte 10 ; 0xa			; CHECK: .byte 1 ; 0x1
	; CHECK: .byte 12 ; 0xc			; CHECK: .byte 0 ; 0x0
	; CHECK: .byte 14 ; 0xe			; CHECK: .byte 0 ; 0x0
	; CHECK: .byte 7 ; 0x7			; CHECK: .byte 0 ; 0x0
	; CHECK: test2			; CHECK: test2
	; CHECK: ldr d[[REG0:[0-9]+]], [{{.*}}, lCPI1_0@PAGEOFF]			; CHECK: adrp x[[REG2:[0-9]+]], lCPI1_0@PAGE
	; CHECK: adrp x[[REG2:[0-9]+]], lCPI1_1@PAGE			; CHECK: ldr d[[REG1:[0-9]+]], [x[[REG2]], lCPI1_0@PAGEOFF]
	; CHECK: ldr q[[REG1:[0-9]+]], [x[[REG2]], lCPI1_1@PAGEOFF]
	; CHECK: tbl.8b v{{[0-9]+}}, { v[[REG1]] }, v[[REG0]]
	define <8 x i1>@test2() {			define <8 x i1>@test2() {
	bb:			bb:
	%Shuff = shufflevector <8 x i1> zeroinitializer,			%Shuff = shufflevector <8 x i1> zeroinitializer,
	<8 x i1> <i1 0, i1 1, i1 1, i1 0, i1 0, i1 1, i1 0, i1 0>,			<8 x i1> <i1 0, i1 1, i1 1, i1 0, i1 0, i1 1, i1 0, i1 0>,
	<8 x i32> <i32 2, i32 undef, i32 6, i32 undef, i32 10, i32 12, i32 14,			<8 x i32> <i32 2, i32 undef, i32 6, i32 undef, i32 10, i32 12, i32 14,
	i32 0>			i32 0>
	ret <8 x i1> %Shuff			ret <8 x i1> %Shuff
	}			}

	; CHECK: lCPI2_0:
	; CHECK: .byte 2 ; 0x2
	; CHECK: .byte 255 ; 0xff
	; CHECK: .byte 6 ; 0x6
	; CHECK: .byte 255 ; 0xff
	; CHECK: .byte 10 ; 0xa
	; CHECK: .byte 12 ; 0xc
	; CHECK: .byte 14 ; 0xe
	; CHECK: .byte 0 ; 0x0
	; CHECK: .byte 2 ; 0x2
	; CHECK: .byte 255 ; 0xff
	; CHECK: .byte 6 ; 0x6
	; CHECK: .byte 255 ; 0xff
	; CHECK: .byte 10 ; 0xa
	; CHECK: .byte 12 ; 0xc
	; CHECK: .byte 14 ; 0xe
	; CHECK: .byte 0 ; 0x0
	; CHECK: test3			; CHECK: test3
	; CHECK: adrp x[[REG3:[0-9]+]], lCPI2_0@PAGE			; CHECK: movi.4s v{{[0-9]+}}, #0x1
	; CHECK: ldr q[[REG0:[0-9]+]], [x[[REG3]], lCPI2_0@PAGEOFF]
	; CHECK: ldr q[[REG1:[0-9]+]], [x[[REG3]], lCPI2_1@PAGEOFF]
	; CHECK: tbl.16b v{{[0-9]+}}, { v[[REG1]] }, v[[REG0]]
	define <16 x i1> @test3(i1* %ptr, i32 %v) {			define <16 x i1> @test3(i1* %ptr, i32 %v) {
	bb:			bb:
	%Shuff = shufflevector <16 x i1> <i1 0, i1 1, i1 1, i1 0, i1 0, i1 1, i1 0, i1 0, i1 0, i1 1, i1 1, i1 0, i1 0, i1 1, i1 0, i1 0>, <16 x i1> undef,			%Shuff = shufflevector <16 x i1> <i1 0, i1 1, i1 1, i1 0, i1 0, i1 1, i1 0, i1 0, i1 0, i1 1, i1 1, i1 0, i1 0, i1 1, i1 0, i1 0>, <16 x i1> undef,
	<16 x i32> <i32 2, i32 undef, i32 6, i32 undef, i32 10, i32 12, i32 14,			<16 x i32> <i32 2, i32 undef, i32 6, i32 undef, i32 10, i32 12, i32 14,
	i32 0, i32 2, i32 undef, i32 6, i32 undef, i32 10, i32 12,			i32 0, i32 2, i32 undef, i32 6, i32 undef, i32 10, i32 12,
	i32 14, i32 0>			i32 14, i32 0>
	ret <16 x i1> %Shuff			ret <16 x i1> %Shuff
	}			}
	; CHECK: lCPI3_1:			; CHECK: lCPI3_0:
				; CHECK: .byte 0 ; 0x0
				; CHECK: .byte 0 ; 0x0
	; CHECK: .byte 0 ; 0x0			; CHECK: .byte 0 ; 0x0
	; CHECK: .byte 1 ; 0x1			; CHECK: .byte 1 ; 0x1
	; CHECK: .byte 2 ; 0x2			; CHECK: .byte 0 ; 0x0
	; CHECK: .byte 18 ; 0x12			; CHECK: .byte 0 ; 0x0
	; CHECK: .byte 4 ; 0x4			; CHECK: .byte 0 ; 0x0
	; CHECK: .byte 5 ; 0x5			; CHECK: .byte 0 ; 0x0
	; CHECK: .byte 6 ; 0x6			; CHECK: .byte 0 ; 0x0
	; CHECK: .byte 7 ; 0x7			; CHECK: .byte 0 ; 0x0
	; CHECK: .byte 8 ; 0x8			; CHECK: .byte 0 ; 0x0
	; CHECK: .byte 31 ; 0x1f			; CHECK: .byte 0 ; 0x0
	; CHECK: .byte 10 ; 0xa			; CHECK: .byte 0 ; 0x0
	; CHECK: .byte 30 ; 0x1e			; CHECK: .byte 0 ; 0x0
	; CHECK: .byte 12 ; 0xc			; CHECK: .byte 0 ; 0x0
	; CHECK: .byte 13 ; 0xd			; CHECK: .byte 0 ; 0x0
	; CHECK: .byte 14 ; 0xe
	; CHECK: .byte 15 ; 0xf
	; CHECK: _test4:			; CHECK: _test4:
	; CHECK: ldr q[[REG1:[0-9]+]]			; CHECK: adrp x[[REG3:[0-9]+]], lCPI3_0@PAGE
	; CHECK: movi.2d v[[REG0:[0-9]+]], #0000000000000000			; CHECK: ldr q[[REG2:[0-9]+]], [x[[REG3]], lCPI3_0@PAGEOFF]
	; CHECK: adrp x[[REG3:[0-9]+]], lCPI3_1@PAGE
	; CHECK: ldr q[[REG2:[0-9]+]], [x[[REG3]], lCPI3_1@PAGEOFF]
	; CHECK: tbl.16b v{{[0-9]+}}, { v[[REG0]], v[[REG1]] }, v[[REG2]]
	define <16 x i1> @test4(i1* %ptr, i32 %v) {			define <16 x i1> @test4(i1* %ptr, i32 %v) {
	bb:			bb:
	%Shuff = shufflevector <16 x i1> zeroinitializer,			%Shuff = shufflevector <16 x i1> zeroinitializer,
	<16 x i1> <i1 0, i1 1, i1 1, i1 0, i1 0, i1 1, i1 0, i1 0, i1 0, i1 1,			<16 x i1> <i1 0, i1 1, i1 1, i1 0, i1 0, i1 1, i1 0, i1 0, i1 0, i1 1,
	i1 1, i1 0, i1 0, i1 1, i1 0, i1 0>,			i1 1, i1 0, i1 0, i1 1, i1 0, i1 0>,
	<16 x i32> <i32 2, i32 1, i32 6, i32 18, i32 10, i32 12, i32 14, i32 0,			<16 x i32> <i32 2, i32 1, i32 6, i32 18, i32 10, i32 12, i32 14, i32 0,
	i32 2, i32 31, i32 6, i32 30, i32 10, i32 12, i32 14, i32 0>			i32 2, i32 31, i32 6, i32 30, i32 10, i32 12, i32 14, i32 0>
	ret <16 x i1> %Shuff			ret <16 x i1> %Shuff
	}			}

llvm/trunk/test/CodeGen/PowerPC/vperm-lowering.ll

	; RUN: llc -O0 -fast-isel=false -mcpu=ppc64 < %s | FileCheck %s			; RUN: llc -O0 -fast-isel=false -mcpu=ppc64 < %s | FileCheck %s

	target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v128:128:128-n32:64"			target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v128:128:128-n32:64"
	target triple = "powerpc64le-unknown-linux-gnu"			target triple = "powerpc64le-unknown-linux-gnu"

	define <16 x i8> @foo() nounwind ssp {			define <16 x i8> @foo() nounwind ssp {
	%1 = shufflevector <16 x i8> <i8 0, i8 1, i8 2, i8 3, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15>, <16 x i8> <i8 16, i8 17, i8 18, i8 19, i8 20, i8 21, i8 22, i8 23, i8 24, i8 25, i8 26, i8 27, i8 28, i8 29, i8 30, i8 31>, <16 x i32> <i32 0, i32 5, i32 10, i32 15, i32 20, i32 25, i32 30, i32 3, i32 8, i32 13, i32 18, i32 23, i32 28, i32 1, i32 6, i32 11>			%1 = shufflevector <16 x i8> <i8 0, i8 1, i8 2, i8 3, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15>, <16 x i8> <i8 16, i8 17, i8 18, i8 19, i8 20, i8 21, i8 22, i8 23, i8 24, i8 25, i8 26, i8 27, i8 28, i8 29, i8 30, i8 31>, <16 x i32> <i32 0, i32 5, i32 10, i32 15, i32 20, i32 25, i32 30, i32 3, i32 8, i32 13, i32 18, i32 23, i32 28, i32 1, i32 6, i32 11>
	ret <16 x i8> %1			ret <16 x i8> %1
	}			}

	; CHECK: .LCPI0_0:			; CHECK: .LCPI0_0:
	; CHECK: .byte 31
	; CHECK: .byte 26
	; CHECK: .byte 21
	; CHECK: .byte 16
	; CHECK: .byte 11
	; CHECK: .byte 6
	; CHECK: .byte 1
	; CHECK: .byte 28
	; CHECK: .byte 23
	; CHECK: .byte 18
	; CHECK: .byte 13
	; CHECK: .byte 8
	; CHECK: .byte 3
	; CHECK: .byte 30
	; CHECK: .byte 25
	; CHECK: .byte 20
	; CHECK: .LCPI0_1:
	; CHECK: .byte 0			; CHECK: .byte 0
	; CHECK: .byte 1
	; CHECK: .byte 2
	; CHECK: .byte 3
	; CHECK: .byte 4
	; CHECK: .byte 5			; CHECK: .byte 5
	; CHECK: .byte 6
	; CHECK: .byte 7
	; CHECK: .byte 8
	; CHECK: .byte 9
	; CHECK: .byte 10			; CHECK: .byte 10
	; CHECK: .byte 11
	; CHECK: .byte 12
	; CHECK: .byte 13
	; CHECK: .byte 14
	; CHECK: .byte 15			; CHECK: .byte 15
	; CHECK: .LCPI0_2:
	; CHECK: .byte 16
	; CHECK: .byte 17
	; CHECK: .byte 18
	; CHECK: .byte 19
	; CHECK: .byte 20			; CHECK: .byte 20
	; CHECK: .byte 21
	; CHECK: .byte 22
	; CHECK: .byte 23
	; CHECK: .byte 24
	; CHECK: .byte 25			; CHECK: .byte 25
	; CHECK: .byte 26
	; CHECK: .byte 27
	; CHECK: .byte 28
	; CHECK: .byte 29
	; CHECK: .byte 30			; CHECK: .byte 30
	; CHECK: .byte 31			; CHECK: .byte 3
				; CHECK: .byte 8
				; CHECK: .byte 13
				; CHECK: .byte 18
				; CHECK: .byte 23
				; CHECK: .byte 28
				; CHECK: .byte 1
				; CHECK: .byte 6
				; CHECK: .byte 11
	; CHECK: foo:			; CHECK: foo:
	; CHECK: addis [[REG1:[0-9]+]], 2, .LCPI0_2@toc@ha			; CHECK: addis [[REG1:[0-9]+]], 2, .LCPI0_0@toc@ha
	; CHECK: addi [[REG2:[0-9]+]], [[REG1]], .LCPI0_2@toc@l			; CHECK: addi [[REG2:[0-9]+]], [[REG1]], .LCPI0_0@toc@l
	; CHECK: lvx [[REG3:[0-9]+]], 0, [[REG2]]			; CHECK: lvx [[REG3:[0-9]+]], 0, [[REG2]]
	; CHECK: vperm {{[0-9]+}}, [[REG3]], {{[0-9]+}}, {{[0-9]+}}

llvm/trunk/test/CodeGen/X86/mmx-bitcast.ll

 }

 define i64 @t5(i32 %a, i32 %b) nounwind readnone {
 ; CHECK-LABEL: t5:
 ; CHECK: ## BB#0:
 ; CHECK-NEXT: movd
 ; CHECK-NEXT: movd
 ; CHECK-NEXT: punpckldq {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1]
-; CHECK-NEXT: pshufd {{.*#+}} xmm0 = xmm1[0,1,1,3]
-; CHECK-NEXT: movd %xmm0, %rax
+; CHECK-NEXT: movd %xmm1, %rax
 ; CHECK-NEXT: retq
   %v0 = insertelement <2 x i32> undef, i32 %a, i32 0
   %v1 = insertelement <2 x i32> %v0, i32 %b, i32 1
   %conv = bitcast <2 x i32> %v1 to i64
   ret i64 %conv
 }

 declare x86_mmx @llvm.x86.mmx.pslli.q(x86_mmx, i32)

llvm/trunk/test/CodeGen/X86/sse41.ll

	; X64-NEXT: insertps {{.*#+}} xmm0 = xmm0[0],zero,xmm0[2],mem[2]			; X64-NEXT: insertps {{.*#+}} xmm0 = xmm0[0],zero,xmm0[2],mem[2]
	; X64-NEXT: retq			; X64-NEXT: retq
	%load = load <4 x float> , <4 x float> *%ptr			%load = load <4 x float> , <4 x float> *%ptr
	%ret = shufflevector <4 x float> %load, <4 x float> %a, <4 x i32> <i32 4, i32 undef, i32 6, i32 2>			%ret = shufflevector <4 x float> %load, <4 x float> %a, <4 x i32> <i32 4, i32 undef, i32 6, i32 2>
	ret <4 x float> %ret			ret <4 x float> %ret
	}			}

	; Edge case for insertps where we end up with a shuffle with mask=<0, 7, -1, -1>			; Edge case for insertps where we end up with a shuffle with mask=<0, 7, -1, -1>
	define void @insertps_pr20411(i32* noalias nocapture %RET) #1 {			define void @insertps_pr20411(<4 x i32> %shuffle109, <4 x i32> %shuffle116, i32* noalias nocapture %RET) #1 {
	; X32-LABEL: insertps_pr20411:			; X32-LABEL: insertps_pr20411:
	; X32: ## BB#0:			; X32: ## BB#0:
	; X32-NEXT: movl {{[0-9]+}}(%esp), %eax			; X32-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X32-NEXT: pshufd {{.*#+}} xmm0 = mem[2,3,0,1]			; X32-NEXT: pshufd {{.*#+}} xmm1 = xmm1[2,3,0,1]
	; X32-NEXT: pshufd {{.*#+}} xmm1 = mem[3,1,2,3]			; X32-NEXT: pblendw {{.*#+}} xmm1 = xmm0[0,1],xmm1[2,3],xmm0[4,5,6,7]
	; X32-NEXT: pblendw {{.*#+}} xmm1 = xmm1[0,1],xmm0[2,3],xmm1[4,5,6,7]
	; X32-NEXT: movdqu %xmm1, (%eax)			; X32-NEXT: movdqu %xmm1, (%eax)
	; X32-NEXT: retl			; X32-NEXT: retl
	;			;
	; X64-LABEL: insertps_pr20411:			; X64-LABEL: insertps_pr20411:
	; X64: ## BB#0:			; X64: ## BB#0:
	; X64-NEXT: pshufd {{.*#+}} xmm0 = mem[2,3,0,1]			; X64-NEXT: pshufd {{.*#+}} xmm1 = xmm1[2,3,0,1]
	; X64-NEXT: pshufd {{.*#+}} xmm1 = mem[3,1,2,3]			; X64-NEXT: pblendw {{.*#+}} xmm1 = xmm0[0,1],xmm1[2,3],xmm0[4,5,6,7]
	; X64-NEXT: pblendw {{.*#+}} xmm1 = xmm1[0,1],xmm0[2,3],xmm1[4,5,6,7]			; X64-NEXT: movdqu %xmm1, (%rdi)
	; X64-NEXT: movdqu %xmm1, (%rdi)			; X64-NEXT: retq
	; X64-NEXT: retq			%shuffle117 = shufflevector <4 x i32> %shuffle109, <4 x i32> %shuffle116, <4 x i32> <i32 0, i32 7, i32 undef, i32 undef>
	%gather_load = shufflevector <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>, <8 x i32> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>			%ptrcast = bitcast i32* %RET to <4 x i32>*
	%shuffle109 = shufflevector <4 x i32> <i32 4, i32 5, i32 6, i32 7>, <4 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3> ; 4 5 6 7			store <4 x i32> %shuffle117, <4 x i32>* %ptrcast, align 4
	%shuffle116 = shufflevector <8 x i32> %gather_load, <8 x i32> undef, <4 x i32> <i32 3, i32 undef, i32 undef, i32 undef> ; 3 x x x
	%shuffle117 = shufflevector <4 x i32> %shuffle109, <4 x i32> %shuffle116, <4 x i32> <i32 4, i32 3, i32 undef, i32 undef> ; 3 7 x x
	%ptrcast = bitcast i32* %RET to <4 x i32>*
	store <4 x i32> %shuffle117, <4 x i32>* %ptrcast, align 4
	ret void			ret void
	}			}

	define <4 x float> @insertps_4(<4 x float> %A, <4 x float> %B) {			define <4 x float> @insertps_4(<4 x float> %A, <4 x float> %B) {
	; X32-LABEL: insertps_4:			; X32-LABEL: insertps_4:
	; X32: ## BB#0: ## %entry			; X32: ## BB#0: ## %entry
	; X32-NEXT: insertps {{.*#+}} xmm0 = xmm0[0],zero,xmm1[2],zero			; X32-NEXT: insertps {{.*#+}} xmm0 = xmm0[0],zero,xmm1[2],zero
	; X32-NEXT: retl			; X32-NEXT: retl

llvm/trunk/test/CodeGen/X86/vec_insert-5.ll

 ; RUN: llc < %s -march=x86 -mattr=+sse2,+ssse3 | FileCheck %s
 ; There are no MMX operations in @t1

 define void @t1(i32 %a, x86_mmx* %P) nounwind {
 ; CHECK-LABEL: t1:
 ; CHECK: # BB#0:
 ; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax
 ; CHECK-NEXT: movl {{[0-9]+}}(%esp), %ecx
 ; CHECK-NEXT: shll $12, %ecx
 ; CHECK-NEXT: movd %ecx, %xmm0
-; CHECK-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,0,0,1]
+; CHECK-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,0,1,1]
 ; CHECK-NEXT: movlpd %xmm0, (%eax)
 ; CHECK-NEXT: retl
   %tmp12 = shl i32 %a, 12
   %tmp21 = insertelement <2 x i32> undef, i32 %tmp12, i32 1
   %tmp22 = insertelement <2 x i32> %tmp21, i32 0, i32 0
   %tmp23 = bitcast <2 x i32> %tmp22 to x86_mmx
   store x86_mmx %tmp23, x86_mmx* %P
   ret void

llvm/trunk/test/CodeGen/X86/vec_insert-mmx.ll

 ; RUN: llc < %s -mtriple=i686-darwin -mattr=+mmx,+sse2 | FileCheck %s -check-prefix=X86-32
 ; RUN: llc < %s -mtriple=x86_64-darwin -mattr=+mmx,+sse4.1 | FileCheck %s -check-prefix=X86-64

 ; This is not an MMX operation; promoted to XMM.
 define x86_mmx @t0(i32 %A) nounwind {
 ; X86-32-LABEL: t0:
 ; X86-32: ## BB#0:
 ; X86-32: movd {{[0-9]+}}(%esp), %xmm0
-; X86-32-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,0,0,1]
+; X86-32-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,0,1,1]
 ; X86-32-NEXT: movlpd %xmm0, (%esp)
 ; X86-32-NEXT: movq (%esp), %mm0
 ; X86-32-NEXT: addl $12, %esp
 ; X86-32-NEXT: retl
   %tmp3 = insertelement <2 x i32> < i32 0, i32 undef >, i32 %A, i32 1
   %tmp4 = bitcast <2 x i32> %tmp3 to x86_mmx
   ret x86_mmx %tmp4
 }

llvm/trunk/test/CodeGen/X86/vec_zero_cse.ll

 ;CHECK: xorpd
   store <1 x i64> zeroinitializer, <1 x i64>* @M1
   store <2 x i32> zeroinitializer, <2 x i32>* @M2
   ret void
 }

 define void @test2() {
 ;CHECK-LABEL: @test2
-;CHECK: pshufd
+;CHECK: pcmpeqd
   store <1 x i64> < i64 -1 >, <1 x i64>* @M1
   store <2 x i32> < i32 -1, i32 -1 >, <2 x i32>* @M2
   ret void
 }

 define void @test3() {
 ;CHECK-LABEL: @test3
 ;CHECK: xorps

llvm/trunk/test/CodeGen/X86/vector-shuffle-128-v16.ll

	; AVX: # BB#0:			; AVX: # BB#0:
	; AVX-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[0,2,4,6,8,10,12,14],zero,zero,zero,zero,zero,zero,zero,zero			; AVX-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[0,2,4,6,8,10,12,14],zero,zero,zero,zero,zero,zero,zero,zero
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%shuffle = shufflevector <8 x i8> %a, <8 x i8> zeroinitializer, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8>			%shuffle = shufflevector <8 x i8> %a, <8 x i8> zeroinitializer, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8>
	ret <16 x i8> %shuffle			ret <16 x i8> %shuffle
	}			}

	define <16 x i8> @shuffle_v16i8_16_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz(i8 %i) {			define <16 x i8> @shuffle_v16i8_16_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz(i8 %i) {
	; SSE2-LABEL: shuffle_v16i8_16_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz:			; SSE-LABEL: shuffle_v16i8_16_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz:
	; SSE2: # BB#0:			; SSE: # BB#0:
	; SSE2-NEXT: movzbl %dil, %eax			; SSE-NEXT: movzbl %dil, %eax
	; SSE2-NEXT: movd %eax, %xmm0			; SSE-NEXT: movd %eax, %xmm0
	; SSE2-NEXT: retq			; SSE-NEXT: retq
	;
	; SSSE3-LABEL: shuffle_v16i8_16_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz:
	; SSSE3: # BB#0:
	; SSSE3-NEXT: movd %edi, %xmm0
	; SSSE3-NEXT: pshufb {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
	; SSSE3-NEXT: retq
	;
	; SSE41-LABEL: shuffle_v16i8_16_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz:
	; SSE41: # BB#0:
	; SSE41-NEXT: movd %edi, %xmm0
	; SSE41-NEXT: pshufb {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
	; SSE41-NEXT: retq
	;			;
	; AVX-LABEL: shuffle_v16i8_16_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz:			; AVX-LABEL: shuffle_v16i8_16_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz:
	; AVX: # BB#0:			; AVX: # BB#0:
	; AVX-NEXT: vmovd %edi, %xmm0			; AVX-NEXT: movzbl %dil, %eax
	; AVX-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero			; AVX-NEXT: vmovd %eax, %xmm0
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%a = insertelement <16 x i8> undef, i8 %i, i32 0			%a = insertelement <16 x i8> undef, i8 %i, i32 0
	%shuffle = shufflevector <16 x i8> zeroinitializer, <16 x i8> %a, <16 x i32> <i32 16, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>			%shuffle = shufflevector <16 x i8> zeroinitializer, <16 x i8> %a, <16 x i32> <i32 16, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
	ret <16 x i8> %shuffle			ret <16 x i8> %shuffle
	}			}

	define <16 x i8> @shuffle_v16i8_zz_zz_zz_zz_zz_16_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz(i8 %i) {			define <16 x i8> @shuffle_v16i8_zz_zz_zz_zz_zz_16_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz(i8 %i) {
	; SSE2-LABEL: shuffle_v16i8_zz_zz_zz_zz_zz_16_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz:			; SSE-LABEL: shuffle_v16i8_zz_zz_zz_zz_zz_16_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz:
	; SSE2: # BB#0:			; SSE: # BB#0:
	; SSE2-NEXT: movzbl %dil, %eax			; SSE-NEXT: shll $8, %edi
	; SSE2-NEXT: movd %eax, %xmm0			; SSE-NEXT: pxor %xmm0, %xmm0
	; SSE2-NEXT: pslldq {{.*#+}} xmm0 = zero,zero,zero,zero,zero,xmm0[0,1,2,3,4,5,6,7,8,9,10]			; SSE-NEXT: pinsrw $2, %edi, %xmm0
	; SSE2-NEXT: retq			; SSE-NEXT: retq
	;
	; SSSE3-LABEL: shuffle_v16i8_zz_zz_zz_zz_zz_16_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz:
	; SSSE3: # BB#0:
	; SSSE3-NEXT: movd %edi, %xmm0
	; SSSE3-NEXT: pshufb {{.*#+}} xmm0 = zero,zero,zero,zero,zero,xmm0[0],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
	; SSSE3-NEXT: retq
	;
	; SSE41-LABEL: shuffle_v16i8_zz_zz_zz_zz_zz_16_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz:
	; SSE41: # BB#0:
	; SSE41-NEXT: movd %edi, %xmm0
	; SSE41-NEXT: pshufb {{.*#+}} xmm0 = zero,zero,zero,zero,zero,xmm0[0],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
	; SSE41-NEXT: retq
	;
	; AVX-LABEL: shuffle_v16i8_zz_zz_zz_zz_zz_16_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz:			; AVX-LABEL: shuffle_v16i8_zz_zz_zz_zz_zz_16_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz:
	; AVX: # BB#0:			; AVX: # BB#0:
	; AVX-NEXT: vmovd %edi, %xmm0			; AVX-NEXT: shll $8, %edi
	; AVX-NEXT: vpshufb {{.*#+}} xmm0 = zero,zero,zero,zero,zero,xmm0[0],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero			; AVX-NEXT: vpxor %xmm0, %xmm0
				; AVX-NEXT: vpinsrw $2, %edi, %xmm0
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%a = insertelement <16 x i8> undef, i8 %i, i32 0			%a = insertelement <16 x i8> undef, i8 %i, i32 0
	%shuffle = shufflevector <16 x i8> zeroinitializer, <16 x i8> %a, <16 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 16, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>			%shuffle = shufflevector <16 x i8> zeroinitializer, <16 x i8> %a, <16 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 16, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>
	ret <16 x i8> %shuffle			ret <16 x i8> %shuffle
	}			}

	define <16 x i8> @shuffle_v16i8_zz_uu_uu_zz_uu_uu_zz_zz_zz_zz_zz_zz_zz_zz_zz_16(i8 %i) {			define <16 x i8> @shuffle_v16i8_zz_uu_uu_zz_uu_uu_zz_zz_zz_zz_zz_zz_zz_zz_zz_16(i8 %i) {
	; SSE-LABEL: shuffle_v16i8_zz_uu_uu_zz_uu_uu_zz_zz_zz_zz_zz_zz_zz_zz_zz_16:			; SSE-LABEL: shuffle_v16i8_zz_uu_uu_zz_uu_uu_zz_zz_zz_zz_zz_zz_zz_zz_zz_16:
	; SSE: # BB#0:			; SSE: # BB#0:
	; SSE-NEXT: movd %edi, %xmm0			; SSE-NEXT: shll $8, %edi
	; SSE-NEXT: pslldq {{.*#+}} xmm0 = zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,xmm0[0]			; SSE-NEXT: pxor %xmm0, %xmm0
				; SSE-NEXT: pinsrw $7, %edi, %xmm0
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; AVX-LABEL: shuffle_v16i8_zz_uu_uu_zz_uu_uu_zz_zz_zz_zz_zz_zz_zz_zz_zz_16:			; AVX-LABEL: shuffle_v16i8_zz_uu_uu_zz_uu_uu_zz_zz_zz_zz_zz_zz_zz_zz_zz_16:
	; AVX: # BB#0:			; AVX: # BB#0:
	; AVX-NEXT: vmovd %edi, %xmm0			; AVX-NEXT: shll $8, %edi
	; AVX-NEXT: vpslldq {{.*#+}} xmm0 = zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,xmm0[0]			; AVX-NEXT: vpxor %xmm0, %xmm0
				; AVX-NEXT: vpinsrw $7, %edi, %xmm0
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%a = insertelement <16 x i8> undef, i8 %i, i32 0			%a = insertelement <16 x i8> undef, i8 %i, i32 0
	%shuffle = shufflevector <16 x i8> zeroinitializer, <16 x i8> %a, <16 x i32> <i32 0, i32 undef, i32 undef, i32 3, i32 undef, i32 undef, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 16>			%shuffle = shufflevector <16 x i8> zeroinitializer, <16 x i8> %a, <16 x i32> <i32 0, i32 undef, i32 undef, i32 3, i32 undef, i32 undef, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 16>
	ret <16 x i8> %shuffle			ret <16 x i8> %shuffle
	}			}

	define <16 x i8> @shuffle_v16i8_zz_zz_19_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz(i8 %i) {			define <16 x i8> @shuffle_v16i8_zz_zz_19_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz(i8 %i) {
	; SSE2-LABEL: shuffle_v16i8_zz_zz_19_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz:			; SSE-LABEL: shuffle_v16i8_zz_zz_19_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz:
	; SSE2: # BB#0:			; SSE: # BB#0:
	; SSE2-NEXT: movzbl %dil, %eax			; SSE-NEXT: movzbl %dil, %eax
	; SSE2-NEXT: movd %eax, %xmm0			; SSE-NEXT: pxor %xmm0, %xmm0
	; SSE2-NEXT: pslldq {{.*#+}} xmm0 = zero,zero,xmm0[0,1,2,3,4,5,6,7,8,9,10,11,12,13]			; SSE-NEXT: pinsrw $1, %eax, %xmm0
	; SSE2-NEXT: retq			; SSE-NEXT: retq
	;
	; SSSE3-LABEL: shuffle_v16i8_zz_zz_19_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz:
	; SSSE3: # BB#0:
	; SSSE3-NEXT: movd %edi, %xmm0
	; SSSE3-NEXT: pslld $24, %xmm0
	; SSSE3-NEXT: pshufb {{.*#+}} xmm0 = zero,zero,xmm0[3],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
	; SSSE3-NEXT: retq
	;
	; SSE41-LABEL: shuffle_v16i8_zz_zz_19_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz:
	; SSE41: # BB#0:
	; SSE41-NEXT: movd %edi, %xmm0
	; SSE41-NEXT: pslld $24, %xmm0
	; SSE41-NEXT: pshufb {{.*#+}} xmm0 = zero,zero,xmm0[3],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
	; SSE41-NEXT: retq
	;			;
	; AVX-LABEL: shuffle_v16i8_zz_zz_19_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz:			; AVX-LABEL: shuffle_v16i8_zz_zz_19_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz:
	; AVX: # BB#0:			; AVX: # BB#0:
	; AVX-NEXT: vmovd %edi, %xmm0			; AVX-NEXT: movzbl %dil, %eax
	; AVX-NEXT: vpslld $24, %xmm0, %xmm0			; AVX-NEXT: vpxor %xmm0, %xmm0
	; AVX-NEXT: vpshufb {{.*#+}} xmm0 = zero,zero,xmm0[3],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero			; AVX-NEXT: vpinsrw $1, %eax, %xmm0
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%a = insertelement <16 x i8> undef, i8 %i, i32 3			%a = insertelement <16 x i8> undef, i8 %i, i32 3
	%shuffle = shufflevector <16 x i8> zeroinitializer, <16 x i8> %a, <16 x i32> <i32 0, i32 1, i32 19, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>			%shuffle = shufflevector <16 x i8> zeroinitializer, <16 x i8> %a, <16 x i32> <i32 0, i32 1, i32 19, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
	ret <16 x i8> %shuffle			ret <16 x i8> %shuffle
	}			}

	define <16 x i8> @shuffle_v16i8_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_16_uu_18_uu(<16 x i8> %a) {			define <16 x i8> @shuffle_v16i8_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_16_uu_18_uu(<16 x i8> %a) {
	; SSE-LABEL: shuffle_v16i8_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_16_uu_18_uu:			; SSE-LABEL: shuffle_v16i8_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_16_uu_18_uu:
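The v16i8 test names above appear to spell out the shuffle mask lane by lane: `zz` for a lane taken from the `zeroinitializer` operand, `uu` for an undef lane, and a number for a lane taken from the second operand. A minimal sketch of that reading follows. `laneNames` is a hypothetical helper, not part of LLVM or of this patch, and it assumes the first shuffle operand is always all zeros and that `-1` stands for an undef mask element.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: renders a shufflevector mask in the naming style used
// by the v16i8 tests above. Assumes operand 0 is zeroinitializer and that a
// mask value of -1 represents undef.
std::string laneNames(const std::vector<int> &mask) {
  const int numElts = static_cast<int>(mask.size());
  std::string out;
  for (int i = 0; i < numElts; ++i) {
    if (i != 0)
      out += '_';
    const int m = mask[i];
    if (m < 0)
      out += "uu";                 // undef lane
    else if (m < numElts)
      out += "zz";                 // lane read from the all-zero first operand
    else
      out += std::to_string(m);    // lane read from the second operand
  }
  return out;
}
```

Feeding it the mask from `shuffle_v16i8_zz_zz_19_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz_zz` (index 19 in lane 2, first-operand indices everywhere else) should reproduce the suffix of that test's name.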

llvm/trunk/test/CodeGen/X86/vector-shuffle-128-v8.ll

; AVX-NEXT: retq
%a = insertelement <8 x i16> undef, i16 %i, i32 0		%a = insertelement <8 x i16> undef, i16 %i, i32 0
%shuffle = shufflevector <8 x i16> zeroinitializer, <8 x i16> %a, <8 x i32> <i32 8, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		%shuffle = shufflevector <8 x i16> zeroinitializer, <8 x i16> %a, <8 x i32> <i32 8, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
ret <8 x i16> %shuffle		ret <8 x i16> %shuffle
}		}

define <8 x i16> @shuffle_v8i16_z8zzzzzz(i16 %i) {		define <8 x i16> @shuffle_v8i16_z8zzzzzz(i16 %i) {
; SSE-LABEL: shuffle_v8i16_z8zzzzzz:		; SSE-LABEL: shuffle_v8i16_z8zzzzzz:
; SSE: # BB#0:		; SSE: # BB#0:
; SSE-NEXT: movzwl %di, %eax		; SSE-NEXT: pxor %xmm0, %xmm0
; SSE-NEXT: movd %eax, %xmm0		; SSE-NEXT: pinsrw $1, %edi, %xmm0
; SSE-NEXT: pslldq {{.*#+}} xmm0 = zero,zero,xmm0[0,1,2,3,4,5,6,7,8,9,10,11,12,13]
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; AVX-LABEL: shuffle_v8i16_z8zzzzzz:		; AVX-LABEL: shuffle_v8i16_z8zzzzzz:
; AVX: # BB#0:		; AVX: # BB#0:
; AVX-NEXT: movzwl %di, %eax		; AVX-NEXT: vpxor %xmm0, %xmm0
; AVX-NEXT: vmovd %eax, %xmm0		; AVX-NEXT: vpinsrw $1, %edi, %xmm0
; AVX-NEXT: vpslldq {{.*#+}} xmm0 = zero,zero,xmm0[0,1,2,3,4,5,6,7,8,9,10,11,12,13]
; AVX-NEXT: retq		; AVX-NEXT: retq
%a = insertelement <8 x i16> undef, i16 %i, i32 0		%a = insertelement <8 x i16> undef, i16 %i, i32 0
%shuffle = shufflevector <8 x i16> zeroinitializer, <8 x i16> %a, <8 x i32> <i32 2, i32 8, i32 3, i32 7, i32 6, i32 5, i32 4, i32 3>		%shuffle = shufflevector <8 x i16> zeroinitializer, <8 x i16> %a, <8 x i32> <i32 2, i32 8, i32 3, i32 7, i32 6, i32 5, i32 4, i32 3>
ret <8 x i16> %shuffle		ret <8 x i16> %shuffle
}		}

define <8 x i16> @shuffle_v8i16_zzzzz8zz(i16 %i) {		define <8 x i16> @shuffle_v8i16_zzzzz8zz(i16 %i) {
; SSE-LABEL: shuffle_v8i16_zzzzz8zz:		; SSE-LABEL: shuffle_v8i16_zzzzz8zz:
; SSE: # BB#0:		; SSE: # BB#0:
; SSE-NEXT: movzwl %di, %eax		; SSE-NEXT: pxor %xmm0, %xmm0
; SSE-NEXT: movd %eax, %xmm0		; SSE-NEXT: pinsrw $5, %edi, %xmm0
; SSE-NEXT: pslldq {{.*#+}} xmm0 = zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,xmm0[0,1,2,3,4,5]
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; AVX-LABEL: shuffle_v8i16_zzzzz8zz:		; AVX-LABEL: shuffle_v8i16_zzzzz8zz:
; AVX: # BB#0:		; AVX: # BB#0:
; AVX-NEXT: movzwl %di, %eax		; AVX-NEXT: vpxor %xmm0, %xmm0
; AVX-NEXT: vmovd %eax, %xmm0		; AVX-NEXT: vpinsrw $5, %edi, %xmm0
; AVX-NEXT: vpslldq {{.*#+}} xmm0 = zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,xmm0[0,1,2,3,4,5]
; AVX-NEXT: retq		; AVX-NEXT: retq
%a = insertelement <8 x i16> undef, i16 %i, i32 0		%a = insertelement <8 x i16> undef, i16 %i, i32 0
%shuffle = shufflevector <8 x i16> zeroinitializer, <8 x i16> %a, <8 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 8, i32 0, i32 0>		%shuffle = shufflevector <8 x i16> zeroinitializer, <8 x i16> %a, <8 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 8, i32 0, i32 0>
ret <8 x i16> %shuffle		ret <8 x i16> %shuffle
}		}

define <8 x i16> @shuffle_v8i16_zuuzuuz8(i16 %i) {		define <8 x i16> @shuffle_v8i16_zuuzuuz8(i16 %i) {
; SSE-LABEL: shuffle_v8i16_zuuzuuz8:		; SSE-LABEL: shuffle_v8i16_zuuzuuz8:
; SSE: # BB#0:		; SSE: # BB#0:
; SSE-NEXT: movd %edi, %xmm0		; SSE-NEXT: pxor %xmm0, %xmm0
; SSE-NEXT: pslldq {{.*#+}} xmm0 = zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,xmm0[0,1]		; SSE-NEXT: pinsrw $7, %edi, %xmm0
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; AVX-LABEL: shuffle_v8i16_zuuzuuz8:		; AVX-LABEL: shuffle_v8i16_zuuzuuz8:
; AVX: # BB#0:		; AVX: # BB#0:
; AVX-NEXT: vmovd %edi, %xmm0		; AVX-NEXT: vpxor %xmm0, %xmm0
; AVX-NEXT: vpslldq {{.*#+}} xmm0 = zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,xmm0[0,1]		; AVX-NEXT: vpinsrw $7, %edi, %xmm0
; AVX-NEXT: retq		; AVX-NEXT: retq
%a = insertelement <8 x i16> undef, i16 %i, i32 0		%a = insertelement <8 x i16> undef, i16 %i, i32 0
%shuffle = shufflevector <8 x i16> zeroinitializer, <8 x i16> %a, <8 x i32> <i32 0, i32 undef, i32 undef, i32 3, i32 undef, i32 undef, i32 6, i32 8>		%shuffle = shufflevector <8 x i16> zeroinitializer, <8 x i16> %a, <8 x i32> <i32 0, i32 undef, i32 undef, i32 3, i32 undef, i32 undef, i32 6, i32 8>
ret <8 x i16> %shuffle		ret <8 x i16> %shuffle
}		}

define <8 x i16> @shuffle_v8i16_zzBzzzzz(i16 %i) {		define <8 x i16> @shuffle_v8i16_zzBzzzzz(i16 %i) {
; SSE-LABEL: shuffle_v8i16_zzBzzzzz:		; SSE-LABEL: shuffle_v8i16_zzBzzzzz:
; SSE: # BB#0:		; SSE: # BB#0:
; SSE-NEXT: movzwl %di, %eax		; SSE-NEXT: pxor %xmm0, %xmm0
; SSE-NEXT: movd %eax, %xmm0		; SSE-NEXT: pinsrw $2, %edi, %xmm0
; SSE-NEXT: pslldq {{.*#+}} xmm0 = zero,zero,zero,zero,xmm0[0,1,2,3,4,5,6,7,8,9,10,11]
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; AVX-LABEL: shuffle_v8i16_zzBzzzzz:		; AVX-LABEL: shuffle_v8i16_zzBzzzzz:
; AVX: # BB#0:		; AVX: # BB#0:
; AVX-NEXT: movzwl %di, %eax		; AVX-NEXT: vpxor %xmm0, %xmm0
; AVX-NEXT: vmovd %eax, %xmm0		; AVX-NEXT: vpinsrw $2, %edi, %xmm0
; AVX-NEXT: vpslldq {{.*#+}} xmm0 = zero,zero,zero,zero,xmm0[0,1,2,3,4,5,6,7,8,9,10,11]
; AVX-NEXT: retq		; AVX-NEXT: retq
%a = insertelement <8 x i16> undef, i16 %i, i32 3		%a = insertelement <8 x i16> undef, i16 %i, i32 3
%shuffle = shufflevector <8 x i16> zeroinitializer, <8 x i16> %a, <8 x i32> <i32 0, i32 1, i32 11, i32 3, i32 4, i32 5, i32 6, i32 7>		%shuffle = shufflevector <8 x i16> zeroinitializer, <8 x i16> %a, <8 x i32> <i32 0, i32 1, i32 11, i32 3, i32 4, i32 5, i32 6, i32 7>
ret <8 x i16> %shuffle		ret <8 x i16> %shuffle
}		}

define <8 x i16> @shuffle_v8i16_def01234(<8 x i16> %a, <8 x i16> %b) {		define <8 x i16> @shuffle_v8i16_def01234(<8 x i16> %a, <8 x i16> %b) {
; SSE2-LABEL: shuffle_v8i16_def01234:		; SSE2-LABEL: shuffle_v8i16_def01234:
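The right-hand column of the v8i16 tests above is consistent with the shuffle being folded into a single vector built from scalars, which then lowers to a zeroing `pxor` plus one `pinsrw`. The sketch below models that idea outside of LLVM. It is not the DAGCombiner code from this patch: `Lane`, `ScalarVec` and `foldShuffle` are hypothetical names, scalars are represented as strings, and `-1` is assumed to mean an undef mask element.

```cpp
#include <optional>
#include <string>
#include <vector>

// One lane of a "scalar source" vector: the name of the scalar feeding the
// lane, or std::nullopt when the lane is undef.
using Lane = std::optional<std::string>;
using ScalarVec = std::vector<Lane>;

// Hypothetical model of folding a shuffle of two scalar-source vectors into
// one list of scalar operands. Mask indices below lhs.size() select from lhs,
// the rest select from rhs, and -1 yields an undef lane.
ScalarVec foldShuffle(const ScalarVec &lhs, const ScalarVec &rhs,
                      const std::vector<int> &mask) {
  const int n = static_cast<int>(lhs.size());
  ScalarVec out;
  out.reserve(mask.size());
  for (int m : mask) {
    if (m < 0)
      out.push_back(std::nullopt);   // undef stays undef
    else if (m < n)
      out.push_back(lhs[m]);         // scalar from the first input
    else
      out.push_back(rhs[m - n]);     // scalar from the second input
  }
  return out;
}
```

For `shuffle_v8i16_zzBzzzzz`, the first input is eight zero lanes, the second input is undef except for `%i` in lane 3, and the mask is `<0, 1, 11, 3, 4, 5, 6, 7>`. The model returns zero in every lane except lane 2, which holds `%i`, matching the `pinsrw $2` in the updated checks.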

Revision Contents

Diff 23209

llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

llvm/trunk/test/CodeGen/AArch64/arm64-neon-copy.ll

llvm/trunk/test/CodeGen/AArch64/arm64-vshuffle.ll

llvm/trunk/test/CodeGen/PowerPC/vperm-lowering.ll

llvm/trunk/test/CodeGen/X86/mmx-bitcast.ll

llvm/trunk/test/CodeGen/X86/sse41.ll

llvm/trunk/test/CodeGen/X86/vec_insert-5.ll

llvm/trunk/test/CodeGen/X86/vec_insert-mmx.ll

llvm/trunk/test/CodeGen/X86/vec_zero_cse.ll

llvm/trunk/test/CodeGen/X86/vector-shuffle-128-v16.ll

llvm/trunk/test/CodeGen/X86/vector-shuffle-128-v8.ll
