This is an archive of the discontinued LLVM Phabricator instance.

Differential D7939

[DagCombiner] Allow shuffles to merge through bitcasts
Closed, Public

Authored by RKSimon on Feb 27 2015, 7:23 AM.

Details

Reviewers

qcolombet
chandlerc
andreadb

Commits

rG7189084bef9b: [DagCombiner] Allow shuffles to merge through bitcasts
rL231380: [DagCombiner] Allow shuffles to merge through bitcasts

Summary

Currently shuffles may only be combined if they are of the same type, despite the fact that bitcasts are often introduced in between shuffle nodes (e.g. x86 shuffle type widening).

This patch allows a single input shuffle to peek through bitcasts and if the input is another shuffle will merge them, shuffling using the smallest sized type, and re-applying the bitcasts at the inputs and output instead.
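To make the merge concrete, here is a minimal standalone sketch of the two mask operations involved. It assumes plain `std::vector<int>` masks in place of LLVM's `ArrayRef`/`SmallVector`; the helper names `scaleShuffleMask` and `mergeShuffleMasks` are illustrative only, and the type and mask legality checks from the patch are left out.

```cpp
#include <cassert>
#include <vector>

// Rewrite a mask over wide elements as a mask over narrow elements, where
// each wide element covers `Scale` narrow ones. Negative (undef) entries
// stay undef. (Illustrative helper, not an LLVM API.)
std::vector<int> scaleShuffleMask(const std::vector<int> &Mask, int Scale) {
  assert(Scale > 0 && "scale must be positive");
  std::vector<int> NewMask;
  NewMask.reserve(Mask.size() * Scale);
  for (int M : Mask)
    for (int i = 0; i != Scale; ++i)
      NewMask.push_back(M < 0 ? -1 : Scale * M + i);
  return NewMask;
}

// Compose two masks that use the same element size: the single-input outer
// shuffle selects lanes from the result of the inner shuffle.
std::vector<int> mergeShuffleMasks(const std::vector<int> &Inner,
                                   const std::vector<int> &Outer) {
  std::vector<int> Merged;
  Merged.reserve(Outer.size());
  for (int M : Outer) {
    assert(M < static_cast<int>(Inner.size()) && "outer index out of range");
    Merged.push_back(M < 0 ? -1 : Inner[M]);
  }
  return Merged;
}
```

For the `shuffle_v4i32_bitcast_0415` test added in this patch, the inner v4i32 mask is <1,5,0,4> and the outer v2f64 mask <1,0> scales by 2 to <2,3,0,1>. Merging the two gives <0,4,1,5>, which is the interleave that the expected `punpckldq` performs.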

Dropped old ShuffleToZext test - this patch removes the use of zext and vector-zext.ll covers these anyhow.

Diff Detail

Repository: rL LLVM

Event Timeline

RKSimon updated this revision to Diff 20853. Feb 27 2015, 7:23 AM

RKSimon retitled this revision to [DagCombiner] Allow shuffles to merge through bitcasts.

RKSimon updated this object.

RKSimon edited the test plan for this revision.

RKSimon added reviewers: chandlerc, andreadb, qcolombet.

RKSimon set the repository for this revision to rL LLVM.

RKSimon added a subscriber: Unknown Object (MLST).

Ping. Rebased and added early-out.

qcolombet added inline comments. Mar 3 2015, 2:09 PM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11923	I don't think this assert holds in a pre-legalization world. AFAIK it is perfectly legal to do something like this: bitcast <4 x i48> %a to <6 x i32>
test/CodeGen/X86/vector-shuffle-128-v16.ll
1352	Do you actually need the zeroinitializer here? "undef" should do the trick and would make it more obvious why your patch applies.
test/CodeGen/X86/vector-shuffle-128-v2.ll
827	Ditto.
test/CodeGen/X86/vector-shuffle-128-v4.ll
1589	Ditto.

Thanks Quentin. The assert for re-scalable scalar types has been changed into part of the type legality test - I've also used that opportunity to remove the code duplication between scaling the inner or outer shuffles. I've also updated the tests to use undef where possible.
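As a rough illustration of the check described here, the sketch below returns a scale factor only when one scalar width is an integer multiple of the other, so a case such as i48 versus i32 bails out instead of asserting. `getMaskScale` is a hypothetical name, and plain bit widths stand in for `EVT`.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: returns how many narrow elements make up one wide
// element, or 0 when the two widths are not integer multiples of each other
// and the shuffle masks therefore cannot be rescaled.
int getMaskScale(unsigned BitsA, unsigned BitsB) {
  assert(BitsA != 0 && BitsB != 0 && "scalar widths must be non-zero");
  unsigned Wide = std::max(BitsA, BitsB);
  unsigned Narrow = std::min(BitsA, BitsB);
  if (Wide % Narrow != 0)
    return 0; // e.g. i48 vs i32: skip the combine rather than assert
  return static_cast<int>(Wide / Narrow);
}
```

A caller would treat a zero result as "do not combine" and fall through to the remaining shuffle combines.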

Hi Simon,

Overall the patch looks good to me. I suggest that you move this new logic into a separate function. In my opinion, it would make the code in visitVECTOR_SHUFFLE a bit more readable. I also made a couple of minor comments (see below).

Thanks!
-Andrea

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11885	In this context, we know that N0 is used by N, so writing 'N->isOnlyUserOf(N0.getNode())' is equivalent to writing 'N0.hasOneUse()'. If there is one use, then it can only be N.
11901–11909	A very minor nit: You can probably explicitly initialize NewMask like this: SmallVector<int, 8> NewMask(Scale, -1); This will allow you to simplify the loop using operator[]. So, `NewMask.push_back(Scale * M + i)` would become `NewMask[i] = Scale * M + i`. Also, if you do that, then you would not need to always 'push_back(-1)' if M is less than 0.
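One way to read the second suggestion is sketched below, using `std::vector<int>` in place of `SmallVector`. Taken literally, `NewMask(Scale, -1)` with `NewMask[i]` would only cover a single source element, so this sketch assumes the intent is a mask pre-filled with -1 for all `Mask.size() * Scale` lanes and indexed by the output position. The function name is illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scale a shuffle mask by pre-filling every output lane with -1 (undef) and
// overwriting only the lanes that come from a defined source element.
std::vector<int> scaleShuffleMaskPrefilled(const std::vector<int> &Mask,
                                           int Scale) {
  assert(Scale > 0 && "scale must be positive");
  const std::size_t S = static_cast<std::size_t>(Scale);
  std::vector<int> NewMask(Mask.size() * S, -1);
  for (std::size_t Idx = 0; Idx != Mask.size(); ++Idx) {
    int M = Mask[Idx];
    if (M < 0)
      continue; // lanes are already undef, nothing to write
    for (std::size_t i = 0; i != S; ++i)
      NewMask[Idx * S + i] = Scale * M + static_cast<int>(i);
  }
  return NewMask;
}
```

The result is the same as the `push_back` version in the diff; the difference is that undef entries need no explicit handling inside the inner loop.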

This revision is now accepted and ready to land. Mar 5 2015, 4:30 AM

Closed by commit rL231380: [DagCombiner] Allow shuffles to merge through bitcasts (authored by RKSimon). Mar 5 2015, 9:16 AM

This revision was automatically updated to reflect the committed changes.

Thanks Andrea, I was able to include some of your minors in this patch. I'll be sending a follow up patch soon (probably next week now) that will remove all the code duplication we have for shuffle mask commutation.
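For reference, the mask commutation that this patch performs inline before retrying `isShuffleMaskLegal` can be expressed as a small standalone helper. This is only a sketch with a hypothetical name and `std::vector<int>` masks; it is not the follow-up patch itself.

```cpp
#include <cassert>
#include <vector>

// Rewrite a two-input shuffle mask for swapped operands: indices that
// referred to the first input [0, N) now refer to the second [N, 2N), and
// vice versa. Negative (undef) entries are left untouched.
void commuteShuffleMask(std::vector<int> &Mask) {
  const int NumElts = static_cast<int>(Mask.size());
  for (int &Idx : Mask) {
    if (Idx < 0)
      continue;
    assert(Idx < 2 * NumElts && "mask index out of range");
    Idx = Idx < NumElts ? Idx + NumElts : Idx - NumElts;
  }
}
```

After commuting the mask, the caller also has to swap the two shuffle operands, as the diff does with `std::swap(SV0, SV1)`.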

Revision Contents

Path	Size
lib/CodeGen/SelectionDAG/DAGCombiner.cpp (revision 231067)	93 lines
test/CodeGen/X86/2013-02-12-ShuffleToZext.ll (revision 231067)	14 lines
test/CodeGen/X86/vector-shuffle-128-v16.ll (revision 231067)	19 lines
test/CodeGen/X86/vector-shuffle-128-v2.ll (revision 231067)	19 lines
test/CodeGen/X86/vector-shuffle-128-v4.ll (revision 231067)	17 lines
test/CodeGen/X86/vector-shuffle-256-v4.ll (revision 231067)	19 lines
test/CodeGen/X86/vector-shuffle-mmx.ll (revision 231067)	11 lines

Diff 21102

lib/CodeGen/SelectionDAG/DAGCombiner.cpp

[11,873 lines not shown]
   if (N0.getOpcode() == ISD::CONCAT_VECTORS &&
       (N1.getOpcode() == ISD::CONCAT_VECTORS &&
        N0.getOperand(0).getValueType() == N1.getOperand(0).getValueType()))) {
     SDValue V = partitionShuffleOfConcats(N, DAG);

     if (V.getNode())
       return V;
   }

+  // If this shuffle only has a single input that is a bitcasted shuffle,
+  // attempt to merge the 2 shuffles and suitably bitcast the inputs/output
+  // back to their original types.
+  if (N0.getOpcode() == ISD::BITCAST && N->isOnlyUserOf(N0.getNode()) &&
+      N1.getOpcode() == ISD::UNDEF && Level < AfterLegalizeVectorOps &&
+      TLI.isTypeLegal(VT)) {
+
+    // Peek through the bitcast only if there is one user.
+    SDValue BC0 = N0;
+    while (BC0.getOpcode() == ISD::BITCAST) {
+      if (!BC0.hasOneUse())
+        break;
+      BC0 = BC0.getOperand(0);
+    }
+
+    auto ScaleShuffleMask = [](ArrayRef<int> Mask, int Scale) {
+      SmallVector<int, 8> NewMask;
+      for (int M : Mask)
+        for (int i = 0; i != Scale; ++i) {
+          if (M < 0) {
+            NewMask.push_back(-1);
+            continue;
+          }
+          NewMask.push_back(Scale * M + i);
+        }
+      return NewMask;
+    };
+
+    if (BC0.getOpcode() == ISD::VECTOR_SHUFFLE && BC0.hasOneUse()) {
+      // Determine which shuffle works with the smaller scalar type and
+      // scale the other shuffle mask to match number of elements.
+      EVT ScaleVT;
+      SmallVector<int, 8> InnerMask;
+      SmallVector<int, 8> OuterMask;
+      ShuffleVectorSDNode *InnerSVN = cast<ShuffleVectorSDNode>(BC0);
+
+      EVT SVT = VT.getScalarType();
+      EVT InnerVT = BC0->getValueType(0);
+      EVT InnerSVT = InnerVT.getScalarType();
+
+      if (SVT.bitsLT(InnerSVT)) {
+        assert(0 == (InnerSVT.getSizeInBits() % SVT.getSizeInBits()) &&
+               "Illegal Shuffle Mask Scale");
+        int Scale = InnerSVT.getSizeInBits() / SVT.getSizeInBits();
+        InnerMask = ScaleShuffleMask(InnerSVN->getMask(), Scale);
+        OuterMask =
+            SmallVector<int, 8>(SVN->getMask().begin(), SVN->getMask().end());
+        ScaleVT = VT;
+      } else {
+        assert(0 == (SVT.getSizeInBits() % InnerSVT.getSizeInBits()) &&
+               "Illegal Shuffle Mask Scale");
+        int Scale = SVT.getSizeInBits() / InnerSVT.getSizeInBits();
+        InnerMask = SmallVector<int, 8>(InnerSVN->getMask().begin(),
+                                        InnerSVN->getMask().end());
+        OuterMask = ScaleShuffleMask(SVN->getMask(), Scale);
+        ScaleVT = InnerVT;
+      }
+
+      if (TLI.isTypeLegal(ScaleVT)) {
+        // Merge the shuffle masks.
+        SmallVector<int, 8> NewMask;
+        for (int M : OuterMask)
+          NewMask.push_back(M < 0 ? -1 : InnerMask[M]);
+
+        // Test for shuffle mask legality over both commutations.
+        SDValue SV0 = BC0->getOperand(0);
+        SDValue SV1 = BC0->getOperand(1);
+        bool LegalMask = TLI.isShuffleMaskLegal(NewMask, ScaleVT);
+        if (!LegalMask) {
+          for (int i = 0, e = (int)NewMask.size(); i != e; ++i) {
+            int idx = NewMask[i];
+            if (idx < 0)
+              continue;
+            else if (idx < e)
+              NewMask[i] = idx + e;
+            else
+              NewMask[i] = idx - e;
+          }
+          std::swap(SV0, SV1);
+          LegalMask = TLI.isShuffleMaskLegal(NewMask, ScaleVT);
+        }
+
+        if (LegalMask) {
+          SV0 = DAG.getNode(ISD::BITCAST, SDLoc(N), ScaleVT, SV0);
+          SV1 = DAG.getNode(ISD::BITCAST, SDLoc(N), ScaleVT, SV1);
+          return DAG.getNode(
+              ISD::BITCAST, SDLoc(N), VT,
+              DAG.getVectorShuffle(ScaleVT, SDLoc(N), SV0, SV1, NewMask));
+        }
+      }
+    }
+  }
+
   // Canonicalize shuffles according to rules:
   // shuffle(A, shuffle(A, B)) -> shuffle(shuffle(A,B), A)
   // shuffle(B, shuffle(A, B)) -> shuffle(shuffle(A,B), B)
   // shuffle(B, shuffle(A, Undef)) -> shuffle(shuffle(A, Undef), B)
   if (N1.getOpcode() == ISD::VECTOR_SHUFFLE &&
       N0.getOpcode() != ISD::VECTOR_SHUFFLE && Level < AfterLegalizeDAG &&
       TLI.isTypeLegal(VT)) {
     // The incoming shuffle must be of the same type as the result of the
[1,263 lines not shown]

test/CodeGen/X86/2013-02-12-ShuffleToZext.ll

-; RUN: llc < %s -march=x86-64 -mcpu=corei7-avx -mtriple=x86_64-pc-win32 | FileCheck %s
-
-; CHECK: test
-; CHECK: vpmovzxwd
-; CHECK: vpmovzxwd
-define void @test(<4 x i64> %a, <4 x i16>* %buf) {
-  %ex1 = extractelement <4 x i64> %a, i32 0
-  %ex2 = extractelement <4 x i64> %a, i32 1
-  %x1 = bitcast i64 %ex1 to <4 x i16>
-  %x2 = bitcast i64 %ex2 to <4 x i16>
-  %Sh = shufflevector <4 x i16> %x1, <4 x i16> %x2, <4 x i32> <i32 0, i32 1, i32 4, i32 5>
-  store <4 x i16> %Sh, <4 x i16>* %buf, align 1
-  ret void
-}

test/CodeGen/X86/vector-shuffle-128-v16.ll

[1,330 lines not shown]
 ;
 ; AVX-LABEL: shuffle_v16i8_uu_02_03_zz_uu_06_07_zz_uu_10_11_zz_uu_14_15_zz:
 ; AVX: # BB#0:
 ; AVX-NEXT: vpsrld $8, %xmm0, %xmm0
 ; AVX-NEXT: retq
   %shuffle = shufflevector <16 x i8> %a, <16 x i8> zeroinitializer, <16 x i32> <i32 undef, i32 2, i32 3, i32 16, i32 undef, i32 6, i32 7, i32 16, i32 undef, i32 10, i32 11, i32 16, i32 undef, i32 14, i32 15, i32 16>
   ret <16 x i8> %shuffle
 }

+define <16 x i8> @shuffle_v16i8_bitcast_unpack(<16 x i8> %a, <16 x i8> %b) {
+; SSE-LABEL: shuffle_v16i8_bitcast_unpack:
+; SSE: # BB#0:
+; SSE-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
+; SSE-NEXT: retq
+;
+; AVX-LABEL: shuffle_v16i8_bitcast_unpack:
+; AVX: # BB#0:
+; AVX-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
+; AVX-NEXT: retq
+  %shuffle8 = shufflevector <16 x i8> %a, <16 x i8> %b, <16 x i32> <i32 7, i32 23, i32 6, i32 22, i32 5, i32 21, i32 4, i32 20, i32 3, i32 19, i32 2, i32 18, i32 1, i32 17, i32 0, i32 16>
+  %bitcast32 = bitcast <16 x i8> %shuffle8 to <4 x float>
+  %shuffle32 = shufflevector <4 x float> %bitcast32, <4 x float> zeroinitializer, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
+  %bitcast16 = bitcast <4 x float> %shuffle32 to <8 x i16>
+  %shuffle16 = shufflevector <8 x i16> %bitcast16, <8 x i16> zeroinitializer, <8 x i32> <i32 1, i32 0, i32 3, i32 2, i32 5, i32 4, i32 7, i32 6>
+  %bitcast8 = bitcast <8 x i16> %shuffle16 to <16 x i8>
+  ret <16 x i8> %bitcast8
+}

test/CodeGen/X86/vector-shuffle-128-v2.ll

[804 lines not shown]
 ; AVX: # BB#0:
 ; AVX-NEXT: vxorpd %xmm1, %xmm1, %xmm1
 ; AVX-NEXT: vblendpd {{.*#+}} xmm0 = xmm1[0],xmm0[1]
 ; AVX-NEXT: retq
   %shuffle = shufflevector <2 x double> %a, <2 x double> zeroinitializer, <2 x i32> <i32 2, i32 1>
   ret <2 x double> %shuffle
 }

+define <2 x double> @shuffle_v2f64_bitcast_1z(<2 x double> %a) {
+; SSE-LABEL: shuffle_v2f64_bitcast_1z:
+; SSE: # BB#0:
+; SSE-NEXT: xorpd %xmm1, %xmm1
+; SSE-NEXT: shufpd {{.*#+}} xmm0 = xmm0[1],xmm1[0]
+; SSE-NEXT: retq
+;
+; AVX-LABEL: shuffle_v2f64_bitcast_1z:
+; AVX: # BB#0:
+; AVX-NEXT: vxorpd %xmm1, %xmm1, %xmm1
+; AVX-NEXT: vshufpd {{.*#+}} xmm0 = xmm0[1],xmm1[0]
+; AVX-NEXT: retq
+  %shuffle64 = shufflevector <2 x double> %a, <2 x double> zeroinitializer, <2 x i32> <i32 2, i32 1>
+  %bitcast32 = bitcast <2 x double> %shuffle64 to <4 x float>
+  %shuffle32 = shufflevector <4 x float> %bitcast32, <4 x float> zeroinitializer, <4 x i32> <i32 2, i32 3, i32 0, i32 1>
+  %bitcast64 = bitcast <4 x float> %shuffle32 to <2 x double>
+  ret <2 x double> %bitcast64
+}
+
 define <2 x i64> @insert_reg_and_zero_v2i64(i64 %a) {
 ; SSE-LABEL: insert_reg_and_zero_v2i64:
 ; SSE: # BB#0:
 ; SSE-NEXT: movd %rdi, %xmm0
 ; SSE-NEXT: retq
 ;
 ; AVX-LABEL: insert_reg_and_zero_v2i64:
 ; AVX: # BB#0:
[311 lines not shown]

test/CodeGen/X86/vector-shuffle-128-v4.ll

[1,568 lines not shown]
 ; AVX2: # BB#0:
 ; AVX2-NEXT: vpxor %xmm1, %xmm1, %xmm1
 ; AVX2-NEXT: vpblendd {{.*#+}} xmm0 = xmm0[0],xmm1[1,2],xmm0[3]
 ; AVX2-NEXT: retq
   %shuffle = shufflevector <4 x i32> %a, <4 x i32> zeroinitializer, <4 x i32> <i32 0, i32 4, i32 4, i32 3>
   ret <4 x i32> %shuffle
 }

+define <4 x i32> @shuffle_v4i32_bitcast_0415(<4 x i32> %a, <4 x i32> %b) {
+; SSE-LABEL: shuffle_v4i32_bitcast_0415:
+; SSE: # BB#0:
+; SSE-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
+; SSE-NEXT: retq
+;
+; AVX-LABEL: shuffle_v4i32_bitcast_0415:
+; AVX: # BB#0:
+; AVX-NEXT: vpunpckldq {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
+; AVX-NEXT: retq
+  %shuffle32 = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 1, i32 5, i32 0, i32 4>
+  %bitcast64 = bitcast <4 x i32> %shuffle32 to <2 x double>
+  %shuffle64 = shufflevector <2 x double> %bitcast64, <2 x double> zeroinitializer, <2 x i32> <i32 1, i32 0>
+  %bitcast32 = bitcast <2 x double> %shuffle64 to <4 x i32>
+  ret <4 x i32> %bitcast32
+}
+
 define <4 x i32> @insert_reg_and_zero_v4i32(i32 %a) {
 ; SSE-LABEL: insert_reg_and_zero_v4i32:
 ; SSE: # BB#0:
 ; SSE-NEXT: movd %edi, %xmm0
 ; SSE-NEXT: retq
 ;
 ; AVX-LABEL: insert_reg_and_zero_v4i32:
 ; AVX: # BB#0:
[307 lines not shown]

test/CodeGen/X86/vector-shuffle-256-v4.ll

[916 lines not shown]
 ;
 ; AVX2-LABEL: splat_v4f64:
 ; AVX2: # BB#0:
 ; AVX2-NEXT: vbroadcastsd %xmm0, %ymm0
 ; AVX2-NEXT: retq
   %1 = shufflevector <2 x double> %r, <2 x double> undef, <4 x i32> zeroinitializer
   ret <4 x double> %1
 }

+define <4 x double> @bitcast_v4f64_0426(<4 x double> %a, <4 x double> %b) {
+; AVX1-LABEL: bitcast_v4f64_0426:
+; AVX1: # BB#0:
+; AVX1-NEXT: vunpcklpd {{.*#+}} ymm0 = ymm0[0],ymm1[0],ymm0[2],ymm1[2]
+; AVX1-NEXT: retq
+;
+; AVX2-LABEL: bitcast_v4f64_0426:
+; AVX2: # BB#0:
+; AVX2-NEXT: vpunpcklqdq {{.*#+}} ymm0 = ymm0[0],ymm1[0],ymm0[2],ymm1[2]
+; AVX2-NEXT: retq
+  %shuffle64 = shufflevector <4 x double> %a, <4 x double> %b, <4 x i32> <i32 4, i32 0, i32 6, i32 2>
+  %bitcast32 = bitcast <4 x double> %shuffle64 to <8 x float>
+  %shuffle32 = shufflevector <8 x float> %bitcast32, <8 x float> undef, <8 x i32> <i32 3, i32 2, i32 1, i32 0, i32 7, i32 6, i32 5, i32 4>
+  %bitcast16 = bitcast <8 x float> %shuffle32 to <16 x i16>
+  %shuffle16 = shufflevector <16 x i16> %bitcast16, <16 x i16> undef, <16 x i32> <i32 2, i32 3, i32 0, i32 1, i32 6, i32 7, i32 4, i32 5, i32 10, i32 11, i32 8, i32 9, i32 14, i32 15, i32 12, i32 13>
+  %bitcast64 = bitcast <16 x i16> %shuffle16 to <4 x double>
+  ret <4 x double> %bitcast64
+}

test/CodeGen/X86/vector-shuffle-mmx.ll

 ; RUN: llc < %s -mtriple=i686-darwin -mattr=+mmx,+sse2 | FileCheck --check-prefix=X32 %s
 ; RUN: llc < %s -mtriple=x86_64-darwin -mattr=+mmx,+sse2 | FileCheck --check-prefix=X64 %s

 ; If there is no explicit MMX type usage, always promote to XMM.

 define void @test0(<1 x i64>* %x) {
 ; X32-LABEL: test0:
 ; X32: ## BB#0: ## %entry
 ; X32-NEXT: movl {{[0-9]+}}(%esp), %eax
 ; X32-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero
-; X32-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,1,3]
+; X32-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,2,3]
 ; X32-NEXT: movlpd %xmm0, (%eax)
 ; X32-NEXT: retl
 ;
 ; X64-LABEL: test0:
 ; X64: ## BB#0: ## %entry
 ; X64-NEXT: movq {{.*#+}} xmm0 = mem[0],zero
-; X64-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,1,3]
+; X64-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,2,3]
 ; X64-NEXT: movq %xmm0, (%rdi)
 ; X64-NEXT: retq
 entry:
   %tmp2 = load <1 x i64>, <1 x i64>* %x
   %tmp6 = bitcast <1 x i64> %tmp2 to <2 x i32>
   %tmp9 = shufflevector <2 x i32> %tmp6, <2 x i32> undef, <2 x i32> < i32 1, i32 1 >
   %tmp10 = bitcast <2 x i32> %tmp9 to <1 x i64>
   store <1 x i64> %tmp10, <1 x i64>* %x
[52 lines not shown]

 @tmp_V2i = common global <2 x i32> zeroinitializer

 define void @test2() nounwind {
 ; X32-LABEL: test2:
 ; X32: ## BB#0: ## %entry
 ; X32-NEXT: movl L_tmp_V2i$non_lazy_ptr, %eax
 ; X32-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero
-; X32-NEXT: movlhps {{.*#+}} xmm0 = xmm0[0,0]
-; X32-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]
-; X32-NEXT: movlpd %xmm0, (%eax)
+; X32-NEXT: unpcklps {{.*#+}} xmm0 = xmm0[0,0,1,1]
+; X32-NEXT: movlps %xmm0, (%eax)
 ; X32-NEXT: retl
 ;
 ; X64-LABEL: test2:
 ; X64: ## BB#0: ## %entry
 ; X64-NEXT: movq _tmp_V2i@{{.*}}(%rip), %rax
 ; X64-NEXT: movq {{.*#+}} xmm0 = mem[0],zero
-; X64-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,0,0,1]
+; X64-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,0,1,1]
 ; X64-NEXT: movq %xmm0, (%rax)
 ; X64-NEXT: retq
 entry:
   %0 = load <2 x i32>, <2 x i32>* @tmp_V2i, align 8
   %1 = shufflevector <2 x i32> %0, <2 x i32> undef, <2 x i32> zeroinitializer
   store <2 x i32> %1, <2 x i32>* @tmp_V2i, align 8
   ret void
 }

 declare void @llvm.x86.mmx.maskmovq(x86_mmx, x86_mmx, i8*)
