This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Look through concat when lowering in-place shuffles (VZIP, ..)
ClosedPublic

Authored by ab on Jun 12 2015, 3:49 PM.

Details

Reviewers

Commits

rG9a9094260d81: [ARM] Look through concat when lowering in-place shuffles (VZIP, ..)
rL240118: [ARM] Look through concat when lowering in-place shuffles (VZIP, ..)

Summary

Currently, we canonicalize shuffles that produce a result larger than
their operands with:

  shuffle(concat(v1, undef), concat(v2, undef))
->
  shuffle(concat(v1, v2), undef)

because we can access quad vectors (see PerformVECTOR_SHUFFLECombine).

This is useful in the general case, but there are special cases where
native shuffles produce larger results: the two-result ops.

Look through the concat when lowering them:

  shuffle(concat(v1, v2), undef)
->
  concat(VZIP(v1, v2):0, :1)

This lets us generate the native shuffles instead of scalarizing to
dozens of VMOVs.
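
To make the equivalence concrete, here is a small standalone C++ model. It is not LLVM code: the names concat, shuffleWithUndef, vzip, vuzp and vtrn are invented for illustration, vectors are plain std::vector<int>, and an undef lane (-1) is arbitrarily modelled as 0. It only sketches why a wide shuffle of concat(v1, v2) with one of these masks yields the same lanes as concatenating the two results of the native op.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Vec = std::vector<int>;
using VecPair = std::pair<Vec, Vec>;

// concat(A, B): the lanes of A followed by the lanes of B.
Vec concat(const Vec &A, const Vec &B) {
  Vec R(A);
  R.insert(R.end(), B.begin(), B.end());
  return R;
}

// shuffle(Wide, undef): every mask index selects a lane of Wide.
// A mask entry of -1 is undef; it is modelled as 0 here (assumption).
Vec shuffleWithUndef(const Vec &Wide, const std::vector<int> &Mask) {
  Vec R;
  for (int Idx : Mask)
    R.push_back(Idx < 0 ? 0 : Wide[static_cast<std::size_t>(Idx)]);
  return R;
}

// VZIP model: interleave the low halves (result 0) and high halves (result 1).
VecPair vzip(const Vec &A, const Vec &B) {
  Vec Lo, Hi;
  const std::size_t N = A.size();
  for (std::size_t I = 0; I < N; ++I) {
    Vec &Dst = (I < N / 2) ? Lo : Hi;
    Dst.push_back(A[I]);
    Dst.push_back(B[I]);
  }
  return {Lo, Hi};
}

// VUZP model: even lanes of A then B (result 0), odd lanes (result 1).
VecPair vuzp(const Vec &A, const Vec &B) {
  Vec Even, Odd;
  const Vec AB = concat(A, B);
  for (std::size_t I = 0; I < AB.size(); ++I)
    (I % 2 == 0 ? Even : Odd).push_back(AB[I]);
  return {Even, Odd};
}

// VTRN model: result 0 pairs up the even lanes of A and B, result 1 the odd.
VecPair vtrn(const Vec &A, const Vec &B) {
  Vec R0, R1;
  for (std::size_t I = 0; I + 1 < A.size(); I += 2) {
    R0.push_back(A[I]);
    R0.push_back(B[I]);
    R1.push_back(A[I + 1]);
    R1.push_back(B[I + 1]);
  }
  return {R0, R1};
}
```

For two 4-lane inputs, shuffling concat(v1, v2) with the mask <0, 4, 2, 6, 1, 5, 3, 7> gives the same lanes as concat(vtrn(v1, v2).first, vtrn(v1, v2).second); the VZIP and VUZP masks behave the same way in this model.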

I'm a little worried about the disparity between the lowering and
isShuffleMaskLegal, but with the current API we have no way of looking
at the actual operands, and this isn't a problem in practice because
the ARM combine runs last.

The obvious alternative would be to stop doing the combine, but I
think it's useful. We can also avoid doing it for these masks, but
we'll still need to look through concat(v, undef) to avoid
generating needlessly-wide shuffles.

Diff Detail

Event Timeline

ab updated this revision to Diff 27613. Jun 12 2015, 3:49 PM

ab retitled this revision to [ARM] Look through concat when lowering in-place shuffles (VZIP, ..).

ab updated this object.

ab edited the test plan for this revision.

ab added subscribers: Unknown Object (MLST), jmolloy, t.p.northover and 2 others.

Herald added a subscriber: aemerson. Jun 12 2015, 3:49 PM

Hi Ahmed,

LGTM with some improvements on the test patterns.

If you are really, really motivated, you could fix all the patterns in a subsequent commit :).

Cheers,
-Quentin

test/CodeGen/ARM/vtrn.ll, line 17:
I know this is consistent with the surrounding tests, but I would prefer that we check that the arguments are what we expect. In other words, could you check that we are feeding the right arguments here?

This revision is now accepted and ready to land. Jun 17 2015, 11:04 AM

Closed by commit rL240118: [ARM] Look through concat when lowering in-place shuffles (VZIP, ..) (authored by ab). Jun 18 2015, 7:37 PM

This revision was automatically updated to reflect the committed changes.

No need for motivation with Chandler's script ;)

r240114, r240116, r240118.

-Ahmed

Revision Contents

Path                                  Size
lib/Target/ARM/ARMISelLowering.cpp    39 lines
test/CodeGen/ARM/vtrn.ll              89 lines
test/CodeGen/ARM/vuzp.ll              71 lines
test/CodeGen/ARM/vzip.ll              71 lines

Diff 27613

lib/Target/ARM/ARMISelLowering.cpp

[5,708 lines not shown; enclosing context: if (EltSize <= 32) {]

     bool isV_UNDEF;
     if (unsigned ShuffleOpc = isNEONTwoResultShuffleMask(
             ShuffleMask, VT, WhichResult, isV_UNDEF)) {
       if (isV_UNDEF)
         V2 = V1;
       return DAG.getNode(ShuffleOpc, dl, DAG.getVTList(VT, VT), V1, V2)
           .getValue(WhichResult);
     }

+    // Also check for these shuffles through CONCAT_VECTORS: we canonicalize
+    // shuffles that produce a result larger than their operands with:
+    //   shuffle(concat(v1, undef), concat(v2, undef))
+    // ->
+    //   shuffle(concat(v1, v2), undef)
+    // because we can access quad vectors (see PerformVECTOR_SHUFFLECombine).
+    //
+    // This is useful in the general case, but there are special cases where
+    // native shuffles produce larger results: the two-result ops.
+    //
+    // Look through the concat when lowering them:
+    //   shuffle(concat(v1, v2), undef)
+    // ->
+    //   concat(VZIP(v1, v2):0, :1)
+    //
+    if (V1->getOpcode() == ISD::CONCAT_VECTORS &&
+        V2->getOpcode() == ISD::UNDEF) {
+      SDValue SubV1 = V1->getOperand(0);
+      SDValue SubV2 = V1->getOperand(1);
+      EVT SubVT = SubV1.getValueType();

+      // We expect these to have been canonicalized to -1.
+      assert(std::all_of(ShuffleMask.begin(), ShuffleMask.end(), [&](int i) {
+        return i < (int)VT.getVectorNumElements();
+      }) && "Unexpected shuffle index into UNDEF operand!");

+      if (unsigned ShuffleOpc = isNEONTwoResultShuffleMask(
+              ShuffleMask, SubVT, WhichResult, isV_UNDEF)) {
+        if (isV_UNDEF)
+          SubV2 = SubV1;
+        assert((WhichResult == 0) &&
+               "In-place shuffle of concat can only have one result!");
+        SDValue Res = DAG.getNode(ShuffleOpc, dl, DAG.getVTList(SubVT, SubVT),
+                                  SubV1, SubV2);
+        return DAG.getNode(ISD::CONCAT_VECTORS, dl, VT, Res.getValue(0),
+                           Res.getValue(1));
+      }
+    }
   }

   // If the shuffle is not directly supported and it has 4 elements, use
   // the PerfectShuffle-generated table to synthesize it from other shuffles.
   unsigned NumElts = VT.getVectorNumElements();
   if (NumElts == 4) {
     unsigned PFIndexes[4];
     for (unsigned i = 0; i != 4; ++i) {

[5,719 lines not shown]
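
In the new path the full-width ShuffleMask is passed together with the narrower SubVT, so the mask check has to accept a mask that covers both results back to back. How isNEONTwoResultShuffleMask handles this is not visible in this hunk. The following is only a standalone sketch of what such a check could look like for VTRN; isWideVTRNMask is a hypothetical helper, not an LLVM function, and it assumes undef lanes are encoded as -1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not LLVM's isVTRNMask). Mask indexes into
// concat(v1, v2), where v1 and v2 each have SubNumElts lanes. Returns true
// if the first half of Mask selects VTRN result 0 and the second half
// selects VTRN result 1. A lane of -1 is undef and matches anything.
bool isWideVTRNMask(const std::vector<int> &Mask, unsigned SubNumElts) {
  if (SubNumElts % 2 != 0 || Mask.size() != 2 * SubNumElts)
    return false;
  for (unsigned Result = 0; Result < 2; ++Result) {
    const std::size_t Base = static_cast<std::size_t>(Result) * SubNumElts;
    for (unsigned I = 0; I < SubNumElts; I += 2) {
      const int FromV1 = Mask[Base + I];     // expected: lane I + Result of v1
      const int FromV2 = Mask[Base + I + 1]; // expected: same lane of v2
      if (FromV1 >= 0 && FromV1 != static_cast<int>(I + Result))
        return false;
      if (FromV2 >= 0 && FromV2 != static_cast<int>(I + SubNumElts + Result))
        return false;
    }
  }
  return true;
}
```

With SubNumElts = 8, this sketch accepts the 16-lane mask used by vtrni8_Qres and its undef variant, and with SubNumElts = 4 it rejects the VUZP-style mask <0, 2, 4, 6, 1, 3, 5, 7>.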

test/CodeGen/ARM/vtrn.ll

 ; RUN: llc -mtriple=arm-eabi -mattr=+neon %s -o - | FileCheck %s

 define <8 x i8> @vtrni8(<8 x i8>* %A, <8 x i8>* %B) nounwind {
 ;CHECK-LABEL: vtrni8:
 ;CHECK: vtrn.8
 ;CHECK-NEXT: vadd.i8
   %tmp1 = load <8 x i8>, <8 x i8>* %A
   %tmp2 = load <8 x i8>, <8 x i8>* %B
   %tmp3 = shufflevector <8 x i8> %tmp1, <8 x i8> %tmp2, <8 x i32> <i32 0, i32 8, i32 2, i32 10, i32 4, i32 12, i32 6, i32 14>
   %tmp4 = shufflevector <8 x i8> %tmp1, <8 x i8> %tmp2, <8 x i32> <i32 1, i32 9, i32 3, i32 11, i32 5, i32 13, i32 7, i32 15>
   %tmp5 = add <8 x i8> %tmp3, %tmp4
   ret <8 x i8> %tmp5
 }

+define <16 x i8> @vtrni8_Qres(<8 x i8>* %A, <8 x i8>* %B) nounwind {
+;CHECK-LABEL: vtrni8_Qres:
+;CHECK: vtrn.8
+  %tmp1 = load <8 x i8>, <8 x i8>* %A
+  %tmp2 = load <8 x i8>, <8 x i8>* %B
+  %tmp3 = shufflevector <8 x i8> %tmp1, <8 x i8> %tmp2, <16 x i32> <i32 0, i32 8, i32 2, i32 10, i32 4, i32 12, i32 6, i32 14, i32 1, i32 9, i32 3, i32 11, i32 5, i32 13, i32 7, i32 15>
+  ret <16 x i8> %tmp3
+}

 define <4 x i16> @vtrni16(<4 x i16>* %A, <4 x i16>* %B) nounwind {
 ;CHECK-LABEL: vtrni16:
 ;CHECK: vtrn.16
 ;CHECK-NEXT: vadd.i16
   %tmp1 = load <4 x i16>, <4 x i16>* %A
   %tmp2 = load <4 x i16>, <4 x i16>* %B
   %tmp3 = shufflevector <4 x i16> %tmp1, <4 x i16> %tmp2, <4 x i32> <i32 0, i32 4, i32 2, i32 6>
   %tmp4 = shufflevector <4 x i16> %tmp1, <4 x i16> %tmp2, <4 x i32> <i32 1, i32 5, i32 3, i32 7>
   %tmp5 = add <4 x i16> %tmp3, %tmp4
   ret <4 x i16> %tmp5
 }

+define <8 x i16> @vtrni16_Qres(<4 x i16>* %A, <4 x i16>* %B) nounwind {
+;CHECK-LABEL: vtrni16_Qres:
+;CHECK: vtrn.16
+  %tmp1 = load <4 x i16>, <4 x i16>* %A
+  %tmp2 = load <4 x i16>, <4 x i16>* %B
+  %tmp3 = shufflevector <4 x i16> %tmp1, <4 x i16> %tmp2, <8 x i32> <i32 0, i32 4, i32 2, i32 6, i32 1, i32 5, i32 3, i32 7>
+  ret <8 x i16> %tmp3
+}

 define <2 x i32> @vtrni32(<2 x i32>* %A, <2 x i32>* %B) nounwind {
 ;CHECK-LABEL: vtrni32:
 ;CHECK: vtrn.32
 ;CHECK-NEXT: vadd.i32
   %tmp1 = load <2 x i32>, <2 x i32>* %A
   %tmp2 = load <2 x i32>, <2 x i32>* %B
   %tmp3 = shufflevector <2 x i32> %tmp1, <2 x i32> %tmp2, <2 x i32> <i32 0, i32 2>
   %tmp4 = shufflevector <2 x i32> %tmp1, <2 x i32> %tmp2, <2 x i32> <i32 1, i32 3>
   %tmp5 = add <2 x i32> %tmp3, %tmp4
   ret <2 x i32> %tmp5
 }

+define <4 x i32> @vtrni32_Qres(<2 x i32>* %A, <2 x i32>* %B) nounwind {
+;CHECK-LABEL: vtrni32_Qres:
+;CHECK: vtrn.32
+  %tmp1 = load <2 x i32>, <2 x i32>* %A
+  %tmp2 = load <2 x i32>, <2 x i32>* %B
+  %tmp3 = shufflevector <2 x i32> %tmp1, <2 x i32> %tmp2, <4 x i32> <i32 0, i32 2, i32 1, i32 3>
+  ret <4 x i32> %tmp3
+}

 define <2 x float> @vtrnf(<2 x float>* %A, <2 x float>* %B) nounwind {
 ;CHECK-LABEL: vtrnf:
 ;CHECK: vtrn.32
 ;CHECK-NEXT: vadd.f32
   %tmp1 = load <2 x float>, <2 x float>* %A
   %tmp2 = load <2 x float>, <2 x float>* %B
   %tmp3 = shufflevector <2 x float> %tmp1, <2 x float> %tmp2, <2 x i32> <i32 0, i32 2>
   %tmp4 = shufflevector <2 x float> %tmp1, <2 x float> %tmp2, <2 x i32> <i32 1, i32 3>
   %tmp5 = fadd <2 x float> %tmp3, %tmp4
   ret <2 x float> %tmp5
 }

+define <4 x float> @vtrnf_Qres(<2 x float>* %A, <2 x float>* %B) nounwind {
+;CHECK-LABEL: vtrnf_Qres:
+;CHECK: vtrn.32
+  %tmp1 = load <2 x float>, <2 x float>* %A
+  %tmp2 = load <2 x float>, <2 x float>* %B
+  %tmp3 = shufflevector <2 x float> %tmp1, <2 x float> %tmp2, <4 x i32> <i32 0, i32 2, i32 1, i32 3>
+  ret <4 x float> %tmp3
+}

 define <16 x i8> @vtrnQi8(<16 x i8>* %A, <16 x i8>* %B) nounwind {
 ;CHECK-LABEL: vtrnQi8:
 ;CHECK: vtrn.8
 ;CHECK-NEXT: vadd.i8
   %tmp1 = load <16 x i8>, <16 x i8>* %A
   %tmp2 = load <16 x i8>, <16 x i8>* %B
   %tmp3 = shufflevector <16 x i8> %tmp1, <16 x i8> %tmp2, <16 x i32> <i32 0, i32 16, i32 2, i32 18, i32 4, i32 20, i32 6, i32 22, i32 8, i32 24, i32 10, i32 26, i32 12, i32 28, i32 14, i32 30>
   %tmp4 = shufflevector <16 x i8> %tmp1, <16 x i8> %tmp2, <16 x i32> <i32 1, i32 17, i32 3, i32 19, i32 5, i32 21, i32 7, i32 23, i32 9, i32 25, i32 11, i32 27, i32 13, i32 29, i32 15, i32 31>
   %tmp5 = add <16 x i8> %tmp3, %tmp4
   ret <16 x i8> %tmp5
 }

+define <32 x i8> @vtrnQi8_QQres(<16 x i8>* %A, <16 x i8>* %B) nounwind {
+;CHECK-LABEL: vtrnQi8_QQres:
+;CHECK: vtrn.8
+  %tmp1 = load <16 x i8>, <16 x i8>* %A
+  %tmp2 = load <16 x i8>, <16 x i8>* %B
+  %tmp3 = shufflevector <16 x i8> %tmp1, <16 x i8> %tmp2, <32 x i32> <i32 0, i32 16, i32 2, i32 18, i32 4, i32 20, i32 6, i32 22, i32 8, i32 24, i32 10, i32 26, i32 12, i32 28, i32 14, i32 30, i32 1, i32 17, i32 3, i32 19, i32 5, i32 21, i32 7, i32 23, i32 9, i32 25, i32 11, i32 27, i32 13, i32 29, i32 15, i32 31>
+  ret <32 x i8> %tmp3
+}

 define <8 x i16> @vtrnQi16(<8 x i16>* %A, <8 x i16>* %B) nounwind {
 ;CHECK-LABEL: vtrnQi16:
 ;CHECK: vtrn.16
 ;CHECK-NEXT: vadd.i16
   %tmp1 = load <8 x i16>, <8 x i16>* %A
   %tmp2 = load <8 x i16>, <8 x i16>* %B
   %tmp3 = shufflevector <8 x i16> %tmp1, <8 x i16> %tmp2, <8 x i32> <i32 0, i32 8, i32 2, i32 10, i32 4, i32 12, i32 6, i32 14>
   %tmp4 = shufflevector <8 x i16> %tmp1, <8 x i16> %tmp2, <8 x i32> <i32 1, i32 9, i32 3, i32 11, i32 5, i32 13, i32 7, i32 15>
   %tmp5 = add <8 x i16> %tmp3, %tmp4
   ret <8 x i16> %tmp5
 }

+define <16 x i16> @vtrnQi16_QQres(<8 x i16>* %A, <8 x i16>* %B) nounwind {
+;CHECK-LABEL: vtrnQi16_QQres:
+;CHECK: vtrn.16
+  %tmp1 = load <8 x i16>, <8 x i16>* %A
+  %tmp2 = load <8 x i16>, <8 x i16>* %B
+  %tmp3 = shufflevector <8 x i16> %tmp1, <8 x i16> %tmp2, <16 x i32> <i32 0, i32 8, i32 2, i32 10, i32 4, i32 12, i32 6, i32 14, i32 1, i32 9, i32 3, i32 11, i32 5, i32 13, i32 7, i32 15>
+  ret <16 x i16> %tmp3
+}

 define <4 x i32> @vtrnQi32(<4 x i32>* %A, <4 x i32>* %B) nounwind {
 ;CHECK-LABEL: vtrnQi32:
 ;CHECK: vtrn.32
 ;CHECK-NEXT: vadd.i32
   %tmp1 = load <4 x i32>, <4 x i32>* %A
   %tmp2 = load <4 x i32>, <4 x i32>* %B
   %tmp3 = shufflevector <4 x i32> %tmp1, <4 x i32> %tmp2, <4 x i32> <i32 0, i32 4, i32 2, i32 6>
   %tmp4 = shufflevector <4 x i32> %tmp1, <4 x i32> %tmp2, <4 x i32> <i32 1, i32 5, i32 3, i32 7>
   %tmp5 = add <4 x i32> %tmp3, %tmp4
   ret <4 x i32> %tmp5
 }

+define <8 x i32> @vtrnQi32_QQres(<4 x i32>* %A, <4 x i32>* %B) nounwind {
+;CHECK-LABEL: vtrnQi32_QQres:
+;CHECK: vtrn.32
+  %tmp1 = load <4 x i32>, <4 x i32>* %A
+  %tmp2 = load <4 x i32>, <4 x i32>* %B
+  %tmp3 = shufflevector <4 x i32> %tmp1, <4 x i32> %tmp2, <8 x i32> <i32 0, i32 4, i32 2, i32 6, i32 1, i32 5, i32 3, i32 7>
+  ret <8 x i32> %tmp3
+}

 define <4 x float> @vtrnQf(<4 x float>* %A, <4 x float>* %B) nounwind {
 ;CHECK-LABEL: vtrnQf:
 ;CHECK: vtrn.32
 ;CHECK-NEXT: vadd.f32
   %tmp1 = load <4 x float>, <4 x float>* %A
   %tmp2 = load <4 x float>, <4 x float>* %B
   %tmp3 = shufflevector <4 x float> %tmp1, <4 x float> %tmp2, <4 x i32> <i32 0, i32 4, i32 2, i32 6>
   %tmp4 = shufflevector <4 x float> %tmp1, <4 x float> %tmp2, <4 x i32> <i32 1, i32 5, i32 3, i32 7>
   %tmp5 = fadd <4 x float> %tmp3, %tmp4
   ret <4 x float> %tmp5
 }

+define <8 x float> @vtrnQf_QQres(<4 x float>* %A, <4 x float>* %B) nounwind {
+;CHECK-LABEL: vtrnQf_QQres:
+;CHECK: vtrn.32
+  %tmp1 = load <4 x float>, <4 x float>* %A
+  %tmp2 = load <4 x float>, <4 x float>* %B
+  %tmp3 = shufflevector <4 x float> %tmp1, <4 x float> %tmp2, <8 x i32> <i32 0, i32 4, i32 2, i32 6, i32 1, i32 5, i32 3, i32 7>
+  ret <8 x float> %tmp3
+}

 ; Undef shuffle indices should not prevent matching to VTRN:

 define <8 x i8> @vtrni8_undef(<8 x i8>* %A, <8 x i8>* %B) nounwind {
 ;CHECK-LABEL: vtrni8_undef:
 ;CHECK: vtrn.8
 ;CHECK-NEXT: vadd.i8
   %tmp1 = load <8 x i8>, <8 x i8>* %A
   %tmp2 = load <8 x i8>, <8 x i8>* %B
   %tmp3 = shufflevector <8 x i8> %tmp1, <8 x i8> %tmp2, <8 x i32> <i32 0, i32 undef, i32 2, i32 10, i32 undef, i32 12, i32 6, i32 14>
   %tmp4 = shufflevector <8 x i8> %tmp1, <8 x i8> %tmp2, <8 x i32> <i32 1, i32 9, i32 3, i32 11, i32 5, i32 undef, i32 undef, i32 15>
   %tmp5 = add <8 x i8> %tmp3, %tmp4
   ret <8 x i8> %tmp5
 }

+define <16 x i8> @vtrni8_undef_Qres(<8 x i8>* %A, <8 x i8>* %B) nounwind {
+;CHECK-LABEL: vtrni8_undef_Qres:
+;CHECK: vtrn.8
+  %tmp1 = load <8 x i8>, <8 x i8>* %A
+  %tmp2 = load <8 x i8>, <8 x i8>* %B
+  %tmp3 = shufflevector <8 x i8> %tmp1, <8 x i8> %tmp2, <16 x i32> <i32 0, i32 undef, i32 2, i32 10, i32 undef, i32 12, i32 6, i32 14, i32 1, i32 9, i32 3, i32 11, i32 5, i32 undef, i32 undef, i32 15>
+  ret <16 x i8> %tmp3
+}

 define <8 x i16> @vtrnQi16_undef(<8 x i16>* %A, <8 x i16>* %B) nounwind {
 ;CHECK-LABEL: vtrnQi16_undef:
 ;CHECK: vtrn.16
 ;CHECK-NEXT: vadd.i16
   %tmp1 = load <8 x i16>, <8 x i16>* %A
   %tmp2 = load <8 x i16>, <8 x i16>* %B
   %tmp3 = shufflevector <8 x i16> %tmp1, <8 x i16> %tmp2, <8 x i32> <i32 0, i32 8, i32 undef, i32 undef, i32 4, i32 12, i32 6, i32 14>
   %tmp4 = shufflevector <8 x i16> %tmp1, <8 x i16> %tmp2, <8 x i32> <i32 1, i32 undef, i32 3, i32 11, i32 5, i32 13, i32 undef, i32 undef>
   %tmp5 = add <8 x i16> %tmp3, %tmp4
   ret <8 x i16> %tmp5
 }

+define <16 x i16> @vtrnQi16_undef_QQres(<8 x i16>* %A, <8 x i16>* %B) nounwind {
+;CHECK-LABEL: vtrnQi16_undef_QQres:
+;CHECK: vtrn.16
+  %tmp1 = load <8 x i16>, <8 x i16>* %A
+  %tmp2 = load <8 x i16>, <8 x i16>* %B
+  %tmp3 = shufflevector <8 x i16> %tmp1, <8 x i16> %tmp2, <16 x i32> <i32 0, i32 8, i32 undef, i32 undef, i32 4, i32 12, i32 6, i32 14, i32 1, i32 undef, i32 3, i32 11, i32 5, i32 13, i32 undef, i32 undef>
+  ret <16 x i16> %tmp3
+}

test/CodeGen/ARM/vuzp.ll

 ; RUN: llc -mtriple=arm-eabi -mattr=+neon %s -o - | FileCheck %s

 define <8 x i8> @vuzpi8(<8 x i8>* %A, <8 x i8>* %B) nounwind {
 ;CHECK-LABEL: vuzpi8:
 ;CHECK: vuzp.8
 ;CHECK-NEXT: vadd.i8
   %tmp1 = load <8 x i8>, <8 x i8>* %A
   %tmp2 = load <8 x i8>, <8 x i8>* %B
   %tmp3 = shufflevector <8 x i8> %tmp1, <8 x i8> %tmp2, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14>
   %tmp4 = shufflevector <8 x i8> %tmp1, <8 x i8> %tmp2, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>
   %tmp5 = add <8 x i8> %tmp3, %tmp4
   ret <8 x i8> %tmp5
 }

+define <16 x i8> @vuzpi8_Qres(<8 x i8>* %A, <8 x i8>* %B) nounwind {
+;CHECK-LABEL: vuzpi8_Qres:
+;CHECK: vuzp.8
+  %tmp1 = load <8 x i8>, <8 x i8>* %A
+  %tmp2 = load <8 x i8>, <8 x i8>* %B
+  %tmp3 = shufflevector <8 x i8> %tmp1, <8 x i8> %tmp2, <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14, i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>
+  ret <16 x i8> %tmp3
+}

 define <4 x i16> @vuzpi16(<4 x i16>* %A, <4 x i16>* %B) nounwind {
 ;CHECK-LABEL: vuzpi16:
 ;CHECK: vuzp.16
 ;CHECK-NEXT: vadd.i16
   %tmp1 = load <4 x i16>, <4 x i16>* %A
   %tmp2 = load <4 x i16>, <4 x i16>* %B
   %tmp3 = shufflevector <4 x i16> %tmp1, <4 x i16> %tmp2, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
   %tmp4 = shufflevector <4 x i16> %tmp1, <4 x i16> %tmp2, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
   %tmp5 = add <4 x i16> %tmp3, %tmp4
   ret <4 x i16> %tmp5
 }

+define <8 x i16> @vuzpi16_Qres(<4 x i16>* %A, <4 x i16>* %B) nounwind {
+;CHECK-LABEL: vuzpi16_Qres:
+;CHECK: vuzp.16
+  %tmp1 = load <4 x i16>, <4 x i16>* %A
+  %tmp2 = load <4 x i16>, <4 x i16>* %B
+  %tmp3 = shufflevector <4 x i16> %tmp1, <4 x i16> %tmp2, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 1, i32 3, i32 5, i32 7>
+  ret <8 x i16> %tmp3
+}

 ; VUZP.32 is equivalent to VTRN.32 for 64-bit vectors.

 define <16 x i8> @vuzpQi8(<16 x i8>* %A, <16 x i8>* %B) nounwind {
 ;CHECK-LABEL: vuzpQi8:
 ;CHECK: vuzp.8
 ;CHECK-NEXT: vadd.i8
   %tmp1 = load <16 x i8>, <16 x i8>* %A
   %tmp2 = load <16 x i8>, <16 x i8>* %B
   %tmp3 = shufflevector <16 x i8> %tmp1, <16 x i8> %tmp2, <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14, i32 16, i32 18, i32 20, i32 22, i32 24, i32 26, i32 28, i32 30>
   %tmp4 = shufflevector <16 x i8> %tmp1, <16 x i8> %tmp2, <16 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15, i32 17, i32 19, i32 21, i32 23, i32 25, i32 27, i32 29, i32 31>
   %tmp5 = add <16 x i8> %tmp3, %tmp4
   ret <16 x i8> %tmp5
 }

+define <32 x i8> @vuzpQi8_QQres(<16 x i8>* %A, <16 x i8>* %B) nounwind {
+;CHECK-LABEL: vuzpQi8_QQres:
+;CHECK: vuzp.8
+  %tmp1 = load <16 x i8>, <16 x i8>* %A
+  %tmp2 = load <16 x i8>, <16 x i8>* %B
+  %tmp3 = shufflevector <16 x i8> %tmp1, <16 x i8> %tmp2, <32 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14, i32 16, i32 18, i32 20, i32 22, i32 24, i32 26, i32 28, i32 30, i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15, i32 17, i32 19, i32 21, i32 23, i32 25, i32 27, i32 29, i32 31>
+  ret <32 x i8> %tmp3
+}

 define <8 x i16> @vuzpQi16(<8 x i16>* %A, <8 x i16>* %B) nounwind {
 ;CHECK-LABEL: vuzpQi16:
 ;CHECK: vuzp.16
 ;CHECK-NEXT: vadd.i16
   %tmp1 = load <8 x i16>, <8 x i16>* %A
   %tmp2 = load <8 x i16>, <8 x i16>* %B
   %tmp3 = shufflevector <8 x i16> %tmp1, <8 x i16> %tmp2, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14>
   %tmp4 = shufflevector <8 x i16> %tmp1, <8 x i16> %tmp2, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>
   %tmp5 = add <8 x i16> %tmp3, %tmp4
   ret <8 x i16> %tmp5
 }

+define <16 x i16> @vuzpQi16_QQres(<8 x i16>* %A, <8 x i16>* %B) nounwind {
+;CHECK-LABEL: vuzpQi16_QQres:
+;CHECK: vuzp.16
+  %tmp1 = load <8 x i16>, <8 x i16>* %A
+  %tmp2 = load <8 x i16>, <8 x i16>* %B
+  %tmp3 = shufflevector <8 x i16> %tmp1, <8 x i16> %tmp2, <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14, i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>
+  ret <16 x i16> %tmp3
+}

 define <4 x i32> @vuzpQi32(<4 x i32>* %A, <4 x i32>* %B) nounwind {
 ;CHECK-LABEL: vuzpQi32:
 ;CHECK: vuzp.32
 ;CHECK-NEXT: vadd.i32
   %tmp1 = load <4 x i32>, <4 x i32>* %A
   %tmp2 = load <4 x i32>, <4 x i32>* %B
   %tmp3 = shufflevector <4 x i32> %tmp1, <4 x i32> %tmp2, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
   %tmp4 = shufflevector <4 x i32> %tmp1, <4 x i32> %tmp2, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
   %tmp5 = add <4 x i32> %tmp3, %tmp4
   ret <4 x i32> %tmp5
 }

+define <8 x i32> @vuzpQi32_QQres(<4 x i32>* %A, <4 x i32>* %B) nounwind {
+;CHECK-LABEL: vuzpQi32_QQres:
+;CHECK: vuzp.32
+  %tmp1 = load <4 x i32>, <4 x i32>* %A
+  %tmp2 = load <4 x i32>, <4 x i32>* %B
+  %tmp3 = shufflevector <4 x i32> %tmp1, <4 x i32> %tmp2, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 1, i32 3, i32 5, i32 7>
+  ret <8 x i32> %tmp3
+}

 define <4 x float> @vuzpQf(<4 x float>* %A, <4 x float>* %B) nounwind {
 ;CHECK-LABEL: vuzpQf:
 ;CHECK: vuzp.32
 ;CHECK-NEXT: vadd.f32
   %tmp1 = load <4 x float>, <4 x float>* %A
   %tmp2 = load <4 x float>, <4 x float>* %B
   %tmp3 = shufflevector <4 x float> %tmp1, <4 x float> %tmp2, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
   %tmp4 = shufflevector <4 x float> %tmp1, <4 x float> %tmp2, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
   %tmp5 = fadd <4 x float> %tmp3, %tmp4
   ret <4 x float> %tmp5
 }

+define <8 x float> @vuzpQf_QQres(<4 x float>* %A, <4 x float>* %B) nounwind {
+;CHECK-LABEL: vuzpQf_QQres:
+;CHECK: vuzp.32
+  %tmp1 = load <4 x float>, <4 x float>* %A
+  %tmp2 = load <4 x float>, <4 x float>* %B
+  %tmp3 = shufflevector <4 x float> %tmp1, <4 x float> %tmp2, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 1, i32 3, i32 5, i32 7>
+  ret <8 x float> %tmp3
+}

 ; Undef shuffle indices should not prevent matching to VUZP:

 define <8 x i8> @vuzpi8_undef(<8 x i8>* %A, <8 x i8>* %B) nounwind {
 ;CHECK-LABEL: vuzpi8_undef:
 ;CHECK: vuzp.8
 ;CHECK-NEXT: vadd.i8
   %tmp1 = load <8 x i8>, <8 x i8>* %A
   %tmp2 = load <8 x i8>, <8 x i8>* %B
   %tmp3 = shufflevector <8 x i8> %tmp1, <8 x i8> %tmp2, <8 x i32> <i32 0, i32 2, i32 undef, i32 undef, i32 8, i32 10, i32 12, i32 14>
   %tmp4 = shufflevector <8 x i8> %tmp1, <8 x i8> %tmp2, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 undef, i32 undef, i32 13, i32 15>
   %tmp5 = add <8 x i8> %tmp3, %tmp4
   ret <8 x i8> %tmp5
 }

+define <16 x i8> @vuzpi8_undef_Qres(<8 x i8>* %A, <8 x i8>* %B) nounwind {
+;CHECK-LABEL: vuzpi8_undef_Qres:
+;CHECK: vuzp.8
+  %tmp1 = load <8 x i8>, <8 x i8>* %A
+  %tmp2 = load <8 x i8>, <8 x i8>* %B
+  %tmp3 = shufflevector <8 x i8> %tmp1, <8 x i8> %tmp2, <16 x i32> <i32 0, i32 2, i32 undef, i32 undef, i32 8, i32 10, i32 12, i32 14, i32 1, i32 3, i32 5, i32 7, i32 undef, i32 undef, i32 13, i32 15>
+  ret <16 x i8> %tmp3
+}

 define <8 x i16> @vuzpQi16_undef(<8 x i16>* %A, <8 x i16>* %B) nounwind {
 ;CHECK-LABEL: vuzpQi16_undef:
 ;CHECK: vuzp.16
 ;CHECK-NEXT: vadd.i16
   %tmp1 = load <8 x i16>, <8 x i16>* %A
   %tmp2 = load <8 x i16>, <8 x i16>* %B
   %tmp3 = shufflevector <8 x i16> %tmp1, <8 x i16> %tmp2, <8 x i32> <i32 0, i32 undef, i32 4, i32 undef, i32 8, i32 10, i32 12, i32 14>
   %tmp4 = shufflevector <8 x i16> %tmp1, <8 x i16> %tmp2, <8 x i32> <i32 1, i32 3, i32 5, i32 undef, i32 undef, i32 11, i32 13, i32 15>
   %tmp5 = add <8 x i16> %tmp3, %tmp4
   ret <8 x i16> %tmp5
 }

+define <16 x i16> @vuzpQi16_undef_QQres(<8 x i16>* %A, <8 x i16>* %B) nounwind {
+;CHECK-LABEL: vuzpQi16_undef_QQres:
+;CHECK: vuzp.16
+  %tmp1 = load <8 x i16>, <8 x i16>* %A
+  %tmp2 = load <8 x i16>, <8 x i16>* %B
+  %tmp3 = shufflevector <8 x i16> %tmp1, <8 x i16> %tmp2, <16 x i32> <i32 0, i32 undef, i32 4, i32 undef, i32 8, i32 10, i32 12, i32 14, i32 1, i32 3, i32 5, i32 undef, i32 undef, i32 11, i32 13, i32 15>
+  ret <16 x i16> %tmp3
+}

test/CodeGen/ARM/vzip.ll

	; RUN: llc -mtriple=arm-eabi -mattr=+neon %s -o - | FileCheck %s			; RUN: llc -mtriple=arm-eabi -mattr=+neon %s -o - | FileCheck %s

	define <8 x i8> @vzipi8(<8 x i8>* %A, <8 x i8>* %B) nounwind {			define <8 x i8> @vzipi8(<8 x i8>* %A, <8 x i8>* %B) nounwind {
	;CHECK-LABEL: vzipi8:			;CHECK-LABEL: vzipi8:
	;CHECK: vzip.8			;CHECK: vzip.8
	;CHECK-NEXT: vadd.i8			;CHECK-NEXT: vadd.i8
	%tmp1 = load <8 x i8>, <8 x i8>* %A			%tmp1 = load <8 x i8>, <8 x i8>* %A
	%tmp2 = load <8 x i8>, <8 x i8>* %B			%tmp2 = load <8 x i8>, <8 x i8>* %B
	%tmp3 = shufflevector <8 x i8> %tmp1, <8 x i8> %tmp2, <8 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11>			%tmp3 = shufflevector <8 x i8> %tmp1, <8 x i8> %tmp2, <8 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11>
	%tmp4 = shufflevector <8 x i8> %tmp1, <8 x i8> %tmp2, <8 x i32> <i32 4, i32 12, i32 5, i32 13, i32 6, i32 14, i32 7, i32 15>			%tmp4 = shufflevector <8 x i8> %tmp1, <8 x i8> %tmp2, <8 x i32> <i32 4, i32 12, i32 5, i32 13, i32 6, i32 14, i32 7, i32 15>
	%tmp5 = add <8 x i8> %tmp3, %tmp4			%tmp5 = add <8 x i8> %tmp3, %tmp4
	ret <8 x i8> %tmp5			ret <8 x i8> %tmp5
	}			}

				define <16 x i8> @vzipi8_Qres(<8 x i8>* %A, <8 x i8>* %B) nounwind {
				;CHECK-LABEL: vzipi8_Qres:
				;CHECK: vzip.8
				%tmp1 = load <8 x i8>, <8 x i8>* %A
				%tmp2 = load <8 x i8>, <8 x i8>* %B
				%tmp3 = shufflevector <8 x i8> %tmp1, <8 x i8> %tmp2, <16 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11, i32 4, i32 12, i32 5, i32 13, i32 6, i32 14, i32 7, i32 15>
				ret <16 x i8> %tmp3
				}

	define <4 x i16> @vzipi16(<4 x i16>* %A, <4 x i16>* %B) nounwind {			define <4 x i16> @vzipi16(<4 x i16>* %A, <4 x i16>* %B) nounwind {
	;CHECK-LABEL: vzipi16:			;CHECK-LABEL: vzipi16:
	;CHECK: vzip.16			;CHECK: vzip.16
	;CHECK-NEXT: vadd.i16			;CHECK-NEXT: vadd.i16
	%tmp1 = load <4 x i16>, <4 x i16>* %A			%tmp1 = load <4 x i16>, <4 x i16>* %A
	%tmp2 = load <4 x i16>, <4 x i16>* %B			%tmp2 = load <4 x i16>, <4 x i16>* %B
	%tmp3 = shufflevector <4 x i16> %tmp1, <4 x i16> %tmp2, <4 x i32> <i32 0, i32 4, i32 1, i32 5>			%tmp3 = shufflevector <4 x i16> %tmp1, <4 x i16> %tmp2, <4 x i32> <i32 0, i32 4, i32 1, i32 5>
	%tmp4 = shufflevector <4 x i16> %tmp1, <4 x i16> %tmp2, <4 x i32> <i32 2, i32 6, i32 3, i32 7>			%tmp4 = shufflevector <4 x i16> %tmp1, <4 x i16> %tmp2, <4 x i32> <i32 2, i32 6, i32 3, i32 7>
	%tmp5 = add <4 x i16> %tmp3, %tmp4			%tmp5 = add <4 x i16> %tmp3, %tmp4
	ret <4 x i16> %tmp5			ret <4 x i16> %tmp5
	}			}

				define <8 x i16> @vzipi16_Qres(<4 x i16>* %A, <4 x i16>* %B) nounwind {
				;CHECK-LABEL: vzipi16_Qres:
				;CHECK: vzip.16
				%tmp1 = load <4 x i16>, <4 x i16>* %A
				%tmp2 = load <4 x i16>, <4 x i16>* %B
				%tmp3 = shufflevector <4 x i16> %tmp1, <4 x i16> %tmp2, <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7>
				ret <8 x i16> %tmp3
				}

	; VZIP.32 is equivalent to VTRN.32 for 64-bit vectors.			; VZIP.32 is equivalent to VTRN.32 for 64-bit vectors.

	define <16 x i8> @vzipQi8(<16 x i8>* %A, <16 x i8>* %B) nounwind {			define <16 x i8> @vzipQi8(<16 x i8>* %A, <16 x i8>* %B) nounwind {
	;CHECK-LABEL: vzipQi8:			;CHECK-LABEL: vzipQi8:
	;CHECK: vzip.8			;CHECK: vzip.8
	;CHECK-NEXT: vadd.i8			;CHECK-NEXT: vadd.i8
	%tmp1 = load <16 x i8>, <16 x i8>* %A			%tmp1 = load <16 x i8>, <16 x i8>* %A
	%tmp2 = load <16 x i8>, <16 x i8>* %B			%tmp2 = load <16 x i8>, <16 x i8>* %B
	%tmp3 = shufflevector <16 x i8> %tmp1, <16 x i8> %tmp2, <16 x i32> <i32 0, i32 16, i32 1, i32 17, i32 2, i32 18, i32 3, i32 19, i32 4, i32 20, i32 5, i32 21, i32 6, i32 22, i32 7, i32 23>			%tmp3 = shufflevector <16 x i8> %tmp1, <16 x i8> %tmp2, <16 x i32> <i32 0, i32 16, i32 1, i32 17, i32 2, i32 18, i32 3, i32 19, i32 4, i32 20, i32 5, i32 21, i32 6, i32 22, i32 7, i32 23>
	%tmp4 = shufflevector <16 x i8> %tmp1, <16 x i8> %tmp2, <16 x i32> <i32 8, i32 24, i32 9, i32 25, i32 10, i32 26, i32 11, i32 27, i32 12, i32 28, i32 13, i32 29, i32 14, i32 30, i32 15, i32 31>			%tmp4 = shufflevector <16 x i8> %tmp1, <16 x i8> %tmp2, <16 x i32> <i32 8, i32 24, i32 9, i32 25, i32 10, i32 26, i32 11, i32 27, i32 12, i32 28, i32 13, i32 29, i32 14, i32 30, i32 15, i32 31>
	%tmp5 = add <16 x i8> %tmp3, %tmp4			%tmp5 = add <16 x i8> %tmp3, %tmp4
	ret <16 x i8> %tmp5			ret <16 x i8> %tmp5
	}			}

				define <32 x i8> @vzipQi8_QQres(<16 x i8>* %A, <16 x i8>* %B) nounwind {
				;CHECK-LABEL: vzipQi8_QQres:
				;CHECK: vzip.8
				%tmp1 = load <16 x i8>, <16 x i8>* %A
				%tmp2 = load <16 x i8>, <16 x i8>* %B
				%tmp3 = shufflevector <16 x i8> %tmp1, <16 x i8> %tmp2, <32 x i32> <i32 0, i32 16, i32 1, i32 17, i32 2, i32 18, i32 3, i32 19, i32 4, i32 20, i32 5, i32 21, i32 6, i32 22, i32 7, i32 23, i32 8, i32 24, i32 9, i32 25, i32 10, i32 26, i32 11, i32 27, i32 12, i32 28, i32 13, i32 29, i32 14, i32 30, i32 15, i32 31>
				ret <32 x i8> %tmp3
				}

	define <8 x i16> @vzipQi16(<8 x i16>* %A, <8 x i16>* %B) nounwind {			define <8 x i16> @vzipQi16(<8 x i16>* %A, <8 x i16>* %B) nounwind {
	;CHECK-LABEL: vzipQi16:			;CHECK-LABEL: vzipQi16:
	;CHECK: vzip.16			;CHECK: vzip.16
	;CHECK-NEXT: vadd.i16			;CHECK-NEXT: vadd.i16
	%tmp1 = load <8 x i16>, <8 x i16>* %A			%tmp1 = load <8 x i16>, <8 x i16>* %A
	%tmp2 = load <8 x i16>, <8 x i16>* %B			%tmp2 = load <8 x i16>, <8 x i16>* %B
	%tmp3 = shufflevector <8 x i16> %tmp1, <8 x i16> %tmp2, <8 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11>			%tmp3 = shufflevector <8 x i16> %tmp1, <8 x i16> %tmp2, <8 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11>
	%tmp4 = shufflevector <8 x i16> %tmp1, <8 x i16> %tmp2, <8 x i32> <i32 4, i32 12, i32 5, i32 13, i32 6, i32 14, i32 7, i32 15>			%tmp4 = shufflevector <8 x i16> %tmp1, <8 x i16> %tmp2, <8 x i32> <i32 4, i32 12, i32 5, i32 13, i32 6, i32 14, i32 7, i32 15>
	%tmp5 = add <8 x i16> %tmp3, %tmp4			%tmp5 = add <8 x i16> %tmp3, %tmp4
	ret <8 x i16> %tmp5			ret <8 x i16> %tmp5
	}			}

				define <16 x i16> @vzipQi16_QQres(<8 x i16>* %A, <8 x i16>* %B) nounwind {
				;CHECK-LABEL: vzipQi16_QQres:
				;CHECK: vzip.16
				%tmp1 = load <8 x i16>, <8 x i16>* %A
				%tmp2 = load <8 x i16>, <8 x i16>* %B
				%tmp3 = shufflevector <8 x i16> %tmp1, <8 x i16> %tmp2, <16 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11, i32 4, i32 12, i32 5, i32 13, i32 6, i32 14, i32 7, i32 15>
				ret <16 x i16> %tmp3
				}

	define <4 x i32> @vzipQi32(<4 x i32>* %A, <4 x i32>* %B) nounwind {			define <4 x i32> @vzipQi32(<4 x i32>* %A, <4 x i32>* %B) nounwind {
	;CHECK-LABEL: vzipQi32:			;CHECK-LABEL: vzipQi32:
	;CHECK: vzip.32			;CHECK: vzip.32
	;CHECK-NEXT: vadd.i32			;CHECK-NEXT: vadd.i32
	%tmp1 = load <4 x i32>, <4 x i32>* %A			%tmp1 = load <4 x i32>, <4 x i32>* %A
	%tmp2 = load <4 x i32>, <4 x i32>* %B			%tmp2 = load <4 x i32>, <4 x i32>* %B
	%tmp3 = shufflevector <4 x i32> %tmp1, <4 x i32> %tmp2, <4 x i32> <i32 0, i32 4, i32 1, i32 5>			%tmp3 = shufflevector <4 x i32> %tmp1, <4 x i32> %tmp2, <4 x i32> <i32 0, i32 4, i32 1, i32 5>
	%tmp4 = shufflevector <4 x i32> %tmp1, <4 x i32> %tmp2, <4 x i32> <i32 2, i32 6, i32 3, i32 7>			%tmp4 = shufflevector <4 x i32> %tmp1, <4 x i32> %tmp2, <4 x i32> <i32 2, i32 6, i32 3, i32 7>
	%tmp5 = add <4 x i32> %tmp3, %tmp4			%tmp5 = add <4 x i32> %tmp3, %tmp4
	ret <4 x i32> %tmp5			ret <4 x i32> %tmp5
	}			}

				define <8 x i32> @vzipQi32_QQres(<4 x i32>* %A, <4 x i32>* %B) nounwind {
				;CHECK-LABEL: vzipQi32_QQres:
				;CHECK: vzip.32
				%tmp1 = load <4 x i32>, <4 x i32>* %A
				%tmp2 = load <4 x i32>, <4 x i32>* %B
				%tmp3 = shufflevector <4 x i32> %tmp1, <4 x i32> %tmp2, <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7>
				ret <8 x i32> %tmp3
				}

	define <4 x float> @vzipQf(<4 x float>* %A, <4 x float>* %B) nounwind {			define <4 x float> @vzipQf(<4 x float>* %A, <4 x float>* %B) nounwind {
	;CHECK-LABEL: vzipQf:			;CHECK-LABEL: vzipQf:
	;CHECK: vzip.32			;CHECK: vzip.32
	;CHECK-NEXT: vadd.f32			;CHECK-NEXT: vadd.f32
	%tmp1 = load <4 x float>, <4 x float>* %A			%tmp1 = load <4 x float>, <4 x float>* %A
	%tmp2 = load <4 x float>, <4 x float>* %B			%tmp2 = load <4 x float>, <4 x float>* %B
	%tmp3 = shufflevector <4 x float> %tmp1, <4 x float> %tmp2, <4 x i32> <i32 0, i32 4, i32 1, i32 5>			%tmp3 = shufflevector <4 x float> %tmp1, <4 x float> %tmp2, <4 x i32> <i32 0, i32 4, i32 1, i32 5>
	%tmp4 = shufflevector <4 x float> %tmp1, <4 x float> %tmp2, <4 x i32> <i32 2, i32 6, i32 3, i32 7>			%tmp4 = shufflevector <4 x float> %tmp1, <4 x float> %tmp2, <4 x i32> <i32 2, i32 6, i32 3, i32 7>
	%tmp5 = fadd <4 x float> %tmp3, %tmp4			%tmp5 = fadd <4 x float> %tmp3, %tmp4
	ret <4 x float> %tmp5			ret <4 x float> %tmp5
	}			}

				define <8 x float> @vzipQf_QQres(<4 x float>* %A, <4 x float>* %B) nounwind {
				;CHECK-LABEL: vzipQf_QQres:
				;CHECK: vzip.32
				%tmp1 = load <4 x float>, <4 x float>* %A
				%tmp2 = load <4 x float>, <4 x float>* %B
				%tmp3 = shufflevector <4 x float> %tmp1, <4 x float> %tmp2, <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7>
				ret <8 x float> %tmp3
				}

	; Undef shuffle indices should not prevent matching to VZIP:			; Undef shuffle indices should not prevent matching to VZIP:

	define <8 x i8> @vzipi8_undef(<8 x i8>* %A, <8 x i8>* %B) nounwind {			define <8 x i8> @vzipi8_undef(<8 x i8>* %A, <8 x i8>* %B) nounwind {
	;CHECK-LABEL: vzipi8_undef:			;CHECK-LABEL: vzipi8_undef:
	;CHECK: vzip.8			;CHECK: vzip.8
	;CHECK-NEXT: vadd.i8			;CHECK-NEXT: vadd.i8
	%tmp1 = load <8 x i8>, <8 x i8>* %A			%tmp1 = load <8 x i8>, <8 x i8>* %A
	%tmp2 = load <8 x i8>, <8 x i8>* %B			%tmp2 = load <8 x i8>, <8 x i8>* %B
	%tmp3 = shufflevector <8 x i8> %tmp1, <8 x i8> %tmp2, <8 x i32> <i32 0, i32 undef, i32 1, i32 9, i32 undef, i32 10, i32 3, i32 11>			%tmp3 = shufflevector <8 x i8> %tmp1, <8 x i8> %tmp2, <8 x i32> <i32 0, i32 undef, i32 1, i32 9, i32 undef, i32 10, i32 3, i32 11>
	%tmp4 = shufflevector <8 x i8> %tmp1, <8 x i8> %tmp2, <8 x i32> <i32 4, i32 12, i32 5, i32 13, i32 6, i32 undef, i32 undef, i32 15>			%tmp4 = shufflevector <8 x i8> %tmp1, <8 x i8> %tmp2, <8 x i32> <i32 4, i32 12, i32 5, i32 13, i32 6, i32 undef, i32 undef, i32 15>
	%tmp5 = add <8 x i8> %tmp3, %tmp4			%tmp5 = add <8 x i8> %tmp3, %tmp4
	ret <8 x i8> %tmp5			ret <8 x i8> %tmp5
	}			}

				define <16 x i8> @vzipi8_undef_Qres(<8 x i8>* %A, <8 x i8>* %B) nounwind {
				;CHECK-LABEL: vzipi8_undef_Qres:
				;CHECK: vzip.8
				%tmp1 = load <8 x i8>, <8 x i8>* %A
				%tmp2 = load <8 x i8>, <8 x i8>* %B
				%tmp3 = shufflevector <8 x i8> %tmp1, <8 x i8> %tmp2, <16 x i32> <i32 0, i32 undef, i32 1, i32 9, i32 undef, i32 10, i32 3, i32 11, i32 4, i32 12, i32 5, i32 13, i32 6, i32 undef, i32 undef, i32 15>
				ret <16 x i8> %tmp3
				}

	define <16 x i8> @vzipQi8_undef(<16 x i8>* %A, <16 x i8>* %B) nounwind {			define <16 x i8> @vzipQi8_undef(<16 x i8>* %A, <16 x i8>* %B) nounwind {
	;CHECK-LABEL: vzipQi8_undef:			;CHECK-LABEL: vzipQi8_undef:
	;CHECK: vzip.8			;CHECK: vzip.8
	;CHECK-NEXT: vadd.i8			;CHECK-NEXT: vadd.i8
	%tmp1 = load <16 x i8>, <16 x i8>* %A			%tmp1 = load <16 x i8>, <16 x i8>* %A
	%tmp2 = load <16 x i8>, <16 x i8>* %B			%tmp2 = load <16 x i8>, <16 x i8>* %B
	%tmp3 = shufflevector <16 x i8> %tmp1, <16 x i8> %tmp2, <16 x i32> <i32 0, i32 16, i32 1, i32 undef, i32 undef, i32 undef, i32 3, i32 19, i32 4, i32 20, i32 5, i32 21, i32 6, i32 22, i32 7, i32 23>			%tmp3 = shufflevector <16 x i8> %tmp1, <16 x i8> %tmp2, <16 x i32> <i32 0, i32 16, i32 1, i32 undef, i32 undef, i32 undef, i32 3, i32 19, i32 4, i32 20, i32 5, i32 21, i32 6, i32 22, i32 7, i32 23>
	%tmp4 = shufflevector <16 x i8> %tmp1, <16 x i8> %tmp2, <16 x i32> <i32 8, i32 24, i32 9, i32 undef, i32 10, i32 26, i32 11, i32 27, i32 12, i32 28, i32 13, i32 undef, i32 14, i32 30, i32 undef, i32 31>			%tmp4 = shufflevector <16 x i8> %tmp1, <16 x i8> %tmp2, <16 x i32> <i32 8, i32 24, i32 9, i32 undef, i32 10, i32 26, i32 11, i32 27, i32 12, i32 28, i32 13, i32 undef, i32 14, i32 30, i32 undef, i32 31>
	%tmp5 = add <16 x i8> %tmp3, %tmp4			%tmp5 = add <16 x i8> %tmp3, %tmp4
	ret <16 x i8> %tmp5			ret <16 x i8> %tmp5
	}			}

				define <32 x i8> @vzipQi8_undef_QQres(<16 x i8>* %A, <16 x i8>* %B) nounwind {
				;CHECK-LABEL: vzipQi8_undef_QQres:
				;CHECK: vzip.8
				%tmp1 = load <16 x i8>, <16 x i8>* %A
				%tmp2 = load <16 x i8>, <16 x i8>* %B
				%tmp3 = shufflevector <16 x i8> %tmp1, <16 x i8> %tmp2, <32 x i32> <i32 0, i32 16, i32 1, i32 undef, i32 undef, i32 undef, i32 3, i32 19, i32 4, i32 20, i32 5, i32 21, i32 6, i32 22, i32 7, i32 23, i32 8, i32 24, i32 9, i32 undef, i32 10, i32 26, i32 11, i32 27, i32 12, i32 28, i32 13, i32 undef, i32 14, i32 30, i32 undef, i32 31>
				ret <32 x i8> %tmp3
				}