This is an archive of the discontinued LLVM Phabricator instance.


Differential D43973

[AArch64] define isExtractSubvectorCheap
Closed, Public

Authored by sebpop on Mar 1 2018, 2:29 PM.


Details

Reviewers

efriedma
SjoerdMeijer
evandro
javed.absar
kristof.beyls

Commits

rG41073e8046d3: [AArch64] define isExtractSubvectorCheap
rL326811: [AArch64] define isExtractSubvectorCheap

Summary

Following the ARM NEON backend, define isExtractSubvectorCheap to return true
when extracting the low or high part of a NEON register.
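For illustration, the decision the new hook makes can be modelled in a few lines of standalone C++. This is a sketch, not LLVM code: the hypothetical `VecTypeModel` struct stands in for `EVT` (only its element count is modelled), the hypothetical `ExtractIsLegalOrCustom` flag stands in for the `isOperationLegalOrCustom(ISD::EXTRACT_SUBVECTOR, ResVT)` query, and `SrcVT` is left out because the predicate in the patch does not consult it.

```cpp
// Hypothetical stand-in for LLVM's EVT: only the element count is modelled.
struct VecTypeModel {
  unsigned NumElts;
};

// Same shape as the predicate in the patch: refuse when EXTRACT_SUBVECTOR is
// not legal or custom for the result type; otherwise the extract is cheap only
// when it starts at lane 0 or at lane ResVT.NumElts. When the result is half
// as wide as the source, those are the low and high halves of the register.
constexpr bool isExtractSubvectorCheapModel(VecTypeModel ResVT,
                                            bool ExtractIsLegalOrCustom,
                                            unsigned Index) {
  return ExtractIsLegalOrCustom && (Index == 0 || Index == ResVT.NumElts);
}

// <8 x i16> -> <4 x i16>: the low half (lane 0) and high half (lane 4) are cheap.
static_assert(isExtractSubvectorCheapModel(VecTypeModel{4}, true, 0), "low half");
static_assert(isExtractSubvectorCheapModel(VecTypeModel{4}, true, 4), "high half");
// An unaligned start lane is not treated as cheap.
static_assert(!isExtractSubvectorCheapModel(VecTypeModel{4}, true, 2), "offset 2");
```

With a `<4 x i16>` result taken from an `<8 x i16>` source, only start lanes 0 and 4 qualify; any other offset still needs a real lane permutation.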

The patch disables a test in llvm/test/CodeGen/AArch64/arm64-ext.ll that was
checking that ReconstructShuffle in ISelLowering works as expected. The
pattern that exercises ReconstructShuffle is a BUILD_VECTOR, and with this patch
the DAGCombiner transforms it earlier into an extract_subvector +
vector_shuffle. As there is no way to disable the combiner and exercise only the
code in ISelLowering, the patch disables the testcase.
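The per-source question that `ReconstructShuffle` has to answer for a case like `test_interleaved` (mask `<3, 8, 5, 9>` over two `<8 x i16>` loads) can be sketched as below. This is a simplified model under stated assumptions, not the LLVM routine: each source is assumed to have twice as many lanes as the result, and `classifySource`, `SourceAction` and `SourceUse` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// How one source vector can feed a result with half as many lanes.
enum class SourceAction { Unused, ExtractLow, ExtractHigh, Ext, TooWide };

struct SourceUse {
  SourceAction Action;
  unsigned ExtOffset; // first source lane of the EXT window (Ext only)
};

// Hypothetical helper: UsedLanes are the source lanes the result reads,
// SrcElts is the source lane count (assumed to be twice the result's).
SourceUse classifySource(const std::vector<unsigned> &UsedLanes,
                         unsigned SrcElts) {
  if (UsedLanes.empty())
    return {SourceAction::Unused, 0};
  const unsigned ResElts = SrcElts / 2;
  const auto MinMax = std::minmax_element(UsedLanes.begin(), UsedLanes.end());
  const unsigned MinElt = *MinMax.first;
  const unsigned MaxElt = *MinMax.second;
  if (MaxElt - MinElt >= ResElts)
    return {SourceAction::TooWide, 0};     // no half-width window covers them
  if (MinElt >= ResElts)
    return {SourceAction::ExtractHigh, 0}; // all lanes in the high half
  if (MaxElt < ResElts)
    return {SourceAction::ExtractLow, 0};  // all lanes in the low half
  return {SourceAction::Ext, MinElt};      // straddles: EXT window at MinElt
}

// test_interleaved: mask <3, 8, 5, 9> over two <8 x i16> sources.
// %A supplies lanes 3 and 5; %B supplies its lanes 0 and 1 (mask values 8, 9).
void checkInterleavedExample() {
  const SourceUse A = classifySource({3, 5}, 8);
  assert(A.Action == SourceAction::Ext && A.ExtOffset == 3);
  assert(classifySource({0, 1}, 8).Action == SourceAction::ExtractLow);
}
```

For `test_interleaved`, `%A` contributes lanes 3 and 5, which straddle the two halves and would need an EXT window starting at lane 3, while `%B` only needs its low half; this is consistent with the `ext.8b` CHECK line of that test.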

Diff Detail

Repository: rL LLVM

Event Timeline

sebpop created this revision. Mar 1 2018, 2:29 PM

Herald added subscribers: hiraditya, kristof.beyls, javed.absar, rengolin. Mar 1 2018, 2:29 PM

sebpop added reviewers: evandro, javed.absar, kristof.beyls. Mar 5 2018, 11:11 AM

LGTM, but I'd wait a while to give others a chance to chime in.

llvm/test/CodeGen/AArch64/arm64-ext.ll
100 ↗	(On Diff #136604)	Methinks that this test could stay. Before, it resulted in a mix of `EXT`, `UZP1` and `ZIP1`. With this patch, it results in an `LDR`/`TBL` pair, which seems to be a good result. Better yet, if the shuffle indices were changed to `<i32 4, i32 8, i32 5, i32 9>`, the result would be just like in the test below.
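TBL works at byte granularity: each result byte is looked up by index in a table of one to four consecutive vector registers, and out-of-range indices produce zero. A lane-level mask therefore has to be expanded into byte indices before it can drive a TBL. The sketch below does that expansion for the original `<3, 8, 5, 9>` mask on `i16` lanes, assuming both `<8 x i16>` operands sit back to back in a two-register (32-byte) table with little-endian lane layout; `expandToByteIndices` is a hypothetical helper, not the routine LLVM uses to build the TBL index vector.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: expand a lane-level shuffle mask into TBL byte indices.
// Lane L of an EltBytes-wide element occupies bytes L*EltBytes to
// L*EltBytes + EltBytes - 1 of the concatenated table (little-endian lanes).
std::vector<std::uint8_t> expandToByteIndices(const std::vector<unsigned> &Mask,
                                              unsigned EltBytes) {
  std::vector<std::uint8_t> Bytes;
  for (unsigned Lane : Mask)
    for (unsigned B = 0; B < EltBytes; ++B)
      Bytes.push_back(static_cast<std::uint8_t>(Lane * EltBytes + B));
  return Bytes;
}

void checkTblExample() {
  // <3, 8, 5, 9> on i16 lanes; lanes 8 to 15 come from the second table register.
  const std::vector<std::uint8_t> Idx = expandToByteIndices({3, 8, 5, 9}, 2);
  assert((Idx == std::vector<std::uint8_t>{6, 7, 16, 17, 10, 11, 18, 19}));
}
```

The resulting indices 6, 7, 16, 17, 10, 11, 18, 19 all fall inside the 32-byte table.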

sebpop added inline comments. Mar 5 2018, 12:46 PM

llvm/test/CodeGen/AArch64/arm64-ext.ll
100 ↗	(On Diff #136604)	If I replace the pattern with <4,8,5,9>, it is trivially matched as a ZIP1 pattern both before and after the patch. The testcase was crafted this way to force the compiler to use VEXT instructions instead of loads and stores on the stack. As I tried to explain in the commit message, this testcase is fragile in the sense that it requires a BUILD_VECTOR to "survive" all DAG transforms until ISelLowering. The testcase is supposed to check that AArch64TargetLowering::ReconstructShuffle() works, and for that we need a BUILD_VECTOR in ISelLowering. As we now transform the BUILD_VECTOR earlier into a VEXT + vector_shuffle, the BUILD_VECTOR pattern no longer exists by the time we get to ISelLowering.
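The point that `<4, 8, 5, 9>` is trivially a ZIP1 can be checked by hand. Once the high half of `%A` (lanes 4 to 7) and the low half of `%B` (mask values 8 to 11) are extracted, the mask renumbered over the two 4-lane halves becomes `<0, 4, 1, 5>`, which is ZIP1's interleave of the low lanes of both operands. The helpers below (`remapToHalves`, `isZip1MaskModel`) are hypothetical and only model this mask arithmetic; they are not LLVM's shuffle-matching code.

```cpp
#include <cassert>
#include <vector>

// True if Mask, indexing two concatenated N-lane inputs, is the ZIP1 pattern
// <0, N, 1, N+1, ...>. A negative entry models an undef lane and matches anything.
bool isZip1MaskModel(const std::vector<int> &Mask) {
  const int N = static_cast<int>(Mask.size());
  if (N == 0 || N % 2 != 0)
    return false;
  for (int i = 0; i < N; i += 2) {
    if (Mask[i] >= 0 && Mask[i] != i / 2)
      return false;
    if (Mask[i + 1] >= 0 && Mask[i + 1] != N + i / 2)
      return false;
  }
  return true;
}

// Hypothetical helper: renumber a mask over two SrcElts-lane sources (%A, then
// %B) after extracting the half of %A starting at lane AStart and the half of
// %B starting at lane BStart. Returns an empty vector if a lane falls outside
// the extracted halves.
std::vector<int> remapToHalves(const std::vector<int> &Mask, int SrcElts,
                               int AStart, int BStart) {
  const int Half = SrcElts / 2;
  std::vector<int> Out;
  for (int M : Mask) {
    if (M < 0) { // undef stays undef
      Out.push_back(-1);
      continue;
    }
    const bool FromA = M < SrcElts;
    const int Lane = FromA ? M - AStart : M - SrcElts - BStart;
    if (Lane < 0 || Lane >= Half)
      return {};
    Out.push_back(FromA ? Lane : Half + Lane);
  }
  return Out;
}

void checkZipExample() {
  // <4, 8, 5, 9>: high half of %A (AStart = 4), low half of %B (BStart = 0).
  const std::vector<int> M = remapToHalves({4, 8, 5, 9}, 8, 4, 0);
  assert((M == std::vector<int>{0, 4, 1, 5}));
  assert(isZip1MaskModel(M));
}
```

The same remapping fails for the original `<3, 8, 5, 9>` because lane 3 of `%A` lies outside the high half, which is why that mask needed the extra EXT.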

javed.absar added inline comments. Mar 6 2018, 1:13 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
8176 ↗	(On Diff #136604)	This is similar (identical, in fact, as far as I can see) to what's done in A32. LGTM.

This revision was not accepted when it landed; it landed in state Needs Review. Mar 6 2018, 8:57 AM

Closed by commit rL326811: [AArch64] define isExtractSubvectorCheap (authored by spop).

This revision was automatically updated to reflect the committed changes.

Revision Contents

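Path | Size
llvm/trunk/lib/Target/AArch64/AArch64ISelLowering.h | 5 lines
llvm/trunk/lib/Target/AArch64/AArch64ISelLowering.cpp | 8 lines
llvm/trunk/test/CodeGen/AArch64/aarch64-vuzp.ll | 4 lines
llvm/trunk/test/CodeGen/AArch64/arm64-ext.ll | 13 lines
llvm/trunk/test/CodeGen/AArch64/neon-scalar-copy.ll | 8 lines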

Diff 137212

llvm/trunk/lib/Target/AArch64/AArch64ISelLowering.h

@@ … @@ public:
   /// \brief Returns false if N is a bit extraction pattern of (X >> C) & Mask.
   bool isDesirableToCommuteWithShift(const SDNode *N) const override;

   /// \brief Returns true if it is beneficial to convert a load of a constant
   /// to just the constant itself.
   bool shouldConvertConstantLoadToIntImm(const APInt &Imm,
                                          Type *Ty) const override;

+  /// Return true if EXTRACT_SUBVECTOR is cheap for this result type
+  /// with this index.
+  bool isExtractSubvectorCheap(EVT ResVT, EVT SrcVT,
+                               unsigned Index) const override;
+
   Value *emitLoadLinked(IRBuilder<> &Builder, Value *Addr,
                         AtomicOrdering Ord) const override;
   Value *emitStoreConditional(IRBuilder<> &Builder, Value *Val,
                               Value *Addr, AtomicOrdering Ord) const override;

   void emitAtomicCmpXchgNoStoreLLBalance(IRBuilder<> &Builder) const override;

   TargetLoweringBase::AtomicExpansionKind
…

llvm/trunk/lib/Target/AArch64/AArch64ISelLowering.cpp


@@ … @@ if (BitSize == 32)
     Val &= (1LL << 32) - 1;

   unsigned LZ = countLeadingZeros((uint64_t)Val);
   unsigned Shift = (63 - LZ) / 16;
   // MOVZ is free so return true for one or fewer MOVK.
   return Shift < 3;
 }

+bool AArch64TargetLowering::isExtractSubvectorCheap(EVT ResVT, EVT SrcVT,
+                                                    unsigned Index) const {
+  if (!isOperationLegalOrCustom(ISD::EXTRACT_SUBVECTOR, ResVT))
+    return false;
+
+  return (Index == 0 || Index == ResVT.getVectorNumElements());
+}
+
 /// Turn vector tests of the signbit in the form of:
 /// xor (sra X, elt_size(X)-1), -1
 /// into:
 /// cmge X, X, #0
 static SDValue foldVectorXorShiftIntoCmp(SDNode *N, SelectionDAG &DAG,
                                          const AArch64Subtarget *Subtarget) {
   EVT VT = N->getValueType(0);
   if (!Subtarget->hasNEON() || !VT.isVector())
…

llvm/trunk/test/CodeGen/AArch64/aarch64-vuzp.ll

 ; RUN: llc -mtriple=aarch64-none-linux-gnu -mattr=+neon < %s | FileCheck %s

 declare <16 x i8> @llvm.aarch64.neon.tbl1.v16i8(<16 x i8>, <16 x i8>)

 ; CHECK-LABEL: fun1:
 ; CHECK: uzp1 {{v[0-9]+}}.8b, {{v[0-9]+}}.8b, {{v[0-9]+}}.8b
-; CHECK-NOT: mov
 define i32 @fun1() {
 entry:
   %vtbl1.i.1 = tail call <16 x i8> @llvm.aarch64.neon.tbl1.v16i8(<16 x i8> <i8 0, i8 16, i8 19, i8 4, i8 -65, i8 -65, i8 -71, i8 -71, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0>, <16 x i8> undef)
   %vuzp.i212.1 = shufflevector <16 x i8> %vtbl1.i.1, <16 x i8> undef, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14>
   %scevgep = getelementptr <8 x i8>, <8 x i8>* undef, i64 1
   store <8 x i8> %vuzp.i212.1, <8 x i8>* %scevgep, align 1
   ret i32 undef
 }

 ; CHECK-LABEL: fun2:
 ; CHECK: uzp2 {{v[0-9]+}}.8b, {{v[0-9]+}}.8b, {{v[0-9]+}}.8b
-; CHECK-NOT: mov
 define i32 @fun2() {
 entry:
   %vtbl1.i.1 = tail call <16 x i8> @llvm.aarch64.neon.tbl1.v16i8(<16 x i8> <i8 0, i8 16, i8 19, i8 4, i8 -65, i8 -65, i8 -71, i8 -71, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0>, <16 x i8> undef)
   %vuzp.i212.1 = shufflevector <16 x i8> %vtbl1.i.1, <16 x i8> undef, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>
   %scevgep = getelementptr <8 x i8>, <8 x i8>* undef, i64 1
   store <8 x i8> %vuzp.i212.1, <8 x i8>* %scevgep, align 1
   ret i32 undef
 }

 ; CHECK-LABEL: fun3:
 ; CHECK-NOT: uzp1
-; CHECK: mov
 define i32 @fun3() {
 entry:
   %vtbl1.i.1 = tail call <16 x i8> @llvm.aarch64.neon.tbl1.v16i8(<16 x i8> <i8 0, i8 16, i8 19, i8 4, i8 -65, i8 -65, i8 -71, i8 -71, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0>, <16 x i8> undef)
   %vuzp.i212.1 = shufflevector <16 x i8> %vtbl1.i.1, <16 x i8> undef, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 15>
   %scevgep = getelementptr <8 x i8>, <8 x i8>* undef, i64 1
   store <8 x i8> %vuzp.i212.1, <8 x i8>* %scevgep, align 1
   ret i32 undef
 }

 ; CHECK-LABEL: fun4:
 ; CHECK-NOT: uzp2
-; CHECK: mov
 define i32 @fun4() {
 entry:
   %vtbl1.i.1 = tail call <16 x i8> @llvm.aarch64.neon.tbl1.v16i8(<16 x i8> <i8 0, i8 16, i8 19, i8 4, i8 -65, i8 -65, i8 -71, i8 -71, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0>, <16 x i8> undef)
   %vuzp.i212.1 = shufflevector <16 x i8> %vtbl1.i.1, <16 x i8> undef, <8 x i32> <i32 3, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>
   %scevgep = getelementptr <8 x i8>, <8 x i8>* undef, i64 1
   store <8 x i8> %vuzp.i212.1, <8 x i8>* %scevgep, align 1
   ret i32 undef
 }
…

llvm/trunk/test/CodeGen/AArch64/arm64-ext.ll

@@ … @@ ;CHECK: {{ext.16b.*#10}}
   %tmp1 = load <8 x i16>, <8 x i16>* %A
   %vext = shufflevector <8 x i16> %tmp1, <8 x i16> undef, <8 x i32> <i32 undef, i32 undef, i32 undef, i32 undef, i32 1, i32 2, i32 3, i32 4>
   ret <8 x i16> %vext;
 }

 ; Tests for ReconstructShuffle function. Indices have to be carefully
 ; chosen to reach lowering phase as a BUILD_VECTOR.

-; One vector needs vext, the other can be handled by extract_subvector
-; Also checks interleaving of sources is handled correctly.
-; Essence: a vext is used on %A and something saner than stack load/store for final result.
-define <4 x i16> @test_interleaved(<8 x i16>* %A, <8 x i16>* %B) nounwind {
-;CHECK-LABEL: test_interleaved:
-;CHECK: ext.8b
-;CHECK: zip1.4h
-  %tmp1 = load <8 x i16>, <8 x i16>* %A
-  %tmp2 = load <8 x i16>, <8 x i16>* %B
-  %tmp3 = shufflevector <8 x i16> %tmp1, <8 x i16> %tmp2, <4 x i32> <i32 3, i32 8, i32 5, i32 9>
-  ret <4 x i16> %tmp3
-}
-
 ; An undef in the shuffle list should still be optimizable
 define <4 x i16> @test_undef(<8 x i16>* %A, <8 x i16>* %B) nounwind {
 ;CHECK-LABEL: test_undef:
 ;CHECK: zip1.4h
   %tmp1 = load <8 x i16>, <8 x i16>* %A
   %tmp2 = load <8 x i16>, <8 x i16>* %B
   %tmp3 = shufflevector <8 x i16> %tmp1, <8 x i16> %tmp2, <4 x i32> <i32 undef, i32 8, i32 5, i32 9>
   ret <4 x i16> %tmp3
 }

llvm/trunk/test/CodeGen/AArch64/neon-scalar-copy.ll

…
 ; CHECK-NOT: ins {{[vsdh][0-9]+}}
 ; CHECK-NEXT: ret
   %tmp1 = extractelement <8 x half> %v, i32 0
   ret half %tmp1
 }

 define <1 x i8> @test_vector_dup_bv16B(<16 x i8> %v1) #0 {
 ; CHECK-LABEL: test_vector_dup_bv16B:
-; CHECK-NEXT: dup v0.16b, v0.b[14]
+; CHECK-NEXT: dup v0.8b, v0.b[14]
 ; CHECK-NEXT: ret
   %shuffle.i = shufflevector <16 x i8> %v1, <16 x i8> undef, <1 x i32> <i32 14>
   ret <1 x i8> %shuffle.i
 }

 define <1 x i8> @test_vector_dup_bv8B(<8 x i8> %v1) #0 {
 ; CHECK-LABEL: test_vector_dup_bv8B:
 ; CHECK-NEXT: dup v0.8b, v0.b[7]
 ; CHECK-NEXT: ret
   %shuffle.i = shufflevector <8 x i8> %v1, <8 x i8> undef, <1 x i32> <i32 7>
   ret <1 x i8> %shuffle.i
 }

 define <1 x i16> @test_vector_dup_hv8H(<8 x i16> %v1) #0 {
 ; CHECK-LABEL: test_vector_dup_hv8H:
-; CHECK-NEXT: dup v0.8h, v0.h[7]
+; CHECK-NEXT: dup v0.4h, v0.h[7]
 ; CHECK-NEXT: ret
   %shuffle.i = shufflevector <8 x i16> %v1, <8 x i16> undef, <1 x i32> <i32 7>
   ret <1 x i16> %shuffle.i
 }

 define <1 x i16> @test_vector_dup_hv4H(<4 x i16> %v1) #0 {
 ; CHECK-LABEL: test_vector_dup_hv4H:
 ; CHECK-NEXT: dup v0.4h, v0.h[3]
 ; CHECK-NEXT: ret
   %shuffle.i = shufflevector <4 x i16> %v1, <4 x i16> undef, <1 x i32> <i32 3>
   ret <1 x i16> %shuffle.i
 }

 define <1 x i32> @test_vector_dup_sv4S(<4 x i32> %v1) #0 {
 ; CHECK-LABEL: test_vector_dup_sv4S:
-; CHECK-NEXT: dup v0.4s, v0.s[3]
+; CHECK-NEXT: dup v0.2s, v0.s[3]
 ; CHECK-NEXT: ret
   %shuffle = shufflevector <4 x i32> %v1, <4 x i32> undef, <1 x i32> <i32 3>
   ret <1 x i32> %shuffle
 }

 define <1 x i32> @test_vector_dup_sv2S(<2 x i32> %v1) #0 {
 ; CHECK-LABEL: test_vector_dup_sv2S:
 ; CHECK-NEXT: dup v0.2s, v0.s[1]
 ; CHECK-NEXT: ret
   %shuffle = shufflevector <2 x i32> %v1, <2 x i32> undef, <1 x i32> <i32 1>
   ret <1 x i32> %shuffle
 }

 define <1 x i64> @test_vector_dup_dv2D(<2 x i64> %v1) #0 {
 ; CHECK-LABEL: test_vector_dup_dv2D:
 ; CHECK-NEXT: ext {{v[0-9]+}}.16b, {{v[0-9]+}}.16b, {{v[0-9]+}}.16b, #8
 ; CHECK-NEXT: ret
   %shuffle.i = shufflevector <2 x i64> %v1, <2 x i64> undef, <1 x i32> <i32 1>
   ret <1 x i64> %shuffle.i
 }

 define <1 x i64> @test_vector_copy_dup_dv2D(<1 x i64> %a, <2 x i64> %c) #0 {
 ; CHECK-LABEL: test_vector_copy_dup_dv2D:
-; CHECK-NEXT: dup v0.2d, v1.d[1]
+; CHECK-NEXT: ext {{v[0-9]+}}.16b, {{v[0-9]+}}.16b, {{v[0-9]+}}.16b, #8
 ; CHECK-NEXT: ret
   %vget_lane = extractelement <2 x i64> %c, i32 1
   %vset_lane = insertelement <1 x i64> undef, i64 %vget_lane, i32 0
   ret <1 x i64> %vset_lane
 }

 ; Undefined behaviour, so we really don't care what actually gets emitted, just
 ; as long as we don't crash (since it could be dynamically unreachable).
…