This is an archive of the discontinued LLVM Phabricator instance.

Differential D52441

[CodeGen] Update min-legal-vector width based on function argument and return types
ClosedPublic

Authored by craig.topper on Sep 24 2018, 3:32 PM.

Details

Reviewers

rnk
chandlerc
rsmith
echristo
javed.absar

Commits

rG3113ec3dc78e: [CodeGen] Update min-legal-vector width based on function argument and return…
rL345168: [CodeGen] Update min-legal-vector width based on function argument and return…
rC345168: [CodeGen] Update min-legal-vector width based on function argument and return…

Summary

This continues my patches that tell the X86 backend what the largest IR types in a function are. With that information, the backend type legalizer can be restricted so that it does not create 512-bit vectors on SKX when -mprefer-vector-width=256 is specified, provided the user did not explicitly use any 512-bit vectors.

This patch updates the vector width based on the argument and return types of the current function and of any functions it calls. This is intended to ensure that the backend type legalizer doesn't disturb any types that the ABI requires.
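
As a rough illustration of the rule in the summary, the following C sketch models the computation with plain integers instead of LLVM types. The `ir_type` and `ir_signature` structs and the `update_width` and `scan_signature` helpers are hypothetical names invented for this example; the patch itself works on `llvm::VectorType` inside Clang's CodeGen.

```c
#include <stddef.h>

/* Hypothetical stand-in for an IR type: vector_bits is 0 for non-vector
   types and the total bit width for vector types. */
typedef struct {
    unsigned vector_bits;
} ir_type;

/* Hypothetical stand-in for a function signature. */
typedef struct {
    ir_type ret;
    const ir_type *params;
    size_t num_params;
} ir_signature;

/* Raise `largest` to the width of `t` if `t` is a wider vector. */
static unsigned update_width(unsigned largest, ir_type t) {
    return t.vector_bits > largest ? t.vector_bits : largest;
}

/* Fold the return type and every parameter type of `sig` into `largest`.
   The same helper can be applied to the current function and to each
   function it calls. */
static unsigned scan_signature(unsigned largest, const ir_signature *sig) {
    largest = update_width(largest, sig->ret);
    for (size_t i = 0; i < sig->num_params; ++i)
        largest = update_width(largest, sig->params[i]);
    return largest;
}
```

Starting from 0 and folding in the current function's signature and then each callee's signature gives the running maximum that the summary describes.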

Event Timeline

craig.topper created this revision. Sep 24 2018, 3:32 PM

Herald added a reviewer: javed.absar. Sep 24 2018, 3:32 PM

Ping

Herald added a subscriber: arphaman. Oct 1 2018, 10:20 AM

Code looks fine, but attribute testing is always a pain.

test/CodeGen/aarch64-neon-3v.c
Line 14: These attribute changes don't appear to test anything. They don't say anything about the min-legal-vector-width. It's unfortunate that LLVM attribute syntax is so FileCheck-unfriendly, but for now, I think you need to check for the #0, #1, etc. attribute definitions at the end of each .c file.
test/CodeGen/x86-vector-width.c
Line 52: I'd look for `define {{.*}}@foo{{.*}} #0` to be a bit more precise.
test/CodeGenOpenCL/fpmath.cl
Lines 50–51: Does this actually work? Shouldn't these be `NODIVOPT-SAME: "correctly-rounded-divide-sqrt-fp-math"="false"`?
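
To make the first two suggestions concrete, a test could pin the attribute group on the `define` line and then check the matching group definition near the end of the file. The excerpt below is only a sketch: the function name `foo`, the group number `#0`, and the width value 128 are illustrative assumptions, not lines taken from the patch.

```c
// Hypothetical excerpt from a Clang codegen test file.
// Tie the function definition to attribute group #0 ...
// CHECK: define {{.*}}@foo{{.*}} #0
// ... and, at the end of the file, check what that group contains.
// CHECK: attributes #0 = { {{.*}}"min-legal-vector-width"="128"{{.*}} }
```

Without the second directive, a change from `#0` to `#1` on the `define` line only shows that the group differs, not which width it records.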

Couple of comments/questions:

a) How do you want these attributes to be handled in merging/inlining? Also, are they a failure on module linking? In general, how do they work with LTO?
b) Could use some more comments when we're adding/merging the attributes in IR generation as well.

Address Reid's comments. Add a comment with a list of all things that currently affect the vector width attribute emitted in IR.

For inlining, we update the caller's attribute during merging to ensure it is at least as large as the callee that is being inlined. This is required for always_inline of the intrinsics. We probably want a way to limit inlining, but that would affect the inlining decision. If the decision has been made to inline, we have to take the max.

For LTO I don't have an answer. What do we do for things like target features and cpu today?
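
The inlining rule in this comment amounts to taking a maximum. A minimal C sketch, assuming the attribute is modeled as a plain integer where 0 means "not present" (the `fn_attrs` struct and `merge_on_inline` are hypothetical names, not LLVM APIs):

```c
/* Hypothetical model of a function's "min-legal-vector-width" attribute;
   0 stands for "attribute not present". */
typedef struct {
    unsigned min_legal_vector_width;
} fn_attrs;

/* Once the decision to inline has been made, widen the caller so that it
   is at least as wide as the callee being inlined. */
static void merge_on_inline(fn_attrs *caller, const fn_attrs *callee) {
    if (callee->min_legal_vector_width > caller->min_legal_vector_width)
        caller->min_legal_vector_width = callee->min_legal_vector_width;
}
```

The caller's value never shrinks, so inlining an always_inline intrinsic wrapper that uses 512-bit vectors leaves the caller marked as needing 512-bit vectors.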

Add back the new test file that I lost in the previous update

Ping

In D52441#1258317, @craig.topper wrote:

Address Reid's comments. Add a comment with a list of all things that currently affect the vector width attribute emitted in IR.

For inlining, we update the caller's attribute during merging to ensure it is at least as large as the callee that is being inlined. This is required for always_inline of the intrinsics. We probably want a way to limit inlining, but that would affect the inlining decision. If the decision has been made to inline, we have to take the max.

For LTO I don't have an answer. What do we do for things like target features and cpu today?

I think your comments about the behavior w.r.t. inlining are enough to describe what happens during LTO. I don't want to speak for Eric, but I think you've answered his questions.

In D52441#1271545, @rnk wrote:

I think your comments about the behavior w.r.t. inlining are enough to describe what happens during LTO. I don't want to speak for Eric, but I think you've answered his questions.

Yes, it'll be the same. As a note, the inlining widening must happen after the check for subtarget features.
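
The ordering constraint in this note can be sketched as follows. This is a simplified model, assuming target features are a bit set and that a callee is compatible only when its features are a subset of the caller's; `fn_info`, `features_compatible`, and `try_inline` are hypothetical names for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: target features as a bit set, plus the width
   attribute (0 means "not present"). */
typedef struct {
    uint32_t features;
    unsigned min_legal_vector_width;
} fn_info;

/* Assumed compatibility rule for this sketch: the callee may only use
   features that the caller also has. */
static bool features_compatible(const fn_info *caller, const fn_info *callee) {
    return (callee->features & ~caller->features) == 0;
}

/* Check features first; widen the caller only if inlining goes ahead. */
static bool try_inline(fn_info *caller, const fn_info *callee) {
    if (!features_compatible(caller, callee))
        return false; /* rejected: the caller's width is left untouched */
    if (callee->min_legal_vector_width > caller->min_legal_vector_width)
        caller->min_legal_vector_width = callee->min_legal_vector_width;
    return true;
}
```

If the widening ran before the feature check, a rejected inlining attempt could still raise the caller's width, which is the situation the note warns against.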

Otherwise we talked about this at the conference and LGTM.

This revision is now accepted and ready to land. Oct 23 2018, 5:20 PM

Closed by commit rC345168: [CodeGen] Update min-legal-vector width based on function argument and return… (authored by ctopper). Oct 24 2018, 10:44 AM

This revision was automatically updated to reflect the committed changes.

craig.topper mentioned this in D139701: [Clang] Emit "min-legal-vector-width" attribute for X86 only. Dec 17 2022, 10:24 AM

Revision Contents

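Path                                                  Size
lib/CodeGen/CGCall.cpp                                12 lines
lib/CodeGen/CodeGenFunction.cpp                       19 lines
test/CodeGen/aarch64-neon-3v.c                        83 lines
test/CodeGen/aarch64-neon-across.c                    147 lines
test/CodeGen/aarch64-neon-fma.c                       41 lines
test/CodeGen/aarch64-neon-ldst-one.c                  460 lines
test/CodeGen/aarch64-neon-scalar-copy.c               26 lines
test/CodeGen/aarch64-neon-scalar-x-indexed-elem.c     43 lines
test/CodeGen/aarch64-neon-tbl.c                       207 lines
test/CodeGen/aarch64-neon-vget.c                      51 lines
test/CodeGen/aarch64-poly64.c                         71 lines
test/CodeGen/arm-neon-fma.c                           11 lines
test/CodeGen/arm-neon-numeric-maxmin.c                15 lines
test/CodeGen/arm-neon-vcvtX.c                         51 lines
test/CodeGen/arm64_vdupq_n_f64.c                      4 lines
test/CodeGen/x86-vector-width.c                       61 lines
test/CodeGenOpenCL/fpmath.cl                          12 lines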

Diff 168725

lib/CodeGen/CGCall.cpp

[... 4,232 lines not shown; context: for (unsigned i = 0; i < IRCallArgs.size(); ++i) { ...]
     if (IRFunctionArgs.hasInallocaArg() &&
         i == IRFunctionArgs.getInallocaArgNo())
       continue;
     if (i < IRFuncTy->getNumParams())
       assert(IRCallArgs[i]->getType() == IRFuncTy->getParamType(i));
   }
 #endif

+  // Update the largest vector width if any arguments have vector types.
+  for (unsigned i = 0; i < IRCallArgs.size(); ++i) {
+    if (auto *VT = dyn_cast<llvm::VectorType>(IRCallArgs[i]->getType()))
+      LargestVectorWidth = std::max(LargestVectorWidth,
+                                    VT->getPrimitiveSizeInBits());
+  }
+
   // Compute the calling convention and attributes.
   unsigned CallingConv;
   llvm::AttributeList Attrs;
   CGM.ConstructAttributeList(CalleePtr->getName(), CallInfo,
                              Callee.getAbstractInfo(), Attrs, CallingConv,
                              /*AttrOnCallSite=*/true);

   // Apply some call-site-specific attributes.
[... 64 lines not shown; context: #endif ...]
   CS.setAttributes(Attrs);
   CS.setCallingConv(static_cast<llvm::CallingConv::ID>(CallingConv));

   // Apply various metadata.

   if (!CI->getType()->isVoidTy())
     CI->setName("call");

+  // Update largest vector width from the return type.
+  if (auto *VT = dyn_cast<llvm::VectorType>(CI->getType()))
+    LargestVectorWidth = std::max(LargestVectorWidth,
+                                  VT->getPrimitiveSizeInBits());
+
   // Insert instrumentation or attach profile metadata at indirect call sites.
   // For more details, see the comment before the definition of
   // IPVK_IndirectCallTarget in InstrProfData.inc.
   if (!CS.getCalledFunction())
     PGO.valueProfile(Builder, llvm::IPVK_IndirectCallTarget,
                      CI, CalleePtr);

   // In ObjC ARC mode with no ObjC ARC exception safety, tell the ARC
[... 197 lines not shown ...]

lib/CodeGen/CodeGenFunction.cpp

[... 424 lines not shown; context: void CodeGenFunction::FinishFunction(SourceLocation EndLoc) { ...]
   // difficult.
   if (NormalCleanupDest.isValid() && isCoroutine()) {
     llvm::DominatorTree DT(*CurFn);
     llvm::PromoteMemToReg(
         cast<llvm::AllocaInst>(NormalCleanupDest.getPointer()), DT);
     NormalCleanupDest = Address::invalid();
   }

-  // Add the required-vector-width attribute.
+  // Scan function arguments for vector width.
+  for (llvm::Argument &A : CurFn->args())
+    if (auto *VT = dyn_cast<llvm::VectorType>(A.getType()))
+      LargestVectorWidth = std::max(LargestVectorWidth,
+                                    VT->getPrimitiveSizeInBits());
+
+  // Update vector width based on return type.
+  if (auto *VT = dyn_cast<llvm::VectorType>(CurFn->getReturnType()))
+    LargestVectorWidth = std::max(LargestVectorWidth,
+                                  VT->getPrimitiveSizeInBits());
+
+  // Add the required-vector-width attribute. This contains the max width from:
+  // 1. min-vector-width attribute used in the source program.
+  // 2. Any builtins used that have a vector width specified.
+  // 3. Values passed in and out of inline assembly.
+  // 4. Width of vector arguments and return types for this function.
+  // 5. Width of vector arguments and return types for functions called by this
+  //    function.
   if (LargestVectorWidth != 0)
     CurFn->addFnAttr("min-legal-vector-width",
                      llvm::utostr(LargestVectorWidth));
 }

 /// ShouldInstrumentFunction - Return true if the current function should be
 /// instrumented with __cyg_profile_func_* calls
 bool CodeGenFunction::ShouldInstrumentFunction() {
[... 1,985 lines not shown ...]
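
The guarded emission at the end of `FinishFunction` can be pictured with a small C sketch. It assumes the textual IR form `"key"="value"` for string attributes; `format_width_attr` is a hypothetical helper written for this example and is not part of Clang.

```c
#include <stdio.h>

/* Write the attribute as "key"="value" text into buf. Returns the number
   of characters snprintf reports, or 0 when the width is zero and no
   attribute is emitted (buf is then set to the empty string). */
static int format_width_attr(char *buf, size_t size, unsigned largest) {
    if (largest == 0) {
        if (size > 0)
            buf[0] = '\0';
        return 0;
    }
    return snprintf(buf, size, "\"min-legal-vector-width\"=\"%u\"", largest);
}
```

A function that never touches a vector type therefore carries no attribute at all, rather than an attribute with the value 0.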

test/CodeGen/aarch64-neon-3v.c

 // RUN: %clang_cc1 -triple arm64-none-linux-gnu -target-feature +neon -disable-O0-optnone -emit-llvm -o - %s | opt -S -mem2reg | FileCheck %s

 // Test new aarch64 intrinsics and types

 #include <arm_neon.h>

 // CHECK-LABEL: define <8 x i8> @test_vand_s8(<8 x i8> %a, <8 x i8> %b) #0 {
 // CHECK: [[AND_I:%.*]] = and <8 x i8> %a, %b
 // CHECK: ret <8 x i8> [[AND_I]]
 int8x8_t test_vand_s8(int8x8_t a, int8x8_t b) {
   return vand_s8(a, b);
 }

-// CHECK-LABEL: define <16 x i8> @test_vandq_s8(<16 x i8> %a, <16 x i8> %b) #0 {
+// CHECK-LABEL: define <16 x i8> @test_vandq_s8(<16 x i8> %a, <16 x i8> %b) #1 {
[Inline comment from rnk: These attribute changes don't appear to test anything. They don't say anything about the min-legal-vector-width. It's unfortunate that LLVM attribute syntax is so FileCheck-unfriendly, but for now, I think you need to check for the #0, #1, etc. attribute definitions at the end of each .c file.]
 // CHECK: [[AND_I:%.*]] = and <16 x i8> %a, %b
 // CHECK: ret <16 x i8> [[AND_I]]
 int8x16_t test_vandq_s8(int8x16_t a, int8x16_t b) {
   return vandq_s8(a, b);
 }

 // CHECK-LABEL: define <4 x i16> @test_vand_s16(<4 x i16> %a, <4 x i16> %b) #0 {
 // CHECK: [[AND_I:%.*]] = and <4 x i16> %a, %b
 // CHECK: ret <4 x i16> [[AND_I]]
 int16x4_t test_vand_s16(int16x4_t a, int16x4_t b) {
   return vand_s16(a, b);
 }

-// CHECK-LABEL: define <8 x i16> @test_vandq_s16(<8 x i16> %a, <8 x i16> %b) #0 {
+// CHECK-LABEL: define <8 x i16> @test_vandq_s16(<8 x i16> %a, <8 x i16> %b) #1 {
 // CHECK: [[AND_I:%.*]] = and <8 x i16> %a, %b
 // CHECK: ret <8 x i16> [[AND_I]]
 int16x8_t test_vandq_s16(int16x8_t a, int16x8_t b) {
   return vandq_s16(a, b);
 }

 // CHECK-LABEL: define <2 x i32> @test_vand_s32(<2 x i32> %a, <2 x i32> %b) #0 {
 // CHECK: [[AND_I:%.*]] = and <2 x i32> %a, %b
 // CHECK: ret <2 x i32> [[AND_I]]
 int32x2_t test_vand_s32(int32x2_t a, int32x2_t b) {
   return vand_s32(a, b);
 }

-// CHECK-LABEL: define <4 x i32> @test_vandq_s32(<4 x i32> %a, <4 x i32> %b) #0 {
+// CHECK-LABEL: define <4 x i32> @test_vandq_s32(<4 x i32> %a, <4 x i32> %b) #1 {
 // CHECK: [[AND_I:%.*]] = and <4 x i32> %a, %b
 // CHECK: ret <4 x i32> [[AND_I]]
 int32x4_t test_vandq_s32(int32x4_t a, int32x4_t b) {
   return vandq_s32(a, b);
 }

 // CHECK-LABEL: define <1 x i64> @test_vand_s64(<1 x i64> %a, <1 x i64> %b) #0 {
 // CHECK: [[AND_I:%.*]] = and <1 x i64> %a, %b
 // CHECK: ret <1 x i64> [[AND_I]]
 int64x1_t test_vand_s64(int64x1_t a, int64x1_t b) {
   return vand_s64(a, b);
 }

-// CHECK-LABEL: define <2 x i64> @test_vandq_s64(<2 x i64> %a, <2 x i64> %b) #0 {
+// CHECK-LABEL: define <2 x i64> @test_vandq_s64(<2 x i64> %a, <2 x i64> %b) #1 {
 // CHECK: [[AND_I:%.*]] = and <2 x i64> %a, %b
 // CHECK: ret <2 x i64> [[AND_I]]
 int64x2_t test_vandq_s64(int64x2_t a, int64x2_t b) {
   return vandq_s64(a, b);
 }

 // CHECK-LABEL: define <8 x i8> @test_vand_u8(<8 x i8> %a, <8 x i8> %b) #0 {
 // CHECK: [[AND_I:%.*]] = and <8 x i8> %a, %b
 // CHECK: ret <8 x i8> [[AND_I]]
 uint8x8_t test_vand_u8(uint8x8_t a, uint8x8_t b) {
   return vand_u8(a, b);
 }

-// CHECK-LABEL: define <16 x i8> @test_vandq_u8(<16 x i8> %a, <16 x i8> %b) #0 {
+// CHECK-LABEL: define <16 x i8> @test_vandq_u8(<16 x i8> %a, <16 x i8> %b) #1 {
 // CHECK: [[AND_I:%.*]] = and <16 x i8> %a, %b
 // CHECK: ret <16 x i8> [[AND_I]]
 uint8x16_t test_vandq_u8(uint8x16_t a, uint8x16_t b) {
   return vandq_u8(a, b);
 }

 // CHECK-LABEL: define <4 x i16> @test_vand_u16(<4 x i16> %a, <4 x i16> %b) #0 {
 // CHECK: [[AND_I:%.*]] = and <4 x i16> %a, %b
 // CHECK: ret <4 x i16> [[AND_I]]
 uint16x4_t test_vand_u16(uint16x4_t a, uint16x4_t b) {
   return vand_u16(a, b);
 }

-// CHECK-LABEL: define <8 x i16> @test_vandq_u16(<8 x i16> %a, <8 x i16> %b) #0 {
+// CHECK-LABEL: define <8 x i16> @test_vandq_u16(<8 x i16> %a, <8 x i16> %b) #1 {
 // CHECK: [[AND_I:%.*]] = and <8 x i16> %a, %b
 // CHECK: ret <8 x i16> [[AND_I]]
 uint16x8_t test_vandq_u16(uint16x8_t a, uint16x8_t b) {
   return vandq_u16(a, b);
 }

 // CHECK-LABEL: define <2 x i32> @test_vand_u32(<2 x i32> %a, <2 x i32> %b) #0 {
 // CHECK: [[AND_I:%.*]] = and <2 x i32> %a, %b
 // CHECK: ret <2 x i32> [[AND_I]]
 uint32x2_t test_vand_u32(uint32x2_t a, uint32x2_t b) {
   return vand_u32(a, b);
 }

-// CHECK-LABEL: define <4 x i32> @test_vandq_u32(<4 x i32> %a, <4 x i32> %b) #0 {
+// CHECK-LABEL: define <4 x i32> @test_vandq_u32(<4 x i32> %a, <4 x i32> %b) #1 {
 // CHECK: [[AND_I:%.*]] = and <4 x i32> %a, %b
 // CHECK: ret <4 x i32> [[AND_I]]
 uint32x4_t test_vandq_u32(uint32x4_t a, uint32x4_t b) {
   return vandq_u32(a, b);
 }

 // CHECK-LABEL: define <1 x i64> @test_vand_u64(<1 x i64> %a, <1 x i64> %b) #0 {
 // CHECK: [[AND_I:%.*]] = and <1 x i64> %a, %b
 // CHECK: ret <1 x i64> [[AND_I]]
 uint64x1_t test_vand_u64(uint64x1_t a, uint64x1_t b) {
   return vand_u64(a, b);
 }

-// CHECK-LABEL: define <2 x i64> @test_vandq_u64(<2 x i64> %a, <2 x i64> %b) #0 {
+// CHECK-LABEL: define <2 x i64> @test_vandq_u64(<2 x i64> %a, <2 x i64> %b) #1 {
 // CHECK: [[AND_I:%.*]] = and <2 x i64> %a, %b
 // CHECK: ret <2 x i64> [[AND_I]]
 uint64x2_t test_vandq_u64(uint64x2_t a, uint64x2_t b) {
   return vandq_u64(a, b);
 }

 // CHECK-LABEL: define <8 x i8> @test_vorr_s8(<8 x i8> %a, <8 x i8> %b) #0 {
 // CHECK: [[OR_I:%.*]] = or <8 x i8> %a, %b
 // CHECK: ret <8 x i8> [[OR_I]]
 int8x8_t test_vorr_s8(int8x8_t a, int8x8_t b) {
   return vorr_s8(a, b);
 }

-// CHECK-LABEL: define <16 x i8> @test_vorrq_s8(<16 x i8> %a, <16 x i8> %b) #0 {
+// CHECK-LABEL: define <16 x i8> @test_vorrq_s8(<16 x i8> %a, <16 x i8> %b) #1 {
 // CHECK: [[OR_I:%.*]] = or <16 x i8> %a, %b
 // CHECK: ret <16 x i8> [[OR_I]]
 int8x16_t test_vorrq_s8(int8x16_t a, int8x16_t b) {
   return vorrq_s8(a, b);
 }

 // CHECK-LABEL: define <4 x i16> @test_vorr_s16(<4 x i16> %a, <4 x i16> %b) #0 {
 // CHECK: [[OR_I:%.*]] = or <4 x i16> %a, %b
 // CHECK: ret <4 x i16> [[OR_I]]
 int16x4_t test_vorr_s16(int16x4_t a, int16x4_t b) {
   return vorr_s16(a, b);
 }

-// CHECK-LABEL: define <8 x i16> @test_vorrq_s16(<8 x i16> %a, <8 x i16> %b) #0 {
+// CHECK-LABEL: define <8 x i16> @test_vorrq_s16(<8 x i16> %a, <8 x i16> %b) #1 {
 // CHECK: [[OR_I:%.*]] = or <8 x i16> %a, %b
 // CHECK: ret <8 x i16> [[OR_I]]
 int16x8_t test_vorrq_s16(int16x8_t a, int16x8_t b) {
   return vorrq_s16(a, b);
 }

 // CHECK-LABEL: define <2 x i32> @test_vorr_s32(<2 x i32> %a, <2 x i32> %b) #0 {
 // CHECK: [[OR_I:%.*]] = or <2 x i32> %a, %b
 // CHECK: ret <2 x i32> [[OR_I]]
 int32x2_t test_vorr_s32(int32x2_t a, int32x2_t b) {
   return vorr_s32(a, b);
 }

-// CHECK-LABEL: define <4 x i32> @test_vorrq_s32(<4 x i32> %a, <4 x i32> %b) #0 {
+// CHECK-LABEL: define <4 x i32> @test_vorrq_s32(<4 x i32> %a, <4 x i32> %b) #1 {
 // CHECK: [[OR_I:%.*]] = or <4 x i32> %a, %b
 // CHECK: ret <4 x i32> [[OR_I]]
 int32x4_t test_vorrq_s32(int32x4_t a, int32x4_t b) {
   return vorrq_s32(a, b);
 }

 // CHECK-LABEL: define <1 x i64> @test_vorr_s64(<1 x i64> %a, <1 x i64> %b) #0 {
 // CHECK: [[OR_I:%.*]] = or <1 x i64> %a, %b
 // CHECK: ret <1 x i64> [[OR_I]]
 int64x1_t test_vorr_s64(int64x1_t a, int64x1_t b) {
   return vorr_s64(a, b);
 }

-// CHECK-LABEL: define <2 x i64> @test_vorrq_s64(<2 x i64> %a, <2 x i64> %b) #0 {
+// CHECK-LABEL: define <2 x i64> @test_vorrq_s64(<2 x i64> %a, <2 x i64> %b) #1 {
 // CHECK: [[OR_I:%.*]] = or <2 x i64> %a, %b
 // CHECK: ret <2 x i64> [[OR_I]]
 int64x2_t test_vorrq_s64(int64x2_t a, int64x2_t b) {
   return vorrq_s64(a, b);
 }

 // CHECK-LABEL: define <8 x i8> @test_vorr_u8(<8 x i8> %a, <8 x i8> %b) #0 {
 // CHECK: [[OR_I:%.*]] = or <8 x i8> %a, %b
 // CHECK: ret <8 x i8> [[OR_I]]
 uint8x8_t test_vorr_u8(uint8x8_t a, uint8x8_t b) {
   return vorr_u8(a, b);
 }

-// CHECK-LABEL: define <16 x i8> @test_vorrq_u8(<16 x i8> %a, <16 x i8> %b) #0 {
+// CHECK-LABEL: define <16 x i8> @test_vorrq_u8(<16 x i8> %a, <16 x i8> %b) #1 {
 // CHECK: [[OR_I:%.*]] = or <16 x i8> %a, %b
 // CHECK: ret <16 x i8> [[OR_I]]
 uint8x16_t test_vorrq_u8(uint8x16_t a, uint8x16_t b) {
   return vorrq_u8(a, b);
 }

 // CHECK-LABEL: define <4 x i16> @test_vorr_u16(<4 x i16> %a, <4 x i16> %b) #0 {
 // CHECK: [[OR_I:%.*]] = or <4 x i16> %a, %b
 // CHECK: ret <4 x i16> [[OR_I]]
 uint16x4_t test_vorr_u16(uint16x4_t a, uint16x4_t b) {
   return vorr_u16(a, b);
 }

-// CHECK-LABEL: define <8 x i16> @test_vorrq_u16(<8 x i16> %a, <8 x i16> %b) #0 {
+// CHECK-LABEL: define <8 x i16> @test_vorrq_u16(<8 x i16> %a, <8 x i16> %b) #1 {
 // CHECK: [[OR_I:%.*]] = or <8 x i16> %a, %b
 // CHECK: ret <8 x i16> [[OR_I]]
 uint16x8_t test_vorrq_u16(uint16x8_t a, uint16x8_t b) {
   return vorrq_u16(a, b);
 }

 // CHECK-LABEL: define <2 x i32> @test_vorr_u32(<2 x i32> %a, <2 x i32> %b) #0 {
 // CHECK: [[OR_I:%.*]] = or <2 x i32> %a, %b
 // CHECK: ret <2 x i32> [[OR_I]]
 uint32x2_t test_vorr_u32(uint32x2_t a, uint32x2_t b) {
   return vorr_u32(a, b);
 }

-// CHECK-LABEL: define <4 x i32> @test_vorrq_u32(<4 x i32> %a, <4 x i32> %b) #0 {
+// CHECK-LABEL: define <4 x i32> @test_vorrq_u32(<4 x i32> %a, <4 x i32> %b) #1 {
 // CHECK: [[OR_I:%.*]] = or <4 x i32> %a, %b
 // CHECK: ret <4 x i32> [[OR_I]]
 uint32x4_t test_vorrq_u32(uint32x4_t a, uint32x4_t b) {
   return vorrq_u32(a, b);
 }

 // CHECK-LABEL: define <1 x i64> @test_vorr_u64(<1 x i64> %a, <1 x i64> %b) #0 {
 // CHECK: [[OR_I:%.*]] = or <1 x i64> %a, %b
 // CHECK: ret <1 x i64> [[OR_I]]
 uint64x1_t test_vorr_u64(uint64x1_t a, uint64x1_t b) {
   return vorr_u64(a, b);
 }

-// CHECK-LABEL: define <2 x i64> @test_vorrq_u64(<2 x i64> %a, <2 x i64> %b) #0 {
+// CHECK-LABEL: define <2 x i64> @test_vorrq_u64(<2 x i64> %a, <2 x i64> %b) #1 {
 // CHECK: [[OR_I:%.*]] = or <2 x i64> %a, %b
 // CHECK: ret <2 x i64> [[OR_I]]
 uint64x2_t test_vorrq_u64(uint64x2_t a, uint64x2_t b) {
   return vorrq_u64(a, b);
 }

 // CHECK-LABEL: define <8 x i8> @test_veor_s8(<8 x i8> %a, <8 x i8> %b) #0 {
 // CHECK: [[XOR_I:%.*]] = xor <8 x i8> %a, %b
 // CHECK: ret <8 x i8> [[XOR_I]]
 int8x8_t test_veor_s8(int8x8_t a, int8x8_t b) {
   return veor_s8(a, b);
 }

-// CHECK-LABEL: define <16 x i8> @test_veorq_s8(<16 x i8> %a, <16 x i8> %b) #0 {
+// CHECK-LABEL: define <16 x i8> @test_veorq_s8(<16 x i8> %a, <16 x i8> %b) #1 {
 // CHECK: [[XOR_I:%.*]] = xor <16 x i8> %a, %b
 // CHECK: ret <16 x i8> [[XOR_I]]
 int8x16_t test_veorq_s8(int8x16_t a, int8x16_t b) {
   return veorq_s8(a, b);
 }

 // CHECK-LABEL: define <4 x i16> @test_veor_s16(<4 x i16> %a, <4 x i16> %b) #0 {
 // CHECK: [[XOR_I:%.*]] = xor <4 x i16> %a, %b
 // CHECK: ret <4 x i16> [[XOR_I]]
 int16x4_t test_veor_s16(int16x4_t a, int16x4_t b) {
   return veor_s16(a, b);
 }

-// CHECK-LABEL: define <8 x i16> @test_veorq_s16(<8 x i16> %a, <8 x i16> %b) #0 {
+// CHECK-LABEL: define <8 x i16> @test_veorq_s16(<8 x i16> %a, <8 x i16> %b) #1 {
 // CHECK: [[XOR_I:%.*]] = xor <8 x i16> %a, %b
 // CHECK: ret <8 x i16> [[XOR_I]]
 int16x8_t test_veorq_s16(int16x8_t a, int16x8_t b) {
   return veorq_s16(a, b);
 }

 // CHECK-LABEL: define <2 x i32> @test_veor_s32(<2 x i32> %a, <2 x i32> %b) #0 {
 // CHECK: [[XOR_I:%.*]] = xor <2 x i32> %a, %b
 // CHECK: ret <2 x i32> [[XOR_I]]
 int32x2_t test_veor_s32(int32x2_t a, int32x2_t b) {
   return veor_s32(a, b);
 }

-// CHECK-LABEL: define <4 x i32> @test_veorq_s32(<4 x i32> %a, <4 x i32> %b) #0 {
+// CHECK-LABEL: define <4 x i32> @test_veorq_s32(<4 x i32> %a, <4 x i32> %b) #1 {
 // CHECK: [[XOR_I:%.*]] = xor <4 x i32> %a, %b
 // CHECK: ret <4 x i32> [[XOR_I]]
 int32x4_t test_veorq_s32(int32x4_t a, int32x4_t b) {
   return veorq_s32(a, b);
 }

 // CHECK-LABEL: define <1 x i64> @test_veor_s64(<1 x i64> %a, <1 x i64> %b) #0 {
 // CHECK: [[XOR_I:%.*]] = xor <1 x i64> %a, %b
 // CHECK: ret <1 x i64> [[XOR_I]]
 int64x1_t test_veor_s64(int64x1_t a, int64x1_t b) {
   return veor_s64(a, b);
 }

-// CHECK-LABEL: define <2 x i64> @test_veorq_s64(<2 x i64> %a, <2 x i64> %b) #0 {
+// CHECK-LABEL: define <2 x i64> @test_veorq_s64(<2 x i64> %a, <2 x i64> %b) #1 {
 // CHECK: [[XOR_I:%.*]] = xor <2 x i64> %a, %b
 // CHECK: ret <2 x i64> [[XOR_I]]
 int64x2_t test_veorq_s64(int64x2_t a, int64x2_t b) {
   return veorq_s64(a, b);
 }

 // CHECK-LABEL: define <8 x i8> @test_veor_u8(<8 x i8> %a, <8 x i8> %b) #0 {
 // CHECK: [[XOR_I:%.*]] = xor <8 x i8> %a, %b
 // CHECK: ret <8 x i8> [[XOR_I]]
 uint8x8_t test_veor_u8(uint8x8_t a, uint8x8_t b) {
   return veor_u8(a, b);
 }

-// CHECK-LABEL: define <16 x i8> @test_veorq_u8(<16 x i8> %a, <16 x i8> %b) #0 {
+// CHECK-LABEL: define <16 x i8> @test_veorq_u8(<16 x i8> %a, <16 x i8> %b) #1 {
 // CHECK: [[XOR_I:%.*]] = xor <16 x i8> %a, %b
 // CHECK: ret <16 x i8> [[XOR_I]]
 uint8x16_t test_veorq_u8(uint8x16_t a, uint8x16_t b) {
   return veorq_u8(a, b);
 }

 // CHECK-LABEL: define <4 x i16> @test_veor_u16(<4 x i16> %a, <4 x i16> %b) #0 {
 // CHECK: [[XOR_I:%.*]] = xor <4 x i16> %a, %b
 // CHECK: ret <4 x i16> [[XOR_I]]
 uint16x4_t test_veor_u16(uint16x4_t a, uint16x4_t b) {
   return veor_u16(a, b);
 }

-// CHECK-LABEL: define <8 x i16> @test_veorq_u16(<8 x i16> %a, <8 x i16> %b) #0 {
+// CHECK-LABEL: define <8 x i16> @test_veorq_u16(<8 x i16> %a, <8 x i16> %b) #1 {
 // CHECK: [[XOR_I:%.*]] = xor <8 x i16> %a, %b
 // CHECK: ret <8 x i16> [[XOR_I]]
 uint16x8_t test_veorq_u16(uint16x8_t a, uint16x8_t b) {
   return veorq_u16(a, b);
 }

 // CHECK-LABEL: define <2 x i32> @test_veor_u32(<2 x i32> %a, <2 x i32> %b) #0 {
	// CHECK: [[XOR_I:%.*]] = xor <2 x i32> %a, %b			// CHECK: [[XOR_I:%.*]] = xor <2 x i32> %a, %b
	// CHECK: ret <2 x i32> [[XOR_I]]			// CHECK: ret <2 x i32> [[XOR_I]]
	uint32x2_t test_veor_u32(uint32x2_t a, uint32x2_t b) {			uint32x2_t test_veor_u32(uint32x2_t a, uint32x2_t b) {
	return veor_u32(a, b);			return veor_u32(a, b);
	}			}

	// CHECK-LABEL: define <4 x i32> @test_veorq_u32(<4 x i32> %a, <4 x i32> %b) #0 {			// CHECK-LABEL: define <4 x i32> @test_veorq_u32(<4 x i32> %a, <4 x i32> %b) #1 {
	// CHECK: [[XOR_I:%.*]] = xor <4 x i32> %a, %b			// CHECK: [[XOR_I:%.*]] = xor <4 x i32> %a, %b
	// CHECK: ret <4 x i32> [[XOR_I]]			// CHECK: ret <4 x i32> [[XOR_I]]
	uint32x4_t test_veorq_u32(uint32x4_t a, uint32x4_t b) {			uint32x4_t test_veorq_u32(uint32x4_t a, uint32x4_t b) {
	return veorq_u32(a, b);			return veorq_u32(a, b);
	}			}

	// CHECK-LABEL: define <1 x i64> @test_veor_u64(<1 x i64> %a, <1 x i64> %b) #0 {			// CHECK-LABEL: define <1 x i64> @test_veor_u64(<1 x i64> %a, <1 x i64> %b) #0 {
	// CHECK: [[XOR_I:%.*]] = xor <1 x i64> %a, %b			// CHECK: [[XOR_I:%.*]] = xor <1 x i64> %a, %b
	// CHECK: ret <1 x i64> [[XOR_I]]			// CHECK: ret <1 x i64> [[XOR_I]]
	uint64x1_t test_veor_u64(uint64x1_t a, uint64x1_t b) {			uint64x1_t test_veor_u64(uint64x1_t a, uint64x1_t b) {
	return veor_u64(a, b);			return veor_u64(a, b);
	}			}

	// CHECK-LABEL: define <2 x i64> @test_veorq_u64(<2 x i64> %a, <2 x i64> %b) #0 {			// CHECK-LABEL: define <2 x i64> @test_veorq_u64(<2 x i64> %a, <2 x i64> %b) #1 {
	// CHECK: [[XOR_I:%.*]] = xor <2 x i64> %a, %b			// CHECK: [[XOR_I:%.*]] = xor <2 x i64> %a, %b
	// CHECK: ret <2 x i64> [[XOR_I]]			// CHECK: ret <2 x i64> [[XOR_I]]
	uint64x2_t test_veorq_u64(uint64x2_t a, uint64x2_t b) {			uint64x2_t test_veorq_u64(uint64x2_t a, uint64x2_t b) {
	return veorq_u64(a, b);			return veorq_u64(a, b);
	}			}

	// CHECK-LABEL: define <8 x i8> @test_vbic_s8(<8 x i8> %a, <8 x i8> %b) #0 {			// CHECK-LABEL: define <8 x i8> @test_vbic_s8(<8 x i8> %a, <8 x i8> %b) #0 {
	// CHECK: [[NEG_I:%.*]] = xor <8 x i8> %b, <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>			// CHECK: [[NEG_I:%.*]] = xor <8 x i8> %b, <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>
	// CHECK: [[AND_I:%.*]] = and <8 x i8> %a, [[NEG_I]]			// CHECK: [[AND_I:%.*]] = and <8 x i8> %a, [[NEG_I]]
	// CHECK: ret <8 x i8> [[AND_I]]			// CHECK: ret <8 x i8> [[AND_I]]
	int8x8_t test_vbic_s8(int8x8_t a, int8x8_t b) {			int8x8_t test_vbic_s8(int8x8_t a, int8x8_t b) {
	return vbic_s8(a, b);			return vbic_s8(a, b);
	}			}

	// CHECK-LABEL: define <16 x i8> @test_vbicq_s8(<16 x i8> %a, <16 x i8> %b) #0 {			// CHECK-LABEL: define <16 x i8> @test_vbicq_s8(<16 x i8> %a, <16 x i8> %b) #1 {
	// CHECK: [[NEG_I:%.*]] = xor <16 x i8> %b, <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>			// CHECK: [[NEG_I:%.*]] = xor <16 x i8> %b, <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>
	// CHECK: [[AND_I:%.*]] = and <16 x i8> %a, [[NEG_I]]			// CHECK: [[AND_I:%.*]] = and <16 x i8> %a, [[NEG_I]]
	// CHECK: ret <16 x i8> [[AND_I]]			// CHECK: ret <16 x i8> [[AND_I]]
	int8x16_t test_vbicq_s8(int8x16_t a, int8x16_t b) {			int8x16_t test_vbicq_s8(int8x16_t a, int8x16_t b) {
	return vbicq_s8(a, b);			return vbicq_s8(a, b);
	}			}

	// CHECK-LABEL: define <4 x i16> @test_vbic_s16(<4 x i16> %a, <4 x i16> %b) #0 {			// CHECK-LABEL: define <4 x i16> @test_vbic_s16(<4 x i16> %a, <4 x i16> %b) #0 {
	// CHECK: [[NEG_I:%.*]] = xor <4 x i16> %b, <i16 -1, i16 -1, i16 -1, i16 -1>			// CHECK: [[NEG_I:%.*]] = xor <4 x i16> %b, <i16 -1, i16 -1, i16 -1, i16 -1>
	// CHECK: [[AND_I:%.*]] = and <4 x i16> %a, [[NEG_I]]			// CHECK: [[AND_I:%.*]] = and <4 x i16> %a, [[NEG_I]]
	// CHECK: ret <4 x i16> [[AND_I]]			// CHECK: ret <4 x i16> [[AND_I]]
	int16x4_t test_vbic_s16(int16x4_t a, int16x4_t b) {			int16x4_t test_vbic_s16(int16x4_t a, int16x4_t b) {
	return vbic_s16(a, b);			return vbic_s16(a, b);
	}			}

	// CHECK-LABEL: define <8 x i16> @test_vbicq_s16(<8 x i16> %a, <8 x i16> %b) #0 {			// CHECK-LABEL: define <8 x i16> @test_vbicq_s16(<8 x i16> %a, <8 x i16> %b) #1 {
	// CHECK: [[NEG_I:%.*]] = xor <8 x i16> %b, <i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1>			// CHECK: [[NEG_I:%.*]] = xor <8 x i16> %b, <i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1>
	// CHECK: [[AND_I:%.*]] = and <8 x i16> %a, [[NEG_I]]			// CHECK: [[AND_I:%.*]] = and <8 x i16> %a, [[NEG_I]]
	// CHECK: ret <8 x i16> [[AND_I]]			// CHECK: ret <8 x i16> [[AND_I]]
	int16x8_t test_vbicq_s16(int16x8_t a, int16x8_t b) {			int16x8_t test_vbicq_s16(int16x8_t a, int16x8_t b) {
	return vbicq_s16(a, b);			return vbicq_s16(a, b);
	}			}

	// CHECK-LABEL: define <2 x i32> @test_vbic_s32(<2 x i32> %a, <2 x i32> %b) #0 {			// CHECK-LABEL: define <2 x i32> @test_vbic_s32(<2 x i32> %a, <2 x i32> %b) #0 {
	// CHECK: [[NEG_I:%.*]] = xor <2 x i32> %b, <i32 -1, i32 -1>			// CHECK: [[NEG_I:%.*]] = xor <2 x i32> %b, <i32 -1, i32 -1>
	// CHECK: [[AND_I:%.*]] = and <2 x i32> %a, [[NEG_I]]			// CHECK: [[AND_I:%.*]] = and <2 x i32> %a, [[NEG_I]]
	// CHECK: ret <2 x i32> [[AND_I]]			// CHECK: ret <2 x i32> [[AND_I]]
	int32x2_t test_vbic_s32(int32x2_t a, int32x2_t b) {			int32x2_t test_vbic_s32(int32x2_t a, int32x2_t b) {
	return vbic_s32(a, b);			return vbic_s32(a, b);
	}			}

	// CHECK-LABEL: define <4 x i32> @test_vbicq_s32(<4 x i32> %a, <4 x i32> %b) #0 {			// CHECK-LABEL: define <4 x i32> @test_vbicq_s32(<4 x i32> %a, <4 x i32> %b) #1 {
	// CHECK: [[NEG_I:%.*]] = xor <4 x i32> %b, <i32 -1, i32 -1, i32 -1, i32 -1>			// CHECK: [[NEG_I:%.*]] = xor <4 x i32> %b, <i32 -1, i32 -1, i32 -1, i32 -1>
	// CHECK: [[AND_I:%.*]] = and <4 x i32> %a, [[NEG_I]]			// CHECK: [[AND_I:%.*]] = and <4 x i32> %a, [[NEG_I]]
	// CHECK: ret <4 x i32> [[AND_I]]			// CHECK: ret <4 x i32> [[AND_I]]
	int32x4_t test_vbicq_s32(int32x4_t a, int32x4_t b) {			int32x4_t test_vbicq_s32(int32x4_t a, int32x4_t b) {
	return vbicq_s32(a, b);			return vbicq_s32(a, b);
	}			}

	// CHECK-LABEL: define <1 x i64> @test_vbic_s64(<1 x i64> %a, <1 x i64> %b) #0 {			// CHECK-LABEL: define <1 x i64> @test_vbic_s64(<1 x i64> %a, <1 x i64> %b) #0 {
	// CHECK: [[NEG_I:%.*]] = xor <1 x i64> %b, <i64 -1>			// CHECK: [[NEG_I:%.*]] = xor <1 x i64> %b, <i64 -1>
	// CHECK: [[AND_I:%.*]] = and <1 x i64> %a, [[NEG_I]]			// CHECK: [[AND_I:%.*]] = and <1 x i64> %a, [[NEG_I]]
	// CHECK: ret <1 x i64> [[AND_I]]			// CHECK: ret <1 x i64> [[AND_I]]
	int64x1_t test_vbic_s64(int64x1_t a, int64x1_t b) {			int64x1_t test_vbic_s64(int64x1_t a, int64x1_t b) {
	return vbic_s64(a, b);			return vbic_s64(a, b);
	}			}

	// CHECK-LABEL: define <2 x i64> @test_vbicq_s64(<2 x i64> %a, <2 x i64> %b) #0 {			// CHECK-LABEL: define <2 x i64> @test_vbicq_s64(<2 x i64> %a, <2 x i64> %b) #1 {
	// CHECK: [[NEG_I:%.*]] = xor <2 x i64> %b, <i64 -1, i64 -1>			// CHECK: [[NEG_I:%.*]] = xor <2 x i64> %b, <i64 -1, i64 -1>
	// CHECK: [[AND_I:%.*]] = and <2 x i64> %a, [[NEG_I]]			// CHECK: [[AND_I:%.*]] = and <2 x i64> %a, [[NEG_I]]
	// CHECK: ret <2 x i64> [[AND_I]]			// CHECK: ret <2 x i64> [[AND_I]]
	int64x2_t test_vbicq_s64(int64x2_t a, int64x2_t b) {			int64x2_t test_vbicq_s64(int64x2_t a, int64x2_t b) {
	return vbicq_s64(a, b);			return vbicq_s64(a, b);
	}			}

	// CHECK-LABEL: define <8 x i8> @test_vbic_u8(<8 x i8> %a, <8 x i8> %b) #0 {			// CHECK-LABEL: define <8 x i8> @test_vbic_u8(<8 x i8> %a, <8 x i8> %b) #0 {
	// CHECK: [[NEG_I:%.*]] = xor <8 x i8> %b, <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>			// CHECK: [[NEG_I:%.*]] = xor <8 x i8> %b, <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>
	// CHECK: [[AND_I:%.*]] = and <8 x i8> %a, [[NEG_I]]			// CHECK: [[AND_I:%.*]] = and <8 x i8> %a, [[NEG_I]]
	// CHECK: ret <8 x i8> [[AND_I]]			// CHECK: ret <8 x i8> [[AND_I]]
	uint8x8_t test_vbic_u8(uint8x8_t a, uint8x8_t b) {			uint8x8_t test_vbic_u8(uint8x8_t a, uint8x8_t b) {
	return vbic_u8(a, b);			return vbic_u8(a, b);
	}			}

	// CHECK-LABEL: define <16 x i8> @test_vbicq_u8(<16 x i8> %a, <16 x i8> %b) #0 {			// CHECK-LABEL: define <16 x i8> @test_vbicq_u8(<16 x i8> %a, <16 x i8> %b) #1 {
	// CHECK: [[NEG_I:%.*]] = xor <16 x i8> %b, <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>			// CHECK: [[NEG_I:%.*]] = xor <16 x i8> %b, <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>
	// CHECK: [[AND_I:%.*]] = and <16 x i8> %a, [[NEG_I]]			// CHECK: [[AND_I:%.*]] = and <16 x i8> %a, [[NEG_I]]
	// CHECK: ret <16 x i8> [[AND_I]]			// CHECK: ret <16 x i8> [[AND_I]]
	uint8x16_t test_vbicq_u8(uint8x16_t a, uint8x16_t b) {			uint8x16_t test_vbicq_u8(uint8x16_t a, uint8x16_t b) {
	return vbicq_u8(a, b);			return vbicq_u8(a, b);
	}			}

	// CHECK-LABEL: define <4 x i16> @test_vbic_u16(<4 x i16> %a, <4 x i16> %b) #0 {			// CHECK-LABEL: define <4 x i16> @test_vbic_u16(<4 x i16> %a, <4 x i16> %b) #0 {
	// CHECK: [[NEG_I:%.*]] = xor <4 x i16> %b, <i16 -1, i16 -1, i16 -1, i16 -1>			// CHECK: [[NEG_I:%.*]] = xor <4 x i16> %b, <i16 -1, i16 -1, i16 -1, i16 -1>
	// CHECK: [[AND_I:%.*]] = and <4 x i16> %a, [[NEG_I]]			// CHECK: [[AND_I:%.*]] = and <4 x i16> %a, [[NEG_I]]
	// CHECK: ret <4 x i16> [[AND_I]]			// CHECK: ret <4 x i16> [[AND_I]]
	uint16x4_t test_vbic_u16(uint16x4_t a, uint16x4_t b) {			uint16x4_t test_vbic_u16(uint16x4_t a, uint16x4_t b) {
	return vbic_u16(a, b);			return vbic_u16(a, b);
	}			}

	// CHECK-LABEL: define <8 x i16> @test_vbicq_u16(<8 x i16> %a, <8 x i16> %b) #0 {			// CHECK-LABEL: define <8 x i16> @test_vbicq_u16(<8 x i16> %a, <8 x i16> %b) #1 {
	// CHECK: [[NEG_I:%.*]] = xor <8 x i16> %b, <i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1>			// CHECK: [[NEG_I:%.*]] = xor <8 x i16> %b, <i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1>
	// CHECK: [[AND_I:%.*]] = and <8 x i16> %a, [[NEG_I]]			// CHECK: [[AND_I:%.*]] = and <8 x i16> %a, [[NEG_I]]
	// CHECK: ret <8 x i16> [[AND_I]]			// CHECK: ret <8 x i16> [[AND_I]]
	uint16x8_t test_vbicq_u16(uint16x8_t a, uint16x8_t b) {			uint16x8_t test_vbicq_u16(uint16x8_t a, uint16x8_t b) {
	return vbicq_u16(a, b);			return vbicq_u16(a, b);
	}			}

	// CHECK-LABEL: define <2 x i32> @test_vbic_u32(<2 x i32> %a, <2 x i32> %b) #0 {			// CHECK-LABEL: define <2 x i32> @test_vbic_u32(<2 x i32> %a, <2 x i32> %b) #0 {
	// CHECK: [[NEG_I:%.*]] = xor <2 x i32> %b, <i32 -1, i32 -1>			// CHECK: [[NEG_I:%.*]] = xor <2 x i32> %b, <i32 -1, i32 -1>
	// CHECK: [[AND_I:%.*]] = and <2 x i32> %a, [[NEG_I]]			// CHECK: [[AND_I:%.*]] = and <2 x i32> %a, [[NEG_I]]
	// CHECK: ret <2 x i32> [[AND_I]]			// CHECK: ret <2 x i32> [[AND_I]]
	uint32x2_t test_vbic_u32(uint32x2_t a, uint32x2_t b) {			uint32x2_t test_vbic_u32(uint32x2_t a, uint32x2_t b) {
	return vbic_u32(a, b);			return vbic_u32(a, b);
	}			}

	// CHECK-LABEL: define <4 x i32> @test_vbicq_u32(<4 x i32> %a, <4 x i32> %b) #0 {			// CHECK-LABEL: define <4 x i32> @test_vbicq_u32(<4 x i32> %a, <4 x i32> %b) #1 {
	// CHECK: [[NEG_I:%.*]] = xor <4 x i32> %b, <i32 -1, i32 -1, i32 -1, i32 -1>			// CHECK: [[NEG_I:%.*]] = xor <4 x i32> %b, <i32 -1, i32 -1, i32 -1, i32 -1>
	// CHECK: [[AND_I:%.*]] = and <4 x i32> %a, [[NEG_I]]			// CHECK: [[AND_I:%.*]] = and <4 x i32> %a, [[NEG_I]]
	// CHECK: ret <4 x i32> [[AND_I]]			// CHECK: ret <4 x i32> [[AND_I]]
	uint32x4_t test_vbicq_u32(uint32x4_t a, uint32x4_t b) {			uint32x4_t test_vbicq_u32(uint32x4_t a, uint32x4_t b) {
	return vbicq_u32(a, b);			return vbicq_u32(a, b);
	}			}

	// CHECK-LABEL: define <1 x i64> @test_vbic_u64(<1 x i64> %a, <1 x i64> %b) #0 {			// CHECK-LABEL: define <1 x i64> @test_vbic_u64(<1 x i64> %a, <1 x i64> %b) #0 {
	// CHECK: [[NEG_I:%.*]] = xor <1 x i64> %b, <i64 -1>			// CHECK: [[NEG_I:%.*]] = xor <1 x i64> %b, <i64 -1>
	// CHECK: [[AND_I:%.*]] = and <1 x i64> %a, [[NEG_I]]			// CHECK: [[AND_I:%.*]] = and <1 x i64> %a, [[NEG_I]]
	// CHECK: ret <1 x i64> [[AND_I]]			// CHECK: ret <1 x i64> [[AND_I]]
	uint64x1_t test_vbic_u64(uint64x1_t a, uint64x1_t b) {			uint64x1_t test_vbic_u64(uint64x1_t a, uint64x1_t b) {
	return vbic_u64(a, b);			return vbic_u64(a, b);
	}			}

	// CHECK-LABEL: define <2 x i64> @test_vbicq_u64(<2 x i64> %a, <2 x i64> %b) #0 {			// CHECK-LABEL: define <2 x i64> @test_vbicq_u64(<2 x i64> %a, <2 x i64> %b) #1 {
	// CHECK: [[NEG_I:%.*]] = xor <2 x i64> %b, <i64 -1, i64 -1>			// CHECK: [[NEG_I:%.*]] = xor <2 x i64> %b, <i64 -1, i64 -1>
	// CHECK: [[AND_I:%.*]] = and <2 x i64> %a, [[NEG_I]]			// CHECK: [[AND_I:%.*]] = and <2 x i64> %a, [[NEG_I]]
	// CHECK: ret <2 x i64> [[AND_I]]			// CHECK: ret <2 x i64> [[AND_I]]
	uint64x2_t test_vbicq_u64(uint64x2_t a, uint64x2_t b) {			uint64x2_t test_vbicq_u64(uint64x2_t a, uint64x2_t b) {
	return vbicq_u64(a, b);			return vbicq_u64(a, b);
	}			}

	// CHECK-LABEL: define <8 x i8> @test_vorn_s8(<8 x i8> %a, <8 x i8> %b) #0 {			// CHECK-LABEL: define <8 x i8> @test_vorn_s8(<8 x i8> %a, <8 x i8> %b) #0 {
	// CHECK: [[NEG_I:%.*]] = xor <8 x i8> %b, <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>			// CHECK: [[NEG_I:%.*]] = xor <8 x i8> %b, <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>
	// CHECK: [[OR_I:%.*]] = or <8 x i8> %a, [[NEG_I]]			// CHECK: [[OR_I:%.*]] = or <8 x i8> %a, [[NEG_I]]
	// CHECK: ret <8 x i8> [[OR_I]]			// CHECK: ret <8 x i8> [[OR_I]]
	int8x8_t test_vorn_s8(int8x8_t a, int8x8_t b) {			int8x8_t test_vorn_s8(int8x8_t a, int8x8_t b) {
	return vorn_s8(a, b);			return vorn_s8(a, b);
	}			}

	// CHECK-LABEL: define <16 x i8> @test_vornq_s8(<16 x i8> %a, <16 x i8> %b) #0 {			// CHECK-LABEL: define <16 x i8> @test_vornq_s8(<16 x i8> %a, <16 x i8> %b) #1 {
	// CHECK: [[NEG_I:%.*]] = xor <16 x i8> %b, <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>			// CHECK: [[NEG_I:%.*]] = xor <16 x i8> %b, <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>
	// CHECK: [[OR_I:%.*]] = or <16 x i8> %a, [[NEG_I]]			// CHECK: [[OR_I:%.*]] = or <16 x i8> %a, [[NEG_I]]
	// CHECK: ret <16 x i8> [[OR_I]]			// CHECK: ret <16 x i8> [[OR_I]]
	int8x16_t test_vornq_s8(int8x16_t a, int8x16_t b) {			int8x16_t test_vornq_s8(int8x16_t a, int8x16_t b) {
	return vornq_s8(a, b);			return vornq_s8(a, b);
	}			}

	// CHECK-LABEL: define <4 x i16> @test_vorn_s16(<4 x i16> %a, <4 x i16> %b) #0 {			// CHECK-LABEL: define <4 x i16> @test_vorn_s16(<4 x i16> %a, <4 x i16> %b) #0 {
	// CHECK: [[NEG_I:%.*]] = xor <4 x i16> %b, <i16 -1, i16 -1, i16 -1, i16 -1>			// CHECK: [[NEG_I:%.*]] = xor <4 x i16> %b, <i16 -1, i16 -1, i16 -1, i16 -1>
	// CHECK: [[OR_I:%.*]] = or <4 x i16> %a, [[NEG_I]]			// CHECK: [[OR_I:%.*]] = or <4 x i16> %a, [[NEG_I]]
	// CHECK: ret <4 x i16> [[OR_I]]			// CHECK: ret <4 x i16> [[OR_I]]
	int16x4_t test_vorn_s16(int16x4_t a, int16x4_t b) {			int16x4_t test_vorn_s16(int16x4_t a, int16x4_t b) {
	return vorn_s16(a, b);			return vorn_s16(a, b);
	}			}

	// CHECK-LABEL: define <8 x i16> @test_vornq_s16(<8 x i16> %a, <8 x i16> %b) #0 {			// CHECK-LABEL: define <8 x i16> @test_vornq_s16(<8 x i16> %a, <8 x i16> %b) #1 {
	// CHECK: [[NEG_I:%.*]] = xor <8 x i16> %b, <i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1>			// CHECK: [[NEG_I:%.*]] = xor <8 x i16> %b, <i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1>
	// CHECK: [[OR_I:%.*]] = or <8 x i16> %a, [[NEG_I]]			// CHECK: [[OR_I:%.*]] = or <8 x i16> %a, [[NEG_I]]
	// CHECK: ret <8 x i16> [[OR_I]]			// CHECK: ret <8 x i16> [[OR_I]]
	int16x8_t test_vornq_s16(int16x8_t a, int16x8_t b) {			int16x8_t test_vornq_s16(int16x8_t a, int16x8_t b) {
	return vornq_s16(a, b);			return vornq_s16(a, b);
	}			}

	// CHECK-LABEL: define <2 x i32> @test_vorn_s32(<2 x i32> %a, <2 x i32> %b) #0 {			// CHECK-LABEL: define <2 x i32> @test_vorn_s32(<2 x i32> %a, <2 x i32> %b) #0 {
	// CHECK: [[NEG_I:%.*]] = xor <2 x i32> %b, <i32 -1, i32 -1>			// CHECK: [[NEG_I:%.*]] = xor <2 x i32> %b, <i32 -1, i32 -1>
	// CHECK: [[OR_I:%.*]] = or <2 x i32> %a, [[NEG_I]]			// CHECK: [[OR_I:%.*]] = or <2 x i32> %a, [[NEG_I]]
	// CHECK: ret <2 x i32> [[OR_I]]			// CHECK: ret <2 x i32> [[OR_I]]
	int32x2_t test_vorn_s32(int32x2_t a, int32x2_t b) {			int32x2_t test_vorn_s32(int32x2_t a, int32x2_t b) {
	return vorn_s32(a, b);			return vorn_s32(a, b);
	}			}

	// CHECK-LABEL: define <4 x i32> @test_vornq_s32(<4 x i32> %a, <4 x i32> %b) #0 {			// CHECK-LABEL: define <4 x i32> @test_vornq_s32(<4 x i32> %a, <4 x i32> %b) #1 {
	// CHECK: [[NEG_I:%.*]] = xor <4 x i32> %b, <i32 -1, i32 -1, i32 -1, i32 -1>			// CHECK: [[NEG_I:%.*]] = xor <4 x i32> %b, <i32 -1, i32 -1, i32 -1, i32 -1>
	// CHECK: [[OR_I:%.*]] = or <4 x i32> %a, [[NEG_I]]			// CHECK: [[OR_I:%.*]] = or <4 x i32> %a, [[NEG_I]]
	// CHECK: ret <4 x i32> [[OR_I]]			// CHECK: ret <4 x i32> [[OR_I]]
	int32x4_t test_vornq_s32(int32x4_t a, int32x4_t b) {			int32x4_t test_vornq_s32(int32x4_t a, int32x4_t b) {
	return vornq_s32(a, b);			return vornq_s32(a, b);
	}			}

	// CHECK-LABEL: define <1 x i64> @test_vorn_s64(<1 x i64> %a, <1 x i64> %b) #0 {			// CHECK-LABEL: define <1 x i64> @test_vorn_s64(<1 x i64> %a, <1 x i64> %b) #0 {
	// CHECK: [[NEG_I:%.*]] = xor <1 x i64> %b, <i64 -1>			// CHECK: [[NEG_I:%.*]] = xor <1 x i64> %b, <i64 -1>
	// CHECK: [[OR_I:%.*]] = or <1 x i64> %a, [[NEG_I]]			// CHECK: [[OR_I:%.*]] = or <1 x i64> %a, [[NEG_I]]
	// CHECK: ret <1 x i64> [[OR_I]]			// CHECK: ret <1 x i64> [[OR_I]]
	int64x1_t test_vorn_s64(int64x1_t a, int64x1_t b) {			int64x1_t test_vorn_s64(int64x1_t a, int64x1_t b) {
	return vorn_s64(a, b);			return vorn_s64(a, b);
	}			}

	// CHECK-LABEL: define <2 x i64> @test_vornq_s64(<2 x i64> %a, <2 x i64> %b) #0 {			// CHECK-LABEL: define <2 x i64> @test_vornq_s64(<2 x i64> %a, <2 x i64> %b) #1 {
	// CHECK: [[NEG_I:%.*]] = xor <2 x i64> %b, <i64 -1, i64 -1>			// CHECK: [[NEG_I:%.*]] = xor <2 x i64> %b, <i64 -1, i64 -1>
	// CHECK: [[OR_I:%.*]] = or <2 x i64> %a, [[NEG_I]]			// CHECK: [[OR_I:%.*]] = or <2 x i64> %a, [[NEG_I]]
	// CHECK: ret <2 x i64> [[OR_I]]			// CHECK: ret <2 x i64> [[OR_I]]
	int64x2_t test_vornq_s64(int64x2_t a, int64x2_t b) {			int64x2_t test_vornq_s64(int64x2_t a, int64x2_t b) {
	return vornq_s64(a, b);			return vornq_s64(a, b);
	}			}

	// CHECK-LABEL: define <8 x i8> @test_vorn_u8(<8 x i8> %a, <8 x i8> %b) #0 {			// CHECK-LABEL: define <8 x i8> @test_vorn_u8(<8 x i8> %a, <8 x i8> %b) #0 {
	// CHECK: [[NEG_I:%.*]] = xor <8 x i8> %b, <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>			// CHECK: [[NEG_I:%.*]] = xor <8 x i8> %b, <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>
	// CHECK: [[OR_I:%.*]] = or <8 x i8> %a, [[NEG_I]]			// CHECK: [[OR_I:%.*]] = or <8 x i8> %a, [[NEG_I]]
	// CHECK: ret <8 x i8> [[OR_I]]			// CHECK: ret <8 x i8> [[OR_I]]
	uint8x8_t test_vorn_u8(uint8x8_t a, uint8x8_t b) {			uint8x8_t test_vorn_u8(uint8x8_t a, uint8x8_t b) {
	return vorn_u8(a, b);			return vorn_u8(a, b);
	}			}

	// CHECK-LABEL: define <16 x i8> @test_vornq_u8(<16 x i8> %a, <16 x i8> %b) #0 {			// CHECK-LABEL: define <16 x i8> @test_vornq_u8(<16 x i8> %a, <16 x i8> %b) #1 {
	// CHECK: [[NEG_I:%.*]] = xor <16 x i8> %b, <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>			// CHECK: [[NEG_I:%.*]] = xor <16 x i8> %b, <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>
	// CHECK: [[OR_I:%.*]] = or <16 x i8> %a, [[NEG_I]]			// CHECK: [[OR_I:%.*]] = or <16 x i8> %a, [[NEG_I]]
	// CHECK: ret <16 x i8> [[OR_I]]			// CHECK: ret <16 x i8> [[OR_I]]
	uint8x16_t test_vornq_u8(uint8x16_t a, uint8x16_t b) {			uint8x16_t test_vornq_u8(uint8x16_t a, uint8x16_t b) {
	return vornq_u8(a, b);			return vornq_u8(a, b);
	}			}

	// CHECK-LABEL: define <4 x i16> @test_vorn_u16(<4 x i16> %a, <4 x i16> %b) #0 {			// CHECK-LABEL: define <4 x i16> @test_vorn_u16(<4 x i16> %a, <4 x i16> %b) #0 {
	// CHECK: [[NEG_I:%.*]] = xor <4 x i16> %b, <i16 -1, i16 -1, i16 -1, i16 -1>			// CHECK: [[NEG_I:%.*]] = xor <4 x i16> %b, <i16 -1, i16 -1, i16 -1, i16 -1>
	// CHECK: [[OR_I:%.*]] = or <4 x i16> %a, [[NEG_I]]			// CHECK: [[OR_I:%.*]] = or <4 x i16> %a, [[NEG_I]]
	// CHECK: ret <4 x i16> [[OR_I]]			// CHECK: ret <4 x i16> [[OR_I]]
	uint16x4_t test_vorn_u16(uint16x4_t a, uint16x4_t b) {			uint16x4_t test_vorn_u16(uint16x4_t a, uint16x4_t b) {
	return vorn_u16(a, b);			return vorn_u16(a, b);
	}			}

	// CHECK-LABEL: define <8 x i16> @test_vornq_u16(<8 x i16> %a, <8 x i16> %b) #0 {			// CHECK-LABEL: define <8 x i16> @test_vornq_u16(<8 x i16> %a, <8 x i16> %b) #1 {
	// CHECK: [[NEG_I:%.*]] = xor <8 x i16> %b, <i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1>			// CHECK: [[NEG_I:%.*]] = xor <8 x i16> %b, <i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1>
	// CHECK: [[OR_I:%.*]] = or <8 x i16> %a, [[NEG_I]]			// CHECK: [[OR_I:%.*]] = or <8 x i16> %a, [[NEG_I]]
	// CHECK: ret <8 x i16> [[OR_I]]			// CHECK: ret <8 x i16> [[OR_I]]
	uint16x8_t test_vornq_u16(uint16x8_t a, uint16x8_t b) {			uint16x8_t test_vornq_u16(uint16x8_t a, uint16x8_t b) {
	return vornq_u16(a, b);			return vornq_u16(a, b);
	}			}

	// CHECK-LABEL: define <2 x i32> @test_vorn_u32(<2 x i32> %a, <2 x i32> %b) #0 {			// CHECK-LABEL: define <2 x i32> @test_vorn_u32(<2 x i32> %a, <2 x i32> %b) #0 {
	// CHECK: [[NEG_I:%.*]] = xor <2 x i32> %b, <i32 -1, i32 -1>			// CHECK: [[NEG_I:%.*]] = xor <2 x i32> %b, <i32 -1, i32 -1>
	// CHECK: [[OR_I:%.*]] = or <2 x i32> %a, [[NEG_I]]			// CHECK: [[OR_I:%.*]] = or <2 x i32> %a, [[NEG_I]]
	// CHECK: ret <2 x i32> [[OR_I]]			// CHECK: ret <2 x i32> [[OR_I]]
	uint32x2_t test_vorn_u32(uint32x2_t a, uint32x2_t b) {			uint32x2_t test_vorn_u32(uint32x2_t a, uint32x2_t b) {
	return vorn_u32(a, b);			return vorn_u32(a, b);
	}			}

	// CHECK-LABEL: define <4 x i32> @test_vornq_u32(<4 x i32> %a, <4 x i32> %b) #0 {			// CHECK-LABEL: define <4 x i32> @test_vornq_u32(<4 x i32> %a, <4 x i32> %b) #1 {
	// CHECK: [[NEG_I:%.*]] = xor <4 x i32> %b, <i32 -1, i32 -1, i32 -1, i32 -1>			// CHECK: [[NEG_I:%.*]] = xor <4 x i32> %b, <i32 -1, i32 -1, i32 -1, i32 -1>
	// CHECK: [[OR_I:%.*]] = or <4 x i32> %a, [[NEG_I]]			// CHECK: [[OR_I:%.*]] = or <4 x i32> %a, [[NEG_I]]
	// CHECK: ret <4 x i32> [[OR_I]]			// CHECK: ret <4 x i32> [[OR_I]]
	uint32x4_t test_vornq_u32(uint32x4_t a, uint32x4_t b) {			uint32x4_t test_vornq_u32(uint32x4_t a, uint32x4_t b) {
	return vornq_u32(a, b);			return vornq_u32(a, b);
	}			}

	// CHECK-LABEL: define <1 x i64> @test_vorn_u64(<1 x i64> %a, <1 x i64> %b) #0 {			// CHECK-LABEL: define <1 x i64> @test_vorn_u64(<1 x i64> %a, <1 x i64> %b) #0 {
	// CHECK: [[NEG_I:%.*]] = xor <1 x i64> %b, <i64 -1>			// CHECK: [[NEG_I:%.*]] = xor <1 x i64> %b, <i64 -1>
	// CHECK: [[OR_I:%.*]] = or <1 x i64> %a, [[NEG_I]]			// CHECK: [[OR_I:%.*]] = or <1 x i64> %a, [[NEG_I]]
	// CHECK: ret <1 x i64> [[OR_I]]			// CHECK: ret <1 x i64> [[OR_I]]
	uint64x1_t test_vorn_u64(uint64x1_t a, uint64x1_t b) {			uint64x1_t test_vorn_u64(uint64x1_t a, uint64x1_t b) {
	return vorn_u64(a, b);			return vorn_u64(a, b);
	}			}

	// CHECK-LABEL: define <2 x i64> @test_vornq_u64(<2 x i64> %a, <2 x i64> %b) #0 {			// CHECK-LABEL: define <2 x i64> @test_vornq_u64(<2 x i64> %a, <2 x i64> %b) #1 {
	// CHECK: [[NEG_I:%.*]] = xor <2 x i64> %b, <i64 -1, i64 -1>			// CHECK: [[NEG_I:%.*]] = xor <2 x i64> %b, <i64 -1, i64 -1>
	// CHECK: [[OR_I:%.*]] = or <2 x i64> %a, [[NEG_I]]			// CHECK: [[OR_I:%.*]] = or <2 x i64> %a, [[NEG_I]]
	// CHECK: ret <2 x i64> [[OR_I]]			// CHECK: ret <2 x i64> [[OR_I]]
	uint64x2_t test_vornq_u64(uint64x2_t a, uint64x2_t b) {			uint64x2_t test_vornq_u64(uint64x2_t a, uint64x2_t b) {
	return vornq_u64(a, b);			return vornq_u64(a, b);
	}			}

				// CHECK: attributes #0 ={{.*}}"min-legal-vector-width"="64"
				// CHECK: attributes #1 ={{.*}}"min-legal-vector-width"="128"
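
The two attribute checks above are where this file actually pins down "min-legal-vector-width": functions taking and returning 64-bit NEON vectors share #0, and the 128-bit "q" variants share #1. As a rough illustration of that rule only, and not the code from CGCall.cpp or CodeGenFunction.cpp, the width can be modelled as the largest vector size among a function's return and argument types. The struct vec_type and the function names below are hypothetical, and the sketch ignores the types of called functions, which the patch summary says are also taken into account.

```c
#include <stddef.h>

/* Hypothetical model of an IR type <num_elts x iN>.
 * num_elts == 0 stands for a non-vector type (scalar or void). */
struct vec_type {
    unsigned num_elts;
    unsigned elt_bits;
};

/* Total size of the vector in bits; 0 for non-vector types. */
static unsigned vec_bits(struct vec_type t) {
    return t.num_elts * t.elt_bits;
}

/* Assumed rule: the attribute value is the widest vector seen among
 * the return type and the argument types. */
unsigned min_legal_vector_width(struct vec_type ret,
                                const struct vec_type *args,
                                size_t nargs) {
    unsigned widest = vec_bits(ret);
    for (size_t i = 0; i < nargs; ++i) {
        unsigned w = vec_bits(args[i]);
        if (w > widest)
            widest = w;
    }
    return widest;
}
```

Under this model a function such as test_veor_s32, with <2 x i32> arguments and return type, yields 64, and test_veorq_s32, with <4 x i32>, yields 128, which matches the #0 and #1 definitions checked above.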

test/CodeGen/aarch64-neon-across.c

	// RUN: %clang_cc1 -triple arm64-none-linux-gnu -target-feature +neon \			// RUN: %clang_cc1 -triple arm64-none-linux-gnu -target-feature +neon \
	// RUN: -disable-O0-optnone -emit-llvm -o - %s | opt -S -mem2reg | FileCheck %s			// RUN: -disable-O0-optnone -emit-llvm -o - %s | opt -S -mem2reg | FileCheck %s

	// Test new aarch64 intrinsics and types			// Test new aarch64 intrinsics and types

	#include <arm_neon.h>			#include <arm_neon.h>

	// CHECK-LABEL: define i16 @test_vaddlv_s8(<8 x i8> %a) #0 {			// CHECK-LABEL: define i16 @test_vaddlv_s8(<8 x i8> %a) #0 {
	// CHECK: [[VADDLV_I:%.*]] = call i32 @llvm.aarch64.neon.saddlv.i32.v8i8(<8 x i8> %a) #2			// CHECK: [[VADDLV_I:%.*]] = call i32 @llvm.aarch64.neon.saddlv.i32.v8i8(<8 x i8> %a) #3
	// CHECK: [[TMP0:%.*]] = trunc i32 [[VADDLV_I]] to i16			// CHECK: [[TMP0:%.*]] = trunc i32 [[VADDLV_I]] to i16
	// CHECK: ret i16 [[TMP0]]			// CHECK: ret i16 [[TMP0]]
	int16_t test_vaddlv_s8(int8x8_t a) {			int16_t test_vaddlv_s8(int8x8_t a) {
	return vaddlv_s8(a);			return vaddlv_s8(a);
	}			}

	// CHECK-LABEL: define i32 @test_vaddlv_s16(<4 x i16> %a) #0 {			// CHECK-LABEL: define i32 @test_vaddlv_s16(<4 x i16> %a) #0 {
	// CHECK: [[VADDLV_I:%.*]] = call i32 @llvm.aarch64.neon.saddlv.i32.v4i16(<4 x i16> %a) #2			// CHECK: [[VADDLV_I:%.*]] = call i32 @llvm.aarch64.neon.saddlv.i32.v4i16(<4 x i16> %a) #3
	// CHECK: ret i32 [[VADDLV_I]]			// CHECK: ret i32 [[VADDLV_I]]
	int32_t test_vaddlv_s16(int16x4_t a) {			int32_t test_vaddlv_s16(int16x4_t a) {
	return vaddlv_s16(a);			return vaddlv_s16(a);
	}			}

	// CHECK-LABEL: define i16 @test_vaddlv_u8(<8 x i8> %a) #0 {			// CHECK-LABEL: define i16 @test_vaddlv_u8(<8 x i8> %a) #0 {
	// CHECK: [[VADDLV_I:%.*]] = call i32 @llvm.aarch64.neon.uaddlv.i32.v8i8(<8 x i8> %a) #2			// CHECK: [[VADDLV_I:%.*]] = call i32 @llvm.aarch64.neon.uaddlv.i32.v8i8(<8 x i8> %a) #3
	// CHECK: [[TMP0:%.*]] = trunc i32 [[VADDLV_I]] to i16			// CHECK: [[TMP0:%.*]] = trunc i32 [[VADDLV_I]] to i16
	// CHECK: ret i16 [[TMP0]]			// CHECK: ret i16 [[TMP0]]
	uint16_t test_vaddlv_u8(uint8x8_t a) {			uint16_t test_vaddlv_u8(uint8x8_t a) {
	return vaddlv_u8(a);			return vaddlv_u8(a);
	}			}

	// CHECK-LABEL: define i32 @test_vaddlv_u16(<4 x i16> %a) #0 {			// CHECK-LABEL: define i32 @test_vaddlv_u16(<4 x i16> %a) #0 {
	// CHECK: [[VADDLV_I:%.*]] = call i32 @llvm.aarch64.neon.uaddlv.i32.v4i16(<4 x i16> %a) #2			// CHECK: [[VADDLV_I:%.*]] = call i32 @llvm.aarch64.neon.uaddlv.i32.v4i16(<4 x i16> %a) #3
	// CHECK: ret i32 [[VADDLV_I]]			// CHECK: ret i32 [[VADDLV_I]]
	uint32_t test_vaddlv_u16(uint16x4_t a) {			uint32_t test_vaddlv_u16(uint16x4_t a) {
	return vaddlv_u16(a);			return vaddlv_u16(a);
	}			}

	// CHECK-LABEL: define i16 @test_vaddlvq_s8(<16 x i8> %a) #0 {			// CHECK-LABEL: define i16 @test_vaddlvq_s8(<16 x i8> %a) #1 {
	// CHECK: [[VADDLV_I:%.*]] = call i32 @llvm.aarch64.neon.saddlv.i32.v16i8(<16 x i8> %a) #2			// CHECK: [[VADDLV_I:%.*]] = call i32 @llvm.aarch64.neon.saddlv.i32.v16i8(<16 x i8> %a) #3
	// CHECK: [[TMP0:%.*]] = trunc i32 [[VADDLV_I]] to i16			// CHECK: [[TMP0:%.*]] = trunc i32 [[VADDLV_I]] to i16
	// CHECK: ret i16 [[TMP0]]			// CHECK: ret i16 [[TMP0]]
	int16_t test_vaddlvq_s8(int8x16_t a) {			int16_t test_vaddlvq_s8(int8x16_t a) {
	return vaddlvq_s8(a);			return vaddlvq_s8(a);
	}			}

	// CHECK-LABEL: define i32 @test_vaddlvq_s16(<8 x i16> %a) #0 {			// CHECK-LABEL: define i32 @test_vaddlvq_s16(<8 x i16> %a) #1 {
	// CHECK: [[VADDLV_I:%.*]] = call i32 @llvm.aarch64.neon.saddlv.i32.v8i16(<8 x i16> %a) #2			// CHECK: [[VADDLV_I:%.*]] = call i32 @llvm.aarch64.neon.saddlv.i32.v8i16(<8 x i16> %a) #3
	// CHECK: ret i32 [[VADDLV_I]]			// CHECK: ret i32 [[VADDLV_I]]
	int32_t test_vaddlvq_s16(int16x8_t a) {			int32_t test_vaddlvq_s16(int16x8_t a) {
	return vaddlvq_s16(a);			return vaddlvq_s16(a);
	}			}

	// CHECK-LABEL: define i64 @test_vaddlvq_s32(<4 x i32> %a) #0 {			// CHECK-LABEL: define i64 @test_vaddlvq_s32(<4 x i32> %a) #1 {
	// CHECK: [[VADDLVQ_S32_I:%.*]] = call i64 @llvm.aarch64.neon.saddlv.i64.v4i32(<4 x i32> %a) #2			// CHECK: [[VADDLVQ_S32_I:%.*]] = call i64 @llvm.aarch64.neon.saddlv.i64.v4i32(<4 x i32> %a) #3
	// CHECK: ret i64 [[VADDLVQ_S32_I]]			// CHECK: ret i64 [[VADDLVQ_S32_I]]
	int64_t test_vaddlvq_s32(int32x4_t a) {			int64_t test_vaddlvq_s32(int32x4_t a) {
	return vaddlvq_s32(a);			return vaddlvq_s32(a);
	}			}

	// CHECK-LABEL: define i16 @test_vaddlvq_u8(<16 x i8> %a) #0 {			// CHECK-LABEL: define i16 @test_vaddlvq_u8(<16 x i8> %a) #1 {
	// CHECK: [[VADDLV_I:%.*]] = call i32 @llvm.aarch64.neon.uaddlv.i32.v16i8(<16 x i8> %a) #2			// CHECK: [[VADDLV_I:%.*]] = call i32 @llvm.aarch64.neon.uaddlv.i32.v16i8(<16 x i8> %a) #3
	// CHECK: [[TMP0:%.*]] = trunc i32 [[VADDLV_I]] to i16			// CHECK: [[TMP0:%.*]] = trunc i32 [[VADDLV_I]] to i16
	// CHECK: ret i16 [[TMP0]]			// CHECK: ret i16 [[TMP0]]
	uint16_t test_vaddlvq_u8(uint8x16_t a) {			uint16_t test_vaddlvq_u8(uint8x16_t a) {
	return vaddlvq_u8(a);			return vaddlvq_u8(a);
	}			}

	// CHECK-LABEL: define i32 @test_vaddlvq_u16(<8 x i16> %a) #0 {			// CHECK-LABEL: define i32 @test_vaddlvq_u16(<8 x i16> %a) #1 {
	// CHECK: [[VADDLV_I:%.*]] = call i32 @llvm.aarch64.neon.uaddlv.i32.v8i16(<8 x i16> %a) #2			// CHECK: [[VADDLV_I:%.*]] = call i32 @llvm.aarch64.neon.uaddlv.i32.v8i16(<8 x i16> %a) #3
	// CHECK: ret i32 [[VADDLV_I]]			// CHECK: ret i32 [[VADDLV_I]]
	uint32_t test_vaddlvq_u16(uint16x8_t a) {			uint32_t test_vaddlvq_u16(uint16x8_t a) {
	return vaddlvq_u16(a);			return vaddlvq_u16(a);
	}			}

	// CHECK-LABEL: define i64 @test_vaddlvq_u32(<4 x i32> %a) #0 {			// CHECK-LABEL: define i64 @test_vaddlvq_u32(<4 x i32> %a) #1 {
	// CHECK: [[VADDLVQ_U32_I:%.*]] = call i64 @llvm.aarch64.neon.uaddlv.i64.v4i32(<4 x i32> %a) #2			// CHECK: [[VADDLVQ_U32_I:%.*]] = call i64 @llvm.aarch64.neon.uaddlv.i64.v4i32(<4 x i32> %a) #3
	// CHECK: ret i64 [[VADDLVQ_U32_I]]			// CHECK: ret i64 [[VADDLVQ_U32_I]]
	uint64_t test_vaddlvq_u32(uint32x4_t a) {			uint64_t test_vaddlvq_u32(uint32x4_t a) {
	return vaddlvq_u32(a);			return vaddlvq_u32(a);
	}			}

	// CHECK-LABEL: define i8 @test_vmaxv_s8(<8 x i8> %a) #0 {			// CHECK-LABEL: define i8 @test_vmaxv_s8(<8 x i8> %a) #0 {
	// CHECK: [[VMAXV_I:%.*]] = call i32 @llvm.aarch64.neon.smaxv.i32.v8i8(<8 x i8> %a) #2			// CHECK: [[VMAXV_I:%.*]] = call i32 @llvm.aarch64.neon.smaxv.i32.v8i8(<8 x i8> %a) #3
	// CHECK: [[TMP0:%.*]] = trunc i32 [[VMAXV_I]] to i8			// CHECK: [[TMP0:%.*]] = trunc i32 [[VMAXV_I]] to i8
	// CHECK: ret i8 [[TMP0]]			// CHECK: ret i8 [[TMP0]]
	int8_t test_vmaxv_s8(int8x8_t a) {			int8_t test_vmaxv_s8(int8x8_t a) {
	return vmaxv_s8(a);			return vmaxv_s8(a);
	}			}

	// CHECK-LABEL: define i16 @test_vmaxv_s16(<4 x i16> %a) #0 {			// CHECK-LABEL: define i16 @test_vmaxv_s16(<4 x i16> %a) #0 {
	// CHECK: [[VMAXV_I:%.*]] = call i32 @llvm.aarch64.neon.smaxv.i32.v4i16(<4 x i16> %a) #2			// CHECK: [[VMAXV_I:%.*]] = call i32 @llvm.aarch64.neon.smaxv.i32.v4i16(<4 x i16> %a) #3
	// CHECK: [[TMP2:%.*]] = trunc i32 [[VMAXV_I]] to i16			// CHECK: [[TMP2:%.*]] = trunc i32 [[VMAXV_I]] to i16
	// CHECK: ret i16 [[TMP2]]			// CHECK: ret i16 [[TMP2]]
	int16_t test_vmaxv_s16(int16x4_t a) {			int16_t test_vmaxv_s16(int16x4_t a) {
	return vmaxv_s16(a);			return vmaxv_s16(a);
	}			}

	// CHECK-LABEL: define i8 @test_vmaxv_u8(<8 x i8> %a) #0 {			// CHECK-LABEL: define i8 @test_vmaxv_u8(<8 x i8> %a) #0 {
	// CHECK: [[VMAXV_I:%.*]] = call i32 @llvm.aarch64.neon.umaxv.i32.v8i8(<8 x i8> %a) #2			// CHECK: [[VMAXV_I:%.*]] = call i32 @llvm.aarch64.neon.umaxv.i32.v8i8(<8 x i8> %a) #3
	// CHECK: [[TMP0:%.*]] = trunc i32 [[VMAXV_I]] to i8			// CHECK: [[TMP0:%.*]] = trunc i32 [[VMAXV_I]] to i8
	// CHECK: ret i8 [[TMP0]]			// CHECK: ret i8 [[TMP0]]
	uint8_t test_vmaxv_u8(uint8x8_t a) {			uint8_t test_vmaxv_u8(uint8x8_t a) {
	return vmaxv_u8(a);			return vmaxv_u8(a);
	}			}

	// CHECK-LABEL: define i16 @test_vmaxv_u16(<4 x i16> %a) #0 {			// CHECK-LABEL: define i16 @test_vmaxv_u16(<4 x i16> %a) #0 {
	// CHECK: [[VMAXV_I:%.*]] = call i32 @llvm.aarch64.neon.umaxv.i32.v4i16(<4 x i16> %a) #2			// CHECK: [[VMAXV_I:%.*]] = call i32 @llvm.aarch64.neon.umaxv.i32.v4i16(<4 x i16> %a) #3
	// CHECK: [[TMP2:%.*]] = trunc i32 [[VMAXV_I]] to i16			// CHECK: [[TMP2:%.*]] = trunc i32 [[VMAXV_I]] to i16
	// CHECK: ret i16 [[TMP2]]			// CHECK: ret i16 [[TMP2]]
	uint16_t test_vmaxv_u16(uint16x4_t a) {			uint16_t test_vmaxv_u16(uint16x4_t a) {
	return vmaxv_u16(a);			return vmaxv_u16(a);
	}			}

	// CHECK-LABEL: define i8 @test_vmaxvq_s8(<16 x i8> %a) #0 {			// CHECK-LABEL: define i8 @test_vmaxvq_s8(<16 x i8> %a) #1 {
	// CHECK: [[VMAXV_I:%.*]] = call i32 @llvm.aarch64.neon.smaxv.i32.v16i8(<16 x i8> %a) #2			// CHECK: [[VMAXV_I:%.*]] = call i32 @llvm.aarch64.neon.smaxv.i32.v16i8(<16 x i8> %a) #3
	// CHECK: [[TMP0:%.*]] = trunc i32 [[VMAXV_I]] to i8			// CHECK: [[TMP0:%.*]] = trunc i32 [[VMAXV_I]] to i8
	// CHECK: ret i8 [[TMP0]]			// CHECK: ret i8 [[TMP0]]
	int8_t test_vmaxvq_s8(int8x16_t a) {			int8_t test_vmaxvq_s8(int8x16_t a) {
	return vmaxvq_s8(a);			return vmaxvq_s8(a);
	}			}

	// CHECK-LABEL: define i16 @test_vmaxvq_s16(<8 x i16> %a) #0 {			// CHECK-LABEL: define i16 @test_vmaxvq_s16(<8 x i16> %a) #1 {
	// CHECK: [[VMAXV_I:%.*]] = call i32 @llvm.aarch64.neon.smaxv.i32.v8i16(<8 x i16> %a) #2			// CHECK: [[VMAXV_I:%.*]] = call i32 @llvm.aarch64.neon.smaxv.i32.v8i16(<8 x i16> %a) #3
	// CHECK: [[TMP2:%.*]] = trunc i32 [[VMAXV_I]] to i16			// CHECK: [[TMP2:%.*]] = trunc i32 [[VMAXV_I]] to i16
	// CHECK: ret i16 [[TMP2]]			// CHECK: ret i16 [[TMP2]]
	int16_t test_vmaxvq_s16(int16x8_t a) {			int16_t test_vmaxvq_s16(int16x8_t a) {
	return vmaxvq_s16(a);			return vmaxvq_s16(a);
	}			}

	// CHECK-LABEL: define i32 @test_vmaxvq_s32(<4 x i32> %a) #0 {			// CHECK-LABEL: define i32 @test_vmaxvq_s32(<4 x i32> %a) #1 {
	// CHECK: [[VMAXVQ_S32_I:%.*]] = call i32 @llvm.aarch64.neon.smaxv.i32.v4i32(<4 x i32> %a) #2			// CHECK: [[VMAXVQ_S32_I:%.*]] = call i32 @llvm.aarch64.neon.smaxv.i32.v4i32(<4 x i32> %a) #3
	// CHECK: ret i32 [[VMAXVQ_S32_I]]			// CHECK: ret i32 [[VMAXVQ_S32_I]]
	int32_t test_vmaxvq_s32(int32x4_t a) {			int32_t test_vmaxvq_s32(int32x4_t a) {
	return vmaxvq_s32(a);			return vmaxvq_s32(a);
	}			}

	// CHECK-LABEL: define i8 @test_vmaxvq_u8(<16 x i8> %a) #0 {			// CHECK-LABEL: define i8 @test_vmaxvq_u8(<16 x i8> %a) #1 {
	// CHECK: [[VMAXV_I:%.*]] = call i32 @llvm.aarch64.neon.umaxv.i32.v16i8(<16 x i8> %a) #2			// CHECK: [[VMAXV_I:%.*]] = call i32 @llvm.aarch64.neon.umaxv.i32.v16i8(<16 x i8> %a) #3
	// CHECK: [[TMP0:%.*]] = trunc i32 [[VMAXV_I]] to i8			// CHECK: [[TMP0:%.*]] = trunc i32 [[VMAXV_I]] to i8
	// CHECK: ret i8 [[TMP0]]			// CHECK: ret i8 [[TMP0]]
	uint8_t test_vmaxvq_u8(uint8x16_t a) {			uint8_t test_vmaxvq_u8(uint8x16_t a) {
	return vmaxvq_u8(a);			return vmaxvq_u8(a);
	}			}

	// CHECK-LABEL: define i16 @test_vmaxvq_u16(<8 x i16> %a) #0 {			// CHECK-LABEL: define i16 @test_vmaxvq_u16(<8 x i16> %a) #1 {
	// CHECK: [[VMAXV_I:%.*]] = call i32 @llvm.aarch64.neon.umaxv.i32.v8i16(<8 x i16> %a) #2			// CHECK: [[VMAXV_I:%.*]] = call i32 @llvm.aarch64.neon.umaxv.i32.v8i16(<8 x i16> %a) #3
	// CHECK: [[TMP2:%.*]] = trunc i32 [[VMAXV_I]] to i16			// CHECK: [[TMP2:%.*]] = trunc i32 [[VMAXV_I]] to i16
	// CHECK: ret i16 [[TMP2]]			// CHECK: ret i16 [[TMP2]]
	uint16_t test_vmaxvq_u16(uint16x8_t a) {			uint16_t test_vmaxvq_u16(uint16x8_t a) {
	return vmaxvq_u16(a);			return vmaxvq_u16(a);
	}			}

	// CHECK-LABEL: define i32 @test_vmaxvq_u32(<4 x i32> %a) #0 {			// CHECK-LABEL: define i32 @test_vmaxvq_u32(<4 x i32> %a) #1 {
	// CHECK: [[VMAXVQ_U32_I:%.*]] = call i32 @llvm.aarch64.neon.umaxv.i32.v4i32(<4 x i32> %a) #2			// CHECK: [[VMAXVQ_U32_I:%.*]] = call i32 @llvm.aarch64.neon.umaxv.i32.v4i32(<4 x i32> %a) #3
	// CHECK: ret i32 [[VMAXVQ_U32_I]]			// CHECK: ret i32 [[VMAXVQ_U32_I]]
	uint32_t test_vmaxvq_u32(uint32x4_t a) {			uint32_t test_vmaxvq_u32(uint32x4_t a) {
	return vmaxvq_u32(a);			return vmaxvq_u32(a);
	}			}

	// CHECK-LABEL: define i8 @test_vminv_s8(<8 x i8> %a) #0 {			// CHECK-LABEL: define i8 @test_vminv_s8(<8 x i8> %a) #0 {
	// CHECK: [[VMINV_I:%.*]] = call i32 @llvm.aarch64.neon.sminv.i32.v8i8(<8 x i8> %a) #2			// CHECK: [[VMINV_I:%.*]] = call i32 @llvm.aarch64.neon.sminv.i32.v8i8(<8 x i8> %a) #3
	// CHECK: [[TMP0:%.*]] = trunc i32 [[VMINV_I]] to i8			// CHECK: [[TMP0:%.*]] = trunc i32 [[VMINV_I]] to i8
	// CHECK: ret i8 [[TMP0]]			// CHECK: ret i8 [[TMP0]]
	int8_t test_vminv_s8(int8x8_t a) {			int8_t test_vminv_s8(int8x8_t a) {
	return vminv_s8(a);			return vminv_s8(a);
	}			}

	// CHECK-LABEL: define i16 @test_vminv_s16(<4 x i16> %a) #0 {			// CHECK-LABEL: define i16 @test_vminv_s16(<4 x i16> %a) #0 {
	// CHECK: [[VMINV_I:%.*]] = call i32 @llvm.aarch64.neon.sminv.i32.v4i16(<4 x i16> %a) #2			// CHECK: [[VMINV_I:%.*]] = call i32 @llvm.aarch64.neon.sminv.i32.v4i16(<4 x i16> %a) #3
	// CHECK: [[TMP2:%.*]] = trunc i32 [[VMINV_I]] to i16			// CHECK: [[TMP2:%.*]] = trunc i32 [[VMINV_I]] to i16
	// CHECK: ret i16 [[TMP2]]			// CHECK: ret i16 [[TMP2]]
	int16_t test_vminv_s16(int16x4_t a) {			int16_t test_vminv_s16(int16x4_t a) {
	return vminv_s16(a);			return vminv_s16(a);
	}			}

	// CHECK-LABEL: define i8 @test_vminv_u8(<8 x i8> %a) #0 {			// CHECK-LABEL: define i8 @test_vminv_u8(<8 x i8> %a) #0 {
	// CHECK: [[VMINV_I:%.*]] = call i32 @llvm.aarch64.neon.uminv.i32.v8i8(<8 x i8> %a) #2			// CHECK: [[VMINV_I:%.*]] = call i32 @llvm.aarch64.neon.uminv.i32.v8i8(<8 x i8> %a) #3
	// CHECK: [[TMP0:%.*]] = trunc i32 [[VMINV_I]] to i8			// CHECK: [[TMP0:%.*]] = trunc i32 [[VMINV_I]] to i8
	// CHECK: ret i8 [[TMP0]]			// CHECK: ret i8 [[TMP0]]
	uint8_t test_vminv_u8(uint8x8_t a) {			uint8_t test_vminv_u8(uint8x8_t a) {
	return vminv_u8(a);			return vminv_u8(a);
	}			}

	// CHECK-LABEL: define i16 @test_vminv_u16(<4 x i16> %a) #0 {			// CHECK-LABEL: define i16 @test_vminv_u16(<4 x i16> %a) #0 {
	// CHECK: [[VMINV_I:%.*]] = call i32 @llvm.aarch64.neon.uminv.i32.v4i16(<4 x i16> %a) #2			// CHECK: [[VMINV_I:%.*]] = call i32 @llvm.aarch64.neon.uminv.i32.v4i16(<4 x i16> %a) #3
	// CHECK: [[TMP2:%.*]] = trunc i32 [[VMINV_I]] to i16			// CHECK: [[TMP2:%.*]] = trunc i32 [[VMINV_I]] to i16
	// CHECK: ret i16 [[TMP2]]			// CHECK: ret i16 [[TMP2]]
	uint16_t test_vminv_u16(uint16x4_t a) {			uint16_t test_vminv_u16(uint16x4_t a) {
	return vminv_u16(a);			return vminv_u16(a);
	}			}

	// CHECK-LABEL: define i8 @test_vminvq_s8(<16 x i8> %a) #0 {			// CHECK-LABEL: define i8 @test_vminvq_s8(<16 x i8> %a) #1 {
	// CHECK: [[VMINV_I:%.*]] = call i32 @llvm.aarch64.neon.sminv.i32.v16i8(<16 x i8> %a) #2			// CHECK: [[VMINV_I:%.*]] = call i32 @llvm.aarch64.neon.sminv.i32.v16i8(<16 x i8> %a) #3
	// CHECK: [[TMP0:%.*]] = trunc i32 [[VMINV_I]] to i8			// CHECK: [[TMP0:%.*]] = trunc i32 [[VMINV_I]] to i8
	// CHECK: ret i8 [[TMP0]]			// CHECK: ret i8 [[TMP0]]
	int8_t test_vminvq_s8(int8x16_t a) {			int8_t test_vminvq_s8(int8x16_t a) {
	return vminvq_s8(a);			return vminvq_s8(a);
	}			}

	// CHECK-LABEL: define i16 @test_vminvq_s16(<8 x i16> %a) #0 {			// CHECK-LABEL: define i16 @test_vminvq_s16(<8 x i16> %a) #1 {
	// CHECK: [[VMINV_I:%.*]] = call i32 @llvm.aarch64.neon.sminv.i32.v8i16(<8 x i16> %a) #2			// CHECK: [[VMINV_I:%.*]] = call i32 @llvm.aarch64.neon.sminv.i32.v8i16(<8 x i16> %a) #3
	// CHECK: [[TMP2:%.*]] = trunc i32 [[VMINV_I]] to i16			// CHECK: [[TMP2:%.*]] = trunc i32 [[VMINV_I]] to i16
	// CHECK: ret i16 [[TMP2]]			// CHECK: ret i16 [[TMP2]]
	int16_t test_vminvq_s16(int16x8_t a) {			int16_t test_vminvq_s16(int16x8_t a) {
	return vminvq_s16(a);			return vminvq_s16(a);
	}			}

	// CHECK-LABEL: define i32 @test_vminvq_s32(<4 x i32> %a) #0 {			// CHECK-LABEL: define i32 @test_vminvq_s32(<4 x i32> %a) #1 {
	// CHECK: [[VMINVQ_S32_I:%.*]] = call i32 @llvm.aarch64.neon.sminv.i32.v4i32(<4 x i32> %a) #2			// CHECK: [[VMINVQ_S32_I:%.*]] = call i32 @llvm.aarch64.neon.sminv.i32.v4i32(<4 x i32> %a) #3
	// CHECK: ret i32 [[VMINVQ_S32_I]]			// CHECK: ret i32 [[VMINVQ_S32_I]]
	int32_t test_vminvq_s32(int32x4_t a) {			int32_t test_vminvq_s32(int32x4_t a) {
	return vminvq_s32(a);			return vminvq_s32(a);
	}			}

	// CHECK-LABEL: define i8 @test_vminvq_u8(<16 x i8> %a) #0 {			// CHECK-LABEL: define i8 @test_vminvq_u8(<16 x i8> %a) #1 {
	// CHECK: [[VMINV_I:%.*]] = call i32 @llvm.aarch64.neon.uminv.i32.v16i8(<16 x i8> %a) #2			// CHECK: [[VMINV_I:%.*]] = call i32 @llvm.aarch64.neon.uminv.i32.v16i8(<16 x i8> %a) #3
	// CHECK: [[TMP0:%.*]] = trunc i32 [[VMINV_I]] to i8			// CHECK: [[TMP0:%.*]] = trunc i32 [[VMINV_I]] to i8
	// CHECK: ret i8 [[TMP0]]			// CHECK: ret i8 [[TMP0]]
	uint8_t test_vminvq_u8(uint8x16_t a) {			uint8_t test_vminvq_u8(uint8x16_t a) {
	return vminvq_u8(a);			return vminvq_u8(a);
	}			}

	// CHECK-LABEL: define i16 @test_vminvq_u16(<8 x i16> %a) #0 {			// CHECK-LABEL: define i16 @test_vminvq_u16(<8 x i16> %a) #1 {
	// CHECK: [[VMINV_I:%.*]] = call i32 @llvm.aarch64.neon.uminv.i32.v8i16(<8 x i16> %a) #2			// CHECK: [[VMINV_I:%.*]] = call i32 @llvm.aarch64.neon.uminv.i32.v8i16(<8 x i16> %a) #3
	// CHECK: [[TMP2:%.*]] = trunc i32 [[VMINV_I]] to i16			// CHECK: [[TMP2:%.*]] = trunc i32 [[VMINV_I]] to i16
	// CHECK: ret i16 [[TMP2]]			// CHECK: ret i16 [[TMP2]]
	uint16_t test_vminvq_u16(uint16x8_t a) {			uint16_t test_vminvq_u16(uint16x8_t a) {
	return vminvq_u16(a);			return vminvq_u16(a);
	}			}

	// CHECK-LABEL: define i32 @test_vminvq_u32(<4 x i32> %a) #0 {			// CHECK-LABEL: define i32 @test_vminvq_u32(<4 x i32> %a) #1 {
	// CHECK: [[VMINVQ_U32_I:%.*]] = call i32 @llvm.aarch64.neon.uminv.i32.v4i32(<4 x i32> %a) #2			// CHECK: [[VMINVQ_U32_I:%.*]] = call i32 @llvm.aarch64.neon.uminv.i32.v4i32(<4 x i32> %a) #3
	// CHECK: ret i32 [[VMINVQ_U32_I]]			// CHECK: ret i32 [[VMINVQ_U32_I]]
	uint32_t test_vminvq_u32(uint32x4_t a) {			uint32_t test_vminvq_u32(uint32x4_t a) {
	return vminvq_u32(a);			return vminvq_u32(a);
	}			}

	// CHECK-LABEL: define i8 @test_vaddv_s8(<8 x i8> %a) #0 {			// CHECK-LABEL: define i8 @test_vaddv_s8(<8 x i8> %a) #0 {
	// CHECK: [[VADDV_I:%.*]] = call i32 @llvm.aarch64.neon.saddv.i32.v8i8(<8 x i8> %a) #2			// CHECK: [[VADDV_I:%.*]] = call i32 @llvm.aarch64.neon.saddv.i32.v8i8(<8 x i8> %a) #3
	// CHECK: [[TMP0:%.*]] = trunc i32 [[VADDV_I]] to i8			// CHECK: [[TMP0:%.*]] = trunc i32 [[VADDV_I]] to i8
	// CHECK: ret i8 [[TMP0]]			// CHECK: ret i8 [[TMP0]]
	int8_t test_vaddv_s8(int8x8_t a) {			int8_t test_vaddv_s8(int8x8_t a) {
	return vaddv_s8(a);			return vaddv_s8(a);
	}			}

	// CHECK-LABEL: define i16 @test_vaddv_s16(<4 x i16> %a) #0 {			// CHECK-LABEL: define i16 @test_vaddv_s16(<4 x i16> %a) #0 {
	// CHECK: [[VADDV_I:%.*]] = call i32 @llvm.aarch64.neon.saddv.i32.v4i16(<4 x i16> %a) #2			// CHECK: [[VADDV_I:%.*]] = call i32 @llvm.aarch64.neon.saddv.i32.v4i16(<4 x i16> %a) #3
	// CHECK: [[TMP2:%.*]] = trunc i32 [[VADDV_I]] to i16			// CHECK: [[TMP2:%.*]] = trunc i32 [[VADDV_I]] to i16
	// CHECK: ret i16 [[TMP2]]			// CHECK: ret i16 [[TMP2]]
	int16_t test_vaddv_s16(int16x4_t a) {			int16_t test_vaddv_s16(int16x4_t a) {
	return vaddv_s16(a);			return vaddv_s16(a);
	}			}

	// CHECK-LABEL: define i8 @test_vaddv_u8(<8 x i8> %a) #0 {			// CHECK-LABEL: define i8 @test_vaddv_u8(<8 x i8> %a) #0 {
	// CHECK: [[VADDV_I:%.*]] = call i32 @llvm.aarch64.neon.uaddv.i32.v8i8(<8 x i8> %a) #2			// CHECK: [[VADDV_I:%.*]] = call i32 @llvm.aarch64.neon.uaddv.i32.v8i8(<8 x i8> %a) #3
	// CHECK: [[TMP0:%.*]] = trunc i32 [[VADDV_I]] to i8			// CHECK: [[TMP0:%.*]] = trunc i32 [[VADDV_I]] to i8
	// CHECK: ret i8 [[TMP0]]			// CHECK: ret i8 [[TMP0]]
	uint8_t test_vaddv_u8(uint8x8_t a) {			uint8_t test_vaddv_u8(uint8x8_t a) {
	return vaddv_u8(a);			return vaddv_u8(a);
	}			}

	// CHECK-LABEL: define i16 @test_vaddv_u16(<4 x i16> %a) #0 {			// CHECK-LABEL: define i16 @test_vaddv_u16(<4 x i16> %a) #0 {
	// CHECK: [[VADDV_I:%.*]] = call i32 @llvm.aarch64.neon.uaddv.i32.v4i16(<4 x i16> %a) #2			// CHECK: [[VADDV_I:%.*]] = call i32 @llvm.aarch64.neon.uaddv.i32.v4i16(<4 x i16> %a) #3
	// CHECK: [[TMP2:%.*]] = trunc i32 [[VADDV_I]] to i16			// CHECK: [[TMP2:%.*]] = trunc i32 [[VADDV_I]] to i16
	// CHECK: ret i16 [[TMP2]]			// CHECK: ret i16 [[TMP2]]
	uint16_t test_vaddv_u16(uint16x4_t a) {			uint16_t test_vaddv_u16(uint16x4_t a) {
	return vaddv_u16(a);			return vaddv_u16(a);
	}			}

	// CHECK-LABEL: define i8 @test_vaddvq_s8(<16 x i8> %a) #0 {			// CHECK-LABEL: define i8 @test_vaddvq_s8(<16 x i8> %a) #1 {
	// CHECK: [[VADDV_I:%.*]] = call i32 @llvm.aarch64.neon.saddv.i32.v16i8(<16 x i8> %a) #2			// CHECK: [[VADDV_I:%.*]] = call i32 @llvm.aarch64.neon.saddv.i32.v16i8(<16 x i8> %a) #3
	// CHECK: [[TMP0:%.*]] = trunc i32 [[VADDV_I]] to i8			// CHECK: [[TMP0:%.*]] = trunc i32 [[VADDV_I]] to i8
	// CHECK: ret i8 [[TMP0]]			// CHECK: ret i8 [[TMP0]]
	int8_t test_vaddvq_s8(int8x16_t a) {			int8_t test_vaddvq_s8(int8x16_t a) {
	return vaddvq_s8(a);			return vaddvq_s8(a);
	}			}

	// CHECK-LABEL: define i16 @test_vaddvq_s16(<8 x i16> %a) #0 {			// CHECK-LABEL: define i16 @test_vaddvq_s16(<8 x i16> %a) #1 {
	// CHECK: [[VADDV_I:%.*]] = call i32 @llvm.aarch64.neon.saddv.i32.v8i16(<8 x i16> %a) #2			// CHECK: [[VADDV_I:%.*]] = call i32 @llvm.aarch64.neon.saddv.i32.v8i16(<8 x i16> %a) #3
	// CHECK: [[TMP2:%.*]] = trunc i32 [[VADDV_I]] to i16			// CHECK: [[TMP2:%.*]] = trunc i32 [[VADDV_I]] to i16
	// CHECK: ret i16 [[TMP2]]			// CHECK: ret i16 [[TMP2]]
	int16_t test_vaddvq_s16(int16x8_t a) {			int16_t test_vaddvq_s16(int16x8_t a) {
	return vaddvq_s16(a);			return vaddvq_s16(a);
	}			}

	// CHECK-LABEL: define i32 @test_vaddvq_s32(<4 x i32> %a) #0 {			// CHECK-LABEL: define i32 @test_vaddvq_s32(<4 x i32> %a) #1 {
	// CHECK: [[VADDVQ_S32_I:%.*]] = call i32 @llvm.aarch64.neon.saddv.i32.v4i32(<4 x i32> %a) #2			// CHECK: [[VADDVQ_S32_I:%.*]] = call i32 @llvm.aarch64.neon.saddv.i32.v4i32(<4 x i32> %a) #3
	// CHECK: ret i32 [[VADDVQ_S32_I]]			// CHECK: ret i32 [[VADDVQ_S32_I]]
	int32_t test_vaddvq_s32(int32x4_t a) {			int32_t test_vaddvq_s32(int32x4_t a) {
	return vaddvq_s32(a);			return vaddvq_s32(a);
	}			}

	// CHECK-LABEL: define i8 @test_vaddvq_u8(<16 x i8> %a) #0 {			// CHECK-LABEL: define i8 @test_vaddvq_u8(<16 x i8> %a) #1 {
	// CHECK: [[VADDV_I:%.*]] = call i32 @llvm.aarch64.neon.uaddv.i32.v16i8(<16 x i8> %a) #2			// CHECK: [[VADDV_I:%.*]] = call i32 @llvm.aarch64.neon.uaddv.i32.v16i8(<16 x i8> %a) #3
	// CHECK: [[TMP0:%.*]] = trunc i32 [[VADDV_I]] to i8			// CHECK: [[TMP0:%.*]] = trunc i32 [[VADDV_I]] to i8
	// CHECK: ret i8 [[TMP0]]			// CHECK: ret i8 [[TMP0]]
	uint8_t test_vaddvq_u8(uint8x16_t a) {			uint8_t test_vaddvq_u8(uint8x16_t a) {
	return vaddvq_u8(a);			return vaddvq_u8(a);
	}			}

	// CHECK-LABEL: define i16 @test_vaddvq_u16(<8 x i16> %a) #0 {			// CHECK-LABEL: define i16 @test_vaddvq_u16(<8 x i16> %a) #1 {
	// CHECK: [[VADDV_I:%.*]] = call i32 @llvm.aarch64.neon.uaddv.i32.v8i16(<8 x i16> %a) #2			// CHECK: [[VADDV_I:%.*]] = call i32 @llvm.aarch64.neon.uaddv.i32.v8i16(<8 x i16> %a) #3
	// CHECK: [[TMP2:%.*]] = trunc i32 [[VADDV_I]] to i16			// CHECK: [[TMP2:%.*]] = trunc i32 [[VADDV_I]] to i16
	// CHECK: ret i16 [[TMP2]]			// CHECK: ret i16 [[TMP2]]
	uint16_t test_vaddvq_u16(uint16x8_t a) {			uint16_t test_vaddvq_u16(uint16x8_t a) {
	return vaddvq_u16(a);			return vaddvq_u16(a);
	}			}

	// CHECK-LABEL: define i32 @test_vaddvq_u32(<4 x i32> %a) #0 {			// CHECK-LABEL: define i32 @test_vaddvq_u32(<4 x i32> %a) #1 {
	// CHECK: [[VADDVQ_U32_I:%.*]] = call i32 @llvm.aarch64.neon.uaddv.i32.v4i32(<4 x i32> %a) #2			// CHECK: [[VADDVQ_U32_I:%.*]] = call i32 @llvm.aarch64.neon.uaddv.i32.v4i32(<4 x i32> %a) #3
	// CHECK: ret i32 [[VADDVQ_U32_I]]			// CHECK: ret i32 [[VADDVQ_U32_I]]
	uint32_t test_vaddvq_u32(uint32x4_t a) {			uint32_t test_vaddvq_u32(uint32x4_t a) {
	return vaddvq_u32(a);			return vaddvq_u32(a);
	}			}

	// CHECK-LABEL: define float @test_vmaxvq_f32(<4 x float> %a) #0 {			// CHECK-LABEL: define float @test_vmaxvq_f32(<4 x float> %a) #1 {
	// CHECK: [[VMAXVQ_F32_I:%.*]] = call float @llvm.aarch64.neon.fmaxv.f32.v4f32(<4 x float> %a) #2			// CHECK: [[VMAXVQ_F32_I:%.*]] = call float @llvm.aarch64.neon.fmaxv.f32.v4f32(<4 x float> %a) #3
	// CHECK: ret float [[VMAXVQ_F32_I]]			// CHECK: ret float [[VMAXVQ_F32_I]]
	float32_t test_vmaxvq_f32(float32x4_t a) {			float32_t test_vmaxvq_f32(float32x4_t a) {
	return vmaxvq_f32(a);			return vmaxvq_f32(a);
	}			}

	// CHECK-LABEL: define float @test_vminvq_f32(<4 x float> %a) #0 {			// CHECK-LABEL: define float @test_vminvq_f32(<4 x float> %a) #1 {
	// CHECK: [[VMINVQ_F32_I:%.*]] = call float @llvm.aarch64.neon.fminv.f32.v4f32(<4 x float> %a) #2			// CHECK: [[VMINVQ_F32_I:%.*]] = call float @llvm.aarch64.neon.fminv.f32.v4f32(<4 x float> %a) #3
	// CHECK: ret float [[VMINVQ_F32_I]]			// CHECK: ret float [[VMINVQ_F32_I]]
	float32_t test_vminvq_f32(float32x4_t a) {			float32_t test_vminvq_f32(float32x4_t a) {
	return vminvq_f32(a);			return vminvq_f32(a);
	}			}

	// CHECK-LABEL: define float @test_vmaxnmvq_f32(<4 x float> %a) #0 {			// CHECK-LABEL: define float @test_vmaxnmvq_f32(<4 x float> %a) #1 {
	// CHECK: [[VMAXNMVQ_F32_I:%.*]] = call float @llvm.aarch64.neon.fmaxnmv.f32.v4f32(<4 x float> %a) #2			// CHECK: [[VMAXNMVQ_F32_I:%.*]] = call float @llvm.aarch64.neon.fmaxnmv.f32.v4f32(<4 x float> %a) #3
	// CHECK: ret float [[VMAXNMVQ_F32_I]]			// CHECK: ret float [[VMAXNMVQ_F32_I]]
	float32_t test_vmaxnmvq_f32(float32x4_t a) {			float32_t test_vmaxnmvq_f32(float32x4_t a) {
	return vmaxnmvq_f32(a);			return vmaxnmvq_f32(a);
	}			}

	// CHECK-LABEL: define float @test_vminnmvq_f32(<4 x float> %a) #0 {			// CHECK-LABEL: define float @test_vminnmvq_f32(<4 x float> %a) #1 {
	// CHECK: [[VMINNMVQ_F32_I:%.*]] = call float @llvm.aarch64.neon.fminnmv.f32.v4f32(<4 x float> %a) #2			// CHECK: [[VMINNMVQ_F32_I:%.*]] = call float @llvm.aarch64.neon.fminnmv.f32.v4f32(<4 x float> %a) #3
	// CHECK: ret float [[VMINNMVQ_F32_I]]			// CHECK: ret float [[VMINNMVQ_F32_I]]
	float32_t test_vminnmvq_f32(float32x4_t a) {			float32_t test_vminnmvq_f32(float32x4_t a) {
	return vminnmvq_f32(a);			return vminnmvq_f32(a);
	}			}

				// CHECK: attributes #0 ={{.*}}"min-legal-vector-width"="64"
				// CHECK: attributes #1 ={{.*}}"min-legal-vector-width"="128"
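
The two attribute checks above follow the rule stated in the summary: a function's "min-legal-vector-width" must cover the widest vector among its argument and return types. A minimal sketch of that rule follows. It is not the patch's actual CGCall.cpp code: the helper name `min_legal_vector_width` and the convention of passing bit widths as a plain array (with 0 standing for a scalar) are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not taken from the patch: given the bit widths of a
 * function's return type and arguments (0 for any non-vector type), return
 * the widest one. This mirrors the "take the maximum" rule described in the
 * revision summary. */
unsigned min_legal_vector_width(const unsigned *vector_bits, size_t count) {
  assert(vector_bits != NULL || count == 0);
  unsigned widest = 0;
  for (size_t i = 0; i < count; ++i) {
    if (vector_bits[i] > widest)
      widest = vector_bits[i];
  }
  return widest;
}
```

Under these assumptions, test_vmaxv_s8 (an i8 return and one <8 x i8> argument) yields 64 and so lands in attribute group #0, while test_vmla_laneq_f32_0 (64-bit return, two 64-bit arguments and one <4 x float> argument) yields 128 and lands in #1, which matches the updated CHECK-LABEL lines in this diff.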

test/CodeGen/aarch64-neon-fma.c

	// RUN: %clang_cc1 -triple arm64-none-linux-gnu -target-feature +neon -S -disable-O0-optnone -emit-llvm -o - %s | opt -S -mem2reg | FileCheck %s			// RUN: %clang_cc1 -triple arm64-none-linux-gnu -target-feature +neon -S -disable-O0-optnone -emit-llvm -o - %s | opt -S -mem2reg | FileCheck %s

	// Test new aarch64 intrinsics and types			// Test new aarch64 intrinsics and types

	#include <arm_neon.h>			#include <arm_neon.h>

	// CHECK-LABEL: define <2 x float> @test_vmla_n_f32(<2 x float> %a, <2 x float> %b, float %c) #0 {			// CHECK-LABEL: define <2 x float> @test_vmla_n_f32(<2 x float> %a, <2 x float> %b, float %c) #0 {
	// CHECK: [[VECINIT_I:%.*]] = insertelement <2 x float> undef, float %c, i32 0			// CHECK: [[VECINIT_I:%.*]] = insertelement <2 x float> undef, float %c, i32 0
	// CHECK: [[VECINIT1_I:%.*]] = insertelement <2 x float> [[VECINIT_I]], float %c, i32 1			// CHECK: [[VECINIT1_I:%.*]] = insertelement <2 x float> [[VECINIT_I]], float %c, i32 1
	// CHECK: [[MUL_I:%.*]] = fmul <2 x float> %b, [[VECINIT1_I]]			// CHECK: [[MUL_I:%.*]] = fmul <2 x float> %b, [[VECINIT1_I]]
	// CHECK: [[ADD_I:%.*]] = fadd <2 x float> %a, [[MUL_I]]			// CHECK: [[ADD_I:%.*]] = fadd <2 x float> %a, [[MUL_I]]
	// CHECK: ret <2 x float> [[ADD_I]]			// CHECK: ret <2 x float> [[ADD_I]]
	float32x2_t test_vmla_n_f32(float32x2_t a, float32x2_t b, float32_t c) {			float32x2_t test_vmla_n_f32(float32x2_t a, float32x2_t b, float32_t c) {
	return vmla_n_f32(a, b, c);			return vmla_n_f32(a, b, c);
	}			}

	// CHECK-LABEL: define <4 x float> @test_vmlaq_n_f32(<4 x float> %a, <4 x float> %b, float %c) #0 {			// CHECK-LABEL: define <4 x float> @test_vmlaq_n_f32(<4 x float> %a, <4 x float> %b, float %c) #1 {
	// CHECK: [[VECINIT_I:%.*]] = insertelement <4 x float> undef, float %c, i32 0			// CHECK: [[VECINIT_I:%.*]] = insertelement <4 x float> undef, float %c, i32 0
	// CHECK: [[VECINIT1_I:%.*]] = insertelement <4 x float> [[VECINIT_I]], float %c, i32 1			// CHECK: [[VECINIT1_I:%.*]] = insertelement <4 x float> [[VECINIT_I]], float %c, i32 1
	// CHECK: [[VECINIT2_I:%.*]] = insertelement <4 x float> [[VECINIT1_I]], float %c, i32 2			// CHECK: [[VECINIT2_I:%.*]] = insertelement <4 x float> [[VECINIT1_I]], float %c, i32 2
	// CHECK: [[VECINIT3_I:%.*]] = insertelement <4 x float> [[VECINIT2_I]], float %c, i32 3			// CHECK: [[VECINIT3_I:%.*]] = insertelement <4 x float> [[VECINIT2_I]], float %c, i32 3
	// CHECK: [[MUL_I:%.*]] = fmul <4 x float> %b, [[VECINIT3_I]]			// CHECK: [[MUL_I:%.*]] = fmul <4 x float> %b, [[VECINIT3_I]]
	// CHECK: [[ADD_I:%.*]] = fadd <4 x float> %a, [[MUL_I]]			// CHECK: [[ADD_I:%.*]] = fadd <4 x float> %a, [[MUL_I]]
	// CHECK: ret <4 x float> [[ADD_I]]			// CHECK: ret <4 x float> [[ADD_I]]
	float32x4_t test_vmlaq_n_f32(float32x4_t a, float32x4_t b, float32_t c) {			float32x4_t test_vmlaq_n_f32(float32x4_t a, float32x4_t b, float32_t c) {
	return vmlaq_n_f32(a, b, c);			return vmlaq_n_f32(a, b, c);
	}			}

	// CHECK-LABEL: define <2 x double> @test_vmlaq_n_f64(<2 x double> %a, <2 x double> %b, double %c) #0 {			// CHECK-LABEL: define <2 x double> @test_vmlaq_n_f64(<2 x double> %a, <2 x double> %b, double %c) #1 {
	// CHECK: [[VECINIT_I:%.*]] = insertelement <2 x double> undef, double %c, i32 0			// CHECK: [[VECINIT_I:%.*]] = insertelement <2 x double> undef, double %c, i32 0
	// CHECK: [[VECINIT1_I:%.*]] = insertelement <2 x double> [[VECINIT_I]], double %c, i32 1			// CHECK: [[VECINIT1_I:%.*]] = insertelement <2 x double> [[VECINIT_I]], double %c, i32 1
	// CHECK: [[MUL_I:%.*]] = fmul <2 x double> %b, [[VECINIT1_I]]			// CHECK: [[MUL_I:%.*]] = fmul <2 x double> %b, [[VECINIT1_I]]
	// CHECK: [[ADD_I:%.*]] = fadd <2 x double> %a, [[MUL_I]]			// CHECK: [[ADD_I:%.*]] = fadd <2 x double> %a, [[MUL_I]]
	// CHECK: ret <2 x double> [[ADD_I]]			// CHECK: ret <2 x double> [[ADD_I]]
	float64x2_t test_vmlaq_n_f64(float64x2_t a, float64x2_t b, float64_t c) {			float64x2_t test_vmlaq_n_f64(float64x2_t a, float64x2_t b, float64_t c) {
	return vmlaq_n_f64(a, b, c);			return vmlaq_n_f64(a, b, c);
	}			}

	// CHECK-LABEL: define <4 x float> @test_vmlsq_n_f32(<4 x float> %a, <4 x float> %b, float %c) #0 {			// CHECK-LABEL: define <4 x float> @test_vmlsq_n_f32(<4 x float> %a, <4 x float> %b, float %c) #1 {
	// CHECK: [[VECINIT_I:%.*]] = insertelement <4 x float> undef, float %c, i32 0			// CHECK: [[VECINIT_I:%.*]] = insertelement <4 x float> undef, float %c, i32 0
	// CHECK: [[VECINIT1_I:%.*]] = insertelement <4 x float> [[VECINIT_I]], float %c, i32 1			// CHECK: [[VECINIT1_I:%.*]] = insertelement <4 x float> [[VECINIT_I]], float %c, i32 1
	// CHECK: [[VECINIT2_I:%.*]] = insertelement <4 x float> [[VECINIT1_I]], float %c, i32 2			// CHECK: [[VECINIT2_I:%.*]] = insertelement <4 x float> [[VECINIT1_I]], float %c, i32 2
	// CHECK: [[VECINIT3_I:%.*]] = insertelement <4 x float> [[VECINIT2_I]], float %c, i32 3			// CHECK: [[VECINIT3_I:%.*]] = insertelement <4 x float> [[VECINIT2_I]], float %c, i32 3
	// CHECK: [[MUL_I:%.*]] = fmul <4 x float> %b, [[VECINIT3_I]]			// CHECK: [[MUL_I:%.*]] = fmul <4 x float> %b, [[VECINIT3_I]]
	// CHECK: [[SUB_I:%.*]] = fsub <4 x float> %a, [[MUL_I]]			// CHECK: [[SUB_I:%.*]] = fsub <4 x float> %a, [[MUL_I]]
	// CHECK: ret <4 x float> [[SUB_I]]			// CHECK: ret <4 x float> [[SUB_I]]
	float32x4_t test_vmlsq_n_f32(float32x4_t a, float32x4_t b, float32_t c) {			float32x4_t test_vmlsq_n_f32(float32x4_t a, float32x4_t b, float32_t c) {
	return vmlsq_n_f32(a, b, c);			return vmlsq_n_f32(a, b, c);
	}			}

	// CHECK-LABEL: define <2 x float> @test_vmls_n_f32(<2 x float> %a, <2 x float> %b, float %c) #0 {			// CHECK-LABEL: define <2 x float> @test_vmls_n_f32(<2 x float> %a, <2 x float> %b, float %c) #0 {
	// CHECK: [[VECINIT_I:%.*]] = insertelement <2 x float> undef, float %c, i32 0			// CHECK: [[VECINIT_I:%.*]] = insertelement <2 x float> undef, float %c, i32 0
	// CHECK: [[VECINIT1_I:%.*]] = insertelement <2 x float> [[VECINIT_I]], float %c, i32 1			// CHECK: [[VECINIT1_I:%.*]] = insertelement <2 x float> [[VECINIT_I]], float %c, i32 1
	// CHECK: [[MUL_I:%.*]] = fmul <2 x float> %b, [[VECINIT1_I]]			// CHECK: [[MUL_I:%.*]] = fmul <2 x float> %b, [[VECINIT1_I]]
	// CHECK: [[SUB_I:%.*]] = fsub <2 x float> %a, [[MUL_I]]			// CHECK: [[SUB_I:%.*]] = fsub <2 x float> %a, [[MUL_I]]
	// CHECK: ret <2 x float> [[SUB_I]]			// CHECK: ret <2 x float> [[SUB_I]]
	float32x2_t test_vmls_n_f32(float32x2_t a, float32x2_t b, float32_t c) {			float32x2_t test_vmls_n_f32(float32x2_t a, float32x2_t b, float32_t c) {
	return vmls_n_f32(a, b, c);			return vmls_n_f32(a, b, c);
	}			}

	// CHECK-LABEL: define <2 x double> @test_vmlsq_n_f64(<2 x double> %a, <2 x double> %b, double %c) #0 {			// CHECK-LABEL: define <2 x double> @test_vmlsq_n_f64(<2 x double> %a, <2 x double> %b, double %c) #1 {
	// CHECK: [[VECINIT_I:%.*]] = insertelement <2 x double> undef, double %c, i32 0			// CHECK: [[VECINIT_I:%.*]] = insertelement <2 x double> undef, double %c, i32 0
	// CHECK: [[VECINIT1_I:%.*]] = insertelement <2 x double> [[VECINIT_I]], double %c, i32 1			// CHECK: [[VECINIT1_I:%.*]] = insertelement <2 x double> [[VECINIT_I]], double %c, i32 1
	// CHECK: [[MUL_I:%.*]] = fmul <2 x double> %b, [[VECINIT1_I]]			// CHECK: [[MUL_I:%.*]] = fmul <2 x double> %b, [[VECINIT1_I]]
	// CHECK: [[SUB_I:%.*]] = fsub <2 x double> %a, [[MUL_I]]			// CHECK: [[SUB_I:%.*]] = fsub <2 x double> %a, [[MUL_I]]
	// CHECK: ret <2 x double> [[SUB_I]]			// CHECK: ret <2 x double> [[SUB_I]]
	float64x2_t test_vmlsq_n_f64(float64x2_t a, float64x2_t b, float64_t c) {			float64x2_t test_vmlsq_n_f64(float64x2_t a, float64x2_t b, float64_t c) {
	return vmlsq_n_f64(a, b, c);			return vmlsq_n_f64(a, b, c);
	}			}

	// CHECK-LABEL: define <2 x float> @test_vmla_lane_f32_0(<2 x float> %a, <2 x float> %b, <2 x float> %v) #0 {			// CHECK-LABEL: define <2 x float> @test_vmla_lane_f32_0(<2 x float> %a, <2 x float> %b, <2 x float> %v) #0 {
	// CHECK: [[SHUFFLE:%.*]] = shufflevector <2 x float> %v, <2 x float> %v, <2 x i32> zeroinitializer			// CHECK: [[SHUFFLE:%.*]] = shufflevector <2 x float> %v, <2 x float> %v, <2 x i32> zeroinitializer
	// CHECK: [[MUL:%.*]] = fmul <2 x float> %b, [[SHUFFLE]]			// CHECK: [[MUL:%.*]] = fmul <2 x float> %b, [[SHUFFLE]]
	// CHECK: [[ADD:%.*]] = fadd <2 x float> %a, [[MUL]]			// CHECK: [[ADD:%.*]] = fadd <2 x float> %a, [[MUL]]
	// CHECK: ret <2 x float> [[ADD]]			// CHECK: ret <2 x float> [[ADD]]
	float32x2_t test_vmla_lane_f32_0(float32x2_t a, float32x2_t b, float32x2_t v) {			float32x2_t test_vmla_lane_f32_0(float32x2_t a, float32x2_t b, float32x2_t v) {
	return vmla_lane_f32(a, b, v, 0);			return vmla_lane_f32(a, b, v, 0);
	}			}

	// CHECK-LABEL: define <4 x float> @test_vmlaq_lane_f32_0(<4 x float> %a, <4 x float> %b, <2 x float> %v) #0 {			// CHECK-LABEL: define <4 x float> @test_vmlaq_lane_f32_0(<4 x float> %a, <4 x float> %b, <2 x float> %v) #1 {
	// CHECK: [[SHUFFLE:%.*]] = shufflevector <2 x float> %v, <2 x float> %v, <4 x i32> zeroinitializer			// CHECK: [[SHUFFLE:%.*]] = shufflevector <2 x float> %v, <2 x float> %v, <4 x i32> zeroinitializer
	// CHECK: [[MUL:%.*]] = fmul <4 x float> %b, [[SHUFFLE]]			// CHECK: [[MUL:%.*]] = fmul <4 x float> %b, [[SHUFFLE]]
	// CHECK: [[ADD:%.*]] = fadd <4 x float> %a, [[MUL]]			// CHECK: [[ADD:%.*]] = fadd <4 x float> %a, [[MUL]]
	// CHECK: ret <4 x float> [[ADD]]			// CHECK: ret <4 x float> [[ADD]]
	float32x4_t test_vmlaq_lane_f32_0(float32x4_t a, float32x4_t b, float32x2_t v) {			float32x4_t test_vmlaq_lane_f32_0(float32x4_t a, float32x4_t b, float32x2_t v) {
	return vmlaq_lane_f32(a, b, v, 0);			return vmlaq_lane_f32(a, b, v, 0);
	}			}

	// CHECK-LABEL: define <2 x float> @test_vmla_laneq_f32_0(<2 x float> %a, <2 x float> %b, <4 x float> %v) #0 {			// CHECK-LABEL: define <2 x float> @test_vmla_laneq_f32_0(<2 x float> %a, <2 x float> %b, <4 x float> %v) #1 {
	// CHECK: [[SHUFFLE:%.*]] = shufflevector <4 x float> %v, <4 x float> %v, <2 x i32> zeroinitializer			// CHECK: [[SHUFFLE:%.*]] = shufflevector <4 x float> %v, <4 x float> %v, <2 x i32> zeroinitializer
	// CHECK: [[MUL:%.*]] = fmul <2 x float> %b, [[SHUFFLE]]			// CHECK: [[MUL:%.*]] = fmul <2 x float> %b, [[SHUFFLE]]
	// CHECK: [[ADD:%.*]] = fadd <2 x float> %a, [[MUL]]			// CHECK: [[ADD:%.*]] = fadd <2 x float> %a, [[MUL]]
	// CHECK: ret <2 x float> [[ADD]]			// CHECK: ret <2 x float> [[ADD]]
	float32x2_t test_vmla_laneq_f32_0(float32x2_t a, float32x2_t b, float32x4_t v) {			float32x2_t test_vmla_laneq_f32_0(float32x2_t a, float32x2_t b, float32x4_t v) {
	return vmla_laneq_f32(a, b, v, 0);			return vmla_laneq_f32(a, b, v, 0);
	}			}

	// CHECK-LABEL: define <4 x float> @test_vmlaq_laneq_f32_0(<4 x float> %a, <4 x float> %b, <4 x float> %v) #0 {			// CHECK-LABEL: define <4 x float> @test_vmlaq_laneq_f32_0(<4 x float> %a, <4 x float> %b, <4 x float> %v) #1 {
	// CHECK: [[SHUFFLE:%.*]] = shufflevector <4 x float> %v, <4 x float> %v, <4 x i32> zeroinitializer			// CHECK: [[SHUFFLE:%.*]] = shufflevector <4 x float> %v, <4 x float> %v, <4 x i32> zeroinitializer
	// CHECK: [[MUL:%.*]] = fmul <4 x float> %b, [[SHUFFLE]]			// CHECK: [[MUL:%.*]] = fmul <4 x float> %b, [[SHUFFLE]]
	// CHECK: [[ADD:%.*]] = fadd <4 x float> %a, [[MUL]]			// CHECK: [[ADD:%.*]] = fadd <4 x float> %a, [[MUL]]
	// CHECK: ret <4 x float> [[ADD]]			// CHECK: ret <4 x float> [[ADD]]
	float32x4_t test_vmlaq_laneq_f32_0(float32x4_t a, float32x4_t b, float32x4_t v) {			float32x4_t test_vmlaq_laneq_f32_0(float32x4_t a, float32x4_t b, float32x4_t v) {
	return vmlaq_laneq_f32(a, b, v, 0);			return vmlaq_laneq_f32(a, b, v, 0);
	}			}

	// CHECK-LABEL: define <2 x float> @test_vmls_lane_f32_0(<2 x float> %a, <2 x float> %b, <2 x float> %v) #0 {			// CHECK-LABEL: define <2 x float> @test_vmls_lane_f32_0(<2 x float> %a, <2 x float> %b, <2 x float> %v) #0 {
	// CHECK: [[SHUFFLE:%.*]] = shufflevector <2 x float> %v, <2 x float> %v, <2 x i32> zeroinitializer			// CHECK: [[SHUFFLE:%.*]] = shufflevector <2 x float> %v, <2 x float> %v, <2 x i32> zeroinitializer
	// CHECK: [[MUL:%.*]] = fmul <2 x float> %b, [[SHUFFLE]]			// CHECK: [[MUL:%.*]] = fmul <2 x float> %b, [[SHUFFLE]]
	// CHECK: [[SUB:%.*]] = fsub <2 x float> %a, [[MUL]]			// CHECK: [[SUB:%.*]] = fsub <2 x float> %a, [[MUL]]
	// CHECK: ret <2 x float> [[SUB]]			// CHECK: ret <2 x float> [[SUB]]
	float32x2_t test_vmls_lane_f32_0(float32x2_t a, float32x2_t b, float32x2_t v) {			float32x2_t test_vmls_lane_f32_0(float32x2_t a, float32x2_t b, float32x2_t v) {
	return vmls_lane_f32(a, b, v, 0);			return vmls_lane_f32(a, b, v, 0);
	}			}

	// CHECK-LABEL: define <4 x float> @test_vmlsq_lane_f32_0(<4 x float> %a, <4 x float> %b, <2 x float> %v) #0 {			// CHECK-LABEL: define <4 x float> @test_vmlsq_lane_f32_0(<4 x float> %a, <4 x float> %b, <2 x float> %v) #1 {
	// CHECK: [[SHUFFLE:%.*]] = shufflevector <2 x float> %v, <2 x float> %v, <4 x i32> zeroinitializer			// CHECK: [[SHUFFLE:%.*]] = shufflevector <2 x float> %v, <2 x float> %v, <4 x i32> zeroinitializer
	// CHECK: [[MUL:%.*]] = fmul <4 x float> %b, [[SHUFFLE]]			// CHECK: [[MUL:%.*]] = fmul <4 x float> %b, [[SHUFFLE]]
	// CHECK: [[SUB:%.*]] = fsub <4 x float> %a, [[MUL]]			// CHECK: [[SUB:%.*]] = fsub <4 x float> %a, [[MUL]]
	// CHECK: ret <4 x float> [[SUB]]			// CHECK: ret <4 x float> [[SUB]]
	float32x4_t test_vmlsq_lane_f32_0(float32x4_t a, float32x4_t b, float32x2_t v) {			float32x4_t test_vmlsq_lane_f32_0(float32x4_t a, float32x4_t b, float32x2_t v) {
	return vmlsq_lane_f32(a, b, v, 0);			return vmlsq_lane_f32(a, b, v, 0);
	}			}

	// CHECK-LABEL: define <2 x float> @test_vmls_laneq_f32_0(<2 x float> %a, <2 x float> %b, <4 x float> %v) #0 {			// CHECK-LABEL: define <2 x float> @test_vmls_laneq_f32_0(<2 x float> %a, <2 x float> %b, <4 x float> %v) #1 {
	// CHECK: [[SHUFFLE:%.*]] = shufflevector <4 x float> %v, <4 x float> %v, <2 x i32> zeroinitializer			// CHECK: [[SHUFFLE:%.*]] = shufflevector <4 x float> %v, <4 x float> %v, <2 x i32> zeroinitializer
	// CHECK: [[MUL:%.*]] = fmul <2 x float> %b, [[SHUFFLE]]			// CHECK: [[MUL:%.*]] = fmul <2 x float> %b, [[SHUFFLE]]
	// CHECK: [[SUB:%.*]] = fsub <2 x float> %a, [[MUL]]			// CHECK: [[SUB:%.*]] = fsub <2 x float> %a, [[MUL]]
	// CHECK: ret <2 x float> [[SUB]]			// CHECK: ret <2 x float> [[SUB]]
	float32x2_t test_vmls_laneq_f32_0(float32x2_t a, float32x2_t b, float32x4_t v) {			float32x2_t test_vmls_laneq_f32_0(float32x2_t a, float32x2_t b, float32x4_t v) {
	return vmls_laneq_f32(a, b, v, 0);			return vmls_laneq_f32(a, b, v, 0);
	}			}

	// CHECK-LABEL: define <4 x float> @test_vmlsq_laneq_f32_0(<4 x float> %a, <4 x float> %b, <4 x float> %v) #0 {			// CHECK-LABEL: define <4 x float> @test_vmlsq_laneq_f32_0(<4 x float> %a, <4 x float> %b, <4 x float> %v) #1 {
	// CHECK: [[SHUFFLE:%.*]] = shufflevector <4 x float> %v, <4 x float> %v, <4 x i32> zeroinitializer			// CHECK: [[SHUFFLE:%.*]] = shufflevector <4 x float> %v, <4 x float> %v, <4 x i32> zeroinitializer
	// CHECK: [[MUL:%.*]] = fmul <4 x float> %b, [[SHUFFLE]]			// CHECK: [[MUL:%.*]] = fmul <4 x float> %b, [[SHUFFLE]]
	// CHECK: [[SUB:%.*]] = fsub <4 x float> %a, [[MUL]]			// CHECK: [[SUB:%.*]] = fsub <4 x float> %a, [[MUL]]
	// CHECK: ret <4 x float> [[SUB]]			// CHECK: ret <4 x float> [[SUB]]
	float32x4_t test_vmlsq_laneq_f32_0(float32x4_t a, float32x4_t b, float32x4_t v) {			float32x4_t test_vmlsq_laneq_f32_0(float32x4_t a, float32x4_t b, float32x4_t v) {
	return vmlsq_laneq_f32(a, b, v, 0);			return vmlsq_laneq_f32(a, b, v, 0);
	}			}

	// CHECK-LABEL: define <2 x float> @test_vmla_lane_f32(<2 x float> %a, <2 x float> %b, <2 x float> %v) #0 {			// CHECK-LABEL: define <2 x float> @test_vmla_lane_f32(<2 x float> %a, <2 x float> %b, <2 x float> %v) #0 {
	// CHECK: [[SHUFFLE:%.*]] = shufflevector <2 x float> %v, <2 x float> %v, <2 x i32> <i32 1, i32 1>			// CHECK: [[SHUFFLE:%.*]] = shufflevector <2 x float> %v, <2 x float> %v, <2 x i32> <i32 1, i32 1>
	// CHECK: [[MUL:%.*]] = fmul <2 x float> %b, [[SHUFFLE]]			// CHECK: [[MUL:%.*]] = fmul <2 x float> %b, [[SHUFFLE]]
	// CHECK: [[ADD:%.*]] = fadd <2 x float> %a, [[MUL]]			// CHECK: [[ADD:%.*]] = fadd <2 x float> %a, [[MUL]]
	// CHECK: ret <2 x float> [[ADD]]			// CHECK: ret <2 x float> [[ADD]]
	float32x2_t test_vmla_lane_f32(float32x2_t a, float32x2_t b, float32x2_t v) {			float32x2_t test_vmla_lane_f32(float32x2_t a, float32x2_t b, float32x2_t v) {
	return vmla_lane_f32(a, b, v, 1);			return vmla_lane_f32(a, b, v, 1);
	}			}

	// CHECK-LABEL: define <4 x float> @test_vmlaq_lane_f32(<4 x float> %a, <4 x float> %b, <2 x float> %v) #0 {			// CHECK-LABEL: define <4 x float> @test_vmlaq_lane_f32(<4 x float> %a, <4 x float> %b, <2 x float> %v) #1 {
	// CHECK: [[SHUFFLE:%.*]] = shufflevector <2 x float> %v, <2 x float> %v, <4 x i32> <i32 1, i32 1, i32 1, i32 1>			// CHECK: [[SHUFFLE:%.*]] = shufflevector <2 x float> %v, <2 x float> %v, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
	// CHECK: [[MUL:%.*]] = fmul <4 x float> %b, [[SHUFFLE]]			// CHECK: [[MUL:%.*]] = fmul <4 x float> %b, [[SHUFFLE]]
	// CHECK: [[ADD:%.*]] = fadd <4 x float> %a, [[MUL]]			// CHECK: [[ADD:%.*]] = fadd <4 x float> %a, [[MUL]]
	// CHECK: ret <4 x float> [[ADD]]			// CHECK: ret <4 x float> [[ADD]]
	float32x4_t test_vmlaq_lane_f32(float32x4_t a, float32x4_t b, float32x2_t v) {			float32x4_t test_vmlaq_lane_f32(float32x4_t a, float32x4_t b, float32x2_t v) {
	return vmlaq_lane_f32(a, b, v, 1);			return vmlaq_lane_f32(a, b, v, 1);
	}			}

	// CHECK-LABEL: define <2 x float> @test_vmla_laneq_f32(<2 x float> %a, <2 x float> %b, <4 x float> %v) #0 {			// CHECK-LABEL: define <2 x float> @test_vmla_laneq_f32(<2 x float> %a, <2 x float> %b, <4 x float> %v) #1 {
	// CHECK: [[SHUFFLE:%.*]] = shufflevector <4 x float> %v, <4 x float> %v, <2 x i32> <i32 3, i32 3>			// CHECK: [[SHUFFLE:%.*]] = shufflevector <4 x float> %v, <4 x float> %v, <2 x i32> <i32 3, i32 3>
	// CHECK: [[MUL:%.*]] = fmul <2 x float> %b, [[SHUFFLE]]			// CHECK: [[MUL:%.*]] = fmul <2 x float> %b, [[SHUFFLE]]
	// CHECK: [[ADD:%.*]] = fadd <2 x float> %a, [[MUL]]			// CHECK: [[ADD:%.*]] = fadd <2 x float> %a, [[MUL]]
	// CHECK: ret <2 x float> [[ADD]]			// CHECK: ret <2 x float> [[ADD]]
	float32x2_t test_vmla_laneq_f32(float32x2_t a, float32x2_t b, float32x4_t v) {			float32x2_t test_vmla_laneq_f32(float32x2_t a, float32x2_t b, float32x4_t v) {
	return vmla_laneq_f32(a, b, v, 3);			return vmla_laneq_f32(a, b, v, 3);
	}			}

	// CHECK-LABEL: define <4 x float> @test_vmlaq_laneq_f32(<4 x float> %a, <4 x float> %b, <4 x float> %v) #0 {			// CHECK-LABEL: define <4 x float> @test_vmlaq_laneq_f32(<4 x float> %a, <4 x float> %b, <4 x float> %v) #1 {
	// CHECK: [[SHUFFLE:%.*]] = shufflevector <4 x float> %v, <4 x float> %v, <4 x i32> <i32 3, i32 3, i32 3, i32 3>			// CHECK: [[SHUFFLE:%.*]] = shufflevector <4 x float> %v, <4 x float> %v, <4 x i32> <i32 3, i32 3, i32 3, i32 3>
	// CHECK: [[MUL:%.*]] = fmul <4 x float> %b, [[SHUFFLE]]			// CHECK: [[MUL:%.*]] = fmul <4 x float> %b, [[SHUFFLE]]
	// CHECK: [[ADD:%.*]] = fadd <4 x float> %a, [[MUL]]			// CHECK: [[ADD:%.*]] = fadd <4 x float> %a, [[MUL]]
	// CHECK: ret <4 x float> [[ADD]]			// CHECK: ret <4 x float> [[ADD]]
	float32x4_t test_vmlaq_laneq_f32(float32x4_t a, float32x4_t b, float32x4_t v) {			float32x4_t test_vmlaq_laneq_f32(float32x4_t a, float32x4_t b, float32x4_t v) {
	return vmlaq_laneq_f32(a, b, v, 3);			return vmlaq_laneq_f32(a, b, v, 3);
	}			}

	// CHECK-LABEL: define <2 x float> @test_vmls_lane_f32(<2 x float> %a, <2 x float> %b, <2 x float> %v) #0 {			// CHECK-LABEL: define <2 x float> @test_vmls_lane_f32(<2 x float> %a, <2 x float> %b, <2 x float> %v) #0 {
	// CHECK: [[SHUFFLE:%.*]] = shufflevector <2 x float> %v, <2 x float> %v, <2 x i32> <i32 1, i32 1>			// CHECK: [[SHUFFLE:%.*]] = shufflevector <2 x float> %v, <2 x float> %v, <2 x i32> <i32 1, i32 1>
	// CHECK: [[MUL:%.*]] = fmul <2 x float> %b, [[SHUFFLE]]			// CHECK: [[MUL:%.*]] = fmul <2 x float> %b, [[SHUFFLE]]
	// CHECK: [[SUB:%.*]] = fsub <2 x float> %a, [[MUL]]			// CHECK: [[SUB:%.*]] = fsub <2 x float> %a, [[MUL]]
	// CHECK: ret <2 x float> [[SUB]]			// CHECK: ret <2 x float> [[SUB]]
	float32x2_t test_vmls_lane_f32(float32x2_t a, float32x2_t b, float32x2_t v) {			float32x2_t test_vmls_lane_f32(float32x2_t a, float32x2_t b, float32x2_t v) {
	return vmls_lane_f32(a, b, v, 1);			return vmls_lane_f32(a, b, v, 1);
	}			}

	// CHECK-LABEL: define <4 x float> @test_vmlsq_lane_f32(<4 x float> %a, <4 x float> %b, <2 x float> %v) #0 {			// CHECK-LABEL: define <4 x float> @test_vmlsq_lane_f32(<4 x float> %a, <4 x float> %b, <2 x float> %v) #1 {
	// CHECK: [[SHUFFLE:%.*]] = shufflevector <2 x float> %v, <2 x float> %v, <4 x i32> <i32 1, i32 1, i32 1, i32 1>			// CHECK: [[SHUFFLE:%.*]] = shufflevector <2 x float> %v, <2 x float> %v, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
	// CHECK: [[MUL:%.*]] = fmul <4 x float> %b, [[SHUFFLE]]			// CHECK: [[MUL:%.*]] = fmul <4 x float> %b, [[SHUFFLE]]
	// CHECK: [[SUB:%.*]] = fsub <4 x float> %a, [[MUL]]			// CHECK: [[SUB:%.*]] = fsub <4 x float> %a, [[MUL]]
	// CHECK: ret <4 x float> [[SUB]]			// CHECK: ret <4 x float> [[SUB]]
	float32x4_t test_vmlsq_lane_f32(float32x4_t a, float32x4_t b, float32x2_t v) {			float32x4_t test_vmlsq_lane_f32(float32x4_t a, float32x4_t b, float32x2_t v) {
	return vmlsq_lane_f32(a, b, v, 1);			return vmlsq_lane_f32(a, b, v, 1);
	}			}
	// CHECK-LABEL: define <2 x float> @test_vmls_laneq_f32(<2 x float> %a, <2 x float> %b, <4 x float> %v) #0 {			// CHECK-LABEL: define <2 x float> @test_vmls_laneq_f32(<2 x float> %a, <2 x float> %b, <4 x float> %v) #1 {
	// CHECK: [[SHUFFLE:%.*]] = shufflevector <4 x float> %v, <4 x float> %v, <2 x i32> <i32 3, i32 3>			// CHECK: [[SHUFFLE:%.*]] = shufflevector <4 x float> %v, <4 x float> %v, <2 x i32> <i32 3, i32 3>
	// CHECK: [[MUL:%.*]] = fmul <2 x float> %b, [[SHUFFLE]]			// CHECK: [[MUL:%.*]] = fmul <2 x float> %b, [[SHUFFLE]]
	// CHECK: [[SUB:%.*]] = fsub <2 x float> %a, [[MUL]]			// CHECK: [[SUB:%.*]] = fsub <2 x float> %a, [[MUL]]
	// CHECK: ret <2 x float> [[SUB]]			// CHECK: ret <2 x float> [[SUB]]
	float32x2_t test_vmls_laneq_f32(float32x2_t a, float32x2_t b, float32x4_t v) {			float32x2_t test_vmls_laneq_f32(float32x2_t a, float32x2_t b, float32x4_t v) {
	return vmls_laneq_f32(a, b, v, 3);			return vmls_laneq_f32(a, b, v, 3);
	}			}

	// CHECK-LABEL: define <4 x float> @test_vmlsq_laneq_f32(<4 x float> %a, <4 x float> %b, <4 x float> %v) #0 {			// CHECK-LABEL: define <4 x float> @test_vmlsq_laneq_f32(<4 x float> %a, <4 x float> %b, <4 x float> %v) #1 {
	// CHECK: [[SHUFFLE:%.*]] = shufflevector <4 x float> %v, <4 x float> %v, <4 x i32> <i32 3, i32 3, i32 3, i32 3>			// CHECK: [[SHUFFLE:%.*]] = shufflevector <4 x float> %v, <4 x float> %v, <4 x i32> <i32 3, i32 3, i32 3, i32 3>
	// CHECK: [[MUL:%.*]] = fmul <4 x float> %b, [[SHUFFLE]]			// CHECK: [[MUL:%.*]] = fmul <4 x float> %b, [[SHUFFLE]]
	// CHECK: [[SUB:%.*]] = fsub <4 x float> %a, [[MUL]]			// CHECK: [[SUB:%.*]] = fsub <4 x float> %a, [[MUL]]
	// CHECK: ret <4 x float> [[SUB]]			// CHECK: ret <4 x float> [[SUB]]
	float32x4_t test_vmlsq_laneq_f32(float32x4_t a, float32x4_t b, float32x4_t v) {			float32x4_t test_vmlsq_laneq_f32(float32x4_t a, float32x4_t b, float32x4_t v) {
	return vmlsq_laneq_f32(a, b, v, 3);			return vmlsq_laneq_f32(a, b, v, 3);
	}			}

	// CHECK-LABEL: define <2 x double> @test_vfmaq_n_f64(<2 x double> %a, <2 x double> %b, double %c) #0 {			// CHECK-LABEL: define <2 x double> @test_vfmaq_n_f64(<2 x double> %a, <2 x double> %b, double %c) #1 {
	// CHECK: [[VECINIT_I:%.*]] = insertelement <2 x double> undef, double %c, i32 0			// CHECK: [[VECINIT_I:%.*]] = insertelement <2 x double> undef, double %c, i32 0
	// CHECK: [[VECINIT1_I:%.*]] = insertelement <2 x double> [[VECINIT_I]], double %c, i32 1			// CHECK: [[VECINIT1_I:%.*]] = insertelement <2 x double> [[VECINIT_I]], double %c, i32 1
	// CHECK: [[TMP6:%.*]] = call <2 x double> @llvm.fma.v2f64(<2 x double> %b, <2 x double> [[VECINIT1_I]], <2 x double> %a)			// CHECK: [[TMP6:%.*]] = call <2 x double> @llvm.fma.v2f64(<2 x double> %b, <2 x double> [[VECINIT1_I]], <2 x double> %a)
	// CHECK: ret <2 x double> [[TMP6]]			// CHECK: ret <2 x double> [[TMP6]]
	float64x2_t test_vfmaq_n_f64(float64x2_t a, float64x2_t b, float64_t c) {			float64x2_t test_vfmaq_n_f64(float64x2_t a, float64x2_t b, float64_t c) {
	return vfmaq_n_f64(a, b, c);			return vfmaq_n_f64(a, b, c);
	}			}

	// CHECK-LABEL: define <2 x double> @test_vfmsq_n_f64(<2 x double> %a, <2 x double> %b, double %c) #0 {			// CHECK-LABEL: define <2 x double> @test_vfmsq_n_f64(<2 x double> %a, <2 x double> %b, double %c) #1 {
	// CHECK: [[SUB_I:%.*]] = fsub <2 x double> <double -0.000000e+00, double -0.000000e+00>, %b			// CHECK: [[SUB_I:%.*]] = fsub <2 x double> <double -0.000000e+00, double -0.000000e+00>, %b
	// CHECK: [[VECINIT_I:%.*]] = insertelement <2 x double> undef, double %c, i32 0			// CHECK: [[VECINIT_I:%.*]] = insertelement <2 x double> undef, double %c, i32 0
	// CHECK: [[VECINIT1_I:%.*]] = insertelement <2 x double> [[VECINIT_I]], double %c, i32 1			// CHECK: [[VECINIT1_I:%.*]] = insertelement <2 x double> [[VECINIT_I]], double %c, i32 1
	// CHECK: [[TMP6:%.*]] = call <2 x double> @llvm.fma.v2f64(<2 x double> [[SUB_I]], <2 x double> [[VECINIT1_I]], <2 x double> %a) #2			// CHECK: [[TMP6:%.*]] = call <2 x double> @llvm.fma.v2f64(<2 x double> [[SUB_I]], <2 x double> [[VECINIT1_I]], <2 x double> %a) #3
	// CHECK: ret <2 x double> [[TMP6]]			// CHECK: ret <2 x double> [[TMP6]]
	float64x2_t test_vfmsq_n_f64(float64x2_t a, float64x2_t b, float64_t c) {			float64x2_t test_vfmsq_n_f64(float64x2_t a, float64x2_t b, float64_t c) {
	return vfmsq_n_f64(a, b, c);			return vfmsq_n_f64(a, b, c);
	}			}

				// CHECK: attributes #0 ={{.*}}"min-legal-vector-width"="64"
				// CHECK: attributes #1 ={{.*}}"min-legal-vector-width"="128"
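
The two attribute checks above fit the rule in the patch summary: a function whose signature only mentions 64-bit vectors (e.g. test_vmls_lane_f32_0) stays on #0, while one that takes or returns a <4 x float> (e.g. test_vmlsq_lane_f32_0) moves to #1. A minimal sketch of that rule follows, assuming the width is simply the widest vector among the return type and the arguments. The helper name and its parameters are hypothetical and are not taken from CGCall.cpp or CodeGenFunction.cpp; the real patch also considers the types of called functions, which this sketch leaves out.

```c
#include <stddef.h>

/* Hypothetical helper, not Clang's API: returns the widest vector width
 * (in bits) seen in a function signature. Pass 0 for non-vector types. */
static unsigned min_legal_vector_width(unsigned ret_bits,
                                       const unsigned *arg_bits,
                                       size_t num_args) {
    unsigned widest = ret_bits;
    for (size_t i = 0; i < num_args; ++i) {
        if (arg_bits[i] > widest)
            widest = arg_bits[i];
    }
    return widest;
}
```

With the signature of test_vmls_laneq_f32_0 (returns <2 x float>, takes two <2 x float> and one <4 x float>), the sketch yields 128, which agrees with that function moving from #0 to #1 in the diff.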

test/CodeGen/aarch64-neon-ldst-one.c

	// CHECK: [[TMP2:%.*]] = load i64, i64* [[TMP1]]			// CHECK: [[TMP2:%.*]] = load i64, i64* [[TMP1]]
	// CHECK: [[TMP3:%.*]] = insertelement <2 x i64> undef, i64 [[TMP2]], i32 0			// CHECK: [[TMP3:%.*]] = insertelement <2 x i64> undef, i64 [[TMP2]], i32 0
	// CHECK: [[LANE:%.*]] = shufflevector <2 x i64> [[TMP3]], <2 x i64> [[TMP3]], <2 x i32> zeroinitializer			// CHECK: [[LANE:%.*]] = shufflevector <2 x i64> [[TMP3]], <2 x i64> [[TMP3]], <2 x i32> zeroinitializer
	// CHECK: ret <2 x i64> [[LANE]]			// CHECK: ret <2 x i64> [[LANE]]
	poly64x2_t test_vld1q_dup_p64(poly64_t *a) {			poly64x2_t test_vld1q_dup_p64(poly64_t *a) {
	return vld1q_dup_p64(a);			return vld1q_dup_p64(a);
	}			}

	// CHECK-LABEL: define <8 x i8> @test_vld1_dup_u8(i8* %a) #0 {			// CHECK-LABEL: define <8 x i8> @test_vld1_dup_u8(i8* %a) #1 {
	// CHECK: [[TMP0:%.*]] = load i8, i8* %a			// CHECK: [[TMP0:%.*]] = load i8, i8* %a
	// CHECK: [[TMP1:%.*]] = insertelement <8 x i8> undef, i8 [[TMP0]], i32 0			// CHECK: [[TMP1:%.*]] = insertelement <8 x i8> undef, i8 [[TMP0]], i32 0
	// CHECK: [[LANE:%.*]] = shufflevector <8 x i8> [[TMP1]], <8 x i8> [[TMP1]], <8 x i32> zeroinitializer			// CHECK: [[LANE:%.*]] = shufflevector <8 x i8> [[TMP1]], <8 x i8> [[TMP1]], <8 x i32> zeroinitializer
	// CHECK: ret <8 x i8> [[LANE]]			// CHECK: ret <8 x i8> [[LANE]]
	uint8x8_t test_vld1_dup_u8(uint8_t *a) {			uint8x8_t test_vld1_dup_u8(uint8_t *a) {
	return vld1_dup_u8(a);			return vld1_dup_u8(a);
	}			}

	// CHECK-LABEL: define <4 x i16> @test_vld1_dup_u16(i16* %a) #0 {			// CHECK-LABEL: define <4 x i16> @test_vld1_dup_u16(i16* %a) #1 {
	// CHECK: [[TMP0:%.*]] = bitcast i16* %a to i8*			// CHECK: [[TMP0:%.*]] = bitcast i16* %a to i8*
	// CHECK: [[TMP1:%.*]] = bitcast i8* [[TMP0]] to i16*			// CHECK: [[TMP1:%.*]] = bitcast i8* [[TMP0]] to i16*
	// CHECK: [[TMP2:%.*]] = load i16, i16* [[TMP1]]			// CHECK: [[TMP2:%.*]] = load i16, i16* [[TMP1]]
	// CHECK: [[TMP3:%.*]] = insertelement <4 x i16> undef, i16 [[TMP2]], i32 0			// CHECK: [[TMP3:%.*]] = insertelement <4 x i16> undef, i16 [[TMP2]], i32 0
	// CHECK: [[LANE:%.*]] = shufflevector <4 x i16> [[TMP3]], <4 x i16> [[TMP3]], <4 x i32> zeroinitializer			// CHECK: [[LANE:%.*]] = shufflevector <4 x i16> [[TMP3]], <4 x i16> [[TMP3]], <4 x i32> zeroinitializer
	// CHECK: ret <4 x i16> [[LANE]]			// CHECK: ret <4 x i16> [[LANE]]
	uint16x4_t test_vld1_dup_u16(uint16_t *a) {			uint16x4_t test_vld1_dup_u16(uint16_t *a) {
	return vld1_dup_u16(a);			return vld1_dup_u16(a);
	}			}

	// CHECK-LABEL: define <2 x i32> @test_vld1_dup_u32(i32* %a) #0 {			// CHECK-LABEL: define <2 x i32> @test_vld1_dup_u32(i32* %a) #1 {
	// CHECK: [[TMP0:%.*]] = bitcast i32* %a to i8*			// CHECK: [[TMP0:%.*]] = bitcast i32* %a to i8*
	// CHECK: [[TMP1:%.*]] = bitcast i8* [[TMP0]] to i32*			// CHECK: [[TMP1:%.*]] = bitcast i8* [[TMP0]] to i32*
	// CHECK: [[TMP2:%.*]] = load i32, i32* [[TMP1]]			// CHECK: [[TMP2:%.*]] = load i32, i32* [[TMP1]]
	// CHECK: [[TMP3:%.*]] = insertelement <2 x i32> undef, i32 [[TMP2]], i32 0			// CHECK: [[TMP3:%.*]] = insertelement <2 x i32> undef, i32 [[TMP2]], i32 0
	// CHECK: [[LANE:%.*]] = shufflevector <2 x i32> [[TMP3]], <2 x i32> [[TMP3]], <2 x i32> zeroinitializer			// CHECK: [[LANE:%.*]] = shufflevector <2 x i32> [[TMP3]], <2 x i32> [[TMP3]], <2 x i32> zeroinitializer
	// CHECK: ret <2 x i32> [[LANE]]			// CHECK: ret <2 x i32> [[LANE]]
	uint32x2_t test_vld1_dup_u32(uint32_t *a) {			uint32x2_t test_vld1_dup_u32(uint32_t *a) {
	return vld1_dup_u32(a);			return vld1_dup_u32(a);
	}			}

	// CHECK-LABEL: define <1 x i64> @test_vld1_dup_u64(i64* %a) #0 {			// CHECK-LABEL: define <1 x i64> @test_vld1_dup_u64(i64* %a) #1 {
	// CHECK: [[TMP0:%.*]] = bitcast i64* %a to i8*			// CHECK: [[TMP0:%.*]] = bitcast i64* %a to i8*
	// CHECK: [[TMP1:%.*]] = bitcast i8* [[TMP0]] to i64*			// CHECK: [[TMP1:%.*]] = bitcast i8* [[TMP0]] to i64*
	// CHECK: [[TMP2:%.*]] = load i64, i64* [[TMP1]]			// CHECK: [[TMP2:%.*]] = load i64, i64* [[TMP1]]
	// CHECK: [[TMP3:%.*]] = insertelement <1 x i64> undef, i64 [[TMP2]], i32 0			// CHECK: [[TMP3:%.*]] = insertelement <1 x i64> undef, i64 [[TMP2]], i32 0
	// CHECK: [[LANE:%.*]] = shufflevector <1 x i64> [[TMP3]], <1 x i64> [[TMP3]], <1 x i32> zeroinitializer			// CHECK: [[LANE:%.*]] = shufflevector <1 x i64> [[TMP3]], <1 x i64> [[TMP3]], <1 x i32> zeroinitializer
	// CHECK: ret <1 x i64> [[LANE]]			// CHECK: ret <1 x i64> [[LANE]]
	uint64x1_t test_vld1_dup_u64(uint64_t *a) {			uint64x1_t test_vld1_dup_u64(uint64_t *a) {
	return vld1_dup_u64(a);			return vld1_dup_u64(a);
	}			}

	// CHECK-LABEL: define <8 x i8> @test_vld1_dup_s8(i8* %a) #0 {			// CHECK-LABEL: define <8 x i8> @test_vld1_dup_s8(i8* %a) #1 {
	// CHECK: [[TMP0:%.*]] = load i8, i8* %a			// CHECK: [[TMP0:%.*]] = load i8, i8* %a
	// CHECK: [[TMP1:%.*]] = insertelement <8 x i8> undef, i8 [[TMP0]], i32 0			// CHECK: [[TMP1:%.*]] = insertelement <8 x i8> undef, i8 [[TMP0]], i32 0
	// CHECK: [[LANE:%.*]] = shufflevector <8 x i8> [[TMP1]], <8 x i8> [[TMP1]], <8 x i32> zeroinitializer			// CHECK: [[LANE:%.*]] = shufflevector <8 x i8> [[TMP1]], <8 x i8> [[TMP1]], <8 x i32> zeroinitializer
	// CHECK: ret <8 x i8> [[LANE]]			// CHECK: ret <8 x i8> [[LANE]]
	int8x8_t test_vld1_dup_s8(int8_t *a) {			int8x8_t test_vld1_dup_s8(int8_t *a) {
	return vld1_dup_s8(a);			return vld1_dup_s8(a);
	}			}

	// CHECK-LABEL: define <4 x i16> @test_vld1_dup_s16(i16* %a) #0 {			// CHECK-LABEL: define <4 x i16> @test_vld1_dup_s16(i16* %a) #1 {
	// CHECK: [[TMP0:%.*]] = bitcast i16* %a to i8*			// CHECK: [[TMP0:%.*]] = bitcast i16* %a to i8*
	// CHECK: [[TMP1:%.*]] = bitcast i8* [[TMP0]] to i16*			// CHECK: [[TMP1:%.*]] = bitcast i8* [[TMP0]] to i16*
	// CHECK: [[TMP2:%.*]] = load i16, i16* [[TMP1]]			// CHECK: [[TMP2:%.*]] = load i16, i16* [[TMP1]]
	// CHECK: [[TMP3:%.*]] = insertelement <4 x i16> undef, i16 [[TMP2]], i32 0			// CHECK: [[TMP3:%.*]] = insertelement <4 x i16> undef, i16 [[TMP2]], i32 0
	// CHECK: [[LANE:%.*]] = shufflevector <4 x i16> [[TMP3]], <4 x i16> [[TMP3]], <4 x i32> zeroinitializer			// CHECK: [[LANE:%.*]] = shufflevector <4 x i16> [[TMP3]], <4 x i16> [[TMP3]], <4 x i32> zeroinitializer
	// CHECK: ret <4 x i16> [[LANE]]			// CHECK: ret <4 x i16> [[LANE]]
	int16x4_t test_vld1_dup_s16(int16_t *a) {			int16x4_t test_vld1_dup_s16(int16_t *a) {
	return vld1_dup_s16(a);			return vld1_dup_s16(a);
	}			}

	// CHECK-LABEL: define <2 x i32> @test_vld1_dup_s32(i32* %a) #0 {			// CHECK-LABEL: define <2 x i32> @test_vld1_dup_s32(i32* %a) #1 {
	// CHECK: [[TMP0:%.*]] = bitcast i32* %a to i8*			// CHECK: [[TMP0:%.*]] = bitcast i32* %a to i8*
	// CHECK: [[TMP1:%.*]] = bitcast i8* [[TMP0]] to i32*			// CHECK: [[TMP1:%.*]] = bitcast i8* [[TMP0]] to i32*
	// CHECK: [[TMP2:%.*]] = load i32, i32* [[TMP1]]			// CHECK: [[TMP2:%.*]] = load i32, i32* [[TMP1]]
	// CHECK: [[TMP3:%.*]] = insertelement <2 x i32> undef, i32 [[TMP2]], i32 0			// CHECK: [[TMP3:%.*]] = insertelement <2 x i32> undef, i32 [[TMP2]], i32 0
	// CHECK: [[LANE:%.*]] = shufflevector <2 x i32> [[TMP3]], <2 x i32> [[TMP3]], <2 x i32> zeroinitializer			// CHECK: [[LANE:%.*]] = shufflevector <2 x i32> [[TMP3]], <2 x i32> [[TMP3]], <2 x i32> zeroinitializer
	// CHECK: ret <2 x i32> [[LANE]]			// CHECK: ret <2 x i32> [[LANE]]
	int32x2_t test_vld1_dup_s32(int32_t *a) {			int32x2_t test_vld1_dup_s32(int32_t *a) {
	return vld1_dup_s32(a);			return vld1_dup_s32(a);
	}			}

	// CHECK-LABEL: define <1 x i64> @test_vld1_dup_s64(i64* %a) #0 {			// CHECK-LABEL: define <1 x i64> @test_vld1_dup_s64(i64* %a) #1 {
	// CHECK: [[TMP0:%.*]] = bitcast i64* %a to i8*			// CHECK: [[TMP0:%.*]] = bitcast i64* %a to i8*
	// CHECK: [[TMP1:%.*]] = bitcast i8* [[TMP0]] to i64*			// CHECK: [[TMP1:%.*]] = bitcast i8* [[TMP0]] to i64*
	// CHECK: [[TMP2:%.*]] = load i64, i64* [[TMP1]]			// CHECK: [[TMP2:%.*]] = load i64, i64* [[TMP1]]
	// CHECK: [[TMP3:%.*]] = insertelement <1 x i64> undef, i64 [[TMP2]], i32 0			// CHECK: [[TMP3:%.*]] = insertelement <1 x i64> undef, i64 [[TMP2]], i32 0
	// CHECK: [[LANE:%.*]] = shufflevector <1 x i64> [[TMP3]], <1 x i64> [[TMP3]], <1 x i32> zeroinitializer			// CHECK: [[LANE:%.*]] = shufflevector <1 x i64> [[TMP3]], <1 x i64> [[TMP3]], <1 x i32> zeroinitializer
	// CHECK: ret <1 x i64> [[LANE]]			// CHECK: ret <1 x i64> [[LANE]]
	int64x1_t test_vld1_dup_s64(int64_t *a) {			int64x1_t test_vld1_dup_s64(int64_t *a) {
	return vld1_dup_s64(a);			return vld1_dup_s64(a);
	}			}

	// CHECK-LABEL: define <4 x half> @test_vld1_dup_f16(half* %a) #0 {			// CHECK-LABEL: define <4 x half> @test_vld1_dup_f16(half* %a) #1 {
	// CHECK: [[TMP0:%.*]] = bitcast half* %a to i8*			// CHECK: [[TMP0:%.*]] = bitcast half* %a to i8*
	// CHECK: [[TMP1:%.*]] = bitcast i8* [[TMP0]] to half*			// CHECK: [[TMP1:%.*]] = bitcast i8* [[TMP0]] to half*
	// CHECK: [[TMP2:%.*]] = load half, half* [[TMP1]]			// CHECK: [[TMP2:%.*]] = load half, half* [[TMP1]]
	// CHECK: [[TMP3:%.*]] = insertelement <4 x half> undef, half [[TMP2]], i32 0			// CHECK: [[TMP3:%.*]] = insertelement <4 x half> undef, half [[TMP2]], i32 0
	// CHECK: [[LANE:%.*]] = shufflevector <4 x half> [[TMP3]], <4 x half> [[TMP3]], <4 x i32> zeroinitializer			// CHECK: [[LANE:%.*]] = shufflevector <4 x half> [[TMP3]], <4 x half> [[TMP3]], <4 x i32> zeroinitializer
	// CHECK: ret <4 x half> [[LANE]]			// CHECK: ret <4 x half> [[LANE]]
	float16x4_t test_vld1_dup_f16(float16_t *a) {			float16x4_t test_vld1_dup_f16(float16_t *a) {
	return vld1_dup_f16(a);			return vld1_dup_f16(a);
	}			}

	// CHECK-LABEL: define <2 x float> @test_vld1_dup_f32(float* %a) #0 {			// CHECK-LABEL: define <2 x float> @test_vld1_dup_f32(float* %a) #1 {
	// CHECK: [[TMP0:%.*]] = bitcast float* %a to i8*			// CHECK: [[TMP0:%.*]] = bitcast float* %a to i8*
	// CHECK: [[TMP1:%.*]] = bitcast i8* [[TMP0]] to float*			// CHECK: [[TMP1:%.*]] = bitcast i8* [[TMP0]] to float*
	// CHECK: [[TMP2:%.*]] = load float, float* [[TMP1]]			// CHECK: [[TMP2:%.*]] = load float, float* [[TMP1]]
	// CHECK: [[TMP3:%.*]] = insertelement <2 x float> undef, float [[TMP2]], i32 0			// CHECK: [[TMP3:%.*]] = insertelement <2 x float> undef, float [[TMP2]], i32 0
	// CHECK: [[LANE:%.*]] = shufflevector <2 x float> [[TMP3]], <2 x float> [[TMP3]], <2 x i32> zeroinitializer			// CHECK: [[LANE:%.*]] = shufflevector <2 x float> [[TMP3]], <2 x float> [[TMP3]], <2 x i32> zeroinitializer
	// CHECK: ret <2 x float> [[LANE]]			// CHECK: ret <2 x float> [[LANE]]
	float32x2_t test_vld1_dup_f32(float32_t *a) {			float32x2_t test_vld1_dup_f32(float32_t *a) {
	return vld1_dup_f32(a);			return vld1_dup_f32(a);
	}			}

	// CHECK-LABEL: define <1 x double> @test_vld1_dup_f64(double* %a) #0 {			// CHECK-LABEL: define <1 x double> @test_vld1_dup_f64(double* %a) #1 {
	// CHECK: [[TMP0:%.*]] = bitcast double* %a to i8*			// CHECK: [[TMP0:%.*]] = bitcast double* %a to i8*
	// CHECK: [[TMP1:%.*]] = bitcast i8* [[TMP0]] to double*			// CHECK: [[TMP1:%.*]] = bitcast i8* [[TMP0]] to double*
	// CHECK: [[TMP2:%.*]] = load double, double* [[TMP1]]			// CHECK: [[TMP2:%.*]] = load double, double* [[TMP1]]
	// CHECK: [[TMP3:%.*]] = insertelement <1 x double> undef, double [[TMP2]], i32 0			// CHECK: [[TMP3:%.*]] = insertelement <1 x double> undef, double [[TMP2]], i32 0
	// CHECK: [[LANE:%.*]] = shufflevector <1 x double> [[TMP3]], <1 x double> [[TMP3]], <1 x i32> zeroinitializer			// CHECK: [[LANE:%.*]] = shufflevector <1 x double> [[TMP3]], <1 x double> [[TMP3]], <1 x i32> zeroinitializer
	// CHECK: ret <1 x double> [[LANE]]			// CHECK: ret <1 x double> [[LANE]]
	float64x1_t test_vld1_dup_f64(float64_t *a) {			float64x1_t test_vld1_dup_f64(float64_t *a) {
	return vld1_dup_f64(a);			return vld1_dup_f64(a);
	}			}

	// CHECK-LABEL: define <8 x i8> @test_vld1_dup_p8(i8* %a) #0 {			// CHECK-LABEL: define <8 x i8> @test_vld1_dup_p8(i8* %a) #1 {
	// CHECK: [[TMP0:%.*]] = load i8, i8* %a			// CHECK: [[TMP0:%.*]] = load i8, i8* %a
	// CHECK: [[TMP1:%.*]] = insertelement <8 x i8> undef, i8 [[TMP0]], i32 0			// CHECK: [[TMP1:%.*]] = insertelement <8 x i8> undef, i8 [[TMP0]], i32 0
	// CHECK: [[LANE:%.*]] = shufflevector <8 x i8> [[TMP1]], <8 x i8> [[TMP1]], <8 x i32> zeroinitializer			// CHECK: [[LANE:%.*]] = shufflevector <8 x i8> [[TMP1]], <8 x i8> [[TMP1]], <8 x i32> zeroinitializer
	// CHECK: ret <8 x i8> [[LANE]]			// CHECK: ret <8 x i8> [[LANE]]
	poly8x8_t test_vld1_dup_p8(poly8_t *a) {			poly8x8_t test_vld1_dup_p8(poly8_t *a) {
	return vld1_dup_p8(a);			return vld1_dup_p8(a);
	}			}

	// CHECK-LABEL: define <4 x i16> @test_vld1_dup_p16(i16* %a) #0 {			// CHECK-LABEL: define <4 x i16> @test_vld1_dup_p16(i16* %a) #1 {
	// CHECK: [[TMP0:%.*]] = bitcast i16* %a to i8*			// CHECK: [[TMP0:%.*]] = bitcast i16* %a to i8*
	// CHECK: [[TMP1:%.*]] = bitcast i8* [[TMP0]] to i16*			// CHECK: [[TMP1:%.*]] = bitcast i8* [[TMP0]] to i16*
	// CHECK: [[TMP2:%.*]] = load i16, i16* [[TMP1]]			// CHECK: [[TMP2:%.*]] = load i16, i16* [[TMP1]]
	// CHECK: [[TMP3:%.*]] = insertelement <4 x i16> undef, i16 [[TMP2]], i32 0			// CHECK: [[TMP3:%.*]] = insertelement <4 x i16> undef, i16 [[TMP2]], i32 0
	// CHECK: [[LANE:%.*]] = shufflevector <4 x i16> [[TMP3]], <4 x i16> [[TMP3]], <4 x i32> zeroinitializer			// CHECK: [[LANE:%.*]] = shufflevector <4 x i16> [[TMP3]], <4 x i16> [[TMP3]], <4 x i32> zeroinitializer
	// CHECK: ret <4 x i16> [[LANE]]			// CHECK: ret <4 x i16> [[LANE]]
	poly16x4_t test_vld1_dup_p16(poly16_t *a) {			poly16x4_t test_vld1_dup_p16(poly16_t *a) {
	return vld1_dup_p16(a);			return vld1_dup_p16(a);
	}			}

	// CHECK-LABEL: define <1 x i64> @test_vld1_dup_p64(i64* %a) #0 {			// CHECK-LABEL: define <1 x i64> @test_vld1_dup_p64(i64* %a) #1 {
	// CHECK: [[TMP0:%.*]] = bitcast i64* %a to i8*			// CHECK: [[TMP0:%.*]] = bitcast i64* %a to i8*
	// CHECK: [[TMP1:%.*]] = bitcast i8* [[TMP0]] to i64*			// CHECK: [[TMP1:%.*]] = bitcast i8* [[TMP0]] to i64*
	// CHECK: [[TMP2:%.*]] = load i64, i64* [[TMP1]]			// CHECK: [[TMP2:%.*]] = load i64, i64* [[TMP1]]
	// CHECK: [[TMP3:%.*]] = insertelement <1 x i64> undef, i64 [[TMP2]], i32 0			// CHECK: [[TMP3:%.*]] = insertelement <1 x i64> undef, i64 [[TMP2]], i32 0
	// CHECK: [[LANE:%.*]] = shufflevector <1 x i64> [[TMP3]], <1 x i64> [[TMP3]], <1 x i32> zeroinitializer			// CHECK: [[LANE:%.*]] = shufflevector <1 x i64> [[TMP3]], <1 x i64> [[TMP3]], <1 x i32> zeroinitializer
	// CHECK: ret <1 x i64> [[LANE]]			// CHECK: ret <1 x i64> [[LANE]]
	poly64x1_t test_vld1_dup_p64(poly64_t *a) {			poly64x1_t test_vld1_dup_p64(poly64_t *a) {
	return vld1_dup_p64(a);			return vld1_dup_p64(a);
	}			}

	// CHECK-LABEL: define %struct.uint64x2x2_t @test_vld2q_dup_u64(i64* %a) #0 {			// CHECK-LABEL: define %struct.uint64x2x2_t @test_vld2q_dup_u64(i64* %a) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.uint64x2x2_t, align 16			// CHECK: [[RETVAL:%.*]] = alloca %struct.uint64x2x2_t, align 16
	// CHECK: [[__RET:%.*]] = alloca %struct.uint64x2x2_t, align 16			// CHECK: [[__RET:%.*]] = alloca %struct.uint64x2x2_t, align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.uint64x2x2_t* [[__RET]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.uint64x2x2_t* [[__RET]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast i64* %a to i8*			// CHECK: [[TMP1:%.*]] = bitcast i64* %a to i8*
	// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to i64*			// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to i64*
	// CHECK: [[VLD2:%.*]] = call { <2 x i64>, <2 x i64> } @llvm.aarch64.neon.ld2r.v2i64.p0i64(i64* [[TMP2]])			// CHECK: [[VLD2:%.*]] = call { <2 x i64>, <2 x i64> } @llvm.aarch64.neon.ld2r.v2i64.p0i64(i64* [[TMP2]])
	// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <2 x i64>, <2 x i64> }*			// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <2 x i64>, <2 x i64> }*
	// CHECK: store { <2 x i64>, <2 x i64> } [[VLD2]], { <2 x i64>, <2 x i64> }* [[TMP3]]			// CHECK: store { <2 x i64>, <2 x i64> } [[VLD2]], { <2 x i64>, <2 x i64> }* [[TMP3]]
	// CHECK: [[TMP4:%.*]] = bitcast %struct.uint64x2x2_t* [[RETVAL]] to i8*			// CHECK: [[TMP4:%.*]] = bitcast %struct.uint64x2x2_t* [[RETVAL]] to i8*
	// CHECK: [[TMP5:%.*]] = bitcast %struct.uint64x2x2_t* [[__RET]] to i8*			// CHECK: [[TMP5:%.*]] = bitcast %struct.uint64x2x2_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP4]], i8* align 16 [[TMP5]], i64 32, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP4]], i8* align 16 [[TMP5]], i64 32, i1 false)
	// CHECK: [[TMP6:%.*]] = load %struct.uint64x2x2_t, %struct.uint64x2x2_t* [[RETVAL]], align 16			// CHECK: [[TMP6:%.*]] = load %struct.uint64x2x2_t, %struct.uint64x2x2_t* [[RETVAL]], align 16
	// CHECK: ret %struct.uint64x2x2_t [[TMP6]]			// CHECK: ret %struct.uint64x2x2_t [[TMP6]]
	uint64x2x2_t test_vld2q_dup_u64(uint64_t *a) {			uint64x2x2_t test_vld2q_dup_u64(uint64_t *a) {
	return vld2q_dup_u64(a);			return vld2q_dup_u64(a);
	}			}

	// CHECK-LABEL: define %struct.int64x2x2_t @test_vld2q_dup_s64(i64* %a) #0 {			// CHECK-LABEL: define %struct.int64x2x2_t @test_vld2q_dup_s64(i64* %a) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.int64x2x2_t, align 16			// CHECK: [[RETVAL:%.*]] = alloca %struct.int64x2x2_t, align 16
	// CHECK: [[__RET:%.*]] = alloca %struct.int64x2x2_t, align 16			// CHECK: [[__RET:%.*]] = alloca %struct.int64x2x2_t, align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.int64x2x2_t* [[__RET]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.int64x2x2_t* [[__RET]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast i64* %a to i8*			// CHECK: [[TMP1:%.*]] = bitcast i64* %a to i8*
	// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to i64*			// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to i64*
	// CHECK: [[VLD2:%.*]] = call { <2 x i64>, <2 x i64> } @llvm.aarch64.neon.ld2r.v2i64.p0i64(i64* [[TMP2]])			// CHECK: [[VLD2:%.*]] = call { <2 x i64>, <2 x i64> } @llvm.aarch64.neon.ld2r.v2i64.p0i64(i64* [[TMP2]])
	// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <2 x i64>, <2 x i64> }*			// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <2 x i64>, <2 x i64> }*
	// CHECK: store { <2 x i64>, <2 x i64> } [[VLD2]], { <2 x i64>, <2 x i64> }* [[TMP3]]			// CHECK: store { <2 x i64>, <2 x i64> } [[VLD2]], { <2 x i64>, <2 x i64> }* [[TMP3]]
	// CHECK: [[TMP4:%.*]] = bitcast %struct.int64x2x2_t* [[RETVAL]] to i8*			// CHECK: [[TMP4:%.*]] = bitcast %struct.int64x2x2_t* [[RETVAL]] to i8*
	// CHECK: [[TMP5:%.*]] = bitcast %struct.int64x2x2_t* [[__RET]] to i8*			// CHECK: [[TMP5:%.*]] = bitcast %struct.int64x2x2_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP4]], i8* align 16 [[TMP5]], i64 32, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP4]], i8* align 16 [[TMP5]], i64 32, i1 false)
	// CHECK: [[TMP6:%.]] = load %struct.int64x2x2_t, %struct.int64x2x2_t [[RETVAL]], align 16			// CHECK: [[TMP6:%.]] = load %struct.int64x2x2_t, %struct.int64x2x2_t [[RETVAL]], align 16
	// CHECK: ret %struct.int64x2x2_t [[TMP6]]			// CHECK: ret %struct.int64x2x2_t [[TMP6]]
	int64x2x2_t test_vld2q_dup_s64(int64_t *a) {			int64x2x2_t test_vld2q_dup_s64(int64_t *a) {
	return vld2q_dup_s64(a);			return vld2q_dup_s64(a);
	}			}

	// CHECK-LABEL: define %struct.float64x2x2_t @test_vld2q_dup_f64(double* %a) #0 {			// CHECK-LABEL: define %struct.float64x2x2_t @test_vld2q_dup_f64(double* %a) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.float64x2x2_t, align 16			// CHECK: [[RETVAL:%.*]] = alloca %struct.float64x2x2_t, align 16
	// CHECK: [[__RET:%.*]] = alloca %struct.float64x2x2_t, align 16			// CHECK: [[__RET:%.*]] = alloca %struct.float64x2x2_t, align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.float64x2x2_t* [[__RET]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.float64x2x2_t* [[__RET]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast double* %a to i8*			// CHECK: [[TMP1:%.*]] = bitcast double* %a to i8*
	// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to double*			// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to double*
	// CHECK: [[VLD2:%.*]] = call { <2 x double>, <2 x double> } @llvm.aarch64.neon.ld2r.v2f64.p0f64(double* [[TMP2]])			// CHECK: [[VLD2:%.*]] = call { <2 x double>, <2 x double> } @llvm.aarch64.neon.ld2r.v2f64.p0f64(double* [[TMP2]])
	// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <2 x double>, <2 x double> }*			// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <2 x double>, <2 x double> }*
	// CHECK: store { <2 x double>, <2 x double> } [[VLD2]], { <2 x double>, <2 x double> }* [[TMP3]]			// CHECK: store { <2 x double>, <2 x double> } [[VLD2]], { <2 x double>, <2 x double> }* [[TMP3]]
	// CHECK: [[TMP4:%.*]] = bitcast %struct.float64x2x2_t* [[RETVAL]] to i8*			// CHECK: [[TMP4:%.*]] = bitcast %struct.float64x2x2_t* [[RETVAL]] to i8*
	// CHECK: [[TMP5:%.*]] = bitcast %struct.float64x2x2_t* [[__RET]] to i8*			// CHECK: [[TMP5:%.*]] = bitcast %struct.float64x2x2_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP4]], i8* align 16 [[TMP5]], i64 32, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP4]], i8* align 16 [[TMP5]], i64 32, i1 false)
	// CHECK: [[TMP6:%.*]] = load %struct.float64x2x2_t, %struct.float64x2x2_t* [[RETVAL]], align 16			// CHECK: [[TMP6:%.*]] = load %struct.float64x2x2_t, %struct.float64x2x2_t* [[RETVAL]], align 16
	// CHECK: ret %struct.float64x2x2_t [[TMP6]]			// CHECK: ret %struct.float64x2x2_t [[TMP6]]
	float64x2x2_t test_vld2q_dup_f64(float64_t *a) {			float64x2x2_t test_vld2q_dup_f64(float64_t *a) {
	return vld2q_dup_f64(a);			return vld2q_dup_f64(a);
	}			}

	// CHECK-LABEL: define %struct.poly64x2x2_t @test_vld2q_dup_p64(i64* %a) #0 {			// CHECK-LABEL: define %struct.poly64x2x2_t @test_vld2q_dup_p64(i64* %a) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.poly64x2x2_t, align 16			// CHECK: [[RETVAL:%.*]] = alloca %struct.poly64x2x2_t, align 16
	// CHECK: [[__RET:%.*]] = alloca %struct.poly64x2x2_t, align 16			// CHECK: [[__RET:%.*]] = alloca %struct.poly64x2x2_t, align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.poly64x2x2_t* [[__RET]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.poly64x2x2_t* [[__RET]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast i64* %a to i8*			// CHECK: [[TMP1:%.*]] = bitcast i64* %a to i8*
	// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to i64*			// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to i64*
	// CHECK: [[VLD2:%.*]] = call { <2 x i64>, <2 x i64> } @llvm.aarch64.neon.ld2r.v2i64.p0i64(i64* [[TMP2]])			// CHECK: [[VLD2:%.*]] = call { <2 x i64>, <2 x i64> } @llvm.aarch64.neon.ld2r.v2i64.p0i64(i64* [[TMP2]])
	// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <2 x i64>, <2 x i64> }*			// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <2 x i64>, <2 x i64> }*
	// CHECK: store { <2 x i64>, <2 x i64> } [[VLD2]], { <2 x i64>, <2 x i64> }* [[TMP3]]			// CHECK: store { <2 x i64>, <2 x i64> } [[VLD2]], { <2 x i64>, <2 x i64> }* [[TMP3]]
	// CHECK: [[TMP4:%.*]] = bitcast %struct.poly64x2x2_t* [[RETVAL]] to i8*			// CHECK: [[TMP4:%.*]] = bitcast %struct.poly64x2x2_t* [[RETVAL]] to i8*
	// CHECK: [[TMP5:%.*]] = bitcast %struct.poly64x2x2_t* [[__RET]] to i8*			// CHECK: [[TMP5:%.*]] = bitcast %struct.poly64x2x2_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP4]], i8* align 16 [[TMP5]], i64 32, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP4]], i8* align 16 [[TMP5]], i64 32, i1 false)
	// CHECK: [[TMP6:%.*]] = load %struct.poly64x2x2_t, %struct.poly64x2x2_t* [[RETVAL]], align 16			// CHECK: [[TMP6:%.*]] = load %struct.poly64x2x2_t, %struct.poly64x2x2_t* [[RETVAL]], align 16
	// CHECK: ret %struct.poly64x2x2_t [[TMP6]]			// CHECK: ret %struct.poly64x2x2_t [[TMP6]]
	poly64x2x2_t test_vld2q_dup_p64(poly64_t *a) {			poly64x2x2_t test_vld2q_dup_p64(poly64_t *a) {
	return vld2q_dup_p64(a);			return vld2q_dup_p64(a);
	}			}

	// CHECK-LABEL: define %struct.float64x1x2_t @test_vld2_dup_f64(double* %a) #0 {			// CHECK-LABEL: define %struct.float64x1x2_t @test_vld2_dup_f64(double* %a) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.float64x1x2_t, align 8			// CHECK: [[RETVAL:%.*]] = alloca %struct.float64x1x2_t, align 8
	// CHECK: [[__RET:%.*]] = alloca %struct.float64x1x2_t, align 8			// CHECK: [[__RET:%.*]] = alloca %struct.float64x1x2_t, align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.float64x1x2_t* [[__RET]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.float64x1x2_t* [[__RET]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast double* %a to i8*			// CHECK: [[TMP1:%.*]] = bitcast double* %a to i8*
	// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to double*			// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to double*
	// CHECK: [[VLD2:%.*]] = call { <1 x double>, <1 x double> } @llvm.aarch64.neon.ld2r.v1f64.p0f64(double* [[TMP2]])			// CHECK: [[VLD2:%.*]] = call { <1 x double>, <1 x double> } @llvm.aarch64.neon.ld2r.v1f64.p0f64(double* [[TMP2]])
	// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <1 x double>, <1 x double> }*			// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <1 x double>, <1 x double> }*
	// CHECK: store { <1 x double>, <1 x double> } [[VLD2]], { <1 x double>, <1 x double> }* [[TMP3]]			// CHECK: store { <1 x double>, <1 x double> } [[VLD2]], { <1 x double>, <1 x double> }* [[TMP3]]
	// CHECK: [[TMP4:%.*]] = bitcast %struct.float64x1x2_t* [[RETVAL]] to i8*			// CHECK: [[TMP4:%.*]] = bitcast %struct.float64x1x2_t* [[RETVAL]] to i8*
	// CHECK: [[TMP5:%.*]] = bitcast %struct.float64x1x2_t* [[__RET]] to i8*			// CHECK: [[TMP5:%.*]] = bitcast %struct.float64x1x2_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP4]], i8* align 8 [[TMP5]], i64 16, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP4]], i8* align 8 [[TMP5]], i64 16, i1 false)
	// CHECK: [[TMP6:%.*]] = load %struct.float64x1x2_t, %struct.float64x1x2_t* [[RETVAL]], align 8			// CHECK: [[TMP6:%.*]] = load %struct.float64x1x2_t, %struct.float64x1x2_t* [[RETVAL]], align 8
	// CHECK: ret %struct.float64x1x2_t [[TMP6]]			// CHECK: ret %struct.float64x1x2_t [[TMP6]]
	float64x1x2_t test_vld2_dup_f64(float64_t *a) {			float64x1x2_t test_vld2_dup_f64(float64_t *a) {
	return vld2_dup_f64(a);			return vld2_dup_f64(a);
	}			}

	// CHECK-LABEL: define %struct.poly64x1x2_t @test_vld2_dup_p64(i64* %a) #0 {			// CHECK-LABEL: define %struct.poly64x1x2_t @test_vld2_dup_p64(i64* %a) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.poly64x1x2_t, align 8			// CHECK: [[RETVAL:%.*]] = alloca %struct.poly64x1x2_t, align 8
	// CHECK: [[__RET:%.*]] = alloca %struct.poly64x1x2_t, align 8			// CHECK: [[__RET:%.*]] = alloca %struct.poly64x1x2_t, align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.poly64x1x2_t* [[__RET]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.poly64x1x2_t* [[__RET]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast i64* %a to i8*			// CHECK: [[TMP1:%.*]] = bitcast i64* %a to i8*
	// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to i64*			// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to i64*
	// CHECK: [[VLD2:%.*]] = call { <1 x i64>, <1 x i64> } @llvm.aarch64.neon.ld2r.v1i64.p0i64(i64* [[TMP2]])			// CHECK: [[VLD2:%.*]] = call { <1 x i64>, <1 x i64> } @llvm.aarch64.neon.ld2r.v1i64.p0i64(i64* [[TMP2]])
	// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <1 x i64>, <1 x i64> }*			// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <1 x i64>, <1 x i64> }*
	// CHECK: store { <1 x i64>, <1 x i64> } [[VLD2]], { <1 x i64>, <1 x i64> }* [[TMP3]]			// CHECK: store { <1 x i64>, <1 x i64> } [[VLD2]], { <1 x i64>, <1 x i64> }* [[TMP3]]
	// CHECK: [[TMP4:%.*]] = bitcast %struct.poly64x1x2_t* [[RETVAL]] to i8*			// CHECK: [[TMP4:%.*]] = bitcast %struct.poly64x1x2_t* [[RETVAL]] to i8*
	// CHECK: [[TMP5:%.*]] = bitcast %struct.poly64x1x2_t* [[__RET]] to i8*			// CHECK: [[TMP5:%.*]] = bitcast %struct.poly64x1x2_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP4]], i8* align 8 [[TMP5]], i64 16, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP4]], i8* align 8 [[TMP5]], i64 16, i1 false)
	// CHECK: [[TMP6:%.*]] = load %struct.poly64x1x2_t, %struct.poly64x1x2_t* [[RETVAL]], align 8			// CHECK: [[TMP6:%.*]] = load %struct.poly64x1x2_t, %struct.poly64x1x2_t* [[RETVAL]], align 8
	// CHECK: ret %struct.poly64x1x2_t [[TMP6]]			// CHECK: ret %struct.poly64x1x2_t [[TMP6]]
	poly64x1x2_t test_vld2_dup_p64(poly64_t *a) {			poly64x1x2_t test_vld2_dup_p64(poly64_t *a) {
	return vld2_dup_p64(a);			return vld2_dup_p64(a);
	}			}

	// CHECK-LABEL: define %struct.uint64x2x3_t @test_vld3q_dup_u64(i64* %a) #0 {			// CHECK-LABEL: define %struct.uint64x2x3_t @test_vld3q_dup_u64(i64* %a) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.uint64x2x3_t, align 16			// CHECK: [[RETVAL:%.*]] = alloca %struct.uint64x2x3_t, align 16
	// CHECK: [[__RET:%.*]] = alloca %struct.uint64x2x3_t, align 16			// CHECK: [[__RET:%.*]] = alloca %struct.uint64x2x3_t, align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.uint64x2x3_t* [[__RET]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.uint64x2x3_t* [[__RET]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast i64* %a to i8*			// CHECK: [[TMP1:%.*]] = bitcast i64* %a to i8*
	// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to i64*			// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to i64*
	// CHECK: [[VLD3:%.*]] = call { <2 x i64>, <2 x i64>, <2 x i64> } @llvm.aarch64.neon.ld3r.v2i64.p0i64(i64* [[TMP2]])			// CHECK: [[VLD3:%.*]] = call { <2 x i64>, <2 x i64>, <2 x i64> } @llvm.aarch64.neon.ld3r.v2i64.p0i64(i64* [[TMP2]])
	// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <2 x i64>, <2 x i64>, <2 x i64> }*			// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <2 x i64>, <2 x i64>, <2 x i64> }*
	// CHECK: store { <2 x i64>, <2 x i64>, <2 x i64> } [[VLD3]], { <2 x i64>, <2 x i64>, <2 x i64> }* [[TMP3]]			// CHECK: store { <2 x i64>, <2 x i64>, <2 x i64> } [[VLD3]], { <2 x i64>, <2 x i64>, <2 x i64> }* [[TMP3]]
	// CHECK: [[TMP4:%.*]] = bitcast %struct.uint64x2x3_t* [[RETVAL]] to i8*			// CHECK: [[TMP4:%.*]] = bitcast %struct.uint64x2x3_t* [[RETVAL]] to i8*
	// CHECK: [[TMP5:%.*]] = bitcast %struct.uint64x2x3_t* [[__RET]] to i8*			// CHECK: [[TMP5:%.*]] = bitcast %struct.uint64x2x3_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP4]], i8* align 16 [[TMP5]], i64 48, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP4]], i8* align 16 [[TMP5]], i64 48, i1 false)
	// CHECK: [[TMP6:%.*]] = load %struct.uint64x2x3_t, %struct.uint64x2x3_t* [[RETVAL]], align 16			// CHECK: [[TMP6:%.*]] = load %struct.uint64x2x3_t, %struct.uint64x2x3_t* [[RETVAL]], align 16
	// CHECK: ret %struct.uint64x2x3_t [[TMP6]]			// CHECK: ret %struct.uint64x2x3_t [[TMP6]]
	uint64x2x3_t test_vld3q_dup_u64(uint64_t *a) {			uint64x2x3_t test_vld3q_dup_u64(uint64_t *a) {
	return vld3q_dup_u64(a);			return vld3q_dup_u64(a);
	// [{{x[0-9]+\|sp}}]			// [{{x[0-9]+\|sp}}]
	}			}

	// CHECK-LABEL: define %struct.int64x2x3_t @test_vld3q_dup_s64(i64* %a) #0 {			// CHECK-LABEL: define %struct.int64x2x3_t @test_vld3q_dup_s64(i64* %a) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.int64x2x3_t, align 16			// CHECK: [[RETVAL:%.*]] = alloca %struct.int64x2x3_t, align 16
	// CHECK: [[__RET:%.*]] = alloca %struct.int64x2x3_t, align 16			// CHECK: [[__RET:%.*]] = alloca %struct.int64x2x3_t, align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.int64x2x3_t* [[__RET]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.int64x2x3_t* [[__RET]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast i64* %a to i8*			// CHECK: [[TMP1:%.*]] = bitcast i64* %a to i8*
	// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to i64*			// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to i64*
	// CHECK: [[VLD3:%.*]] = call { <2 x i64>, <2 x i64>, <2 x i64> } @llvm.aarch64.neon.ld3r.v2i64.p0i64(i64* [[TMP2]])			// CHECK: [[VLD3:%.*]] = call { <2 x i64>, <2 x i64>, <2 x i64> } @llvm.aarch64.neon.ld3r.v2i64.p0i64(i64* [[TMP2]])
	// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <2 x i64>, <2 x i64>, <2 x i64> }*			// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <2 x i64>, <2 x i64>, <2 x i64> }*
	// CHECK: store { <2 x i64>, <2 x i64>, <2 x i64> } [[VLD3]], { <2 x i64>, <2 x i64>, <2 x i64> }* [[TMP3]]			// CHECK: store { <2 x i64>, <2 x i64>, <2 x i64> } [[VLD3]], { <2 x i64>, <2 x i64>, <2 x i64> }* [[TMP3]]
	// CHECK: [[TMP4:%.*]] = bitcast %struct.int64x2x3_t* [[RETVAL]] to i8*			// CHECK: [[TMP4:%.*]] = bitcast %struct.int64x2x3_t* [[RETVAL]] to i8*
	// CHECK: [[TMP5:%.*]] = bitcast %struct.int64x2x3_t* [[__RET]] to i8*			// CHECK: [[TMP5:%.*]] = bitcast %struct.int64x2x3_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP4]], i8* align 16 [[TMP5]], i64 48, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP4]], i8* align 16 [[TMP5]], i64 48, i1 false)
	// CHECK: [[TMP6:%.*]] = load %struct.int64x2x3_t, %struct.int64x2x3_t* [[RETVAL]], align 16			// CHECK: [[TMP6:%.*]] = load %struct.int64x2x3_t, %struct.int64x2x3_t* [[RETVAL]], align 16
	// CHECK: ret %struct.int64x2x3_t [[TMP6]]			// CHECK: ret %struct.int64x2x3_t [[TMP6]]
	int64x2x3_t test_vld3q_dup_s64(int64_t *a) {			int64x2x3_t test_vld3q_dup_s64(int64_t *a) {
	return vld3q_dup_s64(a);			return vld3q_dup_s64(a);
	// [{{x[0-9]+\|sp}}]			// [{{x[0-9]+\|sp}}]
	}			}

	// CHECK-LABEL: define %struct.float64x2x3_t @test_vld3q_dup_f64(double* %a) #0 {			// CHECK-LABEL: define %struct.float64x2x3_t @test_vld3q_dup_f64(double* %a) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.float64x2x3_t, align 16			// CHECK: [[RETVAL:%.*]] = alloca %struct.float64x2x3_t, align 16
	// CHECK: [[__RET:%.*]] = alloca %struct.float64x2x3_t, align 16			// CHECK: [[__RET:%.*]] = alloca %struct.float64x2x3_t, align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.float64x2x3_t* [[__RET]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.float64x2x3_t* [[__RET]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast double* %a to i8*			// CHECK: [[TMP1:%.*]] = bitcast double* %a to i8*
	// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to double*			// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to double*
	// CHECK: [[VLD3:%.*]] = call { <2 x double>, <2 x double>, <2 x double> } @llvm.aarch64.neon.ld3r.v2f64.p0f64(double* [[TMP2]])			// CHECK: [[VLD3:%.*]] = call { <2 x double>, <2 x double>, <2 x double> } @llvm.aarch64.neon.ld3r.v2f64.p0f64(double* [[TMP2]])
	// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <2 x double>, <2 x double>, <2 x double> }*			// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <2 x double>, <2 x double>, <2 x double> }*
	// CHECK: store { <2 x double>, <2 x double>, <2 x double> } [[VLD3]], { <2 x double>, <2 x double>, <2 x double> }* [[TMP3]]			// CHECK: store { <2 x double>, <2 x double>, <2 x double> } [[VLD3]], { <2 x double>, <2 x double>, <2 x double> }* [[TMP3]]
	// CHECK: [[TMP4:%.*]] = bitcast %struct.float64x2x3_t* [[RETVAL]] to i8*			// CHECK: [[TMP4:%.*]] = bitcast %struct.float64x2x3_t* [[RETVAL]] to i8*
	// CHECK: [[TMP5:%.*]] = bitcast %struct.float64x2x3_t* [[__RET]] to i8*			// CHECK: [[TMP5:%.*]] = bitcast %struct.float64x2x3_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP4]], i8* align 16 [[TMP5]], i64 48, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP4]], i8* align 16 [[TMP5]], i64 48, i1 false)
	// CHECK: [[TMP6:%.*]] = load %struct.float64x2x3_t, %struct.float64x2x3_t* [[RETVAL]], align 16			// CHECK: [[TMP6:%.*]] = load %struct.float64x2x3_t, %struct.float64x2x3_t* [[RETVAL]], align 16
	// CHECK: ret %struct.float64x2x3_t [[TMP6]]			// CHECK: ret %struct.float64x2x3_t [[TMP6]]
	float64x2x3_t test_vld3q_dup_f64(float64_t *a) {			float64x2x3_t test_vld3q_dup_f64(float64_t *a) {
	return vld3q_dup_f64(a);			return vld3q_dup_f64(a);
	// [{{x[0-9]+\|sp}}]			// [{{x[0-9]+\|sp}}]
	}			}

	// CHECK-LABEL: define %struct.poly64x2x3_t @test_vld3q_dup_p64(i64* %a) #0 {			// CHECK-LABEL: define %struct.poly64x2x3_t @test_vld3q_dup_p64(i64* %a) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.poly64x2x3_t, align 16			// CHECK: [[RETVAL:%.*]] = alloca %struct.poly64x2x3_t, align 16
	// CHECK: [[__RET:%.*]] = alloca %struct.poly64x2x3_t, align 16			// CHECK: [[__RET:%.*]] = alloca %struct.poly64x2x3_t, align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.poly64x2x3_t* [[__RET]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.poly64x2x3_t* [[__RET]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast i64* %a to i8*			// CHECK: [[TMP1:%.*]] = bitcast i64* %a to i8*
	// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to i64*			// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to i64*
	// CHECK: [[VLD3:%.*]] = call { <2 x i64>, <2 x i64>, <2 x i64> } @llvm.aarch64.neon.ld3r.v2i64.p0i64(i64* [[TMP2]])			// CHECK: [[VLD3:%.*]] = call { <2 x i64>, <2 x i64>, <2 x i64> } @llvm.aarch64.neon.ld3r.v2i64.p0i64(i64* [[TMP2]])
	// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <2 x i64>, <2 x i64>, <2 x i64> }*			// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <2 x i64>, <2 x i64>, <2 x i64> }*
	// CHECK: store { <2 x i64>, <2 x i64>, <2 x i64> } [[VLD3]], { <2 x i64>, <2 x i64>, <2 x i64> }* [[TMP3]]			// CHECK: store { <2 x i64>, <2 x i64>, <2 x i64> } [[VLD3]], { <2 x i64>, <2 x i64>, <2 x i64> }* [[TMP3]]
	// CHECK: [[TMP4:%.*]] = bitcast %struct.poly64x2x3_t* [[RETVAL]] to i8*			// CHECK: [[TMP4:%.*]] = bitcast %struct.poly64x2x3_t* [[RETVAL]] to i8*
	// CHECK: [[TMP5:%.*]] = bitcast %struct.poly64x2x3_t* [[__RET]] to i8*			// CHECK: [[TMP5:%.*]] = bitcast %struct.poly64x2x3_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP4]], i8* align 16 [[TMP5]], i64 48, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP4]], i8* align 16 [[TMP5]], i64 48, i1 false)
	// CHECK: [[TMP6:%.*]] = load %struct.poly64x2x3_t, %struct.poly64x2x3_t* [[RETVAL]], align 16			// CHECK: [[TMP6:%.*]] = load %struct.poly64x2x3_t, %struct.poly64x2x3_t* [[RETVAL]], align 16
	// CHECK: ret %struct.poly64x2x3_t [[TMP6]]			// CHECK: ret %struct.poly64x2x3_t [[TMP6]]
	poly64x2x3_t test_vld3q_dup_p64(poly64_t *a) {			poly64x2x3_t test_vld3q_dup_p64(poly64_t *a) {
	return vld3q_dup_p64(a);			return vld3q_dup_p64(a);
	// [{{x[0-9]+\|sp}}]			// [{{x[0-9]+\|sp}}]
	}			}

	// CHECK-LABEL: define %struct.float64x1x3_t @test_vld3_dup_f64(double* %a) #0 {			// CHECK-LABEL: define %struct.float64x1x3_t @test_vld3_dup_f64(double* %a) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.float64x1x3_t, align 8			// CHECK: [[RETVAL:%.*]] = alloca %struct.float64x1x3_t, align 8
	// CHECK: [[__RET:%.*]] = alloca %struct.float64x1x3_t, align 8			// CHECK: [[__RET:%.*]] = alloca %struct.float64x1x3_t, align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.float64x1x3_t* [[__RET]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.float64x1x3_t* [[__RET]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast double* %a to i8*			// CHECK: [[TMP1:%.*]] = bitcast double* %a to i8*
	// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to double*			// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to double*
	// CHECK: [[VLD3:%.*]] = call { <1 x double>, <1 x double>, <1 x double> } @llvm.aarch64.neon.ld3r.v1f64.p0f64(double* [[TMP2]])			// CHECK: [[VLD3:%.*]] = call { <1 x double>, <1 x double>, <1 x double> } @llvm.aarch64.neon.ld3r.v1f64.p0f64(double* [[TMP2]])
	// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <1 x double>, <1 x double>, <1 x double> }*			// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <1 x double>, <1 x double>, <1 x double> }*
	// CHECK: store { <1 x double>, <1 x double>, <1 x double> } [[VLD3]], { <1 x double>, <1 x double>, <1 x double> }* [[TMP3]]			// CHECK: store { <1 x double>, <1 x double>, <1 x double> } [[VLD3]], { <1 x double>, <1 x double>, <1 x double> }* [[TMP3]]
	// CHECK: [[TMP4:%.*]] = bitcast %struct.float64x1x3_t* [[RETVAL]] to i8*			// CHECK: [[TMP4:%.*]] = bitcast %struct.float64x1x3_t* [[RETVAL]] to i8*
	// CHECK: [[TMP5:%.*]] = bitcast %struct.float64x1x3_t* [[__RET]] to i8*			// CHECK: [[TMP5:%.*]] = bitcast %struct.float64x1x3_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP4]], i8* align 8 [[TMP5]], i64 24, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP4]], i8* align 8 [[TMP5]], i64 24, i1 false)
	// CHECK: [[TMP6:%.*]] = load %struct.float64x1x3_t, %struct.float64x1x3_t* [[RETVAL]], align 8			// CHECK: [[TMP6:%.*]] = load %struct.float64x1x3_t, %struct.float64x1x3_t* [[RETVAL]], align 8
	// CHECK: ret %struct.float64x1x3_t [[TMP6]]			// CHECK: ret %struct.float64x1x3_t [[TMP6]]
	float64x1x3_t test_vld3_dup_f64(float64_t *a) {			float64x1x3_t test_vld3_dup_f64(float64_t *a) {
	return vld3_dup_f64(a);			return vld3_dup_f64(a);
	// [{{x[0-9]+\|sp}}]			// [{{x[0-9]+\|sp}}]
	}			}

	// CHECK-LABEL: define %struct.poly64x1x3_t @test_vld3_dup_p64(i64* %a) #0 {			// CHECK-LABEL: define %struct.poly64x1x3_t @test_vld3_dup_p64(i64* %a) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.poly64x1x3_t, align 8			// CHECK: [[RETVAL:%.*]] = alloca %struct.poly64x1x3_t, align 8
	// CHECK: [[__RET:%.*]] = alloca %struct.poly64x1x3_t, align 8			// CHECK: [[__RET:%.*]] = alloca %struct.poly64x1x3_t, align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.poly64x1x3_t* [[__RET]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.poly64x1x3_t* [[__RET]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast i64* %a to i8*			// CHECK: [[TMP1:%.*]] = bitcast i64* %a to i8*
	// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to i64*			// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to i64*
	// CHECK: [[VLD3:%.*]] = call { <1 x i64>, <1 x i64>, <1 x i64> } @llvm.aarch64.neon.ld3r.v1i64.p0i64(i64* [[TMP2]])			// CHECK: [[VLD3:%.*]] = call { <1 x i64>, <1 x i64>, <1 x i64> } @llvm.aarch64.neon.ld3r.v1i64.p0i64(i64* [[TMP2]])
	// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <1 x i64>, <1 x i64>, <1 x i64> }*			// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <1 x i64>, <1 x i64>, <1 x i64> }*
	// CHECK: store { <1 x i64>, <1 x i64>, <1 x i64> } [[VLD3]], { <1 x i64>, <1 x i64>, <1 x i64> }* [[TMP3]]			// CHECK: store { <1 x i64>, <1 x i64>, <1 x i64> } [[VLD3]], { <1 x i64>, <1 x i64>, <1 x i64> }* [[TMP3]]
	// CHECK: [[TMP4:%.*]] = bitcast %struct.poly64x1x3_t* [[RETVAL]] to i8*			// CHECK: [[TMP4:%.*]] = bitcast %struct.poly64x1x3_t* [[RETVAL]] to i8*
	// CHECK: [[TMP5:%.*]] = bitcast %struct.poly64x1x3_t* [[__RET]] to i8*			// CHECK: [[TMP5:%.*]] = bitcast %struct.poly64x1x3_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP4]], i8* align 8 [[TMP5]], i64 24, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP4]], i8* align 8 [[TMP5]], i64 24, i1 false)
	// CHECK: [[TMP6:%.*]] = load %struct.poly64x1x3_t, %struct.poly64x1x3_t* [[RETVAL]], align 8			// CHECK: [[TMP6:%.*]] = load %struct.poly64x1x3_t, %struct.poly64x1x3_t* [[RETVAL]], align 8
	// CHECK: ret %struct.poly64x1x3_t [[TMP6]]			// CHECK: ret %struct.poly64x1x3_t [[TMP6]]
	poly64x1x3_t test_vld3_dup_p64(poly64_t *a) {			poly64x1x3_t test_vld3_dup_p64(poly64_t *a) {
	return vld3_dup_p64(a);			return vld3_dup_p64(a);
	// [{{x[0-9]+\|sp}}]			// [{{x[0-9]+\|sp}}]
	}			}

	// CHECK-LABEL: define %struct.uint64x2x4_t @test_vld4q_dup_u64(i64* %a) #0 {			// CHECK-LABEL: define %struct.uint64x2x4_t @test_vld4q_dup_u64(i64* %a) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.uint64x2x4_t, align 16			// CHECK: [[RETVAL:%.*]] = alloca %struct.uint64x2x4_t, align 16
	// CHECK: [[__RET:%.*]] = alloca %struct.uint64x2x4_t, align 16			// CHECK: [[__RET:%.*]] = alloca %struct.uint64x2x4_t, align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.uint64x2x4_t* [[__RET]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.uint64x2x4_t* [[__RET]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast i64* %a to i8*			// CHECK: [[TMP1:%.*]] = bitcast i64* %a to i8*
	// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to i64*			// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to i64*
	// CHECK: [[VLD4:%.*]] = call { <2 x i64>, <2 x i64>, <2 x i64>, <2 x i64> } @llvm.aarch64.neon.ld4r.v2i64.p0i64(i64* [[TMP2]])			// CHECK: [[VLD4:%.*]] = call { <2 x i64>, <2 x i64>, <2 x i64>, <2 x i64> } @llvm.aarch64.neon.ld4r.v2i64.p0i64(i64* [[TMP2]])
	// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <2 x i64>, <2 x i64>, <2 x i64>, <2 x i64> }*			// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <2 x i64>, <2 x i64>, <2 x i64>, <2 x i64> }*
	// CHECK: store { <2 x i64>, <2 x i64>, <2 x i64>, <2 x i64> } [[VLD4]], { <2 x i64>, <2 x i64>, <2 x i64>, <2 x i64> }* [[TMP3]]			// CHECK: store { <2 x i64>, <2 x i64>, <2 x i64>, <2 x i64> } [[VLD4]], { <2 x i64>, <2 x i64>, <2 x i64>, <2 x i64> }* [[TMP3]]
	// CHECK: [[TMP4:%.*]] = bitcast %struct.uint64x2x4_t* [[RETVAL]] to i8*			// CHECK: [[TMP4:%.*]] = bitcast %struct.uint64x2x4_t* [[RETVAL]] to i8*
	// CHECK: [[TMP5:%.*]] = bitcast %struct.uint64x2x4_t* [[__RET]] to i8*			// CHECK: [[TMP5:%.*]] = bitcast %struct.uint64x2x4_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP4]], i8* align 16 [[TMP5]], i64 64, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP4]], i8* align 16 [[TMP5]], i64 64, i1 false)
	// CHECK: [[TMP6:%.*]] = load %struct.uint64x2x4_t, %struct.uint64x2x4_t* [[RETVAL]], align 16			// CHECK: [[TMP6:%.*]] = load %struct.uint64x2x4_t, %struct.uint64x2x4_t* [[RETVAL]], align 16
	// CHECK: ret %struct.uint64x2x4_t [[TMP6]]			// CHECK: ret %struct.uint64x2x4_t [[TMP6]]
	uint64x2x4_t test_vld4q_dup_u64(uint64_t *a) {			uint64x2x4_t test_vld4q_dup_u64(uint64_t *a) {
	return vld4q_dup_u64(a);			return vld4q_dup_u64(a);
	}			}

	// CHECK-LABEL: define %struct.int64x2x4_t @test_vld4q_dup_s64(i64* %a) #0 {			// CHECK-LABEL: define %struct.int64x2x4_t @test_vld4q_dup_s64(i64* %a) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.int64x2x4_t, align 16			// CHECK: [[RETVAL:%.*]] = alloca %struct.int64x2x4_t, align 16
	// CHECK: [[__RET:%.*]] = alloca %struct.int64x2x4_t, align 16			// CHECK: [[__RET:%.*]] = alloca %struct.int64x2x4_t, align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.int64x2x4_t* [[__RET]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.int64x2x4_t* [[__RET]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast i64* %a to i8*			// CHECK: [[TMP1:%.*]] = bitcast i64* %a to i8*
	// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to i64*			// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to i64*
	// CHECK: [[VLD4:%.*]] = call { <2 x i64>, <2 x i64>, <2 x i64>, <2 x i64> } @llvm.aarch64.neon.ld4r.v2i64.p0i64(i64* [[TMP2]])			// CHECK: [[VLD4:%.*]] = call { <2 x i64>, <2 x i64>, <2 x i64>, <2 x i64> } @llvm.aarch64.neon.ld4r.v2i64.p0i64(i64* [[TMP2]])
	// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <2 x i64>, <2 x i64>, <2 x i64>, <2 x i64> }*			// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <2 x i64>, <2 x i64>, <2 x i64>, <2 x i64> }*
	// CHECK: store { <2 x i64>, <2 x i64>, <2 x i64>, <2 x i64> } [[VLD4]], { <2 x i64>, <2 x i64>, <2 x i64>, <2 x i64> }* [[TMP3]]			// CHECK: store { <2 x i64>, <2 x i64>, <2 x i64>, <2 x i64> } [[VLD4]], { <2 x i64>, <2 x i64>, <2 x i64>, <2 x i64> }* [[TMP3]]
	// CHECK: [[TMP4:%.*]] = bitcast %struct.int64x2x4_t* [[RETVAL]] to i8*			// CHECK: [[TMP4:%.*]] = bitcast %struct.int64x2x4_t* [[RETVAL]] to i8*
	// CHECK: [[TMP5:%.*]] = bitcast %struct.int64x2x4_t* [[__RET]] to i8*			// CHECK: [[TMP5:%.*]] = bitcast %struct.int64x2x4_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP4]], i8* align 16 [[TMP5]], i64 64, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP4]], i8* align 16 [[TMP5]], i64 64, i1 false)
	// CHECK: [[TMP6:%.*]] = load %struct.int64x2x4_t, %struct.int64x2x4_t* [[RETVAL]], align 16			// CHECK: [[TMP6:%.*]] = load %struct.int64x2x4_t, %struct.int64x2x4_t* [[RETVAL]], align 16
	// CHECK: ret %struct.int64x2x4_t [[TMP6]]			// CHECK: ret %struct.int64x2x4_t [[TMP6]]
	int64x2x4_t test_vld4q_dup_s64(int64_t *a) {			int64x2x4_t test_vld4q_dup_s64(int64_t *a) {
	return vld4q_dup_s64(a);			return vld4q_dup_s64(a);
	}			}

	// CHECK-LABEL: define %struct.float64x2x4_t @test_vld4q_dup_f64(double* %a) #0 {			// CHECK-LABEL: define %struct.float64x2x4_t @test_vld4q_dup_f64(double* %a) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.float64x2x4_t, align 16			// CHECK: [[RETVAL:%.*]] = alloca %struct.float64x2x4_t, align 16
	// CHECK: [[__RET:%.*]] = alloca %struct.float64x2x4_t, align 16			// CHECK: [[__RET:%.*]] = alloca %struct.float64x2x4_t, align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.float64x2x4_t* [[__RET]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.float64x2x4_t* [[__RET]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast double* %a to i8*			// CHECK: [[TMP1:%.*]] = bitcast double* %a to i8*
	// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to double*			// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to double*
	// CHECK: [[VLD4:%.*]] = call { <2 x double>, <2 x double>, <2 x double>, <2 x double> } @llvm.aarch64.neon.ld4r.v2f64.p0f64(double* [[TMP2]])			// CHECK: [[VLD4:%.*]] = call { <2 x double>, <2 x double>, <2 x double>, <2 x double> } @llvm.aarch64.neon.ld4r.v2f64.p0f64(double* [[TMP2]])
	// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <2 x double>, <2 x double>, <2 x double>, <2 x double> }*			// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <2 x double>, <2 x double>, <2 x double>, <2 x double> }*
	// CHECK: store { <2 x double>, <2 x double>, <2 x double>, <2 x double> } [[VLD4]], { <2 x double>, <2 x double>, <2 x double>, <2 x double> }* [[TMP3]]			// CHECK: store { <2 x double>, <2 x double>, <2 x double>, <2 x double> } [[VLD4]], { <2 x double>, <2 x double>, <2 x double>, <2 x double> }* [[TMP3]]
	// CHECK: [[TMP4:%.*]] = bitcast %struct.float64x2x4_t* [[RETVAL]] to i8*			// CHECK: [[TMP4:%.*]] = bitcast %struct.float64x2x4_t* [[RETVAL]] to i8*
	// CHECK: [[TMP5:%.*]] = bitcast %struct.float64x2x4_t* [[__RET]] to i8*			// CHECK: [[TMP5:%.*]] = bitcast %struct.float64x2x4_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP4]], i8* align 16 [[TMP5]], i64 64, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP4]], i8* align 16 [[TMP5]], i64 64, i1 false)
	// CHECK: [[TMP6:%.*]] = load %struct.float64x2x4_t, %struct.float64x2x4_t* [[RETVAL]], align 16			// CHECK: [[TMP6:%.*]] = load %struct.float64x2x4_t, %struct.float64x2x4_t* [[RETVAL]], align 16
	// CHECK: ret %struct.float64x2x4_t [[TMP6]]			// CHECK: ret %struct.float64x2x4_t [[TMP6]]
	float64x2x4_t test_vld4q_dup_f64(float64_t *a) {			float64x2x4_t test_vld4q_dup_f64(float64_t *a) {
	return vld4q_dup_f64(a);			return vld4q_dup_f64(a);
	}			}

	// CHECK-LABEL: define %struct.poly64x2x4_t @test_vld4q_dup_p64(i64* %a) #0 {			// CHECK-LABEL: define %struct.poly64x2x4_t @test_vld4q_dup_p64(i64* %a) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.poly64x2x4_t, align 16			// CHECK: [[RETVAL:%.*]] = alloca %struct.poly64x2x4_t, align 16
	// CHECK: [[__RET:%.*]] = alloca %struct.poly64x2x4_t, align 16			// CHECK: [[__RET:%.*]] = alloca %struct.poly64x2x4_t, align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.poly64x2x4_t* [[__RET]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.poly64x2x4_t* [[__RET]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast i64* %a to i8*			// CHECK: [[TMP1:%.*]] = bitcast i64* %a to i8*
	// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to i64*			// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to i64*
	// CHECK: [[VLD4:%.*]] = call { <2 x i64>, <2 x i64>, <2 x i64>, <2 x i64> } @llvm.aarch64.neon.ld4r.v2i64.p0i64(i64* [[TMP2]])			// CHECK: [[VLD4:%.*]] = call { <2 x i64>, <2 x i64>, <2 x i64>, <2 x i64> } @llvm.aarch64.neon.ld4r.v2i64.p0i64(i64* [[TMP2]])
	// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <2 x i64>, <2 x i64>, <2 x i64>, <2 x i64> }*			// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <2 x i64>, <2 x i64>, <2 x i64>, <2 x i64> }*
	// CHECK: store { <2 x i64>, <2 x i64>, <2 x i64>, <2 x i64> } [[VLD4]], { <2 x i64>, <2 x i64>, <2 x i64>, <2 x i64> }* [[TMP3]]			// CHECK: store { <2 x i64>, <2 x i64>, <2 x i64>, <2 x i64> } [[VLD4]], { <2 x i64>, <2 x i64>, <2 x i64>, <2 x i64> }* [[TMP3]]
	// CHECK: [[TMP4:%.*]] = bitcast %struct.poly64x2x4_t* [[RETVAL]] to i8*			// CHECK: [[TMP4:%.*]] = bitcast %struct.poly64x2x4_t* [[RETVAL]] to i8*
	// CHECK: [[TMP5:%.*]] = bitcast %struct.poly64x2x4_t* [[__RET]] to i8*			// CHECK: [[TMP5:%.*]] = bitcast %struct.poly64x2x4_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP4]], i8* align 16 [[TMP5]], i64 64, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP4]], i8* align 16 [[TMP5]], i64 64, i1 false)
	// CHECK: [[TMP6:%.*]] = load %struct.poly64x2x4_t, %struct.poly64x2x4_t* [[RETVAL]], align 16			// CHECK: [[TMP6:%.*]] = load %struct.poly64x2x4_t, %struct.poly64x2x4_t* [[RETVAL]], align 16
	// CHECK: ret %struct.poly64x2x4_t [[TMP6]]			// CHECK: ret %struct.poly64x2x4_t [[TMP6]]
	poly64x2x4_t test_vld4q_dup_p64(poly64_t *a) {			poly64x2x4_t test_vld4q_dup_p64(poly64_t *a) {
	return vld4q_dup_p64(a);			return vld4q_dup_p64(a);
	}			}

	// CHECK-LABEL: define %struct.float64x1x4_t @test_vld4_dup_f64(double* %a) #0 {			// CHECK-LABEL: define %struct.float64x1x4_t @test_vld4_dup_f64(double* %a) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.float64x1x4_t, align 8			// CHECK: [[RETVAL:%.*]] = alloca %struct.float64x1x4_t, align 8
	// CHECK: [[__RET:%.*]] = alloca %struct.float64x1x4_t, align 8			// CHECK: [[__RET:%.*]] = alloca %struct.float64x1x4_t, align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.float64x1x4_t* [[__RET]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.float64x1x4_t* [[__RET]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast double* %a to i8*			// CHECK: [[TMP1:%.*]] = bitcast double* %a to i8*
	// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to double*			// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to double*
	// CHECK: [[VLD4:%.*]] = call { <1 x double>, <1 x double>, <1 x double>, <1 x double> } @llvm.aarch64.neon.ld4r.v1f64.p0f64(double* [[TMP2]])			// CHECK: [[VLD4:%.*]] = call { <1 x double>, <1 x double>, <1 x double>, <1 x double> } @llvm.aarch64.neon.ld4r.v1f64.p0f64(double* [[TMP2]])
	// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <1 x double>, <1 x double>, <1 x double>, <1 x double> }*			// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <1 x double>, <1 x double>, <1 x double>, <1 x double> }*
	// CHECK: store { <1 x double>, <1 x double>, <1 x double>, <1 x double> } [[VLD4]], { <1 x double>, <1 x double>, <1 x double>, <1 x double> }* [[TMP3]]			// CHECK: store { <1 x double>, <1 x double>, <1 x double>, <1 x double> } [[VLD4]], { <1 x double>, <1 x double>, <1 x double>, <1 x double> }* [[TMP3]]
	// CHECK: [[TMP4:%.*]] = bitcast %struct.float64x1x4_t* [[RETVAL]] to i8*			// CHECK: [[TMP4:%.*]] = bitcast %struct.float64x1x4_t* [[RETVAL]] to i8*
	// CHECK: [[TMP5:%.*]] = bitcast %struct.float64x1x4_t* [[__RET]] to i8*			// CHECK: [[TMP5:%.*]] = bitcast %struct.float64x1x4_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP4]], i8* align 8 [[TMP5]], i64 32, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP4]], i8* align 8 [[TMP5]], i64 32, i1 false)
	// CHECK: [[TMP6:%.*]] = load %struct.float64x1x4_t, %struct.float64x1x4_t* [[RETVAL]], align 8			// CHECK: [[TMP6:%.*]] = load %struct.float64x1x4_t, %struct.float64x1x4_t* [[RETVAL]], align 8
	// CHECK: ret %struct.float64x1x4_t [[TMP6]]			// CHECK: ret %struct.float64x1x4_t [[TMP6]]
	float64x1x4_t test_vld4_dup_f64(float64_t *a) {			float64x1x4_t test_vld4_dup_f64(float64_t *a) {
	return vld4_dup_f64(a);			return vld4_dup_f64(a);
	}			}

	// CHECK-LABEL: define %struct.poly64x1x4_t @test_vld4_dup_p64(i64* %a) #0 {			// CHECK-LABEL: define %struct.poly64x1x4_t @test_vld4_dup_p64(i64* %a) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.poly64x1x4_t, align 8			// CHECK: [[RETVAL:%.*]] = alloca %struct.poly64x1x4_t, align 8
	// CHECK: [[__RET:%.*]] = alloca %struct.poly64x1x4_t, align 8			// CHECK: [[__RET:%.*]] = alloca %struct.poly64x1x4_t, align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.poly64x1x4_t* [[__RET]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.poly64x1x4_t* [[__RET]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast i64* %a to i8*			// CHECK: [[TMP1:%.*]] = bitcast i64* %a to i8*
	// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to i64*			// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to i64*
	// CHECK: [[VLD4:%.*]] = call { <1 x i64>, <1 x i64>, <1 x i64>, <1 x i64> } @llvm.aarch64.neon.ld4r.v1i64.p0i64(i64* [[TMP2]])			// CHECK: [[VLD4:%.*]] = call { <1 x i64>, <1 x i64>, <1 x i64>, <1 x i64> } @llvm.aarch64.neon.ld4r.v1i64.p0i64(i64* [[TMP2]])
	// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <1 x i64>, <1 x i64>, <1 x i64>, <1 x i64> }*			// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <1 x i64>, <1 x i64>, <1 x i64>, <1 x i64> }*
	// CHECK: store { <1 x i64>, <1 x i64>, <1 x i64>, <1 x i64> } [[VLD4]], { <1 x i64>, <1 x i64>, <1 x i64>, <1 x i64> }* [[TMP3]]			// CHECK: store { <1 x i64>, <1 x i64>, <1 x i64>, <1 x i64> } [[VLD4]], { <1 x i64>, <1 x i64>, <1 x i64>, <1 x i64> }* [[TMP3]]
	[157 lines not shown]
	// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to i64*			// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to i64*
	// CHECK: [[TMP4:%.*]] = load i64, i64* [[TMP3]]			// CHECK: [[TMP4:%.*]] = load i64, i64* [[TMP3]]
	// CHECK: [[VLD1_LANE:%.*]] = insertelement <2 x i64> [[TMP2]], i64 [[TMP4]], i32 1			// CHECK: [[VLD1_LANE:%.*]] = insertelement <2 x i64> [[TMP2]], i64 [[TMP4]], i32 1
	// CHECK: ret <2 x i64> [[VLD1_LANE]]			// CHECK: ret <2 x i64> [[VLD1_LANE]]
	poly64x2_t test_vld1q_lane_p64(poly64_t *a, poly64x2_t b) {			poly64x2_t test_vld1q_lane_p64(poly64_t *a, poly64x2_t b) {
	return vld1q_lane_p64(a, b, 1);			return vld1q_lane_p64(a, b, 1);
	}			}

	// CHECK-LABEL: define <8 x i8> @test_vld1_lane_u8(i8* %a, <8 x i8> %b) #0 {			// CHECK-LABEL: define <8 x i8> @test_vld1_lane_u8(i8* %a, <8 x i8> %b) #1 {
	// CHECK: [[TMP0:%.*]] = load i8, i8* %a			// CHECK: [[TMP0:%.*]] = load i8, i8* %a
	// CHECK: [[VLD1_LANE:%.*]] = insertelement <8 x i8> %b, i8 [[TMP0]], i32 7			// CHECK: [[VLD1_LANE:%.*]] = insertelement <8 x i8> %b, i8 [[TMP0]], i32 7
	// CHECK: ret <8 x i8> [[VLD1_LANE]]			// CHECK: ret <8 x i8> [[VLD1_LANE]]
	uint8x8_t test_vld1_lane_u8(uint8_t *a, uint8x8_t b) {			uint8x8_t test_vld1_lane_u8(uint8_t *a, uint8x8_t b) {
	return vld1_lane_u8(a, b, 7);			return vld1_lane_u8(a, b, 7);
	}			}

	// CHECK-LABEL: define <4 x i16> @test_vld1_lane_u16(i16* %a, <4 x i16> %b) #0 {			// CHECK-LABEL: define <4 x i16> @test_vld1_lane_u16(i16* %a, <4 x i16> %b) #1 {
	// CHECK: [[TMP0:%.*]] = bitcast i16* %a to i8*			// CHECK: [[TMP0:%.*]] = bitcast i16* %a to i8*
	// CHECK: [[TMP1:%.*]] = bitcast <4 x i16> %b to <8 x i8>			// CHECK: [[TMP1:%.*]] = bitcast <4 x i16> %b to <8 x i8>
	// CHECK: [[TMP2:%.*]] = bitcast <8 x i8> [[TMP1]] to <4 x i16>			// CHECK: [[TMP2:%.*]] = bitcast <8 x i8> [[TMP1]] to <4 x i16>
	// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to i16*			// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to i16*
	// CHECK: [[TMP4:%.*]] = load i16, i16* [[TMP3]]			// CHECK: [[TMP4:%.*]] = load i16, i16* [[TMP3]]
	// CHECK: [[VLD1_LANE:%.*]] = insertelement <4 x i16> [[TMP2]], i16 [[TMP4]], i32 3			// CHECK: [[VLD1_LANE:%.*]] = insertelement <4 x i16> [[TMP2]], i16 [[TMP4]], i32 3
	// CHECK: ret <4 x i16> [[VLD1_LANE]]			// CHECK: ret <4 x i16> [[VLD1_LANE]]
	uint16x4_t test_vld1_lane_u16(uint16_t *a, uint16x4_t b) {			uint16x4_t test_vld1_lane_u16(uint16_t *a, uint16x4_t b) {
	return vld1_lane_u16(a, b, 3);			return vld1_lane_u16(a, b, 3);
	}			}

	// CHECK-LABEL: define <2 x i32> @test_vld1_lane_u32(i32* %a, <2 x i32> %b) #0 {			// CHECK-LABEL: define <2 x i32> @test_vld1_lane_u32(i32* %a, <2 x i32> %b) #1 {
	// CHECK: [[TMP0:%.*]] = bitcast i32* %a to i8*			// CHECK: [[TMP0:%.*]] = bitcast i32* %a to i8*
	// CHECK: [[TMP1:%.*]] = bitcast <2 x i32> %b to <8 x i8>			// CHECK: [[TMP1:%.*]] = bitcast <2 x i32> %b to <8 x i8>
	// CHECK: [[TMP2:%.*]] = bitcast <8 x i8> [[TMP1]] to <2 x i32>			// CHECK: [[TMP2:%.*]] = bitcast <8 x i8> [[TMP1]] to <2 x i32>
	// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to i32*			// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to i32*
	// CHECK: [[TMP4:%.*]] = load i32, i32* [[TMP3]]			// CHECK: [[TMP4:%.*]] = load i32, i32* [[TMP3]]
	// CHECK: [[VLD1_LANE:%.*]] = insertelement <2 x i32> [[TMP2]], i32 [[TMP4]], i32 1			// CHECK: [[VLD1_LANE:%.*]] = insertelement <2 x i32> [[TMP2]], i32 [[TMP4]], i32 1
	// CHECK: ret <2 x i32> [[VLD1_LANE]]			// CHECK: ret <2 x i32> [[VLD1_LANE]]
	uint32x2_t test_vld1_lane_u32(uint32_t *a, uint32x2_t b) {			uint32x2_t test_vld1_lane_u32(uint32_t *a, uint32x2_t b) {
	return vld1_lane_u32(a, b, 1);			return vld1_lane_u32(a, b, 1);
	}			}

	// CHECK-LABEL: define <1 x i64> @test_vld1_lane_u64(i64* %a, <1 x i64> %b) #0 {			// CHECK-LABEL: define <1 x i64> @test_vld1_lane_u64(i64* %a, <1 x i64> %b) #1 {
	// CHECK: [[TMP0:%.*]] = bitcast i64* %a to i8*			// CHECK: [[TMP0:%.*]] = bitcast i64* %a to i8*
	// CHECK: [[TMP1:%.*]] = bitcast <1 x i64> %b to <8 x i8>			// CHECK: [[TMP1:%.*]] = bitcast <1 x i64> %b to <8 x i8>
	// CHECK: [[TMP2:%.*]] = bitcast <8 x i8> [[TMP1]] to <1 x i64>			// CHECK: [[TMP2:%.*]] = bitcast <8 x i8> [[TMP1]] to <1 x i64>
	// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to i64*			// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to i64*
	// CHECK: [[TMP4:%.*]] = load i64, i64* [[TMP3]]			// CHECK: [[TMP4:%.*]] = load i64, i64* [[TMP3]]
	// CHECK: [[VLD1_LANE:%.*]] = insertelement <1 x i64> [[TMP2]], i64 [[TMP4]], i32 0			// CHECK: [[VLD1_LANE:%.*]] = insertelement <1 x i64> [[TMP2]], i64 [[TMP4]], i32 0
	// CHECK: ret <1 x i64> [[VLD1_LANE]]			// CHECK: ret <1 x i64> [[VLD1_LANE]]
	uint64x1_t test_vld1_lane_u64(uint64_t *a, uint64x1_t b) {			uint64x1_t test_vld1_lane_u64(uint64_t *a, uint64x1_t b) {
	return vld1_lane_u64(a, b, 0);			return vld1_lane_u64(a, b, 0);
	}			}

	// CHECK-LABEL: define <8 x i8> @test_vld1_lane_s8(i8* %a, <8 x i8> %b) #0 {			// CHECK-LABEL: define <8 x i8> @test_vld1_lane_s8(i8* %a, <8 x i8> %b) #1 {
	// CHECK: [[TMP0:%.*]] = load i8, i8* %a			// CHECK: [[TMP0:%.*]] = load i8, i8* %a
	// CHECK: [[VLD1_LANE:%.*]] = insertelement <8 x i8> %b, i8 [[TMP0]], i32 7			// CHECK: [[VLD1_LANE:%.*]] = insertelement <8 x i8> %b, i8 [[TMP0]], i32 7
	// CHECK: ret <8 x i8> [[VLD1_LANE]]			// CHECK: ret <8 x i8> [[VLD1_LANE]]
	int8x8_t test_vld1_lane_s8(int8_t *a, int8x8_t b) {			int8x8_t test_vld1_lane_s8(int8_t *a, int8x8_t b) {
	return vld1_lane_s8(a, b, 7);			return vld1_lane_s8(a, b, 7);
	}			}

	// CHECK-LABEL: define <4 x i16> @test_vld1_lane_s16(i16* %a, <4 x i16> %b) #0 {			// CHECK-LABEL: define <4 x i16> @test_vld1_lane_s16(i16* %a, <4 x i16> %b) #1 {
	// CHECK: [[TMP0:%.*]] = bitcast i16* %a to i8*			// CHECK: [[TMP0:%.*]] = bitcast i16* %a to i8*
	// CHECK: [[TMP1:%.*]] = bitcast <4 x i16> %b to <8 x i8>			// CHECK: [[TMP1:%.*]] = bitcast <4 x i16> %b to <8 x i8>
	// CHECK: [[TMP2:%.*]] = bitcast <8 x i8> [[TMP1]] to <4 x i16>			// CHECK: [[TMP2:%.*]] = bitcast <8 x i8> [[TMP1]] to <4 x i16>
	// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to i16*			// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to i16*
	// CHECK: [[TMP4:%.*]] = load i16, i16* [[TMP3]]			// CHECK: [[TMP4:%.*]] = load i16, i16* [[TMP3]]
	// CHECK: [[VLD1_LANE:%.*]] = insertelement <4 x i16> [[TMP2]], i16 [[TMP4]], i32 3			// CHECK: [[VLD1_LANE:%.*]] = insertelement <4 x i16> [[TMP2]], i16 [[TMP4]], i32 3
	// CHECK: ret <4 x i16> [[VLD1_LANE]]			// CHECK: ret <4 x i16> [[VLD1_LANE]]
	int16x4_t test_vld1_lane_s16(int16_t *a, int16x4_t b) {			int16x4_t test_vld1_lane_s16(int16_t *a, int16x4_t b) {
	return vld1_lane_s16(a, b, 3);			return vld1_lane_s16(a, b, 3);
	}			}

	// CHECK-LABEL: define <2 x i32> @test_vld1_lane_s32(i32* %a, <2 x i32> %b) #0 {			// CHECK-LABEL: define <2 x i32> @test_vld1_lane_s32(i32* %a, <2 x i32> %b) #1 {
	// CHECK: [[TMP0:%.*]] = bitcast i32* %a to i8*			// CHECK: [[TMP0:%.*]] = bitcast i32* %a to i8*
	// CHECK: [[TMP1:%.*]] = bitcast <2 x i32> %b to <8 x i8>			// CHECK: [[TMP1:%.*]] = bitcast <2 x i32> %b to <8 x i8>
	// CHECK: [[TMP2:%.*]] = bitcast <8 x i8> [[TMP1]] to <2 x i32>			// CHECK: [[TMP2:%.*]] = bitcast <8 x i8> [[TMP1]] to <2 x i32>
	// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to i32*			// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to i32*
	// CHECK: [[TMP4:%.*]] = load i32, i32* [[TMP3]]			// CHECK: [[TMP4:%.*]] = load i32, i32* [[TMP3]]
	// CHECK: [[VLD1_LANE:%.*]] = insertelement <2 x i32> [[TMP2]], i32 [[TMP4]], i32 1			// CHECK: [[VLD1_LANE:%.*]] = insertelement <2 x i32> [[TMP2]], i32 [[TMP4]], i32 1
	// CHECK: ret <2 x i32> [[VLD1_LANE]]			// CHECK: ret <2 x i32> [[VLD1_LANE]]
	int32x2_t test_vld1_lane_s32(int32_t *a, int32x2_t b) {			int32x2_t test_vld1_lane_s32(int32_t *a, int32x2_t b) {
	return vld1_lane_s32(a, b, 1);			return vld1_lane_s32(a, b, 1);
	}			}

	// CHECK-LABEL: define <1 x i64> @test_vld1_lane_s64(i64* %a, <1 x i64> %b) #0 {			// CHECK-LABEL: define <1 x i64> @test_vld1_lane_s64(i64* %a, <1 x i64> %b) #1 {
	// CHECK: [[TMP0:%.*]] = bitcast i64* %a to i8*			// CHECK: [[TMP0:%.*]] = bitcast i64* %a to i8*
	// CHECK: [[TMP1:%.*]] = bitcast <1 x i64> %b to <8 x i8>			// CHECK: [[TMP1:%.*]] = bitcast <1 x i64> %b to <8 x i8>
	// CHECK: [[TMP2:%.*]] = bitcast <8 x i8> [[TMP1]] to <1 x i64>			// CHECK: [[TMP2:%.*]] = bitcast <8 x i8> [[TMP1]] to <1 x i64>
	// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to i64*			// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to i64*
	// CHECK: [[TMP4:%.*]] = load i64, i64* [[TMP3]]			// CHECK: [[TMP4:%.*]] = load i64, i64* [[TMP3]]
	// CHECK: [[VLD1_LANE:%.*]] = insertelement <1 x i64> [[TMP2]], i64 [[TMP4]], i32 0			// CHECK: [[VLD1_LANE:%.*]] = insertelement <1 x i64> [[TMP2]], i64 [[TMP4]], i32 0
	// CHECK: ret <1 x i64> [[VLD1_LANE]]			// CHECK: ret <1 x i64> [[VLD1_LANE]]
	int64x1_t test_vld1_lane_s64(int64_t *a, int64x1_t b) {			int64x1_t test_vld1_lane_s64(int64_t *a, int64x1_t b) {
	return vld1_lane_s64(a, b, 0);			return vld1_lane_s64(a, b, 0);
	}			}

	// CHECK-LABEL: define <4 x half> @test_vld1_lane_f16(half* %a, <4 x half> %b) #0 {			// CHECK-LABEL: define <4 x half> @test_vld1_lane_f16(half* %a, <4 x half> %b) #1 {
	// CHECK: [[TMP0:%.*]] = bitcast half* %a to i8*			// CHECK: [[TMP0:%.*]] = bitcast half* %a to i8*
	// CHECK: [[TMP1:%.*]] = bitcast <4 x half> %b to <8 x i8>			// CHECK: [[TMP1:%.*]] = bitcast <4 x half> %b to <8 x i8>
	// CHECK: [[TMP2:%.*]] = bitcast <8 x i8> [[TMP1]] to <4 x half>			// CHECK: [[TMP2:%.*]] = bitcast <8 x i8> [[TMP1]] to <4 x half>
	// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to half*			// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to half*
	// CHECK: [[TMP4:%.*]] = load half, half* [[TMP3]]			// CHECK: [[TMP4:%.*]] = load half, half* [[TMP3]]
	// CHECK: [[VLD1_LANE:%.*]] = insertelement <4 x half> [[TMP2]], half [[TMP4]], i32 3			// CHECK: [[VLD1_LANE:%.*]] = insertelement <4 x half> [[TMP2]], half [[TMP4]], i32 3
	// CHECK: ret <4 x half> [[VLD1_LANE]]			// CHECK: ret <4 x half> [[VLD1_LANE]]
	float16x4_t test_vld1_lane_f16(float16_t *a, float16x4_t b) {			float16x4_t test_vld1_lane_f16(float16_t *a, float16x4_t b) {
	return vld1_lane_f16(a, b, 3);			return vld1_lane_f16(a, b, 3);
	}			}

	// CHECK-LABEL: define <2 x float> @test_vld1_lane_f32(float* %a, <2 x float> %b) #0 {			// CHECK-LABEL: define <2 x float> @test_vld1_lane_f32(float* %a, <2 x float> %b) #1 {
	// CHECK: [[TMP0:%.*]] = bitcast float* %a to i8*			// CHECK: [[TMP0:%.*]] = bitcast float* %a to i8*
	// CHECK: [[TMP1:%.*]] = bitcast <2 x float> %b to <8 x i8>			// CHECK: [[TMP1:%.*]] = bitcast <2 x float> %b to <8 x i8>
	// CHECK: [[TMP2:%.*]] = bitcast <8 x i8> [[TMP1]] to <2 x float>			// CHECK: [[TMP2:%.*]] = bitcast <8 x i8> [[TMP1]] to <2 x float>
	// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to float*			// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to float*
	// CHECK: [[TMP4:%.*]] = load float, float* [[TMP3]]			// CHECK: [[TMP4:%.*]] = load float, float* [[TMP3]]
	// CHECK: [[VLD1_LANE:%.*]] = insertelement <2 x float> [[TMP2]], float [[TMP4]], i32 1			// CHECK: [[VLD1_LANE:%.*]] = insertelement <2 x float> [[TMP2]], float [[TMP4]], i32 1
	// CHECK: ret <2 x float> [[VLD1_LANE]]			// CHECK: ret <2 x float> [[VLD1_LANE]]
	float32x2_t test_vld1_lane_f32(float32_t *a, float32x2_t b) {			float32x2_t test_vld1_lane_f32(float32_t *a, float32x2_t b) {
	return vld1_lane_f32(a, b, 1);			return vld1_lane_f32(a, b, 1);
	}			}

	// CHECK-LABEL: define <1 x double> @test_vld1_lane_f64(double* %a, <1 x double> %b) #0 {			// CHECK-LABEL: define <1 x double> @test_vld1_lane_f64(double* %a, <1 x double> %b) #1 {
	// CHECK: [[TMP0:%.*]] = bitcast double* %a to i8*			// CHECK: [[TMP0:%.*]] = bitcast double* %a to i8*
	// CHECK: [[TMP1:%.*]] = bitcast <1 x double> %b to <8 x i8>			// CHECK: [[TMP1:%.*]] = bitcast <1 x double> %b to <8 x i8>
	// CHECK: [[TMP2:%.*]] = bitcast <8 x i8> [[TMP1]] to <1 x double>			// CHECK: [[TMP2:%.*]] = bitcast <8 x i8> [[TMP1]] to <1 x double>
	// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to double*			// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to double*
	// CHECK: [[TMP4:%.*]] = load double, double* [[TMP3]]			// CHECK: [[TMP4:%.*]] = load double, double* [[TMP3]]
	// CHECK: [[VLD1_LANE:%.*]] = insertelement <1 x double> [[TMP2]], double [[TMP4]], i32 0			// CHECK: [[VLD1_LANE:%.*]] = insertelement <1 x double> [[TMP2]], double [[TMP4]], i32 0
	// CHECK: ret <1 x double> [[VLD1_LANE]]			// CHECK: ret <1 x double> [[VLD1_LANE]]
	float64x1_t test_vld1_lane_f64(float64_t *a, float64x1_t b) {			float64x1_t test_vld1_lane_f64(float64_t *a, float64x1_t b) {
	return vld1_lane_f64(a, b, 0);			return vld1_lane_f64(a, b, 0);
	}			}

	// CHECK-LABEL: define <8 x i8> @test_vld1_lane_p8(i8* %a, <8 x i8> %b) #0 {			// CHECK-LABEL: define <8 x i8> @test_vld1_lane_p8(i8* %a, <8 x i8> %b) #1 {
	// CHECK: [[TMP0:%.*]] = load i8, i8* %a			// CHECK: [[TMP0:%.*]] = load i8, i8* %a
	// CHECK: [[VLD1_LANE:%.*]] = insertelement <8 x i8> %b, i8 [[TMP0]], i32 7			// CHECK: [[VLD1_LANE:%.*]] = insertelement <8 x i8> %b, i8 [[TMP0]], i32 7
	// CHECK: ret <8 x i8> [[VLD1_LANE]]			// CHECK: ret <8 x i8> [[VLD1_LANE]]
	poly8x8_t test_vld1_lane_p8(poly8_t *a, poly8x8_t b) {			poly8x8_t test_vld1_lane_p8(poly8_t *a, poly8x8_t b) {
	return vld1_lane_p8(a, b, 7);			return vld1_lane_p8(a, b, 7);
	}			}

	// CHECK-LABEL: define <4 x i16> @test_vld1_lane_p16(i16* %a, <4 x i16> %b) #0 {			// CHECK-LABEL: define <4 x i16> @test_vld1_lane_p16(i16* %a, <4 x i16> %b) #1 {
	// CHECK: [[TMP0:%.*]] = bitcast i16* %a to i8*			// CHECK: [[TMP0:%.*]] = bitcast i16* %a to i8*
	// CHECK: [[TMP1:%.*]] = bitcast <4 x i16> %b to <8 x i8>			// CHECK: [[TMP1:%.*]] = bitcast <4 x i16> %b to <8 x i8>
	// CHECK: [[TMP2:%.*]] = bitcast <8 x i8> [[TMP1]] to <4 x i16>			// CHECK: [[TMP2:%.*]] = bitcast <8 x i8> [[TMP1]] to <4 x i16>
	// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to i16*			// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to i16*
	// CHECK: [[TMP4:%.*]] = load i16, i16* [[TMP3]]			// CHECK: [[TMP4:%.*]] = load i16, i16* [[TMP3]]
	// CHECK: [[VLD1_LANE:%.*]] = insertelement <4 x i16> [[TMP2]], i16 [[TMP4]], i32 3			// CHECK: [[VLD1_LANE:%.*]] = insertelement <4 x i16> [[TMP2]], i16 [[TMP4]], i32 3
	// CHECK: ret <4 x i16> [[VLD1_LANE]]			// CHECK: ret <4 x i16> [[VLD1_LANE]]
	poly16x4_t test_vld1_lane_p16(poly16_t *a, poly16x4_t b) {			poly16x4_t test_vld1_lane_p16(poly16_t *a, poly16x4_t b) {
	return vld1_lane_p16(a, b, 3);			return vld1_lane_p16(a, b, 3);
	}			}

	// CHECK-LABEL: define <1 x i64> @test_vld1_lane_p64(i64* %a, <1 x i64> %b) #0 {			// CHECK-LABEL: define <1 x i64> @test_vld1_lane_p64(i64* %a, <1 x i64> %b) #1 {
	// CHECK: [[TMP0:%.*]] = bitcast i64* %a to i8*			// CHECK: [[TMP0:%.*]] = bitcast i64* %a to i8*
	// CHECK: [[TMP1:%.*]] = bitcast <1 x i64> %b to <8 x i8>			// CHECK: [[TMP1:%.*]] = bitcast <1 x i64> %b to <8 x i8>
	// CHECK: [[TMP2:%.*]] = bitcast <8 x i8> [[TMP1]] to <1 x i64>			// CHECK: [[TMP2:%.*]] = bitcast <8 x i8> [[TMP1]] to <1 x i64>
	// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to i64*			// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to i64*
	// CHECK: [[TMP4:%.*]] = load i64, i64* [[TMP3]]			// CHECK: [[TMP4:%.*]] = load i64, i64* [[TMP3]]
	// CHECK: [[VLD1_LANE:%.*]] = insertelement <1 x i64> [[TMP2]], i64 [[TMP4]], i32 0			// CHECK: [[VLD1_LANE:%.*]] = insertelement <1 x i64> [[TMP2]], i64 [[TMP4]], i32 0
	// CHECK: ret <1 x i64> [[VLD1_LANE]]			// CHECK: ret <1 x i64> [[VLD1_LANE]]
	poly64x1_t test_vld1_lane_p64(poly64_t *a, poly64x1_t b) {			poly64x1_t test_vld1_lane_p64(poly64_t *a, poly64x1_t b) {
	return vld1_lane_p64(a, b, 0);			return vld1_lane_p64(a, b, 0);
	}			}

	// CHECK-LABEL: define %struct.int8x16x2_t @test_vld2q_lane_s8(i8* %ptr, [2 x <16 x i8>] %src.coerce) #0 {			// CHECK-LABEL: define %struct.int8x16x2_t @test_vld2q_lane_s8(i8* %ptr, [2 x <16 x i8>] %src.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.int8x16x2_t, align 16			// CHECK: [[RETVAL:%.*]] = alloca %struct.int8x16x2_t, align 16
	// CHECK: [[SRC:%.*]] = alloca %struct.int8x16x2_t, align 16			// CHECK: [[SRC:%.*]] = alloca %struct.int8x16x2_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.int8x16x2_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.int8x16x2_t, align 16
	// CHECK: [[__RET:%.*]] = alloca %struct.int8x16x2_t, align 16			// CHECK: [[__RET:%.*]] = alloca %struct.int8x16x2_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x16x2_t, %struct.int8x16x2_t* [[SRC]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x16x2_t, %struct.int8x16x2_t* [[SRC]], i32 0, i32 0
	// CHECK: store [2 x <16 x i8>] [[SRC]].coerce, [2 x <16 x i8>]* [[COERCE_DIVE]], align 16			// CHECK: store [2 x <16 x i8>] [[SRC]].coerce, [2 x <16 x i8>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.int8x16x2_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.int8x16x2_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.int8x16x2_t* [[SRC]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.int8x16x2_t* [[SRC]] to i8*
	[12 lines not shown]
	// CHECK: [[TMP7:%.*]] = bitcast %struct.int8x16x2_t* [[__RET]] to i8*			// CHECK: [[TMP7:%.*]] = bitcast %struct.int8x16x2_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP6]], i8* align 16 [[TMP7]], i64 32, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP6]], i8* align 16 [[TMP7]], i64 32, i1 false)
	// CHECK: [[TMP8:%.*]] = load %struct.int8x16x2_t, %struct.int8x16x2_t* [[RETVAL]], align 16			// CHECK: [[TMP8:%.*]] = load %struct.int8x16x2_t, %struct.int8x16x2_t* [[RETVAL]], align 16
	// CHECK: ret %struct.int8x16x2_t [[TMP8]]			// CHECK: ret %struct.int8x16x2_t [[TMP8]]
	int8x16x2_t test_vld2q_lane_s8(int8_t const * ptr, int8x16x2_t src) {			int8x16x2_t test_vld2q_lane_s8(int8_t const * ptr, int8x16x2_t src) {
	return vld2q_lane_s8(ptr, src, 15);			return vld2q_lane_s8(ptr, src, 15);
	}			}

	// CHECK-LABEL: define %struct.uint8x16x2_t @test_vld2q_lane_u8(i8* %ptr, [2 x <16 x i8>] %src.coerce) #0 {			// CHECK-LABEL: define %struct.uint8x16x2_t @test_vld2q_lane_u8(i8* %ptr, [2 x <16 x i8>] %src.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.uint8x16x2_t, align 16			// CHECK: [[RETVAL:%.*]] = alloca %struct.uint8x16x2_t, align 16
	// CHECK: [[SRC:%.*]] = alloca %struct.uint8x16x2_t, align 16			// CHECK: [[SRC:%.*]] = alloca %struct.uint8x16x2_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.uint8x16x2_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.uint8x16x2_t, align 16
	// CHECK: [[__RET:%.*]] = alloca %struct.uint8x16x2_t, align 16			// CHECK: [[__RET:%.*]] = alloca %struct.uint8x16x2_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint8x16x2_t, %struct.uint8x16x2_t* [[SRC]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint8x16x2_t, %struct.uint8x16x2_t* [[SRC]], i32 0, i32 0
	// CHECK: store [2 x <16 x i8>] [[SRC]].coerce, [2 x <16 x i8>]* [[COERCE_DIVE]], align 16			// CHECK: store [2 x <16 x i8>] [[SRC]].coerce, [2 x <16 x i8>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.uint8x16x2_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.uint8x16x2_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.uint8x16x2_t* [[SRC]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.uint8x16x2_t* [[SRC]] to i8*
	[12 lines not shown]
	// CHECK: [[TMP7:%.*]] = bitcast %struct.uint8x16x2_t* [[__RET]] to i8*			// CHECK: [[TMP7:%.*]] = bitcast %struct.uint8x16x2_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP6]], i8* align 16 [[TMP7]], i64 32, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP6]], i8* align 16 [[TMP7]], i64 32, i1 false)
	// CHECK: [[TMP8:%.*]] = load %struct.uint8x16x2_t, %struct.uint8x16x2_t* [[RETVAL]], align 16			// CHECK: [[TMP8:%.*]] = load %struct.uint8x16x2_t, %struct.uint8x16x2_t* [[RETVAL]], align 16
	// CHECK: ret %struct.uint8x16x2_t [[TMP8]]			// CHECK: ret %struct.uint8x16x2_t [[TMP8]]
	uint8x16x2_t test_vld2q_lane_u8(uint8_t const * ptr, uint8x16x2_t src) {			uint8x16x2_t test_vld2q_lane_u8(uint8_t const * ptr, uint8x16x2_t src) {
	return vld2q_lane_u8(ptr, src, 15);			return vld2q_lane_u8(ptr, src, 15);
	}			}

	// CHECK-LABEL: define %struct.poly8x16x2_t @test_vld2q_lane_p8(i8* %ptr, [2 x <16 x i8>] %src.coerce) #0 {			// CHECK-LABEL: define %struct.poly8x16x2_t @test_vld2q_lane_p8(i8* %ptr, [2 x <16 x i8>] %src.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.poly8x16x2_t, align 16			// CHECK: [[RETVAL:%.*]] = alloca %struct.poly8x16x2_t, align 16
	// CHECK: [[SRC:%.*]] = alloca %struct.poly8x16x2_t, align 16			// CHECK: [[SRC:%.*]] = alloca %struct.poly8x16x2_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.poly8x16x2_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.poly8x16x2_t, align 16
	// CHECK: [[__RET:%.*]] = alloca %struct.poly8x16x2_t, align 16			// CHECK: [[__RET:%.*]] = alloca %struct.poly8x16x2_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly8x16x2_t, %struct.poly8x16x2_t* [[SRC]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly8x16x2_t, %struct.poly8x16x2_t* [[SRC]], i32 0, i32 0
	// CHECK: store [2 x <16 x i8>] [[SRC]].coerce, [2 x <16 x i8>]* [[COERCE_DIVE]], align 16			// CHECK: store [2 x <16 x i8>] [[SRC]].coerce, [2 x <16 x i8>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.poly8x16x2_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.poly8x16x2_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.poly8x16x2_t* [[SRC]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.poly8x16x2_t* [[SRC]] to i8*
	[12 lines not shown]
	// CHECK: [[TMP7:%.*]] = bitcast %struct.poly8x16x2_t* [[__RET]] to i8*			// CHECK: [[TMP7:%.*]] = bitcast %struct.poly8x16x2_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP6]], i8* align 16 [[TMP7]], i64 32, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP6]], i8* align 16 [[TMP7]], i64 32, i1 false)
	// CHECK: [[TMP8:%.*]] = load %struct.poly8x16x2_t, %struct.poly8x16x2_t* [[RETVAL]], align 16			// CHECK: [[TMP8:%.*]] = load %struct.poly8x16x2_t, %struct.poly8x16x2_t* [[RETVAL]], align 16
	// CHECK: ret %struct.poly8x16x2_t [[TMP8]]			// CHECK: ret %struct.poly8x16x2_t [[TMP8]]
	poly8x16x2_t test_vld2q_lane_p8(poly8_t const * ptr, poly8x16x2_t src) {			poly8x16x2_t test_vld2q_lane_p8(poly8_t const * ptr, poly8x16x2_t src) {
	return vld2q_lane_p8(ptr, src, 15);			return vld2q_lane_p8(ptr, src, 15);
	}			}

	// CHECK-LABEL: define %struct.int8x16x3_t @test_vld3q_lane_s8(i8* %ptr, [3 x <16 x i8>] %src.coerce) #0 {			// CHECK-LABEL: define %struct.int8x16x3_t @test_vld3q_lane_s8(i8* %ptr, [3 x <16 x i8>] %src.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.int8x16x3_t, align 16			// CHECK: [[RETVAL:%.*]] = alloca %struct.int8x16x3_t, align 16
	// CHECK: [[SRC:%.*]] = alloca %struct.int8x16x3_t, align 16			// CHECK: [[SRC:%.*]] = alloca %struct.int8x16x3_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.int8x16x3_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.int8x16x3_t, align 16
	// CHECK: [[__RET:%.*]] = alloca %struct.int8x16x3_t, align 16			// CHECK: [[__RET:%.*]] = alloca %struct.int8x16x3_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x16x3_t, %struct.int8x16x3_t* [[SRC]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x16x3_t, %struct.int8x16x3_t* [[SRC]], i32 0, i32 0
	// CHECK: store [3 x <16 x i8>] [[SRC]].coerce, [3 x <16 x i8>]* [[COERCE_DIVE]], align 16			// CHECK: store [3 x <16 x i8>] [[SRC]].coerce, [3 x <16 x i8>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.int8x16x3_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.int8x16x3_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.int8x16x3_t* [[SRC]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.int8x16x3_t* [[SRC]] to i8*
	[15 lines not shown]
	// CHECK: [[TMP8:%.*]] = bitcast %struct.int8x16x3_t* [[__RET]] to i8*			// CHECK: [[TMP8:%.*]] = bitcast %struct.int8x16x3_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP7]], i8* align 16 [[TMP8]], i64 48, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP7]], i8* align 16 [[TMP8]], i64 48, i1 false)
	// CHECK: [[TMP9:%.*]] = load %struct.int8x16x3_t, %struct.int8x16x3_t* [[RETVAL]], align 16			// CHECK: [[TMP9:%.*]] = load %struct.int8x16x3_t, %struct.int8x16x3_t* [[RETVAL]], align 16
	// CHECK: ret %struct.int8x16x3_t [[TMP9]]			// CHECK: ret %struct.int8x16x3_t [[TMP9]]
	int8x16x3_t test_vld3q_lane_s8(int8_t const * ptr, int8x16x3_t src) {			int8x16x3_t test_vld3q_lane_s8(int8_t const * ptr, int8x16x3_t src) {
	return vld3q_lane_s8(ptr, src, 15);			return vld3q_lane_s8(ptr, src, 15);
	}			}

	// CHECK-LABEL: define %struct.uint8x16x3_t @test_vld3q_lane_u8(i8* %ptr, [3 x <16 x i8>] %src.coerce) #0 {			// CHECK-LABEL: define %struct.uint8x16x3_t @test_vld3q_lane_u8(i8* %ptr, [3 x <16 x i8>] %src.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.uint8x16x3_t, align 16			// CHECK: [[RETVAL:%.*]] = alloca %struct.uint8x16x3_t, align 16
	// CHECK: [[SRC:%.*]] = alloca %struct.uint8x16x3_t, align 16			// CHECK: [[SRC:%.*]] = alloca %struct.uint8x16x3_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.uint8x16x3_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.uint8x16x3_t, align 16
	// CHECK: [[__RET:%.*]] = alloca %struct.uint8x16x3_t, align 16			// CHECK: [[__RET:%.*]] = alloca %struct.uint8x16x3_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint8x16x3_t, %struct.uint8x16x3_t* [[SRC]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint8x16x3_t, %struct.uint8x16x3_t* [[SRC]], i32 0, i32 0
	// CHECK: store [3 x <16 x i8>] [[SRC]].coerce, [3 x <16 x i8>]* [[COERCE_DIVE]], align 16			// CHECK: store [3 x <16 x i8>] [[SRC]].coerce, [3 x <16 x i8>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.]] = bitcast %struct.uint8x16x3_t [[__S1]] to i8*			// CHECK: [[TMP0:%.]] = bitcast %struct.uint8x16x3_t [[__S1]] to i8*
	// CHECK: [[TMP1:%.]] = bitcast %struct.uint8x16x3_t [[SRC]] to i8*			// CHECK: [[TMP1:%.]] = bitcast %struct.uint8x16x3_t [[SRC]] to i8*
	Show All 15 Lines
	// CHECK: [[TMP8:%.*]] = bitcast %struct.uint8x16x3_t* [[__RET]] to i8*			// CHECK: [[TMP8:%.*]] = bitcast %struct.uint8x16x3_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP7]], i8* align 16 [[TMP8]], i64 48, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP7]], i8* align 16 [[TMP8]], i64 48, i1 false)
	// CHECK: [[TMP9:%.*]] = load %struct.uint8x16x3_t, %struct.uint8x16x3_t* [[RETVAL]], align 16			// CHECK: [[TMP9:%.*]] = load %struct.uint8x16x3_t, %struct.uint8x16x3_t* [[RETVAL]], align 16
	// CHECK: ret %struct.uint8x16x3_t [[TMP9]]			// CHECK: ret %struct.uint8x16x3_t [[TMP9]]
	uint8x16x3_t test_vld3q_lane_u8(uint8_t const * ptr, uint8x16x3_t src) {			uint8x16x3_t test_vld3q_lane_u8(uint8_t const * ptr, uint8x16x3_t src) {
	return vld3q_lane_u8(ptr, src, 15);			return vld3q_lane_u8(ptr, src, 15);
	}			}

	// CHECK-LABEL: define %struct.uint16x8x2_t @test_vld2q_lane_u16(i16* %a, [2 x <8 x i16>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.uint16x8x2_t @test_vld2q_lane_u16(i16* %a, [2 x <8 x i16>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.uint16x8x2_t, align 16			// CHECK: [[RETVAL:%.*]] = alloca %struct.uint16x8x2_t, align 16
	// CHECK: [[B:%.*]] = alloca %struct.uint16x8x2_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.uint16x8x2_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.uint16x8x2_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.uint16x8x2_t, align 16
	// CHECK: [[__RET:%.*]] = alloca %struct.uint16x8x2_t, align 16			// CHECK: [[__RET:%.*]] = alloca %struct.uint16x8x2_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint16x8x2_t, %struct.uint16x8x2_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint16x8x2_t, %struct.uint16x8x2_t* [[B]], i32 0, i32 0
	// CHECK: store [2 x <8 x i16>] [[B]].coerce, [2 x <8 x i16>]* [[COERCE_DIVE]], align 16			// CHECK: store [2 x <8 x i16>] [[B]].coerce, [2 x <8 x i16>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.uint16x8x2_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.uint16x8x2_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.uint16x8x2_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.uint16x8x2_t* [[B]] to i8*
	Show All 17 Lines
	// CHECK: [[TMP12:%.*]] = bitcast %struct.uint16x8x2_t* [[__RET]] to i8*			// CHECK: [[TMP12:%.*]] = bitcast %struct.uint16x8x2_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP11]], i8* align 16 [[TMP12]], i64 32, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP11]], i8* align 16 [[TMP12]], i64 32, i1 false)
	// CHECK: [[TMP13:%.*]] = load %struct.uint16x8x2_t, %struct.uint16x8x2_t* [[RETVAL]], align 16			// CHECK: [[TMP13:%.*]] = load %struct.uint16x8x2_t, %struct.uint16x8x2_t* [[RETVAL]], align 16
	// CHECK: ret %struct.uint16x8x2_t [[TMP13]]			// CHECK: ret %struct.uint16x8x2_t [[TMP13]]
	uint16x8x2_t test_vld2q_lane_u16(uint16_t *a, uint16x8x2_t b) {			uint16x8x2_t test_vld2q_lane_u16(uint16_t *a, uint16x8x2_t b) {
	return vld2q_lane_u16(a, b, 7);			return vld2q_lane_u16(a, b, 7);
	}			}

	// CHECK-LABEL: define %struct.uint32x4x2_t @test_vld2q_lane_u32(i32* %a, [2 x <4 x i32>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.uint32x4x2_t @test_vld2q_lane_u32(i32* %a, [2 x <4 x i32>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.uint32x4x2_t, align 16			// CHECK: [[RETVAL:%.*]] = alloca %struct.uint32x4x2_t, align 16
	// CHECK: [[B:%.*]] = alloca %struct.uint32x4x2_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.uint32x4x2_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.uint32x4x2_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.uint32x4x2_t, align 16
	// CHECK: [[__RET:%.*]] = alloca %struct.uint32x4x2_t, align 16			// CHECK: [[__RET:%.*]] = alloca %struct.uint32x4x2_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint32x4x2_t, %struct.uint32x4x2_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint32x4x2_t, %struct.uint32x4x2_t* [[B]], i32 0, i32 0
	// CHECK: store [2 x <4 x i32>] [[B]].coerce, [2 x <4 x i32>]* [[COERCE_DIVE]], align 16			// CHECK: store [2 x <4 x i32>] [[B]].coerce, [2 x <4 x i32>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.uint32x4x2_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.uint32x4x2_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.uint32x4x2_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.uint32x4x2_t* [[B]] to i8*
	Show All 17 Lines
	// CHECK: [[TMP12:%.*]] = bitcast %struct.uint32x4x2_t* [[__RET]] to i8*			// CHECK: [[TMP12:%.*]] = bitcast %struct.uint32x4x2_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP11]], i8* align 16 [[TMP12]], i64 32, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP11]], i8* align 16 [[TMP12]], i64 32, i1 false)
	// CHECK: [[TMP13:%.*]] = load %struct.uint32x4x2_t, %struct.uint32x4x2_t* [[RETVAL]], align 16			// CHECK: [[TMP13:%.*]] = load %struct.uint32x4x2_t, %struct.uint32x4x2_t* [[RETVAL]], align 16
	// CHECK: ret %struct.uint32x4x2_t [[TMP13]]			// CHECK: ret %struct.uint32x4x2_t [[TMP13]]
	uint32x4x2_t test_vld2q_lane_u32(uint32_t *a, uint32x4x2_t b) {			uint32x4x2_t test_vld2q_lane_u32(uint32_t *a, uint32x4x2_t b) {
	return vld2q_lane_u32(a, b, 3);			return vld2q_lane_u32(a, b, 3);
	}			}

	// CHECK-LABEL: define %struct.uint64x2x2_t @test_vld2q_lane_u64(i64* %a, [2 x <2 x i64>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.uint64x2x2_t @test_vld2q_lane_u64(i64* %a, [2 x <2 x i64>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.uint64x2x2_t, align 16			// CHECK: [[RETVAL:%.*]] = alloca %struct.uint64x2x2_t, align 16
	// CHECK: [[B:%.*]] = alloca %struct.uint64x2x2_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.uint64x2x2_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.uint64x2x2_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.uint64x2x2_t, align 16
	// CHECK: [[__RET:%.*]] = alloca %struct.uint64x2x2_t, align 16			// CHECK: [[__RET:%.*]] = alloca %struct.uint64x2x2_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint64x2x2_t, %struct.uint64x2x2_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint64x2x2_t, %struct.uint64x2x2_t* [[B]], i32 0, i32 0
	// CHECK: store [2 x <2 x i64>] [[B]].coerce, [2 x <2 x i64>]* [[COERCE_DIVE]], align 16			// CHECK: store [2 x <2 x i64>] [[B]].coerce, [2 x <2 x i64>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.uint64x2x2_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.uint64x2x2_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.uint64x2x2_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.uint64x2x2_t* [[B]] to i8*
	Show All 17 Lines
	// CHECK: [[TMP12:%.*]] = bitcast %struct.uint64x2x2_t* [[__RET]] to i8*			// CHECK: [[TMP12:%.*]] = bitcast %struct.uint64x2x2_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP11]], i8* align 16 [[TMP12]], i64 32, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP11]], i8* align 16 [[TMP12]], i64 32, i1 false)
	// CHECK: [[TMP13:%.*]] = load %struct.uint64x2x2_t, %struct.uint64x2x2_t* [[RETVAL]], align 16			// CHECK: [[TMP13:%.*]] = load %struct.uint64x2x2_t, %struct.uint64x2x2_t* [[RETVAL]], align 16
	// CHECK: ret %struct.uint64x2x2_t [[TMP13]]			// CHECK: ret %struct.uint64x2x2_t [[TMP13]]
	uint64x2x2_t test_vld2q_lane_u64(uint64_t *a, uint64x2x2_t b) {			uint64x2x2_t test_vld2q_lane_u64(uint64_t *a, uint64x2x2_t b) {
	return vld2q_lane_u64(a, b, 1);			return vld2q_lane_u64(a, b, 1);
	}			}

	// CHECK-LABEL: define %struct.int16x8x2_t @test_vld2q_lane_s16(i16* %a, [2 x <8 x i16>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.int16x8x2_t @test_vld2q_lane_s16(i16* %a, [2 x <8 x i16>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.int16x8x2_t, align 16			// CHECK: [[RETVAL:%.*]] = alloca %struct.int16x8x2_t, align 16
	// CHECK: [[B:%.*]] = alloca %struct.int16x8x2_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.int16x8x2_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.int16x8x2_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.int16x8x2_t, align 16
	// CHECK: [[__RET:%.*]] = alloca %struct.int16x8x2_t, align 16			// CHECK: [[__RET:%.*]] = alloca %struct.int16x8x2_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int16x8x2_t, %struct.int16x8x2_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int16x8x2_t, %struct.int16x8x2_t* [[B]], i32 0, i32 0
	// CHECK: store [2 x <8 x i16>] [[B]].coerce, [2 x <8 x i16>]* [[COERCE_DIVE]], align 16			// CHECK: store [2 x <8 x i16>] [[B]].coerce, [2 x <8 x i16>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.int16x8x2_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.int16x8x2_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.int16x8x2_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.int16x8x2_t* [[B]] to i8*
	Show All 17 Lines
	// CHECK: [[TMP12:%.*]] = bitcast %struct.int16x8x2_t* [[__RET]] to i8*			// CHECK: [[TMP12:%.*]] = bitcast %struct.int16x8x2_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP11]], i8* align 16 [[TMP12]], i64 32, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP11]], i8* align 16 [[TMP12]], i64 32, i1 false)
	// CHECK: [[TMP13:%.*]] = load %struct.int16x8x2_t, %struct.int16x8x2_t* [[RETVAL]], align 16			// CHECK: [[TMP13:%.*]] = load %struct.int16x8x2_t, %struct.int16x8x2_t* [[RETVAL]], align 16
	// CHECK: ret %struct.int16x8x2_t [[TMP13]]			// CHECK: ret %struct.int16x8x2_t [[TMP13]]
	int16x8x2_t test_vld2q_lane_s16(int16_t *a, int16x8x2_t b) {			int16x8x2_t test_vld2q_lane_s16(int16_t *a, int16x8x2_t b) {
	return vld2q_lane_s16(a, b, 7);			return vld2q_lane_s16(a, b, 7);
	}			}

	// CHECK-LABEL: define %struct.int32x4x2_t @test_vld2q_lane_s32(i32* %a, [2 x <4 x i32>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.int32x4x2_t @test_vld2q_lane_s32(i32* %a, [2 x <4 x i32>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.int32x4x2_t, align 16			// CHECK: [[RETVAL:%.*]] = alloca %struct.int32x4x2_t, align 16
	// CHECK: [[B:%.*]] = alloca %struct.int32x4x2_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.int32x4x2_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.int32x4x2_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.int32x4x2_t, align 16
	// CHECK: [[__RET:%.*]] = alloca %struct.int32x4x2_t, align 16			// CHECK: [[__RET:%.*]] = alloca %struct.int32x4x2_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int32x4x2_t, %struct.int32x4x2_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int32x4x2_t, %struct.int32x4x2_t* [[B]], i32 0, i32 0
	// CHECK: store [2 x <4 x i32>] [[B]].coerce, [2 x <4 x i32>]* [[COERCE_DIVE]], align 16			// CHECK: store [2 x <4 x i32>] [[B]].coerce, [2 x <4 x i32>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.int32x4x2_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.int32x4x2_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.int32x4x2_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.int32x4x2_t* [[B]] to i8*
	Show All 17 Lines
	// CHECK: [[TMP12:%.*]] = bitcast %struct.int32x4x2_t* [[__RET]] to i8*			// CHECK: [[TMP12:%.*]] = bitcast %struct.int32x4x2_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP11]], i8* align 16 [[TMP12]], i64 32, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP11]], i8* align 16 [[TMP12]], i64 32, i1 false)
	// CHECK: [[TMP13:%.*]] = load %struct.int32x4x2_t, %struct.int32x4x2_t* [[RETVAL]], align 16			// CHECK: [[TMP13:%.*]] = load %struct.int32x4x2_t, %struct.int32x4x2_t* [[RETVAL]], align 16
	// CHECK: ret %struct.int32x4x2_t [[TMP13]]			// CHECK: ret %struct.int32x4x2_t [[TMP13]]
	int32x4x2_t test_vld2q_lane_s32(int32_t *a, int32x4x2_t b) {			int32x4x2_t test_vld2q_lane_s32(int32_t *a, int32x4x2_t b) {
	return vld2q_lane_s32(a, b, 3);			return vld2q_lane_s32(a, b, 3);
	}			}

	// CHECK-LABEL: define %struct.int64x2x2_t @test_vld2q_lane_s64(i64* %a, [2 x <2 x i64>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.int64x2x2_t @test_vld2q_lane_s64(i64* %a, [2 x <2 x i64>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.int64x2x2_t, align 16			// CHECK: [[RETVAL:%.*]] = alloca %struct.int64x2x2_t, align 16
	// CHECK: [[B:%.*]] = alloca %struct.int64x2x2_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.int64x2x2_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.int64x2x2_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.int64x2x2_t, align 16
	// CHECK: [[__RET:%.*]] = alloca %struct.int64x2x2_t, align 16			// CHECK: [[__RET:%.*]] = alloca %struct.int64x2x2_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int64x2x2_t, %struct.int64x2x2_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int64x2x2_t, %struct.int64x2x2_t* [[B]], i32 0, i32 0
	// CHECK: store [2 x <2 x i64>] [[B]].coerce, [2 x <2 x i64>]* [[COERCE_DIVE]], align 16			// CHECK: store [2 x <2 x i64>] [[B]].coerce, [2 x <2 x i64>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.int64x2x2_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.int64x2x2_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.int64x2x2_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.int64x2x2_t* [[B]] to i8*
	Show All 17 Lines
	// CHECK: [[TMP12:%.*]] = bitcast %struct.int64x2x2_t* [[__RET]] to i8*			// CHECK: [[TMP12:%.*]] = bitcast %struct.int64x2x2_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP11]], i8* align 16 [[TMP12]], i64 32, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP11]], i8* align 16 [[TMP12]], i64 32, i1 false)
	// CHECK: [[TMP13:%.*]] = load %struct.int64x2x2_t, %struct.int64x2x2_t* [[RETVAL]], align 16			// CHECK: [[TMP13:%.*]] = load %struct.int64x2x2_t, %struct.int64x2x2_t* [[RETVAL]], align 16
	// CHECK: ret %struct.int64x2x2_t [[TMP13]]			// CHECK: ret %struct.int64x2x2_t [[TMP13]]
	int64x2x2_t test_vld2q_lane_s64(int64_t *a, int64x2x2_t b) {			int64x2x2_t test_vld2q_lane_s64(int64_t *a, int64x2x2_t b) {
	return vld2q_lane_s64(a, b, 1);			return vld2q_lane_s64(a, b, 1);
	}			}

	// CHECK-LABEL: define %struct.float16x8x2_t @test_vld2q_lane_f16(half* %a, [2 x <8 x half>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.float16x8x2_t @test_vld2q_lane_f16(half* %a, [2 x <8 x half>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.float16x8x2_t, align 16			// CHECK: [[RETVAL:%.*]] = alloca %struct.float16x8x2_t, align 16
	// CHECK: [[B:%.*]] = alloca %struct.float16x8x2_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.float16x8x2_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.float16x8x2_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.float16x8x2_t, align 16
	// CHECK: [[__RET:%.*]] = alloca %struct.float16x8x2_t, align 16			// CHECK: [[__RET:%.*]] = alloca %struct.float16x8x2_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float16x8x2_t, %struct.float16x8x2_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float16x8x2_t, %struct.float16x8x2_t* [[B]], i32 0, i32 0
	// CHECK: store [2 x <8 x half>] [[B]].coerce, [2 x <8 x half>]* [[COERCE_DIVE]], align 16			// CHECK: store [2 x <8 x half>] [[B]].coerce, [2 x <8 x half>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x8x2_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x8x2_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.float16x8x2_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.float16x8x2_t* [[B]] to i8*
	Show All 17 Lines
	// CHECK: [[TMP12:%.*]] = bitcast %struct.float16x8x2_t* [[__RET]] to i8*			// CHECK: [[TMP12:%.*]] = bitcast %struct.float16x8x2_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP11]], i8* align 16 [[TMP12]], i64 32, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP11]], i8* align 16 [[TMP12]], i64 32, i1 false)
	// CHECK: [[TMP13:%.*]] = load %struct.float16x8x2_t, %struct.float16x8x2_t* [[RETVAL]], align 16			// CHECK: [[TMP13:%.*]] = load %struct.float16x8x2_t, %struct.float16x8x2_t* [[RETVAL]], align 16
	// CHECK: ret %struct.float16x8x2_t [[TMP13]]			// CHECK: ret %struct.float16x8x2_t [[TMP13]]
	float16x8x2_t test_vld2q_lane_f16(float16_t *a, float16x8x2_t b) {			float16x8x2_t test_vld2q_lane_f16(float16_t *a, float16x8x2_t b) {
	return vld2q_lane_f16(a, b, 7);			return vld2q_lane_f16(a, b, 7);
	}			}

	// CHECK-LABEL: define %struct.float32x4x2_t @test_vld2q_lane_f32(float* %a, [2 x <4 x float>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.float32x4x2_t @test_vld2q_lane_f32(float* %a, [2 x <4 x float>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.float32x4x2_t, align 16			// CHECK: [[RETVAL:%.*]] = alloca %struct.float32x4x2_t, align 16
	// CHECK: [[B:%.*]] = alloca %struct.float32x4x2_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.float32x4x2_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.float32x4x2_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.float32x4x2_t, align 16
	// CHECK: [[__RET:%.*]] = alloca %struct.float32x4x2_t, align 16			// CHECK: [[__RET:%.*]] = alloca %struct.float32x4x2_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float32x4x2_t, %struct.float32x4x2_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float32x4x2_t, %struct.float32x4x2_t* [[B]], i32 0, i32 0
	// CHECK: store [2 x <4 x float>] [[B]].coerce, [2 x <4 x float>]* [[COERCE_DIVE]], align 16			// CHECK: store [2 x <4 x float>] [[B]].coerce, [2 x <4 x float>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.float32x4x2_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.float32x4x2_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.float32x4x2_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.float32x4x2_t* [[B]] to i8*
	Show All 17 Lines
	// CHECK: [[TMP12:%.*]] = bitcast %struct.float32x4x2_t* [[__RET]] to i8*			// CHECK: [[TMP12:%.*]] = bitcast %struct.float32x4x2_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP11]], i8* align 16 [[TMP12]], i64 32, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP11]], i8* align 16 [[TMP12]], i64 32, i1 false)
	// CHECK: [[TMP13:%.*]] = load %struct.float32x4x2_t, %struct.float32x4x2_t* [[RETVAL]], align 16			// CHECK: [[TMP13:%.*]] = load %struct.float32x4x2_t, %struct.float32x4x2_t* [[RETVAL]], align 16
	// CHECK: ret %struct.float32x4x2_t [[TMP13]]			// CHECK: ret %struct.float32x4x2_t [[TMP13]]
	float32x4x2_t test_vld2q_lane_f32(float32_t *a, float32x4x2_t b) {			float32x4x2_t test_vld2q_lane_f32(float32_t *a, float32x4x2_t b) {
	return vld2q_lane_f32(a, b, 3);			return vld2q_lane_f32(a, b, 3);
	}			}

	// CHECK-LABEL: define %struct.float64x2x2_t @test_vld2q_lane_f64(double* %a, [2 x <2 x double>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.float64x2x2_t @test_vld2q_lane_f64(double* %a, [2 x <2 x double>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.float64x2x2_t, align 16			// CHECK: [[RETVAL:%.*]] = alloca %struct.float64x2x2_t, align 16
	// CHECK: [[B:%.*]] = alloca %struct.float64x2x2_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.float64x2x2_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.float64x2x2_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.float64x2x2_t, align 16
	// CHECK: [[__RET:%.*]] = alloca %struct.float64x2x2_t, align 16			// CHECK: [[__RET:%.*]] = alloca %struct.float64x2x2_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float64x2x2_t, %struct.float64x2x2_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float64x2x2_t, %struct.float64x2x2_t* [[B]], i32 0, i32 0
	// CHECK: store [2 x <2 x double>] [[B]].coerce, [2 x <2 x double>]* [[COERCE_DIVE]], align 16			// CHECK: store [2 x <2 x double>] [[B]].coerce, [2 x <2 x double>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.float64x2x2_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.float64x2x2_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.float64x2x2_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.float64x2x2_t* [[B]] to i8*
	Show All 17 Lines
	// CHECK: [[TMP12:%.*]] = bitcast %struct.float64x2x2_t* [[__RET]] to i8*			// CHECK: [[TMP12:%.*]] = bitcast %struct.float64x2x2_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP11]], i8* align 16 [[TMP12]], i64 32, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP11]], i8* align 16 [[TMP12]], i64 32, i1 false)
	// CHECK: [[TMP13:%.*]] = load %struct.float64x2x2_t, %struct.float64x2x2_t* [[RETVAL]], align 16			// CHECK: [[TMP13:%.*]] = load %struct.float64x2x2_t, %struct.float64x2x2_t* [[RETVAL]], align 16
	// CHECK: ret %struct.float64x2x2_t [[TMP13]]			// CHECK: ret %struct.float64x2x2_t [[TMP13]]
	float64x2x2_t test_vld2q_lane_f64(float64_t *a, float64x2x2_t b) {			float64x2x2_t test_vld2q_lane_f64(float64_t *a, float64x2x2_t b) {
	return vld2q_lane_f64(a, b, 1);			return vld2q_lane_f64(a, b, 1);
	}			}

	// CHECK-LABEL: define %struct.poly16x8x2_t @test_vld2q_lane_p16(i16* %a, [2 x <8 x i16>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.poly16x8x2_t @test_vld2q_lane_p16(i16* %a, [2 x <8 x i16>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.poly16x8x2_t, align 16			// CHECK: [[RETVAL:%.*]] = alloca %struct.poly16x8x2_t, align 16
	// CHECK: [[B:%.*]] = alloca %struct.poly16x8x2_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.poly16x8x2_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.poly16x8x2_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.poly16x8x2_t, align 16
	// CHECK: [[__RET:%.*]] = alloca %struct.poly16x8x2_t, align 16			// CHECK: [[__RET:%.*]] = alloca %struct.poly16x8x2_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly16x8x2_t, %struct.poly16x8x2_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly16x8x2_t, %struct.poly16x8x2_t* [[B]], i32 0, i32 0
	// CHECK: store [2 x <8 x i16>] [[B]].coerce, [2 x <8 x i16>]* [[COERCE_DIVE]], align 16			// CHECK: store [2 x <8 x i16>] [[B]].coerce, [2 x <8 x i16>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.poly16x8x2_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.poly16x8x2_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.poly16x8x2_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.poly16x8x2_t* [[B]] to i8*
	Show All 17 Lines
	// CHECK: [[TMP12:%.*]] = bitcast %struct.poly16x8x2_t* [[__RET]] to i8*			// CHECK: [[TMP12:%.*]] = bitcast %struct.poly16x8x2_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP11]], i8* align 16 [[TMP12]], i64 32, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP11]], i8* align 16 [[TMP12]], i64 32, i1 false)
	// CHECK: [[TMP13:%.*]] = load %struct.poly16x8x2_t, %struct.poly16x8x2_t* [[RETVAL]], align 16			// CHECK: [[TMP13:%.*]] = load %struct.poly16x8x2_t, %struct.poly16x8x2_t* [[RETVAL]], align 16
	// CHECK: ret %struct.poly16x8x2_t [[TMP13]]			// CHECK: ret %struct.poly16x8x2_t [[TMP13]]
	poly16x8x2_t test_vld2q_lane_p16(poly16_t *a, poly16x8x2_t b) {			poly16x8x2_t test_vld2q_lane_p16(poly16_t *a, poly16x8x2_t b) {
	return vld2q_lane_p16(a, b, 7);			return vld2q_lane_p16(a, b, 7);
	}			}

	// CHECK-LABEL: define %struct.poly64x2x2_t @test_vld2q_lane_p64(i64* %a, [2 x <2 x i64>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.poly64x2x2_t @test_vld2q_lane_p64(i64* %a, [2 x <2 x i64>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.poly64x2x2_t, align 16			// CHECK: [[RETVAL:%.*]] = alloca %struct.poly64x2x2_t, align 16
	// CHECK: [[B:%.*]] = alloca %struct.poly64x2x2_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.poly64x2x2_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.poly64x2x2_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.poly64x2x2_t, align 16
	// CHECK: [[__RET:%.*]] = alloca %struct.poly64x2x2_t, align 16			// CHECK: [[__RET:%.*]] = alloca %struct.poly64x2x2_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly64x2x2_t, %struct.poly64x2x2_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly64x2x2_t, %struct.poly64x2x2_t* [[B]], i32 0, i32 0
	// CHECK: store [2 x <2 x i64>] [[B]].coerce, [2 x <2 x i64>]* [[COERCE_DIVE]], align 16			// CHECK: store [2 x <2 x i64>] [[B]].coerce, [2 x <2 x i64>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.poly64x2x2_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.poly64x2x2_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.poly64x2x2_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.poly64x2x2_t* [[B]] to i8*
	Show All 17 Lines
	// CHECK: [[TMP12:%.*]] = bitcast %struct.poly64x2x2_t* [[__RET]] to i8*			// CHECK: [[TMP12:%.*]] = bitcast %struct.poly64x2x2_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP11]], i8* align 16 [[TMP12]], i64 32, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP11]], i8* align 16 [[TMP12]], i64 32, i1 false)
	// CHECK: [[TMP13:%.*]] = load %struct.poly64x2x2_t, %struct.poly64x2x2_t* [[RETVAL]], align 16			// CHECK: [[TMP13:%.*]] = load %struct.poly64x2x2_t, %struct.poly64x2x2_t* [[RETVAL]], align 16
	// CHECK: ret %struct.poly64x2x2_t [[TMP13]]			// CHECK: ret %struct.poly64x2x2_t [[TMP13]]
	poly64x2x2_t test_vld2q_lane_p64(poly64_t *a, poly64x2x2_t b) {			poly64x2x2_t test_vld2q_lane_p64(poly64_t *a, poly64x2x2_t b) {
	return vld2q_lane_p64(a, b, 1);			return vld2q_lane_p64(a, b, 1);
	}			}

	// CHECK-LABEL: define %struct.uint8x8x2_t @test_vld2_lane_u8(i8* %a, [2 x <8 x i8>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.uint8x8x2_t @test_vld2_lane_u8(i8* %a, [2 x <8 x i8>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.uint8x8x2_t, align 8			// CHECK: [[RETVAL:%.*]] = alloca %struct.uint8x8x2_t, align 8
	// CHECK: [[B:%.*]] = alloca %struct.uint8x8x2_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.uint8x8x2_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.uint8x8x2_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.uint8x8x2_t, align 8
	// CHECK: [[__RET:%.*]] = alloca %struct.uint8x8x2_t, align 8			// CHECK: [[__RET:%.*]] = alloca %struct.uint8x8x2_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint8x8x2_t, %struct.uint8x8x2_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint8x8x2_t, %struct.uint8x8x2_t* [[B]], i32 0, i32 0
	// CHECK: store [2 x <8 x i8>] [[B]].coerce, [2 x <8 x i8>]* [[COERCE_DIVE]], align 8			// CHECK: store [2 x <8 x i8>] [[B]].coerce, [2 x <8 x i8>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.uint8x8x2_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.uint8x8x2_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.uint8x8x2_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.uint8x8x2_t* [[B]] to i8*
	Show All 12 Lines
	// CHECK: [[TMP7:%.*]] = bitcast %struct.uint8x8x2_t* [[__RET]] to i8*			// CHECK: [[TMP7:%.*]] = bitcast %struct.uint8x8x2_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP6]], i8* align 8 [[TMP7]], i64 16, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP6]], i8* align 8 [[TMP7]], i64 16, i1 false)
	// CHECK: [[TMP8:%.*]] = load %struct.uint8x8x2_t, %struct.uint8x8x2_t* [[RETVAL]], align 8			// CHECK: [[TMP8:%.*]] = load %struct.uint8x8x2_t, %struct.uint8x8x2_t* [[RETVAL]], align 8
	// CHECK: ret %struct.uint8x8x2_t [[TMP8]]			// CHECK: ret %struct.uint8x8x2_t [[TMP8]]
	uint8x8x2_t test_vld2_lane_u8(uint8_t *a, uint8x8x2_t b) {			uint8x8x2_t test_vld2_lane_u8(uint8_t *a, uint8x8x2_t b) {
	return vld2_lane_u8(a, b, 7);			return vld2_lane_u8(a, b, 7);
	}			}

	// CHECK-LABEL: define %struct.uint16x4x2_t @test_vld2_lane_u16(i16* %a, [2 x <4 x i16>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.uint16x4x2_t @test_vld2_lane_u16(i16* %a, [2 x <4 x i16>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.uint16x4x2_t, align 8			// CHECK: [[RETVAL:%.*]] = alloca %struct.uint16x4x2_t, align 8
	// CHECK: [[B:%.*]] = alloca %struct.uint16x4x2_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.uint16x4x2_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.uint16x4x2_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.uint16x4x2_t, align 8
	// CHECK: [[__RET:%.*]] = alloca %struct.uint16x4x2_t, align 8			// CHECK: [[__RET:%.*]] = alloca %struct.uint16x4x2_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint16x4x2_t, %struct.uint16x4x2_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint16x4x2_t, %struct.uint16x4x2_t* [[B]], i32 0, i32 0
	// CHECK: store [2 x <4 x i16>] [[B]].coerce, [2 x <4 x i16>]* [[COERCE_DIVE]], align 8			// CHECK: store [2 x <4 x i16>] [[B]].coerce, [2 x <4 x i16>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.uint16x4x2_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.uint16x4x2_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.uint16x4x2_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.uint16x4x2_t* [[B]] to i8*
	Show All 17 Lines
	// CHECK: [[TMP12:%.*]] = bitcast %struct.uint16x4x2_t* [[__RET]] to i8*			// CHECK: [[TMP12:%.*]] = bitcast %struct.uint16x4x2_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP11]], i8* align 8 [[TMP12]], i64 16, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP11]], i8* align 8 [[TMP12]], i64 16, i1 false)
	// CHECK: [[TMP13:%.*]] = load %struct.uint16x4x2_t, %struct.uint16x4x2_t* [[RETVAL]], align 8			// CHECK: [[TMP13:%.*]] = load %struct.uint16x4x2_t, %struct.uint16x4x2_t* [[RETVAL]], align 8
	// CHECK: ret %struct.uint16x4x2_t [[TMP13]]			// CHECK: ret %struct.uint16x4x2_t [[TMP13]]
	uint16x4x2_t test_vld2_lane_u16(uint16_t *a, uint16x4x2_t b) {			uint16x4x2_t test_vld2_lane_u16(uint16_t *a, uint16x4x2_t b) {
	return vld2_lane_u16(a, b, 3);			return vld2_lane_u16(a, b, 3);
	}			}

	// CHECK-LABEL: define %struct.uint32x2x2_t @test_vld2_lane_u32(i32* %a, [2 x <2 x i32>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.uint32x2x2_t @test_vld2_lane_u32(i32* %a, [2 x <2 x i32>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.uint32x2x2_t, align 8			// CHECK: [[RETVAL:%.*]] = alloca %struct.uint32x2x2_t, align 8
	// CHECK: [[B:%.*]] = alloca %struct.uint32x2x2_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.uint32x2x2_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.uint32x2x2_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.uint32x2x2_t, align 8
	// CHECK: [[__RET:%.*]] = alloca %struct.uint32x2x2_t, align 8			// CHECK: [[__RET:%.*]] = alloca %struct.uint32x2x2_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint32x2x2_t, %struct.uint32x2x2_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint32x2x2_t, %struct.uint32x2x2_t* [[B]], i32 0, i32 0
	// CHECK: store [2 x <2 x i32>] [[B]].coerce, [2 x <2 x i32>]* [[COERCE_DIVE]], align 8			// CHECK: store [2 x <2 x i32>] [[B]].coerce, [2 x <2 x i32>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.uint32x2x2_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.uint32x2x2_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.uint32x2x2_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.uint32x2x2_t* [[B]] to i8*
	Show All 17 Lines
	// CHECK: [[TMP12:%.*]] = bitcast %struct.uint32x2x2_t* [[__RET]] to i8*			// CHECK: [[TMP12:%.*]] = bitcast %struct.uint32x2x2_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP11]], i8* align 8 [[TMP12]], i64 16, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP11]], i8* align 8 [[TMP12]], i64 16, i1 false)
	// CHECK: [[TMP13:%.*]] = load %struct.uint32x2x2_t, %struct.uint32x2x2_t* [[RETVAL]], align 8			// CHECK: [[TMP13:%.*]] = load %struct.uint32x2x2_t, %struct.uint32x2x2_t* [[RETVAL]], align 8
	// CHECK: ret %struct.uint32x2x2_t [[TMP13]]			// CHECK: ret %struct.uint32x2x2_t [[TMP13]]
	uint32x2x2_t test_vld2_lane_u32(uint32_t *a, uint32x2x2_t b) {			uint32x2x2_t test_vld2_lane_u32(uint32_t *a, uint32x2x2_t b) {
	return vld2_lane_u32(a, b, 1);			return vld2_lane_u32(a, b, 1);
	}			}

	// CHECK-LABEL: define %struct.uint64x1x2_t @test_vld2_lane_u64(i64* %a, [2 x <1 x i64>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.uint64x1x2_t @test_vld2_lane_u64(i64* %a, [2 x <1 x i64>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.uint64x1x2_t, align 8			// CHECK: [[RETVAL:%.*]] = alloca %struct.uint64x1x2_t, align 8
	// CHECK: [[B:%.*]] = alloca %struct.uint64x1x2_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.uint64x1x2_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.uint64x1x2_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.uint64x1x2_t, align 8
	// CHECK: [[__RET:%.*]] = alloca %struct.uint64x1x2_t, align 8			// CHECK: [[__RET:%.*]] = alloca %struct.uint64x1x2_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint64x1x2_t, %struct.uint64x1x2_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint64x1x2_t, %struct.uint64x1x2_t* [[B]], i32 0, i32 0
	// CHECK: store [2 x <1 x i64>] [[B]].coerce, [2 x <1 x i64>]* [[COERCE_DIVE]], align 8			// CHECK: store [2 x <1 x i64>] [[B]].coerce, [2 x <1 x i64>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.uint64x1x2_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.uint64x1x2_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.uint64x1x2_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.uint64x1x2_t* [[B]] to i8*
	Show All 17 Lines
	// CHECK: [[TMP12:%.*]] = bitcast %struct.uint64x1x2_t* [[__RET]] to i8*			// CHECK: [[TMP12:%.*]] = bitcast %struct.uint64x1x2_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP11]], i8* align 8 [[TMP12]], i64 16, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP11]], i8* align 8 [[TMP12]], i64 16, i1 false)
	// CHECK: [[TMP13:%.*]] = load %struct.uint64x1x2_t, %struct.uint64x1x2_t* [[RETVAL]], align 8			// CHECK: [[TMP13:%.*]] = load %struct.uint64x1x2_t, %struct.uint64x1x2_t* [[RETVAL]], align 8
	// CHECK: ret %struct.uint64x1x2_t [[TMP13]]			// CHECK: ret %struct.uint64x1x2_t [[TMP13]]
	uint64x1x2_t test_vld2_lane_u64(uint64_t *a, uint64x1x2_t b) {			uint64x1x2_t test_vld2_lane_u64(uint64_t *a, uint64x1x2_t b) {
	return vld2_lane_u64(a, b, 0);			return vld2_lane_u64(a, b, 0);
	}			}

	// CHECK-LABEL: define %struct.int8x8x2_t @test_vld2_lane_s8(i8* %a, [2 x <8 x i8>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.int8x8x2_t @test_vld2_lane_s8(i8* %a, [2 x <8 x i8>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.int8x8x2_t, align 8			// CHECK: [[RETVAL:%.*]] = alloca %struct.int8x8x2_t, align 8
	// CHECK: [[B:%.*]] = alloca %struct.int8x8x2_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.int8x8x2_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.int8x8x2_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.int8x8x2_t, align 8
	// CHECK: [[__RET:%.*]] = alloca %struct.int8x8x2_t, align 8			// CHECK: [[__RET:%.*]] = alloca %struct.int8x8x2_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x8x2_t, %struct.int8x8x2_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x8x2_t, %struct.int8x8x2_t* [[B]], i32 0, i32 0
	// CHECK: store [2 x <8 x i8>] [[B]].coerce, [2 x <8 x i8>]* [[COERCE_DIVE]], align 8			// CHECK: store [2 x <8 x i8>] [[B]].coerce, [2 x <8 x i8>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.int8x8x2_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.int8x8x2_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.int8x8x2_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.int8x8x2_t* [[B]] to i8*
	Show All 12 Lines
	// CHECK: [[TMP7:%.*]] = bitcast %struct.int8x8x2_t* [[__RET]] to i8*			// CHECK: [[TMP7:%.*]] = bitcast %struct.int8x8x2_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP6]], i8* align 8 [[TMP7]], i64 16, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP6]], i8* align 8 [[TMP7]], i64 16, i1 false)
	// CHECK: [[TMP8:%.*]] = load %struct.int8x8x2_t, %struct.int8x8x2_t* [[RETVAL]], align 8			// CHECK: [[TMP8:%.*]] = load %struct.int8x8x2_t, %struct.int8x8x2_t* [[RETVAL]], align 8
	// CHECK: ret %struct.int8x8x2_t [[TMP8]]			// CHECK: ret %struct.int8x8x2_t [[TMP8]]
	int8x8x2_t test_vld2_lane_s8(int8_t *a, int8x8x2_t b) {			int8x8x2_t test_vld2_lane_s8(int8_t *a, int8x8x2_t b) {
	return vld2_lane_s8(a, b, 7);			return vld2_lane_s8(a, b, 7);
	}			}

	// CHECK-LABEL: define %struct.int16x4x2_t @test_vld2_lane_s16(i16* %a, [2 x <4 x i16>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.int16x4x2_t @test_vld2_lane_s16(i16* %a, [2 x <4 x i16>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.int16x4x2_t, align 8			// CHECK: [[RETVAL:%.*]] = alloca %struct.int16x4x2_t, align 8
	// CHECK: [[B:%.*]] = alloca %struct.int16x4x2_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.int16x4x2_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.int16x4x2_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.int16x4x2_t, align 8
	// CHECK: [[__RET:%.*]] = alloca %struct.int16x4x2_t, align 8			// CHECK: [[__RET:%.*]] = alloca %struct.int16x4x2_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int16x4x2_t, %struct.int16x4x2_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int16x4x2_t, %struct.int16x4x2_t* [[B]], i32 0, i32 0
	// CHECK: store [2 x <4 x i16>] [[B]].coerce, [2 x <4 x i16>]* [[COERCE_DIVE]], align 8			// CHECK: store [2 x <4 x i16>] [[B]].coerce, [2 x <4 x i16>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.int16x4x2_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.int16x4x2_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.int16x4x2_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.int16x4x2_t* [[B]] to i8*
	Show All 17 Lines
	// CHECK: [[TMP12:%.*]] = bitcast %struct.int16x4x2_t* [[__RET]] to i8*			// CHECK: [[TMP12:%.*]] = bitcast %struct.int16x4x2_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP11]], i8* align 8 [[TMP12]], i64 16, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP11]], i8* align 8 [[TMP12]], i64 16, i1 false)
	// CHECK: [[TMP13:%.*]] = load %struct.int16x4x2_t, %struct.int16x4x2_t* [[RETVAL]], align 8			// CHECK: [[TMP13:%.*]] = load %struct.int16x4x2_t, %struct.int16x4x2_t* [[RETVAL]], align 8
	// CHECK: ret %struct.int16x4x2_t [[TMP13]]			// CHECK: ret %struct.int16x4x2_t [[TMP13]]
	int16x4x2_t test_vld2_lane_s16(int16_t *a, int16x4x2_t b) {			int16x4x2_t test_vld2_lane_s16(int16_t *a, int16x4x2_t b) {
	return vld2_lane_s16(a, b, 3);			return vld2_lane_s16(a, b, 3);
	}			}

	// CHECK-LABEL: define %struct.int32x2x2_t @test_vld2_lane_s32(i32* %a, [2 x <2 x i32>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.int32x2x2_t @test_vld2_lane_s32(i32* %a, [2 x <2 x i32>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.int32x2x2_t, align 8			// CHECK: [[RETVAL:%.*]] = alloca %struct.int32x2x2_t, align 8
	// CHECK: [[B:%.*]] = alloca %struct.int32x2x2_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.int32x2x2_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.int32x2x2_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.int32x2x2_t, align 8
	// CHECK: [[__RET:%.*]] = alloca %struct.int32x2x2_t, align 8			// CHECK: [[__RET:%.*]] = alloca %struct.int32x2x2_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int32x2x2_t, %struct.int32x2x2_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int32x2x2_t, %struct.int32x2x2_t* [[B]], i32 0, i32 0
	// CHECK: store [2 x <2 x i32>] [[B]].coerce, [2 x <2 x i32>]* [[COERCE_DIVE]], align 8			// CHECK: store [2 x <2 x i32>] [[B]].coerce, [2 x <2 x i32>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.int32x2x2_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.int32x2x2_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.int32x2x2_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.int32x2x2_t* [[B]] to i8*
	Show All 17 Lines
	// CHECK: [[TMP12:%.*]] = bitcast %struct.int32x2x2_t* [[__RET]] to i8*			// CHECK: [[TMP12:%.*]] = bitcast %struct.int32x2x2_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP11]], i8* align 8 [[TMP12]], i64 16, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP11]], i8* align 8 [[TMP12]], i64 16, i1 false)
	// CHECK: [[TMP13:%.*]] = load %struct.int32x2x2_t, %struct.int32x2x2_t* [[RETVAL]], align 8			// CHECK: [[TMP13:%.*]] = load %struct.int32x2x2_t, %struct.int32x2x2_t* [[RETVAL]], align 8
	// CHECK: ret %struct.int32x2x2_t [[TMP13]]			// CHECK: ret %struct.int32x2x2_t [[TMP13]]
	int32x2x2_t test_vld2_lane_s32(int32_t *a, int32x2x2_t b) {			int32x2x2_t test_vld2_lane_s32(int32_t *a, int32x2x2_t b) {
	return vld2_lane_s32(a, b, 1);			return vld2_lane_s32(a, b, 1);
	}			}

	// CHECK-LABEL: define %struct.int64x1x2_t @test_vld2_lane_s64(i64* %a, [2 x <1 x i64>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.int64x1x2_t @test_vld2_lane_s64(i64* %a, [2 x <1 x i64>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.int64x1x2_t, align 8			// CHECK: [[RETVAL:%.*]] = alloca %struct.int64x1x2_t, align 8
	// CHECK: [[B:%.*]] = alloca %struct.int64x1x2_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.int64x1x2_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.int64x1x2_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.int64x1x2_t, align 8
	// CHECK: [[__RET:%.*]] = alloca %struct.int64x1x2_t, align 8			// CHECK: [[__RET:%.*]] = alloca %struct.int64x1x2_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int64x1x2_t, %struct.int64x1x2_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int64x1x2_t, %struct.int64x1x2_t* [[B]], i32 0, i32 0
	// CHECK: store [2 x <1 x i64>] [[B]].coerce, [2 x <1 x i64>]* [[COERCE_DIVE]], align 8			// CHECK: store [2 x <1 x i64>] [[B]].coerce, [2 x <1 x i64>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.int64x1x2_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.int64x1x2_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.int64x1x2_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.int64x1x2_t* [[B]] to i8*
	Show All 17 Lines
	// CHECK: [[TMP12:%.*]] = bitcast %struct.int64x1x2_t* [[__RET]] to i8*			// CHECK: [[TMP12:%.*]] = bitcast %struct.int64x1x2_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP11]], i8* align 8 [[TMP12]], i64 16, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP11]], i8* align 8 [[TMP12]], i64 16, i1 false)
	// CHECK: [[TMP13:%.*]] = load %struct.int64x1x2_t, %struct.int64x1x2_t* [[RETVAL]], align 8			// CHECK: [[TMP13:%.*]] = load %struct.int64x1x2_t, %struct.int64x1x2_t* [[RETVAL]], align 8
	// CHECK: ret %struct.int64x1x2_t [[TMP13]]			// CHECK: ret %struct.int64x1x2_t [[TMP13]]
	int64x1x2_t test_vld2_lane_s64(int64_t *a, int64x1x2_t b) {			int64x1x2_t test_vld2_lane_s64(int64_t *a, int64x1x2_t b) {
	return vld2_lane_s64(a, b, 0);			return vld2_lane_s64(a, b, 0);
	}			}

	// CHECK-LABEL: define %struct.float16x4x2_t @test_vld2_lane_f16(half* %a, [2 x <4 x half>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.float16x4x2_t @test_vld2_lane_f16(half* %a, [2 x <4 x half>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.float16x4x2_t, align 8			// CHECK: [[RETVAL:%.*]] = alloca %struct.float16x4x2_t, align 8
	// CHECK: [[B:%.*]] = alloca %struct.float16x4x2_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.float16x4x2_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.float16x4x2_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.float16x4x2_t, align 8
	// CHECK: [[__RET:%.*]] = alloca %struct.float16x4x2_t, align 8			// CHECK: [[__RET:%.*]] = alloca %struct.float16x4x2_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float16x4x2_t, %struct.float16x4x2_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float16x4x2_t, %struct.float16x4x2_t* [[B]], i32 0, i32 0
	// CHECK: store [2 x <4 x half>] [[B]].coerce, [2 x <4 x half>]* [[COERCE_DIVE]], align 8			// CHECK: store [2 x <4 x half>] [[B]].coerce, [2 x <4 x half>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x4x2_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x4x2_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.float16x4x2_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.float16x4x2_t* [[B]] to i8*
	Show All 17 Lines
	// CHECK: [[TMP12:%.*]] = bitcast %struct.float16x4x2_t* [[__RET]] to i8*			// CHECK: [[TMP12:%.*]] = bitcast %struct.float16x4x2_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP11]], i8* align 8 [[TMP12]], i64 16, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP11]], i8* align 8 [[TMP12]], i64 16, i1 false)
	// CHECK: [[TMP13:%.*]] = load %struct.float16x4x2_t, %struct.float16x4x2_t* [[RETVAL]], align 8			// CHECK: [[TMP13:%.*]] = load %struct.float16x4x2_t, %struct.float16x4x2_t* [[RETVAL]], align 8
	// CHECK: ret %struct.float16x4x2_t [[TMP13]]			// CHECK: ret %struct.float16x4x2_t [[TMP13]]
	float16x4x2_t test_vld2_lane_f16(float16_t *a, float16x4x2_t b) {			float16x4x2_t test_vld2_lane_f16(float16_t *a, float16x4x2_t b) {
	return vld2_lane_f16(a, b, 3);			return vld2_lane_f16(a, b, 3);
	}			}

	// CHECK-LABEL: define %struct.float32x2x2_t @test_vld2_lane_f32(float* %a, [2 x <2 x float>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.float32x2x2_t @test_vld2_lane_f32(float* %a, [2 x <2 x float>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.float32x2x2_t, align 8			// CHECK: [[RETVAL:%.*]] = alloca %struct.float32x2x2_t, align 8
	// CHECK: [[B:%.*]] = alloca %struct.float32x2x2_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.float32x2x2_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.float32x2x2_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.float32x2x2_t, align 8
	// CHECK: [[__RET:%.*]] = alloca %struct.float32x2x2_t, align 8			// CHECK: [[__RET:%.*]] = alloca %struct.float32x2x2_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float32x2x2_t, %struct.float32x2x2_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float32x2x2_t, %struct.float32x2x2_t* [[B]], i32 0, i32 0
	// CHECK: store [2 x <2 x float>] [[B]].coerce, [2 x <2 x float>]* [[COERCE_DIVE]], align 8			// CHECK: store [2 x <2 x float>] [[B]].coerce, [2 x <2 x float>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.float32x2x2_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.float32x2x2_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.float32x2x2_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.float32x2x2_t* [[B]] to i8*
	Show All 17 Lines
	// CHECK: [[TMP12:%.*]] = bitcast %struct.float32x2x2_t* [[__RET]] to i8*			// CHECK: [[TMP12:%.*]] = bitcast %struct.float32x2x2_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP11]], i8* align 8 [[TMP12]], i64 16, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP11]], i8* align 8 [[TMP12]], i64 16, i1 false)
	// CHECK: [[TMP13:%.*]] = load %struct.float32x2x2_t, %struct.float32x2x2_t* [[RETVAL]], align 8			// CHECK: [[TMP13:%.*]] = load %struct.float32x2x2_t, %struct.float32x2x2_t* [[RETVAL]], align 8
	// CHECK: ret %struct.float32x2x2_t [[TMP13]]			// CHECK: ret %struct.float32x2x2_t [[TMP13]]
	float32x2x2_t test_vld2_lane_f32(float32_t *a, float32x2x2_t b) {			float32x2x2_t test_vld2_lane_f32(float32_t *a, float32x2x2_t b) {
	return vld2_lane_f32(a, b, 1);			return vld2_lane_f32(a, b, 1);
	}			}

	// CHECK-LABEL: define %struct.float64x1x2_t @test_vld2_lane_f64(double* %a, [2 x <1 x double>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.float64x1x2_t @test_vld2_lane_f64(double* %a, [2 x <1 x double>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.float64x1x2_t, align 8			// CHECK: [[RETVAL:%.*]] = alloca %struct.float64x1x2_t, align 8
	// CHECK: [[B:%.*]] = alloca %struct.float64x1x2_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.float64x1x2_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.float64x1x2_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.float64x1x2_t, align 8
	// CHECK: [[__RET:%.*]] = alloca %struct.float64x1x2_t, align 8			// CHECK: [[__RET:%.*]] = alloca %struct.float64x1x2_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float64x1x2_t, %struct.float64x1x2_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float64x1x2_t, %struct.float64x1x2_t* [[B]], i32 0, i32 0
	// CHECK: store [2 x <1 x double>] [[B]].coerce, [2 x <1 x double>]* [[COERCE_DIVE]], align 8			// CHECK: store [2 x <1 x double>] [[B]].coerce, [2 x <1 x double>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.float64x1x2_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.float64x1x2_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.float64x1x2_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.float64x1x2_t* [[B]] to i8*
	Show All 17 Lines
	// CHECK: [[TMP12:%.*]] = bitcast %struct.float64x1x2_t* [[__RET]] to i8*			// CHECK: [[TMP12:%.*]] = bitcast %struct.float64x1x2_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP11]], i8* align 8 [[TMP12]], i64 16, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP11]], i8* align 8 [[TMP12]], i64 16, i1 false)
	// CHECK: [[TMP13:%.*]] = load %struct.float64x1x2_t, %struct.float64x1x2_t* [[RETVAL]], align 8			// CHECK: [[TMP13:%.*]] = load %struct.float64x1x2_t, %struct.float64x1x2_t* [[RETVAL]], align 8
	// CHECK: ret %struct.float64x1x2_t [[TMP13]]			// CHECK: ret %struct.float64x1x2_t [[TMP13]]
	float64x1x2_t test_vld2_lane_f64(float64_t *a, float64x1x2_t b) {			float64x1x2_t test_vld2_lane_f64(float64_t *a, float64x1x2_t b) {
	return vld2_lane_f64(a, b, 0);			return vld2_lane_f64(a, b, 0);
	}			}

	// CHECK-LABEL: define %struct.poly8x8x2_t @test_vld2_lane_p8(i8* %a, [2 x <8 x i8>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.poly8x8x2_t @test_vld2_lane_p8(i8* %a, [2 x <8 x i8>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.poly8x8x2_t, align 8			// CHECK: [[RETVAL:%.*]] = alloca %struct.poly8x8x2_t, align 8
	// CHECK: [[B:%.*]] = alloca %struct.poly8x8x2_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.poly8x8x2_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.poly8x8x2_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.poly8x8x2_t, align 8
	// CHECK: [[__RET:%.*]] = alloca %struct.poly8x8x2_t, align 8			// CHECK: [[__RET:%.*]] = alloca %struct.poly8x8x2_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly8x8x2_t, %struct.poly8x8x2_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly8x8x2_t, %struct.poly8x8x2_t* [[B]], i32 0, i32 0
	// CHECK: store [2 x <8 x i8>] [[B]].coerce, [2 x <8 x i8>]* [[COERCE_DIVE]], align 8			// CHECK: store [2 x <8 x i8>] [[B]].coerce, [2 x <8 x i8>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.poly8x8x2_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.poly8x8x2_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.poly8x8x2_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.poly8x8x2_t* [[B]] to i8*
	Show All 12 Lines
	// CHECK: [[TMP7:%.*]] = bitcast %struct.poly8x8x2_t* [[__RET]] to i8*			// CHECK: [[TMP7:%.*]] = bitcast %struct.poly8x8x2_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP6]], i8* align 8 [[TMP7]], i64 16, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP6]], i8* align 8 [[TMP7]], i64 16, i1 false)
	// CHECK: [[TMP8:%.*]] = load %struct.poly8x8x2_t, %struct.poly8x8x2_t* [[RETVAL]], align 8			// CHECK: [[TMP8:%.*]] = load %struct.poly8x8x2_t, %struct.poly8x8x2_t* [[RETVAL]], align 8
	// CHECK: ret %struct.poly8x8x2_t [[TMP8]]			// CHECK: ret %struct.poly8x8x2_t [[TMP8]]
	poly8x8x2_t test_vld2_lane_p8(poly8_t *a, poly8x8x2_t b) {			poly8x8x2_t test_vld2_lane_p8(poly8_t *a, poly8x8x2_t b) {
	return vld2_lane_p8(a, b, 7);			return vld2_lane_p8(a, b, 7);
	}			}

	// CHECK-LABEL: define %struct.poly16x4x2_t @test_vld2_lane_p16(i16* %a, [2 x <4 x i16>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.poly16x4x2_t @test_vld2_lane_p16(i16* %a, [2 x <4 x i16>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.poly16x4x2_t, align 8			// CHECK: [[RETVAL:%.*]] = alloca %struct.poly16x4x2_t, align 8
	// CHECK: [[B:%.*]] = alloca %struct.poly16x4x2_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.poly16x4x2_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.poly16x4x2_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.poly16x4x2_t, align 8
	// CHECK: [[__RET:%.*]] = alloca %struct.poly16x4x2_t, align 8			// CHECK: [[__RET:%.*]] = alloca %struct.poly16x4x2_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly16x4x2_t, %struct.poly16x4x2_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly16x4x2_t, %struct.poly16x4x2_t* [[B]], i32 0, i32 0
	// CHECK: store [2 x <4 x i16>] [[B]].coerce, [2 x <4 x i16>]* [[COERCE_DIVE]], align 8			// CHECK: store [2 x <4 x i16>] [[B]].coerce, [2 x <4 x i16>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.poly16x4x2_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.poly16x4x2_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.poly16x4x2_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.poly16x4x2_t* [[B]] to i8*
	Show All 17 Lines
	// CHECK: [[TMP12:%.*]] = bitcast %struct.poly16x4x2_t* [[__RET]] to i8*			// CHECK: [[TMP12:%.*]] = bitcast %struct.poly16x4x2_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP11]], i8* align 8 [[TMP12]], i64 16, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP11]], i8* align 8 [[TMP12]], i64 16, i1 false)
	// CHECK: [[TMP13:%.*]] = load %struct.poly16x4x2_t, %struct.poly16x4x2_t* [[RETVAL]], align 8			// CHECK: [[TMP13:%.*]] = load %struct.poly16x4x2_t, %struct.poly16x4x2_t* [[RETVAL]], align 8
	// CHECK: ret %struct.poly16x4x2_t [[TMP13]]			// CHECK: ret %struct.poly16x4x2_t [[TMP13]]
	poly16x4x2_t test_vld2_lane_p16(poly16_t *a, poly16x4x2_t b) {			poly16x4x2_t test_vld2_lane_p16(poly16_t *a, poly16x4x2_t b) {
	return vld2_lane_p16(a, b, 3);			return vld2_lane_p16(a, b, 3);
	}			}

	// CHECK-LABEL: define %struct.poly64x1x2_t @test_vld2_lane_p64(i64* %a, [2 x <1 x i64>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.poly64x1x2_t @test_vld2_lane_p64(i64* %a, [2 x <1 x i64>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.poly64x1x2_t, align 8			// CHECK: [[RETVAL:%.*]] = alloca %struct.poly64x1x2_t, align 8
	// CHECK: [[B:%.*]] = alloca %struct.poly64x1x2_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.poly64x1x2_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.poly64x1x2_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.poly64x1x2_t, align 8
	// CHECK: [[__RET:%.*]] = alloca %struct.poly64x1x2_t, align 8			// CHECK: [[__RET:%.*]] = alloca %struct.poly64x1x2_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly64x1x2_t, %struct.poly64x1x2_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly64x1x2_t, %struct.poly64x1x2_t* [[B]], i32 0, i32 0
	// CHECK: store [2 x <1 x i64>] [[B]].coerce, [2 x <1 x i64>]* [[COERCE_DIVE]], align 8			// CHECK: store [2 x <1 x i64>] [[B]].coerce, [2 x <1 x i64>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.poly64x1x2_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.poly64x1x2_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.poly64x1x2_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.poly64x1x2_t* [[B]] to i8*
	Show All 17 Lines
	// CHECK: [[TMP12:%.*]] = bitcast %struct.poly64x1x2_t* [[__RET]] to i8*			// CHECK: [[TMP12:%.*]] = bitcast %struct.poly64x1x2_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP11]], i8* align 8 [[TMP12]], i64 16, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP11]], i8* align 8 [[TMP12]], i64 16, i1 false)
	// CHECK: [[TMP13:%.*]] = load %struct.poly64x1x2_t, %struct.poly64x1x2_t* [[RETVAL]], align 8			// CHECK: [[TMP13:%.*]] = load %struct.poly64x1x2_t, %struct.poly64x1x2_t* [[RETVAL]], align 8
	// CHECK: ret %struct.poly64x1x2_t [[TMP13]]			// CHECK: ret %struct.poly64x1x2_t [[TMP13]]
	poly64x1x2_t test_vld2_lane_p64(poly64_t *a, poly64x1x2_t b) {			poly64x1x2_t test_vld2_lane_p64(poly64_t *a, poly64x1x2_t b) {
	return vld2_lane_p64(a, b, 0);			return vld2_lane_p64(a, b, 0);
	}			}

	// CHECK-LABEL: define %struct.uint16x8x3_t @test_vld3q_lane_u16(i16* %a, [3 x <8 x i16>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.uint16x8x3_t @test_vld3q_lane_u16(i16* %a, [3 x <8 x i16>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.uint16x8x3_t, align 16			// CHECK: [[RETVAL:%.*]] = alloca %struct.uint16x8x3_t, align 16
	// CHECK: [[B:%.*]] = alloca %struct.uint16x8x3_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.uint16x8x3_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.uint16x8x3_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.uint16x8x3_t, align 16
	// CHECK: [[__RET:%.*]] = alloca %struct.uint16x8x3_t, align 16			// CHECK: [[__RET:%.*]] = alloca %struct.uint16x8x3_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint16x8x3_t, %struct.uint16x8x3_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint16x8x3_t, %struct.uint16x8x3_t* [[B]], i32 0, i32 0
	// CHECK: store [3 x <8 x i16>] [[B]].coerce, [3 x <8 x i16>]* [[COERCE_DIVE]], align 16			// CHECK: store [3 x <8 x i16>] [[B]].coerce, [3 x <8 x i16>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.uint16x8x3_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.uint16x8x3_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.uint16x8x3_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.uint16x8x3_t* [[B]] to i8*
	Show All 22 Lines
	// CHECK: [[TMP15:%.*]] = bitcast %struct.uint16x8x3_t* [[__RET]] to i8*			// CHECK: [[TMP15:%.*]] = bitcast %struct.uint16x8x3_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP14]], i8* align 16 [[TMP15]], i64 48, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP14]], i8* align 16 [[TMP15]], i64 48, i1 false)
	// CHECK: [[TMP16:%.*]] = load %struct.uint16x8x3_t, %struct.uint16x8x3_t* [[RETVAL]], align 16			// CHECK: [[TMP16:%.*]] = load %struct.uint16x8x3_t, %struct.uint16x8x3_t* [[RETVAL]], align 16
	// CHECK: ret %struct.uint16x8x3_t [[TMP16]]			// CHECK: ret %struct.uint16x8x3_t [[TMP16]]
	uint16x8x3_t test_vld3q_lane_u16(uint16_t *a, uint16x8x3_t b) {			uint16x8x3_t test_vld3q_lane_u16(uint16_t *a, uint16x8x3_t b) {
	return vld3q_lane_u16(a, b, 7);			return vld3q_lane_u16(a, b, 7);
	}			}

	// CHECK-LABEL: define %struct.uint32x4x3_t @test_vld3q_lane_u32(i32* %a, [3 x <4 x i32>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.uint32x4x3_t @test_vld3q_lane_u32(i32* %a, [3 x <4 x i32>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.uint32x4x3_t, align 16			// CHECK: [[RETVAL:%.*]] = alloca %struct.uint32x4x3_t, align 16
	// CHECK: [[B:%.*]] = alloca %struct.uint32x4x3_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.uint32x4x3_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.uint32x4x3_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.uint32x4x3_t, align 16
	// CHECK: [[__RET:%.*]] = alloca %struct.uint32x4x3_t, align 16			// CHECK: [[__RET:%.*]] = alloca %struct.uint32x4x3_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint32x4x3_t, %struct.uint32x4x3_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint32x4x3_t, %struct.uint32x4x3_t* [[B]], i32 0, i32 0
	// CHECK: store [3 x <4 x i32>] [[B]].coerce, [3 x <4 x i32>]* [[COERCE_DIVE]], align 16			// CHECK: store [3 x <4 x i32>] [[B]].coerce, [3 x <4 x i32>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.uint32x4x3_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.uint32x4x3_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.uint32x4x3_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.uint32x4x3_t* [[B]] to i8*
	Show All 22 Lines
	// CHECK: [[TMP15:%.*]] = bitcast %struct.uint32x4x3_t* [[__RET]] to i8*			// CHECK: [[TMP15:%.*]] = bitcast %struct.uint32x4x3_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP14]], i8* align 16 [[TMP15]], i64 48, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP14]], i8* align 16 [[TMP15]], i64 48, i1 false)
	// CHECK: [[TMP16:%.*]] = load %struct.uint32x4x3_t, %struct.uint32x4x3_t* [[RETVAL]], align 16			// CHECK: [[TMP16:%.*]] = load %struct.uint32x4x3_t, %struct.uint32x4x3_t* [[RETVAL]], align 16
	// CHECK: ret %struct.uint32x4x3_t [[TMP16]]			// CHECK: ret %struct.uint32x4x3_t [[TMP16]]
	uint32x4x3_t test_vld3q_lane_u32(uint32_t *a, uint32x4x3_t b) {			uint32x4x3_t test_vld3q_lane_u32(uint32_t *a, uint32x4x3_t b) {
	return vld3q_lane_u32(a, b, 3);			return vld3q_lane_u32(a, b, 3);
	}			}

	// CHECK-LABEL: define %struct.uint64x2x3_t @test_vld3q_lane_u64(i64* %a, [3 x <2 x i64>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.uint64x2x3_t @test_vld3q_lane_u64(i64* %a, [3 x <2 x i64>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.uint64x2x3_t, align 16			// CHECK: [[RETVAL:%.*]] = alloca %struct.uint64x2x3_t, align 16
	// CHECK: [[B:%.*]] = alloca %struct.uint64x2x3_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.uint64x2x3_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.uint64x2x3_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.uint64x2x3_t, align 16
	// CHECK: [[__RET:%.*]] = alloca %struct.uint64x2x3_t, align 16			// CHECK: [[__RET:%.*]] = alloca %struct.uint64x2x3_t, align 16
	// CHECK: [[COERCE_DIVE:%.]] = getelementptr inbounds %struct.uint64x2x3_t, %struct.uint64x2x3_t [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.]] = getelementptr inbounds %struct.uint64x2x3_t, %struct.uint64x2x3_t [[B]], i32 0, i32 0
	// CHECK: store [3 x <2 x i64>] [[B]].coerce, [3 x <2 x i64>]* [[COERCE_DIVE]], align 16			// CHECK: store [3 x <2 x i64>] [[B]].coerce, [3 x <2 x i64>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.uint64x2x3_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.uint64x2x3_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.uint64x2x3_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.uint64x2x3_t* [[B]] to i8*
	Show All 22 Lines
	// CHECK: [[TMP15:%.*]] = bitcast %struct.uint64x2x3_t* [[__RET]] to i8*			// CHECK: [[TMP15:%.*]] = bitcast %struct.uint64x2x3_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP14]], i8* align 16 [[TMP15]], i64 48, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP14]], i8* align 16 [[TMP15]], i64 48, i1 false)
	// CHECK: [[TMP16:%.*]] = load %struct.uint64x2x3_t, %struct.uint64x2x3_t* [[RETVAL]], align 16			// CHECK: [[TMP16:%.*]] = load %struct.uint64x2x3_t, %struct.uint64x2x3_t* [[RETVAL]], align 16
	// CHECK: ret %struct.uint64x2x3_t [[TMP16]]			// CHECK: ret %struct.uint64x2x3_t [[TMP16]]
	uint64x2x3_t test_vld3q_lane_u64(uint64_t *a, uint64x2x3_t b) {			uint64x2x3_t test_vld3q_lane_u64(uint64_t *a, uint64x2x3_t b) {
	return vld3q_lane_u64(a, b, 1);			return vld3q_lane_u64(a, b, 1);
	}			}

	// CHECK-LABEL: define %struct.int16x8x3_t @test_vld3q_lane_s16(i16* %a, [3 x <8 x i16>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.int16x8x3_t @test_vld3q_lane_s16(i16* %a, [3 x <8 x i16>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.int16x8x3_t, align 16			// CHECK: [[RETVAL:%.*]] = alloca %struct.int16x8x3_t, align 16
	// CHECK: [[B:%.*]] = alloca %struct.int16x8x3_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.int16x8x3_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.int16x8x3_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.int16x8x3_t, align 16
	// CHECK: [[__RET:%.*]] = alloca %struct.int16x8x3_t, align 16			// CHECK: [[__RET:%.*]] = alloca %struct.int16x8x3_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int16x8x3_t, %struct.int16x8x3_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int16x8x3_t, %struct.int16x8x3_t* [[B]], i32 0, i32 0
	// CHECK: store [3 x <8 x i16>] [[B]].coerce, [3 x <8 x i16>]* [[COERCE_DIVE]], align 16			// CHECK: store [3 x <8 x i16>] [[B]].coerce, [3 x <8 x i16>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.int16x8x3_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.int16x8x3_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.int16x8x3_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.int16x8x3_t* [[B]] to i8*
	Show All 22 Lines
	// CHECK: [[TMP15:%.*]] = bitcast %struct.int16x8x3_t* [[__RET]] to i8*			// CHECK: [[TMP15:%.*]] = bitcast %struct.int16x8x3_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP14]], i8* align 16 [[TMP15]], i64 48, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP14]], i8* align 16 [[TMP15]], i64 48, i1 false)
	// CHECK: [[TMP16:%.*]] = load %struct.int16x8x3_t, %struct.int16x8x3_t* [[RETVAL]], align 16			// CHECK: [[TMP16:%.*]] = load %struct.int16x8x3_t, %struct.int16x8x3_t* [[RETVAL]], align 16
	// CHECK: ret %struct.int16x8x3_t [[TMP16]]			// CHECK: ret %struct.int16x8x3_t [[TMP16]]
	int16x8x3_t test_vld3q_lane_s16(int16_t *a, int16x8x3_t b) {			int16x8x3_t test_vld3q_lane_s16(int16_t *a, int16x8x3_t b) {
	return vld3q_lane_s16(a, b, 7);			return vld3q_lane_s16(a, b, 7);
	}			}

	// CHECK-LABEL: define %struct.int32x4x3_t @test_vld3q_lane_s32(i32* %a, [3 x <4 x i32>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.int32x4x3_t @test_vld3q_lane_s32(i32* %a, [3 x <4 x i32>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.int32x4x3_t, align 16			// CHECK: [[RETVAL:%.*]] = alloca %struct.int32x4x3_t, align 16
	// CHECK: [[B:%.*]] = alloca %struct.int32x4x3_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.int32x4x3_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.int32x4x3_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.int32x4x3_t, align 16
	// CHECK: [[__RET:%.*]] = alloca %struct.int32x4x3_t, align 16			// CHECK: [[__RET:%.*]] = alloca %struct.int32x4x3_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int32x4x3_t, %struct.int32x4x3_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int32x4x3_t, %struct.int32x4x3_t* [[B]], i32 0, i32 0
	// CHECK: store [3 x <4 x i32>] [[B]].coerce, [3 x <4 x i32>]* [[COERCE_DIVE]], align 16			// CHECK: store [3 x <4 x i32>] [[B]].coerce, [3 x <4 x i32>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.int32x4x3_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.int32x4x3_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.int32x4x3_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.int32x4x3_t* [[B]] to i8*
	Show All 22 Lines
	// CHECK: [[TMP15:%.*]] = bitcast %struct.int32x4x3_t* [[__RET]] to i8*			// CHECK: [[TMP15:%.*]] = bitcast %struct.int32x4x3_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP14]], i8* align 16 [[TMP15]], i64 48, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP14]], i8* align 16 [[TMP15]], i64 48, i1 false)
	// CHECK: [[TMP16:%.*]] = load %struct.int32x4x3_t, %struct.int32x4x3_t* [[RETVAL]], align 16			// CHECK: [[TMP16:%.*]] = load %struct.int32x4x3_t, %struct.int32x4x3_t* [[RETVAL]], align 16
	// CHECK: ret %struct.int32x4x3_t [[TMP16]]			// CHECK: ret %struct.int32x4x3_t [[TMP16]]
	int32x4x3_t test_vld3q_lane_s32(int32_t *a, int32x4x3_t b) {			int32x4x3_t test_vld3q_lane_s32(int32_t *a, int32x4x3_t b) {
	return vld3q_lane_s32(a, b, 3);			return vld3q_lane_s32(a, b, 3);
	}			}

	// CHECK-LABEL: define %struct.int64x2x3_t @test_vld3q_lane_s64(i64* %a, [3 x <2 x i64>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.int64x2x3_t @test_vld3q_lane_s64(i64* %a, [3 x <2 x i64>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.int64x2x3_t, align 16			// CHECK: [[RETVAL:%.*]] = alloca %struct.int64x2x3_t, align 16
	// CHECK: [[B:%.*]] = alloca %struct.int64x2x3_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.int64x2x3_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.int64x2x3_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.int64x2x3_t, align 16
	// CHECK: [[__RET:%.*]] = alloca %struct.int64x2x3_t, align 16			// CHECK: [[__RET:%.*]] = alloca %struct.int64x2x3_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int64x2x3_t, %struct.int64x2x3_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int64x2x3_t, %struct.int64x2x3_t* [[B]], i32 0, i32 0
	// CHECK: store [3 x <2 x i64>] [[B]].coerce, [3 x <2 x i64>]* [[COERCE_DIVE]], align 16			// CHECK: store [3 x <2 x i64>] [[B]].coerce, [3 x <2 x i64>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.int64x2x3_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.int64x2x3_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.int64x2x3_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.int64x2x3_t* [[B]] to i8*
	Show All 22 Lines
	// CHECK: [[TMP15:%.*]] = bitcast %struct.int64x2x3_t* [[__RET]] to i8*			// CHECK: [[TMP15:%.*]] = bitcast %struct.int64x2x3_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP14]], i8* align 16 [[TMP15]], i64 48, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP14]], i8* align 16 [[TMP15]], i64 48, i1 false)
	// CHECK: [[TMP16:%.*]] = load %struct.int64x2x3_t, %struct.int64x2x3_t* [[RETVAL]], align 16			// CHECK: [[TMP16:%.*]] = load %struct.int64x2x3_t, %struct.int64x2x3_t* [[RETVAL]], align 16
	// CHECK: ret %struct.int64x2x3_t [[TMP16]]			// CHECK: ret %struct.int64x2x3_t [[TMP16]]
	int64x2x3_t test_vld3q_lane_s64(int64_t *a, int64x2x3_t b) {			int64x2x3_t test_vld3q_lane_s64(int64_t *a, int64x2x3_t b) {
	return vld3q_lane_s64(a, b, 1);			return vld3q_lane_s64(a, b, 1);
	}			}

	// CHECK-LABEL: define %struct.float16x8x3_t @test_vld3q_lane_f16(half* %a, [3 x <8 x half>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.float16x8x3_t @test_vld3q_lane_f16(half* %a, [3 x <8 x half>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.float16x8x3_t, align 16			// CHECK: [[RETVAL:%.*]] = alloca %struct.float16x8x3_t, align 16
	// CHECK: [[B:%.*]] = alloca %struct.float16x8x3_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.float16x8x3_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.float16x8x3_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.float16x8x3_t, align 16
	// CHECK: [[__RET:%.*]] = alloca %struct.float16x8x3_t, align 16			// CHECK: [[__RET:%.*]] = alloca %struct.float16x8x3_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float16x8x3_t, %struct.float16x8x3_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float16x8x3_t, %struct.float16x8x3_t* [[B]], i32 0, i32 0
	// CHECK: store [3 x <8 x half>] [[B]].coerce, [3 x <8 x half>]* [[COERCE_DIVE]], align 16			// CHECK: store [3 x <8 x half>] [[B]].coerce, [3 x <8 x half>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x8x3_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x8x3_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.float16x8x3_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.float16x8x3_t* [[B]] to i8*
	Show All 22 Lines
	// CHECK: [[TMP15:%.*]] = bitcast %struct.float16x8x3_t* [[__RET]] to i8*			// CHECK: [[TMP15:%.*]] = bitcast %struct.float16x8x3_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP14]], i8* align 16 [[TMP15]], i64 48, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP14]], i8* align 16 [[TMP15]], i64 48, i1 false)
	// CHECK: [[TMP16:%.*]] = load %struct.float16x8x3_t, %struct.float16x8x3_t* [[RETVAL]], align 16			// CHECK: [[TMP16:%.*]] = load %struct.float16x8x3_t, %struct.float16x8x3_t* [[RETVAL]], align 16
	// CHECK: ret %struct.float16x8x3_t [[TMP16]]			// CHECK: ret %struct.float16x8x3_t [[TMP16]]
	float16x8x3_t test_vld3q_lane_f16(float16_t *a, float16x8x3_t b) {			float16x8x3_t test_vld3q_lane_f16(float16_t *a, float16x8x3_t b) {
	return vld3q_lane_f16(a, b, 7);			return vld3q_lane_f16(a, b, 7);
	}			}

	// CHECK-LABEL: define %struct.float32x4x3_t @test_vld3q_lane_f32(float* %a, [3 x <4 x float>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.float32x4x3_t @test_vld3q_lane_f32(float* %a, [3 x <4 x float>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.float32x4x3_t, align 16			// CHECK: [[RETVAL:%.*]] = alloca %struct.float32x4x3_t, align 16
	// CHECK: [[B:%.*]] = alloca %struct.float32x4x3_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.float32x4x3_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.float32x4x3_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.float32x4x3_t, align 16
	// CHECK: [[__RET:%.*]] = alloca %struct.float32x4x3_t, align 16			// CHECK: [[__RET:%.*]] = alloca %struct.float32x4x3_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float32x4x3_t, %struct.float32x4x3_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float32x4x3_t, %struct.float32x4x3_t* [[B]], i32 0, i32 0
	// CHECK: store [3 x <4 x float>] [[B]].coerce, [3 x <4 x float>]* [[COERCE_DIVE]], align 16			// CHECK: store [3 x <4 x float>] [[B]].coerce, [3 x <4 x float>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.float32x4x3_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.float32x4x3_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.float32x4x3_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.float32x4x3_t* [[B]] to i8*
	Show All 22 Lines
	// CHECK: [[TMP15:%.*]] = bitcast %struct.float32x4x3_t* [[__RET]] to i8*			// CHECK: [[TMP15:%.*]] = bitcast %struct.float32x4x3_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP14]], i8* align 16 [[TMP15]], i64 48, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP14]], i8* align 16 [[TMP15]], i64 48, i1 false)
	// CHECK: [[TMP16:%.*]] = load %struct.float32x4x3_t, %struct.float32x4x3_t* [[RETVAL]], align 16			// CHECK: [[TMP16:%.*]] = load %struct.float32x4x3_t, %struct.float32x4x3_t* [[RETVAL]], align 16
	// CHECK: ret %struct.float32x4x3_t [[TMP16]]			// CHECK: ret %struct.float32x4x3_t [[TMP16]]
	float32x4x3_t test_vld3q_lane_f32(float32_t *a, float32x4x3_t b) {			float32x4x3_t test_vld3q_lane_f32(float32_t *a, float32x4x3_t b) {
	return vld3q_lane_f32(a, b, 3);			return vld3q_lane_f32(a, b, 3);
	}			}

	// CHECK-LABEL: define %struct.float64x2x3_t @test_vld3q_lane_f64(double* %a, [3 x <2 x double>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.float64x2x3_t @test_vld3q_lane_f64(double* %a, [3 x <2 x double>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.float64x2x3_t, align 16			// CHECK: [[RETVAL:%.*]] = alloca %struct.float64x2x3_t, align 16
	// CHECK: [[B:%.*]] = alloca %struct.float64x2x3_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.float64x2x3_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.float64x2x3_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.float64x2x3_t, align 16
	// CHECK: [[__RET:%.*]] = alloca %struct.float64x2x3_t, align 16			// CHECK: [[__RET:%.*]] = alloca %struct.float64x2x3_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float64x2x3_t, %struct.float64x2x3_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float64x2x3_t, %struct.float64x2x3_t* [[B]], i32 0, i32 0
	// CHECK: store [3 x <2 x double>] [[B]].coerce, [3 x <2 x double>]* [[COERCE_DIVE]], align 16			// CHECK: store [3 x <2 x double>] [[B]].coerce, [3 x <2 x double>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.float64x2x3_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.float64x2x3_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.float64x2x3_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.float64x2x3_t* [[B]] to i8*
	Show All 22 Lines
	// CHECK: [[TMP15:%.*]] = bitcast %struct.float64x2x3_t* [[__RET]] to i8*			// CHECK: [[TMP15:%.*]] = bitcast %struct.float64x2x3_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP14]], i8* align 16 [[TMP15]], i64 48, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP14]], i8* align 16 [[TMP15]], i64 48, i1 false)
	// CHECK: [[TMP16:%.*]] = load %struct.float64x2x3_t, %struct.float64x2x3_t* [[RETVAL]], align 16			// CHECK: [[TMP16:%.*]] = load %struct.float64x2x3_t, %struct.float64x2x3_t* [[RETVAL]], align 16
	// CHECK: ret %struct.float64x2x3_t [[TMP16]]			// CHECK: ret %struct.float64x2x3_t [[TMP16]]
	float64x2x3_t test_vld3q_lane_f64(float64_t *a, float64x2x3_t b) {			float64x2x3_t test_vld3q_lane_f64(float64_t *a, float64x2x3_t b) {
	return vld3q_lane_f64(a, b, 1);			return vld3q_lane_f64(a, b, 1);
	}			}

	// CHECK-LABEL: define %struct.poly8x16x3_t @test_vld3q_lane_p8(i8* %a, [3 x <16 x i8>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.poly8x16x3_t @test_vld3q_lane_p8(i8* %a, [3 x <16 x i8>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.poly8x16x3_t, align 16			// CHECK: [[RETVAL:%.*]] = alloca %struct.poly8x16x3_t, align 16
	// CHECK: [[B:%.*]] = alloca %struct.poly8x16x3_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.poly8x16x3_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.poly8x16x3_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.poly8x16x3_t, align 16
	// CHECK: [[__RET:%.*]] = alloca %struct.poly8x16x3_t, align 16			// CHECK: [[__RET:%.*]] = alloca %struct.poly8x16x3_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly8x16x3_t, %struct.poly8x16x3_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly8x16x3_t, %struct.poly8x16x3_t* [[B]], i32 0, i32 0
	// CHECK: store [3 x <16 x i8>] [[B]].coerce, [3 x <16 x i8>]* [[COERCE_DIVE]], align 16			// CHECK: store [3 x <16 x i8>] [[B]].coerce, [3 x <16 x i8>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.poly8x16x3_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.poly8x16x3_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.poly8x16x3_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.poly8x16x3_t* [[B]] to i8*
	Show All 15 Lines
	// CHECK: [[TMP8:%.*]] = bitcast %struct.poly8x16x3_t* [[__RET]] to i8*			// CHECK: [[TMP8:%.*]] = bitcast %struct.poly8x16x3_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP7]], i8* align 16 [[TMP8]], i64 48, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP7]], i8* align 16 [[TMP8]], i64 48, i1 false)
	// CHECK: [[TMP9:%.*]] = load %struct.poly8x16x3_t, %struct.poly8x16x3_t* [[RETVAL]], align 16			// CHECK: [[TMP9:%.*]] = load %struct.poly8x16x3_t, %struct.poly8x16x3_t* [[RETVAL]], align 16
	// CHECK: ret %struct.poly8x16x3_t [[TMP9]]			// CHECK: ret %struct.poly8x16x3_t [[TMP9]]
	poly8x16x3_t test_vld3q_lane_p8(poly8_t *a, poly8x16x3_t b) {			poly8x16x3_t test_vld3q_lane_p8(poly8_t *a, poly8x16x3_t b) {
	return vld3q_lane_p8(a, b, 15);			return vld3q_lane_p8(a, b, 15);
	}			}

	// CHECK-LABEL: define %struct.poly16x8x3_t @test_vld3q_lane_p16(i16* %a, [3 x <8 x i16>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.poly16x8x3_t @test_vld3q_lane_p16(i16* %a, [3 x <8 x i16>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.poly16x8x3_t, align 16			// CHECK: [[RETVAL:%.*]] = alloca %struct.poly16x8x3_t, align 16
	// CHECK: [[B:%.*]] = alloca %struct.poly16x8x3_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.poly16x8x3_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.poly16x8x3_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.poly16x8x3_t, align 16
	// CHECK: [[__RET:%.*]] = alloca %struct.poly16x8x3_t, align 16			// CHECK: [[__RET:%.*]] = alloca %struct.poly16x8x3_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly16x8x3_t, %struct.poly16x8x3_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly16x8x3_t, %struct.poly16x8x3_t* [[B]], i32 0, i32 0
	// CHECK: store [3 x <8 x i16>] [[B]].coerce, [3 x <8 x i16>]* [[COERCE_DIVE]], align 16			// CHECK: store [3 x <8 x i16>] [[B]].coerce, [3 x <8 x i16>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.poly16x8x3_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.poly16x8x3_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.poly16x8x3_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.poly16x8x3_t* [[B]] to i8*
	Show All 22 Lines
	// CHECK: [[TMP15:%.*]] = bitcast %struct.poly16x8x3_t* [[__RET]] to i8*			// CHECK: [[TMP15:%.*]] = bitcast %struct.poly16x8x3_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP14]], i8* align 16 [[TMP15]], i64 48, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP14]], i8* align 16 [[TMP15]], i64 48, i1 false)
	// CHECK: [[TMP16:%.*]] = load %struct.poly16x8x3_t, %struct.poly16x8x3_t* [[RETVAL]], align 16			// CHECK: [[TMP16:%.*]] = load %struct.poly16x8x3_t, %struct.poly16x8x3_t* [[RETVAL]], align 16
	// CHECK: ret %struct.poly16x8x3_t [[TMP16]]			// CHECK: ret %struct.poly16x8x3_t [[TMP16]]
	poly16x8x3_t test_vld3q_lane_p16(poly16_t *a, poly16x8x3_t b) {			poly16x8x3_t test_vld3q_lane_p16(poly16_t *a, poly16x8x3_t b) {
	return vld3q_lane_p16(a, b, 7);			return vld3q_lane_p16(a, b, 7);
	}			}

	// CHECK-LABEL: define %struct.poly64x2x3_t @test_vld3q_lane_p64(i64* %a, [3 x <2 x i64>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.poly64x2x3_t @test_vld3q_lane_p64(i64* %a, [3 x <2 x i64>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.poly64x2x3_t, align 16			// CHECK: [[RETVAL:%.*]] = alloca %struct.poly64x2x3_t, align 16
	// CHECK: [[B:%.*]] = alloca %struct.poly64x2x3_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.poly64x2x3_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.poly64x2x3_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.poly64x2x3_t, align 16
	// CHECK: [[__RET:%.*]] = alloca %struct.poly64x2x3_t, align 16			// CHECK: [[__RET:%.*]] = alloca %struct.poly64x2x3_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly64x2x3_t, %struct.poly64x2x3_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly64x2x3_t, %struct.poly64x2x3_t* [[B]], i32 0, i32 0
	// CHECK: store [3 x <2 x i64>] [[B]].coerce, [3 x <2 x i64>]* [[COERCE_DIVE]], align 16			// CHECK: store [3 x <2 x i64>] [[B]].coerce, [3 x <2 x i64>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.poly64x2x3_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.poly64x2x3_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.poly64x2x3_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.poly64x2x3_t* [[B]] to i8*
	Show All 22 Lines
	// CHECK: [[TMP15:%.*]] = bitcast %struct.poly64x2x3_t* [[__RET]] to i8*			// CHECK: [[TMP15:%.*]] = bitcast %struct.poly64x2x3_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP14]], i8* align 16 [[TMP15]], i64 48, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP14]], i8* align 16 [[TMP15]], i64 48, i1 false)
	// CHECK: [[TMP16:%.*]] = load %struct.poly64x2x3_t, %struct.poly64x2x3_t* [[RETVAL]], align 16			// CHECK: [[TMP16:%.*]] = load %struct.poly64x2x3_t, %struct.poly64x2x3_t* [[RETVAL]], align 16
	// CHECK: ret %struct.poly64x2x3_t [[TMP16]]			// CHECK: ret %struct.poly64x2x3_t [[TMP16]]
	poly64x2x3_t test_vld3q_lane_p64(poly64_t *a, poly64x2x3_t b) {			poly64x2x3_t test_vld3q_lane_p64(poly64_t *a, poly64x2x3_t b) {
	return vld3q_lane_p64(a, b, 1);			return vld3q_lane_p64(a, b, 1);
	}			}

	// CHECK-LABEL: define %struct.uint8x8x3_t @test_vld3_lane_u8(i8* %a, [3 x <8 x i8>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.uint8x8x3_t @test_vld3_lane_u8(i8* %a, [3 x <8 x i8>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.uint8x8x3_t, align 8			// CHECK: [[RETVAL:%.*]] = alloca %struct.uint8x8x3_t, align 8
	// CHECK: [[B:%.*]] = alloca %struct.uint8x8x3_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.uint8x8x3_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.uint8x8x3_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.uint8x8x3_t, align 8
	// CHECK: [[__RET:%.*]] = alloca %struct.uint8x8x3_t, align 8			// CHECK: [[__RET:%.*]] = alloca %struct.uint8x8x3_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint8x8x3_t, %struct.uint8x8x3_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint8x8x3_t, %struct.uint8x8x3_t* [[B]], i32 0, i32 0
	// CHECK: store [3 x <8 x i8>] [[B]].coerce, [3 x <8 x i8>]* [[COERCE_DIVE]], align 8			// CHECK: store [3 x <8 x i8>] [[B]].coerce, [3 x <8 x i8>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.uint8x8x3_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.uint8x8x3_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.uint8x8x3_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.uint8x8x3_t* [[B]] to i8*
	Show All 15 Lines
	// CHECK: [[TMP8:%.*]] = bitcast %struct.uint8x8x3_t* [[__RET]] to i8*			// CHECK: [[TMP8:%.*]] = bitcast %struct.uint8x8x3_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP7]], i8* align 8 [[TMP8]], i64 24, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP7]], i8* align 8 [[TMP8]], i64 24, i1 false)
	// CHECK: [[TMP9:%.*]] = load %struct.uint8x8x3_t, %struct.uint8x8x3_t* [[RETVAL]], align 8			// CHECK: [[TMP9:%.*]] = load %struct.uint8x8x3_t, %struct.uint8x8x3_t* [[RETVAL]], align 8
	// CHECK: ret %struct.uint8x8x3_t [[TMP9]]			// CHECK: ret %struct.uint8x8x3_t [[TMP9]]
	uint8x8x3_t test_vld3_lane_u8(uint8_t *a, uint8x8x3_t b) {			uint8x8x3_t test_vld3_lane_u8(uint8_t *a, uint8x8x3_t b) {
	return vld3_lane_u8(a, b, 7);			return vld3_lane_u8(a, b, 7);
	}			}

	// CHECK-LABEL: define %struct.uint16x4x3_t @test_vld3_lane_u16(i16* %a, [3 x <4 x i16>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.uint16x4x3_t @test_vld3_lane_u16(i16* %a, [3 x <4 x i16>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.uint16x4x3_t, align 8			// CHECK: [[RETVAL:%.*]] = alloca %struct.uint16x4x3_t, align 8
	// CHECK: [[B:%.*]] = alloca %struct.uint16x4x3_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.uint16x4x3_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.uint16x4x3_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.uint16x4x3_t, align 8
	// CHECK: [[__RET:%.*]] = alloca %struct.uint16x4x3_t, align 8			// CHECK: [[__RET:%.*]] = alloca %struct.uint16x4x3_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint16x4x3_t, %struct.uint16x4x3_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint16x4x3_t, %struct.uint16x4x3_t* [[B]], i32 0, i32 0
	// CHECK: store [3 x <4 x i16>] [[B]].coerce, [3 x <4 x i16>]* [[COERCE_DIVE]], align 8			// CHECK: store [3 x <4 x i16>] [[B]].coerce, [3 x <4 x i16>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.uint16x4x3_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.uint16x4x3_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.uint16x4x3_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.uint16x4x3_t* [[B]] to i8*
	Show All 22 Lines
	// CHECK: [[TMP15:%.*]] = bitcast %struct.uint16x4x3_t* [[__RET]] to i8*			// CHECK: [[TMP15:%.*]] = bitcast %struct.uint16x4x3_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP14]], i8* align 8 [[TMP15]], i64 24, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP14]], i8* align 8 [[TMP15]], i64 24, i1 false)
	// CHECK: [[TMP16:%.*]] = load %struct.uint16x4x3_t, %struct.uint16x4x3_t* [[RETVAL]], align 8			// CHECK: [[TMP16:%.*]] = load %struct.uint16x4x3_t, %struct.uint16x4x3_t* [[RETVAL]], align 8
	// CHECK: ret %struct.uint16x4x3_t [[TMP16]]			// CHECK: ret %struct.uint16x4x3_t [[TMP16]]
	uint16x4x3_t test_vld3_lane_u16(uint16_t *a, uint16x4x3_t b) {			uint16x4x3_t test_vld3_lane_u16(uint16_t *a, uint16x4x3_t b) {
	return vld3_lane_u16(a, b, 3);			return vld3_lane_u16(a, b, 3);
	}			}

	// CHECK-LABEL: define %struct.uint32x2x3_t @test_vld3_lane_u32(i32* %a, [3 x <2 x i32>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.uint32x2x3_t @test_vld3_lane_u32(i32* %a, [3 x <2 x i32>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.uint32x2x3_t, align 8			// CHECK: [[RETVAL:%.*]] = alloca %struct.uint32x2x3_t, align 8
	// CHECK: [[B:%.*]] = alloca %struct.uint32x2x3_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.uint32x2x3_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.uint32x2x3_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.uint32x2x3_t, align 8
	// CHECK: [[__RET:%.*]] = alloca %struct.uint32x2x3_t, align 8			// CHECK: [[__RET:%.*]] = alloca %struct.uint32x2x3_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint32x2x3_t, %struct.uint32x2x3_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint32x2x3_t, %struct.uint32x2x3_t* [[B]], i32 0, i32 0
	// CHECK: store [3 x <2 x i32>] [[B]].coerce, [3 x <2 x i32>]* [[COERCE_DIVE]], align 8			// CHECK: store [3 x <2 x i32>] [[B]].coerce, [3 x <2 x i32>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.uint32x2x3_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.uint32x2x3_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.uint32x2x3_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.uint32x2x3_t* [[B]] to i8*
	Show All 22 Lines
	// CHECK: [[TMP15:%.*]] = bitcast %struct.uint32x2x3_t* [[__RET]] to i8*			// CHECK: [[TMP15:%.*]] = bitcast %struct.uint32x2x3_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP14]], i8* align 8 [[TMP15]], i64 24, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP14]], i8* align 8 [[TMP15]], i64 24, i1 false)
	// CHECK: [[TMP16:%.*]] = load %struct.uint32x2x3_t, %struct.uint32x2x3_t* [[RETVAL]], align 8			// CHECK: [[TMP16:%.*]] = load %struct.uint32x2x3_t, %struct.uint32x2x3_t* [[RETVAL]], align 8
	// CHECK: ret %struct.uint32x2x3_t [[TMP16]]			// CHECK: ret %struct.uint32x2x3_t [[TMP16]]
	uint32x2x3_t test_vld3_lane_u32(uint32_t *a, uint32x2x3_t b) {			uint32x2x3_t test_vld3_lane_u32(uint32_t *a, uint32x2x3_t b) {
	return vld3_lane_u32(a, b, 1);			return vld3_lane_u32(a, b, 1);
	}			}

	// CHECK-LABEL: define %struct.uint64x1x3_t @test_vld3_lane_u64(i64* %a, [3 x <1 x i64>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.uint64x1x3_t @test_vld3_lane_u64(i64* %a, [3 x <1 x i64>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.uint64x1x3_t, align 8			// CHECK: [[RETVAL:%.*]] = alloca %struct.uint64x1x3_t, align 8
	// CHECK: [[B:%.*]] = alloca %struct.uint64x1x3_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.uint64x1x3_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.uint64x1x3_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.uint64x1x3_t, align 8
	// CHECK: [[__RET:%.*]] = alloca %struct.uint64x1x3_t, align 8			// CHECK: [[__RET:%.*]] = alloca %struct.uint64x1x3_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint64x1x3_t, %struct.uint64x1x3_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint64x1x3_t, %struct.uint64x1x3_t* [[B]], i32 0, i32 0
	// CHECK: store [3 x <1 x i64>] [[B]].coerce, [3 x <1 x i64>]* [[COERCE_DIVE]], align 8			// CHECK: store [3 x <1 x i64>] [[B]].coerce, [3 x <1 x i64>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.uint64x1x3_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.uint64x1x3_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.uint64x1x3_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.uint64x1x3_t* [[B]] to i8*
	Show All 22 Lines
	// CHECK: [[TMP15:%.*]] = bitcast %struct.uint64x1x3_t* [[__RET]] to i8*			// CHECK: [[TMP15:%.*]] = bitcast %struct.uint64x1x3_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP14]], i8* align 8 [[TMP15]], i64 24, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP14]], i8* align 8 [[TMP15]], i64 24, i1 false)
	// CHECK: [[TMP16:%.*]] = load %struct.uint64x1x3_t, %struct.uint64x1x3_t* [[RETVAL]], align 8			// CHECK: [[TMP16:%.*]] = load %struct.uint64x1x3_t, %struct.uint64x1x3_t* [[RETVAL]], align 8
	// CHECK: ret %struct.uint64x1x3_t [[TMP16]]			// CHECK: ret %struct.uint64x1x3_t [[TMP16]]
	uint64x1x3_t test_vld3_lane_u64(uint64_t *a, uint64x1x3_t b) {			uint64x1x3_t test_vld3_lane_u64(uint64_t *a, uint64x1x3_t b) {
	return vld3_lane_u64(a, b, 0);			return vld3_lane_u64(a, b, 0);
	}			}

	// CHECK-LABEL: define %struct.int8x8x3_t @test_vld3_lane_s8(i8* %a, [3 x <8 x i8>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.int8x8x3_t @test_vld3_lane_s8(i8* %a, [3 x <8 x i8>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.int8x8x3_t, align 8			// CHECK: [[RETVAL:%.*]] = alloca %struct.int8x8x3_t, align 8
	// CHECK: [[B:%.*]] = alloca %struct.int8x8x3_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.int8x8x3_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.int8x8x3_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.int8x8x3_t, align 8
	// CHECK: [[__RET:%.*]] = alloca %struct.int8x8x3_t, align 8			// CHECK: [[__RET:%.*]] = alloca %struct.int8x8x3_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x8x3_t, %struct.int8x8x3_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x8x3_t, %struct.int8x8x3_t* [[B]], i32 0, i32 0
	// CHECK: store [3 x <8 x i8>] [[B]].coerce, [3 x <8 x i8>]* [[COERCE_DIVE]], align 8			// CHECK: store [3 x <8 x i8>] [[B]].coerce, [3 x <8 x i8>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.]] = bitcast %struct.int8x8x3_t [[__S1]] to i8*			// CHECK: [[TMP0:%.]] = bitcast %struct.int8x8x3_t [[__S1]] to i8*
	// CHECK: [[TMP1:%.]] = bitcast %struct.int8x8x3_t [[B]] to i8*			// CHECK: [[TMP1:%.]] = bitcast %struct.int8x8x3_t [[B]] to i8*
	// CHECK: [[TMP8:%.*]] = bitcast %struct.int8x8x3_t* [[__RET]] to i8*			// CHECK: [[TMP8:%.*]] = bitcast %struct.int8x8x3_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP7]], i8* align 8 [[TMP8]], i64 24, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP7]], i8* align 8 [[TMP8]], i64 24, i1 false)
	// CHECK: [[TMP9:%.*]] = load %struct.int8x8x3_t, %struct.int8x8x3_t* [[RETVAL]], align 8			// CHECK: [[TMP9:%.*]] = load %struct.int8x8x3_t, %struct.int8x8x3_t* [[RETVAL]], align 8
	// CHECK: ret %struct.int8x8x3_t [[TMP9]]			// CHECK: ret %struct.int8x8x3_t [[TMP9]]
	int8x8x3_t test_vld3_lane_s8(int8_t *a, int8x8x3_t b) {			int8x8x3_t test_vld3_lane_s8(int8_t *a, int8x8x3_t b) {
	return vld3_lane_s8(a, b, 7);			return vld3_lane_s8(a, b, 7);
	}			}

	// CHECK-LABEL: define %struct.int16x4x3_t @test_vld3_lane_s16(i16* %a, [3 x <4 x i16>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.int16x4x3_t @test_vld3_lane_s16(i16* %a, [3 x <4 x i16>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.int16x4x3_t, align 8			// CHECK: [[RETVAL:%.*]] = alloca %struct.int16x4x3_t, align 8
	// CHECK: [[B:%.*]] = alloca %struct.int16x4x3_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.int16x4x3_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.int16x4x3_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.int16x4x3_t, align 8
	// CHECK: [[__RET:%.*]] = alloca %struct.int16x4x3_t, align 8			// CHECK: [[__RET:%.*]] = alloca %struct.int16x4x3_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int16x4x3_t, %struct.int16x4x3_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int16x4x3_t, %struct.int16x4x3_t* [[B]], i32 0, i32 0
	// CHECK: store [3 x <4 x i16>] [[B]].coerce, [3 x <4 x i16>]* [[COERCE_DIVE]], align 8			// CHECK: store [3 x <4 x i16>] [[B]].coerce, [3 x <4 x i16>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.int16x4x3_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.int16x4x3_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.int16x4x3_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.int16x4x3_t* [[B]] to i8*
	// CHECK: [[TMP15:%.*]] = bitcast %struct.int16x4x3_t* [[__RET]] to i8*			// CHECK: [[TMP15:%.*]] = bitcast %struct.int16x4x3_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP14]], i8* align 8 [[TMP15]], i64 24, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP14]], i8* align 8 [[TMP15]], i64 24, i1 false)
	// CHECK: [[TMP16:%.*]] = load %struct.int16x4x3_t, %struct.int16x4x3_t* [[RETVAL]], align 8			// CHECK: [[TMP16:%.*]] = load %struct.int16x4x3_t, %struct.int16x4x3_t* [[RETVAL]], align 8
	// CHECK: ret %struct.int16x4x3_t [[TMP16]]			// CHECK: ret %struct.int16x4x3_t [[TMP16]]
	int16x4x3_t test_vld3_lane_s16(int16_t *a, int16x4x3_t b) {			int16x4x3_t test_vld3_lane_s16(int16_t *a, int16x4x3_t b) {
	return vld3_lane_s16(a, b, 3);			return vld3_lane_s16(a, b, 3);
	}			}

	// CHECK-LABEL: define %struct.int32x2x3_t @test_vld3_lane_s32(i32* %a, [3 x <2 x i32>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.int32x2x3_t @test_vld3_lane_s32(i32* %a, [3 x <2 x i32>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.int32x2x3_t, align 8			// CHECK: [[RETVAL:%.*]] = alloca %struct.int32x2x3_t, align 8
	// CHECK: [[B:%.*]] = alloca %struct.int32x2x3_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.int32x2x3_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.int32x2x3_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.int32x2x3_t, align 8
	// CHECK: [[__RET:%.*]] = alloca %struct.int32x2x3_t, align 8			// CHECK: [[__RET:%.*]] = alloca %struct.int32x2x3_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int32x2x3_t, %struct.int32x2x3_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int32x2x3_t, %struct.int32x2x3_t* [[B]], i32 0, i32 0
	// CHECK: store [3 x <2 x i32>] [[B]].coerce, [3 x <2 x i32>]* [[COERCE_DIVE]], align 8			// CHECK: store [3 x <2 x i32>] [[B]].coerce, [3 x <2 x i32>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.int32x2x3_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.int32x2x3_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.int32x2x3_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.int32x2x3_t* [[B]] to i8*
	// CHECK: [[TMP15:%.*]] = bitcast %struct.int32x2x3_t* [[__RET]] to i8*			// CHECK: [[TMP15:%.*]] = bitcast %struct.int32x2x3_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP14]], i8* align 8 [[TMP15]], i64 24, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP14]], i8* align 8 [[TMP15]], i64 24, i1 false)
	// CHECK: [[TMP16:%.*]] = load %struct.int32x2x3_t, %struct.int32x2x3_t* [[RETVAL]], align 8			// CHECK: [[TMP16:%.*]] = load %struct.int32x2x3_t, %struct.int32x2x3_t* [[RETVAL]], align 8
	// CHECK: ret %struct.int32x2x3_t [[TMP16]]			// CHECK: ret %struct.int32x2x3_t [[TMP16]]
	int32x2x3_t test_vld3_lane_s32(int32_t *a, int32x2x3_t b) {			int32x2x3_t test_vld3_lane_s32(int32_t *a, int32x2x3_t b) {
	return vld3_lane_s32(a, b, 1);			return vld3_lane_s32(a, b, 1);
	}			}

	// CHECK-LABEL: define %struct.int64x1x3_t @test_vld3_lane_s64(i64* %a, [3 x <1 x i64>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.int64x1x3_t @test_vld3_lane_s64(i64* %a, [3 x <1 x i64>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.int64x1x3_t, align 8			// CHECK: [[RETVAL:%.*]] = alloca %struct.int64x1x3_t, align 8
	// CHECK: [[B:%.*]] = alloca %struct.int64x1x3_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.int64x1x3_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.int64x1x3_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.int64x1x3_t, align 8
	// CHECK: [[__RET:%.*]] = alloca %struct.int64x1x3_t, align 8			// CHECK: [[__RET:%.*]] = alloca %struct.int64x1x3_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int64x1x3_t, %struct.int64x1x3_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int64x1x3_t, %struct.int64x1x3_t* [[B]], i32 0, i32 0
	// CHECK: store [3 x <1 x i64>] [[B]].coerce, [3 x <1 x i64>]* [[COERCE_DIVE]], align 8			// CHECK: store [3 x <1 x i64>] [[B]].coerce, [3 x <1 x i64>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.int64x1x3_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.int64x1x3_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.int64x1x3_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.int64x1x3_t* [[B]] to i8*
	// CHECK: [[TMP15:%.*]] = bitcast %struct.int64x1x3_t* [[__RET]] to i8*			// CHECK: [[TMP15:%.*]] = bitcast %struct.int64x1x3_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP14]], i8* align 8 [[TMP15]], i64 24, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP14]], i8* align 8 [[TMP15]], i64 24, i1 false)
	// CHECK: [[TMP16:%.*]] = load %struct.int64x1x3_t, %struct.int64x1x3_t* [[RETVAL]], align 8			// CHECK: [[TMP16:%.*]] = load %struct.int64x1x3_t, %struct.int64x1x3_t* [[RETVAL]], align 8
	// CHECK: ret %struct.int64x1x3_t [[TMP16]]			// CHECK: ret %struct.int64x1x3_t [[TMP16]]
	int64x1x3_t test_vld3_lane_s64(int64_t *a, int64x1x3_t b) {			int64x1x3_t test_vld3_lane_s64(int64_t *a, int64x1x3_t b) {
	return vld3_lane_s64(a, b, 0);			return vld3_lane_s64(a, b, 0);
	}			}

	// CHECK-LABEL: define %struct.float16x4x3_t @test_vld3_lane_f16(half* %a, [3 x <4 x half>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.float16x4x3_t @test_vld3_lane_f16(half* %a, [3 x <4 x half>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.float16x4x3_t, align 8			// CHECK: [[RETVAL:%.*]] = alloca %struct.float16x4x3_t, align 8
	// CHECK: [[B:%.*]] = alloca %struct.float16x4x3_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.float16x4x3_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.float16x4x3_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.float16x4x3_t, align 8
	// CHECK: [[__RET:%.*]] = alloca %struct.float16x4x3_t, align 8			// CHECK: [[__RET:%.*]] = alloca %struct.float16x4x3_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float16x4x3_t, %struct.float16x4x3_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float16x4x3_t, %struct.float16x4x3_t* [[B]], i32 0, i32 0
	// CHECK: store [3 x <4 x half>] [[B]].coerce, [3 x <4 x half>]* [[COERCE_DIVE]], align 8			// CHECK: store [3 x <4 x half>] [[B]].coerce, [3 x <4 x half>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x4x3_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x4x3_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.float16x4x3_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.float16x4x3_t* [[B]] to i8*
	// CHECK: [[TMP15:%.*]] = bitcast %struct.float16x4x3_t* [[__RET]] to i8*			// CHECK: [[TMP15:%.*]] = bitcast %struct.float16x4x3_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP14]], i8* align 8 [[TMP15]], i64 24, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP14]], i8* align 8 [[TMP15]], i64 24, i1 false)
	// CHECK: [[TMP16:%.*]] = load %struct.float16x4x3_t, %struct.float16x4x3_t* [[RETVAL]], align 8			// CHECK: [[TMP16:%.*]] = load %struct.float16x4x3_t, %struct.float16x4x3_t* [[RETVAL]], align 8
	// CHECK: ret %struct.float16x4x3_t [[TMP16]]			// CHECK: ret %struct.float16x4x3_t [[TMP16]]
	float16x4x3_t test_vld3_lane_f16(float16_t *a, float16x4x3_t b) {			float16x4x3_t test_vld3_lane_f16(float16_t *a, float16x4x3_t b) {
	return vld3_lane_f16(a, b, 3);			return vld3_lane_f16(a, b, 3);
	}			}

	// CHECK-LABEL: define %struct.float32x2x3_t @test_vld3_lane_f32(float* %a, [3 x <2 x float>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.float32x2x3_t @test_vld3_lane_f32(float* %a, [3 x <2 x float>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.float32x2x3_t, align 8			// CHECK: [[RETVAL:%.*]] = alloca %struct.float32x2x3_t, align 8
	// CHECK: [[B:%.*]] = alloca %struct.float32x2x3_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.float32x2x3_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.float32x2x3_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.float32x2x3_t, align 8
	// CHECK: [[__RET:%.*]] = alloca %struct.float32x2x3_t, align 8			// CHECK: [[__RET:%.*]] = alloca %struct.float32x2x3_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float32x2x3_t, %struct.float32x2x3_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float32x2x3_t, %struct.float32x2x3_t* [[B]], i32 0, i32 0
	// CHECK: store [3 x <2 x float>] [[B]].coerce, [3 x <2 x float>]* [[COERCE_DIVE]], align 8			// CHECK: store [3 x <2 x float>] [[B]].coerce, [3 x <2 x float>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.float32x2x3_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.float32x2x3_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.float32x2x3_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.float32x2x3_t* [[B]] to i8*
	// CHECK: [[TMP15:%.*]] = bitcast %struct.float32x2x3_t* [[__RET]] to i8*			// CHECK: [[TMP15:%.*]] = bitcast %struct.float32x2x3_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP14]], i8* align 8 [[TMP15]], i64 24, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP14]], i8* align 8 [[TMP15]], i64 24, i1 false)
	// CHECK: [[TMP16:%.*]] = load %struct.float32x2x3_t, %struct.float32x2x3_t* [[RETVAL]], align 8			// CHECK: [[TMP16:%.*]] = load %struct.float32x2x3_t, %struct.float32x2x3_t* [[RETVAL]], align 8
	// CHECK: ret %struct.float32x2x3_t [[TMP16]]			// CHECK: ret %struct.float32x2x3_t [[TMP16]]
	float32x2x3_t test_vld3_lane_f32(float32_t *a, float32x2x3_t b) {			float32x2x3_t test_vld3_lane_f32(float32_t *a, float32x2x3_t b) {
	return vld3_lane_f32(a, b, 1);			return vld3_lane_f32(a, b, 1);
	}			}

	// CHECK-LABEL: define %struct.float64x1x3_t @test_vld3_lane_f64(double* %a, [3 x <1 x double>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.float64x1x3_t @test_vld3_lane_f64(double* %a, [3 x <1 x double>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.float64x1x3_t, align 8			// CHECK: [[RETVAL:%.*]] = alloca %struct.float64x1x3_t, align 8
	// CHECK: [[B:%.*]] = alloca %struct.float64x1x3_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.float64x1x3_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.float64x1x3_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.float64x1x3_t, align 8
	// CHECK: [[__RET:%.*]] = alloca %struct.float64x1x3_t, align 8			// CHECK: [[__RET:%.*]] = alloca %struct.float64x1x3_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float64x1x3_t, %struct.float64x1x3_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float64x1x3_t, %struct.float64x1x3_t* [[B]], i32 0, i32 0
	// CHECK: store [3 x <1 x double>] [[B]].coerce, [3 x <1 x double>]* [[COERCE_DIVE]], align 8			// CHECK: store [3 x <1 x double>] [[B]].coerce, [3 x <1 x double>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.float64x1x3_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.float64x1x3_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.float64x1x3_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.float64x1x3_t* [[B]] to i8*
	// CHECK: [[TMP15:%.*]] = bitcast %struct.float64x1x3_t* [[__RET]] to i8*			// CHECK: [[TMP15:%.*]] = bitcast %struct.float64x1x3_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP14]], i8* align 8 [[TMP15]], i64 24, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP14]], i8* align 8 [[TMP15]], i64 24, i1 false)
	// CHECK: [[TMP16:%.*]] = load %struct.float64x1x3_t, %struct.float64x1x3_t* [[RETVAL]], align 8			// CHECK: [[TMP16:%.*]] = load %struct.float64x1x3_t, %struct.float64x1x3_t* [[RETVAL]], align 8
	// CHECK: ret %struct.float64x1x3_t [[TMP16]]			// CHECK: ret %struct.float64x1x3_t [[TMP16]]
	float64x1x3_t test_vld3_lane_f64(float64_t *a, float64x1x3_t b) {			float64x1x3_t test_vld3_lane_f64(float64_t *a, float64x1x3_t b) {
	return vld3_lane_f64(a, b, 0);			return vld3_lane_f64(a, b, 0);
	}			}

	// CHECK-LABEL: define %struct.poly8x8x3_t @test_vld3_lane_p8(i8* %a, [3 x <8 x i8>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.poly8x8x3_t @test_vld3_lane_p8(i8* %a, [3 x <8 x i8>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.poly8x8x3_t, align 8			// CHECK: [[RETVAL:%.*]] = alloca %struct.poly8x8x3_t, align 8
	// CHECK: [[B:%.*]] = alloca %struct.poly8x8x3_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.poly8x8x3_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.poly8x8x3_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.poly8x8x3_t, align 8
	// CHECK: [[__RET:%.*]] = alloca %struct.poly8x8x3_t, align 8			// CHECK: [[__RET:%.*]] = alloca %struct.poly8x8x3_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly8x8x3_t, %struct.poly8x8x3_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly8x8x3_t, %struct.poly8x8x3_t* [[B]], i32 0, i32 0
	// CHECK: store [3 x <8 x i8>] [[B]].coerce, [3 x <8 x i8>]* [[COERCE_DIVE]], align 8			// CHECK: store [3 x <8 x i8>] [[B]].coerce, [3 x <8 x i8>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.poly8x8x3_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.poly8x8x3_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.poly8x8x3_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.poly8x8x3_t* [[B]] to i8*
	// CHECK: [[TMP8:%.*]] = bitcast %struct.poly8x8x3_t* [[__RET]] to i8*			// CHECK: [[TMP8:%.*]] = bitcast %struct.poly8x8x3_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP7]], i8* align 8 [[TMP8]], i64 24, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP7]], i8* align 8 [[TMP8]], i64 24, i1 false)
	// CHECK: [[TMP9:%.*]] = load %struct.poly8x8x3_t, %struct.poly8x8x3_t* [[RETVAL]], align 8			// CHECK: [[TMP9:%.*]] = load %struct.poly8x8x3_t, %struct.poly8x8x3_t* [[RETVAL]], align 8
	// CHECK: ret %struct.poly8x8x3_t [[TMP9]]			// CHECK: ret %struct.poly8x8x3_t [[TMP9]]
	poly8x8x3_t test_vld3_lane_p8(poly8_t *a, poly8x8x3_t b) {			poly8x8x3_t test_vld3_lane_p8(poly8_t *a, poly8x8x3_t b) {
	return vld3_lane_p8(a, b, 7);			return vld3_lane_p8(a, b, 7);
	}			}

	// CHECK-LABEL: define %struct.poly16x4x3_t @test_vld3_lane_p16(i16* %a, [3 x <4 x i16>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.poly16x4x3_t @test_vld3_lane_p16(i16* %a, [3 x <4 x i16>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.poly16x4x3_t, align 8			// CHECK: [[RETVAL:%.*]] = alloca %struct.poly16x4x3_t, align 8
	// CHECK: [[B:%.*]] = alloca %struct.poly16x4x3_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.poly16x4x3_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.poly16x4x3_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.poly16x4x3_t, align 8
	// CHECK: [[__RET:%.*]] = alloca %struct.poly16x4x3_t, align 8			// CHECK: [[__RET:%.*]] = alloca %struct.poly16x4x3_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly16x4x3_t, %struct.poly16x4x3_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly16x4x3_t, %struct.poly16x4x3_t* [[B]], i32 0, i32 0
	// CHECK: store [3 x <4 x i16>] [[B]].coerce, [3 x <4 x i16>]* [[COERCE_DIVE]], align 8			// CHECK: store [3 x <4 x i16>] [[B]].coerce, [3 x <4 x i16>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.poly16x4x3_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.poly16x4x3_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.poly16x4x3_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.poly16x4x3_t* [[B]] to i8*
	// CHECK: [[TMP15:%.*]] = bitcast %struct.poly16x4x3_t* [[__RET]] to i8*			// CHECK: [[TMP15:%.*]] = bitcast %struct.poly16x4x3_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP14]], i8* align 8 [[TMP15]], i64 24, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP14]], i8* align 8 [[TMP15]], i64 24, i1 false)
	// CHECK: [[TMP16:%.*]] = load %struct.poly16x4x3_t, %struct.poly16x4x3_t* [[RETVAL]], align 8			// CHECK: [[TMP16:%.*]] = load %struct.poly16x4x3_t, %struct.poly16x4x3_t* [[RETVAL]], align 8
	// CHECK: ret %struct.poly16x4x3_t [[TMP16]]			// CHECK: ret %struct.poly16x4x3_t [[TMP16]]
	poly16x4x3_t test_vld3_lane_p16(poly16_t *a, poly16x4x3_t b) {			poly16x4x3_t test_vld3_lane_p16(poly16_t *a, poly16x4x3_t b) {
	return vld3_lane_p16(a, b, 3);			return vld3_lane_p16(a, b, 3);
	}			}

	// CHECK-LABEL: define %struct.poly64x1x3_t @test_vld3_lane_p64(i64* %a, [3 x <1 x i64>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.poly64x1x3_t @test_vld3_lane_p64(i64* %a, [3 x <1 x i64>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.poly64x1x3_t, align 8			// CHECK: [[RETVAL:%.*]] = alloca %struct.poly64x1x3_t, align 8
	// CHECK: [[B:%.*]] = alloca %struct.poly64x1x3_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.poly64x1x3_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.poly64x1x3_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.poly64x1x3_t, align 8
	// CHECK: [[__RET:%.*]] = alloca %struct.poly64x1x3_t, align 8			// CHECK: [[__RET:%.*]] = alloca %struct.poly64x1x3_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly64x1x3_t, %struct.poly64x1x3_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly64x1x3_t, %struct.poly64x1x3_t* [[B]], i32 0, i32 0
	// CHECK: store [3 x <1 x i64>] [[B]].coerce, [3 x <1 x i64>]* [[COERCE_DIVE]], align 8			// CHECK: store [3 x <1 x i64>] [[B]].coerce, [3 x <1 x i64>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.poly64x1x3_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.poly64x1x3_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.poly64x1x3_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.poly64x1x3_t* [[B]] to i8*
	// CHECK: [[TMP15:%.*]] = bitcast %struct.poly64x1x3_t* [[__RET]] to i8*			// CHECK: [[TMP15:%.*]] = bitcast %struct.poly64x1x3_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP14]], i8* align 8 [[TMP15]], i64 24, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP14]], i8* align 8 [[TMP15]], i64 24, i1 false)
	// CHECK: [[TMP16:%.*]] = load %struct.poly64x1x3_t, %struct.poly64x1x3_t* [[RETVAL]], align 8			// CHECK: [[TMP16:%.*]] = load %struct.poly64x1x3_t, %struct.poly64x1x3_t* [[RETVAL]], align 8
	// CHECK: ret %struct.poly64x1x3_t [[TMP16]]			// CHECK: ret %struct.poly64x1x3_t [[TMP16]]
	poly64x1x3_t test_vld3_lane_p64(poly64_t *a, poly64x1x3_t b) {			poly64x1x3_t test_vld3_lane_p64(poly64_t *a, poly64x1x3_t b) {
	return vld3_lane_p64(a, b, 0);			return vld3_lane_p64(a, b, 0);
	}			}

	// CHECK-LABEL: define %struct.uint8x16x4_t @test_vld4q_lane_u8(i8* %a, [4 x <16 x i8>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.uint8x16x4_t @test_vld4q_lane_u8(i8* %a, [4 x <16 x i8>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.uint8x16x4_t, align 16			// CHECK: [[RETVAL:%.*]] = alloca %struct.uint8x16x4_t, align 16
	// CHECK: [[B:%.*]] = alloca %struct.uint8x16x4_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.uint8x16x4_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.uint8x16x4_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.uint8x16x4_t, align 16
	// CHECK: [[__RET:%.*]] = alloca %struct.uint8x16x4_t, align 16			// CHECK: [[__RET:%.*]] = alloca %struct.uint8x16x4_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint8x16x4_t, %struct.uint8x16x4_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint8x16x4_t, %struct.uint8x16x4_t* [[B]], i32 0, i32 0
	// CHECK: store [4 x <16 x i8>] [[B]].coerce, [4 x <16 x i8>]* [[COERCE_DIVE]], align 16			// CHECK: store [4 x <16 x i8>] [[B]].coerce, [4 x <16 x i8>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.uint8x16x4_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.uint8x16x4_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.uint8x16x4_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.uint8x16x4_t* [[B]] to i8*
	// CHECK: [[TMP9:%.*]] = bitcast %struct.uint8x16x4_t* [[__RET]] to i8*			// CHECK: [[TMP9:%.*]] = bitcast %struct.uint8x16x4_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP8]], i8* align 16 [[TMP9]], i64 64, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP8]], i8* align 16 [[TMP9]], i64 64, i1 false)
	// CHECK: [[TMP10:%.*]] = load %struct.uint8x16x4_t, %struct.uint8x16x4_t* [[RETVAL]], align 16			// CHECK: [[TMP10:%.*]] = load %struct.uint8x16x4_t, %struct.uint8x16x4_t* [[RETVAL]], align 16
	// CHECK: ret %struct.uint8x16x4_t [[TMP10]]			// CHECK: ret %struct.uint8x16x4_t [[TMP10]]
	uint8x16x4_t test_vld4q_lane_u8(uint8_t *a, uint8x16x4_t b) {			uint8x16x4_t test_vld4q_lane_u8(uint8_t *a, uint8x16x4_t b) {
	return vld4q_lane_u8(a, b, 15);			return vld4q_lane_u8(a, b, 15);
	}			}

	// CHECK-LABEL: define %struct.uint16x8x4_t @test_vld4q_lane_u16(i16* %a, [4 x <8 x i16>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.uint16x8x4_t @test_vld4q_lane_u16(i16* %a, [4 x <8 x i16>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.uint16x8x4_t, align 16			// CHECK: [[RETVAL:%.*]] = alloca %struct.uint16x8x4_t, align 16
	// CHECK: [[B:%.*]] = alloca %struct.uint16x8x4_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.uint16x8x4_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.uint16x8x4_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.uint16x8x4_t, align 16
	// CHECK: [[__RET:%.*]] = alloca %struct.uint16x8x4_t, align 16			// CHECK: [[__RET:%.*]] = alloca %struct.uint16x8x4_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint16x8x4_t, %struct.uint16x8x4_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint16x8x4_t, %struct.uint16x8x4_t* [[B]], i32 0, i32 0
	// CHECK: store [4 x <8 x i16>] [[B]].coerce, [4 x <8 x i16>]* [[COERCE_DIVE]], align 16			// CHECK: store [4 x <8 x i16>] [[B]].coerce, [4 x <8 x i16>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.uint16x8x4_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.uint16x8x4_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.uint16x8x4_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.uint16x8x4_t* [[B]] to i8*
	// CHECK: [[TMP18:%.*]] = bitcast %struct.uint16x8x4_t* [[__RET]] to i8*			// CHECK: [[TMP18:%.*]] = bitcast %struct.uint16x8x4_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP17]], i8* align 16 [[TMP18]], i64 64, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP17]], i8* align 16 [[TMP18]], i64 64, i1 false)
	// CHECK: [[TMP19:%.*]] = load %struct.uint16x8x4_t, %struct.uint16x8x4_t* [[RETVAL]], align 16			// CHECK: [[TMP19:%.*]] = load %struct.uint16x8x4_t, %struct.uint16x8x4_t* [[RETVAL]], align 16
	// CHECK: ret %struct.uint16x8x4_t [[TMP19]]			// CHECK: ret %struct.uint16x8x4_t [[TMP19]]
	uint16x8x4_t test_vld4q_lane_u16(uint16_t *a, uint16x8x4_t b) {			uint16x8x4_t test_vld4q_lane_u16(uint16_t *a, uint16x8x4_t b) {
	return vld4q_lane_u16(a, b, 7);			return vld4q_lane_u16(a, b, 7);
	}			}

	// CHECK-LABEL: define %struct.uint32x4x4_t @test_vld4q_lane_u32(i32* %a, [4 x <4 x i32>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.uint32x4x4_t @test_vld4q_lane_u32(i32* %a, [4 x <4 x i32>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.uint32x4x4_t, align 16			// CHECK: [[RETVAL:%.*]] = alloca %struct.uint32x4x4_t, align 16
	// CHECK: [[B:%.*]] = alloca %struct.uint32x4x4_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.uint32x4x4_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.uint32x4x4_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.uint32x4x4_t, align 16
	// CHECK: [[__RET:%.*]] = alloca %struct.uint32x4x4_t, align 16			// CHECK: [[__RET:%.*]] = alloca %struct.uint32x4x4_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint32x4x4_t, %struct.uint32x4x4_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint32x4x4_t, %struct.uint32x4x4_t* [[B]], i32 0, i32 0
	// CHECK: store [4 x <4 x i32>] [[B]].coerce, [4 x <4 x i32>]* [[COERCE_DIVE]], align 16			// CHECK: store [4 x <4 x i32>] [[B]].coerce, [4 x <4 x i32>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.uint32x4x4_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.uint32x4x4_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.uint32x4x4_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.uint32x4x4_t* [[B]] to i8*
	// CHECK: [[TMP18:%.*]] = bitcast %struct.uint32x4x4_t* [[__RET]] to i8*			// CHECK: [[TMP18:%.*]] = bitcast %struct.uint32x4x4_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP17]], i8* align 16 [[TMP18]], i64 64, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP17]], i8* align 16 [[TMP18]], i64 64, i1 false)
	// CHECK: [[TMP19:%.*]] = load %struct.uint32x4x4_t, %struct.uint32x4x4_t* [[RETVAL]], align 16			// CHECK: [[TMP19:%.*]] = load %struct.uint32x4x4_t, %struct.uint32x4x4_t* [[RETVAL]], align 16
	// CHECK: ret %struct.uint32x4x4_t [[TMP19]]			// CHECK: ret %struct.uint32x4x4_t [[TMP19]]
	uint32x4x4_t test_vld4q_lane_u32(uint32_t *a, uint32x4x4_t b) {			uint32x4x4_t test_vld4q_lane_u32(uint32_t *a, uint32x4x4_t b) {
	return vld4q_lane_u32(a, b, 3);			return vld4q_lane_u32(a, b, 3);
	}			}

	// CHECK-LABEL: define %struct.uint64x2x4_t @test_vld4q_lane_u64(i64* %a, [4 x <2 x i64>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.uint64x2x4_t @test_vld4q_lane_u64(i64* %a, [4 x <2 x i64>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.uint64x2x4_t, align 16			// CHECK: [[RETVAL:%.*]] = alloca %struct.uint64x2x4_t, align 16
	// CHECK: [[B:%.*]] = alloca %struct.uint64x2x4_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.uint64x2x4_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.uint64x2x4_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.uint64x2x4_t, align 16
	// CHECK: [[__RET:%.*]] = alloca %struct.uint64x2x4_t, align 16			// CHECK: [[__RET:%.*]] = alloca %struct.uint64x2x4_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint64x2x4_t, %struct.uint64x2x4_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint64x2x4_t, %struct.uint64x2x4_t* [[B]], i32 0, i32 0
	// CHECK: store [4 x <2 x i64>] [[B]].coerce, [4 x <2 x i64>]* [[COERCE_DIVE]], align 16			// CHECK: store [4 x <2 x i64>] [[B]].coerce, [4 x <2 x i64>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.uint64x2x4_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.uint64x2x4_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.uint64x2x4_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.uint64x2x4_t* [[B]] to i8*
	// CHECK: [[TMP18:%.*]] = bitcast %struct.uint64x2x4_t* [[__RET]] to i8*			// CHECK: [[TMP18:%.*]] = bitcast %struct.uint64x2x4_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP17]], i8* align 16 [[TMP18]], i64 64, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP17]], i8* align 16 [[TMP18]], i64 64, i1 false)
	// CHECK: [[TMP19:%.*]] = load %struct.uint64x2x4_t, %struct.uint64x2x4_t* [[RETVAL]], align 16			// CHECK: [[TMP19:%.*]] = load %struct.uint64x2x4_t, %struct.uint64x2x4_t* [[RETVAL]], align 16
	// CHECK: ret %struct.uint64x2x4_t [[TMP19]]			// CHECK: ret %struct.uint64x2x4_t [[TMP19]]
	uint64x2x4_t test_vld4q_lane_u64(uint64_t *a, uint64x2x4_t b) {			uint64x2x4_t test_vld4q_lane_u64(uint64_t *a, uint64x2x4_t b) {
	return vld4q_lane_u64(a, b, 1);			return vld4q_lane_u64(a, b, 1);
	}			}

	// CHECK-LABEL: define %struct.int8x16x4_t @test_vld4q_lane_s8(i8* %a, [4 x <16 x i8>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.int8x16x4_t @test_vld4q_lane_s8(i8* %a, [4 x <16 x i8>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.int8x16x4_t, align 16			// CHECK: [[RETVAL:%.*]] = alloca %struct.int8x16x4_t, align 16
	// CHECK: [[B:%.*]] = alloca %struct.int8x16x4_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.int8x16x4_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.int8x16x4_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.int8x16x4_t, align 16
	// CHECK: [[__RET:%.*]] = alloca %struct.int8x16x4_t, align 16			// CHECK: [[__RET:%.*]] = alloca %struct.int8x16x4_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x16x4_t, %struct.int8x16x4_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x16x4_t, %struct.int8x16x4_t* [[B]], i32 0, i32 0
	// CHECK: store [4 x <16 x i8>] [[B]].coerce, [4 x <16 x i8>]* [[COERCE_DIVE]], align 16			// CHECK: store [4 x <16 x i8>] [[B]].coerce, [4 x <16 x i8>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.int8x16x4_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.int8x16x4_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.int8x16x4_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.int8x16x4_t* [[B]] to i8*
	Show All 18 Lines
	// CHECK: [[TMP9:%.*]] = bitcast %struct.int8x16x4_t* [[__RET]] to i8*			// CHECK: [[TMP9:%.*]] = bitcast %struct.int8x16x4_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP8]], i8* align 16 [[TMP9]], i64 64, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP8]], i8* align 16 [[TMP9]], i64 64, i1 false)
	// CHECK: [[TMP10:%.*]] = load %struct.int8x16x4_t, %struct.int8x16x4_t* [[RETVAL]], align 16			// CHECK: [[TMP10:%.*]] = load %struct.int8x16x4_t, %struct.int8x16x4_t* [[RETVAL]], align 16
	// CHECK: ret %struct.int8x16x4_t [[TMP10]]			// CHECK: ret %struct.int8x16x4_t [[TMP10]]
	int8x16x4_t test_vld4q_lane_s8(int8_t *a, int8x16x4_t b) {			int8x16x4_t test_vld4q_lane_s8(int8_t *a, int8x16x4_t b) {
	return vld4q_lane_s8(a, b, 15);			return vld4q_lane_s8(a, b, 15);
	}			}

	// CHECK-LABEL: define %struct.int16x8x4_t @test_vld4q_lane_s16(i16* %a, [4 x <8 x i16>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.int16x8x4_t @test_vld4q_lane_s16(i16* %a, [4 x <8 x i16>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.int16x8x4_t, align 16			// CHECK: [[RETVAL:%.*]] = alloca %struct.int16x8x4_t, align 16
	// CHECK: [[B:%.*]] = alloca %struct.int16x8x4_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.int16x8x4_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.int16x8x4_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.int16x8x4_t, align 16
	// CHECK: [[__RET:%.*]] = alloca %struct.int16x8x4_t, align 16			// CHECK: [[__RET:%.*]] = alloca %struct.int16x8x4_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int16x8x4_t, %struct.int16x8x4_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int16x8x4_t, %struct.int16x8x4_t* [[B]], i32 0, i32 0
	// CHECK: store [4 x <8 x i16>] [[B]].coerce, [4 x <8 x i16>]* [[COERCE_DIVE]], align 16			// CHECK: store [4 x <8 x i16>] [[B]].coerce, [4 x <8 x i16>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.int16x8x4_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.int16x8x4_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.int16x8x4_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.int16x8x4_t* [[B]] to i8*
	Show All 27 Lines
	// CHECK: [[TMP18:%.*]] = bitcast %struct.int16x8x4_t* [[__RET]] to i8*			// CHECK: [[TMP18:%.*]] = bitcast %struct.int16x8x4_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP17]], i8* align 16 [[TMP18]], i64 64, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP17]], i8* align 16 [[TMP18]], i64 64, i1 false)
	// CHECK: [[TMP19:%.*]] = load %struct.int16x8x4_t, %struct.int16x8x4_t* [[RETVAL]], align 16			// CHECK: [[TMP19:%.*]] = load %struct.int16x8x4_t, %struct.int16x8x4_t* [[RETVAL]], align 16
	// CHECK: ret %struct.int16x8x4_t [[TMP19]]			// CHECK: ret %struct.int16x8x4_t [[TMP19]]
	int16x8x4_t test_vld4q_lane_s16(int16_t *a, int16x8x4_t b) {			int16x8x4_t test_vld4q_lane_s16(int16_t *a, int16x8x4_t b) {
	return vld4q_lane_s16(a, b, 7);			return vld4q_lane_s16(a, b, 7);
	}			}

	// CHECK-LABEL: define %struct.int32x4x4_t @test_vld4q_lane_s32(i32* %a, [4 x <4 x i32>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.int32x4x4_t @test_vld4q_lane_s32(i32* %a, [4 x <4 x i32>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.int32x4x4_t, align 16			// CHECK: [[RETVAL:%.*]] = alloca %struct.int32x4x4_t, align 16
	// CHECK: [[B:%.*]] = alloca %struct.int32x4x4_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.int32x4x4_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.int32x4x4_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.int32x4x4_t, align 16
	// CHECK: [[__RET:%.*]] = alloca %struct.int32x4x4_t, align 16			// CHECK: [[__RET:%.*]] = alloca %struct.int32x4x4_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int32x4x4_t, %struct.int32x4x4_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int32x4x4_t, %struct.int32x4x4_t* [[B]], i32 0, i32 0
	// CHECK: store [4 x <4 x i32>] [[B]].coerce, [4 x <4 x i32>]* [[COERCE_DIVE]], align 16			// CHECK: store [4 x <4 x i32>] [[B]].coerce, [4 x <4 x i32>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.int32x4x4_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.int32x4x4_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.int32x4x4_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.int32x4x4_t* [[B]] to i8*
	Show All 27 Lines
	// CHECK: [[TMP18:%.*]] = bitcast %struct.int32x4x4_t* [[__RET]] to i8*			// CHECK: [[TMP18:%.*]] = bitcast %struct.int32x4x4_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP17]], i8* align 16 [[TMP18]], i64 64, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP17]], i8* align 16 [[TMP18]], i64 64, i1 false)
	// CHECK: [[TMP19:%.*]] = load %struct.int32x4x4_t, %struct.int32x4x4_t* [[RETVAL]], align 16			// CHECK: [[TMP19:%.*]] = load %struct.int32x4x4_t, %struct.int32x4x4_t* [[RETVAL]], align 16
	// CHECK: ret %struct.int32x4x4_t [[TMP19]]			// CHECK: ret %struct.int32x4x4_t [[TMP19]]
	int32x4x4_t test_vld4q_lane_s32(int32_t *a, int32x4x4_t b) {			int32x4x4_t test_vld4q_lane_s32(int32_t *a, int32x4x4_t b) {
	return vld4q_lane_s32(a, b, 3);			return vld4q_lane_s32(a, b, 3);
	}			}

	// CHECK-LABEL: define %struct.int64x2x4_t @test_vld4q_lane_s64(i64* %a, [4 x <2 x i64>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.int64x2x4_t @test_vld4q_lane_s64(i64* %a, [4 x <2 x i64>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.int64x2x4_t, align 16			// CHECK: [[RETVAL:%.*]] = alloca %struct.int64x2x4_t, align 16
	// CHECK: [[B:%.*]] = alloca %struct.int64x2x4_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.int64x2x4_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.int64x2x4_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.int64x2x4_t, align 16
	// CHECK: [[__RET:%.*]] = alloca %struct.int64x2x4_t, align 16			// CHECK: [[__RET:%.*]] = alloca %struct.int64x2x4_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int64x2x4_t, %struct.int64x2x4_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int64x2x4_t, %struct.int64x2x4_t* [[B]], i32 0, i32 0
	// CHECK: store [4 x <2 x i64>] [[B]].coerce, [4 x <2 x i64>]* [[COERCE_DIVE]], align 16			// CHECK: store [4 x <2 x i64>] [[B]].coerce, [4 x <2 x i64>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.int64x2x4_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.int64x2x4_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.int64x2x4_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.int64x2x4_t* [[B]] to i8*
	Show All 27 Lines
	// CHECK: [[TMP18:%.*]] = bitcast %struct.int64x2x4_t* [[__RET]] to i8*			// CHECK: [[TMP18:%.*]] = bitcast %struct.int64x2x4_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP17]], i8* align 16 [[TMP18]], i64 64, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP17]], i8* align 16 [[TMP18]], i64 64, i1 false)
	// CHECK: [[TMP19:%.*]] = load %struct.int64x2x4_t, %struct.int64x2x4_t* [[RETVAL]], align 16			// CHECK: [[TMP19:%.*]] = load %struct.int64x2x4_t, %struct.int64x2x4_t* [[RETVAL]], align 16
	// CHECK: ret %struct.int64x2x4_t [[TMP19]]			// CHECK: ret %struct.int64x2x4_t [[TMP19]]
	int64x2x4_t test_vld4q_lane_s64(int64_t *a, int64x2x4_t b) {			int64x2x4_t test_vld4q_lane_s64(int64_t *a, int64x2x4_t b) {
	return vld4q_lane_s64(a, b, 1);			return vld4q_lane_s64(a, b, 1);
	}			}

	// CHECK-LABEL: define %struct.float16x8x4_t @test_vld4q_lane_f16(half* %a, [4 x <8 x half>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.float16x8x4_t @test_vld4q_lane_f16(half* %a, [4 x <8 x half>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.float16x8x4_t, align 16			// CHECK: [[RETVAL:%.*]] = alloca %struct.float16x8x4_t, align 16
	// CHECK: [[B:%.*]] = alloca %struct.float16x8x4_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.float16x8x4_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.float16x8x4_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.float16x8x4_t, align 16
	// CHECK: [[__RET:%.*]] = alloca %struct.float16x8x4_t, align 16			// CHECK: [[__RET:%.*]] = alloca %struct.float16x8x4_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float16x8x4_t, %struct.float16x8x4_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float16x8x4_t, %struct.float16x8x4_t* [[B]], i32 0, i32 0
	// CHECK: store [4 x <8 x half>] [[B]].coerce, [4 x <8 x half>]* [[COERCE_DIVE]], align 16			// CHECK: store [4 x <8 x half>] [[B]].coerce, [4 x <8 x half>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x8x4_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x8x4_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.float16x8x4_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.float16x8x4_t* [[B]] to i8*
	Show All 27 Lines
	// CHECK: [[TMP18:%.*]] = bitcast %struct.float16x8x4_t* [[__RET]] to i8*			// CHECK: [[TMP18:%.*]] = bitcast %struct.float16x8x4_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP17]], i8* align 16 [[TMP18]], i64 64, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP17]], i8* align 16 [[TMP18]], i64 64, i1 false)
	// CHECK: [[TMP19:%.*]] = load %struct.float16x8x4_t, %struct.float16x8x4_t* [[RETVAL]], align 16			// CHECK: [[TMP19:%.*]] = load %struct.float16x8x4_t, %struct.float16x8x4_t* [[RETVAL]], align 16
	// CHECK: ret %struct.float16x8x4_t [[TMP19]]			// CHECK: ret %struct.float16x8x4_t [[TMP19]]
	float16x8x4_t test_vld4q_lane_f16(float16_t *a, float16x8x4_t b) {			float16x8x4_t test_vld4q_lane_f16(float16_t *a, float16x8x4_t b) {
	return vld4q_lane_f16(a, b, 7);			return vld4q_lane_f16(a, b, 7);
	}			}

	// CHECK-LABEL: define %struct.float32x4x4_t @test_vld4q_lane_f32(float* %a, [4 x <4 x float>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.float32x4x4_t @test_vld4q_lane_f32(float* %a, [4 x <4 x float>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.float32x4x4_t, align 16			// CHECK: [[RETVAL:%.*]] = alloca %struct.float32x4x4_t, align 16
	// CHECK: [[B:%.*]] = alloca %struct.float32x4x4_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.float32x4x4_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.float32x4x4_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.float32x4x4_t, align 16
	// CHECK: [[__RET:%.*]] = alloca %struct.float32x4x4_t, align 16			// CHECK: [[__RET:%.*]] = alloca %struct.float32x4x4_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float32x4x4_t, %struct.float32x4x4_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float32x4x4_t, %struct.float32x4x4_t* [[B]], i32 0, i32 0
	// CHECK: store [4 x <4 x float>] [[B]].coerce, [4 x <4 x float>]* [[COERCE_DIVE]], align 16			// CHECK: store [4 x <4 x float>] [[B]].coerce, [4 x <4 x float>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.float32x4x4_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.float32x4x4_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.float32x4x4_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.float32x4x4_t* [[B]] to i8*
	Show All 27 Lines
	// CHECK: [[TMP18:%.*]] = bitcast %struct.float32x4x4_t* [[__RET]] to i8*			// CHECK: [[TMP18:%.*]] = bitcast %struct.float32x4x4_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP17]], i8* align 16 [[TMP18]], i64 64, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP17]], i8* align 16 [[TMP18]], i64 64, i1 false)
	// CHECK: [[TMP19:%.*]] = load %struct.float32x4x4_t, %struct.float32x4x4_t* [[RETVAL]], align 16			// CHECK: [[TMP19:%.*]] = load %struct.float32x4x4_t, %struct.float32x4x4_t* [[RETVAL]], align 16
	// CHECK: ret %struct.float32x4x4_t [[TMP19]]			// CHECK: ret %struct.float32x4x4_t [[TMP19]]
	float32x4x4_t test_vld4q_lane_f32(float32_t *a, float32x4x4_t b) {			float32x4x4_t test_vld4q_lane_f32(float32_t *a, float32x4x4_t b) {
	return vld4q_lane_f32(a, b, 3);			return vld4q_lane_f32(a, b, 3);
	}			}

	// CHECK-LABEL: define %struct.float64x2x4_t @test_vld4q_lane_f64(double* %a, [4 x <2 x double>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.float64x2x4_t @test_vld4q_lane_f64(double* %a, [4 x <2 x double>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.float64x2x4_t, align 16			// CHECK: [[RETVAL:%.*]] = alloca %struct.float64x2x4_t, align 16
	// CHECK: [[B:%.*]] = alloca %struct.float64x2x4_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.float64x2x4_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.float64x2x4_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.float64x2x4_t, align 16
	// CHECK: [[__RET:%.*]] = alloca %struct.float64x2x4_t, align 16			// CHECK: [[__RET:%.*]] = alloca %struct.float64x2x4_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float64x2x4_t, %struct.float64x2x4_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float64x2x4_t, %struct.float64x2x4_t* [[B]], i32 0, i32 0
	// CHECK: store [4 x <2 x double>] [[B]].coerce, [4 x <2 x double>]* [[COERCE_DIVE]], align 16			// CHECK: store [4 x <2 x double>] [[B]].coerce, [4 x <2 x double>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.float64x2x4_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.float64x2x4_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.float64x2x4_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.float64x2x4_t* [[B]] to i8*
	Show All 27 Lines
	// CHECK: [[TMP18:%.*]] = bitcast %struct.float64x2x4_t* [[__RET]] to i8*			// CHECK: [[TMP18:%.*]] = bitcast %struct.float64x2x4_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP17]], i8* align 16 [[TMP18]], i64 64, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP17]], i8* align 16 [[TMP18]], i64 64, i1 false)
	// CHECK: [[TMP19:%.*]] = load %struct.float64x2x4_t, %struct.float64x2x4_t* [[RETVAL]], align 16			// CHECK: [[TMP19:%.*]] = load %struct.float64x2x4_t, %struct.float64x2x4_t* [[RETVAL]], align 16
	// CHECK: ret %struct.float64x2x4_t [[TMP19]]			// CHECK: ret %struct.float64x2x4_t [[TMP19]]
	float64x2x4_t test_vld4q_lane_f64(float64_t *a, float64x2x4_t b) {			float64x2x4_t test_vld4q_lane_f64(float64_t *a, float64x2x4_t b) {
	return vld4q_lane_f64(a, b, 1);			return vld4q_lane_f64(a, b, 1);
	}			}

	// CHECK-LABEL: define %struct.poly8x16x4_t @test_vld4q_lane_p8(i8* %a, [4 x <16 x i8>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.poly8x16x4_t @test_vld4q_lane_p8(i8* %a, [4 x <16 x i8>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.poly8x16x4_t, align 16			// CHECK: [[RETVAL:%.*]] = alloca %struct.poly8x16x4_t, align 16
	// CHECK: [[B:%.*]] = alloca %struct.poly8x16x4_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.poly8x16x4_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.poly8x16x4_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.poly8x16x4_t, align 16
	// CHECK: [[__RET:%.*]] = alloca %struct.poly8x16x4_t, align 16			// CHECK: [[__RET:%.*]] = alloca %struct.poly8x16x4_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly8x16x4_t, %struct.poly8x16x4_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly8x16x4_t, %struct.poly8x16x4_t* [[B]], i32 0, i32 0
	// CHECK: store [4 x <16 x i8>] [[B]].coerce, [4 x <16 x i8>]* [[COERCE_DIVE]], align 16			// CHECK: store [4 x <16 x i8>] [[B]].coerce, [4 x <16 x i8>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.poly8x16x4_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.poly8x16x4_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.poly8x16x4_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.poly8x16x4_t* [[B]] to i8*
	Show All 18 Lines
	// CHECK: [[TMP9:%.*]] = bitcast %struct.poly8x16x4_t* [[__RET]] to i8*			// CHECK: [[TMP9:%.*]] = bitcast %struct.poly8x16x4_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP8]], i8* align 16 [[TMP9]], i64 64, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP8]], i8* align 16 [[TMP9]], i64 64, i1 false)
	// CHECK: [[TMP10:%.*]] = load %struct.poly8x16x4_t, %struct.poly8x16x4_t* [[RETVAL]], align 16			// CHECK: [[TMP10:%.*]] = load %struct.poly8x16x4_t, %struct.poly8x16x4_t* [[RETVAL]], align 16
	// CHECK: ret %struct.poly8x16x4_t [[TMP10]]			// CHECK: ret %struct.poly8x16x4_t [[TMP10]]
	poly8x16x4_t test_vld4q_lane_p8(poly8_t *a, poly8x16x4_t b) {			poly8x16x4_t test_vld4q_lane_p8(poly8_t *a, poly8x16x4_t b) {
	return vld4q_lane_p8(a, b, 15);			return vld4q_lane_p8(a, b, 15);
	}			}

	// CHECK-LABEL: define %struct.poly16x8x4_t @test_vld4q_lane_p16(i16* %a, [4 x <8 x i16>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.poly16x8x4_t @test_vld4q_lane_p16(i16* %a, [4 x <8 x i16>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.poly16x8x4_t, align 16			// CHECK: [[RETVAL:%.*]] = alloca %struct.poly16x8x4_t, align 16
	// CHECK: [[B:%.*]] = alloca %struct.poly16x8x4_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.poly16x8x4_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.poly16x8x4_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.poly16x8x4_t, align 16
	// CHECK: [[__RET:%.*]] = alloca %struct.poly16x8x4_t, align 16			// CHECK: [[__RET:%.*]] = alloca %struct.poly16x8x4_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly16x8x4_t, %struct.poly16x8x4_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly16x8x4_t, %struct.poly16x8x4_t* [[B]], i32 0, i32 0
	// CHECK: store [4 x <8 x i16>] [[B]].coerce, [4 x <8 x i16>]* [[COERCE_DIVE]], align 16			// CHECK: store [4 x <8 x i16>] [[B]].coerce, [4 x <8 x i16>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.poly16x8x4_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.poly16x8x4_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.poly16x8x4_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.poly16x8x4_t* [[B]] to i8*
	Show All 27 Lines
	// CHECK: [[TMP18:%.*]] = bitcast %struct.poly16x8x4_t* [[__RET]] to i8*			// CHECK: [[TMP18:%.*]] = bitcast %struct.poly16x8x4_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP17]], i8* align 16 [[TMP18]], i64 64, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP17]], i8* align 16 [[TMP18]], i64 64, i1 false)
	// CHECK: [[TMP19:%.*]] = load %struct.poly16x8x4_t, %struct.poly16x8x4_t* [[RETVAL]], align 16			// CHECK: [[TMP19:%.*]] = load %struct.poly16x8x4_t, %struct.poly16x8x4_t* [[RETVAL]], align 16
	// CHECK: ret %struct.poly16x8x4_t [[TMP19]]			// CHECK: ret %struct.poly16x8x4_t [[TMP19]]
	poly16x8x4_t test_vld4q_lane_p16(poly16_t *a, poly16x8x4_t b) {			poly16x8x4_t test_vld4q_lane_p16(poly16_t *a, poly16x8x4_t b) {
	return vld4q_lane_p16(a, b, 7);			return vld4q_lane_p16(a, b, 7);
	}			}

	// CHECK-LABEL: define %struct.poly64x2x4_t @test_vld4q_lane_p64(i64* %a, [4 x <2 x i64>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.poly64x2x4_t @test_vld4q_lane_p64(i64* %a, [4 x <2 x i64>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.poly64x2x4_t, align 16			// CHECK: [[RETVAL:%.*]] = alloca %struct.poly64x2x4_t, align 16
	// CHECK: [[B:%.*]] = alloca %struct.poly64x2x4_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.poly64x2x4_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.poly64x2x4_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.poly64x2x4_t, align 16
	// CHECK: [[__RET:%.*]] = alloca %struct.poly64x2x4_t, align 16			// CHECK: [[__RET:%.*]] = alloca %struct.poly64x2x4_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly64x2x4_t, %struct.poly64x2x4_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly64x2x4_t, %struct.poly64x2x4_t* [[B]], i32 0, i32 0
	// CHECK: store [4 x <2 x i64>] [[B]].coerce, [4 x <2 x i64>]* [[COERCE_DIVE]], align 16			// CHECK: store [4 x <2 x i64>] [[B]].coerce, [4 x <2 x i64>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.poly64x2x4_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.poly64x2x4_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.poly64x2x4_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.poly64x2x4_t* [[B]] to i8*
	Show All 27 Lines
	// CHECK: [[TMP18:%.*]] = bitcast %struct.poly64x2x4_t* [[__RET]] to i8*			// CHECK: [[TMP18:%.*]] = bitcast %struct.poly64x2x4_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP17]], i8* align 16 [[TMP18]], i64 64, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP17]], i8* align 16 [[TMP18]], i64 64, i1 false)
	// CHECK: [[TMP19:%.*]] = load %struct.poly64x2x4_t, %struct.poly64x2x4_t* [[RETVAL]], align 16			// CHECK: [[TMP19:%.*]] = load %struct.poly64x2x4_t, %struct.poly64x2x4_t* [[RETVAL]], align 16
	// CHECK: ret %struct.poly64x2x4_t [[TMP19]]			// CHECK: ret %struct.poly64x2x4_t [[TMP19]]
	poly64x2x4_t test_vld4q_lane_p64(poly64_t *a, poly64x2x4_t b) {			poly64x2x4_t test_vld4q_lane_p64(poly64_t *a, poly64x2x4_t b) {
	return vld4q_lane_p64(a, b, 1);			return vld4q_lane_p64(a, b, 1);
	}			}

	// CHECK-LABEL: define %struct.uint8x8x4_t @test_vld4_lane_u8(i8* %a, [4 x <8 x i8>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.uint8x8x4_t @test_vld4_lane_u8(i8* %a, [4 x <8 x i8>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.uint8x8x4_t, align 8			// CHECK: [[RETVAL:%.*]] = alloca %struct.uint8x8x4_t, align 8
	// CHECK: [[B:%.*]] = alloca %struct.uint8x8x4_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.uint8x8x4_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.uint8x8x4_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.uint8x8x4_t, align 8
	// CHECK: [[__RET:%.*]] = alloca %struct.uint8x8x4_t, align 8			// CHECK: [[__RET:%.*]] = alloca %struct.uint8x8x4_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint8x8x4_t, %struct.uint8x8x4_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint8x8x4_t, %struct.uint8x8x4_t* [[B]], i32 0, i32 0
	// CHECK: store [4 x <8 x i8>] [[B]].coerce, [4 x <8 x i8>]* [[COERCE_DIVE]], align 8			// CHECK: store [4 x <8 x i8>] [[B]].coerce, [4 x <8 x i8>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.uint8x8x4_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.uint8x8x4_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.uint8x8x4_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.uint8x8x4_t* [[B]] to i8*
	Show All 18 Lines
	// CHECK: [[TMP9:%.*]] = bitcast %struct.uint8x8x4_t* [[__RET]] to i8*			// CHECK: [[TMP9:%.*]] = bitcast %struct.uint8x8x4_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP8]], i8* align 8 [[TMP9]], i64 32, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP8]], i8* align 8 [[TMP9]], i64 32, i1 false)
	// CHECK: [[TMP10:%.*]] = load %struct.uint8x8x4_t, %struct.uint8x8x4_t* [[RETVAL]], align 8			// CHECK: [[TMP10:%.*]] = load %struct.uint8x8x4_t, %struct.uint8x8x4_t* [[RETVAL]], align 8
	// CHECK: ret %struct.uint8x8x4_t [[TMP10]]			// CHECK: ret %struct.uint8x8x4_t [[TMP10]]
	uint8x8x4_t test_vld4_lane_u8(uint8_t *a, uint8x8x4_t b) {			uint8x8x4_t test_vld4_lane_u8(uint8_t *a, uint8x8x4_t b) {
	return vld4_lane_u8(a, b, 7);			return vld4_lane_u8(a, b, 7);
	}			}

	// CHECK-LABEL: define %struct.uint16x4x4_t @test_vld4_lane_u16(i16* %a, [4 x <4 x i16>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.uint16x4x4_t @test_vld4_lane_u16(i16* %a, [4 x <4 x i16>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.uint16x4x4_t, align 8			// CHECK: [[RETVAL:%.*]] = alloca %struct.uint16x4x4_t, align 8
	// CHECK: [[B:%.*]] = alloca %struct.uint16x4x4_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.uint16x4x4_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.uint16x4x4_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.uint16x4x4_t, align 8
	// CHECK: [[__RET:%.*]] = alloca %struct.uint16x4x4_t, align 8			// CHECK: [[__RET:%.*]] = alloca %struct.uint16x4x4_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint16x4x4_t, %struct.uint16x4x4_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint16x4x4_t, %struct.uint16x4x4_t* [[B]], i32 0, i32 0
	// CHECK: store [4 x <4 x i16>] [[B]].coerce, [4 x <4 x i16>]* [[COERCE_DIVE]], align 8			// CHECK: store [4 x <4 x i16>] [[B]].coerce, [4 x <4 x i16>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.uint16x4x4_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.uint16x4x4_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.uint16x4x4_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.uint16x4x4_t* [[B]] to i8*
	Show All 27 Lines
	// CHECK: [[TMP18:%.*]] = bitcast %struct.uint16x4x4_t* [[__RET]] to i8*			// CHECK: [[TMP18:%.*]] = bitcast %struct.uint16x4x4_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP17]], i8* align 8 [[TMP18]], i64 32, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP17]], i8* align 8 [[TMP18]], i64 32, i1 false)
	// CHECK: [[TMP19:%.*]] = load %struct.uint16x4x4_t, %struct.uint16x4x4_t* [[RETVAL]], align 8			// CHECK: [[TMP19:%.*]] = load %struct.uint16x4x4_t, %struct.uint16x4x4_t* [[RETVAL]], align 8
	// CHECK: ret %struct.uint16x4x4_t [[TMP19]]			// CHECK: ret %struct.uint16x4x4_t [[TMP19]]
	uint16x4x4_t test_vld4_lane_u16(uint16_t *a, uint16x4x4_t b) {			uint16x4x4_t test_vld4_lane_u16(uint16_t *a, uint16x4x4_t b) {
	return vld4_lane_u16(a, b, 3);			return vld4_lane_u16(a, b, 3);
	}			}

	// CHECK-LABEL: define %struct.uint32x2x4_t @test_vld4_lane_u32(i32* %a, [4 x <2 x i32>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.uint32x2x4_t @test_vld4_lane_u32(i32* %a, [4 x <2 x i32>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.uint32x2x4_t, align 8			// CHECK: [[RETVAL:%.*]] = alloca %struct.uint32x2x4_t, align 8
	// CHECK: [[B:%.*]] = alloca %struct.uint32x2x4_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.uint32x2x4_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.uint32x2x4_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.uint32x2x4_t, align 8
	// CHECK: [[__RET:%.*]] = alloca %struct.uint32x2x4_t, align 8			// CHECK: [[__RET:%.*]] = alloca %struct.uint32x2x4_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint32x2x4_t, %struct.uint32x2x4_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint32x2x4_t, %struct.uint32x2x4_t* [[B]], i32 0, i32 0
	// CHECK: store [4 x <2 x i32>] [[B]].coerce, [4 x <2 x i32>]* [[COERCE_DIVE]], align 8			// CHECK: store [4 x <2 x i32>] [[B]].coerce, [4 x <2 x i32>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.uint32x2x4_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.uint32x2x4_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.uint32x2x4_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.uint32x2x4_t* [[B]] to i8*
	Show All 27 Lines
	// CHECK: [[TMP18:%.*]] = bitcast %struct.uint32x2x4_t* [[__RET]] to i8*			// CHECK: [[TMP18:%.*]] = bitcast %struct.uint32x2x4_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP17]], i8* align 8 [[TMP18]], i64 32, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP17]], i8* align 8 [[TMP18]], i64 32, i1 false)
	// CHECK: [[TMP19:%.*]] = load %struct.uint32x2x4_t, %struct.uint32x2x4_t* [[RETVAL]], align 8			// CHECK: [[TMP19:%.*]] = load %struct.uint32x2x4_t, %struct.uint32x2x4_t* [[RETVAL]], align 8
	// CHECK: ret %struct.uint32x2x4_t [[TMP19]]			// CHECK: ret %struct.uint32x2x4_t [[TMP19]]
	uint32x2x4_t test_vld4_lane_u32(uint32_t *a, uint32x2x4_t b) {			uint32x2x4_t test_vld4_lane_u32(uint32_t *a, uint32x2x4_t b) {
	return vld4_lane_u32(a, b, 1);			return vld4_lane_u32(a, b, 1);
	}			}

	// CHECK-LABEL: define %struct.uint64x1x4_t @test_vld4_lane_u64(i64* %a, [4 x <1 x i64>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.uint64x1x4_t @test_vld4_lane_u64(i64* %a, [4 x <1 x i64>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.uint64x1x4_t, align 8			// CHECK: [[RETVAL:%.*]] = alloca %struct.uint64x1x4_t, align 8
	// CHECK: [[B:%.*]] = alloca %struct.uint64x1x4_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.uint64x1x4_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.uint64x1x4_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.uint64x1x4_t, align 8
	// CHECK: [[__RET:%.*]] = alloca %struct.uint64x1x4_t, align 8			// CHECK: [[__RET:%.*]] = alloca %struct.uint64x1x4_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint64x1x4_t, %struct.uint64x1x4_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint64x1x4_t, %struct.uint64x1x4_t* [[B]], i32 0, i32 0
	// CHECK: store [4 x <1 x i64>] [[B]].coerce, [4 x <1 x i64>]* [[COERCE_DIVE]], align 8			// CHECK: store [4 x <1 x i64>] [[B]].coerce, [4 x <1 x i64>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.uint64x1x4_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.uint64x1x4_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.uint64x1x4_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.uint64x1x4_t* [[B]] to i8*
	Show All 27 Lines
	// CHECK: [[TMP18:%.*]] = bitcast %struct.uint64x1x4_t* [[__RET]] to i8*			// CHECK: [[TMP18:%.*]] = bitcast %struct.uint64x1x4_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP17]], i8* align 8 [[TMP18]], i64 32, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP17]], i8* align 8 [[TMP18]], i64 32, i1 false)
	// CHECK: [[TMP19:%.*]] = load %struct.uint64x1x4_t, %struct.uint64x1x4_t* [[RETVAL]], align 8			// CHECK: [[TMP19:%.*]] = load %struct.uint64x1x4_t, %struct.uint64x1x4_t* [[RETVAL]], align 8
	// CHECK: ret %struct.uint64x1x4_t [[TMP19]]			// CHECK: ret %struct.uint64x1x4_t [[TMP19]]
	uint64x1x4_t test_vld4_lane_u64(uint64_t *a, uint64x1x4_t b) {			uint64x1x4_t test_vld4_lane_u64(uint64_t *a, uint64x1x4_t b) {
	return vld4_lane_u64(a, b, 0);			return vld4_lane_u64(a, b, 0);
	}			}

	// CHECK-LABEL: define %struct.int8x8x4_t @test_vld4_lane_s8(i8* %a, [4 x <8 x i8>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.int8x8x4_t @test_vld4_lane_s8(i8* %a, [4 x <8 x i8>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.int8x8x4_t, align 8			// CHECK: [[RETVAL:%.*]] = alloca %struct.int8x8x4_t, align 8
	// CHECK: [[B:%.*]] = alloca %struct.int8x8x4_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.int8x8x4_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.int8x8x4_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.int8x8x4_t, align 8
	// CHECK: [[__RET:%.*]] = alloca %struct.int8x8x4_t, align 8			// CHECK: [[__RET:%.*]] = alloca %struct.int8x8x4_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x8x4_t, %struct.int8x8x4_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x8x4_t, %struct.int8x8x4_t* [[B]], i32 0, i32 0
	// CHECK: store [4 x <8 x i8>] [[B]].coerce, [4 x <8 x i8>]* [[COERCE_DIVE]], align 8			// CHECK: store [4 x <8 x i8>] [[B]].coerce, [4 x <8 x i8>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.int8x8x4_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.int8x8x4_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.int8x8x4_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.int8x8x4_t* [[B]] to i8*
	Show All 18 Lines
	// CHECK: [[TMP9:%.*]] = bitcast %struct.int8x8x4_t* [[__RET]] to i8*			// CHECK: [[TMP9:%.*]] = bitcast %struct.int8x8x4_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP8]], i8* align 8 [[TMP9]], i64 32, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP8]], i8* align 8 [[TMP9]], i64 32, i1 false)
	// CHECK: [[TMP10:%.*]] = load %struct.int8x8x4_t, %struct.int8x8x4_t* [[RETVAL]], align 8			// CHECK: [[TMP10:%.*]] = load %struct.int8x8x4_t, %struct.int8x8x4_t* [[RETVAL]], align 8
	// CHECK: ret %struct.int8x8x4_t [[TMP10]]			// CHECK: ret %struct.int8x8x4_t [[TMP10]]
	int8x8x4_t test_vld4_lane_s8(int8_t *a, int8x8x4_t b) {			int8x8x4_t test_vld4_lane_s8(int8_t *a, int8x8x4_t b) {
	return vld4_lane_s8(a, b, 7);			return vld4_lane_s8(a, b, 7);
	}			}

	// CHECK-LABEL: define %struct.int16x4x4_t @test_vld4_lane_s16(i16* %a, [4 x <4 x i16>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.int16x4x4_t @test_vld4_lane_s16(i16* %a, [4 x <4 x i16>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.int16x4x4_t, align 8			// CHECK: [[RETVAL:%.*]] = alloca %struct.int16x4x4_t, align 8
	// CHECK: [[B:%.*]] = alloca %struct.int16x4x4_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.int16x4x4_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.int16x4x4_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.int16x4x4_t, align 8
	// CHECK: [[__RET:%.*]] = alloca %struct.int16x4x4_t, align 8			// CHECK: [[__RET:%.*]] = alloca %struct.int16x4x4_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int16x4x4_t, %struct.int16x4x4_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int16x4x4_t, %struct.int16x4x4_t* [[B]], i32 0, i32 0
	// CHECK: store [4 x <4 x i16>] [[B]].coerce, [4 x <4 x i16>]* [[COERCE_DIVE]], align 8			// CHECK: store [4 x <4 x i16>] [[B]].coerce, [4 x <4 x i16>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.int16x4x4_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.int16x4x4_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.int16x4x4_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.int16x4x4_t* [[B]] to i8*
	Show All 27 Lines
	// CHECK: [[TMP18:%.*]] = bitcast %struct.int16x4x4_t* [[__RET]] to i8*			// CHECK: [[TMP18:%.*]] = bitcast %struct.int16x4x4_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP17]], i8* align 8 [[TMP18]], i64 32, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP17]], i8* align 8 [[TMP18]], i64 32, i1 false)
	// CHECK: [[TMP19:%.*]] = load %struct.int16x4x4_t, %struct.int16x4x4_t* [[RETVAL]], align 8			// CHECK: [[TMP19:%.*]] = load %struct.int16x4x4_t, %struct.int16x4x4_t* [[RETVAL]], align 8
	// CHECK: ret %struct.int16x4x4_t [[TMP19]]			// CHECK: ret %struct.int16x4x4_t [[TMP19]]
	int16x4x4_t test_vld4_lane_s16(int16_t *a, int16x4x4_t b) {			int16x4x4_t test_vld4_lane_s16(int16_t *a, int16x4x4_t b) {
	return vld4_lane_s16(a, b, 3);			return vld4_lane_s16(a, b, 3);
	}			}

	// CHECK-LABEL: define %struct.int32x2x4_t @test_vld4_lane_s32(i32* %a, [4 x <2 x i32>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.int32x2x4_t @test_vld4_lane_s32(i32* %a, [4 x <2 x i32>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.int32x2x4_t, align 8			// CHECK: [[RETVAL:%.*]] = alloca %struct.int32x2x4_t, align 8
	// CHECK: [[B:%.*]] = alloca %struct.int32x2x4_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.int32x2x4_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.int32x2x4_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.int32x2x4_t, align 8
	// CHECK: [[__RET:%.*]] = alloca %struct.int32x2x4_t, align 8			// CHECK: [[__RET:%.*]] = alloca %struct.int32x2x4_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int32x2x4_t, %struct.int32x2x4_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int32x2x4_t, %struct.int32x2x4_t* [[B]], i32 0, i32 0
	// CHECK: store [4 x <2 x i32>] [[B]].coerce, [4 x <2 x i32>]* [[COERCE_DIVE]], align 8			// CHECK: store [4 x <2 x i32>] [[B]].coerce, [4 x <2 x i32>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.int32x2x4_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.int32x2x4_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.int32x2x4_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.int32x2x4_t* [[B]] to i8*
	Show All 27 Lines
	// CHECK: [[TMP18:%.*]] = bitcast %struct.int32x2x4_t* [[__RET]] to i8*			// CHECK: [[TMP18:%.*]] = bitcast %struct.int32x2x4_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP17]], i8* align 8 [[TMP18]], i64 32, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP17]], i8* align 8 [[TMP18]], i64 32, i1 false)
	// CHECK: [[TMP19:%.*]] = load %struct.int32x2x4_t, %struct.int32x2x4_t* [[RETVAL]], align 8			// CHECK: [[TMP19:%.*]] = load %struct.int32x2x4_t, %struct.int32x2x4_t* [[RETVAL]], align 8
	// CHECK: ret %struct.int32x2x4_t [[TMP19]]			// CHECK: ret %struct.int32x2x4_t [[TMP19]]
	int32x2x4_t test_vld4_lane_s32(int32_t *a, int32x2x4_t b) {			int32x2x4_t test_vld4_lane_s32(int32_t *a, int32x2x4_t b) {
	return vld4_lane_s32(a, b, 1);			return vld4_lane_s32(a, b, 1);
	}			}

	// CHECK-LABEL: define %struct.int64x1x4_t @test_vld4_lane_s64(i64* %a, [4 x <1 x i64>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.int64x1x4_t @test_vld4_lane_s64(i64* %a, [4 x <1 x i64>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.int64x1x4_t, align 8			// CHECK: [[RETVAL:%.*]] = alloca %struct.int64x1x4_t, align 8
	// CHECK: [[B:%.*]] = alloca %struct.int64x1x4_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.int64x1x4_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.int64x1x4_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.int64x1x4_t, align 8
	// CHECK: [[__RET:%.*]] = alloca %struct.int64x1x4_t, align 8			// CHECK: [[__RET:%.*]] = alloca %struct.int64x1x4_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int64x1x4_t, %struct.int64x1x4_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int64x1x4_t, %struct.int64x1x4_t* [[B]], i32 0, i32 0
	// CHECK: store [4 x <1 x i64>] [[B]].coerce, [4 x <1 x i64>]* [[COERCE_DIVE]], align 8			// CHECK: store [4 x <1 x i64>] [[B]].coerce, [4 x <1 x i64>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.int64x1x4_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.int64x1x4_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.int64x1x4_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.int64x1x4_t* [[B]] to i8*
	Show All 27 Lines
	// CHECK: [[TMP18:%.*]] = bitcast %struct.int64x1x4_t* [[__RET]] to i8*			// CHECK: [[TMP18:%.*]] = bitcast %struct.int64x1x4_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP17]], i8* align 8 [[TMP18]], i64 32, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP17]], i8* align 8 [[TMP18]], i64 32, i1 false)
	// CHECK: [[TMP19:%.*]] = load %struct.int64x1x4_t, %struct.int64x1x4_t* [[RETVAL]], align 8			// CHECK: [[TMP19:%.*]] = load %struct.int64x1x4_t, %struct.int64x1x4_t* [[RETVAL]], align 8
	// CHECK: ret %struct.int64x1x4_t [[TMP19]]			// CHECK: ret %struct.int64x1x4_t [[TMP19]]
	int64x1x4_t test_vld4_lane_s64(int64_t *a, int64x1x4_t b) {			int64x1x4_t test_vld4_lane_s64(int64_t *a, int64x1x4_t b) {
	return vld4_lane_s64(a, b, 0);			return vld4_lane_s64(a, b, 0);
	}			}

	// CHECK-LABEL: define %struct.float16x4x4_t @test_vld4_lane_f16(half* %a, [4 x <4 x half>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.float16x4x4_t @test_vld4_lane_f16(half* %a, [4 x <4 x half>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.float16x4x4_t, align 8			// CHECK: [[RETVAL:%.*]] = alloca %struct.float16x4x4_t, align 8
	// CHECK: [[B:%.*]] = alloca %struct.float16x4x4_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.float16x4x4_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.float16x4x4_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.float16x4x4_t, align 8
	// CHECK: [[__RET:%.*]] = alloca %struct.float16x4x4_t, align 8			// CHECK: [[__RET:%.*]] = alloca %struct.float16x4x4_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float16x4x4_t, %struct.float16x4x4_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float16x4x4_t, %struct.float16x4x4_t* [[B]], i32 0, i32 0
	// CHECK: store [4 x <4 x half>] [[B]].coerce, [4 x <4 x half>]* [[COERCE_DIVE]], align 8			// CHECK: store [4 x <4 x half>] [[B]].coerce, [4 x <4 x half>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x4x4_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x4x4_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.float16x4x4_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.float16x4x4_t* [[B]] to i8*
	Show All 27 Lines
	// CHECK: [[TMP18:%.*]] = bitcast %struct.float16x4x4_t* [[__RET]] to i8*			// CHECK: [[TMP18:%.*]] = bitcast %struct.float16x4x4_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP17]], i8* align 8 [[TMP18]], i64 32, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP17]], i8* align 8 [[TMP18]], i64 32, i1 false)
	// CHECK: [[TMP19:%.*]] = load %struct.float16x4x4_t, %struct.float16x4x4_t* [[RETVAL]], align 8			// CHECK: [[TMP19:%.*]] = load %struct.float16x4x4_t, %struct.float16x4x4_t* [[RETVAL]], align 8
	// CHECK: ret %struct.float16x4x4_t [[TMP19]]			// CHECK: ret %struct.float16x4x4_t [[TMP19]]
	float16x4x4_t test_vld4_lane_f16(float16_t *a, float16x4x4_t b) {			float16x4x4_t test_vld4_lane_f16(float16_t *a, float16x4x4_t b) {
	return vld4_lane_f16(a, b, 3);			return vld4_lane_f16(a, b, 3);
	}			}

	// CHECK-LABEL: define %struct.float32x2x4_t @test_vld4_lane_f32(float* %a, [4 x <2 x float>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.float32x2x4_t @test_vld4_lane_f32(float* %a, [4 x <2 x float>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.float32x2x4_t, align 8			// CHECK: [[RETVAL:%.*]] = alloca %struct.float32x2x4_t, align 8
	// CHECK: [[B:%.*]] = alloca %struct.float32x2x4_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.float32x2x4_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.float32x2x4_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.float32x2x4_t, align 8
	// CHECK: [[__RET:%.*]] = alloca %struct.float32x2x4_t, align 8			// CHECK: [[__RET:%.*]] = alloca %struct.float32x2x4_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float32x2x4_t, %struct.float32x2x4_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float32x2x4_t, %struct.float32x2x4_t* [[B]], i32 0, i32 0
	// CHECK: store [4 x <2 x float>] [[B]].coerce, [4 x <2 x float>]* [[COERCE_DIVE]], align 8			// CHECK: store [4 x <2 x float>] [[B]].coerce, [4 x <2 x float>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.float32x2x4_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.float32x2x4_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.float32x2x4_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.float32x2x4_t* [[B]] to i8*
	Show All 27 Lines
	// CHECK: [[TMP18:%.*]] = bitcast %struct.float32x2x4_t* [[__RET]] to i8*			// CHECK: [[TMP18:%.*]] = bitcast %struct.float32x2x4_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP17]], i8* align 8 [[TMP18]], i64 32, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP17]], i8* align 8 [[TMP18]], i64 32, i1 false)
	// CHECK: [[TMP19:%.*]] = load %struct.float32x2x4_t, %struct.float32x2x4_t* [[RETVAL]], align 8			// CHECK: [[TMP19:%.*]] = load %struct.float32x2x4_t, %struct.float32x2x4_t* [[RETVAL]], align 8
	// CHECK: ret %struct.float32x2x4_t [[TMP19]]			// CHECK: ret %struct.float32x2x4_t [[TMP19]]
	float32x2x4_t test_vld4_lane_f32(float32_t *a, float32x2x4_t b) {			float32x2x4_t test_vld4_lane_f32(float32_t *a, float32x2x4_t b) {
	return vld4_lane_f32(a, b, 1);			return vld4_lane_f32(a, b, 1);
	}			}

	// CHECK-LABEL: define %struct.float64x1x4_t @test_vld4_lane_f64(double* %a, [4 x <1 x double>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.float64x1x4_t @test_vld4_lane_f64(double* %a, [4 x <1 x double>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.float64x1x4_t, align 8			// CHECK: [[RETVAL:%.*]] = alloca %struct.float64x1x4_t, align 8
	// CHECK: [[B:%.*]] = alloca %struct.float64x1x4_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.float64x1x4_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.float64x1x4_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.float64x1x4_t, align 8
	// CHECK: [[__RET:%.*]] = alloca %struct.float64x1x4_t, align 8			// CHECK: [[__RET:%.*]] = alloca %struct.float64x1x4_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float64x1x4_t, %struct.float64x1x4_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float64x1x4_t, %struct.float64x1x4_t* [[B]], i32 0, i32 0
	// CHECK: store [4 x <1 x double>] [[B]].coerce, [4 x <1 x double>]* [[COERCE_DIVE]], align 8			// CHECK: store [4 x <1 x double>] [[B]].coerce, [4 x <1 x double>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.float64x1x4_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.float64x1x4_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.float64x1x4_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.float64x1x4_t* [[B]] to i8*
	Show All 27 Lines
	// CHECK: [[TMP18:%.*]] = bitcast %struct.float64x1x4_t* [[__RET]] to i8*			// CHECK: [[TMP18:%.*]] = bitcast %struct.float64x1x4_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP17]], i8* align 8 [[TMP18]], i64 32, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP17]], i8* align 8 [[TMP18]], i64 32, i1 false)
	// CHECK: [[TMP19:%.*]] = load %struct.float64x1x4_t, %struct.float64x1x4_t* [[RETVAL]], align 8			// CHECK: [[TMP19:%.*]] = load %struct.float64x1x4_t, %struct.float64x1x4_t* [[RETVAL]], align 8
	// CHECK: ret %struct.float64x1x4_t [[TMP19]]			// CHECK: ret %struct.float64x1x4_t [[TMP19]]
	float64x1x4_t test_vld4_lane_f64(float64_t *a, float64x1x4_t b) {			float64x1x4_t test_vld4_lane_f64(float64_t *a, float64x1x4_t b) {
	return vld4_lane_f64(a, b, 0);			return vld4_lane_f64(a, b, 0);
	}			}

	// CHECK-LABEL: define %struct.poly8x8x4_t @test_vld4_lane_p8(i8* %a, [4 x <8 x i8>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.poly8x8x4_t @test_vld4_lane_p8(i8* %a, [4 x <8 x i8>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.poly8x8x4_t, align 8			// CHECK: [[RETVAL:%.*]] = alloca %struct.poly8x8x4_t, align 8
	// CHECK: [[B:%.*]] = alloca %struct.poly8x8x4_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.poly8x8x4_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.poly8x8x4_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.poly8x8x4_t, align 8
	// CHECK: [[__RET:%.*]] = alloca %struct.poly8x8x4_t, align 8			// CHECK: [[__RET:%.*]] = alloca %struct.poly8x8x4_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly8x8x4_t, %struct.poly8x8x4_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly8x8x4_t, %struct.poly8x8x4_t* [[B]], i32 0, i32 0
	// CHECK: store [4 x <8 x i8>] [[B]].coerce, [4 x <8 x i8>]* [[COERCE_DIVE]], align 8			// CHECK: store [4 x <8 x i8>] [[B]].coerce, [4 x <8 x i8>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.poly8x8x4_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.poly8x8x4_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.poly8x8x4_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.poly8x8x4_t* [[B]] to i8*
	Show All 18 Lines
	// CHECK: [[TMP9:%.*]] = bitcast %struct.poly8x8x4_t* [[__RET]] to i8*			// CHECK: [[TMP9:%.*]] = bitcast %struct.poly8x8x4_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP8]], i8* align 8 [[TMP9]], i64 32, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP8]], i8* align 8 [[TMP9]], i64 32, i1 false)
	// CHECK: [[TMP10:%.*]] = load %struct.poly8x8x4_t, %struct.poly8x8x4_t* [[RETVAL]], align 8			// CHECK: [[TMP10:%.*]] = load %struct.poly8x8x4_t, %struct.poly8x8x4_t* [[RETVAL]], align 8
	// CHECK: ret %struct.poly8x8x4_t [[TMP10]]			// CHECK: ret %struct.poly8x8x4_t [[TMP10]]
	poly8x8x4_t test_vld4_lane_p8(poly8_t *a, poly8x8x4_t b) {			poly8x8x4_t test_vld4_lane_p8(poly8_t *a, poly8x8x4_t b) {
	return vld4_lane_p8(a, b, 7);			return vld4_lane_p8(a, b, 7);
	}			}

	// CHECK-LABEL: define %struct.poly16x4x4_t @test_vld4_lane_p16(i16* %a, [4 x <4 x i16>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.poly16x4x4_t @test_vld4_lane_p16(i16* %a, [4 x <4 x i16>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.poly16x4x4_t, align 8			// CHECK: [[RETVAL:%.*]] = alloca %struct.poly16x4x4_t, align 8
	// CHECK: [[B:%.*]] = alloca %struct.poly16x4x4_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.poly16x4x4_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.poly16x4x4_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.poly16x4x4_t, align 8
	// CHECK: [[__RET:%.*]] = alloca %struct.poly16x4x4_t, align 8			// CHECK: [[__RET:%.*]] = alloca %struct.poly16x4x4_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly16x4x4_t, %struct.poly16x4x4_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly16x4x4_t, %struct.poly16x4x4_t* [[B]], i32 0, i32 0
	// CHECK: store [4 x <4 x i16>] [[B]].coerce, [4 x <4 x i16>]* [[COERCE_DIVE]], align 8			// CHECK: store [4 x <4 x i16>] [[B]].coerce, [4 x <4 x i16>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.poly16x4x4_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.poly16x4x4_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.poly16x4x4_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.poly16x4x4_t* [[B]] to i8*
	Show All 27 Lines
	// CHECK: [[TMP18:%.*]] = bitcast %struct.poly16x4x4_t* [[__RET]] to i8*			// CHECK: [[TMP18:%.*]] = bitcast %struct.poly16x4x4_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP17]], i8* align 8 [[TMP18]], i64 32, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP17]], i8* align 8 [[TMP18]], i64 32, i1 false)
	// CHECK: [[TMP19:%.*]] = load %struct.poly16x4x4_t, %struct.poly16x4x4_t* [[RETVAL]], align 8			// CHECK: [[TMP19:%.*]] = load %struct.poly16x4x4_t, %struct.poly16x4x4_t* [[RETVAL]], align 8
	// CHECK: ret %struct.poly16x4x4_t [[TMP19]]			// CHECK: ret %struct.poly16x4x4_t [[TMP19]]
	poly16x4x4_t test_vld4_lane_p16(poly16_t *a, poly16x4x4_t b) {			poly16x4x4_t test_vld4_lane_p16(poly16_t *a, poly16x4x4_t b) {
	return vld4_lane_p16(a, b, 3);			return vld4_lane_p16(a, b, 3);
	}			}

	// CHECK-LABEL: define %struct.poly64x1x4_t @test_vld4_lane_p64(i64* %a, [4 x <1 x i64>] %b.coerce) #0 {			// CHECK-LABEL: define %struct.poly64x1x4_t @test_vld4_lane_p64(i64* %a, [4 x <1 x i64>] %b.coerce) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.poly64x1x4_t, align 8			// CHECK: [[RETVAL:%.*]] = alloca %struct.poly64x1x4_t, align 8
	// CHECK: [[B:%.*]] = alloca %struct.poly64x1x4_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.poly64x1x4_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.poly64x1x4_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.poly64x1x4_t, align 8
	// CHECK: [[__RET:%.*]] = alloca %struct.poly64x1x4_t, align 8			// CHECK: [[__RET:%.*]] = alloca %struct.poly64x1x4_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly64x1x4_t, %struct.poly64x1x4_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly64x1x4_t, %struct.poly64x1x4_t* [[B]], i32 0, i32 0
	// CHECK: store [4 x <1 x i64>] [[B]].coerce, [4 x <1 x i64>]* [[COERCE_DIVE]], align 8			// CHECK: store [4 x <1 x i64>] [[B]].coerce, [4 x <1 x i64>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.poly64x1x4_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.poly64x1x4_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.poly64x1x4_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.poly64x1x4_t* [[B]] to i8*
	Show All 183 Lines
	// CHECK: [[TMP3:%.*]] = extractelement <2 x i64> [[TMP2]], i32 1			// CHECK: [[TMP3:%.*]] = extractelement <2 x i64> [[TMP2]], i32 1
	// CHECK: [[TMP4:%.*]] = bitcast i8* [[TMP0]] to i64*			// CHECK: [[TMP4:%.*]] = bitcast i8* [[TMP0]] to i64*
	// CHECK: store i64 [[TMP3]], i64* [[TMP4]]			// CHECK: store i64 [[TMP3]], i64* [[TMP4]]
	// CHECK: ret void			// CHECK: ret void
	void test_vst1q_lane_p64(poly64_t *a, poly64x2_t b) {			void test_vst1q_lane_p64(poly64_t *a, poly64x2_t b) {
	vst1q_lane_p64(a, b, 1);			vst1q_lane_p64(a, b, 1);
	}			}

	// CHECK-LABEL: define void @test_vst1_lane_u8(i8* %a, <8 x i8> %b) #0 {			// CHECK-LABEL: define void @test_vst1_lane_u8(i8* %a, <8 x i8> %b) #1 {
	// CHECK: [[TMP0:%.*]] = extractelement <8 x i8> %b, i32 7			// CHECK: [[TMP0:%.*]] = extractelement <8 x i8> %b, i32 7
	// CHECK: store i8 [[TMP0]], i8* %a			// CHECK: store i8 [[TMP0]], i8* %a
	// CHECK: ret void			// CHECK: ret void
	void test_vst1_lane_u8(uint8_t *a, uint8x8_t b) {			void test_vst1_lane_u8(uint8_t *a, uint8x8_t b) {
	vst1_lane_u8(a, b, 7);			vst1_lane_u8(a, b, 7);
	}			}

	// CHECK-LABEL: define void @test_vst1_lane_u16(i16* %a, <4 x i16> %b) #0 {			// CHECK-LABEL: define void @test_vst1_lane_u16(i16* %a, <4 x i16> %b) #1 {
	// CHECK: [[TMP0:%.*]] = bitcast i16* %a to i8*			// CHECK: [[TMP0:%.*]] = bitcast i16* %a to i8*
	// CHECK: [[TMP1:%.*]] = bitcast <4 x i16> %b to <8 x i8>			// CHECK: [[TMP1:%.*]] = bitcast <4 x i16> %b to <8 x i8>
	// CHECK: [[TMP2:%.*]] = bitcast <8 x i8> [[TMP1]] to <4 x i16>			// CHECK: [[TMP2:%.*]] = bitcast <8 x i8> [[TMP1]] to <4 x i16>
	// CHECK: [[TMP3:%.*]] = extractelement <4 x i16> [[TMP2]], i32 3			// CHECK: [[TMP3:%.*]] = extractelement <4 x i16> [[TMP2]], i32 3
	// CHECK: [[TMP4:%.*]] = bitcast i8* [[TMP0]] to i16*			// CHECK: [[TMP4:%.*]] = bitcast i8* [[TMP0]] to i16*
	// CHECK: store i16 [[TMP3]], i16* [[TMP4]]			// CHECK: store i16 [[TMP3]], i16* [[TMP4]]
	// CHECK: ret void			// CHECK: ret void
	void test_vst1_lane_u16(uint16_t *a, uint16x4_t b) {			void test_vst1_lane_u16(uint16_t *a, uint16x4_t b) {
	vst1_lane_u16(a, b, 3);			vst1_lane_u16(a, b, 3);
	}			}

	// CHECK-LABEL: define void @test_vst1_lane_u32(i32* %a, <2 x i32> %b) #0 {			// CHECK-LABEL: define void @test_vst1_lane_u32(i32* %a, <2 x i32> %b) #1 {
	// CHECK: [[TMP0:%.*]] = bitcast i32* %a to i8*			// CHECK: [[TMP0:%.*]] = bitcast i32* %a to i8*
	// CHECK: [[TMP1:%.*]] = bitcast <2 x i32> %b to <8 x i8>			// CHECK: [[TMP1:%.*]] = bitcast <2 x i32> %b to <8 x i8>
	// CHECK: [[TMP2:%.*]] = bitcast <8 x i8> [[TMP1]] to <2 x i32>			// CHECK: [[TMP2:%.*]] = bitcast <8 x i8> [[TMP1]] to <2 x i32>
	// CHECK: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP2]], i32 1			// CHECK: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP2]], i32 1
	// CHECK: [[TMP4:%.*]] = bitcast i8* [[TMP0]] to i32*			// CHECK: [[TMP4:%.*]] = bitcast i8* [[TMP0]] to i32*
	// CHECK: store i32 [[TMP3]], i32* [[TMP4]]			// CHECK: store i32 [[TMP3]], i32* [[TMP4]]
	// CHECK: ret void			// CHECK: ret void
	void test_vst1_lane_u32(uint32_t *a, uint32x2_t b) {			void test_vst1_lane_u32(uint32_t *a, uint32x2_t b) {
	vst1_lane_u32(a, b, 1);			vst1_lane_u32(a, b, 1);
	}			}

	// CHECK-LABEL: define void @test_vst1_lane_u64(i64* %a, <1 x i64> %b) #0 {			// CHECK-LABEL: define void @test_vst1_lane_u64(i64* %a, <1 x i64> %b) #1 {
	// CHECK: [[TMP0:%.*]] = bitcast i64* %a to i8*			// CHECK: [[TMP0:%.*]] = bitcast i64* %a to i8*
	// CHECK: [[TMP1:%.*]] = bitcast <1 x i64> %b to <8 x i8>			// CHECK: [[TMP1:%.*]] = bitcast <1 x i64> %b to <8 x i8>
	// CHECK: [[TMP2:%.*]] = bitcast <8 x i8> [[TMP1]] to <1 x i64>			// CHECK: [[TMP2:%.*]] = bitcast <8 x i8> [[TMP1]] to <1 x i64>
	// CHECK: [[TMP3:%.*]] = extractelement <1 x i64> [[TMP2]], i32 0			// CHECK: [[TMP3:%.*]] = extractelement <1 x i64> [[TMP2]], i32 0
	// CHECK: [[TMP4:%.*]] = bitcast i8* [[TMP0]] to i64*			// CHECK: [[TMP4:%.*]] = bitcast i8* [[TMP0]] to i64*
	// CHECK: store i64 [[TMP3]], i64* [[TMP4]]			// CHECK: store i64 [[TMP3]], i64* [[TMP4]]
	// CHECK: ret void			// CHECK: ret void
	void test_vst1_lane_u64(uint64_t *a, uint64x1_t b) {			void test_vst1_lane_u64(uint64_t *a, uint64x1_t b) {
	vst1_lane_u64(a, b, 0);			vst1_lane_u64(a, b, 0);
	}			}

	// CHECK-LABEL: define void @test_vst1_lane_s8(i8* %a, <8 x i8> %b) #0 {			// CHECK-LABEL: define void @test_vst1_lane_s8(i8* %a, <8 x i8> %b) #1 {
	// CHECK: [[TMP0:%.*]] = extractelement <8 x i8> %b, i32 7			// CHECK: [[TMP0:%.*]] = extractelement <8 x i8> %b, i32 7
	// CHECK: store i8 [[TMP0]], i8* %a			// CHECK: store i8 [[TMP0]], i8* %a
	// CHECK: ret void			// CHECK: ret void
	void test_vst1_lane_s8(int8_t *a, int8x8_t b) {			void test_vst1_lane_s8(int8_t *a, int8x8_t b) {
	vst1_lane_s8(a, b, 7);			vst1_lane_s8(a, b, 7);
	}			}

	// CHECK-LABEL: define void @test_vst1_lane_s16(i16* %a, <4 x i16> %b) #0 {			// CHECK-LABEL: define void @test_vst1_lane_s16(i16* %a, <4 x i16> %b) #1 {
	// CHECK: [[TMP0:%.*]] = bitcast i16* %a to i8*			// CHECK: [[TMP0:%.*]] = bitcast i16* %a to i8*
	// CHECK: [[TMP1:%.*]] = bitcast <4 x i16> %b to <8 x i8>			// CHECK: [[TMP1:%.*]] = bitcast <4 x i16> %b to <8 x i8>
	// CHECK: [[TMP2:%.*]] = bitcast <8 x i8> [[TMP1]] to <4 x i16>			// CHECK: [[TMP2:%.*]] = bitcast <8 x i8> [[TMP1]] to <4 x i16>
	// CHECK: [[TMP3:%.*]] = extractelement <4 x i16> [[TMP2]], i32 3			// CHECK: [[TMP3:%.*]] = extractelement <4 x i16> [[TMP2]], i32 3
	// CHECK: [[TMP4:%.*]] = bitcast i8* [[TMP0]] to i16*			// CHECK: [[TMP4:%.*]] = bitcast i8* [[TMP0]] to i16*
	// CHECK: store i16 [[TMP3]], i16* [[TMP4]]			// CHECK: store i16 [[TMP3]], i16* [[TMP4]]
	// CHECK: ret void			// CHECK: ret void
	void test_vst1_lane_s16(int16_t *a, int16x4_t b) {			void test_vst1_lane_s16(int16_t *a, int16x4_t b) {
	vst1_lane_s16(a, b, 3);			vst1_lane_s16(a, b, 3);
	}			}

	// CHECK-LABEL: define void @test_vst1_lane_s32(i32* %a, <2 x i32> %b) #0 {			// CHECK-LABEL: define void @test_vst1_lane_s32(i32* %a, <2 x i32> %b) #1 {
	// CHECK: [[TMP0:%.*]] = bitcast i32* %a to i8*			// CHECK: [[TMP0:%.*]] = bitcast i32* %a to i8*
	// CHECK: [[TMP1:%.*]] = bitcast <2 x i32> %b to <8 x i8>			// CHECK: [[TMP1:%.*]] = bitcast <2 x i32> %b to <8 x i8>
	// CHECK: [[TMP2:%.*]] = bitcast <8 x i8> [[TMP1]] to <2 x i32>			// CHECK: [[TMP2:%.*]] = bitcast <8 x i8> [[TMP1]] to <2 x i32>
	// CHECK: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP2]], i32 1			// CHECK: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP2]], i32 1
	// CHECK: [[TMP4:%.*]] = bitcast i8* [[TMP0]] to i32*			// CHECK: [[TMP4:%.*]] = bitcast i8* [[TMP0]] to i32*
	// CHECK: store i32 [[TMP3]], i32* [[TMP4]]			// CHECK: store i32 [[TMP3]], i32* [[TMP4]]
	// CHECK: ret void			// CHECK: ret void
	void test_vst1_lane_s32(int32_t *a, int32x2_t b) {			void test_vst1_lane_s32(int32_t *a, int32x2_t b) {
	vst1_lane_s32(a, b, 1);			vst1_lane_s32(a, b, 1);
	}			}

	// CHECK-LABEL: define void @test_vst1_lane_s64(i64* %a, <1 x i64> %b) #0 {			// CHECK-LABEL: define void @test_vst1_lane_s64(i64* %a, <1 x i64> %b) #1 {
	// CHECK: [[TMP0:%.*]] = bitcast i64* %a to i8*			// CHECK: [[TMP0:%.*]] = bitcast i64* %a to i8*
	// CHECK: [[TMP1:%.*]] = bitcast <1 x i64> %b to <8 x i8>			// CHECK: [[TMP1:%.*]] = bitcast <1 x i64> %b to <8 x i8>
	// CHECK: [[TMP2:%.*]] = bitcast <8 x i8> [[TMP1]] to <1 x i64>			// CHECK: [[TMP2:%.*]] = bitcast <8 x i8> [[TMP1]] to <1 x i64>
	// CHECK: [[TMP3:%.*]] = extractelement <1 x i64> [[TMP2]], i32 0			// CHECK: [[TMP3:%.*]] = extractelement <1 x i64> [[TMP2]], i32 0
	// CHECK: [[TMP4:%.*]] = bitcast i8* [[TMP0]] to i64*			// CHECK: [[TMP4:%.*]] = bitcast i8* [[TMP0]] to i64*
	// CHECK: store i64 [[TMP3]], i64* [[TMP4]]			// CHECK: store i64 [[TMP3]], i64* [[TMP4]]
	// CHECK: ret void			// CHECK: ret void
	void test_vst1_lane_s64(int64_t *a, int64x1_t b) {			void test_vst1_lane_s64(int64_t *a, int64x1_t b) {
	vst1_lane_s64(a, b, 0);			vst1_lane_s64(a, b, 0);
	}			}

	// CHECK-LABEL: define void @test_vst1_lane_f16(half* %a, <4 x half> %b) #0 {			// CHECK-LABEL: define void @test_vst1_lane_f16(half* %a, <4 x half> %b) #1 {
	// CHECK: [[TMP0:%.*]] = bitcast half* %a to i8*			// CHECK: [[TMP0:%.*]] = bitcast half* %a to i8*
	// CHECK: [[TMP1:%.*]] = bitcast <4 x half> %b to <8 x i8>			// CHECK: [[TMP1:%.*]] = bitcast <4 x half> %b to <8 x i8>
	// CHECK: [[TMP2:%.*]] = bitcast <8 x i8> [[TMP1]] to <4 x half>			// CHECK: [[TMP2:%.*]] = bitcast <8 x i8> [[TMP1]] to <4 x half>
	// CHECK: [[TMP3:%.*]] = extractelement <4 x half> [[TMP2]], i32 3			// CHECK: [[TMP3:%.*]] = extractelement <4 x half> [[TMP2]], i32 3
	// CHECK: [[TMP4:%.*]] = bitcast i8* [[TMP0]] to half*			// CHECK: [[TMP4:%.*]] = bitcast i8* [[TMP0]] to half*
	// CHECK: store half [[TMP3]], half* [[TMP4]]			// CHECK: store half [[TMP3]], half* [[TMP4]]
	// CHECK: ret void			// CHECK: ret void
	void test_vst1_lane_f16(float16_t *a, float16x4_t b) {			void test_vst1_lane_f16(float16_t *a, float16x4_t b) {
	vst1_lane_f16(a, b, 3);			vst1_lane_f16(a, b, 3);
	}			}

	// CHECK-LABEL: define void @test_vst1_lane_f32(float* %a, <2 x float> %b) #0 {			// CHECK-LABEL: define void @test_vst1_lane_f32(float* %a, <2 x float> %b) #1 {
	// CHECK: [[TMP0:%.*]] = bitcast float* %a to i8*			// CHECK: [[TMP0:%.*]] = bitcast float* %a to i8*
	// CHECK: [[TMP1:%.*]] = bitcast <2 x float> %b to <8 x i8>			// CHECK: [[TMP1:%.*]] = bitcast <2 x float> %b to <8 x i8>
	// CHECK: [[TMP2:%.*]] = bitcast <8 x i8> [[TMP1]] to <2 x float>			// CHECK: [[TMP2:%.*]] = bitcast <8 x i8> [[TMP1]] to <2 x float>
	// CHECK: [[TMP3:%.*]] = extractelement <2 x float> [[TMP2]], i32 1			// CHECK: [[TMP3:%.*]] = extractelement <2 x float> [[TMP2]], i32 1
	// CHECK: [[TMP4:%.*]] = bitcast i8* [[TMP0]] to float*			// CHECK: [[TMP4:%.*]] = bitcast i8* [[TMP0]] to float*
	// CHECK: store float [[TMP3]], float* [[TMP4]]			// CHECK: store float [[TMP3]], float* [[TMP4]]
	// CHECK: ret void			// CHECK: ret void
	void test_vst1_lane_f32(float32_t *a, float32x2_t b) {			void test_vst1_lane_f32(float32_t *a, float32x2_t b) {
	vst1_lane_f32(a, b, 1);			vst1_lane_f32(a, b, 1);
	}			}

	// CHECK-LABEL: define void @test_vst1_lane_f64(double* %a, <1 x double> %b) #0 {			// CHECK-LABEL: define void @test_vst1_lane_f64(double* %a, <1 x double> %b) #1 {
	// CHECK: [[TMP0:%.*]] = bitcast double* %a to i8*			// CHECK: [[TMP0:%.*]] = bitcast double* %a to i8*
	// CHECK: [[TMP1:%.*]] = bitcast <1 x double> %b to <8 x i8>			// CHECK: [[TMP1:%.*]] = bitcast <1 x double> %b to <8 x i8>
	// CHECK: [[TMP2:%.*]] = bitcast <8 x i8> [[TMP1]] to <1 x double>			// CHECK: [[TMP2:%.*]] = bitcast <8 x i8> [[TMP1]] to <1 x double>
	// CHECK: [[TMP3:%.*]] = extractelement <1 x double> [[TMP2]], i32 0			// CHECK: [[TMP3:%.*]] = extractelement <1 x double> [[TMP2]], i32 0
	// CHECK: [[TMP4:%.*]] = bitcast i8* [[TMP0]] to double*			// CHECK: [[TMP4:%.*]] = bitcast i8* [[TMP0]] to double*
	// CHECK: store double [[TMP3]], double* [[TMP4]]			// CHECK: store double [[TMP3]], double* [[TMP4]]
	// CHECK: ret void			// CHECK: ret void
	void test_vst1_lane_f64(float64_t *a, float64x1_t b) {			void test_vst1_lane_f64(float64_t *a, float64x1_t b) {
	vst1_lane_f64(a, b, 0);			vst1_lane_f64(a, b, 0);
	}			}

	// CHECK-LABEL: define void @test_vst1_lane_p8(i8* %a, <8 x i8> %b) #0 {			// CHECK-LABEL: define void @test_vst1_lane_p8(i8* %a, <8 x i8> %b) #1 {
	// CHECK: [[TMP0:%.*]] = extractelement <8 x i8> %b, i32 7			// CHECK: [[TMP0:%.*]] = extractelement <8 x i8> %b, i32 7
	// CHECK: store i8 [[TMP0]], i8* %a			// CHECK: store i8 [[TMP0]], i8* %a
	// CHECK: ret void			// CHECK: ret void
	void test_vst1_lane_p8(poly8_t *a, poly8x8_t b) {			void test_vst1_lane_p8(poly8_t *a, poly8x8_t b) {
	vst1_lane_p8(a, b, 7);			vst1_lane_p8(a, b, 7);
	}			}

	// CHECK-LABEL: define void @test_vst1_lane_p16(i16* %a, <4 x i16> %b) #0 {			// CHECK-LABEL: define void @test_vst1_lane_p16(i16* %a, <4 x i16> %b) #1 {
	// CHECK: [[TMP0:%.*]] = bitcast i16* %a to i8*			// CHECK: [[TMP0:%.*]] = bitcast i16* %a to i8*
	// CHECK: [[TMP1:%.*]] = bitcast <4 x i16> %b to <8 x i8>			// CHECK: [[TMP1:%.*]] = bitcast <4 x i16> %b to <8 x i8>
	// CHECK: [[TMP2:%.*]] = bitcast <8 x i8> [[TMP1]] to <4 x i16>			// CHECK: [[TMP2:%.*]] = bitcast <8 x i8> [[TMP1]] to <4 x i16>
	// CHECK: [[TMP3:%.*]] = extractelement <4 x i16> [[TMP2]], i32 3			// CHECK: [[TMP3:%.*]] = extractelement <4 x i16> [[TMP2]], i32 3
	// CHECK: [[TMP4:%.*]] = bitcast i8* [[TMP0]] to i16*			// CHECK: [[TMP4:%.*]] = bitcast i8* [[TMP0]] to i16*
	// CHECK: store i16 [[TMP3]], i16* [[TMP4]]			// CHECK: store i16 [[TMP3]], i16* [[TMP4]]
	// CHECK: ret void			// CHECK: ret void
	void test_vst1_lane_p16(poly16_t *a, poly16x4_t b) {			void test_vst1_lane_p16(poly16_t *a, poly16x4_t b) {
	vst1_lane_p16(a, b, 3);			vst1_lane_p16(a, b, 3);
	}			}

	// CHECK-LABEL: define void @test_vst1_lane_p64(i64* %a, <1 x i64> %b) #0 {			// CHECK-LABEL: define void @test_vst1_lane_p64(i64* %a, <1 x i64> %b) #1 {
	// CHECK: [[TMP0:%.*]] = bitcast i64* %a to i8*			// CHECK: [[TMP0:%.*]] = bitcast i64* %a to i8*
	// CHECK: [[TMP1:%.*]] = bitcast <1 x i64> %b to <8 x i8>			// CHECK: [[TMP1:%.*]] = bitcast <1 x i64> %b to <8 x i8>
	// CHECK: [[TMP2:%.*]] = bitcast <8 x i8> [[TMP1]] to <1 x i64>			// CHECK: [[TMP2:%.*]] = bitcast <8 x i8> [[TMP1]] to <1 x i64>
	// CHECK: [[TMP3:%.*]] = extractelement <1 x i64> [[TMP2]], i32 0			// CHECK: [[TMP3:%.*]] = extractelement <1 x i64> [[TMP2]], i32 0
	// CHECK: [[TMP4:%.*]] = bitcast i8* [[TMP0]] to i64*			// CHECK: [[TMP4:%.*]] = bitcast i8* [[TMP0]] to i64*
	// CHECK: store i64 [[TMP3]], i64* [[TMP4]]			// CHECK: store i64 [[TMP3]], i64* [[TMP4]]
	// CHECK: ret void			// CHECK: ret void
	void test_vst1_lane_p64(poly64_t *a, poly64x1_t b) {			void test_vst1_lane_p64(poly64_t *a, poly64x1_t b) {
	vst1_lane_p64(a, b, 0);			vst1_lane_p64(a, b, 0);
	}			}

	// CHECK-LABEL: define void @test_vst2q_lane_u8(i8* %a, [2 x <16 x i8>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst2q_lane_u8(i8* %a, [2 x <16 x i8>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.uint8x16x2_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.uint8x16x2_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.uint8x16x2_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.uint8x16x2_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint8x16x2_t, %struct.uint8x16x2_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint8x16x2_t, %struct.uint8x16x2_t* [[B]], i32 0, i32 0
	// CHECK: store [2 x <16 x i8>] [[B]].coerce, [2 x <16 x i8>]* [[COERCE_DIVE]], align 16			// CHECK: store [2 x <16 x i8>] [[B]].coerce, [2 x <16 x i8>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.uint8x16x2_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.uint8x16x2_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.uint8x16x2_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.uint8x16x2_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 32, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 32, i1 false)
	// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.uint8x16x2_t, %struct.uint8x16x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.uint8x16x2_t, %struct.uint8x16x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <16 x i8>], [2 x <16 x i8>]* [[VAL]], i64 0, i64 0			// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <16 x i8>], [2 x <16 x i8>]* [[VAL]], i64 0, i64 0
	// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX]], align 16			// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX]], align 16
	// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.uint8x16x2_t, %struct.uint8x16x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.uint8x16x2_t, %struct.uint8x16x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <16 x i8>], [2 x <16 x i8>]* [[VAL1]], i64 0, i64 1			// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <16 x i8>], [2 x <16 x i8>]* [[VAL1]], i64 0, i64 1
	// CHECK: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2]], align 16			// CHECK: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2]], align 16
	// CHECK: call void @llvm.aarch64.neon.st2lane.v16i8.p0i8(<16 x i8> [[TMP2]], <16 x i8> [[TMP3]], i64 15, i8* %a)			// CHECK: call void @llvm.aarch64.neon.st2lane.v16i8.p0i8(<16 x i8> [[TMP2]], <16 x i8> [[TMP3]], i64 15, i8* %a)
	// CHECK: ret void			// CHECK: ret void
	void test_vst2q_lane_u8(uint8_t *a, uint8x16x2_t b) {			void test_vst2q_lane_u8(uint8_t *a, uint8x16x2_t b) {
	vst2q_lane_u8(a, b, 15);			vst2q_lane_u8(a, b, 15);
	}			}

	// CHECK-LABEL: define void @test_vst2q_lane_u16(i16* %a, [2 x <8 x i16>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst2q_lane_u16(i16* %a, [2 x <8 x i16>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.uint16x8x2_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.uint16x8x2_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.uint16x8x2_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.uint16x8x2_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint16x8x2_t, %struct.uint16x8x2_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint16x8x2_t, %struct.uint16x8x2_t* [[B]], i32 0, i32 0
	// CHECK: store [2 x <8 x i16>] [[B]].coerce, [2 x <8 x i16>]* [[COERCE_DIVE]], align 16			// CHECK: store [2 x <8 x i16>] [[B]].coerce, [2 x <8 x i16>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.uint16x8x2_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.uint16x8x2_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.uint16x8x2_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.uint16x8x2_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 32, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 32, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast i16* %a to i8*			// CHECK: [[TMP2:%.*]] = bitcast i16* %a to i8*
	// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.uint16x8x2_t, %struct.uint16x8x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.uint16x8x2_t, %struct.uint16x8x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <8 x i16>], [2 x <8 x i16>]* [[VAL]], i64 0, i64 0			// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <8 x i16>], [2 x <8 x i16>]* [[VAL]], i64 0, i64 0
	// CHECK: [[TMP3:%.*]] = load <8 x i16>, <8 x i16>* [[ARRAYIDX]], align 16			// CHECK: [[TMP3:%.*]] = load <8 x i16>, <8 x i16>* [[ARRAYIDX]], align 16
	// CHECK: [[TMP4:%.*]] = bitcast <8 x i16> [[TMP3]] to <16 x i8>			// CHECK: [[TMP4:%.*]] = bitcast <8 x i16> [[TMP3]] to <16 x i8>
	// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.uint16x8x2_t, %struct.uint16x8x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.uint16x8x2_t, %struct.uint16x8x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <8 x i16>], [2 x <8 x i16>]* [[VAL1]], i64 0, i64 1			// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <8 x i16>], [2 x <8 x i16>]* [[VAL1]], i64 0, i64 1
	// CHECK: [[TMP5:%.*]] = load <8 x i16>, <8 x i16>* [[ARRAYIDX2]], align 16			// CHECK: [[TMP5:%.*]] = load <8 x i16>, <8 x i16>* [[ARRAYIDX2]], align 16
	// CHECK: [[TMP6:%.*]] = bitcast <8 x i16> [[TMP5]] to <16 x i8>			// CHECK: [[TMP6:%.*]] = bitcast <8 x i16> [[TMP5]] to <16 x i8>
	// CHECK: [[TMP7:%.*]] = bitcast <16 x i8> [[TMP4]] to <8 x i16>			// CHECK: [[TMP7:%.*]] = bitcast <16 x i8> [[TMP4]] to <8 x i16>
	// CHECK: [[TMP8:%.*]] = bitcast <16 x i8> [[TMP6]] to <8 x i16>			// CHECK: [[TMP8:%.*]] = bitcast <16 x i8> [[TMP6]] to <8 x i16>
	// CHECK: call void @llvm.aarch64.neon.st2lane.v8i16.p0i8(<8 x i16> [[TMP7]], <8 x i16> [[TMP8]], i64 7, i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st2lane.v8i16.p0i8(<8 x i16> [[TMP7]], <8 x i16> [[TMP8]], i64 7, i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst2q_lane_u16(uint16_t *a, uint16x8x2_t b) {			void test_vst2q_lane_u16(uint16_t *a, uint16x8x2_t b) {
	vst2q_lane_u16(a, b, 7);			vst2q_lane_u16(a, b, 7);
	}			}

	// CHECK-LABEL: define void @test_vst2q_lane_u32(i32* %a, [2 x <4 x i32>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst2q_lane_u32(i32* %a, [2 x <4 x i32>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.uint32x4x2_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.uint32x4x2_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.uint32x4x2_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.uint32x4x2_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint32x4x2_t, %struct.uint32x4x2_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint32x4x2_t, %struct.uint32x4x2_t* [[B]], i32 0, i32 0
	// CHECK: store [2 x <4 x i32>] [[B]].coerce, [2 x <4 x i32>]* [[COERCE_DIVE]], align 16			// CHECK: store [2 x <4 x i32>] [[B]].coerce, [2 x <4 x i32>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.uint32x4x2_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.uint32x4x2_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.uint32x4x2_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.uint32x4x2_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 32, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 32, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast i32* %a to i8*			// CHECK: [[TMP2:%.*]] = bitcast i32* %a to i8*
	// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.uint32x4x2_t, %struct.uint32x4x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.uint32x4x2_t, %struct.uint32x4x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <4 x i32>], [2 x <4 x i32>]* [[VAL]], i64 0, i64 0			// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <4 x i32>], [2 x <4 x i32>]* [[VAL]], i64 0, i64 0
	// CHECK: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* [[ARRAYIDX]], align 16			// CHECK: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* [[ARRAYIDX]], align 16
	// CHECK: [[TMP4:%.*]] = bitcast <4 x i32> [[TMP3]] to <16 x i8>			// CHECK: [[TMP4:%.*]] = bitcast <4 x i32> [[TMP3]] to <16 x i8>
	// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.uint32x4x2_t, %struct.uint32x4x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.uint32x4x2_t, %struct.uint32x4x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <4 x i32>], [2 x <4 x i32>]* [[VAL1]], i64 0, i64 1			// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <4 x i32>], [2 x <4 x i32>]* [[VAL1]], i64 0, i64 1
	// CHECK: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* [[ARRAYIDX2]], align 16			// CHECK: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* [[ARRAYIDX2]], align 16
	// CHECK: [[TMP6:%.*]] = bitcast <4 x i32> [[TMP5]] to <16 x i8>			// CHECK: [[TMP6:%.*]] = bitcast <4 x i32> [[TMP5]] to <16 x i8>
	// CHECK: [[TMP7:%.*]] = bitcast <16 x i8> [[TMP4]] to <4 x i32>			// CHECK: [[TMP7:%.*]] = bitcast <16 x i8> [[TMP4]] to <4 x i32>
	// CHECK: [[TMP8:%.*]] = bitcast <16 x i8> [[TMP6]] to <4 x i32>			// CHECK: [[TMP8:%.*]] = bitcast <16 x i8> [[TMP6]] to <4 x i32>
	// CHECK: call void @llvm.aarch64.neon.st2lane.v4i32.p0i8(<4 x i32> [[TMP7]], <4 x i32> [[TMP8]], i64 3, i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st2lane.v4i32.p0i8(<4 x i32> [[TMP7]], <4 x i32> [[TMP8]], i64 3, i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst2q_lane_u32(uint32_t *a, uint32x4x2_t b) {			void test_vst2q_lane_u32(uint32_t *a, uint32x4x2_t b) {
	vst2q_lane_u32(a, b, 3);			vst2q_lane_u32(a, b, 3);
	}			}

	// CHECK-LABEL: define void @test_vst2q_lane_u64(i64* %a, [2 x <2 x i64>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst2q_lane_u64(i64* %a, [2 x <2 x i64>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.uint64x2x2_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.uint64x2x2_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.uint64x2x2_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.uint64x2x2_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint64x2x2_t, %struct.uint64x2x2_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint64x2x2_t, %struct.uint64x2x2_t* [[B]], i32 0, i32 0
	// CHECK: store [2 x <2 x i64>] [[B]].coerce, [2 x <2 x i64>]* [[COERCE_DIVE]], align 16			// CHECK: store [2 x <2 x i64>] [[B]].coerce, [2 x <2 x i64>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.uint64x2x2_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.uint64x2x2_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.uint64x2x2_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.uint64x2x2_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 32, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 32, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast i64* %a to i8*			// CHECK: [[TMP2:%.*]] = bitcast i64* %a to i8*
	// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.uint64x2x2_t, %struct.uint64x2x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.uint64x2x2_t, %struct.uint64x2x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <2 x i64>], [2 x <2 x i64>]* [[VAL]], i64 0, i64 0			// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <2 x i64>], [2 x <2 x i64>]* [[VAL]], i64 0, i64 0
	// CHECK: [[TMP3:%.*]] = load <2 x i64>, <2 x i64>* [[ARRAYIDX]], align 16			// CHECK: [[TMP3:%.*]] = load <2 x i64>, <2 x i64>* [[ARRAYIDX]], align 16
	// CHECK: [[TMP4:%.*]] = bitcast <2 x i64> [[TMP3]] to <16 x i8>			// CHECK: [[TMP4:%.*]] = bitcast <2 x i64> [[TMP3]] to <16 x i8>
	// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.uint64x2x2_t, %struct.uint64x2x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.uint64x2x2_t, %struct.uint64x2x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <2 x i64>], [2 x <2 x i64>]* [[VAL1]], i64 0, i64 1			// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <2 x i64>], [2 x <2 x i64>]* [[VAL1]], i64 0, i64 1
	// CHECK: [[TMP5:%.*]] = load <2 x i64>, <2 x i64>* [[ARRAYIDX2]], align 16			// CHECK: [[TMP5:%.*]] = load <2 x i64>, <2 x i64>* [[ARRAYIDX2]], align 16
	// CHECK: [[TMP6:%.*]] = bitcast <2 x i64> [[TMP5]] to <16 x i8>			// CHECK: [[TMP6:%.*]] = bitcast <2 x i64> [[TMP5]] to <16 x i8>
	// CHECK: [[TMP7:%.*]] = bitcast <16 x i8> [[TMP4]] to <2 x i64>			// CHECK: [[TMP7:%.*]] = bitcast <16 x i8> [[TMP4]] to <2 x i64>
	// CHECK: [[TMP8:%.*]] = bitcast <16 x i8> [[TMP6]] to <2 x i64>			// CHECK: [[TMP8:%.*]] = bitcast <16 x i8> [[TMP6]] to <2 x i64>
	// CHECK: call void @llvm.aarch64.neon.st2lane.v2i64.p0i8(<2 x i64> [[TMP7]], <2 x i64> [[TMP8]], i64 1, i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st2lane.v2i64.p0i8(<2 x i64> [[TMP7]], <2 x i64> [[TMP8]], i64 1, i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst2q_lane_u64(uint64_t *a, uint64x2x2_t b) {			void test_vst2q_lane_u64(uint64_t *a, uint64x2x2_t b) {
	vst2q_lane_u64(a, b, 1);			vst2q_lane_u64(a, b, 1);
	}			}

	// CHECK-LABEL: define void @test_vst2q_lane_s8(i8* %a, [2 x <16 x i8>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst2q_lane_s8(i8* %a, [2 x <16 x i8>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.int8x16x2_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.int8x16x2_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.int8x16x2_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.int8x16x2_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x16x2_t, %struct.int8x16x2_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x16x2_t, %struct.int8x16x2_t* [[B]], i32 0, i32 0
	// CHECK: store [2 x <16 x i8>] [[B]].coerce, [2 x <16 x i8>]* [[COERCE_DIVE]], align 16			// CHECK: store [2 x <16 x i8>] [[B]].coerce, [2 x <16 x i8>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.int8x16x2_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.int8x16x2_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.int8x16x2_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.int8x16x2_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 32, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 32, i1 false)
	// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.int8x16x2_t, %struct.int8x16x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.int8x16x2_t, %struct.int8x16x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <16 x i8>], [2 x <16 x i8>]* [[VAL]], i64 0, i64 0			// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <16 x i8>], [2 x <16 x i8>]* [[VAL]], i64 0, i64 0
	// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX]], align 16			// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX]], align 16
	// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.int8x16x2_t, %struct.int8x16x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.int8x16x2_t, %struct.int8x16x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <16 x i8>], [2 x <16 x i8>]* [[VAL1]], i64 0, i64 1			// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <16 x i8>], [2 x <16 x i8>]* [[VAL1]], i64 0, i64 1
	// CHECK: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2]], align 16			// CHECK: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2]], align 16
	// CHECK: call void @llvm.aarch64.neon.st2lane.v16i8.p0i8(<16 x i8> [[TMP2]], <16 x i8> [[TMP3]], i64 15, i8* %a)			// CHECK: call void @llvm.aarch64.neon.st2lane.v16i8.p0i8(<16 x i8> [[TMP2]], <16 x i8> [[TMP3]], i64 15, i8* %a)
	// CHECK: ret void			// CHECK: ret void
	void test_vst2q_lane_s8(int8_t *a, int8x16x2_t b) {			void test_vst2q_lane_s8(int8_t *a, int8x16x2_t b) {
	vst2q_lane_s8(a, b, 15);			vst2q_lane_s8(a, b, 15);
	}			}

	// CHECK-LABEL: define void @test_vst2q_lane_s16(i16* %a, [2 x <8 x i16>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst2q_lane_s16(i16* %a, [2 x <8 x i16>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.int16x8x2_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.int16x8x2_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.int16x8x2_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.int16x8x2_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int16x8x2_t, %struct.int16x8x2_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int16x8x2_t, %struct.int16x8x2_t* [[B]], i32 0, i32 0
	// CHECK: store [2 x <8 x i16>] [[B]].coerce, [2 x <8 x i16>]* [[COERCE_DIVE]], align 16			// CHECK: store [2 x <8 x i16>] [[B]].coerce, [2 x <8 x i16>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.int16x8x2_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.int16x8x2_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.int16x8x2_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.int16x8x2_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 32, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 32, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast i16* %a to i8*			// CHECK: [[TMP2:%.*]] = bitcast i16* %a to i8*
	// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.int16x8x2_t, %struct.int16x8x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.int16x8x2_t, %struct.int16x8x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <8 x i16>], [2 x <8 x i16>]* [[VAL]], i64 0, i64 0			// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <8 x i16>], [2 x <8 x i16>]* [[VAL]], i64 0, i64 0
	// CHECK: [[TMP3:%.*]] = load <8 x i16>, <8 x i16>* [[ARRAYIDX]], align 16			// CHECK: [[TMP3:%.*]] = load <8 x i16>, <8 x i16>* [[ARRAYIDX]], align 16
	// CHECK: [[TMP4:%.*]] = bitcast <8 x i16> [[TMP3]] to <16 x i8>			// CHECK: [[TMP4:%.*]] = bitcast <8 x i16> [[TMP3]] to <16 x i8>
	// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.int16x8x2_t, %struct.int16x8x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.int16x8x2_t, %struct.int16x8x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <8 x i16>], [2 x <8 x i16>]* [[VAL1]], i64 0, i64 1			// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <8 x i16>], [2 x <8 x i16>]* [[VAL1]], i64 0, i64 1
	// CHECK: [[TMP5:%.*]] = load <8 x i16>, <8 x i16>* [[ARRAYIDX2]], align 16			// CHECK: [[TMP5:%.*]] = load <8 x i16>, <8 x i16>* [[ARRAYIDX2]], align 16
	// CHECK: [[TMP6:%.*]] = bitcast <8 x i16> [[TMP5]] to <16 x i8>			// CHECK: [[TMP6:%.*]] = bitcast <8 x i16> [[TMP5]] to <16 x i8>
	// CHECK: [[TMP7:%.*]] = bitcast <16 x i8> [[TMP4]] to <8 x i16>			// CHECK: [[TMP7:%.*]] = bitcast <16 x i8> [[TMP4]] to <8 x i16>
	// CHECK: [[TMP8:%.*]] = bitcast <16 x i8> [[TMP6]] to <8 x i16>			// CHECK: [[TMP8:%.*]] = bitcast <16 x i8> [[TMP6]] to <8 x i16>
	// CHECK: call void @llvm.aarch64.neon.st2lane.v8i16.p0i8(<8 x i16> [[TMP7]], <8 x i16> [[TMP8]], i64 7, i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st2lane.v8i16.p0i8(<8 x i16> [[TMP7]], <8 x i16> [[TMP8]], i64 7, i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst2q_lane_s16(int16_t *a, int16x8x2_t b) {			void test_vst2q_lane_s16(int16_t *a, int16x8x2_t b) {
	vst2q_lane_s16(a, b, 7);			vst2q_lane_s16(a, b, 7);
	}			}

	// CHECK-LABEL: define void @test_vst2q_lane_s32(i32* %a, [2 x <4 x i32>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst2q_lane_s32(i32* %a, [2 x <4 x i32>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.int32x4x2_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.int32x4x2_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.int32x4x2_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.int32x4x2_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int32x4x2_t, %struct.int32x4x2_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int32x4x2_t, %struct.int32x4x2_t* [[B]], i32 0, i32 0
	// CHECK: store [2 x <4 x i32>] [[B]].coerce, [2 x <4 x i32>]* [[COERCE_DIVE]], align 16			// CHECK: store [2 x <4 x i32>] [[B]].coerce, [2 x <4 x i32>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.int32x4x2_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.int32x4x2_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.int32x4x2_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.int32x4x2_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 32, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 32, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast i32* %a to i8*			// CHECK: [[TMP2:%.*]] = bitcast i32* %a to i8*
	// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.int32x4x2_t, %struct.int32x4x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.int32x4x2_t, %struct.int32x4x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <4 x i32>], [2 x <4 x i32>]* [[VAL]], i64 0, i64 0			// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <4 x i32>], [2 x <4 x i32>]* [[VAL]], i64 0, i64 0
	// CHECK: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* [[ARRAYIDX]], align 16			// CHECK: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* [[ARRAYIDX]], align 16
	// CHECK: [[TMP4:%.*]] = bitcast <4 x i32> [[TMP3]] to <16 x i8>			// CHECK: [[TMP4:%.*]] = bitcast <4 x i32> [[TMP3]] to <16 x i8>
	// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.int32x4x2_t, %struct.int32x4x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.int32x4x2_t, %struct.int32x4x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <4 x i32>], [2 x <4 x i32>]* [[VAL1]], i64 0, i64 1			// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <4 x i32>], [2 x <4 x i32>]* [[VAL1]], i64 0, i64 1
	// CHECK: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* [[ARRAYIDX2]], align 16			// CHECK: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* [[ARRAYIDX2]], align 16
	// CHECK: [[TMP6:%.*]] = bitcast <4 x i32> [[TMP5]] to <16 x i8>			// CHECK: [[TMP6:%.*]] = bitcast <4 x i32> [[TMP5]] to <16 x i8>
	// CHECK: [[TMP7:%.*]] = bitcast <16 x i8> [[TMP4]] to <4 x i32>			// CHECK: [[TMP7:%.*]] = bitcast <16 x i8> [[TMP4]] to <4 x i32>
	// CHECK: [[TMP8:%.*]] = bitcast <16 x i8> [[TMP6]] to <4 x i32>			// CHECK: [[TMP8:%.*]] = bitcast <16 x i8> [[TMP6]] to <4 x i32>
	// CHECK: call void @llvm.aarch64.neon.st2lane.v4i32.p0i8(<4 x i32> [[TMP7]], <4 x i32> [[TMP8]], i64 3, i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st2lane.v4i32.p0i8(<4 x i32> [[TMP7]], <4 x i32> [[TMP8]], i64 3, i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst2q_lane_s32(int32_t *a, int32x4x2_t b) {			void test_vst2q_lane_s32(int32_t *a, int32x4x2_t b) {
	vst2q_lane_s32(a, b, 3);			vst2q_lane_s32(a, b, 3);
	}			}

	// CHECK-LABEL: define void @test_vst2q_lane_s64(i64* %a, [2 x <2 x i64>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst2q_lane_s64(i64* %a, [2 x <2 x i64>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.int64x2x2_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.int64x2x2_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.int64x2x2_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.int64x2x2_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int64x2x2_t, %struct.int64x2x2_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int64x2x2_t, %struct.int64x2x2_t* [[B]], i32 0, i32 0
	// CHECK: store [2 x <2 x i64>] [[B]].coerce, [2 x <2 x i64>]* [[COERCE_DIVE]], align 16			// CHECK: store [2 x <2 x i64>] [[B]].coerce, [2 x <2 x i64>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.int64x2x2_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.int64x2x2_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.int64x2x2_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.int64x2x2_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 32, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 32, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast i64* %a to i8*			// CHECK: [[TMP2:%.*]] = bitcast i64* %a to i8*
	// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.int64x2x2_t, %struct.int64x2x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.int64x2x2_t, %struct.int64x2x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <2 x i64>], [2 x <2 x i64>]* [[VAL]], i64 0, i64 0			// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <2 x i64>], [2 x <2 x i64>]* [[VAL]], i64 0, i64 0
	// CHECK: [[TMP3:%.*]] = load <2 x i64>, <2 x i64>* [[ARRAYIDX]], align 16			// CHECK: [[TMP3:%.*]] = load <2 x i64>, <2 x i64>* [[ARRAYIDX]], align 16
	// CHECK: [[TMP4:%.*]] = bitcast <2 x i64> [[TMP3]] to <16 x i8>			// CHECK: [[TMP4:%.*]] = bitcast <2 x i64> [[TMP3]] to <16 x i8>
	// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.int64x2x2_t, %struct.int64x2x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.int64x2x2_t, %struct.int64x2x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <2 x i64>], [2 x <2 x i64>]* [[VAL1]], i64 0, i64 1			// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <2 x i64>], [2 x <2 x i64>]* [[VAL1]], i64 0, i64 1
	// CHECK: [[TMP5:%.*]] = load <2 x i64>, <2 x i64>* [[ARRAYIDX2]], align 16			// CHECK: [[TMP5:%.*]] = load <2 x i64>, <2 x i64>* [[ARRAYIDX2]], align 16
	// CHECK: [[TMP6:%.*]] = bitcast <2 x i64> [[TMP5]] to <16 x i8>			// CHECK: [[TMP6:%.*]] = bitcast <2 x i64> [[TMP5]] to <16 x i8>
	// CHECK: [[TMP7:%.*]] = bitcast <16 x i8> [[TMP4]] to <2 x i64>			// CHECK: [[TMP7:%.*]] = bitcast <16 x i8> [[TMP4]] to <2 x i64>
	// CHECK: [[TMP8:%.*]] = bitcast <16 x i8> [[TMP6]] to <2 x i64>			// CHECK: [[TMP8:%.*]] = bitcast <16 x i8> [[TMP6]] to <2 x i64>
	// CHECK: call void @llvm.aarch64.neon.st2lane.v2i64.p0i8(<2 x i64> [[TMP7]], <2 x i64> [[TMP8]], i64 1, i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st2lane.v2i64.p0i8(<2 x i64> [[TMP7]], <2 x i64> [[TMP8]], i64 1, i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst2q_lane_s64(int64_t *a, int64x2x2_t b) {			void test_vst2q_lane_s64(int64_t *a, int64x2x2_t b) {
	vst2q_lane_s64(a, b, 1);			vst2q_lane_s64(a, b, 1);
	}			}

	// CHECK-LABEL: define void @test_vst2q_lane_f16(half* %a, [2 x <8 x half>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst2q_lane_f16(half* %a, [2 x <8 x half>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.float16x8x2_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.float16x8x2_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.float16x8x2_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.float16x8x2_t, align 16
	// CHECK: [[COERCE_DIVE:%.]] = getelementptr inbounds %struct.float16x8x2_t, %struct.float16x8x2_t [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.]] = getelementptr inbounds %struct.float16x8x2_t, %struct.float16x8x2_t [[B]], i32 0, i32 0
	// CHECK: store [2 x <8 x half>] [[B]].coerce, [2 x <8 x half>]* [[COERCE_DIVE]], align 16			// CHECK: store [2 x <8 x half>] [[B]].coerce, [2 x <8 x half>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.]] = bitcast %struct.float16x8x2_t [[__S1]] to i8*			// CHECK: [[TMP0:%.]] = bitcast %struct.float16x8x2_t [[__S1]] to i8*
	// CHECK: [[TMP1:%.]] = bitcast %struct.float16x8x2_t [[B]] to i8*			// CHECK: [[TMP1:%.]] = bitcast %struct.float16x8x2_t [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 32, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 32, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast half* %a to i8*			// CHECK: [[TMP2:%.*]] = bitcast half* %a to i8*
	// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.float16x8x2_t, %struct.float16x8x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.float16x8x2_t, %struct.float16x8x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <8 x half>], [2 x <8 x half>]* [[VAL]], i64 0, i64 0			// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <8 x half>], [2 x <8 x half>]* [[VAL]], i64 0, i64 0
	// CHECK: [[TMP3:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX]], align 16			// CHECK: [[TMP3:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX]], align 16
	// CHECK: [[TMP4:%.*]] = bitcast <8 x half> [[TMP3]] to <16 x i8>			// CHECK: [[TMP4:%.*]] = bitcast <8 x half> [[TMP3]] to <16 x i8>
	// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.float16x8x2_t, %struct.float16x8x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.float16x8x2_t, %struct.float16x8x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <8 x half>], [2 x <8 x half>]* [[VAL1]], i64 0, i64 1			// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <8 x half>], [2 x <8 x half>]* [[VAL1]], i64 0, i64 1
	// CHECK: [[TMP5:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX2]], align 16			// CHECK: [[TMP5:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX2]], align 16
	// CHECK: [[TMP6:%.*]] = bitcast <8 x half> [[TMP5]] to <16 x i8>			// CHECK: [[TMP6:%.*]] = bitcast <8 x half> [[TMP5]] to <16 x i8>
	// CHECK: [[TMP7:%.*]] = bitcast <16 x i8> [[TMP4]] to <8 x half>			// CHECK: [[TMP7:%.*]] = bitcast <16 x i8> [[TMP4]] to <8 x half>
	// CHECK: [[TMP8:%.*]] = bitcast <16 x i8> [[TMP6]] to <8 x half>			// CHECK: [[TMP8:%.*]] = bitcast <16 x i8> [[TMP6]] to <8 x half>
	// CHECK: call void @llvm.aarch64.neon.st2lane.v8f16.p0i8(<8 x half> [[TMP7]], <8 x half> [[TMP8]], i64 7, i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st2lane.v8f16.p0i8(<8 x half> [[TMP7]], <8 x half> [[TMP8]], i64 7, i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst2q_lane_f16(float16_t *a, float16x8x2_t b) {			void test_vst2q_lane_f16(float16_t *a, float16x8x2_t b) {
	vst2q_lane_f16(a, b, 7);			vst2q_lane_f16(a, b, 7);
	}			}

	// CHECK-LABEL: define void @test_vst2q_lane_f32(float* %a, [2 x <4 x float>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst2q_lane_f32(float* %a, [2 x <4 x float>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.float32x4x2_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.float32x4x2_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.float32x4x2_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.float32x4x2_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float32x4x2_t, %struct.float32x4x2_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float32x4x2_t, %struct.float32x4x2_t* [[B]], i32 0, i32 0
	// CHECK: store [2 x <4 x float>] [[B]].coerce, [2 x <4 x float>]* [[COERCE_DIVE]], align 16			// CHECK: store [2 x <4 x float>] [[B]].coerce, [2 x <4 x float>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.float32x4x2_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.float32x4x2_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.float32x4x2_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.float32x4x2_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 32, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 32, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast float* %a to i8*			// CHECK: [[TMP2:%.*]] = bitcast float* %a to i8*
	// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.float32x4x2_t, %struct.float32x4x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.float32x4x2_t, %struct.float32x4x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <4 x float>], [2 x <4 x float>]* [[VAL]], i64 0, i64 0			// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <4 x float>], [2 x <4 x float>]* [[VAL]], i64 0, i64 0
	// CHECK: [[TMP3:%.*]] = load <4 x float>, <4 x float>* [[ARRAYIDX]], align 16			// CHECK: [[TMP3:%.*]] = load <4 x float>, <4 x float>* [[ARRAYIDX]], align 16
	// CHECK: [[TMP4:%.*]] = bitcast <4 x float> [[TMP3]] to <16 x i8>			// CHECK: [[TMP4:%.*]] = bitcast <4 x float> [[TMP3]] to <16 x i8>
	// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.float32x4x2_t, %struct.float32x4x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.float32x4x2_t, %struct.float32x4x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <4 x float>], [2 x <4 x float>]* [[VAL1]], i64 0, i64 1			// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <4 x float>], [2 x <4 x float>]* [[VAL1]], i64 0, i64 1
	// CHECK: [[TMP5:%.*]] = load <4 x float>, <4 x float>* [[ARRAYIDX2]], align 16			// CHECK: [[TMP5:%.*]] = load <4 x float>, <4 x float>* [[ARRAYIDX2]], align 16
	// CHECK: [[TMP6:%.*]] = bitcast <4 x float> [[TMP5]] to <16 x i8>			// CHECK: [[TMP6:%.*]] = bitcast <4 x float> [[TMP5]] to <16 x i8>
	// CHECK: [[TMP7:%.*]] = bitcast <16 x i8> [[TMP4]] to <4 x float>			// CHECK: [[TMP7:%.*]] = bitcast <16 x i8> [[TMP4]] to <4 x float>
	// CHECK: [[TMP8:%.*]] = bitcast <16 x i8> [[TMP6]] to <4 x float>			// CHECK: [[TMP8:%.*]] = bitcast <16 x i8> [[TMP6]] to <4 x float>
	// CHECK: call void @llvm.aarch64.neon.st2lane.v4f32.p0i8(<4 x float> [[TMP7]], <4 x float> [[TMP8]], i64 3, i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st2lane.v4f32.p0i8(<4 x float> [[TMP7]], <4 x float> [[TMP8]], i64 3, i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst2q_lane_f32(float32_t *a, float32x4x2_t b) {			void test_vst2q_lane_f32(float32_t *a, float32x4x2_t b) {
	vst2q_lane_f32(a, b, 3);			vst2q_lane_f32(a, b, 3);
	}			}

	// CHECK-LABEL: define void @test_vst2q_lane_f64(double* %a, [2 x <2 x double>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst2q_lane_f64(double* %a, [2 x <2 x double>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.float64x2x2_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.float64x2x2_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.float64x2x2_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.float64x2x2_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float64x2x2_t, %struct.float64x2x2_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float64x2x2_t, %struct.float64x2x2_t* [[B]], i32 0, i32 0
	// CHECK: store [2 x <2 x double>] [[B]].coerce, [2 x <2 x double>]* [[COERCE_DIVE]], align 16			// CHECK: store [2 x <2 x double>] [[B]].coerce, [2 x <2 x double>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.float64x2x2_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.float64x2x2_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.float64x2x2_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.float64x2x2_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 32, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 32, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast double* %a to i8*			// CHECK: [[TMP2:%.*]] = bitcast double* %a to i8*
	// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.float64x2x2_t, %struct.float64x2x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.float64x2x2_t, %struct.float64x2x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <2 x double>], [2 x <2 x double>]* [[VAL]], i64 0, i64 0			// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <2 x double>], [2 x <2 x double>]* [[VAL]], i64 0, i64 0
	// CHECK: [[TMP3:%.*]] = load <2 x double>, <2 x double>* [[ARRAYIDX]], align 16			// CHECK: [[TMP3:%.*]] = load <2 x double>, <2 x double>* [[ARRAYIDX]], align 16
	// CHECK: [[TMP4:%.*]] = bitcast <2 x double> [[TMP3]] to <16 x i8>			// CHECK: [[TMP4:%.*]] = bitcast <2 x double> [[TMP3]] to <16 x i8>
	// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.float64x2x2_t, %struct.float64x2x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.float64x2x2_t, %struct.float64x2x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <2 x double>], [2 x <2 x double>]* [[VAL1]], i64 0, i64 1			// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <2 x double>], [2 x <2 x double>]* [[VAL1]], i64 0, i64 1
	// CHECK: [[TMP5:%.*]] = load <2 x double>, <2 x double>* [[ARRAYIDX2]], align 16			// CHECK: [[TMP5:%.*]] = load <2 x double>, <2 x double>* [[ARRAYIDX2]], align 16
	// CHECK: [[TMP6:%.*]] = bitcast <2 x double> [[TMP5]] to <16 x i8>			// CHECK: [[TMP6:%.*]] = bitcast <2 x double> [[TMP5]] to <16 x i8>
	// CHECK: [[TMP7:%.*]] = bitcast <16 x i8> [[TMP4]] to <2 x double>			// CHECK: [[TMP7:%.*]] = bitcast <16 x i8> [[TMP4]] to <2 x double>
	// CHECK: [[TMP8:%.*]] = bitcast <16 x i8> [[TMP6]] to <2 x double>			// CHECK: [[TMP8:%.*]] = bitcast <16 x i8> [[TMP6]] to <2 x double>
	// CHECK: call void @llvm.aarch64.neon.st2lane.v2f64.p0i8(<2 x double> [[TMP7]], <2 x double> [[TMP8]], i64 1, i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st2lane.v2f64.p0i8(<2 x double> [[TMP7]], <2 x double> [[TMP8]], i64 1, i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst2q_lane_f64(float64_t *a, float64x2x2_t b) {			void test_vst2q_lane_f64(float64_t *a, float64x2x2_t b) {
	vst2q_lane_f64(a, b, 1);			vst2q_lane_f64(a, b, 1);
	}			}

	// CHECK-LABEL: define void @test_vst2q_lane_p8(i8* %a, [2 x <16 x i8>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst2q_lane_p8(i8* %a, [2 x <16 x i8>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.poly8x16x2_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.poly8x16x2_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.poly8x16x2_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.poly8x16x2_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly8x16x2_t, %struct.poly8x16x2_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly8x16x2_t, %struct.poly8x16x2_t* [[B]], i32 0, i32 0
	// CHECK: store [2 x <16 x i8>] [[B]].coerce, [2 x <16 x i8>]* [[COERCE_DIVE]], align 16			// CHECK: store [2 x <16 x i8>] [[B]].coerce, [2 x <16 x i8>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.poly8x16x2_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.poly8x16x2_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.poly8x16x2_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.poly8x16x2_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 32, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 32, i1 false)
	// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.poly8x16x2_t, %struct.poly8x16x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.poly8x16x2_t, %struct.poly8x16x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <16 x i8>], [2 x <16 x i8>]* [[VAL]], i64 0, i64 0			// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <16 x i8>], [2 x <16 x i8>]* [[VAL]], i64 0, i64 0
	// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX]], align 16			// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX]], align 16
	// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.poly8x16x2_t, %struct.poly8x16x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.poly8x16x2_t, %struct.poly8x16x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <16 x i8>], [2 x <16 x i8>]* [[VAL1]], i64 0, i64 1			// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <16 x i8>], [2 x <16 x i8>]* [[VAL1]], i64 0, i64 1
	// CHECK: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2]], align 16			// CHECK: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2]], align 16
	// CHECK: call void @llvm.aarch64.neon.st2lane.v16i8.p0i8(<16 x i8> [[TMP2]], <16 x i8> [[TMP3]], i64 15, i8* %a)			// CHECK: call void @llvm.aarch64.neon.st2lane.v16i8.p0i8(<16 x i8> [[TMP2]], <16 x i8> [[TMP3]], i64 15, i8* %a)
	// CHECK: ret void			// CHECK: ret void
	void test_vst2q_lane_p8(poly8_t *a, poly8x16x2_t b) {			void test_vst2q_lane_p8(poly8_t *a, poly8x16x2_t b) {
	vst2q_lane_p8(a, b, 15);			vst2q_lane_p8(a, b, 15);
	}			}

	// CHECK-LABEL: define void @test_vst2q_lane_p16(i16* %a, [2 x <8 x i16>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst2q_lane_p16(i16* %a, [2 x <8 x i16>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.poly16x8x2_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.poly16x8x2_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.poly16x8x2_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.poly16x8x2_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly16x8x2_t, %struct.poly16x8x2_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly16x8x2_t, %struct.poly16x8x2_t* [[B]], i32 0, i32 0
	// CHECK: store [2 x <8 x i16>] [[B]].coerce, [2 x <8 x i16>]* [[COERCE_DIVE]], align 16			// CHECK: store [2 x <8 x i16>] [[B]].coerce, [2 x <8 x i16>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.poly16x8x2_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.poly16x8x2_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.poly16x8x2_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.poly16x8x2_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 32, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 32, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast i16* %a to i8*			// CHECK: [[TMP2:%.*]] = bitcast i16* %a to i8*
	// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.poly16x8x2_t, %struct.poly16x8x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.poly16x8x2_t, %struct.poly16x8x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <8 x i16>], [2 x <8 x i16>]* [[VAL]], i64 0, i64 0			// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <8 x i16>], [2 x <8 x i16>]* [[VAL]], i64 0, i64 0
	// CHECK: [[TMP3:%.*]] = load <8 x i16>, <8 x i16>* [[ARRAYIDX]], align 16			// CHECK: [[TMP3:%.*]] = load <8 x i16>, <8 x i16>* [[ARRAYIDX]], align 16
	// CHECK: [[TMP4:%.*]] = bitcast <8 x i16> [[TMP3]] to <16 x i8>			// CHECK: [[TMP4:%.*]] = bitcast <8 x i16> [[TMP3]] to <16 x i8>
	// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.poly16x8x2_t, %struct.poly16x8x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.poly16x8x2_t, %struct.poly16x8x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <8 x i16>], [2 x <8 x i16>]* [[VAL1]], i64 0, i64 1			// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <8 x i16>], [2 x <8 x i16>]* [[VAL1]], i64 0, i64 1
	// CHECK: [[TMP5:%.*]] = load <8 x i16>, <8 x i16>* [[ARRAYIDX2]], align 16			// CHECK: [[TMP5:%.*]] = load <8 x i16>, <8 x i16>* [[ARRAYIDX2]], align 16
	// CHECK: [[TMP6:%.*]] = bitcast <8 x i16> [[TMP5]] to <16 x i8>			// CHECK: [[TMP6:%.*]] = bitcast <8 x i16> [[TMP5]] to <16 x i8>
	// CHECK: [[TMP7:%.*]] = bitcast <16 x i8> [[TMP4]] to <8 x i16>			// CHECK: [[TMP7:%.*]] = bitcast <16 x i8> [[TMP4]] to <8 x i16>
	// CHECK: [[TMP8:%.*]] = bitcast <16 x i8> [[TMP6]] to <8 x i16>			// CHECK: [[TMP8:%.*]] = bitcast <16 x i8> [[TMP6]] to <8 x i16>
	// CHECK: call void @llvm.aarch64.neon.st2lane.v8i16.p0i8(<8 x i16> [[TMP7]], <8 x i16> [[TMP8]], i64 7, i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st2lane.v8i16.p0i8(<8 x i16> [[TMP7]], <8 x i16> [[TMP8]], i64 7, i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst2q_lane_p16(poly16_t *a, poly16x8x2_t b) {			void test_vst2q_lane_p16(poly16_t *a, poly16x8x2_t b) {
	vst2q_lane_p16(a, b, 7);			vst2q_lane_p16(a, b, 7);
	}			}

	// CHECK-LABEL: define void @test_vst2q_lane_p64(i64* %a, [2 x <2 x i64>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst2q_lane_p64(i64* %a, [2 x <2 x i64>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.poly64x2x2_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.poly64x2x2_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.poly64x2x2_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.poly64x2x2_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly64x2x2_t, %struct.poly64x2x2_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly64x2x2_t, %struct.poly64x2x2_t* [[B]], i32 0, i32 0
	// CHECK: store [2 x <2 x i64>] [[B]].coerce, [2 x <2 x i64>]* [[COERCE_DIVE]], align 16			// CHECK: store [2 x <2 x i64>] [[B]].coerce, [2 x <2 x i64>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.poly64x2x2_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.poly64x2x2_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.poly64x2x2_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.poly64x2x2_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 32, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 32, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast i64* %a to i8*			// CHECK: [[TMP2:%.*]] = bitcast i64* %a to i8*
	// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.poly64x2x2_t, %struct.poly64x2x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.poly64x2x2_t, %struct.poly64x2x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <2 x i64>], [2 x <2 x i64>]* [[VAL]], i64 0, i64 0			// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <2 x i64>], [2 x <2 x i64>]* [[VAL]], i64 0, i64 0
	// CHECK: [[TMP3:%.*]] = load <2 x i64>, <2 x i64>* [[ARRAYIDX]], align 16			// CHECK: [[TMP3:%.*]] = load <2 x i64>, <2 x i64>* [[ARRAYIDX]], align 16
	// CHECK: [[TMP4:%.*]] = bitcast <2 x i64> [[TMP3]] to <16 x i8>			// CHECK: [[TMP4:%.*]] = bitcast <2 x i64> [[TMP3]] to <16 x i8>
	// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.poly64x2x2_t, %struct.poly64x2x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.poly64x2x2_t, %struct.poly64x2x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <2 x i64>], [2 x <2 x i64>]* [[VAL1]], i64 0, i64 1			// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <2 x i64>], [2 x <2 x i64>]* [[VAL1]], i64 0, i64 1
	// CHECK: [[TMP5:%.*]] = load <2 x i64>, <2 x i64>* [[ARRAYIDX2]], align 16			// CHECK: [[TMP5:%.*]] = load <2 x i64>, <2 x i64>* [[ARRAYIDX2]], align 16
	// CHECK: [[TMP6:%.*]] = bitcast <2 x i64> [[TMP5]] to <16 x i8>			// CHECK: [[TMP6:%.*]] = bitcast <2 x i64> [[TMP5]] to <16 x i8>
	// CHECK: [[TMP7:%.*]] = bitcast <16 x i8> [[TMP4]] to <2 x i64>			// CHECK: [[TMP7:%.*]] = bitcast <16 x i8> [[TMP4]] to <2 x i64>
	// CHECK: [[TMP8:%.*]] = bitcast <16 x i8> [[TMP6]] to <2 x i64>			// CHECK: [[TMP8:%.*]] = bitcast <16 x i8> [[TMP6]] to <2 x i64>
	// CHECK: call void @llvm.aarch64.neon.st2lane.v2i64.p0i8(<2 x i64> [[TMP7]], <2 x i64> [[TMP8]], i64 1, i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st2lane.v2i64.p0i8(<2 x i64> [[TMP7]], <2 x i64> [[TMP8]], i64 1, i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst2q_lane_p64(poly64_t *a, poly64x2x2_t b) {			void test_vst2q_lane_p64(poly64_t *a, poly64x2x2_t b) {
	vst2q_lane_p64(a, b, 1);			vst2q_lane_p64(a, b, 1);
	}			}

	// CHECK-LABEL: define void @test_vst2_lane_u8(i8* %a, [2 x <8 x i8>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst2_lane_u8(i8* %a, [2 x <8 x i8>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.uint8x8x2_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.uint8x8x2_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.uint8x8x2_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.uint8x8x2_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint8x8x2_t, %struct.uint8x8x2_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint8x8x2_t, %struct.uint8x8x2_t* [[B]], i32 0, i32 0
	// CHECK: store [2 x <8 x i8>] [[B]].coerce, [2 x <8 x i8>]* [[COERCE_DIVE]], align 8			// CHECK: store [2 x <8 x i8>] [[B]].coerce, [2 x <8 x i8>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.uint8x8x2_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.uint8x8x2_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.uint8x8x2_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.uint8x8x2_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 16, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 16, i1 false)
	// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.uint8x8x2_t, %struct.uint8x8x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.uint8x8x2_t, %struct.uint8x8x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <8 x i8>], [2 x <8 x i8>]* [[VAL]], i64 0, i64 0			// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <8 x i8>], [2 x <8 x i8>]* [[VAL]], i64 0, i64 0
	// CHECK: [[TMP2:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX]], align 8			// CHECK: [[TMP2:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX]], align 8
	// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.uint8x8x2_t, %struct.uint8x8x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.uint8x8x2_t, %struct.uint8x8x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <8 x i8>], [2 x <8 x i8>]* [[VAL1]], i64 0, i64 1			// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <8 x i8>], [2 x <8 x i8>]* [[VAL1]], i64 0, i64 1
	// CHECK: [[TMP3:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX2]], align 8			// CHECK: [[TMP3:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX2]], align 8
	// CHECK: call void @llvm.aarch64.neon.st2lane.v8i8.p0i8(<8 x i8> [[TMP2]], <8 x i8> [[TMP3]], i64 7, i8* %a)			// CHECK: call void @llvm.aarch64.neon.st2lane.v8i8.p0i8(<8 x i8> [[TMP2]], <8 x i8> [[TMP3]], i64 7, i8* %a)
	// CHECK: ret void			// CHECK: ret void
	void test_vst2_lane_u8(uint8_t *a, uint8x8x2_t b) {			void test_vst2_lane_u8(uint8_t *a, uint8x8x2_t b) {
	vst2_lane_u8(a, b, 7);			vst2_lane_u8(a, b, 7);
	}			}

	// CHECK-LABEL: define void @test_vst2_lane_u16(i16* %a, [2 x <4 x i16>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst2_lane_u16(i16* %a, [2 x <4 x i16>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.uint16x4x2_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.uint16x4x2_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.uint16x4x2_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.uint16x4x2_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint16x4x2_t, %struct.uint16x4x2_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint16x4x2_t, %struct.uint16x4x2_t* [[B]], i32 0, i32 0
	// CHECK: store [2 x <4 x i16>] [[B]].coerce, [2 x <4 x i16>]* [[COERCE_DIVE]], align 8			// CHECK: store [2 x <4 x i16>] [[B]].coerce, [2 x <4 x i16>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.uint16x4x2_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.uint16x4x2_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.uint16x4x2_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.uint16x4x2_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 16, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 16, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast i16* %a to i8*			// CHECK: [[TMP2:%.*]] = bitcast i16* %a to i8*
	// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.uint16x4x2_t, %struct.uint16x4x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.uint16x4x2_t, %struct.uint16x4x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <4 x i16>], [2 x <4 x i16>]* [[VAL]], i64 0, i64 0			// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <4 x i16>], [2 x <4 x i16>]* [[VAL]], i64 0, i64 0
	// CHECK: [[TMP3:%.*]] = load <4 x i16>, <4 x i16>* [[ARRAYIDX]], align 8			// CHECK: [[TMP3:%.*]] = load <4 x i16>, <4 x i16>* [[ARRAYIDX]], align 8
	// CHECK: [[TMP4:%.*]] = bitcast <4 x i16> [[TMP3]] to <8 x i8>			// CHECK: [[TMP4:%.*]] = bitcast <4 x i16> [[TMP3]] to <8 x i8>
	// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.uint16x4x2_t, %struct.uint16x4x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.uint16x4x2_t, %struct.uint16x4x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <4 x i16>], [2 x <4 x i16>]* [[VAL1]], i64 0, i64 1			// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <4 x i16>], [2 x <4 x i16>]* [[VAL1]], i64 0, i64 1
	// CHECK: [[TMP5:%.*]] = load <4 x i16>, <4 x i16>* [[ARRAYIDX2]], align 8			// CHECK: [[TMP5:%.*]] = load <4 x i16>, <4 x i16>* [[ARRAYIDX2]], align 8
	// CHECK: [[TMP6:%.*]] = bitcast <4 x i16> [[TMP5]] to <8 x i8>			// CHECK: [[TMP6:%.*]] = bitcast <4 x i16> [[TMP5]] to <8 x i8>
	// CHECK: [[TMP7:%.*]] = bitcast <8 x i8> [[TMP4]] to <4 x i16>			// CHECK: [[TMP7:%.*]] = bitcast <8 x i8> [[TMP4]] to <4 x i16>
	// CHECK: [[TMP8:%.*]] = bitcast <8 x i8> [[TMP6]] to <4 x i16>			// CHECK: [[TMP8:%.*]] = bitcast <8 x i8> [[TMP6]] to <4 x i16>
	// CHECK: call void @llvm.aarch64.neon.st2lane.v4i16.p0i8(<4 x i16> [[TMP7]], <4 x i16> [[TMP8]], i64 3, i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st2lane.v4i16.p0i8(<4 x i16> [[TMP7]], <4 x i16> [[TMP8]], i64 3, i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst2_lane_u16(uint16_t *a, uint16x4x2_t b) {			void test_vst2_lane_u16(uint16_t *a, uint16x4x2_t b) {
	vst2_lane_u16(a, b, 3);			vst2_lane_u16(a, b, 3);
	}			}

	// CHECK-LABEL: define void @test_vst2_lane_u32(i32* %a, [2 x <2 x i32>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst2_lane_u32(i32* %a, [2 x <2 x i32>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.uint32x2x2_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.uint32x2x2_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.uint32x2x2_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.uint32x2x2_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint32x2x2_t, %struct.uint32x2x2_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint32x2x2_t, %struct.uint32x2x2_t* [[B]], i32 0, i32 0
	// CHECK: store [2 x <2 x i32>] [[B]].coerce, [2 x <2 x i32>]* [[COERCE_DIVE]], align 8			// CHECK: store [2 x <2 x i32>] [[B]].coerce, [2 x <2 x i32>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.uint32x2x2_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.uint32x2x2_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.uint32x2x2_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.uint32x2x2_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 16, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 16, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast i32* %a to i8*			// CHECK: [[TMP2:%.*]] = bitcast i32* %a to i8*
	// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.uint32x2x2_t, %struct.uint32x2x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.uint32x2x2_t, %struct.uint32x2x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <2 x i32>], [2 x <2 x i32>]* [[VAL]], i64 0, i64 0			// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <2 x i32>], [2 x <2 x i32>]* [[VAL]], i64 0, i64 0
	// CHECK: [[TMP3:%.*]] = load <2 x i32>, <2 x i32>* [[ARRAYIDX]], align 8			// CHECK: [[TMP3:%.*]] = load <2 x i32>, <2 x i32>* [[ARRAYIDX]], align 8
	// CHECK: [[TMP4:%.*]] = bitcast <2 x i32> [[TMP3]] to <8 x i8>			// CHECK: [[TMP4:%.*]] = bitcast <2 x i32> [[TMP3]] to <8 x i8>
	// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.uint32x2x2_t, %struct.uint32x2x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.uint32x2x2_t, %struct.uint32x2x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <2 x i32>], [2 x <2 x i32>]* [[VAL1]], i64 0, i64 1			// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <2 x i32>], [2 x <2 x i32>]* [[VAL1]], i64 0, i64 1
	// CHECK: [[TMP5:%.*]] = load <2 x i32>, <2 x i32>* [[ARRAYIDX2]], align 8			// CHECK: [[TMP5:%.*]] = load <2 x i32>, <2 x i32>* [[ARRAYIDX2]], align 8
	// CHECK: [[TMP6:%.*]] = bitcast <2 x i32> [[TMP5]] to <8 x i8>			// CHECK: [[TMP6:%.*]] = bitcast <2 x i32> [[TMP5]] to <8 x i8>
	// CHECK: [[TMP7:%.*]] = bitcast <8 x i8> [[TMP4]] to <2 x i32>			// CHECK: [[TMP7:%.*]] = bitcast <8 x i8> [[TMP4]] to <2 x i32>
	// CHECK: [[TMP8:%.*]] = bitcast <8 x i8> [[TMP6]] to <2 x i32>			// CHECK: [[TMP8:%.*]] = bitcast <8 x i8> [[TMP6]] to <2 x i32>
	// CHECK: call void @llvm.aarch64.neon.st2lane.v2i32.p0i8(<2 x i32> [[TMP7]], <2 x i32> [[TMP8]], i64 1, i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st2lane.v2i32.p0i8(<2 x i32> [[TMP7]], <2 x i32> [[TMP8]], i64 1, i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst2_lane_u32(uint32_t *a, uint32x2x2_t b) {			void test_vst2_lane_u32(uint32_t *a, uint32x2x2_t b) {
	vst2_lane_u32(a, b, 1);			vst2_lane_u32(a, b, 1);
	}			}

	// CHECK-LABEL: define void @test_vst2_lane_u64(i64* %a, [2 x <1 x i64>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst2_lane_u64(i64* %a, [2 x <1 x i64>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.uint64x1x2_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.uint64x1x2_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.uint64x1x2_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.uint64x1x2_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint64x1x2_t, %struct.uint64x1x2_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint64x1x2_t, %struct.uint64x1x2_t* [[B]], i32 0, i32 0
	// CHECK: store [2 x <1 x i64>] [[B]].coerce, [2 x <1 x i64>]* [[COERCE_DIVE]], align 8			// CHECK: store [2 x <1 x i64>] [[B]].coerce, [2 x <1 x i64>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.uint64x1x2_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.uint64x1x2_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.uint64x1x2_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.uint64x1x2_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 16, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 16, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast i64* %a to i8*			// CHECK: [[TMP2:%.*]] = bitcast i64* %a to i8*
	// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.uint64x1x2_t, %struct.uint64x1x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.uint64x1x2_t, %struct.uint64x1x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <1 x i64>], [2 x <1 x i64>]* [[VAL]], i64 0, i64 0			// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <1 x i64>], [2 x <1 x i64>]* [[VAL]], i64 0, i64 0
	// CHECK: [[TMP3:%.*]] = load <1 x i64>, <1 x i64>* [[ARRAYIDX]], align 8			// CHECK: [[TMP3:%.*]] = load <1 x i64>, <1 x i64>* [[ARRAYIDX]], align 8
	// CHECK: [[TMP4:%.*]] = bitcast <1 x i64> [[TMP3]] to <8 x i8>			// CHECK: [[TMP4:%.*]] = bitcast <1 x i64> [[TMP3]] to <8 x i8>
	// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.uint64x1x2_t, %struct.uint64x1x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.uint64x1x2_t, %struct.uint64x1x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <1 x i64>], [2 x <1 x i64>]* [[VAL1]], i64 0, i64 1			// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <1 x i64>], [2 x <1 x i64>]* [[VAL1]], i64 0, i64 1
	// CHECK: [[TMP5:%.*]] = load <1 x i64>, <1 x i64>* [[ARRAYIDX2]], align 8			// CHECK: [[TMP5:%.*]] = load <1 x i64>, <1 x i64>* [[ARRAYIDX2]], align 8
	// CHECK: [[TMP6:%.*]] = bitcast <1 x i64> [[TMP5]] to <8 x i8>			// CHECK: [[TMP6:%.*]] = bitcast <1 x i64> [[TMP5]] to <8 x i8>
	// CHECK: [[TMP7:%.*]] = bitcast <8 x i8> [[TMP4]] to <1 x i64>			// CHECK: [[TMP7:%.*]] = bitcast <8 x i8> [[TMP4]] to <1 x i64>
	// CHECK: [[TMP8:%.*]] = bitcast <8 x i8> [[TMP6]] to <1 x i64>			// CHECK: [[TMP8:%.*]] = bitcast <8 x i8> [[TMP6]] to <1 x i64>
	// CHECK: call void @llvm.aarch64.neon.st2lane.v1i64.p0i8(<1 x i64> [[TMP7]], <1 x i64> [[TMP8]], i64 0, i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st2lane.v1i64.p0i8(<1 x i64> [[TMP7]], <1 x i64> [[TMP8]], i64 0, i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst2_lane_u64(uint64_t *a, uint64x1x2_t b) {			void test_vst2_lane_u64(uint64_t *a, uint64x1x2_t b) {
	vst2_lane_u64(a, b, 0);			vst2_lane_u64(a, b, 0);
	}			}

	// CHECK-LABEL: define void @test_vst2_lane_s8(i8* %a, [2 x <8 x i8>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst2_lane_s8(i8* %a, [2 x <8 x i8>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.int8x8x2_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.int8x8x2_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.int8x8x2_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.int8x8x2_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x8x2_t, %struct.int8x8x2_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x8x2_t, %struct.int8x8x2_t* [[B]], i32 0, i32 0
	// CHECK: store [2 x <8 x i8>] [[B]].coerce, [2 x <8 x i8>]* [[COERCE_DIVE]], align 8			// CHECK: store [2 x <8 x i8>] [[B]].coerce, [2 x <8 x i8>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.int8x8x2_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.int8x8x2_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.int8x8x2_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.int8x8x2_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 16, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 16, i1 false)
	// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.int8x8x2_t, %struct.int8x8x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.int8x8x2_t, %struct.int8x8x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <8 x i8>], [2 x <8 x i8>]* [[VAL]], i64 0, i64 0			// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <8 x i8>], [2 x <8 x i8>]* [[VAL]], i64 0, i64 0
	// CHECK: [[TMP2:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX]], align 8			// CHECK: [[TMP2:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX]], align 8
	// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.int8x8x2_t, %struct.int8x8x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.int8x8x2_t, %struct.int8x8x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <8 x i8>], [2 x <8 x i8>]* [[VAL1]], i64 0, i64 1			// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <8 x i8>], [2 x <8 x i8>]* [[VAL1]], i64 0, i64 1
	// CHECK: [[TMP3:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX2]], align 8			// CHECK: [[TMP3:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX2]], align 8
	// CHECK: call void @llvm.aarch64.neon.st2lane.v8i8.p0i8(<8 x i8> [[TMP2]], <8 x i8> [[TMP3]], i64 7, i8* %a)			// CHECK: call void @llvm.aarch64.neon.st2lane.v8i8.p0i8(<8 x i8> [[TMP2]], <8 x i8> [[TMP3]], i64 7, i8* %a)
	// CHECK: ret void			// CHECK: ret void
	void test_vst2_lane_s8(int8_t *a, int8x8x2_t b) {			void test_vst2_lane_s8(int8_t *a, int8x8x2_t b) {
	vst2_lane_s8(a, b, 7);			vst2_lane_s8(a, b, 7);
	}			}

	// CHECK-LABEL: define void @test_vst2_lane_s16(i16* %a, [2 x <4 x i16>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst2_lane_s16(i16* %a, [2 x <4 x i16>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.int16x4x2_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.int16x4x2_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.int16x4x2_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.int16x4x2_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int16x4x2_t, %struct.int16x4x2_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int16x4x2_t, %struct.int16x4x2_t* [[B]], i32 0, i32 0
	// CHECK: store [2 x <4 x i16>] [[B]].coerce, [2 x <4 x i16>]* [[COERCE_DIVE]], align 8			// CHECK: store [2 x <4 x i16>] [[B]].coerce, [2 x <4 x i16>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.int16x4x2_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.int16x4x2_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.int16x4x2_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.int16x4x2_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 16, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 16, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast i16* %a to i8*			// CHECK: [[TMP2:%.*]] = bitcast i16* %a to i8*
	// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.int16x4x2_t, %struct.int16x4x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.int16x4x2_t, %struct.int16x4x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <4 x i16>], [2 x <4 x i16>]* [[VAL]], i64 0, i64 0			// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <4 x i16>], [2 x <4 x i16>]* [[VAL]], i64 0, i64 0
	// CHECK: [[TMP3:%.*]] = load <4 x i16>, <4 x i16>* [[ARRAYIDX]], align 8			// CHECK: [[TMP3:%.*]] = load <4 x i16>, <4 x i16>* [[ARRAYIDX]], align 8
	// CHECK: [[TMP4:%.*]] = bitcast <4 x i16> [[TMP3]] to <8 x i8>			// CHECK: [[TMP4:%.*]] = bitcast <4 x i16> [[TMP3]] to <8 x i8>
	// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.int16x4x2_t, %struct.int16x4x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.int16x4x2_t, %struct.int16x4x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <4 x i16>], [2 x <4 x i16>]* [[VAL1]], i64 0, i64 1			// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <4 x i16>], [2 x <4 x i16>]* [[VAL1]], i64 0, i64 1
	// CHECK: [[TMP5:%.*]] = load <4 x i16>, <4 x i16>* [[ARRAYIDX2]], align 8			// CHECK: [[TMP5:%.*]] = load <4 x i16>, <4 x i16>* [[ARRAYIDX2]], align 8
	// CHECK: [[TMP6:%.*]] = bitcast <4 x i16> [[TMP5]] to <8 x i8>			// CHECK: [[TMP6:%.*]] = bitcast <4 x i16> [[TMP5]] to <8 x i8>
	// CHECK: [[TMP7:%.*]] = bitcast <8 x i8> [[TMP4]] to <4 x i16>			// CHECK: [[TMP7:%.*]] = bitcast <8 x i8> [[TMP4]] to <4 x i16>
	// CHECK: [[TMP8:%.*]] = bitcast <8 x i8> [[TMP6]] to <4 x i16>			// CHECK: [[TMP8:%.*]] = bitcast <8 x i8> [[TMP6]] to <4 x i16>
	// CHECK: call void @llvm.aarch64.neon.st2lane.v4i16.p0i8(<4 x i16> [[TMP7]], <4 x i16> [[TMP8]], i64 3, i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st2lane.v4i16.p0i8(<4 x i16> [[TMP7]], <4 x i16> [[TMP8]], i64 3, i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst2_lane_s16(int16_t *a, int16x4x2_t b) {			void test_vst2_lane_s16(int16_t *a, int16x4x2_t b) {
	vst2_lane_s16(a, b, 3);			vst2_lane_s16(a, b, 3);
	}			}

	// CHECK-LABEL: define void @test_vst2_lane_s32(i32* %a, [2 x <2 x i32>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst2_lane_s32(i32* %a, [2 x <2 x i32>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.int32x2x2_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.int32x2x2_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.int32x2x2_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.int32x2x2_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int32x2x2_t, %struct.int32x2x2_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int32x2x2_t, %struct.int32x2x2_t* [[B]], i32 0, i32 0
	// CHECK: store [2 x <2 x i32>] [[B]].coerce, [2 x <2 x i32>]* [[COERCE_DIVE]], align 8			// CHECK: store [2 x <2 x i32>] [[B]].coerce, [2 x <2 x i32>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.int32x2x2_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.int32x2x2_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.int32x2x2_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.int32x2x2_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 16, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 16, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast i32* %a to i8*			// CHECK: [[TMP2:%.*]] = bitcast i32* %a to i8*
	// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.int32x2x2_t, %struct.int32x2x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.int32x2x2_t, %struct.int32x2x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <2 x i32>], [2 x <2 x i32>]* [[VAL]], i64 0, i64 0			// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <2 x i32>], [2 x <2 x i32>]* [[VAL]], i64 0, i64 0
	// CHECK: [[TMP3:%.*]] = load <2 x i32>, <2 x i32>* [[ARRAYIDX]], align 8			// CHECK: [[TMP3:%.*]] = load <2 x i32>, <2 x i32>* [[ARRAYIDX]], align 8
	// CHECK: [[TMP4:%.*]] = bitcast <2 x i32> [[TMP3]] to <8 x i8>			// CHECK: [[TMP4:%.*]] = bitcast <2 x i32> [[TMP3]] to <8 x i8>
	// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.int32x2x2_t, %struct.int32x2x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.int32x2x2_t, %struct.int32x2x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <2 x i32>], [2 x <2 x i32>]* [[VAL1]], i64 0, i64 1			// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <2 x i32>], [2 x <2 x i32>]* [[VAL1]], i64 0, i64 1
	// CHECK: [[TMP5:%.*]] = load <2 x i32>, <2 x i32>* [[ARRAYIDX2]], align 8			// CHECK: [[TMP5:%.*]] = load <2 x i32>, <2 x i32>* [[ARRAYIDX2]], align 8
	// CHECK: [[TMP6:%.*]] = bitcast <2 x i32> [[TMP5]] to <8 x i8>			// CHECK: [[TMP6:%.*]] = bitcast <2 x i32> [[TMP5]] to <8 x i8>
	// CHECK: [[TMP7:%.*]] = bitcast <8 x i8> [[TMP4]] to <2 x i32>			// CHECK: [[TMP7:%.*]] = bitcast <8 x i8> [[TMP4]] to <2 x i32>
	// CHECK: [[TMP8:%.*]] = bitcast <8 x i8> [[TMP6]] to <2 x i32>			// CHECK: [[TMP8:%.*]] = bitcast <8 x i8> [[TMP6]] to <2 x i32>
	// CHECK: call void @llvm.aarch64.neon.st2lane.v2i32.p0i8(<2 x i32> [[TMP7]], <2 x i32> [[TMP8]], i64 1, i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st2lane.v2i32.p0i8(<2 x i32> [[TMP7]], <2 x i32> [[TMP8]], i64 1, i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst2_lane_s32(int32_t *a, int32x2x2_t b) {			void test_vst2_lane_s32(int32_t *a, int32x2x2_t b) {
	vst2_lane_s32(a, b, 1);			vst2_lane_s32(a, b, 1);
	}			}

	// CHECK-LABEL: define void @test_vst2_lane_s64(i64* %a, [2 x <1 x i64>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst2_lane_s64(i64* %a, [2 x <1 x i64>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.int64x1x2_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.int64x1x2_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.int64x1x2_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.int64x1x2_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int64x1x2_t, %struct.int64x1x2_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int64x1x2_t, %struct.int64x1x2_t* [[B]], i32 0, i32 0
	// CHECK: store [2 x <1 x i64>] [[B]].coerce, [2 x <1 x i64>]* [[COERCE_DIVE]], align 8			// CHECK: store [2 x <1 x i64>] [[B]].coerce, [2 x <1 x i64>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.int64x1x2_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.int64x1x2_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.int64x1x2_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.int64x1x2_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 16, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 16, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast i64* %a to i8*			// CHECK: [[TMP2:%.*]] = bitcast i64* %a to i8*
	// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.int64x1x2_t, %struct.int64x1x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.int64x1x2_t, %struct.int64x1x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <1 x i64>], [2 x <1 x i64>]* [[VAL]], i64 0, i64 0			// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <1 x i64>], [2 x <1 x i64>]* [[VAL]], i64 0, i64 0
	// CHECK: [[TMP3:%.*]] = load <1 x i64>, <1 x i64>* [[ARRAYIDX]], align 8			// CHECK: [[TMP3:%.*]] = load <1 x i64>, <1 x i64>* [[ARRAYIDX]], align 8
	// CHECK: [[TMP4:%.*]] = bitcast <1 x i64> [[TMP3]] to <8 x i8>			// CHECK: [[TMP4:%.*]] = bitcast <1 x i64> [[TMP3]] to <8 x i8>
	// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.int64x1x2_t, %struct.int64x1x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.int64x1x2_t, %struct.int64x1x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <1 x i64>], [2 x <1 x i64>]* [[VAL1]], i64 0, i64 1			// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <1 x i64>], [2 x <1 x i64>]* [[VAL1]], i64 0, i64 1
	// CHECK: [[TMP5:%.*]] = load <1 x i64>, <1 x i64>* [[ARRAYIDX2]], align 8			// CHECK: [[TMP5:%.*]] = load <1 x i64>, <1 x i64>* [[ARRAYIDX2]], align 8
	// CHECK: [[TMP6:%.*]] = bitcast <1 x i64> [[TMP5]] to <8 x i8>			// CHECK: [[TMP6:%.*]] = bitcast <1 x i64> [[TMP5]] to <8 x i8>
	// CHECK: [[TMP7:%.*]] = bitcast <8 x i8> [[TMP4]] to <1 x i64>			// CHECK: [[TMP7:%.*]] = bitcast <8 x i8> [[TMP4]] to <1 x i64>
	// CHECK: [[TMP8:%.*]] = bitcast <8 x i8> [[TMP6]] to <1 x i64>			// CHECK: [[TMP8:%.*]] = bitcast <8 x i8> [[TMP6]] to <1 x i64>
	// CHECK: call void @llvm.aarch64.neon.st2lane.v1i64.p0i8(<1 x i64> [[TMP7]], <1 x i64> [[TMP8]], i64 0, i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st2lane.v1i64.p0i8(<1 x i64> [[TMP7]], <1 x i64> [[TMP8]], i64 0, i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst2_lane_s64(int64_t *a, int64x1x2_t b) {			void test_vst2_lane_s64(int64_t *a, int64x1x2_t b) {
	vst2_lane_s64(a, b, 0);			vst2_lane_s64(a, b, 0);
	}			}

	// CHECK-LABEL: define void @test_vst2_lane_f16(half* %a, [2 x <4 x half>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst2_lane_f16(half* %a, [2 x <4 x half>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.float16x4x2_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.float16x4x2_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.float16x4x2_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.float16x4x2_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float16x4x2_t, %struct.float16x4x2_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float16x4x2_t, %struct.float16x4x2_t* [[B]], i32 0, i32 0
	// CHECK: store [2 x <4 x half>] [[B]].coerce, [2 x <4 x half>]* [[COERCE_DIVE]], align 8			// CHECK: store [2 x <4 x half>] [[B]].coerce, [2 x <4 x half>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x4x2_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x4x2_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.float16x4x2_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.float16x4x2_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 16, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 16, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast half* %a to i8*			// CHECK: [[TMP2:%.*]] = bitcast half* %a to i8*
	// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.float16x4x2_t, %struct.float16x4x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.float16x4x2_t, %struct.float16x4x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <4 x half>], [2 x <4 x half>]* [[VAL]], i64 0, i64 0			// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <4 x half>], [2 x <4 x half>]* [[VAL]], i64 0, i64 0
	// CHECK: [[TMP3:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX]], align 8			// CHECK: [[TMP3:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX]], align 8
	// CHECK: [[TMP4:%.*]] = bitcast <4 x half> [[TMP3]] to <8 x i8>			// CHECK: [[TMP4:%.*]] = bitcast <4 x half> [[TMP3]] to <8 x i8>
	// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.float16x4x2_t, %struct.float16x4x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.float16x4x2_t, %struct.float16x4x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <4 x half>], [2 x <4 x half>]* [[VAL1]], i64 0, i64 1			// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <4 x half>], [2 x <4 x half>]* [[VAL1]], i64 0, i64 1
	// CHECK: [[TMP5:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX2]], align 8			// CHECK: [[TMP5:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX2]], align 8
	// CHECK: [[TMP6:%.*]] = bitcast <4 x half> [[TMP5]] to <8 x i8>			// CHECK: [[TMP6:%.*]] = bitcast <4 x half> [[TMP5]] to <8 x i8>
	// CHECK: [[TMP7:%.*]] = bitcast <8 x i8> [[TMP4]] to <4 x half>			// CHECK: [[TMP7:%.*]] = bitcast <8 x i8> [[TMP4]] to <4 x half>
	// CHECK: [[TMP8:%.*]] = bitcast <8 x i8> [[TMP6]] to <4 x half>			// CHECK: [[TMP8:%.*]] = bitcast <8 x i8> [[TMP6]] to <4 x half>
	// CHECK: call void @llvm.aarch64.neon.st2lane.v4f16.p0i8(<4 x half> [[TMP7]], <4 x half> [[TMP8]], i64 3, i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st2lane.v4f16.p0i8(<4 x half> [[TMP7]], <4 x half> [[TMP8]], i64 3, i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst2_lane_f16(float16_t *a, float16x4x2_t b) {			void test_vst2_lane_f16(float16_t *a, float16x4x2_t b) {
	vst2_lane_f16(a, b, 3);			vst2_lane_f16(a, b, 3);
	}			}

	// CHECK-LABEL: define void @test_vst2_lane_f32(float* %a, [2 x <2 x float>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst2_lane_f32(float* %a, [2 x <2 x float>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.float32x2x2_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.float32x2x2_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.float32x2x2_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.float32x2x2_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float32x2x2_t, %struct.float32x2x2_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float32x2x2_t, %struct.float32x2x2_t* [[B]], i32 0, i32 0
	// CHECK: store [2 x <2 x float>] [[B]].coerce, [2 x <2 x float>]* [[COERCE_DIVE]], align 8			// CHECK: store [2 x <2 x float>] [[B]].coerce, [2 x <2 x float>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.float32x2x2_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.float32x2x2_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.float32x2x2_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.float32x2x2_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 16, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 16, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast float* %a to i8*			// CHECK: [[TMP2:%.*]] = bitcast float* %a to i8*
	// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.float32x2x2_t, %struct.float32x2x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.float32x2x2_t, %struct.float32x2x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <2 x float>], [2 x <2 x float>]* [[VAL]], i64 0, i64 0			// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <2 x float>], [2 x <2 x float>]* [[VAL]], i64 0, i64 0
	// CHECK: [[TMP3:%.*]] = load <2 x float>, <2 x float>* [[ARRAYIDX]], align 8			// CHECK: [[TMP3:%.*]] = load <2 x float>, <2 x float>* [[ARRAYIDX]], align 8
	// CHECK: [[TMP4:%.*]] = bitcast <2 x float> [[TMP3]] to <8 x i8>			// CHECK: [[TMP4:%.*]] = bitcast <2 x float> [[TMP3]] to <8 x i8>
	// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.float32x2x2_t, %struct.float32x2x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.float32x2x2_t, %struct.float32x2x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <2 x float>], [2 x <2 x float>]* [[VAL1]], i64 0, i64 1			// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <2 x float>], [2 x <2 x float>]* [[VAL1]], i64 0, i64 1
	// CHECK: [[TMP5:%.*]] = load <2 x float>, <2 x float>* [[ARRAYIDX2]], align 8			// CHECK: [[TMP5:%.*]] = load <2 x float>, <2 x float>* [[ARRAYIDX2]], align 8
	// CHECK: [[TMP6:%.*]] = bitcast <2 x float> [[TMP5]] to <8 x i8>			// CHECK: [[TMP6:%.*]] = bitcast <2 x float> [[TMP5]] to <8 x i8>
	// CHECK: [[TMP7:%.*]] = bitcast <8 x i8> [[TMP4]] to <2 x float>			// CHECK: [[TMP7:%.*]] = bitcast <8 x i8> [[TMP4]] to <2 x float>
	// CHECK: [[TMP8:%.*]] = bitcast <8 x i8> [[TMP6]] to <2 x float>			// CHECK: [[TMP8:%.*]] = bitcast <8 x i8> [[TMP6]] to <2 x float>
	// CHECK: call void @llvm.aarch64.neon.st2lane.v2f32.p0i8(<2 x float> [[TMP7]], <2 x float> [[TMP8]], i64 1, i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st2lane.v2f32.p0i8(<2 x float> [[TMP7]], <2 x float> [[TMP8]], i64 1, i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst2_lane_f32(float32_t *a, float32x2x2_t b) {			void test_vst2_lane_f32(float32_t *a, float32x2x2_t b) {
	vst2_lane_f32(a, b, 1);			vst2_lane_f32(a, b, 1);
	}			}

	// CHECK-LABEL: define void @test_vst2_lane_f64(double* %a, [2 x <1 x double>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst2_lane_f64(double* %a, [2 x <1 x double>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.float64x1x2_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.float64x1x2_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.float64x1x2_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.float64x1x2_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float64x1x2_t, %struct.float64x1x2_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float64x1x2_t, %struct.float64x1x2_t* [[B]], i32 0, i32 0
	// CHECK: store [2 x <1 x double>] [[B]].coerce, [2 x <1 x double>]* [[COERCE_DIVE]], align 8			// CHECK: store [2 x <1 x double>] [[B]].coerce, [2 x <1 x double>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.float64x1x2_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.float64x1x2_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.float64x1x2_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.float64x1x2_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 16, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 16, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast double* %a to i8*			// CHECK: [[TMP2:%.*]] = bitcast double* %a to i8*
	// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.float64x1x2_t, %struct.float64x1x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.float64x1x2_t, %struct.float64x1x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <1 x double>], [2 x <1 x double>]* [[VAL]], i64 0, i64 0			// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <1 x double>], [2 x <1 x double>]* [[VAL]], i64 0, i64 0
	// CHECK: [[TMP3:%.*]] = load <1 x double>, <1 x double>* [[ARRAYIDX]], align 8			// CHECK: [[TMP3:%.*]] = load <1 x double>, <1 x double>* [[ARRAYIDX]], align 8
	// CHECK: [[TMP4:%.*]] = bitcast <1 x double> [[TMP3]] to <8 x i8>			// CHECK: [[TMP4:%.*]] = bitcast <1 x double> [[TMP3]] to <8 x i8>
	// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.float64x1x2_t, %struct.float64x1x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.float64x1x2_t, %struct.float64x1x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <1 x double>], [2 x <1 x double>]* [[VAL1]], i64 0, i64 1			// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <1 x double>], [2 x <1 x double>]* [[VAL1]], i64 0, i64 1
	// CHECK: [[TMP5:%.*]] = load <1 x double>, <1 x double>* [[ARRAYIDX2]], align 8			// CHECK: [[TMP5:%.*]] = load <1 x double>, <1 x double>* [[ARRAYIDX2]], align 8
	// CHECK: [[TMP6:%.*]] = bitcast <1 x double> [[TMP5]] to <8 x i8>			// CHECK: [[TMP6:%.*]] = bitcast <1 x double> [[TMP5]] to <8 x i8>
	// CHECK: [[TMP7:%.*]] = bitcast <8 x i8> [[TMP4]] to <1 x double>			// CHECK: [[TMP7:%.*]] = bitcast <8 x i8> [[TMP4]] to <1 x double>
	// CHECK: [[TMP8:%.*]] = bitcast <8 x i8> [[TMP6]] to <1 x double>			// CHECK: [[TMP8:%.*]] = bitcast <8 x i8> [[TMP6]] to <1 x double>
	// CHECK: call void @llvm.aarch64.neon.st2lane.v1f64.p0i8(<1 x double> [[TMP7]], <1 x double> [[TMP8]], i64 0, i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st2lane.v1f64.p0i8(<1 x double> [[TMP7]], <1 x double> [[TMP8]], i64 0, i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst2_lane_f64(float64_t *a, float64x1x2_t b) {			void test_vst2_lane_f64(float64_t *a, float64x1x2_t b) {
	vst2_lane_f64(a, b, 0);			vst2_lane_f64(a, b, 0);
	}			}

	// CHECK-LABEL: define void @test_vst2_lane_p8(i8* %a, [2 x <8 x i8>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst2_lane_p8(i8* %a, [2 x <8 x i8>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.poly8x8x2_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.poly8x8x2_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.poly8x8x2_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.poly8x8x2_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly8x8x2_t, %struct.poly8x8x2_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly8x8x2_t, %struct.poly8x8x2_t* [[B]], i32 0, i32 0
	// CHECK: store [2 x <8 x i8>] [[B]].coerce, [2 x <8 x i8>]* [[COERCE_DIVE]], align 8			// CHECK: store [2 x <8 x i8>] [[B]].coerce, [2 x <8 x i8>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.poly8x8x2_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.poly8x8x2_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.poly8x8x2_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.poly8x8x2_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 16, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 16, i1 false)
	// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.poly8x8x2_t, %struct.poly8x8x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.poly8x8x2_t, %struct.poly8x8x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <8 x i8>], [2 x <8 x i8>]* [[VAL]], i64 0, i64 0			// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <8 x i8>], [2 x <8 x i8>]* [[VAL]], i64 0, i64 0
	// CHECK: [[TMP2:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX]], align 8			// CHECK: [[TMP2:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX]], align 8
	// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.poly8x8x2_t, %struct.poly8x8x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.poly8x8x2_t, %struct.poly8x8x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <8 x i8>], [2 x <8 x i8>]* [[VAL1]], i64 0, i64 1			// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <8 x i8>], [2 x <8 x i8>]* [[VAL1]], i64 0, i64 1
	// CHECK: [[TMP3:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX2]], align 8			// CHECK: [[TMP3:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX2]], align 8
	// CHECK: call void @llvm.aarch64.neon.st2lane.v8i8.p0i8(<8 x i8> [[TMP2]], <8 x i8> [[TMP3]], i64 7, i8* %a)			// CHECK: call void @llvm.aarch64.neon.st2lane.v8i8.p0i8(<8 x i8> [[TMP2]], <8 x i8> [[TMP3]], i64 7, i8* %a)
	// CHECK: ret void			// CHECK: ret void
	void test_vst2_lane_p8(poly8_t *a, poly8x8x2_t b) {			void test_vst2_lane_p8(poly8_t *a, poly8x8x2_t b) {
	vst2_lane_p8(a, b, 7);			vst2_lane_p8(a, b, 7);
	}			}

	// CHECK-LABEL: define void @test_vst2_lane_p16(i16* %a, [2 x <4 x i16>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst2_lane_p16(i16* %a, [2 x <4 x i16>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.poly16x4x2_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.poly16x4x2_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.poly16x4x2_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.poly16x4x2_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly16x4x2_t, %struct.poly16x4x2_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly16x4x2_t, %struct.poly16x4x2_t* [[B]], i32 0, i32 0
	// CHECK: store [2 x <4 x i16>] [[B]].coerce, [2 x <4 x i16>]* [[COERCE_DIVE]], align 8			// CHECK: store [2 x <4 x i16>] [[B]].coerce, [2 x <4 x i16>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.poly16x4x2_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.poly16x4x2_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.poly16x4x2_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.poly16x4x2_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 16, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 16, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast i16* %a to i8*			// CHECK: [[TMP2:%.*]] = bitcast i16* %a to i8*
	// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.poly16x4x2_t, %struct.poly16x4x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.poly16x4x2_t, %struct.poly16x4x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <4 x i16>], [2 x <4 x i16>]* [[VAL]], i64 0, i64 0			// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <4 x i16>], [2 x <4 x i16>]* [[VAL]], i64 0, i64 0
	// CHECK: [[TMP3:%.*]] = load <4 x i16>, <4 x i16>* [[ARRAYIDX]], align 8			// CHECK: [[TMP3:%.*]] = load <4 x i16>, <4 x i16>* [[ARRAYIDX]], align 8
	// CHECK: [[TMP4:%.*]] = bitcast <4 x i16> [[TMP3]] to <8 x i8>			// CHECK: [[TMP4:%.*]] = bitcast <4 x i16> [[TMP3]] to <8 x i8>
	// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.poly16x4x2_t, %struct.poly16x4x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.poly16x4x2_t, %struct.poly16x4x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <4 x i16>], [2 x <4 x i16>]* [[VAL1]], i64 0, i64 1			// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <4 x i16>], [2 x <4 x i16>]* [[VAL1]], i64 0, i64 1
	// CHECK: [[TMP5:%.*]] = load <4 x i16>, <4 x i16>* [[ARRAYIDX2]], align 8			// CHECK: [[TMP5:%.*]] = load <4 x i16>, <4 x i16>* [[ARRAYIDX2]], align 8
	// CHECK: [[TMP6:%.*]] = bitcast <4 x i16> [[TMP5]] to <8 x i8>			// CHECK: [[TMP6:%.*]] = bitcast <4 x i16> [[TMP5]] to <8 x i8>
	// CHECK: [[TMP7:%.*]] = bitcast <8 x i8> [[TMP4]] to <4 x i16>			// CHECK: [[TMP7:%.*]] = bitcast <8 x i8> [[TMP4]] to <4 x i16>
	// CHECK: [[TMP8:%.*]] = bitcast <8 x i8> [[TMP6]] to <4 x i16>			// CHECK: [[TMP8:%.*]] = bitcast <8 x i8> [[TMP6]] to <4 x i16>
	// CHECK: call void @llvm.aarch64.neon.st2lane.v4i16.p0i8(<4 x i16> [[TMP7]], <4 x i16> [[TMP8]], i64 3, i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st2lane.v4i16.p0i8(<4 x i16> [[TMP7]], <4 x i16> [[TMP8]], i64 3, i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst2_lane_p16(poly16_t *a, poly16x4x2_t b) {			void test_vst2_lane_p16(poly16_t *a, poly16x4x2_t b) {
	vst2_lane_p16(a, b, 3);			vst2_lane_p16(a, b, 3);
	}			}

	// CHECK-LABEL: define void @test_vst2_lane_p64(i64* %a, [2 x <1 x i64>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst2_lane_p64(i64* %a, [2 x <1 x i64>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.poly64x1x2_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.poly64x1x2_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.poly64x1x2_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.poly64x1x2_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly64x1x2_t, %struct.poly64x1x2_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly64x1x2_t, %struct.poly64x1x2_t* [[B]], i32 0, i32 0
	// CHECK: store [2 x <1 x i64>] [[B]].coerce, [2 x <1 x i64>]* [[COERCE_DIVE]], align 8			// CHECK: store [2 x <1 x i64>] [[B]].coerce, [2 x <1 x i64>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.poly64x1x2_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.poly64x1x2_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.poly64x1x2_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.poly64x1x2_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 16, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 16, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast i64* %a to i8*			// CHECK: [[TMP2:%.*]] = bitcast i64* %a to i8*
	// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.poly64x1x2_t, %struct.poly64x1x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.poly64x1x2_t, %struct.poly64x1x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <1 x i64>], [2 x <1 x i64>]* [[VAL]], i64 0, i64 0			// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <1 x i64>], [2 x <1 x i64>]* [[VAL]], i64 0, i64 0
	// CHECK: [[TMP3:%.*]] = load <1 x i64>, <1 x i64>* [[ARRAYIDX]], align 8			// CHECK: [[TMP3:%.*]] = load <1 x i64>, <1 x i64>* [[ARRAYIDX]], align 8
	// CHECK: [[TMP4:%.*]] = bitcast <1 x i64> [[TMP3]] to <8 x i8>			// CHECK: [[TMP4:%.*]] = bitcast <1 x i64> [[TMP3]] to <8 x i8>
	// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.poly64x1x2_t, %struct.poly64x1x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.poly64x1x2_t, %struct.poly64x1x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <1 x i64>], [2 x <1 x i64>]* [[VAL1]], i64 0, i64 1			// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <1 x i64>], [2 x <1 x i64>]* [[VAL1]], i64 0, i64 1
	// CHECK: [[TMP5:%.*]] = load <1 x i64>, <1 x i64>* [[ARRAYIDX2]], align 8			// CHECK: [[TMP5:%.*]] = load <1 x i64>, <1 x i64>* [[ARRAYIDX2]], align 8
	// CHECK: [[TMP6:%.*]] = bitcast <1 x i64> [[TMP5]] to <8 x i8>			// CHECK: [[TMP6:%.*]] = bitcast <1 x i64> [[TMP5]] to <8 x i8>
	// CHECK: [[TMP7:%.*]] = bitcast <8 x i8> [[TMP4]] to <1 x i64>			// CHECK: [[TMP7:%.*]] = bitcast <8 x i8> [[TMP4]] to <1 x i64>
	// CHECK: [[TMP8:%.*]] = bitcast <8 x i8> [[TMP6]] to <1 x i64>			// CHECK: [[TMP8:%.*]] = bitcast <8 x i8> [[TMP6]] to <1 x i64>
	// CHECK: call void @llvm.aarch64.neon.st2lane.v1i64.p0i8(<1 x i64> [[TMP7]], <1 x i64> [[TMP8]], i64 0, i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st2lane.v1i64.p0i8(<1 x i64> [[TMP7]], <1 x i64> [[TMP8]], i64 0, i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst2_lane_p64(poly64_t *a, poly64x1x2_t b) {			void test_vst2_lane_p64(poly64_t *a, poly64x1x2_t b) {
	vst2_lane_p64(a, b, 0);			vst2_lane_p64(a, b, 0);
	}			}

	// CHECK-LABEL: define void @test_vst3q_lane_u8(i8* %a, [3 x <16 x i8>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst3q_lane_u8(i8* %a, [3 x <16 x i8>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.uint8x16x3_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.uint8x16x3_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.uint8x16x3_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.uint8x16x3_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint8x16x3_t, %struct.uint8x16x3_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint8x16x3_t, %struct.uint8x16x3_t* [[B]], i32 0, i32 0
	// CHECK: store [3 x <16 x i8>] [[B]].coerce, [3 x <16 x i8>]* [[COERCE_DIVE]], align 16			// CHECK: store [3 x <16 x i8>] [[B]].coerce, [3 x <16 x i8>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.uint8x16x3_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.uint8x16x3_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.uint8x16x3_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.uint8x16x3_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 48, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 48, i1 false)
	// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.uint8x16x3_t, %struct.uint8x16x3_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.uint8x16x3_t, %struct.uint8x16x3_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL]], i64 0, i64 0			// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL]], i64 0, i64 0
	// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX]], align 16			// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX]], align 16
	// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.uint8x16x3_t, %struct.uint8x16x3_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.uint8x16x3_t, %struct.uint8x16x3_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL1]], i64 0, i64 1			// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL1]], i64 0, i64 1
	// CHECK: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2]], align 16			// CHECK: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2]], align 16
	// CHECK: [[VAL3:%.*]] = getelementptr inbounds %struct.uint8x16x3_t, %struct.uint8x16x3_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL3:%.*]] = getelementptr inbounds %struct.uint8x16x3_t, %struct.uint8x16x3_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX4:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL3]], i64 0, i64 2			// CHECK: [[ARRAYIDX4:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL3]], i64 0, i64 2
	// CHECK: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX4]], align 16			// CHECK: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX4]], align 16
	// CHECK: call void @llvm.aarch64.neon.st3lane.v16i8.p0i8(<16 x i8> [[TMP2]], <16 x i8> [[TMP3]], <16 x i8> [[TMP4]], i64 15, i8* %a)			// CHECK: call void @llvm.aarch64.neon.st3lane.v16i8.p0i8(<16 x i8> [[TMP2]], <16 x i8> [[TMP3]], <16 x i8> [[TMP4]], i64 15, i8* %a)
	// CHECK: ret void			// CHECK: ret void
	void test_vst3q_lane_u8(uint8_t *a, uint8x16x3_t b) {			void test_vst3q_lane_u8(uint8_t *a, uint8x16x3_t b) {
	vst3q_lane_u8(a, b, 15);			vst3q_lane_u8(a, b, 15);
	}			}

	// CHECK-LABEL: define void @test_vst3q_lane_u16(i16* %a, [3 x <8 x i16>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst3q_lane_u16(i16* %a, [3 x <8 x i16>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.uint16x8x3_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.uint16x8x3_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.uint16x8x3_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.uint16x8x3_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint16x8x3_t, %struct.uint16x8x3_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint16x8x3_t, %struct.uint16x8x3_t* [[B]], i32 0, i32 0
	// CHECK: store [3 x <8 x i16>] [[B]].coerce, [3 x <8 x i16>]* [[COERCE_DIVE]], align 16			// CHECK: store [3 x <8 x i16>] [[B]].coerce, [3 x <8 x i16>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.uint16x8x3_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.uint16x8x3_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.uint16x8x3_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.uint16x8x3_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 48, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 48, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast i16* %a to i8*			// CHECK: [[TMP2:%.*]] = bitcast i16* %a to i8*
	[13 unchanged lines not shown]
	// CHECK: [[TMP10:%.*]] = bitcast <16 x i8> [[TMP6]] to <8 x i16>			// CHECK: [[TMP10:%.*]] = bitcast <16 x i8> [[TMP6]] to <8 x i16>
	// CHECK: [[TMP11:%.*]] = bitcast <16 x i8> [[TMP8]] to <8 x i16>			// CHECK: [[TMP11:%.*]] = bitcast <16 x i8> [[TMP8]] to <8 x i16>
	// CHECK: call void @llvm.aarch64.neon.st3lane.v8i16.p0i8(<8 x i16> [[TMP9]], <8 x i16> [[TMP10]], <8 x i16> [[TMP11]], i64 7, i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st3lane.v8i16.p0i8(<8 x i16> [[TMP9]], <8 x i16> [[TMP10]], <8 x i16> [[TMP11]], i64 7, i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst3q_lane_u16(uint16_t *a, uint16x8x3_t b) {			void test_vst3q_lane_u16(uint16_t *a, uint16x8x3_t b) {
	vst3q_lane_u16(a, b, 7);			vst3q_lane_u16(a, b, 7);
	}			}

	// CHECK-LABEL: define void @test_vst3q_lane_u32(i32* %a, [3 x <4 x i32>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst3q_lane_u32(i32* %a, [3 x <4 x i32>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.uint32x4x3_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.uint32x4x3_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.uint32x4x3_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.uint32x4x3_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint32x4x3_t, %struct.uint32x4x3_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint32x4x3_t, %struct.uint32x4x3_t* [[B]], i32 0, i32 0
	// CHECK: store [3 x <4 x i32>] [[B]].coerce, [3 x <4 x i32>]* [[COERCE_DIVE]], align 16			// CHECK: store [3 x <4 x i32>] [[B]].coerce, [3 x <4 x i32>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.uint32x4x3_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.uint32x4x3_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.uint32x4x3_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.uint32x4x3_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 48, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 48, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast i32* %a to i8*			// CHECK: [[TMP2:%.*]] = bitcast i32* %a to i8*
	[13 unchanged lines not shown]
	// CHECK: [[TMP10:%.*]] = bitcast <16 x i8> [[TMP6]] to <4 x i32>			// CHECK: [[TMP10:%.*]] = bitcast <16 x i8> [[TMP6]] to <4 x i32>
	// CHECK: [[TMP11:%.*]] = bitcast <16 x i8> [[TMP8]] to <4 x i32>			// CHECK: [[TMP11:%.*]] = bitcast <16 x i8> [[TMP8]] to <4 x i32>
	// CHECK: call void @llvm.aarch64.neon.st3lane.v4i32.p0i8(<4 x i32> [[TMP9]], <4 x i32> [[TMP10]], <4 x i32> [[TMP11]], i64 3, i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st3lane.v4i32.p0i8(<4 x i32> [[TMP9]], <4 x i32> [[TMP10]], <4 x i32> [[TMP11]], i64 3, i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst3q_lane_u32(uint32_t *a, uint32x4x3_t b) {			void test_vst3q_lane_u32(uint32_t *a, uint32x4x3_t b) {
	vst3q_lane_u32(a, b, 3);			vst3q_lane_u32(a, b, 3);
	}			}

	// CHECK-LABEL: define void @test_vst3q_lane_u64(i64* %a, [3 x <2 x i64>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst3q_lane_u64(i64* %a, [3 x <2 x i64>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.uint64x2x3_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.uint64x2x3_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.uint64x2x3_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.uint64x2x3_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint64x2x3_t, %struct.uint64x2x3_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint64x2x3_t, %struct.uint64x2x3_t* [[B]], i32 0, i32 0
	// CHECK: store [3 x <2 x i64>] [[B]].coerce, [3 x <2 x i64>]* [[COERCE_DIVE]], align 16			// CHECK: store [3 x <2 x i64>] [[B]].coerce, [3 x <2 x i64>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.uint64x2x3_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.uint64x2x3_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.uint64x2x3_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.uint64x2x3_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 48, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 48, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast i64* %a to i8*			// CHECK: [[TMP2:%.*]] = bitcast i64* %a to i8*
	[13 unchanged lines not shown]
	// CHECK: [[TMP10:%.*]] = bitcast <16 x i8> [[TMP6]] to <2 x i64>			// CHECK: [[TMP10:%.*]] = bitcast <16 x i8> [[TMP6]] to <2 x i64>
	// CHECK: [[TMP11:%.*]] = bitcast <16 x i8> [[TMP8]] to <2 x i64>			// CHECK: [[TMP11:%.*]] = bitcast <16 x i8> [[TMP8]] to <2 x i64>
	// CHECK: call void @llvm.aarch64.neon.st3lane.v2i64.p0i8(<2 x i64> [[TMP9]], <2 x i64> [[TMP10]], <2 x i64> [[TMP11]], i64 1, i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st3lane.v2i64.p0i8(<2 x i64> [[TMP9]], <2 x i64> [[TMP10]], <2 x i64> [[TMP11]], i64 1, i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst3q_lane_u64(uint64_t *a, uint64x2x3_t b) {			void test_vst3q_lane_u64(uint64_t *a, uint64x2x3_t b) {
	vst3q_lane_u64(a, b, 1);			vst3q_lane_u64(a, b, 1);
	}			}

	// CHECK-LABEL: define void @test_vst3q_lane_s8(i8* %a, [3 x <16 x i8>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst3q_lane_s8(i8* %a, [3 x <16 x i8>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.int8x16x3_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.int8x16x3_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.int8x16x3_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.int8x16x3_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x16x3_t, %struct.int8x16x3_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x16x3_t, %struct.int8x16x3_t* [[B]], i32 0, i32 0
	// CHECK: store [3 x <16 x i8>] [[B]].coerce, [3 x <16 x i8>]* [[COERCE_DIVE]], align 16			// CHECK: store [3 x <16 x i8>] [[B]].coerce, [3 x <16 x i8>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.int8x16x3_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.int8x16x3_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.int8x16x3_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.int8x16x3_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 48, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 48, i1 false)
	// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.int8x16x3_t, %struct.int8x16x3_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.int8x16x3_t, %struct.int8x16x3_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL]], i64 0, i64 0			// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL]], i64 0, i64 0
	// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX]], align 16			// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX]], align 16
	// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.int8x16x3_t, %struct.int8x16x3_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.int8x16x3_t, %struct.int8x16x3_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL1]], i64 0, i64 1			// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL1]], i64 0, i64 1
	// CHECK: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2]], align 16			// CHECK: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2]], align 16
	// CHECK: [[VAL3:%.*]] = getelementptr inbounds %struct.int8x16x3_t, %struct.int8x16x3_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL3:%.*]] = getelementptr inbounds %struct.int8x16x3_t, %struct.int8x16x3_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX4:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL3]], i64 0, i64 2			// CHECK: [[ARRAYIDX4:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL3]], i64 0, i64 2
	// CHECK: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX4]], align 16			// CHECK: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX4]], align 16
	// CHECK: call void @llvm.aarch64.neon.st3lane.v16i8.p0i8(<16 x i8> [[TMP2]], <16 x i8> [[TMP3]], <16 x i8> [[TMP4]], i64 15, i8* %a)			// CHECK: call void @llvm.aarch64.neon.st3lane.v16i8.p0i8(<16 x i8> [[TMP2]], <16 x i8> [[TMP3]], <16 x i8> [[TMP4]], i64 15, i8* %a)
	// CHECK: ret void			// CHECK: ret void
	void test_vst3q_lane_s8(int8_t *a, int8x16x3_t b) {			void test_vst3q_lane_s8(int8_t *a, int8x16x3_t b) {
	vst3q_lane_s8(a, b, 15);			vst3q_lane_s8(a, b, 15);
	}			}

	// CHECK-LABEL: define void @test_vst3q_lane_s16(i16* %a, [3 x <8 x i16>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst3q_lane_s16(i16* %a, [3 x <8 x i16>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.int16x8x3_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.int16x8x3_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.int16x8x3_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.int16x8x3_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int16x8x3_t, %struct.int16x8x3_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int16x8x3_t, %struct.int16x8x3_t* [[B]], i32 0, i32 0
	// CHECK: store [3 x <8 x i16>] [[B]].coerce, [3 x <8 x i16>]* [[COERCE_DIVE]], align 16			// CHECK: store [3 x <8 x i16>] [[B]].coerce, [3 x <8 x i16>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.int16x8x3_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.int16x8x3_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.int16x8x3_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.int16x8x3_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 48, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 48, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast i16* %a to i8*			// CHECK: [[TMP2:%.*]] = bitcast i16* %a to i8*
	[13 unchanged lines not shown]
	// CHECK: [[TMP10:%.*]] = bitcast <16 x i8> [[TMP6]] to <8 x i16>			// CHECK: [[TMP10:%.*]] = bitcast <16 x i8> [[TMP6]] to <8 x i16>
	// CHECK: [[TMP11:%.*]] = bitcast <16 x i8> [[TMP8]] to <8 x i16>			// CHECK: [[TMP11:%.*]] = bitcast <16 x i8> [[TMP8]] to <8 x i16>
	// CHECK: call void @llvm.aarch64.neon.st3lane.v8i16.p0i8(<8 x i16> [[TMP9]], <8 x i16> [[TMP10]], <8 x i16> [[TMP11]], i64 7, i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st3lane.v8i16.p0i8(<8 x i16> [[TMP9]], <8 x i16> [[TMP10]], <8 x i16> [[TMP11]], i64 7, i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst3q_lane_s16(int16_t *a, int16x8x3_t b) {			void test_vst3q_lane_s16(int16_t *a, int16x8x3_t b) {
	vst3q_lane_s16(a, b, 7);			vst3q_lane_s16(a, b, 7);
	}			}

	// CHECK-LABEL: define void @test_vst3q_lane_s32(i32* %a, [3 x <4 x i32>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst3q_lane_s32(i32* %a, [3 x <4 x i32>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.int32x4x3_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.int32x4x3_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.int32x4x3_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.int32x4x3_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int32x4x3_t, %struct.int32x4x3_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int32x4x3_t, %struct.int32x4x3_t* [[B]], i32 0, i32 0
	// CHECK: store [3 x <4 x i32>] [[B]].coerce, [3 x <4 x i32>]* [[COERCE_DIVE]], align 16			// CHECK: store [3 x <4 x i32>] [[B]].coerce, [3 x <4 x i32>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.int32x4x3_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.int32x4x3_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.int32x4x3_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.int32x4x3_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 48, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 48, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast i32* %a to i8*			// CHECK: [[TMP2:%.*]] = bitcast i32* %a to i8*
	[13 unchanged lines not shown]
	// CHECK: [[TMP10:%.*]] = bitcast <16 x i8> [[TMP6]] to <4 x i32>			// CHECK: [[TMP10:%.*]] = bitcast <16 x i8> [[TMP6]] to <4 x i32>
	// CHECK: [[TMP11:%.*]] = bitcast <16 x i8> [[TMP8]] to <4 x i32>			// CHECK: [[TMP11:%.*]] = bitcast <16 x i8> [[TMP8]] to <4 x i32>
	// CHECK: call void @llvm.aarch64.neon.st3lane.v4i32.p0i8(<4 x i32> [[TMP9]], <4 x i32> [[TMP10]], <4 x i32> [[TMP11]], i64 3, i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st3lane.v4i32.p0i8(<4 x i32> [[TMP9]], <4 x i32> [[TMP10]], <4 x i32> [[TMP11]], i64 3, i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst3q_lane_s32(int32_t *a, int32x4x3_t b) {			void test_vst3q_lane_s32(int32_t *a, int32x4x3_t b) {
	vst3q_lane_s32(a, b, 3);			vst3q_lane_s32(a, b, 3);
	}			}

	// CHECK-LABEL: define void @test_vst3q_lane_s64(i64* %a, [3 x <2 x i64>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst3q_lane_s64(i64* %a, [3 x <2 x i64>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.int64x2x3_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.int64x2x3_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.int64x2x3_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.int64x2x3_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int64x2x3_t, %struct.int64x2x3_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int64x2x3_t, %struct.int64x2x3_t* [[B]], i32 0, i32 0
	// CHECK: store [3 x <2 x i64>] [[B]].coerce, [3 x <2 x i64>]* [[COERCE_DIVE]], align 16			// CHECK: store [3 x <2 x i64>] [[B]].coerce, [3 x <2 x i64>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.int64x2x3_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.int64x2x3_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.int64x2x3_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.int64x2x3_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 48, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 48, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast i64* %a to i8*			// CHECK: [[TMP2:%.*]] = bitcast i64* %a to i8*
	[13 unchanged lines not shown]
	// CHECK: [[TMP10:%.*]] = bitcast <16 x i8> [[TMP6]] to <2 x i64>			// CHECK: [[TMP10:%.*]] = bitcast <16 x i8> [[TMP6]] to <2 x i64>
	// CHECK: [[TMP11:%.*]] = bitcast <16 x i8> [[TMP8]] to <2 x i64>			// CHECK: [[TMP11:%.*]] = bitcast <16 x i8> [[TMP8]] to <2 x i64>
	// CHECK: call void @llvm.aarch64.neon.st3lane.v2i64.p0i8(<2 x i64> [[TMP9]], <2 x i64> [[TMP10]], <2 x i64> [[TMP11]], i64 1, i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st3lane.v2i64.p0i8(<2 x i64> [[TMP9]], <2 x i64> [[TMP10]], <2 x i64> [[TMP11]], i64 1, i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst3q_lane_s64(int64_t *a, int64x2x3_t b) {			void test_vst3q_lane_s64(int64_t *a, int64x2x3_t b) {
	vst3q_lane_s64(a, b, 1);			vst3q_lane_s64(a, b, 1);
	}			}

	// CHECK-LABEL: define void @test_vst3q_lane_f16(half* %a, [3 x <8 x half>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst3q_lane_f16(half* %a, [3 x <8 x half>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.float16x8x3_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.float16x8x3_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.float16x8x3_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.float16x8x3_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float16x8x3_t, %struct.float16x8x3_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float16x8x3_t, %struct.float16x8x3_t* [[B]], i32 0, i32 0
	// CHECK: store [3 x <8 x half>] [[B]].coerce, [3 x <8 x half>]* [[COERCE_DIVE]], align 16			// CHECK: store [3 x <8 x half>] [[B]].coerce, [3 x <8 x half>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x8x3_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x8x3_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.float16x8x3_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.float16x8x3_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 48, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 48, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast half* %a to i8*			// CHECK: [[TMP2:%.*]] = bitcast half* %a to i8*
	[13 unchanged lines not shown]
	// CHECK: [[TMP10:%.*]] = bitcast <16 x i8> [[TMP6]] to <8 x half>			// CHECK: [[TMP10:%.*]] = bitcast <16 x i8> [[TMP6]] to <8 x half>
	// CHECK: [[TMP11:%.*]] = bitcast <16 x i8> [[TMP8]] to <8 x half>			// CHECK: [[TMP11:%.*]] = bitcast <16 x i8> [[TMP8]] to <8 x half>
	// CHECK: call void @llvm.aarch64.neon.st3lane.v8f16.p0i8(<8 x half> [[TMP9]], <8 x half> [[TMP10]], <8 x half> [[TMP11]], i64 7, i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st3lane.v8f16.p0i8(<8 x half> [[TMP9]], <8 x half> [[TMP10]], <8 x half> [[TMP11]], i64 7, i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst3q_lane_f16(float16_t *a, float16x8x3_t b) {			void test_vst3q_lane_f16(float16_t *a, float16x8x3_t b) {
	vst3q_lane_f16(a, b, 7);			vst3q_lane_f16(a, b, 7);
	}			}

	// CHECK-LABEL: define void @test_vst3q_lane_f32(float* %a, [3 x <4 x float>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst3q_lane_f32(float* %a, [3 x <4 x float>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.float32x4x3_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.float32x4x3_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.float32x4x3_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.float32x4x3_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float32x4x3_t, %struct.float32x4x3_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float32x4x3_t, %struct.float32x4x3_t* [[B]], i32 0, i32 0
	// CHECK: store [3 x <4 x float>] [[B]].coerce, [3 x <4 x float>]* [[COERCE_DIVE]], align 16			// CHECK: store [3 x <4 x float>] [[B]].coerce, [3 x <4 x float>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.float32x4x3_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.float32x4x3_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.float32x4x3_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.float32x4x3_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 48, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 48, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast float* %a to i8*			// CHECK: [[TMP2:%.*]] = bitcast float* %a to i8*
	[13 unchanged lines not shown]
	// CHECK: [[TMP10:%.*]] = bitcast <16 x i8> [[TMP6]] to <4 x float>			// CHECK: [[TMP10:%.*]] = bitcast <16 x i8> [[TMP6]] to <4 x float>
	// CHECK: [[TMP11:%.*]] = bitcast <16 x i8> [[TMP8]] to <4 x float>			// CHECK: [[TMP11:%.*]] = bitcast <16 x i8> [[TMP8]] to <4 x float>
	// CHECK: call void @llvm.aarch64.neon.st3lane.v4f32.p0i8(<4 x float> [[TMP9]], <4 x float> [[TMP10]], <4 x float> [[TMP11]], i64 3, i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st3lane.v4f32.p0i8(<4 x float> [[TMP9]], <4 x float> [[TMP10]], <4 x float> [[TMP11]], i64 3, i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst3q_lane_f32(float32_t *a, float32x4x3_t b) {			void test_vst3q_lane_f32(float32_t *a, float32x4x3_t b) {
	vst3q_lane_f32(a, b, 3);			vst3q_lane_f32(a, b, 3);
	}			}

	// CHECK-LABEL: define void @test_vst3q_lane_f64(double* %a, [3 x <2 x double>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst3q_lane_f64(double* %a, [3 x <2 x double>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.float64x2x3_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.float64x2x3_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.float64x2x3_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.float64x2x3_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float64x2x3_t, %struct.float64x2x3_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float64x2x3_t, %struct.float64x2x3_t* [[B]], i32 0, i32 0
	// CHECK: store [3 x <2 x double>] [[B]].coerce, [3 x <2 x double>]* [[COERCE_DIVE]], align 16			// CHECK: store [3 x <2 x double>] [[B]].coerce, [3 x <2 x double>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.float64x2x3_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.float64x2x3_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.float64x2x3_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.float64x2x3_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 48, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 48, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast double* %a to i8*			// CHECK: [[TMP2:%.*]] = bitcast double* %a to i8*
	[13 unchanged lines not shown]
	// CHECK: [[TMP10:%.*]] = bitcast <16 x i8> [[TMP6]] to <2 x double>			// CHECK: [[TMP10:%.*]] = bitcast <16 x i8> [[TMP6]] to <2 x double>
	// CHECK: [[TMP11:%.*]] = bitcast <16 x i8> [[TMP8]] to <2 x double>			// CHECK: [[TMP11:%.*]] = bitcast <16 x i8> [[TMP8]] to <2 x double>
	// CHECK: call void @llvm.aarch64.neon.st3lane.v2f64.p0i8(<2 x double> [[TMP9]], <2 x double> [[TMP10]], <2 x double> [[TMP11]], i64 1, i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st3lane.v2f64.p0i8(<2 x double> [[TMP9]], <2 x double> [[TMP10]], <2 x double> [[TMP11]], i64 1, i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst3q_lane_f64(float64_t *a, float64x2x3_t b) {			void test_vst3q_lane_f64(float64_t *a, float64x2x3_t b) {
	vst3q_lane_f64(a, b, 1);			vst3q_lane_f64(a, b, 1);
	}			}

	// CHECK-LABEL: define void @test_vst3q_lane_p8(i8* %a, [3 x <16 x i8>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst3q_lane_p8(i8* %a, [3 x <16 x i8>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.poly8x16x3_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.poly8x16x3_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.poly8x16x3_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.poly8x16x3_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly8x16x3_t, %struct.poly8x16x3_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly8x16x3_t, %struct.poly8x16x3_t* [[B]], i32 0, i32 0
	// CHECK: store [3 x <16 x i8>] [[B]].coerce, [3 x <16 x i8>]* [[COERCE_DIVE]], align 16			// CHECK: store [3 x <16 x i8>] [[B]].coerce, [3 x <16 x i8>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.poly8x16x3_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.poly8x16x3_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.poly8x16x3_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.poly8x16x3_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 48, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 48, i1 false)
	// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.poly8x16x3_t, %struct.poly8x16x3_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.poly8x16x3_t, %struct.poly8x16x3_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL]], i64 0, i64 0			// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL]], i64 0, i64 0
	// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX]], align 16			// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX]], align 16
	// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.poly8x16x3_t, %struct.poly8x16x3_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.poly8x16x3_t, %struct.poly8x16x3_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL1]], i64 0, i64 1			// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL1]], i64 0, i64 1
	// CHECK: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2]], align 16			// CHECK: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2]], align 16
	// CHECK: [[VAL3:%.*]] = getelementptr inbounds %struct.poly8x16x3_t, %struct.poly8x16x3_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL3:%.*]] = getelementptr inbounds %struct.poly8x16x3_t, %struct.poly8x16x3_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX4:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL3]], i64 0, i64 2			// CHECK: [[ARRAYIDX4:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL3]], i64 0, i64 2
	// CHECK: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX4]], align 16			// CHECK: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX4]], align 16
	// CHECK: call void @llvm.aarch64.neon.st3lane.v16i8.p0i8(<16 x i8> [[TMP2]], <16 x i8> [[TMP3]], <16 x i8> [[TMP4]], i64 15, i8* %a)			// CHECK: call void @llvm.aarch64.neon.st3lane.v16i8.p0i8(<16 x i8> [[TMP2]], <16 x i8> [[TMP3]], <16 x i8> [[TMP4]], i64 15, i8* %a)
	// CHECK: ret void			// CHECK: ret void
	void test_vst3q_lane_p8(poly8_t *a, poly8x16x3_t b) {			void test_vst3q_lane_p8(poly8_t *a, poly8x16x3_t b) {
	vst3q_lane_p8(a, b, 15);			vst3q_lane_p8(a, b, 15);
	}			}

	// CHECK-LABEL: define void @test_vst3q_lane_p16(i16* %a, [3 x <8 x i16>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst3q_lane_p16(i16* %a, [3 x <8 x i16>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.poly16x8x3_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.poly16x8x3_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.poly16x8x3_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.poly16x8x3_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly16x8x3_t, %struct.poly16x8x3_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly16x8x3_t, %struct.poly16x8x3_t* [[B]], i32 0, i32 0
	// CHECK: store [3 x <8 x i16>] [[B]].coerce, [3 x <8 x i16>]* [[COERCE_DIVE]], align 16			// CHECK: store [3 x <8 x i16>] [[B]].coerce, [3 x <8 x i16>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.poly16x8x3_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.poly16x8x3_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.poly16x8x3_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.poly16x8x3_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 48, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 48, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast i16* %a to i8*			// CHECK: [[TMP2:%.*]] = bitcast i16* %a to i8*
	Show All 13 Lines
	// CHECK: [[TMP10:%.*]] = bitcast <16 x i8> [[TMP6]] to <8 x i16>			// CHECK: [[TMP10:%.*]] = bitcast <16 x i8> [[TMP6]] to <8 x i16>
	// CHECK: [[TMP11:%.*]] = bitcast <16 x i8> [[TMP8]] to <8 x i16>			// CHECK: [[TMP11:%.*]] = bitcast <16 x i8> [[TMP8]] to <8 x i16>
	// CHECK: call void @llvm.aarch64.neon.st3lane.v8i16.p0i8(<8 x i16> [[TMP9]], <8 x i16> [[TMP10]], <8 x i16> [[TMP11]], i64 7, i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st3lane.v8i16.p0i8(<8 x i16> [[TMP9]], <8 x i16> [[TMP10]], <8 x i16> [[TMP11]], i64 7, i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst3q_lane_p16(poly16_t *a, poly16x8x3_t b) {			void test_vst3q_lane_p16(poly16_t *a, poly16x8x3_t b) {
	vst3q_lane_p16(a, b, 7);			vst3q_lane_p16(a, b, 7);
	}			}

	// CHECK-LABEL: define void @test_vst3q_lane_p64(i64* %a, [3 x <2 x i64>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst3q_lane_p64(i64* %a, [3 x <2 x i64>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.poly64x2x3_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.poly64x2x3_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.poly64x2x3_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.poly64x2x3_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly64x2x3_t, %struct.poly64x2x3_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly64x2x3_t, %struct.poly64x2x3_t* [[B]], i32 0, i32 0
	// CHECK: store [3 x <2 x i64>] [[B]].coerce, [3 x <2 x i64>]* [[COERCE_DIVE]], align 16			// CHECK: store [3 x <2 x i64>] [[B]].coerce, [3 x <2 x i64>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.poly64x2x3_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.poly64x2x3_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.poly64x2x3_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.poly64x2x3_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 48, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 48, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast i64* %a to i8*			// CHECK: [[TMP2:%.*]] = bitcast i64* %a to i8*
	Show All 13 Lines
	// CHECK: [[TMP10:%.*]] = bitcast <16 x i8> [[TMP6]] to <2 x i64>			// CHECK: [[TMP10:%.*]] = bitcast <16 x i8> [[TMP6]] to <2 x i64>
	// CHECK: [[TMP11:%.*]] = bitcast <16 x i8> [[TMP8]] to <2 x i64>			// CHECK: [[TMP11:%.*]] = bitcast <16 x i8> [[TMP8]] to <2 x i64>
	// CHECK: call void @llvm.aarch64.neon.st3lane.v2i64.p0i8(<2 x i64> [[TMP9]], <2 x i64> [[TMP10]], <2 x i64> [[TMP11]], i64 1, i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st3lane.v2i64.p0i8(<2 x i64> [[TMP9]], <2 x i64> [[TMP10]], <2 x i64> [[TMP11]], i64 1, i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst3q_lane_p64(poly64_t *a, poly64x2x3_t b) {			void test_vst3q_lane_p64(poly64_t *a, poly64x2x3_t b) {
	vst3q_lane_p64(a, b, 1);			vst3q_lane_p64(a, b, 1);
	}			}

	// CHECK-LABEL: define void @test_vst3_lane_u8(i8* %a, [3 x <8 x i8>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst3_lane_u8(i8* %a, [3 x <8 x i8>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.uint8x8x3_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.uint8x8x3_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.uint8x8x3_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.uint8x8x3_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint8x8x3_t, %struct.uint8x8x3_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint8x8x3_t, %struct.uint8x8x3_t* [[B]], i32 0, i32 0
	// CHECK: store [3 x <8 x i8>] [[B]].coerce, [3 x <8 x i8>]* [[COERCE_DIVE]], align 8			// CHECK: store [3 x <8 x i8>] [[B]].coerce, [3 x <8 x i8>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.uint8x8x3_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.uint8x8x3_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.uint8x8x3_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.uint8x8x3_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 24, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 24, i1 false)
	// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.uint8x8x3_t, %struct.uint8x8x3_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.uint8x8x3_t, %struct.uint8x8x3_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [3 x <8 x i8>], [3 x <8 x i8>]* [[VAL]], i64 0, i64 0			// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [3 x <8 x i8>], [3 x <8 x i8>]* [[VAL]], i64 0, i64 0
	// CHECK: [[TMP2:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX]], align 8			// CHECK: [[TMP2:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX]], align 8
	// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.uint8x8x3_t, %struct.uint8x8x3_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.uint8x8x3_t, %struct.uint8x8x3_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [3 x <8 x i8>], [3 x <8 x i8>]* [[VAL1]], i64 0, i64 1			// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [3 x <8 x i8>], [3 x <8 x i8>]* [[VAL1]], i64 0, i64 1
	// CHECK: [[TMP3:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX2]], align 8			// CHECK: [[TMP3:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX2]], align 8
	// CHECK: [[VAL3:%.*]] = getelementptr inbounds %struct.uint8x8x3_t, %struct.uint8x8x3_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL3:%.*]] = getelementptr inbounds %struct.uint8x8x3_t, %struct.uint8x8x3_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX4:%.*]] = getelementptr inbounds [3 x <8 x i8>], [3 x <8 x i8>]* [[VAL3]], i64 0, i64 2			// CHECK: [[ARRAYIDX4:%.*]] = getelementptr inbounds [3 x <8 x i8>], [3 x <8 x i8>]* [[VAL3]], i64 0, i64 2
	// CHECK: [[TMP4:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX4]], align 8			// CHECK: [[TMP4:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX4]], align 8
	// CHECK: call void @llvm.aarch64.neon.st3lane.v8i8.p0i8(<8 x i8> [[TMP2]], <8 x i8> [[TMP3]], <8 x i8> [[TMP4]], i64 7, i8* %a)			// CHECK: call void @llvm.aarch64.neon.st3lane.v8i8.p0i8(<8 x i8> [[TMP2]], <8 x i8> [[TMP3]], <8 x i8> [[TMP4]], i64 7, i8* %a)
	// CHECK: ret void			// CHECK: ret void
	void test_vst3_lane_u8(uint8_t *a, uint8x8x3_t b) {			void test_vst3_lane_u8(uint8_t *a, uint8x8x3_t b) {
	vst3_lane_u8(a, b, 7);			vst3_lane_u8(a, b, 7);
	}			}

	// CHECK-LABEL: define void @test_vst3_lane_u16(i16* %a, [3 x <4 x i16>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst3_lane_u16(i16* %a, [3 x <4 x i16>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.uint16x4x3_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.uint16x4x3_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.uint16x4x3_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.uint16x4x3_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint16x4x3_t, %struct.uint16x4x3_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint16x4x3_t, %struct.uint16x4x3_t* [[B]], i32 0, i32 0
	// CHECK: store [3 x <4 x i16>] [[B]].coerce, [3 x <4 x i16>]* [[COERCE_DIVE]], align 8			// CHECK: store [3 x <4 x i16>] [[B]].coerce, [3 x <4 x i16>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.uint16x4x3_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.uint16x4x3_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.uint16x4x3_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.uint16x4x3_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 24, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 24, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast i16* %a to i8*			// CHECK: [[TMP2:%.*]] = bitcast i16* %a to i8*
	Show All 13 Lines
	// CHECK: [[TMP10:%.*]] = bitcast <8 x i8> [[TMP6]] to <4 x i16>			// CHECK: [[TMP10:%.*]] = bitcast <8 x i8> [[TMP6]] to <4 x i16>
	// CHECK: [[TMP11:%.*]] = bitcast <8 x i8> [[TMP8]] to <4 x i16>			// CHECK: [[TMP11:%.*]] = bitcast <8 x i8> [[TMP8]] to <4 x i16>
	// CHECK: call void @llvm.aarch64.neon.st3lane.v4i16.p0i8(<4 x i16> [[TMP9]], <4 x i16> [[TMP10]], <4 x i16> [[TMP11]], i64 3, i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st3lane.v4i16.p0i8(<4 x i16> [[TMP9]], <4 x i16> [[TMP10]], <4 x i16> [[TMP11]], i64 3, i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst3_lane_u16(uint16_t *a, uint16x4x3_t b) {			void test_vst3_lane_u16(uint16_t *a, uint16x4x3_t b) {
	vst3_lane_u16(a, b, 3);			vst3_lane_u16(a, b, 3);
	}			}

	// CHECK-LABEL: define void @test_vst3_lane_u32(i32* %a, [3 x <2 x i32>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst3_lane_u32(i32* %a, [3 x <2 x i32>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.uint32x2x3_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.uint32x2x3_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.uint32x2x3_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.uint32x2x3_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint32x2x3_t, %struct.uint32x2x3_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint32x2x3_t, %struct.uint32x2x3_t* [[B]], i32 0, i32 0
	// CHECK: store [3 x <2 x i32>] [[B]].coerce, [3 x <2 x i32>]* [[COERCE_DIVE]], align 8			// CHECK: store [3 x <2 x i32>] [[B]].coerce, [3 x <2 x i32>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.uint32x2x3_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.uint32x2x3_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.uint32x2x3_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.uint32x2x3_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 24, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 24, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast i32* %a to i8*			// CHECK: [[TMP2:%.*]] = bitcast i32* %a to i8*
	Show All 13 Lines
	// CHECK: [[TMP10:%.*]] = bitcast <8 x i8> [[TMP6]] to <2 x i32>			// CHECK: [[TMP10:%.*]] = bitcast <8 x i8> [[TMP6]] to <2 x i32>
	// CHECK: [[TMP11:%.*]] = bitcast <8 x i8> [[TMP8]] to <2 x i32>			// CHECK: [[TMP11:%.*]] = bitcast <8 x i8> [[TMP8]] to <2 x i32>
	// CHECK: call void @llvm.aarch64.neon.st3lane.v2i32.p0i8(<2 x i32> [[TMP9]], <2 x i32> [[TMP10]], <2 x i32> [[TMP11]], i64 1, i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st3lane.v2i32.p0i8(<2 x i32> [[TMP9]], <2 x i32> [[TMP10]], <2 x i32> [[TMP11]], i64 1, i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst3_lane_u32(uint32_t *a, uint32x2x3_t b) {			void test_vst3_lane_u32(uint32_t *a, uint32x2x3_t b) {
	vst3_lane_u32(a, b, 1);			vst3_lane_u32(a, b, 1);
	}			}

	// CHECK-LABEL: define void @test_vst3_lane_u64(i64* %a, [3 x <1 x i64>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst3_lane_u64(i64* %a, [3 x <1 x i64>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.uint64x1x3_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.uint64x1x3_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.uint64x1x3_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.uint64x1x3_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint64x1x3_t, %struct.uint64x1x3_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint64x1x3_t, %struct.uint64x1x3_t* [[B]], i32 0, i32 0
	// CHECK: store [3 x <1 x i64>] [[B]].coerce, [3 x <1 x i64>]* [[COERCE_DIVE]], align 8			// CHECK: store [3 x <1 x i64>] [[B]].coerce, [3 x <1 x i64>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.uint64x1x3_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.uint64x1x3_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.uint64x1x3_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.uint64x1x3_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 24, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 24, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast i64* %a to i8*			// CHECK: [[TMP2:%.*]] = bitcast i64* %a to i8*
	Show All 13 Lines
	// CHECK: [[TMP10:%.*]] = bitcast <8 x i8> [[TMP6]] to <1 x i64>			// CHECK: [[TMP10:%.*]] = bitcast <8 x i8> [[TMP6]] to <1 x i64>
	// CHECK: [[TMP11:%.*]] = bitcast <8 x i8> [[TMP8]] to <1 x i64>			// CHECK: [[TMP11:%.*]] = bitcast <8 x i8> [[TMP8]] to <1 x i64>
	// CHECK: call void @llvm.aarch64.neon.st3lane.v1i64.p0i8(<1 x i64> [[TMP9]], <1 x i64> [[TMP10]], <1 x i64> [[TMP11]], i64 0, i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st3lane.v1i64.p0i8(<1 x i64> [[TMP9]], <1 x i64> [[TMP10]], <1 x i64> [[TMP11]], i64 0, i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst3_lane_u64(uint64_t *a, uint64x1x3_t b) {			void test_vst3_lane_u64(uint64_t *a, uint64x1x3_t b) {
	vst3_lane_u64(a, b, 0);			vst3_lane_u64(a, b, 0);
	}			}

	// CHECK-LABEL: define void @test_vst3_lane_s8(i8* %a, [3 x <8 x i8>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst3_lane_s8(i8* %a, [3 x <8 x i8>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.int8x8x3_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.int8x8x3_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.int8x8x3_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.int8x8x3_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x8x3_t, %struct.int8x8x3_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x8x3_t, %struct.int8x8x3_t* [[B]], i32 0, i32 0
	// CHECK: store [3 x <8 x i8>] [[B]].coerce, [3 x <8 x i8>]* [[COERCE_DIVE]], align 8			// CHECK: store [3 x <8 x i8>] [[B]].coerce, [3 x <8 x i8>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.int8x8x3_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.int8x8x3_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.int8x8x3_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.int8x8x3_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 24, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 24, i1 false)
	// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.int8x8x3_t, %struct.int8x8x3_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.int8x8x3_t, %struct.int8x8x3_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [3 x <8 x i8>], [3 x <8 x i8>]* [[VAL]], i64 0, i64 0			// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [3 x <8 x i8>], [3 x <8 x i8>]* [[VAL]], i64 0, i64 0
	// CHECK: [[TMP2:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX]], align 8			// CHECK: [[TMP2:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX]], align 8
	// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.int8x8x3_t, %struct.int8x8x3_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.int8x8x3_t, %struct.int8x8x3_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [3 x <8 x i8>], [3 x <8 x i8>]* [[VAL1]], i64 0, i64 1			// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [3 x <8 x i8>], [3 x <8 x i8>]* [[VAL1]], i64 0, i64 1
	// CHECK: [[TMP3:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX2]], align 8			// CHECK: [[TMP3:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX2]], align 8
	// CHECK: [[VAL3:%.*]] = getelementptr inbounds %struct.int8x8x3_t, %struct.int8x8x3_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL3:%.*]] = getelementptr inbounds %struct.int8x8x3_t, %struct.int8x8x3_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX4:%.*]] = getelementptr inbounds [3 x <8 x i8>], [3 x <8 x i8>]* [[VAL3]], i64 0, i64 2			// CHECK: [[ARRAYIDX4:%.*]] = getelementptr inbounds [3 x <8 x i8>], [3 x <8 x i8>]* [[VAL3]], i64 0, i64 2
	// CHECK: [[TMP4:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX4]], align 8			// CHECK: [[TMP4:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX4]], align 8
	// CHECK: call void @llvm.aarch64.neon.st3lane.v8i8.p0i8(<8 x i8> [[TMP2]], <8 x i8> [[TMP3]], <8 x i8> [[TMP4]], i64 7, i8* %a)			// CHECK: call void @llvm.aarch64.neon.st3lane.v8i8.p0i8(<8 x i8> [[TMP2]], <8 x i8> [[TMP3]], <8 x i8> [[TMP4]], i64 7, i8* %a)
	// CHECK: ret void			// CHECK: ret void
	void test_vst3_lane_s8(int8_t *a, int8x8x3_t b) {			void test_vst3_lane_s8(int8_t *a, int8x8x3_t b) {
	vst3_lane_s8(a, b, 7);			vst3_lane_s8(a, b, 7);
	}			}

	// CHECK-LABEL: define void @test_vst3_lane_s16(i16* %a, [3 x <4 x i16>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst3_lane_s16(i16* %a, [3 x <4 x i16>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.int16x4x3_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.int16x4x3_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.int16x4x3_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.int16x4x3_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int16x4x3_t, %struct.int16x4x3_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int16x4x3_t, %struct.int16x4x3_t* [[B]], i32 0, i32 0
	// CHECK: store [3 x <4 x i16>] [[B]].coerce, [3 x <4 x i16>]* [[COERCE_DIVE]], align 8			// CHECK: store [3 x <4 x i16>] [[B]].coerce, [3 x <4 x i16>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.int16x4x3_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.int16x4x3_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.int16x4x3_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.int16x4x3_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 24, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 24, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast i16* %a to i8*			// CHECK: [[TMP2:%.*]] = bitcast i16* %a to i8*
	Show All 13 Lines
	// CHECK: [[TMP10:%.*]] = bitcast <8 x i8> [[TMP6]] to <4 x i16>			// CHECK: [[TMP10:%.*]] = bitcast <8 x i8> [[TMP6]] to <4 x i16>
	// CHECK: [[TMP11:%.*]] = bitcast <8 x i8> [[TMP8]] to <4 x i16>			// CHECK: [[TMP11:%.*]] = bitcast <8 x i8> [[TMP8]] to <4 x i16>
	// CHECK: call void @llvm.aarch64.neon.st3lane.v4i16.p0i8(<4 x i16> [[TMP9]], <4 x i16> [[TMP10]], <4 x i16> [[TMP11]], i64 3, i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st3lane.v4i16.p0i8(<4 x i16> [[TMP9]], <4 x i16> [[TMP10]], <4 x i16> [[TMP11]], i64 3, i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst3_lane_s16(int16_t *a, int16x4x3_t b) {			void test_vst3_lane_s16(int16_t *a, int16x4x3_t b) {
	vst3_lane_s16(a, b, 3);			vst3_lane_s16(a, b, 3);
	}			}

	// CHECK-LABEL: define void @test_vst3_lane_s32(i32* %a, [3 x <2 x i32>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst3_lane_s32(i32* %a, [3 x <2 x i32>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.int32x2x3_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.int32x2x3_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.int32x2x3_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.int32x2x3_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int32x2x3_t, %struct.int32x2x3_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int32x2x3_t, %struct.int32x2x3_t* [[B]], i32 0, i32 0
	// CHECK: store [3 x <2 x i32>] [[B]].coerce, [3 x <2 x i32>]* [[COERCE_DIVE]], align 8			// CHECK: store [3 x <2 x i32>] [[B]].coerce, [3 x <2 x i32>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.int32x2x3_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.int32x2x3_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.int32x2x3_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.int32x2x3_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 24, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 24, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast i32* %a to i8*			// CHECK: [[TMP2:%.*]] = bitcast i32* %a to i8*
	Show All 13 Lines
	// CHECK: [[TMP10:%.*]] = bitcast <8 x i8> [[TMP6]] to <2 x i32>			// CHECK: [[TMP10:%.*]] = bitcast <8 x i8> [[TMP6]] to <2 x i32>
	// CHECK: [[TMP11:%.*]] = bitcast <8 x i8> [[TMP8]] to <2 x i32>			// CHECK: [[TMP11:%.*]] = bitcast <8 x i8> [[TMP8]] to <2 x i32>
	// CHECK: call void @llvm.aarch64.neon.st3lane.v2i32.p0i8(<2 x i32> [[TMP9]], <2 x i32> [[TMP10]], <2 x i32> [[TMP11]], i64 1, i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st3lane.v2i32.p0i8(<2 x i32> [[TMP9]], <2 x i32> [[TMP10]], <2 x i32> [[TMP11]], i64 1, i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst3_lane_s32(int32_t *a, int32x2x3_t b) {			void test_vst3_lane_s32(int32_t *a, int32x2x3_t b) {
	vst3_lane_s32(a, b, 1);			vst3_lane_s32(a, b, 1);
	}			}

	// CHECK-LABEL: define void @test_vst3_lane_s64(i64* %a, [3 x <1 x i64>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst3_lane_s64(i64* %a, [3 x <1 x i64>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.int64x1x3_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.int64x1x3_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.int64x1x3_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.int64x1x3_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int64x1x3_t, %struct.int64x1x3_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int64x1x3_t, %struct.int64x1x3_t* [[B]], i32 0, i32 0
	// CHECK: store [3 x <1 x i64>] [[B]].coerce, [3 x <1 x i64>]* [[COERCE_DIVE]], align 8			// CHECK: store [3 x <1 x i64>] [[B]].coerce, [3 x <1 x i64>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.int64x1x3_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.int64x1x3_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.int64x1x3_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.int64x1x3_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 24, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 24, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast i64* %a to i8*			// CHECK: [[TMP2:%.*]] = bitcast i64* %a to i8*
	Show All 13 Lines
	// CHECK: [[TMP10:%.*]] = bitcast <8 x i8> [[TMP6]] to <1 x i64>			// CHECK: [[TMP10:%.*]] = bitcast <8 x i8> [[TMP6]] to <1 x i64>
	// CHECK: [[TMP11:%.*]] = bitcast <8 x i8> [[TMP8]] to <1 x i64>			// CHECK: [[TMP11:%.*]] = bitcast <8 x i8> [[TMP8]] to <1 x i64>
	// CHECK: call void @llvm.aarch64.neon.st3lane.v1i64.p0i8(<1 x i64> [[TMP9]], <1 x i64> [[TMP10]], <1 x i64> [[TMP11]], i64 0, i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st3lane.v1i64.p0i8(<1 x i64> [[TMP9]], <1 x i64> [[TMP10]], <1 x i64> [[TMP11]], i64 0, i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst3_lane_s64(int64_t *a, int64x1x3_t b) {			void test_vst3_lane_s64(int64_t *a, int64x1x3_t b) {
	vst3_lane_s64(a, b, 0);			vst3_lane_s64(a, b, 0);
	}			}

	// CHECK-LABEL: define void @test_vst3_lane_f16(half* %a, [3 x <4 x half>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst3_lane_f16(half* %a, [3 x <4 x half>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.float16x4x3_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.float16x4x3_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.float16x4x3_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.float16x4x3_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float16x4x3_t, %struct.float16x4x3_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float16x4x3_t, %struct.float16x4x3_t* [[B]], i32 0, i32 0
	// CHECK: store [3 x <4 x half>] [[B]].coerce, [3 x <4 x half>]* [[COERCE_DIVE]], align 8			// CHECK: store [3 x <4 x half>] [[B]].coerce, [3 x <4 x half>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x4x3_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x4x3_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.float16x4x3_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.float16x4x3_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 24, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 24, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast half* %a to i8*			// CHECK: [[TMP2:%.*]] = bitcast half* %a to i8*
	Show All 13 Lines
	// CHECK: [[TMP10:%.*]] = bitcast <8 x i8> [[TMP6]] to <4 x half>			// CHECK: [[TMP10:%.*]] = bitcast <8 x i8> [[TMP6]] to <4 x half>
	// CHECK: [[TMP11:%.*]] = bitcast <8 x i8> [[TMP8]] to <4 x half>			// CHECK: [[TMP11:%.*]] = bitcast <8 x i8> [[TMP8]] to <4 x half>
	// CHECK: call void @llvm.aarch64.neon.st3lane.v4f16.p0i8(<4 x half> [[TMP9]], <4 x half> [[TMP10]], <4 x half> [[TMP11]], i64 3, i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st3lane.v4f16.p0i8(<4 x half> [[TMP9]], <4 x half> [[TMP10]], <4 x half> [[TMP11]], i64 3, i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst3_lane_f16(float16_t *a, float16x4x3_t b) {			void test_vst3_lane_f16(float16_t *a, float16x4x3_t b) {
	vst3_lane_f16(a, b, 3);			vst3_lane_f16(a, b, 3);
	}			}

	// CHECK-LABEL: define void @test_vst3_lane_f32(float* %a, [3 x <2 x float>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst3_lane_f32(float* %a, [3 x <2 x float>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.float32x2x3_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.float32x2x3_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.float32x2x3_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.float32x2x3_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float32x2x3_t, %struct.float32x2x3_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float32x2x3_t, %struct.float32x2x3_t* [[B]], i32 0, i32 0
	// CHECK: store [3 x <2 x float>] [[B]].coerce, [3 x <2 x float>]* [[COERCE_DIVE]], align 8			// CHECK: store [3 x <2 x float>] [[B]].coerce, [3 x <2 x float>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.float32x2x3_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.float32x2x3_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.float32x2x3_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.float32x2x3_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 24, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 24, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast float* %a to i8*			// CHECK: [[TMP2:%.*]] = bitcast float* %a to i8*
	Show All 13 Lines
	// CHECK: [[TMP10:%.*]] = bitcast <8 x i8> [[TMP6]] to <2 x float>			// CHECK: [[TMP10:%.*]] = bitcast <8 x i8> [[TMP6]] to <2 x float>
	// CHECK: [[TMP11:%.*]] = bitcast <8 x i8> [[TMP8]] to <2 x float>			// CHECK: [[TMP11:%.*]] = bitcast <8 x i8> [[TMP8]] to <2 x float>
	// CHECK: call void @llvm.aarch64.neon.st3lane.v2f32.p0i8(<2 x float> [[TMP9]], <2 x float> [[TMP10]], <2 x float> [[TMP11]], i64 1, i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st3lane.v2f32.p0i8(<2 x float> [[TMP9]], <2 x float> [[TMP10]], <2 x float> [[TMP11]], i64 1, i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst3_lane_f32(float32_t *a, float32x2x3_t b) {			void test_vst3_lane_f32(float32_t *a, float32x2x3_t b) {
	vst3_lane_f32(a, b, 1);			vst3_lane_f32(a, b, 1);
	}			}

	// CHECK-LABEL: define void @test_vst3_lane_f64(double* %a, [3 x <1 x double>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst3_lane_f64(double* %a, [3 x <1 x double>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.float64x1x3_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.float64x1x3_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.float64x1x3_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.float64x1x3_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float64x1x3_t, %struct.float64x1x3_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float64x1x3_t, %struct.float64x1x3_t* [[B]], i32 0, i32 0
	// CHECK: store [3 x <1 x double>] [[B]].coerce, [3 x <1 x double>]* [[COERCE_DIVE]], align 8			// CHECK: store [3 x <1 x double>] [[B]].coerce, [3 x <1 x double>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.float64x1x3_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.float64x1x3_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.float64x1x3_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.float64x1x3_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 24, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 24, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast double* %a to i8*			// CHECK: [[TMP2:%.*]] = bitcast double* %a to i8*
	(13 unchanged lines not shown)
	// CHECK: [[TMP10:%.*]] = bitcast <8 x i8> [[TMP6]] to <1 x double>			// CHECK: [[TMP10:%.*]] = bitcast <8 x i8> [[TMP6]] to <1 x double>
	// CHECK: [[TMP11:%.*]] = bitcast <8 x i8> [[TMP8]] to <1 x double>			// CHECK: [[TMP11:%.*]] = bitcast <8 x i8> [[TMP8]] to <1 x double>
	// CHECK: call void @llvm.aarch64.neon.st3lane.v1f64.p0i8(<1 x double> [[TMP9]], <1 x double> [[TMP10]], <1 x double> [[TMP11]], i64 0, i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st3lane.v1f64.p0i8(<1 x double> [[TMP9]], <1 x double> [[TMP10]], <1 x double> [[TMP11]], i64 0, i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst3_lane_f64(float64_t *a, float64x1x3_t b) {			void test_vst3_lane_f64(float64_t *a, float64x1x3_t b) {
	vst3_lane_f64(a, b, 0);			vst3_lane_f64(a, b, 0);
	}			}

	// CHECK-LABEL: define void @test_vst3_lane_p8(i8* %a, [3 x <8 x i8>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst3_lane_p8(i8* %a, [3 x <8 x i8>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.poly8x8x3_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.poly8x8x3_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.poly8x8x3_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.poly8x8x3_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly8x8x3_t, %struct.poly8x8x3_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly8x8x3_t, %struct.poly8x8x3_t* [[B]], i32 0, i32 0
	// CHECK: store [3 x <8 x i8>] [[B]].coerce, [3 x <8 x i8>]* [[COERCE_DIVE]], align 8			// CHECK: store [3 x <8 x i8>] [[B]].coerce, [3 x <8 x i8>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.poly8x8x3_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.poly8x8x3_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.poly8x8x3_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.poly8x8x3_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 24, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 24, i1 false)
	// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.poly8x8x3_t, %struct.poly8x8x3_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.poly8x8x3_t, %struct.poly8x8x3_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [3 x <8 x i8>], [3 x <8 x i8>]* [[VAL]], i64 0, i64 0			// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [3 x <8 x i8>], [3 x <8 x i8>]* [[VAL]], i64 0, i64 0
	// CHECK: [[TMP2:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX]], align 8			// CHECK: [[TMP2:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX]], align 8
	// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.poly8x8x3_t, %struct.poly8x8x3_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.poly8x8x3_t, %struct.poly8x8x3_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [3 x <8 x i8>], [3 x <8 x i8>]* [[VAL1]], i64 0, i64 1			// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [3 x <8 x i8>], [3 x <8 x i8>]* [[VAL1]], i64 0, i64 1
	// CHECK: [[TMP3:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX2]], align 8			// CHECK: [[TMP3:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX2]], align 8
	// CHECK: [[VAL3:%.*]] = getelementptr inbounds %struct.poly8x8x3_t, %struct.poly8x8x3_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL3:%.*]] = getelementptr inbounds %struct.poly8x8x3_t, %struct.poly8x8x3_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX4:%.*]] = getelementptr inbounds [3 x <8 x i8>], [3 x <8 x i8>]* [[VAL3]], i64 0, i64 2			// CHECK: [[ARRAYIDX4:%.*]] = getelementptr inbounds [3 x <8 x i8>], [3 x <8 x i8>]* [[VAL3]], i64 0, i64 2
	// CHECK: [[TMP4:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX4]], align 8			// CHECK: [[TMP4:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX4]], align 8
	// CHECK: call void @llvm.aarch64.neon.st3lane.v8i8.p0i8(<8 x i8> [[TMP2]], <8 x i8> [[TMP3]], <8 x i8> [[TMP4]], i64 7, i8* %a)			// CHECK: call void @llvm.aarch64.neon.st3lane.v8i8.p0i8(<8 x i8> [[TMP2]], <8 x i8> [[TMP3]], <8 x i8> [[TMP4]], i64 7, i8* %a)
	// CHECK: ret void			// CHECK: ret void
	void test_vst3_lane_p8(poly8_t *a, poly8x8x3_t b) {			void test_vst3_lane_p8(poly8_t *a, poly8x8x3_t b) {
	vst3_lane_p8(a, b, 7);			vst3_lane_p8(a, b, 7);
	}			}

	// CHECK-LABEL: define void @test_vst3_lane_p16(i16* %a, [3 x <4 x i16>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst3_lane_p16(i16* %a, [3 x <4 x i16>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.poly16x4x3_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.poly16x4x3_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.poly16x4x3_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.poly16x4x3_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly16x4x3_t, %struct.poly16x4x3_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly16x4x3_t, %struct.poly16x4x3_t* [[B]], i32 0, i32 0
	// CHECK: store [3 x <4 x i16>] [[B]].coerce, [3 x <4 x i16>]* [[COERCE_DIVE]], align 8			// CHECK: store [3 x <4 x i16>] [[B]].coerce, [3 x <4 x i16>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.poly16x4x3_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.poly16x4x3_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.poly16x4x3_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.poly16x4x3_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 24, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 24, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast i16* %a to i8*			// CHECK: [[TMP2:%.*]] = bitcast i16* %a to i8*
	(13 unchanged lines not shown)
	// CHECK: [[TMP10:%.*]] = bitcast <8 x i8> [[TMP6]] to <4 x i16>			// CHECK: [[TMP10:%.*]] = bitcast <8 x i8> [[TMP6]] to <4 x i16>
	// CHECK: [[TMP11:%.*]] = bitcast <8 x i8> [[TMP8]] to <4 x i16>			// CHECK: [[TMP11:%.*]] = bitcast <8 x i8> [[TMP8]] to <4 x i16>
	// CHECK: call void @llvm.aarch64.neon.st3lane.v4i16.p0i8(<4 x i16> [[TMP9]], <4 x i16> [[TMP10]], <4 x i16> [[TMP11]], i64 3, i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st3lane.v4i16.p0i8(<4 x i16> [[TMP9]], <4 x i16> [[TMP10]], <4 x i16> [[TMP11]], i64 3, i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst3_lane_p16(poly16_t *a, poly16x4x3_t b) {			void test_vst3_lane_p16(poly16_t *a, poly16x4x3_t b) {
	vst3_lane_p16(a, b, 3);			vst3_lane_p16(a, b, 3);
	}			}

	// CHECK-LABEL: define void @test_vst3_lane_p64(i64* %a, [3 x <1 x i64>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst3_lane_p64(i64* %a, [3 x <1 x i64>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.poly64x1x3_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.poly64x1x3_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.poly64x1x3_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.poly64x1x3_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly64x1x3_t, %struct.poly64x1x3_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly64x1x3_t, %struct.poly64x1x3_t* [[B]], i32 0, i32 0
	// CHECK: store [3 x <1 x i64>] [[B]].coerce, [3 x <1 x i64>]* [[COERCE_DIVE]], align 8			// CHECK: store [3 x <1 x i64>] [[B]].coerce, [3 x <1 x i64>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.poly64x1x3_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.poly64x1x3_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.poly64x1x3_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.poly64x1x3_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 24, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 24, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast i64* %a to i8*			// CHECK: [[TMP2:%.*]] = bitcast i64* %a to i8*
	(13 unchanged lines not shown)
	// CHECK: [[TMP10:%.*]] = bitcast <8 x i8> [[TMP6]] to <1 x i64>			// CHECK: [[TMP10:%.*]] = bitcast <8 x i8> [[TMP6]] to <1 x i64>
	// CHECK: [[TMP11:%.*]] = bitcast <8 x i8> [[TMP8]] to <1 x i64>			// CHECK: [[TMP11:%.*]] = bitcast <8 x i8> [[TMP8]] to <1 x i64>
	// CHECK: call void @llvm.aarch64.neon.st3lane.v1i64.p0i8(<1 x i64> [[TMP9]], <1 x i64> [[TMP10]], <1 x i64> [[TMP11]], i64 0, i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st3lane.v1i64.p0i8(<1 x i64> [[TMP9]], <1 x i64> [[TMP10]], <1 x i64> [[TMP11]], i64 0, i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst3_lane_p64(poly64_t *a, poly64x1x3_t b) {			void test_vst3_lane_p64(poly64_t *a, poly64x1x3_t b) {
	vst3_lane_p64(a, b, 0);			vst3_lane_p64(a, b, 0);
	}			}

	// CHECK-LABEL: define void @test_vst4q_lane_u8(i8* %a, [4 x <16 x i8>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst4q_lane_u8(i8* %a, [4 x <16 x i8>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.uint8x16x4_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.uint8x16x4_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.uint8x16x4_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.uint8x16x4_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint8x16x4_t, %struct.uint8x16x4_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint8x16x4_t, %struct.uint8x16x4_t* [[B]], i32 0, i32 0
	// CHECK: store [4 x <16 x i8>] [[B]].coerce, [4 x <16 x i8>]* [[COERCE_DIVE]], align 16			// CHECK: store [4 x <16 x i8>] [[B]].coerce, [4 x <16 x i8>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.uint8x16x4_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.uint8x16x4_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.uint8x16x4_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.uint8x16x4_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 64, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 64, i1 false)
	// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.uint8x16x4_t, %struct.uint8x16x4_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.uint8x16x4_t, %struct.uint8x16x4_t* [[__S1]], i32 0, i32 0
	(9 unchanged lines not shown)
	// CHECK: [[ARRAYIDX6:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL5]], i64 0, i64 3			// CHECK: [[ARRAYIDX6:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL5]], i64 0, i64 3
	// CHECK: [[TMP5:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX6]], align 16			// CHECK: [[TMP5:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX6]], align 16
	// CHECK: call void @llvm.aarch64.neon.st4lane.v16i8.p0i8(<16 x i8> [[TMP2]], <16 x i8> [[TMP3]], <16 x i8> [[TMP4]], <16 x i8> [[TMP5]], i64 15, i8* %a)			// CHECK: call void @llvm.aarch64.neon.st4lane.v16i8.p0i8(<16 x i8> [[TMP2]], <16 x i8> [[TMP3]], <16 x i8> [[TMP4]], <16 x i8> [[TMP5]], i64 15, i8* %a)
	// CHECK: ret void			// CHECK: ret void
	void test_vst4q_lane_u8(uint8_t *a, uint8x16x4_t b) {			void test_vst4q_lane_u8(uint8_t *a, uint8x16x4_t b) {
	vst4q_lane_u8(a, b, 15);			vst4q_lane_u8(a, b, 15);
	}			}

	// CHECK-LABEL: define void @test_vst4q_lane_u16(i16* %a, [4 x <8 x i16>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst4q_lane_u16(i16* %a, [4 x <8 x i16>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.uint16x8x4_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.uint16x8x4_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.uint16x8x4_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.uint16x8x4_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint16x8x4_t, %struct.uint16x8x4_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint16x8x4_t, %struct.uint16x8x4_t* [[B]], i32 0, i32 0
	// CHECK: store [4 x <8 x i16>] [[B]].coerce, [4 x <8 x i16>]* [[COERCE_DIVE]], align 16			// CHECK: store [4 x <8 x i16>] [[B]].coerce, [4 x <8 x i16>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.uint16x8x4_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.uint16x8x4_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.uint16x8x4_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.uint16x8x4_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 64, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 64, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast i16* %a to i8*			// CHECK: [[TMP2:%.*]] = bitcast i16* %a to i8*
	(18 unchanged lines not shown)
	// CHECK: [[TMP13:%.*]] = bitcast <16 x i8> [[TMP8]] to <8 x i16>			// CHECK: [[TMP13:%.*]] = bitcast <16 x i8> [[TMP8]] to <8 x i16>
	// CHECK: [[TMP14:%.*]] = bitcast <16 x i8> [[TMP10]] to <8 x i16>			// CHECK: [[TMP14:%.*]] = bitcast <16 x i8> [[TMP10]] to <8 x i16>
	// CHECK: call void @llvm.aarch64.neon.st4lane.v8i16.p0i8(<8 x i16> [[TMP11]], <8 x i16> [[TMP12]], <8 x i16> [[TMP13]], <8 x i16> [[TMP14]], i64 7, i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st4lane.v8i16.p0i8(<8 x i16> [[TMP11]], <8 x i16> [[TMP12]], <8 x i16> [[TMP13]], <8 x i16> [[TMP14]], i64 7, i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst4q_lane_u16(uint16_t *a, uint16x8x4_t b) {			void test_vst4q_lane_u16(uint16_t *a, uint16x8x4_t b) {
	vst4q_lane_u16(a, b, 7);			vst4q_lane_u16(a, b, 7);
	}			}

	// CHECK-LABEL: define void @test_vst4q_lane_u32(i32* %a, [4 x <4 x i32>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst4q_lane_u32(i32* %a, [4 x <4 x i32>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.uint32x4x4_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.uint32x4x4_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.uint32x4x4_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.uint32x4x4_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint32x4x4_t, %struct.uint32x4x4_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint32x4x4_t, %struct.uint32x4x4_t* [[B]], i32 0, i32 0
	// CHECK: store [4 x <4 x i32>] [[B]].coerce, [4 x <4 x i32>]* [[COERCE_DIVE]], align 16			// CHECK: store [4 x <4 x i32>] [[B]].coerce, [4 x <4 x i32>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.uint32x4x4_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.uint32x4x4_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.uint32x4x4_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.uint32x4x4_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 64, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 64, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast i32* %a to i8*			// CHECK: [[TMP2:%.*]] = bitcast i32* %a to i8*
	(18 unchanged lines not shown)
	// CHECK: [[TMP13:%.*]] = bitcast <16 x i8> [[TMP8]] to <4 x i32>			// CHECK: [[TMP13:%.*]] = bitcast <16 x i8> [[TMP8]] to <4 x i32>
	// CHECK: [[TMP14:%.*]] = bitcast <16 x i8> [[TMP10]] to <4 x i32>			// CHECK: [[TMP14:%.*]] = bitcast <16 x i8> [[TMP10]] to <4 x i32>
	// CHECK: call void @llvm.aarch64.neon.st4lane.v4i32.p0i8(<4 x i32> [[TMP11]], <4 x i32> [[TMP12]], <4 x i32> [[TMP13]], <4 x i32> [[TMP14]], i64 3, i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st4lane.v4i32.p0i8(<4 x i32> [[TMP11]], <4 x i32> [[TMP12]], <4 x i32> [[TMP13]], <4 x i32> [[TMP14]], i64 3, i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst4q_lane_u32(uint32_t *a, uint32x4x4_t b) {			void test_vst4q_lane_u32(uint32_t *a, uint32x4x4_t b) {
	vst4q_lane_u32(a, b, 3);			vst4q_lane_u32(a, b, 3);
	}			}

	// CHECK-LABEL: define void @test_vst4q_lane_u64(i64* %a, [4 x <2 x i64>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst4q_lane_u64(i64* %a, [4 x <2 x i64>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.uint64x2x4_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.uint64x2x4_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.uint64x2x4_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.uint64x2x4_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint64x2x4_t, %struct.uint64x2x4_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint64x2x4_t, %struct.uint64x2x4_t* [[B]], i32 0, i32 0
	// CHECK: store [4 x <2 x i64>] [[B]].coerce, [4 x <2 x i64>]* [[COERCE_DIVE]], align 16			// CHECK: store [4 x <2 x i64>] [[B]].coerce, [4 x <2 x i64>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.uint64x2x4_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.uint64x2x4_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.uint64x2x4_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.uint64x2x4_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 64, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 64, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast i64* %a to i8*			// CHECK: [[TMP2:%.*]] = bitcast i64* %a to i8*
	(18 unchanged lines not shown)
	// CHECK: [[TMP13:%.*]] = bitcast <16 x i8> [[TMP8]] to <2 x i64>			// CHECK: [[TMP13:%.*]] = bitcast <16 x i8> [[TMP8]] to <2 x i64>
	// CHECK: [[TMP14:%.*]] = bitcast <16 x i8> [[TMP10]] to <2 x i64>			// CHECK: [[TMP14:%.*]] = bitcast <16 x i8> [[TMP10]] to <2 x i64>
	// CHECK: call void @llvm.aarch64.neon.st4lane.v2i64.p0i8(<2 x i64> [[TMP11]], <2 x i64> [[TMP12]], <2 x i64> [[TMP13]], <2 x i64> [[TMP14]], i64 1, i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st4lane.v2i64.p0i8(<2 x i64> [[TMP11]], <2 x i64> [[TMP12]], <2 x i64> [[TMP13]], <2 x i64> [[TMP14]], i64 1, i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst4q_lane_u64(uint64_t *a, uint64x2x4_t b) {			void test_vst4q_lane_u64(uint64_t *a, uint64x2x4_t b) {
	vst4q_lane_u64(a, b, 1);			vst4q_lane_u64(a, b, 1);
	}			}

	// CHECK-LABEL: define void @test_vst4q_lane_s8(i8* %a, [4 x <16 x i8>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst4q_lane_s8(i8* %a, [4 x <16 x i8>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.int8x16x4_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.int8x16x4_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.int8x16x4_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.int8x16x4_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x16x4_t, %struct.int8x16x4_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x16x4_t, %struct.int8x16x4_t* [[B]], i32 0, i32 0
	// CHECK: store [4 x <16 x i8>] [[B]].coerce, [4 x <16 x i8>]* [[COERCE_DIVE]], align 16			// CHECK: store [4 x <16 x i8>] [[B]].coerce, [4 x <16 x i8>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.int8x16x4_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.int8x16x4_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.int8x16x4_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.int8x16x4_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 64, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 64, i1 false)
	// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.int8x16x4_t, %struct.int8x16x4_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.int8x16x4_t, %struct.int8x16x4_t* [[__S1]], i32 0, i32 0
	(9 unchanged lines not shown)
	// CHECK: [[ARRAYIDX6:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL5]], i64 0, i64 3			// CHECK: [[ARRAYIDX6:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL5]], i64 0, i64 3
	// CHECK: [[TMP5:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX6]], align 16			// CHECK: [[TMP5:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX6]], align 16
	// CHECK: call void @llvm.aarch64.neon.st4lane.v16i8.p0i8(<16 x i8> [[TMP2]], <16 x i8> [[TMP3]], <16 x i8> [[TMP4]], <16 x i8> [[TMP5]], i64 15, i8* %a)			// CHECK: call void @llvm.aarch64.neon.st4lane.v16i8.p0i8(<16 x i8> [[TMP2]], <16 x i8> [[TMP3]], <16 x i8> [[TMP4]], <16 x i8> [[TMP5]], i64 15, i8* %a)
	// CHECK: ret void			// CHECK: ret void
	void test_vst4q_lane_s8(int8_t *a, int8x16x4_t b) {			void test_vst4q_lane_s8(int8_t *a, int8x16x4_t b) {
	vst4q_lane_s8(a, b, 15);			vst4q_lane_s8(a, b, 15);
	}			}

	// CHECK-LABEL: define void @test_vst4q_lane_s16(i16* %a, [4 x <8 x i16>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst4q_lane_s16(i16* %a, [4 x <8 x i16>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.int16x8x4_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.int16x8x4_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.int16x8x4_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.int16x8x4_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int16x8x4_t, %struct.int16x8x4_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int16x8x4_t, %struct.int16x8x4_t* [[B]], i32 0, i32 0
	// CHECK: store [4 x <8 x i16>] [[B]].coerce, [4 x <8 x i16>]* [[COERCE_DIVE]], align 16			// CHECK: store [4 x <8 x i16>] [[B]].coerce, [4 x <8 x i16>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.int16x8x4_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.int16x8x4_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.int16x8x4_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.int16x8x4_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 64, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 64, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast i16* %a to i8*			// CHECK: [[TMP2:%.*]] = bitcast i16* %a to i8*
	(18 unchanged lines not shown)
	// CHECK: [[TMP13:%.*]] = bitcast <16 x i8> [[TMP8]] to <8 x i16>			// CHECK: [[TMP13:%.*]] = bitcast <16 x i8> [[TMP8]] to <8 x i16>
	// CHECK: [[TMP14:%.*]] = bitcast <16 x i8> [[TMP10]] to <8 x i16>			// CHECK: [[TMP14:%.*]] = bitcast <16 x i8> [[TMP10]] to <8 x i16>
	// CHECK: call void @llvm.aarch64.neon.st4lane.v8i16.p0i8(<8 x i16> [[TMP11]], <8 x i16> [[TMP12]], <8 x i16> [[TMP13]], <8 x i16> [[TMP14]], i64 7, i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st4lane.v8i16.p0i8(<8 x i16> [[TMP11]], <8 x i16> [[TMP12]], <8 x i16> [[TMP13]], <8 x i16> [[TMP14]], i64 7, i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst4q_lane_s16(int16_t *a, int16x8x4_t b) {			void test_vst4q_lane_s16(int16_t *a, int16x8x4_t b) {
	vst4q_lane_s16(a, b, 7);			vst4q_lane_s16(a, b, 7);
	}			}

	// CHECK-LABEL: define void @test_vst4q_lane_s32(i32* %a, [4 x <4 x i32>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst4q_lane_s32(i32* %a, [4 x <4 x i32>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.int32x4x4_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.int32x4x4_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.int32x4x4_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.int32x4x4_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int32x4x4_t, %struct.int32x4x4_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int32x4x4_t, %struct.int32x4x4_t* [[B]], i32 0, i32 0
	// CHECK: store [4 x <4 x i32>] [[B]].coerce, [4 x <4 x i32>]* [[COERCE_DIVE]], align 16			// CHECK: store [4 x <4 x i32>] [[B]].coerce, [4 x <4 x i32>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.int32x4x4_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.int32x4x4_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.int32x4x4_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.int32x4x4_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 64, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 64, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast i32* %a to i8*			// CHECK: [[TMP2:%.*]] = bitcast i32* %a to i8*
	(18 unchanged lines not shown)
	// CHECK: [[TMP13:%.*]] = bitcast <16 x i8> [[TMP8]] to <4 x i32>			// CHECK: [[TMP13:%.*]] = bitcast <16 x i8> [[TMP8]] to <4 x i32>
	// CHECK: [[TMP14:%.*]] = bitcast <16 x i8> [[TMP10]] to <4 x i32>			// CHECK: [[TMP14:%.*]] = bitcast <16 x i8> [[TMP10]] to <4 x i32>
	// CHECK: call void @llvm.aarch64.neon.st4lane.v4i32.p0i8(<4 x i32> [[TMP11]], <4 x i32> [[TMP12]], <4 x i32> [[TMP13]], <4 x i32> [[TMP14]], i64 3, i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st4lane.v4i32.p0i8(<4 x i32> [[TMP11]], <4 x i32> [[TMP12]], <4 x i32> [[TMP13]], <4 x i32> [[TMP14]], i64 3, i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst4q_lane_s32(int32_t *a, int32x4x4_t b) {			void test_vst4q_lane_s32(int32_t *a, int32x4x4_t b) {
	vst4q_lane_s32(a, b, 3);			vst4q_lane_s32(a, b, 3);
	}			}

	// CHECK-LABEL: define void @test_vst4q_lane_s64(i64* %a, [4 x <2 x i64>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst4q_lane_s64(i64* %a, [4 x <2 x i64>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.int64x2x4_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.int64x2x4_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.int64x2x4_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.int64x2x4_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int64x2x4_t, %struct.int64x2x4_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int64x2x4_t, %struct.int64x2x4_t* [[B]], i32 0, i32 0
	// CHECK: store [4 x <2 x i64>] [[B]].coerce, [4 x <2 x i64>]* [[COERCE_DIVE]], align 16			// CHECK: store [4 x <2 x i64>] [[B]].coerce, [4 x <2 x i64>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.int64x2x4_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.int64x2x4_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.int64x2x4_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.int64x2x4_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 64, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 64, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast i64* %a to i8*			// CHECK: [[TMP2:%.*]] = bitcast i64* %a to i8*
	(18 unchanged lines not shown)
	// CHECK: [[TMP13:%.*]] = bitcast <16 x i8> [[TMP8]] to <2 x i64>			// CHECK: [[TMP13:%.*]] = bitcast <16 x i8> [[TMP8]] to <2 x i64>
	// CHECK: [[TMP14:%.*]] = bitcast <16 x i8> [[TMP10]] to <2 x i64>			// CHECK: [[TMP14:%.*]] = bitcast <16 x i8> [[TMP10]] to <2 x i64>
	// CHECK: call void @llvm.aarch64.neon.st4lane.v2i64.p0i8(<2 x i64> [[TMP11]], <2 x i64> [[TMP12]], <2 x i64> [[TMP13]], <2 x i64> [[TMP14]], i64 1, i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st4lane.v2i64.p0i8(<2 x i64> [[TMP11]], <2 x i64> [[TMP12]], <2 x i64> [[TMP13]], <2 x i64> [[TMP14]], i64 1, i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst4q_lane_s64(int64_t *a, int64x2x4_t b) {			void test_vst4q_lane_s64(int64_t *a, int64x2x4_t b) {
	vst4q_lane_s64(a, b, 1);			vst4q_lane_s64(a, b, 1);
	}			}

	// CHECK-LABEL: define void @test_vst4q_lane_f16(half* %a, [4 x <8 x half>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst4q_lane_f16(half* %a, [4 x <8 x half>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.float16x8x4_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.float16x8x4_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.float16x8x4_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.float16x8x4_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float16x8x4_t, %struct.float16x8x4_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float16x8x4_t, %struct.float16x8x4_t* [[B]], i32 0, i32 0
	// CHECK: store [4 x <8 x half>] [[B]].coerce, [4 x <8 x half>]* [[COERCE_DIVE]], align 16			// CHECK: store [4 x <8 x half>] [[B]].coerce, [4 x <8 x half>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x8x4_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x8x4_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.float16x8x4_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.float16x8x4_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 64, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 64, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast half* %a to i8*			// CHECK: [[TMP2:%.*]] = bitcast half* %a to i8*
	(18 unchanged lines not shown)
	// CHECK: [[TMP13:%.*]] = bitcast <16 x i8> [[TMP8]] to <8 x half>			// CHECK: [[TMP13:%.*]] = bitcast <16 x i8> [[TMP8]] to <8 x half>
	// CHECK: [[TMP14:%.*]] = bitcast <16 x i8> [[TMP10]] to <8 x half>			// CHECK: [[TMP14:%.*]] = bitcast <16 x i8> [[TMP10]] to <8 x half>
	// CHECK: call void @llvm.aarch64.neon.st4lane.v8f16.p0i8(<8 x half> [[TMP11]], <8 x half> [[TMP12]], <8 x half> [[TMP13]], <8 x half> [[TMP14]], i64 7, i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st4lane.v8f16.p0i8(<8 x half> [[TMP11]], <8 x half> [[TMP12]], <8 x half> [[TMP13]], <8 x half> [[TMP14]], i64 7, i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst4q_lane_f16(float16_t *a, float16x8x4_t b) {			void test_vst4q_lane_f16(float16_t *a, float16x8x4_t b) {
	vst4q_lane_f16(a, b, 7);			vst4q_lane_f16(a, b, 7);
	}			}

	// CHECK-LABEL: define void @test_vst4q_lane_f32(float* %a, [4 x <4 x float>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst4q_lane_f32(float* %a, [4 x <4 x float>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.float32x4x4_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.float32x4x4_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.float32x4x4_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.float32x4x4_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float32x4x4_t, %struct.float32x4x4_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float32x4x4_t, %struct.float32x4x4_t* [[B]], i32 0, i32 0
	// CHECK: store [4 x <4 x float>] [[B]].coerce, [4 x <4 x float>]* [[COERCE_DIVE]], align 16			// CHECK: store [4 x <4 x float>] [[B]].coerce, [4 x <4 x float>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.float32x4x4_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.float32x4x4_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.float32x4x4_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.float32x4x4_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 64, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 64, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast float* %a to i8*			// CHECK: [[TMP2:%.*]] = bitcast float* %a to i8*
	Show All 18 Lines
	// CHECK: [[TMP13:%.*]] = bitcast <16 x i8> [[TMP8]] to <4 x float>			// CHECK: [[TMP13:%.*]] = bitcast <16 x i8> [[TMP8]] to <4 x float>
	// CHECK: [[TMP14:%.*]] = bitcast <16 x i8> [[TMP10]] to <4 x float>			// CHECK: [[TMP14:%.*]] = bitcast <16 x i8> [[TMP10]] to <4 x float>
	// CHECK: call void @llvm.aarch64.neon.st4lane.v4f32.p0i8(<4 x float> [[TMP11]], <4 x float> [[TMP12]], <4 x float> [[TMP13]], <4 x float> [[TMP14]], i64 3, i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st4lane.v4f32.p0i8(<4 x float> [[TMP11]], <4 x float> [[TMP12]], <4 x float> [[TMP13]], <4 x float> [[TMP14]], i64 3, i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst4q_lane_f32(float32_t *a, float32x4x4_t b) {			void test_vst4q_lane_f32(float32_t *a, float32x4x4_t b) {
	vst4q_lane_f32(a, b, 3);			vst4q_lane_f32(a, b, 3);
	}			}

	// CHECK-LABEL: define void @test_vst4q_lane_f64(double* %a, [4 x <2 x double>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst4q_lane_f64(double* %a, [4 x <2 x double>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.float64x2x4_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.float64x2x4_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.float64x2x4_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.float64x2x4_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float64x2x4_t, %struct.float64x2x4_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float64x2x4_t, %struct.float64x2x4_t* [[B]], i32 0, i32 0
	// CHECK: store [4 x <2 x double>] [[B]].coerce, [4 x <2 x double>]* [[COERCE_DIVE]], align 16			// CHECK: store [4 x <2 x double>] [[B]].coerce, [4 x <2 x double>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.float64x2x4_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.float64x2x4_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.float64x2x4_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.float64x2x4_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 64, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 64, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast double* %a to i8*			// CHECK: [[TMP2:%.*]] = bitcast double* %a to i8*
	Show All 18 Lines
	// CHECK: [[TMP13:%.*]] = bitcast <16 x i8> [[TMP8]] to <2 x double>			// CHECK: [[TMP13:%.*]] = bitcast <16 x i8> [[TMP8]] to <2 x double>
	// CHECK: [[TMP14:%.*]] = bitcast <16 x i8> [[TMP10]] to <2 x double>			// CHECK: [[TMP14:%.*]] = bitcast <16 x i8> [[TMP10]] to <2 x double>
	// CHECK: call void @llvm.aarch64.neon.st4lane.v2f64.p0i8(<2 x double> [[TMP11]], <2 x double> [[TMP12]], <2 x double> [[TMP13]], <2 x double> [[TMP14]], i64 1, i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st4lane.v2f64.p0i8(<2 x double> [[TMP11]], <2 x double> [[TMP12]], <2 x double> [[TMP13]], <2 x double> [[TMP14]], i64 1, i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst4q_lane_f64(float64_t *a, float64x2x4_t b) {			void test_vst4q_lane_f64(float64_t *a, float64x2x4_t b) {
	vst4q_lane_f64(a, b, 1);			vst4q_lane_f64(a, b, 1);
	}			}

	// CHECK-LABEL: define void @test_vst4q_lane_p8(i8* %a, [4 x <16 x i8>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst4q_lane_p8(i8* %a, [4 x <16 x i8>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.poly8x16x4_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.poly8x16x4_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.poly8x16x4_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.poly8x16x4_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly8x16x4_t, %struct.poly8x16x4_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly8x16x4_t, %struct.poly8x16x4_t* [[B]], i32 0, i32 0
	// CHECK: store [4 x <16 x i8>] [[B]].coerce, [4 x <16 x i8>]* [[COERCE_DIVE]], align 16			// CHECK: store [4 x <16 x i8>] [[B]].coerce, [4 x <16 x i8>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.poly8x16x4_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.poly8x16x4_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.poly8x16x4_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.poly8x16x4_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 64, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 64, i1 false)
	// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.poly8x16x4_t, %struct.poly8x16x4_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.poly8x16x4_t, %struct.poly8x16x4_t* [[__S1]], i32 0, i32 0
	Show All 9 Lines
	// CHECK: [[ARRAYIDX6:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL5]], i64 0, i64 3			// CHECK: [[ARRAYIDX6:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL5]], i64 0, i64 3
	// CHECK: [[TMP5:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX6]], align 16			// CHECK: [[TMP5:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX6]], align 16
	// CHECK: call void @llvm.aarch64.neon.st4lane.v16i8.p0i8(<16 x i8> [[TMP2]], <16 x i8> [[TMP3]], <16 x i8> [[TMP4]], <16 x i8> [[TMP5]], i64 15, i8* %a)			// CHECK: call void @llvm.aarch64.neon.st4lane.v16i8.p0i8(<16 x i8> [[TMP2]], <16 x i8> [[TMP3]], <16 x i8> [[TMP4]], <16 x i8> [[TMP5]], i64 15, i8* %a)
	// CHECK: ret void			// CHECK: ret void
	void test_vst4q_lane_p8(poly8_t *a, poly8x16x4_t b) {			void test_vst4q_lane_p8(poly8_t *a, poly8x16x4_t b) {
	vst4q_lane_p8(a, b, 15);			vst4q_lane_p8(a, b, 15);
	}			}

	// CHECK-LABEL: define void @test_vst4q_lane_p16(i16* %a, [4 x <8 x i16>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst4q_lane_p16(i16* %a, [4 x <8 x i16>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.poly16x8x4_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.poly16x8x4_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.poly16x8x4_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.poly16x8x4_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly16x8x4_t, %struct.poly16x8x4_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly16x8x4_t, %struct.poly16x8x4_t* [[B]], i32 0, i32 0
	// CHECK: store [4 x <8 x i16>] [[B]].coerce, [4 x <8 x i16>]* [[COERCE_DIVE]], align 16			// CHECK: store [4 x <8 x i16>] [[B]].coerce, [4 x <8 x i16>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.poly16x8x4_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.poly16x8x4_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.poly16x8x4_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.poly16x8x4_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 64, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 64, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast i16* %a to i8*			// CHECK: [[TMP2:%.*]] = bitcast i16* %a to i8*
	Show All 18 Lines
	// CHECK: [[TMP13:%.*]] = bitcast <16 x i8> [[TMP8]] to <8 x i16>			// CHECK: [[TMP13:%.*]] = bitcast <16 x i8> [[TMP8]] to <8 x i16>
	// CHECK: [[TMP14:%.*]] = bitcast <16 x i8> [[TMP10]] to <8 x i16>			// CHECK: [[TMP14:%.*]] = bitcast <16 x i8> [[TMP10]] to <8 x i16>
	// CHECK: call void @llvm.aarch64.neon.st4lane.v8i16.p0i8(<8 x i16> [[TMP11]], <8 x i16> [[TMP12]], <8 x i16> [[TMP13]], <8 x i16> [[TMP14]], i64 7, i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st4lane.v8i16.p0i8(<8 x i16> [[TMP11]], <8 x i16> [[TMP12]], <8 x i16> [[TMP13]], <8 x i16> [[TMP14]], i64 7, i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst4q_lane_p16(poly16_t *a, poly16x8x4_t b) {			void test_vst4q_lane_p16(poly16_t *a, poly16x8x4_t b) {
	vst4q_lane_p16(a, b, 7);			vst4q_lane_p16(a, b, 7);
	}			}

	// CHECK-LABEL: define void @test_vst4q_lane_p64(i64* %a, [4 x <2 x i64>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst4q_lane_p64(i64* %a, [4 x <2 x i64>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.poly64x2x4_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.poly64x2x4_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.poly64x2x4_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.poly64x2x4_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly64x2x4_t, %struct.poly64x2x4_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly64x2x4_t, %struct.poly64x2x4_t* [[B]], i32 0, i32 0
	// CHECK: store [4 x <2 x i64>] [[B]].coerce, [4 x <2 x i64>]* [[COERCE_DIVE]], align 16			// CHECK: store [4 x <2 x i64>] [[B]].coerce, [4 x <2 x i64>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.poly64x2x4_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.poly64x2x4_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.poly64x2x4_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.poly64x2x4_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 64, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 64, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast i64* %a to i8*			// CHECK: [[TMP2:%.*]] = bitcast i64* %a to i8*
	Show All 18 Lines
	// CHECK: [[TMP13:%.*]] = bitcast <16 x i8> [[TMP8]] to <2 x i64>			// CHECK: [[TMP13:%.*]] = bitcast <16 x i8> [[TMP8]] to <2 x i64>
	// CHECK: [[TMP14:%.*]] = bitcast <16 x i8> [[TMP10]] to <2 x i64>			// CHECK: [[TMP14:%.*]] = bitcast <16 x i8> [[TMP10]] to <2 x i64>
	// CHECK: call void @llvm.aarch64.neon.st4lane.v2i64.p0i8(<2 x i64> [[TMP11]], <2 x i64> [[TMP12]], <2 x i64> [[TMP13]], <2 x i64> [[TMP14]], i64 1, i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st4lane.v2i64.p0i8(<2 x i64> [[TMP11]], <2 x i64> [[TMP12]], <2 x i64> [[TMP13]], <2 x i64> [[TMP14]], i64 1, i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst4q_lane_p64(poly64_t *a, poly64x2x4_t b) {			void test_vst4q_lane_p64(poly64_t *a, poly64x2x4_t b) {
	vst4q_lane_p64(a, b, 1);			vst4q_lane_p64(a, b, 1);
	}			}

	// CHECK-LABEL: define void @test_vst4_lane_u8(i8* %a, [4 x <8 x i8>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst4_lane_u8(i8* %a, [4 x <8 x i8>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.uint8x8x4_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.uint8x8x4_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.uint8x8x4_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.uint8x8x4_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint8x8x4_t, %struct.uint8x8x4_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint8x8x4_t, %struct.uint8x8x4_t* [[B]], i32 0, i32 0
	// CHECK: store [4 x <8 x i8>] [[B]].coerce, [4 x <8 x i8>]* [[COERCE_DIVE]], align 8			// CHECK: store [4 x <8 x i8>] [[B]].coerce, [4 x <8 x i8>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.uint8x8x4_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.uint8x8x4_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.uint8x8x4_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.uint8x8x4_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 32, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 32, i1 false)
	// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.uint8x8x4_t, %struct.uint8x8x4_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.uint8x8x4_t, %struct.uint8x8x4_t* [[__S1]], i32 0, i32 0
	Show All 9 Lines
	// CHECK: [[ARRAYIDX6:%.*]] = getelementptr inbounds [4 x <8 x i8>], [4 x <8 x i8>]* [[VAL5]], i64 0, i64 3			// CHECK: [[ARRAYIDX6:%.*]] = getelementptr inbounds [4 x <8 x i8>], [4 x <8 x i8>]* [[VAL5]], i64 0, i64 3
	// CHECK: [[TMP5:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX6]], align 8			// CHECK: [[TMP5:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX6]], align 8
	// CHECK: call void @llvm.aarch64.neon.st4lane.v8i8.p0i8(<8 x i8> [[TMP2]], <8 x i8> [[TMP3]], <8 x i8> [[TMP4]], <8 x i8> [[TMP5]], i64 7, i8* %a)			// CHECK: call void @llvm.aarch64.neon.st4lane.v8i8.p0i8(<8 x i8> [[TMP2]], <8 x i8> [[TMP3]], <8 x i8> [[TMP4]], <8 x i8> [[TMP5]], i64 7, i8* %a)
	// CHECK: ret void			// CHECK: ret void
	void test_vst4_lane_u8(uint8_t *a, uint8x8x4_t b) {			void test_vst4_lane_u8(uint8_t *a, uint8x8x4_t b) {
	vst4_lane_u8(a, b, 7);			vst4_lane_u8(a, b, 7);
	}			}

	// CHECK-LABEL: define void @test_vst4_lane_u16(i16* %a, [4 x <4 x i16>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst4_lane_u16(i16* %a, [4 x <4 x i16>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.uint16x4x4_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.uint16x4x4_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.uint16x4x4_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.uint16x4x4_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint16x4x4_t, %struct.uint16x4x4_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint16x4x4_t, %struct.uint16x4x4_t* [[B]], i32 0, i32 0
	// CHECK: store [4 x <4 x i16>] [[B]].coerce, [4 x <4 x i16>]* [[COERCE_DIVE]], align 8			// CHECK: store [4 x <4 x i16>] [[B]].coerce, [4 x <4 x i16>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.uint16x4x4_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.uint16x4x4_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.uint16x4x4_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.uint16x4x4_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 32, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 32, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast i16* %a to i8*			// CHECK: [[TMP2:%.*]] = bitcast i16* %a to i8*
	Show All 18 Lines
	// CHECK: [[TMP13:%.*]] = bitcast <8 x i8> [[TMP8]] to <4 x i16>			// CHECK: [[TMP13:%.*]] = bitcast <8 x i8> [[TMP8]] to <4 x i16>
	// CHECK: [[TMP14:%.*]] = bitcast <8 x i8> [[TMP10]] to <4 x i16>			// CHECK: [[TMP14:%.*]] = bitcast <8 x i8> [[TMP10]] to <4 x i16>
	// CHECK: call void @llvm.aarch64.neon.st4lane.v4i16.p0i8(<4 x i16> [[TMP11]], <4 x i16> [[TMP12]], <4 x i16> [[TMP13]], <4 x i16> [[TMP14]], i64 3, i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st4lane.v4i16.p0i8(<4 x i16> [[TMP11]], <4 x i16> [[TMP12]], <4 x i16> [[TMP13]], <4 x i16> [[TMP14]], i64 3, i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst4_lane_u16(uint16_t *a, uint16x4x4_t b) {			void test_vst4_lane_u16(uint16_t *a, uint16x4x4_t b) {
	vst4_lane_u16(a, b, 3);			vst4_lane_u16(a, b, 3);
	}			}

	// CHECK-LABEL: define void @test_vst4_lane_u32(i32* %a, [4 x <2 x i32>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst4_lane_u32(i32* %a, [4 x <2 x i32>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.uint32x2x4_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.uint32x2x4_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.uint32x2x4_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.uint32x2x4_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint32x2x4_t, %struct.uint32x2x4_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint32x2x4_t, %struct.uint32x2x4_t* [[B]], i32 0, i32 0
	// CHECK: store [4 x <2 x i32>] [[B]].coerce, [4 x <2 x i32>]* [[COERCE_DIVE]], align 8			// CHECK: store [4 x <2 x i32>] [[B]].coerce, [4 x <2 x i32>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.uint32x2x4_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.uint32x2x4_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.uint32x2x4_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.uint32x2x4_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 32, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 32, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast i32* %a to i8*			// CHECK: [[TMP2:%.*]] = bitcast i32* %a to i8*
	Show All 18 Lines
	// CHECK: [[TMP13:%.*]] = bitcast <8 x i8> [[TMP8]] to <2 x i32>			// CHECK: [[TMP13:%.*]] = bitcast <8 x i8> [[TMP8]] to <2 x i32>
	// CHECK: [[TMP14:%.*]] = bitcast <8 x i8> [[TMP10]] to <2 x i32>			// CHECK: [[TMP14:%.*]] = bitcast <8 x i8> [[TMP10]] to <2 x i32>
	// CHECK: call void @llvm.aarch64.neon.st4lane.v2i32.p0i8(<2 x i32> [[TMP11]], <2 x i32> [[TMP12]], <2 x i32> [[TMP13]], <2 x i32> [[TMP14]], i64 1, i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st4lane.v2i32.p0i8(<2 x i32> [[TMP11]], <2 x i32> [[TMP12]], <2 x i32> [[TMP13]], <2 x i32> [[TMP14]], i64 1, i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst4_lane_u32(uint32_t *a, uint32x2x4_t b) {			void test_vst4_lane_u32(uint32_t *a, uint32x2x4_t b) {
	vst4_lane_u32(a, b, 1);			vst4_lane_u32(a, b, 1);
	}			}

	// CHECK-LABEL: define void @test_vst4_lane_u64(i64* %a, [4 x <1 x i64>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst4_lane_u64(i64* %a, [4 x <1 x i64>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.uint64x1x4_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.uint64x1x4_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.uint64x1x4_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.uint64x1x4_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint64x1x4_t, %struct.uint64x1x4_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint64x1x4_t, %struct.uint64x1x4_t* [[B]], i32 0, i32 0
	// CHECK: store [4 x <1 x i64>] [[B]].coerce, [4 x <1 x i64>]* [[COERCE_DIVE]], align 8			// CHECK: store [4 x <1 x i64>] [[B]].coerce, [4 x <1 x i64>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.uint64x1x4_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.uint64x1x4_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.uint64x1x4_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.uint64x1x4_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 32, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 32, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast i64* %a to i8*			// CHECK: [[TMP2:%.*]] = bitcast i64* %a to i8*
	Show All 18 Lines
	// CHECK: [[TMP13:%.*]] = bitcast <8 x i8> [[TMP8]] to <1 x i64>			// CHECK: [[TMP13:%.*]] = bitcast <8 x i8> [[TMP8]] to <1 x i64>
	// CHECK: [[TMP14:%.*]] = bitcast <8 x i8> [[TMP10]] to <1 x i64>			// CHECK: [[TMP14:%.*]] = bitcast <8 x i8> [[TMP10]] to <1 x i64>
	// CHECK: call void @llvm.aarch64.neon.st4lane.v1i64.p0i8(<1 x i64> [[TMP11]], <1 x i64> [[TMP12]], <1 x i64> [[TMP13]], <1 x i64> [[TMP14]], i64 0, i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st4lane.v1i64.p0i8(<1 x i64> [[TMP11]], <1 x i64> [[TMP12]], <1 x i64> [[TMP13]], <1 x i64> [[TMP14]], i64 0, i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst4_lane_u64(uint64_t *a, uint64x1x4_t b) {			void test_vst4_lane_u64(uint64_t *a, uint64x1x4_t b) {
	vst4_lane_u64(a, b, 0);			vst4_lane_u64(a, b, 0);
	}			}

	// CHECK-LABEL: define void @test_vst4_lane_s8(i8* %a, [4 x <8 x i8>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst4_lane_s8(i8* %a, [4 x <8 x i8>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.int8x8x4_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.int8x8x4_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.int8x8x4_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.int8x8x4_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x8x4_t, %struct.int8x8x4_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x8x4_t, %struct.int8x8x4_t* [[B]], i32 0, i32 0
	// CHECK: store [4 x <8 x i8>] [[B]].coerce, [4 x <8 x i8>]* [[COERCE_DIVE]], align 8			// CHECK: store [4 x <8 x i8>] [[B]].coerce, [4 x <8 x i8>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.int8x8x4_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.int8x8x4_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.int8x8x4_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.int8x8x4_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 32, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 32, i1 false)
	// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.int8x8x4_t, %struct.int8x8x4_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.int8x8x4_t, %struct.int8x8x4_t* [[__S1]], i32 0, i32 0
	Show All 9 Lines
	// CHECK: [[ARRAYIDX6:%.*]] = getelementptr inbounds [4 x <8 x i8>], [4 x <8 x i8>]* [[VAL5]], i64 0, i64 3			// CHECK: [[ARRAYIDX6:%.*]] = getelementptr inbounds [4 x <8 x i8>], [4 x <8 x i8>]* [[VAL5]], i64 0, i64 3
	// CHECK: [[TMP5:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX6]], align 8			// CHECK: [[TMP5:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX6]], align 8
	// CHECK: call void @llvm.aarch64.neon.st4lane.v8i8.p0i8(<8 x i8> [[TMP2]], <8 x i8> [[TMP3]], <8 x i8> [[TMP4]], <8 x i8> [[TMP5]], i64 7, i8* %a)			// CHECK: call void @llvm.aarch64.neon.st4lane.v8i8.p0i8(<8 x i8> [[TMP2]], <8 x i8> [[TMP3]], <8 x i8> [[TMP4]], <8 x i8> [[TMP5]], i64 7, i8* %a)
	// CHECK: ret void			// CHECK: ret void
	void test_vst4_lane_s8(int8_t *a, int8x8x4_t b) {			void test_vst4_lane_s8(int8_t *a, int8x8x4_t b) {
	vst4_lane_s8(a, b, 7);			vst4_lane_s8(a, b, 7);
	}			}

	// CHECK-LABEL: define void @test_vst4_lane_s16(i16* %a, [4 x <4 x i16>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst4_lane_s16(i16* %a, [4 x <4 x i16>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.int16x4x4_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.int16x4x4_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.int16x4x4_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.int16x4x4_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int16x4x4_t, %struct.int16x4x4_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int16x4x4_t, %struct.int16x4x4_t* [[B]], i32 0, i32 0
	// CHECK: store [4 x <4 x i16>] [[B]].coerce, [4 x <4 x i16>]* [[COERCE_DIVE]], align 8			// CHECK: store [4 x <4 x i16>] [[B]].coerce, [4 x <4 x i16>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.int16x4x4_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.int16x4x4_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.int16x4x4_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.int16x4x4_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 32, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 32, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast i16* %a to i8*			// CHECK: [[TMP2:%.*]] = bitcast i16* %a to i8*
	Show All 18 Lines
	// CHECK: [[TMP13:%.*]] = bitcast <8 x i8> [[TMP8]] to <4 x i16>			// CHECK: [[TMP13:%.*]] = bitcast <8 x i8> [[TMP8]] to <4 x i16>
	// CHECK: [[TMP14:%.*]] = bitcast <8 x i8> [[TMP10]] to <4 x i16>			// CHECK: [[TMP14:%.*]] = bitcast <8 x i8> [[TMP10]] to <4 x i16>
	// CHECK: call void @llvm.aarch64.neon.st4lane.v4i16.p0i8(<4 x i16> [[TMP11]], <4 x i16> [[TMP12]], <4 x i16> [[TMP13]], <4 x i16> [[TMP14]], i64 3, i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st4lane.v4i16.p0i8(<4 x i16> [[TMP11]], <4 x i16> [[TMP12]], <4 x i16> [[TMP13]], <4 x i16> [[TMP14]], i64 3, i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst4_lane_s16(int16_t *a, int16x4x4_t b) {			void test_vst4_lane_s16(int16_t *a, int16x4x4_t b) {
	vst4_lane_s16(a, b, 3);			vst4_lane_s16(a, b, 3);
	}			}

	// CHECK-LABEL: define void @test_vst4_lane_s32(i32* %a, [4 x <2 x i32>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst4_lane_s32(i32* %a, [4 x <2 x i32>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.int32x2x4_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.int32x2x4_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.int32x2x4_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.int32x2x4_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int32x2x4_t, %struct.int32x2x4_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int32x2x4_t, %struct.int32x2x4_t* [[B]], i32 0, i32 0
	// CHECK: store [4 x <2 x i32>] [[B]].coerce, [4 x <2 x i32>]* [[COERCE_DIVE]], align 8			// CHECK: store [4 x <2 x i32>] [[B]].coerce, [4 x <2 x i32>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.int32x2x4_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.int32x2x4_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.int32x2x4_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.int32x2x4_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 32, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 32, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast i32* %a to i8*			// CHECK: [[TMP2:%.*]] = bitcast i32* %a to i8*
	Show All 18 Lines
	// CHECK: [[TMP13:%.*]] = bitcast <8 x i8> [[TMP8]] to <2 x i32>			// CHECK: [[TMP13:%.*]] = bitcast <8 x i8> [[TMP8]] to <2 x i32>
	// CHECK: [[TMP14:%.*]] = bitcast <8 x i8> [[TMP10]] to <2 x i32>			// CHECK: [[TMP14:%.*]] = bitcast <8 x i8> [[TMP10]] to <2 x i32>
	// CHECK: call void @llvm.aarch64.neon.st4lane.v2i32.p0i8(<2 x i32> [[TMP11]], <2 x i32> [[TMP12]], <2 x i32> [[TMP13]], <2 x i32> [[TMP14]], i64 1, i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st4lane.v2i32.p0i8(<2 x i32> [[TMP11]], <2 x i32> [[TMP12]], <2 x i32> [[TMP13]], <2 x i32> [[TMP14]], i64 1, i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst4_lane_s32(int32_t *a, int32x2x4_t b) {			void test_vst4_lane_s32(int32_t *a, int32x2x4_t b) {
	vst4_lane_s32(a, b, 1);			vst4_lane_s32(a, b, 1);
	}			}

	// CHECK-LABEL: define void @test_vst4_lane_s64(i64* %a, [4 x <1 x i64>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst4_lane_s64(i64* %a, [4 x <1 x i64>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.int64x1x4_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.int64x1x4_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.int64x1x4_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.int64x1x4_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int64x1x4_t, %struct.int64x1x4_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int64x1x4_t, %struct.int64x1x4_t* [[B]], i32 0, i32 0
	// CHECK: store [4 x <1 x i64>] [[B]].coerce, [4 x <1 x i64>]* [[COERCE_DIVE]], align 8			// CHECK: store [4 x <1 x i64>] [[B]].coerce, [4 x <1 x i64>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.int64x1x4_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.int64x1x4_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.int64x1x4_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.int64x1x4_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 32, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 32, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast i64* %a to i8*			// CHECK: [[TMP2:%.*]] = bitcast i64* %a to i8*
	Show All 18 Lines
	// CHECK: [[TMP13:%.*]] = bitcast <8 x i8> [[TMP8]] to <1 x i64>			// CHECK: [[TMP13:%.*]] = bitcast <8 x i8> [[TMP8]] to <1 x i64>
	// CHECK: [[TMP14:%.*]] = bitcast <8 x i8> [[TMP10]] to <1 x i64>			// CHECK: [[TMP14:%.*]] = bitcast <8 x i8> [[TMP10]] to <1 x i64>
	// CHECK: call void @llvm.aarch64.neon.st4lane.v1i64.p0i8(<1 x i64> [[TMP11]], <1 x i64> [[TMP12]], <1 x i64> [[TMP13]], <1 x i64> [[TMP14]], i64 0, i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st4lane.v1i64.p0i8(<1 x i64> [[TMP11]], <1 x i64> [[TMP12]], <1 x i64> [[TMP13]], <1 x i64> [[TMP14]], i64 0, i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst4_lane_s64(int64_t *a, int64x1x4_t b) {			void test_vst4_lane_s64(int64_t *a, int64x1x4_t b) {
	vst4_lane_s64(a, b, 0);			vst4_lane_s64(a, b, 0);
	}			}

	// CHECK-LABEL: define void @test_vst4_lane_f16(half* %a, [4 x <4 x half>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst4_lane_f16(half* %a, [4 x <4 x half>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.float16x4x4_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.float16x4x4_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.float16x4x4_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.float16x4x4_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float16x4x4_t, %struct.float16x4x4_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float16x4x4_t, %struct.float16x4x4_t* [[B]], i32 0, i32 0
	// CHECK: store [4 x <4 x half>] [[B]].coerce, [4 x <4 x half>]* [[COERCE_DIVE]], align 8			// CHECK: store [4 x <4 x half>] [[B]].coerce, [4 x <4 x half>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x4x4_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x4x4_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.float16x4x4_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.float16x4x4_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 32, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 32, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast half* %a to i8*			// CHECK: [[TMP2:%.*]] = bitcast half* %a to i8*
	Show All 18 Lines
	// CHECK: [[TMP13:%.*]] = bitcast <8 x i8> [[TMP8]] to <4 x half>			// CHECK: [[TMP13:%.*]] = bitcast <8 x i8> [[TMP8]] to <4 x half>
	// CHECK: [[TMP14:%.*]] = bitcast <8 x i8> [[TMP10]] to <4 x half>			// CHECK: [[TMP14:%.*]] = bitcast <8 x i8> [[TMP10]] to <4 x half>
	// CHECK: call void @llvm.aarch64.neon.st4lane.v4f16.p0i8(<4 x half> [[TMP11]], <4 x half> [[TMP12]], <4 x half> [[TMP13]], <4 x half> [[TMP14]], i64 3, i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st4lane.v4f16.p0i8(<4 x half> [[TMP11]], <4 x half> [[TMP12]], <4 x half> [[TMP13]], <4 x half> [[TMP14]], i64 3, i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst4_lane_f16(float16_t *a, float16x4x4_t b) {			void test_vst4_lane_f16(float16_t *a, float16x4x4_t b) {
	vst4_lane_f16(a, b, 3);			vst4_lane_f16(a, b, 3);
	}			}

	// CHECK-LABEL: define void @test_vst4_lane_f32(float* %a, [4 x <2 x float>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst4_lane_f32(float* %a, [4 x <2 x float>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.float32x2x4_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.float32x2x4_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.float32x2x4_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.float32x2x4_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float32x2x4_t, %struct.float32x2x4_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float32x2x4_t, %struct.float32x2x4_t* [[B]], i32 0, i32 0
	// CHECK: store [4 x <2 x float>] [[B]].coerce, [4 x <2 x float>]* [[COERCE_DIVE]], align 8			// CHECK: store [4 x <2 x float>] [[B]].coerce, [4 x <2 x float>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.float32x2x4_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.float32x2x4_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.float32x2x4_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.float32x2x4_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 32, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 32, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast float* %a to i8*			// CHECK: [[TMP2:%.*]] = bitcast float* %a to i8*
	[... 18 lines not shown ...]
	// CHECK: [[TMP13:%.*]] = bitcast <8 x i8> [[TMP8]] to <2 x float>			// CHECK: [[TMP13:%.*]] = bitcast <8 x i8> [[TMP8]] to <2 x float>
	// CHECK: [[TMP14:%.*]] = bitcast <8 x i8> [[TMP10]] to <2 x float>			// CHECK: [[TMP14:%.*]] = bitcast <8 x i8> [[TMP10]] to <2 x float>
	// CHECK: call void @llvm.aarch64.neon.st4lane.v2f32.p0i8(<2 x float> [[TMP11]], <2 x float> [[TMP12]], <2 x float> [[TMP13]], <2 x float> [[TMP14]], i64 1, i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st4lane.v2f32.p0i8(<2 x float> [[TMP11]], <2 x float> [[TMP12]], <2 x float> [[TMP13]], <2 x float> [[TMP14]], i64 1, i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst4_lane_f32(float32_t *a, float32x2x4_t b) {			void test_vst4_lane_f32(float32_t *a, float32x2x4_t b) {
	vst4_lane_f32(a, b, 1);			vst4_lane_f32(a, b, 1);
	}			}

	// CHECK-LABEL: define void @test_vst4_lane_f64(double* %a, [4 x <1 x double>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst4_lane_f64(double* %a, [4 x <1 x double>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.float64x1x4_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.float64x1x4_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.float64x1x4_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.float64x1x4_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float64x1x4_t, %struct.float64x1x4_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.float64x1x4_t, %struct.float64x1x4_t* [[B]], i32 0, i32 0
	// CHECK: store [4 x <1 x double>] [[B]].coerce, [4 x <1 x double>]* [[COERCE_DIVE]], align 8			// CHECK: store [4 x <1 x double>] [[B]].coerce, [4 x <1 x double>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.float64x1x4_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.float64x1x4_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.float64x1x4_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.float64x1x4_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 32, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 32, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast double* %a to i8*			// CHECK: [[TMP2:%.*]] = bitcast double* %a to i8*
	[... 18 lines not shown ...]
	// CHECK: [[TMP13:%.*]] = bitcast <8 x i8> [[TMP8]] to <1 x double>			// CHECK: [[TMP13:%.*]] = bitcast <8 x i8> [[TMP8]] to <1 x double>
	// CHECK: [[TMP14:%.*]] = bitcast <8 x i8> [[TMP10]] to <1 x double>			// CHECK: [[TMP14:%.*]] = bitcast <8 x i8> [[TMP10]] to <1 x double>
	// CHECK: call void @llvm.aarch64.neon.st4lane.v1f64.p0i8(<1 x double> [[TMP11]], <1 x double> [[TMP12]], <1 x double> [[TMP13]], <1 x double> [[TMP14]], i64 0, i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st4lane.v1f64.p0i8(<1 x double> [[TMP11]], <1 x double> [[TMP12]], <1 x double> [[TMP13]], <1 x double> [[TMP14]], i64 0, i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst4_lane_f64(float64_t *a, float64x1x4_t b) {			void test_vst4_lane_f64(float64_t *a, float64x1x4_t b) {
	vst4_lane_f64(a, b, 0);			vst4_lane_f64(a, b, 0);
	}			}

	// CHECK-LABEL: define void @test_vst4_lane_p8(i8* %a, [4 x <8 x i8>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst4_lane_p8(i8* %a, [4 x <8 x i8>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.poly8x8x4_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.poly8x8x4_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.poly8x8x4_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.poly8x8x4_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly8x8x4_t, %struct.poly8x8x4_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly8x8x4_t, %struct.poly8x8x4_t* [[B]], i32 0, i32 0
	// CHECK: store [4 x <8 x i8>] [[B]].coerce, [4 x <8 x i8>]* [[COERCE_DIVE]], align 8			// CHECK: store [4 x <8 x i8>] [[B]].coerce, [4 x <8 x i8>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.poly8x8x4_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.poly8x8x4_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.poly8x8x4_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.poly8x8x4_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 32, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 32, i1 false)
	// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.poly8x8x4_t, %struct.poly8x8x4_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.poly8x8x4_t, %struct.poly8x8x4_t* [[__S1]], i32 0, i32 0
	[... 9 lines not shown ...]
	// CHECK: [[ARRAYIDX6:%.*]] = getelementptr inbounds [4 x <8 x i8>], [4 x <8 x i8>]* [[VAL5]], i64 0, i64 3			// CHECK: [[ARRAYIDX6:%.*]] = getelementptr inbounds [4 x <8 x i8>], [4 x <8 x i8>]* [[VAL5]], i64 0, i64 3
	// CHECK: [[TMP5:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX6]], align 8			// CHECK: [[TMP5:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX6]], align 8
	// CHECK: call void @llvm.aarch64.neon.st4lane.v8i8.p0i8(<8 x i8> [[TMP2]], <8 x i8> [[TMP3]], <8 x i8> [[TMP4]], <8 x i8> [[TMP5]], i64 7, i8* %a)			// CHECK: call void @llvm.aarch64.neon.st4lane.v8i8.p0i8(<8 x i8> [[TMP2]], <8 x i8> [[TMP3]], <8 x i8> [[TMP4]], <8 x i8> [[TMP5]], i64 7, i8* %a)
	// CHECK: ret void			// CHECK: ret void
	void test_vst4_lane_p8(poly8_t *a, poly8x8x4_t b) {			void test_vst4_lane_p8(poly8_t *a, poly8x8x4_t b) {
	vst4_lane_p8(a, b, 7);			vst4_lane_p8(a, b, 7);
	}			}

	// CHECK-LABEL: define void @test_vst4_lane_p16(i16* %a, [4 x <4 x i16>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst4_lane_p16(i16* %a, [4 x <4 x i16>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.poly16x4x4_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.poly16x4x4_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.poly16x4x4_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.poly16x4x4_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly16x4x4_t, %struct.poly16x4x4_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly16x4x4_t, %struct.poly16x4x4_t* [[B]], i32 0, i32 0
	// CHECK: store [4 x <4 x i16>] [[B]].coerce, [4 x <4 x i16>]* [[COERCE_DIVE]], align 8			// CHECK: store [4 x <4 x i16>] [[B]].coerce, [4 x <4 x i16>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.poly16x4x4_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.poly16x4x4_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.poly16x4x4_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.poly16x4x4_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 32, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 32, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast i16* %a to i8*			// CHECK: [[TMP2:%.*]] = bitcast i16* %a to i8*
	[... 18 lines not shown ...]
	// CHECK: [[TMP13:%.*]] = bitcast <8 x i8> [[TMP8]] to <4 x i16>			// CHECK: [[TMP13:%.*]] = bitcast <8 x i8> [[TMP8]] to <4 x i16>
	// CHECK: [[TMP14:%.*]] = bitcast <8 x i8> [[TMP10]] to <4 x i16>			// CHECK: [[TMP14:%.*]] = bitcast <8 x i8> [[TMP10]] to <4 x i16>
	// CHECK: call void @llvm.aarch64.neon.st4lane.v4i16.p0i8(<4 x i16> [[TMP11]], <4 x i16> [[TMP12]], <4 x i16> [[TMP13]], <4 x i16> [[TMP14]], i64 3, i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st4lane.v4i16.p0i8(<4 x i16> [[TMP11]], <4 x i16> [[TMP12]], <4 x i16> [[TMP13]], <4 x i16> [[TMP14]], i64 3, i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst4_lane_p16(poly16_t *a, poly16x4x4_t b) {			void test_vst4_lane_p16(poly16_t *a, poly16x4x4_t b) {
	vst4_lane_p16(a, b, 3);			vst4_lane_p16(a, b, 3);
	}			}

	// CHECK-LABEL: define void @test_vst4_lane_p64(i64* %a, [4 x <1 x i64>] %b.coerce) #0 {			// CHECK-LABEL: define void @test_vst4_lane_p64(i64* %a, [4 x <1 x i64>] %b.coerce) #2 {
	// CHECK: [[B:%.*]] = alloca %struct.poly64x1x4_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.poly64x1x4_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.poly64x1x4_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.poly64x1x4_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly64x1x4_t, %struct.poly64x1x4_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly64x1x4_t, %struct.poly64x1x4_t* [[B]], i32 0, i32 0
	// CHECK: store [4 x <1 x i64>] [[B]].coerce, [4 x <1 x i64>]* [[COERCE_DIVE]], align 8			// CHECK: store [4 x <1 x i64>] [[B]].coerce, [4 x <1 x i64>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.poly64x1x4_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.poly64x1x4_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.poly64x1x4_t* [[B]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.poly64x1x4_t* [[B]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 32, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 32, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast i64* %a to i8*			// CHECK: [[TMP2:%.*]] = bitcast i64* %a to i8*
	[... 17 lines not shown ...]
	// CHECK: [[TMP12:%.*]] = bitcast <8 x i8> [[TMP6]] to <1 x i64>			// CHECK: [[TMP12:%.*]] = bitcast <8 x i8> [[TMP6]] to <1 x i64>
	// CHECK: [[TMP13:%.*]] = bitcast <8 x i8> [[TMP8]] to <1 x i64>			// CHECK: [[TMP13:%.*]] = bitcast <8 x i8> [[TMP8]] to <1 x i64>
	// CHECK: [[TMP14:%.*]] = bitcast <8 x i8> [[TMP10]] to <1 x i64>			// CHECK: [[TMP14:%.*]] = bitcast <8 x i8> [[TMP10]] to <1 x i64>
	// CHECK: call void @llvm.aarch64.neon.st4lane.v1i64.p0i8(<1 x i64> [[TMP11]], <1 x i64> [[TMP12]], <1 x i64> [[TMP13]], <1 x i64> [[TMP14]], i64 0, i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st4lane.v1i64.p0i8(<1 x i64> [[TMP11]], <1 x i64> [[TMP12]], <1 x i64> [[TMP13]], <1 x i64> [[TMP14]], i64 0, i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst4_lane_p64(poly64_t *a, poly64x1x4_t b) {			void test_vst4_lane_p64(poly64_t *a, poly64x1x4_t b) {
	vst4_lane_p64(a, b, 0);			vst4_lane_p64(a, b, 0);
	}			}

				// CHECK: attributes #0 ={{.*}}"min-legal-vector-width"="128"
				// CHECK: attributes #1 ={{.*}}"min-legal-vector-width"="64"
				// CHECK-NOT: attributes #2 ={{.*}}"min-legal-vector-width"

test/CodeGen/aarch64-neon-scalar-copy.c

	[... 17 lines not shown ...]
	// CHECK: [[TMP1:%.*]] = bitcast <8 x i8> [[TMP0]] to <1 x double>			// CHECK: [[TMP1:%.*]] = bitcast <8 x i8> [[TMP0]] to <1 x double>
	// CHECK: [[VDUPD_LANE:%.*]] = extractelement <1 x double> [[TMP1]], i32 0			// CHECK: [[VDUPD_LANE:%.*]] = extractelement <1 x double> [[TMP1]], i32 0
	// CHECK: ret double [[VDUPD_LANE]]			// CHECK: ret double [[VDUPD_LANE]]
	float64_t test_vdupd_lane_f64(float64x1_t a) {			float64_t test_vdupd_lane_f64(float64x1_t a) {
	return vdupd_lane_f64(a, 0);			return vdupd_lane_f64(a, 0);
	}			}


	// CHECK-LABEL: define float @test_vdups_laneq_f32(<4 x float> %a) #0 {			// CHECK-LABEL: define float @test_vdups_laneq_f32(<4 x float> %a) #1 {
	// CHECK: [[TMP0:%.*]] = bitcast <4 x float> %a to <16 x i8>			// CHECK: [[TMP0:%.*]] = bitcast <4 x float> %a to <16 x i8>
	// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <4 x float>			// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <4 x float>
	// CHECK: [[VGETQ_LANE:%.*]] = extractelement <4 x float> [[TMP1]], i32 3			// CHECK: [[VGETQ_LANE:%.*]] = extractelement <4 x float> [[TMP1]], i32 3
	// CHECK: ret float [[VGETQ_LANE]]			// CHECK: ret float [[VGETQ_LANE]]
	float32_t test_vdups_laneq_f32(float32x4_t a) {			float32_t test_vdups_laneq_f32(float32x4_t a) {
	return vdups_laneq_f32(a, 3);			return vdups_laneq_f32(a, 3);
	}			}


	// CHECK-LABEL: define double @test_vdupd_laneq_f64(<2 x double> %a) #0 {			// CHECK-LABEL: define double @test_vdupd_laneq_f64(<2 x double> %a) #1 {
	// CHECK: [[TMP0:%.*]] = bitcast <2 x double> %a to <16 x i8>			// CHECK: [[TMP0:%.*]] = bitcast <2 x double> %a to <16 x i8>
	// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <2 x double>			// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <2 x double>
	// CHECK: [[VGETQ_LANE:%.*]] = extractelement <2 x double> [[TMP1]], i32 1			// CHECK: [[VGETQ_LANE:%.*]] = extractelement <2 x double> [[TMP1]], i32 1
	// CHECK: ret double [[VGETQ_LANE]]			// CHECK: ret double [[VGETQ_LANE]]
	float64_t test_vdupd_laneq_f64(float64x2_t a) {			float64_t test_vdupd_laneq_f64(float64x2_t a) {
	return vdupd_laneq_f64(a, 1);			return vdupd_laneq_f64(a, 1);
	}			}

	[... 68 lines not shown ...]
	// CHECK: [[TMP0:%.*]] = bitcast <1 x i64> %a to <8 x i8>			// CHECK: [[TMP0:%.*]] = bitcast <1 x i64> %a to <8 x i8>
	// CHECK: [[TMP1:%.*]] = bitcast <8 x i8> [[TMP0]] to <1 x i64>			// CHECK: [[TMP1:%.*]] = bitcast <8 x i8> [[TMP0]] to <1 x i64>
	// CHECK: [[VGET_LANE:%.*]] = extractelement <1 x i64> [[TMP1]], i32 0			// CHECK: [[VGET_LANE:%.*]] = extractelement <1 x i64> [[TMP1]], i32 0
	// CHECK: ret i64 [[VGET_LANE]]			// CHECK: ret i64 [[VGET_LANE]]
	uint64_t test_vdupd_lane_u64(uint64x1_t a) {			uint64_t test_vdupd_lane_u64(uint64x1_t a) {
	return vdupd_lane_u64(a, 0);			return vdupd_lane_u64(a, 0);
	}			}

	// CHECK-LABEL: define i8 @test_vdupb_laneq_s8(<16 x i8> %a) #0 {			// CHECK-LABEL: define i8 @test_vdupb_laneq_s8(<16 x i8> %a) #1 {
	// CHECK: [[VGETQ_LANE:%.*]] = extractelement <16 x i8> %a, i32 15			// CHECK: [[VGETQ_LANE:%.*]] = extractelement <16 x i8> %a, i32 15
	// CHECK: ret i8 [[VGETQ_LANE]]			// CHECK: ret i8 [[VGETQ_LANE]]
	int8_t test_vdupb_laneq_s8(int8x16_t a) {			int8_t test_vdupb_laneq_s8(int8x16_t a) {
	return vdupb_laneq_s8(a, 15);			return vdupb_laneq_s8(a, 15);
	}			}


	// CHECK-LABEL: define i16 @test_vduph_laneq_s16(<8 x i16> %a) #0 {			// CHECK-LABEL: define i16 @test_vduph_laneq_s16(<8 x i16> %a) #1 {
	// CHECK: [[TMP0:%.*]] = bitcast <8 x i16> %a to <16 x i8>			// CHECK: [[TMP0:%.*]] = bitcast <8 x i16> %a to <16 x i8>
	// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <8 x i16>			// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <8 x i16>
	// CHECK: [[VGETQ_LANE:%.*]] = extractelement <8 x i16> [[TMP1]], i32 7			// CHECK: [[VGETQ_LANE:%.*]] = extractelement <8 x i16> [[TMP1]], i32 7
	// CHECK: ret i16 [[VGETQ_LANE]]			// CHECK: ret i16 [[VGETQ_LANE]]
	int16_t test_vduph_laneq_s16(int16x8_t a) {			int16_t test_vduph_laneq_s16(int16x8_t a) {
	return vduph_laneq_s16(a, 7);			return vduph_laneq_s16(a, 7);
	}			}


	// CHECK-LABEL: define i32 @test_vdups_laneq_s32(<4 x i32> %a) #0 {			// CHECK-LABEL: define i32 @test_vdups_laneq_s32(<4 x i32> %a) #1 {
	// CHECK: [[TMP0:%.*]] = bitcast <4 x i32> %a to <16 x i8>			// CHECK: [[TMP0:%.*]] = bitcast <4 x i32> %a to <16 x i8>
	// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <4 x i32>			// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <4 x i32>
	// CHECK: [[VGETQ_LANE:%.*]] = extractelement <4 x i32> [[TMP1]], i32 3			// CHECK: [[VGETQ_LANE:%.*]] = extractelement <4 x i32> [[TMP1]], i32 3
	// CHECK: ret i32 [[VGETQ_LANE]]			// CHECK: ret i32 [[VGETQ_LANE]]
	int32_t test_vdups_laneq_s32(int32x4_t a) {			int32_t test_vdups_laneq_s32(int32x4_t a) {
	return vdups_laneq_s32(a, 3);			return vdups_laneq_s32(a, 3);
	}			}


	// CHECK-LABEL: define i64 @test_vdupd_laneq_s64(<2 x i64> %a) #0 {			// CHECK-LABEL: define i64 @test_vdupd_laneq_s64(<2 x i64> %a) #1 {
	// CHECK: [[TMP0:%.*]] = bitcast <2 x i64> %a to <16 x i8>			// CHECK: [[TMP0:%.*]] = bitcast <2 x i64> %a to <16 x i8>
	// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <2 x i64>			// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <2 x i64>
	// CHECK: [[VGETQ_LANE:%.*]] = extractelement <2 x i64> [[TMP1]], i32 1			// CHECK: [[VGETQ_LANE:%.*]] = extractelement <2 x i64> [[TMP1]], i32 1
	// CHECK: ret i64 [[VGETQ_LANE]]			// CHECK: ret i64 [[VGETQ_LANE]]
	int64_t test_vdupd_laneq_s64(int64x2_t a) {			int64_t test_vdupd_laneq_s64(int64x2_t a) {
	return vdupd_laneq_s64(a, 1);			return vdupd_laneq_s64(a, 1);
	}			}


	// CHECK-LABEL: define i8 @test_vdupb_laneq_u8(<16 x i8> %a) #0 {			// CHECK-LABEL: define i8 @test_vdupb_laneq_u8(<16 x i8> %a) #1 {
	// CHECK: [[VGETQ_LANE:%.*]] = extractelement <16 x i8> %a, i32 15			// CHECK: [[VGETQ_LANE:%.*]] = extractelement <16 x i8> %a, i32 15
	// CHECK: ret i8 [[VGETQ_LANE]]			// CHECK: ret i8 [[VGETQ_LANE]]
	uint8_t test_vdupb_laneq_u8(uint8x16_t a) {			uint8_t test_vdupb_laneq_u8(uint8x16_t a) {
	return vdupb_laneq_u8(a, 15);			return vdupb_laneq_u8(a, 15);
	}			}


	// CHECK-LABEL: define i16 @test_vduph_laneq_u16(<8 x i16> %a) #0 {			// CHECK-LABEL: define i16 @test_vduph_laneq_u16(<8 x i16> %a) #1 {
	// CHECK: [[TMP0:%.*]] = bitcast <8 x i16> %a to <16 x i8>			// CHECK: [[TMP0:%.*]] = bitcast <8 x i16> %a to <16 x i8>
	// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <8 x i16>			// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <8 x i16>
	// CHECK: [[VGETQ_LANE:%.*]] = extractelement <8 x i16> [[TMP1]], i32 7			// CHECK: [[VGETQ_LANE:%.*]] = extractelement <8 x i16> [[TMP1]], i32 7
	// CHECK: ret i16 [[VGETQ_LANE]]			// CHECK: ret i16 [[VGETQ_LANE]]
	uint16_t test_vduph_laneq_u16(uint16x8_t a) {			uint16_t test_vduph_laneq_u16(uint16x8_t a) {
	return vduph_laneq_u16(a, 7);			return vduph_laneq_u16(a, 7);
	}			}


	// CHECK-LABEL: define i32 @test_vdups_laneq_u32(<4 x i32> %a) #0 {			// CHECK-LABEL: define i32 @test_vdups_laneq_u32(<4 x i32> %a) #1 {
	// CHECK: [[TMP0:%.*]] = bitcast <4 x i32> %a to <16 x i8>			// CHECK: [[TMP0:%.*]] = bitcast <4 x i32> %a to <16 x i8>
	// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <4 x i32>			// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <4 x i32>
	// CHECK: [[VGETQ_LANE:%.*]] = extractelement <4 x i32> [[TMP1]], i32 3			// CHECK: [[VGETQ_LANE:%.*]] = extractelement <4 x i32> [[TMP1]], i32 3
	// CHECK: ret i32 [[VGETQ_LANE]]			// CHECK: ret i32 [[VGETQ_LANE]]
	uint32_t test_vdups_laneq_u32(uint32x4_t a) {			uint32_t test_vdups_laneq_u32(uint32x4_t a) {
	return vdups_laneq_u32(a, 3);			return vdups_laneq_u32(a, 3);
	}			}


	// CHECK-LABEL: define i64 @test_vdupd_laneq_u64(<2 x i64> %a) #0 {			// CHECK-LABEL: define i64 @test_vdupd_laneq_u64(<2 x i64> %a) #1 {
	// CHECK: [[TMP0:%.*]] = bitcast <2 x i64> %a to <16 x i8>			// CHECK: [[TMP0:%.*]] = bitcast <2 x i64> %a to <16 x i8>
	// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <2 x i64>			// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <2 x i64>
	// CHECK: [[VGETQ_LANE:%.*]] = extractelement <2 x i64> [[TMP1]], i32 1			// CHECK: [[VGETQ_LANE:%.*]] = extractelement <2 x i64> [[TMP1]], i32 1
	// CHECK: ret i64 [[VGETQ_LANE]]			// CHECK: ret i64 [[VGETQ_LANE]]
	uint64_t test_vdupd_laneq_u64(uint64x2_t a) {			uint64_t test_vdupd_laneq_u64(uint64x2_t a) {
	return vdupd_laneq_u64(a, 1);			return vdupd_laneq_u64(a, 1);
	}			}

	// CHECK-LABEL: define i8 @test_vdupb_lane_p8(<8 x i8> %a) #0 {			// CHECK-LABEL: define i8 @test_vdupb_lane_p8(<8 x i8> %a) #0 {
	// CHECK: [[VGET_LANE:%.*]] = extractelement <8 x i8> %a, i32 7			// CHECK: [[VGET_LANE:%.*]] = extractelement <8 x i8> %a, i32 7
	// CHECK: ret i8 [[VGET_LANE]]			// CHECK: ret i8 [[VGET_LANE]]
	poly8_t test_vdupb_lane_p8(poly8x8_t a) {			poly8_t test_vdupb_lane_p8(poly8x8_t a) {
	return vdupb_lane_p8(a, 7);			return vdupb_lane_p8(a, 7);
	}			}

	// CHECK-LABEL: define i16 @test_vduph_lane_p16(<4 x i16> %a) #0 {			// CHECK-LABEL: define i16 @test_vduph_lane_p16(<4 x i16> %a) #0 {
	// CHECK: [[TMP0:%.*]] = bitcast <4 x i16> %a to <8 x i8>			// CHECK: [[TMP0:%.*]] = bitcast <4 x i16> %a to <8 x i8>
	// CHECK: [[TMP1:%.*]] = bitcast <8 x i8> [[TMP0]] to <4 x i16>			// CHECK: [[TMP1:%.*]] = bitcast <8 x i8> [[TMP0]] to <4 x i16>
	// CHECK: [[VGET_LANE:%.*]] = extractelement <4 x i16> [[TMP1]], i32 3			// CHECK: [[VGET_LANE:%.*]] = extractelement <4 x i16> [[TMP1]], i32 3
	// CHECK: ret i16 [[VGET_LANE]]			// CHECK: ret i16 [[VGET_LANE]]
	poly16_t test_vduph_lane_p16(poly16x4_t a) {			poly16_t test_vduph_lane_p16(poly16x4_t a) {
	return vduph_lane_p16(a, 3);			return vduph_lane_p16(a, 3);
	}			}

	// CHECK-LABEL: define i8 @test_vdupb_laneq_p8(<16 x i8> %a) #0 {			// CHECK-LABEL: define i8 @test_vdupb_laneq_p8(<16 x i8> %a) #1 {
	// CHECK: [[VGETQ_LANE:%.*]] = extractelement <16 x i8> %a, i32 15			// CHECK: [[VGETQ_LANE:%.*]] = extractelement <16 x i8> %a, i32 15
	// CHECK: ret i8 [[VGETQ_LANE]]			// CHECK: ret i8 [[VGETQ_LANE]]
	poly8_t test_vdupb_laneq_p8(poly8x16_t a) {			poly8_t test_vdupb_laneq_p8(poly8x16_t a) {
	return vdupb_laneq_p8(a, 15);			return vdupb_laneq_p8(a, 15);
	}			}

	// CHECK-LABEL: define i16 @test_vduph_laneq_p16(<8 x i16> %a) #0 {			// CHECK-LABEL: define i16 @test_vduph_laneq_p16(<8 x i16> %a) #1 {
	// CHECK: [[TMP0:%.*]] = bitcast <8 x i16> %a to <16 x i8>			// CHECK: [[TMP0:%.*]] = bitcast <8 x i16> %a to <16 x i8>
	// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <8 x i16>			// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <8 x i16>
	// CHECK: [[VGETQ_LANE:%.*]] = extractelement <8 x i16> [[TMP1]], i32 7			// CHECK: [[VGETQ_LANE:%.*]] = extractelement <8 x i16> [[TMP1]], i32 7
	// CHECK: ret i16 [[VGETQ_LANE]]			// CHECK: ret i16 [[VGETQ_LANE]]
	poly16_t test_vduph_laneq_p16(poly16x8_t a) {			poly16_t test_vduph_laneq_p16(poly16x8_t a) {
	return vduph_laneq_p16(a, 7);			return vduph_laneq_p16(a, 7);
	}			}

				// CHECK: attributes #0 ={{.*}}"min-legal-vector-width"="64"
				// CHECK: attributes #1 ={{.*}}"min-legal-vector-width"="128"
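
The two attribute checks above pair with the function labels in this file: functions taking a 64-bit vector such as `<8 x i8>` stay on `#0`, and functions taking a 128-bit vector such as `<16 x i8>` move to `#1`. A minimal sketch of the rule these checks appear to exercise follows, assuming the attribute value is the widest vector found among the return and argument types. The `ir_type` struct and both function names are hypothetical and written only for illustration; they are not Clang or LLVM APIs, and the sketch ignores callee types and any vectors used only inside the function body.

```c
#include <stddef.h>

/* Hypothetical, simplified description of an IR type: only what is needed
   to compute the bit width of a fixed-width vector. Not an LLVM structure. */
typedef struct {
    int is_vector;      /* nonzero for a vector type such as <16 x i8> */
    unsigned num_elts;  /* element count, e.g. 16 */
    unsigned elt_bits;  /* bits per element, e.g. 8 */
} ir_type;

/* Bit width of a vector type; 0 for anything that is not a vector. */
static unsigned vector_bits(ir_type t) {
    return t.is_vector ? t.num_elts * t.elt_bits : 0u;
}

/* Widest vector among the return type and the argument types.
   Returns 0 when the signature contains no vector type at all. */
unsigned widest_vector_in_signature(ir_type ret, const ir_type *args,
                                    size_t nargs) {
    unsigned widest = vector_bits(ret);
    for (size_t i = 0; i < nargs; ++i) {
        unsigned bits = vector_bits(args[i]);
        if (bits > widest)
            widest = bits;
    }
    return widest;
}
```

Under these assumptions, `i8 @test_vdupb_lane_p8(<8 x i8> %a)` yields 8 × 8 = 64 and `i8 @test_vdupb_laneq_p8(<16 x i8> %a)` yields 16 × 8 = 128, which matches the `#0` and `#1` values checked above.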

test/CodeGen/aarch64-neon-scalar-x-indexed-elem.c

[... 20 lines not shown ...]
// CHECK: [[TMP1:%.*]] = bitcast <8 x i8> [[TMP0]] to <1 x double>		// CHECK: [[TMP1:%.*]] = bitcast <8 x i8> [[TMP0]] to <1 x double>
// CHECK: [[VGET_LANE:%.*]] = extractelement <1 x double> [[TMP1]], i32 0		// CHECK: [[VGET_LANE:%.*]] = extractelement <1 x double> [[TMP1]], i32 0
// CHECK: [[MUL:%.*]] = fmul double %a, [[VGET_LANE]]		// CHECK: [[MUL:%.*]] = fmul double %a, [[VGET_LANE]]
// CHECK: ret double [[MUL]]		// CHECK: ret double [[MUL]]
float64_t test_vmuld_lane_f64(float64_t a, float64x1_t b) {		float64_t test_vmuld_lane_f64(float64_t a, float64x1_t b) {
return vmuld_lane_f64(a, b, 0);		return vmuld_lane_f64(a, b, 0);
}		}

// CHECK-LABEL: define float @test_vmuls_laneq_f32(float %a, <4 x float> %b) #0 {		// CHECK-LABEL: define float @test_vmuls_laneq_f32(float %a, <4 x float> %b) #1 {
// CHECK: [[TMP0:%.*]] = bitcast <4 x float> %b to <16 x i8>		// CHECK: [[TMP0:%.*]] = bitcast <4 x float> %b to <16 x i8>
// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <4 x float>		// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <4 x float>
// CHECK: [[VGETQ_LANE:%.*]] = extractelement <4 x float> [[TMP1]], i32 3		// CHECK: [[VGETQ_LANE:%.*]] = extractelement <4 x float> [[TMP1]], i32 3
// CHECK: [[MUL:%.*]] = fmul float %a, [[VGETQ_LANE]]		// CHECK: [[MUL:%.*]] = fmul float %a, [[VGETQ_LANE]]
// CHECK: ret float [[MUL]]		// CHECK: ret float [[MUL]]
float32_t test_vmuls_laneq_f32(float32_t a, float32x4_t b) {		float32_t test_vmuls_laneq_f32(float32_t a, float32x4_t b) {
return vmuls_laneq_f32(a, b, 3);		return vmuls_laneq_f32(a, b, 3);
}		}

// CHECK-LABEL: define double @test_vmuld_laneq_f64(double %a, <2 x double> %b) #0 {		// CHECK-LABEL: define double @test_vmuld_laneq_f64(double %a, <2 x double> %b) #1 {
// CHECK: [[TMP0:%.*]] = bitcast <2 x double> %b to <16 x i8>		// CHECK: [[TMP0:%.*]] = bitcast <2 x double> %b to <16 x i8>
// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <2 x double>		// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <2 x double>
// CHECK: [[VGETQ_LANE:%.*]] = extractelement <2 x double> [[TMP1]], i32 1		// CHECK: [[VGETQ_LANE:%.*]] = extractelement <2 x double> [[TMP1]], i32 1
// CHECK: [[MUL:%.*]] = fmul double %a, [[VGETQ_LANE]]		// CHECK: [[MUL:%.*]] = fmul double %a, [[VGETQ_LANE]]
// CHECK: ret double [[MUL]]		// CHECK: ret double [[MUL]]
float64_t test_vmuld_laneq_f64(float64_t a, float64x2_t b) {		float64_t test_vmuld_laneq_f64(float64_t a, float64x2_t b) {
return vmuld_laneq_f64(a, b, 1);		return vmuld_laneq_f64(a, b, 1);
}		}
[... 12 lines not shown ...]
// CHECK: [[TMP1:%.*]] = bitcast <8 x i8> [[TMP0]] to <2 x float>		// CHECK: [[TMP1:%.*]] = bitcast <8 x i8> [[TMP0]] to <2 x float>
// CHECK: [[VGET_LANE:%.*]] = extractelement <2 x float> [[TMP1]], i32 1		// CHECK: [[VGET_LANE:%.*]] = extractelement <2 x float> [[TMP1]], i32 1
// CHECK: [[VMULXS_F32_I:%.*]] = call float @llvm.aarch64.neon.fmulx.f32(float %a, float [[VGET_LANE]])		// CHECK: [[VMULXS_F32_I:%.*]] = call float @llvm.aarch64.neon.fmulx.f32(float %a, float [[VGET_LANE]])
// CHECK: ret float [[VMULXS_F32_I]]		// CHECK: ret float [[VMULXS_F32_I]]
float32_t test_vmulxs_lane_f32(float32_t a, float32x2_t b) {		float32_t test_vmulxs_lane_f32(float32_t a, float32x2_t b) {
return vmulxs_lane_f32(a, b, 1);		return vmulxs_lane_f32(a, b, 1);
}		}

// CHECK-LABEL: define float @test_vmulxs_laneq_f32(float %a, <4 x float> %b) #0 {		// CHECK-LABEL: define float @test_vmulxs_laneq_f32(float %a, <4 x float> %b) #1 {
// CHECK: [[TMP0:%.*]] = bitcast <4 x float> %b to <16 x i8>		// CHECK: [[TMP0:%.*]] = bitcast <4 x float> %b to <16 x i8>
// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <4 x float>		// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <4 x float>
// CHECK: [[VGETQ_LANE:%.*]] = extractelement <4 x float> [[TMP1]], i32 3		// CHECK: [[VGETQ_LANE:%.*]] = extractelement <4 x float> [[TMP1]], i32 3
// CHECK: [[VMULXS_F32_I:%.*]] = call float @llvm.aarch64.neon.fmulx.f32(float %a, float [[VGETQ_LANE]])		// CHECK: [[VMULXS_F32_I:%.*]] = call float @llvm.aarch64.neon.fmulx.f32(float %a, float [[VGETQ_LANE]])
// CHECK: ret float [[VMULXS_F32_I]]		// CHECK: ret float [[VMULXS_F32_I]]
float32_t test_vmulxs_laneq_f32(float32_t a, float32x4_t b) {		float32_t test_vmulxs_laneq_f32(float32_t a, float32x4_t b) {
return vmulxs_laneq_f32(a, b, 3);		return vmulxs_laneq_f32(a, b, 3);
}		}

// CHECK-LABEL: define double @test_vmulxd_lane_f64(double %a, <1 x double> %b) #0 {		// CHECK-LABEL: define double @test_vmulxd_lane_f64(double %a, <1 x double> %b) #0 {
// CHECK: [[TMP0:%.*]] = bitcast <1 x double> %b to <8 x i8>		// CHECK: [[TMP0:%.*]] = bitcast <1 x double> %b to <8 x i8>
// CHECK: [[TMP1:%.*]] = bitcast <8 x i8> [[TMP0]] to <1 x double>		// CHECK: [[TMP1:%.*]] = bitcast <8 x i8> [[TMP0]] to <1 x double>
// CHECK: [[VGET_LANE:%.*]] = extractelement <1 x double> [[TMP1]], i32 0		// CHECK: [[VGET_LANE:%.*]] = extractelement <1 x double> [[TMP1]], i32 0
// CHECK: [[VMULXD_F64_I:%.*]] = call double @llvm.aarch64.neon.fmulx.f64(double %a, double [[VGET_LANE]])		// CHECK: [[VMULXD_F64_I:%.*]] = call double @llvm.aarch64.neon.fmulx.f64(double %a, double [[VGET_LANE]])
// CHECK: ret double [[VMULXD_F64_I]]		// CHECK: ret double [[VMULXD_F64_I]]
float64_t test_vmulxd_lane_f64(float64_t a, float64x1_t b) {		float64_t test_vmulxd_lane_f64(float64_t a, float64x1_t b) {
return vmulxd_lane_f64(a, b, 0);		return vmulxd_lane_f64(a, b, 0);
}		}

// CHECK-LABEL: define double @test_vmulxd_laneq_f64(double %a, <2 x double> %b) #0 {		// CHECK-LABEL: define double @test_vmulxd_laneq_f64(double %a, <2 x double> %b) #1 {
// CHECK: [[TMP0:%.*]] = bitcast <2 x double> %b to <16 x i8>		// CHECK: [[TMP0:%.*]] = bitcast <2 x double> %b to <16 x i8>
// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <2 x double>		// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <2 x double>
// CHECK: [[VGETQ_LANE:%.*]] = extractelement <2 x double> [[TMP1]], i32 1		// CHECK: [[VGETQ_LANE:%.*]] = extractelement <2 x double> [[TMP1]], i32 1
// CHECK: [[VMULXD_F64_I:%.*]] = call double @llvm.aarch64.neon.fmulx.f64(double %a, double [[VGETQ_LANE]])		// CHECK: [[VMULXD_F64_I:%.*]] = call double @llvm.aarch64.neon.fmulx.f64(double %a, double [[VGETQ_LANE]])
// CHECK: ret double [[VMULXD_F64_I]]		// CHECK: ret double [[VMULXD_F64_I]]
float64_t test_vmulxd_laneq_f64(float64_t a, float64x2_t b) {		float64_t test_vmulxd_laneq_f64(float64_t a, float64x2_t b) {
return vmulxd_laneq_f64(a, b, 1);		return vmulxd_laneq_f64(a, b, 1);
}		}
[... 10 lines not shown ...]
// CHECK: [[TMP5:%.*]] = bitcast <8 x i8> [[TMP4]] to <1 x double>		// CHECK: [[TMP5:%.*]] = bitcast <8 x i8> [[TMP4]] to <1 x double>
// CHECK: [[VSET_LANE:%.*]] = insertelement <1 x double> [[TMP5]], double [[VMULXD_F64_I]], i32 0		// CHECK: [[VSET_LANE:%.*]] = insertelement <1 x double> [[TMP5]], double [[VMULXD_F64_I]], i32 0
// CHECK: ret <1 x double> [[VSET_LANE]]		// CHECK: ret <1 x double> [[VSET_LANE]]
float64x1_t test_vmulx_lane_f64(float64x1_t a, float64x1_t b) {		float64x1_t test_vmulx_lane_f64(float64x1_t a, float64x1_t b) {
return vmulx_lane_f64(a, b, 0);		return vmulx_lane_f64(a, b, 0);
}		}


// CHECK-LABEL: define <1 x double> @test_vmulx_laneq_f64_0(<1 x double> %a, <2 x double> %b) #0 {		// CHECK-LABEL: define <1 x double> @test_vmulx_laneq_f64_0(<1 x double> %a, <2 x double> %b) #1 {
// CHECK: [[TMP0:%.*]] = bitcast <1 x double> %a to <8 x i8>		// CHECK: [[TMP0:%.*]] = bitcast <1 x double> %a to <8 x i8>
// CHECK: [[TMP1:%.*]] = bitcast <8 x i8> [[TMP0]] to <1 x double>		// CHECK: [[TMP1:%.*]] = bitcast <8 x i8> [[TMP0]] to <1 x double>
// CHECK: [[VGET_LANE:%.*]] = extractelement <1 x double> [[TMP1]], i32 0		// CHECK: [[VGET_LANE:%.*]] = extractelement <1 x double> [[TMP1]], i32 0
// CHECK: [[TMP2:%.*]] = bitcast <2 x double> %b to <16 x i8>		// CHECK: [[TMP2:%.*]] = bitcast <2 x double> %b to <16 x i8>
// CHECK: [[TMP3:%.*]] = bitcast <16 x i8> [[TMP2]] to <2 x double>		// CHECK: [[TMP3:%.*]] = bitcast <16 x i8> [[TMP2]] to <2 x double>
// CHECK: [[VGETQ_LANE:%.*]] = extractelement <2 x double> [[TMP3]], i32 0		// CHECK: [[VGETQ_LANE:%.*]] = extractelement <2 x double> [[TMP3]], i32 0
// CHECK: [[VMULXD_F64_I:%.*]] = call double @llvm.aarch64.neon.fmulx.f64(double [[VGET_LANE]], double [[VGETQ_LANE]])		// CHECK: [[VMULXD_F64_I:%.*]] = call double @llvm.aarch64.neon.fmulx.f64(double [[VGET_LANE]], double [[VGETQ_LANE]])
// CHECK: [[TMP4:%.*]] = bitcast <1 x double> %a to <8 x i8>		// CHECK: [[TMP4:%.*]] = bitcast <1 x double> %a to <8 x i8>
// CHECK: [[TMP5:%.*]] = bitcast <8 x i8> [[TMP4]] to <1 x double>		// CHECK: [[TMP5:%.*]] = bitcast <8 x i8> [[TMP4]] to <1 x double>
// CHECK: [[VSET_LANE:%.*]] = insertelement <1 x double> [[TMP5]], double [[VMULXD_F64_I]], i32 0		// CHECK: [[VSET_LANE:%.*]] = insertelement <1 x double> [[TMP5]], double [[VMULXD_F64_I]], i32 0
// CHECK: ret <1 x double> [[VSET_LANE]]		// CHECK: ret <1 x double> [[VSET_LANE]]
float64x1_t test_vmulx_laneq_f64_0(float64x1_t a, float64x2_t b) {		float64x1_t test_vmulx_laneq_f64_0(float64x1_t a, float64x2_t b) {
return vmulx_laneq_f64(a, b, 0);		return vmulx_laneq_f64(a, b, 0);
}		}

// CHECK-LABEL: define <1 x double> @test_vmulx_laneq_f64_1(<1 x double> %a, <2 x double> %b) #0 {		// CHECK-LABEL: define <1 x double> @test_vmulx_laneq_f64_1(<1 x double> %a, <2 x double> %b) #1 {
// CHECK: [[TMP0:%.*]] = bitcast <1 x double> %a to <8 x i8>		// CHECK: [[TMP0:%.*]] = bitcast <1 x double> %a to <8 x i8>
// CHECK: [[TMP1:%.*]] = bitcast <8 x i8> [[TMP0]] to <1 x double>		// CHECK: [[TMP1:%.*]] = bitcast <8 x i8> [[TMP0]] to <1 x double>
// CHECK: [[VGET_LANE:%.*]] = extractelement <1 x double> [[TMP1]], i32 0		// CHECK: [[VGET_LANE:%.*]] = extractelement <1 x double> [[TMP1]], i32 0
// CHECK: [[TMP2:%.*]] = bitcast <2 x double> %b to <16 x i8>		// CHECK: [[TMP2:%.*]] = bitcast <2 x double> %b to <16 x i8>
// CHECK: [[TMP3:%.*]] = bitcast <16 x i8> [[TMP2]] to <2 x double>		// CHECK: [[TMP3:%.*]] = bitcast <16 x i8> [[TMP2]] to <2 x double>
// CHECK: [[VGETQ_LANE:%.*]] = extractelement <2 x double> [[TMP3]], i32 1		// CHECK: [[VGETQ_LANE:%.*]] = extractelement <2 x double> [[TMP3]], i32 1
// CHECK: [[VMULXD_F64_I:%.*]] = call double @llvm.aarch64.neon.fmulx.f64(double [[VGET_LANE]], double [[VGETQ_LANE]])		// CHECK: [[VMULXD_F64_I:%.*]] = call double @llvm.aarch64.neon.fmulx.f64(double [[VGET_LANE]], double [[VGETQ_LANE]])
// CHECK: [[TMP4:%.*]] = bitcast <1 x double> %a to <8 x i8>		// CHECK: [[TMP4:%.*]] = bitcast <1 x double> %a to <8 x i8>
Show All 20 Lines
// CHECK: [[TMP1:%.*]] = bitcast <8 x i8> [[TMP0]] to <1 x double>		// CHECK: [[TMP1:%.*]] = bitcast <8 x i8> [[TMP0]] to <1 x double>
// CHECK: [[EXTRACT:%.*]] = extractelement <1 x double> [[TMP1]], i32 0		// CHECK: [[EXTRACT:%.*]] = extractelement <1 x double> [[TMP1]], i32 0
// CHECK: [[TMP2:%.*]] = call double @llvm.fma.f64(double %b, double [[EXTRACT]], double %a)		// CHECK: [[TMP2:%.*]] = call double @llvm.fma.f64(double %b, double [[EXTRACT]], double %a)
// CHECK: ret double [[TMP2]]		// CHECK: ret double [[TMP2]]
float64_t test_vfmad_lane_f64(float64_t a, float64_t b, float64x1_t c) {		float64_t test_vfmad_lane_f64(float64_t a, float64_t b, float64x1_t c) {
return vfmad_lane_f64(a, b, c, 0);		return vfmad_lane_f64(a, b, c, 0);
}		}

// CHECK-LABEL: define double @test_vfmad_laneq_f64(double %a, double %b, <2 x double> %c) #0 {		// CHECK-LABEL: define double @test_vfmad_laneq_f64(double %a, double %b, <2 x double> %c) #1 {
// CHECK: [[TMP0:%.*]] = bitcast <2 x double> %c to <16 x i8>		// CHECK: [[TMP0:%.*]] = bitcast <2 x double> %c to <16 x i8>
// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <2 x double>		// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <2 x double>
// CHECK: [[EXTRACT:%.*]] = extractelement <2 x double> [[TMP1]], i32 1		// CHECK: [[EXTRACT:%.*]] = extractelement <2 x double> [[TMP1]], i32 1
// CHECK: [[TMP2:%.*]] = call double @llvm.fma.f64(double %b, double [[EXTRACT]], double %a)		// CHECK: [[TMP2:%.*]] = call double @llvm.fma.f64(double %b, double [[EXTRACT]], double %a)
// CHECK: ret double [[TMP2]]		// CHECK: ret double [[TMP2]]
float64_t test_vfmad_laneq_f64(float64_t a, float64_t b, float64x2_t c) {		float64_t test_vfmad_laneq_f64(float64_t a, float64_t b, float64x2_t c) {
return vfmad_laneq_f64(a, b, c, 1);		return vfmad_laneq_f64(a, b, c, 1);
}		}
Show All 33 Lines
// CHECK: [[FMLA:%.*]] = bitcast <8 x i8> [[TMP1]] to <1 x double>		// CHECK: [[FMLA:%.*]] = bitcast <8 x i8> [[TMP1]] to <1 x double>
// CHECK: [[FMLA1:%.*]] = bitcast <8 x i8> [[TMP0]] to <1 x double>		// CHECK: [[FMLA1:%.*]] = bitcast <8 x i8> [[TMP0]] to <1 x double>
// CHECK: [[FMLA2:%.*]] = call <1 x double> @llvm.fma.v1f64(<1 x double> [[FMLA]], <1 x double> [[LANE]], <1 x double> [[FMLA1]])		// CHECK: [[FMLA2:%.*]] = call <1 x double> @llvm.fma.v1f64(<1 x double> [[FMLA]], <1 x double> [[LANE]], <1 x double> [[FMLA1]])
// CHECK: ret <1 x double> [[FMLA2]]		// CHECK: ret <1 x double> [[FMLA2]]
float64x1_t test_vfms_lane_f64(float64x1_t a, float64x1_t b, float64x1_t v) {		float64x1_t test_vfms_lane_f64(float64x1_t a, float64x1_t b, float64x1_t v) {
return vfms_lane_f64(a, b, v, 0);		return vfms_lane_f64(a, b, v, 0);
}		}

// CHECK-LABEL: define <1 x double> @test_vfma_laneq_f64(<1 x double> %a, <1 x double> %b, <2 x double> %v) #0 {		// CHECK-LABEL: define <1 x double> @test_vfma_laneq_f64(<1 x double> %a, <1 x double> %b, <2 x double> %v) #1 {
// CHECK: [[TMP0:%.*]] = bitcast <1 x double> %a to <8 x i8>		// CHECK: [[TMP0:%.*]] = bitcast <1 x double> %a to <8 x i8>
// CHECK: [[TMP1:%.*]] = bitcast <1 x double> %b to <8 x i8>		// CHECK: [[TMP1:%.*]] = bitcast <1 x double> %b to <8 x i8>
// CHECK: [[TMP2:%.*]] = bitcast <2 x double> %v to <16 x i8>		// CHECK: [[TMP2:%.*]] = bitcast <2 x double> %v to <16 x i8>
// CHECK: [[TMP3:%.*]] = bitcast <8 x i8> [[TMP0]] to double		// CHECK: [[TMP3:%.*]] = bitcast <8 x i8> [[TMP0]] to double
// CHECK: [[TMP4:%.*]] = bitcast <8 x i8> [[TMP1]] to double		// CHECK: [[TMP4:%.*]] = bitcast <8 x i8> [[TMP1]] to double
// CHECK: [[TMP5:%.*]] = bitcast <16 x i8> [[TMP2]] to <2 x double>		// CHECK: [[TMP5:%.*]] = bitcast <16 x i8> [[TMP2]] to <2 x double>
// CHECK: [[EXTRACT:%.*]] = extractelement <2 x double> [[TMP5]], i32 0		// CHECK: [[EXTRACT:%.*]] = extractelement <2 x double> [[TMP5]], i32 0
// CHECK: [[TMP6:%.*]] = call double @llvm.fma.f64(double [[TMP4]], double [[EXTRACT]], double [[TMP3]])		// CHECK: [[TMP6:%.*]] = call double @llvm.fma.f64(double [[TMP4]], double [[EXTRACT]], double [[TMP3]])
// CHECK: [[TMP7:%.*]] = bitcast double [[TMP6]] to <1 x double>		// CHECK: [[TMP7:%.*]] = bitcast double [[TMP6]] to <1 x double>
// CHECK: ret <1 x double> [[TMP7]]		// CHECK: ret <1 x double> [[TMP7]]
float64x1_t test_vfma_laneq_f64(float64x1_t a, float64x1_t b, float64x2_t v) {		float64x1_t test_vfma_laneq_f64(float64x1_t a, float64x1_t b, float64x2_t v) {
return vfma_laneq_f64(a, b, v, 0);		return vfma_laneq_f64(a, b, v, 0);
}		}

// CHECK-LABEL: define <1 x double> @test_vfms_laneq_f64(<1 x double> %a, <1 x double> %b, <2 x double> %v) #0 {		// CHECK-LABEL: define <1 x double> @test_vfms_laneq_f64(<1 x double> %a, <1 x double> %b, <2 x double> %v) #1 {
// CHECK: [[SUB:%.*]] = fsub <1 x double> <double -0.000000e+00>, %b		// CHECK: [[SUB:%.*]] = fsub <1 x double> <double -0.000000e+00>, %b
// CHECK: [[TMP0:%.*]] = bitcast <1 x double> %a to <8 x i8>		// CHECK: [[TMP0:%.*]] = bitcast <1 x double> %a to <8 x i8>
// CHECK: [[TMP1:%.*]] = bitcast <1 x double> [[SUB]] to <8 x i8>		// CHECK: [[TMP1:%.*]] = bitcast <1 x double> [[SUB]] to <8 x i8>
// CHECK: [[TMP2:%.*]] = bitcast <2 x double> %v to <16 x i8>		// CHECK: [[TMP2:%.*]] = bitcast <2 x double> %v to <16 x i8>
// CHECK: [[TMP3:%.*]] = bitcast <8 x i8> [[TMP0]] to double		// CHECK: [[TMP3:%.*]] = bitcast <8 x i8> [[TMP0]] to double
// CHECK: [[TMP4:%.*]] = bitcast <8 x i8> [[TMP1]] to double		// CHECK: [[TMP4:%.*]] = bitcast <8 x i8> [[TMP1]] to double
// CHECK: [[TMP5:%.*]] = bitcast <16 x i8> [[TMP2]] to <2 x double>		// CHECK: [[TMP5:%.*]] = bitcast <16 x i8> [[TMP2]] to <2 x double>
// CHECK: [[EXTRACT:%.*]] = extractelement <2 x double> [[TMP5]], i32 0		// CHECK: [[EXTRACT:%.*]] = extractelement <2 x double> [[TMP5]], i32 0
Show All 22 Lines
// CHECK: [[TMP1:%.*]] = bitcast <8 x i8> [[TMP0]] to <2 x i32>		// CHECK: [[TMP1:%.*]] = bitcast <8 x i8> [[TMP0]] to <2 x i32>
// CHECK: [[VGET_LANE:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1		// CHECK: [[VGET_LANE:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
// CHECK: [[VQDMULLS_S32_I:%.*]] = call i64 @llvm.aarch64.neon.sqdmulls.scalar(i32 %a, i32 [[VGET_LANE]])		// CHECK: [[VQDMULLS_S32_I:%.*]] = call i64 @llvm.aarch64.neon.sqdmulls.scalar(i32 %a, i32 [[VGET_LANE]])
// CHECK: ret i64 [[VQDMULLS_S32_I]]		// CHECK: ret i64 [[VQDMULLS_S32_I]]
int64_t test_vqdmulls_lane_s32(int32_t a, int32x2_t b) {		int64_t test_vqdmulls_lane_s32(int32_t a, int32x2_t b) {
return vqdmulls_lane_s32(a, b, 1);		return vqdmulls_lane_s32(a, b, 1);
}		}

// CHECK-LABEL: define i32 @test_vqdmullh_laneq_s16(i16 %a, <8 x i16> %b) #0 {		// CHECK-LABEL: define i32 @test_vqdmullh_laneq_s16(i16 %a, <8 x i16> %b) #1 {
// CHECK: [[TMP0:%.*]] = bitcast <8 x i16> %b to <16 x i8>		// CHECK: [[TMP0:%.*]] = bitcast <8 x i16> %b to <16 x i8>
// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <8 x i16>		// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <8 x i16>
// CHECK: [[VGETQ_LANE:%.*]] = extractelement <8 x i16> [[TMP1]], i32 7		// CHECK: [[VGETQ_LANE:%.*]] = extractelement <8 x i16> [[TMP1]], i32 7
// CHECK: [[TMP2:%.*]] = insertelement <4 x i16> undef, i16 %a, i64 0		// CHECK: [[TMP2:%.*]] = insertelement <4 x i16> undef, i16 %a, i64 0
// CHECK: [[TMP3:%.*]] = insertelement <4 x i16> undef, i16 [[VGETQ_LANE]], i64 0		// CHECK: [[TMP3:%.*]] = insertelement <4 x i16> undef, i16 [[VGETQ_LANE]], i64 0
// CHECK: [[VQDMULLH_S16_I:%.*]] = call <4 x i32> @llvm.aarch64.neon.sqdmull.v4i32(<4 x i16> [[TMP2]], <4 x i16> [[TMP3]])		// CHECK: [[VQDMULLH_S16_I:%.*]] = call <4 x i32> @llvm.aarch64.neon.sqdmull.v4i32(<4 x i16> [[TMP2]], <4 x i16> [[TMP3]])
// CHECK: [[TMP4:%.*]] = extractelement <4 x i32> [[VQDMULLH_S16_I]], i64 0		// CHECK: [[TMP4:%.*]] = extractelement <4 x i32> [[VQDMULLH_S16_I]], i64 0
// CHECK: ret i32 [[TMP4]]		// CHECK: ret i32 [[TMP4]]
int32_t test_vqdmullh_laneq_s16(int16_t a, int16x8_t b) {		int32_t test_vqdmullh_laneq_s16(int16_t a, int16x8_t b) {
return vqdmullh_laneq_s16(a, b, 7);		return vqdmullh_laneq_s16(a, b, 7);
}		}

// CHECK-LABEL: define i64 @test_vqdmulls_laneq_s32(i32 %a, <4 x i32> %b) #0 {		// CHECK-LABEL: define i64 @test_vqdmulls_laneq_s32(i32 %a, <4 x i32> %b) #1 {
// CHECK: [[TMP0:%.*]] = bitcast <4 x i32> %b to <16 x i8>		// CHECK: [[TMP0:%.*]] = bitcast <4 x i32> %b to <16 x i8>
// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <4 x i32>		// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <4 x i32>
// CHECK: [[VGETQ_LANE:%.*]] = extractelement <4 x i32> [[TMP1]], i32 3		// CHECK: [[VGETQ_LANE:%.*]] = extractelement <4 x i32> [[TMP1]], i32 3
// CHECK: [[VQDMULLS_S32_I:%.*]] = call i64 @llvm.aarch64.neon.sqdmulls.scalar(i32 %a, i32 [[VGETQ_LANE]])		// CHECK: [[VQDMULLS_S32_I:%.*]] = call i64 @llvm.aarch64.neon.sqdmulls.scalar(i32 %a, i32 [[VGETQ_LANE]])
// CHECK: ret i64 [[VQDMULLS_S32_I]]		// CHECK: ret i64 [[VQDMULLS_S32_I]]
int64_t test_vqdmulls_laneq_s32(int32_t a, int32x4_t b) {		int64_t test_vqdmulls_laneq_s32(int32_t a, int32x4_t b) {
return vqdmulls_laneq_s32(a, b, 3);		return vqdmulls_laneq_s32(a, b, 3);
}		}
Show All 17 Lines
// CHECK: [[VGET_LANE:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1		// CHECK: [[VGET_LANE:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
// CHECK: [[VQDMULHS_S32_I:%.*]] = call i32 @llvm.aarch64.neon.sqdmulh.i32(i32 %a, i32 [[VGET_LANE]])		// CHECK: [[VQDMULHS_S32_I:%.*]] = call i32 @llvm.aarch64.neon.sqdmulh.i32(i32 %a, i32 [[VGET_LANE]])
// CHECK: ret i32 [[VQDMULHS_S32_I]]		// CHECK: ret i32 [[VQDMULHS_S32_I]]
int32_t test_vqdmulhs_lane_s32(int32_t a, int32x2_t b) {		int32_t test_vqdmulhs_lane_s32(int32_t a, int32x2_t b) {
return vqdmulhs_lane_s32(a, b, 1);		return vqdmulhs_lane_s32(a, b, 1);
}		}


// CHECK-LABEL: define i16 @test_vqdmulhh_laneq_s16(i16 %a, <8 x i16> %b) #0 {		// CHECK-LABEL: define i16 @test_vqdmulhh_laneq_s16(i16 %a, <8 x i16> %b) #1 {
// CHECK: [[TMP0:%.*]] = bitcast <8 x i16> %b to <16 x i8>		// CHECK: [[TMP0:%.*]] = bitcast <8 x i16> %b to <16 x i8>
// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <8 x i16>		// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <8 x i16>
// CHECK: [[VGETQ_LANE:%.*]] = extractelement <8 x i16> [[TMP1]], i32 7		// CHECK: [[VGETQ_LANE:%.*]] = extractelement <8 x i16> [[TMP1]], i32 7
// CHECK: [[TMP2:%.*]] = insertelement <4 x i16> undef, i16 %a, i64 0		// CHECK: [[TMP2:%.*]] = insertelement <4 x i16> undef, i16 %a, i64 0
// CHECK: [[TMP3:%.*]] = insertelement <4 x i16> undef, i16 [[VGETQ_LANE]], i64 0		// CHECK: [[TMP3:%.*]] = insertelement <4 x i16> undef, i16 [[VGETQ_LANE]], i64 0
// CHECK: [[VQDMULHH_S16_I:%.*]] = call <4 x i16> @llvm.aarch64.neon.sqdmulh.v4i16(<4 x i16> [[TMP2]], <4 x i16> [[TMP3]])		// CHECK: [[VQDMULHH_S16_I:%.*]] = call <4 x i16> @llvm.aarch64.neon.sqdmulh.v4i16(<4 x i16> [[TMP2]], <4 x i16> [[TMP3]])
// CHECK: [[TMP4:%.*]] = extractelement <4 x i16> [[VQDMULHH_S16_I]], i64 0		// CHECK: [[TMP4:%.*]] = extractelement <4 x i16> [[VQDMULHH_S16_I]], i64 0
// CHECK: ret i16 [[TMP4]]		// CHECK: ret i16 [[TMP4]]
int16_t test_vqdmulhh_laneq_s16(int16_t a, int16x8_t b) {		int16_t test_vqdmulhh_laneq_s16(int16_t a, int16x8_t b) {
return vqdmulhh_laneq_s16(a, b, 7);		return vqdmulhh_laneq_s16(a, b, 7);
}		}


// CHECK-LABEL: define i32 @test_vqdmulhs_laneq_s32(i32 %a, <4 x i32> %b) #0 {		// CHECK-LABEL: define i32 @test_vqdmulhs_laneq_s32(i32 %a, <4 x i32> %b) #1 {
// CHECK: [[TMP0:%.*]] = bitcast <4 x i32> %b to <16 x i8>		// CHECK: [[TMP0:%.*]] = bitcast <4 x i32> %b to <16 x i8>
// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <4 x i32>		// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <4 x i32>
// CHECK: [[VGETQ_LANE:%.*]] = extractelement <4 x i32> [[TMP1]], i32 3		// CHECK: [[VGETQ_LANE:%.*]] = extractelement <4 x i32> [[TMP1]], i32 3
// CHECK: [[VQDMULHS_S32_I:%.*]] = call i32 @llvm.aarch64.neon.sqdmulh.i32(i32 %a, i32 [[VGETQ_LANE]])		// CHECK: [[VQDMULHS_S32_I:%.*]] = call i32 @llvm.aarch64.neon.sqdmulh.i32(i32 %a, i32 [[VGETQ_LANE]])
// CHECK: ret i32 [[VQDMULHS_S32_I]]		// CHECK: ret i32 [[VQDMULHS_S32_I]]
int32_t test_vqdmulhs_laneq_s32(int32_t a, int32x4_t b) {		int32_t test_vqdmulhs_laneq_s32(int32_t a, int32x4_t b) {
return vqdmulhs_laneq_s32(a, b, 3);		return vqdmulhs_laneq_s32(a, b, 3);
}		}
Show All 17 Lines
// CHECK: [[VGET_LANE:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1		// CHECK: [[VGET_LANE:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
// CHECK: [[VQRDMULHS_S32_I:%.*]] = call i32 @llvm.aarch64.neon.sqrdmulh.i32(i32 %a, i32 [[VGET_LANE]])		// CHECK: [[VQRDMULHS_S32_I:%.*]] = call i32 @llvm.aarch64.neon.sqrdmulh.i32(i32 %a, i32 [[VGET_LANE]])
// CHECK: ret i32 [[VQRDMULHS_S32_I]]		// CHECK: ret i32 [[VQRDMULHS_S32_I]]
int32_t test_vqrdmulhs_lane_s32(int32_t a, int32x2_t b) {		int32_t test_vqrdmulhs_lane_s32(int32_t a, int32x2_t b) {
return vqrdmulhs_lane_s32(a, b, 1);		return vqrdmulhs_lane_s32(a, b, 1);
}		}


// CHECK-LABEL: define i16 @test_vqrdmulhh_laneq_s16(i16 %a, <8 x i16> %b) #0 {		// CHECK-LABEL: define i16 @test_vqrdmulhh_laneq_s16(i16 %a, <8 x i16> %b) #1 {
// CHECK: [[TMP0:%.*]] = bitcast <8 x i16> %b to <16 x i8>		// CHECK: [[TMP0:%.*]] = bitcast <8 x i16> %b to <16 x i8>
// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <8 x i16>		// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <8 x i16>
// CHECK: [[VGETQ_LANE:%.*]] = extractelement <8 x i16> [[TMP1]], i32 7		// CHECK: [[VGETQ_LANE:%.*]] = extractelement <8 x i16> [[TMP1]], i32 7
// CHECK: [[TMP2:%.*]] = insertelement <4 x i16> undef, i16 %a, i64 0		// CHECK: [[TMP2:%.*]] = insertelement <4 x i16> undef, i16 %a, i64 0
// CHECK: [[TMP3:%.*]] = insertelement <4 x i16> undef, i16 [[VGETQ_LANE]], i64 0		// CHECK: [[TMP3:%.*]] = insertelement <4 x i16> undef, i16 [[VGETQ_LANE]], i64 0
// CHECK: [[VQRDMULHH_S16_I:%.*]] = call <4 x i16> @llvm.aarch64.neon.sqrdmulh.v4i16(<4 x i16> [[TMP2]], <4 x i16> [[TMP3]])		// CHECK: [[VQRDMULHH_S16_I:%.*]] = call <4 x i16> @llvm.aarch64.neon.sqrdmulh.v4i16(<4 x i16> [[TMP2]], <4 x i16> [[TMP3]])
// CHECK: [[TMP4:%.*]] = extractelement <4 x i16> [[VQRDMULHH_S16_I]], i64 0		// CHECK: [[TMP4:%.*]] = extractelement <4 x i16> [[VQRDMULHH_S16_I]], i64 0
// CHECK: ret i16 [[TMP4]]		// CHECK: ret i16 [[TMP4]]
int16_t test_vqrdmulhh_laneq_s16(int16_t a, int16x8_t b) {		int16_t test_vqrdmulhh_laneq_s16(int16_t a, int16x8_t b) {
return vqrdmulhh_laneq_s16(a, b, 7);		return vqrdmulhh_laneq_s16(a, b, 7);
}		}


// CHECK-LABEL: define i32 @test_vqrdmulhs_laneq_s32(i32 %a, <4 x i32> %b) #0 {		// CHECK-LABEL: define i32 @test_vqrdmulhs_laneq_s32(i32 %a, <4 x i32> %b) #1 {
// CHECK: [[TMP0:%.*]] = bitcast <4 x i32> %b to <16 x i8>		// CHECK: [[TMP0:%.*]] = bitcast <4 x i32> %b to <16 x i8>
// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <4 x i32>		// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <4 x i32>
// CHECK: [[VGETQ_LANE:%.*]] = extractelement <4 x i32> [[TMP1]], i32 3		// CHECK: [[VGETQ_LANE:%.*]] = extractelement <4 x i32> [[TMP1]], i32 3
// CHECK: [[VQRDMULHS_S32_I:%.*]] = call i32 @llvm.aarch64.neon.sqrdmulh.i32(i32 %a, i32 [[VGETQ_LANE]])		// CHECK: [[VQRDMULHS_S32_I:%.*]] = call i32 @llvm.aarch64.neon.sqrdmulh.i32(i32 %a, i32 [[VGETQ_LANE]])
// CHECK: ret i32 [[VQRDMULHS_S32_I]]		// CHECK: ret i32 [[VQRDMULHS_S32_I]]
int32_t test_vqrdmulhs_laneq_s32(int32_t a, int32x4_t b) {		int32_t test_vqrdmulhs_laneq_s32(int32_t a, int32x4_t b) {
return vqrdmulhs_laneq_s32(a, b, 3);		return vqrdmulhs_laneq_s32(a, b, 3);
}		}
Show All 18 Lines
// CHECK: [[LANE:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1		// CHECK: [[LANE:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
// CHECK: [[VQDMLXL:%.*]] = call i64 @llvm.aarch64.neon.sqdmulls.scalar(i32 %b, i32 [[LANE]])		// CHECK: [[VQDMLXL:%.*]] = call i64 @llvm.aarch64.neon.sqdmulls.scalar(i32 %b, i32 [[LANE]])
// CHECK: [[VQDMLXL1:%.*]] = call i64 @llvm.aarch64.neon.sqadd.i64(i64 %a, i64 [[VQDMLXL]])		// CHECK: [[VQDMLXL1:%.*]] = call i64 @llvm.aarch64.neon.sqadd.i64(i64 %a, i64 [[VQDMLXL]])
// CHECK: ret i64 [[VQDMLXL1]]		// CHECK: ret i64 [[VQDMLXL1]]
int64_t test_vqdmlals_lane_s32(int64_t a, int32_t b, int32x2_t c) {		int64_t test_vqdmlals_lane_s32(int64_t a, int32_t b, int32x2_t c) {
return vqdmlals_lane_s32(a, b, c, 1);		return vqdmlals_lane_s32(a, b, c, 1);
}		}

// CHECK-LABEL: define i32 @test_vqdmlalh_laneq_s16(i32 %a, i16 %b, <8 x i16> %c) #0 {		// CHECK-LABEL: define i32 @test_vqdmlalh_laneq_s16(i32 %a, i16 %b, <8 x i16> %c) #1 {
// CHECK: [[TMP0:%.*]] = bitcast <8 x i16> %c to <16 x i8>		// CHECK: [[TMP0:%.*]] = bitcast <8 x i16> %c to <16 x i8>
// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <8 x i16>		// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <8 x i16>
// CHECK: [[LANE:%.*]] = extractelement <8 x i16> [[TMP1]], i32 7		// CHECK: [[LANE:%.*]] = extractelement <8 x i16> [[TMP1]], i32 7
// CHECK: [[TMP2:%.*]] = insertelement <4 x i16> undef, i16 %b, i64 0		// CHECK: [[TMP2:%.*]] = insertelement <4 x i16> undef, i16 %b, i64 0
// CHECK: [[TMP3:%.*]] = insertelement <4 x i16> undef, i16 [[LANE]], i64 0		// CHECK: [[TMP3:%.*]] = insertelement <4 x i16> undef, i16 [[LANE]], i64 0
// CHECK: [[VQDMLXL:%.*]] = call <4 x i32> @llvm.aarch64.neon.sqdmull.v4i32(<4 x i16> [[TMP2]], <4 x i16> [[TMP3]])		// CHECK: [[VQDMLXL:%.*]] = call <4 x i32> @llvm.aarch64.neon.sqdmull.v4i32(<4 x i16> [[TMP2]], <4 x i16> [[TMP3]])
// CHECK: [[LANE0:%.*]] = extractelement <4 x i32> [[VQDMLXL]], i64 0		// CHECK: [[LANE0:%.*]] = extractelement <4 x i32> [[VQDMLXL]], i64 0
// CHECK: [[VQDMLXL1:%.*]] = call i32 @llvm.aarch64.neon.sqadd.i32(i32 %a, i32 [[LANE0]])		// CHECK: [[VQDMLXL1:%.*]] = call i32 @llvm.aarch64.neon.sqadd.i32(i32 %a, i32 [[LANE0]])
// CHECK: ret i32 [[VQDMLXL1]]		// CHECK: ret i32 [[VQDMLXL1]]
int32_t test_vqdmlalh_laneq_s16(int32_t a, int16_t b, int16x8_t c) {		int32_t test_vqdmlalh_laneq_s16(int32_t a, int16_t b, int16x8_t c) {
return vqdmlalh_laneq_s16(a, b, c, 7);		return vqdmlalh_laneq_s16(a, b, c, 7);
}		}

// CHECK-LABEL: define i64 @test_vqdmlals_laneq_s32(i64 %a, i32 %b, <4 x i32> %c) #0 {		// CHECK-LABEL: define i64 @test_vqdmlals_laneq_s32(i64 %a, i32 %b, <4 x i32> %c) #1 {
// CHECK: [[TMP0:%.*]] = bitcast <4 x i32> %c to <16 x i8>		// CHECK: [[TMP0:%.*]] = bitcast <4 x i32> %c to <16 x i8>
// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <4 x i32>		// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <4 x i32>
// CHECK: [[LANE:%.*]] = extractelement <4 x i32> [[TMP1]], i32 3		// CHECK: [[LANE:%.*]] = extractelement <4 x i32> [[TMP1]], i32 3
// CHECK: [[VQDMLXL:%.*]] = call i64 @llvm.aarch64.neon.sqdmulls.scalar(i32 %b, i32 [[LANE]])		// CHECK: [[VQDMLXL:%.*]] = call i64 @llvm.aarch64.neon.sqdmulls.scalar(i32 %b, i32 [[LANE]])
// CHECK: [[VQDMLXL1:%.*]] = call i64 @llvm.aarch64.neon.sqadd.i64(i64 %a, i64 [[VQDMLXL]])		// CHECK: [[VQDMLXL1:%.*]] = call i64 @llvm.aarch64.neon.sqadd.i64(i64 %a, i64 [[VQDMLXL]])
// CHECK: ret i64 [[VQDMLXL1]]		// CHECK: ret i64 [[VQDMLXL1]]
int64_t test_vqdmlals_laneq_s32(int64_t a, int32_t b, int32x4_t c) {		int64_t test_vqdmlals_laneq_s32(int64_t a, int32_t b, int32x4_t c) {
return vqdmlals_laneq_s32(a, b, c, 3);		return vqdmlals_laneq_s32(a, b, c, 3);
Show All 19 Lines
// CHECK: [[LANE:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1		// CHECK: [[LANE:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
// CHECK: [[VQDMLXL:%.*]] = call i64 @llvm.aarch64.neon.sqdmulls.scalar(i32 %b, i32 [[LANE]])		// CHECK: [[VQDMLXL:%.*]] = call i64 @llvm.aarch64.neon.sqdmulls.scalar(i32 %b, i32 [[LANE]])
// CHECK: [[VQDMLXL1:%.*]] = call i64 @llvm.aarch64.neon.sqsub.i64(i64 %a, i64 [[VQDMLXL]])		// CHECK: [[VQDMLXL1:%.*]] = call i64 @llvm.aarch64.neon.sqsub.i64(i64 %a, i64 [[VQDMLXL]])
// CHECK: ret i64 [[VQDMLXL1]]		// CHECK: ret i64 [[VQDMLXL1]]
int64_t test_vqdmlsls_lane_s32(int64_t a, int32_t b, int32x2_t c) {		int64_t test_vqdmlsls_lane_s32(int64_t a, int32_t b, int32x2_t c) {
return vqdmlsls_lane_s32(a, b, c, 1);		return vqdmlsls_lane_s32(a, b, c, 1);
}		}

// CHECK-LABEL: define i32 @test_vqdmlslh_laneq_s16(i32 %a, i16 %b, <8 x i16> %c) #0 {		// CHECK-LABEL: define i32 @test_vqdmlslh_laneq_s16(i32 %a, i16 %b, <8 x i16> %c) #1 {
// CHECK: [[TMP0:%.*]] = bitcast <8 x i16> %c to <16 x i8>		// CHECK: [[TMP0:%.*]] = bitcast <8 x i16> %c to <16 x i8>
// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <8 x i16>		// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <8 x i16>
// CHECK: [[LANE:%.*]] = extractelement <8 x i16> [[TMP1]], i32 7		// CHECK: [[LANE:%.*]] = extractelement <8 x i16> [[TMP1]], i32 7
// CHECK: [[TMP2:%.*]] = insertelement <4 x i16> undef, i16 %b, i64 0		// CHECK: [[TMP2:%.*]] = insertelement <4 x i16> undef, i16 %b, i64 0
// CHECK: [[TMP3:%.*]] = insertelement <4 x i16> undef, i16 [[LANE]], i64 0		// CHECK: [[TMP3:%.*]] = insertelement <4 x i16> undef, i16 [[LANE]], i64 0
// CHECK: [[VQDMLXL:%.*]] = call <4 x i32> @llvm.aarch64.neon.sqdmull.v4i32(<4 x i16> [[TMP2]], <4 x i16> [[TMP3]])		// CHECK: [[VQDMLXL:%.*]] = call <4 x i32> @llvm.aarch64.neon.sqdmull.v4i32(<4 x i16> [[TMP2]], <4 x i16> [[TMP3]])
// CHECK: [[LANE0:%.*]] = extractelement <4 x i32> [[VQDMLXL]], i64 0		// CHECK: [[LANE0:%.*]] = extractelement <4 x i32> [[VQDMLXL]], i64 0
// CHECK: [[VQDMLXL1:%.*]] = call i32 @llvm.aarch64.neon.sqsub.i32(i32 %a, i32 [[LANE0]])		// CHECK: [[VQDMLXL1:%.*]] = call i32 @llvm.aarch64.neon.sqsub.i32(i32 %a, i32 [[LANE0]])
// CHECK: ret i32 [[VQDMLXL1]]		// CHECK: ret i32 [[VQDMLXL1]]
int32_t test_vqdmlslh_laneq_s16(int32_t a, int16_t b, int16x8_t c) {		int32_t test_vqdmlslh_laneq_s16(int32_t a, int16_t b, int16x8_t c) {
return vqdmlslh_laneq_s16(a, b, c, 7);		return vqdmlslh_laneq_s16(a, b, c, 7);
}		}

// CHECK-LABEL: define i64 @test_vqdmlsls_laneq_s32(i64 %a, i32 %b, <4 x i32> %c) #0 {		// CHECK-LABEL: define i64 @test_vqdmlsls_laneq_s32(i64 %a, i32 %b, <4 x i32> %c) #1 {
// CHECK: [[TMP0:%.*]] = bitcast <4 x i32> %c to <16 x i8>		// CHECK: [[TMP0:%.*]] = bitcast <4 x i32> %c to <16 x i8>
// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <4 x i32>		// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <4 x i32>
// CHECK: [[LANE:%.*]] = extractelement <4 x i32> [[TMP1]], i32 3		// CHECK: [[LANE:%.*]] = extractelement <4 x i32> [[TMP1]], i32 3
// CHECK: [[VQDMLXL:%.*]] = call i64 @llvm.aarch64.neon.sqdmulls.scalar(i32 %b, i32 [[LANE]])		// CHECK: [[VQDMLXL:%.*]] = call i64 @llvm.aarch64.neon.sqdmulls.scalar(i32 %b, i32 [[LANE]])
// CHECK: [[VQDMLXL1:%.*]] = call i64 @llvm.aarch64.neon.sqsub.i64(i64 %a, i64 [[VQDMLXL]])		// CHECK: [[VQDMLXL1:%.*]] = call i64 @llvm.aarch64.neon.sqsub.i64(i64 %a, i64 [[VQDMLXL]])
// CHECK: ret i64 [[VQDMLXL1]]		// CHECK: ret i64 [[VQDMLXL1]]
int64_t test_vqdmlsls_laneq_s32(int64_t a, int32_t b, int32x4_t c) {		int64_t test_vqdmlsls_laneq_s32(int64_t a, int32_t b, int32x4_t c) {
return vqdmlsls_laneq_s32(a, b, c, 3);		return vqdmlsls_laneq_s32(a, b, c, 3);
Show All 19 Lines	float64x1_t test_vmulx_lane_f64_0() {
float64x1_t result;		float64x1_t result;
float64_t sarg1, sarg2, sres;		float64_t sarg1, sarg2, sres;
arg1 = vcreate_f64(UINT64_C(0x3fd6304bc43ab5c2));		arg1 = vcreate_f64(UINT64_C(0x3fd6304bc43ab5c2));
arg2 = vcreate_f64(UINT64_C(0x3fee211e215aeef3));		arg2 = vcreate_f64(UINT64_C(0x3fee211e215aeef3));
result = vmulx_lane_f64(arg1, arg2, 0);		result = vmulx_lane_f64(arg1, arg2, 0);
return result;		return result;
}		}

// CHECK-LABEL: define <1 x double> @test_vmulx_laneq_f64_2() #0 {		// CHECK-LABEL: define <1 x double> @test_vmulx_laneq_f64_2() #1 {
// CHECK: [[TMP0:%.*]] = bitcast i64 4599917171378402754 to <1 x double>		// CHECK: [[TMP0:%.*]] = bitcast i64 4599917171378402754 to <1 x double>
// CHECK: [[TMP1:%.*]] = bitcast i64 4606655882138939123 to <1 x double>		// CHECK: [[TMP1:%.*]] = bitcast i64 4606655882138939123 to <1 x double>
// CHECK: [[SHUFFLE_I:%.*]] = shufflevector <1 x double> [[TMP0]], <1 x double> [[TMP1]], <2 x i32> <i32 0, i32 1>		// CHECK: [[SHUFFLE_I:%.*]] = shufflevector <1 x double> [[TMP0]], <1 x double> [[TMP1]], <2 x i32> <i32 0, i32 1>
// CHECK: [[TMP2:%.*]] = bitcast <1 x double> [[TMP0]] to <8 x i8>		// CHECK: [[TMP2:%.*]] = bitcast <1 x double> [[TMP0]] to <8 x i8>
// CHECK: [[TMP3:%.*]] = bitcast <8 x i8> [[TMP2]] to <1 x double>		// CHECK: [[TMP3:%.*]] = bitcast <8 x i8> [[TMP2]] to <1 x double>
// CHECK: [[VGET_LANE:%.*]] = extractelement <1 x double> [[TMP3]], i32 0		// CHECK: [[VGET_LANE:%.*]] = extractelement <1 x double> [[TMP3]], i32 0
// CHECK: [[TMP4:%.*]] = bitcast <2 x double> [[SHUFFLE_I]] to <16 x i8>		// CHECK: [[TMP4:%.*]] = bitcast <2 x double> [[SHUFFLE_I]] to <16 x i8>
// CHECK: [[TMP5:%.*]] = bitcast <16 x i8> [[TMP4]] to <2 x double>		// CHECK: [[TMP5:%.*]] = bitcast <16 x i8> [[TMP4]] to <2 x double>
Show All 10 Lines	float64x1_t test_vmulx_laneq_f64_2() {
float64x1_t result;		float64x1_t result;
float64_t sarg1, sarg2, sres;		float64_t sarg1, sarg2, sres;
arg1 = vcreate_f64(UINT64_C(0x3fd6304bc43ab5c2));		arg1 = vcreate_f64(UINT64_C(0x3fd6304bc43ab5c2));
arg2 = vcreate_f64(UINT64_C(0x3fee211e215aeef3));		arg2 = vcreate_f64(UINT64_C(0x3fee211e215aeef3));
arg3 = vcombine_f64(arg1, arg2);		arg3 = vcombine_f64(arg1, arg2);
result = vmulx_laneq_f64(arg1, arg3, 1);		result = vmulx_laneq_f64(arg1, arg3, 1);
return result;		return result;
}		}

		// CHECK: attributes #0 ={{.*}}"min-legal-vector-width"="64"
		// CHECK: attributes #1 ={{.*}}"min-legal-vector-width"="128"
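
The two attribute checks above match the rule stated in the patch summary: a function's min-legal-vector-width is raised to the widest vector type among its arguments and return value, and among those of the functions it calls. That would explain why signatures containing only `<1 x double>` land in `#0` (64) while any signature with a `<2 x double>`, `<8 x i16>` or `<4 x i32>` operand lands in `#1` (128). It would also explain why `test_vmulx_laneq_f64_2`, which takes no arguments, gets `#1`: it calls `vcombine_f64`, which produces a 128-bit vector. Below is a minimal sketch of that rule, not the actual Clang implementation; `vec_type`, `vec_bits` and `min_legal_vector_width` are hypothetical names used only for illustration.

```c
#include <stddef.h>

/* Hypothetical description of an IR vector type such as <2 x double>. */
typedef struct {
    unsigned num_elts;  /* number of lanes */
    unsigned elt_bits;  /* bits per lane */
} vec_type;

/* Total width of a vector type in bits, e.g. <2 x double> is 2 * 64. */
static unsigned vec_bits(vec_type t) {
    return t.num_elts * t.elt_bits;
}

/*
 * Widest vector, in bits, among the vector types that appear in a
 * function's arguments, its return value, and its callees' signatures.
 * Returns 0 when no vector types are involved.
 */
static unsigned min_legal_vector_width(const vec_type *types, size_t n) {
    unsigned widest = 0;
    for (size_t i = 0; i < n; ++i) {
        unsigned bits = vec_bits(types[i]);
        if (bits > widest)
            widest = bits;
    }
    return widest;
}
```

For `test_vfma_laneq_f64`, the types involved would be `{1, 64}`, `{1, 64}` and `{2, 64}` for the arguments plus `{1, 64}` for the return value, so the sketch yields 128, consistent with the `#1` attribute group.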

test/CodeGen/aarch64-neon-tbl.c

	// RUN: %clang_cc1 -triple arm64-none-linux-gnu -target-feature +neon \			// RUN: %clang_cc1 -triple arm64-none-linux-gnu -target-feature +neon \
	// RUN: -disable-O0-optnone -emit-llvm -o - %s | opt -S -mem2reg | FileCheck %s			// RUN: -disable-O0-optnone -emit-llvm -o - %s | opt -S -mem2reg | FileCheck %s

	// Test new aarch64 intrinsics and types			// Test new aarch64 intrinsics and types

	#include <arm_neon.h>			#include <arm_neon.h>

	// CHECK-LABEL: define <8 x i8> @test_vtbl1_s8(<8 x i8> %a, <8 x i8> %b) #0 {			// CHECK-LABEL: define <8 x i8> @test_vtbl1_s8(<8 x i8> %a, <8 x i8> %b) #0 {
	// CHECK: [[VTBL1_I:%.*]] = shufflevector <8 x i8> %a, <8 x i8> zeroinitializer, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>			// CHECK: [[VTBL1_I:%.*]] = shufflevector <8 x i8> %a, <8 x i8> zeroinitializer, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
	// CHECK: [[VTBL11_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl1.v8i8(<16 x i8> [[VTBL1_I]], <8 x i8> %b) #2			// CHECK: [[VTBL11_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl1.v8i8(<16 x i8> [[VTBL1_I]], <8 x i8> %b) #3
	// CHECK: ret <8 x i8> [[VTBL11_I]]			// CHECK: ret <8 x i8> [[VTBL11_I]]
	int8x8_t test_vtbl1_s8(int8x8_t a, int8x8_t b) {			int8x8_t test_vtbl1_s8(int8x8_t a, int8x8_t b) {
	return vtbl1_s8(a, b);			return vtbl1_s8(a, b);
	}			}

	// CHECK-LABEL: define <8 x i8> @test_vqtbl1_s8(<16 x i8> %a, <8 x i8> %b) #0 {			// CHECK-LABEL: define <8 x i8> @test_vqtbl1_s8(<16 x i8> %a, <8 x i8> %b) #1 {
	// CHECK: [[VTBL1_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl1.v8i8(<16 x i8> %a, <8 x i8> %b) #2			// CHECK: [[VTBL1_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl1.v8i8(<16 x i8> %a, <8 x i8> %b) #3
	// CHECK: ret <8 x i8> [[VTBL1_I]]			// CHECK: ret <8 x i8> [[VTBL1_I]]
	int8x8_t test_vqtbl1_s8(int8x16_t a, int8x8_t b) {			int8x8_t test_vqtbl1_s8(int8x16_t a, int8x8_t b) {
	return vqtbl1_s8(a, b);			return vqtbl1_s8(a, b);
	}			}

	// CHECK-LABEL: define <8 x i8> @test_vtbl2_s8([2 x <8 x i8>] %a.coerce, <8 x i8> %b) #0 {			// CHECK-LABEL: define <8 x i8> @test_vtbl2_s8([2 x <8 x i8>] %a.coerce, <8 x i8> %b) #0 {
	// CHECK: [[__P0_I:%.*]] = alloca %struct.int8x8x2_t, align 8			// CHECK: [[__P0_I:%.*]] = alloca %struct.int8x8x2_t, align 8
	// CHECK: [[A:%.*]] = alloca %struct.int8x8x2_t, align 8			// CHECK: [[A:%.*]] = alloca %struct.int8x8x2_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x8x2_t, %struct.int8x8x2_t* [[A]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x8x2_t, %struct.int8x8x2_t* [[A]], i32 0, i32 0
	// CHECK: store [2 x <8 x i8>] [[A]].coerce, [2 x <8 x i8>]* [[COERCE_DIVE]], align 8			// CHECK: store [2 x <8 x i8>] [[A]].coerce, [2 x <8 x i8>]* [[COERCE_DIVE]], align 8
	// CHECK: [[COERCE_DIVE1:%.*]] = getelementptr inbounds %struct.int8x8x2_t, %struct.int8x8x2_t* [[A]], i32 0, i32 0			// CHECK: [[COERCE_DIVE1:%.*]] = getelementptr inbounds %struct.int8x8x2_t, %struct.int8x8x2_t* [[A]], i32 0, i32 0
	// CHECK: [[TMP0:%.*]] = load [2 x <8 x i8>], [2 x <8 x i8>]* [[COERCE_DIVE1]], align 8			// CHECK: [[TMP0:%.*]] = load [2 x <8 x i8>], [2 x <8 x i8>]* [[COERCE_DIVE1]], align 8
	// CHECK: [[COERCE_DIVE_I:%.*]] = getelementptr inbounds %struct.int8x8x2_t, %struct.int8x8x2_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[COERCE_DIVE_I:%.*]] = getelementptr inbounds %struct.int8x8x2_t, %struct.int8x8x2_t* [[__P0_I]], i32 0, i32 0
	// CHECK: store [2 x <8 x i8>] [[TMP0]], [2 x <8 x i8>]* [[COERCE_DIVE_I]], align 8			// CHECK: store [2 x <8 x i8>] [[TMP0]], [2 x <8 x i8>]* [[COERCE_DIVE_I]], align 8
	// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.int8x8x2_t, %struct.int8x8x2_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.int8x8x2_t, %struct.int8x8x2_t* [[__P0_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [2 x <8 x i8>], [2 x <8 x i8>]* [[VAL_I]], i64 0, i64 0			// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [2 x <8 x i8>], [2 x <8 x i8>]* [[VAL_I]], i64 0, i64 0
	// CHECK: [[TMP1:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX_I]], align 8			// CHECK: [[TMP1:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX_I]], align 8
	// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.int8x8x2_t, %struct.int8x8x2_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.int8x8x2_t, %struct.int8x8x2_t* [[__P0_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [2 x <8 x i8>], [2 x <8 x i8>]* [[VAL1_I]], i64 0, i64 1			// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [2 x <8 x i8>], [2 x <8 x i8>]* [[VAL1_I]], i64 0, i64 1
	// CHECK: [[TMP2:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX2_I]], align 8			// CHECK: [[TMP2:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX2_I]], align 8
	// CHECK: [[VTBL1_I:%.*]] = shufflevector <8 x i8> [[TMP1]], <8 x i8> [[TMP2]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>			// CHECK: [[VTBL1_I:%.*]] = shufflevector <8 x i8> [[TMP1]], <8 x i8> [[TMP2]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
	// CHECK: [[VTBL13_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl1.v8i8(<16 x i8> [[VTBL1_I]], <8 x i8> %b) #2			// CHECK: [[VTBL13_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl1.v8i8(<16 x i8> [[VTBL1_I]], <8 x i8> %b) #3
	// CHECK: ret <8 x i8> [[VTBL13_I]]			// CHECK: ret <8 x i8> [[VTBL13_I]]
	int8x8_t test_vtbl2_s8(int8x8x2_t a, int8x8_t b) {			int8x8_t test_vtbl2_s8(int8x8x2_t a, int8x8_t b) {
	return vtbl2_s8(a, b);			return vtbl2_s8(a, b);
	}			}

	// CHECK-LABEL: define <8 x i8> @test_vqtbl2_s8([2 x <16 x i8>] %a.coerce, <8 x i8> %b) #0 {			// CHECK-LABEL: define <8 x i8> @test_vqtbl2_s8([2 x <16 x i8>] %a.coerce, <8 x i8> %b) #0 {
	// CHECK: [[__P0_I:%.*]] = alloca %struct.int8x16x2_t, align 16			// CHECK: [[__P0_I:%.*]] = alloca %struct.int8x16x2_t, align 16
	// CHECK: [[A:%.*]] = alloca %struct.int8x16x2_t, align 16			// CHECK: [[A:%.*]] = alloca %struct.int8x16x2_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x16x2_t, %struct.int8x16x2_t* [[A]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x16x2_t, %struct.int8x16x2_t* [[A]], i32 0, i32 0
	// CHECK: store [2 x <16 x i8>] [[A]].coerce, [2 x <16 x i8>]* [[COERCE_DIVE]], align 16			// CHECK: store [2 x <16 x i8>] [[A]].coerce, [2 x <16 x i8>]* [[COERCE_DIVE]], align 16
	// CHECK: [[COERCE_DIVE1:%.*]] = getelementptr inbounds %struct.int8x16x2_t, %struct.int8x16x2_t* [[A]], i32 0, i32 0			// CHECK: [[COERCE_DIVE1:%.*]] = getelementptr inbounds %struct.int8x16x2_t, %struct.int8x16x2_t* [[A]], i32 0, i32 0
	// CHECK: [[TMP0:%.*]] = load [2 x <16 x i8>], [2 x <16 x i8>]* [[COERCE_DIVE1]], align 16			// CHECK: [[TMP0:%.*]] = load [2 x <16 x i8>], [2 x <16 x i8>]* [[COERCE_DIVE1]], align 16
	// CHECK: [[COERCE_DIVE_I:%.*]] = getelementptr inbounds %struct.int8x16x2_t, %struct.int8x16x2_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[COERCE_DIVE_I:%.*]] = getelementptr inbounds %struct.int8x16x2_t, %struct.int8x16x2_t* [[__P0_I]], i32 0, i32 0
	// CHECK: store [2 x <16 x i8>] [[TMP0]], [2 x <16 x i8>]* [[COERCE_DIVE_I]], align 16			// CHECK: store [2 x <16 x i8>] [[TMP0]], [2 x <16 x i8>]* [[COERCE_DIVE_I]], align 16
	// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.int8x16x2_t, %struct.int8x16x2_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.int8x16x2_t, %struct.int8x16x2_t* [[__P0_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [2 x <16 x i8>], [2 x <16 x i8>]* [[VAL_I]], i64 0, i64 0			// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [2 x <16 x i8>], [2 x <16 x i8>]* [[VAL_I]], i64 0, i64 0
	// CHECK: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX_I]], align 16			// CHECK: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX_I]], align 16
	// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.int8x16x2_t, %struct.int8x16x2_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.int8x16x2_t, %struct.int8x16x2_t* [[__P0_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [2 x <16 x i8>], [2 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1			// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [2 x <16 x i8>], [2 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1
	// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16			// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16
	// CHECK: [[VTBL2_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl2.v8i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <8 x i8> %b) #2			// CHECK: [[VTBL2_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl2.v8i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <8 x i8> %b) #3
	// CHECK: ret <8 x i8> [[VTBL2_I]]			// CHECK: ret <8 x i8> [[VTBL2_I]]
	int8x8_t test_vqtbl2_s8(int8x16x2_t a, int8x8_t b) {			int8x8_t test_vqtbl2_s8(int8x16x2_t a, int8x8_t b) {
	return vqtbl2_s8(a, b);			return vqtbl2_s8(a, b);
	}			}

	// CHECK-LABEL: define <8 x i8> @test_vtbl3_s8([3 x <8 x i8>] %a.coerce, <8 x i8> %b) #0 {			// CHECK-LABEL: define <8 x i8> @test_vtbl3_s8([3 x <8 x i8>] %a.coerce, <8 x i8> %b) #0 {
	// CHECK: [[__P0_I:%.*]] = alloca %struct.int8x8x3_t, align 8			// CHECK: [[__P0_I:%.*]] = alloca %struct.int8x8x3_t, align 8
	// CHECK: [[A:%.*]] = alloca %struct.int8x8x3_t, align 8			// CHECK: [[A:%.*]] = alloca %struct.int8x8x3_t, align 8
	... (9 unchanged lines not shown)
	// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.int8x8x3_t, %struct.int8x8x3_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.int8x8x3_t, %struct.int8x8x3_t* [[__P0_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [3 x <8 x i8>], [3 x <8 x i8>]* [[VAL1_I]], i64 0, i64 1			// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [3 x <8 x i8>], [3 x <8 x i8>]* [[VAL1_I]], i64 0, i64 1
	// CHECK: [[TMP2:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX2_I]], align 8			// CHECK: [[TMP2:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX2_I]], align 8
	// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.int8x8x3_t, %struct.int8x8x3_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.int8x8x3_t, %struct.int8x8x3_t* [[__P0_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [3 x <8 x i8>], [3 x <8 x i8>]* [[VAL3_I]], i64 0, i64 2			// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [3 x <8 x i8>], [3 x <8 x i8>]* [[VAL3_I]], i64 0, i64 2
	// CHECK: [[TMP3:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX4_I]], align 8			// CHECK: [[TMP3:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX4_I]], align 8
	// CHECK: [[VTBL2_I:%.*]] = shufflevector <8 x i8> [[TMP1]], <8 x i8> [[TMP2]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>			// CHECK: [[VTBL2_I:%.*]] = shufflevector <8 x i8> [[TMP1]], <8 x i8> [[TMP2]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
	// CHECK: [[VTBL25_I:%.*]] = shufflevector <8 x i8> [[TMP3]], <8 x i8> zeroinitializer, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>			// CHECK: [[VTBL25_I:%.*]] = shufflevector <8 x i8> [[TMP3]], <8 x i8> zeroinitializer, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
	// CHECK: [[VTBL26_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl2.v8i8(<16 x i8> [[VTBL2_I]], <16 x i8> [[VTBL25_I]], <8 x i8> %b) #2			// CHECK: [[VTBL26_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl2.v8i8(<16 x i8> [[VTBL2_I]], <16 x i8> [[VTBL25_I]], <8 x i8> %b) #3
	// CHECK: ret <8 x i8> [[VTBL26_I]]			// CHECK: ret <8 x i8> [[VTBL26_I]]
	int8x8_t test_vtbl3_s8(int8x8x3_t a, int8x8_t b) {			int8x8_t test_vtbl3_s8(int8x8x3_t a, int8x8_t b) {
	return vtbl3_s8(a, b);			return vtbl3_s8(a, b);
	}			}

	// CHECK-LABEL: define <8 x i8> @test_vqtbl3_s8([3 x <16 x i8>] %a.coerce, <8 x i8> %b) #0 {			// CHECK-LABEL: define <8 x i8> @test_vqtbl3_s8([3 x <16 x i8>] %a.coerce, <8 x i8> %b) #0 {
	// CHECK: [[__P0_I:%.*]] = alloca %struct.int8x16x3_t, align 16			// CHECK: [[__P0_I:%.*]] = alloca %struct.int8x16x3_t, align 16
	// CHECK: [[A:%.*]] = alloca %struct.int8x16x3_t, align 16			// CHECK: [[A:%.*]] = alloca %struct.int8x16x3_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x16x3_t, %struct.int8x16x3_t* [[A]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x16x3_t, %struct.int8x16x3_t* [[A]], i32 0, i32 0
	// CHECK: store [3 x <16 x i8>] [[A]].coerce, [3 x <16 x i8>]* [[COERCE_DIVE]], align 16			// CHECK: store [3 x <16 x i8>] [[A]].coerce, [3 x <16 x i8>]* [[COERCE_DIVE]], align 16
	// CHECK: [[COERCE_DIVE1:%.*]] = getelementptr inbounds %struct.int8x16x3_t, %struct.int8x16x3_t* [[A]], i32 0, i32 0			// CHECK: [[COERCE_DIVE1:%.*]] = getelementptr inbounds %struct.int8x16x3_t, %struct.int8x16x3_t* [[A]], i32 0, i32 0
	// CHECK: [[TMP0:%.*]] = load [3 x <16 x i8>], [3 x <16 x i8>]* [[COERCE_DIVE1]], align 16			// CHECK: [[TMP0:%.*]] = load [3 x <16 x i8>], [3 x <16 x i8>]* [[COERCE_DIVE1]], align 16
	// CHECK: [[COERCE_DIVE_I:%.*]] = getelementptr inbounds %struct.int8x16x3_t, %struct.int8x16x3_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[COERCE_DIVE_I:%.*]] = getelementptr inbounds %struct.int8x16x3_t, %struct.int8x16x3_t* [[__P0_I]], i32 0, i32 0
	// CHECK: store [3 x <16 x i8>] [[TMP0]], [3 x <16 x i8>]* [[COERCE_DIVE_I]], align 16			// CHECK: store [3 x <16 x i8>] [[TMP0]], [3 x <16 x i8>]* [[COERCE_DIVE_I]], align 16
	// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.int8x16x3_t, %struct.int8x16x3_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.int8x16x3_t, %struct.int8x16x3_t* [[__P0_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL_I]], i64 0, i64 0			// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL_I]], i64 0, i64 0
	// CHECK: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX_I]], align 16			// CHECK: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX_I]], align 16
	// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.int8x16x3_t, %struct.int8x16x3_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.int8x16x3_t, %struct.int8x16x3_t* [[__P0_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1			// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1
	// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16			// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16
	// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.int8x16x3_t, %struct.int8x16x3_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.int8x16x3_t, %struct.int8x16x3_t* [[__P0_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL3_I]], i64 0, i64 2			// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL3_I]], i64 0, i64 2
	// CHECK: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX4_I]], align 16			// CHECK: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX4_I]], align 16
	// CHECK: [[VTBL3_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl3.v8i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <16 x i8> [[TMP3]], <8 x i8> %b) #2			// CHECK: [[VTBL3_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl3.v8i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <16 x i8> [[TMP3]], <8 x i8> %b) #3
	// CHECK: ret <8 x i8> [[VTBL3_I]]			// CHECK: ret <8 x i8> [[VTBL3_I]]
	int8x8_t test_vqtbl3_s8(int8x16x3_t a, int8x8_t b) {			int8x8_t test_vqtbl3_s8(int8x16x3_t a, int8x8_t b) {
	return vqtbl3_s8(a, b);			return vqtbl3_s8(a, b);
	}			}

	// CHECK-LABEL: define <8 x i8> @test_vtbl4_s8([4 x <8 x i8>] %a.coerce, <8 x i8> %b) #0 {			// CHECK-LABEL: define <8 x i8> @test_vtbl4_s8([4 x <8 x i8>] %a.coerce, <8 x i8> %b) #0 {
	// CHECK: [[__P0_I:%.*]] = alloca %struct.int8x8x4_t, align 8			// CHECK: [[__P0_I:%.*]] = alloca %struct.int8x8x4_t, align 8
	// CHECK: [[A:%.*]] = alloca %struct.int8x8x4_t, align 8			// CHECK: [[A:%.*]] = alloca %struct.int8x8x4_t, align 8
	... (12 unchanged lines not shown)
	// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.int8x8x4_t, %struct.int8x8x4_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.int8x8x4_t, %struct.int8x8x4_t* [[__P0_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [4 x <8 x i8>], [4 x <8 x i8>]* [[VAL3_I]], i64 0, i64 2			// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [4 x <8 x i8>], [4 x <8 x i8>]* [[VAL3_I]], i64 0, i64 2
	// CHECK: [[TMP3:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX4_I]], align 8			// CHECK: [[TMP3:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX4_I]], align 8
	// CHECK: [[VAL5_I:%.*]] = getelementptr inbounds %struct.int8x8x4_t, %struct.int8x8x4_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[VAL5_I:%.*]] = getelementptr inbounds %struct.int8x8x4_t, %struct.int8x8x4_t* [[__P0_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX6_I:%.*]] = getelementptr inbounds [4 x <8 x i8>], [4 x <8 x i8>]* [[VAL5_I]], i64 0, i64 3			// CHECK: [[ARRAYIDX6_I:%.*]] = getelementptr inbounds [4 x <8 x i8>], [4 x <8 x i8>]* [[VAL5_I]], i64 0, i64 3
	// CHECK: [[TMP4:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX6_I]], align 8			// CHECK: [[TMP4:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX6_I]], align 8
	// CHECK: [[VTBL2_I:%.*]] = shufflevector <8 x i8> [[TMP1]], <8 x i8> [[TMP2]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>			// CHECK: [[VTBL2_I:%.*]] = shufflevector <8 x i8> [[TMP1]], <8 x i8> [[TMP2]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
	// CHECK: [[VTBL27_I:%.*]] = shufflevector <8 x i8> [[TMP3]], <8 x i8> [[TMP4]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>			// CHECK: [[VTBL27_I:%.*]] = shufflevector <8 x i8> [[TMP3]], <8 x i8> [[TMP4]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
	// CHECK: [[VTBL28_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl2.v8i8(<16 x i8> [[VTBL2_I]], <16 x i8> [[VTBL27_I]], <8 x i8> %b) #2			// CHECK: [[VTBL28_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl2.v8i8(<16 x i8> [[VTBL2_I]], <16 x i8> [[VTBL27_I]], <8 x i8> %b) #3
	// CHECK: ret <8 x i8> [[VTBL28_I]]			// CHECK: ret <8 x i8> [[VTBL28_I]]
	int8x8_t test_vtbl4_s8(int8x8x4_t a, int8x8_t b) {			int8x8_t test_vtbl4_s8(int8x8x4_t a, int8x8_t b) {
	return vtbl4_s8(a, b);			return vtbl4_s8(a, b);
	}			}

	// CHECK-LABEL: define <8 x i8> @test_vqtbl4_s8([4 x <16 x i8>] %a.coerce, <8 x i8> %b) #0 {			// CHECK-LABEL: define <8 x i8> @test_vqtbl4_s8([4 x <16 x i8>] %a.coerce, <8 x i8> %b) #0 {
	// CHECK: [[__P0_I:%.*]] = alloca %struct.int8x16x4_t, align 16			// CHECK: [[__P0_I:%.*]] = alloca %struct.int8x16x4_t, align 16
	// CHECK: [[A:%.*]] = alloca %struct.int8x16x4_t, align 16			// CHECK: [[A:%.*]] = alloca %struct.int8x16x4_t, align 16
	... (10 unchanged lines not shown)
	// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1			// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1
	// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16			// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16
	// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.int8x16x4_t, %struct.int8x16x4_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.int8x16x4_t, %struct.int8x16x4_t* [[__P0_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL3_I]], i64 0, i64 2			// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL3_I]], i64 0, i64 2
	// CHECK: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX4_I]], align 16			// CHECK: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX4_I]], align 16
	// CHECK: [[VAL5_I:%.*]] = getelementptr inbounds %struct.int8x16x4_t, %struct.int8x16x4_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[VAL5_I:%.*]] = getelementptr inbounds %struct.int8x16x4_t, %struct.int8x16x4_t* [[__P0_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX6_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL5_I]], i64 0, i64 3			// CHECK: [[ARRAYIDX6_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL5_I]], i64 0, i64 3
	// CHECK: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX6_I]], align 16			// CHECK: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX6_I]], align 16
	// CHECK: [[VTBL4_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl4.v8i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <16 x i8> [[TMP3]], <16 x i8> [[TMP4]], <8 x i8> %b) #2			// CHECK: [[VTBL4_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl4.v8i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <16 x i8> [[TMP3]], <16 x i8> [[TMP4]], <8 x i8> %b) #3
	// CHECK: ret <8 x i8> [[VTBL4_I]]			// CHECK: ret <8 x i8> [[VTBL4_I]]
	int8x8_t test_vqtbl4_s8(int8x16x4_t a, int8x8_t b) {			int8x8_t test_vqtbl4_s8(int8x16x4_t a, int8x8_t b) {
	return vqtbl4_s8(a, b);			return vqtbl4_s8(a, b);
	}			}

	// CHECK-LABEL: define <16 x i8> @test_vqtbl1q_s8(<16 x i8> %a, <16 x i8> %b) #0 {			// CHECK-LABEL: define <16 x i8> @test_vqtbl1q_s8(<16 x i8> %a, <16 x i8> %b) #1 {
	// CHECK: [[VTBL1_I:%.*]] = call <16 x i8> @llvm.aarch64.neon.tbl1.v16i8(<16 x i8> %a, <16 x i8> %b) #2			// CHECK: [[VTBL1_I:%.*]] = call <16 x i8> @llvm.aarch64.neon.tbl1.v16i8(<16 x i8> %a, <16 x i8> %b) #3
	// CHECK: ret <16 x i8> [[VTBL1_I]]			// CHECK: ret <16 x i8> [[VTBL1_I]]
	int8x16_t test_vqtbl1q_s8(int8x16_t a, int8x16_t b) {			int8x16_t test_vqtbl1q_s8(int8x16_t a, int8x16_t b) {
	return vqtbl1q_s8(a, b);			return vqtbl1q_s8(a, b);
	}			}

	// CHECK-LABEL: define <16 x i8> @test_vqtbl2q_s8([2 x <16 x i8>] %a.coerce, <16 x i8> %b) #0 {			// CHECK-LABEL: define <16 x i8> @test_vqtbl2q_s8([2 x <16 x i8>] %a.coerce, <16 x i8> %b) #1 {
	// CHECK: [[__P0_I:%.*]] = alloca %struct.int8x16x2_t, align 16			// CHECK: [[__P0_I:%.*]] = alloca %struct.int8x16x2_t, align 16
	// CHECK: [[A:%.*]] = alloca %struct.int8x16x2_t, align 16			// CHECK: [[A:%.*]] = alloca %struct.int8x16x2_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x16x2_t, %struct.int8x16x2_t* [[A]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x16x2_t, %struct.int8x16x2_t* [[A]], i32 0, i32 0
	// CHECK: store [2 x <16 x i8>] [[A]].coerce, [2 x <16 x i8>]* [[COERCE_DIVE]], align 16			// CHECK: store [2 x <16 x i8>] [[A]].coerce, [2 x <16 x i8>]* [[COERCE_DIVE]], align 16
	// CHECK: [[COERCE_DIVE1:%.*]] = getelementptr inbounds %struct.int8x16x2_t, %struct.int8x16x2_t* [[A]], i32 0, i32 0			// CHECK: [[COERCE_DIVE1:%.*]] = getelementptr inbounds %struct.int8x16x2_t, %struct.int8x16x2_t* [[A]], i32 0, i32 0
	// CHECK: [[TMP0:%.*]] = load [2 x <16 x i8>], [2 x <16 x i8>]* [[COERCE_DIVE1]], align 16			// CHECK: [[TMP0:%.*]] = load [2 x <16 x i8>], [2 x <16 x i8>]* [[COERCE_DIVE1]], align 16
	// CHECK: [[COERCE_DIVE_I:%.*]] = getelementptr inbounds %struct.int8x16x2_t, %struct.int8x16x2_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[COERCE_DIVE_I:%.*]] = getelementptr inbounds %struct.int8x16x2_t, %struct.int8x16x2_t* [[__P0_I]], i32 0, i32 0
	// CHECK: store [2 x <16 x i8>] [[TMP0]], [2 x <16 x i8>]* [[COERCE_DIVE_I]], align 16			// CHECK: store [2 x <16 x i8>] [[TMP0]], [2 x <16 x i8>]* [[COERCE_DIVE_I]], align 16
	// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.int8x16x2_t, %struct.int8x16x2_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.int8x16x2_t, %struct.int8x16x2_t* [[__P0_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [2 x <16 x i8>], [2 x <16 x i8>]* [[VAL_I]], i64 0, i64 0			// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [2 x <16 x i8>], [2 x <16 x i8>]* [[VAL_I]], i64 0, i64 0
	// CHECK: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX_I]], align 16			// CHECK: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX_I]], align 16
	// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.int8x16x2_t, %struct.int8x16x2_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.int8x16x2_t, %struct.int8x16x2_t* [[__P0_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [2 x <16 x i8>], [2 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1			// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [2 x <16 x i8>], [2 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1
	// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16			// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16
	// CHECK: [[VTBL2_I:%.*]] = call <16 x i8> @llvm.aarch64.neon.tbl2.v16i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <16 x i8> %b) #2			// CHECK: [[VTBL2_I:%.*]] = call <16 x i8> @llvm.aarch64.neon.tbl2.v16i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <16 x i8> %b) #3
	// CHECK: ret <16 x i8> [[VTBL2_I]]			// CHECK: ret <16 x i8> [[VTBL2_I]]
	int8x16_t test_vqtbl2q_s8(int8x16x2_t a, int8x16_t b) {			int8x16_t test_vqtbl2q_s8(int8x16x2_t a, int8x16_t b) {
	return vqtbl2q_s8(a, b);			return vqtbl2q_s8(a, b);
	}			}

	// CHECK-LABEL: define <16 x i8> @test_vqtbl3q_s8([3 x <16 x i8>] %a.coerce, <16 x i8> %b) #0 {			// CHECK-LABEL: define <16 x i8> @test_vqtbl3q_s8([3 x <16 x i8>] %a.coerce, <16 x i8> %b) #1 {
	// CHECK: [[__P0_I:%.*]] = alloca %struct.int8x16x3_t, align 16			// CHECK: [[__P0_I:%.*]] = alloca %struct.int8x16x3_t, align 16
	// CHECK: [[A:%.*]] = alloca %struct.int8x16x3_t, align 16			// CHECK: [[A:%.*]] = alloca %struct.int8x16x3_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x16x3_t, %struct.int8x16x3_t* [[A]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x16x3_t, %struct.int8x16x3_t* [[A]], i32 0, i32 0
	// CHECK: store [3 x <16 x i8>] [[A]].coerce, [3 x <16 x i8>]* [[COERCE_DIVE]], align 16			// CHECK: store [3 x <16 x i8>] [[A]].coerce, [3 x <16 x i8>]* [[COERCE_DIVE]], align 16
	// CHECK: [[COERCE_DIVE1:%.*]] = getelementptr inbounds %struct.int8x16x3_t, %struct.int8x16x3_t* [[A]], i32 0, i32 0			// CHECK: [[COERCE_DIVE1:%.*]] = getelementptr inbounds %struct.int8x16x3_t, %struct.int8x16x3_t* [[A]], i32 0, i32 0
	// CHECK: [[TMP0:%.*]] = load [3 x <16 x i8>], [3 x <16 x i8>]* [[COERCE_DIVE1]], align 16			// CHECK: [[TMP0:%.*]] = load [3 x <16 x i8>], [3 x <16 x i8>]* [[COERCE_DIVE1]], align 16
	// CHECK: [[COERCE_DIVE_I:%.*]] = getelementptr inbounds %struct.int8x16x3_t, %struct.int8x16x3_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[COERCE_DIVE_I:%.*]] = getelementptr inbounds %struct.int8x16x3_t, %struct.int8x16x3_t* [[__P0_I]], i32 0, i32 0
	// CHECK: store [3 x <16 x i8>] [[TMP0]], [3 x <16 x i8>]* [[COERCE_DIVE_I]], align 16			// CHECK: store [3 x <16 x i8>] [[TMP0]], [3 x <16 x i8>]* [[COERCE_DIVE_I]], align 16
	// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.int8x16x3_t, %struct.int8x16x3_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.int8x16x3_t, %struct.int8x16x3_t* [[__P0_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL_I]], i64 0, i64 0			// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL_I]], i64 0, i64 0
	// CHECK: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX_I]], align 16			// CHECK: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX_I]], align 16
	// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.int8x16x3_t, %struct.int8x16x3_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.int8x16x3_t, %struct.int8x16x3_t* [[__P0_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1			// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1
	// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16			// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16
	// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.int8x16x3_t, %struct.int8x16x3_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.int8x16x3_t, %struct.int8x16x3_t* [[__P0_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL3_I]], i64 0, i64 2			// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL3_I]], i64 0, i64 2
	// CHECK: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX4_I]], align 16			// CHECK: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX4_I]], align 16
	// CHECK: [[VTBL3_I:%.*]] = call <16 x i8> @llvm.aarch64.neon.tbl3.v16i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <16 x i8> [[TMP3]], <16 x i8> %b) #2			// CHECK: [[VTBL3_I:%.*]] = call <16 x i8> @llvm.aarch64.neon.tbl3.v16i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <16 x i8> [[TMP3]], <16 x i8> %b) #3
	// CHECK: ret <16 x i8> [[VTBL3_I]]			// CHECK: ret <16 x i8> [[VTBL3_I]]
	int8x16_t test_vqtbl3q_s8(int8x16x3_t a, int8x16_t b) {			int8x16_t test_vqtbl3q_s8(int8x16x3_t a, int8x16_t b) {
	return vqtbl3q_s8(a, b);			return vqtbl3q_s8(a, b);
	}			}

	// CHECK-LABEL: define <16 x i8> @test_vqtbl4q_s8([4 x <16 x i8>] %a.coerce, <16 x i8> %b) #0 {			// CHECK-LABEL: define <16 x i8> @test_vqtbl4q_s8([4 x <16 x i8>] %a.coerce, <16 x i8> %b) #1 {
	// CHECK: [[__P0_I:%.*]] = alloca %struct.int8x16x4_t, align 16			// CHECK: [[__P0_I:%.*]] = alloca %struct.int8x16x4_t, align 16
	// CHECK: [[A:%.*]] = alloca %struct.int8x16x4_t, align 16			// CHECK: [[A:%.*]] = alloca %struct.int8x16x4_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x16x4_t, %struct.int8x16x4_t* [[A]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x16x4_t, %struct.int8x16x4_t* [[A]], i32 0, i32 0
	// CHECK: store [4 x <16 x i8>] [[A]].coerce, [4 x <16 x i8>]* [[COERCE_DIVE]], align 16			// CHECK: store [4 x <16 x i8>] [[A]].coerce, [4 x <16 x i8>]* [[COERCE_DIVE]], align 16
	// CHECK: [[COERCE_DIVE1:%.*]] = getelementptr inbounds %struct.int8x16x4_t, %struct.int8x16x4_t* [[A]], i32 0, i32 0			// CHECK: [[COERCE_DIVE1:%.*]] = getelementptr inbounds %struct.int8x16x4_t, %struct.int8x16x4_t* [[A]], i32 0, i32 0
	// CHECK: [[TMP0:%.*]] = load [4 x <16 x i8>], [4 x <16 x i8>]* [[COERCE_DIVE1]], align 16			// CHECK: [[TMP0:%.*]] = load [4 x <16 x i8>], [4 x <16 x i8>]* [[COERCE_DIVE1]], align 16
	// CHECK: [[COERCE_DIVE_I:%.*]] = getelementptr inbounds %struct.int8x16x4_t, %struct.int8x16x4_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[COERCE_DIVE_I:%.*]] = getelementptr inbounds %struct.int8x16x4_t, %struct.int8x16x4_t* [[__P0_I]], i32 0, i32 0
	// CHECK: store [4 x <16 x i8>] [[TMP0]], [4 x <16 x i8>]* [[COERCE_DIVE_I]], align 16			// CHECK: store [4 x <16 x i8>] [[TMP0]], [4 x <16 x i8>]* [[COERCE_DIVE_I]], align 16
	// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.int8x16x4_t, %struct.int8x16x4_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.int8x16x4_t, %struct.int8x16x4_t* [[__P0_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL_I]], i64 0, i64 0			// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL_I]], i64 0, i64 0
	// CHECK: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX_I]], align 16			// CHECK: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX_I]], align 16
	// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.int8x16x4_t, %struct.int8x16x4_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.int8x16x4_t, %struct.int8x16x4_t* [[__P0_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1			// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1
	// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16			// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16
	// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.int8x16x4_t, %struct.int8x16x4_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.int8x16x4_t, %struct.int8x16x4_t* [[__P0_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL3_I]], i64 0, i64 2			// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL3_I]], i64 0, i64 2
	// CHECK: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX4_I]], align 16			// CHECK: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX4_I]], align 16
	// CHECK: [[VAL5_I:%.*]] = getelementptr inbounds %struct.int8x16x4_t, %struct.int8x16x4_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[VAL5_I:%.*]] = getelementptr inbounds %struct.int8x16x4_t, %struct.int8x16x4_t* [[__P0_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX6_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL5_I]], i64 0, i64 3			// CHECK: [[ARRAYIDX6_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL5_I]], i64 0, i64 3
	// CHECK: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX6_I]], align 16			// CHECK: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX6_I]], align 16
	// CHECK: [[VTBL4_I:%.*]] = call <16 x i8> @llvm.aarch64.neon.tbl4.v16i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <16 x i8> [[TMP3]], <16 x i8> [[TMP4]], <16 x i8> %b) #2			// CHECK: [[VTBL4_I:%.*]] = call <16 x i8> @llvm.aarch64.neon.tbl4.v16i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <16 x i8> [[TMP3]], <16 x i8> [[TMP4]], <16 x i8> %b) #3
	// CHECK: ret <16 x i8> [[VTBL4_I]]			// CHECK: ret <16 x i8> [[VTBL4_I]]
	int8x16_t test_vqtbl4q_s8(int8x16x4_t a, int8x16_t b) {			int8x16_t test_vqtbl4q_s8(int8x16x4_t a, int8x16_t b) {
	return vqtbl4q_s8(a, b);			return vqtbl4q_s8(a, b);
	}			}

	// CHECK-LABEL: define <8 x i8> @test_vtbx1_s8(<8 x i8> %a, <8 x i8> %b, <8 x i8> %c) #0 {			// CHECK-LABEL: define <8 x i8> @test_vtbx1_s8(<8 x i8> %a, <8 x i8> %b, <8 x i8> %c) #0 {
	// CHECK: [[VTBL1_I:%.*]] = shufflevector <8 x i8> %b, <8 x i8> zeroinitializer, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>			// CHECK: [[VTBL1_I:%.*]] = shufflevector <8 x i8> %b, <8 x i8> zeroinitializer, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
	// CHECK: [[VTBL11_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl1.v8i8(<16 x i8> [[VTBL1_I]], <8 x i8> %c) #2			// CHECK: [[VTBL11_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl1.v8i8(<16 x i8> [[VTBL1_I]], <8 x i8> %c) #3
	// CHECK: [[TMP0:%.*]] = icmp uge <8 x i8> %c, <i8 8, i8 8, i8 8, i8 8, i8 8, i8 8, i8 8, i8 8>			// CHECK: [[TMP0:%.*]] = icmp uge <8 x i8> %c, <i8 8, i8 8, i8 8, i8 8, i8 8, i8 8, i8 8, i8 8>
	// CHECK: [[TMP1:%.*]] = sext <8 x i1> [[TMP0]] to <8 x i8>			// CHECK: [[TMP1:%.*]] = sext <8 x i1> [[TMP0]] to <8 x i8>
	// CHECK: [[TMP2:%.*]] = and <8 x i8> [[TMP1]], %a			// CHECK: [[TMP2:%.*]] = and <8 x i8> [[TMP1]], %a
	// CHECK: [[TMP3:%.*]] = xor <8 x i8> [[TMP1]], <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>			// CHECK: [[TMP3:%.*]] = xor <8 x i8> [[TMP1]], <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>
	// CHECK: [[TMP4:%.*]] = and <8 x i8> [[TMP3]], [[VTBL11_I]]			// CHECK: [[TMP4:%.*]] = and <8 x i8> [[TMP3]], [[VTBL11_I]]
	// CHECK: [[VTBX_I:%.*]] = or <8 x i8> [[TMP2]], [[TMP4]]			// CHECK: [[VTBX_I:%.*]] = or <8 x i8> [[TMP2]], [[TMP4]]
	// CHECK: ret <8 x i8> [[VTBX_I]]			// CHECK: ret <8 x i8> [[VTBX_I]]
	int8x8_t test_vtbx1_s8(int8x8_t a, int8x8_t b, int8x8_t c) {			int8x8_t test_vtbx1_s8(int8x8_t a, int8x8_t b, int8x8_t c) {
	... (11 unchanged lines not shown)
	// CHECK: store [2 x <8 x i8>] [[TMP0]], [2 x <8 x i8>]* [[COERCE_DIVE_I]], align 8			// CHECK: store [2 x <8 x i8>] [[TMP0]], [2 x <8 x i8>]* [[COERCE_DIVE_I]], align 8
	// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.int8x8x2_t, %struct.int8x8x2_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.int8x8x2_t, %struct.int8x8x2_t* [[__P1_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [2 x <8 x i8>], [2 x <8 x i8>]* [[VAL_I]], i64 0, i64 0			// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [2 x <8 x i8>], [2 x <8 x i8>]* [[VAL_I]], i64 0, i64 0
	// CHECK: [[TMP1:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX_I]], align 8			// CHECK: [[TMP1:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX_I]], align 8
	// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.int8x8x2_t, %struct.int8x8x2_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.int8x8x2_t, %struct.int8x8x2_t* [[__P1_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [2 x <8 x i8>], [2 x <8 x i8>]* [[VAL1_I]], i64 0, i64 1			// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [2 x <8 x i8>], [2 x <8 x i8>]* [[VAL1_I]], i64 0, i64 1
	// CHECK: [[TMP2:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX2_I]], align 8			// CHECK: [[TMP2:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX2_I]], align 8
	// CHECK: [[VTBX1_I:%.*]] = shufflevector <8 x i8> [[TMP1]], <8 x i8> [[TMP2]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>			// CHECK: [[VTBX1_I:%.*]] = shufflevector <8 x i8> [[TMP1]], <8 x i8> [[TMP2]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
	// CHECK: [[VTBX13_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbx1.v8i8(<8 x i8> %a, <16 x i8> [[VTBX1_I]], <8 x i8> %c) #2			// CHECK: [[VTBX13_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbx1.v8i8(<8 x i8> %a, <16 x i8> [[VTBX1_I]], <8 x i8> %c) #3
	// CHECK: ret <8 x i8> [[VTBX13_I]]			// CHECK: ret <8 x i8> [[VTBX13_I]]
	int8x8_t test_vtbx2_s8(int8x8_t a, int8x8x2_t b, int8x8_t c) {			int8x8_t test_vtbx2_s8(int8x8_t a, int8x8x2_t b, int8x8_t c) {
	return vtbx2_s8(a, b, c);			return vtbx2_s8(a, b, c);
	}			}

	// CHECK-LABEL: define <8 x i8> @test_vtbx3_s8(<8 x i8> %a, [3 x <8 x i8>] %b.coerce, <8 x i8> %c) #0 {			// CHECK-LABEL: define <8 x i8> @test_vtbx3_s8(<8 x i8> %a, [3 x <8 x i8>] %b.coerce, <8 x i8> %c) #0 {
	// CHECK: [[__P1_I:%.*]] = alloca %struct.int8x8x3_t, align 8			// CHECK: [[__P1_I:%.*]] = alloca %struct.int8x8x3_t, align 8
	// CHECK: [[B:%.*]] = alloca %struct.int8x8x3_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.int8x8x3_t, align 8
	Show All 9 Lines
	// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.int8x8x3_t, %struct.int8x8x3_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.int8x8x3_t, %struct.int8x8x3_t* [[__P1_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [3 x <8 x i8>], [3 x <8 x i8>]* [[VAL1_I]], i64 0, i64 1			// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [3 x <8 x i8>], [3 x <8 x i8>]* [[VAL1_I]], i64 0, i64 1
	// CHECK: [[TMP2:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX2_I]], align 8			// CHECK: [[TMP2:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX2_I]], align 8
	// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.int8x8x3_t, %struct.int8x8x3_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.int8x8x3_t, %struct.int8x8x3_t* [[__P1_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [3 x <8 x i8>], [3 x <8 x i8>]* [[VAL3_I]], i64 0, i64 2			// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [3 x <8 x i8>], [3 x <8 x i8>]* [[VAL3_I]], i64 0, i64 2
	// CHECK: [[TMP3:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX4_I]], align 8			// CHECK: [[TMP3:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX4_I]], align 8
	// CHECK: [[VTBL2_I:%.*]] = shufflevector <8 x i8> [[TMP1]], <8 x i8> [[TMP2]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>			// CHECK: [[VTBL2_I:%.*]] = shufflevector <8 x i8> [[TMP1]], <8 x i8> [[TMP2]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
	// CHECK: [[VTBL25_I:%.*]] = shufflevector <8 x i8> [[TMP3]], <8 x i8> zeroinitializer, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>			// CHECK: [[VTBL25_I:%.*]] = shufflevector <8 x i8> [[TMP3]], <8 x i8> zeroinitializer, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
	// CHECK: [[VTBL26_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl2.v8i8(<16 x i8> [[VTBL2_I]], <16 x i8> [[VTBL25_I]], <8 x i8> %c) #2			// CHECK: [[VTBL26_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl2.v8i8(<16 x i8> [[VTBL2_I]], <16 x i8> [[VTBL25_I]], <8 x i8> %c) #3
	// CHECK: [[TMP4:%.*]] = icmp uge <8 x i8> %c, <i8 24, i8 24, i8 24, i8 24, i8 24, i8 24, i8 24, i8 24>			// CHECK: [[TMP4:%.*]] = icmp uge <8 x i8> %c, <i8 24, i8 24, i8 24, i8 24, i8 24, i8 24, i8 24, i8 24>
	// CHECK: [[TMP5:%.*]] = sext <8 x i1> [[TMP4]] to <8 x i8>			// CHECK: [[TMP5:%.*]] = sext <8 x i1> [[TMP4]] to <8 x i8>
	// CHECK: [[TMP6:%.*]] = and <8 x i8> [[TMP5]], %a			// CHECK: [[TMP6:%.*]] = and <8 x i8> [[TMP5]], %a
	// CHECK: [[TMP7:%.*]] = xor <8 x i8> [[TMP5]], <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>			// CHECK: [[TMP7:%.*]] = xor <8 x i8> [[TMP5]], <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>
	// CHECK: [[TMP8:%.*]] = and <8 x i8> [[TMP7]], [[VTBL26_I]]			// CHECK: [[TMP8:%.*]] = and <8 x i8> [[TMP7]], [[VTBL26_I]]
	// CHECK: [[VTBX_I:%.*]] = or <8 x i8> [[TMP6]], [[TMP8]]			// CHECK: [[VTBX_I:%.*]] = or <8 x i8> [[TMP6]], [[TMP8]]
	// CHECK: ret <8 x i8> [[VTBX_I]]			// CHECK: ret <8 x i8> [[VTBX_I]]
	int8x8_t test_vtbx3_s8(int8x8_t a, int8x8x3_t b, int8x8_t c) {			int8x8_t test_vtbx3_s8(int8x8_t a, int8x8x3_t b, int8x8_t c) {
	Show All 18 Lines
	// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.int8x8x4_t, %struct.int8x8x4_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.int8x8x4_t, %struct.int8x8x4_t* [[__P1_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [4 x <8 x i8>], [4 x <8 x i8>]* [[VAL3_I]], i64 0, i64 2			// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [4 x <8 x i8>], [4 x <8 x i8>]* [[VAL3_I]], i64 0, i64 2
	// CHECK: [[TMP3:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX4_I]], align 8			// CHECK: [[TMP3:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX4_I]], align 8
	// CHECK: [[VAL5_I:%.*]] = getelementptr inbounds %struct.int8x8x4_t, %struct.int8x8x4_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[VAL5_I:%.*]] = getelementptr inbounds %struct.int8x8x4_t, %struct.int8x8x4_t* [[__P1_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX6_I:%.*]] = getelementptr inbounds [4 x <8 x i8>], [4 x <8 x i8>]* [[VAL5_I]], i64 0, i64 3			// CHECK: [[ARRAYIDX6_I:%.*]] = getelementptr inbounds [4 x <8 x i8>], [4 x <8 x i8>]* [[VAL5_I]], i64 0, i64 3
	// CHECK: [[TMP4:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX6_I]], align 8			// CHECK: [[TMP4:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX6_I]], align 8
	// CHECK: [[VTBX2_I:%.*]] = shufflevector <8 x i8> [[TMP1]], <8 x i8> [[TMP2]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>			// CHECK: [[VTBX2_I:%.*]] = shufflevector <8 x i8> [[TMP1]], <8 x i8> [[TMP2]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
	// CHECK: [[VTBX27_I:%.*]] = shufflevector <8 x i8> [[TMP3]], <8 x i8> [[TMP4]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>			// CHECK: [[VTBX27_I:%.*]] = shufflevector <8 x i8> [[TMP3]], <8 x i8> [[TMP4]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
	// CHECK: [[VTBX28_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbx2.v8i8(<8 x i8> %a, <16 x i8> [[VTBX2_I]], <16 x i8> [[VTBX27_I]], <8 x i8> %c) #2			// CHECK: [[VTBX28_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbx2.v8i8(<8 x i8> %a, <16 x i8> [[VTBX2_I]], <16 x i8> [[VTBX27_I]], <8 x i8> %c) #3
	// CHECK: ret <8 x i8> [[VTBX28_I]]			// CHECK: ret <8 x i8> [[VTBX28_I]]
	int8x8_t test_vtbx4_s8(int8x8_t a, int8x8x4_t b, int8x8_t c) {			int8x8_t test_vtbx4_s8(int8x8_t a, int8x8x4_t b, int8x8_t c) {
	return vtbx4_s8(a, b, c);			return vtbx4_s8(a, b, c);
	}			}

	// CHECK-LABEL: define <8 x i8> @test_vqtbx1_s8(<8 x i8> %a, <16 x i8> %b, <8 x i8> %c) #0 {			// CHECK-LABEL: define <8 x i8> @test_vqtbx1_s8(<8 x i8> %a, <16 x i8> %b, <8 x i8> %c) #1 {
	// CHECK: [[VTBX1_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbx1.v8i8(<8 x i8> %a, <16 x i8> %b, <8 x i8> %c) #2			// CHECK: [[VTBX1_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbx1.v8i8(<8 x i8> %a, <16 x i8> %b, <8 x i8> %c) #3
	// CHECK: ret <8 x i8> [[VTBX1_I]]			// CHECK: ret <8 x i8> [[VTBX1_I]]
	int8x8_t test_vqtbx1_s8(int8x8_t a, int8x16_t b, int8x8_t c) {			int8x8_t test_vqtbx1_s8(int8x8_t a, int8x16_t b, int8x8_t c) {
	return vqtbx1_s8(a, b, c);			return vqtbx1_s8(a, b, c);
	}			}

	// CHECK-LABEL: define <8 x i8> @test_vqtbx2_s8(<8 x i8> %a, [2 x <16 x i8>] %b.coerce, <8 x i8> %c) #0 {			// CHECK-LABEL: define <8 x i8> @test_vqtbx2_s8(<8 x i8> %a, [2 x <16 x i8>] %b.coerce, <8 x i8> %c) #0 {
	// CHECK: [[__P1_I:%.*]] = alloca %struct.int8x16x2_t, align 16			// CHECK: [[__P1_I:%.*]] = alloca %struct.int8x16x2_t, align 16
	// CHECK: [[B:%.*]] = alloca %struct.int8x16x2_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.int8x16x2_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x16x2_t, %struct.int8x16x2_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x16x2_t, %struct.int8x16x2_t* [[B]], i32 0, i32 0
	// CHECK: store [2 x <16 x i8>] [[B]].coerce, [2 x <16 x i8>]* [[COERCE_DIVE]], align 16			// CHECK: store [2 x <16 x i8>] [[B]].coerce, [2 x <16 x i8>]* [[COERCE_DIVE]], align 16
	// CHECK: [[COERCE_DIVE1:%.*]] = getelementptr inbounds %struct.int8x16x2_t, %struct.int8x16x2_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE1:%.*]] = getelementptr inbounds %struct.int8x16x2_t, %struct.int8x16x2_t* [[B]], i32 0, i32 0
	// CHECK: [[TMP0:%.*]] = load [2 x <16 x i8>], [2 x <16 x i8>]* [[COERCE_DIVE1]], align 16			// CHECK: [[TMP0:%.*]] = load [2 x <16 x i8>], [2 x <16 x i8>]* [[COERCE_DIVE1]], align 16
	// CHECK: [[COERCE_DIVE_I:%.*]] = getelementptr inbounds %struct.int8x16x2_t, %struct.int8x16x2_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[COERCE_DIVE_I:%.*]] = getelementptr inbounds %struct.int8x16x2_t, %struct.int8x16x2_t* [[__P1_I]], i32 0, i32 0
	// CHECK: store [2 x <16 x i8>] [[TMP0]], [2 x <16 x i8>]* [[COERCE_DIVE_I]], align 16			// CHECK: store [2 x <16 x i8>] [[TMP0]], [2 x <16 x i8>]* [[COERCE_DIVE_I]], align 16
	// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.int8x16x2_t, %struct.int8x16x2_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.int8x16x2_t, %struct.int8x16x2_t* [[__P1_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [2 x <16 x i8>], [2 x <16 x i8>]* [[VAL_I]], i64 0, i64 0			// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [2 x <16 x i8>], [2 x <16 x i8>]* [[VAL_I]], i64 0, i64 0
	// CHECK: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX_I]], align 16			// CHECK: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX_I]], align 16
	// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.int8x16x2_t, %struct.int8x16x2_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.int8x16x2_t, %struct.int8x16x2_t* [[__P1_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [2 x <16 x i8>], [2 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1			// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [2 x <16 x i8>], [2 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1
	// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16			// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16
	// CHECK: [[VTBX2_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbx2.v8i8(<8 x i8> %a, <16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <8 x i8> %c) #2			// CHECK: [[VTBX2_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbx2.v8i8(<8 x i8> %a, <16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <8 x i8> %c) #3
	// CHECK: ret <8 x i8> [[VTBX2_I]]			// CHECK: ret <8 x i8> [[VTBX2_I]]
	int8x8_t test_vqtbx2_s8(int8x8_t a, int8x16x2_t b, int8x8_t c) {			int8x8_t test_vqtbx2_s8(int8x8_t a, int8x16x2_t b, int8x8_t c) {
	return vqtbx2_s8(a, b, c);			return vqtbx2_s8(a, b, c);
	}			}

	// CHECK-LABEL: define <8 x i8> @test_vqtbx3_s8(<8 x i8> %a, [3 x <16 x i8>] %b.coerce, <8 x i8> %c) #0 {			// CHECK-LABEL: define <8 x i8> @test_vqtbx3_s8(<8 x i8> %a, [3 x <16 x i8>] %b.coerce, <8 x i8> %c) #0 {
	// CHECK: [[__P1_I:%.*]] = alloca %struct.int8x16x3_t, align 16			// CHECK: [[__P1_I:%.*]] = alloca %struct.int8x16x3_t, align 16
	// CHECK: [[B:%.*]] = alloca %struct.int8x16x3_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.int8x16x3_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x16x3_t, %struct.int8x16x3_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x16x3_t, %struct.int8x16x3_t* [[B]], i32 0, i32 0
	// CHECK: store [3 x <16 x i8>] [[B]].coerce, [3 x <16 x i8>]* [[COERCE_DIVE]], align 16			// CHECK: store [3 x <16 x i8>] [[B]].coerce, [3 x <16 x i8>]* [[COERCE_DIVE]], align 16
	// CHECK: [[COERCE_DIVE1:%.*]] = getelementptr inbounds %struct.int8x16x3_t, %struct.int8x16x3_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE1:%.*]] = getelementptr inbounds %struct.int8x16x3_t, %struct.int8x16x3_t* [[B]], i32 0, i32 0
	// CHECK: [[TMP0:%.*]] = load [3 x <16 x i8>], [3 x <16 x i8>]* [[COERCE_DIVE1]], align 16			// CHECK: [[TMP0:%.*]] = load [3 x <16 x i8>], [3 x <16 x i8>]* [[COERCE_DIVE1]], align 16
	// CHECK: [[COERCE_DIVE_I:%.*]] = getelementptr inbounds %struct.int8x16x3_t, %struct.int8x16x3_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[COERCE_DIVE_I:%.*]] = getelementptr inbounds %struct.int8x16x3_t, %struct.int8x16x3_t* [[__P1_I]], i32 0, i32 0
	// CHECK: store [3 x <16 x i8>] [[TMP0]], [3 x <16 x i8>]* [[COERCE_DIVE_I]], align 16			// CHECK: store [3 x <16 x i8>] [[TMP0]], [3 x <16 x i8>]* [[COERCE_DIVE_I]], align 16
	// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.int8x16x3_t, %struct.int8x16x3_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.int8x16x3_t, %struct.int8x16x3_t* [[__P1_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL_I]], i64 0, i64 0			// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL_I]], i64 0, i64 0
	// CHECK: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX_I]], align 16			// CHECK: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX_I]], align 16
	// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.int8x16x3_t, %struct.int8x16x3_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.int8x16x3_t, %struct.int8x16x3_t* [[__P1_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1			// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1
	// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16			// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16
	// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.int8x16x3_t, %struct.int8x16x3_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.int8x16x3_t, %struct.int8x16x3_t* [[__P1_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL3_I]], i64 0, i64 2			// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL3_I]], i64 0, i64 2
	// CHECK: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX4_I]], align 16			// CHECK: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX4_I]], align 16
	// CHECK: [[VTBX3_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbx3.v8i8(<8 x i8> %a, <16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <16 x i8> [[TMP3]], <8 x i8> %c) #2			// CHECK: [[VTBX3_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbx3.v8i8(<8 x i8> %a, <16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <16 x i8> [[TMP3]], <8 x i8> %c) #3
	// CHECK: ret <8 x i8> [[VTBX3_I]]			// CHECK: ret <8 x i8> [[VTBX3_I]]
	int8x8_t test_vqtbx3_s8(int8x8_t a, int8x16x3_t b, int8x8_t c) {			int8x8_t test_vqtbx3_s8(int8x8_t a, int8x16x3_t b, int8x8_t c) {
	return vqtbx3_s8(a, b, c);			return vqtbx3_s8(a, b, c);
	}			}

	// CHECK-LABEL: define <8 x i8> @test_vqtbx4_s8(<8 x i8> %a, [4 x <16 x i8>] %b.coerce, <8 x i8> %c) #0 {			// CHECK-LABEL: define <8 x i8> @test_vqtbx4_s8(<8 x i8> %a, [4 x <16 x i8>] %b.coerce, <8 x i8> %c) #0 {
	// CHECK: [[__P1_I:%.*]] = alloca %struct.int8x16x4_t, align 16			// CHECK: [[__P1_I:%.*]] = alloca %struct.int8x16x4_t, align 16
	// CHECK: [[B:%.*]] = alloca %struct.int8x16x4_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.int8x16x4_t, align 16
	Show All 10 Lines
	// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1			// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1
	// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16			// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16
	// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.int8x16x4_t, %struct.int8x16x4_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.int8x16x4_t, %struct.int8x16x4_t* [[__P1_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL3_I]], i64 0, i64 2			// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL3_I]], i64 0, i64 2
	// CHECK: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX4_I]], align 16			// CHECK: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX4_I]], align 16
	// CHECK: [[VAL5_I:%.*]] = getelementptr inbounds %struct.int8x16x4_t, %struct.int8x16x4_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[VAL5_I:%.*]] = getelementptr inbounds %struct.int8x16x4_t, %struct.int8x16x4_t* [[__P1_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX6_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL5_I]], i64 0, i64 3			// CHECK: [[ARRAYIDX6_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL5_I]], i64 0, i64 3
	// CHECK: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX6_I]], align 16			// CHECK: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX6_I]], align 16
	// CHECK: [[VTBX4_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbx4.v8i8(<8 x i8> %a, <16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <16 x i8> [[TMP3]], <16 x i8> [[TMP4]], <8 x i8> %c) #2			// CHECK: [[VTBX4_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbx4.v8i8(<8 x i8> %a, <16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <16 x i8> [[TMP3]], <16 x i8> [[TMP4]], <8 x i8> %c) #3
	// CHECK: ret <8 x i8> [[VTBX4_I]]			// CHECK: ret <8 x i8> [[VTBX4_I]]
	int8x8_t test_vqtbx4_s8(int8x8_t a, int8x16x4_t b, int8x8_t c) {			int8x8_t test_vqtbx4_s8(int8x8_t a, int8x16x4_t b, int8x8_t c) {
	return vqtbx4_s8(a, b, c);			return vqtbx4_s8(a, b, c);
	}			}

	// CHECK-LABEL: define <16 x i8> @test_vqtbx1q_s8(<16 x i8> %a, <16 x i8> %b, <16 x i8> %c) #0 {			// CHECK-LABEL: define <16 x i8> @test_vqtbx1q_s8(<16 x i8> %a, <16 x i8> %b, <16 x i8> %c) #1 {
	// CHECK: [[VTBX1_I:%.*]] = call <16 x i8> @llvm.aarch64.neon.tbx1.v16i8(<16 x i8> %a, <16 x i8> %b, <16 x i8> %c) #2			// CHECK: [[VTBX1_I:%.*]] = call <16 x i8> @llvm.aarch64.neon.tbx1.v16i8(<16 x i8> %a, <16 x i8> %b, <16 x i8> %c) #3
	// CHECK: ret <16 x i8> [[VTBX1_I]]			// CHECK: ret <16 x i8> [[VTBX1_I]]
	int8x16_t test_vqtbx1q_s8(int8x16_t a, int8x16_t b, int8x16_t c) {			int8x16_t test_vqtbx1q_s8(int8x16_t a, int8x16_t b, int8x16_t c) {
	return vqtbx1q_s8(a, b, c);			return vqtbx1q_s8(a, b, c);
	}			}

	// CHECK-LABEL: define <16 x i8> @test_vqtbx2q_s8(<16 x i8> %a, [2 x <16 x i8>] %b.coerce, <16 x i8> %c) #0 {			// CHECK-LABEL: define <16 x i8> @test_vqtbx2q_s8(<16 x i8> %a, [2 x <16 x i8>] %b.coerce, <16 x i8> %c) #1 {
	// CHECK: [[__P1_I:%.*]] = alloca %struct.int8x16x2_t, align 16			// CHECK: [[__P1_I:%.*]] = alloca %struct.int8x16x2_t, align 16
	// CHECK: [[B:%.*]] = alloca %struct.int8x16x2_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.int8x16x2_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x16x2_t, %struct.int8x16x2_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x16x2_t, %struct.int8x16x2_t* [[B]], i32 0, i32 0
	// CHECK: store [2 x <16 x i8>] [[B]].coerce, [2 x <16 x i8>]* [[COERCE_DIVE]], align 16			// CHECK: store [2 x <16 x i8>] [[B]].coerce, [2 x <16 x i8>]* [[COERCE_DIVE]], align 16
	// CHECK: [[COERCE_DIVE1:%.*]] = getelementptr inbounds %struct.int8x16x2_t, %struct.int8x16x2_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE1:%.*]] = getelementptr inbounds %struct.int8x16x2_t, %struct.int8x16x2_t* [[B]], i32 0, i32 0
	// CHECK: [[TMP0:%.*]] = load [2 x <16 x i8>], [2 x <16 x i8>]* [[COERCE_DIVE1]], align 16			// CHECK: [[TMP0:%.*]] = load [2 x <16 x i8>], [2 x <16 x i8>]* [[COERCE_DIVE1]], align 16
	// CHECK: [[COERCE_DIVE_I:%.*]] = getelementptr inbounds %struct.int8x16x2_t, %struct.int8x16x2_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[COERCE_DIVE_I:%.*]] = getelementptr inbounds %struct.int8x16x2_t, %struct.int8x16x2_t* [[__P1_I]], i32 0, i32 0
	// CHECK: store [2 x <16 x i8>] [[TMP0]], [2 x <16 x i8>]* [[COERCE_DIVE_I]], align 16			// CHECK: store [2 x <16 x i8>] [[TMP0]], [2 x <16 x i8>]* [[COERCE_DIVE_I]], align 16
	// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.int8x16x2_t, %struct.int8x16x2_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.int8x16x2_t, %struct.int8x16x2_t* [[__P1_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [2 x <16 x i8>], [2 x <16 x i8>]* [[VAL_I]], i64 0, i64 0			// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [2 x <16 x i8>], [2 x <16 x i8>]* [[VAL_I]], i64 0, i64 0
	// CHECK: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX_I]], align 16			// CHECK: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX_I]], align 16
	// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.int8x16x2_t, %struct.int8x16x2_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.int8x16x2_t, %struct.int8x16x2_t* [[__P1_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [2 x <16 x i8>], [2 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1			// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [2 x <16 x i8>], [2 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1
	// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16			// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16
	// CHECK: [[VTBX2_I:%.*]] = call <16 x i8> @llvm.aarch64.neon.tbx2.v16i8(<16 x i8> %a, <16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <16 x i8> %c) #2			// CHECK: [[VTBX2_I:%.*]] = call <16 x i8> @llvm.aarch64.neon.tbx2.v16i8(<16 x i8> %a, <16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <16 x i8> %c) #3
	// CHECK: ret <16 x i8> [[VTBX2_I]]			// CHECK: ret <16 x i8> [[VTBX2_I]]
	int8x16_t test_vqtbx2q_s8(int8x16_t a, int8x16x2_t b, int8x16_t c) {			int8x16_t test_vqtbx2q_s8(int8x16_t a, int8x16x2_t b, int8x16_t c) {
	return vqtbx2q_s8(a, b, c);			return vqtbx2q_s8(a, b, c);
	}			}

	// CHECK-LABEL: define <16 x i8> @test_vqtbx3q_s8(<16 x i8> %a, [3 x <16 x i8>] %b.coerce, <16 x i8> %c) #0 {			// CHECK-LABEL: define <16 x i8> @test_vqtbx3q_s8(<16 x i8> %a, [3 x <16 x i8>] %b.coerce, <16 x i8> %c) #1 {
	// CHECK: [[__P1_I:%.*]] = alloca %struct.int8x16x3_t, align 16			// CHECK: [[__P1_I:%.*]] = alloca %struct.int8x16x3_t, align 16
	// CHECK: [[B:%.*]] = alloca %struct.int8x16x3_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.int8x16x3_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x16x3_t, %struct.int8x16x3_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x16x3_t, %struct.int8x16x3_t* [[B]], i32 0, i32 0
	// CHECK: store [3 x <16 x i8>] [[B]].coerce, [3 x <16 x i8>]* [[COERCE_DIVE]], align 16			// CHECK: store [3 x <16 x i8>] [[B]].coerce, [3 x <16 x i8>]* [[COERCE_DIVE]], align 16
	// CHECK: [[COERCE_DIVE1:%.*]] = getelementptr inbounds %struct.int8x16x3_t, %struct.int8x16x3_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE1:%.*]] = getelementptr inbounds %struct.int8x16x3_t, %struct.int8x16x3_t* [[B]], i32 0, i32 0
	// CHECK: [[TMP0:%.*]] = load [3 x <16 x i8>], [3 x <16 x i8>]* [[COERCE_DIVE1]], align 16			// CHECK: [[TMP0:%.*]] = load [3 x <16 x i8>], [3 x <16 x i8>]* [[COERCE_DIVE1]], align 16
	// CHECK: [[COERCE_DIVE_I:%.*]] = getelementptr inbounds %struct.int8x16x3_t, %struct.int8x16x3_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[COERCE_DIVE_I:%.*]] = getelementptr inbounds %struct.int8x16x3_t, %struct.int8x16x3_t* [[__P1_I]], i32 0, i32 0
	// CHECK: store [3 x <16 x i8>] [[TMP0]], [3 x <16 x i8>]* [[COERCE_DIVE_I]], align 16			// CHECK: store [3 x <16 x i8>] [[TMP0]], [3 x <16 x i8>]* [[COERCE_DIVE_I]], align 16
	// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.int8x16x3_t, %struct.int8x16x3_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.int8x16x3_t, %struct.int8x16x3_t* [[__P1_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL_I]], i64 0, i64 0			// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL_I]], i64 0, i64 0
	// CHECK: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX_I]], align 16			// CHECK: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX_I]], align 16
	// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.int8x16x3_t, %struct.int8x16x3_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.int8x16x3_t, %struct.int8x16x3_t* [[__P1_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1			// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1
	// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16			// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16
	// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.int8x16x3_t, %struct.int8x16x3_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.int8x16x3_t, %struct.int8x16x3_t* [[__P1_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL3_I]], i64 0, i64 2			// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL3_I]], i64 0, i64 2
	// CHECK: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX4_I]], align 16			// CHECK: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX4_I]], align 16
	// CHECK: [[VTBX3_I:%.*]] = call <16 x i8> @llvm.aarch64.neon.tbx3.v16i8(<16 x i8> %a, <16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <16 x i8> [[TMP3]], <16 x i8> %c) #2			// CHECK: [[VTBX3_I:%.*]] = call <16 x i8> @llvm.aarch64.neon.tbx3.v16i8(<16 x i8> %a, <16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <16 x i8> [[TMP3]], <16 x i8> %c) #3
	// CHECK: ret <16 x i8> [[VTBX3_I]]			// CHECK: ret <16 x i8> [[VTBX3_I]]
	int8x16_t test_vqtbx3q_s8(int8x16_t a, int8x16x3_t b, int8x16_t c) {			int8x16_t test_vqtbx3q_s8(int8x16_t a, int8x16x3_t b, int8x16_t c) {
	return vqtbx3q_s8(a, b, c);			return vqtbx3q_s8(a, b, c);
	}			}

	// CHECK-LABEL: define <16 x i8> @test_vqtbx4q_s8(<16 x i8> %a, [4 x <16 x i8>] %b.coerce, <16 x i8> %c) #0 {			// CHECK-LABEL: define <16 x i8> @test_vqtbx4q_s8(<16 x i8> %a, [4 x <16 x i8>] %b.coerce, <16 x i8> %c) #1 {
	// CHECK: [[__P1_I:%.*]] = alloca %struct.int8x16x4_t, align 16			// CHECK: [[__P1_I:%.*]] = alloca %struct.int8x16x4_t, align 16
	// CHECK: [[B:%.*]] = alloca %struct.int8x16x4_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.int8x16x4_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x16x4_t, %struct.int8x16x4_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x16x4_t, %struct.int8x16x4_t* [[B]], i32 0, i32 0
	// CHECK: store [4 x <16 x i8>] [[B]].coerce, [4 x <16 x i8>]* [[COERCE_DIVE]], align 16			// CHECK: store [4 x <16 x i8>] [[B]].coerce, [4 x <16 x i8>]* [[COERCE_DIVE]], align 16
	// CHECK: [[COERCE_DIVE1:%.*]] = getelementptr inbounds %struct.int8x16x4_t, %struct.int8x16x4_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE1:%.*]] = getelementptr inbounds %struct.int8x16x4_t, %struct.int8x16x4_t* [[B]], i32 0, i32 0
	// CHECK: [[TMP0:%.*]] = load [4 x <16 x i8>], [4 x <16 x i8>]* [[COERCE_DIVE1]], align 16			// CHECK: [[TMP0:%.*]] = load [4 x <16 x i8>], [4 x <16 x i8>]* [[COERCE_DIVE1]], align 16
	// CHECK: [[COERCE_DIVE_I:%.*]] = getelementptr inbounds %struct.int8x16x4_t, %struct.int8x16x4_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[COERCE_DIVE_I:%.*]] = getelementptr inbounds %struct.int8x16x4_t, %struct.int8x16x4_t* [[__P1_I]], i32 0, i32 0
	// CHECK: store [4 x <16 x i8>] [[TMP0]], [4 x <16 x i8>]* [[COERCE_DIVE_I]], align 16			// CHECK: store [4 x <16 x i8>] [[TMP0]], [4 x <16 x i8>]* [[COERCE_DIVE_I]], align 16
	// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.int8x16x4_t, %struct.int8x16x4_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.int8x16x4_t, %struct.int8x16x4_t* [[__P1_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL_I]], i64 0, i64 0			// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL_I]], i64 0, i64 0
	// CHECK: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX_I]], align 16			// CHECK: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX_I]], align 16
	// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.int8x16x4_t, %struct.int8x16x4_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.int8x16x4_t, %struct.int8x16x4_t* [[__P1_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1			// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1
	// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16			// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16
	// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.int8x16x4_t, %struct.int8x16x4_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.int8x16x4_t, %struct.int8x16x4_t* [[__P1_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL3_I]], i64 0, i64 2			// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL3_I]], i64 0, i64 2
	// CHECK: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX4_I]], align 16			// CHECK: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX4_I]], align 16
	// CHECK: [[VAL5_I:%.*]] = getelementptr inbounds %struct.int8x16x4_t, %struct.int8x16x4_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[VAL5_I:%.*]] = getelementptr inbounds %struct.int8x16x4_t, %struct.int8x16x4_t* [[__P1_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX6_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL5_I]], i64 0, i64 3			// CHECK: [[ARRAYIDX6_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL5_I]], i64 0, i64 3
	// CHECK: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX6_I]], align 16			// CHECK: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX6_I]], align 16
	// CHECK: [[VTBX4_I:%.*]] = call <16 x i8> @llvm.aarch64.neon.tbx4.v16i8(<16 x i8> %a, <16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <16 x i8> [[TMP3]], <16 x i8> [[TMP4]], <16 x i8> %c) #2			// CHECK: [[VTBX4_I:%.*]] = call <16 x i8> @llvm.aarch64.neon.tbx4.v16i8(<16 x i8> %a, <16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <16 x i8> [[TMP3]], <16 x i8> [[TMP4]], <16 x i8> %c) #3
	// CHECK: ret <16 x i8> [[VTBX4_I]]			// CHECK: ret <16 x i8> [[VTBX4_I]]
	int8x16_t test_vqtbx4q_s8(int8x16_t a, int8x16x4_t b, int8x16_t c) {			int8x16_t test_vqtbx4q_s8(int8x16_t a, int8x16x4_t b, int8x16_t c) {
	return vqtbx4q_s8(a, b, c);			return vqtbx4q_s8(a, b, c);
	}			}

	// CHECK-LABEL: define <8 x i8> @test_vtbl1_u8(<8 x i8> %a, <8 x i8> %b) #0 {			// CHECK-LABEL: define <8 x i8> @test_vtbl1_u8(<8 x i8> %a, <8 x i8> %b) #0 {
	// CHECK: [[VTBL1_I:%.*]] = shufflevector <8 x i8> %a, <8 x i8> zeroinitializer, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>			// CHECK: [[VTBL1_I:%.*]] = shufflevector <8 x i8> %a, <8 x i8> zeroinitializer, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
	// CHECK: [[VTBL11_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl1.v8i8(<16 x i8> [[VTBL1_I]], <8 x i8> %b) #2			// CHECK: [[VTBL11_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl1.v8i8(<16 x i8> [[VTBL1_I]], <8 x i8> %b) #3
	// CHECK: ret <8 x i8> [[VTBL11_I]]			// CHECK: ret <8 x i8> [[VTBL11_I]]
	uint8x8_t test_vtbl1_u8(uint8x8_t a, uint8x8_t b) {			uint8x8_t test_vtbl1_u8(uint8x8_t a, uint8x8_t b) {
	return vtbl1_u8(a, b);			return vtbl1_u8(a, b);
	}			}

	// CHECK-LABEL: define <8 x i8> @test_vqtbl1_u8(<16 x i8> %a, <8 x i8> %b) #0 {			// CHECK-LABEL: define <8 x i8> @test_vqtbl1_u8(<16 x i8> %a, <8 x i8> %b) #1 {
	// CHECK: [[VTBL1_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl1.v8i8(<16 x i8> %a, <8 x i8> %b) #2			// CHECK: [[VTBL1_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl1.v8i8(<16 x i8> %a, <8 x i8> %b) #3
	// CHECK: ret <8 x i8> [[VTBL1_I]]			// CHECK: ret <8 x i8> [[VTBL1_I]]
	uint8x8_t test_vqtbl1_u8(uint8x16_t a, uint8x8_t b) {			uint8x8_t test_vqtbl1_u8(uint8x16_t a, uint8x8_t b) {
	return vqtbl1_u8(a, b);			return vqtbl1_u8(a, b);
	}			}

	// CHECK-LABEL: define <8 x i8> @test_vtbl2_u8([2 x <8 x i8>] %a.coerce, <8 x i8> %b) #0 {			// CHECK-LABEL: define <8 x i8> @test_vtbl2_u8([2 x <8 x i8>] %a.coerce, <8 x i8> %b) #0 {
	// CHECK: [[__P0_I:%.*]] = alloca %struct.uint8x8x2_t, align 8			// CHECK: [[__P0_I:%.*]] = alloca %struct.uint8x8x2_t, align 8
	// CHECK: [[A:%.*]] = alloca %struct.uint8x8x2_t, align 8			// CHECK: [[A:%.*]] = alloca %struct.uint8x8x2_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint8x8x2_t, %struct.uint8x8x2_t* [[A]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint8x8x2_t, %struct.uint8x8x2_t* [[A]], i32 0, i32 0
	// CHECK: store [2 x <8 x i8>] [[A]].coerce, [2 x <8 x i8>]* [[COERCE_DIVE]], align 8			// CHECK: store [2 x <8 x i8>] [[A]].coerce, [2 x <8 x i8>]* [[COERCE_DIVE]], align 8
	// CHECK: [[COERCE_DIVE1:%.*]] = getelementptr inbounds %struct.uint8x8x2_t, %struct.uint8x8x2_t* [[A]], i32 0, i32 0			// CHECK: [[COERCE_DIVE1:%.*]] = getelementptr inbounds %struct.uint8x8x2_t, %struct.uint8x8x2_t* [[A]], i32 0, i32 0
	// CHECK: [[TMP0:%.*]] = load [2 x <8 x i8>], [2 x <8 x i8>]* [[COERCE_DIVE1]], align 8			// CHECK: [[TMP0:%.*]] = load [2 x <8 x i8>], [2 x <8 x i8>]* [[COERCE_DIVE1]], align 8
	// CHECK: [[COERCE_DIVE_I:%.*]] = getelementptr inbounds %struct.uint8x8x2_t, %struct.uint8x8x2_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[COERCE_DIVE_I:%.*]] = getelementptr inbounds %struct.uint8x8x2_t, %struct.uint8x8x2_t* [[__P0_I]], i32 0, i32 0
	// CHECK: store [2 x <8 x i8>] [[TMP0]], [2 x <8 x i8>]* [[COERCE_DIVE_I]], align 8			// CHECK: store [2 x <8 x i8>] [[TMP0]], [2 x <8 x i8>]* [[COERCE_DIVE_I]], align 8
	// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.uint8x8x2_t, %struct.uint8x8x2_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.uint8x8x2_t, %struct.uint8x8x2_t* [[__P0_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [2 x <8 x i8>], [2 x <8 x i8>]* [[VAL_I]], i64 0, i64 0			// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [2 x <8 x i8>], [2 x <8 x i8>]* [[VAL_I]], i64 0, i64 0
	// CHECK: [[TMP1:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX_I]], align 8			// CHECK: [[TMP1:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX_I]], align 8
	// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.uint8x8x2_t, %struct.uint8x8x2_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.uint8x8x2_t, %struct.uint8x8x2_t* [[__P0_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [2 x <8 x i8>], [2 x <8 x i8>]* [[VAL1_I]], i64 0, i64 1			// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [2 x <8 x i8>], [2 x <8 x i8>]* [[VAL1_I]], i64 0, i64 1
	// CHECK: [[TMP2:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX2_I]], align 8			// CHECK: [[TMP2:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX2_I]], align 8
	// CHECK: [[VTBL1_I:%.*]] = shufflevector <8 x i8> [[TMP1]], <8 x i8> [[TMP2]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>			// CHECK: [[VTBL1_I:%.*]] = shufflevector <8 x i8> [[TMP1]], <8 x i8> [[TMP2]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
	// CHECK: [[VTBL13_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl1.v8i8(<16 x i8> [[VTBL1_I]], <8 x i8> %b) #2			// CHECK: [[VTBL13_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl1.v8i8(<16 x i8> [[VTBL1_I]], <8 x i8> %b) #3
	// CHECK: ret <8 x i8> [[VTBL13_I]]			// CHECK: ret <8 x i8> [[VTBL13_I]]
	uint8x8_t test_vtbl2_u8(uint8x8x2_t a, uint8x8_t b) {			uint8x8_t test_vtbl2_u8(uint8x8x2_t a, uint8x8_t b) {
	return vtbl2_u8(a, b);			return vtbl2_u8(a, b);
	}			}

	// CHECK-LABEL: define <8 x i8> @test_vqtbl2_u8([2 x <16 x i8>] %a.coerce, <8 x i8> %b) #0 {			// CHECK-LABEL: define <8 x i8> @test_vqtbl2_u8([2 x <16 x i8>] %a.coerce, <8 x i8> %b) #0 {
	// CHECK: [[__P0_I:%.*]] = alloca %struct.uint8x16x2_t, align 16			// CHECK: [[__P0_I:%.*]] = alloca %struct.uint8x16x2_t, align 16
	// CHECK: [[A:%.*]] = alloca %struct.uint8x16x2_t, align 16			// CHECK: [[A:%.*]] = alloca %struct.uint8x16x2_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint8x16x2_t, %struct.uint8x16x2_t* [[A]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint8x16x2_t, %struct.uint8x16x2_t* [[A]], i32 0, i32 0
	// CHECK: store [2 x <16 x i8>] [[A]].coerce, [2 x <16 x i8>]* [[COERCE_DIVE]], align 16			// CHECK: store [2 x <16 x i8>] [[A]].coerce, [2 x <16 x i8>]* [[COERCE_DIVE]], align 16
	// CHECK: [[COERCE_DIVE1:%.*]] = getelementptr inbounds %struct.uint8x16x2_t, %struct.uint8x16x2_t* [[A]], i32 0, i32 0			// CHECK: [[COERCE_DIVE1:%.*]] = getelementptr inbounds %struct.uint8x16x2_t, %struct.uint8x16x2_t* [[A]], i32 0, i32 0
	// CHECK: [[TMP0:%.*]] = load [2 x <16 x i8>], [2 x <16 x i8>]* [[COERCE_DIVE1]], align 16			// CHECK: [[TMP0:%.*]] = load [2 x <16 x i8>], [2 x <16 x i8>]* [[COERCE_DIVE1]], align 16
	// CHECK: [[COERCE_DIVE_I:%.*]] = getelementptr inbounds %struct.uint8x16x2_t, %struct.uint8x16x2_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[COERCE_DIVE_I:%.*]] = getelementptr inbounds %struct.uint8x16x2_t, %struct.uint8x16x2_t* [[__P0_I]], i32 0, i32 0
	// CHECK: store [2 x <16 x i8>] [[TMP0]], [2 x <16 x i8>]* [[COERCE_DIVE_I]], align 16			// CHECK: store [2 x <16 x i8>] [[TMP0]], [2 x <16 x i8>]* [[COERCE_DIVE_I]], align 16
	// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.uint8x16x2_t, %struct.uint8x16x2_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.uint8x16x2_t, %struct.uint8x16x2_t* [[__P0_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [2 x <16 x i8>], [2 x <16 x i8>]* [[VAL_I]], i64 0, i64 0			// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [2 x <16 x i8>], [2 x <16 x i8>]* [[VAL_I]], i64 0, i64 0
	// CHECK: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX_I]], align 16			// CHECK: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX_I]], align 16
	// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.uint8x16x2_t, %struct.uint8x16x2_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.uint8x16x2_t, %struct.uint8x16x2_t* [[__P0_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [2 x <16 x i8>], [2 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1			// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [2 x <16 x i8>], [2 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1
	// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16			// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16
	// CHECK: [[VTBL2_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl2.v8i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <8 x i8> %b) #2			// CHECK: [[VTBL2_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl2.v8i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <8 x i8> %b) #3
	// CHECK: ret <8 x i8> [[VTBL2_I]]			// CHECK: ret <8 x i8> [[VTBL2_I]]
	uint8x8_t test_vqtbl2_u8(uint8x16x2_t a, uint8x8_t b) {			uint8x8_t test_vqtbl2_u8(uint8x16x2_t a, uint8x8_t b) {
	return vqtbl2_u8(a, b);			return vqtbl2_u8(a, b);
	}			}

	// CHECK-LABEL: define <8 x i8> @test_vtbl3_u8([3 x <8 x i8>] %a.coerce, <8 x i8> %b) #0 {			// CHECK-LABEL: define <8 x i8> @test_vtbl3_u8([3 x <8 x i8>] %a.coerce, <8 x i8> %b) #0 {
	// CHECK: [[__P0_I:%.*]] = alloca %struct.uint8x8x3_t, align 8			// CHECK: [[__P0_I:%.*]] = alloca %struct.uint8x8x3_t, align 8
	// CHECK: [[A:%.*]] = alloca %struct.uint8x8x3_t, align 8			// CHECK: [[A:%.*]] = alloca %struct.uint8x8x3_t, align 8
	[... 9 unchanged lines not shown ...]
	// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.uint8x8x3_t, %struct.uint8x8x3_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.uint8x8x3_t, %struct.uint8x8x3_t* [[__P0_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [3 x <8 x i8>], [3 x <8 x i8>]* [[VAL1_I]], i64 0, i64 1			// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [3 x <8 x i8>], [3 x <8 x i8>]* [[VAL1_I]], i64 0, i64 1
	// CHECK: [[TMP2:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX2_I]], align 8			// CHECK: [[TMP2:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX2_I]], align 8
	// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.uint8x8x3_t, %struct.uint8x8x3_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.uint8x8x3_t, %struct.uint8x8x3_t* [[__P0_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [3 x <8 x i8>], [3 x <8 x i8>]* [[VAL3_I]], i64 0, i64 2			// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [3 x <8 x i8>], [3 x <8 x i8>]* [[VAL3_I]], i64 0, i64 2
	// CHECK: [[TMP3:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX4_I]], align 8			// CHECK: [[TMP3:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX4_I]], align 8
	// CHECK: [[VTBL2_I:%.*]] = shufflevector <8 x i8> [[TMP1]], <8 x i8> [[TMP2]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>			// CHECK: [[VTBL2_I:%.*]] = shufflevector <8 x i8> [[TMP1]], <8 x i8> [[TMP2]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
	// CHECK: [[VTBL25_I:%.*]] = shufflevector <8 x i8> [[TMP3]], <8 x i8> zeroinitializer, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>			// CHECK: [[VTBL25_I:%.*]] = shufflevector <8 x i8> [[TMP3]], <8 x i8> zeroinitializer, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
	// CHECK: [[VTBL26_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl2.v8i8(<16 x i8> [[VTBL2_I]], <16 x i8> [[VTBL25_I]], <8 x i8> %b) #2			// CHECK: [[VTBL26_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl2.v8i8(<16 x i8> [[VTBL2_I]], <16 x i8> [[VTBL25_I]], <8 x i8> %b) #3
	// CHECK: ret <8 x i8> [[VTBL26_I]]			// CHECK: ret <8 x i8> [[VTBL26_I]]
	uint8x8_t test_vtbl3_u8(uint8x8x3_t a, uint8x8_t b) {			uint8x8_t test_vtbl3_u8(uint8x8x3_t a, uint8x8_t b) {
	return vtbl3_u8(a, b);			return vtbl3_u8(a, b);
	}			}

	// CHECK-LABEL: define <8 x i8> @test_vqtbl3_u8([3 x <16 x i8>] %a.coerce, <8 x i8> %b) #0 {			// CHECK-LABEL: define <8 x i8> @test_vqtbl3_u8([3 x <16 x i8>] %a.coerce, <8 x i8> %b) #0 {
	// CHECK: [[__P0_I:%.*]] = alloca %struct.uint8x16x3_t, align 16			// CHECK: [[__P0_I:%.*]] = alloca %struct.uint8x16x3_t, align 16
	// CHECK: [[A:%.*]] = alloca %struct.uint8x16x3_t, align 16			// CHECK: [[A:%.*]] = alloca %struct.uint8x16x3_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint8x16x3_t, %struct.uint8x16x3_t* [[A]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint8x16x3_t, %struct.uint8x16x3_t* [[A]], i32 0, i32 0
	// CHECK: store [3 x <16 x i8>] [[A]].coerce, [3 x <16 x i8>]* [[COERCE_DIVE]], align 16			// CHECK: store [3 x <16 x i8>] [[A]].coerce, [3 x <16 x i8>]* [[COERCE_DIVE]], align 16
	// CHECK: [[COERCE_DIVE1:%.*]] = getelementptr inbounds %struct.uint8x16x3_t, %struct.uint8x16x3_t* [[A]], i32 0, i32 0			// CHECK: [[COERCE_DIVE1:%.*]] = getelementptr inbounds %struct.uint8x16x3_t, %struct.uint8x16x3_t* [[A]], i32 0, i32 0
	// CHECK: [[TMP0:%.*]] = load [3 x <16 x i8>], [3 x <16 x i8>]* [[COERCE_DIVE1]], align 16			// CHECK: [[TMP0:%.*]] = load [3 x <16 x i8>], [3 x <16 x i8>]* [[COERCE_DIVE1]], align 16
	// CHECK: [[COERCE_DIVE_I:%.*]] = getelementptr inbounds %struct.uint8x16x3_t, %struct.uint8x16x3_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[COERCE_DIVE_I:%.*]] = getelementptr inbounds %struct.uint8x16x3_t, %struct.uint8x16x3_t* [[__P0_I]], i32 0, i32 0
	// CHECK: store [3 x <16 x i8>] [[TMP0]], [3 x <16 x i8>]* [[COERCE_DIVE_I]], align 16			// CHECK: store [3 x <16 x i8>] [[TMP0]], [3 x <16 x i8>]* [[COERCE_DIVE_I]], align 16
	// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.uint8x16x3_t, %struct.uint8x16x3_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.uint8x16x3_t, %struct.uint8x16x3_t* [[__P0_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL_I]], i64 0, i64 0			// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL_I]], i64 0, i64 0
	// CHECK: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX_I]], align 16			// CHECK: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX_I]], align 16
	// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.uint8x16x3_t, %struct.uint8x16x3_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.uint8x16x3_t, %struct.uint8x16x3_t* [[__P0_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1			// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1
	// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16			// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16
	// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.uint8x16x3_t, %struct.uint8x16x3_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.uint8x16x3_t, %struct.uint8x16x3_t* [[__P0_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL3_I]], i64 0, i64 2			// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL3_I]], i64 0, i64 2
	// CHECK: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX4_I]], align 16			// CHECK: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX4_I]], align 16
	// CHECK: [[VTBL3_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl3.v8i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <16 x i8> [[TMP3]], <8 x i8> %b) #2			// CHECK: [[VTBL3_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl3.v8i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <16 x i8> [[TMP3]], <8 x i8> %b) #3
	// CHECK: ret <8 x i8> [[VTBL3_I]]			// CHECK: ret <8 x i8> [[VTBL3_I]]
	uint8x8_t test_vqtbl3_u8(uint8x16x3_t a, uint8x8_t b) {			uint8x8_t test_vqtbl3_u8(uint8x16x3_t a, uint8x8_t b) {
	return vqtbl3_u8(a, b);			return vqtbl3_u8(a, b);
	}			}

	// CHECK-LABEL: define <8 x i8> @test_vtbl4_u8([4 x <8 x i8>] %a.coerce, <8 x i8> %b) #0 {			// CHECK-LABEL: define <8 x i8> @test_vtbl4_u8([4 x <8 x i8>] %a.coerce, <8 x i8> %b) #0 {
	// CHECK: [[__P0_I:%.*]] = alloca %struct.uint8x8x4_t, align 8			// CHECK: [[__P0_I:%.*]] = alloca %struct.uint8x8x4_t, align 8
	// CHECK: [[A:%.*]] = alloca %struct.uint8x8x4_t, align 8			// CHECK: [[A:%.*]] = alloca %struct.uint8x8x4_t, align 8
	[... 12 unchanged lines not shown ...]
	// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.uint8x8x4_t, %struct.uint8x8x4_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.uint8x8x4_t, %struct.uint8x8x4_t* [[__P0_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [4 x <8 x i8>], [4 x <8 x i8>]* [[VAL3_I]], i64 0, i64 2			// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [4 x <8 x i8>], [4 x <8 x i8>]* [[VAL3_I]], i64 0, i64 2
	// CHECK: [[TMP3:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX4_I]], align 8			// CHECK: [[TMP3:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX4_I]], align 8
	// CHECK: [[VAL5_I:%.*]] = getelementptr inbounds %struct.uint8x8x4_t, %struct.uint8x8x4_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[VAL5_I:%.*]] = getelementptr inbounds %struct.uint8x8x4_t, %struct.uint8x8x4_t* [[__P0_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX6_I:%.*]] = getelementptr inbounds [4 x <8 x i8>], [4 x <8 x i8>]* [[VAL5_I]], i64 0, i64 3			// CHECK: [[ARRAYIDX6_I:%.*]] = getelementptr inbounds [4 x <8 x i8>], [4 x <8 x i8>]* [[VAL5_I]], i64 0, i64 3
	// CHECK: [[TMP4:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX6_I]], align 8			// CHECK: [[TMP4:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX6_I]], align 8
	// CHECK: [[VTBL2_I:%.*]] = shufflevector <8 x i8> [[TMP1]], <8 x i8> [[TMP2]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>			// CHECK: [[VTBL2_I:%.*]] = shufflevector <8 x i8> [[TMP1]], <8 x i8> [[TMP2]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
	// CHECK: [[VTBL27_I:%.*]] = shufflevector <8 x i8> [[TMP3]], <8 x i8> [[TMP4]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>			// CHECK: [[VTBL27_I:%.*]] = shufflevector <8 x i8> [[TMP3]], <8 x i8> [[TMP4]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
	// CHECK: [[VTBL28_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl2.v8i8(<16 x i8> [[VTBL2_I]], <16 x i8> [[VTBL27_I]], <8 x i8> %b) #2			// CHECK: [[VTBL28_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl2.v8i8(<16 x i8> [[VTBL2_I]], <16 x i8> [[VTBL27_I]], <8 x i8> %b) #3
	// CHECK: ret <8 x i8> [[VTBL28_I]]			// CHECK: ret <8 x i8> [[VTBL28_I]]
	uint8x8_t test_vtbl4_u8(uint8x8x4_t a, uint8x8_t b) {			uint8x8_t test_vtbl4_u8(uint8x8x4_t a, uint8x8_t b) {
	return vtbl4_u8(a, b);			return vtbl4_u8(a, b);
	}			}

	// CHECK-LABEL: define <8 x i8> @test_vqtbl4_u8([4 x <16 x i8>] %a.coerce, <8 x i8> %b) #0 {			// CHECK-LABEL: define <8 x i8> @test_vqtbl4_u8([4 x <16 x i8>] %a.coerce, <8 x i8> %b) #0 {
	// CHECK: [[__P0_I:%.*]] = alloca %struct.uint8x16x4_t, align 16			// CHECK: [[__P0_I:%.*]] = alloca %struct.uint8x16x4_t, align 16
	// CHECK: [[A:%.*]] = alloca %struct.uint8x16x4_t, align 16			// CHECK: [[A:%.*]] = alloca %struct.uint8x16x4_t, align 16
	[... 10 unchanged lines not shown ...]
	// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1			// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1
	// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16			// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16
	// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.uint8x16x4_t, %struct.uint8x16x4_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.uint8x16x4_t, %struct.uint8x16x4_t* [[__P0_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL3_I]], i64 0, i64 2			// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL3_I]], i64 0, i64 2
	// CHECK: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX4_I]], align 16			// CHECK: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX4_I]], align 16
	// CHECK: [[VAL5_I:%.*]] = getelementptr inbounds %struct.uint8x16x4_t, %struct.uint8x16x4_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[VAL5_I:%.*]] = getelementptr inbounds %struct.uint8x16x4_t, %struct.uint8x16x4_t* [[__P0_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX6_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL5_I]], i64 0, i64 3			// CHECK: [[ARRAYIDX6_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL5_I]], i64 0, i64 3
	// CHECK: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX6_I]], align 16			// CHECK: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX6_I]], align 16
	// CHECK: [[VTBL4_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl4.v8i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <16 x i8> [[TMP3]], <16 x i8> [[TMP4]], <8 x i8> %b) #2			// CHECK: [[VTBL4_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl4.v8i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <16 x i8> [[TMP3]], <16 x i8> [[TMP4]], <8 x i8> %b) #3
	// CHECK: ret <8 x i8> [[VTBL4_I]]			// CHECK: ret <8 x i8> [[VTBL4_I]]
	uint8x8_t test_vqtbl4_u8(uint8x16x4_t a, uint8x8_t b) {			uint8x8_t test_vqtbl4_u8(uint8x16x4_t a, uint8x8_t b) {
	return vqtbl4_u8(a, b);			return vqtbl4_u8(a, b);
	}			}

	// CHECK-LABEL: define <16 x i8> @test_vqtbl1q_u8(<16 x i8> %a, <16 x i8> %b) #0 {			// CHECK-LABEL: define <16 x i8> @test_vqtbl1q_u8(<16 x i8> %a, <16 x i8> %b) #1 {
	// CHECK: [[VTBL1_I:%.*]] = call <16 x i8> @llvm.aarch64.neon.tbl1.v16i8(<16 x i8> %a, <16 x i8> %b) #2			// CHECK: [[VTBL1_I:%.*]] = call <16 x i8> @llvm.aarch64.neon.tbl1.v16i8(<16 x i8> %a, <16 x i8> %b) #3
	// CHECK: ret <16 x i8> [[VTBL1_I]]			// CHECK: ret <16 x i8> [[VTBL1_I]]
	uint8x16_t test_vqtbl1q_u8(uint8x16_t a, uint8x16_t b) {			uint8x16_t test_vqtbl1q_u8(uint8x16_t a, uint8x16_t b) {
	return vqtbl1q_u8(a, b);			return vqtbl1q_u8(a, b);
	}			}

	// CHECK-LABEL: define <16 x i8> @test_vqtbl2q_u8([2 x <16 x i8>] %a.coerce, <16 x i8> %b) #0 {			// CHECK-LABEL: define <16 x i8> @test_vqtbl2q_u8([2 x <16 x i8>] %a.coerce, <16 x i8> %b) #1 {
	// CHECK: [[__P0_I:%.*]] = alloca %struct.uint8x16x2_t, align 16			// CHECK: [[__P0_I:%.*]] = alloca %struct.uint8x16x2_t, align 16
	// CHECK: [[A:%.*]] = alloca %struct.uint8x16x2_t, align 16			// CHECK: [[A:%.*]] = alloca %struct.uint8x16x2_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint8x16x2_t, %struct.uint8x16x2_t* [[A]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint8x16x2_t, %struct.uint8x16x2_t* [[A]], i32 0, i32 0
	// CHECK: store [2 x <16 x i8>] [[A]].coerce, [2 x <16 x i8>]* [[COERCE_DIVE]], align 16			// CHECK: store [2 x <16 x i8>] [[A]].coerce, [2 x <16 x i8>]* [[COERCE_DIVE]], align 16
	// CHECK: [[COERCE_DIVE1:%.*]] = getelementptr inbounds %struct.uint8x16x2_t, %struct.uint8x16x2_t* [[A]], i32 0, i32 0			// CHECK: [[COERCE_DIVE1:%.*]] = getelementptr inbounds %struct.uint8x16x2_t, %struct.uint8x16x2_t* [[A]], i32 0, i32 0
	// CHECK: [[TMP0:%.*]] = load [2 x <16 x i8>], [2 x <16 x i8>]* [[COERCE_DIVE1]], align 16			// CHECK: [[TMP0:%.*]] = load [2 x <16 x i8>], [2 x <16 x i8>]* [[COERCE_DIVE1]], align 16
	// CHECK: [[COERCE_DIVE_I:%.*]] = getelementptr inbounds %struct.uint8x16x2_t, %struct.uint8x16x2_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[COERCE_DIVE_I:%.*]] = getelementptr inbounds %struct.uint8x16x2_t, %struct.uint8x16x2_t* [[__P0_I]], i32 0, i32 0
	// CHECK: store [2 x <16 x i8>] [[TMP0]], [2 x <16 x i8>]* [[COERCE_DIVE_I]], align 16			// CHECK: store [2 x <16 x i8>] [[TMP0]], [2 x <16 x i8>]* [[COERCE_DIVE_I]], align 16
	// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.uint8x16x2_t, %struct.uint8x16x2_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.uint8x16x2_t, %struct.uint8x16x2_t* [[__P0_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [2 x <16 x i8>], [2 x <16 x i8>]* [[VAL_I]], i64 0, i64 0			// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [2 x <16 x i8>], [2 x <16 x i8>]* [[VAL_I]], i64 0, i64 0
	// CHECK: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX_I]], align 16			// CHECK: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX_I]], align 16
	// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.uint8x16x2_t, %struct.uint8x16x2_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.uint8x16x2_t, %struct.uint8x16x2_t* [[__P0_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [2 x <16 x i8>], [2 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1			// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [2 x <16 x i8>], [2 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1
	// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16			// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16
	// CHECK: [[VTBL2_I:%.*]] = call <16 x i8> @llvm.aarch64.neon.tbl2.v16i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <16 x i8> %b) #2			// CHECK: [[VTBL2_I:%.*]] = call <16 x i8> @llvm.aarch64.neon.tbl2.v16i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <16 x i8> %b) #3
	// CHECK: ret <16 x i8> [[VTBL2_I]]			// CHECK: ret <16 x i8> [[VTBL2_I]]
	uint8x16_t test_vqtbl2q_u8(uint8x16x2_t a, uint8x16_t b) {			uint8x16_t test_vqtbl2q_u8(uint8x16x2_t a, uint8x16_t b) {
	return vqtbl2q_u8(a, b);			return vqtbl2q_u8(a, b);
	}			}

	// CHECK-LABEL: define <16 x i8> @test_vqtbl3q_u8([3 x <16 x i8>] %a.coerce, <16 x i8> %b) #0 {			// CHECK-LABEL: define <16 x i8> @test_vqtbl3q_u8([3 x <16 x i8>] %a.coerce, <16 x i8> %b) #1 {
	// CHECK: [[__P0_I:%.*]] = alloca %struct.uint8x16x3_t, align 16			// CHECK: [[__P0_I:%.*]] = alloca %struct.uint8x16x3_t, align 16
	// CHECK: [[A:%.*]] = alloca %struct.uint8x16x3_t, align 16			// CHECK: [[A:%.*]] = alloca %struct.uint8x16x3_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint8x16x3_t, %struct.uint8x16x3_t* [[A]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint8x16x3_t, %struct.uint8x16x3_t* [[A]], i32 0, i32 0
	// CHECK: store [3 x <16 x i8>] [[A]].coerce, [3 x <16 x i8>]* [[COERCE_DIVE]], align 16			// CHECK: store [3 x <16 x i8>] [[A]].coerce, [3 x <16 x i8>]* [[COERCE_DIVE]], align 16
	// CHECK: [[COERCE_DIVE1:%.*]] = getelementptr inbounds %struct.uint8x16x3_t, %struct.uint8x16x3_t* [[A]], i32 0, i32 0			// CHECK: [[COERCE_DIVE1:%.*]] = getelementptr inbounds %struct.uint8x16x3_t, %struct.uint8x16x3_t* [[A]], i32 0, i32 0
	// CHECK: [[TMP0:%.*]] = load [3 x <16 x i8>], [3 x <16 x i8>]* [[COERCE_DIVE1]], align 16			// CHECK: [[TMP0:%.*]] = load [3 x <16 x i8>], [3 x <16 x i8>]* [[COERCE_DIVE1]], align 16
	// CHECK: [[COERCE_DIVE_I:%.*]] = getelementptr inbounds %struct.uint8x16x3_t, %struct.uint8x16x3_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[COERCE_DIVE_I:%.*]] = getelementptr inbounds %struct.uint8x16x3_t, %struct.uint8x16x3_t* [[__P0_I]], i32 0, i32 0
	// CHECK: store [3 x <16 x i8>] [[TMP0]], [3 x <16 x i8>]* [[COERCE_DIVE_I]], align 16			// CHECK: store [3 x <16 x i8>] [[TMP0]], [3 x <16 x i8>]* [[COERCE_DIVE_I]], align 16
	// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.uint8x16x3_t, %struct.uint8x16x3_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.uint8x16x3_t, %struct.uint8x16x3_t* [[__P0_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL_I]], i64 0, i64 0			// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL_I]], i64 0, i64 0
	// CHECK: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX_I]], align 16			// CHECK: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX_I]], align 16
	// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.uint8x16x3_t, %struct.uint8x16x3_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.uint8x16x3_t, %struct.uint8x16x3_t* [[__P0_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1			// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1
	// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16			// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16
	// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.uint8x16x3_t, %struct.uint8x16x3_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.uint8x16x3_t, %struct.uint8x16x3_t* [[__P0_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL3_I]], i64 0, i64 2			// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL3_I]], i64 0, i64 2
	// CHECK: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX4_I]], align 16			// CHECK: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX4_I]], align 16
	// CHECK: [[VTBL3_I:%.*]] = call <16 x i8> @llvm.aarch64.neon.tbl3.v16i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <16 x i8> [[TMP3]], <16 x i8> %b) #2			// CHECK: [[VTBL3_I:%.*]] = call <16 x i8> @llvm.aarch64.neon.tbl3.v16i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <16 x i8> [[TMP3]], <16 x i8> %b) #3
	// CHECK: ret <16 x i8> [[VTBL3_I]]			// CHECK: ret <16 x i8> [[VTBL3_I]]
	uint8x16_t test_vqtbl3q_u8(uint8x16x3_t a, uint8x16_t b) {			uint8x16_t test_vqtbl3q_u8(uint8x16x3_t a, uint8x16_t b) {
	return vqtbl3q_u8(a, b);			return vqtbl3q_u8(a, b);
	}			}

	// CHECK-LABEL: define <16 x i8> @test_vqtbl4q_u8([4 x <16 x i8>] %a.coerce, <16 x i8> %b) #0 {			// CHECK-LABEL: define <16 x i8> @test_vqtbl4q_u8([4 x <16 x i8>] %a.coerce, <16 x i8> %b) #1 {
	// CHECK: [[__P0_I:%.*]] = alloca %struct.uint8x16x4_t, align 16			// CHECK: [[__P0_I:%.*]] = alloca %struct.uint8x16x4_t, align 16
	// CHECK: [[A:%.*]] = alloca %struct.uint8x16x4_t, align 16			// CHECK: [[A:%.*]] = alloca %struct.uint8x16x4_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint8x16x4_t, %struct.uint8x16x4_t* [[A]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint8x16x4_t, %struct.uint8x16x4_t* [[A]], i32 0, i32 0
	// CHECK: store [4 x <16 x i8>] [[A]].coerce, [4 x <16 x i8>]* [[COERCE_DIVE]], align 16			// CHECK: store [4 x <16 x i8>] [[A]].coerce, [4 x <16 x i8>]* [[COERCE_DIVE]], align 16
	// CHECK: [[COERCE_DIVE1:%.*]] = getelementptr inbounds %struct.uint8x16x4_t, %struct.uint8x16x4_t* [[A]], i32 0, i32 0			// CHECK: [[COERCE_DIVE1:%.*]] = getelementptr inbounds %struct.uint8x16x4_t, %struct.uint8x16x4_t* [[A]], i32 0, i32 0
	// CHECK: [[TMP0:%.*]] = load [4 x <16 x i8>], [4 x <16 x i8>]* [[COERCE_DIVE1]], align 16			// CHECK: [[TMP0:%.*]] = load [4 x <16 x i8>], [4 x <16 x i8>]* [[COERCE_DIVE1]], align 16
	// CHECK: [[COERCE_DIVE_I:%.*]] = getelementptr inbounds %struct.uint8x16x4_t, %struct.uint8x16x4_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[COERCE_DIVE_I:%.*]] = getelementptr inbounds %struct.uint8x16x4_t, %struct.uint8x16x4_t* [[__P0_I]], i32 0, i32 0
	// CHECK: store [4 x <16 x i8>] [[TMP0]], [4 x <16 x i8>]* [[COERCE_DIVE_I]], align 16			// CHECK: store [4 x <16 x i8>] [[TMP0]], [4 x <16 x i8>]* [[COERCE_DIVE_I]], align 16
	// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.uint8x16x4_t, %struct.uint8x16x4_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.uint8x16x4_t, %struct.uint8x16x4_t* [[__P0_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL_I]], i64 0, i64 0			// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL_I]], i64 0, i64 0
	// CHECK: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX_I]], align 16			// CHECK: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX_I]], align 16
	// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.uint8x16x4_t, %struct.uint8x16x4_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.uint8x16x4_t, %struct.uint8x16x4_t* [[__P0_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1			// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1
	// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16			// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16
	// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.uint8x16x4_t, %struct.uint8x16x4_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.uint8x16x4_t, %struct.uint8x16x4_t* [[__P0_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL3_I]], i64 0, i64 2			// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL3_I]], i64 0, i64 2
	// CHECK: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX4_I]], align 16			// CHECK: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX4_I]], align 16
	// CHECK: [[VAL5_I:%.*]] = getelementptr inbounds %struct.uint8x16x4_t, %struct.uint8x16x4_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[VAL5_I:%.*]] = getelementptr inbounds %struct.uint8x16x4_t, %struct.uint8x16x4_t* [[__P0_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX6_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL5_I]], i64 0, i64 3			// CHECK: [[ARRAYIDX6_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL5_I]], i64 0, i64 3
	// CHECK: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX6_I]], align 16			// CHECK: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX6_I]], align 16
	// CHECK: [[VTBL4_I:%.*]] = call <16 x i8> @llvm.aarch64.neon.tbl4.v16i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <16 x i8> [[TMP3]], <16 x i8> [[TMP4]], <16 x i8> %b) #2			// CHECK: [[VTBL4_I:%.*]] = call <16 x i8> @llvm.aarch64.neon.tbl4.v16i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <16 x i8> [[TMP3]], <16 x i8> [[TMP4]], <16 x i8> %b) #3
	// CHECK: ret <16 x i8> [[VTBL4_I]]			// CHECK: ret <16 x i8> [[VTBL4_I]]
	uint8x16_t test_vqtbl4q_u8(uint8x16x4_t a, uint8x16_t b) {			uint8x16_t test_vqtbl4q_u8(uint8x16x4_t a, uint8x16_t b) {
	return vqtbl4q_u8(a, b);			return vqtbl4q_u8(a, b);
	}			}

	// CHECK-LABEL: define <8 x i8> @test_vtbx1_u8(<8 x i8> %a, <8 x i8> %b, <8 x i8> %c) #0 {			// CHECK-LABEL: define <8 x i8> @test_vtbx1_u8(<8 x i8> %a, <8 x i8> %b, <8 x i8> %c) #0 {
	// CHECK: [[VTBL1_I:%.*]] = shufflevector <8 x i8> %b, <8 x i8> zeroinitializer, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>			// CHECK: [[VTBL1_I:%.*]] = shufflevector <8 x i8> %b, <8 x i8> zeroinitializer, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
	// CHECK: [[VTBL11_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl1.v8i8(<16 x i8> [[VTBL1_I]], <8 x i8> %c) #2			// CHECK: [[VTBL11_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl1.v8i8(<16 x i8> [[VTBL1_I]], <8 x i8> %c) #3
	// CHECK: [[TMP0:%.*]] = icmp uge <8 x i8> %c, <i8 8, i8 8, i8 8, i8 8, i8 8, i8 8, i8 8, i8 8>			// CHECK: [[TMP0:%.*]] = icmp uge <8 x i8> %c, <i8 8, i8 8, i8 8, i8 8, i8 8, i8 8, i8 8, i8 8>
	// CHECK: [[TMP1:%.*]] = sext <8 x i1> [[TMP0]] to <8 x i8>			// CHECK: [[TMP1:%.*]] = sext <8 x i1> [[TMP0]] to <8 x i8>
	// CHECK: [[TMP2:%.*]] = and <8 x i8> [[TMP1]], %a			// CHECK: [[TMP2:%.*]] = and <8 x i8> [[TMP1]], %a
	// CHECK: [[TMP3:%.*]] = xor <8 x i8> [[TMP1]], <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>			// CHECK: [[TMP3:%.*]] = xor <8 x i8> [[TMP1]], <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>
	// CHECK: [[TMP4:%.*]] = and <8 x i8> [[TMP3]], [[VTBL11_I]]			// CHECK: [[TMP4:%.*]] = and <8 x i8> [[TMP3]], [[VTBL11_I]]
	// CHECK: [[VTBX_I:%.*]] = or <8 x i8> [[TMP2]], [[TMP4]]			// CHECK: [[VTBX_I:%.*]] = or <8 x i8> [[TMP2]], [[TMP4]]
	// CHECK: ret <8 x i8> [[VTBX_I]]			// CHECK: ret <8 x i8> [[VTBX_I]]
	uint8x8_t test_vtbx1_u8(uint8x8_t a, uint8x8_t b, uint8x8_t c) {			uint8x8_t test_vtbx1_u8(uint8x8_t a, uint8x8_t b, uint8x8_t c) {
	Show All 11 Lines
	// CHECK: store [2 x <8 x i8>] [[TMP0]], [2 x <8 x i8>]* [[COERCE_DIVE_I]], align 8			// CHECK: store [2 x <8 x i8>] [[TMP0]], [2 x <8 x i8>]* [[COERCE_DIVE_I]], align 8
	// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.uint8x8x2_t, %struct.uint8x8x2_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.uint8x8x2_t, %struct.uint8x8x2_t* [[__P1_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [2 x <8 x i8>], [2 x <8 x i8>]* [[VAL_I]], i64 0, i64 0			// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [2 x <8 x i8>], [2 x <8 x i8>]* [[VAL_I]], i64 0, i64 0
	// CHECK: [[TMP1:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX_I]], align 8			// CHECK: [[TMP1:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX_I]], align 8
	// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.uint8x8x2_t, %struct.uint8x8x2_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.uint8x8x2_t, %struct.uint8x8x2_t* [[__P1_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [2 x <8 x i8>], [2 x <8 x i8>]* [[VAL1_I]], i64 0, i64 1			// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [2 x <8 x i8>], [2 x <8 x i8>]* [[VAL1_I]], i64 0, i64 1
	// CHECK: [[TMP2:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX2_I]], align 8			// CHECK: [[TMP2:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX2_I]], align 8
	// CHECK: [[VTBX1_I:%.*]] = shufflevector <8 x i8> [[TMP1]], <8 x i8> [[TMP2]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>			// CHECK: [[VTBX1_I:%.*]] = shufflevector <8 x i8> [[TMP1]], <8 x i8> [[TMP2]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
	// CHECK: [[VTBX13_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbx1.v8i8(<8 x i8> %a, <16 x i8> [[VTBX1_I]], <8 x i8> %c) #2			// CHECK: [[VTBX13_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbx1.v8i8(<8 x i8> %a, <16 x i8> [[VTBX1_I]], <8 x i8> %c) #3
	// CHECK: ret <8 x i8> [[VTBX13_I]]			// CHECK: ret <8 x i8> [[VTBX13_I]]
	uint8x8_t test_vtbx2_u8(uint8x8_t a, uint8x8x2_t b, uint8x8_t c) {			uint8x8_t test_vtbx2_u8(uint8x8_t a, uint8x8x2_t b, uint8x8_t c) {
	return vtbx2_u8(a, b, c);			return vtbx2_u8(a, b, c);
	}			}

	// CHECK-LABEL: define <8 x i8> @test_vtbx3_u8(<8 x i8> %a, [3 x <8 x i8>] %b.coerce, <8 x i8> %c) #0 {			// CHECK-LABEL: define <8 x i8> @test_vtbx3_u8(<8 x i8> %a, [3 x <8 x i8>] %b.coerce, <8 x i8> %c) #0 {
	// CHECK: [[__P1_I:%.*]] = alloca %struct.uint8x8x3_t, align 8			// CHECK: [[__P1_I:%.*]] = alloca %struct.uint8x8x3_t, align 8
	// CHECK: [[B:%.*]] = alloca %struct.uint8x8x3_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.uint8x8x3_t, align 8
	Show All 9 Lines
	// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.uint8x8x3_t, %struct.uint8x8x3_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.uint8x8x3_t, %struct.uint8x8x3_t* [[__P1_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [3 x <8 x i8>], [3 x <8 x i8>]* [[VAL1_I]], i64 0, i64 1			// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [3 x <8 x i8>], [3 x <8 x i8>]* [[VAL1_I]], i64 0, i64 1
	// CHECK: [[TMP2:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX2_I]], align 8			// CHECK: [[TMP2:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX2_I]], align 8
	// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.uint8x8x3_t, %struct.uint8x8x3_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.uint8x8x3_t, %struct.uint8x8x3_t* [[__P1_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [3 x <8 x i8>], [3 x <8 x i8>]* [[VAL3_I]], i64 0, i64 2			// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [3 x <8 x i8>], [3 x <8 x i8>]* [[VAL3_I]], i64 0, i64 2
	// CHECK: [[TMP3:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX4_I]], align 8			// CHECK: [[TMP3:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX4_I]], align 8
	// CHECK: [[VTBL2_I:%.*]] = shufflevector <8 x i8> [[TMP1]], <8 x i8> [[TMP2]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>			// CHECK: [[VTBL2_I:%.*]] = shufflevector <8 x i8> [[TMP1]], <8 x i8> [[TMP2]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
	// CHECK: [[VTBL25_I:%.*]] = shufflevector <8 x i8> [[TMP3]], <8 x i8> zeroinitializer, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>			// CHECK: [[VTBL25_I:%.*]] = shufflevector <8 x i8> [[TMP3]], <8 x i8> zeroinitializer, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
	// CHECK: [[VTBL26_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl2.v8i8(<16 x i8> [[VTBL2_I]], <16 x i8> [[VTBL25_I]], <8 x i8> %c) #2			// CHECK: [[VTBL26_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl2.v8i8(<16 x i8> [[VTBL2_I]], <16 x i8> [[VTBL25_I]], <8 x i8> %c) #3
	// CHECK: [[TMP4:%.*]] = icmp uge <8 x i8> %c, <i8 24, i8 24, i8 24, i8 24, i8 24, i8 24, i8 24, i8 24>			// CHECK: [[TMP4:%.*]] = icmp uge <8 x i8> %c, <i8 24, i8 24, i8 24, i8 24, i8 24, i8 24, i8 24, i8 24>
	// CHECK: [[TMP5:%.*]] = sext <8 x i1> [[TMP4]] to <8 x i8>			// CHECK: [[TMP5:%.*]] = sext <8 x i1> [[TMP4]] to <8 x i8>
	// CHECK: [[TMP6:%.*]] = and <8 x i8> [[TMP5]], %a			// CHECK: [[TMP6:%.*]] = and <8 x i8> [[TMP5]], %a
	// CHECK: [[TMP7:%.*]] = xor <8 x i8> [[TMP5]], <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>			// CHECK: [[TMP7:%.*]] = xor <8 x i8> [[TMP5]], <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>
	// CHECK: [[TMP8:%.*]] = and <8 x i8> [[TMP7]], [[VTBL26_I]]			// CHECK: [[TMP8:%.*]] = and <8 x i8> [[TMP7]], [[VTBL26_I]]
	// CHECK: [[VTBX_I:%.*]] = or <8 x i8> [[TMP6]], [[TMP8]]			// CHECK: [[VTBX_I:%.*]] = or <8 x i8> [[TMP6]], [[TMP8]]
	// CHECK: ret <8 x i8> [[VTBX_I]]			// CHECK: ret <8 x i8> [[VTBX_I]]
	uint8x8_t test_vtbx3_u8(uint8x8_t a, uint8x8x3_t b, uint8x8_t c) {			uint8x8_t test_vtbx3_u8(uint8x8_t a, uint8x8x3_t b, uint8x8_t c) {
	Show All 18 Lines
	// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.uint8x8x4_t, %struct.uint8x8x4_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.uint8x8x4_t, %struct.uint8x8x4_t* [[__P1_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [4 x <8 x i8>], [4 x <8 x i8>]* [[VAL3_I]], i64 0, i64 2			// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [4 x <8 x i8>], [4 x <8 x i8>]* [[VAL3_I]], i64 0, i64 2
	// CHECK: [[TMP3:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX4_I]], align 8			// CHECK: [[TMP3:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX4_I]], align 8
	// CHECK: [[VAL5_I:%.*]] = getelementptr inbounds %struct.uint8x8x4_t, %struct.uint8x8x4_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[VAL5_I:%.*]] = getelementptr inbounds %struct.uint8x8x4_t, %struct.uint8x8x4_t* [[__P1_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX6_I:%.*]] = getelementptr inbounds [4 x <8 x i8>], [4 x <8 x i8>]* [[VAL5_I]], i64 0, i64 3			// CHECK: [[ARRAYIDX6_I:%.*]] = getelementptr inbounds [4 x <8 x i8>], [4 x <8 x i8>]* [[VAL5_I]], i64 0, i64 3
	// CHECK: [[TMP4:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX6_I]], align 8			// CHECK: [[TMP4:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX6_I]], align 8
	// CHECK: [[VTBX2_I:%.*]] = shufflevector <8 x i8> [[TMP1]], <8 x i8> [[TMP2]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>			// CHECK: [[VTBX2_I:%.*]] = shufflevector <8 x i8> [[TMP1]], <8 x i8> [[TMP2]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
	// CHECK: [[VTBX27_I:%.*]] = shufflevector <8 x i8> [[TMP3]], <8 x i8> [[TMP4]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>			// CHECK: [[VTBX27_I:%.*]] = shufflevector <8 x i8> [[TMP3]], <8 x i8> [[TMP4]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
	// CHECK: [[VTBX28_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbx2.v8i8(<8 x i8> %a, <16 x i8> [[VTBX2_I]], <16 x i8> [[VTBX27_I]], <8 x i8> %c) #2			// CHECK: [[VTBX28_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbx2.v8i8(<8 x i8> %a, <16 x i8> [[VTBX2_I]], <16 x i8> [[VTBX27_I]], <8 x i8> %c) #3
	// CHECK: ret <8 x i8> [[VTBX28_I]]			// CHECK: ret <8 x i8> [[VTBX28_I]]
	uint8x8_t test_vtbx4_u8(uint8x8_t a, uint8x8x4_t b, uint8x8_t c) {			uint8x8_t test_vtbx4_u8(uint8x8_t a, uint8x8x4_t b, uint8x8_t c) {
	return vtbx4_u8(a, b, c);			return vtbx4_u8(a, b, c);
	}			}

	// CHECK-LABEL: define <8 x i8> @test_vqtbx1_u8(<8 x i8> %a, <16 x i8> %b, <8 x i8> %c) #0 {			// CHECK-LABEL: define <8 x i8> @test_vqtbx1_u8(<8 x i8> %a, <16 x i8> %b, <8 x i8> %c) #1 {
	// CHECK: [[VTBX1_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbx1.v8i8(<8 x i8> %a, <16 x i8> %b, <8 x i8> %c) #2			// CHECK: [[VTBX1_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbx1.v8i8(<8 x i8> %a, <16 x i8> %b, <8 x i8> %c) #3
	// CHECK: ret <8 x i8> [[VTBX1_I]]			// CHECK: ret <8 x i8> [[VTBX1_I]]
	uint8x8_t test_vqtbx1_u8(uint8x8_t a, uint8x16_t b, uint8x8_t c) {			uint8x8_t test_vqtbx1_u8(uint8x8_t a, uint8x16_t b, uint8x8_t c) {
	return vqtbx1_u8(a, b, c);			return vqtbx1_u8(a, b, c);
	}			}

	// CHECK-LABEL: define <8 x i8> @test_vqtbx2_u8(<8 x i8> %a, [2 x <16 x i8>] %b.coerce, <8 x i8> %c) #0 {			// CHECK-LABEL: define <8 x i8> @test_vqtbx2_u8(<8 x i8> %a, [2 x <16 x i8>] %b.coerce, <8 x i8> %c) #0 {
	// CHECK: [[__P1_I:%.*]] = alloca %struct.uint8x16x2_t, align 16			// CHECK: [[__P1_I:%.*]] = alloca %struct.uint8x16x2_t, align 16
	// CHECK: [[B:%.*]] = alloca %struct.uint8x16x2_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.uint8x16x2_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint8x16x2_t, %struct.uint8x16x2_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint8x16x2_t, %struct.uint8x16x2_t* [[B]], i32 0, i32 0
	// CHECK: store [2 x <16 x i8>] [[B]].coerce, [2 x <16 x i8>]* [[COERCE_DIVE]], align 16			// CHECK: store [2 x <16 x i8>] [[B]].coerce, [2 x <16 x i8>]* [[COERCE_DIVE]], align 16
	// CHECK: [[COERCE_DIVE1:%.*]] = getelementptr inbounds %struct.uint8x16x2_t, %struct.uint8x16x2_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE1:%.*]] = getelementptr inbounds %struct.uint8x16x2_t, %struct.uint8x16x2_t* [[B]], i32 0, i32 0
	// CHECK: [[TMP0:%.*]] = load [2 x <16 x i8>], [2 x <16 x i8>]* [[COERCE_DIVE1]], align 16			// CHECK: [[TMP0:%.*]] = load [2 x <16 x i8>], [2 x <16 x i8>]* [[COERCE_DIVE1]], align 16
	// CHECK: [[COERCE_DIVE_I:%.*]] = getelementptr inbounds %struct.uint8x16x2_t, %struct.uint8x16x2_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[COERCE_DIVE_I:%.*]] = getelementptr inbounds %struct.uint8x16x2_t, %struct.uint8x16x2_t* [[__P1_I]], i32 0, i32 0
	// CHECK: store [2 x <16 x i8>] [[TMP0]], [2 x <16 x i8>]* [[COERCE_DIVE_I]], align 16			// CHECK: store [2 x <16 x i8>] [[TMP0]], [2 x <16 x i8>]* [[COERCE_DIVE_I]], align 16
	// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.uint8x16x2_t, %struct.uint8x16x2_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.uint8x16x2_t, %struct.uint8x16x2_t* [[__P1_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [2 x <16 x i8>], [2 x <16 x i8>]* [[VAL_I]], i64 0, i64 0			// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [2 x <16 x i8>], [2 x <16 x i8>]* [[VAL_I]], i64 0, i64 0
	// CHECK: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX_I]], align 16			// CHECK: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX_I]], align 16
	// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.uint8x16x2_t, %struct.uint8x16x2_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.uint8x16x2_t, %struct.uint8x16x2_t* [[__P1_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [2 x <16 x i8>], [2 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1			// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [2 x <16 x i8>], [2 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1
	// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16			// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16
	// CHECK: [[VTBX2_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbx2.v8i8(<8 x i8> %a, <16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <8 x i8> %c) #2			// CHECK: [[VTBX2_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbx2.v8i8(<8 x i8> %a, <16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <8 x i8> %c) #3
	// CHECK: ret <8 x i8> [[VTBX2_I]]			// CHECK: ret <8 x i8> [[VTBX2_I]]
	uint8x8_t test_vqtbx2_u8(uint8x8_t a, uint8x16x2_t b, uint8x8_t c) {			uint8x8_t test_vqtbx2_u8(uint8x8_t a, uint8x16x2_t b, uint8x8_t c) {
	return vqtbx2_u8(a, b, c);			return vqtbx2_u8(a, b, c);
	}			}

	// CHECK-LABEL: define <8 x i8> @test_vqtbx3_u8(<8 x i8> %a, [3 x <16 x i8>] %b.coerce, <8 x i8> %c) #0 {			// CHECK-LABEL: define <8 x i8> @test_vqtbx3_u8(<8 x i8> %a, [3 x <16 x i8>] %b.coerce, <8 x i8> %c) #0 {
	// CHECK: [[__P1_I:%.*]] = alloca %struct.uint8x16x3_t, align 16			// CHECK: [[__P1_I:%.*]] = alloca %struct.uint8x16x3_t, align 16
	// CHECK: [[B:%.*]] = alloca %struct.uint8x16x3_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.uint8x16x3_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint8x16x3_t, %struct.uint8x16x3_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint8x16x3_t, %struct.uint8x16x3_t* [[B]], i32 0, i32 0
	// CHECK: store [3 x <16 x i8>] [[B]].coerce, [3 x <16 x i8>]* [[COERCE_DIVE]], align 16			// CHECK: store [3 x <16 x i8>] [[B]].coerce, [3 x <16 x i8>]* [[COERCE_DIVE]], align 16
	// CHECK: [[COERCE_DIVE1:%.*]] = getelementptr inbounds %struct.uint8x16x3_t, %struct.uint8x16x3_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE1:%.*]] = getelementptr inbounds %struct.uint8x16x3_t, %struct.uint8x16x3_t* [[B]], i32 0, i32 0
	// CHECK: [[TMP0:%.*]] = load [3 x <16 x i8>], [3 x <16 x i8>]* [[COERCE_DIVE1]], align 16			// CHECK: [[TMP0:%.*]] = load [3 x <16 x i8>], [3 x <16 x i8>]* [[COERCE_DIVE1]], align 16
	// CHECK: [[COERCE_DIVE_I:%.*]] = getelementptr inbounds %struct.uint8x16x3_t, %struct.uint8x16x3_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[COERCE_DIVE_I:%.*]] = getelementptr inbounds %struct.uint8x16x3_t, %struct.uint8x16x3_t* [[__P1_I]], i32 0, i32 0
	// CHECK: store [3 x <16 x i8>] [[TMP0]], [3 x <16 x i8>]* [[COERCE_DIVE_I]], align 16			// CHECK: store [3 x <16 x i8>] [[TMP0]], [3 x <16 x i8>]* [[COERCE_DIVE_I]], align 16
	// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.uint8x16x3_t, %struct.uint8x16x3_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.uint8x16x3_t, %struct.uint8x16x3_t* [[__P1_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL_I]], i64 0, i64 0			// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL_I]], i64 0, i64 0
	// CHECK: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX_I]], align 16			// CHECK: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX_I]], align 16
	// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.uint8x16x3_t, %struct.uint8x16x3_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.uint8x16x3_t, %struct.uint8x16x3_t* [[__P1_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1			// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1
	// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16			// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16
	// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.uint8x16x3_t, %struct.uint8x16x3_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.uint8x16x3_t, %struct.uint8x16x3_t* [[__P1_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL3_I]], i64 0, i64 2			// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL3_I]], i64 0, i64 2
	// CHECK: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX4_I]], align 16			// CHECK: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX4_I]], align 16
	// CHECK: [[VTBX3_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbx3.v8i8(<8 x i8> %a, <16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <16 x i8> [[TMP3]], <8 x i8> %c) #2			// CHECK: [[VTBX3_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbx3.v8i8(<8 x i8> %a, <16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <16 x i8> [[TMP3]], <8 x i8> %c) #3
	// CHECK: ret <8 x i8> [[VTBX3_I]]			// CHECK: ret <8 x i8> [[VTBX3_I]]
	uint8x8_t test_vqtbx3_u8(uint8x8_t a, uint8x16x3_t b, uint8x8_t c) {			uint8x8_t test_vqtbx3_u8(uint8x8_t a, uint8x16x3_t b, uint8x8_t c) {
	return vqtbx3_u8(a, b, c);			return vqtbx3_u8(a, b, c);
	}			}

	// CHECK-LABEL: define <8 x i8> @test_vqtbx4_u8(<8 x i8> %a, [4 x <16 x i8>] %b.coerce, <8 x i8> %c) #0 {			// CHECK-LABEL: define <8 x i8> @test_vqtbx4_u8(<8 x i8> %a, [4 x <16 x i8>] %b.coerce, <8 x i8> %c) #0 {
	// CHECK: [[__P1_I:%.*]] = alloca %struct.uint8x16x4_t, align 16			// CHECK: [[__P1_I:%.*]] = alloca %struct.uint8x16x4_t, align 16
	// CHECK: [[B:%.*]] = alloca %struct.uint8x16x4_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.uint8x16x4_t, align 16
	Show All 10 Lines
	// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1			// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1
	// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16			// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16
	// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.uint8x16x4_t, %struct.uint8x16x4_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.uint8x16x4_t, %struct.uint8x16x4_t* [[__P1_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL3_I]], i64 0, i64 2			// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL3_I]], i64 0, i64 2
	// CHECK: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX4_I]], align 16			// CHECK: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX4_I]], align 16
	// CHECK: [[VAL5_I:%.*]] = getelementptr inbounds %struct.uint8x16x4_t, %struct.uint8x16x4_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[VAL5_I:%.*]] = getelementptr inbounds %struct.uint8x16x4_t, %struct.uint8x16x4_t* [[__P1_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX6_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL5_I]], i64 0, i64 3			// CHECK: [[ARRAYIDX6_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL5_I]], i64 0, i64 3
	// CHECK: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX6_I]], align 16			// CHECK: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX6_I]], align 16
	// CHECK: [[VTBX4_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbx4.v8i8(<8 x i8> %a, <16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <16 x i8> [[TMP3]], <16 x i8> [[TMP4]], <8 x i8> %c) #2			// CHECK: [[VTBX4_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbx4.v8i8(<8 x i8> %a, <16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <16 x i8> [[TMP3]], <16 x i8> [[TMP4]], <8 x i8> %c) #3
	// CHECK: ret <8 x i8> [[VTBX4_I]]			// CHECK: ret <8 x i8> [[VTBX4_I]]
	uint8x8_t test_vqtbx4_u8(uint8x8_t a, uint8x16x4_t b, uint8x8_t c) {			uint8x8_t test_vqtbx4_u8(uint8x8_t a, uint8x16x4_t b, uint8x8_t c) {
	return vqtbx4_u8(a, b, c);			return vqtbx4_u8(a, b, c);
	}			}

	// CHECK-LABEL: define <16 x i8> @test_vqtbx1q_u8(<16 x i8> %a, <16 x i8> %b, <16 x i8> %c) #0 {			// CHECK-LABEL: define <16 x i8> @test_vqtbx1q_u8(<16 x i8> %a, <16 x i8> %b, <16 x i8> %c) #1 {
	// CHECK: [[VTBX1_I:%.*]] = call <16 x i8> @llvm.aarch64.neon.tbx1.v16i8(<16 x i8> %a, <16 x i8> %b, <16 x i8> %c) #2			// CHECK: [[VTBX1_I:%.*]] = call <16 x i8> @llvm.aarch64.neon.tbx1.v16i8(<16 x i8> %a, <16 x i8> %b, <16 x i8> %c) #3
	// CHECK: ret <16 x i8> [[VTBX1_I]]			// CHECK: ret <16 x i8> [[VTBX1_I]]
	uint8x16_t test_vqtbx1q_u8(uint8x16_t a, uint8x16_t b, uint8x16_t c) {			uint8x16_t test_vqtbx1q_u8(uint8x16_t a, uint8x16_t b, uint8x16_t c) {
	return vqtbx1q_u8(a, b, c);			return vqtbx1q_u8(a, b, c);
	}			}

	// CHECK-LABEL: define <16 x i8> @test_vqtbx2q_u8(<16 x i8> %a, [2 x <16 x i8>] %b.coerce, <16 x i8> %c) #0 {			// CHECK-LABEL: define <16 x i8> @test_vqtbx2q_u8(<16 x i8> %a, [2 x <16 x i8>] %b.coerce, <16 x i8> %c) #1 {
	// CHECK: [[__P1_I:%.*]] = alloca %struct.uint8x16x2_t, align 16			// CHECK: [[__P1_I:%.*]] = alloca %struct.uint8x16x2_t, align 16
	// CHECK: [[B:%.*]] = alloca %struct.uint8x16x2_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.uint8x16x2_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint8x16x2_t, %struct.uint8x16x2_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint8x16x2_t, %struct.uint8x16x2_t* [[B]], i32 0, i32 0
	// CHECK: store [2 x <16 x i8>] [[B]].coerce, [2 x <16 x i8>]* [[COERCE_DIVE]], align 16			// CHECK: store [2 x <16 x i8>] [[B]].coerce, [2 x <16 x i8>]* [[COERCE_DIVE]], align 16
	// CHECK: [[COERCE_DIVE1:%.*]] = getelementptr inbounds %struct.uint8x16x2_t, %struct.uint8x16x2_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE1:%.*]] = getelementptr inbounds %struct.uint8x16x2_t, %struct.uint8x16x2_t* [[B]], i32 0, i32 0
	// CHECK: [[TMP0:%.*]] = load [2 x <16 x i8>], [2 x <16 x i8>]* [[COERCE_DIVE1]], align 16			// CHECK: [[TMP0:%.*]] = load [2 x <16 x i8>], [2 x <16 x i8>]* [[COERCE_DIVE1]], align 16
	// CHECK: [[COERCE_DIVE_I:%.*]] = getelementptr inbounds %struct.uint8x16x2_t, %struct.uint8x16x2_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[COERCE_DIVE_I:%.*]] = getelementptr inbounds %struct.uint8x16x2_t, %struct.uint8x16x2_t* [[__P1_I]], i32 0, i32 0
	// CHECK: store [2 x <16 x i8>] [[TMP0]], [2 x <16 x i8>]* [[COERCE_DIVE_I]], align 16			// CHECK: store [2 x <16 x i8>] [[TMP0]], [2 x <16 x i8>]* [[COERCE_DIVE_I]], align 16
	// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.uint8x16x2_t, %struct.uint8x16x2_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.uint8x16x2_t, %struct.uint8x16x2_t* [[__P1_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [2 x <16 x i8>], [2 x <16 x i8>]* [[VAL_I]], i64 0, i64 0			// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [2 x <16 x i8>], [2 x <16 x i8>]* [[VAL_I]], i64 0, i64 0
	// CHECK: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX_I]], align 16			// CHECK: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX_I]], align 16
	// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.uint8x16x2_t, %struct.uint8x16x2_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.uint8x16x2_t, %struct.uint8x16x2_t* [[__P1_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [2 x <16 x i8>], [2 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1			// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [2 x <16 x i8>], [2 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1
	// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16			// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16
	// CHECK: [[VTBX2_I:%.*]] = call <16 x i8> @llvm.aarch64.neon.tbx2.v16i8(<16 x i8> %a, <16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <16 x i8> %c) #2			// CHECK: [[VTBX2_I:%.*]] = call <16 x i8> @llvm.aarch64.neon.tbx2.v16i8(<16 x i8> %a, <16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <16 x i8> %c) #3
	// CHECK: ret <16 x i8> [[VTBX2_I]]			// CHECK: ret <16 x i8> [[VTBX2_I]]
	uint8x16_t test_vqtbx2q_u8(uint8x16_t a, uint8x16x2_t b, uint8x16_t c) {			uint8x16_t test_vqtbx2q_u8(uint8x16_t a, uint8x16x2_t b, uint8x16_t c) {
	return vqtbx2q_u8(a, b, c);			return vqtbx2q_u8(a, b, c);
	}			}

	// CHECK-LABEL: define <16 x i8> @test_vqtbx3q_u8(<16 x i8> %a, [3 x <16 x i8>] %b.coerce, <16 x i8> %c) #0 {			// CHECK-LABEL: define <16 x i8> @test_vqtbx3q_u8(<16 x i8> %a, [3 x <16 x i8>] %b.coerce, <16 x i8> %c) #1 {
	// CHECK: [[__P1_I:%.*]] = alloca %struct.uint8x16x3_t, align 16			// CHECK: [[__P1_I:%.*]] = alloca %struct.uint8x16x3_t, align 16
	// CHECK: [[B:%.*]] = alloca %struct.uint8x16x3_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.uint8x16x3_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint8x16x3_t, %struct.uint8x16x3_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint8x16x3_t, %struct.uint8x16x3_t* [[B]], i32 0, i32 0
	// CHECK: store [3 x <16 x i8>] [[B]].coerce, [3 x <16 x i8>]* [[COERCE_DIVE]], align 16			// CHECK: store [3 x <16 x i8>] [[B]].coerce, [3 x <16 x i8>]* [[COERCE_DIVE]], align 16
	// CHECK: [[COERCE_DIVE1:%.*]] = getelementptr inbounds %struct.uint8x16x3_t, %struct.uint8x16x3_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE1:%.*]] = getelementptr inbounds %struct.uint8x16x3_t, %struct.uint8x16x3_t* [[B]], i32 0, i32 0
	// CHECK: [[TMP0:%.*]] = load [3 x <16 x i8>], [3 x <16 x i8>]* [[COERCE_DIVE1]], align 16			// CHECK: [[TMP0:%.*]] = load [3 x <16 x i8>], [3 x <16 x i8>]* [[COERCE_DIVE1]], align 16
	// CHECK: [[COERCE_DIVE_I:%.*]] = getelementptr inbounds %struct.uint8x16x3_t, %struct.uint8x16x3_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[COERCE_DIVE_I:%.*]] = getelementptr inbounds %struct.uint8x16x3_t, %struct.uint8x16x3_t* [[__P1_I]], i32 0, i32 0
	// CHECK: store [3 x <16 x i8>] [[TMP0]], [3 x <16 x i8>]* [[COERCE_DIVE_I]], align 16			// CHECK: store [3 x <16 x i8>] [[TMP0]], [3 x <16 x i8>]* [[COERCE_DIVE_I]], align 16
	// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.uint8x16x3_t, %struct.uint8x16x3_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.uint8x16x3_t, %struct.uint8x16x3_t* [[__P1_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL_I]], i64 0, i64 0			// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL_I]], i64 0, i64 0
	// CHECK: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX_I]], align 16			// CHECK: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX_I]], align 16
	// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.uint8x16x3_t, %struct.uint8x16x3_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.uint8x16x3_t, %struct.uint8x16x3_t* [[__P1_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1			// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1
	// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16			// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16
	// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.uint8x16x3_t, %struct.uint8x16x3_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.uint8x16x3_t, %struct.uint8x16x3_t* [[__P1_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL3_I]], i64 0, i64 2			// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL3_I]], i64 0, i64 2
	// CHECK: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX4_I]], align 16			// CHECK: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX4_I]], align 16
	// CHECK: [[VTBX3_I:%.*]] = call <16 x i8> @llvm.aarch64.neon.tbx3.v16i8(<16 x i8> %a, <16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <16 x i8> [[TMP3]], <16 x i8> %c) #2			// CHECK: [[VTBX3_I:%.*]] = call <16 x i8> @llvm.aarch64.neon.tbx3.v16i8(<16 x i8> %a, <16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <16 x i8> [[TMP3]], <16 x i8> %c) #3
	// CHECK: ret <16 x i8> [[VTBX3_I]]			// CHECK: ret <16 x i8> [[VTBX3_I]]
	uint8x16_t test_vqtbx3q_u8(uint8x16_t a, uint8x16x3_t b, uint8x16_t c) {			uint8x16_t test_vqtbx3q_u8(uint8x16_t a, uint8x16x3_t b, uint8x16_t c) {
	return vqtbx3q_u8(a, b, c);			return vqtbx3q_u8(a, b, c);
	}			}

	// CHECK-LABEL: define <16 x i8> @test_vqtbx4q_u8(<16 x i8> %a, [4 x <16 x i8>] %b.coerce, <16 x i8> %c) #0 {			// CHECK-LABEL: define <16 x i8> @test_vqtbx4q_u8(<16 x i8> %a, [4 x <16 x i8>] %b.coerce, <16 x i8> %c) #1 {
	// CHECK: [[__P1_I:%.*]] = alloca %struct.uint8x16x4_t, align 16			// CHECK: [[__P1_I:%.*]] = alloca %struct.uint8x16x4_t, align 16
	// CHECK: [[B:%.*]] = alloca %struct.uint8x16x4_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.uint8x16x4_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint8x16x4_t, %struct.uint8x16x4_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.uint8x16x4_t, %struct.uint8x16x4_t* [[B]], i32 0, i32 0
	// CHECK: store [4 x <16 x i8>] [[B]].coerce, [4 x <16 x i8>]* [[COERCE_DIVE]], align 16			// CHECK: store [4 x <16 x i8>] [[B]].coerce, [4 x <16 x i8>]* [[COERCE_DIVE]], align 16
	// CHECK: [[COERCE_DIVE1:%.*]] = getelementptr inbounds %struct.uint8x16x4_t, %struct.uint8x16x4_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE1:%.*]] = getelementptr inbounds %struct.uint8x16x4_t, %struct.uint8x16x4_t* [[B]], i32 0, i32 0
	// CHECK: [[TMP0:%.*]] = load [4 x <16 x i8>], [4 x <16 x i8>]* [[COERCE_DIVE1]], align 16			// CHECK: [[TMP0:%.*]] = load [4 x <16 x i8>], [4 x <16 x i8>]* [[COERCE_DIVE1]], align 16
	// CHECK: [[COERCE_DIVE_I:%.*]] = getelementptr inbounds %struct.uint8x16x4_t, %struct.uint8x16x4_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[COERCE_DIVE_I:%.*]] = getelementptr inbounds %struct.uint8x16x4_t, %struct.uint8x16x4_t* [[__P1_I]], i32 0, i32 0
	// CHECK: store [4 x <16 x i8>] [[TMP0]], [4 x <16 x i8>]* [[COERCE_DIVE_I]], align 16			// CHECK: store [4 x <16 x i8>] [[TMP0]], [4 x <16 x i8>]* [[COERCE_DIVE_I]], align 16
	// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.uint8x16x4_t, %struct.uint8x16x4_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.uint8x16x4_t, %struct.uint8x16x4_t* [[__P1_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL_I]], i64 0, i64 0			// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL_I]], i64 0, i64 0
	// CHECK: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX_I]], align 16			// CHECK: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX_I]], align 16
	// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.uint8x16x4_t, %struct.uint8x16x4_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.uint8x16x4_t, %struct.uint8x16x4_t* [[__P1_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1			// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1
	// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16			// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16
	// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.uint8x16x4_t, %struct.uint8x16x4_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.uint8x16x4_t, %struct.uint8x16x4_t* [[__P1_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL3_I]], i64 0, i64 2			// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL3_I]], i64 0, i64 2
	// CHECK: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX4_I]], align 16			// CHECK: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX4_I]], align 16
	// CHECK: [[VAL5_I:%.*]] = getelementptr inbounds %struct.uint8x16x4_t, %struct.uint8x16x4_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[VAL5_I:%.*]] = getelementptr inbounds %struct.uint8x16x4_t, %struct.uint8x16x4_t* [[__P1_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX6_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL5_I]], i64 0, i64 3			// CHECK: [[ARRAYIDX6_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL5_I]], i64 0, i64 3
	// CHECK: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX6_I]], align 16			// CHECK: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX6_I]], align 16
	// CHECK: [[VTBX4_I:%.*]] = call <16 x i8> @llvm.aarch64.neon.tbx4.v16i8(<16 x i8> %a, <16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <16 x i8> [[TMP3]], <16 x i8> [[TMP4]], <16 x i8> %c) #2			// CHECK: [[VTBX4_I:%.*]] = call <16 x i8> @llvm.aarch64.neon.tbx4.v16i8(<16 x i8> %a, <16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <16 x i8> [[TMP3]], <16 x i8> [[TMP4]], <16 x i8> %c) #3
	// CHECK: ret <16 x i8> [[VTBX4_I]]			// CHECK: ret <16 x i8> [[VTBX4_I]]
	uint8x16_t test_vqtbx4q_u8(uint8x16_t a, uint8x16x4_t b, uint8x16_t c) {			uint8x16_t test_vqtbx4q_u8(uint8x16_t a, uint8x16x4_t b, uint8x16_t c) {
	return vqtbx4q_u8(a, b, c);			return vqtbx4q_u8(a, b, c);
	}			}

	// CHECK-LABEL: define <8 x i8> @test_vtbl1_p8(<8 x i8> %a, <8 x i8> %b) #0 {			// CHECK-LABEL: define <8 x i8> @test_vtbl1_p8(<8 x i8> %a, <8 x i8> %b) #0 {
	// CHECK: [[VTBL1_I:%.*]] = shufflevector <8 x i8> %a, <8 x i8> zeroinitializer, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>			// CHECK: [[VTBL1_I:%.*]] = shufflevector <8 x i8> %a, <8 x i8> zeroinitializer, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
	// CHECK: [[VTBL11_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl1.v8i8(<16 x i8> [[VTBL1_I]], <8 x i8> %b) #2			// CHECK: [[VTBL11_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl1.v8i8(<16 x i8> [[VTBL1_I]], <8 x i8> %b) #3
	// CHECK: ret <8 x i8> [[VTBL11_I]]			// CHECK: ret <8 x i8> [[VTBL11_I]]
	poly8x8_t test_vtbl1_p8(poly8x8_t a, uint8x8_t b) {			poly8x8_t test_vtbl1_p8(poly8x8_t a, uint8x8_t b) {
	return vtbl1_p8(a, b);			return vtbl1_p8(a, b);
	}			}

	// CHECK-LABEL: define <8 x i8> @test_vqtbl1_p8(<16 x i8> %a, <8 x i8> %b) #0 {			// CHECK-LABEL: define <8 x i8> @test_vqtbl1_p8(<16 x i8> %a, <8 x i8> %b) #1 {
	// CHECK: [[VTBL1_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl1.v8i8(<16 x i8> %a, <8 x i8> %b) #2			// CHECK: [[VTBL1_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl1.v8i8(<16 x i8> %a, <8 x i8> %b) #3
	// CHECK: ret <8 x i8> [[VTBL1_I]]			// CHECK: ret <8 x i8> [[VTBL1_I]]
	poly8x8_t test_vqtbl1_p8(poly8x16_t a, uint8x8_t b) {			poly8x8_t test_vqtbl1_p8(poly8x16_t a, uint8x8_t b) {
	return vqtbl1_p8(a, b);			return vqtbl1_p8(a, b);
	}			}

	// CHECK-LABEL: define <8 x i8> @test_vtbl2_p8([2 x <8 x i8>] %a.coerce, <8 x i8> %b) #0 {			// CHECK-LABEL: define <8 x i8> @test_vtbl2_p8([2 x <8 x i8>] %a.coerce, <8 x i8> %b) #0 {
	// CHECK: [[__P0_I:%.*]] = alloca %struct.poly8x8x2_t, align 8			// CHECK: [[__P0_I:%.*]] = alloca %struct.poly8x8x2_t, align 8
	// CHECK: [[A:%.*]] = alloca %struct.poly8x8x2_t, align 8			// CHECK: [[A:%.*]] = alloca %struct.poly8x8x2_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly8x8x2_t, %struct.poly8x8x2_t* [[A]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly8x8x2_t, %struct.poly8x8x2_t* [[A]], i32 0, i32 0
	// CHECK: store [2 x <8 x i8>] [[A]].coerce, [2 x <8 x i8>]* [[COERCE_DIVE]], align 8			// CHECK: store [2 x <8 x i8>] [[A]].coerce, [2 x <8 x i8>]* [[COERCE_DIVE]], align 8
	// CHECK: [[COERCE_DIVE1:%.*]] = getelementptr inbounds %struct.poly8x8x2_t, %struct.poly8x8x2_t* [[A]], i32 0, i32 0			// CHECK: [[COERCE_DIVE1:%.*]] = getelementptr inbounds %struct.poly8x8x2_t, %struct.poly8x8x2_t* [[A]], i32 0, i32 0
	// CHECK: [[TMP0:%.*]] = load [2 x <8 x i8>], [2 x <8 x i8>]* [[COERCE_DIVE1]], align 8			// CHECK: [[TMP0:%.*]] = load [2 x <8 x i8>], [2 x <8 x i8>]* [[COERCE_DIVE1]], align 8
	// CHECK: [[COERCE_DIVE_I:%.*]] = getelementptr inbounds %struct.poly8x8x2_t, %struct.poly8x8x2_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[COERCE_DIVE_I:%.*]] = getelementptr inbounds %struct.poly8x8x2_t, %struct.poly8x8x2_t* [[__P0_I]], i32 0, i32 0
	// CHECK: store [2 x <8 x i8>] [[TMP0]], [2 x <8 x i8>]* [[COERCE_DIVE_I]], align 8			// CHECK: store [2 x <8 x i8>] [[TMP0]], [2 x <8 x i8>]* [[COERCE_DIVE_I]], align 8
	// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.poly8x8x2_t, %struct.poly8x8x2_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.poly8x8x2_t, %struct.poly8x8x2_t* [[__P0_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [2 x <8 x i8>], [2 x <8 x i8>]* [[VAL_I]], i64 0, i64 0			// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [2 x <8 x i8>], [2 x <8 x i8>]* [[VAL_I]], i64 0, i64 0
	// CHECK: [[TMP1:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX_I]], align 8			// CHECK: [[TMP1:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX_I]], align 8
	// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.poly8x8x2_t, %struct.poly8x8x2_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.poly8x8x2_t, %struct.poly8x8x2_t* [[__P0_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [2 x <8 x i8>], [2 x <8 x i8>]* [[VAL1_I]], i64 0, i64 1			// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [2 x <8 x i8>], [2 x <8 x i8>]* [[VAL1_I]], i64 0, i64 1
	// CHECK: [[TMP2:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX2_I]], align 8			// CHECK: [[TMP2:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX2_I]], align 8
	// CHECK: [[VTBL1_I:%.*]] = shufflevector <8 x i8> [[TMP1]], <8 x i8> [[TMP2]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>			// CHECK: [[VTBL1_I:%.*]] = shufflevector <8 x i8> [[TMP1]], <8 x i8> [[TMP2]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
	// CHECK: [[VTBL13_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl1.v8i8(<16 x i8> [[VTBL1_I]], <8 x i8> %b) #2			// CHECK: [[VTBL13_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl1.v8i8(<16 x i8> [[VTBL1_I]], <8 x i8> %b) #3
	// CHECK: ret <8 x i8> [[VTBL13_I]]			// CHECK: ret <8 x i8> [[VTBL13_I]]
	poly8x8_t test_vtbl2_p8(poly8x8x2_t a, uint8x8_t b) {			poly8x8_t test_vtbl2_p8(poly8x8x2_t a, uint8x8_t b) {
	return vtbl2_p8(a, b);			return vtbl2_p8(a, b);
	}			}

	// CHECK-LABEL: define <8 x i8> @test_vqtbl2_p8([2 x <16 x i8>] %a.coerce, <8 x i8> %b) #0 {			// CHECK-LABEL: define <8 x i8> @test_vqtbl2_p8([2 x <16 x i8>] %a.coerce, <8 x i8> %b) #0 {
	// CHECK: [[__P0_I:%.*]] = alloca %struct.poly8x16x2_t, align 16			// CHECK: [[__P0_I:%.*]] = alloca %struct.poly8x16x2_t, align 16
	// CHECK: [[A:%.*]] = alloca %struct.poly8x16x2_t, align 16			// CHECK: [[A:%.*]] = alloca %struct.poly8x16x2_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly8x16x2_t, %struct.poly8x16x2_t* [[A]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly8x16x2_t, %struct.poly8x16x2_t* [[A]], i32 0, i32 0
	// CHECK: store [2 x <16 x i8>] [[A]].coerce, [2 x <16 x i8>]* [[COERCE_DIVE]], align 16			// CHECK: store [2 x <16 x i8>] [[A]].coerce, [2 x <16 x i8>]* [[COERCE_DIVE]], align 16
	// CHECK: [[COERCE_DIVE1:%.*]] = getelementptr inbounds %struct.poly8x16x2_t, %struct.poly8x16x2_t* [[A]], i32 0, i32 0			// CHECK: [[COERCE_DIVE1:%.*]] = getelementptr inbounds %struct.poly8x16x2_t, %struct.poly8x16x2_t* [[A]], i32 0, i32 0
	// CHECK: [[TMP0:%.*]] = load [2 x <16 x i8>], [2 x <16 x i8>]* [[COERCE_DIVE1]], align 16			// CHECK: [[TMP0:%.*]] = load [2 x <16 x i8>], [2 x <16 x i8>]* [[COERCE_DIVE1]], align 16
	// CHECK: [[COERCE_DIVE_I:%.*]] = getelementptr inbounds %struct.poly8x16x2_t, %struct.poly8x16x2_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[COERCE_DIVE_I:%.*]] = getelementptr inbounds %struct.poly8x16x2_t, %struct.poly8x16x2_t* [[__P0_I]], i32 0, i32 0
	// CHECK: store [2 x <16 x i8>] [[TMP0]], [2 x <16 x i8>]* [[COERCE_DIVE_I]], align 16			// CHECK: store [2 x <16 x i8>] [[TMP0]], [2 x <16 x i8>]* [[COERCE_DIVE_I]], align 16
	// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.poly8x16x2_t, %struct.poly8x16x2_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.poly8x16x2_t, %struct.poly8x16x2_t* [[__P0_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [2 x <16 x i8>], [2 x <16 x i8>]* [[VAL_I]], i64 0, i64 0			// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [2 x <16 x i8>], [2 x <16 x i8>]* [[VAL_I]], i64 0, i64 0
	// CHECK: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX_I]], align 16			// CHECK: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX_I]], align 16
	// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.poly8x16x2_t, %struct.poly8x16x2_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.poly8x16x2_t, %struct.poly8x16x2_t* [[__P0_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [2 x <16 x i8>], [2 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1			// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [2 x <16 x i8>], [2 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1
	// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16			// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16
	// CHECK: [[VTBL2_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl2.v8i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <8 x i8> %b) #2			// CHECK: [[VTBL2_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl2.v8i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <8 x i8> %b) #3
	// CHECK: ret <8 x i8> [[VTBL2_I]]			// CHECK: ret <8 x i8> [[VTBL2_I]]
	poly8x8_t test_vqtbl2_p8(poly8x16x2_t a, uint8x8_t b) {			poly8x8_t test_vqtbl2_p8(poly8x16x2_t a, uint8x8_t b) {
	return vqtbl2_p8(a, b);			return vqtbl2_p8(a, b);
	}			}

	// CHECK-LABEL: define <8 x i8> @test_vtbl3_p8([3 x <8 x i8>] %a.coerce, <8 x i8> %b) #0 {			// CHECK-LABEL: define <8 x i8> @test_vtbl3_p8([3 x <8 x i8>] %a.coerce, <8 x i8> %b) #0 {
	// CHECK: [[__P0_I:%.*]] = alloca %struct.poly8x8x3_t, align 8			// CHECK: [[__P0_I:%.*]] = alloca %struct.poly8x8x3_t, align 8
	// CHECK: [[A:%.*]] = alloca %struct.poly8x8x3_t, align 8			// CHECK: [[A:%.*]] = alloca %struct.poly8x8x3_t, align 8
	// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.poly8x8x3_t, %struct.poly8x8x3_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.poly8x8x3_t, %struct.poly8x8x3_t* [[__P0_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [3 x <8 x i8>], [3 x <8 x i8>]* [[VAL1_I]], i64 0, i64 1			// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [3 x <8 x i8>], [3 x <8 x i8>]* [[VAL1_I]], i64 0, i64 1
	// CHECK: [[TMP2:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX2_I]], align 8			// CHECK: [[TMP2:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX2_I]], align 8
	// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.poly8x8x3_t, %struct.poly8x8x3_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.poly8x8x3_t, %struct.poly8x8x3_t* [[__P0_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [3 x <8 x i8>], [3 x <8 x i8>]* [[VAL3_I]], i64 0, i64 2			// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [3 x <8 x i8>], [3 x <8 x i8>]* [[VAL3_I]], i64 0, i64 2
	// CHECK: [[TMP3:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX4_I]], align 8			// CHECK: [[TMP3:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX4_I]], align 8
	// CHECK: [[VTBL2_I:%.*]] = shufflevector <8 x i8> [[TMP1]], <8 x i8> [[TMP2]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>			// CHECK: [[VTBL2_I:%.*]] = shufflevector <8 x i8> [[TMP1]], <8 x i8> [[TMP2]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
	// CHECK: [[VTBL25_I:%.*]] = shufflevector <8 x i8> [[TMP3]], <8 x i8> zeroinitializer, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>			// CHECK: [[VTBL25_I:%.*]] = shufflevector <8 x i8> [[TMP3]], <8 x i8> zeroinitializer, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
	// CHECK: [[VTBL26_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl2.v8i8(<16 x i8> [[VTBL2_I]], <16 x i8> [[VTBL25_I]], <8 x i8> %b) #2			// CHECK: [[VTBL26_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl2.v8i8(<16 x i8> [[VTBL2_I]], <16 x i8> [[VTBL25_I]], <8 x i8> %b) #3
	// CHECK: ret <8 x i8> [[VTBL26_I]]			// CHECK: ret <8 x i8> [[VTBL26_I]]
	poly8x8_t test_vtbl3_p8(poly8x8x3_t a, uint8x8_t b) {			poly8x8_t test_vtbl3_p8(poly8x8x3_t a, uint8x8_t b) {
	return vtbl3_p8(a, b);			return vtbl3_p8(a, b);
	}			}

	// CHECK-LABEL: define <8 x i8> @test_vqtbl3_p8([3 x <16 x i8>] %a.coerce, <8 x i8> %b) #0 {			// CHECK-LABEL: define <8 x i8> @test_vqtbl3_p8([3 x <16 x i8>] %a.coerce, <8 x i8> %b) #0 {
	// CHECK: [[__P0_I:%.*]] = alloca %struct.poly8x16x3_t, align 16			// CHECK: [[__P0_I:%.*]] = alloca %struct.poly8x16x3_t, align 16
	// CHECK: [[A:%.*]] = alloca %struct.poly8x16x3_t, align 16			// CHECK: [[A:%.*]] = alloca %struct.poly8x16x3_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly8x16x3_t, %struct.poly8x16x3_t* [[A]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly8x16x3_t, %struct.poly8x16x3_t* [[A]], i32 0, i32 0
	// CHECK: store [3 x <16 x i8>] [[A]].coerce, [3 x <16 x i8>]* [[COERCE_DIVE]], align 16			// CHECK: store [3 x <16 x i8>] [[A]].coerce, [3 x <16 x i8>]* [[COERCE_DIVE]], align 16
	// CHECK: [[COERCE_DIVE1:%.*]] = getelementptr inbounds %struct.poly8x16x3_t, %struct.poly8x16x3_t* [[A]], i32 0, i32 0			// CHECK: [[COERCE_DIVE1:%.*]] = getelementptr inbounds %struct.poly8x16x3_t, %struct.poly8x16x3_t* [[A]], i32 0, i32 0
	// CHECK: [[TMP0:%.*]] = load [3 x <16 x i8>], [3 x <16 x i8>]* [[COERCE_DIVE1]], align 16			// CHECK: [[TMP0:%.*]] = load [3 x <16 x i8>], [3 x <16 x i8>]* [[COERCE_DIVE1]], align 16
	// CHECK: [[COERCE_DIVE_I:%.*]] = getelementptr inbounds %struct.poly8x16x3_t, %struct.poly8x16x3_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[COERCE_DIVE_I:%.*]] = getelementptr inbounds %struct.poly8x16x3_t, %struct.poly8x16x3_t* [[__P0_I]], i32 0, i32 0
	// CHECK: store [3 x <16 x i8>] [[TMP0]], [3 x <16 x i8>]* [[COERCE_DIVE_I]], align 16			// CHECK: store [3 x <16 x i8>] [[TMP0]], [3 x <16 x i8>]* [[COERCE_DIVE_I]], align 16
	// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.poly8x16x3_t, %struct.poly8x16x3_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.poly8x16x3_t, %struct.poly8x16x3_t* [[__P0_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL_I]], i64 0, i64 0			// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL_I]], i64 0, i64 0
	// CHECK: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX_I]], align 16			// CHECK: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX_I]], align 16
	// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.poly8x16x3_t, %struct.poly8x16x3_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.poly8x16x3_t, %struct.poly8x16x3_t* [[__P0_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1			// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1
	// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16			// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16
	// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.poly8x16x3_t, %struct.poly8x16x3_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.poly8x16x3_t, %struct.poly8x16x3_t* [[__P0_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL3_I]], i64 0, i64 2			// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL3_I]], i64 0, i64 2
	// CHECK: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX4_I]], align 16			// CHECK: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX4_I]], align 16
	// CHECK: [[VTBL3_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl3.v8i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <16 x i8> [[TMP3]], <8 x i8> %b) #2			// CHECK: [[VTBL3_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl3.v8i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <16 x i8> [[TMP3]], <8 x i8> %b) #3
	// CHECK: ret <8 x i8> [[VTBL3_I]]			// CHECK: ret <8 x i8> [[VTBL3_I]]
	poly8x8_t test_vqtbl3_p8(poly8x16x3_t a, uint8x8_t b) {			poly8x8_t test_vqtbl3_p8(poly8x16x3_t a, uint8x8_t b) {
	return vqtbl3_p8(a, b);			return vqtbl3_p8(a, b);
	}			}

	// CHECK-LABEL: define <8 x i8> @test_vtbl4_p8([4 x <8 x i8>] %a.coerce, <8 x i8> %b) #0 {			// CHECK-LABEL: define <8 x i8> @test_vtbl4_p8([4 x <8 x i8>] %a.coerce, <8 x i8> %b) #0 {
	// CHECK: [[__P0_I:%.*]] = alloca %struct.poly8x8x4_t, align 8			// CHECK: [[__P0_I:%.*]] = alloca %struct.poly8x8x4_t, align 8
	// CHECK: [[A:%.*]] = alloca %struct.poly8x8x4_t, align 8			// CHECK: [[A:%.*]] = alloca %struct.poly8x8x4_t, align 8
	// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.poly8x8x4_t, %struct.poly8x8x4_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.poly8x8x4_t, %struct.poly8x8x4_t* [[__P0_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [4 x <8 x i8>], [4 x <8 x i8>]* [[VAL3_I]], i64 0, i64 2			// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [4 x <8 x i8>], [4 x <8 x i8>]* [[VAL3_I]], i64 0, i64 2
	// CHECK: [[TMP3:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX4_I]], align 8			// CHECK: [[TMP3:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX4_I]], align 8
	// CHECK: [[VAL5_I:%.*]] = getelementptr inbounds %struct.poly8x8x4_t, %struct.poly8x8x4_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[VAL5_I:%.*]] = getelementptr inbounds %struct.poly8x8x4_t, %struct.poly8x8x4_t* [[__P0_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX6_I:%.*]] = getelementptr inbounds [4 x <8 x i8>], [4 x <8 x i8>]* [[VAL5_I]], i64 0, i64 3			// CHECK: [[ARRAYIDX6_I:%.*]] = getelementptr inbounds [4 x <8 x i8>], [4 x <8 x i8>]* [[VAL5_I]], i64 0, i64 3
	// CHECK: [[TMP4:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX6_I]], align 8			// CHECK: [[TMP4:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX6_I]], align 8
	// CHECK: [[VTBL2_I:%.*]] = shufflevector <8 x i8> [[TMP1]], <8 x i8> [[TMP2]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>			// CHECK: [[VTBL2_I:%.*]] = shufflevector <8 x i8> [[TMP1]], <8 x i8> [[TMP2]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
	// CHECK: [[VTBL27_I:%.*]] = shufflevector <8 x i8> [[TMP3]], <8 x i8> [[TMP4]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>			// CHECK: [[VTBL27_I:%.*]] = shufflevector <8 x i8> [[TMP3]], <8 x i8> [[TMP4]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
	// CHECK: [[VTBL28_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl2.v8i8(<16 x i8> [[VTBL2_I]], <16 x i8> [[VTBL27_I]], <8 x i8> %b) #2			// CHECK: [[VTBL28_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl2.v8i8(<16 x i8> [[VTBL2_I]], <16 x i8> [[VTBL27_I]], <8 x i8> %b) #3
	// CHECK: ret <8 x i8> [[VTBL28_I]]			// CHECK: ret <8 x i8> [[VTBL28_I]]
	poly8x8_t test_vtbl4_p8(poly8x8x4_t a, uint8x8_t b) {			poly8x8_t test_vtbl4_p8(poly8x8x4_t a, uint8x8_t b) {
	return vtbl4_p8(a, b);			return vtbl4_p8(a, b);
	}			}

	// CHECK-LABEL: define <8 x i8> @test_vqtbl4_p8([4 x <16 x i8>] %a.coerce, <8 x i8> %b) #0 {			// CHECK-LABEL: define <8 x i8> @test_vqtbl4_p8([4 x <16 x i8>] %a.coerce, <8 x i8> %b) #0 {
	// CHECK: [[__P0_I:%.*]] = alloca %struct.poly8x16x4_t, align 16			// CHECK: [[__P0_I:%.*]] = alloca %struct.poly8x16x4_t, align 16
	// CHECK: [[A:%.*]] = alloca %struct.poly8x16x4_t, align 16			// CHECK: [[A:%.*]] = alloca %struct.poly8x16x4_t, align 16
	// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1			// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1
	// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16			// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16
	// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.poly8x16x4_t, %struct.poly8x16x4_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.poly8x16x4_t, %struct.poly8x16x4_t* [[__P0_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL3_I]], i64 0, i64 2			// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL3_I]], i64 0, i64 2
	// CHECK: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX4_I]], align 16			// CHECK: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX4_I]], align 16
	// CHECK: [[VAL5_I:%.*]] = getelementptr inbounds %struct.poly8x16x4_t, %struct.poly8x16x4_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[VAL5_I:%.*]] = getelementptr inbounds %struct.poly8x16x4_t, %struct.poly8x16x4_t* [[__P0_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX6_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL5_I]], i64 0, i64 3			// CHECK: [[ARRAYIDX6_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL5_I]], i64 0, i64 3
	// CHECK: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX6_I]], align 16			// CHECK: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX6_I]], align 16
	// CHECK: [[VTBL4_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl4.v8i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <16 x i8> [[TMP3]], <16 x i8> [[TMP4]], <8 x i8> %b) #2			// CHECK: [[VTBL4_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl4.v8i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <16 x i8> [[TMP3]], <16 x i8> [[TMP4]], <8 x i8> %b) #3
	// CHECK: ret <8 x i8> [[VTBL4_I]]			// CHECK: ret <8 x i8> [[VTBL4_I]]
	poly8x8_t test_vqtbl4_p8(poly8x16x4_t a, uint8x8_t b) {			poly8x8_t test_vqtbl4_p8(poly8x16x4_t a, uint8x8_t b) {
	return vqtbl4_p8(a, b);			return vqtbl4_p8(a, b);
	}			}

	// CHECK-LABEL: define <16 x i8> @test_vqtbl1q_p8(<16 x i8> %a, <16 x i8> %b) #0 {			// CHECK-LABEL: define <16 x i8> @test_vqtbl1q_p8(<16 x i8> %a, <16 x i8> %b) #1 {
	// CHECK: [[VTBL1_I:%.*]] = call <16 x i8> @llvm.aarch64.neon.tbl1.v16i8(<16 x i8> %a, <16 x i8> %b) #2			// CHECK: [[VTBL1_I:%.*]] = call <16 x i8> @llvm.aarch64.neon.tbl1.v16i8(<16 x i8> %a, <16 x i8> %b) #3
	// CHECK: ret <16 x i8> [[VTBL1_I]]			// CHECK: ret <16 x i8> [[VTBL1_I]]
	poly8x16_t test_vqtbl1q_p8(poly8x16_t a, uint8x16_t b) {			poly8x16_t test_vqtbl1q_p8(poly8x16_t a, uint8x16_t b) {
	return vqtbl1q_p8(a, b);			return vqtbl1q_p8(a, b);
	}			}

	// CHECK-LABEL: define <16 x i8> @test_vqtbl2q_p8([2 x <16 x i8>] %a.coerce, <16 x i8> %b) #0 {			// CHECK-LABEL: define <16 x i8> @test_vqtbl2q_p8([2 x <16 x i8>] %a.coerce, <16 x i8> %b) #1 {
	// CHECK: [[__P0_I:%.*]] = alloca %struct.poly8x16x2_t, align 16			// CHECK: [[__P0_I:%.*]] = alloca %struct.poly8x16x2_t, align 16
	// CHECK: [[A:%.*]] = alloca %struct.poly8x16x2_t, align 16			// CHECK: [[A:%.*]] = alloca %struct.poly8x16x2_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly8x16x2_t, %struct.poly8x16x2_t* [[A]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly8x16x2_t, %struct.poly8x16x2_t* [[A]], i32 0, i32 0
	// CHECK: store [2 x <16 x i8>] [[A]].coerce, [2 x <16 x i8>]* [[COERCE_DIVE]], align 16			// CHECK: store [2 x <16 x i8>] [[A]].coerce, [2 x <16 x i8>]* [[COERCE_DIVE]], align 16
	// CHECK: [[COERCE_DIVE1:%.*]] = getelementptr inbounds %struct.poly8x16x2_t, %struct.poly8x16x2_t* [[A]], i32 0, i32 0			// CHECK: [[COERCE_DIVE1:%.*]] = getelementptr inbounds %struct.poly8x16x2_t, %struct.poly8x16x2_t* [[A]], i32 0, i32 0
	// CHECK: [[TMP0:%.*]] = load [2 x <16 x i8>], [2 x <16 x i8>]* [[COERCE_DIVE1]], align 16			// CHECK: [[TMP0:%.*]] = load [2 x <16 x i8>], [2 x <16 x i8>]* [[COERCE_DIVE1]], align 16
	// CHECK: [[COERCE_DIVE_I:%.*]] = getelementptr inbounds %struct.poly8x16x2_t, %struct.poly8x16x2_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[COERCE_DIVE_I:%.*]] = getelementptr inbounds %struct.poly8x16x2_t, %struct.poly8x16x2_t* [[__P0_I]], i32 0, i32 0
	// CHECK: store [2 x <16 x i8>] [[TMP0]], [2 x <16 x i8>]* [[COERCE_DIVE_I]], align 16			// CHECK: store [2 x <16 x i8>] [[TMP0]], [2 x <16 x i8>]* [[COERCE_DIVE_I]], align 16
	// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.poly8x16x2_t, %struct.poly8x16x2_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.poly8x16x2_t, %struct.poly8x16x2_t* [[__P0_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [2 x <16 x i8>], [2 x <16 x i8>]* [[VAL_I]], i64 0, i64 0			// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [2 x <16 x i8>], [2 x <16 x i8>]* [[VAL_I]], i64 0, i64 0
	// CHECK: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX_I]], align 16			// CHECK: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX_I]], align 16
	// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.poly8x16x2_t, %struct.poly8x16x2_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.poly8x16x2_t, %struct.poly8x16x2_t* [[__P0_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [2 x <16 x i8>], [2 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1			// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [2 x <16 x i8>], [2 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1
	// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16			// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16
	// CHECK: [[VTBL2_I:%.*]] = call <16 x i8> @llvm.aarch64.neon.tbl2.v16i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <16 x i8> %b) #2			// CHECK: [[VTBL2_I:%.*]] = call <16 x i8> @llvm.aarch64.neon.tbl2.v16i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <16 x i8> %b) #3
	// CHECK: ret <16 x i8> [[VTBL2_I]]			// CHECK: ret <16 x i8> [[VTBL2_I]]
	poly8x16_t test_vqtbl2q_p8(poly8x16x2_t a, uint8x16_t b) {			poly8x16_t test_vqtbl2q_p8(poly8x16x2_t a, uint8x16_t b) {
	return vqtbl2q_p8(a, b);			return vqtbl2q_p8(a, b);
	}			}

	// CHECK-LABEL: define <16 x i8> @test_vqtbl3q_p8([3 x <16 x i8>] %a.coerce, <16 x i8> %b) #0 {			// CHECK-LABEL: define <16 x i8> @test_vqtbl3q_p8([3 x <16 x i8>] %a.coerce, <16 x i8> %b) #1 {
	// CHECK: [[__P0_I:%.*]] = alloca %struct.poly8x16x3_t, align 16			// CHECK: [[__P0_I:%.*]] = alloca %struct.poly8x16x3_t, align 16
	// CHECK: [[A:%.*]] = alloca %struct.poly8x16x3_t, align 16			// CHECK: [[A:%.*]] = alloca %struct.poly8x16x3_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly8x16x3_t, %struct.poly8x16x3_t* [[A]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly8x16x3_t, %struct.poly8x16x3_t* [[A]], i32 0, i32 0
	// CHECK: store [3 x <16 x i8>] [[A]].coerce, [3 x <16 x i8>]* [[COERCE_DIVE]], align 16			// CHECK: store [3 x <16 x i8>] [[A]].coerce, [3 x <16 x i8>]* [[COERCE_DIVE]], align 16
	// CHECK: [[COERCE_DIVE1:%.*]] = getelementptr inbounds %struct.poly8x16x3_t, %struct.poly8x16x3_t* [[A]], i32 0, i32 0			// CHECK: [[COERCE_DIVE1:%.*]] = getelementptr inbounds %struct.poly8x16x3_t, %struct.poly8x16x3_t* [[A]], i32 0, i32 0
	// CHECK: [[TMP0:%.*]] = load [3 x <16 x i8>], [3 x <16 x i8>]* [[COERCE_DIVE1]], align 16			// CHECK: [[TMP0:%.*]] = load [3 x <16 x i8>], [3 x <16 x i8>]* [[COERCE_DIVE1]], align 16
	// CHECK: [[COERCE_DIVE_I:%.*]] = getelementptr inbounds %struct.poly8x16x3_t, %struct.poly8x16x3_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[COERCE_DIVE_I:%.*]] = getelementptr inbounds %struct.poly8x16x3_t, %struct.poly8x16x3_t* [[__P0_I]], i32 0, i32 0
	// CHECK: store [3 x <16 x i8>] [[TMP0]], [3 x <16 x i8>]* [[COERCE_DIVE_I]], align 16			// CHECK: store [3 x <16 x i8>] [[TMP0]], [3 x <16 x i8>]* [[COERCE_DIVE_I]], align 16
	// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.poly8x16x3_t, %struct.poly8x16x3_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.poly8x16x3_t, %struct.poly8x16x3_t* [[__P0_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL_I]], i64 0, i64 0			// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL_I]], i64 0, i64 0
	// CHECK: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX_I]], align 16			// CHECK: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX_I]], align 16
	// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.poly8x16x3_t, %struct.poly8x16x3_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.poly8x16x3_t, %struct.poly8x16x3_t* [[__P0_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1			// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1
	// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16			// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16
	// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.poly8x16x3_t, %struct.poly8x16x3_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.poly8x16x3_t, %struct.poly8x16x3_t* [[__P0_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL3_I]], i64 0, i64 2			// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL3_I]], i64 0, i64 2
	// CHECK: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX4_I]], align 16			// CHECK: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX4_I]], align 16
	// CHECK: [[VTBL3_I:%.*]] = call <16 x i8> @llvm.aarch64.neon.tbl3.v16i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <16 x i8> [[TMP3]], <16 x i8> %b) #2			// CHECK: [[VTBL3_I:%.*]] = call <16 x i8> @llvm.aarch64.neon.tbl3.v16i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <16 x i8> [[TMP3]], <16 x i8> %b) #3
	// CHECK: ret <16 x i8> [[VTBL3_I]]			// CHECK: ret <16 x i8> [[VTBL3_I]]
	poly8x16_t test_vqtbl3q_p8(poly8x16x3_t a, uint8x16_t b) {			poly8x16_t test_vqtbl3q_p8(poly8x16x3_t a, uint8x16_t b) {
	return vqtbl3q_p8(a, b);			return vqtbl3q_p8(a, b);
	}			}

	// CHECK-LABEL: define <16 x i8> @test_vqtbl4q_p8([4 x <16 x i8>] %a.coerce, <16 x i8> %b) #0 {			// CHECK-LABEL: define <16 x i8> @test_vqtbl4q_p8([4 x <16 x i8>] %a.coerce, <16 x i8> %b) #1 {
	// CHECK: [[__P0_I:%.*]] = alloca %struct.poly8x16x4_t, align 16			// CHECK: [[__P0_I:%.*]] = alloca %struct.poly8x16x4_t, align 16
	// CHECK: [[A:%.*]] = alloca %struct.poly8x16x4_t, align 16			// CHECK: [[A:%.*]] = alloca %struct.poly8x16x4_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly8x16x4_t, %struct.poly8x16x4_t* [[A]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly8x16x4_t, %struct.poly8x16x4_t* [[A]], i32 0, i32 0
	// CHECK: store [4 x <16 x i8>] [[A]].coerce, [4 x <16 x i8>]* [[COERCE_DIVE]], align 16			// CHECK: store [4 x <16 x i8>] [[A]].coerce, [4 x <16 x i8>]* [[COERCE_DIVE]], align 16
	// CHECK: [[COERCE_DIVE1:%.*]] = getelementptr inbounds %struct.poly8x16x4_t, %struct.poly8x16x4_t* [[A]], i32 0, i32 0			// CHECK: [[COERCE_DIVE1:%.*]] = getelementptr inbounds %struct.poly8x16x4_t, %struct.poly8x16x4_t* [[A]], i32 0, i32 0
	// CHECK: [[TMP0:%.*]] = load [4 x <16 x i8>], [4 x <16 x i8>]* [[COERCE_DIVE1]], align 16			// CHECK: [[TMP0:%.*]] = load [4 x <16 x i8>], [4 x <16 x i8>]* [[COERCE_DIVE1]], align 16
	// CHECK: [[COERCE_DIVE_I:%.*]] = getelementptr inbounds %struct.poly8x16x4_t, %struct.poly8x16x4_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[COERCE_DIVE_I:%.*]] = getelementptr inbounds %struct.poly8x16x4_t, %struct.poly8x16x4_t* [[__P0_I]], i32 0, i32 0
	// CHECK: store [4 x <16 x i8>] [[TMP0]], [4 x <16 x i8>]* [[COERCE_DIVE_I]], align 16			// CHECK: store [4 x <16 x i8>] [[TMP0]], [4 x <16 x i8>]* [[COERCE_DIVE_I]], align 16
	// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.poly8x16x4_t, %struct.poly8x16x4_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.poly8x16x4_t, %struct.poly8x16x4_t* [[__P0_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL_I]], i64 0, i64 0			// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL_I]], i64 0, i64 0
	// CHECK: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX_I]], align 16			// CHECK: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX_I]], align 16
	// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.poly8x16x4_t, %struct.poly8x16x4_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.poly8x16x4_t, %struct.poly8x16x4_t* [[__P0_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1			// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1
	// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16			// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16
	// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.poly8x16x4_t, %struct.poly8x16x4_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.poly8x16x4_t, %struct.poly8x16x4_t* [[__P0_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL3_I]], i64 0, i64 2			// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL3_I]], i64 0, i64 2
	// CHECK: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX4_I]], align 16			// CHECK: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX4_I]], align 16
	// CHECK: [[VAL5_I:%.*]] = getelementptr inbounds %struct.poly8x16x4_t, %struct.poly8x16x4_t* [[__P0_I]], i32 0, i32 0			// CHECK: [[VAL5_I:%.*]] = getelementptr inbounds %struct.poly8x16x4_t, %struct.poly8x16x4_t* [[__P0_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX6_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL5_I]], i64 0, i64 3			// CHECK: [[ARRAYIDX6_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL5_I]], i64 0, i64 3
	// CHECK: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX6_I]], align 16			// CHECK: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX6_I]], align 16
	// CHECK: [[VTBL4_I:%.*]] = call <16 x i8> @llvm.aarch64.neon.tbl4.v16i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <16 x i8> [[TMP3]], <16 x i8> [[TMP4]], <16 x i8> %b) #2			// CHECK: [[VTBL4_I:%.*]] = call <16 x i8> @llvm.aarch64.neon.tbl4.v16i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <16 x i8> [[TMP3]], <16 x i8> [[TMP4]], <16 x i8> %b) #3
	// CHECK: ret <16 x i8> [[VTBL4_I]]			// CHECK: ret <16 x i8> [[VTBL4_I]]
	poly8x16_t test_vqtbl4q_p8(poly8x16x4_t a, uint8x16_t b) {			poly8x16_t test_vqtbl4q_p8(poly8x16x4_t a, uint8x16_t b) {
	return vqtbl4q_p8(a, b);			return vqtbl4q_p8(a, b);
	}			}

	// CHECK-LABEL: define <8 x i8> @test_vtbx1_p8(<8 x i8> %a, <8 x i8> %b, <8 x i8> %c) #0 {			// CHECK-LABEL: define <8 x i8> @test_vtbx1_p8(<8 x i8> %a, <8 x i8> %b, <8 x i8> %c) #0 {
	// CHECK: [[VTBL1_I:%.*]] = shufflevector <8 x i8> %b, <8 x i8> zeroinitializer, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>			// CHECK: [[VTBL1_I:%.*]] = shufflevector <8 x i8> %b, <8 x i8> zeroinitializer, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
	// CHECK: [[VTBL11_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl1.v8i8(<16 x i8> [[VTBL1_I]], <8 x i8> %c) #2			// CHECK: [[VTBL11_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl1.v8i8(<16 x i8> [[VTBL1_I]], <8 x i8> %c) #3
	// CHECK: [[TMP0:%.*]] = icmp uge <8 x i8> %c, <i8 8, i8 8, i8 8, i8 8, i8 8, i8 8, i8 8, i8 8>			// CHECK: [[TMP0:%.*]] = icmp uge <8 x i8> %c, <i8 8, i8 8, i8 8, i8 8, i8 8, i8 8, i8 8, i8 8>
	// CHECK: [[TMP1:%.*]] = sext <8 x i1> [[TMP0]] to <8 x i8>			// CHECK: [[TMP1:%.*]] = sext <8 x i1> [[TMP0]] to <8 x i8>
	// CHECK: [[TMP2:%.*]] = and <8 x i8> [[TMP1]], %a			// CHECK: [[TMP2:%.*]] = and <8 x i8> [[TMP1]], %a
	// CHECK: [[TMP3:%.*]] = xor <8 x i8> [[TMP1]], <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>			// CHECK: [[TMP3:%.*]] = xor <8 x i8> [[TMP1]], <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>
	// CHECK: [[TMP4:%.*]] = and <8 x i8> [[TMP3]], [[VTBL11_I]]			// CHECK: [[TMP4:%.*]] = and <8 x i8> [[TMP3]], [[VTBL11_I]]
	// CHECK: [[VTBX_I:%.*]] = or <8 x i8> [[TMP2]], [[TMP4]]			// CHECK: [[VTBX_I:%.*]] = or <8 x i8> [[TMP2]], [[TMP4]]
	// CHECK: ret <8 x i8> [[VTBX_I]]			// CHECK: ret <8 x i8> [[VTBX_I]]
	poly8x8_t test_vtbx1_p8(poly8x8_t a, poly8x8_t b, uint8x8_t c) {			poly8x8_t test_vtbx1_p8(poly8x8_t a, poly8x8_t b, uint8x8_t c) {
	[11 unchanged lines not shown]
	// CHECK: store [2 x <8 x i8>] [[TMP0]], [2 x <8 x i8>]* [[COERCE_DIVE_I]], align 8			// CHECK: store [2 x <8 x i8>] [[TMP0]], [2 x <8 x i8>]* [[COERCE_DIVE_I]], align 8
	// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.poly8x8x2_t, %struct.poly8x8x2_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.poly8x8x2_t, %struct.poly8x8x2_t* [[__P1_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [2 x <8 x i8>], [2 x <8 x i8>]* [[VAL_I]], i64 0, i64 0			// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [2 x <8 x i8>], [2 x <8 x i8>]* [[VAL_I]], i64 0, i64 0
	// CHECK: [[TMP1:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX_I]], align 8			// CHECK: [[TMP1:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX_I]], align 8
	// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.poly8x8x2_t, %struct.poly8x8x2_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.poly8x8x2_t, %struct.poly8x8x2_t* [[__P1_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [2 x <8 x i8>], [2 x <8 x i8>]* [[VAL1_I]], i64 0, i64 1			// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [2 x <8 x i8>], [2 x <8 x i8>]* [[VAL1_I]], i64 0, i64 1
	// CHECK: [[TMP2:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX2_I]], align 8			// CHECK: [[TMP2:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX2_I]], align 8
	// CHECK: [[VTBX1_I:%.*]] = shufflevector <8 x i8> [[TMP1]], <8 x i8> [[TMP2]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>			// CHECK: [[VTBX1_I:%.*]] = shufflevector <8 x i8> [[TMP1]], <8 x i8> [[TMP2]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
	// CHECK: [[VTBX13_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbx1.v8i8(<8 x i8> %a, <16 x i8> [[VTBX1_I]], <8 x i8> %c) #2			// CHECK: [[VTBX13_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbx1.v8i8(<8 x i8> %a, <16 x i8> [[VTBX1_I]], <8 x i8> %c) #3
	// CHECK: ret <8 x i8> [[VTBX13_I]]			// CHECK: ret <8 x i8> [[VTBX13_I]]
	poly8x8_t test_vtbx2_p8(poly8x8_t a, poly8x8x2_t b, uint8x8_t c) {			poly8x8_t test_vtbx2_p8(poly8x8_t a, poly8x8x2_t b, uint8x8_t c) {
	return vtbx2_p8(a, b, c);			return vtbx2_p8(a, b, c);
	}			}

	// CHECK-LABEL: define <8 x i8> @test_vtbx3_p8(<8 x i8> %a, [3 x <8 x i8>] %b.coerce, <8 x i8> %c) #0 {			// CHECK-LABEL: define <8 x i8> @test_vtbx3_p8(<8 x i8> %a, [3 x <8 x i8>] %b.coerce, <8 x i8> %c) #0 {
	// CHECK: [[__P1_I:%.*]] = alloca %struct.poly8x8x3_t, align 8			// CHECK: [[__P1_I:%.*]] = alloca %struct.poly8x8x3_t, align 8
	// CHECK: [[B:%.*]] = alloca %struct.poly8x8x3_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.poly8x8x3_t, align 8
	[9 unchanged lines not shown]
	// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.poly8x8x3_t, %struct.poly8x8x3_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.poly8x8x3_t, %struct.poly8x8x3_t* [[__P1_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [3 x <8 x i8>], [3 x <8 x i8>]* [[VAL1_I]], i64 0, i64 1			// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [3 x <8 x i8>], [3 x <8 x i8>]* [[VAL1_I]], i64 0, i64 1
	// CHECK: [[TMP2:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX2_I]], align 8			// CHECK: [[TMP2:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX2_I]], align 8
	// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.poly8x8x3_t, %struct.poly8x8x3_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.poly8x8x3_t, %struct.poly8x8x3_t* [[__P1_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [3 x <8 x i8>], [3 x <8 x i8>]* [[VAL3_I]], i64 0, i64 2			// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [3 x <8 x i8>], [3 x <8 x i8>]* [[VAL3_I]], i64 0, i64 2
	// CHECK: [[TMP3:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX4_I]], align 8			// CHECK: [[TMP3:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX4_I]], align 8
	// CHECK: [[VTBL2_I:%.*]] = shufflevector <8 x i8> [[TMP1]], <8 x i8> [[TMP2]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>			// CHECK: [[VTBL2_I:%.*]] = shufflevector <8 x i8> [[TMP1]], <8 x i8> [[TMP2]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
	// CHECK: [[VTBL25_I:%.*]] = shufflevector <8 x i8> [[TMP3]], <8 x i8> zeroinitializer, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>			// CHECK: [[VTBL25_I:%.*]] = shufflevector <8 x i8> [[TMP3]], <8 x i8> zeroinitializer, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
	// CHECK: [[VTBL26_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl2.v8i8(<16 x i8> [[VTBL2_I]], <16 x i8> [[VTBL25_I]], <8 x i8> %c) #2			// CHECK: [[VTBL26_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl2.v8i8(<16 x i8> [[VTBL2_I]], <16 x i8> [[VTBL25_I]], <8 x i8> %c) #3
	// CHECK: [[TMP4:%.*]] = icmp uge <8 x i8> %c, <i8 24, i8 24, i8 24, i8 24, i8 24, i8 24, i8 24, i8 24>			// CHECK: [[TMP4:%.*]] = icmp uge <8 x i8> %c, <i8 24, i8 24, i8 24, i8 24, i8 24, i8 24, i8 24, i8 24>
	// CHECK: [[TMP5:%.*]] = sext <8 x i1> [[TMP4]] to <8 x i8>			// CHECK: [[TMP5:%.*]] = sext <8 x i1> [[TMP4]] to <8 x i8>
	// CHECK: [[TMP6:%.*]] = and <8 x i8> [[TMP5]], %a			// CHECK: [[TMP6:%.*]] = and <8 x i8> [[TMP5]], %a
	// CHECK: [[TMP7:%.*]] = xor <8 x i8> [[TMP5]], <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>			// CHECK: [[TMP7:%.*]] = xor <8 x i8> [[TMP5]], <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>
	// CHECK: [[TMP8:%.*]] = and <8 x i8> [[TMP7]], [[VTBL26_I]]			// CHECK: [[TMP8:%.*]] = and <8 x i8> [[TMP7]], [[VTBL26_I]]
	// CHECK: [[VTBX_I:%.*]] = or <8 x i8> [[TMP6]], [[TMP8]]			// CHECK: [[VTBX_I:%.*]] = or <8 x i8> [[TMP6]], [[TMP8]]
	// CHECK: ret <8 x i8> [[VTBX_I]]			// CHECK: ret <8 x i8> [[VTBX_I]]
	poly8x8_t test_vtbx3_p8(poly8x8_t a, poly8x8x3_t b, uint8x8_t c) {			poly8x8_t test_vtbx3_p8(poly8x8_t a, poly8x8x3_t b, uint8x8_t c) {
	[18 unchanged lines not shown]
	// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.poly8x8x4_t, %struct.poly8x8x4_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.poly8x8x4_t, %struct.poly8x8x4_t* [[__P1_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [4 x <8 x i8>], [4 x <8 x i8>]* [[VAL3_I]], i64 0, i64 2			// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [4 x <8 x i8>], [4 x <8 x i8>]* [[VAL3_I]], i64 0, i64 2
	// CHECK: [[TMP3:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX4_I]], align 8			// CHECK: [[TMP3:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX4_I]], align 8
	// CHECK: [[VAL5_I:%.*]] = getelementptr inbounds %struct.poly8x8x4_t, %struct.poly8x8x4_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[VAL5_I:%.*]] = getelementptr inbounds %struct.poly8x8x4_t, %struct.poly8x8x4_t* [[__P1_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX6_I:%.*]] = getelementptr inbounds [4 x <8 x i8>], [4 x <8 x i8>]* [[VAL5_I]], i64 0, i64 3			// CHECK: [[ARRAYIDX6_I:%.*]] = getelementptr inbounds [4 x <8 x i8>], [4 x <8 x i8>]* [[VAL5_I]], i64 0, i64 3
	// CHECK: [[TMP4:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX6_I]], align 8			// CHECK: [[TMP4:%.*]] = load <8 x i8>, <8 x i8>* [[ARRAYIDX6_I]], align 8
	// CHECK: [[VTBX2_I:%.*]] = shufflevector <8 x i8> [[TMP1]], <8 x i8> [[TMP2]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>			// CHECK: [[VTBX2_I:%.*]] = shufflevector <8 x i8> [[TMP1]], <8 x i8> [[TMP2]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
	// CHECK: [[VTBX27_I:%.*]] = shufflevector <8 x i8> [[TMP3]], <8 x i8> [[TMP4]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>			// CHECK: [[VTBX27_I:%.*]] = shufflevector <8 x i8> [[TMP3]], <8 x i8> [[TMP4]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
	// CHECK: [[VTBX28_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbx2.v8i8(<8 x i8> %a, <16 x i8> [[VTBX2_I]], <16 x i8> [[VTBX27_I]], <8 x i8> %c) #2			// CHECK: [[VTBX28_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbx2.v8i8(<8 x i8> %a, <16 x i8> [[VTBX2_I]], <16 x i8> [[VTBX27_I]], <8 x i8> %c) #3
	// CHECK: ret <8 x i8> [[VTBX28_I]]			// CHECK: ret <8 x i8> [[VTBX28_I]]
	poly8x8_t test_vtbx4_p8(poly8x8_t a, poly8x8x4_t b, uint8x8_t c) {			poly8x8_t test_vtbx4_p8(poly8x8_t a, poly8x8x4_t b, uint8x8_t c) {
	return vtbx4_p8(a, b, c);			return vtbx4_p8(a, b, c);
	}			}

	// CHECK-LABEL: define <8 x i8> @test_vqtbx1_p8(<8 x i8> %a, <16 x i8> %b, <8 x i8> %c) #0 {			// CHECK-LABEL: define <8 x i8> @test_vqtbx1_p8(<8 x i8> %a, <16 x i8> %b, <8 x i8> %c) #1 {
	// CHECK: [[VTBX1_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbx1.v8i8(<8 x i8> %a, <16 x i8> %b, <8 x i8> %c) #2			// CHECK: [[VTBX1_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbx1.v8i8(<8 x i8> %a, <16 x i8> %b, <8 x i8> %c) #3
	// CHECK: ret <8 x i8> [[VTBX1_I]]			// CHECK: ret <8 x i8> [[VTBX1_I]]
	poly8x8_t test_vqtbx1_p8(poly8x8_t a, uint8x16_t b, uint8x8_t c) {			poly8x8_t test_vqtbx1_p8(poly8x8_t a, uint8x16_t b, uint8x8_t c) {
	return vqtbx1_p8(a, b, c);			return vqtbx1_p8(a, b, c);
	}			}

	// CHECK-LABEL: define <8 x i8> @test_vqtbx2_p8(<8 x i8> %a, [2 x <16 x i8>] %b.coerce, <8 x i8> %c) #0 {			// CHECK-LABEL: define <8 x i8> @test_vqtbx2_p8(<8 x i8> %a, [2 x <16 x i8>] %b.coerce, <8 x i8> %c) #0 {
	// CHECK: [[__P1_I:%.*]] = alloca %struct.poly8x16x2_t, align 16			// CHECK: [[__P1_I:%.*]] = alloca %struct.poly8x16x2_t, align 16
	// CHECK: [[B:%.*]] = alloca %struct.poly8x16x2_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.poly8x16x2_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly8x16x2_t, %struct.poly8x16x2_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly8x16x2_t, %struct.poly8x16x2_t* [[B]], i32 0, i32 0
	// CHECK: store [2 x <16 x i8>] [[B]].coerce, [2 x <16 x i8>]* [[COERCE_DIVE]], align 16			// CHECK: store [2 x <16 x i8>] [[B]].coerce, [2 x <16 x i8>]* [[COERCE_DIVE]], align 16
	// CHECK: [[COERCE_DIVE1:%.*]] = getelementptr inbounds %struct.poly8x16x2_t, %struct.poly8x16x2_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE1:%.*]] = getelementptr inbounds %struct.poly8x16x2_t, %struct.poly8x16x2_t* [[B]], i32 0, i32 0
	// CHECK: [[TMP0:%.*]] = load [2 x <16 x i8>], [2 x <16 x i8>]* [[COERCE_DIVE1]], align 16			// CHECK: [[TMP0:%.*]] = load [2 x <16 x i8>], [2 x <16 x i8>]* [[COERCE_DIVE1]], align 16
	// CHECK: [[COERCE_DIVE_I:%.*]] = getelementptr inbounds %struct.poly8x16x2_t, %struct.poly8x16x2_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[COERCE_DIVE_I:%.*]] = getelementptr inbounds %struct.poly8x16x2_t, %struct.poly8x16x2_t* [[__P1_I]], i32 0, i32 0
	// CHECK: store [2 x <16 x i8>] [[TMP0]], [2 x <16 x i8>]* [[COERCE_DIVE_I]], align 16			// CHECK: store [2 x <16 x i8>] [[TMP0]], [2 x <16 x i8>]* [[COERCE_DIVE_I]], align 16
	// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.poly8x16x2_t, %struct.poly8x16x2_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.poly8x16x2_t, %struct.poly8x16x2_t* [[__P1_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [2 x <16 x i8>], [2 x <16 x i8>]* [[VAL_I]], i64 0, i64 0			// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [2 x <16 x i8>], [2 x <16 x i8>]* [[VAL_I]], i64 0, i64 0
	// CHECK: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX_I]], align 16			// CHECK: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX_I]], align 16
	// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.poly8x16x2_t, %struct.poly8x16x2_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.poly8x16x2_t, %struct.poly8x16x2_t* [[__P1_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [2 x <16 x i8>], [2 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1			// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [2 x <16 x i8>], [2 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1
	// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16			// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16
	// CHECK: [[VTBX2_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbx2.v8i8(<8 x i8> %a, <16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <8 x i8> %c) #2			// CHECK: [[VTBX2_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbx2.v8i8(<8 x i8> %a, <16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <8 x i8> %c) #3
	// CHECK: ret <8 x i8> [[VTBX2_I]]			// CHECK: ret <8 x i8> [[VTBX2_I]]
	poly8x8_t test_vqtbx2_p8(poly8x8_t a, poly8x16x2_t b, uint8x8_t c) {			poly8x8_t test_vqtbx2_p8(poly8x8_t a, poly8x16x2_t b, uint8x8_t c) {
	return vqtbx2_p8(a, b, c);			return vqtbx2_p8(a, b, c);
	}			}

	// CHECK-LABEL: define <8 x i8> @test_vqtbx3_p8(<8 x i8> %a, [3 x <16 x i8>] %b.coerce, <8 x i8> %c) #0 {			// CHECK-LABEL: define <8 x i8> @test_vqtbx3_p8(<8 x i8> %a, [3 x <16 x i8>] %b.coerce, <8 x i8> %c) #0 {
	// CHECK: [[__P1_I:%.*]] = alloca %struct.poly8x16x3_t, align 16			// CHECK: [[__P1_I:%.*]] = alloca %struct.poly8x16x3_t, align 16
	// CHECK: [[B:%.*]] = alloca %struct.poly8x16x3_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.poly8x16x3_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly8x16x3_t, %struct.poly8x16x3_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly8x16x3_t, %struct.poly8x16x3_t* [[B]], i32 0, i32 0
	// CHECK: store [3 x <16 x i8>] [[B]].coerce, [3 x <16 x i8>]* [[COERCE_DIVE]], align 16			// CHECK: store [3 x <16 x i8>] [[B]].coerce, [3 x <16 x i8>]* [[COERCE_DIVE]], align 16
	// CHECK: [[COERCE_DIVE1:%.*]] = getelementptr inbounds %struct.poly8x16x3_t, %struct.poly8x16x3_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE1:%.*]] = getelementptr inbounds %struct.poly8x16x3_t, %struct.poly8x16x3_t* [[B]], i32 0, i32 0
	// CHECK: [[TMP0:%.*]] = load [3 x <16 x i8>], [3 x <16 x i8>]* [[COERCE_DIVE1]], align 16			// CHECK: [[TMP0:%.*]] = load [3 x <16 x i8>], [3 x <16 x i8>]* [[COERCE_DIVE1]], align 16
	// CHECK: [[COERCE_DIVE_I:%.*]] = getelementptr inbounds %struct.poly8x16x3_t, %struct.poly8x16x3_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[COERCE_DIVE_I:%.*]] = getelementptr inbounds %struct.poly8x16x3_t, %struct.poly8x16x3_t* [[__P1_I]], i32 0, i32 0
	// CHECK: store [3 x <16 x i8>] [[TMP0]], [3 x <16 x i8>]* [[COERCE_DIVE_I]], align 16			// CHECK: store [3 x <16 x i8>] [[TMP0]], [3 x <16 x i8>]* [[COERCE_DIVE_I]], align 16
	// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.poly8x16x3_t, %struct.poly8x16x3_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.poly8x16x3_t, %struct.poly8x16x3_t* [[__P1_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL_I]], i64 0, i64 0			// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL_I]], i64 0, i64 0
	// CHECK: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX_I]], align 16			// CHECK: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX_I]], align 16
	// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.poly8x16x3_t, %struct.poly8x16x3_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.poly8x16x3_t, %struct.poly8x16x3_t* [[__P1_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1			// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1
	// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16			// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16
	// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.poly8x16x3_t, %struct.poly8x16x3_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.poly8x16x3_t, %struct.poly8x16x3_t* [[__P1_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL3_I]], i64 0, i64 2			// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL3_I]], i64 0, i64 2
	// CHECK: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX4_I]], align 16			// CHECK: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX4_I]], align 16
	// CHECK: [[VTBX3_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbx3.v8i8(<8 x i8> %a, <16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <16 x i8> [[TMP3]], <8 x i8> %c) #2			// CHECK: [[VTBX3_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbx3.v8i8(<8 x i8> %a, <16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <16 x i8> [[TMP3]], <8 x i8> %c) #3
	// CHECK: ret <8 x i8> [[VTBX3_I]]			// CHECK: ret <8 x i8> [[VTBX3_I]]
	poly8x8_t test_vqtbx3_p8(poly8x8_t a, poly8x16x3_t b, uint8x8_t c) {			poly8x8_t test_vqtbx3_p8(poly8x8_t a, poly8x16x3_t b, uint8x8_t c) {
	return vqtbx3_p8(a, b, c);			return vqtbx3_p8(a, b, c);
	}			}

	// CHECK-LABEL: define <8 x i8> @test_vqtbx4_p8(<8 x i8> %a, [4 x <16 x i8>] %b.coerce, <8 x i8> %c) #0 {			// CHECK-LABEL: define <8 x i8> @test_vqtbx4_p8(<8 x i8> %a, [4 x <16 x i8>] %b.coerce, <8 x i8> %c) #0 {
	// CHECK: [[__P1_I:%.*]] = alloca %struct.poly8x16x4_t, align 16			// CHECK: [[__P1_I:%.*]] = alloca %struct.poly8x16x4_t, align 16
	// CHECK: [[B:%.*]] = alloca %struct.poly8x16x4_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.poly8x16x4_t, align 16
	[10 unchanged lines not shown]
	// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1			// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1
	// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16			// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16
	// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.poly8x16x4_t, %struct.poly8x16x4_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.poly8x16x4_t, %struct.poly8x16x4_t* [[__P1_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL3_I]], i64 0, i64 2			// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL3_I]], i64 0, i64 2
	// CHECK: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX4_I]], align 16			// CHECK: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX4_I]], align 16
	// CHECK: [[VAL5_I:%.*]] = getelementptr inbounds %struct.poly8x16x4_t, %struct.poly8x16x4_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[VAL5_I:%.*]] = getelementptr inbounds %struct.poly8x16x4_t, %struct.poly8x16x4_t* [[__P1_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX6_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL5_I]], i64 0, i64 3			// CHECK: [[ARRAYIDX6_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL5_I]], i64 0, i64 3
	// CHECK: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX6_I]], align 16			// CHECK: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX6_I]], align 16
	// CHECK: [[VTBX4_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbx4.v8i8(<8 x i8> %a, <16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <16 x i8> [[TMP3]], <16 x i8> [[TMP4]], <8 x i8> %c) #2			// CHECK: [[VTBX4_I:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbx4.v8i8(<8 x i8> %a, <16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <16 x i8> [[TMP3]], <16 x i8> [[TMP4]], <8 x i8> %c) #3
	// CHECK: ret <8 x i8> [[VTBX4_I]]			// CHECK: ret <8 x i8> [[VTBX4_I]]
	poly8x8_t test_vqtbx4_p8(poly8x8_t a, poly8x16x4_t b, uint8x8_t c) {			poly8x8_t test_vqtbx4_p8(poly8x8_t a, poly8x16x4_t b, uint8x8_t c) {
	return vqtbx4_p8(a, b, c);			return vqtbx4_p8(a, b, c);
	}			}

	// CHECK-LABEL: define <16 x i8> @test_vqtbx1q_p8(<16 x i8> %a, <16 x i8> %b, <16 x i8> %c) #0 {			// CHECK-LABEL: define <16 x i8> @test_vqtbx1q_p8(<16 x i8> %a, <16 x i8> %b, <16 x i8> %c) #1 {
	// CHECK: [[VTBX1_I:%.*]] = call <16 x i8> @llvm.aarch64.neon.tbx1.v16i8(<16 x i8> %a, <16 x i8> %b, <16 x i8> %c) #2			// CHECK: [[VTBX1_I:%.*]] = call <16 x i8> @llvm.aarch64.neon.tbx1.v16i8(<16 x i8> %a, <16 x i8> %b, <16 x i8> %c) #3
	// CHECK: ret <16 x i8> [[VTBX1_I]]			// CHECK: ret <16 x i8> [[VTBX1_I]]
	poly8x16_t test_vqtbx1q_p8(poly8x16_t a, uint8x16_t b, uint8x16_t c) {			poly8x16_t test_vqtbx1q_p8(poly8x16_t a, uint8x16_t b, uint8x16_t c) {
	return vqtbx1q_p8(a, b, c);			return vqtbx1q_p8(a, b, c);
	}			}

	// CHECK-LABEL: define <16 x i8> @test_vqtbx2q_p8(<16 x i8> %a, [2 x <16 x i8>] %b.coerce, <16 x i8> %c) #0 {			// CHECK-LABEL: define <16 x i8> @test_vqtbx2q_p8(<16 x i8> %a, [2 x <16 x i8>] %b.coerce, <16 x i8> %c) #1 {
	// CHECK: [[__P1_I:%.*]] = alloca %struct.poly8x16x2_t, align 16			// CHECK: [[__P1_I:%.*]] = alloca %struct.poly8x16x2_t, align 16
	// CHECK: [[B:%.*]] = alloca %struct.poly8x16x2_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.poly8x16x2_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly8x16x2_t, %struct.poly8x16x2_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly8x16x2_t, %struct.poly8x16x2_t* [[B]], i32 0, i32 0
	// CHECK: store [2 x <16 x i8>] [[B]].coerce, [2 x <16 x i8>]* [[COERCE_DIVE]], align 16			// CHECK: store [2 x <16 x i8>] [[B]].coerce, [2 x <16 x i8>]* [[COERCE_DIVE]], align 16
	// CHECK: [[COERCE_DIVE1:%.]] = getelementptr inbounds %struct.poly8x16x2_t, %struct.poly8x16x2_t [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE1:%.]] = getelementptr inbounds %struct.poly8x16x2_t, %struct.poly8x16x2_t [[B]], i32 0, i32 0
	// CHECK: [[TMP0:%.]] = load [2 x <16 x i8>], [2 x <16 x i8>] [[COERCE_DIVE1]], align 16			// CHECK: [[TMP0:%.]] = load [2 x <16 x i8>], [2 x <16 x i8>] [[COERCE_DIVE1]], align 16
	// CHECK: [[COERCE_DIVE_I:%.]] = getelementptr inbounds %struct.poly8x16x2_t, %struct.poly8x16x2_t [[__P1_I]], i32 0, i32 0			// CHECK: [[COERCE_DIVE_I:%.]] = getelementptr inbounds %struct.poly8x16x2_t, %struct.poly8x16x2_t [[__P1_I]], i32 0, i32 0
	// CHECK: store [2 x <16 x i8>] [[TMP0]], [2 x <16 x i8>]* [[COERCE_DIVE_I]], align 16			// CHECK: store [2 x <16 x i8>] [[TMP0]], [2 x <16 x i8>]* [[COERCE_DIVE_I]], align 16
	// CHECK: [[VAL_I:%.]] = getelementptr inbounds %struct.poly8x16x2_t, %struct.poly8x16x2_t [[__P1_I]], i32 0, i32 0			// CHECK: [[VAL_I:%.]] = getelementptr inbounds %struct.poly8x16x2_t, %struct.poly8x16x2_t [[__P1_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX_I:%.]] = getelementptr inbounds [2 x <16 x i8>], [2 x <16 x i8>] [[VAL_I]], i64 0, i64 0			// CHECK: [[ARRAYIDX_I:%.]] = getelementptr inbounds [2 x <16 x i8>], [2 x <16 x i8>] [[VAL_I]], i64 0, i64 0
	// CHECK: [[TMP1:%.]] = load <16 x i8>, <16 x i8> [[ARRAYIDX_I]], align 16			// CHECK: [[TMP1:%.]] = load <16 x i8>, <16 x i8> [[ARRAYIDX_I]], align 16
	// CHECK: [[VAL1_I:%.]] = getelementptr inbounds %struct.poly8x16x2_t, %struct.poly8x16x2_t [[__P1_I]], i32 0, i32 0			// CHECK: [[VAL1_I:%.]] = getelementptr inbounds %struct.poly8x16x2_t, %struct.poly8x16x2_t [[__P1_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2_I:%.]] = getelementptr inbounds [2 x <16 x i8>], [2 x <16 x i8>] [[VAL1_I]], i64 0, i64 1			// CHECK: [[ARRAYIDX2_I:%.]] = getelementptr inbounds [2 x <16 x i8>], [2 x <16 x i8>] [[VAL1_I]], i64 0, i64 1
	// CHECK: [[TMP2:%.]] = load <16 x i8>, <16 x i8> [[ARRAYIDX2_I]], align 16			// CHECK: [[TMP2:%.]] = load <16 x i8>, <16 x i8> [[ARRAYIDX2_I]], align 16
	// CHECK: [[VTBX2_I:%.*]] = call <16 x i8> @llvm.aarch64.neon.tbx2.v16i8(<16 x i8> %a, <16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <16 x i8> %c) #2			// CHECK: [[VTBX2_I:%.*]] = call <16 x i8> @llvm.aarch64.neon.tbx2.v16i8(<16 x i8> %a, <16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <16 x i8> %c) #3
	// CHECK: ret <16 x i8> [[VTBX2_I]]			// CHECK: ret <16 x i8> [[VTBX2_I]]
	poly8x16_t test_vqtbx2q_p8(poly8x16_t a, poly8x16x2_t b, uint8x16_t c) {			poly8x16_t test_vqtbx2q_p8(poly8x16_t a, poly8x16x2_t b, uint8x16_t c) {
	return vqtbx2q_p8(a, b, c);			return vqtbx2q_p8(a, b, c);
	}			}

	// CHECK-LABEL: define <16 x i8> @test_vqtbx3q_p8(<16 x i8> %a, [3 x <16 x i8>] %b.coerce, <16 x i8> %c) #0 {			// CHECK-LABEL: define <16 x i8> @test_vqtbx3q_p8(<16 x i8> %a, [3 x <16 x i8>] %b.coerce, <16 x i8> %c) #1 {
	// CHECK: [[__P1_I:%.*]] = alloca %struct.poly8x16x3_t, align 16			// CHECK: [[__P1_I:%.*]] = alloca %struct.poly8x16x3_t, align 16
	// CHECK: [[B:%.*]] = alloca %struct.poly8x16x3_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.poly8x16x3_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly8x16x3_t, %struct.poly8x16x3_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly8x16x3_t, %struct.poly8x16x3_t* [[B]], i32 0, i32 0
	// CHECK: store [3 x <16 x i8>] [[B]].coerce, [3 x <16 x i8>]* [[COERCE_DIVE]], align 16			// CHECK: store [3 x <16 x i8>] [[B]].coerce, [3 x <16 x i8>]* [[COERCE_DIVE]], align 16
	// CHECK: [[COERCE_DIVE1:%.*]] = getelementptr inbounds %struct.poly8x16x3_t, %struct.poly8x16x3_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE1:%.*]] = getelementptr inbounds %struct.poly8x16x3_t, %struct.poly8x16x3_t* [[B]], i32 0, i32 0
	// CHECK: [[TMP0:%.*]] = load [3 x <16 x i8>], [3 x <16 x i8>]* [[COERCE_DIVE1]], align 16			// CHECK: [[TMP0:%.*]] = load [3 x <16 x i8>], [3 x <16 x i8>]* [[COERCE_DIVE1]], align 16
	// CHECK: [[COERCE_DIVE_I:%.*]] = getelementptr inbounds %struct.poly8x16x3_t, %struct.poly8x16x3_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[COERCE_DIVE_I:%.*]] = getelementptr inbounds %struct.poly8x16x3_t, %struct.poly8x16x3_t* [[__P1_I]], i32 0, i32 0
	// CHECK: store [3 x <16 x i8>] [[TMP0]], [3 x <16 x i8>]* [[COERCE_DIVE_I]], align 16			// CHECK: store [3 x <16 x i8>] [[TMP0]], [3 x <16 x i8>]* [[COERCE_DIVE_I]], align 16
	// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.poly8x16x3_t, %struct.poly8x16x3_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.poly8x16x3_t, %struct.poly8x16x3_t* [[__P1_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL_I]], i64 0, i64 0			// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL_I]], i64 0, i64 0
	// CHECK: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX_I]], align 16			// CHECK: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX_I]], align 16
	// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.poly8x16x3_t, %struct.poly8x16x3_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.poly8x16x3_t, %struct.poly8x16x3_t* [[__P1_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1			// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1
	// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16			// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16
	// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.poly8x16x3_t, %struct.poly8x16x3_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.poly8x16x3_t, %struct.poly8x16x3_t* [[__P1_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL3_I]], i64 0, i64 2			// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [3 x <16 x i8>], [3 x <16 x i8>]* [[VAL3_I]], i64 0, i64 2
	// CHECK: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX4_I]], align 16			// CHECK: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX4_I]], align 16
	// CHECK: [[VTBX3_I:%.*]] = call <16 x i8> @llvm.aarch64.neon.tbx3.v16i8(<16 x i8> %a, <16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <16 x i8> [[TMP3]], <16 x i8> %c) #2			// CHECK: [[VTBX3_I:%.*]] = call <16 x i8> @llvm.aarch64.neon.tbx3.v16i8(<16 x i8> %a, <16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <16 x i8> [[TMP3]], <16 x i8> %c) #3
	// CHECK: ret <16 x i8> [[VTBX3_I]]			// CHECK: ret <16 x i8> [[VTBX3_I]]
	poly8x16_t test_vqtbx3q_p8(poly8x16_t a, poly8x16x3_t b, uint8x16_t c) {			poly8x16_t test_vqtbx3q_p8(poly8x16_t a, poly8x16x3_t b, uint8x16_t c) {
	return vqtbx3q_p8(a, b, c);			return vqtbx3q_p8(a, b, c);
	}			}

	// CHECK-LABEL: define <16 x i8> @test_vqtbx4q_p8(<16 x i8> %a, [4 x <16 x i8>] %b.coerce, <16 x i8> %c) #0 {			// CHECK-LABEL: define <16 x i8> @test_vqtbx4q_p8(<16 x i8> %a, [4 x <16 x i8>] %b.coerce, <16 x i8> %c) #1 {
	// CHECK: [[__P1_I:%.*]] = alloca %struct.poly8x16x4_t, align 16			// CHECK: [[__P1_I:%.*]] = alloca %struct.poly8x16x4_t, align 16
	// CHECK: [[B:%.*]] = alloca %struct.poly8x16x4_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.poly8x16x4_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly8x16x4_t, %struct.poly8x16x4_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly8x16x4_t, %struct.poly8x16x4_t* [[B]], i32 0, i32 0
	// CHECK: store [4 x <16 x i8>] [[B]].coerce, [4 x <16 x i8>]* [[COERCE_DIVE]], align 16			// CHECK: store [4 x <16 x i8>] [[B]].coerce, [4 x <16 x i8>]* [[COERCE_DIVE]], align 16
	// CHECK: [[COERCE_DIVE1:%.*]] = getelementptr inbounds %struct.poly8x16x4_t, %struct.poly8x16x4_t* [[B]], i32 0, i32 0			// CHECK: [[COERCE_DIVE1:%.*]] = getelementptr inbounds %struct.poly8x16x4_t, %struct.poly8x16x4_t* [[B]], i32 0, i32 0
	// CHECK: [[TMP0:%.*]] = load [4 x <16 x i8>], [4 x <16 x i8>]* [[COERCE_DIVE1]], align 16			// CHECK: [[TMP0:%.*]] = load [4 x <16 x i8>], [4 x <16 x i8>]* [[COERCE_DIVE1]], align 16
	// CHECK: [[COERCE_DIVE_I:%.*]] = getelementptr inbounds %struct.poly8x16x4_t, %struct.poly8x16x4_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[COERCE_DIVE_I:%.*]] = getelementptr inbounds %struct.poly8x16x4_t, %struct.poly8x16x4_t* [[__P1_I]], i32 0, i32 0
	// CHECK: store [4 x <16 x i8>] [[TMP0]], [4 x <16 x i8>]* [[COERCE_DIVE_I]], align 16			// CHECK: store [4 x <16 x i8>] [[TMP0]], [4 x <16 x i8>]* [[COERCE_DIVE_I]], align 16
	// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.poly8x16x4_t, %struct.poly8x16x4_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[VAL_I:%.*]] = getelementptr inbounds %struct.poly8x16x4_t, %struct.poly8x16x4_t* [[__P1_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL_I]], i64 0, i64 0			// CHECK: [[ARRAYIDX_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL_I]], i64 0, i64 0
	// CHECK: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX_I]], align 16			// CHECK: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX_I]], align 16
	// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.poly8x16x4_t, %struct.poly8x16x4_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[VAL1_I:%.*]] = getelementptr inbounds %struct.poly8x16x4_t, %struct.poly8x16x4_t* [[__P1_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1			// CHECK: [[ARRAYIDX2_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL1_I]], i64 0, i64 1
	// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16			// CHECK: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX2_I]], align 16
	// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.poly8x16x4_t, %struct.poly8x16x4_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[VAL3_I:%.*]] = getelementptr inbounds %struct.poly8x16x4_t, %struct.poly8x16x4_t* [[__P1_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL3_I]], i64 0, i64 2			// CHECK: [[ARRAYIDX4_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL3_I]], i64 0, i64 2
	// CHECK: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX4_I]], align 16			// CHECK: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX4_I]], align 16
	// CHECK: [[VAL5_I:%.*]] = getelementptr inbounds %struct.poly8x16x4_t, %struct.poly8x16x4_t* [[__P1_I]], i32 0, i32 0			// CHECK: [[VAL5_I:%.*]] = getelementptr inbounds %struct.poly8x16x4_t, %struct.poly8x16x4_t* [[__P1_I]], i32 0, i32 0
	// CHECK: [[ARRAYIDX6_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL5_I]], i64 0, i64 3			// CHECK: [[ARRAYIDX6_I:%.*]] = getelementptr inbounds [4 x <16 x i8>], [4 x <16 x i8>]* [[VAL5_I]], i64 0, i64 3
	// CHECK: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX6_I]], align 16			// CHECK: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* [[ARRAYIDX6_I]], align 16
	// CHECK: [[VTBX4_I:%.*]] = call <16 x i8> @llvm.aarch64.neon.tbx4.v16i8(<16 x i8> %a, <16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <16 x i8> [[TMP3]], <16 x i8> [[TMP4]], <16 x i8> %c) #2			// CHECK: [[VTBX4_I:%.*]] = call <16 x i8> @llvm.aarch64.neon.tbx4.v16i8(<16 x i8> %a, <16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <16 x i8> [[TMP3]], <16 x i8> [[TMP4]], <16 x i8> %c) #3
	// CHECK: ret <16 x i8> [[VTBX4_I]]			// CHECK: ret <16 x i8> [[VTBX4_I]]
	poly8x16_t test_vqtbx4q_p8(poly8x16_t a, poly8x16x4_t b, uint8x16_t c) {			poly8x16_t test_vqtbx4q_p8(poly8x16_t a, poly8x16x4_t b, uint8x16_t c) {
	return vqtbx4q_p8(a, b, c);			return vqtbx4q_p8(a, b, c);
	}			}

				// CHECK: attributes #0 ={{.*}}"min-legal-vector-width"="64"
				// CHECK: attributes #1 ={{.*}}"min-legal-vector-width"="128"
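
The two attribute checks above follow from the rule in the patch summary: a function's `min-legal-vector-width` must be at least as wide as the widest vector among its argument and return types. A minimal sketch of that rule, not the code in CGCall.cpp: the helper name `minLegalVectorWidth` and the plain-integer type model (bit width of a vector type, 0 for a non-vector type) are assumptions made for illustration, and the real patch also accounts for the types of called functions.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper, not part of Clang. Each type is modelled only by its
// vector width in bits, with 0 standing for any non-vector type.
unsigned minLegalVectorWidth(unsigned RetBits,
                             const std::vector<unsigned> &ArgBits) {
  // Start from the return type and widen to the largest argument type.
  unsigned Width = RetBits;
  for (unsigned Bits : ArgBits)
    Width = std::max(Width, Bits);
  return Width;
}
```

Under this model, a function shaped like `test_vqtbx1q_p8` (a `<16 x i8>` return and three `<16 x i8>` arguments) gives `minLegalVectorWidth(128, {128, 128, 128})`, which is 128 and corresponds to attribute group `#1`. A function that only touches 64-bit vectors such as `<8 x i8>` would give 64, corresponding to `#0`.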

test/CodeGen/aarch64-neon-vget.c

	[... 91 lines not shown ...]
	// CHECK: [[TMP4:%.*]] = bitcast i16* [[__REINT1_242]] to half*			// CHECK: [[TMP4:%.*]] = bitcast i16* [[__REINT1_242]] to half*
	// CHECK: [[TMP5:%.*]] = load half, half* [[TMP4]], align 2			// CHECK: [[TMP5:%.*]] = load half, half* [[TMP4]], align 2
	// CHECK: [[CONV:%.*]] = fpext half [[TMP5]] to float			// CHECK: [[CONV:%.*]] = fpext half [[TMP5]] to float
	// CHECK: ret float [[CONV]]			// CHECK: ret float [[CONV]]
	float32_t test_vget_lane_f16(float16x4_t a) {			float32_t test_vget_lane_f16(float16x4_t a) {
	return vget_lane_f16(a, 1);			return vget_lane_f16(a, 1);
	}			}

	// CHECK-LABEL: define i8 @test_vgetq_lane_u8(<16 x i8> %a) #0 {			// CHECK-LABEL: define i8 @test_vgetq_lane_u8(<16 x i8> %a) #1 {
	// CHECK: [[VGETQ_LANE:%.*]] = extractelement <16 x i8> %a, i32 15			// CHECK: [[VGETQ_LANE:%.*]] = extractelement <16 x i8> %a, i32 15
	// CHECK: ret i8 [[VGETQ_LANE]]			// CHECK: ret i8 [[VGETQ_LANE]]
	uint8_t test_vgetq_lane_u8(uint8x16_t a) {			uint8_t test_vgetq_lane_u8(uint8x16_t a) {
	return vgetq_lane_u8(a, 15);			return vgetq_lane_u8(a, 15);
	}			}

	// CHECK-LABEL: define i16 @test_vgetq_lane_u16(<8 x i16> %a) #0 {			// CHECK-LABEL: define i16 @test_vgetq_lane_u16(<8 x i16> %a) #1 {
	// CHECK: [[TMP0:%.*]] = bitcast <8 x i16> %a to <16 x i8>			// CHECK: [[TMP0:%.*]] = bitcast <8 x i16> %a to <16 x i8>
	// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <8 x i16>			// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <8 x i16>
	// CHECK: [[VGETQ_LANE:%.*]] = extractelement <8 x i16> [[TMP1]], i32 7			// CHECK: [[VGETQ_LANE:%.*]] = extractelement <8 x i16> [[TMP1]], i32 7
	// CHECK: ret i16 [[VGETQ_LANE]]			// CHECK: ret i16 [[VGETQ_LANE]]
	uint16_t test_vgetq_lane_u16(uint16x8_t a) {			uint16_t test_vgetq_lane_u16(uint16x8_t a) {
	return vgetq_lane_u16(a, 7);			return vgetq_lane_u16(a, 7);
	}			}

	// CHECK-LABEL: define i32 @test_vgetq_lane_u32(<4 x i32> %a) #0 {			// CHECK-LABEL: define i32 @test_vgetq_lane_u32(<4 x i32> %a) #1 {
	// CHECK: [[TMP0:%.*]] = bitcast <4 x i32> %a to <16 x i8>			// CHECK: [[TMP0:%.*]] = bitcast <4 x i32> %a to <16 x i8>
	// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <4 x i32>			// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <4 x i32>
	// CHECK: [[VGETQ_LANE:%.*]] = extractelement <4 x i32> [[TMP1]], i32 3			// CHECK: [[VGETQ_LANE:%.*]] = extractelement <4 x i32> [[TMP1]], i32 3
	// CHECK: ret i32 [[VGETQ_LANE]]			// CHECK: ret i32 [[VGETQ_LANE]]
	uint32_t test_vgetq_lane_u32(uint32x4_t a) {			uint32_t test_vgetq_lane_u32(uint32x4_t a) {
	return vgetq_lane_u32(a, 3);			return vgetq_lane_u32(a, 3);
	}			}

	// CHECK-LABEL: define i8 @test_vgetq_lane_s8(<16 x i8> %a) #0 {			// CHECK-LABEL: define i8 @test_vgetq_lane_s8(<16 x i8> %a) #1 {
	// CHECK: [[VGETQ_LANE:%.*]] = extractelement <16 x i8> %a, i32 15			// CHECK: [[VGETQ_LANE:%.*]] = extractelement <16 x i8> %a, i32 15
	// CHECK: ret i8 [[VGETQ_LANE]]			// CHECK: ret i8 [[VGETQ_LANE]]
	int8_t test_vgetq_lane_s8(int8x16_t a) {			int8_t test_vgetq_lane_s8(int8x16_t a) {
	return vgetq_lane_s8(a, 15);			return vgetq_lane_s8(a, 15);
	}			}

	// CHECK-LABEL: define i16 @test_vgetq_lane_s16(<8 x i16> %a) #0 {			// CHECK-LABEL: define i16 @test_vgetq_lane_s16(<8 x i16> %a) #1 {
	// CHECK: [[TMP0:%.*]] = bitcast <8 x i16> %a to <16 x i8>			// CHECK: [[TMP0:%.*]] = bitcast <8 x i16> %a to <16 x i8>
	// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <8 x i16>			// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <8 x i16>
	// CHECK: [[VGETQ_LANE:%.*]] = extractelement <8 x i16> [[TMP1]], i32 7			// CHECK: [[VGETQ_LANE:%.*]] = extractelement <8 x i16> [[TMP1]], i32 7
	// CHECK: ret i16 [[VGETQ_LANE]]			// CHECK: ret i16 [[VGETQ_LANE]]
	int16_t test_vgetq_lane_s16(int16x8_t a) {			int16_t test_vgetq_lane_s16(int16x8_t a) {
	return vgetq_lane_s16(a, 7);			return vgetq_lane_s16(a, 7);
	}			}

	// CHECK-LABEL: define i32 @test_vgetq_lane_s32(<4 x i32> %a) #0 {			// CHECK-LABEL: define i32 @test_vgetq_lane_s32(<4 x i32> %a) #1 {
	// CHECK: [[TMP0:%.*]] = bitcast <4 x i32> %a to <16 x i8>			// CHECK: [[TMP0:%.*]] = bitcast <4 x i32> %a to <16 x i8>
	// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <4 x i32>			// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <4 x i32>
	// CHECK: [[VGETQ_LANE:%.*]] = extractelement <4 x i32> [[TMP1]], i32 3			// CHECK: [[VGETQ_LANE:%.*]] = extractelement <4 x i32> [[TMP1]], i32 3
	// CHECK: ret i32 [[VGETQ_LANE]]			// CHECK: ret i32 [[VGETQ_LANE]]
	int32_t test_vgetq_lane_s32(int32x4_t a) {			int32_t test_vgetq_lane_s32(int32x4_t a) {
	return vgetq_lane_s32(a, 3);			return vgetq_lane_s32(a, 3);
	}			}

	// CHECK-LABEL: define i8 @test_vgetq_lane_p8(<16 x i8> %a) #0 {			// CHECK-LABEL: define i8 @test_vgetq_lane_p8(<16 x i8> %a) #1 {
	// CHECK: [[VGETQ_LANE:%.*]] = extractelement <16 x i8> %a, i32 15			// CHECK: [[VGETQ_LANE:%.*]] = extractelement <16 x i8> %a, i32 15
	// CHECK: ret i8 [[VGETQ_LANE]]			// CHECK: ret i8 [[VGETQ_LANE]]
	poly8_t test_vgetq_lane_p8(poly8x16_t a) {			poly8_t test_vgetq_lane_p8(poly8x16_t a) {
	return vgetq_lane_p8(a, 15);			return vgetq_lane_p8(a, 15);
	}			}

	// CHECK-LABEL: define i16 @test_vgetq_lane_p16(<8 x i16> %a) #0 {			// CHECK-LABEL: define i16 @test_vgetq_lane_p16(<8 x i16> %a) #1 {
	// CHECK: [[TMP0:%.*]] = bitcast <8 x i16> %a to <16 x i8>			// CHECK: [[TMP0:%.*]] = bitcast <8 x i16> %a to <16 x i8>
	// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <8 x i16>			// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <8 x i16>
	// CHECK: [[VGETQ_LANE:%.*]] = extractelement <8 x i16> [[TMP1]], i32 7			// CHECK: [[VGETQ_LANE:%.*]] = extractelement <8 x i16> [[TMP1]], i32 7
	// CHECK: ret i16 [[VGETQ_LANE]]			// CHECK: ret i16 [[VGETQ_LANE]]
	poly16_t test_vgetq_lane_p16(poly16x8_t a) {			poly16_t test_vgetq_lane_p16(poly16x8_t a) {
	return vgetq_lane_p16(a, 7);			return vgetq_lane_p16(a, 7);
	}			}

	// CHECK-LABEL: define float @test_vgetq_lane_f32(<4 x float> %a) #0 {			// CHECK-LABEL: define float @test_vgetq_lane_f32(<4 x float> %a) #1 {
	// CHECK: [[TMP0:%.*]] = bitcast <4 x float> %a to <16 x i8>			// CHECK: [[TMP0:%.*]] = bitcast <4 x float> %a to <16 x i8>
	// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <4 x float>			// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <4 x float>
	// CHECK: [[VGETQ_LANE:%.*]] = extractelement <4 x float> [[TMP1]], i32 3			// CHECK: [[VGETQ_LANE:%.*]] = extractelement <4 x float> [[TMP1]], i32 3
	// CHECK: ret float [[VGETQ_LANE]]			// CHECK: ret float [[VGETQ_LANE]]
	float32_t test_vgetq_lane_f32(float32x4_t a) {			float32_t test_vgetq_lane_f32(float32x4_t a) {
	return vgetq_lane_f32(a, 3);			return vgetq_lane_f32(a, 3);
	}			}

	// CHECK-LABEL: define float @test_vgetq_lane_f16(<8 x half> %a) #0 {			// CHECK-LABEL: define float @test_vgetq_lane_f16(<8 x half> %a) #1 {
	// CHECK: [[__REINT_244:%.*]] = alloca <8 x half>, align 16			// CHECK: [[__REINT_244:%.*]] = alloca <8 x half>, align 16
	// CHECK: [[__REINT1_244:%.*]] = alloca i16, align 2			// CHECK: [[__REINT1_244:%.*]] = alloca i16, align 2
	// CHECK: store <8 x half> %a, <8 x half>* [[__REINT_244]], align 16			// CHECK: store <8 x half> %a, <8 x half>* [[__REINT_244]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast <8 x half>* [[__REINT_244]] to <8 x i16>*			// CHECK: [[TMP0:%.*]] = bitcast <8 x half>* [[__REINT_244]] to <8 x i16>*
	// CHECK: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* [[TMP0]], align 16			// CHECK: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* [[TMP0]], align 16
	// CHECK: [[TMP2:%.*]] = bitcast <8 x i16> [[TMP1]] to <16 x i8>			// CHECK: [[TMP2:%.*]] = bitcast <8 x i16> [[TMP1]] to <16 x i8>
	// CHECK: [[TMP3:%.*]] = bitcast <16 x i8> [[TMP2]] to <8 x i16>			// CHECK: [[TMP3:%.*]] = bitcast <16 x i8> [[TMP2]] to <8 x i16>
	// CHECK: [[VGETQ_LANE:%.*]] = extractelement <8 x i16> [[TMP3]], i32 3			// CHECK: [[VGETQ_LANE:%.*]] = extractelement <8 x i16> [[TMP3]], i32 3
	[... 19 lines not shown ...]
	// CHECK: [[TMP0:%.*]] = bitcast <1 x i64> %a to <8 x i8>			// CHECK: [[TMP0:%.*]] = bitcast <1 x i64> %a to <8 x i8>
	// CHECK: [[TMP1:%.*]] = bitcast <8 x i8> [[TMP0]] to <1 x i64>			// CHECK: [[TMP1:%.*]] = bitcast <8 x i8> [[TMP0]] to <1 x i64>
	// CHECK: [[VGET_LANE:%.*]] = extractelement <1 x i64> [[TMP1]], i32 0			// CHECK: [[VGET_LANE:%.*]] = extractelement <1 x i64> [[TMP1]], i32 0
	// CHECK: ret i64 [[VGET_LANE]]			// CHECK: ret i64 [[VGET_LANE]]
	uint64_t test_vget_lane_u64(uint64x1_t a) {			uint64_t test_vget_lane_u64(uint64x1_t a) {
	return vget_lane_u64(a, 0);			return vget_lane_u64(a, 0);
	}			}

	// CHECK-LABEL: define i64 @test_vgetq_lane_s64(<2 x i64> %a) #0 {			// CHECK-LABEL: define i64 @test_vgetq_lane_s64(<2 x i64> %a) #1 {
	// CHECK: [[TMP0:%.*]] = bitcast <2 x i64> %a to <16 x i8>			// CHECK: [[TMP0:%.*]] = bitcast <2 x i64> %a to <16 x i8>
	// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <2 x i64>			// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <2 x i64>
	// CHECK: [[VGETQ_LANE:%.*]] = extractelement <2 x i64> [[TMP1]], i32 1			// CHECK: [[VGETQ_LANE:%.*]] = extractelement <2 x i64> [[TMP1]], i32 1
	// CHECK: ret i64 [[VGETQ_LANE]]			// CHECK: ret i64 [[VGETQ_LANE]]
	int64_t test_vgetq_lane_s64(int64x2_t a) {			int64_t test_vgetq_lane_s64(int64x2_t a) {
	return vgetq_lane_s64(a, 1);			return vgetq_lane_s64(a, 1);
	}			}

	// CHECK-LABEL: define i64 @test_vgetq_lane_u64(<2 x i64> %a) #0 {			// CHECK-LABEL: define i64 @test_vgetq_lane_u64(<2 x i64> %a) #1 {
	// CHECK: [[TMP0:%.*]] = bitcast <2 x i64> %a to <16 x i8>			// CHECK: [[TMP0:%.*]] = bitcast <2 x i64> %a to <16 x i8>
	// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <2 x i64>			// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <2 x i64>
	// CHECK: [[VGETQ_LANE:%.*]] = extractelement <2 x i64> [[TMP1]], i32 1			// CHECK: [[VGETQ_LANE:%.*]] = extractelement <2 x i64> [[TMP1]], i32 1
	// CHECK: ret i64 [[VGETQ_LANE]]			// CHECK: ret i64 [[VGETQ_LANE]]
	uint64_t test_vgetq_lane_u64(uint64x2_t a) {			uint64_t test_vgetq_lane_u64(uint64x2_t a) {
	return vgetq_lane_u64(a, 1);			return vgetq_lane_u64(a, 1);
	}			}

	[... 90 lines not shown ...]
	// CHECK: store <4 x i16> [[VSET_LANE]], <4 x i16>* [[__REINT2_246]], align 8			// CHECK: store <4 x i16> [[VSET_LANE]], <4 x i16>* [[__REINT2_246]], align 8
	// CHECK: [[TMP7:%.*]] = bitcast <4 x i16>* [[__REINT2_246]] to <4 x half>*			// CHECK: [[TMP7:%.*]] = bitcast <4 x i16>* [[__REINT2_246]] to <4 x half>*
	// CHECK: [[TMP8:%.*]] = load <4 x half>, <4 x half>* [[TMP7]], align 8			// CHECK: [[TMP8:%.*]] = load <4 x half>, <4 x half>* [[TMP7]], align 8
	// CHECK: ret <4 x half> [[TMP8]]			// CHECK: ret <4 x half> [[TMP8]]
	float16x4_t test_vset_lane_f16(float16_t *a, float16x4_t b) {			float16x4_t test_vset_lane_f16(float16_t *a, float16x4_t b) {
	return vset_lane_f16(*a, b, 3);			return vset_lane_f16(*a, b, 3);
	}			}

	// CHECK-LABEL: define <16 x i8> @test_vsetq_lane_u8(i8 %a, <16 x i8> %b) #0 {			// CHECK-LABEL: define <16 x i8> @test_vsetq_lane_u8(i8 %a, <16 x i8> %b) #1 {
	// CHECK: [[VSET_LANE:%.*]] = insertelement <16 x i8> %b, i8 %a, i32 15			// CHECK: [[VSET_LANE:%.*]] = insertelement <16 x i8> %b, i8 %a, i32 15
	// CHECK: ret <16 x i8> [[VSET_LANE]]			// CHECK: ret <16 x i8> [[VSET_LANE]]
	uint8x16_t test_vsetq_lane_u8(uint8_t a, uint8x16_t b) {			uint8x16_t test_vsetq_lane_u8(uint8_t a, uint8x16_t b) {
	return vsetq_lane_u8(a, b, 15);			return vsetq_lane_u8(a, b, 15);
	}			}

	// CHECK-LABEL: define <8 x i16> @test_vsetq_lane_u16(i16 %a, <8 x i16> %b) #0 {			// CHECK-LABEL: define <8 x i16> @test_vsetq_lane_u16(i16 %a, <8 x i16> %b) #1 {
	// CHECK: [[TMP0:%.*]] = bitcast <8 x i16> %b to <16 x i8>			// CHECK: [[TMP0:%.*]] = bitcast <8 x i16> %b to <16 x i8>
	// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <8 x i16>			// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <8 x i16>
	// CHECK: [[VSET_LANE:%.*]] = insertelement <8 x i16> [[TMP1]], i16 %a, i32 7			// CHECK: [[VSET_LANE:%.*]] = insertelement <8 x i16> [[TMP1]], i16 %a, i32 7
	// CHECK: ret <8 x i16> [[VSET_LANE]]			// CHECK: ret <8 x i16> [[VSET_LANE]]
	uint16x8_t test_vsetq_lane_u16(uint16_t a, uint16x8_t b) {			uint16x8_t test_vsetq_lane_u16(uint16_t a, uint16x8_t b) {
	return vsetq_lane_u16(a, b, 7);			return vsetq_lane_u16(a, b, 7);
	}			}

	// CHECK-LABEL: define <4 x i32> @test_vsetq_lane_u32(i32 %a, <4 x i32> %b) #0 {			// CHECK-LABEL: define <4 x i32> @test_vsetq_lane_u32(i32 %a, <4 x i32> %b) #1 {
	// CHECK: [[TMP0:%.*]] = bitcast <4 x i32> %b to <16 x i8>			// CHECK: [[TMP0:%.*]] = bitcast <4 x i32> %b to <16 x i8>
	// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <4 x i32>			// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <4 x i32>
	// CHECK: [[VSET_LANE:%.*]] = insertelement <4 x i32> [[TMP1]], i32 %a, i32 3			// CHECK: [[VSET_LANE:%.*]] = insertelement <4 x i32> [[TMP1]], i32 %a, i32 3
	// CHECK: ret <4 x i32> [[VSET_LANE]]			// CHECK: ret <4 x i32> [[VSET_LANE]]
	uint32x4_t test_vsetq_lane_u32(uint32_t a, uint32x4_t b) {			uint32x4_t test_vsetq_lane_u32(uint32_t a, uint32x4_t b) {
	return vsetq_lane_u32(a, b, 3);			return vsetq_lane_u32(a, b, 3);
	}			}

	// CHECK-LABEL: define <16 x i8> @test_vsetq_lane_s8(i8 %a, <16 x i8> %b) #0 {			// CHECK-LABEL: define <16 x i8> @test_vsetq_lane_s8(i8 %a, <16 x i8> %b) #1 {
	// CHECK: [[VSET_LANE:%.*]] = insertelement <16 x i8> %b, i8 %a, i32 15			// CHECK: [[VSET_LANE:%.*]] = insertelement <16 x i8> %b, i8 %a, i32 15
	// CHECK: ret <16 x i8> [[VSET_LANE]]			// CHECK: ret <16 x i8> [[VSET_LANE]]
	int8x16_t test_vsetq_lane_s8(int8_t a, int8x16_t b) {			int8x16_t test_vsetq_lane_s8(int8_t a, int8x16_t b) {
	return vsetq_lane_s8(a, b, 15);			return vsetq_lane_s8(a, b, 15);
	}			}

	// CHECK-LABEL: define <8 x i16> @test_vsetq_lane_s16(i16 %a, <8 x i16> %b) #0 {			// CHECK-LABEL: define <8 x i16> @test_vsetq_lane_s16(i16 %a, <8 x i16> %b) #1 {
	// CHECK: [[TMP0:%.*]] = bitcast <8 x i16> %b to <16 x i8>			// CHECK: [[TMP0:%.*]] = bitcast <8 x i16> %b to <16 x i8>
	// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <8 x i16>			// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <8 x i16>
	// CHECK: [[VSET_LANE:%.*]] = insertelement <8 x i16> [[TMP1]], i16 %a, i32 7			// CHECK: [[VSET_LANE:%.*]] = insertelement <8 x i16> [[TMP1]], i16 %a, i32 7
	// CHECK: ret <8 x i16> [[VSET_LANE]]			// CHECK: ret <8 x i16> [[VSET_LANE]]
	int16x8_t test_vsetq_lane_s16(int16_t a, int16x8_t b) {			int16x8_t test_vsetq_lane_s16(int16_t a, int16x8_t b) {
	return vsetq_lane_s16(a, b, 7);			return vsetq_lane_s16(a, b, 7);
	}			}

	// CHECK-LABEL: define <4 x i32> @test_vsetq_lane_s32(i32 %a, <4 x i32> %b) #0 {			// CHECK-LABEL: define <4 x i32> @test_vsetq_lane_s32(i32 %a, <4 x i32> %b) #1 {
	// CHECK: [[TMP0:%.*]] = bitcast <4 x i32> %b to <16 x i8>			// CHECK: [[TMP0:%.*]] = bitcast <4 x i32> %b to <16 x i8>
	// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <4 x i32>			// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <4 x i32>
	// CHECK: [[VSET_LANE:%.*]] = insertelement <4 x i32> [[TMP1]], i32 %a, i32 3			// CHECK: [[VSET_LANE:%.*]] = insertelement <4 x i32> [[TMP1]], i32 %a, i32 3
	// CHECK: ret <4 x i32> [[VSET_LANE]]			// CHECK: ret <4 x i32> [[VSET_LANE]]
	int32x4_t test_vsetq_lane_s32(int32_t a, int32x4_t b) {			int32x4_t test_vsetq_lane_s32(int32_t a, int32x4_t b) {
	return vsetq_lane_s32(a, b, 3);			return vsetq_lane_s32(a, b, 3);
	}			}

	// CHECK-LABEL: define <16 x i8> @test_vsetq_lane_p8(i8 %a, <16 x i8> %b) #0 {			// CHECK-LABEL: define <16 x i8> @test_vsetq_lane_p8(i8 %a, <16 x i8> %b) #1 {
	// CHECK: [[VSET_LANE:%.*]] = insertelement <16 x i8> %b, i8 %a, i32 15			// CHECK: [[VSET_LANE:%.*]] = insertelement <16 x i8> %b, i8 %a, i32 15
	// CHECK: ret <16 x i8> [[VSET_LANE]]			// CHECK: ret <16 x i8> [[VSET_LANE]]
	poly8x16_t test_vsetq_lane_p8(poly8_t a, poly8x16_t b) {			poly8x16_t test_vsetq_lane_p8(poly8_t a, poly8x16_t b) {
	return vsetq_lane_p8(a, b, 15);			return vsetq_lane_p8(a, b, 15);
	}			}

	// CHECK-LABEL: define <8 x i16> @test_vsetq_lane_p16(i16 %a, <8 x i16> %b) #0 {			// CHECK-LABEL: define <8 x i16> @test_vsetq_lane_p16(i16 %a, <8 x i16> %b) #1 {
	// CHECK: [[TMP0:%.*]] = bitcast <8 x i16> %b to <16 x i8>			// CHECK: [[TMP0:%.*]] = bitcast <8 x i16> %b to <16 x i8>
	// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <8 x i16>			// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <8 x i16>
	// CHECK: [[VSET_LANE:%.*]] = insertelement <8 x i16> [[TMP1]], i16 %a, i32 7			// CHECK: [[VSET_LANE:%.*]] = insertelement <8 x i16> [[TMP1]], i16 %a, i32 7
	// CHECK: ret <8 x i16> [[VSET_LANE]]			// CHECK: ret <8 x i16> [[VSET_LANE]]
	poly16x8_t test_vsetq_lane_p16(poly16_t a, poly16x8_t b) {			poly16x8_t test_vsetq_lane_p16(poly16_t a, poly16x8_t b) {
	return vsetq_lane_p16(a, b, 7);			return vsetq_lane_p16(a, b, 7);
	}			}

	// CHECK-LABEL: define <4 x float> @test_vsetq_lane_f32(float %a, <4 x float> %b) #0 {			// CHECK-LABEL: define <4 x float> @test_vsetq_lane_f32(float %a, <4 x float> %b) #1 {
	// CHECK: [[TMP0:%.*]] = bitcast <4 x float> %b to <16 x i8>			// CHECK: [[TMP0:%.*]] = bitcast <4 x float> %b to <16 x i8>
	// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <4 x float>			// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <4 x float>
	// CHECK: [[VSET_LANE:%.*]] = insertelement <4 x float> [[TMP1]], float %a, i32 3			// CHECK: [[VSET_LANE:%.*]] = insertelement <4 x float> [[TMP1]], float %a, i32 3
	// CHECK: ret <4 x float> [[VSET_LANE]]			// CHECK: ret <4 x float> [[VSET_LANE]]
	float32x4_t test_vsetq_lane_f32(float32_t a, float32x4_t b) {			float32x4_t test_vsetq_lane_f32(float32_t a, float32x4_t b) {
	return vsetq_lane_f32(a, b, 3);			return vsetq_lane_f32(a, b, 3);
	}			}

	// CHECK-LABEL: define <8 x half> @test_vsetq_lane_f16(half* %a, <8 x half> %b) #0 {			// CHECK-LABEL: define <8 x half> @test_vsetq_lane_f16(half* %a, <8 x half> %b) #1 {
	// CHECK: [[__REINT_248:%.*]] = alloca half, align 2			// CHECK: [[__REINT_248:%.*]] = alloca half, align 2
	// CHECK: [[__REINT1_248:%.*]] = alloca <8 x half>, align 16			// CHECK: [[__REINT1_248:%.*]] = alloca <8 x half>, align 16
	// CHECK: [[__REINT2_248:%.*]] = alloca <8 x i16>, align 16			// CHECK: [[__REINT2_248:%.*]] = alloca <8 x i16>, align 16
	// CHECK: [[TMP0:%.*]] = load half, half* %a, align 2			// CHECK: [[TMP0:%.*]] = load half, half* %a, align 2
	// CHECK: store half [[TMP0]], half* [[__REINT_248]], align 2			// CHECK: store half [[TMP0]], half* [[__REINT_248]], align 2
	// CHECK: store <8 x half> %b, <8 x half>* [[__REINT1_248]], align 16			// CHECK: store <8 x half> %b, <8 x half>* [[__REINT1_248]], align 16
	// CHECK: [[TMP1:%.*]] = bitcast half* [[__REINT_248]] to i16*			// CHECK: [[TMP1:%.*]] = bitcast half* [[__REINT_248]] to i16*
	// CHECK: [[TMP2:%.*]] = load i16, i16* [[TMP1]], align 2			// CHECK: [[TMP2:%.*]] = load i16, i16* [[TMP1]], align 2
	[... 23 lines not shown ...]
	// CHECK: [[TMP0:%.*]] = bitcast <1 x i64> %b to <8 x i8>			// CHECK: [[TMP0:%.*]] = bitcast <1 x i64> %b to <8 x i8>
	// CHECK: [[TMP1:%.*]] = bitcast <8 x i8> [[TMP0]] to <1 x i64>			// CHECK: [[TMP1:%.*]] = bitcast <8 x i8> [[TMP0]] to <1 x i64>
	// CHECK: [[VSET_LANE:%.*]] = insertelement <1 x i64> [[TMP1]], i64 %a, i32 0			// CHECK: [[VSET_LANE:%.*]] = insertelement <1 x i64> [[TMP1]], i64 %a, i32 0
	// CHECK: ret <1 x i64> [[VSET_LANE]]			// CHECK: ret <1 x i64> [[VSET_LANE]]
	uint64x1_t test_vset_lane_u64(uint64_t a, uint64x1_t b) {			uint64x1_t test_vset_lane_u64(uint64_t a, uint64x1_t b) {
	return vset_lane_u64(a, b, 0);			return vset_lane_u64(a, b, 0);
	}			}

	// CHECK-LABEL: define <2 x i64> @test_vsetq_lane_s64(i64 %a, <2 x i64> %b) #0 {			// CHECK-LABEL: define <2 x i64> @test_vsetq_lane_s64(i64 %a, <2 x i64> %b) #1 {
	// CHECK: [[TMP0:%.*]] = bitcast <2 x i64> %b to <16 x i8>			// CHECK: [[TMP0:%.*]] = bitcast <2 x i64> %b to <16 x i8>
	// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <2 x i64>			// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <2 x i64>
	// CHECK: [[VSET_LANE:%.*]] = insertelement <2 x i64> [[TMP1]], i64 %a, i32 1			// CHECK: [[VSET_LANE:%.*]] = insertelement <2 x i64> [[TMP1]], i64 %a, i32 1
	// CHECK: ret <2 x i64> [[VSET_LANE]]			// CHECK: ret <2 x i64> [[VSET_LANE]]
	int64x2_t test_vsetq_lane_s64(int64_t a, int64x2_t b) {			int64x2_t test_vsetq_lane_s64(int64_t a, int64x2_t b) {
	return vsetq_lane_s64(a, b, 1);			return vsetq_lane_s64(a, b, 1);
	}			}

	// CHECK-LABEL: define <2 x i64> @test_vsetq_lane_u64(i64 %a, <2 x i64> %b) #0 {			// CHECK-LABEL: define <2 x i64> @test_vsetq_lane_u64(i64 %a, <2 x i64> %b) #1 {
	// CHECK: [[TMP0:%.*]] = bitcast <2 x i64> %b to <16 x i8>			// CHECK: [[TMP0:%.*]] = bitcast <2 x i64> %b to <16 x i8>
	// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <2 x i64>			// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <2 x i64>
	// CHECK: [[VSET_LANE:%.*]] = insertelement <2 x i64> [[TMP1]], i64 %a, i32 1			// CHECK: [[VSET_LANE:%.*]] = insertelement <2 x i64> [[TMP1]], i64 %a, i32 1
	// CHECK: ret <2 x i64> [[VSET_LANE]]			// CHECK: ret <2 x i64> [[VSET_LANE]]
	uint64x2_t test_vsetq_lane_u64(uint64_t a, uint64x2_t b) {			uint64x2_t test_vsetq_lane_u64(uint64_t a, uint64x2_t b) {
	return vsetq_lane_u64(a, b, 1);			return vsetq_lane_u64(a, b, 1);
	}			}

				// CHECK: attributes #0 ={{.*}}"min-legal-vector-width"="64"
				// CHECK: attributes #1 ={{.*}}"min-legal-vector-width"="128"

test/CodeGen/aarch64-poly64.c

	// RUN: %clang_cc1 -triple arm64-none-linux-gnu -target-feature +neon \			// RUN: %clang_cc1 -triple arm64-none-linux-gnu -target-feature +neon \
	// RUN: -ffp-contract=fast -disable-O0-optnone -emit-llvm -o - %s | opt -S -mem2reg \			// RUN: -ffp-contract=fast -disable-O0-optnone -emit-llvm -o - %s | opt -S -mem2reg \
	// RUN: | FileCheck %s			// RUN: | FileCheck %s

	// Test new aarch64 intrinsics with poly64			// Test new aarch64 intrinsics with poly64

	#include <arm_neon.h>			#include <arm_neon.h>

	// CHECK-LABEL: define <1 x i64> @test_vceq_p64(<1 x i64> %a, <1 x i64> %b) #0 {			// CHECK-LABEL: define <1 x i64> @test_vceq_p64(<1 x i64> %a, <1 x i64> %b) #0 {
	// CHECK: [[CMP_I:%.*]] = icmp eq <1 x i64> %a, %b			// CHECK: [[CMP_I:%.*]] = icmp eq <1 x i64> %a, %b
	// CHECK: [[SEXT_I:%.*]] = sext <1 x i1> [[CMP_I]] to <1 x i64>			// CHECK: [[SEXT_I:%.*]] = sext <1 x i1> [[CMP_I]] to <1 x i64>
	// CHECK: ret <1 x i64> [[SEXT_I]]			// CHECK: ret <1 x i64> [[SEXT_I]]
	uint64x1_t test_vceq_p64(poly64x1_t a, poly64x1_t b) {			uint64x1_t test_vceq_p64(poly64x1_t a, poly64x1_t b) {
	return vceq_p64(a, b);			return vceq_p64(a, b);
	}			}

	// CHECK-LABEL: define <2 x i64> @test_vceqq_p64(<2 x i64> %a, <2 x i64> %b) #0 {			// CHECK-LABEL: define <2 x i64> @test_vceqq_p64(<2 x i64> %a, <2 x i64> %b) #1 {
	// CHECK: [[CMP_I:%.*]] = icmp eq <2 x i64> %a, %b			// CHECK: [[CMP_I:%.*]] = icmp eq <2 x i64> %a, %b
	// CHECK: [[SEXT_I:%.*]] = sext <2 x i1> [[CMP_I]] to <2 x i64>			// CHECK: [[SEXT_I:%.*]] = sext <2 x i1> [[CMP_I]] to <2 x i64>
	// CHECK: ret <2 x i64> [[SEXT_I]]			// CHECK: ret <2 x i64> [[SEXT_I]]
	uint64x2_t test_vceqq_p64(poly64x2_t a, poly64x2_t b) {			uint64x2_t test_vceqq_p64(poly64x2_t a, poly64x2_t b) {
	return vceqq_p64(a, b);			return vceqq_p64(a, b);
	}			}

	// CHECK-LABEL: define <1 x i64> @test_vtst_p64(<1 x i64> %a, <1 x i64> %b) #0 {			// CHECK-LABEL: define <1 x i64> @test_vtst_p64(<1 x i64> %a, <1 x i64> %b) #0 {
	// CHECK: [[TMP4:%.*]] = and <1 x i64> %a, %b			// CHECK: [[TMP4:%.*]] = and <1 x i64> %a, %b
	// CHECK: [[TMP5:%.*]] = icmp ne <1 x i64> [[TMP4]], zeroinitializer			// CHECK: [[TMP5:%.*]] = icmp ne <1 x i64> [[TMP4]], zeroinitializer
	// CHECK: [[VTST_I:%.*]] = sext <1 x i1> [[TMP5]] to <1 x i64>			// CHECK: [[VTST_I:%.*]] = sext <1 x i1> [[TMP5]] to <1 x i64>
	// CHECK: ret <1 x i64> [[VTST_I]]			// CHECK: ret <1 x i64> [[VTST_I]]
	uint64x1_t test_vtst_p64(poly64x1_t a, poly64x1_t b) {			uint64x1_t test_vtst_p64(poly64x1_t a, poly64x1_t b) {
	return vtst_p64(a, b);			return vtst_p64(a, b);
	}			}

	// CHECK-LABEL: define <2 x i64> @test_vtstq_p64(<2 x i64> %a, <2 x i64> %b) #0 {			// CHECK-LABEL: define <2 x i64> @test_vtstq_p64(<2 x i64> %a, <2 x i64> %b) #1 {
	// CHECK: [[TMP4:%.*]] = and <2 x i64> %a, %b			// CHECK: [[TMP4:%.*]] = and <2 x i64> %a, %b
	// CHECK: [[TMP5:%.*]] = icmp ne <2 x i64> [[TMP4]], zeroinitializer			// CHECK: [[TMP5:%.*]] = icmp ne <2 x i64> [[TMP4]], zeroinitializer
	// CHECK: [[VTST_I:%.*]] = sext <2 x i1> [[TMP5]] to <2 x i64>			// CHECK: [[VTST_I:%.*]] = sext <2 x i1> [[TMP5]] to <2 x i64>
	// CHECK: ret <2 x i64> [[VTST_I]]			// CHECK: ret <2 x i64> [[VTST_I]]
	uint64x2_t test_vtstq_p64(poly64x2_t a, poly64x2_t b) {			uint64x2_t test_vtstq_p64(poly64x2_t a, poly64x2_t b) {
	return vtstq_p64(a, b);			return vtstq_p64(a, b);
	}			}

	// CHECK-LABEL: define <1 x i64> @test_vbsl_p64(<1 x i64> %a, <1 x i64> %b, <1 x i64> %c) #0 {			// CHECK-LABEL: define <1 x i64> @test_vbsl_p64(<1 x i64> %a, <1 x i64> %b, <1 x i64> %c) #0 {
	// CHECK: [[VBSL3_I:%.*]] = and <1 x i64> %a, %b			// CHECK: [[VBSL3_I:%.*]] = and <1 x i64> %a, %b
	// CHECK: [[TMP3:%.*]] = xor <1 x i64> %a, <i64 -1>			// CHECK: [[TMP3:%.*]] = xor <1 x i64> %a, <i64 -1>
	// CHECK: [[VBSL4_I:%.*]] = and <1 x i64> [[TMP3]], %c			// CHECK: [[VBSL4_I:%.*]] = and <1 x i64> [[TMP3]], %c
	// CHECK: [[VBSL5_I:%.*]] = or <1 x i64> [[VBSL3_I]], [[VBSL4_I]]			// CHECK: [[VBSL5_I:%.*]] = or <1 x i64> [[VBSL3_I]], [[VBSL4_I]]
	// CHECK: ret <1 x i64> [[VBSL5_I]]			// CHECK: ret <1 x i64> [[VBSL5_I]]
	poly64x1_t test_vbsl_p64(poly64x1_t a, poly64x1_t b, poly64x1_t c) {			poly64x1_t test_vbsl_p64(poly64x1_t a, poly64x1_t b, poly64x1_t c) {
	return vbsl_p64(a, b, c);			return vbsl_p64(a, b, c);
	}			}

	// CHECK-LABEL: define <2 x i64> @test_vbslq_p64(<2 x i64> %a, <2 x i64> %b, <2 x i64> %c) #0 {			// CHECK-LABEL: define <2 x i64> @test_vbslq_p64(<2 x i64> %a, <2 x i64> %b, <2 x i64> %c) #1 {
	// CHECK: [[VBSL3_I:%.*]] = and <2 x i64> %a, %b			// CHECK: [[VBSL3_I:%.*]] = and <2 x i64> %a, %b
	// CHECK: [[TMP3:%.*]] = xor <2 x i64> %a, <i64 -1, i64 -1>			// CHECK: [[TMP3:%.*]] = xor <2 x i64> %a, <i64 -1, i64 -1>
	// CHECK: [[VBSL4_I:%.*]] = and <2 x i64> [[TMP3]], %c			// CHECK: [[VBSL4_I:%.*]] = and <2 x i64> [[TMP3]], %c
	// CHECK: [[VBSL5_I:%.*]] = or <2 x i64> [[VBSL3_I]], [[VBSL4_I]]			// CHECK: [[VBSL5_I:%.*]] = or <2 x i64> [[VBSL3_I]], [[VBSL4_I]]
	// CHECK: ret <2 x i64> [[VBSL5_I]]			// CHECK: ret <2 x i64> [[VBSL5_I]]
	poly64x2_t test_vbslq_p64(poly64x2_t a, poly64x2_t b, poly64x2_t c) {			poly64x2_t test_vbslq_p64(poly64x2_t a, poly64x2_t b, poly64x2_t c) {
	return vbslq_p64(a, b, c);			return vbslq_p64(a, b, c);
	}			}

	// CHECK-LABEL: define i64 @test_vget_lane_p64(<1 x i64> %v) #0 {			// CHECK-LABEL: define i64 @test_vget_lane_p64(<1 x i64> %v) #0 {
	// CHECK: [[TMP0:%.*]] = bitcast <1 x i64> %v to <8 x i8>			// CHECK: [[TMP0:%.*]] = bitcast <1 x i64> %v to <8 x i8>
	// CHECK: [[TMP1:%.*]] = bitcast <8 x i8> [[TMP0]] to <1 x i64>			// CHECK: [[TMP1:%.*]] = bitcast <8 x i8> [[TMP0]] to <1 x i64>
	// CHECK: [[VGET_LANE:%.*]] = extractelement <1 x i64> [[TMP1]], i32 0			// CHECK: [[VGET_LANE:%.*]] = extractelement <1 x i64> [[TMP1]], i32 0
	// CHECK: ret i64 [[VGET_LANE]]			// CHECK: ret i64 [[VGET_LANE]]
	poly64_t test_vget_lane_p64(poly64x1_t v) {			poly64_t test_vget_lane_p64(poly64x1_t v) {
	return vget_lane_p64(v, 0);			return vget_lane_p64(v, 0);
	}			}

	// CHECK-LABEL: define i64 @test_vgetq_lane_p64(<2 x i64> %v) #0 {			// CHECK-LABEL: define i64 @test_vgetq_lane_p64(<2 x i64> %v) #1 {
	// CHECK: [[TMP0:%.*]] = bitcast <2 x i64> %v to <16 x i8>			// CHECK: [[TMP0:%.*]] = bitcast <2 x i64> %v to <16 x i8>
	// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <2 x i64>			// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <2 x i64>
	// CHECK: [[VGETQ_LANE:%.*]] = extractelement <2 x i64> [[TMP1]], i32 1			// CHECK: [[VGETQ_LANE:%.*]] = extractelement <2 x i64> [[TMP1]], i32 1
	// CHECK: ret i64 [[VGETQ_LANE]]			// CHECK: ret i64 [[VGETQ_LANE]]
	poly64_t test_vgetq_lane_p64(poly64x2_t v) {			poly64_t test_vgetq_lane_p64(poly64x2_t v) {
	return vgetq_lane_p64(v, 1);			return vgetq_lane_p64(v, 1);
	}			}

	// CHECK-LABEL: define <1 x i64> @test_vset_lane_p64(i64 %a, <1 x i64> %v) #0 {			// CHECK-LABEL: define <1 x i64> @test_vset_lane_p64(i64 %a, <1 x i64> %v) #0 {
	// CHECK: [[TMP0:%.*]] = bitcast <1 x i64> %v to <8 x i8>			// CHECK: [[TMP0:%.*]] = bitcast <1 x i64> %v to <8 x i8>
	// CHECK: [[TMP1:%.*]] = bitcast <8 x i8> [[TMP0]] to <1 x i64>			// CHECK: [[TMP1:%.*]] = bitcast <8 x i8> [[TMP0]] to <1 x i64>
	// CHECK: [[VSET_LANE:%.*]] = insertelement <1 x i64> [[TMP1]], i64 %a, i32 0			// CHECK: [[VSET_LANE:%.*]] = insertelement <1 x i64> [[TMP1]], i64 %a, i32 0
	// CHECK: ret <1 x i64> [[VSET_LANE]]			// CHECK: ret <1 x i64> [[VSET_LANE]]
	poly64x1_t test_vset_lane_p64(poly64_t a, poly64x1_t v) {			poly64x1_t test_vset_lane_p64(poly64_t a, poly64x1_t v) {
	return vset_lane_p64(a, v, 0);			return vset_lane_p64(a, v, 0);
	}			}

	// CHECK-LABEL: define <2 x i64> @test_vsetq_lane_p64(i64 %a, <2 x i64> %v) #0 {			// CHECK-LABEL: define <2 x i64> @test_vsetq_lane_p64(i64 %a, <2 x i64> %v) #1 {
	// CHECK: [[TMP0:%.*]] = bitcast <2 x i64> %v to <16 x i8>			// CHECK: [[TMP0:%.*]] = bitcast <2 x i64> %v to <16 x i8>
	// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <2 x i64>			// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <2 x i64>
	// CHECK: [[VSET_LANE:%.*]] = insertelement <2 x i64> [[TMP1]], i64 %a, i32 1			// CHECK: [[VSET_LANE:%.*]] = insertelement <2 x i64> [[TMP1]], i64 %a, i32 1
	// CHECK: ret <2 x i64> [[VSET_LANE]]			// CHECK: ret <2 x i64> [[VSET_LANE]]
	poly64x2_t test_vsetq_lane_p64(poly64_t a, poly64x2_t v) {			poly64x2_t test_vsetq_lane_p64(poly64_t a, poly64x2_t v) {
	return vsetq_lane_p64(a, v, 1);			return vsetq_lane_p64(a, v, 1);
	}			}

	// CHECK-LABEL: define <1 x i64> @test_vcopy_lane_p64(<1 x i64> %a, <1 x i64> %b) #0 {			// CHECK-LABEL: define <1 x i64> @test_vcopy_lane_p64(<1 x i64> %a, <1 x i64> %b) #0 {
	// CHECK: [[TMP0:%.*]] = bitcast <1 x i64> %b to <8 x i8>			// CHECK: [[TMP0:%.*]] = bitcast <1 x i64> %b to <8 x i8>
	// CHECK: [[TMP1:%.*]] = bitcast <8 x i8> [[TMP0]] to <1 x i64>			// CHECK: [[TMP1:%.*]] = bitcast <8 x i8> [[TMP0]] to <1 x i64>
	// CHECK: [[VGET_LANE:%.*]] = extractelement <1 x i64> [[TMP1]], i32 0			// CHECK: [[VGET_LANE:%.*]] = extractelement <1 x i64> [[TMP1]], i32 0
	// CHECK: [[TMP2:%.*]] = bitcast <1 x i64> %a to <8 x i8>			// CHECK: [[TMP2:%.*]] = bitcast <1 x i64> %a to <8 x i8>
	// CHECK: [[TMP3:%.*]] = bitcast <8 x i8> [[TMP2]] to <1 x i64>			// CHECK: [[TMP3:%.*]] = bitcast <8 x i8> [[TMP2]] to <1 x i64>
	// CHECK: [[VSET_LANE:%.*]] = insertelement <1 x i64> [[TMP3]], i64 [[VGET_LANE]], i32 0			// CHECK: [[VSET_LANE:%.*]] = insertelement <1 x i64> [[TMP3]], i64 [[VGET_LANE]], i32 0
	// CHECK: ret <1 x i64> [[VSET_LANE]]			// CHECK: ret <1 x i64> [[VSET_LANE]]
	poly64x1_t test_vcopy_lane_p64(poly64x1_t a, poly64x1_t b) {			poly64x1_t test_vcopy_lane_p64(poly64x1_t a, poly64x1_t b) {
	return vcopy_lane_p64(a, 0, b, 0);			return vcopy_lane_p64(a, 0, b, 0);

	}			}

	// CHECK-LABEL: define <2 x i64> @test_vcopyq_lane_p64(<2 x i64> %a, <1 x i64> %b) #0 {			// CHECK-LABEL: define <2 x i64> @test_vcopyq_lane_p64(<2 x i64> %a, <1 x i64> %b) #1 {
	// CHECK: [[TMP0:%.*]] = bitcast <1 x i64> %b to <8 x i8>			// CHECK: [[TMP0:%.*]] = bitcast <1 x i64> %b to <8 x i8>
	// CHECK: [[TMP1:%.*]] = bitcast <8 x i8> [[TMP0]] to <1 x i64>			// CHECK: [[TMP1:%.*]] = bitcast <8 x i8> [[TMP0]] to <1 x i64>
	// CHECK: [[VGET_LANE:%.*]] = extractelement <1 x i64> [[TMP1]], i32 0			// CHECK: [[VGET_LANE:%.*]] = extractelement <1 x i64> [[TMP1]], i32 0
	// CHECK: [[TMP2:%.*]] = bitcast <2 x i64> %a to <16 x i8>			// CHECK: [[TMP2:%.*]] = bitcast <2 x i64> %a to <16 x i8>
	// CHECK: [[TMP3:%.*]] = bitcast <16 x i8> [[TMP2]] to <2 x i64>			// CHECK: [[TMP3:%.*]] = bitcast <16 x i8> [[TMP2]] to <2 x i64>
	// CHECK: [[VSET_LANE:%.*]] = insertelement <2 x i64> [[TMP3]], i64 [[VGET_LANE]], i32 1			// CHECK: [[VSET_LANE:%.*]] = insertelement <2 x i64> [[TMP3]], i64 [[VGET_LANE]], i32 1
	// CHECK: ret <2 x i64> [[VSET_LANE]]			// CHECK: ret <2 x i64> [[VSET_LANE]]
	poly64x2_t test_vcopyq_lane_p64(poly64x2_t a, poly64x1_t b) {			poly64x2_t test_vcopyq_lane_p64(poly64x2_t a, poly64x1_t b) {
	return vcopyq_lane_p64(a, 1, b, 0);			return vcopyq_lane_p64(a, 1, b, 0);
	}			}

	// CHECK-LABEL: define <2 x i64> @test_vcopyq_laneq_p64(<2 x i64> %a, <2 x i64> %b) #0 {			// CHECK-LABEL: define <2 x i64> @test_vcopyq_laneq_p64(<2 x i64> %a, <2 x i64> %b) #1 {
	// CHECK: [[TMP0:%.*]] = bitcast <2 x i64> %b to <16 x i8>			// CHECK: [[TMP0:%.*]] = bitcast <2 x i64> %b to <16 x i8>
	// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <2 x i64>			// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <2 x i64>
	// CHECK: [[VGETQ_LANE:%.*]] = extractelement <2 x i64> [[TMP1]], i32 1			// CHECK: [[VGETQ_LANE:%.*]] = extractelement <2 x i64> [[TMP1]], i32 1
	// CHECK: [[TMP2:%.*]] = bitcast <2 x i64> %a to <16 x i8>			// CHECK: [[TMP2:%.*]] = bitcast <2 x i64> %a to <16 x i8>
	// CHECK: [[TMP3:%.*]] = bitcast <16 x i8> [[TMP2]] to <2 x i64>			// CHECK: [[TMP3:%.*]] = bitcast <16 x i8> [[TMP2]] to <2 x i64>
	// CHECK: [[VSET_LANE:%.*]] = insertelement <2 x i64> [[TMP3]], i64 [[VGETQ_LANE]], i32 1			// CHECK: [[VSET_LANE:%.*]] = insertelement <2 x i64> [[TMP3]], i64 [[VGETQ_LANE]], i32 1
	// CHECK: ret <2 x i64> [[VSET_LANE]]			// CHECK: ret <2 x i64> [[VSET_LANE]]
	poly64x2_t test_vcopyq_laneq_p64(poly64x2_t a, poly64x2_t b) {			poly64x2_t test_vcopyq_laneq_p64(poly64x2_t a, poly64x2_t b) {
	return vcopyq_laneq_p64(a, 1, b, 1);			return vcopyq_laneq_p64(a, 1, b, 1);
	}			}

	// CHECK-LABEL: define <1 x i64> @test_vcreate_p64(i64 %a) #0 {			// CHECK-LABEL: define <1 x i64> @test_vcreate_p64(i64 %a) #0 {
	// CHECK: [[TMP0:%.*]] = bitcast i64 %a to <1 x i64>			// CHECK: [[TMP0:%.*]] = bitcast i64 %a to <1 x i64>
	// CHECK: ret <1 x i64> [[TMP0]]			// CHECK: ret <1 x i64> [[TMP0]]
	poly64x1_t test_vcreate_p64(uint64_t a) {			poly64x1_t test_vcreate_p64(uint64_t a) {
	return vcreate_p64(a);			return vcreate_p64(a);
	}			}

	// CHECK-LABEL: define <1 x i64> @test_vdup_n_p64(i64 %a) #0 {			// CHECK-LABEL: define <1 x i64> @test_vdup_n_p64(i64 %a) #0 {
	// CHECK: [[VECINIT_I:%.*]] = insertelement <1 x i64> undef, i64 %a, i32 0			// CHECK: [[VECINIT_I:%.*]] = insertelement <1 x i64> undef, i64 %a, i32 0
	// CHECK: ret <1 x i64> [[VECINIT_I]]			// CHECK: ret <1 x i64> [[VECINIT_I]]
	poly64x1_t test_vdup_n_p64(poly64_t a) {			poly64x1_t test_vdup_n_p64(poly64_t a) {
	return vdup_n_p64(a);			return vdup_n_p64(a);
	}			}
	// CHECK-LABEL: define <2 x i64> @test_vdupq_n_p64(i64 %a) #0 {			// CHECK-LABEL: define <2 x i64> @test_vdupq_n_p64(i64 %a) #1 {
	// CHECK: [[VECINIT_I:%.*]] = insertelement <2 x i64> undef, i64 %a, i32 0			// CHECK: [[VECINIT_I:%.*]] = insertelement <2 x i64> undef, i64 %a, i32 0
	// CHECK: [[VECINIT1_I:%.*]] = insertelement <2 x i64> [[VECINIT_I]], i64 %a, i32 1			// CHECK: [[VECINIT1_I:%.*]] = insertelement <2 x i64> [[VECINIT_I]], i64 %a, i32 1
	// CHECK: ret <2 x i64> [[VECINIT1_I]]			// CHECK: ret <2 x i64> [[VECINIT1_I]]
	poly64x2_t test_vdupq_n_p64(poly64_t a) {			poly64x2_t test_vdupq_n_p64(poly64_t a) {
	return vdupq_n_p64(a);			return vdupq_n_p64(a);
	}			}

	// CHECK-LABEL: define <1 x i64> @test_vmov_n_p64(i64 %a) #0 {			// CHECK-LABEL: define <1 x i64> @test_vmov_n_p64(i64 %a) #0 {
	// CHECK: [[VECINIT_I:%.*]] = insertelement <1 x i64> undef, i64 %a, i32 0			// CHECK: [[VECINIT_I:%.*]] = insertelement <1 x i64> undef, i64 %a, i32 0
	// CHECK: ret <1 x i64> [[VECINIT_I]]			// CHECK: ret <1 x i64> [[VECINIT_I]]
	poly64x1_t test_vmov_n_p64(poly64_t a) {			poly64x1_t test_vmov_n_p64(poly64_t a) {
	return vmov_n_p64(a);			return vmov_n_p64(a);
	}			}

	// CHECK-LABEL: define <2 x i64> @test_vmovq_n_p64(i64 %a) #0 {			// CHECK-LABEL: define <2 x i64> @test_vmovq_n_p64(i64 %a) #1 {
	// CHECK: [[VECINIT_I:%.*]] = insertelement <2 x i64> undef, i64 %a, i32 0			// CHECK: [[VECINIT_I:%.*]] = insertelement <2 x i64> undef, i64 %a, i32 0
	// CHECK: [[VECINIT1_I:%.*]] = insertelement <2 x i64> [[VECINIT_I]], i64 %a, i32 1			// CHECK: [[VECINIT1_I:%.*]] = insertelement <2 x i64> [[VECINIT_I]], i64 %a, i32 1
	// CHECK: ret <2 x i64> [[VECINIT1_I]]			// CHECK: ret <2 x i64> [[VECINIT1_I]]
	poly64x2_t test_vmovq_n_p64(poly64_t a) {			poly64x2_t test_vmovq_n_p64(poly64_t a) {
	return vmovq_n_p64(a);			return vmovq_n_p64(a);
	}			}

	// CHECK-LABEL: define <1 x i64> @test_vdup_lane_p64(<1 x i64> %vec) #0 {			// CHECK-LABEL: define <1 x i64> @test_vdup_lane_p64(<1 x i64> %vec) #0 {
	// CHECK: [[SHUFFLE:%.*]] = shufflevector <1 x i64> %vec, <1 x i64> %vec, <1 x i32> zeroinitializer			// CHECK: [[SHUFFLE:%.*]] = shufflevector <1 x i64> %vec, <1 x i64> %vec, <1 x i32> zeroinitializer
	// CHECK: ret <1 x i64> [[SHUFFLE]]			// CHECK: ret <1 x i64> [[SHUFFLE]]
	poly64x1_t test_vdup_lane_p64(poly64x1_t vec) {			poly64x1_t test_vdup_lane_p64(poly64x1_t vec) {
	return vdup_lane_p64(vec, 0);			return vdup_lane_p64(vec, 0);
	}			}

	// CHECK-LABEL: define <2 x i64> @test_vdupq_lane_p64(<1 x i64> %vec) #0 {			// CHECK-LABEL: define <2 x i64> @test_vdupq_lane_p64(<1 x i64> %vec) #1 {
	// CHECK: [[SHUFFLE:%.*]] = shufflevector <1 x i64> %vec, <1 x i64> %vec, <2 x i32> zeroinitializer			// CHECK: [[SHUFFLE:%.*]] = shufflevector <1 x i64> %vec, <1 x i64> %vec, <2 x i32> zeroinitializer
	// CHECK: ret <2 x i64> [[SHUFFLE]]			// CHECK: ret <2 x i64> [[SHUFFLE]]
	poly64x2_t test_vdupq_lane_p64(poly64x1_t vec) {			poly64x2_t test_vdupq_lane_p64(poly64x1_t vec) {
	return vdupq_lane_p64(vec, 0);			return vdupq_lane_p64(vec, 0);
	}			}

	// CHECK-LABEL: define <2 x i64> @test_vdupq_laneq_p64(<2 x i64> %vec) #0 {			// CHECK-LABEL: define <2 x i64> @test_vdupq_laneq_p64(<2 x i64> %vec) #1 {
	// CHECK: [[SHUFFLE:%.*]] = shufflevector <2 x i64> %vec, <2 x i64> %vec, <2 x i32> <i32 1, i32 1>			// CHECK: [[SHUFFLE:%.*]] = shufflevector <2 x i64> %vec, <2 x i64> %vec, <2 x i32> <i32 1, i32 1>
	// CHECK: ret <2 x i64> [[SHUFFLE]]			// CHECK: ret <2 x i64> [[SHUFFLE]]
	poly64x2_t test_vdupq_laneq_p64(poly64x2_t vec) {			poly64x2_t test_vdupq_laneq_p64(poly64x2_t vec) {
	return vdupq_laneq_p64(vec, 1);			return vdupq_laneq_p64(vec, 1);
	}			}

	// CHECK-LABEL: define <2 x i64> @test_vcombine_p64(<1 x i64> %low, <1 x i64> %high) #0 {			// CHECK-LABEL: define <2 x i64> @test_vcombine_p64(<1 x i64> %low, <1 x i64> %high) #1 {
	// CHECK: [[SHUFFLE_I:%.*]] = shufflevector <1 x i64> %low, <1 x i64> %high, <2 x i32> <i32 0, i32 1>			// CHECK: [[SHUFFLE_I:%.*]] = shufflevector <1 x i64> %low, <1 x i64> %high, <2 x i32> <i32 0, i32 1>
	// CHECK: ret <2 x i64> [[SHUFFLE_I]]			// CHECK: ret <2 x i64> [[SHUFFLE_I]]
	poly64x2_t test_vcombine_p64(poly64x1_t low, poly64x1_t high) {			poly64x2_t test_vcombine_p64(poly64x1_t low, poly64x1_t high) {
	return vcombine_p64(low, high);			return vcombine_p64(low, high);
	}			}

	// CHECK-LABEL: define <1 x i64> @test_vld1_p64(i64* %ptr) #0 {			// CHECK-LABEL: define <1 x i64> @test_vld1_p64(i64* %ptr) #0 {
	// CHECK: [[TMP0:%.*]] = bitcast i64* %ptr to i8*			// CHECK: [[TMP0:%.*]] = bitcast i64* %ptr to i8*
	// CHECK: [[TMP1:%.*]] = bitcast i8* [[TMP0]] to <1 x i64>*			// CHECK: [[TMP1:%.*]] = bitcast i8* [[TMP0]] to <1 x i64>*
	// CHECK: [[TMP2:%.*]] = load <1 x i64>, <1 x i64>* [[TMP1]]			// CHECK: [[TMP2:%.*]] = load <1 x i64>, <1 x i64>* [[TMP1]]
	// CHECK: ret <1 x i64> [[TMP2]]			// CHECK: ret <1 x i64> [[TMP2]]
	poly64x1_t test_vld1_p64(poly64_t const * ptr) {			poly64x1_t test_vld1_p64(poly64_t const * ptr) {
	return vld1_p64(ptr);			return vld1_p64(ptr);
	}			}

	// CHECK-LABEL: define <2 x i64> @test_vld1q_p64(i64* %ptr) #0 {			// CHECK-LABEL: define <2 x i64> @test_vld1q_p64(i64* %ptr) #1 {
	// CHECK: [[TMP0:%.*]] = bitcast i64* %ptr to i8*			// CHECK: [[TMP0:%.*]] = bitcast i64* %ptr to i8*
	// CHECK: [[TMP1:%.*]] = bitcast i8* [[TMP0]] to <2 x i64>*			// CHECK: [[TMP1:%.*]] = bitcast i8* [[TMP0]] to <2 x i64>*
	// CHECK: [[TMP2:%.*]] = load <2 x i64>, <2 x i64>* [[TMP1]]			// CHECK: [[TMP2:%.*]] = load <2 x i64>, <2 x i64>* [[TMP1]]
	// CHECK: ret <2 x i64> [[TMP2]]			// CHECK: ret <2 x i64> [[TMP2]]
	poly64x2_t test_vld1q_p64(poly64_t const * ptr) {			poly64x2_t test_vld1q_p64(poly64_t const * ptr) {
	return vld1q_p64(ptr);			return vld1q_p64(ptr);
	}			}

	// CHECK-LABEL: define void @test_vst1_p64(i64* %ptr, <1 x i64> %val) #0 {			// CHECK-LABEL: define void @test_vst1_p64(i64* %ptr, <1 x i64> %val) #0 {
	// CHECK: [[TMP0:%.*]] = bitcast i64* %ptr to i8*			// CHECK: [[TMP0:%.*]] = bitcast i64* %ptr to i8*
	// CHECK: [[TMP1:%.*]] = bitcast <1 x i64> %val to <8 x i8>			// CHECK: [[TMP1:%.*]] = bitcast <1 x i64> %val to <8 x i8>
	// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP0]] to <1 x i64>*			// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP0]] to <1 x i64>*
	// CHECK: [[TMP3:%.*]] = bitcast <8 x i8> [[TMP1]] to <1 x i64>			// CHECK: [[TMP3:%.*]] = bitcast <8 x i8> [[TMP1]] to <1 x i64>
	// CHECK: store <1 x i64> [[TMP3]], <1 x i64>* [[TMP2]]			// CHECK: store <1 x i64> [[TMP3]], <1 x i64>* [[TMP2]]
	// CHECK: ret void			// CHECK: ret void
	void test_vst1_p64(poly64_t * ptr, poly64x1_t val) {			void test_vst1_p64(poly64_t * ptr, poly64x1_t val) {
	return vst1_p64(ptr, val);			return vst1_p64(ptr, val);
	}			}

	// CHECK-LABEL: define void @test_vst1q_p64(i64* %ptr, <2 x i64> %val) #0 {			// CHECK-LABEL: define void @test_vst1q_p64(i64* %ptr, <2 x i64> %val) #1 {
	// CHECK: [[TMP0:%.*]] = bitcast i64* %ptr to i8*			// CHECK: [[TMP0:%.*]] = bitcast i64* %ptr to i8*
	// CHECK: [[TMP1:%.*]] = bitcast <2 x i64> %val to <16 x i8>			// CHECK: [[TMP1:%.*]] = bitcast <2 x i64> %val to <16 x i8>
	// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP0]] to <2 x i64>*			// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP0]] to <2 x i64>*
	// CHECK: [[TMP3:%.*]] = bitcast <16 x i8> [[TMP1]] to <2 x i64>			// CHECK: [[TMP3:%.*]] = bitcast <16 x i8> [[TMP1]] to <2 x i64>
	// CHECK: store <2 x i64> [[TMP3]], <2 x i64>* [[TMP2]]			// CHECK: store <2 x i64> [[TMP3]], <2 x i64>* [[TMP2]]
	// CHECK: ret void			// CHECK: ret void
	void test_vst1q_p64(poly64_t * ptr, poly64x2_t val) {			void test_vst1q_p64(poly64_t * ptr, poly64x2_t val) {
	return vst1q_p64(ptr, val);			return vst1q_p64(ptr, val);
	}			}

	// CHECK-LABEL: define %struct.poly64x1x2_t @test_vld2_p64(i64* %ptr) #0 {			// CHECK-LABEL: define %struct.poly64x1x2_t @test_vld2_p64(i64* %ptr) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.poly64x1x2_t, align 8			// CHECK: [[RETVAL:%.*]] = alloca %struct.poly64x1x2_t, align 8
	// CHECK: [[__RET:%.*]] = alloca %struct.poly64x1x2_t, align 8			// CHECK: [[__RET:%.*]] = alloca %struct.poly64x1x2_t, align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.poly64x1x2_t* [[__RET]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.poly64x1x2_t* [[__RET]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast i64* %ptr to i8*			// CHECK: [[TMP1:%.*]] = bitcast i64* %ptr to i8*
	// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to <1 x i64>*			// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to <1 x i64>*
	// CHECK: [[VLD2:%.*]] = call { <1 x i64>, <1 x i64> } @llvm.aarch64.neon.ld2.v1i64.p0v1i64(<1 x i64>* [[TMP2]])			// CHECK: [[VLD2:%.*]] = call { <1 x i64>, <1 x i64> } @llvm.aarch64.neon.ld2.v1i64.p0v1i64(<1 x i64>* [[TMP2]])
	// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <1 x i64>, <1 x i64> }*			// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <1 x i64>, <1 x i64> }*
	// CHECK: store { <1 x i64>, <1 x i64> } [[VLD2]], { <1 x i64>, <1 x i64> }* [[TMP3]]			// CHECK: store { <1 x i64>, <1 x i64> } [[VLD2]], { <1 x i64>, <1 x i64> }* [[TMP3]]
	// CHECK: [[TMP4:%.*]] = bitcast %struct.poly64x1x2_t* [[RETVAL]] to i8*			// CHECK: [[TMP4:%.*]] = bitcast %struct.poly64x1x2_t* [[RETVAL]] to i8*
	// CHECK: [[TMP5:%.*]] = bitcast %struct.poly64x1x2_t* [[__RET]] to i8*			// CHECK: [[TMP5:%.*]] = bitcast %struct.poly64x1x2_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP4]], i8* align 8 [[TMP5]], i64 16, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP4]], i8* align 8 [[TMP5]], i64 16, i1 false)
	// CHECK: [[TMP6:%.*]] = load %struct.poly64x1x2_t, %struct.poly64x1x2_t* [[RETVAL]], align 8			// CHECK: [[TMP6:%.*]] = load %struct.poly64x1x2_t, %struct.poly64x1x2_t* [[RETVAL]], align 8
	// CHECK: ret %struct.poly64x1x2_t [[TMP6]]			// CHECK: ret %struct.poly64x1x2_t [[TMP6]]
	poly64x1x2_t test_vld2_p64(poly64_t const * ptr) {			poly64x1x2_t test_vld2_p64(poly64_t const * ptr) {
	return vld2_p64(ptr);			return vld2_p64(ptr);
	}			}

	// CHECK-LABEL: define %struct.poly64x2x2_t @test_vld2q_p64(i64* %ptr) #0 {			// CHECK-LABEL: define %struct.poly64x2x2_t @test_vld2q_p64(i64* %ptr) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.poly64x2x2_t, align 16			// CHECK: [[RETVAL:%.*]] = alloca %struct.poly64x2x2_t, align 16
	// CHECK: [[__RET:%.*]] = alloca %struct.poly64x2x2_t, align 16			// CHECK: [[__RET:%.*]] = alloca %struct.poly64x2x2_t, align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.poly64x2x2_t* [[__RET]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.poly64x2x2_t* [[__RET]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast i64* %ptr to i8*			// CHECK: [[TMP1:%.*]] = bitcast i64* %ptr to i8*
	// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to <2 x i64>*			// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to <2 x i64>*
	// CHECK: [[VLD2:%.*]] = call { <2 x i64>, <2 x i64> } @llvm.aarch64.neon.ld2.v2i64.p0v2i64(<2 x i64>* [[TMP2]])			// CHECK: [[VLD2:%.*]] = call { <2 x i64>, <2 x i64> } @llvm.aarch64.neon.ld2.v2i64.p0v2i64(<2 x i64>* [[TMP2]])
	// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <2 x i64>, <2 x i64> }*			// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <2 x i64>, <2 x i64> }*
	// CHECK: store { <2 x i64>, <2 x i64> } [[VLD2]], { <2 x i64>, <2 x i64> }* [[TMP3]]			// CHECK: store { <2 x i64>, <2 x i64> } [[VLD2]], { <2 x i64>, <2 x i64> }* [[TMP3]]
	// CHECK: [[TMP4:%.*]] = bitcast %struct.poly64x2x2_t* [[RETVAL]] to i8*			// CHECK: [[TMP4:%.*]] = bitcast %struct.poly64x2x2_t* [[RETVAL]] to i8*
	// CHECK: [[TMP5:%.*]] = bitcast %struct.poly64x2x2_t* [[__RET]] to i8*			// CHECK: [[TMP5:%.*]] = bitcast %struct.poly64x2x2_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP4]], i8* align 16 [[TMP5]], i64 32, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP4]], i8* align 16 [[TMP5]], i64 32, i1 false)
	// CHECK: [[TMP6:%.*]] = load %struct.poly64x2x2_t, %struct.poly64x2x2_t* [[RETVAL]], align 16			// CHECK: [[TMP6:%.*]] = load %struct.poly64x2x2_t, %struct.poly64x2x2_t* [[RETVAL]], align 16
	// CHECK: ret %struct.poly64x2x2_t [[TMP6]]			// CHECK: ret %struct.poly64x2x2_t [[TMP6]]
	poly64x2x2_t test_vld2q_p64(poly64_t const * ptr) {			poly64x2x2_t test_vld2q_p64(poly64_t const * ptr) {
	return vld2q_p64(ptr);			return vld2q_p64(ptr);
	}			}

	// CHECK-LABEL: define %struct.poly64x1x3_t @test_vld3_p64(i64* %ptr) #0 {			// CHECK-LABEL: define %struct.poly64x1x3_t @test_vld3_p64(i64* %ptr) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.poly64x1x3_t, align 8			// CHECK: [[RETVAL:%.*]] = alloca %struct.poly64x1x3_t, align 8
	// CHECK: [[__RET:%.*]] = alloca %struct.poly64x1x3_t, align 8			// CHECK: [[__RET:%.*]] = alloca %struct.poly64x1x3_t, align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.poly64x1x3_t* [[__RET]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.poly64x1x3_t* [[__RET]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast i64* %ptr to i8*			// CHECK: [[TMP1:%.*]] = bitcast i64* %ptr to i8*
	// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to <1 x i64>*			// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to <1 x i64>*
	// CHECK: [[VLD3:%.*]] = call { <1 x i64>, <1 x i64>, <1 x i64> } @llvm.aarch64.neon.ld3.v1i64.p0v1i64(<1 x i64>* [[TMP2]])			// CHECK: [[VLD3:%.*]] = call { <1 x i64>, <1 x i64>, <1 x i64> } @llvm.aarch64.neon.ld3.v1i64.p0v1i64(<1 x i64>* [[TMP2]])
	// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <1 x i64>, <1 x i64>, <1 x i64> }*			// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <1 x i64>, <1 x i64>, <1 x i64> }*
	// CHECK: store { <1 x i64>, <1 x i64>, <1 x i64> } [[VLD3]], { <1 x i64>, <1 x i64>, <1 x i64> }* [[TMP3]]			// CHECK: store { <1 x i64>, <1 x i64>, <1 x i64> } [[VLD3]], { <1 x i64>, <1 x i64>, <1 x i64> }* [[TMP3]]
	// CHECK: [[TMP4:%.*]] = bitcast %struct.poly64x1x3_t* [[RETVAL]] to i8*			// CHECK: [[TMP4:%.*]] = bitcast %struct.poly64x1x3_t* [[RETVAL]] to i8*
	// CHECK: [[TMP5:%.*]] = bitcast %struct.poly64x1x3_t* [[__RET]] to i8*			// CHECK: [[TMP5:%.*]] = bitcast %struct.poly64x1x3_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP4]], i8* align 8 [[TMP5]], i64 24, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP4]], i8* align 8 [[TMP5]], i64 24, i1 false)
	// CHECK: [[TMP6:%.*]] = load %struct.poly64x1x3_t, %struct.poly64x1x3_t* [[RETVAL]], align 8			// CHECK: [[TMP6:%.*]] = load %struct.poly64x1x3_t, %struct.poly64x1x3_t* [[RETVAL]], align 8
	// CHECK: ret %struct.poly64x1x3_t [[TMP6]]			// CHECK: ret %struct.poly64x1x3_t [[TMP6]]
	poly64x1x3_t test_vld3_p64(poly64_t const * ptr) {			poly64x1x3_t test_vld3_p64(poly64_t const * ptr) {
	return vld3_p64(ptr);			return vld3_p64(ptr);
	}			}

	// CHECK-LABEL: define %struct.poly64x2x3_t @test_vld3q_p64(i64* %ptr) #0 {			// CHECK-LABEL: define %struct.poly64x2x3_t @test_vld3q_p64(i64* %ptr) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.poly64x2x3_t, align 16			// CHECK: [[RETVAL:%.*]] = alloca %struct.poly64x2x3_t, align 16
	// CHECK: [[__RET:%.*]] = alloca %struct.poly64x2x3_t, align 16			// CHECK: [[__RET:%.*]] = alloca %struct.poly64x2x3_t, align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.poly64x2x3_t* [[__RET]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.poly64x2x3_t* [[__RET]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast i64* %ptr to i8*			// CHECK: [[TMP1:%.*]] = bitcast i64* %ptr to i8*
	// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to <2 x i64>*			// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to <2 x i64>*
	// CHECK: [[VLD3:%.*]] = call { <2 x i64>, <2 x i64>, <2 x i64> } @llvm.aarch64.neon.ld3.v2i64.p0v2i64(<2 x i64>* [[TMP2]])			// CHECK: [[VLD3:%.*]] = call { <2 x i64>, <2 x i64>, <2 x i64> } @llvm.aarch64.neon.ld3.v2i64.p0v2i64(<2 x i64>* [[TMP2]])
	// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <2 x i64>, <2 x i64>, <2 x i64> }*			// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <2 x i64>, <2 x i64>, <2 x i64> }*
	// CHECK: store { <2 x i64>, <2 x i64>, <2 x i64> } [[VLD3]], { <2 x i64>, <2 x i64>, <2 x i64> }* [[TMP3]]			// CHECK: store { <2 x i64>, <2 x i64>, <2 x i64> } [[VLD3]], { <2 x i64>, <2 x i64>, <2 x i64> }* [[TMP3]]
	// CHECK: [[TMP4:%.*]] = bitcast %struct.poly64x2x3_t* [[RETVAL]] to i8*			// CHECK: [[TMP4:%.*]] = bitcast %struct.poly64x2x3_t* [[RETVAL]] to i8*
	// CHECK: [[TMP5:%.*]] = bitcast %struct.poly64x2x3_t* [[__RET]] to i8*			// CHECK: [[TMP5:%.*]] = bitcast %struct.poly64x2x3_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP4]], i8* align 16 [[TMP5]], i64 48, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP4]], i8* align 16 [[TMP5]], i64 48, i1 false)
	// CHECK: [[TMP6:%.*]] = load %struct.poly64x2x3_t, %struct.poly64x2x3_t* [[RETVAL]], align 16			// CHECK: [[TMP6:%.*]] = load %struct.poly64x2x3_t, %struct.poly64x2x3_t* [[RETVAL]], align 16
	// CHECK: ret %struct.poly64x2x3_t [[TMP6]]			// CHECK: ret %struct.poly64x2x3_t [[TMP6]]
	poly64x2x3_t test_vld3q_p64(poly64_t const * ptr) {			poly64x2x3_t test_vld3q_p64(poly64_t const * ptr) {
	return vld3q_p64(ptr);			return vld3q_p64(ptr);
	}			}

	// CHECK-LABEL: define %struct.poly64x1x4_t @test_vld4_p64(i64* %ptr) #0 {			// CHECK-LABEL: define %struct.poly64x1x4_t @test_vld4_p64(i64* %ptr) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.poly64x1x4_t, align 8			// CHECK: [[RETVAL:%.*]] = alloca %struct.poly64x1x4_t, align 8
	// CHECK: [[__RET:%.*]] = alloca %struct.poly64x1x4_t, align 8			// CHECK: [[__RET:%.*]] = alloca %struct.poly64x1x4_t, align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.poly64x1x4_t* [[__RET]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.poly64x1x4_t* [[__RET]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast i64* %ptr to i8*			// CHECK: [[TMP1:%.*]] = bitcast i64* %ptr to i8*
	// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to <1 x i64>*			// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to <1 x i64>*
	// CHECK: [[VLD4:%.*]] = call { <1 x i64>, <1 x i64>, <1 x i64>, <1 x i64> } @llvm.aarch64.neon.ld4.v1i64.p0v1i64(<1 x i64>* [[TMP2]])			// CHECK: [[VLD4:%.*]] = call { <1 x i64>, <1 x i64>, <1 x i64>, <1 x i64> } @llvm.aarch64.neon.ld4.v1i64.p0v1i64(<1 x i64>* [[TMP2]])
	// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <1 x i64>, <1 x i64>, <1 x i64>, <1 x i64> }*			// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <1 x i64>, <1 x i64>, <1 x i64>, <1 x i64> }*
	// CHECK: store { <1 x i64>, <1 x i64>, <1 x i64>, <1 x i64> } [[VLD4]], { <1 x i64>, <1 x i64>, <1 x i64>, <1 x i64> }* [[TMP3]]			// CHECK: store { <1 x i64>, <1 x i64>, <1 x i64>, <1 x i64> } [[VLD4]], { <1 x i64>, <1 x i64>, <1 x i64>, <1 x i64> }* [[TMP3]]
	// CHECK: [[TMP4:%.*]] = bitcast %struct.poly64x1x4_t* [[RETVAL]] to i8*			// CHECK: [[TMP4:%.*]] = bitcast %struct.poly64x1x4_t* [[RETVAL]] to i8*
	// CHECK: [[TMP5:%.*]] = bitcast %struct.poly64x1x4_t* [[__RET]] to i8*			// CHECK: [[TMP5:%.*]] = bitcast %struct.poly64x1x4_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP4]], i8* align 8 [[TMP5]], i64 32, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP4]], i8* align 8 [[TMP5]], i64 32, i1 false)
	// CHECK: [[TMP6:%.*]] = load %struct.poly64x1x4_t, %struct.poly64x1x4_t* [[RETVAL]], align 8			// CHECK: [[TMP6:%.*]] = load %struct.poly64x1x4_t, %struct.poly64x1x4_t* [[RETVAL]], align 8
	// CHECK: ret %struct.poly64x1x4_t [[TMP6]]			// CHECK: ret %struct.poly64x1x4_t [[TMP6]]
	poly64x1x4_t test_vld4_p64(poly64_t const * ptr) {			poly64x1x4_t test_vld4_p64(poly64_t const * ptr) {
	return vld4_p64(ptr);			return vld4_p64(ptr);
	}			}

	// CHECK-LABEL: define %struct.poly64x2x4_t @test_vld4q_p64(i64* %ptr) #0 {			// CHECK-LABEL: define %struct.poly64x2x4_t @test_vld4q_p64(i64* %ptr) #2 {
	// CHECK: [[RETVAL:%.*]] = alloca %struct.poly64x2x4_t, align 16			// CHECK: [[RETVAL:%.*]] = alloca %struct.poly64x2x4_t, align 16
	// CHECK: [[__RET:%.*]] = alloca %struct.poly64x2x4_t, align 16			// CHECK: [[__RET:%.*]] = alloca %struct.poly64x2x4_t, align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.poly64x2x4_t* [[__RET]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.poly64x2x4_t* [[__RET]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast i64* %ptr to i8*			// CHECK: [[TMP1:%.*]] = bitcast i64* %ptr to i8*
	// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to <2 x i64>*			// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to <2 x i64>*
	// CHECK: [[VLD4:%.*]] = call { <2 x i64>, <2 x i64>, <2 x i64>, <2 x i64> } @llvm.aarch64.neon.ld4.v2i64.p0v2i64(<2 x i64>* [[TMP2]])			// CHECK: [[VLD4:%.*]] = call { <2 x i64>, <2 x i64>, <2 x i64>, <2 x i64> } @llvm.aarch64.neon.ld4.v2i64.p0v2i64(<2 x i64>* [[TMP2]])
	// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <2 x i64>, <2 x i64>, <2 x i64>, <2 x i64> }*			// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <2 x i64>, <2 x i64>, <2 x i64>, <2 x i64> }*
	// CHECK: store { <2 x i64>, <2 x i64>, <2 x i64>, <2 x i64> } [[VLD4]], { <2 x i64>, <2 x i64>, <2 x i64>, <2 x i64> }* [[TMP3]]			// CHECK: store { <2 x i64>, <2 x i64>, <2 x i64>, <2 x i64> } [[VLD4]], { <2 x i64>, <2 x i64>, <2 x i64>, <2 x i64> }* [[TMP3]]
	// CHECK: [[TMP4:%.*]] = bitcast %struct.poly64x2x4_t* [[RETVAL]] to i8*			// CHECK: [[TMP4:%.*]] = bitcast %struct.poly64x2x4_t* [[RETVAL]] to i8*
	// CHECK: [[TMP5:%.*]] = bitcast %struct.poly64x2x4_t* [[__RET]] to i8*			// CHECK: [[TMP5:%.*]] = bitcast %struct.poly64x2x4_t* [[__RET]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP4]], i8* align 16 [[TMP5]], i64 64, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP4]], i8* align 16 [[TMP5]], i64 64, i1 false)
	// CHECK: [[TMP6:%.*]] = load %struct.poly64x2x4_t, %struct.poly64x2x4_t* [[RETVAL]], align 16			// CHECK: [[TMP6:%.*]] = load %struct.poly64x2x4_t, %struct.poly64x2x4_t* [[RETVAL]], align 16
	// CHECK: ret %struct.poly64x2x4_t [[TMP6]]			// CHECK: ret %struct.poly64x2x4_t [[TMP6]]
	poly64x2x4_t test_vld4q_p64(poly64_t const * ptr) {			poly64x2x4_t test_vld4q_p64(poly64_t const * ptr) {
	return vld4q_p64(ptr);			return vld4q_p64(ptr);
	}			}

	// CHECK-LABEL: define void @test_vst2_p64(i64* %ptr, [2 x <1 x i64>] %val.coerce) #0 {			// CHECK-LABEL: define void @test_vst2_p64(i64* %ptr, [2 x <1 x i64>] %val.coerce) #2 {
	// CHECK: [[VAL:%.*]] = alloca %struct.poly64x1x2_t, align 8			// CHECK: [[VAL:%.*]] = alloca %struct.poly64x1x2_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.poly64x1x2_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.poly64x1x2_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly64x1x2_t, %struct.poly64x1x2_t* [[VAL]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly64x1x2_t, %struct.poly64x1x2_t* [[VAL]], i32 0, i32 0
	// CHECK: store [2 x <1 x i64>] [[VAL]].coerce, [2 x <1 x i64>]* [[COERCE_DIVE]], align 8			// CHECK: store [2 x <1 x i64>] [[VAL]].coerce, [2 x <1 x i64>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.poly64x1x2_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.poly64x1x2_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.poly64x1x2_t* [[VAL]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.poly64x1x2_t* [[VAL]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 16, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 16, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast i64* %ptr to i8*			// CHECK: [[TMP2:%.*]] = bitcast i64* %ptr to i8*
	// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.poly64x1x2_t, %struct.poly64x1x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.poly64x1x2_t, %struct.poly64x1x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <1 x i64>], [2 x <1 x i64>]* [[VAL1]], i64 0, i64 0			// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <1 x i64>], [2 x <1 x i64>]* [[VAL1]], i64 0, i64 0
	// CHECK: [[TMP3:%.*]] = load <1 x i64>, <1 x i64>* [[ARRAYIDX]], align 8			// CHECK: [[TMP3:%.*]] = load <1 x i64>, <1 x i64>* [[ARRAYIDX]], align 8
	// CHECK: [[TMP4:%.*]] = bitcast <1 x i64> [[TMP3]] to <8 x i8>			// CHECK: [[TMP4:%.*]] = bitcast <1 x i64> [[TMP3]] to <8 x i8>
	// CHECK: [[VAL2:%.*]] = getelementptr inbounds %struct.poly64x1x2_t, %struct.poly64x1x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL2:%.*]] = getelementptr inbounds %struct.poly64x1x2_t, %struct.poly64x1x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX3:%.*]] = getelementptr inbounds [2 x <1 x i64>], [2 x <1 x i64>]* [[VAL2]], i64 0, i64 1			// CHECK: [[ARRAYIDX3:%.*]] = getelementptr inbounds [2 x <1 x i64>], [2 x <1 x i64>]* [[VAL2]], i64 0, i64 1
	// CHECK: [[TMP5:%.*]] = load <1 x i64>, <1 x i64>* [[ARRAYIDX3]], align 8			// CHECK: [[TMP5:%.*]] = load <1 x i64>, <1 x i64>* [[ARRAYIDX3]], align 8
	// CHECK: [[TMP6:%.*]] = bitcast <1 x i64> [[TMP5]] to <8 x i8>			// CHECK: [[TMP6:%.*]] = bitcast <1 x i64> [[TMP5]] to <8 x i8>
	// CHECK: [[TMP7:%.*]] = bitcast <8 x i8> [[TMP4]] to <1 x i64>			// CHECK: [[TMP7:%.*]] = bitcast <8 x i8> [[TMP4]] to <1 x i64>
	// CHECK: [[TMP8:%.*]] = bitcast <8 x i8> [[TMP6]] to <1 x i64>			// CHECK: [[TMP8:%.*]] = bitcast <8 x i8> [[TMP6]] to <1 x i64>
	// CHECK: call void @llvm.aarch64.neon.st2.v1i64.p0i8(<1 x i64> [[TMP7]], <1 x i64> [[TMP8]], i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st2.v1i64.p0i8(<1 x i64> [[TMP7]], <1 x i64> [[TMP8]], i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst2_p64(poly64_t * ptr, poly64x1x2_t val) {			void test_vst2_p64(poly64_t * ptr, poly64x1x2_t val) {
	return vst2_p64(ptr, val);			return vst2_p64(ptr, val);
	}			}

	// CHECK-LABEL: define void @test_vst2q_p64(i64* %ptr, [2 x <2 x i64>] %val.coerce) #0 {			// CHECK-LABEL: define void @test_vst2q_p64(i64* %ptr, [2 x <2 x i64>] %val.coerce) #2 {
	// CHECK: [[VAL:%.*]] = alloca %struct.poly64x2x2_t, align 16			// CHECK: [[VAL:%.*]] = alloca %struct.poly64x2x2_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.poly64x2x2_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.poly64x2x2_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly64x2x2_t, %struct.poly64x2x2_t* [[VAL]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly64x2x2_t, %struct.poly64x2x2_t* [[VAL]], i32 0, i32 0
	// CHECK: store [2 x <2 x i64>] [[VAL]].coerce, [2 x <2 x i64>]* [[COERCE_DIVE]], align 16			// CHECK: store [2 x <2 x i64>] [[VAL]].coerce, [2 x <2 x i64>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.poly64x2x2_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.poly64x2x2_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.poly64x2x2_t* [[VAL]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.poly64x2x2_t* [[VAL]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 32, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 32, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast i64* %ptr to i8*			// CHECK: [[TMP2:%.*]] = bitcast i64* %ptr to i8*
	// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.poly64x2x2_t, %struct.poly64x2x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.poly64x2x2_t, %struct.poly64x2x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <2 x i64>], [2 x <2 x i64>]* [[VAL1]], i64 0, i64 0			// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <2 x i64>], [2 x <2 x i64>]* [[VAL1]], i64 0, i64 0
	// CHECK: [[TMP3:%.*]] = load <2 x i64>, <2 x i64>* [[ARRAYIDX]], align 16			// CHECK: [[TMP3:%.*]] = load <2 x i64>, <2 x i64>* [[ARRAYIDX]], align 16
	// CHECK: [[TMP4:%.*]] = bitcast <2 x i64> [[TMP3]] to <16 x i8>			// CHECK: [[TMP4:%.*]] = bitcast <2 x i64> [[TMP3]] to <16 x i8>
	// CHECK: [[VAL2:%.*]] = getelementptr inbounds %struct.poly64x2x2_t, %struct.poly64x2x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL2:%.*]] = getelementptr inbounds %struct.poly64x2x2_t, %struct.poly64x2x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX3:%.*]] = getelementptr inbounds [2 x <2 x i64>], [2 x <2 x i64>]* [[VAL2]], i64 0, i64 1			// CHECK: [[ARRAYIDX3:%.*]] = getelementptr inbounds [2 x <2 x i64>], [2 x <2 x i64>]* [[VAL2]], i64 0, i64 1
	// CHECK: [[TMP5:%.*]] = load <2 x i64>, <2 x i64>* [[ARRAYIDX3]], align 16			// CHECK: [[TMP5:%.*]] = load <2 x i64>, <2 x i64>* [[ARRAYIDX3]], align 16
	// CHECK: [[TMP6:%.*]] = bitcast <2 x i64> [[TMP5]] to <16 x i8>			// CHECK: [[TMP6:%.*]] = bitcast <2 x i64> [[TMP5]] to <16 x i8>
	// CHECK: [[TMP7:%.*]] = bitcast <16 x i8> [[TMP4]] to <2 x i64>			// CHECK: [[TMP7:%.*]] = bitcast <16 x i8> [[TMP4]] to <2 x i64>
	// CHECK: [[TMP8:%.*]] = bitcast <16 x i8> [[TMP6]] to <2 x i64>			// CHECK: [[TMP8:%.*]] = bitcast <16 x i8> [[TMP6]] to <2 x i64>
	// CHECK: call void @llvm.aarch64.neon.st2.v2i64.p0i8(<2 x i64> [[TMP7]], <2 x i64> [[TMP8]], i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st2.v2i64.p0i8(<2 x i64> [[TMP7]], <2 x i64> [[TMP8]], i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst2q_p64(poly64_t * ptr, poly64x2x2_t val) {			void test_vst2q_p64(poly64_t * ptr, poly64x2x2_t val) {
	return vst2q_p64(ptr, val);			return vst2q_p64(ptr, val);
	}			}

	// CHECK-LABEL: define void @test_vst3_p64(i64* %ptr, [3 x <1 x i64>] %val.coerce) #0 {			// CHECK-LABEL: define void @test_vst3_p64(i64* %ptr, [3 x <1 x i64>] %val.coerce) #2 {
	// CHECK: [[VAL:%.*]] = alloca %struct.poly64x1x3_t, align 8			// CHECK: [[VAL:%.*]] = alloca %struct.poly64x1x3_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.poly64x1x3_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.poly64x1x3_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly64x1x3_t, %struct.poly64x1x3_t* [[VAL]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly64x1x3_t, %struct.poly64x1x3_t* [[VAL]], i32 0, i32 0
	// CHECK: store [3 x <1 x i64>] [[VAL]].coerce, [3 x <1 x i64>]* [[COERCE_DIVE]], align 8			// CHECK: store [3 x <1 x i64>] [[VAL]].coerce, [3 x <1 x i64>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.poly64x1x3_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.poly64x1x3_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.poly64x1x3_t* [[VAL]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.poly64x1x3_t* [[VAL]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 24, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 24, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast i64* %ptr to i8*			// CHECK: [[TMP2:%.*]] = bitcast i64* %ptr to i8*
	[13 unchanged lines omitted]
	// CHECK: [[TMP10:%.*]] = bitcast <8 x i8> [[TMP6]] to <1 x i64>			// CHECK: [[TMP10:%.*]] = bitcast <8 x i8> [[TMP6]] to <1 x i64>
	// CHECK: [[TMP11:%.*]] = bitcast <8 x i8> [[TMP8]] to <1 x i64>			// CHECK: [[TMP11:%.*]] = bitcast <8 x i8> [[TMP8]] to <1 x i64>
	// CHECK: call void @llvm.aarch64.neon.st3.v1i64.p0i8(<1 x i64> [[TMP9]], <1 x i64> [[TMP10]], <1 x i64> [[TMP11]], i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st3.v1i64.p0i8(<1 x i64> [[TMP9]], <1 x i64> [[TMP10]], <1 x i64> [[TMP11]], i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst3_p64(poly64_t * ptr, poly64x1x3_t val) {			void test_vst3_p64(poly64_t * ptr, poly64x1x3_t val) {
	return vst3_p64(ptr, val);			return vst3_p64(ptr, val);
	}			}

	// CHECK-LABEL: define void @test_vst3q_p64(i64* %ptr, [3 x <2 x i64>] %val.coerce) #0 {			// CHECK-LABEL: define void @test_vst3q_p64(i64* %ptr, [3 x <2 x i64>] %val.coerce) #2 {
	// CHECK: [[VAL:%.*]] = alloca %struct.poly64x2x3_t, align 16			// CHECK: [[VAL:%.*]] = alloca %struct.poly64x2x3_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.poly64x2x3_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.poly64x2x3_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly64x2x3_t, %struct.poly64x2x3_t* [[VAL]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly64x2x3_t, %struct.poly64x2x3_t* [[VAL]], i32 0, i32 0
	// CHECK: store [3 x <2 x i64>] [[VAL]].coerce, [3 x <2 x i64>]* [[COERCE_DIVE]], align 16			// CHECK: store [3 x <2 x i64>] [[VAL]].coerce, [3 x <2 x i64>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.poly64x2x3_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.poly64x2x3_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.poly64x2x3_t* [[VAL]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.poly64x2x3_t* [[VAL]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 48, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 48, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast i64* %ptr to i8*			// CHECK: [[TMP2:%.*]] = bitcast i64* %ptr to i8*
	[13 unchanged lines omitted]
	// CHECK: [[TMP10:%.*]] = bitcast <16 x i8> [[TMP6]] to <2 x i64>			// CHECK: [[TMP10:%.*]] = bitcast <16 x i8> [[TMP6]] to <2 x i64>
	// CHECK: [[TMP11:%.*]] = bitcast <16 x i8> [[TMP8]] to <2 x i64>			// CHECK: [[TMP11:%.*]] = bitcast <16 x i8> [[TMP8]] to <2 x i64>
	// CHECK: call void @llvm.aarch64.neon.st3.v2i64.p0i8(<2 x i64> [[TMP9]], <2 x i64> [[TMP10]], <2 x i64> [[TMP11]], i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st3.v2i64.p0i8(<2 x i64> [[TMP9]], <2 x i64> [[TMP10]], <2 x i64> [[TMP11]], i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst3q_p64(poly64_t * ptr, poly64x2x3_t val) {			void test_vst3q_p64(poly64_t * ptr, poly64x2x3_t val) {
	return vst3q_p64(ptr, val);			return vst3q_p64(ptr, val);
	}			}

	// CHECK-LABEL: define void @test_vst4_p64(i64* %ptr, [4 x <1 x i64>] %val.coerce) #0 {			// CHECK-LABEL: define void @test_vst4_p64(i64* %ptr, [4 x <1 x i64>] %val.coerce) #2 {
	// CHECK: [[VAL:%.*]] = alloca %struct.poly64x1x4_t, align 8			// CHECK: [[VAL:%.*]] = alloca %struct.poly64x1x4_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.poly64x1x4_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.poly64x1x4_t, align 8
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly64x1x4_t, %struct.poly64x1x4_t* [[VAL]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly64x1x4_t, %struct.poly64x1x4_t* [[VAL]], i32 0, i32 0
	// CHECK: store [4 x <1 x i64>] [[VAL]].coerce, [4 x <1 x i64>]* [[COERCE_DIVE]], align 8			// CHECK: store [4 x <1 x i64>] [[VAL]].coerce, [4 x <1 x i64>]* [[COERCE_DIVE]], align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.poly64x1x4_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.poly64x1x4_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.poly64x1x4_t* [[VAL]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.poly64x1x4_t* [[VAL]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 32, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 [[TMP1]], i64 32, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast i64* %ptr to i8*			// CHECK: [[TMP2:%.*]] = bitcast i64* %ptr to i8*
	[18 unchanged lines omitted]
	// CHECK: [[TMP13:%.*]] = bitcast <8 x i8> [[TMP8]] to <1 x i64>			// CHECK: [[TMP13:%.*]] = bitcast <8 x i8> [[TMP8]] to <1 x i64>
	// CHECK: [[TMP14:%.*]] = bitcast <8 x i8> [[TMP10]] to <1 x i64>			// CHECK: [[TMP14:%.*]] = bitcast <8 x i8> [[TMP10]] to <1 x i64>
	// CHECK: call void @llvm.aarch64.neon.st4.v1i64.p0i8(<1 x i64> [[TMP11]], <1 x i64> [[TMP12]], <1 x i64> [[TMP13]], <1 x i64> [[TMP14]], i8* [[TMP2]])			// CHECK: call void @llvm.aarch64.neon.st4.v1i64.p0i8(<1 x i64> [[TMP11]], <1 x i64> [[TMP12]], <1 x i64> [[TMP13]], <1 x i64> [[TMP14]], i8* [[TMP2]])
	// CHECK: ret void			// CHECK: ret void
	void test_vst4_p64(poly64_t * ptr, poly64x1x4_t val) {			void test_vst4_p64(poly64_t * ptr, poly64x1x4_t val) {
	return vst4_p64(ptr, val);			return vst4_p64(ptr, val);
	}			}

	// CHECK-LABEL: define void @test_vst4q_p64(i64* %ptr, [4 x <2 x i64>] %val.coerce) #0 {			// CHECK-LABEL: define void @test_vst4q_p64(i64* %ptr, [4 x <2 x i64>] %val.coerce) #2 {
	// CHECK: [[VAL:%.*]] = alloca %struct.poly64x2x4_t, align 16			// CHECK: [[VAL:%.*]] = alloca %struct.poly64x2x4_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.poly64x2x4_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.poly64x2x4_t, align 16
	// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly64x2x4_t, %struct.poly64x2x4_t* [[VAL]], i32 0, i32 0			// CHECK: [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.poly64x2x4_t, %struct.poly64x2x4_t* [[VAL]], i32 0, i32 0
	// CHECK: store [4 x <2 x i64>] [[VAL]].coerce, [4 x <2 x i64>]* [[COERCE_DIVE]], align 16			// CHECK: store [4 x <2 x i64>] [[VAL]].coerce, [4 x <2 x i64>]* [[COERCE_DIVE]], align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.poly64x2x4_t* [[__S1]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.poly64x2x4_t* [[__S1]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast %struct.poly64x2x4_t* [[VAL]] to i8*			// CHECK: [[TMP1:%.*]] = bitcast %struct.poly64x2x4_t* [[VAL]] to i8*
	// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 64, i1 false)			// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 [[TMP0]], i8* align 16 [[TMP1]], i64 64, i1 false)
	// CHECK: [[TMP2:%.*]] = bitcast i64* %ptr to i8*			// CHECK: [[TMP2:%.*]] = bitcast i64* %ptr to i8*
	[30 unchanged lines omitted]
	// CHECK: [[TMP3:%.*]] = bitcast <8 x i8> [[TMP1]] to <1 x i64>			// CHECK: [[TMP3:%.*]] = bitcast <8 x i8> [[TMP1]] to <1 x i64>
	// CHECK: [[VEXT:%.*]] = shufflevector <1 x i64> [[TMP2]], <1 x i64> [[TMP3]], <1 x i32> zeroinitializer			// CHECK: [[VEXT:%.*]] = shufflevector <1 x i64> [[TMP2]], <1 x i64> [[TMP3]], <1 x i32> zeroinitializer
	// CHECK: ret <1 x i64> [[VEXT]]			// CHECK: ret <1 x i64> [[VEXT]]
	poly64x1_t test_vext_p64(poly64x1_t a, poly64x1_t b) {			poly64x1_t test_vext_p64(poly64x1_t a, poly64x1_t b) {
	return vext_u64(a, b, 0);			return vext_u64(a, b, 0);

	}			}

	// CHECK-LABEL: define <2 x i64> @test_vextq_p64(<2 x i64> %a, <2 x i64> %b) #0 {			// CHECK-LABEL: define <2 x i64> @test_vextq_p64(<2 x i64> %a, <2 x i64> %b) #1 {
	// CHECK: [[TMP0:%.*]] = bitcast <2 x i64> %a to <16 x i8>			// CHECK: [[TMP0:%.*]] = bitcast <2 x i64> %a to <16 x i8>
	// CHECK: [[TMP1:%.*]] = bitcast <2 x i64> %b to <16 x i8>			// CHECK: [[TMP1:%.*]] = bitcast <2 x i64> %b to <16 x i8>
	// CHECK: [[TMP2:%.*]] = bitcast <16 x i8> [[TMP0]] to <2 x i64>			// CHECK: [[TMP2:%.*]] = bitcast <16 x i8> [[TMP0]] to <2 x i64>
	// CHECK: [[TMP3:%.*]] = bitcast <16 x i8> [[TMP1]] to <2 x i64>			// CHECK: [[TMP3:%.*]] = bitcast <16 x i8> [[TMP1]] to <2 x i64>
	// CHECK: [[VEXT:%.*]] = shufflevector <2 x i64> [[TMP2]], <2 x i64> [[TMP3]], <2 x i32> <i32 1, i32 2>			// CHECK: [[VEXT:%.*]] = shufflevector <2 x i64> [[TMP2]], <2 x i64> [[TMP3]], <2 x i32> <i32 1, i32 2>
	// CHECK: ret <2 x i64> [[VEXT]]			// CHECK: ret <2 x i64> [[VEXT]]
	poly64x2_t test_vextq_p64(poly64x2_t a, poly64x2_t b) {			poly64x2_t test_vextq_p64(poly64x2_t a, poly64x2_t b) {
	return vextq_p64(a, b, 1);			return vextq_p64(a, b, 1);
	}			}

	// CHECK-LABEL: define <2 x i64> @test_vzip1q_p64(<2 x i64> %a, <2 x i64> %b) #0 {			// CHECK-LABEL: define <2 x i64> @test_vzip1q_p64(<2 x i64> %a, <2 x i64> %b) #1 {
	// CHECK: [[SHUFFLE_I:%.*]] = shufflevector <2 x i64> %a, <2 x i64> %b, <2 x i32> <i32 0, i32 2>			// CHECK: [[SHUFFLE_I:%.*]] = shufflevector <2 x i64> %a, <2 x i64> %b, <2 x i32> <i32 0, i32 2>
	// CHECK: ret <2 x i64> [[SHUFFLE_I]]			// CHECK: ret <2 x i64> [[SHUFFLE_I]]
	poly64x2_t test_vzip1q_p64(poly64x2_t a, poly64x2_t b) {			poly64x2_t test_vzip1q_p64(poly64x2_t a, poly64x2_t b) {
	return vzip1q_p64(a, b);			return vzip1q_p64(a, b);
	}			}

	// CHECK-LABEL: define <2 x i64> @test_vzip2q_p64(<2 x i64> %a, <2 x i64> %b) #0 {			// CHECK-LABEL: define <2 x i64> @test_vzip2q_p64(<2 x i64> %a, <2 x i64> %b) #1 {
	// CHECK: [[SHUFFLE_I:%.*]] = shufflevector <2 x i64> %a, <2 x i64> %b, <2 x i32> <i32 1, i32 3>			// CHECK: [[SHUFFLE_I:%.*]] = shufflevector <2 x i64> %a, <2 x i64> %b, <2 x i32> <i32 1, i32 3>
	// CHECK: ret <2 x i64> [[SHUFFLE_I]]			// CHECK: ret <2 x i64> [[SHUFFLE_I]]
	poly64x2_t test_vzip2q_p64(poly64x2_t a, poly64x2_t b) {			poly64x2_t test_vzip2q_p64(poly64x2_t a, poly64x2_t b) {
	return vzip2q_u64(a, b);			return vzip2q_u64(a, b);
	}			}

	// CHECK-LABEL: define <2 x i64> @test_vuzp1q_p64(<2 x i64> %a, <2 x i64> %b) #0 {			// CHECK-LABEL: define <2 x i64> @test_vuzp1q_p64(<2 x i64> %a, <2 x i64> %b) #1 {
	// CHECK: [[SHUFFLE_I:%.*]] = shufflevector <2 x i64> %a, <2 x i64> %b, <2 x i32> <i32 0, i32 2>			// CHECK: [[SHUFFLE_I:%.*]] = shufflevector <2 x i64> %a, <2 x i64> %b, <2 x i32> <i32 0, i32 2>
	// CHECK: ret <2 x i64> [[SHUFFLE_I]]			// CHECK: ret <2 x i64> [[SHUFFLE_I]]
	poly64x2_t test_vuzp1q_p64(poly64x2_t a, poly64x2_t b) {			poly64x2_t test_vuzp1q_p64(poly64x2_t a, poly64x2_t b) {
	return vuzp1q_p64(a, b);			return vuzp1q_p64(a, b);
	}			}

	// CHECK-LABEL: define <2 x i64> @test_vuzp2q_p64(<2 x i64> %a, <2 x i64> %b) #0 {			// CHECK-LABEL: define <2 x i64> @test_vuzp2q_p64(<2 x i64> %a, <2 x i64> %b) #1 {
	// CHECK: [[SHUFFLE_I:%.*]] = shufflevector <2 x i64> %a, <2 x i64> %b, <2 x i32> <i32 1, i32 3>			// CHECK: [[SHUFFLE_I:%.*]] = shufflevector <2 x i64> %a, <2 x i64> %b, <2 x i32> <i32 1, i32 3>
	// CHECK: ret <2 x i64> [[SHUFFLE_I]]			// CHECK: ret <2 x i64> [[SHUFFLE_I]]
	poly64x2_t test_vuzp2q_p64(poly64x2_t a, poly64x2_t b) {			poly64x2_t test_vuzp2q_p64(poly64x2_t a, poly64x2_t b) {
	return vuzp2q_u64(a, b);			return vuzp2q_u64(a, b);
	}			}

	// CHECK-LABEL: define <2 x i64> @test_vtrn1q_p64(<2 x i64> %a, <2 x i64> %b) #0 {			// CHECK-LABEL: define <2 x i64> @test_vtrn1q_p64(<2 x i64> %a, <2 x i64> %b) #1 {
	// CHECK: [[SHUFFLE_I:%.*]] = shufflevector <2 x i64> %a, <2 x i64> %b, <2 x i32> <i32 0, i32 2>			// CHECK: [[SHUFFLE_I:%.*]] = shufflevector <2 x i64> %a, <2 x i64> %b, <2 x i32> <i32 0, i32 2>
	// CHECK: ret <2 x i64> [[SHUFFLE_I]]			// CHECK: ret <2 x i64> [[SHUFFLE_I]]
	poly64x2_t test_vtrn1q_p64(poly64x2_t a, poly64x2_t b) {			poly64x2_t test_vtrn1q_p64(poly64x2_t a, poly64x2_t b) {
	return vtrn1q_p64(a, b);			return vtrn1q_p64(a, b);
	}			}

	// CHECK-LABEL: define <2 x i64> @test_vtrn2q_p64(<2 x i64> %a, <2 x i64> %b) #0 {			// CHECK-LABEL: define <2 x i64> @test_vtrn2q_p64(<2 x i64> %a, <2 x i64> %b) #1 {
	// CHECK: [[SHUFFLE_I:%.*]] = shufflevector <2 x i64> %a, <2 x i64> %b, <2 x i32> <i32 1, i32 3>			// CHECK: [[SHUFFLE_I:%.*]] = shufflevector <2 x i64> %a, <2 x i64> %b, <2 x i32> <i32 1, i32 3>
	// CHECK: ret <2 x i64> [[SHUFFLE_I]]			// CHECK: ret <2 x i64> [[SHUFFLE_I]]
	poly64x2_t test_vtrn2q_p64(poly64x2_t a, poly64x2_t b) {			poly64x2_t test_vtrn2q_p64(poly64x2_t a, poly64x2_t b) {
	return vtrn2q_u64(a, b);			return vtrn2q_u64(a, b);
	}			}

	// CHECK-LABEL: define <1 x i64> @test_vsri_n_p64(<1 x i64> %a, <1 x i64> %b) #0 {			// CHECK-LABEL: define <1 x i64> @test_vsri_n_p64(<1 x i64> %a, <1 x i64> %b) #0 {
	// CHECK: [[TMP0:%.*]] = bitcast <1 x i64> %a to <8 x i8>			// CHECK: [[TMP0:%.*]] = bitcast <1 x i64> %a to <8 x i8>
	// CHECK: [[TMP1:%.*]] = bitcast <1 x i64> %b to <8 x i8>			// CHECK: [[TMP1:%.*]] = bitcast <1 x i64> %b to <8 x i8>
	// CHECK: [[VSRI_N:%.*]] = bitcast <8 x i8> [[TMP0]] to <1 x i64>			// CHECK: [[VSRI_N:%.*]] = bitcast <8 x i8> [[TMP0]] to <1 x i64>
	// CHECK: [[VSRI_N1:%.*]] = bitcast <8 x i8> [[TMP1]] to <1 x i64>			// CHECK: [[VSRI_N1:%.*]] = bitcast <8 x i8> [[TMP1]] to <1 x i64>
	// CHECK: [[VSRI_N2:%.*]] = call <1 x i64> @llvm.aarch64.neon.vsri.v1i64(<1 x i64> [[VSRI_N]], <1 x i64> [[VSRI_N1]], i32 33)			// CHECK: [[VSRI_N2:%.*]] = call <1 x i64> @llvm.aarch64.neon.vsri.v1i64(<1 x i64> [[VSRI_N]], <1 x i64> [[VSRI_N1]], i32 33)
	// CHECK: ret <1 x i64> [[VSRI_N2]]			// CHECK: ret <1 x i64> [[VSRI_N2]]
	poly64x1_t test_vsri_n_p64(poly64x1_t a, poly64x1_t b) {			poly64x1_t test_vsri_n_p64(poly64x1_t a, poly64x1_t b) {
	return vsri_n_p64(a, b, 33);			return vsri_n_p64(a, b, 33);
	}			}

	// CHECK-LABEL: define <2 x i64> @test_vsriq_n_p64(<2 x i64> %a, <2 x i64> %b) #0 {			// CHECK-LABEL: define <2 x i64> @test_vsriq_n_p64(<2 x i64> %a, <2 x i64> %b) #1 {
	// CHECK: [[TMP0:%.*]] = bitcast <2 x i64> %a to <16 x i8>			// CHECK: [[TMP0:%.*]] = bitcast <2 x i64> %a to <16 x i8>
	// CHECK: [[TMP1:%.*]] = bitcast <2 x i64> %b to <16 x i8>			// CHECK: [[TMP1:%.*]] = bitcast <2 x i64> %b to <16 x i8>
	// CHECK: [[VSRI_N:%.*]] = bitcast <16 x i8> [[TMP0]] to <2 x i64>			// CHECK: [[VSRI_N:%.*]] = bitcast <16 x i8> [[TMP0]] to <2 x i64>
	// CHECK: [[VSRI_N1:%.*]] = bitcast <16 x i8> [[TMP1]] to <2 x i64>			// CHECK: [[VSRI_N1:%.*]] = bitcast <16 x i8> [[TMP1]] to <2 x i64>
	// CHECK: [[VSRI_N2:%.*]] = call <2 x i64> @llvm.aarch64.neon.vsri.v2i64(<2 x i64> [[VSRI_N]], <2 x i64> [[VSRI_N1]], i32 64)			// CHECK: [[VSRI_N2:%.*]] = call <2 x i64> @llvm.aarch64.neon.vsri.v2i64(<2 x i64> [[VSRI_N]], <2 x i64> [[VSRI_N1]], i32 64)
	// CHECK: ret <2 x i64> [[VSRI_N2]]			// CHECK: ret <2 x i64> [[VSRI_N2]]
	poly64x2_t test_vsriq_n_p64(poly64x2_t a, poly64x2_t b) {			poly64x2_t test_vsriq_n_p64(poly64x2_t a, poly64x2_t b) {
	return vsriq_n_p64(a, b, 64);			return vsriq_n_p64(a, b, 64);
	}			}

				// CHECK: attributes #0 ={{.*}}"min-legal-vector-width"="64"
				// CHECK: attributes #1 ={{.*}}"min-legal-vector-width"="128"
				// CHECK-NOT: attributes #2 ={{.*}}"min-legal-vector-width"

test/CodeGen/arm-neon-fma.c

	// RUN: %clang_cc1 -triple thumbv7-none-linux-gnueabihf \			// RUN: %clang_cc1 -triple thumbv7-none-linux-gnueabihf \
	// RUN: -target-abi aapcs \			// RUN: -target-abi aapcs \
	// RUN: -target-cpu cortex-a7 \			// RUN: -target-cpu cortex-a7 \
	// RUN: -mfloat-abi hard \			// RUN: -mfloat-abi hard \
	// RUN: -ffreestanding \			// RUN: -ffreestanding \
	// RUN: -disable-O0-optnone -emit-llvm -o - %s | opt -S -mem2reg | FileCheck %s			// RUN: -disable-O0-optnone -emit-llvm -o - %s | opt -S -mem2reg | FileCheck %s

	#include <arm_neon.h>			#include <arm_neon.h>

	// CHECK-LABEL: define <2 x float> @test_fma_order(<2 x float> %accum, <2 x float> %lhs, <2 x float> %rhs) #0 {			// CHECK-LABEL: define <2 x float> @test_fma_order(<2 x float> %accum, <2 x float> %lhs, <2 x float> %rhs) #0 {
	// CHECK: [[TMP6:%.*]] = call <2 x float> @llvm.fma.v2f32(<2 x float> %lhs, <2 x float> %rhs, <2 x float> %accum) #2			// CHECK: [[TMP6:%.*]] = call <2 x float> @llvm.fma.v2f32(<2 x float> %lhs, <2 x float> %rhs, <2 x float> %accum) #3
	// CHECK: ret <2 x float> [[TMP6]]			// CHECK: ret <2 x float> [[TMP6]]
	float32x2_t test_fma_order(float32x2_t accum, float32x2_t lhs, float32x2_t rhs) {			float32x2_t test_fma_order(float32x2_t accum, float32x2_t lhs, float32x2_t rhs) {
	return vfma_f32(accum, lhs, rhs);			return vfma_f32(accum, lhs, rhs);
	}			}

	// CHECK-LABEL: define <4 x float> @test_fmaq_order(<4 x float> %accum, <4 x float> %lhs, <4 x float> %rhs) #0 {			// CHECK-LABEL: define <4 x float> @test_fmaq_order(<4 x float> %accum, <4 x float> %lhs, <4 x float> %rhs) #1 {
	// CHECK: [[TMP6:%.*]] = call <4 x float> @llvm.fma.v4f32(<4 x float> %lhs, <4 x float> %rhs, <4 x float> %accum) #2			// CHECK: [[TMP6:%.*]] = call <4 x float> @llvm.fma.v4f32(<4 x float> %lhs, <4 x float> %rhs, <4 x float> %accum) #3
	// CHECK: ret <4 x float> [[TMP6]]			// CHECK: ret <4 x float> [[TMP6]]
	float32x4_t test_fmaq_order(float32x4_t accum, float32x4_t lhs, float32x4_t rhs) {			float32x4_t test_fmaq_order(float32x4_t accum, float32x4_t lhs, float32x4_t rhs) {
	return vfmaq_f32(accum, lhs, rhs);			return vfmaq_f32(accum, lhs, rhs);
	}			}

	// CHECK-LABEL: define <2 x float> @test_vfma_n_f32(<2 x float> %a, <2 x float> %b, float %n) #0 {			// CHECK-LABEL: define <2 x float> @test_vfma_n_f32(<2 x float> %a, <2 x float> %b, float %n) #0 {
	// CHECK: [[VECINIT_I:%.*]] = insertelement <2 x float> undef, float %n, i32 0			// CHECK: [[VECINIT_I:%.*]] = insertelement <2 x float> undef, float %n, i32 0
	// CHECK: [[VECINIT1_I:%.*]] = insertelement <2 x float> [[VECINIT_I]], float %n, i32 1			// CHECK: [[VECINIT1_I:%.*]] = insertelement <2 x float> [[VECINIT_I]], float %n, i32 1
	// CHECK: [[TMP1:%.*]] = bitcast <2 x float> %b to <8 x i8>			// CHECK: [[TMP1:%.*]] = bitcast <2 x float> %b to <8 x i8>
	// CHECK: [[TMP2:%.*]] = bitcast <2 x float> [[VECINIT1_I]] to <8 x i8>			// CHECK: [[TMP2:%.*]] = bitcast <2 x float> [[VECINIT1_I]] to <8 x i8>
	// CHECK: [[TMP3:%.*]] = call <2 x float> @llvm.fma.v2f32(<2 x float> %b, <2 x float> [[VECINIT1_I]], <2 x float> %a)			// CHECK: [[TMP3:%.*]] = call <2 x float> @llvm.fma.v2f32(<2 x float> %b, <2 x float> [[VECINIT1_I]], <2 x float> %a)
	// CHECK: ret <2 x float> [[TMP3]]			// CHECK: ret <2 x float> [[TMP3]]
	float32x2_t test_vfma_n_f32(float32x2_t a, float32x2_t b, float32_t n) {			float32x2_t test_vfma_n_f32(float32x2_t a, float32x2_t b, float32_t n) {
	return vfma_n_f32(a, b, n);			return vfma_n_f32(a, b, n);
	}			}

	// CHECK-LABEL: define <4 x float> @test_vfmaq_n_f32(<4 x float> %a, <4 x float> %b, float %n) #0 {			// CHECK-LABEL: define <4 x float> @test_vfmaq_n_f32(<4 x float> %a, <4 x float> %b, float %n) #1 {
	// CHECK: [[VECINIT_I:%.*]] = insertelement <4 x float> undef, float %n, i32 0			// CHECK: [[VECINIT_I:%.*]] = insertelement <4 x float> undef, float %n, i32 0
	// CHECK: [[VECINIT1_I:%.*]] = insertelement <4 x float> [[VECINIT_I]], float %n, i32 1			// CHECK: [[VECINIT1_I:%.*]] = insertelement <4 x float> [[VECINIT_I]], float %n, i32 1
	// CHECK: [[VECINIT2_I:%.*]] = insertelement <4 x float> [[VECINIT1_I]], float %n, i32 2			// CHECK: [[VECINIT2_I:%.*]] = insertelement <4 x float> [[VECINIT1_I]], float %n, i32 2
	// CHECK: [[VECINIT3_I:%.*]] = insertelement <4 x float> [[VECINIT2_I]], float %n, i32 3			// CHECK: [[VECINIT3_I:%.*]] = insertelement <4 x float> [[VECINIT2_I]], float %n, i32 3
	// CHECK: [[TMP1:%.*]] = bitcast <4 x float> %b to <16 x i8>			// CHECK: [[TMP1:%.*]] = bitcast <4 x float> %b to <16 x i8>
	// CHECK: [[TMP2:%.*]] = bitcast <4 x float> [[VECINIT3_I]] to <16 x i8>			// CHECK: [[TMP2:%.*]] = bitcast <4 x float> [[VECINIT3_I]] to <16 x i8>
	// CHECK: [[TMP3:%.*]] = call <4 x float> @llvm.fma.v4f32(<4 x float> %b, <4 x float> [[VECINIT3_I]], <4 x float> %a)			// CHECK: [[TMP3:%.*]] = call <4 x float> @llvm.fma.v4f32(<4 x float> %b, <4 x float> [[VECINIT3_I]], <4 x float> %a)
	// CHECK: ret <4 x float> [[TMP3]]			// CHECK: ret <4 x float> [[TMP3]]
	float32x4_t test_vfmaq_n_f32(float32x4_t a, float32x4_t b, float32_t n) {			float32x4_t test_vfmaq_n_f32(float32x4_t a, float32x4_t b, float32_t n) {
	return vfmaq_n_f32(a, b, n);			return vfmaq_n_f32(a, b, n);
	}			}

				// CHECK: attributes #0 ={{.*}}"min-legal-vector-width"="64"
				// CHECK: attributes #1 ={{.*}}"min-legal-vector-width"="128"

test/CodeGen/arm-neon-numeric-maxmin.c

	// RUN: %clang_cc1 -triple thumbv8-linux-gnueabihf -target-cpu cortex-a57 -ffreestanding -disable-O0-optnone -emit-llvm %s -o - | opt -S -mem2reg | FileCheck %s			// RUN: %clang_cc1 -triple thumbv8-linux-gnueabihf -target-cpu cortex-a57 -ffreestanding -disable-O0-optnone -emit-llvm %s -o - | opt -S -mem2reg | FileCheck %s

	#include <arm_neon.h>			#include <arm_neon.h>

	// CHECK-LABEL: define <2 x float> @test_vmaxnm_f32(<2 x float> %a, <2 x float> %b) #0 {			// CHECK-LABEL: define <2 x float> @test_vmaxnm_f32(<2 x float> %a, <2 x float> %b) #0 {
	// CHECK: [[VMAXNM_V2_I:%.*]] = call <2 x float> @llvm.arm.neon.vmaxnm.v2f32(<2 x float> %a, <2 x float> %b) #2			// CHECK: [[VMAXNM_V2_I:%.*]] = call <2 x float> @llvm.arm.neon.vmaxnm.v2f32(<2 x float> %a, <2 x float> %b) #3
	// CHECK: ret <2 x float> [[VMAXNM_V2_I]]			// CHECK: ret <2 x float> [[VMAXNM_V2_I]]
	float32x2_t test_vmaxnm_f32(float32x2_t a, float32x2_t b) {			float32x2_t test_vmaxnm_f32(float32x2_t a, float32x2_t b) {
	return vmaxnm_f32(a, b);			return vmaxnm_f32(a, b);
	}			}

	// CHECK-LABEL: define <4 x float> @test_vmaxnmq_f32(<4 x float> %a, <4 x float> %b) #0 {			// CHECK-LABEL: define <4 x float> @test_vmaxnmq_f32(<4 x float> %a, <4 x float> %b) #1 {
	// CHECK: [[VMAXNMQ_V2_I:%.*]] = call <4 x float> @llvm.arm.neon.vmaxnm.v4f32(<4 x float> %a, <4 x float> %b) #2			// CHECK: [[VMAXNMQ_V2_I:%.*]] = call <4 x float> @llvm.arm.neon.vmaxnm.v4f32(<4 x float> %a, <4 x float> %b) #3
	// CHECK: ret <4 x float> [[VMAXNMQ_V2_I]]			// CHECK: ret <4 x float> [[VMAXNMQ_V2_I]]
	float32x4_t test_vmaxnmq_f32(float32x4_t a, float32x4_t b) {			float32x4_t test_vmaxnmq_f32(float32x4_t a, float32x4_t b) {
	return vmaxnmq_f32(a, b);			return vmaxnmq_f32(a, b);
	}			}

	// CHECK-LABEL: define <2 x float> @test_vminnm_f32(<2 x float> %a, <2 x float> %b) #0 {			// CHECK-LABEL: define <2 x float> @test_vminnm_f32(<2 x float> %a, <2 x float> %b) #0 {
	// CHECK: [[VMINNM_V2_I:%.*]] = call <2 x float> @llvm.arm.neon.vminnm.v2f32(<2 x float> %a, <2 x float> %b) #2			// CHECK: [[VMINNM_V2_I:%.*]] = call <2 x float> @llvm.arm.neon.vminnm.v2f32(<2 x float> %a, <2 x float> %b) #3
	// CHECK: ret <2 x float> [[VMINNM_V2_I]]			// CHECK: ret <2 x float> [[VMINNM_V2_I]]
	float32x2_t test_vminnm_f32(float32x2_t a, float32x2_t b) {			float32x2_t test_vminnm_f32(float32x2_t a, float32x2_t b) {
	return vminnm_f32(a, b);			return vminnm_f32(a, b);
	}			}

	// CHECK-LABEL: define <4 x float> @test_vminnmq_f32(<4 x float> %a, <4 x float> %b) #0 {			// CHECK-LABEL: define <4 x float> @test_vminnmq_f32(<4 x float> %a, <4 x float> %b) #1 {
	// CHECK: [[VMINNMQ_V2_I:%.*]] = call <4 x float> @llvm.arm.neon.vminnm.v4f32(<4 x float> %a, <4 x float> %b) #2			// CHECK: [[VMINNMQ_V2_I:%.*]] = call <4 x float> @llvm.arm.neon.vminnm.v4f32(<4 x float> %a, <4 x float> %b) #3
	// CHECK: ret <4 x float> [[VMINNMQ_V2_I]]			// CHECK: ret <4 x float> [[VMINNMQ_V2_I]]
	float32x4_t test_vminnmq_f32(float32x4_t a, float32x4_t b) {			float32x4_t test_vminnmq_f32(float32x4_t a, float32x4_t b) {
	return vminnmq_f32(a, b);			return vminnmq_f32(a, b);
	}			}

				// CHECK: attributes #0 ={{.*}}"min-legal-vector-width"="64"
				// CHECK: attributes #1 ={{.*}}"min-legal-vector-width"="128"

test/CodeGen/arm-neon-vcvtX.c

	// RUN: %clang_cc1 -triple thumbv8-linux-gnueabihf -target-cpu cortex-a57 -ffreestanding -disable-O0-optnone -emit-llvm %s -o - | opt -S -mem2reg | FileCheck %s			// RUN: %clang_cc1 -triple thumbv8-linux-gnueabihf -target-cpu cortex-a57 -ffreestanding -disable-O0-optnone -emit-llvm %s -o - | opt -S -mem2reg | FileCheck %s

	#include <arm_neon.h>			#include <arm_neon.h>

	// CHECK-LABEL: define <2 x i32> @test_vcvta_s32_f32(<2 x float> %a) #0 {			// CHECK-LABEL: define <2 x i32> @test_vcvta_s32_f32(<2 x float> %a) #0 {
	// CHECK: [[VCVTA_S32_V1_I:%.*]] = call <2 x i32> @llvm.arm.neon.vcvtas.v2i32.v2f32(<2 x float> %a) #2			// CHECK: [[VCVTA_S32_V1_I:%.*]] = call <2 x i32> @llvm.arm.neon.vcvtas.v2i32.v2f32(<2 x float> %a) #3
	// CHECK: ret <2 x i32> [[VCVTA_S32_V1_I]]			// CHECK: ret <2 x i32> [[VCVTA_S32_V1_I]]
	int32x2_t test_vcvta_s32_f32(float32x2_t a) {			int32x2_t test_vcvta_s32_f32(float32x2_t a) {
	return vcvta_s32_f32(a);			return vcvta_s32_f32(a);
	}			}

	// CHECK-LABEL: define <2 x i32> @test_vcvta_u32_f32(<2 x float> %a) #0 {			// CHECK-LABEL: define <2 x i32> @test_vcvta_u32_f32(<2 x float> %a) #0 {
	// CHECK: [[VCVTA_U32_V1_I:%.*]] = call <2 x i32> @llvm.arm.neon.vcvtau.v2i32.v2f32(<2 x float> %a) #2			// CHECK: [[VCVTA_U32_V1_I:%.*]] = call <2 x i32> @llvm.arm.neon.vcvtau.v2i32.v2f32(<2 x float> %a) #3
	// CHECK: ret <2 x i32> [[VCVTA_U32_V1_I]]			// CHECK: ret <2 x i32> [[VCVTA_U32_V1_I]]
	uint32x2_t test_vcvta_u32_f32(float32x2_t a) {			uint32x2_t test_vcvta_u32_f32(float32x2_t a) {
	return vcvta_u32_f32(a);			return vcvta_u32_f32(a);
	}			}

	// CHECK-LABEL: define <4 x i32> @test_vcvtaq_s32_f32(<4 x float> %a) #0 {			// CHECK-LABEL: define <4 x i32> @test_vcvtaq_s32_f32(<4 x float> %a) #1 {
	// CHECK: [[VCVTAQ_S32_V1_I:%.*]] = call <4 x i32> @llvm.arm.neon.vcvtas.v4i32.v4f32(<4 x float> %a) #2			// CHECK: [[VCVTAQ_S32_V1_I:%.*]] = call <4 x i32> @llvm.arm.neon.vcvtas.v4i32.v4f32(<4 x float> %a) #3
	// CHECK: ret <4 x i32> [[VCVTAQ_S32_V1_I]]			// CHECK: ret <4 x i32> [[VCVTAQ_S32_V1_I]]
	int32x4_t test_vcvtaq_s32_f32(float32x4_t a) {			int32x4_t test_vcvtaq_s32_f32(float32x4_t a) {
	return vcvtaq_s32_f32(a);			return vcvtaq_s32_f32(a);
	}			}

	// CHECK-LABEL: define <4 x i32> @test_vcvtaq_u32_f32(<4 x float> %a) #0 {			// CHECK-LABEL: define <4 x i32> @test_vcvtaq_u32_f32(<4 x float> %a) #1 {
	// CHECK: [[VCVTAQ_U32_V1_I:%.*]] = call <4 x i32> @llvm.arm.neon.vcvtau.v4i32.v4f32(<4 x float> %a) #2			// CHECK: [[VCVTAQ_U32_V1_I:%.*]] = call <4 x i32> @llvm.arm.neon.vcvtau.v4i32.v4f32(<4 x float> %a) #3
	// CHECK: ret <4 x i32> [[VCVTAQ_U32_V1_I]]			// CHECK: ret <4 x i32> [[VCVTAQ_U32_V1_I]]
	uint32x4_t test_vcvtaq_u32_f32(float32x4_t a) {			uint32x4_t test_vcvtaq_u32_f32(float32x4_t a) {
	return vcvtaq_u32_f32(a);			return vcvtaq_u32_f32(a);
	}			}

	// CHECK-LABEL: define <2 x i32> @test_vcvtn_s32_f32(<2 x float> %a) #0 {			// CHECK-LABEL: define <2 x i32> @test_vcvtn_s32_f32(<2 x float> %a) #0 {
	// CHECK: [[VCVTN_S32_V1_I:%.*]] = call <2 x i32> @llvm.arm.neon.vcvtns.v2i32.v2f32(<2 x float> %a) #2			// CHECK: [[VCVTN_S32_V1_I:%.*]] = call <2 x i32> @llvm.arm.neon.vcvtns.v2i32.v2f32(<2 x float> %a) #3
	// CHECK: ret <2 x i32> [[VCVTN_S32_V1_I]]			// CHECK: ret <2 x i32> [[VCVTN_S32_V1_I]]
	int32x2_t test_vcvtn_s32_f32(float32x2_t a) {			int32x2_t test_vcvtn_s32_f32(float32x2_t a) {
	return vcvtn_s32_f32(a);			return vcvtn_s32_f32(a);
	}			}

	// CHECK-LABEL: define <2 x i32> @test_vcvtn_u32_f32(<2 x float> %a) #0 {			// CHECK-LABEL: define <2 x i32> @test_vcvtn_u32_f32(<2 x float> %a) #0 {
	// CHECK: [[VCVTN_U32_V1_I:%.*]] = call <2 x i32> @llvm.arm.neon.vcvtnu.v2i32.v2f32(<2 x float> %a) #2			// CHECK: [[VCVTN_U32_V1_I:%.*]] = call <2 x i32> @llvm.arm.neon.vcvtnu.v2i32.v2f32(<2 x float> %a) #3
	// CHECK: ret <2 x i32> [[VCVTN_U32_V1_I]]			// CHECK: ret <2 x i32> [[VCVTN_U32_V1_I]]
	uint32x2_t test_vcvtn_u32_f32(float32x2_t a) {			uint32x2_t test_vcvtn_u32_f32(float32x2_t a) {
	return vcvtn_u32_f32(a);			return vcvtn_u32_f32(a);
	}			}

	// CHECK-LABEL: define <4 x i32> @test_vcvtnq_s32_f32(<4 x float> %a) #0 {			// CHECK-LABEL: define <4 x i32> @test_vcvtnq_s32_f32(<4 x float> %a) #1 {
	// CHECK: [[VCVTNQ_S32_V1_I:%.*]] = call <4 x i32> @llvm.arm.neon.vcvtns.v4i32.v4f32(<4 x float> %a) #2			// CHECK: [[VCVTNQ_S32_V1_I:%.*]] = call <4 x i32> @llvm.arm.neon.vcvtns.v4i32.v4f32(<4 x float> %a) #3
	// CHECK: ret <4 x i32> [[VCVTNQ_S32_V1_I]]			// CHECK: ret <4 x i32> [[VCVTNQ_S32_V1_I]]
	int32x4_t test_vcvtnq_s32_f32(float32x4_t a) {			int32x4_t test_vcvtnq_s32_f32(float32x4_t a) {
	return vcvtnq_s32_f32(a);			return vcvtnq_s32_f32(a);
	}			}

	// CHECK-LABEL: define <4 x i32> @test_vcvtnq_u32_f32(<4 x float> %a) #0 {			// CHECK-LABEL: define <4 x i32> @test_vcvtnq_u32_f32(<4 x float> %a) #1 {
	// CHECK: [[VCVTNQ_U32_V1_I:%.*]] = call <4 x i32> @llvm.arm.neon.vcvtnu.v4i32.v4f32(<4 x float> %a) #2			// CHECK: [[VCVTNQ_U32_V1_I:%.*]] = call <4 x i32> @llvm.arm.neon.vcvtnu.v4i32.v4f32(<4 x float> %a) #3
	// CHECK: ret <4 x i32> [[VCVTNQ_U32_V1_I]]			// CHECK: ret <4 x i32> [[VCVTNQ_U32_V1_I]]
	uint32x4_t test_vcvtnq_u32_f32(float32x4_t a) {			uint32x4_t test_vcvtnq_u32_f32(float32x4_t a) {
	return vcvtnq_u32_f32(a);			return vcvtnq_u32_f32(a);
	}			}

	// CHECK-LABEL: define <2 x i32> @test_vcvtp_s32_f32(<2 x float> %a) #0 {			// CHECK-LABEL: define <2 x i32> @test_vcvtp_s32_f32(<2 x float> %a) #0 {
	// CHECK: [[VCVTP_S32_V1_I:%.*]] = call <2 x i32> @llvm.arm.neon.vcvtps.v2i32.v2f32(<2 x float> %a) #2			// CHECK: [[VCVTP_S32_V1_I:%.*]] = call <2 x i32> @llvm.arm.neon.vcvtps.v2i32.v2f32(<2 x float> %a) #3
	// CHECK: ret <2 x i32> [[VCVTP_S32_V1_I]]			// CHECK: ret <2 x i32> [[VCVTP_S32_V1_I]]
	int32x2_t test_vcvtp_s32_f32(float32x2_t a) {			int32x2_t test_vcvtp_s32_f32(float32x2_t a) {
	return vcvtp_s32_f32(a);			return vcvtp_s32_f32(a);
	}			}

	// CHECK-LABEL: define <2 x i32> @test_vcvtp_u32_f32(<2 x float> %a) #0 {			// CHECK-LABEL: define <2 x i32> @test_vcvtp_u32_f32(<2 x float> %a) #0 {
	// CHECK: [[VCVTP_U32_V1_I:%.*]] = call <2 x i32> @llvm.arm.neon.vcvtpu.v2i32.v2f32(<2 x float> %a) #2			// CHECK: [[VCVTP_U32_V1_I:%.*]] = call <2 x i32> @llvm.arm.neon.vcvtpu.v2i32.v2f32(<2 x float> %a) #3
	// CHECK: ret <2 x i32> [[VCVTP_U32_V1_I]]			// CHECK: ret <2 x i32> [[VCVTP_U32_V1_I]]
	uint32x2_t test_vcvtp_u32_f32(float32x2_t a) {			uint32x2_t test_vcvtp_u32_f32(float32x2_t a) {
	return vcvtp_u32_f32(a);			return vcvtp_u32_f32(a);
	}			}

	// CHECK-LABEL: define <4 x i32> @test_vcvtpq_s32_f32(<4 x float> %a) #0 {			// CHECK-LABEL: define <4 x i32> @test_vcvtpq_s32_f32(<4 x float> %a) #1 {
	// CHECK: [[VCVTPQ_S32_V1_I:%.*]] = call <4 x i32> @llvm.arm.neon.vcvtps.v4i32.v4f32(<4 x float> %a) #2			// CHECK: [[VCVTPQ_S32_V1_I:%.*]] = call <4 x i32> @llvm.arm.neon.vcvtps.v4i32.v4f32(<4 x float> %a) #3
	// CHECK: ret <4 x i32> [[VCVTPQ_S32_V1_I]]			// CHECK: ret <4 x i32> [[VCVTPQ_S32_V1_I]]
	int32x4_t test_vcvtpq_s32_f32(float32x4_t a) {			int32x4_t test_vcvtpq_s32_f32(float32x4_t a) {
	return vcvtpq_s32_f32(a);			return vcvtpq_s32_f32(a);
	}			}

	// CHECK-LABEL: define <4 x i32> @test_vcvtpq_u32_f32(<4 x float> %a) #0 {			// CHECK-LABEL: define <4 x i32> @test_vcvtpq_u32_f32(<4 x float> %a) #1 {
	// CHECK: [[VCVTPQ_U32_V1_I:%.*]] = call <4 x i32> @llvm.arm.neon.vcvtpu.v4i32.v4f32(<4 x float> %a) #2			// CHECK: [[VCVTPQ_U32_V1_I:%.*]] = call <4 x i32> @llvm.arm.neon.vcvtpu.v4i32.v4f32(<4 x float> %a) #3
	// CHECK: ret <4 x i32> [[VCVTPQ_U32_V1_I]]			// CHECK: ret <4 x i32> [[VCVTPQ_U32_V1_I]]
	uint32x4_t test_vcvtpq_u32_f32(float32x4_t a) {			uint32x4_t test_vcvtpq_u32_f32(float32x4_t a) {
	return vcvtpq_u32_f32(a);			return vcvtpq_u32_f32(a);
	}			}

	// CHECK-LABEL: define <2 x i32> @test_vcvtm_s32_f32(<2 x float> %a) #0 {			// CHECK-LABEL: define <2 x i32> @test_vcvtm_s32_f32(<2 x float> %a) #0 {
	// CHECK: [[VCVTM_S32_V1_I:%.*]] = call <2 x i32> @llvm.arm.neon.vcvtms.v2i32.v2f32(<2 x float> %a) #2			// CHECK: [[VCVTM_S32_V1_I:%.*]] = call <2 x i32> @llvm.arm.neon.vcvtms.v2i32.v2f32(<2 x float> %a) #3
	// CHECK: ret <2 x i32> [[VCVTM_S32_V1_I]]			// CHECK: ret <2 x i32> [[VCVTM_S32_V1_I]]
	int32x2_t test_vcvtm_s32_f32(float32x2_t a) {			int32x2_t test_vcvtm_s32_f32(float32x2_t a) {
	return vcvtm_s32_f32(a);			return vcvtm_s32_f32(a);
	}			}

	// CHECK-LABEL: define <2 x i32> @test_vcvtm_u32_f32(<2 x float> %a) #0 {			// CHECK-LABEL: define <2 x i32> @test_vcvtm_u32_f32(<2 x float> %a) #0 {
	// CHECK: [[VCVTM_U32_V1_I:%.*]] = call <2 x i32> @llvm.arm.neon.vcvtmu.v2i32.v2f32(<2 x float> %a) #2			// CHECK: [[VCVTM_U32_V1_I:%.*]] = call <2 x i32> @llvm.arm.neon.vcvtmu.v2i32.v2f32(<2 x float> %a) #3
	// CHECK: ret <2 x i32> [[VCVTM_U32_V1_I]]			// CHECK: ret <2 x i32> [[VCVTM_U32_V1_I]]
	uint32x2_t test_vcvtm_u32_f32(float32x2_t a) {			uint32x2_t test_vcvtm_u32_f32(float32x2_t a) {
	return vcvtm_u32_f32(a);			return vcvtm_u32_f32(a);
	}			}

	// CHECK-LABEL: define <4 x i32> @test_vcvtmq_s32_f32(<4 x float> %a) #0 {			// CHECK-LABEL: define <4 x i32> @test_vcvtmq_s32_f32(<4 x float> %a) #1 {
	// CHECK: [[VCVTMQ_S32_V1_I:%.*]] = call <4 x i32> @llvm.arm.neon.vcvtms.v4i32.v4f32(<4 x float> %a) #2			// CHECK: [[VCVTMQ_S32_V1_I:%.*]] = call <4 x i32> @llvm.arm.neon.vcvtms.v4i32.v4f32(<4 x float> %a) #3
	// CHECK: ret <4 x i32> [[VCVTMQ_S32_V1_I]]			// CHECK: ret <4 x i32> [[VCVTMQ_S32_V1_I]]
	int32x4_t test_vcvtmq_s32_f32(float32x4_t a) {			int32x4_t test_vcvtmq_s32_f32(float32x4_t a) {
	return vcvtmq_s32_f32(a);			return vcvtmq_s32_f32(a);
	}			}

	// CHECK-LABEL: define <4 x i32> @test_vcvtmq_u32_f32(<4 x float> %a) #0 {			// CHECK-LABEL: define <4 x i32> @test_vcvtmq_u32_f32(<4 x float> %a) #1 {
	// CHECK: [[VCVTMQ_U32_V1_I:%.*]] = call <4 x i32> @llvm.arm.neon.vcvtmu.v4i32.v4f32(<4 x float> %a) #2			// CHECK: [[VCVTMQ_U32_V1_I:%.*]] = call <4 x i32> @llvm.arm.neon.vcvtmu.v4i32.v4f32(<4 x float> %a) #3
	// CHECK: ret <4 x i32> [[VCVTMQ_U32_V1_I]]			// CHECK: ret <4 x i32> [[VCVTMQ_U32_V1_I]]
	uint32x4_t test_vcvtmq_u32_f32(float32x4_t a) {			uint32x4_t test_vcvtmq_u32_f32(float32x4_t a) {
	return vcvtmq_u32_f32(a);			return vcvtmq_u32_f32(a);
	}			}

				// CHECK: attributes #0 ={{.*}}"min-legal-vector-width"="64"
				// CHECK: attributes #1 ={{.*}}"min-legal-vector-width"="128"

test/CodeGen/arm64_vdupq_n_f64.c

	(38 lines not shown)
	// CHECK-LABEL: define <2 x double> @test_vmovq_n_f64(double %w) #0 {			// CHECK-LABEL: define <2 x double> @test_vmovq_n_f64(double %w) #0 {
	// CHECK: [[VECINIT_I:%.*]] = insertelement <2 x double> undef, double %w, i32 0			// CHECK: [[VECINIT_I:%.*]] = insertelement <2 x double> undef, double %w, i32 0
	// CHECK: [[VECINIT1_I:%.*]] = insertelement <2 x double> [[VECINIT_I]], double %w, i32 1			// CHECK: [[VECINIT1_I:%.*]] = insertelement <2 x double> [[VECINIT_I]], double %w, i32 1
	// CHECK: ret <2 x double> [[VECINIT1_I]]			// CHECK: ret <2 x double> [[VECINIT1_I]]
	float64x2_t test_vmovq_n_f64(float64_t w) {			float64x2_t test_vmovq_n_f64(float64_t w) {
	return vmovq_n_f64(w);			return vmovq_n_f64(w);
	}			}

	// CHECK-LABEL: define <4 x half> @test_vmov_n_f16(half* %a1) #0 {			// CHECK-LABEL: define <4 x half> @test_vmov_n_f16(half* %a1) #1 {
	// CHECK: [[TMP0:%.*]] = load half, half* %a1, align 2			// CHECK: [[TMP0:%.*]] = load half, half* %a1, align 2
	// CHECK: [[VECINIT:%.*]] = insertelement <4 x half> undef, half [[TMP0]], i32 0			// CHECK: [[VECINIT:%.*]] = insertelement <4 x half> undef, half [[TMP0]], i32 0
	// CHECK: [[VECINIT1:%.*]] = insertelement <4 x half> [[VECINIT]], half [[TMP0]], i32 1			// CHECK: [[VECINIT1:%.*]] = insertelement <4 x half> [[VECINIT]], half [[TMP0]], i32 1
	// CHECK: [[VECINIT2:%.*]] = insertelement <4 x half> [[VECINIT1]], half [[TMP0]], i32 2			// CHECK: [[VECINIT2:%.*]] = insertelement <4 x half> [[VECINIT1]], half [[TMP0]], i32 2
	// CHECK: [[VECINIT3:%.*]] = insertelement <4 x half> [[VECINIT2]], half [[TMP0]], i32 3			// CHECK: [[VECINIT3:%.*]] = insertelement <4 x half> [[VECINIT2]], half [[TMP0]], i32 3
	// CHECK: ret <4 x half> [[VECINIT3]]			// CHECK: ret <4 x half> [[VECINIT3]]
	float16x4_t test_vmov_n_f16(float16_t *a1) {			float16x4_t test_vmov_n_f16(float16_t *a1) {
	return vmov_n_f16(*a1);			return vmov_n_f16(*a1);
	(15 lines not shown)
	// CHECK: [[VECINIT5:%.*]] = insertelement <8 x half> [[VECINIT4]], half [[TMP0]], i32 5			// CHECK: [[VECINIT5:%.*]] = insertelement <8 x half> [[VECINIT4]], half [[TMP0]], i32 5
	// CHECK: [[VECINIT6:%.*]] = insertelement <8 x half> [[VECINIT5]], half [[TMP0]], i32 6			// CHECK: [[VECINIT6:%.*]] = insertelement <8 x half> [[VECINIT5]], half [[TMP0]], i32 6
	// CHECK: [[VECINIT7:%.*]] = insertelement <8 x half> [[VECINIT6]], half [[TMP0]], i32 7			// CHECK: [[VECINIT7:%.*]] = insertelement <8 x half> [[VECINIT6]], half [[TMP0]], i32 7
	// CHECK: ret <8 x half> [[VECINIT7]]			// CHECK: ret <8 x half> [[VECINIT7]]
	float16x8_t test_vmovq_n_f16(float16_t *a1) {			float16x8_t test_vmovq_n_f16(float16_t *a1) {
	return vmovq_n_f16(*a1);			return vmovq_n_f16(*a1);
	}			}

				// CHECK: attributes #0 ={{.*}}"min-legal-vector-width"="128"
				// CHECK: attributes #1 ={{.*}}"min-legal-vector-width"="64"

test/CodeGen/x86-vector-width.c

This file was added.

				// RUN: %clang_cc1 -triple i686-linux-gnu -target-cpu i686 -emit-llvm %s -o - | FileCheck %s

				typedef signed long long V2LLi __attribute__((vector_size(16)));
				typedef signed long long V4LLi __attribute__((vector_size(32)));

				V2LLi ret_128();
				V4LLi ret_256();
				void arg_128(V2LLi);
				void arg_256(V4LLi);

				// Make sure return type forces a min-legal-width
				V2LLi foo(void) {
				return (V2LLi){ 0, 0 };
				}

				V4LLi goo(void) {
				return (V4LLi){ 0, 0 };
				}

				// Make sure return type of called function forces a min-legal-width
				void hoo(void) {
				V2LLi tmp_V2LLi;
				tmp_V2LLi = ret_128();
				}

				void joo(void) {
				V4LLi tmp_V4LLi;
				tmp_V4LLi = ret_256();
				}

				// Make sure arg type of called function forces a min-legal-width
				void koo(void) {
				V2LLi tmp_V2LLi;
				arg_128(tmp_V2LLi);
				}

				void loo(void) {
				V4LLi tmp_V4LLi;
				arg_256(tmp_V4LLi);
				}

				// Make sure arg type of our function forces a min-legal-width
				void moo(V2LLi x) {

				}

				void noo(V4LLi x) {

				}

				// CHECK: {{.*}}@foo{{.*}} #0
				// CHECK: {{.*}}@goo{{.*}} #1
				rnk: I'd look for `define {{.*}}@foo{{.*}} #0` to be a bit more precise.
				// CHECK: {{.*}}@hoo{{.*}} #0
				// CHECK: {{.*}}@joo{{.*}} #1
				// CHECK: {{.*}}@koo{{.*}} #0
				// CHECK: {{.*}}@loo{{.*}} #1
				// CHECK: {{.*}}@moo{{.*}} #0
				// CHECK: {{.*}}@noo{{.*}} #1

				// CHECK: #0 = {{.*}}"min-legal-vector-width"="128"
				// CHECK: #1 = {{.*}}"min-legal-vector-width"="256"
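
As an illustration of rnk's inline suggestion in this file, the sketch below anchors each match on `define`, so a CHECK line can only match a function definition and not a declaration or call site that happens to mention the same name. It is a reduced, hypothetical variant covering only `foo` and `goo`, not the test as committed, and it assumes the same i686 RUN line and attribute group numbering as the file above.

```c
// Reduced sketch of x86-vector-width.c: only the two return-type cases.
typedef signed long long V2LLi __attribute__((vector_size(16)));
typedef signed long long V4LLi __attribute__((vector_size(32)));

// A 128-bit return type is expected to give min-legal-vector-width=128.
V2LLi foo(void) {
  return (V2LLi){ 0, 0 };
}

// A 256-bit return type is expected to give min-legal-vector-width=256.
V4LLi goo(void) {
  return (V4LLi){ 0, 0 };
}

// Anchoring on "define" restricts each match to a function definition.
// CHECK: define {{.*}}@foo{{.*}} #0
// CHECK: define {{.*}}@goo{{.*}} #1

// CHECK: #0 = {{.*}}"min-legal-vector-width"="128"
// CHECK: #1 = {{.*}}"min-legal-vector-width"="256"
```

The attribute group numbers (`#0`, `#1`) are still hard-coded here, as in the diff; capturing them with a FileCheck variable would be a further tightening that this revision does not attempt.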

test/CodeGenOpenCL/fpmath.cl

(10 lines not shown)	float spscalardiv(float a, float b) {
// CHECK: fdiv{{.*}},		// CHECK: fdiv{{.*}},
// NODIVOPT: !fpmath ![[MD:[0-9]+]]		// NODIVOPT: !fpmath ![[MD:[0-9]+]]
// DIVOPT-NOT: !fpmath ![[MD:[0-9]+]]		// DIVOPT-NOT: !fpmath ![[MD:[0-9]+]]
return a / b;		return a / b;
}		}

float4 spvectordiv(float4 a, float4 b) {		float4 spvectordiv(float4 a, float4 b) {
// CHECK: @spvectordiv		// CHECK: @spvectordiv
// CHECK: #[[ATTR]]		// CHECK: #[[ATTR2:[0-9]+]]
// CHECK: fdiv{{.*}},		// CHECK: fdiv{{.*}},
// NODIVOPT: !fpmath ![[MD]]		// NODIVOPT: !fpmath ![[MD]]
// DIVOPT-NOT: !fpmath ![[MD]]		// DIVOPT-NOT: !fpmath ![[MD]]
return a / b;		return a / b;
}		}

#if __OPENCL_C_VERSION__ >=120		#if __OPENCL_C_VERSION__ >=120
void printf(constant char* fmt, ...);		void printf(constant char* fmt, ...);
(12 lines not shown)	double dpscalardiv(double a, double b) {
// CHECK: @dpscalardiv		// CHECK: @dpscalardiv
// CHECK: #[[ATTR]]		// CHECK: #[[ATTR]]
// CHECK-NOT: !fpmath		// CHECK-NOT: !fpmath
return a / b;		return a / b;
}		}
#endif		#endif

// CHECK: attributes #[[ATTR]] = {		// CHECK: attributes #[[ATTR]] = {
// NODIVOPT: "correctly-rounded-divide-sqrt-fp-math"="false"		// NODIVOPT-SAME: "correctly-rounded-divide-sqrt-fp-math"="false"
// DIVOPT: "correctly-rounded-divide-sqrt-fp-math"="true"		// DIVOPT-SAME: "correctly-rounded-divide-sqrt-fp-math"="true"
// CHECK: }		// CHECK-SAME: }
		// CHECK: attributes #[[ATTR2]] = {
		rnk: Does this actually work? Shouldn't these be `NODIVOPT-SAME: "correctly-rounded-divide-sqrt-fp-math"="false"`?
		// NODIVOPT-SAME: "correctly-rounded-divide-sqrt-fp-math"="false"
		// DIVOPT-SAME: "correctly-rounded-divide-sqrt-fp-math"="true"
		// CHECK-SAME: }
// NODIVOPT: ![[MD]] = !{float 2.500000e+00}		// NODIVOPT: ![[MD]] = !{float 2.500000e+00}