On targets that do not support FP16 natively, LLVM currently legalizes
vectors of FP16 values by scalarizing them and promoting them to FP32.
This causes problems for the following code:
  void foo(int, ...);

  typedef __attribute__((neon_vector_type(4))) __fp16 float16x4_t;

  void bar(float16x4_t x) {
    foo(42, x);
  }
According to the AAPCS (appendix A.2), float16x4_t is a containerized
vector fundamental type, so 'foo' expects the 4 16-bit FP values to be
packed into 2 32-bit registers, but 'bar' instead promotes them to 4
single-precision values.
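For illustration, the intended lowering is roughly equivalent to the
hand-written C below, reusing the declarations from the snippet above
(the helper name bar_expected, the int32x2_t typedef, and the
memcpy-as-bitcast are illustrative only, not what the patch emits):

  typedef __attribute__((neon_vector_type(2))) int int32x2_t;

  void bar_expected(float16x4_t x) {
    /* Reinterpret the 64 bits of x as two 32-bit lanes; no value conversion. */
    int32x2_t bits;
    __builtin_memcpy(&bits, &x, sizeof bits);
    /* The variadic call then occupies two 32-bit register/stack slots
       rather than four single-precision values. */
    foo(42, bits);
  }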
Since we already handle scalar FP16 values in the frontend by
bitcasting them to/from integers, this patch adds similar handling for
vector types and homogeneous FP16 vector aggregates.
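To make the aggregate case concrete, a homogeneous FP16 vector aggregate
such as the struct below (illustrative only, not taken from the patch)
gets the same bitcast-to-integer treatment as a lone float16x4_t, so its
AAPCS layout is preserved as well:

  /* Illustrative homogeneous aggregate of FP16 vectors. */
  typedef struct { float16x4_t val[2]; } float16x4x2_t;

  /* With the patch, an argument of this type is coerced as if its members
     were the equivalent 64-bit integer vectors (e.g. two int32x2_t). */
  void baz(float16x4x2_t s);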
One existing test required adjustment because we now generate more
bitcasts; the patch changes that test to target a machine with native
FP16 support.
Do we need equivalent code in classifyReturnType?