This is an archive of the discontinued LLVM Phabricator instance.

Differential D70088

[ARM,MVE] Add intrinsics for contiguous load/stores.
Closed, Public

Authored by simon_tatham on Nov 11 2019, 9:18 AM.

Details

Reviewers

ostannard
MarkMurrayARM
dmgreen

Commits

rGa12f588ebb1a: [ARM,MVE] Add intrinsics for contiguous load/stores.

Summary

This patch adds the ACLE intrinsics for all the MVE load and store
instructions not already handled by D69791. These ones don't need new
IR intrinsics, because they can be implemented in terms of standard
LLVM IR constructions.

Some of the load and store instructions access less than 128 bits of
memory, sign/zero extending each value to a wider vector lane on load
or truncating it on store. These are represented in IR by a load of a
shorter vector followed by a zext/sext, and conversely, a trunc
followed by a short store. Existing ISel patterns already recognize
those combinations and turn them into the right MVE instructions.
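As a rough illustration of the semantics described above (not code from this patch), the widening loads and truncating stores can be modelled lane by lane in plain C++. The names widening_load_model and truncating_store_model are hypothetical, a std::vector stands in for the 128-bit register, and the choice between sign and zero extension is taken from the signedness of the in-memory type, which is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical scalar model of a widening contiguous load (e.g. vldrbq_s16):
// read one narrow element per lane, then extend it to the lane width.
// WideT fixes the lane count (128 bits / lane width); NarrowT is the type
// held in memory, and its signedness selects sign or zero extension.
template <typename WideT, typename NarrowT>
std::vector<WideT> widening_load_model(const NarrowT *base) {
  static_assert(sizeof(NarrowT) <= sizeof(WideT),
                "memory element must not be wider than the vector lane");
  assert(base != nullptr);
  constexpr std::size_t lanes = 16 / sizeof(WideT); // 128-bit vector
  std::vector<WideT> out(lanes);
  for (std::size_t i = 0; i < lanes; ++i)
    out[i] = static_cast<WideT>(base[i]); // sext if NarrowT is signed, else zext
  return out;
}

// Hypothetical scalar model of a truncating store (e.g. vstrbq_u32): keep
// only the low bits of each lane. Limited to unsigned memory types so the
// narrowing conversion is well defined in every C++ standard.
template <typename NarrowT, typename WideT>
void truncating_store_model(NarrowT *base, const std::vector<WideT> &value) {
  static_assert(std::is_unsigned<NarrowT>::value,
                "this sketch only models unsigned truncation");
  assert(base != nullptr);
  for (std::size_t i = 0; i < value.size(); ++i)
    base[i] = static_cast<NarrowT>(value[i]);
}
```

Calling widening_load_model<std::int16_t> on a pointer to eight int8_t values reads only 64 bits of memory, which corresponds to the "load of a shorter vector" in the IR.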

The predicated forms of all these instructions are represented in the
same way, except that the ordinary load/store operation is replaced
with the existing intrinsics @llvm.masked.{load,store}. These are
currently only code-generated as predicated MVE load/store
instructions if you give LLVM the -enable-arm-maskedldst option; so
I've done that in the LLVM codegen test. When we make that the
default, that option can be removed.
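For readers unfamiliar with the masked intrinsics, here is a minimal host-side sketch of what the predicated forms compute, assuming the predicate has already been expanded to one boolean per lane. The names masked_load_model and masked_store_model are hypothetical; the zero passthrough mirrors the zeroinitializer operand visible in the test output for the _z intrinsics.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical lane-by-lane model of a zeroing predicated load (the _z
// forms): active lanes read memory, inactive lanes take the zero passthrough.
template <typename T, std::size_t N>
std::array<T, N> masked_load_model(const T *base,
                                   const std::array<bool, N> &mask) {
  assert(base != nullptr);
  std::array<T, N> out{}; // value-initialised, so every lane starts at zero
  for (std::size_t i = 0; i < N; ++i)
    if (mask[i])
      out[i] = base[i]; // inactive lanes never read memory
  return out;
}

// Hypothetical model of a predicated store (the _p forms): only active
// lanes are written, the rest of memory is left untouched.
template <typename T, std::size_t N>
void masked_store_model(T *base, const std::array<T, N> &value,
                        const std::array<bool, N> &mask) {
  assert(base != nullptr);
  for (std::size_t i = 0; i < N; ++i)
    if (mask[i])
      base[i] = value[i];
}
```

With the mask {true, false, true, false}, a load from {10, 20, 30, 40} yields {10, 0, 30, 0}, and a store of {7, 8, 9, 10} over {1, 2, 3, 4} leaves {7, 2, 9, 4}.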

In the Tablegen backend, I've had to add a handful of extra support
features:

We need to be able to make clang::Address objects out of a pointer and an alignment (previously we only needed these when the user passed us an existing one).

We can now specify vector types that aren't 128 bits wide (for use in those intermediate values in IR), and the parametrized type system can make one starting from two existing vector types (using the lane count of one and the element type of the other).

I've added support for code generation of pointer casts, and for specifying LLVM types as operands to IRBuilder operations (for zext and sext, though I think they'll come in useful again).

Now not all IR construction operations need to be specified as Builder.CreateFoo; some don't involve a Builder at all, and one passes it as a parameter to a tiny static helper function in CGBuiltin.cpp.
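To make the last point concrete, the three kinds of call can be pictured as three ways of building the text of a function call. The sketch below is only an illustration: NodeKind, call_prefix and emit_call are hypothetical names, and how MveEmitter.cpp really assembles its output is not shown here. The three prefix shapes are the ones set up in arm_mve_defs.td in this diff.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical tags for the three kinds of IR construction node.
enum class NodeKind { IRBuilder, IRFunction, CGHelperFn };

// Build the call prefix, including the open parenthesis, for each kind.
std::string call_prefix(NodeKind kind, const std::string &func) {
  switch (kind) {
  case NodeKind::IRBuilder:
    return "Builder." + func + "("; // method on the IRBuilder instance
  case NodeKind::IRFunction:
    return func + "("; // free function, no Builder involved
  case NodeKind::CGHelperFn:
    return func + "(Builder, "; // helper that takes the Builder first
  }
  return std::string(); // not reached for valid enumerators
}

// Append comma-separated arguments and the closing parenthesis.
std::string emit_call(NodeKind kind, const std::string &func,
                      const std::vector<std::string> &args) {
  // The helper prefix already ends in ", ", so it needs a following argument.
  assert(kind != NodeKind::CGHelperFn || !args.empty());
  std::string out = call_prefix(kind, func);
  for (std::size_t i = 0; i < args.size(); ++i) {
    if (i != 0)
      out += ", ";
    out += args[i];
  }
  out += ")";
  return out;
}
```

For example, emit_call(NodeKind::CGHelperFn, "SignOrZeroExtend", {"V", "Ty", "true"}) produces the text SignOrZeroExtend(Builder, V, Ty, true).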

Diff Detail

Repository

rG LLVM Github Monorepo

Build Status

Buildable 40824
Build 40958: arc lint + arc unit

Event Timeline

simon_tatham created this revision. Nov 11 2019, 9:18 AM

Herald added projects: Restricted Project, Restricted Project. Nov 11 2019, 9:18 AM

Herald added subscribers: llvm-commits, cfe-commits, kristof.beyls.

Harbormaster completed remote builds in B40749: Diff 228709. Nov 11 2019, 9:18 AM

Minor revision to the Tablegen changes to support different kinds of IR construction function: now the differing function-call prefixes are set up in arm_mve_defs.td, instead of in MveEmitter.cpp. That fits better with further changes I'm making in that area.

simon_tatham added a child revision: D70133: [ARM,MVE] Add intrinsics for 'administrative' vector operations. Nov 12 2019, 9:57 AM

Harbormaster completed remote builds in B40824: Diff 228911. Nov 12 2019, 10:00 AM

Very nice

Just to check, we don't have to care about big endian here? It just works OK because the rest of LLVM handles it OK? (I'm not sure if a vld1 is different to a vldr in big endian, for example.)

Yes, vld1 has the same semantics as vldrw_*32 or vldrh_*16 or vldrb_*8. It's just a convenience alias that makes polymorphism easier – if I remember rightly the intended use case was people writing MVE intrinsics inside C++ templates.

OK. vldr and vld1 work differently for Neon under BE, if I'm remembering correctly.

LGTM then.
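The point made in this thread about vld1q helping polymorphism inside C++ templates can be sketched on the host with ordinary overloading. Everything here is a hypothetical stand-in (Vec128, load_lanes, and the model_ prefixed functions); the real intrinsics come from arm_mve.h and need an MVE target. The sketch only shows why generic code wants a single overloaded name rather than names with a type suffix.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical host-side stand-in for a 128-bit vector with lanes of type T.
template <typename T> using Vec128 = std::array<T, 16 / sizeof(T)>;

// Shared helper for the stand-ins below: copy one full vector from memory.
template <typename T> Vec128<T> load_lanes(const T *p) {
  assert(p != nullptr);
  Vec128<T> v{};
  for (std::size_t i = 0; i < v.size(); ++i)
    v[i] = p[i];
  return v;
}

// Stand-ins for the explicitly typed names: the suffix is part of the name,
// so a template cannot pick the right one from its type parameter.
inline Vec128<std::int8_t> model_vld1q_s8(const std::int8_t *p) { return load_lanes(p); }
inline Vec128<std::int16_t> model_vld1q_s16(const std::int16_t *p) { return load_lanes(p); }
inline Vec128<std::int32_t> model_vld1q_s32(const std::int32_t *p) { return load_lanes(p); }

// Stand-in for the polymorphic alias: one name, overloaded on pointer type.
inline Vec128<std::int8_t> model_vld1q(const std::int8_t *p) { return model_vld1q_s8(p); }
inline Vec128<std::int16_t> model_vld1q(const std::int16_t *p) { return model_vld1q_s16(p); }
inline Vec128<std::int32_t> model_vld1q(const std::int32_t *p) { return model_vld1q_s32(p); }

// Generic code written once against the overloaded name.
template <typename T> long long sum_first_vector(const T *p) {
  long long sum = 0;
  for (T lane : model_vld1q(p)) // overload chosen from T
    sum += lane;
  return sum;
}
```

Here sum_first_vector works unchanged for 8-bit, 16-bit and 32-bit element types, which would not be possible if the body had to spell out a suffixed name.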

clang/utils/TableGen/MveEmitter.cpp, line 475: Maybe update this comment?

This revision is now accepted and ready to land. Nov 13 2019, 3:58 AM

simon_tatham mentioned this in D70133: [ARM,MVE] Add intrinsics for 'administrative' vector operations. Nov 13 2019, 4:15 AM

Closed by commit rGa12f588ebb1a: [ARM,MVE] Add intrinsics for contiguous load/stores. (authored by simon_tatham). Nov 13 2019, 4:52 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                     Size
clang/include/clang/Basic/arm_mve.td                     118 lines
clang/include/clang/Basic/arm_mve_defs.td                50 lines
clang/lib/CodeGen/CGBuiltin.cpp                          7 lines
clang/test/CodeGen/arm-mve-intrinsics/load-store.c       1325 lines
clang/utils/TableGen/MveEmitter.cpp                      122 lines
llvm/test/CodeGen/Thumb2/mve-intrinsics/load-store.ll    1208 lines

Diff 228911

clang/include/clang/Basic/arm_mve.td

	(66 lines not shown)
	def vcvt#half#q_m_f16: Intrinsic<			def vcvt#half#q_m_f16: Intrinsic<
	VecOf<f16>, (args VecOf<f16>:$inactive, Vector:$a, PredOf<f32>:$pred),			VecOf<f16>, (args VecOf<f16>:$inactive, Vector:$a, PredOf<f32>:$pred),
	(IRInt<"vcvt_narrow_predicated"> $inactive, $a, halfconst, $pred)>;			(IRInt<"vcvt_narrow_predicated"> $inactive, $a, halfconst, $pred)>;

	} // params = [f32], pnt = PNT_None			} // params = [f32], pnt = PNT_None

	} // loop over half = "b", "t"			} // loop over half = "b", "t"

				multiclass contiguous_load<string mnemonic, PrimitiveType memtype,
				list<Type> same_size, list<Type> wider> {
				// Intrinsics named with explicit memory and element sizes that match:
				// vldrbq_?8, vldrhq_?16, vldrwq_?32.
				let params = same_size, pnt = PNT_None in {
				def: Intrinsic<Vector, (args CPtr<CopyKind<same_size[0], Scalar>>:$addr),
				(load (address (CPtr<Vector> $addr), !srl(memtype.size,3)))>,
				NameOverride<mnemonic>;
				def: Intrinsic<Vector, (args CPtr<CopyKind<same_size[0], Scalar>>:$addr,
				Predicate:$pred),
				(IRIntBase<"masked_load", [Vector, CPtr<Vector>]>
				(CPtr<Vector> $addr), !srl(memtype.size,3),
				$pred, (zeroinit Vector))>,
				NameOverride<mnemonic # "_z">;
				}

				// Synonyms for the above, with the generic name vld1q that just means
				// 'memory and element sizes match', and allows convenient polymorphism with
				// the memory and element types covariant.
				let params = same_size in {
				def: Intrinsic<Vector, (args CPtr<CopyKind<same_size[0], Scalar>>:$addr),
				(load (address (CPtr<Vector> $addr), !srl(memtype.size,3)))>,
				NameOverride<"vld1q">;
				def: Intrinsic<Vector, (args CPtr<CopyKind<same_size[0], Scalar>>:$addr,
				Predicate:$pred),
				(IRIntBase<"masked_load", [Vector, CPtr<Vector>]>
				(CPtr<Vector> $addr), !srl(memtype.size,3),
				$pred, (zeroinit Vector))>,
				NameOverride<"vld1q_z">;
				}

				// Intrinsics with the memory size narrower than the vector element, so that
				// they load less than 128 bits of memory and sign/zero extend each loaded
				// value into a wider vector lane.
				let params = wider, pnt = PNT_None in {
				def: Intrinsic<Vector, (args CPtr<CopyKind<same_size[0], Scalar>>:$addr),
				(extend (load (address (CPtr<NarrowedVecOf<memtype,Vector>>
				$addr), !srl(memtype.size,3))),
				Vector, (unsignedflag Scalar))>,
				NameOverride<mnemonic>;
				def: Intrinsic<Vector, (args CPtr<CopyKind<same_size[0], Scalar>>:$addr,
				Predicate:$pred),
				(extend (IRIntBase<"masked_load",
				[NarrowedVecOf<memtype,Vector>,
				CPtr<NarrowedVecOf<memtype,Vector>>]>
				(CPtr<NarrowedVecOf<memtype,Vector>> $addr),
				!srl(memtype.size,3), $pred,
				(zeroinit NarrowedVecOf<memtype,Vector>)),
				Vector, (unsignedflag Scalar))>,
				NameOverride<mnemonic # "_z">;
				}
				}

				defm: contiguous_load<"vldrbq", u8, T.All8, !listconcat(T.Int16, T.Int32)>;
				defm: contiguous_load<"vldrhq", u16, T.All16, T.Int32>;
				defm: contiguous_load<"vldrwq", u32, T.All32, []>;

				multiclass contiguous_store<string mnemonic, PrimitiveType memtype,
				list<Type> same_size, list<Type> wider> {
				// Intrinsics named with explicit memory and element sizes that match:
				// vstrbq_?8, vstrhq_?16, vstrwq_?32.
				let params = same_size in {
				def: Intrinsic<Void, (args Ptr<CopyKind<same_size[0], Scalar>>:$addr,
				Vector:$value),
				(store $value,
				(address (Ptr<Vector> $addr), !srl(memtype.size,3)))>,
				NameOverride<mnemonic>;
				def: Intrinsic<Void, (args Ptr<CopyKind<same_size[0], Scalar>>:$addr,
				Vector:$value, Predicate:$pred),
				(IRIntBase<"masked_store", [Vector, Ptr<Vector>]>
				$value, (Ptr<Vector> $addr),
				!srl(memtype.size,3), $pred)>,
				NameOverride<mnemonic # "_p">;
				}

				// Synonyms for the above, with the generic name vst1q that just means
				// 'memory and element sizes match', and allows convenient polymorphism with
				// the memory and element types covariant.
				let params = same_size in {
				def: Intrinsic<Void, (args Ptr<CopyKind<same_size[0], Scalar>>:$addr,
				Vector:$value),
				(store $value,
				(address (Ptr<Vector> $addr), !srl(memtype.size,3)))>,
				NameOverride<"vst1q">;
				def: Intrinsic<Void, (args Ptr<CopyKind<same_size[0], Scalar>>:$addr,
				Vector:$value, Predicate:$pred),
				(IRIntBase<"masked_store", [Vector, Ptr<Vector>]>
				$value, (Ptr<Vector> $addr),
				!srl(memtype.size,3), $pred)>,
				NameOverride<"vst1q_p">;
				}

				// Intrinsics with the memory size narrower than the vector element, so that
				// they store less than 128 bits of memory, truncating each vector lane into
				// a narrower value to store.
				let params = wider in {
				def: Intrinsic<Void, (args Ptr<CopyKind<same_size[0], Scalar>>:$addr,
				Vector:$value),
				(store (trunc $value, NarrowedVecOf<memtype,Vector>),
				(address (Ptr<NarrowedVecOf<memtype,Vector>> $addr),
				!srl(memtype.size,3)))>,
				NameOverride<mnemonic>;
				def: Intrinsic<Void, (args Ptr<CopyKind<same_size[0], Scalar>>:$addr,
				Vector:$value, Predicate:$pred),
				(IRIntBase<"masked_store",
				[NarrowedVecOf<memtype,Vector>,
				Ptr<NarrowedVecOf<memtype,Vector>>]>
				(trunc $value, NarrowedVecOf<memtype,Vector>),
				(Ptr<NarrowedVecOf<memtype,Vector>> $addr),
				!srl(memtype.size,3), $pred)>,
				NameOverride<mnemonic # "_p">;
				}
				}

				defm: contiguous_store<"vstrbq", u8, T.All8, !listconcat(T.Int16, T.Int32)>;
				defm: contiguous_store<"vstrhq", u16, T.All16, T.Int32>;
				defm: contiguous_store<"vstrwq", u32, T.All32, []>;

	multiclass gather_base<list<Type> types, int size> {			multiclass gather_base<list<Type> types, int size> {
	let params = types, pnt = PNT_None in {			let params = types, pnt = PNT_None in {
	def _gather_base: Intrinsic<			def _gather_base: Intrinsic<
	Vector, (args UVector:$addr, imm_mem7bit<size>:$offset),			Vector, (args UVector:$addr, imm_mem7bit<size>:$offset),
	(IRInt<"vldr_gather_base", [Vector, UVector]> $addr, $offset)>;			(IRInt<"vldr_gather_base", [Vector, UVector]> $addr, $offset)>;

	def _gather_base_z: Intrinsic<			def _gather_base_z: Intrinsic<
	Vector, (args UVector:$addr, imm_mem7bit<size>:$offset, Predicate:$pred),			Vector, (args UVector:$addr, imm_mem7bit<size>:$offset, Predicate:$pred),
	(175 lines not shown)

clang/include/clang/Basic/arm_mve_defs.td

	(22 lines not shown)
	// each one a name, to be used in codegen. For example, (args Vector:$a,			// each one a name, to be used in codegen. For example, (args Vector:$a,
	// Scalar:$b) defines the names $a and $b which the specification of the code			// Scalar:$b) defines the names $a and $b which the specification of the code
	// for that intrinsic can refer to.			// for that intrinsic can refer to.

	def args;			def args;

	// -----------------------------------------------------------------------------			// -----------------------------------------------------------------------------
	// Family of nodes for use in the codegen dag for an intrinsic, corresponding			// Family of nodes for use in the codegen dag for an intrinsic, corresponding
	// roughly to operations in LLVM IR. More precisely, they correspond to calls			// to function calls that return LLVM IR nodes.
	// to methods of llvm::IRBuilder.			class IRBuilderBase {
	class IRBuilder<string func_> {			// The prefix of the function call, including an open parenthesis.
	string func = func_; // the method name			string prefix;

				// Any parameters that have types that have to be treated specially by the
				// Tablegen back end. Generally these will be types other than llvm::Value *,
				// although not all other types need special treatment (e.g. llvm::Type *).
	list<int> address_params = []; // indices of parameters with type Address			list<int> address_params = []; // indices of parameters with type Address
	list<int> int_constant_params = []; // indices of plain integer parameters			list<int> int_constant_params = []; // indices of plain integer parameters
	}			}
				class IRBuilder<string func> : IRBuilderBase {
				// The usual case: a method called on the code gen function's instance of
				// llvm::IRBuilder.
				let prefix = "Builder." # func # "(";
				}
				class IRFunction<string func> : IRBuilderBase {
				// Some other function that doesn't use the IRBuilder at all.
				let prefix = func # "(";
				}
				class CGHelperFn<string func> : IRBuilderBase {
				// A helper function defined in CGBuiltin.cpp, which takes the IRBuilder as
				// an argument.
				let prefix = func # "(Builder, ";
				}
	def add: IRBuilder<"CreateAdd">;			def add: IRBuilder<"CreateAdd">;
	def or: IRBuilder<"CreateOr">;			def or: IRBuilder<"CreateOr">;
	def and: IRBuilder<"CreateAnd">;			def and: IRBuilder<"CreateAnd">;
	def sub: IRBuilder<"CreateSub">;			def sub: IRBuilder<"CreateSub">;
	def shl: IRBuilder<"CreateShl">;			def shl: IRBuilder<"CreateShl">;
	def lshr: IRBuilder<"CreateLShr">;			def lshr: IRBuilder<"CreateLShr">;
	def fadd: IRBuilder<"CreateFAdd">;			def fadd: IRBuilder<"CreateFAdd">;
	def fsub: IRBuilder<"CreateFSub">;			def fsub: IRBuilder<"CreateFSub">;
	def load: IRBuilder<"CreateLoad"> { let address_params = [0]; }			def load: IRBuilder<"CreateLoad"> { let address_params = [0]; }
	def store: IRBuilder<"CreateStore"> { let address_params = [1]; }			def store: IRBuilder<"CreateStore"> { let address_params = [1]; }
	def xval: IRBuilder<"CreateExtractValue"> { let int_constant_params = [1]; }			def xval: IRBuilder<"CreateExtractValue"> { let int_constant_params = [1]; }
				def trunc: IRBuilder<"CreateTrunc">;
				def extend: CGHelperFn<"SignOrZeroExtend"> { let int_constant_params = [2]; }
				def zeroinit: IRFunction<"llvm::Constant::getNullValue">;

				// A node that makes an Address out of a pointer-typed Value, by
				// providing an alignment as the second argument.
				def address;

	// Another node class you can use in the codegen dag. This one corresponds to			// Another node class you can use in the codegen dag. This one corresponds to
	// an IR intrinsic function, which has to be specialized to a particular list			// an IR intrinsic function, which has to be specialized to a particular list
	// of types.			// of types.
	class IRInt<string name_, list<Type> params_ = [], bit appendKind_ = 0> {			class IRIntBase<string name_, list<Type> params_ = [], bit appendKind_ = 0> {
	string intname = name_; // base name of the intrinsic, minus "arm_mve_"			string intname = name_; // base name of the intrinsic
	list<Type> params = params_; // list of parameter types			list<Type> params = params_; // list of parameter types

	// If this flag is set, then the IR intrinsic name will get a suffix _s, _u			// If this flag is set, then the IR intrinsic name will get a suffix _s, _u
	// or _f depending on whether the main parameter type of the ACLE intrinsic			// or _f depending on whether the main parameter type of the ACLE intrinsic
	// being generated is a signed integer, unsigned integer, or float. Mostly			// being generated is a signed integer, unsigned integer, or float. Mostly
	// this is useful for signed vs unsigned integers, because the ACLE			// this is useful for signed vs unsigned integers, because the ACLE
	// intrinsics and the source-level integer types distinguish them, but at IR			// intrinsics and the source-level integer types distinguish them, but at IR
	// level the distinction has moved from the type system into the operations			// level the distinction has moved from the type system into the operations
	// and you just have i32 or i16 etc. So when an IR intrinsic has to vary with			// and you just have i32 or i16 etc. So when an IR intrinsic has to vary with
	// signedness, you set this bit, and then you can still put the signed and			// signedness, you set this bit, and then you can still put the signed and
	// unsigned versions in the same subclass of Intrinsic, and the Tablegen			// unsigned versions in the same subclass of Intrinsic, and the Tablegen
	// backend will take care of adding _s or _u as appropriate in each instance.			// backend will take care of adding _s or _u as appropriate in each instance.
	bit appendKind = appendKind_;			bit appendKind = appendKind_;
	}			}

				// Mostly we'll be using @llvm.arm.mve.* intrinsics, so here's a trivial
				// subclass that puts on that prefix.
				class IRInt<string name, list<Type> params = [], bit appendKind = 0>
				: IRIntBase<"arm_mve_" # name, params, appendKind>;

	// The 'seq' node in a codegen dag specifies a set of IR operations to be			// The 'seq' node in a codegen dag specifies a set of IR operations to be
	// performed in order. It has the special ability to define extra variable			// performed in order. It has the special ability to define extra variable
	// names, on top of the ones that refer to the intrinsic's parameters. For			// names, on top of the ones that refer to the intrinsic's parameters. For
	// example:			// example:
	//			//
	// (seq (foo this, that):$a,			// (seq (foo this, that):$a,
	// (bar this, $a):$b			// (bar this, $a):$b
	// (add $a, $b))			// (add $a, $b))
	(68 lines not shown)
	// rather than uint32_t.			// rather than uint32_t.
	def uint: PrimitiveType<"u", 32> { let nameOverride = "unsigned"; }			def uint: PrimitiveType<"u", 32> { let nameOverride = "unsigned"; }
	def sint: PrimitiveType<"s", 32> { let nameOverride = "int"; }			def sint: PrimitiveType<"s", 32> { let nameOverride = "int"; }

	// VecOf<t> expects t to be a scalar, and gives a 128-bit vector of whatever it			// VecOf<t> expects t to be a scalar, and gives a 128-bit vector of whatever it
	// is.			// is.
	class VecOf<Type t>: ComplexType<(CTO_Vec t)>;			class VecOf<Type t>: ComplexType<(CTO_Vec t)>;

				// NarrowedVecOf<t,v> expects t to be a scalar type, and v to be a vector
				// type. It returns a vector type whose element type is t, and whose lane
				// count is the same as the lane count of v. (Used as an intermediate value
				// type in the IR representation of a widening load: you load a vector of
				// small things out of memory, and then zext/sext them into a full 128-bit
				// output vector.)
				class NarrowedVecOf<Type t, Type v>: ComplexType<(CTO_Vec t, v)>;

	// PredOf expects t to be a scalar, and expands to a predicate vector which			// PredOf expects t to be a scalar, and expands to a predicate vector which
	// (logically speaking) has the same number of lanes as VecOf<t> would.			// (logically speaking) has the same number of lanes as VecOf<t> would.
	class PredOf<Type t>: ComplexType<(CTO_Pred t)>;			class PredOf<Type t>: ComplexType<(CTO_Pred t)>;

	// Scalar expands to whatever is the main parameter type of the current			// Scalar expands to whatever is the main parameter type of the current
	// intrinsic. Vector and Predicate expand to the vector and predicate types			// intrinsic. Vector and Predicate expand to the vector and predicate types
	// corresponding to that.			// corresponding to that.
	def Scalar: ComplexType<(CTO_Parameter)>;			def Scalar: ComplexType<(CTO_Parameter)>;
	(187 lines not shown)

clang/lib/CodeGen/CGBuiltin.cpp

(6,782 lines not shown)
case NEON::BI__builtin_neon_vtbx3_v:		case NEON::BI__builtin_neon_vtbx3_v:
return EmitNeonCall(CGM.getIntrinsic(Intrinsic::arm_neon_vtbx3),		return EmitNeonCall(CGM.getIntrinsic(Intrinsic::arm_neon_vtbx3),
Ops, "vtbx3");		Ops, "vtbx3");
case NEON::BI__builtin_neon_vtbx4_v:		case NEON::BI__builtin_neon_vtbx4_v:
return EmitNeonCall(CGM.getIntrinsic(Intrinsic::arm_neon_vtbx4),		return EmitNeonCall(CGM.getIntrinsic(Intrinsic::arm_neon_vtbx4),
Ops, "vtbx4");		Ops, "vtbx4");
}		}
}		}

		static llvm::Value *SignOrZeroExtend(CGBuilderTy &Builder, llvm::Value *V,
		llvm::Type *T, bool Unsigned) {
		// Helper function called by Tablegen-constructed ARM MVE builtin codegen,
		// which finds it convenient to specify signed/unsigned as a boolean flag.
		return Unsigned ? Builder.CreateZExt(V, T) : Builder.CreateSExt(V, T);
		}

Value *CodeGenFunction::EmitARMMVEBuiltinExpr(unsigned BuiltinID,		Value *CodeGenFunction::EmitARMMVEBuiltinExpr(unsigned BuiltinID,
const CallExpr *E,		const CallExpr *E,
ReturnValueSlot ReturnValue,		ReturnValueSlot ReturnValue,
llvm::Triple::ArchType Arch) {		llvm::Triple::ArchType Arch) {
enum class CustomCodeGen { VLD24, VST24 } CustomCodeGenType;		enum class CustomCodeGen { VLD24, VST24 } CustomCodeGenType;
Intrinsic::ID IRIntr;		Intrinsic::ID IRIntr;
unsigned NumVectors;		unsigned NumVectors;

(7,897 lines not shown)

clang/test/CodeGen/arm-mve-intrinsics/load-store.c

This file was added.

				// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
				// RUN: %clang_cc1 -triple thumbv8.1m.main-arm-none-eabi -target-feature +mve.fp -mfloat-abi hard -O0 -disable-O0-optnone -S -emit-llvm -o - %s | opt -S -mem2reg | FileCheck %s
				// RUN: %clang_cc1 -triple thumbv8.1m.main-arm-none-eabi -target-feature +mve.fp -mfloat-abi hard -O0 -disable-O0-optnone -DPOLYMORPHIC -S -emit-llvm -o - %s | opt -S -mem2reg | FileCheck %s

				#include <arm_mve.h>

				// CHECK-LABEL: @test_vld1q_f16(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast half* [[BASE:%.*]] to <8 x half>*
				// CHECK-NEXT: [[TMP1:%.*]] = load <8 x half>, <8 x half>* [[TMP0]], align 2
				// CHECK-NEXT: ret <8 x half> [[TMP1]]
				//
				float16x8_t test_vld1q_f16(const float16_t *base)
				{
				#ifdef POLYMORPHIC
				return vld1q(base);
				#else /* POLYMORPHIC */
				return vld1q_f16(base);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vld1q_f32(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[BASE:%.*]] to <4 x float>*
				// CHECK-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* [[TMP0]], align 4
				// CHECK-NEXT: ret <4 x float> [[TMP1]]
				//
				float32x4_t test_vld1q_f32(const float32_t *base)
				{
				#ifdef POLYMORPHIC
				return vld1q(base);
				#else /* POLYMORPHIC */
				return vld1q_f32(base);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vld1q_s8(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast i8* [[BASE:%.*]] to <16 x i8>*
				// CHECK-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* [[TMP0]], align 1
				// CHECK-NEXT: ret <16 x i8> [[TMP1]]
				//
				int8x16_t test_vld1q_s8(const int8_t *base)
				{
				#ifdef POLYMORPHIC
				return vld1q(base);
				#else /* POLYMORPHIC */
				return vld1q_s8(base);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vld1q_s16(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast i16* [[BASE:%.*]] to <8 x i16>*
				// CHECK-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* [[TMP0]], align 2
				// CHECK-NEXT: ret <8 x i16> [[TMP1]]
				//
				int16x8_t test_vld1q_s16(const int16_t *base)
				{
				#ifdef POLYMORPHIC
				return vld1q(base);
				#else /* POLYMORPHIC */
				return vld1q_s16(base);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vld1q_s32(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[BASE:%.*]] to <4 x i32>*
				// CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4
				// CHECK-NEXT: ret <4 x i32> [[TMP1]]
				//
				int32x4_t test_vld1q_s32(const int32_t *base)
				{
				#ifdef POLYMORPHIC
				return vld1q(base);
				#else /* POLYMORPHIC */
				return vld1q_s32(base);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vld1q_u8(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast i8* [[BASE:%.*]] to <16 x i8>*
				// CHECK-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* [[TMP0]], align 1
				// CHECK-NEXT: ret <16 x i8> [[TMP1]]
				//
				uint8x16_t test_vld1q_u8(const uint8_t *base)
				{
				#ifdef POLYMORPHIC
				return vld1q(base);
				#else /* POLYMORPHIC */
				return vld1q_u8(base);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vld1q_u16(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast i16* [[BASE:%.*]] to <8 x i16>*
				// CHECK-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* [[TMP0]], align 2
				// CHECK-NEXT: ret <8 x i16> [[TMP1]]
				//
				uint16x8_t test_vld1q_u16(const uint16_t *base)
				{
				#ifdef POLYMORPHIC
				return vld1q(base);
				#else /* POLYMORPHIC */
				return vld1q_u16(base);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vld1q_u32(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[BASE:%.*]] to <4 x i32>*
				// CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4
				// CHECK-NEXT: ret <4 x i32> [[TMP1]]
				//
				uint32x4_t test_vld1q_u32(const uint32_t *base)
				{
				#ifdef POLYMORPHIC
				return vld1q(base);
				#else /* POLYMORPHIC */
				return vld1q_u32(base);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vld1q_z_f16(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast half* [[BASE:%.*]] to <8 x half>*
				// CHECK-NEXT: [[TMP1:%.*]] = zext i16 [[P:%.*]] to i32
				// CHECK-NEXT: [[TMP2:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP1]])
				// CHECK-NEXT: [[TMP3:%.*]] = call <8 x half> @llvm.masked.load.v8f16.p0v8f16(<8 x half>* [[TMP0]], i32 2, <8 x i1> [[TMP2]], <8 x half> zeroinitializer)
				// CHECK-NEXT: ret <8 x half> [[TMP3]]
				//
				float16x8_t test_vld1q_z_f16(const float16_t *base, mve_pred16_t p)
				{
				#ifdef POLYMORPHIC
				return vld1q_z(base, p);
				#else /* POLYMORPHIC */
				return vld1q_z_f16(base, p);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vld1q_z_f32(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[BASE:%.*]] to <4 x float>*
				// CHECK-NEXT: [[TMP1:%.*]] = zext i16 [[P:%.*]] to i32
				// CHECK-NEXT: [[TMP2:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP1]])
				// CHECK-NEXT: [[TMP3:%.*]] = call <4 x float> @llvm.masked.load.v4f32.p0v4f32(<4 x float>* [[TMP0]], i32 4, <4 x i1> [[TMP2]], <4 x float> zeroinitializer)
				// CHECK-NEXT: ret <4 x float> [[TMP3]]
				//
				float32x4_t test_vld1q_z_f32(const float32_t *base, mve_pred16_t p)
				{
				#ifdef POLYMORPHIC
				return vld1q_z(base, p);
				#else /* POLYMORPHIC */
				return vld1q_z_f32(base, p);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vld1q_z_s8(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast i8* [[BASE:%.*]] to <16 x i8>*
				// CHECK-NEXT: [[TMP1:%.*]] = zext i16 [[P:%.*]] to i32
				// CHECK-NEXT: [[TMP2:%.*]] = call <16 x i1> @llvm.arm.mve.pred.i2v.v16i1(i32 [[TMP1]])
				// CHECK-NEXT: [[TMP3:%.*]] = call <16 x i8> @llvm.masked.load.v16i8.p0v16i8(<16 x i8>* [[TMP0]], i32 1, <16 x i1> [[TMP2]], <16 x i8> zeroinitializer)
				// CHECK-NEXT: ret <16 x i8> [[TMP3]]
				//
				int8x16_t test_vld1q_z_s8(const int8_t *base, mve_pred16_t p)
				{
				#ifdef POLYMORPHIC
				return vld1q_z(base, p);
				#else /* POLYMORPHIC */
				return vld1q_z_s8(base, p);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vld1q_z_s16(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast i16* [[BASE:%.*]] to <8 x i16>*
				// CHECK-NEXT: [[TMP1:%.*]] = zext i16 [[P:%.*]] to i32
				// CHECK-NEXT: [[TMP2:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP1]])
				// CHECK-NEXT: [[TMP3:%.*]] = call <8 x i16> @llvm.masked.load.v8i16.p0v8i16(<8 x i16>* [[TMP0]], i32 2, <8 x i1> [[TMP2]], <8 x i16> zeroinitializer)
				// CHECK-NEXT: ret <8 x i16> [[TMP3]]
				//
				int16x8_t test_vld1q_z_s16(const int16_t *base, mve_pred16_t p)
				{
				#ifdef POLYMORPHIC
				return vld1q_z(base, p);
				#else /* POLYMORPHIC */
				return vld1q_z_s16(base, p);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vld1q_z_s32(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[BASE:%.*]] to <4 x i32>*
				// CHECK-NEXT: [[TMP1:%.*]] = zext i16 [[P:%.*]] to i32
				// CHECK-NEXT: [[TMP2:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP1]])
				// CHECK-NEXT: [[TMP3:%.*]] = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* [[TMP0]], i32 4, <4 x i1> [[TMP2]], <4 x i32> zeroinitializer)
				// CHECK-NEXT: ret <4 x i32> [[TMP3]]
				//
				int32x4_t test_vld1q_z_s32(const int32_t *base, mve_pred16_t p)
				{
				#ifdef POLYMORPHIC
				return vld1q_z(base, p);
				#else /* POLYMORPHIC */
				return vld1q_z_s32(base, p);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vld1q_z_u8(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast i8* [[BASE:%.*]] to <16 x i8>*
				// CHECK-NEXT: [[TMP1:%.*]] = zext i16 [[P:%.*]] to i32
				// CHECK-NEXT: [[TMP2:%.*]] = call <16 x i1> @llvm.arm.mve.pred.i2v.v16i1(i32 [[TMP1]])
				// CHECK-NEXT: [[TMP3:%.*]] = call <16 x i8> @llvm.masked.load.v16i8.p0v16i8(<16 x i8>* [[TMP0]], i32 1, <16 x i1> [[TMP2]], <16 x i8> zeroinitializer)
				// CHECK-NEXT: ret <16 x i8> [[TMP3]]
				//
				uint8x16_t test_vld1q_z_u8(const uint8_t *base, mve_pred16_t p)
				{
				#ifdef POLYMORPHIC
				return vld1q_z(base, p);
				#else /* POLYMORPHIC */
				return vld1q_z_u8(base, p);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vld1q_z_u16(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast i16* [[BASE:%.*]] to <8 x i16>*
				// CHECK-NEXT: [[TMP1:%.*]] = zext i16 [[P:%.*]] to i32
				// CHECK-NEXT: [[TMP2:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP1]])
				// CHECK-NEXT: [[TMP3:%.*]] = call <8 x i16> @llvm.masked.load.v8i16.p0v8i16(<8 x i16>* [[TMP0]], i32 2, <8 x i1> [[TMP2]], <8 x i16> zeroinitializer)
				// CHECK-NEXT: ret <8 x i16> [[TMP3]]
				//
				uint16x8_t test_vld1q_z_u16(const uint16_t *base, mve_pred16_t p)
				{
				#ifdef POLYMORPHIC
				return vld1q_z(base, p);
				#else /* POLYMORPHIC */
				return vld1q_z_u16(base, p);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vld1q_z_u32(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[BASE:%.*]] to <4 x i32>*
				// CHECK-NEXT: [[TMP1:%.*]] = zext i16 [[P:%.*]] to i32
				// CHECK-NEXT: [[TMP2:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP1]])
				// CHECK-NEXT: [[TMP3:%.*]] = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* [[TMP0]], i32 4, <4 x i1> [[TMP2]], <4 x i32> zeroinitializer)
				// CHECK-NEXT: ret <4 x i32> [[TMP3]]
				//
				uint32x4_t test_vld1q_z_u32(const uint32_t *base, mve_pred16_t p)
				{
				#ifdef POLYMORPHIC
				return vld1q_z(base, p);
				#else /* POLYMORPHIC */
				return vld1q_z_u32(base, p);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vldrbq_s8(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast i8* [[BASE:%.*]] to <16 x i8>*
				// CHECK-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* [[TMP0]], align 1
				// CHECK-NEXT: ret <16 x i8> [[TMP1]]
				//
				int8x16_t test_vldrbq_s8(const int8_t *base)
				{
				return vldrbq_s8(base);
				}

				// CHECK-LABEL: @test_vldrbq_s16(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast i8* [[BASE:%.*]] to <8 x i8>*
				// CHECK-NEXT: [[TMP1:%.*]] = load <8 x i8>, <8 x i8>* [[TMP0]], align 1
				// CHECK-NEXT: [[TMP2:%.*]] = sext <8 x i8> [[TMP1]] to <8 x i16>
				// CHECK-NEXT: ret <8 x i16> [[TMP2]]
				//
				int16x8_t test_vldrbq_s16(const int8_t *base)
				{
				return vldrbq_s16(base);
				}

				// CHECK-LABEL: @test_vldrbq_s32(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast i8* [[BASE:%.*]] to <4 x i8>*
				// CHECK-NEXT: [[TMP1:%.*]] = load <4 x i8>, <4 x i8>* [[TMP0]], align 1
				// CHECK-NEXT: [[TMP2:%.*]] = sext <4 x i8> [[TMP1]] to <4 x i32>
				// CHECK-NEXT: ret <4 x i32> [[TMP2]]
				//
				int32x4_t test_vldrbq_s32(const int8_t *base)
				{
				return vldrbq_s32(base);
				}

				// CHECK-LABEL: @test_vldrbq_u8(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast i8* [[BASE:%.*]] to <16 x i8>*
				// CHECK-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* [[TMP0]], align 1
				// CHECK-NEXT: ret <16 x i8> [[TMP1]]
				//
				uint8x16_t test_vldrbq_u8(const uint8_t *base)
				{
				return vldrbq_u8(base);
				}

				// CHECK-LABEL: @test_vldrbq_u16(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast i8* [[BASE:%.*]] to <8 x i8>*
				// CHECK-NEXT: [[TMP1:%.*]] = load <8 x i8>, <8 x i8>* [[TMP0]], align 1
				// CHECK-NEXT: [[TMP2:%.*]] = zext <8 x i8> [[TMP1]] to <8 x i16>
				// CHECK-NEXT: ret <8 x i16> [[TMP2]]
				//
				uint16x8_t test_vldrbq_u16(const uint8_t *base)
				{
				return vldrbq_u16(base);
				}

				// CHECK-LABEL: @test_vldrbq_u32(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast i8* [[BASE:%.*]] to <4 x i8>*
				// CHECK-NEXT: [[TMP1:%.*]] = load <4 x i8>, <4 x i8>* [[TMP0]], align 1
				// CHECK-NEXT: [[TMP2:%.*]] = zext <4 x i8> [[TMP1]] to <4 x i32>
				// CHECK-NEXT: ret <4 x i32> [[TMP2]]
				//
				uint32x4_t test_vldrbq_u32(const uint8_t *base)
				{
				return vldrbq_u32(base);
				}

				// CHECK-LABEL: @test_vldrbq_z_s8(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast i8* [[BASE:%.*]] to <16 x i8>*
				// CHECK-NEXT: [[TMP1:%.*]] = zext i16 [[P:%.*]] to i32
				// CHECK-NEXT: [[TMP2:%.*]] = call <16 x i1> @llvm.arm.mve.pred.i2v.v16i1(i32 [[TMP1]])
				// CHECK-NEXT: [[TMP3:%.*]] = call <16 x i8> @llvm.masked.load.v16i8.p0v16i8(<16 x i8>* [[TMP0]], i32 1, <16 x i1> [[TMP2]], <16 x i8> zeroinitializer)
				// CHECK-NEXT: ret <16 x i8> [[TMP3]]
				//
				int8x16_t test_vldrbq_z_s8(const int8_t *base, mve_pred16_t p)
				{
				return vldrbq_z_s8(base, p);
				}

				// CHECK-LABEL: @test_vldrbq_z_s16(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast i8* [[BASE:%.*]] to <8 x i8>*
				// CHECK-NEXT: [[TMP1:%.*]] = zext i16 [[P:%.*]] to i32
				// CHECK-NEXT: [[TMP2:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP1]])
				// CHECK-NEXT: [[TMP3:%.*]] = call <8 x i8> @llvm.masked.load.v8i8.p0v8i8(<8 x i8>* [[TMP0]], i32 1, <8 x i1> [[TMP2]], <8 x i8> zeroinitializer)
				// CHECK-NEXT: [[TMP4:%.*]] = sext <8 x i8> [[TMP3]] to <8 x i16>
				// CHECK-NEXT: ret <8 x i16> [[TMP4]]
				//
				int16x8_t test_vldrbq_z_s16(const int8_t *base, mve_pred16_t p)
				{
				return vldrbq_z_s16(base, p);
				}

				// CHECK-LABEL: @test_vldrbq_z_s32(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast i8* [[BASE:%.*]] to <4 x i8>*
				// CHECK-NEXT: [[TMP1:%.*]] = zext i16 [[P:%.*]] to i32
				// CHECK-NEXT: [[TMP2:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP1]])
				// CHECK-NEXT: [[TMP3:%.*]] = call <4 x i8> @llvm.masked.load.v4i8.p0v4i8(<4 x i8>* [[TMP0]], i32 1, <4 x i1> [[TMP2]], <4 x i8> zeroinitializer)
				// CHECK-NEXT: [[TMP4:%.*]] = sext <4 x i8> [[TMP3]] to <4 x i32>
				// CHECK-NEXT: ret <4 x i32> [[TMP4]]
				//
				int32x4_t test_vldrbq_z_s32(const int8_t *base, mve_pred16_t p)
				{
				return vldrbq_z_s32(base, p);
				}

				// CHECK-LABEL: @test_vldrbq_z_u8(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast i8* [[BASE:%.*]] to <16 x i8>*
				// CHECK-NEXT: [[TMP1:%.*]] = zext i16 [[P:%.*]] to i32
				// CHECK-NEXT: [[TMP2:%.*]] = call <16 x i1> @llvm.arm.mve.pred.i2v.v16i1(i32 [[TMP1]])
				// CHECK-NEXT: [[TMP3:%.*]] = call <16 x i8> @llvm.masked.load.v16i8.p0v16i8(<16 x i8>* [[TMP0]], i32 1, <16 x i1> [[TMP2]], <16 x i8> zeroinitializer)
				// CHECK-NEXT: ret <16 x i8> [[TMP3]]
				//
				uint8x16_t test_vldrbq_z_u8(const uint8_t *base, mve_pred16_t p)
				{
				return vldrbq_z_u8(base, p);
				}

				// CHECK-LABEL: @test_vldrbq_z_u16(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast i8* [[BASE:%.*]] to <8 x i8>*
				// CHECK-NEXT: [[TMP1:%.*]] = zext i16 [[P:%.*]] to i32
				// CHECK-NEXT: [[TMP2:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP1]])
				// CHECK-NEXT: [[TMP3:%.*]] = call <8 x i8> @llvm.masked.load.v8i8.p0v8i8(<8 x i8>* [[TMP0]], i32 1, <8 x i1> [[TMP2]], <8 x i8> zeroinitializer)
				// CHECK-NEXT: [[TMP4:%.*]] = zext <8 x i8> [[TMP3]] to <8 x i16>
				// CHECK-NEXT: ret <8 x i16> [[TMP4]]
				//
				uint16x8_t test_vldrbq_z_u16(const uint8_t *base, mve_pred16_t p)
				{
				return vldrbq_z_u16(base, p);
				}

				// CHECK-LABEL: @test_vldrbq_z_u32(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast i8* [[BASE:%.*]] to <4 x i8>*
				// CHECK-NEXT: [[TMP1:%.*]] = zext i16 [[P:%.*]] to i32
				// CHECK-NEXT: [[TMP2:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP1]])
				// CHECK-NEXT: [[TMP3:%.*]] = call <4 x i8> @llvm.masked.load.v4i8.p0v4i8(<4 x i8>* [[TMP0]], i32 1, <4 x i1> [[TMP2]], <4 x i8> zeroinitializer)
				// CHECK-NEXT: [[TMP4:%.*]] = zext <4 x i8> [[TMP3]] to <4 x i32>
				// CHECK-NEXT: ret <4 x i32> [[TMP4]]
				//
				uint32x4_t test_vldrbq_z_u32(const uint8_t *base, mve_pred16_t p)
				{
				return vldrbq_z_u32(base, p);
				}

				// CHECK-LABEL: @test_vldrhq_f16(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast half* [[BASE:%.*]] to <8 x half>*
				// CHECK-NEXT: [[TMP1:%.*]] = load <8 x half>, <8 x half>* [[TMP0]], align 2
				// CHECK-NEXT: ret <8 x half> [[TMP1]]
				//
				float16x8_t test_vldrhq_f16(const float16_t *base)
				{
				return vldrhq_f16(base);
				}

				// CHECK-LABEL: @test_vldrhq_s16(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast i16* [[BASE:%.*]] to <8 x i16>*
				// CHECK-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* [[TMP0]], align 2
				// CHECK-NEXT: ret <8 x i16> [[TMP1]]
				//
				int16x8_t test_vldrhq_s16(const int16_t *base)
				{
				return vldrhq_s16(base);
				}

				// CHECK-LABEL: @test_vldrhq_s32(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast i16* [[BASE:%.*]] to <4 x i16>*
				// CHECK-NEXT: [[TMP1:%.*]] = load <4 x i16>, <4 x i16>* [[TMP0]], align 2
				// CHECK-NEXT: [[TMP2:%.*]] = sext <4 x i16> [[TMP1]] to <4 x i32>
				// CHECK-NEXT: ret <4 x i32> [[TMP2]]
				//
				int32x4_t test_vldrhq_s32(const int16_t *base)
				{
				return vldrhq_s32(base);
				}

				// CHECK-LABEL: @test_vldrhq_u16(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast i16* [[BASE:%.*]] to <8 x i16>*
				// CHECK-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* [[TMP0]], align 2
				// CHECK-NEXT: ret <8 x i16> [[TMP1]]
				//
				uint16x8_t test_vldrhq_u16(const uint16_t *base)
				{
				return vldrhq_u16(base);
				}

				// CHECK-LABEL: @test_vldrhq_u32(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast i16* [[BASE:%.*]] to <4 x i16>*
				// CHECK-NEXT: [[TMP1:%.*]] = load <4 x i16>, <4 x i16>* [[TMP0]], align 2
				// CHECK-NEXT: [[TMP2:%.*]] = zext <4 x i16> [[TMP1]] to <4 x i32>
				// CHECK-NEXT: ret <4 x i32> [[TMP2]]
				//
				uint32x4_t test_vldrhq_u32(const uint16_t *base)
				{
				return vldrhq_u32(base);
				}

				// CHECK-LABEL: @test_vldrhq_z_f16(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast half* [[BASE:%.*]] to <8 x half>*
				// CHECK-NEXT: [[TMP1:%.*]] = zext i16 [[P:%.*]] to i32
				// CHECK-NEXT: [[TMP2:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP1]])
				// CHECK-NEXT: [[TMP3:%.*]] = call <8 x half> @llvm.masked.load.v8f16.p0v8f16(<8 x half>* [[TMP0]], i32 2, <8 x i1> [[TMP2]], <8 x half> zeroinitializer)
				// CHECK-NEXT: ret <8 x half> [[TMP3]]
				//
				float16x8_t test_vldrhq_z_f16(const float16_t *base, mve_pred16_t p)
				{
				return vldrhq_z_f16(base, p);
				}

				// CHECK-LABEL: @test_vldrhq_z_s16(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast i16* [[BASE:%.*]] to <8 x i16>*
				// CHECK-NEXT: [[TMP1:%.*]] = zext i16 [[P:%.*]] to i32
				// CHECK-NEXT: [[TMP2:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP1]])
				// CHECK-NEXT: [[TMP3:%.*]] = call <8 x i16> @llvm.masked.load.v8i16.p0v8i16(<8 x i16>* [[TMP0]], i32 2, <8 x i1> [[TMP2]], <8 x i16> zeroinitializer)
				// CHECK-NEXT: ret <8 x i16> [[TMP3]]
				//
				int16x8_t test_vldrhq_z_s16(const int16_t *base, mve_pred16_t p)
				{
				return vldrhq_z_s16(base, p);
				}

				// CHECK-LABEL: @test_vldrhq_z_s32(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast i16* [[BASE:%.*]] to <4 x i16>*
				// CHECK-NEXT: [[TMP1:%.*]] = zext i16 [[P:%.*]] to i32
				// CHECK-NEXT: [[TMP2:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP1]])
				// CHECK-NEXT: [[TMP3:%.*]] = call <4 x i16> @llvm.masked.load.v4i16.p0v4i16(<4 x i16>* [[TMP0]], i32 2, <4 x i1> [[TMP2]], <4 x i16> zeroinitializer)
				// CHECK-NEXT: [[TMP4:%.*]] = sext <4 x i16> [[TMP3]] to <4 x i32>
				// CHECK-NEXT: ret <4 x i32> [[TMP4]]
				//
				int32x4_t test_vldrhq_z_s32(const int16_t *base, mve_pred16_t p)
				{
				return vldrhq_z_s32(base, p);
				}

				// CHECK-LABEL: @test_vldrhq_z_u16(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast i16* [[BASE:%.*]] to <8 x i16>*
				// CHECK-NEXT: [[TMP1:%.*]] = zext i16 [[P:%.*]] to i32
				// CHECK-NEXT: [[TMP2:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP1]])
				// CHECK-NEXT: [[TMP3:%.*]] = call <8 x i16> @llvm.masked.load.v8i16.p0v8i16(<8 x i16>* [[TMP0]], i32 2, <8 x i1> [[TMP2]], <8 x i16> zeroinitializer)
				// CHECK-NEXT: ret <8 x i16> [[TMP3]]
				//
				uint16x8_t test_vldrhq_z_u16(const uint16_t *base, mve_pred16_t p)
				{
				return vldrhq_z_u16(base, p);
				}

				// CHECK-LABEL: @test_vldrhq_z_u32(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast i16* [[BASE:%.*]] to <4 x i16>*
				// CHECK-NEXT: [[TMP1:%.*]] = zext i16 [[P:%.*]] to i32
				// CHECK-NEXT: [[TMP2:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP1]])
				// CHECK-NEXT: [[TMP3:%.*]] = call <4 x i16> @llvm.masked.load.v4i16.p0v4i16(<4 x i16>* [[TMP0]], i32 2, <4 x i1> [[TMP2]], <4 x i16> zeroinitializer)
				// CHECK-NEXT: [[TMP4:%.*]] = zext <4 x i16> [[TMP3]] to <4 x i32>
				// CHECK-NEXT: ret <4 x i32> [[TMP4]]
				//
				uint32x4_t test_vldrhq_z_u32(const uint16_t *base, mve_pred16_t p)
				{
				return vldrhq_z_u32(base, p);
				}

				// CHECK-LABEL: @test_vldrwq_f32(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[BASE:%.*]] to <4 x float>*
				// CHECK-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* [[TMP0]], align 4
				// CHECK-NEXT: ret <4 x float> [[TMP1]]
				//
				float32x4_t test_vldrwq_f32(const float32_t *base)
				{
				return vldrwq_f32(base);
				}

				// CHECK-LABEL: @test_vldrwq_s32(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[BASE:%.*]] to <4 x i32>*
				// CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4
				// CHECK-NEXT: ret <4 x i32> [[TMP1]]
				//
				int32x4_t test_vldrwq_s32(const int32_t *base)
				{
				return vldrwq_s32(base);
				}

				// CHECK-LABEL: @test_vldrwq_u32(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[BASE:%.*]] to <4 x i32>*
				// CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4
				// CHECK-NEXT: ret <4 x i32> [[TMP1]]
				//
				uint32x4_t test_vldrwq_u32(const uint32_t *base)
				{
				return vldrwq_u32(base);
				}

				// CHECK-LABEL: @test_vldrwq_z_f32(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[BASE:%.*]] to <4 x float>*
				// CHECK-NEXT: [[TMP1:%.*]] = zext i16 [[P:%.*]] to i32
				// CHECK-NEXT: [[TMP2:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP1]])
				// CHECK-NEXT: [[TMP3:%.*]] = call <4 x float> @llvm.masked.load.v4f32.p0v4f32(<4 x float>* [[TMP0]], i32 4, <4 x i1> [[TMP2]], <4 x float> zeroinitializer)
				// CHECK-NEXT: ret <4 x float> [[TMP3]]
				//
				float32x4_t test_vldrwq_z_f32(const float32_t *base, mve_pred16_t p)
				{
				return vldrwq_z_f32(base, p);
				}

				// CHECK-LABEL: @test_vldrwq_z_s32(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[BASE:%.*]] to <4 x i32>*
				// CHECK-NEXT: [[TMP1:%.*]] = zext i16 [[P:%.*]] to i32
				// CHECK-NEXT: [[TMP2:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP1]])
				// CHECK-NEXT: [[TMP3:%.*]] = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* [[TMP0]], i32 4, <4 x i1> [[TMP2]], <4 x i32> zeroinitializer)
				// CHECK-NEXT: ret <4 x i32> [[TMP3]]
				//
				int32x4_t test_vldrwq_z_s32(const int32_t *base, mve_pred16_t p)
				{
				return vldrwq_z_s32(base, p);
				}

				// CHECK-LABEL: @test_vldrwq_z_u32(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[BASE:%.*]] to <4 x i32>*
				// CHECK-NEXT: [[TMP1:%.*]] = zext i16 [[P:%.*]] to i32
				// CHECK-NEXT: [[TMP2:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP1]])
				// CHECK-NEXT: [[TMP3:%.*]] = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* [[TMP0]], i32 4, <4 x i1> [[TMP2]], <4 x i32> zeroinitializer)
				// CHECK-NEXT: ret <4 x i32> [[TMP3]]
				//
				uint32x4_t test_vldrwq_z_u32(const uint32_t *base, mve_pred16_t p)
				{
				return vldrwq_z_u32(base, p);
				}

				// CHECK-LABEL: @test_vst1q_f16(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast half* [[BASE:%.*]] to <8 x half>*
				// CHECK-NEXT: store <8 x half> [[VALUE:%.*]], <8 x half>* [[TMP0]], align 2
				// CHECK-NEXT: ret void
				//
				void test_vst1q_f16(float16_t *base, float16x8_t value)
				{
				#ifdef POLYMORPHIC
				vst1q(base, value);
				#else /* POLYMORPHIC */
				vst1q_f16(base, value);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vst1q_f32(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[BASE:%.*]] to <4 x float>*
				// CHECK-NEXT: store <4 x float> [[VALUE:%.*]], <4 x float>* [[TMP0]], align 4
				// CHECK-NEXT: ret void
				//
				void test_vst1q_f32(float32_t *base, float32x4_t value)
				{
				#ifdef POLYMORPHIC
				vst1q(base, value);
				#else /* POLYMORPHIC */
				vst1q_f32(base, value);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vst1q_s8(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast i8* [[BASE:%.*]] to <16 x i8>*
				// CHECK-NEXT: store <16 x i8> [[VALUE:%.*]], <16 x i8>* [[TMP0]], align 1
				// CHECK-NEXT: ret void
				//
				void test_vst1q_s8(int8_t *base, int8x16_t value)
				{
				#ifdef POLYMORPHIC
				vst1q(base, value);
				#else /* POLYMORPHIC */
				vst1q_s8(base, value);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vst1q_s16(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast i16* [[BASE:%.*]] to <8 x i16>*
				// CHECK-NEXT: store <8 x i16> [[VALUE:%.*]], <8 x i16>* [[TMP0]], align 2
				// CHECK-NEXT: ret void
				//
				void test_vst1q_s16(int16_t *base, int16x8_t value)
				{
				#ifdef POLYMORPHIC
				vst1q(base, value);
				#else /* POLYMORPHIC */
				vst1q_s16(base, value);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vst1q_s32(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[BASE:%.*]] to <4 x i32>*
				// CHECK-NEXT: store <4 x i32> [[VALUE:%.*]], <4 x i32>* [[TMP0]], align 4
				// CHECK-NEXT: ret void
				//
				void test_vst1q_s32(int32_t *base, int32x4_t value)
				{
				#ifdef POLYMORPHIC
				vst1q(base, value);
				#else /* POLYMORPHIC */
				vst1q_s32(base, value);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vst1q_u8(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast i8* [[BASE:%.*]] to <16 x i8>*
				// CHECK-NEXT: store <16 x i8> [[VALUE:%.*]], <16 x i8>* [[TMP0]], align 1
				// CHECK-NEXT: ret void
				//
				void test_vst1q_u8(uint8_t *base, uint8x16_t value)
				{
				#ifdef POLYMORPHIC
				vst1q(base, value);
				#else /* POLYMORPHIC */
				vst1q_u8(base, value);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vst1q_u16(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast i16* [[BASE:%.*]] to <8 x i16>*
				// CHECK-NEXT: store <8 x i16> [[VALUE:%.*]], <8 x i16>* [[TMP0]], align 2
				// CHECK-NEXT: ret void
				//
				void test_vst1q_u16(uint16_t *base, uint16x8_t value)
				{
				#ifdef POLYMORPHIC
				vst1q(base, value);
				#else /* POLYMORPHIC */
				vst1q_u16(base, value);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vst1q_u32(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[BASE:%.*]] to <4 x i32>*
				// CHECK-NEXT: store <4 x i32> [[VALUE:%.*]], <4 x i32>* [[TMP0]], align 4
				// CHECK-NEXT: ret void
				//
				void test_vst1q_u32(uint32_t *base, uint32x4_t value)
				{
				#ifdef POLYMORPHIC
				vst1q(base, value);
				#else /* POLYMORPHIC */
				vst1q_u32(base, value);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vst1q_p_f16(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast half* [[BASE:%.*]] to <8 x half>*
				// CHECK-NEXT: [[TMP1:%.*]] = zext i16 [[P:%.*]] to i32
				// CHECK-NEXT: [[TMP2:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP1]])
				// CHECK-NEXT: call void @llvm.masked.store.v8f16.p0v8f16(<8 x half> [[VALUE:%.*]], <8 x half>* [[TMP0]], i32 2, <8 x i1> [[TMP2]])
				// CHECK-NEXT: ret void
				//
				void test_vst1q_p_f16(float16_t *base, float16x8_t value, mve_pred16_t p)
				{
				#ifdef POLYMORPHIC
				vst1q_p(base, value, p);
				#else /* POLYMORPHIC */
				vst1q_p_f16(base, value, p);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vst1q_p_f32(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[BASE:%.*]] to <4 x float>*
				// CHECK-NEXT: [[TMP1:%.*]] = zext i16 [[P:%.*]] to i32
				// CHECK-NEXT: [[TMP2:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP1]])
				// CHECK-NEXT: call void @llvm.masked.store.v4f32.p0v4f32(<4 x float> [[VALUE:%.*]], <4 x float>* [[TMP0]], i32 4, <4 x i1> [[TMP2]])
				// CHECK-NEXT: ret void
				//
				void test_vst1q_p_f32(float32_t *base, float32x4_t value, mve_pred16_t p)
				{
				#ifdef POLYMORPHIC
				vst1q_p(base, value, p);
				#else /* POLYMORPHIC */
				vst1q_p_f32(base, value, p);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vst1q_p_s8(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast i8* [[BASE:%.*]] to <16 x i8>*
				// CHECK-NEXT: [[TMP1:%.*]] = zext i16 [[P:%.*]] to i32
				// CHECK-NEXT: [[TMP2:%.*]] = call <16 x i1> @llvm.arm.mve.pred.i2v.v16i1(i32 [[TMP1]])
				// CHECK-NEXT: call void @llvm.masked.store.v16i8.p0v16i8(<16 x i8> [[VALUE:%.*]], <16 x i8>* [[TMP0]], i32 1, <16 x i1> [[TMP2]])
				// CHECK-NEXT: ret void
				//
				void test_vst1q_p_s8(int8_t *base, int8x16_t value, mve_pred16_t p)
				{
				#ifdef POLYMORPHIC
				vst1q_p(base, value, p);
				#else /* POLYMORPHIC */
				vst1q_p_s8(base, value, p);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vst1q_p_s16(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast i16* [[BASE:%.*]] to <8 x i16>*
				// CHECK-NEXT: [[TMP1:%.*]] = zext i16 [[P:%.*]] to i32
				// CHECK-NEXT: [[TMP2:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP1]])
				// CHECK-NEXT: call void @llvm.masked.store.v8i16.p0v8i16(<8 x i16> [[VALUE:%.*]], <8 x i16>* [[TMP0]], i32 2, <8 x i1> [[TMP2]])
				// CHECK-NEXT: ret void
				//
				void test_vst1q_p_s16(int16_t *base, int16x8_t value, mve_pred16_t p)
				{
				#ifdef POLYMORPHIC
				vst1q_p(base, value, p);
				#else /* POLYMORPHIC */
				vst1q_p_s16(base, value, p);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vst1q_p_s32(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[BASE:%.*]] to <4 x i32>*
				// CHECK-NEXT: [[TMP1:%.*]] = zext i16 [[P:%.*]] to i32
				// CHECK-NEXT: [[TMP2:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP1]])
				// CHECK-NEXT: call void @llvm.masked.store.v4i32.p0v4i32(<4 x i32> [[VALUE:%.*]], <4 x i32>* [[TMP0]], i32 4, <4 x i1> [[TMP2]])
				// CHECK-NEXT: ret void
				//
				void test_vst1q_p_s32(int32_t *base, int32x4_t value, mve_pred16_t p)
				{
				#ifdef POLYMORPHIC
				vst1q_p(base, value, p);
				#else /* POLYMORPHIC */
				vst1q_p_s32(base, value, p);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vst1q_p_u8(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast i8* [[BASE:%.*]] to <16 x i8>*
				// CHECK-NEXT: [[TMP1:%.*]] = zext i16 [[P:%.*]] to i32
				// CHECK-NEXT: [[TMP2:%.*]] = call <16 x i1> @llvm.arm.mve.pred.i2v.v16i1(i32 [[TMP1]])
				// CHECK-NEXT: call void @llvm.masked.store.v16i8.p0v16i8(<16 x i8> [[VALUE:%.*]], <16 x i8>* [[TMP0]], i32 1, <16 x i1> [[TMP2]])
				// CHECK-NEXT: ret void
				//
				void test_vst1q_p_u8(uint8_t *base, uint8x16_t value, mve_pred16_t p)
				{
				#ifdef POLYMORPHIC
				vst1q_p(base, value, p);
				#else /* POLYMORPHIC */
				vst1q_p_u8(base, value, p);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vst1q_p_u16(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast i16* [[BASE:%.*]] to <8 x i16>*
				// CHECK-NEXT: [[TMP1:%.*]] = zext i16 [[P:%.*]] to i32
				// CHECK-NEXT: [[TMP2:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP1]])
				// CHECK-NEXT: call void @llvm.masked.store.v8i16.p0v8i16(<8 x i16> [[VALUE:%.*]], <8 x i16>* [[TMP0]], i32 2, <8 x i1> [[TMP2]])
				// CHECK-NEXT: ret void
				//
				void test_vst1q_p_u16(uint16_t *base, uint16x8_t value, mve_pred16_t p)
				{
				#ifdef POLYMORPHIC
				vst1q_p(base, value, p);
				#else /* POLYMORPHIC */
				vst1q_p_u16(base, value, p);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vst1q_p_u32(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[BASE:%.*]] to <4 x i32>*
				// CHECK-NEXT: [[TMP1:%.*]] = zext i16 [[P:%.*]] to i32
				// CHECK-NEXT: [[TMP2:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP1]])
				// CHECK-NEXT: call void @llvm.masked.store.v4i32.p0v4i32(<4 x i32> [[VALUE:%.*]], <4 x i32>* [[TMP0]], i32 4, <4 x i1> [[TMP2]])
				// CHECK-NEXT: ret void
				//
				void test_vst1q_p_u32(uint32_t *base, uint32x4_t value, mve_pred16_t p)
				{
				#ifdef POLYMORPHIC
				vst1q_p(base, value, p);
				#else /* POLYMORPHIC */
				vst1q_p_u32(base, value, p);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vstrbq_s8(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast i8* [[BASE:%.*]] to <16 x i8>*
				// CHECK-NEXT: store <16 x i8> [[VALUE:%.*]], <16 x i8>* [[TMP0]], align 1
				// CHECK-NEXT: ret void
				//
				void test_vstrbq_s8(int8_t *base, int8x16_t value)
				{
				#ifdef POLYMORPHIC
				vstrbq(base, value);
				#else /* POLYMORPHIC */
				vstrbq_s8(base, value);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vstrbq_s16(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = trunc <8 x i16> [[VALUE:%.*]] to <8 x i8>
				// CHECK-NEXT: [[TMP1:%.*]] = bitcast i8* [[BASE:%.*]] to <8 x i8>*
				// CHECK-NEXT: store <8 x i8> [[TMP0]], <8 x i8>* [[TMP1]], align 1
				// CHECK-NEXT: ret void
				//
				void test_vstrbq_s16(int8_t *base, int16x8_t value)
				{
				#ifdef POLYMORPHIC
				vstrbq(base, value);
				#else /* POLYMORPHIC */
				vstrbq_s16(base, value);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vstrbq_s32(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = trunc <4 x i32> [[VALUE:%.*]] to <4 x i8>
				// CHECK-NEXT: [[TMP1:%.*]] = bitcast i8* [[BASE:%.*]] to <4 x i8>*
				// CHECK-NEXT: store <4 x i8> [[TMP0]], <4 x i8>* [[TMP1]], align 1
				// CHECK-NEXT: ret void
				//
				void test_vstrbq_s32(int8_t *base, int32x4_t value)
				{
				#ifdef POLYMORPHIC
				vstrbq(base, value);
				#else /* POLYMORPHIC */
				vstrbq_s32(base, value);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vstrbq_u8(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast i8* [[BASE:%.*]] to <16 x i8>*
				// CHECK-NEXT: store <16 x i8> [[VALUE:%.*]], <16 x i8>* [[TMP0]], align 1
				// CHECK-NEXT: ret void
				//
				void test_vstrbq_u8(uint8_t *base, uint8x16_t value)
				{
				#ifdef POLYMORPHIC
				vstrbq(base, value);
				#else /* POLYMORPHIC */
				vstrbq_u8(base, value);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vstrbq_u16(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = trunc <8 x i16> [[VALUE:%.*]] to <8 x i8>
				// CHECK-NEXT: [[TMP1:%.*]] = bitcast i8* [[BASE:%.*]] to <8 x i8>*
				// CHECK-NEXT: store <8 x i8> [[TMP0]], <8 x i8>* [[TMP1]], align 1
				// CHECK-NEXT: ret void
				//
				void test_vstrbq_u16(uint8_t *base, uint16x8_t value)
				{
				#ifdef POLYMORPHIC
				vstrbq(base, value);
				#else /* POLYMORPHIC */
				vstrbq_u16(base, value);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vstrbq_u32(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = trunc <4 x i32> [[VALUE:%.*]] to <4 x i8>
				// CHECK-NEXT: [[TMP1:%.*]] = bitcast i8* [[BASE:%.*]] to <4 x i8>*
				// CHECK-NEXT: store <4 x i8> [[TMP0]], <4 x i8>* [[TMP1]], align 1
				// CHECK-NEXT: ret void
				//
				void test_vstrbq_u32(uint8_t *base, uint32x4_t value)
				{
				#ifdef POLYMORPHIC
				vstrbq(base, value);
				#else /* POLYMORPHIC */
				vstrbq_u32(base, value);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vstrbq_p_s8(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast i8* [[BASE:%.*]] to <16 x i8>*
				// CHECK-NEXT: [[TMP1:%.*]] = zext i16 [[P:%.*]] to i32
				// CHECK-NEXT: [[TMP2:%.*]] = call <16 x i1> @llvm.arm.mve.pred.i2v.v16i1(i32 [[TMP1]])
				// CHECK-NEXT: call void @llvm.masked.store.v16i8.p0v16i8(<16 x i8> [[VALUE:%.*]], <16 x i8>* [[TMP0]], i32 1, <16 x i1> [[TMP2]])
				// CHECK-NEXT: ret void
				//
				void test_vstrbq_p_s8(int8_t *base, int8x16_t value, mve_pred16_t p)
				{
				#ifdef POLYMORPHIC
				vstrbq_p(base, value, p);
				#else /* POLYMORPHIC */
				vstrbq_p_s8(base, value, p);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vstrbq_p_s16(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = trunc <8 x i16> [[VALUE:%.*]] to <8 x i8>
				// CHECK-NEXT: [[TMP1:%.*]] = bitcast i8* [[BASE:%.*]] to <8 x i8>*
				// CHECK-NEXT: [[TMP2:%.*]] = zext i16 [[P:%.*]] to i32
				// CHECK-NEXT: [[TMP3:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP2]])
				// CHECK-NEXT: call void @llvm.masked.store.v8i8.p0v8i8(<8 x i8> [[TMP0]], <8 x i8>* [[TMP1]], i32 1, <8 x i1> [[TMP3]])
				// CHECK-NEXT: ret void
				//
				void test_vstrbq_p_s16(int8_t *base, int16x8_t value, mve_pred16_t p)
				{
				#ifdef POLYMORPHIC
				vstrbq_p(base, value, p);
				#else /* POLYMORPHIC */
				vstrbq_p_s16(base, value, p);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vstrbq_p_s32(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = trunc <4 x i32> [[VALUE:%.*]] to <4 x i8>
				// CHECK-NEXT: [[TMP1:%.*]] = bitcast i8* [[BASE:%.*]] to <4 x i8>*
				// CHECK-NEXT: [[TMP2:%.*]] = zext i16 [[P:%.*]] to i32
				// CHECK-NEXT: [[TMP3:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP2]])
				// CHECK-NEXT: call void @llvm.masked.store.v4i8.p0v4i8(<4 x i8> [[TMP0]], <4 x i8>* [[TMP1]], i32 1, <4 x i1> [[TMP3]])
				// CHECK-NEXT: ret void
				//
				void test_vstrbq_p_s32(int8_t *base, int32x4_t value, mve_pred16_t p)
				{
				#ifdef POLYMORPHIC
				vstrbq_p(base, value, p);
				#else /* POLYMORPHIC */
				vstrbq_p_s32(base, value, p);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vstrbq_p_u8(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast i8* [[BASE:%.*]] to <16 x i8>*
				// CHECK-NEXT: [[TMP1:%.*]] = zext i16 [[P:%.*]] to i32
				// CHECK-NEXT: [[TMP2:%.*]] = call <16 x i1> @llvm.arm.mve.pred.i2v.v16i1(i32 [[TMP1]])
				// CHECK-NEXT: call void @llvm.masked.store.v16i8.p0v16i8(<16 x i8> [[VALUE:%.*]], <16 x i8>* [[TMP0]], i32 1, <16 x i1> [[TMP2]])
				// CHECK-NEXT: ret void
				//
				void test_vstrbq_p_u8(uint8_t *base, uint8x16_t value, mve_pred16_t p)
				{
				#ifdef POLYMORPHIC
				vstrbq_p(base, value, p);
				#else /* POLYMORPHIC */
				vstrbq_p_u8(base, value, p);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vstrbq_p_u16(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = trunc <8 x i16> [[VALUE:%.*]] to <8 x i8>
				// CHECK-NEXT: [[TMP1:%.*]] = bitcast i8* [[BASE:%.*]] to <8 x i8>*
				// CHECK-NEXT: [[TMP2:%.*]] = zext i16 [[P:%.*]] to i32
				// CHECK-NEXT: [[TMP3:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP2]])
				// CHECK-NEXT: call void @llvm.masked.store.v8i8.p0v8i8(<8 x i8> [[TMP0]], <8 x i8>* [[TMP1]], i32 1, <8 x i1> [[TMP3]])
				// CHECK-NEXT: ret void
				//
				void test_vstrbq_p_u16(uint8_t *base, uint16x8_t value, mve_pred16_t p)
				{
				#ifdef POLYMORPHIC
				vstrbq_p(base, value, p);
				#else /* POLYMORPHIC */
				vstrbq_p_u16(base, value, p);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vstrbq_p_u32(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = trunc <4 x i32> [[VALUE:%.*]] to <4 x i8>
				// CHECK-NEXT: [[TMP1:%.*]] = bitcast i8* [[BASE:%.*]] to <4 x i8>*
				// CHECK-NEXT: [[TMP2:%.*]] = zext i16 [[P:%.*]] to i32
				// CHECK-NEXT: [[TMP3:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP2]])
				// CHECK-NEXT: call void @llvm.masked.store.v4i8.p0v4i8(<4 x i8> [[TMP0]], <4 x i8>* [[TMP1]], i32 1, <4 x i1> [[TMP3]])
				// CHECK-NEXT: ret void
				//
				void test_vstrbq_p_u32(uint8_t *base, uint32x4_t value, mve_pred16_t p)
				{
				#ifdef POLYMORPHIC
				vstrbq_p(base, value, p);
				#else /* POLYMORPHIC */
				vstrbq_p_u32(base, value, p);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vstrhq_f16(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast half* [[BASE:%.*]] to <8 x half>*
				// CHECK-NEXT: store <8 x half> [[VALUE:%.*]], <8 x half>* [[TMP0]], align 2
				// CHECK-NEXT: ret void
				//
				void test_vstrhq_f16(float16_t *base, float16x8_t value)
				{
				#ifdef POLYMORPHIC
				vstrhq(base, value);
				#else /* POLYMORPHIC */
				vstrhq_f16(base, value);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vstrhq_s16(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast i16* [[BASE:%.*]] to <8 x i16>*
				// CHECK-NEXT: store <8 x i16> [[VALUE:%.*]], <8 x i16>* [[TMP0]], align 2
				// CHECK-NEXT: ret void
				//
				void test_vstrhq_s16(int16_t *base, int16x8_t value)
				{
				#ifdef POLYMORPHIC
				vstrhq(base, value);
				#else /* POLYMORPHIC */
				vstrhq_s16(base, value);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vstrhq_s32(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = trunc <4 x i32> [[VALUE:%.*]] to <4 x i16>
				// CHECK-NEXT: [[TMP1:%.*]] = bitcast i16* [[BASE:%.*]] to <4 x i16>*
				// CHECK-NEXT: store <4 x i16> [[TMP0]], <4 x i16>* [[TMP1]], align 2
				// CHECK-NEXT: ret void
				//
				void test_vstrhq_s32(int16_t *base, int32x4_t value)
				{
				#ifdef POLYMORPHIC
				vstrhq(base, value);
				#else /* POLYMORPHIC */
				vstrhq_s32(base, value);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vstrhq_u16(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast i16* [[BASE:%.*]] to <8 x i16>*
				// CHECK-NEXT: store <8 x i16> [[VALUE:%.*]], <8 x i16>* [[TMP0]], align 2
				// CHECK-NEXT: ret void
				//
				void test_vstrhq_u16(uint16_t *base, uint16x8_t value)
				{
				#ifdef POLYMORPHIC
				vstrhq(base, value);
				#else /* POLYMORPHIC */
				vstrhq_u16(base, value);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vstrhq_u32(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = trunc <4 x i32> [[VALUE:%.*]] to <4 x i16>
				// CHECK-NEXT: [[TMP1:%.*]] = bitcast i16* [[BASE:%.*]] to <4 x i16>*
				// CHECK-NEXT: store <4 x i16> [[TMP0]], <4 x i16>* [[TMP1]], align 2
				// CHECK-NEXT: ret void
				//
				void test_vstrhq_u32(uint16_t *base, uint32x4_t value)
				{
				#ifdef POLYMORPHIC
				vstrhq(base, value);
				#else /* POLYMORPHIC */
				vstrhq_u32(base, value);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vstrhq_p_f16(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast half* [[BASE:%.*]] to <8 x half>*
				// CHECK-NEXT: [[TMP1:%.*]] = zext i16 [[P:%.*]] to i32
				// CHECK-NEXT: [[TMP2:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP1]])
				// CHECK-NEXT: call void @llvm.masked.store.v8f16.p0v8f16(<8 x half> [[VALUE:%.*]], <8 x half>* [[TMP0]], i32 2, <8 x i1> [[TMP2]])
				// CHECK-NEXT: ret void
				//
				void test_vstrhq_p_f16(float16_t *base, float16x8_t value, mve_pred16_t p)
				{
				#ifdef POLYMORPHIC
				vstrhq_p(base, value, p);
				#else /* POLYMORPHIC */
				vstrhq_p_f16(base, value, p);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vstrhq_p_s16(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast i16* [[BASE:%.*]] to <8 x i16>*
				// CHECK-NEXT: [[TMP1:%.*]] = zext i16 [[P:%.*]] to i32
				// CHECK-NEXT: [[TMP2:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP1]])
				// CHECK-NEXT: call void @llvm.masked.store.v8i16.p0v8i16(<8 x i16> [[VALUE:%.*]], <8 x i16>* [[TMP0]], i32 2, <8 x i1> [[TMP2]])
				// CHECK-NEXT: ret void
				//
				void test_vstrhq_p_s16(int16_t *base, int16x8_t value, mve_pred16_t p)
				{
				#ifdef POLYMORPHIC
				vstrhq_p(base, value, p);
				#else /* POLYMORPHIC */
				vstrhq_p_s16(base, value, p);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vstrhq_p_s32(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = trunc <4 x i32> [[VALUE:%.*]] to <4 x i16>
				// CHECK-NEXT: [[TMP1:%.*]] = bitcast i16* [[BASE:%.*]] to <4 x i16>*
				// CHECK-NEXT: [[TMP2:%.*]] = zext i16 [[P:%.*]] to i32
				// CHECK-NEXT: [[TMP3:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP2]])
				// CHECK-NEXT: call void @llvm.masked.store.v4i16.p0v4i16(<4 x i16> [[TMP0]], <4 x i16>* [[TMP1]], i32 2, <4 x i1> [[TMP3]])
				// CHECK-NEXT: ret void
				//
				void test_vstrhq_p_s32(int16_t *base, int32x4_t value, mve_pred16_t p)
				{
				#ifdef POLYMORPHIC
				vstrhq_p(base, value, p);
				#else /* POLYMORPHIC */
				vstrhq_p_s32(base, value, p);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vstrhq_p_u16(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast i16* [[BASE:%.*]] to <8 x i16>*
				// CHECK-NEXT: [[TMP1:%.*]] = zext i16 [[P:%.*]] to i32
				// CHECK-NEXT: [[TMP2:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP1]])
				// CHECK-NEXT: call void @llvm.masked.store.v8i16.p0v8i16(<8 x i16> [[VALUE:%.*]], <8 x i16>* [[TMP0]], i32 2, <8 x i1> [[TMP2]])
				// CHECK-NEXT: ret void
				//
				void test_vstrhq_p_u16(uint16_t *base, uint16x8_t value, mve_pred16_t p)
				{
				#ifdef POLYMORPHIC
				vstrhq_p(base, value, p);
				#else /* POLYMORPHIC */
				vstrhq_p_u16(base, value, p);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vstrhq_p_u32(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = trunc <4 x i32> [[VALUE:%.*]] to <4 x i16>
				// CHECK-NEXT: [[TMP1:%.*]] = bitcast i16* [[BASE:%.*]] to <4 x i16>*
				// CHECK-NEXT: [[TMP2:%.*]] = zext i16 [[P:%.*]] to i32
				// CHECK-NEXT: [[TMP3:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP2]])
				// CHECK-NEXT: call void @llvm.masked.store.v4i16.p0v4i16(<4 x i16> [[TMP0]], <4 x i16>* [[TMP1]], i32 2, <4 x i1> [[TMP3]])
				// CHECK-NEXT: ret void
				//
				void test_vstrhq_p_u32(uint16_t *base, uint32x4_t value, mve_pred16_t p)
				{
				#ifdef POLYMORPHIC
				vstrhq_p(base, value, p);
				#else /* POLYMORPHIC */
				vstrhq_p_u32(base, value, p);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vstrwq_f32(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[BASE:%.*]] to <4 x float>*
				// CHECK-NEXT: store <4 x float> [[VALUE:%.*]], <4 x float>* [[TMP0]], align 4
				// CHECK-NEXT: ret void
				//
				void test_vstrwq_f32(float32_t *base, float32x4_t value)
				{
				#ifdef POLYMORPHIC
				vstrwq(base, value);
				#else /* POLYMORPHIC */
				vstrwq_f32(base, value);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vstrwq_s32(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[BASE:%.*]] to <4 x i32>*
				// CHECK-NEXT: store <4 x i32> [[VALUE:%.*]], <4 x i32>* [[TMP0]], align 4
				// CHECK-NEXT: ret void
				//
				void test_vstrwq_s32(int32_t *base, int32x4_t value)
				{
				#ifdef POLYMORPHIC
				vstrwq(base, value);
				#else /* POLYMORPHIC */
				vstrwq_s32(base, value);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vstrwq_u32(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[BASE:%.*]] to <4 x i32>*
				// CHECK-NEXT: store <4 x i32> [[VALUE:%.*]], <4 x i32>* [[TMP0]], align 4
				// CHECK-NEXT: ret void
				//
				void test_vstrwq_u32(uint32_t *base, uint32x4_t value)
				{
				#ifdef POLYMORPHIC
				vstrwq(base, value);
				#else /* POLYMORPHIC */
				vstrwq_u32(base, value);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vstrwq_p_f32(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[BASE:%.*]] to <4 x float>*
				// CHECK-NEXT: [[TMP1:%.*]] = zext i16 [[P:%.*]] to i32
				// CHECK-NEXT: [[TMP2:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP1]])
				// CHECK-NEXT: call void @llvm.masked.store.v4f32.p0v4f32(<4 x float> [[VALUE:%.*]], <4 x float>* [[TMP0]], i32 4, <4 x i1> [[TMP2]])
				// CHECK-NEXT: ret void
				//
				void test_vstrwq_p_f32(float32_t *base, float32x4_t value, mve_pred16_t p)
				{
				#ifdef POLYMORPHIC
				vstrwq_p(base, value, p);
				#else /* POLYMORPHIC */
				vstrwq_p_f32(base, value, p);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vstrwq_p_s32(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[BASE:%.*]] to <4 x i32>*
				// CHECK-NEXT: [[TMP1:%.*]] = zext i16 [[P:%.*]] to i32
				// CHECK-NEXT: [[TMP2:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP1]])
				// CHECK-NEXT: call void @llvm.masked.store.v4i32.p0v4i32(<4 x i32> [[VALUE:%.*]], <4 x i32>* [[TMP0]], i32 4, <4 x i1> [[TMP2]])
				// CHECK-NEXT: ret void
				//
				void test_vstrwq_p_s32(int32_t *base, int32x4_t value, mve_pred16_t p)
				{
				#ifdef POLYMORPHIC
				vstrwq_p(base, value, p);
				#else /* POLYMORPHIC */
				vstrwq_p_s32(base, value, p);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vstrwq_p_u32(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[BASE:%.*]] to <4 x i32>*
				// CHECK-NEXT: [[TMP1:%.*]] = zext i16 [[P:%.*]] to i32
				// CHECK-NEXT: [[TMP2:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP1]])
				// CHECK-NEXT: call void @llvm.masked.store.v4i32.p0v4i32(<4 x i32> [[VALUE:%.*]], <4 x i32>* [[TMP0]], i32 4, <4 x i1> [[TMP2]])
				// CHECK-NEXT: ret void
				//
				void test_vstrwq_p_u32(uint32_t *base, uint32x4_t value, mve_pred16_t p)
				{
				#ifdef POLYMORPHIC
				vstrwq_p(base, value, p);
				#else /* POLYMORPHIC */
				vstrwq_p_u32(base, value, p);
				#endif /* POLYMORPHIC */
				}
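
The narrowing and predicated store tests above all check the same two IR shapes: a `trunc` followed by a short contiguous store, and the same thing routed through `@llvm.masked.store`. A rough scalar model may make the expected behaviour easier to read. This is only a sketch: the function names `model_vstrhq_u32` and `model_vstrhq_p_u32` are hypothetical, and the mapping from the 16-bit predicate to four lanes (lane i taken from bit 4*i) is an assumption for illustration, not something this diff states.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of the IR checked for test_vstrhq_u32:
// truncate each 32-bit lane to 16 bits, then store 4 contiguous halfwords.
void model_vstrhq_u32(uint16_t *base, const std::array<uint32_t, 4> &value) {
  assert(base != nullptr);
  for (int lane = 0; lane < 4; ++lane)
    base[lane] = static_cast<uint16_t>(value[lane]); // keeps the low 16 bits
}

// Hypothetical scalar model of the predicated form (masked store).
// ASSUMPTION: lane i is active when bit 4*i of the predicate is set.
void model_vstrhq_p_u32(uint16_t *base, const std::array<uint32_t, 4> &value,
                        uint16_t p) {
  assert(base != nullptr);
  for (int lane = 0; lane < 4; ++lane)
    if ((p >> (4 * lane)) & 1u)
      base[lane] = static_cast<uint16_t>(value[lane]);
  // Inactive lanes leave memory untouched, as a masked store would.
}
```

Under these assumptions, storing `{0x12345678, ...}` writes `0x5678` to the first halfword, and a predicate of `0x0F0F` writes only lanes 0 and 2.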

clang/utils/TableGen/MveEmitter.cpp

Show First 20 Lines • Show All 277 Lines • ▼ Show 20 Lines	public:
}		}
};		};

class VectorType : public CRegularNamedType {		class VectorType : public CRegularNamedType {
const ScalarType *Element;		const ScalarType *Element;
unsigned Lanes;		unsigned Lanes;

public:		public:
VectorType(const ScalarType *Element)		VectorType(const ScalarType *Element, unsigned Lanes)
: CRegularNamedType(TypeKind::Vector), Element(Element) {		: CRegularNamedType(TypeKind::Vector), Element(Element), Lanes(Lanes) {}
// MVE has a fixed 128-bit vector size		unsigned sizeInBits() const override { return Lanes * Element->sizeInBits(); }
Lanes = 128 / Element->sizeInBits();
}
unsigned sizeInBits() const override { return 128; }
unsigned lanes() const { return Lanes; }		unsigned lanes() const { return Lanes; }
bool requiresFloat() const override { return Element->requiresFloat(); }		bool requiresFloat() const override { return Element->requiresFloat(); }
std::string cNameBase() const override {		std::string cNameBase() const override {
return Element->cNameBase() + "x" + utostr(Lanes);		return Element->cNameBase() + "x" + utostr(Lanes);
}		}
std::string llvmName() const override {		std::string llvmName() const override {
return "llvm::VectorType::get(" + Element->llvmName() + ", " +		return "llvm::VectorType::get(" + Element->llvmName() + ", " +
utostr(Lanes) + ")";		utostr(Lanes) + ")";
▲ Show 20 Lines • Show All 170 Lines • ▼ Show 20 Lines	public:
virtual ~Result() = default;		virtual ~Result() = default;
using Scope = std::map<std::string, Ptr>;		using Scope = std::map<std::string, Ptr>;
virtual void genCode(raw_ostream &OS, CodeGenParamAllocator &) const = 0;		virtual void genCode(raw_ostream &OS, CodeGenParamAllocator &) const = 0;
virtual bool hasIntegerConstantValue() const { return false; }		virtual bool hasIntegerConstantValue() const { return false; }
virtual uint32_t integerConstantValue() const { return 0; }		virtual uint32_t integerConstantValue() const { return 0; }
virtual std::string typeName() const { return "Value *"; }		virtual std::string typeName() const { return "Value *"; }

// Mostly, when a code-generation operation has a dependency on prior		// Mostly, when a code-generation operation has a dependency on prior
// operations, it's because it uses the output values of those operations as		// operations, it's because it uses the output values of those operations as
		dmgreen: Maybe update this comment?
// inputs. But there's one exception, which is the use of 'seq' in Tablegen		// inputs. But there's one exception, which is the use of 'seq' in Tablegen
// to indicate that operations have to be performed in sequence regardless of		// to indicate that operations have to be performed in sequence regardless of
// whether they use each others' output values.		// whether they use each others' output values.
//		//
// So, the actual generation of code is done by depth-first search, using the		// So, the actual generation of code is done by depth-first search, using the
// prerequisites() method to get a list of all the other Results that have to		// prerequisites() method to get a list of all the other Results that have to
// be computed before this one. That method divides into the 'predecessor',		// be computed before this one. That method divides into the 'predecessor',
// set by setPredecessor() while processing a 'seq' dag node, and the list		// set by setPredecessor() while processing a 'seq' dag node, and the list
▲ Show 20 Lines • Show All 117 Lines • ▼ Show 20 Lines	OS << "Builder.CreateIntCast(" << V->varname() << ", "
: "false")		: "false")
<< ")";		<< ")";
}		}
void morePrerequisites(std::vector<Ptr> &output) const override {		void morePrerequisites(std::vector<Ptr> &output) const override {
output.push_back(V);		output.push_back(V);
}		}
};		};

		// Result subclass representing a cast between different pointer types.
		class PointerCastResult : public Result {
		public:
		const PointerType *PtrType;
		Ptr V;
		PointerCastResult(const PointerType *PtrType, Ptr V)
		: PtrType(PtrType), V(V) {}
		void genCode(raw_ostream &OS,
		CodeGenParamAllocator &ParamAlloc) const override {
		OS << "Builder.CreatePointerCast(" << V->asValue() << ", "
		<< ParamAlloc.allocParam("llvm::Type *", PtrType->llvmName()) << ")";
		}
		void morePrerequisites(std::vector<Ptr> &output) const override {
		output.push_back(V);
		}
		};

// Result subclass representing a call to an IRBuilder method. Each IRBuilder		// Result subclass representing a call to an IRBuilder method. Each IRBuilder
// method we want to use will have a Tablegen record giving the method name and		// method we want to use will have a Tablegen record giving the method name and
// describing any important details of how to call it, such as whether a		// describing any important details of how to call it, such as whether a
// particular argument should be an integer constant instead of an llvm::Value.		// particular argument should be an integer constant instead of an llvm::Value.
class IRBuilderResult : public Result {		class IRBuilderResult : public Result {
public:		public:
StringRef BuilderMethod;		StringRef CallPrefix;
std::vector<Ptr> Args;		std::vector<Ptr> Args;
std::set<unsigned> AddressArgs;		std::set<unsigned> AddressArgs;
std::set<unsigned> IntConstantArgs;		std::set<unsigned> IntConstantArgs;
IRBuilderResult(StringRef BuilderMethod, std::vector<Ptr> Args,		IRBuilderResult(StringRef CallPrefix, std::vector<Ptr> Args,
std::set<unsigned> AddressArgs,		std::set<unsigned> AddressArgs,
std::set<unsigned> IntConstantArgs)		std::set<unsigned> IntConstantArgs)
: BuilderMethod(BuilderMethod), Args(Args), AddressArgs(AddressArgs),		: CallPrefix(CallPrefix), Args(Args), AddressArgs(AddressArgs),
IntConstantArgs(IntConstantArgs) {}		IntConstantArgs(IntConstantArgs) {}
void genCode(raw_ostream &OS,		void genCode(raw_ostream &OS,
CodeGenParamAllocator &ParamAlloc) const override {		CodeGenParamAllocator &ParamAlloc) const override {
OS << "Builder." << BuilderMethod << "(";		OS << CallPrefix;
const char *Sep = "";		const char *Sep = "";
for (unsigned i = 0, e = Args.size(); i < e; ++i) {		for (unsigned i = 0, e = Args.size(); i < e; ++i) {
Ptr Arg = Args[i];		Ptr Arg = Args[i];
if (IntConstantArgs.find(i) != IntConstantArgs.end()) {		if (IntConstantArgs.find(i) != IntConstantArgs.end()) {
assert(Arg->hasIntegerConstantValue());		assert(Arg->hasIntegerConstantValue());
OS << Sep		OS << Sep
<< ParamAlloc.allocParam("unsigned",		<< ParamAlloc.allocParam("unsigned",
utostr(Arg->integerConstantValue()));		utostr(Arg->integerConstantValue()));
Show All 9 Lines	for (unsigned i = 0, e = Args.size(); i < e; ++i) {
Ptr Arg = Args[i];		Ptr Arg = Args[i];
if (IntConstantArgs.find(i) != IntConstantArgs.end())		if (IntConstantArgs.find(i) != IntConstantArgs.end())
continue;		continue;
output.push_back(Arg);		output.push_back(Arg);
}		}
}		}
};		};

		// Result subclass representing making an Address out of a Value.
		class AddressResult : public Result {
		public:
		Ptr Arg;
		unsigned Align;
		AddressResult(Ptr Arg, unsigned Align) : Arg(Arg), Align(Align) {}
		void genCode(raw_ostream &OS,
		CodeGenParamAllocator &ParamAlloc) const override {
		OS << "Address(" << Arg->varname() << ", CharUnits::fromQuantity("
		<< Align << "))";
		}
		std::string typeName() const override {
		return "Address";
		}
		void morePrerequisites(std::vector<Ptr> &output) const override {
		output.push_back(Arg);
		}
		};

// Result subclass representing a call to an IR intrinsic, which we first have		// Result subclass representing a call to an IR intrinsic, which we first have
// to look up using an Intrinsic::ID constant and an array of types.		// to look up using an Intrinsic::ID constant and an array of types.
class IRIntrinsicResult : public Result {		class IRIntrinsicResult : public Result {
public:		public:
std::string IntrinsicID;		std::string IntrinsicID;
std::vector<const Type *> ParamTypes;		std::vector<const Type *> ParamTypes;
std::vector<Ptr> Args;		std::vector<Ptr> Args;
IRIntrinsicResult(StringRef IntrinsicID, std::vector<const Type *> ParamTypes,		IRIntrinsicResult(StringRef IntrinsicID, std::vector<const Type *> ParamTypes,
std::vector<Ptr> Args)		std::vector<Ptr> Args)
: IntrinsicID(IntrinsicID), ParamTypes(ParamTypes), Args(Args) {}		: IntrinsicID(IntrinsicID), ParamTypes(ParamTypes), Args(Args) {}
void genCode(raw_ostream &OS,		void genCode(raw_ostream &OS,
CodeGenParamAllocator &ParamAlloc) const override {		CodeGenParamAllocator &ParamAlloc) const override {
std::string IntNo = ParamAlloc.allocParam(		std::string IntNo = ParamAlloc.allocParam(
"Intrinsic::ID", "Intrinsic::arm_mve_" + IntrinsicID);		"Intrinsic::ID", "Intrinsic::" + IntrinsicID);
OS << "Builder.CreateCall(CGM.getIntrinsic(" << IntNo;		OS << "Builder.CreateCall(CGM.getIntrinsic(" << IntNo;
if (!ParamTypes.empty()) {		if (!ParamTypes.empty()) {
OS << ", llvm::SmallVector<llvm::Type *, " << ParamTypes.size() << "> {";		OS << ", llvm::SmallVector<llvm::Type *, " << ParamTypes.size() << "> {";
const char *Sep = "";		const char *Sep = "";
for (auto T : ParamTypes) {		for (auto T : ParamTypes) {
OS << Sep << ParamAlloc.allocParam("llvm::Type *", T->llvmName());		OS << Sep << ParamAlloc.allocParam("llvm::Type *", T->llvmName());
Sep = ", ";		Sep = ", ";
}		}
OS << "}";		OS << "}";
}		}
OS << "), llvm::SmallVector<Value *, " << Args.size() << "> {";		OS << "), llvm::SmallVector<Value *, " << Args.size() << "> {";
const char *Sep = "";		const char *Sep = "";
for (auto Arg : Args) {		for (auto Arg : Args) {
OS << Sep << Arg->asValue();		OS << Sep << Arg->asValue();
Sep = ", ";		Sep = ", ";
}		}
OS << "})";		OS << "})";
}		}
void morePrerequisites(std::vector<Ptr> &output) const override {		void morePrerequisites(std::vector<Ptr> &output) const override {
output.insert(output.end(), Args.begin(), Args.end());		output.insert(output.end(), Args.begin(), Args.end());
}		}
};		};

		// Result subclass that specifies a type, for use in IRBuilder operations such
		// as CreateBitCast that take a type argument.
		class TypeResult : public Result {
		public:
		const Type *T;
		TypeResult(const Type *T) : T(T) {}
		void genCode(raw_ostream &OS, CodeGenParamAllocator &) const override {
		OS << T->llvmName();
		}
		std::string typeName() const override {
		return "llvm::Type *";
		}
		};

// -----------------------------------------------------------------------------		// -----------------------------------------------------------------------------
// Class that describes a single ACLE intrinsic.		// Class that describes a single ACLE intrinsic.
//		//
// A Tablegen record will typically describe more than one ACLE intrinsic, by		// A Tablegen record will typically describe more than one ACLE intrinsic, by
// means of setting the 'list<Type> Params' field to a list of multiple		// means of setting the 'list<Type> Params' field to a list of multiple
// parameter types, so as to define vaddq_{s8,u8,...,f16,f32} all in one go.		// parameter types, so as to define vaddq_{s8,u8,...,f16,f32} all in one go.
// We'll end up with one instance of ACLEIntrinsic for each parameter type,		// We'll end up with one instance of ACLEIntrinsic for each parameter type,
// rather than a single one for all of them. Hence, the constructor takes both		// rather than a single one for all of them. Hence, the constructor takes both
▲ Show 20 Lines • Show All 147 Lines • ▼ Show 20 Lines
// -----------------------------------------------------------------------------		// -----------------------------------------------------------------------------
// The top-level class that holds all the state from analyzing the entire		// The top-level class that holds all the state from analyzing the entire
// Tablegen input.		// Tablegen input.

class MveEmitter {		class MveEmitter {
// MveEmitter holds a collection of all the types we've instantiated.		// MveEmitter holds a collection of all the types we've instantiated.
VoidType Void;		VoidType Void;
std::map<std::string, std::unique_ptr<ScalarType>> ScalarTypes;		std::map<std::string, std::unique_ptr<ScalarType>> ScalarTypes;
std::map<std::pair<ScalarTypeKind, unsigned>, std::unique_ptr<VectorType>>		std::map<std::tuple<ScalarTypeKind, unsigned, unsigned>,
		std::unique_ptr<VectorType>>
VectorTypes;		VectorTypes;
std::map<std::pair<std::string, unsigned>, std::unique_ptr<MultiVectorType>>		std::map<std::pair<std::string, unsigned>, std::unique_ptr<MultiVectorType>>
MultiVectorTypes;		MultiVectorTypes;
std::map<unsigned, std::unique_ptr<PredicateType>> PredicateTypes;		std::map<unsigned, std::unique_ptr<PredicateType>> PredicateTypes;
std::map<std::string, std::unique_ptr<PointerType>> PointerTypes;		std::map<std::string, std::unique_ptr<PointerType>> PointerTypes;

// And all the ACLEIntrinsic instances we've created.		// And all the ACLEIntrinsic instances we've created.
std::map<std::string, std::unique_ptr<ACLEIntrinsic>> ACLEIntrinsics;		std::map<std::string, std::unique_ptr<ACLEIntrinsic>> ACLEIntrinsics;

public:		public:
// Methods to create a Type object, or return the right existing one from the		// Methods to create a Type object, or return the right existing one from the
// maps stored in this object.		// maps stored in this object.
const VoidType *getVoidType() { return &Void; }		const VoidType *getVoidType() { return &Void; }
const ScalarType *getScalarType(StringRef Name) {		const ScalarType *getScalarType(StringRef Name) {
return ScalarTypes[Name].get();		return ScalarTypes[Name].get();
}		}
const ScalarType *getScalarType(Record *R) {		const ScalarType *getScalarType(Record *R) {
return getScalarType(R->getName());		return getScalarType(R->getName());
}		}
const VectorType *getVectorType(const ScalarType *ST) {		const VectorType *getVectorType(const ScalarType *ST, unsigned Lanes) {
std::pair<ScalarTypeKind, unsigned> key(ST->kind(), ST->sizeInBits());		std::tuple<ScalarTypeKind, unsigned, unsigned> key(ST->kind(),
		ST->sizeInBits(), Lanes);
if (VectorTypes.find(key) == VectorTypes.end())		if (VectorTypes.find(key) == VectorTypes.end())
VectorTypes[key] = std::make_unique<VectorType>(ST);		VectorTypes[key] = std::make_unique<VectorType>(ST, Lanes);
return VectorTypes[key].get();		return VectorTypes[key].get();
}		}
		const VectorType *getVectorType(const ScalarType *ST) {
		return getVectorType(ST, 128 / ST->sizeInBits());
		}
const MultiVectorType *getMultiVectorType(unsigned Registers,		const MultiVectorType *getMultiVectorType(unsigned Registers,
const VectorType *VT) {		const VectorType *VT) {
std::pair<std::string, unsigned> key(VT->cNameBase(), Registers);		std::pair<std::string, unsigned> key(VT->cNameBase(), Registers);
if (MultiVectorTypes.find(key) == MultiVectorTypes.end())		if (MultiVectorTypes.find(key) == MultiVectorTypes.end())
MultiVectorTypes[key] = std::make_unique<MultiVectorType>(Registers, VT);		MultiVectorTypes[key] = std::make_unique<MultiVectorType>(Registers, VT);
return MultiVectorTypes[key].get();		return MultiVectorTypes[key].get();
}		}
const PredicateType *getPredicateType(unsigned Lanes) {		const PredicateType *getPredicateType(unsigned Lanes) {
▲ Show 20 Lines • Show All 75 Lines • ▼ Show 20 Lines	const Type *MveEmitter::getType(DagInit *D, const Type *Param) {
if (Op->getName() == "CTO_Parameter") {		if (Op->getName() == "CTO_Parameter") {
if (isa<VoidType>(Param))		if (isa<VoidType>(Param))
PrintFatalError("Parametric type in unparametrised context");		PrintFatalError("Parametric type in unparametrised context");
return Param;		return Param;
}		}

if (Op->getName() == "CTO_Vec") {		if (Op->getName() == "CTO_Vec") {
const Type *Element = getType(D->getArg(0), Param);		const Type *Element = getType(D->getArg(0), Param);
		if (D->getNumArgs() == 1) {
return getVectorType(cast<ScalarType>(Element));		return getVectorType(cast<ScalarType>(Element));
		} else {
		const Type *ExistingVector = getType(D->getArg(1), Param);
		return getVectorType(cast<ScalarType>(Element),
		cast<VectorType>(ExistingVector)->lanes());
		}
}		}

if (Op->getName() == "CTO_Pred") {		if (Op->getName() == "CTO_Pred") {
const Type *Element = getType(D->getArg(0), Param);		const Type *Element = getType(D->getArg(0), Param);
return getPredicateType(128 / Element->sizeInBits());		return getPredicateType(128 / Element->sizeInBits());
}		}

if (Op->isSubClassOf("CTO_Tuple")) {		if (Op->isSubClassOf("CTO_Tuple")) {
▲ Show 20 Lines • Show All 49 Lines • ▼ Show 20 Lines	if (Op->getName() == "seq") {
if (const auto *ST = dyn_cast<ScalarType>(CastType)) {		if (const auto *ST = dyn_cast<ScalarType>(CastType)) {
if (!ST->requiresFloat()) {		if (!ST->requiresFloat()) {
if (Arg->hasIntegerConstantValue())		if (Arg->hasIntegerConstantValue())
return std::make_shared<IntLiteralResult>(		return std::make_shared<IntLiteralResult>(
ST, Arg->integerConstantValue());		ST, Arg->integerConstantValue());
else		else
return std::make_shared<IntCastResult>(ST, Arg);		return std::make_shared<IntCastResult>(ST, Arg);
}		}
		} else if (const auto *PT = dyn_cast<PointerType>(CastType)) {
		return std::make_shared<PointerCastResult>(PT, Arg);
}		}
PrintFatalError("Unsupported type cast");		PrintFatalError("Unsupported type cast");
		} else if (Op->getName() == "address") {
		if (D->getNumArgs() != 2)
		PrintFatalError("'address' should have two arguments");
		Result::Ptr Arg = getCodeForDagArg(D, 0, Scope, Param);
		unsigned Alignment;
		if (auto *II = dyn_cast<IntInit>(D->getArg(1))) {
		Alignment = II->getValue();
		} else {
		PrintFatalError("'address' alignment argument should be an integer");
		}
		return std::make_shared<AddressResult>(Arg, Alignment);
} else if (Op->getName() == "unsignedflag") {		} else if (Op->getName() == "unsignedflag") {
if (D->getNumArgs() != 1)		if (D->getNumArgs() != 1)
PrintFatalError("unsignedflag should have exactly one argument");		PrintFatalError("unsignedflag should have exactly one argument");
Record *TypeRec = cast<DefInit>(D->getArg(0))->getDef();		Record *TypeRec = cast<DefInit>(D->getArg(0))->getDef();
if (!TypeRec->isSubClassOf("Type"))		if (!TypeRec->isSubClassOf("Type"))
PrintFatalError("unsignedflag's argument should be a type");		PrintFatalError("unsignedflag's argument should be a type");
if (const auto *ST = dyn_cast<ScalarType>(getType(TypeRec, Param))) {		if (const auto *ST = dyn_cast<ScalarType>(getType(TypeRec, Param))) {
return std::make_shared<IntLiteralResult>(		return std::make_shared<IntLiteralResult>(
getScalarType("u32"), ST->kind() == ScalarTypeKind::UnsignedInt);		getScalarType("u32"), ST->kind() == ScalarTypeKind::UnsignedInt);
} else {		} else {
PrintFatalError("unsignedflag's argument should be a scalar type");		PrintFatalError("unsignedflag's argument should be a scalar type");
}		}
} else {		} else {
std::vector<Result::Ptr> Args;		std::vector<Result::Ptr> Args;
for (unsigned i = 0, e = D->getNumArgs(); i < e; ++i)		for (unsigned i = 0, e = D->getNumArgs(); i < e; ++i)
Args.push_back(getCodeForDagArg(D, i, Scope, Param));		Args.push_back(getCodeForDagArg(D, i, Scope, Param));
if (Op->isSubClassOf("IRBuilder")) {		if (Op->isSubClassOf("IRBuilderBase")) {
std::set<unsigned> AddressArgs;		std::set<unsigned> AddressArgs;
for (unsigned i : Op->getValueAsListOfInts("address_params"))		for (unsigned i : Op->getValueAsListOfInts("address_params"))
AddressArgs.insert(i);		AddressArgs.insert(i);
std::set<unsigned> IntConstantArgs;		std::set<unsigned> IntConstantArgs;
for (unsigned i : Op->getValueAsListOfInts("int_constant_params"))		for (unsigned i : Op->getValueAsListOfInts("int_constant_params"))
IntConstantArgs.insert(i);		IntConstantArgs.insert(i);
return std::make_shared<IRBuilderResult>(		return std::make_shared<IRBuilderResult>(
Op->getValueAsString("func"), Args, AddressArgs, IntConstantArgs);		Op->getValueAsString("prefix"), Args, AddressArgs, IntConstantArgs);
} else if (Op->isSubClassOf("IRInt")) {		} else if (Op->isSubClassOf("IRIntBase")) {
std::vector<const Type *> ParamTypes;		std::vector<const Type *> ParamTypes;
for (Record *RParam : Op->getValueAsListOfDefs("params"))		for (Record *RParam : Op->getValueAsListOfDefs("params"))
ParamTypes.push_back(getType(RParam, Param));		ParamTypes.push_back(getType(RParam, Param));
std::string IntName = Op->getValueAsString("intname");		std::string IntName = Op->getValueAsString("intname");
if (Op->getValueAsBit("appendKind"))		if (Op->getValueAsBit("appendKind"))
IntName += "_" + toLetter(cast<ScalarType>(Param)->kind());		IntName += "_" + toLetter(cast<ScalarType>(Param)->kind());
return std::make_shared<IRIntrinsicResult>(IntName, ParamTypes, Args);		return std::make_shared<IRIntrinsicResult>(IntName, ParamTypes, Args);
} else {		} else {
Show All 20 Lines	Result::Ptr MveEmitter::getCodeForDagArg(DagInit *D, unsigned ArgNum,

if (auto *II = dyn_cast<IntInit>(Arg))		if (auto *II = dyn_cast<IntInit>(Arg))
return std::make_shared<IntLiteralResult>(getScalarType("u32"),		return std::make_shared<IntLiteralResult>(getScalarType("u32"),
II->getValue());		II->getValue());

if (auto *DI = dyn_cast<DagInit>(Arg))		if (auto *DI = dyn_cast<DagInit>(Arg))
return getCodeForDag(DI, Scope, Param);		return getCodeForDag(DI, Scope, Param);

		if (auto *DI = dyn_cast<DefInit>(Arg)) {
		Record *Rec = DI->getDef();
		if (Rec->isSubClassOf("Type")) {
		const Type *T = getType(Rec, Param);
		return std::make_shared<TypeResult>(T);
		}
		}

PrintFatalError("bad dag argument type for code generation");		PrintFatalError("bad dag argument type for code generation");
}		}

Result::Ptr MveEmitter::getCodeForArg(unsigned ArgNum, const Type *ArgType) {		Result::Ptr MveEmitter::getCodeForArg(unsigned ArgNum, const Type *ArgType) {
Result::Ptr V =		Result::Ptr V =
std::make_shared<BuiltinArgResult>(ArgNum, isa<PointerType>(ArgType));		std::make_shared<BuiltinArgResult>(ArgNum, isa<PointerType>(ArgType));

if (const auto *ST = dyn_cast<ScalarType>(ArgType)) {		if (const auto *ST = dyn_cast<ScalarType>(ArgType)) {
if (ST->isInteger() && ST->sizeInBits() < 32)		if (ST->isInteger() && ST->sizeInBits() < 32)
V = std::make_shared<IntCastResult>(getScalarType("u32"), V);		V = std::make_shared<IntCastResult>(getScalarType("u32"), V);
} else if (const auto *PT = dyn_cast<PredicateType>(ArgType)) {		} else if (const auto *PT = dyn_cast<PredicateType>(ArgType)) {
V = std::make_shared<IntCastResult>(getScalarType("u32"), V);		V = std::make_shared<IntCastResult>(getScalarType("u32"), V);
V = std::make_shared<IRIntrinsicResult>(		V = std::make_shared<IRIntrinsicResult>("arm_mve_pred_i2v",
"pred_i2v", std::vector<const Type *>{PT}, std::vector<Result::Ptr>{V});		std::vector<const Type *>{PT},
		std::vector<Result::Ptr>{V});
}		}

return V;		return V;
}		}

ACLEIntrinsic::ACLEIntrinsic(MveEmitter &ME, Record *R, const Type *Param)		ACLEIntrinsic::ACLEIntrinsic(MveEmitter &ME, Record *R, const Type *Param)
: ReturnType(ME.getType(R->getValueAsDef("ret"), Param)) {		: ReturnType(ME.getType(R->getValueAsDef("ret"), Param)) {
// Derive the intrinsic's full name, by taking the name of the		// Derive the intrinsic's full name, by taking the name of the
▲ Show 20 Lines • Show All 600 Lines • Show Last 20 Lines
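
The `VectorType` change in this file makes the lane count part of the type's identity, so that narrower-than-128-bit vectors (the memory side of the widening loads and truncating stores) can be interned next to the full-width ones. The standalone sketch below illustrates that caching scheme outside of TableGen. The names `ScalarKind`, `VecType` and `TypeCache` are hypothetical stand-ins, not the classes in `MveEmitter.cpp`.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <tuple>

// Hypothetical stand-in for the emitter's scalar type kinds.
enum class ScalarKind { SignedInt, UnsignedInt, Float };

// Hypothetical stand-in for a vector type: size now derives from the lanes.
struct VecType {
  ScalarKind kind;
  unsigned elemBits;
  unsigned lanes;
  unsigned sizeInBits() const { return lanes * elemBits; }
};

class TypeCache {
  // Keyed on (kind, element width, lanes), mirroring the tuple key in the diff.
  std::map<std::tuple<ScalarKind, unsigned, unsigned>, std::unique_ptr<VecType>>
      vectors;

public:
  // Return the unique VecType for this key, creating it on first use.
  const VecType *getVector(ScalarKind kind, unsigned elemBits, unsigned lanes) {
    assert(elemBits != 0 && lanes != 0);
    auto key = std::make_tuple(kind, elemBits, lanes);
    auto it = vectors.find(key);
    if (it == vectors.end())
      it = vectors
               .emplace(key, std::make_unique<VecType>(
                                 VecType{kind, elemBits, lanes}))
               .first;
    return it->second.get();
  }

  // Convenience overload: a full 128-bit vector of the given element type.
  const VecType *getVector(ScalarKind kind, unsigned elemBits) {
    return getVector(kind, elemBits, 128 / elemBits);
  }
};
```

With this shape, asking for a 16-bit element with 4 lanes yields a distinct 64-bit type, while the one-argument form still returns the usual 8-lane, 128-bit type.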

llvm/test/CodeGen/Thumb2/mve-intrinsics/load-store.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -mtriple=thumbv8.1m.main -mattr=+mve.fp -verify-machineinstrs -enable-arm-maskedldst -o - %s | FileCheck %s

				define arm_aapcs_vfpcc <8 x half> @test_vld1q_f16(half* %base) {
				; CHECK-LABEL: test_vld1q_f16:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vldrh.u16 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast half* %base to <8 x half>*
				%1 = load <8 x half>, <8 x half>* %0, align 2
				ret <8 x half> %1
				}

				define arm_aapcs_vfpcc <4 x float> @test_vld1q_f32(float* %base) {
				; CHECK-LABEL: test_vld1q_f32:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vldrw.u32 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast float* %base to <4 x float>*
				%1 = load <4 x float>, <4 x float>* %0, align 4
				ret <4 x float> %1
				}

				define arm_aapcs_vfpcc <16 x i8> @test_vld1q_s8(i8* %base) {
				; CHECK-LABEL: test_vld1q_s8:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vldrb.u8 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast i8* %base to <16 x i8>*
				%1 = load <16 x i8>, <16 x i8>* %0, align 1
				ret <16 x i8> %1
				}

				define arm_aapcs_vfpcc <8 x i16> @test_vld1q_s16(i16* %base) {
				; CHECK-LABEL: test_vld1q_s16:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vldrh.u16 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast i16* %base to <8 x i16>*
				%1 = load <8 x i16>, <8 x i16>* %0, align 2
				ret <8 x i16> %1
				}

				define arm_aapcs_vfpcc <4 x i32> @test_vld1q_s32(i32* %base) {
				; CHECK-LABEL: test_vld1q_s32:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vldrw.u32 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast i32* %base to <4 x i32>*
				%1 = load <4 x i32>, <4 x i32>* %0, align 4
				ret <4 x i32> %1
				}

				define arm_aapcs_vfpcc <16 x i8> @test_vld1q_u8(i8* %base) {
				; CHECK-LABEL: test_vld1q_u8:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vldrb.u8 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast i8* %base to <16 x i8>*
				%1 = load <16 x i8>, <16 x i8>* %0, align 1
				ret <16 x i8> %1
				}

				define arm_aapcs_vfpcc <8 x i16> @test_vld1q_u16(i16* %base) {
				; CHECK-LABEL: test_vld1q_u16:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vldrh.u16 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast i16* %base to <8 x i16>*
				%1 = load <8 x i16>, <8 x i16>* %0, align 2
				ret <8 x i16> %1
				}

				define arm_aapcs_vfpcc <4 x i32> @test_vld1q_u32(i32* %base) {
				; CHECK-LABEL: test_vld1q_u32:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vldrw.u32 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast i32* %base to <4 x i32>*
				%1 = load <4 x i32>, <4 x i32>* %0, align 4
				ret <4 x i32> %1
				}

				define arm_aapcs_vfpcc <8 x half> @test_vld1q_z_f16(half* %base, i16 zeroext %p) {
				; CHECK-LABEL: test_vld1q_z_f16:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r1
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vldrht.u16 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast half* %base to <8 x half>*
				%1 = zext i16 %p to i32
				%2 = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 %1)
				%3 = call <8 x half> @llvm.masked.load.v8f16.p0v8f16(<8 x half>* %0, i32 2, <8 x i1> %2, <8 x half> zeroinitializer)
				ret <8 x half> %3
				}

				declare <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32)

				declare <8 x half> @llvm.masked.load.v8f16.p0v8f16(<8 x half>*, i32 immarg, <8 x i1>, <8 x half>)

				define arm_aapcs_vfpcc <4 x float> @test_vld1q_z_f32(float* %base, i16 zeroext %p) {
				; CHECK-LABEL: test_vld1q_z_f32:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r1
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vldrwt.u32 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast float* %base to <4 x float>*
				%1 = zext i16 %p to i32
				%2 = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 %1)
				%3 = call <4 x float> @llvm.masked.load.v4f32.p0v4f32(<4 x float>* %0, i32 4, <4 x i1> %2, <4 x float> zeroinitializer)
				ret <4 x float> %3
				}

				declare <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32)

				declare <4 x float> @llvm.masked.load.v4f32.p0v4f32(<4 x float>*, i32 immarg, <4 x i1>, <4 x float>)

				define arm_aapcs_vfpcc <16 x i8> @test_vld1q_z_s8(i8* %base, i16 zeroext %p) {
				; CHECK-LABEL: test_vld1q_z_s8:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r1
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vldrbt.u8 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast i8* %base to <16 x i8>*
				%1 = zext i16 %p to i32
				%2 = call <16 x i1> @llvm.arm.mve.pred.i2v.v16i1(i32 %1)
				%3 = call <16 x i8> @llvm.masked.load.v16i8.p0v16i8(<16 x i8>* %0, i32 1, <16 x i1> %2, <16 x i8> zeroinitializer)
				ret <16 x i8> %3
				}

				declare <16 x i1> @llvm.arm.mve.pred.i2v.v16i1(i32)

				declare <16 x i8> @llvm.masked.load.v16i8.p0v16i8(<16 x i8>*, i32 immarg, <16 x i1>, <16 x i8>)

				define arm_aapcs_vfpcc <8 x i16> @test_vld1q_z_s16(i16* %base, i16 zeroext %p) {
				; CHECK-LABEL: test_vld1q_z_s16:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r1
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vldrht.u16 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast i16* %base to <8 x i16>*
				%1 = zext i16 %p to i32
				%2 = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 %1)
				%3 = call <8 x i16> @llvm.masked.load.v8i16.p0v8i16(<8 x i16>* %0, i32 2, <8 x i1> %2, <8 x i16> zeroinitializer)
				ret <8 x i16> %3
				}

				declare <8 x i16> @llvm.masked.load.v8i16.p0v8i16(<8 x i16>*, i32 immarg, <8 x i1>, <8 x i16>)

				define arm_aapcs_vfpcc <4 x i32> @test_vld1q_z_s32(i32* %base, i16 zeroext %p) {
				; CHECK-LABEL: test_vld1q_z_s32:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r1
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vldrwt.u32 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast i32* %base to <4 x i32>*
				%1 = zext i16 %p to i32
				%2 = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 %1)
				%3 = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* %0, i32 4, <4 x i1> %2, <4 x i32> zeroinitializer)
				ret <4 x i32> %3
				}

				declare <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>*, i32 immarg, <4 x i1>, <4 x i32>)

				define arm_aapcs_vfpcc <16 x i8> @test_vld1q_z_u8(i8* %base, i16 zeroext %p) {
				; CHECK-LABEL: test_vld1q_z_u8:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r1
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vldrbt.u8 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast i8* %base to <16 x i8>*
				%1 = zext i16 %p to i32
				%2 = call <16 x i1> @llvm.arm.mve.pred.i2v.v16i1(i32 %1)
				%3 = call <16 x i8> @llvm.masked.load.v16i8.p0v16i8(<16 x i8>* %0, i32 1, <16 x i1> %2, <16 x i8> zeroinitializer)
				ret <16 x i8> %3
				}

				define arm_aapcs_vfpcc <8 x i16> @test_vld1q_z_u16(i16* %base, i16 zeroext %p) {
				; CHECK-LABEL: test_vld1q_z_u16:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r1
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vldrht.u16 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast i16* %base to <8 x i16>*
				%1 = zext i16 %p to i32
				%2 = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 %1)
				%3 = call <8 x i16> @llvm.masked.load.v8i16.p0v8i16(<8 x i16>* %0, i32 2, <8 x i1> %2, <8 x i16> zeroinitializer)
				ret <8 x i16> %3
				}

				define arm_aapcs_vfpcc <4 x i32> @test_vld1q_z_u32(i32* %base, i16 zeroext %p) {
				; CHECK-LABEL: test_vld1q_z_u32:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r1
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vldrwt.u32 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast i32* %base to <4 x i32>*
				%1 = zext i16 %p to i32
				%2 = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 %1)
				%3 = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* %0, i32 4, <4 x i1> %2, <4 x i32> zeroinitializer)
				ret <4 x i32> %3
				}

				define arm_aapcs_vfpcc <16 x i8> @test_vldrbq_s8(i8* %base) {
				; CHECK-LABEL: test_vldrbq_s8:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vldrb.u8 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast i8* %base to <16 x i8>*
				%1 = load <16 x i8>, <16 x i8>* %0, align 1
				ret <16 x i8> %1
				}

				define arm_aapcs_vfpcc <8 x i16> @test_vldrbq_s16(i8* %base) {
				; CHECK-LABEL: test_vldrbq_s16:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vldrb.s16 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast i8* %base to <8 x i8>*
				%1 = load <8 x i8>, <8 x i8>* %0, align 1
				%2 = sext <8 x i8> %1 to <8 x i16>
				ret <8 x i16> %2
				}

				define arm_aapcs_vfpcc <4 x i32> @test_vldrbq_s32(i8* %base) {
				; CHECK-LABEL: test_vldrbq_s32:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vldrb.s32 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast i8* %base to <4 x i8>*
				%1 = load <4 x i8>, <4 x i8>* %0, align 1
				%2 = sext <4 x i8> %1 to <4 x i32>
				ret <4 x i32> %2
				}

				define arm_aapcs_vfpcc <16 x i8> @test_vldrbq_u8(i8* %base) {
				; CHECK-LABEL: test_vldrbq_u8:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vldrb.u8 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast i8* %base to <16 x i8>*
				%1 = load <16 x i8>, <16 x i8>* %0, align 1
				ret <16 x i8> %1
				}

				define arm_aapcs_vfpcc <8 x i16> @test_vldrbq_u16(i8* %base) {
				; CHECK-LABEL: test_vldrbq_u16:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vldrb.u16 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast i8* %base to <8 x i8>*
				%1 = load <8 x i8>, <8 x i8>* %0, align 1
				%2 = zext <8 x i8> %1 to <8 x i16>
				ret <8 x i16> %2
				}

				define arm_aapcs_vfpcc <4 x i32> @test_vldrbq_u32(i8* %base) {
				; CHECK-LABEL: test_vldrbq_u32:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vldrb.u32 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast i8* %base to <4 x i8>*
				%1 = load <4 x i8>, <4 x i8>* %0, align 1
				%2 = zext <4 x i8> %1 to <4 x i32>
				ret <4 x i32> %2
				}

				define arm_aapcs_vfpcc <16 x i8> @test_vldrbq_z_s8(i8* %base, i16 zeroext %p) {
				; CHECK-LABEL: test_vldrbq_z_s8:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r1
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vldrbt.u8 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast i8* %base to <16 x i8>*
				%1 = zext i16 %p to i32
				%2 = call <16 x i1> @llvm.arm.mve.pred.i2v.v16i1(i32 %1)
				%3 = call <16 x i8> @llvm.masked.load.v16i8.p0v16i8(<16 x i8>* %0, i32 1, <16 x i1> %2, <16 x i8> zeroinitializer)
				ret <16 x i8> %3
				}

				define arm_aapcs_vfpcc <8 x i16> @test_vldrbq_z_s16(i8* %base, i16 zeroext %p) {
				; CHECK-LABEL: test_vldrbq_z_s16:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r1
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vldrbt.s16 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast i8* %base to <8 x i8>*
				%1 = zext i16 %p to i32
				%2 = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 %1)
				%3 = call <8 x i8> @llvm.masked.load.v8i8.p0v8i8(<8 x i8>* %0, i32 1, <8 x i1> %2, <8 x i8> zeroinitializer)
				%4 = sext <8 x i8> %3 to <8 x i16>
				ret <8 x i16> %4
				}

				declare <8 x i8> @llvm.masked.load.v8i8.p0v8i8(<8 x i8>*, i32 immarg, <8 x i1>, <8 x i8>)

				define arm_aapcs_vfpcc <4 x i32> @test_vldrbq_z_s32(i8* %base, i16 zeroext %p) {
				; CHECK-LABEL: test_vldrbq_z_s32:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r1
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vldrbt.s32 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast i8* %base to <4 x i8>*
				%1 = zext i16 %p to i32
				%2 = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 %1)
				%3 = call <4 x i8> @llvm.masked.load.v4i8.p0v4i8(<4 x i8>* %0, i32 1, <4 x i1> %2, <4 x i8> zeroinitializer)
				%4 = sext <4 x i8> %3 to <4 x i32>
				ret <4 x i32> %4
				}

				declare <4 x i8> @llvm.masked.load.v4i8.p0v4i8(<4 x i8>*, i32 immarg, <4 x i1>, <4 x i8>)

				define arm_aapcs_vfpcc <16 x i8> @test_vldrbq_z_u8(i8* %base, i16 zeroext %p) {
				; CHECK-LABEL: test_vldrbq_z_u8:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r1
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vldrbt.u8 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast i8* %base to <16 x i8>*
				%1 = zext i16 %p to i32
				%2 = call <16 x i1> @llvm.arm.mve.pred.i2v.v16i1(i32 %1)
				%3 = call <16 x i8> @llvm.masked.load.v16i8.p0v16i8(<16 x i8>* %0, i32 1, <16 x i1> %2, <16 x i8> zeroinitializer)
				ret <16 x i8> %3
				}

				define arm_aapcs_vfpcc <8 x i16> @test_vldrbq_z_u16(i8* %base, i16 zeroext %p) {
				; CHECK-LABEL: test_vldrbq_z_u16:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r1
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vldrbt.u16 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast i8* %base to <8 x i8>*
				%1 = zext i16 %p to i32
				%2 = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 %1)
				%3 = call <8 x i8> @llvm.masked.load.v8i8.p0v8i8(<8 x i8>* %0, i32 1, <8 x i1> %2, <8 x i8> zeroinitializer)
				%4 = zext <8 x i8> %3 to <8 x i16>
				ret <8 x i16> %4
				}

				define arm_aapcs_vfpcc <4 x i32> @test_vldrbq_z_u32(i8* %base, i16 zeroext %p) {
				; CHECK-LABEL: test_vldrbq_z_u32:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r1
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vldrbt.u32 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast i8* %base to <4 x i8>*
				%1 = zext i16 %p to i32
				%2 = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 %1)
				%3 = call <4 x i8> @llvm.masked.load.v4i8.p0v4i8(<4 x i8>* %0, i32 1, <4 x i1> %2, <4 x i8> zeroinitializer)
				%4 = zext <4 x i8> %3 to <4 x i32>
				ret <4 x i32> %4
				}

				define arm_aapcs_vfpcc <8 x half> @test_vldrhq_f16(half* %base) {
				; CHECK-LABEL: test_vldrhq_f16:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vldrh.u16 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast half* %base to <8 x half>*
				%1 = load <8 x half>, <8 x half>* %0, align 2
				ret <8 x half> %1
				}

				define arm_aapcs_vfpcc <8 x i16> @test_vldrhq_s16(i16* %base) {
				; CHECK-LABEL: test_vldrhq_s16:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vldrh.u16 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast i16* %base to <8 x i16>*
				%1 = load <8 x i16>, <8 x i16>* %0, align 2
				ret <8 x i16> %1
				}

				define arm_aapcs_vfpcc <4 x i32> @test_vldrhq_s32(i16* %base) {
				; CHECK-LABEL: test_vldrhq_s32:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vldrh.s32 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast i16* %base to <4 x i16>*
				%1 = load <4 x i16>, <4 x i16>* %0, align 2
				%2 = sext <4 x i16> %1 to <4 x i32>
				ret <4 x i32> %2
				}

				define arm_aapcs_vfpcc <8 x i16> @test_vldrhq_u16(i16* %base) {
				; CHECK-LABEL: test_vldrhq_u16:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vldrh.u16 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast i16* %base to <8 x i16>*
				%1 = load <8 x i16>, <8 x i16>* %0, align 2
				ret <8 x i16> %1
				}

				define arm_aapcs_vfpcc <4 x i32> @test_vldrhq_u32(i16* %base) {
				; CHECK-LABEL: test_vldrhq_u32:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vldrh.u32 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast i16* %base to <4 x i16>*
				%1 = load <4 x i16>, <4 x i16>* %0, align 2
				%2 = zext <4 x i16> %1 to <4 x i32>
				ret <4 x i32> %2
				}

				define arm_aapcs_vfpcc <8 x half> @test_vldrhq_z_f16(half* %base, i16 zeroext %p) {
				; CHECK-LABEL: test_vldrhq_z_f16:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r1
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vldrht.u16 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast half* %base to <8 x half>*
				%1 = zext i16 %p to i32
				%2 = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 %1)
				%3 = call <8 x half> @llvm.masked.load.v8f16.p0v8f16(<8 x half>* %0, i32 2, <8 x i1> %2, <8 x half> zeroinitializer)
				ret <8 x half> %3
				}

				define arm_aapcs_vfpcc <8 x i16> @test_vldrhq_z_s16(i16* %base, i16 zeroext %p) {
				; CHECK-LABEL: test_vldrhq_z_s16:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r1
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vldrht.u16 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast i16* %base to <8 x i16>*
				%1 = zext i16 %p to i32
				%2 = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 %1)
				%3 = call <8 x i16> @llvm.masked.load.v8i16.p0v8i16(<8 x i16>* %0, i32 2, <8 x i1> %2, <8 x i16> zeroinitializer)
				ret <8 x i16> %3
				}

				define arm_aapcs_vfpcc <4 x i32> @test_vldrhq_z_s32(i16* %base, i16 zeroext %p) {
				; CHECK-LABEL: test_vldrhq_z_s32:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r1
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vldrht.s32 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast i16* %base to <4 x i16>*
				%1 = zext i16 %p to i32
				%2 = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 %1)
				%3 = call <4 x i16> @llvm.masked.load.v4i16.p0v4i16(<4 x i16>* %0, i32 2, <4 x i1> %2, <4 x i16> zeroinitializer)
				%4 = sext <4 x i16> %3 to <4 x i32>
				ret <4 x i32> %4
				}

				declare <4 x i16> @llvm.masked.load.v4i16.p0v4i16(<4 x i16>*, i32 immarg, <4 x i1>, <4 x i16>)

				define arm_aapcs_vfpcc <8 x i16> @test_vldrhq_z_u16(i16* %base, i16 zeroext %p) {
				; CHECK-LABEL: test_vldrhq_z_u16:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r1
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vldrht.u16 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast i16* %base to <8 x i16>*
				%1 = zext i16 %p to i32
				%2 = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 %1)
				%3 = call <8 x i16> @llvm.masked.load.v8i16.p0v8i16(<8 x i16>* %0, i32 2, <8 x i1> %2, <8 x i16> zeroinitializer)
				ret <8 x i16> %3
				}

				define arm_aapcs_vfpcc <4 x i32> @test_vldrhq_z_u32(i16* %base, i16 zeroext %p) {
				; CHECK-LABEL: test_vldrhq_z_u32:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r1
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vldrht.u32 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast i16* %base to <4 x i16>*
				%1 = zext i16 %p to i32
				%2 = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 %1)
				%3 = call <4 x i16> @llvm.masked.load.v4i16.p0v4i16(<4 x i16>* %0, i32 2, <4 x i1> %2, <4 x i16> zeroinitializer)
				%4 = zext <4 x i16> %3 to <4 x i32>
				ret <4 x i32> %4
				}

				define arm_aapcs_vfpcc <4 x float> @test_vldrwq_f32(float* %base) {
				; CHECK-LABEL: test_vldrwq_f32:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vldrw.u32 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast float* %base to <4 x float>*
				%1 = load <4 x float>, <4 x float>* %0, align 4
				ret <4 x float> %1
				}

				define arm_aapcs_vfpcc <4 x i32> @test_vldrwq_s32(i32* %base) {
				; CHECK-LABEL: test_vldrwq_s32:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vldrw.u32 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast i32* %base to <4 x i32>*
				%1 = load <4 x i32>, <4 x i32>* %0, align 4
				ret <4 x i32> %1
				}

				define arm_aapcs_vfpcc <4 x i32> @test_vldrwq_u32(i32* %base) {
				; CHECK-LABEL: test_vldrwq_u32:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vldrw.u32 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast i32* %base to <4 x i32>*
				%1 = load <4 x i32>, <4 x i32>* %0, align 4
				ret <4 x i32> %1
				}

				define arm_aapcs_vfpcc <4 x float> @test_vldrwq_z_f32(float* %base, i16 zeroext %p) {
				; CHECK-LABEL: test_vldrwq_z_f32:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r1
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vldrwt.u32 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast float* %base to <4 x float>*
				%1 = zext i16 %p to i32
				%2 = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 %1)
				%3 = call <4 x float> @llvm.masked.load.v4f32.p0v4f32(<4 x float>* %0, i32 4, <4 x i1> %2, <4 x float> zeroinitializer)
				ret <4 x float> %3
				}

				define arm_aapcs_vfpcc <4 x i32> @test_vldrwq_z_s32(i32* %base, i16 zeroext %p) {
				; CHECK-LABEL: test_vldrwq_z_s32:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r1
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vldrwt.u32 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast i32* %base to <4 x i32>*
				%1 = zext i16 %p to i32
				%2 = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 %1)
				%3 = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* %0, i32 4, <4 x i1> %2, <4 x i32> zeroinitializer)
				ret <4 x i32> %3
				}

				define arm_aapcs_vfpcc <4 x i32> @test_vldrwq_z_u32(i32* %base, i16 zeroext %p) {
				; CHECK-LABEL: test_vldrwq_z_u32:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r1
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vldrwt.u32 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast i32* %base to <4 x i32>*
				%1 = zext i16 %p to i32
				%2 = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 %1)
				%3 = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* %0, i32 4, <4 x i1> %2, <4 x i32> zeroinitializer)
				ret <4 x i32> %3
				}

				define arm_aapcs_vfpcc void @test_vst1q_f16(half* %base, <8 x half> %value) {
				; CHECK-LABEL: test_vst1q_f16:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vstrh.16 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast half* %base to <8 x half>*
				store <8 x half> %value, <8 x half>* %0, align 2
				ret void
				}

				define arm_aapcs_vfpcc void @test_vst1q_f32(float* %base, <4 x float> %value) {
				; CHECK-LABEL: test_vst1q_f32:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vstrw.32 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast float* %base to <4 x float>*
				store <4 x float> %value, <4 x float>* %0, align 4
				ret void
				}

				define arm_aapcs_vfpcc void @test_vst1q_s8(i8* %base, <16 x i8> %value) {
				; CHECK-LABEL: test_vst1q_s8:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vstrb.8 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast i8* %base to <16 x i8>*
				store <16 x i8> %value, <16 x i8>* %0, align 1
				ret void
				}

				define arm_aapcs_vfpcc void @test_vst1q_s16(i16* %base, <8 x i16> %value) {
				; CHECK-LABEL: test_vst1q_s16:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vstrh.16 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast i16* %base to <8 x i16>*
				store <8 x i16> %value, <8 x i16>* %0, align 2
				ret void
				}

				define arm_aapcs_vfpcc void @test_vst1q_s32(i32* %base, <4 x i32> %value) {
				; CHECK-LABEL: test_vst1q_s32:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vstrw.32 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast i32* %base to <4 x i32>*
				store <4 x i32> %value, <4 x i32>* %0, align 4
				ret void
				}

				define arm_aapcs_vfpcc void @test_vst1q_u8(i8* %base, <16 x i8> %value) {
				; CHECK-LABEL: test_vst1q_u8:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vstrb.8 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast i8* %base to <16 x i8>*
				store <16 x i8> %value, <16 x i8>* %0, align 1
				ret void
				}

				define arm_aapcs_vfpcc void @test_vst1q_u16(i16* %base, <8 x i16> %value) {
				; CHECK-LABEL: test_vst1q_u16:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vstrh.16 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast i16* %base to <8 x i16>*
				store <8 x i16> %value, <8 x i16>* %0, align 2
				ret void
				}

				define arm_aapcs_vfpcc void @test_vst1q_u32(i32* %base, <4 x i32> %value) {
				; CHECK-LABEL: test_vst1q_u32:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vstrw.32 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast i32* %base to <4 x i32>*
				store <4 x i32> %value, <4 x i32>* %0, align 4
				ret void
				}

				define arm_aapcs_vfpcc void @test_vst1q_p_f16(half* %base, <8 x half> %value, i16 zeroext %p) {
				; CHECK-LABEL: test_vst1q_p_f16:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r1
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vstrht.16 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast half* %base to <8 x half>*
				%1 = zext i16 %p to i32
				%2 = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 %1)
				call void @llvm.masked.store.v8f16.p0v8f16(<8 x half> %value, <8 x half>* %0, i32 2, <8 x i1> %2)
				ret void
				}

				declare void @llvm.masked.store.v8f16.p0v8f16(<8 x half>, <8 x half>*, i32 immarg, <8 x i1>)

				define arm_aapcs_vfpcc void @test_vst1q_p_f32(float* %base, <4 x float> %value, i16 zeroext %p) {
				; CHECK-LABEL: test_vst1q_p_f32:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r1
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vstrwt.32 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast float* %base to <4 x float>*
				%1 = zext i16 %p to i32
				%2 = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 %1)
				call void @llvm.masked.store.v4f32.p0v4f32(<4 x float> %value, <4 x float>* %0, i32 4, <4 x i1> %2)
				ret void
				}

				declare void @llvm.masked.store.v4f32.p0v4f32(<4 x float>, <4 x float>*, i32 immarg, <4 x i1>)

				define arm_aapcs_vfpcc void @test_vst1q_p_s8(i8* %base, <16 x i8> %value, i16 zeroext %p) {
				; CHECK-LABEL: test_vst1q_p_s8:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r1
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vstrbt.8 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast i8* %base to <16 x i8>*
				%1 = zext i16 %p to i32
				%2 = call <16 x i1> @llvm.arm.mve.pred.i2v.v16i1(i32 %1)
				call void @llvm.masked.store.v16i8.p0v16i8(<16 x i8> %value, <16 x i8>* %0, i32 1, <16 x i1> %2)
				ret void
				}

				declare void @llvm.masked.store.v16i8.p0v16i8(<16 x i8>, <16 x i8>*, i32 immarg, <16 x i1>)

				define arm_aapcs_vfpcc void @test_vst1q_p_s16(i16* %base, <8 x i16> %value, i16 zeroext %p) {
				; CHECK-LABEL: test_vst1q_p_s16:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r1
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vstrht.16 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast i16* %base to <8 x i16>*
				%1 = zext i16 %p to i32
				%2 = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 %1)
				call void @llvm.masked.store.v8i16.p0v8i16(<8 x i16> %value, <8 x i16>* %0, i32 2, <8 x i1> %2)
				ret void
				}

				declare void @llvm.masked.store.v8i16.p0v8i16(<8 x i16>, <8 x i16>*, i32 immarg, <8 x i1>)

				define arm_aapcs_vfpcc void @test_vst1q_p_s32(i32* %base, <4 x i32> %value, i16 zeroext %p) {
				; CHECK-LABEL: test_vst1q_p_s32:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r1
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vstrwt.32 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast i32* %base to <4 x i32>*
				%1 = zext i16 %p to i32
				%2 = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 %1)
				call void @llvm.masked.store.v4i32.p0v4i32(<4 x i32> %value, <4 x i32>* %0, i32 4, <4 x i1> %2)
				ret void
				}

				declare void @llvm.masked.store.v4i32.p0v4i32(<4 x i32>, <4 x i32>*, i32 immarg, <4 x i1>)

				define arm_aapcs_vfpcc void @test_vst1q_p_u8(i8* %base, <16 x i8> %value, i16 zeroext %p) {
				; CHECK-LABEL: test_vst1q_p_u8:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r1
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vstrbt.8 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast i8* %base to <16 x i8>*
				%1 = zext i16 %p to i32
				%2 = call <16 x i1> @llvm.arm.mve.pred.i2v.v16i1(i32 %1)
				call void @llvm.masked.store.v16i8.p0v16i8(<16 x i8> %value, <16 x i8>* %0, i32 1, <16 x i1> %2)
				ret void
				}

				define arm_aapcs_vfpcc void @test_vst1q_p_u16(i16* %base, <8 x i16> %value, i16 zeroext %p) {
				; CHECK-LABEL: test_vst1q_p_u16:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r1
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vstrht.16 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast i16* %base to <8 x i16>*
				%1 = zext i16 %p to i32
				%2 = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 %1)
				call void @llvm.masked.store.v8i16.p0v8i16(<8 x i16> %value, <8 x i16>* %0, i32 2, <8 x i1> %2)
				ret void
				}

				define arm_aapcs_vfpcc void @test_vst1q_p_u32(i32* %base, <4 x i32> %value, i16 zeroext %p) {
				; CHECK-LABEL: test_vst1q_p_u32:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r1
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vstrwt.32 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast i32* %base to <4 x i32>*
				%1 = zext i16 %p to i32
				%2 = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 %1)
				call void @llvm.masked.store.v4i32.p0v4i32(<4 x i32> %value, <4 x i32>* %0, i32 4, <4 x i1> %2)
				ret void
				}

				define arm_aapcs_vfpcc void @test_vstrbq_s8(i8* %base, <16 x i8> %value) {
				; CHECK-LABEL: test_vstrbq_s8:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vstrb.8 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast i8* %base to <16 x i8>*
				store <16 x i8> %value, <16 x i8>* %0, align 1
				ret void
				}

				define arm_aapcs_vfpcc void @test_vstrbq_s16(i8* %base, <8 x i16> %value) {
				; CHECK-LABEL: test_vstrbq_s16:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vstrb.16 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = trunc <8 x i16> %value to <8 x i8>
				%1 = bitcast i8* %base to <8 x i8>*
				store <8 x i8> %0, <8 x i8>* %1, align 1
				ret void
				}

				define arm_aapcs_vfpcc void @test_vstrbq_s32(i8* %base, <4 x i32> %value) {
				; CHECK-LABEL: test_vstrbq_s32:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vstrb.32 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = trunc <4 x i32> %value to <4 x i8>
				%1 = bitcast i8* %base to <4 x i8>*
				store <4 x i8> %0, <4 x i8>* %1, align 1
				ret void
				}

				define arm_aapcs_vfpcc void @test_vstrbq_u8(i8* %base, <16 x i8> %value) {
				; CHECK-LABEL: test_vstrbq_u8:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vstrb.8 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast i8* %base to <16 x i8>*
				store <16 x i8> %value, <16 x i8>* %0, align 1
				ret void
				}

				define arm_aapcs_vfpcc void @test_vstrbq_u16(i8* %base, <8 x i16> %value) {
				; CHECK-LABEL: test_vstrbq_u16:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vstrb.16 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = trunc <8 x i16> %value to <8 x i8>
				%1 = bitcast i8* %base to <8 x i8>*
				store <8 x i8> %0, <8 x i8>* %1, align 1
				ret void
				}

				define arm_aapcs_vfpcc void @test_vstrbq_u32(i8* %base, <4 x i32> %value) {
				; CHECK-LABEL: test_vstrbq_u32:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vstrb.32 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = trunc <4 x i32> %value to <4 x i8>
				%1 = bitcast i8* %base to <4 x i8>*
				store <4 x i8> %0, <4 x i8>* %1, align 1
				ret void
				}

				define arm_aapcs_vfpcc void @test_vstrbq_p_s8(i8* %base, <16 x i8> %value, i16 zeroext %p) {
				; CHECK-LABEL: test_vstrbq_p_s8:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r1
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vstrbt.8 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast i8* %base to <16 x i8>*
				%1 = zext i16 %p to i32
				%2 = call <16 x i1> @llvm.arm.mve.pred.i2v.v16i1(i32 %1)
				call void @llvm.masked.store.v16i8.p0v16i8(<16 x i8> %value, <16 x i8>* %0, i32 1, <16 x i1> %2)
				ret void
				}

				define arm_aapcs_vfpcc void @test_vstrbq_p_s16(i8* %base, <8 x i16> %value, i16 zeroext %p) {
				; CHECK-LABEL: test_vstrbq_p_s16:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r1
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vstrbt.16 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = trunc <8 x i16> %value to <8 x i8>
				%1 = bitcast i8* %base to <8 x i8>*
				%2 = zext i16 %p to i32
				%3 = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 %2)
				call void @llvm.masked.store.v8i8.p0v8i8(<8 x i8> %0, <8 x i8>* %1, i32 1, <8 x i1> %3)
				ret void
				}

				declare void @llvm.masked.store.v8i8.p0v8i8(<8 x i8>, <8 x i8>*, i32 immarg, <8 x i1>)

				define arm_aapcs_vfpcc void @test_vstrbq_p_s32(i8* %base, <4 x i32> %value, i16 zeroext %p) {
				; CHECK-LABEL: test_vstrbq_p_s32:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r1
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vstrbt.32 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = trunc <4 x i32> %value to <4 x i8>
				%1 = bitcast i8* %base to <4 x i8>*
				%2 = zext i16 %p to i32
				%3 = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 %2)
				call void @llvm.masked.store.v4i8.p0v4i8(<4 x i8> %0, <4 x i8>* %1, i32 1, <4 x i1> %3)
				ret void
				}

				declare void @llvm.masked.store.v4i8.p0v4i8(<4 x i8>, <4 x i8>*, i32 immarg, <4 x i1>)

				define arm_aapcs_vfpcc void @test_vstrbq_p_u8(i8* %base, <16 x i8> %value, i16 zeroext %p) {
				; CHECK-LABEL: test_vstrbq_p_u8:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r1
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vstrbt.8 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast i8* %base to <16 x i8>*
				%1 = zext i16 %p to i32
				%2 = call <16 x i1> @llvm.arm.mve.pred.i2v.v16i1(i32 %1)
				call void @llvm.masked.store.v16i8.p0v16i8(<16 x i8> %value, <16 x i8>* %0, i32 1, <16 x i1> %2)
				ret void
				}

				define arm_aapcs_vfpcc void @test_vstrbq_p_u16(i8* %base, <8 x i16> %value, i16 zeroext %p) {
				; CHECK-LABEL: test_vstrbq_p_u16:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r1
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vstrbt.16 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = trunc <8 x i16> %value to <8 x i8>
				%1 = bitcast i8* %base to <8 x i8>*
				%2 = zext i16 %p to i32
				%3 = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 %2)
				call void @llvm.masked.store.v8i8.p0v8i8(<8 x i8> %0, <8 x i8>* %1, i32 1, <8 x i1> %3)
				ret void
				}

				define arm_aapcs_vfpcc void @test_vstrbq_p_u32(i8* %base, <4 x i32> %value, i16 zeroext %p) {
				; CHECK-LABEL: test_vstrbq_p_u32:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r1
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vstrbt.32 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = trunc <4 x i32> %value to <4 x i8>
				%1 = bitcast i8* %base to <4 x i8>*
				%2 = zext i16 %p to i32
				%3 = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 %2)
				call void @llvm.masked.store.v4i8.p0v4i8(<4 x i8> %0, <4 x i8>* %1, i32 1, <4 x i1> %3)
				ret void
				}

				define arm_aapcs_vfpcc void @test_vstrhq_f16(half* %base, <8 x half> %value) {
				; CHECK-LABEL: test_vstrhq_f16:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vstrh.16 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast half* %base to <8 x half>*
				store <8 x half> %value, <8 x half>* %0, align 2
				ret void
				}

				define arm_aapcs_vfpcc void @test_vstrhq_s16(i16* %base, <8 x i16> %value) {
				; CHECK-LABEL: test_vstrhq_s16:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vstrh.16 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast i16* %base to <8 x i16>*
				store <8 x i16> %value, <8 x i16>* %0, align 2
				ret void
				}

				define arm_aapcs_vfpcc void @test_vstrhq_s32(i16* %base, <4 x i32> %value) {
				; CHECK-LABEL: test_vstrhq_s32:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vstrh.32 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = trunc <4 x i32> %value to <4 x i16>
				%1 = bitcast i16* %base to <4 x i16>*
				store <4 x i16> %0, <4 x i16>* %1, align 2
				ret void
				}

				define arm_aapcs_vfpcc void @test_vstrhq_u16(i16* %base, <8 x i16> %value) {
				; CHECK-LABEL: test_vstrhq_u16:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vstrh.16 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast i16* %base to <8 x i16>*
				store <8 x i16> %value, <8 x i16>* %0, align 2
				ret void
				}

				define arm_aapcs_vfpcc void @test_vstrhq_u32(i16* %base, <4 x i32> %value) {
				; CHECK-LABEL: test_vstrhq_u32:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vstrh.32 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = trunc <4 x i32> %value to <4 x i16>
				%1 = bitcast i16* %base to <4 x i16>*
				store <4 x i16> %0, <4 x i16>* %1, align 2
				ret void
				}

				define arm_aapcs_vfpcc void @test_vstrhq_p_f16(half* %base, <8 x half> %value, i16 zeroext %p) {
				; CHECK-LABEL: test_vstrhq_p_f16:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r1
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vstrht.16 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast half* %base to <8 x half>*
				%1 = zext i16 %p to i32
				%2 = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 %1)
				call void @llvm.masked.store.v8f16.p0v8f16(<8 x half> %value, <8 x half>* %0, i32 2, <8 x i1> %2)
				ret void
				}

				define arm_aapcs_vfpcc void @test_vstrhq_p_s16(i16* %base, <8 x i16> %value, i16 zeroext %p) {
				; CHECK-LABEL: test_vstrhq_p_s16:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r1
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vstrht.16 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast i16* %base to <8 x i16>*
				%1 = zext i16 %p to i32
				%2 = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 %1)
				call void @llvm.masked.store.v8i16.p0v8i16(<8 x i16> %value, <8 x i16>* %0, i32 2, <8 x i1> %2)
				ret void
				}

				define arm_aapcs_vfpcc void @test_vstrhq_p_s32(i16* %base, <4 x i32> %value, i16 zeroext %p) {
				; CHECK-LABEL: test_vstrhq_p_s32:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r1
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vstrht.32 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = trunc <4 x i32> %value to <4 x i16>
				%1 = bitcast i16* %base to <4 x i16>*
				%2 = zext i16 %p to i32
				%3 = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 %2)
				call void @llvm.masked.store.v4i16.p0v4i16(<4 x i16> %0, <4 x i16>* %1, i32 2, <4 x i1> %3)
				ret void
				}

				declare void @llvm.masked.store.v4i16.p0v4i16(<4 x i16>, <4 x i16>*, i32 immarg, <4 x i1>)

				define arm_aapcs_vfpcc void @test_vstrhq_p_u16(i16* %base, <8 x i16> %value, i16 zeroext %p) {
				; CHECK-LABEL: test_vstrhq_p_u16:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r1
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vstrht.16 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast i16* %base to <8 x i16>*
				%1 = zext i16 %p to i32
				%2 = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 %1)
				call void @llvm.masked.store.v8i16.p0v8i16(<8 x i16> %value, <8 x i16>* %0, i32 2, <8 x i1> %2)
				ret void
				}

				define arm_aapcs_vfpcc void @test_vstrhq_p_u32(i16* %base, <4 x i32> %value, i16 zeroext %p) {
				; CHECK-LABEL: test_vstrhq_p_u32:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r1
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vstrht.32 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = trunc <4 x i32> %value to <4 x i16>
				%1 = bitcast i16* %base to <4 x i16>*
				%2 = zext i16 %p to i32
				%3 = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 %2)
				call void @llvm.masked.store.v4i16.p0v4i16(<4 x i16> %0, <4 x i16>* %1, i32 2, <4 x i1> %3)
				ret void
				}

				define arm_aapcs_vfpcc void @test_vstrwq_f32(float* %base, <4 x float> %value) {
				; CHECK-LABEL: test_vstrwq_f32:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vstrw.32 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast float* %base to <4 x float>*
				store <4 x float> %value, <4 x float>* %0, align 4
				ret void
				}

				define arm_aapcs_vfpcc void @test_vstrwq_s32(i32* %base, <4 x i32> %value) {
				; CHECK-LABEL: test_vstrwq_s32:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vstrw.32 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast i32* %base to <4 x i32>*
				store <4 x i32> %value, <4 x i32>* %0, align 4
				ret void
				}

				define arm_aapcs_vfpcc void @test_vstrwq_u32(i32* %base, <4 x i32> %value) {
				; CHECK-LABEL: test_vstrwq_u32:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vstrw.32 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast i32* %base to <4 x i32>*
				store <4 x i32> %value, <4 x i32>* %0, align 4
				ret void
				}

				define arm_aapcs_vfpcc void @test_vstrwq_p_f32(float* %base, <4 x float> %value, i16 zeroext %p) {
				; CHECK-LABEL: test_vstrwq_p_f32:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r1
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vstrwt.32 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast float* %base to <4 x float>*
				%1 = zext i16 %p to i32
				%2 = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 %1)
				call void @llvm.masked.store.v4f32.p0v4f32(<4 x float> %value, <4 x float>* %0, i32 4, <4 x i1> %2)
				ret void
				}

				define arm_aapcs_vfpcc void @test_vstrwq_p_s32(i32* %base, <4 x i32> %value, i16 zeroext %p) {
				; CHECK-LABEL: test_vstrwq_p_s32:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r1
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vstrwt.32 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast i32* %base to <4 x i32>*
				%1 = zext i16 %p to i32
				%2 = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 %1)
				call void @llvm.masked.store.v4i32.p0v4i32(<4 x i32> %value, <4 x i32>* %0, i32 4, <4 x i1> %2)
				ret void
				}

				define arm_aapcs_vfpcc void @test_vstrwq_p_u32(i32* %base, <4 x i32> %value, i16 zeroext %p) {
				; CHECK-LABEL: test_vstrwq_p_u32:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r1
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vstrwt.32 q0, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = bitcast i32* %base to <4 x i32>*
				%1 = zext i16 %p to i32
				%2 = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 %1)
				call void @llvm.masked.store.v4i32.p0v4i32(<4 x i32> %value, <4 x i32>* %0, i32 4, <4 x i1> %2)
				ret void
				}

Diff 228911
