This is an archive of the discontinued LLVM Phabricator instance.


Differential D71065

[ARM][MVE] Add intrinsics for immediate shifts.
Closed, Public

Authored by simon_tatham on Dec 5 2019, 6:24 AM.


Details

Reviewers

dmgreen
miyuki
MarkMurrayARM
ostannard

Commits

rGbd0f271c9e55: [ARM][MVE] Add intrinsics for immediate shifts. (reland)
rGd97b3e3e65cd: [ARM][MVE] Add intrinsics for immediate shifts.

Summary

This adds the family of vshlq_n and vshrq_n ACLE intrinsics, which
shift every lane of a vector left or right by a compile-time
immediate. They mostly work by expanding to the IR shl, lshr and
ashr operations, with their second operand being a vector splat of
the immediate.
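Here is a rough C model of that expansion. It is a hypothetical sketch, not code from the patch. It uses the GCC/Clang `vector_size` extension in place of the arm_mve.h types, and the type and function names are invented for illustration. Each model splats the immediate into every lane and then shifts lane by lane, which is what `shl`, `ashr` or `lshr` against a splat does in IR.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit vector types with four 32-bit lanes, built with the
   GCC/Clang vector_size extension instead of the arm_mve.h types. */
typedef int32_t  s32x4 __attribute__((vector_size(16)));
typedef uint32_t u32x4 __attribute__((vector_size(16)));

/* Model of vshlq_n_u32: splat the immediate into every lane, then shift
   lane by lane, like `shl <4 x i32> %v, <splat of imm>` in IR. */
static u32x4 model_vshlq_n_u32(u32x4 v, uint32_t imm) {
    assert(imm <= 31);                 /* imm_0toNm1 range for 32-bit lanes */
    u32x4 splat = {imm, imm, imm, imm};
    return v << splat;
}

/* Model of vshrq_n_s32: signed lanes use an arithmetic shift (ashr). */
static s32x4 model_vshrq_n_s32(s32x4 v, int32_t imm) {
    assert(imm >= 1 && imm <= 31);     /* ACLE also allows 32; not modelled here */
    s32x4 splat = {imm, imm, imm, imm};
    return v >> splat;
}

/* Model of vshrq_n_u32: unsigned lanes use a logical shift (lshr). */
static u32x4 model_vshrq_n_u32(u32x4 v, uint32_t imm) {
    assert(imm >= 1 && imm <= 31);
    u32x4 splat = {imm, imm, imm, imm};
    return v >> splat;
}
```

The element type alone selects between `ashr` and `lshr`. The polymorphic `vshrq` form relies on the same property, so it needs no explicit signedness argument.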

There's a fiddly special case, though. ACLE specifies that the
immediate in vshrq_n can take values up to and including the bit
size of the vector lane. But LLVM IR treats a right shift by the full
size of the lane as producing an undefined result, and feels free to
replace the lshr with an undef halfway through the optimization
pipeline. Hence, to keep this legal in source code, I have to detect
it at codegen time. Logical (unsigned) right shifts by the element
size are handled by simply emitting the zero vector; arithmetic ones
are converted into a shift of one bit fewer, which always gives the
same output.
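The special case can be checked with a scalar reference model of one 32-bit lane. This is a hypothetical plain-C illustration, not code from the patch, and the function names are invented. It follows the same rule as MVEImmediateShr: a count equal to the lane width is handled specially. The reason is that `x >> 32` on a 32-bit value is undefined in C, just as `lshr`/`ashr` by 32 yields an undefined result in LLVM IR.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of one lane of vshrq_n_u32, for a count in 1..32. */
static uint32_t lane_vshrq_n_u32(uint32_t x, unsigned shift) {
    assert(shift >= 1 && shift <= 32);  /* the imm_1toN range for 32-bit lanes */
    if (shift == 32)
        return 0;                       /* every bit is shifted out: a zero lane */
    return x >> shift;                  /* x >> 32 would be undefined in C */
}

/* Hypothetical scalar model of one lane of vshrq_n_s32, for a count in 1..32. */
static int32_t lane_vshrq_n_s32(int32_t x, unsigned shift) {
    assert(shift >= 1 && shift <= 32);
    if (shift == 32)
        shift = 31;   /* after 31 bits only copies of the sign bit remain,
                         so shifting one more bit would not change the lane */
    return x >> shift; /* arithmetic for negative x with GCC and Clang;
                          implementation-defined in ISO C */
}
```

For example, a lane holding -5 becomes -1 whether it is shifted by 31 or by 32. An unsigned lane shifted by 32 is always 0, which is why the codegen can emit a constant zero vector without looking at the input.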

In order to do that check, I also had to enhance the tablegen
MveEmitter so that it can cope with converting a builtin function's
operand into a bare integer to pass to a code-generating subfunction.
Previously the only bare integers it knew how to handle were flags
generated from within arm_mve.td.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

simon_tatham created this revision. Dec 5 2019, 6:24 AM

Herald added projects: Restricted Project, Restricted Project. Dec 5 2019, 6:24 AM

Herald added subscribers: llvm-commits, cfe-commits, hiraditya, kristof.beyls.

Harbormaster completed remote builds in B41926: Diff 232334. Dec 5 2019, 6:24 AM

LGTM

This revision is now accepted and ready to land. Dec 6 2019, 9:54 AM

Closed by commit rGd97b3e3e65cd: [ARM][MVE] Add intrinsics for immediate shifts. (authored by simon_tatham). Dec 9 2019, 7:47 AM

This revision was automatically updated to reflect the committed changes.

simon_tatham mentioned this in rG8d70f3c933a5: [ARM] Fix NEON failure introduced by D71065. Dec 9 2019, 9:01 AM

Reopening to review a revised version of this patch. It was reverted yesterday because of a test failure in release builds, which looks like the result of a warning fix that moved a side effect into an assert statement.

This revision is now accepted and ready to land. Dec 10 2019, 2:24 AM

Changes from the previous version:

minor cleanup: removed check of hasIntegerConstantValue in IRBuilderResult::more_prerequisites, which was causing the generated codegen to perform a pointless call to EmitScalarExpr whose result was thrown away.

incorporated NEON test failure fix from rG8d70f3c933a5b81a

incorporated the -Winconsistent-missing-override fix from rGff4dceef9201c5ae

did not incorporate the -Wunused-variable change from rGff4dceef9201c5ae which moved a side-effecting function call into an assert statement. Instead fixed it with a (void)IsConst like all the other call sites.
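The general pitfall behind the release-build failure can be sketched in C. This is a hypothetical illustration: `compute_constant` stands in for a side-effecting call such as isIntegerConstantExpr, and none of these names come from the patch. When NDEBUG is defined, as in release builds, `assert()` expands to `((void)0)`, so a call inside its argument is never made. Making the call unconditionally and casting the saved result to void keeps the side effect and still silences -Wunused-variable.

```c
#include <assert.h>
#include <stdbool.h>

static int evaluations = 0;  /* counts how often the side effect runs */

/* Hypothetical stand-in for a call like isIntegerConstantExpr(): it writes
   a result through an out-parameter and also reports success. */
static bool compute_constant(int *out) {
    ++evaluations;
    *out = 42;
    return true;
}

/* Fragile: with NDEBUG defined, the whole assert() argument disappears,
   compute_constant() is never called and value stays 0. */
int get_value_fragile(void) {
    int value = 0;
    assert(compute_constant(&value) && "Sema should have checked this");
    return value;
}

/* Robust: always make the call, assert on the saved result, and cast it to
   void so an NDEBUG build does not warn about an unused variable. */
int get_value_robust(void) {
    int value = 0;
    bool is_const = compute_constant(&value);
    assert(is_const && "Sema should have checked this");
    (void)is_const;
    return value;
}
```

In a debug build both functions return 42. Only the robust one still returns 42 once NDEBUG is defined.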

Harbormaster completed remote builds in B42190: Diff 233025. Dec 10 2019, 2:32 AM

@hokein, @rdhindsa, @echristo: you all pointed out test failures in the previous version. Are there any problems with this one that I haven't spotted?

One of the advantages of smaller patches, I guess :)

It's probably difficult to tell whether this will cause problems again without trying it and seeing whether any of the buildbots complain. Let's give it a try and see. Just keep an eye on them; annoyingly, we don't get any of the failure emails that we should.

clang/utils/TableGen/MveEmitter.cpp, line 674:
This Sep isn't needed any more?

Closed by commit rGbd0f271c9e55: [ARM][MVE] Add intrinsics for immediate shifts. (reland) (authored by simon_tatham). Dec 11 2019, 2:11 AM

This revision was automatically updated to reflect the committed changes.

simon_tatham mentioned this in D71335: [ARM][MVE] Factor out an IntrinsicMX multiclass. Dec 11 2019, 2:57 AM

simon_tatham mentioned this in rGd290424731ed: [ARM][MVE] Factor out an IntrinsicMX multiclass. Dec 11 2019, 4:09 AM

Revision Contents

Path                                                           Size
clang/include/clang/Basic/arm_mve.td                           27 lines
clang/include/clang/Basic/arm_mve_defs.td                       8 lines
clang/lib/CodeGen/CGBuiltin.cpp                                29 lines
clang/test/CodeGen/arm-mve-intrinsics/vector-shift-imm.c      722 lines
clang/utils/TableGen/MveEmitter.cpp                            83 lines
llvm/include/llvm/IR/IntrinsicsARM.td                           8 lines
llvm/lib/Target/ARM/ARMInstrMVE.td                             54 lines
llvm/test/CodeGen/Thumb2/mve-intrinsics/vector-shift-imm.ll   398 lines

Diff 232862

clang/include/clang/Basic/arm_mve.td

@@ ... @@ multiclass scatter_offset_both<list<Type> types, PrimitiveType memtype,
   defm "": scatter_offset_shifted<types, memtype, shift>;
 }

 defm vstrbq: scatter_offset_unshifted<!listconcat(T.All8,T.Int16,T.Int32), u8>;
 defm vstrhq: scatter_offset_both<!listconcat(T.All16, T.Int32), u16, 1>;
 defm vstrwq: scatter_offset_both<T.All32, u32, 2>;
 defm vstrdq: scatter_offset_both<T.Int64, u64, 3>;

+multiclass PredicatedImmediateVectorShift<
+    Immediate immtype, string predIntrName, list<dag> unsignedFlag = []> {
+  foreach predIntr = [IRInt<predIntrName, [Vector, Predicate]>] in {
+    def _m_n: Intrinsic<Vector, (args Vector:$inactive, Vector:$v,
+                                      immtype:$sh, Predicate:$pred),
+                        !con((predIntr $v, $sh), !dag(predIntr, unsignedFlag, ?),
+                             (predIntr $pred, $inactive))>;
+    def _x_n: Intrinsic<Vector, (args Vector:$v, immtype:$sh,
+                                      Predicate:$pred),
+                        !con((predIntr $v, $sh), !dag(predIntr, unsignedFlag, ?),
+                             (predIntr $pred, (undef Vector)))>;
+  }
+}
+
+let params = T.Int in {
+  def vshlq_n: Intrinsic<Vector, (args Vector:$v, imm_0toNm1:$sh),
+                         (shl $v, (splat (Scalar $sh)))>;
+  defm vshlq: PredicatedImmediateVectorShift<imm_0toNm1, "shl_imm_predicated">;
+
+  let pnt = PNT_NType in {
+    def vshrq_n: Intrinsic<Vector, (args Vector:$v, imm_1toN:$sh),
+                           (immshr $v, $sh, (unsignedflag Scalar))>;
+    defm vshrq: PredicatedImmediateVectorShift<imm_1toN, "shr_imm_predicated",
+                                               [(unsignedflag Scalar)]>;
+  }
+}
+
 // Base class for the scalar shift intrinsics.
 class ScalarShift<Type argtype, dag shiftCountArg, dag shiftCodeGen>:
   Intrinsic<argtype, !con((args argtype:$value), shiftCountArg), shiftCodeGen> {
   let params = [Void];
   let pnt = PNT_None;
 }

 // Subclass that includes the machinery to take a 64-bit input apart
@@ ... @@

clang/include/clang/Basic/arm_mve_defs.td

@@ ... @@
 def mul: IRBuilder<"CreateMul">;
 def not: IRBuilder<"CreateNot">;
 def or: IRBuilder<"CreateOr">;
 def and: IRBuilder<"CreateAnd">;
 def xor: IRBuilder<"CreateXor">;
 def sub: IRBuilder<"CreateSub">;
 def shl: IRBuilder<"CreateShl">;
 def lshr: IRBuilder<"CreateLShr">;
+def immshr: CGHelperFn<"MVEImmediateShr"> {
+  let special_params = [IRBuilderIntParam<1, "unsigned">,
+                        IRBuilderIntParam<2, "bool">];
+}
 def fadd: IRBuilder<"CreateFAdd">;
 def fmul: IRBuilder<"CreateFMul">;
 def fsub: IRBuilder<"CreateFSub">;
 def load: IRBuilder<"CreateLoad"> {
   let special_params = [IRBuilderAddrParam<0>];
 }
 def store: IRBuilder<"CreateStore"> {
   let special_params = [IRBuilderAddrParam<1>];
@@ ... @@
 }

 // imm_1toN can take any value from 1 to N inclusive, where N is the number of
 // bits in the main parameter type. (E.g. an immediate shift count, in an
 // intrinsic that shifts every lane of a vector by the same amount.)
 //
 // imm_0toNm1 is the same but with the range offset by 1, i.e. 0 to N-1
 // inclusive.
-def imm_1toN : Immediate<u32, IB_EltBit<1>>;
-def imm_0toNm1 : Immediate<u32, IB_EltBit<0>>;
+def imm_1toN : Immediate<sint, IB_EltBit<1>>;
+def imm_0toNm1 : Immediate<sint, IB_EltBit<0>>;

 // imm_lane has to be the index of a vector lane in the main vector type, i.e
 // it can range from 0 to (128 / size of scalar)-1 inclusive. (e.g. vgetq_lane)
 def imm_lane : Immediate<sint, IB_LaneIndex>;

 // imm_1to32 can be in the range 1 to 32, unconditionally. (e.g. scalar shift
 // intrinsics)
 def imm_1to32 : Immediate<sint, IB_ConstRange<1, 32>>;
@@ ... @@

clang/lib/CodeGen/CGBuiltin.cpp


@@ ... @@ case NEON::BI__builtin_neon_vtbx3_v:
     return EmitNeonCall(CGM.getIntrinsic(Intrinsic::arm_neon_vtbx3),
                         Ops, "vtbx3");
   case NEON::BI__builtin_neon_vtbx4_v:
     return EmitNeonCall(CGM.getIntrinsic(Intrinsic::arm_neon_vtbx4),
                         Ops, "vtbx4");
   }
 }

+template<typename Integer>
+static Integer GetIntegerConstantValue(const Expr *E, ASTContext &Context) {
+  llvm::APSInt IntVal;
+  bool IsConst = E->isIntegerConstantExpr(IntVal, Context);
+  assert(IsConst && "Sema should have checked this was a constant");
+  return IntVal.getExtValue();
+}
+
 static llvm::Value *SignOrZeroExtend(CGBuilderTy &Builder, llvm::Value *V,
                                      llvm::Type *T, bool Unsigned) {
   // Helper function called by Tablegen-constructed ARM MVE builtin codegen,
   // which finds it convenient to specify signed/unsigned as a boolean flag.
   return Unsigned ? Builder.CreateZExt(V, T) : Builder.CreateSExt(V, T);
 }

+static llvm::Value *MVEImmediateShr(CGBuilderTy &Builder, llvm::Value *V,
+                                    uint32_t Shift, bool Unsigned) {
+  // MVE helper function for integer shift right. This must handle signed vs
+  // unsigned, and also deal specially with the case where the shift count is
+  // equal to the lane size. In LLVM IR, an LShr with that parameter would be
+  // undefined behavior, but in MVE it's legal, so we must convert it to code
+  // that is not undefined in IR.
+  unsigned LaneBits =
+      V->getType()->getVectorElementType()->getPrimitiveSizeInBits();
+  if (Shift == LaneBits) {
+    // An unsigned shift of the full lane size always generates zero, so we can
+    // simply emit a zero vector. A signed shift of the full lane size does the
+    // same thing as shifting by one bit fewer.
+    if (Unsigned)
+      return llvm::Constant::getNullValue(V->getType());
+    else
+      --Shift;
+  }
+  return Unsigned ? Builder.CreateLShr(V, Shift) : Builder.CreateAShr(V, Shift);
+}
+
 static llvm::Value *ARMMVEVectorSplat(CGBuilderTy &Builder, llvm::Value *V) {
   // MVE-specific helper function for a vector splat, which infers the element
   // count of the output vector by knowing that MVE vectors are all 128 bits
   // wide.
   unsigned Elements = 128 / V->getType()->getPrimitiveSizeInBits();
   return Builder.CreateVectorSplat(Elements, V);
 }

@@ ... @@

clang/test/CodeGen/arm-mve-intrinsics/vector-shift-imm.c

This file was added.

				// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
				// RUN: %clang_cc1 -triple thumbv8.1m.main-arm-none-eabi -target-feature +mve.fp -mfloat-abi hard -fallow-half-arguments-and-returns -O0 -disable-O0-optnone -S -emit-llvm -o - %s | opt -S -mem2reg | FileCheck %s
				// RUN: %clang_cc1 -triple thumbv8.1m.main-arm-none-eabi -target-feature +mve.fp -mfloat-abi hard -fallow-half-arguments-and-returns -O0 -disable-O0-optnone -DPOLYMORPHIC -S -emit-llvm -o - %s | opt -S -mem2reg | FileCheck %s

				#include <arm_mve.h>

				// CHECK-LABEL: @test_vshlq_n_s8(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = shl <16 x i8> [[A:%.*]], <i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5>
				// CHECK-NEXT: ret <16 x i8> [[TMP0]]
				//
				int8x16_t test_vshlq_n_s8(int8x16_t a)
				{
				#ifdef POLYMORPHIC
				return vshlq_n(a, 5);
				#else /* POLYMORPHIC */
				return vshlq_n_s8(a, 5);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vshlq_n_s16(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = shl <8 x i16> [[A:%.*]], <i16 5, i16 5, i16 5, i16 5, i16 5, i16 5, i16 5, i16 5>
				// CHECK-NEXT: ret <8 x i16> [[TMP0]]
				//
				int16x8_t test_vshlq_n_s16(int16x8_t a)
				{
				#ifdef POLYMORPHIC
				return vshlq_n(a, 5);
				#else /* POLYMORPHIC */
				return vshlq_n_s16(a, 5);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vshlq_n_s32(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = shl <4 x i32> [[A:%.*]], <i32 18, i32 18, i32 18, i32 18>
				// CHECK-NEXT: ret <4 x i32> [[TMP0]]
				//
				int32x4_t test_vshlq_n_s32(int32x4_t a)
				{
				#ifdef POLYMORPHIC
				return vshlq_n(a, 18);
				#else /* POLYMORPHIC */
				return vshlq_n_s32(a, 18);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vshlq_n_s8_trivial(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = shl <16 x i8> [[A:%.*]], zeroinitializer
				// CHECK-NEXT: ret <16 x i8> [[TMP0]]
				//
				int8x16_t test_vshlq_n_s8_trivial(int8x16_t a)
				{
				#ifdef POLYMORPHIC
				return vshlq_n(a, 0);
				#else /* POLYMORPHIC */
				return vshlq_n_s8(a, 0);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vshlq_n_s16_trivial(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = shl <8 x i16> [[A:%.*]], zeroinitializer
				// CHECK-NEXT: ret <8 x i16> [[TMP0]]
				//
				int16x8_t test_vshlq_n_s16_trivial(int16x8_t a)
				{
				#ifdef POLYMORPHIC
				return vshlq_n(a, 0);
				#else /* POLYMORPHIC */
				return vshlq_n_s16(a, 0);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vshlq_n_s32_trivial(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = shl <4 x i32> [[A:%.*]], zeroinitializer
				// CHECK-NEXT: ret <4 x i32> [[TMP0]]
				//
				int32x4_t test_vshlq_n_s32_trivial(int32x4_t a)
				{
				#ifdef POLYMORPHIC
				return vshlq_n(a, 0);
				#else /* POLYMORPHIC */
				return vshlq_n_s32(a, 0);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vshlq_n_u8(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = shl <16 x i8> [[A:%.*]], <i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3>
				// CHECK-NEXT: ret <16 x i8> [[TMP0]]
				//
				uint8x16_t test_vshlq_n_u8(uint8x16_t a)
				{
				#ifdef POLYMORPHIC
				return vshlq_n(a, 3);
				#else /* POLYMORPHIC */
				return vshlq_n_u8(a, 3);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vshlq_n_u16(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = shl <8 x i16> [[A:%.*]], <i16 11, i16 11, i16 11, i16 11, i16 11, i16 11, i16 11, i16 11>
				// CHECK-NEXT: ret <8 x i16> [[TMP0]]
				//
				uint16x8_t test_vshlq_n_u16(uint16x8_t a)
				{
				#ifdef POLYMORPHIC
				return vshlq_n(a, 11);
				#else /* POLYMORPHIC */
				return vshlq_n_u16(a, 11);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vshlq_n_u32(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = shl <4 x i32> [[A:%.*]], <i32 7, i32 7, i32 7, i32 7>
				// CHECK-NEXT: ret <4 x i32> [[TMP0]]
				//
				uint32x4_t test_vshlq_n_u32(uint32x4_t a)
				{
				#ifdef POLYMORPHIC
				return vshlq_n(a, 7);
				#else /* POLYMORPHIC */
				return vshlq_n_u32(a, 7);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vshlq_n_u8_trivial(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = shl <16 x i8> [[A:%.*]], zeroinitializer
				// CHECK-NEXT: ret <16 x i8> [[TMP0]]
				//
				uint8x16_t test_vshlq_n_u8_trivial(uint8x16_t a)
				{
				#ifdef POLYMORPHIC
				return vshlq_n(a, 0);
				#else /* POLYMORPHIC */
				return vshlq_n_u8(a, 0);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vshlq_n_u16_trivial(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = shl <8 x i16> [[A:%.*]], zeroinitializer
				// CHECK-NEXT: ret <8 x i16> [[TMP0]]
				//
				uint16x8_t test_vshlq_n_u16_trivial(uint16x8_t a)
				{
				#ifdef POLYMORPHIC
				return vshlq_n(a, 0);
				#else /* POLYMORPHIC */
				return vshlq_n_u16(a, 0);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vshlq_n_u32_trivial(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = shl <4 x i32> [[A:%.*]], zeroinitializer
				// CHECK-NEXT: ret <4 x i32> [[TMP0]]
				//
				uint32x4_t test_vshlq_n_u32_trivial(uint32x4_t a)
				{
				#ifdef POLYMORPHIC
				return vshlq_n(a, 0);
				#else /* POLYMORPHIC */
				return vshlq_n_u32(a, 0);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vshrq_n_s8(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = ashr <16 x i8> [[A:%.*]], <i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4>
				// CHECK-NEXT: ret <16 x i8> [[TMP0]]
				//
				int8x16_t test_vshrq_n_s8(int8x16_t a)
				{
				#ifdef POLYMORPHIC
				return vshrq(a, 4);
				#else /* POLYMORPHIC */
				return vshrq_n_s8(a, 4);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vshrq_n_s16(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = ashr <8 x i16> [[A:%.*]], <i16 10, i16 10, i16 10, i16 10, i16 10, i16 10, i16 10, i16 10>
				// CHECK-NEXT: ret <8 x i16> [[TMP0]]
				//
				int16x8_t test_vshrq_n_s16(int16x8_t a)
				{
				#ifdef POLYMORPHIC
				return vshrq(a, 10);
				#else /* POLYMORPHIC */
				return vshrq_n_s16(a, 10);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vshrq_n_s32(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = ashr <4 x i32> [[A:%.*]], <i32 19, i32 19, i32 19, i32 19>
				// CHECK-NEXT: ret <4 x i32> [[TMP0]]
				//
				int32x4_t test_vshrq_n_s32(int32x4_t a)
				{
				#ifdef POLYMORPHIC
				return vshrq(a, 19);
				#else /* POLYMORPHIC */
				return vshrq_n_s32(a, 19);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vshrq_n_s8_trivial(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = ashr <16 x i8> [[A:%.*]], <i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7>
				// CHECK-NEXT: ret <16 x i8> [[TMP0]]
				//
				int8x16_t test_vshrq_n_s8_trivial(int8x16_t a)
				{
				#ifdef POLYMORPHIC
				return vshrq(a, 8);
				#else /* POLYMORPHIC */
				return vshrq_n_s8(a, 8);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vshrq_n_s16_trivial(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = ashr <8 x i16> [[A:%.*]], <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>
				// CHECK-NEXT: ret <8 x i16> [[TMP0]]
				//
				int16x8_t test_vshrq_n_s16_trivial(int16x8_t a)
				{
				#ifdef POLYMORPHIC
				return vshrq(a, 16);
				#else /* POLYMORPHIC */
				return vshrq_n_s16(a, 16);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vshrq_n_s32_trivial(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = ashr <4 x i32> [[A:%.*]], <i32 31, i32 31, i32 31, i32 31>
				// CHECK-NEXT: ret <4 x i32> [[TMP0]]
				//
				int32x4_t test_vshrq_n_s32_trivial(int32x4_t a)
				{
				#ifdef POLYMORPHIC
				return vshrq(a, 32);
				#else /* POLYMORPHIC */
				return vshrq_n_s32(a, 32);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vshrq_n_u8(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = lshr <16 x i8> [[A:%.*]], <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
				// CHECK-NEXT: ret <16 x i8> [[TMP0]]
				//
				uint8x16_t test_vshrq_n_u8(uint8x16_t a)
				{
				#ifdef POLYMORPHIC
				return vshrq(a, 1);
				#else /* POLYMORPHIC */
				return vshrq_n_u8(a, 1);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vshrq_n_u16(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = lshr <8 x i16> [[A:%.*]], <i16 10, i16 10, i16 10, i16 10, i16 10, i16 10, i16 10, i16 10>
				// CHECK-NEXT: ret <8 x i16> [[TMP0]]
				//
				uint16x8_t test_vshrq_n_u16(uint16x8_t a)
				{
				#ifdef POLYMORPHIC
				return vshrq(a, 10);
				#else /* POLYMORPHIC */
				return vshrq_n_u16(a, 10);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vshrq_n_u32(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = lshr <4 x i32> [[A:%.*]], <i32 10, i32 10, i32 10, i32 10>
				// CHECK-NEXT: ret <4 x i32> [[TMP0]]
				//
				uint32x4_t test_vshrq_n_u32(uint32x4_t a)
				{
				#ifdef POLYMORPHIC
				return vshrq(a, 10);
				#else /* POLYMORPHIC */
				return vshrq_n_u32(a, 10);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vshrq_n_u8_trivial(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: ret <16 x i8> zeroinitializer
				//
				uint8x16_t test_vshrq_n_u8_trivial(uint8x16_t a)
				{
				#ifdef POLYMORPHIC
				return vshrq(a, 8);
				#else /* POLYMORPHIC */
				return vshrq_n_u8(a, 8);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vshrq_n_u16_trivial(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: ret <8 x i16> zeroinitializer
				//
				uint16x8_t test_vshrq_n_u16_trivial(uint16x8_t a)
				{
				#ifdef POLYMORPHIC
				return vshrq(a, 16);
				#else /* POLYMORPHIC */
				return vshrq_n_u16(a, 16);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vshrq_n_u32_trivial(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: ret <4 x i32> zeroinitializer
				//
				uint32x4_t test_vshrq_n_u32_trivial(uint32x4_t a)
				{
				#ifdef POLYMORPHIC
				return vshrq(a, 32);
				#else /* POLYMORPHIC */
				return vshrq_n_u32(a, 32);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vshlq_m_n_s8(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
				// CHECK-NEXT: [[TMP1:%.*]] = call <16 x i1> @llvm.arm.mve.pred.i2v.v16i1(i32 [[TMP0]])
				// CHECK-NEXT: [[TMP2:%.*]] = call <16 x i8> @llvm.arm.mve.shl.imm.predicated.v16i8.v16i1(<16 x i8> [[A:%.*]], i32 6, <16 x i1> [[TMP1]], <16 x i8> [[INACTIVE:%.*]])
				// CHECK-NEXT: ret <16 x i8> [[TMP2]]
				//
				int8x16_t test_vshlq_m_n_s8(int8x16_t inactive, int8x16_t a, mve_pred16_t p)
				{
				#ifdef POLYMORPHIC
				return vshlq_m_n(inactive, a, 6, p);
				#else /* POLYMORPHIC */
				return vshlq_m_n_s8(inactive, a, 6, p);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vshlq_m_n_s16(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
				// CHECK-NEXT: [[TMP1:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP0]])
				// CHECK-NEXT: [[TMP2:%.*]] = call <8 x i16> @llvm.arm.mve.shl.imm.predicated.v8i16.v8i1(<8 x i16> [[A:%.*]], i32 13, <8 x i1> [[TMP1]], <8 x i16> [[INACTIVE:%.*]])
				// CHECK-NEXT: ret <8 x i16> [[TMP2]]
				//
				int16x8_t test_vshlq_m_n_s16(int16x8_t inactive, int16x8_t a, mve_pred16_t p)
				{
				#ifdef POLYMORPHIC
				return vshlq_m_n(inactive, a, 13, p);
				#else /* POLYMORPHIC */
				return vshlq_m_n_s16(inactive, a, 13, p);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vshlq_m_n_s32(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
				// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
				// CHECK-NEXT: [[TMP2:%.*]] = call <4 x i32> @llvm.arm.mve.shl.imm.predicated.v4i32.v4i1(<4 x i32> [[A:%.*]], i32 0, <4 x i1> [[TMP1]], <4 x i32> [[INACTIVE:%.*]])
				// CHECK-NEXT: ret <4 x i32> [[TMP2]]
				//
				int32x4_t test_vshlq_m_n_s32(int32x4_t inactive, int32x4_t a, mve_pred16_t p)
				{
				#ifdef POLYMORPHIC
				return vshlq_m_n(inactive, a, 0, p);
				#else /* POLYMORPHIC */
				return vshlq_m_n_s32(inactive, a, 0, p);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vshlq_m_n_u8(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
				// CHECK-NEXT: [[TMP1:%.*]] = call <16 x i1> @llvm.arm.mve.pred.i2v.v16i1(i32 [[TMP0]])
				// CHECK-NEXT: [[TMP2:%.*]] = call <16 x i8> @llvm.arm.mve.shl.imm.predicated.v16i8.v16i1(<16 x i8> [[A:%.*]], i32 3, <16 x i1> [[TMP1]], <16 x i8> [[INACTIVE:%.*]])
				// CHECK-NEXT: ret <16 x i8> [[TMP2]]
				//
				uint8x16_t test_vshlq_m_n_u8(uint8x16_t inactive, uint8x16_t a, mve_pred16_t p)
				{
				#ifdef POLYMORPHIC
				return vshlq_m_n(inactive, a, 3, p);
				#else /* POLYMORPHIC */
				return vshlq_m_n_u8(inactive, a, 3, p);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vshlq_m_n_u16(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
				// CHECK-NEXT: [[TMP1:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP0]])
				// CHECK-NEXT: [[TMP2:%.*]] = call <8 x i16> @llvm.arm.mve.shl.imm.predicated.v8i16.v8i1(<8 x i16> [[A:%.*]], i32 1, <8 x i1> [[TMP1]], <8 x i16> [[INACTIVE:%.*]])
				// CHECK-NEXT: ret <8 x i16> [[TMP2]]
				//
				uint16x8_t test_vshlq_m_n_u16(uint16x8_t inactive, uint16x8_t a, mve_pred16_t p)
				{
				#ifdef POLYMORPHIC
				return vshlq_m_n(inactive, a, 1, p);
				#else /* POLYMORPHIC */
				return vshlq_m_n_u16(inactive, a, 1, p);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vshlq_m_n_u32(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
				// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
				// CHECK-NEXT: [[TMP2:%.*]] = call <4 x i32> @llvm.arm.mve.shl.imm.predicated.v4i32.v4i1(<4 x i32> [[A:%.*]], i32 24, <4 x i1> [[TMP1]], <4 x i32> [[INACTIVE:%.*]])
				// CHECK-NEXT: ret <4 x i32> [[TMP2]]
				//
				uint32x4_t test_vshlq_m_n_u32(uint32x4_t inactive, uint32x4_t a, mve_pred16_t p)
				{
				#ifdef POLYMORPHIC
				return vshlq_m_n(inactive, a, 24, p);
				#else /* POLYMORPHIC */
				return vshlq_m_n_u32(inactive, a, 24, p);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vshrq_m_n_s8(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
				// CHECK-NEXT: [[TMP1:%.*]] = call <16 x i1> @llvm.arm.mve.pred.i2v.v16i1(i32 [[TMP0]])
				// CHECK-NEXT: [[TMP2:%.*]] = call <16 x i8> @llvm.arm.mve.shr.imm.predicated.v16i8.v16i1(<16 x i8> [[A:%.*]], i32 2, i32 0, <16 x i1> [[TMP1]], <16 x i8> [[INACTIVE:%.*]])
				// CHECK-NEXT: ret <16 x i8> [[TMP2]]
				//
				int8x16_t test_vshrq_m_n_s8(int8x16_t inactive, int8x16_t a, mve_pred16_t p)
				{
				#ifdef POLYMORPHIC
				return vshrq_m(inactive, a, 2, p);
				#else /* POLYMORPHIC */
				return vshrq_m_n_s8(inactive, a, 2, p);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vshrq_m_n_s16(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
				// CHECK-NEXT: [[TMP1:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP0]])
				// CHECK-NEXT: [[TMP2:%.*]] = call <8 x i16> @llvm.arm.mve.shr.imm.predicated.v8i16.v8i1(<8 x i16> [[A:%.*]], i32 3, i32 0, <8 x i1> [[TMP1]], <8 x i16> [[INACTIVE:%.*]])
				// CHECK-NEXT: ret <8 x i16> [[TMP2]]
				//
				int16x8_t test_vshrq_m_n_s16(int16x8_t inactive, int16x8_t a, mve_pred16_t p)
				{
				#ifdef POLYMORPHIC
				return vshrq_m(inactive, a, 3, p);
				#else /* POLYMORPHIC */
				return vshrq_m_n_s16(inactive, a, 3, p);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vshrq_m_n_s32(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
				// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
				// CHECK-NEXT: [[TMP2:%.*]] = call <4 x i32> @llvm.arm.mve.shr.imm.predicated.v4i32.v4i1(<4 x i32> [[A:%.*]], i32 13, i32 0, <4 x i1> [[TMP1]], <4 x i32> [[INACTIVE:%.*]])
				// CHECK-NEXT: ret <4 x i32> [[TMP2]]
				//
				int32x4_t test_vshrq_m_n_s32(int32x4_t inactive, int32x4_t a, mve_pred16_t p)
				{
				#ifdef POLYMORPHIC
				return vshrq_m(inactive, a, 13, p);
				#else /* POLYMORPHIC */
				return vshrq_m_n_s32(inactive, a, 13, p);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vshrq_m_n_u8(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
				// CHECK-NEXT: [[TMP1:%.*]] = call <16 x i1> @llvm.arm.mve.pred.i2v.v16i1(i32 [[TMP0]])
				// CHECK-NEXT: [[TMP2:%.*]] = call <16 x i8> @llvm.arm.mve.shr.imm.predicated.v16i8.v16i1(<16 x i8> [[A:%.*]], i32 4, i32 1, <16 x i1> [[TMP1]], <16 x i8> [[INACTIVE:%.*]])
				// CHECK-NEXT: ret <16 x i8> [[TMP2]]
				//
				uint8x16_t test_vshrq_m_n_u8(uint8x16_t inactive, uint8x16_t a, mve_pred16_t p)
				{
				#ifdef POLYMORPHIC
				return vshrq_m(inactive, a, 4, p);
				#else /* POLYMORPHIC */
				return vshrq_m_n_u8(inactive, a, 4, p);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vshrq_m_n_u16(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
				// CHECK-NEXT: [[TMP1:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP0]])
				// CHECK-NEXT: [[TMP2:%.*]] = call <8 x i16> @llvm.arm.mve.shr.imm.predicated.v8i16.v8i1(<8 x i16> [[A:%.*]], i32 14, i32 1, <8 x i1> [[TMP1]], <8 x i16> [[INACTIVE:%.*]])
				// CHECK-NEXT: ret <8 x i16> [[TMP2]]
				//
				uint16x8_t test_vshrq_m_n_u16(uint16x8_t inactive, uint16x8_t a, mve_pred16_t p)
				{
				#ifdef POLYMORPHIC
				return vshrq_m(inactive, a, 14, p);
				#else /* POLYMORPHIC */
				return vshrq_m_n_u16(inactive, a, 14, p);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vshrq_m_n_u32(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
				// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
				// CHECK-NEXT: [[TMP2:%.*]] = call <4 x i32> @llvm.arm.mve.shr.imm.predicated.v4i32.v4i1(<4 x i32> [[A:%.*]], i32 21, i32 1, <4 x i1> [[TMP1]], <4 x i32> [[INACTIVE:%.*]])
				// CHECK-NEXT: ret <4 x i32> [[TMP2]]
				//
				uint32x4_t test_vshrq_m_n_u32(uint32x4_t inactive, uint32x4_t a, mve_pred16_t p)
				{
				#ifdef POLYMORPHIC
				return vshrq_m(inactive, a, 21, p);
				#else /* POLYMORPHIC */
				return vshrq_m_n_u32(inactive, a, 21, p);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vshlq_x_n_s8(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
				// CHECK-NEXT: [[TMP1:%.*]] = call <16 x i1> @llvm.arm.mve.pred.i2v.v16i1(i32 [[TMP0]])
				// CHECK-NEXT: [[TMP2:%.*]] = call <16 x i8> @llvm.arm.mve.shl.imm.predicated.v16i8.v16i1(<16 x i8> [[A:%.*]], i32 1, <16 x i1> [[TMP1]], <16 x i8> undef)
				// CHECK-NEXT: ret <16 x i8> [[TMP2]]
				//
				int8x16_t test_vshlq_x_n_s8(int8x16_t a, mve_pred16_t p)
				{
				#ifdef POLYMORPHIC
				return vshlq_x_n(a, 1, p);
				#else /* POLYMORPHIC */
				return vshlq_x_n_s8(a, 1, p);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vshlq_x_n_s16(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
				// CHECK-NEXT: [[TMP1:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP0]])
				// CHECK-NEXT: [[TMP2:%.*]] = call <8 x i16> @llvm.arm.mve.shl.imm.predicated.v8i16.v8i1(<8 x i16> [[A:%.*]], i32 15, <8 x i1> [[TMP1]], <8 x i16> undef)
				// CHECK-NEXT: ret <8 x i16> [[TMP2]]
				//
				int16x8_t test_vshlq_x_n_s16(int16x8_t a, mve_pred16_t p)
				{
				#ifdef POLYMORPHIC
				return vshlq_x_n(a, 15, p);
				#else /* POLYMORPHIC */
				return vshlq_x_n_s16(a, 15, p);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vshlq_x_n_s32(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
				// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
				// CHECK-NEXT: [[TMP2:%.*]] = call <4 x i32> @llvm.arm.mve.shl.imm.predicated.v4i32.v4i1(<4 x i32> [[A:%.*]], i32 13, <4 x i1> [[TMP1]], <4 x i32> undef)
				// CHECK-NEXT: ret <4 x i32> [[TMP2]]
				//
				int32x4_t test_vshlq_x_n_s32(int32x4_t a, mve_pred16_t p)
				{
				#ifdef POLYMORPHIC
				return vshlq_x_n(a, 13, p);
				#else /* POLYMORPHIC */
				return vshlq_x_n_s32(a, 13, p);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vshlq_x_n_u8(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
				// CHECK-NEXT: [[TMP1:%.*]] = call <16 x i1> @llvm.arm.mve.pred.i2v.v16i1(i32 [[TMP0]])
				// CHECK-NEXT: [[TMP2:%.*]] = call <16 x i8> @llvm.arm.mve.shl.imm.predicated.v16i8.v16i1(<16 x i8> [[A:%.*]], i32 4, <16 x i1> [[TMP1]], <16 x i8> undef)
				// CHECK-NEXT: ret <16 x i8> [[TMP2]]
				//
				uint8x16_t test_vshlq_x_n_u8(uint8x16_t a, mve_pred16_t p)
				{
				#ifdef POLYMORPHIC
				return vshlq_x_n(a, 4, p);
				#else /* POLYMORPHIC */
				return vshlq_x_n_u8(a, 4, p);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vshlq_x_n_u16(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
				// CHECK-NEXT: [[TMP1:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP0]])
				// CHECK-NEXT: [[TMP2:%.*]] = call <8 x i16> @llvm.arm.mve.shl.imm.predicated.v8i16.v8i1(<8 x i16> [[A:%.*]], i32 10, <8 x i1> [[TMP1]], <8 x i16> undef)
				// CHECK-NEXT: ret <8 x i16> [[TMP2]]
				//
				uint16x8_t test_vshlq_x_n_u16(uint16x8_t a, mve_pred16_t p)
				{
				#ifdef POLYMORPHIC
				return vshlq_x_n(a, 10, p);
				#else /* POLYMORPHIC */
				return vshlq_x_n_u16(a, 10, p);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vshlq_x_n_u32(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
				// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
				// CHECK-NEXT: [[TMP2:%.*]] = call <4 x i32> @llvm.arm.mve.shl.imm.predicated.v4i32.v4i1(<4 x i32> [[A:%.*]], i32 30, <4 x i1> [[TMP1]], <4 x i32> undef)
				// CHECK-NEXT: ret <4 x i32> [[TMP2]]
				//
				uint32x4_t test_vshlq_x_n_u32(uint32x4_t a, mve_pred16_t p)
				{
				#ifdef POLYMORPHIC
				return vshlq_x_n(a, 30, p);
				#else /* POLYMORPHIC */
				return vshlq_x_n_u32(a, 30, p);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vshrq_x_n_s8(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
				// CHECK-NEXT: [[TMP1:%.*]] = call <16 x i1> @llvm.arm.mve.pred.i2v.v16i1(i32 [[TMP0]])
				// CHECK-NEXT: [[TMP2:%.*]] = call <16 x i8> @llvm.arm.mve.shr.imm.predicated.v16i8.v16i1(<16 x i8> [[A:%.*]], i32 4, i32 0, <16 x i1> [[TMP1]], <16 x i8> undef)
				// CHECK-NEXT: ret <16 x i8> [[TMP2]]
				//
				int8x16_t test_vshrq_x_n_s8(int8x16_t a, mve_pred16_t p)
				{
				#ifdef POLYMORPHIC
				return vshrq_x(a, 4, p);
				#else /* POLYMORPHIC */
				return vshrq_x_n_s8(a, 4, p);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vshrq_x_n_s16(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
				// CHECK-NEXT: [[TMP1:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP0]])
				// CHECK-NEXT: [[TMP2:%.*]] = call <8 x i16> @llvm.arm.mve.shr.imm.predicated.v8i16.v8i1(<8 x i16> [[A:%.*]], i32 10, i32 0, <8 x i1> [[TMP1]], <8 x i16> undef)
				// CHECK-NEXT: ret <8 x i16> [[TMP2]]
				//
				int16x8_t test_vshrq_x_n_s16(int16x8_t a, mve_pred16_t p)
				{
				#ifdef POLYMORPHIC
				return vshrq_x(a, 10, p);
				#else /* POLYMORPHIC */
				return vshrq_x_n_s16(a, 10, p);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vshrq_x_n_s32(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
				// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
				// CHECK-NEXT: [[TMP2:%.*]] = call <4 x i32> @llvm.arm.mve.shr.imm.predicated.v4i32.v4i1(<4 x i32> [[A:%.*]], i32 7, i32 0, <4 x i1> [[TMP1]], <4 x i32> undef)
				// CHECK-NEXT: ret <4 x i32> [[TMP2]]
				//
				int32x4_t test_vshrq_x_n_s32(int32x4_t a, mve_pred16_t p)
				{
				#ifdef POLYMORPHIC
				return vshrq_x(a, 7, p);
				#else /* POLYMORPHIC */
				return vshrq_x_n_s32(a, 7, p);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vshrq_x_n_u8(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
				// CHECK-NEXT: [[TMP1:%.*]] = call <16 x i1> @llvm.arm.mve.pred.i2v.v16i1(i32 [[TMP0]])
				// CHECK-NEXT: [[TMP2:%.*]] = call <16 x i8> @llvm.arm.mve.shr.imm.predicated.v16i8.v16i1(<16 x i8> [[A:%.*]], i32 7, i32 1, <16 x i1> [[TMP1]], <16 x i8> undef)
				// CHECK-NEXT: ret <16 x i8> [[TMP2]]
				//
				uint8x16_t test_vshrq_x_n_u8(uint8x16_t a, mve_pred16_t p)
				{
				#ifdef POLYMORPHIC
				return vshrq_x(a, 7, p);
				#else /* POLYMORPHIC */
				return vshrq_x_n_u8(a, 7, p);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vshrq_x_n_u16(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
				// CHECK-NEXT: [[TMP1:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP0]])
				// CHECK-NEXT: [[TMP2:%.*]] = call <8 x i16> @llvm.arm.mve.shr.imm.predicated.v8i16.v8i1(<8 x i16> [[A:%.*]], i32 7, i32 1, <8 x i1> [[TMP1]], <8 x i16> undef)
				// CHECK-NEXT: ret <8 x i16> [[TMP2]]
				//
				uint16x8_t test_vshrq_x_n_u16(uint16x8_t a, mve_pred16_t p)
				{
				#ifdef POLYMORPHIC
				return vshrq_x(a, 7, p);
				#else /* POLYMORPHIC */
				return vshrq_x_n_u16(a, 7, p);
				#endif /* POLYMORPHIC */
				}

				// CHECK-LABEL: @test_vshrq_x_n_u32(
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
				// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
				// CHECK-NEXT: [[TMP2:%.*]] = call <4 x i32> @llvm.arm.mve.shr.imm.predicated.v4i32.v4i1(<4 x i32> [[A:%.*]], i32 6, i32 1, <4 x i1> [[TMP1]], <4 x i32> undef)
				// CHECK-NEXT: ret <4 x i32> [[TMP2]]
				//
				uint32x4_t test_vshrq_x_n_u32(uint32x4_t a, mve_pred16_t p)
				{
				#ifdef POLYMORPHIC
				return vshrq_x(a, 6, p);
				#else /* POLYMORPHIC */
				return vshrq_x_n_u32(a, 6, p);
				#endif /* POLYMORPHIC */
				}
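The predicated tests above pass the immediate straight to an MVE intrinsic and use immediates below the lane size. The summary's full-width special case concerns the IR `lshr`/`ashr` expansion instead. ACLE allows 1..LaneBits there, but IR treats a shift by LaneBits as undefined.

Below is a minimal C++ sketch of the rewrite the summary describes. `ShiftPlan`, `planVshrqN` and `ashr8` are hypothetical names, not clang's. The helper `ashr8` shows why "one bit less" is safe for arithmetic shifts.

```cpp
#include <cassert>
#include <cstdint>

// What to emit for vshrq_n with a given lane size and immediate.
struct ShiftPlan {
  bool EmitZeroVector;    // true: the whole result is the zero vector
  unsigned IRShiftAmount; // otherwise: amount to splat into lshr/ashr
};

// Hypothetical mirror of the codegen decision: immediates below LaneBits
// are legal IR shift amounts; a full-width shift must be rewritten.
ShiftPlan planVshrqN(unsigned LaneBits, unsigned Imm, bool IsUnsigned) {
  if (Imm == LaneBits) {
    if (IsUnsigned)
      return {true, 0};           // every bit is shifted out: all zeroes
    return {false, LaneBits - 1}; // ashr by LaneBits-1 already gives 0 or -1
  }
  return {false, Imm};
}

// Arithmetic right shift of an int8_t lane that avoids relying on the
// implementation-defined behaviour of >> on negative values.
int8_t ashr8(int8_t X, unsigned Amt) {
  int V = X;
  int R = V < 0 ? ~(~V >> Amt) : (V >> Amt);
  return static_cast<int8_t>(R);
}
```

Under this sketch, `planVshrqN(8, 8, true)` selects the zero vector. `planVshrqN(32, 32, false)` falls back to a shift of 31. An 8-bit arithmetic shift by 7 already leaves only copies of the sign bit.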

clang/utils/TableGen/MveEmitter.cpp

Show First 20 Lines • Show All 463 Lines • ▼ Show 20 Lines	private:
unsigned Visited = 0;		unsigned Visited = 0;

public:		public:
virtual ~Result() = default;		virtual ~Result() = default;
using Scope = std::map<std::string, Ptr>;		using Scope = std::map<std::string, Ptr>;
virtual void genCode(raw_ostream &OS, CodeGenParamAllocator &) const = 0;		virtual void genCode(raw_ostream &OS, CodeGenParamAllocator &) const = 0;
virtual bool hasIntegerConstantValue() const { return false; }		virtual bool hasIntegerConstantValue() const { return false; }
virtual uint32_t integerConstantValue() const { return 0; }		virtual uint32_t integerConstantValue() const { return 0; }
		virtual bool hasIntegerValue() const { return false; }
		virtual std::string getIntegerValue(const std::string &) {
		llvm_unreachable("non-working Result::getIntegerValue called");
		}
virtual std::string typeName() const { return "Value *"; }		virtual std::string typeName() const { return "Value *"; }

// Mostly, when a code-generation operation has a dependency on prior		// Mostly, when a code-generation operation has a dependency on prior
// operations, it's because it uses the output values of those operations as		// operations, it's because it uses the output values of those operations as
// inputs. But there's one exception, which is the use of 'seq' in Tablegen		// inputs. But there's one exception, which is the use of 'seq' in Tablegen
// to indicate that operations have to be performed in sequence regardless of		// to indicate that operations have to be performed in sequence regardless of
// whether they use each others' output values.		// whether they use each others' output values.
//		//
▲ Show 20 Lines • Show All 58 Lines • ▼ Show 20 Lines
// There are aggregate parameters in the MVE intrinsics API, but we don't deal		// There are aggregate parameters in the MVE intrinsics API, but we don't deal
// with them in this Tablegen back end: they only arise in the vld2q/vld4q and		// with them in this Tablegen back end: they only arise in the vld2q/vld4q and
// vst2q/vst4q family, which is few enough that we just write the code by hand		// vst2q/vst4q family, which is few enough that we just write the code by hand
// for those in CGBuiltin.cpp.		// for those in CGBuiltin.cpp.
class BuiltinArgResult : public Result {		class BuiltinArgResult : public Result {
public:		public:
unsigned ArgNum;		unsigned ArgNum;
bool AddressType;		bool AddressType;
BuiltinArgResult(unsigned ArgNum, bool AddressType)		bool Immediate;
: ArgNum(ArgNum), AddressType(AddressType) {}		BuiltinArgResult(unsigned ArgNum, bool AddressType, bool Immediate)
		: ArgNum(ArgNum), AddressType(AddressType), Immediate(Immediate) {}
void genCode(raw_ostream &OS, CodeGenParamAllocator &) const override {		void genCode(raw_ostream &OS, CodeGenParamAllocator &) const override {
OS << (AddressType ? "EmitPointerWithAlignment" : "EmitScalarExpr")		OS << (AddressType ? "EmitPointerWithAlignment" : "EmitScalarExpr")
<< "(E->getArg(" << ArgNum << "))";		<< "(E->getArg(" << ArgNum << "))";
}		}
std::string typeName() const override {		std::string typeName() const override {
return AddressType ? "Address" : Result::typeName();		return AddressType ? "Address" : Result::typeName();
}		}
// Emit code to generate this result as a Value *.		// Emit code to generate this result as a Value *.
std::string asValue() override {		std::string asValue() override {
if (AddressType)		if (AddressType)
return "(" + varname() + ".getPointer())";		return "(" + varname() + ".getPointer())";
return Result::asValue();		return Result::asValue();
}		}
		bool hasIntegerValue() const override { return Immediate; }
		virtual std::string getIntegerValue(const std::string &IntType) {
		return "GetIntegerConstantValue<" + IntType + ">(E->getArg(" +
		utostr(ArgNum) + "), getContext())";
		}
};		};
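For an argument declared as an Immediate, the new `getIntegerValue` override spells the argument as a C++ integer expression. That expression is evaluated when clang generates code, rather than being an emitted `llvm::Value *`.

What follows is a simplified, hypothetical stand-in for `BuiltinArgResult`. The struct name is illustrative, and `std::to_string` replaces LLVM's `utostr`. The `GetIntegerConstantValue` helper named in the generated string is defined outside this excerpt.

```cpp
#include <cassert>
#include <string>

// Keeps only the parts of BuiltinArgResult that decide how an
// immediate argument is spelled in the generated CGBuiltin code.
struct BuiltinArgSketch {
  unsigned ArgNum;
  bool Immediate;

  bool hasIntegerValue() const { return Immediate; }

  // Builds the same string as BuiltinArgResult::getIntegerValue.
  std::string getIntegerValue(const std::string &IntType) const {
    return "GetIntegerConstantValue<" + IntType + ">(E->getArg(" +
           std::to_string(ArgNum) + "), getContext())";
  }
};
```

For argument 1 with `IntType` set to `uint32_t`, the sketch yields `GetIntegerConstantValue<uint32_t>(E->getArg(1), getContext())`.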

// Result subclass for an integer literal appearing in Tablegen. This may need		// Result subclass for an integer literal appearing in Tablegen. This may need
// to be turned into an llvm::Result by means of llvm::ConstantInt::get(), or		// to be turned into an llvm::Result by means of llvm::ConstantInt::get(), or
// it may be used directly as an integer, depending on which IRBuilder method		// it may be used directly as an integer, depending on which IRBuilder method
// it's being passed to.		// it's being passed to.
class IntLiteralResult : public Result {		class IntLiteralResult : public Result {
public:		public:
▲ Show 20 Lines • Show All 58 Lines • ▼ Show 20 Lines
// method we want to use will have a Tablegen record giving the method name and		// method we want to use will have a Tablegen record giving the method name and
// describing any important details of how to call it, such as whether a		// describing any important details of how to call it, such as whether a
// particular argument should be an integer constant instead of an llvm::Value.		// particular argument should be an integer constant instead of an llvm::Value.
class IRBuilderResult : public Result {		class IRBuilderResult : public Result {
public:		public:
StringRef CallPrefix;		StringRef CallPrefix;
std::vector<Ptr> Args;		std::vector<Ptr> Args;
std::set<unsigned> AddressArgs;		std::set<unsigned> AddressArgs;
std::map<unsigned, std::string> IntConstantArgs;		std::map<unsigned, std::string> IntegerArgs;
IRBuilderResult(StringRef CallPrefix, std::vector<Ptr> Args,		IRBuilderResult(StringRef CallPrefix, std::vector<Ptr> Args,
std::set<unsigned> AddressArgs,		std::set<unsigned> AddressArgs,
std::map<unsigned, std::string> IntConstantArgs)		std::map<unsigned, std::string> IntegerArgs)
: CallPrefix(CallPrefix), Args(Args), AddressArgs(AddressArgs),		: CallPrefix(CallPrefix), Args(Args), AddressArgs(AddressArgs),
IntConstantArgs(IntConstantArgs) {}		IntegerArgs(IntegerArgs) {}
void genCode(raw_ostream &OS,		void genCode(raw_ostream &OS,
CodeGenParamAllocator &ParamAlloc) const override {		CodeGenParamAllocator &ParamAlloc) const override {
OS << CallPrefix;		OS << CallPrefix;
const char *Sep = "";		const char *Sep = "";
for (unsigned i = 0, e = Args.size(); i < e; ++i) {		for (unsigned i = 0, e = Args.size(); i < e; ++i) {
Ptr Arg = Args[i];		Ptr Arg = Args[i];
auto it = IntConstantArgs.find(i);		auto it = IntegerArgs.find(i);
if (it != IntConstantArgs.end()) {
assert(Arg->hasIntegerConstantValue());		OS << Sep;
OS << Sep << "static_cast<" << it->second << ">("		Sep = ", ";
<< ParamAlloc.allocParam("unsigned",
		if (it != IntegerArgs.end()) {
		if (Arg->hasIntegerConstantValue())
		OS << "static_cast<" << it->second << ">("
		<< ParamAlloc.allocParam(it->second,
utostr(Arg->integerConstantValue()))		utostr(Arg->integerConstantValue()))
<< ")";		<< ")";
		else if (Arg->hasIntegerValue())
		OS << ParamAlloc.allocParam(it->second,
		Arg->getIntegerValue(it->second));
} else {		} else {
OS << Sep << Arg->varname();		OS << Arg->varname();
}		}
Sep = ", ";		Sep = ", ";
		dmgreen: This Sep isn't needed any more?
}		}
OS << ")";		OS << ")";
}		}
void morePrerequisites(std::vector<Ptr> &output) const override {		void morePrerequisites(std::vector<Ptr> &output) const override {
for (unsigned i = 0, e = Args.size(); i < e; ++i) {		for (unsigned i = 0, e = Args.size(); i < e; ++i) {
Ptr Arg = Args[i];		Ptr Arg = Args[i];
if (IntConstantArgs.find(i) != IntConstantArgs.end())		if (IntegerArgs.find(i) != IntegerArgs.end() &&
		Arg->hasIntegerConstantValue())
continue;		continue;
output.push_back(Arg);		output.push_back(Arg);
}		}
}		}
};		};
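The renamed `IntegerArgs` map now covers two kinds of integer argument:

- a Tablegen literal, emitted as a `static_cast`;
- a result that only reports `hasIntegerValue()`, such as an immediate builtin argument.

Below is a minimal, hypothetical model of the argument loop in `genCode`. It drops the `CodeGenParamAllocator` step and replaces `Result` objects with plain fields. The call prefix used in the usage note is made up.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Simplified description of one IRBuilder call argument.
struct ArgSketch {
  bool HasConstant;    // a Tablegen integer literal
  uint32_t Constant;
  bool HasIntExpr;     // a C++ integer expression (e.g. an immediate)
  std::string IntExpr;
  std::string VarName; // name of an already-emitted llvm::Value *
};

std::string genCallSketch(const std::string &CallPrefix,
                          const std::vector<ArgSketch> &Args,
                          const std::map<unsigned, std::string> &IntegerArgs) {
  std::string Out = CallPrefix;
  const char *Sep = "";
  for (unsigned i = 0, e = Args.size(); i < e; ++i) {
    Out += Sep;
    Sep = ", "; // updated once per iteration, which is all that is needed
    const ArgSketch &Arg = Args[i];
    auto It = IntegerArgs.find(i);
    bool WantsInteger = It != IntegerArgs.end();
    if (WantsInteger && Arg.HasConstant)
      Out += "static_cast<" + It->second + ">(" +
             std::to_string(Arg.Constant) + ")";
    else if (WantsInteger && Arg.HasIntExpr)
      Out += Arg.IntExpr;
    else
      Out += Arg.VarName; // sketch-only fallback: pass the Value * by name
  }
  return Out + ")";
}
```

With `IntegerArgs` mapping index 1 to `unsigned`, the arguments `Val0` and the literal 7 produce `Builder.CreateFoo(Val0, static_cast<unsigned>(7))`. Here `Builder.CreateFoo(` is a hypothetical prefix.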

// Result subclass representing making an Address out of a Value.		// Result subclass representing making an Address out of a Value.
class AddressResult : public Result {		class AddressResult : public Result {
▲ Show 20 Lines • Show All 302 Lines • ▼ Show 20 Lines	public:

// Functions that translate the Tablegen representation of an intrinsic's		// Functions that translate the Tablegen representation of an intrinsic's
// code generation into a collection of Value objects (which will then be		// code generation into a collection of Value objects (which will then be
// reprocessed to read out the actual C++ code included by CGBuiltin.cpp).		// reprocessed to read out the actual C++ code included by CGBuiltin.cpp).
Result::Ptr getCodeForDag(DagInit *D, const Result::Scope &Scope,		Result::Ptr getCodeForDag(DagInit *D, const Result::Scope &Scope,
const Type *Param);		const Type *Param);
Result::Ptr getCodeForDagArg(DagInit *D, unsigned ArgNum,		Result::Ptr getCodeForDagArg(DagInit *D, unsigned ArgNum,
const Result::Scope &Scope, const Type *Param);		const Result::Scope &Scope, const Type *Param);
Result::Ptr getCodeForArg(unsigned ArgNum, const Type *ArgType,		Result::Ptr getCodeForArg(unsigned ArgNum, const Type *ArgType, bool Promote,
bool Promote);		bool Immediate);

// Constructor and top-level functions.		// Constructor and top-level functions.

MveEmitter(RecordKeeper &Records);		MveEmitter(RecordKeeper &Records);

void EmitHeader(raw_ostream &OS);		void EmitHeader(raw_ostream &OS);
void EmitBuiltinDef(raw_ostream &OS);		void EmitBuiltinDef(raw_ostream &OS);
void EmitBuiltinSema(raw_ostream &OS);		void EmitBuiltinSema(raw_ostream &OS);
▲ Show 20 Lines • Show All 146 Lines • ▼ Show 20 Lines	if (const auto *ST = dyn_cast<ScalarType>(getType(TypeRec, Param))) {
PrintFatalError("unsignedflag's argument should be a scalar type");		PrintFatalError("unsignedflag's argument should be a scalar type");
}		}
} else {		} else {
std::vector<Result::Ptr> Args;		std::vector<Result::Ptr> Args;
for (unsigned i = 0, e = D->getNumArgs(); i < e; ++i)		for (unsigned i = 0, e = D->getNumArgs(); i < e; ++i)
Args.push_back(getCodeForDagArg(D, i, Scope, Param));		Args.push_back(getCodeForDagArg(D, i, Scope, Param));
if (Op->isSubClassOf("IRBuilderBase")) {		if (Op->isSubClassOf("IRBuilderBase")) {
std::set<unsigned> AddressArgs;		std::set<unsigned> AddressArgs;
std::map<unsigned, std::string> IntConstantArgs;		std::map<unsigned, std::string> IntegerArgs;
for (Record *sp : Op->getValueAsListOfDefs("special_params")) {		for (Record *sp : Op->getValueAsListOfDefs("special_params")) {
unsigned Index = sp->getValueAsInt("index");		unsigned Index = sp->getValueAsInt("index");
if (sp->isSubClassOf("IRBuilderAddrParam")) {		if (sp->isSubClassOf("IRBuilderAddrParam")) {
AddressArgs.insert(Index);		AddressArgs.insert(Index);
} else if (sp->isSubClassOf("IRBuilderIntParam")) {		} else if (sp->isSubClassOf("IRBuilderIntParam")) {
IntConstantArgs[Index] = sp->getValueAsString("type");		IntegerArgs[Index] = sp->getValueAsString("type");
}		}
}		}
return std::make_shared<IRBuilderResult>(		return std::make_shared<IRBuilderResult>(Op->getValueAsString("prefix"),
Op->getValueAsString("prefix"), Args, AddressArgs, IntConstantArgs);		Args, AddressArgs, IntegerArgs);
} else if (Op->isSubClassOf("IRIntBase")) {		} else if (Op->isSubClassOf("IRIntBase")) {
std::vector<const Type *> ParamTypes;		std::vector<const Type *> ParamTypes;
for (Record *RParam : Op->getValueAsListOfDefs("params"))		for (Record *RParam : Op->getValueAsListOfDefs("params"))
ParamTypes.push_back(getType(RParam, Param));		ParamTypes.push_back(getType(RParam, Param));
std::string IntName = Op->getValueAsString("intname");		std::string IntName = Op->getValueAsString("intname");
if (Op->getValueAsBit("appendKind"))		if (Op->getValueAsBit("appendKind"))
IntName += "_" + toLetter(cast<ScalarType>(Param)->kind());		IntName += "_" + toLetter(cast<ScalarType>(Param)->kind());
return std::make_shared<IRIntrinsicResult>(IntName, ParamTypes, Args);		return std::make_shared<IRIntrinsicResult>(IntName, ParamTypes, Args);
Show All 33 Lines	if (Rec->isSubClassOf("Type")) {
return std::make_shared<TypeResult>(T);		return std::make_shared<TypeResult>(T);
}		}
}		}

PrintFatalError("bad dag argument type for code generation");		PrintFatalError("bad dag argument type for code generation");
}		}

Result::Ptr MveEmitter::getCodeForArg(unsigned ArgNum, const Type *ArgType,		Result::Ptr MveEmitter::getCodeForArg(unsigned ArgNum, const Type *ArgType,
bool Promote) {		bool Promote, bool Immediate) {
Result::Ptr V =		Result::Ptr V = std::make_shared<BuiltinArgResult>(
std::make_shared<BuiltinArgResult>(ArgNum, isa<PointerType>(ArgType));		ArgNum, isa<PointerType>(ArgType), Immediate);

if (Promote) {		if (Promote) {
if (const auto *ST = dyn_cast<ScalarType>(ArgType)) {		if (const auto *ST = dyn_cast<ScalarType>(ArgType)) {
if (ST->isInteger() && ST->sizeInBits() < 32)		if (ST->isInteger() && ST->sizeInBits() < 32)
V = std::make_shared<IntCastResult>(getScalarType("u32"), V);		V = std::make_shared<IntCastResult>(getScalarType("u32"), V);
} else if (const auto *PT = dyn_cast<PredicateType>(ArgType)) {		} else if (const auto *PT = dyn_cast<PredicateType>(ArgType)) {
V = std::make_shared<IntCastResult>(getScalarType("u32"), V);		V = std::make_shared<IntCastResult>(getScalarType("u32"), V);
V = std::make_shared<IRIntrinsicResult>("arm_mve_pred_i2v",		V = std::make_shared<IRIntrinsicResult>("arm_mve_pred_i2v",
▲ Show 20 Lines • Show All 56 Lines • ▼ Show 20 Lines	if (auto TypeDI = dyn_cast<DefInit>(TypeInit))
if (TypeDI->getDef()->isSubClassOf("unpromoted"))		if (TypeDI->getDef()->isSubClassOf("unpromoted"))
Promote = false;		Promote = false;

// Work out the type of the argument, for use in the function prototype in		// Work out the type of the argument, for use in the function prototype in
// the header file.		// the header file.
const Type *ArgType = ME.getType(TypeInit, Param);		const Type *ArgType = ME.getType(TypeInit, Param);
ArgTypes.push_back(ArgType);		ArgTypes.push_back(ArgType);

// The argument will usually have a name in the arguments dag, which goes
// into the variable-name scope that the code gen will refer to.
StringRef ArgName = ArgsDag->getArgNameStr(i);
if (!ArgName.empty())
Scope[ArgName] = ME.getCodeForArg(i, ArgType, Promote);

// If the argument is a subclass of Immediate, record the details about		// If the argument is a subclass of Immediate, record the details about
// what values it can take, for Sema checking.		// what values it can take, for Sema checking.
		bool Immediate = false;
if (auto TypeDI = dyn_cast<DefInit>(TypeInit)) {		if (auto TypeDI = dyn_cast<DefInit>(TypeInit)) {
Record *TypeRec = TypeDI->getDef();		Record *TypeRec = TypeDI->getDef();
if (TypeRec->isSubClassOf("Immediate")) {		if (TypeRec->isSubClassOf("Immediate")) {
		Immediate = true;

Record *Bounds = TypeRec->getValueAsDef("bounds");		Record *Bounds = TypeRec->getValueAsDef("bounds");
ImmediateArg &IA = ImmediateArgs[i];		ImmediateArg &IA = ImmediateArgs[i];
if (Bounds->isSubClassOf("IB_ConstRange")) {		if (Bounds->isSubClassOf("IB_ConstRange")) {
IA.boundsType = ImmediateArg::BoundsType::ExplicitRange;		IA.boundsType = ImmediateArg::BoundsType::ExplicitRange;
IA.i1 = Bounds->getValueAsInt("lo");		IA.i1 = Bounds->getValueAsInt("lo");
IA.i2 = Bounds->getValueAsInt("hi");		IA.i2 = Bounds->getValueAsInt("hi");
} else if (Bounds->getName() == "IB_UEltValue") {		} else if (Bounds->getName() == "IB_UEltValue") {
IA.boundsType = ImmediateArg::BoundsType::UInt;		IA.boundsType = ImmediateArg::BoundsType::UInt;
IA.i1 = Param->sizeInBits();		IA.i1 = Param->sizeInBits();
} else if (Bounds->getName() == "IB_LaneIndex") {		} else if (Bounds->getName() == "IB_LaneIndex") {
IA.boundsType = ImmediateArg::BoundsType::ExplicitRange;		IA.boundsType = ImmediateArg::BoundsType::ExplicitRange;
IA.i1 = 0;		IA.i1 = 0;
IA.i2 = 128 / Param->sizeInBits() - 1;		IA.i2 = 128 / Param->sizeInBits() - 1;
} else if (Bounds->getName() == "IB_EltBit") {		} else if (Bounds->isSubClassOf("IB_EltBit")) {
IA.boundsType = ImmediateArg::BoundsType::ExplicitRange;		IA.boundsType = ImmediateArg::BoundsType::ExplicitRange;
IA.i1 = Bounds->getValueAsInt("base");		IA.i1 = Bounds->getValueAsInt("base");
IA.i2 = IA.i1 + Param->sizeInBits() - 1;		IA.i2 = IA.i1 + Param->sizeInBits() - 1;
} else {		} else {
PrintFatalError("unrecognised ImmediateBounds subclass");		PrintFatalError("unrecognised ImmediateBounds subclass");
}		}

IA.ArgType = ArgType;		IA.ArgType = ArgType;

if (!TypeRec->isValueUnset("extra")) {		if (!TypeRec->isValueUnset("extra")) {
IA.ExtraCheckType = TypeRec->getValueAsString("extra");		IA.ExtraCheckType = TypeRec->getValueAsString("extra");
if (!TypeRec->isValueUnset("extraarg"))		if (!TypeRec->isValueUnset("extraarg"))
IA.ExtraCheckArgs = TypeRec->getValueAsString("extraarg");		IA.ExtraCheckArgs = TypeRec->getValueAsString("extraarg");
}		}
}		}
}		}

		// The argument will usually have a name in the arguments dag, which goes
		// into the variable-name scope that the code gen will refer to.
		StringRef ArgName = ArgsDag->getArgNameStr(i);
		if (!ArgName.empty())
		Scope[ArgName] = ME.getCodeForArg(i, ArgType, Promote, Immediate);
}		}

// Finally, go through the codegen dag and translate it into a Result object		// Finally, go through the codegen dag and translate it into a Result object
// (with an arbitrary DAG of depended-on Results hanging off it).		// (with an arbitrary DAG of depended-on Results hanging off it).
DagInit *CodeDag = R->getValueAsDag("codegen");		DagInit *CodeDag = R->getValueAsDag("codegen");
Record *MainOp = cast<DefInit>(CodeDag->getOperator())->getDef();		Record *MainOp = cast<DefInit>(CodeDag->getOperator())->getDef();
if (MainOp->isSubClassOf("CustomCodegen")) {		if (MainOp->isSubClassOf("CustomCodegen")) {
// Or, if it's the special case of CustomCodegen, just accumulate		// Or, if it's the special case of CustomCodegen, just accumulate
▲ Show 20 Lines • Show All 511 Lines • Show Last 20 Lines
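The immediate-bounds logic in the emitter reduces to arithmetic on the lane size. Switching from a name comparison to `isSubClassOf("IB_EltBit")` lets the bounds record carry a `base` value.

Assume vshlq_n uses base 0 and vshrq_n uses base 1; the arm_mve.td definitions are not part of this excerpt. The Sema ranges then become 0..bits-1 and 1..bits, consistent with the summary's statement that vshrq_n accepts the full lane size.

Here is a hypothetical C++ mirror of the two explicit-range cases (function names are illustrative):

```cpp
#include <cassert>

// Inclusive range of values Sema accepts for an immediate argument.
struct ImmRange {
  int Lo;
  int Hi;
};

// IB_LaneIndex: any lane number of a 128-bit MVE vector.
ImmRange laneIndexRange(int LaneBits) { return {0, 128 / LaneBits - 1}; }

// IB_EltBit: LaneBits consecutive values starting at the record's base.
ImmRange eltBitRange(int Base, int LaneBits) {
  return {Base, Base + LaneBits - 1};
}
```

Under the base-0/base-1 assumption, `eltBitRange(1, 32)` gives 1..32 for a 32-bit vshrq_n. `eltBitRange(0, 32)` gives 0..31 for vshlq_n, matching the `imm0_31` operand of `MVE_VSHL_immi32`.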

llvm/include/llvm/IR/IntrinsicsARM.td

	Show First 20 Lines • Show All 878 Lines • ▼ Show 20 Lines
	// narrows rather than widening, it doesn't have the last one.			// narrows rather than widening, it doesn't have the last one.
	defm int_arm_mve_vldr_gather_offset: MVEPredicated<			defm int_arm_mve_vldr_gather_offset: MVEPredicated<
	[llvm_anyvector_ty], [llvm_anyptr_ty, llvm_anyvector_ty,			[llvm_anyvector_ty], [llvm_anyptr_ty, llvm_anyvector_ty,
	llvm_i32_ty, llvm_i32_ty, llvm_i32_ty], llvm_anyvector_ty, [IntrReadMem]>;			llvm_i32_ty, llvm_i32_ty, llvm_i32_ty], llvm_anyvector_ty, [IntrReadMem]>;
	defm int_arm_mve_vstr_scatter_offset: MVEPredicated<			defm int_arm_mve_vstr_scatter_offset: MVEPredicated<
	[], [llvm_anyptr_ty, llvm_anyvector_ty, llvm_anyvector_ty,			[], [llvm_anyptr_ty, llvm_anyvector_ty, llvm_anyvector_ty,
	llvm_i32_ty, llvm_i32_ty], llvm_anyvector_ty, [IntrWriteMem]>;			llvm_i32_ty, llvm_i32_ty], llvm_anyvector_ty, [IntrWriteMem]>;

				def int_arm_mve_shl_imm_predicated: Intrinsic<[llvm_anyvector_ty],
				[LLVMMatchType<0>, llvm_i32_ty, llvm_anyvector_ty, LLVMMatchType<0>],
				[IntrNoMem]>;
				def int_arm_mve_shr_imm_predicated: Intrinsic<[llvm_anyvector_ty],
				[LLVMMatchType<0>, llvm_i32_ty, llvm_i32_ty, // extra i32 is unsigned flag
				llvm_anyvector_ty, LLVMMatchType<0>],
				[IntrNoMem]>;
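The new `shr.imm.predicated` intrinsic takes its operands in this order:

1. the vector;
2. the immediate;
3. an i32 unsigned flag;
4. the predicate;
5. the inactive vector.

The clang CHECK lines confirm the flag values: `i32 1` for the `_u` variants and `i32 0` for the `_s` variants.

Below is a hypothetical lane-by-lane reference model. The predicate is simplified to one `bool` per lane; a real `mve_pred16_t` has one bit per byte. The immediate is assumed to lie in 1..LaneBits, the range the VSHR instruction encodes.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Models shr.imm.predicated(A, Imm, UnsignedFlag, Pred, Inactive).
template <typename T, std::size_t N>
std::array<T, N> shrImmPredicated(const std::array<T, N> &A, unsigned Imm,
                                  bool UnsignedFlag,
                                  const std::array<bool, N> &Pred,
                                  const std::array<T, N> &Inactive) {
  using U = std::make_unsigned_t<T>;
  constexpr unsigned Bits = sizeof(T) * 8;
  const uint64_t FieldMask = (uint64_t(1) << Bits) - 1;
  std::array<T, N> R{};
  for (std::size_t i = 0; i < N; ++i) {
    if (!Pred[i]) {
      R[i] = Inactive[i]; // predicated-off lane keeps the inactive value
      continue;
    }
    // Work on the raw lane bits in 64 bits so a shift by Bits is defined.
    uint64_t X = static_cast<U>(A[i]);
    uint64_t Res = X >> Imm;
    bool Negative = (X >> (Bits - 1)) & 1;
    if (!UnsignedFlag && Negative) {
      // Arithmetic shift: fill the vacated top Imm bits with sign copies.
      uint64_t LowMask = (uint64_t(1) << (Bits - Imm)) - 1;
      Res |= FieldMask & ~LowMask;
    }
    R[i] = static_cast<T>(static_cast<U>(Res));
  }
  return R;
}
```

Because the lanes are signless in IR, the model takes its shift kind only from `UnsignedFlag`, not from `T`.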

	// MVE scalar shifts.			// MVE scalar shifts.
	class ARM_MVE_qrshift_single<list<LLVMType> value,			class ARM_MVE_qrshift_single<list<LLVMType> value,
	list<LLVMType> saturate = []> :			list<LLVMType> saturate = []> :
	Intrinsic<value, value # [llvm_i32_ty] # saturate, [IntrNoMem]>;			Intrinsic<value, value # [llvm_i32_ty] # saturate, [IntrNoMem]>;
	multiclass ARM_MVE_qrshift<list<LLVMType> saturate = []> {			multiclass ARM_MVE_qrshift<list<LLVMType> saturate = []> {
	// Most of these shifts come in 32- and 64-bit versions. But only			// Most of these shifts come in 32- and 64-bit versions. But only
	// the 64-bit ones have the extra saturation argument (if any).			// the 64-bit ones have the extra saturation argument (if any).
	def "": ARM_MVE_qrshift_single<[llvm_i32_ty]>;			def "": ARM_MVE_qrshift_single<[llvm_i32_ty]>;
	▲ Show 20 Lines • Show All 76 Lines • Show Last 20 Lines

llvm/lib/Target/ARM/ARMInstrMVE.td

	Show First 20 Lines • Show All 2,683 Lines • ▼ Show 20 Lines
	def MVE_VSHL_immi16 : MVE_VSHL_imm<"i16", (ins imm0_15:$imm)> {			def MVE_VSHL_immi16 : MVE_VSHL_imm<"i16", (ins imm0_15:$imm)> {
	let Inst{21-20} = 0b01;			let Inst{21-20} = 0b01;
	}			}

	def MVE_VSHL_immi32 : MVE_VSHL_imm<"i32", (ins imm0_31:$imm)> {			def MVE_VSHL_immi32 : MVE_VSHL_imm<"i32", (ins imm0_31:$imm)> {
	let Inst{21} = 0b1;			let Inst{21} = 0b1;
	}			}

	let Predicates = [HasMVEInt] in {			multiclass MVE_immediate_shift_patterns_inner<
	def : Pat<(v4i32 (ARMvshlImm (v4i32 MQPR:$src), imm0_31:$imm)),			MVEVectorVTInfo VTI, Operand imm_operand_type, SDNode unpred_op,
	(v4i32 (MVE_VSHL_immi32 (v4i32 MQPR:$src), imm0_31:$imm))>;			Intrinsic pred_int, Instruction inst, list<int> unsignedFlag = []> {
	def : Pat<(v8i16 (ARMvshlImm (v8i16 MQPR:$src), imm0_15:$imm)),
	(v8i16 (MVE_VSHL_immi16 (v8i16 MQPR:$src), imm0_15:$imm))>;			def : Pat<(VTI.Vec (unpred_op (VTI.Vec MQPR:$src), imm_operand_type:$imm)),
	def : Pat<(v16i8 (ARMvshlImm (v16i8 MQPR:$src), imm0_7:$imm)),			(VTI.Vec (inst (VTI.Vec MQPR:$src), imm_operand_type:$imm))>;
	(v16i8 (MVE_VSHL_immi8 (v16i8 MQPR:$src), imm0_7:$imm))>;
				def : Pat<(VTI.Vec !con((pred_int (VTI.Vec MQPR:$src), imm_operand_type:$imm),
	def : Pat<(v4i32 (ARMvshruImm (v4i32 MQPR:$src), imm0_31:$imm)),			!dag(pred_int, unsignedFlag, ?),
	(v4i32 (MVE_VSHR_immu32 (v4i32 MQPR:$src), imm0_31:$imm))>;			(pred_int (VTI.Pred VCCR:$mask),
	def : Pat<(v8i16 (ARMvshruImm (v8i16 MQPR:$src), imm0_15:$imm)),			(VTI.Vec MQPR:$inactive)))),
	(v8i16 (MVE_VSHR_immu16 (v8i16 MQPR:$src), imm0_15:$imm))>;			(VTI.Vec (inst (VTI.Vec MQPR:$src), imm_operand_type:$imm,
	def : Pat<(v16i8 (ARMvshruImm (v16i8 MQPR:$src), imm0_7:$imm)),			ARMVCCThen, (VTI.Pred VCCR:$mask),
	(v16i8 (MVE_VSHR_immu8 (v16i8 MQPR:$src), imm0_7:$imm))>;			(VTI.Vec MQPR:$inactive)))>;

	def : Pat<(v4i32 (ARMvshrsImm (v4i32 MQPR:$src), imm0_31:$imm)),
	(v4i32 (MVE_VSHR_imms32 (v4i32 MQPR:$src), imm0_31:$imm))>;
	def : Pat<(v8i16 (ARMvshrsImm (v8i16 MQPR:$src), imm0_15:$imm)),
	(v8i16 (MVE_VSHR_imms16 (v8i16 MQPR:$src), imm0_15:$imm))>;
	def : Pat<(v16i8 (ARMvshrsImm (v16i8 MQPR:$src), imm0_7:$imm)),
	(v16i8 (MVE_VSHR_imms8 (v16i8 MQPR:$src), imm0_7:$imm))>;
	}			}

				multiclass MVE_immediate_shift_patterns<MVEVectorVTInfo VTI,
				Operand imm_operand_type> {
				defm : MVE_immediate_shift_patterns_inner<VTI, imm_operand_type,
				ARMvshlImm, int_arm_mve_shl_imm_predicated,
				!cast<Instruction>("MVE_VSHL_immi" # VTI.BitsSuffix)>;
				defm : MVE_immediate_shift_patterns_inner<VTI, imm_operand_type,
				ARMvshruImm, int_arm_mve_shr_imm_predicated,
				!cast<Instruction>("MVE_VSHR_immu" # VTI.BitsSuffix), [1]>;
				defm : MVE_immediate_shift_patterns_inner<VTI, imm_operand_type,
				ARMvshrsImm, int_arm_mve_shr_imm_predicated,
				!cast<Instruction>("MVE_VSHR_imms" # VTI.BitsSuffix), [0]>;
				}

				defm : MVE_immediate_shift_patterns<MVE_v16i8, imm0_7>;
				defm : MVE_immediate_shift_patterns<MVE_v8i16, imm0_15>;
				defm : MVE_immediate_shift_patterns<MVE_v4i32, imm0_31>;
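The three `defm` lines replace the nine hand-written unpredicated patterns in the left column. Each inner multiclass defines one unpredicated and one predicated pattern, so the expansion yields eighteen. It picks each instruction by pasting `VTI.BitsSuffix` onto a name prefix.

Below is a hypothetical C++ enumeration of the resulting instruction names and unsigned-flag values. The struct and function names are illustrative, and -1 marks the left shift, which takes no flag.

```cpp
#include <cassert>
#include <initializer_list>
#include <string>
#include <vector>

struct ShiftPatternSketch {
  std::string Instruction; // name produced by the !cast string paste
  int UnsignedFlag;        // flag in the predicated pattern, -1 if none
};

std::vector<ShiftPatternSketch> expandShiftPatterns() {
  std::vector<ShiftPatternSketch> Out;
  for (const char *Bits : {"8", "16", "32"}) {
    Out.push_back({std::string("MVE_VSHL_immi") + Bits, -1});
    Out.push_back({std::string("MVE_VSHR_immu") + Bits, 1});
    Out.push_back({std::string("MVE_VSHR_imms") + Bits, 0});
  }
  return Out;
}
```

Each of the nine entries corresponds to two `Pat` records, one matching the generic shift node and one matching the predicated intrinsic.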

// end of mve_shift instructions

// start of MVE Floating Point instructions

class MVE_float<string iname, string suffix, dag oops, dag iops, string ops,
                vpred_ops vpred, string cstr, list<dag> pattern=[]>
  : MVE_f<oops, iops, NoItinerary, iname, suffix, ops, vpred, cstr, pattern> {
  bits<4> Qm;

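A rough scalar model of what the predicated right-shift patterns compute may help when reading the tests that follow. Judging from those tests, `llvm.arm.mve.shr.imm.predicated` takes the vector, the immediate, an extra `i32` that is 1 in the `u` tests and 0 in the `s` tests (matching the `[1]` and `[0]` passed alongside `MVE_VSHR_immu` and `MVE_VSHR_imms` in the multiclass above), the predicate, and an `inactive` vector. The C sketch below covers only 8-bit lanes and assumes that lanes with a false predicate keep the `inactive` value, which is what the `vpst` / `vshrt` sequences writing into the register holding `%inactive` suggest. `shr_imm_predicated_v16i8` is a hypothetical name, not an LLVM or ACLE function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical scalar model of llvm.arm.mve.shr.imm.predicated on
   <16 x i8> (not LLVM code). is_unsigned plays the role of the extra
   i32 operand: 1 models VSHR.U8, 0 models VSHR.S8. */
static void shr_imm_predicated_v16i8(const uint8_t a[16], int imm,
                                     int is_unsigned, const bool pred[16],
                                     const uint8_t inactive[16],
                                     uint8_t out[16]) {
    assert(imm >= 1 && imm <= 8); /* assumed ACLE range for 8-bit lanes */
    for (int i = 0; i < 16; i++) {
        if (!pred[i]) {
            out[i] = inactive[i]; /* disabled lane keeps the inactive value */
        } else if (is_unsigned) {
            out[i] = (uint8_t)(a[i] >> imm); /* logical shift: zeros shifted in */
        } else {
            /* Reinterpret the lane as signed, then shift arithmetically.
               ~(~v >> n) avoids right-shifting a negative int, which C
               leaves implementation-defined. */
            int v = a[i] >= 128 ? (int)a[i] - 256 : (int)a[i];
            int r = v < 0 ? ~(~v >> imm) : v >> imm;
            out[i] = (uint8_t)r;
        }
    }
}
```

The `_x` tests pass `undef` as the `inactive` operand, so under this model the disabled lanes of a `vshrq_x_n` result are simply unspecified.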
llvm/test/CodeGen/Thumb2/mve-intrinsics/vector-shift-imm.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -mtriple=thumbv8.1m.main -mattr=+mve -verify-machineinstrs -o - %s | FileCheck %s

				define arm_aapcs_vfpcc <16 x i8> @test_vshlq_n_s8(<16 x i8> %a) {
				; CHECK-LABEL: test_vshlq_n_s8:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vshl.i8 q0, q0, #5
				; CHECK-NEXT: bx lr
				entry:
				%0 = shl <16 x i8> %a, <i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5>
				ret <16 x i8> %0
				}

				define arm_aapcs_vfpcc <8 x i16> @test_vshlq_n_s16(<8 x i16> %a) {
				; CHECK-LABEL: test_vshlq_n_s16:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vshl.i16 q0, q0, #5
				; CHECK-NEXT: bx lr
				entry:
				%0 = shl <8 x i16> %a, <i16 5, i16 5, i16 5, i16 5, i16 5, i16 5, i16 5, i16 5>
				ret <8 x i16> %0
				}

				define arm_aapcs_vfpcc <4 x i32> @test_vshlq_n_s32(<4 x i32> %a) {
				; CHECK-LABEL: test_vshlq_n_s32:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vshl.i32 q0, q0, #18
				; CHECK-NEXT: bx lr
				entry:
				%0 = shl <4 x i32> %a, <i32 18, i32 18, i32 18, i32 18>
				ret <4 x i32> %0
				}

				define arm_aapcs_vfpcc <16 x i8> @test_vshrq_n_s8(<16 x i8> %a) {
				; CHECK-LABEL: test_vshrq_n_s8:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vshr.s8 q0, q0, #4
				; CHECK-NEXT: bx lr
				entry:
				%0 = ashr <16 x i8> %a, <i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4>
				ret <16 x i8> %0
				}

				define arm_aapcs_vfpcc <8 x i16> @test_vshrq_n_s16(<8 x i16> %a) {
				; CHECK-LABEL: test_vshrq_n_s16:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vshr.s16 q0, q0, #10
				; CHECK-NEXT: bx lr
				entry:
				%0 = ashr <8 x i16> %a, <i16 10, i16 10, i16 10, i16 10, i16 10, i16 10, i16 10, i16 10>
				ret <8 x i16> %0
				}

				define arm_aapcs_vfpcc <4 x i32> @test_vshrq_n_s32(<4 x i32> %a) {
				; CHECK-LABEL: test_vshrq_n_s32:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vshr.s32 q0, q0, #19
				; CHECK-NEXT: bx lr
				entry:
				%0 = ashr <4 x i32> %a, <i32 19, i32 19, i32 19, i32 19>
				ret <4 x i32> %0
				}

				define arm_aapcs_vfpcc <16 x i8> @test_vshrq_n_u8(<16 x i8> %a) {
				; CHECK-LABEL: test_vshrq_n_u8:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vshr.u8 q0, q0, #1
				; CHECK-NEXT: bx lr
				entry:
				%0 = lshr <16 x i8> %a, <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
				ret <16 x i8> %0
				}

				define arm_aapcs_vfpcc <8 x i16> @test_vshrq_n_u16(<8 x i16> %a) {
				; CHECK-LABEL: test_vshrq_n_u16:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vshr.u16 q0, q0, #10
				; CHECK-NEXT: bx lr
				entry:
				%0 = lshr <8 x i16> %a, <i16 10, i16 10, i16 10, i16 10, i16 10, i16 10, i16 10, i16 10>
				ret <8 x i16> %0
				}

				define arm_aapcs_vfpcc <4 x i32> @test_vshrq_n_u32(<4 x i32> %a) {
				; CHECK-LABEL: test_vshrq_n_u32:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vshr.u32 q0, q0, #10
				; CHECK-NEXT: bx lr
				entry:
				%0 = lshr <4 x i32> %a, <i32 10, i32 10, i32 10, i32 10>
				ret <4 x i32> %0
				}

				define arm_aapcs_vfpcc <16 x i8> @test_vshlq_m_n_s8(<16 x i8> %inactive, <16 x i8> %a, i16 zeroext %p) {
				; CHECK-LABEL: test_vshlq_m_n_s8:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r0
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vshlt.i8 q0, q1, #6
				; CHECK-NEXT: bx lr
				entry:
				%0 = zext i16 %p to i32
				%1 = tail call <16 x i1> @llvm.arm.mve.pred.i2v.v16i1(i32 %0)
				%2 = tail call <16 x i8> @llvm.arm.mve.shl.imm.predicated.v16i8.v16i1(<16 x i8> %a, i32 6, <16 x i1> %1, <16 x i8> %inactive)
				ret <16 x i8> %2
				}

				define arm_aapcs_vfpcc <8 x i16> @test_vshlq_m_n_s16(<8 x i16> %inactive, <8 x i16> %a, i16 zeroext %p) {
				; CHECK-LABEL: test_vshlq_m_n_s16:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r0
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vshlt.i16 q0, q1, #13
				; CHECK-NEXT: bx lr
				entry:
				%0 = zext i16 %p to i32
				%1 = tail call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 %0)
				%2 = tail call <8 x i16> @llvm.arm.mve.shl.imm.predicated.v8i16.v8i1(<8 x i16> %a, i32 13, <8 x i1> %1, <8 x i16> %inactive)
				ret <8 x i16> %2
				}

				define arm_aapcs_vfpcc <4 x i32> @test_vshlq_m_n_s32(<4 x i32> %inactive, <4 x i32> %a, i16 zeroext %p) {
				; CHECK-LABEL: test_vshlq_m_n_s32:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r0
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vshlt.i32 q0, q1, #0
				; CHECK-NEXT: bx lr
				entry:
				%0 = zext i16 %p to i32
				%1 = tail call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 %0)
				%2 = tail call <4 x i32> @llvm.arm.mve.shl.imm.predicated.v4i32.v4i1(<4 x i32> %a, i32 0, <4 x i1> %1, <4 x i32> %inactive)
				ret <4 x i32> %2
				}

				define arm_aapcs_vfpcc <16 x i8> @test_vshrq_m_n_s8(<16 x i8> %inactive, <16 x i8> %a, i16 zeroext %p) {
				; CHECK-LABEL: test_vshrq_m_n_s8:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r0
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vshrt.s8 q0, q1, #2
				; CHECK-NEXT: bx lr
				entry:
				%0 = zext i16 %p to i32
				%1 = tail call <16 x i1> @llvm.arm.mve.pred.i2v.v16i1(i32 %0)
				%2 = tail call <16 x i8> @llvm.arm.mve.shr.imm.predicated.v16i8.v16i1(<16 x i8> %a, i32 2, i32 0, <16 x i1> %1, <16 x i8> %inactive)
				ret <16 x i8> %2
				}

				define arm_aapcs_vfpcc <8 x i16> @test_vshrq_m_n_s16(<8 x i16> %inactive, <8 x i16> %a, i16 zeroext %p) {
				; CHECK-LABEL: test_vshrq_m_n_s16:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r0
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vshrt.s16 q0, q1, #3
				; CHECK-NEXT: bx lr
				entry:
				%0 = zext i16 %p to i32
				%1 = tail call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 %0)
				%2 = tail call <8 x i16> @llvm.arm.mve.shr.imm.predicated.v8i16.v8i1(<8 x i16> %a, i32 3, i32 0, <8 x i1> %1, <8 x i16> %inactive)
				ret <8 x i16> %2
				}

				define arm_aapcs_vfpcc <4 x i32> @test_vshrq_m_n_s32(<4 x i32> %inactive, <4 x i32> %a, i16 zeroext %p) {
				; CHECK-LABEL: test_vshrq_m_n_s32:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r0
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vshrt.s32 q0, q1, #13
				; CHECK-NEXT: bx lr
				entry:
				%0 = zext i16 %p to i32
				%1 = tail call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 %0)
				%2 = tail call <4 x i32> @llvm.arm.mve.shr.imm.predicated.v4i32.v4i1(<4 x i32> %a, i32 13, i32 0, <4 x i1> %1, <4 x i32> %inactive)
				ret <4 x i32> %2
				}

				define arm_aapcs_vfpcc <16 x i8> @test_vshrq_m_n_u8(<16 x i8> %inactive, <16 x i8> %a, i16 zeroext %p) {
				; CHECK-LABEL: test_vshrq_m_n_u8:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r0
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vshrt.u8 q0, q1, #4
				; CHECK-NEXT: bx lr
				entry:
				%0 = zext i16 %p to i32
				%1 = tail call <16 x i1> @llvm.arm.mve.pred.i2v.v16i1(i32 %0)
				%2 = tail call <16 x i8> @llvm.arm.mve.shr.imm.predicated.v16i8.v16i1(<16 x i8> %a, i32 4, i32 1, <16 x i1> %1, <16 x i8> %inactive)
				ret <16 x i8> %2
				}

				define arm_aapcs_vfpcc <8 x i16> @test_vshrq_m_n_u16(<8 x i16> %inactive, <8 x i16> %a, i16 zeroext %p) {
				; CHECK-LABEL: test_vshrq_m_n_u16:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r0
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vshrt.u16 q0, q1, #14
				; CHECK-NEXT: bx lr
				entry:
				%0 = zext i16 %p to i32
				%1 = tail call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 %0)
				%2 = tail call <8 x i16> @llvm.arm.mve.shr.imm.predicated.v8i16.v8i1(<8 x i16> %a, i32 14, i32 1, <8 x i1> %1, <8 x i16> %inactive)
				ret <8 x i16> %2
				}

				define arm_aapcs_vfpcc <4 x i32> @test_vshrq_m_n_u32(<4 x i32> %inactive, <4 x i32> %a, i16 zeroext %p) {
				; CHECK-LABEL: test_vshrq_m_n_u32:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r0
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vshrt.u32 q0, q1, #21
				; CHECK-NEXT: bx lr
				entry:
				%0 = zext i16 %p to i32
				%1 = tail call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 %0)
				%2 = tail call <4 x i32> @llvm.arm.mve.shr.imm.predicated.v4i32.v4i1(<4 x i32> %a, i32 21, i32 1, <4 x i1> %1, <4 x i32> %inactive)
				ret <4 x i32> %2
				}

				define arm_aapcs_vfpcc <16 x i8> @test_vshlq_x_n_s8(<16 x i8> %a, i16 zeroext %p) {
				; CHECK-LABEL: test_vshlq_x_n_s8:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r0
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vshlt.i8 q0, q0, #1
				; CHECK-NEXT: bx lr
				entry:
				%0 = zext i16 %p to i32
				%1 = tail call <16 x i1> @llvm.arm.mve.pred.i2v.v16i1(i32 %0)
				%2 = tail call <16 x i8> @llvm.arm.mve.shl.imm.predicated.v16i8.v16i1(<16 x i8> %a, i32 1, <16 x i1> %1, <16 x i8> undef)
				ret <16 x i8> %2
				}

				define arm_aapcs_vfpcc <8 x i16> @test_vshlq_x_n_s16(<8 x i16> %a, i16 zeroext %p) {
				; CHECK-LABEL: test_vshlq_x_n_s16:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r0
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vshlt.i16 q0, q0, #15
				; CHECK-NEXT: bx lr
				entry:
				%0 = zext i16 %p to i32
				%1 = tail call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 %0)
				%2 = tail call <8 x i16> @llvm.arm.mve.shl.imm.predicated.v8i16.v8i1(<8 x i16> %a, i32 15, <8 x i1> %1, <8 x i16> undef)
				ret <8 x i16> %2
				}

				define arm_aapcs_vfpcc <4 x i32> @test_vshlq_x_n_s32(<4 x i32> %a, i16 zeroext %p) {
				; CHECK-LABEL: test_vshlq_x_n_s32:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r0
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vshlt.i32 q0, q0, #13
				; CHECK-NEXT: bx lr
				entry:
				%0 = zext i16 %p to i32
				%1 = tail call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 %0)
				%2 = tail call <4 x i32> @llvm.arm.mve.shl.imm.predicated.v4i32.v4i1(<4 x i32> %a, i32 13, <4 x i1> %1, <4 x i32> undef)
				ret <4 x i32> %2
				}

				define arm_aapcs_vfpcc <16 x i8> @test_vshlq_x_n_u8(<16 x i8> %a, i16 zeroext %p) {
				; CHECK-LABEL: test_vshlq_x_n_u8:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r0
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vshlt.i8 q0, q0, #4
				; CHECK-NEXT: bx lr
				entry:
				%0 = zext i16 %p to i32
				%1 = tail call <16 x i1> @llvm.arm.mve.pred.i2v.v16i1(i32 %0)
				%2 = tail call <16 x i8> @llvm.arm.mve.shl.imm.predicated.v16i8.v16i1(<16 x i8> %a, i32 4, <16 x i1> %1, <16 x i8> undef)
				ret <16 x i8> %2
				}

				define arm_aapcs_vfpcc <8 x i16> @test_vshlq_x_n_u16(<8 x i16> %a, i16 zeroext %p) {
				; CHECK-LABEL: test_vshlq_x_n_u16:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r0
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vshlt.i16 q0, q0, #10
				; CHECK-NEXT: bx lr
				entry:
				%0 = zext i16 %p to i32
				%1 = tail call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 %0)
				%2 = tail call <8 x i16> @llvm.arm.mve.shl.imm.predicated.v8i16.v8i1(<8 x i16> %a, i32 10, <8 x i1> %1, <8 x i16> undef)
				ret <8 x i16> %2
				}

				define arm_aapcs_vfpcc <4 x i32> @test_vshlq_x_n_u32(<4 x i32> %a, i16 zeroext %p) {
				; CHECK-LABEL: test_vshlq_x_n_u32:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r0
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vshlt.i32 q0, q0, #30
				; CHECK-NEXT: bx lr
				entry:
				%0 = zext i16 %p to i32
				%1 = tail call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 %0)
				%2 = tail call <4 x i32> @llvm.arm.mve.shl.imm.predicated.v4i32.v4i1(<4 x i32> %a, i32 30, <4 x i1> %1, <4 x i32> undef)
				ret <4 x i32> %2
				}

				define arm_aapcs_vfpcc <16 x i8> @test_vshrq_x_n_s8(<16 x i8> %a, i16 zeroext %p) {
				; CHECK-LABEL: test_vshrq_x_n_s8:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r0
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vshrt.s8 q0, q0, #4
				; CHECK-NEXT: bx lr
				entry:
				%0 = zext i16 %p to i32
				%1 = tail call <16 x i1> @llvm.arm.mve.pred.i2v.v16i1(i32 %0)
				%2 = tail call <16 x i8> @llvm.arm.mve.shr.imm.predicated.v16i8.v16i1(<16 x i8> %a, i32 4, i32 0, <16 x i1> %1, <16 x i8> undef)
				ret <16 x i8> %2
				}

				define arm_aapcs_vfpcc <8 x i16> @test_vshrq_x_n_s16(<8 x i16> %a, i16 zeroext %p) {
				; CHECK-LABEL: test_vshrq_x_n_s16:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r0
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vshrt.s16 q0, q0, #10
				; CHECK-NEXT: bx lr
				entry:
				%0 = zext i16 %p to i32
				%1 = tail call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 %0)
				%2 = tail call <8 x i16> @llvm.arm.mve.shr.imm.predicated.v8i16.v8i1(<8 x i16> %a, i32 10, i32 0, <8 x i1> %1, <8 x i16> undef)
				ret <8 x i16> %2
				}

				define arm_aapcs_vfpcc <4 x i32> @test_vshrq_x_n_s32(<4 x i32> %a, i16 zeroext %p) {
				; CHECK-LABEL: test_vshrq_x_n_s32:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r0
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vshrt.s32 q0, q0, #7
				; CHECK-NEXT: bx lr
				entry:
				%0 = zext i16 %p to i32
				%1 = tail call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 %0)
				%2 = tail call <4 x i32> @llvm.arm.mve.shr.imm.predicated.v4i32.v4i1(<4 x i32> %a, i32 7, i32 0, <4 x i1> %1, <4 x i32> undef)
				ret <4 x i32> %2
				}

				define arm_aapcs_vfpcc <16 x i8> @test_vshrq_x_n_u8(<16 x i8> %a, i16 zeroext %p) {
				; CHECK-LABEL: test_vshrq_x_n_u8:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r0
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vshrt.u8 q0, q0, #7
				; CHECK-NEXT: bx lr
				entry:
				%0 = zext i16 %p to i32
				%1 = tail call <16 x i1> @llvm.arm.mve.pred.i2v.v16i1(i32 %0)
				%2 = tail call <16 x i8> @llvm.arm.mve.shr.imm.predicated.v16i8.v16i1(<16 x i8> %a, i32 7, i32 1, <16 x i1> %1, <16 x i8> undef)
				ret <16 x i8> %2
				}

				define arm_aapcs_vfpcc <8 x i16> @test_vshrq_x_n_u16(<8 x i16> %a, i16 zeroext %p) {
				; CHECK-LABEL: test_vshrq_x_n_u16:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r0
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vshrt.u16 q0, q0, #7
				; CHECK-NEXT: bx lr
				entry:
				%0 = zext i16 %p to i32
				%1 = tail call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 %0)
				%2 = tail call <8 x i16> @llvm.arm.mve.shr.imm.predicated.v8i16.v8i1(<8 x i16> %a, i32 7, i32 1, <8 x i1> %1, <8 x i16> undef)
				ret <8 x i16> %2
				}

				define arm_aapcs_vfpcc <4 x i32> @test_vshrq_x_n_u32(<4 x i32> %a, i16 zeroext %p) {
				; CHECK-LABEL: test_vshrq_x_n_u32:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r0
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vshrt.u32 q0, q0, #6
				; CHECK-NEXT: bx lr
				entry:
				%0 = zext i16 %p to i32
				%1 = tail call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 %0)
				%2 = tail call <4 x i32> @llvm.arm.mve.shr.imm.predicated.v4i32.v4i1(<4 x i32> %a, i32 6, i32 1, <4 x i1> %1, <4 x i32> undef)
				ret <4 x i32> %2
				}

				declare <16 x i1> @llvm.arm.mve.pred.i2v.v16i1(i32)
				declare <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32)
				declare <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32)

				declare <16 x i8> @llvm.arm.mve.shl.imm.predicated.v16i8.v16i1(<16 x i8>, i32, <16 x i1>, <16 x i8>)
				declare <8 x i16> @llvm.arm.mve.shl.imm.predicated.v8i16.v8i1(<8 x i16>, i32, <8 x i1>, <8 x i16>)
				declare <4 x i32> @llvm.arm.mve.shl.imm.predicated.v4i32.v4i1(<4 x i32>, i32, <4 x i1>, <4 x i32>)

				declare <16 x i8> @llvm.arm.mve.shr.imm.predicated.v16i8.v16i1(<16 x i8>, i32, i32, <16 x i1>, <16 x i8>)
				declare <8 x i16> @llvm.arm.mve.shr.imm.predicated.v8i16.v8i1(<8 x i16>, i32, i32, <8 x i1>, <8 x i16>)
				declare <4 x i32> @llvm.arm.mve.shr.imm.predicated.v4i32.v4i1(<4 x i32>, i32, i32, <4 x i1>, <4 x i32>)
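All of the predicated tests receive the mask as `i16 zeroext %p`, move it into `p0` with `vmsr`, and on the IR side turn it into a `<16 x i1>`, `<8 x i1>` or `<4 x i1>` with `llvm.arm.mve.pred.i2v`. A minimal C sketch of that lane mapping follows, under the assumption that the 16-bit value holds one bit per byte of the 128-bit vector and that a lane of `lane_bytes` bytes is read from the lowest of its bits. `lane_enabled` is a hypothetical helper, not part of LLVM or ACLE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: is lane `lane` enabled by the 16-bit predicate `p`
   when each lane is `lane_bytes` bytes wide? lane_bytes is 1 for
   <16 x i1>, 2 for <8 x i1> and 4 for <4 x i1>. Assumes each lane is
   governed by the lowest predicate bit of the bytes it occupies. */
static bool lane_enabled(uint16_t p, int lane, int lane_bytes) {
    assert(lane_bytes == 1 || lane_bytes == 2 || lane_bytes == 4);
    assert(lane >= 0 && lane * lane_bytes < 16);
    return ((unsigned)p >> (lane * lane_bytes)) & 1u;
}
```

Under this model, `%p = 0x00FF` enables lanes 0 to 7 of a `<16 x i8>` operation, lanes 0 to 3 of a `<8 x i16>` one, and only lanes 0 and 1 of a `<4 x i32>` one.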

Diff 232862

clang/include/clang/Basic/arm_mve.td

clang/include/clang/Basic/arm_mve_defs.td

clang/lib/CodeGen/CGBuiltin.cpp

clang/test/CodeGen/arm-mve-intrinsics/vector-shift-imm.c

clang/utils/TableGen/MveEmitter.cpp

llvm/include/llvm/IR/IntrinsicsARM.td

llvm/lib/Target/ARM/ARMInstrMVE.td

llvm/test/CodeGen/Thumb2/mve-intrinsics/vector-shift-imm.ll
