This is an archive of the discontinued LLVM Phabricator instance.


Differential D67158

[ARM] Begin adding IR intrinsics for MVE instructions.
Closed, Public

Authored by simon_tatham on Sep 4 2019, 5:47 AM.


Details

Reviewers

dmgreen
miyuki
ostannard

Commits

rG1b45297e013e: [ARM] Begin adding IR intrinsics for MVE instructions.

Summary

This commit, together with the next few, will add a representative
sample of the kind of IR intrinsics that we'll need in order to
implement the user-facing ACLE intrinsics for MVE. Supporting all of
them will take more work; the intention of this initial series of
commits is to implement an intrinsic or two from lots of different
categories, as examples and proofs of concept.

This initial commit introduces a small number of IR intrinsics for
instructions simple enough that they can use Tablegen ISel patterns:
the predicated versions of the VADD and VSUB instructions (both
integer and FP), VMIN and VMAX, and the float->half VCVT instruction
(predicated and unpredicated).
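As a rough illustration of what "predicated" means for these intrinsics, here is a minimal C++ sketch of a lane-wise predicated add. It is a semantic model only, not LLVM code: the function name addPredicatedModel is hypothetical, and the operand order (two inputs, a mask, then an "inactive" vector) is taken from the add_predicated intrinsic signature and ISel pattern later in this revision. The assumption is that lanes with a clear mask bit take their value from the inactive operand.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of a predicated lane-wise add; not part of LLVM.
// Lanes whose mask entry is true receive a[i] + b[i]; all other lanes
// are copied from the corresponding lane of `inactive`.
template <typename T, std::size_t N>
std::array<T, N> addPredicatedModel(const std::array<T, N> &a,
                                    const std::array<T, N> &b,
                                    const std::array<bool, N> &mask,
                                    const std::array<T, N> &inactive) {
  std::array<T, N> result{};
  for (std::size_t i = 0; i < N; ++i)
    result[i] = mask[i] ? static_cast<T>(a[i] + b[i]) : inactive[i];
  return result;
}
```

Unsigned element types are the safest choice when experimenting with this model, since they wrap on overflow instead of invoking undefined behaviour.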

When using VPT-predicated instructions in automatic code generation,
it will be convenient to specify the predicate value as a vector of
the appropriate number of i1. To make it easy to specify all sizes of
an instruction in one go and give each one the matching predicate
vector type, I've added a system of Tablegen informational records
describing MVE's vector types: each one gives the underlying LLVM IR
ValueType (which may not be the same if the MVE vector is of
explicitly signed or unsigned integers) and an appropriate vNi1 to use
as the predicate vector.

(Also, those info records include the usual encoding for the types, so
that as we add associations between each instruction encoding and one
of the new MVEVectorVTInfo records, we can remove some of the
existing template parameters and replace them with references to the
vector type info's fields.)
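The relationship between the fields of these records can be sketched in C++. This is only a model of the arithmetic described in the record comments later in this revision (a 2-bit size field V standing for an (8<<V)-bit element, and v4i1 used as the predicate type for 2-lane vectors); the helper names are hypothetical, and the 128-bit vector width is stated here as an assumption about MVE rather than something this page spells out.

```cpp
#include <cassert>

// Hypothetical helpers mirroring the MVEVectorVTInfo fields; the real
// records are TableGen, not C++.

// Element width implied by the 2-bit size field V: (8 << V) bits.
constexpr unsigned elementBits(unsigned sizeField) { return 8u << sizeField; }

// Lane count, assuming a 128-bit MVE vector register.
constexpr unsigned laneCount(unsigned sizeField) {
  return 128u / elementBits(sizeField);
}

// Lane count of the matching predicate vector: vNi1 for N lanes, except
// that 2-lane vectors use v4i1 instead of v2i1.
constexpr unsigned predicateLanes(unsigned lanes) {
  return lanes == 2 ? 4u : lanes;
}
```

For example, a size field of 0b10 gives 32-bit elements, 4 lanes and a v4i1 predicate, while 0b11 gives 64-bit elements, 2 lanes and (by the stated exception) also a v4i1 predicate.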

The user-facing ACLE intrinsics will receive a predicate mask as a
16-bit integer, so I've also provided a pair of intrinsics i2v and
v2i, to convert between an integer and a vector of i1 by just changing
the register class.
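To make the integer/vector correspondence concrete, here is a small C++ model of the 16-lane case, where each bit of the 16-bit mask lines up with one i1 lane. The names predI2VModel and predV2IModel are hypothetical, and the bit-i-to-lane-i mapping is an assumption made for illustration. Wider element types share the same 16 bits among fewer lanes, which this sketch does not attempt to model.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model of i2v for a 16-lane predicate: bit i of the mask
// becomes lane i of the result. Not LLVM code.
std::array<bool, 16> predI2VModel(uint16_t mask) {
  std::array<bool, 16> lanes{};
  for (unsigned i = 0; i < 16; ++i)
    lanes[i] = ((mask >> i) & 1u) != 0;
  return lanes;
}

// Hypothetical model of v2i: the inverse packing of lanes into bits.
uint16_t predV2IModel(const std::array<bool, 16> &lanes) {
  uint16_t mask = 0;
  for (unsigned i = 0; i < 16; ++i)
    if (lanes[i])
      mask = static_cast<uint16_t>(mask | (1u << i));
  return mask;
}
```

In this model the two functions are exact inverses, which matches the description of the real intrinsics as a change of register class and not a computation.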

Diff Detail

Repository

rG LLVM Github Monorepo

Build Status

Buildable 39988
Build 40060: arc lint + arc unit

Event Timeline

simon_tatham created this revision. Sep 4 2019, 5:47 AM

Herald added a project: Restricted Project. Sep 4 2019, 5:47 AM

Herald added subscribers: llvm-commits, hiraditya, kristof.beyls, javed.absar.

Harbormaster completed remote builds in B37711: Diff 218662. Sep 4 2019, 5:47 AM

simon_tatham mentioned this in D67161: [clang,ARM] Initial ACLE intrinsics for MVE. Sep 4 2019, 5:48 AM

dmgreen added inline comments. Sep 9 2019, 2:49 AM

llvm/include/llvm/IR/IntrinsicsARM.td
810	A vminv is very close to a llvm.experimental.vector.reduce.umin/llvm.experimental.vector.reduce.smin, although that doesn't contain the first parameter. (I'm just mentioning this for general interest mostly, I don't think we should be using the thing yet. It may be more interesting in the future if llvm started optimising the vector.reduce better, but for the moment I think that the arm intrinsic is a good idea).
820	Any reason this is called fltnarrow? As opposed to something closer to the name of the instruction?
llvm/lib/Target/ARM/ARMISelDAGToDAG.cpp
212 ↗	(On Diff #218662)	Maybe call this something like "AddMVEPredicateToOps"?
llvm/lib/Target/ARM/ARMInstrMVE.td
284	The Intel backend has a different way of doing this in the X86InstrAVX512.td file. It defines a set of "X86VectorVTInfo" classes, one for each type, that can contain extra information about that vector type. That way we could store the predicate type with the other info, and instead of adding the ValueType to the MVE_VADDSUB, for example, we could add this class data. (And potentially add extra stuff like signed_suffix and unsigned_suffix, etc.) I'm not sure if that method is better or worse than this, either in terms of tblgen efficiency or in how simple it is to read.
746	Are the instructions correct here? Should they be s8/s16/s32?
1591	I feel like this deserves at least a little indentation. Maybe not for each level of for, but something to make it easier to parse. The last 2 levels are really just defining variables, right? And could be written in-place.
5534	There is an isel node called predicate_cast that does the same thing as this. It may be possible/beneficial to convert this earlier (during lowering, but I'm not sure that would do anything yet). The patterns are further up, near a lot of the vcmp patterns.
llvm/test/CodeGen/Thumb2/mve-intrinsics/vadc.ll
2 ↗	(On Diff #218662)	-verify-machineinstrs is probably worth putting on most of these tests.
22 ↗	(On Diff #218662)	The #1 can be removed?
llvm/test/CodeGen/Thumb2/mve-intrinsics/vcvt.ll
11	Can you add a test for 0 too?
llvm/test/CodeGen/Thumb2/mve-intrinsics/vminvq.ll
5	Test other VT's too, please.
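The "first parameter" dmgreen mentions in the vminv comment above can be illustrated with a small C++ model. The intrinsic takes a previous scalar as well as the vector, so a plain reduction corresponds to seeding that scalar with the identity value for the operation; the vecreduce_smax patterns in ARMInstrMVE.td seed VMAXV in the same way. The names minvModel and reduceSMinModel are hypothetical, and the model ignores how the hardware treats the scalar's bits above the element width.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical model of a signed vminv on 8-bit lanes: the minimum of
// the incoming scalar and every vector element. Not LLVM code.
template <std::size_t N>
int32_t minvModel(int32_t prev, const std::array<int8_t, N> &vec) {
  int32_t result = prev;
  for (int8_t elem : vec)
    result = std::min<int32_t>(result, elem);
  return result;
}

// A plain signed-min reduction expressed through the same model, by
// seeding the scalar with the largest value an element can hold.
template <std::size_t N>
int32_t reduceSMinModel(const std::array<int8_t, N> &vec) {
  return minvModel(std::numeric_limits<int8_t>::max(), vec);
}
```

The scalar operand also lets a caller chain several vectors: the result of one call can be fed in as prev for the next.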

simon_tatham marked 8 inline comments as done. Sep 11 2019, 4:53 AM

simon_tatham added inline comments.

llvm/include/llvm/IR/IntrinsicsARM.td
820	Not any particularly good reason. I had the vague idea of naming things after their functionality as long as it didn't turn the name into a giant essay (there's probably no hope for `vrmlaldavhax`), on the vague principle that that might make them easier to find if anyone later goes looking for platform-specific intrinsics that could be pulled out into standard IR representations. But it's not something I'm strongly committed to, just a harebrained idea I thought I'd throw out there to see if anyone had opinions about it.

Addressed many review comments.

Harbormaster completed remote builds in B38005: Diff 219690. Sep 11 2019, 4:54 AM

dmgreen added inline comments. Sep 23 2019, 12:12 AM

llvm/include/llvm/IR/IntrinsicsARM.td
820	I would say that in this case, if it is going from something called vcvt to something called vcvt then similar name would be less confusing (presuming that you can search for arm_mve_vcvt to distinguish the intrinsic from the instruction from the builtin).

Rebased, and renamed the fltnarrow intrinsic as suggested.

Harbormaster completed remote builds in B38997: Diff 223225. Oct 4 2019, 8:29 AM

This is a bit large to review in a single patch, and I don't think all the parts are necessarily interrelated. Mind pulling a few logically separable parts out into separate patches, to make what's left simpler?

dmgreen mentioned this in D68566: [ARM] VQADD instructions. Oct 7 2019, 7:24 AM

Split this patch into three as requested. This one now contains only
the subset of the previous IR intrinsics that can be implemented by
Tablegen patterns. The ones using C++ are moved out to two followup
patches.

simon_tatham retitled this revision from "[ARM] Add IR intrinsics for a sample of MVE instructions." to "[ARM] Begin adding IR intrinsics for MVE instructions." Oct 9 2019, 6:00 AM

simon_tatham edited the summary of this revision.

Harbormaster completed remote builds in B39223: Diff 224027. Oct 9 2019, 6:03 AM

simon_tatham added a child revision: D68699: [ARM] Add some sample IR MVE intrinsics with C++ isel. Oct 9 2019, 6:07 AM

Thanks for splitting this up.

llvm/lib/Target/ARM/ARMInstrMVE.td
729	I feel like we should come up with a style and try to stick with it. The adds/subs below add in VT to the existing instructions and use it in the new Patterns, mixed in with the old patterns. These ones add the intrinsics to the multiclass so the top level can include the pattern (but also have patterns outside (below) too). I think there's value in making this somewhat structured, if we can.
2867	Little bit of indenting, please.
llvm/test/CodeGen/Thumb2/mve-intrinsics/vaddq.ll
5	We probably don't _need_ tests for simple instructions like this, they should be covered elsewhere (fine to leave them if you wish).
25	For the rest of the tests, at least for codegen we have tried to fill in all the combinations for type and operations (at least the legal types). It can be useful for making sure nothing is missed (here or in the future when some refactoring happens). Whether you want to do the same thing here is up to you, or whether you think that having interesting combinations is enough (adds with v16i8, subs with v4f32 for example).
53	Do we care about what happens when there is a fp intrinsic but we don't have mve.fp? I presume this will be a fail to select or some sort of legalisation error, which is probably fine considering what is happening.

Rewritten this patch to do all its jobs in the same reasonably
consistent way.

Also, I've replaced my mkpred function-oid with a system of
MVEVectorVTInfo, similar to the x86 system that @dmgreen pointed
out. I think it works out more nicely, for two reasons. Firstly, I can
make separate classes for vectors of signed and unsigned integers,
which MVE distinguishes even though LLVM IR doesn't. Secondly, I can
also have those info records contain bits and pieces that can be used
in the main instruction definition: the type suffix on the mnemonic
(.s32 or .f16 or whatever), and at least the _usual_ way that
vector types are represented in MVE instruction encodings. So now
every time I have to add an MVEVectorVTInfo as an extra template
parameter, I'll be able to remove a few existing ones that it makes
redundant, so the declarations shouldn't get too complicated.

Harbormaster completed remote builds in B39576: Diff 225038. Oct 15 2019, 7:50 AM

Oh, nearly forgot: I renamed the vcvt intrinsic again, to put "narrow" back into the name (it's now vcvt_narrow).

Rationale: the MVE VCVT instructions can't all be treated exactly the same way by their IR intrinsics, because conversions to a narrower element type need an extra input parameter giving the previous value of their output register, which they only overwrite half of. Non-narrowing VCVTs will have a simpler type signature without that parameter.
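The extra input parameter can be pictured with a short C++ model of a narrowing conversion from four 32-bit lanes into an eight-lane register. This is a sketch under stated assumptions, not LLVM code: the name vcvtNarrowModel is hypothetical, the selector is assumed to choose between the bottom (even) and top (odd) half of each 32-bit lane, and the half-precision lanes are stored as float for simplicity, so the loss of precision is not modelled.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical model of a narrowing VCVT: four source lanes are written
// into alternate lanes of an eight-lane result, and the remaining four
// lanes keep the values they had in `prev`. Not LLVM code.
std::array<float, 8> vcvtNarrowModel(const std::array<float, 8> &prev,
                                     const std::array<float, 4> &src,
                                     bool top) {
  std::array<float, 8> result = prev;  // start from the old register value
  for (std::size_t i = 0; i < 4; ++i)
    result[2 * i + (top ? 1 : 0)] = src[i];  // overwrite only half the lanes
  return result;
}
```

Because only half of the output lanes are written, the result depends on prev, which is why the narrowing intrinsic needs a parameter that a non-narrowing conversion would not.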

dmgreen added inline comments. Oct 23 2019, 2:10 AM

llvm/lib/Target/ARM/ARMInstrMVE.td
312	Maybe call these MVEv16i8? Or v16i8_t or v16i8info or something? Otherwise it looks very useful.
llvm/test/CodeGen/Thumb2/mve-intrinsics/vaddq.ll
25	This could probably do with a reply, one way or the other. I was previously in the "don't mind either way" camp, now I feel more in the "why not just add them" camp, unless there is some reason not to.

simon_tatham marked 2 inline comments as done. Oct 23 2019, 6:46 AM

simon_tatham added inline comments.

llvm/lib/Target/ARM/ARMInstrMVE.td
312	I copied the `v16i8_info` naming system from the similar x86 case you pointed out to me. But I can call them something with "MVE" in the name if you prefer, sure.
llvm/test/CodeGen/Thumb2/mve-intrinsics/vaddq.ll
25	I had intended to stop at "interesting subset of combinations", along the lines of one of each operation, and one of each type, but not the full cross product unless absolutely necessary. (There are about 2000 of these to come in future work, so at some point adding a test for every single one won't deserve the word "just" any more!)

dmgreen added inline comments. Oct 23 2019, 8:31 AM

llvm/lib/Target/ARM/ARMInstrMVE.td
312	Ah righteo. The underscore threw me off. Whatever you think is best. MVE might not be the worst idea, just in case someone decides to do something similar in NEON.
llvm/test/CodeGen/Thumb2/mve-intrinsics/vaddq.ll
25	The add int and add float are two different instructions, same for sub, so it's probably best to make sure we have at least one of each of those combinations. I was looking for prior art of only adding subsets of the combinations for codegen, and I don't immediately see anywhere where we have done that. I'm not too worried about this set of tests not catching problems now. But in the future, as things are refactored (in ways that might be difficult to predict), it might be expected that the test coverage is more complete, and that could lead to things getting missed. I'd suggest we at least fill in the combinations for add and sub. Later on, when we get to vrmlsldavhax, it probably won't be as important.

Renamed the info classes from vXXyY_info to MVE_vXXyY, and added
some more tests of addition. Also rebased to current master.

Harbormaster completed remote builds in B39988: Diff 226235. Oct 24 2019, 5:52 AM

Thanks! LGTM, with just one comment that might be better left for Sam.

llvm/lib/Target/ARM/ARMInstrMVE.td
2841	Can this move into MVE_VADDSUBFMA_fp? I'm pretty sure VFMA should be fine there too. Feel free to leave that for @samparker though.

This revision is now accepted and ready to land. Oct 24 2019, 7:11 AM

Closed by commit rG1b45297e013e: [ARM] Begin adding IR intrinsics for MVE instructions. (authored by simon_tatham). Oct 24 2019, 8:35 AM

This revision was automatically updated to reflect the committed changes.

simon_tatham mentioned this in rG08074cc96557: [clang,ARM] Initial ACLE intrinsics for MVE.

simon_tatham mentioned this in D76490: [ARM,MVE] Add ACLE intrinsics for the vminv/vmaxv family. Mar 20 2020, 5:22 AM

simon_tatham mentioned this in rG45a9945b9ea9: [ARM,MVE] Add ACLE intrinsics for the vminv/vmaxv family. Mar 20 2020, 9:11 AM


Herald added a project: Restricted Project. Feb 24 2023, 9:50 AM

Revision Contents

Path                                                  Size
llvm/include/llvm/IR/IntrinsicsARM.td                 30 lines
llvm/lib/Target/ARM/ARMISelLowering.cpp               4 lines
llvm/lib/Target/ARM/ARMInstrMVE.td                    212 lines
llvm/test/CodeGen/Thumb2/mve-intrinsics/vaddq.ll      112 lines
llvm/test/CodeGen/Thumb2/mve-intrinsics/vcvt.ll       56 lines
llvm/test/CodeGen/Thumb2/mve-intrinsics/vminvq.ll     36 lines

Diff 226235

llvm/include/llvm/IR/IntrinsicsARM.td

@@ 781 lines not shown @@
 def int_arm_vctp32 : Intrinsic<[llvm_v4i1_ty], [llvm_i32_ty], [IntrNoMem]>;
 def int_arm_vctp64 : Intrinsic<[llvm_v2i1_ty], [llvm_i32_ty], [IntrNoMem]>;
 
 // GNU eabi mcount
 def int_arm_gnu_eabi_mcount : Intrinsic<[],
                                         [],
                                         [IntrReadMem, IntrWriteMem]>;
 
+def int_arm_mve_pred_i2v : Intrinsic<
+  [llvm_anyvector_ty], [llvm_i32_ty], [IntrNoMem]>;
+def int_arm_mve_pred_v2i : Intrinsic<
+  [llvm_i32_ty], [llvm_anyvector_ty], [IntrNoMem]>;
+
+multiclass IntrinsicSignSuffix<list<LLVMType> rets, list<LLVMType> params = [],
+                               list<IntrinsicProperty> props = [],
+                               string name = "",
+                               list<SDNodeProperty> sdprops = []> {
+  def _s: Intrinsic<rets, params, props, name, sdprops>;
+  def _u: Intrinsic<rets, params, props, name, sdprops>;
+}
+
+def int_arm_mve_add_predicated: Intrinsic<[llvm_anyvector_ty],
+  [LLVMMatchType<0>, LLVMMatchType<0>, llvm_anyvector_ty, LLVMMatchType<0>],
+  [IntrNoMem]>;
+def int_arm_mve_sub_predicated: Intrinsic<[llvm_anyvector_ty],
+  [LLVMMatchType<0>, LLVMMatchType<0>, llvm_anyvector_ty, LLVMMatchType<0>],
+  [IntrNoMem]>;
+
+defm int_arm_mve_minv: IntrinsicSignSuffix<[llvm_i32_ty],
+  [llvm_i32_ty, llvm_anyvector_ty], [IntrNoMem]>;
+defm int_arm_mve_maxv: IntrinsicSignSuffix<[llvm_i32_ty],
+  [llvm_i32_ty, llvm_anyvector_ty], [IntrNoMem]>;
+
+def int_arm_mve_vcvt_narrow: Intrinsic<[llvm_v8f16_ty],
+  [llvm_v8f16_ty, llvm_v4f32_ty, llvm_i32_ty], [IntrNoMem]>;
+def int_arm_mve_vcvt_narrow_predicated: Intrinsic<[llvm_v8f16_ty],
+  [llvm_v8f16_ty, llvm_v4f32_ty, llvm_i32_ty, llvm_v4i1_ty], [IntrNoMem]>;
+
 } // end TargetPrefix

llvm/lib/Target/ARM/ARMISelLowering.cpp

@@ 3,692 lines not shown @@ return DAG.getNode(NewOpc, SDLoc(Op), Op.getValueType(),
                        Op.getOperand(1), Op.getOperand(2));
   }
   case Intrinsic::arm_neon_vtbl1:
     return DAG.getNode(ARMISD::VTBL1, SDLoc(Op), Op.getValueType(),
                        Op.getOperand(1), Op.getOperand(2));
   case Intrinsic::arm_neon_vtbl2:
     return DAG.getNode(ARMISD::VTBL2, SDLoc(Op), Op.getValueType(),
                        Op.getOperand(1), Op.getOperand(2), Op.getOperand(3));
+  case Intrinsic::arm_mve_pred_i2v:
+  case Intrinsic::arm_mve_pred_v2i:
+    return DAG.getNode(ARMISD::PREDICATE_CAST, SDLoc(Op), Op.getValueType(),
+                       Op.getOperand(1));
   }
 }
 
 static SDValue LowerATOMIC_FENCE(SDValue Op, SelectionDAG &DAG,
                                  const ARMSubtarget *Subtarget) {
   SDLoc dl(Op);
   ConstantSDNode *SSIDNode = cast<ConstantSDNode>(Op.getOperand(2));
   auto SSID = static_cast<SyncScope::ID>(SSIDNode->getZExtValue());
@@ 13,381 lines not shown @@

llvm/lib/Target/ARM/ARMInstrMVE.td

@@ 269 lines not shown @@ class mve_addr_q_shift<int shift> : MemOperand {
   // Can be printed same way as other reg + imm operands
   let PrintMethod = "printT2AddrModeImm8Operand<false>";
   let ParserMatchClass =
     !cast<AsmOperandClass>("MemRegQS"#shift#"OffsetAsmOperand");
   let DecoderMethod = "DecodeMveAddrModeQ<"#shift#">";
   let MIOperandInfo = (ops MQPR:$base, i32imm:$imm);
 }
 
+// A family of classes wrapping up information about the vector types
+// used by MVE.
+class MVEVectorVTInfo<ValueType vec, ValueType pred, bits<2> size,
+                      string suffix, bit unsigned> {
+  // The LLVM ValueType representing the vector, so we can use it in
+  // ISel patterns.
+  ValueType Vec = vec;
+
+  // An LLVM ValueType representing a corresponding vector of
+  // predicate bits, for use in ISel patterns that handle an IR
+  // intrinsic describing the predicated form of the instruction.
+  //
+  // Usually, for a vector of N things, this will be vNi1. But for
+  // vectors of 2 values, we make an exception, and use v4i1 instead
+  // of v2i1. Rationale: MVE codegen doesn't support doing all the
+  // auxiliary operations on v2i1 (vector shuffles etc), and also,
+  // there's no MVE compare instruction that will _generate_ v2i1
+  // directly.
+  ValueType Pred = pred;
+
+  // The most common representation of the vector element size in MVE
+  // instruction encodings: a 2-bit value V representing an (8<<V)-bit
+  // vector element.
+  bits<2> Size = size;
+
+  // For vectors explicitly mentioning a signedness of integers: 0 for
+  // signed and 1 for unsigned. For anything else, undefined.
+  bit Unsigned = unsigned;
+
+  // The suffix used on the instruction in assembly language.
+  string Suffix = suffix;
+}
+
+// Integer vector types that don't treat signed and unsigned differently.
+def MVE_v16i8 : MVEVectorVTInfo<v16i8, v16i1, 0b00, "i8", ?>;
+def MVE_v8i16 : MVEVectorVTInfo<v8i16, v8i1, 0b01, "i16", ?>;
+def MVE_v4i32 : MVEVectorVTInfo<v4i32, v4i1, 0b10, "i32", ?>;
+def MVE_v2i64 : MVEVectorVTInfo<v2i64, v4i1, 0b11, "i64", ?>;
+
+// Explicitly signed and unsigned integer vectors. They map to the
+// same set of LLVM ValueTypes as above, but are represented
+// differently in assembly and instruction encodings.
+def MVE_v16s8 : MVEVectorVTInfo<v16i8, v16i1, 0b00, "s8", 0b0>;
+def MVE_v8s16 : MVEVectorVTInfo<v8i16, v8i1, 0b01, "s16", 0b0>;
+def MVE_v4s32 : MVEVectorVTInfo<v4i32, v4i1, 0b10, "s32", 0b0>;
+def MVE_v2s64 : MVEVectorVTInfo<v2i64, v4i1, 0b11, "s64", 0b0>;
+def MVE_v16u8 : MVEVectorVTInfo<v16i8, v16i1, 0b00, "u8", 0b1>;
+def MVE_v8u16 : MVEVectorVTInfo<v8i16, v8i1, 0b01, "u16", 0b1>;
+def MVE_v4u32 : MVEVectorVTInfo<v4i32, v4i1, 0b10, "u32", 0b1>;
+def MVE_v2u64 : MVEVectorVTInfo<v2i64, v4i1, 0b11, "u64", 0b1>;
+
+// FP vector types.
+def MVE_v8f16 : MVEVectorVTInfo<v8f16, v8i1, 0b01, "f16", ?>;
+def MVE_v4f32 : MVEVectorVTInfo<v4f32, v4i1, 0b10, "f32", ?>;
+def MVE_v2f64 : MVEVectorVTInfo<v2f64, v4i1, 0b11, "f64", ?>;
+
 // --------- Start of base classes for the instructions themselves
 
 class MVE_MI<dag oops, dag iops, InstrItinClass itin, string asm,
              string ops, string cstr, list<dag> pattern>
   : Thumb2XI<oops, iops, AddrModeNone, 4, itin, !strconcat(asm, "\t", ops), cstr,
              pattern>,
     Requires<[HasMVEInt]> {
   let D = MVEDomain;
@@ 367 lines not shown @@ class MVE_VMINMAXV<string iname, string suffix, bit U, bits<2> size,
   let Inst{15-12} = RdaDest{3-0};
   let Inst{8} = 0b1;
   let Inst{7} = bit_7;
   let Inst{6-5} = 0b00;
   let Inst{3-1} = Qm{2-0};
   let Inst{0} = 0b0;
 }
 
-multiclass MVE_VMINMAXV_ty<string iname, bit bit_7, list<dag> pattern=[]> {
-  def s8 : MVE_VMINMAXV<iname, "s8", 0b0, 0b00, 0b1, bit_7>;
-  def s16 : MVE_VMINMAXV<iname, "s16", 0b0, 0b01, 0b1, bit_7>;
-  def s32 : MVE_VMINMAXV<iname, "s32", 0b0, 0b10, 0b1, bit_7>;
-  def u8 : MVE_VMINMAXV<iname, "u8", 0b1, 0b00, 0b1, bit_7>;
-  def u16 : MVE_VMINMAXV<iname, "u16", 0b1, 0b01, 0b1, bit_7>;
-  def u32 : MVE_VMINMAXV<iname, "u32", 0b1, 0b10, 0b1, bit_7>;
-}
-
-defm MVE_VMINV : MVE_VMINMAXV_ty<"vminv", 0b1>;
-defm MVE_VMAXV : MVE_VMINMAXV_ty<"vmaxv", 0b0>;
+multiclass MVE_VMINMAXV_p<string iname, bit bit_17, bit bit_7,
+                          MVEVectorVTInfo VTI, Intrinsic intr> {
+  def "": MVE_VMINMAXV<iname, VTI.Suffix, VTI.Unsigned, VTI.Size,
+                       bit_17, bit_7>;
+
+  let Predicates = [HasMVEInt] in
+  def _pat : Pat<(i32 (intr (i32 rGPR:$prev), (VTI.Vec MQPR:$vec))),
+                 (i32 (!cast<Instruction>(NAME)
+                       (i32 rGPR:$prev), (VTI.Vec MQPR:$vec)))>;
+}
+
+multiclass MVE_VMINMAXV_ty<string iname, bit bit_7,
+                           Intrinsic intr_s, Intrinsic intr_u> {
+  defm s8 : MVE_VMINMAXV_p<iname, 1, bit_7, MVE_v16s8, intr_s>;
+  defm s16: MVE_VMINMAXV_p<iname, 1, bit_7, MVE_v8s16, intr_s>;
+  defm s32: MVE_VMINMAXV_p<iname, 1, bit_7, MVE_v4s32, intr_s>;
+  defm u8 : MVE_VMINMAXV_p<iname, 1, bit_7, MVE_v16u8, intr_u>;
+  defm u16: MVE_VMINMAXV_p<iname, 1, bit_7, MVE_v8u16, intr_u>;
+  defm u32: MVE_VMINMAXV_p<iname, 1, bit_7, MVE_v4u32, intr_u>;
+}
+
+defm MVE_VMINV : MVE_VMINMAXV_ty<
+  "vminv", 0b1, int_arm_mve_minv_s, int_arm_mve_minv_u>;
+defm MVE_VMAXV : MVE_VMINMAXV_ty<
+  "vmaxv", 0b0, int_arm_mve_maxv_s, int_arm_mve_maxv_u>;
 
 let Predicates = [HasMVEInt] in {
   def : Pat<(i32 (vecreduce_smax (v16i8 MQPR:$src))),
             (i32 (MVE_VMAXVs8 (t2MVNi (i32 127)), $src))>;
   def : Pat<(i32 (vecreduce_smax (v8i16 MQPR:$src))),
             (i32 (MVE_VMAXVs16 (t2MOVi32imm (i32 -32768)), $src))>;
   def : Pat<(i32 (vecreduce_smax (v4i32 MQPR:$src))),
             (i32 (MVE_VMAXVs32 (t2MOVi (i32 -2147483648)), $src))>;
   def : Pat<(i32 (vecreduce_umax (v16i8 MQPR:$src))),
             (i32 (MVE_VMAXVu8 (t2MOVi (i32 0)), $src))>;
   def : Pat<(i32 (vecreduce_umax (v8i16 MQPR:$src))),
             (i32 (MVE_VMAXVu16 (t2MOVi (i32 0)), $src))>;
   def : Pat<(i32 (vecreduce_umax (v4i32 MQPR:$src))),
@@ 801 lines not shown @@ class MVE_VADDSUB<string iname, string suffix, bits<2> size, bit subtract,
   let Inst{25-23} = 0b110;
   let Inst{16} = 0b0;
   let Inst{12-8} = 0b01000;
   let Inst{4} = 0b0;
   let Inst{0} = 0b0;
   let validForTailPredication = 1;
 }
 
-class MVE_VADD<string suffix, bits<2> size, list<dag> pattern=[]>
-  : MVE_VADDSUB<"vadd", suffix, size, 0b0, pattern>;
-class MVE_VSUB<string suffix, bits<2> size, list<dag> pattern=[]>
-  : MVE_VADDSUB<"vsub", suffix, size, 0b1, pattern>;
-
-def MVE_VADDi8 : MVE_VADD<"i8", 0b00>;
-def MVE_VADDi16 : MVE_VADD<"i16", 0b01>;
-def MVE_VADDi32 : MVE_VADD<"i32", 0b10>;
-
-let Predicates = [HasMVEInt] in {
-  def : Pat<(v16i8 (add (v16i8 MQPR:$val1), (v16i8 MQPR:$val2))),
-            (v16i8 (MVE_VADDi8 (v16i8 MQPR:$val1), (v16i8 MQPR:$val2)))>;
-  def : Pat<(v8i16 (add (v8i16 MQPR:$val1), (v8i16 MQPR:$val2))),
-            (v8i16 (MVE_VADDi16 (v8i16 MQPR:$val1), (v8i16 MQPR:$val2)))>;
-  def : Pat<(v4i32 (add (v4i32 MQPR:$val1), (v4i32 MQPR:$val2))),
-            (v4i32 (MVE_VADDi32 (v4i32 MQPR:$val1), (v4i32 MQPR:$val2)))>;
-}
-
-def MVE_VSUBi8 : MVE_VSUB<"i8", 0b00>;
-def MVE_VSUBi16 : MVE_VSUB<"i16", 0b01>;
-def MVE_VSUBi32 : MVE_VSUB<"i32", 0b10>;
-
-let Predicates = [HasMVEInt] in {
-  def : Pat<(v16i8 (sub (v16i8 MQPR:$val1), (v16i8 MQPR:$val2))),
-            (v16i8 (MVE_VSUBi8 (v16i8 MQPR:$val1), (v16i8 MQPR:$val2)))>;
-  def : Pat<(v8i16 (sub (v8i16 MQPR:$val1), (v8i16 MQPR:$val2))),
-            (v8i16 (MVE_VSUBi16 (v8i16 MQPR:$val1), (v8i16 MQPR:$val2)))>;
-  def : Pat<(v4i32 (sub (v4i32 MQPR:$val1), (v4i32 MQPR:$val2))),
-            (v4i32 (MVE_VSUBi32 (v4i32 MQPR:$val1), (v4i32 MQPR:$val2)))>;
-}
+multiclass MVE_VADDSUB_m<string iname, MVEVectorVTInfo VTI, bit subtract,
+                         SDNode unpred_op, Intrinsic pred_int> {
+  def "" : MVE_VADDSUB<iname, VTI.Suffix, VTI.Size, subtract>;
+
+  let Predicates = [HasMVEInt] in {
+    // Unpredicated add/subtract
+    def : Pat<(VTI.Vec (unpred_op (VTI.Vec MQPR:$Qm), (VTI.Vec MQPR:$Qn))),
+              (VTI.Vec (!cast<Instruction>(NAME)
+                        (VTI.Vec MQPR:$Qm), (VTI.Vec MQPR:$Qn)))>;
+
+    // Predicated add/subtract
+    def : Pat<(VTI.Vec (pred_int (VTI.Vec MQPR:$Qm), (VTI.Vec MQPR:$Qn),
+                                 (VTI.Pred VCCR:$mask), (VTI.Vec MQPR:$inactive))),
+              (VTI.Vec (!cast<Instruction>(NAME)
+                        (VTI.Vec MQPR:$Qm), (VTI.Vec MQPR:$Qn),
+                        (i32 1), (VTI.Pred VCCR:$mask),
+                        (VTI.Vec MQPR:$inactive)))>;
+  }
+}
+
+multiclass MVE_VADD<MVEVectorVTInfo VTI>
+  : MVE_VADDSUB_m<"vadd", VTI, 0b0, add, int_arm_mve_add_predicated>;
+multiclass MVE_VSUB<MVEVectorVTInfo VTI>
+  : MVE_VADDSUB_m<"vsub", VTI, 0b1, sub, int_arm_mve_sub_predicated>;
+
+defm MVE_VADDi8 : MVE_VADD<MVE_v16i8>;
+defm MVE_VADDi16 : MVE_VADD<MVE_v8i16>;
+defm MVE_VADDi32 : MVE_VADD<MVE_v4i32>;
+
+defm MVE_VSUBi8 : MVE_VSUB<MVE_v16i8>;
+defm MVE_VSUBi16 : MVE_VSUB<MVE_v8i16>;
+defm MVE_VSUBi32 : MVE_VSUB<MVE_v4i32>;
 
 class MVE_VQADDSUB<string iname, string suffix, bit U, bit subtract,
                    bits<2> size, ValueType vt>
   : MVE_int<iname, suffix, size, []> {
 
   let Inst{28} = U;
   let Inst{25-23} = 0b110;
   let Inst{16} = 0b0;
   let Inst{12-10} = 0b000;
[1,225 unchanged lines hidden]

let Predicates = [HasMVEFloat] in {		let Predicates = [HasMVEFloat] in {
def : Pat<(v8f16 (fma (v8f16 MQPR:$src1), (v8f16 MQPR:$src2), (v8f16 MQPR:$src3))),		def : Pat<(v8f16 (fma (v8f16 MQPR:$src1), (v8f16 MQPR:$src2), (v8f16 MQPR:$src3))),
(v8f16 (MVE_VFMAf16 $src3, $src1, $src2))>;		(v8f16 (MVE_VFMAf16 $src3, $src1, $src2))>;
def : Pat<(v4f32 (fma (v4f32 MQPR:$src1), (v4f32 MQPR:$src2), (v4f32 MQPR:$src3))),		def : Pat<(v4f32 (fma (v4f32 MQPR:$src1), (v4f32 MQPR:$src2), (v4f32 MQPR:$src3))),
(v4f32 (MVE_VFMAf32 $src3, $src1, $src2))>;		(v4f32 (MVE_VFMAf32 $src3, $src1, $src2))>;
}		}

		multiclass MVE_VADDSUB_fp_m<string iname, bit bit_21, MVEVectorVTInfo VTI,
let validForTailPredication = 1 in {		SDNode unpred_op, Intrinsic pred_int> {
def MVE_VADDf32 : MVE_VADDSUBFMA_fp<"vadd", "f32", 0b0, 0b0, 0b1, 0b0>;		def "" : MVE_VADDSUBFMA_fp<iname, VTI.Suffix, VTI.Size{0}, 0, 1, bit_21> {
def MVE_VADDf16 : MVE_VADDSUBFMA_fp<"vadd", "f16", 0b1, 0b0, 0b1, 0b0>;		let validForTailPredication = 1;
		dmgreen: Can this move into MVE_VADDSUBFMA_fp? I'm pretty sure VFMA should be fine there too. Feel free to leave that for @samparker though.
}		}

let Predicates = [HasMVEFloat] in {		let Predicates = [HasMVEFloat] in {
def : Pat<(v4f32 (fadd (v4f32 MQPR:$val1), (v4f32 MQPR:$val2))),		def : Pat<(VTI.Vec (unpred_op (VTI.Vec MQPR:$Qm), (VTI.Vec MQPR:$Qn))),
(v4f32 (MVE_VADDf32 (v4f32 MQPR:$val1), (v4f32 MQPR:$val2)))>;		(VTI.Vec (!cast<Instruction>(NAME)
def : Pat<(v8f16 (fadd (v8f16 MQPR:$val1), (v8f16 MQPR:$val2))),		(VTI.Vec MQPR:$Qm), (VTI.Vec MQPR:$Qn)))>;
(v8f16 (MVE_VADDf16 (v8f16 MQPR:$val1), (v8f16 MQPR:$val2)))>;		def : Pat<(VTI.Vec (pred_int (VTI.Vec MQPR:$Qm), (VTI.Vec MQPR:$Qn),
		(VTI.Pred VCCR:$mask), (VTI.Vec MQPR:$inactive))),
		(VTI.Vec (!cast<Instruction>(NAME)
		(VTI.Vec MQPR:$Qm), (VTI.Vec MQPR:$Qn),
		(i32 1), (VTI.Pred VCCR:$mask),
		(VTI.Vec MQPR:$inactive)))>;
		}
}		}

		multiclass MVE_VADD_fp_m<MVEVectorVTInfo VTI>
		: MVE_VADDSUB_fp_m<"vadd", 0, VTI, fadd, int_arm_mve_add_predicated>;
		multiclass MVE_VSUB_fp_m<MVEVectorVTInfo VTI>
		: MVE_VADDSUB_fp_m<"vsub", 1, VTI, fsub, int_arm_mve_sub_predicated>;

let validForTailPredication = 1 in {		defm MVE_VADDf32 : MVE_VADD_fp_m<MVE_v4f32>;
def MVE_VSUBf32 : MVE_VADDSUBFMA_fp<"vsub", "f32", 0b0, 0b0, 0b1, 0b1>;		defm MVE_VADDf16 : MVE_VADD_fp_m<MVE_v8f16>;
def MVE_VSUBf16 : MVE_VADDSUBFMA_fp<"vsub", "f16", 0b1, 0b0, 0b1, 0b1>;
}

let Predicates = [HasMVEFloat] in {		defm MVE_VSUBf32 : MVE_VSUB_fp_m<MVE_v4f32>;
def : Pat<(v4f32 (fsub (v4f32 MQPR:$val1), (v4f32 MQPR:$val2))),		defm MVE_VSUBf16 : MVE_VSUB_fp_m<MVE_v8f16>;
(v4f32 (MVE_VSUBf32 (v4f32 MQPR:$val1), (v4f32 MQPR:$val2)))>;
def : Pat<(v8f16 (fsub (v8f16 MQPR:$val1), (v8f16 MQPR:$val2))),
(v8f16 (MVE_VSUBf16 (v8f16 MQPR:$val1), (v8f16 MQPR:$val2)))>;
}

		dmgreen: Little bit of indenting, please.
class MVE_VCADD<string suffix, bit size, string cstr="", list<dag> pattern=[]>		class MVE_VCADD<string suffix, bit size, string cstr="", list<dag> pattern=[]>
: MVEFloatArithNeon<"vcadd", suffix, size, (outs MQPR:$Qd),		: MVEFloatArithNeon<"vcadd", suffix, size, (outs MQPR:$Qd),
(ins MQPR:$Qn, MQPR:$Qm, complexrotateopodd:$rot),		(ins MQPR:$Qn, MQPR:$Qm, complexrotateopodd:$rot),
"$Qd, $Qn, $Qm, $rot", vpred_r, cstr, pattern> {		"$Qd, $Qn, $Qm, $rot", vpred_r, cstr, pattern> {
bits<4> Qd;		bits<4> Qd;
bits<4> Qn;		bits<4> Qn;
bit rot;		bit rot;

[746 unchanged lines hidden]	class MVE_VCVT_ff<string iname, string suffix, bit op, bit T,
let Inst{21-16} = 0b111111;		let Inst{21-16} = 0b111111;
let Inst{12} = T;		let Inst{12} = T;
let Inst{8-7} = 0b00;		let Inst{8-7} = 0b00;
let Inst{0} = 0b1;		let Inst{0} = 0b1;

let Predicates = [HasMVEFloat];		let Predicates = [HasMVEFloat];
}		}

multiclass MVE_VCVT_ff_halves<string suffix, bit op> {		multiclass MVE_VCVT_f2h_m<string iname, int half> {
def bh : MVE_VCVT_ff<"vcvtb", suffix, op, 0b0>;		def "": MVE_VCVT_ff<iname, "f16.f32", 0b0, half>;
def th : MVE_VCVT_ff<"vcvtt", suffix, op, 0b1>;
		let Predicates = [HasMVEFloat] in {
		def : Pat<(v8f16 (int_arm_mve_vcvt_narrow
		(v8f16 MQPR:$Qd_src), (v4f32 MQPR:$Qm), (i32 half))),
		(v8f16 (!cast<Instruction>(NAME)
		(v8f16 MQPR:$Qd_src), (v4f32 MQPR:$Qm)))>;
		def : Pat<(v8f16 (int_arm_mve_vcvt_narrow_predicated
		(v8f16 MQPR:$Qd_src), (v4f32 MQPR:$Qm), (i32 half),
		(v4i1 VCCR:$mask))),
		(v8f16 (!cast<Instruction>(NAME)
		(v8f16 MQPR:$Qd_src), (v4f32 MQPR:$Qm),
		(i32 1), (v4i1 VCCR:$mask)))>;
		}
}		}

defm MVE_VCVTf16f32 : MVE_VCVT_ff_halves<"f16.f32", 0b0>;		multiclass MVE_VCVT_h2f_m<string iname, int half> {
defm MVE_VCVTf32f16 : MVE_VCVT_ff_halves<"f32.f16", 0b1>;		def "": MVE_VCVT_ff<iname, "f32.f16", 0b1, half>;
		}

		defm MVE_VCVTf16f32bh : MVE_VCVT_f2h_m<"vcvtb", 0b0>;
		defm MVE_VCVTf16f32th : MVE_VCVT_f2h_m<"vcvtt", 0b1>;
		defm MVE_VCVTf32f16bh : MVE_VCVT_h2f_m<"vcvtb", 0b0>;
		defm MVE_VCVTf32f16th : MVE_VCVT_h2f_m<"vcvtt", 0b1>;

class MVE_VxCADD<string iname, string suffix, bits<2> size, bit halve,		class MVE_VxCADD<string iname, string suffix, bits<2> size, bit halve,
string cstr="", list<dag> pattern=[]>		string cstr="", list<dag> pattern=[]>
: MVE_qDest_qSrc<iname, suffix, (outs MQPR:$Qd),		: MVE_qDest_qSrc<iname, suffix, (outs MQPR:$Qd),
(ins MQPR:$Qn, MQPR:$Qm, complexrotateopodd:$rot),		(ins MQPR:$Qn, MQPR:$Qm, complexrotateopodd:$rot),
"$Qd, $Qn, $Qm, $rot", vpred_r, cstr, pattern> {		"$Qd, $Qn, $Qm, $rot", vpred_r, cstr, pattern> {
bits<4> Qn;		bits<4> Qn;
bit rot;		bit rot;
[1,860 unchanged lines hidden]	let Predicates = [IsBE,HasMVEInt] in {
def : Pat<(v8i16 (bitconvert (v16i8 MQPR:$src))), (v8i16 (MVE_VREV16_8 MQPR:$src))>;		def : Pat<(v8i16 (bitconvert (v16i8 MQPR:$src))), (v8i16 (MVE_VREV16_8 MQPR:$src))>;

def : Pat<(v16i8 (bitconvert (v2f64 MQPR:$src))), (v16i8 (MVE_VREV64_8 MQPR:$src))>;		def : Pat<(v16i8 (bitconvert (v2f64 MQPR:$src))), (v16i8 (MVE_VREV64_8 MQPR:$src))>;
def : Pat<(v16i8 (bitconvert (v2i64 MQPR:$src))), (v16i8 (MVE_VREV64_8 MQPR:$src))>;		def : Pat<(v16i8 (bitconvert (v2i64 MQPR:$src))), (v16i8 (MVE_VREV64_8 MQPR:$src))>;
def : Pat<(v16i8 (bitconvert (v4f32 MQPR:$src))), (v16i8 (MVE_VREV32_8 MQPR:$src))>;		def : Pat<(v16i8 (bitconvert (v4f32 MQPR:$src))), (v16i8 (MVE_VREV32_8 MQPR:$src))>;
def : Pat<(v16i8 (bitconvert (v4i32 MQPR:$src))), (v16i8 (MVE_VREV32_8 MQPR:$src))>;		def : Pat<(v16i8 (bitconvert (v4i32 MQPR:$src))), (v16i8 (MVE_VREV32_8 MQPR:$src))>;
def : Pat<(v16i8 (bitconvert (v8f16 MQPR:$src))), (v16i8 (MVE_VREV16_8 MQPR:$src))>;		def : Pat<(v16i8 (bitconvert (v8f16 MQPR:$src))), (v16i8 (MVE_VREV16_8 MQPR:$src))>;
def : Pat<(v16i8 (bitconvert (v8i16 MQPR:$src))), (v16i8 (MVE_VREV16_8 MQPR:$src))>;		def : Pat<(v16i8 (bitconvert (v8i16 MQPR:$src))), (v16i8 (MVE_VREV16_8 MQPR:$src))>;
}		}
		dmgreen: There is an isel node called predicate_cast that does the same thing as this. It may be possible/beneficial to convert this earlier (during lowering, but I'm not sure that would do anything yet). The patterns are further up, near a lot of the vcmp patterns.

llvm/test/CodeGen/Thumb2/mve-intrinsics/vaddq.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -mtriple=thumbv8.1m.main -mattr=+mve.fp -verify-machineinstrs -o - %s | FileCheck %s

				define arm_aapcs_vfpcc <4 x i32> @test_vaddq_u32(<4 x i32> %a, <4 x i32> %b) {
				; CHECK-LABEL: test_vaddq_u32:
				dmgreen: We probably don't _need_ tests for simple instructions like this; they should be covered elsewhere (fine to leave them if you wish).
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vadd.i32 q0, q1, q0
				; CHECK-NEXT: bx lr
				entry:
				%0 = add <4 x i32> %b, %a
				ret <4 x i32> %0
				}

				define arm_aapcs_vfpcc <4 x float> @test_vaddq_f32(<4 x float> %a, <4 x float> %b) {
				; CHECK-LABEL: test_vaddq_f32:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vadd.f32 q0, q1, q0
				; CHECK-NEXT: bx lr
				entry:
				%0 = fadd <4 x float> %b, %a
				ret <4 x float> %0
				}

				define arm_aapcs_vfpcc <8 x half> @test_vsubq_f16(<8 x half> %a, <8 x half> %b) {
				; CHECK-LABEL: test_vsubq_f16:
				dmgreen: For the rest of the tests, at least for codegen, we have tried to fill in all the combinations of type and operation (at least for the legal types). It can be useful for making sure nothing is missed (here or in the future when some refactoring happens). Whether you want to do the same thing here is up to you, or whether you think that having interesting combinations is enough (adds with v16i8, subs with v4f32 for example).
				dmgreen: This could probably do with a reply, one way or the other. I was previously in the "don't mind either way" camp; now I feel more in the "why not just add them" camp, unless there is some reason not to.
				simon_tatham (author): I had intended to stop at "interesting subset of combinations", along the lines of one of each operation, and one of each type, but not the full cross product unless absolutely necessary. (There are about 2000 of these to come in future work, so at some point adding a test for every single one won't deserve the word "just" any more!)
				dmgreen: The add int and add float are two different instructions, same for sub, so it's probably best to make sure we have at least one of each of those combinations. I was looking for prior art of only adding subsets of the combinations for codegen; I don't immediately see anywhere where we have done that. I'm not too worried about this set of tests not catching problems now. But in the future, as things are refactored (in ways that might be difficult to predict), it might be expected that the test coverage is more complete, and that could lead to things getting missed. I'd suggest we at least fill in the combinations for add and sub. Later on, when we get to vrmlsldavhax, it probably won't be as important.
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vsub.f16 q0, q0, q1
				; CHECK-NEXT: bx lr
				entry:
				%0 = fsub <8 x half> %a, %b
				ret <8 x half> %0
				}

				define arm_aapcs_vfpcc <8 x i16> @test_vsubq_s16(<8 x i16> %a, <8 x i16> %b) {
				; CHECK-LABEL: test_vsubq_s16:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vsub.i16 q0, q0, q1
				; CHECK-NEXT: bx lr
				entry:
				%0 = sub <8 x i16> %a, %b
				ret <8 x i16> %0
				}

				define arm_aapcs_vfpcc <16 x i8> @test_vaddq_m_s8(<16 x i8> %inactive, <16 x i8> %a, <16 x i8> %b, i16 zeroext %p) {
				; CHECK-LABEL: test_vaddq_m_s8:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r0
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vaddt.i8 q0, q1, q2
				; CHECK-NEXT: bx lr
				entry:
				%0 = zext i16 %p to i32
				%1 = tail call <16 x i1> @llvm.arm.mve.pred.i2v.v16i1(i32 %0)
				dmgreen: Do we care about what happens when there is a fp intrinsic but we don't have mve.fp? I presume this will be a fail to select or some sort of legalisation error, which is probably fine considering what is happening.
				%2 = tail call <16 x i8> @llvm.arm.mve.add.predicated.v16i8.v16i1(<16 x i8> %a, <16 x i8> %b, <16 x i1> %1, <16 x i8> %inactive)
				ret <16 x i8> %2
				}

				declare <16 x i1> @llvm.arm.mve.pred.i2v.v16i1(i32)

				declare <16 x i8> @llvm.arm.mve.add.predicated.v16i8.v16i1(<16 x i8>, <16 x i8>, <16 x i1>, <16 x i8>)

				define arm_aapcs_vfpcc <8 x half> @test_vaddq_m_f16(<8 x half> %inactive, <8 x half> %a, <8 x half> %b, i16 zeroext %p) {
				; CHECK-LABEL: test_vaddq_m_f16:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r0
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vaddt.f16 q0, q1, q2
				; CHECK-NEXT: bx lr
				entry:
				%0 = zext i16 %p to i32
				%1 = tail call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 %0)
				%2 = tail call <8 x half> @llvm.arm.mve.add.predicated.v8f16.v8i1(<8 x half> %a, <8 x half> %b, <8 x i1> %1, <8 x half> %inactive)
				ret <8 x half> %2
				}

				declare <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32)

				declare <8 x half> @llvm.arm.mve.add.predicated.v8f16.v8i1(<8 x half>, <8 x half>, <8 x i1>, <8 x half>)

				define arm_aapcs_vfpcc <4 x float> @test_vsubq_m_f32(<4 x float> %inactive, <4 x float> %a, <4 x float> %b, i16 zeroext %p) {
				; CHECK-LABEL: test_vsubq_m_f32:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r0
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vsubt.f32 q0, q1, q2
				; CHECK-NEXT: bx lr
				entry:
				%0 = zext i16 %p to i32
				%1 = tail call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 %0)
				%2 = tail call <4 x float> @llvm.arm.mve.sub.predicated.v4f32.v4i1(<4 x float> %a, <4 x float> %b, <4 x i1> %1, <4 x float> %inactive)
				ret <4 x float> %2
				}

				declare <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32)

				declare <4 x float> @llvm.arm.mve.sub.predicated.v4f32.v4i1(<4 x float>, <4 x float>, <4 x i1>, <4 x float>)

				define arm_aapcs_vfpcc <4 x i32> @test_vsubq_m_u32(<4 x i32> %inactive, <4 x i32> %a, <4 x i32> %b, i16 zeroext %p) {
				; CHECK-LABEL: test_vsubq_m_u32:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r0
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vsubt.i32 q0, q1, q2
				; CHECK-NEXT: bx lr
				entry:
				%0 = zext i16 %p to i32
				%1 = tail call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 %0)
				%2 = tail call <4 x i32> @llvm.arm.mve.sub.predicated.v4i32.v4i1(<4 x i32> %a, <4 x i32> %b, <4 x i1> %1, <4 x i32> %inactive)
				ret <4 x i32> %2
				}

				declare <4 x i32> @llvm.arm.mve.sub.predicated.v4i32.v4i1(<4 x i32>, <4 x i32>, <4 x i1>, <4 x i32>)

llvm/test/CodeGen/Thumb2/mve-intrinsics/vcvt.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -mtriple=thumbv8.1m.main -mattr=+mve.fp -verify-machineinstrs -o - %s | FileCheck %s

				define arm_aapcs_vfpcc <8 x half> @test_vcvttq_f16_f32(<8 x half> %a, <4 x float> %b) {
				; CHECK-LABEL: test_vcvttq_f16_f32:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vcvtt.f16.f32 q0, q1
				; CHECK-NEXT: bx lr
				entry:
				%0 = tail call <8 x half> @llvm.arm.mve.vcvt.narrow(<8 x half> %a, <4 x float> %b, i32 1)
				ret <8 x half> %0
				dmgreen: Can you add a test for 0 too?
				}

				define arm_aapcs_vfpcc <8 x half> @test_vcvtbq_f16_f32(<8 x half> %a, <4 x float> %b) {
				; CHECK-LABEL: test_vcvtbq_f16_f32:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vcvtb.f16.f32 q0, q1
				; CHECK-NEXT: bx lr
				entry:
				%0 = tail call <8 x half> @llvm.arm.mve.vcvt.narrow(<8 x half> %a, <4 x float> %b, i32 0)
				ret <8 x half> %0
				}

				declare <8 x half> @llvm.arm.mve.vcvt.narrow(<8 x half>, <4 x float>, i32)

				define arm_aapcs_vfpcc <8 x half> @test_vcvttq_m_f16_f32(<8 x half> %a, <4 x float> %b, i16 zeroext %p) {
				; CHECK-LABEL: test_vcvttq_m_f16_f32:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r0
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vcvttt.f16.f32 q0, q1
				; CHECK-NEXT: bx lr
				entry:
				%0 = zext i16 %p to i32
				%1 = tail call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 %0)
				%2 = tail call <8 x half> @llvm.arm.mve.vcvt.narrow.predicated(<8 x half> %a, <4 x float> %b, i32 1, <4 x i1> %1)
				ret <8 x half> %2
				}

				define arm_aapcs_vfpcc <8 x half> @test_vcvtbq_m_f16_f32(<8 x half> %a, <4 x float> %b, i16 zeroext %p) {
				; CHECK-LABEL: test_vcvtbq_m_f16_f32:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r0
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vcvtbt.f16.f32 q0, q1
				; CHECK-NEXT: bx lr
				entry:
				%0 = zext i16 %p to i32
				%1 = tail call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 %0)
				%2 = tail call <8 x half> @llvm.arm.mve.vcvt.narrow.predicated(<8 x half> %a, <4 x float> %b, i32 0, <4 x i1> %1)
				ret <8 x half> %2
				}

				declare <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32)

				declare <8 x half> @llvm.arm.mve.vcvt.narrow.predicated(<8 x half>, <4 x float>, i32, <4 x i1>)

llvm/test/CodeGen/Thumb2/mve-intrinsics/vminvq.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -mtriple=thumbv8.1m.main -mattr=+mve.fp -verify-machineinstrs -o - %s | FileCheck %s

				define arm_aapcs_vfpcc i32 @test_vminvq_u32(i32 %a, <4 x i32> %b) {
				; CHECK-LABEL: test_vminvq_u32:
				dmgreen: Test other VTs too, please.
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vminv.u32 r0, q0
				; CHECK-NEXT: bx lr
				entry:
				%0 = tail call i32 @llvm.arm.mve.minv.u.v4i32(i32 %a, <4 x i32> %b)
				ret i32 %0
				}

				define arm_aapcs_vfpcc i32 @test_vmaxvq_u8(i32 %a, <16 x i8> %b) {
				; CHECK-LABEL: test_vmaxvq_u8:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmaxv.u8 r0, q0
				; CHECK-NEXT: bx lr
				entry:
				%0 = tail call i32 @llvm.arm.mve.maxv.u.v16i8(i32 %a, <16 x i8> %b)
				ret i32 %0
				}

				define arm_aapcs_vfpcc i32 @test_vminvq_s16(i32 %a, <8 x i16> %b) {
				; CHECK-LABEL: test_vminvq_s16:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vminv.s16 r0, q0
				; CHECK-NEXT: bx lr
				entry:
				%0 = tail call i32 @llvm.arm.mve.minv.s.v8i16(i32 %a, <8 x i16> %b)
				ret i32 %0
				}

				declare i32 @llvm.arm.mve.minv.u.v4i32(i32, <4 x i32>)
				declare i32 @llvm.arm.mve.maxv.u.v16i8(i32, <16 x i8>)
				declare i32 @llvm.arm.mve.minv.s.v8i16(i32, <8 x i16>)