This is an archive of the discontinued LLVM Phabricator instance.

Differential D67158

[ARM] Begin adding IR intrinsics for MVE instructions.
Closed · Public

Authored by simon_tatham on Sep 4 2019, 5:47 AM.

Details

Reviewers

dmgreen
miyuki
ostannard

Commits

rG1b45297e013e: [ARM] Begin adding IR intrinsics for MVE instructions.

Summary

This commit, together with the next few, will add a representative
sample of the kind of IR intrinsics that we'll need in order to
implement the user-facing ACLE intrinsics for MVE. Supporting all of
them will take more work; the intention of this initial series of
commits is to implement an intrinsic or two from lots of different
categories, as examples and proofs of concept.

This initial commit introduces a small number of IR intrinsics for
instructions simple enough that they can use Tablegen ISel patterns:
the predicated versions of the VADD and VSUB instructions (both
integer and FP), VMIN and VMAX, and the float->half VCVT instruction
(predicated and unpredicated).

When using VPT-predicated instructions in automatic code generation,
it will be convenient to specify the predicate value as a vector of
the appropriate number of i1. To make it easy to specify all sizes of
an instruction in one go and give each one the matching predicate
vector type, I've added a system of Tablegen informational records
describing MVE's vector types: each one gives the underlying LLVM IR
ValueType (which may not be the same if the MVE vector is of
explicitly signed or unsigned integers) and an appropriate vNi1 to use
as the predicate vector.

(Also, those info records include the usual encoding for the types, so
that as we add associations between each instruction encoding and one
of the new MVEVectorVTInfo records, we can remove some of the
existing template parameters and replace them with references to the
vector type info's fields.)

The user-facing ACLE intrinsics will receive a predicate mask as a
16-bit integer, so I've also provided a pair of intrinsics i2v and
v2i, to convert between an integer and a vector of i1 by just changing
the register class.
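
As a rough illustration of how a 16-bit predicate mask relates to a vector of i1, here is a minimal standalone sketch. It is not LLVM code: `lanesToMask` and `maskToLanes` are hypothetical helpers, and the sketch assumes the usual MVE layout in which the 16-bit predicate holds one bit per byte of the 128-bit vector, so each lane of a vNi1 vector spans 16/N adjacent bits. Reading a lane back from the lowest bit of its group is a further assumption made only for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of LLVM: pack N lane predicates into a
// 16-bit mask, giving each lane 16/N adjacent bits (assumed MVE layout).
inline uint16_t lanesToMask(const std::vector<bool> &Lanes) {
  const unsigned N = static_cast<unsigned>(Lanes.size());
  assert((N == 4 || N == 8 || N == 16) && "expected v4i1, v8i1 or v16i1");
  const unsigned BitsPerLane = 16 / N;
  uint16_t Mask = 0;
  for (unsigned Lane = 0; Lane < N; ++Lane)
    if (Lanes[Lane])
      for (unsigned Bit = 0; Bit < BitsPerLane; ++Bit)
        Mask = static_cast<uint16_t>(Mask | (1u << (Lane * BitsPerLane + Bit)));
  return Mask;
}

// Hypothetical inverse. Assumption for this sketch only: the lowest bit
// of each lane's group decides whether the lane counts as active.
inline std::vector<bool> maskToLanes(uint16_t Mask, unsigned N) {
  assert((N == 4 || N == 8 || N == 16) && "expected v4i1, v8i1 or v16i1");
  const unsigned BitsPerLane = 16 / N;
  std::vector<bool> Lanes(N);
  for (unsigned Lane = 0; Lane < N; ++Lane)
    Lanes[Lane] = ((Mask >> (Lane * BitsPerLane)) & 1u) != 0;
  return Lanes;
}
```

Under these assumptions, a v4i1 value with lanes 0 and 2 active corresponds to the mask 0x0F0F, which is why the conversion itself needs no bit manipulation and can be a change of register class.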

Diff Detail

Repository

rG LLVM Github Monorepo

Build Status

Buildable 37711
Build 37710: arc lint + arc unit

Event Timeline

simon_tatham created this revision. · Sep 4 2019, 5:47 AM

Herald added a project: Restricted Project. · Sep 4 2019, 5:47 AM

Herald added subscribers: llvm-commits, hiraditya, kristof.beyls, javed.absar.

Harbormaster completed remote builds in B37711: Diff 218662. · Sep 4 2019, 5:47 AM

simon_tatham mentioned this in D67161: [clang,ARM] Initial ACLE intrinsics for MVE. · Sep 4 2019, 5:48 AM

dmgreen added inline comments. · Sep 9 2019, 2:49 AM

llvm/include/llvm/IR/IntrinsicsARM.td
806	A vminv is very close to a llvm.experimental.vector.reduce.umin/llvm.experimental.vector.reduce.smin, although that doesn't contain the first parameter. (I'm just mentioning this for general interest mostly, I don't think we should be using the thing yet. It may be more interesting in the future if llvm started optimising the vector.reduce better, but for the moment I think that the arm intrinsic is a good idea).
816	Any reason this is called fltnarrow? As opposed to something closer to the name of the instruction?
llvm/lib/Target/ARM/ARMISelDAGToDAG.cpp
212	Maybe call this something like "AddMVEPredicateToOps"?
llvm/lib/Target/ARM/ARMInstrMVE.td
283	The Intel backend has a different way of doing this in the X86InstrAVX512.td file. It defines a set of "X86VectorVTInfo" classes, one for each type, that can contain extra information about that vector type. That way we could store the predicate type with the other info, and instead of adding the ValueType to the MVE_VADDSUB, for example, we could add this class data. (And potentially add extra stuff like signed_suffix and unsigned_suffix, etc.) I'm not sure if that method is better or worse than this, either in terms of tblgen efficiency or in being simpler to read.
695	Are the instructions correct here? Should they be s8/s16/s32?
1498	I feel like this deserves at least a little indentation. Maybe not for each level of for, but something to make it easier to parse. The last 2 levels are really just defining variables, right? And could be written in-place.
5246	There is an isel node called predicate_cast that does the same thing as this. It may be possible/beneficial to convert this earlier (during lowering, but I'm not sure that would do anything yet). The patterns are further up, near a lot of the vcmp patterns.
llvm/test/CodeGen/Thumb2/mve-intrinsics/vadc.ll
2	-verify-machineinstrs is probably worth putting on most of these tests.
22	The #1 can be removed?
llvm/test/CodeGen/Thumb2/mve-intrinsics/vcvt.ll
10	Can you add a test for 0 too?
llvm/test/CodeGen/Thumb2/mve-intrinsics/vminvq.ll
4	Test other VTs too, please.

simon_tatham marked 8 inline comments as done. · Sep 11 2019, 4:53 AM

simon_tatham added inline comments.

llvm/include/llvm/IR/IntrinsicsARM.td
816	Not any particularly good reason. I had the vague idea of naming things after their functionality as long as it didn't turn the name into a giant essay (there's probably no hope for `vrmlaldavhax`), on the vague principle that that might make them easier to find if anyone later goes looking for platform-specific intrinsics that could be pulled out into standard IR representations. But it's not something I'm strongly committed to, just a harebrained idea I thought I'd throw out there to see if anyone had opinions about it.

Addressed many review comments.

Harbormaster completed remote builds in B38005: Diff 219690. · Sep 11 2019, 4:54 AM

dmgreen added inline comments. · Sep 23 2019, 12:12 AM

llvm/include/llvm/IR/IntrinsicsARM.td
816	I would say that in this case, if it is going from something called vcvt to something called vcvt, then a similar name would be less confusing (presuming that you can search for arm_mve_vcvt to distinguish the intrinsic from the instruction and from the builtin).

Rebased, and renamed the fltnarrow intrinsic as suggested.

Harbormaster completed remote builds in B38997: Diff 223225. · Oct 4 2019, 8:29 AM

This is a bit large to review in a single patch, and I don't think all the parts are necessarily interrelated. Mind pulling a few logically separable parts out into separate patches, to make what's left simpler?

dmgreen mentioned this in D68566: [ARM] VQADD instructions · Oct 7 2019, 7:24 AM

Split this patch into three as requested. This one now contains only
the subset of the previous IR intrinsics that can be implemented by
Tablegen patterns. The ones using C++ are moved out to two followup
patches.

simon_tatham retitled this revision from "[ARM] Add IR intrinsics for a sample of MVE instructions." to "[ARM] Begin adding IR intrinsics for MVE instructions." · Oct 9 2019, 6:00 AM

simon_tatham edited the summary of this revision.

Harbormaster completed remote builds in B39223: Diff 224027. · Oct 9 2019, 6:03 AM

simon_tatham added a child revision: D68699: [ARM] Add some sample IR MVE intrinsics with C++ isel. · Oct 9 2019, 6:07 AM

Thanks for splitting this up.

llvm/lib/Target/ARM/ARMInstrMVE.td
691	I feel like we should come up with a style and try to stick with it. The adds/subs below add VT to the existing instructions and use it in the new patterns, mixed in with the old patterns. These ones add the intrinsics to the multiclass so the top level can include the pattern (but there are also patterns outside, below). I think there's value in making this somewhat structured, if we can.
2733	Little bit of indenting, please.
llvm/test/CodeGen/Thumb2/mve-intrinsics/vaddq.ll
5	We probably don't _need_ tests for simple instructions like this, they should be covered elsewhere (fine to leave them if you wish).
25	For the rest of the tests, at least for codegen we have tried to fill in all the combinations for type and operations (at least the legal types). It can be useful for making sure nothing is missed (here or in the future when some refactoring happens). Whether you want to do the same thing here is up to you, or whether you think that having interesting combinations is enough (adds with v16i8, subs with v4f32 for example).
53	Do we care about what happens when there is a fp intrinsic but we don't have mve.fp? I presume this will be a fail to select or some sort of legalisation error, which is probably fine considering what is happening.

Rewritten this patch to do all its jobs in the same reasonably
consistent way.

Also, I've replaced my mkpred function-oid with a system of
MVEVectorVTInfo, similar to the x86 system that @dmgreen pointed
out. I think it works out more nicely, for two reasons. Firstly, I can
make separate classes for vectors of signed and unsigned integers,
which MVE distinguishes even though LLVM IR doesn't. Secondly, I can
also have those info records contain bits and pieces that can be used
in the main instruction definition: the type suffix on the mnemonic
(.s32 or .f16 or whatever), and at least the _usual_ way that
vector types are represented in MVE instruction encodings. So now
every time I have to add an MVEVectorVTInfo as an extra template
parameter, I'll be able to remove a few existing ones that it makes
redundant, so the declarations shouldn't get too complicated.
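
To make the idea of one record carrying several derived pieces of information concrete, here is a small standalone C++ model. It is only a sketch and not the Tablegen from the patch: `VecTypeInfo` and its field names are hypothetical, and the 2-bit size encoding (0, 1, 2, 3 for 8, 16, 32, 64-bit lanes) is an assumption about the "usual" encoding mentioned above.

```cpp
#include <cassert>
#include <string>

// Hypothetical C++ model of one vector-type info record. Field names are
// illustrative and do not come from the patch.
struct VecTypeInfo {
  unsigned SizeField; // assumed 2-bit encoding: 0 -> 8, 1 -> 16, 2 -> 32, 3 -> 64 bits
  char SuffixLetter;  // 's', 'u', 'i' or 'f'
  bool IsUnsigned;    // MVE distinguishes signedness even though LLVM IR does not

  // Lane width in bits, derived from the size encoding.
  unsigned laneBits() const {
    assert(SizeField < 4 && "size encoding is two bits wide");
    return 8u << SizeField;
  }

  // Number of lanes in a 128-bit MVE vector of this element size.
  unsigned numLanes() const { return 128u / laneBits(); }

  // Mnemonic type suffix without the leading dot, e.g. "s32" or "f16".
  std::string suffix() const {
    return std::string(1, SuffixLetter) + std::to_string(laneBits());
  }
};
```

With such a record, an instruction definition that takes one `VecTypeInfo` can derive the suffix, lane count and size bits from it rather than accepting each as a separate parameter.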

Harbormaster completed remote builds in B39576: Diff 225038. · Oct 15 2019, 7:50 AM

Oh, nearly forgot: I renamed the vcvt intrinsic again, to put "narrow" back into the name (it's now vcvt_narrow).

Rationale: the MVE VCVT instructions can't all be treated exactly the same way by their IR intrinsics, because conversions to a narrower element type need an extra input parameter giving the previous value of their output register, which they only overwrite half of. Non-narrowing VCVTs will have a simpler type signature without that parameter.
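
The need for the extra input can be seen in a minimal standalone model of a narrowing conversion. This is a sketch, not LLVM code: `narrowInto` is a hypothetical function, integer truncation stands in for the real f32-to-f16 conversion so that the example needs no half-float support, and the choice of even lanes for "bottom" and odd lanes for "top" is an assumption made for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of a narrowing convert. Only four of the eight output
// lanes are written; the other four keep whatever was in Prev, which is why
// the previous value of the output register must be an input.
// Assumptions: truncation stands in for f32 -> f16, and Top selects the
// odd-numbered lanes while !Top selects the even-numbered ones.
inline std::array<uint16_t, 8> narrowInto(std::array<uint16_t, 8> Prev,
                                          const std::array<uint32_t, 4> &Src,
                                          bool Top) {
  for (std::size_t I = 0; I < Src.size(); ++I)
    Prev[2 * I + (Top ? 1 : 0)] = static_cast<uint16_t>(Src[I]);
  return Prev;
}
```

A non-narrowing conversion writes every output lane, so a model of it would take only `Src`, matching the simpler type signature described above.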

dmgreen added inline comments. · Oct 23 2019, 2:10 AM

llvm/lib/Target/ARM/ARMInstrMVE.td
311	Maybe call these MVEv16i8? Or v16i8_t or v16i8info or something? Otherwise it looks very useful.
llvm/test/CodeGen/Thumb2/mve-intrinsics/vaddq.ll
25	This could probably do with a reply, one way or the other. I was previously in the "don't mind either way" camp, now I feel more in the "why not just add them" camp, unless there is some reason not to.

simon_tatham marked 2 inline comments as done. · Oct 23 2019, 6:46 AM

simon_tatham added inline comments.

llvm/lib/Target/ARM/ARMInstrMVE.td
311	I copied the `v16i8_info` naming system from the similar x86 case you pointed out to me. But I can call them something with "MVE" in the name if you prefer, sure.
llvm/test/CodeGen/Thumb2/mve-intrinsics/vaddq.ll
25	I had intended to stop at "interesting subset of combinations", along the lines of one of each operation, and one of each type, but not the full cross product unless absolutely necessary. (There are about 2000 of these to come in future work, so at some point adding a test for every single one won't deserve the word "just" any more!)

dmgreen added inline comments. · Oct 23 2019, 8:31 AM

llvm/lib/Target/ARM/ARMInstrMVE.td
311	Ah righteo. The underscore threw me off. Whatever you think is best. MVE might not be the worst idea, just in case someone decides to do something similar in NEON.
llvm/test/CodeGen/Thumb2/mve-intrinsics/vaddq.ll
25	The add int and add float are two different instructions, same for sub, so it is probably best to make sure we have at least one of each of those combinations. I was looking for prior art of only adding subsets of the combinations for codegen, and I don't immediately see anywhere where we have done that. I'm not too worried about this set of tests not catching problems now. But in the future, as things are refactored (in ways that might be difficult to predict), it might be expected that the test coverage is more complete, which could lead to things getting missed. I'd suggest we at least fill in the combinations for add and sub. Later on, when we get to vrmlsldavhax, it probably won't be as important.

Renamed the info classes from vXXyY_info to MVE_vXXyY, and added
some more tests of addition. Also rebased to current master.

Harbormaster completed remote builds in B39988: Diff 226235. · Oct 24 2019, 5:52 AM

Thanks! LGTM, with just one comment that might be better left for Sam.

llvm/lib/Target/ARM/ARMInstrMVE.td
2725	Can this move into MVE_VADDSUBFMA_fp? I'm pretty sure VFMA should be fine there too. Feel free to leave that for @samparker though.

This revision is now accepted and ready to land. · Oct 24 2019, 7:11 AM

Closed by commit rG1b45297e013e: [ARM] Begin adding IR intrinsics for MVE instructions. (authored by simon_tatham). · Oct 24 2019, 8:35 AM

This revision was automatically updated to reflect the committed changes.

simon_tatham mentioned this in rG08074cc96557: [clang,ARM] Initial ACLE intrinsics for MVE.

simon_tatham mentioned this in D76490: [ARM,MVE] Add ACLE intrinsics for the vminv/vmaxv family. · Mar 20 2020, 5:22 AM

simon_tatham mentioned this in rG45a9945b9ea9: [ARM,MVE] Add ACLE intrinsics for the vminv/vmaxv family. · Mar 20 2020, 9:11 AM

Herald added a project: Restricted Project. · Feb 24 2023, 9:50 AM

Revision Contents

Path	Size
llvm/include/llvm/IR/IntrinsicsARM.td	54 lines
llvm/lib/Target/ARM/ARMISelDAGToDAG.cpp	245 lines
llvm/lib/Target/ARM/ARMInstrMVE.td	194 lines
llvm/test/CodeGen/Thumb2/mve-intrinsics/ (seven files)	23, 98, 58, 32, 91, 62 and 14 lines

Diff 218662

llvm/include/llvm/IR/IntrinsicsARM.td

	def int_arm_neon_sdot : Neon_Dot_Intrinsic;

	// GNU eabi mcount
	def int_arm_gnu_eabi_mcount : Intrinsic<[],
	    [],
	    [IntrReadMem, IntrWriteMem]>;

				def int_arm_mve_pred_i2v : Intrinsic<
				  [llvm_anyvector_ty], [llvm_i32_ty], [IntrNoMem]>;
				def int_arm_mve_pred_v2i : Intrinsic<
				  [llvm_i32_ty], [llvm_anyvector_ty], [IntrNoMem]>;

				multiclass IntrinsicSignSuffix<list<LLVMType> rets, list<LLVMType> params = [],
				    list<IntrinsicProperty> props = [],
				    string name = "",
				    list<SDNodeProperty> sdprops = []> {
				  def _s: Intrinsic<rets, params, props, name, sdprops>;
				  def _u: Intrinsic<rets, params, props, name, sdprops>;
				}

				def int_arm_mve_add_predicated: Intrinsic<[llvm_anyvector_ty],
				  [LLVMMatchType<0>, LLVMMatchType<0>, llvm_anyvector_ty, LLVMMatchType<0>],
				  [IntrNoMem]>;
				def int_arm_mve_sub_predicated: Intrinsic<[llvm_anyvector_ty],
				  [LLVMMatchType<0>, LLVMMatchType<0>, llvm_anyvector_ty, LLVMMatchType<0>],
				  [IntrNoMem]>;

				defm int_arm_mve_minv: IntrinsicSignSuffix<[llvm_i32_ty],
				  [llvm_i32_ty, llvm_anyvector_ty], [IntrNoMem]>;

				def int_arm_mve_vld2q: Intrinsic<[llvm_anyvector_ty, LLVMMatchType<0>], [llvm_anyptr_ty], [IntrReadMem]>;
				def int_arm_mve_vld4q: Intrinsic<[llvm_anyvector_ty, LLVMMatchType<0>, LLVMMatchType<0>, LLVMMatchType<0>], [llvm_anyptr_ty], [IntrReadMem]>;

				def int_arm_mve_vst2q: Intrinsic<[], [llvm_anyptr_ty, llvm_anyvector_ty, LLVMMatchType<1>, llvm_i32_ty], [IntrWriteMem]>;
				def int_arm_mve_vst4q: Intrinsic<[], [llvm_anyptr_ty, llvm_anyvector_ty, LLVMMatchType<1>, LLVMMatchType<1>, LLVMMatchType<1>, llvm_i32_ty], [IntrWriteMem]>;

				def int_arm_mve_fltnarrow: Intrinsic<[llvm_v8f16_ty],
				  [llvm_v8f16_ty, llvm_v4f32_ty, llvm_i32_ty], [IntrNoMem]>;
				def int_arm_mve_fltnarrow_predicated: Intrinsic<[llvm_v8f16_ty],
				  [llvm_v8f16_ty, llvm_v4f32_ty, llvm_i32_ty, llvm_v4i1_ty], [IntrNoMem]>;

				def int_arm_mve_vldr_gather_base_wb: Intrinsic<
				  [llvm_anyvector_ty, llvm_anyvector_ty],
				  [LLVMMatchType<1>, llvm_i32_ty], [IntrReadMem]>;
				def int_arm_mve_vldr_gather_base_wb_predicated: Intrinsic<
				  [llvm_anyvector_ty, llvm_anyvector_ty],
				  [LLVMMatchType<1>, llvm_i32_ty, llvm_anyvector_ty], [IntrReadMem]>;

				def int_arm_mve_urshrl: Intrinsic<
				  [llvm_i32_ty, llvm_i32_ty], [llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
				  [IntrNoMem]>;

				def int_arm_mve_vadc: Intrinsic<
				  [llvm_anyvector_ty, llvm_i32_ty],
				  [LLVMMatchType<0>, LLVMMatchType<0>, llvm_i32_ty], [IntrNoMem]>;
				def int_arm_mve_vadc_predicated: Intrinsic<
				  [llvm_anyvector_ty, llvm_i32_ty],
				  [LLVMMatchType<0>, LLVMMatchType<0>, LLVMMatchType<0>,
				   llvm_i32_ty, llvm_anyvector_ty], [IntrNoMem]>;

	} // end TargetPrefix

llvm/lib/Target/ARM/ARMISelDAGToDAG.cpp

private:

/// SelectVLDSTLane - Select NEON load/store lane intrinsics. NumVecs should
/// be 2, 3 or 4. The opcode arrays specify the instructions used for
/// load/store of D registers and Q registers.
void SelectVLDSTLane(SDNode *N, bool IsLoad, bool isUpdating,
                     unsigned NumVecs, const uint16_t *DOpcodes,
                     const uint16_t *QOpcodes);

		/// Helper functions for setting up clusters of MVE predication operands.
		template <typename SDValueVector>
		void MVE_Predicated(SDValueVector &Ops, SDLoc Loc, SDValue PredicateMask);
		template <typename SDValueVector>
		void MVE_Predicated(SDValueVector &Ops, SDLoc Loc, SDValue PredicateMask,
		SDValue Inactive);

		template <typename SDValueVector>
		void MVE_Unpredicated(SDValueVector &Ops, SDLoc Loc);
		template <typename SDValueVector>
		void MVE_Unpredicated(SDValueVector &Ops, SDLoc Loc, EVT InactiveTy);

		/// SelectMVE_VLD - Select MVE interleaving load intrinsics. NumVecs
		/// should be 2 or 4. The opcode array specifies the instructions
		/// used for 8, 16 and 32-bit lane sizes respectively, and each
		/// pointer points to a set of NumVecs sub-opcodes used for the
		/// different stages (e.g. VLD20 versus VLD21) of each load family.
		void SelectMVE_VLD(SDNode *N, unsigned NumVecs,
		const uint16_t *const *Opcodes);

		/// SelectMVE_WB - Select MVE writeback load/store intrinsics.
		void SelectMVE_WB(SDNode *N, const uint16_t *Opcodes, bool Predicated);

		/// SelectMVE_LongShift - Select MVE 64-bit scalar shift intrinsics.
		void SelectMVE_LongShift(SDNode *N, uint16_t Opcode, bool Immediate);

		/// SelectMVE_VADCSBC - Select MVE vector add/sub-with-carry intrinsics.
		void SelectMVE_VADCSBC(SDNode *N, uint16_t OpcodeWithCarry,
		uint16_t OpcodeWithNoCarry, bool Add, bool Predicated);

/// SelectVLDDup - Select NEON load-duplicate intrinsics. NumVecs
/// should be 1, 2, 3 or 4. The opcode array specifies the instructions used
/// for loading D registers.
void SelectVLDDup(SDNode *N, bool IsIntrinsic, bool isUpdating,
                  unsigned NumVecs, const uint16_t *DOpcodes,
                  const uint16_t *QOpcodes0 = nullptr,
                  const uint16_t *QOpcodes1 = nullptr);

[...]

for (unsigned Vec = 0; Vec < NumVecs; ++Vec)
  ReplaceUses(SDValue(N, Vec),
              CurDAG->getTargetExtractSubreg(Sub0 + Vec, dl, VT, SuperReg));
ReplaceUses(SDValue(N, NumVecs), SDValue(VLdLn, 1));
if (isUpdating)
  ReplaceUses(SDValue(N, NumVecs + 1), SDValue(VLdLn, 2));
CurDAG->RemoveDeadNode(N);
}

		template <typename SDValueVector>
		void ARMDAGToDAGISel::MVE_Predicated(SDValueVector &Ops, SDLoc Loc,
		SDValue PredicateMask) {
		Ops.push_back(CurDAG->getTargetConstant(ARMVCC::Then, Loc, MVT::i32));
		Ops.push_back(PredicateMask);
		}

		template <typename SDValueVector>
		void ARMDAGToDAGISel::MVE_Predicated(SDValueVector &Ops, SDLoc Loc,
		SDValue PredicateMask, SDValue Inactive) {
		Ops.push_back(CurDAG->getTargetConstant(ARMVCC::Then, Loc, MVT::i32));
		Ops.push_back(PredicateMask);
		Ops.push_back(Inactive);
		}

		template <typename SDValueVector>
		void ARMDAGToDAGISel::MVE_Unpredicated(SDValueVector &Ops, SDLoc Loc) {
		Ops.push_back(CurDAG->getTargetConstant(ARMVCC::None, Loc, MVT::i32));
		Ops.push_back(CurDAG->getRegister(0, MVT::i32));
		}

		template <typename SDValueVector>
		void ARMDAGToDAGISel::MVE_Unpredicated(SDValueVector &Ops, SDLoc Loc,
		EVT InactiveTy) {
		Ops.push_back(CurDAG->getTargetConstant(ARMVCC::None, Loc, MVT::i32));
		Ops.push_back(CurDAG->getRegister(0, MVT::i32));
		Ops.push_back(SDValue(
		CurDAG->getMachineNode(TargetOpcode::IMPLICIT_DEF, Loc, InactiveTy), 0));
		}

		void ARMDAGToDAGISel::SelectMVE_VLD(SDNode *N, unsigned NumVecs,
		const uint16_t *const *Opcodes) {
		EVT VT = N->getValueType(0);
		SDLoc Loc(N);

		const uint16_t *OurOpcodes;
		switch (VT.getVectorElementType().getSizeInBits()) {
		case 8:
		OurOpcodes = Opcodes[0];
		break;
		case 16:
		OurOpcodes = Opcodes[1];
		break;
		case 32:
		OurOpcodes = Opcodes[2];
		break;
		default:
		llvm_unreachable("bad vector element size in SelectMVE_VLD");
		}

		EVT DataTy = EVT::getVectorVT(*CurDAG->getContext(), MVT::i64, NumVecs * 2);
		EVT ResultTys[] = {DataTy, MVT::Other};

		auto Data = SDValue(
		CurDAG->getMachineNode(TargetOpcode::IMPLICIT_DEF, Loc, DataTy), 0);
		SDValue Chain = N->getOperand(0);
		for (unsigned Stage = 0; Stage < NumVecs; ++Stage) {
		SDValue Ops[] = {Data, N->getOperand(2), Chain};
		auto LoadInst =
		CurDAG->getMachineNode(OurOpcodes[Stage], Loc, ResultTys, Ops);
		Data = SDValue(LoadInst, 0);
		Chain = SDValue(LoadInst, 1);
		}

		for (unsigned i = 0; i < NumVecs; i++)
		ReplaceUses(SDValue(N, i),
		CurDAG->getTargetExtractSubreg(ARM::qsub_0 + i, Loc, VT, Data));
		ReplaceUses(SDValue(N, NumVecs), Chain);
		CurDAG->RemoveDeadNode(N);
		}

		void ARMDAGToDAGISel::SelectMVE_WB(SDNode *N, const uint16_t *Opcodes,
		bool Predicated) {
		SDLoc Loc(N);
		SmallVector<SDValue, 8> Ops;

		uint16_t Opcode;
		switch (N->getValueType(1).getVectorElementType().getSizeInBits()) {
		case 32:
		Opcode = Opcodes[0];
		break;
		case 64:
		Opcode = Opcodes[1];
		break;
		default:
		llvm_unreachable("bad vector element size in SelectMVE_WB");
		}

		Ops.push_back(N->getOperand(2)); // vector of base addresses

		int32_t ImmValue = cast<ConstantSDNode>(N->getOperand(3))->getZExtValue();
		Ops.push_back(getI32Imm(ImmValue, Loc)); // immediate offset

		if (Predicated)
		MVE_Predicated(Ops, Loc, N->getOperand(4));
		else
		MVE_Unpredicated(Ops, Loc);

		Ops.push_back(N->getOperand(0)); // chain

		CurDAG->SelectNodeTo(N, Opcode, N->getVTList(), makeArrayRef(Ops));
		}

		void ARMDAGToDAGISel::SelectMVE_LongShift(SDNode *N, uint16_t Opcode,
		bool Immediate) {
		SDLoc Loc(N);
		SmallVector<SDValue, 8> Ops;

		// Two 32-bit halves of the value to be shifted
		Ops.push_back(N->getOperand(1));
		Ops.push_back(N->getOperand(2));

		// The shift count
		if (Immediate) {
		int32_t ImmValue = cast<ConstantSDNode>(N->getOperand(3))->getZExtValue();
		Ops.push_back(getI32Imm(ImmValue, Loc)); // immediate offset
		} else {
		Ops.push_back(N->getOperand(3));
		}

		// MVE scalar shifts are IT-predicable, so include the standard
		// predicate arguments.
		Ops.push_back(getAL(CurDAG, Loc));
		Ops.push_back(CurDAG->getRegister(0, MVT::i32));

		CurDAG->SelectNodeTo(N, Opcode, N->getVTList(), makeArrayRef(Ops));
		}

		void ARMDAGToDAGISel::SelectMVE_VADCSBC(SDNode *N, uint16_t OpcodeWithCarry,
		uint16_t OpcodeWithNoCarry,
		bool Add, bool Predicated) {
		SDLoc Loc(N);
		SmallVector<SDValue, 8> Ops;
		uint16_t Opcode;

		unsigned FirstInputOp = Predicated ? 2 : 1;

		// Two input vectors and the input carry flag
		Ops.push_back(N->getOperand(FirstInputOp));
		Ops.push_back(N->getOperand(FirstInputOp + 1));
		SDValue CarryIn = N->getOperand(FirstInputOp + 2);
		ConstantSDNode *CarryInConstant = dyn_cast<ConstantSDNode>(CarryIn);
		uint32_t CarryMask = 1 << 29;
		uint32_t CarryExpected = Add ? 0 : CarryMask;
		if (CarryInConstant &&
		(CarryInConstant->getZExtValue() & CarryMask) == CarryExpected) {
		Opcode = OpcodeWithNoCarry;
		} else {
		Ops.push_back(CarryIn);
		Opcode = OpcodeWithCarry;
		}

		if (Predicated)
		MVE_Predicated(Ops, Loc,
		N->getOperand(FirstInputOp + 3), // predicate
		N->getOperand(FirstInputOp - 1)); // inactive
		else
		MVE_Unpredicated(Ops, Loc, N->getValueType(0));

		CurDAG->SelectNodeTo(N, Opcode, N->getVTList(), makeArrayRef(Ops));
		}
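
The opcode choice in SelectMVE_VADCSBC can be isolated as a small predicate. The following is a standalone sketch rather than the patch's code: `canUseInitialCarryForm` is a hypothetical name, and the function only mirrors the test on bit 29 of the carry-in operand shown above (clear for an add, set for a subtract).

```cpp
#include <cassert>
#include <cstdint>

// Bit 29 is the position that SelectMVE_VADCSBC tests in the carry-in value.
constexpr uint32_t CarryMask = 1u << 29;

// Hypothetical standalone predicate. Returns true when the carry-in is a
// compile-time constant whose carry bit already has the value the
// no-carry-input opcode would assume: clear for add, set for subtract.
inline bool canUseInitialCarryForm(bool CarryIsConstant, uint32_t CarryIn,
                                   bool IsAdd) {
  const uint32_t Expected = IsAdd ? 0u : CarryMask;
  return CarryIsConstant && (CarryIn & CarryMask) == Expected;
}
```

When the predicate holds, the selector can drop the carry-in operand and pick the `OpcodeWithNoCarry` variant; otherwise it keeps the operand and uses `OpcodeWithCarry`.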

void ARMDAGToDAGISel::SelectVLDDup(SDNode *N, bool IsIntrinsic,
                                   bool isUpdating, unsigned NumVecs,
                                   const uint16_t *DOpcodes,
                                   const uint16_t *QOpcodes0,
                                   const uint16_t *QOpcodes1) {
assert(NumVecs >= 1 && NumVecs <= 4 && "VLDDup NumVecs out-of-range");
SDLoc dl(N);

[...]

case Intrinsic::arm_neon_vst4lane: {
static const uint16_t DOpcodes[] = { ARM::VST4LNd8Pseudo,
                                     ARM::VST4LNd16Pseudo,
                                     ARM::VST4LNd32Pseudo };
static const uint16_t QOpcodes[] = { ARM::VST4LNq16Pseudo,
                                     ARM::VST4LNq32Pseudo };
SelectVLDSTLane(N, false, false, 4, DOpcodes, QOpcodes);
return;
}

		case Intrinsic::arm_mve_vld2q: {
		static const uint16_t Opcodes8[] = {ARM::MVE_VLD20_8, ARM::MVE_VLD21_8};
		static const uint16_t Opcodes16[] = {ARM::MVE_VLD20_16,
		ARM::MVE_VLD21_16};
		static const uint16_t Opcodes32[] = {ARM::MVE_VLD20_32,
		ARM::MVE_VLD21_32};
		static const uint16_t *const Opcodes[] = {Opcodes8, Opcodes16, Opcodes32};
		SelectMVE_VLD(N, 2, Opcodes);
		return;
		}

		case Intrinsic::arm_mve_vld4q: {
		static const uint16_t Opcodes8[] = {ARM::MVE_VLD40_8, ARM::MVE_VLD41_8,
		ARM::MVE_VLD42_8, ARM::MVE_VLD43_8};
		static const uint16_t Opcodes16[] = {ARM::MVE_VLD40_16, ARM::MVE_VLD41_16,
		ARM::MVE_VLD42_16,
		ARM::MVE_VLD43_16};
		static const uint16_t Opcodes32[] = {ARM::MVE_VLD40_32, ARM::MVE_VLD41_32,
		ARM::MVE_VLD42_32,
		ARM::MVE_VLD43_32};
		static const uint16_t *const Opcodes[] = {Opcodes8, Opcodes16, Opcodes32};
		SelectMVE_VLD(N, 4, Opcodes);
		return;
		}

		case Intrinsic::arm_mve_vldr_gather_base_wb:
		case Intrinsic::arm_mve_vldr_gather_base_wb_predicated:
		static const uint16_t Opcodes[] = {ARM::MVE_VLDRWU32_qi_pre,
		ARM::MVE_VLDRDU64_qi_pre};
		SelectMVE_WB(N, Opcodes,
		IntNo == Intrinsic::arm_mve_vldr_gather_base_wb_predicated);
		return;
}
break;
}

		case ISD::INTRINSIC_WO_CHAIN: {
		unsigned IntNo = cast<ConstantSDNode>(N->getOperand(0))->getZExtValue();
		switch (IntNo) {
		default:
		break;

		case Intrinsic::arm_mve_urshrl:
		SelectMVE_LongShift(N, ARM::MVE_URSHRL, true);
		return;

		case Intrinsic::arm_mve_vadc:
		case Intrinsic::arm_mve_vadc_predicated:
		SelectMVE_VADCSBC(N, ARM::MVE_VADC, ARM::MVE_VADCI, true,
		IntNo == Intrinsic::arm_mve_vadc_predicated);
		return;
		}

		break;
		}

case ISD::ATOMIC_CMP_SWAP:
SelectCMP_SWAP(N);
return;
}

SelectCode(N);
}

[...]

llvm/lib/Target/ARM/ARMInstrMVE.td

class mve_addr_q_shift<int shift> : MemOperand {
  // Can be printed same way as other reg + imm operands
  let PrintMethod = "printT2AddrModeImm8Operand<false>";
  let ParserMatchClass =
    !cast<AsmOperandClass>("MemRegQS"#shift#"OffsetAsmOperand");
  let DecoderMethod = "DecodeMveAddrModeQ<"#shift#">";
  let MIOperandInfo = (ops MQPR:$base, i32imm:$imm);
}

		// This class is effectively just a sort of 'subroutine', working around the
		// fact that Tablegen has no explicit syntax for function definition. If you
		// have a vector type like v8i16, and you want the corresponding predicate type
		// that should be used in IR intrinsics whose source vector is of that type,
		// you can refer to mkpred<MyVectorType>.p to compute it automatically, e.g. in
		// foreach or multiclass definitions.
		class mkpred<ValueType VT> {
		ValueType p = !cond(!eq(VT.Value, v16i8.Value): v16i1,
		!eq(VT.Value, v8i16.Value): v8i1,
		!eq(VT.Value, v8f16.Value): v8i1,
		!eq(VT.Value, v4i32.Value): v4i1,
		!eq(VT.Value, v4f32.Value): v4i1,
		// For vectors of 2 values, use v4i1 instead of v2i1 for
		// the moment: MVE codegen doesn't support doing all the
		// auxiliary operations on v2i1 such as vector shuffles,
		// and also, there's no MVE compare instruction that will
		// generate v2i1 directly. We could rethink this later if
		// we have a better idea.
		!eq(VT.Value, v2i64.Value): v4i1,
		!eq(VT.Value, v2f64.Value): v4i1);
		}

// --------- Start of base classes for the instructions themselves		// --------- Start of base classes for the instructions themselves

class MVE_MI<dag oops, dag iops, InstrItinClass itin, string asm,		class MVE_MI<dag oops, dag iops, InstrItinClass itin, string asm,
string ops, string cstr, list<dag> pattern>		string ops, string cstr, list<dag> pattern>
: Thumb2XI<oops, iops, AddrModeNone, 4, itin, !strconcat(asm, "\t", ops), cstr,		: Thumb2XI<oops, iops, AddrModeNone, 4, itin, !strconcat(asm, "\t", ops), cstr,
pattern>,		pattern>,
Requires<[HasMVEInt]> {		Requires<[HasMVEInt]> {
let D = MVEDomain;		let D = MVEDomain;
let DecoderNamespace = "MVE";		let DecoderNamespace = "MVE";
}		}

// MVE_p is used for most predicated instructions, to add the cluster		// MVE_p is used for most predicated instructions, to add the cluster
// of input operands that provides the VPT suffix (none, T or E) and		// of input operands that provides the VPT suffix (none, T or E) and
		dmgreen [Not Done]: Maybe call these MVEv16i8? Or v16i8_t or v16i8info or something? Otherwise it looks very useful.
		simon_tatham (author) [Done]: I copied the `v16i8_info` naming system from the similar x86 case you pointed out to me. But I can call them something with "MVE" in the name if you prefer, sure.
		dmgreen [Not Done]: Ah righteo. The underscore threw me off. Whatever you think is best. MVE might not be the worst idea, just in case someone decides to do something similar in NEON.
// the input predicate register.		// the input predicate register.
class MVE_p<dag oops, dag iops, InstrItinClass itin, string iname,		class MVE_p<dag oops, dag iops, InstrItinClass itin, string iname,
string suffix, string ops, vpred_ops vpred, string cstr,		string suffix, string ops, vpred_ops vpred, string cstr,
list<dag> pattern=[]>		list<dag> pattern=[]>
: MVE_MI<oops, !con(iops, (ins vpred:$vp)), itin,		: MVE_MI<oops, !con(iops, (ins vpred:$vp)), itin,
// If the instruction has a suffix, like vadd.f32, then the		// If the instruction has a suffix, like vadd.f32, then the
// VPT predication suffix goes before the dot, so the full		// VPT predication suffix goes before the dot, so the full
// name has to be "vadd${vp}.f32".		// name has to be "vadd${vp}.f32".
[... 363 lines hidden ...]	multiclass MVE_VMINMAXV_ty<string iname, bit bit_7, list<dag> pattern=[]> {
def s32 : MVE_VMINMAXV<iname, "s32", 0b0, 0b10, 0b1, bit_7>;		def s32 : MVE_VMINMAXV<iname, "s32", 0b0, 0b10, 0b1, bit_7>;
def u8 : MVE_VMINMAXV<iname, "u8", 0b1, 0b00, 0b1, bit_7>;		def u8 : MVE_VMINMAXV<iname, "u8", 0b1, 0b00, 0b1, bit_7>;
def u16 : MVE_VMINMAXV<iname, "u16", 0b1, 0b01, 0b1, bit_7>;		def u16 : MVE_VMINMAXV<iname, "u16", 0b1, 0b01, 0b1, bit_7>;
def u32 : MVE_VMINMAXV<iname, "u32", 0b1, 0b10, 0b1, bit_7>;		def u32 : MVE_VMINMAXV<iname, "u32", 0b1, 0b10, 0b1, bit_7>;
}		}

defm MVE_VMINV : MVE_VMINMAXV_ty<"vminv", 0b1>;		defm MVE_VMINV : MVE_VMINMAXV_ty<"vminv", 0b1>;
defm MVE_VMAXV : MVE_VMINMAXV_ty<"vmaxv", 0b0>;		defm MVE_VMAXV : MVE_VMINMAXV_ty<"vmaxv", 0b0>;

		dmgreen [Not Done]: I feel like we should come up with a style and try to stick with it. The adds/subs below add VT to the existing instructions and use it in the new Patterns, mixed in with the old patterns. These ones add the intrinsics to the multiclass so the top level can include the pattern (but there are also patterns outside, below). I think there's value in making this somewhat structured, if we can.
		let Predicates = [HasMVEInt] in {
		foreach vtype = [v16i8, v8i16, v4i32] in {
		def : Pat<(i32 (int_arm_mve_minv_s (i32 rGPR:$prev), (vtype MQPR:$vec))),
		(i32 (MVE_VMINVs8 (i32 rGPR:$prev), (vtype MQPR:$vec)))>;
		dmgreen [Done]: Are the instructions correct here? Should they be s8/s16/s32?
		def : Pat<(i32 (int_arm_mve_minv_u (i32 rGPR:$prev), (vtype MQPR:$vec))),
		(i32 (MVE_VMINVu8 (i32 rGPR:$prev), (vtype MQPR:$vec)))>;
		}
		}

multiclass MVE_VMINMAXAV_ty<string iname, bit bit_7, list<dag> pattern=[]> {		multiclass MVE_VMINMAXAV_ty<string iname, bit bit_7, list<dag> pattern=[]> {
def s8 : MVE_VMINMAXV<iname, "s8", 0b0, 0b00, 0b0, bit_7>;		def s8 : MVE_VMINMAXV<iname, "s8", 0b0, 0b00, 0b0, bit_7>;
def s16 : MVE_VMINMAXV<iname, "s16", 0b0, 0b01, 0b0, bit_7>;		def s16 : MVE_VMINMAXV<iname, "s16", 0b0, 0b01, 0b0, bit_7>;
def s32 : MVE_VMINMAXV<iname, "s32", 0b0, 0b10, 0b0, bit_7>;		def s32 : MVE_VMINMAXV<iname, "s32", 0b0, 0b10, 0b0, bit_7>;
}		}

defm MVE_VMINAV : MVE_VMINMAXAV_ty<"vminav", 0b1>;		defm MVE_VMINAV : MVE_VMINMAXAV_ty<"vminav", 0b1>;
defm MVE_VMAXAV : MVE_VMINMAXAV_ty<"vmaxav", 0b0>;		defm MVE_VMAXAV : MVE_VMINMAXAV_ty<"vmaxav", 0b0>;
[... 751 lines hidden ...]
def MVE_VQDMULHi16 : MVE_VQDMULH<"s16", 0b01>;		def MVE_VQDMULHi16 : MVE_VQDMULH<"s16", 0b01>;
def MVE_VQDMULHi32 : MVE_VQDMULH<"s32", 0b10>;		def MVE_VQDMULHi32 : MVE_VQDMULH<"s32", 0b10>;

def MVE_VQRDMULHi8 : MVE_VQRDMULH<"s8", 0b00>;		def MVE_VQRDMULHi8 : MVE_VQRDMULH<"s8", 0b00>;
def MVE_VQRDMULHi16 : MVE_VQRDMULH<"s16", 0b01>;		def MVE_VQRDMULHi16 : MVE_VQRDMULH<"s16", 0b01>;
def MVE_VQRDMULHi32 : MVE_VQRDMULH<"s32", 0b10>;		def MVE_VQRDMULHi32 : MVE_VQRDMULH<"s32", 0b10>;

class MVE_VADDSUB<string iname, string suffix, bits<2> size, bit subtract,		class MVE_VADDSUB<string iname, string suffix, bits<2> size, bit subtract,
list<dag> pattern=[]>		ValueType VT_, list<dag> pattern=[]>
: MVE_int<iname, suffix, size, pattern> {		: MVE_int<iname, suffix, size, pattern> {

let Inst{28} = subtract;		let Inst{28} = subtract;
let Inst{25-23} = 0b110;		let Inst{25-23} = 0b110;
let Inst{16} = 0b0;		let Inst{16} = 0b0;
let Inst{12-8} = 0b01000;		let Inst{12-8} = 0b01000;
let Inst{4} = 0b0;		let Inst{4} = 0b0;
let Inst{0} = 0b0;		let Inst{0} = 0b0;

		ValueType VT = VT_;
}		}

class MVE_VADD<string suffix, bits<2> size, list<dag> pattern=[]>		class MVE_VADD<string suffix, bits<2> size, ValueType VT>
: MVE_VADDSUB<"vadd", suffix, size, 0b0, pattern>;		: MVE_VADDSUB<"vadd", suffix, size, 0b0, VT>;
class MVE_VSUB<string suffix, bits<2> size, list<dag> pattern=[]>		class MVE_VSUB<string suffix, bits<2> size, ValueType VT>
: MVE_VADDSUB<"vsub", suffix, size, 0b1, pattern>;		: MVE_VADDSUB<"vsub", suffix, size, 0b1, VT>;

def MVE_VADDi8 : MVE_VADD<"i8", 0b00>;
def MVE_VADDi16 : MVE_VADD<"i16", 0b01>;
def MVE_VADDi32 : MVE_VADD<"i32", 0b10>;

let Predicates = [HasMVEInt] in {		def MVE_VADDi8 : MVE_VADD<"i8", 0b00, v16i8>;
def : Pat<(v16i8 (add (v16i8 MQPR:$val1), (v16i8 MQPR:$val2))),		def MVE_VADDi16 : MVE_VADD<"i16", 0b01, v8i16>;
(v16i8 (MVE_VADDi8 (v16i8 MQPR:$val1), (v16i8 MQPR:$val2)))>;		def MVE_VADDi32 : MVE_VADD<"i32", 0b10, v4i32>;
def : Pat<(v8i16 (add (v8i16 MQPR:$val1), (v8i16 MQPR:$val2))),
(v8i16 (MVE_VADDi16 (v8i16 MQPR:$val1), (v8i16 MQPR:$val2)))>;
def : Pat<(v4i32 (add (v4i32 MQPR:$val1), (v4i32 MQPR:$val2))),
(v4i32 (MVE_VADDi32 (v4i32 MQPR:$val1), (v4i32 MQPR:$val2)))>;
}

def MVE_VSUBi8 : MVE_VSUB<"i8", 0b00>;		def MVE_VSUBi8 : MVE_VSUB<"i8", 0b00, v16i8>;
def MVE_VSUBi16 : MVE_VSUB<"i16", 0b01>;		def MVE_VSUBi16 : MVE_VSUB<"i16", 0b01, v8i16>;
def MVE_VSUBi32 : MVE_VSUB<"i32", 0b10>;		def MVE_VSUBi32 : MVE_VSUB<"i32", 0b10, v4i32>;

let Predicates = [HasMVEInt] in {		let Predicates = [HasMVEInt] in {
def : Pat<(v16i8 (sub (v16i8 MQPR:$val1), (v16i8 MQPR:$val2))),		foreach instr = [MVE_VADDi8, MVE_VADDi16, MVE_VADDi32] in
(v16i8 (MVE_VSUBi8 (v16i8 MQPR:$val1), (v16i8 MQPR:$val2)))>;		foreach vtype = [instr.VT] in
def : Pat<(v8i16 (sub (v8i16 MQPR:$val1), (v8i16 MQPR:$val2))),		foreach ptype = [mkpred<vtype>.p] in {
(v8i16 (MVE_VSUBi16 (v8i16 MQPR:$val1), (v8i16 MQPR:$val2)))>;		def : Pat<(vtype (add (vtype MQPR:$Qm), (vtype MQPR:$Qn))),
		dmgreen [Done]: I feel like this deserves at least a little indentation. Maybe not for each level of for, but something to make it easier to parse. The last 2 levels are really just defining variables, right? And could be written in-place.
def : Pat<(v4i32 (sub (v4i32 MQPR:$val1), (v4i32 MQPR:$val2))),		(vtype (instr (vtype MQPR:$Qm), (vtype MQPR:$Qn)))>;
(v4i32 (MVE_VSUBi32 (v4i32 MQPR:$val1), (v4i32 MQPR:$val2)))>;		def : Pat<(vtype (int_arm_mve_add_predicated (vtype MQPR:$Qm),
		(vtype MQPR:$Qn),
		(ptype VCCR:$mask),
		(vtype MQPR:$inactive))),
		(vtype (instr (vtype MQPR:$Qm), (vtype MQPR:$Qn),
		(i32 1), (ptype VCCR:$mask),
		(vtype MQPR:$inactive)))>;
		}

		foreach instr = [MVE_VSUBi8, MVE_VSUBi16, MVE_VSUBi32] in
		foreach vtype = [instr.VT] in
		foreach ptype = [mkpred<vtype>.p] in {
		def : Pat<(vtype (sub (vtype MQPR:$Qm), (vtype MQPR:$Qn))),
		(vtype (instr (vtype MQPR:$Qm), (vtype MQPR:$Qn)))>;
		def : Pat<(vtype (int_arm_mve_sub_predicated (vtype MQPR:$Qm),
		(vtype MQPR:$Qn),
		(ptype VCCR:$mask),
		(vtype MQPR:$inactive))),
		(vtype (instr (vtype MQPR:$Qm), (vtype MQPR:$Qn),
		(i32 1), (ptype VCCR:$mask),
		(vtype MQPR:$inactive)))>;
		}
}		}

class MVE_VQADDSUB<string iname, string suffix, bit U, bit subtract,		class MVE_VQADDSUB<string iname, string suffix, bit U, bit subtract,
bits<2> size, list<dag> pattern=[]>		bits<2> size, list<dag> pattern=[]>
: MVE_int<iname, suffix, size, pattern> {		: MVE_int<iname, suffix, size, pattern> {

let Inst{28} = U;		let Inst{28} = U;
let Inst{25-23} = 0b110;		let Inst{25-23} = 0b110;
[... 1,124 lines hidden ...]	class MVE_VCMLA<string suffix, bit size, list<dag> pattern=[]>
let Inst{7} = Qn{3};		let Inst{7} = Qn{3};
let Inst{4} = 0b0;		let Inst{4} = 0b0;
}		}

def MVE_VCMLAf16 : MVE_VCMLA<"f16", 0b0>;		def MVE_VCMLAf16 : MVE_VCMLA<"f16", 0b0>;
def MVE_VCMLAf32 : MVE_VCMLA<"f32", 0b1>;		def MVE_VCMLAf32 : MVE_VCMLA<"f32", 0b1>;

class MVE_VADDSUBFMA_fp<string iname, string suffix, bit size, bit bit_4,		class MVE_VADDSUBFMA_fp<string iname, string suffix, bit size, bit bit_4,
bit bit_8, bit bit_21, dag iops=(ins),		bit bit_8, bit bit_21, ValueType VT_, dag iops=(ins),
vpred_ops vpred=vpred_r, string cstr="",		vpred_ops vpred=vpred_r, string cstr="",
list<dag> pattern=[]>		list<dag> pattern=[]>
: MVEFloatArithNeon<iname, suffix, size, (outs MQPR:$Qd),		: MVEFloatArithNeon<iname, suffix, size, (outs MQPR:$Qd),
!con(iops, (ins MQPR:$Qn, MQPR:$Qm)), "$Qd, $Qn, $Qm",		!con(iops, (ins MQPR:$Qn, MQPR:$Qm)), "$Qd, $Qn, $Qm",
vpred, cstr, pattern> {		vpred, cstr, pattern> {
bits<4> Qd;		bits<4> Qd;
bits<4> Qn;		bits<4> Qn;

let Inst{28} = 0b0;		let Inst{28} = 0b0;
let Inst{25-23} = 0b110;		let Inst{25-23} = 0b110;
let Inst{22} = Qd{3};		let Inst{22} = Qd{3};
let Inst{21} = bit_21;		let Inst{21} = bit_21;
let Inst{19-17} = Qn{2-0};		let Inst{19-17} = Qn{2-0};
let Inst{15-13} = Qd{2-0};		let Inst{15-13} = Qd{2-0};
let Inst{11-9} = 0b110;		let Inst{11-9} = 0b110;
let Inst{8} = bit_8;		let Inst{8} = bit_8;
let Inst{7} = Qn{3};		let Inst{7} = Qn{3};
let Inst{4} = bit_4;		let Inst{4} = bit_4;

		ValueType VT = VT_;
}		}

def MVE_VFMAf32 : MVE_VADDSUBFMA_fp<"vfma", "f32", 0b0, 0b1, 0b0, 0b0,		def MVE_VFMAf32 : MVE_VADDSUBFMA_fp<"vfma", "f32", 0b0, 0b1, 0b0, 0b0, v4f32,
(ins MQPR:$Qd_src), vpred_n, "$Qd = $Qd_src">;		(ins MQPR:$Qd_src), vpred_n, "$Qd = $Qd_src">;
def MVE_VFMAf16 : MVE_VADDSUBFMA_fp<"vfma", "f16", 0b1, 0b1, 0b0, 0b0,		def MVE_VFMAf16 : MVE_VADDSUBFMA_fp<"vfma", "f16", 0b1, 0b1, 0b0, 0b0, v8f16,
(ins MQPR:$Qd_src), vpred_n, "$Qd = $Qd_src">;		(ins MQPR:$Qd_src), vpred_n, "$Qd = $Qd_src">;

def MVE_VFMSf32 : MVE_VADDSUBFMA_fp<"vfms", "f32", 0b0, 0b1, 0b0, 0b1,		def MVE_VFMSf32 : MVE_VADDSUBFMA_fp<"vfms", "f32", 0b0, 0b1, 0b0, 0b1, v4f32,
(ins MQPR:$Qd_src), vpred_n, "$Qd = $Qd_src">;		(ins MQPR:$Qd_src), vpred_n, "$Qd = $Qd_src">;
def MVE_VFMSf16 : MVE_VADDSUBFMA_fp<"vfms", "f16", 0b1, 0b1, 0b0, 0b1,		def MVE_VFMSf16 : MVE_VADDSUBFMA_fp<"vfms", "f16", 0b1, 0b1, 0b0, 0b1, v8f16,
(ins MQPR:$Qd_src), vpred_n, "$Qd = $Qd_src">;		(ins MQPR:$Qd_src), vpred_n, "$Qd = $Qd_src">;

let Predicates = [HasMVEFloat, UseFusedMAC] in {		let Predicates = [HasMVEFloat, UseFusedMAC] in {
def : Pat<(v8f16 (fadd (v8f16 MQPR:$src1),		def : Pat<(v8f16 (fadd (v8f16 MQPR:$src1),
(fmul (v8f16 MQPR:$src2),		(fmul (v8f16 MQPR:$src2),
(v8f16 MQPR:$src3)))),		(v8f16 MQPR:$src3)))),
(v8f16 (MVE_VFMAf16 $src1, $src2, $src3))>;		(v8f16 (MVE_VFMAf16 $src1, $src2, $src3))>;
def : Pat<(v4f32 (fadd (v4f32 MQPR:$src1),		def : Pat<(v4f32 (fadd (v4f32 MQPR:$src1),
[... 14 lines hidden ...]
let Predicates = [HasMVEFloat] in {		let Predicates = [HasMVEFloat] in {
def : Pat<(v8f16 (fma (v8f16 MQPR:$src1), (v8f16 MQPR:$src2), (v8f16 MQPR:$src3))),		def : Pat<(v8f16 (fma (v8f16 MQPR:$src1), (v8f16 MQPR:$src2), (v8f16 MQPR:$src3))),
(v8f16 (MVE_VFMAf16 $src3, $src1, $src2))>;		(v8f16 (MVE_VFMAf16 $src3, $src1, $src2))>;
def : Pat<(v4f32 (fma (v4f32 MQPR:$src1), (v4f32 MQPR:$src2), (v4f32 MQPR:$src3))),		def : Pat<(v4f32 (fma (v4f32 MQPR:$src1), (v4f32 MQPR:$src2), (v4f32 MQPR:$src3))),
(v4f32 (MVE_VFMAf32 $src3, $src1, $src2))>;		(v4f32 (MVE_VFMAf32 $src3, $src1, $src2))>;
}		}


def MVE_VADDf32 : MVE_VADDSUBFMA_fp<"vadd", "f32", 0b0, 0b0, 0b1, 0b0>;		def MVE_VADDf32 : MVE_VADDSUBFMA_fp<"vadd", "f32", 0b0, 0b0, 0b1, 0b0, v4f32>;
def MVE_VADDf16 : MVE_VADDSUBFMA_fp<"vadd", "f16", 0b1, 0b0, 0b1, 0b0>;		def MVE_VADDf16 : MVE_VADDSUBFMA_fp<"vadd", "f16", 0b1, 0b0, 0b1, 0b0, v8f16>;

let Predicates = [HasMVEFloat] in {
def : Pat<(v4f32 (fadd (v4f32 MQPR:$val1), (v4f32 MQPR:$val2))),
(v4f32 (MVE_VADDf32 (v4f32 MQPR:$val1), (v4f32 MQPR:$val2)))>;
def : Pat<(v8f16 (fadd (v8f16 MQPR:$val1), (v8f16 MQPR:$val2))),
(v8f16 (MVE_VADDf16 (v8f16 MQPR:$val1), (v8f16 MQPR:$val2)))>;
}

		dmgreen [Not Done]: Can this move into MVE_VADDSUBFMA_fp? I'm pretty sure VFMA should be fine there too. Feel free to leave that for @samparker though.
def MVE_VSUBf32 : MVE_VADDSUBFMA_fp<"vsub", "f32", 0b0, 0b0, 0b1, 0b1>;		def MVE_VSUBf32 : MVE_VADDSUBFMA_fp<"vsub", "f32", 0b0, 0b0, 0b1, 0b1, v4f32>;
def MVE_VSUBf16 : MVE_VADDSUBFMA_fp<"vsub", "f16", 0b1, 0b0, 0b1, 0b1>;		def MVE_VSUBf16 : MVE_VADDSUBFMA_fp<"vsub", "f16", 0b1, 0b0, 0b1, 0b1, v8f16>;

let Predicates = [HasMVEFloat] in {		let Predicates = [HasMVEFloat] in {
def : Pat<(v4f32 (fsub (v4f32 MQPR:$val1), (v4f32 MQPR:$val2))),		foreach instr = [MVE_VADDf16, MVE_VADDf32] in
(v4f32 (MVE_VSUBf32 (v4f32 MQPR:$val1), (v4f32 MQPR:$val2)))>;		foreach vtype = [instr.VT] in
def : Pat<(v8f16 (fsub (v8f16 MQPR:$val1), (v8f16 MQPR:$val2))),		foreach ptype = [mkpred<vtype>.p] in {
(v8f16 (MVE_VSUBf16 (v8f16 MQPR:$val1), (v8f16 MQPR:$val2)))>;		def : Pat<(vtype (fadd (vtype MQPR:$Qm), (vtype MQPR:$Qn))),
		dmgreen [Not Done]: Little bit of indenting, please.
		(vtype (instr (vtype MQPR:$Qm), (vtype MQPR:$Qn)))>;
		def : Pat<(vtype (int_arm_mve_add_predicated (vtype MQPR:$Qm),
		(vtype MQPR:$Qn),
		(ptype VCCR:$mask),
		(vtype MQPR:$inactive))),
		(vtype (instr (vtype MQPR:$Qm), (vtype MQPR:$Qn),
		(i32 1), (ptype VCCR:$mask),
		(vtype MQPR:$inactive)))>;
		}

		foreach instr = [MVE_VSUBf16, MVE_VSUBf32] in
		foreach vtype = [instr.VT] in
		foreach ptype = [mkpred<vtype>.p] in {
		def : Pat<(vtype (fsub (vtype MQPR:$Qm), (vtype MQPR:$Qn))),
		(vtype (instr (vtype MQPR:$Qm), (vtype MQPR:$Qn)))>;
		def : Pat<(vtype (int_arm_mve_sub_predicated (vtype MQPR:$Qm),
		(vtype MQPR:$Qn),
		(ptype VCCR:$mask),
		(vtype MQPR:$inactive))),
		(vtype (instr (vtype MQPR:$Qm), (vtype MQPR:$Qn),
		(i32 1), (ptype VCCR:$mask),
		(vtype MQPR:$inactive)))>;
		}
}		}

class MVE_VCADD<string suffix, bit size, list<dag> pattern=[]>		class MVE_VCADD<string suffix, bit size, list<dag> pattern=[]>
: MVEFloatArithNeon<"vcadd", suffix, size, (outs MQPR:$Qd),		: MVEFloatArithNeon<"vcadd", suffix, size, (outs MQPR:$Qd),
(ins MQPR:$Qn, MQPR:$Qm, complexrotateopodd:$rot),		(ins MQPR:$Qn, MQPR:$Qm, complexrotateopodd:$rot),
"$Qd, $Qn, $Qm, $rot", vpred_r, "", pattern> {		"$Qd, $Qn, $Qm, $rot", vpred_r, "", pattern> {
bits<4> Qd;		bits<4> Qd;
bits<4> Qn;		bits<4> Qn;
[... 737 lines hidden ...]
multiclass MVE_VCVT_ff_halves<string suffix, bit op> {		multiclass MVE_VCVT_ff_halves<string suffix, bit op> {
def bh : MVE_VCVT_ff<"vcvtb", suffix, op, 0b0>;		def bh : MVE_VCVT_ff<"vcvtb", suffix, op, 0b0>;
def th : MVE_VCVT_ff<"vcvtt", suffix, op, 0b1>;		def th : MVE_VCVT_ff<"vcvtt", suffix, op, 0b1>;
}		}

defm MVE_VCVTf16f32 : MVE_VCVT_ff_halves<"f16.f32", 0b0>;		defm MVE_VCVTf16f32 : MVE_VCVT_ff_halves<"f16.f32", 0b0>;
defm MVE_VCVTf32f16 : MVE_VCVT_ff_halves<"f32.f16", 0b1>;		defm MVE_VCVTf32f16 : MVE_VCVT_ff_halves<"f32.f16", 0b1>;

		let Predicates = [HasMVEFloat] in {
		def : Pat<(v8f16 (int_arm_mve_fltnarrow (v8f16 MQPR:$Qd_src), (v4f32 MQPR:$Qm), (i32 0))),
		(v8f16 (MVE_VCVTf16f32bh (v8f16 MQPR:$Qd_src), (v4f32 MQPR:$Qm)))>;
		def : Pat<(v8f16 (int_arm_mve_fltnarrow (v8f16 MQPR:$Qd_src), (v4f32 MQPR:$Qm), (i32 1))),
		(v8f16 (MVE_VCVTf16f32th (v8f16 MQPR:$Qd_src), (v4f32 MQPR:$Qm)))>;
		def : Pat<(v8f16 (int_arm_mve_fltnarrow_predicated (v8f16 MQPR:$Qd_src), (v4f32 MQPR:$Qm), (i32 0), (v4i1 VCCR:$mask))),
		(v8f16 (MVE_VCVTf16f32bh (v8f16 MQPR:$Qd_src), (v4f32 MQPR:$Qm), (i32 1), (v4i1 VCCR:$mask)))>;
		def : Pat<(v8f16 (int_arm_mve_fltnarrow_predicated (v8f16 MQPR:$Qd_src), (v4f32 MQPR:$Qm), (i32 1), (v4i1 VCCR:$mask))),
		(v8f16 (MVE_VCVTf16f32th (v8f16 MQPR:$Qd_src), (v4f32 MQPR:$Qm), (i32 1), (v4i1 VCCR:$mask)))>;
		}

class MVE_VxCADD<string iname, string suffix, bits<2> size, bit halve,		class MVE_VxCADD<string iname, string suffix, bits<2> size, bit halve,
list<dag> pattern=[]>		list<dag> pattern=[]>
: MVE_qDest_qSrc<iname, suffix, (outs MQPR:$Qd),		: MVE_qDest_qSrc<iname, suffix, (outs MQPR:$Qd),
(ins MQPR:$Qn, MQPR:$Qm, complexrotateopodd:$rot),		(ins MQPR:$Qn, MQPR:$Qm, complexrotateopodd:$rot),
"$Qd, $Qn, $Qm, $rot", vpred_r, "",		"$Qd, $Qn, $Qm, $rot", vpred_r, "",
pattern> {		pattern> {
bits<4> Qn;		bits<4> Qn;
bit rot;		bit rot;
[... 627 lines hidden ...]	def "MVE_VLD" # n.nvecs # stage # "_" # s.lanesize # wb.id_suffix
: MVE_vld24_base<n, stage, s.sizebits, wb,		: MVE_vld24_base<n, stage, s.sizebits, wb,
"vld" # n.nvecs # stage # "." # s.lanesize>;		"vld" # n.nvecs # stage # "." # s.lanesize>;

def "MVE_VST" # n.nvecs # stage # "_" # s.lanesize # wb.id_suffix		def "MVE_VST" # n.nvecs # stage # "_" # s.lanesize # wb.id_suffix
: MVE_vst24_base<n, stage, s.sizebits, wb,		: MVE_vst24_base<n, stage, s.sizebits, wb,
"vst" # n.nvecs # stage # "." # s.lanesize>;		"vst" # n.nvecs # stage # "." # s.lanesize>;
}		}

		multiclass MVE_vst24_patterns<int lanesize, ValueType VT> {
		foreach stage = [0,1] in
		def : Pat<(int_arm_mve_vst2q i32:$addr,
		(VT MQPR:$v0), (VT MQPR:$v1), (i32 stage)),
		(!cast<Instruction>("MVE_VST2"#stage#"_"#lanesize)
		(REG_SEQUENCE QQPR, VT:$v0, qsub_0, VT:$v1, qsub_1),
		t2_addr_offset_none:$addr)>;

		foreach stage = [0,1,2,3] in
		def : Pat<(int_arm_mve_vst4q i32:$addr,
		(VT MQPR:$v0), (VT MQPR:$v1),
		(VT MQPR:$v2), (VT MQPR:$v3), (i32 stage)),
		(!cast<Instruction>("MVE_VST4"#stage#"_"#lanesize)
		(REG_SEQUENCE QQQQPR, VT:$v0, qsub_0, VT:$v1, qsub_1,
		VT:$v2, qsub_2, VT:$v3, qsub_3),
		t2_addr_offset_none:$addr)>;
		}
		defm : MVE_vst24_patterns<8, v16i8>;
		defm : MVE_vst24_patterns<16, v8i16>;
		defm : MVE_vst24_patterns<32, v4i32>;

// end of MVE interleaving load/store		// end of MVE interleaving load/store

// start of MVE predicable load/store		// start of MVE predicable load/store

// A parameter class for the direction of transfer.		// A parameter class for the direction of transfer.
class MVE_ldst_direction<bit b, dag Oo, dag Io, string c=""> {		class MVE_ldst_direction<bit b, dag Oo, dag Io, string c=""> {
bit load = b;		bit load = b;
dag Oops = Oo;		dag Oops = Oo;
[... 1,043 lines hidden ...]	let Predicates = [IsBE,HasMVEInt] in {

def : Pat<(v16i8 (bitconvert (v2f64 MQPR:$src))), (v16i8 (MVE_VREV64_8 MQPR:$src))>;		def : Pat<(v16i8 (bitconvert (v2f64 MQPR:$src))), (v16i8 (MVE_VREV64_8 MQPR:$src))>;
def : Pat<(v16i8 (bitconvert (v2i64 MQPR:$src))), (v16i8 (MVE_VREV64_8 MQPR:$src))>;		def : Pat<(v16i8 (bitconvert (v2i64 MQPR:$src))), (v16i8 (MVE_VREV64_8 MQPR:$src))>;
def : Pat<(v16i8 (bitconvert (v4f32 MQPR:$src))), (v16i8 (MVE_VREV32_8 MQPR:$src))>;		def : Pat<(v16i8 (bitconvert (v4f32 MQPR:$src))), (v16i8 (MVE_VREV32_8 MQPR:$src))>;
def : Pat<(v16i8 (bitconvert (v4i32 MQPR:$src))), (v16i8 (MVE_VREV32_8 MQPR:$src))>;		def : Pat<(v16i8 (bitconvert (v4i32 MQPR:$src))), (v16i8 (MVE_VREV32_8 MQPR:$src))>;
def : Pat<(v16i8 (bitconvert (v8f16 MQPR:$src))), (v16i8 (MVE_VREV16_8 MQPR:$src))>;		def : Pat<(v16i8 (bitconvert (v8f16 MQPR:$src))), (v16i8 (MVE_VREV16_8 MQPR:$src))>;
def : Pat<(v16i8 (bitconvert (v8i16 MQPR:$src))), (v16i8 (MVE_VREV16_8 MQPR:$src))>;		def : Pat<(v16i8 (bitconvert (v8i16 MQPR:$src))), (v16i8 (MVE_VREV16_8 MQPR:$src))>;
}		}

		foreach vi1 = [ v16i1, v8i1, v4i1 ] in {
		def : Pat<(vi1 (int_arm_mve_pred_i2v (i32 GPR:$pred))),
		dmgreen [Not Done]: There is an isel node called predicate_cast that does the same thing as this. It may be possible/beneficial to convert this earlier (during lowering, but I'm not sure that would do anything yet). The patterns are further up, near a lot of the vcmp patterns.
		(vi1 (COPY_TO_REGCLASS GPR:$pred, VCCR))>;
		def : Pat<(i32 (int_arm_mve_pred_v2i (vi1 VCCR:$pred))),
		(i32 (COPY_TO_REGCLASS VCCR:$pred, GPR))>;
		}
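
Note on the `mkpred` thread in the ARMInstrMVE.td diff above: the review contrasts the `!cond` lookup in this diff with per-type info records in the style of X86's `X86VectorVTInfo`. As a rough sketch of the second approach, the predicate type could be stored alongside the vector type. This is an illustration only, not the code from this revision: the names `MVEVecInfoSketch`, `Vec`, `Pred`, and the `Sketch_*` records are hypothetical, and the fragment assumes `llvm-tblgen` is run with `-I` pointing at LLVM's include directory so that `ValueTypes.td` can be found.

```tablegen
// Hypothetical sketch only; record and field names are assumptions,
// not taken from this revision.
include "llvm/CodeGen/ValueTypes.td"

// Bundle an MVE vector type with the predicate type that predicated
// intrinsics on that vector type would use.
class MVEVecInfoSketch<ValueType vec, ValueType pred> {
  ValueType Vec = vec;
  ValueType Pred = pred;
}

def Sketch_v16i8 : MVEVecInfoSketch<v16i8, v16i1>;
def Sketch_v8i16 : MVEVecInfoSketch<v8i16, v8i1>;
def Sketch_v8f16 : MVEVecInfoSketch<v8f16, v8i1>;
def Sketch_v4i32 : MVEVecInfoSketch<v4i32, v4i1>;
def Sketch_v4f32 : MVEVecInfoSketch<v4f32, v4i1>;
// Two-element vectors reuse v4i1, mirroring the choice made in mkpred.
def Sketch_v2i64 : MVEVecInfoSketch<v2i64, v4i1>;
def Sketch_v2f64 : MVEVecInfoSketch<v2f64, v4i1>;
```

With records like these, a pattern definition could in principle read the `Vec` and `Pred` fields of one record, rather than nesting single-element `foreach` loops to bind `vtype` and `ptype`.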

llvm/test/CodeGen/Thumb2/mve-intrinsics/scalar-shifts.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -mtriple=thumbv8.1m.main -mattr=+mve.fp -o - %s | FileCheck %s

				define arm_aapcs_vfpcc i64 @test_urshrl(i64 %value) {
				; CHECK-LABEL: test_urshrl:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: urshrl r0, r1, #6
				; CHECK-NEXT: bx lr
				entry:
				%0 = lshr i64 %value, 32
				%1 = trunc i64 %0 to i32
				%2 = trunc i64 %value to i32
				%3 = tail call { i32, i32 } @llvm.arm.mve.urshrl(i32 %2, i32 %1, i32 6)
				%4 = extractvalue { i32, i32 } %3, 1
				%5 = zext i32 %4 to i64
				%6 = shl nuw i64 %5, 32
				%7 = extractvalue { i32, i32 } %3, 0
				%8 = zext i32 %7 to i64
				%9 = or i64 %6, %8
				ret i64 %9
				}

				declare { i32, i32 } @llvm.arm.mve.urshrl(i32, i32, i32)

llvm/test/CodeGen/Thumb2/mve-intrinsics/vadc.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -mtriple=thumbv8.1m.main -mattr=+mve.fp -o - %s | FileCheck %s
				dmgreen [Done]: -verify-machineinstrs is probably worth putting on most of these tests.

				define arm_aapcs_vfpcc <4 x i32> @test_vadciq_s32(<4 x i32> %a, <4 x i32> %b, i32* %carry_out) {
				; CHECK-LABEL: test_vadciq_s32:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vadci.i32 q0, q0, q1
				; CHECK-NEXT: vmrs r1, fpscr_nzcvqc
				; CHECK-NEXT: ubfx r1, r1, #29, #1
				; CHECK-NEXT: str r1, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = tail call { <4 x i32>, i32 } @llvm.arm.mve.vadc.v4i32(<4 x i32> %a, <4 x i32> %b, i32 0)
				%1 = extractvalue { <4 x i32>, i32 } %0, 1
				%2 = lshr i32 %1, 29
				%3 = and i32 %2, 1
				store i32 %3, i32* %carry_out, align 4
				%4 = extractvalue { <4 x i32>, i32 } %0, 0
				ret <4 x i32> %4
				}

				declare { <4 x i32>, i32 } @llvm.arm.mve.vadc.v4i32(<4 x i32>, <4 x i32>, i32) #1
				dmgreen [Done]: The #1 can be removed?

				define arm_aapcs_vfpcc <4 x i32> @test_vadcq_u32(<4 x i32> %a, <4 x i32> %b, i32* %carry) {
				; CHECK-LABEL: test_vadcq_u32:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: ldr r1, [r0]
				; CHECK-NEXT: lsls r1, r1, #29
				; CHECK-NEXT: vmsr fpscr_nzcvqc, r1
				; CHECK-NEXT: vadc.i32 q0, q0, q1
				; CHECK-NEXT: vmrs r1, fpscr_nzcvqc
				; CHECK-NEXT: ubfx r1, r1, #29, #1
				; CHECK-NEXT: str r1, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = load i32, i32* %carry, align 4
				%1 = shl i32 %0, 29
				%2 = tail call { <4 x i32>, i32 } @llvm.arm.mve.vadc.v4i32(<4 x i32> %a, <4 x i32> %b, i32 %1)
				%3 = extractvalue { <4 x i32>, i32 } %2, 1
				%4 = lshr i32 %3, 29
				%5 = and i32 %4, 1
				store i32 %5, i32* %carry, align 4
				%6 = extractvalue { <4 x i32>, i32 } %2, 0
				ret <4 x i32> %6
				}

				define arm_aapcs_vfpcc <4 x i32> @test_vadciq_m_u32(<4 x i32> %inactive, <4 x i32> %a, <4 x i32> %b, i32* %carry_out, i16 zeroext %p) {
				; CHECK-LABEL: test_vadciq_m_u32:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r1
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vadcit.i32 q0, q1, q2
				; CHECK-NEXT: vmrs r1, fpscr_nzcvqc
				; CHECK-NEXT: ubfx r1, r1, #29, #1
				; CHECK-NEXT: str r1, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = zext i16 %p to i32
				%1 = tail call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 %0)
				%2 = tail call { <4 x i32>, i32 } @llvm.arm.mve.vadc.predicated.v4i32.v4i1(<4 x i32> %inactive, <4 x i32> %a, <4 x i32> %b, i32 0, <4 x i1> %1)
				%3 = extractvalue { <4 x i32>, i32 } %2, 1
				%4 = lshr i32 %3, 29
				%5 = and i32 %4, 1
				store i32 %5, i32* %carry_out, align 4
				%6 = extractvalue { <4 x i32>, i32 } %2, 0
				ret <4 x i32> %6
				}

				declare <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32) #1

				declare { <4 x i32>, i32 } @llvm.arm.mve.vadc.predicated.v4i32.v4i1(<4 x i32>, <4 x i32>, <4 x i32>, i32, <4 x i1>) #1

				define arm_aapcs_vfpcc <4 x i32> @test_vadcq_m_s32(<4 x i32> %inactive, <4 x i32> %a, <4 x i32> %b, i32* %carry, i16 zeroext %p) {
				; CHECK-LABEL: test_vadcq_m_s32:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: ldr r2, [r0]
				; CHECK-NEXT: vmsr p0, r1
				; CHECK-NEXT: lsls r1, r2, #29
				; CHECK-NEXT: vmsr fpscr_nzcvqc, r1
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vadct.i32 q0, q1, q2
				; CHECK-NEXT: vmrs r1, fpscr_nzcvqc
				; CHECK-NEXT: ubfx r1, r1, #29, #1
				; CHECK-NEXT: str r1, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = load i32, i32* %carry, align 4
				%1 = shl i32 %0, 29
				%2 = zext i16 %p to i32
				%3 = tail call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 %2)
				%4 = tail call { <4 x i32>, i32 } @llvm.arm.mve.vadc.predicated.v4i32.v4i1(<4 x i32> %inactive, <4 x i32> %a, <4 x i32> %b, i32 %1, <4 x i1> %3)
				%5 = extractvalue { <4 x i32>, i32 } %4, 1
				%6 = lshr i32 %5, 29
				%7 = and i32 %6, 1
				store i32 %7, i32* %carry, align 4
				%8 = extractvalue { <4 x i32>, i32 } %4, 0
				ret <4 x i32> %8
				}

llvm/test/CodeGen/Thumb2/mve-intrinsics/vaddq.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -mtriple=thumbv8.1m.main -mattr=+mve.fp -o - %s | FileCheck %s

				define arm_aapcs_vfpcc <4 x i32> @test_vaddq_u32(<4 x i32> %a, <4 x i32> %b) {
				; CHECK-LABEL: test_vaddq_u32:
				dmgreen [Not Done]: We probably don't _need_ tests for simple instructions like this, they should be covered elsewhere (fine to leave them if you wish).
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vadd.i32 q0, q1, q0
				; CHECK-NEXT: bx lr
				entry:
				%0 = add <4 x i32> %b, %a
				ret <4 x i32> %0
				}

				define arm_aapcs_vfpcc <8 x half> @test_vsubq_f16(<8 x half> %a, <8 x half> %b) {
				; CHECK-LABEL: test_vsubq_f16:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vsub.f16 q0, q0, q1
				; CHECK-NEXT: bx lr
				entry:
				%0 = fsub <8 x half> %a, %b
				ret <8 x half> %0
				}

				define arm_aapcs_vfpcc <16 x i8> @test_vaddq_m_s8(<16 x i8> %inactive, <16 x i8> %a, <16 x i8> %b, i16 zeroext %p) {
				; CHECK-LABEL: test_vaddq_m_s8:
				dmgreen [Not Done]: For the rest of the tests, at least for codegen we have tried to fill in all the combinations for type and operations (at least the legal types). It can be useful for making sure nothing is missed (here or in the future when some refactoring happens). Whether you want to do the same thing here is up to you, or whether you think that having interesting combinations is enough (adds with v16i8, subs with v4f32 for example).
				dmgreen [Not Done]: This could probably do with a reply, one way or the other. I was previously in the "don't mind either way" camp, now I feel more in the "why not just add them" camp, unless there is some reason not to.
				simon_tatham (author) [Done]: I had intended to stop at "interesting subset of combinations", along the lines of one of each operation, and one of each type, but not the full cross product unless absolutely necessary. (There are about 2000 of these to come in future work, so at some point adding a test for every single one won't deserve the word "just" any more!)
				dmgreen [Not Done]: The add int and add float are two different instructions, same for sub, so it's probably best to make sure we have at least one of each of those combinations. I was looking for prior art of only adding subsets of the combinations for codegen, and I don't immediately see anywhere where we have done that. I'm not too worried about this set of tests not catching problems now. But in the future as things are refactored (in ways that might be difficult to predict), it might be expected that the test coverage is more complete, and that could lead to things getting missed. I'd suggest we at least fill in the combinations for add and sub. Later on when we get to vrmlsldavhax it probably won't be as important.
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r0
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vaddt.i8 q0, q1, q2
				; CHECK-NEXT: bx lr
				entry:
				%0 = zext i16 %p to i32
				%1 = tail call <16 x i1> @llvm.arm.mve.pred.i2v.v16i1(i32 %0)
				%2 = tail call <16 x i8> @llvm.arm.mve.add.predicated.v16i8.v16i1(<16 x i8> %a, <16 x i8> %b, <16 x i1> %1, <16 x i8> %inactive)
				ret <16 x i8> %2
				}

				declare <16 x i1> @llvm.arm.mve.pred.i2v.v16i1(i32)

				declare <16 x i8> @llvm.arm.mve.add.predicated.v16i8.v16i1(<16 x i8>, <16 x i8>, <16 x i1>, <16 x i8>)

				define arm_aapcs_vfpcc <4 x float> @test_vsubq_m_f32(<4 x float> %inactive, <4 x float> %a, <4 x float> %b, i16 zeroext %p) {
				; CHECK-LABEL: test_vsubq_m_f32:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r0
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vsubt.f32 q0, q1, q2
				; CHECK-NEXT: bx lr
				entry:
				%0 = zext i16 %p to i32
				%1 = tail call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 %0)
				%2 = tail call <4 x float> @llvm.arm.mve.sub.predicated.v4f32.v4i1(<4 x float> %a, <4 x float> %b, <4 x i1> %1, <4 x float> %inactive)
				ret <4 x float> %2
				dmgreen: Do we care about what happens when there is a fp intrinsic but we don't have mve.fp? I presume this will be a fail to select or some sort of legalisation error, which is probably fine considering what is happening.
				}

				declare <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32)

				declare <4 x float> @llvm.arm.mve.sub.predicated.v4f32.v4i1(<4 x float>, <4 x float>, <4 x i1>, <4 x float>)
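
The add/sub combinations dmgreen asks for in the inline thread above could be filled in along these lines. This is only a sketch and is not part of this revision: the function name test_vsubq_m_s16 is hypothetical, and it assumes that the predicated sub intrinsic and pred.i2v are overloaded for v8i16/v8i1 in the same way as the v16i8 and v4f32 cases shown in this file. The CHECK lines are the expected shape only and would need regenerating with utils/update_llc_test_checks.py.

```llvm
; Hypothetical extra case (not in this revision): predicated integer VSUB on i16 lanes.
define arm_aapcs_vfpcc <8 x i16> @test_vsubq_m_s16(<8 x i16> %inactive, <8 x i16> %a, <8 x i16> %b, i16 zeroext %p) {
; CHECK-LABEL: test_vsubq_m_s16:
; CHECK: vpst
; CHECK-NEXT: vsubt.i16 q0, q1, q2
entry:
  ; Widen the 16-bit predicate mask and convert it to a vector of i1.
  %0 = zext i16 %p to i32
  %1 = tail call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 %0)
  ; Lanes where the predicate is false take their value from %inactive.
  %2 = tail call <8 x i16> @llvm.arm.mve.sub.predicated.v8i16.v8i1(<8 x i16> %a, <8 x i16> %b, <8 x i1> %1, <8 x i16> %inactive)
  ret <8 x i16> %2
}

declare <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32)

declare <8 x i16> @llvm.arm.mve.sub.predicated.v8i16.v8i1(<8 x i16>, <8 x i16>, <8 x i1>, <8 x i16>)
```

Together with the existing integer add and float sub cases, this would cover the integer sub instruction, which dmgreen notes is distinct from the FP one.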

llvm/test/CodeGen/Thumb2/mve-intrinsics/vcvt.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -mtriple=thumbv8.1m.main -mattr=+mve.fp -o - %s | FileCheck %s

				define arm_aapcs_vfpcc <8 x half> @test_vcvttq_f16_f32(<8 x half> %a, <4 x float> %b) {
				; CHECK-LABEL: test_vcvttq_f16_f32:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vcvtt.f16.f32 q0, q1
				; CHECK-NEXT: bx lr
				entry:
				%0 = tail call <8 x half> @llvm.arm.mve.fltnarrow(<8 x half> %a, <4 x float> %b, i32 1)
				dmgreen: Can you add a test for 0 too?
				ret <8 x half> %0
				}

				declare <8 x half> @llvm.arm.mve.fltnarrow(<8 x half>, <4 x float>, i32)

				define arm_aapcs_vfpcc <8 x half> @test_vcvttq_m_f16_f32(<8 x half> %a, <4 x float> %b, i16 zeroext %p) {
				; CHECK-LABEL: test_vcvttq_m_f16_f32:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r0
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vcvttt.f16.f32 q0, q1
				; CHECK-NEXT: bx lr
				entry:
				%0 = zext i16 %p to i32
				%1 = tail call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 %0)
				%2 = tail call <8 x half> @llvm.arm.mve.fltnarrow.predicated(<8 x half> %a, <4 x float> %b, i32 1, <4 x i1> %1)
				ret <8 x half> %2
				}

				declare <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32)

				declare <8 x half> @llvm.arm.mve.fltnarrow.predicated(<8 x half>, <4 x float>, i32, <4 x i1>)
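
The test for 0 requested in the inline comment above might look roughly like the following. It is a sketch under two assumptions that this revision does not state explicitly: that the final i32 operand of fltnarrow selects the top half when 1 and the bottom half when 0, and that the bottom-half form is emitted as vcvtb.f16.f32. The function name test_vcvtbq_f16_f32 is hypothetical, and the CHECK line should be regenerated with utils/update_llc_test_checks.py rather than trusted as written.

```llvm
; Hypothetical extra case (not in this revision): same intrinsic, half selector 0.
define arm_aapcs_vfpcc <8 x half> @test_vcvtbq_f16_f32(<8 x half> %a, <4 x float> %b) {
; CHECK-LABEL: test_vcvtbq_f16_f32:
; CHECK: vcvtb.f16.f32 q0, q1
entry:
  ; Assumption: i32 0 requests the bottom-half narrowing convert.
  %0 = tail call <8 x half> @llvm.arm.mve.fltnarrow(<8 x half> %a, <4 x float> %b, i32 0)
  ret <8 x half> %0
}

; Omit this declaration if the case is appended to vcvt.ll, which already declares it.
declare <8 x half> @llvm.arm.mve.fltnarrow(<8 x half>, <4 x float>, i32)
```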

llvm/test/CodeGen/Thumb2/mve-intrinsics/vld24.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -mtriple=thumbv8.1m.main -mattr=+mve.fp -o - %s | FileCheck %s

				%struct.float16x8x2_t = type { [2 x <8 x half>] }
				%struct.uint8x16x4_t = type { [4 x <16 x i8>] }
				%struct.uint32x4x2_t = type { [2 x <4 x i32>] }
				%struct.int8x16x4_t = type { [4 x <16 x i8>] }

				define arm_aapcs_vfpcc %struct.float16x8x2_t @test_vld2q_f16(half* %addr) {
				; CHECK-LABEL: test_vld2q_f16:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vld20.16 {q0, q1}, [r0]
				; CHECK-NEXT: vld21.16 {q0, q1}, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = tail call { <8 x half>, <8 x half> } @llvm.arm.mve.vld2q.v8f16.p0f16(half* %addr)
				%1 = extractvalue { <8 x half>, <8 x half> } %0, 0
				%2 = insertvalue %struct.float16x8x2_t undef, <8 x half> %1, 0, 0
				%3 = extractvalue { <8 x half>, <8 x half> } %0, 1
				%4 = insertvalue %struct.float16x8x2_t %2, <8 x half> %3, 0, 1
				ret %struct.float16x8x2_t %4
				}

				declare { <8 x half>, <8 x half> } @llvm.arm.mve.vld2q.v8f16.p0f16(half*)

				define arm_aapcs_vfpcc %struct.uint8x16x4_t @test_vld4q_u8(i8* %addr) {
				; CHECK-LABEL: test_vld4q_u8:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vld40.8 {q0, q1, q2, q3}, [r0]
				; CHECK-NEXT: vld41.8 {q0, q1, q2, q3}, [r0]
				; CHECK-NEXT: vld42.8 {q0, q1, q2, q3}, [r0]
				; CHECK-NEXT: vld43.8 {q0, q1, q2, q3}, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = tail call { <16 x i8>, <16 x i8>, <16 x i8>, <16 x i8> } @llvm.arm.mve.vld4q.v16i8.p0i8(i8* %addr)
				%1 = extractvalue { <16 x i8>, <16 x i8>, <16 x i8>, <16 x i8> } %0, 0
				%2 = insertvalue %struct.uint8x16x4_t undef, <16 x i8> %1, 0, 0
				%3 = extractvalue { <16 x i8>, <16 x i8>, <16 x i8>, <16 x i8> } %0, 1
				%4 = insertvalue %struct.uint8x16x4_t %2, <16 x i8> %3, 0, 1
				%5 = extractvalue { <16 x i8>, <16 x i8>, <16 x i8>, <16 x i8> } %0, 2
				%6 = insertvalue %struct.uint8x16x4_t %4, <16 x i8> %5, 0, 2
				%7 = extractvalue { <16 x i8>, <16 x i8>, <16 x i8>, <16 x i8> } %0, 3
				%8 = insertvalue %struct.uint8x16x4_t %6, <16 x i8> %7, 0, 3
				ret %struct.uint8x16x4_t %8
				}

				declare { <16 x i8>, <16 x i8>, <16 x i8>, <16 x i8> } @llvm.arm.mve.vld4q.v16i8.p0i8(i8*)

				define arm_aapcs_vfpcc void @test_vst2q_u32(i32* %addr, %struct.uint32x4x2_t %value.coerce) {
				; CHECK-LABEL: test_vst2q_u32:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: @ kill: def $q1 killed $q1 killed $q0_q1 def $q0_q1
				; CHECK-NEXT: @ kill: def $q0 killed $q0 killed $q0_q1 def $q0_q1
				; CHECK-NEXT: vst20.32 {q0, q1}, [r0]
				; CHECK-NEXT: vst21.32 {q0, q1}, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%value.coerce.fca.0.0.extract = extractvalue %struct.uint32x4x2_t %value.coerce, 0, 0
				%value.coerce.fca.0.1.extract = extractvalue %struct.uint32x4x2_t %value.coerce, 0, 1
				tail call void @llvm.arm.mve.vst2q.p0i32.v4i32(i32* %addr, <4 x i32> %value.coerce.fca.0.0.extract, <4 x i32> %value.coerce.fca.0.1.extract, i32 0)
				tail call void @llvm.arm.mve.vst2q.p0i32.v4i32(i32* %addr, <4 x i32> %value.coerce.fca.0.0.extract, <4 x i32> %value.coerce.fca.0.1.extract, i32 1)
				ret void
				}

				declare void @llvm.arm.mve.vst2q.p0i32.v4i32(i32*, <4 x i32>, <4 x i32>, i32)

				define arm_aapcs_vfpcc void @test_vst4q_s8(i8* %addr, %struct.int8x16x4_t %value.coerce) {
				; CHECK-LABEL: test_vst4q_s8:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: @ kill: def $q3 killed $q3 killed $q0_q1_q2_q3 def $q0_q1_q2_q3
				; CHECK-NEXT: @ kill: def $q2 killed $q2 killed $q0_q1_q2_q3 def $q0_q1_q2_q3
				; CHECK-NEXT: @ kill: def $q1 killed $q1 killed $q0_q1_q2_q3 def $q0_q1_q2_q3
				; CHECK-NEXT: @ kill: def $q0 killed $q0 killed $q0_q1_q2_q3 def $q0_q1_q2_q3
				; CHECK-NEXT: vst40.8 {q0, q1, q2, q3}, [r0]
				; CHECK-NEXT: vst41.8 {q0, q1, q2, q3}, [r0]
				; CHECK-NEXT: vst42.8 {q0, q1, q2, q3}, [r0]
				; CHECK-NEXT: vst43.8 {q0, q1, q2, q3}, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%value.coerce.fca.0.0.extract = extractvalue %struct.int8x16x4_t %value.coerce, 0, 0
				%value.coerce.fca.0.1.extract = extractvalue %struct.int8x16x4_t %value.coerce, 0, 1
				%value.coerce.fca.0.2.extract = extractvalue %struct.int8x16x4_t %value.coerce, 0, 2
				%value.coerce.fca.0.3.extract = extractvalue %struct.int8x16x4_t %value.coerce, 0, 3
				tail call void @llvm.arm.mve.vst4q.p0i8.v16i8(i8* %addr, <16 x i8> %value.coerce.fca.0.0.extract, <16 x i8> %value.coerce.fca.0.1.extract, <16 x i8> %value.coerce.fca.0.2.extract, <16 x i8> %value.coerce.fca.0.3.extract, i32 0)
				tail call void @llvm.arm.mve.vst4q.p0i8.v16i8(i8* %addr, <16 x i8> %value.coerce.fca.0.0.extract, <16 x i8> %value.coerce.fca.0.1.extract, <16 x i8> %value.coerce.fca.0.2.extract, <16 x i8> %value.coerce.fca.0.3.extract, i32 1)
				tail call void @llvm.arm.mve.vst4q.p0i8.v16i8(i8* %addr, <16 x i8> %value.coerce.fca.0.0.extract, <16 x i8> %value.coerce.fca.0.1.extract, <16 x i8> %value.coerce.fca.0.2.extract, <16 x i8> %value.coerce.fca.0.3.extract, i32 2)
				tail call void @llvm.arm.mve.vst4q.p0i8.v16i8(i8* %addr, <16 x i8> %value.coerce.fca.0.0.extract, <16 x i8> %value.coerce.fca.0.1.extract, <16 x i8> %value.coerce.fca.0.2.extract, <16 x i8> %value.coerce.fca.0.3.extract, i32 3)
				ret void
				}

				declare void @llvm.arm.mve.vst4q.p0i8.v16i8(i8*, <16 x i8>, <16 x i8>, <16 x i8>, <16 x i8>, i32)

llvm/test/CodeGen/Thumb2/mve-intrinsics/vldr.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -mtriple=thumbv8.1m.main -mattr=+mve.fp -o - %s | FileCheck %s

				define arm_aapcs_vfpcc <4 x i32> @test_vldrwq_gather_base_wb_s32(<4 x i32>* %addr) {
				; CHECK-LABEL: test_vldrwq_gather_base_wb_s32:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vldrw.u32 q0, [r0]
				; CHECK-NEXT: vldrw.u32 q1, [q0, #80]!
				; CHECK-NEXT: vstrw.32 q1, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = load <4 x i32>, <4 x i32>* %addr, align 8
				%1 = tail call { <4 x i32>, <4 x i32> } @llvm.arm.mve.vldr.gather.base.wb.v4i32.v4i32(<4 x i32> %0, i32 80)
				%2 = extractvalue { <4 x i32>, <4 x i32> } %1, 1
				store <4 x i32> %2, <4 x i32>* %addr, align 8
				%3 = extractvalue { <4 x i32>, <4 x i32> } %1, 0
				ret <4 x i32> %3
				}

				declare { <4 x i32>, <4 x i32> } @llvm.arm.mve.vldr.gather.base.wb.v4i32.v4i32(<4 x i32>, i32)

				define arm_aapcs_vfpcc <4 x float> @test_vldrwq_gather_base_wb_f32(<4 x i32>* %addr) {
				; CHECK-LABEL: test_vldrwq_gather_base_wb_f32:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vldrw.u32 q0, [r0]
				; CHECK-NEXT: vldrw.u32 q1, [q0, #64]!
				; CHECK-NEXT: vstrw.32 q1, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = load <4 x i32>, <4 x i32>* %addr, align 8
				%1 = tail call { <4 x float>, <4 x i32> } @llvm.arm.mve.vldr.gather.base.wb.v4f32.v4i32(<4 x i32> %0, i32 64)
				%2 = extractvalue { <4 x float>, <4 x i32> } %1, 1
				store <4 x i32> %2, <4 x i32>* %addr, align 8
				%3 = extractvalue { <4 x float>, <4 x i32> } %1, 0
				ret <4 x float> %3
				}

				declare { <4 x float>, <4 x i32> } @llvm.arm.mve.vldr.gather.base.wb.v4f32.v4i32(<4 x i32>, i32)

				define arm_aapcs_vfpcc <2 x i64> @test_vldrdq_gather_base_wb_z_u64(<2 x i64>* %addr, i16 zeroext %p) {
				; CHECK-LABEL: test_vldrdq_gather_base_wb_z_u64:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmsr p0, r1
				; CHECK-NEXT: vldrw.u32 q0, [r0]
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vldrdt.u64 q1, [q0, #656]!
				; CHECK-NEXT: vstrw.32 q1, [r0]
				; CHECK-NEXT: bx lr
				entry:
				%0 = load <2 x i64>, <2 x i64>* %addr, align 8
				%1 = zext i16 %p to i32
				%2 = tail call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 %1)
				%3 = tail call { <2 x i64>, <2 x i64> } @llvm.arm.mve.vldr.gather.base.wb.predicated.v2i64.v2i64.v4i1(<2 x i64> %0, i32 656, <4 x i1> %2)
				%4 = extractvalue { <2 x i64>, <2 x i64> } %3, 1
				store <2 x i64> %4, <2 x i64>* %addr, align 8
				%5 = extractvalue { <2 x i64>, <2 x i64> } %3, 0
				ret <2 x i64> %5
				}

				declare <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32)

				declare { <2 x i64>, <2 x i64> } @llvm.arm.mve.vldr.gather.base.wb.predicated.v2i64.v2i64.v4i1(<2 x i64>, i32, <4 x i1>)

llvm/test/CodeGen/Thumb2/mve-intrinsics/vminvq.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -mtriple=thumbv8.1m.main -mattr=+mve.fp -o - %s | FileCheck %s

				define arm_aapcs_vfpcc i32 @test_vminvq_u32(i32 %a, <4 x i32> %b) {
				dmgreen: Test other VT's too, please.
				; CHECK-LABEL: test_vminvq_u32:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vminv.u32 r0, q0
				; CHECK-NEXT: bx lr
				entry:
				%0 = tail call i32 @llvm.arm.mve.minv.u.v4i32(i32 %a, <4 x i32> %b)
				ret i32 %0
				}

				declare i32 @llvm.arm.mve.minv.u.v4i32(i32, <4 x i32>)
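
A case for another vector type, as dmgreen requests above, could be sketched as follows. It assumes that minv.u is overloaded on the vector operand, so that a .v8i16 suffix is accepted, and that the selected instruction carries the matching .u16 size suffix; neither is shown in this revision. The function name test_vminvq_u16 is hypothetical, and the scalar operand is kept as i32 to mirror the existing test rather than to model the ACLE signature.

```llvm
; Hypothetical extra case (not in this revision): unsigned min-across-vector on i16 lanes.
define arm_aapcs_vfpcc i32 @test_vminvq_u16(i32 %a, <8 x i16> %b) {
; CHECK-LABEL: test_vminvq_u16:
; CHECK: vminv.u16 r0, q0
entry:
  ; %a is the running minimum passed in r0; %b is the vector reduced into it.
  %0 = tail call i32 @llvm.arm.mve.minv.u.v8i16(i32 %a, <8 x i16> %b)
  ret i32 %0
}

declare i32 @llvm.arm.mve.minv.u.v8i16(i32, <8 x i16>)
```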