This is an archive of the discontinued LLVM Phabricator instance.

Differential D154084

[AArch64] Modify SVE Pseudo appends
ClosedPublic

Authored by harviniriawan on Jun 29 2023, 6:59 AM.

Details

Reviewers

dmgreen
paulwalker-arm

Summary

Update cortex-a510 and neoverse-v2 SVE scheduling so that pseudos have the same instruction latency as original instruction.

Differential Revision: https://reviews.llvm.org/D154084

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

Time	Test
60 ms	x64 debian > LLVM.CodeGen/AArch64::sve-pseudos-expand-undef.mir

Event Timeline

harviniriawan created this revision. Jun 29 2023, 6:59 AM

Herald added a project: Restricted Project. Jun 29 2023, 6:59 AM

Herald added subscribers: ctetreau, hiraditya, kristof.beyls.

harviniriawan requested review of this revision. Jun 29 2023, 6:59 AM

Herald added a project: Restricted Project. Jun 29 2023, 6:59 AM

Herald added a subscriber: llvm-commits.

Sounds like a good idea. And I like the test; it looks very useful for keeping the data in sync.

llvm/unittests/Target/AArch64/AArch64SvePseudoTest.cpp
35	false -> true
39	Can you add CortexA510 to this test name? It would be good to make use of the test for other CPUs like N2 and V2 (in a future patch). Perhaps pass the CPU as a parameter to createTargetMachine too? And maybe make it clear that it is testing scheduling info: AArch64SVESchedPseudoTest or something similar?
52	origInstr -> OrigInstr, as per the LLVM coding standard in https://llvm.org/docs/CodingStandards.html#name-types-functions-variables-and-enumerators-properly.
60	Descorig -> DescOrig, same below.

Can the UNDEF_D -> D_UNDEF renaming be pulled into a separate NFC patch?

This looks like a nice patch. I left a few comments for the Neoverse V2 below.

Also, though not particularly important for now, maybe we could also normalise SME instructions (like ADDHA_MPPZ_D_PSEUDO_D)?

llvm/lib/Target/AArch64/AArch64SchedNeoverseV2.td
2081–2082	This line is actually not necessary anymore
2095–2096	Same as above.
2156–2157	Unnecessary now.
2370–2371	These should still use the explicit forms, or be changed to instregex.
2524–2525	Should still use the explicit form.
2528–2529	Ditto.
2533	Ditto.
2560	Should use the explicit form.
2563	Ditto.
2566	Ditto.
llvm/unittests/Target/AArch64/AArch64SvePseudoTest.cpp
50	nit: typo

Harbormaster completed remote builds in B242076: Diff 535772. Jun 29 2023, 8:05 AM

In D154084#4459841, @paulwalker-arm wrote:

Can the UNDEF_D -> D_UNDEF renaming be pulled into a separate NFC patch?

I don't think it's possible. I can add tests on Neoverse-N2 CPU as well in here, then

Matt added a subscriber: Matt. Jun 29 2023, 5:07 PM

In D154084#4460011, @harviniriawan wrote:

In D154084#4459841, @paulwalker-arm wrote:

Can the UNDEF_D -> D_UNDEF renaming be pulled into a separate NFC patch?

I don't think it's possible. I can add tests on Neoverse-N2 CPU as well in here, then

Are you sure? To be clear I'm talking about the changes to llvm/lib/Target/AArch64/SVEInstrFormats.td which looks like a mechanical name change to me that should not affect any behaviour. If it does then that's likely a bug.

In D154084#4463203, @paulwalker-arm wrote:

In D154084#4460011, @harviniriawan wrote:

In D154084#4459841, @paulwalker-arm wrote:

Can the UNDEF_D -> D_UNDEF renaming be pulled into a separate NFC patch?

I don't think it's possible. I can add tests on Neoverse-N2 CPU as well in here, then

Are you sure? To be clear I'm talking about the changes to llvm/lib/Target/AArch64/SVEInstrFormats.td which looks like a mechanical name change to me that should not affect any behaviour. If it does then that's likely a bug.

The UNDEF name is actually used in pseudo lowering (if that's the right term); see getSVEPseudoMap in AArch64GenInstrInfo.inc, excerpted below. So if the UNDEF variant of an SVE instruction is generated during pre-RA MI scheduling, it will not pick up the right latency.
// getSVEPseudoMap
LLVM_READONLY
int getSVEPseudoMap(uint16_t Opcode) {
static const uint16_t getSVEPseudoMapTable[][2] = {

{ AArch64::ABS_ZPmZ_B_UNDEF, AArch64::ABS_ZPmZ_B },
{ AArch64::ABS_ZPmZ_D_UNDEF, AArch64::ABS_ZPmZ_D },
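The concern above, that a pseudo must end up with the same latency as the instruction it maps to, can be sketched outside LLVM. This is a minimal sketch, not the code from this revision: `Model`, `latencyOf` and `mismatchedPseudos` are hypothetical names, the scheduling model is reduced to a list of (regex, latency) pairs, and opcode names are plain strings rather than `AArch64::` enumerators.

```cpp
#include <map>
#include <optional>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a scheduling model: (pattern, latency) pairs,
// where the first pattern that matches an opcode name wins.
using Model = std::vector<std::pair<std::string, unsigned>>;

// Returns the latency assigned to Name, or nullopt if no pattern matches.
// Matching is anchored at the start of the name (an assumption made for
// this illustration).
std::optional<unsigned> latencyOf(const Model &M, const std::string &Name) {
  for (const auto &[Pattern, Latency] : M)
    if (std::regex_search(Name, std::regex(Pattern),
                          std::regex_constants::match_continuous))
      return Latency;
  return std::nullopt;
}

// Returns every pseudo whose latency differs from that of the original
// instruction it maps to. A pseudo with no latency at all also counts.
std::vector<std::string>
mismatchedPseudos(const Model &M,
                  const std::map<std::string, std::string> &PseudoMap) {
  std::vector<std::string> Bad;
  for (const auto &[Pseudo, Orig] : PseudoMap)
    if (latencyOf(M, Pseudo) != latencyOf(M, Orig))
      Bad.push_back(Pseudo);
  return Bad;
}
```

With a map holding the two `ABS_ZPmZ_*_UNDEF` entries shown above, a model whose only rule is `^(ABS|CNOT|NEG)_ZPmZ_[BHSD]$` reports both pseudos as mismatched, while the same rule without the trailing `$` reports none.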

harviniriawan added a parent revision: D154232: [AArch64] NFC : Change the way SVE pseudos are appended. Jun 30 2023, 9:40 AM

harviniriawan updated this revision to Diff 536303. Jun 30 2023, 9:56 AM

harviniriawan marked 15 inline comments as done.

Harbormaster completed remote builds in B242461: Diff 536303. Jun 30 2023, 11:29 AM

dmgreen added inline comments. Jul 3 2023, 1:06 AM

llvm/unittests/Target/AArch64/AArch64SvePseudoTest.cpp
83	Can you make this two tests, one for each CPU? It can make the tests easier to work with if they fail, as it is better to have each test more independent. The neoverse-v2 part may want to be moved to D154232 to keep that patch as an NFC. This patch could then be to improve A510.

paulwalker-arm added inline comments. Jul 3 2023, 2:41 AM

llvm/unittests/Target/AArch64/AArch64SvePseudoTest.cpp
83	@dmgreen - The AArch64SchedNeoverseV2.td changes are functional, are they not? As in, this patch is about adding scheduling information for SVE pseudo instructions, which covers multiple scheduling models. Or have I misunderstood your comment?

dmgreen added inline comments. Jul 3 2023, 2:50 AM

llvm/unittests/Target/AArch64/AArch64SvePseudoTest.cpp
83	The V2 scheduling model was already matching `(ABS|CNOT|NEG)_ZPmZ_UNDEF_[BHSD]$`. It needs to be changed to match `(ABS|CNOT|NEG)_ZPmZ_[BHSD]_UNDEF$` (or `(ABS|CNOT|NEG)_ZPmZ_[BHSD]` as is done here) in order to keep it matching the same instructions. I'm not sure if there were other missing UNDEF instructions? I might just include them in D154232 if they were, but it may be better to pull it out into a different patch.
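To make the three pattern forms in this comment concrete, here is a small sketch using `std::regex`. It assumes that a pattern is matched from the start of the opcode name and that a trailing `$` pins the end of the name; `matchesOpcode` is a hypothetical helper for illustration, not TableGen's implementation, and `ABS_ZPmZ_UNDEF_B` is assumed to be the pre-rename spelling of the pseudo.

```cpp
#include <regex>
#include <string>

// Returns true if Pattern matches Name starting at its first character.
// A trailing '$' in Pattern additionally requires the match to reach the
// end of Name; without it, any suffix (such as _UNDEF) is accepted.
bool matchesOpcode(const std::string &Pattern, const std::string &Name) {
  return std::regex_search(Name, std::regex(Pattern),
                           std::regex_constants::match_continuous);
}
```

Under these assumptions, the old pattern `(ABS|CNOT|NEG)_ZPmZ_UNDEF_[BHSD]$` matches `ABS_ZPmZ_UNDEF_B` but not the renamed `ABS_ZPmZ_B_UNDEF`. The form `(ABS|CNOT|NEG)_ZPmZ_[BHSD]_UNDEF$` matches only the renamed pseudo, and the unanchored `(ABS|CNOT|NEG)_ZPmZ_[BHSD]` matches both `ABS_ZPmZ_B` and `ABS_ZPmZ_B_UNDEF`.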

paulwalker-arm added inline comments. Jul 3 2023, 2:55 AM

llvm/unittests/Target/AArch64/AArch64SvePseudoTest.cpp
83	I see. Thanks. So I was thinking D154232 would just be a literal change to move the `UNDEF` matching to where it needs to be. Then this patch can unify the patterns as it's already doing, which would then catch any missing ones.

harviniriawan updated this revision to Diff 536769. Jul 3 2023, 7:40 AM

harviniriawan edited the summary of this revision.

Harbormaster completed remote builds in B242801: Diff 536769. Jul 3 2023, 10:58 AM

This looks OK to me, as far as I can see, and I like the test case. LGTM, thanks.

llvm/unittests/Target/AArch64/AArch64SVESchedPseudoTest.cpp
82 ↗	(On Diff #536769)	Perhaps use `TEST(AArch64SVESchedPseudoTest, CortexA510) {`

This revision is now accepted and ready to land. Jul 4 2023, 1:17 AM

harviniriawan closed this revision. Jul 4 2023, 2:44 AM

Revision Contents

Path	Size
llvm/lib/Target/AArch64/AArch64SchedA510.td	419 lines
llvm/lib/Target/AArch64/AArch64SchedNeoverseV2.td	462 lines
llvm/unittests/Target/AArch64/AArch64SvePseudoTest.cpp	88 lines
llvm/unittests/Target/AArch64/CMakeLists.txt	1 line

Diff 536303

llvm/lib/Target/AArch64/AArch64SchedA510.td

Context not available.

	// Loop control, based on GPR	// Loop control, based on GPR
	def : InstRW<[CortexA510Write<6, CortexA510UnitVALU0>],	def : InstRW<[CortexA510Write<6, CortexA510UnitVALU0>],
	(instregex "^WHILE(GE\|GT\|HI\|HS\|LE\|LO\|LS\|LT)_P(WW\|XX)_[BHSD]$")>;	(instregex "^WHILE(GE\|GT\|HI\|HS\|LE\|LO\|LS\|LT)_P(WW\|XX)_[BHSD]")>;

	def : InstRW<[CortexA510Write<6, CortexA510UnitVALU0>], (instregex "^WHILE(RW\|WR)_PXX_[BHSD]$")>;	def : InstRW<[CortexA510Write<6, CortexA510UnitVALU0>], (instregex "^WHILE(RW\|WR)_PXX_[BHSD]")>;

	// Loop terminate	// Loop terminate
	def : InstRW<[CortexA510Write<1, CortexA510UnitALU>], (instregex "^CTERM(EQ\|NE)_(WW\|XX)$")>;	def : InstRW<[CortexA510Write<1, CortexA510UnitALU>], (instregex "^CTERM(EQ\|NE)_(WW\|XX)")>;

	// Predicate counting scalar	// Predicate counting scalar
	def : InstRW<[CortexA510Write<1, CortexA510UnitALU>], (instrs ADDPL_XXI, ADDVL_XXI, RDVLI_XI)>;	def : InstRW<[CortexA510Write<1, CortexA510UnitALU>], (instrs ADDPL_XXI, ADDVL_XXI, RDVLI_XI)>;

	def : InstRW<[CortexA510Write<1, CortexA510UnitALU>],	def : InstRW<[CortexA510Write<1, CortexA510UnitALU>],
	(instregex "^CNT[BHWD]_XPiI$")>;	(instregex "^CNT[BHWD]_XPiI")>;

	def : InstRW<[CortexA510Write<1, CortexA510UnitALU>],	def : InstRW<[CortexA510Write<1, CortexA510UnitALU>],
	(instregex "^(INC\|DEC)[BHWD]_XPiI$")>;	(instregex "^(INC\|DEC)[BHWD]_XPiI")>;

	def : InstRW<[CortexA510Write<1, CortexA510UnitALU>],	def : InstRW<[CortexA510Write<1, CortexA510UnitALU>],
	(instregex "^(SQINC\|SQDEC\|UQINC\|UQDEC)[BHWD]_[XW]Pi(Wd)?I$")>;	(instregex "^(SQINC\|SQDEC\|UQINC\|UQDEC)[BHWD]_[XW]Pi(Wd)?I")>;

	// Predicate counting scalar, active predicate	// Predicate counting scalar, active predicate
	def : InstRW<[CortexA510Write<6, CortexA510UnitVALU0>],	def : InstRW<[CortexA510Write<6, CortexA510UnitVALU0>],
	(instregex "^CNTP_XPP_[BHSD]$")>;	(instregex "^CNTP_XPP_[BHSD]")>;

	def : InstRW<[CortexA510Write<6, CortexA510UnitVALU0>],	def : InstRW<[CortexA510Write<6, CortexA510UnitVALU0>],
	(instregex "^(DEC\|INC)P_XP_[BHSD]$")>;	(instregex "^(DEC\|INC)P_XP_[BHSD]")>;

	def : InstRW<[CortexA510Write<8, CortexA510UnitVALU0>],	def : InstRW<[CortexA510Write<8, CortexA510UnitVALU0>],
	(instregex "^(SQDEC\|SQINC\|UQDEC\|UQINC)P_XP_[BHSD]$",	(instregex "^(SQDEC\|SQINC\|UQDEC\|UQINC)P_XP_[BHSD]",
	"^(UQDEC\|UQINC)P_WP_[BHSD]$",	"^(UQDEC\|UQINC)P_WP_[BHSD]",
	"^(SQDEC\|SQINC\|UQDEC\|UQINC)P_XPWd_[BHSD]$")>;	"^(SQDEC\|SQINC\|UQDEC\|UQINC)P_XPWd_[BHSD]")>;


	// Predicate counting vector, active predicate	// Predicate counting vector, active predicate
	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>],	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>],
	(instregex "^(DEC\|INC\|SQDEC\|SQINC\|UQDEC\|UQINC)P_ZP_[HSD]$")>;	(instregex "^(DEC\|INC\|SQDEC\|SQINC\|UQDEC\|UQINC)P_ZP_[HSD]")>;

	// Predicate logical	// Predicate logical
	def : InstRW<[CortexA510Write<6, CortexA510UnitVALU0>],	def : InstRW<[CortexA510Write<6, CortexA510UnitVALU0>],
	(instregex "^(AND\|BIC\|EOR\|NAND\|NOR\|ORN\|ORR)_PPzPP$")>;	(instregex "^(AND\|BIC\|EOR\|NAND\|NOR\|ORN\|ORR)_PPzPP")>;

	// Predicate logical, flag setting	// Predicate logical, flag setting
	def : InstRW<[CortexA510Write<6, CortexA510UnitVALU0>],	def : InstRW<[CortexA510Write<6, CortexA510UnitVALU0>],
	(instregex "^(ANDS\|BICS\|EORS\|NANDS\|NORS\|ORNS\|ORRS)_PPzPP$")>;	(instregex "^(ANDS\|BICS\|EORS\|NANDS\|NORS\|ORNS\|ORRS)_PPzPP")>;

	// Predicate reverse	// Predicate reverse
	def : InstRW<[CortexA510Write<6, CortexA510UnitVALU0>], (instregex "^REV_PP_[BHSD]$")>;	def : InstRW<[CortexA510Write<6, CortexA510UnitVALU0>], (instregex "^REV_PP_[BHSD]")>;

	// Predicate select	// Predicate select
	def : InstRW<[CortexA510Write<6, CortexA510UnitVALU0>], (instrs SEL_PPPP)>;	def : InstRW<[CortexA510Write<6, CortexA510UnitVALU0>], (instrs SEL_PPPP)>;

	// Predicate set	// Predicate set
	def : InstRW<[CortexA510Write<6, CortexA510UnitVALU0>], (instregex "^PFALSE$", "^PTRUE_[BHSD]$")>;	def : InstRW<[CortexA510Write<6, CortexA510UnitVALU0>], (instregex "^PFALSE", "^PTRUE_[BHSD]")>;

	// Predicate set/initialize, set flags	// Predicate set/initialize, set flags
	def : InstRW<[CortexA510Write<6, CortexA510UnitVALU0>], (instregex "^PTRUES_[BHSD]$")>;	def : InstRW<[CortexA510Write<6, CortexA510UnitVALU0>], (instregex "^PTRUES_[BHSD]")>;

	// Predicate find first/next	// Predicate find first/next
	def : InstRW<[CortexA510Write<6, CortexA510UnitVALU0>], (instregex "^PFIRST_B$", "^PNEXT_[BHSD]$")>;	def : InstRW<[CortexA510Write<6, CortexA510UnitVALU0>], (instregex "^PFIRST_B", "^PNEXT_[BHSD]")>;

	// Predicate test	// Predicate test
	def : InstRW<[CortexA510Write<6, CortexA510UnitVALU0>], (instrs PTEST_PP)>;	def : InstRW<[CortexA510Write<6, CortexA510UnitVALU0>], (instrs PTEST_PP)>;

	// Predicate transpose	// Predicate transpose
	def : InstRW<[CortexA510Write<6, CortexA510UnitVALU0>], (instregex "^TRN[12]_PPP_[BHSDQ]$")>;	def : InstRW<[CortexA510Write<6, CortexA510UnitVALU0>], (instregex "^TRN[12]_PPP_[BHSDQ]")>;

	// Predicate unpack and widen	// Predicate unpack and widen
	def : InstRW<[CortexA510Write<6, CortexA510UnitVALU0>], (instrs PUNPKHI_PP, PUNPKLO_PP)>;	def : InstRW<[CortexA510Write<6, CortexA510UnitVALU0>], (instrs PUNPKHI_PP, PUNPKLO_PP)>;

	// Predicate zip/unzip	// Predicate zip/unzip
	def : InstRW<[CortexA510Write<6, CortexA510UnitVALU0>], (instregex "^(ZIP\|UZP)[12]_PPP_[BHSDQ]$")>;	def : InstRW<[CortexA510Write<6, CortexA510UnitVALU0>], (instregex "^(ZIP\|UZP)[12]_PPP_[BHSDQ]")>;


	// SVE integer instructions	// SVE integer instructions
	// -----------------------------------------------------------------------------	// -----------------------------------------------------------------------------
	// Arithmetic, absolute diff	// Arithmetic, absolute diff
	def : InstRW<[CortexA510Write<3, CortexA510UnitVALU>], (instregex "^[SU]ABD_ZPmZ_[BHSD]$")>;	def : InstRW<[CortexA510Write<3, CortexA510UnitVALU>], (instregex "^[SU]ABD_(ZPmZ\|ZPZZ)_[BHSD]")>;

	// Arithmetic, absolute diff accum	// Arithmetic, absolute diff accum
	def : InstRW<[CortexA510MCWrite<8, 2, CortexA510UnitVALU>], (instregex "^[SU]ABA_ZZZ_[BHSD]$")>;	def : InstRW<[CortexA510MCWrite<8, 2, CortexA510UnitVALU>], (instregex "^[SU]ABA_ZZZ_[BHSD]")>;

	// Arithmetic, absolute diff accum long	// Arithmetic, absolute diff accum long
	def : InstRW<[CortexA510MCWrite<8, 2, CortexA510UnitVALU>], (instregex "^[SU]ABAL[TB]_ZZZ_[HSD]$")>;	def : InstRW<[CortexA510MCWrite<8, 2, CortexA510UnitVALU>], (instregex "^[SU]ABAL[TB]_ZZZ_[HSD]")>;

	// Arithmetic, absolute diff long	// Arithmetic, absolute diff long
	def : InstRW<[CortexA510Write<3, CortexA510UnitVALU>], (instregex "^[SU]ABDL[TB]_ZZZ_[HSD]$")>;	def : InstRW<[CortexA510Write<3, CortexA510UnitVALU>], (instregex "^[SU]ABDL[TB]_ZZZ_[HSD]")>;

	// Arithmetic, basic	// Arithmetic, basic
	def : InstRW<[CortexA510Write<3, CortexA510UnitVALU>],	def : InstRW<[CortexA510Write<3, CortexA510UnitVALU>],
	(instregex "^(ABS\|CNOT\|NEG)_ZPmZ_[BHSD]$",	(instregex "^(ABS\|CNOT\|NEG)_ZPmZ_[BHSD]",
	"^(ADD\|SUB\|SUBR)_ZPmZ_[BHSD]$",	"^(ADD\|SUB\|SUBR)_ZPmZ_[BHSD]",
	"^(ADD\|SUB)_ZZZ_[BHSD]$",	"^(ADD\|SUB\|SUBR)_ZPZZ_[BHSD]",
	"^(ADD\|SUB\|SUBR)_ZI_[BHSD]$",	"^(ADD\|SUB)_ZZZ_[BHSD]",
	"^ADR_[SU]XTW_ZZZ_D_[0123]$",	"^(ADD\|SUB\|SUBR)_ZI_[BHSD]",
	"^ADR_LSL_ZZZ_[SD]_[0123]$",	"^ADR_[SU]XTW_ZZZ_D_[0123]",
	"^[SU](ADD\|SUB)[LW][BT]_ZZZ_[HSD]$",	"^ADR_LSL_ZZZ_[SD]_[0123]",
	"^SADDLBT_ZZZ_[HSD]$",	"^[SU](ADD\|SUB)[LW][BT]_ZZZ_[HSD]",
	"^[SU]H(ADD\|SUB\|SUBR)_ZPmZ_[BHSD]$",	"^SADDLBT_ZZZ_[HSD]",
	"^SSUBL(BT\|TB)_ZZZ_[HSD]$")>;	"^[SU]H(ADD\|SUB\|SUBR)_ZPmZ_[BHSD]",
		"^SSUBL(BT\|TB)_ZZZ_[HSD]")>;

	// Arithmetic, complex	// Arithmetic, complex
	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>],	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>],
	(instregex "^R?(ADD\|SUB)HN[BT]_ZZZ_[BHS]$",	(instregex "^R?(ADD\|SUB)HN[BT]_ZZZ_[BHS]",
	"^SQ(ABS\|NEG)_ZPmZ_[BHSD]$",	"^SQ(ABS\|NEG)_ZPmZ_[BHSD]",
	"^SQ(ADD\|SUB\|SUBR)_ZPmZ_?[BHSD]$",	"^SQ(ADD\|SUB\|SUBR)_ZPmZ_?[BHSD]",
	"^[SU]Q(ADD\|SUB)_ZZZ_[BHSD]$",	"^[SU]Q(ADD\|SUB)_ZZZ_[BHSD]",
	"^[SU]Q(ADD\|SUB)_ZI_[BHSD]$",	"^[SU]Q(ADD\|SUB)_ZI_[BHSD]",
	"^(SRH\|SUQ\|UQ\|USQ\|URH)ADD_ZPmZ_[BHSD]$",	"^(SRH\|SUQ\|UQ\|USQ\|URH)ADD_ZPmZ_[BHSD]",
	"^(UQSUB\|UQSUBR)_ZPmZ_[BHSD]$")>;	"^(UQSUB\|UQSUBR)_ZPmZ_[BHSD]")>;

	// Arithmetic, large integer	// Arithmetic, large integer
	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>], (instregex "^(AD\|SB)CL[BT]_ZZZ_[SD]$")>;	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>], (instregex "^(AD\|SB)CL[BT]_ZZZ_[SD]")>;

	// Arithmetic, pairwise add	// Arithmetic, pairwise add
	def : InstRW<[CortexA510Write<3, CortexA510UnitVALU>], (instregex "^ADDP_ZPmZ_[BHSD]$")>;	def : InstRW<[CortexA510Write<3, CortexA510UnitVALU>], (instregex "^ADDP_ZPmZ_[BHSD]")>;

	// Arithmetic, pairwise add and accum long	// Arithmetic, pairwise add and accum long
	def : InstRW<[CortexA510MCWrite<7, 2, CortexA510UnitVALU>], (instregex "^[SU]ADALP_ZPmZ_[HSD]$")>;	def : InstRW<[CortexA510MCWrite<7, 2, CortexA510UnitVALU>], (instregex "^[SU]ADALP_ZPmZ_[HSD]")>;

	// Arithmetic, shift	// Arithmetic, shift
	def : InstRW<[CortexA510Write<3, CortexA510UnitVALU>],	def : InstRW<[CortexA510Write<3, CortexA510UnitVALU>],
	(instregex "^(ASR\|LSL\|LSR)_WIDE_ZPmZ_[BHS]$",	(instregex "^(ASR\|LSL\|LSR)_WIDE_ZPmZ_[BHS]",
	"^(ASR\|LSL\|LSR)_WIDE_ZZZ_[BHS]$",	"^(ASR\|LSL\|LSR)_WIDE_ZZZ_[BHS]",
	"^(ASR\|LSL\|LSR)_ZPmI_[BHSD]$",	"^(ASR\|LSL\|LSR)_ZPmI_[BHSD]",
	"^(ASR\|LSL\|LSR)_ZPmZ_[BHSD]$",	"^(ASR\|LSL\|LSR)_ZPZI_[BHSD]",
	"^(ASR\|LSL\|LSR)_ZZI_[BHSD]$",	"^(ASR\|LSL\|LSR)_ZPmZ_[BHSD]",
	"^(ASRR\|LSLR\|LSRR)_ZPmZ_[BHSD]$")>;	"^(ASR\|LSL\|LSR)_ZPZZ_[BHSD]",
		"^(ASR\|LSL\|LSR)_ZZI_[BHSD]",
		"^(ASRR\|LSLR\|LSRR)_ZPmZ_[BHSD]")>;
		// Arithmetic, shift right for divide
		def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>],
		(instregex "^ASRD_ZPmI_[BHSD]",
		"^ASRD_ZPZI_[BHSD]")>;

	// Arithmetic, shift and accumulate	// Arithmetic, shift and accumulate
	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>],	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>],
	(instregex "^(SSRA\|USRA)_ZZI_[BHSD]$")>;	(instregex "^(SSRA\|USRA)_ZZI_[BHSD]")>;

	def : InstRW<[CortexA510MCWrite<7, 2, CortexA510UnitVALU>],	def : InstRW<[CortexA510MCWrite<7, 2, CortexA510UnitVALU>],
	(instregex "^(SRSRA\|URSRA)_ZZI_[BHSD]$")>;	(instregex "^(SRSRA\|URSRA)_ZZI_[BHSD]")>;


	// Arithmetic, shift by immediate	// Arithmetic, shift by immediate
	// Arithmetic, shift by immediate and insert	// Arithmetic, shift by immediate and insert
	def : InstRW<[CortexA510Write<3, CortexA510UnitVALU>],	def : InstRW<[CortexA510Write<3, CortexA510UnitVALU>],
	(instregex "^(SHRNB\|SHRNT\|SSHLLB\|SSHLLT\|USHLLB\|USHLLT\|SLI\|SRI)_ZZI_[BHSD]$")>;	(instregex "^(SHRNB\|SHRNT\|SSHLLB\|SSHLLT\|USHLLB\|USHLLT\|SLI\|SRI)_ZZI_[BHSD]")>;

	// Arithmetic, shift complex	// Arithmetic, shift complex
	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>],	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>],
	(instregex "^(SQ)?RSHRU?N[BT]_ZZI_[BHS]$",	(instregex "^(SQ)?RSHRU?N[BT]_ZZI_[BHS]",
	"^(SQRSHL\|SQRSHLR\|SQSHL\|SQSHLR\|UQRSHL\|UQRSHLR\|UQSHL\|UQSHLR)_ZPmZ_[BHSD]$",	"^(SQRSHL\|SQRSHLR\|SQSHL\|SQSHLR\|UQRSHL\|UQRSHLR\|UQSHL\|UQSHLR)_(ZPmZ\|ZPZZ)_[BHSD]",
	"^(SQSHL\|SQSHLU\|UQSHL)_ZPmI_[BHSD]$",	"^(SQSHL\|SQSHLU\|UQSHL)_(ZPmI\|ZPZI)_[BHSD]",
	"^SQSHRU?N[BT]_ZZI_[BHS]$",	"^SQSHRU?N[BT]_ZZI_[BHS]",
	"^UQR?SHRN[BT]_ZZI_[BHS]$")>;	"^UQR?SHRN[BT]_ZZI_[BHS]")>;

	// Arithmetic, shift right for divide
	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>], (instregex "^ASRD_ZPmI_[BHSD]$")>;

	// Arithmetic, shift rounding	// Arithmetic, shift rounding
	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>],	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>],
	(instregex "^(SRSHL\|SRSHLR\|URSHL\|URSHLR)_ZPmZ_[BHSD]$",	(instregex "^(SRSHL\|SRSHR\|SRSHLR\|URSHL\|URSHLR\|URSHR)_(ZPmZ\|ZPZZ\|ZPZI)_[BHSD]",
	"^[SU]RSHR_ZPmI_[BHSD]$")>;	"^[SU]RSHR_ZPmI_[BHSD]")>;

	// Bit manipulation	// Bit manipulation
	def : InstRW<[CortexA510MCWrite<14, 13, CortexA510UnitVMC>],	def : InstRW<[CortexA510MCWrite<14, 13, CortexA510UnitVMC>],
	(instregex "^(BDEP\|BEXT\|BGRP)_ZZZ_B$")>;	(instregex "^(BDEP\|BEXT\|BGRP)_ZZZ_B")>;

	def : InstRW<[CortexA510MCWrite<22, 21, CortexA510UnitVMC>],	def : InstRW<[CortexA510MCWrite<22, 21, CortexA510UnitVMC>],
	(instregex "^(BDEP\|BEXT\|BGRP)_ZZZ_H$")>;	(instregex "^(BDEP\|BEXT\|BGRP)_ZZZ_H")>;

	def : InstRW<[CortexA510MCWrite<38, 37, CortexA510UnitVMC>],	def : InstRW<[CortexA510MCWrite<38, 37, CortexA510UnitVMC>],
	(instregex "^(BDEP\|BEXT\|BGRP)_ZZZ_S$")>;	(instregex "^(BDEP\|BEXT\|BGRP)_ZZZ_S")>;

	def : InstRW<[CortexA510MCWrite<70, 69, CortexA510UnitVMC>],	def : InstRW<[CortexA510MCWrite<70, 69, CortexA510UnitVMC>],
	(instregex "^(BDEP\|BEXT\|BGRP)_ZZZ_D$")>;	(instregex "^(BDEP\|BEXT\|BGRP)_ZZZ_D")>;


	// Bitwise select	// Bitwise select
	def : InstRW<[CortexA510Write<3, CortexA510UnitVALU>], (instregex "^(BSL\|BSL1N\|BSL2N\|NBSL)_ZZZZ$")>;	def : InstRW<[CortexA510Write<3, CortexA510UnitVALU>], (instregex "^(BSL\|BSL1N\|BSL2N\|NBSL)_ZZZZ")>;

	// Count/reverse bits	// Count/reverse bits
	def : InstRW<[CortexA510Write<3, CortexA510UnitVALU>], (instregex "^(CLS\|CLZ\|RBIT)_ZPmZ_[BHSD]$")>;	def : InstRW<[CortexA510Write<3, CortexA510UnitVALU>], (instregex "^(CLS\|CLZ\|RBIT)_ZPmZ_[BHSD]")>;
	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>], (instregex "^CNT_ZPmZ_[BH]$")>;	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>], (instregex "^CNT_ZPmZ_[BH]")>;
	def : InstRW<[CortexA510Write<8, CortexA510UnitVALU>], (instregex "^CNT_ZPmZ_S$")>;	def : InstRW<[CortexA510Write<8, CortexA510UnitVALU>], (instregex "^CNT_ZPmZ_S")>;
	def : InstRW<[CortexA510Write<12, CortexA510UnitVALU>], (instregex "^CNT_ZPmZ_D$")>;	def : InstRW<[CortexA510Write<12, CortexA510UnitVALU>], (instregex "^CNT_ZPmZ_D")>;
	// Broadcast logical bitmask immediate to vector	// Broadcast logical bitmask immediate to vector
	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>], (instrs DUPM_ZI)>;	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>], (instrs DUPM_ZI)>;

	// Compare and set flags	// Compare and set flags
	def : InstRW<[CortexA510Write<3, CortexA510UnitVALU>],	def : InstRW<[CortexA510Write<3, CortexA510UnitVALU>],
	(instregex "^CMP(EQ\|GE\|GT\|HI\|HS\|LE\|LO\|LS\|LT\|NE)_PPzZ[IZ]_[BHSD]$",	(instregex "^CMP(EQ\|GE\|GT\|HI\|HS\|LE\|LO\|LS\|LT\|NE)_PPzZ[IZ]_[BHSD]",
	"^CMP(EQ\|GE\|GT\|HI\|HS\|LE\|LO\|LS\|LT\|NE)_WIDE_PPzZZ_[BHS]$")>;	"^CMP(EQ\|GE\|GT\|HI\|HS\|LE\|LO\|LS\|LT\|NE)_WIDE_PPzZZ_[BHS]")>;

	// Complex add	// Complex add
	def : InstRW<[CortexA510Write<3, CortexA510UnitVALU>], (instregex "^CADD_ZZI_[BHSD]$")>;	def : InstRW<[CortexA510Write<3, CortexA510UnitVALU>], (instregex "^CADD_ZZI_[BHSD]")>;

	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>], (instregex "^SQCADD_ZZI_[BHSD]$")>;	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>], (instregex "^SQCADD_ZZI_[BHSD]")>;

	// Complex dot product 8-bit element	// Complex dot product 8-bit element
	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instrs CDOT_ZZZ_S, CDOT_ZZZI_S)>;	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instrs CDOT_ZZZ_S, CDOT_ZZZI_S)>;
Context not available.
	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instrs CDOT_ZZZ_D, CDOT_ZZZI_D)>;	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instrs CDOT_ZZZ_D, CDOT_ZZZI_D)>;

	// Complex multiply-add B, H, S element size	// Complex multiply-add B, H, S element size
	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instregex "^CMLA_ZZZ_[BHS]$",	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instregex "^CMLA_ZZZ_[BHS]",
	"^CMLA_ZZZI_[HS]$")>;	"^CMLA_ZZZI_[HS]")>;

	// Complex multiply-add D element size	// Complex multiply-add D element size
	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instrs CMLA_ZZZ_D)>;	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instrs CMLA_ZZZ_D)>;

	// Conditional extract operations, scalar form	// Conditional extract operations, scalar form
	def : InstRW<[CortexA510MCWrite<8, 2, CortexA510UnitVALU>], (instregex "^CLAST[AB]_RPZ_[BHSD]$")>;	def : InstRW<[CortexA510MCWrite<8, 2, CortexA510UnitVALU>], (instregex "^CLAST[AB]_RPZ_[BHSD]")>;

	// Conditional extract operations, SIMD&FP scalar and vector forms	// Conditional extract operations, SIMD&FP scalar and vector forms
	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>], (instregex "^CLAST[AB]_[VZ]PZ_[BHSD]$",	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>], (instregex "^CLAST[AB]_[VZ]PZ_[BHSD]",
	"^COMPACT_ZPZ_[SD]$",	"^COMPACT_ZPZ_[SD]",
	"^SPLICE_ZPZZ?_[BHSD]$")>;	"^SPLICE_ZPZZ?_[BHSD]")>;

	// Convert to floating point, 64b to float or convert to double	// Convert to floating point, 64b to float or convert to double
	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>], (instregex "^[SU]CVTF_ZPmZ_Dto[SD]")>;	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>], (instregex "^[SU]CVTF_ZPmZ_Dto[SD]")>;
Context not available.
	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>], (instregex "^[SU]CVTF_ZPmZ_HtoH")>;	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>], (instregex "^[SU]CVTF_ZPmZ_HtoH")>;

	// Copy, scalar	// Copy, scalar
	def : InstRW<[CortexA510Write<3, CortexA510UnitVALU0>],(instregex "^CPY_ZPmR_[BHSD]$")>;	def : InstRW<[CortexA510Write<3, CortexA510UnitVALU0>],(instregex "^CPY_ZPmR_[BHSD]")>;

	// Copy, scalar SIMD&FP or imm	// Copy, scalar SIMD&FP or imm
	def : InstRW<[CortexA510Write<3, CortexA510UnitVALU>], (instregex "^CPY_ZPm[IV]_[BHSD]$",	def : InstRW<[CortexA510Write<3, CortexA510UnitVALU>], (instregex "^CPY_ZPm[IV]_[BHSD]",
	"^CPY_ZPzI_[BHSD]$")>;	"^CPY_ZPzI_[BHSD]")>;

	// Divides, 32 bit	// Divides, 32 bit
	def : InstRW<[CortexA510MCWrite<15, 12, CortexA510UnitVMC>], (instregex "^[SU]DIVR?_ZPmZ_S$")>;	def : InstRW<[CortexA510MCWrite<15, 12, CortexA510UnitVMC>], (instregex "^[SU]DIVR?_(ZPmZ\|ZPZZ)_S")>;

	// Divides, 64 bit	// Divides, 64 bit
	def : InstRW<[CortexA510MCWrite<26, 23, CortexA510UnitVMC>], (instregex "^[SU]DIVR?_ZPmZ_D$")>;	def : InstRW<[CortexA510MCWrite<26, 23, CortexA510UnitVMC>], (instregex "^[SU]DIVR?_(ZPmZ\|ZPZZ)_D")>;

	// Dot product, 8 bit	// Dot product, 8 bit
	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instregex "^[SU]DOT_ZZZI?_S$")>;	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instregex "^[SU]DOT_ZZZI?_S")>;

	// Dot product, 8 bit, using signed and unsigned integers	// Dot product, 8 bit, using signed and unsigned integers
	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instrs SUDOT_ZZZI, USDOT_ZZZI, USDOT_ZZZ)>;	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instrs SUDOT_ZZZI, USDOT_ZZZI, USDOT_ZZZ)>;

	// Dot product, 16 bit	// Dot product, 16 bit
	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instregex "^[SU]DOT_ZZZI?_D$")>;	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instregex "^[SU]DOT_ZZZI?_D")>;

	// Duplicate, immediate and indexed form	// Duplicate, immediate and indexed form
	def : InstRW<[CortexA510Write<3, CortexA510UnitVALU>], (instregex "^DUP_ZI_[BHSD]$",	def : InstRW<[CortexA510Write<3, CortexA510UnitVALU>], (instregex "^DUP_ZI_[BHSD]",
	"^DUP_ZZI_[BHSDQ]$")>;	"^DUP_ZZI_[BHSDQ]")>;

	// Duplicate, scalar form	// Duplicate, scalar form
	def : InstRW<[CortexA510Write<3, CortexA510UnitVALU>], (instregex "^DUP_ZR_[BHSD]$")>;	def : InstRW<[CortexA510Write<3, CortexA510UnitVALU>], (instregex "^DUP_ZR_[BHSD]")>;

	// Extend, sign or zero	// Extend, sign or zero
	def : InstRW<[CortexA510Write<3, CortexA510UnitVALU>], (instregex "^[SU]XTB_ZPmZ_[HSD]$",	def : InstRW<[CortexA510Write<3, CortexA510UnitVALU>], (instregex "^[SU]XTB_ZPmZ_[HSD]",
	"^[SU]XTH_ZPmZ_[SD]$",	"^[SU]XTH_ZPmZ_[SD]",
	"^[SU]XTW_ZPmZ_[D]$")>;	"^[SU]XTW_ZPmZ_[D]")>;

	// Extract	// Extract
	def : InstRW<[CortexA510Write<3, CortexA510UnitVALU>], (instrs EXT_ZZI, EXT_ZZI_B)>;	def : InstRW<[CortexA510Write<3, CortexA510UnitVALU>], (instrs EXT_ZZI, EXT_ZZI_B)>;

	// Extract narrow saturating	// Extract narrow saturating
	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>], (instregex "^[SU]QXTN[BT]_ZZ_[BHS]$",	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>], (instregex "^[SU]QXTN[BT]_ZZ_[BHS]",
	"^SQXTUN[BT]_ZZ_[BHS]$")>;	"^SQXTUN[BT]_ZZ_[BHS]")>;

	// Extract/insert operation, SIMD and FP scalar form	// Extract/insert operation, SIMD and FP scalar form
	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>], (instregex "^LAST[AB]_VPZ_[BHSD]$",	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>], (instregex "^LAST[AB]_VPZ_[BHSD]",
	"^INSR_ZV_[BHSD]$")>;	"^INSR_ZV_[BHSD]")>;

	// Extract/insert operation, scalar	// Extract/insert operation, scalar
	def : InstRW<[CortexA510MCWrite<8, 2, CortexA510UnitVALU0>], (instregex "^LAST[AB]_RPZ_[BHSD]$",	def : InstRW<[CortexA510MCWrite<8, 2, CortexA510UnitVALU0>], (instregex "^LAST[AB]_RPZ_[BHSD]",
	"^INSR_ZR_[BHSD]$")>;	"^INSR_ZR_[BHSD]")>;

	// Histogram operations	// Histogram operations
	def : InstRW<[CortexA510MCWrite<8, 2, CortexA510UnitVALU0>], (instregex "^HISTCNT_ZPzZZ_[SD]$",	def : InstRW<[CortexA510MCWrite<8, 2, CortexA510UnitVALU0>], (instregex "^HISTCNT_ZPzZZ_[SD]",
	"^HISTSEG_ZZZ$")>;	"^HISTSEG_ZZZ")>;

	// Horizontal operations, B, H, S form, immediate operands only	// Horizontal operations, B, H, S form, immediate operands only
	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instregex "^INDEX_II_[BHS]$")>;	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instregex "^INDEX_II_[BHS]")>;

	// Horizontal operations, B, H, S form, scalar, immediate operands/ scalar	// Horizontal operations, B, H, S form, scalar, immediate operands/ scalar
	// operands only / immediate, scalar operands	// operands only / immediate, scalar operands
	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instregex "^INDEX_(IR\|RI\|RR)_[BHS]$")>;	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instregex "^INDEX_(IR\|RI\|RR)_[BHS]")>;

	// Horizontal operations, D form, immediate operands only	// Horizontal operations, D form, immediate operands only
	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instrs INDEX_II_D)>;	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instrs INDEX_II_D)>;

	// Horizontal operations, D form, scalar, immediate operands)/ scalar operands	// Horizontal operations, D form, scalar, immediate operands)/ scalar operands
	// only / immediate, scalar operands	// only / immediate, scalar operands
	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instregex "^INDEX_(IR\|RI\|RR)_D$")>;	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instregex "^INDEX_(IR\|RI\|RR)_D")>;

	// Logical	// Logical
	def : InstRW<[CortexA510Write<3, CortexA510UnitVALU>],	def : InstRW<[CortexA510Write<3, CortexA510UnitVALU>],
	(instregex "^(AND\|EOR\|ORR)_ZI$",	(instregex "^(AND\|EOR\|ORR)_ZI",
	"^(AND\|BIC\|EOR\|EOR(BT\|TB)?\|ORR)_ZZZ$",	"^(AND\|BIC\|EOR\|EOR\|ORR)_ZZZ",
	"^(AND\|BIC\|EOR\|NOT\|ORR)_ZPmZ_[BHSD]$")>;	"^(AND\|BIC\|EOR\|NOT\|ORR)_ZPmZ_[BHSD]",
		"^(AND\|BIC\|EOR\|NOT\|ORR)_ZPZZ_[BHSD]")>;

	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>],	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>],
	(instregex "^EOR(BT\|TB)_ZZZ_[BHSD]$")>;	(instregex "^EOR(BT\|TB)_ZZZ_[BHSD]")>;

	// Max/min, basic and pairwise	// Max/min, basic and pairwise
	def : InstRW<[CortexA510Write<3, CortexA510UnitVALU>], (instregex "^[SU](MAX\|MIN)_ZI_[BHSD]$",	def : InstRW<[CortexA510Write<3, CortexA510UnitVALU>], (instregex "^[SU](MAX\|MIN)_ZI_[BHSD]",
	"^[SU](MAX\|MIN)P?_ZPmZ_[BHSD]$")>;	"^[SU](MAX\|MIN)P?_(ZPmZ\|ZPZZ)_[BHSD]")>;

	// Matching operations	// Matching operations
	def : InstRW<[CortexA510MCWrite<7, 2, CortexA510UnitVALU>], (instregex "^N?MATCH_PPzZZ_[BH]$")>;	def : InstRW<[CortexA510MCWrite<7, 2, CortexA510UnitVALU>], (instregex "^N?MATCH_PPzZZ_[BH]")>;

	// Matrix multiply-accumulate	// Matrix multiply-accumulate
	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instrs SMMLA_ZZZ, UMMLA_ZZZ, USMMLA_ZZZ)>;	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instrs SMMLA_ZZZ, UMMLA_ZZZ, USMMLA_ZZZ)>;

	// Move prefix	// Move prefix
	def : InstRW<[CortexA510Write<3, CortexA510UnitVALU>], (instregex "^MOVPRFX_ZP[mz]Z_[BHSD]$",	def : InstRW<[CortexA510Write<3, CortexA510UnitVALU>], (instregex "^MOVPRFX_ZP[mz]Z_[BHSD]",
	"^MOVPRFX_ZZ$")>;	"^MOVPRFX_ZZ")>;

	// Multiply, B, H, S element size	// Multiply, B, H, S element size
	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instregex "^MUL_(ZI\|ZPmZ\|ZZZI\|ZZZ)_[BHS]$",	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instregex "^MUL_(ZI\|ZPmZ\|ZZZI\|ZZZ\|ZPZZ)_[BHS]",
	"^[SU]MULH_(ZPmZ\|ZZZ)_[BHS]$")>;	"^[SU]MULH_(ZPmZ\|ZZZ\|ZPZZ)_[BHS]")>;

	// Multiply, D element size	// Multiply, D element size
	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instregex "^MUL_(ZI\|ZPmZ\|ZZZI\|ZZZ)_D$",	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instregex "^MUL_(ZI\|ZPmZ\|ZZZI\|ZZZ\|ZPZZ)_D",
	"^[SU]MULH_(ZPmZ\|ZZZ)_D$")>;	"^[SU]MULH_(ZPmZ\|ZZZ\|ZPZZ)_D")>;

	// Multiply long	// Multiply long
	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instregex "^[SU]MULL[BT]_ZZZI_[SD]$",	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instregex "^[SU]MULL[BT]_ZZZI_[SD]",
	"^[SU]MULL[BT]_ZZZ_[HSD]$")>;	"^[SU]MULL[BT]_ZZZ_[HSD]")>;

	// Multiply accumulate, B, H, S element size	// Multiply accumulate, B, H, S element size
	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instregex "^ML[AS]_ZZZI_[BHS]$",	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instregex "^ML[AS]_(ZZZI\|ZPZZZ)_[BHS]",
	"^(ML[AS]\|MAD\|MSB)_ZPmZZ_[BHS]$")>;	"^(ML[AS]\|MAD\|MSB)_ZPmZZ_[BHS]")>;

	// Multiply accumulate, D element size	// Multiply accumulate, D element size
	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instregex "^ML[AS]_ZZZI_D$",	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instregex "^ML[AS]_(ZZZI\|ZPZZZ)_D",
	"^(ML[AS]\|MAD\|MSB)_ZPmZZ_D$")>;	"^(ML[AS]\|MAD\|MSB)_ZPmZZ_D")>;

	// Multiply accumulate long	// Multiply accumulate long
	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instregex "^[SU]ML[AS]L[BT]_ZZZ_[HSD]$",	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instregex "^[SU]ML[AS]L[BT]_ZZZ_[HSD]",
	"^[SU]ML[AS]L[BT]_ZZZI_[SD]$")>;	"^[SU]ML[AS]L[BT]_ZZZI_[SD]")>;

	// Multiply accumulate saturating doubling long regular	// Multiply accumulate saturating doubling long regular
	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instregex "^SQDML[AS](LB\|LT\|LBT)_ZZZ_[HSD]$",	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instregex "^SQDML[AS](LB\|LT\|LBT)_ZZZ_[HSD]",
	"^SQDML[AS](LB\|LT)_ZZZI_[SD]$")>;	"^SQDML[AS](LB\|LT)_ZZZI_[SD]")>;

	// Multiply saturating doubling high, B, H, S element size	// Multiply saturating doubling high, B, H, S element size
	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instregex "^SQDMULH_ZZZ_[BHS]$",	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instregex "^SQDMULH_ZZZ_[BHS]",
	"^SQDMULH_ZZZI_[HS]$")>;	"^SQDMULH_ZZZI_[HS]")>;

	// Multiply saturating doubling high, D element size	// Multiply saturating doubling high, D element size
	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instrs SQDMULH_ZZZ_D, SQDMULH_ZZZI_D)>;	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instrs SQDMULH_ZZZ_D, SQDMULH_ZZZI_D)>;

	// Multiply saturating doubling long	// Multiply saturating doubling long
	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instregex "^SQDMULL[BT]_ZZZ_[HSD]$",	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instregex "^SQDMULL[BT]_ZZZ_[HSD]",
	"^SQDMULL[BT]_ZZZI_[SD]$")>;	"^SQDMULL[BT]_ZZZI_[SD]")>;

	// Multiply saturating rounding doubling regular/complex accumulate, B, H, S	// Multiply saturating rounding doubling regular/complex accumulate, B, H, S
	// element size	// element size
	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instregex "^SQRDML[AS]H_ZZZ_[BHS]$",	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instregex "^SQRDML[AS]H_ZZZ_[BHS]",
	"^SQRDCMLAH_ZZZ_[BHS]$",	"^SQRDCMLAH_ZZZ_[BHS]",
	"^SQRDML[AS]H_ZZZI_[HS]$",	"^SQRDML[AS]H_ZZZI_[HS]",
	"^SQRDCMLAH_ZZZI_[HS]$")>;	"^SQRDCMLAH_ZZZI_[HS]")>;

	// Multiply saturating rounding doubling regular/complex accumulate, D element	// Multiply saturating rounding doubling regular/complex accumulate, D element
	// size	// size
	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instregex "^SQRDML[AS]H_ZZZI?_D$",	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instregex "^SQRDML[AS]H_ZZZI?_D",
	"^SQRDCMLAH_ZZZ_D$")>;	"^SQRDCMLAH_ZZZ_D")>;

	// Multiply saturating rounding doubling regular/complex, B, H, S element size	// Multiply saturating rounding doubling regular/complex, B, H, S element size
	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instregex "^SQRDMULH_ZZZ_[BHS]$",	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instregex "^SQRDMULH_ZZZ_[BHS]",
	"^SQRDMULH_ZZZI_[HS]$")>;	"^SQRDMULH_ZZZI_[HS]")>;

	// Multiply saturating rounding doubling regular/complex, D element size	// Multiply saturating rounding doubling regular/complex, D element size
	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instregex "^SQRDMULH_ZZZI?_D$")>;	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instregex "^SQRDMULH_ZZZI?_D")>;

	// Multiply/multiply long, (8x8) polynomial	// Multiply/multiply long, (8x8) polynomial
	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>], (instregex "^PMUL_ZZZ_B$")>;	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>], (instregex "^PMUL_ZZZ_B")>;

	def : InstRW<[CortexA510Write<6, CortexA510UnitVMC>], (instregex "^PMULL[BT]_ZZZ_[HDQ]$")>;	def : InstRW<[CortexA510Write<6, CortexA510UnitVMC>], (instregex "^PMULL[BT]_ZZZ_[HDQ]")>;


	// Predicate counting vector	// Predicate counting vector
	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>],	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>],
	(instregex "^(DEC\|INC\|SQDEC\|SQINC\|UQDEC\|UQINC)[HWD]_ZPiI$")>;	(instregex "^(DEC\|INC\|SQDEC\|SQINC\|UQDEC\|UQINC)[HWD]_ZPiI")>;

	// Reciprocal estimate	// Reciprocal estimate
	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instrs URECPE_ZPmZ_S, URSQRTE_ZPmZ_S)>;	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instregex "^URECPE_ZPmZ_S", "^URSQRTE_ZPmZ_S")>;

	// Reduction, arithmetic, B form	// Reduction, arithmetic, B form
	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU0>], (instregex "^[SU](ADD\|MAX\|MIN)V_VPZ_B")>;	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU0>], (instregex "^[SU](ADD\|MAX\|MIN)V_VPZ_B")>;
Context not available.
	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU0>], (instregex "^[SU](ADD\|MAX\|MIN)V_VPZ_D")>;	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU0>], (instregex "^[SU](ADD\|MAX\|MIN)V_VPZ_D")>;

	// Reduction, logical	// Reduction, logical
	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU0>], (instregex "^(ANDV\|EORV\|ORV)_VPZ_[BHSD]$")>;	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU0>], (instregex "^(ANDV\|EORV\|ORV)_VPZ_[BHSD]")>;

	// Reverse, vector	// Reverse, vector
	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>], (instregex "^REV_ZZ_[BHSD]$",	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>], (instregex "^REV_ZZ_[BHSD]",
	"^REVB_ZPmZ_[HSD]$",	"^REVB_ZPmZ_[HSD]",
	"^REVH_ZPmZ_[SD]$",	"^REVH_ZPmZ_[SD]",
	"^REVW_ZPmZ_D$")>;	"^REVW_ZPmZ_D")>;

	// Select, vector form	// Select, vector form
	def : InstRW<[CortexA510Write<3, CortexA510UnitVALU>], (instregex "^SEL_ZPZZ_[BHSD]$")>;	def : InstRW<[CortexA510Write<3, CortexA510UnitVALU>], (instregex "^SEL_ZPZZ_[BHSD]")>;

	// Table lookup	// Table lookup
	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>], (instregex "^TBL_ZZZZ?_[BHSD]$")>;	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>], (instregex "^TBL_ZZZZ?_[BHSD]")>;

	// Table lookup extension	// Table lookup extension
	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>], (instregex "^TBX_ZZZ_[BHSD]$")>;	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>], (instregex "^TBX_ZZZ_[BHSD]")>;

	// Transpose, vector form	// Transpose, vector form
	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>], (instregex "^TRN[12]_ZZZ_[BHSDQ]$")>;	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>], (instregex "^TRN[12]_ZZZ_[BHSDQ]")>;

	// Unpack and extend	// Unpack and extend
	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>], (instregex "^[SU]UNPK(HI\|LO)_ZZ_[HSD]$")>;	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>], (instregex "^[SU]UNPK(HI\|LO)_ZZ_[HSD]")>;

	// Zip/unzip	// Zip/unzip
	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>], (instregex "^(UZP\|ZIP)[12]_ZZZ_[BHSDQ]$")>;	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>], (instregex "^(UZP\|ZIP)[12]_ZZZ_[BHSDQ]")>;

	// SVE floating-point instructions	// SVE floating-point instructions
	// -----------------------------------------------------------------------------	// -----------------------------------------------------------------------------

	// Floating point absolute value/difference	// Floating point absolute value/difference
	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>], (instregex "^FAB[SD]_ZPmZ_[HSD]$")>;	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>], (instregex "^FAB[SD]_ZPmZ_[HSD]",
		"^FAB[SD]_ZPZZ_[HSD]")>;

	// Floating point arithmetic	// Floating point arithmetic
	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>], (instregex "^F(ADD\|SUB)_(ZPm[IZ]\|ZZZ)_[HSD]$",	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>], (instregex "^F(ADD\|SUB)_(ZPm[IZ]\|ZZZ\|ZPZI\|ZPZZ)_[HSD]",
	"^FADDP_ZPmZZ_[HSD]$",	"^FADDP_ZPmZZ_[HSD]",
	"^FNEG_ZPmZ_[HSD]$",	"^FNEG_ZPmZ_[HSD]",
	"^FSUBR_ZPm[IZ]_[HSD]$")>;	"^FSUBR_(ZPm[IZ]\|ZPZ[IZ])_[HSD]")>;

	// Floating point associative add, F16	// Floating point associative add, F16
	def : InstRW<[CortexA510MCWrite<32, 29, CortexA510UnitVALU>], (instrs FADDA_VPZ_H)>;	def : InstRW<[CortexA510MCWrite<32, 29, CortexA510UnitVALU>], (instrs FADDA_VPZ_H)>;
Context not available.
	def : InstRW<[CortexA510MCWrite<8, 5, CortexA510UnitVALU>], (instrs FADDA_VPZ_D)>;	def : InstRW<[CortexA510MCWrite<8, 5, CortexA510UnitVALU>], (instrs FADDA_VPZ_D)>;

	// Floating point compare	// Floating point compare
	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>], (instregex "^FACG[ET]_PPzZZ_[HSD]$",	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>], (instregex "^FACG[ET]_PPzZZ_[HSD]",
	"^FCM(EQ\|GE\|GT\|NE)_PPzZ[0Z]_[HSD]$",	"^FCM(EQ\|GE\|GT\|NE)_PPzZ[0Z]_[HSD]",
	"^FCM(LE\|LT)_PPzZ0_[HSD]$",	"^FCM(LE\|LT)_PPzZ0_[HSD]",
	"^FCMUO_PPzZZ_[HSD]$")>;	"^FCMUO_PPzZZ_[HSD]")>;

	// Floating point complex add	// Floating point complex add
	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>], (instregex "^FCADD_ZPmZ_[HSD]$")>;	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>], (instregex "^FCADD_ZPmZ_[HSD]")>;

	// Floating point complex multiply add	// Floating point complex multiply add
	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instregex "^FCMLA_ZPmZZ_[HSD]$",	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instregex "^FCMLA_ZPmZZ_[HSD]",
	"^FCMLA_ZZZI_[HS]$")>;	"^FCMLA_ZZZI_[HS]")>;

	// Floating point convert, long or narrow (F16 to F32 or F32 to F16)	// Floating point convert, long or narrow (F16 to F32 or F32 to F16)
	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>], (instregex "^FCVT_ZPmZ_(HtoS\|StoH)",	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>], (instregex "^FCVT_ZPmZ_(HtoS\|StoH)",
Context not available.
	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>], (instregex "^FCVTX_ZPmZ_DtoS", "FCVTXNT_ZPmZ_DtoS")>;	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>], (instregex "^FCVTX_ZPmZ_DtoS", "FCVTXNT_ZPmZ_DtoS")>;

	// Floating point base2 log, F16	// Floating point base2 log, F16
	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instrs FLOGB_ZPmZ_H)>;	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instregex "^FLOGB_(ZPmZ\|ZPZZ)_H")>;

	// Floating point base2 log, F32	// Floating point base2 log, F32
	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instrs FLOGB_ZPmZ_S)>;	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instregex "^FLOGB_(ZPmZ\|ZPZZ)_S")>;

	// Floating point base2 log, F64	// Floating point base2 log, F64
	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instrs FLOGB_ZPmZ_D)>;	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instregex "^FLOGB_(ZPmZ\|ZPZZ)_D")>;

	// Floating point convert to integer, F16	// Floating point convert to integer, F16
	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>], (instregex "^FCVTZ[SU]_ZPmZ_HtoH")>;	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>], (instregex "^FCVTZ[SU]_ZPmZ_HtoH")>;
Context not available.
	(instregex "^FCVTZ[SU]_ZPmZ_(HtoD\|StoD\|DtoS\|DtoD)")>;	(instregex "^FCVTZ[SU]_ZPmZ_(HtoD\|StoD\|DtoS\|DtoD)")>;

	// Floating point copy	// Floating point copy
	def : InstRW<[CortexA510Write<3, CortexA510UnitVALU0>], (instregex "^FCPY_ZPmI_[HSD]$",	def : InstRW<[CortexA510Write<3, CortexA510UnitVALU0>], (instregex "^FCPY_ZPmI_[HSD]",
	"^FDUP_ZI_[HSD]$")>;	"^FDUP_ZI_[HSD]")>;

	// Floating point divide, F16	// Floating point divide, F16
	def : InstRW<[CortexA510MCWrite<8, 5, CortexA510UnitVMC>], (instregex "^FDIVR?_ZPmZ_H$")>;	def : InstRW<[CortexA510MCWrite<8, 5, CortexA510UnitVMC>], (instregex "^FDIVR?_(ZPmZ\|ZPZZ)_H")>;

	// Floating point divide, F32	// Floating point divide, F32
	def : InstRW<[CortexA510MCWrite<13, 10, CortexA510UnitVMC>], (instregex "^FDIVR?_ZPmZ_S$")>;	def : InstRW<[CortexA510MCWrite<13, 10, CortexA510UnitVMC>], (instregex "^FDIVR?_(ZPmZ\|ZPZZ)_S")>;

	// Floating point divide, F64	// Floating point divide, F64
	def : InstRW<[CortexA510MCWrite<22, 19, CortexA510UnitVMC>], (instregex "^FDIVR?_ZPmZ_D$")>;	def : InstRW<[CortexA510MCWrite<22, 19, CortexA510UnitVMC>], (instregex "^FDIVR?_(ZPmZ\|ZPZZ)_D")>;

	// Floating point min/max pairwise	// Floating point min/max pairwise
	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>], (instregex "^F(MAX\|MIN)(NM)?P_ZPmZZ_[HSD]$")>;	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>], (instregex "^F(MAX\|MIN)(NM)?P_ZPmZZ_[HSD]")>;

	// Floating point min/max	// Floating point min/max
	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>], (instregex "^F(MAX\|MIN)(NM)?_ZPm[IZ]_[HSD]$")>;	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>], (instregex "^F(MAX\|MIN)(NM)?_(ZPm[IZ]\|ZPZZ\|ZPZI)_[HSD]")>;

	// Floating point multiply	// Floating point multiply
	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instregex "^(FSCALE\|FMULX)_ZPmZ_[HSD]$",	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instregex "^(FSCALE\|FMULX)_(ZPmZ\|ZPZZ)_[HSD]",
	"^FMUL_(ZPm[IZ]\|ZZZI?)_[HSD]$")>;	"^FMUL_(ZPm[IZ]\|ZZZI?\|ZPZI\|ZPZZ)_[HSD]")>;

	// Floating point multiply accumulate	// Floating point multiply accumulate
	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>],	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>],
	(instregex "^FML[AS]_(ZPmZZ\|ZZZI)_[HSD]$",	(instregex "^FML[AS]_(ZPmZZ\|ZZZI\|ZPZZZ)_[HSD]",
	"^(FMAD\|FNMAD\|FNML[AS]\|FN?MSB)_ZPmZZ_[HSD]$")>;	"^(FMAD\|FNMAD\|FNML[AS]\|FN?MSB)_(ZPmZZ\|ZPZZZ)_[HSD]")>;

	// Floating point multiply add/sub accumulate long	// Floating point multiply add/sub accumulate long
	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instregex "^FML[AS]L[BT]_ZZZI?_SHH$")>;	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instregex "^FML[AS]L[BT]_ZZZI?_SHH")>;

	// Floating point reciprocal estimate, F16	// Floating point reciprocal estimate, F16
	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instrs FRECPE_ZZ_H, FRECPX_ZPmZ_H,	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instregex "^FRECPE_ZZ_H", "^FRECPX_ZPmZ_H",
	FRSQRTE_ZZ_H)>;	"^FRSQRTE_ZZ_H")>;

	// Floating point reciprocal estimate, F32	// Floating point reciprocal estimate, F32
	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instrs FRECPE_ZZ_S, FRECPX_ZPmZ_S,	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instregex "^FRECPE_ZZ_S", "^FRECPX_ZPmZ_S",
	FRSQRTE_ZZ_S)>;	"^FRSQRTE_ZZ_S")>;

	// Floating point reciprocal estimate, F64	// Floating point reciprocal estimate, F64
	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instrs FRECPE_ZZ_D, FRECPX_ZPmZ_D,	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>],(instregex "^FRECPE_ZZ_D", "^FRECPX_ZPmZ_D",
	FRSQRTE_ZZ_D)>;	"^FRSQRTE_ZZ_D")>;

	// Floating point reciprocal step	// Floating point reciprocal step
	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instregex "^F(RECPS\|RSQRTS)_ZZZ_[HSD]$")>;	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instregex "^F(RECPS\|RSQRTS)_ZZZ_[HSD]")>;

	// Floating point reduction, F16	// Floating point reduction, F16
	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU0>],	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU0>],
	(instregex "^(FMAXNMV\|FMAXV\|FMINNMV\|FMINV)_VPZ_[HSD]$")>;	(instregex "^(FMAXNMV\|FMAXV\|FMINNMV\|FMINV)_VPZ_[HSD]")>;

	// Floating point reduction, F32	// Floating point reduction, F32
	def : InstRW<[CortexA510MCWrite<12, 11, CortexA510UnitVALU0>],	def : InstRW<[CortexA510MCWrite<12, 11, CortexA510UnitVALU0>],
	(instregex "^FADDV_VPZ_H$")>;	(instregex "^FADDV_VPZ_H")>;

	def : InstRW<[CortexA510MCWrite<8, 5, CortexA510UnitVALU0>],	def : InstRW<[CortexA510MCWrite<8, 5, CortexA510UnitVALU0>],
	(instregex "^FADDV_VPZ_S$")>;	(instregex "^FADDV_VPZ_S")>;

	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU0>],	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU0>],
	(instregex "^FADDV_VPZ_D$")>;	(instregex "^FADDV_VPZ_D")>;


	// Floating point round to integral, F16	// Floating point round to integral, F16
	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>], (instregex "^FRINT[AIMNPXZ]_ZPmZ_H$")>;	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>], (instregex "^FRINT[AIMNPXZ]_ZPmZ_H")>;

	// Floating point round to integral, F32	// Floating point round to integral, F32
	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>], (instregex "^FRINT[AIMNPXZ]_ZPmZ_S$")>;	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>], (instregex "^FRINT[AIMNPXZ]_ZPmZ_S")>;

	// Floating point round to integral, F64	// Floating point round to integral, F64
	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>], (instregex "^FRINT[AIMNPXZ]_ZPmZ_D$")>;	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>], (instregex "^FRINT[AIMNPXZ]_ZPmZ_D")>;

	// Floating point square root, F16	// Floating point square root, F16
	def : InstRW<[CortexA510MCWrite<8, 5, CortexA510UnitVMC>], (instrs FSQRT_ZPmZ_H)>;	def : InstRW<[CortexA510MCWrite<8, 5, CortexA510UnitVMC>], (instregex "^FSQRT_ZPmZ_H")>;

	// Floating point square root, F32	// Floating point square root, F32
	def : InstRW<[CortexA510MCWrite<12, 9, CortexA510UnitVMC>], (instrs FSQRT_ZPmZ_S)>;	def : InstRW<[CortexA510MCWrite<12, 9, CortexA510UnitVMC>], (instregex "^FSQRT_ZPmZ_S")>;

	// Floating point square root, F64	// Floating point square root, F64
	def : InstRW<[CortexA510MCWrite<22, 19, CortexA510UnitVMC>], (instrs FSQRT_ZPmZ_D)>;	def : InstRW<[CortexA510MCWrite<22, 19, CortexA510UnitVMC>], (instregex "^FSQRT_ZPmZ_D")>;

	// Floating point trigonometric exponentiation	// Floating point trigonometric exponentiation
	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instregex "^FEXPA_ZZ_[HSD]$")>;	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instregex "^FEXPA_ZZ_[HSD]")>;

	// Floating point trigonometric multiply add	// Floating point trigonometric multiply add
	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instregex "^FTMAD_ZZI_[HSD]$")>;	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instregex "^FTMAD_ZZI_[HSD]")>;

	// Floating point trigonometric, miscellaneous	// Floating point trigonometric, miscellaneous
	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instregex "^FTSMUL_ZZZ_[HSD]$")>;	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instregex "^FTSMUL_ZZZ_[HSD]")>;
	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>], (instregex "^FTSSEL_ZZZ_[HSD]$")>;	def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>], (instregex "^FTSSEL_ZZZ_[HSD]")>;


	// SVE BFloat16 (BF16) instructions	// SVE BFloat16 (BF16) instructions
Context not available.
	def : InstRW<[A510Write_15cyc_1VMAC_1VALU], (instrs BFMMLA_ZZZ)>;	def : InstRW<[A510Write_15cyc_1VMAC_1VALU], (instrs BFMMLA_ZZZ)>;

	// Multiply accumulate long	// Multiply accumulate long
	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instregex "^BFMLAL[BT]_ZZZ(I)?$")>;	def : InstRW<[CortexA510Write<4, CortexA510UnitVMAC>], (instregex "^BFMLAL[BT]_ZZZ(I)?")>;

	// SVE Load instructions	// SVE Load instructions
	// -----------------------------------------------------------------------------	// -----------------------------------------------------------------------------
Context not available.

llvm/lib/Target/AArch64/AArch64SchedNeoverseV2.td

Context not available.

// Loop control, based on GPR

def : InstRW<[V2Write_3cyc_2M],

(instregex "^WHILE(GE|GT|HI|HS|LE|LO|LS|LT)_P(WW|XX)_[BHSD]$")>;

(instregex "^WHILE(GE|GT|HI|HS|LE|LO|LS|LT)_P(WW|XX)_[BHSD]")>;

def : InstRW<[V2Write_3cyc_2M], (instregex "^WHILE(RW|WR)_PXX_[BHSD]$")>;

def : InstRW<[V2Write_3cyc_2M], (instregex "^WHILE(RW|WR)_PXX_[BHSD]")>;

// Loop terminate

def : InstRW<[V2Write_1cyc_2M], (instregex "^CTERM(EQ|NE)_(WW|XX)$")>;

def : InstRW<[V2Write_1cyc_2M], (instregex "^CTERM(EQ|NE)_(WW|XX)")>;

// Predicate counting scalar

def : InstRW<[V2Write_2cyc_1M], (instrs ADDPL_XXI, ADDVL_XXI, RDVLI_XI)>;

def : InstRW<[V2Write_2cyc_1M],

(instregex "^(CNT|SQDEC|SQINC|UQDEC|UQINC)[BHWD]_XPiI$",

(instregex "^(CNT|SQDEC|SQINC|UQDEC|UQINC)[BHWD]_XPiI",

"^SQ(DEC|INC)[BHWD]_XPiWdI$",

"^SQ(DEC|INC)[BHWD]_XPiWdI",

"^UQ(DEC|INC)[BHWD]_WPiI$")>;

"^UQ(DEC|INC)[BHWD]_WPiI")>;

// Predicate counting scalar, ALL, {1,2,4}

def : InstRW<[V2Write_IncDec], (instregex "^(DEC|INC)[BHWD]_XPiI$")>;

def : InstRW<[V2Write_IncDec], (instregex "^(DEC|INC)[BHWD]_XPiI")>;

// Predicate counting scalar, active predicate

def : InstRW<[V2Write_2cyc_1M],

(instregex "^CNTP_XPP_[BHSD]$",

(instregex "^CNTP_XPP_[BHSD]",

"^(UQDEC|UQINC)P_WP_[BHSD]$",

"^(UQDEC|UQINC)P_WP_[BHSD]",

"^(SQDEC|SQINC)P_XPWd_[BHSD]$")>;

"^(SQDEC|SQINC)P_XPWd_[BHSD]")>;

// Predicate counting vector, active predicate

def : InstRW<[V2Write_7cyc_1M_1M0_1V],

// Predicate logical

def : InstRW<[V2Write_1or2cyc_1M0],

(instregex "^(AND|BIC|EOR|NAND|NOR|ORN|ORR)_PPzPP$")>;

(instregex "^(AND|BIC|EOR|NAND|NOR|ORN|ORR)_PPzPP")>;

// Predicate logical, flag setting

def : InstRW<[V2Write_1or2cyc_1M0_1M],

// Predicate reverse

def : InstRW<[V2Write_2cyc_1M], (instregex "^REV_PP_[BHSD]$")>;

def : InstRW<[V2Write_2cyc_1M], (instregex "^REV_PP_[BHSD]")>;

// Predicate select

def : InstRW<[V2Write_1cyc_1M0], (instrs SEL_PPPP)>;

// Predicate set

def : InstRW<[V2Write_2cyc_1M], (instregex "^PFALSE$", "^PTRUE_[BHSD]$")>;

def : InstRW<[V2Write_2cyc_1M], (instregex "^PFALSE", "^PTRUE_[BHSD]")>;

// Predicate set/initialize, set flags

def : InstRW<[V2Write_3cyc_2M], (instregex "^PTRUES_[BHSD]$")>;

def : InstRW<[V2Write_3cyc_2M], (instregex "^PTRUES_[BHSD]")>;

// Predicate find first/next

def : InstRW<[V2Write_2cyc_1M], (instregex "^PFIRST_B$", "^PNEXT_[BHSD]$")>;

def : InstRW<[V2Write_2cyc_1M], (instregex "^PFIRST_B", "^PNEXT_[BHSD]")>;

// Predicate test

def : InstRW<[V2Write_1cyc_1M], (instrs PTEST_PP)>;

// Predicate transpose

def : InstRW<[V2Write_2cyc_1M], (instregex "^TRN[12]_PPP_[BHSD]$")>;

def : InstRW<[V2Write_2cyc_1M], (instregex "^TRN[12]_PPP_[BHSD]")>;

// Predicate unpack and widen

def : InstRW<[V2Write_2cyc_1M], (instrs PUNPKHI_PP, PUNPKLO_PP)>;

// Predicate zip/unzip

def : InstRW<[V2Write_2cyc_1M], (instregex "^(ZIP|UZP)[12]_PPP_[BHSD]$")>;

def : InstRW<[V2Write_2cyc_1M], (instregex "^(ZIP|UZP)[12]_PPP_[BHSD]")>;

// SVE integer instructions

// -----------------------------------------------------------------------------

// Arithmetic, absolute diff

def : InstRW<[V2Write_2cyc_1V], (instregex "^[SU]ABD_ZPmZ_[BHSD]$",

def : InstRW<[V2Write_2cyc_1V], (instregex "^[SU]ABD_ZPmZ_[BHSD]",

"^[SU]ABD_ZPZZ_UNDEF_[BHSD]$")>;

"^[SU]ABD_ZPZZ_[BHSD]")>;

// Arithmetic, absolute diff accum

def : InstRW<[V2Wr_ZA, V2Rd_ZA], (instregex "^[SU]ABA_ZZZ_[BHSD]$")>;

def : InstRW<[V2Wr_ZA, V2Rd_ZA], (instregex "^[SU]ABA_ZZZ_[BHSD]")>;

// Arithmetic, absolute diff accum long

def : InstRW<[V2Wr_ZA, V2Rd_ZA], (instregex "^[SU]ABAL[TB]_ZZZ_[HSD]$")>;

def : InstRW<[V2Wr_ZA, V2Rd_ZA], (instregex "^[SU]ABAL[TB]_ZZZ_[HSD]")>;

// Arithmetic, absolute diff long

def : InstRW<[V2Write_2cyc_1V], (instregex "^[SU]ABDL[TB]_ZZZ_[HSD]$")>;

def : InstRW<[V2Write_2cyc_1V], (instregex "^[SU]ABDL[TB]_ZZZ_[HSD]")>;

// Arithmetic, basic

def : InstRW<[V2Write_2cyc_1V],

(instregex "^(ABS|ADD|CNOT|NEG|SUB|SUBR)_ZPmZ_[BHSD]$",

(instregex "^(ABS|ADD|CNOT|NEG|SUB|SUBR)_ZPmZ_[BHSD]",

"^(ABS|CNOT|NEG)_ZPmZ_UNDEF_[BHSD]$",

"^(ADD|SUB)_ZZZ_[BHSD]",

rjj (inline comment, unsubmitted; marked Done):

def : InstRW<[V2Write_2cyc_1V],

(instregex "^(ABS|ADD|CNOT|NEG|SUB|SUBR)_ZPmZ_[BHSD]",

- "^(ABS|CNOT|NEG)_ZPmZ_[BHSD]",

"^(ADD|SUB)_ZZZ_[BHSD]",

rjj: This line is actually not necessary anymore
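
The redundancy rjj points out can be checked with a small sketch. It assumes that `instregex` behaves as a prefix match (modelled here with Python's `re.match`), and it uses hypothetical instruction names built in the post-rename `_<size>_UNDEF` style; neither assumption is confirmed by this page. Under those assumptions, every name the narrower pattern accepts is already accepted by the broader, unanchored pattern on the line above it, while the old `$`-anchored form rejects the pseudo names.

```python
import re

# Patterns copied from the diff. Treating instregex as a prefix match
# (re.match) is an assumption made for this sketch.
BROAD = r"^(ABS|ADD|CNOT|NEG|SUB|SUBR)_ZPmZ_[BHSD]"
NARROW = r"^(ABS|CNOT|NEG)_ZPmZ_[BHSD]"
OLD_ANCHORED = r"^(ABS|ADD|CNOT|NEG|SUB|SUBR)_ZPmZ_[BHSD]$"


def matches(pattern, name):
    """Return True if `pattern` matches `name` starting at its first character."""
    return re.match(pattern, name) is not None


# Hypothetical instruction names: each real instruction plus a pseudo
# carrying an _UNDEF suffix after the element size.
OPS = ("ABS", "ADD", "CNOT", "NEG", "SUB", "SUBR")
names = [
    f"{op}_ZPmZ_{size}{suffix}"
    for op in OPS
    for size in "BHSD"
    for suffix in ("", "_UNDEF")
]

# The narrow pattern is redundant if the broad one accepts everything it accepts.
narrow_hits = [n for n in names if matches(NARROW, n)]
redundant = all(matches(BROAD, n) for n in narrow_hits)

# Names that only the unanchored pattern picks up (the pseudos).
gained = [n for n in names if matches(BROAD, n) and not matches(OLD_ANCHORED, n)]

print("narrow pattern redundant:", redundant)
print("names gained by dropping '$':", gained)
```

The same check applies to the other patterns flagged in the later inline comments: swap in the relevant pattern pair and name list.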

"^(ADD|SUB)_ZZZ_[BHSD]$",

"^(ADD|SUB|SUBR)_ZPZZ_[BHSD]",

"^(ADD|SUB|SUBR)_ZI_[BHSD]$",

"^(ADD|SUB|SUBR)_ZI_[BHSD]",

"^ADR_[SU]XTW_ZZZ_D_[0123]$",

"^ADR_[SU]XTW_ZZZ_D_[0123]",

"^ADR_LSL_ZZZ_[SD]_[0123]$",

"^ADR_LSL_ZZZ_[SD]_[0123]",

"^[SU](ADD|SUB)[LW][BT]_ZZZ_[HSD]$",

"^[SU](ADD|SUB)[LW][BT]_ZZZ_[HSD]",

"^SADDLBT_ZZZ_[HSD]$",

"^SADDLBT_ZZZ_[HSD]",

"^[SU]H(ADD|SUB|SUBR)_ZPmZ_[BHSD]$",

"^[SU]H(ADD|SUB|SUBR)_ZPmZ_[BHSD]",

"^SSUBL(BT|TB)_ZZZ_[HSD]$")>;

"^SSUBL(BT|TB)_ZZZ_[HSD]")>;

// Arithmetic, complex

def : InstRW<[V2Write_2cyc_1V],

(instregex "^R?(ADD|SUB)HN[BT]_ZZZ_[BHS]$",

(instregex "^R?(ADD|SUB)HN[BT]_ZZZ_[BHS]",

"^SQ(ABS|ADD|NEG|SUB|SUBR)_ZPmZ_[BHSD]$",

"^SQ(ABS|ADD|NEG|SUB|SUBR)_ZPmZ_[BHSD]",

"^SQ(ABS|NEG)_ZPmZ_UNDEF_[BHSD]$",

"^[SU]Q(ADD|SUB)_ZZZ_[BHSD]",

rjj (inline comment, unsubmitted; marked Done):

(instregex "^R?(ADD|SUB)HN[BT]_ZZZ_[BHS]",

"^SQ(ABS|ADD|NEG|SUB|SUBR)_ZPmZ_[BHSD]",

- "^SQ(ABS|NEG)_ZPmZ_[BHSD]",

"^[SU]Q(ADD|SUB)_ZZZ_[BHSD]",

rjj: Same as above.

"^[SU]Q(ADD|SUB)_ZZZ_[BHSD]$",

"^[SU]Q(ADD|SUB)_ZI_[BHSD]",

"^[SU]Q(ADD|SUB)_ZI_[BHSD]$",

"^(SRH|SUQ|UQ|USQ|URH)ADD_ZPmZ_[BHSD]",

"^(SRH|SUQ|UQ|USQ|URH)ADD_ZPmZ_[BHSD]$",

"^(UQSUB|UQSUBR)_ZPmZ_[BHSD]")>;

"^(UQSUB|UQSUBR)_ZPmZ_[BHSD]$")>;

// Arithmetic, large integer

def : InstRW<[V2Write_2cyc_1V], (instregex "^(AD|SB)CL[BT]_ZZZ_[SD]$")>;

def : InstRW<[V2Write_2cyc_1V], (instregex "^(AD|SB)CL[BT]_ZZZ_[SD]")>;

// Arithmetic, pairwise add

def : InstRW<[V2Write_2cyc_1V], (instregex "^ADDP_ZPmZ_[BHSD]$")>;

def : InstRW<[V2Write_2cyc_1V], (instregex "^ADDP_ZPmZ_[BHSD]")>;

// Arithmetic, pairwise add and accum long

def : InstRW<[V2Wr_ZPA, ReadDefault, V2Rd_ZPA],

(instregex "^[SU]ADALP_ZPmZ_[HSD]$")>;

(instregex "^[SU]ADALP_ZPmZ_[HSD]")>;

// Arithmetic, shift

def : InstRW<[V2Write_2cyc_1V13],

(instregex "^(ASR|LSL|LSR)_WIDE_ZPmZ_[BHS]$",

(instregex "^(ASR|LSL|LSR)_WIDE_ZPmZ_[BHS]",

"^(ASR|LSL|LSR)_WIDE_ZZZ_[BHS]$",

"^(ASR|LSL|LSR)_WIDE_ZZZ_[BHS]",

"^(ASR|LSL|LSR)_ZPmI_[BHSD]$",

"^(ASR|LSL|LSR)_ZPmI_[BHSD]",

"^(ASR|LSL|LSR)_ZPmZ_[BHSD]$",

"^(ASR|LSL|LSR)_ZPmZ_[BHSD]",

"^(ASR|LSL|LSR)_ZZI_[BHSD]$",

"^(ASR|LSL|LSR)_ZZI_[BHSD]",

"^(ASR|LSL|LSR)_ZPZ[IZ]_UNDEF_[BHSD]$",

"^(ASR|LSL|LSR)_ZPZ[IZ]_[BHSD]",

"^(ASRR|LSLR|LSRR)_ZPmZ_[BHSD]$")>;

"^(ASRR|LSLR|LSRR)_ZPmZ_[BHSD]")>;

// Arithmetic, shift and accumulate

def : InstRW<[V2Wr_ZSA, V2Rd_ZSA], (instregex "^[SU]R?SRA_ZZI_[BHSD]$")>;

def : InstRW<[V2Wr_ZSA, V2Rd_ZSA], (instregex "^[SU]R?SRA_ZZI_[BHSD]")>;

// Arithmetic, shift by immediate

def : InstRW<[V2Write_2cyc_1V13], (instregex "^SHRN[BT]_ZZI_[BHS]$",

def : InstRW<[V2Write_2cyc_1V13], (instregex "^SHRN[BT]_ZZI_[BHS]",

"^[SU]SHLL[BT]_ZZI_[HSD]$")>;

"^[SU]SHLL[BT]_ZZI_[HSD]")>;

// Arithmetic, shift by immediate and insert

def : InstRW<[V2Write_2cyc_1V13], (instregex "^(SLI|SRI)_ZZI_[BHSD]$")>;

def : InstRW<[V2Write_2cyc_1V13], (instregex "^(SLI|SRI)_ZZI_[BHSD]")>;

// Arithmetic, shift complex

def : InstRW<[V2Write_4cyc_1V13],

(instregex "^(SQ)?RSHRU?N[BT]_ZZI_[BHS]$",

(instregex "^(SQ)?RSHRU?N[BT]_ZZI_[BHS]",

"^[SU]QR?SHL_ZPZZ_UNDEF_[BHSD]$",

"^[SU]QR?SHL_ZPZZ_[BHSD]",

"^(SQSHL|SQSHLU|UQSHL)_ZPmI_[BHSD]$",

"^(SQSHL|SQSHLU|UQSHL)_(ZPmI|ZPZI)_[BHSD]",

"^SQSHRU?N[BT]_ZZI_[BHS]$",

"^SQSHRU?N[BT]_ZZI_[BHS]",

"^UQR?SHRN[BT]_ZZI_[BHS]$")>;

"^UQR?SHRN[BT]_ZZI_[BHS]")>;

// Arithmetic, shift right for divide

def : InstRW<[V2Write_4cyc_1V13], (instregex "^ASRD_ZPmI_[BHSD]$")>;

def : InstRW<[V2Write_4cyc_1V13], (instregex "^ASRD_(ZPmI|ZPZI)_[BHSD]")>;

// Arithmetic, shift rounding

def : InstRW<[V2Write_4cyc_1V13], (instregex "^[SU]RSHLR?_ZPmZ_[BHSD]$",

def : InstRW<[V2Write_4cyc_1V13], (instregex "^[SU]RSHLR?_ZPmZ_[BHSD]",

"^[SU]RSHL_ZPZZ_UNDEF_[BHSD]$",

"^[SU]RSHL_ZPZZ_[BHSD]",

"^[SU]RSHR_ZPmI_[BHSD]$")>;

"^[SU]RSHR_(ZPmI|ZPZI)_[BHSD]")>;

// Bit manipulation

def : InstRW<[V2Write_6cyc_2V1], (instregex "^(BDEP|BEXT|BGRP)_ZZZ_[BHSD]$")>;

def : InstRW<[V2Write_6cyc_2V1], (instregex "^(BDEP|BEXT|BGRP)_ZZZ_[BHSD]")>;

// Bitwise select

def : InstRW<[V2Write_2cyc_1V], (instregex "^(BSL|BSL1N|BSL2N|NBSL)_ZZZZ$")>;

def : InstRW<[V2Write_2cyc_1V], (instregex "^(BSL|BSL1N|BSL2N|NBSL)_ZZZZ")>;

// Count/reverse bits

def : InstRW<[V2Write_2cyc_1V], (instregex "^(CLS|CLZ|CNT|RBIT)_ZPmZ_[BHSD]$",

def : InstRW<[V2Write_2cyc_1V], (instregex "^(CLS|CLZ|CNT|RBIT)_ZPmZ_[BHSD]")>;

"^(CLS|CLZ|CNT)_ZPmZ_UNDEF_[BHSD]$")>;

// Broadcast logical bitmask immediate to vector

rjj (inline comment, unsubmitted; marked Done):

// Count/reverse bits

- def : InstRW<[V2Write_2cyc_1V], (instregex "^(CLS|CLZ|CNT|RBIT)_ZPmZ_[BHSD]",

- "^(CLS|CLZ|CNT)_ZPmZ_[BHSD]")>;

+ def : InstRW<[V2Write_2cyc_1V], (instregex "^(CLS|CLZ|CNT|RBIT)_ZPmZ_[BHSD]")>;

// Broadcast logical bitmask immediate to vector

rjj: Unnecessary now.

def : InstRW<[V2Write_2cyc_1V], (instrs DUPM_ZI)>;

// Compare and set flags

def : InstRW<[V2Write_4or5cyc_1V0_1M0],

(instregex "^CMP(EQ|GE|GT|HI|HS|LE|LO|LS|LT|NE)_PPzZ[IZ]_[BHSD]$",

(instregex "^CMP(EQ|GE|GT|HI|HS|LE|LO|LS|LT|NE)_PPzZ[IZ]_[BHSD]",

"^CMP(EQ|GE|GT|HI|HS|LE|LO|LS|LT|NE)_WIDE_PPzZZ_[BHS]$")>;

"^CMP(EQ|GE|GT|HI|HS|LE|LO|LS|LT|NE)_WIDE_PPzZZ_[BHS]")>;

// Complex add

def : InstRW<[V2Write_2cyc_1V], (instregex "^(SQ)?CADD_ZZI_[BHSD]$")>;

def : InstRW<[V2Write_2cyc_1V], (instregex "^(SQ)?CADD_ZZI_[BHSD]")>;

// Complex dot product 8-bit element

def : InstRW<[V2Wr_ZDOTB, V2Rd_ZDOTB], (instrs CDOT_ZZZ_S, CDOT_ZZZI_S)>;

Context not available.

def : InstRW<[V2Wr_ZDOTH, V2Rd_ZDOTH], (instrs CDOT_ZZZ_D, CDOT_ZZZI_D)>;

// Complex multiply-add B, H, S element size

def : InstRW<[V2Wr_ZCMABHS, V2Rd_ZCMABHS], (instregex "^CMLA_ZZZ_[BHS]$",

def : InstRW<[V2Wr_ZCMABHS, V2Rd_ZCMABHS], (instregex "^CMLA_ZZZ_[BHS]",

"^CMLA_ZZZI_[HS]$")>;

"^CMLA_ZZZI_[HS]")>;

// Complex multiply-add D element size

def : InstRW<[V2Wr_ZCMAD, V2Rd_ZCMAD], (instrs CMLA_ZZZ_D)>;

// Conditional extract operations, scalar form

def : InstRW<[V2Write_8cyc_1M0_1V01], (instregex "^CLAST[AB]_RPZ_[BHSD]$")>;

def : InstRW<[V2Write_8cyc_1M0_1V01], (instregex "^CLAST[AB]_RPZ_[BHSD]")>;

// Conditional extract operations, SIMD&FP scalar and vector forms

def : InstRW<[V2Write_3cyc_1V1], (instregex "^CLAST[AB]_[VZ]PZ_[BHSD]$",

def : InstRW<[V2Write_3cyc_1V1], (instregex "^CLAST[AB]_[VZ]PZ_[BHSD]",

"^COMPACT_ZPZ_[SD]$",

"^COMPACT_ZPZ_[SD]",

"^SPLICE_ZPZZ?_[BHSD]$")>;

"^SPLICE_ZPZZ?_[BHSD]")>;

// Convert to floating point, 64b to float or convert to double

def : InstRW<[V2Write_3cyc_1V02], (instregex "^[SU]CVTF_ZPmZ_Dto[HSD](_UNDEF)?$",

def : InstRW<[V2Write_3cyc_1V02], (instregex "^[SU]CVTF_ZPmZ_Dto[HSD]",

"^[SU]CVTF_ZPmZ_StoD(_UNDEF)?$")>;

"^[SU]CVTF_ZPmZ_StoD")>;

// Convert to floating point, 32b to single or half

def : InstRW<[V2Write_4cyc_2V02], (instregex "^[SU]CVTF_ZPmZ_Sto[HS](_UNDEF)?$")>;

def : InstRW<[V2Write_4cyc_2V02], (instregex "^[SU]CVTF_ZPmZ_Sto[HS]")>;

// Convert to floating point, 16b to half

def : InstRW<[V2Write_6cyc_4V02], (instregex "^[SU]CVTF_ZPmZ_HtoH(_UNDEF)?$")>;

def : InstRW<[V2Write_6cyc_4V02], (instregex "^[SU]CVTF_ZPmZ_HtoH")>;

// Copy, scalar

def : InstRW<[V2Write_5cyc_1M0_1V], (instregex "^CPY_ZPmR_[BHSD]$")>;

def : InstRW<[V2Write_5cyc_1M0_1V], (instregex "^CPY_ZPmR_[BHSD]")>;

// Copy, scalar SIMD&FP or imm

def : InstRW<[V2Write_2cyc_1V], (instregex "^CPY_ZPm[IV]_[BHSD]$",

def : InstRW<[V2Write_2cyc_1V], (instregex "^CPY_ZPm[IV]_[BHSD]",

"^CPY_ZPzI_[BHSD]$")>;

"^CPY_ZPzI_[BHSD]")>;

// Divides, 32 bit

def : InstRW<[V2Write_12cyc_1V0], (instregex "^[SU]DIVR?_ZPmZ_S$",

def : InstRW<[V2Write_12cyc_1V0], (instregex "^[SU]DIVR?_ZPmZ_S",

"^[SU]DIV_ZPZZ_UNDEF_S$")>;

"^[SU]DIV_ZPZZ_S")>;

// Divides, 64 bit

def : InstRW<[V2Write_20cyc_1V0], (instregex "^[SU]DIVR?_ZPmZ_D$",

def : InstRW<[V2Write_20cyc_1V0], (instregex "^[SU]DIVR?_ZPmZ_D",

"^[SU]DIV_ZPZZ_UNDEF_D$")>;

"^[SU]DIV_ZPZZ_D")>;

// Dot product, 8 bit

def : InstRW<[V2Wr_ZDOTB, V2Rd_ZDOTB], (instregex "^[SU]DOT_ZZZI?_S$")>;

def : InstRW<[V2Wr_ZDOTB, V2Rd_ZDOTB], (instregex "^[SU]DOT_ZZZI?_S")>;

// Dot product, 8 bit, using signed and unsigned integers

def : InstRW<[V2Wr_ZDOTB, V2Rd_ZDOTB], (instrs SUDOT_ZZZI, USDOT_ZZZI, USDOT_ZZZ)>;

// Dot product, 16 bit

def : InstRW<[V2Wr_ZDOTH, V2Rd_ZDOTH], (instregex "^[SU]DOT_ZZZI?_D$")>;

def : InstRW<[V2Wr_ZDOTH, V2Rd_ZDOTH], (instregex "^[SU]DOT_ZZZI?_D")>;

// Duplicate, immediate and indexed form

def : InstRW<[V2Write_2cyc_1V], (instregex "^DUP_ZI_[BHSD]$",

def : InstRW<[V2Write_2cyc_1V], (instregex "^DUP_ZI_[BHSD]",

"^DUP_ZZI_[BHSDQ]$")>;

"^DUP_ZZI_[BHSDQ]")>;

// Duplicate, scalar form

def : InstRW<[V2Write_3cyc_1M0], (instregex "^DUP_ZR_[BHSD]$")>;

def : InstRW<[V2Write_3cyc_1M0], (instregex "^DUP_ZR_[BHSD]")>;

// Extend, sign or zero

def : InstRW<[V2Write_2cyc_1V13], (instregex "^[SU]XTB_ZPmZ(_UNDEF)?_[HSD]$",

def : InstRW<[V2Write_2cyc_1V13], (instregex "^[SU]XTB_ZPmZ_[HSD]",

"^[SU]XTH_ZPmZ(_UNDEF)?_[SD]$",

"^[SU]XTH_ZPmZ_[SD]",

"^[SU]XTW_ZPmZ(_UNDEF)?_[D]$")>;

"^[SU]XTW_ZPmZ_[D]")>;

// Extract

def : InstRW<[V2Write_2cyc_1V], (instrs EXT_ZZI, EXT_ZZI_B)>;

// Extract narrow saturating

def : InstRW<[V2Write_4cyc_1V13], (instregex "^[SU]QXTN[BT]_ZZ_[BHS]$",

def : InstRW<[V2Write_4cyc_1V13], (instregex "^[SU]QXTN[BT]_ZZ_[BHS]",

"^SQXTUN[BT]_ZZ_[BHS]$")>;

"^SQXTUN[BT]_ZZ_[BHS]")>;

// Extract/insert operation, SIMD and FP scalar form

def : InstRW<[V2Write_3cyc_1V1], (instregex "^LAST[AB]_VPZ_[BHSD]$",

def : InstRW<[V2Write_3cyc_1V1], (instregex "^LAST[AB]_VPZ_[BHSD]",

"^INSR_ZV_[BHSD]$")>;

"^INSR_ZV_[BHSD]")>;

// Extract/insert operation, scalar

def : InstRW<[V2Write_6cyc_1V1_1M0], (instregex "^LAST[AB]_RPZ_[BHSD]$",

def : InstRW<[V2Write_6cyc_1V1_1M0], (instregex "^LAST[AB]_RPZ_[BHSD]",

"^INSR_ZR_[BHSD]$")>;

"^INSR_ZR_[BHSD]")>;

// Histogram operations

def : InstRW<[V2Write_2cyc_1V], (instregex "^HISTCNT_ZPzZZ_[SD]$",

def : InstRW<[V2Write_2cyc_1V], (instregex "^HISTCNT_ZPzZZ_[SD]",

"^HISTSEG_ZZZ$")>;

"^HISTSEG_ZZZ")>;

// Horizontal operations, B, H, S form, immediate operands only

def : InstRW<[V2Write_4cyc_1V02], (instregex "^INDEX_II_[BHS]$")>;

def : InstRW<[V2Write_4cyc_1V02], (instregex "^INDEX_II_[BHS]")>;

// Horizontal operations, B, H, S form, scalar, immediate operands/ scalar

// operands only / immediate, scalar operands

def : InstRW<[V2Write_7cyc_1M0_1V02], (instregex "^INDEX_(IR|RI|RR)_[BHS]$")>;

def : InstRW<[V2Write_7cyc_1M0_1V02], (instregex "^INDEX_(IR|RI|RR)_[BHS]")>;

// Horizontal operations, D form, immediate operands only

def : InstRW<[V2Write_5cyc_2V02], (instrs INDEX_II_D)>;

// Horizontal operations, D form, scalar, immediate operands)/ scalar operands

// only / immediate, scalar operands

def : InstRW<[V2Write_8cyc_2M0_2V02], (instregex "^INDEX_(IR|RI|RR)_D$")>;

def : InstRW<[V2Write_8cyc_2M0_2V02], (instregex "^INDEX_(IR|RI|RR)_D")>;

// Logical

def : InstRW<[V2Write_2cyc_1V],

(instregex "^(AND|EOR|ORR)_ZI$",

(instregex "^(AND|EOR|ORR)_ZI",

"^(AND|BIC|EOR|ORR)_ZZZ$",

"^(AND|BIC|EOR|ORR)_ZZZ",

"^EOR(BT|TB)_ZZZ_[BHSD]$",

"^EOR(BT|TB)_ZZZ_[BHSD]",

"^(AND|BIC|EOR|NOT|ORR)_ZPmZ_[BHSD]$",

"^(AND|BIC|EOR|NOT|ORR)_ZPmZ_[BHSD]",

"^NOT_ZPmZ_UNDEF_[BHSD]$")>;

"^(AND|BIC|EOR|NOT|ORR)_ZPZZ_[BHSD]",

"^NOT_ZPmZ_[BHSD]")>;

// Max/min, basic and pairwise

def : InstRW<[V2Write_2cyc_1V], (instregex "^[SU](MAX|MIN)_ZI_[BHSD]$",

def : InstRW<[V2Write_2cyc_1V], (instregex "^[SU](MAX|MIN)_ZI_[BHSD]",

"^[SU](MAX|MIN)P?_ZPmZ_[BHSD]$",

"^[SU](MAX|MIN)P?_ZPmZ_[BHSD]",

"^[SU](MAX|MIN)_ZPZZ_UNDEF_[BHSD]$")>;

"^[SU](MAX|MIN)_ZPZZ_[BHSD]")>;

// Matching operations

// FIXME: SOG p. 44, n. 5: If the consuming instruction has a flag source, the

// latency for this instruction is 4 cycles.

def : InstRW<[V2Write_2or3cyc_1V0_1M], (instregex "^N?MATCH_PPzZZ_[BH]$")>;

def : InstRW<[V2Write_2or3cyc_1V0_1M], (instregex "^N?MATCH_PPzZZ_[BH]")>;

// Matrix multiply-accumulate

def : InstRW<[V2Wr_ZMMA, V2Rd_ZMMA], (instrs SMMLA_ZZZ, UMMLA_ZZZ, USMMLA_ZZZ)>;

// Move prefix

def : InstRW<[V2Write_2cyc_1V], (instregex "^MOVPRFX_ZP[mz]Z_[BHSD]$",

def : InstRW<[V2Write_2cyc_1V], (instregex "^MOVPRFX_ZP[mz]Z_[BHSD]",

"^MOVPRFX_ZZ$")>;

"^MOVPRFX_ZZ")>;

// Multiply, B, H, S element size

def : InstRW<[V2Write_4cyc_1V02], (instregex "^MUL_(ZI|ZPmZ|ZZZI|ZZZ)_[BHS]$",

def : InstRW<[V2Write_4cyc_1V02], (instregex "^MUL_(ZI|ZPmZ|ZZZI|ZZZ)_[BHS]",

"^MUL_ZPZZ_UNDEF_[BHS]$",

"^MUL_ZPZZ_[BHS]",

"^[SU]MULH_(ZPmZ|ZZZ)_[BHS]$",

"^[SU]MULH_(ZPmZ|ZZZ)_[BHS]",

"^[SU]MULH_ZPZZ_UNDEF_[BHS]$")>;

"^[SU]MULH_ZPZZ_[BHS]")>;

// Multiply, D element size

def : InstRW<[V2Write_5cyc_2V02], (instregex "^MUL_(ZI|ZPmZ|ZZZI|ZZZ)_D$",

def : InstRW<[V2Write_5cyc_2V02], (instregex "^MUL_(ZI|ZPmZ|ZZZI|ZZZ)_D",

"^MUL_ZPZZ_UNDEF_D$",

"^MUL_ZPZZ_D",

"^[SU]MULH_(ZPmZ|ZZZ)_D$",

"^[SU]MULH_(ZPmZ|ZZZ)_D",

"^[SU]MULH_ZPZZ_UNDEF_D$")>;

"^[SU]MULH_ZPZZ_D")>;

// Multiply long

def : InstRW<[V2Write_4cyc_1V02], (instregex "^[SU]MULL[BT]_ZZZI_[SD]$",

def : InstRW<[V2Write_4cyc_1V02], (instregex "^[SU]MULL[BT]_ZZZI_[SD]",

"^[SU]MULL[BT]_ZZZ_[HSD]$")>;

"^[SU]MULL[BT]_ZZZ_[HSD]")>;

// Multiply accumulate, B, H, S element size

def : InstRW<[V2Wr_ZMABHS, V2Rd_ZMABHS],

(instregex "^ML[AS]_ZZZI_[HS]$", "^ML[AS]_ZPZZZ_UNDEF_[BHS]$")>;

(instregex "^ML[AS]_ZZZI_[HS]", "^ML[AS]_ZPZZZ_[BHS]")>;

def : InstRW<[V2Wr_ZMABHS, ReadDefault, V2Rd_ZMABHS],

(instregex "^(ML[AS]|MAD|MSB)_ZPmZZ_[BHS]$")>;

(instregex "^(ML[AS]|MAD|MSB)_ZPmZZ_[BHS]")>;

// Multiply accumulate, D element size

def : InstRW<[V2Wr_ZMAD, V2Rd_ZMAD],

(instregex "^ML[AS]_ZZZI_D$", "^ML[AS]_ZPZZZ_UNDEF_D$")>;

(instregex "^ML[AS]_ZZZI_D", "^ML[AS]_ZPZZZ_D")>;

def : InstRW<[V2Wr_ZMAD, ReadDefault, V2Rd_ZMAD],

(instregex "^(ML[AS]|MAD|MSB)_ZPmZZ_D$")>;

(instregex "^(ML[AS]|MAD|MSB)_ZPmZZ_D")>;

// Multiply accumulate long

def : InstRW<[V2Wr_ZMAL, V2Rd_ZMAL], (instregex "^[SU]ML[AS]L[BT]_ZZZ_[HSD]$",

def : InstRW<[V2Wr_ZMAL, V2Rd_ZMAL], (instregex "^[SU]ML[AS]L[BT]_ZZZ_[HSD]",

"^[SU]ML[AS]L[BT]_ZZZI_[SD]$")>;

"^[SU]ML[AS]L[BT]_ZZZI_[SD]")>;

// Multiply accumulate saturating doubling long regular

def : InstRW<[V2Wr_ZMASQL, V2Rd_ZMASQ],

(instregex "^SQDML[AS]L(B|T|BT)_ZZZ_[HSD]$",

(instregex "^SQDML[AS]L(B|T|BT)_ZZZ_[HSD]",

"^SQDML[AS]L[BT]_ZZZI_[SD]$")>;

"^SQDML[AS]L[BT]_ZZZI_[SD]")>;

// Multiply saturating doubling high, B, H, S element size

def : InstRW<[V2Write_4cyc_1V02], (instregex "^SQDMULH_ZZZ_[BHS]$",

def : InstRW<[V2Write_4cyc_1V02], (instregex "^SQDMULH_ZZZ_[BHS]",

"^SQDMULH_ZZZI_[HS]$")>;

"^SQDMULH_ZZZI_[HS]")>;

// Multiply saturating doubling high, D element size

def : InstRW<[V2Write_5cyc_2V02], (instrs SQDMULH_ZZZ_D, SQDMULH_ZZZI_D)>;

// Multiply saturating doubling long

def : InstRW<[V2Write_4cyc_1V02], (instregex "^SQDMULL[BT]_ZZZ_[HSD]$",

def : InstRW<[V2Write_4cyc_1V02], (instregex "^SQDMULL[BT]_ZZZ_[HSD]",

"^SQDMULL[BT]_ZZZI_[SD]$")>;

"^SQDMULL[BT]_ZZZI_[SD]")>;

// Multiply saturating rounding doubling regular/complex accumulate, B, H, S

// element size

def : InstRW<[V2Wr_ZMASQBHS, V2Rd_ZMASQ], (instregex "^SQRDML[AS]H_ZZZ_[BHS]$",

def : InstRW<[V2Wr_ZMASQBHS, V2Rd_ZMASQ], (instregex "^SQRDML[AS]H_ZZZ_[BHS]",

"^SQRDCMLAH_ZZZ_[BHS]$",

"^SQRDCMLAH_ZZZ_[BHS]",

"^SQRDML[AS]H_ZZZI_[HS]$",

"^SQRDML[AS]H_ZZZI_[HS]",

"^SQRDCMLAH_ZZZI_[HS]$")>;

"^SQRDCMLAH_ZZZI_[HS]")>;

// Multiply saturating rounding doubling regular/complex accumulate, D element

// size

def : InstRW<[V2Wr_ZMASQD, V2Rd_ZMASQ], (instregex "^SQRDML[AS]H_ZZZI?_D$",

def : InstRW<[V2Wr_ZMASQD, V2Rd_ZMASQ], (instregex "^SQRDML[AS]H_ZZZI?_D",

"^SQRDCMLAH_ZZZ_D$")>;

"^SQRDCMLAH_ZZZ_D")>;

// Multiply saturating rounding doubling regular/complex, B, H, S element size

def : InstRW<[V2Write_4cyc_1V02], (instregex "^SQRDMULH_ZZZ_[BHS]$",

def : InstRW<[V2Write_4cyc_1V02], (instregex "^SQRDMULH_ZZZ_[BHS]",

"^SQRDMULH_ZZZI_[HS]$")>;

"^SQRDMULH_ZZZI_[HS]")>;

// Multiply saturating rounding doubling regular/complex, D element size

def : InstRW<[V2Write_5cyc_2V02], (instregex "^SQRDMULH_ZZZI?_D$")>;

def : InstRW<[V2Write_5cyc_2V02], (instregex "^SQRDMULH_ZZZI?_D")>;

// Multiply/multiply long, (8x8) polynomial

def : InstRW<[V2Write_2cyc_1V23], (instregex "^PMUL_ZZZ_B$",

def : InstRW<[V2Write_2cyc_1V23], (instregex "^PMUL_ZZZ_B",

"^PMULL[BT]_ZZZ_[HDQ]$")>;

"^PMULL[BT]_ZZZ_[HDQ]")>;

// Predicate counting vector

def : InstRW<[V2Write_2cyc_1V], (instregex "^([SU]Q)?(DEC|INC)[HWD]_ZPiI$")>;

def : InstRW<[V2Write_2cyc_1V], (instregex "^([SU]Q)?(DEC|INC)[HWD]_ZPiI")>;

// Reciprocal estimate

def : InstRW<[V2Write_4cyc_2V02], (instrs URECPE_ZPmZ_S, URSQRTE_ZPmZ_S,

def : InstRW<[V2Write_4cyc_2V02], (instregex "^URECPE_ZPmZ_S", "^URSQRTE_ZPmZ_S")>;

URECPE_ZPmZ_UNDEF_S, URSQRTE_ZPmZ_UNDEF_S)>;

// Reduction, arithmetic, B form

rjj: These should still use the explicit forms, or be changed to instregex. (Done)

def : InstRW<[V2Write_9cyc_2V_4V13], (instregex "^[SU](ADD|MAX|MIN)V_VPZ_B")>;

Context not available.

def : InstRW<[V2Write_4cyc_2V], (instregex "^[SU](ADD|MAX|MIN)V_VPZ_D")>;

// Reduction, logical

def : InstRW<[V2Write_6cyc_1V_1V13], (instregex "^(AND|EOR|OR)V_VPZ_[BHSD]$")>;

def : InstRW<[V2Write_6cyc_1V_1V13], (instregex "^(AND|EOR|OR)V_VPZ_[BHSD]")>;

// Reverse, vector

def : InstRW<[V2Write_2cyc_1V], (instregex "^REV_ZZ_[BHSD]$",

def : InstRW<[V2Write_2cyc_1V], (instregex "^REV_ZZ_[BHSD]",

"^REVB_ZPmZ_[HSD]$",

"^REVB_ZPmZ_[HSD]",

"^REVH_ZPmZ_[SD]$",

"^REVH_ZPmZ_[SD]",

"^REVW_ZPmZ_D$")>;

"^REVW_ZPmZ_D")>;

// Select, vector form

def : InstRW<[V2Write_2cyc_1V], (instregex "^SEL_ZPZZ_[BHSD]$")>;

def : InstRW<[V2Write_2cyc_1V], (instregex "^SEL_ZPZZ_[BHSD]")>;

// Table lookup

def : InstRW<[V2Write_2cyc_1V], (instregex "^TBL_ZZZZ?_[BHSD]$")>;

def : InstRW<[V2Write_2cyc_1V], (instregex "^TBL_ZZZZ?_[BHSD]")>;

// Table lookup extension

def : InstRW<[V2Write_2cyc_1V], (instregex "^TBX_ZZZ_[BHSD]$")>;

def : InstRW<[V2Write_2cyc_1V], (instregex "^TBX_ZZZ_[BHSD]")>;

// Transpose, vector form

def : InstRW<[V2Write_2cyc_1V], (instregex "^TRN[12]_ZZZ_[BHSDQ]$")>;

def : InstRW<[V2Write_2cyc_1V], (instregex "^TRN[12]_ZZZ_[BHSDQ]")>;

// Unpack and extend

def : InstRW<[V2Write_2cyc_1V], (instregex "^[SU]UNPK(HI|LO)_ZZ_[HSD]$")>;

def : InstRW<[V2Write_2cyc_1V], (instregex "^[SU]UNPK(HI|LO)_ZZ_[HSD]")>;

// Zip/unzip

def : InstRW<[V2Write_2cyc_1V], (instregex "^(UZP|ZIP)[12]_ZZZ_[BHSDQ]$")>;

def : InstRW<[V2Write_2cyc_1V], (instregex "^(UZP|ZIP)[12]_ZZZ_[BHSDQ]")>;

// SVE floating-point instructions

// -----------------------------------------------------------------------------

// Floating point absolute value/difference

def : InstRW<[V2Write_2cyc_1V], (instregex "^FAB[SD]_ZPmZ_[HSD]$",

def : InstRW<[V2Write_2cyc_1V], (instregex "^FAB[SD]_ZPmZ_[HSD]",

"^FABD_ZPZZ_UNDEF_[HSD]$",

"^FABD_ZPZZ_[HSD]",

"^FABS_ZPmZ_UNDEF_[HSD]$")>;

"^FABS_ZPmZ_[HSD]")>;

// Floating point arithmetic

def : InstRW<[V2Write_2cyc_1V], (instregex "^F(ADD|SUB)_(ZPm[IZ]|ZZZ)_[HSD]$",

def : InstRW<[V2Write_2cyc_1V], (instregex "^F(ADD|SUB)_(ZPm[IZ]|ZZZ)_[HSD]",

"^F(ADD|SUB)_ZPZ[IZ]_UNDEF_[HSD]$",

"^F(ADD|SUB)_ZPZ[IZ]_[HSD]",

"^FADDP_ZPmZZ_[HSD]$",

"^FADDP_ZPmZZ_[HSD]",

"^FNEG_ZPmZ(_UNDEF)?_[HSD]$",

"^FNEG_ZPmZ_[HSD]",

"^FSUBR_ZPm[IZ]_[HSD]$",

"^FSUBR_ZPm[IZ]_[HSD]",

"^FSUBR_ZPZI_UNDEF_[HSD]$")>;

"^FSUBR_(ZPZI|ZPZZ)_[HSD]")>;

// Floating point associative add, F16

def : InstRW<[V2Write_10cyc_1V1_9rc], (instrs FADDA_VPZ_H)>;

Context not available.

def : InstRW<[V2Write_4cyc_1V], (instrs FADDA_VPZ_D)>;

// Floating point compare

def : InstRW<[V2Write_2cyc_1V0], (instregex "^FACG[ET]_PPzZZ_[HSD]$",

def : InstRW<[V2Write_2cyc_1V0], (instregex "^FACG[ET]_PPzZZ_[HSD]",

"^FCM(EQ|GE|GT|NE)_PPzZ[0Z]_[HSD]$",

"^FCM(EQ|GE|GT|NE)_PPzZ[0Z]_[HSD]",

"^FCM(LE|LT)_PPzZ0_[HSD]$",

"^FCM(LE|LT)_PPzZ0_[HSD]",

"^FCMUO_PPzZZ_[HSD]$")>;

"^FCMUO_PPzZZ_[HSD]")>;

// Floating point complex add

def : InstRW<[V2Write_3cyc_1V], (instregex "^FCADD_ZPmZ_[HSD]$")>;

def : InstRW<[V2Write_3cyc_1V], (instregex "^FCADD_ZPmZ_[HSD]")>;

// Floating point complex multiply add

def : InstRW<[V2Wr_ZFCMA, ReadDefault, V2Rd_ZFCMA], (instregex "^FCMLA_ZPmZZ_[HSD]$")>;

def : InstRW<[V2Wr_ZFCMA, ReadDefault, V2Rd_ZFCMA], (instregex "^FCMLA_ZPmZZ_[HSD]")>;

def : InstRW<[V2Wr_ZFCMA, V2Rd_ZFCMA], (instregex "^FCMLA_ZZZI_[HS]$")>;

def : InstRW<[V2Wr_ZFCMA, V2Rd_ZFCMA], (instregex "^FCMLA_ZZZI_[HS]")>;

// Floating point convert, long or narrow (F16 to F32 or F32 to F16)

def : InstRW<[V2Write_4cyc_2V02], (instregex "^FCVT_ZPmZ_(HtoS|StoH)(_UNDEF)?$",

def : InstRW<[V2Write_4cyc_2V02], (instregex "^FCVT_ZPmZ_(HtoS|StoH)",

"^FCVTLT_ZPmZ_HtoS$",

"^FCVTLT_ZPmZ_HtoS",

"^FCVTNT_ZPmZ_StoH$")>;

"^FCVTNT_ZPmZ_StoH")>;

// Floating point convert, long or narrow (F16 to F64, F32 to F64, F64 to F32

// or F64 to F16)

def : InstRW<[V2Write_3cyc_1V02], (instregex "^FCVT_ZPmZ_(HtoD|StoD|DtoS|DtoH)(_UNDEF)?$",

def : InstRW<[V2Write_3cyc_1V02], (instregex "^FCVT_ZPmZ_(HtoD|StoD|DtoS|DtoH)",

"^FCVTLT_ZPmZ_StoD$",

"^FCVTLT_ZPmZ_StoD",

"^FCVTNT_ZPmZ_DtoS$")>;

"^FCVTNT_ZPmZ_DtoS")>;

// Floating point convert, round to odd

def : InstRW<[V2Write_3cyc_1V02], (instrs FCVTX_ZPmZ_DtoS, FCVTXNT_ZPmZ_DtoS)>;

// Floating point base2 log, F16

def : InstRW<[V2Write_6cyc_4V02], (instrs FLOGB_ZPmZ_H)>;

def : InstRW<[V2Write_6cyc_4V02], (instregex "^FLOGB_(ZPmZ|ZPZZ)_H")>;

// Floating point base2 log, F32

def : InstRW<[V2Write_4cyc_2V02], (instrs FLOGB_ZPmZ_S)>;

def : InstRW<[V2Write_4cyc_2V02], (instregex "^FLOGB_(ZPmZ|ZPZZ)_S")>;

// Floating point base2 log, F64

def : InstRW<[V2Write_3cyc_1V02], (instrs FLOGB_ZPmZ_D)>;

def : InstRW<[V2Write_3cyc_1V02], (instregex "^FLOGB_(ZPmZ|ZPZZ)_D")>;

// Floating point convert to integer, F16

def : InstRW<[V2Write_6cyc_4V02], (instregex "^FCVTZ[SU]_ZPmZ_HtoH(_UNDEF)?$")>;

def : InstRW<[V2Write_6cyc_4V02], (instregex "^FCVTZ[SU]_ZPmZ_HtoH")>;

// Floating point convert to integer, F32

def : InstRW<[V2Write_4cyc_2V02], (instregex "^FCVTZ[SU]_ZPmZ_(HtoS|StoS)(_UNDEF)?$")>;

def : InstRW<[V2Write_4cyc_2V02], (instregex "^FCVTZ[SU]_ZPmZ_(HtoS|StoS)")>;

// Floating point convert to integer, F64

def : InstRW<[V2Write_3cyc_1V02],

(instregex "^FCVTZ[SU]_ZPmZ_(HtoD|StoD|DtoS|DtoD)(_UNDEF)?$")>;

(instregex "^FCVTZ[SU]_ZPmZ_(HtoD|StoD|DtoS|DtoD)")>;

// Floating point copy

def : InstRW<[V2Write_2cyc_1V], (instregex "^FCPY_ZPmI_[HSD]$",

def : InstRW<[V2Write_2cyc_1V], (instregex "^FCPY_ZPmI_[HSD]",

"^FDUP_ZI_[HSD]$")>;

"^FDUP_ZI_[HSD]")>;

// Floating point divide, F16

def : InstRW<[V2Write_13cyc_1V02_12rc], (instregex "^FDIVR?_ZPmZ_H$",

def : InstRW<[V2Write_13cyc_1V02_12rc], (instregex "^FDIVR?_(ZPmZ|ZPZZ)_H")>;

"^FDIV_ZPZZ_UNDEF_H$")>;

// Floating point divide, F32

def : InstRW<[V2Write_10cyc_1V02_9rc], (instregex "^FDIVR?_ZPmZ_S$",

def : InstRW<[V2Write_10cyc_1V02_9rc], (instregex "^FDIVR?_(ZPmZ|ZPZZ)_S")>;

"^FDIV_ZPZZ_UNDEF_S$")>;

// Floating point divide, F64

def : InstRW<[V2Write_15cyc_1V02_14rc], (instregex "^FDIVR?_ZPmZ_D$",

def : InstRW<[V2Write_15cyc_1V02_14rc], (instregex "^FDIVR?_(ZPmZ|ZPZZ)_D")>;

"^FDIV_ZPZZ_UNDEF_D$")>;

// Floating point min/max pairwise

def : InstRW<[V2Write_2cyc_1V], (instregex "^F(MAX|MIN)(NM)?P_ZPmZZ_[HSD]$")>;

def : InstRW<[V2Write_2cyc_1V], (instregex "^F(MAX|MIN)(NM)?P_ZPmZZ_[HSD]")>;

// Floating point min/max

def : InstRW<[V2Write_2cyc_1V], (instregex "^F(MAX|MIN)(NM)?_ZPm[IZ]_[HSD]$",

def : InstRW<[V2Write_2cyc_1V], (instregex "^F(MAX|MIN)(NM)?_ZPm[IZ]_[HSD]",

"^F(MAX|MIN)(NM)?_ZPZ[IZ]_UNDEF_[HSD]$")>;

"^F(MAX|MIN)(NM)?_ZPZ[IZ]_[HSD]")>;

// Floating point multiply

def : InstRW<[V2Write_3cyc_1V], (instregex "^(FSCALE|FMULX)_ZPmZ_[HSD]$",

def : InstRW<[V2Write_3cyc_1V], (instregex "^(FSCALE|FMULX)_ZPmZ_[HSD]",

"^FMULX_ZPZZ_UNDEF_[HSD]$",

"^FMULX_ZPZZ_[HSD]",

"^FMUL_(ZPm[IZ]|ZZZI?)_[HSD]$",

"^FMUL_(ZPm[IZ]|ZZZI?)_[HSD]",

"^FMUL_ZPZ[IZ]_UNDEF_[HSD]$")>;

"^FMUL_ZPZ[IZ]_[HSD]")>;

// Floating point multiply accumulate

def : InstRW<[V2Wr_ZFMA, ReadDefault, V2Rd_ZFMA],

(instregex "^FN?ML[AS]_ZPmZZ_[HSD]$",

(instregex "^FN?ML[AS]_ZPmZZ_[HSD]",

"^FN?(MAD|MSB)_ZPmZZ_[HSD]$")>;

"^FN?(MAD|MSB)_ZPmZZ_[HSD]")>;

def : InstRW<[V2Wr_ZFMA, V2Rd_ZFMA],

(instregex "^FML[AS]_ZZZI_[HSD]$",

(instregex "^FML[AS]_ZZZI_[HSD]",

"^FN?ML[AS]_ZPZZZ_UNDEF_[HSD]$")>;

"^FN?ML[AS]_ZPZZZ_[HSD]")>;

// Floating point multiply add/sub accumulate long

def : InstRW<[V2Wr_ZFMAL, V2Rd_ZFMAL], (instregex "^FML[AS]L[BT]_ZZZI?_SHH$")>;

def : InstRW<[V2Wr_ZFMAL, V2Rd_ZFMAL], (instregex "^FML[AS]L[BT]_ZZZI?_SHH")>;

// Floating point reciprocal estimate, F16

def : InstRW<[V2Write_6cyc_4V02], (instrs FRECPE_ZZ_H, FRECPX_ZPmZ_H,

def : InstRW<[V2Write_6cyc_4V02], (instregex "^FR(ECP|SQRT)E_ZZ_H", "^FRECPX_ZPmZ_H")>;

FRSQRTE_ZZ_H, FRECPX_ZPmZ_UNDEF_H)>;

// Floating point reciprocal estimate, F32

def : InstRW<[V2Write_4cyc_2V02], (instrs FRECPE_ZZ_S, FRECPX_ZPmZ_S,

def : InstRW<[V2Write_4cyc_2V02], (instregex "^FR(ECP|SQRT)E_ZZ_S", "^FRECPX_ZPmZ_S")>;

FRSQRTE_ZZ_S, FRECPX_ZPmZ_UNDEF_S)>;

// Floating point reciprocal estimate, F64

def : InstRW<[V2Write_3cyc_1V02], (instrs FRECPE_ZZ_D, FRECPX_ZPmZ_D,

def : InstRW<[V2Write_3cyc_1V02], (instregex "^FR(ECP|SQRT)E_ZZ_D", "^FRECPX_ZPmZ_D")>;

rjj: Should still use the explicit form. (Done)

FRSQRTE_ZZ_D, FRECPX_ZPmZ_UNDEF_D)>;

// Floating point reciprocal step

def : InstRW<[V2Write_4cyc_1V], (instregex "^F(RECPS|RSQRTS)_ZZZ_[HSD]$")>;

def : InstRW<[V2Write_4cyc_1V], (instregex "^F(RECPS|RSQRTS)_ZZZ_[HSD]")>;

rjj: Ditto. (Done)

// Floating point reduction, F16

def : InstRW<[V2Write_8cyc_4V],

(instregex "^(FADDV|FMAXNMV|FMAXV|FMINNMV|FMINV)_VPZ_H$")>;

(instregex "^(FADDV|FMAXNMV|FMAXV|FMINNMV|FMINV)_VPZ_H")>;

rjj: Ditto. (Done)

// Floating point reduction, F32

def : InstRW<[V2Write_6cyc_3V],

(instregex "^(FADDV|FMAXNMV|FMAXV|FMINNMV|FMINV)_VPZ_S$")>;

(instregex "^(FADDV|FMAXNMV|FMAXV|FMINNMV|FMINV)_VPZ_S")>;

// Floating point reduction, F64

def : InstRW<[V2Write_4cyc_2V],

(instregex "^(FADDV|FMAXNMV|FMAXV|FMINNMV|FMINV)_VPZ_D$")>;

(instregex "^(FADDV|FMAXNMV|FMAXV|FMINNMV|FMINV)_VPZ_D")>;

// Floating point round to integral, F16

def : InstRW<[V2Write_6cyc_4V02], (instregex "^FRINT[AIMNPXZ]_ZPmZ(_UNDEF)?_H$")>;

def : InstRW<[V2Write_6cyc_4V02], (instregex "^FRINT[AIMNPXZ]_ZPmZ_H")>;

// Floating point round to integral, F32

def : InstRW<[V2Write_4cyc_2V02], (instregex "^FRINT[AIMNPXZ]_ZPmZ(_UNDEF)?_S$")>;

def : InstRW<[V2Write_4cyc_2V02], (instregex "^FRINT[AIMNPXZ]_ZPmZ_S")>;

// Floating point round to integral, F64

def : InstRW<[V2Write_3cyc_1V02], (instregex "^FRINT[AIMNPXZ]_ZPmZ(_UNDEF)?_D$")>;

def : InstRW<[V2Write_3cyc_1V02], (instregex "^FRINT[AIMNPXZ]_ZPmZ_D")>;

// Floating point square root, F16

def : InstRW<[V2Write_13cyc_1V0_12rc], (instrs FSQRT_ZPmZ_H, FSQRT_ZPmZ_UNDEF_H)>;

def : InstRW<[V2Write_13cyc_1V0_12rc], (instregex "^FSQRT_ZPmZ_H", "^FSQRT_ZPmZ_H")>;

// Floating point square root, F32

def : InstRW<[V2Write_10cyc_1V0_9rc], (instrs FSQRT_ZPmZ_S, FSQRT_ZPmZ_UNDEF_S)>;

def : InstRW<[V2Write_10cyc_1V0_9rc], (instregex "^FSQRT_ZPmZ_S", "^FSQRT_ZPmZ_S")>;

// Floating point square root, F64

def : InstRW<[V2Write_16cyc_1V0_14rc], (instrs FSQRT_ZPmZ_D, FSQRT_ZPmZ_UNDEF_D)>;

def : InstRW<[V2Write_16cyc_1V0_14rc], (instregex "^FSQRT_ZPmZ_D", "^FSQRT_ZPmZ_D")>;

// Floating point trigonometric exponentiation

rjj: Should use the explicit form. (Done)

def : InstRW<[V2Write_3cyc_1V1], (instregex "^FEXPA_ZZ_[HSD]$")>;

def : InstRW<[V2Write_3cyc_1V1], (instregex "^FEXPA_ZZ_[HSD]")>;

// Floating point trigonometric multiply add

rjj: Ditto. (Done)

def : InstRW<[V2Write_4cyc_1V], (instregex "^FTMAD_ZZI_[HSD]$")>;

def : InstRW<[V2Write_4cyc_1V], (instregex "^FTMAD_ZZI_[HSD]")>;

// Floating point trigonometric, miscellaneous

rjj: Ditto. (Done)

def : InstRW<[V2Write_3cyc_1V], (instregex "^FTS(MUL|SEL)_ZZZ_[HSD]$")>;

def : InstRW<[V2Write_3cyc_1V], (instregex "^FTS(MUL|SEL)_ZZZ_[HSD]")>;

// SVE BFloat16 (BF16) instructions

// -----------------------------------------------------------------------------

Context not available.

def : InstRW<[V2Wr_ZBFMMA, V2Rd_ZBFMMA], (instrs BFMMLA_ZZZ)>;

// Multiply accumulate long

def : InstRW<[V2Wr_ZBFMAL, V2Rd_ZBFMAL], (instregex "^BFMLAL[BT]_ZZZI?$")>;

def : InstRW<[V2Wr_ZBFMAL, V2Rd_ZBFMAL], (instregex "^BFMLAL[BT]_ZZZI?")>;

// SVE Load instructions

// -----------------------------------------------------------------------------

Context not available.
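
The pattern changes above mostly drop the trailing `$`, so that one pattern covers both an instruction and its `_UNDEF`-suffixed pseudo. A minimal C++ sketch of that matching behaviour follows. It assumes `instregex` behaves like a match anchored at the start of the instruction name and open at the end unless `$` is given. `matchesFromStart` and `checkUndefPatterns` are hypothetical helpers built on `std::regex`, not TableGen's implementation, and the instruction names are the ones mentioned in the inline review comments.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true if Pattern matches Name starting at its first
// character. Without a trailing '$' the match may stop before the end of Name.
bool matchesFromStart(const std::string &Pattern, const std::string &Name) {
  return std::regex_search(Name, std::regex(Pattern),
                           std::regex_constants::match_continuous);
}

// Checks the three pattern shapes against the old and renamed pseudo names.
void checkUndefPatterns() {
  const std::string OldStyle = "^(ABS|CNOT|NEG)_ZPmZ_UNDEF_[BHSD]$";
  const std::string Suffixed = "^(ABS|CNOT|NEG)_ZPmZ_[BHSD]_UNDEF$";
  const std::string Prefix = "^(ABS|CNOT|NEG)_ZPmZ_[BHSD]";

  // The old pattern stops matching once UNDEF moves to the end of the name.
  assert(matchesFromStart(OldStyle, "ABS_ZPmZ_UNDEF_B"));
  assert(!matchesFromStart(OldStyle, "ABS_ZPmZ_B_UNDEF"));

  // An explicit suffix pattern matches only the pseudo.
  assert(matchesFromStart(Suffixed, "ABS_ZPmZ_B_UNDEF"));
  assert(!matchesFromStart(Suffixed, "ABS_ZPmZ_B"));

  // The open-ended form matches both the real instruction and the pseudo.
  assert(matchesFromStart(Prefix, "ABS_ZPmZ_B"));
  assert(matchesFromStart(Prefix, "ABS_ZPmZ_B_UNDEF"));
}
```

The open-ended form also matches any other opcode that happens to share the prefix, so it is broader than listing opcodes explicitly with `instrs`.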

llvm/unittests/Target/AArch64/AArch64SvePseudoTest.cpp

This file was added.

				#include "AArch64InstrInfo.h"
				#include "AArch64Subtarget.h"
				#include "AArch64TargetMachine.h"
				#include "llvm/MC/MCSubtargetInfo.h"
				#include "llvm/MC/TargetRegistry.h"
				#include "llvm/Support/TargetSelect.h"
				#include "llvm/Target/TargetMachine.h"
				#include "llvm/Target/TargetOptions.h"

				#include "gtest/gtest.h"

				using namespace llvm;
				namespace {
				std::unique_ptr<LLVMTargetMachine> createTargetMachine(const std::string &CPU) {
				auto TT(Triple::normalize("aarch64--"));

				LLVMInitializeAArch64TargetInfo();
				LLVMInitializeAArch64Target();
				LLVMInitializeAArch64TargetMC();

				std::string Error;
				const Target *TheTarget = TargetRegistry::lookupTarget(TT, Error);

				return std::unique_ptr<LLVMTargetMachine>(static_cast<LLVMTargetMachine *>(
				TheTarget->createTargetMachine(TT, CPU, "", TargetOptions(), std::nullopt,
				std::nullopt, CodeGenOpt::Default)));
				}

				std::unique_ptr<AArch64InstrInfo> createInstrInfo(TargetMachine *TM) {
				AArch64Subtarget ST(TM->getTargetTriple(), std::string(TM->getTargetCPU()),
				std::string(TM->getTargetCPU()),
				std::string(TM->getTargetFeatureString()), *TM,
				true);
				return std::make_unique<AArch64InstrInfo>(ST);
				}
				dmgreen: false -> true (Done)

				void runSVEPseudoTestForCPU(const std::string &CPU) {

				std::unique_ptr<LLVMTargetMachine> TM = createTargetMachine(CPU);
				dmgreen: Can you add CortexA510 to this test name? It would be good to make use of the test in other CPUs like N2 and V2 (in a future patch). Perhaps pass the CPU as a parameter to createTargetMachine too? And maybe make it clear that it is testing scheduling info. AArch64SVESchedPseudoTest or something similar? (Done)
				ASSERT_TRUE(TM);
				std::unique_ptr<AArch64InstrInfo> II = createInstrInfo(TM.get());
				ASSERT_TRUE(II);

				const MCSubtargetInfo *STI = TM->getMCSubtargetInfo();
				MCSchedModel SchedModel = STI->getSchedModel();

				for (unsigned i = 0; i < AArch64::INSTRUCTION_LIST_END; ++i) {
				// Check if instruction is in the pseudo table
				// i holds the opcode of the pseudo, OrigInstr holds the opcode of the
				// original instruction
				rjj: nit: typo (Done)
				int OrigInstr = AArch64::getSVEPseudoMap(i);
				if (OrigInstr == -1)
				dmgreen: origInstr -> OrigInstr, as per the LLVM coding standard in https://llvm.org/docs/CodingStandards.html#name-types-functions-variables-and-enumerators-properly. (Done)
				continue;

				const MCInstrDesc &Desc = II->get(i);
				unsigned SCClass = Desc.getSchedClass();
				const MCSchedClassDesc *SCDesc = SchedModel.getSchedClassDesc(SCClass);

				const MCInstrDesc &DescOrig = II->get(OrigInstr);
				unsigned SCClassOrig = DescOrig.getSchedClass();
				dmgreen: Descorig -> DescOrig, same below. (Done)
				const MCSchedClassDesc *SCDescOrig =
				SchedModel.getSchedClassDesc(SCClassOrig);

				int Latency = 0;
				int LatencyOrig = 0;

				for (unsigned DefIdx = 0, DefEnd = SCDesc->NumWriteLatencyEntries;
				DefIdx != DefEnd; ++DefIdx) {
				const MCWriteLatencyEntry *WLEntry =
				STI->getWriteLatencyEntry(SCDesc, DefIdx);
				const MCWriteLatencyEntry *WLEntryOrig =
				STI->getWriteLatencyEntry(SCDescOrig, DefIdx);
				Latency = std::max(Latency, static_cast<int>(WLEntry->Cycles));
				// Accumulate the original's maximum independently of the pseudo's.
				LatencyOrig = std::max(LatencyOrig, static_cast<int>(WLEntryOrig->Cycles));
				}

				ASSERT_EQ(Latency, LatencyOrig);
				ASSERT_TRUE(SCDesc->isValid());
				}

				}

				TEST(AArch64SVESchedPseudoTest, IsCorrect) {
				dmgreen: Can you make this two tests, one for each CPU? It can make the tests easier to work with if they fail, where it is better to have each test more independent. The neoverse-v2 part may want to be moved to D154232 to keep that patch as an NFC. This patch could then be to improve A510. (Not Done)
				paulwalker-arm: @dmgreen - The AArch64SchedNeoverseV2.td changes are functional, are they not? As in, this patch is about adding scheduling information for SVE pseudo instructions, which covers multiple scheduling models. Or have I misunderstood your comment? (Not Done)
				dmgreen: The V2 scheduling model was already matching `(ABS|CNOT|NEG)_ZPmZ_UNDEF_[BHSD]$`. It needs to be changed to match `(ABS|CNOT|NEG)_ZPmZ_[BHSD]_UNDEF$` (or `(ABS|CNOT|NEG)_ZPmZ_[BHSD]` as is done here) in order to keep it matching the same instructions. I'm not sure if there were other missing UNDEF instructions? I might just include them in D154232 if there were, but it may be better to pull it out into a different patch. (Not Done)
				paulwalker-arm: I see. Thanks. So I was thinking D154232 would just be a literal change to move the `UNDEF` matching to where it needs to be. Then this patch can unify the patterns as it's already doing, which would then catch any missing ones. (Not Done)
				// TODO : Add more CPUs that support SVE/SVE2
				runSVEPseudoTestForCPU("cortex-a510");
				runSVEPseudoTestForCPU("neoverse-v2");
				}
				} // namespace
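
The loop in this test reduces each scheduling class to its largest write latency and then compares the pseudo's value with the original instruction's. A minimal standalone sketch of that comparison follows, with each maximum accumulated independently. `WriteLatencies`, `maxLatency` and `pseudoMatchesOriginal` are hypothetical stand-ins for illustration and are not LLVM APIs; the real test reads the cycles from `MCWriteLatencyEntry`.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for the per-definition write latencies (in cycles)
// of one scheduling class.
using WriteLatencies = std::vector<int>;

// Returns the largest latency in Entries, or 0 if there are none.
int maxLatency(const WriteLatencies &Entries) {
  int Latency = 0;
  for (int Cycles : Entries)
    Latency = std::max(Latency, Cycles);
  return Latency;
}

// True if the pseudo and the original instruction report the same maximum
// write latency. Each side is reduced on its own before comparing.
bool pseudoMatchesOriginal(const WriteLatencies &Pseudo,
                           const WriteLatencies &Orig) {
  return maxLatency(Pseudo) == maxLatency(Orig);
}
```

Reducing each side separately means a mismatch is reported in both directions, whether the pseudo's latency is higher or lower than the original's.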

llvm/unittests/Target/AArch64/CMakeLists.txt

Context not available.
	  InstSizes.cpp
	  MatrixRegisterAliasing.cpp
	  SMEAttributesTest.cpp
	+ AArch64SvePseudoTest.cpp
	  )

	  set_property(TARGET AArch64Tests PROPERTY FOLDER "Tests/UnitTests/TargetTests")
Context not available.


Revision Contents

Diff 536303

llvm/lib/Target/AArch64/AArch64SchedA510.td

llvm/lib/Target/AArch64/AArch64SchedNeoverseV2.td

llvm/unittests/Target/AArch64/AArch64SvePseudoTest.cpp

llvm/unittests/Target/AArch64/CMakeLists.txt
