This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Refines the Cortex-A57 Machine Model
ClosedPublic

Authored by cestes on Sep 16 2014, 1:53 PM.

Details

Reviewers

t.p.northover
jmolloy
atrick
Jiangning
apazos

Summary

The largest refinement is to model the Cortex-A57 as an in-order
machine to help ensure that the maximum number of micro-ops are
issued into the out-of-order stages on every cycle. Modeling
it as a 3-wide in-order machine helps to ensure that each of
the 3 micro-ops that are decoded and dispatched per cycle can
be issued immediately.

Secondly, a few advanced features are modeled, including forwarding
for MAC instructions and hazards for floating point SQRT and DIV.

Lastly, all of the instructions with inaccurate latency or
micro-op information are refined to be as accurate as possible.
These refinements are largely for the NEON instructions.
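
The forwarding mentioned in the summary is expressed in the diff as a read advance on the consuming operand. A minimal sketch of the arithmetic, assuming the scheduler subtracts the advance from the producer's latency and clamps at zero; the function name is hypothetical, and the numbers are the `A57WriteIVMA`/`A57ReadIVMA4` and `A57WriteFPMA`/`A57ReadFPMA5` values from the diff:

```python
def operand_latency(write_latency: int, read_advance: int = 0) -> int:
    """Cycles a consumer waits on a producer once forwarding is applied."""
    return max(0, write_latency - read_advance)

# ASIMD multiply-accumulate long: Latency = 5, read advance = 4.
print(operand_latency(5))     # a consumer that does not use the forwarding path
print(operand_latency(5, 4))  # a following MAC reading through A57ReadIVMA4

# Scalar FP multiply-add: Latency = 9, read advance = 5.
print(operand_latency(9, 5))
```

The advance only applies when the producer is one of the writes listed in the `SchedReadAdvance`, which is why the sketch leaves it at 0 by default.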

Event Timeline

cestes updated this revision to Diff 13763. Sep 16 2014, 1:53 PM

cestes retitled this revision to [AArch64] Refines the Cortex-A57 Machine Model.

cestes updated this object.

cestes edited the test plan for this revision.

cestes added reviewers: atrick, apazos, jmolloy, t.p.northover.

Herald added subscribers: mcrosier, aemerson. Sep 16 2014, 1:53 PM

mcrosier added a reviewer: Jiangning. Sep 16 2014, 2:01 PM

FWIW, Dave and I discussed this offline. IMO, this is actually three separate patches and should be committed as such. However, I told him to go ahead and post a single patch to simplify the review process. I'll defer to Andy and others to decide whether they'd like to review separate patches.

Chad

Setting IssueWidth=3 is correct. That really means how many micro-ops can be "handled" per cycle. So it should be the minimum of decode/issue width. To be precise, we should have a decodeWidth that counts instructions, but I never bothered to add it since IssueWidth can serve the same purpose.
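
As a rough illustration of the rule in this paragraph (a sketch, not LLVM code): the model's IssueWidth is the narrower of the decode and issue widths. The function name is hypothetical; the 3 and 8 come from the "3-way decode and 8-way issue" comment in the original model.

```python
def effective_issue_width(decode_width: int, issue_width: int) -> int:
    """Micro-ops the model can 'handle' per cycle: the narrower stage wins."""
    return min(decode_width, issue_width)

# Cortex-A57 figures quoted in the patch: 3-wide decode/dispatch, 8-wide issue.
A57_ISSUE_WIDTH = effective_issue_width(decode_width=3, issue_width=8)
print(A57_ISSUE_WIDTH)
```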

I don't think MinLatency is being used anymore by the generic machine scheduler. I made a note to rip that out.

MicroOpBufferSize determines in-order modeling of latency. It's your machine, so if you want to model it as in-order and get better results, then I can't argue!

You could go even further and model the in-order stalls on functional units that are not fully pipelined by setting BufferSize=0.
Note that you can have a mix of in-order/out-of-order resources if you choose.

You can also model just a certain class of instructions as having in-order latency by boosting MicroOpBufferSize and setting BufferSize=1. You can have a class of instructions consume multiple resources so you could model both in-order resource contention and latency.

Note that the idea behind modeling out-of-order is that we don't want an instruction issue limitation to be modeled as a hard stall that preempts all other heuristics. There are thresholds and heuristics that then come into play to try to balance resources. However, the default heuristics are very conservative, in the sense that the schedule is preserved unless we suspect a real stall (first do no harm). Given the scheduler only sees a single block, it often doesn't do anything to improve issue bandwidth on an aggressive OOO model. The scheduler could be improved by recognizing loops, inferring a steady cpu state and adjusting heuristics. I've added some loop awareness to the heuristics but it could be much better.

Since you have plenty of registers, scheduling in-order probably doesn't often hurt and is occasionally useful depending on how effective the hardware is at balancing instruction dispatch. You'll probably see a lot of unnecessary shuffling with in-order scheduling, but if you get better performance, then it's worth it.

One thing you will notice is that interdependent instructions will no longer be scheduled in the same 3-wide decoding group. Since we're not inserting nops, it's probably not a big deal though.
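
A toy sketch of the grouping constraint described here, assuming a 3-wide in-order front end and that a consumer can never share a group with its producer. It only packs instructions in program order; it does not reorder them the way the real scheduler would, and all names are hypothetical.

```python
def pack_in_order(instrs, deps, width=3):
    """Greedy in-order packing into issue groups.

    instrs: instruction names in program order.
    deps:   maps an instruction to the set of instructions it reads from.
    An instruction joins the current group only if the group has room and
    none of its producers are in that same group.
    """
    groups = [[]]
    for name in instrs:
        current = groups[-1]
        depends_on_current = any(p in current for p in deps.get(name, ()))
        if len(current) == width or depends_on_current:
            groups.append([])
            current = groups[-1]
        current.append(name)
    return groups

program = ["a", "b", "c", "d", "e"]
deps = {"b": {"a"}, "e": {"d"}}
print(pack_in_order(program, deps))
```

With these inputs, `b` is pushed out of `a`'s group even though there is room for it, which is the effect being described.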

This revision is now accepted and ready to land. Sep 16 2014, 4:37 PM

Jiangning added inline comments. Sep 16 2014, 11:10 PM

lib/Target/AArch64/AArch64SchedA57.td
523	Where are FN?MUL[DS]rr ?

Hi Dave,

I tried your patch on ToT and got the following results (a negative number is an improvement).

spec.cpu2000.ref.175_vpr -1.10%
spec.cpu2000.ref.177_mesa -2.46%
spec.cpu2000.ref.179_art 1.96%
spec.cpu2000.ref.183_equake 4.30%
spec.cpu2000.ref.252_eon 2.06%
spec.cpu2000.ref.254_gap 1.59%
spec.cpu2000.ref.256_bzip2 1.49%
spec.cpu2000.ref.300_twolf 3.71%

Somehow we see regressions for spec2000.

Thanks,
-Jiangning

I'm seeing strong improvements for Spec2000 on device here, so I'll try ToT too and get to the bottom of this.

Thanks.

lib/Target/AArch64/AArch64SchedA57.td
523	Thanks for the feedback, Jiangning. In this case, FN?MUL[DS]rr instructions don't have a specific InstRW, because their default WriteFMul has been mapped to the correct specific SchedWrite already, A57Write_5cyc_1V. I only use InstRWs to refine instructions that aren't correct with the default mappings.
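
The lookup order described in this reply can be sketched as follows. This is an illustration only, not how TableGen resolves scheduling classes internally: the helper is hypothetical, the table entries are copied from the diff, and the assumption that `FADDDrr` defaults to `WriteF` is mine. Python's `re.match` is used because it anchors at the start of the name.

```python
import re

# Default per-class mappings (SchedAlias entries quoted from the diff).
SCHED_ALIAS = {
    "WriteF": "A57Write_3cyc_1V",
    "WriteFMul": "A57Write_5cyc_1V",
}

# Per-instruction overrides (InstRW entries quoted from the diff).
INST_RW = [
    (r"^F(ADD|SUB)[DS]rr", "A57Write_5cyc_1V"),
]

def resolve(instr_name: str, default_write: str) -> str:
    """Return the A57 write for an instruction: InstRW first, else the alias."""
    for pattern, write in INST_RW:
        if re.match(pattern, instr_name):
            return write
    return SCHED_ALIAS[default_write]

print(resolve("FMULDrr", "WriteFMul"))  # no InstRW needed, the alias already fits
print(resolve("FADDDrr", "WriteF"))     # InstRW overrides the 3-cycle default
```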

> Setting IssueWidth=3 is correct. That really means how many micro-ops can be "handled" per cycle. So it should be the minimum of decode/issue width. To be precise, we should have a decodeWidth that counts instructions, but I never bothered to add it since IssueWidth can serve the same purpose.

Thanks for the clarification.

> MicroOpBufferSize determines in-order modeling of latency. It's your machine, so if you want to model it as in-order and get better results, then I can't argue!
>
> You could go even further and model the in-order stalls on functional units that are not fully pipelined by setting BufferSize=0.
> Note that you can have a mix of in-order/out-of-order resources if you choose.

I figured there were some tradeoffs with modeling purely in-order, but the gains were so broadly beneficial that it was a no-brainer. I really want to do just this and model both the in-order and out-of-order portions of the pipelines for each instruction. It wasn't immediately obvious how to do it, so I temporarily shelved the idea. Might be a nice experiment for a proposed SchedMachineModel tutorial. :)

> You can also model just a certain class of instructions as having in-order latency by boosting MicroOpBufferSize and setting BufferSize=1. You can have a class of instructions consume multiple resources so you could model both in-order resource contention and latency.
>
> Note that the idea behind modeling out-of-order is that we don't want an instruction issue limitation to be modeled as a hard stall that preempts all other heuristics. There are thresholds and heuristics that then come into play to try to balance resources. However, the default heuristics are very conservative, in the sense that the schedule is preserved unless we suspect a real stall (first do no harm). Given the scheduler only sees a single block, it often doesn't do anything to improve issue bandwidth on an aggressive OOO model. The scheduler could be improved by recognizing loops, inferring a steady cpu state and adjusting heuristics. I've added some loop awareness to the heuristics but it could be much better.

I really like this idea of adjusting heuristics. Think this is something that PGO can also help with?

> Since you have plenty of registers, scheduling in-order probably doesn't often hurt and is occasionally useful depending on how effective the hardware is at balancing instruction dispatch. You'll probably see a lot of unnecessary shuffling with in-order scheduling, but if you get better performance, then it's worth it.
>
> One thing you will notice is that interdependent instructions will no longer be scheduled in the same 3-wide decoding group. Since we're not inserting nops, it's probably not a big deal though.

Thanks again for all of the clarification, Andy.

Jiangning,

I did some more runs and I've got mixed news. Seems I've been a bit more
focused on this new model's gains over -mcpu=generic rather than using
the existing A57 model as a baseline. The reason was primarily because
our earlier testing showed the existing A57 model performing very
poorly. However, I re-did my runs using the existing A57 model as a
baseline and it actually performs really well. So that's the good news.
The mediocre news is that increasing the accuracy of the model has
merely shifted performance around and not actually increased it.

With that said, I'm going to do some more experimenting and then I'm
going to try to model the in-order and out-of-order resources
accordingly in a hope that I can capture the best gains. It might take a
bit of time, but I'll hopefully replace this patch with those efforts.

In the meantime, I might have some questions for you guys but I'll take
that chatter off-list.

-Dave

Hi Dave,

I’ve discovered that we should be running the FPLoadBalancing pass AFTER
the Post-RA scheduler. We aren’t, and I thought we were.

The FPLoadBalancing pass is sensitive to instruction order - a permutation
such as might be expected if the post-RA scheduler does its job could
cause worse performance.

I’ve looked into switching it to later but it exposed a couple of bugs, so
I’m working on fixing those first.

Cheers,
James

Update changes from 3-way issue in-order to 3-way issue out-of-order.

All,

This new patchset moves the model back to out-of-order yet restricts the issue-width to the minimum of the actual issue width and dispatch width as Andy suggested. It brought the Spec2000/2006 numbers back up and even outperformed the original model by a few percent (geomean). It also improved the EEMBC numbers by a percent (geomean). I did see some degradation in individual tests, but nothing horrible. It will take some more detailed analysis to determine the cause there.

Jiangning,

In the meantime, if you can replicate the performance gain, then I'd like to move forward with this review, because the more accurate latency information will be key to future analysis and refinements.

Thanks...
-Dave

Dave,

I'm running the benchmarks and will let you know the results as soon as I have them.

Thanks,
-Jiangning

2014-09-24 22:47 GMT+08:00 Dave Estes <cestes@codeaurora.org>:

> All,
>
> This new patchset moves the model back to out-of-order yet restricts the
> issue-width to the minimum of the actual issue width and dispatch width as
> Andy suggested. It brought the Spec2000/2006 numbers back up and even
> outperformed the original model by a few percent (geomean). It also
> improved the EEMBC numbers by a percent (geomean). I did see some
> degradation in individual tests, but nothing horrible. It will take some
> more detailed analysis to determine the cause there.
>
> Jiangning,
>
> In the meantime, if you can replicate the performance gain, then I'd like
> to move forward with this review, because the more accurate latency
> information will be key to future analysis and refinements.
>
> Thanks...
> -Dave
>
> http://reviews.llvm.org/D5372

Hi Dave,

The new version shows good potential, I think.

spec.cpu2000.ref.300_twolf -4.44%
spec.cpu2000.ref.175_vpr -2.58%
spec.cpu2000.ref.255_vortex -1.39%
spec.cpu2000.ref.254_gap 1.40%
spec.cpu2000.ref.183_equake 2.88%

Thanks,
-Jiangning
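
For readers who want to combine per-benchmark deltas the way the "geomean" figures in this thread are formed, here is a minimal sketch. It assumes each percentage is a change in run time (negative is faster) and uses only the five benchmarks listed in the message above, so the result is not the overall SPEC geomean mentioned earlier. The function name is hypothetical.

```python
import math

# Percent changes reported above for the revised patch (negative = faster).
deltas_pct = {
    "300_twolf": -4.44,
    "175_vpr": -2.58,
    "255_vortex": -1.39,
    "254_gap": 1.40,
    "183_equake": 2.88,
}

def geomean_change(pcts):
    """Geometric mean of the time ratios, returned as a percent change."""
    ratios = [1.0 + p / 100.0 for p in pcts]
    log_mean = sum(math.log(r) for r in ratios) / len(ratios)
    return (math.exp(log_mean) - 1.0) * 100.0

print(f"{geomean_change(deltas_pct.values()):+.2f}%")
```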

Excellent. Thanks, Jiangning. Your new numbers show regressions in gap and equake. I'll try to get an equake number, but I do know that we're seeing ~1% gain. Interestingly enough, one of the regressions that we're seeing is on twolf, but your device shows a gain. :) Despite the differences, I too think this latest patch looks like a good foundation for future work.

If I can get a fresh LGTM, I'll get it committed.

Committed as r218627.

Revision Contents

Path                                             Size
lib/Target/AArch64/AArch64SchedA57.td            367 lines
lib/Target/AArch64/AArch64SchedA57WriteRes.td    52 lines

Diff 14018

lib/Target/AArch64/AArch64SchedA57.td

	//=- AArch64SchedA57.td - ARM Cortex-A57 Scheduling Defs ------ tablegen --=//			//=- AArch64SchedA57.td - ARM Cortex-A57 Scheduling Defs ------ tablegen --=//
	//			//
	// The LLVM Compiler Infrastructure			// The LLVM Compiler Infrastructure
	//			//
	// This file is distributed under the University of Illinois Open Source			// This file is distributed under the University of Illinois Open Source
	// License. See LICENSE.TXT for details.			// License. See LICENSE.TXT for details.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	//			//
	// This file defines the machine model for ARM Cortex-A57 to support			// This file defines the machine model for ARM Cortex-A57 to support
	// instruction scheduling and other instruction cost heuristics.			// instruction scheduling and other instruction cost heuristics.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

				//===----------------------------------------------------------------------===//
				// The Cortex-A57 is a traditional superscalar microprocessor with a
				// conservative 3-wide in-order stage for decode and dispatch. Combined with the
				// much wider out-of-order issue stage, this produced a need to carefully
				// schedule micro-ops so that all three decoded each cycle are successfully
				// issued as the reservation station(s) simply don't stay occupied for long.
				// Therefore, IssueWidth is set to the narrower of the two at three, while still
				// modeling the machine as out-of-order.

	def CortexA57Model : SchedMachineModel {			def CortexA57Model : SchedMachineModel {
	let IssueWidth = 8; // 3-way decode and 8-way issue			let IssueWidth = 3; // 3-way decode and dispatch
	let MicroOpBufferSize = 128; // 128 micro-op re-order buffer			let MicroOpBufferSize = 128; // 128 micro-op re-order buffer
	let LoadLatency = 4; // Optimistic load latency			let LoadLatency = 4; // Optimistic load latency
	let MispredictPenalty = 14; // Fetch + Decode/Rename/Dispatch + Branch			let MispredictPenalty = 14; // Fetch + Decode/Rename/Dispatch + Branch
	}			}

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// Define each kind of processor resource and number available on Cortex-A57.			// Define each kind of processor resource and number available on Cortex-A57.
	// Cortex A-57 has 8 pipelines that each has its own 8-entry queue where			// Cortex A-57 has 8 pipelines that each has its own 8-entry queue where
	// micro-ops wait for their operands and then issue out-of-order.			// micro-ops wait for their operands and then issue out-of-order.

	def A57UnitB : ProcResource<1> { let BufferSize = 8; } // Type B micro-ops			def A57UnitB : ProcResource<1>; // Type B micro-ops
	def A57UnitI : ProcResource<2> { let BufferSize = 8; } // Type I micro-ops			def A57UnitI : ProcResource<2>; // Type I micro-ops
	def A57UnitM : ProcResource<1> { let BufferSize = 8; } // Type M micro-ops			def A57UnitM : ProcResource<1>; // Type M micro-ops
	def A57UnitL : ProcResource<1> { let BufferSize = 8; } // Type L micro-ops			def A57UnitL : ProcResource<1>; // Type L micro-ops
	def A57UnitS : ProcResource<1> { let BufferSize = 8; } // Type S micro-ops			def A57UnitS : ProcResource<1>; // Type S micro-ops
	def A57UnitX : ProcResource<1> { let BufferSize = 8; } // Type X micro-ops			def A57UnitX : ProcResource<1>; // Type X micro-ops
	def A57UnitW : ProcResource<1> { let BufferSize = 8; } // Type W micro-ops			def A57UnitW : ProcResource<1>; // Type W micro-ops
	let SchedModel = CortexA57Model in {			let SchedModel = CortexA57Model in {
	def A57UnitV : ProcResGroup<[A57UnitX, A57UnitW]>; // Type V micro-ops			def A57UnitV : ProcResGroup<[A57UnitX, A57UnitW]>; // Type V micro-ops
	}			}


	let SchedModel = CortexA57Model in {			let SchedModel = CortexA57Model in {

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// Define customized scheduler read/write types specific to the Cortex-A57.			// Define customized scheduler read/write types specific to the Cortex-A57.

	include "AArch64SchedA57WriteRes.td"			include "AArch64SchedA57WriteRes.td"

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	... (19 lines not shown) ...
	def : SchedAlias<WriteST, A57Write_1cyc_1S>;			def : SchedAlias<WriteST, A57Write_1cyc_1S>;
	def : SchedAlias<WriteSTP, A57Write_1cyc_1S>;			def : SchedAlias<WriteSTP, A57Write_1cyc_1S>;
	def : SchedAlias<WriteAdr, A57Write_1cyc_1I>;			def : SchedAlias<WriteAdr, A57Write_1cyc_1I>;
	def : SchedAlias<WriteLDIdx, A57Write_4cyc_1I_1L>;			def : SchedAlias<WriteLDIdx, A57Write_4cyc_1I_1L>;
	def : SchedAlias<WriteSTIdx, A57Write_1cyc_1I_1S>;			def : SchedAlias<WriteSTIdx, A57Write_1cyc_1I_1S>;
	def : SchedAlias<WriteF, A57Write_3cyc_1V>;			def : SchedAlias<WriteF, A57Write_3cyc_1V>;
	def : SchedAlias<WriteFCmp, A57Write_3cyc_1V>;			def : SchedAlias<WriteFCmp, A57Write_3cyc_1V>;
	def : SchedAlias<WriteFCvt, A57Write_5cyc_1V>;			def : SchedAlias<WriteFCvt, A57Write_5cyc_1V>;
	def : SchedAlias<WriteFCopy, A57Write_3cyc_1V>;			def : SchedAlias<WriteFCopy, A57Write_5cyc_1L>;
	def : SchedAlias<WriteFImm, A57Write_3cyc_1V>;			def : SchedAlias<WriteFImm, A57Write_3cyc_1V>;
	def : SchedAlias<WriteFMul, A57Write_5cyc_1V>;			def : SchedAlias<WriteFMul, A57Write_5cyc_1V>;
	def : SchedAlias<WriteFDiv, A57Write_18cyc_1X>;			def : SchedAlias<WriteFDiv, A57Write_18cyc_1X>;
	def : SchedAlias<WriteV, A57Write_3cyc_1V>;			def : SchedAlias<WriteV, A57Write_3cyc_1V>;
	def : SchedAlias<WriteVLD, A57Write_5cyc_1L>;			def : SchedAlias<WriteVLD, A57Write_5cyc_1L>;
	def : SchedAlias<WriteVST, A57Write_1cyc_1S>;			def : SchedAlias<WriteVST, A57Write_1cyc_1S>;

	def : WriteRes<WriteSys, []> { let Latency = 1; }			def : WriteRes<WriteSys, []> { let Latency = 1; }
	def : WriteRes<WriteBarrier, []> { let Latency = 1; }			def : WriteRes<WriteBarrier, []> { let Latency = 1; }
	def : WriteRes<WriteHint, []> { let Latency = 1; }			def : WriteRes<WriteHint, []> { let Latency = 1; }

	def : WriteRes<WriteLDHi, []> { let Latency = 4; }			def : WriteRes<WriteLDHi, []> { let Latency = 4; }

	// Forwarding logic is not [yet] explicitly modeled beyond what is captured			// Forwarding logic is only modeled for multiply and accumulate
	// in the latencies of the A57 Generic SchedWriteRes's.
	def : ReadAdvance<ReadI, 0>;			def : ReadAdvance<ReadI, 0>;
	def : ReadAdvance<ReadISReg, 0>;			def : ReadAdvance<ReadISReg, 0>;
	def : ReadAdvance<ReadIEReg, 0>;			def : ReadAdvance<ReadIEReg, 0>;
	def : ReadAdvance<ReadIM, 0>;			def : ReadAdvance<ReadIM, 0>;
	def : ReadAdvance<ReadIMA, 0>;			def : ReadAdvance<ReadIMA, 2, [WriteIM32, WriteIM64]>;
	def : ReadAdvance<ReadID, 0>;			def : ReadAdvance<ReadID, 0>;
	def : ReadAdvance<ReadExtrHi, 0>;			def : ReadAdvance<ReadExtrHi, 0>;
	def : ReadAdvance<ReadAdrBase, 0>;			def : ReadAdvance<ReadAdrBase, 0>;
	def : ReadAdvance<ReadVLD, 0>;			def : ReadAdvance<ReadVLD, 0>;


	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// Specialize the coarse model by associating instruction groups with the			// Specialize the coarse model by associating instruction groups with the
	... (26 lines not shown) ...
	def : InstRW<[A57Write_1cyc_1I], (instrs EXTRWrri)>;			def : InstRW<[A57Write_1cyc_1I], (instrs EXTRWrri)>;
	def : InstRW<[A57Write_3cyc_1I_1M], (instrs EXTRXrri)>;			def : InstRW<[A57Write_3cyc_1I_1M], (instrs EXTRXrri)>;
	def : InstRW<[A57Write_2cyc_1M], (instregex "BFM")>;			def : InstRW<[A57Write_2cyc_1M], (instregex "BFM")>;


	// Cryptography Extensions			// Cryptography Extensions
	// -----------------------------------------------------------------------------			// -----------------------------------------------------------------------------

	def : InstRW<[A57Write_3cyc_1W], (instregex "CRC32")>;			def : InstRW<[A57Write_3cyc_1W], (instregex "^AES")>;
				def : InstRW<[A57Write_6cyc_2V], (instregex "^SHA1SU0")>;
				def : InstRW<[A57Write_3cyc_1W], (instregex "^SHA1(H|SU1)")>;
				def : InstRW<[A57Write_6cyc_2W], (instregex "^SHA1[CMP]")>;
				def : InstRW<[A57Write_3cyc_1W], (instregex "^SHA256SU0")>;
				def : InstRW<[A57Write_6cyc_2W], (instregex "^SHA256(H|H2|SU1)")>;
				def : InstRW<[A57Write_3cyc_1W], (instregex "^CRC32")>;


	// Vector Load			// Vector Load
	// -----------------------------------------------------------------------------			// -----------------------------------------------------------------------------

	def : InstRW<[A57Write_8cyc_1L_1V], (instregex "LD1i(8|16|32)$")>;			def : InstRW<[A57Write_8cyc_1L_1V], (instregex "LD1i(8|16|32)$")>;
	def : InstRW<[A57Write_8cyc_1L_1V, WriteAdr], (instregex "LD1i(8|16|32)_POST$")>;			def : InstRW<[A57Write_8cyc_1L_1V, WriteAdr], (instregex "LD1i(8|16|32)_POST$")>;
	def : InstRW<[A57Write_5cyc_1L], (instregex "LD1i(64)$")>;			def : InstRW<[A57Write_5cyc_1L], (instregex "LD1i(64)$")>;
	... (150 lines not shown) ...

	def : InstRW<[A57Write_4cyc_4S_2V], (instregex "ST4Fourv(8b|4h|2s)$")>;			def : InstRW<[A57Write_4cyc_4S_2V], (instregex "ST4Fourv(8b|4h|2s)$")>;
	def : InstRW<[A57Write_4cyc_4S_2V, WriteAdr], (instregex "ST4Fourv(8b|4h|2s)_POST$")>;			def : InstRW<[A57Write_4cyc_4S_2V, WriteAdr], (instregex "ST4Fourv(8b|4h|2s)_POST$")>;
	def : InstRW<[A57Write_8cyc_8S_4V], (instregex "ST4Fourv(16b|8h|4s)$")>;			def : InstRW<[A57Write_8cyc_8S_4V], (instregex "ST4Fourv(16b|8h|4s)$")>;
	def : InstRW<[A57Write_8cyc_8S_4V, WriteAdr], (instregex "ST4Fourv(16b|8h|4s)_POST$")>;			def : InstRW<[A57Write_8cyc_8S_4V, WriteAdr], (instregex "ST4Fourv(16b|8h|4s)_POST$")>;
	def : InstRW<[A57Write_8cyc_8S], (instregex "ST4Fourv(2d)$")>;			def : InstRW<[A57Write_8cyc_8S], (instregex "ST4Fourv(2d)$")>;
	def : InstRW<[A57Write_8cyc_8S, WriteAdr], (instregex "ST4Fourv(2d)_POST$")>;			def : InstRW<[A57Write_8cyc_8S, WriteAdr], (instregex "ST4Fourv(2d)_POST$")>;

				// Vector - Integer
				// -----------------------------------------------------------------------------

				// Reference for forms in this group
				// D form - v8i8, v4i16, v2i32
				// Q form - v16i8, v8i16, v4i32
				// D form - v1i8, v1i16, v1i32, v1i64
				// Q form - v16i8, v8i16, v4i32, v2i64
				// D form - v8i8_v8i16, v4i16_v4i32, v2i32_v2i64
				// Q form - v16i8_v8i16, v8i16_v4i32, v4i32_v2i64

				// ASIMD absolute diff accum, D-form
				def : InstRW<[A57Write_4cyc_1X], (instregex "^[SU]ABA(v8i8|v4i16|v2i32)$")>;
				// ASIMD absolute diff accum, Q-form
				def : InstRW<[A57Write_5cyc_2X], (instregex "^[SU]ABA(v16i8|v8i16|v4i32)$")>;
				// ASIMD absolute diff accum long
				def : InstRW<[A57Write_4cyc_1X], (instregex "^[SU]ABAL")>;

				// ASIMD arith, reduce, 4H/4S
				def : InstRW<[A57Write_4cyc_1X], (instregex "^[SU]?ADDL?V(v8i8|v4i16|v2i32)v$")>;
				// ASIMD arith, reduce, 8B/8H
				def : InstRW<[A57Write_7cyc_1V_1X], (instregex "^[SU]?ADDL?V(v8i16|v4i32)v$")>;
				// ASIMD arith, reduce, 16B
				def : InstRW<[A57Write_8cyc_2X], (instregex "^[SU]?ADDL?Vv16i8v$")>;

				// ASIMD max/min, reduce, 4H/4S
				def : InstRW<[A57Write_4cyc_1X], (instregex "^[SU](MIN|MAX)V(v4i16|v4i32)v$")>;
				// ASIMD max/min, reduce, 8B/8H
				def : InstRW<[A57Write_7cyc_1V_1X], (instregex "^[SU](MIN|MAX)V(v8i8|v8i16)v$")>;
				// ASIMD max/min, reduce, 16B
				def : InstRW<[A57Write_8cyc_2X], (instregex "^[SU](MIN|MAX)Vv16i8v$")>;

				// ASIMD multiply, D-form
				def : InstRW<[A57Write_5cyc_1W], (instregex "^(P?MUL|SQR?DMULH)(v8i8|v4i16|v2i32|v1i8|v1i16|v1i32|v1i64)(_indexed)?$")>;
				// ASIMD multiply, Q-form
				def : InstRW<[A57Write_6cyc_2W], (instregex "^(P?MUL|SQR?DMULH)(v16i8|v8i16|v4i32)(_indexed)?$")>;

				// ASIMD multiply accumulate, D-form
				def : InstRW<[A57Write_5cyc_1W], (instregex "^ML[AS](v8i8|v4i16|v2i32)(_indexed)?$")>;
				// ASIMD multiply accumulate, Q-form
				def : InstRW<[A57Write_6cyc_2W], (instregex "^ML[AS](v16i8|v8i16|v4i32)(_indexed)?$")>;

				// ASIMD multiply accumulate long
				// ASIMD multiply accumulate saturating long
				def A57WriteIVMA : SchedWriteRes<[A57UnitW]> { let Latency = 5; }
				def A57ReadIVMA4 : SchedReadAdvance<4, [A57WriteIVMA]>;
				def : InstRW<[A57WriteIVMA, A57ReadIVMA4], (instregex "^(S|U|SQD)ML[AS]L")>;

				// ASIMD multiply long
				def : InstRW<[A57Write_5cyc_1W], (instregex "^(S|U|SQD)MULL")>;
				def : InstRW<[A57Write_5cyc_1W], (instregex "^PMULL(v8i8|v16i8)")>;
				def : InstRW<[A57Write_3cyc_1W], (instregex "^PMULL(v1i64|v2i64)")>;

				// ASIMD pairwise add and accumulate
				// ASIMD shift accumulate
				def A57WriteIVA : SchedWriteRes<[A57UnitX]> { let Latency = 4; }
				def A57ReadIVA3 : SchedReadAdvance<3, [A57WriteIVA]>;
				def : InstRW<[A57WriteIVA, A57ReadIVA3], (instregex "^[SU]ADALP")>;
				def : InstRW<[A57WriteIVA, A57ReadIVA3], (instregex "^(S|SR|U|UR)SRA")>;

				// ASIMD shift by immed, complex
				def : InstRW<[A57Write_4cyc_1X], (instregex "^[SU]?(Q|R){1,2}SHR")>;
				def : InstRW<[A57Write_4cyc_1X], (instregex "^SQSHLU")>;


				// ASIMD shift by register, basic, Q-form
				def : InstRW<[A57Write_4cyc_2X], (instregex "^[SU]SHL(v16i8|v8i16|v4i32|v2i64)")>;

				// ASIMD shift by register, complex, D-form
				def : InstRW<[A57Write_4cyc_1X], (instregex "^[SU][QR]{1,2}SHL(v1i8|v1i16|v1i32|v1i64|v8i8|v4i16|v2i32|b|d|h|s)")>;

				// ASIMD shift by register, complex, Q-form
				def : InstRW<[A57Write_5cyc_2X], (instregex "^[SU][QR]{1,2}SHL(v16i8|v8i16|v4i32|v2i64)")>;


				// Vector - Floating Point
				// -----------------------------------------------------------------------------

				// Reference for forms in this group
				// D form - v2f32
				// Q form - v4f32, v2f64
				// D form - 32, 64
				// D form - v1i32, v1i64
				// D form - v2i32
				// Q form - v4i32, v2i64

				// ASIMD FP arith, normal, D-form
				def : InstRW<[A57Write_5cyc_1V], (instregex "^(FABD|FADD|FSUB)(v2f32|32|64|v2i32p)")>;
				// ASIMD FP arith, normal, Q-form
				def : InstRW<[A57Write_5cyc_2V], (instregex "^(FABD|FADD|FSUB)(v4f32|v2f64|v2i64p)")>;

				// ASIMD FP arith, pairwise, D-form
				def : InstRW<[A57Write_5cyc_1V], (instregex "^FADDP(v2f32|32|64|v2i32)")>;
				// ASIMD FP arith, pairwise, Q-form
				def : InstRW<[A57Write_9cyc_3V], (instregex "^FADDP(v4f32|v2f64|v2i64)")>;

				// ASIMD FP compare, D-form
				def : InstRW<[A57Write_5cyc_1V], (instregex "^(FACGE|FACGT|FCMEQ|FCMGE|FCMGT|FCMLE|FCMLT)(v2f32|32|64|v1i32|v2i32|v1i64)")>;
				// ASIMD FP compare, Q-form
				def : InstRW<[A57Write_5cyc_2V], (instregex "^(FACGE|FACGT|FCMEQ|FCMGE|FCMGT|FCMLE|FCMLT)(v4f32|v2f64|v4i32|v2i64)")>;

				// ASIMD FP convert, long and narrow
				def : InstRW<[A57Write_8cyc_3V], (instregex "^FCVT(L|N|XN)v")>;
				// ASIMD FP convert, other, D-form
				def : InstRW<[A57Write_5cyc_1V], (instregex "^[FVSU]CVT([AMNPZ][SU])?(_Int)?(v2f32|v1i32|v2i32|v1i64)")>;
				// ASIMD FP convert, other, Q-form
				def : InstRW<[A57Write_5cyc_2V], (instregex "^[FVSU]CVT([AMNPZ][SU])?(_Int)?(v4f32|v2f64|v4i32|v2i64)")>;

				// ASIMD FP divide, D-form, F32
				def : InstRW<[A57Write_18cyc_1X], (instregex "FDIVv2f32")>;
				// ASIMD FP divide, Q-form, F32
				def : InstRW<[A57Write_36cyc_2X], (instregex "FDIVv4f32")>;
				// ASIMD FP divide, Q-form, F64
				def : InstRW<[A57Write_64cyc_2X], (instregex "FDIVv2f64")>;

				// Note: These were simply duplicated from ASIMD FDIV because of missing documentation
				// ASIMD FP square root, D-form, F32
				def : InstRW<[A57Write_18cyc_1X], (instregex "FSQRTv2f32")>;
				// ASIMD FP square root, Q-form, F32
				def : InstRW<[A57Write_36cyc_2X], (instregex "FSQRTv4f32")>;
				// ASIMD FP square root, Q-form, F64
				def : InstRW<[A57Write_64cyc_2X], (instregex "FSQRTv2f64")>;

				// ASIMD FP max/min, normal, D-form
				def : InstRW<[A57Write_5cyc_1V], (instregex "^(FMAX|FMIN)(NM)?(v2f32)")>;
				// ASIMD FP max/min, normal, Q-form
				def : InstRW<[A57Write_5cyc_2V], (instregex "^(FMAX|FMIN)(NM)?(v4f32|v2f64)")>;
				// ASIMD FP max/min, pairwise, D-form
				def : InstRW<[A57Write_5cyc_1V], (instregex "^(FMAX|FMIN)(NM)?P(v2f32|v2i32)")>;
				// ASIMD FP max/min, pairwise, Q-form
				def : InstRW<[A57Write_9cyc_3V], (instregex "^(FMAX|FMIN)(NM)?P(v4f32|v2f64|v2i64)")>;
				// ASIMD FP max/min, reduce
				def : InstRW<[A57Write_10cyc_3V], (instregex "^(FMAX|FMIN)(NM)?Vv")>;

				// ASIMD FP multiply, D-form, FZ
				def : InstRW<[A57Write_5cyc_1V], (instregex "^FMULX?(v2f32|v1i32|v2i32|v1i64|32|64)")>;
				// ASIMD FP multiply, Q-form, FZ
				def : InstRW<[A57Write_5cyc_2V], (instregex "^FMULX?(v4f32|v2f64|v4i32|v2i64)")>;

				// ASIMD FP multiply accumulate, D-form, FZ
				// ASIMD FP multiply accumulate, Q-form, FZ
				def A57WriteFPVMAD : SchedWriteRes<[A57UnitV]> { let Latency = 9; }
				def A57WriteFPVMAQ : SchedWriteRes<[A57UnitV, A57UnitV]> { let Latency = 10; }
				def A57ReadFPVMA5 : SchedReadAdvance<5, [A57WriteFPVMAD, A57WriteFPVMAQ]>;
				def : InstRW<[A57WriteFPVMAD, A57ReadFPVMA5], (instregex "^FML[AS](v2f32|v1i32|v2i32|v1i64)")>;
				def : InstRW<[A57WriteFPVMAQ, A57ReadFPVMA5], (instregex "^FML[AS](v4f32|v2f64|v4i32|v2i64)")>;

				// ASIMD FP round, D-form
				def : InstRW<[A57Write_5cyc_1V], (instregex "^FRINT[AIMNPXZ](v2f32)")>;
				// ASIMD FP round, Q-form
				def : InstRW<[A57Write_5cyc_2V], (instregex "^FRINT[AIMNPXZ](v4f32|v2f64)")>;


				// Vector - Miscellaneous
				// -----------------------------------------------------------------------------

				// Reference for forms in this group
				// D form - v8i8, v4i16, v2i32
				// Q form - v16i8, v8i16, v4i32
				// D form - v1i8, v1i16, v1i32, v1i64
				// Q form - v16i8, v8i16, v4i32, v2i64

				// ASIMD bitwise insert, Q-form
				def : InstRW<[A57Write_3cyc_2V], (instregex "^(BIF\|BIT\|BSL)v16i8")>;

				// ASIMD duplicate, gen reg, D-form and Q-form
				def : InstRW<[A57Write_8cyc_1L_1V], (instregex "^CPY")>;
				def : InstRW<[A57Write_8cyc_1L_1V], (instregex "^DUPv.+gpr")>;

				// ASIMD move, saturating
				def : InstRW<[A57Write_4cyc_1X], (instregex "^[SU]QXTU?N")>;

				// ASIMD reciprocal estimate, D-form
				def : InstRW<[A57Write_5cyc_1V], (instregex "^[FU](RECP\|RSQRT)(E\|X)(v2f32\|v1i32\|v2i32\|v1i64)")>;
				// ASIMD reciprocal estimate, Q-form
				def : InstRW<[A57Write_5cyc_2V], (instregex "^[FU](RECP\|RSQRT)(E\|X)(v2f64\|v4f32\|v4i32)")>;

				// ASIMD reciprocal step, D-form, FZ
				def : InstRW<[A57Write_9cyc_1V], (instregex "^F(RECP\|RSQRT)S(v2f32\|v1i32\|v2i32\|v1i64\|32\|64)")>;
				// ASIMD reciprocal step, Q-form, FZ
				def : InstRW<[A57Write_9cyc_2V], (instregex "^F(RECP\|RSQRT)S(v2f64\|v4f32\|v4i32)")>;

				// ASIMD table lookup, D-form
				def : InstRW<[A57Write_3cyc_1V], (instregex "^TB[LX]v8i8One")>;
				def : InstRW<[A57Write_6cyc_2V], (instregex "^TB[LX]v8i8Two")>;
				def : InstRW<[A57Write_9cyc_3V], (instregex "^TB[LX]v8i8Three")>;
				def : InstRW<[A57Write_12cyc_4V], (instregex "^TB[LX]v8i8Four")>;
				// ASIMD table lookup, Q-form
				def : InstRW<[A57Write_6cyc_3V], (instregex "^TB[LX]v16i8One")>;
				def : InstRW<[A57Write_9cyc_5V], (instregex "^TB[LX]v16i8Two")>;
				def : InstRW<[A57Write_12cyc_7V], (instregex "^TB[LX]v16i8Three")>;
				def : InstRW<[A57Write_15cyc_9V], (instregex "^TB[LX]v16i8Four")>;

				// ASIMD transfer, element to gen reg
				def : InstRW<[A57Write_6cyc_1I_1L], (instregex "^[SU]MOVv")>;

				// ASIMD transfer, gen reg to element
				def : InstRW<[A57Write_8cyc_1L_1V], (instregex "^INSv")>;

				// ASIMD unzip/zip, Q-form
				def : InstRW<[A57Write_6cyc_3V], (instregex "^(UZP|ZIP)(1|2)(v16i8|v8i16|v4i32|v2i64)")>;


				// Remainder
				// -----------------------------------------------------------------------------

				def : InstRW<[A57Write_5cyc_1V], (instregex "^F(ADD|SUB)[DS]rr")>;
				Jiangning (inline comment): Where are FN?MUL[DS]rr?
				cestes (author, inline reply): Thanks for the feedback, Jiangning. In this case, the FN?MUL[DS]rr instructions don't have a specific InstRW because their default WriteFMul has already been mapped to the correct SchedWrite, A57Write_5cyc_1V. I only use InstRWs to refine instructions that aren't correct with the default mappings.

				def A57WriteFPMA : SchedWriteRes<[A57UnitV]> { let Latency = 9; }
				def A57ReadFPMA5 : SchedReadAdvance<5, [A57WriteFPMA]>;
				def A57ReadFPM : SchedReadAdvance<0>;
				def : InstRW<[A57WriteFPMA, A57ReadFPM, A57ReadFPM, A57ReadFPMA5], (instregex "^FN?M(ADD|SUB)[DS]rrr")>;
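
Editorial note on the four definitions above: they model the MAC forwarding mentioned in the summary. As a rough sketch (not LLVM code), the scheduler's operand latency is the producer's write latency minus the consumer's read advance, clamped at zero. The Python below only illustrates that arithmetic; the function and constant names are hypothetical, and it assumes the third read (`A57ReadFPMA5`) is the accumulator operand and that the producer is another `A57WriteFPMA` instruction.

```python
def operand_latency(write_latency, read_advance=0):
    """Cycles a dependent instruction waits on one operand (illustrative)."""
    # A read advance larger than the write latency clamps to zero.
    return max(0, write_latency - read_advance)


FPMA_LATENCY = 9      # Latency of A57WriteFPMA
ACC_READ_ADVANCE = 5  # A57ReadFPMA5, assumed to be the accumulator operand
MUL_READ_ADVANCE = 0  # A57ReadFPM, the two multiplicand operands

# Chained multiply-accumulates through the accumulator see a short latency;
# a dependency through a multiplicand sees the full latency.
acc_chain = operand_latency(FPMA_LATENCY, ACC_READ_ADVANCE)
mul_chain = operand_latency(FPMA_LATENCY, MUL_READ_ADVANCE)
print(acc_chain, mul_chain)
```

Under these assumptions, a chain of FMADDs that feeds each result into the next accumulator is scheduled as if each link took 4 cycles rather than 9.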

				def : InstRW<[A57Write_10cyc_1L_1V], (instregex "^[FSU]CVT[AMNPZ][SU](_Int)?[SU]?[XW]?[DS]?[rds]i?")>;
				def : InstRW<[A57Write_10cyc_1L_1V], (instregex "^[SU]CVTF")>;

				def : InstRW<[A57Write_32cyc_1X], (instrs FDIVDrr)>;
				def : InstRW<[A57Write_18cyc_1X], (instrs FDIVSrr)>;

				def : InstRW<[A57Write_5cyc_1V], (instregex "^F(MAX|MIN).+rr")>;

				def : InstRW<[A57Write_5cyc_1V], (instregex "^FRINT.+r")>;

				def : InstRW<[A57Write_32cyc_1X], (instrs FSQRTDr)>;
				def : InstRW<[A57Write_18cyc_1X], (instrs FSQRTSr)>;

				def : InstRW<[A57Write_5cyc_1L, WriteLDHi], (instrs LDNPDi)>;
				def : InstRW<[A57Write_6cyc_2L, WriteLDHi], (instrs LDNPQi)>;
				def : InstRW<[A57Write_5cyc_1L, WriteLDHi], (instrs LDNPSi)>;
				def : InstRW<[A57Write_5cyc_1L, WriteLDHi], (instrs LDPDi)>;
				def : InstRW<[A57Write_5cyc_1L, WriteLDHi, WriteAdr], (instrs LDPDpost)>;
				def : InstRW<[A57Write_5cyc_1L, WriteLDHi, WriteAdr], (instrs LDPDpre)>;
				def : InstRW<[A57Write_6cyc_2L, WriteLDHi], (instrs LDPQi)>;
				def : InstRW<[A57Write_6cyc_2L, WriteLDHi, WriteAdr], (instrs LDPQpost)>;
				def : InstRW<[A57Write_6cyc_2L, WriteLDHi, WriteAdr], (instrs LDPQpre)>;
				def : InstRW<[A57Write_5cyc_1I_2L, WriteLDHi], (instrs LDPSWi)>;
				def : InstRW<[A57Write_5cyc_1I_2L, WriteLDHi, WriteAdr], (instrs LDPSWpost)>;
				def : InstRW<[A57Write_5cyc_1I_2L, WriteLDHi, WriteAdr], (instrs LDPSWpre)>;
				def : InstRW<[A57Write_5cyc_1L, WriteLDHi], (instrs LDPSi)>;
				def : InstRW<[A57Write_5cyc_1L, WriteLDHi, WriteAdr], (instrs LDPSpost)>;
				def : InstRW<[A57Write_5cyc_1L, WriteLDHi, WriteAdr], (instrs LDPSpre)>;
				def : InstRW<[A57Write_5cyc_1L, WriteI], (instrs LDRBpost)>;
				def : InstRW<[A57Write_5cyc_1L, WriteAdr], (instrs LDRBpre)>;
				def : InstRW<[A57Write_5cyc_1L, ReadAdrBase], (instrs LDRBroW)>;
				def : InstRW<[A57Write_5cyc_1L, ReadAdrBase], (instrs LDRBroX)>;
				def : InstRW<[A57Write_5cyc_1L], (instrs LDRBui)>;
				def : InstRW<[A57Write_5cyc_1L], (instrs LDRDl)>;
				def : InstRW<[A57Write_5cyc_1L, WriteI], (instrs LDRDpost)>;
				def : InstRW<[A57Write_5cyc_1L, WriteAdr], (instrs LDRDpre)>;
				def : InstRW<[A57Write_5cyc_1L, ReadAdrBase], (instrs LDRDroW)>;
				def : InstRW<[A57Write_5cyc_1L, ReadAdrBase], (instrs LDRDroX)>;
				def : InstRW<[A57Write_5cyc_1L], (instrs LDRDui)>;
				def : InstRW<[A57Write_5cyc_1I_1L, ReadAdrBase], (instrs LDRHHroW)>;
				def : InstRW<[A57Write_5cyc_1I_1L, ReadAdrBase], (instrs LDRHHroX)>;
				def : InstRW<[A57Write_5cyc_1L, WriteI], (instrs LDRHpost)>;
				def : InstRW<[A57Write_5cyc_1L, WriteAdr], (instrs LDRHpre)>;
				def : InstRW<[A57Write_6cyc_1I_1L, ReadAdrBase], (instrs LDRHroW)>;
				def : InstRW<[A57Write_6cyc_1I_1L, ReadAdrBase], (instrs LDRHroX)>;
				def : InstRW<[A57Write_5cyc_1L], (instrs LDRHui)>;
				def : InstRW<[A57Write_5cyc_1L], (instrs LDRQl)>;
				def : InstRW<[A57Write_5cyc_1L, WriteI], (instrs LDRQpost)>;
				def : InstRW<[A57Write_5cyc_1L, WriteAdr], (instrs LDRQpre)>;
				def : InstRW<[A57Write_6cyc_1I_1L, ReadAdrBase], (instrs LDRQroW)>;
				def : InstRW<[A57Write_6cyc_1I_1L, ReadAdrBase], (instrs LDRQroX)>;
				def : InstRW<[A57Write_5cyc_1L], (instrs LDRQui)>;
				def : InstRW<[A57Write_5cyc_1I_1L, ReadAdrBase], (instrs LDRSHWroW)>;
				def : InstRW<[A57Write_5cyc_1I_1L, ReadAdrBase], (instrs LDRSHWroX)>;
				def : InstRW<[A57Write_5cyc_1I_1L, ReadAdrBase], (instrs LDRSHXroW)>;
				def : InstRW<[A57Write_5cyc_1I_1L, ReadAdrBase], (instrs LDRSHXroX)>;
				def : InstRW<[A57Write_5cyc_1L], (instrs LDRSl)>;
				def : InstRW<[A57Write_5cyc_1L, WriteI], (instrs LDRSpost)>;
				def : InstRW<[A57Write_5cyc_1L, WriteAdr], (instrs LDRSpre)>;
				def : InstRW<[A57Write_5cyc_1L, ReadAdrBase], (instrs LDRSroW)>;
				def : InstRW<[A57Write_5cyc_1L, ReadAdrBase], (instrs LDRSroX)>;
				def : InstRW<[A57Write_5cyc_1L], (instrs LDRSui)>;
				def : InstRW<[A57Write_5cyc_1L], (instrs LDURBi)>;
				def : InstRW<[A57Write_5cyc_1L], (instrs LDURDi)>;
				def : InstRW<[A57Write_5cyc_1L], (instrs LDURHi)>;
				def : InstRW<[A57Write_5cyc_1L], (instrs LDURQi)>;
				def : InstRW<[A57Write_5cyc_1L], (instrs LDURSi)>;

				def : InstRW<[A57Write_2cyc_2S], (instrs STNPDi)>;
				def : InstRW<[A57Write_4cyc_1I_4S], (instrs STNPQi)>;
				def : InstRW<[A57Write_2cyc_2S], (instrs STNPXi)>;
				def : InstRW<[A57Write_2cyc_2S], (instrs STPDi)>;
				def : InstRW<[WriteAdr, A57Write_2cyc_1I_2S], (instrs STPDpost)>;
				def : InstRW<[WriteAdr, A57Write_2cyc_1I_2S], (instrs STPDpre)>;
				def : InstRW<[A57Write_4cyc_1I_4S], (instrs STPQi)>;
				def : InstRW<[WriteAdr, A57Write_4cyc_1I_4S], (instrs STPQpost)>;
				def : InstRW<[WriteAdr, A57Write_4cyc_2I_4S], (instrs STPQpre)>;
				def : InstRW<[WriteAdr, A57Write_1cyc_1I_1S], (instrs STPSpost)>;
				def : InstRW<[WriteAdr, A57Write_1cyc_1I_1S], (instrs STPSpre)>;
				def : InstRW<[WriteAdr, A57Write_1cyc_1I_1S], (instrs STPWpost)>;
				def : InstRW<[WriteAdr, A57Write_1cyc_1I_1S], (instrs STPWpre)>;
				def : InstRW<[A57Write_2cyc_2S], (instrs STPXi)>;
				def : InstRW<[WriteAdr, A57Write_2cyc_1I_2S], (instrs STPXpost)>;
				def : InstRW<[WriteAdr, A57Write_2cyc_1I_2S], (instrs STPXpre)>;
				def : InstRW<[WriteAdr, A57Write_1cyc_1I_1S, ReadAdrBase], (instrs STRBBpost)>;
				def : InstRW<[WriteAdr, A57Write_1cyc_1I_1S, ReadAdrBase], (instrs STRBBpre)>;
				def : InstRW<[WriteAdr, A57Write_1cyc_1I_1S, ReadAdrBase], (instrs STRBpost)>;
				def : InstRW<[WriteAdr, A57Write_1cyc_1I_1S], (instrs STRBpre)>;
				def : InstRW<[A57Write_3cyc_1I_1S, ReadAdrBase], (instrs STRBroW)>;
				def : InstRW<[A57Write_3cyc_1I_1S, ReadAdrBase], (instrs STRBroX)>;
				def : InstRW<[WriteAdr, A57Write_1cyc_1I_1S, ReadAdrBase], (instrs STRDpost)>;
				def : InstRW<[WriteAdr, A57Write_1cyc_1I_1S], (instrs STRDpre)>;
				def : InstRW<[WriteAdr, A57Write_1cyc_1I_1S, ReadAdrBase], (instrs STRHHpost)>;
				def : InstRW<[WriteAdr, A57Write_1cyc_1I_1S, ReadAdrBase], (instrs STRHHpre)>;
				def : InstRW<[A57Write_3cyc_1I_1S, ReadAdrBase], (instrs STRHHroW)>;
				def : InstRW<[A57Write_3cyc_1I_1S, ReadAdrBase], (instrs STRHHroX)>;
				def : InstRW<[WriteAdr, A57Write_1cyc_1I_1S, ReadAdrBase], (instrs STRHpost)>;
				def : InstRW<[WriteAdr, A57Write_1cyc_1I_1S], (instrs STRHpre)>;
				def : InstRW<[A57Write_3cyc_1I_1S, ReadAdrBase], (instrs STRHroW)>;
				def : InstRW<[A57Write_3cyc_1I_1S, ReadAdrBase], (instrs STRHroX)>;
				def : InstRW<[WriteAdr, A57Write_2cyc_1I_2S, ReadAdrBase], (instrs STRQpost)>;
				def : InstRW<[WriteAdr, A57Write_2cyc_1I_2S], (instrs STRQpre)>;
				def : InstRW<[A57Write_2cyc_1I_2S, ReadAdrBase], (instrs STRQroW)>;
				def : InstRW<[A57Write_2cyc_1I_2S, ReadAdrBase], (instrs STRQroX)>;
				def : InstRW<[A57Write_2cyc_1I_2S], (instrs STRQui)>;
				def : InstRW<[WriteAdr, A57Write_1cyc_1I_1S, ReadAdrBase], (instrs STRSpost)>;
				def : InstRW<[WriteAdr, A57Write_1cyc_1I_1S], (instrs STRSpre)>;
				def : InstRW<[WriteAdr, A57Write_1cyc_1I_1S, ReadAdrBase], (instrs STRWpost)>;
				def : InstRW<[WriteAdr, A57Write_1cyc_1I_1S, ReadAdrBase], (instrs STRWpre)>;
				def : InstRW<[WriteAdr, A57Write_1cyc_1I_1S, ReadAdrBase], (instrs STRXpost)>;
				def : InstRW<[WriteAdr, A57Write_1cyc_1I_1S, ReadAdrBase], (instrs STRXpre)>;
				def : InstRW<[A57Write_2cyc_2S], (instrs STURQi)>;

				} // SchedModel = CortexA57Model

lib/Target/AArch64/AArch64SchedA57WriteRes.td

[... 22 lines not shown ...]
//===----------------------------------------------------------------------===//
// Define Generic 1 micro-op types

def A57Write_5cyc_1L : SchedWriteRes<[A57UnitL]> { let Latency = 5; }
def A57Write_5cyc_1M : SchedWriteRes<[A57UnitM]> { let Latency = 5; }
def A57Write_5cyc_1V : SchedWriteRes<[A57UnitV]> { let Latency = 5; }
def A57Write_5cyc_1W : SchedWriteRes<[A57UnitW]> { let Latency = 5; }
def A57Write_10cyc_1V : SchedWriteRes<[A57UnitV]> { let Latency = 10; }
def A57Write_18cyc_1X : SchedWriteRes<[A57UnitX]> { let Latency = 18;
                                                    let ResourceCycles = [18]; }
def A57Write_19cyc_1M : SchedWriteRes<[A57UnitM]> { let Latency = 19;
                                                    let ResourceCycles = [19]; }
def A57Write_1cyc_1B : SchedWriteRes<[A57UnitB]> { let Latency = 1; }
def A57Write_1cyc_1I : SchedWriteRes<[A57UnitI]> { let Latency = 1; }
def A57Write_1cyc_1S : SchedWriteRes<[A57UnitS]> { let Latency = 1; }
def A57Write_2cyc_1M : SchedWriteRes<[A57UnitM]> { let Latency = 2; }
def A57Write_32cyc_1X : SchedWriteRes<[A57UnitX]> { let Latency = 32;
                                                    let ResourceCycles = [32]; }
def A57Write_35cyc_1M : SchedWriteRes<[A57UnitM]> { let Latency = 35;
                                                    let ResourceCycles = [35]; }
def A57Write_3cyc_1M : SchedWriteRes<[A57UnitM]> { let Latency = 3; }
def A57Write_3cyc_1V : SchedWriteRes<[A57UnitV]> { let Latency = 3; }
def A57Write_3cyc_1W : SchedWriteRes<[A57UnitW]> { let Latency = 3; }
def A57Write_3cyc_1X : SchedWriteRes<[A57UnitX]> { let Latency = 3; }
def A57Write_4cyc_1L : SchedWriteRes<[A57UnitL]> { let Latency = 4; }
def A57Write_4cyc_1X : SchedWriteRes<[A57UnitX]> { let Latency = 4; }
def A57Write_9cyc_1V : SchedWriteRes<[A57UnitV]> { let Latency = 9; }
def A57Write_6cyc_1M : SchedWriteRes<[A57UnitM]> { let Latency = 6; }
def A57Write_6cyc_1V : SchedWriteRes<[A57UnitV]> { let Latency = 6; }
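
Editorial note on the `ResourceCycles` values added to the long-latency X and M writes above: they appear to be how the patch models the SQRT and DIV hazards mentioned in the summary. By default a write reserves its unit for one cycle, so independent operations can issue back to back. Setting `ResourceCycles` equal to the latency keeps the unit busy for the whole operation. The Python below is a simplified illustration, not the scheduler's algorithm. `issue_times` is a hypothetical helper, and it assumes a single instance of the unit and operations with no data dependencies between them.

```python
def issue_times(n_ops, resource_cycles):
    """Earliest issue cycle for n independent ops sharing one unit instance."""
    times = []
    unit_free_at = 0
    for _ in range(n_ops):
        times.append(unit_free_at)
        # The unit stays reserved for resource_cycles after each issue.
        unit_free_at += resource_cycles
    return times


pipelined = issue_times(3, 1)   # default: unit reserved for one cycle
blocking = issue_times(3, 32)   # e.g. A57Write_32cyc_1X, ResourceCycles = [32]
print(pipelined, blocking)
```

With the one-cycle default the three operations issue on consecutive cycles. With a 32-cycle reservation each one waits for the previous operation to release the unit.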


//===----------------------------------------------------------------------===//
// Define Generic 2 micro-op types

def A57Write_64cyc_2X : SchedWriteRes<[A57UnitX, A57UnitX]> {
  let Latency = 64;
  let NumMicroOps = 2;
  let ResourceCycles = [32, 32];
}
def A57Write_6cyc_1I_1L : SchedWriteRes<[A57UnitI,
                                         A57UnitL]> {
  let Latency = 6;
  let NumMicroOps = 2;
}
def A57Write_7cyc_1V_1X : SchedWriteRes<[A57UnitV,
                                         A57UnitX]> {
[... 68 lines not shown ...]
}
def A57Write_2cyc_2V : SchedWriteRes<[A57UnitV, A57UnitV]> {
  let Latency = 2;
  let NumMicroOps = 2;
}
def A57Write_36cyc_2X : SchedWriteRes<[A57UnitX, A57UnitX]> {
  let Latency = 36;
  let NumMicroOps = 2;
  let ResourceCycles = [18, 18];
}
def A57Write_3cyc_1I_1M : SchedWriteRes<[A57UnitI,
                                         A57UnitM]> {
  let Latency = 3;
  let NumMicroOps = 2;
}
def A57Write_3cyc_1I_1S : SchedWriteRes<[A57UnitI,
                                         A57UnitS]> {
  let Latency = 3;
  let NumMicroOps = 2;
}
def A57Write_3cyc_1S_1V : SchedWriteRes<[A57UnitS,
                                         A57UnitV]> {
  let Latency = 3;
  let NumMicroOps = 2;
}
def A57Write_3cyc_2V : SchedWriteRes<[A57UnitV, A57UnitV]> {
  let Latency = 3;
  let NumMicroOps = 2;
}
def A57Write_4cyc_1I_1L : SchedWriteRes<[A57UnitI,
                                         A57UnitL]> {
  let Latency = 4;
  let NumMicroOps = 2;
}
def A57Write_4cyc_2X : SchedWriteRes<[A57UnitX, A57UnitX]> {
  let Latency = 4;
  let NumMicroOps = 2;
[... 126 lines not shown; hunk context: def A57Write_9cyc_2L_2V : SchedWriteRes<[A57UnitL, A57UnitL, ]
  let Latency = 9;
  let NumMicroOps = 4;
}
def A57Write_9cyc_1L_3V : SchedWriteRes<[A57UnitL,
                                         A57UnitV, A57UnitV, A57UnitV]> {
  let Latency = 9;
  let NumMicroOps = 4;
}
def A57Write_12cyc_4V : SchedWriteRes<[A57UnitV, A57UnitV,
                                       A57UnitV, A57UnitV]> {
  let Latency = 12;
  let NumMicroOps = 4;
}


//===----------------------------------------------------------------------===//
// Define Generic 5 micro-op types

def A57Write_3cyc_3S_2V : SchedWriteRes<[A57UnitS, A57UnitS, A57UnitS,
                                         A57UnitV, A57UnitV]> {
  let Latency = 3;
[... 23 lines not shown; hunk context: def A57Write_9cyc_1I_1L_3V : SchedWriteRes<[A57UnitI, ]
  let Latency = 9;
  let NumMicroOps = 5;
}
def A57Write_9cyc_2L_3V : SchedWriteRes<[A57UnitL, A57UnitL,
                                         A57UnitV, A57UnitV, A57UnitV]> {
  let Latency = 9;
  let NumMicroOps = 5;
}
def A57Write_9cyc_5V : SchedWriteRes<[A57UnitV, A57UnitV, A57UnitV,
                                      A57UnitV, A57UnitV]> {
  let Latency = 9;
  let NumMicroOps = 5;
}


//===----------------------------------------------------------------------===//
// Define Generic 6 micro-op types

def A57Write_3cyc_1I_3S_2V : SchedWriteRes<[A57UnitI,
                                            A57UnitS, A57UnitS, A57UnitS,
                                            A57UnitV, A57UnitV]> {
[... 49 lines not shown ...]
}
def A57Write_4cyc_1I_4S_2V : SchedWriteRes<[A57UnitI,
                                            A57UnitS, A57UnitS,
                                            A57UnitS, A57UnitS,
                                            A57UnitV, A57UnitV]> {
  let Latency = 4;
  let NumMicroOps = 7;
}
def A57Write_6cyc_1I_6S : SchedWriteRes<[A57UnitI,
                                         A57UnitS, A57UnitS, A57UnitS,
                                         A57UnitS, A57UnitS, A57UnitS]> {
  let Latency = 6;
  let NumMicroOps = 7;
}
def A57Write_9cyc_1I_2L_4V : SchedWriteRes<[A57UnitI,
                                            A57UnitL, A57UnitL,
                                            A57UnitV, A57UnitV,
                                            A57UnitV, A57UnitV]> {
  let Latency = 9;
  let NumMicroOps = 7;
}
def A57Write_12cyc_7V : SchedWriteRes<[A57UnitV, A57UnitV, A57UnitV,
                                       A57UnitV, A57UnitV,
                                       A57UnitV, A57UnitV]> {
  let Latency = 12;
  let NumMicroOps = 7;
}


//===----------------------------------------------------------------------===//
// Define Generic 8 micro-op types

def A57Write_10cyc_1I_3L_4V : SchedWriteRes<[A57UnitI,
                                             A57UnitL, A57UnitL, A57UnitL,
                                             A57UnitV, A57UnitV,
[... 15 lines not shown; hunk context: def A57Write_8cyc_8S : SchedWriteRes<[A57UnitS, A57UnitS, ]
  let Latency = 8;
  let NumMicroOps = 8;
}


//===----------------------------------------------------------------------===//
// Define Generic 9 micro-op types

def A57Write_8cyc_1I_8S : SchedWriteRes<[A57UnitI,
                                         A57UnitS, A57UnitS,
                                         A57UnitS, A57UnitS,
                                         A57UnitS, A57UnitS,
                                         A57UnitS, A57UnitS]> {
  let Latency = 8;
  let NumMicroOps = 9;
}
def A57Write_11cyc_1I_4L_4V : SchedWriteRes<[A57UnitI,
                                             A57UnitL, A57UnitL,
                                             A57UnitL, A57UnitL,
                                             A57UnitV, A57UnitV,
                                             A57UnitV, A57UnitV]> {
  let Latency = 11;
  let NumMicroOps = 9;
}
def A57Write_15cyc_9V : SchedWriteRes<[A57UnitV, A57UnitV, A57UnitV,
                                       A57UnitV, A57UnitV, A57UnitV,
                                       A57UnitV, A57UnitV, A57UnitV]> {
  let Latency = 15;
  let NumMicroOps = 9;
}


//===----------------------------------------------------------------------===//
// Define Generic 10 micro-op types

def A57Write_6cyc_6S_4V : SchedWriteRes<[A57UnitS, A57UnitS, A57UnitS,
                                         A57UnitS, A57UnitS, A57UnitS,
                                         A57UnitV, A57UnitV,
[... 43 lines not shown ...]

Revision Contents

Diff 14018

lib/Target/AArch64/AArch64SchedA57.td

lib/Target/AArch64/AArch64SchedA57WriteRes.td