Diff 313793

llvm/lib/Target/AArch64/AArch64SchedA57.td

	def : SchedAlias<WriteAdr, A57Write_1cyc_1I>;			def : SchedAlias<WriteAdr, A57Write_1cyc_1I>;
	def : SchedAlias<WriteLDIdx, A57Write_4cyc_1I_1L>;			def : SchedAlias<WriteLDIdx, A57Write_4cyc_1I_1L>;
	def : SchedAlias<WriteSTIdx, A57Write_1cyc_1I_1S>;			def : SchedAlias<WriteSTIdx, A57Write_1cyc_1I_1S>;
	def : SchedAlias<WriteF, A57Write_3cyc_1V>;			def : SchedAlias<WriteF, A57Write_3cyc_1V>;
	def : SchedAlias<WriteFCmp, A57Write_3cyc_1V>;			def : SchedAlias<WriteFCmp, A57Write_3cyc_1V>;
	def : SchedAlias<WriteFCvt, A57Write_5cyc_1V>;			def : SchedAlias<WriteFCvt, A57Write_5cyc_1V>;
	def : SchedAlias<WriteFCopy, A57Write_5cyc_1L>;			def : SchedAlias<WriteFCopy, A57Write_5cyc_1L>;
	def : SchedAlias<WriteFImm, A57Write_3cyc_1V>;			def : SchedAlias<WriteFImm, A57Write_3cyc_1V>;
	def : SchedAlias<WriteFMul, A57Write_5cyc_1V>;			def : WriteRes<WriteFMul, [A57UnitV]> { let Latency = 5;}
				dmgreen [Done]: Can you update this comment to explain why this is different, not why it has _changed_ (which doesn't mean a lot once the code is in-tree)? Something like "Use a WriteRes as opposed to SchedAlias for advance lookup".
				mnadeem (author) [Done]: Removed the comment because the warning on line 68 explains the change. I felt that the comment was redundant.
	def : SchedAlias<WriteFDiv, A57Write_17cyc_1W>;			def : SchedAlias<WriteFDiv, A57Write_17cyc_1W>;
	def : SchedAlias<WriteV, A57Write_3cyc_1V>;			def : SchedAlias<WriteV, A57Write_3cyc_1V>;
	def : SchedAlias<WriteVLD, A57Write_5cyc_1L>;			def : SchedAlias<WriteVLD, A57Write_5cyc_1L>;
	def : SchedAlias<WriteVST, A57Write_1cyc_1S>;			def : SchedAlias<WriteVST, A57Write_1cyc_1S>;

	def : WriteRes<WriteAtomic, []> { let Unsupported = 1; }			def : WriteRes<WriteAtomic, []> { let Unsupported = 1; }

	def : WriteRes<WriteSys, []> { let Latency = 1; }			def : WriteRes<WriteSys, []> { let Latency = 1; }
	// Reference for forms in this group			// Reference for forms in this group
	// D form - v8i8, v4i16, v2i32			// D form - v8i8, v4i16, v2i32
	// Q form - v16i8, v8i16, v4i32			// Q form - v16i8, v8i16, v4i32
	// D form - v1i8, v1i16, v1i32, v1i64			// D form - v1i8, v1i16, v1i32, v1i64
	// Q form - v16i8, v8i16, v4i32, v2i64			// Q form - v16i8, v8i16, v4i32, v2i64
	// D form - v8i8_v8i16, v4i16_v4i32, v2i32_v2i64			// D form - v8i8_v8i16, v4i16_v4i32, v2i32_v2i64
	// Q form - v16i8_v8i16, v8i16_v4i32, v4i32_v2i64			// Q form - v16i8_v8i16, v8i16_v4i32, v4i32_v2i64

				// Cortex A57 Software Optimization Guide Sec 3.14
				// Advance for absolute diff accum, pairwise add and accumulate, shift accumulate
				def A57ReadIVA3 : SchedReadAdvance<3, [A57Write_4cyc_1X_NonMul_Forward, A57Write_5cyc_2X_NonMul_Forward]>;

	// ASIMD absolute diff accum, D-form			// ASIMD absolute diff accum, D-form
	def : InstRW<[A57Write_4cyc_1X], (instregex "^[SU]ABA(v8i8\|v4i16\|v2i32)$")>;			def : InstRW<[A57Write_4cyc_1X_NonMul_Forward, A57ReadIVA3], (instregex "^[SU]ABA(v8i8\|v4i16\|v2i32)$")>;
	// ASIMD absolute diff accum, Q-form			// ASIMD absolute diff accum, Q-form
	def : InstRW<[A57Write_5cyc_2X], (instregex "^[SU]ABA(v16i8\|v8i16\|v4i32)$")>;			def : InstRW<[A57Write_5cyc_2X_NonMul_Forward, A57ReadIVA3], (instregex "^[SU]ABA(v16i8\|v8i16\|v4i32)$")>;
	// ASIMD absolute diff accum long			// ASIMD absolute diff accum long
	def : InstRW<[A57Write_4cyc_1X], (instregex "^[SU]ABAL")>;			def : InstRW<[A57Write_4cyc_1X_NonMul_Forward, A57ReadIVA3], (instregex "^[SU]ABAL")>;

	// ASIMD arith, reduce, 4H/4S			// ASIMD arith, reduce, 4H/4S
	def : InstRW<[A57Write_4cyc_1X], (instregex "^[SU]?ADDL?V(v8i8\|v4i16\|v2i32)v$")>;			def : InstRW<[A57Write_4cyc_1X], (instregex "^[SU]?ADDL?V(v8i8\|v4i16\|v2i32)v$")>;
	// ASIMD arith, reduce, 8B/8H			// ASIMD arith, reduce, 8B/8H
	def : InstRW<[A57Write_7cyc_1V_1X], (instregex "^[SU]?ADDL?V(v8i16\|v4i32)v$")>;			def : InstRW<[A57Write_7cyc_1V_1X], (instregex "^[SU]?ADDL?V(v8i16\|v4i32)v$")>;
	// ASIMD arith, reduce, 16B			// ASIMD arith, reduce, 16B
	def : InstRW<[A57Write_8cyc_2X], (instregex "^[SU]?ADDL?Vv16i8v$")>;			def : InstRW<[A57Write_8cyc_2X], (instregex "^[SU]?ADDL?Vv16i8v$")>;

	// ASIMD max/min, reduce, 4H/4S			// ASIMD max/min, reduce, 4H/4S
	def : InstRW<[A57Write_4cyc_1X], (instregex "^[SU](MIN\|MAX)V(v4i16\|v4i32)v$")>;			def : InstRW<[A57Write_4cyc_1X], (instregex "^[SU](MIN\|MAX)V(v4i16\|v4i32)v$")>;
	// ASIMD max/min, reduce, 8B/8H			// ASIMD max/min, reduce, 8B/8H
	def : InstRW<[A57Write_7cyc_1V_1X], (instregex "^[SU](MIN\|MAX)V(v8i8\|v8i16)v$")>;			def : InstRW<[A57Write_7cyc_1V_1X], (instregex "^[SU](MIN\|MAX)V(v8i8\|v8i16)v$")>;
	// ASIMD max/min, reduce, 16B			// ASIMD max/min, reduce, 16B
	def : InstRW<[A57Write_8cyc_2X], (instregex "^[SU](MIN\|MAX)Vv16i8v$")>;			def : InstRW<[A57Write_8cyc_2X], (instregex "^[SU](MIN\|MAX)Vv16i8v$")>;

	// ASIMD multiply, D-form			// ASIMD multiply, D-form
				mnadeem (author) [Done]: Divided these into two because PMUL etc. don't forward.
	def : InstRW<[A57Write_5cyc_1W], (instregex "^(P?MUL\|SQR?DMULH)(v8i8\|v4i16\|v2i32\|v1i8\|v1i16\|v1i32\|v1i64)(_indexed)?$")>;			def : InstRW<[A57Write_5cyc_1W], (instregex "^(P?MUL\|SQR?DMULH)(v8i8\|v4i16\|v2i32\|v1i8\|v1i16\|v1i32\|v1i64)(_indexed)?$")>;
				dmgreen [Not Done]: Do PMUL and SQDMULH have this forwarding? Same for other instructions like SQDMLAL below.
				evgeny777 [Not Done]: @mnadeem It looks like they don't (at least PMUL). I've done some experiments with llvm-exegesis, with the following results:
				    - The latency of PMUL is 4 cycles, not 5 cycles.
				    - There is always a 4-cycle latency for a PMUL result forwarded to an MLA/MLS accumulator.
				I used a Jetson Nano board for testing.
	// ASIMD multiply, Q-form			// ASIMD multiply, Q-form
	def : InstRW<[A57Write_6cyc_2W], (instregex "^(P?MUL\|SQR?DMULH)(v16i8\|v8i16\|v4i32)(_indexed)?$")>;			def : InstRW<[A57Write_6cyc_2W], (instregex "^(P?MUL\|SQR?DMULH)(v16i8\|v8i16\|v4i32)(_indexed)?$")>;

				// Cortex A57 Software Optimization Guide Sec 3.14
				def A57ReadIVMA4 : SchedReadAdvance<4 , [A57Write_5cyc_1W_Mul_Forward, A57Write_6cyc_2W_Mul_Forward]>;
				def A57ReadIVMA3 : SchedReadAdvance<3 , [A57Write_5cyc_1W_Mul_Forward, A57Write_6cyc_2W_Mul_Forward]>;

	// ASIMD multiply accumulate, D-form			// ASIMD multiply accumulate, D-form
	def : InstRW<[A57Write_5cyc_1W], (instregex "^ML[AS](v8i8\|v4i16\|v2i32)(_indexed)?$")>;			def : InstRW<[A57Write_5cyc_1W_Mul_Forward, A57ReadIVMA4], (instregex "^ML[AS](v8i8\|v4i16\|v2i32)(_indexed)?$")>;
	// ASIMD multiply accumulate, Q-form			// ASIMD multiply accumulate, Q-form
	def : InstRW<[A57Write_6cyc_2W], (instregex "^ML[AS](v16i8\|v8i16\|v4i32)(_indexed)?$")>;			def : InstRW<[A57Write_6cyc_2W_Mul_Forward, A57ReadIVMA4], (instregex "^ML[AS](v16i8\|v8i16\|v4i32)(_indexed)?$")>;

	// ASIMD multiply accumulate long			// ASIMD multiply accumulate long
	// ASIMD multiply accumulate saturating long			// ASIMD multiply accumulate saturating long
	def A57WriteIVMA : SchedWriteRes<[A57UnitW]> { let Latency = 5; }			def : InstRW<[A57Write_5cyc_1W_Mul_Forward, A57ReadIVMA4], (instregex "^(S\|U)ML[AS]L")>;
	def A57ReadIVMA4 : SchedReadAdvance<4, [A57WriteIVMA]>;			def : InstRW<[A57Write_5cyc_1W_Mul_Forward, A57ReadIVMA3], (instregex "^SQDML[AS]L")>;
	def : InstRW<[A57WriteIVMA, A57ReadIVMA4], (instregex "^(S\|U\|SQD)ML[AS]L")>;

	// ASIMD multiply long			// ASIMD multiply long
	def : InstRW<[A57Write_5cyc_1W], (instregex "^(S\|U\|SQD)MULL")>;			def : InstRW<[A57Write_5cyc_1W], (instregex "^(S\|U\|SQD)MULL")>;
	def : InstRW<[A57Write_5cyc_1W], (instregex "^PMULL(v8i8\|v16i8)")>;			def : InstRW<[A57Write_5cyc_1W], (instregex "^PMULL(v8i8\|v16i8)")>;
	def : InstRW<[A57Write_3cyc_1W], (instregex "^PMULL(v1i64\|v2i64)")>;			def : InstRW<[A57Write_3cyc_1W], (instregex "^PMULL(v1i64\|v2i64)")>;

	// ASIMD pairwise add and accumulate			// ASIMD pairwise add and accumulate
	// ASIMD shift accumulate			// ASIMD shift accumulate
	def A57WriteIVA : SchedWriteRes<[A57UnitX]> { let Latency = 4; }			def : InstRW<[A57Write_4cyc_1X_NonMul_Forward, A57ReadIVA3], (instregex "^[SU]ADALP")>;
	def A57ReadIVA3 : SchedReadAdvance<3, [A57WriteIVA]>;			def : InstRW<[A57Write_4cyc_1X_NonMul_Forward, A57ReadIVA3], (instregex "^(S\|SR\|U\|UR)SRA")>;
	def : InstRW<[A57WriteIVA, A57ReadIVA3], (instregex "^[SU]ADALP")>;
	def : InstRW<[A57WriteIVA, A57ReadIVA3], (instregex "^(S\|SR\|U\|UR)SRA")>;

	// ASIMD shift by immed, complex			// ASIMD shift by immed, complex
	def : InstRW<[A57Write_4cyc_1X], (instregex "^[SU]?(Q\|R){1,2}SHR")>;			def : InstRW<[A57Write_4cyc_1X], (instregex "^[SU]?(Q\|R){1,2}SHR")>;
				dmgreen [Done]: It may be worth splitting the non-forwarding parts off into their own commit to keep the unrelated changes separate.
	def : InstRW<[A57Write_4cyc_1X], (instregex "^SQSHLU")>;			def : InstRW<[A57Write_4cyc_1X], (instregex "^SQSHLU")>;


	// ASIMD shift by register, basic, Q-form			// ASIMD shift by register, basic, Q-form
				dmgreen [Done]: What is a ^[SU]SHLv1i8? Same for all these others.
				mnadeem (author) [Done]: SSHL and USHL D-form: https://godbolt.org/z/a7jo7a. The schedule was missing. These should only use the X unit but seem to be using units X/W.
				dmgreen [Done]: This can drop the v1iX (except i64). These are all the valid SSHL instructions, the bottom ones being the important ones:
				    SSHLLB_ZZI_D = 4444,
				    SSHLLB_ZZI_H = 4445,
				    SSHLLB_ZZI_S = 4446,
				    SSHLLT_ZZI_D = 4447,
				    SSHLLT_ZZI_H = 4448,
				    SSHLLT_ZZI_S = 4449,
				    SSHLLv16i8_shift = 4450,
				    SSHLLv2i32_shift = 4451,
				    SSHLLv4i16_shift = 4452,
				    SSHLLv4i32_shift = 4453,
				    SSHLLv8i16_shift = 4454,
				    SSHLLv8i8_shift = 4455,
				    SSHLv16i8 = 4456,
				    SSHLv1i64 = 4457,
				    SSHLv2i32 = 4458,
				    SSHLv2i64 = 4459,
				    SSHLv4i16 = 4460,
				    SSHLv4i32 = 4461,
				    SSHLv8i16 = 4462,
				    SSHLv8i8 = 4463,
	def : InstRW<[A57Write_4cyc_2X], (instregex "^[SU]SHL(v16i8\|v8i16\|v4i32\|v2i64)")>;			def : InstRW<[A57Write_4cyc_2X], (instregex "^[SU]SHL(v16i8\|v8i16\|v4i32\|v2i64)")>;

	// ASIMD shift by register, complex, D-form			// ASIMD shift by register, complex, D-form
	def : InstRW<[A57Write_4cyc_1X], (instregex "^[SU][QR]{1,2}SHL(v1i8\|v1i16\|v1i32\|v1i64\|v8i8\|v4i16\|v2i32\|b\|d\|h\|s)")>;			def : InstRW<[A57Write_4cyc_1X], (instregex "^[SU][QR]{1,2}SHL(v1i8\|v1i16\|v1i32\|v1i64\|v8i8\|v4i16\|v2i32\|b\|d\|h\|s)")>;

	// ASIMD shift by register, complex, Q-form			// ASIMD shift by register, complex, Q-form
	def : InstRW<[A57Write_5cyc_2X], (instregex "^[SU][QR]{1,2}SHL(v16i8\|v8i16\|v4i32\|v2i64)")>;			def : InstRW<[A57Write_5cyc_2X], (instregex "^[SU][QR]{1,2}SHL(v16i8\|v8i16\|v4i32\|v2i64)")>;

	// ASIMD FP max/min, pairwise, D-form			// ASIMD FP max/min, pairwise, D-form
	def : InstRW<[A57Write_5cyc_1V], (instregex "^(FMAX\|FMIN)(NM)?P(v2f32\|v2i32)")>;			def : InstRW<[A57Write_5cyc_1V], (instregex "^(FMAX\|FMIN)(NM)?P(v2f32\|v2i32)")>;
	// ASIMD FP max/min, pairwise, Q-form			// ASIMD FP max/min, pairwise, Q-form
	def : InstRW<[A57Write_9cyc_3V], (instregex "^(FMAX\|FMIN)(NM)?P(v4f32\|v2f64\|v2i64)")>;			def : InstRW<[A57Write_9cyc_3V], (instregex "^(FMAX\|FMIN)(NM)?P(v4f32\|v2f64\|v2i64)")>;
	// ASIMD FP max/min, reduce			// ASIMD FP max/min, reduce
	def : InstRW<[A57Write_10cyc_3V], (instregex "^(FMAX\|FMIN)(NM)?Vv")>;			def : InstRW<[A57Write_10cyc_3V], (instregex "^(FMAX\|FMIN)(NM)?Vv")>;

	// ASIMD FP multiply, D-form, FZ			// ASIMD FP multiply, D-form, FZ
	def : InstRW<[A57Write_5cyc_1V], (instregex "^FMULX?(v2f32\|v1i32\|v2i32\|v1i64\|32\|64)")>;			def : InstRW<[A57Write_5cyc_1V_FP_Forward], (instregex "^FMULX?(v2f32\|v1i32\|v2i32\|v1i64\|32\|64)")>;
	// ASIMD FP multiply, Q-form, FZ			// ASIMD FP multiply, Q-form, FZ
	def : InstRW<[A57Write_5cyc_2V], (instregex "^FMULX?(v4f32\|v2f64\|v4i32\|v2i64)")>;			def : InstRW<[A57Write_5cyc_2V_FP_Forward], (instregex "^FMULX?(v4f32\|v2f64\|v4i32\|v2i64)")>;

	// ASIMD FP multiply accumulate, D-form, FZ			// ASIMD FP multiply accumulate, D-form, FZ
	// ASIMD FP multiply accumulate, Q-form, FZ			// ASIMD FP multiply accumulate, Q-form, FZ
	def A57WriteFPVMAD : SchedWriteRes<[A57UnitV]> { let Latency = 9; }			def A57WriteFPVMAD : SchedWriteRes<[A57UnitV]> { let Latency = 9; }
	def A57WriteFPVMAQ : SchedWriteRes<[A57UnitV, A57UnitV]> { let Latency = 10; }			def A57WriteFPVMAQ : SchedWriteRes<[A57UnitV, A57UnitV]> { let Latency = 10; }
	def A57ReadFPVMA5 : SchedReadAdvance<5, [A57WriteFPVMAD, A57WriteFPVMAQ]>;
				// Cortex A57 Software Optimization Guide Sec 3.15
				// Advances from FP mul and mul-accum to mul-accum
				def A57ReadFPVMA5 : SchedReadAdvance<5, [A57WriteFPVMAD, A57WriteFPVMAQ, A57Write_5cyc_1V_FP_Forward, A57Write_5cyc_2V_FP_Forward]>;
				def A57ReadFPVMA6 : SchedReadAdvance<6, [A57WriteFPVMAD, A57WriteFPVMAQ, A57Write_5cyc_1V_FP_Forward, A57Write_5cyc_2V_FP_Forward]>;

	def : InstRW<[A57WriteFPVMAD, A57ReadFPVMA5], (instregex "^FML[AS](v2f32\|v1i32\|v2i32\|v1i64)")>;			def : InstRW<[A57WriteFPVMAD, A57ReadFPVMA5], (instregex "^FML[AS](v2f32\|v1i32\|v2i32\|v1i64)")>;
	def : InstRW<[A57WriteFPVMAQ, A57ReadFPVMA5], (instregex "^FML[AS](v4f32\|v2f64\|v4i32\|v2i64)")>;			def : InstRW<[A57WriteFPVMAQ, A57ReadFPVMA6], (instregex "^FML[AS](v4f32\|v2f64\|v4i32\|v2i64)")>;

	// ASIMD FP round, D-form			// ASIMD FP round, D-form
	def : InstRW<[A57Write_5cyc_1V], (instregex "^FRINT[AIMNPXZ](v2f32)")>;			def : InstRW<[A57Write_5cyc_1V], (instregex "^FRINT[AIMNPXZ](v2f32)")>;
	// ASIMD FP round, Q-form			// ASIMD FP round, Q-form
	def : InstRW<[A57Write_5cyc_2V], (instregex "^FRINT[AIMNPXZ](v4f32\|v2f64)")>;			def : InstRW<[A57Write_5cyc_2V], (instregex "^FRINT[AIMNPXZ](v4f32\|v2f64)")>;


	// Vector - Miscellaneous			// Vector - Miscellaneous
	def : InstRW<[A57Write_6cyc_3V], (instregex "^(UZP\|ZIP)(1\|2)(v16i8\|v8i16\|v4i32\|v2i64)")>;			def : InstRW<[A57Write_6cyc_3V], (instregex "^(UZP\|ZIP)(1\|2)(v16i8\|v8i16\|v4i32\|v2i64)")>;


	// Remainder			// Remainder
	// -----------------------------------------------------------------------------			// -----------------------------------------------------------------------------

	def : InstRW<[A57Write_5cyc_1V], (instregex "^F(ADD\|SUB)[DS]rr")>;			def : InstRW<[A57Write_5cyc_1V], (instregex "^F(ADD\|SUB)[DS]rr")>;

				// Cortex A57 Software Optimization Guide Sec 3.10
	def A57WriteFPMA : SchedWriteRes<[A57UnitV]> { let Latency = 9; }			def A57WriteFPMA : SchedWriteRes<[A57UnitV]> { let Latency = 9; }
	def A57ReadFPMA5 : SchedReadAdvance<5, [A57WriteFPMA]>;			def A57ReadFPMA5 : SchedReadAdvance<5, [A57WriteFPMA, WriteFMul]>;
	def A57ReadFPM : SchedReadAdvance<0>;			def A57ReadFPM : SchedReadAdvance<0>;
	def : InstRW<[A57WriteFPMA, A57ReadFPM, A57ReadFPM, A57ReadFPMA5], (instregex "^FN?M(ADD\|SUB)[DS]rrr")>;			def : InstRW<[A57WriteFPMA, A57ReadFPM, A57ReadFPM, A57ReadFPMA5], (instregex "^FN?M(ADD\|SUB)[DS]rrr")>;

	def : InstRW<[A57Write_10cyc_1L_1V], (instregex "^[FSU]CVT[AMNPZ][SU](_Int)?[SU]?[XW]?[DS]?[rds]i?")>;			def : InstRW<[A57Write_10cyc_1L_1V], (instregex "^[FSU]CVT[AMNPZ][SU](_Int)?[SU]?[XW]?[DS]?[rds]i?")>;
	def : InstRW<[A57Write_10cyc_1L_1V], (instregex "^[SU]CVTF")>;			def : InstRW<[A57Write_10cyc_1L_1V], (instregex "^[SU]CVTF")>;

	def : InstRW<[A57Write_32cyc_1W], (instrs FDIVDrr)>;			def : InstRW<[A57Write_32cyc_1W], (instrs FDIVDrr)>;
	def : InstRW<[A57Write_17cyc_1W], (instrs FDIVSrr)>;			def : InstRW<[A57Write_17cyc_1W], (instrs FDIVSrr)>;

llvm/lib/Target/AArch64/AArch64SchedA57WriteRes.td

	//=- AArch64SchedA57WriteRes.td - ARM Cortex-A57 Write Res ---- tablegen --=//			//=- AArch64SchedA57WriteRes.td - ARM Cortex-A57 Write Res ---- tablegen --=//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	//			//
	// Contains all of the Cortex-A57 specific SchedWriteRes types. The approach			// Contains all of the Cortex-A57 specific SchedWriteRes types. The approach
	// below is to define a generic SchedWriteRes for every combination of			// below is to define a generic SchedWriteRes for every combination of
	// latency and microOps. The naming conventions is to use a prefix, one field			// latency and microOps. The naming conventions is to use a prefix, one field
	// for latency, and one or more microOp count/type designators.			// for latency, and one or more microOp count/type designators.
	// Prefix: A57Write			// Prefix: A57Write
	// Latency: #cyc			// Latency: #cyc
	// MicroOp Count/Types: #(B\|I\|M\|L\|S\|X\|W\|V)			// MicroOp Count/Types: #(B\|I\|M\|L\|S\|X\|W\|V)
				// Postfix (optional): (XYZ)_Forward
				//
				// The postfix is added to differentiate SchedWriteRes that are used in
				// subsequent SchedReadAdvances.
	//			//
	// e.g. A57Write_6cyc_1I_6S_4V means the total latency is 6 and there are			// e.g. A57Write_6cyc_1I_6S_4V means the total latency is 6 and there are
	// 11 micro-ops to be issued down one I pipe, six S pipes and four V pipes.			// 11 micro-ops to be issued down one I pipe, six S pipes and four V pipes.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
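Note: a rough sketch of how the naming convention described in the header above could be decoded mechanically. It is written in Python purely for illustration and is not part of the patch. `parse_write_name` and `WriteName` are hypothetical names, and the regular expression assumes the optional postfix always ends in `_Forward`, as the header states.

```python
import re
from typing import NamedTuple


class WriteName(NamedTuple):
    latency: int   # the "#cyc" field
    units: dict    # pipe letter -> micro-op count
    postfix: str   # "" when there is no *_Forward postfix


# Assumed shape: A57Write_<N>cyc(_<count><pipe>)+(_<XYZ>_Forward)?
_NAME_RE = re.compile(
    r"^A57Write_(\d+)cyc((?:_\d+[BIMLSXWV])+)(?:_(\w+_Forward))?$"
)


def parse_write_name(name: str) -> WriteName:
    """Split an A57Write_* name into latency, per-pipe micro-op counts, postfix."""
    m = _NAME_RE.match(name)
    if m is None:
        raise ValueError(f"not an A57Write_* name: {name}")
    units = {}
    for count, pipe in re.findall(r"_(\d+)([BIMLSXWV])", m.group(2)):
        units[pipe] = units.get(pipe, 0) + int(count)
    return WriteName(int(m.group(1)), units, m.group(3) or "")


example = parse_write_name("A57Write_6cyc_1I_6S_4V")
print(example.latency, sum(example.units.values()))
```

For the header's own example this reports a latency of 6 and 11 micro-ops in total; a name such as `A57Write_5cyc_1W_Mul_Forward` yields the postfix `Mul_Forward`.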

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// Define Generic 1 micro-op types			// Define Generic 1 micro-op types

	def A57Write_5cyc_1L : SchedWriteRes<[A57UnitL]> { let Latency = 5; }			def A57Write_5cyc_1L : SchedWriteRes<[A57UnitL]> { let Latency = 5; }
	def A57Write_5cyc_1M : SchedWriteRes<[A57UnitM]> { let Latency = 5; }			def A57Write_5cyc_1M : SchedWriteRes<[A57UnitM]> { let Latency = 5; }
	def A57Write_5cyc_1V : SchedWriteRes<[A57UnitV]> { let Latency = 5; }			def A57Write_5cyc_1V : SchedWriteRes<[A57UnitV]> { let Latency = 5; }
				def A57Write_5cyc_1V_FP_Forward : SchedWriteRes<[A57UnitV]> { let Latency = 5; }
	def A57Write_5cyc_1W : SchedWriteRes<[A57UnitW]> { let Latency = 5; }			def A57Write_5cyc_1W : SchedWriteRes<[A57UnitW]> { let Latency = 5; }
				def A57Write_5cyc_1W_Mul_Forward : SchedWriteRes<[A57UnitW]> { let Latency = 5; }
	def A57Write_10cyc_1V : SchedWriteRes<[A57UnitV]> { let Latency = 10; }			def A57Write_10cyc_1V : SchedWriteRes<[A57UnitV]> { let Latency = 10; }
	def A57Write_17cyc_1W : SchedWriteRes<[A57UnitW]> { let Latency = 17;			def A57Write_17cyc_1W : SchedWriteRes<[A57UnitW]> { let Latency = 17;
	let ResourceCycles = [17]; }			let ResourceCycles = [17]; }
	def A57Write_19cyc_1M : SchedWriteRes<[A57UnitM]> { let Latency = 19;			def A57Write_19cyc_1M : SchedWriteRes<[A57UnitM]> { let Latency = 19;
	let ResourceCycles = [19]; }			let ResourceCycles = [19]; }
	def A57Write_1cyc_1B : SchedWriteRes<[A57UnitB]> { let Latency = 1; }			def A57Write_1cyc_1B : SchedWriteRes<[A57UnitB]> { let Latency = 1; }
	def A57Write_1cyc_1I : SchedWriteRes<[A57UnitI]> { let Latency = 1; }			def A57Write_1cyc_1I : SchedWriteRes<[A57UnitI]> { let Latency = 1; }
	def A57Write_1cyc_1S : SchedWriteRes<[A57UnitS]> { let Latency = 1; }			def A57Write_1cyc_1S : SchedWriteRes<[A57UnitS]> { let Latency = 1; }
	def A57Write_2cyc_1M : SchedWriteRes<[A57UnitM]> { let Latency = 2; }			def A57Write_2cyc_1M : SchedWriteRes<[A57UnitM]> { let Latency = 2; }
	def A57Write_32cyc_1W : SchedWriteRes<[A57UnitW]> { let Latency = 32;			def A57Write_32cyc_1W : SchedWriteRes<[A57UnitW]> { let Latency = 32;
	let ResourceCycles = [32]; }			let ResourceCycles = [32]; }
	def A57Write_35cyc_1M : SchedWriteRes<[A57UnitM]> { let Latency = 35;			def A57Write_35cyc_1M : SchedWriteRes<[A57UnitM]> { let Latency = 35;
	let ResourceCycles = [35]; }			let ResourceCycles = [35]; }
	def A57Write_3cyc_1M : SchedWriteRes<[A57UnitM]> { let Latency = 3; }			def A57Write_3cyc_1M : SchedWriteRes<[A57UnitM]> { let Latency = 3; }
	def A57Write_3cyc_1V : SchedWriteRes<[A57UnitV]> { let Latency = 3; }			def A57Write_3cyc_1V : SchedWriteRes<[A57UnitV]> { let Latency = 3; }
	def A57Write_3cyc_1W : SchedWriteRes<[A57UnitW]> { let Latency = 3; }			def A57Write_3cyc_1W : SchedWriteRes<[A57UnitW]> { let Latency = 3; }
	def A57Write_3cyc_1X : SchedWriteRes<[A57UnitX]> { let Latency = 3; }			def A57Write_3cyc_1X : SchedWriteRes<[A57UnitX]> { let Latency = 3; }
	def A57Write_4cyc_1L : SchedWriteRes<[A57UnitL]> { let Latency = 4; }			def A57Write_4cyc_1L : SchedWriteRes<[A57UnitL]> { let Latency = 4; }
	def A57Write_4cyc_1X : SchedWriteRes<[A57UnitX]> { let Latency = 4; }			def A57Write_4cyc_1X : SchedWriteRes<[A57UnitX]> { let Latency = 4; }
				dmgreen [Done]: Where is this used?
				mnadeem (author) [Done]: Removed it. That was a mistake.
				def A57Write_4cyc_1X_NonMul_Forward : SchedWriteRes<[A57UnitX]> { let Latency = 4; }
	def A57Write_9cyc_1V : SchedWriteRes<[A57UnitV]> { let Latency = 9; }			def A57Write_9cyc_1V : SchedWriteRes<[A57UnitV]> { let Latency = 9; }
	def A57Write_6cyc_1M : SchedWriteRes<[A57UnitM]> { let Latency = 6; }			def A57Write_6cyc_1M : SchedWriteRes<[A57UnitM]> { let Latency = 6; }
	def A57Write_6cyc_1V : SchedWriteRes<[A57UnitV]> { let Latency = 6; }			def A57Write_6cyc_1V : SchedWriteRes<[A57UnitV]> { let Latency = 6; }


	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// Define Generic 2 micro-op types			// Define Generic 2 micro-op types

	def A57Write_6cyc_2V : SchedWriteRes<[A57UnitV, A57UnitV]> {			def A57Write_6cyc_2V : SchedWriteRes<[A57UnitV, A57UnitV]> {
	let Latency = 6;			let Latency = 6;
	let NumMicroOps = 2;			let NumMicroOps = 2;
	}			}
	def A57Write_6cyc_2W : SchedWriteRes<[A57UnitW, A57UnitW]> {			def A57Write_6cyc_2W : SchedWriteRes<[A57UnitW, A57UnitW]> {
	let Latency = 6;			let Latency = 6;
	let NumMicroOps = 2;			let NumMicroOps = 2;
	}			}
				def A57Write_6cyc_2W_Mul_Forward : SchedWriteRes<[A57UnitW, A57UnitW]> {
				let Latency = 6;
				let NumMicroOps = 2;
				}
	def A57Write_5cyc_1I_1L : SchedWriteRes<[A57UnitI,			def A57Write_5cyc_1I_1L : SchedWriteRes<[A57UnitI,
	A57UnitL]> {			A57UnitL]> {
	let Latency = 5;			let Latency = 5;
	let NumMicroOps = 2;			let NumMicroOps = 2;
	}			}
	def A57Write_5cyc_2V : SchedWriteRes<[A57UnitV, A57UnitV]> {			def A57Write_5cyc_2V : SchedWriteRes<[A57UnitV, A57UnitV]> {
	let Latency = 5;			let Latency = 5;
	let NumMicroOps = 2;			let NumMicroOps = 2;
	}			}
				def A57Write_5cyc_2V_FP_Forward : SchedWriteRes<[A57UnitV, A57UnitV]> {
				let Latency = 5;
				let NumMicroOps = 2;
				}
	def A57Write_5cyc_2X : SchedWriteRes<[A57UnitX, A57UnitX]> {			def A57Write_5cyc_2X : SchedWriteRes<[A57UnitX, A57UnitX]> {
	let Latency = 5;			let Latency = 5;
	let NumMicroOps = 2;			let NumMicroOps = 2;
	}			}
				def A57Write_5cyc_2X_NonMul_Forward : SchedWriteRes<[A57UnitX, A57UnitX]> {
				let Latency = 5;
				let NumMicroOps = 2;
				}
	def A57Write_10cyc_1L_1V : SchedWriteRes<[A57UnitL,			def A57Write_10cyc_1L_1V : SchedWriteRes<[A57UnitL,
	A57UnitV]> {			A57UnitV]> {
	let Latency = 10;			let Latency = 10;
	let NumMicroOps = 2;			let NumMicroOps = 2;
	}			}
	def A57Write_10cyc_2V : SchedWriteRes<[A57UnitV, A57UnitV]> {			def A57Write_10cyc_2V : SchedWriteRes<[A57UnitV, A57UnitV]> {
	let Latency = 10;			let Latency = 10;
	let NumMicroOps = 2;			let NumMicroOps = 2;

llvm/test/tools/llvm-mca/AArch64/Cortex/forwarding-A57.s

This file was added.
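
As a reading aid for the expected timelines in this test, here is a minimal model. It assumes that a SchedReadAdvance simply subtracts its cycle count from the producer's latency and that a negative result is clamped to zero. `effective_latency` is a hypothetical helper, not an LLVM API; the latency and advance pairs are copied from the A57 definitions in this diff.

```python
def effective_latency(write_latency: int, read_advance: int) -> int:
    """Cycles a dependent instruction waits on a forwarded operand.

    Assumption: the advance is subtracted from the producer latency and
    the result never goes below zero.
    """
    return max(0, write_latency - read_advance)


# (producer latency, consumer read advance), taken from the diff above.
cases = {
    "fmul -> fmla, D-form": (5, 5),   # A57Write_5cyc_1V_FP_Forward, A57ReadFPVMA5
    "fmul -> fmla, Q-form": (5, 6),   # A57Write_5cyc_2V_FP_Forward, A57ReadFPVMA6
    "fmla -> fmla, D-form": (9, 5),   # A57WriteFPVMAD, A57ReadFPVMA5
    "saba -> saba, D-form": (4, 3),   # A57Write_4cyc_1X_NonMul_Forward, A57ReadIVA3
    "mla -> mla, D-form": (5, 4),     # A57Write_5cyc_1W_Mul_Forward, A57ReadIVMA4
    "sqdmlal -> sqdmlal": (5, 3),     # A57Write_5cyc_1W_Mul_Forward, A57ReadIVMA3
}

for name, (latency, advance) in cases.items():
    print(f"{name}: {effective_latency(latency, advance)} cycle(s)")
```

Under that assumption the results are 0, 0, 4, 1, 1 and 2 cycles, which lines up with the number of `=` stall markers in the corresponding CHECK lines. The Q-form fmul/fmla pair would come out negative and is clamped to 0.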

				# RUN: llvm-mca -march=aarch64 -mcpu=cortex-a57 -iterations=1 -timeline < %s \| FileCheck %s

				# CHECK: [0] Code Region
				# CHECK: Instructions: 2
				# CHECK-NEXT: Total Cycles: 12
				# CHECK: Timeline view:
				# CHECK: [0,0] DeeeeeER .. fmul v0.2s, v1.2s, v2.2s
				# CHECK-NEXT: [0,1] DeeeeeeeeeER fmla v0.2s, v1.2s, v2.2s

				# CHECK: [1] Code Region
				# CHECK: Instructions: 2
				# CHECK-NEXT: Total Cycles: 13
				# CHECK: Timeline view:
				# CHECK: [0,0] DeeeeeER . . fmul v0.4s, v1.4s, v2.4s
				# CHECK-NEXT: [0,1] DeeeeeeeeeeER fmla v0.4s, v1.4s, v2.4s

				# CHECK: [2] Code Region
				# CHECK: Instructions: 2
				# CHECK-NEXT: Total Cycles: 12
				# CHECK: Timeline view:
				# CHECK: [0,0] DeeeeeER .. fmulx v0.2s, v1.2s, v2.2s
				# CHECK-NEXT: [0,1] DeeeeeeeeeER fmls v0.2s, v1.2s, v2.2s

				# CHECK: [3] Code Region
				# CHECK: Instructions: 2
				# CHECK-NEXT: Total Cycles: 13
				# CHECK: Timeline view:
				# CHECK: [0,0] DeeeeeER . . fmulx v0.4s, v1.4s, v2.4s
				# CHECK-NEXT: [0,1] DeeeeeeeeeeER fmls v0.4s, v1.4s, v2.4s

				# CHECK: [4] Code Region
				# CHECK: Instructions: 2
				# CHECK-NEXT: Total Cycles: 16
				# CHECK: Timeline view:
				# CHECK: [0,0] DeeeeeeeeeER . fmla v0.2s, v1.2s, v2.2s
				# CHECK-NEXT: [0,1] D====eeeeeeeeeER fmla v0.2s, v3.2s, v4.2s

				# CHECK: [5] Code Region
				# CHECK: Instructions: 2
				# CHECK-NEXT: Total Cycles: 16
				# CHECK: Timeline view:
				# CHECK: [0,0] DeeeeeeeeeER . fmls v0.2s, v1.2s, v2.2s
				# CHECK-NEXT: [0,1] D====eeeeeeeeeER fmls v0.2s, v3.2s, v4.2s

				# CHECK: [6] Code Region
				# CHECK: Instructions: 2
				# CHECK-NEXT: Total Cycles: 12
				# CHECK: Timeline view:
				# CHECK: [0,0] DeeeeeER .. fmul d4, d5, d6
				# CHECK-NEXT: [0,1] DeeeeeeeeeER fmadd d1, d2, d3, d4

				# CHECK: [7] Code Region
				# CHECK: Instructions: 2
				# CHECK-NEXT: Total Cycles: 12
				# CHECK: Timeline view:
				# CHECK: [0,0] DeeeeeER .. fmul d4, d5, d6
				# CHECK-NEXT: [0,1] DeeeeeeeeeER fmadd d1, d2, d3, d4

				# CHECK: [8] Code Region
				# CHECK: Instructions: 2
				# CHECK-NEXT: Total Cycles: 16
				# CHECK: Timeline view:
				# CHECK: [0,0] DeeeeeeeeeER . fmadd d4, d5, d6, d7
				# CHECK-NEXT: [0,1] D====eeeeeeeeeER fmadd d1, d2, d3, d4

				# CHECK: [9] Code Region
				# CHECK: Instructions: 2
				# CHECK-NEXT: Total Cycles: 16
				# CHECK: Timeline view:
				# CHECK: [0,0] DeeeeeeeeeER . fmsub d4, d5, d6, d7
				# CHECK-NEXT: [0,1] D====eeeeeeeeeER fmsub d1, d2, d3, d4

				# CHECK: [10] Code Region
				# CHECK: Instructions: 2
				# CHECK-NEXT: Total Cycles: 16
				# CHECK: Timeline view:
				# CHECK: [0,0] DeeeeeeeeeER . fnmadd d4, d5, d6, d7
				# CHECK-NEXT: [0,1] D====eeeeeeeeeER fnmadd d1, d2, d3, d4

				# CHECK: [11] Code Region
				# CHECK: Instructions: 2
				# CHECK-NEXT: Total Cycles: 16
				# CHECK: Timeline view:
				# CHECK: [0,0] DeeeeeeeeeER . fnmsub d4, d5, d6, d7
				# CHECK-NEXT: [0,1] D====eeeeeeeeeER fnmsub d1, d2, d3, d4

				# CHECK: [12] Code Region
				# CHECK: Instructions: 2
				# CHECK-NEXT: Total Cycles: 8
				# CHECK: Timeline view:
				# CHECK: [0,0] DeeeeER. saba v0.2s, v1.2s, v2.2s
				# CHECK-NEXT: [0,1] D=eeeeER saba v0.2s, v3.2s, v4.2s

				# CHECK: [13] Code Region
				# CHECK: Instructions: 2
				# CHECK-NEXT: Total Cycles: 8
				# CHECK: Timeline view:
				# CHECK: [0,0] DeeeeER. sabal v0.2d, v1.2s, v2.2s
				# CHECK-NEXT: [0,1] D=eeeeER sabal v0.2d, v3.2s, v4.2s

				# CHECK: [14] Code Region
				# CHECK: Instructions: 2
				# CHECK-NEXT: Total Cycles: 8
				# CHECK: Timeline view:
				# CHECK: [0,0] DeeeeER. uaba v0.2s, v1.2s, v2.2s
				# CHECK-NEXT: [0,1] D=eeeeER uaba v0.2s, v3.2s, v4.2s

				# CHECK: [15] Code Region
				# CHECK: Instructions: 2
				# CHECK-NEXT: Total Cycles: 8
				# CHECK: Timeline view:
				# CHECK: [0,0] DeeeeER. uabal v0.2d, v1.2s, v2.2s
				# CHECK-NEXT: [0,1] D=eeeeER uabal v0.2d, v3.2s, v4.2s

				# CHECK: [16] Code Region
				# CHECK: Instructions: 2
				# CHECK-NEXT: Total Cycles: 8
				# CHECK: Timeline view:
				# CHECK: [0,0] DeeeeER. sadalp v0.1d, v1.2s
				# CHECK-NEXT: [0,1] D=eeeeER sadalp v0.1d, v2.2s

				# CHECK: [17] Code Region
				# CHECK: Instructions: 2
				# CHECK-NEXT: Total Cycles: 8
				# CHECK: Timeline view:
				# CHECK: [0,0] DeeeeER. uadalp v0.1d, v1.2s
				# CHECK-NEXT: [0,1] D=eeeeER uadalp v0.1d, v2.2s

				# CHECK: [18] Code Region
				# CHECK: Instructions: 2
				# CHECK-NEXT: Total Cycles: 8
				# CHECK: Timeline view:
				# CHECK: [0,0] DeeeeER. srsra v0.8b, v1.8b, #3
				# CHECK-NEXT: [0,1] D=eeeeER srsra v0.8b, v2.8b, #3

				# CHECK: [19] Code Region
				# CHECK: Instructions: 2
				# CHECK-NEXT: Total Cycles: 8
				# CHECK: Timeline view:
				# CHECK: [0,0] DeeeeER. ursra v0.8b, v1.8b, #3
				# CHECK-NEXT: [0,1] D=eeeeER ursra v0.8b, v2.8b, #3

				# CHECK: [20] Code Region
				# CHECK: Instructions: 2
				# CHECK-NEXT: Total Cycles: 8
				# CHECK: Timeline view:
				# CHECK: [0,0] DeeeeER. usra v0.4s, v1.4s, #3
				# CHECK-NEXT: [0,1] D=eeeeER usra v0.4s, v2.4s, #3

				# CHECK: [21] Code Region
				# CHECK: Instructions: 2
				# CHECK-NEXT: Total Cycles: 9
				# CHECK: Timeline view:
				# CHECK: [0,0] DeeeeeER. mla v0.2s, v1.2s, v2.2s
				# CHECK-NEXT: [0,1] D=eeeeeER mla v0.2s, v1.2s, v2.2s

				# CHECK: [22] Code Region
				# CHECK: Instructions: 2
				# CHECK-NEXT: Total Cycles: 11
				# CHECK: Timeline view:
				# CHECK: [0,0] DeeeeeeER . mla v0.4s, v1.4s, v2.4s
				# CHECK-NEXT: [0,1] .D=eeeeeeER mla v0.4s, v1.4s, v2.4s

				# CHECK: [23] Code Region
				# CHECK: Instructions: 2
				# CHECK-NEXT: Total Cycles: 9
				# CHECK: Timeline view:
				# CHECK: [0,0] DeeeeeER. mls v0.2s, v1.2s, v2.2s
				# CHECK-NEXT: [0,1] D=eeeeeER mls v0.2s, v1.2s, v2.2s

				# CHECK: [24] Code Region
				# CHECK: Instructions: 2
				# CHECK-NEXT: Total Cycles: 11
				# CHECK: Timeline view:
				# CHECK: [0,0] DeeeeeeER . mls v0.4s, v1.4s, v2.4s
				# CHECK-NEXT: [0,1] .D=eeeeeeER mls v0.4s, v1.4s, v2.4s

				# CHECK: [25] Code Region
				# CHECK: Instructions: 2
				# CHECK-NEXT: Total Cycles: 9
				# CHECK: Timeline view:
				# CHECK: [0,0] DeeeeeER. smlal v0.2d, v1.2s, v2.2s
				# CHECK-NEXT: [0,1] D=eeeeeER smlal v0.2d, v1.2s, v2.2s

				# CHECK: [26] Code Region
				# CHECK: Instructions: 2
				# CHECK-NEXT: Total Cycles: 9
				# CHECK: Timeline view:
				# CHECK: [0,0] DeeeeeER. smlsl v0.2d, v1.2s, v2.2s
				# CHECK-NEXT: [0,1] D=eeeeeER smlsl v0.2d, v1.2s, v2.2s

				# CHECK: [27] Code Region
				# CHECK: Instructions: 2
				# CHECK-NEXT: Total Cycles: 9
				# CHECK: Timeline view:
				# CHECK: [0,0] DeeeeeER. umlal v0.2d, v1.2s, v2.2s
				# CHECK-NEXT: [0,1] D=eeeeeER umlal v0.2d, v1.2s, v2.2s

				# CHECK: [28] Code Region
				# CHECK: Instructions: 2
				# CHECK-NEXT: Total Cycles: 9
				# CHECK: Timeline view:
				# CHECK: [0,0] DeeeeeER. umlsl v0.2d, v1.2s, v2.2s
				# CHECK-NEXT: [0,1] D=eeeeeER umlsl v0.2d, v1.2s, v2.2s

				# CHECK: [29] Code Region
				# CHECK: Instructions: 2
				# CHECK-NEXT: Total Cycles: 10
				# CHECK: Timeline view:
				# CHECK: [0,0] DeeeeeER . sqdmlal v0.2d, v1.2s, v2.2s
				# CHECK-NEXT: [0,1] D==eeeeeER sqdmlal v0.2d, v1.2s, v2.2s

				# CHECK: [30] Code Region
				# CHECK: Instructions: 2
				# CHECK-NEXT: Total Cycles: 10
				# CHECK: Timeline view:
				# CHECK: [0,0] DeeeeeER . sqdmlsl v0.2d, v1.2s, v2.2s
				# CHECK-NEXT: [0,1] D==eeeeeER sqdmlsl v0.2d, v1.2s, v2.2s

				# ASIMD FP Instructions
				# FMUL, FMULX, FMLA, FMLS are impacted
				# testing only a subset of combinations
				# LLVM-MCA-BEGIN
				fmul v0.2s, v1.2s, v2.2s
				fmla v0.2s, v1.2s, v2.2s
				# LLVM-MCA-END

				# LLVM-MCA-BEGIN
				fmul v0.4s, v1.4s, v2.4s
				fmla v0.4s, v1.4s, v2.4s
				# LLVM-MCA-END

				# LLVM-MCA-BEGIN
				fmulx v0.2s, v1.2s, v2.2s
				fmls v0.2s, v1.2s, v2.2s
				# LLVM-MCA-END

				# LLVM-MCA-BEGIN
				fmulx v0.4s, v1.4s, v2.4s
				fmls v0.4s, v1.4s, v2.4s
				# LLVM-MCA-END

				# LLVM-MCA-BEGIN
				fmla v0.2s, v1.2s, v2.2s
				fmla v0.2s, v3.2s, v4.2s
				# LLVM-MCA-END

				# LLVM-MCA-BEGIN
				fmls v0.2s, v1.2s, v2.2s
				fmls v0.2s, v3.2s, v4.2s
				# LLVM-MCA-END


				# FP Multiply Instructions
				# FMUL, FNMUL, FMADD, FMSUB, FNMADD, FNMSUB are impacted
				# testing only a subset of combinations
				# LLVM-MCA-BEGIN
				fmul d4, d5, d6
				fmadd d1, d2, d3, d4
				# LLVM-MCA-END

				# LLVM-MCA-BEGIN
				fmul d4, d5, d6
				fmadd d1, d2, d3, d4
				# LLVM-MCA-END

				# LLVM-MCA-BEGIN
				fmadd d4, d5, d6, d7
				fmadd d1, d2, d3, d4
				# LLVM-MCA-END

				# LLVM-MCA-BEGIN
				fmsub d4, d5, d6, d7
				fmsub d1, d2, d3, d4
				# LLVM-MCA-END

				# LLVM-MCA-BEGIN
				fnmadd d4, d5, d6, d7
				fnmadd d1, d2, d3, d4
				# LLVM-MCA-END

				# LLVM-MCA-BEGIN
				fnmsub d4, d5, d6, d7
				fnmsub d1, d2, d3, d4
				# LLVM-MCA-END



				# ASIMD Integer Instructions X-Unit
				# SABA, UABA, SABAL, UABAL, SADALP, UADALP, SRSRA, USRA, URSRA are impacted
				# testing only a subset of combinations

				# LLVM-MCA-BEGIN
				saba v0.2s, v1.2s, v2.2s
				saba v0.2s, v3.2s, v4.2s
				# LLVM-MCA-END

				# LLVM-MCA-BEGIN
				sabal v0.2d, v1.2s, v2.2s
				sabal v0.2d, v3.2s, v4.2s
				# LLVM-MCA-END

				# LLVM-MCA-BEGIN
				uaba v0.2s, v1.2s, v2.2s
				uaba v0.2s, v3.2s, v4.2s
				# LLVM-MCA-END

				# LLVM-MCA-BEGIN
				uabal v0.2d, v1.2s, v2.2s
				uabal v0.2d, v3.2s, v4.2s
				# LLVM-MCA-END

				# LLVM-MCA-BEGIN
				sadalp v0.1d, v1.2s
				sadalp v0.1d, v2.2s
				# LLVM-MCA-END

				# LLVM-MCA-BEGIN
				uadalp v0.1d, v1.2s
				uadalp v0.1d, v2.2s
				# LLVM-MCA-END

				# LLVM-MCA-BEGIN
				srsra v0.8b, v1.8b, #3
				srsra v0.8b, v2.8b, #3
				# LLVM-MCA-END

				# LLVM-MCA-BEGIN
				ursra v0.8b, v1.8b, #3
				ursra v0.8b, v2.8b, #3
				# LLVM-MCA-END

				# LLVM-MCA-BEGIN
				usra v0.4s, v1.4s, #3
				usra v0.4s, v2.4s, #3
				# LLVM-MCA-END


				# ASIMD Multiply Instructions X-Unit
				# MLA, MLS, SMLAL, SMLSL, UMLAL, UMLSL, SQDMLAL, SQDMLSL are impacted
				# testing only a subset of combinations

				# MLAs
				# LLVM-MCA-BEGIN
				mla v0.2s, v1.2s, v2.2s
				mla v0.2s, v1.2s, v2.2s
				# LLVM-MCA-END

				# LLVM-MCA-BEGIN
				mla v0.4s, v1.4s, v2.4s
				mla v0.4s, v1.4s, v2.4s
				# LLVM-MCA-END

				# LLVM-MCA-BEGIN
				mls v0.2s, v1.2s, v2.2s
				mls v0.2s, v1.2s, v2.2s
				# LLVM-MCA-END

				# LLVM-MCA-BEGIN
				mls v0.4s, v1.4s, v2.4s
				mls v0.4s, v1.4s, v2.4s
				# LLVM-MCA-END

				# LLVM-MCA-BEGIN
				smlal v0.2d, v1.2s, v2.2s
				smlal v0.2d, v1.2s, v2.2s
				# LLVM-MCA-END

				# LLVM-MCA-BEGIN
				smlsl v0.2d, v1.2s, v2.2s
				smlsl v0.2d, v1.2s, v2.2s
				# LLVM-MCA-END

				# LLVM-MCA-BEGIN
				umlal v0.2d, v1.2s, v2.2s
				umlal v0.2d, v1.2s, v2.2s
				# LLVM-MCA-END

				# LLVM-MCA-BEGIN
				umlsl v0.2d, v1.2s, v2.2s
				umlsl v0.2d, v1.2s, v2.2s
				# LLVM-MCA-END

				# LLVM-MCA-BEGIN
				sqdmlal v0.2d, v1.2s, v2.2s
				sqdmlal v0.2d, v1.2s, v2.2s
				# LLVM-MCA-END

				# LLVM-MCA-BEGIN
				sqdmlsl v0.2d, v1.2s, v2.2s
				sqdmlsl v0.2d, v1.2s, v2.2s
				# LLVM-MCA-END
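
For anyone checking the expected timelines by hand, the following is a minimal sketch (Python, not part of the patch) of how the `Total Cycles` values in the CHECK lines relate to the timeline strings. It assumes the usual llvm-mca timeline legend (`D` dispatched, `=` waiting in the scheduler, `e` executing, `E` executed, `R` retired), and the helper names `timeline_entry` and `region_total_cycles` are hypothetical.

```python
def timeline_entry(entry: str) -> dict:
    """Summarise one llvm-mca timeline entry such as '.D=eeeeeeER'.

    Assumption: each character is one cycle, starting at cycle 0.
    """
    entry = entry.replace(" ", "")  # padding spaces are not cycles
    return {
        "dispatch": entry.index("D"),    # cycle at which the instruction is dispatched
        "stalls": entry.count("="),      # cycles spent waiting for operands
        "exec": entry.count("e"),        # cycles shown as executing
        "retire": entry.index("R"),      # cycle at which the instruction retires
    }


def region_total_cycles(entries: list[str]) -> int:
    """Total cycles for a region: the last retire cycle, counted from zero, plus one."""
    return max(timeline_entry(e)["retire"] for e in entries) + 1


if __name__ == "__main__":
    # Timeline strings copied from the CHECK lines for regions [21], [22] and [29].
    print(region_total_cycles(["DeeeeeER.", "D=eeeeeER"]))       # mla v0.2s
    print(region_total_cycles(["DeeeeeeER .", ".D=eeeeeeER"]))   # mla v0.4s
    print(region_total_cycles(["DeeeeeER .", "D==eeeeeER"]))     # sqdmlal v0.2d
```

Under these assumptions, the dependent second instruction in each region waits only one or two `=` cycles rather than the producer's full execution time, which is the accumulator-forwarding effect the test is meant to pin down.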

This is an archive of the discontinued LLVM Phabricator instance.

[AARCH64] Improve accumulator forwarding for Cortex-A57 model
ClosedPublic

Revision Contents

Diff 313793

llvm/lib/Target/AArch64/AArch64SchedA57.td

llvm/lib/Target/AArch64/AArch64SchedA57WriteRes.td

llvm/test/tools/llvm-mca/AArch64/Cortex/forwarding-A57.s
