Details

Reviewers

andreadb
dmgreen
evgeny777

Commits

rGf08a2fc09e75: [MCA] Add tests for IPC on Cortex-A55

Summary

The tests compare IPC statistics that MCA provides with IPC values
measured on Cortex-A55 hardware. For hardware tests, each snippet is
run in a loop unrolled by 1000, and IPC is measured by linux-perf.

Several tests do not match the hardware: the skewed ALU is not
supported, and LDR seems to be missing a forwarding path.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

asavonic created this revision. Mar 8 2021, 5:48 AM

Herald added a subscriber: gbedwell. Mar 8 2021, 5:48 AM

asavonic requested review of this revision. Mar 8 2021, 5:48 AM

Herald added a project: Restricted Project. Mar 8 2021, 5:48 AM

Herald added a subscriber: llvm-commits.

andreadb added inline comments. Mar 8 2021, 6:50 AM

llvm/test/tools/llvm-mca/AArch64/Cortex/IPC/A55-0.s
1 ↗	(On Diff #328983)	Do you actually need --dispatch-stats? If the goal of these tests is to simply check the IPC, then you should be able to simply pass flags `--all-views=false -summary-view`. I also suggest to pass all these tests through the update_mca python script. If you only enable the summary-view, then the number of checks in the output will be very small.
llvm/test/tools/llvm-mca/AArch64/Cortex/IPC/A55-4.s
6–7 ↗	(On Diff #328983)	Not sure if it might help in this case, but in general I recommend to have a look at whether some operand constraints might be defined using MCSchedPredicate defs in SchedWriteVariant.

Harbormaster completed remote builds in B92643: Diff 328983. Mar 8 2021, 7:09 AM

Any chance we can use more descriptive testcases names? :-)

llvm/test/tools/llvm-mca/AArch64/Cortex/IPC/A55-4.s
6–7 ↗	(On Diff #328983)	Is this something that is fixable in llvm? I thought it would not, in general, know the input value for an instruction.

RKSimon added a subscriber: RKSimon. Mar 9 2021, 1:15 AM

RKSimon added inline comments.

llvm/test/tools/llvm-mca/AArch64/Cortex/IPC/A55-4.s
6–7 ↗	(On Diff #328983)	You might be able to do something with valuetracking - if you know the upper bits are zero/signsplat etc. But I don't think we have any knownbits/signbits support this late on in the compile.

andreadb added inline comments. Mar 9 2021, 1:44 AM

llvm/test/tools/llvm-mca/AArch64/Cortex/IPC/A55-4.s
6–7 ↗	(On Diff #328983)	Yeah, I don’t think that there is anything that we can do about that specifically.

Renamed test files
Replaced "--dispatch-stats" with " --all-views=false --summary-view"

In D98174#2611655, @dmgreen wrote:

Any chance we can use more descriptive testcases names? :-)

Done :)

llvm/test/tools/llvm-mca/AArch64/Cortex/IPC/A55-0.s
1 ↗	(On Diff #328983)	@andreadb wrote:

Do you actually need --dispatch-stats? If the goal of these tests is to simply check the IPC, then you should be able to simply pass flags `--all-views=false -summary-view`.

Thanks. I replaced --dispatch-stats with --summary-view.

@andreadb wrote:

I also suggest to pass all these tests through the update_mca python script. If you only enable the summary-view, then the number of checks in the output will be very small.

I cannot figure out how to combine the reference checks with the checks from update_mca. They match the same lines (IPC: X.XX), so we cannot use different check prefixes:

	# RUN: llvm-mca $(args) < %s | FileCheck %s --check-prefixes=CHECK,CHECK-IPC
	# CHECK-IPC: IPC:
	# CHECK-IPC-SAME: 1.00
	add w8, w8, #1
	# CHECK: Iterations: 1000 <-- this check does not match, scan continues from IPC: 1.00
	# CHECK-NEXT: Instructions: 1000

I tried to add two FileCheck calls, but update_mca cannot handle this:

	# RUN: llvm-mca $(args) < %s > %t.log
	# RUN: FileCheck %s < %t.log
	# RUN: FileCheck %s --check-prefix CHECK-IPC < %t.log

	update_mca_test_checks.py:93: Warning: could not split tool and filecheck commands
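The ordering problem described in this comment can be illustrated with a toy model. The sketch below is not FileCheck itself: it assumes only that each check must match somewhere after the previous match, and that whitespace runs are collapsed before comparison. The function name `first_unmatched` and the sample output are hypothetical and exist only for illustration.

```python
from typing import List, Optional


def normalize(line: str) -> str:
    """Collapse runs of whitespace so 'IPC:      1.00' compares as 'IPC: 1.00'."""
    return " ".join(line.split())


def first_unmatched(output: List[str], patterns: List[str]) -> Optional[str]:
    """Toy sequential matcher (an assumption, not FileCheck's real algorithm).

    Each pattern is searched for starting on the line after the previous
    match. Returns the first pattern that cannot be found, or None if all
    patterns match in order.
    """
    pos = 0
    for pattern in patterns:
        for i in range(pos, len(output)):
            if pattern in normalize(output[i]):
                pos = i + 1
                break
        else:
            return pattern
    return None


# Hypothetical summary-view output, shaped like the CHECK lines in this review.
summary = [
    "Iterations:        1000",
    "Instructions:      1000",
    "Total Cycles:      1003",
    "Total uOps:        1000",
    "",
    "Dispatch Width:    2",
    "uOps Per Cycle:    1.00",
    "IPC:               1.00",
    "Block RThroughput: 0.5",
]

# Reference IPC check placed first: the later 'Iterations' check has nowhere to match.
print(first_unmatched(summary, ["IPC:", "Iterations: 1000"]))
# Checks in output order: everything matches.
print(first_unmatched(summary, ["Iterations: 1000", "IPC:"]))
```

Under these assumptions, a reference check written at the top of the test file consumes the output up to the IPC line, so the checks for earlier lines can no longer match.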

One last question.
What is the plan with the SDIV test cases? I don't think that there is anything that we can do to improve that simulation, since it would require knowledge that isn't available at simulation time. The risk is to end up with a test which isn't very useful in practice (it will always be marked as XFAIL). In which case, I suggest to remove those DIV tests entirely.

This revision is now accepted and ready to land. Mar 9 2021, 6:18 AM

Oops, I have accidentally "accepted" this patch.
Basically it LGTM if you remove the SDIV tests for now as I don't think that there is value in having them (unless you prove me wrong).

+1. LGTM, but it may be odd to have a test we don't think will ever be fixed in-tree.

In D98174#2613867, @andreadb wrote:

One last question.
What is the plan with the SDIV test cases? I don't think that there is anything that we can do to improve that simulation, since it would require knowledge that isn't available at simulation time. The risk is to end up with a test which isn't very useful in practice (it will always be marked as XFAIL). In which case, I suggest to remove those DIV tests entirely.

They are not very useful as "tests", I agree, but they can be useful as documentation,
highlighting the cases where MCA and hardware do not match. Although there is no point
in having two tests for the same issue, so we can remove one of them.

gbedwell added inline comments. Mar 9 2021, 7:08 AM

llvm/test/tools/llvm-mca/AArch64/Cortex/IPC/A55-0.s
1 ↗	(On Diff #328983)	For the non-xfailing tests the auto-generated checks should serve the purpose, as they will explicitly contain the line:

	# CHECK-NEXT: IPC: 1.00

For the XFAIL tests, I think you'd have to do the trick of splitting the llvm-mca and FileCheck lines as above, in order to prevent the script from overwriting the checks. The advantage of the update script is that it automates the process for anyone running the script across (for example) the entire AArch64 directory tree after making some change, but probably no big deal either way.

In D98174#2613940, @asavonic wrote:

In D98174#2613867, @andreadb wrote:

One last question.
What is the plan with the SDIV test cases? I don't think that there is anything that we can do to improve that simulation, since it would require knowledge that isn't available at simulation time. The risk is to end up with a test which isn't very useful in practice (it will always be marked as XFAIL). In which case, I suggest to remove those DIV tests entirely.

They are not very useful as "tests", I agree, but they can be useful as documentation,
highlighting the cases where MCA and hardware do not match. Although there is no point
in having two tests for the same issue, so we can remove one of them.

True.
However, to be fair, the "issue" (so to say) is in the write definition from the Cortex-A55 scheduling model.
So, the scheduling model file is probably a better place to document the issue with the DIV latency.
That being said, I don't really have a strong opinion on this, so it is fine by me if you want to still keep one of those tests.

Harbormaster completed remote builds in B92843: Diff 329284. Mar 9 2021, 11:25 AM

Adjusted SDIV operands to match the average latency specified in the model.
Added FP tests.
Added a test for instructions with OOO write and retire.

For the non XFAIL tests, I suggest to use the usual python script to auto generate CHECK directives.

Most of those tests are run with flags --all-views=false --summary-view, so the mca output is already minimal.
Consequently, the number of CHECK lines generated would be very small.

Harbormaster completed remote builds in B95877: Diff 333556. Mar 26 2021, 9:02 AM

Enabled auto-generated checks for all tests except the XFAIL'ed ones.

Harbormaster completed remote builds in B97554: Diff 335873. Apr 7 2021, 12:00 PM

LGTM

Closed by commit rGf08a2fc09e75: [MCA] Add tests for IPC on Cortex-A55 (authored by asavonic). Apr 8 2021, 9:40 AM

This revision was automatically updated to reflect the committed changes.

asavonic added a commit: rGf08a2fc09e75: [MCA] Add tests for IPC on Cortex-A55.

Diff 336151

llvm/test/tools/llvm-mca/AArch64/Cortex/IPC/A55-0-single-add.s

This file was added.

				# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
				# RUN: llvm-mca -mtriple=aarch64 -mcpu=cortex-a55 --all-views=false --summary-view --iterations=1000 < %s | FileCheck %s

				add w8, w8, #1

				# CHECK: Iterations: 1000
				# CHECK-NEXT: Instructions: 1000
				# CHECK-NEXT: Total Cycles: 1003
				# CHECK-NEXT: Total uOps: 1000

				# CHECK: Dispatch Width: 2
				# CHECK-NEXT: uOps Per Cycle: 1.00
				# CHECK-NEXT: IPC: 1.00
				# CHECK-NEXT: Block RThroughput: 0.5
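The IPC values checked in this revision can be cross-checked against the instruction and cycle counts in the same summaries. The sketch below assumes that the summary view reports IPC as Instructions divided by Total Cycles, printed with two decimals; the helper name `summary_ratio` is hypothetical, and the numbers are copied from the CHECK lines of the autogenerated tests in this diff.

```python
def summary_ratio(count: int, total_cycles: int) -> str:
    """Format count / total_cycles with two decimals (assumed summary-view format)."""
    return f"{count / total_cycles:.2f}"


# (Instructions, Total Cycles) pairs taken from the CHECK lines in this revision.
cases = {
    "A55-0-single-add": (1000, 1003),
    "A55-1-add-seq": (2000, 1003),
    "A55-3-mul": (3000, 3003),
    "A55-4-sdiv": (4000, 8004),
    "A55-5-mul-sdiv": (5000, 8004),
    "A55-7-cmp": (4000, 3004),
    "A55-11-fma-mix": (4000, 2003),
}

for name, (instructions, cycles) in cases.items():
    print(f"{name}: IPC {summary_ratio(instructions, cycles)}")
```

Each test emits one uOp per instruction, so the same ratio also reproduces the "uOps Per Cycle" line in these summaries.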

llvm/test/tools/llvm-mca/AArch64/Cortex/IPC/A55-1-add-seq.s

This file was added.

				# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
				# RUN: llvm-mca -mtriple=aarch64 -mcpu=cortex-a55 --all-views=false --summary-view --iterations=1000 < %s | FileCheck %s

				add w8, w8, #1
				add w9, w9, #1

				# CHECK: Iterations: 1000
				# CHECK-NEXT: Instructions: 2000
				# CHECK-NEXT: Total Cycles: 1003
				# CHECK-NEXT: Total uOps: 2000

				# CHECK: Dispatch Width: 2
				# CHECK-NEXT: uOps Per Cycle: 1.99
				# CHECK-NEXT: IPC: 1.99
				# CHECK-NEXT: Block RThroughput: 1.0

llvm/test/tools/llvm-mca/AArch64/Cortex/IPC/A55-10-fma.s

This file was added.

				# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
				# RUN: llvm-mca -mtriple=aarch64 -mcpu=cortex-a55 --all-views=false --summary-view --iterations=1000 < %s | FileCheck %s

				fmadd s3, s5, s6, s7
				fmadd s8, s9, s10, s11

				# CHECK: Iterations: 1000
				# CHECK-NEXT: Instructions: 2000
				# CHECK-NEXT: Total Cycles: 1004
				# CHECK-NEXT: Total uOps: 2000

				# CHECK: Dispatch Width: 2
				# CHECK-NEXT: uOps Per Cycle: 1.99
				# CHECK-NEXT: IPC: 1.99
				# CHECK-NEXT: Block RThroughput: 1.0

llvm/test/tools/llvm-mca/AArch64/Cortex/IPC/A55-11-fma-mix.s

This file was added.

				# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
				# RUN: llvm-mca -mtriple=aarch64 -mcpu=cortex-a55 --all-views=false --summary-view --iterations=1000 < %s | FileCheck %s

				# FMADD writes and retires out-of-order
				fmadd s3, s5, s6, s7
				# ADD instructions are issued and retire in-order
				add w8, w8, #1
				add w9, w9, #1
				add w10, w10, #1

				# CHECK: Iterations: 1000
				# CHECK-NEXT: Instructions: 4000
				# CHECK-NEXT: Total Cycles: 2003
				# CHECK-NEXT: Total uOps: 4000

				# CHECK: Dispatch Width: 2
				# CHECK-NEXT: uOps Per Cycle: 2.00
				# CHECK-NEXT: IPC: 2.00
				# CHECK-NEXT: Block RThroughput: 2.0

llvm/test/tools/llvm-mca/AArch64/Cortex/IPC/A55-2-skewed-alu.s

This file was added.

				# RUN: llvm-mca -mtriple=aarch64 -mcpu=cortex-a55 --all-views=false --summary-view --iterations=1000 < %s | FileCheck %s
				# CHECK: IPC:
				# CHECK-SAME: 2.00
				#
				# XFAIL: *
				#
				# Cortex-A55 has a secondary skewed ALU in the Ex1 stage for simple
				# ALU instructions that do not require shifting or saturation
				# resources. Results from the skewed ALU are available 1 cycle earlier.
				#
				# This feature allows the first and the second instruction to be
				# dual-issued despite a register dependency (w8).
				#
				# MCA and LLVM scheduling model do not support this yet.

				add w8, w8, #1
				add w10, w8, #1
				add w12, w8, #1

llvm/test/tools/llvm-mca/AArch64/Cortex/IPC/A55-3-mul.s

This file was added.

				# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
				# RUN: llvm-mca -mtriple=aarch64 -mcpu=cortex-a55 --all-views=false --summary-view --iterations=1000 < %s | FileCheck %s

				add w8, w8, #1
				add w12, w8, #1
				mul w10, w10, w10

				# CHECK: Iterations: 1000
				# CHECK-NEXT: Instructions: 3000
				# CHECK-NEXT: Total Cycles: 3003
				# CHECK-NEXT: Total uOps: 3000

				# CHECK: Dispatch Width: 2
				# CHECK-NEXT: uOps Per Cycle: 1.00
				# CHECK-NEXT: IPC: 1.00
				# CHECK-NEXT: Block RThroughput: 1.5

llvm/test/tools/llvm-mca/AArch64/Cortex/IPC/A55-4-sdiv.s

This file was added.

				# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
				# RUN: llvm-mca -mtriple=aarch64 -mcpu=cortex-a55 --all-views=false --summary-view --iterations=1000 < %s | FileCheck %s

				# DIV is not modeled precisely: on hardware it takes variable
				# number of cycles depending on its operands, but LLVM scheduling
				# model only provides an average latency.

				add w8, w8, #1
				movz w10, #1, lsl #16
				movz w12, #32768, lsl #16
				sdiv w10, w12, w10

				# CHECK: Iterations: 1000
				# CHECK-NEXT: Instructions: 4000
				# CHECK-NEXT: Total Cycles: 8004
				# CHECK-NEXT: Total uOps: 4000

				# CHECK: Dispatch Width: 2
				# CHECK-NEXT: uOps Per Cycle: 0.50
				# CHECK-NEXT: IPC: 0.50
				# CHECK-NEXT: Block RThroughput: 8.0

llvm/test/tools/llvm-mca/AArch64/Cortex/IPC/A55-5-mul-sdiv.s

This file was added.

				# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
				# RUN: llvm-mca -mtriple=aarch64 -mcpu=cortex-a55 --all-views=false --summary-view --iterations=1000 < %s | FileCheck %s

				# DIV is not modeled precisely: on hardware it takes variable
				# number of cycles depending on its operands. LLVM scheduling model
				# only provides an average latency.

				add w8, w8, #1
				movz w10, #1, lsl #16
				movz w12, #32768, lsl #16
				mul w11, w8, w8
				sdiv w10, w12, w10

				# CHECK: Iterations: 1000
				# CHECK-NEXT: Instructions: 5000
				# CHECK-NEXT: Total Cycles: 8004
				# CHECK-NEXT: Total uOps: 5000

				# CHECK: Dispatch Width: 2
				# CHECK-NEXT: uOps Per Cycle: 0.62
				# CHECK-NEXT: IPC: 0.62
				# CHECK-NEXT: Block RThroughput: 8.0

llvm/test/tools/llvm-mca/AArch64/Cortex/IPC/A55-6-mul.s

This file was added.

				# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
				# RUN: llvm-mca -mtriple=aarch64 -mcpu=cortex-a55 --all-views=false --summary-view --iterations=1000 < %s | FileCheck %s

				# It appears that ADD and MUL fuse together, if both can be issued in
				# one cycle:
				#
				# add w12, w8, #1
				# mul w10, w12, w10
				#
				# FIXME: MCA (and LLVM scheduling model) do not support this. The test
				# case uses different registers to break the pattern.

				add w8, w8, #1
				add w13, w8, #1
				mul w10, w12, w10

				# CHECK: Iterations: 1000
				# CHECK-NEXT: Instructions: 3000
				# CHECK-NEXT: Total Cycles: 3003
				# CHECK-NEXT: Total uOps: 3000

				# CHECK: Dispatch Width: 2
				# CHECK-NEXT: uOps Per Cycle: 1.00
				# CHECK-NEXT: IPC: 1.00
				# CHECK-NEXT: Block RThroughput: 1.5

llvm/test/tools/llvm-mca/AArch64/Cortex/IPC/A55-7-cmp.s

This file was added.

				# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
				# RUN: llvm-mca -mtriple=aarch64 -mcpu=cortex-a55 --all-views=false --summary-view --iterations=1000 < %s | FileCheck %s

				add w8, w8, #1
				add w12, w9, #1
				cmp w9, #42
				mul w10, w12, w10

				# CHECK: Iterations: 1000
				# CHECK-NEXT: Instructions: 4000
				# CHECK-NEXT: Total Cycles: 3004
				# CHECK-NEXT: Total uOps: 4000

				# CHECK: Dispatch Width: 2
				# CHECK-NEXT: uOps Per Cycle: 1.33
				# CHECK-NEXT: IPC: 1.33
				# CHECK-NEXT: Block RThroughput: 2.0

llvm/test/tools/llvm-mca/AArch64/Cortex/IPC/A55-8-ldr.s

This file was added.

				# RUN: llvm-mca -mtriple=aarch64 -mcpu=cortex-a55 --all-views=false --summary-view --iterations=1000 < %s | FileCheck %s
				# CHECK: IPC:
				# CHECK-SAME: 1.50
				#
				# XFAIL: *
				#
				# MCA reports IPC = 0.60, while hardware shows IPC = 1.50.
				#
				# 1) The skewed ALU on Cortex-A55 is not modeled: ADD and AND
				# instructions should be issued in the same cycle.
				# See the A55-2-skewed-alu.s test for more details.
				#
				# 2) Cortex-A55 manual mentions that there is a forwarding path from
				# the ALU pipeline to the LD/ST pipeline. This is not implemented in
				# the LLVM scheduling model.

				add w8, w8, #1
				and w12, w8, #0x3f
				ldr w14, [x10, w12, uxtw #2]

llvm/test/tools/llvm-mca/AArch64/Cortex/IPC/A55-9-fabs.s

This file was added.

				# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
				# RUN: llvm-mca -mtriple=aarch64 -mcpu=cortex-a55 --all-views=false --summary-view --iterations=1000 < %s | FileCheck %s

				fabs s0, s1
				fabs s2, s3

				# CHECK: Iterations: 1000
				# CHECK-NEXT: Instructions: 2000
				# CHECK-NEXT: Total Cycles: 1004
				# CHECK-NEXT: Total uOps: 2000

				# CHECK: Dispatch Width: 2
				# CHECK-NEXT: uOps Per Cycle: 1.99
				# CHECK-NEXT: IPC: 1.99
				# CHECK-NEXT: Block RThroughput: 1.0

This is an archive of the discontinued LLVM Phabricator instance.

[MCA] Add tests for IPC on Cortex-A55
ClosedPublic

