This is an archive of the discontinued LLVM Phabricator instance.

I’ve attached test results for the LLVM test suite (MultiSource and MicroBenchmarks suites) which show the difference between the complete and incomplete Cortex-A55 scheduler models.
I’d also mention that we’ve got small (~1%) improvements in the GeekBench SGEMM and AES-XTS tests.

Harbormaster completed remote builds in B136237: Diff 390056. Nov 26 2021, 7:32 AM

In D112201#3155985, @kpdev42 wrote:

I’ve attached test results for the LLVM test suite (MultiSource and MicroBenchmarks suites) which show the difference between the complete and incomplete Cortex-A55 scheduler models.
I’d also mention that we’ve got small (~1%) improvements in the GeekBench SGEMM and AES-XTS tests.

OK Thanks. I presume this is run on a Cortex-A55? And the noise is low enough to make them meaningful?

We wrote a few different updates to the CortexA55 schedule prior to making it the default under cpu=generic. We had already written patches a lot like this one (not exactly this, but close to the NEON part of it; this patch is trying to do too much at once and needs to be split up). The problem is that the A55 is notoriously difficult to schedule for, and a lot of the patches we tried ended up making performance worse, not better. We run a set of benchmarks on an RTL simulator to get deterministic results. They are perhaps not the best benchmarks, but they are very accurate, and this patch shows the same pattern, where things don't look better.

(We also had a few other reasons for keeping the higher latencies, like the A510 sometimes having higher latencies but higher throughputs, and this schedule being used for cpu=generic allows it to produce better code in more cases. Plus it affects many tests now that it is the default. I was at least hoping to give it some time before we changed everything again.)

I think there is value in having more accurate scheduling, even if the performance results we have are not perfect. I would suggest trying to split this patch up a bit though, to make sure we can check that the parts are correct. At least the LDP and NEON parts are logically separate.

llvm/test/tools/llvm-mca/AArch64/Cortex/A55-neon-instructions.s
6	Why has this file been rewritten?

kpdev42 edited the summary of this revision. Dec 28 2021, 10:31 PM

kpdev42 added a parent revision: D116361: [SchedModels][CortexA55] Fix scheduling of FP loads.

In D112201#3199365, @dmgreen wrote:

OK Thanks. I presume this is run on a Cortex-A55? And the noise is low enough to make them meaningful?

We ran all the tests on an Odroid-C4 board (4 A55 cores) with locked frequencies for each core. The LLVM test suite was compiled with the complete and the incomplete model and run 10 times. We also repeated this test a few times to ensure the result is meaningful.
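
As a rough illustration of the noise question raised above, repeated timings from two builds could be compared along these lines. This is only a sketch, not the procedure actually used for the attached results: the function names, the 1% floor, and the use of the coefficient of variation as a noise estimate are all assumptions made for the example.

```python
from statistics import mean, median, stdev


def summarize(samples):
    """Return (median, coefficient of variation) for a list of run times."""
    return median(samples), stdev(samples) / mean(samples)


def compare(baseline, candidate, noise_floor=0.01):
    """Compare two sets of repeated timings for the same benchmark.

    `baseline` and `candidate` are hypothetical lists of run times, e.g.
    10 runs with the incomplete model and 10 runs with the complete one.
    `noise_floor` is an assumed minimum relative change worth reporting.
    """
    base_med, base_cv = summarize(baseline)
    cand_med, cand_cv = summarize(candidate)
    # Relative change of the median; negative means the candidate is faster.
    change = (cand_med - base_med) / base_med
    # Use the noisier of the two configurations as the noise estimate.
    noise = max(base_cv, cand_cv)
    return {
        "change": change,
        "noise": noise,
        "meaningful": abs(change) > max(noise, noise_floor),
    }
```

A change is flagged as meaningful here only when it exceeds both the measured run-to-run variation and the assumed floor, which is a deliberately conservative choice; a proper significance test would be a reasonable alternative.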

kpdev42 mentioned this in D117003: [SchedModels][CortexA55] Add ASIMD integer instructions. Jan 11 2022, 3:04 AM

kpdev42 mentioned this in rG37fa99eda0f5: [SchedModels][CortexA55] Add ASIMD integer instructions. Feb 17 2022, 2:43 AM

An update for the Cortex-A55 model. It contains ASIMD FP, misc, crypto and CRC instructions.

Herald added a reviewer: sjarus. Feb 21 2022, 9:31 PM

Herald added subscribers: armkevincheng, eric-k256.

Harbormaster completed remote builds in B150805: Diff 410445. Feb 21 2022, 9:31 PM

ping

Herald added a project: Restricted Project. Nov 9 2022, 1:44 AM