This is an archive of the discontinued LLVM Phabricator instance.

[CortexA55][SchedModels] Complete Cortex-A55 scheduler model
Needs Review · Public

Authored by kpdev42 on Oct 20 2021, 11:51 PM.

Details

Summary

Depends on D114642, D116361

Event Timeline

kpdev42 created this revision. Oct 20 2021, 11:51 PM
kpdev42 requested review of this revision. Oct 20 2021, 11:51 PM
asl added a subscriber: asl. Oct 27 2021, 1:37 AM

OK. Do you have any performance results that suggest this is an improvement?

kpdev42 updated this revision to Diff 390056. Nov 26 2021, 7:31 AM
kpdev42 edited the summary of this revision.

I’ve attached test results for the LLVM test suite (MultiSource and MicroBenchmarks suites), which show the difference between the complete and incomplete Cortex-A55 scheduler models.
I’d also mention that we’ve seen small (~1%) improvements in the GeekBench SGEMM and AES-XTS tests.

OK Thanks. I presume this is run on a Cortex-A55? And the noise is low enough to make them meaningful?

We wrote a few different updates to the Cortex-A55 schedule prior to making it the default under cpu=generic. We had already written patches a lot like this one (not exactly this, but like the NEON part of it; this patch is trying to do too much at once and needs to be split up). The problem is that the A55 is notoriously difficult to schedule for, and a lot of the patches we tried ended up making performance worse, not better. We run a set of benchmarks on an RTL simulator to get deterministic results. They are perhaps not the best benchmarks, but the results are very accurate, and this patch shows the same outcome: things don't look better.

(We also had a few other reasons for keeping the higher latencies: the A510 sometimes has higher latencies but higher throughputs, and since this schedule is used for cpu=generic, keeping them allows it to produce better code in more cases. It also affects many tests now that it is the default. I was at least hoping to give it some time before we changed everything again.)

I think there is value in having more accurate scheduling, even if the performance results we have are not perfect. I would suggest trying to split this patch up a bit though, so that we can check that the parts are correct. At least the LDP and NEON parts are logically separate.

llvm/test/tools/llvm-mca/AArch64/Cortex/A55-neon-instructions.s:1064

Why has this file been rewritten?

> OK Thanks. I presume this is run on a Cortex-A55? And the noise is low enough to make them meaningful?

We've run all the tests on an Odroid-C4 board (4 A55 cores) with the frequency locked for each core. The LLVM test suite was compiled with both the complete and the incomplete model and run 10 times. We also repeated this experiment a few times to ensure the results are meaningful.
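
For illustration only, here is a minimal Python sketch of how repeated timings from two builds could be compared against run-to-run noise. It is not the script used for the attached results. The function name `compare_runs`, the timing values, and the "change must exceed twice the noise" threshold are all assumptions made up for this example.

```python
from statistics import mean, stdev


def compare_runs(baseline, candidate):
    """Summarize two sets of repeated timings for one benchmark."""
    base_mean, cand_mean = mean(baseline), mean(candidate)
    # Relative change in mean run time; negative means the candidate is faster.
    rel_change = (cand_mean - base_mean) / base_mean
    # Coefficient of variation of each sample, used as a crude noise estimate.
    noise = max(stdev(baseline) / base_mean, stdev(candidate) / cand_mean)
    return {
        "rel_change": rel_change,
        "noise": noise,
        # Assumed rule of thumb: the change must exceed twice the noise.
        "meaningful": abs(rel_change) > 2 * noise,
    }


# Hypothetical timings in seconds, 10 runs per build; not measured data.
incomplete_model = [10.02, 10.05, 9.98, 10.01, 10.03, 9.99, 10.00, 10.04, 10.02, 9.97]
complete_model = [9.90, 9.93, 9.88, 9.91, 9.92, 9.89, 9.90, 9.94, 9.91, 9.87]

result = compare_runs(incomplete_model, complete_model)
print(f"change: {result['rel_change']:+.2%}, "
      f"noise: {result['noise']:.2%}, "
      f"meaningful: {result['meaningful']}")
```

A proper significance test would be more rigorous than this threshold. The sketch only shows the idea that a ~1% difference is worth reporting when the spread across the 10 runs is clearly smaller than that.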

kpdev42 updated this revision to Diff 410445. Feb 21 2022, 9:31 PM

An update for the Cortex-A55 model. It covers the ASIMD FP, misc, crypto and CRC instructions.
