I’ve attached test results for the LLVM test suite (the MultiSource and MicroBenchmarks suites), which show the difference between the complete and incomplete Cortex-A55 scheduler models.
I’d also mention that we saw small (~1%) improvements in the GeekBench SGEMM and AES-XTS tests.
OK, thanks. I presume this was run on a Cortex-A55? And the noise is low enough to make the results meaningful?
We wrote a few different updates to the Cortex-A55 schedule prior to making it the default under cpu=generic. We had already written patches a lot like this one (not exactly this, more like the NEON part of this patch; this patch is trying to do too much at once and needs to be split up). The problem is that the A55 is notoriously difficult to schedule for, and a lot of the patches we tried ended up making performance worse, not better. We run a set of benchmarks on an RTL simulator to get deterministic results. They are perhaps not the best benchmarks, but the measurements are very accurate, and this patch shows the same pattern, where things don't look better.
(We also had a few other reasons for keeping the higher latencies: the A510 sometimes has higher latencies but higher throughput, and since this schedule is used for cpu=generic, keeping them allows it to produce better code in more cases. Plus, it affects many tests now that it is the default. I was at least hoping to give it some time before we changed everything again.)
I think there is value in having more accurate scheduling, even if the performance results we have are not perfect. I would suggest trying to split this patch up a bit though, to make sure we can check that each part is correct. At least the LDP and NEON parts are logically separate.
Why has this file been rewritten?
We ran all the tests on an Odroid-C4 board (4 A55 cores) with the frequency locked on each core. The LLVM test suite was compiled with the complete and the incomplete model, and each build was run 10 times. We also repeated the whole experiment a few times to make sure the results are meaningful.
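For anyone who wants to sanity-check the noise question on their own runs, here is a rough sketch of one way to compare repeated timings from two builds. It is not the script we used; the function names (`relative_noise`, `compare_runs`, `is_meaningful`) and the 2x noise threshold are assumptions made for illustration only.

```python
from statistics import mean, median, stdev


def relative_noise(samples):
    """Coefficient of variation: sample standard deviation divided by the mean."""
    return stdev(samples) / mean(samples)


def compare_runs(baseline, candidate):
    """Compare repeated timings of one benchmark from two builds.

    Returns (change, noise): the relative change of the median run time
    (negative means the candidate is faster) and the larger of the two
    coefficients of variation.
    """
    base_med = median(baseline)
    cand_med = median(candidate)
    change = (cand_med - base_med) / base_med
    noise = max(relative_noise(baseline), relative_noise(candidate))
    return change, noise


def is_meaningful(change, noise, factor=2.0):
    """Hypothetical rule of thumb: trust a change only if it clearly exceeds the noise."""
    return abs(change) > factor * noise
```

To use it, pass the 10 run times per benchmark for the incomplete-model build as `baseline` and for the complete-model build as `candidate`. A change of around 1% is only worth reporting if the run-to-run variation is well below that.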