This mirrors what we already do for AArch64, and LNT scores improve by a geomean of 1.57%.
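For readers unfamiliar with how a geomean figure like the one above is typically derived, here is a minimal sketch. It assumes each benchmark reports a percentage change in execution time (negative meaning faster) and that the summary is the geometric mean of the per-benchmark time ratios. The function name and the sample numbers are hypothetical, and the thread does not state exactly how LNT aggregated these results.

```python
import math

def geomean_improvement(pct_changes):
    """Return the geometric-mean improvement (%) for a list of
    per-benchmark execution-time changes (%), where negative = faster.

    Assumption: this is an illustrative aggregation, not LNT's own code.
    """
    # Convert each percentage change into a new/old time ratio.
    ratios = [1.0 + p / 100.0 for p in pct_changes]
    # Geometric mean via the mean of logs, which avoids overflow on long lists.
    log_mean = sum(math.log(r) for r in ratios) / len(ratios)
    # A ratio below 1.0 is a speedup; report it as a positive improvement.
    return (1.0 - math.exp(log_mean)) * 100.0

# Hypothetical per-benchmark changes, not taken from the review.
sample = [-5.0, 2.0, -1.5, 0.0]
print(f"geomean improvement: {geomean_improvement(sample):.2f}%")
```

A geometric mean is used rather than an arithmetic mean so that a 2x slowdown and a 2x speedup cancel out, which is why a few large regressions can coexist with a positive overall figure.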
LGTM. AFAIK cortex-a57 and cortex-a72 are close enough for this to be beneficial. AArch64 re-uses the A57 model for A72 too.
Hey Florian,
It was brought to my attention that the scheduler still wasn't enabled because of the missing feature. I've now added this and the geomean improvement is 2.23%. I will shortly add a couple of tests too.
So yes, to enable the machine scheduler for a core on ARM, FeatureUseMISched is required. I think it is worth splitting this into two changes: using the Cortex-A57 model for Cortex-A72, and enabling the machine scheduler for Cortex-A72. The A57 model should be more accurate for A72 than no model.
As for switching to the MachineScheduler: the last time I looked at it (about a year ago), I remember seeing some relatively big regressions on some benchmarks, so we might want to be a bit more cautious there, as there might be potential to tweak the scheduling heuristics for ARM. Also, I think it would be good to have numbers for a large set of benchmarks (test-suite + commercial ones).
Ok, fair enough. My LNT numbers show that the MISched results are more variable:
Regressions (%):
| Benchmark | Model | Model + MISched |
| --- | --- | --- |
| Stanford/FloatMM | 40.1 | |
| McGill/queens | 29.6 | 29.4 |
| Stanford/Puzzle | 16.05 | 15.6 |
| FreeBench/mason/mason | 9.27 | |
| Olden/power/power | 7.61 | |
| VersaBench/ecbdes/ecbdes | 5.25 | |
| BenchmarkGame/fannkuch | 6.55 | |
| Fhourstones/fhourstones | 5.75 | |
Improvements (%):
| Benchmark | Model | Model + MISched |
| --- | --- | --- |
| Stanford/Perm | 21.7 | 22.53 |
| Olden/mst/mst | 22.05 | 22.21 |
| FreeBench/fourinarow/fourinarow | 22.83 | 21.21 |
| TSVC/CrossingThresholds-dbl/CrossingThresholds-dbl | 14.17 | 19.52 |
| TSVC/Searching-flt/Searching-flt | 16.26 | 19.23 |
| TSVC/Searching-dbl/Searching-dbl | 16.28 | 19.15 |
I will remove the use of the feature for now.
cheers,