I think all other things being equal, it could be beneficial to prioritize
candidates that do not increase register pressure, as this may uncover
new candidates that reduce register pressure.
I have been experimenting with using the MachineScheduler on ARM with
Cortex-A57, and this change helps to reduce spilling in some cases.
For the LLVM test-suite this change helps to reduce spilling in some cases.For
the test-suite and SPEC2k, it improves the geomean of the execution times
compared to the old scheduler from -0.82% (MachinScheduler for ARM on
Cortex-A57) to -1.03% (same as before, but with this patch) and eliminates a
few big outliners. For SPEC2k6, it improves the geomean of execution
times from -0.24% to -0.29%.
For AArch64 I did some benchmark runs comparing master against this
patch.
Cortex-A57:
- -0.62% geomean exec time for test-suite and SPEC2k
- -0.29% for SPEC2k6
- +0.19% SPEC2017 score
Cortex-A72:
- -0.29% geomean exec time for test-suite and SPEC2k
The impact of this change on the unit tests is quite small. The
modified test case in this patch uses 1 register less now, with the
order of instructions slightly changed. The earliest fcvt gets still
scheduled after 2 instructions, so latency wise the schedule should not
have regressed.
The following ADMGPU tests also fail:
- CodeGen/AMDGPU/attr-amdgpu-num-sgpr.ll (with -mtriple=amdgcn--amdhsa -mcpu=fiji -amdgpu-spill-sgpr-to-smem=1) Expected: NumSGPRsForWavesPerEU: 9 Actual: NumSGPRsForWavesPerEU: 14
- CodeGen/AMDGPU/schedule-regpressure-limit2.ll (with -march=amdgcn -mcpu=fiji -misched=gcn-max-occupancy-experimental Expected: NumSgprs: {{[1-5][0-9]$}} Actual: NumSGPRsForWavesPerEU: 78
I do not know anything about AMDGPU, so I cannot really say if that is
good or bad?
@jonpa this might be interesting to you too, as I gathered that you are
working on switching SystemZ to the MachineScheduler.
Why isn't this using tryGreate() and isn't this the same as just changing getUnitInc() < 0 to getUnitInc() <= 0 in the earlier check?