The motivation of this patch is to improve scheduling for the test case
test/CodeGen/AArch64/misched-sdiv.ll with the MachineScheduler. A
similar test is part of test/CodeGen/ARM.
I think ideally we would schedule SDIV as early as possible, since any instruction
scheduled before SDIV will increase the critical path. By how much
depends on the number of in-order pipeline stages.
The following happens during the test case. After scheduling the sub instruction,
both sdiv and add are added to the available bottom queue. When picking
the best candidate from the bottom queue, CurrZone.getCurrCycle()
returns 0, which plus the RemLatency is lower than the critical path,
so the latency heuristic is not used. I think
using the current cycle when scheduling top-down makes sense, as
that is the point where dispatching it later will impact the computed critical
path length. But when scheduling bottom-up, wouldn't it make sense to
use the latency already scheduled (at least when the candidate is on
the critical path), as this more accurately represents the cost of
scheduling the instruction?
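
To make the question concrete, here is a minimal standalone sketch of the two checks. It is not the scheduler's actual code: the struct, the function names, and the numbers in the usage note are hypothetical, and the second check assumes that "latency already scheduled" means the max of the current cycle and the expected latency of the zone.

```cpp
#include <algorithm>

// Hypothetical, simplified stand-in for the bottom scheduling zone.
// The field names are assumptions made for this sketch only.
struct ZoneState {
  unsigned CurrCycle;       // stands in for what CurrZone.getCurrCycle() returns
  unsigned ExpectedLatency; // latency of the instructions already scheduled
};

// Check as described above: the latency heuristic is only used when
// the current cycle plus the remaining latency exceeds the critical path.
bool exceedsCriticalPathCurrCycle(const ZoneState &Zone, unsigned RemLatency,
                                  unsigned CriticalPath) {
  return Zone.CurrCycle + RemLatency > CriticalPath;
}

// Alternative raised for bottom-up scheduling: use the latency already
// scheduled in the zone instead of the bare cycle count.
bool exceedsCriticalPathScheduledLatency(const ZoneState &Zone,
                                         unsigned RemLatency,
                                         unsigned CriticalPath) {
  unsigned Scheduled = std::max(Zone.CurrCycle, Zone.ExpectedLatency);
  return Scheduled + RemLatency > CriticalPath;
}
```

With made-up values such as CurrCycle = 0, ExpectedLatency = 2, RemLatency = 19 and a critical path of 20, the first check does not enable the latency heuristic while the second one does, which is the difference in behaviour being asked about.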
There probably is a better way to handle this and I would appreciate
any input! PostRA scheduling does not take care of that case, as the
allocated registers prevent moving the SDIV instruction up, and it is also
disabled on cores like Cortex-A72.
I did some initial benchmark runs on AArch64 with this patch:
- AArch64 Cortex-A72 LLVM test-suite & spec2k: -0.22% on execution time
- AArch64 Cortex-A57 SPEC2017: +0.74% on score
Bit strange. getScheduledLatency returns the max of CurrCycle and ExpectedLatency, which I don't quite find set anywhere (other than to 0).