This is an archive of the discontinued LLVM Phabricator instance.

[MachineScheduler] Enable latency heuristic based on scheduled lat.
Needs ReviewPublic

Authored by fhahn on Sep 26 2017, 8:14 AM.

Details

Reviewers
MatzeB
Summary

The motivation of this patch is to improve scheduling for the test case
test/CodeGen/AArch64/misched-sdiv.ll with the MachineScheduler. A
similar test is part of test/Codegen/ARM.
I think ideally we would schedule SDIV as early as possibly as any instruction
scheduled before SDIV will increase the critical path. By how much
depends on the number of in-order pipeline stages.

The following happens during the test case. After scheduling the sub instruction,
both sdiv and add are added to the available bottom queue. When picking
the best candidate from the bottom queue, CurrZone.getCurrCycle()
returns 0, which plus the RemLatency is lower than the critical path,
so the latency heuristic is not used. I think
using the current cycle when scheduling top-down makes sense, as it is
that's the point where dispatching it later will impact the computed critical
path length. But when scheduling bottom-up, wouldn't it make sense to
use the latency already scheduled (at least when the candidate is on
the critical path), as this more accurately represents the cost of
scheduling the instruction?

There probably is a better way to handle this and I would appreciate
any input! PostRA scheduling does not take care of that case, as the
registers allocated prevent moving the SDIV instruction up and also is
disbaled on cores like Cortex-A72.

I did some initial benchmark runs on AArch64 with this patch:

  • AArch64 Cortex-A72 LLVM test-suite & spec2k: -0.22% on execution time
  • AArch64 Cortex-A57 SPEC2017: +0.74% on score

Diff Detail

Event Timeline

fhahn created this revision.Sep 26 2017, 8:14 AM
jonpa added a subscriber: jonpa.Nov 28 2017, 12:51 AM

This makes sense to me as well, but I wonder why Issued gets a different value depending on top/bot ?

javed.absar added inline comments.Nov 28 2017, 2:22 AM
lib/CodeGen/MachineScheduler.cpp
2445

Bit strange. getScheduledLatency returns max of CurrCycle and ExpectedLatency which I dont quite find set anywhere (other than 0).

This makes sense to me as well, but I wonder why Issued gets a different value depending on top/bot ?

My understanding was current cycle is quite accurate from top-down, but not for bottom up. For example, after scheduling a node with latency 3 from bottom, we would only bump CurrCycle to 1. But if this node is on the critical path, scheduling the predecessor in the next cycle increases the critical path by 2 cycles (if we there are enough remaining instructions to be scheduled), because it was scheduled too early.

When going top-down, only after CurrCycle + RemLatency > CriticalPath scheduling a node later would increase the critical path.

Another way to think about it: When going top down, scheduling a node on the critical path “too early” does not have any impact on that critical path (it might cause another path to become critical though). But scheduling a node on the critical path “too early” when going bottom up is worse, as each instruction scheduled after the node increases the critical path ( assuming we can issue 1 instruction per cycle).

lib/CodeGen/MachineScheduler.cpp
2445

It's updated in bumpNode, through the reference created at line 2230 I think.

fhahn added a comment.Dec 5 2017, 6:53 AM

This makes sense to me as well, but I wonder why Issued gets a different value depending on top/bot ?

Did my response make sense as well? It's probably not extremely well worded :/

jonpa added a comment.Dec 6 2017, 7:19 AM

This makes sense to me as well, but I wonder why Issued gets a different value depending on top/bot ?

Did my response make sense as well? It's probably not extremely well worded :/

Well.. :-) I must admit I am somewhat confused still... Is this difference the same regardless of in-order / out-of-order and issue width?

I tried this patch on SystemZ, and it seems it may bring just a little improvement if anything (SystemZ is doing bottom-up-only pre-ra).

evandro added a subscriber: evandro.Apr 4 2018, 2:38 PM