This is an archive of the discontinued LLVM Phabricator instance.

[ScheduleDAG] Avoid unnecessary recomputation of topological order.
ClosedPublic

Authored by fhahn on Mar 22 2019, 3:05 PM.

Details

Summary

In some cases ScheduleDAGRRList has to add new nodes to resolve problems
with interfering physical registers. When new nodes are added, it
completely re-computes the topological order, which can take a long
time, but is unnecessary. We only add nodes one by one, and initially
they do not have any predecessors. So we can just insert them at the end
of the vector. Later we add predecessors, but the helper function
properly updates the topological order much more efficiently. With this
change, the compile time for the program below drops from 300s to 30s on
my machine.

define i11129 @test1() {
  %L1 = load i11129, i11129* undef
  %B30 = ashr i11129 %L1, %L1
  store i11129 %B30, i11129* undef
  ret i11129 %L1
}

This should be generally beneficial, as we can skip a large amount of
work. Theoretically there are some scenarios where we might not safe
much, e.g. when we add a dependency between the first and last node.
Then we would have to shift all nodes. But we still do not have to spend
the time re-computing the initial order.

Diff Detail

Event Timeline

fhahn created this revision.Mar 22 2019, 3:05 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 22 2019, 3:05 PM
paquette accepted this revision.Mar 22 2019, 4:22 PM

LGTM

Out of curiosity, do you have any numbers for how this impacts compile time on, say, CTMark? What percentage of an improvement can we expect from this?

This revision is now accepted and ready to land.Mar 22 2019, 4:22 PM
fhahn added a comment.Mar 25 2019, 4:11 PM

LGTM

Out of curiosity, do you have any numbers for how this impacts compile time on, say, CTMark? What percentage of an improvement can we expect from this?

I ran CTmark in a few configurations on X86 (-O3 X86, -O0 X86, -O3 + LTO ARM64) and overall it seems neutral: some gains and losses up to 0.5%, but that's also the run-to-run difference on some runs on my machine.

fhahn planned changes to this revision.Apr 2 2019, 6:26 AM

After a bit more benchmarking, I think this patch makes things slightly worse in the general case. I've put up a patch that updates ScheduleDAGRRList to update the topological order on demand, D60125, which gives small, but stable improvements on CTMark. I have to revisit this patch and see how we can deal with extreme cases, without making things worse in the general case.

fhahn added a comment.May 31 2020, 3:02 AM

After a bit more benchmarking, I think this patch makes things slightly worse in the general case. I've put up a patch that updates ScheduleDAGRRList to update the topological order on demand, D60125, which gives small, but stable improvements on CTMark. I have to revisit this patch and see how we can deal with extreme cases, without making things worse in the general case.

Just benchmark again, it looks like currently there are a few clear improvements and no real regressions: http://llvm-compile-time-tracker.com/compare.php?from=7873376bb36b4f9646fbc26d6da88e2edbf796e4&to=d44cb1460dd6ccad74ea96ce295d804f9e291bf3&stat=instructions

I'll land the patch shortly.

This revision was not accepted when it landed; it landed in state Changes Planned.May 31 2020, 3:43 AM
This revision was automatically updated to reflect the committed changes.