This is an archive of the discontinued LLVM Phabricator instance.

[ScheduleDAGRRList] Recompute topological ordering on demand.
ClosedPublic

Authored by fhahn on Apr 2 2019, 6:23 AM.

Details

Summary

Currently there is a single point in ScheduleDAGRRList, where we
actually query the topological order (besides init code). Currently we
are recomputing the order after adding a node (which does not have
predecessors) and then we add predecessors edge-by-edge.

We can avoid adding edges one-by-one after we added a new node. In that case, we can
just rebuild the order from scratch after adding the edges to the DAG
and avoid all the updates to the ordering.

Also, we can delay updating the DAG until we query the DAG, if we keep a
list of added edges. Depending on the number of updates, we can either
apply them when needed or recompute the order from scratch.

This brings down the geomean compile time for of CTMark with -O1 down 0.2% on X86,
with no regressions.

Diff Detail

Repository
rL LLVM

Event Timeline

fhahn created this revision.Apr 2 2019, 6:23 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 2 2019, 6:23 AM
efriedma added inline comments.Apr 2 2019, 12:03 PM
llvm/lib/CodeGen/SelectionDAG/ScheduleDAGRRList.cpp
1488 ↗(On Diff #193275)

Where exactly do we query the topological order relative to this call to fixOrder()? Do we need to fixOrder() inside the loop?

fhahn marked an inline comment as done.Apr 2 2019, 12:37 PM
fhahn added inline comments.
llvm/lib/CodeGen/SelectionDAG/ScheduleDAGRRList.cpp
1488 ↗(On Diff #193275)

We only query it in the loop below (WillCreateCycle call). In case we add new predecessors, we also exit the loop. And we only enter the loop, if there are some interferences. Maybe it's worth adding a comment here?

efriedma added inline comments.Apr 2 2019, 12:44 PM
llvm/lib/CodeGen/SelectionDAG/ScheduleDAGRRList.cpp
1488 ↗(On Diff #193275)

Probably worth adding a comment, yes.

By the way, how frequently do we hit this case in practice?

fhahn updated this revision to Diff 193451.Apr 3 2019, 2:19 AM

Add comment and make even lazier.

fhahn marked 2 inline comments as done.Apr 3 2019, 2:46 AM
fhahn added inline comments.
llvm/lib/CodeGen/SelectionDAG/ScheduleDAGRRList.cpp
1488 ↗(On Diff #193275)

By the way, how frequently do we hit this case in practice?

I haven't looked at this call here in isolation, but I gather statistics on the impact of the number of InitDAGTopologicalSorting and AddPred calls.

Number of InitDAGTopologicalSorting calls for CTMark, X86, -O1

patch          base

CTMark/kimwitu++/kc.test 25671.0 46740.0
CTMark/Bullet/bullet.test 16228.0 36873.0
CTMark/mafft/pairlocalalign.test 14290.0 28695.0
CTMark/lencod/lencod.test 19006.0 37880.0
CTMark/sqlite3/sqlite3.test 22447.0 49409.0
CTMark/ClamAV/clamscan.test 23087.0. 48369.0
CTMark/7zip/7zip-benchmark.test 41045.0 98737.0
CTMark/tramp3d-v4/tramp3d-v4.test 22057.0 49214.0
CTMark/SPASS/SPASS.test 19596.0 40652.0
CTMark/consumer-typeset/consumer-typeset.test 16567.0 33208.0

Number of AddPred calls for CTMark, X86, -O1

patch       base

CTMark/kimwitu++/kc.test 2645.0 2702.0
CTMark/Bullet/bullet.test 3619.0 3643.0
CTMark/mafft/pairlocalalign.test 3427.0 3455.0
CTMark/lencod/lencod.test 3902.0 4108.0
CTMark/sqlite3/sqlite3.test 3147.0 3179.0
CTMark/ClamAV/clamscan.test 3764.0 3821.0
CTMark/7zip/7zip-benchmark.test 7308.0 7393.0
CTMark/tramp3d-v4/tramp3d-v4.test 4311.0 4315.0
CTMark/SPASS/SPASS.test 4512.0 4577.0
CTMark/consumer-typeset/consumer-typeset.test 3695.0 3734.0

Also, the latest version of the patch improves compile-time on CTMark X86, -O1 a bit more: negative diff means a reduction in compile time.
Program diff
test-suite...ark/tramp3d-v4/tramp3d-v4.test -0.6%
test-suite :: CTMark/Bullet/bullet.test -0.4%
test-suite...:: CTMark/sqlite3/sqlite3.test -0.4%
test-suite :: CTMark/kimwitu++/kc.test -0.4%
test-suite...TMark/7zip/7zip-benchmark.test -0.3%
test-suite :: CTMark/lencod/lencod.test -0.3%
test-suite :: CTMark/SPASS/SPASS.test -0.3%
test-suite...-typeset/consumer-typeset.test -0.3%
test-suite...:: CTMark/ClamAV/clamscan.test -0.2%
test-suite...Mark/mafft/pairlocalalign.test 0.0%
Geomean difference -0.3%

efriedma added inline comments.Apr 3 2019, 1:13 PM
llvm/lib/CodeGen/ScheduleDAG.cpp
716 ↗(On Diff #193451)

How terrible would it be to just call fixOrder from here, instead of making the callers check? That makes it impossible for callers to mess up, and the check should be cheap.

I guess it also might be possible to add some cheap checks here to avoid calling fixOrder; for example, if TargetSU has no successors.

fhahn marked 2 inline comments as done.Apr 3 2019, 1:23 PM
fhahn added inline comments.
llvm/lib/CodeGen/ScheduleDAG.cpp
716 ↗(On Diff #193451)

I guess the impact would not be too big, in fact that was what I initially had. I'll measure it and get back. I think it would be OK to push responsibility to the caller in a way and the assertion should catch any errors (the message should probably point to fixOrder()).

fhahn updated this revision to Diff 193737.Apr 4 2019, 9:53 AM
fhahn marked an inline comment as done.

Fix order whenever it is queried.

fhahn added inline comments.Apr 4 2019, 9:58 AM
llvm/lib/CodeGen/ScheduleDAG.cpp
716 ↗(On Diff #193451)

I gathered numbers with having FixOrder() in IsReachable & WillCreateCycle. The numbers shifted a bit, but we still have roughly the same overall gain. It seems in some cases one approach is slightly better and for some cases the other one. I guess we should go for the safer one for now?

This revision is now accepted and ready to land.Apr 4 2019, 11:57 AM
This revision was automatically updated to reflect the committed changes.