This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner] Limit the number of chains in findBetterChains().
AbandonedPublic

Authored by courbet on Oct 15 2018, 7:33 AM.

Details

Reviewers
niravd
Summary

Enabling -combiner-global-alias-analysis does find better chains and
therefore can result in prohibitively large compile times for very large
blocks.

This patch protects against large blocks without hurting the expected behaviour
for reasonable sizes. The typical use case for merging is when the stores to be
merged are reasonably local (e.g. when merging 4 stores, these 4 stores can be
found within a window of not much more than 4 unrelated store instructions).

Diff Detail

Event Timeline

courbet created this revision.Oct 15 2018, 7:33 AM
courbet updated this revision to Diff 169705.Oct 15 2018, 7:39 AM

revert formatting changes

Oh good catch. This is O(N^3) for long chains because we try every prefix of the chain, and because the chain is so long our chain improvements gives up and noops, so we just redo the same computation over and over. Cutting the length to any reasonable size makes it O(N^2). I suspect your observed merge size requirement is to deal with our incomplete handling of chain SubDAG in store merge (just a children of a single TF potentially skipping some loads), because it shouldn't matter otherwise (See (1)).

Although, maybe what we should actually do is just run up the chain and mark make sure each store is to a disjoint part of memory (which should be just a range check as they all had related pointers) and then just directly rewrite them into as fully parallel structure chained to a TokenFactor. That should be O(N) with a much smaller constant than this.

(1) I wrote a cleanup patch D31068 some time ago that never got around to chasing down the reviewers to care. It lets us merge stores from a chain of stores where we gave up on better aliasing. That should remove any burrs from giving up merges too soon. I've added you as a reviewer.

include/llvm/CodeGen/TargetLowering.h
2598

I wrote some a cleanup patch some time ago that never got around to chasing down the reviewers to care. It incrementally allows us to deal with chains of stores in which should remove any burrs from giving up merges too soon, so we shouldn't need to have a target-dependenct depth check.

This comment was removed by niravd.

Oh good catch. This is O(N^3) for long chains because we try every prefix of the chain, and because the chain is so long our chain improvements gives up and noops, so we just redo the same computation over and over. Cutting the length to any reasonable size makes it O(N^2). I suspect your observed merge size requirement is to deal with our incomplete handling of chain SubDAG in store merge (just a children of a single TF potentially skipping some loads), because it shouldn't matter otherwise (See (1)).

I'm not sure I understand what (1) would change to the situation ?

Although, maybe what we should actually do is just run up the chain and mark make sure each store is to a disjoint part of memory (which should be just a range check as they all had related pointers) and then just directly rewrite them into as fully parallel structure chained to a TokenFactor. That should be O(N) with a much smaller constant than this.

Note that this patch fixes a pathological case which is quite theoretical; the limit is never hit on any of the test-suite basic blocks.
Actually with this change it takes a basic block of more than ten thousand stores to reach the current compile time with hundreds of blocks, even with -combiner-global-alias-analysis.

(1) I wrote a cleanup patch D31068 some time ago that never got around to chasing down the reviewers to care. It lets us merge stores from a chain of stores where we gave up on better aliasing. That should remove any burrs from giving up merges too soon. I've added you as a reviewer.

For reference: compile times for insane block sizes with and without this patch. Y is seconds, X is #stores/3.

niravd resigned from this revision.Nov 15 2018, 7:09 AM

We've superceded this patch.

courbet abandoned this revision.Nov 18 2018, 11:42 PM

Dropping this.