This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner] Avoid creating large tokenfactors in visitTokenFactor
ClosedPublic

Authored by fhahn on May 1 2019, 1:22 PM.

Details

Summary

When simplifying TokenFactors, we potentially iterate over all
operands of a large number of TokenFactors. This causes quadratic
compile times in some cases and the large token factors cause additional
scalability problems elsewhere.

This patch adds some limits to the number of nodes explored for the
cases mentioned above.

Diff Detail

Repository
rL LLVM

Event Timeline

fhahn created this revision.May 1 2019, 1:22 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 1 2019, 1:22 PM
fhahn updated this revision to Diff 197761.May 2 2019, 5:43 AM

Relax limits to 1000 nodes to explore. Further experimenting showed that those
bigger limits are still sufficient to ensure limiting quadratic compile time.

niravd added a comment.May 2 2019, 8:19 AM

Notes inline, but I think the majority of the compile time improvements come from keepign TokenFactor operand counts bounded. This should be changed to do reflect that.

That said, I think rewriting the "CanMergeStoresTo" to "GetMaximumMergedStoreSize" and using that as a first-order filter on nodes to consider like was discussed in D60133 should be done before any heuristic restrictions (including TokenFactor operand count limits) should be considered as it causes no code generation degradation, only compile time improvements.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
1767 ↗(On Diff #197761)

I this that this is too restrictive. This check effectively curtails the second and third passes (The first being the quick two operand check).

The third pass is a fairly expensive chain predecessor search to see if there are redundancies in the operands (if A and its chain predecessor B are both in the list, you can drop B) but already has a cutoff of 1024 steps.

The second pass is a linear pass to weed out trivial redundancies in the list. Due to the way that chain replacements happen, it's extremely common for TokenFactors to have redundant operands in them and I worry that curtialing this will have nontrivial degradation for relatively little compile time improvement.

If this is making a notable difference, I suspect we should change the second pass to not inline a TokenFactor if the number of operands in the new TokenFactor would be too large.

14852 ↗(On Diff #197761)

nit: fold this into the loop test and increment (MaxNumberOfNodesToCheck> 0 , --MaxNumberOfNodesToCheck)

nit: Since we merge mainly powers of two, the limit should one or slightly higher (1025?)

19999 ↗(On Diff #197761)

It seems like the underlying problem you're trying to avoid is having a TokenFactor with too many operands. That's definitely an issue. We do try to avoid it in the SelectionDAGBuilder, but not consisently elsewhere.

This is indirectly doing this by forcing us to only consider a smaller set of chains. Given that this catches mostly long chains of independant chains with mostly the same real dependences, it would probably be much better to do the calculation here as is, but instead of making larger tokenfactors, split them into a hierarchy of some sort. Maybe this should be folded into the TokenFactor case of getNode smart constructor.

fhahn added a comment.May 2 2019, 2:18 PM

Notes inline, but I think the majority of the compile time improvements come from keepign TokenFactor operand counts bounded. This should be changed to do reflect that.

Thanks for taking a look. Yep I think that might be the case. I'll verify that tomorrow.

That said, I think rewriting the "CanMergeStoresTo" to "GetMaximumMergedStoreSize" and using that as a first-order filter on nodes to consider like was discussed in D60133 should be done before any heuristic restrictions (including TokenFactor operand count limits) should be considered as it causes no code generation degradation, only compile time improvements.

I will get back to D60133 soon as well! Unfortunately it did not yield any improvements for the case I am looking at.

fhahn updated this revision to Diff 198004.May 3 2019, 7:56 AM
fhahn retitled this revision from [DAGCombiner] Limit number of nodes to explore, to avoid quadratic compile time. to [DAGCombiner] Avoid creating large tokenfactors in visitTokenFactor.
fhahn edited the summary of this revision. (Show Details)

Limit this patch to visitTokenFactor, limiting the number of operands to inline to 2048 nodes.

fhahn marked an inline comment as done.May 3 2019, 8:06 AM
fhahn added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
1767 ↗(On Diff #197761)

Thanks! I've limited the number of Ops to collect for inlining in the second pass. Is that along the lines you were thinking? I will also check if it helps to skip tokefactors that will get us to exceed a limit here.

fhahn marked an inline comment as done.May 3 2019, 8:10 AM
fhahn added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
14852 ↗(On Diff #197761)

Thanks, moved to a separate patch: D61511

niravd accepted this revision.May 7 2019, 7:12 AM

LGTM.

This revision is now accepted and ready to land.May 7 2019, 7:12 AM
This revision was automatically updated to reflect the committed changes.