This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner] Avoid creating large tokenfactors in visitTokenFactor
Closed · Public

Authored by fhahn on May 1 2019, 1:22 PM.


Details

Reviewers

niravd
spatel
craig.topper

Commits

rZORG16977d60e4a3: [DAGCombiner] Avoid creating large tokenfactors in visitTokenFactor
rZORGb73aef86ce6b: [DAGCombiner] Avoid creating large tokenfactors in visitTokenFactor
rG16977d60e4a3: [DAGCombiner] Avoid creating large tokenfactors in visitTokenFactor
rGb73aef86ce6b: [DAGCombiner] Avoid creating large tokenfactors in visitTokenFactor
rGa9d6c32eafc6: [DAGCombiner] Avoid creating large tokenfactors in visitTokenFactor
rL360171: [DAGCombiner] Avoid creating large tokenfactors in visitTokenFactor

Summary

When simplifying TokenFactors, we potentially iterate over all
operands of a large number of TokenFactors. This causes quadratic
compile times in some cases and the large token factors cause additional
scalability problems elsewhere.

This patch adds some limits to the number of nodes explored for the
cases mentioned above.
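To illustrate why this can become quadratic, here is a deliberately simplified model (not LLVM code): if each new chain is merged by rebuilding a TokenFactor that already contains every earlier chain, merging N chains costs 1 + 2 + ... + N = N(N+1)/2 operand visits. `TokenFactor` and `countOperandVisits` are hypothetical names used only for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical model: a TokenFactor is just a list of chain ids.
using TokenFactor = std::vector<int>;

// Merges N chains one at a time by rebuilding a TokenFactor that already
// holds all previous chains, and returns how many operands were visited.
// Every step re-walks the whole existing operand list, so the total is
// N * (N + 1) / 2, i.e. quadratic in N.
std::size_t countOperandVisits(int N) {
  assert(N >= 0);
  TokenFactor Current;
  std::size_t Visits = 0;
  for (int Chain = 0; Chain < N; ++Chain) {
    TokenFactor Next;
    for (int Op : Current) { // re-walk every existing operand
      Next.push_back(Op);
      ++Visits;
    }
    Next.push_back(Chain); // add the new chain
    ++Visits;
    Current = std::move(Next);
  }
  return Visits;
}
```

For N = 1000 this model counts 500500 operand visits, compared with 1000 if each chain were only added once.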

Diff Detail

Repository: rL LLVM

Event Timeline

fhahn created this revision. May 1 2019, 1:22 PM

Herald added a project: Restricted Project. May 1 2019, 1:22 PM

Herald added subscribers: dexonsmith, hiraditya, mehdi_amini.

Harbormaster completed remote builds in B31250: Diff 197626. May 1 2019, 1:22 PM

Relax the limits to 1000 nodes to explore. Further experiments showed that these larger limits are still sufficient to prevent quadratic compile times.
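A minimal sketch of such an exploration budget, assuming a plain adjacency-list graph instead of the SelectionDAG API. `Graph`, `exploreWithBudget` and `makeChain` are hypothetical; the default budget of 1000 mirrors the limit mentioned above. The budget test and decrement sit in the loop header, so the walk stops after at most that many nodes however large the graph is.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical graph: G[N] lists the nodes reachable in one step from N.
using Graph = std::vector<std::vector<int>>;

// Breadth-first walk from Root that visits at most MaxNodesToExplore nodes.
std::vector<int> exploreWithBudget(const Graph &G, int Root,
                                   unsigned MaxNodesToExplore = 1000) {
  std::vector<int> Worklist{Root};
  std::unordered_set<int> Seen{Root};
  std::vector<int> Visited;
  // The budget is checked and decremented in the loop header.
  for (std::size_t I = 0; I < Worklist.size() && MaxNodesToExplore > 0;
       ++I, --MaxNodesToExplore) {
    int N = Worklist[I];
    Visited.push_back(N);
    for (int Succ : G[N])
      if (Seen.insert(Succ).second)
        Worklist.push_back(Succ);
  }
  return Visited;
}

// Builds a linear chain 0 -> 1 -> ... -> N-1 for experimentation.
Graph makeChain(int N) {
  assert(N > 0);
  Graph G(N);
  for (int I = 0; I + 1 < N; ++I)
    G[I].push_back(I + 1);
  return G;
}
```

On a 5000-node chain, `exploreWithBudget(makeChain(5000), 0)` returns only the first 1000 nodes.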

Harbormaster completed remote builds in B31281: Diff 197761. May 2 2019, 5:43 AM

Notes inline, but I think the majority of the compile time improvements come from keeping TokenFactor operand counts bounded. This patch should be changed to reflect that.

That said, I think rewriting "CanMergeStoresTo" as "GetMaximumMergedStoreSize" and using that as a first-order filter on the nodes to consider, as discussed in D60133, should be done before any heuristic restrictions (including TokenFactor operand count limits) are considered, since it causes no code generation degradation, only compile time improvements.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
1767 ↗	(On Diff #197761)	I think this is too restrictive. This check effectively curtails the second and third passes (the first being the quick two-operand check). The third pass is a fairly expensive chain predecessor search that looks for redundancies among the operands (if A and its chain predecessor B are both in the list, you can drop B), but it already has a cutoff of 1024 steps. The second pass is a linear pass that weeds out trivial redundancies in the list. Due to the way chain replacements happen, it's extremely common for TokenFactors to have redundant operands, and I worry that curtailing this will cause nontrivial degradation for relatively little compile time improvement. If this is making a notable difference, I suspect we should change the second pass to not inline a TokenFactor if the number of operands in the new TokenFactor would be too large.
14852 ↗	(On Diff #197761)	nit: fold this into the loop test and increment (MaxNumberOfNodesToCheck > 0, --MaxNumberOfNodesToCheck). nit: since we mainly merge powers of two, the limit should be a power of two or slightly higher (1025?).
19999 ↗	(On Diff #197761)	It seems like the underlying problem you're trying to avoid is having a TokenFactor with too many operands. That's definitely an issue. We do try to avoid it in the SelectionDAGBuilder, but not consistently elsewhere. This patch does so indirectly, by forcing us to consider only a smaller set of chains. Given that this mostly catches long sequences of independent chains with mostly the same real dependences, it would probably be much better to keep the calculation here as is, but instead of making larger TokenFactors, split them into a hierarchy of some sort. Maybe this should be folded into the TokenFactor case of the getNode smart constructor.
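The third pass described in the first inline comment above can be sketched as follows. This is an illustrative model, not the DAGCombiner code: chains are a hypothetical DAG of integer ids (`ChainGraph`), and `pruneRedundantOps` is an invented name. It walks breadth-first up from the operands' chain predecessors and drops any operand reached that way, since another operand is already ordered after it. `MaxSteps` stands in for the 1024-step cutoff; redundancies not proven before the cutoff are conservatively kept.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical chain DAG: ChainPreds[N] lists the chain operands of node N.
using ChainGraph = std::vector<std::vector<int>>;

// Removes operands that are chain predecessors of another operand in Ops.
std::vector<int> pruneRedundantOps(const ChainGraph &ChainPreds,
                                   const std::vector<int> &Ops,
                                   std::size_t MaxSteps = 1024) {
  std::unordered_set<int> OpSet(Ops.begin(), Ops.end());
  std::unordered_set<int> Redundant, Seen;
  std::vector<int> Worklist;
  // Seed with the predecessors of every operand, not the operands themselves,
  // so an operand is only reached if some other operand depends on it.
  for (int Op : Ops) {
    assert(static_cast<std::size_t>(Op) < ChainPreds.size());
    for (int P : ChainPreds[Op])
      if (Seen.insert(P).second)
        Worklist.push_back(P);
  }
  // Bounded breadth-first walk up the chains.
  for (std::size_t I = 0; I < Worklist.size() && I < MaxSteps; ++I) {
    int N = Worklist[I];
    if (OpSet.count(N))
      Redundant.insert(N); // another operand is already ordered after N
    for (int P : ChainPreds[N])
      if (Seen.insert(P).second)
        Worklist.push_back(P);
  }
  // Keep the original order of the surviving operands.
  std::vector<int> Result;
  for (int Op : Ops)
    if (!Redundant.count(Op))
      Result.push_back(Op);
  return Result;
}
```

Because only nodes actually reached are dropped, stopping early never removes a needed dependency; it only leaves some redundancy in place.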
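The hierarchy suggested in the last inline comment could look roughly like this. It is a sketch under assumptions: `DAG`, `createTokenFactor`, `buildTokenFactorTree` and `maxOperandCount` are hypothetical stand-ins rather than SelectionDAG APIs, and chains are plain integer ids. Chains are packed `MaxOps` at a time into intermediate TokenFactors until the remainder fits under a single root.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical DAG model: ids [0, NumLeafChains) are leaf chains; each
// TokenFactor created afterwards gets the next id and stores its operands.
struct DAG {
  std::vector<std::vector<int>> TokenFactors; // operands of each TokenFactor
  int NumLeafChains;

  int createTokenFactor(std::vector<int> Ops) {
    TokenFactors.push_back(std::move(Ops));
    return NumLeafChains + static_cast<int>(TokenFactors.size()) - 1;
  }
};

// Joins all Chains under one root TokenFactor without any TokenFactor
// having more than MaxOps operands. Returns the root's id.
int buildTokenFactorTree(DAG &G, std::vector<int> Chains, std::size_t MaxOps) {
  assert(MaxOps >= 2 && !Chains.empty());
  // Each round packs MaxOps chains per new TokenFactor, so the list shrinks
  // until it fits into a single root TokenFactor.
  while (Chains.size() > MaxOps) {
    std::vector<int> NextLevel;
    for (std::size_t I = 0; I < Chains.size(); I += MaxOps) {
      std::size_t End = std::min(I + MaxOps, Chains.size());
      std::vector<int> Group(Chains.begin() + static_cast<std::ptrdiff_t>(I),
                             Chains.begin() + static_cast<std::ptrdiff_t>(End));
      // A lone leftover chain needs no TokenFactor of its own.
      NextLevel.push_back(Group.size() == 1
                              ? Group[0]
                              : G.createTokenFactor(std::move(Group)));
    }
    Chains = std::move(NextLevel);
  }
  return G.createTokenFactor(std::move(Chains));
}

// Largest operand count of any TokenFactor in G.
std::size_t maxOperandCount(const DAG &G) {
  std::size_t Max = 0;
  for (const auto &Ops : G.TokenFactors)
    Max = std::max(Max, Ops.size());
  return Max;
}
```

With 5000 chains and `MaxOps` = 2048, this creates three intermediate TokenFactors (2048, 2048 and 904 operands) plus a root with three operands, so no node exceeds the limit. Packing level by level keeps the tree depth logarithmic in the number of chains.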

In D61397#1487955, @niravd wrote:

Notes inline, but I think the majority of the compile time improvements come from keeping TokenFactor operand counts bounded. This patch should be changed to reflect that.

Thanks for taking a look. Yep I think that might be the case. I'll verify that tomorrow.

That said, I think rewriting "CanMergeStoresTo" as "GetMaximumMergedStoreSize" and using that as a first-order filter on the nodes to consider, as discussed in D60133, should be done before any heuristic restrictions (including TokenFactor operand count limits) are considered, since it causes no code generation degradation, only compile time improvements.

I will get back to D60133 soon as well! Unfortunately it did not yield any improvements for the case I am looking at.

Limit this patch to visitTokenFactor, capping the number of operands to inline at 2048.
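The effect of that limit can be modelled with a small standalone sketch of the flattening loop. `Node` and `collectTokenFactorOps` are hypothetical, the real code has further conditions that this model omits, and the default of 2048 matches the limit above. One deliberate choice in this sketch: when the limit is reached, TokenFactors that are still queued are appended as ordinary operands rather than discarded, so the flattened list still orders after every original chain.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical node model: IsTokenFactor marks TokenFactor nodes; Operands
// holds the ids of the chain operands.
struct Node {
  bool IsTokenFactor;
  std::vector<int> Operands;
};

// Flattens the TokenFactor Root into a deduplicated operand list. Nested
// TokenFactors are queued and inlined, but once Ops holds more than MaxOps
// entries no further TokenFactors are inlined.
std::vector<int> collectTokenFactorOps(const std::vector<Node> &Nodes,
                                       int Root, std::size_t MaxOps = 2048) {
  assert(Nodes[Root].IsTokenFactor);
  std::vector<int> TFs{Root}; // TokenFactors to inline, grows as we go
  std::vector<int> Ops;       // operands of the replacement TokenFactor
  std::unordered_set<int> Seen{Root};
  for (std::size_t I = 0; I < TFs.size(); ++I) {
    if (Ops.size() > MaxOps) {
      // Over the limit: keep the still-queued TokenFactors as plain
      // operands so that no chain dependency is lost.
      Ops.insert(Ops.end(), TFs.begin() + static_cast<std::ptrdiff_t>(I),
                 TFs.end());
      break;
    }
    for (int Op : Nodes[TFs[I]].Operands) {
      if (!Seen.insert(Op).second)
        continue; // duplicate operand: drop it
      if (Nodes[Op].IsTokenFactor)
        TFs.push_back(Op); // inline its operands in a later iteration
      else
        Ops.push_back(Op);
    }
  }
  return Ops;
}
```

For a root TokenFactor over {TF(0, 1), 2, 1}, the default limit yields {2, 1, 0}; with a limit of 1, the nested TokenFactor is kept as a single operand instead.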

fhahn marked an inline comment as done. May 3 2019, 8:06 AM

fhahn added inline comments.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
1767 ↗	(On Diff #197761)	Thanks! I've limited the number of Ops to collect for inlining in the second pass. Is that along the lines of what you were thinking? I will also check whether it helps to skip TokenFactors that would push us over the limit here.
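Skipping TokenFactors that would push the result past a limit, as proposed here and in the first inline comment, could be sketched like this. The assumptions are a hypothetical `Node` model, an invented `inlineWithinLimit` helper, and only one level of inlining. Each nested TokenFactor is inlined only if an upper bound on the new operand count stays within `MaxOps`; otherwise it is kept as a single operand, so the result never exceeds `MaxOps` operands as long as the root itself has no more than `MaxOps`.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical node model: IsTokenFactor marks TokenFactor nodes; Operands
// holds the ids of the chain operands.
struct Node {
  bool IsTokenFactor;
  std::vector<int> Operands;
};

// Builds the operand list of a new TokenFactor replacing Root, inlining a
// nested TokenFactor only when that keeps the operand count within MaxOps.
std::vector<int> inlineWithinLimit(const std::vector<Node> &Nodes, int Root,
                                   std::size_t MaxOps) {
  assert(Nodes[Root].IsTokenFactor);
  // Upper bound on the final size: start by assuming every operand is kept
  // as-is; inlining a TokenFactor with K operands replaces 1 operand by <= K.
  std::size_t Bound = Nodes[Root].Operands.size();
  std::vector<int> Ops;
  std::unordered_set<int> Seen;
  for (int Op : Nodes[Root].Operands) {
    if (!Seen.insert(Op).second)
      continue; // duplicate operand
    const Node &N = Nodes[Op];
    if (N.IsTokenFactor && Bound - 1 + N.Operands.size() <= MaxOps) {
      Bound = Bound - 1 + N.Operands.size();
      for (int Inner : N.Operands)
        if (Seen.insert(Inner).second)
          Ops.push_back(Inner);
    } else {
      Ops.push_back(Op); // too large, or not a TokenFactor: keep as one operand
    }
  }
  return Ops;
}
```

For a root TokenFactor over {TF(0, 1), TF(1, 2, 3)} with `MaxOps` = 4, the first is inlined and the second kept, giving {0, 1, TF(1, 2, 3)}; with `MaxOps` = 8 both are inlined, giving {0, 1, 2, 3}.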

fhahn marked an inline comment as done. May 3 2019, 8:10 AM

fhahn added inline comments.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
14852 ↗	(On Diff #197761)	Thanks, moved to a separate patch: D61511

fhahn mentioned this in D61511: [DAGCombiner] Limit number of nodes explored as store candidates. May 6 2019, 12:35 PM

LGTM.

This revision is now accepted and ready to land. May 7 2019, 7:12 AM

Closed by commit rL360171: [DAGCombiner] Avoid creating large tokenfactors in visitTokenFactor (authored by fhahn). May 7 2019, 9:45 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp	5 lines

Diff 198493

llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp


Hunk in SDValue DAGCombiner::visitTokenFactor(SDNode *N):

   SmallVector<SDValue, 8> Ops; // Ops for replacing token factor.
   SmallPtrSet<SDNode*, 16> SeenOps;
   bool Changed = false; // If we should replace this token factor.

   // Start out with this token factor.
   TFs.push_back(N);

   // Iterate through token factors. The TFs grows when new token factors are
-  // encountered.
-  for (unsigned i = 0; i < TFs.size(); ++i) {
+  // encountered. Limit number of nodes to inline, to avoid quadratic compile
+  // times.
+  for (unsigned i = 0; i < TFs.size() && Ops.size() <= 2048; ++i) {
     SDNode *TF = TFs[i];

     // Check each of the operands.
     for (const SDValue &Op : TF->op_values()) {
       switch (Op.getOpcode()) {
       case ISD::EntryToken:
         // Entry tokens don't need to be added to the list. They are
         // redundant.