This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombine] Change store merge candidates check cut off to 1024
ClosedPublic

Authored by aemerson on May 8 2018, 7:35 AM.

Details

Reviewers

Commits

rG4e66142f145a: [DAGCombine] Change store merge candidates check cut off to 1024.
rL331888: [DAGCombine] Change store merge candidates check cut off to 1024.

Summary

Change store merge candidates check cut off to 1024.

The previous value of 8192 resulted in 5x compile time hits in some pathological cases.

Diff Detail

Repository: rL LLVM

Event Timeline

aemerson created this revision. May 8 2018, 7:35 AM

I ran the test suite and SPEC2000/2006 benchmarks and didn't see any significant change with this.

I think pruning this back is reasonable. The choice of 8192 was arbitrary. Have you checked larger sizes than 1024? Given that it's (presumably) a 5x increase in total runtime and the algorithm is O(N), I would expect to need only a 5x reduction, so 2048 would be roughly equivalent.

Also, since you've run the SPEC benchmarks, can you do a quick check to see which binaries changed? This search could be rewritten so that it is bounded by the true common ancestor, but there's no point in rewriting it unless there's a valid merging case where we give up early.

Either way this LGTM.
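The linear-scaling estimate in the comment above can be written out as a small calculation. This is only a sketch of that arithmetic; the function name `linearCutoffFor` is hypothetical and the strictly linear cost model is the reviewer's stated assumption, not a measured property of the combiner.

```cpp
#include <cassert>

// Hypothetical helper: under a strictly linear cost model (an assumption),
// cutting the cost by `Factor` requires dividing the cutoff by `Factor`.
double linearCutoffFor(double OldMax, double Factor) {
  assert(OldMax > 0 && Factor > 0 && "inputs must be positive");
  return OldMax / Factor;
}
```

With the values from the discussion, `linearCutoffFor(8192, 5)` falls between 1024 and 2048, which is why 2048 is suggested as the nearest power of two under that model.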

This revision is now accepted and ready to land. May 8 2018, 8:42 AM

The actual problem here seems to be superlinear performance with this cutoff value. The 5x I mentioned was inappropriate, as that was measured over a whole compile and compared against an old version of LLVM, from before the patch enabling store merging after legalisation for AArch64 landed in December. In my test case, which is admittedly a very large one, the original value of 8192 gives around 330s of runtime; at Max=2048 it's 189s, at Max=1024 it's 34s, at Max=512 it's 21s, and at Max=256 it's 20s. So somewhere between 512 and 1024 we start to see this superlinear compile time.

Unfortunately I can't share the test case, but something is definitely not O(N).
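One way to read the timings quoted above is to compute a local scaling exponent between neighbouring measurements, i.e. the k for which runtime grows roughly as Max^k over that step. The sketch below does that for the five quoted data points; the type and function names are hypothetical, and because the figures are whole-run times that include a fixed baseline, the exponents for the smaller cutoffs are damped.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One measurement: cutoff value and total runtime in seconds.
struct Sample {
  double Max;
  double Seconds;
};

// The (Max, seconds) pairs quoted in the review comment.
std::vector<Sample> quotedSamples() {
  return {{256, 20}, {512, 21}, {1024, 34}, {2048, 189}, {8192, 330}};
}

// Local exponent k such that Seconds ~ Max^k between two samples.
// k == 1 would be linear; k > 1 is superlinear over that step.
double scalingExponent(const Sample &Lo, const Sample &Hi) {
  assert(Lo.Max > 0 && Hi.Max > Lo.Max && Lo.Seconds > 0 && Hi.Seconds > 0);
  return std::log(Hi.Seconds / Lo.Seconds) / std::log(Hi.Max / Lo.Max);
}

// Exponents for each pair of neighbouring samples, in input order.
std::vector<double> scalingExponents(const std::vector<Sample> &S) {
  std::vector<double> Out;
  for (std::size_t I = 1; I < S.size(); ++I)
    Out.push_back(scalingExponent(S[I - 1], S[I]));
  return Out;
}
```

On the quoted totals, the step from Max=1024 to Max=2048 is the one whose exponent comes out above 1. Subtracting an assumed fixed baseline before taking ratios would shift the picture towards the smaller cutoffs.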

Fair enough. If you come across a shareable test case that I could use as a benchmark, I'd be happy to see if I could prune this check more; it seems like there's still a lot of time being consumed here.

Closed by commit rL331888: [DAGCombine] Change store merge candidates check cut off to 1024. (authored by aemerson). May 9 2018, 8:57 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path: llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
Size: 2 lines

Diff 145935

llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

 bool DAGCombiner::checkMergeStoreCandidatesForDependencies(
     SmallVectorImpl<MemOpLink> &StoreNodes, unsigned NumStores) {
   // FIXME: We should be able to truncate a full search of
   // predecessors by doing a BFS and keeping tabs the originating
   // stores from which worklist nodes come from in a similar way to
   // TokenFactor simplfication.

   SmallPtrSet<const SDNode *, 16> Visited;
   SmallVector<const SDNode *, 8> Worklist;
-  unsigned int Max = 8192;
+  unsigned int Max = 1024;
   // Search Ops of store candidates.
   for (unsigned i = 0; i < NumStores; ++i) {
     SDNode *n = StoreNodes[i].MemNode;
     // Potential loops may happen only through non-chain operands
     for (unsigned j = 1; j < n->getNumOperands(); ++j)
       Worklist.push_back(n->getOperand(j).getNode());
   }
   // Search through DAG. We can stop early if we find a store node.
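To illustrate what a cutoff such as `Max` does in a worklist search, here is a minimal, self-contained sketch on a toy graph. It is not the LLVM implementation: `Node`, `SearchResult` and `boundedPredecessorSearch` are hypothetical names, and the exact point at which the real code gives up may differ. The idea is only that the walk over operands stops once a fixed number of nodes has been visited, so the cost per query is bounded at the price of sometimes giving up on a case that would have been fine.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a DAG node; this is not LLVM's SDNode.
struct Node {
  std::vector<const Node *> Operands;
};

enum class SearchResult { Found, NotFound, GaveUp };

// Walk operands transitively from the seed nodes in Worklist, looking for
// Target. Returns GaveUp once more than MaxSteps distinct nodes have been
// visited without finding Target.
SearchResult boundedPredecessorSearch(const Node *Target,
                                      std::vector<const Node *> Worklist,
                                      std::size_t MaxSteps) {
  std::unordered_set<const Node *> Visited;
  while (!Worklist.empty()) {
    const Node *N = Worklist.back();
    Worklist.pop_back();
    if (!Visited.insert(N).second)
      continue; // Already expanded this node.
    if (N == Target)
      return SearchResult::Found;
    if (Visited.size() > MaxSteps)
      return SearchResult::GaveUp; // Budget exhausted; caller must be conservative.
    for (const Node *Op : N->Operands)
      Worklist.push_back(Op);
  }
  return SearchResult::NotFound;
}
```

A caller in this sketch would treat `GaveUp` the same as `Found`, that is, assume a dependency may exist and skip the merge. A smaller budget therefore trades missed merges for compile time, which is the trade-off discussed in the review.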