This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombine] Change store merge candidates check cut off to 1024
ClosedPublic

Authored by aemerson on May 8 2018, 7:35 AM.

Details

Summary

Change store merge candidates check cut off to 1024.

The previous value of 8192 resulted in 5x compile time hits in some pathological cases.

Diff Detail

Repository
rL LLVM

Event Timeline

aemerson created this revision.May 8 2018, 7:35 AM

I ran the test suite and SPEC2000/2006 benchmarks and didn't see any significant change with this.

niravd accepted this revision.May 8 2018, 8:42 AM

I think pruning this back is reasonable. The choice of 8192 was arbitrary. Have you checked larger sizes than 1024? Given it's a 5x increase (presumably) of the total runtime and the algorithm is O(N) I would expect to only need a 5x reduction and 2048 would be roughly equivalent.

Also, since you've run the spec benchmarks, can do a quick check to see which binaries changed? This search could be rewritten to bound this search to the true common ancestor but there's no point in rewriting it unless there's a valid merging case where we give up early.

Either way this LGTM.

This revision is now accepted and ready to land.May 8 2018, 8:42 AM

The actual problem here seems to be superlinear performance with this cutoff value. The 5x I mentioned was inappropriate as that was over a whole compile and compared to an old version of LLVM, before the store merging after legalisation patch for AArch64 was landed in December. The actual problem in my test case, which is admittedly a very large one, is that with the original value of 8192, I get around 330s runtime for my test case, at Max=2048 its 189s, Max=1024 its 34s, Max=512 21s, Max=256 20s. So somewhere between 512 and 1024 we start to see this superlinear compile time.

Unfortunately I can't share the test case, but something is definitely not O(N).

Unfortunately I can't share the test case, but something is definitely not O(N).

Fair enough. If you happen into a sharable test case that I could use as a benchmark I'd be happy to see if I could prune this check more; It seems like there's still a lot of time being consumed here.

This revision was automatically updated to reflect the committed changes.