This is an archive of the discontinued LLVM Phabricator instance.

[MemDep] Reduce block limit
ClosedPublic

Authored by nikic on Dec 15 2022, 5:37 AM.

Details

Summary

The non-local MemDep analysis has a limit on the number of blocks it will scan trying to find dependencies. The current limit of 1000 is ridiculously high, especially when we consider that each block scan can also visit up to 100 instructions. In degenerate cases (where we actually scan that many blocks) MemDep/GVN dominate overall compile-time, for little benefit.

This patch reduces the limit to 100, which is probably still too large, but at least avoids some of the more catastrophic cases. (For comparison, MSSA clobber walks consider up to 100 MemoryDefs/MemoryPhis, rather than 100 blocks * 100 instructions, but these limits aren't directly comparable.)

The impact on relevant GVN statistics from llvm-test-suite is as follows:

                   |   Old |   New |  Diff
gvn.NumGVNLoad     | 19298 | 19246 | -0.27%
gvn.NumPRELoad     | 13983 | 13963 | -0.14%
gvn.NumPRELoopLoad |   703 |   702 | -0.14%

The impact on compile-time is as follows: http://llvm-compile-time-tracker.com/compare.php?from=92619956eb27ef08dd24045307593fc3d7f78db0&to=675a7cdab6ef84b994b00b2d0e2f146634056c9d&stat=instructions:u

        | geomean
O3      |  -0.30%
ThinLTO |  -0.52%
LTO-g   |  -0.95%

In addition to the average improvement, this also fixes some degenerate cases. For example, libclamav_htmlnorm.c improves by 13-23%, depending on build configuration.

I know that we were kind of hoping that this issue would resolve itself in time, either by a switch to NewGVN or use of MSSA in GVN. But I think we should still address this in the meantime. Additionally, a switch to an MSSA-based implementation will effectively be doing this as well, in a roundabout way (by dint of MSSA having lower cutoffs than MDA).

Diff Detail

Event Timeline

nikic created this revision.Dec 15 2022, 5:37 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 15 2022, 5:37 AM
nikic requested review of this revision.Dec 15 2022, 5:37 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 15 2022, 5:37 AM

Reducing this limit is certainly the right thing to do. To what value so as to not cause large regressions is a question I'd like to answer with some experimental data.
I'll start some experiments and come back to this.

nikic added a comment.Jan 2 2023, 5:49 AM

Reducing this limit is certainly the right thing to do. To what value so as to not cause large regressions is a question I'd like to answer with some experimental data.
I'll start some experiments and come back to this.

Thanks!

I'm seeing one instance of increase in compile-time with the 100 limit and a couple of small run time regressions. The results look somewhat cleaner with a cap of 300. Could we use 300 as an intermediate step?

nikic updated this revision to Diff 488195.Jan 11 2023, 6:48 AM

Default to 300 blocks.

Compile-time results for this are a bit underwhelming, but I guess it doesn't hurt to start conservatively: http://llvm-compile-time-tracker.com/compare.php?from=cc845e9de8c87c74d494f4e90e8fcf4fca264989&to=1b691111c3e80929268c8b749470d2b5bfad6f9d&stat=instructions:u

Did you happen to also try 200 blocks?

asbirlea accepted this revision.Jan 12 2023, 10:42 AM

I did now and it also looks clean. Commit with 200?

This revision is now accepted and ready to land.Jan 12 2023, 10:42 AM
This revision was landed with ongoing or failed builds.Jan 13 2023, 1:40 AM
This revision was automatically updated to reflect the committed changes.