The non-local MemDep analysis has a limit on the number of blocks it will scan trying to find dependencies. The current limit of 1000 is ridiculously high, especially when we consider that each block scan can also visit up to 100 instructions. In degenerate cases (where we actually scan that many blocks) MemDep/GVN dominate overall compile-time, for little benefit.
This patch reduces the limit to 100, which is probably still too large, but at least avoids some of the more catastrophic cases. (For comparison, MSSA clobber walks consider up to 100 MemoryDefs/MemoryPhis, rather than 100 blocks * 100 instructions, but these limits aren't directly comparable.)
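To illustrate why the block limit matters, here is a minimal C++ sketch of a budgeted backward walk over a toy CFG. It is not MemDep's implementation: `Block`, `ScanResult`, `nonLocalScan`, `makeChain`, `BlockLimit` and `InstLimit` are hypothetical names, and the model is simplified (each instruction is just a "may clobber" flag, and hitting the per-block limit marks that block's dependency as unknown). The point it shows is that the worst-case work is bounded by BlockLimit * InstLimit instruction visits, i.e. 1000 * 100 = 100,000 with the old limit versus 100 * 100 = 10,000 with the new one.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <unordered_set>
#include <vector>

// Hypothetical toy IR: MayClobber[i] is true if instruction i may clobber
// the queried location; Preds lists predecessor block indices.
struct Block {
  std::vector<bool> MayClobber;
  std::vector<std::size_t> Preds;
};

struct ScanResult {
  bool GaveUp = false;                     // block budget exhausted
  std::size_t BlocksVisited = 0;
  std::size_t InstsScanned = 0;
  std::vector<std::size_t> ClobberBlocks;  // blocks where a clobber was found
  std::vector<std::size_t> UnknownBlocks;  // blocks where InstLimit was hit
};

// Budgeted backward walk in the spirit of a non-local dependency query.
// Sketch only; the limits are hypothetical stand-ins for the block-count
// and per-block instruction-scan limits.
ScanResult nonLocalScan(const std::vector<Block> &CFG, std::size_t Start,
                        std::size_t BlockLimit, std::size_t InstLimit) {
  assert(Start < CFG.size());
  ScanResult R;
  std::unordered_set<std::size_t> Seen;
  std::deque<std::size_t> Worklist;
  for (std::size_t P : CFG[Start].Preds)
    if (Seen.insert(P).second)
      Worklist.push_back(P);

  while (!Worklist.empty()) {
    if (R.BlocksVisited == BlockLimit) {
      R.GaveUp = true;  // too many blocks: the whole query becomes unknown
      return R;
    }
    std::size_t B = Worklist.front();
    Worklist.pop_front();
    ++R.BlocksVisited;

    // Scan the block bottom-up, visiting at most InstLimit instructions.
    const std::vector<bool> &Insts = CFG[B].MayClobber;
    std::size_t N = Insts.size() < InstLimit ? Insts.size() : InstLimit;
    bool Clobbered = false;
    for (std::size_t I = 0; I < N; ++I) {
      ++R.InstsScanned;
      if (Insts[Insts.size() - 1 - I]) {
        Clobbered = true;
        break;
      }
    }
    if (Clobbered) {
      R.ClobberBlocks.push_back(B);  // dependency found; stop on this path
      continue;
    }
    if (N < Insts.size()) {
      R.UnknownBlocks.push_back(B);  // per-block limit hit; stop on this path
      continue;
    }
    for (std::size_t P : CFG[B].Preds)  // no clobber: keep walking upwards
      if (Seen.insert(P).second)
        Worklist.push_back(P);
  }
  return R;
}

// Hypothetical helper: a straight-line chain where block I's only
// predecessor is block I + 1, with no clobbering instructions.
std::vector<Block> makeChain(std::size_t NumBlocks, std::size_t InstsPerBlock) {
  std::vector<Block> CFG(NumBlocks);
  for (std::size_t I = 0; I < NumBlocks; ++I) {
    CFG[I].MayClobber.assign(InstsPerBlock, false);
    if (I + 1 < NumBlocks)
      CFG[I].Preds.push_back(I + 1);
  }
  return CFG;
}
```

On a long chain of 100-instruction blocks with no clobbers, `nonLocalScan(makeChain(2000, 100), 0, 1000, 100)` scans 100,000 instructions before giving up, while a block limit of 100 gives up after 10,000. Either way the answer is "unknown", which is why the larger limit buys so little in degenerate cases.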
The impact on relevant GVN statistics from llvm-test-suite is as follows:
|                    | Old   | New   | Diff   |
| gvn.NumGVNLoad     | 19298 | 19246 | -0.27% |
| gvn.NumPRELoad     | 13983 | 13963 | -0.14% |
| gvn.NumPRELoopLoad | 703   | 702   | -0.14% |
The impact on compile-time is as follows: http://llvm-compile-time-tracker.com/compare.php?from=92619956eb27ef08dd24045307593fc3d7f78db0&to=675a7cdab6ef84b994b00b2d0e2f146634056c9d&stat=instructions:u
|         | geomean |
| O3      | -0.30%  |
| ThinLTO | -0.52%  |
| LTO-g   | -0.95%  |
In addition to the average improvement, this also fixes some degenerate cases. For example, compile-time for libclamav_htmlnorm.c improves by 13-23%, depending on build configuration.
I know that we were kind of hoping that this issue would resolve itself in time, either by a switch to NewGVN or by the use of MSSA in GVN. But I think we should still address it in the meantime. Additionally, switching to an MSSA-based implementation would effectively do this as well, in a roundabout way (by dint of MSSA having lower cutoffs than MDA).