As we established in D84108, the cut-off thresholds for Memory Dependence Analysis may need tuning.
In particular, the current thresholds are (sketched below):
- Scan up to 100 instructions per block. IMO this may be a bit low. (rL179713, changed from 500)
- Scan up to 1000 blocks. That seems somewhat high. (D16123)
- There is no cumulative limit on scanned instructions, so a single query can end up scanning up to 100 instructions/block * 1000 blocks = 100'000 instructions. As @nikic noted in https://reviews.llvm.org/D84108#2174113, that is indeed rather insane.
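For reference, the existing knobs are exposed as `cl::opt` flags in llvm/lib/Analysis/MemoryDependenceAnalysis.cpp, roughly as follows (paraphrased from memory, not verbatim; a cumulative per-query limit would be a third option along the same lines):

```cpp
#include "llvm/Support/CommandLine.h"
using namespace llvm;

// Per-block cap on scanned instructions (currently 100).
static cl::opt<unsigned> BlockScanLimit(
    "memdep-block-scan-limit", cl::Hidden, cl::init(100),
    cl::desc("Max number of instructions to scan in a block"));

// Cap on the number of blocks scanned per non-local query (currently 1000).
static cl::opt<unsigned> BlockNumberLimit(
    "memdep-block-number-limit", cl::Hidden, cl::init(1000),
    cl::desc("Max number of blocks to scan per query"));
```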
I've collected some numbers (with the limits effectively unset, on test-suite + RawSpeed + darktable):
for each TU, the maximal value a given statistic reaches in any single query, with percentiles then taken over all TUs.
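Before the numbers, a minimal sketch (illustrative only, not the actual collection harness; the data and names below are made up) of how such percentiles can be computed from a dump of per-TU maxima, using numpy-style linear interpolation between closest ranks:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

// Percentile with linear interpolation between closest ranks.
static double percentile(std::vector<double> Vals, double Pct) {
  std::sort(Vals.begin(), Vals.end());
  double Rank = (Pct / 100.0) * (Vals.size() - 1);
  size_t Lo = static_cast<size_t>(std::floor(Rank));
  size_t Hi = static_cast<size_t>(std::ceil(Rank));
  return Vals[Lo] + (Rank - Lo) * (Vals[Hi] - Vals[Lo]);
}

int main() {
  // Hypothetical per-TU maxima of "instructions scanned in block".
  std::vector<double> PerTUMax = {12, 30, 47, 65, 114, 176, 920, 6222};
  for (double P : {50.0, 75.0, 90.0, 99.0, 100.0})
    std::printf("%6.1f%% -> %.2f\n", P, percentile(PerTUMax, P));
}
```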
Number of instructions scanned in block
| Percentile | 50% | 75% | 80% | 90% | 91% | 92% | 93% | 94% | 95% | 96% | 97% | 98% | 99% | 99.5% | 99.9% | 99.95% | 100% |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Value | 30.00 | 65.00 | 114.00 | 176.30 | 209.00 | 246.24 | 293.21 | 358.18 | 388.60 | 430.00 | 480.00 | 593.00 | 920.00 | 2077.23 | 6087.00 | 6222.00 | 6222.00 |
Proposed action: no change, keep the limit at 100, which is roughly the 80th percentile; the cut-off is actively helpful.
Number of blocks scanned
| Percentile | 50% | 75% | 80% | 90% | 91% | 92% | 93% | 94% | 95% | 96% | 97% | 98% | 99% | 99.5% | 99.9% | 99.95% | 100% |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Value | 10.0 | 39.0 | 55.0 | 61.0 | 67.0 | 77.0 | 88.0 | 94.0 | 104.0 | 125.0 | 145.0 | 179.0 | 278.0 | 355.5 | 670.5 | 1045.4 | 1908.0 |
The current limit is 1000, which is about the 99.95th percentile,
i.e. we aren't really enforcing anything with it.
I propose lowering it to roughly the 99th percentile: 250.
Instructions scanned total in a single query
| Percentile | 50% | 75% | 80% | 90% | 91% | 92% | 93% | 94% | 95% | 96% | 97% | 98% | 99% | 99.5% | 99.9% | 99.95% | 100% |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Value | 66.0 | 318.0 | 336.0 | 463.0 | 498.0 | 518.0 | 571.0 | 683.0 | 791.0 | 986.0 | 1321.0 | 2241.0 | 3872.0 | 4847.5 | 8744.3 | 8828.0 | 16250.0 |
I propose to go with the 96th-97th percentile and establish a new cumulative limit of 1000 instructions per query.
That is a 100x reduction from the current implicit limit (100 instructions/block * 1000 blocks = 100'000 instructions),
and 25x lower than the limit implied by the new per-block and block-count caps (100 * 250 = 25'000 instructions).
So IOW, we will now be willing to process 4x fewer blocks per query,
and 100x fewer instructions per query in total.
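Put together, a minimal sketch (plain C++; the names and the exact bail-out behavior are hypothetical, not the actual MemDep implementation) of how the cumulative budget composes with the existing caps:

```cpp
#include <algorithm>
#include <vector>

constexpr unsigned BlockScanLimit = 100;   // per-block cap (unchanged)
constexpr unsigned BlockNumberLimit = 250; // proposed block-count cap
constexpr unsigned QueryScanLimit = 1000;  // proposed cumulative cap

// Returns false once any limit is exhausted, at which point the query would
// have to conservatively report an unknown dependence.
bool runQuery(const std::vector<unsigned> &BlockSizes) {
  unsigned InstBudget = QueryScanLimit;
  unsigned BlocksScanned = 0;
  for (unsigned Size : BlockSizes) {
    if (++BlocksScanned > BlockNumberLimit)
      return false;                        // block-count cap hit
    unsigned ToScan = std::min(Size, BlockScanLimit);
    if (ToScan > InstBudget)
      return false;                        // cumulative budget exhausted
    InstBudget -= ToScan;                  // charge this block's scan
  }
  return true;
}
```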
This helps somewhat with D84108's compile-time regression on lencod's context_ini.
That being said, these are rather arbitrary cut-offs that happen to seem sane
on the code bases examined, so inevitably there is some code somewhere that doesn't
fit within them and may be penalized by the lower thresholds.
But the compile-time budget is finite, so we have to make some decisions sometimes.
llvm-compile-time-tracker is, unsurprisingly, happy about this:
http://llvm-compile-time-tracker.com/compare.php?from=b1210c059d1ef084ecd275ed0ffb8343ac3cdfad&to=e3c7a0b96f8800ab21e2d5a8be414e5f83fa1aed&stat=instructions
Geomeans: -0.17% for -O3, -0.5%..-1.0% for ReleaseLTO.
I would suggest changing the order of the decrement and the check (as done in getSimplePointerDependencyFrom) to get more consistent behavior in the case of a zero default limit.
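To illustrate, a standalone sketch of the two orderings with an unsigned limit (illustrative names, not the actual MemDep code): with decrement-then-check, a limit of 0 wraps around and effectively disables the cut-off, whereas check-then-decrement treats 0 uniformly as "stop immediately".

```cpp
#include <cstdio>

// Decrement-then-check: if Limit starts at 0, the first decrement wraps
// around to UINT_MAX, so the cut-off effectively never fires.
static bool shouldStopDecFirst(unsigned &Limit) { return --Limit == 0; }

// Check-then-decrement: Limit == 0 uniformly means "stop immediately".
static bool shouldStopCheckFirst(unsigned &Limit) {
  if (Limit == 0)
    return true;
  --Limit;
  return false;
}

int main() {
  unsigned A = 0, B = 0;
  std::printf("dec-first stops: %d\n", shouldStopDecFirst(A));     // 0 (wrapped)
  std::printf("check-first stops: %d\n", shouldStopCheckFirst(B)); // 1
}
```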