Currently ScalarEvolution::getUsedLoops traverses a SCEV expression upon each call
w/o caching the results. As it's called by addToLoopUseLists for every new SCEV node created
it is time consuming. This patch adds a memoization map to speed up the calls and tries
to do so in the least invasive manner possible.
This partially addresses https://bugs.llvm.org/show_bug.cgi?id=32731
On the large-SCEVs-shallow-getSCEV-stack.ll test case attached to the Bugzilla bug I see
~70% reduction in overall (wall) time of
time ./bin/opt -slsr large-SCEVs-shallow-getSCEV-stack.ll -o /dev/null
run (the rest of the time (about 98% give or take) is still spent in checkValidity).
CTMark shows either a small improvement or is within noise, it's hard to tell even
on a 100 runs (x86):
Name | Prev | Current | % | Δ | MAD | Prev | Current | % | Δ | MAD | |
---|---|---|---|---|---|---|---|---|---|---|---|
CTMark/ClamAV/clamscan | 9.8202 | 9.8004 | -0.20% | -0.0198 | 0.0754 | 9.9054 | 9.8899 | -0.16% | -0.0155 | 0.0754 | |
CTMark/kimwitu++/kc | 11.0440 | 11.0182 | -0.23% | -0.0258 | 0.0268 | 11.0805 | 11.0802 | 0.00% | -0.0003 | 0.0268 | |
CTMark/tramp3d-v4/tramp3d-v4 | 11.6199 | 11.5813 | -0.33% | -0.0386 | 0.0388 | 11.6906 | 11.6903 | 0.00% | -0.0003 | 0.0388 | |
CTMark/7zip/7zip-benchmark | 27.3075 | 27.2473 | -0.22% | -0.0602 | 0.1511 | 27.8942 | 27.8941 | 0.00% | -0.0001 | 0.1511 | |
CTMark/sqlite3/sqlite3 | 4.7515 | 4.7446 | -0.15% | -0.0069 | 0.0278 | 4.7894 | 4.7893 | 0.00% | -0.0001 | 0.0278 | |
CTMark/7zip/7zip-benchmark-link | 0.0443 | 0.0438 | -1.13% | -0.0005 | 0.0010 | 0.0457 | 0.0457 | 0.00% | 0.0000 | 0.0010 | |
CTMark/Bullet/bullet-link | 0.0320 | 0.0319 | -0.31% | -0.0001 | 0.0006 | 0.0329 | 0.0329 | 0.00% | 0.0000 | 0.0006 | |
CTMark/ClamAV/clamscan-link | 0.0234 | 0.0234 | 0.00% | 0.0000 | 0.0003 | 0.0238 | 0.0238 | 0.00% | 0.0000 | 0.0003 | |
CTMark/SPASS/SPASS-link | 0.0227 | 0.0227 | 0.00% | 0.0000 | 0.0002 | 0.0230 | 0.0230 | 0.00% | 0.0000 | 0.0002 | |
CTMark/consumer-typeset/consumer-typeset-link | 0.0239 | 0.0239 | 0.00% | 0.0000 | 0.0003 | 0.0246 | 0.0246 | 0.20% | 0.0000 | 0.0003 | |
CTMark/kimwitu++/kc-link | 0.0528 | 0.0528 | 0.00% | 0.0000 | 0.0003 | 0.0534 | 0.0534 | 0.09% | 0.0000 | 0.0003 | |
CTMark/lencod/lencod-link | 0.0238 | 0.0238 | 0.00% | 0.0000 | 0.0003 | 0.0246 | 0.0246 | 0.00% | 0.0000 | 0.0003 | |
CTMark/mafft/pairlocalalign-link | 0.0174 | 0.0174 | 0.00% | 0.0000 | 0.0001 | 0.0176 | 0.0176 | 0.28% | 0.0000 | 0.0001 | |
CTMark/sqlite3/sqlite3-link | 0.0141 | 0.0141 | 0.00% | 0.0000 | 0.0001 | 0.0143 | 0.0143 | 0.00% | 0.0000 | 0.0001 | |
CTMark/tramp3d-v4/tramp3d-v4-link | 0.0238 | 0.0238 | 0.00% | 0.0000 | 0.0001 | 0.0240 | 0.0240 | 0.00% | 0.0000 | 0.0001 | |
CTMark/consumer-typeset/consumer-typeset | 7.6772 | 7.6499 | -0.36% | -0.0273 | 0.0343 | 7.7051 | 7.7058 | 0.01% | 0.0006 | 0.0343 | |
CTMark/mafft/pairlocalalign | 4.9020 | 4.8927 | -0.19% | -0.0093 | 0.0199 | 4.9349 | 4.9355 | 0.01% | 0.0006 | 0.0199 | |
CTMark/SPASS/SPASS | 9.0386 | 9.0233 | -0.17% | -0.0153 | 0.0716 | 9.1167 | 9.1181 | 0.02% | 0.0014 | 0.0716 | |
CTMark/lencod/lencod | 8.5570 | 8.5427 | -0.17% | -0.0143 | 0.0397 | 8.5944 | 8.5960 | 0.02% | 0.0016 | 0.0397 | |
CTMark/Bullet/bullet | 20.3327 | 20.3012 | -0.15% | -0.0315 | 0.1420 | 20.7366 | 20.7445 | 0.04% | 0.0079 | 0.1420 | |
(the left half of the table uses minimum as an aggregate function, the right half - median, the underlying data are the same 100 samples,
rows are sorted by the absolute delta of medians)
If the memory consumption becomes a concern we can switch from having a full set of loops referenced attached to every SCEV node
to a mirrored tree (DAG really) structure in a skip-list fashion so every node contains a set of unique references to all closest AddRec
nodes. This way getUsedLoops will be able to traverse a compressed tree (DAG) containing AddRec's only for any SCEV expression.
I don't think it is sufficient for correctness. Imagine the situation: A = {1,+,1}<some_loop>, B = A + 1. Something has changed for A, and it is no longer using some_loop. How do we say that B is not using it as well?
I also think that something needs to be done about it in forgetLoop. This method is called, for example, when a loop becomes non-existent, or when the set of its blocks changes. And you may end up with many cache bundles keeping references to this loop.