[CaptureTracking] Avoid long compilation time on large basic blocks
CaptureTracking becomes very expensive in large basic blocks while
calling PointerMayBeCaptured. PointerMayBeCaptured scans the BB the
number of times equal to the number of uses of 'BeforeHere', which is
currently capped at 20 and bails out with Tracker->tooManyUses().
The bottleneck here is the number of calls to PointerMayBeCaptured * the
basic block scan. In a testcase with a 82k instruction BB,
PointerMayBeCaptured is called 130k times, leading to 'shouldExplore'
taking 527k runs, this currently takes ~12min.
To fix this we locally (within PointerMayBeCaptured) number the
instructions in the basic block using a DenseMap to cache instruction
positions/numbers. We build the cache incrementally every time we need
to scan an unexplored part of the BB, improving compile time to only
This triggers in the flow: DeadStoreElimination -> MepDepAnalysis ->
Side note: after multiple runs in the test-suite I've seen no
performance nor compile time regressions, but could note a couple of
compile time improvements:
Performance Improvements - Compile Time Delta Previous Current StdDev
SingleSource/Benchmarks/Misc-C++/bigfib -4.48% 0.8547 0.8164 0.0022
MultiSource/Benchmarks/TSVC/LoopRerolling-dbl/LoopRerolling-dbl -1.47% 1.3912 1.3707 0.0056
Differential Revision: http://reviews.llvm.org/D7010