For large basic blocks CaptureTracking may become very expensive: for each memory related instruction considered, walks top down the BB to check ordering and dominance among two instructions. In my testcase there are 81782 instructions in a BB, which leads to compilation time of ~12min w/ 'opt -O1'. This triggers in the flow: DeadStoreElimination -> MepDepAnalysis -> CaptureTracking.
To fix the compile time bloat, the patch changes 'shouldExplore' to do two things:
- Add a special case to compute the ordering between instructions when both are in the same basic block. It avoids the use of two expensive functions in this scenario: 'dominates' and 'isPotentiallyReachable'.
- Limit the search by using a threshold=1000 on the number of instructions to search. This limit presented no measurable regressions on the test-suite.
This lead to a compile time reduction of 'opt -O1' from ~12min to 1s.