hasPredecessorHelper method, that is used by DAGCombiner to combine to pre-indexed and post-indexed load/stores, is a major source of slowdown while compiling a large function with MSan enabled on Arm. This patch caps the DFS-graph traversal for this method to 8192 which cuts compile time by 50% (4m -> 2m compile time) at the cost of less overall nodes combined.
Here's the summary of pre-index DAG nodes created and time it took to compile the pathological case with different MaxDepth limit:
- With MaxDepth = 0 (unlimited): 1800, took 4m
- With MaxDepth = 32k, 560, took 2m31s
- With MaxDepth = 8k, 139, took 2m.