The candidate collection method in the outliner can cause some dramatic compile-time regressions on large tests.
Currently, it works like this:
- Build a suffix tree over the instructions in the program
- Query the tree for good candidates
- Prune tree
- Go to step 2 until no candidates are found.
This hinges off the assumption that pruning the tree based off of overlapping candidates will reduce the search space enough that subsequent queries will take a negligible amount of time. Unfortunately, this assumption is shaky at best. Certain programs don't actually have many overlapping candidates. For very large tests, this is a big problem.
The new candidate collection method works like this:
- Build a suffix tree over the instructions in the program
- Iterate over each leaf in the tree.
- Visit the parent of each leaf. If the parent represents a beneficial string, save a candidate for it.
There are O(I) leaves, where I is the number of instructions in the program. Since we only walk it once, this is now guaranteed to be at worst O(I). Overlapping candidates are now handled entirely by the pruneOverlaps method that handled anything the suffix tree pruning didn't handle. Since that method is on average O(n), this behaves far better.
Improvements
Compile-time was measured by compiling the LLVM test suite for AArch64 using LNT. Clang with Oz and mno-red-zone was tested against clang with Oz, mno-red-zone, and -mllvm -enable-machine-outliner.
Using the old candidate collection method, the worst compile-time regression was a spectacular 284.07 seconds on MultiSource/Benchmarks/7zip/7zip-benchmark.test. This was followed up by MultiSource/Benchmarks/Bullet/bullet.test with a 257.96 second regression. On average, there was a 9.69 second compile-time regression and a median compile-time regression of 0.85 seconds.
Using the new candidate collection method, the worst compile-time regression was 0.542 seconds for MultiSource/Benchmarks/7zip/7zip-benchmark.test, followed up by 0.496 seconds for MultiSource/Benchmarks/Bullet/bullet.test. On average, there was a 0.01 second compile-time regression, with a median compile-time regression of 0.002s.
Changes to code size
This change doesn't impact the code size results. In fact, by collecting all potential candidates, we can probably make the outliner make better decisions for what to outline in the future.
Why not just phrase this routine as a single tree walk visiting the internal nodes instead of doing this thing where you walk the leaf vector looking at parents?