The regular and LTO pipelines diverge between the loop vectorizer and the final instcombine cleanup, and I don't know of a reason for that. This patch unifies the passes at that stage of both pipelines. The old pass manager is updated to keep it synchronized.
To prevent unintended divergence in the future, I created a helper function so that this part of the pipeline stays identical between regular and LTO. We can add an isLTO flag if we really want them to be different.
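As a rough illustration of the shared-helper idea (not the code in this patch), here is a toy model in which a pipeline is just an ordered list of pass names. The names `Pipeline`, `addVectorPasses`, `buildRegularVectorStage`, and `buildLTOVectorStage` are hypothetical, and the pass list is a simplified assumption based on the description in this message, not the exact sequence.

```cpp
#include <string>
#include <vector>

// Toy stand-in for a pass manager: an ordered list of pass names.
using Pipeline = std::vector<std::string>;

// Hypothetical shared helper. Both pipelines call it, so the vectorization
// stage cannot drift apart by accident. The pass list is simplified and
// assumed for illustration only.
void addVectorPasses(Pipeline &P) {
  P.push_back("LoopVectorize");
  P.push_back("LoopLoadElimination");
  P.push_back("LoopUnroll");    // unroll directly after loop vectorization
  P.push_back("SLPVectorizer");
  P.push_back("VectorCombine");
  P.push_back("InstCombine");   // single cleanup run at the end of the stage
}

// Vectorization stage as the regular pipeline would build it.
Pipeline buildRegularVectorStage() {
  Pipeline P;
  addVectorPasses(P);
  return P;
}

// Vectorization stage as the LTO pipeline would build it.
Pipeline buildLTOVectorStage() {
  Pipeline P;
  addVectorPasses(P);
  return P;
}
```

Because both builders go through the same function, any later edit to the stage applies to both pipelines. If they ever need to differ, a `bool` parameter on the helper would make that difference explicit at one site.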
The difference for the regular pipeline is that we unroll directly after the loop vectorizer instead of waiting until after SLP/VectorCombine. That eliminates the need for one stage of instcombine. This reduced compile-time by about 2.7%:
The difference for the LTO pipeline is that we inherit a run of LoopLoadElimination and drop SCCP, InstCombine, BDCE, and AlignmentFromAssumptions. This reduced compile-time by about 1.8%:
(You can see that the results for NewPM are similar, but I wasn't sure how to squash the intermediate experiments to show that directly in the tables.)
We may see perf regressions from these changes. If so, we can add or move passes to recover, and add tests to verify and document that the changes are intentional. Right now, we only have tests that show less unrolling for x86, but that doesn't actually seem like a bad thing to me.
These pipeline changes were suggested by the discussion in D100802 (but this patch doesn't change the LICM difference).