The regular and LTO pipelines diverge starting from the loop vectorizer and ending with instcombine cleanup, but I don't know why that should happen. This patch unifies the passes at that stage of the pipelines. The old pass manager is updated to keep it synchronized.
To prevent unintended divergence in the future, I created a helper function so that this part of the pipeline stays identical between regular and LTO. We can add an isLTO flag if we really want them to be different.
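As a rough illustration of the helper-function idea (not the code in this patch), here is a toy model where a pipeline is just an ordered list of pass names. The names addVectorStage, buildRegularPipeline, buildLTOPipeline, and tail are hypothetical, the setup entries are placeholders, and the pass order inside the helper is only illustrative:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a pass manager: an ordered list of pass names.
using Pipeline = std::vector<std::string>;

// Hypothetical shared helper. Both pipelines call it, so an edit here
// changes the regular and LTO pipelines in the same way.
// The pass names and their order are illustrative assumptions only.
void addVectorStage(Pipeline &P) {
  P.push_back("loop-vectorize");
  P.push_back("loop-load-elim");
  P.push_back("loop-unroll");
  P.push_back("slp-vectorizer");
  P.push_back("vector-combine");
  P.push_back("instcombine");
}

// Placeholder for the regular (non-LTO) pipeline.
Pipeline buildRegularPipeline() {
  Pipeline P;
  P.push_back("regular-only-setup"); // placeholder for earlier passes
  addVectorStage(P);
  return P;
}

// Placeholder for the LTO pipeline.
Pipeline buildLTOPipeline() {
  Pipeline P;
  P.push_back("lto-only-setup"); // placeholder for earlier passes
  addVectorStage(P);
  return P;
}

// Returns the last N entries of a pipeline, for comparing the shared stage.
Pipeline tail(const Pipeline &P, std::size_t N) {
  assert(N <= P.size() && "requested more passes than the pipeline holds");
  return Pipeline(P.end() - static_cast<std::ptrdiff_t>(N), P.end());
}
```

Comparing tail(buildRegularPipeline(), 6) with tail(buildLTOPipeline(), 6) yields equal lists in this sketch, because neither builder spells out the vector stage itself. An isLTO parameter on the helper would be the one place to introduce an intentional difference.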
The difference for the regular pipeline is that we unroll directly after the loop vectorizer instead of waiting until after SLP/VectorCombine. That eliminates the need for one stage of instcombine. This reduced compile-time by about 2.7%:
https://llvm-compile-time-tracker.com/?config=LegacyPM-O3&stat=instructions&remote=rotateright
The difference for the LTO pipeline is that we inherit a run of LoopLoadElimination and drop SCCP, InstCombine, BDCE, and AlignmentFromAssumptions. This reduced compile-time by about 1.8%:
https://llvm-compile-time-tracker.com/?config=LegacyPM-ReleaseLTO-g&stat=instructions&remote=rotateright
(You can see that the results for NewPM are similar, but I wasn't sure how to squash the intermediate experiments to show that directly in the tables.)
We may see perf regressions from these changes, but then we can add/move passes to recover and add tests to verify/document that the changes are intentional. Right now, we only have tests that show less unrolling for x86, but that doesn't actually seem like a bad thing to me.
These pipeline changes were suggested by the discussion in D100802 (but this patch doesn't change the LICM difference).
I'm not sure what this comment is trying to say exactly. I think it's coming from somewhere that is very old and out of date now.
My very high-level understanding of the pass pipeline, at least for non-LTO and in terms of loops, is that we do:
There are some extra simplifications that need to happen in between too. The last unrolling, especially on smaller in-order cores, has nothing really to do with vectorization. It's done near the end of the pipeline because runtime unrolling isn't expected to help anything else, and up to that point we have not done any runtime unrolling.