The ThinLTO pre-link compile modifies several optimizations when
building with SamplePGO, in order to get better matching of the sample
profile when it is re-applied in the backend. The same tuning should be
applied to full LTO pre-links as well, since presumably the same
matching issues can occur there.
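
As a rough illustration of the intent, the check that gates the pre-link tuning would cover both LTO flavors rather than ThinLTO alone. The sketch below is not the actual pass manager code: PreLinkFlags, HasSampleProfile and deferLoopTransformsToBackend are hypothetical names used only for illustration, and only the PrepareForThinLTO and PrepareForLTO names are taken from the old PM.

```cpp
#include <cassert>

// Hypothetical stand-in for the relevant pipeline options. Only the
// PrepareForThinLTO / PrepareForLTO names mirror the old PM flags;
// HasSampleProfile is an assumed simplification of "SamplePGO is enabled".
struct PreLinkFlags {
  bool PrepareForThinLTO;
  bool PrepareForLTO;
  bool HasSampleProfile;
};

// Returns true when the pre-link compile should hold back the affected
// transforms so the sample profile matches when re-applied in the backend.
// Before this change, the condition would only consider PrepareForThinLTO.
constexpr bool deferLoopTransformsToBackend(const PreLinkFlags &F) {
  return (F.PrepareForThinLTO || F.PrepareForLTO) && F.HasSampleProfile;
}

// Full LTO pre-link with SamplePGO now gets the same treatment as ThinLTO.
static_assert(deferLoopTransformsToBackend(PreLinkFlags{false, true, true}),
              "full LTO pre-link with SamplePGO should defer");
```

Deferring is only safe if the backend actually runs the deferred transforms, which is the problem described next.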
There are a few issues with this patch as it stands: not all of these
optimizations are attempted again in the full LTO backend, owing to the
larger compile time of monolithic LTO.
Specifically, this includes some loop optimizations:
- In the old PM, LoopUnrollAndJam is not done in the full LTO backend.
- In the new PM, none of the loop unrolling is done in the full LTO
backend. The comments indicate that this is in part due to issues with
the new PM loop pass manager (presumably sorted out by now, but I
haven't followed this). Here is the comment:
  // FIXME: at this point, we run a bunch of loop passes:
  // indVarSimplify, loopDeletion, loopInterchange, loopUnroll,
  // loopVectorize. Enable them once the remaining issue with LPM
  // are sorted out.
So what still needs to happen is either:
- Continue to diverge the ThinLTO and full LTO pre-link pipelines for
these optimizations (which means this patch needs to be adjusted).
OR
- Fix the full LTO post-link pipelines to ensure these optimizations
all occur there.
Does this also need to be PrepareForThinLTO || PrepareForLTO for the old PM?