In ThinLTO backend compile, OPTOptions are not set so that the ICP in ThinLTO backend does not know if it is a SamplePGO build, in which profile count needs to be annotated directly on call instructions. This patch cleaned up the PGOOptions handling logic and passes down PGOOptions to ThinLTO backend.
Details
Diff Detail
- Build Status
Buildable 8734 Build 8734: arc lint + arc unit
Event Timeline
lib/Passes/PassBuilder.cpp | ||
---|---|---|
765–767 | This is a *very* significant change in the PGO instrumentation pass ordering. Before, we would do non-trivial cleanup of the IR module before adding instrumentation passes. Now they get added before any of that cleanup. This includes thing such as the first (and primary) SSA formation pass, lowering @llvm.expect intrinsics into branch weight metadata, dead argument elimination, global-opt, etc. Was all of that intended? Is it really OK to instrument before those cleanups? |
Instr PGO has its own set of simplification passes, but still I think we should keep this patch NFC for instrPGO. If there are performance numbers that justify the InstrPGO pipeline changes, it can be discussed/done in a separate thread. This one should be left as SamplePGO only change.
lib/Passes/PassBuilder.cpp | ||
---|---|---|
765–767 | There are still some optimizations invoked in addPGOInstrPasses before the actual instrumentation: FunctionPassManager FPM; FPM.addPass(SROA()); FPM.addPass(EarlyCSEPass()); // Catch trivial redundancies. FPM.addPass(SimplifyCFGPass()); // Merge & remove basic blocks. FPM.addPass(InstCombinePass()); // Combine silly sequences. invokePeepholeEPCallbacks(FPM, Level); CGPipeline.addPass(createCGSCCToFunctionPassAdaptor(std::move(FPM))); MPM.addPass(createModuleToPostOrderCGSCCPassAdaptor(std::move(CGPipeline))); If we invoke this in the original location, SROA/EarlyCSE/SimplifyCFG will be invoked twice before entering PGO instr/use pass, seems redundant to me. I've also tested the instrumentation PGO on our largest benchmark, and did not see performance change. I know these EarlyFPM were to match the behavior of the legacy PM. But maybe it should not belong to buildModuleSimplificationPipeline, which is called by both regular pipeline, prelink, and thinlto default pipeline. At least thinlto default pipeline does not need these passes? |
The tricky part is that PGOOptions was not passed in the ThinLTO backend before this patch. But after this patch, it will be passed in. However, in buildModuleSimplificationPipeline, it cannot know if this is the ThinLTO backend. As a result, we cannot disable instrumentation/annotation pass in there.
The potential solution could be:
- add another boolean value in buildModuleSimplificationPipeline to indicate it's ThinLTO backend
- refactor the code from the beginning of buildModuleSimplificationPipeline to the PGOOpt handling to a separate function, and call it separately in the caller of buildModuleSimplificationPipeline.
Any suggestions on how to move forward?
Thanks,
Dehao
FWIW, I do agree with you Dehao that we seem to be deing redundant simplification, but as David says here we shouldn't change that behavior in this patch.
The tricky part is that PGOOptions was not passed in the ThinLTO backend before this patch. But after this patch, it will be passed in. However, in buildModuleSimplificationPipeline, it cannot know if this is the ThinLTO backend. As a result, we cannot disable instrumentation/annotation pass in there.
The potential solution could be:
- add another boolean value in buildModuleSimplificationPipeline to indicate it's ThinLTO backend
- refactor the code from the beginning of buildModuleSimplificationPipeline to the PGOOpt handling to a separate function, and call it separately in the caller of buildModuleSimplificationPipeline.
I actually think the second option might be interesting, but before we do that I think understanding how much redundancy there really is in the cleanup passes is important. That will help inform how to refactor these into separate components, or if there even *are* separate components. So I largely agree w/ David that something like #1 is the best path forward.
However, rather than adding another boolean value though, I think you should turn the current boolean into an enumeration for ThinLTO phase; "none" by default, and then with beth prelink and postlink values.
Note that I don't think we should call the phase "backend" or "codegen". Those are heavily overloaded in LLVM, so I think pre/post-link is a better terminology to use.
-Chandler
Thanks Chandler and David for the suggestion.
I've sent a separate patch https://reviews.llvm.org/D36053 to refactor the code. The patch is NFC, but I'd like to ask for your opinions on the naming/description to see if it's concise. Thanks in advance for comments.
Looks awesome, LGTM.
This does move ICP to pre-cleanup passes for non-ThinLTO builds with sample PGO. I assume you've tested it and this is fine. (It also moves the sample profile loading, but that one seems completely innocuous to me.)
Yes, verified that this does not affect non-ThinLTO performance. commit the patch now.
This is a *very* significant change in the PGO instrumentation pass ordering.
Before, we would do non-trivial cleanup of the IR module before adding instrumentation passes. Now they get added before any of that cleanup. This includes thing such as the first (and primary) SSA formation pass, lowering @llvm.expect intrinsics into branch weight metadata, dead argument elimination, global-opt, etc.
Was all of that intended? Is it really OK to instrument before those cleanups?