PGOMemOPSize only modifies CFG in a couple of places; thus we can preserve the DominatorTree with little effort.
When optimizing SQLite with -O3, this patch can decrease 3.8% of the numbers of nodes traversed by DFS and 5.7% of the times DominatorTreeBase::recalculation is called.
Details
Diff Detail
Event Timeline
lib/Transforms/Instrumentation/PGOMemOPSizeOpt.cpp | ||
---|---|---|
343 | Why not use DTU for the whole thing ;) | |
432–434 | We can use getAnalysisIfAvailable to only get the DT if it's already been created by some other pass in the pipeline. That way we can preserve it if it already exists, but don't have to recalculate it if not. | |
444–445 | I think the new pass manager equivalent to getAnalysisIfAvailable is getCachedResult<..>(F). |
lib/Transforms/Instrumentation/PGOMemOPSizeOpt.cpp | ||
---|---|---|
343 | SplitBlock currently only accepts DT. I'll take care of making these APIs consistent later. |
The updates looks correct to me. Can you add a test for pgo-memop-opt that preserves the DT and verifies it's correct.
lib/Transforms/Instrumentation/PGOMemOPSizeOpt.cpp | ||
---|---|---|
343 | That sounds sensible. To do this bit at a time. If/when you do try to convert SplitBlock to take a DTU, I think it may be best to just convert it to use the incremental updater, not using what DominatorTreeBase::splitBlock does right now. See D14723 if you want to make PDT's work with splitBlock, but I suspect the incremental updater will easier and not much slower (especially if the updates can be done lazily). | |
435 | This is often done with "DominatorTree *DT = DTWP ? &DTWP->getDomTree() : nullptr;" to put it all on a single line. |
lib/Transforms/Instrumentation/PGOMemOPSizeOpt.cpp | ||
---|---|---|
343 | For converting splitBlock to use the incremental updater, because DTU can return the current UpdateStratrgy, I think we can implement the code to use the incremental way when the strategy is lazy and use the current approach + PDT support when the strategy is eager. I used this approach in D48967 (llvm::MergeBasicBlockIntoOnlyPred and llvm::MergeBlockIntoPredecessor). If some timing/benchmarking shows that the performance won't be affected too much, we can then make splitBlock to use the incremental updater under both updateStrategies. |
LGTM
test/Transforms/PGOProfile/memop_clone.ll | ||
---|---|---|
1 | pgo-memop-opt requires LoopInfo, which requires DomTree, right? So this will test preserving the DT. SGTM. |
test/Transforms/PGOProfile/memop_clone.ll | ||
---|---|---|
1 | Yes. I also tried with an invalid DomTree, it indeed asserts as expected. |
Why not use DTU for the whole thing ;)