This registers a skeleton pass.
Diff Detail
- Repository
- rL LLVM
Event Timeline
Have you considered integrating this in the current DSE infrastructure instead of having a separate pass?
include/llvm/Analysis/MemoryLocation.h | ||
---|---|---|
121–122 | This seems to be unrelated. Split? | |
include/llvm/Transforms/Scalar/PDSE.h | ||
1–8 | This needs copyright + doxygen comment as every other pass in LLVM. | |
lib/Transforms/Scalar/CMakeLists.txt | ||
48–49 | ASCII Unsorted, I think (here and everywhere else, D < a) | |
lib/Transforms/Scalar/PDSE.cpp | ||
1–29 | Thanks for the paper references. This needs to be changed to be similar to every other cpp file in LLVM (copyright etc..) | |
40–41 | This TODO really belongs here or you can throw it away? |
Also, it would be great if you can provide real examples of where this helps (e.g. how often do you expect this to trigger in practice, and why it matters).
I assume you experimented enough with this to have a complete'ish pass somewhere out-of-tree. If so, did you run this on something to measure the impact?
Side question: do you want this to be run as part of the default pipeline eventually? What's the compile time cost?
I can provide cases :)
There are a lot of them.
It's PRE for stores.
I assume you experimented enough with this to have a complete'ish pass somewhere out-of-tree. If so, did you run this on something to measure the impact?
I mentioned a lot of this to bryant yesterday, that these are things that we'd have to work through.
Side question: do you want this to be run as part of the default pipeline eventually? What's the compile time cost?
Staring at it, i'm pretty confident the compile time cost can be made minimal
The biggest cost is going to be the IDF calculation.
We can use one of the multiple-variable phi placement algorithms, or actually use the incremental construction algorithm from the new SSA updater, on the reverse graph, to place lambdas exactly where they are needed
(this is a bit trickier, but lowest cost option).
If *all* of that fails, we can make it non-sparse and solve for all variables at once.
- Insert PDSE.cpp into the right place in CMakeLists.txt.
- Add copyright headers to impl and header files.
- Re-worded file-level summary a bit.
- -print-pdse is now -print-frg because that makes a bit more sense.
- Explain why BreakCriticalEdges is required.
lib/Transforms/Scalar/PDSE.cpp | ||
---|---|---|
60 |
SSAPRE (resp. PDSE) needs critical edges to be broken in order for down-unsafety (resp. up-unsafety) propagation to be correct. I'm not sure if this condition is already met by default. I've tried adding BreakCriticalEdges as dependency, but this causes a crash in the legacy pass manager: diff --git a/lib/Transforms/Scalar/PDSE.cpp b/lib/Transforms/Scalar/PDSE.cpp index 941b184..f576260 100644 --- a/lib/Transforms/Scalar/PDSE.cpp +++ b/lib/Transforms/Scalar/PDSE.cpp @@ -723,11 +723,13 @@ public: void getAnalysisUsage(AnalysisUsage &AU) const override { AU.addRequired<PostDominatorTreeWrapperPass>(); AU.addRequired<AAResultsWrapperPass>(); + AU.addRequiredID(BreakCriticalEdgesID); AU.addRequired<TargetLibraryInfoWrapperPass>(); AU.setPreservesCFG(); AU.addPreserved<PostDominatorTreeWrapperPass>(); AU.addPreserved<GlobalsAAWrapperPass>(); + AU.addPreservedID(BreakCriticalEdgesID); } static char ID; // Pass identification, replacement for typeid @@ -741,6 +743,7 @@ INITIALIZE_PASS_BEGIN(PDSELegacyPass, "pdse", "Partial Dead Store Elimination", INITIALIZE_PASS_DEPENDENCY(PostDominatorTreeWrapperPass) INITIALIZE_PASS_DEPENDENCY(AAResultsWrapperPass) INITIALIZE_PASS_DEPENDENCY(GlobalsAAWrapperPass) +INITIALIZE_PASS_DEPENDENCY(BreakCriticalEdges) INITIALIZE_PASS_DEPENDENCY(TargetLibraryInfoWrapperPass) INITIALIZE_PASS_END(PDSELegacyPass, "pdse", "Partial Dead Store Elimination", false, false) Args: opt -debug -debug-pass=Details -print-before-all -print-after-all -print-frg -pdse test.ll -- 'Break critical edges in CFG' is not preserving 'Function Alias Analysis Results' -- 'Break critical edges in CFG' is not preserving 'Print function to stderr' -- 'Break critical edges in CFG' is not preserving 'Post-Dominator Tree Construction' -- 'Break critical edges in CFG' is not preserving 'Basic Alias Analysis (stateless AA impl)' -- 'Break critical edges in CFG' is not preserving 'Function Pass Manager' Pass Arguments: -targetlibinfo -tti -assumption-cache-tracker -postdomtree -domtree -basicaa -aa -print-function -break-crit-edges -print-function -print-function Target Library Information Target Transform Information Assumption Cache Tracker ModulePass Manager FunctionPass Manager Post-Dominator Tree Construction Dominator Tree Construction Basic Alias Analysis (stateless AA impl) Function Alias Analysis Results Print function to stderr Break critical edges in CFG Print function to stderr Print function to stderr Unable to schedule 'Post-Dominator Tree Construction' required by 'Partial Dead Store Elimination' Unable to schedule pass UNREACHABLE executed at /home/bryant/3rd/llvm/lib/IR/LegacyPassManager.cpp:1243! #0 0x00007ff310f4d989 llvm::sys::PrintStackTrace(llvm::raw_ostream&) /home/bryant/3rd/llvm/build/../lib/Support/Unix/Signals.inc:402:11 #1 0x00007ff310f4dba9 PrintStackTraceSignalHandler(void*) /home/bryant/3rd/llvm/build/../lib/Support/Unix/Signals.inc:466:1 #2 0x00007ff310f4aaa7 llvm::sys::RunSignalHandlers() /home/bryant/3rd/llvm/lib/Support/Signals.cpp:0:5 #3 0x00007ff310f4df30 SignalHandler(int) /home/bryant/3rd/llvm/build/../lib/Support/Unix/Signals.inc:256:1 #4 0x00007ff3102988d0 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0xf8d0) #5 0x00007ff30f8ea067 gsignal (/lib/x86_64-linux-gnu/libc.so.6+0x35067) #6 0x00007ff30f8eb448 abort (/lib/x86_64-linux-gnu/libc.so.6+0x36448) #7 0x00007ff310e264a0 LLVMInstallFatalErrorHandler /home/bryant/3rd/llvm/lib/Support/ErrorHandling.cpp:133:0 #8 0x00007ff312250664 /home/bryant/3rd/llvm/lib/IR/LegacyPassManager.cpp:1243:3 #9 0x00007ff31224f288 llvm::PMDataManager::add(llvm::Pass*, bool) /home/bryant/3rd/llvm/lib/IR/LegacyPassManager.cpp:1018:22 #10 0x00007ff31225374a llvm::FunctionPass::assignPassManager(llvm::PMStack&, llvm::PassManagerType) /home/bryant/3rd/llvm/lib/IR/LegacyPassManager.cpp:1858:1 #11 0x00007ff31224c4b5 llvm::PMTopLevelManager::schedulePass(llvm::Pass*) /home/bryant/3rd/llvm/lib/IR/LegacyPassManager.cpp:690:7 #12 0x00007ff31225e12f llvm::legacy::PassManagerImpl::add(llvm::Pass*) /home/bryant/3rd/llvm/lib/IR/LegacyPassManager.cpp:398:3 #13 0x00007ff312252e11 llvm::legacy::PassManager::add(llvm::Pass*) /home/bryant/3rd/llvm/lib/IR/LegacyPassManager.cpp:1719:1 #14 0x0000000000270a3f (opt+0x270a3f) #15 0x000000000026c2f8 (opt+0x26c2f8) #16 0x00007ff30f8d6b45 __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21b45) #17 0x0000000000242029 (opt+0x242029) Stack dump: 0. Program arguments: opt -debug -debug-pass=Details -print-before-all -print-after-all -print-frg -pdse test.ll If need be, I could fix this problem first before committing the patch. |
It should run faster than the existing DSE, which does a lot of quadratic complexity clobber checks. PDSE only makes linear time passes (apart from the initial pass that collects and categorizes occurrences, which does a quadratic number of alias checks).
Staring at it, i'm pretty confident the compile time cost can be made minimal
The biggest cost is going to be the IDF calculation.
Some additional avenues for speed-ups:
- I think I can re-use MemorySSA's AccessLists (instead of doing an initial roll through every single instruction like https://reviews.llvm.org/D29866 does now), since it already includes may-throws. I just need to walk each BB's list in reverse. The down side is that store insertions are a bit trickier since the MSSA structure ought to be preserved.
- For lambda insertion, the def blocks need to be computed. This means make an additional pass through the function to search for kill-only blocks (since real occurrence blocks are already known), but https://reviews.llvm.org/D29866 makes this extra pass per occurrence class. I think this could instead be done in one pass for all classes.
It is possible to do it once per pass at the cost of over-computing the set, and ending up with useless lambdas that you need to eliminate.
Otherwise, you can also just do it on the fly using the dual of http://www.cdl.uni-saarland.de/papers/bbhlmz13cc.pdf
Just some minor comments/questions and some nitpicking.
Thanks for working on this!
Filipe
include/llvm/Transforms/Scalar/PDSE.h | ||
---|---|---|
2 | Missing space. | |
lib/Transforms/Scalar/PDSE.cpp | ||
44 | From this TODO, it seems for the foreseeable future (first iterations and commits of this pass), it's not going to cover partial overwrites like D30703 is adding. This will help figure out if D30703 should wait for PDSE, or be committed with maybe a TODO mentioning PDSE and that it might not be needed in the future. | |
65 | Maybe make the option be -print-pdse-frg (print is more important) or -pdse-print-frg (pdse is more important), so it's easier to tell where the option will be active. And/or add a PDSE reference to the description. | |
76 | No brackets after return, please. |
dannyb asked me to take over the review.
Can you fix the issues that filcab identified? I'll see if I can get you an answer about critical edge splitting.
This seems to be unrelated. Split?