[PDSE] Add a no-op pass.
Needs RevisionPublic

Authored by bryant on Feb 11 2017, 10:26 AM.

Details

Summary

This registers a skeleton pass.

Diff Detail

Repository
rL LLVM
bryant created this revision.Feb 11 2017, 10:26 AM
davide requested changes to this revision.Feb 12 2017, 12:48 PM

Have you considered integrating this in the current DSE infrastructure instead of having a separate pass?

include/llvm/Analysis/MemoryLocation.h
121–122 ↗(On Diff #88098)

This seems to be unrelated. Split?

include/llvm/Transforms/Scalar/PDSE.h
2–9

This needs copyright + doxygen comment as every other pass in LLVM.

lib/Transforms/Scalar/CMakeLists.txt
49–50

ASCII Unsorted, I think (here and everywhere else, D < a)

lib/Transforms/Scalar/PDSE.cpp
2–30

Thanks for the paper references. This needs to be changed to be similar to every other cpp file in LLVM (copyright etc..)

41–42

This TODO really belongs here or you can throw it away?

This revision now requires changes to proceed.Feb 12 2017, 12:48 PM

Also, it would be great if you can provide real examples of where this helps (e.g. how often do you expect this to trigger in practice, and why it matters).
I assume you experimented enough with this to have a complete'ish pass somewhere out-of-tree. If so, did you run this on something to measure the impact?
Side question: do you want this to be run as part of the default pipeline eventually? What's the compile time cost?

Also, it would be great if you can provide real examples of where this helps (e.g. how often do you expect this to trigger in practice, and why it matters).

I can provide cases :)

There are a lot of them.
It's PRE for stores.

I assume you experimented enough with this to have a complete'ish pass somewhere out-of-tree. If so, did you run this on something to measure the impact?

I mentioned a lot of this to bryant yesterday, that these are things that we'd have to work through.

Side question: do you want this to be run as part of the default pipeline eventually? What's the compile time cost?

Staring at it, i'm pretty confident the compile time cost can be made minimal
The biggest cost is going to be the IDF calculation.

We can use one of the multiple-variable phi placement algorithms, or actually use the incremental construction algorithm from the new SSA updater, on the reverse graph, to place lambdas exactly where they are needed
(this is a bit trickier, but lowest cost option).

If *all* of that fails, we can make it non-sparse and solve for all variables at once.

bryant updated this revision to Diff 88273.Feb 13 2017, 3:57 PM
  • Insert PDSE.cpp into the right place in CMakeLists.txt.
  • Add copyright headers to impl and header files.
  • Re-worded file-level summary a bit.
  • -print-pdse is now -print-frg because that makes a bit more sense.
  • Explain why BreakCriticalEdges is required.
bryant marked 3 inline comments as done.Feb 13 2017, 4:08 PM
bryant added inline comments.
lib/Transforms/Scalar/PDSE.cpp
59

This TODO really belongs here or you can throw it away?

SSAPRE (resp. PDSE) needs critical edges to be broken in order for down-unsafety (resp. up-unsafety) propagation to be correct. I'm not sure if this condition is already met by default. I've tried adding BreakCriticalEdges as dependency, but this causes a crash in the legacy pass manager:

diff --git a/lib/Transforms/Scalar/PDSE.cpp b/lib/Transforms/Scalar/PDSE.cpp
index 941b184..f576260 100644
--- a/lib/Transforms/Scalar/PDSE.cpp
+++ b/lib/Transforms/Scalar/PDSE.cpp
@@ -723,11 +723,13 @@ public:
   void getAnalysisUsage(AnalysisUsage &AU) const override {
     AU.addRequired<PostDominatorTreeWrapperPass>();
     AU.addRequired<AAResultsWrapperPass>();
+    AU.addRequiredID(BreakCriticalEdgesID);
     AU.addRequired<TargetLibraryInfoWrapperPass>();
 
     AU.setPreservesCFG();
     AU.addPreserved<PostDominatorTreeWrapperPass>();
     AU.addPreserved<GlobalsAAWrapperPass>();
+    AU.addPreservedID(BreakCriticalEdgesID);
   }
 
   static char ID; // Pass identification, replacement for typeid
@@ -741,6 +743,7 @@ INITIALIZE_PASS_BEGIN(PDSELegacyPass, "pdse", "Partial Dead Store Elimination",
 INITIALIZE_PASS_DEPENDENCY(PostDominatorTreeWrapperPass)
 INITIALIZE_PASS_DEPENDENCY(AAResultsWrapperPass)
 INITIALIZE_PASS_DEPENDENCY(GlobalsAAWrapperPass)
+INITIALIZE_PASS_DEPENDENCY(BreakCriticalEdges)
 INITIALIZE_PASS_DEPENDENCY(TargetLibraryInfoWrapperPass)
 INITIALIZE_PASS_END(PDSELegacyPass, "pdse", "Partial Dead Store Elimination",
                     false, false)
Args: opt -debug -debug-pass=Details -print-before-all -print-after-all -print-frg -pdse test.ll 
 -- 'Break critical edges in CFG' is not preserving 'Function Alias Analysis Results'
 -- 'Break critical edges in CFG' is not preserving 'Print function to stderr'
 -- 'Break critical edges in CFG' is not preserving 'Post-Dominator Tree Construction'
 -- 'Break critical edges in CFG' is not preserving 'Basic Alias Analysis (stateless AA impl)'
 -- 'Break critical edges in CFG' is not preserving 'Function Pass Manager'
Pass Arguments:  -targetlibinfo -tti -assumption-cache-tracker -postdomtree -domtree -basicaa -aa -print-function -break-crit-edges -print-function -print-function
Target Library Information
Target Transform Information
Assumption Cache Tracker
  ModulePass Manager
    FunctionPass Manager
      Post-Dominator Tree Construction
      Dominator Tree Construction
      Basic Alias Analysis (stateless AA impl)
      Function Alias Analysis Results
      Print function to stderr
      Break critical edges in CFG
      Print function to stderr
      Print function to stderr
Unable to schedule 'Post-Dominator Tree Construction' required by 'Partial Dead Store Elimination'
Unable to schedule pass
UNREACHABLE executed at /home/bryant/3rd/llvm/lib/IR/LegacyPassManager.cpp:1243!
#0 0x00007ff310f4d989 llvm::sys::PrintStackTrace(llvm::raw_ostream&) /home/bryant/3rd/llvm/build/../lib/Support/Unix/Signals.inc:402:11
#1 0x00007ff310f4dba9 PrintStackTraceSignalHandler(void*) /home/bryant/3rd/llvm/build/../lib/Support/Unix/Signals.inc:466:1
#2 0x00007ff310f4aaa7 llvm::sys::RunSignalHandlers() /home/bryant/3rd/llvm/lib/Support/Signals.cpp:0:5
#3 0x00007ff310f4df30 SignalHandler(int) /home/bryant/3rd/llvm/build/../lib/Support/Unix/Signals.inc:256:1
#4 0x00007ff3102988d0 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0xf8d0)
#5 0x00007ff30f8ea067 gsignal (/lib/x86_64-linux-gnu/libc.so.6+0x35067)
#6 0x00007ff30f8eb448 abort (/lib/x86_64-linux-gnu/libc.so.6+0x36448)
#7 0x00007ff310e264a0 LLVMInstallFatalErrorHandler /home/bryant/3rd/llvm/lib/Support/ErrorHandling.cpp:133:0
#8 0x00007ff312250664 /home/bryant/3rd/llvm/lib/IR/LegacyPassManager.cpp:1243:3
#9 0x00007ff31224f288 llvm::PMDataManager::add(llvm::Pass*, bool) /home/bryant/3rd/llvm/lib/IR/LegacyPassManager.cpp:1018:22
#10 0x00007ff31225374a llvm::FunctionPass::assignPassManager(llvm::PMStack&, llvm::PassManagerType) /home/bryant/3rd/llvm/lib/IR/LegacyPassManager.cpp:1858:1
#11 0x00007ff31224c4b5 llvm::PMTopLevelManager::schedulePass(llvm::Pass*) /home/bryant/3rd/llvm/lib/IR/LegacyPassManager.cpp:690:7
#12 0x00007ff31225e12f llvm::legacy::PassManagerImpl::add(llvm::Pass*) /home/bryant/3rd/llvm/lib/IR/LegacyPassManager.cpp:398:3
#13 0x00007ff312252e11 llvm::legacy::PassManager::add(llvm::Pass*) /home/bryant/3rd/llvm/lib/IR/LegacyPassManager.cpp:1719:1
#14 0x0000000000270a3f (opt+0x270a3f)
#15 0x000000000026c2f8 (opt+0x26c2f8)
#16 0x00007ff30f8d6b45 __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21b45)
#17 0x0000000000242029 (opt+0x242029)
Stack dump:
0.      Program arguments: opt -debug -debug-pass=Details -print-before-all -print-after-all -print-frg -pdse test.ll

If need be, I could fix this problem first before committing the patch.

Also, it would be great if you can provide real examples of where this helps (e.g. how often do you expect this to trigger in practice, and why it matters).

I can provide cases :)

There are a lot of them.
It's PRE for stores.

I assume you experimented enough with this to have a complete'ish pass somewhere out-of-tree. If so, did you run this on something to measure the impact?

I mentioned a lot of this to bryant yesterday, that these are things that we'd have to work through.

Side question: do you want this to be run as part of the default pipeline eventually? What's the compile time cost?

It should run faster than the existing DSE, which does a lot of quadratic complexity clobber checks. PDSE only makes linear time passes (apart from the initial pass that collects and categorizes occurrences, which does a quadratic number of alias checks).

Staring at it, i'm pretty confident the compile time cost can be made minimal
The biggest cost is going to be the IDF calculation.

Some additional avenues for speed-ups:

  • I think I can re-use MemorySSA's AccessLists (instead of doing an initial roll through every single instruction like https://reviews.llvm.org/D29866 does now), since it already includes may-throws. I just need to walk each BB's list in reverse. The down side is that store insertions are a bit trickier since the MSSA structure ought to be preserved.
  • For lambda insertion, the def blocks need to be computed. This means make an additional pass through the function to search for kill-only blocks (since real occurrence blocks are already known), but https://reviews.llvm.org/D29866 makes this extra pass per occurrence class. I think this could instead be done in one pass for all classes.

Also, it would be great if you can provide real examples of where this helps (e.g. how often do you expect this to trigger in practice, and why it matters).

I can provide cases :)

There are a lot of them.
It's PRE for stores.

I assume you experimented enough with this to have a complete'ish pass somewhere out-of-tree. If so, did you run this on something to measure the impact?

I mentioned a lot of this to bryant yesterday, that these are things that we'd have to work through.

Side question: do you want this to be run as part of the default pipeline eventually? What's the compile time cost?

It should run faster than the existing DSE, which does a lot of quadratic complexity clobber checks. PDSE only makes linear time passes (apart from the initial pass that collects and categorizes occurrences, which does a quadratic number of alias checks).

Staring at it, i'm pretty confident the compile time cost can be made minimal
The biggest cost is going to be the IDF calculation.

Some additional avenues for speed-ups:

  • I think I can re-use MemorySSA's AccessLists (instead of doing an initial roll through every single instruction like https://reviews.llvm.org/D29866 does now), since it already includes may-throws. I just need to walk each BB's list in reverse. The down side is that store insertions are a bit trickier since the MSSA structure ought to be preserved.
  • For lambda insertion, the def blocks need to be computed. This means make an additional pass through the function to search for kill-only blocks (since real occurrence blocks are already known), but https://reviews.llvm.org/D29866 makes this extra pass per occurrence class. I think this could instead be done in one pass for all classes.

It is possible to do it once per pass at the cost of over-computing the set, and ending up with useless lambdas that you need to eliminate.
Otherwise, you can also just do it on the fly using the dual of http://www.cdl.uni-saarland.de/papers/bbhlmz13cc.pdf

filcab added a subscriber: filcab.Apr 3 2017, 6:52 AM

Just some minor comments/questions and some nitpicking.
Thanks for working on this!
Filipe

include/llvm/Transforms/Scalar/PDSE.h
1

Missing space.

lib/Transforms/Scalar/PDSE.cpp
43

From this TODO, it seems for the foreseeable future (first iterations and commits of this pass), it's not going to cover partial overwrites like D30703 is adding.
Please let me know if my understanding is correct, and if you have plans for adding partial overwrite tracking "soon" (as in: there's already some work underway, or a plan on how to do it), or if it's more of a "after pdse gets in, we might start working on that".

This will help figure out if D30703 should wait for PDSE, or be committed with maybe a TODO mentioning PDSE and that it might not be needed in the future.

64

Maybe make the option be -print-pdse-frg (print is more important) or -pdse-print-frg (pdse is more important), so it's easier to tell where the option will be active. And/or add a PDSE reference to the description.

75

No brackets after return, please.
Maybe print this last one above the other one, and just have one return false, even.

This revision now requires changes to proceed.Aug 24 2017, 2:15 PM

dannyb asked me to take over the review.
Can you fix the issues that filcab identified? I'll see if I can get you an answer about critical edge splitting.