In a generated test case the compile does not finish in LVI. It is a big
initializing function with ~150K of code consisting mostly of allocas and
direct/indirect calls. The LVI solver essentially pushes the allocas with
undefined lattice value recursively on the block stack and fails to derive a
value (or bottom). In the bigger picture the core problem is that the value
range problem is not solved as a forward problem with well defined top, bottom,
meet etc. Instead the solver could walk the CFG spontaneously in any
direction and push values on the block stack *while* it tries to determine the meet
for each value. The tactical fix here is to suppress this scheme when
the value is an alloca. In this case the value can be determined to "not null".
This reduces the compile-time to less than 2 minutes.
While looking at the code I also found a few snippets, and for some them ask for
clarication. I marked them as FIXME's. Please chip in when you remember
something about the code.
Longer term I think this pass needs to be rewritten as forward dfa for const
ranges. Currently I don't think one can guarantee that the LVI algorithm
terminates in a reasonable time more compile-time problems lurking.
Details
Diff Detail
Event Timeline
Who can take a look at this? To illustrate the issue I added a basic dumper:
- lib/Analysis/LazyValueInfo.cpp (revision 263867)
+++ lib/Analysis/LazyValueInfo.cpp	(working copy)
@@ -565,10 +565,13 @@
 }
void LazyValueInfoCache::solve() {
+int count=1;
while (!BlockValueStack.empty()) {
  std::pair<BasicBlock*, Value*> &e = BlockValueStack.top();
  assert(BlockValueSet.count(e) && "Stack value should be in BlockValueSet!");+count++;
+if (count > 10) errs() <<  "POP " << *e.second << " in " << e.first->getName() << " = " << getCachedValueInfo(e.second, e.first) << "\n";
if (solveBlockValue(e.second, e.first)) {
  // The work item was completely processed.
  assert(BlockValueStack.top() == e && "Nothing should have been pushed!");After about *16* hours (and still happily running) it dumps about 500k lines like:
POP   %ref.tmp74111 = alloca i32, align 4 in invoke.cont74104 = undefined
POP   %ref.tmp74111 = alloca i32, align 4 in if.then.i.i.i207266 = undefined
POP   %ref.tmp74111 = alloca i32, align 4 in _ZNSt3110shared_ptrIN9TlvBuffer10TlvElementEED1Ev.exit207271 = undefined
POP   %ref.tmp74111 = alloca i32, align 4 in invoke.cont74108 = undefined
POP   %ref.tmp74112 = alloca i64, align 8 in invoke.cont74080 = undefined
POP   %ref.tmp74112 = alloca i64, align 8 in invoke.cont74085 = undefined
POP   %ref.tmp74112 = alloca i64, align 8 in if.then.i.i.i207244 = undefined
POP   %ref.tmp74112 = alloca i64, align 8 in _ZNSt3110shared_ptrIN9TlvBuffer10TlvElementEED1Ev.exit207249 = undefined
POP   %ref.tmp74112 = alloca i64, align 8 in invoke.cont74089 = undefined
POP   %ref.tmp74112 = alloca i64, align 8 in invoke.cont74094 = undefined
POP   %ref.tmp74112 = alloca i64, align 8 in if.then.i.i.i207255 = undefined
POP   %ref.tmp74112 = alloca i64, align 8 in _ZNSt3110shared_ptrIN9TlvBuffer10TlvElementEED1Ev.exit207260 = undefined
POP   %ref.tmp74112 = alloca i64, align 8 in invoke.cont74098 = undefined
POP   %ref.tmp74112 = alloca i64, align 8 in invoke.cont74104 = undefined
POP   %ref.tmp74112 = alloca i64, align 8 in if.then.i.i.i207266 = undefined
POP   %ref.tmp74112 = alloca i64, align 8 in _ZNSt3110shared_ptrIN9TlvBuffer10TlvElementEED1Ev.exit207271 = undefined
POP   %ref.tmp74112 = alloca i64, align 8 in invoke.cont74108 = undefined
POP   %ref.tmp74120 = alloca i32, align 4 in invoke.cont74089 = undefined
POP   %ref.tmp74120 = alloca i32, align 4 in invoke.cont74094 = undefined
I think we can make the file exposing this problem part of the compile-time test suite.
The patch offers a conservative fix in the data flow sense.
Thanks
Gerolf
Marking as request changes mostly to get it off my queue. Feel free to reverse once you're ready for more discussion.
| lib/Analysis/LazyValueInfo.cpp | ||
|---|---|---|
| 64 | FALSE - constantrange applies only to integers. constant/notconstant apply only to non-integers. Yes, that is very badly abstracted in the current code. | |
| 207 | True, but it does? I don't see your point. | |
| 208 | TRUE, but probably not useful in practice. Meeting dead code is relatively rare. | |
| 246 | We have proven conflicting paths along two paths. The is actually proving that a particular codepath is unreachable. We could make this as undefined, but the overdefined result is conservatively correct. | |
| 769 | Your change here is reasonable, but your reasoning scares me. What makes allocas special in this example? Just that we have a lot of them? Would the same test case with a bunch of globals be equally bad? p.s. We can actually prove constant/not-constant facts about pointers. This is useful when simplifying code involving equality checks between allocas and addressed obtained from globals. Given this, I'm not entirely convinced this is a good idea. I'm open to debating the compile time/precision tradeoff here, but it is a tradeoff which needs to be acknowledged and documented. | |
| 799 | You have missed a key detail of LVI. "false" means go do more recurse work, and recall this function when another edge might be available. This code is checking to see if any known input forces us to overdefined to avoid recursing in that common case. | |
| 843 | Same as above. | |
| 901 | Can you clarify your question? This snippet seems to match the idiom used throughout based on a quick scan of the code. | |
| 1261 | Because this is the outer interface which starts the recursive walk, not the inner step of the recursion. If you want a more detailed explanation of the code, let me know. I'll check something in which updates comments to make this more clear. | |
This should also take care of http://reviews.llvm.org/D18066.
| lib/Analysis/LazyValueInfo.cpp | ||
|---|---|---|
| 64 | Ok, added a note with that comment instead. | |
| 246 | Got it, thanks. The value should bottom (overdefined) in this case, so that is fine. | |
| 769 | "scares me". I think the crux is that values are pushed on the stack while the solver is running (see the call chain solve -> etc above) . This is not the standard way how const prop/range solvers work and where the monotonicity assumption is violated. I believe but have no proof that the alloca's are only exposing the deeper problem. What we eventually will need here is a standard (forward) data flow solver that will prevent such problems. | |
| 799 | Say you have three edges. Assume getEdgeValue() returns false, overdefined , false. The result of the function is false. Assume getEdgeValue returns overdefined on the first edge. The result of the function is true, independent of the other edges. That doesn't look right. The order of the edges should not impact the result. I think this is the intent: But this is not what is implemented (and my code proposal didn't do any better here, so forget about it). | |
| 901 | Ok, this is the example you answered below. | |
Mostly removed some of my questions and added a few comments based on Philip's
review.
Philip
the only code change is
+   FIXME: this is to solve a compile-time problem.
+   In a test case with thousands of alloca's and
+   indiect calls the solver pushed the allocas as undefined
+   on the stack and tries to "solve" them. This seems to
+   triggered that the value range problem in the current algorithm
+   is not solved as a forward problem with well defined top, bottom, meet
+   etc. Instead it may walk the CFG spontaneously in any direction and push
+   values on the block stack while it tries to determine the lattice values
+   of the values on the stack. This is illustrated in the current routine:
+   solve -> solveBlockValue -> solveBlockValueNonLocal, which (see below)
+   may call getEdgeValue, which in turn could push more values on the stack
+   This code fixes the test cases with the alloca's reducing compile-time
+   for that ~150K line funciton from "inifinite" to a couple of minutes.
+   The rational is that "not null" is the best we can do for a stackaddress.
+
+  if (isa<AllocaInst>(Val)) {
+    assert(NotNull && "Stackaddress cannot be zero\n");
+    PointerType *PTy = cast<PointerType>(Val->getType());
+    Result = LVILatticeVal::getNot(ConstantPointerNull::get(PTy));
+    BBLV = Result;
+    return true;
+  }
+
I think you were ok with it in your first review. Can I get a LGTM just for that? I will then open a new review for remaining questions.
Thanks
Gerolf
I'm 99% sure this patch is no longer needed after "267642: [LVI] Reduce compile time by lazily scanning blocks if needed.". Can you test and confirm?
| lib/Analysis/LazyValueInfo.cpp | ||
|---|---|---|
| 65 | I landed a change which clarifies this. Please rebase. | |
| 246 | I re-read my previous comment and realized I was wrong. This code path is in the conservative meet (i.e. merge two paths at a phi). Conflicting facts here merely imply we can't reason precisely about the values flowing through the merge. It does not imply unreachable code. Sorry for the confusion. | |
Yes, I had already kicked off a build with your patches. Unfortunately it does not (at least no significantly) fix the issue and the test case is still running (for about 30 min now). My fix brings CT down to 2 min on an RA compiler.
Surprising.
Geoff, any chance you could share a test case? I'd really like to understand why this is taking so long.
p.s. I think my earlier comment was unclear. Assuming null pointer case in http://reviews.llvm.org/D18066 resolves your compile time problem, I plan on landing that one and not this one. The effect should be roughly the same, but yours is a much deeper hack in the algorithm itself.
Philip,
can you share data about your compile-time improvements? Platform, benchmarks, options? Did you measure performance impact also?
Thanks
Gerolf
FYI, I just accepted an alternate patch to what I think is the same problem: https://reviews.llvm.org/D18066
If that doesn't work, we can return to this patch.
FALSE - constantrange applies only to integers. constant/notconstant apply only to non-integers. Yes, that is very badly abstracted in the current code.