Page MenuHomePhabricator

[LazyValueInfo] Fix for a nasty compile-time problem with questions
Needs ReviewPublic

Authored by Gerolf on Apr 11 2016, 6:54 PM.

Details

Reviewers
reames
hfinkel
Summary

In a generated test case the compile does not finish in LVI. It is a big
initializing function with ~150K of code consisting mostly of allocas and
direct/indirect calls. The LVI solver essentially pushes the allocas with
undefined lattice value recursively on the block stack and fails to derive a
value (or bottom). In the bigger picture the core problem is that the value
range problem is not solved as a forward problem with well defined top, bottom,
meet etc. Instead the solver could walk the CFG spontaneously in any
direction and push values on the block stack *while* it tries to determine the meet
for each value. The tactical fix here is to suppress this scheme when
the value is an alloca. In this case the value can be determined to "not null".
This reduces the compile-time to less than 2 minutes.
While looking at the code I also found a few snippets, and for some them ask for
clarication. I marked them as FIXME's. Please chip in when you remember
something about the code.
Longer term I think this pass needs to be rewritten as forward dfa for const
ranges. Currently I don't think one can guarantee that the LVI algorithm
terminates in a reasonable time more compile-time problems lurking.

Diff Detail

Event Timeline

Gerolf updated this revision to Diff 53348.Apr 11 2016, 6:54 PM
Gerolf retitled this revision from to [LazyValueInfo] Fix for a nasty compile-time problem with questions.
Gerolf updated this object.
Gerolf added reviewers: hfinkel, reames.
Gerolf added a subscriber: llvm-commits.
bmakam added a subscriber: bmakam.Apr 13 2016, 6:38 PM

Who can take a look at this? To illustrate the issue I added a basic dumper:

  • lib/Analysis/LazyValueInfo.cpp (revision 263867)

+++ lib/Analysis/LazyValueInfo.cpp (working copy)
@@ -565,10 +565,13 @@
}

void LazyValueInfoCache::solve() {
+int count=1;

while (!BlockValueStack.empty()) {
  std::pair<BasicBlock*, Value*> &e = BlockValueStack.top();
  assert(BlockValueSet.count(e) && "Stack value should be in BlockValueSet!");

+count++;
+if (count > 10) errs() << "POP " << *e.second << " in " << e.first->getName() << " = " << getCachedValueInfo(e.second, e.first) << "\n";

if (solveBlockValue(e.second, e.first)) {
  // The work item was completely processed.
  assert(BlockValueStack.top() == e && "Nothing should have been pushed!");

After about *16* hours (and still happily running) it dumps about 500k lines like:

POP %ref.tmp74111 = alloca i32, align 4 in invoke.cont74104 = undefined
POP %ref.tmp74111 = alloca i32, align 4 in if.then.i.i.i207266 = undefined
POP %ref.tmp74111 = alloca i32, align 4 in _ZNSt3110shared_ptrIN9TlvBuffer10TlvElementEED1Ev.exit207271 = undefined
POP %ref.tmp74111 = alloca i32, align 4 in invoke.cont74108 = undefined
POP %ref.tmp74112 = alloca i64, align 8 in invoke.cont74080 = undefined
POP %ref.tmp74112 = alloca i64, align 8 in invoke.cont74085 = undefined
POP %ref.tmp74112 = alloca i64, align 8 in if.then.i.i.i207244 = undefined
POP %ref.tmp74112 = alloca i64, align 8 in _ZNSt3
110shared_ptrIN9TlvBuffer10TlvElementEED1Ev.exit207249 = undefined
POP %ref.tmp74112 = alloca i64, align 8 in invoke.cont74089 = undefined
POP %ref.tmp74112 = alloca i64, align 8 in invoke.cont74094 = undefined
POP %ref.tmp74112 = alloca i64, align 8 in if.then.i.i.i207255 = undefined
POP %ref.tmp74112 = alloca i64, align 8 in _ZNSt3110shared_ptrIN9TlvBuffer10TlvElementEED1Ev.exit207260 = undefined
POP %ref.tmp74112 = alloca i64, align 8 in invoke.cont74098 = undefined
POP %ref.tmp74112 = alloca i64, align 8 in invoke.cont74104 = undefined
POP %ref.tmp74112 = alloca i64, align 8 in if.then.i.i.i207266 = undefined
POP %ref.tmp74112 = alloca i64, align 8 in _ZNSt3
110shared_ptrIN9TlvBuffer10TlvElementEED1Ev.exit207271 = undefined
POP %ref.tmp74112 = alloca i64, align 8 in invoke.cont74108 = undefined
POP %ref.tmp74120 = alloca i32, align 4 in invoke.cont74089 = undefined
POP %ref.tmp74120 = alloca i32, align 4 in invoke.cont74094 = undefined

I think we can make the file exposing this problem part of the compile-time test suite.

The patch offers a conservative fix in the data flow sense.

Thanks
Gerolf

reames requested changes to this revision.Apr 18 2016, 5:32 PM
reames edited edge metadata.

Marking as request changes mostly to get it off my queue. Feel free to reverse once you're ready for more discussion.

lib/Analysis/LazyValueInfo.cpp
64

FALSE - constantrange applies only to integers. constant/notconstant apply only to non-integers. Yes, that is very badly abstracted in the current code.

207

True, but it does? I don't see your point.

208

TRUE, but probably not useful in practice. Meeting dead code is relatively rare.

246

We have proven conflicting paths along two paths. The is actually proving that a particular codepath is unreachable. We could make this as undefined, but the overdefined result is conservatively correct.

769

Your change here is reasonable, but your reasoning scares me. What makes allocas special in this example? Just that we have a lot of them? Would the same test case with a bunch of globals be equally bad?

p.s. We can actually prove constant/not-constant facts about pointers. This is useful when simplifying code involving equality checks between allocas and addressed obtained from globals. Given this, I'm not entirely convinced this is a good idea. I'm open to debating the compile time/precision tradeoff here, but it is a tradeoff which needs to be acknowledged and documented.

799

You have missed a key detail of LVI. "false" means go do more recurse work, and recall this function when another edge might be available. This code is checking to see if any known input forces us to overdefined to avoid recursing in that common case.

843

Same as above.

901

Can you clarify your question? This snippet seems to match the idiom used throughout based on a quick scan of the code.

1261

Because this is the outer interface which starts the recursive walk, not the inner step of the recursion. If you want a more detailed explanation of the code, let me know. I'll check something in which updates comments to make this more clear.

This revision now requires changes to proceed.Apr 18 2016, 5:32 PM

This should also take care of http://reviews.llvm.org/D18066.

lib/Analysis/LazyValueInfo.cpp
64

Ok, added a note with that comment instead.

246

Got it, thanks. The value should bottom (overdefined) in this case, so that is fine.

769

"scares me". I think the crux is that values are pushed on the stack while the solver is running (see the call chain solve -> etc above) . This is not the standard way how const prop/range solvers work and where the monotonicity assumption is violated. I believe but have no proof that the alloca's are only exposing the deeper problem. What we eventually will need here is a standard (forward) data flow solver that will prevent such problems.

799

Say you have three edges. Assume getEdgeValue() returns false, overdefined , false. The result of the function is false. Assume getEdgeValue returns overdefined on the first edge. The result of the function is true, independent of the other edges. That doesn't look right. The order of the edges should not impact the result. I think this is the intent:
walk all edges.
if any edge it overdefined: BBLV= overdefined. Return true.
Otherwise: merge all values. If any edge is missing return false else BBLV = merged value. return true.

But this is not what is implemented (and my code proposal didn't do any better here, so forget about it).

901

Ok, this is the example you answered below.

Gerolf updated this revision to Diff 54314.Apr 19 2016, 7:22 PM
Gerolf edited edge metadata.

Mostly removed some of my questions and added a few comments based on Philip's
review.

Philip

the only code change is

+ FIXME: this is to solve a compile-time problem.
+
In a test case with thousands of alloca's and
+ indiect calls the solver pushed the allocas as undefined
+
on the stack and tries to "solve" them. This seems to
+ triggered that the value range problem in the current algorithm
+
is not solved as a forward problem with well defined top, bottom, meet
+ etc. Instead it may walk the CFG spontaneously in any direction and push
+
values on the block stack while it tries to determine the lattice values
+ of the values on the stack. This is illustrated in the current routine:
+
solve -> solveBlockValue -> solveBlockValueNonLocal, which (see below)
+ may call getEdgeValue, which in turn could push more values on the stack
+
This code fixes the test cases with the alloca's reducing compile-time
+ for that ~150K line funciton from "inifinite" to a couple of minutes.
+
The rational is that "not null" is the best we can do for a stackaddress.
+
+ if (isa<AllocaInst>(Val)) {
+ assert(NotNull && "Stackaddress cannot be zero\n");
+ PointerType *PTy = cast<PointerType>(Val->getType());
+ Result = LVILatticeVal::getNot(ConstantPointerNull::get(PTy));
+ BBLV = Result;
+ return true;
+ }
+

I think you were ok with it in your first review. Can I get a LGTM just for that? I will then open a new review for remaining questions.

Thanks
Gerolf

I'm 99% sure this patch is no longer needed after "267642: [LVI] Reduce compile time by lazily scanning blocks if needed.". Can you test and confirm?

lib/Analysis/LazyValueInfo.cpp
65

I landed a change which clarifies this. Please rebase.

246

I re-read my previous comment and realized I was wrong. This code path is in the conservative meet (i.e. merge two paths at a phi). Conflicting facts here merely imply we can't reason precisely about the values flowing through the merge. It does not imply unreachable code. Sorry for the confusion.

Yes, I had already kicked off a build with your patches. Unfortunately it does not (at least no significantly) fix the issue and the test case is still running (for about 30 min now). My fix brings CT down to 2 min on an RA compiler.

Yes, I had already kicked off a build with your patches. Unfortunately it does not (at least no significantly) fix the issue and the test case is still running (for about 30 min now). My fix brings CT down to 2 min on an RA compiler.

Surprising.

Geoff, any chance you could share a test case? I'd really like to understand why this is taking so long.

p.s. I think my earlier comment was unclear. Assuming null pointer case in http://reviews.llvm.org/D18066 resolves your compile time problem, I plan on landing that one and not this one. The effect should be roughly the same, but yours is a much deeper hack in the algorithm itself.

Gerolf added a subscriber: Gerolf.Apr 27 2016, 11:33 AM

Philip,
can you share data about your compile-time improvements? Platform, benchmarks, options? Did you measure performance impact also?

Thanks
Gerolf

FYI, I just accepted an alternate patch to what I think is the same problem: https://reviews.llvm.org/D18066

If that doesn't work, we can return to this patch.