[JumpThreading] Use DT to avoid processing dead blocks
ClosedPublic

Authored by uabelho on Jun 13 2017, 1:19 AM.

Details

Summary

This fixes PR33357 where LVI recursed infinitely when used on dead blocks.

Now instead of the previous simple check where we just deemed a block dead
if it had no predecessors we now flush DDT and use DT to determine if
a block is really reachable from entry before processing it.

Diff Detail

Repository
rL LLVM
uabelho created this revision.Jun 13 2017, 1:19 AM

I don't know this code at all, but the change at least seems to solve PR33357. Opinions?

lib/Analysis/LazyValueInfo.cpp
1354 ↗(On Diff #102297)

I have no idea if "overdefined" is the proper thing to use here, or if we should set it to "undefined" (by calling the LVILatticeVal() constructor)? Any opinions?

efriedma edited reviewers, added: efriedma, dberlin; removed: eli.friedman.Jun 13 2017, 1:18 PM

Is there a reason the caller can't avoid making calls on unreachable blocks?

  1. It avoids having to hack up the analysis like this
  2. You are likely to discover more issues in this and other analysis
  3. It's faster anyway

Is there a reason the caller can't avoid making calls on unreachable blocks?

I can only refer to what Eli said in the PR:
https://bugs.llvm.org//show_bug.cgi?id=33357#c7

"The alternative is that specifically disallow calling into LazyValueInfo on dead code... but it would be much harder to handle in JumpThreading given the way the pass works."

(Myself I don't know JumpThreading and I don't know LazyValueInfo.)

davide added a subscriber: davide.Jun 14 2017, 12:41 AM

Is there a reason the caller can't avoid making calls on unreachable blocks?

I can only refer to what Eli said in the PR:
https://bugs.llvm.org//show_bug.cgi?id=33357#c7

"The alternative is that specifically disallow calling into LazyValueInfo on dead code... but it would be much harder to handle in JumpThreading given the way the pass works."

(Myself I don't know JumpThreading and I don't know LazyValueInfo.)

I would like to understand more about this from Eli.

The problem is that:
We currently allow unbounded craziness in unreachable blocks (which i still believe does not make sense compared to something like 'blocks without parents can have whatever they want, but when attached to a function, gotta be verifiable', but such is life). This means any sane analysis generally has to avoid going into them anyway, because really, any optimistic or fixpointing or ... analysis will break pretty easily in a myriad of ways.

It happens that LVI is not optimal, fixpointing, etc.

But at the same time, if you let jumpthreading ask LVI about unbounded craziness, you are now condemning everything LVI ever uses (simplifiers, other analysis) to try to handle it as well. Which is not really going to happen, it's just going to cause more bugs over time :P
So it's, IMHO, not a good choice if it can be avoided.

If the issue in JumpThreading is determining reachability at all times, that we can solve in the next few weeks with the dynamic dominators stuff (it will always, at all times, have correct reachability from entry block, as long as you tell it what edges you changed)
If it's something else, i'd love to know what so i can see if i can help.
Because at worst, i would have expected you to be able to build an LVI wrapper (well, LVI is already a wrapper anyway) for jumpthreading that checks reachability, says "no idea" if it's unreachable, and others, calls the regular LVI impl.
(LVI itself should already try to avoid walking edges back into forward unreachable blocks)
I would also expect, in practice, you could do something better.

If the issue in JumpThreading is determining reachability at all times

Yes, that's the primary issue. If we had some cheap way to make JumpThreading avoid visiting unreachable blocks, that would be fine. JumpThreading already has code to kill off unreachable blocks, but it only runs once per iteration of the pass (because it takes O(N) time, where N is the number of CFG edges in the function).

If the dynamic DomTree work solves that, that fine, but I'm not sure it makes sense to block this patch indefinitely on that.

If the issue in JumpThreading is determining reachability at all times

Yes, that's the primary issue. If we had some cheap way to make JumpThreading avoid visiting unreachable blocks, that would be fine. JumpThreading already has code to kill off unreachable blocks, but it only runs once per iteration of the pass (because it takes O(N) time, where N is the number of CFG edges in the function).

Can't you just tell LVI the blocks are dead without them being dead?

If the dynamic DomTree work solves that, that fine, but I'm not sure it makes sense to block this patch indefinitely on that.

FWIW: I expect that code to start being submitted in the next week :P
I don't expect it to take all that long.

(and to be clear, if you want to hack LVI a bit, i don't think it's the worst thing in the world compared to how LVI already is, just wanted to make sure we at least explored the options)

Can't you just tell LVI the blocks are dead without them being dead?

I'm not following your question.

The sequence here is that we constant-fold a terminator (deleting an edge), then continue iterating through all the blocks in the function, and eventually crash when LVI discovers a cycle in a loop which was made unreachable. Which blocks is JumpThreading supposed to tell LVI about?

Can't you just tell LVI the blocks are dead without them being dead?

I'm not following your question.

The sequence here is that we constant-fold a terminator (deleting an edge), then continue iterating through all the blocks in the function, and eventually crash when LVI discovers a cycle in a loop which was made unreachable. Which blocks is JumpThreading supposed to tell LVI about?

Sorry, yes, my question didn't make sense.
I wrote too quickly.
I stared at it later, and saw this.

I also stared at the unreachable code removal (which as you say, is O(cfg edges)).
The way it does this looks super-strange and expensive (it should be doable in O(blocks) time, trivially)
It also iterates jump threading in basically random order, which sadly means it will take more iterations, and has to check for cases that it wouldn't otherwise, and can't eagerly remove most unreachable blocks

Sigh, oh well.

anna added a comment.Jun 14 2017, 1:46 PM

The sequence here is that we constant-fold a terminator (deleting an edge), then continue iterating through all the blocks in the function, and eventually crash when LVI discovers a cycle in a loop which was made unreachable.

So, the main problem here is that in each iteration of Jump threading, we can find the trivially dead blocks, but not the complete set of unreachable blocks. Also, I'm assuming transitively detecting dead blocks after the call to processBlock does not help in this case?

In D34135#780666, @anna wrote:

The sequence here is that we constant-fold a terminator (deleting an edge), then continue iterating through all the blocks in the function, and eventually crash when LVI discovers a cycle in a loop which was made unreachable.

So, the main problem here is that in each iteration of Jump threading, we can find the trivially dead blocks, but not the complete set of unreachable blocks. Also, I'm assuming transitively detecting dead blocks after the call to processBlock does not help in this case?

The set of blocks it will find will depend on input order of basic blocks currently.
It is possible to increase it to find all non-cycle involved unreachable blocks by performing the processing in the optimal iteration order.
It should be possible to find all unreachable blocks (including those involved in cycle) as you go by forming SCC's and iterating them individually in the right order.

The "detection of dead blocks" is really a quick DCE algorithm that walks the actual instructions of blocks, etc.
Not "normal" structural detection of unreachable blocks.
(I admit to being mostly unsure why, it's not immediately clear to me why you can't structurally detect them all)

So repeated transitive detection would be kind of a band-aid. It may even be made to work.
But it would have a much worse timebound.

anna added a comment.Jun 15 2017, 8:03 AM
In D34135#780666, @anna wrote:

The sequence here is that we constant-fold a terminator (deleting an edge), then continue iterating through all the blocks in the function, and eventually crash when LVI discovers a cycle in a loop which was made unreachable.

So, the main problem here is that in each iteration of Jump threading, we can find the trivially dead blocks, but not the complete set of unreachable blocks. Also, I'm assuming transitively detecting dead blocks after the call to processBlock does not help in this case?

The set of blocks it will find will depend on input order of basic blocks currently.
It is possible to increase it to find all non-cycle involved unreachable blocks by performing the processing in the optimal iteration order.
It should be possible to find all unreachable blocks (including those involved in cycle) as you go by forming SCC's and iterating them individually in the right order.

Agreed. Also, there was even a discussion on changing the ordering for JumpThreading because of how the current ordering can cause certain blocks to be dead, and AA chokes on the PHIs in those blocks: https://bugs.llvm.org/show_bug.cgi?id=33142#c14
(This might be completely irrelevant for this patch, but it's around the same problem of identifying unreachability at all times in JumpThreading).

In D34135#781243, @anna wrote:
In D34135#780666, @anna wrote:

The sequence here is that we constant-fold a terminator (deleting an edge), then continue iterating through all the blocks in the function, and eventually crash when LVI discovers a cycle in a loop which was made unreachable.

So, the main problem here is that in each iteration of Jump threading, we can find the trivially dead blocks, but not the complete set of unreachable blocks. Also, I'm assuming transitively detecting dead blocks after the call to processBlock does not help in this case?

The set of blocks it will find will depend on input order of basic blocks currently.
It is possible to increase it to find all non-cycle involved unreachable blocks by performing the processing in the optimal iteration order.
It should be possible to find all unreachable blocks (including those involved in cycle) as you go by forming SCC's and iterating them individually in the right order.

Agreed. Also, there was even a discussion on changing the ordering for JumpThreading because of how the current ordering can cause certain blocks to be dead, and AA chokes on the PHIs in those blocks: https://bugs.llvm.org/show_bug.cgi?id=33142#c14
(This might be completely irrelevant for this patch, but it's around the same problem of identifying unreachability at all times in JumpThreading).

This is pretty much what i'm afraid of - patching around this won't fix the fact that it uses a bunch of analysis, and will likely use more over time.
We're just going to find more bugs.

In D34135#781243, @anna wrote:
In D34135#780666, @anna wrote:

The sequence here is that we constant-fold a terminator (deleting an edge), then continue iterating through all the blocks in the function, and eventually crash when LVI discovers a cycle in a loop which was made unreachable.

So, the main problem here is that in each iteration of Jump threading, we can find the trivially dead blocks, but not the complete set of unreachable blocks. Also, I'm assuming transitively detecting dead blocks after the call to processBlock does not help in this case?

The set of blocks it will find will depend on input order of basic blocks currently.
It is possible to increase it to find all non-cycle involved unreachable blocks by performing the processing in the optimal iteration order.
It should be possible to find all unreachable blocks (including those involved in cycle) as you go by forming SCC's and iterating them individually in the right order.

Agreed. Also, there was even a discussion on changing the ordering for JumpThreading because of how the current ordering can cause certain blocks to be dead, and AA chokes on the PHIs in those blocks: https://bugs.llvm.org/show_bug.cgi?id=33142#c14
(This might be completely irrelevant for this patch, but it's around the same problem of identifying unreachability at all times in JumpThreading).

Yes. As the person who did the analysis on that bug I decided to not go that direction (i.e. use a stack worklist) because I wasn't able to prove myself it was correct.
I've seen at least another example internally where JumpThreading goes bananas because of bad iteration ordering. Iterating SCC's topologically will solve at least part of these problems,
I think, but I'm afraid that's a much larger change. I wouldn't stop this from going in but at the same time it's likely Zhendong's fuzzer (or other fuzzers, or simply production code :) will uncover similar issues to this.

i.e. IMHO the long term solution is to rewrite JT but nobody seems to have the bandwidth to do it for now.

Great discussion guys!

So if I try to summarize:

  • Everyone agrees the real fix would be to rewrite JT, but no one has the time to do it.
  • There are probably many more problems in JT just waiting to be exposed.
  • The patch doesn't make things worse.
  • No one objects if we put this patch in.

Did I get it right?

Another question:
Does anyone have an opinion about if the initial inserted value should be "overdefined"
(as currently in the patch) or if it should be "undefined"? Does it even matter?

My choice of "overdefined" was simply because there existed a LVILatticeVal::getOverdefined()
method. It's not the result of a great deal of analysis.

Great discussion guys!

So if I try to summarize:

  • Everyone agrees the real fix would be to rewrite JT, but no one has the time to do it.
  • There are probably many more problems in JT just waiting to be exposed.
  • The patch doesn't make things worse.
  • No one objects if we put this patch in.

    Did I get it right?

    Another question: Does anyone have an opinion about if the initial inserted value should be "overdefined" (as currently in the patch) or if it should be "undefined"? Does it even matter?

    My choice of "overdefined" was simply because there existed a LVILatticeVal::getOverdefined() method. It's not the result of a great deal of analysis.

Ping!

Any more comments on this?

What about "overdefined" vs "undefined"?

Anyone wants to accept it or should we let it be broken?

Update: https://reviews.llvm.org/rL314435 has now landed, so we can use domtree to detect dead blocks.

Update: https://reviews.llvm.org/rL314435 has now landed, so we can use domtree to detect dead blocks.

I saw it was reverted in r314589 but if it makes it into tree again, will that then mean that we can
replace the "trivially dead" checks

pred_empty(BB)

in JumpThreadingPass::ProcessBlock and JumpThreadingPass::runImpl with

!DT->isReachableFromEntry(BB)

and problem is solved?

evandro added a subscriber: spop.Oct 4 2017, 8:16 AM

I saw it was reverted in r314589 but if it makes it into tree again, will that then mean that we can
replace the "trivially dead" checks

Hello @uabelho , I am working to re-submit the patch. I have already fixed the assert reported by @eli.friedman . I am working to enable preservation under flag at the request of @mzolotukhin . I hope to have it re-submitted this week for review.

fhahn added a subscriber: fhahn.Nov 21 2017, 4:47 AM

Ok, so now when
https://reviews.llvm.org/D40146
has landed, can that be used to solve PR33357?

Can we replace the "trivially dead" checks

pred_empty(BB)

in JumpThreadingPass::ProcessBlock and JumpThreadingPass::runImpl with

!DT->isReachableFromEntry(BB)

and problem is solved?

Do we need something more to preserve DT while JumpThreading is running so we can use it? Hopefully something that isn't very costly?

If someone explains how it can be used I'll try to find the time to update the patch.

Can we replace the "trivially dead" checks
pred_empty(BB)
in JumpThreadingPass::ProcessBlock and JumpThreadingPass::runImpl with
!DT->isReachableFromEntry(BB)

Hi @uabelho, in order to actually use the DT you have to perform a flush on DDT like so:

DominatorTree *DT = DDT->flush();
!DT->isReachableFromEntry(BB);
...

This flush() call should only happen when you're absolutely sure you need the DT because it's potentially expensive. But the whole point of the deferred nature is to flush when you need it, and in this case you need it.

and problem is solved?
Do we need something more to preserve DT while JumpThreading is running so we can use it? Hopefully something that isn't very costly?
If someone explains how it can be used I'll try to find the time to update the patch.

I haven't tested flushing() in arbitrary locations in the code. It should just work but it might cause an assert inside DT. If you see one let me know, I'd be interested to understand why.

Hi @uabelho, in order to actually use the DT you have to perform a flush on DDT like so:

DominatorTree *DT = DDT->flush();
!DT->isReachableFromEntry(BB);
...

This flush() call should only happen when you're absolutely sure you need the DT because it's potentially expensive. But the whole point of the deferred nature is to flush when you need it, and in this case you need it.

Great, I'll try this out then.

I haven't tested flushing() in arbitrary locations in the code. It should just work but it might cause an assert inside DT. If you see one let me know, I'd be interested to understand why.

Ok, I'll let you know if I find something odd.

Thanks!

uabelho updated this revision to Diff 130139.Jan 17 2018, 4:50 AM
uabelho retitled this revision from [LVI] Add initial result to avoid infinite getValueFromCondition recursion to [JumpThreading] Use DT to avoid processing dead blocks.
uabelho edited the summary of this revision. (Show Details)

Use DT directly in JumpThreading instead of hacking LVI.

uabelho added a comment.EditedJan 17 2018, 4:56 AM

Hi @uabelho, in order to actually use the DT you have to perform a flush on DDT like so:

DominatorTree *DT = DDT->flush();
!DT->isReachableFromEntry(BB);
...

This flush() call should only happen when you're absolutely sure you need the DT because it's potentially expensive. But the whole point of the deferred nature is to flush when you need it, and in this case you need it.

Great, I'll try this out then.

I've done something like this in ProcessBlock which prevents LVI from crashing anymore.

In JumpThreadingPass::runImpl then there is also this code that I haven't done anything with:

while (ProcessBlock(BB))
  Changed = true;

++I;

// Don't thread branches over a block that's slated for deletion.
if (DDT->pendingDeletedBB(BB))
  continue;

// If the block is trivially dead, zap it.  This eliminates the successor
// edges which simplifies the CFG.
if (pred_empty(BB) &&
    BB != &BB->getParent()->getEntryBlock()) {
  DEBUG(dbgs() << "  JT: Deleting dead block '" << BB->getName()
        << "' with terminator: " << *BB->getTerminator() << '\n');
  LoopHeaders.erase(BB);
  LVI->eraseBlock(BB);
  DeleteDeadBlock(BB, DDT);
  Changed = true;
  continue;
}

Should we use DT here as well in some way? I did a quick attempt changing the pred_empty(BB) stuff above
to use DT in the same way as I did in processBlock and then I hit an assert in DeleteDeadBlock saying the block
wasn't dead after all anyway.

Or should we add

if (!DDT->flush().isReachableFromEntry(BB))
  continue;

?
That alone seems to solve the problem, even if I remove the fix in ProcessBlock.

Hi @uabelho.

Or should we add

if (!DDT->flush().isReachableFromEntry(BB))
  continue;

?
That alone seems to solve the problem, even if I remove the fix in ProcessBlock.

Doing this outside of ProcessBlock is preferred. The Jumpthreading pass is very expensive in that it iterates over F until no changes are made and every iteration of F it iterates multiple times over every BB until each block stabilizes. If there is a way to gate the flush() with a simple check I recommend doing so too.

Added inline comment.

lib/Transforms/Scalar/JumpThreading.cpp
951 ↗(On Diff #130139)

I do not think this will be accepted by the community. The performance implications are non-trivial (around 1-4%) when this is called on almost every basic block. It's the primary reason I had to write the DeferredDominance class.

Added inline comment.

lib/Transforms/Scalar/JumpThreading.cpp
951 ↗(On Diff #130139)

So if anyone who knows this code can tell me what is accepted that would be great because I have no idea.

davide edited reviewers, added: kuhar; removed: kuba.Jan 17 2018, 10:47 PM

@kuhar can probably take a look

I've just submitted https://llvm.org/PR36133 containing a testcase that show how JumpThreadings miscompiles (without crashes) the code due to DomTree not being updated between the iterations of JumpThreading itself.

I think the miscompilation is worse than a crash so it might be possible to pay the increase in compile time to avoid it. Or maybe someone has a better mitigation?

Err, looking at that bug, it looks like it occurs in release versions before the dominance update patch was ever submitted :)

Past that, the right way to go about this is actually figure out at what point and conditions it needs to be updated and do it there . Use the smallest hammer, not the biggest one

I think PR36133 could be considered a different issue. LVI currently uses a domtree for certain queries involving llvm.assume; to make that work, the domtree would have to be up-to-date every time we call into LVI. But we could solve that separately from the problem of dead code in LVI.

spatel added a subscriber: spatel.Jan 30 2018, 6:56 AM

I think PR36133 could be considered a different issue. LVI currently uses a domtree for certain queries involving llvm.assume; to make that work, the domtree would have to be up-to-date every time we call into LVI. But we could solve that separately from the problem of dead code in LVI.

Or it needs to know that it must flush updates before using it (not sure how incestuous you want to make this. This would basically be making LazyDomTree that flushes the deferred updater when you called dominates on it :P)

(also, the DT was not updated at all prior during JT prior to all of these patches, so i guess it all worked because LVI didn't use DT when called from JumpThreading or something)?

(also, the DT was not updated at all prior during JT prior to all of these patches, so i guess it all worked because LVI didn't use DT when called from JumpThreading or something)?

I've created D42717 in response to PR 36133.

As to your assumption I believe you are correct. LVI will perform more expensive checks if DT isn't available and I think that's what happened in the past. I'm still confused why I see the same regression for the addition of AA to JumpThreading last March but I'm more concerned about making tip not miscompile.

I have a patch (that relies on my fix from D42717) to also fix this issue with little additional overhead. Because we must flush all pending DT transactions before using LVI analysis we can also check if the block is unreachable from entry. If that's the case we can immediately pessimize threads across the block. Here's the patch along with a modified version of D42717:

diff --git a/llvm/lib/Transforms/Scalar/JumpThreading.cpp b/llvm/lib/Transforms/Scalar/JumpThreading.cpp
index 87a1d5e487e..51c9923ff83 100644
--- a/llvm/lib/Transforms/Scalar/JumpThreading.cpp
+++ b/llvm/lib/Transforms/Scalar/JumpThreading.cpp
@@ -617,6 +617,10 @@ bool JumpThreadingPass::ComputeValueKnownInPredecessors(
     // "X < 4" and "X < 3" is known true but "X < 4" itself is not available.
     // Perhaps getConstantOnEdge should be smart enough to do this?

+    auto &DT =
+        DDT->flush(); // LVI analysis relies on an up-to-date DominatorTree.
+    if (!DT.isReachableFromEntry(BB))
+      return false; // Don't jumpthread blocks that will soon be dead.
     for (BasicBlock *P : predecessors(BB)) {
       // If the value is known by LazyValueInfo to be a constant in a
       // predecessor, use that information to try to thread this block.
@@ -630,6 +634,10 @@ bool JumpThreadingPass::ComputeValueKnownInPredecessors(

   /// If I is a PHI node, then we know the incoming values for any constants.
   if (PHINode *PN = dyn_cast<PHINode>(I)) {
+    auto &DT =
+        DDT->flush(); // LVI analysis relies on an up-to-date DominatorTree.
+    if (!DT.isReachableFromEntry(BB))
+      return false; // Don't jumpthread blocks that will soon be dead.
     for (unsigned i = 0, e = PN->getNumIncomingValues(); i != e; ++i) {
       Value *InVal = PN->getIncomingValue(i);
       if (Constant *KC = getKnownConstant(InVal, Preference)) {
@@ -769,6 +777,10 @@ bool JumpThreadingPass::ComputeValueKnownInPredecessors(
           if (!isa<Constant>(RHS))
             continue;

+          auto &DT = DDT->flush(); // LVI analysis relies on an up-to-date
+                                   // DominatorTree.
+          if (!DT.isReachableFromEntry(BB))
+            return false; // Don't jumpthread blocks that will soon be dead.
           LazyValueInfo::Tristate
             ResT = LVI->getPredicateOnEdge(Pred, LHS,
                                            cast<Constant>(RHS), PredBB, BB,
@@ -792,6 +804,10 @@ bool JumpThreadingPass::ComputeValueKnownInPredecessors(

       if (!isa<Instruction>(CmpLHS) ||
           cast<Instruction>(CmpLHS)->getParent() != BB) {
+        auto &DT =
+            DDT->flush(); // LVI analysis relies on an up-to-date DominatorTree.
+        if (!DT.isReachableFromEntry(BB))
+          return false; // Don't jumpthread blocks that will soon be dead.
         for (BasicBlock *P : predecessors(BB)) {
           // If the value is known by LazyValueInfo to be a constant in a
           // predecessor, use that information to try to thread this block.
@@ -820,6 +836,10 @@ bool JumpThreadingPass::ComputeValueKnownInPredecessors(
             match(CmpLHS, m_Add(m_Value(AddLHS), m_ConstantInt(AddConst)))) {
           if (!isa<Instruction>(AddLHS) ||
               cast<Instruction>(AddLHS)->getParent() != BB) {
+            auto &DT = DDT->flush(); // LVI analysis relies on an up-to-date
+                                     // DominatorTree.
+            if (!DT.isReachableFromEntry(BB))
+              return false; // Don't jumpthread blocks that will soon be dead.
             for (BasicBlock *P : predecessors(BB)) {
               // If the value is known by LazyValueInfo to be a ConstantRange in
               // a predecessor, use that information to try to thread this
@@ -900,6 +920,10 @@ bool JumpThreadingPass::ComputeValueKnownInPredecessors(
     }
   }

+  auto &DT =
+      DDT->flush(); // LVI analysis relies on an up-to-date DominatorTree.
+  if (!DT.isReachableFromEntry(BB))
+    return false; // Don't jumpthread blocks that will soon be dead.
   // If all else fails, see if LVI can figure out a constant value for us.
   Constant *CI = LVI->getConstant(V, BB, CxtI);
   if (Constant *KC = getKnownConstant(CI, Preference)) {
@@ -1101,7 +1125,10 @@ bool JumpThreadingPass::ProcessBlock(BasicBlock *BB) {
       // unconditional. Because its no longer interesting as far as jump
       // threading is concerned.
       assert(CondBr->isConditional() && "Threading on unconditional terminator");
-
+      auto &DT =
+          DDT->flush(); // LVI analysis relies on an up-to-date DominatorTree.
+      if (!DT.isReachableFromEntry(BB))
+        return false; // Don't jumpthread blocks that will soon be dead.
       LazyValueInfo::Tristate Ret =
         LVI->getPredicateAt(CondCmp->getPredicate(), CondCmp->getOperand(0),
                             CondConst, CondBr);
@@ -2371,6 +2398,10 @@ bool JumpThreadingPass::TryToUnfoldSelect(CmpInst *CondCmp, BasicBlock *BB) {
     // Now check if one of the select values would allow us to constant fold the
     // terminator in BB. We don't do the transform if both sides fold, those
     // cases will be threaded in any case.
+    auto &DT =
+        DDT->flush(); // LVI analysis relies on an up-to-date DominatorTree.
+    if (!DT.isReachableFromEntry(BB))
+      return false; // Don't jumpthread blocks that will soon be dead.
     LazyValueInfo::Tristate LHSFolds =
         LVI->getPredicateOnEdge(CondCmp->getPredicate(), SI->getOperand(1),
                                 CondRHS, Pred, BB, CondCmp);

I have verified it does not crash on PR33357-lvi-recursion.ll, passes check-all and test-suite runs.

I have analyzed this failure and have what I believe (in @dberlin 's words) to be the smallest hammer fix:

$ git diff lib/Analysis/LazyValueInfo.cpp
diff --git a/llvm/lib/Analysis/LazyValueInfo.cpp b/llvm/lib/Analysis/LazyValueInfo.cpp
index 65fd007dc0b..36e66f7f6ef 100644
--- a/llvm/lib/Analysis/LazyValueInfo.cpp
+++ b/llvm/lib/Analysis/LazyValueInfo.cpp
@@ -1145,9 +1145,17 @@ getValueFromConditionImpl(Value *Val, Value *Cond, bool isTrueDest,
              (!isTrueDest && BO->getOpcode() != BinaryOperator::Or))
     return ValueLatticeElement::getOverdefined();

-  auto RHS = getValueFromCondition(Val, BO->getOperand(0), isTrueDest, Visited);
-  auto LHS = getValueFromCondition(Val, BO->getOperand(1), isTrueDest, Visited);
-  return intersect(RHS, LHS);
+  // Prevent infinite recursion if Cond references itself as in this example:
+  //  Cond: "%tmp4 = and i1 %tmp4, undef"
+  //    BL: "%tmp4 = and i1 %tmp4, undef"
+  //    BR: "i1 undef"
+  Value *BL = BO->getOperand(0);
+  Value *BR = BO->getOperand(1);
+  if (BL == Cond || BR == Cond)
+    return ValueLatticeElement::getOverdefined();
+
+  return intersect(getValueFromCondition(Val, BL, isTrueDest, Visited),
+                   getValueFromCondition(Val, BR, isTrueDest, Visited));
 }

The infinite recursion happens because Visited is not altered or updated until getValueFromConditionImpl() returns a value. In this case getValueFromConditionImpl() calls getValueFromCondition() which calls back to getValueFromConditionImpl() in an attempt to resolve the LHS and RHS of a valid BinaryOperator.

In our case this specific instruction we have %tmp4 = and i1 %tmp4, undef which I don't quite understand. Shouldn't this be invalid SSA form?

This patch passes ninja check-all and test-suite builds on x86_64.

@uabelho , the test case needs one small change to the FileCheck verification block:

; The
;  br i1 undef, label %bb6, label %bb1
; is replaced by
;  br label %bb6
; and the rest of the function is unreachable from entry and jump-threading
; shouldn't even ask LazyValueInfo about things in the dead blocks.

; CHECK:      define void @f(i32 %p1) {
; CHECK-NEXT: bb6:
; CHECK-NEXT:   ret void

For whatever reason JumpThreading now removes %0 = icmp eq i32 %p1, 0 from bb6:.

Gentle ping. Are there any objections to this patch? Should I open another Differential for this patch in preparation to commiting it?

dberlin accepted this revision.Mar 12 2018, 8:53 AM

I'm a little worried about the duplicated code here. I feel like you are having to patch way too many points in the same function, and it will be easy to miss one in the future if someone changes the logicc slightly.

Is there really no better place to put this/way to refactor it?

If not, i'll approve, but just wanted to check.

This revision is now accepted and ready to land.Mar 12 2018, 8:53 AM

I'm a little worried about the duplicated code here. I feel like you are having to patch way too many points in the same function, and it will be easy to miss one in the future if someone changes the logicc slightly.

Is there really no better place to put this/way to refactor it?

If not, i'll approve, but just wanted to check.

Hi @dberlin, thank you for the comment. I'm a bit confused by your remarks however. I suspect you may be confusing my older suggested patch with my newer one. The older patch won't work as it requires far too many flushes of DDT. And I agree, it's also too brittle. Here is my most recent patch I believe to properly address the LVI infinite recursion:

$ git diff lib/Analysis/LazyValueInfo.cpp
diff --git a/llvm/lib/Analysis/LazyValueInfo.cpp b/llvm/lib/Analysis/LazyValueInfo.cpp
index 65fd007dc0b..36e66f7f6ef 100644
--- a/llvm/lib/Analysis/LazyValueInfo.cpp
+++ b/llvm/lib/Analysis/LazyValueInfo.cpp
@@ -1145,9 +1145,17 @@ getValueFromConditionImpl(Value *Val, Value *Cond, bool isTrueDest,
              (!isTrueDest && BO->getOpcode() != BinaryOperator::Or))
     return ValueLatticeElement::getOverdefined();

-  auto RHS = getValueFromCondition(Val, BO->getOperand(0), isTrueDest, Visited);
-  auto LHS = getValueFromCondition(Val, BO->getOperand(1), isTrueDest, Visited);
-  return intersect(RHS, LHS);
+  // Prevent infinite recursion if Cond references itself as in this example:
+  //  Cond: "%tmp4 = and i1 %tmp4, undef"
+  //    BL: "%tmp4 = and i1 %tmp4, undef"
+  //    BR: "i1 undef"
+  Value *BL = BO->getOperand(0);
+  Value *BR = BO->getOperand(1);
+  if (BL == Cond || BR == Cond)
+    return ValueLatticeElement::getOverdefined();
+
+  return intersect(getValueFromCondition(Val, BL, isTrueDest, Visited),
+                   getValueFromCondition(Val, BR, isTrueDest, Visited));
 }

Yeah, my review window didn't update for some reason till just now.
LGTM

This revision was automatically updated to reflect the committed changes.

Thank you for solving this Brian!