This is an archive of the discontinued LLVM Phabricator instance.

[PR18861] Fix for LoopUnroll pass was breaking LCSSA form while completely removing loop
ClosedPublic

Authored by dinesh.d on Mar 5 2014, 11:13 PM.

Details

Summary

I have already sent mail to llvm-commits [http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140303/207346.html].

If LoopUnroll pass decides to completely unroll a loop, before deleting it [deleteLoopFromQueue()] from the loop list, it updates parent loop for all blocks in the loop to be unrolled [updateUnloop()->updateBlockParents()]. Candidate for parent loop is based on finding nearest parent loop among the block successors [getNearestLoop()], and returns null if a block does not have any successor [e.g. exit blocks having return statement, not sure if this may happen in other cases]. This causes the block to come out of loop nest. Now if this block is using some variable assigned inside loop nest, then it breaks LCSSA form.

This is what I have come up to fix LCSSA while detaching exit block from loop nest.

I have assumed few things.

  1. As block was part of loop getting unrolled, it will have predecessor which is part of loop nest.
  2. If any value used as instruction operand in the block whose parent is out side the loop getting unrolled, we have to add PHI node from all predecessor of the block for it to maintain LCSSA.

I will appreciate any comment or pointer to right direction.

Diff Detail

Repository
rL LLVM

Event Timeline

gentle ping.

this patch fixes crash reported in PR18861.

dinesh.d updated this revision to Diff 9033.May 2 2014, 6:39 AM
dinesh.d edited edge metadata.

fixed issue while iterating users of instruction in the patch

Gentle ping

atrick requested changes to this revision.May 9 2014, 3:44 PM
atrick edited edge metadata.

Thanks for fixing this bug! And sorry for the review delay.

I'm concerned that this implementation only handles some cases. I know
this code is super confusing, and your fix is better than nothing. But
we should try to generalize a bit because the existing LCSSA update is
full of missing corner cases. Your fix is just working around one specific
case with a new analysis.

In general, the problem is that full unrolling may cause some basic blocks to bubble up in the loop tree. Now some of the original loop's blocks may be shallower in the loop nest than others. The -view-cfg of each of the unloop.ll functions will hopefully show you some interesting cases. You have to have at least four levels of loops to see the interesting cases. Note that the problem you are solving is not limited to the case when a block "exits the function".

Also when replacing uses:

+ if (OiPN != Ui && Ui->getParent() == BB)
+ Ui->replaceUsesOfWith(Oi, OiPN);

We need to replace all uses that are not contained in the defining instruction's parent loop.

The only clean, robust way of handling this is to rerun LCSSA
formation as a utility on any block on a path to a new loop exit after
unrolling. Trying to reimplement LCSSA formation and take shortcuts is
not a good idea, because the problem of updating after unrolling seems
subject to the same corner cases as the general problem. Also,
"updateUnloop" is only supposed to update LoopInfo so we don't want to
mess with that unless necessary.

Currently, LoopUnroll does attempt to rerun LCSSA on the parent loop. So the real question is, why does this fail to catch some cases? The problem is that it only handles loop exits that end up in the parent loop. Now, we don't want to rerun LCSSA on the whole function, so what do we do? The answer is to find the shallowest loop one level deeper than the shallowest exit and rerun formLCSSARecursively on that loop.

i.e.

  • Outer->L1->L2->L3
  • Fully unroll L3.
  • If we see an exit to L1, formLCSSA for L2
  • If we see an exit to Outer, formLCSSA for L1

Loop unrolling can probably notice these exits when setting up
branches. Then walk up the loop tree from the current loop to find the
child of the exit.

This revision now requires changes to proceed.May 9 2014, 3:44 PM

Thanks, I agree that I don't understand complete flow and internals.

But While debugging for this bug, I observed that there are only 2 places we are messing
up with PHI nodes during LoopUnrool pass, one in FoldBlockIntoPredecessor() while folding
single entry PHI nodes and other during loop simplify.

For issue due to loop simplify, Chandler has added formLCSSARecursively() for parent loop
if current loop is unrolled completely. I saw that if we run formLCSSARecursively() on the
root node of the loop nest, it fixes this bug too. But as this was temporary fix as per chandler,
I started looking to completely avoid formLCSSARecursively.

This patch tries to fix PHIs deleted by FoldBlockIntoPredecessor. I had send another patch
to keep PHIs nodes from deletion while simplifying loop.

I assumed that if loop was in LCSSA and we keep/ re-insert deleted PHI nodes which was
there only to keep loop in LCSSA form, that will ensure that loop will be in LCSSA after pass
and will take care of all corner cases. But I may be wrong.

I will try to update patch as per your suggestions.

Again thanks for review and suggestions :).

Sorry for delay in response.

Andrew has suggested some investigations in loop unroll pass. I am going through them and will update this patch soon.

dinesh.d updated this revision to Diff 9834.May 27 2014, 5:39 AM
dinesh.d edited edge metadata.

To start with, I looked through all latches in the the loop, and check if they are
moved higher in loop nest, if so, I am trying to run formLCSSARecursively on highest
loop where any latch is landed.

This resolves PR18861, as the issue was caused by FoldBlockIntoPredecessor() call
in following code.

for (unsigned i = 0, e = Latches.size(); i != e; ++i) {
  BranchInst *Term = cast<BranchInst>(Latches[i]->getTerminator());
  if (Term->isUnconditional()) {
    BasicBlock *Dest = Term->getSuccessor(0);
    if (BasicBlock *Fold = FoldBlockIntoPredecessor(Dest, LI, LPM))
      std::replace(Latches.begin(), Latches.end(), Dest, Fold);
  }
}

FoldBlockIntoPredecessor() call merges BB and removes all single PHIs. In such cases,
new block will end up in 'Latches' list and all those invalidations are handled by the patch.

I am experimenting further to see if we have to check any other BB apart from latches.

atrick accepted this revision.May 27 2014, 6:10 PM
atrick edited edge metadata.

Hi Dinesh,

This looks like it will work. It is different than the fix I was proposing and very tricky. I actually thought your fix was incorrect before realizing you're checking the Latch's ParentLoop after calling deleteLoopFromQueue. This ensures that the final loop latch will already be promoted to the outermost loop that any loop exit branches to. Note that we only care about the last loop latch (std::prev(Latches.end())), so I don't think you need to check all latches. If you add comments something like this then it's good to commit:

LCSSA must be performed on the outermost affected loop. The unrolled
loop's last loop latch is guaranteed to be in the outermost loop after
deleteLoopFromQueue updates LoopInfo.

Then you could either add this comment:

This may conservatively run LCSSA on the parent loop one level higher in the loop tree than necessary.

Or you could optimize the code further by keeping track of the loop one level below OuterL and calling formLCSSA on that intstead.

This revision is now accepted and ready to land.May 27 2014, 6:10 PM
dinesh.d updated this revision to Diff 9865.May 28 2014, 2:51 AM
dinesh.d edited edge metadata.

I have updated patch as per your suggestions. I will commit this patch and continue
experimenting with different loop structure. I already have tried many different combinations
but none of them cause similar crash.

Thanks for review and support.

dinesh.d closed this revision.May 28 2014, 11:55 PM
dinesh.d updated this revision to Diff 9908.

Closed by commit rL209796 (authored by dinesh).