Page MenuHomePhabricator

[LoopDeletion] Handle inner loops w/untaken backedges

Authored by reames on Jan 10 2021, 5:08 PM.



This builds on the restricted after initial revert form of D93906, and adds back support for breaking backedges of inner loops. It turns out the original invalidation logic wasn't quite right, specifically around the handling of LCSSA.

When breaking the backedge of an inner loop, we can cause blocks which were in the outer loop only because they were also included in a sub-loop to be removed from both loops. This results in the exit block set for our original parent loop changing, and thus a need for new LCSSA phi nodes.

This case happens when the inner loop has an exit block which is also an exit block of the parent, and there's a block in the child which reaches an exit to said block without also reaching an exit to the parent loop.

(I'm describing this in terms of the immediate parent, but the problem is general for any transitive parent in the nest.)

At a high level, we seem to have two choices. Either a) rebuild lcssa if needed, or b) restrict the transformation such that an lcssa rebuild isn't needed.

The lcssa rebuild approach is potentially expensive in the worst case. Each rebuild is potentially O(N^2) in the number of instructions in the loop being rebuilt. At worst, we could have N sub-loops (since each must contain at least one instruction), resulting in a worst case example of a whole forest of loops w/zero btc resulting in O(N^3). We have lots of precedent for this being an acceptable expense, but I want to explicit raise the issue for review. Thoughts?

Diff Detail

Event Timeline

reames created this revision.Jan 10 2021, 5:08 PM
reames requested review of this revision.Jan 10 2021, 5:08 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 10 2021, 5:08 PM

The logic seems sound. I have no idea about the practical impact. Should we do some compile time testing (@fhahn @nikic) ?

Here's the compile-time numbers for the patch:

The only notable thing is mafft 52035M 52077M (+0.08%),. Perhaps some noise as there are no binary changes and IIUC it should only trigger if the transformation fires. Or perhaps a different optimzation later on gives the same result.

fhahn added inline comments.Jan 20 2021, 10:42 AM

From the arguments, it seems like changeToUnreachable at least tries to preserve LCSSA, but clearly misses the case at hand. I've not looked at the details on what exactly needs updating Do you think it would be possible to directly fix the values broken after removing the block?

reames added inline comments.Jan 20 2021, 2:51 PM

We're introducing a whole new set of exit blocks which used to be inside the loop. We either have to scan all defs in the loop (which this does), or all uses in the blocks removed from the loop. Is the later any better than the former? The former has the benefit that we can reuse existing code. (Remember we have to do this for all loop levels, so it's slightly tricky.)

fhahn accepted this revision.Jan 22 2021, 2:08 PM

LGTM, thanks!


Is the later any better than the former?

I have no idea. Let's use the existing infrastructure and take another look if there are any issues reported.

This revision is now accepted and ready to land.Jan 22 2021, 2:08 PM
This revision was automatically updated to reflect the committed changes.