This is an archive of the discontinued LLVM Phabricator instance.

[RuntimeUnrolling] Add logic for loops with multiple exit blocks
ClosedPublic

Authored by anna on May 9 2017, 7:36 AM.

Details

Summary

Runtime unrolling is done for loops with a single exit block and a
single exiting block (and this exiting block should be the latch block).
This patch adds logic to support unrolling in the presence of multiple exit
blocks (which also means multiple exiting blocks), when runtime unrolling
generates epilog blocks. A very similar logic can be applied when generating
prolog blocks as well.
One restriction on the exit blocks (other than the latch exit block)
is they should have no successors. This can alsio be extended in the future.

This patch is essentially an implementation patch. I have not added any
heuristic (in terms of branches added or code size) to decide when
this should be enabled.

Diff Detail

Repository
rL LLVM

Event Timeline

anna created this revision.May 9 2017, 7:36 AM
anna added inline comments.May 9 2017, 7:38 AM
test/Transforms/LoopUnroll/runtime-loop-multiple-exits.ll
4 ↗(On Diff #98280)

I will add more check statements in a later update. I'm hoping the focus for the first iteration is any implementation errors, or high level concerns.

evstupac edited edge metadata.May 18 2017, 1:07 PM

Hi Anna,

Thank for your contribution.
I've made some inline comments. Please take a look.

Thanks,
Evgeny

lib/Transforms/Utils/LoopUnrollRuntime.cpp
496 ↗(On Diff #98280)

How about passing "UnrollRuntimeMultiExit" as a parameter to UnrollRuntimeLoopRemainder" and introducing dependency like:
UseEpilogRemainder |= UnrollRuntimeMultiExit for loops with multi exits and
UnrollRuntimeMultiExit &= UseEpilogRemainder if UseEpilogRemainder is set by user?

508 ↗(On Diff #98280)

Could you please add a comment here?

691 ↗(On Diff #98280)

Maybe it's ok, but according to the comment (line 685) we clone extra blocks right after inserting previous into function. The order needs an additional comment.

I'd recommend adding this part to "CloneLoopBlocks".

698 ↗(On Diff #98280)

This should be more compact:
for (Instruction &II : *BB)

anna marked 3 inline comments as done.Jun 14 2017, 12:42 PM

Thanks Evgeny for the review. Updated diff.

lib/Transforms/Utils/LoopUnrollRuntime.cpp
496 ↗(On Diff #98280)

I think the UnrollRuntimeMultiExit can be just a cl::opt for now? The end goal is to support this for prolog and epilog remainder loop creation and under some heuristic that is profitable to create such loop.

691 ↗(On Diff #98280)

Yes, it's cleaner - added as part of CloneLoopBlocks.

anna updated this revision to Diff 102594.Jun 14 2017, 12:43 PM
anna marked an inline comment as done.

Addressed review comments.

reames requested changes to this revision.Jun 22 2017, 2:33 PM
reames added inline comments.
lib/Transforms/Utils/LoopUnrollRuntime.cpp
294 ↗(On Diff #102594)

Orthogonal to your change, but this code would be much simpler if we always created the remainder loop and then broke the backedge (removing the loop) if we knew the inserted loop was trivial.

429 ↗(On Diff #102594)

Ok, I'm missing something. Why do we need this? Is it just to preserve the dedicated exit property? If so, wouldn't simply splitting the edge and inserting a trivial edge (without duplication) be a lot simpler?

447 ↗(On Diff #102594)

This assignment is pointless.

519 ↗(On Diff #102594)

It looks like you're re-purposing this as the Latch Exit. Can you rename this and check it in? It'll make the diff smaller and easier to read.

546 ↗(On Diff #102594)

I think this bit of code can be simplified via hasDedicatedExits.

Also, I believe you actually *need* to check that predicate before calling getUniqueExitBlocks. Reading the comment on that function makes it seem like having dedicated exits is a precondition for that function.

567 ↗(On Diff #102594)

Can you separate and land this change? It should be NFC for the old code and if it's not, it'd be good to find out now.

And, reading the comments carefully, I'm not sure this is actually NFC. Aren't the exit count and the backedge taken count defined differently? (Consider loop whose header executes once and whose backedge is dynamically dead.)

Also, "guaranteed not to exit" doesn't seem to imply guaranteed to exit on the next iteration.

This revision now requires changes to proceed.Jun 22 2017, 2:33 PM
anna marked an inline comment as done.Jun 23 2017, 7:35 AM
anna added inline comments.
lib/Transforms/Utils/LoopUnrollRuntime.cpp
429 ↗(On Diff #102594)

You mean splitting the edge (which would create this new exits) and add a trivial edge to the original exits, to avoid duplicating the code? We won't be able to do that because we need to preserve the dedicated exit property on *both* loops: the remainder loop and the original loop.
Adding this edge breaks the dedicated exit property for the original loop.

519 ↗(On Diff #102594)

Moved the assert that checks for it closer to the definition as well.

anna marked an inline comment as done.Jun 26 2017, 2:03 PM
anna added inline comments.
lib/Transforms/Utils/LoopUnrollRuntime.cpp
294 ↗(On Diff #102594)

Actually, I'm not convinced if the code would be simpler.
So, there are multiple data structures updated when this "new" remainder loop gets generated, apart from the updates to the CFG edges.
After CloneLoopBlocks, we'll need to call a modified version of deleteDeadLoop (which is currently in LoopDeletion pass). So, apart from breaking the backedge, we'll need to update the LPM to state that the (remainder) loop we just added should be deleted.

546 ↗(On Diff #102594)

We're looking for more than dedicated exits here. This is just a bail out condition to simplify the code for now: the only exit allowed to have successors is the LatchExit.
Also, getUniqueExitBlocks works here because we check for the fact that loop is in LoopSimplifyForm (which checks for hasDedicatedExits).

However, I think this bail out can be dropped based on your offline suggestion: use LoopSimplify's logic for generating dedicated exits.

567 ↗(On Diff #102594)

Thanks for bringing this up.
I read the actual SCEV code (comments seem to differ from the code) for getExitCount versus getBackEdgeTakenCount, and the getBackEdgeTakenCount is semantically equivalent to getExitCount for all exits in the loop.

So, I think this will be an NFC for the old code.

anna planned changes to this revision.Jun 27 2017, 7:33 AM
anna marked an inline comment as done.

Changing the code to generate dedicated exits instead of cloning the exit blocks. This would make it simpler to reason about successors to the exit blocks.

lib/Transforms/Utils/LoopUnrollRuntime.cpp
567 ↗(On Diff #102594)

Submitted separately as NFC. no problems spotted.

anna added inline comments.Jun 27 2017, 9:04 AM
lib/Transforms/Utils/LoopUnrollRuntime.cpp
294 ↗(On Diff #102594)

I realized one more thing while working on a different problem in loop deletion: the LPM Updater *cannot* remove this remainder loop. The removal works only for current loop and subloops within it. This remainder loop is not a subloop.

And just breaking the backedge without deleting the loop, will break the loop structure and leave the loop in the LPM, which is incorrect.

This makes the CreateRemainderLoop checks pretty useful!

We can still generate the remainder loop code unconditionally and rely on some later pass on simplifyCFG to remove this loop (if it handles this optimization of single trip count 'loop').

anna updated this revision to Diff 104265.Jun 27 2017, 1:57 PM
anna edited edge metadata.

Addressed review comments. Now we no longer clone the loop exits, but generate the edges to the original loop exits
and canonicalize to dedicated loop exits afterwards.
This automatically helps support other loop exits having successors, so that restriction is removed.
Added one more test case to verify exits having successors.

anna added a comment.Jun 27 2017, 2:02 PM

The logic is now rewritten to avoid cloning the exit blocks (and test cases updated). The main puzzle was in the dominator tree update that was required after dedicated exits were generated for adjacent loops.

anna added inline comments.Jun 27 2017, 2:09 PM
lib/Transforms/Utils/LoopUnrollRuntime.cpp
495 ↗(On Diff #104265)

Note: This assert is actually NFC wrt old assert, but it increases the compile time because contains is O(number of blocks in loop). That's the reason I didn't check it in separately.

We cannot directly compare against LatchExit as done previously, since it may not exist.

Anna and I talked offline because I was having trouble understanding the intent of the new code. She's going to update a revised version which will address the cause of my confusion.

lib/Transforms/Utils/LoopUnrollRuntime.cpp
293 ↗(On Diff #104265)

Update the comment to describe the return value please.

764 ↗(On Diff #104265)

Dead variables?

anna updated this revision to Diff 104408.Jun 28 2017, 6:43 AM
anna marked 3 inline comments as done.

Updated the dominator info at the location where the dom tree first fails verification.
Moved the logic for phi node updates to immediately after the edges are created.
Added more tests and the file check statements.
Renamed couple of variables and added comments clarifying code.

anna updated this revision to Diff 104414.Jun 28 2017, 7:27 AM

NFC wrt previous diff: updated the test CHECK statements to avoid using the auto generated CHECK statements.
Manually updating CHECK statements makes the actual test much clearer.

reames accepted this revision.Jun 30 2017, 8:30 AM

LGTM. Thanks for all the work on this!

This revision is now accepted and ready to land.Jun 30 2017, 8:30 AM
This revision was automatically updated to reflect the committed changes.
anna added a comment.Jun 30 2017, 11:48 AM

LGTM. Thanks for all the work on this!

Thanks for all the suggestions, it simplified the code and improved the scope! Now, adding support in presence of prolog loop is just about adding test cases and updating comments :)