This is an archive of the discontinued LLVM Phabricator instance.

[LoopUnroll] Fold all exits based on known trip count/multiple
ClosedPublic

Authored by nikic on Jun 13 2021, 12:25 PM.

Details

Summary

Fold all exits based on known trip count/multiple information from SCEV. Previously only the latch exit or the single exit were folded.

This doesn't yet eliminate ULO.TripCount and ULO.TripMultiple entirely: They're still used to a) decide whether runtime unrolling should be performed and b) for ORE remarks. However, the core unrolling logic is independent of them now.

Diff Detail

Unit TestsFailed

Event Timeline

nikic created this revision.Jun 13 2021, 12:25 PM
nikic requested review of this revision.Jun 13 2021, 12:25 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 13 2021, 12:25 PM

I think there's a problem with this code, but I'm surprised the existing test didn't catch it. Have you run this with explicit dom tree and loop info validation enabled?

Here's the problem I think I see. If we have an exit block which exits only on iteration X, and we unroll by count Y where Y > X, I believe we'll end up making the exit block unreachable. (This can't happen if there's only a single exit since that exit must be reached.) The problem with an unreachable exit is that there can be arbitrarily complicated loops reachable only through that path, and updating loop info in that case is hard. Am I missing something here?

llvm/lib/Transforms/Utils/LoopUnroll.cpp
747

This bit doesn't make sense to me, can you explain?

I think there's a problem with this code, but I'm surprised the existing test didn't catch it. Have you run this with explicit dom tree and loop info validation enabled?

Here's the problem I think I see. If we have an exit block which exits only on iteration X, and we unroll by count Y where Y > X, I believe we'll end up making the exit block unreachable. (This can't happen if there's only a single exit since that exit must be reached.) The problem with an unreachable exit is that there can be arbitrarily complicated loops reachable only through that path, and updating loop info in that case is hard. Am I missing something here?

There's two bits of subtlety here: 1. For non-latch blocks, we don't fold branches towards exit (only towards loop) and 2. for a complete unroll, we always report the last iteration as exiting. These two properties ensure that we don't run into any of the hard LI invalidation scenarios, because we always retain both a branch to all exits (on the last iteration) and all loop blocks, even if we know that one or the other may be unreachable.

llvm/lib/Transforms/Utils/LoopUnroll.cpp
747

The runtime loop unrolling code may insert a prologue, in which case the computed trip counts for the original loop may no longer be correct.

nikic added inline comments.Jun 16 2021, 11:29 AM
llvm/lib/Transforms/Utils/LoopUnroll.cpp
747

I should mention that runtime unrolling is going to need some more work, in particular I think that the decision whether to use a runtime unroll or not should be made by the caller, not this utility. Once that is done, it might make sense to compute the trip count information after doing the runtime unroll transform, in which case we could still use the trip multiple logic below this for the non-latch exits. If we teach SCEV to recognize the trip multiple from the runtime unroll pattern, then we shouldn't need to specially handle runtime unrolling here at all, though I'm not confident that we can detect the trip multiple sufficiently reliably (once folds get involved).

I think that's all followup work though, and the option used here is the conservative one of only folding the latch exit as we did before.

reames accepted this revision.Jun 16 2021, 7:48 PM

LGTM, concerns addressed by responses. The bit about which exits we fold is particularly subtle. Might be worth adding a comment to the code.

This revision is now accepted and ready to land.Jun 16 2021, 7:48 PM
This revision was landed with ongoing or failed builds.Jun 17 2021, 11:58 AM
This revision was automatically updated to reflect the committed changes.