This is an archive of the discontinued LLVM Phabricator instance.

[LoopUnrolling] Re-prioritize Peeling and Partial unrolling
ClosedPublic

Authored by mkazantsev on Feb 21 2017, 10:05 PM.

Details

Summary

In current implementation the loop peeling happens after trip-count based partial unrolling and may
sometimes not happen at all due to it (for example, if trip count is known, but UP.Partial = false). This
is generally bad, the more than there are some situations where peeling is profitable even if the partial
unrolling is disabled.

This patch is a NFC which reorders peeling and partial unrolling application and prepares the code for
implementation of the said optimizations.

Diff Detail

Event Timeline

mkazantsev created this revision.Feb 21 2017, 10:05 PM
mkuper added a subscriber: mkuper.Feb 21 2017, 10:42 PM
mkuper added inline comments.
lib/Transforms/Scalar/LoopUnrollPass.cpp
789

The current heuristic is supposed to be that we only peel when we don't know the trip count, but have an approximate one due to profile information. Partial unrolling requires a known trip count. So in the old code, they don't actually compete for priority, and this change looks like it should be NFC.

I'm not sure this is completely NFC, though - I don't remember if computePeelCount actually enforces an unknown trip count, or just assumes it's unknown, because otherwise we'd quit at line 838. So you probably want this to be "if (!TripCount && UP.PeelCount)", or change computePeelCount() to bail on a known trip count. Otherwise, what you may in effect be doing is enabling peeling for loops with a constant low trip count that we chose not to unroll. Which is probably not something you want to do.

mkazantsev added inline comments.Feb 22 2017, 3:01 AM
lib/Transforms/Scalar/LoopUnrollPass.cpp
789

In fact, even if trip count is known, peeling can still be profitable. One of such cases is shown at https://reviews.llvm.org/D30161, where Phi's back edge input is a constant. Partial unrolling here does not give any benefits, while peeling does. My understanding is following: Peeling should not set peel count unless it has proved that peeling IS profitable. In this case nothing bad happens with small loops that we don't unroll: they will only be peeled if it is good. Maybe it is reasonable to add the following restriction to peeling: TripCount should be either unknown or above some threshold (for example, partial unrolling threshold). Does it make sense for you?

mkuper added inline comments.Feb 22 2017, 10:22 AM
lib/Transforms/Scalar/LoopUnrollPass.cpp
789

Sorry, I wasn't clear.
I understand that there are cases where peeling is profitable even when the trip count is known - D30161 is a really nice example.

But right now, when we have profile information, we try to peel by the estimated number of iterations. That specific kind of peeling, combined with this patch, has two issues:

  1. getLoopEstimatedTripCount() returns the estimated (based on the profile) trip count. This may be imprecise, and at this point, somewhat different from the actual trip count, especially for sampling-based FDO. So you may end up peeling by the "estimated" trip-count even though you know the actual tripcount. That sounds wrong.
  1. Even if we fix getLoopEstimatedTripCount() to return the real trip count, when available, you still basically get "let's peel a loop by its real tripcount", which is equivalent to full unrolling. In theory, it shouldn't happen, because full unrolling and peeling should be using the same thresholds, so if we decided not to fully unroll, we'll decide not to "peel" either. But I'm not sure this actually holds.

I'm not saying we should disable any kind of peeling when the real trip count is known. Only that we should disable the "peel by the estimated trip count" heuristic.

mkazantsev updated this revision to Diff 89848.Feb 27 2017, 1:45 AM
mkazantsev edited the summary of this revision. (Show Details)
mkazantsev added a reviewer: mkuper.
mkazantsev marked 3 inline comments as done.
mkuper added inline comments.Feb 28 2017, 10:42 AM
lib/Transforms/Scalar/LoopUnrollPass.cpp
789

This patch doesn't really make sense as a stand-alone, and I'm not sure it's the right way to go as preparation to D30161 either.

I would suggest something like:

  1. In this patch, invert the priorities of peeling and partial unrolling, but bail out of computePeelCount when the real trip count is known. This should be NFC.
  2. In D30161, change the logic of computePeelCount() to be something like:
if conditions are right:
  peel by 1
else if trip count is known:
  bail out
else:
  check if we should peel by estimated trip count.
lib/Transforms/Utils/LoopUnrollPeel.cpp
75

If you do decide to call this twice - I'd expect this to also get hit only the second time around.

mkazantsev updated this revision to Diff 90121.Feb 28 2017, 9:29 PM
mkazantsev marked an inline comment as done.
mkazantsev edited the summary of this revision. (Show Details)
mkazantsev marked an inline comment as done.Mar 1 2017, 4:07 AM
mkazantsev added inline comments.
lib/Transforms/Utils/LoopUnrollPeel.cpp
75

No longer calling twice.

mkuper accepted this revision.Mar 1 2017, 2:17 PM

LGTM

This revision is now accepted and ready to land.Mar 1 2017, 2:17 PM
This revision was automatically updated to reflect the committed changes.