This is an archive of the discontinued LLVM Phabricator instance.

[Loop Peeling] Do not close further unroll/peel if profile based peeling was not used
ClosedPublic

Authored by skatkov on Jul 18 2019, 11:51 PM.

Details

Summary

Current peeling cost model can decide to peel off not all iterations
but only some of them to eliminate conditions on phi. At the same time
if any peeling happens the door for further unroll/peel optimizations on that
loop closes because the part of the code thinks that if peeling happened
it is profile based peeling and all iterations are peeled off.

To resolve this inconsistency the patch provides the flag which states whether
the full peeling basing on profile is enabled or not and peeling cost model
is able to modify this field like it does not PeelCount.

In a separate patch I will introduce an option to allow/disallow peeling basing
on profile.

To avoid infinite loop peeling the patch tracks the total number of peeled iteration
through llvm.loop.peeled.count loop metadata.

Diff Detail

Repository
rL LLVM

Event Timeline

skatkov created this revision.Jul 18 2019, 11:51 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 18 2019, 11:51 PM

I'm not quite sure this is the right macro approach.

The comment near the code your modifying makes me thing the whole reasoning behind disabling future peeling and unrolling after the first peel may be flawed. It doesn't make sense to "use up" profiling information. Assuming we correctly updated our profiling when doing the transform, the resulting profile for the loop should indicate that it's cold and thus not profitable to further peel/unroll. If that's not happening, maybe there's another issue in play? (I wonder if your other profile bug fix may help here?)

p.s. I'm opened to being convinced that this is a practical answer, even if not an ideal one. Just make the argument. :)

llvm/include/llvm/Analysis/TargetTransformInfo.h
478 ↗(On Diff #210743)

Naming wise: Please call this something other than "Full" peeling. Full peeling sounds like full unrolling which would imply we were able to entirely eliminate the loop structure.

Maybe: PeelProfiledIterations?

reames requested changes to this revision.Jul 19 2019, 10:48 AM

(Marking as RC for tracking purposes only)

This revision now requires changes to proceed.Jul 19 2019, 10:48 AM
skatkov marked an inline comment as done.Jul 22 2019, 2:01 AM

I'm not quite sure this is the right macro approach.

The comment near the code your modifying makes me thing the whole reasoning behind disabling future peeling and unrolling after the first peel may be flawed. It doesn't make sense to "use up" profiling information. Assuming we correctly updated our profiling when doing the transform, the resulting profile for the loop should indicate that it's cold and thus not profitable to further peel/unroll. If that's not happening, maybe there's another issue in play? (I wonder if your other profile bug fix may help here?)

p.s. I'm opened to being convinced that this is a practical answer, even if not an ideal one. Just make the argument. :)

Hi Philip, the story is the following.

Let's we have a loop which does 5 iteration according to profile but also one of its condition can be simplified by peeling and to simplify condition we need only 1 iteration peeled off.
According to current implementation Loop Peeling cost model will decide that we should peel off one iteration.
One iteration is peeled off and weights are updated correctly (now estimated trip count is 5 - 1 == 4)
But we mark the loop as llvm.loop.unroll.disable unconditionally if we do any peeling.

As a result another potential LoopUnroll pass will not even consider this loop for peeling while if it does it would detect that we can peel additional 4 iterations.

So this patch fixes this part: if we did not peel all iteration basing on profile we should mark this loop as no consider for future peel/unroll.

Will update the patch soon with modified proposed name of variable which really makes the intention clearer.

llvm/include/llvm/Analysis/TargetTransformInfo.h
478 ↗(On Diff #210743)

Agreed.

skatkov updated this revision to Diff 211029.Jul 22 2019, 2:03 AM
skatkov retitled this revision from [Loop Peeling] Do not close further unroll/peel if not full peeling happens to [Loop Peeling] Do not close further unroll/peel if profile based peeling was not used.

Test is updated to show what this patch changes.

skatkov updated this revision to Diff 211690.Jul 25 2019, 1:28 AM
skatkov edited the summary of this revision. (Show Details)

Added the guard to not exceed the UnrollPeelMaxCount limit.

reames requested changes to this revision.Jul 30 2019, 11:49 AM

Having both the enable flag and the AlreadyPeeled variable is really confusing. Is there a way we could combine them? Maybe replace the boolean with the AlreadyPeeledViaProfiling count or something?

This revision now requires changes to proceed.Jul 30 2019, 11:49 AM

Having both the enable flag and the AlreadyPeeled variable is really confusing. Is there a way we could combine them? Maybe replace the boolean with the AlreadyPeeledViaProfiling count or something?

To me the flag AllowPeeling is to disable peeling at all while AlreadyPeeledViaProfiling is for disablement of part of the peeling.
In this term I would update the comment for AllowPeeling:
/ Allow peeling off loop iterations for loops with low dynamic tripcount.
to
/ Allow peeling off loop iterations.
taking into account that it is not now true (before my change as well).

If we really want to combine these options it makes sense to introduce enum with possible levels of peeling (say None, NonProfileBased, All).
Coding this enum with integer is not something really reducing the confusion.
However this enum will not be aligned with different unroll options. Also this will require some modifications in options handling - Different ways to set the desired level of peeling needs to support this new enum.

Let me know if we really want to introduce this.

Philip, please take a look at https://reviews.llvm.org/D65501 which introduces the peeling level as possible solution to handle your comment.

Philip, please take a look at https://reviews.llvm.org/D65501 which introduces the peeling level as possible solution to handle your comment.

See https://reviews.llvm.org/D65503. It is the same patch but based on D65501.

Philip, please take a look at https://reviews.llvm.org/D65501 which introduces the peeling level as possible solution to handle your comment.

See https://reviews.llvm.org/D65503. It is the same patch but based on D65501.

I personally prefer this variant.

reames accepted this revision.Aug 1 2019, 4:59 PM

LGTM.

Serguei and I spent a fair amount of time talking about this one offline. Neither of us are completely happy with the structure of this patch, but since it's using the same pattern as what was already there (just a bit more fine grained), I decide to approve this so as to unblock Serguei's other work.

He and I are planning to continuing thinking about the code structure here, and may come back with an NFC for the whole area if we can find a design that seems cleaner. Ideas welcome.

llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp
1140 ↗(On Diff #211690)

The more I look at this comment, the more it just feels wrong. Absolutely not a blocking item for this patch though!

llvm/lib/Transforms/Utils/LoopUnrollPeel.cpp
330 ↗(On Diff #211690)

Sink this debug output under your if please.

This revision is now accepted and ready to land.Aug 1 2019, 4:59 PM
This revision was automatically updated to reflect the committed changes.