This is an archive of the discontinued LLVM Phabricator instance.

[LoopPeeling] Get rid of Phis that become invariant after N steps
ClosedPublic

Authored by mkazantsev on Apr 3 2017, 11:56 AM.

Details

Summary

This patch is a generalization of the improvement introduced in rL296898.
Previously, we were able to peel one iteration of a loop to get rid of a Phi that becomes
an invariant on the 2nd iteration. In more general case, if a Phi becomes invariant after
N iterations, we can peel N times and turn it into invariant.
In order to do this, we for every Phi in loop's header we define the Invariant Depth value
which is calculated as follows:

Given %x = phi <Inputs from above the loop>, ..., [%y, %back.edge].

If %y is a loop invariant, then Depth(%x) = 1.
If %y is a Phi from the loop header, Depth(%x) = Depth(%y) + 1.
Otherwise, Depth(%x) is infinite.

Notice that if we peel a loop, all Phis with Depth = 1 become invariants,
and all other Phis with finite depth decrease the depth by 1.
Thus, peeling N first iterations allows us to turn all Phis with Depth <= N
into invariants.

Diff Detail

Repository
rL LLVM

Event Timeline

mkazantsev created this revision.Apr 3 2017, 11:56 AM
anna requested changes to this revision.Apr 4 2017, 3:35 AM

Hi Max,
Couple of comments inline. The logic as such looks correct to me.

lib/Transforms/Utils/LoopUnrollPeel.cpp
101 ↗(On Diff #93904)

You can use a SmallDenseMap here since number of phis within a single basic block (i.e. header) is generally a small number.

153 ↗(On Diff #93904)

Nit: iterations.

test/Transforms/LoopUnroll/peel-loop-not-forced.ll
90 ↗(On Diff #93904)

Could you please add a test case that exercises the logic of choosing the depth based on the code size? i.e. say we have 2 phi's each of which have maxDepth of 3 and 4, but due to the code size restriction on peeling, you need to peel by only 3.

This revision now requires changes to proceed.Apr 4 2017, 3:35 AM
mkazantsev updated this revision to Diff 94043.Apr 4 2017, 4:30 AM
mkazantsev edited edge metadata.
mkazantsev marked 3 inline comments as done.

Added a new test, fixed typo, used SmallDenseMap for storage of Phis depth.

mkazantsev updated this revision to Diff 94044.Apr 4 2017, 4:32 AM
mkazantsev planned changes to this revision.Apr 4 2017, 4:37 AM
mkazantsev added inline comments.
lib/Transforms/Utils/LoopUnrollPeel.cpp
134 ↗(On Diff #94044)

Actually this one is not needed, because if the value is present in map, we will return it early in CalculateInvariantDepth.

mkazantsev updated this revision to Diff 94050.Apr 4 2017, 4:47 AM
mkazantsev marked an inline comment as done.

Removed redundant piece of code.

sanjoy requested changes to this revision.Apr 4 2017, 9:27 AM

Minor comments inline.

lib/Transforms/Utils/LoopUnrollPeel.cpp
109 ↗(On Diff #94050)

I think the logic may be more understandable if instead of returning 0 for a loop-varying-non-header-phi %x, you return DepthInfinity where DepthInfinity is an alias for UINT_MAX. Returning 0 in these cases seems a bit odd, since given your scheme I'd expect a dept of 0 meaning that the value is loop invariant, not that it is "hopelessly loop invariant".

111 ↗(On Diff #94050)

I think this is too large to be a lambda -- can you please extract it out into a static helper?

126 ↗(On Diff #94050)

Use else if (PHINode *IncPhi = dyn_cast<PHINode>(Input)) {

137 ↗(On Diff #94050)

One TODO here would be to consider some minor binary operations as well, like:

A = PHI(0, C)
B = PHI(0, 5)
C = B + 1

Ideally the depth of A should be 2, not 0.

This revision now requires changes to proceed.Apr 4 2017, 9:27 AM
mkazantsev updated this revision to Diff 94201.Apr 5 2017, 4:45 AM
mkazantsev edited edge metadata.
mkazantsev marked 4 inline comments as done.
mkazantsev edited the summary of this revision. (Show Details)
mkazantsev added inline comments.Apr 5 2017, 8:22 AM
lib/Transforms/Utils/LoopUnrollPeel.cpp
137 ↗(On Diff #94050)

Good catch!

sanjoy requested changes to this revision.Apr 5 2017, 11:16 AM

Comments inline.

lib/Transforms/Utils/LoopUnrollPeel.cpp
50 ↗(On Diff #94201)

s/InfiniteDepth/InfiniteInvariantDepth/

84 ↗(On Diff #94201)

I should have mentioned this in the earlier revision, but do you think computeIterationsToInvariance is a better name for this function? Depth does not clarify what we're computing here very much.

If you do this, please also rename InfiniteDepth (this suggestion also invalidates the "expand Inv to Invariant" suggestion).

85 ↗(On Diff #94201)

I'd expand Inv to Invariant.

94 ↗(On Diff #94201)

This is a suggestion, feel free to ignore:

getLoopLatch isn't super-cheap (it re-discovers the latch every time). How about passing the latch explicitly to this function?

109 ↗(On Diff #94201)

s/estimate/compute an upper bound on/

138 ↗(On Diff #94201)

I don't think there is a need to explain the algorithm here (you should do that on the implementation, as you've already done). Instead renaming the helper to computeIterationsToInvariance will make the logic here more obvious.

149 ↗(On Diff #94201)

Why did you change the condition to 2 * LoopSize <= UP.Threshold?

165 ↗(On Diff #94201)

Can UP.Threshold be 0 (say it was directly set by the user by UnrollThreshold)?

168 ↗(On Diff #94201)

I don't know the rest of the code well enough to be conclusive on this, but should this be UP.PeelCount = std::max(UP.PeelCount, DesiredPeelCount);? That is, if we've earlier decided, due to some other reasons, that peeling for 5 iterations is a good idea, and here we decide that peeling for only 3 iterations is a good idea, perhaps we should still peel for 5 iterations?

Actually, even if UP.PeelCount > DesiredPeelCount is impossible, I think UP.PeelCount = std::max(UP.PeelCount, DesiredPeelCount); seems cleaner.

test/Transforms/LoopUnroll/peel-loop-not-forced.ll
60 ↗(On Diff #94201)

s/trice/thrice/

92 ↗(On Diff #94201)

s/trice/thrice/

This revision now requires changes to proceed.Apr 5 2017, 11:16 AM
mkazantsev added inline comments.Apr 5 2017, 1:38 PM
lib/Transforms/Utils/LoopUnrollPeel.cpp
149 ↗(On Diff #94201)

Because initially it was wrong. The threshold applies to the total number of iterations after peeling, not to loop's size, see line 202. If we peel once, the cost would be 2 * LoopSize.

sanjoy added inline comments.Apr 5 2017, 1:48 PM
lib/Transforms/Utils/LoopUnrollPeel.cpp
149 ↗(On Diff #94201)

Can you please fix that separately with a test?

mkazantsev added inline comments.Apr 6 2017, 3:28 AM
lib/Transforms/Utils/LoopUnrollPeel.cpp
165 ↗(On Diff #94201)

Only if loop size is also 0, we enter here witn condition 2 * LoopSize <= UP.Threshold :) So I Don't think it's real.

168 ↗(On Diff #94201)

Line 126

UP.PeelCount = 0;

It is never changed after that. The puspose of this method is to only set it once.

mkazantsev added inline comments.
lib/Transforms/Utils/LoopUnrollPeel.cpp
149 ↗(On Diff #94201)
mkazantsev marked 10 inline comments as done.Apr 6 2017, 5:49 AM
mkazantsev added inline comments.
lib/Transforms/Utils/LoopUnrollPeel.cpp
165 ↗(On Diff #94201)

Line 131 rejects empty loops:

if (!L->empty())
  return;
mkazantsev updated this revision to Diff 94368.Apr 6 2017, 7:23 AM
mkazantsev edited edge metadata.
mkazantsev marked 2 inline comments as done.

Rebase & some renames.

sanjoy accepted this revision.Apr 6 2017, 9:12 AM

lgtm with minor comments

lib/Transforms/Utils/LoopUnrollPeel.cpp
94 ↗(On Diff #94201)

Please add an assert that BackEdge is, in fact, the latch.

165 ↗(On Diff #94201)

I don't think line 131 has anything to do with LoopSize, it is just checking if there are *subloops* or not.

But I think your point on LoopSize being non-zero is correct. Can you please add an assert?

168 ↗(On Diff #94201)

SGTM.

mkazantsev marked 2 inline comments as done.Apr 17 2017, 2:34 AM
mkazantsev added inline comments.
lib/Transforms/Utils/LoopUnrollPeel.cpp
165 ↗(On Diff #94201)

Ok. Zero loop size is not allowed where it is calculated, like

// Don't allow an estimate of size zero.  This would allows unrolling of loops
// with huge iteration counts, which is a compile time problem even if it's
// not a problem for code quality. Also, the code using this size may assume
// that each loop has at least three instructions (likely a conditional
// branch, a comparison feeding that branch, and some kind of loop increment
// feeding that comparison instruction).
LoopSize = std::max(LoopSize, BEInsns + 1);

So I'll add this assert in the very beginning of the method.

This revision was automatically updated to reflect the committed changes.
mkazantsev marked an inline comment as done.

@mkazantsev hi.
I have encountered a missing peeling issue, and here is the most simplified version of the code: https://godbolt.org/g/xZgb4T
If i manually use opt, and specify -loop-unroll -unroll-force-peel-count=2, the peeling happens as i would expect.
Which i suppose means, only the analysis (calculateIterationsToInvariance()) does not support that pattern.
If i change the line 117 of this diff of that function with something like:

-  }
+  } else ToInvariance = 2u;

it also peels that case correctly, so i do think it is just an analysis problem.

Do you plan on working on the loop peeling again anytime soon?
If yes, then i guess i should not worry.
If not, any pointers? :)

Hi @lebedev.ri

This is an interesting case, however it has nothing to do with this patch. This patch is only for pattern "some Phi becomes invariant after N steps", and in your situation is "some condition becomes invariant after few steps". It is a completely different situation. In case if your condition was if (x < 60), using peeling here would be complete bizzare.

I personally don't plan to do anything in peeling in the observable future.

In LLVM, we have a pass InductiveRangeChecksElimination which is not included into clang pipeline, but it handles exactly this kind of cases. Try running opt with -irce option and see what happens.

  • Max