This is an archive of the discontinued LLVM Phabricator instance.

Differential D43876

[LoopUnroll] Peel off iterations if it makes conditions true/false.
ClosedPublic

Authored by fhahn on Feb 28 2018, 9:41 AM.

Details

Reviewers

mkuper
mkazantsev
efriedma

Commits

rGfc97b6173f2d: [LoopUnroll] Peel off iterations if it makes conditions true/false.
rL327671: [LoopUnroll] Peel off iterations if it makes conditions true/false.

Summary

If the loop body contains conditions of the form IndVar < #constant, we
can remove the checks by peeling off #constant iterations. This patch
initially starts out with supporting very simple conditions. It can be
made more powerful in follow-up commits.

This improves codegen for PR34364.
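
To make the idea in the summary concrete, here is a minimal hand-written sketch of what peeling does to a loop with a condition of the form IndVar < #constant. The functions `original` and `peeled` and the loop body are hypothetical illustrations, not code from the patch; the pass itself works on LLVM IR.

```cpp
#include <cassert>
#include <vector>

// Hypothetical loop: the check `i < 2` is evaluated on every iteration,
// although it can only be true for i == 0 and i == 1.
std::vector<int> original(int n) {
  std::vector<int> out;
  for (int i = 0; i < n; ++i) {
    if (i < 2)
      out.push_back(-1); // path guarded by the induction-variable check
    out.push_back(i);
  }
  return out;
}

// The same loop with two iterations peeled off by hand.
std::vector<int> peeled(int n) {
  std::vector<int> out;
  int i = 0;
  // Peeled iteration 0: `i < 2` is known to be true, so the check is gone.
  if (i < n) { out.push_back(-1); out.push_back(i); ++i; }
  // Peeled iteration 1: same reasoning.
  if (i < n) { out.push_back(-1); out.push_back(i); ++i; }
  // Remaining loop: `i < 2` is known to be false, so the branch is removed.
  for (; i < n; ++i)
    out.push_back(i);
  return out;
}
```

Both functions produce the same sequence for any `n`, for example `original(5) == peeled(5)`; the peeled version trades code size for a loop body without the compare.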

Diff Detail

Repository: rL LLVM

Event Timeline

fhahn created this revision. Feb 28 2018, 9:41 AM

lebedev.ri added a subscriber: lebedev.ri. Feb 28 2018, 9:44 AM

efriedma mentioned this in D43878: [LoopUnroll] Simplify induction variables after peeling too. Feb 28 2018, 11:47 AM

efriedma added inline comments. Feb 28 2018, 11:56 AM

lib/Transforms/Utils/LoopUnrollPeel.cpp
154 ↗	(On Diff #136315)	Please don't use getCanonicalInductionVariable; indvars doesn't generate canonical induction variables anymore, so this will only handle limited cases. Can you use SCEV here instead?
177 ↗	(On Diff #136315)	This doesn't check for the possibility that the induction variable could wrap... not a correctness problem, of course, but it's not clear the transform is profitable in that case.
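
The wrapping concern can be illustrated with a small stand-alone sketch. It assumes a hypothetical loop whose 8-bit induction variable starts at 0 and is incremented once per iteration; `hitsAfterPeel` is an invented helper that counts how often `i < bound` still holds after the first `peel` iterations. If the variable can wrap, the condition becomes true again later, so the compare cannot be removed from the remaining loop.

```cpp
#include <cassert>
#include <cstdint>

// Counts how often `i < bound` holds after the first `peel` iterations of a
// loop that runs `trips` times with an 8-bit induction variable starting at 0.
int hitsAfterPeel(unsigned trips, unsigned peel, std::uint8_t bound) {
  std::uint8_t i = 0;
  int hits = 0;
  for (unsigned n = 0; n < trips; ++n) {
    if (n >= peel && i < bound)
      ++hits;
    ++i; // wraps from 255 back to 0
  }
  return hits;
}
```

With 200 trips the variable never wraps and `hitsAfterPeel(200, 4, 4)` is 0, so peeling 4 iterations removes the check. With 300 trips the variable wraps once and the condition is true 4 more times, which is why peeling is still correct but no longer clearly profitable.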

fhahn added a child revision: D43878: [LoopUnroll] Simplify induction variables after peeling too. Feb 28 2018, 1:24 PM

mkazantsev added inline comments. Feb 28 2018, 9:18 PM

lib/Transforms/Utils/LoopUnrollPeel.cpp
154 ↗	(On Diff #136315)	I would also advise using SCEV here. It can help to cover more cases than just comparison against constants at no extra cost. See `ScalarEvolution::isKnownPredicate` or `ScalarEvolution::isLoopEntryGuardedByCond`.
175 ↗	(On Diff #136315)	Is it OK if you have a negative UpperBound? If you interpret it as an unsigned value, it is a very big value, and you are going to peel the maximum possible number of iterations in this case.
test/Transforms/LoopUnroll/peel-loop-conditions.ll
10 ↗	(On Diff #136315)	I don't think it's profitable unless we have proved that `%k` is positive.
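
The negative-UpperBound question can be sketched with plain integers. The two helpers below are hypothetical and do not reproduce the code in Diff #136315; they assume an induction variable that starts at 0 and counts up, and a peel count that is clamped to a maximum.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Naive rule: read the signed bound as unsigned and clamp it.
// A bound of -1 becomes 4294967295, so the clamp always wins.
unsigned naivePeelCount(std::int32_t upperBound, unsigned maxPeelCount) {
  std::uint32_t asUnsigned = static_cast<std::uint32_t>(upperBound);
  return std::min<std::uint32_t>(asUnsigned, maxPeelCount);
}

// Guarded rule: for i >= 0, `i < upperBound` is never true when the bound
// is zero or negative, so there is nothing to gain from peeling.
unsigned guardedPeelCount(std::int32_t upperBound, unsigned maxPeelCount) {
  if (upperBound <= 0)
    return 0;
  return std::min<std::uint32_t>(static_cast<std::uint32_t>(upperBound),
                                 maxPeelCount);
}
```

For a bound of 3 both rules agree, while for a bound of -1 the naive rule asks for the maximum number of peeled iterations and the guarded rule asks for none.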

mkazantsev requested changes to this revision. Feb 28 2018, 9:20 PM

This revision now requires changes to proceed. Feb 28 2018, 9:20 PM

lebedev.ri added inline comments. Mar 1 2018, 2:33 AM

test/Transforms/LoopUnroll/peel-loop-conditions.ll
75 ↗	(On Diff #136315)	What about peeling from the end (i guess will still work), and from the beginning and end?

Update to use SCEV. It now uses a rather simple approach: evaluate AddRec at the DesiredPeelCount and then try to add the recurrence step until the predicate is not known any longer. We could probably compute the number of iterations directly, but I could not find any existing function in SCEV that would handle all predicates. But we only loop for MaxPeelCount iterations at max, and MaxPeelCount should be low.
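
The stepping approach described in this update can be modeled with plain integers. The sketch below is a simplified stand-in, not the patch's code: `AffineRec` plays the role of an affine AddRec `{start,+,step}`, an ordinary `<` comparison replaces `isKnownPredicate`, and the name `peelCountForLessThan` is hypothetical. Returning the unchanged count when the predicate does not flip within the limit is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Plain-integer stand-in for an affine recurrence {start,+,step}.
struct AffineRec {
  std::int64_t start;
  std::int64_t step;
  std::int64_t at(unsigned iter) const {
    return start + step * static_cast<std::int64_t>(iter);
  }
};

// Starting from desiredPeelCount, keep adding the step while `rec < bound`
// still holds, but never go past maxPeelCount.
unsigned peelCountForLessThan(AffineRec rec, std::int64_t bound,
                              unsigned desiredPeelCount,
                              unsigned maxPeelCount) {
  assert(desiredPeelCount <= maxPeelCount);
  unsigned count = desiredPeelCount;
  std::int64_t value = rec.at(count); // evaluate the recurrence at the count
  while (count < maxPeelCount && value < bound) {
    value += rec.step; // advance by one iteration
    ++count;
  }
  // Sketch-only policy: keep the old count if the predicate never flipped.
  return value < bound ? desiredPeelCount : count;
}
```

For `{0,+,1}` compared against 2 this returns 2, and for a bound of 100 with a limit of 7 it gives up and returns the original count. The loop runs at most `maxPeelCount` times per compare, which is the bound on the work mentioned above.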

fhahn added inline comments. Mar 1 2018, 9:08 AM

test/Transforms/LoopUnroll/peel-loop-conditions.ll
75 ↗	(On Diff #136315)	We could do that as well. It would require changes to peelLoop though and should be done as a follow up patch to this one.

I believe this has lots of potential (not just for constant). What other follow-up commits are you planning on ?

In D43876#1024032, @junbuml wrote:

I believe this has lots of potential (not just for constant). What other follow-up commits are you planning on ?

I am open for suggestions. I think peeling off iterations at the end as Roman suggested might be worthwhile too. And opportunistically peeling off a few iterations, if the distance between start and end are low, but unrolling does not apply.

samparker added a subscriber: samparker. Mar 2 2018, 12:29 AM

I am open for suggestions. I think peeling off iterations at the end as Roman suggested might be worthwhile too. And opportunistically peeling off a few iterations, if the distance between start and end are low, but unrolling does not apply.

Peeling off iterations at the end might hit the case I'm trying to optimize. I will be happy to support it.

lib/Transforms/Utils/LoopUnrollPeel.cpp
169 ↗	(On Diff #136540)	Isn't it okay to remove this nullcheck for Condition ?
192 ↗	(On Diff #136540)	Do we really need this loop. Isn't it possible to find the count using getMinusSCEV(RightSCEV, LeftSCEV->getStart()) as long as getMinusSCEV(RightSCEV, LeftSCEV->getStart()) is constant ?

Peeling off iterations at the end might hit the case I'm trying to optimize. I will be happy to support it.

Great! Is there any code/examples you could share that would benefit?

lib/Transforms/Utils/LoopUnrollPeel.cpp
169 ↗	(On Diff #136540)	Ah yes, we are not dyn_cast'ing any more.
192 ↗	(On Diff #136540)	Yes we could, but we would need some logic dealing with different predicates I think. That should not be too much work and I am happy to do it, unless there is existing infrastructure I might have missed.
175 ↗	(On Diff #136315)	With using SCEV, this problem went away I think.
177 ↗	(On Diff #136315)	I think this still needs wrapping checks. I'll have to look into how to best do that with SCEV.

Great! Is there any code/examples you could share that would benefit?

Hopefully, the C code below shows the case I'm pursuing. In this loop, we know that the condition is false only in the last iteration.

void test (int* p, int* p2, int* p3, int v, int L) {
  unsigned int M = *p;
  int k;
  for (k = 1; k <= M; k++) {
    p2[k] = v;
    if (k < M) {
      p3[k] = v;
    }
  }
}
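
For illustration, peeling the last iteration of that loop by hand could look like the sketch below. It uses `std::vector` instead of raw pointers and takes `M` directly as a parameter, both of which are simplifications made here; as noted earlier in the thread, peeling from the end would need changes to peelLoop and is not part of this patch.

```cpp
#include <cassert>
#include <vector>

// Mirrors the loop above; p2 and p3 need at least M + 1 elements.
void original(std::vector<int> &p2, std::vector<int> &p3, int v, unsigned M) {
  for (unsigned k = 1; k <= M; k++) {
    p2[k] = v;
    if (k < M)
      p3[k] = v;
  }
}

// Hand-written variant with the last iteration peeled off the end.
void lastPeeled(std::vector<int> &p2, std::vector<int> &p3, int v,
                unsigned M) {
  unsigned k = 1;
  // `k < M` is now the loop condition, so the inner check is always true.
  for (; k < M; k++) {
    p2[k] = v;
    p3[k] = v;
  }
  // Peeled final iteration (k == M): the inner check is known to be false.
  if (k <= M)
    p2[k] = v;
}
```

Running both on zero-initialized vectors of size 5 with `M = 3` leaves `p2[1..3]` and `p3[1..2]` set in either version.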

Just FYI, I am also currently working on a pass to handle slightly more generic cases within the loop body, such as:

for (unsigned i = 0; i < 1000; ++i) {
  if (i < M)
    ... something
  else
    ... something else
}

My transformation operates on the unrolled version and also attempts to remove selects if the loop body has been simplified into one block. Hopefully I will be able to get an initial version up for review and discussion next week.

In D43876#1031076, @samparker wrote:
Just FYI, I am also currently working on a pass to handle slightly more generic cases within the loop body, such as:

for (unsigned i = 0; i < 1000; ++i) {
  if (i < M)
    ... something
  else
    ... something else
}

My transformation operates on the unrolled version and also attempts to remove selects if the loop body has been simplified into one block. Hopefully I will be able to get an initial version up for review and discussion next week.

Interesting. What transformation is this pass doing? Loop peeling should handle cases like that for constant bounds. Non-constant bounds still won't be peeled, so it should not interfere with your transformation.

Yes, this is a really nice approach for lower constant bounds. My transform selects a region of conditionally executed blocks and then attempts to hoist conditional statements into the head block of that region. Currently I can find loop unrolled induction variable idioms as well as finding a connected path of conditional values. From there I can produce a fast path free of compares and branches, as well as leaving the original code as a fallback. I also split and remove selects because performing conditional moves is still expensive on our small cores as well as the increased register pressure they create.

My transform selects a region of conditionally executed blocks and then attempts to hoist conditional statements into the head block of that region.

Will your pass merge some blocks under same condition into a group in the loop? I'm curious if the conditional branch is removed in the loop?

Just FYI, I am also currently working on a pass to handle slightly more generic cases within the loop body, such as:
for (unsigned i = 0; i < 1000; ++i) {
  if (i < M)
    ... something
  else
    ... something else
}
My transformation operates on the unrolled version and also attempts to remove selects if the loop body has been simplified into one block. Hopefully I will be able to get an initial version up for review and discussion next week.

I'm not sure if I fully understand what your pass will do. It would be great if you can show how this code would be transformed after your pass ?

To remove the conditional branch in general, I think we can split the loop like :

Min = min(1000, M);
unsigned i;
for (i = 0; i < Min; ++i) {
    ... something
}

for (; i < 1000; ++i) {
    ... something else
}

@samparker, will your pass perform something like this or something else ?

I believe peeling is a special case of such loop splitting, but peeling itself is worthwhile to do for a low constant bound.
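
A runnable version of the loop splitting idea might look like the sketch below. The bodies ("something" and "something else") are replaced by hypothetical placeholders that append 1 or 2 to a vector so the two versions can be compared; nothing here is taken from an actual pass.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Original form: the check `i < M` runs on every iteration.
std::vector<int> original(unsigned M) {
  std::vector<int> out;
  for (unsigned i = 0; i < 1000; ++i) {
    if (i < M)
      out.push_back(1); // "something"
    else
      out.push_back(2); // "something else"
  }
  return out;
}

// Split form: two loops, neither of which contains the check.
std::vector<int> split(unsigned M) {
  std::vector<int> out;
  const unsigned Min = std::min(1000u, M);
  unsigned i = 0;
  for (; i < Min; ++i)
    out.push_back(1); // "something"
  for (; i < 1000; ++i)
    out.push_back(2); // "something else"
  return out;
}
```

Unlike peeling, this works for a non-constant `M` and does not duplicate the body `M` times, at the cost of a second loop.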

mcrosier added a subscriber: mcrosier. Mar 8 2018, 11:51 AM

fhahn added inline comments. Mar 8 2018, 2:05 PM

lib/Transforms/Utils/LoopUnrollPeel.cpp
192 ↗	(On Diff #136540)	I had a closer look today. By using isKnownPredicate, we can also easily handle cases like the one below, where we compare 2 SCEVAddRecExprs. Given that MaxPeelCount should be quite small and we only iterate at most MaxPeelCount times for each compare instruction, I am not sure if it's worth making things more complicated. It might be worth adding a "fast-path" for the case where we are comparing a constant bound with an AddRecExpr with a known integer constant. What do you think?

for.body.lr.ph:
  br label %for.body

for.body:
  %i.05 = phi i32 [ 0, %for.body.lr.ph ], [ %inc, %for.inc ]
  %j = phi i32 [ 2, %for.body.lr.ph ], [ %j.inc, %for.inc ]
  %cmp1 = icmp ult i32 %i.05, %j
  br i1 %cmp1, label %if.then, label %for.inc

if.then:
  call void @f1()
  br label %for.inc

for.inc:
  %inc = add nsw i32 %i.05, 2
  %j.inc = add nsw i32 %j, 1
  %cmp = icmp slt i32 %inc, %k
  br i1 %cmp, label %for.body, label %for.end

for.end:
  ret void
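
The two-recurrence case from this comment can be checked with a small integer model. The sketch assumes the start and step values from the IR in the comment (`%i.05` starts at 0 and steps by 2, `%j` starts at 2 and steps by 1) and uses a plain unsigned `<` in place of `isKnownPredicate`; `leadingTrueIterations` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>

// Counts how many leading iterations satisfy `i <u j` when both sides are
// affine recurrences, giving up after maxPeelCount steps.
unsigned leadingTrueIterations(std::uint32_t iStart, std::uint32_t iStep,
                               std::uint32_t jStart, std::uint32_t jStep,
                               unsigned maxPeelCount) {
  std::uint32_t i = iStart, j = jStart;
  unsigned count = 0;
  while (count < maxPeelCount && i < j) {
    i += iStep; // advance both recurrences by one iteration
    j += jStep;
    ++count;
  }
  return count; // equals maxPeelCount if the condition never flipped
}
```

With the values above the pairs are (0, 2), (2, 3), (4, 4), so the compare is true for exactly two iterations and `leadingTrueIterations(0, 2, 2, 1, 7)` returns 2.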

junbuml added inline comments. Mar 9 2018, 10:44 AM

lib/Transforms/Utils/LoopUnrollPeel.cpp
192 ↗	(On Diff #136540)	Given that MaxPeelCount is a small constant, I don't see any big burden in it. A fast path sounds good to me.

mkazantsev requested changes to this revision. Mar 11 2018, 10:12 PM

mkazantsev added inline comments.

lib/Transforms/Utils/LoopUnrollPeel.cpp
173 ↗	(On Diff #136540)	What was the point of moving this `if`? Could we not just update DesiredPeelCount before this line? We only need MaxPeelCount under this condition, there is no point in calculating it before it.
184 ↗	(On Diff #136540)	After all this logic, could you please create a variable like `LeftAR = cast<SCEVAddRecExpr>(LeftSCEV)` to make it explicitly clear what you expect to see next? I also suggest bailing out if `LeftAR->isAffine()` is `false` because it will lead you to huge SCEV computations in the loop below.
187 ↗	(On Diff #136540)	Any good reason why it is 16 bit?
189 ↗	(On Diff #136540)	You could have used `SE.getConstant(LeftSCEV->getType(), DesiredPeelCount)` and no need to declare `C` for this.
190 ↗	(On Diff #136540)	I totally don't get this logic. Are we going to prove _another_ predicate if SCEV was unable to prove something for this one? Could you please add a comment on what is going on here?
194 ↗	(On Diff #136540)	Step calculation can be hoisted out of this loop. I would also suggest bailing early if AR is not affine because adding a step of non-affine AddRec many times can produce really big and ugly SCEVs.

This revision now requires changes to proceed. Mar 11 2018, 10:12 PM

mkazantsev added subscribers: skatkov, reames. Mar 12 2018, 12:51 AM

@mkazantsev thank you very much for having a look. I hope I addressed your feedback. Are you concerned about having the loop that evaluates the predicate at the current iteration?

fhahn added inline comments. Mar 12 2018, 11:29 AM

lib/Transforms/Utils/LoopUnrollPeel.cpp
173 ↗	(On Diff #136540)	MaxPeelCount is passed to `countToEliminateCompares`, to limit the maximum iterations we do.
194 ↗	(On Diff #136540)	I've added a comment that makes it clearer I hope. The idea is to handle cases like below, where the condition is known to be false initially. Initially `i > 2` is not known, but the inverse `i <= 2` is known.

if (i > 2) {
  // do something
} else {
  // do something else
}

wcxf added a subscriber: wcxf. Mar 12 2018, 8:00 PM

mkazantsev added inline comments. Mar 12 2018, 10:01 PM

lib/Transforms/Utils/LoopUnrollPeel.cpp
173 ↗	(On Diff #136540)	My bad, I didn't notice it. :)
194 ↗	(On Diff #136540)	I see the point now, but you are using the wrong function. If you look into `getSwappedPredicate`, it returns `<` for `>`, and what you need is called `getInversePredicate`.

Please replace getSwappedPredicate with getInversePredicate; the rest looks fine now.
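
The difference between the two functions is easy to get wrong, so here is a small self-contained model of it. The `Pred` enum and the `eval`, `swapped`, and `inverse` helpers are hypothetical stand-ins that only cover four signed comparisons; they mirror the behavior of `getSwappedPredicate` and `getInversePredicate` as described in the comment above.

```cpp
#include <cassert>

enum class Pred { LT, LE, GT, GE };

bool eval(Pred p, int a, int b) {
  switch (p) {
  case Pred::LT: return a < b;
  case Pred::LE: return a <= b;
  case Pred::GT: return a > b;
  case Pred::GE: return a >= b;
  }
  return false;
}

// Swapped: same result once the operands are exchanged (a > b  <=>  b < a).
Pred swapped(Pred p) {
  switch (p) {
  case Pred::LT: return Pred::GT;
  case Pred::LE: return Pred::GE;
  case Pred::GT: return Pred::LT;
  case Pred::GE: return Pred::LE;
  }
  return p;
}

// Inverse: logical negation with the operands kept (!(a > b)  <=>  a <= b).
Pred inverse(Pred p) {
  switch (p) {
  case Pred::LT: return Pred::GE;
  case Pred::LE: return Pred::GT;
  case Pred::GT: return Pred::LE;
  case Pred::GE: return Pred::LT;
  }
  return p;
}
```

For `i > 2` evaluated at `i == 2`, the inverse predicate `i <= 2` is true, which is the fact the peeling logic wants to prove, while the swapped predicate `i < 2` is false. The two only disagree at equality, which makes the mix-up easy to miss in tests.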

This revision now requires changes to proceed. Mar 12 2018, 10:03 PM

Thanks. Updated to use getInversePredicate

lib/Transforms/Utils/LoopUnrollPeel.cpp
194 ↗	(On Diff #136540)	Right, my bad, thank you very much :)

mkazantsev added inline comments. Mar 13 2018, 9:34 PM

lib/Transforms/Utils/LoopUnrollPeel.cpp
182 ↗	(On Diff #138144)	Why do we bail if both left and right are AddRecs? What is the problem? If there are no conceptual troubles with handling this case, I'd rather handle it. And if so, please add a corresponding test.

mkazantsev added inline comments. Mar 13 2018, 9:55 PM

lib/Transforms/Utils/LoopUnrollPeel.cpp
173 ↗	(On Diff #138144)	I think you should also check `isKnownPredicate` for `LeftSCEV`, `RightSCEV` at this point and bail if it is trivially known true or false. In this case it makes more sense to eliminate this comparison than try to peel something out.

I think it is a bug to be fixed.

lib/Transforms/Utils/LoopUnrollPeel.cpp
185 ↗	(On Diff #138144)	Bail if LeftAR->getLoop() != L.

This revision now requires changes to proceed. Mar 13 2018, 10:31 PM

Thanks again! I've added a test case to make sure we do not peel the inner loop if the condition depends on ARs of the outer loop.

fhahn added inline comments. Mar 14 2018, 12:57 PM

lib/Transforms/Utils/LoopUnrollPeel.cpp
182 ↗	(On Diff #138144)	I don't think there are conceptual problems, although it will make things slightly more complicated; I think we would need some more checks and also evaluate the second AR, if we have one. I have a test case, but I would prefer to add that in a follow-up patch, if that is ok with you?

LGTM. I would rather prefer removing the restriction on RightSCEV being AR in this patch and then making a follow-up if you need some extra logic for it, but it's up to you.

lib/Transforms/Utils/LoopUnrollPeel.cpp
182 ↗	(On Diff #138144)	We could just remove this "continue" check without adding extra logic for right AR. I mean, we only care that LeftSCEV is an AR and we don't care what RightSCEV actually is. For example, in a loop:

int i = 1, j = 1;
while (true) {
  i++;
  j = (j /s (i + 1));
}

SCEV of `i`'s Phi will be a SCEVAddRec and SCEV of `j`'s Phi will be SCEVUnknown, and for some reason you allow your optimization when `RightSCEV = j` and explicitly prohibit it when `RightSCEV = i`. I wonder how `i` is more complicated than `j`? :) I'm OK if you remove this check in a follow-up patch or right in this one, whatever you like more.

This revision is now accepted and ready to land. Mar 14 2018, 9:30 PM

Thanks again for having a look. I'll drop the bail out and add a test before I commit the change.

lib/Transforms/Utils/LoopUnrollPeel.cpp
182 ↗	(On Diff #138144)	Ah yes. I removed the check now, as it won't do anything wrong. What I meant is that I think we could do better than just dropping the bail out. That's what I plan to do in a follow-up patch.

Drop bailout

Rebased. I had to add -unroll-peel-max-count=0 to the new test case added in D43931, as loop peeling now covers that case too.

Closed by commit rL327671: [LoopUnroll] Peel off iterations if it makes conditions true/false. (authored by fhahn). Mar 15 2018, 2:37 PM

This revision was automatically updated to reflect the committed changes.

Hi Florian,
We identified a 2.15% regression in SPEC2006/h264ref due to this commit. After this change, the FullPelBlockMotionBiPred() function is no longer inlined into the hottest function, BlockMotionSearch(). Previous to this change, the function was inlined because there was a single callsite in the entire program (known only when compiling in LTO) and the original definition could be removed after inlining. However, after loop peeling the callsite of FullPelBlockMotionBiPred() is replicated, which prevents inlining.

I was wondering if we could avoid peeling in this case until we have some type of cost model that can determine if peeling would prevent inlining. Also, after looking at the code (which I can't share here) you might also notice that the amount of code being peeled in this case is fairly large relative to the amount of code being removed from the loop. It might also make sense to have a heuristic that takes code size into consideration when peeling, if that hasn't already been done.

Thoughts?

Chad

Hi Chad,

In D43876#1045969, @mcrosier wrote:

Hi Florian,
We identified a 2.15% regression in SPEC2006/h264ref due to this commit. After this change, the FullPelBlockMotionBiPred() function is no longer inlined into the hottest function, BlockMotionSearch(). Previous to this change, the function was inlined because there was a single callsite in the entire program (known only when compiling in LTO) and the original definition could be removed after inlining. However, after loop peeling the callsite of FullPelBlockMotionBiPred() is replicated, which prevents inlining.

I was wondering if we could avoid peeling in this case until we have some type of cost model that can determine if peeling would prevent inlining. Also, after looking at the code (which I can't share here) you might also notice that the amount of code being peeled in this case is fairly large relative to the amount of code being removed from the loop. It might also make sense to have a heuristic that takes code size into consideration when peeling, if that hasn't already been done.

Thoughts?

Thanks for making me aware of this, I originally thought considering MaxPeelCount should help us avoid those cases.

I will follow this up with patches in the following days. I think there are a couple of things that can be done to make the peeling more conservative for now. First, only peel if we can prove that the condition is known to be true in the peeled part and false in the loop (or vice versa). Otherwise we cannot simplify the loop body and peeling is likely not very beneficial. Second, have a simple cost function that takes the size of the loop body vs the eliminated instructions into account.

Also, D43878, which enables induction variable simplification after peeling is not committed yet, so currently the loop body may not be simplified after peeling, even if it could be.

Cheers,
Florian

In D43876#1046222, @fhahn wrote:

Hi Chad,

In D43876#1045969, @mcrosier wrote:

Hi Florian,
We identified a 2.15% regression in SPEC2006/h264ref due to this commit. After this change, the FullPelBlockMotionBiPred() function is no longer inlined into the hottest function, BlockMotionSearch(). Previous to this change, the function was inlined because there was a single callsite in the entire program (known only when compiling in LTO) and the original definition could be removed after inlining. However, after loop peeling the callsite of FullPelBlockMotionBiPred() is replicated, which prevents inlining.

I was wondering if we could avoid peeling in this case until we have some type of cost model that can determine if peeling would prevent inlining. Also, after looking at the code (which I can't share here) you might also notice that the amount of code being peeled in this case is fairly large relative to the amount of code being removed from the loop. It might also make sense to have a heuristic that takes code size into consideration when peeling, if that hasn't already been done.

Thoughts?

Thanks for making me aware of this, I originally thought considering MaxPeelCount should help us avoid those cases.

I will follow this up with patches in the following days. I think there are a couple of things that can be done to make the peeling more conservative for now. First, only peel if we can prove that the condition is known to be true in the peeled part and false in the loop (or vice versa). Otherwise we cannot simplify the loop body and peeling is likely not very beneficial. Second, have a simple cost function that takes the size of the loop body vs the eliminated instructions into account.

Also, D43878, which enables induction variable simplification after peeling is not committed yet, so currently the loop body may not be simplified after peeling, even if it could be.

Cheers,
Florian

You're welcome, Florian. And thanks for following up on this (and all the great work you're doing here)!

@mcrosier I've submitted D44983 for review. It prevents peeling, if we cannot simplify the loop body after peeling. Peeling if we cannot simplify the loop body afterwards is likely not beneficial. It would be great if you could check if that helps in your case. If it's not easy for you to check, I can try and test it myself.

Coming up with some additional heuristics, e.g. based on the number of instructions peeled and not eliminated, should be possible, but without knowing the inlining situation we would probably have to choose a rather arbitrary threshold.

a.elovikov added a subscriber: a.elovikov. Apr 3 2018, 3:28 AM

In D43876#1050460, @fhahn wrote:

@mcrosier I've submitted D44983 for review. It prevents peeling, if we cannot simplify the loop body after peeling. Peeling if we cannot simplify the loop body afterwards is likely not beneficial. It would be great if you could check if that helps in your case. If it's not easy for you to check, I can try and test it myself.

Coming up with some additional heuristics, e.g. based on the number of instructions peeled and not eliminated, should be possible, but without knowing the inlining situation we would probably have to choose a rather arbitrary threshold.

Sure, I'll take a look now and let you know! Sorry I didn't see this sooner.

In D43876#1057247, @mcrosier wrote:

In D43876#1050460, @fhahn wrote:

@mcrosier I've submitted D44983 for review. It prevents peeling, if we cannot simplify the loop body after peeling. Peeling if we cannot simplify the loop body afterwards is likely not beneficial. It would be great if you could check if that helps in your case. If it's not easy for you to check, I can try and test it myself.

Coming up with some additional heuristics, e.g. based on the number of instructions peeled and not eliminated, should be possible, but without knowing the inlining situation we would probably have to choose a rather arbitrary threshold.

Sure, I'll take a look now and let you know! Sorry I didn't see this sooner.

Unfortunately, D44983 does not fix this case. I'm going to dig into this now. I'll update you once I have some additional findings.

Thanks for letting me know, so it looks like we need better size based heuristics as well. I'll take a closer look at what happens in FullPelBlockMotionBiPred soon.

In D43876#1057409, @mcrosier wrote:

In D43876#1057247, @mcrosier wrote:

In D43876#1050460, @fhahn wrote:

@mcrosier I've submitted D44983 for review. It prevents peeling, if we cannot simplify the loop body after peeling. Peeling if we cannot simplify the loop body afterwards is likely not beneficial. It would be great if you could check if that helps in your case. If it's not easy for you to check, I can try and test it myself.

Coming up with some additional heuristics, e.g. based on the number of instructions peeled and not eliminated, should be possible, but without knowing the inlining situation we would probably have to choose a rather arbitrary threshold.

Sure, I'll take a look now and let you know! Sorry I didn't see this sooner.

Unfortunately, D44983 does not fix this case. I'm going to dig into this now. I'll update you once I have some additional findings.

I had a closer look at FullPelBlockMotionBiPred and we peeled off an iteration because we have something like

if (i % 2) {
  ...
  if (i != 0) {...}
 ...
}

Peeling based on such nested conditionals is likely to increase the code size too much compared to the benefit. D45374 only considers conditions in blocks that are executed on every iteration.

In D43876#1059839, @fhahn wrote:
In D43876#1057409, @mcrosier wrote:

In D43876#1057247, @mcrosier wrote:

In D43876#1050460, @fhahn wrote:

@mcrosier I've submitted D44983 for review. It prevents peeling, if we cannot simplify the loop body after peeling. Peeling if we cannot simplify the loop body afterwards is likely not beneficial. It would be great if you could check if that helps in your case. If it's not easy for you to check, I can try and test it myself.

Coming up with some additional heuristics, e.g. based on the number of instructions peeled and not eliminated, should be possible, but without knowing the inlining situation we would probably have to choose a rather arbitrary threshold.

Sure, I'll take a look now and let you know! Sorry I didn't see this sooner.

Unfortunately, D44983 does not fix this case. I'm going to dig into this now. I'll update you once I have some additional findings.

I had a closer look at FullPelBlockMotionBiPred and we peeled off an iteration because we have something like
if (i % 2) {
  ...
  if (i != 0) {...}
 ...
}
Peeling based on such nested conditionals is likely to increase the code size too much compared to the benefit. D45374 only considers conditions in blocks that are executed on every iteration.

IIUC, D45374 should fix the regression in h264ref we're seeing (but I haven't tested it yet). However, I wanted to share with you my findings. Currently, loop peeling will not peel a loop if it includes a function call that is likely to be inlined (i.e., is not marked with a noinline attribute, has internal linkage and has a single use). This is exactly the case we're dealing with here except FullPelBlockMotionBiPred isn't marked as internal until the LTO phase of compilation. Thus, one possible approach would be to defer peeling until the LTO phase. After r329392, this can be accomplished with a small change to the pass manager:

diff --git a/lib/Transforms/IPO/PassManagerBuilder.cpp b/lib/Transforms/IPO/PassManagerBuilder.cpp
index 1def2e8..b06ca2a 100644
--- a/lib/Transforms/IPO/PassManagerBuilder.cpp
+++ b/lib/Transforms/IPO/PassManagerBuilder.cpp
@@ -651,7 +651,10 @@ void PassManagerBuilder::populateModulePassManager(
   addInstructionCombiningPass(MPM);
 
   if (!DisableUnrollLoops) {
-    MPM.add(createLoopUnrollPass(OptLevel));    // Unroll small loops
+    if (PrepareForLTO)
+      MPM.add(createSimpleLoopUnrollPass(OptLevel));    // Unroll small loops
+    else
+      MPM.add(createLoopUnrollPass(OptLevel));    // Unroll small loops
 
     // LoopUnroll may generate some redundency to cleanup.
     addInstructionCombiningPass(MPM);

However, I haven't run any performance tests (but I do know it fixes the h264ref regression), nor do I know if the community would be interested in such an approach.

A more targeted approach might be to only prevent peeling of the loop (rather than disable peeling entirely) during the first phase of LTO if the loop has any calls.

Herald added a subscriber: zzheng. Apr 6 2018, 10:40 AM

This is exactly the case we're dealing with here except FullPelBlockMotionBiPred isn't marked as internal until the LTO phase of compilation. Thus, one possible approach would be to defer peeling until the LTO phase. After r329392, this can be accomplished with a small change to the pass manager:

ThinLTO doesn't run vectorization or unrolling before link-time; among other reasons, it avoids problems like this. You might want to consider making non-thin LTO work the same way.

In D43876#1060032, @efriedma wrote:

This is exactly the case we're dealing with here except FullPelBlockMotionBiPred isn't marked as internal until the LTO phase of compilation. Thus, one possible approach would be to defer peeling until the LTO phase. After r329392, this can be accomplished with a small change to the pass manager:

ThinLTO doesn't run vectorization or unrolling before link-time; among other reasons, it avoids problems like this. You might want to consider making non-thin LTO work the same way.

Thanks for the suggestion, Eli. Given this is already the behavior of ThinLTO, I'm going to investigate this further.

In D43876#1030514, @junbuml wrote:

I am open for suggestions. I think peeling off iterations at the end as Roman suggested might be worthwhile too. And opportunistically peeling off a few iterations, if the distance between start and end are low, but unrolling does not apply.

Peeling off iterations at the end might hit the case I'm trying to optimize. I will be happy to support it.

Came across this patch and curious, is "Peeling off iterations at the end" supposed to work by now (if so, I don't see it)?

Herald added projects: Restricted Project, Restricted Project. Oct 19 2022, 10:36 AM

Revision Contents

Path	Size
llvm/trunk/include/llvm/Transforms/Utils/UnrollLoop.h	2 lines
llvm/trunk/lib/Transforms/Scalar/LoopUnrollPass.cpp	2 lines
llvm/trunk/lib/Transforms/Utils/LoopUnrollPeel.cpp	93 lines
llvm/trunk/test/Transforms/LoopUnroll/complete_unroll_profitability_with_assume.ll	2 lines
llvm/trunk/test/Transforms/LoopUnroll/peel-loop-conditions.ll	613 lines

Diff 138631

llvm/trunk/include/llvm/Transforms/Utils/UnrollLoop.h

 bool UnrollRuntimeLoopRemainder(Loop *L, unsigned Count,
                                 bool UseEpilogRemainder, bool UnrollRemainder,
                                 LoopInfo *LI,
                                 ScalarEvolution *SE, DominatorTree *DT,
                                 AssumptionCache *AC,
                                 bool PreserveLCSSA);
 
 void computePeelCount(Loop *L, unsigned LoopSize,
                       TargetTransformInfo::UnrollingPreferences &UP,
-                      unsigned &TripCount);
+                      unsigned &TripCount, ScalarEvolution &SE);
 
 bool peelLoop(Loop *L, unsigned PeelCount, LoopInfo *LI, ScalarEvolution *SE,
               DominatorTree *DT, AssumptionCache *AC, bool PreserveLCSSA);
 
 MDNode *GetUnrollMetadata(MDNode *LoopID, StringRef Name);
 
 } // end namespace llvm
 
 #endif // LLVM_TRANSFORMS_UTILS_UNROLLLOOP_H

llvm/trunk/lib/Transforms/Scalar/LoopUnrollPass.cpp

         if (getUnrolledLoopSize(LoopSize, UP) < UP.Threshold) {
           TripMultiple = UP.UpperBound ? 1 : TripMultiple;
           return ExplicitUnroll;
         }
       }
     }
   }
 
   // 4th priority is loop peeling
-  computePeelCount(L, LoopSize, UP, TripCount);
+  computePeelCount(L, LoopSize, UP, TripCount, SE);
   if (UP.PeelCount) {
     UP.Runtime = false;
     UP.Count = 1;
     return ExplicitUnroll;
   }
 
   // 5th priority is partial unrolling.
   // Try partial unroll only when TripCount could be staticaly calculated.

llvm/trunk/lib/Transforms/Utils/LoopUnrollPeel.cpp

Show All 14 Lines

#include "llvm/ADT/DenseMap.h"		#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/Optional.h"		#include "llvm/ADT/Optional.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/Statistic.h"		#include "llvm/ADT/Statistic.h"
#include "llvm/Analysis/LoopInfo.h"		#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Analysis/LoopIterator.h"		#include "llvm/Analysis/LoopIterator.h"
#include "llvm/Analysis/ScalarEvolution.h"		#include "llvm/Analysis/ScalarEvolution.h"
		#include "llvm/Analysis/ScalarEvolutionExpressions.h"
#include "llvm/Analysis/TargetTransformInfo.h"		#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/IR/BasicBlock.h"		#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Dominators.h"		#include "llvm/IR/Dominators.h"
#include "llvm/IR/Function.h"		#include "llvm/IR/Function.h"
#include "llvm/IR/InstrTypes.h"		#include "llvm/IR/InstrTypes.h"
#include "llvm/IR/Instruction.h"		#include "llvm/IR/Instruction.h"
#include "llvm/IR/Instructions.h"		#include "llvm/IR/Instructions.h"
#include "llvm/IR/LLVMContext.h"		#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/MDBuilder.h"		#include "llvm/IR/MDBuilder.h"
#include "llvm/IR/Metadata.h"		#include "llvm/IR/Metadata.h"
		#include "llvm/IR/PatternMatch.h"
#include "llvm/Support/Casting.h"		#include "llvm/Support/Casting.h"
#include "llvm/Support/CommandLine.h"		#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
#include "llvm/Transforms/Utils/BasicBlockUtils.h"		#include "llvm/Transforms/Utils/BasicBlockUtils.h"
#include "llvm/Transforms/Utils/Cloning.h"		#include "llvm/Transforms/Utils/Cloning.h"
#include "llvm/Transforms/Utils/LoopSimplify.h"		#include "llvm/Transforms/Utils/LoopSimplify.h"
#include "llvm/Transforms/Utils/LoopUtils.h"		#include "llvm/Transforms/Utils/LoopUtils.h"
#include "llvm/Transforms/Utils/UnrollLoop.h"		#include "llvm/Transforms/Utils/UnrollLoop.h"
#include "llvm/Transforms/Utils/ValueMapper.h"		#include "llvm/Transforms/Utils/ValueMapper.h"
#include <algorithm>		#include <algorithm>
#include <cassert>		#include <cassert>
#include <cstdint>		#include <cstdint>
#include <limits>		#include <limits>

using namespace llvm;		using namespace llvm;
		using namespace llvm::PatternMatch;

#define DEBUG_TYPE "loop-unroll"		#define DEBUG_TYPE "loop-unroll"

STATISTIC(NumPeeled, "Number of loops peeled");		STATISTIC(NumPeeled, "Number of loops peeled");

static cl::opt<unsigned> UnrollPeelMaxCount(		static cl::opt<unsigned> UnrollPeelMaxCount(
"unroll-peel-max-count", cl::init(7), cl::Hidden,		"unroll-peel-max-count", cl::init(7), cl::Hidden,
cl::desc("Max average trip count which will cause loop peeling."));		cl::desc("Max average trip count which will cause loop peeling."));
▲ Show 20 Lines • Show All 74 Lines • ▼ Show 20 Lines	static unsigned calculateIterationsToInvariance(
}		}

// If we found that this Phi lies in an invariant chain, update the map.		// If we found that this Phi lies in an invariant chain, update the map.
if (ToInvariance != InfiniteIterationsToInvariance)		if (ToInvariance != InfiniteIterationsToInvariance)
IterationsToInvariance[Phi] = ToInvariance;		IterationsToInvariance[Phi] = ToInvariance;
return ToInvariance;		return ToInvariance;
}		}

		// Return the number of iterations to peel off that make conditions in the
		// body true/false. For example, if we peel 2 iterations off the loop below,
		// the condition i < 2 can be evaluated at compile time.
		//   for (i = 0; i < n; i++) {
		//     if (i < 2)
		//       ..
		//     else
		//       ..
		//   }
		static unsigned countToEliminateCompares(Loop &L, unsigned MaxPeelCount,
		ScalarEvolution &SE) {
		assert(L.isLoopSimplifyForm() && "Loop needs to be in loop simplify form");
		unsigned DesiredPeelCount = 0;

		for (auto *BB : L.blocks()) {
		auto *BI = dyn_cast<BranchInst>(BB->getTerminator());
		if (!BI || BI->isUnconditional())
		continue;

		// Ignore loop exit condition.
		if (L.getLoopLatch() == BB)
		continue;

		Value *Condition = BI->getCondition();
		Value *LeftVal, *RightVal;
		CmpInst::Predicate Pred;
		if (!match(Condition, m_ICmp(Pred, m_Value(LeftVal), m_Value(RightVal))))
		continue;

		const SCEV *LeftSCEV = SE.getSCEV(LeftVal);
		const SCEV *RightSCEV = SE.getSCEV(RightVal);

		// Do not consider predicates that are known to be true or false
		// independently of the loop iteration.
		if (SE.isKnownPredicate(Pred, LeftSCEV, RightSCEV) ||
		SE.isKnownPredicate(ICmpInst::getInversePredicate(Pred), LeftSCEV,
		RightSCEV))
		continue;

		// Check if we have a condition with one AddRec and one non AddRec
		// expression. Normalize LeftSCEV to be the AddRec.
		if (!isa<SCEVAddRecExpr>(LeftSCEV)) {
		if (isa<SCEVAddRecExpr>(RightSCEV)) {
		std::swap(LeftSCEV, RightSCEV);
		Pred = ICmpInst::getSwappedPredicate(Pred);
		} else
		continue;
		}

		const SCEVAddRecExpr *LeftAR = cast<SCEVAddRecExpr>(LeftSCEV);

		// Avoid huge SCEV computations in the loop below and make sure we only
		// consider AddRecs of the loop we are trying to peel.
		if (!LeftAR->isAffine() || LeftAR->getLoop() != &L)
		continue;

		// Check if extending DesiredPeelCount lets us evaluate Pred.
		const SCEV *IterVal = LeftAR->evaluateAtIteration(
		SE.getConstant(LeftSCEV->getType(), DesiredPeelCount), SE);

		// If the original condition is not known, get the negated predicate
		// (which holds on the else branch) and check if it is known. This allows
		// us to peel off iterations that make the original condition false.
		if (!SE.isKnownPredicate(Pred, IterVal, RightSCEV))
		Pred = ICmpInst::getInversePredicate(Pred);

		const SCEV *Step = LeftAR->getStepRecurrence(SE);
		while (DesiredPeelCount < MaxPeelCount &&
		SE.isKnownPredicate(Pred, IterVal, RightSCEV)) {
		IterVal = SE.getAddExpr(IterVal, Step);
		DesiredPeelCount++;
		}
		}

		return DesiredPeelCount;
		}
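
The search in countToEliminateCompares can be modelled without SCEV. The following is a minimal sketch, assuming every compare has the form `IV PRED Bound` with constant `Start`, `Step` and `Bound`. The names `Pred`, `Compare`, `holds` and `countToEliminateComparesSketch` are hypothetical and are not part of the patch; real SCEV reasoning covers non-constant operands and only peels when a predicate is provably known.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified model: each compare is `IV PRED Bound`, where IV is
// the affine recurrence {Start,+,Step} of the loop being peeled.
enum class Pred { LT, GT };

struct Compare {
  Pred P;
  int64_t Start;
  int64_t Step;
  int64_t Bound;
};

static bool holds(Pred P, int64_t LHS, int64_t RHS) {
  return P == Pred::LT ? LHS < RHS : LHS > RHS;
}

unsigned countToEliminateComparesSketch(const std::vector<Compare> &Compares,
                                        unsigned MaxPeelCount) {
  unsigned DesiredPeelCount = 0;
  for (const Compare &C : Compares) {
    // Value of the induction variable at the first iteration not yet peeled.
    int64_t IterVal = C.Start + C.Step * static_cast<int64_t>(DesiredPeelCount);
    // Outcome of the compare at that iteration. Tracking a `false` outcome
    // plays the role of switching to the inverse predicate in the patch.
    const bool Outcome = holds(C.P, IterVal, C.Bound);
    // Keep extending the peel count while the outcome stays the same.
    while (DesiredPeelCount < MaxPeelCount &&
           holds(C.P, IterVal, C.Bound) == Outcome) {
      IterVal += C.Step;
      ++DesiredPeelCount;
    }
  }
  return DesiredPeelCount;
}
```

Under these assumptions, a single compare `i < 2` on `{0,+,1}` yields 2, the pair `i < 2`, `i < 4` yields 4, `i > 2` yields 3, and `i > 9999` is capped at the maximum, which matches the number of peeled iterations in test1 through test4 of peel-loop-conditions.ll.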

// Return the number of iterations we want to peel off.		// Return the number of iterations we want to peel off.
void llvm::computePeelCount(Loop *L, unsigned LoopSize,		void llvm::computePeelCount(Loop *L, unsigned LoopSize,
TargetTransformInfo::UnrollingPreferences &UP,		TargetTransformInfo::UnrollingPreferences &UP,
unsigned &TripCount) {		unsigned &TripCount, ScalarEvolution &SE) {
assert(LoopSize > 0 && "Zero loop size is not allowed!");		assert(LoopSize > 0 && "Zero loop size is not allowed!");
UP.PeelCount = 0;		UP.PeelCount = 0;
if (!canPeel(L))		if (!canPeel(L))
return;		return;

// Only try to peel innermost loops.		// Only try to peel innermost loops.
if (!L->empty())		if (!L->empty())
return;		return;
Show All 14 Lines	if (2 * LoopSize <= UP.Threshold && UnrollPeelMaxCount > 0) {
assert(BackEdge && "Loop is not in simplified form?");		assert(BackEdge && "Loop is not in simplified form?");
for (auto BI = L->getHeader()->begin(); isa<PHINode>(&*BI); ++BI) {		for (auto BI = L->getHeader()->begin(); isa<PHINode>(&*BI); ++BI) {
PHINode *Phi = cast<PHINode>(&*BI);		PHINode *Phi = cast<PHINode>(&*BI);
unsigned ToInvariance = calculateIterationsToInvariance(		unsigned ToInvariance = calculateIterationsToInvariance(
Phi, L, BackEdge, IterationsToInvariance);		Phi, L, BackEdge, IterationsToInvariance);
if (ToInvariance != InfiniteIterationsToInvariance)		if (ToInvariance != InfiniteIterationsToInvariance)
DesiredPeelCount = std::max(DesiredPeelCount, ToInvariance);		DesiredPeelCount = std::max(DesiredPeelCount, ToInvariance);
}		}
if (DesiredPeelCount > 0) {
// Pay respect to limitations implied by loop size and the max peel count.		// Pay respect to limitations implied by loop size and the max peel count.
unsigned MaxPeelCount = UnrollPeelMaxCount;		unsigned MaxPeelCount = UnrollPeelMaxCount;
MaxPeelCount = std::min(MaxPeelCount, UP.Threshold / LoopSize - 1);		MaxPeelCount = std::min(MaxPeelCount, UP.Threshold / LoopSize - 1);

		DesiredPeelCount = std::max(DesiredPeelCount,
		countToEliminateCompares(*L, MaxPeelCount, SE));

		if (DesiredPeelCount > 0) {
DesiredPeelCount = std::min(DesiredPeelCount, MaxPeelCount);		DesiredPeelCount = std::min(DesiredPeelCount, MaxPeelCount);
// Consider max peel count limitation.		// Consider max peel count limitation.
assert(DesiredPeelCount > 0 && "Wrong loop size estimation?");		assert(DesiredPeelCount > 0 && "Wrong loop size estimation?");
DEBUG(dbgs() << "Peel " << DesiredPeelCount << " iteration(s) to turn"		DEBUG(dbgs() << "Peel " << DesiredPeelCount << " iteration(s) to turn"
<< " some Phis into invariants.\n");		<< " some Phis into invariants.\n");
UP.PeelCount = DesiredPeelCount;		UP.PeelCount = DesiredPeelCount;
return;		return;
}		}
▲ Show 20 Lines • Show All 377 Lines • Show Last 20 Lines
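
The way the computePeelCount hunk combines the two heuristics and then applies the size limit can be sketched in isolation. This is a rough model only: `clampPeelCountSketch` and its parameters are hypothetical stand-ins for `UP.Threshold`, `LoopSize` and `UnrollPeelMaxCount`, and the real function continues with further heuristics that are elided from the hunk when the result is 0.

```cpp
#include <algorithm>
#include <cassert>

// FromPhis:     peel count wanted to turn header Phis into invariants.
// FromCompares: peel count wanted to make in-body compares known.
unsigned clampPeelCountSketch(unsigned FromPhis, unsigned FromCompares,
                              unsigned LoopSize, unsigned Threshold,
                              unsigned PeelMaxCount) {
  // Mirrors the guard shown as hunk context: peeling one iteration must still
  // fit in the threshold, and the peeling limit must be non-zero.
  if (LoopSize == 0 || 2 * LoopSize > Threshold || PeelMaxCount == 0)
    return 0;
  // With the patch, the limit is computed before either result is checked.
  unsigned MaxPeelCount = std::min(PeelMaxCount, Threshold / LoopSize - 1);
  // Take the larger request, then respect the limit.
  unsigned Desired = std::max(FromPhis, FromCompares);
  return std::min(Desired, MaxPeelCount);
}
```

The numbers used to exercise this sketch are arbitrary; the point is only that the compare-based request is folded in with `std::max` before the existing `std::min` against `MaxPeelCount`.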

llvm/trunk/test/Transforms/LoopUnroll/complete_unroll_profitability_with_assume.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -S < %s -loop-unroll -unroll-threshold=42 | FileCheck %s --check-prefix=ANALYZE-FULL			; RUN: opt -S < %s -loop-unroll -unroll-threshold=42 | FileCheck %s --check-prefix=ANALYZE-FULL

	; This test is supposed to check that calls to @llvm.assume builtin are not			; This test is supposed to check that calls to @llvm.assume builtin are not
	; prohibiting the analysis of full unroll profitability in case the cost of the			; prohibiting the analysis of full unroll profitability in case the cost of the
	; unrolled loop (not accounting for any simplifications done by such unrolling) is			; unrolled loop (not accounting for any simplifications done by such unrolling) is
	; higher than some threshold.			; higher than some threshold.
	;			;
	; Ensure that we indeed are testing this code path by verifying that the loop is			; Ensure that we indeed are testing this code path by verifying that the loop is
	; not unrolled without such analysis:			; not unrolled without such analysis:

	; RUN: opt -S < %s -loop-unroll -unroll-threshold=42 -unroll-max-iteration-count-to-analyze=2 \			; RUN: opt -S < %s -loop-unroll -unroll-threshold=42 -unroll-max-iteration-count-to-analyze=2 \
	; RUN: | FileCheck %s --check-prefix=DONT-ANALYZE-FULL			; RUN: -unroll-peel-max-count=0 | FileCheck %s --check-prefix=DONT-ANALYZE-FULL

	; Function Attrs: nounwind			; Function Attrs: nounwind
	declare void @llvm.assume(i1) #1			declare void @llvm.assume(i1) #1

	define i32 @foo(i32* %a) {			define i32 @foo(i32* %a) {
	; ANALYZE-FULL-LABEL: @foo(			; ANALYZE-FULL-LABEL: @foo(
	; ANALYZE-FULL-NEXT: entry:			; ANALYZE-FULL-NEXT: entry:
	; ANALYZE-FULL-NEXT: br label [[FOR_BODY:%.*]]			; ANALYZE-FULL-NEXT: br label [[FOR_BODY:%.*]]
	▲ Show 20 Lines • Show All 98 Lines • Show Last 20 Lines

llvm/trunk/test/Transforms/LoopUnroll/peel-loop-conditions.ll

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt < %s -S -loop-unroll -verify-dom-info | FileCheck %s

				declare void @f1()
				declare void @f2()

				; Check that we can peel off iterations that make conditions true.
				define void @test1(i32 %k) {
				; CHECK-LABEL: @test1(
				; CHECK-NEXT: for.body.lr.ph:
				; CHECK-NEXT: br label [[FOR_BODY_PEEL_BEGIN:%.*]]
				; CHECK: for.body.peel.begin:
				; CHECK-NEXT: br label [[FOR_BODY_PEEL:%.*]]
				; CHECK: for.body.peel:
				; CHECK-NEXT: [[CMP1_PEEL:%.*]] = icmp ult i32 0, 2
				; CHECK-NEXT: br i1 [[CMP1_PEEL]], label [[IF_THEN_PEEL:%.*]], label [[IF_ELSE_PEEL:%.*]]
				; CHECK: if.else.peel:
				; CHECK-NEXT: call void @f2()
				; CHECK-NEXT: br label [[FOR_INC_PEEL:%.*]]
				; CHECK: if.then.peel:
				; CHECK-NEXT: call void @f1()
				; CHECK-NEXT: br label [[FOR_INC_PEEL]]
				; CHECK: for.inc.peel:
				; CHECK-NEXT: [[INC_PEEL:%.*]] = add nsw i32 0, 1
				; CHECK-NEXT: [[CMP_PEEL:%.*]] = icmp slt i32 [[INC_PEEL]], [[K:%.*]]
				; CHECK-NEXT: br i1 [[CMP_PEEL]], label [[FOR_BODY_PEEL_NEXT:%.*]], label [[FOR_END:%.*]]
				; CHECK: for.body.peel.next:
				; CHECK-NEXT: br label [[FOR_BODY_PEEL2:%.*]]
				; CHECK: for.body.peel2:
				; CHECK-NEXT: [[CMP1_PEEL3:%.*]] = icmp ult i32 [[INC_PEEL]], 2
				; CHECK-NEXT: br i1 [[CMP1_PEEL3]], label [[IF_THEN_PEEL5:%.*]], label [[IF_ELSE_PEEL4:%.*]]
				; CHECK: if.else.peel4:
				; CHECK-NEXT: call void @f2()
				; CHECK-NEXT: br label [[FOR_INC_PEEL6:%.*]]
				; CHECK: if.then.peel5:
				; CHECK-NEXT: call void @f1()
				; CHECK-NEXT: br label [[FOR_INC_PEEL6]]
				; CHECK: for.inc.peel6:
				; CHECK-NEXT: [[INC_PEEL7:%.*]] = add nsw i32 [[INC_PEEL]], 1
				; CHECK-NEXT: [[CMP_PEEL8:%.*]] = icmp slt i32 [[INC_PEEL7]], [[K]]
				; CHECK-NEXT: br i1 [[CMP_PEEL8]], label [[FOR_BODY_PEEL_NEXT1:%.*]], label [[FOR_END]]
				; CHECK: for.body.peel.next1:
				; CHECK-NEXT: br label [[FOR_BODY_PEEL_NEXT9:%.*]]
				; CHECK: for.body.peel.next9:
				; CHECK-NEXT: br label [[FOR_BODY_LR_PH_PEEL_NEWPH:%.*]]
				; CHECK: for.body.lr.ph.peel.newph:
				; CHECK-NEXT: br label [[FOR_BODY:%.*]]
				; CHECK: for.body:
				; CHECK-NEXT: [[I_05:%.*]] = phi i32 [ [[INC_PEEL7]], [[FOR_BODY_LR_PH_PEEL_NEWPH]] ], [ [[INC:%.*]], [[FOR_INC:%.*]] ]
				; CHECK-NEXT: [[CMP1:%.*]] = icmp ult i32 [[I_05]], 2
				; CHECK-NEXT: br i1 [[CMP1]], label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]
				; CHECK: if.then:
				; CHECK-NEXT: call void @f1()
				; CHECK-NEXT: br label [[FOR_INC]]
				; CHECK: if.else:
				; CHECK-NEXT: call void @f2()
				; CHECK-NEXT: br label [[FOR_INC]]
				; CHECK: for.inc:
				; CHECK-NEXT: [[INC]] = add nsw i32 [[I_05]], 1
				; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[INC]], [[K]]
				; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[FOR_END_LOOPEXIT:%.*]], !llvm.loop !0
				; CHECK: for.end.loopexit:
				; CHECK-NEXT: br label [[FOR_END]]
				; CHECK: for.end:
				; CHECK-NEXT: ret void
				;
				for.body.lr.ph:
				br label %for.body

				for.body:
				%i.05 = phi i32 [ 0, %for.body.lr.ph ], [ %inc, %for.inc ]
				%cmp1 = icmp ult i32 %i.05, 2
				br i1 %cmp1, label %if.then, label %if.else

				if.then:
				call void @f1()
				br label %for.inc

				if.else:
				call void @f2()
				br label %for.inc

				for.inc:
				%inc = add nsw i32 %i.05, 1
				%cmp = icmp slt i32 %inc, %k
				br i1 %cmp, label %for.body, label %for.end

				for.end:
				ret void
				}
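
At source level, the effect checked by test1 can be pictured as follows. This is an illustrative sketch, not output of the pass: `originalLoop` and `peeledLoop` are hypothetical names, the trace characters stand in for the calls to @f1 and @f2, and the peeled version assumes that later passes fold the now-constant compares (the CHECK lines above still contain them, since -loop-unroll alone does not remove them).

```cpp
#include <cassert>
#include <string>

// Shape of test1: the body runs at least once and the exit test sits at the
// end of the body. '1' records a call to f1, '2' a call to f2.
std::string originalLoop(int k) {
  std::string trace;
  int i = 0;
  do {
    if (static_cast<unsigned>(i) < 2u)
      trace += '1';
    else
      trace += '2';
    ++i;
  } while (i < k);
  return trace;
}

// The same loop with two iterations peeled off and the compares simplified.
std::string peeledLoop(int k) {
  std::string trace;
  int i = 0;
  // Peeled iteration 0: i == 0, so `i < 2` is true.
  trace += '1';
  ++i;
  if (!(i < k))
    return trace;
  // Peeled iteration 1: i == 1, so `i < 2` is still true.
  trace += '1';
  ++i;
  if (!(i < k))
    return trace;
  // Remaining loop: i >= 2 on entry, so `i < 2` is always false.
  do {
    trace += '2';
    ++i;
  } while (i < k);
  return trace;
}
```

Both functions produce the same call sequence for any `k`; the difference is that the remaining loop in the peeled version no longer needs the inner branch.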

				; Check we peel off the maximum number of iterations that make conditions true.
				define void @test2(i32 %k) {
				; CHECK-LABEL: @test2(
				; CHECK-NEXT: for.body.lr.ph:
				; CHECK-NEXT: br label [[FOR_BODY_PEEL_BEGIN:%.*]]
				; CHECK: for.body.peel.begin:
				; CHECK-NEXT: br label [[FOR_BODY_PEEL:%.*]]
				; CHECK: for.body.peel:
				; CHECK-NEXT: [[CMP1_PEEL:%.*]] = icmp ult i32 0, 2
				; CHECK-NEXT: br i1 [[CMP1_PEEL]], label [[IF_THEN_PEEL:%.*]], label [[IF_ELSE_PEEL:%.*]]
				; CHECK: if.else.peel:
				; CHECK-NEXT: call void @f2()
				; CHECK-NEXT: br label [[IF2_PEEL:%.*]]
				; CHECK: if.then.peel:
				; CHECK-NEXT: call void @f1()
				; CHECK-NEXT: br label [[IF2_PEEL]]
				; CHECK: if2.peel:
				; CHECK-NEXT: [[CMP2_PEEL:%.*]] = icmp ult i32 0, 4
				; CHECK-NEXT: br i1 [[CMP2_PEEL]], label [[IF_THEN2_PEEL:%.*]], label [[FOR_INC_PEEL:%.*]]
				; CHECK: if.then2.peel:
				; CHECK-NEXT: call void @f1()
				; CHECK-NEXT: br label [[FOR_INC_PEEL]]
				; CHECK: for.inc.peel:
				; CHECK-NEXT: [[INC_PEEL:%.*]] = add nsw i32 0, 1
				; CHECK-NEXT: [[CMP_PEEL:%.*]] = icmp slt i32 [[INC_PEEL]], [[K:%.*]]
				; CHECK-NEXT: br i1 [[CMP_PEEL]], label [[FOR_BODY_PEEL_NEXT:%.*]], label [[FOR_END:%.*]]
				; CHECK: for.body.peel.next:
				; CHECK-NEXT: br label [[FOR_BODY_PEEL2:%.*]]
				; CHECK: for.body.peel2:
				; CHECK-NEXT: [[CMP1_PEEL3:%.*]] = icmp ult i32 [[INC_PEEL]], 2
				; CHECK-NEXT: br i1 [[CMP1_PEEL3]], label [[IF_THEN_PEEL5:%.*]], label [[IF_ELSE_PEEL4:%.*]]
				; CHECK: if.else.peel4:
				; CHECK-NEXT: call void @f2()
				; CHECK-NEXT: br label [[IF2_PEEL6:%.*]]
				; CHECK: if.then.peel5:
				; CHECK-NEXT: call void @f1()
				; CHECK-NEXT: br label [[IF2_PEEL6]]
				; CHECK: if2.peel6:
				; CHECK-NEXT: [[CMP2_PEEL7:%.*]] = icmp ult i32 [[INC_PEEL]], 4
				; CHECK-NEXT: br i1 [[CMP2_PEEL7]], label [[IF_THEN2_PEEL8:%.*]], label [[FOR_INC_PEEL9:%.*]]
				; CHECK: if.then2.peel8:
				; CHECK-NEXT: call void @f1()
				; CHECK-NEXT: br label [[FOR_INC_PEEL9]]
				; CHECK: for.inc.peel9:
				; CHECK-NEXT: [[INC_PEEL10:%.*]] = add nsw i32 [[INC_PEEL]], 1
				; CHECK-NEXT: [[CMP_PEEL11:%.*]] = icmp slt i32 [[INC_PEEL10]], [[K]]
				; CHECK-NEXT: br i1 [[CMP_PEEL11]], label [[FOR_BODY_PEEL_NEXT1:%.*]], label [[FOR_END]]
				; CHECK: for.body.peel.next1:
				; CHECK-NEXT: br label [[FOR_BODY_PEEL13:%.*]]
				; CHECK: for.body.peel13:
				; CHECK-NEXT: [[CMP1_PEEL14:%.*]] = icmp ult i32 [[INC_PEEL10]], 2
				; CHECK-NEXT: br i1 [[CMP1_PEEL14]], label [[IF_THEN_PEEL16:%.*]], label [[IF_ELSE_PEEL15:%.*]]
				; CHECK: if.else.peel15:
				; CHECK-NEXT: call void @f2()
				; CHECK-NEXT: br label [[IF2_PEEL17:%.*]]
				; CHECK: if.then.peel16:
				; CHECK-NEXT: call void @f1()
				; CHECK-NEXT: br label [[IF2_PEEL17]]
				; CHECK: if2.peel17:
				; CHECK-NEXT: [[CMP2_PEEL18:%.*]] = icmp ult i32 [[INC_PEEL10]], 4
				; CHECK-NEXT: br i1 [[CMP2_PEEL18]], label [[IF_THEN2_PEEL19:%.*]], label [[FOR_INC_PEEL20:%.*]]
				; CHECK: if.then2.peel19:
				; CHECK-NEXT: call void @f1()
				; CHECK-NEXT: br label [[FOR_INC_PEEL20]]
				; CHECK: for.inc.peel20:
				; CHECK-NEXT: [[INC_PEEL21:%.*]] = add nsw i32 [[INC_PEEL10]], 1
				; CHECK-NEXT: [[CMP_PEEL22:%.*]] = icmp slt i32 [[INC_PEEL21]], [[K]]
				; CHECK-NEXT: br i1 [[CMP_PEEL22]], label [[FOR_BODY_PEEL_NEXT12:%.*]], label [[FOR_END]]
				; CHECK: for.body.peel.next12:
				; CHECK-NEXT: br label [[FOR_BODY_PEEL24:%.*]]
				; CHECK: for.body.peel24:
				; CHECK-NEXT: [[CMP1_PEEL25:%.*]] = icmp ult i32 [[INC_PEEL21]], 2
				; CHECK-NEXT: br i1 [[CMP1_PEEL25]], label [[IF_THEN_PEEL27:%.*]], label [[IF_ELSE_PEEL26:%.*]]
				; CHECK: if.else.peel26:
				; CHECK-NEXT: call void @f2()
				; CHECK-NEXT: br label [[IF2_PEEL28:%.*]]
				; CHECK: if.then.peel27:
				; CHECK-NEXT: call void @f1()
				; CHECK-NEXT: br label [[IF2_PEEL28]]
				; CHECK: if2.peel28:
				; CHECK-NEXT: [[CMP2_PEEL29:%.*]] = icmp ult i32 [[INC_PEEL21]], 4
				; CHECK-NEXT: br i1 [[CMP2_PEEL29]], label [[IF_THEN2_PEEL30:%.*]], label [[FOR_INC_PEEL31:%.*]]
				; CHECK: if.then2.peel30:
				; CHECK-NEXT: call void @f1()
				; CHECK-NEXT: br label [[FOR_INC_PEEL31]]
				; CHECK: for.inc.peel31:
				; CHECK-NEXT: [[INC_PEEL32:%.*]] = add nsw i32 [[INC_PEEL21]], 1
				; CHECK-NEXT: [[CMP_PEEL33:%.*]] = icmp slt i32 [[INC_PEEL32]], [[K]]
				; CHECK-NEXT: br i1 [[CMP_PEEL33]], label [[FOR_BODY_PEEL_NEXT23:%.*]], label [[FOR_END]]
				; CHECK: for.body.peel.next23:
				; CHECK-NEXT: br label [[FOR_BODY_PEEL_NEXT34:%.*]]
				; CHECK: for.body.peel.next34:
				; CHECK-NEXT: br label [[FOR_BODY_LR_PH_PEEL_NEWPH:%.*]]
				; CHECK: for.body.lr.ph.peel.newph:
				; CHECK-NEXT: br label [[FOR_BODY:%.*]]
				; CHECK: for.body:
				; CHECK-NEXT: [[I_05:%.*]] = phi i32 [ [[INC_PEEL32]], [[FOR_BODY_LR_PH_PEEL_NEWPH]] ], [ [[INC:%.*]], [[FOR_INC:%.*]] ]
				; CHECK-NEXT: [[CMP1:%.*]] = icmp ult i32 [[I_05]], 2
				; CHECK-NEXT: br i1 [[CMP1]], label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]
				; CHECK: if.then:
				; CHECK-NEXT: call void @f1()
				; CHECK-NEXT: br label [[IF2:%.*]]
				; CHECK: if.else:
				; CHECK-NEXT: call void @f2()
				; CHECK-NEXT: br label [[IF2]]
				; CHECK: if2:
				; CHECK-NEXT: [[CMP2:%.*]] = icmp ult i32 [[I_05]], 4
				; CHECK-NEXT: br i1 [[CMP2]], label [[IF_THEN2:%.*]], label [[FOR_INC]]
				; CHECK: if.then2:
				; CHECK-NEXT: call void @f1()
				; CHECK-NEXT: br label [[FOR_INC]]
				; CHECK: for.inc:
				; CHECK-NEXT: [[INC]] = add nsw i32 [[I_05]], 1
				; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[INC]], [[K]]
				; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[FOR_END_LOOPEXIT:%.*]], !llvm.loop !2
				; CHECK: for.end.loopexit:
				; CHECK-NEXT: br label [[FOR_END]]
				; CHECK: for.end:
				; CHECK-NEXT: ret void
				;
				for.body.lr.ph:
				br label %for.body

				for.body:
				%i.05 = phi i32 [ 0, %for.body.lr.ph ], [ %inc, %for.inc ]
				%cmp1 = icmp ult i32 %i.05, 2
				br i1 %cmp1, label %if.then, label %if.else

				if.then:
				call void @f1()
				br label %if2

				if.else:
				call void @f2()
				br label %if2

				if2:
				%cmp2 = icmp ult i32 %i.05, 4
				br i1 %cmp2, label %if.then2, label %for.inc

				if.then2:
				call void @f1()
				br label %for.inc

				for.inc:
				%inc = add nsw i32 %i.05, 1
				%cmp = icmp slt i32 %inc, %k
				br i1 %cmp, label %for.body, label %for.end

				for.end:
				ret void
				}

				; Check that we can peel off iterations that make a condition false.
				define void @test3(i32 %k) {
				; CHECK-LABEL: @test3(
				; CHECK-NEXT: for.body.lr.ph:
				; CHECK-NEXT: br label [[FOR_BODY_PEEL_BEGIN:%.*]]
				; CHECK: for.body.peel.begin:
				; CHECK-NEXT: br label [[FOR_BODY_PEEL:%.*]]
				; CHECK: for.body.peel:
				; CHECK-NEXT: [[CMP1_PEEL:%.*]] = icmp ugt i32 0, 2
				; CHECK-NEXT: br i1 [[CMP1_PEEL]], label [[IF_THEN_PEEL:%.*]], label [[IF_ELSE_PEEL:%.*]]
				; CHECK: if.else.peel:
				; CHECK-NEXT: call void @f2()
				; CHECK-NEXT: br label [[FOR_INC_PEEL:%.*]]
				; CHECK: if.then.peel:
				; CHECK-NEXT: call void @f1()
				; CHECK-NEXT: br label [[FOR_INC_PEEL]]
				; CHECK: for.inc.peel:
				; CHECK-NEXT: [[INC_PEEL:%.*]] = add nsw i32 0, 1
				; CHECK-NEXT: [[CMP_PEEL:%.*]] = icmp slt i32 [[INC_PEEL]], [[K:%.*]]
				; CHECK-NEXT: br i1 [[CMP_PEEL]], label [[FOR_BODY_PEEL_NEXT:%.*]], label [[FOR_END:%.*]]
				; CHECK: for.body.peel.next:
				; CHECK-NEXT: br label [[FOR_BODY_PEEL2:%.*]]
				; CHECK: for.body.peel2:
				; CHECK-NEXT: [[CMP1_PEEL3:%.*]] = icmp ugt i32 [[INC_PEEL]], 2
				; CHECK-NEXT: br i1 [[CMP1_PEEL3]], label [[IF_THEN_PEEL5:%.*]], label [[IF_ELSE_PEEL4:%.*]]
				; CHECK: if.else.peel4:
				; CHECK-NEXT: call void @f2()
				; CHECK-NEXT: br label [[FOR_INC_PEEL6:%.*]]
				; CHECK: if.then.peel5:
				; CHECK-NEXT: call void @f1()
				; CHECK-NEXT: br label [[FOR_INC_PEEL6]]
				; CHECK: for.inc.peel6:
				; CHECK-NEXT: [[INC_PEEL7:%.*]] = add nsw i32 [[INC_PEEL]], 1
				; CHECK-NEXT: [[CMP_PEEL8:%.*]] = icmp slt i32 [[INC_PEEL7]], [[K]]
				; CHECK-NEXT: br i1 [[CMP_PEEL8]], label [[FOR_BODY_PEEL_NEXT1:%.*]], label [[FOR_END]]
				; CHECK: for.body.peel.next1:
				; CHECK-NEXT: br label [[FOR_BODY_PEEL10:%.*]]
				; CHECK: for.body.peel10:
				; CHECK-NEXT: [[CMP1_PEEL11:%.*]] = icmp ugt i32 [[INC_PEEL7]], 2
				; CHECK-NEXT: br i1 [[CMP1_PEEL11]], label [[IF_THEN_PEEL13:%.*]], label [[IF_ELSE_PEEL12:%.*]]
				; CHECK: if.else.peel12:
				; CHECK-NEXT: call void @f2()
				; CHECK-NEXT: br label [[FOR_INC_PEEL14:%.*]]
				; CHECK: if.then.peel13:
				; CHECK-NEXT: call void @f1()
				; CHECK-NEXT: br label [[FOR_INC_PEEL14]]
				; CHECK: for.inc.peel14:
				; CHECK-NEXT: [[INC_PEEL15:%.*]] = add nsw i32 [[INC_PEEL7]], 1
				; CHECK-NEXT: [[CMP_PEEL16:%.*]] = icmp slt i32 [[INC_PEEL15]], [[K]]
				; CHECK-NEXT: br i1 [[CMP_PEEL16]], label [[FOR_BODY_PEEL_NEXT9:%.*]], label [[FOR_END]]
				; CHECK: for.body.peel.next9:
				; CHECK-NEXT: br label [[FOR_BODY_PEEL_NEXT17:%.*]]
				; CHECK: for.body.peel.next17:
				; CHECK-NEXT: br label [[FOR_BODY_LR_PH_PEEL_NEWPH:%.*]]
				; CHECK: for.body.lr.ph.peel.newph:
				; CHECK-NEXT: br label [[FOR_BODY:%.*]]
				; CHECK: for.body:
				; CHECK-NEXT: [[I_05:%.*]] = phi i32 [ [[INC_PEEL15]], [[FOR_BODY_LR_PH_PEEL_NEWPH]] ], [ [[INC:%.*]], [[FOR_INC:%.*]] ]
				; CHECK-NEXT: [[CMP1:%.*]] = icmp ugt i32 [[I_05]], 2
				; CHECK-NEXT: br i1 [[CMP1]], label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]
				; CHECK: if.then:
				; CHECK-NEXT: call void @f1()
				; CHECK-NEXT: br label [[FOR_INC]]
				; CHECK: if.else:
				; CHECK-NEXT: call void @f2()
				; CHECK-NEXT: br label [[FOR_INC]]
				; CHECK: for.inc:
				; CHECK-NEXT: [[INC]] = add nsw i32 [[I_05]], 1
				; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[INC]], [[K]]
				; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[FOR_END_LOOPEXIT:%.*]], !llvm.loop !3
				; CHECK: for.end.loopexit:
				; CHECK-NEXT: br label [[FOR_END]]
				; CHECK: for.end:
				; CHECK-NEXT: ret void
				;
				for.body.lr.ph:
				br label %for.body

				for.body:
				%i.05 = phi i32 [ 0, %for.body.lr.ph ], [ %inc, %for.inc ]
				%cmp1 = icmp ugt i32 %i.05, 2
				br i1 %cmp1, label %if.then, label %if.else

				if.then:
				call void @f1()
				br label %for.inc

				if.else:
				call void @f2()
				br label %for.inc

				for.inc:
				%inc = add nsw i32 %i.05, 1
				%cmp = icmp slt i32 %inc, %k
				br i1 %cmp, label %for.body, label %for.end

				for.end:
				ret void
				}

				; Test that we respect MaxPeelCount
				define void @test4(i32 %k) {
				; CHECK-LABEL: @test4(
				; CHECK-NEXT: for.body.lr.ph:
				; CHECK-NEXT: br label [[FOR_BODY_PEEL_BEGIN:%.*]]
				; CHECK: for.body.peel.begin:
				; CHECK-NEXT: br label [[FOR_BODY_PEEL:%.*]]
				; CHECK: for.body.peel:
				; CHECK-NEXT: [[CMP1_PEEL:%.*]] = icmp ugt i32 0, 9999
				; CHECK-NEXT: br i1 [[CMP1_PEEL]], label [[IF_THEN_PEEL:%.*]], label [[FOR_INC_PEEL:%.*]]
				; CHECK: if.then.peel:
				; CHECK-NEXT: call void @f1()
				; CHECK-NEXT: br label [[FOR_INC_PEEL]]
				; CHECK: for.inc.peel:
				; CHECK-NEXT: [[INC_PEEL:%.*]] = add nsw i32 0, 1
				; CHECK-NEXT: [[CMP_PEEL:%.*]] = icmp slt i32 [[INC_PEEL]], [[K:%.*]]
				; CHECK-NEXT: br i1 [[CMP_PEEL]], label [[FOR_BODY_PEEL_NEXT:%.*]], label [[FOR_END:%.*]]
				; CHECK: for.body.peel.next:
				; CHECK-NEXT: br label [[FOR_BODY_PEEL2:%.*]]
				; CHECK: for.body.peel2:
				; CHECK-NEXT: [[CMP1_PEEL3:%.*]] = icmp ugt i32 [[INC_PEEL]], 9999
				; CHECK-NEXT: br i1 [[CMP1_PEEL3]], label [[IF_THEN_PEEL4:%.*]], label [[FOR_INC_PEEL5:%.*]]
				; CHECK: if.then.peel4:
				; CHECK-NEXT: call void @f1()
				; CHECK-NEXT: br label [[FOR_INC_PEEL5]]
				; CHECK: for.inc.peel5:
				; CHECK-NEXT: [[INC_PEEL6:%.*]] = add nsw i32 [[INC_PEEL]], 1
				; CHECK-NEXT: [[CMP_PEEL7:%.*]] = icmp slt i32 [[INC_PEEL6]], [[K]]
				; CHECK-NEXT: br i1 [[CMP_PEEL7]], label [[FOR_BODY_PEEL_NEXT1:%.*]], label [[FOR_END]]
				; CHECK: for.body.peel.next1:
				; CHECK-NEXT: br label [[FOR_BODY_PEEL9:%.*]]
				; CHECK: for.body.peel9:
				; CHECK-NEXT: [[CMP1_PEEL10:%.*]] = icmp ugt i32 [[INC_PEEL6]], 9999
				; CHECK-NEXT: br i1 [[CMP1_PEEL10]], label [[IF_THEN_PEEL11:%.*]], label [[FOR_INC_PEEL12:%.*]]
				; CHECK: if.then.peel11:
				; CHECK-NEXT: call void @f1()
				; CHECK-NEXT: br label [[FOR_INC_PEEL12]]
				; CHECK: for.inc.peel12:
				; CHECK-NEXT: [[INC_PEEL13:%.*]] = add nsw i32 [[INC_PEEL6]], 1
				; CHECK-NEXT: [[CMP_PEEL14:%.*]] = icmp slt i32 [[INC_PEEL13]], [[K]]
				; CHECK-NEXT: br i1 [[CMP_PEEL14]], label [[FOR_BODY_PEEL_NEXT8:%.*]], label [[FOR_END]]
				; CHECK: for.body.peel.next8:
				; CHECK-NEXT: br label [[FOR_BODY_PEEL16:%.*]]
				; CHECK: for.body.peel16:
				; CHECK-NEXT: [[CMP1_PEEL17:%.*]] = icmp ugt i32 [[INC_PEEL13]], 9999
				; CHECK-NEXT: br i1 [[CMP1_PEEL17]], label [[IF_THEN_PEEL18:%.*]], label [[FOR_INC_PEEL19:%.*]]
				; CHECK: if.then.peel18:
				; CHECK-NEXT: call void @f1()
				; CHECK-NEXT: br label [[FOR_INC_PEEL19]]
				; CHECK: for.inc.peel19:
				; CHECK-NEXT: [[INC_PEEL20:%.*]] = add nsw i32 [[INC_PEEL13]], 1
				; CHECK-NEXT: [[CMP_PEEL21:%.*]] = icmp slt i32 [[INC_PEEL20]], [[K]]
				; CHECK-NEXT: br i1 [[CMP_PEEL21]], label [[FOR_BODY_PEEL_NEXT15:%.*]], label [[FOR_END]]
				; CHECK: for.body.peel.next15:
				; CHECK-NEXT: br label [[FOR_BODY_PEEL23:%.*]]
				; CHECK: for.body.peel23:
				; CHECK-NEXT: [[CMP1_PEEL24:%.*]] = icmp ugt i32 [[INC_PEEL20]], 9999
				; CHECK-NEXT: br i1 [[CMP1_PEEL24]], label [[IF_THEN_PEEL25:%.*]], label [[FOR_INC_PEEL26:%.*]]
				; CHECK: if.then.peel25:
				; CHECK-NEXT: call void @f1()
				; CHECK-NEXT: br label [[FOR_INC_PEEL26]]
				; CHECK: for.inc.peel26:
				; CHECK-NEXT: [[INC_PEEL27:%.*]] = add nsw i32 [[INC_PEEL20]], 1
				; CHECK-NEXT: [[CMP_PEEL28:%.*]] = icmp slt i32 [[INC_PEEL27]], [[K]]
				; CHECK-NEXT: br i1 [[CMP_PEEL28]], label [[FOR_BODY_PEEL_NEXT22:%.*]], label [[FOR_END]]
				; CHECK: for.body.peel.next22:
				; CHECK-NEXT: br label [[FOR_BODY_PEEL30:%.*]]
				; CHECK: for.body.peel30:
				; CHECK-NEXT: [[CMP1_PEEL31:%.*]] = icmp ugt i32 [[INC_PEEL27]], 9999
				; CHECK-NEXT: br i1 [[CMP1_PEEL31]], label [[IF_THEN_PEEL32:%.*]], label [[FOR_INC_PEEL33:%.*]]
				; CHECK: if.then.peel32:
				; CHECK-NEXT: call void @f1()
				; CHECK-NEXT: br label [[FOR_INC_PEEL33]]
				; CHECK: for.inc.peel33:
				; CHECK-NEXT: [[INC_PEEL34:%.*]] = add nsw i32 [[INC_PEEL27]], 1
				; CHECK-NEXT: [[CMP_PEEL35:%.*]] = icmp slt i32 [[INC_PEEL34]], [[K]]
				; CHECK-NEXT: br i1 [[CMP_PEEL35]], label [[FOR_BODY_PEEL_NEXT29:%.*]], label [[FOR_END]]
				; CHECK: for.body.peel.next29:
				; CHECK-NEXT: br label [[FOR_BODY_PEEL37:%.*]]
				; CHECK: for.body.peel37:
				; CHECK-NEXT: [[CMP1_PEEL38:%.*]] = icmp ugt i32 [[INC_PEEL34]], 9999
				; CHECK-NEXT: br i1 [[CMP1_PEEL38]], label [[IF_THEN_PEEL39:%.*]], label [[FOR_INC_PEEL40:%.*]]
				; CHECK: if.then.peel39:
				; CHECK-NEXT: call void @f1()
				; CHECK-NEXT: br label [[FOR_INC_PEEL40]]
				; CHECK: for.inc.peel40:
				; CHECK-NEXT: [[INC_PEEL41:%.*]] = add nsw i32 [[INC_PEEL34]], 1
				; CHECK-NEXT: [[CMP_PEEL42:%.*]] = icmp slt i32 [[INC_PEEL41]], [[K]]
				; CHECK-NEXT: br i1 [[CMP_PEEL42]], label [[FOR_BODY_PEEL_NEXT36:%.*]], label [[FOR_END]]
				; CHECK: for.body.peel.next36:
				; CHECK-NEXT: br label [[FOR_BODY_PEEL_NEXT43:%.*]]
				; CHECK: for.body.peel.next43:
				; CHECK-NEXT: br label [[FOR_BODY_LR_PH_PEEL_NEWPH:%.*]]
				; CHECK: for.body.lr.ph.peel.newph:
				; CHECK-NEXT: br label [[FOR_BODY:%.*]]
				; CHECK: for.body:
				; CHECK-NEXT: [[I_05:%.*]] = phi i32 [ [[INC_PEEL41]], [[FOR_BODY_LR_PH_PEEL_NEWPH]] ], [ [[INC:%.*]], [[FOR_INC:%.*]] ]
				; CHECK-NEXT: [[CMP1:%.*]] = icmp ugt i32 [[I_05]], 9999
				; CHECK-NEXT: br i1 [[CMP1]], label [[IF_THEN:%.*]], label [[FOR_INC]]
				; CHECK: if.then:
				; CHECK-NEXT: call void @f1()
				; CHECK-NEXT: br label [[FOR_INC]]
				; CHECK: for.inc:
				; CHECK-NEXT: [[INC]] = add nsw i32 [[I_05]], 1
				; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[INC]], [[K]]
				; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[FOR_END_LOOPEXIT:%.*]], !llvm.loop !4
				; CHECK: for.end.loopexit:
				; CHECK-NEXT: br label [[FOR_END]]
				; CHECK: for.end:
				; CHECK-NEXT: ret void
				;
				for.body.lr.ph:
				br label %for.body

				for.body:
				%i.05 = phi i32 [ 0, %for.body.lr.ph ], [ %inc, %for.inc ]
				%cmp1 = icmp ugt i32 %i.05, 9999
				br i1 %cmp1, label %if.then, label %for.inc

				if.then:
				call void @f1()
				br label %for.inc

				for.inc:
				%inc = add nsw i32 %i.05, 1
				%cmp = icmp slt i32 %inc, %k
				br i1 %cmp, label %for.body, label %for.end

				for.end:
				ret void
				}

				; In this case we cannot peel the inner loop, because the condition involves
				; the outer induction variable.
				define void @test5(i32 %k) {
				; CHECK-LABEL: @test5(
				; CHECK-NEXT: for.body.lr.ph:
				; CHECK-NEXT: br label [[OUTER_HEADER:%.*]]
				; CHECK: outer.header:
				; CHECK-NEXT: [[J:%.*]] = phi i32 [ 0, [[FOR_BODY_LR_PH:%.*]] ], [ [[J_INC:%.*]], [[OUTER_INC:%.*]] ]
				; CHECK-NEXT: br label [[FOR_BODY:%.*]]
				; CHECK: for.body:
				; CHECK-NEXT: [[I_05:%.*]] = phi i32 [ 0, [[OUTER_HEADER]] ], [ [[INC:%.*]], [[FOR_INC:%.*]] ]
				; CHECK-NEXT: [[CMP1:%.*]] = icmp ult i32 [[J]], 2
				; CHECK-NEXT: br i1 [[CMP1]], label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]
				; CHECK: if.then:
				; CHECK-NEXT: call void @f1()
				; CHECK-NEXT: br label [[FOR_INC]]
				; CHECK: if.else:
				; CHECK-NEXT: call void @f2()
				; CHECK-NEXT: br label [[FOR_INC]]
				; CHECK: for.inc:
				; CHECK-NEXT: [[INC]] = add nsw i32 [[I_05]], 1
				; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[INC]], [[K:%.*]]
				; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[OUTER_INC]]
				; CHECK: outer.inc:
				; CHECK-NEXT: [[J_INC]] = add nsw i32 [[J]], 1
				; CHECK-NEXT: [[OUTER_CMP:%.*]] = icmp slt i32 [[J_INC]], [[K]]
				; CHECK-NEXT: br i1 [[OUTER_CMP]], label [[OUTER_HEADER]], label [[FOR_END:%.*]]
				; CHECK: for.end:
				; CHECK-NEXT: ret void
				;
				for.body.lr.ph:
				br label %outer.header

				outer.header:
				%j = phi i32 [ 0, %for.body.lr.ph ], [ %j.inc, %outer.inc ]
				br label %for.body

				for.body:
				%i.05 = phi i32 [ 0, %outer.header ], [ %inc, %for.inc ]
				%cmp1 = icmp ult i32 %j, 2
				br i1 %cmp1, label %if.then, label %if.else

				if.then:
				call void @f1()
				br label %for.inc

				if.else:
				call void @f2()
				br label %for.inc

				for.inc:
				%inc = add nsw i32 %i.05, 1
				%cmp = icmp slt i32 %inc, %k
				br i1 %cmp, label %for.body, label %outer.inc

				outer.inc:
				%j.inc = add nsw i32 %j, 1
				%outer.cmp = icmp slt i32 %j.inc, %k
				br i1 %outer.cmp, label %outer.header, label %for.end


				for.end:
				ret void
				}

				define void @test6(i32 %k) {
				; CHECK-LABEL: @test6(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: br label [[FOR_BODY_PEEL_BEGIN:%.*]]
				; CHECK: for.body.peel.begin:
				; CHECK-NEXT: br label [[FOR_BODY_PEEL:%.*]]
				; CHECK: for.body.peel:
				; CHECK-NEXT: [[CMP1_PEEL:%.*]] = icmp ult i32 0, 4
				; CHECK-NEXT: br i1 [[CMP1_PEEL]], label [[IF_THEN_PEEL:%.*]], label [[IF_ELSE_PEEL:%.*]]
				; CHECK: if.else.peel:
				; CHECK-NEXT: call void @f2()
				; CHECK-NEXT: br label [[FOR_INC_PEEL:%.*]]
				; CHECK: if.then.peel:
				; CHECK-NEXT: call void @f1()
				; CHECK-NEXT: br label [[FOR_INC_PEEL]]
				; CHECK: for.inc.peel:
				; CHECK-NEXT: [[INC_PEEL:%.*]] = add nsw i32 0, 2
				; CHECK-NEXT: [[J_INC_PEEL:%.*]] = add nsw i32 4, 1
				; CHECK-NEXT: [[CMP_PEEL:%.*]] = icmp slt i32 [[INC_PEEL]], [[K:%.*]]
				; CHECK-NEXT: br i1 [[CMP_PEEL]], label [[FOR_BODY_PEEL_NEXT:%.*]], label [[FOR_END:%.*]]
				; CHECK: for.body.peel.next:
				; CHECK-NEXT: br label [[FOR_BODY_PEEL_NEXT1:%.*]]
				; CHECK: for.body.peel.next1:
				; CHECK-NEXT: br label [[ENTRY_PEEL_NEWPH:%.*]]
				; CHECK: entry.peel.newph:
				; CHECK-NEXT: br label [[FOR_BODY:%.*]]
				; CHECK: for.body:
				; CHECK-NEXT: [[I_05:%.*]] = phi i32 [ [[INC_PEEL]], [[ENTRY_PEEL_NEWPH]] ], [ [[INC:%.*]], [[FOR_INC:%.*]] ]
				; CHECK-NEXT: [[J:%.*]] = phi i32 [ [[INC_PEEL]], [[ENTRY_PEEL_NEWPH]] ], [ [[INC]], [[FOR_INC]] ]
				; CHECK-NEXT: [[CMP1:%.*]] = icmp ult i32 [[I_05]], [[J]]
				; CHECK-NEXT: br i1 [[CMP1]], label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]
				; CHECK: if.then:
				; CHECK-NEXT: call void @f1()
				; CHECK-NEXT: br label [[FOR_INC]]
				; CHECK: if.else:
				; CHECK-NEXT: call void @f2()
				; CHECK-NEXT: br label [[FOR_INC]]
				; CHECK: for.inc:
				; CHECK-NEXT: [[INC]] = add nsw i32 [[I_05]], 2
				; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[INC]], [[K]]
				; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[FOR_END_LOOPEXIT:%.*]], !llvm.loop !5
				; CHECK: for.end.loopexit:
				; CHECK-NEXT: br label [[FOR_END]]
				; CHECK: for.end:
				; CHECK-NEXT: ret void
				;
				entry:
				br label %for.body

				for.body:
				%i.05 = phi i32 [ 0, %entry ], [ %inc, %for.inc ]
				%j = phi i32 [ 4, %entry ], [ %inc, %for.inc ]
				%cmp1 = icmp ult i32 %i.05, %j
				br i1 %cmp1, label %if.then, label %if.else

				if.then:
				call void @f1()
				br label %for.inc

				if.else:
				call void @f2()
				br label %for.inc

				for.inc:
				%inc = add nsw i32 %i.05, 2
				%j.inc = add nsw i32 %j, 1
				%cmp = icmp slt i32 %inc, %k
				br i1 %cmp, label %for.body, label %for.end

				for.end:
				ret void
				}
