This is an archive of the discontinued LLVM Phabricator instance.

[Unroll] Do NOT unroll a loop with small runtime upperbound
ClosedPublic

Authored by zzheng on Jun 6 2019, 5:52 PM.

Details

Summary

For a runtime loop if we can compute its trip count upperbound:

Don't unroll if:

  1. loop is not guaranteed to run either zero or upperbound iterations; and
  2. trip count upperbound is less than UnrollMaxUpperBound

Unless user or TTI asked to do so.

If unrolling, limit unroll factor to loop's trip count upperbound.

Diff Detail

Event Timeline

zzheng created this revision.Jun 6 2019, 5:52 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 6 2019, 5:52 PM
Herald added a subscriber: dmgreen. · View Herald Transcript
zzheng updated this revision to Diff 203488.Jun 6 2019, 9:13 PM

Re-upload patch with full context.

zzheng edited the summary of this revision. (Show Details)Jun 6 2019, 9:14 PM

What problem is this solving?

test/Transforms/LoopUnroll/runtime-small-upperbound.ll
29–31

Is it necessary ro have the store from/to the global here? Without it, you could could just

CHECK: store
CHECK-NOT: store
zzheng updated this revision to Diff 203636.Jun 7 2019, 3:56 PM
define dso_local void @hoge_3(i8 %arg) {
entry:
  %x = load i32, i32* @global, align 4
  %y = load i8*, i8** @global.1, align 4
  %0 = icmp ult i32 %x, 17
  br i1 %0, label %loop, label %exit

loop:
  %iv = phi i32 [ %x, %entry ], [ %iv.next, %loop ]
  %ptr = phi i8* [ %y, %entry ], [ %ptr.next, %loop ]
  %iv.next = add nuw i32 %iv, 8
  %ptr.next = getelementptr inbounds i8, i8* %ptr, i32 1
  store i8 %arg, i8* %ptr.next, align 1
  %1 = icmp ult i32 %iv.next, 17
  br i1 %1, label %loop, label %exit

exit:
  ret void
}

This loop has a runtime trip count upper bound of 3.

It's NOT guaranteed this loop will run either 0 or 3 iterations, so computeUnrollCount() chooses not to full-unroll it since icmp and br in each duplicated loop body cannot be removed.

computeUnrollCount() then decides to unroll it by 8 (runtime unrolling). But the unrolled loop will never execute, wasting code size, and runtime check.

If user/TTI really want it unrolled, we need to limit unroll factor by the upper bound.

zzheng marked an inline comment as done.Jun 7 2019, 3:59 PM

What problem is this solving?

Unwanted (or excessive) unrolling of loops with small runtime trip count upper bound.

Unwanted (or excessive) unrolling of loops with small runtime trip count upper bound.

Wouldn't I want to (fully) unroll loops especially those with low trip count to reduce overhead/enable further optimizations?

Do you have performance/code size numbers?

zzheng added a comment.EditedJun 7 2019, 4:58 PM

Wouldn't I want to (fully) unroll loops especially those with low trip count to reduce overhead/enable further optimizations?

Do you have performance/code size numbers?

The overhead of loop exit check will not be reduced after full unroll:

// Try to find the trip count upper bound if we cannot find the exact trip
// count.
bool MaxOrZero = false;
if (!TripCount) {
  MaxTripCount = SE.getSmallConstantMaxTripCount(L);
  MaxOrZero = SE.isBackedgeTakenCountMaxOrZero(L);
  // We can unroll by the upper bound amount if it's generally allowed or if
  // we know that the loop is executed either the upper bound or zero times.
  // (MaxOrZero unrolling keeps only the first loop test, so the number of
  // loop tests remains the same compared to the non-unrolled version, whereas
  // the generic upper bound unrolling keeps all but the last loop test so the
  // number of loop tests goes up which may end up being worse on targets with
  // constrained branch predictor resources so is controlled by an option.)
  // In addition we only unroll small upper bounds.
  if (!(UP.UpperBound || MaxOrZero) || MaxTripCount > UnrollMaxUpperBound) {
    MaxTripCount = 0;
  }
}

This loop was found from an internal workload, unrolling it impact performance negatively.

hfinkel added inline comments.
lib/Transforms/Scalar/LoopUnrollPass.cpp
795

Did you make this new variable so that you could apply this !(UP.UpperBound || MaxOrZero) || UnrollByMaxCount > UnrollMaxUpperBound condition only to full-unrolling case and not to the partial unrolling case?

1097

Why did this move down here?

zzheng marked 2 inline comments as done.Jun 10 2019, 5:27 PM
zzheng added inline comments.
lib/Transforms/Scalar/LoopUnrollPass.cpp
795

No, new variable UnrollByMaxCount replaces use of MaxTripCount in full-unrolling case.

We only full-unroll a runtime loop by its upper bound if MaxOrZero is true; otherwise runtime unrolling will handle it, sometimes unroll more than MaxTripCount. Runtime unrolling needs to know MaxTripCount, so we cannot set it to zero here.

https://reviews.llvm.org/D63064 is trying to solve the same problem for ARM target.

I'm confuse why this relevant to partial unrolling, which only applies to known TripCount. We only calculate MaxTripCount and MaxOrZero if TripCount is 0.

1097

To move it close to use, like MaxOrZero.

hfinkel added inline comments.Jun 10 2019, 9:44 PM
lib/Transforms/Scalar/LoopUnrollPass.cpp
795

I'm confuse why this relevant to partial unrolling, which only applies to known TripCount. We only calculate MaxTripCount and MaxOrZero if TripCount is 0.

I asked because the following conditional is added in this patch (before the pre-existing debug output), which seems to apply MaxTripCount to limit partial unrolling (at least if the debug output below the conditional is accurate). Maybe the debug output should say runtime unrolling?

if (MaxTripCount && UP.Count > MaxTripCount)
  UP.Count = MaxTripCount;

LLVM_DEBUG(dbgs() << "  partially unrolling with count: " << UP.Count
                  << "\n");

new variable UnrollByMaxCount replaces use of MaxTripCount in full-unrolling case.

Okay. If UnrollByMaxCount is only used during full unrolling, then can we name it FullUnrollMaxCount, or similar, to make that clear?

zzheng updated this revision to Diff 204151.Jun 11 2019, 1:26 PM
zzheng edited the summary of this revision. (Show Details)

Renamed UnrollByMaxCount to FullUnrollMaxTripCount, note this is not to be confused with member variable UP.FullUnrollMaxCount.
Changed debug output to accurately reflect partial or runtime unrolling.

zzheng marked an inline comment as done.Jun 11 2019, 1:27 PM
zzheng edited the summary of this revision. (Show Details)
hfinkel accepted this revision.Sep 9 2019, 5:55 AM

LGTM

lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp
168

Can you please add a comment here next to false so that it is clear what this is

/*MaxOrZero*/ false
This revision is now accepted and ready to land.Sep 9 2019, 5:55 AM
This revision was automatically updated to reflect the committed changes.