This is an archive of the discontinued LLVM Phabricator instance.

Use continuous boosting factor for complete unroll.
ClosedPublic

Authored by danielcdh on Nov 22 2016, 1:32 PM.

Details

Summary

The current loop complete unroll algorithm checks if unrolling complete will reduce the runtime by a certain percentage. If yes, it will apply a fixed boosting factor to the threshold (by discounting cost). The problem for this approach is that the threshold abruptly. This patch makes the boosting factor a function of runtime reduction percentage, capped by a fixed threshold. In this way, the threshold changes continuously.

The patch also simplified the code by reducing one parameter in UP.

The patch only affects code-gen of two speccpu2006 benchmark:

445.gobmk binary size decreases 0.08%, no performance change.
464.h264ref binary size increases 0.24%, no performance change.

Event Timeline

danielcdh updated this revision to Diff 78933.Nov 22 2016, 1:32 PM
danielcdh retitled this revision from to Use continuous boosting factor for complete unroll..
danielcdh updated this object.
danielcdh added a reviewer: mzolotukhin.
danielcdh added a subscriber: llvm-commits.
mzolotukhin edited edge metadata.Dec 1 2016, 5:54 PM

Hi Dehao,

Sorry for the delay, I missed the patch for some reason.

A couple of initial remarks:

  1. Could you split unrelated changes into a separate (probably, NFC) patch? For instance, changing int to unsigned.
  2. Did you check compile-time impact of this change? LLVM-testsuite exposed a couple of problems with the original version, so I wonder if it's ok with the new one.

Thanks,
Michael

include/llvm/Analysis/TargetTransformInfo.h
246–257

What is the formula for the threshold boost? Could you include it into the comment?

lib/Transforms/InstCombine/InstCombinePHI.cpp
570 ↗(On Diff #78933)

Why do we need to remove it?

danielcdh updated this revision to Diff 80022.Dec 1 2016, 7:48 PM
danielcdh edited edge metadata.

rebase

danielcdh updated this revision to Diff 80023.Dec 1 2016, 7:53 PM
danielcdh marked an inline comment as done.

update comment

danielcdh marked an inline comment as done.Dec 1 2016, 7:54 PM

Hi Dehao,

Sorry for the delay, I missed the patch for some reason.

A couple of initial remarks:

  1. Could you split unrelated changes into a separate (probably, NFC) patch? For instance, changing int to unsigned.

Done

  1. Did you check compile-time impact of this change? LLVM-testsuite exposed a couple of problems with the original version, so I wonder if it's ok with the new one.

I checked the compile time impact on speccpu2006, no noticeable compile time change is observed.

Thanks,
Dehao

Thanks,
Michael

lib/Transforms/InstCombine/InstCombinePHI.cpp
570 ↗(On Diff #78933)

This is irrelevant, removed the change.

Hi Dehao,

Please find some comments inline.

Thanks,
Michael

include/llvm/Analysis/TargetTransformInfo.h
246–257

I would also add a couple of examples to clarify the intention of this boost. Like if unrolling reduces the loop (in terms of execution time) by a factor of 4x, then we boost the threshold by the factor of 4. If unrolling isn't expected to reduce the running time, then we don't increase the threshold.

lib/Transforms/Scalar/LoopUnrollPass.cpp
49–57

I think this description might be unclear for future users who are not familiar with this patch. We need to reflect what, when, and how this parameter boosts.

766–767
  1. Why do we use Benefit*Benefit here?
  2. The result is unsigned, which means that the values we can get here are very limited (1,2,3, or 4 with PercentMaxThresholdBoost = 400). I'd suggest computing the upper bound for the cost, not the benefit to work around it.

Also, this code needs some comments, it's not obvious what we're doing here.

test/Transforms/LoopUnroll/full-unroll-crashers.ll
2

Don't we need a corresponding -unroll-max-percent-threshold-boost argument here? We need to explicitly pass it so that the test is immune to future changes in default thresholds.

test/Transforms/LoopUnroll/full-unroll-heuristics-2.ll
1

Same here and in other tests.

danielcdh updated this revision to Diff 81493.Dec 14 2016, 4:09 PM
danielcdh marked 2 inline comments as done.

update

lib/Transforms/Scalar/LoopUnrollPass.cpp
766–767

Updated the code and comments, hopefully makes it clear.

mzolotukhin added inline comments.Dec 14 2016, 5:02 PM
include/llvm/Analysis/TargetTransformInfo.h
252–253

Shouldn't it be 2x instead of 4x?

lib/Transforms/Scalar/LoopUnrollPass.cpp
54

I didn't realize that we use this formula - I thought we're using a linear function. Doesn't it contradict to this?

/// BoostedThreshold = Threshold * min(RolledCost / UnrolledCost,
///                                    PercentMaxThresholdBoost)
test/Transforms/LoopUnroll/partial-unroll-const-bounds.ll
1

I think unroll-max-percent-threshold-boost should be 100 to correspond to unroll-dynamic-cost-savings-discount=0. BTW, it looks a bit weird that value 100 for the boost means that we are actually not boosting anything.

test/Transforms/LoopUnroll/unroll-heuristics-pgo.ll
1

Same here.

danielcdh updated this revision to Diff 81514.Dec 14 2016, 6:05 PM

update

include/llvm/Analysis/TargetTransformInfo.h
252–253

Updated the comment to make it consistent.

lib/Transforms/Scalar/LoopUnrollPass.cpp
54

Updated the comment to make it consistent.

test/Transforms/LoopUnroll/partial-unroll-const-bounds.ll
1

Yes, 100 means 100%, i.e no boosting. Maybe we need to change the naming to make it more accurate, suggestions?

mzolotukhin added inline comments.Dec 15 2016, 12:21 PM
lib/Transforms/Scalar/LoopUnrollPass.cpp
54

Where is this 1/(1-X^2) come from? To me from the code it looks like:

NewThreshold = DefaultThreshold * X^2

Also, is there any compelling reason not use a simple linear formula for this?
E.g.

NewThreshold = DefaultThreshold * Y,
Y = min(RolledCost/UnrolledCost, BoostLimit)
danielcdh updated this revision to Diff 81657.Dec 15 2016, 1:56 PM

Update comment

danielcdh updated this revision to Diff 81659.Dec 15 2016, 2:10 PM

change to linear boosting factor computation.

danielcdh added inline comments.Dec 15 2016, 2:12 PM
lib/Transforms/Scalar/LoopUnrollPass.cpp
54

Thanks for the reviews!

Updated the comment to make it clearer.

No compelling reason, just thought it would be helpful to promote more complete inline when beneficial. Performance experiments does not justify the non-linear equation, so I changed to linear formula instead.

mzolotukhin accepted this revision.Dec 15 2016, 4:04 PM
mzolotukhin edited edge metadata.

Hi Dehao,

Thanks, the patch mostly looks good to me (see minor remarks inline). I also would like Chandler to take a look at this too, as we discussed this with him in the past - I've added him to reviewers.

Thanks,
Michael

lib/Transforms/Scalar/LoopUnrollPass.cpp
54

The comment is still about quadratic formula:)

765

This computation potentially may overflow. We probably need some checks against it.

test/Transforms/LoopUnroll/full-unroll-heuristics-dce.ll
1

Minor: if possible, please don't change -unroll-threshold=12. We're replacing -dynamic-cost-savings-discount and -unroll-percent-dynamic-cost-saved-threshold, so the changes should touch only them.

test/Transforms/LoopUnroll/partial-unroll-const-bounds.ll
24

Can we avoid this change? If no, the comment before the test needs to be updated.

This revision is now accepted and ready to land.Dec 15 2016, 4:04 PM
danielcdh updated this revision to Diff 81758.Dec 16 2016, 8:30 AM
danielcdh marked 2 inline comments as done.
danielcdh edited edge metadata.

update

Thanks for the review!

Chandler, could you help take a look at this patch too?

Thanks,
Dehao

test/Transforms/LoopUnroll/full-unroll-heuristics-dce.ll
1

Because we are now using linear equation to compute boosting factor, it is not enough to boost 10 to the required threshold. That's the original reason I used non-linear equation.

chandlerc added inline comments.Dec 28 2016, 6:11 PM
lib/Transforms/Scalar/LoopUnrollPass.cpp
49–56

Here and above, if the value is a limit or max, start with that. So I would say MaxPercentThresholdBoost. In fact, the flag string below is already that order.

Also, I liked having these be different from the names in the TTI struct. Can you keep the Unroll prefix that used to be here? That also matches the flag text.

51–56

The text here doesn't really parse for me. It starts off talking about something other than what it is. How about:

The maximum 'boost' (represented as a percentage >= 100) applied to
the threshold when aggressively unrolling a loop due to the dynamic
cost savings. If completely unrolling a loop will reduce the total
runtime from X to Y, we boost the lop unroll threshold to
DefaultThreshold*std::min(MaxPercentThresholdBoost, X/Y). This limit
avoids excessive code bloat.
762–768

How about factoring the logic to compute this BoostFactorPercent into a helper function? I think that would make it more clear -- you could use early return to bail out in error conditions.

I think it would also be a good place to explain some of the reasoning behind the specific formula (runtime cost / unrolled cost) used to scale the threshold.

danielcdh updated this revision to Diff 82674.Dec 29 2016, 9:04 AM
danielcdh marked 3 inline comments as done.

integrate Chandler's comments

Thanks for the reviews! PTAL

chandlerc accepted this revision.Dec 29 2016, 1:51 PM
chandlerc edited edge metadata.

I'm good with the patch now, thanks (see nit picks below, but no need to refresh the patch, just fix prior to submitting).

I would like to see at least code size numbers for the llvm test suite benchmarks before you submit. (I'm interested if there are runtime changes, but not really worried about them.) If the code size doesn't regress significantly (as SPEC doesn't), LGTM.

lib/Transforms/Scalar/LoopUnrollPass.cpp
54

lop -> loop

762–763

I think this will format in an easier to read way with a variable like Boost.

danielcdh updated this revision to Diff 82704.Dec 29 2016, 3:58 PM
danielcdh marked an inline comment as done.
danielcdh edited edge metadata.

update

I'm good with the patch now, thanks (see nit picks below, but no need to refresh the patch, just fix prior to submitting).

I would like to see at least code size numbers for the llvm test suite benchmarks before you submit. (I'm interested if there are runtime changes, but not really worried about them.) If the code size doesn't regress significantly (as SPEC doesn't), LGTM.

The code size does not change except for the following 2 binaries:

CMakeFiles/CheckTypeSize/CMAKE_SIZEOF_UNSIGNED_SHORT.bin 8112->8104 (0.1%)
CMakeFiles/TestEndianess.bin 8096->8088 (0.1%)

The run time did not change.

danielcdh closed this revision.Dec 29 2016, 5:01 PM