This is an archive of the discontinued LLVM Phabricator instance.

Loop unroller: set thresholds for optsize and minsize functions to zero
ClosedPublic

Authored by hans on May 10 2016, 9:57 AM.

Details

Summary

Before r268509, Clang would disable the loop unroll pass when optimizing for size. That commit enabled it, to be able to support unroll pragmas in -Os builds. However, this regressed binary size in one of Chromium's DLLs with ~100 KB.

This restores the original behaviour of no unrolling at -Os, but doing it in LLVM instead of Clang makes more sense, and also allows the pragmas to keep working.

Diff Detail

Repository
rL LLVM

Event Timeline

hans updated this revision to Diff 56741.May 10 2016, 9:57 AM
hans retitled this revision from to Loop unroller: set thresholds for optsize and minsize functions to zero.
hans updated this object.
hans added reviewers: mamai, rnk.
hans added a subscriber: llvm-commits.
mamai added inline comments.May 10 2016, 10:31 AM
test/Transforms/LoopUnroll/unroll-opt-attribute.ll
7–10 ↗(On Diff #56741)

This test should not need to be modified if the #pragma unroll still works in /Os... The -unroll-count=4 should simulate the unroll pragma, and therefor the loop should be unrolled even if the optsize attribute is there.

rnk added inline comments.May 10 2016, 10:53 AM
test/Transforms/LoopUnroll/unroll-opt-attribute.ll
7–10 ↗(On Diff #56741)

Sounds like the LLVM pass is misbehaving then, and we should revert the clang side change to enable the loop unroller until this is addressed in LLVM.

hans added inline comments.May 10 2016, 10:56 AM
test/Transforms/LoopUnroll/unroll-opt-attribute.ll
7–10 ↗(On Diff #56741)

I'm looking into it. I'm not sure the -unroll-count flag and the pragma do the same thing. Maybe this is easy to fix..

hans added inline comments.May 10 2016, 11:05 AM
test/Transforms/LoopUnroll/unroll-opt-attribute.ll
7–10 ↗(On Diff #56741)

Yeah, -unroll-count does not simulate the pragma (is it supposed to?). It does set the count, but it doesn't modify the thresholds, whereas the pragma does:

if (PragmaCount > 0 ||
    ((PragmaFullUnroll || PragmaEnableUnroll) && TripCount != 0)) {                                                                    
  // If the loop has an unrolling pragma, we want to be more aggressive with                                                           
  // unrolling limits. Set thresholds to at least the PragmaTheshold value                                                             
  // which is larger than the default limits.                                                                                          
  if (UP.Threshold != NoThreshold)                                                                                                     
    UP.Threshold = std::max<unsigned>(UP.Threshold, PragmaUnrollThreshold);                                                            
  if (UP.PartialThreshold != NoThreshold)                                                                                              
    UP.PartialThreshold =
        std::max<unsigned>(UP.PartialThreshold, PragmaUnrollThreshold);                                                                
}

The pragmas are tested in test/Transforms/LoopUnroll/unroll-pragmas.ll I'll add a test there for the optsize case.

hans updated this revision to Diff 56763.May 10 2016, 11:11 AM

Adding a test that makes sure the unroll pragma works at optsize.

Please take another look.

mamai resigned from this revision.May 10 2016, 11:36 AM
mamai removed a reviewer: mamai.

Ok then, it sounds good to me, at least for the test part. But I don't think I am the right person to accept this commit. I don't know why there was a threshold of 50 for optimize for size or why there was another threshold for partial unrolling. There might frontends other than clang that used it?

mamai added a subscriber: mamai.May 10 2016, 11:43 AM

Hi,

In general I agree with what this patch is trying to fix, but wouldn't it be easier to just set PartialOptSizeThreshold and OptSizeThreshold to 0 in BasicTTI? These attributes are aimed exactly at what you are trying to solve, but we just happened to not initialize them with proper values.

Thanks,
Michael

zzheng added a subscriber: zzheng.May 10 2016, 1:31 PM
hans updated this revision to Diff 56797.May 10 2016, 1:36 PM

In general I agree with what this patch is trying to fix, but wouldn't it be easier to just set PartialOptSizeThreshold and OptSizeThreshold to 0 in BasicTTI? These attributes are aimed exactly at what you are trying to solve, but we just happened to not initialize them with proper values.

Oops, I completely missed that these thresholds could be overridden by TTI. Thanks!

It seems the defaults are set here in gatherUnrollingPreferences() though, and then TTI can adjust them if it likes. I've updated the patch to simplify set the defaults to 0. Please take another look.

mzolotukhin accepted this revision.May 10 2016, 2:30 PM
mzolotukhin added a reviewer: mzolotukhin.

Thanks, this looks good to me. In terms of performance and code size/compile time this patch should only restore the behavior before r268509, right?

Michael

test/Transforms/LoopUnroll/unroll-opt-attribute.ll
110 ↗(On Diff #56797)

Redundant space in the end.

This revision is now accepted and ready to land.May 10 2016, 2:30 PM
hans added a comment.May 10 2016, 2:51 PM

Thanks, this looks good to me. In terms of performance and code size/compile time this patch should only restore the behavior before r268509, right?

For performance and code size, yes, this just restores the behaviour with Clang before r268509 (except for code with unroll pragmas). For compile time, we didn't use to have this pass in the pipeline at all before, so even with the threshold set to zero there is probably some cost.

test/Transforms/LoopUnroll/unroll-opt-attribute.ll
110 ↗(On Diff #56797)

Will fix when committing, thanks.

This revision was automatically updated to reflect the committed changes.
llvm/trunk/test/Transforms/LoopUnroll/PowerPC/p7-unrolling.ll