This is an archive of the discontinued LLVM Phabricator instance.

Use profile info to adjust loop unroll threshold.
ClosedPublic

Authored by danielcdh on Nov 10 2016, 2:37 PM.

Details

Summary

For flat loop, even if it is hot, it is not a good idea to unroll in runtime, thus we set a lower partial unroll threshold.
For hot loop, we set a higher unroll threshold and allows expensive tripcount computation to allow more aggressive unrolling.

Event Timeline

danielcdh updated this revision to Diff 77556.Nov 10 2016, 2:37 PM
danielcdh retitled this revision from to Use profile info to adjust loop unroll threshold..
danielcdh updated this object.
danielcdh added reviewers: davidxl, mzolotukhin.
danielcdh added a subscriber: llvm-commits.
mzolotukhin edited edge metadata.Nov 11 2016, 5:58 PM

Hi,

Thanks for working on this! Please find some comments inline.

Michael

lib/Transforms/Scalar/LoopUnrollPass.cpp
762

This looks like a magic number to me. Can we use some parameter for it (or maybe separate thresholds for 'hot' and 'cold' loops)?

test/Other/pass-pipelines.ll
49–50 ↗(On Diff #77556)

Hmm, is loop-unroll in a separate instance of loop pass manager now?

danielcdh updated this revision to Diff 78260.Nov 16 2016, 2:02 PM
danielcdh edited edge metadata.

Update the patch to remove dependency to BFI/PSI and only use trip count to evaluate if we want to unroll the loop.

Also steal Micheal's getLoopEstimatedTripCount implementation.

lib/Transforms/Scalar/LoopUnrollPass.cpp
762

Logic removed from the patch

test/Other/pass-pipelines.ll
49–50 ↗(On Diff #77556)

Removed dependency to BFI/PSI

mzolotukhin accepted this revision.Nov 16 2016, 2:24 PM
mzolotukhin edited edge metadata.

The change looks good to me, thank you! I'm assuming you and Michael will figure out which version of getLoopEstimatedTripCount you want to use, other than that I have mostly nitpicky comments below.

BTW, do you have performance testing results for this patch? I'd expect some improvements in code-size and compile-time with these changes.

Michael

lib/Transforms/Scalar/LoopUnrollPass.cpp
757

Please add some comment here.

758–759

if (auto ProfileTripCount = getLoopEstimatedTripCount(L)) ?

lib/Transforms/Utils/LoopUtils.cpp
1071

This version and the one from D25963 should eventually become the same, right?

1077–1078

Probably we also need to check that the latch is exiting (i.e. the branch is conditional).

test/Transforms/LoopUnroll/unroll-heuristics-pgo.ll
6

Please add @ to the name.

7

Is it enough to just check presence of the prologue? Maybe explicitly check that we have several copies of some instruction?

This revision is now accepted and ready to land.Nov 16 2016, 2:24 PM
danielcdh updated this revision to Diff 78280.Nov 16 2016, 3:36 PM
danielcdh marked 5 inline comments as done.
danielcdh edited edge metadata.

update

lib/Transforms/Utils/LoopUtils.cpp
1071

Yes, I stole the code from D25963 ;-)

The perf/size impact of this patch is small on speccpu as flat loop is rare in most of the benchmarks.

spec/2006/fp/C++/444.namd 25.47 +0.30%
spec/2006/fp/C++/447.dealII 45.46 +0.23%
spec/2006/fp/C++/450.soplex 43.38 +0.58%
spec/2006/fp/C++/453.povray 37.88 -0.78%
spec/2006/fp/C/433.milc 23.75 -0.13%
spec/2006/fp/C/470.lbm 41.53 -0.09%
spec/2006/fp/C/482.sphinx3 48.97 -0.11%
spec/2006/int/C++/471.omnetpp 22.79 -0.22%
spec/2006/int/C++/473.astar 22.99 +0.17%
spec/2006/int/C++/483.xalancbmk 38.55 -0.42%
spec/2006/int/C/400.perlbench 37.06 +1.14%
spec/2006/int/C/401.bzip2 23.38 +0.98%
spec/2006/int/C/403.gcc 34.52 -0.25%
spec/2006/int/C/429.mcf 42.28 -0.05%
spec/2006/int/C/445.gobmk 27.98 +0.60%
spec/2006/int/C/456.hmmer 26.01 -0.06%
spec/2006/int/C/458.sjeng 30.42 +0.50%
spec/2006/int/C/462.libquantum 57.48 +0.37%
spec/2006/int/C/464.h264ref 47.6 -0.70%

geometric mean +0.11%

danielcdh closed this revision.Nov 16 2016, 5:26 PM