For flat loop, even if it is hot, it is not a good idea to unroll in runtime, thus we set a lower partial unroll threshold.
For hot loop, we set a higher unroll threshold and allows expensive tripcount computation to allow more aggressive unrolling.
Details
Diff Detail
- Build Status
Buildable 1172 Build 1172: arc lint + arc unit
Event Timeline
Hi,
Thanks for working on this! Please find some comments inline.
Michael
lib/Transforms/Scalar/LoopUnrollPass.cpp | ||
---|---|---|
765 | This looks like a magic number to me. Can we use some parameter for it (or maybe separate thresholds for 'hot' and 'cold' loops)? | |
test/Other/pass-pipelines.ll | ||
49–50 | Hmm, is loop-unroll in a separate instance of loop pass manager now? |
Update the patch to remove dependency to BFI/PSI and only use trip count to evaluate if we want to unroll the loop.
Also steal Micheal's getLoopEstimatedTripCount implementation.
lib/Transforms/Scalar/LoopUnrollPass.cpp | ||
---|---|---|
765 | Logic removed from the patch | |
test/Other/pass-pipelines.ll | ||
49–50 | Removed dependency to BFI/PSI |
The change looks good to me, thank you! I'm assuming you and Michael will figure out which version of getLoopEstimatedTripCount you want to use, other than that I have mostly nitpicky comments below.
BTW, do you have performance testing results for this patch? I'd expect some improvements in code-size and compile-time with these changes.
Michael
lib/Transforms/Scalar/LoopUnrollPass.cpp | ||
---|---|---|
760 | Please add some comment here. | |
761–762 | if (auto ProfileTripCount = getLoopEstimatedTripCount(L)) ? | |
lib/Transforms/Utils/LoopUtils.cpp | ||
1071 ↗ | (On Diff #78260) | This version and the one from D25963 should eventually become the same, right? |
1077–1078 ↗ | (On Diff #78260) | Probably we also need to check that the latch is exiting (i.e. the branch is conditional). |
test/Transforms/LoopUnroll/unroll-heuristics-pgo.ll | ||
6 | Please add @ to the name. | |
7 | Is it enough to just check presence of the prologue? Maybe explicitly check that we have several copies of some instruction? |
The perf/size impact of this patch is small on speccpu as flat loop is rare in most of the benchmarks.
spec/2006/fp/C++/444.namd 25.47 +0.30%
spec/2006/fp/C++/447.dealII 45.46 +0.23%
spec/2006/fp/C++/450.soplex 43.38 +0.58%
spec/2006/fp/C++/453.povray 37.88 -0.78%
spec/2006/fp/C/433.milc 23.75 -0.13%
spec/2006/fp/C/470.lbm 41.53 -0.09%
spec/2006/fp/C/482.sphinx3 48.97 -0.11%
spec/2006/int/C++/471.omnetpp 22.79 -0.22%
spec/2006/int/C++/473.astar 22.99 +0.17%
spec/2006/int/C++/483.xalancbmk 38.55 -0.42%
spec/2006/int/C/400.perlbench 37.06 +1.14%
spec/2006/int/C/401.bzip2 23.38 +0.98%
spec/2006/int/C/403.gcc 34.52 -0.25%
spec/2006/int/C/429.mcf 42.28 -0.05%
spec/2006/int/C/445.gobmk 27.98 +0.60%
spec/2006/int/C/456.hmmer 26.01 -0.06%
spec/2006/int/C/458.sjeng 30.42 +0.50%
spec/2006/int/C/462.libquantum 57.48 +0.37%
spec/2006/int/C/464.h264ref 47.6 -0.70%
geometric mean +0.11%
Please add some comment here.