This is an archive of the discontinued LLVM Phabricator instance.

[LoopUnroll] Rotate loop, when optimizing for size and can fully unroll a loop.
AbandonedPublic

Authored by fhahn on Apr 4 2019, 7:08 AM.

Details

Summary

If we can fully unroll a loop, there should be no size overhead when
rotating a loop, as a prerequisite for unrolling. Loop rotation is
conservative when optimizing for size, but we can rotate in loop-unroll,
if unrolling is beneficial.

To avoid rotating when LoopUnroll would fail, do some checks if rotating
will result in an unroll-able loop.

I measure code size on ARM64 with -Oz -flto on MultiSource, SPEC2000 and
SPEC2006. Codesize changes are as follows:

test-suite...Rodinia/backprop/backprop.test 1796.00 1752.00 -2.4%
test-suite.../Benchmarks/Olden/mst/mst.test 1436.00 1420.00 -1.1%
test-suite...006/447.dealII/447.dealII.test 256408.00 255040.00 -0.5%
test-suite...urce/Applications/lua/lua.test 88648.00 88508.00 -0.2%
test-suite.../CINT2000/176.gcc/176.gcc.test 837764.00 836520.00 -0.1%
test-suite...yApps-C++/PENNANT/PENNANT.test 40156.00 40188.00 0.1%
test-suite...rks/tramp3d-v4/tramp3d-v4.test 195420.00 195280.00 -0.1%
test-suite...0/253.perlbmk/253.perlbmk.test 335568.00 335400.00 -0.1%

Event Timeline

fhahn created this revision.Apr 4 2019, 7:08 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 4 2019, 7:08 AM
paquette added inline comments.Apr 4 2019, 9:24 AM
llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp
1096

Can you just use the OptForSize you defined earlier in the function?

1108

I feel like this comment doesn't really describe why you're avoiding loops that are unsafe to clone? Maybe a bit more detail here?

1119

This could use a comment explaining what you're doing here?

1130

I think it's probably worth factoring L->getHeader()->getParent() out into a variable at this point, if only to reduce the number of ->s. :)

fhahn marked an inline comment as done.Apr 4 2019, 9:57 AM
fhahn added inline comments.
llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp
1108

Having all of those checks here is a bit unfortunate, really, because they are basically taken as is from UnrollLoop, and the comments make more sense there IMO. I'll see if I can push the rotation logic into LoopUnroll, as an option.

Why do we need to rotate the loop before unrolling? llvm::UnrollLoop currently refuses to unroll loops where the latch is an unconditional branch, but that isn't a fundamental limitation, as far as I can tell. We already support unrolling loops where the latch is not the exit branch; allowing loops where the latch doesn't exit at all is a minor extension. Granted, it might be more efficient to explicitly rotate the loop before unrolling, so we don't clone quite so much code.

On a related note, independent of what we do in unrolling, it would probably be worthwhile to teach loop rotation to rotate loops where the cloned header would fold to zero instructions.

fhahn added a comment.Apr 4 2019, 1:04 PM

Why do we need to rotate the loop before unrolling? llvm::UnrollLoop currently refuses to unroll loops where the latch is an unconditional branch, but that isn't a fundamental limitation, as far as I can tell. We already support unrolling loops where the latch is not the exit branch; allowing loops where the latch doesn't exit at all is a minor extension. Granted, it might be more efficient to explicitly rotate the loop before unrolling, so we don't clone quite so much code.

IIUC the only reason requiring the conditional branch in the latch is to reduce the complexity of the unrolling code. In the non-optsize case, loop-rotate should take care of them. I can have a look and see how much work it would be to lift the restriction on UnrollLoop. The initial patch was just the easiest way to get things working, to give a good idea of the potential gains.

On a related note, independent of what we do in unrolling, it would probably be worthwhile to teach loop rotation to rotate loops where the cloned header would fold to zero instructions.

Great idea, it seems there are a few potential improvements for optsize around here. I'll add it to my todo list :)

Like D59832, #pragma clang loop unroll/#pragma unroll might require a loop rotation as well to successfully unroll?

fhahn added a comment.May 8 2019, 8:35 AM

On a related note, independent of what we do in unrolling, it would probably be worthwhile to teach loop rotation to rotate loops where the cloned header would fold to zero instructions.

I started looking into this and put up D61683, which uses UnrolledInstAnalyzer to simulate the simplification of the hoisted header, including a patch to move some functionality in UnrolledInstAnalyzer. It is still rough (the interface & naming needs improving) and it lead to some size improvements and regressions. I need to take a closer look at the regressions, I suspect they highlight some problems in other passes, that previously did not have to deal with a lot of loops with -Oz.

fhahn added a comment.May 15 2019, 1:51 PM

Why do we need to rotate the loop before unrolling? llvm::UnrollLoop currently refuses to unroll loops where the latch is an unconditional branch, but that isn't a fundamental limitation, as far as I can tell. We already support unrolling loops where the latch is not the exit branch; allowing loops where the latch doesn't exit at all is a minor extension. Granted, it might be more efficient to explicitly rotate the loop before unrolling, so we don't clone quite so much code.

It looks like there is not too much work needed to support loops with unconditional latches with exiting headers. I've put up a patch: D61962. Still needs a bit more evaluating of the impact on code-size/performance. It is more aggressive than this patch.

fhahn abandoned this revision.Jun 27 2019, 1:13 PM

The motivating case for this patch has been addressed by teaching UnrollLoop to deal with loops with unconditional latches: https://reviews.llvm.org/rL364398

Thanks for all the feedback!