This is an archive of the discontinued LLVM Phabricator instance.

Unroll of loops with constant bounds
ClosedPublic

Authored by evstupac on Mar 18 2016, 6:52 PM.

Details

Summary

The patch limit unroll of loops with constant bounds by current threshold reducing unroll factor to minimum power-of-2.
Previously we gave up on unrolling if there were no unroll factor that is modulo of TripCount or could exceed threshold if there is such.

Diff Detail

Event Timeline

evstupac updated this revision to Diff 51098.Mar 18 2016, 6:52 PM
evstupac retitled this revision from to Unroll of loops with constant bounds.
evstupac updated this object.
evstupac added reviewers: sanjoy, aizatsky, mzolotukhin.
evstupac set the repository for this revision to rL LLVM.
evstupac added a subscriber: llvm-commits.
mzolotukhin edited edge metadata.Mar 21 2016, 2:51 PM

Hi Evgeny,

Please find some comments inline.

Thanks,
Michael

lib/Transforms/Scalar/LoopUnrollPass.cpp
646–651

Do I understand it correctly that with this change we start to unroll every loop if it's possible at all (some of them with remainder)? If that's correct, have you measured performance, compile time, and code size impact of this change? Also, it might make sense to limit it to O3.

test/Transforms/LoopUnroll/partial-unroll-const-bounds.ll
1

You probably don't want to run opt ... -O2 in this test. O2 will run the entire optimization pipeline, while we only want loop-unroll.

19

Please pass the test through opt -instnamer to replace names like %0 with some symbolic names. These digit names make it harder to change the test in future.

Also, you probably could only remove almost all instructions from the body to make the test smaller. If you run only loop-unroll, nothing will optimize that to ret void, so it would be fine.

Hi Michael,

Thanks for the review.
Please find my responses inline.

Thanks,
Evgeny

lib/Transforms/Scalar/LoopUnrollPass.cpp
646–651

The change allows unroll only for loops that satisfy threshold limit. There are not much cases where this hits. On spec2000 I've got almost all build same or <0.1% code size changes.
Without the changes it happens that the same loop with unknown bounds get unrolled, but with constant no.
For example when we have prime TripCount we'll not found Count that is modulo of TripCount.

for (i = 0; i < 17; i++)

I don't see any reason why we should restrict unroll in the case if we are in threshold limits.

As for remainder - there is no such by default. As we know remainder TripCount (it is constant) we can jump into the middle of the loop first.

The other point is that now we check threshold limit only at entrance. So potentially we can find unroll factor which exceed threshold limit.

test/Transforms/LoopUnroll/partial-unroll-const-bounds.ll
1

"-O2" makes CHECK statements easier. However I agree it is not required here. I'll fix the test.

19

Ok. Will update.

mzolotukhin added inline comments.Mar 21 2016, 6:48 PM
test/Transforms/LoopUnroll/partial-unroll-const-bounds.ll
1

You could add some specific passes after yours. You probably need just something like -dce -instcombine -simplifycfg - some tests do this. Adding the entire "-O2" might introduce undesired side effects from e.g. running loop-unroll twice.

evstupac updated this revision to Diff 51504.Mar 23 2016, 6:55 PM
evstupac edited edge metadata.

Test updated.
I've created my own names instead of %0 and %1. Hope this is ok.
As for instructions in loop I'd like to leave some there (to disable full unrolling).

mzolotukhin accepted this revision.Mar 28 2016, 2:49 PM
mzolotukhin edited edge metadata.

LGTM. Please run the patch through clang-format prior to committing, it looks like there are some missing spaces etc.

Michael

This revision is now accepted and ready to land.Mar 28 2016, 2:49 PM
evstupac updated this revision to Diff 52284.Mar 31 2016, 2:05 PM
evstupac edited edge metadata.

rebased, passed clang-format

evstupac updated this revision to Diff 52408.Apr 1 2016, 12:23 PM

removed check (LoopSize - 2) * Count + 2 > UP.PartialThreshold

as at this point (LoopSize - 2) * Count + 2 <= max (UP.PartialThreshold, 3). So only UP.PartialThreshold equal to 2 can make comparison above true.

Since you've already got LGTM, you could proceed with checking it in (no need to wait for another explicit LGTM in this case).

Michael

This revision was automatically updated to reflect the committed changes.