The patch limit unroll of loops with constant bounds by current threshold reducing unroll factor to minimum power-of-2.
Previously we gave up on unrolling if there were no unroll factor that is modulo of TripCount or could exceed threshold if there is such.
Details
Diff Detail
Event Timeline
Hi Evgeny,
Please find some comments inline.
Thanks,
Michael
lib/Transforms/Scalar/LoopUnrollPass.cpp | ||
---|---|---|
646–651 | Do I understand it correctly that with this change we start to unroll every loop if it's possible at all (some of them with remainder)? If that's correct, have you measured performance, compile time, and code size impact of this change? Also, it might make sense to limit it to O3. | |
test/Transforms/LoopUnroll/partial-unroll-const-bounds.ll | ||
1 | You probably don't want to run opt ... -O2 in this test. O2 will run the entire optimization pipeline, while we only want loop-unroll. | |
19 | Please pass the test through opt -instnamer to replace names like %0 with some symbolic names. These digit names make it harder to change the test in future. Also, you probably could only remove almost all instructions from the body to make the test smaller. If you run only loop-unroll, nothing will optimize that to ret void, so it would be fine. |
Hi Michael,
Thanks for the review.
Please find my responses inline.
Thanks,
Evgeny
lib/Transforms/Scalar/LoopUnrollPass.cpp | ||
---|---|---|
646–651 | The change allows unroll only for loops that satisfy threshold limit. There are not much cases where this hits. On spec2000 I've got almost all build same or <0.1% code size changes. for (i = 0; i < 17; i++) I don't see any reason why we should restrict unroll in the case if we are in threshold limits. As for remainder - there is no such by default. As we know remainder TripCount (it is constant) we can jump into the middle of the loop first. The other point is that now we check threshold limit only at entrance. So potentially we can find unroll factor which exceed threshold limit. | |
test/Transforms/LoopUnroll/partial-unroll-const-bounds.ll | ||
1 | "-O2" makes CHECK statements easier. However I agree it is not required here. I'll fix the test. | |
19 | Ok. Will update. |
test/Transforms/LoopUnroll/partial-unroll-const-bounds.ll | ||
---|---|---|
1 | You could add some specific passes after yours. You probably need just something like -dce -instcombine -simplifycfg - some tests do this. Adding the entire "-O2" might introduce undesired side effects from e.g. running loop-unroll twice. |
Test updated.
I've created my own names instead of %0 and %1. Hope this is ok.
As for instructions in loop I'd like to leave some there (to disable full unrolling).
LGTM. Please run the patch through clang-format prior to committing, it looks like there are some missing spaces etc.
Michael
removed check (LoopSize - 2) * Count + 2 > UP.PartialThreshold
as at this point (LoopSize - 2) * Count + 2 <= max (UP.PartialThreshold, 3). So only UP.PartialThreshold equal to 2 can make comparison above true.
Since you've already got LGTM, you could proceed with checking it in (no need to wait for another explicit LGTM in this case).
Michael
Do I understand it correctly that with this change we start to unroll every loop if it's possible at all (some of them with remainder)? If that's correct, have you measured performance, compile time, and code size impact of this change? Also, it might make sense to limit it to O3.