For Hexagon, peeling loops with small runtime trip count is beneficial for our benchmarks. We set PeelCount in HexagonTargetInfo.cpp and we use PeelCount set by the target for computing the desired peel count.
Details
- Reviewers
bcahoon mkazantsev sanjoy - Commits
- rG1376d934ed24: [Hexagon] peel loops with runtime small trip counts
rGb7322e8ac7a9: peel loops with runtime small trip counts
rG66c8ba5a50ea: peel loops with runtime small trip counts
rL329129: [Hexagon] peel loops with runtime small trip counts
rL329042: peel loops with runtime small trip counts
rL328854: peel loops with runtime small trip counts
Diff Detail
Event Timeline
lib/Transforms/Scalar/LoopUnrollPass.cpp | ||
---|---|---|
803 ↗ | (On Diff #139743) | I don't think it is correct. computePeelCount will set PeelCount to 0 if the loop is unpeelable, and you are forcing it to some value without checking this fact. |
test/Transforms/LoopUnroll/Hexagon/peel-small-loop.ll | ||
---|---|---|
7 | Please add a test where you force peel count to 2 and the loop cannot be peeled, I believe it will expose a bug. |
lib/Transforms/Scalar/LoopUnrollPass.cpp | ||
---|---|---|
803 ↗ | (On Diff #139743) | There is another call to canPeel() in peelLoop(). In the case of a target sets the peel count to Non zero, then the loop won’t get peel if it’s invalid to be peeled. |
lib/Transforms/Scalar/LoopUnrollPass.cpp | ||
---|---|---|
803 ↗ | (On Diff #139743) |
I suggest you to move your logic inside computePeelCount and return the number of peeled iterations set by your platform-specific code *after* you check that the loop can be peeled. This will allow to replace the check in peelLoop with assertion later. |
lib/Transforms/Utils/LoopUnrollPeel.cpp | ||
490 |
|
lib/Transforms/Scalar/LoopUnrollPass.cpp | ||
---|---|---|
803 ↗ | (On Diff #139743) | I have submitted https://reviews.llvm.org/D44919 for review to get rid of this second check, I believe this is how it should work. |
lib/Transforms/Scalar/LoopUnrollPass.cpp | ||
---|---|---|
803 ↗ | (On Diff #139743) | How about changing computePeelCount to take the maximum of the passed in peelCount and the compute peel count? That seems to be in line with how other unroll settings are handled. For example, if computePeelCount finds a higher peel count, shouldn't that one be used? Also note that even settings like UnrollForcePeelCount are only applied after the other calculations could not find a peel count. |
lib/Transforms/Utils/LoopUnrollPeel.cpp | ||
---|---|---|
223 | You only need preset peel count in one place. How about remembering it into a variable in the beginning of the method, zeroing UP.PeelCount and using the variable where you need it? Then you will not need to zero UP.PeelCount before every return. |
test/Transforms/LoopUnroll/Hexagon/peel-small-loop.ll | ||
---|---|---|
7 | I still ask you to add such a test. |
Thanks Max. Please note that this patch also fixes a bug as well. Without this patch, the value passed by the flag -unroll-peel-count was ignored.
test/Transforms/LoopUnroll/Hexagon/peel-small-loop.ll | ||
---|---|---|
7 | There is already a test that check this case: test/Transforms/LoopUnroll/peel-loop-irreducible.ll. |
I think this patch may cause some inefficiencies in terms of compile time and generated code in case if unroller decides to make full unrolling or partial unrolling with user-forced number of iterations, see discussion of D44919. However it is not a correctness issue, i.e. resulting code will still produce right results.
Currently it fails by assert after the logic has changed, to fix it I propose to add canPeel check before forcing peel count.
Thank you very much Max for your review. I tried to zero UP.PeelCount before every return from computeUnrollCount that stands before computePeelCount and it causes test/Transforms/LoopUnroll/pr33437.ll to fail. This test expects to peel a loop first and than unrolling it.
Ok, then it seems to be expected behavior in some cases (which is strange to me, but OK). Then we can just add canPeel check to this patch to avoid crash. As I said before, checking this before setting peel count is reasonable by itself.
I committed this patch with canPeel() check before setting PeelCount in Hexagon Target. Thank you very much Max for your Review.
You only need preset peel count in one place. How about remembering it into a variable in the beginning of the method, zeroing UP.PeelCount and using the variable where you need it? Then you will not need to zero UP.PeelCount before every return.