This is an archive of the discontinued LLVM Phabricator instance.

[LV] Clamp the VF to the trip count
ClosedPublic

Authored by anna on Sep 11 2017, 11:28 AM.

Details

Summary

When the MaxVectorSize > ConstantTripCount, we should just clamp the
vectorization factor to be the ConstantTripCount.
This vectorizes loops where the TinyTripCountThreshold >= TripCount < MaxVF.

Earlier we were finding the maximum vector width, which could be greater than
the trip count itself. The Loop vectorizer does all the work for generating a
vectorizable loop, but in the end we would always choose the scalar loop (since
the VF > trip count). This allows us to choose the VF keeping in mind the trip
count if available.

This is a fix on top of rL312472.

Event Timeline

anna created this revision.Sep 11 2017, 11:28 AM
Ayal added inline comments.Sep 11 2017, 2:05 PM
lib/Transforms/Vectorize/LoopVectorize.cpp
6166

Better remove the default setting of TC = 0 in the declaration of computeFeasibleMaxVF(), which becomes dead.

6179

Probably good to hoist this DEBUG line along with computing TC earlier.

6258

While we're at it, in this or a follow-up patch:

  • Suggest to also clamp NewMaxVectorSize above to TC.
  • VS below should start from MaxVectorSize*2 instead of MaxVectorSize; it's pointless and wasteful to calculateRegisterUsage for MaxVectorSize, as the latter will be returned by default anyway.
  • Optimize calculateRegisterUsage() for the case where VFs is empty.

Doing this will also take care of your early exit above.

test/Transforms/LoopVectorize/X86/vector_max_bandwidth.ll
3

You could reuse CHECK-AVX2 instead of adding CHECK-CLAMP, right?

anna marked 2 inline comments as done.Sep 12 2017, 4:56 AM
anna added inline comments.
lib/Transforms/Vectorize/LoopVectorize.cpp
6179

When OptForSize is true (i.e. at this point in the code), we know for sure that a small trip count value returns a constant trip count or value of 1, if we didn't know the trip count.

I left this here rather than move up because there maybe *more* spurious cases where we do not have a *small* trip count, and the DEBUG statement wouldn't be useful.

6258

To me, it seems rather than clamping the NewMaxVectorSize, explicitly showing the early return is clearer.
The second and third suggestions I'll do in a follow-up patch.

Also, an additional point to follow up: I think we can early return in even earlier case - when VF is set to 1 because the target has no vector registers.

anna updated this revision to Diff 114798.Sep 12 2017, 5:01 AM

Addressed review comments - no longer need the default argument,
updated test to reuse the existing RUN.

anna updated this revision to Diff 114815.Sep 12 2017, 6:18 AM

updated previous diff to CHECK for the correct VF in the first test.

Ayal added inline comments.Sep 12 2017, 6:40 AM
lib/Transforms/Vectorize/LoopVectorize.cpp
6179

At this point I think TC can also be zero if we don't know it, which is why we check if TC<2 below (i.e., if (TC==0 || TC == 1)).

Printing out the value of TC when computed above can only aid debugging, afaics.

In any case, I'm also ok leaving it here; you're also printing TC in computeFeasibleMaxVF() where it clamps MaxVF.

6258

Agreed!

test/Transforms/LoopVectorize/X86/vector_max_bandwidth.ll
54

The max possible vector width for this test on AVX1 is 16, so we're missing the point in checking its selected VF here. Suffice to CHECK-AVX2 only, whose max possible vector width is 32, as noted.

anna added a comment.Sep 12 2017, 7:26 AM

@Ayal, any other comments or does this look good to go? Thanks.

lib/Transforms/Vectorize/LoopVectorize.cpp
6179

yes, the clamped value is printed in the debug statement.

test/Transforms/LoopVectorize/X86/vector_max_bandwidth.ll
54

agreed, I'll remove the CHECK-AVX1 check.

anna updated this revision to Diff 114839.Sep 12 2017, 7:36 AM
anna marked an inline comment as done.

minor review comment: removed unnecessary CHECK.

Ayal accepted this revision.Sep 12 2017, 7:51 AM

@Ayal, any other comments or does this look good to go? Thanks.

This looks good to go!

This revision is now accepted and ready to land.Sep 12 2017, 7:51 AM
This revision was automatically updated to reflect the committed changes.