This is an archive of the discontinued LLVM Phabricator instance.

[LV] Clamp the VF to the trip count
ClosedPublic

Authored by anna on Sep 11 2017, 11:28 AM.

Details

Summary

When the MaxVectorSize > ConstantTripCount, we should just clamp the
vectorization factor to be the ConstantTripCount.
This vectorizes loops where the TinyTripCountThreshold >= TripCount < MaxVF.

Earlier we were finding the maximum vector width, which could be greater than
the trip count itself. The Loop vectorizer does all the work for generating a
vectorizable loop, but in the end we would always choose the scalar loop (since
the VF > trip count). This allows us to choose the VF keeping in mind the trip
count if available.

This is a fix on top of rL312472.

Diff Detail

Repository
rL LLVM

Event Timeline

anna created this revision.Sep 11 2017, 11:28 AM
Ayal added inline comments.Sep 11 2017, 2:05 PM
lib/Transforms/Vectorize/LoopVectorize.cpp
6166 ↗(On Diff #114643)

Better remove the default setting of TC = 0 in the declaration of computeFeasibleMaxVF(), which becomes dead.

6179 ↗(On Diff #114643)

Probably good to hoist this DEBUG line along with computing TC earlier.

6258 ↗(On Diff #114643)

While we're at it, in this or a follow-up patch:

  • Suggest to also clamp NewMaxVectorSize above to TC.
  • VS below should start from MaxVectorSize*2 instead of MaxVectorSize; it's pointless and wasteful to calculateRegisterUsage for MaxVectorSize, as the latter will be returned by default anyway.
  • Optimize calculateRegisterUsage() for the case where VFs is empty.

Doing this will also take care of your early exit above.

test/Transforms/LoopVectorize/X86/vector_max_bandwidth.ll
3 ↗(On Diff #114643)

You could reuse CHECK-AVX2 instead of adding CHECK-CLAMP, right?

anna marked 2 inline comments as done.Sep 12 2017, 4:56 AM
anna added inline comments.
lib/Transforms/Vectorize/LoopVectorize.cpp
6179 ↗(On Diff #114643)

When OptForSize is true (i.e. at this point in the code), we know for sure that a small trip count value returns a constant trip count or value of 1, if we didn't know the trip count.

I left this here rather than move up because there maybe *more* spurious cases where we do not have a *small* trip count, and the DEBUG statement wouldn't be useful.

6258 ↗(On Diff #114643)

To me, it seems rather than clamping the NewMaxVectorSize, explicitly showing the early return is clearer.
The second and third suggestions I'll do in a follow-up patch.

Also, an additional point to follow up: I think we can early return in even earlier case - when VF is set to 1 because the target has no vector registers.

anna updated this revision to Diff 114798.Sep 12 2017, 5:01 AM

Addressed review comments - no longer need the default argument,
updated test to reuse the existing RUN.

anna updated this revision to Diff 114815.Sep 12 2017, 6:18 AM

updated previous diff to CHECK for the correct VF in the first test.

Ayal added inline comments.Sep 12 2017, 6:40 AM
lib/Transforms/Vectorize/LoopVectorize.cpp
6179 ↗(On Diff #114643)

At this point I think TC can also be zero if we don't know it, which is why we check if TC<2 below (i.e., if (TC==0 || TC == 1)).

Printing out the value of TC when computed above can only aid debugging, afaics.

In any case, I'm also ok leaving it here; you're also printing TC in computeFeasibleMaxVF() where it clamps MaxVF.

6258 ↗(On Diff #114643)

Agreed!

test/Transforms/LoopVectorize/X86/vector_max_bandwidth.ll
54 ↗(On Diff #114798)

The max possible vector width for this test on AVX1 is 16, so we're missing the point in checking its selected VF here. Suffice to CHECK-AVX2 only, whose max possible vector width is 32, as noted.

anna added a comment.Sep 12 2017, 7:26 AM

@Ayal, any other comments or does this look good to go? Thanks.

lib/Transforms/Vectorize/LoopVectorize.cpp
6179 ↗(On Diff #114643)

yes, the clamped value is printed in the debug statement.

test/Transforms/LoopVectorize/X86/vector_max_bandwidth.ll
54 ↗(On Diff #114798)

agreed, I'll remove the CHECK-AVX1 check.

anna updated this revision to Diff 114839.Sep 12 2017, 7:36 AM
anna marked an inline comment as done.

minor review comment: removed unnecessary CHECK.

Ayal accepted this revision.Sep 12 2017, 7:51 AM

@Ayal, any other comments or does this look good to go? Thanks.

This looks good to go!

This revision is now accepted and ready to land.Sep 12 2017, 7:51 AM
This revision was automatically updated to reflect the committed changes.