This is an archive of the discontinued LLVM Phabricator instance.

LoopVectorize: MaxVF should not be larger than the loop trip count
ClosedPublic

Authored by zvi on Sep 3 2017, 7:46 AM.

Details

Summary

Improve how MaxVF is computed while taking into account that MaxVF should not be larger than the loop's trip count.

Other than saving on compile-time by pruning the possible MaxVF candidates, this patch fixes pr34438 which exposed the following flow:

  1. Short trip count identified -> Don't bail out, set OptForSize:=True to avoid tail-loop and runtime checks.
  2. Compute MaxVF returned 16 on a target supporting AVX512.
  3. OptForSize -> choose VF:=MaxVF.
  4. Bail out because TripCount = 8, VF = 16, TripCount % VF !=0 means we need a tail loop.

With this patch step 2. will choose MaxVF=8 based on TripCount.

Event Timeline

zvi created this revision.Sep 3 2017, 7:46 AM
hfinkel added a subscriber: hfinkel.Sep 3 2017, 8:00 AM
hfinkel added inline comments.
test/Transforms/LoopVectorize/X86/pr34438.ll
8

Can you please test the output directly instead of testing the debug output? We generally test debug output when there's no other effective way to write a test. It's better to have non-asserts-requiring tests.

zvi updated this revision to Diff 113702.Sep 3 2017, 8:47 AM

Test the contents of the transformed function instead of using debug prints. This addresses Hal's comment. Thanks.

zvi marked an inline comment as done.Sep 3 2017, 8:47 AM
hfinkel accepted this revision.Sep 3 2017, 8:48 AM

LGTM

This revision is now accepted and ready to land.Sep 3 2017, 8:48 AM
This revision was automatically updated to reflect the committed changes.
Ayal added inline comments.Sep 4 2017, 2:45 AM
llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp
6165 ↗(On Diff #113729)

Would be good to restrict MaxVF to TC even if we're not OptForSize, although it's probably less likely to have a known TC that is larger than Tiny but smaller than current MaxVF.

6243 ↗(On Diff #113729)

Better set MaxVectorSize to PowerOf2Floor(ConstTripCount), to also handle the case where ConstTripCount is not a power of 2.

Also, while we're at it: computeFeasibleMaxVF() is currently not required/documented to return a power of 2. We should either check that MaxVF is a power of 2 when checking if (TC % MaxVF != 0), and then can simply set MaxVectorSize to ConstTripCount here. Otherwise we should document/assert that MaxVF is a power of 2.

zvi added inline comments.Sep 4 2017, 4:08 PM
llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp
6165 ↗(On Diff #113729)

Thanks for pointing that out! Created pr34468 showing a missed opportunity for TinyTripCount < ConstTripCount < MaxVL.

6243 ↗(On Diff #113729)

Better set MaxVectorSize to PowerOf2Floor(ConstTripCount), to also handle the case where ConstTripCount is not a power of 2.

What would there be a benefit of this suggestion given that we currently bail out if there is a need for a tail-loop in the case of short-trip-count? Should we tie this with the work on vectorizing short trip count and allowing runtime checks/tail loops?

Also, while we're at it: computeFeasibleMaxVF() is currently not required/documented to return a power of 2. We should either check that MaxVF is a power of 2 when checking if (TC % MaxVF != 0), and then can simply set MaxVectorSize to ConstTripCount here. Otherwise we should document/assert that MaxVF is a power of 2.

It did not occur to me that MaxVF may not be a power of two. Can we label this as a follow-up work?

Ayal added inline comments.Sep 10 2017, 1:22 AM
llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp
6243 ↗(On Diff #113729)

Better set MaxVectorSize to PowerOf2Floor(ConstTripCount), to also handle the case where ConstTripCount is not a power of 2.

What would there be a benefit of this suggestion given that we currently bail out if there is a need for a tail-loop in the case of short-trip-count? Should we tie this with the work on vectorizing short trip count and allowing runtime checks/tail loops?

Well, your pr34468 shows where there would be a benefit, namely when TinyTripCount < ConstTripCount < MaxVL which allows runtime checks, i.e., does not bail-out if a tail-loop is needed.

It did not occur to me that MaxVF may not be a power of two. Can we label this as a follow-up work?

Sure, suggest to simply document and assert that MaxVF is expected to be a power of two.

anna added a subscriber: anna.Sep 11 2017, 10:09 AM
anna added inline comments.
llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp
6165 ↗(On Diff #113729)

This is exactly the case we are seeing. Currently, even with this patch, for loops with constant trip count = 16, we will have OptForSize as false (because TC=16 is same as TinyTripCountVectorThreshold).

So, if the MaxVF=32, we still won't vectorize the loop. I'm working on the fix.