This is an archive of the discontinued LLVM Phabricator instance.

[LV] Make sure smallest/widest type sizes are powers-of-2.
ClosedPublic

Authored by fhahn on May 30 2020, 8:16 AM.

Details

Summary

LV currently only supports power of 2 vectorization factors, which has
been made explicit with the assertion added in
840450549c9199150cbdee29acef756c19660ca1.

However, if the widest type is not a power-of-2 the computed maxVF won't
be a power-of-2 either. This patch changes getSmallestAndWidestTypes to
round up to the next power-of-2. This can happen in practice for
x86_fp80, for example. Alternatively we could force the computed max VF
to the next-lowest power-of-2

Fixes PR46139.

Diff Detail

Event Timeline

fhahn created this revision.May 30 2020, 8:16 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 30 2020, 8:17 AM
Ayal added a comment.May 30 2020, 1:56 PM

Alternatively we could force the computed max VF to the next-lowest power-of-2

This alternative may perhaps be better, or another one, as noted inline.

Also added a couple of unrelated comments.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5011

The original motivation to assert that MaxVF is a power of 2, in D80491, was for concluding that any chosen VF will divide a constant trip count.
Perhaps a better fix than D80491 would be (have been ..., as noted in the summary) to round MaxVF down to a power of 2 here, rather than forcing computeFeasibleMaxVF() and/or getSmallestAndWidestTypes() to return powers of 2.

5077

Unrelated: makes sense to limit MaxVF to ConstTripCount also if the latter is not a power of 2, by rounding it down to a power of 2 (or up, with fold-tail?).

5117

Unrelated: getMinimumVF() must also be (asserted to be) a power of 2; moreover, it must not cause MaxVF to exceed MaxSafeRegisterWidth.

5228

The critical part is to make sure VF does not exceed MaxSafeRegisterWidth; this indeed holds when rounding MinWidth and MaxWidth up to powers of 2; might not hold when rounding them down, conceptually.

But perhaps non-power-of-2 sized types should be skipped when computing MinWidth and MaxWidth, related to the last "also" part of the FIXME above? cf. hasIrregularType().

llvm/test/Transforms/LoopVectorize/X86/fp80-widest-type.ll
10

Worth indicating that this fixes PR46139; e.g., in a comment, test name, file name.

bjope added a subscriber: bjope.Jun 1 2020, 8:27 AM
bjope removed a subscriber: bjope.Jun 1 2020, 10:39 AM
bjope added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5011

Rounding down MaxVF returned by computeFeasibleMaxVF (maybe at the end of that function rather than here) would probably work better for our OOT target (with 160-bit register and i40 types, so neither the Smallest/WidestType nor the TTI.getRegisterBitWidth(true) returns a power-of-2 value in that situation).

5228

I did play around with ignoring "irregular" types here a few days ago. In most situations I think it is ok. But I also noticed that it is possible to for example get <4 x i15> in the vector body, with a loop doing store i15 7, i15* ptr, align 2.

So ignoring the type when determining Smallest/Widest type does not completely prevent us from creating vectors with irregular types (as it seems). And that vector could be larger than the max register width. Although, considering that the resulting <4 x i15> vector store in my test will use a packed layout, that case actually looks like a bug (I'll probably end up writing a PR about it).

D80491 has been breaking our ToT builds for a week now. Can this be submitted soon. If not please consider reverting D80491.

fhahn updated this revision to Diff 267703.Jun 1 2020, 12:42 PM

Updated to round down in computeFeasibleMaxVF.

fhahn marked an inline comment as done.Jun 1 2020, 12:50 PM
fhahn added a subscriber: bjope.

Also added a couple of unrelated comments.

I'll take a look at those in a bit, when I have more time.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5011

Perhaps a better fix than D80491 would be (have been ..., as noted in the summary) to round MaxVF down to a power of 2 here, rather than forcing computeFeasibleMaxVF() and/or getSmallestAndWidestTypes() to return powers of 2.

computeFeasibleMaxVF already has a comment indicating that the MaxVF should be a power-of-2 I think. I updated the patch to move the rounding down to computeFeasibleMaxVF, as other code might also implicitly rely on MaxVF being a power-of-2.

It might be good to first ensure MaxVF to be a power of 2 in computeFeasibleMaxVF in the short term and then check if the other uses work as expected with non-power-of-2's subsequently, if required/desired.

I think on most in-tree targets it is unlikely to have much impact, but it would be interesting to hear if the out-of-tree target @bjope mentioned actually supports non-power-of-2 vectorization factors (e.g. vector add of 5 x i40)?

Ayal added a comment.Jun 1 2020, 1:25 PM

D80491 has been breaking our ToT builds for a week now. Can this be submitted soon. If not please consider reverting D80491.

Sorry about this. But note that reverting D80491 will only remove an assert, leading to potentially wrong code being generated silently.
Perhaps additional lit tests could be devised to help exercise failing behaviors(?)

Updated to round down in computeFeasibleMaxVF.

Update looks good to me, thanks! Added a few minor comments.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5061

Suggest to add to the comment that type sizes may also not be powers of 2.

5061–5062

Rounding WidestRegister down to a power of 2 is no longer needed; rounding-down MaxVectorSize alone suffices to ensure the method returns a power of 2.

5121

This should be redundant; i.e., at this point MaxVF can be asserted to be a power of 2. Only way it might not be is because of getMinimumVF(), which, being a minimum, should probably not be rounded down silently (see above comment).

llvm/test/Transforms/LoopVectorize/X86/fp80-widest-type.ll
8

Comment should be updated. E.g., ";Make sure non-power-of-2 types are handled correctly, i.e., MaxVF is still a power-of-2."

fhahn updated this revision to Diff 267724.Jun 1 2020, 1:53 PM

Adjust comments, drop unnecessary rounding.

fhahn marked 6 inline comments as done.Jun 1 2020, 1:54 PM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5061–5062

Right, the only other use to compute the loop bound below should be fine.

5121

This should be redundant; i.e., at this point MaxVF can be asserted to be a power of 2. Only way it might not be is because of getMinimumVF(), which, being a minimum, should probably not be rounded down silently (see above comment).

Right. Otherwise I am sure some test case will surface, which can then be added :)

Ayal accepted this revision.Jun 1 2020, 2:21 PM

Thanks!

This revision is now accepted and ready to land.Jun 1 2020, 2:21 PM
This revision was automatically updated to reflect the committed changes.
fhahn marked 2 inline comments as done.
bjope added inline comments.Jun 3 2020, 6:13 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5011

The VF is power-of-2 for the targets we support (but we got some registers and element sizes that aren't power-of-2).

And this patch (as it landed) seem to solve the problems we had with D80491. Thanks!