This is an archive of the discontinued LLVM Phabricator instance.

[LV] Clamp MaxVF to power of 2
ClosedPublic

Authored by Ayal on May 24 2020, 9:16 AM.

Details

Summary

If a loop has a constant trip count known to be a multiple of MaxVF (times user UF), LV infers that no tail will be generated for any chosen VF. This relies on the chosen VF's being powers of 2 bound by MaxVF, and assumes MaxVF is a power of 2.
Make sure the latter holds, in particular when MaxVF is set by a memory dependence distance which may not be a power of 2.

Diff Detail

Event Timeline

Ayal created this revision.May 24 2020, 9:16 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 24 2020, 9:16 AM
fhahn accepted this revision.May 24 2020, 10:25 AM

LGTM, thanks!

llvm/test/Transforms/LoopVectorize/memdep-fold-tail.ll
108

nit both loop and vectorize.enable metadata not needed?

This revision is now accepted and ready to land.May 24 2020, 10:25 AM
Ayal marked an inline comment as done.May 24 2020, 3:34 PM
Ayal added inline comments.
llvm/test/Transforms/LoopVectorize/memdep-fold-tail.ll
108

they are needed in order to get the loop vectorized, eventually with VF=2 and a folded tail.

This revision was automatically updated to reflect the committed changes.
bjope added a subscriber: bjope.Jun 1 2020, 7:06 AM
bjope added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5056

This has caused some problems for us (downstream) and I'm not really sure how to deal with it.

Maybe this never happens for in-tree targets, but our target will return 160 for PowerOf2Floor. And I haven't really seen anything that says that TTI.getRegisterBitWidth(true) must return a power of 2 number.
Another, perhaps abnormal, thing is that our frontend support i40 types. So SmallestType/WidestType isn't guaranteed to be power-of-2 either.

Even if we actually want to scalarize operations using i40, doing PowerOf2Floor makes WidestRegister=128. And then we won't get a power-of-2 result from this method, since 128/40 isn't a power of 2 (considering that SmallestType/WidestType may end up not being a power-of-2). And then we trigger the new assert in computeMaxVF.

So in some sense this patch limits the types allowed in loops to be power-of-2-sized. And returning something that isn't a power-of-two from TTI.getRegisterBitWidth is now stupid.

Maybe there are other ways to ensure MaxVF is a power-of-2, or were these implications for getSmallestAndWidestTypes and getRegisterBitWidth intentional?

fhahn added inline comments.Jun 1 2020, 7:44 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5056

There's currently a fix being discussed: D80870

bjope added inline comments.Jun 1 2020, 8:19 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5056

Ah, thanks! I had not seen that one.