This is an archive of the discontinued LLVM Phabricator instance.

[LoopVectorize] Add overflow checks when tail-folding with scalable vectors
ClosedPublic

Authored by david-arm on May 9 2022, 8:04 AM.

Details

Summary

In InnerLoopVectorizer::getOrCreateVectorTripCount there is an
assert that the known minimum value for the VF is a power of 2
when tail-folding is enabled. However, for scalable vectors the
value of vscale may not be a power of 2, which means we have
to worry about the possibility of overflow. I have solved this
problem by adding preheader checks that prevent us from entering
the vector body if the canonical IV would overflow, i.e.

if ((IntMax - TripCount) < (VF * UF)) ... skip vector loop ...

Diff Detail

Event Timeline

david-arm created this revision.May 9 2022, 8:04 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 9 2022, 8:04 AM
david-arm requested review of this revision.May 9 2022, 8:04 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 9 2022, 8:04 AM
sdesmalen added inline comments.May 12 2022, 6:59 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2990

The interface of getBitMask returns a uint64_t, which means this functionality is limited to loops with i64 induction variables. I don't know whether this is a limitation that's already assumed elsewhere, but perhaps you can use getMask() instead (which returns an APInt).

paulwalker-arm added inline comments.May 12 2022, 8:14 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2981–2983

Can you remove this? The following paragraph pretty much implies the same and I don't see such a langref change ever happening.

2984

Perhaps "VScale is not necessarily a power-of-2, which means we cannot guarantee an overflow to zero when updating induction variables and so an additional overflow check is required before entering the vector loop."?

2993

Not sure if this is better so feel free to ignore but what about doing n + (VF * UF) < n? to remove the need for the mask.

david-arm updated this revision to Diff 429165.May 13 2022, 1:35 AM
  • Addressed review comments!
david-arm marked 4 inline comments as done.May 13 2022, 1:36 AM
david-arm added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2993

Strangely enough, after the DCE and InstCombine passes the LLVM IR ends up being identical! So I just left it in the original form.

paulwalker-arm accepted this revision.May 13 2022, 3:00 AM
paulwalker-arm added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2987

There are now three uses or this so perhaps worth breaking out?

This revision is now accepted and ready to land.May 13 2022, 3:00 AM
sdesmalen added inline comments.May 13 2022, 3:04 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2986

nit: MaxUIntTripCount ?

2988

If you change the name of 'LHS' above, then you could propagate this into the ICmp expression below

sdesmalen accepted this revision.May 13 2022, 3:05 AM
This revision was automatically updated to reflect the committed changes.
david-arm marked an inline comment as done.