This is an archive of the discontinued LLVM Phabricator instance.

[LV] Mark increment of main vector loop induction variable as NUW.
ClosedPublic

Authored by fhahn on May 27 2021, 7:56 AM.

Details

Summary

This patch marks the induction increment of the main induction variable
of the vector loop as NUW when not folding the tail.

If the tail is not folded, we know that End - Start >= Step (either
statically or through the minimum iteration checks). We also know that both
Start % Step == 0 and End % Step == 0. We exit the vector loop if %IV +
%Step == %End. Hence we must exit the loop before %IV + %Step unsigned
overflows and we can mark the induction increment as NUW.

This should make SCEV return more precise bounds for the created vector
loops, used by later optimizations, like late unrolling.

At the moment quite a few tests still need to be updated, but before
doing so I'd like to get initial feedback to make sure I am not missing
anything.

Note that this could probably be further improved by using information
from the original IV.

Attempt of modeling of the assumption in Alive2:
https://alive2.llvm.org/ce/z/H_DL_g

Part of a set of fixes required for PR50412.

Diff Detail

Event Timeline

fhahn created this revision.May 27 2021, 7:56 AM
fhahn requested review of this revision.May 27 2021, 7:56 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 27 2021, 7:56 AM
mkazantsev added inline comments.May 31 2021, 2:28 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3085

Just to clarify, what exactly is the semantics of End here? Is it last used value of IV or the last computed value (that may be not used)?

fhahn added inline comments.May 31 2021, 4:31 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3085

End is the trip-count of the original loop, clamped down to the closest value that is divisible by Step. It's compared against against the last computed value of the IV (Next) to determine whether to exit the loop or now (compare at line 3096)

fhahn edited the summary of this revision. (Show Details)May 31 2021, 12:01 PM
mkazantsev accepted this revision.Jun 1 2021, 8:31 PM

Looks legit, thanks!

This revision is now accepted and ready to land.Jun 1 2021, 8:31 PM
reames added a comment.Jun 3 2021, 9:18 AM

LGTM as well. I'm not super familiar with this code, but I skimmed the callers and the necessary conditions seemed to be locally implied.

As an aside, do you have a test case where SCEV can't figure this out? I've got D103118 out for review now, and am interested in other such cases.

This revision was landed with ongoing or failed builds.Jun 7 2021, 2:50 AM
This revision was automatically updated to reflect the committed changes.
fhahn added a comment.Jun 7 2021, 2:58 PM

LGTM as well. I'm not super familiar with this code, but I skimmed the callers and the necessary conditions seemed to be locally implied.

As an aside, do you have a test case where SCEV can't figure this out? I've got D103118 out for review now, and am interested in other such cases.

The motivating case is something like the function below, where nuw improves the results. There are a few other improvements I am planning as follow-up.

define i32 @f(i32 %n) {
entry:
  %cmp5 = icmp ult i32 %n, 13
  tail call void @llvm.assume(i1 %cmp5)
  %cmp16 = icmp ult i32 %n, 1
  br i1 %cmp16, label %middle.block, label %if.end.preheader

if.end.preheader:                                 ; preds = %entry
  %min.iters.check = icmp ult i32 %n, 8
  br i1 %min.iters.check, label %middle.block, label %vector.ph

vector.ph:                                        ; preds = %if.end.preheader
  %n.mod.vf = urem i32 %n, 8
  %n.vec = sub i32 %n, %n.mod.vf
  %ind.end = sub i32 %n, %n.vec
  br label %vector.body

vector.body:                                      ; preds = %vector.body, %vector.ph
  %index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ]
  %index.next = add nuw i32 %index, 8
  %ec = icmp eq i32 %index.next, %n.vec
  br i1 %ec, label %middle.block, label %vector.body

middle.block:
  ret i32 10
}

; Function Attrs: inaccessiblememonly nofree nosync nounwind willreturn mustprogress
declare void @llvm.assume(i1 noundef %0) #1
Ayal added inline comments.Jun 9 2021, 3:03 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3092

Wrapping of %IV + %Step is related to wrapping of %End = %BackedgeTakenCount + 1, i.e., when %End wraps around to zero. In this case, minimum iteration checks indeed redirects away from the vector loop, unless the tail is folded.
However, it is conceivable to avoid this redirection and vectorize loops whose %End wraps around to zero, i.e., whose %IV + %Step also wraps; see D75746.

An alternative direction to ensure that %IV+%Step of vectorized loops don't wrap, even if their original %End does wrap and also when their tails are folded, is to control the vector loop with a unit Step %IV, whose upper bound is at most half of the original %End. If a vector %IV is also required, one can be constructed in parallel, or computed from %IV * %Step + <0,1,...,%Step-1>; eventually, LSR should know best to optimize them. Sounds reasonable?