This is an archive of the discontinued LLVM Phabricator instance.

LoopVectorize: handle casted indvars in iv-select-cmp
Abandoned, Public

Authored by artagnon on Aug 14 2023, 6:06 AM.

Details

Summary

As a follow-up to D150851, handle casted indvars in cases where a
runtime-check isn't necessary, hence vectorizing:

int test(int *a, int n) {
  int rdx = 331;
  for (int i = 0; i < n; i++) {
    if (a[i] > 3)
      rdx = i;
  }
  return rdx;
}
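To make the reduction concrete, here is the same loop as a standalone sketch. The name `select_cmp_rdx` is hypothetical and used only to keep the example self-contained; the body matches the function above. It returns the last index whose element is greater than 3, or the start value 331 if no element qualifies.

```c
/* Same loop as in the summary, under a hypothetical name.
   Returns the last index i with a[i] > 3, or 331 if there is none. */
int select_cmp_rdx(const int *a, int n) {
  int rdx = 331;               /* start value of the reduction */
  for (int i = 0; i < n; i++) {
    if (a[i] > 3)
      rdx = i;                 /* select between the indvar and the old rdx */
  }
  return rdx;
}
```

For example, on the array {1, 5, 2, 7} the elements at indices 1 and 3 exceed 3, so the result is 3; on {1, 2, 3} nothing exceeds 3, so the result is 331.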

D150851 looks for the nsw flag on the increment of the indvar, and
concludes that the indvar can't wrap, and hence can't hit the sentinel
value:

%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1

The issue with vectorizing the above example is that IndVarSimplify
comes along and truncates the indvar as shown below:

%1 = trunc i64 %indvars.iv to i32
%spec.select = select i1 %cmp1, i32 %1, i32 %rdx.06

Now, the loop bound is still an i64, so the truncated indvar could, in
principle, overflow and hit the sentinel value:

%exitcond.not = icmp eq i64 %indvars.iv.next, %wide.trip.count
br i1 %exitcond.not, label %for.cond.cleanup, label %for.body

However, the bound used in the loop's exit condition, %wide.trip.count,
has been widened from an i32 %n by IndVarSimplify:

%wide.trip.count = zext i32 %n to i64
br label %for.body

This tells us that %n was originally an i32, but not whether it should
be treated as signed. The loop guard settles this: it compares %n as a
signed value, and the loop is entered only when %n is positive:

%cmp5 = icmp sgt i32 %n, 0
br i1 %cmp5, label %for.body.preheader, label %for.cond.cleanup

This patch pattern-matches a cast in the select, and an icmp in the loop
guard, which could perhaps be introduced by IndVarSimplify, and
determines when a truncated indvar can't overflow.

Diff Detail