This is an archive of the discontinued LLVM Phabricator instance.

[WIP] [LAA] Allow runtime checks when strides different but address space does not wrap around
AbandonedPublic

Authored by anna on Aug 13 2018, 10:00 AM.

Details

Reviewers
sbaranga
anemet
Summary

Currently we allow runtime checks when the distance between source and sink is
not a constant in the loop, but the strides for two memory accesses are the same
(and are addrecs).

This change handles the reverse case: the distance between the source and the
sink is a constant in the loop, but the strides for the memory accesses are not
the same.

I have not added any test cases yet. There may even be changes needed to the
runtime check generation to teach this case. However, this first patch is to
see if I missed some obvious legality constraint.

Given the strides are addrecs, we are guaranteed they do not wrap around, so
wide loads in vectorization is still legal.

If we have 2 memory accesses: load of array A and store into arrayB, where
distance = sink - source is a loop invariant computation.
Stride of ArrayA = z. Stride of ArrayB = y.

The runtime check needed is ArrayA + (z * trip count) - { ArrayB + (y * trip
count) } > Vectorization factor.

Diff Detail

Event Timeline

anna created this revision.Aug 13 2018, 10:00 AM
anna retitled this revision from [LAA] Allow runtime checks when strides different but address space does not wrap around to [WIP] [LAA] Allow runtime checks when strides different but address space does not wrap around.Aug 13 2018, 11:11 AM

Hi Anna,

If the distance between the source and the sink is loop invariant, how can these have different strides (unless they are also loop invariant)? Could you give an example?

Thanks,
Silviu

lib/Analysis/LoopAccessAnalysis.cpp
1473

Not sure that we can rely on pointer arithmetic not wrapping, but getPtrStride should already version the loop to make sure that pointer wrapping doesn't happen.

anna added a comment.Aug 17 2018, 10:26 AM

Hi Anna,

If the distance between the source and the sink is loop invariant, how can these have different strides (unless they are also loop invariant)? Could you give an example?

Thanks,
Silviu

Hi Silviu,
I could not get a test case for this specific patch, but let me start with the original motivation.

I will reduce the test case. The debug-info shows something like this:

Pointer access with non-constant stride
LAA: Bad stride - Not an AddRecExpr pointer   %tmp97 = getelementptr inbounds i8, i8 addrspace(1)* %tmp89, i64 62 SCEV: (62 + %tmp89)<nsw>
LAA: Bad stride - Not an AddRecExpr pointer   %tmp98 = getelementptr inbounds i8, i8 addrspace(1)* %tmp89, i64 66 SCEV: (66 + %tmp89)<nsw>
LAA: Src Scev: (62 + %tmp89)<nsw>Sink Scev: (66 + %tmp89)<nsw>(Induction step: 0)
LAA: Distance for   store i8 1, i8 addrspace(1)* %tmp97, align 2, !tbaa !85, !azul-base-pointer-java-type !33 to   store i8 0, i8 addrspace(1)* %tmp98, align 1, !tbaa !86, !azul-base-pointer-java-type !33: 4

%tmp89 is a header phi which is loop varying.
The strides were not add rec pointers, but the distance between source and sink scev is constant.

I initially had a patch like this which allowed vectorizing the loop in question with runtime checks:

const SCEVConstant *C = dyn_cast<SCEVConstant>(Dist);
if (!StrideAPtr || !StrideBPtr || StrideAPtr != StrideBPtr){
   if (C || PSE.getSE()->isLoopInvariant(Dist, InnermostLoop))
      ShouldRetryWithRuntimeCheck = true;
    return Dependence::Unknown;
}

However, that was failing a test case in pointer-with-unknown-bounds.ll.
; for (i = 0; i < 20; ++i)
; A[i*i] *= 2;

(we were saying it's legal with runtime checks and generating a vector loop - I can see this being a vector gather and scatter, once we prove i*i doesn't overflow).

I don't understand why we should care about the stride being an addrecPtr for the vectorizer though, because the vectorizer has it's own SCEV overflow checks before deciding on vectorizing (and generating a wide load). What am I missing here?

Thanks,
Anna

For pointer-with-unknown-bounds.ll, I think the point was just checking that we don't vectorize in the case where the SCEV expressions for the pointers are not affine.

IIRC with ShouldRetryWithRuntimeCheck set the runtime checks will try to prove for two expressions A and B that the intervals [min(A), max(A)] and [min(B), max(B)] are disjoint. It's somewhat tricky to perform this check with non-affine expressions.

In theory this should be fine as long as we can emit the correct run-time check.

I might be missing something, but in your example, if %tmp89 is loop-varying I'm not sure how we can emit the check at all since we don't have any strides available?

anna abandoned this revision.Mar 5 2020, 11:33 AM

abandoning this change - lost context.

Herald added a project: Restricted Project. · View Herald TranscriptMar 5 2020, 11:33 AM