This is an archive of the discontinued LLVM Phabricator instance.

[WIP] [LAA] Allow runtime checks when strides different but address space does not wrap around
AbandonedPublic

Authored by anna on Aug 13 2018, 10:00 AM.

Download Raw Diff

Details

Reviewers

sbaranga
anemet

Summary

Currently we allow runtime checks when the distance between source and sink is
not a constant in the loop, but the strides for two memory accesses are the same
(and are addrecs).

This change handles the reverse case: the distance between the source and the
sink is a constant in the loop, but the strides for the memory accesses are not
the same.

I have not added any test cases yet. There may even be changes needed to the
runtime check generation to teach this case. However, this first patch is to
see if I missed some obvious legality constraint.

Given the strides are addrecs, we are guaranteed they do not wrap around, so
wide loads in vectorization is still legal.

If we have 2 memory accesses: load of array A and store into arrayB, where
distance = sink - source is a loop invariant computation.
Stride of ArrayA = z. Stride of ArrayB = y.

The runtime check needed is ArrayA + (z * trip count) - { ArrayB + (y * trip
count) } > Vectorization factor.

Diff Detail

Repository

rL LLVM

Build Status

Buildable 21393
Build 21393: arc lint + arc unit

Event Timeline

anna created this revision.Aug 13 2018, 10:00 AM

Harbormaster completed remote builds in B21393: Diff 160381.Aug 13 2018, 10:02 AM

anna retitled this revision from [LAA] Allow runtime checks when strides different but address space does not wrap around to [WIP] [LAA] Allow runtime checks when strides different but address space does not wrap around.Aug 13 2018, 11:11 AM

ping

Hi Anna,

If the distance between the source and the sink is loop invariant, how can these have different strides (unless they are also loop invariant)? Could you give an example?

Thanks,
Silviu

lib/Analysis/LoopAccessAnalysis.cpp
1473	Not sure that we can rely on pointer arithmetic not wrapping, but getPtrStride should already version the loop to make sure that pointer wrapping doesn't happen.

In D50644#1204214, @sbaranga wrote:

Hi Anna,

If the distance between the source and the sink is loop invariant, how can these have different strides (unless they are also loop invariant)? Could you give an example?

Thanks,
Silviu

Hi Silviu,
I could not get a test case for this specific patch, but let me start with the original motivation.

I will reduce the test case. The debug-info shows something like this:

Pointer access with non-constant stride
LAA: Bad stride - Not an AddRecExpr pointer   %tmp97 = getelementptr inbounds i8, i8 addrspace(1)* %tmp89, i64 62 SCEV: (62 + %tmp89)<nsw>
LAA: Bad stride - Not an AddRecExpr pointer   %tmp98 = getelementptr inbounds i8, i8 addrspace(1)* %tmp89, i64 66 SCEV: (66 + %tmp89)<nsw>
LAA: Src Scev: (62 + %tmp89)<nsw>Sink Scev: (66 + %tmp89)<nsw>(Induction step: 0)
LAA: Distance for   store i8 1, i8 addrspace(1)* %tmp97, align 2, !tbaa !85, !azul-base-pointer-java-type !33 to   store i8 0, i8 addrspace(1)* %tmp98, align 1, !tbaa !86, !azul-base-pointer-java-type !33: 4

%tmp89 is a header phi which is loop varying.
The strides were not add rec pointers, but the distance between source and sink scev is constant.

I initially had a patch like this which allowed vectorizing the loop in question with runtime checks:

const SCEVConstant *C = dyn_cast<SCEVConstant>(Dist);
if (!StrideAPtr || !StrideBPtr || StrideAPtr != StrideBPtr){
   if (C || PSE.getSE()->isLoopInvariant(Dist, InnermostLoop))
      ShouldRetryWithRuntimeCheck = true;
    return Dependence::Unknown;
}

However, that was failing a test case in pointer-with-unknown-bounds.ll.
; for (i = 0; i < 20; ++i)
; A[i*i] *= 2;

(we were saying it's legal with runtime checks and generating a vector loop - I can see this being a vector gather and scatter, once we prove i*i doesn't overflow).

I don't understand why we should care about the stride being an addrecPtr for the vectorizer though, because the vectorizer has it's own SCEV overflow checks before deciding on vectorizing (and generating a wide load). What am I missing here?

Thanks,
Anna

For pointer-with-unknown-bounds.ll, I think the point was just checking that we don't vectorize in the case where the SCEV expressions for the pointers are not affine.

IIRC with ShouldRetryWithRuntimeCheck set the runtime checks will try to prove for two expressions A and B that the intervals [min(A), max(A)] and [min(B), max(B)] are disjoint. It's somewhat tricky to perform this check with non-affine expressions.

In theory this should be fine as long as we can emit the correct run-time check.

I might be missing something, but in your example, if %tmp89 is loop-varying I'm not sure how we can emit the check at all since we don't have any strides available?

abandoning this change - lost context.

Herald added a project: Restricted Project. · View Herald TranscriptMar 5 2020, 11:33 AM

Revision Contents

Path

Size

lib/

Analysis/

LoopAccessAnalysis.cpp

9 lines

Diff 160381

lib/Analysis/LoopAccessAnalysis.cpp

Show First 20 Lines • Show All 1,457 Lines • ▼ Show 20 Lines	MemoryDepChecker::isDependent(const MemAccessInfo &A, unsigned AIdx,

const SCEV *Dist = PSE.getSE()->getMinusSCEV(Sink, Src);		const SCEV *Dist = PSE.getSE()->getMinusSCEV(Sink, Src);

LLVM_DEBUG(dbgs() << "LAA: Src Scev: " << Src << "Sink Scev: " << Sink		LLVM_DEBUG(dbgs() << "LAA: Src Scev: " << Src << "Sink Scev: " << Sink
<< "(Induction step: " << StrideAPtr << ")\n");		<< "(Induction step: " << StrideAPtr << ")\n");
LLVM_DEBUG(dbgs() << "LAA: Distance for " << *InstMap[AIdx] << " to "		LLVM_DEBUG(dbgs() << "LAA: Distance for " << *InstMap[AIdx] << " to "
<< InstMap[BIdx] << ": " << Dist << "\n");		<< InstMap[BIdx] << ": " << Dist << "\n");

		const SCEVConstant *C = dyn_cast<SCEVConstant>(Dist);
// Need accesses with constant stride. We don't want to vectorize		// Need accesses with constant stride. We don't want to vectorize
// "A[B[i]] += ..." and similar code or pointer arithmetic that could wrap in		// "A[B[i]] += ..." and similar code or pointer arithmetic that could wrap in
// the address space.		// the address space.
if (!StrideAPtr \|\| !StrideBPtr \|\| StrideAPtr != StrideBPtr){		if (!StrideAPtr \|\| !StrideBPtr \|\| StrideAPtr != StrideBPtr){
LLVM_DEBUG(dbgs() << "Pointer access with non-constant stride\n");		LLVM_DEBUG(dbgs() << "Pointer access with non-constant stride\n");
		// Given StrideAPtr and StrideBPtr exists, we know the pointer arithmetic
		// does not wrap. Even if they are not the same stride, if the distance
		sbarangaUnsubmitted Not Done Reply Inline Actions Not sure that we can rely on pointer arithmetic not wrapping, but getPtrStride should already version the loop to make sure that pointer wrapping doesn't happen. sbaranga: Not sure that we can rely on pointer arithmetic not wrapping, but getPtrStride should already…
		// between source and sink is constant wrt the loop, then let us try with
		// runtime checks.
		if (StrideAPtr && StrideBPtr &&
		(C \|\| PSE.getSE()->isLoopInvariant(Dist, InnermostLoop)))
		ShouldRetryWithRuntimeCheck = true;
return Dependence::Unknown;		return Dependence::Unknown;
}		}

Type *ATy = APtr->getType()->getPointerElementType();		Type *ATy = APtr->getType()->getPointerElementType();
Type *BTy = BPtr->getType()->getPointerElementType();		Type *BTy = BPtr->getType()->getPointerElementType();
auto &DL = InnermostLoop->getHeader()->getModule()->getDataLayout();		auto &DL = InnermostLoop->getHeader()->getModule()->getDataLayout();
uint64_t TypeByteSize = DL.getTypeAllocSize(ATy);		uint64_t TypeByteSize = DL.getTypeAllocSize(ATy);
uint64_t Stride = std::abs(StrideAPtr);		uint64_t Stride = std::abs(StrideAPtr);
const SCEVConstant *C = dyn_cast<SCEVConstant>(Dist);
if (!C) {		if (!C) {
if (TypeByteSize == DL.getTypeAllocSize(BTy) &&		if (TypeByteSize == DL.getTypeAllocSize(BTy) &&
isSafeDependenceDistance(DL, *(PSE.getSE()),		isSafeDependenceDistance(DL, *(PSE.getSE()),
(PSE.getBackedgeTakenCount()), Dist, Stride,		(PSE.getBackedgeTakenCount()), Dist, Stride,
TypeByteSize))		TypeByteSize))
return Dependence::NoDep;		return Dependence::NoDep;

LLVM_DEBUG(dbgs() << "LAA: Dependence because of non-constant distance\n");		LLVM_DEBUG(dbgs() << "LAA: Dependence because of non-constant distance\n");
▲ Show 20 Lines • Show All 890 Lines • Show Last 20 Lines