This is an archive of the discontinued LLVM Phabricator instance.

Differential D132703

[LAA] Use BTC to rule out dependences if one ptr is loop invariant.
Needs Review · Public

Authored by fhahn on Aug 25 2022, 1:39 PM.

Details

Reviewers

Ayal
anemet
Meinersbur

Summary

isSafeDependenceDistance can also be used to rule out dependences
between loop-invariant and loop-variant accesses if we can prove the
distance between them is larger than the range the accesses will
travel through the execution of the loop.

This should help to avoid regressions after D126533.
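
To make the condition in the summary concrete, here is a minimal sketch of the inequality on plain integers. It is a simplified model under stated assumptions, not LLVM's implementation: the real isSafeDependenceDistance works on SCEV expressions and has to prove the comparison symbolically. The name `distanceExceedsTravel` and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Simplified model (an assumption, not LLVM's code): a strided access moves
// strideElems * typeByteSize bytes per iteration, so across
// backedgeTakenCount backedges its start address travels
// backedgeTakenCount * strideElems * typeByteSize bytes. The function reports
// whether the byte distance between the two start addresses exceeds that
// travel in absolute value. Arithmetic overflow is ignored in this sketch.
bool distanceExceedsTravel(int64_t distBytes, uint64_t backedgeTakenCount,
                           uint64_t strideElems, uint64_t typeByteSize) {
  assert(strideElems > 0 && typeByteSize > 0 && "expects a known, non-zero step");
  uint64_t travelBytes = backedgeTakenCount * strideElems * typeByteSize;
  uint64_t absDist = distBytes < 0 ? 0 - static_cast<uint64_t>(distBytes)
                                   : static_cast<uint64_t>(distBytes);
  return absDist > travelBytes;
}
```

For illustration only: with an invariant access 200 `i32` elements (800 bytes) away from a unit-stride `i32` access and an assumed backedge-taken count of 100, `distanceExceedsTravel(800, 100, 1, 4)` compares 800 against 400 and returns true.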

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

Time	Test
60,060 ms	x64 debian > MLIR.Examples/standalone::test.toy
60,050 ms	x64 debian > libFuzzer.libFuzzer::fuzzer-leak.test

Event Timeline

fhahn created this revision. · Aug 25 2022, 1:39 PM

Herald added a project: Restricted Project. · Aug 25 2022, 1:39 PM

Herald added subscribers: frasercrmck, luismarques, apazos and 19 others.

fhahn requested review of this revision. · Aug 25 2022, 1:39 PM

Herald added a project: Restricted Project. · Aug 25 2022, 1:39 PM

Herald added subscribers: pcwang-thead, MaskRay.

Harbormaster completed remote builds in B183467: Diff 455705. · Aug 25 2022, 1:39 PM

fhahn mentioned this in D131924: [LAA] Prune dependencies with distance large than access implied by trip count. · Aug 25 2022, 1:58 PM

reames added a subscriber: reames. · Aug 25 2022, 2:45 PM

reames added inline comments.

llvm/lib/Analysis/LoopAccessAnalysis.cpp
1873	Not sure if this is a generalization, or a possible bug, but the fact you're dependent on the loop invariant bit here looks off.

My reasoning is as follows: if we know the stride of either access and the trip count, we can describe the region which that pointer can access. It shouldn't matter whether the other region is fixed size (i.e. pointer is loop invariant), or variably sized if we can prove that one region is before or after another.

Unless maybe we're trying to reason about the case where we don't know the sign of the distance between the start of the two regions? Knowing that either operand is loop invariant does tell you that region is small. I could see how that might be useful when we can prove distance is far from zero (but potentially negative). However, in that case, I think there's a missing check to prove that the invariant region is smaller than abs(distance). That is, that the access type is smaller than distance. If it's not, we could have one varying region which starts with one byte offset into the loop invariant region.

Honestly, I think this code is made far more confusing by the fact that 0 is used as an error value for stride. 0 is a valid stride; we really should be using Optional here instead.
1896	Please drop this below the constant check, and update the comment to note that the following transforms need matching strides. As written, it's unclear why the prior code is correct - if it is.
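
The point about 0 doubling as an error value can be illustrated with a small sketch. It uses `std::optional` from C++17 so that it is self-contained (the comment refers to LLVM's `Optional`); `strideOrNone`, `StrideKind` and `classify` are hypothetical names, not part of LoopAccessAnalysis.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical stand-in for a stride query. knownConstant models whether the
// analysis could determine a constant stride at all.
std::optional<int64_t> strideOrNone(bool knownConstant, int64_t stride) {
  if (!knownConstant)
    return std::nullopt; // "could not compute", distinct from a stride of 0
  return stride;         // may legitimately be 0 (the pointer does not move)
}

// With an optional, callers can tell the three situations apart explicitly
// instead of overloading the value 0.
enum class StrideKind { Unknown, Invariant, Strided };

StrideKind classify(const std::optional<int64_t> &stride) {
  if (!stride)
    return StrideKind::Unknown;
  return *stride == 0 ? StrideKind::Invariant : StrideKind::Strided;
}
```

Under this modeling, "no constant stride" and "stride of zero" no longer share a representation, so a check such as `!StrideAPtr` would not conflate them.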

fhahn mentioned this in D126533: [LAA] Relax pointer dependency with runtime pointer checks. · Sep 5 2022, 12:20 AM

dtemirbulatov added a subscriber: dtemirbulatov. · Sep 5 2022, 2:45 AM

peterwaller-arm added a subscriber: peterwaller-arm. · Sep 8 2022, 2:20 AM

fhahn added inline comments. · Sep 9 2022, 10:17 AM

llvm/lib/Analysis/LoopAccessAnalysis.cpp
1873	Thanks, I don't think loop invariance is strictly necessary for correctness, just to limit the initial scope of the implementation. Let me look into writing up some test cases and see if it can be easily extended in this patch.

It is a bit irritating that the current logic does not consider which direction the pointers are traveling. E.g. Src = &D[0] and Sink = &D[1 + i] never overlap (with non-negative i), but the code seems to consider Src before or after Sink symmetrically. But that's what the current code does. Changing that, as @reames suggested, by computing memory regions relative to the base pointer might require some more effort, but this smaller change looks fine to me.

If Src and Sink don't use the same base pointer then getMinusSCEV(Sink, Src) will not return anything usable anyway.
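
The region-based reasoning discussed in this thread can be sketched on plain byte offsets relative to a common base pointer. This is an illustrative model only, not how LoopAccessAnalysis is implemented; `ByteRegion`, `accessedRegion` and `disjoint` are hypothetical names, and overflow is ignored.

```cpp
#include <cstdint>

// Half-open byte range [begin, end) relative to a common base pointer.
struct ByteRegion {
  int64_t begin;
  int64_t end;
};

// Region touched by an access that starts at startByte, moves stepBytes per
// iteration (may be negative or 0) for backedgeTakenCount backedges, and
// reads or writes accessBytes each time.
ByteRegion accessedRegion(int64_t startByte, int64_t stepBytes,
                          int64_t backedgeTakenCount, int64_t accessBytes) {
  int64_t lastStart = startByte + stepBytes * backedgeTakenCount;
  int64_t lo = startByte < lastStart ? startByte : lastStart;
  int64_t hi = startByte < lastStart ? lastStart : startByte;
  return {lo, hi + accessBytes};
}

// Two accesses cannot alias if one region ends before the other begins.
bool disjoint(const ByteRegion &a, const ByteRegion &b) {
  return a.end <= b.begin || b.end <= a.begin;
}
```

With 4-byte elements and an assumed backedge-taken count of 99, `&D[0]` covers bytes [0, 4) and `&D[1 + i]` covers [4, 404), so `disjoint` holds because the direction of travel is taken into account. If the varying access instead starts one byte into the invariant one, as in `accessedRegion(1, 4, 99, 4)`, the regions overlap, which is the access-size concern raised above.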

igor.kirillov added a subscriber: igor.kirillov. · Nov 14 2022, 1:43 PM

Revision Contents

Path	Size
llvm/lib/Analysis/LoopAccessAnalysis.cpp	41 lines
llvm/test/Analysis/LoopAccessAnalysis/loop-invariant-dep-with-backedge-taken-count.ll	7 lines
llvm/test/Transforms/LoopVectorize/RISCV/safe-dep-distance.ll	34 lines

Diff 455705

llvm/lib/Analysis/LoopAccessAnalysis.cpp

@@ ... @@ MemoryDepChecker::isDependent(const MemAccessInfo &A, unsigned AIdx,
   if (!AIsWrite && !BIsWrite)
     return Dependence::NoDep;
 
   // We cannot check pointers in different address spaces.
   if (APtr->getType()->getPointerAddressSpace() !=
       BPtr->getType()->getPointerAddressSpace())
     return Dependence::Unknown;
 
+  if (isa<ScalableVectorType>(ATy) || isa<ScalableVectorType>(BTy))
+    return Dependence::Unknown;
+
   int64_t StrideAPtr =
       getPtrStride(PSE, ATy, APtr, InnermostLoop, Strides, true);
   int64_t StrideBPtr =
       getPtrStride(PSE, BTy, BPtr, InnermostLoop, Strides, true);
 
   const SCEV *Src = PSE.getSCEV(APtr);
   const SCEV *Sink = PSE.getSCEV(BPtr);
 
@@ ... @@ MemoryDepChecker::isDependent(const MemAccessInfo &A, unsigned AIdx,
   ScalarEvolution &SE = *PSE.getSE();
   const SCEV *Dist = SE.getMinusSCEV(Sink, Src);
 
   LLVM_DEBUG(dbgs() << "LAA: Src Scev: " << *Src << "Sink Scev: " << *Sink
                     << "(Induction step: " << StrideAPtr << ")\n");
   LLVM_DEBUG(dbgs() << "LAA: Distance for " << *InstMap[AIdx] << " to "
                     << *InstMap[BIdx] << ": " << *Dist << "\n");
 
-  // Need accesses with constant stride. We don't want to vectorize
-  // "A[B[i]] += ..." and similar code or pointer arithmetic that could wrap in
-  // the address space.
-  if (!StrideAPtr || !StrideBPtr || StrideAPtr != StrideBPtr){
+  // Compute the stride to use. If one of the accesses is loop-invariant, use
+  // the stride of the AddRec. Those cases are only used to try to rule out
+  // dependences if we can prove the distance between the accesses is larger
+  // than the range the accesses will travel through the execution of the loop.
+  uint64_t Stride = -1;
+  if (StrideAPtr && StrideBPtr && StrideAPtr == StrideBPtr) {
+    Stride = std::abs(StrideAPtr);
+  } else if (StrideAPtr && SE.isLoopInvariant(Sink, InnermostLoop)) {
+    Stride = std::abs(StrideAPtr);
+  } else if (StrideBPtr && SE.isLoopInvariant(Src, InnermostLoop)) {
+    Stride = std::abs(StrideBPtr);
+  } else {
+    // Need at least one access with constant stride. We don't want to vectorize
+    // "A[B[i]] += ..." and similar code or pointer arithmetic that could wrap
+    // in the address space.
     LLVM_DEBUG(dbgs() << "Pointer access with non-constant stride\n");
     return Dependence::Unknown;
   }
 
   auto &DL = InnermostLoop->getHeader()->getModule()->getDataLayout();
   uint64_t TypeByteSize = DL.getTypeAllocSize(ATy);
   bool HasSameSize =
       DL.getTypeStoreSizeInBits(ATy) == DL.getTypeStoreSizeInBits(BTy);
-  uint64_t Stride = std::abs(StrideAPtr);
-  const SCEVConstant *C = dyn_cast<SCEVConstant>(Dist);
-  if (!C) {
   if (!isa<SCEVCouldNotCompute>(Dist) && HasSameSize &&
       isSafeDependenceDistance(DL, SE, *(PSE.getBackedgeTakenCount()), *Dist,
                                Stride, TypeByteSize))
     return Dependence::NoDep;
 
+  // Need accesses with constant stride. We don't want to vectorize
+  // "A[B[i]] += ..." and similar code or pointer arithmetic that could wrap in
+  // the address space.
+  if (!StrideAPtr || !StrideBPtr || StrideAPtr != StrideBPtr) {
+    LLVM_DEBUG(dbgs() << "Pointer access with non-constant stride\n");
+    return Dependence::Unknown;
+  }
+
+  const SCEVConstant *C = dyn_cast<SCEVConstant>(Dist);
+  if (!C) {
     LLVM_DEBUG(dbgs() << "LAA: Dependence because of non-constant distance\n");
     FoundNonConstantDistanceDependence = true;
     return Dependence::Unknown;
   }
 
   const APInt &Val = C->getAPInt();
   int64_t Distance = Val.getSExtValue();
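
The stride selection introduced by this diff can be modeled on plain values as a rough sketch. It mirrors the patch in treating a stride of 0 as "no constant stride known"; the two booleans stand in for `SE.isLoopInvariant(Src, InnermostLoop)` and `SE.isLoopInvariant(Sink, InnermostLoop)`. The function name `pickStride` and the use of `std::optional` for the bail-out case are assumptions made for illustration.

```cpp
#include <cstdint>
#include <cstdlib>
#include <optional>

// Returns the absolute stride the distance check would use, or nullopt where
// the patch bails out with Dependence::Unknown.
std::optional<uint64_t> pickStride(int64_t strideA, int64_t strideB,
                                   bool srcInvariant, bool sinkInvariant) {
  // Both accesses have the same known stride.
  if (strideA && strideB && strideA == strideB)
    return static_cast<uint64_t>(std::abs(strideA));
  // A (Src) is strided and B (Sink) does not move inside the loop.
  if (strideA && sinkInvariant)
    return static_cast<uint64_t>(std::abs(strideA));
  // B (Sink) is strided and A (Src) does not move inside the loop.
  if (strideB && srcInvariant)
    return static_cast<uint64_t>(std::abs(strideB));
  return std::nullopt;
}
```

In this model, `pickStride(-2, 0, false, true)` yields 2, while mismatched strides with no invariant side, such as `pickStride(1, 2, false, false)`, yield no stride.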

llvm/test/Analysis/LoopAccessAnalysis/loop-invariant-dep-with-backedge-taken-count.ll

@@ ... @@
 exit:
 ret void
 }
 
 define void @test_distance_much_greater_than_BTC_100(ptr %a) {
 ; CHECK-LABEL: Loop access info in function 'test_distance_much_greater_than_BTC_100':
 ; CHECK-NEXT: loop:
-; CHECK-NEXT: Report: unsafe dependent memory operations in loop.
-; CHECK-NEXT: Unknown data dependence.
+; CHECK-NEXT: Memory dependences are safe
 ; CHECK-NEXT: Dependences:
-; CHECK-NEXT: Unknown:
-; CHECK-NEXT: %l = load i32, ptr %gep.x, align 4 ->
-; CHECK-NEXT: store i32 %l, ptr %gep, align 4
-; CHECK-EMPTY:
 ; CHECK-NEXT: Run-time memory checks:
 ; CHECK-NEXT: Grouped accesses:
 ; CHECK-EMPTY:
 ;
 entry:
 %gep.x = getelementptr i32, ptr %a, i32 200
 br label %loop

llvm/test/Transforms/LoopVectorize/RISCV/safe-dep-distance.ll

@@ ... @@
 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
 ; RUN: opt < %s -loop-vectorize -scalable-vectorization=on -riscv-v-vector-bits-min=-1 -mtriple riscv64-linux-gnu -mattr=+v,+f -S 2>%t | FileCheck %s
 
 target datalayout = "e-m:e-p:64:64-i64:64-i128:128-n64-S128"
 target triple = "riscv64"
 
 ; Dependence distance between read and write is greater than the trip
 ; count of the loop. Thus, values written are never read for any
 ; valid vectorization of the loop.
 define void @test(ptr %p) {
 ; CHECK-LABEL: @test(
 ; CHECK-NEXT: entry:
-; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
+; CHECK-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 200, [[TMP0]]
+; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
 ; CHECK: vector.ph:
+; CHECK-NEXT: [[TMP1:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 200, [[TMP1]]
+; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 200, [[N_MOD_VF]]
 ; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
 ; CHECK: vector.body:
 ; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
-; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
-; CHECK-NEXT: [[TMP1:%.*]] = getelementptr i64, ptr [[P:%.*]], i64 [[TMP0]]
-; CHECK-NEXT: [[TMP2:%.*]] = getelementptr i64, ptr [[TMP1]], i32 0
-; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <2 x i64>, ptr [[TMP2]], align 32
-; CHECK-NEXT: [[TMP3:%.*]] = add i64 [[TMP0]], 200
-; CHECK-NEXT: [[TMP4:%.*]] = getelementptr i64, ptr [[P]], i64 [[TMP3]]
-; CHECK-NEXT: [[TMP5:%.*]] = getelementptr i64, ptr [[TMP4]], i32 0
-; CHECK-NEXT: store <2 x i64> [[WIDE_LOAD]], ptr [[TMP5]], align 32
-; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
-; CHECK-NEXT: [[TMP6:%.*]] = icmp eq i64 [[INDEX_NEXT]], 200
-; CHECK-NEXT: br i1 [[TMP6]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
+; CHECK-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 0
+; CHECK-NEXT: [[TMP3:%.*]] = getelementptr i64, ptr [[P:%.*]], i64 [[TMP2]]
+; CHECK-NEXT: [[TMP4:%.*]] = getelementptr i64, ptr [[TMP3]], i32 0
+; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <vscale x 1 x i64>, ptr [[TMP4]], align 32
+; CHECK-NEXT: [[TMP5:%.*]] = add i64 [[TMP2]], 200
+; CHECK-NEXT: [[TMP6:%.*]] = getelementptr i64, ptr [[P]], i64 [[TMP5]]
+; CHECK-NEXT: [[TMP7:%.*]] = getelementptr i64, ptr [[TMP6]], i32 0
+; CHECK-NEXT: store <vscale x 1 x i64> [[WIDE_LOAD]], ptr [[TMP7]], align 32
+; CHECK-NEXT: [[TMP8:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP8]]
+; CHECK-NEXT: [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
+; CHECK-NEXT: br i1 [[TMP9]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
 ; CHECK: middle.block:
-; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 200, 200
+; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 200, [[N_VEC]]
 ; CHECK-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
 ; CHECK: scalar.ph:
-; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 200, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
+; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
 ; CHECK-NEXT: br label [[LOOP:%.*]]
 ; CHECK: loop:
 ; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
 ; CHECK-NEXT: [[A1:%.*]] = getelementptr i64, ptr [[P]], i64 [[IV]]
 ; CHECK-NEXT: [[V:%.*]] = load i64, ptr [[A1]], align 32
 ; CHECK-NEXT: [[OFFSET:%.*]] = add i64 [[IV]], 200
 ; CHECK-NEXT: [[A2:%.*]] = getelementptr i64, ptr [[P]], i64 [[OFFSET]]
 ; CHECK-NEXT: store i64 [[V]], ptr [[A2]], align 32
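
The comment at the top of this test says that values written are never read for any valid vectorization. A small C++ model can show why, assuming the loop is `p[i + 200] = p[i]` for 200 iterations, as the scalar `loop:` block and the constant 200 in the checks suggest. `copyScalar` and `copyChunked` are hypothetical helpers; the chunked version imitates a vector body by doing all loads of a chunk before its stores.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar reference loop. Requires p.size() >= 400.
void copyScalar(std::vector<int64_t> &p) {
  for (std::size_t i = 0; i < 200; ++i)
    p[i + 200] = p[i];
}

// Same loop processed vf elements at a time (vf > 0). Requires
// p.size() >= 400.
void copyChunked(std::vector<int64_t> &p, std::size_t vf) {
  std::size_t i = 0;
  for (; i + vf <= 200; i += vf) {
    const auto first = p.begin() + static_cast<std::ptrdiff_t>(i);
    // "Wide load": read the whole chunk before writing anything.
    std::vector<int64_t> wide(first, first + static_cast<std::ptrdiff_t>(vf));
    // "Wide store": write the chunk 200 elements further on.
    for (std::size_t lane = 0; lane < vf; ++lane)
      p[i + 200 + lane] = wide[lane];
  }
  for (; i < 200; ++i) // scalar remainder
    p[i + 200] = p[i];
}
```

Because every read lands in elements [0, 200) and every write in [200, 400), reordering loads and stores within a chunk cannot change the result, whatever chunk width is chosen.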