This is an archive of the discontinued LLVM Phabricator instance.

Differential D28044

[LV/LoopAccess] Check statically if an unknown dependence distance can be proven larger than the loop-count
ClosedPublic

Authored by dorit on Dec 21 2016, 11:48 PM.

Details

Reviewers

silviu.baranga
anemet
Ayal
mkuper
hfinkel

Commits

rGeac89d736c85: [LV/LoopAccess] Check statically if an unknown dependence distance can be…
rL294892: [LV/LoopAccess] Check statically if an unknown dependence distance can be

Summary

This fixes PR31098: try to statically resolve data dependences whose compile-time-unknown distance can be proven larger than the loop count, instead of resorting to runtime dependence checks (which are not always possible).
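To make the condition concrete, here is a minimal integer sketch of the inequality the patch tries to prove, |Dist| > BackedgeTakenCount * Step. It is only an illustration: the function name `distanceExceedsLoopRange` and its parameters are hypothetical, and the actual patch reasons symbolically over SCEV expressions rather than concrete values.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical integer model of the check; not the code from the patch.
// dist_bytes:           signed dependence distance in bytes.
// backedge_taken_count: number of times the loop back-edge is taken.
// step_bytes:           absolute stride of the accesses in bytes.
bool distanceExceedsLoopRange(std::int64_t dist_bytes,
                              std::uint64_t backedge_taken_count,
                              std::uint64_t step_bytes) {
  // Assumption of this sketch: inputs are small enough that neither the
  // product nor the negation below can overflow a 64-bit signed integer.
  const std::int64_t limit = std::int64_t{1} << 30;
  assert(dist_bytes > -limit && dist_bytes < limit);
  assert(step_bytes > 0 && step_bytes < (std::uint64_t{1} << 30));
  assert(backedge_taken_count < (std::uint64_t{1} << 30));

  const std::int64_t product =
      static_cast<std::int64_t>(backedge_taken_count * step_bytes);

  // First try: Dist - Product > 0.
  if (dist_bytes - product > 0)
    return true;
  // Second try: -Dist - Product > 0.
  if (-dist_bytes - product > 0)
    return true;
  // Neither side could be shown, so the distance is not known to be safe.
  return false;
}
```

For the loop `for (i = 0; i < D; ++i) out[i+D] = out[i]` over 4-byte elements with D = 8, the distance is 32 bytes, the back-edge is taken 7 times, and the product is 28 bytes, so the sketch reports the distance as safe.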

(A couple of existing test cases assumed that runtime tests would be generated, but with this patch they no longer are, because a dependence with an unknown distance is now resolved statically. I changed the loop count in these tests so that the dependence cannot be resolved statically; this way the tests continue to verify that runtime tests can be created for the accesses at hand.)

Diff Detail

Event Timeline

dorit updated this revision to Diff 82308.Dec 21 2016, 11:48 PM

dorit retitled this revision from to [LV/LoopAccess] Check statically if an unknown dependence distance can be proven larger than the loop-count.

dorit updated this object.

dorit added reviewers: mkuper, silviu.baranga.

dorit added a subscriber: llvm-commits.

Herald added a subscriber: mzolotukhin. · Dec 21 2016, 11:48 PM

dorit added reviewers: Ayal, anemet, hfinkel.Dec 25 2016, 12:16 AM

delena added a subscriber: delena.Dec 25 2016, 11:27 PM

discovered I wasn't careful about not losing bits when matching the types of the MinusExpr operands... Now fixed.

Hi Dorit, sorry for the delay - I don't know SCEV and LAA well enough yet, and I spent some time happily ignoring this, without noticing nobody else reviewed it either. :)

llvm/lib/Analysis/LoopAccessAnalysis.cpp
1351	Can this be an early exit?
1358	Can you use getMaxBackedgeTakenCount()? If the count is precisely known, then they should be equivalent. If it's not, then since by definition MaxBackedgeTakenCount >= BackedgeTakenCount, the equation above also holds. (I'm not sure if we ever vectorize when we don't have a precise getBackedgeTakenCount(), but even if we don't, there is no need for extra constraints here.)
1373	(Sorry for possibly rehashing the discussion you had with Silviu) The code looks fine, but I'm not entirely sure I understand why this is the best way to approach this. I've read through PR31098, and it probably contains the information I'm looking for, but it's spread over separate messages and it's a bit hard for me to follow. Why can't you offload this into SCEV? Is the problem representing |dist| as a SCEV, or that SCEV can't prove |dist| - product is positive? While this may be a good idea independently, wouldn't cases like PR31098, in general be better served by improving alias analysis, and then being able to prove that 8D >= 0?

Thanks Michael!

llvm/lib/Analysis/LoopAccessAnalysis.cpp
1351	yes, I'll change that
1358	Ok; I gave this a try, but getMaxBackedgeTakenCount is giving me -1… maybe I'm using it wrong?: const SCEV *MaxBackedgeTakenCount = PSE.getSE()->getMaxBackedgeTakenCount(InnermostLoop); I don't see any uses of this API in the vectorizer/LoopAccesses; is it supposed to work also when predicates are added to compute the BackedgeCount? (BTW, we indeed never get here if getBackedgeTakenCount is UnknownSCEV; it's a requirement in canAnalyzeLoop.)
1373	1. AFAICS, SCEV can't answer the questions isKnownPositive(dist)/isKnownNonNegative(dist)/… for the expression in question. It looks like SCEV can tell that D = (%size /u 2) is positive, but it can't tell that 8D is positive... (where in fact it is even strictly positive if we know we entered the loop). So I think there's room for improvement in that respect. But in any case, even if/when that is improved, there still may be scenarios in which we can't answer the isKnownPositive question about dist, but we can successfully prove that either (dist - product > 0) or (-dist - product > 0) simply because things get canceled out... (After all, the goal is not to compute |dist|, but to prove the inequality...).
2. By improving alias analysis you refer to my suggestion to be able to anti-alias structure fields? If so then yes, sure, that would help prune irrelevant dependencies. (But if we had a regular array strided access like so: a[2i+1], a[2i+D] then alias analysis would not come to the rescue… we'd have to deal with dist = 8D-4 for which 8*D>=0 is not sufficient)

dorit added inline comments.Jan 9 2017, 12:35 AM

llvm/lib/Analysis/LoopAccessAnalysis.cpp
1358	> getMaxBackedgeTakenCount is giving me -1
Probably this is related to the PR you opened (PR31412)? (Although this is not in the remainder loop...?)

mkuper added inline comments.Jan 9 2017, 4:51 PM

llvm/lib/Analysis/LoopAccessAnalysis.cpp
1358	Err, yes, you're right, this is currently broken. Sorry for the confusion.
1373	1. What I mean is that this holds unconditionally: (X - Y > 0 || -X - Y > 0) <=> |X| - Y > 0. Would it make sense to sink this inference into SCEV itself, instead of only using it here? If not, is it because the SCEV representation of |X| is too hairy?
2. Yes, that's what I meant. And you're right, we probably want both.
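The equivalence quoted above can be sanity-checked by brute force over a small range of integers. The sketch below is illustrative only; `absInequalityEquivalenceHolds` is a hypothetical helper, and plain `int` arithmetic stands in for SCEV expressions, so it says nothing about overflow behavior at the type boundaries.

```cpp
#include <cassert>
#include <cstdlib>

// Returns true if (X - Y > 0 || -X - Y > 0) agrees with (|X| - Y > 0)
// for every pair (X, Y) with both values in [-bound, bound].
bool absInequalityEquivalenceHolds(int bound) {
  // Keep the range small so that no intermediate value can overflow.
  assert(bound >= 0 && bound <= 1000);
  for (int x = -bound; x <= bound; ++x) {
    for (int y = -bound; y <= bound; ++y) {
      const bool two_sided = (x - y > 0) || (-x - y > 0);
      const bool with_abs = (std::abs(x) - y > 0);
      if (two_sided != with_abs)
        return false;
    }
  }
  return true;
}
```

The reason it holds is that |X| is the larger of X and -X, so |X| - Y is positive exactly when at least one of X - Y and -X - Y is positive.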

+Sanjoy

dorit added inline comments.Jan 10 2017, 2:14 PM

llvm/lib/Analysis/LoopAccessAnalysis.cpp
1373	about 1: I don't know that we have a canonical way for abs representation in SCEV, so I guess I would classify this as "hairy" :-) but better if the SCEV experts would comment on this.

changed one condition to early-exit, as requested.

anemet added inline comments.Jan 12 2017, 9:09 AM

../llvm/lib/Analysis/LoopAccessAnalysis.cpp
1303–1344 ↗	(On Diff #84090)	Can this be separated out into its own function? isDependent is already a largish function.

separated out into a new function. (good suggestion! thanks).

Dorit, I am really sorry that it took me sooo much time to look at this for real. The only thing that is missing is an LAA-specific test.

llvm/lib/Analysis/LoopAccessAnalysis.cpp
1224–1229	Use the same variable names as in the formula or the other way?
1241–1245	The comment is actually slightly misleading since you perform subtraction to detect this. @sanjoy, are these steps reasonably efficient to prove the property at line 1224 with SCEV?
llvm/test/Transforms/LoopVectorize/pr31098.ll
1–2	Please add an LAA-only test in addition to this under Test/Analysis/LoopAccessAnalysis.

This revision now requires changes to proceed.Jan 23 2017, 9:11 AM

added testcase and addressed other comments.

I did not understand why this works (I've added some questions inline). Some "constructive" reasoning (not necessarily a proof) for why isSafeDependenceDistance is correct may help (i.e. we know X and therefore we know Y etc.). If this is already documented / discussed elsewhere then a reference to that discussion / documentation will be helpful too.

llvm/lib/Analysis/LoopAccessAnalysis.cpp
1208	Nit: "non-constant"
1225	How does this work with a negative `BackedgeTakenCount * Step`? For instance, in: i = 0; do { r0 = out[i + 1]; out[i] = r1; } while (--i != -2); `D` is `1` (or `-1`), `BackedgeTakenCount` is `2` and `Step` is `-1`, so `|Dist| > BackedgeTakenCount * Step` is true (I assume you intend to use a signed `>` ?). However, on the first iteration we write to `out[0]`, and we read from it on the second iteration. Or is this somehow not the kind of dependence you're looking for?
1237	Can you add a comment for why you're zero extending `Product` but sign extending `Dist`?
1241	How do you know that `Dist - (BackedgeTakenCount * Step)` won't overflow? I.e. why can't `Dist` be `i8 128` and `BackedgeTakenCount * Step` be `i8 1`? Skimming through LAA, it looks like it guards against overflow in the addressing expressions themselves, but I think the situation above can happen even if addressing expressions themselves do not overflow. for (i8 i = 0; i < 2; i++) { r0 = out[i - 64]; out[i + 64] = r1; }

In D28044#661369, @sanjoy wrote:

I did not understand why this works (I've added some questions inline). Some "constructive" reasoning (not necessarily a proof) for why isSafeDependenceDistance is correct may help (i.e. we know X and therefore we know Y etc.). If this is already documented / discussed elsewhere then a reference to that discussion / documentation will be helpful too.

Thanks for looking closely into this.

I guess clarifying in the documentation that "Step" is the *absolute* stride would help… And I can also add the following:

"
We check here if the absolute distance (|Dist/Step|) is >= the loop iteration count.
This is equivalent to the Strong SIV Test (Practical Dependence Testing, Section 4.2.1);

Note, that for vectorization it is sufficient to prove that the dependence distance is >= VF; This is checked elsewhere.
But in some cases we can prune unknown dependence distances early, and even before selecting the VF, and without a runtime test, by comparing the distance against the loop iteration count. Since the vectorized code will be executed only if LoopCount >= VF, proving that the distance >= LoopCount also guarantees that distance >= VF."
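The argument in the quoted rationale can be illustrated with a small simulation. This is a sketch under stated assumptions: `runScalar` and `runChunked` are hypothetical helpers, and "vectorized" execution is modeled simply as performing all loads of a VF-sized chunk before any of its stores. It is not how the vectorizer generates code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar reference: for (i = 0; i < n; ++i) out[i + dist] = out[i];
std::vector<int> runScalar(std::vector<int> out, std::size_t n,
                           std::size_t dist) {
  assert(out.size() >= n + dist);
  for (std::size_t i = 0; i < n; ++i)
    out[i + dist] = out[i];
  return out;
}

// Hypothetical model of a loop vectorized by a factor vf: each chunk does
// all of its loads first and then all of its stores; leftover iterations
// run in scalar form.
std::vector<int> runChunked(std::vector<int> out, std::size_t n,
                            std::size_t dist, std::size_t vf) {
  assert(vf > 0 && out.size() >= n + dist);
  std::vector<int> lanes(vf);
  std::size_t i = 0;
  for (; i + vf <= n; i += vf) {
    for (std::size_t l = 0; l < vf; ++l)
      lanes[l] = out[i + l];            // "vector load"
    for (std::size_t l = 0; l < vf; ++l)
      out[i + l + dist] = lanes[l];     // "vector store"
  }
  for (; i < n; ++i)
    out[i + dist] = out[i];             // scalar remainder
  return out;
}
```

With a distance equal to the iteration count (n = 8, dist = 8, vf = 4) both versions produce the same array, because no store lands on an element that is still to be loaded. With dist = 1 and vf = 4 the results differ, which is the kind of dependence that must not be pruned.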

llvm/lib/Analysis/LoopAccessAnalysis.cpp
1225	Step cannot be negative. It is the absolute Stride in bytes. I will clarify that in the function description.
1237	Is this clear enough?: "The dependence distance is signed (can be positive/negative), so we sign extend Dist; the (unsigned) product is the multiplication of the absolute stride in bytes and the BackedgeTakenCount, so we zero extend Product."
1241	I'm not sure I understand the overflow concern... First of all, just to make sure it's clear: we reach this code only when the distance is unknown, so the specific example you gave with a known distance of 128 will not reach this code. So say the example was the following:
void foo (char *out, signed char Two, signed char SixtyFour) { for (signed char i = 0; i < Two; i++) { char r0 = out[i - SixtyFour]; out[i + SixtyFour] = r0 * 2; } }
The Dist that we will have is: (2 * (sext i8 %SixtyFour to i64))
The Minus expressions that we will compute are:
Minus = (0 + (2 * (sext i8 %SixtyFour to i64)) + (-1 * (zext i8 %Two to i64))<nsw>)
Minus = (0 + (-2 * (sext i8 %SixtyFour to i64)) + (-1 * (zext i8 %Two to i64))<nsw>)
(And of course in this case we will not be able to prove statically that this is positive, so we will return that the distance is not safe).
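The sign-extend versus zero-extend choice discussed for line 1237 can be shown on plain 8-bit values. The helpers below (`signExtend8`, `zeroExtend8`, `widenedMinus`) are hypothetical names for this sketch and only mimic what widening a signed distance and an unsigned product to a common 64-bit type looks like; they are not SCEV APIs.

```cpp
#include <cassert>
#include <cstdint>

// Sign extension: treat the 8 bits as a two's complement signed value.
// (The uint8_t -> int8_t conversion is modular in C++20 and behaves the
// same way on mainstream compilers before that.)
std::int64_t signExtend8(std::uint8_t bits) {
  return static_cast<std::int64_t>(static_cast<std::int8_t>(bits));
}

// Zero extension: treat the 8 bits as an unsigned value.
std::int64_t zeroExtend8(std::uint8_t bits) {
  return static_cast<std::int64_t>(bits);
}

// Widen a signed distance and an unsigned product to 64 bits, then
// subtract, as in "Dist - Product" from the discussion above.
std::int64_t widenedMinus(std::uint8_t dist_bits, std::uint8_t product_bits) {
  return signExtend8(dist_bits) - zeroExtend8(product_bits);
}
```

The bit pattern 0xF8 widens to -8 as a signed distance but to 248 as an unsigned product, which is why the two operands need different extensions before the subtraction.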

Addressing Sanjoy's comments. Thanks.

Any remaining/further concerns?

ping?

Sorry for the delay, I'll make sure to review this by the end of (California time) day today. If I don't get to it by then please feel free to check this in, and I'll do a post-commit review.

Closed by commit rL294892: [LV/LoopAccess] Check statically if an unknown dependence distance can be (authored by dorit). · Feb 12 2017, 1:44 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/lib/Analysis/LoopAccessAnalysis.cpp	66 lines
llvm/test/Analysis/LoopAccessAnalysis/multiple-strides-rt-memory-checks.ll	8 lines
llvm/test/Transforms/LoopVectorize/multiple-strides-vectorization.ll	8 lines
llvm/test/Transforms/LoopVectorize/pr31098.ll	100 lines

Diff 84749

llvm/lib/Analysis/LoopAccessAnalysis.cpp

Show First 20 Lines • Show All 1,199 Lines • ▼ Show 20 Lines	bool MemoryDepChecker::couldPreventStoreLoadForward(uint64_t Distance,

if (MaxVFWithoutSLForwardIssues < MaxSafeDepDistBytes &&		if (MaxVFWithoutSLForwardIssues < MaxSafeDepDistBytes &&
MaxVFWithoutSLForwardIssues !=		MaxVFWithoutSLForwardIssues !=
VectorizerParams::MaxVectorWidth * TypeByteSize)		VectorizerParams::MaxVectorWidth * TypeByteSize)
MaxSafeDepDistBytes = MaxVFWithoutSLForwardIssues;		MaxSafeDepDistBytes = MaxVFWithoutSLForwardIssues;
return false;		return false;
}		}

		/// Given a non-contant dependence-distance \p Dist between two accesses,
		/// that have the same stride \p Stride and the same type size \p TypeByteSize,
		/// in a loop whose takenCount is \p BackedgeTakenCount, check if it is
		/// possible to prove statically that the dependence distance is larger
		/// than the range that the accesses will travel through the execution of
		/// the loop. If so, return true; false otherwise. This is useful for
		/// example in loops such as the following (PR31098):
		/// for (i = 0; i < D; ++i) {
		/// = out[i];
		/// out[i+D] =
		/// }
		static bool isSafeDependenceDistance(const DataLayout &DL, ScalarEvolution &SE,
		const SCEV &BackedgeTakenCount,
		const SCEV &Dist, uint64_t Stride,
		uint64_t TypeByteSize) {

		// If we can prove that
		// (**) |dist_in_bytes| > takenCount * stepInBytes
		// then there is no dependence.
		// (this is equivalent to what the SIV data-dependence test would do).
		const uint64_t ByteStride = Stride * TypeByteSize;
		const SCEV *Step = SE.getConstant(BackedgeTakenCount.getType(), ByteStride);
		const SCEV *Product = SE.getMulExpr(&BackedgeTakenCount, Step);

		const SCEV *CastedDist = &Dist;
		const SCEV *CastedProduct = Product;
		uint64_t DistTypeSize = DL.getTypeAllocSize(Dist.getType());
		uint64_t ProductTypeSize = DL.getTypeAllocSize(Product->getType());
		if (DistTypeSize > ProductTypeSize)
		CastedProduct = SE.getZeroExtendExpr(Product, Dist.getType());
		else
		CastedDist = SE.getNoopOrSignExtend(&Dist, Product->getType());

		// Is (Dist > takenCount * stepInBytes) ?
		// (If so, then we have proven (**) because |dist| >= dist)
		const SCEV *Minus = SE.getMinusSCEV(CastedDist, CastedProduct);
		if (SE.isKnownPositive(Minus))
		return true;

		// Second try: Is (-Dist > takenCount * stepInBytes) ?
		// (If so, then we have proven (**) because |dist| >= -1*dist)
		const SCEV *NegDist = SE.getNegativeSCEV(CastedDist);
		Minus = SE.getMinusSCEV(NegDist, CastedProduct);
		if (SE.isKnownPositive(Minus))
		return true;

		return false;
		}

/// \brief Check the dependence for two accesses with the same stride \p Stride.		/// \brief Check the dependence for two accesses with the same stride \p Stride.
/// \p Distance is the positive distance and \p TypeByteSize is type size in		/// \p Distance is the positive distance and \p TypeByteSize is type size in
/// bytes.		/// bytes.
///		///
/// \returns true if they are independent.		/// \returns true if they are independent.
static bool areStridedAccessesIndependent(uint64_t Distance, uint64_t Stride,		static bool areStridedAccessesIndependent(uint64_t Distance, uint64_t Stride,
uint64_t TypeByteSize) {		uint64_t TypeByteSize) {
assert(Stride > 1 && "The stride must be greater than 1");		assert(Stride > 1 && "The stride must be greater than 1");
▲ Show 20 Lines • Show All 71 Lines • ▼ Show 20 Lines	MemoryDepChecker::isDependent(const MemAccessInfo &A, unsigned AIdx,
// Need accesses with constant stride. We don't want to vectorize		// Need accesses with constant stride. We don't want to vectorize
// "A[B[i]] += ..." and similar code or pointer arithmetic that could wrap in		// "A[B[i]] += ..." and similar code or pointer arithmetic that could wrap in
// the address space.		// the address space.
if (!StrideAPtr || !StrideBPtr || StrideAPtr != StrideBPtr){		if (!StrideAPtr || !StrideBPtr || StrideAPtr != StrideBPtr){
DEBUG(dbgs() << "Pointer access with non-constant stride\n");		DEBUG(dbgs() << "Pointer access with non-constant stride\n");
return Dependence::Unknown;		return Dependence::Unknown;
}		}

		Type *ATy = APtr->getType()->getPointerElementType();
		Type *BTy = BPtr->getType()->getPointerElementType();
		auto &DL = InnermostLoop->getHeader()->getModule()->getDataLayout();
		uint64_t TypeByteSize = DL.getTypeAllocSize(ATy);
		uint64_t Stride = std::abs(StrideAPtr);
const SCEVConstant *C = dyn_cast<SCEVConstant>(Dist);		const SCEVConstant *C = dyn_cast<SCEVConstant>(Dist);
if (!C) {		if (!C) {
		if (TypeByteSize == DL.getTypeAllocSize(BTy) &&
		isSafeDependenceDistance(DL, *(PSE.getSE()),
		(PSE.getBackedgeTakenCount()), Dist, Stride,
		TypeByteSize))
		return Dependence::NoDep;

DEBUG(dbgs() << "LAA: Dependence because of non-constant distance\n");		DEBUG(dbgs() << "LAA: Dependence because of non-constant distance\n");
ShouldRetryWithRuntimeCheck = true;		ShouldRetryWithRuntimeCheck = true;
return Dependence::Unknown;		return Dependence::Unknown;
}		}

Type *ATy = APtr->getType()->getPointerElementType();
Type *BTy = BPtr->getType()->getPointerElementType();
auto &DL = InnermostLoop->getHeader()->getModule()->getDataLayout();
uint64_t TypeByteSize = DL.getTypeAllocSize(ATy);

const APInt &Val = C->getAPInt();		const APInt &Val = C->getAPInt();
int64_t Distance = Val.getSExtValue();		int64_t Distance = Val.getSExtValue();
uint64_t Stride = std::abs(StrideAPtr);

// Attempt to prove strided accesses independent.		// Attempt to prove strided accesses independent.
if (std::abs(Distance) > 0 && Stride > 1 && ATy == BTy &&		if (std::abs(Distance) > 0 && Stride > 1 && ATy == BTy &&
areStridedAccessesIndependent(std::abs(Distance), Stride, TypeByteSize)) {		areStridedAccessesIndependent(std::abs(Distance), Stride, TypeByteSize)) {
DEBUG(dbgs() << "LAA: Strided accesses are independent\n");		DEBUG(dbgs() << "LAA: Strided accesses are independent\n");
return Dependence::NoDep;		return Dependence::NoDep;
}		}

// Negative distances are not plausible dependencies.		// Negative distances are not plausible dependencies.
if (Val.isNegative()) {		if (Val.isNegative()) {
bool IsTrueDataDependence = (AIsWrite && !BIsWrite);		bool IsTrueDataDependence = (AIsWrite && !BIsWrite);
if (IsTrueDataDependence && EnableForwardingConflictDetection &&		if (IsTrueDataDependence && EnableForwardingConflictDetection &&
(couldPreventStoreLoadForward(Val.abs().getZExtValue(), TypeByteSize) ||		(couldPreventStoreLoadForward(Val.abs().getZExtValue(), TypeByteSize) ||
ATy != BTy)) {		ATy != BTy)) {
DEBUG(dbgs() << "LAA: Forward but may prevent st->ld forwarding\n");		DEBUG(dbgs() << "LAA: Forward but may prevent st->ld forwarding\n");
return Dependence::ForwardButPreventsForwarding;		return Dependence::ForwardButPreventsForwarding;
}		}


llvm/test/Analysis/LoopAccessAnalysis/multiple-strides-rt-memory-checks.ll

; RUN: opt -loop-accesses -analyze -S < %s | FileCheck %s		; RUN: opt -loop-accesses -analyze -S < %s | FileCheck %s
; RUN: opt -passes='require<scalar-evolution>,require<aa>,loop(print-access-info)' -disable-output < %s 2>&1 | FileCheck %s		; RUN: opt -passes='require<scalar-evolution>,require<aa>,loop(print-access-info)' -disable-output < %s 2>&1 | FileCheck %s

; This is the test case from PR26314.		; This is the test case from PR26314.
; When we were retrying dependence checking with memchecks only,		; When we were retrying dependence checking with memchecks only,
; the loop-invariant access in the inner loop was incorrectly determined to be wrapping		; the loop-invariant access in the inner loop was incorrectly determined to be wrapping
; because it was not strided in the inner loop.		; because it was not strided in the inner loop.

; #define Z 32		; #define Z 32
; typedef struct s {		; typedef struct s {
; int v1[Z];		; int v1[Z];
; int v2[Z];		; int v2[Z];
; int v3[Z][Z];		; int v3[Z][Z];
; } s;		; } s;
;		;
; void slow_function (s* const obj) {		; void slow_function (s* const obj, int z) {
; for (int j=0; j<Z; j++) {		; for (int j=0; j<Z; j++) {
; for (int k=0; k<Z; k++) {		; for (int k=0; k<z; k++) {
; int x = obj->v1[k] + obj->v2[j];		; int x = obj->v1[k] + obj->v2[j];
; obj->v3[j][k] += x;		; obj->v3[j][k] += x;
; }		; }
; }		; }
; }		; }

; CHECK: function 'Test':		; CHECK: function 'Test':
; CHECK: .inner:		; CHECK: .inner:
; CHECK-NEXT: Memory dependences are safe		; CHECK-NEXT: Memory dependences are safe
; CHECK-NEXT: Dependences:		; CHECK-NEXT: Dependences:
; CHECK-NEXT: Run-time memory checks:		; CHECK-NEXT: Run-time memory checks:
; CHECK: Check 0:		; CHECK: Check 0:
; CHECK: Check 1:		; CHECK: Check 1:

target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"		target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"		target triple = "x86_64-unknown-linux-gnu"

%struct.s = type { [32 x i32], [32 x i32], [32 x [32 x i32]] }		%struct.s = type { [32 x i32], [32 x i32], [32 x [32 x i32]] }

define void @Test(%struct.s* nocapture %obj) #0 {		define void @Test(%struct.s* nocapture %obj, i64 %z) #0 {
br label %.outer.preheader		br label %.outer.preheader


.outer.preheader:		.outer.preheader:
%i = phi i64 [ 0, %0 ], [ %i.next, %.outer ]		%i = phi i64 [ 0, %0 ], [ %i.next, %.outer ]
%1 = getelementptr inbounds %struct.s, %struct.s* %obj, i64 0, i32 1, i64 %i		%1 = getelementptr inbounds %struct.s, %struct.s* %obj, i64 0, i32 1, i64 %i
br label %.inner		br label %.inner

Show All 11 Lines	.inner:
%3 = load i32, i32* %2		%3 = load i32, i32* %2
%4 = load i32, i32* %1		%4 = load i32, i32* %1
%5 = add nsw i32 %4, %3		%5 = add nsw i32 %4, %3
%6 = getelementptr inbounds %struct.s, %struct.s* %obj, i64 0, i32 2, i64 %i, i64 %j		%6 = getelementptr inbounds %struct.s, %struct.s* %obj, i64 0, i32 2, i64 %i, i64 %j
%7 = load i32, i32* %6		%7 = load i32, i32* %6
%8 = add nsw i32 %5, %7		%8 = add nsw i32 %5, %7
store i32 %8, i32* %6		store i32 %8, i32* %6
%j.next = add nuw nsw i64 %j, 1		%j.next = add nuw nsw i64 %j, 1
%exitcond.inner = icmp eq i64 %j.next, 32		%exitcond.inner = icmp eq i64 %j.next, %z
br i1 %exitcond.inner, label %.outer, label %.inner		br i1 %exitcond.inner, label %.outer, label %.inner
}		}

llvm/test/Transforms/LoopVectorize/multiple-strides-vectorization.ll

; RUN: opt -loop-vectorize -force-vector-width=4 -S < %s | FileCheck %s		; RUN: opt -loop-vectorize -force-vector-width=4 -S < %s | FileCheck %s

; This is the test case from PR26314.		; This is the test case from PR26314.
; When we were retrying dependence checking with memchecks only,		; When we were retrying dependence checking with memchecks only,
; the loop-invariant access in the inner loop was incorrectly determined to be wrapping		; the loop-invariant access in the inner loop was incorrectly determined to be wrapping
; because it was not strided in the inner loop.		; because it was not strided in the inner loop.
; Improved wrapping detection allows vectorization in the following case.		; Improved wrapping detection allows vectorization in the following case.

; #define Z 32		; #define Z 32
; typedef struct s {		; typedef struct s {
; int v1[Z];		; int v1[Z];
; int v2[Z];		; int v2[Z];
; int v3[Z][Z];		; int v3[Z][Z];
; } s;		; } s;
;		;
; void slow_function (s* const obj) {		; void slow_function (s* const obj, int z) {
; for (int j=0; j<Z; j++) {		; for (int j=0; j<Z; j++) {
; for (int k=0; k<Z; k++) {		; for (int k=0; k<z; k++) {
; int x = obj->v1[k] + obj->v2[j];		; int x = obj->v1[k] + obj->v2[j];
; obj->v3[j][k] += x;		; obj->v3[j][k] += x;
; }		; }
; }		; }
; }		; }

; CHECK-LABEL: Test		; CHECK-LABEL: Test
; CHECK: <4 x i64>		; CHECK: <4 x i64>
; CHECK: <4 x i32>, <4 x i32>		; CHECK: <4 x i32>, <4 x i32>
; CHECK: llvm.loop.vectorize.width		; CHECK: llvm.loop.vectorize.width

target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"		target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

%struct.s = type { [32 x i32], [32 x i32], [32 x [32 x i32]] }		%struct.s = type { [32 x i32], [32 x i32], [32 x [32 x i32]] }

define void @Test(%struct.s* nocapture %obj) #0 {		define void @Test(%struct.s* nocapture %obj, i64 %z) #0 {
br label %.outer.preheader		br label %.outer.preheader


.outer.preheader:		.outer.preheader:
%i = phi i64 [ 0, %0 ], [ %i.next, %.outer ]		%i = phi i64 [ 0, %0 ], [ %i.next, %.outer ]
%1 = getelementptr inbounds %struct.s, %struct.s* %obj, i64 0, i32 1, i64 %i		%1 = getelementptr inbounds %struct.s, %struct.s* %obj, i64 0, i32 1, i64 %i
br label %.inner		br label %.inner

(11 unchanged lines not shown)	.inner:
%3 = load i32, i32* %2		%3 = load i32, i32* %2
%4 = load i32, i32* %1		%4 = load i32, i32* %1
%5 = add nsw i32 %4, %3		%5 = add nsw i32 %4, %3
%6 = getelementptr inbounds %struct.s, %struct.s* %obj, i64 0, i32 2, i64 %i, i64 %j		%6 = getelementptr inbounds %struct.s, %struct.s* %obj, i64 0, i32 2, i64 %i, i64 %j
%7 = load i32, i32* %6		%7 = load i32, i32* %6
%8 = add nsw i32 %5, %7		%8 = add nsw i32 %5, %7
store i32 %8, i32* %6		store i32 %8, i32* %6
%j.next = add nuw nsw i64 %j, 1		%j.next = add nuw nsw i64 %j, 1
%exitcond.inner = icmp eq i64 %j.next, 32		%exitcond.inner = icmp eq i64 %j.next, %z
br i1 %exitcond.inner, label %.outer, label %.inner		br i1 %exitcond.inner, label %.outer, label %.inner
}		}

llvm/test/Transforms/LoopVectorize/pr31098.ll

				; REQUIRES: asserts
				; RUN: opt -S -loop-vectorize -force-vector-width=4 -force-vector-interleave=1 -enable-interleaved-mem-accesses=true -debug-only=loop-accesses < %s 2>&1 | FileCheck %s
				anemet: Please add an LAA-only test in addition to this under Test/Analysis/LoopAccessAnalysis.

				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

				; Check that the compile-time-unknown dependence-distance is resolved
				; statically. Due to the non-unit stride of the accesses in this testcase
				; we are currently not able to create runtime dependence checks, and therefore
				; if we don't resolve the dependence statically we cannot vectorize the loop.
				;
				; Specifically in this example, during dependence analysis we get 6 unknown
				; dependence distances between the 8 real/imaginary accesses below:
				; dist = 8D, 4+8D, -4+8D, -8D, 4-8D, -4-8D.
				; At compile time we can prove for all of the above that |dist|>loopBound*step
				; (where the step is 8 bytes, and the loopBound is D-1), and thereby conclude
				; that there are no dependencies (without runtime tests):
				; |8D|>8D-8, |4+8D|>8D-8, |-4+8D|>8D-8, etc.

				; #include <stdlib.h>
				; class Complex {
				; private:
				; float real_;
				; float imaginary_;
				;
				; public:
				; Complex() : real_(0), imaginary_(0) { }
				; Complex(float real, float imaginary) : real_(real), imaginary_(imaginary) { }
				; Complex(const Complex &rhs) : real_(rhs.real()), imaginary_(rhs.imaginary()) { }
				;
				; inline float real() const { return real_; }
				; inline float imaginary() const { return imaginary_; }
				;
				; Complex operator+(const Complex& rhs) const
				; {
				; return Complex(real_ + rhs.real_, imaginary_ + rhs.imaginary_);
				; }
				;
				; Complex operator-(const Complex& rhs) const
				; {
				; return Complex(real_ - rhs.real_, imaginary_ - rhs.imaginary_);
				; }
				; };
				;
				; void Test(Complex *out, size_t size)
				; {
				; size_t D = size / 2;
				; for (size_t offset = 0; offset < D; ++offset)
				; {
				; Complex t0 = out[offset];
				; Complex t1 = out[offset + D];
				; out[offset] = t1 + t0;
				; out[offset + D] = t0 - t1;
				; }
				; }

				; CHECK-LABEL: Test
				; CHECK: LAA: No unsafe dependent memory operations in loop. We don't need runtime memory checks.
				; CHECK: vector.body:
				; CHECK: <4 x i32>

				%class.Complex = type { float, float }

				define void @Test(%class.Complex* nocapture %out, i64 %size) local_unnamed_addr {
				entry:
				%div = lshr i64 %size, 1
				%cmp47 = icmp eq i64 %div, 0
				br i1 %cmp47, label %for.cond.cleanup, label %for.body.preheader

				for.body.preheader:
				br label %for.body

				for.cond.cleanup.loopexit:
				br label %for.cond.cleanup

				for.cond.cleanup:
				ret void

				for.body:
				%offset.048 = phi i64 [ %inc, %for.body ], [ 0, %for.body.preheader ]
				%0 = getelementptr inbounds %class.Complex, %class.Complex* %out, i64 %offset.048, i32 0
				%1 = load float, float* %0, align 4
				%imaginary_.i.i = getelementptr inbounds %class.Complex, %class.Complex* %out, i64 %offset.048, i32 1
				%2 = load float, float* %imaginary_.i.i, align 4
				%add = add nuw i64 %offset.048, %div
				%3 = getelementptr inbounds %class.Complex, %class.Complex* %out, i64 %add, i32 0
				%4 = load float, float* %3, align 4
				%imaginary_.i.i28 = getelementptr inbounds %class.Complex, %class.Complex* %out, i64 %add, i32 1
				%5 = load float, float* %imaginary_.i.i28, align 4
				%add.i = fadd fast float %4, %1
				%add4.i = fadd fast float %5, %2
				store float %add.i, float* %0, align 4
				store float %add4.i, float* %imaginary_.i.i, align 4
				%sub.i = fsub fast float %1, %4
				%sub4.i = fsub fast float %2, %5
				store float %sub.i, float* %3, align 4
				store float %sub4.i, float* %imaginary_.i.i28, align 4
				%inc = add nuw nsw i64 %offset.048, 1
				%exitcond = icmp eq i64 %inc, %div
				br i1 %exitcond, label %for.cond.cleanup.loopexit, label %for.body
				}
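
The inequality stated in the pr31098.ll header comment can be spot-checked numerically. The following is a minimal sketch, not the code added to LoopAccessAnalysis.cpp by this patch: the helper names `distanceExceedsSpan`, `allDistancesSafe` and `checkSmallTripCounts` are hypothetical, and plain 64-bit integers stand in for the SCEV expressions the analysis actually reasons about. It assumes D >= 1, which matches the `%cmp47` guard in the test.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not an LLVM API. Returns true when a dependence
// distance of distBytes is strictly larger in magnitude than the byte span
// covered by the loop (maxIter * stepBytes), i.e. the two accesses cannot
// overlap inside the loop.
inline bool distanceExceedsSpan(int64_t distBytes, int64_t maxIter,
                                int64_t stepBytes) {
  const int64_t absDist = distBytes < 0 ? -distBytes : distBytes;
  return absDist > maxIter * stepBytes;
}

// Checks the six distances listed in the test comment for one value of D.
// The step is 8 bytes (one Complex: two floats) and loopBound is D - 1.
inline bool allDistancesSafe(int64_t D) {
  const int64_t step = 8;
  const int64_t dists[] = {8 * D,  4 + 8 * D, -4 + 8 * D,
                           -8 * D, 4 - 8 * D, -4 - 8 * D};
  for (int64_t dist : dists)
    if (!distanceExceedsSpan(dist, D - 1, step))
      return false;
  return true;
}

// Spot-check over a small range of trip counts. This is only an
// illustration; the patch proves the property symbolically.
inline void checkSmallTripCounts() {
  for (int64_t D = 1; D <= 64; ++D)
    assert(allDistancesSafe(D));
}
```

The smallest magnitude among the six distances is 8D-4, which exceeds 8D-8 for every D, so the check holds regardless of the trip count. By contrast, a distance equal to the span (for example 16 bytes with loopBound 2 and step 8) would not pass the strict comparison.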