This is an archive of the discontinued LLVM Phabricator instance.

Differential D17080

[LAA] Allow more run-time alias checks by coercing pointer expressions to AddRecExprs
ClosedPublic

Authored by sbaranga on Feb 10 2016, 9:41 AM.

Details

Reviewers

anemet
mzolotukhin
mkuper
sanjoy
hfinkel

Commits

rGac920f7716ef: [LAA] Allow more run-time alias checks by coercing pointer expressions to…
rL313012: [LAA] Allow more run-time alias checks by coercing pointer expressions to…

Summary

LAA can only emit run-time alias checks for pointers with affine AddRec
SCEV expressions. However, non-AddRecExprs can now be converted to
affine AddRecExprs using SCEV predicates.

This change tries to add the minimal set of SCEV predicates in order
to enable run-time alias checking.
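
To make the role of the affine AddRec concrete, here is a minimal self-contained sketch (not LLVM code) of what a run-time alias check over two such pointers amounts to. It assumes each pointer follows {Base,+,Stride} for a known number of iterations, derives the byte range it touches, and tests two ranges for overlap. The names AffineAccess, accessRange and mayOverlap are hypothetical and exist only for this illustration.

```cpp
#include <cstdint>
#include <utility>

// Hypothetical model of an affine AddRec pointer {Base,+,Stride}: the address
// accessed in iteration K is Base + K * Stride, for K in [0, Iterations).
struct AffineAccess {
  std::int64_t Base;
  std::int64_t Stride;     // in bytes, may be negative
  std::int64_t Iterations; // assumed to be at least 1
  std::int64_t AccessSize; // bytes touched by each access
};

// Half-open byte range [Low, High) touched over the whole loop. This is only
// computable up front because the address is affine in the iteration number.
std::pair<std::int64_t, std::int64_t> accessRange(const AffineAccess &A) {
  std::int64_t First = A.Base;
  std::int64_t Last = A.Base + (A.Iterations - 1) * A.Stride;
  std::int64_t Low = First < Last ? First : Last;
  std::int64_t High = (First < Last ? Last : First) + A.AccessSize;
  return std::make_pair(Low, High);
}

// The run-time check: the two accesses may alias if their ranges intersect.
bool mayOverlap(const AffineAccess &A, const AffineAccess &B) {
  std::pair<std::int64_t, std::int64_t> RA = accessRange(A);
  std::pair<std::int64_t, std::int64_t> RB = accessRange(B);
  return RA.first < RB.second && RB.first < RA.second;
}
```

If the pointer expression is not affine (for example because a narrow index may wrap), no such closed-form range exists, which is why the expression has to be coerced to an AddRec under a predicate before a check like this can be emitted.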

Diff Detail

Event Timeline

sbaranga updated this revision to Diff 47472.Feb 10 2016, 9:41 AM

sbaranga retitled this revision to [LAA] Allow more run-time alias checks by coercing pointer expressions to AddRecExprs.

sbaranga updated this object.

sbaranga added reviewers: anemet, mzolotukhin.

sbaranga added a subscriber: llvm-commits.

Herald added a subscriber: mzolotukhin. · Feb 10 2016, 9:41 AM

We have a dependency on http://reviews.llvm.org/D17078 (without it the compiler crashes while trying to build 464.h264ref in spec2006).

roman.shirokiy added a subscriber: roman.shirokiy.Mar 11 2016, 2:48 AM

This rebases the patch and moves the lambda to a separate method.

Rebase the patch since it wasn't applying to ToT anymore.

dorit added a subscriber: dorit.Feb 13 2017, 5:46 AM

This patch still applies almost entirely as is (it only needs a tiny update in the testcase), and seems to have a positive performance effect on a couple of benchmarks.
It is also a prerequisite for PR30654.
Let's revive this review!

dorit added reviewers: mkuper, sanjoy.Feb 20 2017, 3:49 AM

dorit added a subscriber: Ayal.

dorit added inline comments.Feb 21 2017, 12:41 AM

lib/Analysis/LoopAccessAnalysis.cpp
460	I just have a small suggestion, to maybe change "Force" to "Assume", just because "Force" here has the same effect as "Assume" in the getPtrStride API, namely to allow adding new runtime tests. (Right?). But if you prefer Force that's fine with me too.

dorit added a reviewer: hfinkel.Feb 22 2017, 7:34 PM

ping

Ayal added inline comments.Mar 1 2017, 3:09 AM

test/Analysis/LoopAccessAnalysis/memcheck-wrapping-pointers.ll
6	"i can ove[r]flow": better clarify which i can overflow - in the test below the induction variable of the loop i is a signed 64bit (as is the bound 'len') whose bump has an nsw, so it is free of overflow concerns. The indices of A[i] and A[i+1] are (signed or unsigned?) 32bit idx and idx.inc, whose zext-add-trunc bump has no nuw nor nsw and are therefore subject to overflow concerns. Is it clear why the Added "SCEV assumptions" Flags for these should be <nssw>, rather than <nusw>? Starting from 0 and 1, nssw is more conservative. "When len has a type": these assumptions are needed regardless of the type of len.
10	Suggest to clarify the corresponding C code using both i and idx.
12	"emmit" >> "emit", multiple occurrences.
20	Why sext's and not zext's?
22	"We" >> "The". Hopefully the transformed expressions are i32 {0,+,1} i32 {1,+,1} based on the added flags, right?
56	It doesn't really matter if we zext or sext here, right? Or should zext indicate the type of idx was originally unsigned?
140	See comments above.

sbaranga added inline comments.Mar 7 2017, 3:05 AM

test/Analysis/LoopAccessAnalysis/memcheck-wrapping-pointers.ll
6	Thanks for the observations. There is a discrepancy here between the C code and the IR being tested (and the text is misleading). We are actually looking at the expressions for %inc.ptr0 and %inc.ptr1. The sign extension comes from the getelementptr instruction. Because the SCEV expression that we are trying to linearize is something like (sext i32 {0,+,1}<%for.body> to i64), we can only add the nssw flag. I'll update the text to clarify things.
22	i64 should be correct. We're essentially sinking the sext to get a i64 linear expression.
56	Correct (for both).
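
The difference between the two flags discussed above can be sketched with a small model (not LLVM code). It uses an 8-bit wrapping counter as a stand-in for the i32 index so that wrapping is easy to reach, and it models `<nssw>` as "sign-extending the narrow counter equals the wide recurrence" and `<nusw>` as the same statement with zero extension. The function names are hypothetical, and the step is fixed at 1 as in the test.

```cpp
#include <cstdint>

// Value of an 8-bit counter {Start,+,1} after K steps, with wrap-around.
static std::uint8_t narrowCounter(std::uint8_t Start, unsigned K) {
  return static_cast<std::uint8_t>(Start + K);
}

// Interpret the 8 bits as a signed value and widen (models sext to i64).
static std::int64_t signExtend(std::uint8_t Bits) {
  return Bits < 128 ? static_cast<std::int64_t>(Bits)
                    : static_cast<std::int64_t>(Bits) - 256;
}

// Interpret the 8 bits as an unsigned value and widen (models zext to i64).
static std::int64_t zeroExtend(std::uint8_t Bits) {
  return static_cast<std::int64_t>(Bits);
}

// Modeled <nssw>: sext(narrow {Start,+,1}) == wide {sext(Start),+,1}
// for every iteration in [0, Iterations).
bool holdsNSSW(std::uint8_t Start, unsigned Iterations) {
  for (unsigned K = 0; K < Iterations; ++K)
    if (signExtend(narrowCounter(Start, K)) !=
        signExtend(Start) + static_cast<std::int64_t>(K))
      return false;
  return true;
}

// Modeled <nusw>: zext(narrow {Start,+,1}) == wide {zext(Start),+,1}
// for every iteration in [0, Iterations).
bool holdsNUSW(std::uint8_t Start, unsigned Iterations) {
  for (unsigned K = 0; K < Iterations; ++K)
    if (zeroExtend(narrowCounter(Start, K)) !=
        zeroExtend(Start) + static_cast<std::int64_t>(K))
      return false;
  return true;
}
```

In this model, a counter starting at 0 satisfies the signed assumption for at most 128 iterations but the unsigned one for up to 256, which matches the observation that, starting from 0 and 1, nssw is the more conservative of the two. Which flag can be added still depends on whether the extension in the expression is a sext or a zext.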

Renaming Force to Assume.

The tests have been re-written, with the test IR coming directly from C so it should now match the C code.

I've removed one of tests which was only testing that this also works for NSW.
This functionality is already covered by other tests and doesn't show up often from C.

This looks ok to me, added only minor comments, but it should be approved by someone who understands all this better...

IIUC the current scheme originally computes CanDoRT and NeedRTCheck "independently", or rather intertwinedly, checking first if CanDoRT without Assume in parallel to determining if NeedRTCheck, and if the latter is true revisit if CanDoRT with Assume as needed by Retries. A simpler albeit possibly slower approach is to first determine separately if NeedRTCheck, returning true if not; else check if CanDoRT with Assume. But Need[RT]Check seems to depend on CanDo[AliasSet]RT, and we may return false if !NeedRTCheck when there are incomparable pointers of distinct address spaces. The use of "isDependencyCheckNeeded()" also raises an eyebrow, or two.

lib/Analysis/LoopAccessAnalysis.cpp
537–538	The tests added below check this other added condition, namely PSE.hasNoOverflow(), right?
545	IsDepCheckNeeded was originally taken out of the AS : AST loop. Here you can simply ask "if(isDependencyCheckNeeded())" later.
547	IsWrite is needed only later by RtCheck.insert(), can move it there.
627	This goes back to the original code: it may be slightly easier to always bump ++NumTotalPtrChecks instead of ++NumReadPtrChecks and set NeedsCheck = (NumTotalPtrChecks >= 2 && NumWritePtrChecks >= 1); But not sure I follow the original logic here; it's equivalent to doing: bool NeedsCheck = (NumTotalPtrChecks >= 2 && NumWritePtrChecks >= 1); if (IsDepCheckNeeded && CanDoAliasSetRT && RunningDepId == 2) NeedsCheck = false; Having CanDoAliasSetRT && RunningDepId == 2 implies that NumTotalPtrChecks == 2, but why set NeedsCheck to false when this happens and IsDepCheckNeeded? What if NumWritePtrChecks >= 1? May want to rename NeedsCheck >> NeedsAliasSetRTCheck, analogous to the CanDo's.
668	ok, but what if !NeedRTCheck?
test/Analysis/LoopAccessAnalysis/memcheck-wrapping-pointers.ll
6	Thanks! This looks fine to me.
25	the second one should be: ; i64 {(%b),+,4}<%for.body>
76	ove[r]flow

dorit mentioned this in D30041: [PSCEV] Create AddRec for Phis in cases of possible integer overflow, using runtime checks.Apr 13 2017, 9:00 AM

In D17080#725468, @Ayal wrote:

> This looks ok to me, added only minor comments, but it should be approved by someone who understands all this better...

@mkuper, @hfinkel - can you help out here?

I think @anemet has reviewed patches in this part of the vectorizer… @anemet, would you be able to please take a look?

(it's not a big patch: the heart of the patch is just one line: adding a call to
PSE.getAsAddRec(Ptr) under "if (Assume)" in hasComputeableBounds()
(which would make us more aggressive in adding runtime checks).
All the rest is an attempt to minimize the cases in which we pass Assume=true…)
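
The scheme described in this thread can be modeled in a few lines (a simplified sketch, not the patch itself). It assumes a single alias set, ignores the dependence-set shortcut and the stride check, and counts how many SCEV predicates would be added. Access, Result, hasComputableBoundsModel and analyzeAliasSet are hypothetical names for this illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one pointer access in an alias set.
struct Access {
  bool IsWrite;
  bool IsAddRec;          // already an affine AddRec without any assumption
  bool CoercibleToAddRec; // becomes one if a SCEV predicate is added
};

struct Result {
  bool CanDoRT;             // all pointers that need bounds have them
  bool NeedsCheck;          // a run-time check is required for this set
  unsigned PredicatesAdded; // assumptions made along the way
};

// Model of hasComputableBounds(..., Assume): an assumption is only made
// when the caller explicitly allows it.
static bool hasComputableBoundsModel(const Access &A, bool Assume,
                                     unsigned &PredicatesAdded) {
  if (A.IsAddRec)
    return true;
  if (Assume && A.CoercibleToAddRec) {
    ++PredicatesAdded;
    return true;
  }
  return false;
}

Result analyzeAliasSet(const std::vector<Access> &Set) {
  Result R = {true, false, 0};
  unsigned Reads = 0, Writes = 0;
  std::vector<Access> Retries;

  // First pass: no assumptions. Remember the accesses that failed.
  for (std::size_t I = 0; I < Set.size(); ++I) {
    if (Set[I].IsWrite)
      ++Writes;
    else
      ++Reads;
    if (!hasComputableBoundsModel(Set[I], /*Assume=*/false,
                                  R.PredicatesAdded)) {
      Retries.push_back(Set[I]);
      R.CanDoRT = false;
    }
  }

  R.NeedsCheck = Writes >= 2 || (Reads >= 1 && Writes >= 1);

  // Second pass: only if a check is really needed, retry with assumptions.
  if (R.NeedsCheck && !R.CanDoRT) {
    R.CanDoRT = true;
    for (std::size_t I = 0; I < Retries.size(); ++I)
      if (!hasComputableBoundsModel(Retries[I], /*Assume=*/true,
                                    R.PredicatesAdded)) {
        R.CanDoRT = false;
        break;
      }
  }
  return R;
}
```

With this structure a set containing only reads never triggers the second pass, so no predicates are added for it, whereas a set with a write and a coercible read gets its predicates only once the check is known to be needed.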

In D17080#729013, @Ayal wrote:

> In D17080#725468, @Ayal wrote:
>
>> This looks ok to me, added only minor comments, but it should be approved by someone who understands all this better...
>
> @mkuper, @hfinkel - can you help out here?

@anemet , @mssimpso - can you help out here?

@sbaranga, can you clarify my comments and help me understand this better? Hopefully this could move forward.

Sorry for not replying earlier. It looks like there are only minor changes left? I plan to push an update after this gets unblocked.

It's also a little bit odd that I haven't been able to get a definitive review for this. It would be good to know if there's something fundamentally wrong with the approach (at least we could start thinking about workarounds).

lib/Analysis/LoopAccessAnalysis.cpp
537–538	Correct.
627	I've tried to preserve the original logic, so if there are any issues with the original logic I've also pulled them in. The only difference here is that we're doing the same logic per alias set. It is possible that the logic can be simplified. RunningDepId == 2 means that we only have one dependence set. There is a comment above: "But there is no need to checks if there is only one dependence set for this alias set." If I remember correctly, this covers cases such as: for (int i = 0; i < n; ++i) a[i] = a[i] + 1; where we don't need any checks. For renaming NeedsCheck >> NeedsAliasSetRTCheck: I have no objections.
668	If !NeedRTCheck I don't think we need to do this check. However, the old code was doing it, so I left it in.
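
The two formulations of the per-alias-set condition discussed above can be compared directly. The sketch below (standalone, with hypothetical function names) writes the condition as it appears in the patch, the reviewer's suggested rewrite using a total count, and the combined form with the single-dependence-set exemption. It assumes the counters are non-negative.

```cpp
// Condition as written in the patch: at least two writes, or one write
// and at least one read.
bool needsCheckAsWritten(int NumReadPtrChecks, int NumWritePtrChecks) {
  return NumWritePtrChecks >= 2 ||
         (NumReadPtrChecks >= 1 && NumWritePtrChecks >= 1);
}

// Suggested rewrite: at least two pointers in total, at least one a write.
bool needsCheckSuggested(int NumReadPtrChecks, int NumWritePtrChecks) {
  int NumTotalPtrChecks = NumReadPtrChecks + NumWritePtrChecks;
  return NumTotalPtrChecks >= 2 && NumWritePtrChecks >= 1;
}

// Combined form: no check when dependence analysis covers the set and all
// accesses fall into one dependence set (RunningDepId == 2).
bool needsAliasSetRTCheck(int NumReadPtrChecks, int NumWritePtrChecks,
                          bool IsDepCheckNeeded, bool CanDoAliasSetRT,
                          unsigned RunningDepId) {
  if (IsDepCheckNeeded && CanDoAliasSetRT && RunningDepId == 2)
    return false;
  return needsCheckAsWritten(NumReadPtrChecks, NumWritePtrChecks);
}
```

For non-negative counts the first two functions agree on every input, so the rewrite would be a pure simplification. The exemption is a separate question: it relies on the dependence analysis, rather than a run-time check, to handle accesses within the single dependence set.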

hfinkel added inline comments.Aug 21 2017, 1:42 PM

lib/Analysis/LoopAccessAnalysis.cpp
522	Please add a comment on what Assume means in practice for this function.
632	Can you extend this comment to note why, even if they failed previously, they might succeed now.

Address comments received so far.

Harbormaster completed remote builds in B9564: Diff 112377.Aug 23 2017, 8:57 AM

LGTM

lib/Analysis/LoopAccessAnalysis.cpp
618	Please note here that this is the `RunningDepId == 2` check below.

This revision is now accepted and ready to land.Sep 2 2017, 5:23 PM

sbaranga closed this revision.Sep 12 2017, 12:49 AM

Thanks! Committed in r313012.

Hi Silviu,
I came across the similar gap, but I aggressively tried converting the PtrSCEV to AddRec in hasComputableBounds, which worked on our internal benchmark. Could you please clarify why you have the Assume to be false in the first call, i.e. how do we know that it's not useful to try and convert PtrSCEV to AddRec?

This was my patch over the original code.

diff --git a/lib/Analysis/LoopAccessAnalysis.cpp b/lib/Analysis/LoopAccessAnalysis.cpp
index d2dbecd..a1c7584 100644
--- a/lib/Analysis/LoopAccessAnalysis.cpp
+++ b/lib/Analysis/LoopAccessAnalysis.cpp
@@ -608,6 +608,11 @@ static bool hasComputableBounds(PredicatedScalarEvolution &PSE,
     return true;
 
   const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(PtrScev);
+  // Sometimes the PtrSCEV can be converted into the AddRec, try to retrieve it
+  // if possible.
+  if (!AR)
+    AR = PSE.getAsAddRec(Ptr);
+    if (!AR)
         return false;

Hi Anna,

In D17080#869566, @anna wrote:

> Hi Silviu,
> I came across the similar gap, but I aggressively tried converting the PtrSCEV to AddRec in hasComputableBounds, which worked on our internal benchmark. Could you please clarify why you have the Assume to be false in the first call, i.e. how do we know that it's not useful to try and convert PtrSCEV to AddRec?
>
> This was my patch over the original code.
> diff --git a/lib/Analysis/LoopAccessAnalysis.cpp b/lib/Analysis/LoopAccessAnalysis.cpp
> index d2dbecd..a1c7584 100644
> --- a/lib/Analysis/LoopAccessAnalysis.cpp
> +++ b/lib/Analysis/LoopAccessAnalysis.cpp
> @@ -608,6 +608,11 @@ static bool hasComputableBounds(PredicatedScalarEvolution &PSE,
>      return true;
>
>    const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(PtrScev);
> +  // Sometimes the PtrSCEV can be converted into the AddRec, try to retrieve it
> +  // if possible.
> +  if (!AR)
> +    AR = PSE.getAsAddRec(Ptr);
> +    if (!AR)
>          return false;

We only need to convert to an AddRec if we need to emit the runtime checks (see the comment above the original logic for computing NeedRTCheck).

One example would be if we only have reads in the loop. If we convert in hasComputableBounds we end up having unnecessary versioning (at least with regards to the dependence analysis).

Would the current solution also work in your internal benchmark? If not it might be because something else would need the expression to be an AddRec?

Thanks,
Silviu

Hi Silviu,

In D17080#869615, @sbaranga wrote:

> We only need to convert to an AddRec if we need to emit the runtime checks (see the comment above the original logic for computing NeedRTCheck).
>
> One example would be if we only have reads in the loop. If we convert in hasComputableBounds we end up having unnecessary versioning (at least with regards to the dependence analysis).

Thanks for the clarification. Makes sense.

> Would the current solution also work in your internal benchmark? If not it might be because something else would need the expression to be an AddRec?

Yes, I tried your patch and it works. I was just curious why we needed all the extra code rather than always checking if we could convert to an AddRec.

Revision Contents

Path	Size
lib/Analysis/LoopAccessAnalysis.cpp	88 lines
test/Analysis/LoopAccessAnalysis/memcheck-wrapping-pointers.ll	173 lines

Diff 47472

lib/Analysis/LoopAccessAnalysis.cpp

[451 lines not shown]
public:
/// (i.e. the pointers have computable bounds).		/// (i.e. the pointers have computable bounds).
bool canCheckPtrAtRT(RuntimePointerChecking &RtCheck, ScalarEvolution *SE,		bool canCheckPtrAtRT(RuntimePointerChecking &RtCheck, ScalarEvolution *SE,
Loop *TheLoop, const ValueToValueMap &Strides,		Loop *TheLoop, const ValueToValueMap &Strides,
bool ShouldCheckStride = false);		bool ShouldCheckStride = false);

/// \brief Goes over all memory accesses, checks whether a RT check is needed		/// \brief Goes over all memory accesses, checks whether a RT check is needed
/// and builds sets of dependent accesses.		/// and builds sets of dependent accesses.
void buildDependenceSets() {		void buildDependenceSets() {
processMemAccesses();		processMemAccesses();
}		}

/// \brief Initial processing of memory accesses determined that we need to		/// \brief Initial processing of memory accesses determined that we need to
/// perform dependency checking.		/// perform dependency checking.
///		///
/// Note that this can later be cleared if we retry memcheck analysis without		/// Note that this can later be cleared if we retry memcheck analysis without
/// dependency checking (i.e. ShouldRetryWithRuntimeCheck).		/// dependency checking (i.e. ShouldRetryWithRuntimeCheck).
bool isDependencyCheckNeeded() { return !CheckDeps.empty(); }		bool isDependencyCheckNeeded() { return !CheckDeps.empty(); }
[45 lines not shown]
private:
bool IsRTCheckAnalysisNeeded;		bool IsRTCheckAnalysisNeeded;

/// The SCEV predicate containing all the SCEV-related assumptions.		/// The SCEV predicate containing all the SCEV-related assumptions.
PredicatedScalarEvolution &PSE;		PredicatedScalarEvolution &PSE;
};		};

} // end anonymous namespace		} // end anonymous namespace

/// \brief Check whether a pointer can participate in a runtime bounds check.		/// \brief Check whether a pointer can participate in a runtime bounds check.
static bool hasComputableBounds(PredicatedScalarEvolution &PSE,		static bool hasComputableBounds(PredicatedScalarEvolution &PSE,
const ValueToValueMap &Strides, Value *Ptr,		const ValueToValueMap &Strides, Value *Ptr,
Loop *L) {		Loop *L, bool Force) {
const SCEV *PtrScev = replaceSymbolicStrideSCEV(PSE, Strides, Ptr);		const SCEV *PtrScev = replaceSymbolicStrideSCEV(PSE, Strides, Ptr);
const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(PtrScev);		const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(PtrScev);

		if (!AR && Force)
		AR = dyn_cast<SCEVAddRecExpr>(PSE.getAsAddRec(Ptr));

if (!AR)		if (!AR)
return false;		return false;

return AR->isAffine();		return AR->isAffine();
}		}

bool AccessAnalysis::canCheckPtrAtRT(RuntimePointerChecking &RtCheck,		bool AccessAnalysis::canCheckPtrAtRT(RuntimePointerChecking &RtCheck,
ScalarEvolution *SE, Loop *TheLoop,		ScalarEvolution *SE, Loop *TheLoop,
const ValueToValueMap &StridesMap,		const ValueToValueMap &StridesMap,
bool ShouldCheckStride) {		bool ShouldCheckStride) {
// Find pointers with computable bounds. We are going to use this information		// Find pointers with computable bounds. We are going to use this information
// to place a runtime bound check.		// to place a runtime bound check.
bool CanDoRT = true;		bool CanDoRT = true;

bool NeedRTCheck = false;		bool NeedRTCheck = false;
if (!IsRTCheckAnalysisNeeded) return true;		if (!IsRTCheckAnalysisNeeded) return true;

bool IsDepCheckNeeded = isDependencyCheckNeeded();		bool IsDepCheckNeeded = isDependencyCheckNeeded();

// We assign a consecutive id to access from different alias sets.		// We assign a consecutive id to access from different alias sets.
// Accesses between different groups doesn't need to be checked.		// Accesses between different groups doesn't need to be checked.
unsigned ASId = 1;		unsigned ASId = 1;
for (auto &AS : AST) {		for (auto &AS : AST) {
int NumReadPtrChecks = 0;		int NumReadPtrChecks = 0;
int NumWritePtrChecks = 0;		int NumWritePtrChecks = 0;
		bool CanDoAliasSetRT = true;

// We assign consecutive id to access from different dependence sets.		// We assign consecutive id to access from different dependence sets.
// Accesses within the same set don't need a runtime check.		// Accesses within the same set don't need a runtime check.
unsigned RunningDepId = 1;		unsigned RunningDepId = 1;
DenseMap<Value *, unsigned> DepSetId;		DenseMap<Value *, unsigned> DepSetId;

for (auto A : AS) {		SmallVector<MemAccessInfo, 4> Retries;
Value *Ptr = A.getValue();
bool IsWrite = Accesses.count(MemAccessInfo(Ptr, true));
MemAccessInfo Access(Ptr, IsWrite);

if (IsWrite)		auto createCheckForAccess = [&](MemAccessInfo Access, bool Force) {
++NumWritePtrChecks;		bool IsDepCheckNeeded = isDependencyCheckNeeded();
else		Value *Ptr = Access.getPointer();
++NumReadPtrChecks;		bool IsWrite = Access.getInt();

		if (!hasComputableBounds(PSE, StridesMap, Ptr, TheLoop, Force))
		return false;

if (hasComputableBounds(PSE, StridesMap, Ptr, TheLoop) &&
// When we run after a failing dependency check we have to make sure		// When we run after a failing dependency check we have to make sure
// we don't have wrapping pointers.		// we don't have wrapping pointers.
(!ShouldCheckStride ||		if (ShouldCheckStride)
isStridedPtr(PSE, Ptr, TheLoop, StridesMap) == 1)) {		if (isStridedPtr(PSE, Ptr, TheLoop, StridesMap, Force) != 1)
		return false;

// The id of the dependence set.		// The id of the dependence set.
unsigned DepId;		unsigned DepId;

if (IsDepCheckNeeded) {		if (IsDepCheckNeeded) {
Value *Leader = DepCands.getLeaderValue(Access).getPointer();		Value *Leader = DepCands.getLeaderValue(Access).getPointer();
unsigned &LeaderId = DepSetId[Leader];		unsigned &LeaderId = DepSetId[Leader];
if (!LeaderId)		if (!LeaderId)
LeaderId = RunningDepId++;		LeaderId = RunningDepId++;
DepId = LeaderId;		DepId = LeaderId;
} else		} else
// Each access has its own dependence set.		// Each access has its own dependence set.
DepId = RunningDepId++;		DepId = RunningDepId++;

RtCheck.insert(TheLoop, Ptr, IsWrite, DepId, ASId, StridesMap, PSE);		RtCheck.insert(TheLoop, Ptr, IsWrite, DepId, ASId, StridesMap, PSE);

DEBUG(dbgs() << "LAA: Found a runtime check ptr:" << *Ptr << '\n');		DEBUG(dbgs() << "LAA: Found a runtime check ptr:" << *Ptr << '\n');
} else {
		return true;
		};

		for (auto A : AS) {
		Value *Ptr = A.getValue();
		bool IsWrite = Accesses.count(MemAccessInfo(Ptr, true));
		MemAccessInfo Access(Ptr, IsWrite);

		if (IsWrite)
		++NumWritePtrChecks;
		else
		++NumReadPtrChecks;

		if (!createCheckForAccess(Access, false)) {
DEBUG(dbgs() << "LAA: Can't find bounds for ptr:" << *Ptr << '\n');		DEBUG(dbgs() << "LAA: Can't find bounds for ptr:" << *Ptr << '\n');
CanDoRT = false;		Retries.push_back(Access);
		CanDoAliasSetRT = false;
}		}
}		}

// If we have at least two writes or one write and a read then we need to		// If we have at least two writes or one write and a read then we need to
// check them. But there is no need to checks if there is only one		// check them. But there is no need to checks if there is only one
// dependence set for this alias set.		// dependence set for this alias set.
//		//
// Note that this function computes CanDoRT and NeedRTCheck independently.		// Note that this function computes CanDoRT and NeedRTCheck independently.
// For example CanDoRT=false, NeedRTCheck=false means that we have a pointer		// For example CanDoRT=false, NeedRTCheck=false means that we have a pointer
// for which we couldn't find the bounds but we don't actually need to emit		// for which we couldn't find the bounds but we don't actually need to emit
// any checks so it does not matter.		// any checks so it does not matter.
if (!(IsDepCheckNeeded && CanDoRT && RunningDepId == 2))		bool NeedsCheck = false;
NeedRTCheck |= (NumWritePtrChecks >= 2 || (NumReadPtrChecks >= 1 &&		if (!(IsDepCheckNeeded && CanDoAliasSetRT && RunningDepId == 2))
		NeedsCheck = (NumWritePtrChecks >= 2 || (NumReadPtrChecks >= 1 &&
NumWritePtrChecks >= 1));		NumWritePtrChecks >= 1));

		// We need to check things but some of the pointers couldn't be checked.
		if (NeedsCheck && !CanDoAliasSetRT) {
		// Reset the CanDoSetRt flag and retry all accesses that have failed.
		CanDoAliasSetRT = true;
		for (auto Access : Retries)
		if (!createCheckForAccess(Access, true)) {
		CanDoAliasSetRT = false;
		break;
		}
		}

		CanDoRT &= CanDoAliasSetRT;
		NeedRTCheck |= NeedsCheck;
++ASId;		++ASId;
}		}

// If the pointers that we would use for the bounds comparison have different		// If the pointers that we would use for the bounds comparison have different
// address spaces, assume the values aren't directly comparable, so we can't		// address spaces, assume the values aren't directly comparable, so we can't
// use them for the runtime check. We also have to assume they could		// use them for the runtime check. We also have to assume they could
// overlap. In the future there should be metadata for whether address spaces		// overlap. In the future there should be metadata for whether address spaces
// are disjoint.		// are disjoint.
[10 lines not shown]
for (unsigned j = i + 1; j < NumPointers; ++j) {

Value *PtrI = RtCheck.Pointers[i].PointerValue;		Value *PtrI = RtCheck.Pointers[i].PointerValue;
Value *PtrJ = RtCheck.Pointers[j].PointerValue;		Value *PtrJ = RtCheck.Pointers[j].PointerValue;

unsigned ASi = PtrI->getType()->getPointerAddressSpace();		unsigned ASi = PtrI->getType()->getPointerAddressSpace();
unsigned ASj = PtrJ->getType()->getPointerAddressSpace();		unsigned ASj = PtrJ->getType()->getPointerAddressSpace();
if (ASi != ASj) {		if (ASi != ASj) {
DEBUG(dbgs() << "LAA: Runtime check would require comparison between"		DEBUG(dbgs() << "LAA: Runtime check would require comparison between"
" different address spaces\n");		" different address spaces\n");
return false;		return false;
}		}
}		}
}		}

if (NeedRTCheck && CanDoRT)		if (NeedRTCheck && CanDoRT)
RtCheck.generateChecks(DepCands, IsDepCheckNeeded);		RtCheck.generateChecks(DepCands, IsDepCheckNeeded);

[1,318 lines not shown]

test/Analysis/LoopAccessAnalysis/memcheck-wrapping-pointers.ll

This file was added.

				; RUN: opt -basicaa -loop-accesses -analyze < %s | FileCheck %s

				target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"

				; When len has a type larger than i, i can oveflow in the following kernel:
				;
				; for (unsigned i = 0; i < len; i++) {
				; B[i] = A[i] + A[i + 1];
				; }
				;
				; If accesses to A and B can alias, we need to emmit a run-time alias check
				; between A and B. However, when i and i + 1 can wrap, their SCEV expression
				; is not an AddRec. We need to create SCEV predicates and coerce the
				; expressions to AddRecs in order to be able to emmit the run-time alias
				; check.
				;
				; The SCEV expressions for ind.ext and i + 1 respectively are:
				; (sext i32 {0,+,1}<%for.body> to i64)
				; (sext i32 {1,+,1}<%for.body> to i64)
				;
				; We transformed expressions are:
				; i64 {0,+,1}
				; i64 {1,+,1}

				; CHECK-LABEL: test1
				; CHECK: Memory dependences are safe with run-time checks
				; CHECK-NEXT: Dependences:
				; CHECK-NEXT: Run-time memory checks:
				; CHECK-NEXT: Check 0:
				; CHECK-NEXT: Comparing group
				; CHECK-NEXT: %inc.ptr2 = getelementptr inbounds i32, i32* %B, i32 %idx
				; CHECK-NEXT: Against group
				; CHECK-NEXT: %inc.ptr1 = getelementptr inbounds i32, i32* %A, i32 %idx.inc
				; CHECK-NEXT: %inc.ptr0 = getelementptr inbounds i32, i32* %A, i32 %idx
				; CHECK-NEXT: Grouped accesses:
				; CHECK-NEXT: Group
				; CHECK-NEXT: (Low: %B High: (-4 + (4 * (1 smax %len)) + %B))
				; CHECK-NEXT: Member: {%B,+,4}
				; CHECK-NEXT: Group
				; CHECK-NEXT: (Low: %A High: ((4 * (1 smax %len)) + %A))
				; CHECK-NEXT: Member: {(4 + %A),+,4}
				; CHECK-NEXT: Member: {%A,+,4}
				; CHECK: Store to invariant address was not found in loop.
				; CHECK-NEXT: SCEV assumptions:
				; CHECK-NEXT: {0,+,1}<%for.body> Added Flags: <nssw>
				; CHECK-NEXT: {1,+,1}<%for.body> Added Flags: <nssw>
				define void @test1(i32* %A, i32 *%B, i64 %len) {
				entry:
				br label %for.body

				for.body: ; preds = %entry, %for.body
				%idx = phi i32 [ 0, %entry ], [ %idx.inc, %for.body ]
				%i = phi i64 [ 0, %entry ], [ %add1, %for.body ]

				%idx.ext = zext i32 %idx to i64
				%idx.ext.next = add i64 %idx.ext, 1
				%idx.inc = trunc i64 %idx.ext.next to i32

				%inc.ptr0 = getelementptr inbounds i32, i32* %A, i32 %idx
				%inc.ptr1 = getelementptr inbounds i32, i32* %A, i32 %idx.inc

				%ld1 = load i32, i32* %inc.ptr0, align 4
				%ld2 = load i32, i32* %inc.ptr1, align 4

				%add = add i32 %ld1, %ld2

				%inc.ptr2 = getelementptr inbounds i32, i32* %B, i32 %idx
				store i32 %add, i32* %inc.ptr2

				%add1 = add nsw i64 %i, 1
				%cmp = icmp slt i64 %add1, %len
				br i1 %cmp, label %for.body, label %for.end

				for.end: ; preds = %for.body
				ret void
				}
				Ayal: ove[r]flow

				; This test is almost the same as the one above, except we zero extend
				; the index.
				;
				; The SCEV expressions for ind.ext and i + 1 respectively are:
				; (zext i32 {0,+,1}<%for.body> to i64)
				; (zext i32 {1,+,1}<%for.body> to i64)

				; CHECK-LABEL: test2
				; CHECK: Memory dependences are safe with run-time checks
				; CHECK-NEXT: Dependences:
				; CHECK-NEXT: Run-time memory checks:
				; CHECK-NEXT: Check 0:
				; CHECK-NEXT: Comparing group
				; CHECK-NEXT: %inc.ptr2 = getelementptr inbounds i32, i32* %B, i64 %idx.ext
				; CHECK-NEXT: Against group
				; CHECK-NEXT: %inc.ptr1 = getelementptr inbounds i32, i32* %A, i64 %idx.inc.ext
				; CHECK-NEXT: %inc.ptr0 = getelementptr inbounds i32, i32* %A, i64 %idx.ext
				; CHECK-NEXT: Grouped accesses:
				; CHECK-NEXT: Group
				; CHECK-NEXT: (Low: %B High: (-4 + (4 * (1 smax %len)) + %B))
				; CHECK-NEXT: Member: {%B,+,4}
				; CHECK-NEXT: Group
				; CHECK-NEXT: (Low: %A High: ((4 * (1 smax %len)) + %A))
				; CHECK-NEXT: Member: {(4 + %A),+,4}
				; CHECK-NEXT: Member: {%A,+,4}
				; CHECK: Store to invariant address was not found in loop.
				; CHECK-NEXT: SCEV assumptions:
				; CHECK-NEXT: {0,+,1}<%for.body> Added Flags: <nusw>
				; CHECK-NEXT: {1,+,1}<%for.body> Added Flags: <nusw>

				define void @test2(i32* %A, i32 *%B, i64 %len) {
				entry:
				br label %for.body

				for.body: ; preds = %entry, %for.body
				%idx = phi i32 [ 0, %entry ], [ %idx.inc, %for.body ]
				%i = phi i64 [ 0, %entry ], [ %add1, %for.body ]

				%idx.ext = zext i32 %idx to i64
				%idx.inc = add i32 %idx, 1
				%idx.inc.ext = zext i32 %idx.inc to i64

				%inc.ptr0 = getelementptr inbounds i32, i32* %A, i64 %idx.ext
				%inc.ptr1 = getelementptr inbounds i32, i32* %A, i64 %idx.inc.ext

				%ld1 = load i32, i32* %inc.ptr0, align 4
				%ld2 = load i32, i32* %inc.ptr1, align 4

				%add = add i32 %ld1, %ld2

				%inc.ptr2 = getelementptr inbounds i32, i32* %B, i64 %idx.ext
				store i32 %add, i32* %inc.ptr2

				%add1 = add nsw i64 %i, 1
				%cmp = icmp slt i64 %add1, %len
				br i1 %cmp, label %for.body, label %for.end

				for.end: ; preds = %for.body
				ret void
				}
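
The comment above test2 lists the two index expressions as zero-extended 32-bit recurrences. A minimal sketch of why a no-unsigned-wrap assumption is needed before such an expression can be treated as a 64-bit linear expression: widening after the 32-bit add and adding after the widening disagree exactly when the 32-bit value wraps. The helper names `zextOfInc` and `incOfZext` are hypothetical and exist only for this illustration; they are not part of the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: models (zext i32 (idx + 1) to i64).
// The add happens in 32 bits, so it wraps modulo 2^32 before widening.
uint64_t zextOfInc(uint32_t idx) {
  uint32_t narrow = static_cast<uint32_t>(idx + 1u);
  return static_cast<uint64_t>(narrow);
}

// Hypothetical helper: models ((zext i32 idx to i64) + 1).
// The add happens in 64 bits, so it cannot wrap for any 32-bit input.
uint64_t incOfZext(uint32_t idx) {
  return static_cast<uint64_t>(idx) + 1u;
}
```

For every `idx` below `UINT32_MAX` the two helpers agree, which is the situation the `<nusw>` assumptions in the expected output describe. At `idx == UINT32_MAX` the first returns 0 and the second returns 2^32, so the two forms are no longer interchangeable.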

				; When len has a type larger than i, i can oveflow in the following kernel:
				; for (unsigned i = 0; i < len; i++) {
				Ayal: See comments above.
				; A[i] = A[i] + 1;
				; }
				;
				; We do need to check that i doesn't wrap, but we don't need a run-time alias
				; check.

				; CHECK-LABEL: test3
				; CHECK: Memory dependences are safe
				; SCEV assumptions:
				; {0,+,1}<%for.body> Added Flags: <nssw>
				define void @test3(i32* %A, i64 %len) {
				entry:
				br label %for.body

				for.body: ; preds = %entry, %for.body
				%A.idx = phi i32 [ 0, %entry ], [ %A.idx.inc, %for.body ]
				%i = phi i64 [ 0, %entry ], [ %add1, %for.body ]

				%A.idx.inc = add i32 %A.idx, 1
				%inc.ptr = getelementptr inbounds i32, i32* %A, i32 %A.idx


				%ld = load i32, i32* %inc.ptr, align 4
				%add = add i32 %ld, 1
				store i32 %add, i32* %inc.ptr

				%add1 = add nsw i64 %i, 1
				%cmp = icmp slt i64 %add1, %len
				br i1 %cmp, label %for.body, label %for.end

				for.end: ; preds = %for.body
				ret void
				}
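
The test3 kernel pairs a 32-bit index with a 64-bit trip count. The sketch below illustrates the condition under which the 32-bit index stays within signed range for the whole loop, assuming the loop shape used in this test (the body runs at least once, then repeats while `i + 1 < len`). The function name `lastIndexFitsSigned32` is hypothetical, and this is only an illustration of the property the `<nssw>` assumption expresses, not the check that SCEV actually emits.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns true if a 32-bit index that starts at 0 and is
// incremented once per iteration never exceeds INT32_MAX inside the loop body.
// Assumption: the body executes max(1, len) times, as in a do-while loop.
bool lastIndexFitsSigned32(int64_t len) {
  int64_t iterations = std::max<int64_t>(1, len);
  int64_t lastIndex = iterations - 1;  // largest index used for an access
  return lastIndex <= INT32_MAX;
}
```

With `len` equal to 2^31 the last index used is 2^31 - 1, which still fits; with one more iteration the index would have to wrap in 32 bits, so the addresses would no longer follow a single linear expression.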