This is an archive of the discontinued LLVM Phabricator instance.

[PATCH] Fix a bug about memory dependence checking in loop vectorization
Needs Review (Public)

Authored by Jiangning on Dec 19 2014, 2:27 AM.

This revision needs review, but there are no reviewers specified.

Details

Reviewers: None

Summary

In the memory dependence checking module of the loop vectorizer, the algorithm collects memory access candidates from the AliasSetTracker in AccessAnalysis::processMemAccesses, and then checks the memory dependences between them in MemoryDepChecker::isDependent.

Memory accesses are unique in the AliasSetTracker, but a single memory access there may map to multiple entries in 'PtrAccessSet Accesses' of AccessAnalysis, covering both a 'read' and a 'write'. Originally, the algorithm handled only the 'write' entry in Accesses whenever a 'write' existed, by using the statement "bool IsWrite = S.count(MemAccessInfo(Ptr, true));". This is incorrect: the read access to the same pointer is ignored, and as a result some RAW and WAR dependences are missed.
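To make the problem concrete, here is a minimal standalone sketch of the old query. It does not use LLVM types: `ToyAccess`, `ToyAccessSet`, and `collectOld` are hypothetical stand-ins for MemAccessInfo, PtrAccessSet, and the body of the original loop, and pointers are modeled as plain integers.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-in for MemAccessInfo: (pointer id, is-write flag).
using ToyAccess = std::pair<int, bool>;
// Hypothetical stand-in for PtrAccessSet.
using ToyAccessSet = std::set<ToyAccess>;

// Mirrors the old logic: each pointer yields exactly one access, and that
// access is flagged as a write as soon as a write entry exists in the set.
std::vector<ToyAccess> collectOld(const ToyAccessSet &S,
                                  const std::vector<int> &Ptrs) {
  std::vector<ToyAccess> Out;
  for (int Ptr : Ptrs) {
    bool IsWrite = S.count({Ptr, true}) != 0;
    Out.push_back({Ptr, IsWrite}); // the read entry, if any, is dropped here
  }
  return Out;
}
```

With a set holding a read and a write entry for each of two pointers, `collectOld` returns only two accesses, both marked as writes, so the reads never become dependence candidates.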

The attached test case exposes a loop body like the one below:

{code}

// loop body
... = a[i]        (1)
... = a[i+1]      (2)
.......
a[i+1] = ...      (3)
a[i] = ...        (4)

{code}

If we ignore the two reads, the dependence between (1) and (3) cannot be captured: (3) writes a[i+1] in iteration i, and (1) reads the same element in iteration i+1. As a result, the loop is incorrectly vectorized.
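The effect of the missed dependence can be reproduced outside of LLVM. The following is only a sketch: `runScalar` and `runVectorized` are hypothetical helpers, the trip count is simplified to size - 1, and "vectorization" is modeled naively as doing all loads of a block of VF iterations before all of its stores.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar loop from the summary: t1 = a[i]; t2 = a[i+1]; a[i+1] = t1; a[i] = t2.
std::vector<double> runScalar(std::vector<double> a) {
  for (std::size_t i = 0; i + 1 < a.size(); ++i) {
    double t1 = a[i];
    double t2 = a[i + 1];
    a[i + 1] = t1;
    a[i] = t2;
  }
  return a;
}

// Naive model of vectorizing by VF: per block, statements (1)-(4) each run
// for all VF lanes before the next statement starts.
std::vector<double> runVectorized(std::vector<double> a, std::size_t VF) {
  if (a.size() < 2 || VF == 0)
    return a;
  const std::size_t n = a.size() - 1; // number of scalar iterations
  std::size_t i = 0;
  for (; i + VF <= n; i += VF) {
    std::vector<double> t1(VF), t2(VF);
    for (std::size_t l = 0; l < VF; ++l) t1[l] = a[i + l];     // (1)
    for (std::size_t l = 0; l < VF; ++l) t2[l] = a[i + l + 1]; // (2)
    for (std::size_t l = 0; l < VF; ++l) a[i + l + 1] = t1[l]; // (3)
    for (std::size_t l = 0; l < VF; ++l) a[i + l] = t2[l];     // (4)
  }
  for (; i < n; ++i) { // scalar remainder
    double t1 = a[i];
    double t2 = a[i + 1];
    a[i + 1] = t1;
    a[i] = t2;
  }
  return a;
}
```

For the input {1, 2, 3, 4, 5}, the scalar loop yields {2, 3, 4, 5, 1}, while the VF = 2 model yields {2, 3, 4, 5, 4}: the second lane of (1) reads a[i+1] before the first lane of (3) has written it.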

The fix in this patch simply adds an inner loop that finds all entries in Accesses for each pointer. Since the loop skips most of the other memory accesses by comparing the Value pointer at the very beginning of each iteration, I think it will not increase compile time noticeably.
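The shape of the fix can be sketched in isolation as follows. This is not the patch itself: `Access`, `AccessSetModel`, and `collectAll` are hypothetical names, pointers are modeled as integers, and only the "visit every matching entry, skip the rest early" structure is taken from the description above.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-in for MemAccessInfo: (pointer id, is-write flag).
using Access = std::pair<int, bool>;
// Hypothetical stand-in for PtrAccessSet.
using AccessSetModel = std::set<Access>;

// For each pointer, visit every entry of the access set and keep all entries
// that refer to that pointer, so a read and a write are both collected.
std::vector<Access> collectAll(const AccessSetModel &S,
                               const std::vector<int> &Ptrs) {
  std::vector<Access> Out;
  for (int Ptr : Ptrs) {
    for (const Access &AC : S) {
      if (AC.first != Ptr)
        continue; // cheap early skip for unrelated entries
      Out.push_back(AC);
    }
  }
  return Out;
}
```

In this model, a set with a read and a write entry for each of two pointers produces four accesses instead of two. The cost is one integer comparison per (pointer, entry) pair, which matches the expectation that the extra loop is cheap.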

Thanks,
-Jiangning

Diff Detail

Repository: rL LLVM

Event Timeline

Jiangning updated this revision to Diff 17484. Dec 19 2014, 2:27 AM

Jiangning retitled this revision from to [PATCH] Fix a bug about memory dependence checking in loop vectorization.

Jiangning updated this object.

Jiangning edited the test plan for this revision.

Jiangning set the repository for this revision to rL LLVM.

Jiangning updated this object.

Jiangning added a subscriber: Unknown Object (MLST).

Hi Jiangning,

I think the test case can be simplified so that the problem is easier to understand. I've simplified it to:

define void @test_loop_novect(double** %Array_.i, i64 %indvars.iv1438) {
for.body209.lr.ph:
  %t275 = load double** %Array_.i, align 8
  br label %for.body209

for.body209:                                      ; preds = %for.body209, %for.body209.lr.ph
  %indvars.iv1436 = phi i64 [ 0, %for.body209.lr.ph ], [ %indvars.iv.next1437, %for.body209 ]
  %arrayidx.i954 = getelementptr inbounds double* %t275, i64 %indvars.iv1436
  %indvars.iv.next1437 = add nuw nsw i64 %indvars.iv1436, 1
  %arrayidx.i997 = getelementptr inbounds double* %t275, i64 %indvars.iv.next1437
  %t282 = load double* %arrayidx.i954, align 8
  %t284 = load double* %arrayidx.i997, align 8
  store double %t282, double* %arrayidx.i997, align 8
  store double %t284, double* %arrayidx.i954, align 8
  %exitcond1441 = icmp eq i64 %indvars.iv1436, %indvars.iv1438
  br i1 %exitcond1441, label %invoke.cont239, label %for.body209

invoke.cont239:                                   ; preds = %for.body209
  ret void
}

Then it's easier to see that the case looks like this:

tmp1 = a[i]
tmp2 = a[i+1]
a[i+1] = tmp1
a[i] = tmp2

Besides that, I think the fix itself is reasonable and LGTM.

Following Hao's feedback, I simplified the test case.

Revision Contents

Path	Size
lib/Transforms/Vectorize/LoopVectorize.cpp	105 lines
test/Transforms/LoopVectorize/loop-vect-memdep.ll	26 lines

Diff 17786

lib/Transforms/Vectorize/LoopVectorize.cpp

Property	Old Value	New Value
File Mode	100644	100755

Context not available.

Old:

  bool UseDeferred = SetIteration > 0;
  PtrAccessSet &S = UseDeferred ? DeferredAccesses : Accesses;

  for (auto A : AS) {
    Value *Ptr = A.getValue();
    bool IsWrite = S.count(MemAccessInfo(Ptr, true));

    // If we're using the deferred access set, then it contains only reads.
    bool IsReadOnlyPtr = ReadOnlyPtr.count(Ptr) && !IsWrite;
    if (UseDeferred && !IsReadOnlyPtr)
      continue;
    // Otherwise, the pointer must be in the PtrAccessSet, either as a read
    // or a write.
    assert(((IsReadOnlyPtr && UseDeferred) || IsWrite ||
            S.count(MemAccessInfo(Ptr, false))) &&
           "Alias-set pointer not in the access set?");

    MemAccessInfo Access(Ptr, IsWrite);
    DepCands.insert(Access);

    // Memorize read-only pointers for later processing and skip them in the
    // first round (they need to be checked after we have seen all write
    // pointers). Note: we also mark pointer that are not consecutive as
    // "read-only" pointers (so that we check "a[b[i]] +="). Hence, we need
    // the second check for "!IsWrite".
    if (!UseDeferred && IsReadOnlyPtr) {
      DeferredAccesses.insert(Access);
      continue;
    }

    // If this is a write - check other reads and writes for conflicts. If
    // this is a read only check other writes for conflicts (but only if
    // there is no other write to the ptr - this is an optimization to
    // catch "a[i] = a[i] + " without having to do a dependence check).
    if ((IsWrite || IsReadOnlyPtr) && SetHasWrite) {
      CheckDeps.insert(Access);
      IsRTCheckNeeded = true;
    }

    if (IsWrite)
      SetHasWrite = true;

    // Create sets of pointers connected by a shared alias set and
    // underlying object.
    typedef SmallVector<Value *, 16> ValueVector;
    ValueVector TempObjects;
    GetUnderlyingObjects(Ptr, TempObjects, DL);
    for (Value *UnderlyingObj : TempObjects) {
      UnderlyingObjToAccessMap::iterator Prev =
          ObjToLastAccess.find(UnderlyingObj);
      if (Prev != ObjToLastAccess.end())
        DepCands.unionSets(Access, Prev->second);

      ObjToLastAccess[UnderlyingObj] = Access;
    }
  }
}

New:

  bool UseDeferred = SetIteration > 0;
  PtrAccessSet &S = UseDeferred ? DeferredAccesses : Accesses;

  for (auto AV : AS) {
    Value *Ptr = AV.getValue();

    // For a single memory access in AliasSetTracker, Accesses may contain
    // both read and write, and they both need to be handled for CheckDeps.
    for (auto AC : S) {
      if (AC.getPointer() != Ptr)
        continue;

      bool IsWrite = AC.getInt();

      // If we're using the deferred access set, then it contains only
      // reads.
      bool IsReadOnlyPtr = ReadOnlyPtr.count(Ptr) && !IsWrite;
      if (UseDeferred && !IsReadOnlyPtr)
        continue;
      // Otherwise, the pointer must be in the PtrAccessSet, either as a
      // read or a write.
      assert(((IsReadOnlyPtr && UseDeferred) || IsWrite ||
              S.count(MemAccessInfo(Ptr, false))) &&
             "Alias-set pointer not in the access set?");

      MemAccessInfo Access(Ptr, IsWrite);
      DepCands.insert(Access);

      // Memorize read-only pointers for later processing and skip them in
      // the first round (they need to be checked after we have seen all
      // write pointers). Note: we also mark pointer that are not
      // consecutive as "read-only" pointers (so that we check
      // "a[b[i]] +="). Hence, we need the second check for "!IsWrite".
      if (!UseDeferred && IsReadOnlyPtr) {
        DeferredAccesses.insert(Access);
        continue;
      }

      // If this is a write - check other reads and writes for conflicts. If
      // this is a read only check other writes for conflicts (but only if
      // there is no other write to the ptr - this is an optimization to
      // catch "a[i] = a[i] + " without having to do a dependence check).
      if ((IsWrite || IsReadOnlyPtr) && SetHasWrite) {
        CheckDeps.insert(Access);
        IsRTCheckNeeded = true;
      }

      if (IsWrite)
        SetHasWrite = true;

      // Create sets of pointers connected by a shared alias set and
      // underlying object.
      typedef SmallVector<Value *, 16> ValueVector;
      ValueVector TempObjects;
      GetUnderlyingObjects(Ptr, TempObjects, DL);
      for (Value *UnderlyingObj : TempObjects) {
        UnderlyingObjToAccessMap::iterator Prev =
            ObjToLastAccess.find(UnderlyingObj);
        if (Prev != ObjToLastAccess.end())
          DepCands.unionSets(Access, Prev->second);

        ObjToLastAccess[UnderlyingObj] = Access;
      }
    }
  }
}

Context not available.

test/Transforms/LoopVectorize/loop-vect-memdep.ll

This file was added.

target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

; RUN: opt < %s -S -loop-vectorize -debug-only=loop-vectorize 2>&1 | FileCheck %s

; CHECK: LV: Can't vectorize due to memory conflicts

define void @test_loop_novect(double** %arr, i64 %n) {
for.body.lr.ph:
  %t = load double** %arr, align 8
  br label %for.body

for.body:                                      ; preds = %for.body, %for.body.lr.ph
  %i = phi i64 [ 0, %for.body.lr.ph ], [ %i.next, %for.body ]
  %a = getelementptr inbounds double* %t, i64 %i
  %i.next = add nuw nsw i64 %i, 1
  %a.next = getelementptr inbounds double* %t, i64 %i.next
  %t1 = load double* %a, align 8
  %t2 = load double* %a.next, align 8
  store double %t1, double* %a.next, align 8
  store double %t2, double* %a, align 8
  %c = icmp eq i64 %i, %n
  br i1 %c, label %final, label %for.body

final:                                         ; preds = %for.body
  ret void
}