Page MenuHomePhabricator

[PATCH] Fix a bug about memory dependence checking in loop vectorization
Needs ReviewPublic

Authored by Jiangning on Dec 19 2014, 2:27 AM.
This revision needs review, but there are no reviewers specified.

Details

Reviewers
None
Summary

In memory dependence checking module of loop vectorization, the algorithm tries to collect memory access candidates from AliasSetTracker AccessAnalysis::processMemAccesses, and then check memory dependences one another in MemoryDepChecker::isDependent.

The memory accesses are unique in AliasSetTracker, and a single memory access in AliasSetTracker may map to multiple entries in 'PtrAccessSet Accesses' of AccessAnalysis, which could cover both 'read' and 'write'. Originally the algorithm only checked 'write' entry in Accesses if only 'write' exists by using statement "bool IsWrite = S.count(MemAccessInfo(Ptr, true));". This is incorrect and the consequence is it ignored all read access, and finally some RAW and WAR dependence are missed.

The attached test case exposed a loop body like below,

{code}

// loop body
   ... = a[i]          (1)
    ... = a[i+1]      (2)
 .......
a[i+1] = ....           (3)
   a[i] = ...            (4)

{code}

If we ignore two reads, the dependence between (1) and (3) after vectorization would not be able to be captured, and finally this loop will be incorrectly vectorized.

The fix in this patch simply inserts a new loop to find all entries in Accesses. Since it will skip most of all other memory accesses by checking the Value pointer at the very beginning of the loop, I think it will not increase compile-time visibly.

Thanks,
-Jiangning

Diff Detail

Repository
rL LLVM

Event Timeline

Jiangning updated this revision to Diff 17484.Dec 19 2014, 2:27 AM
Jiangning retitled this revision from to [PATCH] Fix a bug about memory dependence checking in loop vectorization.
Jiangning updated this object.
Jiangning edited the test plan for this revision. (Show Details)
Jiangning set the repository for this revision to rL LLVM.
Jiangning updated this object.
Jiangning updated this object.
Jiangning updated this object.
Jiangning added a subscriber: Unknown Object (MLST).

Hi Jiangning,

I think the test case can be simplified so that we can understand the problem easily. I've simplified it into

define void @test_loop_novect(double** %Array_.i, i64 %indvars.iv1438) {
for.body209.lr.ph:
  %t275 = load double** %Array_.i, align 8
  br label %for.body209

for.body209:                                      ; preds = %for.body209, %for.body209.lr.ph
  %indvars.iv1436 = phi i64 [ 0, %for.body209.lr.ph ], [ %indvars.iv.next1437, %for.body209 ]
  %arrayidx.i954 = getelementptr inbounds double* %t275, i64 %indvars.iv1436
  %indvars.iv.next1437 = add nuw nsw i64 %indvars.iv1436, 1
  %arrayidx.i997 = getelementptr inbounds double* %t275, i64 %indvars.iv.next1437
  %t282 = load double* %arrayidx.i954, align 8
  %t284 = load double* %arrayidx.i997, align 8
  store double %t282, double* %arrayidx.i997, align 8
  store double %t284, double* %arrayidx.i954, align 8
  %exitcond1441 = icmp eq i64 %indvars.iv1436, %indvars.iv1438
  br i1 %exitcond1441, label %invoke.cont239, label %for.body209

invoke.cont239:                                   ; preds = %for.body209
  ret void
}

Then it's easier to know that the case is like

tmp1 = a[i]
tmp2 = a[i+1]
a[i+1] = tmp1
a[i] = tmp2

Besides that, I think the fix itself is reasonable and LGTM.

Jiangning updated this revision to Diff 17786.Jan 5 2015, 1:10 AM

Following Hao's feedback, I got the test case simplified.