This is an archive of the discontinued LLVM Phabricator instance.

Differential D21773

[clang] Update an optimization remark test for change D18777
Abandoned, Public

Authored by lihuang on Jun 27 2016, 4:12 PM.

Details

Reviewers

reames

Summary

Update an optimization remark test for change D18777.

This test checks the loop-vectorization remarks emitted when the pointer-checking threshold is exceeded. The change in D18777 introduces zexts that cannot be removed, which changes the reported "loop not vectorized" reason and breaks this test.

This patch changes the offsets to 1 so that indvars can eventually remove the zexts (this appears to rely on some SCEV mechanisms). Since the purpose of this test is to check the vectorization options, the offset values don't matter.

Event Timeline

lihuang updated this revision to Diff 62024. Jun 27 2016, 4:12 PM

lihuang retitled this revision to [clang] Update an optimization remark test for change D18777.

lihuang updated this object.

lihuang added reviewers: sanjoy, reames.

lihuang added a subscriber: cfe-commits.

lihuang mentioned this in D18777: [ValueTracking] An improvement to IR ValueTracking on Non-negative Integers. Jun 27 2016, 4:19 PM

Sounds plausible, but I don't know this area (optimization remarks) well enough to sign off on this. @anemet can you please take a look?

This test checks the loop-vectorization remarks emitted when the pointer-checking threshold is exceeded. The change in D18777 introduces zexts that cannot be removed, which changes the reported "loop not vectorized" reason and breaks this test.

Can you please elaborate? The "loop not vectorized" reason is changed, to what?

Hi Adam,

The change in D18777 breaks this test because it converts some sexts to zexts, which indvar-simplification cannot eliminate after widening the IV.

The IR after indvar-simplification and before loop-vectorization is like:

...
%indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %for.body.preheader ]
...
%11 = add nuw nsw i64 %indvars.iv, 2            // i + 2
%12 = trunc i64 %11 to i32
%idxprom2047 = zext i32 %12 to i64
%arrayidx21 = getelementptr inbounds i32, i32* %C, i64 %idxprom2047
...
%14 = add nuw nsw i64 %indvars.iv, 3            // i + 3
%15 = trunc i64 %14 to i32
%idxprom2448 = zext i32 %15 to i64
...

The IV is promoted to 64 bits, but the trunc/zext pair cannot be eliminated (at least not with the -O1 pass pipeline). The optimization remark then becomes:

optimization-remark-options.c:17:3: remark: loop not vectorized: cannot identify array bounds
   [-Rpass-analysis=loop-vectorize]
for (int i = 0; i < N; i++) {
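To make the trunc/zext discussion concrete, here is a minimal C sketch of what the cast pairs compute on a widened 64-bit index. The helper names are hypothetical and are not taken from LLVM. It illustrates why the pair can only be dropped once the value is proven to fit in 32 bits, and why sext and zext agree when the 32-bit value is known to be non-negative.

```c
#include <stdint.h>

/* zext(trunc(x)): keep the low 32 bits, then zero-extend to 64 bits. */
static uint64_t zext_trunc(uint64_t x) {
  return (uint64_t)(uint32_t)x;
}

/* sext(trunc(x)): keep the low 32 bits, then sign-extend to 64 bits.
   Written with explicit arithmetic so the result does not depend on
   implementation-defined integer conversions. */
static int64_t sext_trunc(uint64_t x) {
  uint32_t low = (uint32_t)x;
  return (low & 0x80000000u) ? (int64_t)low - 4294967296LL : (int64_t)low;
}

/* The trunc/zext pair is redundant exactly when the round trip
   returns the original 64-bit value, i.e. when x < 2^32. */
static int zext_round_trip_is_identity(uint64_t x) {
  return zext_trunc(x) == x;
}
```

For a small index such as 7, both helpers return 7 and the casts are redundant. For 0x80000001 the two extensions disagree (the sign-extended result is negative), and for 0x100000001 the zero-extended round trip yields 1 rather than the original value.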

In D21773#469596, @lihuang wrote:
The IV is promoted to 64 bits, but the trunc/zext pair cannot be eliminated (at least not with the -O1 pass pipeline). The optimization remark then becomes:
optimization-remark-options.c:17:3: remark: loop not vectorized: cannot identify array bounds
   [-Rpass-analysis=loop-vectorize]
for (int i = 0; i < N; i++) {

That sounds like an optimization regression. It seems to me that you could create a testcase with fewer arrays than in the above test such that you don't exceed the max number of memchecks. This new testcase would be vectorized before D18777 but not after.

Adam

You are right. A regression test could be:

void foo2(int *dw, int *uw, int *A, int *B, int *C, int *D, int N) {
  for (int i = 0; i < N; i++) {
    dw[i] = A[i] + B[i - 1] + C[i - 2];
    uw[i] = A[i] + B[i + 2];
  }
}

We need to fix the fundamental problem.
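As a rough illustration of why dropping one array keeps the loop under the memcheck limit, the sketch below counts the pointer pairs that would need a runtime overlap check. It assumes one check per pair in which at least one pointer is written, and a limit of 8. Both are assumptions made for this sketch, not values confirmed in this review, and the function name is hypothetical.

```c
/* Assumed limit on runtime memory checks; not confirmed by this review. */
enum { ASSUMED_MEMCHECK_THRESHOLD = 8 };

/* Count pointer pairs needing an overlap check under the assumption that
   a pair needs a check only if at least one of its pointers is written.
   Pairs of read-only pointers cannot conflict and are not counted. */
static unsigned count_memchecks(unsigned written, unsigned read_only) {
  unsigned write_write = written * (written - 1) / 2;
  unsigned write_read = written * read_only;
  return write_write + write_read;
}

/* Returns nonzero if the assumed limit would be exceeded. */
static int exceeds_threshold(unsigned written, unsigned read_only) {
  return count_memchecks(written, read_only) > ASSUMED_MEMCHECK_THRESHOLD;
}
```

Under these assumptions, the original foo2 (dw and uw written; A, B, C, D read) needs 9 checks and exceeds the limit, while the reduced loop above (A, B, C read) needs 7 and stays below it.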

lihuang mentioned this in D18867: [IndVarSimplify] Eliminate zext of a signed IV when the IV is known to be non-negative. Jul 14 2016, 10:15 AM

sanjoy resigned from this revision. Aug 5 2016, 6:51 PM

sanjoy removed a reviewer: sanjoy.

lihuang abandoned this revision. Aug 10 2016, 3:05 PM

Revision Contents

Path	Size
test/Frontend/optimization-remark-options.c	2 lines

Diff 62024

test/Frontend/optimization-remark-options.c

Show All 10 Lines	double foo(int N) {
   return v;
 }

 // CHECK: {{.*}}:17:3: remark: loop not vectorized: cannot prove it is safe to reorder memory operations; allow reordering by specifying '#pragma clang loop vectorize(enable)' before the loop. If the arrays will always be independent specify '#pragma clang loop vectorize(assume_safety)' before the loop or provide the '__restrict__' qualifier with the independent array arguments. Erroneous results will occur if these options are incorrectly applied!

 void foo2(int *dw, int *uw, int *A, int *B, int *C, int *D, int N) {
   for (int i = 0; i < N; i++) {
     dw[i] = A[i] + B[i - 1] + C[i - 2] + D[i - 3];
-    uw[i] = A[i] + B[i + 1] + C[i + 2] + D[i + 3];
+    uw[i] = A[i] + B[i + 1] + C[i + 1] + D[i + 1];
   }
 }