This is an archive of the discontinued LLVM Phabricator instance.

Add #pragma clang loop vectorize_assume_alignment(n)
Needs ReviewPublic

Authored by m-happy on Nov 6 2019, 8:25 AM.

Details

Summary

This patch adds functionality '#pragma clang loop vectorize_assume_alignment'. The load/store access in for loop following the '#pragma clang loop vectorize_assume_alignment(n)' uses n bit for align access.

#pragma clang loop vectorize_assume_alignment(n)
for(){
  ....
}

This pragma depends on D69900

Diff Detail

Event Timeline

m-happy created this revision.Nov 6 2019, 8:25 AM
lebedev.ri added inline comments.
clang/docs/LanguageExtensions.rst
3135

What does this actually mean, in the end?
Will it assume that whatever alignment is required
for whatever vectorization width chosen,
is the actual alignment?

m-happy edited the summary of this revision. (Show Details)Nov 6 2019, 9:08 AM
m-happy edited the summary of this revision. (Show Details)
hfinkel added inline comments.Nov 6 2019, 9:10 AM
clang/docs/LanguageExtensions.rst
3135

Also, just arrays, or also pointers?

hfinkel added inline comments.Nov 6 2019, 10:49 AM
clang/docs/LanguageExtensions.rst
3135

Currently, it is using 32 bit.

You'll need to use the preferred alignment of the element type.

By pointers, if you mean pointers to arrays? Then Yes.

No, I mean pointers.

What do you intend for these cases:

double A[Size], B[Size];

void foo() {
#pragma clang loop aligned
  for (int i = 0; i < n; ++i) {
    A[i] = B[i] + 1;
  }
}

or this:

void foo(double *A, double *B) {
#pragma clang loop aligned
  for (int i = 0; i < n; ++i) {
    A[i] = B[i] + 1;
  }
}

or this:

void foo(double *A, double *B, double *C) {
#pragma clang loop aligned
  for (int i = 0; i < n; ++i) {
    A[i] = B[i] + *C + 1;
  }
}

or this:

void foo(double *A, double *B) {
#pragma clang loop aligned
  for (int i = 0; i < n; ++i) {
    *A++ = *B++ + 1;
  }
}

what about arrays of structs?

Please see my comments in D69900 as well.

For the font-end side, I suggest the syntax vectorize_assume_alignment(32) instead. We also need to define what this assumes to be aligned. The first element of the array? Every array element? In an array-of-structs, the struct of the element that is accessed? Scalars?

Please see my comments in D69900 as well.

For the font-end side, I suggest the syntax vectorize_assume_alignment(32) instead. We also need to define what this assumes to be aligned. The first element of the array? Every array element? In an array-of-structs, the struct of the element that is accessed? Scalars?

It would be really best if this could just make use of CodeGenFunction::EmitAlignmentAssumption(),
else this is a heavy hammer with loose handle.

It would be really best if this could just make use of CodeGenFunction::EmitAlignmentAssumption(),
else this is a heavy hammer with loose handle.

Since it is a property of the data structure, not of the vectorization, __builtin_assume_aligned might be preferable anyway.

m-happy retitled this revision from Add #pragma clang loop aligned to Add #pragma clang loop vectorize_assume_alignment(n).Nov 13 2019, 2:27 PM
m-happy edited the summary of this revision. (Show Details)
m-happy edited the summary of this revision. (Show Details)
m-happy updated this revision to Diff 229190.Nov 13 2019, 2:32 PM

Updated the syntax as suggested by Michael Kruse, and added functionality to specify the number used for alignment instead of just using 32bits.

To reword, if vectorize_assume_alignment(32) is *NOT* lowered
via CodeGenFunction::EmitAlignmentAssumption() in clang frontend,
but lowered into LoopAttributes::VectorizeAssumeAlignment,
then once the alignment that was specified does not match reality,
there will be a miscompile instead of an UBSan error.

Could you elaborate why this is better than __builtin_assume_aligned?