Page MenuHomePhabricator

[BasicAA] Use MayAlias instead of PartialAlias for fallback.

Authored by Meinersbur on Jun 17 2017, 10:57 AM.



Using various methods, BasicAA tries to determine whether two GetElementPtr memory locations alias when its base pointers are known to be equal. When none of its heuristics are applicable, it falls back to PartialAlias to, according to a comment, protect TBAA making a wrong decision in case of unions and malloc. PartialAlias is not correct, because a PartialAlias result implies that some, but not all, bytes overlap which is not necessarily the case here.

AAResults returns the first analysis result that is not MayAlias. BasicAA is always the first alias analysis. When it returns PartialAlias, no other analysis is queried to give a more exact result (which was the intention of returning PartialAlias instead of MayAlias). For instance, ScopedAA could return a more accurate result.

The PartialAlias hack was introduced in r131781 (and re-applied in r132632 after some reverts) to fix where TBAA returns a wrong NoAlias result due to a union. No test case for the malloc case mentioned in the comment was provided.

Since r303851 ( clang does emit specific TBAA for unions anymore (but "omnipotent char" instead). This BasicAA hack only hides the problem in some cases where GetElementPtr is used on a common base pointer. In many other situation, it would still return MayAlias even on union types.

The hack prevents the SLPVectorizer to vectorize in some cases. It is not able to vectorize:

void partialalias(void * restrict Cp, void * restrict Ap,void * restrict Bp, float delta, int64_t i, int64_t j) {
  float *C = Cp;
  float *A = Ap;
  float *B = Bp;
  int64_t shl = i << 3l;
  int64_t base = 16*j;
  int64_t i0 = base + (shl | 0);
  int64_t i1 = base + (shl | 1);
  int64_t i2 = base + (shl | 2);
  int64_t i3 = base + (shl | 3);
  C[i0] = delta * A[i0] * B[i0];
  C[i1] = delta * A[i1] * B[i1];
  C[i2] = delta * A[i2] * B[i2]; 
  C[i3] = delta * A[i3] * B[i3]; 

but the following:

void noalias(float (* restrict C)[2028], float (* restrict A)[2028], float (* restrict B)[2028], float delta, int64_t i, int64_t j) {
  int64_t shl = i << 3;
  int64_t base = 16*j;
  int64_t i0 = base + (shl | 0);
  int64_t i1 = base + (shl | 1);
  int64_t i2 = base + (shl | 2);
  int64_t i3 = base + (shl | 3);
  C[0][i0] = delta * A[0][i0] * B[0][i0];
  C[0][i1] = delta * A[0][i1] * B[0][i1];
  C[0][i2] = delta * A[0][i2] * B[0][i2]; 
  C[0][i3] = delta * A[0][i3] * B[0][i3]; 

In the latter, BasicAA uses the knowledge that i0 is even, but i1 is odd, from DemandedBits. This information is not used in the former case because it is the first index of a GetElementPtr instruction. The SLPVectorizer on the other hand uses ScalarEvolution to determine whether two accesses are consecutive, which can handle this case. In general, SCEV-AA might be more suited to handle aliasing of pointers of the pattern "BasePtr + dynamic offset".

Polly emits additional scoped-aa metadata to mark the pointers as non-aliasing. However, due to BasicAA returning PartialAlias, the ScopedAA is effectively ignored. This can cause a Polly-optimized function to be slower being inlined than without inlining because the BasicAA falls back to PartialAlias since the index expression is more complicated. File

contains some Polly-generated code where this is case and is the same code but without inlining of the hostspot code, which is 4x faster thanks to the SLPVectorizer.

Note that with this patch, SLPVectorizer is still not able to vectorize the "partialalias()" function above without an additional alias analysis (or applying DemandedBits also to the first index of a GetElementPtr).

Diff Detail

Event Timeline

Meinersbur created this revision.Jun 17 2017, 10:57 AM
sanjoy added a subscriber: sanjoy.Jun 17 2017, 11:56 AM
hfinkel accepted this revision.Jun 19 2017, 7:16 PM

I'd also like to get rid of this hack, and now that the union situation is resolved (at least in the sense that the metadata is now conservatively correct), I think that we should try. Assuming that self hosting and an associated test suite run is clean, this LGTM.

This revision is now accepted and ready to land.Jun 19 2017, 7:16 PM
This revision was automatically updated to reflect the committed changes.