This is an archive of the discontinued LLVM Phabricator instance.

[RFC][AliasAnalysis][BasicAA] Return MayAlias for the pointer plus variable offset to structure object member
ClosedPublic

Authored by shiva0217 on Apr 10 2018, 8:16 PM.

Details

Summary

In the following case, pointer addoff calculate by "&x.q + off" may have a chance alias to x.p structure member.

struct X {
  int *p;
  int *q;
};
int foo(int i, int j, int off)
{
  struct X x;
  int **t, *addoff;
  x.p = &i;
  x.q = &j;
  t = &x.q;
  t += off;
  addoff = *t;
  return *addoff;
}

Diff Detail

Repository
rL LLVM

Event Timeline

shiva0217 created this revision.Apr 10 2018, 8:16 PM
shiva0217 retitled this revision from [RFC][BasicAA] Return MayAlias for the pointer plus variable offset to structure object member to [RFC][AliasAnalysis][BasicAA] Return MayAlias for the pointer plus variable offset to structure object member.
shiva0217 edited the summary of this revision. (Show Details)Apr 10 2018, 11:21 PM
efriedma added inline comments.
lib/Analysis/BasicAliasAnalysis.cpp
1164 ↗(On Diff #141945)

*indices.

You should probably fix the big block comment as well; "we can't count on the constant offsets that come from non-struct sources" isn't an accurate description of the LangRef rules for GEPs.

shiva0217 updated this revision to Diff 142108.Apr 11 2018, 7:18 PM
shiva0217 added inline comments.
lib/Analysis/BasicAliasAnalysis.cpp
1164 ↗(On Diff #141945)

Hi @efriedma. Thanks for your comments. I update the comment block. Is it ok?

efriedma added inline comments.Apr 12 2018, 12:47 PM
lib/Analysis/BasicAliasAnalysis.cpp
1164 ↗(On Diff #141945)

Could you reorganize the code so it looks more like if (!DecompGEP.VarIndices.empty()) return false; int64_t GEPBaseOffset = DecompGEP.StructOffset + DecompGEP.OtherOffset;?

This C case is actually not legal (it's been discussed before many times).
You cannot take the address of .a field, add an offset, and get to another field legally.
So i wouldn't express it in these terms, since in that language, this is clearly illegal.

The LLVM version, of course, does not make this undefined behavior AFAIK (which is a source of much imprecision)

This C case is actually not legal (it's been discussed before many times).
You cannot take the address of .a field, add an offset, and get to another field legally.
So i wouldn't express it in these terms, since in that language, this is clearly illegal.

The LLVM version, of course, does not make this undefined behavior AFAIK (which is a source of much imprecision)

Hi @dberlin. Thanks for the comments. So it's a language restriction which is illegal! Would you mind point me out which rule in C99 mention this restriction or the threads relative to this topic we could follow?

Ultimately the undefined behavior comes out of section 6.5.6 of the C standard: "If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined."

The LLVM version, of course, does not make this undefined behavior AFAIK (which is a source of much imprecision)

I agree. I believe this is UB in C/C++, but is well defined at the IR level (and, thus, we should still fix this).

shiva0217 updated this revision to Diff 142361.EditedApr 13 2018, 2:28 AM

Hi @dberlin, @efriedma, @hfinkel,

I have found some discussion about the memory layout of the structural member.
It couldn't guarantee due to the padding and alignment are according to CPU target and not restrict in C language.
So pointer arithmetic for structure member will produce undefined behavior.
Thanks for pointing me out this.

Still, @hfinkel's point seems reasonable because GEP has well defined its behavior.
So should we return MayAlias in this case?

efriedma accepted this revision.Apr 13 2018, 11:54 AM

LGTM.

Yes, this patch is correct.

This revision is now accepted and ready to land.Apr 13 2018, 11:54 AM
This revision was automatically updated to reflect the committed changes.