This is an archive of the discontinued LLVM Phabricator instance.

Allow for C's "writing off the end" idiom in __builtin_object_size
ClosedPublic

Authored by george.burgess.iv on Sep 11 2015, 4:29 PM.

Details

Reviewers
rsmith
Summary

In C, a common idiom is:

struct Foo { int a; char cs[1] };
struct Foo *F = (struct Foo *)malloc(sizeof(Foo) + strlen(SomeString));
strcpy(F->cs, SomeString);

Currently, __builtin_object_size does not allow for this, which breaks some existing code. This patch makes us answer conservatively in Clang if the following conditions are met:

  • Type is 1 or 3
  • The Base is invalid/can't be determined
  • The subobject we're referencing is the last subobject in the struct
  • The subobject we're referencing is an array with 0 or 1 elements (for 0 elements, both char foo[] and char foo[0] syntaxes are supported)

Diff Detail

Event Timeline

george.burgess.iv retitled this revision from to Allow for C's "writing off the end" idiom in __builtin_object_size.
george.burgess.iv updated this object.
george.burgess.iv added a reviewer: rsmith.
rsmith added inline comments.Sep 11 2015, 4:46 PM
lib/AST/ExprConstant.cpp
165

array -> array element.

4455–4458

I think you should bail out above, with the base set to the MemberExpr and with an empty designator, rather than applying a member designator on top of a base that already refers to the member (and then needing to undo the effect on the offset here).

6471

This only seems necessary when Type is 1; for type 3, returning 1 would be conservatively correct as a lower bound on the known-accessible storage, and is better than giving up here and evaluating the object size as 0.

6472–6473

Ideally, we should walk the designator and make sure that at each step we picked the last subobject (last array element, last field, ...) -- that is, we want to know that we're really looking at a trailing flexible array member, and not just a size-1 array in the middle of some object.

You should also check that the designator refers to the most-derived object (that is, that Entries.size() == MostDerivedPathLength), because given:

struct A { char buffer[10]; };
struct B : A {};
struct C { B b[1]; } *c;
int m = __builtin_object_size(c->b[0], 1);
int n = __builtin_object_size((A*)c->b[0], 1);

... we should presumably compute m == -1 but n == 10, because for n the A subobject of the B object is known to have size 10, even though we're looking at a base subobject of a size-1 trailing array.

george.burgess.iv marked 4 inline comments as done.

Addressed all feedback -- added a walk of the Designator as suggested.

Regarding isDesignatorAtObjectEnd: I'm assuming that the Index returned by FieldDescriptor::getFieldIndex has some bearing on the actual position of the field in its parent. Specifically, that if FD->getFieldIndex() + 1 == numFieldsIn(FD->getParent()), then FD is the last field in its parent. I can't find documentation on this guarantee though, so I'm happy to swap to walking the offsets of each field in FD->getParent() if we feel that would be a more robust approach.

lib/AST/ExprConstant.cpp
4455–4458

With the new approach, this change isn't even necessary anymore :)

6471

Good catch

6472–6473

You should also check that the designator refers to the most-derived object (that is, that Entries.size() == MostDerivedPathLength), because given

That's a fun case! Thanks for catching that.

Ideally, we should walk the designator and make sure that at each step we picked the last subobject (last array element, last field, ...) -- that is, we want to know that we're really looking at a trailing flexible array member, and not just a size-1 array in the middle of some object

That's the goal of the AtEndOfObject check below. :) I guess it would fail with things like union { char c[1]; int a; }; though. Added.

FWIW, I'm really interested in seeing this patch in!

Thanks,
Michael

rsmith added inline comments.Oct 8 2015, 3:07 PM
lib/AST/ExprConstant.cpp
6313–6316

BaseType = getType(Base);

6317–6320

This seems like the wrong way to handle this case (we'll be computing the wrong type for such LValues in other cases). In LValueExprEvaluatorBase::VisitMemberExpr, we should use

Result.setInvalid(E);
return false;

instead of

Result.setInvalid(E->getBase());
// fall through to perform member access

I thought we'd already made that change, but it appears not.

6333–6335

You should check that the ArrayIndex is 1 in this case (a pointer to the __real__ component doesn't point to the end of the object).

6345–6350

Not updating BaseType will cause odd things to happen on the later iterations of this loop. You should probably just return false here.

george.burgess.iv marked 4 inline comments as done.

Addressed all feedback.

Also, updated object-size tests to use CHECK-LABEL instead of CHECK, because yay I’m learning how to do things properly.

lib/AST/ExprConstant.cpp
6333–6335

Nice catch!

rsmith accepted this revision.Oct 15 2015, 5:30 PM
rsmith edited edge metadata.

Some minor typographical comments.

Please add some tests for the union case, then this LGTM.

lib/AST/ExprConstant.cpp
6333–6336

OK, but please reflect that this is specific to __builtin_object_size in the name or at least doc comment for this function.

6361

This is missing a noun.

6362

I think this would be clearer if it were named hasNonemptyDesignator... or perhaps reverse the sense of it and name it refersToCompleteObject or similar?

This revision is now accepted and ready to land.Oct 15 2015, 5:30 PM
george.burgess.iv marked 3 inline comments as done.

r250488. Thanks for the review!