This is an archive of the discontinued LLVM Phabricator instance.

Add support for attribute "overallocated"
Needs RevisionPublic

Authored by ahatanak on Jun 16 2016, 4:13 PM.

Details

Summary

This patch adds support for attribute "overallocated", which will be used to indicate a union or struct is over-allocated. This is needed to have __builtin_object_size correctly compute the size of an object when malloc allocates extra space at the end of a struct or union. For example,

struct S {

int a;
char s[32];

};

void *p = malloc(sizeof(S) + 64);
unsigned s = builtin_object_size(((struct S*)p)->s, 1); // "s" should be 32+64=96, not 32.

The link to the relevant discussion on cfe-dev is here:

http://lists.llvm.org/pipermail/cfe-dev/2016-March/047782.html

Diff Detail

Event Timeline

ahatanak updated this revision to Diff 61047.Jun 16 2016, 4:13 PM
ahatanak retitled this revision from to Add support for attribute "overallocated".
ahatanak updated this object.
ahatanak added subscribers: cfe-commits, dexonsmith, hfinkel.
rsmith added inline comments.Jun 16 2016, 5:23 PM
include/clang/Basic/AttrDocs.td
2073–2079

I don't think this says what you mean. "Overallocated" alone does not imply that you can use S::b to access more than 4 bytes. If you want that to work, I think the right model is for the attribute to imply that the final member within the struct is treated as a flexible array member, no matter what its array bound is.

This should then apply to everywhere we consider flexible array members, not just to builtin_object_size.

ahatanak updated this revision to Diff 61145.Jun 17 2016, 6:32 PM

Fix a bug in tryEvaluateBuiltinObjectSize. If the pointer passed to builtin_object_size doesn't point to an array, it should be able to compute the exact size of the subobject the pointer points to. Therefore, it should be able to tell the last call to builtin_object_size in test/CodeGen/object-size.c should return 128.

ahatanak added inline comments.Jun 17 2016, 6:49 PM
include/clang/Basic/AttrDocs.td
2073–2079

I intended overallocated to mean there might be extra memory at the end of a struct or union regardless of whether or not the last member is an array. This seemed more useful than restricting it to structs with array members as was discussed in the cfe-dev thread. For example, Hal mentioned that we might have caught the bugs you fixed in r262891 had we used this attribute on over-allocated classes.

Does this sound reasonable to you or do you see any problem with this approach? Alternatively, we can use an attribute (variable_length_array or gcc's bnd_variable_size) that will be attached to the array, which is good enough for the use case we care about (we want __builtin_object_size to return an exact or a conservative value for struct sockaddr_un that was mentioned in the thread).

aaron.ballman added inline comments.Jun 20 2016, 7:24 AM
include/clang/Basic/Attr.td
2279

Can drop the "Expected" string literal. A record type should automatically do the right thing.

lib/Sema/SemaDeclAttr.cpp
4932

This isn't required.

5398

Can just call handleSimpleAttribute() instead.

Thanks for the patch!

lib/AST/ExprConstant.cpp
1055

Did you mean : 1 here?

ahatanak updated this revision to Diff 61743.Jun 23 2016, 4:40 PM
ahatanak edited edge metadata.

Address review comments and change the wording in AttrDocs.td to explain what the attribute means and how it is used. Also, fixed the code in VisitMemberExpr to set LValue::OverAllocated before the base class of the member expression is visited. This change was needed because the base class of the innermost MemberExpr has to be examined to see whether the member belongs to an overallocated class. Without this change, the last check in test/CodeGen/object-size.c fails.

aaron.ballman requested changes to this revision.Jun 27 2016, 11:02 AM
aaron.ballman edited edge metadata.

Missing Sema tests for the attribute.

include/clang/Basic/AttrDocs.td
2082

I *think* this code example will give doxygen fits because it's not indented; have you tried generating the docs locally? You can test this by running: clang-tblgen -gen-attr-docs -I E:\llvm\llvm\tools\clang\include E:\llvm\llvm\tools\clang\include\clang\Basic\Attr.td -o E:\llvm\llvm\tools\clang\docs\AttributeReference.rst and then building the docs as normal. Note, you will have to revert AttributeReference.rst when you are done.

lib/AST/ExprConstant.cpp
1051–1057

All three of these should be using unsigned, otherwise you do not get the behavior you expect in MSVC (it allocates bit-fields of different types on their own boundaries).

This revision now requires changes to proceed.Jun 27 2016, 11:02 AM
rsmith requested changes to this revision.Jun 27 2016, 1:02 PM
rsmith edited edge metadata.
rsmith added inline comments.
include/clang/Basic/AttrDocs.td
2073–2079

No, this approach is not reasonable. Just changing what __builtin_object_size returns does not change the fact that code that tries to use bytes off the end of the struct would have undefined behavior. Lying in the result of __builtin_object_size is actively harmful.

Note that in your example below, you cannot access more than four chars through ((struct S*)p)->b, despite the attribute, because the attribute does not affect the behaviour of the array member of S.

The right thing to do here would presumably be to have an attribute that makes an array be treated as a flexible array member, *even if* its bound is specified (and greater than 0). This would affect __builtin_object_size, sanitizers, alias analysis, diagnostics for flexible array members in the middle of a type, and so on.

I think you are right. There are other places that need to be fixed to properly support over-allocated structures. I'll see if I can come up with a patch that treats the over-allocated array as flexible array member.

ahatanak updated this revision to Diff 62560.Jul 1 2016, 4:06 PM
ahatanak edited edge metadata.

The new patch defines a new attribute "flexible_array", which gets attached to the last array member of a struct.

I made changes to clang to treat arrays marked "flexible_array" as C99's flexible array members where it made sense to do so. There are several places where C99's flexible arrays and "flexible_array" are treated differently. For example, the attribute doesn't change the way arguments are passed or values are returned from a function. It doesn't change objective-c's type encoding either.

aaron.ballman added inline comments.Jul 5 2016, 7:15 AM
include/clang/AST/Decl.h
3249 ↗(On Diff #62560)

Does this require a bit, or can this simply be looked up as needed?

include/clang/Basic/Attr.td
18

Instead of "field", how about "non-static data member"? That should make it more clear as to what it pertains to.

2281

Should be able to drop the string literal here; it should work by default. If it doesn't, then ClangAttrEmitter.cpp should be updated to properly handle it.

include/clang/Basic/DiagnosticSemaKinds.td
2165 ↗(On Diff #62560)

This should be updated to use a %select instead of passing string literals (for eventual localization purposes).

2520 ↗(On Diff #62560)

non-static data members instead of fields.

lib/AST/ExprConstant.cpp
170

Is the bit required here as well, or can this be queried as-needed?

1125

Perhaps this should be a default argument, since it's false almost everywhere?

5168–5169

Should be const qualified.

lib/Sema/SemaDeclAttr.cpp
1808–1809

Should be const

ahatanak updated this revision to Diff 63299.Jul 8 2016, 12:45 PM
ahatanak edited edge metadata.

Address review comments.

ahatanak marked 7 inline comments as done.Jul 8 2016, 12:55 PM
ahatanak added inline comments.
include/clang/AST/Decl.h
3249 ↗(On Diff #63299)

Probably it can be looked up although it would require incrementing the field_iterator until it reaches the last non-static data member of the record.

lib/AST/ExprConstant.cpp
170

I don't think there is a way to do that reliably. The FieldDecl for the array isn't always available in struct LValue, as far as I can tell, so it looks like we'll need a bit here.

aaron.ballman added inline comments.Jul 11 2016, 11:02 AM
include/clang/AST/Decl.h
3249 ↗(On Diff #63299)

Ah shoot, I forgot that these are forward iterators, not bidirectional ones. I think the bit is probably the better way to go; we have bits to spare here.

include/clang/Basic/DiagnosticSemaKinds.td
2170 ↗(On Diff #63299)

I think "members of structs or classes" and "the last member of a struct" could be combined into "the last member of a non-union class", instead of using separate diagnostic text. What do you think?

2171 ↗(On Diff #63299)

Since you can only have one such member, I think we want to drop the plural from both of these. I think the first may read better as "a non-flexible array member" (since there are "arrays" and "flexible arrays", but not "fixed-sized arrays" in our diagnostics), and the second may read better as "an array member that has at least one element".

lib/AST/ExprConstant.cpp
170

Okay, that seems reasonable to me -- I don't think anyone is relying on that bit being stolen from MostDerivedPathLength.

ahatanak updated this revision to Diff 63596.Jul 11 2016, 3:14 PM

Change diagnostic messages.

include/clang/Basic/DiagnosticSemaKinds.td
2170 ↗(On Diff #63299)

I agree, I think these two diagnostics should be combined.

aaron.ballman accepted this revision.Jul 21 2016, 8:28 AM
aaron.ballman edited edge metadata.

LGTM with one small testing nit, but you should wait for @rsmith to chime in since he had comments previously.

test/SemaCXX/flexible-array-attr.cpp
52 ↗(On Diff #63596)

Would be good to change this to b[1] so that it looks like an otherwise well-formed construct. Should not change the diagnostic.

rsmith added inline comments.Dec 2 2016, 2:43 PM
include/clang/AST/Decl.h
3250 ↗(On Diff #63596)

How is this different from HasFlexibleArrayMember? Do we really need both?

include/clang/Basic/AttrDocs.td
2076–2077

Instead of "This attribute is useful when you want", I would suggest "This attribute causes", since that is far from the only effect. It should also disable the sanitizer checks for array bounds violations, for example.

include/clang/Basic/DiagnosticSemaKinds.td
2520 ↗(On Diff #62560)

Again, this will be confusing in C. Our diagnostics generally use the word "field" to refer to fields in C or non-static data members in C++.

2170 ↗(On Diff #63299)

We shouldn't talk about classes when targeting C.

2173 ↗(On Diff #63596)

I don't think it's very clear what "nested" means here. I assume you mean that a struct with a flexible_array member can't be used as the type of a field. If so, why not? C allows that for other kinds of flexible array members.

lib/AST/ExprConstant.cpp
287

Pointer arithmetic that leaves the initial portion of a member with the flexible_array attribute will lose its designator here. Is that what you want?

5167

Should we also set HasFlexibleArrayAttr on *real* flexible array members (which will be fields with IncompleteArrayType) rather than invalidating the designator?

lib/CodeGen/CGExpr.cpp
695–696 ↗(On Diff #63596)

This is wrong. We shouldn't disable all bounds checking for all array members in a class just because it ends in a flexible array member: only that member should have its checks disabled. You can reverse the sense of this as a fast-path for the common case of a class without a flexible array member, though.

ahatanak added inline comments.Dec 8 2016, 3:23 PM
include/clang/AST/Decl.h
3250 ↗(On Diff #63596)

As I commented below, we don't need both if we are going to treat an array with flexible_array attribute like other existing flexible arrays.

include/clang/Basic/DiagnosticSemaKinds.td
2173 ↗(On Diff #63596)

Yes, that's what I meant. I disallowed using struct with a flexible_array member to be used as a field in another struct just because the use case I had in mind didn't need that (see the link below). We don't need to restrict its usage if we are going to handle it like other existing flexible array members.

http://lists.llvm.org/pipermail/cfe-dev/2016-February/047561.html

lib/AST/ExprConstant.cpp
5167

I'm assuming you are also suggesting we call Result.addArray on "real" flexible arrays too?

ahatanak added inline comments.Dec 9 2016, 6:20 PM
include/clang/AST/Decl.h
3250 ↗(On Diff #63596)

I forgot to mention this, but one reason I didn't treat arrays with flexible_array attribute as C99 flexible arrays is that there are a couple of places in IRGen (TargetInfo.cpp) that call hasFlexibleArrayMember() to decide how to pass arguments to functions. For example, when I compile the following code with "-arch arm64", the first function receives a pointer while the second one receives a value:

typedef struct {
  int a;
  int b[5];
} S0;

typedef struct {
  int a;
  int b[];
} S1;

S0 g0;
S1 g1;

void foo0(S0 s0) {
  g0 = s0;
}

void foo1(S1 s1) {
  g1 = s1;
}

I agree with you that treating flexible_arrays as C99 flexible arrays will probably keep the code simple and clean, but I have to see if anyone is using structs with flexible arrays in way that will cause ABI breakage.

ahatanak updated this revision to Diff 81090.Dec 12 2016, 8:48 AM
ahatanak edited edge metadata.

Address review comments.

  • Arrays marked "flexible_array" are now treated as flexible arrays.
  • __builtin_object_size returns a more accurate numbers for normal C99 flexible arrays (see test/CodeGen/object-size.c).

Note that a couple of tests fail with this patch applied since this patch depends on https://reviews.llvm.org/D24999.

This revision now requires changes to proceed.Apr 28 2017, 2:37 PM