This is an archive of the discontinued LLVM Phabricator instance.

[Sema] Instead of rejecting C unions with non-trivial fields, detect attempts to destruct/initialize/copy them.
ClosedPublic

Authored by ahatanak on Jun 24 2019, 11:59 PM.

Details

Summary

This patch diagnoses uses of non-trivial C unions in the following contexts:

  • function parameters
  • function returns
  • variables with automatic storage duration
  • assignment
  • compound literal with automatic storage duration
  • block capture except when a __block variable is captured by a non-escaping block

Note that we already reject non-trivial C structs/unions used for variable function arguments. Also this patch doesn't detect non-trivial C unions passed to functions without function prototypes. I think that's okay since clang will reject the function definition's parameters, but if it isn't okay, it's possible to add diagnostics for that too.

See the discussion in https://reviews.llvm.org/D62988 for more background.

rdar://problem/50679094

Diff Detail

Repository
rL LLVM

Event Timeline

ahatanak created this revision.Jun 24 2019, 11:59 PM
ahatanak planned changes to this revision.Jun 26 2019, 1:13 PM

This patch should diagnose lvalue-to-rvalue conversion of volatile non-trivial C unions too, but it currently doesn't.

void test(volatile union U0 *a) {
  (void)*a;
}
rjmccall added inline comments.Jun 26 2019, 1:55 PM
include/clang/Basic/DiagnosticSemaKinds.td
633 ↗(On Diff #206368)

Should this be "`%select{contains|is}2 a union that is non-trivial to..."? None of these situations bans non-trivial types, just unions containing them (and aggregates containing such unions).

include/clang/Sema/Sema.h
2116 ↗(On Diff #206368)

If this is parallel with the diagnostic, I'd suggest mentioning that in the comment.

2121 ↗(On Diff #206368)

Same typo in both of these.

2126 ↗(On Diff #206368)

Please expand this comment to clarify that this it's only used when we (might be) initializing an object by copying another.

Why is this specific to automatic storage duration? Just because C doesn't allow other initializers to be non-constant in the first place?

lib/Sema/SemaDecl.cpp
11085 ↗(On Diff #206368)

Why is this okay? Don't we need to check default-initialization for these?

11089 ↗(On Diff #206368)

Please include a comment like:

// Assume all other initializers involving copying some existing object.
// TODO: ignore any initializers where we can guarantee copy-elision
ahatanak marked 2 inline comments as done.Jun 26 2019, 6:01 PM
ahatanak added inline comments.
include/clang/Sema/Sema.h
2126 ↗(On Diff #206368)

Yes, that was my reasoning. We want to reject copy initializations of non-trivial C unions with automatic storage duration since the compiler doesn't know how to do the copy, but I don't think variables with static duration have that problem since the initializers are constant, which enables them to be initialized according to the rules described in the C standard.

lib/Sema/SemaDecl.cpp
11085 ↗(On Diff #206368)

I didn't consider an ImplicitValueInitExpr to be a default-initialization since IRGen currently emits a memcpy to initialize the field instead of calling a synthesized default-initialization function as it does when a local variable that's a non-trivial C struct is declared without an initializer.

typedef struct {
  id f0, f1;
} S0 ;

typedef struct {
  S0 s0;
  int f2;
} S;

void test(void) {
  // emits a memcpy of a global constant.
  S s = { .f2 = 1 };
}

It's not clear to me whether this is a default-initialization, but if it is, we should check default-initialization here and IRGen should be fixed to emit a call to the default-initialization function.

rjmccall added inline comments.Jun 26 2019, 10:00 PM
lib/Sema/SemaDecl.cpp
11085 ↗(On Diff #206368)

C does not use the terms "default-initialization" and "value-initialization", but the the initialization performed for static/thread storage duration objects generally matches what C++ would call value-initialization. These rules are laid out in section 6.7.9 of the standard, which also includes the following in its description of the rules for list-initializations of aggregates (regardless of their storage duration):

p19: ...all subobjects that are not initialized explicitly shall be initialized implicitly the same as objects that have static storage duration.

So s.s0 must be initialized as if a static object (in C++ terms, value-initialized), not left with an indeterminate value like an object of automatic storage duration without an initializer would be (in C++ terms, default-initialized).

I assume that the default-initialization function for a non-trivial C struct only initializes the fields that have non-trivial default-initialization. That's not good enough for value-initialization, which is required to recursively value-initialize *all* of the fields (and even zero-initialize any padding). Note that that's the *only* difference between these two kinds of initialization. Furthermore, if you think about what default-initialization actually means for all our leaf non-trivial C types, it's just assigning a predictable bit-pattern (e.g. a null pointer for ARC's __strong and __weak references), not doing anything that requires executing tailored code at runtime. So here's what that tells us:

  • We can't call the default-initialization function because it might leave trivial fields uninitialized (in general, although struct S0 in your example doesn't have any, so effectively it does a full value-initialization).
  • memcpy from a global constant is always a valid implementation of value-initialization for non-trivial C types. It's also a valid implementation for default-implementation; it just might do more than it really needs to. (In the case of struct S0, it would be *more efficient* to use memset, but memcpy is still *valid*.)

    Now, maybe someday we'll add a non-trivial C type that really requires non-trivial code to initialize, but I doubt we'd want to, because we'd have to forbid e.g. declaring global variables of that type, and we wouldn't have a simple answer for telling people how to properly initialize a dynamically-allocated object. Even if we do, we can wait until then to generalize.
  • Regardless, yes, this is a value-initialization and so it should check whether we can value-initialize the type. Since value-initialization just means default-initialization plus a bunch of zero-initialization (which we can assume we can always do), that just means checking whether we can default-initialize the type.
ahatanak updated this revision to Diff 206995.Jun 27 2019, 11:29 PM
ahatanak marked 7 inline comments as done.

Address review comments. Diagnose lvalue-to-rvalue conversion of volatile non-trivial C types.

ahatanak added inline comments.Jun 27 2019, 11:33 PM
include/clang/Basic/DiagnosticSemaKinds.td
633 ↗(On Diff #206368)

Yes, that's correct. We only ban aggregates containing non-trivial unions (the former case) and non-trivial unions (the latter case).

rjmccall added inline comments.Jun 27 2019, 11:42 PM
include/clang/Sema/Sema.h
2124 ↗(On Diff #206995)

"default-initialized", please.

lib/Sema/SemaExpr.cpp
16218 ↗(On Diff #206995)

It looks like you're generally warning about this based on the specific context that forced an lvalue-to-rvalue conversion. I'm not sure volatile is special except that we actually perform the load even in unused-value contexts. Is the assumption that you've exhaustively covered all the other contexts of lvalue-to-rvalue conversions whose values will actually be used? That seems questionable to me.

ahatanak marked an inline comment as done.Jun 28 2019, 9:43 AM
ahatanak added inline comments.
lib/Sema/SemaExpr.cpp
16218 ↗(On Diff #206995)

Yes, that was my assumption. All the other contexts where lvalue-to-rvalue conversion is performed and the result is used are already covered by other calls sites of checkNonTrivialCUnion, which informs the users that the struct/union is being used in an invalid context.

Do you have a case in mind that I didn't think of where a lvalue-to-rvalue conversion requires a non-trivial initialization/destruction/copying of a union but clang fails to emit any diagnostics?

Also I realized that lvalue-to-rvalue conversion of volatile types doesn't always require non-trivial destruction, so I think CheckDestruct shouldn't be set in this case.

void foo(U0 *a, volatile U0 *b) {
  // this doesn't require destruction.
  // this is perfectly valid if U0 is non-trivial to destruct but trivial to copy.
  *a = *b;  
}

For the same reason, I think CheckDestruct shouldn't be set for function returns (but it should be set for function parameters since they are destructed by the callee).

rjmccall added inline comments.Jun 28 2019, 12:10 PM
lib/Sema/SemaExpr.cpp
16218 ↗(On Diff #206995)

There are a *lot* of places that trigger lvalue-to-rvalue conversion. Many of them aren't legal with structs (in C), but I'm worried about approving a pattern with the potential to be wrong by default just because we didn't think about some weird case. As an example, casts can trigger lvalue-to-rvalue conversion; I think the only casts allowed with structs are the identity cast and the cast to void, but those are indeed allowed. Now, a cast to void means the value is ignored, so we can elide a non-volatile load in the operand, and an identity cast isn't terminal; if the idea is that we're checking all the *terminal* uses of a struct r-value, then we're in much more restricted territory (and don't need to worry about things like commas and conditional operators that can propagate values out). But this still worries me.

I'm not sure we need to be super-pedantic about destructibility vs. copyability in some of this checking. It's certain possible in C++, but I can't imagine what sort of *primitive* type would be trivially copyable but not trivially destructible. (The reverse isn't true: something like a relative pointer would be non-trivially copyable but still trivially destructible.)

Is there any way to make this check cheaper so that we can immediately avoid any further work for types that are obviously copyable/destructible? All the restricted types are (possibly arrays of) record types, right?

ahatanak marked an inline comment as done.Jun 28 2019, 3:29 PM
ahatanak added inline comments.
lib/Sema/SemaExpr.cpp
16218 ↗(On Diff #206995)

I'm not sure I fully understand what you are saying, but by "cheaper", do you mean fewer and simpler rules for allowing or disallowing non-trivial unions even when that might result in rejecting unions used in contexts in which non-trivial initialization/destruction/copying is not required? If so, we can probably diagnose any lvalue-to-rvalue conversions regardless of whether the source is volatile if the type is either non-trivial to copy or destruct.

rjmccall added inline comments.Jun 28 2019, 7:39 PM
lib/Sema/SemaExpr.cpp
16218 ↗(On Diff #206995)

Sorry, that point was separate from the discussion of volatile and lvalue-to-rvalue conversions. I mean that you're changing a lot of core paths in Sema, and it would be nice if we could very quickly decide based on the type that no restrictions apply instead of having to make a function call, a switch, and a bunch of other calls in order to realize that e.g. void* never needs additional checking. Currently you have a !CPlusPlus check in front of all the checkNonTrivialCUnion calls; I would like something that reliably avoids doing this work for the vast majority of types that are not restricted, even in C.

ahatanak marked an inline comment as done.Jul 1 2019, 12:43 PM
ahatanak added inline comments.
lib/Sema/SemaExpr.cpp
16218 ↗(On Diff #206995)

Instead of checking !CPlusPlus, we can call isNonTrivialPrimitiveCType (which is deleted in this patch) to do a cheaper check of whether the type requires any non-trivial default-initialize/destruct/copy functions. The function still uses the copy visitor, so if we want to make the check even cheaper, we can add a flag to RecordDecl that indicates whether it contains a non-trivial union.

rjmccall added inline comments.Jul 1 2019, 2:29 PM
lib/Sema/SemaExpr.cpp
16218 ↗(On Diff #206995)

I think it would be sufficient for the fast path to just check for a C record type (i.e. !isa<CXXRecordDecl>(RT->getDecl())). But it doesn't seem unreasonable to record non-copyable/destructible/defaultable types on the RecordDecl if we have the bits.

ahatanak updated this revision to Diff 207600.Jul 2 2019, 12:13 PM

Call hasNonTrivialPrimitiveCStruct to check whether the type is a non-trivial C struct/union before calling checkNonTrivialCUnion. Fix comments.

ahatanak marked an inline comment as done.Jul 2 2019, 12:22 PM
ahatanak added inline comments.
lib/Sema/SemaExpr.cpp
16218 ↗(On Diff #206995)

I don't think we can call !isa<CXXRecordDecl> if the type is an array? I created a new function hasNonTrivialPrimitiveCStruct instead that checks whether the type is a non-trivial C struct or union.

ahatanak marked an inline comment as done.Jul 2 2019, 12:31 PM
ahatanak added inline comments.
lib/Sema/SemaExpr.cpp
16218 ↗(On Diff #206995)

Actually, it's not wrong. It only excludes C++ records, so the function call will be made for all the other types including C structs/unions or arrays of them that are trivial.

rjmccall added inline comments.Jul 2 2019, 1:38 PM
include/clang/AST/Type.h
1133 ↗(On Diff #207600)

You only want these checks to trigger on unions with non-trivial members (or structs containing them), right? How about something like hasNonTrivialPrimitiveCUnionMember()? Or maybe make it more descriptive for the use sites, like isPrimitiveCRestrictedType()?

Also, it would be nice if the fast path of this could be inlined so that clients usually didn't had to make a call at all. You can write the getBaseElementTypeUnsafe()->getAs<RecordType>() part in an inline implementation at the bottom this file.

lib/Sema/SemaExpr.cpp
16218 ↗(On Diff #206995)

I was skipping a few steps, but yes, you'd need to look through array types (which you can do "unsafely" because you don't need to care about top-level qualification), then check if the base element type was a RecordType.

ahatanak marked an inline comment as done.Jul 3 2019, 1:11 PM
ahatanak added inline comments.
include/clang/AST/Type.h
1133 ↗(On Diff #207600)

Since we don't keep track of whether a struct or union is or contains unions with non-trivial members, we'll have to use the visitors to detect such structs or unions or, to do it faster, add a bit to RecordDeclBits that indicates the presence of non-trivial unions. I guess it's okay to add another bit to RecordDeclBits?

rjmccall added inline comments.Jul 3 2019, 6:02 PM
include/clang/AST/Type.h
1133 ↗(On Diff #207600)

It looks like there's plenty of space in RecordDeclBits, yeah.

ahatanak updated this revision to Diff 208861.Jul 9 2019, 6:31 PM
ahatanak marked 2 inline comments as done.

Add a bit to RecordDeclBitfields that indicates whether the record has a non-trivial C union member.

Thanks, that looks great. A few more requests and then this will be ready to go, I think.

include/clang/AST/DeclBase.h
1454 ↗(On Diff #208861)

This needs to be updated.

include/clang/AST/Type.h
1133 ↗(On Diff #207600)

This comment seems like the right place to explain what makes a union non-trivial in C (that it contains a member which is non-trivial for *any* of the reasons that a type might be non-trivial).

1238 ↗(On Diff #208861)

This should be static, right?

lib/Sema/SemaDecl.cpp
11648 ↗(On Diff #208861)

Please add a comment explaining why this is specific to local variables.

12053 ↗(On Diff #208861)

Please add a comment explaining why this is specific to local variables.

lib/Sema/SemaExpr.cpp
6097 ↗(On Diff #208861)

Can we extract a common function that checks the initializer expression of a non-trivial C union? I think there are at least three separate places that do that in this patch, and it's not too hard to imagine that we might want to add more cases to the common analysis.

ahatanak marked 3 inline comments as done.Jul 10 2019, 3:18 PM
ahatanak added inline comments.
lib/Sema/SemaExpr.cpp
6097 ↗(On Diff #208861)

Do you mean we should have something like bool checkNonTrivialCUnionInInitList(const Expr *Init) so that we don't have to call dyn_cast? If Init is an InitListExpr, the function diagnoses non-trivial C unions types and returns true, otherwise it returns false.

rjmccall added inline comments.Jul 10 2019, 4:16 PM
lib/Sema/SemaExpr.cpp
6097 ↗(On Diff #208861)

Well, the goal is not to eliminate a dyn_cast, it's to generally perform whatever checking is necessary for an initializer. Currently all of our checking is specific to InitListExpr, but there might be other cases we'll want to care about, in which case we'll probably care about them in all three of these places.

ahatanak updated this revision to Diff 209108.Jul 10 2019, 6:16 PM
ahatanak marked 3 inline comments as done.

Address review comments.

ahatanak marked 2 inline comments as done.Jul 10 2019, 6:48 PM
ahatanak added inline comments.
lib/Sema/SemaDecl.cpp
12053 ↗(On Diff #208861)

I was trying to explain why this should be specific to local variables and realized that it's not clear to me whether it should be.

Suppose there is a union with two fields that are both non-trivial:

union U {
  Type A a;
  Type B a;
};

U global;

In this case, is value-initialization (which is essentially default-initialization plus a bunch of zero-initialization as per our previous discussion) used to initialize global? If so, should we reject the code since it requires default-initialization? It should be fine if we can assume default-initialization means zero-initialization for non-trivial types in C, but what if TypeA or TypeB requires initializing to a non-zero value?

rjmccall added inline comments.Jul 10 2019, 11:30 PM
lib/Sema/SemaDecl.cpp
12053 ↗(On Diff #208861)

Yeah, the default-initialization dimension of this problem is interesting. The C++ rule makes sense for C++ because default initialization of a C++ class requires an actual, arbitrary-side-effects constructor call, which of course you can't reasonably do implicitly for a union member. As discussed previously, non-trivial C types can presumably always be default-initialized with a constant bit pattern. That means that, as long as we can do any initialization work at all, then it's in principle not a problem as long as the bit pattern is the same for all the union members requiring non-trivial initialization (and in particular if there's only one such member). So it's just like you say, we *could* just initialize such unions conservatively as long as two different members don't require inconsistent patterns, which in practice they currently never do. That's all true independent of storage duration — if we can write that pattern into a global, we can write into a local. The only caveat is that a semantic need for non-trivial default initialization almost certainly means that there's a semantic need for non-trivial destruction as well, which of course can't be done on a local union (but isn't a problem for a global because we just don't destroy them).

On the other hand, on a language level it's much simpler to just say that we can't default-initialize a union of any storage duration if it has a non-trivial member, and then the language rule doesn't depend on bit-level representations. If there's interest, we can look into weakening that rule later by saying that e.g. it's possible to default-initialize a union with at most one non-trivial member.

Apropos, do we consider unions with non-trivial members to be non-trivial members for the purposes of enclosing unions? Seems like we should. Probably the most sensible way to handle that is to also flag the union as being non-trivial in a dimension if it has a member that's non-trivial in that dimension (which might also let you fast-path some of the checking you need to do). Essentially, we'd consider the case where copying is impossible to be a subset of the case where copying is non-trivial.

ahatanak updated this revision to Diff 209565.Jul 12 2019, 12:41 PM
ahatanak marked an inline comment as done.

Diagnose C union globals that are non-trivial to default-initialize. Add 3 bits to RecordDeclBitfields for default-initialize, destruct, and copy to fast-path checking.

lib/Sema/SemaDecl.cpp
12053 ↗(On Diff #208861)

Yes, this patch does consider unions with non-trivial members to be non-trivial members for the purposes of enclosing unions.

I've made changes that make clang diagnose global variables that are or have C union types that are non-trivial to default-initialize. This disallows declaring global C union variables that have ObjC ARC pointer fields, but we can relax this later if users want them.

Thanks, just a few minor comment requests now.

include/clang/AST/DeclBase.h
1453 ↗(On Diff #209565)

Please include in these comments that these imply the associated basic non-triviality predicates.

include/clang/AST/Type.h
1133 ↗(On Diff #207600)

Okay, if we're tracking these separately, please put separate comments on each. Also, please mention in each comment that this implies the associated basic non-triviality predicate.

lib/Sema/SemaDecl.cpp
12053 ↗(On Diff #208861)

Well, presumably you're only diagnosing them if they're actually default-initialized. Users have an easy workaround if they actually want to declare a global union containing a __strong reference: they can just initialize the member they actually want to initialize.

ahatanak updated this revision to Diff 209622.Jul 12 2019, 3:37 PM
ahatanak marked 3 inline comments as done.

Improve comments.

ahatanak updated this revision to Diff 209626.Jul 12 2019, 3:49 PM

In Type.h, move method declarations down and mention that the predicates imply the associated basic non-triviality predicates.

rjmccall accepted this revision.Jul 12 2019, 4:14 PM

Thanks!

This revision is now accepted and ready to land.Jul 12 2019, 4:14 PM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptJul 12 2019, 6:50 PM
cfe/trunk/lib/Serialization/ASTWriterDecl.cpp