This is an archive of the discontinued LLVM Phabricator instance.

Add an attribute to allow fields of non-trivial types in C unions
AbandonedPublic

Authored by ahatanak on Jun 6 2019, 3:45 PM.

Details

Reviewers
rjmccall
rsmith
Summary

clang currently disallows fields of non-trivial types (e.g., __strong) in unions in C mode since it's not possible for the compiler to determine how the unions should be initialized, destructed, or copied.

This patch adds support for a new attribute non_trivial_union_member, which causes fields annotated with the attribute to be trivial when computing the trivialness of the containing union. This means the users are responsible for patching up the code so that the union is initialized, destructed, and copied in a functionally correct way.

rdar://problem/50591731

Diff Detail

Repository
rC Clang

Event Timeline

ahatanak created this revision.Jun 6 2019, 3:45 PM

Does this lead to C/C++ ABI mismatches? Should we just honor this in C++ as well by ignoring it when deciding to delete special members? Is taking such a general name a good idea if it's language-specific? Richard, thoughts?

Does this lead to C/C++ ABI mismatches? Should we just honor this in C++ as well by ignoring it when deciding to delete special members? Is taking such a general name a good idea if it's language-specific? Richard, thoughts?

This is a C-only attribute, so clang will emit a diagnostic (warning 'attribute ignored') if the attribute is used to annotate a member of a C++ union. I think that would be sufficient to prevent possible C/C++ ABI mismatches?

Does this lead to C/C++ ABI mismatches? Should we just honor this in C++ as well by ignoring it when deciding to delete special members? Is taking such a general name a good idea if it's language-specific? Richard, thoughts?

This is a C-only attribute, so clang will emit a diagnostic (warning 'attribute ignored') if the attribute is used to annotate a member of a C++ union. I think that would be sufficient to prevent possible C/C++ ABI mismatches?

Is there a way to write a C++ union that would be ABI-compatible with a C union with this attribute?

How do you write correct (non-leaking, non-double-freeing, non-releasing-invalid-pointers) code with this attribute? For example, suppose I have a __strong union member: does storing to it release the old value (which might be a different union member)? If so, how do you work around that? https://clang.llvm.org/docs/AutomaticReferenceCounting.html#ownership-qualified-fields-of-structs-and-unions should be updated to say what happens. If manual reference counting code is required to make any use of this feature correct (which seems superficially likely), is this a better programming model than __unsafe_unretained?

Unions with non-trivial members are only permitted in C++11 onwards; should we allow the attribute in C++98 mode? But more than that, unions with non-trivial members require user-provided special members in C++11 onwards, meaning that a union using this attribute in C would have a different calling convention than the "natural" equivalent in C++. So I'm also wondering if we should allow this in all language modes.

How do you write correct (non-leaking, non-double-freeing, non-releasing-invalid-pointers) code with this attribute? For example, suppose I have a __strong union member: does storing to it release the old value (which might be a different union member)? If so, how do you work around that? https://clang.llvm.org/docs/AutomaticReferenceCounting.html#ownership-qualified-fields-of-structs-and-unions should be updated to say what happens. If manual reference counting code is required to make any use of this feature correct (which seems superficially likely), is this a better programming model than __unsafe_unretained?

I should first make it clear that this attribute is a dangerous feature that makes it easier for users to write incorrect code if they aren't careful.

To write correct code, users have to manually (I don't mean manual-retain-release, this is compiled with ARC) do assignments to the fields that are annotated with the attribute to patch up the code generated by the compiler since the compiler isn't generating the kind of special functions it generates for non-trivial structs. For example:

union U1 {
  id __weak __attribute__((non_trivial_union_member)) a;
  id __attribute__((non_trivial_union_member)) b;
};
  
id getObj(int);

void foo() {
  union U1 u1 = { .b = 0}; // zero-initialize 'b'.
  u1.b = getObj(1); // assign to __strong field 'b'.
  u1.b = getObj(2); // retain and assign another object to 'b' and release the previously referenced object.
  u1.b = 0; // release the object.
  id t = getObj(3);
  u1.a = t; // assign to __weak field 'a'.
  u1.a = 0; // unregister as a __weak object.
}

struct S1 {
  union U1 f0;
  int f1;
};

void foo1() {
  struct S1 s1 = { .f0 = 0};
  s1.f0.a = getObj(4);
  struct S1 s2 = s1; // this is just a memcpy
  s2.f0.a = s1.f0.a;  // manually copy __weak field 'a'.
  s2.f0.a = 0;  // unregister as a __weak object.
}

Storing to a __strong union member does release the old value. The old value can be a different union member and I think that should be fine as long as the other member is also __strong. If the other member has a different lifetime qualifier, I believe it has to be nulled out before assigning the new object as shown in the example above.

Unions with non-trivial members are only permitted in C++11 onwards; should we allow the attribute in C++98 mode? But more than that, unions with non-trivial members require user-provided special members in C++11 onwards, meaning that a union using this attribute in C would have a different calling convention than the "natural" equivalent in C++. So I'm also wondering if we should allow this in all language modes.

I think we can allow this attribute in C++ mode including C++98. One thing to note is that the compiler creates temporaries users can't directly access when passing/returning unions with non-trivial members to/from functions or capturing them as a block capture. In those cases, users won't be able to patch up the code generated by the compiler, so it would be incorrect to pass a union whose active member is the one annotated with non_trivial_union_member.

How do you write correct (non-leaking, non-double-freeing, non-releasing-invalid-pointers) code with this attribute? For example, suppose I have a __strong union member: does storing to it release the old value (which might be a different union member)? If so, how do you work around that? https://clang.llvm.org/docs/AutomaticReferenceCounting.html#ownership-qualified-fields-of-structs-and-unions should be updated to say what happens. If manual reference counting code is required to make any use of this feature correct (which seems superficially likely), is this a better programming model than __unsafe_unretained?

I should first make it clear that this attribute is a dangerous feature that makes it easier for users to write incorrect code if they aren't careful.

To write correct code, users have to manually (I don't mean manual-retain-release, this is compiled with ARC) do assignments to the fields that are annotated with the attribute to patch up the code generated by the compiler since the compiler isn't generating the kind of special functions it generates for non-trivial structs. For example:

union U1 {
  id __weak __attribute__((non_trivial_union_member)) a;
  id __attribute__((non_trivial_union_member)) b;
};
  
id getObj(int);

void foo() {
  union U1 u1 = { .b = 0}; // zero-initialize 'b'.
  u1.b = getObj(1); // assign to __strong field 'b'.
  u1.b = getObj(2); // retain and assign another object to 'b' and release the previously referenced object.

This is what I expected and what I was worried about. I could be missing something, but this approach appears to me to cause this feature to not be useful in many cases. If a store to a union member now first reads from (and releases) that union member, then you cannot change the active union member to a member with lifetime type, because that would trigger a load of an inactive union member with undefined behavior. For example, consider:

union U {
  float f;
  __attribute__((non_trivial_union_member)) id b;
};
void f() {
  union U u = {.f = 1.0f};
  u.b = getObj(1); // undefined behavior: tries to release existing value of u.b, which is not active union member; the implied `[u.b release]` is going to do Very Bad Things.

In C, storing to a union member is the way you change the active member, and I can't off-hand think of any sensible workaround for this problem. But maybe I overlooked it: can you demonstrate how you would correctly write the last line of the above example? If that can't be done reasonably, I do not think we should add this attribute.

For ARC, you could bzero the union member; this is what how we tell people to initialize malloc'ed memory as well, since there's no default-constructor concept that they can invoke for such cases.

Our immediate need for this attribute is that we have some code that wants to adopt non-trivial annotations in unions, with no intention of ever copying or destroying them as aggregates. Of course a better solution would be to make any C code that would copy or destroy the aggregate type illegal, but that seems like a big project. But maybe there'd be no need for this attribute in the long term if we eventually do have that support.

For ARC, you could bzero the union member; this is what how we tell people to initialize malloc'ed memory as well, since there's no default-constructor concept that they can invoke for such cases.
Our immediate need for this attribute is that we have some code that wants to adopt non-trivial annotations in unions, with no intention of ever copying or destroying them as aggregates.

OK, so we'd be looking at something like this when using the attribute:

typedef struct S {
  int is_b;
  union {
    float f;
    __attribute__((non_trivial_union_member)) id b;
  };
} S;
S make() {
  S s = {0, 0.0f};
  return s;
}
void set_f(S *s, float f) {
  if (s->is_b) s->b = 0;
  s->is_b = 0;
  s->f = f;
}
void set_b(S *s, id b) {
  if (!s->is_b) bzero(s, sizeof(S));
  s->is_b = 1;
  s->b = b;
}
void destroy_s(S *s) {
  if (s->is_b) s->b = 0;
}

versus the same code written without the attribute, which might be:

typedef struct S {
  int is_b;
  union {
    float f;
    void *b;
  };
} S;
S make() {
  S s = {0, 0.0f};
  return s;
}
void set_f(S *s, float f) {
  if (s->is_b) (__bridge_transfer id)s->b;
  s->is_b = 0;
  s->f = f;
}
void set_b(S *s, id b) {
  if (s->is_b) (__bridge_transfer id)s->b;
  s->is_b = 1;
  s->b = (__bridge_retained void*)b;
}
void destroy_s(S *s) {
  if (s->is_b) (__bridge_transfer id)s->b;
}

To me, using the attribute here doesn't seem worthwhile; the first form of this code seems scary since important reference counting actions are hidden behind what appear to be dead stores, and the second form (while verbose) is at least explicit about what's going on. If you still think this is a useful feature, so be it, but we should at least be explicit in Clang's ARC documentation about how to correctly make use of it.

I'm not sure whether we should support this as a way of disabling the union triviality features in general, or only the restriction on lifetime types, but either way, I think the support of this attribute should not be dependent on the language mode; I think adding a C-only extension for this does a disservice to our users.

Of course a better solution would be to make any C code that would copy or destroy the aggregate type illegal, but that seems like a big project. But maybe there'd be no need for this attribute in the long term if we eventually do have that support.

Yeah, maybe so; adopting the C++11 "unrestricted unions" behavior for non-trivial types in unions in C does seem to make a certain amount of sense.

John and I had a discussion offline and decided that we should not pursue the approach taken in this patch. I'll try to work on a patch that follows the C++11 approach when I have time later.

ahatanak abandoned this revision.Jun 12 2019, 5:14 PM