This is an archive of the discontinued LLVM Phabricator instance.

Diagnose union tail padding
AbandonedPublic

Authored by jfb on May 15 2020, 6:15 PM.

Details

Summary

From a discussion in D68115, it seems like union tail padding is often surprising to folks, compiler implementors included. This new diagnostic, -Wunion-tail-padding, emits a diagnostic when a union's tail is left undefined when develoeprs might have reasonably expected that it would not be. I've included -Wunion-tail-padding under the umbrella of a new diagnostic group, -Wpadding, since it often seems like padding in general is a source of pain. In particular, padding can leak secret information to malicious attackers. Here we focus on the places where padding is *unitialized*, not where a structure or union has padding.

Diff Detail

Event Timeline

jfb created this revision.May 15 2020, 6:15 PM
jfb marked an inline comment as done.May 15 2020, 6:15 PM
jfb added inline comments.
clang/test/CodeGen/union-tail-padding.c
9

This test, while it might be surprising, is testing the current behavior of clang. I haven't validated that it's correct for each standard.

dexonsmith added inline comments.May 15 2020, 6:37 PM
clang/test/CodeGen/union-tail-padding.c
28–36

Are these warnings actionable? What should users do when they see this warning?

jfb marked an inline comment as done.May 15 2020, 8:18 PM
jfb added inline comments.
clang/test/CodeGen/union-tail-padding.c
28–36

Good point!

I was thinking about this, and was wondering if I should add a fixit which suggests using the first wider member of the union. The problem is to offer the same object representation... that's tricky on its own (there isn't always an exact match), and then we have to deal with type punning (in C++, not in C).

So I'd love ideas, because I'm not sure what to do. That being said, I wrote this partly because D68115 was surprising to folks, and partly because developers would like to opt-in to this diagnostic to find places where initialization isn't doing what they think.

Maybe instead we should suggest to leave uninitialized, and use an assignment, or memset?

jfb marked an inline comment as done.May 15 2020, 8:22 PM
jfb added inline comments.
clang/include/clang/Basic/DiagnosticGroups.td
825

I'd like to hear what other diagnostics folks think should fit under the Padding group. I've focused not on declarations which have padding, but rather on initializations which leave uninitialized padding behind. It seems like internal padding for structs and between array elements would be desirable.

jfb marked an inline comment as done.May 15 2020, 9:01 PM
jfb added inline comments.
clang/include/clang/Basic/DiagnosticFrontendKinds.td
264

Something which I'm not sure matters:

union Unnamed {
  union {
    int i;
    float f;
  };
  char bytes[8];
};

union Unnamed unnamed = { 42 };

Will generate the following diagnostic:

warning: Initializing union 'Unnamed' field '' only initializes the first 4 of 8 bytes, leaving the remaining 4 bytes undefined [-Wunion-tail-padding]
union Unnamed unnamed = { 42 };
                        ^

I think that's generally how we handle unnamed members anyways?

dexonsmith added inline comments.May 18 2020, 11:29 AM
clang/test/CodeGen/union-tail-padding.c
28–36

Maybe instead we should suggest to leave uninitialized, and use an assignment, or memset?

It's not clear to me that those are better. For example, memset doesn't seem right for non-PODs in C++. I'm not sure what to suggest though.

jfb marked an inline comment as done.May 18 2020, 12:48 PM
jfb added inline comments.
clang/test/CodeGen/union-tail-padding.c
28–36

I'd argue that non-PODs in unions don't seem right either ;-)
It would be easy to diagnose non-trivially-copyable types differently in this circumstance.

dexonsmith added inline comments.May 18 2020, 2:16 PM
clang/test/CodeGen/union-tail-padding.c
28–36

It would be easy to diagnose non-trivially-copyable types differently in this circumstance.

Good point.

I'm not convinced that this is an especially useful diagnostic (much like with -Wpadded), but I'm also not opposed if you are aware of cases where it would be used.

It seems worth pointing out that, at least in C, merely assigning to a union member results in all padding becoming unspecified (6.2.6.1/7). So the suggestion to memset the union to zero and then assign to a specific union member does not work.

clang/include/clang/Basic/DiagnosticFrontendKinds.td
264

Please start diagnostic messages with a lowercase letter.

In the case you identify, it'd be nice to say "initializing union Unnamed field i only initializes [...]". (We should probably special-case formatting a declaration with an empty name and print out something like '<unnamed>' instead, but that's not specific to this diagnostic.)

Also, I think this would read more naturally as "initializing field i of union Unnamed only initializes [...]".

clang/include/clang/Basic/DiagnosticGroups.td
825

It seems to me that including this in the existing -Wpadded warning group would be reasonable (I prefer the name -Wpadding over -Wpadded for what it's worth, but it seems problematic to include both with different meanings.)

-Wpadded currently warns whenever struct layout inserts interior or tail padding. Also warning on union fields that, when active, would result in tail padding seems reasonable to me too.

(The new warning seems like something that would be hard to enable on any sizeable codebase, or with any headers that aren't under the end-user's control, which I find somewhat concerning. But the old -Wpadded warning has the same issues, so this seems like it's not making things worse.)

clang/lib/CodeGen/CGExprConstant.cpp
608–612

I don't think it's acceptable to produce a warning such as this from IR generation, and especially not from here -- which we may or may not reach depending on a diverse set of factors (in particular, whether the initializer happens to be a constant).

I think either you should warn from record layout, like we do for -Wpadded, or from SemaInit (if you only want this to apply if the narrower field is actually initialized).

clang/test/CodeGen/union-tail-padding.c
28–36

Regarding whether this is actionable, I think one useful test is: would it be reasonable for someone to want to promote this warning to an error? If the answer is no, then we should consider making this diagnostic be a remark instead of a warning.

jfb added a comment.May 26 2020, 9:24 AM

I'm not convinced that this is an especially useful diagnostic (much like with -Wpadded), but I'm also not opposed if you are aware of cases where it would be used.

I wrote it to help with issues a codebase was having adopting variable auto-init. They were surprised at what clang did, and wish they could have known. This is just one datapoint, and I agree that generally this isn't super useful.

It seems worth pointing out that, at least in C, merely assigning to a union member results in all padding becoming unspecified (6.2.6.1/7). So the suggestion to memset the union to zero and then assign to a specific union member does not work.

Indeed, this isn't something I'm trying to address here but it would be useful to do so.

I was wondering if any of the tests were surprising to you, or if the behavior described was as expected? It's certainly not tractable for most users of C and C++ (which might be fine, it shouldn't be relied on).

In D80055#2055151, @jfb wrote:

I was wondering if any of the tests were surprising to you, or if the behavior described was as expected?

I've highlighted one case where the test expectation doesn't match the standard rules.

clang/test/CodeGen/union-tail-padding.c
42

This warning appears to be incorrect. Value-initialization of a union with a trivial default constructor performs zero-initialization, which does zero out padding bits.

jfb abandoned this revision.Jun 19 2020, 3:05 PM

I've gotten what I wanted out of this (diagnosed a particular codebase), and am not sure it's worth pursuing further. If anyone is interested, please let me know.