From a discussion in D68115, it seems like union tail padding is often surprising to folks, compiler implementors included. This new diagnostic, -Wunion-tail-padding, emits a diagnostic when a union's tail is left undefined when develoeprs might have reasonably expected that it would not be. I've included -Wunion-tail-padding under the umbrella of a new diagnostic group, -Wpadding, since it often seems like padding in general is a source of pain. In particular, padding can leak secret information to malicious attackers. Here we focus on the places where padding is *unitialized*, not where a structure or union has padding.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
clang/test/CodeGen/union-tail-padding.c | ||
---|---|---|
9 | This test, while it might be surprising, is testing the current behavior of clang. I haven't validated that it's correct for each standard. |
clang/test/CodeGen/union-tail-padding.c | ||
---|---|---|
28–36 | Are these warnings actionable? What should users do when they see this warning? |
clang/test/CodeGen/union-tail-padding.c | ||
---|---|---|
28–36 | Good point! I was thinking about this, and was wondering if I should add a fixit which suggests using the first wider member of the union. The problem is to offer the same object representation... that's tricky on its own (there isn't always an exact match), and then we have to deal with type punning (in C++, not in C). So I'd love ideas, because I'm not sure what to do. That being said, I wrote this partly because D68115 was surprising to folks, and partly because developers would like to opt-in to this diagnostic to find places where initialization isn't doing what they think. Maybe instead we should suggest to leave uninitialized, and use an assignment, or memset? |
clang/include/clang/Basic/DiagnosticGroups.td | ||
---|---|---|
825 | I'd like to hear what other diagnostics folks think should fit under the Padding group. I've focused not on declarations which have padding, but rather on initializations which leave uninitialized padding behind. It seems like internal padding for structs and between array elements would be desirable. |
clang/include/clang/Basic/DiagnosticFrontendKinds.td | ||
---|---|---|
264 | Something which I'm not sure matters: union Unnamed { union { int i; float f; }; char bytes[8]; }; union Unnamed unnamed = { 42 }; Will generate the following diagnostic: warning: Initializing union 'Unnamed' field '' only initializes the first 4 of 8 bytes, leaving the remaining 4 bytes undefined [-Wunion-tail-padding] union Unnamed unnamed = { 42 }; ^ I think that's generally how we handle unnamed members anyways? |
clang/test/CodeGen/union-tail-padding.c | ||
---|---|---|
28–36 |
It's not clear to me that those are better. For example, memset doesn't seem right for non-PODs in C++. I'm not sure what to suggest though. |
clang/test/CodeGen/union-tail-padding.c | ||
---|---|---|
28–36 | I'd argue that non-PODs in unions don't seem right either ;-) |
clang/test/CodeGen/union-tail-padding.c | ||
---|---|---|
28–36 |
Good point. |
I'm not convinced that this is an especially useful diagnostic (much like with -Wpadded), but I'm also not opposed if you are aware of cases where it would be used.
It seems worth pointing out that, at least in C, merely assigning to a union member results in all padding becoming unspecified (6.2.6.1/7). So the suggestion to memset the union to zero and then assign to a specific union member does not work.
clang/include/clang/Basic/DiagnosticFrontendKinds.td | ||
---|---|---|
264 | Please start diagnostic messages with a lowercase letter. In the case you identify, it'd be nice to say "initializing union Unnamed field i only initializes [...]". (We should probably special-case formatting a declaration with an empty name and print out something like '<unnamed>' instead, but that's not specific to this diagnostic.) Also, I think this would read more naturally as "initializing field i of union Unnamed only initializes [...]". | |
clang/include/clang/Basic/DiagnosticGroups.td | ||
825 | It seems to me that including this in the existing -Wpadded warning group would be reasonable (I prefer the name -Wpadding over -Wpadded for what it's worth, but it seems problematic to include both with different meanings.) -Wpadded currently warns whenever struct layout inserts interior or tail padding. Also warning on union fields that, when active, would result in tail padding seems reasonable to me too. (The new warning seems like something that would be hard to enable on any sizeable codebase, or with any headers that aren't under the end-user's control, which I find somewhat concerning. But the old -Wpadded warning has the same issues, so this seems like it's not making things worse.) | |
clang/lib/CodeGen/CGExprConstant.cpp | ||
608–612 | I don't think it's acceptable to produce a warning such as this from IR generation, and especially not from here -- which we may or may not reach depending on a diverse set of factors (in particular, whether the initializer happens to be a constant). I think either you should warn from record layout, like we do for -Wpadded, or from SemaInit (if you only want this to apply if the narrower field is actually initialized). | |
clang/test/CodeGen/union-tail-padding.c | ||
28–36 | Regarding whether this is actionable, I think one useful test is: would it be reasonable for someone to want to promote this warning to an error? If the answer is no, then we should consider making this diagnostic be a remark instead of a warning. |
I wrote it to help with issues a codebase was having adopting variable auto-init. They were surprised at what clang did, and wish they could have known. This is just one datapoint, and I agree that generally this isn't super useful.
It seems worth pointing out that, at least in C, merely assigning to a union member results in all padding becoming unspecified (6.2.6.1/7). So the suggestion to memset the union to zero and then assign to a specific union member does not work.
Indeed, this isn't something I'm trying to address here but it would be useful to do so.
I was wondering if any of the tests were surprising to you, or if the behavior described was as expected? It's certainly not tractable for most users of C and C++ (which might be fine, it shouldn't be relied on).
I've highlighted one case where the test expectation doesn't match the standard rules.
clang/test/CodeGen/union-tail-padding.c | ||
---|---|---|
42 | This warning appears to be incorrect. Value-initialization of a union with a trivial default constructor performs zero-initialization, which does zero out padding bits. |
I've gotten what I wanted out of this (diagnosed a particular codebase), and am not sure it's worth pursuing further. If anyone is interested, please let me know.
Something which I'm not sure matters:
Will generate the following diagnostic:
I think that's generally how we handle unnamed members anyways?