This is an archive of the discontinued LLVM Phabricator instance.

[clang] Reject flexible array member in a union in C++
ClosedPublic

Authored by Fznamznon on Apr 5 2023, 9:17 AM.

Details

Summary

It was rejected in C, and in a strange way accepted in C++. However, the
support was never properly tested and fully implemented, so just reject
it in C++ mode as well.

This change also fixes crash on attempt to initialize union with flexible
array member. Due to missing check on union, there was a null expression
added to init list that caused crash later.

Fixes https://github.com/llvm/llvm-project/issues/61746

Diff Detail

Event Timeline

Fznamznon created this revision.Apr 5 2023, 9:17 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 5 2023, 9:17 AM
Fznamznon requested review of this revision.Apr 5 2023, 9:17 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 5 2023, 9:17 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
Fznamznon added inline comments.Apr 5 2023, 9:45 AM
clang/lib/Sema/SemaInit.cpp
808

Just for some context, numStructUnionElements checks that there is a flexible array member and returns number_of_initializable_fields-1 for structs. For unions it just returns 1 or 0, so flexible array member caused adding one more element to initlistexpr that was never properly handled.

Instead of doing this change, we could probably never enter initialization since the record (union) declaration is not valid, but that is not the case even for other types of errors in code, for example, I've tried declaring field of struct with a typo:

struct { cha x[]; } r = {1};

Initialization is still performed by clang.
Also, it seems MSVC considers flexible array member inside union as valid, so the test code is probably not always invalid.

shafik added a subscriber: rsmith.Apr 5 2023, 2:21 PM

Thank you for this fix.

clang/lib/Sema/SemaInit.cpp
808

I am not sure what to think here, looking at gcc documentation for this extension: https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html

and using the following code:

struct f1 {
  int x; int y[];
} f1 = { 1, { 2, 3, 4 } }; // #1

struct f2 {
  struct f1 f1; int data[3];
} f2 = { { 1 }, { 2, 3, 4 } }; // #2

struct { char x[]; } r = {1};  // #3

gcc rejects 2 and 3 even though 2 comes from their documentation. Clang warns on 2 and MSVC rejects 2

CC @aaron.ballman @rsmith

Fznamznon added inline comments.Apr 6 2023, 3:47 AM
clang/lib/Sema/SemaInit.cpp
808

Yes, I also had a feeling that we probably need to refine how these extensions are supported by clang, that is probably a bit out of scope of the fix though.

aaron.ballman added inline comments.Apr 6 2023, 7:43 AM
clang/lib/Sema/SemaInit.cpp
808

The GCC extension differs based on C vs C++: https://godbolt.org/z/E14Yz37To
As does the extension in Clang, but differently than GCC: https://godbolt.org/z/zYznaYPf5

So I agree that there's work to be done on this extension, but it's outside of the scope of the crash fix for this patch. For this patch, GCC rejects allowing a flexible array member in a union in both C and C++, but it looks like Clang rejects in C and perhaps accepts in C++: https://godbolt.org/z/bTsPn3G7b So how about we add a C++ test case for this as well -- if that still crashes, that should be fixed, but if the code is accepted, we should either decide we want to start rejecting it or we should ensure the codegen for it is correct.

Fznamznon added inline comments.Apr 11 2023, 6:21 AM
clang/lib/Sema/SemaInit.cpp
808

While experimenting with C++ test, I've noticed the following thing:

union { char x[]; } r = {0}; // when in global scope generates no IR (without my patch crashes)

void foo() {
  union { char x[]; } r = {0}; // fails with error "initialization of flexible array member is not allowed" in both C/C++, no crash even without my patch
}

union A {char x[]; };
A a = {0}; // crashes even with my patch but with different assertion when trying -emit-llvm
void foo() {
  A a = {0}; // fails with error "initialization of flexible array member is not allowed" in both C and C++, no crash even without my patch
}

It is not entirely clear to me why the behavior different for function and TU scope. gcc always gives an error about flexible array in union, no matter how it is used. Also, it is strange that I'm not seeing the same "initialization of flexible array member is not allowed" error message for structs, just for unions. I'm thinking that we don't really have proper codegen for the code that I'm trying to fix and we should reject the code like gcc does. MSVC is fine with all though - https://godbolt.org/z/aoarEzd56 .

WDYT?

Fznamznon marked an inline comment as not done.Apr 17 2023, 4:52 AM
Fznamznon added inline comments.
clang/lib/Sema/SemaInit.cpp
808

Ping?

aaron.ballman added inline comments.Apr 17 2023, 5:16 AM
clang/lib/Sema/SemaInit.cpp
808

Ouch, that's a rather confused situation, isn't it? I don't have the impression we actually support this extension and we do not document support for it in the language extensions page, so due to the crashing I'm leaning towards diagnosing this code as invalid per the standards. That doesn't mean someone can't come along and support this extension if they need it to work, but the current state of things is not production-ready. Any disagreement with that?

(If we go that route, we should definitely follow the potentially breaking changes policy (https://llvm.org/docs/DeveloperPolicy.html#making-potentially-breaking-changes) because this could be disruptive depending on how much this is used in the wild.)

Fznamznon added inline comments.Apr 17 2023, 5:33 AM
clang/lib/Sema/SemaInit.cpp
808

I don't have any objections to start diagnosing the code as invalid. Flexible array members in unions don't seem to work properly. This is claimed as some GCC extension around the clang code, the thing is that GCC rejects it.

shafik added inline comments.Apr 18 2023, 8:59 AM
clang/lib/Sema/SemaInit.cpp
808

Diagnosing this sounds like the right thing to do. If someone is using this, it might be good to flush that and try and see what their actual use case is.

Fznamznon updated this revision to Diff 515287.Apr 20 2023, 5:07 AM

Reject flexible array members in unions in C++ as well.

Fznamznon retitled this revision from [clang] Do not crash when initializing union with flexible array member to [clang] Reject flexible array member in a union in C++.Apr 20 2023, 5:09 AM
Fznamznon edited the summary of this revision. (Show Details)
Fznamznon added reviewers: rjmccall, Restricted Project.
aaron.ballman added inline comments.Apr 28 2023, 9:38 AM
clang/docs/ReleaseNotes.rst
58
clang/include/clang/Basic/DiagnosticSemaKinds.td
6241–6244

Should this be updated to remove the union case?

6252

Same question here

6255

This was the last use of this diagnostic group, so we can remove it from DiagnosticGroups.td as well.

Fznamznon added inline comments.Apr 28 2023, 10:24 AM
clang/include/clang/Basic/DiagnosticSemaKinds.td
6241–6244

Sounds reasonable, but what is unfortunate is that this diagnostic exactly matches TagTypeKind enum, so now it can be emitted with Diag(...) << ... << SomeTagDecl->getTagKind(). Once I remove union from there, this neat thing fails.

Fznamznon edited the summary of this revision. (Show Details)

Apply some of the comments, rebase

aaron.ballman added inline comments.Apr 28 2023, 10:52 AM
clang/include/clang/Basic/DiagnosticSemaKinds.td
6241–6244

Oh, that might be worth leaving alone then, but put a comment next to the diagnostic definition to explain the relationship to TagTypeKind.

Fznamznon updated this revision to Diff 518695.May 2 2023, 5:55 AM

Rebase, add a comment

Fznamznon added inline comments.May 2 2023, 5:57 AM
clang/include/clang/Basic/DiagnosticSemaKinds.td
6241–6244

Sure, added a comment above the first diagnostic that uses the same structure.

This revision is now accepted and ready to land.May 2 2023, 6:25 AM
aaron.ballman added inline comments.May 2 2023, 7:19 AM
clang/test/SemaObjCXX/flexible-array.mm
7

CC @rjmccall -- this is changing Objective-C++ behavior, which I assume you're fine with because it's following the same changes in C++ behavior, but wanted to call out explicitly in case you had concerns.

I have no specific objections. I do worry about removing support for something that's apparently been accepted for a long time, just on general source-compatibility grounds, but I don't think there's an ObjC-specific problem with it.

This revision was automatically updated to reflect the committed changes.

I've got an error from buildbot on Windows:

C:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\um\winioctl.h:13404:51: error: flexible array member 'Lev1Depends' in a union is not allowed

        STORAGE_QUERY_DEPENDENT_VOLUME_LEV1_ENTRY Lev1Depends[];

It feels like Windows headers contain flexible array members in unions. We probably can't just always reject them on Windows.

I've got an error from buildbot on Windows:

C:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\um\winioctl.h:13404:51: error: flexible array member 'Lev1Depends' in a union is not allowed

        STORAGE_QUERY_DEPENDENT_VOLUME_LEV1_ENTRY Lev1Depends[];

It feels like Windows headers contain flexible array members in unions. We probably can't just always reject them on Windows.

Thank you for the quick revert -- yeah, this means we have to make this work for MSVC compatibility, unfortunately. Any indication we're going to hit something similar with GCC compatibility in a glibc (or other system) header?

I think we'll have to back off the hard stance of always rejecting this because I think we'll need this to work in -fms-compatibility mode. If there's not indications of this being disruptive on non-MSVC-compatible targets, then we may still be able to get away with rejecting the extension there.

If there's not indications of this being disruptive on non-MSVC-compatible targets, then we may still be able to get away with rejecting the extension there.

If we need to have the codepath anyway, there isn't much harm in allowing it on all targets, I think. There's really only one possible interpretation for the construct.

If there's not indications of this being disruptive on non-MSVC-compatible targets, then we may still be able to get away with rejecting the extension there.

If we need to have the codepath anyway, there isn't much harm in allowing it on all targets, I think. There's really only one possible interpretation for the construct.

You would think, except the GCC extension differs based on C vs C++: https://godbolt.org/z/E14Yz37To as does the extension in Clang, but differently than GCC: https://godbolt.org/z/zYznaYPf5 and so we'd also have to dig into solving that if we wanted to keep GCC compatibility behavior.

If there's not indications of this being disruptive on non-MSVC-compatible targets, then we may still be able to get away with rejecting the extension there.

If we need to have the codepath anyway, there isn't much harm in allowing it on all targets, I think. There's really only one possible interpretation for the construct.

You would think, except the GCC extension differs based on C vs C++: https://godbolt.org/z/E14Yz37To as does the extension in Clang, but differently than GCC: https://godbolt.org/z/zYznaYPf5 and so we'd also have to dig into solving that if we wanted to keep GCC compatibility behavior.

I don't see any unions there? Declaring a flexible array is separate from flexible array initialization.

Actually, despite my saying the interpretation for unions is "obvious", it's actually a little more weird than I thought: union x { short x[]; }; static_assert(sizeof(x)==2); compiles with msvc.

If there's not indications of this being disruptive on non-MSVC-compatible targets, then we may still be able to get away with rejecting the extension there.

If we need to have the codepath anyway, there isn't much harm in allowing it on all targets, I think. There's really only one possible interpretation for the construct.

You would think, except the GCC extension differs based on C vs C++: https://godbolt.org/z/E14Yz37To as does the extension in Clang, but differently than GCC: https://godbolt.org/z/zYznaYPf5 and so we'd also have to dig into solving that if we wanted to keep GCC compatibility behavior.

I don't see any unions there? Declaring a flexible array is separate from flexible array initialization.

Sorry, that was a rather bad think-o -- those examples came from GCC's documentation of FAM extensions (https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html), but you're right, that example is not about declarations but initialization. @Fznamznon had an example: https://godbolt.org/z/jsWjv7svr

Actually, despite my saying the interpretation for unions is "obvious", it's actually a little more weird than I thought: union x { short x[]; }; static_assert(sizeof(x)==2); compiles with msvc.

Well that's... a bit shorter... than I would have expected that union to be. (I'm not apologizing for the pun.)

Any indication we're going to hit something similar with GCC compatibility in a glibc (or other system) header?

I haven't seen non-Windows problems. Though I agree that if we somehow are going to support this for MSVC compatibility, there is no harm in allowing this for all targets.

Well that's... a bit shorter... than I would have expected that union to be. (I'm not apologizing for the pun.)

In clang union x { short x[]; }; static_assert(sizeof(x)==2); evaluates to '0 == 2'. That is even shorter, I guess? This is also the reason why

union A {char x[]; };
A a = {0};

keeps crashing even after my fix. It feels like we're not even close to be compatible with msvc. I would probably prefer submitting the original fix and submitting an issue to clean up the mess with this extension. At least this way it won't crash in C after issuing an error, like this https://godbolt.org/z/5oxYTq5fq

I posted a pure crash fix here - https://reviews.llvm.org/D150435 .