This is an archive of the discontinued LLVM Phabricator instance.

[test-suite] Fix test for union initialization
ClosedPublic

Authored by nikic on Dec 18 2021, 11:33 AM.

Details

Summary

The pr19687.c test (https://github.com/llvm/llvm-test-suite/blob/main/SingleSource/Regression/C/gcc-c-torture/execute/pr19687.c) currently checks that a union initialization with empty initializer list initializes the whole union to zero. An empty initializer is a GNU extension that I believe should be equivalent to = {0} in this case, which does not require the initialization of anything but the first union member by my reading of the C11 standard (see https://reviews.llvm.org/D115994#3203445 below for relevant quotes).

Prior to D115924 constant folding ended up evaluating the undef initialization as zero, but now it retains the undef, thus breaking this test.

Diff Detail

Event Timeline

nikic requested review of this revision.Dec 18 2021, 11:33 AM
nikic created this revision.
nikic set the repository for this revision to rT test-suite.Dec 18 2021, 11:37 AM
nikic added subscribers: lenary, rsmith.

@lenary Looking at history, would it be preferred to leave this test alone and instead add it to the exclude list in CMakeFiles?

Not sure who to ask what the expected/specified initialization behavior here is, maybe @rsmith knows?

I think Clang is wrong, and "padding" is whatever portion of the union's storage isn't covered by the first member. Maybe there's a clever reading that allows treating this otherwise, but that seems clearly contrary to the intent of the standard.

nikic added a comment.EditedDec 20 2021, 12:49 PM

I think Clang is wrong, and "padding" is whatever portion of the union's storage isn't covered by the first member. Maybe there's a clever reading that allows treating this otherwise, but that seems clearly contrary to the intent of the standard.

Thanks! However, I think I may have confused things here with a bad quote and some incorrect assumptions on my side. I just looked into this a bit more carefully, and I believe the following passages are relevant. First, we have 6.7.9.10:

If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate. If an object that has static or thread storage duration is not initialized explicitly, then:

  • if it has pointer type, it is initialized to a null pointer;
  • if it has arithmetic type, it is initialized to (positive or unsigned) zero;
  • if it is an aggregate, every member is initialized (recursively) according to these rules, and any padding is initialized to zero bits;
  • if it is a union, the first named member is initialized (recursive ly) according to these rules, and any padding is initialized to zero bits

And then in 6.7.9.19:

The initialization shall occur in initializer list order, each initializer provided for a particular subobject overriding any previously listed initializer for the same subobject; 151) all subobjects that are not initialized explicitly shall be initialized implicitly the same as objects that have static storage duration.

And 6.7.9.21:

If there are fewer initializers in a brace-enclosed list than there are elements or members of an aggregate, or fewer characters in a string literal used to initialize an array of known size than there are elements in the array, the remainder of the aggregate shall be initialized implicitly the same as objects that have static storage duration.

My reading here is that the fallback to static duration initialization (including its initialization of padding to zero) only occurs if no field has been initialized. If a field has been initialized, then the padding is no longer initialized (unless it is part of a subobject, in which case the whole subobject is zero initialized). So my interpretation would be that clang would be correct in initializing union U t = {0} with undef padding. union U t[2] = {0} would allow undef padding on the first union, while the second union must be all-zero initialized (and this is indeed what is generated).

In which case the only remaining issue here would be the use of the GNU extension union U t = {}, which I guess it not the same as union U t = {0} after all, and rather zero-initializes the whole object, because no field has been explicitly initialized. In that case Clang would have an initialization bug for the GNU extension case only.

Does that sound about right?

nikic added a comment.Dec 20 2021, 1:24 PM

In which case the only remaining issue here would be the use of the GNU extension union U t = {}, which I guess it not the same as union U t = {0} after all, and rather zero-initializes the whole object, because no field has been explicitly initialized. In that case Clang would have an initialization bug for the GNU extension case only.

Though I could really see this go either way. union U t[1] = {} should clearly result in all-zero initialization (and does), but union U t = {} could well be interpreted to initialize all remaining (== all) fields to zero while leaving padding uninitialized. This would even make more sense to me, given how the rest of this works.

I haven't found any explicit documentation for the empty initializer extension.

nikic edited the summary of this revision. (Show Details)Dec 21 2021, 12:17 AM

I found this old clang patch, which tried to zero-initialize union padding: https://reviews.llvm.org/D68115 Based on the discussion there, and in particular the comment https://reviews.llvm.org/D68115#1946757 it seems like for C++ undef padding is correct for union U t = U{} case, but not correct for union U t = U().

lenary requested changes to this revision.Dec 24 2021, 1:23 AM

This test was directly copied in from the gcc c torture suite and iirc we haven’t edited them on purpose.

If you think the test has undefined behaviour or behaviour clang explicitly doesn’t support, there’s a big excludes list in one of the CMakeLists.txt

This revision now requires changes to proceed.Dec 24 2021, 1:23 AM
nikic updated this revision to Diff 396149.EditedDec 24 2021, 2:44 AM
nikic edited the summary of this revision. (Show Details)

Add test to exclude list instead of modifying the test.

lenary accepted this revision.Dec 26 2021, 2:43 AM

Cool. This is the right kind of fix (thanks for the comment!)

The explanation of which cases are UB and which are not is making my head spin, so I’d like @rjmccall to approve this to before you land it.

This revision is now accepted and ready to land.Dec 26 2021, 2:43 AM
lenary added a comment.Jan 4 2022, 4:18 AM

Ok, this has been held up for a while, and it's something that should have been quick.

I'm not sure the presence of the GCC extension for empty initializer lists really matters here. With regards to the C11 spec, we're doing the right thing, which is that the behaviour is undefined, so excluding the test is correct.

@nikic I'm happy for this to be landed.

Aside: https://thephd.dev/_vendor/future_cxx/papers/C%20-%20Consistent,%20Warningless,%20and%20Intuitive%20Initialization%20with%20%7B%7D.html seems to be in the works for C23, which would make this case more defined. At that point, we probably will need to re-vendor in the GCC C torture suite, as I'm sure there have been changes since I first vendored in the test suite.

nikic added a comment.Jan 4 2022, 5:51 AM

Aside: https://thephd.dev/_vendor/future_cxx/papers/C%20-%20Consistent,%20Warningless,%20and%20Intuitive%20Initialization%20with%20%7B%7D.html seems to be in the works for C23, which would make this case more defined. At that point, we probably will need to re-vendor in the GCC C torture suite, as I'm sure there have been changes since I first vendored in the test suite.

Thanks for the reference, that's very interesting! Looks like this specific case (= {} for unions) is still a bit of an open question, though the feedback seems to be in favor of not special-casing it (i.e. just doing static initialization of the first member). In either case, nice to know that this is on the way to standardization.

This revision was landed with ongoing or failed builds.Jan 5 2022, 4:11 AM