This is an archive of the discontinued LLVM Phabricator instance.

[Sema] Fold VLA types in compound literals to constant arrays.
ClosedPublic

Authored by efriedma on Mar 10 2021, 11:25 AM.

Details

Summary

Similar to variables with an initializer, this is never valid in standard C, so we can safely constant-fold as an extension. I ran into this construct in a couple proprietary codebases.

While I'm here, drive-by fix for 090dd647: we should only fold variables with VLA types, not arbitrary variably modified types.

Diff Detail

Event Timeline

efriedma requested review of this revision.Mar 10 2021, 11:25 AM
efriedma created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptMar 10 2021, 11:25 AM
aaron.ballman added inline comments.Mar 15 2021, 11:34 AM
clang/test/Sema/vla.c
134

Doesn't this violate the constraints in C17 6.5.2.5p1, "The type name shall specify a complete object type or an array of unknown size, but not a variable-length array type"?

efriedma added inline comments.Mar 15 2021, 12:58 PM
clang/test/Sema/vla.c
134

Yes, this is a constraint violation. This patch downgrades the error to a warning, for compatibility with older versions of clang.

aaron.ballman added inline comments.Mar 15 2021, 1:22 PM
clang/test/Sema/vla.c
134

It was an error in older versions of Clang, so downgrading it to a warning can't make code *more* compatible with older versions. This also makes it harder to port C code to other compilers because Clang happily accepts code that the C standard says should be rejected.

I'm not strongly opposed to the change, but I also don't like ignoring constraint violations in the standard as that comes awfully close to what -fpermissive does (and at least with that awful flag you have to opt *into* the less strictly conforming behavior instead of getting it by default as with extensions).

I'm curious if @rsmith has thoughts here?

efriedma added inline comments.Mar 15 2021, 2:11 PM
clang/test/Sema/vla.c
134

It was an error in older versions of Clang

https://godbolt.org/z/rvbffY

aaron.ballman added inline comments.Mar 15 2021, 2:42 PM
clang/test/Sema/vla.c
134

Oh, I hadn't realized this changed *that* recently! Is this breaking some significant amounts of code now that we err on it (or regressing performance by not folding the VLA)?

My point still stands about disliking when we ignore constraint violations. To expound a bit, in this particular case, I think the standard is being over-constraining because we reasonably *can* fold this to a more efficient form. Normally, I'd suggest filing a DR with WG14 over this and treating the constraint as UB we can extend. However, I think the "solution" that comes out of WG14 would (likely) be to make compound literal expressions of VLA types be undefined behavior rather than a constraint violation (because I don't see many other options in the standard wording for a more targeted fix to allow *just* this safe usage). Given the security concerns around misuse of VLAs already, I think that would be a worse world to live in even if it gets us what we want with this specific case in mind. This is where my caution is coming from. Knowing a bit more about the experience in the proprietary code bases would be helpful, if it's something you can share more details about.

rsmith added inline comments.Mar 15 2021, 3:47 PM
clang/test/Sema/vla.c
134

Is this breaking some significant amounts of code now that we err on it (or regressing performance by not folding the VLA)?

We only ever constant-fold VLAs to constant-length arrays in cases where a VLA would be invalid, so there's no regressing-performance concern. (We used to do such folding, but it resulted in our miscompiling certain testcases, which is why Clang's behavior changed recently.) As a result, there's no risk of someone seeing a surprise VLA here, that should have been a constraint violation.

Given the above, I don't think that changing this from a constraint violation to UB in the C standard would make a difference. Either way, I think a strictly conforming program cannot contain such a construct, and a conforming implementation can, as an extension, define the meaning of programs containing such a construct.

It's common for us to have conforming extensions that permit us to accept more real-world code. Accepting with an on-by-default warning is often the best tradeoff between accepting real-world code and diagnosing non-portable constructs, especially in cases where past versions of GCC (<= 4.4) and Clang accepted the code as an extension. I'm inclined to think that's the best we can do here.

If we want to more strongly discourage use of this extension (and I don't have any particular preferences on whether we should do so), we could promote the warning to DefaultError (but leave the extension in, so that people can still build legacy codebases), but I think we should do that consistently across all the places we fold VLAs to constant arrays, not only for compound literals.

efriedma added inline comments.Mar 15 2021, 3:55 PM
clang/test/Sema/vla.c
134

The pattern I saw looked something like the following, at global scope, where it couldn't really be confused for a VLA:

const int x = 2;
int *p = (int[x]){0};

As for the frequency of it showing up, I think I got confused. Looking again, there were multiple codebases with new -Wgnu-folding-constant warnings, but only one codebase with this specific construct. So that makes the argument a bit weaker.

That said, we're doing this folding for variables already, so extending it to compound literals doesn't seem like a big deal. And the warning is on by default.

aaron.ballman accepted this revision.Mar 16 2021, 5:27 AM

LGTM, though I think we should consider changing the diagnostic text (for all of the VLA folding cases) as a follow-up.

clang/test/Sema/vla.c
134

Given the above, I don't think that changing this from a constraint violation to UB in the C standard would make a difference. Either way, I think a strictly conforming program cannot contain such a construct, and a conforming implementation can, as an extension, define the meaning of programs containing such a construct.

I think changing from a constraint violation to UB in the C standard would make a difference, but subtly. When something is a constraint violation, the implicit expectation is that implementations diagnose it because the committee felt it was a sufficiently bad construct to warrant forcing implementations to tell users not to use it. When something is UB, there is no such expectation -- some UB is explicitly for implementation extensions, others are because it would be impossible to diagnose, etc. So I agree that the outcome is the same, but the messaging to implementers (and other readers of the standard) is quite different.

It's common for us to have conforming extensions that permit us to accept more real-world code. Accepting with an on-by-default warning is often the best tradeoff between accepting real-world code and diagnosing non-portable constructs, especially in cases where past versions of GCC (<= 4.4) and Clang accepted the code as an extension. I'm inclined to think that's the best we can do here.

Agreed, but "real-world code", to me, isn't "someone wrote it once" so much as "this is being relied upon and it's an important use case." For example, I am highly sympathetic to a standard library (or other popular system libraries) that relied on older compiler behavior (accidentally or purposefully), but less so for "regular user code" unless it's a popular idiom.

If we want to more strongly discourage use of this extension (and I don't have any particular preferences on whether we should do so), we could promote the warning to DefaultError (but leave the extension in, so that people can still build legacy codebases), but I think we should do that consistently across all the places we fold VLAs to constant arrays, not only for compound literals.

100% agreed that anything we do here should be done consistently. I think I'd feel most comfortable if the diagnostic made it more obvious that this code is not valid C code, like ISO C forbids use of a variable length array in this context; folding to a constant array as an extension. I think it's reasonable to not promote it to DefaultError because I think each of these cases is not introducing some potentially unsafe usage so much as enabling questionable code to compile with reasonable results. (If I'm wrong about that, I'd be curious to know of it.)

As for this specific case of defining the behavior... if this was the first VLA constant folding extension, I'd argue it's not motivated well enough, but because we have plenty of other instances of doing the same thing, I'm fine adding the extension here.

This revision is now accepted and ready to land.Mar 16 2021, 5:27 AM
This revision was landed with ongoing or failed builds.Apr 14 2021, 5:10 PM
This revision was automatically updated to reflect the committed changes.