This is an archive of the discontinued LLVM Phabricator instance.

Diagnose use of VLAs in C++ by default
ClosedPublic

Authored by aaron.ballman on Jul 28 2023, 12:25 PM.

Details

Summary

Clang supports VLAs in C++ as an extension, but we currently only warn on their use when you pass -Wvla, -Wvla-extension, or -pedantic. However, VLAs as they're expressed in C have been considered by WG21 and rejected, are easy to use accidentally to the surprise of users (e.g., https://ddanilov.me/default-non-standard-features/), and they have potential security implications beyond constant-size arrays (https://wiki.sei.cmu.edu/confluence/display/c/ARR32-C.+Ensure+size+arguments+for+variable+length+arrays+are+in+a+valid+range). C++ users should strongly consider using other functionality such as std::vector instead.

This seems like sufficiently compelling evidence to warn users about VLA use by default in C++ modes. This patch enables the -Wvla-extension diagnostic group in C++ language modes by default, and adds the warning group to -Wall in GNU++ language modes. The warning is still opt-in in C language modes, where support for VLAs is somewhat less surprising to users.

Fixes https://github.com/llvm/llvm-project/issues/62836

Diff Detail

Event Timeline

aaron.ballman created this revision.Jul 28 2023, 12:25 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 28 2023, 12:25 PM
Herald added subscribers: mattd, pmatos, asb and 4 others. · View Herald Transcript
aaron.ballman requested review of this revision.Jul 28 2023, 12:25 PM
Herald added a project: Restricted Project. · View Herald Transcript
aaron.ballman added subscribers: ABataev, Restricted Project.Jul 28 2023, 12:28 PM

The patch seems large and scary, but that's because of how often we've had to test VLA behavior in C++ code. The OpenMP test cases are mostly opting out of VLA warnings mechanically (I verified with @ABataev that this was the desired approach) while the non-OpenMP tests were largely updated by hand so that I could spot check the diagnostic behavior to ensure it's reasonable. The functional changes themselves are quite small.

Adding the clang-vendors group in case this is disruptive for downstreams.

jrtc27 added a subscriber: jrtc27.Jul 28 2023, 12:36 PM

Given GCC defines GNU C++ and regards this as a feature (unless you use things like -pedantic to ask for ISO C++), does it make sense to enable this for GNU C++? Every deviation from GCC is a point of friction for adopting Clang over GCC.

aaron.ballman added a comment.EditedJul 28 2023, 1:08 PM

Given GCC defines GNU C++ and regards this as a feature (unless you use things like -pedantic to ask for ISO C++), does it make sense to enable this for GNU C++?

I think GCC should enable -Wvla by default in GNU C++ as well, for the same reasons I'm proposing it for Clang. I've filed an issue for it at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110848

Every deviation from GCC is a point of friction for adopting Clang over GCC.

It is, but one of the selling points of Clang is that we strive to have comprehensive diagnostics and that sometimes leads to these deviations with GCC. We don't make much of a distinction between GNU++ and C++ modes in Clang (~10 uses of GNUMode in the frontend), but that is an approach I could take should it be necessary. However, I strongly prefer to enable this diagnostic in all C++ modes -- VLA use in C++ is deeply weird and easy to overlook, making it a bug factory: https://github.com/llvm/llvm-project/issues?q=is%3Aissue+is%3Aopen+VLA+C%2B%2B

Matt added a subscriber: Matt.Jul 29 2023, 12:47 AM

Given GCC defines GNU C++ and regards this as a feature (unless you use things like -pedantic to ask for ISO C++), does it make sense to enable this for GNU C++?

I think GCC should enable -Wvla by default in GNU C++ as well, for the same reasons I'm proposing it for Clang. I've filed an issue for it at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110848

The GCC conversation is leaning towards only diagnosing by default in C++ mode but not in GNU++ mode. I'm still trying to persuade them to diagnose in both modes one last time, but if it looks like they're firm about not diagnosing in GNU++ mode, I can live with that (for now). It at least improves our security posture a bit, so it's definitely a win.

Endill added a subscriber: Endill.Aug 11 2023, 10:11 AM

Given GCC defines GNU C++ and regards this as a feature (unless you use things like -pedantic to ask for ISO C++), does it make sense to enable this for GNU C++?

I think GCC should enable -Wvla by default in GNU C++ as well, for the same reasons I'm proposing it for Clang. I've filed an issue for it at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110848

The GCC conversation is leaning towards only diagnosing by default in C++ mode but not in GNU++ mode. I'm still trying to persuade them to diagnose in both modes one last time, but if it looks like they're firm about not diagnosing in GNU++ mode, I can live with that (for now). It at least improves our security posture a bit, so it's definitely a win.

I think that we should warn by default in GNU mode regardless of GCC decision. As for the porting concern, I think it falls into "comprehensive diagnostics" selling point you mentioned earlier, which I totally agree with.

Given GCC defines GNU C++ and regards this as a feature (unless you use things like -pedantic to ask for ISO C++), does it make sense to enable this for GNU C++?

I think GCC should enable -Wvla by default in GNU C++ as well, for the same reasons I'm proposing it for Clang. I've filed an issue for it at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110848

The GCC conversation is leaning towards only diagnosing by default in C++ mode but not in GNU++ mode. I'm still trying to persuade them to diagnose in both modes one last time, but if it looks like they're firm about not diagnosing in GNU++ mode, I can live with that (for now). It at least improves our security posture a bit, so it's definitely a win.

I think that we should warn by default in GNU mode regardless of GCC decision. As for the porting concern, I think it falls into "comprehensive diagnostics" selling point you mentioned earlier, which I totally agree with.

The current discussion on the GCC issue is to diagnose by default in C++ mode and add -Wvla to -Wall in GNU++ mode, which perhaps is a nice compromise. I'm waiting to see if any further discussion happens on that issue, but if folks have opinions on that approach, I'd love to hear them.

Enable the diagnostic by default in C++ language modes, and under -Wall in GNU++ language modes.

Enable the diagnostic by default in C++ language modes, and under -Wall in GNU++ language modes.

I am happy with that approach.

Enable the diagnostic by default in C++ language modes, and under -Wall in GNU++ language modes.

I am happy with that approach.

Ditto! Some of the GCC folks also expressed a desire for this direction on their side.
Given that, and how surprising VLAs can be to c++ folks, It seems reasonable to go ahead with this patch

aaron.ballman edited the summary of this revision. (Show Details)Aug 29 2023, 9:36 AM

I put up an RFC at https://discourse.llvm.org/t/rfc-diagnosing-use-of-vlas-in-c/73109 to reach a slightly wider audience for awareness.

Updated based on feedback on the RFC. Specifically, this variant recognizes int array[cond ? 1 : -1]; or similar constructs and recommends using a static_assert instead. This new diagnostic is under its own warning group (-Wvla-extension-static-assert) so that users who want to be alerted to use of VLAs but not static_assert-like VLAs can do so, but the new warning group is still under -Wvla-extension and thus is enabled similarly.

Overall looks good, just a few nits from looking things over.

clang/include/clang/Basic/DiagnosticSemaKinds.td
143

Clarify text: "variable length arrays in C++ are a Clang extension" and same below.

I also think we should be (consistently) using the term "variable-length array" with a dash instead of "variable length array", but if you change that, it should be in a follow-on which changes it in all the messages.

clang/lib/Sema/SemaType.cpp
2617

Not sure whether to actually care, since C++98 is so old now, but, having -Wno-vla-extension-static-assert work in C++98 mode seems like it'd be quite useful, exactly because the usage cannot trivially be replaced by a static_assert. So it's a bit unfortunate that we don't distinguish it there.

Perhaps we should emit the same diagnostics in both 98/11, but with slightly different text in 98?

clang/test/CXX/dcl.dcl/dcl.spec/dcl.typedef/p2-0x.cpp
39

That we get a warning _and_ an error for this using statement now is non-ideal. I see that we are doing this in two different places, though...first in CheckType for the VLA, then diagnosing if it's used at file scope in CheckDecl. So maybe not worth fixing.

clang/test/SemaCXX/cxx1z-noexcept-function-type.cpp
124

Another unfortunate case, similar to the error case earlier, of BOTH warning about about it being a variable-length but then also warning about it actually being constant array as an extension.

clang/test/SemaCXX/offsetof.cpp
31

Weird that new-interp adds the diagnostic for C++98 mode. I wonder if that indicates a bug (e.g. if in new-interp we accidentally use C++11 rules for C++98)?

clang/test/SemaCXX/vla-ext-diag.cpp
13

I think this is "expected" due to the divergence between the use of the constant-expression evaluation in C++11-and-later and "ICE" code in previous.

aaron.ballman marked 4 inline comments as done.Oct 2 2023, 6:55 AM
aaron.ballman added a subscriber: tbaeder.
aaron.ballman added inline comments.
clang/include/clang/Basic/DiagnosticSemaKinds.td
143

Clarify text: "variable length arrays in C++ are a Clang extension" and same below.

Sure, that's reasonable, thanks!

I also think we should be (consistently) using the term "variable-length array" with a dash instead of "variable length array", but if you change that, it should be in a follow-on which changes it in all the messages.

Agreed that we should be consistent, but the standard term of art is variable length array despite "variable-length array" being arguably more correct, and that usage is the more common in Clang (273 uses of "variable length array" compared to 64 uses of "variable-length array").

clang/lib/Sema/SemaType.cpp
2617

That's effectively what we're doing here, right?

C++98 is told "variable length arrays (in C++) are a Clang extension" and in C++11 they're told "variable length arrays (in C++) are a Clang extension; did you mean to use 'static_assert'?"

Or am I misunderstanding you?

clang/test/CXX/dcl.dcl/dcl.spec/dcl.typedef/p2-0x.cpp
39

Yeah, agreed that this is unfortunate, but it comes from the split between the type and the declaration checking code and that's going to be hard to work around.

clang/test/SemaCXX/cxx1z-noexcept-function-type.cpp
124

Yup, same kind of issue here as above in terms of the cause.

clang/test/SemaCXX/offsetof.cpp
31
clang/test/SemaCXX/vla-ext-diag.cpp
13

Ah you're exactly right about that, thank you!

aaron.ballman marked 4 inline comments as done.

Updated based on review feedback. Specifically:

  • Updated a FIXME comment in a test to clarify what's happening.
  • Reworded diagnostic to add "in C++"
cor3ntin accepted this revision.Oct 2 2023, 7:21 AM

LGTM!

This revision is now accepted and ready to land.Oct 2 2023, 7:21 AM
tbaeder added inline comments.Oct 2 2023, 7:30 AM
jyknight added inline comments.Oct 2 2023, 12:50 PM
clang/lib/Sema/SemaType.cpp
2617

In this patch, in C++98 mode, we diagnose everything under the flag -Wvla-extension, and never under the flag -Wvla-extension-static-assert. My point was that we ought to diagnose static-assert-like-VLAs under the -Wvla-extension-static-assert flag even in C++98 mode.

tahonermann added inline comments.
clang/include/clang/Basic/DiagnosticSemaKinds.td
147–149

I find the "did you mean to use 'static_assert'" phrasing awkward here since it is highly unlikely that someone intended to write static_assert(x) and instead wrote int my_assert[x ? 1 : -1]. Perhaps something like this?

"variable length arrays in C++ are a Clang extension; 'static_assert' can be used in this case".

clang/test/SemaCXX/vla-ext-diag.cpp
35

Perhaps add a test where the conditional expression is parenthesized.

int array4[(n ? 1 : -1)];

Adding tests for one side being of size 0 might be useful to demonstrate that those are not intended to trigger the "use 'static_assert'" diagnostic. I know some compilers don't diagnose on a size of 0 and so the static assert idiom generally requires a negative number.

int array5[n ? 1 : 0]; // ok?
aaron.ballman marked an inline comment as done.Oct 3 2023, 6:59 AM
aaron.ballman added inline comments.
clang/include/clang/Basic/DiagnosticSemaKinds.td
147–149

Eh, I'm on the fence. We're consistent about asking users "did you mean X?" in our diagnostics and this follows the same pattern. "Did you mean to use X" is not "is this a typo for X?" but "were you aware you could do X instead?" So yeah, the wording is a bit awkward, but it's consistent with other diagnostics and not really wrong either. Do you have strong feelings?

clang/lib/Sema/SemaType.cpp
2617

Ah okay! That does make sense but would be hard to pull off. Because the existing code is based around picking which diagnostic ID to use, all of the related diagnostic IDs need to take the same kind of streaming arguments, that makes passing additional arguments to control %select behavior really awkward in this case. So I'd have to add an entire new diagnostic ID for the C++98 case and I'm not certain it's really worth it. e.g.

def ext_vla : Extension<"variable length arrays are a C99 feature">,
  InGroup<VLAExtension>;
// In C++ language modes, we warn by default as an extension, while in GNU++
// language modes, we warn as an extension but add the warning group to -Wall.
def ext_vla_cxx : ExtWarn<
  "variable length arrays in C++ are a Clang extension">,
  InGroup<VLAExtension>;
def ext_vla_cxx_in_gnu_mode : Extension<ext_vla_cxx.Summary>,
  InGroup<VLAExtension>;
def ext_vla_cxx_static_assert : ExtWarn<
  "variable length arrays in C++ are a Clang extension; did you mean to use "
  "'static_assert'?">, InGroup<VLAUseStaticAssert>;
def ext_vla_cxx_in_gnu_mode_static_assert : Extension<
  ext_vla_cxx_static_assert.Summary>, InGroup<VLAUseStaticAssert>;
def ext_vla_cxx98_static_assert : ExtWarn<
  ext_vla_cxx.Summary>, InGroup<VLAUseStaticAssert>;
def ext_vla_cxx98_in_gnu_mode_static_assert : Extension<
  ext_vla_cxx98_static_assert.Summary>, InGroup<VLAUseStaticAssert>;
def warn_vla_used : Warning<"variable length array used">,
  InGroup<VLA>, DefaultIgnore;

It's not the end of the world, but do you think it's worth it?

clang/test/SemaCXX/vla-ext-diag.cpp
35

Good call on the additional test cases!

tahonermann added inline comments.Oct 3 2023, 7:38 AM
clang/include/clang/Basic/DiagnosticSemaKinds.td
147–149

No, I don't have strong feelings on this.

I did recognize use of the widely used "did you mean" pattern, but my impression with those has been that they are generally employed in situations where similar syntax is involved (e.g., misspelled name, . vs ->, return vs co_return, etc...) or where there might be some missing syntax (e.g., () following a function name or a missing ;); cases where the programmer is likely to response "oh, yes I did!". This feels different since it is suggesting use of a distinct feature.

jyknight added inline comments.Oct 3 2023, 10:04 AM
clang/lib/Sema/SemaType.cpp
2617

Yea, that's basically what I meant.

But, actually, the message is not great in C++11 mode either -- "use static_assert" isn't really what's meant, since the only reason it's diagnosed here at all is because the value (accidentally or intentionally) _isn't_ actually a constant expression and thus wouldn't work under static_assert.

A message like: variable length arrays in C++ are a Clang extension; this looks like it was intended to be a pre-C++11 emulation of static_assert, but doesn't have a constant condition would be correct in both language modes, and maybe be more to the point for the issue.

aaron.ballman marked an inline comment as not done.Oct 3 2023, 12:02 PM
aaron.ballman added inline comments.
clang/lib/Sema/SemaType.cpp
2617

But, actually, the message is not great in C++11 mode either -- "use static_assert" isn't really what's meant, since the only reason it's diagnosed here at all is because the value (accidentally or intentionally) _isn't_ actually a constant expression and thus wouldn't work under static_assert.

Good point!!

A message like: variable length arrays in C++ are a Clang extension; this looks like it was intended to be a pre-C++11 emulation of static_assert, but doesn't have a constant condition would be correct in both language modes, and maybe be more to the point for the issue.

Oof, that's a mouthful. Stepping back a bit, I think the only reason we want a second diagnostic group here is so people can be alerted to using a VLA while turning off diagnostics for static_assert emulation. From that perspective, we don't need different wording for the diagnostics so much as different diagnostic groups for the same diagnostic wording. e.g.,

void old_style_static_assert(int n) {
  int array1[n != 12 ? 1 : -1]; // warning: variable length arrays in C++ are a Clang extension [-Wvla-extension-static-assert]
  int array2[n]; // warning: variable length arrays in C++ are a Clang extension [-Wvla-extension]
}

Do you think that could be reasonable behavior, or is that too subtle?

Updated based on review feedback. Specifically:

  • Added test cases, fixed handling of parenthesized expressions.
This revision was landed with ongoing or failed builds.Oct 20 2023, 6:51 AM
This revision was automatically updated to reflect the committed changes.

Is it expected that this introduces a warning for C code, as the commit message and tests appear to only affect C++? A trivial example from the Linux kernel:

https://elixir.bootlin.com/linux/v6.5.8/source/tools/lib/bpf/btf_dump.c#L1678

#include <stddef.h>
#include <stdio.h>
#include <string.h>

void foo(char *orig_name, char **cached_name, size_t dup_cnt)
{
        const size_t max_len = 256;
        char new_name[max_len];

        snprintf(new_name, max_len, "%s___%zu", orig_name, dup_cnt);
        *cached_name = strdup(new_name);
}
$ clang -std=gnu89 -Wall -fsyntax-only test.c
test.c:8:16: warning: variable length arrays are a C99 feature [-Wvla-extension]
    8 |         char new_name[max_len];
      |                       ^~~~~~~
1 warning generated.

Is it expected that this introduces a warning for C code, as the commit message and tests appear to only affect C++? A trivial example from the Linux kernel:

https://elixir.bootlin.com/linux/v6.5.8/source/tools/lib/bpf/btf_dump.c#L1678

#include <stddef.h>
#include <stdio.h>
#include <string.h>

void foo(char *orig_name, char **cached_name, size_t dup_cnt)
{
        const size_t max_len = 256;
        char new_name[max_len];

        snprintf(new_name, max_len, "%s___%zu", orig_name, dup_cnt);
        *cached_name = strdup(new_name);
}
$ clang -std=gnu89 -Wall -fsyntax-only test.c
test.c:8:16: warning: variable length arrays are a C99 feature [-Wvla-extension]
    8 |         char new_name[max_len];
      |                       ^~~~~~~
1 warning generated.

No, that's unintended, I'll get that fixed. Thanks for letting me know!

Is it expected that this introduces a warning for C code, as the commit message and tests appear to only affect C++? A trivial example from the Linux kernel:

https://elixir.bootlin.com/linux/v6.5.8/source/tools/lib/bpf/btf_dump.c#L1678

#include <stddef.h>
#include <stdio.h>
#include <string.h>

void foo(char *orig_name, char **cached_name, size_t dup_cnt)
{
        const size_t max_len = 256;
        char new_name[max_len];

        snprintf(new_name, max_len, "%s___%zu", orig_name, dup_cnt);
        *cached_name = strdup(new_name);
}
$ clang -std=gnu89 -Wall -fsyntax-only test.c
test.c:8:16: warning: variable length arrays are a C99 feature [-Wvla-extension]
    8 |         char new_name[max_len];
      |                       ^~~~~~~
1 warning generated.

No, that's unintended, I'll get that fixed. Thanks for letting me know!

Unfortunately, this will require complicating the diagnostic groups slightly more. We don't have facilities that allow us to say that a single diagnostic is grouped under -Wall in one language mode but not another, and we don't have a way for diagnostic groups to share the same string (so we can't have def VLACxxExtension : DiagGroup<"vla-extension", [VLAUseStaticAssert]>; and def VLAExtension : DiagGroup<"vla-extension", [VLACxxExtension]>;).

I think I will address this by adding -Wvla-cxx-extension and putting the C++ warnings under it, and that warning group will be added to -Wall. e.g.,

diff --git a/clang/include/clang/Basic/DiagnosticGroups.td b/clang/include/clang/Basic/DiagnosticGroups.td
index dcdae38013d2..4cb792132d6e 100644
--- a/clang/include/clang/Basic/DiagnosticGroups.td
+++ b/clang/include/clang/Basic/DiagnosticGroups.td
@@ -849,7 +849,8 @@ def VariadicMacros : DiagGroup<"variadic-macros">;
 def VectorConversion : DiagGroup<"vector-conversion">;      // clang specific
 def VexingParse : DiagGroup<"vexing-parse">;
 def VLAUseStaticAssert : DiagGroup<"vla-extension-static-assert">;
-def VLAExtension : DiagGroup<"vla-extension", [VLAUseStaticAssert]>;
+def VLACxxExtension : DiagGroup<"vla-cxx-extension", [VLAUseStaticAssert]>;
+def VLAExtension : DiagGroup<"vla-extension", [VLACxxExtension]>;
 def VLA : DiagGroup<"vla", [VLAExtension]>;
 def VolatileRegisterVar : DiagGroup<"volatile-register-var">;
 def Visibility : DiagGroup<"visibility">;
@@ -1086,7 +1087,8 @@ def Consumed       : DiagGroup<"consumed">;
 // warning should be active _only_ when -Wall is passed in, mark it as
 // DefaultIgnore in addition to putting it here.
 def All : DiagGroup<"all", [Most, Parentheses, Switch, SwitchBool,
-                            MisleadingIndentation, PackedNonPod, VLAExtension]>;
+                            MisleadingIndentation, PackedNonPod,
+                            VLACxxExtension]>;

 // Warnings that should be in clang-cl /w4.
 def : DiagGroup<"CL4", [All, Extra]>;
diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td b/clang/include/clang/Basic/DiagnosticSemaKinds.td
index a4c1cb08de94..3bcbb003d6de 100644
--- a/clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -141,9 +141,9 @@ def ext_vla : Extension<"variable length arrays are a C99 feature">,
 // language modes, we warn as an extension but add the warning group to -Wall.
 def ext_vla_cxx : ExtWarn<
   "variable length arrays in C++ are a Clang extension">,
-  InGroup<VLAExtension>;
+  InGroup<VLACxxExtension>;
 def ext_vla_cxx_in_gnu_mode : Extension<ext_vla_cxx.Summary>,
-  InGroup<VLAExtension>;
+  InGroup<VLACxxExtension>;
 def ext_vla_cxx_static_assert : ExtWarn<
   "variable length arrays in C++ are a Clang extension; did you mean to use "
   "'static_assert'?">, InGroup<VLAUseStaticAssert>;

Any concerns with this approach?

Any concerns with this approach?

Sounds reasonable to me

Any concerns with this approach?

Sounds reasonable to me

Thanks for the double-check! This should now be fixed in https://github.com/llvm/llvm-project/commit/f8d448d5e587a23886c3226957f880146a4d8c69, sorry for the hassle!