This is an archive of the discontinued LLVM Phabricator instance.

[Sema] disable -Wvla for function array parameters
AbandonedPublic

Authored by inclyc on Aug 30 2022, 8:47 AM.

Details

Reviewers
aaron.ballman
rsmith
Group Reviewers
Restricted Project
Summary

Diff Detail

Event Timeline

inclyc created this revision.Aug 30 2022, 8:47 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 30 2022, 8:47 AM
inclyc published this revision for review.Aug 30 2022, 8:47 AM
inclyc added reviewers: rsmith, Restricted Project.
Herald added a project: Restricted Project. · View Herald TranscriptAug 30 2022, 8:48 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
erichkeane added inline comments.
clang/include/clang/Sema/Sema.h
1760

I haven't looked particularly closely, but this doesn't seem right to me. There is already the 'K_Nop' diagnostic Kind (see the line below), that we should look at the uses of.

inclyc added inline comments.Aug 30 2022, 9:12 AM
clang/include/clang/Sema/Sema.h
1760

I've tried using the following constructor, I have to to pass in a series of redundant parameters (such as DiagID and Loc), which makes it strange to construct a Builder emitting no error message at all?

e.g.

Sema::SemaDiagnosticBuilder(Sema::SemaDiagnosticBuilder::Kind::K_Nop, Loc, 0,
                          nullptr, S);

Thank you for working on this! Can you be sure to add more information to the description so that reviewers more quickly understand why the changes are being made?

In terms of the changes themselves, I don't think they're heading in the right direction just yet. The changes are because we want to implement https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2992.pdf, which clarifies what constitutes a VLA, which is a problem with our type system or how we use it. We have the VariableArrayType to represent a VLA type and I think we're using that type when we should be using a variably modified type (see Type::isVariablyModifiedType()).

clang/include/clang/Sema/Sema.h
1760

+1, I think it's dangerous to add this interface (especially because it's not an explicit ctor).

clang/test/Sema/warn-vla.c
8–12

The diagnostic there is rather unfortunate because we're not using a variable-length array in this case.

inclyc planned changes to this revision.Aug 30 2022, 9:51 AM

We have the VariableArrayType to represent a VLA type and I think we're using that type when we should be using a variably modified type

The clang AST for function signatures encodes two types: the type as written, and the type after promotion. Semantic analysis always uses the type after promotion, but the type as written is useful if you're trying to do source rewriting, or something like that.

We could theoretically mess with the AST representation somehow, but I don't see any compelling reason to. We probably just want to emit the warn_vla_used diagnostic somewhere else.

We have the VariableArrayType to represent a VLA type and I think we're using that type when we should be using a variably modified type

The clang AST for function signatures encodes two types: the type as written, and the type after promotion. Semantic analysis always uses the type after promotion, but the type as written is useful if you're trying to do source rewriting, or something like that.

Agreed; we need the type as written for -ast-print support, as well as AST matchers, etc.

We could theoretically mess with the AST representation somehow, but I don't see any compelling reason to. We probably just want to emit the warn_vla_used diagnostic somewhere else.

Yeah, re-reading the C standard, I think I got VM types slightly wrong. Consider:

void func(int n, int array[n]);

array is a VLA (C2x 6.7.6.2p4) before pointer adjustment (C2x 6.7.6.3p6), which makes func a VM type (C2x 6.7.6p3), not array.

We could theoretically mess with the AST representation somehow, but I don't see any compelling reason to. We probably just want to emit the warn_vla_used diagnostic somewhere else.

Yeah, re-reading the C standard, I think I got VM types slightly wrong. Consider:

void func(int n, int array[n]);

array is a VLA (C2x 6.7.6.2p4) before pointer adjustment (C2x 6.7.6.3p6), which makes func a VM type (C2x 6.7.6p3), not array.

The type of parameter array after promotion (which is the type that matters for almost any purpose), is int*. That isn't variably modified. And the function type is just void(int,int*); there's no way we can possibly treat that as variably modified.

If you wrote, say void func(int n, int array[n][n]), then array is variably modified. "func" is still not variably modified, I think? "array" is not part of the "nested sequence of declarators", despite the fact that the parameter list contains a variably modified type. At least, that's clang's current interpretation of the rules.

We could theoretically mess with the AST representation somehow, but I don't see any compelling reason to. We probably just want to emit the warn_vla_used diagnostic somewhere else.

Yeah, re-reading the C standard, I think I got VM types slightly wrong. Consider:

void func(int n, int array[n]);

array is a VLA (C2x 6.7.6.2p4) before pointer adjustment (C2x 6.7.6.3p6), which makes func a VM type (C2x 6.7.6p3), not array.

The type of parameter array after promotion (which is the type that matters for almost any purpose), is int*. That isn't variably modified. And the function type is just void(int,int*); there's no way we can possibly treat that as variably modified.

On the one hand, yes, that makes sense to me. On the other hand, 6.7.6p3 says:

A full declarator is a declarator that is not part of another declarator. If, in the nested sequence of
declarators in a full declarator, there is a declarator specifying a variable length array type, the type
specified by the full declarator is said to be variably modified. Furthermore, any type derived by
declarator type derivation from a variably modified type is itself variably modified.

array is not a full declarator in my example, func is, but the standard waves its hands around a lot regarding when the adjustments to function parameters happen. Notionally, I think they only impact the definition of the function body. So I think the function declaration type is void(int, int[]) and I think the decayed function type on function call is void (*)(int, int *), but I'm not 100% sure. But that's why I was thinking func is a VM type.

If you wrote, say void func(int n, int array[n][n]), then array is variably modified. "func" is still not variably modified, I think? "array" is not part of the "nested sequence of declarators", despite the fact that the parameter list contains a variably modified type. At least, that's clang's current interpretation of the rules.

Once upon a time I had convinced myself that Clang got our interpretation wrong here, and I was hoping I'd rediscover exactly why I thought that as I worked my way through the C DRs and C feature statuses by adding actual test coverage. A lot of Clang's C functionality is less rigorously tested, especially the older stuff, so it might help to start pulling together more rigorous testing in this area. That would also help us to identify situations where we're not quite certain what the standard says, and I can go back to WG14 with a list of questions.

At the very least, I don't read the C grammar as supporting that interpretation of the rules. declarator reaches direct-declarator which reaches function-declarator which reaches parameter-type-list which reaches parameter-list then parameter-declaration, which finally gets us back to declarator for us to start parsing array which ends up being a VLA. So func is the full declarator in that case and thus is VM, at least as I read the spec.

As a practical matter, there isn't any reason to force variably modified parameters to make a function variably modified. The types of parameters aren't visible in the caller of a function; we only check the compatibility for calls.

Trying to treat void (*)(int, int *) as variably modified would have additional complications, in that clang would internally need to have two canonical types for void (*)(int, int *): one variably modified, one not variably modified.

So if there's a disagreement between clang and the standard, I'd rather just try to fix the standard.

As a practical matter, there isn't any reason to force variably modified parameters to make a function variably modified. The types of parameters aren't visible in the caller of a function; we only check the compatibility for calls.

Trying to treat void (*)(int, int *) as variably modified would have additional complications, in that clang would internally need to have two canonical types for void (*)(int, int *): one variably modified, one not variably modified.

I have been thinking about this problem for different reasons.

I wonder if the right solution here is to just change AdjustedType so that it desugars to the OriginalType instead of the AdjustedType, but keep the latter as the canonical type.
And not bother storing a separate QualType fot the AdjustedType.

That way, after a decay, the as-written type could be VM, but not the canonical type.

Similarly with AttributedType vis EquivalentType and ModifiedType.

We would lose though the nice property that a single step desugar always produces the same type.

I wonder how radical a change this would be for the users.

As a practical matter, there isn't any reason to force variably modified parameters to make a function variably modified. The types of parameters aren't visible in the caller of a function; we only check the compatibility for calls.

There are constraints in C based on whether something is or isn't a variably modified type, so I think there is a reason to make sure we're correct here. For example:

// All at file scope
int n = 100;
void func(int array[n++]);
typedef typeof(func) other_func;

C2x 6.7.8p2: "If a typedef name specifies a variably modified type then it shall have block scope." Even more interesting is C2x 6.7.2.5p4 (typeof, which is new in C2x): "If the type of the operand is a variably modified type, the operand is evaluated; otherwise the operand is not evaluated."

Trying to treat void (*)(int, int *) as variably modified would have additional complications, in that clang would internally need to have two canonical types for void (*)(int, int *): one variably modified, one not variably modified.

So if there's a disagreement between clang and the standard, I'd rather just try to fix the standard.

I'll ask on the WG14 reflectors to see if I'm interpreting the variably modified type specification wrong or not.

I'll ask on the WG14 reflectors to see if I'm interpreting the variably modified type specification wrong or not.

There's some agreement that the existing words could stand to be improved; others on the committee had similar questions and concerns as we had. The sentiment expressed so far has been that a function type cannot be a VM type except perhaps (uselessly) as part of its return type. If a function were to be a VM type, it couldn't be declared at file scope, and that would be a bit... silly. One thing I missed when investigating this was C2x 6.2.5p23:

"A function type describes a function with specified return type. A function type is characterized by its return type and the number and types of its parameters. A function type is said to be derived from its return type, and if its return type is T, the function type is sometimes called "function returning T". The construction of a function type from a return type is called "function type derivation".

This is kind of a hand-wavey way of saying that the function's return type is part of the sequence of declarators for the full declarator, but the parameter list is not, because the function type is derived from the return type alone. Maybe.

All this said, there was a little bit of sentiment expressed that it would be nice if C would allow functions to be a VM type so that the type system doesn't lose important semantic information, but that's a big project and who knows whether anyone will try to take it on or not.

Unwinding the stack a bit back to what got us on this discussion: I think I agree with Eli that we're probably diagnosing VLAs from the wrong place instead of modeling them wrong in the type system.

inclyc added inline comments.Sep 4 2022, 8:36 PM
clang/test/Sema/warn-vla.c
8–12

Emm, I'm not clear about whether we should consider this a VLA, and generates -Wvla-extensions. Is v[n] literally a variable-length array? (in source code) So it seems to me that we should still report c89 incompatibility warnings?

aaron.ballman added inline comments.Sep 6 2022, 8:25 AM
clang/test/Sema/warn-vla.c
8–12

C89's grammar only allowed for an integer constant expression to (optionally) appear as the array extent in an array declarator, so there is a compatibility warning needed for that. But I don't think we should issue a warning about this being a VLA in C99 or later. The array *is* a VLA in terms of the form written in the source, but C adjusts the parameter to be a pointer parameter, so as far as the function's type is concerned, it's not a VLA (it's just a self-documenting interface).

Because self-documenting code is nice and because people are worried about accidental use of VLAs that cause stack allocations (which this does not), I think we don't want to scare people off from this construct. But I'm curious what others think as well.

aaron.ballman added inline comments.Sep 6 2022, 8:31 AM
clang/test/Sema/warn-vla.c
8–12

But I'm curious what others think as well.

(Specifically, I'm wondering if others agree that the only warning that should be issued there is a C89 compatibility warning under -Wvla-extensions when in C89 mode and via a new CPre99Compat diagnostic group when enabled in C99 and later modes.)

efriedma added inline comments.Sep 6 2022, 9:24 AM
clang/test/Sema/warn-vla.c
8–12

I imagine people working with codebases that are also built with compilers that don't support VLAs would still want some form of warning on any VLA type.

aaron.ballman added inline comments.Sep 6 2022, 9:29 AM
clang/test/Sema/warn-vla.c
8–12

The tricky part is: that's precisely what WG14 N2992 (https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2992.pdf) is clarifying. If your implementation doesn't support VLAs, it's still required to support this syntactic form. So the question becomes: do we want a portability warning to compilers that don't conform to the standard? Maybe we do (if we can find evidence of compilers in such a state), but probably under a specific diagnostic flag rather than -Wvla.

efriedma added inline comments.Sep 6 2022, 9:45 AM
clang/test/Sema/warn-vla.c
8–12

That only applies to C23, though, right? None of that wording is there for C11. (In particular, MSVC claims C11 conformance without any support for VLA types.)

aaron.ballman added inline comments.Sep 6 2022, 9:55 AM
clang/test/Sema/warn-vla.c
8–12

In a strict sense, yes, this is a C23 change. There are no changes to older standards as far as ISO is concerned (and there never have been).

In a practical sense, no, this is clarifying how VLAs have always been intended to work. C23 is in a really weird spot in that we had to stop our "defect report" process at the end of C17 because the ISO definition of a defect did not match the WG14 definition of a defect, and ISO got upset. Towards the tail end of the C23 cycle, we introduced a list of extensions to obsolete versions of C (https://www.open-std.org/jtc1/sc22/wg14/www/previous.html) as a stopgap, but the convener would not allow us to retroactively add papers to it. The committee still hasn't gotten used to this new process and as a result, not many papers have made it to the list. We've also not had the chance to discuss https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3002.pdf on having a more reasonable issue tracking process.

The end result is that there's a whole pile of papers in C23 that we would have normally treated via the old defect report process but couldn't, and if we had treated them via the old DR process, it would have been more clear why I think this should be a retroactive change back to C99 instead of a C23-only change.

efriedma added inline comments.Sep 6 2022, 11:58 AM
clang/test/Sema/warn-vla.c
8–12

Even if the committee can make retroactive changes to standards, that doesn't change the fact that MSVC doesn't support VLAs, and that codebases exist which need to be compiled with both clang and MSVC.

aaron.ballman added inline comments.Sep 6 2022, 12:17 PM
clang/test/Sema/warn-vla.c
8–12

Such code bases are already supported: https://godbolt.org/z/7n768WKW7 but I take your point that not everyone writes portable code when trying to port between compilers, and so some form of diagnostic would be nice. But I don't think that diagnostic is spelled -Wvla because the whole point of the paper was to clarify that such a function signature is mandatory to support even if you claim you don't support VLAs and the purpose to -Wvla is to identify nonportable code because it's using a VLA.

Alternatively, when in -fms-compatibility mode, we could disallow VLAs and function signatures which involve one because that's not compatible with MSVC. (No idea just how much code that would break or whether I think this idea has much merit or not, just observing that VLAs are a weird boundary case in terms of MS compatibility because they're not a language extension.)

efriedma added inline comments.Sep 6 2022, 12:46 PM
clang/test/Sema/warn-vla.c
8–12

I think there's some value in having flags for both "stack-allocated variable length array" and "any type spelled using a VLA". Which one of those is spelled "-Wvla" is up for debate, I guess. I'd be inclined to let the the existing warning flag continue to work the way it has for the past; both clang and gcc have supported it with the existing semantics for a long time. It's not like there's any shortage of flag names.

When I was referring to "codebases compiled with both clang and MSVC", I wasn't specifically referring to codebases that use clang on Windows; I was also referring to codebases that use MSVC on Windows, and clang on other targets. How -fms-compatibility works is a separate subject.

the purpose to -Wvla is to identify nonportable code

The standard committee can't magically make something "portable"; portability depends on implementations.

aaron.ballman added inline comments.Sep 6 2022, 1:34 PM
clang/test/Sema/warn-vla.c
8–12

I think there's some value in having flags for both "stack-allocated variable length array" and "any type spelled using a VLA". Which one of those is spelled "-Wvla" is up for debate, I guess. I'd be inclined to let the the existing warning flag continue to work the way it has for the past; both clang and gcc have supported it with the existing semantics for a long time. It's not like there's any shortage of flag names.

Yeah, that's a different way of delineating than I was thinking originally and it's worth more thought. I was thinking the separation would be "this is a VLA" (for people who want to avoid all VLA stack allocations due to the security concerns) and "this is a portability concern" (for people who want to port to older language standards, C++, other compilers).

the purpose to -Wvla is to identify nonportable code

The standard committee can't magically make something "portable"; portability depends on implementations.

Agreed, but you cut the quote short: "because it's using a VLA." was weight-bearing. -Wvla warning on things which are not VLAs according to the standard is largely why we fixed the standard to make it clear this construct wasn't a VLA as far as __STDC_NO_VLA__ support is concerned. But you're point is also valid that we probably need something about the portability of the syntax rather than only diagnose based on the semantics.

inclyc added a comment.Sep 6 2022, 5:36 PM

Yeah, that's a different way of delineating than I was thinking originally and it's worth more thought. I was thinking the separation would be "this is a VLA" (for people who want to avoid all VLA stack allocations due to the security concerns) and "this is a portability concern" (for people who want to port to older language standards, C++, other compilers).

I think it is a good idea to separate -Wvla into -Wvla-portability(warnings on portability) and -Wvla-stack (warnings on stack allocations, security issue). Report -Wvla-portability if compilers implementations in practice don't support any vla syntax especially in C89 mode and report -Wvla-stack if it causes stack allocations.

Yeah, that's a different way of delineating than I was thinking originally and it's worth more thought. I was thinking the separation would be "this is a VLA" (for people who want to avoid all VLA stack allocations due to the security concerns) and "this is a portability concern" (for people who want to port to older language standards, C++, other compilers).

I think it is a good idea to separate -Wvla into -Wvla-portability(warnings on portability) and -Wvla-stack (warnings on stack allocations, security issue). Report -Wvla-portability if compilers implementations in practice don't support any vla syntax especially in C89 mode and report -Wvla-stack if it causes stack allocations.

This is largely the same conclusion I think I've come to as well. Add two new warning groups (we can bikeshed the names later): -Wvla-portability and -Wvla-stack-allocation grouped under the existing -Wvla. The behavior of -Wvla won't change, but users can opt into the kind of specific diagnostic they want. void func(int n, int array[n]); would trigger -Wvla-portability but not -Wvla-stack-allocation, and void func(int n) { int array[n]; } would trigger both diagnostics (though we could debate having it skip the -Wvla-portability warning whenever giving any other "you're using a VLA" kind of warning). @efriedma are you on board with that plan as well?

That sounds fine.