This is an archive of the discontinued LLVM Phabricator instance.

[Attr] Apply GNU-style attributes to expression statements
ClosedPublic

Authored by vsavchenko on Dec 21 2020, 5:19 AM.

Details

Summary

Before this commit, expression statements could not be annotated
with statement attributes. Whenever parser found attribute, it
unconditionally assumed that it was followed by a declaration.
This not only doesn't allow expression attributes to have attributes,
but also produces spurious error diagnostics.

Diff Detail

Event Timeline

vsavchenko created this revision.Dec 21 2020, 5:19 AM
vsavchenko requested review of this revision.Dec 21 2020, 5:19 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 21 2020, 5:19 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

Fix block.c test

aaron.ballman requested changes to this revision.Dec 22 2020, 7:04 AM

Thank you for working on this, it's a problem that's annoyed me for a while but I've never found a satisfactory solution to. Unfortunately, it causes us to reject valid code (which is the same problem I kept running up against when I last tried).

clang/test/Index/blocks.c
8 ↗(On Diff #313316)

This change should not be necessary as the original code was valid (and I think was testing that validity, but could probably use a comment to make that explicit).

clang/test/Sema/address_spaces.c
12 ↗(On Diff #313316)

This breaks valid C89 code that uses implicit int (it is also a less helpful diagnostic, but that's secondary to the rejects valid bug).

clang/test/Sema/block-literal.c
44 ↗(On Diff #313316)

This is also valid C89 that should not be rejected.

clang/test/SemaTemplate/address_space-dependent.cpp
12–13 ↗(On Diff #313316)

I think this is a pretty unfortunate degradation of diagnostic wording. The code looks like a somewhat validish declaration, so the diagnostic about a missing type specifier is more helpful than a use of an undeclared identifier that the user is likely trying to declare.

This revision now requires changes to proceed.Dec 22 2020, 7:04 AM

@aaron.ballman I totally agree, but I also would like to understand.
__attribute__ is a GNU extension, right? Then why does it affect the grammar of C? I always thought that attributes should be somewhat transparent for parsers, but it looks like in this situation all compilers automatically assume that __attribute__ begins a declaration.
It is unclear to me why *x;, [[unknown]] *x; (dereference of x) and __attribute__((unknown)) *x; (declaration of int *) have different meanings.

Does it essentially mean that there is no way to implement statement attributes in C/Obj-C?
Because even if we introduce some heuristics for understanding that what we parsed is a declaration attribute -> what follows must be a declaration, attributes that can be applied to both declaration and statements will cause confusion.

@aaron.ballman I totally agree, but I also would like to understand.
__attribute__ is a GNU extension, right?

Correct.

Then why does it affect the grammar of C? I always thought that attributes should be somewhat transparent for parsers, but it looks like in this situation all compilers automatically assume that __attribute__ begins a declaration.

GNU attributes are kind of awful in that they "slide" around to whatever construct seems to make the most sense based on the given attribute. My guess is that this assumption is baked into compilers because otherwise there would be parsing ambiguities. e.g., __attribute__((attr)) int x = ..., y, z; could either be a declaration attribute that applies to x, y, and z or maybe it's a statement attribute that's meant to suppress a diagnostic on the initialization of x (that sort of thing), and it would take too much lookahead to make a determination based on the attribute name.

It is unclear to me why *x;, [[unknown]] *x; (dereference of x) and __attribute__((unknown)) *x; (declaration of int *) have different meanings.

C and C++ style attributes have a very well-defined meaning in terms of what they appertain to based on the syntactic location of the attribute rather than based on the name of the attribute. That's why [[attr]] *x; and __attribute__((attr))) *x behave differently.

Does it essentially mean that there is no way to implement statement attributes in C/Obj-C?
Because even if we introduce some heuristics for understanding that what we parsed is a declaration attribute -> what follows must be a declaration, attributes that can be applied to both declaration and statements will cause confusion.

That was really the crux of the problem I kept running into as well. I wouldn't say there's "no way" to do it, but it's not as trivial as it feels like it should be.

If we could tell that the attribute is a decl or stmt attribute, we could throw our hands up in that specific circumstance and say "this code is ambiguous", but then we're extending the syntactic locations of where GNU attributes can be written and what they mean (so we'd diverge from GCC). But then again, we've done that before (such as with allowing you to write a GNU-style attribute on a function definition, as in: https://godbolt.org/z/PGrfP3 (but this has caused users some pain, from what I understand).

Perhaps coordinating with the GCC folks on this may not be a bad idea. However, taking a step back -- what attributes would need this functionality (and couldn't be written on something within the expression statement)?

However, taking a step back -- what attributes would need this functionality (and couldn't be written on something within the expression statement)?

It is still good old suppress, it is very counter-intuitive when you put suppress and it causes some weird parse errors: https://godbolt.org/z/zzY64q

Maintain previous behavior unless preceeded by a statement attribute

However, taking a step back -- what attributes would need this functionality (and couldn't be written on something within the expression statement)?

It is still good old suppress, it is very counter-intuitive when you put suppress and it causes some weird parse errors: https://godbolt.org/z/zzY64q

Yeah, I kind of figured that might be the cause. I'm not 100% convinced (one way or the other) if the suppress attribute should get a GNU spelling. The [[]] spellings are available in all language modes (we have -fdouble-square-bracket-attributes to enable this) and don't run afoul of the "guess what this attribute appertains to" problem that GNU-style attributes do.

clang/lib/Parse/ParseStmt.cpp
214–218

I think you need to ensure there's at least one attribute before checking !Attrs.back().isStmtAttr() as this may cause problems if the user does something odd like __attribute__(()) int x; (I don't know if this will result in a valid GNUAttributeLoc with no attributes or not.)

I'm not certain that logic is perfect either. It would be pretty mysterious to handle these cases differently:

__attribute__((stmt_attr, decl_attr)) int a, b, c;
__attribute__((decl_attr, stmt_attr)) int x, y, z;

Refine condition for statement attributes

Yeah, I kind of figured that might be the cause. I'm not 100% convinced (one way or the other) if the suppress attribute should get a GNU spelling. The [[]] spellings are available in all language modes (we have -fdouble-square-bracket-attributes to enable this) and don't run afoul of the "guess what this attribute appertains to" problem that GNU-style attributes do.

I don't think that we can force our users into adding this flag. Also, Obj-C codebases already use a good amount of GNU-style attributes, so it is pretty natural there.

clang/lib/Parse/ParseStmt.cpp
214–218

Yep, my bad. I changed it so that ALL the attributes in the front should be statement attributes.

Yeah, I kind of figured that might be the cause. I'm not 100% convinced (one way or the other) if the suppress attribute should get a GNU spelling. The [[]] spellings are available in all language modes (we have -fdouble-square-bracket-attributes to enable this) and don't run afoul of the "guess what this attribute appertains to" problem that GNU-style attributes do.

I don't think that we can force our users into adding this flag.

You don't have to -- ObjC could always imply the flag is set (and I imagine ObjC will eventually update to C23 as the base language, where it'll be supported anyway).

Also, Obj-C codebases already use a good amount of GNU-style attributes, so it is pretty natural there.

IIRC, those attributes are often hidden behind macros, so I'm not certain how critical this is if the -fdouble-square-bracket-attributes feature was enabled in ObjC mode.

clang/lib/Parse/ParseStmt.cpp
214–218

I think this is a better approach but it still doesn't leave us a clean way to handle attributes that are both declaration and statement attributes. I'm nervous about this in the same way I'm nervous about GNU attributes sliding around in general -- it doesn't always work out the way a user might expect.

nomerge is a good example. You currently cannot write this code:

int f(void);

void func(void) {
  __attribute__((nomerge)) static int i = f();
}

because of the way GNU attributes start a declaration. However, I think this could also be considered a bug because nomerge on a statement (which this also is) will ensure that the call expression to f() is not merged during optimization.

If we now say "GNU attribute can now be used at the start of an arbitrary statement", we have to late parse the attribute until we know what kind of statement we have. Because the user could have written:

void func(void) {
  __attribute__((nomerge)) extern void f(void), g(int), h(void);
}

which is currently accepted today and would change meaning if we treated nomerge as a statement attribute.

clang/test/Parser/stmt-attributes.c
89

This is a semi-good example of the kind of behavior I was worried about -- nomerge can be applied to a declaration or a statement, just not *this* kind of declaration, and so the diagnostic is confusing.

vsavchenko added inline comments.Jan 5 2021, 1:26 AM
clang/lib/Parse/ParseStmt.cpp
214–218

However, I think this could also be considered a bug because nomerge on a statement (which this also is) will ensure that the call expression to f() is not merged during optimization.

If we look at clang/test/Sema/attr-nomerge.cpp, we can see that we have exactly the same behavior as in C++ with [[clang::nomerge]]:

[[clang::nomerge]] static int i = f(); // expected-error {{'nomerge' attribute cannot be applied to a declaration}}

If we now say "GNU attribute can now be used at the start of an arbitrary statement", we have to late parse the attribute until we know what kind of statement we have. Because the user could have written:

void func(void) {
  __attribute__((nomerge)) extern void f(void), g(int), h(void);
}

which is currently accepted today and would change meaning if we treated nomerge as a statement attribute.

I checked that as well and for this example before this patch, after this patch, and in C++ when using [[clang::nomerge]] we get exactly the same behavior: 3 errors 'nomerge' attribute cannot be applied to a declaration.

I guess I want to clarify one point here, after this patch the parser will not assume that statement always follows statement attributes. We simply turn off the assumption that what follows is a declaration, parser will simply determine whether it is a statement or a declaration, it can do it on its own without attribute's "help".

I guess I want to clarify one point here, after this patch the parser will not assume that statement always follows statement attributes. We simply turn off the assumption that what follows is a declaration, parser will simply determine whether it is a statement or a declaration, it can do it on its own without attribute's "help".

We used to say that if there's a GNU attribute (or we're parsing a declaration statement), what follows must be a declaration. Now we're using the attributes in the given attribute list to decide whether what follows is a declaration or a statement (if all of the attributes are statement attributes, we don't treat what follows as being a declaration unless we're in a declaration statement). Or am I misreading the code?

e.g.,

void foo(void) {
  __attribute__((fallthrough)) i = 12;
}

Before patch:

F:\llvm-project>llvm\out\build\x64-Debug\bin\clang.exe -fsyntax-only "F:\Aaron Ballman\Desktop\test.c"
F:\Aaron Ballman\Desktop\test.c:2:32: warning: type specifier missing, defaults to 'int' [-Wimplicit-int]
  __attribute__((fallthrough)) i = 12;
                               ^
F:\Aaron Ballman\Desktop\test.c:2:18: error: 'fallthrough' attribute cannot be applied to a declaration
  __attribute__((fallthrough)) i = 12;
                 ^             ~
1 warning and 1 error generated.

After patch:

F:\llvm-project>llvm\out\build\x64-Debug\bin\clang.exe -fsyntax-only "F:\Aaron Ballman\Desktop\test.c"
F:\Aaron Ballman\Desktop\test.c:2:32: error: use of undeclared identifier 'i'
  __attribute__((fallthrough)) i = 12;
                               ^
1 error generated.

So I think this will cause issues for the suppression attribute you're working on if we allow it to have a GNU spelling with valid C89 code like; __attribute__((suppress)) i = 12; or __attribute__((suppress)) g() { return 12; } which may be intended to suppress the "type specifier missing" diagnostic (somewhat amusingly, since the user could also just add the int to the declaration since they're changing the code anyway).

clang/lib/Parse/ParseStmt.cpp
214–218

If we look at clang/test/Sema/attr-nomerge.cpp, we can see that we have exactly the same behavior as in C++ with [[clang::nomerge]]

I don't read it quite the same way -- in C and C++, the syntactic position of the attribute defines what the attribute appertains to, unlike with GNU attributes. So the end result is the same, but for different reasons. In the case of the C or C++ attribute spelling, an attribute at that position applies to all of the declarations in the declaration group, and the declaration is of the wrong type (hence the error). In the case of the GNU attribute, we guess wrong whether the attribute appertains to the declaration(s) or the statement because we don't know which to pick solely based on the name of the attribute. We wind up picking the declaration version and it applies to the wrong kind of declaration so we get the error. However, it's defensible to argue that as part of the heuristic for determining what the attribute appertains to we should be checking that the attribute actually applies successfully -- e.g., we try it as a decl attr, find that it doesn't apply, then try again as a stmt attr. (Note, I'm not at all suggesting that we should *do* this work, just pointing out that the "sliding" behavior of GNU attributes makes this tricky to say what the right answer is.)

I checked that as well and for this example before this patch, after this patch, and in C++ when using [[clang::nomerge]] we get exactly the same behavior: 3 errors 'nomerge' attribute cannot be applied to a declaration.

Huh, that's odd, because for me, before this patch, I get no errors with that example: https://godbolt.org/z/fsGr5n

I guess I want to clarify one point here, after this patch the parser will not assume that statement always follows statement attributes. We simply turn off the assumption that what follows is a declaration, parser will simply determine whether it is a statement or a declaration, it can do it on its own without attribute's "help".

We used to say that if there's a GNU attribute (or we're parsing a declaration statement), what follows must be a declaration. Now we're using the attributes in the given attribute list to decide whether what follows is a declaration or a statement (if all of the attributes are statement attributes, we don't treat what follows as being a declaration unless we're in a declaration statement). Or am I misreading the code?

We do not decide whether it is declaration or statement. I do get that it is a matter of perspective, but right now I prefer to read it like this:
BEFORE: if we've seen a GNU-style attribute parse whatever comes next as a declaration
AFTER: if we've seen a GNU-style attribute except for the case when all of the parsed attributes are known to be statement attributes, parse whatever comes next as a declaration

So, essentially, when we see statement attributes only, attribute/declaration heuristic is canceled and we parse exactly the way we would've parsed as if no attributes present.

e.g.,

void foo(void) {
  __attribute__((fallthrough)) i = 12;
}

Before patch:

F:\llvm-project>llvm\out\build\x64-Debug\bin\clang.exe -fsyntax-only "F:\Aaron Ballman\Desktop\test.c"
F:\Aaron Ballman\Desktop\test.c:2:32: warning: type specifier missing, defaults to 'int' [-Wimplicit-int]
  __attribute__((fallthrough)) i = 12;
                               ^
F:\Aaron Ballman\Desktop\test.c:2:18: error: 'fallthrough' attribute cannot be applied to a declaration
  __attribute__((fallthrough)) i = 12;
                 ^             ~
1 warning and 1 error generated.

After patch:

F:\llvm-project>llvm\out\build\x64-Debug\bin\clang.exe -fsyntax-only "F:\Aaron Ballman\Desktop\test.c"
F:\Aaron Ballman\Desktop\test.c:2:32: error: use of undeclared identifier 'i'
  __attribute__((fallthrough)) i = 12;
                               ^
1 error generated.

In both situations, the code contains errors. I don't think that this is a valid argument, to be honest. Actually, the nature of this change is that it starts to parse strictly MORE cases.

So I think this will cause issues for the suppression attribute you're working on if we allow it to have a GNU spelling with valid C89 code like; __attribute__((suppress)) i = 12;

This is not a valid code if you remove __attribute__((suppress)). i = 12 without a prior declaration of i will cause the use of undeclared identifier 'i' error. You can try it with any compiler: https://godbolt.org/z/P43nxn.
I honestly don't see a scenario when the user has some piece of code that is parsed as a statement, then they mindfully add a statement attribute, and expect it to be parsed as a declaration after that.

or __attribute__((suppress)) g() { return 12; } which may be intended to suppress the "type specifier missing" diagnostic (somewhat amusingly, since the user could also just add the int to the declaration since they're changing the code anyway).

Again, we do not force the parser to parse whatever follows the statement attribute as a statement. __attribute__((suppress)) will not change how we parse things here.

I guess I want to clarify one point here, after this patch the parser will not assume that statement always follows statement attributes. We simply turn off the assumption that what follows is a declaration, parser will simply determine whether it is a statement or a declaration, it can do it on its own without attribute's "help".

We used to say that if there's a GNU attribute (or we're parsing a declaration statement), what follows must be a declaration. Now we're using the attributes in the given attribute list to decide whether what follows is a declaration or a statement (if all of the attributes are statement attributes, we don't treat what follows as being a declaration unless we're in a declaration statement). Or am I misreading the code?

We do not decide whether it is declaration or statement. I do get that it is a matter of perspective, but right now I prefer to read it like this:
BEFORE: if we've seen a GNU-style attribute parse whatever comes next as a declaration
AFTER: if we've seen a GNU-style attribute except for the case when all of the parsed attributes are known to be statement attributes, parse whatever comes next as a declaration

So, essentially, when we see statement attributes only, attribute/declaration heuristic is canceled and we parse exactly the way we would've parsed as if no attributes present.

I think this is a defensible design, but it still has the potential to change the parsing behavior in some circumstances and I want to make sure those circumstances are ones we know about. Note, I still think we should coordinate this change with the GCC folks. GCC has a lot of attributes that Clang does not have, so my concern with GCC is that Clang supports writing attributes in this way and GCC can't support it for some reason and it causes issues for an attribute we want to share between implementations. I don't have a specific case in mind that may cause an issue but they have a lot of attributes I'm unfamiliar with.

e.g.,

void foo(void) {
  __attribute__((fallthrough)) i = 12;
}

Before patch:

F:\llvm-project>llvm\out\build\x64-Debug\bin\clang.exe -fsyntax-only "F:\Aaron Ballman\Desktop\test.c"
F:\Aaron Ballman\Desktop\test.c:2:32: warning: type specifier missing, defaults to 'int' [-Wimplicit-int]
  __attribute__((fallthrough)) i = 12;
                               ^
F:\Aaron Ballman\Desktop\test.c:2:18: error: 'fallthrough' attribute cannot be applied to a declaration
  __attribute__((fallthrough)) i = 12;
                 ^             ~
1 warning and 1 error generated.

After patch:

F:\llvm-project>llvm\out\build\x64-Debug\bin\clang.exe -fsyntax-only "F:\Aaron Ballman\Desktop\test.c"
F:\Aaron Ballman\Desktop\test.c:2:32: error: use of undeclared identifier 'i'
  __attribute__((fallthrough)) i = 12;
                               ^
1 error generated.

In both situations, the code contains errors. I don't think that this is a valid argument, to be honest. Actually, the nature of this change is that it starts to parse strictly MORE cases.

My concerns are if there are cases where we would start to reject valid code based on the presence of a GNU statement attribute or where we'd regress diagnostics. This seemed like a case where we were regressing diagnostics and could potentially reject otherwise valid code (using a different attribute that wouldn't generate an error in the way fallthrough does). We previously were saying "aha, this is a declaration and the attribute cannot apply to it" which is strictly better than "aha, I have no idea what this identifier is doing here". However, I'm changing my opinion here -- see below.

So I think this will cause issues for the suppression attribute you're working on if we allow it to have a GNU spelling with valid C89 code like; __attribute__((suppress)) i = 12;

This is not a valid code if you remove __attribute__((suppress)). i = 12 without a prior declaration of i will cause the use of undeclared identifier 'i' error. You can try it with any compiler: https://godbolt.org/z/P43nxn.

Thank you for pointing out that my test case was bad in the first place -- I forgot that *block scope* variables cannot be implicit int, but *file scope* ones can (in C, not in C++): https://godbolt.org/z/7TjGT3

When I went back and re-ran my test with the variable at file scope, the behavior stayed the same with and without the patch:

F:\llvm-project>cat "F:\Aaron Ballman\Desktop\test.c"
__attribute__((fallthrough)) i = 12;
j = 12;

Before patch:

F:\llvm-project>llvm\out\build\x64-Debug\bin\clang.exe -std=c89 -fsyntax-only "F:\Aaron Ballman\Desktop\test.c"
F:\Aaron Ballman\Desktop\test.c:1:30: warning: declaration specifier missing, defaulting to 'int'
__attribute__((fallthrough)) i = 12;
                             ^
int
F:\Aaron Ballman\Desktop\test.c:1:16: error: 'fallthrough' attribute cannot be applied to a declaration
__attribute__((fallthrough)) i = 12;
               ^             ~
F:\Aaron Ballman\Desktop\test.c:2:1: warning: declaration specifier missing,
      defaulting to 'int'
j = 12;
^
int
2 warnings and 1 error generated.

After patch:

F:\llvm-project>llvm\out\build\x64-Debug\bin\clang.exe -std=c89 -fsyntax-only "F:\Aaron Ballman\Desktop\test.c"
F:\Aaron Ballman\Desktop\test.c:1:30: warning: declaration specifier missing, defaulting to 'int'
__attribute__((fallthrough)) i = 12;
                             ^
int
F:\Aaron Ballman\Desktop\test.c:1:16: error: 'fallthrough' attribute cannot be applied to a declaration
__attribute__((fallthrough)) i = 12;
               ^             ~
F:\Aaron Ballman\Desktop\test.c:2:1: warning: declaration specifier missing, defaulting to 'int'
j = 12;
^
int
2 warnings and 1 error generated.

Huttah!

I honestly don't see a scenario when the user has some piece of code that is parsed as a statement, then they mindfully add a statement attribute, and expect it to be parsed as a declaration after that.

There are attributes like nomerge which are both a declaration and a statement attribute, and there are other attributes like suppress that are statement attributes which can reasonably be written on a declaration statement. So the scenario I am worried about is when the code is being parsed as a declaration, the user adds one of these attributes and the decision changes to parse as a statement rather than a declaration. I'm not yet convinced we can't run into that situation with this change, but the implicit int thing was the only major edge case I could come up with and I've alllllmost convinced myself we can't run into an issue with it. I think our parser predicate test (on line 218 of your patch) for whether something is a declaration statement or not is what's helping avoid most of the parsing concerns I have because they usually will find some declaration specifier to help get the right answer.

FWIW, while testing various situations out, I think I found a Clang bug where we reject valid code (https://godbolt.org/z/ooqMvP) that *might* cause us problems if we apply this patch and ever fix that rejects valid bug. We should not be rejecting any of these functions, but we fail to note the implicit int with the declaration within foo() (note how the presence of the extern declaration specifier in quux() fixes things).

or __attribute__((suppress)) g() { return 12; } which may be intended to suppress the "type specifier missing" diagnostic (somewhat amusingly, since the user could also just add the int to the declaration since they're changing the code anyway).

Again, we do not force the parser to parse whatever follows the statement attribute as a statement. __attribute__((suppress)) will not change how we parse things here.

I agree that it shouldn't (in theory), but I've not been able to convince myself that it doesn't (in practice). I'm getting there though, and I appreciate your patience while we discuss the concerns. Concretely, I think we could use some more C test coverage for the various places where implicit int can appear in a declaration. Here are the places I could come up with off the top of my head: https://godbolt.org/z/6h3MTr

clang/test/Parser/stmt-attributes.c
89

In the latest patch, this diagnostic is back to being reasonable (but the test fails when I run it locally because it emits a different diagnostic).

I guess I want to clarify one point here, after this patch the parser will not assume that statement always follows statement attributes. We simply turn off the assumption that what follows is a declaration, parser will simply determine whether it is a statement or a declaration, it can do it on its own without attribute's "help".

We used to say that if there's a GNU attribute (or we're parsing a declaration statement), what follows must be a declaration. Now we're using the attributes in the given attribute list to decide whether what follows is a declaration or a statement (if all of the attributes are statement attributes, we don't treat what follows as being a declaration unless we're in a declaration statement). Or am I misreading the code?

We do not decide whether it is declaration or statement. I do get that it is a matter of perspective, but right now I prefer to read it like this:
BEFORE: if we've seen a GNU-style attribute parse whatever comes next as a declaration
AFTER: if we've seen a GNU-style attribute except for the case when all of the parsed attributes are known to be statement attributes, parse whatever comes next as a declaration

So, essentially, when we see statement attributes only, attribute/declaration heuristic is canceled and we parse exactly the way we would've parsed as if no attributes present.

I think this is a defensible design, but it still has the potential to change the parsing behavior in some circumstances and I want to make sure those circumstances are ones we know about. Note, I still think we should coordinate this change with the GCC folks. GCC has a lot of attributes that Clang does not have, so my concern with GCC is that Clang supports writing attributes in this way and GCC can't support it for some reason and it causes issues for an attribute we want to share between implementations. I don't have a specific case in mind that may cause an issue but they have a lot of attributes I'm unfamiliar with.

This is where we also pretty defensive and conservative, if we see any unknown attributes, we stick with the "attribute -> declaration" rule.

e.g.,

void foo(void) {
  __attribute__((fallthrough)) i = 12;
}

Before patch:

F:\llvm-project>llvm\out\build\x64-Debug\bin\clang.exe -fsyntax-only "F:\Aaron Ballman\Desktop\test.c"
F:\Aaron Ballman\Desktop\test.c:2:32: warning: type specifier missing, defaults to 'int' [-Wimplicit-int]
  __attribute__((fallthrough)) i = 12;
                               ^
F:\Aaron Ballman\Desktop\test.c:2:18: error: 'fallthrough' attribute cannot be applied to a declaration
  __attribute__((fallthrough)) i = 12;
                 ^             ~
1 warning and 1 error generated.

After patch:

F:\llvm-project>llvm\out\build\x64-Debug\bin\clang.exe -fsyntax-only "F:\Aaron Ballman\Desktop\test.c"
F:\Aaron Ballman\Desktop\test.c:2:32: error: use of undeclared identifier 'i'
  __attribute__((fallthrough)) i = 12;
                               ^
1 error generated.

In both situations, the code contains errors. I don't think that this is a valid argument, to be honest. Actually, the nature of this change is that it starts to parse strictly MORE cases.

My concerns are if there are cases where we would start to reject valid code based on the presence of a GNU statement attribute or where we'd regress diagnostics. This seemed like a case where we were regressing diagnostics and could potentially reject otherwise valid code (using a different attribute that wouldn't generate an error in the way fallthrough does). We previously were saying "aha, this is a declaration and the attribute cannot apply to it" which is strictly better than "aha, I have no idea what this identifier is doing here". However, I'm changing my opinion here -- see below.

So I think this will cause issues for the suppression attribute you're working on if we allow it to have a GNU spelling with valid C89 code like; __attribute__((suppress)) i = 12;

This is not a valid code if you remove __attribute__((suppress)). i = 12 without a prior declaration of i will cause the use of undeclared identifier 'i' error. You can try it with any compiler: https://godbolt.org/z/P43nxn.

Thank you for pointing out that my test case was bad in the first place -- I forgot that *block scope* variables cannot be implicit int, but *file scope* ones can (in C, not in C++): https://godbolt.org/z/7TjGT3

When I went back and re-ran my test with the variable at file scope, the behavior stayed the same with and without the patch:

F:\llvm-project>cat "F:\Aaron Ballman\Desktop\test.c"
__attribute__((fallthrough)) i = 12;
j = 12;

Before patch:

F:\llvm-project>llvm\out\build\x64-Debug\bin\clang.exe -std=c89 -fsyntax-only "F:\Aaron Ballman\Desktop\test.c"
F:\Aaron Ballman\Desktop\test.c:1:30: warning: declaration specifier missing, defaulting to 'int'
__attribute__((fallthrough)) i = 12;
                             ^
int
F:\Aaron Ballman\Desktop\test.c:1:16: error: 'fallthrough' attribute cannot be applied to a declaration
__attribute__((fallthrough)) i = 12;
               ^             ~
F:\Aaron Ballman\Desktop\test.c:2:1: warning: declaration specifier missing,
      defaulting to 'int'
j = 12;
^
int
2 warnings and 1 error generated.

After patch:

F:\llvm-project>llvm\out\build\x64-Debug\bin\clang.exe -std=c89 -fsyntax-only "F:\Aaron Ballman\Desktop\test.c"
F:\Aaron Ballman\Desktop\test.c:1:30: warning: declaration specifier missing, defaulting to 'int'
__attribute__((fallthrough)) i = 12;
                             ^
int
F:\Aaron Ballman\Desktop\test.c:1:16: error: 'fallthrough' attribute cannot be applied to a declaration
__attribute__((fallthrough)) i = 12;
               ^             ~
F:\Aaron Ballman\Desktop\test.c:2:1: warning: declaration specifier missing, defaulting to 'int'
j = 12;
^
int
2 warnings and 1 error generated.

Huttah!

Yay!

I honestly don't see a scenario when the user has some piece of code that is parsed as a statement, then they mindfully add a statement attribute, and expect it to be parsed as a declaration after that.

There are attributes like nomerge which are both a declaration and a statement attribute, and there are other attributes like suppress that are statement attributes which can reasonably be written on a declaration statement. So the scenario I am worried about is when the code is being parsed as a declaration, the user adds one of these attributes and the decision changes to parse as a statement rather than a declaration. I'm not yet convinced we can't run into that situation with this change, but the implicit int thing was the only major edge case I could come up with and I've alllllmost convinced myself we can't run into an issue with it. I think our parser predicate test (on line 218 of your patch) for whether something is a declaration statement or not is what's helping avoid most of the parsing concerns I have because they usually will find some declaration specifier to help get the right answer.

I actually thought about this. Essentially there are 3 interesting cases here:

  1. the user has a declaration that is parsed as a declaration only because of the attribute and wants to add a statement attribute
__attribute__((X)) i = 12;
// into
__attribute__((X, fallthrough)) i = 12;

We don't want to break working code, so we maintain the rule and parse it as a declaration.

  1. the user has a statement and wants to add a statement attribute
i = 12; // assignment
// into
__attribute__((stmt_attr)) i = 12; // still assignment

We don't use the rule and don't break the code, that's why we do it!

  1. the user has a declaration-or-statement attribute on a declaration that is parsed as declaration only because of this attribute
__attribute__((both-decl-and-stmt)) i = 12; // implicit int declaration
// after the Clang update
__attribute__((both-decl-and-stmt)) i = 12; // error!

That is a regression, but it looks like statement attributes are pretty rare, let alone declaration-or-statement attributes in C. Also, it might cause the problem only when the user relied on the "attribute -> declaration" rule. So, it seems pretty low chance, and we get a bit more consistency in what "statement attribute" means.

FWIW, while testing various situations out, I think I found a Clang bug where we reject valid code (https://godbolt.org/z/ooqMvP) that *might* cause us problems if we apply this patch and ever fix that rejects valid bug. We should not be rejecting any of these functions, but we fail to note the implicit int with the declaration within foo() (note how the presence of the extern declaration specifier in quux() fixes things).

Yikes, that's pretty bad!

or __attribute__((suppress)) g() { return 12; } which may be intended to suppress the "type specifier missing" diagnostic (somewhat amusingly, since the user could also just add the int to the declaration since they're changing the code anyway).

Again, we do not force the parser to parse whatever follows the statement attribute as a statement. __attribute__((suppress)) will not change how we parse things here.

I agree that it shouldn't (in theory), but I've not been able to convince myself that it doesn't (in practice). I'm getting there though, and I appreciate your patience while we discuss the concerns.

I actually want to thank you instead. It is a pretty tricky situation here with all this, so thank you for going so patiently from one case to another!

Concretely, I think we could use some more C test coverage for the various places where implicit int can appear in a declaration. Here are the places I could come up with off the top of my head: https://godbolt.org/z/6h3MTr

That's awesome, I'll incorporate that into the patch!

There are attributes like nomerge which are both a declaration and a statement attribute, and there are other attributes like suppress that are statement attributes which can reasonably be written on a declaration statement. So the scenario I am worried about is when the code is being parsed as a declaration, the user adds one of these attributes and the decision changes to parse as a statement rather than a declaration. I'm not yet convinced we can't run into that situation with this change, but the implicit int thing was the only major edge case I could come up with and I've alllllmost convinced myself we can't run into an issue with it. I think our parser predicate test (on line 218 of your patch) for whether something is a declaration statement or not is what's helping avoid most of the parsing concerns I have because they usually will find some declaration specifier to help get the right answer.

I actually thought about this. Essentially there are 3 interesting cases here:

  1. the user has a declaration that is parsed as a declaration only because of the attribute and wants to add a statement attribute
__attribute__((X)) i = 12;
// into
__attribute__((X, fallthrough)) i = 12;

We don't want to break working code, so we maintain the rule and parse it as a declaration.

  1. the user has a statement and wants to add a statement attribute
i = 12; // assignment
// into
__attribute__((stmt_attr)) i = 12; // still assignment

We don't use the rule and don't break the code, that's why we do it!

  1. the user has a declaration-or-statement attribute on a declaration that is parsed as declaration only because of this attribute
__attribute__((both-decl-and-stmt)) i = 12; // implicit int declaration
// after the Clang update
__attribute__((both-decl-and-stmt)) i = 12; // error!

That is a regression, but it looks like statement attributes are pretty rare, let alone declaration-or-statement attributes in C. Also, it might cause the problem only when the user relied on the "attribute -> declaration" rule. So, it seems pretty low chance, and we get a bit more consistency in what "statement attribute" means.

I agree with these three cases. I did come up with a fourth case: the user wants to attribute something that is a statement which declares things but is not a declaration statement. e.g., __attribute__((whatever)) [](){}(); (or similar using blocks instead of lambdas). We don't parse this construct on trunk (https://godbolt.org/z/3e83of) and I tried it with your patch applied and it still wouldn't parse, so it may make for a good test case to add.

FWIW, while testing various situations out, I think I found a Clang bug where we reject valid code (https://godbolt.org/z/ooqMvP) that *might* cause us problems if we apply this patch and ever fix that rejects valid bug. We should not be rejecting any of these functions, but we fail to note the implicit int with the declaration within foo() (note how the presence of the extern declaration specifier in quux() fixes things).

Yikes, that's pretty bad!

I filed https://bugs.llvm.org/show_bug.cgi?id=48672 for it.

or __attribute__((suppress)) g() { return 12; } which may be intended to suppress the "type specifier missing" diagnostic (somewhat amusingly, since the user could also just add the int to the declaration since they're changing the code anyway).

Again, we do not force the parser to parse whatever follows the statement attribute as a statement. __attribute__((suppress)) will not change how we parse things here.

I agree that it shouldn't (in theory), but I've not been able to convince myself that it doesn't (in practice). I'm getting there though, and I appreciate your patience while we discuss the concerns.

I actually want to thank you instead. It is a pretty tricky situation here with all this, so thank you for going so patiently from one case to another!

Huttah for quality code review practices! :-)

Concretely, I think we could use some more C test coverage for the various places where implicit int can appear in a declaration. Here are the places I could come up with off the top of my head: https://godbolt.org/z/6h3MTr

That's awesome, I'll incorporate that into the patch!

I'm going to see if I can run the patch against our internal corpus here at work to see if it shakes out any breakages to real world code. I've not tried this before, so I have no idea how long it will take before I get results back, but I'll let you know when I have them.

I'm going to see if I can run the patch against our internal corpus here at work to see if it shakes out any breakages to real world code. I've not tried this before, so I have no idea how long it will take before I get results back, but I'll let you know when I have them.

The tests finally finished over the weekend and came back with no new failures -- that gives me a lot more confidence that this doesn't accidentally change behavior in cases users are likely to hit (at least in C and C++). I think the next steps are to clean up the tests and then coordinate with the GCC folks to ensure they don't have any strong objections we should be aware of.

clang/test/Parser/stmt-attributes.c
5

Given that these are parser tests, I think you can drop the attribute name itself except in the cases where you explicitly want to test that an unknown attribute introduces the start of a declaration (or things where the attribute identifier matters for some reason). This will reduce the amount of expected-warning comments we need to add.

Also, because this change impacts C++ and ObjC/C++ as well, I think we should have some tests for language-specific constructs like lambdas, range-based for loops, blocks, etc.

I'm going to see if I can run the patch against our internal corpus here at work to see if it shakes out any breakages to real world code. I've not tried this before, so I have no idea how long it will take before I get results back, but I'll let you know when I have them.

The tests finally finished over the weekend and came back with no new failures -- that gives me a lot more confidence that this doesn't accidentally change behavior in cases users are likely to hit (at least in C and C++).

That is amazing, thanks for doing that! I'll make sure that it works for Obj-C/Obj-C++.

I think the next steps are to clean up the tests and then coordinate with the GCC folks to ensure they don't have any strong objections we should be aware of.

I'll clean up the tests! But, unfortunately, I couldn't proceed with GCC folks. I asked around and I'm not allowed to do that (company rules).

vsavchenko added inline comments.Jan 18 2021, 6:58 AM
clang/test/Parser/stmt-attributes.c
5

This test is mostly a copy-paste of a similar test for C++11 statement attributes.
I think that these cases show that GNU spelling for attributes doesn't break anything if we put it in front of these various statements. And you are absolutely right that we need to add more cases for C++ and Obj-C constructs. But I don't understand about 'unknown_attribute'. What should I replace it with?

aaron.ballman added inline comments.Jan 19 2021, 5:26 AM
clang/test/Parser/stmt-attributes.c
5

But I don't understand about 'unknown_attribute'. What should I replace it with?

You can use __attribute__(()) to test that we parse a GNU style attribute in a position without bothering to name any attributes. This will eliminate the "unknown attribute" sema warnings from the test without impacting the parser test functionality, and it'll make it more clear in what cases the specific attribute named impacts the parsing behavior.

e.g., __attribute__((unknown_attribute)) if (0) {} converted into __attribute__(()) if (0) {} should still test what we need tested.

vsavchenko added inline comments.Jan 19 2021, 5:27 AM
clang/test/Parser/stmt-attributes.c
5

Gotcha, will do!

I'll clean up the tests! But, unfortunately, I couldn't proceed with GCC folks. I asked around and I'm not allowed to do that (company rules).

Ah, that's unfortunate. When I get the chance, I'll try to send an email to the GCC lists. Do you mind if I pass the email by you first (privately) to ensure I'm not misstating your goals or anything from your patch?

I'll clean up the tests! But, unfortunately, I couldn't proceed with GCC folks. I asked around and I'm not allowed to do that (company rules).

Ah, that's unfortunate. When I get the chance, I'll try to send an email to the GCC lists. Do you mind if I pass the email by you first (privately) to ensure I'm not misstating your goals or anything from your patch?

That would be amazing, thank you!
Sure, sounds like a solid plan.

Add more tests

Add extension for backward compatibility checks

NoQ added inline comments.Jan 28 2021, 9:52 PM
clang/include/clang/Basic/Features.def
256 ↗(On Diff #319516)

People seem to occasionally write lit tests for those. There isn't much room for error there but i guess a tiny test wouldn't hurt.

Add check for extension

aaron.ballman accepted this revision.Feb 10 2021, 8:24 AM

I'll clean up the tests! But, unfortunately, I couldn't proceed with GCC folks. I asked around and I'm not allowed to do that (company rules).

Ah, that's unfortunate. When I get the chance, I'll try to send an email to the GCC lists. Do you mind if I pass the email by you first (privately) to ensure I'm not misstating your goals or anything from your patch?

I sent the email to the GCC lists (https://gcc.gnu.org/pipermail/gcc/2021-January/234747.html) and there were no follow-up responses after two weeks. I take the silence to mean there are no concerns from the GCC side of things, so this patch (finally!) LGTM. Thank you for sticking through the long process on this one!

This revision is now accepted and ready to land.Feb 10 2021, 8:24 AM

I'll clean up the tests! But, unfortunately, I couldn't proceed with GCC folks. I asked around and I'm not allowed to do that (company rules).

Ah, that's unfortunate. When I get the chance, I'll try to send an email to the GCC lists. Do you mind if I pass the email by you first (privately) to ensure I'm not misstating your goals or anything from your patch?

I sent the email to the GCC lists (https://gcc.gnu.org/pipermail/gcc/2021-January/234747.html) and there were no follow-up responses after two weeks. I take the silence to mean there are no concerns from the GCC side of things, so this patch (finally!) LGTM. Thank you for sticking through the long process on this one!

Yay, thanks!