With this patch, declarations containing unknown types are parsed like declarations, not like expressions. This leads to better diagnostics and yields InvalidDecl in AST (thus DeclRefExpr's to them) which is good for compiler users and AST consumers.
Details
- Reviewers
sammccall hokein adamcz aaron.ballman - Group Reviewers
Restricted Project
Diff Detail
Event Timeline
https://github.com/llvm/llvm-project/issues/58355 corresponding github issue with current clang behavior.
Thank you for the changes! One thing you should add is a release note so users know there's been a diagnostic improvement.
clang/lib/Parse/ParseDecl.cpp | ||
---|---|---|
5384–5385 ↗ | (On Diff #471751) | This is a pretty unsafe function to call from arbitrary parsing contexts. We should be asserting that the parser is in a reasonable state to even ask this question; alternatively, we could make this a static function that accepts a Parser& object. |
5388 ↗ | (On Diff #471751) | Why only __attribute__ and not __declspec? [[]] attributes as well? |
5392–5394 ↗ | (On Diff #471751) | Looking for *, &, and && will help catch some cases... but it's not really going to help for situations like unknown const * or other qualifiers... |
5425 ↗ | (On Diff #471751) | Why is ObjC exempted from this? I need to think about this a whole lot more. On the one hand, I like the changes happening in the test cases; those look like good changes to me. But on the other hand, this function is called in a fair number of places to make parsing decisions and I'm not convinced that speculative answers are the best way to go from this interface. I need to see if I can find any situations where this is called and we'd get worse parsing behavior (or incorrect parsing behavior!). |
clang/test/SemaOpenCL/invalid-pipes-cl1.2.cl | ||
28–29 |
clang/lib/Parse/ParseDecl.cpp | ||
---|---|---|
5392–5394 ↗ | (On Diff #471751) | Actually, this Parser::isDeclarationSpecifier is never called in case of C++. So I am not sure if we need to cover C++ specific constructs or not. But unknown const * is a good catch, I missed it for some reason. I will update the patch soon. |
5425 ↗ | (On Diff #471751) | There is weird behavior in case of ObjC range-based for loop. For example, in for (NSString* key in xyz) token for in keyword is of type Identifier by the call of Parser::isDeclarationSpecifier. So I decided to exempt it in first version and discuss it in review. |
I admire the goals of this patch, but I think it needs some explanation "why we expect this to be roughly correct/safe", rather than just trying to think of cases that might go wrong and patch them.
The test coverage for incorrect code is not good, I'd expect to need to add some new cases here. (And if there are benefits to the AST, we'd want to test those).
clang/lib/Parse/ParseDecl.cpp | ||
---|---|---|
5425 ↗ | (On Diff #471751) | This looks very suspicious to me, for example in a * b;. isDeclarationSpecifier() is going to return true when pointing at a. In fact a may be either a type or an expression here. Even if this right now this function never gets called for that case, it's not obvious why it wouldn't be in the future, or wouldn't be called for similar cases. |
clang/lib/Parse/ParseDecl.cpp | ||
---|---|---|
5425 ↗ | (On Diff #471751) | The property I relied on is that token a in a * b; gets annotated before and default branch works. But I understand your concern, I am gonna update the patch soon. |
Added test to show the advantage in AST dump.
Missing keywords are added in 'IsUnknownTypedefName', the function is static now.
'DisambiguatingWithExpression' check is added to narrow down the effect of the changes.
- Added test for AST dump of invalid C code
- Added testcase for diagnostics
- Some minor fixes
Not sure this is ready for review again, ignore me if not...
I'm still not sure why this is correct in principle. Without that, if someone finds a misparse 6 months from now I don't know how we determine where to fix it.
For example, this path is called from Parser::isKnownToBeDeclarationSpecifier() whose contract is Return true if we know that we are definitely looking at a decl-specifier... Return false if it's no a decl-specifier, or we're not sure. There doesn't seem to be any room for heuristics, unless we're going to change that contract and audit all the callers. If this *isn't* a heuristic (it sure looks like one) it needs some comments on why it's correct.
clang/test/Parser/recovery.c | ||
---|---|---|
105 | this diagnostic is worse than the old one (less accurate). (I think it's OK to trade off diagnostics quality if it's better on balance, maybe leave a comment?) |
FWIW, I agree with the concerns here; I touched on the same thing with my earlier comments.
clang/test/Parser/recovery.c | ||
---|---|---|
105 | Just to mention, this is exact behavior of clang for C++. |
Reasoning about invalid code is made now in specific parsing path not in general decision-making method.
clang/lib/Parse/ParseStmt.cpp | ||
---|---|---|
177–178 | What about other qualifiers? _Nullable and restrict and whatnot? | |
184–185 | Why are these pinned to C++? __declspec is used in C and shows up in the same syntactic locations as in C++, and [ seems like you're looking for [[]]-style attributes (perhaps?) but those also exist in C. What about parens? e.g., _BitNit(12) foo; // Oops, meant to type _BitInt(12) | |
clang/test/AST/ast-dump-invalid.c | ||
8–11 | I'd also like to see a test where the qualifier comes before the unknown type name: const unknown_t e; and for typos with multi-keyword types: unsinged long g; unsigned lnog h; and for situations that are just utter nonsense rather than types: unknown_t[12] i; unknown_t&&& j; ohhh.... and GNU statement expressions make this a bit interesting because that's a place where a declaration and an expression get a bit mixed up: ({ int val = 0, i = 12; vla * i; }) // Typo: vla should be val, but this is *not* a declaration, it's the expression determining the value of the statement expression. ({ int val = 0, i = 12; vla & i; }) // Similar but an interesting case for C++ | |
clang/test/Parser/recovery.c | ||
105 |
Less accurate how? It went from use of undeclared identifier 'unknown_t' to unknown type name 'unknown_t' and that seems accurate to me given that the undeclared identifier is written in a type position. | |
109 | There is a behavioral difference here that is plausibly a regression: we used to get use of undeclared identifier 'a' here as well but because we recover as-if a was declared as type int, that diagnostic has gone away. I think that's reasonable as the user was already told about the issue with the declaration of a. |
clang/test/Parser/recovery.c | ||
---|---|---|
105 | It *could* be a type position, or it could be an operand of multiplication - because the identifier is unresolved, we don't know. (We could discount the possibility it's a multiplication expression because the result would be discarded, but AFAICT that's not actually what's happening and it'll fire in other cases). Regardless, this seems OK to me (especially given we already make this guess for C++!) |
clang/test/Parser/recovery.c | ||
---|---|---|
105 | Ahhh okay, I see what you mean, thanks! I guess I've never seen a discarded multiplication in real world code, so my assumption was there's very little chance this diagnostic is less accurate instead of more accurate. Same goes for Foo & Bar which could be a bitwise AND or a declaration of reference in C++. |
clang/lib/Parse/ParseStmt.cpp | ||
---|---|---|
177–178 | I think restrict can not go just after type name. |
clang/lib/Parse/ParseStmt.cpp | ||
---|---|---|
177–178 | It can when you consider typedefs, as in: https://godbolt.org/z/hjK9f3xEc |
clang/lib/Parse/ParseStmt.cpp | ||
---|---|---|
184–185 | For _BitNit(12) foo; clang gives diagnostics about implicit declaration of function _BitNit'. This patch preserves the behaviour. I think it's OK, wdyt? |
@sammccall @aaron.ballman I am thinking about another solution with tentative parsing as implemented for C++ in Parser::isCXXSimpleDeclaration (which is eventually called from isDeclarationStatement()). This approach works well for C++. So I want to update this patch with this approach.
What about other qualifiers? _Nullable and restrict and whatnot?