This is an archive of the discontinued LLVM Phabricator instance.

[clang][AST] Handle variable declaration with unknown typedef in C
Needs ReviewPublic

Authored by urazoff on Oct 29 2022, 6:20 AM.

Details

Reviewers
sammccall
hokein
adamcz
aaron.ballman
Group Reviewers
Restricted Project
Summary
With this patch, declarations containing unknown types are parsed
like declarations, not like expressions. This leads to better
diagnostics and yields InvalidDecl in AST (thus DeclRefExpr's to
them) which is good for compiler users and AST consumers.

Diff Detail

Event Timeline

urazoff created this revision.Oct 29 2022, 6:20 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 29 2022, 6:20 AM
urazoff requested review of this revision.Oct 29 2022, 6:20 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 29 2022, 6:20 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

https://github.com/llvm/llvm-project/issues/58355 corresponding github issue with current clang behavior.

aaron.ballman added reviewers: Restricted Project, aaron.ballman.Oct 31 2022, 11:39 AM
aaron.ballman removed a subscriber: aaron.ballman.

Thank you for the changes! One thing you should add is a release note so users know there's been a diagnostic improvement.

clang/lib/Parse/ParseDecl.cpp
5384–5385 ↗(On Diff #471751)

This is a pretty unsafe function to call from arbitrary parsing contexts. We should be asserting that the parser is in a reasonable state to even ask this question; alternatively, we could make this a static function that accepts a Parser& object.

5388 ↗(On Diff #471751)

Why only __attribute__ and not __declspec? [[]] attributes as well?

5392–5394 ↗(On Diff #471751)

Looking for *, &, and && will help catch some cases... but it's not really going to help for situations like unknown const * or other qualifiers...

5425 ↗(On Diff #471751)

Why is ObjC exempted from this?

I need to think about this a whole lot more. On the one hand, I like the changes happening in the test cases; those look like good changes to me. But on the other hand, this function is called in a fair number of places to make parsing decisions and I'm not convinced that speculative answers are the best way to go from this interface. I need to see if I can find any situations where this is called and we'd get worse parsing behavior (or incorrect parsing behavior!).

clang/test/SemaOpenCL/invalid-pipes-cl1.2.cl
28–29
urazoff added inline comments.Nov 7 2022, 3:34 AM
clang/lib/Parse/ParseDecl.cpp
5392–5394 ↗(On Diff #471751)

Actually, this Parser::isDeclarationSpecifier is never called in case of C++. So I am not sure if we need to cover C++ specific constructs or not. But unknown const * is a good catch, I missed it for some reason. I will update the patch soon.

5425 ↗(On Diff #471751)

There is weird behavior in case of ObjC range-based for loop. For example, in for (NSString* key in xyz) token for in keyword is of type Identifier by the call of Parser::isDeclarationSpecifier. So I decided to exempt it in first version and discuss it in review.

I admire the goals of this patch, but I think it needs some explanation "why we expect this to be roughly correct/safe", rather than just trying to think of cases that might go wrong and patch them.

The test coverage for incorrect code is not good, I'd expect to need to add some new cases here. (And if there are benefits to the AST, we'd want to test those).

clang/lib/Parse/ParseDecl.cpp
5425 ↗(On Diff #471751)

This looks very suspicious to me, for example in a * b;. isDeclarationSpecifier() is going to return true when pointing at a. In fact a may be either a type or an expression here.

Even if this right now this function never gets called for that case, it's not obvious why it wouldn't be in the future, or wouldn't be called for similar cases.

urazoff added inline comments.Nov 8 2022, 1:59 AM
clang/lib/Parse/ParseDecl.cpp
5425 ↗(On Diff #471751)

The property I relied on is that token a in a * b; gets annotated before and default branch works. But I understand your concern, I am gonna update the patch soon.

urazoff updated this revision to Diff 475370.Nov 15 2022, 1:49 AM

Added test to show the advantage in AST dump.
Missing keywords are added in 'IsUnknownTypedefName', the function is static now.
'DisambiguatingWithExpression' check is added to narrow down the effect of the changes.

urazoff updated this revision to Diff 476766.Nov 20 2022, 11:26 AM
  • Added test for AST dump of invalid C code
  • Added testcase for diagnostics
  • Some minor fixes

Not sure this is ready for review again, ignore me if not...

I'm still not sure why this is correct in principle. Without that, if someone finds a misparse 6 months from now I don't know how we determine where to fix it.

For example, this path is called from Parser::isKnownToBeDeclarationSpecifier() whose contract is Return true if we know that we are definitely looking at a decl-specifier... Return false if it's no a decl-specifier, or we're not sure. There doesn't seem to be any room for heuristics, unless we're going to change that contract and audit all the callers. If this *isn't* a heuristic (it sure looks like one) it needs some comments on why it's correct.

clang/test/Parser/recovery.c
105

this diagnostic is worse than the old one (less accurate).

(I think it's OK to trade off diagnostics quality if it's better on balance, maybe leave a comment?)

Not sure this is ready for review again, ignore me if not...

I'm still not sure why this is correct in principle. Without that, if someone finds a misparse 6 months from now I don't know how we determine where to fix it.

For example, this path is called from Parser::isKnownToBeDeclarationSpecifier() whose contract is Return true if we know that we are definitely looking at a decl-specifier... Return false if it's no a decl-specifier, or we're not sure. There doesn't seem to be any room for heuristics, unless we're going to change that contract and audit all the callers. If this *isn't* a heuristic (it sure looks like one) it needs some comments on why it's correct.

FWIW, I agree with the concerns here; I touched on the same thing with my earlier comments.

urazoff added inline comments.Nov 29 2022, 10:58 AM
clang/test/Parser/recovery.c
105

Just to mention, this is exact behavior of clang for C++.

urazoff updated this revision to Diff 478667.Nov 29 2022, 11:50 AM

Reasoning about invalid code is made now in specific parsing path not in general decision-making method.

aaron.ballman added inline comments.Nov 30 2022, 6:17 AM
clang/lib/Parse/ParseStmt.cpp
177–178

What about other qualifiers? _Nullable and restrict and whatnot?

184–185

Why are these pinned to C++? __declspec is used in C and shows up in the same syntactic locations as in C++, and [ seems like you're looking for [[]]-style attributes (perhaps?) but those also exist in C.

What about parens? e.g.,

_BitNit(12) foo; // Oops, meant to type _BitInt(12)
clang/test/AST/ast-dump-invalid.c
8–11

I'd also like to see a test where the qualifier comes before the unknown type name:

const unknown_t e;

and for typos with multi-keyword types:

unsinged long g;
unsigned lnog h;

and for situations that are just utter nonsense rather than types:

unknown_t[12] i;
unknown_t&&& j;

ohhh.... and GNU statement expressions make this a bit interesting because that's a place where a declaration and an expression get a bit mixed up:

({ int val = 0, i = 12; vla * i; }) // Typo: vla should be val, but this is *not* a declaration, it's the expression determining the value of the statement expression.
({ int val = 0, i = 12; vla & i; }) // Similar but an interesting case for C++
clang/test/Parser/recovery.c
105

this diagnostic is worse than the old one (less accurate).

Less accurate how? It went from use of undeclared identifier 'unknown_t' to unknown type name 'unknown_t' and that seems accurate to me given that the undeclared identifier is written in a type position.

109

There is a behavioral difference here that is plausibly a regression: we used to get use of undeclared identifier 'a' here as well but because we recover as-if a was declared as type int, that diagnostic has gone away. I think that's reasonable as the user was already told about the issue with the declaration of a.

sammccall added inline comments.Nov 30 2022, 8:06 AM
clang/test/Parser/recovery.c
105

It *could* be a type position, or it could be an operand of multiplication - because the identifier is unresolved, we don't know.

(We could discount the possibility it's a multiplication expression because the result would be discarded, but AFAICT that's not actually what's happening and it'll fire in other cases).

Regardless, this seems OK to me (especially given we already make this guess for C++!)

aaron.ballman added inline comments.Nov 30 2022, 8:15 AM
clang/test/Parser/recovery.c
105

Ahhh okay, I see what you mean, thanks! I guess I've never seen a discarded multiplication in real world code, so my assumption was there's very little chance this diagnostic is less accurate instead of more accurate. Same goes for Foo & Bar which could be a bitwise AND or a declaration of reference in C++.

urazoff added inline comments.Dec 4 2022, 10:46 AM
clang/lib/Parse/ParseStmt.cpp
177–178

I think restrict can not go just after type name.

aaron.ballman added inline comments.Dec 5 2022, 11:46 AM
clang/lib/Parse/ParseStmt.cpp
177–178

It can when you consider typedefs, as in: https://godbolt.org/z/hjK9f3xEc

urazoff added inline comments.Dec 14 2022, 12:38 PM
clang/lib/Parse/ParseStmt.cpp
184–185

For _BitNit(12) foo; clang gives diagnostics about implicit declaration of function _BitNit'. This patch preserves the behaviour. I think it's OK, wdyt?

@sammccall @aaron.ballman I am thinking about another solution with tentative parsing as implemented for C++ in Parser::isCXXSimpleDeclaration (which is eventually called from isDeclarationStatement()). This approach works well for C++. So I want to update this patch with this approach.