This is an archive of the discontinued LLVM Phabricator instance.

Improve diagnostic message for misplaced square brackets
ClosedPublic

Authored by rtrieu on Feb 5 2014, 10:06 PM.

Details

Reviewers
rsmith
Summary

It is a common mistake to place square brackets before the identifier. However, when this occurs, Clang merely says "expected unqualified-id" pointing at the first open square bracket. Also, since the identifier isn't properly parsed, attempting to use it will result in additional undeclared identifier errors.

test.cc:

const char[] str = "foo";
char a = str[1];

Clang:

test.cc:1:11: error: expected unqualified-id
const char[] str = "foo";
          ^
test.cc:2:10: error: use of undeclared identifier 'str'
char a = str[1];
         ^
2 errors generated.

With patch:

test.cc:1:17: error: brackets go after the unqualified-id
const char[] str = "foo";
          ~~    ^
                []
1 error generated.

Currently, the parser will process the identifier then the brackets. Now, when a bracket appears instead of an identifier, the parser can process the brackets first, holding the location information. Then when it finds the identifier, it can will produce a better diagnostic to move the brackets and also include a fix-it hint. Other error messages have been updated to display in the correct spot. Since the identifier can now be attached to the Declarator, this will also remove the undeclared identifier errors, too.

Diff Detail

Event Timeline

gribozavr added inline comments.Feb 9 2014, 10:17 AM
test/Parser/brackets.c
6

-fdiagnostics-parseable-fixits?

rtrieu updated this revision to Unknown Object (????).Feb 12 2014, 4:04 PM

Add parseable fix-its to the test case.

rsmith added inline comments.Feb 23 2014, 9:04 PM
lib/Parse/ParseDecl.cpp
4717

In this case, I think you probably shouldn't allow square brackets. That is, if someone writes:

int foo::[3]

we shouldn't activate the special case.

4722–4729

It looks like this will behave strangely if the remaining declarator is not just a simple identifier:

int[3] (*p);

... will form an 'array of three pointers to int' type, rather than a 'pointer to array of three ints' type. Instead, to get this sort of thing right, I suggest you consume the array bound, then call ParseDeclarator, then add the array bound chunks to the declarator and issue your diagnostic.

To get the diagnostic location right, you could store the location of the '[' as the identifier location in the Declarator object or similar.

Oh, nice feature! I didn't look at the patch, but I tried to write this
patch a while ago in http://llvm.org/bugs/show_bug.cgi?id=11739 (which this
patch will fix) and there are a few comments on that bug that might be
interesting (including review comments from zygoloid, test cases, etc). So
if you haven't see that bug, maybe give it a look :-)

rtrieu updated this revision to Diff 9095.May 5 2014, 8:34 PM

Cache the results of the bracket parsing, then later add the info the Declarator to get the correct type.

rtrieu added inline comments.May 5 2014, 8:39 PM
lib/Parse/ParseDecl.cpp
4717

This patch doesn't affect the diagnostic messages for this code.

4722–4729

Switch to storing the bracket data, then letting the identifier parsing happen. Right before the diagnostic is emitted, the bracket data is added to the Declarator, which now treats:

int[3] (*p);

as

int (*p)[3];
rsmith added inline comments.May 6 2014, 6:45 PM
lib/Parse/ParseDecl.cpp
4838

This looks like it'll provide the wrong location if there were misplaced brackets.

4856

Have you considered putting the bracket parsing code way down here (and recursively calling back into this function after parsing them)?

rtrieu added inline comments.May 9 2014, 1:31 PM
lib/Parse/ParseDecl.cpp
4838

The recovery for brackets requires !D.mayOmitIdentifier() [line 4710] while this branch requires D.mayOmitIdentifier(). Either bracket recovery happens or this code executes, but not both.

4856

Currently, the error correction of:

int [1] a [2];

to

int a[1][2];

If this was done recursively, the bracket recovery would go to the end instead to:

int a[2][1];
rtrieu updated this revision to Diff 9367.May 13 2014, 3:44 PM

Switching to a recursive solution for misplaced brackets. Upon seeing a bracket before the identifier, parse the brackets and store the info. Call ParseDirectDeclarator again to continue parsing. After that returns, attach the bracket info the Declarator and emit the misplaced bracket error with fix-it hint to move the brackets after the identifier. Additionally, store the location of the beginning of the brackets so that other error messages may point to a better location.

+static SourceLocation getDiagLoc(Declarator &D, SourceLocation Loc) {

This is a very generic name for quite a specific behavior.
'getMissingDeclaratorIdLoc' or similar?

+ if (Tok.is(tok::l_square)) {

Maybe factor this out into a separate function.
'ParseMisplacedBracketDeclarator' or similar?

+ SavedBracketInfo SavedInfo;
+ SourceLocation Loc = Tok.getLocation();
+
+ while (Tok.is(tok::l_square)) {
+ ParseBracketDeclarator(D, &SavedInfo);

Is it possible to handle this by parsing into a new Declarator here (and
copying its chunks into D once we're done), instead of introducing a
separate SavedBracketInfo code path into ParseBracketDeclarator?

+ Something went wrong parsing the parens, in which case,
[...]
+
Adding back the paren info to the end of the Declarator.

Please say "brackets" rather than "parens"; we're pretty consistent about
using "parens" to mean () and not [].

+ SavedInfo->push_back(
+ {DeclaratorChunk::getArray(0, false, false, 0,
T.getOpenLocation(),

MSVC doesn't support this syntax.

rtrieu updated this revision to Diff 9726.May 22 2014, 7:00 PM

Moved the misplaced bracket logic from ParseDirectDeclarator into its own method.
Use a temporary Declarator to store the bracket location instead of using a custom storage object.
Renamed getDiagLoc to getMissingDeclaratorIdLoc.
Fixed up paren/bracket mistakes.

Looks great, ship it!

Wait... one more thing:

int [3] *x;

We should suggest adding parentheses here, because

int *x[3];

means the wrong thing.

rtrieu updated this revision to Diff 9892.May 28 2014, 4:11 PM

Add code to handle:

int [] *a;

This is handled by inserting parens in the token stream, then backtracking and have ParseDirectDeclarator handle the parens as normal through ParseParenDeclarator.

rtrieu updated this revision to Diff 10311.Jun 10 2014, 6:49 PM

Clean up brackets.cpp test case
Add new tests for reference variables
Use ParseDeclaratorInternal instead of ParseDirectDeclarator
Remove use of token stream mangling and backtracking
Skip the error message when the closing paren comes before the opening paren in the fix-it hint

The main difference is restarting the parse earlier with ParseDeclaratorInternal, which allows for parsing the leading '*' and '&' tokens in "int [1] *foo" and "int [2] &bar"

+ if (Tok.is(tok::l_square)) {
+ if (ParseMisplacedBracketDeclarator(D)) {
+ ParseMisplacedBracketDeclarator(D);
+
goto PastIdentifier;
+ return;

Remove comments here; maybe just

if (Tok.is(tok::l_square))

return ParseMisplacedBracketDeclarator(D);

+ Sometimes, parentheses are needed. Determine if they are needed by
+
looking at the current token. If they are needed, store the location
+ of the left parentheses in SuggestParenLoc.
+ SourceLocation SuggestParenLoc;
+ if (Tok.isNot(tok::identifier) && Tok.isNot(tok::l_paren) &&
+ Tok.isNot(tok::semi)) {
+ SuggestParenLoc = Tok.getLocation();
+ D.getName().EndLocation = SourceLocation();
+ }
+
+
Now that the brackets are removed, try parsing the declarator again.
+ ParseDeclaratorInternal(D, &Parser::ParseDirectDeclarator);

This logic for determining whether parens are needed doesn't seem to cover
all cases. In particular:

int [3] T::*p;

needs parens but starts with an identifier. Conversely,

int [3] ::n;

doesn't need parens, but gets them. Instead, how about something like:

SourceLocation LocBeforeDeclarator = Tok.getLocation();
ParseDeclaratorInternal(D, &Parser::ParseDirectDeclarator);

bool NeedParens = false;
if (D.getNumTypeObjects()) {

switch (D.getTypeObject(D.getNumTypeObjects())) {
  case DeclaratorChunk::Pointer:
  case DeclaratorChunk::Reference:
  case DeclaratorChunk::BlockPointer:
  case DeclaratorChunk::MemberPointer:
    NeedParens = true;
    break;
  case /* all the rest */
    break;
}

}

// ...

+ // Adding back the bracket info to the end of the Declarator.

If you've detected that you need parens, it'd be nice to fabricate a
DeclaratorChunk for the parens, just in case something downstream is making
assumptions about what DeclaratorChunks can exist based on how types can be
written.

+ The missing identifier would have been diagnosed in
ParseDirectDeclarator.
+
If parentheses are required, always suggest them.
+ if (!D.getIdentifier() && !SuggestParenLoc.isValid())
+ return;

It would be nice to assert !D.mayOmitIdentifier() somewhere in this
function, maybe right next to this check, since you're relying on that
here. (I could certainly imagine this code getting reused for parsing
abstract declarators at some point.)

+ SourceLocation EndBracketLoc =
+ TempDeclarator.getTypeObject(TempDeclarator.getNumTypeObjects() - 1)
+ .EndLoc;

Is this just 'TempDeclarator.getLocEnd()'?

+ When suggesting parentheses, the closing paren should not be before
the
+
opening paren.
+ if (SuggestParenLoc.isValid() && EndLoc < SuggestParenLoc)

This kind of SourceLocation comparison won't work in general (for instance,
it will do weird things if the two locations have different FileIDs). Does
this error case really happen?

Also, you still need to produce some kind of diagnostic on this code path.

rtrieu updated this revision to Diff 10378.Jun 12 2014, 9:22 PM
In D2712#21, @rsmith wrote:

+ if (Tok.is(tok::l_square)) {
+ if (ParseMisplacedBracketDeclarator(D)) {
+ ParseMisplacedBracketDeclarator(D);
+
goto PastIdentifier;
+ return;

Remove comments here; maybe just

Removed comments.

if (Tok.is(tok::l_square))
  return ParseMisplacedBracketDeclarator(D);

+ Sometimes, parentheses are needed. Determine if they are needed by
+
looking at the current token. If they are needed, store the location
+ of the left parentheses in SuggestParenLoc.
+ SourceLocation SuggestParenLoc;
+ if (Tok.isNot(tok::identifier) && Tok.isNot(tok::l_paren) &&
+ Tok.isNot(tok::semi)) {
+ SuggestParenLoc = Tok.getLocation();
+ D.getName().EndLocation = SourceLocation();
+ }
+
+
Now that the brackets are removed, try parsing the declarator again.
+ ParseDeclaratorInternal(D, &Parser::ParseDirectDeclarator);

This logic for determining whether parens are needed doesn't seem to cover
all cases. In particular:

int [3] T::*p;

needs parens but starts with an identifier. Conversely,

int [3] ::n;

doesn't need parens, but gets them. Instead, how about something like:

SourceLocation LocBeforeDeclarator = Tok.getLocation();
ParseDeclaratorInternal(D, &Parser::ParseDirectDeclarator);

bool NeedParens = false;
if (D.getNumTypeObjects()) {
  switch (D.getTypeObject(D.getNumTypeObjects())) {
    case DeclaratorChunk::Pointer:
    case DeclaratorChunk::Reference:
    case DeclaratorChunk::BlockPointer:
    case DeclaratorChunk::MemberPointer:
      NeedParens = true;
      break;
    case /* all the rest */
      break;
  }
}

Paren adding updated.

+ // Adding back the bracket info to the end of the Declarator.

If you've detected that you need parens, it'd be nice to fabricate a
DeclaratorChunk for the parens, just in case something downstream is making
assumptions about what DeclaratorChunks can exist based on how types can be
written.

Done. New paren DeclaratorChuck added right after figuring out they are needed.

+ The missing identifier would have been diagnosed in
ParseDirectDeclarator.
+
If parentheses are required, always suggest them.
+ if (!D.getIdentifier() && !SuggestParenLoc.isValid())
+ return;

It would be nice to assert !D.mayOmitIdentifier() somewhere in this
function, maybe right next to this check, since you're relying on that
here. (I could certainly imagine this code getting reused for parsing
abstract declarators at some point.)

!D.mayOmitIdentifier() asserted at the beginning of the function since this function assumed the existence of an identifier.

+ SourceLocation EndBracketLoc =
+ TempDeclarator.getTypeObject(TempDeclarator.getNumTypeObjects() - 1)
+ .EndLoc;

Is this just 'TempDeclarator.getLocEnd()'?

Yes. Changed.

+ When suggesting parentheses, the closing paren should not be before the
+
opening paren.
+ if (SuggestParenLoc.isValid() && EndLoc < SuggestParenLoc)

This kind of SourceLocation comparison won't work in general (for instance,
it will do weird things if the two locations have different FileIDs). Does
this error case really happen?

Changed it to detect that the Declarator end location changes after the call to ParseDeclaratorInternal. Something like "int [];" will trigger this case.

Also, you still need to produce some kind of diagnostic on this code path.

The ParseDeclaratorInternal should provide the error here.

rsmith accepted this revision.Jun 24 2014, 8:58 AM
rsmith added a reviewer: rsmith.

Looks great, with one tweak.

lib/Parse/ParseDecl.cpp
5646–5649

This looks unreachable. If nothing after the brackets has been parsed, then there can't be an identifier, so we should have bailed out already. Remove?

This revision is now accepted and ready to land.Jun 24 2014, 8:58 AM
rtrieu closed this revision.Jun 24 2014, 4:23 PM

Committed at r211641.

lib/Parse/ParseDecl.cpp
5646–5649

Removed.