This is an archive of the discontinued LLVM Phabricator instance.

Fix heuristics skipping invalid ctor-initializers with C++11
ClosedPublic

Authored by ogoffart on Jun 19 2016, 2:37 AM.

Details

Summary

Use better heuristics to detect if a '{' might be the start of the
constructor body or not. Especially when there is a completion token.

Fix the test 'test/CodeCompletion/ctor-initializer.cpp ' when clang defaults to c++11

Replaces http://reviews.llvm.org/D21497

Diff Detail

Repository
rL LLVM

Event Timeline

ogoffart updated this revision to Diff 61193.Jun 19 2016, 2:37 AM
ogoffart retitled this revision from to Fix heuristics skipping invalid ctor-initializers with C++11.
ogoffart updated this object.
ogoffart added reviewers: rsmith, probinson.
ogoffart added a reviewer: cfe-commits.

Ping.

The problem i'm fixing here is how we recover invalid code in the ctor-init part as we skip the function body.
In particular, we want to know if the '{' is the begining of the body or not. In C++03, we always consider it as the beginng of the body. The problem was that in C++11 we don't, making the code skip too much, causing worse parse error later.

So what this patch is doing is finding heuristics to know if the '{' is starting a function body or not.
The rules are the following: If we are not in a template argument, anf that the previous tokens are not an identifier, or a > , then it is much more likely to be the function body. We verify that further by checking that the token after the matching '}'

I also changed the code to just ignore the code_completion token at this point. The previous code was making it thinking that it would then be a template argument. But this is not likely.

rsmith edited edge metadata.Aug 24 2016, 1:17 PM

Please produce patches with more lines of context in future; phabricator only lets us comment on lines that are included in the patch, and in this case some of the relevant parts of the function are not in the context. (The equivalent of diff -U1000 is a common approach for this.)

Taking a step back, I wonder whether we have the right strategy overall for code completion within in-class mem-initializer-lists. The code after the code completion token is quite plausibly not even brace-balanced, in a case where you're writing a new constructor. Consider this completion:

struct Foo {
  Foo() : some_long_x(0), some_|
  int some_long_x, some_long_y;
};

Here, we ought to be able to complete "some_long_y", but we need to recognize that the next token is not part of the function definition in order to do that (and then we need to not try to consume a function body once we're done with the initializer). And conversely when completing here:

struct Foo {
  Foo() : some_long_x(0), some_| {}
  int some_long_x, some_long_y;
};

... we need to recognize that we do have a function body so that we can parse the members to find out what names we should complete.

However, this patch is an incremental improvement over what we already have, so I'm happy to go in this direction for now.

lib/Parse/ParseCXXInlineMethods.cpp
840 ↗(On Diff #61193)

Can you delete the check for kw_template here? We handle the template keyword above, and any time we actually hit this case we would expect template to be followed by one of ::, (, or {, which makes no sense.

847–851 ↗(On Diff #61193)

You should probably also handle the case of a comma immediately after the code completion token, for completions like this:

struct A {
  A() : new_mem|, existing_member() {}
  int new_member, existing_member;
};

... and likewise the case where the completion token is followed by an identifier, a ::, or a decltype, for completions like this:

struct A {
  A() : new_mem| existing_member() {}
  int new_member, existing_member;
};

In all those cases, I think you can just continue to pick up the rest of the initializers after the code completion token. (And if you continue from here in those cases, I think you can remove the handling of the code_completion token up on line 835, since this codepath will do the right thing in all those cases.)

894–897 ↗(On Diff #61193)

Please add a comment here indicating that if the previous token is not one of these kinds, we've encountered an error (because this mem-initializer is missing its initializer). This is not really obvious, and it's important because that's what makes it correct to use a heuristic to guess whether we've got a function body next.

ogoffart updated this revision to Diff 69223.Aug 25 2016, 4:42 AM
ogoffart edited edge metadata.

Made the requested changes

Regarding this:

struct Foo {
  Foo() : some_long_x(0), some_| {}
  int some_long_x, some_long_y;
};

That should work fine because the token before the { is the code completion token, not an identifier. This is basicaly tested by the test with CHECK-CC2.

However, this does not work because if the completion is within an identifier, the Lexer will abort by calling cutOffLexing from Lexer::LexIdentifier

Ping?
I guess i coud just commit it now.

This revision was automatically updated to reflect the committed changes.