This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Avoid indexing decls associated with friend decls.
ClosedPublic

Authored by ioeric on Jun 1 2018, 2:50 AM.

Details

Summary

These decls are sometime used as the canonical declarations (e.g. for go-to-def),
which seems to be bad.

  • friend decls that are not definitions should be ignored for indexing purposes
  • this means they should never be selected as canonical decl
  • if the friend decl is the only decl, then the symbol should not be indexed

Event Timeline

ioeric created this revision.Jun 1 2018, 2:50 AM
ilya-biryukov added inline comments.Jun 1 2018, 5:20 AM
clangd/index/SymbolCollector.cpp
293

Maybe move this closer to shouldFilterDecl()? We have similar filters there.
That would also mean we properly add the reference counts for friend declarations that get a normal declaration after their usage later.

Do I understand the intent of this change correctly?

  • friend decls that are not definitions should be ignored for indexing purposes
  • this means they should never be selected as canonical decl
  • if the friend decl is the only decl, then the symbol should not be indexed

if so, that makes sense to me. I think the comments could make this a little clearer, but it's not too bad.

clangd/index/SymbolCollector.cpp
297

this seems suspect, we're going to treat the third decl in friend X; X; friend X differently from that in X; friend X; friend X;.

Why? i.e. why is the inner check necessary and why does it treat the original (meaning first, I think) decl specially?

unittests/clangd/SymbolCollectorTests.cpp
816

Can you also test that:

  • if a friend decl (non-definition) comes first, followed by a non-friend decl (non-definition), then the decl *is* indexed. (maybe just drop the definition from foo, since it's otherwise the same as Y)
  • if a friend decl has a definition, and there is no other declaration, then the decl *is* indexed (and the friend decl is canonical)
ioeric added inline comments.Jun 1 2018, 5:35 AM
clangd/index/SymbolCollector.cpp
293

I didn't put this filter there because I think it's a bit more special than those filters in shouldFilterDecl. We check the OrigD and we could potentially replace D with OrigD. We could change shouldFilterDecl to handle that, but I'm not sure if it's worth it.

Reference counting for friend declarations is actually a bit tricky as USRs of the generated declarations might be ambiguous.

Consider the following exmaple:

namespace a {                
class A {};                  
namespace b { class B { friend class A; };  }  // b                          
} // a

I would expect the generated friend decl to be a::A, but it's actually a::b::A! So getting USR right is a bit tricky, and I think it's probably ok to ignore references in friend decls.

For reference, [namespace.memdef]p3:
"If the name in a friend declaration is neither qualified nor a template-id and the declaration is a function or an elaborated-type-specifier, the lookup to determine whether the entity has been previously declared shall not consider any scopes outside the innermost enclosing namespace.

ioeric edited the summary of this revision. (Show Details)Jun 1 2018, 6:15 AM
ioeric updated this revision to Diff 149446.Jun 1 2018, 6:18 AM
ioeric marked an inline comment as done.
ioeric edited the summary of this revision. (Show Details)
  • Addressed review comment.
clangd/index/SymbolCollector.cpp
297

this seems suspect, we're going to treat the third decl in friend X; X; friend X differently from that in X; friend X; friend X;.

I'm not very sure if I understand the problem. But I'll try to explain what would happen for these two examples.

  • For the first example, the first friend decl will be the canonical decl, and we would only index the second X since its OrigD is not in friend decl. Both the first and third friend decl will not be indexed.
  • For the second example, the first non-friend X will be the canonical decl, and all three occurrences will have the same D pointing to it. This probably means that the same X will be processed three times, but it's probably fine (we might want to optimize it).

Basically, D is always the canonical declaration in AST and OrigD is the declaration that the indexer is currently visiting. I agree it's confusing...

Why? i.e. why is the inner check necessary and why does it treat the original (meaning first, I think) decl specially?

The inner check handles the following case:

class X {
  friend void foo();
}
void foo();

There will be two occurrences of foo in the index:

  1. The friend decl, where D will be the canonical decl (i.e. friend foo) and OrigD will also be friend foo.
  2. The non-friend decl, where D will still be the canonical decl (i.e. friend foo) and OrigD is now the non-friend decl.
unittests/clangd/SymbolCollectorTests.cpp
816

Done. I have missed the case where friend decl is a definition. Thanks!

  • friend decls that are not definitions should be ignored for indexing purposes

This is not generally true IIUC. A friend declaration can be a reference, a declaration or a definition.

int foo(int);
int bar(int, int);
class ExistingFwdCls;

class X {
  friend class ExistingFwdCls; // a reference and a declaration.
  friend class NewClass; // a new declaration.
  friend int foo(int); // a reference and a declaration.
  friend int baz(int, int, int); // a new declaration.
};

class Y {
  friend class ::ExistingFwdCls; // qualified => just a reference.
  friend int ::bar(int a, int b); // qualified => just a reference.
  friend int foo(int) { // a reference  and a definition
    return 100;
  }
};

Note that friend functions with bodies are probably ok as canonical declaration, as they are often the only declaration, e.g.

class Foo {
  friend bool operator < (Foo const& lhs, Foo const&lhs) {
   return false;
  }
};
clangd/index/SymbolCollector.cpp
293

Reference counting for friend declarations is actually a bit tricky as USRs of the generated declarations might be ambiguous.

This seems like an obvious bug in the USRs that we should fix. WDYT?

sammccall added inline comments.Jun 1 2018, 6:43 AM
clangd/index/SymbolCollector.cpp
297

Basically, D is always the canonical declaration in AST and OrigD is the declaration that the indexer is currently visiting. I agree it's confusing...

Whoops, I had this backwards. So I guess I mean the outer check.
Fundamentally my question is: in an *arbitrary* list of redecls of a symbol, what is special about the canonical declaration (D, which is just the first in the list) that we particularly care whether it's a friend?
I would expect that either we're ignoring the other redecls, or we're looping over all redecls.

And here, I think we could just skip all friend decls (that are not definitions), regardless of what D->getFriendObjectKind() is. If we have other decls, the symbol was indexed already. If we do not, then we don't want to index it anyway.

ioeric added inline comments.Jun 1 2018, 6:44 AM
clangd/index/SymbolCollector.cpp
293

Sorry, I think I was not clear... I think this is intended according to the standard. So in the example, the qualified name of the friend decl a::b::A is, although confusing, correct, and o the actual problem is not with the USR.

See [namespace.memdef]p3:
"If the name in a friend declaration is neither qualified nor a template-id and the declaration is a function or an elaborated-type-specifier, the lookup to determine whether the entity has been previously declared shall not consider any scopes outside the innermost enclosing namespace.

ioeric updated this revision to Diff 149465.Jun 1 2018, 7:32 AM
  • Clarify.
clangd/index/SymbolCollector.cpp
297

Fundamentally my question is: in an *arbitrary* list of redecls of a symbol, what is special about the canonical declaration (D, which is just the first in the list) that we particularly care whether it's a friend?

It's because we have chosen to use D as a symbol's canonical declaration (line 332). Here we want to override the canonical declaration D when it's a friend decl, so that the first non-friend decl would become the canonical declaration of the symbol.

And here, I think we could just skip all friend decls (that are not definitions), regardless of what D->getFriendObjectKind() is.

OK. I think I understand where the confusion came from now. The override of D should've been a separate check:

if (OrigD is non-definition friend)
  skip;
if (D is friend decl)
  D = OrigD;
sammccall accepted this revision.Jun 1 2018, 9:23 AM

This looks OK, the asymmetry still seems a little odd to me and could be reduced a little.

(Please also resolve Ilya's question about references with him, I don't have a strong opinion)

clangd/index/SymbolCollector.cpp
303

Ah, now I understand - we're picking another to use as canonical. But the exception for definitions should apply here too. And there's nothing special about *OrigD* that makes it a good pick for canonical.

For determinism, can we instead iterate over the redecls of D and pick the first one that's not a friend or is a definition?

(I'd pull that check out into a function shouldSkipDecl and rename the existing one shouldSkipSymbol, but up to you)

This revision is now accepted and ready to land.Jun 1 2018, 9:23 AM
ioeric updated this revision to Diff 149526.Jun 1 2018, 12:14 PM
  • Make canonical decls determinstic.
ioeric updated this revision to Diff 149528.Jun 1 2018, 12:16 PM
  • Remove debug message.
  • friend decls that are not definitions should be ignored for indexing purposes

This is not generally true IIUC. A friend declaration can be a reference, a declaration or a definition.

int foo(int);
int bar(int, int);
class ExistingFwdCls;

class X {
  friend class ExistingFwdCls; // a reference and a declaration.
  friend class NewClass; // a new declaration.
  friend int foo(int); // a reference and a declaration.
  friend int baz(int, int, int); // a new declaration.
};

class Y {
  friend class ::ExistingFwdCls; // qualified => just a reference.
  friend int ::bar(int a, int b); // qualified => just a reference.
  friend int foo(int) { // a reference  and a definition
    return 100;
  }
};

Note that friend functions with bodies are probably ok as canonical declaration, as they are often the only declaration, e.g.

class Foo {
  friend bool operator < (Foo const& lhs, Foo const&lhs) {
   return false;
  }
};

Sorry for the late response Ilya. I was trying to test these cases. So, with the current change, if a real "canonical" declaration comes before the friend decl, then the reference will still be recorded (I've added a test case for this). Would this address your concern?

clangd/index/SymbolCollector.cpp
303

But the exception for definitions should apply here too

I'm not sure why this is needed. Do you have an example?

And there's nothing special about *OrigD* that makes it a good pick for canonical.

I think OrigD is "special" because we have checked that it's not an uninteresting friend decl.

For determinism, can we instead iterate over the redecls of D and pick the first one that's not a friend or is a definition?

This seems to be the most ideal option, although, in practice, it's a bit hard to check if each redecl is a definition, as we don't have Roles for them. To make it more deterministic, I added a map from clang's canonical decl to clangd's "canonical" decl so that we could be sure that all redecls would have the same D. WDYT?

(I'd pull that check out into a function shouldSkipDecl and rename the existing one shouldSkipSymbol, but up to you)

I gave this a try, but it couldn't find a very good abstraction. Happy to chat more next week.

Sorry for the late response Ilya. I was trying to test these cases. So, with the current change, if a real "canonical" declaration comes before the friend decl, then the reference will still be recorded (I've added a test case for this). Would this address your concern?

No worries about the late response.
I think we should not rely on an extra being before the friend declarations, i.e. we should count a reference either way.
The compiler does not enforce that and there's definitely lots of code where friend is the first decl. But it's totally up to you if you actually want to support this or not, of course. I do agree that this isn't ever going to be a major issue.

sammccall accepted this revision.Jun 4 2018, 4:24 AM
sammccall added inline comments.
clangd/index/SymbolCollector.cpp
302

I'd rephrase the second sentence as "Use OrigD instead, unless we've already picked a replacement for D". up to you

ioeric added a comment.Jun 4 2018, 4:29 AM

Sorry for the late response Ilya. I was trying to test these cases. So, with the current change, if a real "canonical" declaration comes before the friend decl, then the reference will still be recorded (I've added a test case for this). Would this address your concern?

No worries about the late response.
I think we should not rely on an extra being before the friend declarations, i.e. we should count a reference either way.
The compiler does not enforce that and there's definitely lots of code where friend is the first decl. But it's totally up to you if you actually want to support this or not, of course. I do agree that this isn't ever going to be a major issue.

I think it would be pretty hard (or impossible) to get reference counting for friend decls correct for all cases given how friend decls work in C++ (e.g. friend decls and original decls might not even have the same name in AST). It would be nice to be able to handle all cases, but I do feel solving them is out of the scope of this patch.

ioeric updated this revision to Diff 149721.Jun 4 2018, 4:31 AM
ioeric marked an inline comment as done.
  • Addressed review comment.
ioeric updated this revision to Diff 149723.Jun 4 2018, 4:34 AM
  • Revert unintended changes.
This revision was automatically updated to reflect the committed changes.