In current indexing logic we get references to class itself when we see
a constructor/destructor which is only syntactically true. Semantically
this information is not correct. This patch marks that reference as
NameReference to let clients deal with it.
Details
- Reviewers
akyrtzi gribozavr nathawes benlangmuir - Commits
- rGe7eb27a9a0ed: [clang][Index] Mark references from Constructors and Destructors to class as…
rL355668: [clang][Index] Mark references from Constructors and Destructors to class as…
rC355668: [clang][Index] Mark references from Constructors and Destructors to class as…
Diff Detail
- Repository
- rC Clang
- Build Status
Buildable 28887 Build 28886: arc lint + arc unit
Event Timeline
These references were added to support using the index data for symbol rename. I.e. so that when you rename the class, you can use the index data to find all occurrences of its name, including its use in constructor/destructor declarations and references (hence the test cases in test/Index/Core/index-source.cpp). If you need to specifically find references of the class symbol, as opposed to its name, could we instead distinguish these cases with a more specific SymbolRole, e.g. NameReference as opposed to Reference, and filter them based on that instead?
These references were added to support using the index data for symbol rename.
Understood -- thank you for providing the context (the original commit that added this code didn't have the explanation why this reference was added). However, my suggestion would be to still take this patch. For symbol rename, my suggestion would be one of the following.
Option (1): Symbol rename is a complex operation that requires knowing many places where the class name appears. So I think it is fair that it would need to navigate to all such declarations using the semantic index -- find the class, then find its constructors, destructor, out-of-line functions, find derived classes, then find using declarations in derived classes, find calls to constructors, destructor etc. I don't think it is not the job of the core indexer API to hardcode all of these places as a "reference to the class". Different clients require different navigation to related symbols, and hardcoding all such navigation patterns in the indexer would create a messy index that is difficult to work with, since you'd have to filter out lots of stuff. Not to even mention the index size.
Option (2): A client that wants to hardcode all navigation patterns in the index can still do this -- in the IndexDataConsumer, it is possible to handle the ConstructorDecl as a reference to the class, if desired. The flow of the indexing information becomes more clear in my opinion: when there is a constructor decl in the source, the IndexDataConsumer gets one ConstructorDecl. Then the specific implementation of IndexDataConsumer decides to treat it also as a class reference, because it wants to support a specific approach to performing symbol renaming.
As is, the indexing information is misleading and surprising. When we see a constructor decl or a constructor call, there is no reference to the class at that point -- there is a reference to a constructor. I think index is supposed to expose semantic information, not just that there's a token in the source code that has the same spelling as the class name.
could we instead distinguish these cases with a more specific SymbolRole, e.g. NameReference as opposed to Reference, and filter them based on that instead?
It feels like a workaround to me, that also pushes the fix to the clients of the indexing API. The core of the problem still exists: the indexing information is surprising, and requires extra work on the client to make it not surprising (filtering out NameReferences).
I agree with Dmitri's reasoning. I believe this should be handled inside the renamers logic as mentioned in option 1. But if you believe this should not be the case, I am also happy to move logic of adding a reference to class on constructors&destructors to the consumer of renamer(as mentioned in option 2).
That was my patch (via Argyrios), so sorry about that!
However, my suggestion would be to still take this patch. For symbol rename, my suggestion would be one of the following.
Option (1): Symbol rename is a complex operation that requires knowing many places where the class name appears. So I think it is fair that it would need to navigate to all such declarations using the semantic index -- find the class, then find its constructors, destructor, out-of-line functions, find derived classes, then find using declarations in derived classes, find calls to constructors, destructor etc. I don't think it is not the job of the core indexer API to hardcode all of these places as a "reference to the class". Different clients require different navigation to related symbols, and hardcoding all such navigation patterns in the indexer would create a messy index that is difficult to work with, since you'd have to filter out lots of stuff. Not to even mention the index size.
I feel like the complexity you mention here is what makes it worthwhile to expose all these locations as occurrences of the type, and I'm fairly sure we're actually already doing that in all these locations (unless I've missed other patches removing them). Pushing this work onto all clients that want to provide automatic global rename functionality or to support users manually renaming by including these in their find-references results seems like a poorer outcome from a code sharing/maintenance point of view than continuing to provide it and requiring clients that don't want it to check the SymbolRole bits inside their IndexDataConsumer's handleDeclOccurrence (if (Roles & SymbolRole::NameReference) return true;). I don't think the index size is concerning; clients don't have to store these occurrences if they don't care about them.
Option (2): A client that wants to hardcode all navigation patterns in the index can still do this -- in the IndexDataConsumer, it is possible to handle the ConstructorDecl as a reference to the class, if desired. The flow of the indexing information becomes more clear in my opinion: when there is a constructor decl in the source, the IndexDataConsumer gets one ConstructorDecl. Then the specific implementation of IndexDataConsumer decides to treat it also as a class reference, because it wants to support a specific approach to performing symbol renaming.
I think the downside here is again the maintenance burden. This code would probably live downstream somewhere so whenever changes happen upstream that affect the code deriving these extra occurrences in clients that want it, or introduce new types of locations that those clients don't handle and miss, there's more duplicated effort updating/hunting down bugs. I also personally see these as legitimate occurrences of the type rather than hardcoded navigation patterns (see below), even though rename was the motivation here.
As is, the indexing information is misleading and surprising. When we see a constructor decl or a constructor call, there is no reference to the class at that point -- there is a reference to a constructor. I think index is supposed to expose semantic information, not just that there's a token in the source code that has the same spelling as the class name.
could we instead distinguish these cases with a more specific SymbolRole, e.g. NameReference as opposed to Reference, and filter them based on that instead?
It feels like a workaround to me, that also pushes the fix to the clients of the indexing API. The core of the problem still exists: the indexing information is surprising, and requires extra work on the client to make it not surprising (filtering out NameReferences).
I agree that most people think of it as purely being the constructor name, but I don't think the semantic connection to the class is really as loose as 'a token with the same spelling' and I think it's reasonable to report it. To resolve a constructor call, the compiler looks for a *type* with that spelling. It's not as if the user chooses to name their constructor with the same name as the type – the name lookup finds the type. One interesting consequence of this is that the compiler seems to complain if you try to reference the constructor as opposed to the type when you call it, e.g.
class A { public: A(int x) {} }; int main() { auto x = A(2); // ok auto y = A::A(2); // error: qualified reference to 'A' is a constructor name rather than a type in this context return 0; }
If we report these occurrences with a SymbolRole of NameReference, is it really that misleading as to why they are reported? The SymbolRole bit set tells you the kind of occurrence you’re dealing with (Decl, Definition, Call, etc) and already has an Implicit role for occurrences that don’t actually appear in the source code for cases like the implicitly generated default constructors as mentioned above. Some clients may find that surprising at first or undesirable for their particular use case, but it's still useful overall and ultimately understandable IMO.
Also taking a step back a bit, what were the specific concerns / original motivation to change the existing behavior? Is it just that these references don't match what you expected the index to model, or their contribution to idex size, or something else? I kind of just made my own assumptions and went with it, sorry.
- Add NameReference role kind, and tag references from constructors/destructors to the class with that kind.
As discussed offline constructor calls are a different issue, coming from even a different layer than Indexing API.
No need for parens around SymbolRole::NameReference.