This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Ensure Ref::Container refers to an indexed symbol
ClosedPublic

Authored by nridge on Jun 28 2021, 10:54 PM.

Diff Detail

Event Timeline

nridge created this revision.Jun 28 2021, 10:54 PM
nridge requested review of this revision.Jun 28 2021, 10:54 PM
kadircet added inline comments.Jul 2 2021, 8:00 AM
clang-tools-extra/clangd/index/SymbolCollector.cpp
360

why do we need to expose this function in the class interface? ASTCtx is reachable from Decl, and we can just pass in Opts.

Also I believe we should take in a declcontext as an input parameter, and make use of ASTNode.ParentDC for starting the traversal.

This also requires some comments explaining that it'll return the first context that is known by index.

361

in theory enclosing namespace can also be unnamed, something like a BlockDecl.

I think we should enforce the return value to be a named decl, but I don't think we should enforce any intermediate parents to be named decls.

506–507

as mentioned above I think we should start the traversal from ASTNode.ParentDC. any downsides to doing that ? I don't see a case where this container we return is not a DeclContext.

nridge added inline comments.Jul 4 2021, 9:33 PM
clang-tools-extra/clangd/index/SymbolCollector.cpp
506–507

I assume you mean ASTNode.ContainerDC.

I tried this suggestion, but it looks like ContainerDC does not contain what we want in many cases.

For all of the following cases in SymbolCollectorTest.RefContainers:

  • ref2 (variable initializer)
  • ref3 (function parameter default value)
  • ref5 (template parameter default value)
  • ref6 (type of variable)
  • ref7a (return type of function)
  • ref7b (parameter type of function)

ASTNode.ContainerDC is the TranslationUnitDecl (i.e. the DeclContext in which the function or variable is declared) rather than the function or variable itself.

In addition, for ref4 (member initializer), ContainerDC is the RecordDecl rather than the FieldDecl.

So, I don't think this suggestion will work without making the Ref::Container field significantly less specific (and thereby the call hierarchy feature less useful in some cases).

nridge updated this revision to Diff 356428.Jul 4 2021, 9:44 PM
nridge marked an inline comment as done.

Addressed review comments, except for the one about using ContainerDC, for the reason explained

kadircet added inline comments.Jul 5 2021, 10:56 PM
clang-tools-extra/clangd/index/SymbolCollector.cpp
159

can you also add some info about why we are not using DeclContexts ? something like:

Decls can nest under non-DeclContext nodes, in cases like initializer, where they might be attributed to VarDecl.
Preserving that level of granularity is especially useful for initializers at top-level, as otherwise the only context we
can attach these refs is TranslationUnitDecl, which we don't preserve in the index.
FIXME: Maybe we should have some symbols for representing file-scopes, that way we can show these refs as
being contained in the file-scope.

(Last fixme bit is optional, please add that if you also think the functionality would be more useful that way)

506–507

I assume you mean ASTNode.ContainerDC.

yes, sorry for the mess.

So, I don't think this suggestion will work without making the Ref::Container field significantly less specific (and thereby the call hierarchy feature less useful in some cases).

I can't seem to remember why we went with it in the first place. But now:

int foo();
int x = foo();
void bar() {
  int y = foo();
}

calling call hierarchy on foo will yield x and bar, creating some sort of discrepancy. I am not really sure if telling first call to foo is contained in x is really useful compared to saying it was contained in filename.cpp (but ofc we can't do that today as container needs to be a symbol in the index, hence the ref will be lost instead). do you also think that would be more useful (i am not really a user of this feature so I would like to hear which is better from some users)?

also I can't say that I find the discrepancy between an initializer in parmvardecl/nontypetemplateparm vs vardecl useful (i know it already exists today, but not really sure why).
they are actually attached to the first declcontext containing them, not the first decl (which you say is more useful, let's leave the fact that they are not indexed out for a while).
moreover it is a behavior we rely on through libindex, as it just marks ASTNode.Parent with declcontext rather than making use of the vardecl. which makes me believe it is not really important to have the top-level symbol as the container. But it is the only way we can retain some container information.
if that's the case, I believe we should make this more explicit, i.e. start a traversal from the node itself, remember the last indexed decl we've seen as a parent and return that if we failed to find any indexed declcontext's containing the decl (with a fixme saying that we could have some file symbols in the index for translationunitdecls).
That way we can say that the container is the first DeclContext containing the reference (with the exception of fixme mentioned).

nridge added inline comments.Jul 6 2021, 11:27 PM
clang-tools-extra/clangd/index/SymbolCollector.cpp
506–507

I am not really sure if telling first call to foo is contained in x is really useful compared to saying it was contained in filename.cpp [...] do you also think that would be more useful (i am not really a user of this feature so I would like to hear which is better from some users)?

In my opinion, in cases where a reference to a function occurs in the initializer (or otherwise in the declaration, such as perhaps in the type) of a namespace-scope variable, it is significantly more useful to show the name of that variable in the call hierarchy, than the name of the containing file.

Consider an example like this:

void FlushAndClearCaches();
...
// elsewhere, at namespace scope:
std::function<void()> OnMemoryPressureCallback = &FlushAndClearCaches;

If I invoke call hierarchy on FlushAndClearCaches (or one of its callees), and I'm looking at the children of the FlushAndClearCaches node in the call hierarchy tree, it is much more informative if the name of the relevant child item is OnMemoryPressureCallback, than if it is a file name. If it's a file name, I need to take the extra step of clicking on the item to navigate to the target file and the reference location inside it, and look there, whereas if it's a variable name I may have all the information I need just from the call hierarchy view.

I think being able to eyeball the call hierarchy view and get and idea of what sorts of functionality calls into a function, without having to click on each item and examine the context, is an important use case.

(It's fair to ask if this logic also applies to local variables. I would say no: local variables are more likely to be implementation details with un-interesting names. The name of the containing function is typically the more interesting part, and is still more granular than a file.)

I appreciate that on a technical level, this is an inconsistency between local variables vs. namespace-scope variables. However, from the point of view of a user of call hierarchy, I think this combination gives the best results.

I am also open to revising what we store in Ref::Container if additional consumers (other than call hierarchy) come up, but for now I think it makes sense to let it be guided by what it useful for call hierarchy. (I surveyed the Eclipse CDT codebase, where the corresponding piece of information in the index has one other consumer besides call hierarchy: it's used to annotate entries in the find-references view with the name of the calling function (https://github.com/clangd/clangd/issues/177). Were we to add this feature, I think we'd want to keep the same treatment of variables as for call hierarchy, for similar reasons.)

nridge updated this revision to Diff 357823.Jul 11 2021, 6:38 PM

Address review comments

nridge marked an inline comment as done.Jul 11 2021, 6:39 PM
nridge added inline comments.
clang-tools-extra/clangd/index/SymbolCollector.cpp
159

I've added a comment to explain why we're not using`DeclContext` here.

I'm not sure if your other comment is still relevant, if we agree that the current behaviour is the desirable one.

kadircet accepted this revision.Jul 15 2021, 7:11 AM

thanks, lgtm!
sorry for the late reply :( adding a couple nits.

clang-tools-extra/clangd/index/SymbolCollector.cpp
170

not need for _or_null as loop condition ensures Enclosing != nullptr

nit: const auto *ND

173

nit: I'd just break here and return Enclosing outside.

506–507

(It's fair to ask if this logic also applies to local variables. I would say no: local variables are more likely to be implementation details with un-interesting names. The name of the containing function is typically the more interesting part, and is still more granular than a file.)

Right, I was mostly saying that if we want this behaviour at one place, why we don't want it at the other. I'd claim that top-level function declaration is also going to be an implementation detail most of the time.

I appreciate that on a technical level, this is an inconsistency between local variables vs. namespace-scope variables. However, from the point of view of a user of call hierarchy, I think this combination gives the best results.

SG. I suppose we can revisit the decision in the future if need be and store the vardecl if need be and make callhiearachy iterate over the parents instead.

This revision is now accepted and ready to land.Jul 15 2021, 7:11 AM
nridge updated this revision to Diff 360714.Jul 22 2021, 12:31 AM
nridge marked 3 inline comments as done.

Address review comments

This revision was landed with ongoing or failed builds.Jul 22 2021, 12:34 AM
This revision was automatically updated to reflect the committed changes.