This is an archive of the discontinued LLVM Phabricator instance.

[clangd] go-to-def on names in comments etc that are used nearby.
ClosedPublic

Authored by sammccall on Mar 2 2020, 1:45 PM.

Details

Summary

This is intended as a companion to (and is inspired by) D72874 which attempts to
resolve these cases using the index.
The intent is we'd try this strategy after the AST-based approach but before the
index-based (I think local usages would be more reliable than index matches).

Diff Detail

Event Timeline

sammccall created this revision.Mar 2 2020, 1:45 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 2 2020, 1:45 PM
nridge added a comment.Mar 3 2020, 3:45 PM

Thanks for putting this together, this is really neat and will indeed complement D72874 nicely!

I have one high-level comment about the interface, the rest are nits.

clang-tools-extra/clangd/XRefs.cpp
339

nit: I think separate names for the FileID and the position would read better

371

nit: space after if

clang-tools-extra/clangd/XRefs.h
57

Do we want to bake this condition into the interface of this function, or would it be more appropriate to just tell the caller whether it's a real identifier token or not?

In particular, given our plan for handling some dependent cases, the caller may want to do something along the lines of if (notRealIdentifier || isDependent) { /* use the textual heuristic */ }.

clang-tools-extra/clangd/unittests/XRefsTests.cpp
1007

Tangentially related, but what do you think about the issue I raised in this mailing list thread about testcase style?

1039

// (taking into account the penalty for going backwards)

1066

nit: space after *

1073

EXPECT_EQ?

nridge added a comment.Mar 5 2020, 4:30 PM

Here's a test case that gives a less desirable result with this approach than D72874:

struct Foo {
  void uniqueMethodName();
};
struct Bar {
  void uniqueMethodName();
};

// Will call u^niqueMethodName() on t.
template <typename T>
void f(T t);

This method only returns Bar::uniqueMethodName() because it's closer in terms of distance, whereas D72874 returns both methods.

Is that perhaps a reason to adjust the order in which we try the approaches (i.e. use this one as a fallback if index lookup fails)? Or should we try to allow multiple results with this approach as well?

This method only returns Bar::uniqueMethodName() because it's closer in terms of distance

Yeah, missing data is definitely bad:

  • not finding results makes the feature feel flaky or unreliable
  • finding *partial* results can be misleading and makes the feature untrustworthy

Is that perhaps a reason to adjust the order in which we try the approaches (i.e. use this one as a fallback if index lookup fails)?

Well, it's even easier to construct examples where the index approach is missing some data (most symbols are not indexable) or gives wrong results (name collisions are common) which is probably worse.
So I'm not convinced by this that the index is more accurate.

Or should we try to allow multiple results with this approach as well?

It's an interesting idea, but two immediate risks:

  • it's clearly worse for things like comment navigation local variables
  • it helps your example only quite narrowly: all the targets need to be in the current file

We have several building blocks and various ways for how to put them together.
I think we should go back to the use cases (e.g. dependent code) and work out how each is best served, my intuition is that pretty different. Discussing how the building blocks should relate to each other in the general case may run in circles a bit :-) I'll make a proposal on https://github.com/clangd/clangd/issues/241 for a bit more visibility.

sammccall marked an inline comment as done.Mar 6 2020, 6:14 AM

I'll make a proposal on https://github.com/clangd/clangd/issues/241 for a bit more visibility.

OK, I wrote a bunch of stuff there (twice, laptop crashed, grr...)

I'll put this briefly on hold to see what you/others think there - e.g. the real-identifier bit is best decided with the final plan in mind.

clang-tools-extra/clangd/unittests/XRefsTests.cpp
1007

Posted a reply on the thread. TL;DR:

  • I hate that too
  • the style brings several benefits
  • I don't know what would be better that's available
  • but we could probably build something
sammccall updated this revision to Diff 257989.Apr 16 2020, 1:37 AM

rebase - finally getting back to this

Merge with existing heuristic, extracting SpelledWord struct for the analysis.

hokein added a subscriber: hokein.Apr 16 2020, 12:18 PM
sammccall updated this revision to Diff 258187.Apr 16 2020, 3:29 PM

Add tests for SpelledWord

sammccall marked 3 inline comments as done.Apr 16 2020, 3:51 PM

OK, I think this is probably finally good for another round - it's pretty well merged with the index-based version, and the common parts are pulled out to SourceCode and tested.
I couldn't resist adding support for backtick-identifiers and a few others while there...

This doesn't do the lookup using concentric semantic ranges described in the bug. I think what it does is probably good enough for now and probably won't have time to rebuild that part really soon...

sammccall updated this revision to Diff 258193.Apr 16 2020, 3:56 PM

address comments

nridge accepted this revision.Apr 20 2020, 10:49 PM

Thanks, Sam, this looks great!

clang-tools-extra/clangd/SourceCode.cpp
858

nit: qualify StringRef or not consistently

866

It's interesting to note that clang has a lexer and parser for doxygen comments (see e.g. RawComment::parse()), so we could conceivably do something more structured, but it's probably not worth the effort.

913

Expanded.size() == 1 implies !Expanded.empty()

clang-tools-extra/clangd/SourceCode.h
231

quotes --> backticks?

clang-tools-extra/clangd/XRefs.cpp
451

Is this any different from getSpellingLineNumber(Loc)?

clang-tools-extra/clangd/unittests/SourceCodeTests.cpp
377

Maybe test the initialism thing with EXPECT_FALSE(word("// [[H^TTP]] ").LikelyIdentifier);

clang-tools-extra/clangd/unittests/XRefsTests.cpp
1028

What's the rationale for supporting string literals for the nearby-ident heuristic, but not the index heuristic?

1032

(Did you mean to write a test case where the macro invocation expands to nothing?)

This revision is now accepted and ready to land.Apr 20 2020, 10:49 PM
sammccall updated this revision to Diff 259295.Apr 22 2020, 8:19 AM
sammccall marked 15 inline comments as done.

address review comments

This revision was automatically updated to reflect the committed changes.