This is intended as a companion to (and is inspired by) D72874 which attempts to
resolve these cases using the index.
The intent is we'd try this strategy after the AST-based approach but before the
index-based (I think local usages would be more reliable than index matches).
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Thanks for putting this together, this is really neat and will indeed complement D72874 nicely!
I have one high-level comment about the interface, the rest are nits.
| clang-tools-extra/clangd/XRefs.cpp | ||
|---|---|---|
| 339 | nit: I think separate names for the FileID and the position would read better | |
| 371 | nit: space after if | |
| clang-tools-extra/clangd/XRefs.h | ||
| 57 | Do we want to bake this condition into the interface of this function, or would it be more appropriate to just tell the caller whether it's a real identifier token or not? In particular, given our plan for handling some dependent cases, the caller may want to do something along the lines of if (notRealIdentifier || isDependent) { /* use the textual heuristic */ }. | |
| clang-tools-extra/clangd/unittests/XRefsTests.cpp | ||
| 1007 | Tangentially related, but what do you think about the issue I raised in this mailing list thread about testcase style? | |
| 1039 | // (taking into account the penalty for going backwards) | |
| 1066 | nit: space after * | |
| 1073 | EXPECT_EQ? | |
Here's a test case that gives a less desirable result with this approach than D72874:
struct Foo {
  void uniqueMethodName();
};
struct Bar {
  void uniqueMethodName();
};
// Will call u^niqueMethodName() on t.
template <typename T>
void f(T t);This method only returns Bar::uniqueMethodName() because it's closer in terms of distance, whereas D72874 returns both methods.
Is that perhaps a reason to adjust the order in which we try the approaches (i.e. use this one as a fallback if index lookup fails)? Or should we try to allow multiple results with this approach as well?
Yeah, missing data is definitely bad:
- not finding results makes the feature feel flaky or unreliable
- finding *partial* results can be misleading and makes the feature untrustworthy
Is that perhaps a reason to adjust the order in which we try the approaches (i.e. use this one as a fallback if index lookup fails)?
Well, it's even easier to construct examples where the index approach is missing some data (most symbols are not indexable) or gives wrong results (name collisions are common) which is probably worse.
So I'm not convinced by this that the index is more accurate.
Or should we try to allow multiple results with this approach as well?
It's an interesting idea, but two immediate risks:
- it's clearly worse for things like comment navigation local variables
- it helps your example only quite narrowly: all the targets need to be in the current file
We have several building blocks and various ways for how to put them together.
I think we should go back to the use cases (e.g. dependent code) and work out how each is best served, my intuition is that pretty different. Discussing how the building blocks should relate to each other in the general case may run in circles a bit :-) I'll make a proposal on https://github.com/clangd/clangd/issues/241 for a bit more visibility.
I'll make a proposal on https://github.com/clangd/clangd/issues/241 for a bit more visibility.
OK, I wrote a bunch of stuff there (twice, laptop crashed, grr...)
I'll put this briefly on hold to see what you/others think there - e.g. the real-identifier bit is best decided with the final plan in mind.
| clang-tools-extra/clangd/unittests/XRefsTests.cpp | ||
|---|---|---|
| 1007 | Posted a reply on the thread. TL;DR: 
 | |
OK, I think this is probably finally good for another round - it's pretty well merged with the index-based version, and the common parts are pulled out to SourceCode and tested.
I couldn't resist adding support for backtick-identifiers and a few others while there...
This doesn't do the lookup using concentric semantic ranges described in the bug. I think what it does is probably good enough for now and probably won't have time to rebuild that part really soon...
Thanks, Sam, this looks great!
| clang-tools-extra/clangd/SourceCode.cpp | ||
|---|---|---|
| 883 | nit: qualify StringRef or not consistently | |
| 891 | It's interesting to note that clang has a lexer and parser for doxygen comments (see e.g. RawComment::parse()), so we could conceivably do something more structured, but it's probably not worth the effort. | |
| 938 | Expanded.size() == 1 implies !Expanded.empty() | |
| clang-tools-extra/clangd/SourceCode.h | ||
| 244 | quotes --> backticks? | |
| clang-tools-extra/clangd/XRefs.cpp | ||
| 451 | Is this any different from getSpellingLineNumber(Loc)? | |
| clang-tools-extra/clangd/unittests/SourceCodeTests.cpp | ||
| 392 ↗ | (On Diff #258193) | Maybe test the initialism thing with EXPECT_FALSE(word("// [[H^TTP]] ").LikelyIdentifier); | 
| clang-tools-extra/clangd/unittests/XRefsTests.cpp | ||
| 1028 | What's the rationale for supporting string literals for the nearby-ident heuristic, but not the index heuristic? | |
| 1032 | (Did you mean to write a test case where the macro invocation expands to nothing?) | |
quotes --> backticks?