This is an archive of the discontinued LLVM Phabricator instance.

Differential D72874

[clangd] Add a textual fallback for go-to-definition
ClosedPublic

Authored by nridge on Jan 16 2020, 1:32 PM.

Details

Reviewers

sammccall

Commits

rGdc4cd43904df: [clangd] Add a textual fallback for go-to-definition

Summary

This facilitates performing go-to-definition in contexts where AST-based resolution does not work, such as comments, string literals, preprocessor disabled regions, and macro definitions, based on textual lookup in the index.

Partially fixes https://github.com/clangd/clangd/issues/241

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

nridge created this revision. Jan 16 2020, 1:32 PM

Herald added a project: Restricted Project. Jan 16 2020, 1:32 PM

Herald added subscribers: cfe-commits, usaxena95, kadircet and 4 others.

Unit tests: pass. 61850 tests passed, 0 failed and 781 were skipped.

clang-tidy: unknown.

clang-format: pass.

Build artifacts: diff.json, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

Harbormaster completed remote builds in B44195: Diff 238599. Jan 16 2020, 1:56 PM

nridge marked an inline comment as done. Jan 21 2020, 8:40 AM

nridge added inline comments.

clang-tools-extra/clangd/XRefs.cpp
196	(There should be a return here, will fix locally.)

I've tried this out locally and it's fun! As suspected on the bug though, IMO it's far from accurate enough. Examples from clangd/Compiler.cpp:

it triggers on almost every word, even words that plainly don't refer to any decl like format [[lazily]], in case vlog is off. This means that e.g. (in VSCode) the underline on ctrl-hover gives no/misleading signal. It also means that missing your target now jumps you somewhere random instead of doing nothing.
when it works properly, the correct result is usually mixed with incorrect results (e.g. createInvocationFromCommandLine sets [[DisableFree]]).
it doesn't work for some symbols - ones that are not indexable (e.g. RemappedFileBuffers will handle the lifetime of the [[Buffer]] pointer, gives a variety of wrong results)

So while I want to stress this is really cool, it doesn't feel reliable on any dimension: you can't trust clangd on whether the word is an actual reference, you can't trust any particular result, and you can't trust the correct result is in the set.

Some suggestions:

only trigger when there's *some* positive signal for the word.
- Markup like quotes/backticks/brackets/\p
- weird case like lowerCamel, UpperCamel, CAPS, mid-sentence Capitalization, under_scores.
- use of the word as a token in nearby code (very close if very short, anywhere in file if longer)
- (maybe you want to support ns::Qualifiers?)
post-filter aggressively - only return exact name matches (I think including case).
call fuzzyFind directly and set ProximityPath as well as the enclosing scopes from lexing. For extra strictness consider AnyScope=false
if you get more than 3 results, and none from current file, maybe don't return anything, as confidence is too low. Or try a stricter query...
handle the most common case of non-indexable symbols (local symbols) by running the query against the closest occurrence of the token in code.
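
The word-shape part of the "positive signal" suggestion could be sketched roughly as follows. This is not clangd code: `WordShape` and `classifyWord` are hypothetical names introduced only for illustration, and the decision about which shapes count as enough signal is deliberately left to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical classification of how a word "looks"; not part of clangd.
enum class WordShape { Plain, MixedCase, AllCaps, Underscored };

// ASCII-only checks, so the result does not depend on the C locale.
static bool isAsciiUpper(char C) { return C >= 'A' && C <= 'Z'; }
static bool isAsciiLower(char C) { return C >= 'a' && C <= 'z'; }

WordShape classifyWord(const std::string &Word) {
  assert(!Word.empty() && "expected a non-empty word");
  bool HasUnderscore = false, HasLower = false, HasUpper = false;
  bool HasInteriorUpper = false; // uppercase letter after the first character
  for (std::size_t I = 0; I < Word.size(); ++I) {
    const char C = Word[I];
    if (C == '_') {
      HasUnderscore = true;
    } else if (isAsciiLower(C)) {
      HasLower = true;
    } else if (isAsciiUpper(C)) {
      HasUpper = true;
      HasInteriorUpper = HasInteriorUpper || I > 0;
    }
  }
  if (HasUnderscore)
    return WordShape::Underscored; // under_scores, MACRO_CASE
  if (HasUpper && !HasLower && Word.size() > 1)
    return WordShape::AllCaps;     // CAPS
  if (HasLower && HasInteriorUpper)
    return WordShape::MixedCase;   // lowerCamel, UpperCamel
  return WordShape::Plain;         // ordinary prose, including "Format" or "I"
}
```

A caller would then only query the index when the shape is one it trusts, so a plain word such as `lazily` never triggers a lookup.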

Thanks for taking a look!

In D72874#1831977, @sammccall wrote:

it triggers on almost every word, even words that plainly don't refer to any decl like format [[lazily]], in case vlog is off. This means that e.g. (in VSCode) the underline on ctrl-hover gives no/misleading signal. It also means that missing your target now jumps you somewhere random instead of doing nothing.

Heh, I didn't realize VSCode had this feature. I do agree that it changes the tradeoffs a bit, as it means go-to-definition can be invoked in a context where there isn't an explicit signal from the user that they think there's a target there.

The other points you make are completely fair too. I will revise and take your suggestions into account.

I'll aim to start by factoring in enough of your suggestions to reduce the noise to an acceptable level for an initial landing, and leave some of the others for follow-up enhancements.

Address some review comments

I've addressed some of the review comments, with a view to getting something minimal we can land, and improve on in follow-up changes.

Mostly, I focused on the suggestions which reduce the number of results. I've left other suggestions which increase the number of results (e.g. handling non-indexed symbols) for follow-ups.

In D72874#1831977, @sammccall wrote:

only trigger when there's *some* positive signal for the word.

Markup like quotes/backticks/brackets/\p

weird case like lowerCamel, UpperCamel, CAPS, mid-sentence Capitalization, under_scores.

use of the word as a token in nearby code (very close if very short, anywhere in file if longer)

(maybe you want to support ns::Qualifiers?)

I currently handle lowerCamel, UpperCamel, CAPS, and under_scores. I've left the others as follow-ups.

post-filter aggressively - only return exact name matches (I think including case).

Done.

call fuzzyFind directly and set ProximityPath

Done.

as well as the enclosing scopes from lexing. For extra strictness consider AnyScope=false

I haven't done this yet, do you think it's important for an initial landing?

If so, could you mention what API you had in mind for determining "enclosing scopes from lexing"?

I had in mind using something like SelectionTree and collecting any RecordDecls or NamespaceDecls on the path from the common ancestor to the TU, but that's technically not "from lexing", so perhaps you have something else in mind.

if you get more than 3 results, and none from current file, maybe don't return anything, as confidence is too low. Or try a stricter query...

I implemented this, but my testing shows this causes a lot of results for class names to be excluded. The reason appears to be that fuzzyFind() returns the class and each of its constructors as distinct results, so if a class has more than two constructors, we'll have more than 3 results (and typically the class is declared in a different file).

Should we try to handle this case specifically (collapse a class name and its constructors to a single result), or should we reconsider this filtering criterion? It's not exactly clear to me what sort of bad behaviour it's intended to weed out.

handle the most common case of non-indexable symbols (local symbols) by running the query against the closest occurrence of the token in code.

I've left this as a follow-up.

Harbormaster completed remote builds in B47735: Diff 247529. Mar 1 2020, 5:26 PM

Thanks! The scope looks good to me now, on to implementation details.
I'm being a bit picky on the behavior because go-to-def is a heavily-used feature, many users won't be expecting what we're doing here, and we can't reasonably expect them to understand the failure modes.
So, let's try hard not to fail :-)

This reminds me: it's not completely obvious what set of "act on symbol under the cursor" things this should (eventually) apply to.
I think not having e.g. find-references work makes sense - user should navigate to a "real" occurrence to resolve the ambiguity, and things like code actions are right out.
However having textDocument/hover work when we have high confidence in results would be really cool.
Obviously nothing in scope for this patch, but it seems worth writing this down somewhere, precisely because we shouldn't do it now.

In D72874#1900149, @nridge wrote:

I currently handle lowerCamel, UpperCamel, CAPS, and under_scores. I've left the others as follow-ups.

(sorry for shifting goalposts, I think CAPS may be too broad. Left a comment inline)

if you get more than 3 results, and none from current file, maybe don't return anything, as confidence is too low. Or try a stricter query...

I implemented this, but my testing shows this causes a lot of results for class names to be excluded. The reason appears to be that fuzzyFind() returns the class and each of its constructors as distinct results, so if a class has more than two constructors, we'll have more than 3 results (and typically the class is declared in a different file).

I think we should just drop constructor results, they'll always have this problem.
(There are other cases but this is the biggest).

handle the most common case of non-indexable symbols (local symbols) by running the query against the closest occurrence of the token in code.

I've left this as a follow-up.

Makes sense. I think there's not a lot of new complexity here, we have the major pieces (getWordAtPosition, TokenBuffer, SelectionTree, targetDecl, index) but integration is definitely substantial.

I'd suggest we go down that path before adding complexity for the index-based path though, because I suspect it's going to handle many of the practical situations where the index-based approach needs a lot of help (and vice-versa).

clang-tools-extra/clangd/SourceCode.cpp
313 ↗	(On Diff #247529)	@kadircet is working on getting rid of this function because creating raw lexers is wasteful and not actually very powerful. Mostly we're moving to syntax::TokenBuffer, which records actual lexed tokens, but that doesn't apply here. The examples in the tests seem like they'd be covered by something really simple, like enclosing identifier chars:
unsigned Begin, End;
for (Begin = Offset; Begin > 0 && isIdentifierBody(Code[Begin-1]); --Begin) {}
for (End = Offset; End < Code.size() && isIdentifierBody(Code[End]); ++End) {}
return Code.slice(Begin, End);
(Lexer::isIdentifierBodyChar requires langopts but just passes through DollarIdents to isIdentifierBody, and I don't think we care much about identifiers with $ in them.) If we really want to do something more subtle here, we should check it in SourceCodeTests.
clang-tools-extra/clangd/SourceCode.h
93 ↗	(On Diff #247529)	consider moving the isLikelyToBeIdentifier check inside. The current API is pretty general and it's not clear yet what (else) it's good for so it's nice to direct towards intended usage. Also doing the identifier check inside this function is more convenient when it relies on markers outside the identifier range (like doxygen `\p` or backtick-quoted identifiers) That said, you may still want to return the range when it's not a likely identifier, with a signature like `StringRef getWordAtPosition(bool *LikelyIdentifier = nullptr)`. I'm thinking of the future case where the caller wants to find a nearby matching token and resolve it - resolving belongs in the caller so there's not much point having this function duplicate the check.
93 ↗	(On Diff #247529)	This doesn't use the SourceManager-structure of the file, so the natural signature would be `StringRef getWordAtPosition(StringRef Code, unsigned Offset)`. (what are the practical cases where langopts is relevant?)
clang-tools-extra/clangd/XRefs.cpp
184	nit: mention snake_case, MACRO_CASE?
188	nit: can you mention this catches lowerCamel and UpperCamel
189	nit: prefer llvm::isUppercase to avoid locales
189	this will fire for initialisms like `HTTP`. I think we want to require both upper and lowercase letters.
208	I think this is dead - we're just sorting by score.
212	this function should have a high-level comment describing the strategy and the limitations (e.g. idea of extending it to resolve nearby matching tokens). A name like `locateSymbolNamedTextuallyAt` would better describe what this does, rather than what its caller does. I would strongly consider exposing this function publicly for the detailed tests, and only smoke-testing it through `locateSymbolAt`. Having to break the AST in tests or otherwise rely on the "primary" logic not working is brittle and hard to verify.
226	FWIW the API for this is visibleNamespaces() from SourceCode.cpp. (No enclosing classes, but I suspect we can live without them once we have a nearby-tokens solution too)
229	If we're bailing out on >3, I think this limit should be aiming to detect when there's >3, and avoid fetching way too much data, but not trying to avoid noise. (I'd suggest 10 or so)
234	This seems dead, you're requiring exact matches, these will always have the same score.
236	This is an interesting signal, I think there are two sensible ways to go about it: (1) assume results in this file are more likely accurate than those in other files. In this case we should at minimum be using this in ranking, but really we should just drop all cross-file results if we have an in-file one. (2) don't rely on index for main-file cases, and rely on "find nearby matching token and resolve it" instead. That can easily handle cases defined/referenced in the main-file with sufficient accuracy, including non-indexed symbols. So here we can assume this signal is always false, and drop it.
237	BTW I think the answer for constructors is just to drop all constructor results here. (This also affects template specializations which I think we can not worry about, and virtual method hierarchies which are more painful but I also wouldn't try to fix now)
238	I'm not sure why we're using SymbolToLocation here: Main file URI check: the `Symbol` has URIs. They need to be canonicalized to file URIs before comparison. This allows checking both decl and def location. PreferredDeclaration and Definition can be more easily set directly from the `Symbol`
244	I wouldn't bother qualifying this as "for now". Any code is subject to change in the future, but requiring an exact name match for index-based results seems more like a design decision than a fixme.
270	I don't think this should be logged, particularly by default - it doesn't really indicate anything other than we should have a "look up symbol by name" API (ok, actually I think this is just dead code because we've already checked name above)
clang-tools-extra/clangd/unittests/XRefsTests.cpp
599	`#ifdef`'d out code is another interesting motivation worth testing.

I'm playing with a prototype of the token-based approach, a couple of follow-ups from that.

I've split out functions to handle file/macro/ast from locateSymbolAt in e7de00cf974a4e30d4900518ae8473a117efbd6c - hopefully an easy merge, you're adding another one.

I think having this trigger where the identifier is an actual token in the program is a surprisingly invasive change and runs a strong risk of confusing users (who can't distinguish these textual heuristics from normal go-to-def behaviour, and rely on its accuracy), and we shouldn't do it without a lot more testing.
I think the way to implement this is to call getMacroArgExpandedLocation on the start of the "token" we found, and feed the result into TokenBuffer::expandedTokens(SourceRange). If we get an empty list back, then the parser didn't see this token and we're good to proceed without any overlap with the strict AST-based options.
This means comments, strings, and #ifdef'd sections should work fine, but not dependent or broken code. (Many cases of broken code can be fixed using RecoveryExpr which is finally going to land)

sammccall added inline comments. Mar 2 2020, 10:10 AM

clang-tools-extra/clangd/SourceCode.cpp
313 ↗	(On Diff #247529)	"Mostly we're moving to syntax::TokenBuffer, which records actual lexed tokens, but that doesn't apply here." Oops, this isn't true - token buffer's expanded token stream has "real" tokens, but the spelled token streams use the raw lexer. You can just use spelledIdentifierTouching(), I think.

sammccall added inline comments. Mar 2 2020, 10:12 AM

clang-tools-extra/clangd/SourceCode.cpp
313 ↗	(On Diff #247529)	"You can just use spelledIdentifierTouching(), I think." Sorry disregard this, obviously it doesn't work in comments etc. Need more coffee...
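
For reference, the simple character-scanning idea proposed for line 313 can be written as a standalone sketch that works inside comments and strings because it never consults a lexer. It assumes plain ASCII identifiers and ignores `$`; `wordAt` and `isIdentifierChar` are hypothetical names here, not the clangd or LLVM functions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// ASCII identifier characters: letters, digits and underscore ('$' is ignored).
static bool isIdentifierChar(char C) {
  return (C >= 'a' && C <= 'z') || (C >= 'A' && C <= 'Z') ||
         (C >= '0' && C <= '9') || C == '_';
}

// Returns the run of identifier characters touching Offset, or an empty
// string if no identifier character is adjacent to Offset.
std::string wordAt(const std::string &Code, std::size_t Offset) {
  assert(Offset <= Code.size() && "offset must lie within the buffer");
  std::size_t Begin = Offset, End = Offset;
  while (Begin > 0 && isIdentifierChar(Code[Begin - 1]))
    --Begin; // extend left over identifier characters
  while (End < Code.size() && isIdentifierChar(Code[End]))
    ++End;   // extend right over identifier characters
  return Code.substr(Begin, End - Begin);
}
```

Because the scan extends in both directions, an offset just past the last character of a word still selects that word, which matches how a cursor placed at the end of an identifier is usually interpreted.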

Thanks for all the comments Sam! I'll have a detailed look tomorrow, but I wanted to follow up on this:

In D72874#1901383, @sammccall wrote:

I think having this trigger where the identifier is an actual token in the program is a surprisingly invasive change and runs a strong risk of confusing users (who can't distinguish these textual heuristics from normal go-to-def behaviour, and rely on its accuracy), and we shouldn't do it without a lot more testing.

The "dependent code" use case is a pretty important one in my eyes.

In one of the codebases I work on, we have a fair amount of code like this:

template <typename T>
void foo(T t) {
   // ...
   t.someUniqueMethodName();
   // ...
   t.someOtherUniqueMethodName();
   // ...
}

The code is in practice only instantiated with a handful of types for T (often just two). (But we don't have a way to express this in the code at this time.) Being able to invoke go-to-definition at e.g. someUniqueMethodName and get the definition sites of the corresponding handful of methods, as opposed to nothing at all, is something I'd really like to get working. I'm open to suggestions to how we can test this better, or scope the behaviour more narrowly to avoid other unintended results for real tokens.

In D72874#1901606, @nridge wrote:

The "dependent code" use case is a pretty important one in my eyes.

In one of the codebases I work on, we have a fair amount of code like this:

Yep, fair enough. And I don't think that this is so bad for say DependentDeclRefExpr, where we're already doing heuristic stuff (and the user can reasonably understand that we might).
I'm more concerned that it might trigger at arbitrary times, like say on [[^noreturn]] void abort();.

But we can distinguish these cases! SelectionTree recognizes DependentDeclRefExpr and friends even if targetDecl can't resolve them. So I think we can use a whitelist: the AST part of locateSymbol reports the type of node that owned TouchedIdentifier, and if it's one of the types we want to use textual fallback for, then we go ahead with the fallback code (in addition to the cases I mentioned where the touched word doesn't turn out to be a real identifier).
We can even try to glean more info, e.g. if it's a CXXDependentScopeMemberExpr then we can filter out non-member index results.

(Tactically I think it makes sense to add the basic fallback logic, and follow up with the dependent-code entrypoints, but up to you)
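
As a rough illustration of the whitelist idea, the sketch below gates the fallback on the kind of node that owned the touched identifier and narrows the candidates for dependent member accesses. Everything in it is a hypothetical stand-in (`OwnerKind`, `Candidate`, `allowTextualFallback`, `filterForOwner`); the real change would work with SelectionTree nodes and index symbols rather than these simplified types.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical summary of what owned the word under the cursor.
enum class OwnerKind {
  NotAToken,           // comment, string literal, or #ifdef'd-out text
  ResolvedReference,   // the AST-based path already has an answer
  DependentDeclRef,    // a name in a dependent scope
  DependentMemberExpr, // e.g. t.someUniqueMethodName() with dependent t
  Other                // e.g. an attribute name such as noreturn
};

// Hypothetical index result, reduced to what the filter needs.
struct Candidate {
  std::string Name;
  bool IsMember;
};

// Only whitelisted kinds are allowed to fall back to textual lookup.
bool allowTextualFallback(OwnerKind K) {
  switch (K) {
  case OwnerKind::NotAToken:
  case OwnerKind::DependentDeclRef:
  case OwnerKind::DependentMemberExpr:
    return true;
  case OwnerKind::ResolvedReference:
  case OwnerKind::Other:
    return false;
  }
  return false;
}

// For dependent member accesses, keep only candidates that are members.
std::vector<Candidate> filterForOwner(OwnerKind K,
                                      const std::vector<Candidate> &In) {
  assert(allowTextualFallback(K) && "caller should check the whitelist first");
  if (K != OwnerKind::DependentMemberExpr)
    return In;
  std::vector<Candidate> Out;
  for (const Candidate &C : In)
    if (C.IsMember)
      Out.push_back(C);
  return Out;
}
```

With this shape, a word like `noreturn` in an attribute maps to `Other` and never reaches the index, while a dependent member call is looked up but only against member symbols.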

sammccall mentioned this in D75479: [clangd] go-to-def on names in comments etc that are used nearby. Mar 2 2020, 1:45 PM

I'm posting some partial responses because I have some questions (really just one, about fuzzy matching).

In general the comments seem reasonable and I plan to address all of them.

(I've marked some comments as done because I've addressed them locally. I'm not uploading a revised patch yet because it wouldn't be very interesting.)

clang-tools-extra/clangd/XRefs.cpp
212	"I would strongly consider exposing this function publicly for the detailed tests, and only smoke-testing it through locateSymbolAt. Having to break the AST in tests or otherwise rely on the "primary" logic not working is brittle and hard to verify." I was going to push back against this, but I ended up convincing myself that your suggestion is better :) For the record, the consideration that convinced me was: Suppose in the future we add fancier AST-based logic that handles a case like `T().foo()` (for example, by surveying types for which `T` is actually substituted, and offering `foo()` inside those types). If all we're testing is "navigation works for this case" rather than "navigation works for this case via the AST-based mechanism", we could regress the AST logic but have our test still pass because the testcase is simple enough that the text-based navigation fallback (that we're adding here) works as well.
238	Well the `Symbol` has `SymbolLocation`s and we need protocol `Location`s, so we have to use something to convert them. Other places that perform such conversion use `symbolToLocation()`, so I reused it. But you're right that `symbolToLocation()` also has some "pick the definition or the declaration" logic which is less appropriate here. I can factor out the `SymbolLocation` --> `Location` conversion logic from `symbolToLocation()`, and just use that here.
244	Do we want to rule out the possibility of handling typos in an identifier name in a comment (in cases where we have high confidence in the match, e.g. a long / unique name, small edit distance, only one potential match) in the future? This is also relevant to whether we want to keep the `FuzzyMatcher` or not.

Rebase onto D75479 and address most review comments

Comments remaining to be addressed:

revising the tests to exercise locateSymbolNamedTextuallyAt() directly
comments related to fuzzy matching (I have an outstanding question about that)

Handling of dependent code has been deferred to a follow-up change

In D72874#1900648, @sammccall wrote:

This reminds me: it's not completely obvious what set of "act on symbol under the cursor" things this should (eventually) apply to.
I think not having e.g. find-references work makes sense - user should navigate to a "real" occurrence to resolve the ambiguity, and things like code actions are right out.
However having textDocument/hover work when we have high confidence in results would be really cool.
Obviously nothing in scope for this patch, but it seems worth writing this down somewhere, precisely because we shouldn't do it now.

Agreed. Filed https://github.com/clangd/clangd/issues/303 for hover.

In D72874#1901722, @sammccall wrote:

(Tactically I think it makes sense to add the basic fallback logic, and follow up with the dependent-code entrypoints, but up to you)

Yep, will handle dependent code in a follow-up patch.

clang-tools-extra/clangd/SourceCode.h
93 ↗	(On Diff #247529)	Now that I'm using `wordTouching()` from D75479, I think this comment no longer applies?
clang-tools-extra/clangd/XRefs.cpp
212	Renamed and comment added. I still need to revise the tests.
226	Thanks, that's convenient! Out of curiosity, though: is the reason to prefer this lexer-based approach over hit-testing the query location against `NamespaceDecl`s in the AST, mainly for performance?
228	It occurred to me that I don't think we can do `AnyScope=false` if we want to handle dependent member cases like `T().uniqueMethodName()`. The members we want to find in such a case will often be both in a different file (so nearby-tokens won't handle them) and not in any visible scope.
236	Since you've implemented "find nearby matching token and resolve it", I went with the second approach.

nridge marked an inline comment as done. Mar 5 2020, 4:53 PM

nridge added inline comments.

clang-tools-extra/clangd/XRefs.cpp
489	Sorry this location-setting code is so messy. All my attempts to make it more concise have been thwarted by `llvm::Expected`'s very restrictive API.

Harbormaster completed remote builds in B48285: Diff 248630. Mar 5 2020, 4:59 PM

I'd like to sync up briefly on https://github.com/clangd/clangd/issues/241 so we know where we want to end up.

I think this is in good shape and certainly doesn't need a bigger scope, just want to be able to reason about how things will fit together.

clang-tools-extra/clangd/SourceCode.h
93 ↗	(On Diff #247529)	I think the reasons still apply - D75479 doesn't need to check likelihood (it considers actual use as identifier evidence enough) so I didn't include it there, but we should eventually merge these more thoroughly I think. No need to do that until we actually want to implement different heuristics though.
clang-tools-extra/clangd/XRefs.cpp
226	Well, it was written for fallback code completion when we have no AST at all :-) Gathering from the AST should be better, though it's not quite as simple as hit-testing (you also have to find `using namespace`). But this exists today, which is a feature!
244	No idea whether typo-correction is a good idea in principle - tradeoff between current false negatives and false positives+compute. However neither FuzzyMatcher nor the existing index implementations support/can easily support real typo correction, and it seems implausible to me we'd add it for this feature. Compare to e.g: (a) allowing case-insensitive match in some cases: `fooBar` vs `FooBar` is a plausible "typo". This is easy to implement. (b) correcting the typo where we see the fixed version used as an identifier in this file (and not the original). Excludes some cases, but drives false-positives way down, and easy to implement. I don't think we need to rule things out, but I'm uncertain enough about the approach to think that putting comments, fuzzymatcher etc here speculatively isn't worth it.
489	Ugh, don't get me started on Error/Expected :-( I'd love to get rid of it somehow, but it seems like we'd inevitably just end up with the new thing + Error/Expected + error_code/ErrorOr + return-a-bool, and I'm not sure it'd be better. (If you have more energy than me, I'd enthusiastically +1 an llvm-dev proposal to drop the clever checks from llvm::Error, and I know some others who would...)

nridge marked an inline comment as done. Mar 9 2020, 8:00 AM

nridge added inline comments.

clang-tools-extra/clangd/XRefs.cpp
244	Perhaps I'm unclear on the distinction between fuzzy matching and typo correction. Are they not both a matter of comparing a candidate string against a test string, and considering it a match if they are "close enough" according to some metric (with the metric potentially being a simple edit distance in the case of typo correction)?

I've started to update the patch to be in line with the direction discussed in the issue.

@sammccall, how would you like to proceed logistically:

Do you plan to land (a possibly modified version of) D75479?
Or should I combine that patch into this one?

In D72874#1915840, @nridge wrote:

I've started to update the patch to be in line with the direction discussed in the issue.

@sammccall, how would you like to proceed logistically:

Do you plan to land (a possibly modified version of) D75479?

Or should I combine that patch into this one?

This patch looks good, I wouldn't bother redesigning anything, we should iterate instead.

You should go ahead, and I'll merge, and then we should work towards enabling dependent code use cases etc. SG?

clang-tools-extra/clangd/FindSymbols.h
25	nit: these names are vague and echo the type signature, maybe indexToLSPLocation?
26	nit: HintPath should be TUPath, the decision to use some other path as a TU path can only be made in the caller (needs context). (Same is true for symbolToLocation, I'm not sure when that became public)
clang-tools-extra/clangd/XRefs.cpp
486	(The fuzzy matcher and topN are still here - I think we don't need them, right? With only up-to-3 results, std::sort seems more obvious)
488	maybe bail out early (on unusable/too many) instead of doing all the score computations first?
fuzzyFind(..., {
  // bail out if it's a constructor or name doesn't match
  if (Results.size() >= 3) { TooMany = true; return; }
  // add result
});
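
A self-contained sketch of that early bail-out, under loudly stated assumptions: `IndexedSymbol` and `fuzzyFindMock` are hypothetical stand-ins for the index symbol type and for an index query that streams results to a callback, and `locateByName` is an illustrative name, not the function in the patch. The point is only the control flow: drop constructors and non-exact names first, then give up entirely once a fourth acceptable result shows up.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical, heavily reduced index symbol.
struct IndexedSymbol {
  std::string Name;
  bool IsConstructor;
};

// Hypothetical stand-in for an index query: streams every symbol whose name
// contains Query to Callback, the way a fuzzy query may over-approximate.
void fuzzyFindMock(const std::vector<IndexedSymbol> &Index,
                   const std::string &Query,
                   const std::function<void(const IndexedSymbol &)> &Callback) {
  for (const IndexedSymbol &S : Index)
    if (S.Name.find(Query) != std::string::npos)
      Callback(S);
}

// Returns at most MaxResults exact, non-constructor matches, or nothing at
// all if there are more than that (confidence is considered too low).
std::vector<IndexedSymbol> locateByName(const std::vector<IndexedSymbol> &Index,
                                        const std::string &Word,
                                        std::size_t MaxResults = 3) {
  assert(!Word.empty() && "expected a non-empty word");
  std::vector<IndexedSymbol> Results;
  bool TooMany = false;
  fuzzyFindMock(Index, Word, [&](const IndexedSymbol &S) {
    if (TooMany)
      return; // already gave up, skip remaining work
    if (S.IsConstructor || S.Name != Word)
      return; // unusable: constructor, or not an exact (case-sensitive) match
    if (Results.size() >= MaxResults) {
      TooMany = true;
      return;
    }
    Results.push_back(S);
  });
  if (TooMany)
    Results.clear();
  return Results;
}
```

Filtering constructors before counting is what keeps a class with several constructors from tripping the limit, which was the problem reported earlier in the thread.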

This revision is now accepted and ready to land. Mar 11 2020, 8:44 AM

Remove fuzzy matching
Rebase to apply to head, taking only the parts from D75479 that I need for index-based lookup (such as wordTouching())
Revise tests so they exercise locateSymbolNamedTextuallyAt() directly, except for one smoke test
Add tests that verify that we do not trigger on dependent or broken code

Herald added a subscriber: mgrang. Mar 12 2020, 12:12 PM

nridge marked 2 inline comments as done. Mar 12 2020, 12:16 PM

nridge added inline comments.

clang-tools-extra/clangd/XRefs.cpp
441	Oh whoops, this assumption is another dependency on `findNearbyIdentifier()`

nridge edited the summary of this revision. Mar 12 2020, 12:23 PM

Tweak a comment

nridge marked an inline comment as done. Mar 12 2020, 12:58 PM

nridge added inline comments.

clang-tools-extra/clangd/XRefs.cpp
441	For now, I just had it restrict to 3 results in general (even if they're in the same file). Once `findNearbyIdentifier()` lands, the behaviour will automatically become what we intended.

I should mention that in my local usage, I've found the restriction on no more than 3 results (even if they're not in the current file) to be somewhat limiting. For example, a comment can easily reference the name of a function which has more than 3 overloads.

But we can start by landing this, and consider relaxing the limit (either in general, or in specific cases such as the overload set case) in follow-ups.

(Also just to clarify: while I said on Discord that I already implemented exclusion of string literals, I actually ended up deferring that part to a follow-up because it wasn't working as I expected.)

Harbormaster failed remote builds in B49037: Diff 250018! Mar 12 2020, 1:34 PM

Closed by commit rGdc4cd43904df: [clangd] Add a textual fallback for go-to-definition (authored by sammccall, committed by nridge). Mar 12 2020, 1:34 PM

This revision was automatically updated to reflect the committed changes.

Harbormaster failed remote builds in B49045: Diff 250031! Mar 12 2020, 2:07 PM

sammccall mentioned this in rG3f1c2bf1712c: [clangd] go-to-def on names in comments etc that are used nearby. Apr 22 2020, 10:53 AM

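Revision Contents

Path                                                 Size
clang-tools-extra/clangd/FindSymbols.h               4 lines
clang-tools-extra/clangd/FindSymbols.cpp             28 lines
clang-tools-extra/clangd/XRefs.h                     15 lines
clang-tools-extra/clangd/XRefs.cpp                   146 lines
clang-tools-extra/clangd/unittests/XRefsTests.cpp    80 lines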

Diff 248630

clang-tools-extra/clangd/FindSymbols.h

	Show All 15 Lines
	#include "index/Symbol.h"			#include "index/Symbol.h"
	#include "llvm/ADT/StringRef.h"			#include "llvm/ADT/StringRef.h"

	namespace clang {			namespace clang {
	namespace clangd {			namespace clangd {
	class ParsedAST;			class ParsedAST;
	class SymbolIndex;			class SymbolIndex;

				/// Helper function for deriving an LSP Location from a SymbolLocation.
				llvm::Expected<Location> symbolLocationToLocation(const SymbolLocation &Loc,
				sammccall: nit: these names are vague and echo the type signature, maybe indexToLSPLocation?
				llvm::StringRef HintPath);
				sammccall: nit: HintPath should be TUPath, the decision to use some other path as a TU path can only be made in the caller (needs context). (Same is true for symbolToLocation, I'm not sure when that became public)

	/// Helper function for deriving an LSP Location for a Symbol.			/// Helper function for deriving an LSP Location for a Symbol.
	llvm::Expected<Location> symbolToLocation(const Symbol &Sym,			llvm::Expected<Location> symbolToLocation(const Symbol &Sym,
	llvm::StringRef HintPath);			llvm::StringRef HintPath);

	/// Searches for the symbols matching \p Query. The syntax of \p Query can be			/// Searches for the symbols matching \p Query. The syntax of \p Query can be
	/// the non-qualified name or fully qualified of a symbol. For example,			/// the non-qualified name or fully qualified of a symbol. For example,
	/// "vector" will match the symbol std::vector and "std::vector" would also			/// "vector" will match the symbol std::vector and "std::vector" would also
	/// match it. Direct children of scopes (namepaces, etc) can be listed with a			/// match it. Direct children of scopes (namepaces, etc) can be listed with a
	Show All 18 Lines

clang-tools-extra/clangd/FindSymbols.cpp

	Show All 12 Lines
	#include "ParsedAST.h"			#include "ParsedAST.h"
	#include "Quality.h"			#include "Quality.h"
	#include "SourceCode.h"			#include "SourceCode.h"
	#include "index/Index.h"			#include "index/Index.h"
	#include "clang/AST/DeclTemplate.h"			#include "clang/AST/DeclTemplate.h"
	#include "clang/Index/IndexDataConsumer.h"			#include "clang/Index/IndexDataConsumer.h"
	#include "clang/Index/IndexSymbol.h"			#include "clang/Index/IndexSymbol.h"
	#include "clang/Index/IndexingAction.h"			#include "clang/Index/IndexingAction.h"
				#include "llvm/ADT/StringRef.h"
	#include "llvm/Support/FormatVariadic.h"			#include "llvm/Support/FormatVariadic.h"
	#include "llvm/Support/Path.h"			#include "llvm/Support/Path.h"
	#include "llvm/Support/ScopedPrinter.h"			#include "llvm/Support/ScopedPrinter.h"

	#define DEBUG_TYPE "FindSymbols"			#define DEBUG_TYPE "FindSymbols"

	namespace clang {			namespace clang {
	namespace clangd {			namespace clangd {

	namespace {			namespace {
	using ScoredSymbolInfo = std::pair<float, SymbolInformation>;			using ScoredSymbolInfo = std::pair<float, SymbolInformation>;
	struct ScoredSymbolGreater {			struct ScoredSymbolGreater {
	bool operator()(const ScoredSymbolInfo &L, const ScoredSymbolInfo &R) {			bool operator()(const ScoredSymbolInfo &L, const ScoredSymbolInfo &R) {
	if (L.first != R.first)			if (L.first != R.first)
	return L.first > R.first;			return L.first > R.first;
	return L.second.name < R.second.name; // Earlier name is better.			return L.second.name < R.second.name; // Earlier name is better.
	}			}
	};			};

	} // namespace			} // namespace

	llvm::Expected<Location> symbolToLocation(const Symbol &Sym,			llvm::Expected<Location> symbolLocationToLocation(const SymbolLocation &Loc,
	llvm::StringRef HintPath) {			llvm::StringRef HintPath) {
	// Prefer the definition over e.g. a function declaration in a header			auto Path = URI::resolve(Loc.FileURI, HintPath);
	auto &CD = Sym.Definition ? Sym.Definition : Sym.CanonicalDeclaration;
	auto Path = URI::resolve(CD.FileURI, HintPath);
	if (!Path) {			if (!Path) {
	return llvm::make_error<llvm::StringError>(			return llvm::make_error<llvm::StringError>(
	formatv("Could not resolve path for symbol '{0}': {1}",			llvm::formatv("Could not resolve path for file '{0}': {1}", Loc.FileURI,
	Sym.Name, llvm::toString(Path.takeError())),			llvm::toString(Path.takeError())),
	llvm::inconvertibleErrorCode());			llvm::inconvertibleErrorCode());
	}			}
	Location L;			Location L;
	// Use HintPath as TUPath since there is no TU associated with this			// Use HintPath as TUPath since there is no TU associated with this
	// request.			// request.
	L.uri = URIForFile::canonicalize(*Path, HintPath);			L.uri = URIForFile::canonicalize(*Path, HintPath);
	Position Start, End;			Position Start, End;
	Start.line = CD.Start.line();			Start.line = Loc.Start.line();
	Start.character = CD.Start.column();			Start.character = Loc.Start.column();
	End.line = CD.End.line();			End.line = Loc.End.line();
	End.character = CD.End.column();			End.character = Loc.End.column();
	L.range = {Start, End};			L.range = {Start, End};
	return L;			return L;
	}			}

				llvm::Expected<Location> symbolToLocation(const Symbol &Sym,
				llvm::StringRef HintPath) {
				// Prefer the definition over e.g. a function declaration in a header
				return symbolLocationToLocation(
				Sym.Definition ? Sym.Definition : Sym.CanonicalDeclaration, HintPath);
				}

	llvm::Expected<std::vector<SymbolInformation>>			llvm::Expected<std::vector<SymbolInformation>>
	getWorkspaceSymbols(llvm::StringRef Query, int Limit,			getWorkspaceSymbols(llvm::StringRef Query, int Limit,
	const SymbolIndex *const Index, llvm::StringRef HintPath) {			const SymbolIndex *const Index, llvm::StringRef HintPath) {
	std::vector<SymbolInformation> Result;			std::vector<SymbolInformation> Result;
	if (Query.empty() \|\| !Index)			if (Query.empty() \|\| !Index)
	return Result;			return Result;

	auto Names = splitQualifiedName(Query);			auto Names = splitQualifiedName(Query);
	▲ Show 20 Lines • Show All 204 Lines • Show Last 20 Lines

clang-tools-extra/clangd/XRefs.h

	Show First 20 Lines • Show All 54 Lines • ▼ Show 20 Lines

	// If SpellingLoc points at a "word" that does not correspond to an expanded			// If SpellingLoc points at a "word" that does not correspond to an expanded
	// token (e.g. in a comment, a string, or a PP disabled region), then try to			// token (e.g. in a comment, a string, or a PP disabled region), then try to
	// find a close occurrence of that word that does.			// find a close occurrence of that word that does.
	// (This is for internal use by locateSymbolAt, and is exposed for testing).			// (This is for internal use by locateSymbolAt, and is exposed for testing).
	const syntax::Token *findNearbyIdentifier(SourceLocation SpellingLoc,			const syntax::Token *findNearbyIdentifier(SourceLocation SpellingLoc,
	const syntax::TokenBuffer &TB);			const syntax::TokenBuffer &TB);

				// Tries to provide a textual fallback for locating a symbol referenced at
				// a location, by looking up the word under the cursor as a symbol name in the
				// index. The aim is to pick up references to symbols in contexts where
				// AST-based resolution does not work, such as comments, strings, and PP
				// disabled regions. The implementation takes a number of measures to avoid
				// false positives, such as looking for some signal that the word at the
				// given location is likely to be an identifier. The function does not
				// currently return results for locations that end up as real expanded
				// tokens, although this may be relaxed for e.g. dependent code in the future.
				// (This is for internal use by locateSymbolAt, and is exposed for testing).
				std::vector<LocatedSymbol>
				locateSymbolNamedTextuallyAt(ParsedAST &AST, const SymbolIndex *Index,
				SourceLocation Loc,
				const std::string &MainFilePath);

	/// Get all document links			/// Get all document links
	std::vector<DocumentLink> getDocumentLinks(ParsedAST &AST);			std::vector<DocumentLink> getDocumentLinks(ParsedAST &AST);

	/// Returns highlights for all usages of a symbol at \p Pos.			/// Returns highlights for all usages of a symbol at \p Pos.
	std::vector<DocumentHighlight> findDocumentHighlights(ParsedAST &AST,			std::vector<DocumentHighlight> findDocumentHighlights(ParsedAST &AST,
	Position Pos);			Position Pos);

	struct ReferencesResult {			struct ReferencesResult {
	Show All 33 Lines

clang-tools-extra/clangd/XRefs.cpp

//===--- XRefs.cpp ------------------------------------------------ C++--===//		//===--- XRefs.cpp ------------------------------------------------ C++--===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
#include "XRefs.h"		#include "XRefs.h"
#include "AST.h"		#include "AST.h"
#include "CodeCompletionStrings.h"		#include "CodeCompletionStrings.h"
#include "FindSymbols.h"		#include "FindSymbols.h"
#include "FindTarget.h"		#include "FindTarget.h"
		#include "FuzzyMatch.h"
#include "Logger.h"		#include "Logger.h"
#include "ParsedAST.h"		#include "ParsedAST.h"
#include "Protocol.h"		#include "Protocol.h"
		#include "Quality.h"
#include "Selection.h"		#include "Selection.h"
#include "SourceCode.h"		#include "SourceCode.h"
#include "URI.h"		#include "URI.h"
#include "index/Index.h"		#include "index/Index.h"
#include "index/Merge.h"		#include "index/Merge.h"
#include "index/Relation.h"		#include "index/Relation.h"
#include "index/SymbolLocation.h"		#include "index/SymbolLocation.h"
#include "clang/AST/ASTContext.h"		#include "clang/AST/ASTContext.h"
Show All 20 Lines
#include "llvm/ADT/STLExtras.h"		#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/StringExtras.h"		#include "llvm/ADT/StringExtras.h"
#include "llvm/ADT/StringRef.h"		#include "llvm/ADT/StringRef.h"
#include "llvm/Support/Casting.h"		#include "llvm/Support/Casting.h"
#include "llvm/Support/Error.h"		#include "llvm/Support/Error.h"
#include "llvm/Support/MathExtras.h"		#include "llvm/Support/MathExtras.h"
#include "llvm/Support/Path.h"		#include "llvm/Support/Path.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
		#include <cctype>

namespace clang {		namespace clang {
namespace clangd {		namespace clangd {
namespace {		namespace {

// Returns the single definition of the entity declared by D, if visible.		// Returns the single definition of the entity declared by D, if visible.
// In particular:		// In particular:
// - for non-redeclarable kinds (e.g. local vars), return D		// - for non-redeclarable kinds (e.g. local vars), return D
▲ Show 20 Lines • Show All 113 Lines • ▼ Show 20 Lines	llvm::Optional<Location> makeLocation(const ASTContext &AST, SourceLocation Loc,
L.range = halfOpenToRange(		L.range = halfOpenToRange(
SM, CharSourceRange::getCharRange(Loc, Loc.getLocWithOffset(TokLen)));		SM, CharSourceRange::getCharRange(Loc, Loc.getLocWithOffset(TokLen)));
return L;		return L;
}		}

} // namespace		} // namespace

// Treat #included files as symbols, to enable go-to-definition on them.		// Treat #included files as symbols, to enable go-to-definition on them.
static llvm::Optional<LocatedSymbol>		static llvm::Optional<LocatedSymbol>
		sammccall: nit: mention snake_case, MACRO_CASE?
locateFileReferent(const Position &Pos, ParsedAST &AST,		locateFileReferent(const Position &Pos, ParsedAST &AST,
llvm::StringRef MainFilePath) {		llvm::StringRef MainFilePath) {
for (auto &Inc : AST.getIncludeStructure().MainFileIncludes) {		for (auto &Inc : AST.getIncludeStructure().MainFileIncludes) {
if (!Inc.Resolved.empty() && Inc.R.start.line == Pos.line) {		if (!Inc.Resolved.empty() && Inc.R.start.line == Pos.line) {
		sammccall: nit: can you mention this catches lowerCamel and UpperCamel
LocatedSymbol File;		LocatedSymbol File;
		sammccall: nit: prefer llvm::isUppercase to avoid locales
		sammccall: this will fire for initialisms like `HTTP`. I think we want to require both upper and lowercase letters.
File.Name = std::string(llvm::sys::path::filename(Inc.Resolved));		File.Name = std::string(llvm::sys::path::filename(Inc.Resolved));
File.PreferredDeclaration = {		File.PreferredDeclaration = {
URIForFile::canonicalize(Inc.Resolved, MainFilePath), Range{}};		URIForFile::canonicalize(Inc.Resolved, MainFilePath), Range{}};
File.Definition = File.PreferredDeclaration;		File.Definition = File.PreferredDeclaration;
// We're not going to find any further symbols on #include lines.		// We're not going to find any further symbols on #include lines.
return File;		return File;
}		}
		nridge: (There should be a return here, will fix locally.)
}		}
return llvm::None;		return llvm::None;
}		}

// Macros are simple: there's no declaration/definition distinction.		// Macros are simple: there's no declaration/definition distinction.
// As a consequence, there's no need to look them up in the index either.		// As a consequence, there's no need to look them up in the index either.
static llvm::Optional<LocatedSymbol>		static llvm::Optional<LocatedSymbol>
locateMacroReferent(const syntax::Token &TouchedIdentifier, ParsedAST &AST,		locateMacroReferent(const syntax::Token &TouchedIdentifier, ParsedAST &AST,
llvm::StringRef MainFilePath) {		llvm::StringRef MainFilePath) {
if (auto M = locateMacroAt(TouchedIdentifier, AST.getPreprocessor())) {		if (auto M = locateMacroAt(TouchedIdentifier, AST.getPreprocessor())) {
if (auto Loc = makeLocation(AST.getASTContext(),		if (auto Loc = makeLocation(AST.getASTContext(),
M->Info->getDefinitionLoc(), MainFilePath)) {		M->Info->getDefinitionLoc(), MainFilePath)) {
		sammccall: I think this is dead - we're just sorting by score.
LocatedSymbol Macro;		LocatedSymbol Macro;
Macro.Name = std::string(M->Name);		Macro.Name = std::string(M->Name);
Macro.PreferredDeclaration = *Loc;		Macro.PreferredDeclaration = *Loc;
Macro.Definition = Loc;		Macro.Definition = Loc;
		sammccall: this function should have a high-level comment describing the strategy and the limitations (e.g. idea of extending it to resolve nearby matching tokens). A name like `locateSymbolNamedTextuallyAt` would better describe what this does, rather than what its caller does. I would strongly consider exposing this function publicly for the detailed tests, and only smoke-testing it through `locateSymbolAt`. Having to break the AST in tests or otherwise rely on the "primary" logic not working is brittle and hard to verify.
		nridge:
		> I would strongly consider exposing this function publicly for the detailed tests, and only smoke-testing it through locateSymbolAt. Having to break the AST in tests or otherwise rely on the "primary" logic not working is brittle and hard to verify.
		I was going to push back against this, but I ended up convincing myself that your suggestion is better :) For the record, the consideration that convinced me was: Suppose in the future we add fancier AST-based logic that handles a case like `T().foo()` (for example, by surveying types for which `T` is actually substituted, and offering `foo()` inside those types). If all we're testing is "navigation works for this case" rather than "navigation works for this case via the AST-based mechanism", we could regress the AST logic but have our test still pass because the testcase is simple enough that the text-based navigation fallback (that we're adding here) works as well.
		nridge: Renamed and comment added. I still need to revise the tests.
return Macro;		return Macro;
}		}
}		}
return llvm::None;		return llvm::None;
}		}

// Decls are more complicated.		// Decls are more complicated.
// The AST contains at least a declaration, maybe a definition.		// The AST contains at least a declaration, maybe a definition.
// These are up-to-date, and so generally preferred over index results.		// These are up-to-date, and so generally preferred over index results.
// We perform a single batch index lookup to find additional definitions.		// We perform a single batch index lookup to find additional definitions.
static std::vector<LocatedSymbol>		static std::vector<LocatedSymbol>
locateASTReferent(SourceLocation CurLoc, const syntax::Token *TouchedIdentifier,		locateASTReferent(SourceLocation CurLoc, const syntax::Token *TouchedIdentifier,
ParsedAST &AST, llvm::StringRef MainFilePath,		ParsedAST &AST, llvm::StringRef MainFilePath,
const SymbolIndex *Index) {		const SymbolIndex *Index) {
		sammccall: FWIW the API for this is visibleNamespaces() from SourceCode.cpp. (No enclosing classes, but I suspect we can live without them once we have a nearby-tokens solution too)
		nridge: Thanks, that's convenient! Out of curiosity, though: is the reason to prefer this lexer-based approach over hit-testing the query location against `NamespaceDecl`s in the AST, mainly for performance?
		sammccall: Well, it was written for fallback code completion when we have no AST at all :-) Gathering from the AST should be better, though it's not quite as simple as hit-testing (you also have to find `using namespace`). But this exists today, which is a feature!
const SourceManager &SM = AST.getSourceManager();		const SourceManager &SM = AST.getSourceManager();
// Results follow the order of Symbols.Decls.		// Results follow the order of Symbols.Decls.
		nridge: It occurred to me that I don't think we can do `AnyScope=false` if we want to handle dependent member cases like `T().uniqueMethodName()`. The members we want to find in such a case will often be both in a different file (so nearby-tokens won't handle them) and not in any visible scope.
std::vector<LocatedSymbol> Result;		std::vector<LocatedSymbol> Result;
		sammccall: If we're bailing out on >3, I think this limit should be aiming to detect when there's >3, and avoid fetching way too much data, but not trying to avoid noise. (I'd suggest 10 or so)
// Keep track of SymbolID -> index mapping, to fill in index data later.		// Keep track of SymbolID -> index mapping, to fill in index data later.
llvm::DenseMap<SymbolID, size_t> ResultIndex;		llvm::DenseMap<SymbolID, size_t> ResultIndex;

auto AddResultDecl = [&](const NamedDecl *D) {		auto AddResultDecl = [&](const NamedDecl *D) {
const NamedDecl *Def = getDefinition(D);		const NamedDecl *Def = getDefinition(D);
		sammccall: This seems dead, you're requiring exact matches, these will always have the same score.
const NamedDecl *Preferred = Def ? Def : D;		const NamedDecl *Preferred = Def ? Def : D;

		sammccall: This is an interesting signal, I think there are two sensible ways to go about it:
		- assume results in this file are more likely accurate than those in other files. In this case we should at minimum be using this in ranking, but really we should just drop all cross-file results if we have an in-file one.
		- don't rely on index for main-file cases, and rely on "find nearby matching token and resolve it instead". That can easily handle cases defined/referenced in the main-file with sufficient accuracy, including non-indexed symbols. So here we can assume this signal is always false, and drop it.
		nridge: Since you've implemented "find nearby matching token and resolve it", I went with the second approach.
auto Loc = makeLocation(AST.getASTContext(), nameLocation(*Preferred, SM),		auto Loc = makeLocation(AST.getASTContext(), nameLocation(*Preferred, SM),
		sammccall: BTW I think the answer for constructors is just to drop all constructor results here. (This also affects template specializations which I think we can not worry about, and virtual method hierarchies which are more painful but I also wouldn't try to fix now)
MainFilePath);		MainFilePath);
		sammccall: I'm not sure why we're using SymbolToLocation here:
		- Main file URI check: the `Symbol` has URIs. They need to be canonicalized to file URIs before comparison. This allows checking both decl and def location.
		- PreferredDeclaration and Definition can be more easily set directly from the `Symbol`
		nridge: Well the `Symbol` has `SymbolLocation`s and we need protocol `Location`s, so we have to use something to convert them. Other places that perform such conversion use `symbolToLocation()`, so I reused it. But you're right that `symbolToLocation()` also has some "pick the definition or the declaration" logic which is less appropriate here. I can factor out the `SymbolLocation` --> `Location` conversion logic from `symbolToLocation()`, and just use that here.
if (!Loc)		if (!Loc)
return;		return;

Result.emplace_back();		Result.emplace_back();
Result.back().Name = printName(AST.getASTContext(), *Preferred);		Result.back().Name = printName(AST.getASTContext(), *Preferred);
Result.back().PreferredDeclaration = *Loc;		Result.back().PreferredDeclaration = *Loc;
		sammccall: I wouldn't bother qualifying this as "for now". Any code is subject to change in the future, but requiring an exact name match for index-based results seems more like a design decision than a fixme.
		nridge: Do we want to rule out the possibility of handling typos in an identifier name in a comment (in cases where we have high confidence in the match, e.g. a long / unique name, small edit distance, only one potential match) in the future? This is also relevant to whether we want to keep the `FuzzyMatcher` or not.
		sammccall: No idea whether typo-correction is a good idea in principle - tradeoff between current false negatives and false positives+compute. However neither FuzzyMatcher nor the existing index implementations support/can easily support real typo correction, and it seems implausible to me we'd add it for this feature. Compare to e.g:
		- allowing case-insensitive match in some cases: `fooBar` vs `FooBar` is a plausible "typo". This is easy to implement.
		- correct the typo where we see the fixed version used as an identifier in this file (and not the original). Excludes some cases, but drives false-positives way down, and easy to implement.
		I don't think we need to rule things out, but I'm uncertain enough about the approach to think that putting comments, fuzzymatcher etc here speculatively isn't worth it.
		nridge: Perhaps I'm unclear on the distinction between fuzzy matching and typo correction. Are they not both a matter of comparing a candidate string against a test string, and considering it a match if they are "close enough" according to some metric (with the metric potentially being a simple edit distance in the case of typo correction)?
// Preferred is always a definition if possible, so this check works.		// Preferred is always a definition if possible, so this check works.
if (Def == Preferred)		if (Def == Preferred)
Result.back().Definition = *Loc;		Result.back().Definition = *Loc;

// Record SymbolID for index lookup later.		// Record SymbolID for index lookup later.
if (auto ID = getSymbolID(Preferred))		if (auto ID = getSymbolID(Preferred))
ResultIndex[*ID] = Result.size() - 1;		ResultIndex[*ID] = Result.size() - 1;
};		};
Show All 9 Lines	if (const auto *CMD = llvm::dyn_cast<CXXMethodDecl>(D)) {
Attr = D->getAttr<FinalAttr>();		Attr = D->getAttr<FinalAttr>();
if (Attr && TouchedIdentifier &&		if (Attr && TouchedIdentifier &&
SM.getSpellingLoc(Attr->getLocation()) ==		SM.getSpellingLoc(Attr->getLocation()) ==
TouchedIdentifier->location()) {		TouchedIdentifier->location()) {
// We may be overridding multiple methods - offer them all.		// We may be overridding multiple methods - offer them all.
for (const NamedDecl *ND : CMD->overridden_methods())		for (const NamedDecl *ND : CMD->overridden_methods())
AddResultDecl(ND);		AddResultDecl(ND);
continue;		continue;
}		}
		sammccall: I don't think this should be logged, particularly by default - it doesn't really indicate anything other than we should have a "look up symbol by name" API (ok, actually I think this is just dead code because we've already checked name above)
}		}

// Special case: the point of declaration of a template specialization,		// Special case: the point of declaration of a template specialization,
// it's more useful to navigate to the template declaration.		// it's more useful to navigate to the template declaration.
if (auto *CTSD = dyn_cast<ClassTemplateSpecializationDecl>(D)) {		if (auto *CTSD = dyn_cast<ClassTemplateSpecializationDecl>(D)) {
if (TouchedIdentifier &&		if (TouchedIdentifier &&
D->getLocation() == TouchedIdentifier->location()) {		D->getLocation() == TouchedIdentifier->location()) {
AddResultDecl(CTSD->getSpecializedTemplate());		AddResultDecl(CTSD->getSpecializedTemplate());
▲ Show 20 Lines • Show All 88 Lines • ▼ Show 20 Lines	const syntax::Token *findNearbyIdentifier(SourceLocation SpellingLoc,
};		};
const syntax::Token *BestTok = nullptr;		const syntax::Token *BestTok = nullptr;
// Search bounds are based on word length: 2^N lines forward.		// Search bounds are based on word length: 2^N lines forward.
unsigned BestCost = Word.size() + 1;		unsigned BestCost = Word.size() + 1;

// Updates BestTok and BestCost if Tok is a good candidate.		// Updates BestTok and BestCost if Tok is a good candidate.
// May return true if the cost is too high for this token.		// May return true if the cost is too high for this token.
auto Consider = [&](const syntax::Token &Tok) {		auto Consider = [&](const syntax::Token &Tok) {
if(!(Tok.kind() == tok::identifier && Tok.text(SM) == Word))		if (!(Tok.kind() == tok::identifier && Tok.text(SM) == Word))
return false;		return false;
// No point guessing the same location we started with.		// No point guessing the same location we started with.
if (Tok.location() == WordStart)		if (Tok.location() == WordStart)
return false;		return false;
// We've done cheap checks, compute cost so we can break the caller's loop.		// We've done cheap checks, compute cost so we can break the caller's loop.
unsigned TokCost = Cost(Tok.location());		unsigned TokCost = Cost(Tok.location());
if (TokCost >= BestCost)		if (TokCost >= BestCost)
return true; // causes the outer loop to break.		return true; // causes the outer loop to break.
Show All 26 Lines	if (BestTok)
vlog(		vlog(
"Word {0} under cursor {1} isn't a token (after PP), trying nearby {2}",		"Word {0} under cursor {1} isn't a token (after PP), trying nearby {2}",
Word, WordStart.printToString(SM),		Word, WordStart.printToString(SM),
BestTok->location().printToString(SM));		BestTok->location().printToString(SM));

return BestTok;		return BestTok;
}		}

		static bool isLikelyToBeIdentifier(StringRef Word) {
		// Word contains underscore.
		// This handles things like snake_case and MACRO_CASE.
		if (Word.contains('_')) {
		return true;
		}
		// Word contains capital letter other than at beginning.
		// This handles things like lowerCamel and UpperCamel.
		// The check for also containing a lowercase letter is to rule out
		// initialisms like "HTTP".
		bool HasLower = Word.find_if(clang::isLowercase) != StringRef::npos;
		bool HasUpper = Word.substr(1).find_if(clang::isUppercase) != StringRef::npos;
		if (HasLower && HasUpper) {
		return true;
		}
		// FIXME: There are other signals we could listen for.
		// Some of these require inspecting the surroundings of the word as well.
		// - mid-sentence Capitalization
		// - markup like quotes / backticks / brackets / "\p"
		// - word has a qualifier (foo::bar)
		return false;
		}
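
The heuristic discussed in the inline comments above (underscore, or mixed case with a capital past the first character, with initialisms like "HTTP" excluded) can be sketched outside of clangd roughly as follows. This is only an illustration under stated assumptions: `looksLikeIdentifier`, `isAsciiLower` and `isAsciiUpper` are hypothetical names that are not part of this patch, and the ASCII-only helpers merely stand in for the locale-independent `clang::isLowercase` / `clang::isUppercase` used in the diff.

```cpp
#include <algorithm>
#include <string_view>

// Hypothetical ASCII-only helpers, standing in for the locale-independent
// clang::isLowercase / clang::isUppercase checks used in the patch.
static bool isAsciiLower(char C) { return C >= 'a' && C <= 'z'; }
static bool isAsciiUpper(char C) { return C >= 'A' && C <= 'Z'; }

// Hypothetical standalone version of the "is this word likely to be an
// identifier?" heuristic. Not the clangd implementation.
bool looksLikeIdentifier(std::string_view Word) {
  if (Word.empty())
    return false;
  // An underscore covers snake_case and MACRO_CASE.
  if (Word.find('_') != std::string_view::npos)
    return true;
  // A capital letter somewhere after the first character covers lowerCamel
  // and UpperCamel. Also requiring a lowercase letter rules out initialisms
  // such as "HTTP".
  bool HasLower = std::any_of(Word.begin(), Word.end(), isAsciiLower);
  bool HasUpper = std::any_of(Word.begin() + 1, Word.end(), isAsciiUpper);
  return HasLower && HasUpper;
}
```

With this sketch, ordinary prose words such as "lazily" or a sentence-initial "Format" are rejected, which is the kind of false positive raised earlier in the review, while "snake_case" and "lowerCamel" are accepted.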

		using ScoredLocatedSymbol = std::pair<float, LocatedSymbol>;
		nridge: Oh whoops, this assumption is another dependency on `findNearbyIdentifier()`
		nridge: For now, I just had it restrict to 3 results in general (even if they're in the same file). Once `findNearbyIdentifier()` lands, the behaviour will automatically become what we intended.
		struct ScoredSymbolGreater {
		bool operator()(const ScoredLocatedSymbol &L,
		const ScoredLocatedSymbol &R) const {
		return L.first > R.first;
		}
		};

		std::vector<LocatedSymbol>
		locateSymbolNamedTextuallyAt(ParsedAST &AST, const SymbolIndex *Index,
		SourceLocation Loc,
		const std::string &MainFilePath) {
		const auto &SM = AST.getSourceManager();
		FileID File;
		unsigned Pos;
		std::tie(File, Pos) = SM.getDecomposedLoc(Loc);
		llvm::StringRef Code = SM.getBufferData(File);
		auto QueryString = wordTouching(Code, Pos);
		if (!isLikelyToBeIdentifier(QueryString)) {
		return {};
		}

		// If this is a real token that survived preprocessing, don't
		// use the textual heuristic. This is to avoid false positives
		// when over tokens that happen to correspond to an identifier
		// name elsewhere.
		// FIXME: Relax this for dependent code.
		unsigned WordOffset = QueryString.data() - Code.data();
		SourceLocation WordStart = SM.getComposedLoc(File, WordOffset);
		// If this is a real token that survived preprocessing, don't use heuristics.
		auto WordExpandedTokens =
		AST.getTokens().expandedTokens(SM.getMacroArgExpandedLocation(WordStart));
		if (!WordExpandedTokens.empty())
		return {};

		FuzzyFindRequest Req;
		Req.Query = QueryString.str();
		Req.ProximityPaths = {MainFilePath};
		Req.Scopes = visibleNamespaces(Code.take_front(Pos), AST.getLangOpts());
		// FIXME: For extra strictness, consider AnyScope=false.
		Req.AnyScope = true;
		// We limit the results to 3 further below. This limit is to avoid fetching
		// too much data, while still likely having enough for 3 results to remain
		// after additional filtering.
		Req.Limit = 10;
		TopN<ScoredLocatedSymbol, ScoredSymbolGreater> Top(*Req.Limit);
		sammccall: (The fuzzy matcher and topN are still here - I think we don't need them, right? With only up-to-3 results, std::sort seems more obvious)
		FuzzyMatcher Filter(Req.Query);
		Index->fuzzyFind(Req, [&](const Symbol &Sym) {
		sammccall: maybe bail out early (on unusable/too many) instead of doing all the score computations first?
		    fuzzyFind(..., {
		      // bail out if it's a constructor or name doesn't match
		      if (Results.size() >= 3) { TooMany = true; return; }
		      // add result
		    });
		auto MaybeDeclLoc =
		nridge: Sorry this location-setting code is so messy. All my attempts to make it more concise have been thwarted by `llvm::Expected`'s very restrictive API.
		sammccall (not done): Ugh, don't get me started on Error/Expected :-( I'd love to get rid of it somehow, but it seems like we'd inevitably just end up with the new thing + Error/Expected + error_code/ErrorOr + return-a-bool, and I'm not sure it'd be better. (If you have more energy than me, I'd enthusiastically +1 an llvm-dev proposal to drop the clever checks from llvm::Error, and I know some others who would...)
		symbolLocationToLocation(Sym.CanonicalDeclaration, MainFilePath);
		if (!MaybeDeclLoc) {
		log("locateSymbolNamedTextuallyAt: {0}", MaybeDeclLoc.takeError());
		return;
		}
		Location DeclLoc = *MaybeDeclLoc;
		Location DefLoc;
		if (Sym.Definition) {
		auto MaybeDefLoc = symbolLocationToLocation(Sym.Definition, MainFilePath);
		if (!MaybeDefLoc) {
		log("locateSymbolNamedTextuallyAt: {0}", MaybeDefLoc.takeError());
		return;
		}
		DefLoc = *MaybeDefLoc;
		}
		Location PreferredLoc = bool(Sym.Definition) ? DefLoc : DeclLoc;

		// For now, only consider exact name matches, including case.
		// This is to avoid too many false positives.
		// We could relax this in the future if we make the query more accurate
		// by other means.
		if (Sym.Name != QueryString)
		return;

		// Exclude constructor results. They have the same name as the class,
		// but we don't have enough context to prefer them over the class.
		if (Sym.SymInfo.Kind == index::SymbolKind::Constructor)
		return;

		std::string Scope = std::string(Sym.Scope);
		llvm::StringRef ScopeRef = Scope;
		ScopeRef.consume_back("::");
		LocatedSymbol Located;
		Located.Name = (Sym.Name + Sym.TemplateSpecializationArgs).str();
		Located.PreferredDeclaration = DeclLoc;
		Located.Definition = DefLoc;

		SymbolQualitySignals Quality;
		Quality.merge(Sym);
		SymbolRelevanceSignals Relevance;
		Relevance.Name = Sym.Name;
		Relevance.Query = SymbolRelevanceSignals::Generic;
		if (auto NameMatch = Filter.match(Sym.Name))
		Relevance.NameMatch = *NameMatch;
		else
		return;
		Relevance.merge(Sym);
		auto Score =
		evaluateSymbolAndRelevance(Quality.evaluate(), Relevance.evaluate());
		dlog("locateSymbolNamedTextuallyAt: {0}{1} = {2}\n{3}{4}\n", Sym.Scope,
		Sym.Name, Score, Quality, Relevance);

		Top.push({Score, std::move(Located)});
		});
		std::vector<LocatedSymbol> Result;
		for (auto &Res : std::move(Top).items())
		Result.push_back(std::move(Res.second));
		// Assume we don't have results from the current file, otherwise the
		// findNearbyIdentifier() mechanism would have handled them.
		// If we have more than 3 results, and none from the current file, don't
		// return anything, as confidence is too low.
		// FIXME: Alternatively, try a stricter query?
		if (Result.size() > 3)
		return {};
		return Result;
		}
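
The filtering policy of the function above (exact case-sensitive name match, constructors excluded, nothing returned when more than 3 candidates survive) can be sketched in isolation, together with the early bail-out suggested in the inline comment. This is a minimal illustration only: `IndexedSymbol`, `Kind`, and `textualCandidates` are hypothetical stand-ins invented for this sketch, not clangd types or APIs, and the scoring step of the patch is left out entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for an index symbol (not clangd's Symbol).
enum class Kind { Class, Function, Constructor, Variable };

struct IndexedSymbol {
  std::string Scope; // enclosing scope, e.g. "ns::"
  std::string Name;  // unqualified name
  Kind SymKind;
};

// Returns the symbols whose name matches Word, or an empty vector when more
// than MaxResults candidates survive (the lookup is then too ambiguous).
std::vector<IndexedSymbol>
textualCandidates(const std::vector<IndexedSymbol> &Index,
                  const std::string &Word, std::size_t MaxResults = 3) {
  assert(MaxResults > 0 && "a limit of zero would reject every lookup");
  std::vector<IndexedSymbol> Results;
  for (const IndexedSymbol &Sym : Index) {
    // Only exact, case-sensitive name matches are considered.
    if (Sym.Name != Word)
      continue;
    // A constructor shares its class's name; keep the class, drop the ctor.
    if (Sym.SymKind == Kind::Constructor)
      continue;
    // Bail out as soon as one candidate too many shows up.
    if (Results.size() >= MaxResults)
      return {};
    Results.push_back(Sym);
  }
  return Results;
}
```

With an index holding a class `MyClass`, its constructor, and a function `myclass`, a lookup of `MyClass` yields only the class. Four same-named functions yield nothing, which mirrors the `Result.size() > 3` check in the patch.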

std::vector<LocatedSymbol> locateSymbolAt(ParsedAST &AST, Position Pos,		std::vector<LocatedSymbol> locateSymbolAt(ParsedAST &AST, Position Pos,
const SymbolIndex *Index) {		const SymbolIndex *Index) {
const auto &SM = AST.getSourceManager();		const auto &SM = AST.getSourceManager();
auto MainFilePath =		auto MainFilePath =
getCanonicalPath(SM.getFileEntryForID(SM.getMainFileID()), SM);		getCanonicalPath(SM.getFileEntryForID(SM.getMainFileID()), SM);
if (!MainFilePath) {		if (!MainFilePath) {
elog("Failed to get a path for the main file, so no references");		elog("Failed to get a path for the main file, so no references");
return {};		return {};
[29 lines not shown; context: if (const syntax::Token *NearbyIdent =]
if (auto Macro = locateMacroReferent(NearbyIdent, AST, MainFilePath))		if (auto Macro = locateMacroReferent(NearbyIdent, AST, MainFilePath))
return {*std::move(Macro)};		return {*std::move(Macro)};
ASTResults = locateASTReferent(NearbyIdent->location(), NearbyIdent, AST,		ASTResults = locateASTReferent(NearbyIdent->location(), NearbyIdent, AST,
*MainFilePath, Index);		*MainFilePath, Index);
if (!ASTResults.empty())		if (!ASTResults.empty())
return ASTResults;		return ASTResults;
}		}

return {};		return locateSymbolNamedTextuallyAt(AST, Index, CurLoc, MainFilePath);
}		}

std::vector<DocumentLink> getDocumentLinks(ParsedAST &AST) {		std::vector<DocumentLink> getDocumentLinks(ParsedAST &AST) {
const auto &SM = AST.getSourceManager();		const auto &SM = AST.getSourceManager();
auto MainFilePath =		auto MainFilePath =
getCanonicalPath(SM.getFileEntryForID(SM.getMainFileID()), SM);		getCanonicalPath(SM.getFileEntryForID(SM.getMainFileID()), SM);
if (!MainFilePath) {		if (!MainFilePath) {
elog("Failed to get a path for the main file, so no links");		elog("Failed to get a path for the main file, so no links");
[569 lines not shown]

clang-tools-extra/clangd/unittests/XRefsTests.cpp

[590 lines not shown; context: if (!WantDecl) {]
llvm::Optional<Range> GotDef;		llvm::Optional<Range> GotDef;
if (Results[0].Definition)		if (Results[0].Definition)
GotDef = Results[0].Definition->range;		GotDef = Results[0].Definition->range;
EXPECT_EQ(WantDef, GotDef) << Test;		EXPECT_EQ(WantDef, GotDef) << Test;
}		}
}		}
}		}

		TEST(LocateSymbol, Textual) {
		[Inline comment, sammccall, Done] `#ifdef`'d out code is another interesting motivation worth testing.
		const char *Tests[] = {
		R"cpp(// Comment
		struct [[MyClass]] {};
		// Comment mentioning M^yClass
		)cpp",
		R"cpp(// String
		struct [[MyClass]] {};
		const char* s = "String literal mentioning M^yClass";
		)cpp",
		R"cpp(// Ifdef'ed out code
		struct [[MyClass]] {};
		#ifdef WALDO
		M^yClass var;
		#endif
		)cpp"};

		for (const char *Test : Tests) {
		Annotations T(Test);
		llvm::Optional<Range> WantDecl;
		if (!T.ranges().empty())
		WantDecl = T.range();

		auto TU = TestTU::withCode(T.code());

		auto AST = TU.build();
		auto Index = TU.index();
		auto Results = locateSymbolAt(AST, T.point(), Index.get());

		if (!WantDecl) {
		EXPECT_THAT(Results, IsEmpty()) << Test;
		} else {
		ASSERT_THAT(Results, ::testing::SizeIs(1)) << Test;
		EXPECT_EQ(Results[0].PreferredDeclaration.range, *WantDecl) << Test;
		}
		}
		}
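
The test inputs above mark the cursor with `^` and the expected target with `[[...]]`. How such markers can be turned into plain code plus offsets is sketched below. It is a simplified illustration, not the `Annotations` helper used by the test: `ParsedAnnotations` and `parseAnnotations` are hypothetical names, positions are plain byte offsets rather than line/column pairs, and named markers such as `$FooLoc[[...]]` are not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical result type for this sketch (not the real test helper).
struct ParsedAnnotations {
  std::string Code;      // input text with all markers removed
  bool HasPoint = false; // true if a `^` marker was seen
  std::size_t Point = 0; // offset into Code of the last `^` marker
  std::vector<std::pair<std::size_t, std::size_t>> Ranges; // [begin, end)
};

ParsedAnnotations parseAnnotations(const std::string &Text) {
  ParsedAnnotations Out;
  std::vector<std::size_t> OpenRanges; // begin offsets of unclosed `[[`
  for (std::size_t I = 0; I < Text.size();) {
    if (Text[I] == '^') {
      // Record the cursor position; the marker itself is dropped.
      Out.HasPoint = true;
      Out.Point = Out.Code.size();
      ++I;
    } else if (Text.compare(I, 2, "[[") == 0) {
      OpenRanges.push_back(Out.Code.size());
      I += 2;
    } else if (Text.compare(I, 2, "]]") == 0 && !OpenRanges.empty()) {
      // Close the most recently opened range.
      Out.Ranges.emplace_back(OpenRanges.back(), Out.Code.size());
      OpenRanges.pop_back();
      I += 2;
    } else {
      Out.Code += Text[I++];
    }
  }
  assert(OpenRanges.empty() && "every [[ needs a matching ]]");
  return Out;
}
```

In the comment test case, the range covers the declaration of `MyClass` and the point sits inside the mention in the comment, so the expectation reads: go-to-definition at the point should land on the range.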

TEST(LocateSymbol, Ambiguous) {		TEST(LocateSymbol, Ambiguous) {
auto T = Annotations(R"cpp(		auto T = Annotations(R"cpp(
struct Foo {		struct Foo {
Foo();		Foo();
Foo(Foo&&);		Foo(Foo&&);
$ConstructorLoc[[Foo]](const char*);		$ConstructorLoc[[Foo]](const char*);
};		};

[58 lines not shown; context: TEST(LocateSymbol, Ambiguous) {]
EXPECT_THAT(locateSymbolAt(AST, T.point("12")),		EXPECT_THAT(locateSymbolAt(AST, T.point("12")),
UnorderedElementsAre(Sym("bar", T.range("NonstaticOverload1")),		UnorderedElementsAre(Sym("bar", T.range("NonstaticOverload1")),
Sym("bar", T.range("NonstaticOverload2"))));		Sym("bar", T.range("NonstaticOverload2"))));
EXPECT_THAT(locateSymbolAt(AST, T.point("13")),		EXPECT_THAT(locateSymbolAt(AST, T.point("13")),
UnorderedElementsAre(Sym("baz", T.range("StaticOverload1")),		UnorderedElementsAre(Sym("baz", T.range("StaticOverload1")),
Sym("baz", T.range("StaticOverload2"))));		Sym("baz", T.range("StaticOverload2"))));
}		}

		TEST(LocateSymbol, TextualAmbiguous) {
		auto T = Annotations(R"cpp(
		struct Foo {
		void $FooLoc[[uniqueMethodName]]();
		};
		struct Bar {
		void $BarLoc[[uniqueMethodName]]();
		};
		// Will call u^niqueMethodName() on t.
		template <typename T>
		void f(T t);
		)cpp");
		auto TU = TestTU::withCode(T.code());
		auto AST = TU.build();
		auto Index = TU.index();
		EXPECT_THAT(locateSymbolAt(AST, T.point(), Index.get()),
		UnorderedElementsAre(Sym("uniqueMethodName", T.range("FooLoc")),
		Sym("uniqueMethodName", T.range("BarLoc"))));
		}

TEST(LocateSymbol, TemplateTypedefs) {		TEST(LocateSymbol, TemplateTypedefs) {
auto T = Annotations(R"cpp(		auto T = Annotations(R"cpp(
template <class T> struct function {};		template <class T> struct function {};
template <class T> using callback = function<T()>;		template <class T> using callback = function<T()>;

c^allback<int> foo;		c^allback<int> foo;
)cpp");		)cpp");
auto AST = TestTU::withCode(T.code()).build();		auto AST = TestTU::withCode(T.code()).build();
[183 lines not shown; context: TEST(LocateSymbol, NearbyTokenSmoke) {]
auto AST = TestTU::withCode(T.code()).build();		auto AST = TestTU::withCode(T.code()).build();
// We don't pass an index, so can't hit index-based fallback.		// We don't pass an index, so can't hit index-based fallback.
EXPECT_THAT(locateSymbolAt(AST, T.point()),		EXPECT_THAT(locateSymbolAt(AST, T.point()),
ElementsAre(Sym("err", T.range())));		ElementsAre(Sym("err", T.range())));
}		}

TEST(LocateSymbol, NearbyIdentifier) {		TEST(LocateSymbol, NearbyIdentifier) {
const char *Tests[] = {		const char *Tests[] = {
R"cpp(		R"cpp(
// regular identifiers (won't trigger)		// regular identifiers (won't trigger)
int hello;		int hello;
int y = he^llo;		int y = he^llo;
)cpp",		)cpp",
R"cpp(		R"cpp(
// disabled preprocessor sections		// disabled preprocessor sections
int [[hello]];		int [[hello]];
#if 0		#if 0
int y = ^hello;		int y = ^hello;
#endif		#endif
)cpp",		)cpp",
R"cpp(		R"cpp(
// comments		// comments
// he^llo, world		// he^llo, world
int [[hello]];		int [[hello]];
)cpp",		)cpp",
R"cpp(		R"cpp(
// string literals		// string literals
int [[hello]];		int [[hello]];
const char* greeting = "h^ello, world";		const char* greeting = "h^ello, world";
)cpp",		)cpp",

R"cpp(		R"cpp(
// can refer to macro invocations (even if they expand to nothing)		// can refer to macro invocations (even if they expand to nothing)
#define INT int		#define INT int
[[INT]] x;		[[INT]] x;
// I^NT		// I^NT
)cpp",		)cpp",

R"cpp(		R"cpp(
// prefer nearest occurrence		// prefer nearest occurrence
int hello;		int hello;
int x = hello;		int x = hello;
// h^ello		// h^ello
int y = [[hello]];		int y = [[hello]];
int z = hello;		int z = hello;
)cpp",		)cpp",

R"cpp(		R"cpp(
// short identifiers find near results		// short identifiers find near results
int [[hi]];		int [[hi]];
// h^i		// h^i
)cpp",		)cpp",
R"cpp(		R"cpp(
// short identifiers don't find far results		// short identifiers don't find far results
int hi;		int hi;



// h^i		// h^i
)cpp",		)cpp",
};		};
for (const char* Test : Tests) {		for (const char *Test : Tests) {
Annotations T(Test);		Annotations T(Test);
auto AST = TestTU::withCode(T.code()).build();		auto AST = TestTU::withCode(T.code()).build();
const auto &SM = AST.getSourceManager();		const auto &SM = AST.getSourceManager();
llvm::Optional<Range> Nearby;		llvm::Optional<Range> Nearby;
if (const auto*Tok = findNearbyIdentifier(		if (const auto *Tok = findNearbyIdentifier(
cantFail(sourceLocationInMainFile(SM, T.point())), AST.getTokens()))		cantFail(sourceLocationInMainFile(SM, T.point())), AST.getTokens()))
Nearby = halfOpenToRange(SM, CharSourceRange::getCharRange(		Nearby = halfOpenToRange(SM, CharSourceRange::getCharRange(
Tok->location(), Tok->endLocation()));		Tok->location(), Tok->endLocation()));
if (T.ranges().empty())		if (T.ranges().empty())
EXPECT_THAT(Nearby, Eq(llvm::None)) << Test;		EXPECT_THAT(Nearby, Eq(llvm::None)) << Test;
else		else
EXPECT_THAT(Nearby, T.range()) << Test;		EXPECT_THAT(Nearby, T.range()) << Test;
}		}
}		}
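
The expectations in the NearbyIdentifier cases (prefer the nearest occurrence, let short identifiers match only nearby tokens) can be illustrated with a toy selection function. This is a sketch under loudly stated assumptions and not the logic of `findNearbyIdentifier`: the `Tok` record, the function name, and the distance limits (2 lines for words shorter than 3 characters, 100 lines otherwise) are all invented here. The tie-break in favour of the occurrence after the cursor is chosen only because it matches the expected range in the "prefer nearest occurrence" case.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical token record: spelling plus 0-based line number.
struct Tok {
  std::string Text;
  int Line;
};

// Returns the index in Toks of the chosen occurrence of Word, or -1 if no
// occurrence is close enough to the cursor line.
int nearestOccurrence(const std::vector<Tok> &Toks, const std::string &Word,
                      int CursorLine) {
  assert(CursorLine >= 0);
  // Assumed limits, invented for this sketch: short words are common, so
  // they are only trusted when the occurrence is very close.
  const int MaxDistance = Word.size() < 3 ? 2 : 100;
  int Best = -1;
  int BestDist = 0;
  for (std::size_t I = 0; I < Toks.size(); ++I) {
    if (Toks[I].Text != Word)
      continue;
    const int Dist = std::abs(Toks[I].Line - CursorLine);
    if (Dist > MaxDistance)
      continue;
    const bool After = Toks[I].Line > CursorLine;
    // A closer occurrence wins; on a tie, one after the cursor wins.
    if (Best < 0 || Dist < BestDist || (Dist == BestDist && After)) {
      Best = static_cast<int>(I);
      BestDist = Dist;
    }
  }
  return Best;
}
```

For `hello` tokens on lines 1, 2, 4, and 5 with the cursor on line 3, the sketch picks the token on line 4. A lone `hi` token four lines away from the cursor is rejected.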
[406 lines not shown]

Revision Contents

Diff 248630

clang-tools-extra/clangd/FindSymbols.h

clang-tools-extra/clangd/FindSymbols.cpp

clang-tools-extra/clangd/XRefs.h

clang-tools-extra/clangd/XRefs.cpp

clang-tools-extra/clangd/unittests/XRefsTests.cpp