This is an archive of the discontinued LLVM Phabricator instance.


Differential D72874

[clangd] Add a textual fallback for go-to-definition
ClosedPublic

Authored by nridge on Jan 16 2020, 1:32 PM.


Details

Reviewers

sammccall

Commits

rGdc4cd43904df: [clangd] Add a textual fallback for go-to-definition

Summary

This facilitates performing go-to-definition in contexts where AST-based resolution does not work, such as comments, string literals, preprocessor disabled regions, and macro definitions, based on textual lookup in the index.

Partially fixes https://github.com/clangd/clangd/issues/241

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

nridge created this revision. Jan 16 2020, 1:32 PM

Herald added a project: Restricted Project. Jan 16 2020, 1:32 PM

Herald added subscribers: cfe-commits, usaxena95, kadircet and 4 others.

Unit tests: pass. 61850 tests passed, 0 failed and 781 were skipped.

clang-tidy: unknown.

clang-format: pass.

Build artifacts: diff.json, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

Harbormaster completed remote builds in B44195: Diff 238599. Jan 16 2020, 1:56 PM

nridge marked an inline comment as done. Jan 21 2020, 8:40 AM

nridge added inline comments.

clang-tools-extra/clangd/XRefs.cpp
203	(There should be a return here, will fix locally.)

I've tried this out locally and it's fun! As suspected on the bug though, IMO it's far from accurate enough. Examples from clangd/Compiler.cpp:

it triggers on almost every word, even words that plainly don't refer to any decl like format [[lazily]], in case vlog is off. This means that e.g. (in VSCode) the underline on ctrl-hover gives no/misleading signal. It also means that missing your target now jumps you somewhere random instead of doing nothing.
when it works properly, the correct result is usually mixed with incorrect results (e.g. createInvocationFromCommandLine sets [[DisableFree]]).
it doesn't work for some symbols - ones that are not indexable (e.g. RemappedFileBuffers will handle the lifetime of the [[Buffer]] pointer, gives a variety of wrong results)

So while I want to stress this is really cool, it doesn't feel reliable on any dimension: you can't trust clangd on whether the word is an actual reference, you can't trust any particular result, and you can't trust the correct result is in the set.

Some suggestions:

only trigger when there's *some* positive signal for the word.
- Markup like quotes/backticks/brackets/\p
- weird case like lowerCamel, UpperCamel, CAPS, mid-sentence Capitalization, under_scores.
- use of the word as a token in nearby code (very close if very short, anywhere in file if longer)
- (maybe you want to support ns::Qualifiers?)
post-filter aggressively - only return exact name matches (I think including case).
call fuzzyFind directly and set ProximityPath as well as the enclosing scopes from lexing. For extra strictness consider AnyScope=false
if you get more than 3 results, and none from current file, maybe don't return anything, as confidence is too low. Or try a stricter query...
handle the most common case of non-indexable symbols (local symbols) by running the query against the closest occurrence of the token in code.
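To make the "positive signal" idea concrete, here is a minimal sketch of a word-shape check. It is an illustration only, not the code in the patch: `looksLikeIdentifier` and its helpers are hypothetical names, and the rule (an inner underscore, or a mix of lowercase letters with an uppercase letter after the first character) is just one possible reading of the suggestions above. It uses plain ASCII comparisons so the result does not depend on the locale.

```cpp
#include <cstddef>
#include <string_view>

// ASCII-only character tests, so the result does not depend on the C locale.
inline bool isUpperAscii(char C) { return C >= 'A' && C <= 'Z'; }
inline bool isLowerAscii(char C) { return C >= 'a' && C <= 'z'; }

// Hypothetical helper: returns true if Word has a shape that ordinary prose
// words rarely have (snake_case, MACRO_CASE, lowerCamel, UpperCamel).
bool looksLikeIdentifier(std::string_view Word) {
  bool HasLower = false, HasInnerUpper = false, HasInnerUnderscore = false;
  for (std::size_t I = 0; I < Word.size(); ++I) {
    char C = Word[I];
    if (isLowerAscii(C))
      HasLower = true;
    else if (isUpperAscii(C) && I > 0)
      HasInnerUpper = true; // a capital at index 0 may just start a sentence
    else if (C == '_' && I > 0 && I + 1 < Word.size())
      HasInnerUnderscore = true;
  }
  return HasInnerUnderscore || (HasLower && HasInnerUpper);
}
```

Under this rule a plain word like `lazily` or a capitalized word like `Format` is rejected, while `fooBar`, `FooBar` and `under_scores` are accepted. An all-caps initialism such as `HTTP` is rejected because it has no lowercase letters.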

Thanks for taking a look!

In D72874#1831977, @sammccall wrote:

it triggers on almost every word, even words that plainly don't refer to any decl like format [[lazily]], in case vlog is off. This means that e.g. (in VSCode) the underline on ctrl-hover gives no/misleading signal. It also means that missing your target now jumps you somewhere random instead of doing nothing.

Heh, I didn't realize VSCode had this feature. I do agree that it changes the tradeoffs a bit, as it means go-to-definition can be invoked in a context where there isn't an explicit signal from the user that they think there's a target there.

The other points you make are completely fair too. I will revise and take your suggestions into account.

I'll aim to start by factoring in enough of your suggestions to reduce the noise to an acceptable level for an initial landing, and leave some of the others for follow-up enhancements.

Address some review comments

I've addressed some of the review comments, with a view to getting something minimal we can land, and improve on in follow-up changes.

Mostly, I focused on the suggestions which reduce the number of results. I've left other suggestions which increase the number of results (e.g. handling non-indexed symbols) for follow-ups.

In D72874#1831977, @sammccall wrote:

only trigger when there's *some* positive signal for the word.

Markup like quotes/backticks/brackets/\p

weird case like lowerCamel, UpperCamel, CAPS, mid-sentence Capitalization, under_scores.

use of the word as a token in nearby code (very close if very short, anywhere in file if longer)

(maybe you want to support ns::Qualifiers?)

I currently handle lowerCamel, UpperCamel, CAPS, and under_scores. I've left the others as follow-ups.

post-filter aggressively - only return exact name matches (I think including case).

Done.

call fuzzyFind directly and set ProximityPath

Done.

as well as the enclosing scopes from lexing. For extra strictness consider AnyScope=false

I haven't done this yet, do you think it's important for an initial landing?

If so, could you mention what API you had in mind for determining "enclosing scopes from lexing"?

I had in mind using something like SelectionTree and collecting any RecordDecls or NamespaceDecls on the path from the common ancestor to the TU, but that's technically not "from lexing", so perhaps you have something else in mind.

if you get more than 3 results, and none from current file, maybe don't return anything, as confidence is too low. Or try a stricter query...

I implemented this, but my testing shows this causes a lot of results for class names to be excluded. The reason appears to be that fuzzyFind() returns the class and each of its constructors as distinct results, so if a class has more than two constructors, we'll have more than 3 results (and typically the class is declared in a different file).

Should we try to handle this case specifically (collapse a class name and its constructors to a single result), or should we reconsider this filtering criterion? It's not exactly clear to me what sort of bad behaviour it's intended to weed out.

handle the most common case of non-indexable symbols (local symbols) by running the query against the closest occurrence of the token in code.

I've left this as a follow-up.

Harbormaster completed remote builds in B47735: Diff 247529. Mar 1 2020, 5:26 PM

Thanks! The scope looks good to me now, on to implementation details.
I'm being a bit picky on the behavior because go-to-def is a heavily-used feature, many users won't be expecting what we're doing here, and we can't reasonably expect them to understand the failure modes.
So, let's try hard not to fail :-)

This reminds me: it's not completely obvious what set of "act on symbol under the cursor" things this should (eventually) apply to.
I think not having e.g. find-references work makes sense - user should navigate to a "real" occurrence to resolve the ambiguity, and things like code actions are right out.
However having textDocument/hover work when we have high confidence in results would be really cool.
Obviously nothing in scope for this patch, but it seems worth writing this down somewhere, precisely because we shouldn't do it now.

In D72874#1900149, @nridge wrote:

I currently handle lowerCamel, UpperCamel, CAPS, and under_scores. I've left the others as follow-ups.

(sorry for shifting goalposts, I think CAPS may be too broad. Left a comment inline)

if you get more than 3 results, and none from current file, maybe don't return anything, as confidence is too low. Or try a stricter query...

I implemented this, but my testing shows this causes a lot of results for class names to be excluded. The reason appears to be that fuzzyFind() returns the class and each of its constructors as distinct results, so if a class has more than two constructors, we'll have more than 3 results (and typically the class is declared in a different file).

I think we should just drop constructor results, they'll always have this problem.
(There are other cases but this is the biggest).
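The filtering rules discussed so far (exact, case-sensitive name match; drop constructors; give up when more than 3 results remain) can be sketched as below. This is a standalone illustration under stated assumptions: `Candidate` is a hypothetical stand-in for an index result, not clangd's `Symbol`, and `filterCandidates` is a made-up name.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for an index result; not clangd's Symbol type.
struct Candidate {
  std::string Name;
  bool IsConstructor = false;
};

// Keeps exact, case-sensitive name matches that are not constructors.
// Returns std::nullopt when more than MaxResults candidates survive,
// signalling that confidence is too low to show anything.
std::optional<std::vector<Candidate>>
filterCandidates(std::string_view Word, const std::vector<Candidate> &Results,
                 std::size_t MaxResults = 3) {
  std::vector<Candidate> Kept;
  for (const Candidate &C : Results) {
    if (C.IsConstructor || C.Name != Word)
      continue;
    if (Kept.size() == MaxResults)
      return std::nullopt; // one more match would exceed the limit
    Kept.push_back(C);
  }
  return Kept;
}
```

With constructors dropped before counting, a class with several constructors contributes a single result, so it no longer trips the limit on its own.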

handle the most common case of non-indexable symbols (local symbols) by running the query against the closest occurrence of the token in code.

I've left this as a follow-up.

Makes sense. I think there's not a lot of new complexity here, we have the major pieces (getWordAtPosition, TokenBuffer, SelectionTree, targetDecl, index) but integration is definitely substantial.

I'd suggest we go down that path before adding complexity for the index-based path though, because I suspect it's going to handle many of the practical situations where the index-based approach needs a lot of help (and vice-versa).

clang-tools-extra/clangd/SourceCode.cpp
313	@kadircet is working on getting rid of this function because creating raw lexers is wasteful and not actually very powerful. Mostly we're moving to syntax::TokenBuffer, which records actual lexed tokens, but that doesn't apply here. The examples in the tests seem like they'd be covered by something really simple, like enclosing identifier chars:

unsigned Begin, End;
for (Begin = Offset; Begin > 0 && isIdentifierBody(Code[Begin-1]); --Begin) {}
for (End = Offset; End < Code.size() && isIdentifierBody(Code[End]); ++End) {}
return Code.slice(Begin, End);

(Lexer::isIdentifierBodyChar requires langopts but just passes through DollarIdents to isIdentifierBody, and I don't think we care much about identifiers with $ in them.) If we really want to do something more subtle here, we should check it in SourceCodeTests.
clang-tools-extra/clangd/SourceCode.h
93	consider moving the isLikelyToBeIdentifier check inside. The current API is pretty general and it's not clear yet what (else) it's good for so it's nice to direct towards intended usage. Also doing the identifier check inside this function is more convenient when it relies on markers outside the identifier range (like doxygen `\p` or backtick-quoted identifiers) That said, you may still want to return the range when it's not a likely identifier, with a signature like `StringRef getWordAtPosition(bool *LikelyIdentifier = nullptr)`. I'm thinking of the future case where the caller wants to find a nearby matching token and resolve it - resolving belongs in the caller so there's not much point having this function duplicate the check.
93	This doesn't use the SourceManager-structure of the file, so the natural signature would be `StringRef getWordAtPosition(StringRef Code, unsigned Offset)`. (what are the practical cases where langopts is relevant?)
clang-tools-extra/clangd/XRefs.cpp
191	nit: mention snake_case, MACRO_CASE?
195	nit: can you mention this catches lowerCamel and UpperCamel
196	nit: prefer llvm::isUppercase to avoid locales
196	this will fire for initialisms like `HTTP`. I think we want to require both upper and lowercase letters.
215	I think this is dead - we're just sorting by score.
219	this function should have a high-level comment describing the strategy and the limitations (e.g. idea of extending it to resolve nearby matching tokens). A name like `locateSymbolNamedTextuallyAt` would better describe what this does, rather than what its caller does. I would strongly consider exposing this function publicly for the detailed tests, and only smoke-testing it through `locateSymbolAt`. Having to break the AST in tests or otherwise rely on the "primary" logic not working is brittle and hard to verify.
233	FWIW the API for this is visibleNamespaces() from SourceCode.cpp. (No enclosing classes, but I suspect we can live without them once we have a nearby-tokens solution too)
236	If we're bailing out on >3, I think this limit should be aiming to detect when there's >3, and avoid fetching way too much data, but not trying to avoid noise. (I'd suggest 10 or so)
241	This seems dead, you're requiring exact matches, these will always have the same score.
243	This is an interesting signal, I think there are two sensible ways to go about it:
- assume results in this file are more likely accurate than those in other files. In this case we should at minimum be using this in ranking, but really we should just drop all cross-file results if we have an in-file one.
- don't rely on index for main-file cases, and rely on "find nearby matching token and resolve it" instead. That can easily handle cases defined/referenced in the main-file with sufficient accuracy, including non-indexed symbols. So here we can assume this signal is always false, and drop it.
244	BTW I think the answer for constructors is just to drop all constructor results here. (This also affects template specializations which I think we can not worry about, and virtual method hierarchies which are more painful but I also wouldn't try to fix now)
245	I'm not sure why we're using SymbolToLocation here:
- Main file URI check: the `Symbol` has URIs. They need to be canonicalized to file URIs before comparison. This allows checking both decl and def location.
- PreferredDeclaration and Definition can be more easily set directly from the `Symbol`
251	I wouldn't bother qualifying this as "for now". Any code is subject to change in the future, but requiring an exact name match for index-based results seems more like a design decision than a fixme.
277	I don't think this should be logged, particularly by default - it doesn't really indicate anything other than we should have a "look up symbol by name" API (ok, actually I think this is just dead code because we've already checked name above)
clang-tools-extra/clangd/unittests/XRefsTests.cpp
588	`#ifdef`'d out code is another interesting motivation worth testing.
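For readers who want to try the "enclosing identifier chars" idea from the SourceCode.cpp comment outside of clangd, here is a self-contained variant. It is a sketch, not the patch: `wordAt` and `isIdentChar` are hypothetical names, `std::string_view` stands in for `llvm::StringRef`, and the character test is a hand-written ASCII check that deliberately ignores `$`.

```cpp
#include <cstddef>
#include <string_view>

// ASCII identifier characters; '$' is deliberately not accepted.
inline bool isIdentChar(char C) {
  return (C >= 'a' && C <= 'z') || (C >= 'A' && C <= 'Z') ||
         (C >= '0' && C <= '9') || C == '_';
}

// Returns the run of identifier characters touching Offset (on either side),
// or an empty view if there is none or Offset is out of range.
std::string_view wordAt(std::string_view Code, std::size_t Offset) {
  if (Offset > Code.size())
    return {};
  std::size_t Begin = Offset, End = Offset;
  while (Begin > 0 && isIdentChar(Code[Begin - 1]))
    --Begin;
  while (End < Code.size() && isIdentChar(Code[End]))
    ++End;
  return Code.substr(Begin, End - Begin);
}
```

Because it only looks at characters, this works the same way inside comments, string literals and disabled preprocessor regions, which is the point of the textual fallback.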

I'm playing with a prototype of the token-based approach, a couple of follow-ups from that.

I've split out functions to handle file/macro/ast from locateSymbolAt in e7de00cf974a4e30d4900518ae8473a117efbd6c - hopefully an easy merge, you're adding another one.

I think having this trigger where the identifier is an actual token in the program is a surprisingly invasive change and runs a strong risk of confusing users (who can't distinguish these textual heuristics from normal go-to-def behaviour, and rely on its accuracy), and we shouldn't do it without a lot more testing.
I think the way to implement this is to call getMacroArgExpandedLocation on the start of the "token" we found, and feed the result into TokenBuffer::expandedTokens(SourceRange). If we get an empty list back, then the parser didn't see this token and we're good to proceed without any overlap with the strict AST-based options.
With that check, comments, strings, and #ifdef'd sections should work fine, but not dependent or broken code. (Many cases of broken code can be fixed using RecoveryExpr which is finally going to land)

sammccall added inline comments. Mar 2 2020, 10:10 AM

clang-tools-extra/clangd/SourceCode.cpp
313	Mostly we're moving to syntax::TokenBuffer, which records actual lexed tokens, but that doesn't apply here. Oops, this isn't true - token buffer's expanded token stream has "real" tokens, but the spelled token streams use the raw lexer. You can just use spelledIdentifierTouching(), I think.

sammccall added inline comments. Mar 2 2020, 10:12 AM

clang-tools-extra/clangd/SourceCode.cpp
313	You can just use spelledIdentifierTouching(), I think. Sorry disregard this, obviously it doesn't work in comments etc. Need more coffee...

Thanks for all the comments Sam! I'll have a detailed look tomorrow, but I wanted to follow up on this:

In D72874#1901383, @sammccall wrote:

I think having this trigger where the identifier is an actual token in the program is a surprisingly invasive change and runs a strong risk of confusing users (who can't distinguish these textual heuristics from normal go-to-def behaviour, and rely on its accuracy), and we shouldn't do it without a lot more testing.

The "dependent code" use case is a pretty important one in my eyes.

In one of the codebases I work on, we have a fair amount of code like this:

template <typename T>
void foo(T t) {
   // ...
   t.someUniqueMethodName();
   // ...
   t.someOtherUniqueMethodName();
   // ...
}

The code is in practice only instantiated with a handful of types for T (often just two). (But we don't have a way to express this in the code at this time.) Being able to invoke go-to-definition at e.g. someUniqueMethodName and get the definition sites of the corresponding handful of methods, as opposed to nothing at all, is something I'd really like to get working. I'm open to suggestions to how we can test this better, or scope the behaviour more narrowly to avoid other unintended results for real tokens.
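A compilable version of that scenario, for illustration only: `Cat` and `Dog` are hypothetical types standing in for the "handful of types" mentioned above, and the method bodies are invented so the example can run.

```cpp
#include <string>

// Hypothetical types; the names are for illustration only.
struct Cat {
  std::string someUniqueMethodName() const { return "Cat"; }
};
struct Dog {
  std::string someUniqueMethodName() const { return "Dog"; }
};

template <typename T>
std::string foo(const T &t) {
  // `t.someUniqueMethodName` is a dependent member access: inside the
  // template definition, the compiler cannot tell which declaration it
  // names, so AST-based go-to-definition has nothing to resolve here.
  return t.someUniqueMethodName();
}
```

A name-based lookup for `someUniqueMethodName` would find both `Cat::someUniqueMethodName` and `Dog::someUniqueMethodName`, which are the definition sites one would hope to be offered.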

In D72874#1901606, @nridge wrote:

The "dependent code" use case is a pretty important one in my eyes.

In one of the codebases I work on, we have a fair amount of code like this:

Yep, fair enough. And I don't think that this is so bad for say DependentDeclRefExpr, where we're already doing heuristic stuff (and the user can reasonably understand that we might).
I'm more concerned that it might trigger at arbitrary times, like say on [[^noreturn]] void abort();.

But we can distinguish these cases! SelectionTree recognizes DependentDeclRefExpr and friends even if targetDecl can't resolve them. So I think we can use a whitelist: the AST part of locateSymbol reports the type of node that owned TouchedIdentifier, and if it's one of the types we want to use textual fallback for, then we go ahead with the fallback code (in addition to the cases I mentioned where the touched word doesn't turn out to be a real identifier).
We can even try to glean more info, e.g. if it's a CXXDependentScopeMemberExpr then we can filter out non-member index results.
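The allowlist idea can be sketched as follows. Everything here is hypothetical: the `NodeKind` enum only borrows names from the discussion and is not clang's AST API, and `IndexResult`, `allowsTextualFallback` and `refine` are invented for the example.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical node kinds, loosely named after the AST nodes in the
// discussion. None means the touched word was not a real token at all.
enum class NodeKind {
  None,
  DeclRefExpr,
  DependentDeclRefExpr,
  CXXDependentScopeMemberExpr,
  Attr
};

// Hypothetical index result.
struct IndexResult {
  std::string Name;
  bool IsMember;
};

// Textual fallback is allowed only for non-tokens and allowlisted
// dependent nodes; anything else keeps the strict AST-based answer.
bool allowsTextualFallback(NodeKind K) {
  switch (K) {
  case NodeKind::None:
  case NodeKind::DependentDeclRefExpr:
  case NodeKind::CXXDependentScopeMemberExpr:
    return true;
  case NodeKind::DeclRefExpr:
  case NodeKind::Attr:
    return false;
  }
  return false;
}

// For a dependent member access, keep only member results.
std::vector<IndexResult> refine(NodeKind K, std::vector<IndexResult> Results) {
  if (K != NodeKind::CXXDependentScopeMemberExpr)
    return Results;
  std::vector<IndexResult> Members;
  for (IndexResult &R : Results)
    if (R.IsMember)
      Members.push_back(std::move(R));
  return Members;
}
```

In this sketch a touched attribute such as `noreturn` maps to `Attr` and never reaches the fallback, while a word in a comment maps to `None` and does.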

(Tactically I think it makes sense to add the basic fallback logic, and follow up with the dependent-code entrypoints, but up to you)

sammccall mentioned this in D75479: [clangd] go-to-def on names in comments etc that are used nearby. Mar 2 2020, 1:45 PM

I'm posting some partial responses because I have some questions (really just one, about fuzzy matching).

In general the comments seem reasonable and I plan to address all of them.

(I've marked some comments as done because I've addressed them locally. I'm not uploading a revised patch yet because it wouldn't be very interesting.)

clang-tools-extra/clangd/XRefs.cpp
219	I would strongly consider exposing this function publicly for the detailed tests, and only smoke-testing it through locateSymbolAt. Having to break the AST in tests or otherwise rely on the "primary" logic not working is brittle and hard to verify. I was going to push back against this, but I ended up convincing myself that your suggestion is better :) For the record, the consideration that convinced me was: Suppose in the future we add fancier AST-based logic that handles a case like `T().foo()` (for example, by surveying types for which `T` is actually substituted, and offering `foo()` inside those types). If all we're testing is "navigation works for this case" rather than "navigation works for this case via the AST-based mechanism", we could regress the AST logic but have our test still pass because the testcase is simple enough that the text-based navigation fallback (that we're adding here) works as well.
245	Well the `Symbol` has `SymbolLocation`s and we need protocol `Location`s, so we have to use something to convert them. Other places that perform such conversion use `symbolToLocation()`, so I reused it. But you're right that `symbolToLocation()` also has some "pick the definition or the declaration" logic which is less appropriate here. I can factor out the `SymbolLocation` --> `Location` conversion logic from `symbolToLocation()`, and just use that here.
251	Do we want to rule out the possibility of handling typos in an identifier name in a comment (in cases where we have high confidence in the match, e.g. a long / unique name, small edit distance, only one potential match) in the future? This is also relevant to whether we want to keep the `FuzzyMatcher` or not.

Rebase onto D75479 and address most review comments

Comments remaining to be addressed:

revising the tests to exercise locateSymbolNamedTextuallyAt() directly
comments related to fuzzy matching (I have an outstanding question about that)

Handling of dependent code has been deferred to a follow-up change

In D72874#1900648, @sammccall wrote:

This reminds me: it's not completely obvious what set of "act on symbol under the cursor" things this should (eventually) apply to.
I think not having e.g. find-references work makes sense - user should navigate to a "real" occurrence to resolve the ambiguity, and things like code actions are right out.
However having textDocument/hover work when we have high confidence in results would be really cool.
Obviously nothing in scope for this patch, but it seems worth writing this down somewhere, precisely because we shouldn't do it now.

Agreed. Filed https://github.com/clangd/clangd/issues/303 for hover.

In D72874#1901722, @sammccall wrote:

(Tactically I think it makes sense to add the basic fallback logic, and follow up with the dependent-code entrypoints, but up to you)

Yep, will handle dependent code in a follow-up patch.

clang-tools-extra/clangd/SourceCode.h
93	Now that I'm using `wordTouching()` from D75479, I think this comment no longer applies?
clang-tools-extra/clangd/XRefs.cpp
219	Renamed and comment added. I still need to revise the tests.
233	Thanks, that's convenient! Out of curiosity, though: is the reason to prefer this lexer-based approach over hit-testing the query location against `NamespaceDecl`s in the AST, mainly for performance?
235	It occurred to me that I don't think we can do `AnyScope=false` if we want to handle dependent member cases like `T().uniqueMethodName()`. The members we want to find in such a case will often be both in a different file (so nearby-tokens won't handle them) and not in any visible scope.
243	Since you've implemented "find nearby matching token and resolve it", I went with the second approach.

nridge marked an inline comment as done. Mar 5 2020, 4:53 PM

nridge added inline comments.

clang-tools-extra/clangd/XRefs.cpp
425	Sorry this location-setting code is so messy. All my attempts to make it more concise have been thwarted by `llvm::Expected`'s very restrictive API.

Harbormaster completed remote builds in B48285: Diff 248630. Mar 5 2020, 4:59 PM

I'd like to sync up briefly on https://github.com/clangd/clangd/issues/241 so we know where we want to end up.

I think this is in good shape and certainly doesn't need a bigger scope, just want to be able to reason about how things will fit together.

clang-tools-extra/clangd/SourceCode.h
93	I think the reasons still apply - D75479 doesn't need to check likelihood (it considers actual use as identifier evidence enough) so I didn't include it there, but we should eventually merge these more thoroughly I think. No need to do that until we actually want to implement different heuristics though.
clang-tools-extra/clangd/XRefs.cpp
233	Well, it was written for fallback code completion when we have no AST at all :-) Gathering from the AST should be better, though it's not quite as simple as hit-testing (you also have to find `using namespace`). But this exists today, which is a feature!
251	No idea whether typo-correction is a good idea in principle - tradeoff between current false negatives and false positives+compute. However neither FuzzyMatcher nor the existing index implementations support/can easily support real typo correction, and it seems implausible to me we'd add it for this feature. Compare to e.g:
- allowing case-insensitive match in some cases: `fooBar` vs `FooBar` is a plausible "typo". This is easy to implement.
- correct the typo where we see the fixed version used as an identifier in this file (and not the original). Excludes some cases, but drives false-positives way down, and easy to implement.
I don't think we need to rule things out, but I'm uncertain enough about the approach to think that putting comments, fuzzymatcher etc here speculatively isn't worth it.
425	Ugh, don't get me started on Error/Expected :-( I'd love to get rid of it somehow, but it seems like we'd inevitably just end up with the new thing + Error/Expected + error_code/ErrorOr + return-a-bool, and I'm not sure it'd be better. (If you have more energy than me, I'd enthusiastically +1 an llvm-dev proposal to drop the clever checks from llvm::Error, and I know some others who would...)

nridge marked an inline comment as done. Mar 9 2020, 8:00 AM

nridge added inline comments.

clang-tools-extra/clangd/XRefs.cpp
251	Perhaps I'm unclear on the distinction between fuzzy matching and typo correction. Are they not both a matter of comparing a candidate string against a test string, and considering it a match if they are "close enough" according to some metric (with the metric potentially being a simple edit distance in the case of typo correction)?

I've started to update the patch to be in line with the direction discussed in the issue.

@sammccall, how would you like to proceed logistically:

Do you plan to land (a possibly modified version of) D75479?
Or should I combine that patch into this one?

In D72874#1915840, @nridge wrote:

I've started to update the patch to be in line with the direction discussed in the issue.

@sammccall, how would you like to proceed logistically:

Do you plan to land (a possibly modified version of) D75479?

Or should I combine that patch into this one?

This patch looks good, I wouldn't bother redesigning anything, we should iterate instead.

You should go ahead, and I'll merge, and then we should work towards enabling dependent code use cases etc. SG?

clang-tools-extra/clangd/FindSymbols.h
25 ↗	(On Diff #248630)	nit: these names are vague and echo the type signature, maybe indexToLSPLocation?
26 ↗	(On Diff #248630)	nit: HintPath should be TUPath, the decision to use some other path as a TU path can only be made in the caller (needs context). (Same is true for symbolToLocation, I'm not sure when that became public)
clang-tools-extra/clangd/XRefs.cpp
422	(The fuzzy matcher and topN are still here - I think we don't need them, right? With only up-to-3 results, std::sort seems more obvious)
424	maybe bail out early (on unusable/too many) instead of doing all the score computations first?

fuzzyFind(..., {
  // bail out if it's a constructor or name doesn't match
  if (Results.size() >= 3) {
    TooMany = true;
    return;
  }
  // add result
});

This revision is now accepted and ready to land. Mar 11 2020, 8:44 AM

Remove fuzzy matching
Rebase to apply to head, taking only the parts from D75479 that I need for index-based lookup (such as wordTouching())
Revise tests so they exercise locateSymbolNamedTextuallyAt() directly, except for one smoke test
Add tests that verify that we do not trigger on dependent or broken code

Herald added a subscriber: mgrang. Mar 12 2020, 12:12 PM

nridge marked 2 inline comments as done. Mar 12 2020, 12:16 PM

nridge added inline comments.

clang-tools-extra/clangd/XRefs.cpp
471	Oh whoops, this assumption is another dependency on `findNearbyIdentifier()`

nridge edited the summary of this revision. Mar 12 2020, 12:23 PM

Tweak a comment

nridge marked an inline comment as done. Mar 12 2020, 12:58 PM

nridge added inline comments.

clang-tools-extra/clangd/XRefs.cpp
471	For now, I just had it restrict to 3 results in general (even if they're in the same file). Once `findNearbyIdentifier()` lands, the behaviour will automatically become what we intended.

I should mention that in my local usage, I've found the restriction on no more than 3 results (even if they're not in the current file) to be somewhat limiting. For example, a comment can easily reference the name of a function which has more than 3 overloads.

But we can start by landing this, and consider relaxing the limit (either in general, or in specific cases such as the overload set case) in follow-ups.

(Also just to clarify: while I said on Discord that I already implemented exclusion of string literals, I actually ended up deferring that part to a follow-up because it wasn't working as I expected.)

Harbormaster failed remote builds in B49037: Diff 250018! Mar 12 2020, 1:34 PM

Closed by commit rGdc4cd43904df: [clangd] Add a textual fallback for go-to-definition (authored by sammccall, committed by nridge). Mar 12 2020, 1:34 PM

This revision was automatically updated to reflect the committed changes.

Harbormaster failed remote builds in B49045: Diff 250031! Mar 12 2020, 2:07 PM

sammccall mentioned this in rG3f1c2bf1712c: [clangd] go-to-def on names in comments etc that are used nearby. Apr 22 2020, 10:53 AM

Revision Contents

Path	Size
clang-tools-extra/clangd/SourceCode.h	9 lines
clang-tools-extra/clangd/SourceCode.cpp	230 lines
clang-tools-extra/clangd/XRefs.cpp	30 lines
clang-tools-extra/clangd/unittests/XRefsTests.cpp	66 lines

Diff 238599

clang-tools-extra/clangd/SourceCode.h

	/// Turn an offset in Code into a [line, column] pair.			/// Turn an offset in Code into a [line, column] pair.
	/// The offset must be in range [0, Code.size()].			/// The offset must be in range [0, Code.size()].
	Position offsetToPosition(llvm::StringRef Code, size_t Offset);			Position offsetToPosition(llvm::StringRef Code, size_t Offset);

	/// Turn a SourceLocation into a [line, column] pair.			/// Turn a SourceLocation into a [line, column] pair.
	/// FIXME: This should return an error if the location is invalid.			/// FIXME: This should return an error if the location is invalid.
	Position sourceLocToPosition(const SourceManager &SM, SourceLocation Loc);			Position sourceLocToPosition(const SourceManager &SM, SourceLocation Loc);

	/// Returns the taken range at \p TokLoc.			/// Returns the token range at \p TokLoc.
	llvm::Optional<Range> getTokenRange(const SourceManager &SM,			llvm::Optional<Range> getTokenRange(const SourceManager &SM,
	const LangOptions &LangOpts,			const LangOptions &LangOpts,
	SourceLocation TokLoc);			SourceLocation TokLoc);

	/// Return the file location, corresponding to \p P. Note that one should take			/// Return the file location, corresponding to \p P. Note that one should take
	/// care to avoid comparing the result with expansion locations.			/// care to avoid comparing the result with expansion locations.
	llvm::Expected<SourceLocation> sourceLocationInMainFile(const SourceManager &SM,			llvm::Expected<SourceLocation> sourceLocationInMainFile(const SourceManager &SM,
	Position P);			Position P);

	/// Get the beginning SourceLocation at a specified \p Pos in the main file.			/// Get the beginning SourceLocation at a specified \p Pos in the main file.
	/// May be invalid if Pos is, or if there's no identifier or operators.			/// May be invalid if Pos is, or if there's no identifier or operators.
	/// The returned position is in the main file, callers may prefer to			/// The returned position is in the main file, callers may prefer to
	/// obtain the macro expansion location.			/// obtain the macro expansion location.
	SourceLocation getBeginningOfIdentifier(const Position &Pos,			SourceLocation getBeginningOfIdentifier(const Position &Pos,
	const SourceManager &SM,			const SourceManager &SM,
	const LangOptions &LangOpts);			const LangOptions &LangOpts);

				/// Get the source range of the raw word at a specified \p Pos in the main file.
				/// This is similar to the token at the specified position, but for positions
				/// inside comments and strings, it only returns a single word rather than
				/// the entire comment or string token.
				SourceRange getWordAtPosition(const Position &Pos, const SourceManager &SM,
				sammccall (Not Done): Consider moving the isLikelyToBeIdentifier check inside. The current API is pretty general and it's not clear yet what (else) it's good for, so it's nice to direct towards intended usage. Also, doing the identifier check inside this function is more convenient when it relies on markers outside the identifier range (like doxygen `\p` or backtick-quoted identifiers). That said, you may still want to return the range when it's not a likely identifier, with a signature like `StringRef getWordAtPosition(bool *LikelyIdentifier = nullptr)`. I'm thinking of the future case where the caller wants to find a nearby matching token and resolve it - resolving belongs in the caller so there's not much point having this function duplicate the check.
				nridge (Author, Done): Now that I'm using `wordTouching()` from D75479, I think this comment no longer applies?
				sammccall (Not Done): I think the reasons still apply - D75479 doesn't need to check likelihood (it considers actual use as identifier evidence enough) so I didn't include it there, but we should eventually merge these more thoroughly I think. No need to do that until we actually want to implement different heuristics though.
				sammccall (Done): This doesn't use the SourceManager-structure of the file, so the natural signature would be `StringRef getWordAtPosition(StringRef Code, unsigned Offset)`. (What are the practical cases where langopts is relevant?)
				const LangOptions &LangOpts);

	/// Returns true iff \p Loc is inside the main file. This function handles			/// Returns true iff \p Loc is inside the main file. This function handles
	/// file & macro locations. For macro locations, returns iff the macro is being			/// file & macro locations. For macro locations, returns iff the macro is being
	/// expanded inside the main file.			/// expanded inside the main file.
	///			///
	/// The function is usually used to check whether a declaration is inside the			/// The function is usually used to check whether a declaration is inside the
	/// the main file.			/// the main file.
	bool isInsideMainFile(SourceLocation Loc, const SourceManager &SM);			bool isInsideMainFile(SourceLocation Loc, const SourceManager &SM);


clang-tools-extra/clangd/SourceCode.cpp

[...]	if (Tok.is(tok::TokenKind::NUM_TOKENS))
return Whitespace;		return Whitespace;
if (Tok.is(tok::TokenKind::raw_identifier))		if (Tok.is(tok::TokenKind::raw_identifier))
return Identifier;		return Identifier;
if (isOverloadedOperator(Tok))		if (isOverloadedOperator(Tok))
return Operator;		return Operator;
return Other;		return Other;
}		}

		SourceLocation getRawWordBegin(SourceLocation Loc, const SourceManager &SM,
		const LangOptions &LangOpts) {
		llvm::StringRef Buf = SM.getBufferData(SM.getMainFileID());
		FileID FID;
		unsigned Offset;
		std::tie(FID, Offset) = SM.getDecomposedLoc(Loc);
		unsigned Start = Offset;
		while (Start > 0 && Lexer::isIdentifierBodyChar(Buf[Start - 1], LangOpts)) {
		--Start;
		}
		return SM.getComposedLoc(FID, Start);
		}

		SourceLocation getRawWordEnd(SourceLocation Loc, const SourceManager &SM,
		const LangOptions &LangOpts) {
		llvm::StringRef Buf = SM.getBufferData(SM.getMainFileID());
		FileID FID;
		unsigned Offset;
		std::tie(FID, Offset) = SM.getDecomposedLoc(Loc);
		unsigned End = Offset;
		while (End < Buf.size() && Lexer::isIdentifierBodyChar(Buf[End], LangOpts))
		++End;
		return SM.getComposedLoc(FID, End);
		}

		SourceLocation getEndOfIdentifier(SourceLocation Loc, const SourceManager &SM,
		const LangOptions &LangOpts) {
		if (!Loc.isValid())
		return SourceLocation{};
		bool Raw = getTokenFlavor(Loc, SM, LangOpts) == Other;
		if (Raw) {
		return getRawWordEnd(Loc, SM, LangOpts);
		}
		return Lexer::getLocForEndOfToken(Loc, 0, SM, LangOpts);
		}

} // namespace		} // namespace

SourceLocation getBeginningOfIdentifier(const Position &Pos,		SourceLocation getBeginningOfIdentifier(const Position &Pos,
		sammccall (Done): @kadircet is working on getting rid of this function because creating raw lexers is wasteful and not actually very powerful. Mostly we're moving to syntax::TokenBuffer, which records actual lexed tokens, but that doesn't apply here. The examples in the tests seem like they'd be covered by something really simple, like enclosing identifier chars:
		  unsigned Begin, End;
		  for (Begin = Offset; Begin > 0 && isIdentifierBody(Code[Begin-1]); --Begin) {}
		  for (End = Offset; End < Code.size() && isIdentifierBody(Code[End]); ++End) {}
		  return Code.slice(Begin, End);
		(Lexer::isIdentifierBodyChar requires langopts but just passes through DollarIdents to isIdentifierBody, and I don't think we care much about identifiers with $ in them.) If we really want to do something more subtle here, we should check it in SourceCodeTests.
		sammccall (Done):
		> Mostly we're moving to syntax::TokenBuffer, which records actual lexed tokens, but that doesn't apply here.
		Oops, this isn't true - token buffer's expanded token stream has "real" tokens, but the spelled token streams use the raw lexer. You can just use spelledIdentifierTouching(), I think.
		sammccall (Done):
		> You can just use spelledIdentifierTouching(), I think.
		Sorry, disregard this, obviously it doesn't work in comments etc. Need more coffee...
const SourceManager &SM,		const SourceManager &SM,
const LangOptions &LangOpts) {		const LangOptions &LangOpts) {
FileID FID = SM.getMainFileID();		FileID FID = SM.getMainFileID();
auto Offset = positionToOffset(SM.getBufferData(FID), Pos);		auto Offset = positionToOffset(SM.getBufferData(FID), Pos);
if (!Offset) {		if (!Offset) {
log("getBeginningOfIdentifier: {0}", Offset.takeError());		log("getBeginningOfIdentifier: {0}", Offset.takeError());
return SourceLocation();		return SourceLocation();
}		}
[...]	SourceLocation CurrentTokBeginning =
Lexer::GetBeginningOfToken(InputLoc, SM, LangOpts);		Lexer::GetBeginningOfToken(InputLoc, SM, LangOpts);
TokenFlavor CurrentKind = getTokenFlavor(CurrentTokBeginning, SM, LangOpts);		TokenFlavor CurrentKind = getTokenFlavor(CurrentTokBeginning, SM, LangOpts);

// At the middle of the token.		// At the middle of the token.
if (BeforeTokBeginning == CurrentTokBeginning) {		if (BeforeTokBeginning == CurrentTokBeginning) {
// For interesting token, we return the beginning of the token.		// For interesting token, we return the beginning of the token.
if (CurrentKind == Identifier || CurrentKind == Operator)		if (CurrentKind == Identifier || CurrentKind == Operator)
return CurrentTokBeginning;		return CurrentTokBeginning;
// otherwise, we return the original loc.		// Otherwise, return the beginning of the raw word at the given position.
return InputLoc;		// This facilitates selecting identifiers in comments and strings.
		return getRawWordBegin(InputLoc, SM, LangOpts);
}		}

// Whitespace is not interesting.		// Whitespace is not interesting.
if (BeforeKind == Whitespace)		if (BeforeKind == Whitespace)
return CurrentTokBeginning;		return CurrentTokBeginning;
if (CurrentKind == Whitespace)		if (CurrentKind == Whitespace)
return BeforeTokBeginning;		return BeforeTokBeginning;

// The cursor is at the token boundary, e.g. "Before^Current", we prefer		// The cursor is at the token boundary, e.g. "Before^Current", we prefer
// identifiers to other tokens.		// identifiers to other tokens.
if (CurrentKind == Identifier)		if (CurrentKind == Identifier)
return CurrentTokBeginning;		return CurrentTokBeginning;
if (BeforeKind == Identifier)		if (BeforeKind == Identifier)
return BeforeTokBeginning;		return BeforeTokBeginning;
// Then prefer overloaded operators to other tokens.		// Then prefer overloaded operators to other tokens.
if (CurrentKind == Operator)		if (CurrentKind == Operator)
return CurrentTokBeginning;		return CurrentTokBeginning;
if (BeforeKind == Operator)		if (BeforeKind == Operator)
return BeforeTokBeginning;		return BeforeTokBeginning;

// Non-interesting case, we just return the original location.		// Non-interesting case, we just return the original location.
return InputLoc;		return InputLoc;
}		}

		SourceRange getWordAtPosition(const Position &Pos, const SourceManager &SM,
		const LangOptions &LangOpts) {
		SourceLocation Begin = getBeginningOfIdentifier(Pos, SM, LangOpts);
		SourceLocation End = getEndOfIdentifier(Begin, SM, LangOpts);
		return {Begin, End};
		}

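For reference, the simpler text-only word lookup suggested in the review comments (operating on the buffer contents and an offset, with no SourceManager or LangOptions) can be sketched as below. This is a standalone illustration, not the code that was committed: `wordAt` and `isIdentChar` are hypothetical names, `std::string_view` stands in for `llvm::StringRef`, and `isIdentChar` is an assumed ASCII-only substitute for clang's `isIdentifierBody` that ignores `$`.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical stand-in for clang's isIdentifierBody(): ASCII letters,
// digits and underscore only ('$' is deliberately not accepted).
static bool isIdentChar(char C) {
  return (C >= 'a' && C <= 'z') || (C >= 'A' && C <= 'Z') ||
         (C >= '0' && C <= '9') || C == '_';
}

// Returns the run of identifier characters touching Offset, or an empty
// view if there is none. Offsets past the end are clamped to the end.
std::string_view wordAt(std::string_view Code, std::size_t Offset) {
  if (Offset > Code.size())
    Offset = Code.size();
  std::size_t Begin = Offset, End = Offset;
  // Walk left while the previous character is part of an identifier.
  while (Begin > 0 && isIdentChar(Code[Begin - 1]))
    --Begin;
  // Walk right while the current character is part of an identifier.
  while (End < Code.size() && isIdentChar(Code[End]))
    ++End;
  return Code.substr(Begin, End - Begin);
}
```

Because the scan only looks at characters, it behaves the same inside comments, string literals and preprocessor-disabled regions, which is the case the raw-lexer path handles poorly.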
bool isValidFileRange(const SourceManager &Mgr, SourceRange R) {		bool isValidFileRange(const SourceManager &Mgr, SourceRange R) {
if (!R.getBegin().isValid() || !R.getEnd().isValid())		if (!R.getBegin().isValid() || !R.getEnd().isValid())
return false;		return false;

FileID BeginFID;		FileID BeginFID;
size_t BeginOffset = 0;		size_t BeginOffset = 0;
std::tie(BeginFID, BeginOffset) = Mgr.getDecomposedLoc(R.getBegin());		std::tie(BeginFID, BeginOffset) = Mgr.getDecomposedLoc(R.getBegin());

[...]	enum {
Using, // just saw 'using'		Using, // just saw 'using'
UsingNamespace, // just saw 'using namespace'		UsingNamespace, // just saw 'using namespace'
UsingNamespaceName, // just saw 'using namespace' NSName		UsingNamespaceName, // just saw 'using namespace' NSName
} State = Default;		} State = Default;
std::string NSName;		std::string NSName;

NamespaceEvent Event;		NamespaceEvent Event;
lex(Code, format::getFormattingLangOpts(Style),		lex(Code, format::getFormattingLangOpts(Style),
[&](const clang::Token &Tok,const SourceManager &SM) {		[&](const clang::Token &Tok, const SourceManager &SM) {
Event.Pos = sourceLocToPosition(SM, Tok.getLocation());		Event.Pos = sourceLocToPosition(SM, Tok.getLocation());
switch (Tok.getKind()) {		switch (Tok.getKind()) {
case tok::raw_identifier:		case tok::raw_identifier:
// In raw mode, this could be a keyword or a name.		// In raw mode, this could be a keyword or a name.
switch (State) {		switch (State) {
case UsingNamespace:		case UsingNamespace:
case UsingNamespaceName:		case UsingNamespaceName:
NSName.append(Tok.getRawIdentifier());		NSName.append(Tok.getRawIdentifier());
State = UsingNamespaceName;		State = UsingNamespaceName;
break;		break;
case Namespace:		case Namespace:
case NamespaceName:		case NamespaceName:
NSName.append(Tok.getRawIdentifier());		NSName.append(Tok.getRawIdentifier());
State = NamespaceName;		State = NamespaceName;
break;		break;
case Using:		case Using:
State =		State = (Tok.getRawIdentifier() == "namespace") ? UsingNamespace
(Tok.getRawIdentifier() == "namespace") ? UsingNamespace : Default;		: Default;
break;		break;
case Default:		case Default:
NSName.clear();		NSName.clear();
if (Tok.getRawIdentifier() == "namespace")		if (Tok.getRawIdentifier() == "namespace")
State = Namespace;		State = Namespace;
else if (Tok.getRawIdentifier() == "using")		else if (Tok.getRawIdentifier() == "using")
State = Using;		State = Using;
break;		break;
}		}
break;		break;
case tok::coloncolon:		case tok::coloncolon:
// This can come at the beginning or in the middle of a namespace name.		// This can come at the beginning or in the middle of a namespace
		// name.
switch (State) {		switch (State) {
case UsingNamespace:		case UsingNamespace:
case UsingNamespaceName:		case UsingNamespaceName:
NSName.append("::");		NSName.append("::");
State = UsingNamespaceName;		State = UsingNamespaceName;
break;		break;
case NamespaceName:		case NamespaceName:
NSName.append("::");		NSName.append("::");
State = NamespaceName;		State = NamespaceName;
break;		break;
case Namespace: // Not legal here.		case Namespace: // Not legal here.
case Using:		case Using:
case Default:		case Default:
State = Default;		State = Default;
break;		break;
}		}
break;		break;
case tok::l_brace:		case tok::l_brace:
// Record which { started a namespace, so we know when } ends one.		// Record which { started a namespace, so we know when } ends one.
if (State == NamespaceName) {		if (State == NamespaceName) {
// Parsed: namespace <name> {		// Parsed: namespace <name> {
BraceStack.push_back(true);		BraceStack.push_back(true);
Enclosing.push_back(NSName);		Enclosing.push_back(NSName);
Event.Trigger = NamespaceEvent::BeginNamespace;		Event.Trigger = NamespaceEvent::BeginNamespace;
Event.Payload = llvm::join(Enclosing, "::");		Event.Payload = llvm::join(Enclosing, "::");
Callback(Event);		Callback(Event);
} else {		} else {
// This case includes anonymous namespaces (State = Namespace).		// This case includes anonymous namespaces (State = Namespace).
// For our purposes, they're not namespaces and we ignore them.		// For our purposes, they're not namespaces and we ignore them.
BraceStack.push_back(false);		BraceStack.push_back(false);
}		}
State = Default;		State = Default;
break;		break;
case tok::r_brace:		case tok::r_brace:
// If braces are unmatched, we're going to be confused, but don't crash.		// If braces are unmatched, we're going to be confused, but don't
		// crash.
if (!BraceStack.empty()) {		if (!BraceStack.empty()) {
if (BraceStack.back()) {		if (BraceStack.back()) {
// Parsed: } // namespace		// Parsed: } // namespace
Enclosing.pop_back();		Enclosing.pop_back();
Event.Trigger = NamespaceEvent::EndNamespace;		Event.Trigger = NamespaceEvent::EndNamespace;
Event.Payload = llvm::join(Enclosing, "::");		Event.Payload = llvm::join(Enclosing, "::");
Callback(Event);		Callback(Event);
}		}
BraceStack.pop_back();		BraceStack.pop_back();
}		}
break;		break;
case tok::semi:		case tok::semi:
if (State == UsingNamespaceName) {		if (State == UsingNamespaceName) {
// Parsed: using namespace <name> ;		// Parsed: using namespace <name> ;
Event.Trigger = NamespaceEvent::UsingDirective;		Event.Trigger = NamespaceEvent::UsingDirective;
Event.Payload = std::move(NSName);		Event.Payload = std::move(NSName);
Callback(Event);		Callback(Event);
}		}
State = Default;		State = Default;
break;		break;
default:		default:
State = Default;		State = Default;
break;		break;
}		}
});		});
}		}

// Returns the prefix namespaces of NS: {"" ... NS}.		// Returns the prefix namespaces of NS: {"" ... NS}.
llvm::SmallVector<llvm::StringRef, 8> ancestorNamespaces(llvm::StringRef NS) {		llvm::SmallVector<llvm::StringRef, 8> ancestorNamespaces(llvm::StringRef NS) {
llvm::SmallVector<llvm::StringRef, 8> Results;		llvm::SmallVector<llvm::StringRef, 8> Results;
Results.push_back(NS.take_front(0));		Results.push_back(NS.take_front(0));
NS.split(Results, "::", /*MaxSplit=*/-1, /*KeepEmpty=*/false);		NS.split(Results, "::", /*MaxSplit=*/-1, /*KeepEmpty=*/false);
for (llvm::StringRef &R : Results)		for (llvm::StringRef &R : Results)

clang-tools-extra/clangd/XRefs.cpp

[...]	if (!Inc.Resolved.empty()) {
Result.push_back(DocumentLink(		Result.push_back(DocumentLink(
{Inc.R, URIForFile::canonicalize(Inc.Resolved, *MainFilePath)}));		{Inc.R, URIForFile::canonicalize(Inc.Resolved, *MainFilePath)}));
}		}
}		}

return Result;		return Result;
}		}

		std::vector<LocatedSymbol> navigationFallback(ParsedAST &AST,
		const SymbolIndex *Index,
		sammccall (Done): nit: mention snake_case, MACRO_CASE?
		Position Pos,
		const std::string &MainFilePath) {
		const auto &SM = AST.getSourceManager();
		auto SourceRange = getWordAtPosition(Pos, SM, AST.getLangOpts());
		sammccall (Done): nit: can you mention this catches lowerCamel and UpperCamel
		auto QueryString = toSourceCode(SM, SourceRange);
		sammccall (Done): nit: prefer llvm::isUppercase to avoid locales
		sammccall (Done): this will fire for initialisms like `HTTP`. I think we want to require both upper and lowercase letters.
		// Choose a limit that's large enough that it contains the user's desired
		// target even in the presence of some false positives, but small enough that
		// it doesn't generate too much noise.
		int Limit = 5;
		auto Symbols = getWorkspaceSymbols(QueryString, Limit, Index, MainFilePath);
		if (!Symbols) {
		elog("Workspace symbols failed", Symbols.takeError());
		nridge (Author, Done): (There should be a return here, will fix locally.)
		}
		std::vector<LocatedSymbol> Result;
		for (auto &Sym : *Symbols) {
		LocatedSymbol Located;
		Located.Name = Sym.name;
		Located.PreferredDeclaration = Sym.location;
		// TODO: Populate Definition?
		Result.push_back(std::move(Located));
		}
		return Result;
		}

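The naming-convention heuristic discussed in the inline comments (accept snake_case, MACRO_CASE, lowerCamel and UpperCamel words, but not plain prose words or initialisms like `HTTP`) can be sketched as follows. This is an illustrative assumption rather than the code that landed: `isLikelyIdentifier` is a hypothetical name, the character tests are hand-written ASCII checks standing in for the locale-independent `llvm::isUppercase` family, and ignoring a capital in the first position (so that sentence-initial words such as "The" are rejected) is a choice made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Locale-independent ASCII checks (stand-ins for the LLVM helpers).
static bool isLower(char C) { return C >= 'a' && C <= 'z'; }
static bool isUpper(char C) { return C >= 'A' && C <= 'Z'; }

// Returns true if Word looks like a code identifier rather than prose.
bool isLikelyIdentifier(std::string_view Word) {
  if (Word.empty())
    return false;
  // An underscore catches snake_case and MACRO_CASE.
  if (Word.find('_') != std::string_view::npos)
    return true;
  // Otherwise require a lowercase letter plus an uppercase letter that is
  // not the first character. This catches lowerCamel and UpperCamel while
  // rejecting initialisms (HTTP) and capitalized prose words (The).
  bool HasLower = false, HasInnerUpper = false;
  for (std::size_t I = 0; I < Word.size(); ++I) {
    if (isLower(Word[I]))
      HasLower = true;
    if (I > 0 && isUpper(Word[I]))
      HasInnerUpper = true;
  }
  return HasLower && HasInnerUpper;
}
```

A check of this kind trades recall for precision: single-word lowercase identifiers in comments are missed, but ordinary words no longer trigger an index lookup.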
		sammccall (Done): I think this is dead - we're just sorting by score.
std::vector<LocatedSymbol> locateSymbolAt(ParsedAST &AST, Position Pos,		std::vector<LocatedSymbol> locateSymbolAt(ParsedAST &AST, Position Pos,
const SymbolIndex *Index) {		const SymbolIndex *Index) {
const auto &SM = AST.getSourceManager();		const auto &SM = AST.getSourceManager();
auto MainFilePath =		auto MainFilePath =
		sammccall (Done): this function should have a high-level comment describing the strategy and the limitations (e.g. idea of extending it to resolve nearby matching tokens). A name like `locateSymbolNamedTextuallyAt` would better describe what this does, rather than what its caller does. I would strongly consider exposing this function publicly for the detailed tests, and only smoke-testing it through `locateSymbolAt`. Having to break the AST in tests or otherwise rely on the "primary" logic not working is brittle and hard to verify.
		nridge (Author, Done):
		> I would strongly consider exposing this function publicly for the detailed tests, and only smoke-testing it through locateSymbolAt. Having to break the AST in tests or otherwise rely on the "primary" logic not working is brittle and hard to verify.
		I was going to push back against this, but I ended up convincing myself that your suggestion is better :) For the record, the consideration that convinced me was: Suppose in the future we add fancier AST-based logic that handles a case like `T().foo()` (for example, by surveying types for which `T` is actually substituted, and offering `foo()` inside those types). If all we're testing is "navigation works for this case" rather than "navigation works for this case via the AST-based mechanism", we could regress the AST logic but have our test still pass because the testcase is simple enough that the text-based navigation fallback (that we're adding here) works as well.
		nridge (Author, Done): Renamed and comment added. I still need to revise the tests.
getCanonicalPath(SM.getFileEntryForID(SM.getMainFileID()), SM);		getCanonicalPath(SM.getFileEntryForID(SM.getMainFileID()), SM);
if (!MainFilePath) {		if (!MainFilePath) {
elog("Failed to get a path for the main file, so no references");		elog("Failed to get a path for the main file, so no references");
return {};		return {};
}		}

// Treat #included files as symbols, to enable go-to-definition on them.		// Treat #included files as symbols, to enable go-to-definition on them.
for (auto &Inc : AST.getIncludeStructure().MainFileIncludes) {		for (auto &Inc : AST.getIncludeStructure().MainFileIncludes) {
if (!Inc.Resolved.empty() && Inc.R.start.line == Pos.line) {		if (!Inc.Resolved.empty() && Inc.R.start.line == Pos.line) {
LocatedSymbol File;		LocatedSymbol File;
File.Name = llvm::sys::path::filename(Inc.Resolved);		File.Name = llvm::sys::path::filename(Inc.Resolved);
File.PreferredDeclaration = {		File.PreferredDeclaration = {
URIForFile::canonicalize(Inc.Resolved, *MainFilePath), Range{}};		URIForFile::canonicalize(Inc.Resolved, *MainFilePath), Range{}};
File.Definition = File.PreferredDeclaration;		File.Definition = File.PreferredDeclaration;
		sammccall (Done): FWIW the API for this is visibleNamespaces() from SourceCode.cpp. (No enclosing classes, but I suspect we can live without them once we have a nearby-tokens solution too)
		nridge (Author, Done): Thanks, that's convenient! Out of curiosity, though: is the reason to prefer this lexer-based approach over hit-testing the query location against `NamespaceDecl`s in the AST, mainly for performance?
		sammccall (Done): Well, it was written for fallback code completion when we have no AST at all :-) Gathering from the AST should be better, though it's not quite as simple as hit-testing (you also have to find `using namespace`). But this exists today, which is a feature!
// We're not going to find any further symbols on #include lines.		// We're not going to find any further symbols on #include lines.
return {std::move(File)};		return {std::move(File)};
		nridge (Author, Done): It occurred to me that I don't think we can do `AnyScope=false` if we want to handle dependent member cases like `T().uniqueMethodName()`. The members we want to find in such a case will often be both in a different file (so nearby-tokens won't handle them) and not in any visible scope.
}		}
		sammccall (Done): If we're bailing out on >3, I think this limit should be aiming to detect when there's >3, and avoid fetching way too much data, but not trying to avoid noise. (I'd suggest 10 or so)
}		}

// Macros are simple: there's no declaration/definition distinction.		// Macros are simple: there's no declaration/definition distinction.
// As a consequence, there's no need to look them up in the index either.		// As a consequence, there's no need to look them up in the index either.
SourceLocation IdentStartLoc = SM.getMacroArgExpandedLocation(		SourceLocation IdentStartLoc = SM.getMacroArgExpandedLocation(
		sammccall (Done): This seems dead, you're requiring exact matches, these will always have the same score.
getBeginningOfIdentifier(Pos, AST.getSourceManager(), AST.getLangOpts()));		getBeginningOfIdentifier(Pos, AST.getSourceManager(), AST.getLangOpts()));
std::vector<LocatedSymbol> Result;		std::vector<LocatedSymbol> Result;
		sammccall (Done): This is an interesting signal, I think there are two sensible ways to go about it:
		- assume results in this file are more likely accurate than those in other files. In this case we should at minimum be using this in ranking, but really we should just drop all cross-file results if we have an in-file one.
		- don't rely on index for main-file cases, and rely on "find nearby matching token and resolve it" instead. That can easily handle cases defined/referenced in the main-file with sufficient accuracy, including non-indexed symbols. So here we can assume this signal is always false, and drop it.
		nridge (Author, Done): Since you've implemented "find nearby matching token and resolve it", I went with the second approach.
if (auto M = locateMacroAt(IdentStartLoc, AST.getPreprocessor())) {		if (auto M = locateMacroAt(IdentStartLoc, AST.getPreprocessor())) {
		sammccall (Done): BTW I think the answer for constructors is just to drop all constructor results here. (This also affects template specializations which I think we can not worry about, and virtual method hierarchies which are more painful but I also wouldn't try to fix now)
if (auto Loc = makeLocation(AST.getASTContext(),		if (auto Loc = makeLocation(AST.getASTContext(),
		sammccall (Done): I'm not sure why we're using SymbolToLocation here:
		- Main file URI check: the `Symbol` has URIs. They need to be canonicalized to file URIs before comparison. This allows checking both decl and def location.
		- PreferredDeclaration and Definition can be more easily set directly from the `Symbol`
		nridge (Author, Done): Well the `Symbol` has `SymbolLocation`s and we need protocol `Location`s, so we have to use something to convert them. Other places that perform such conversion use `symbolToLocation()`, so I reused it. But you're right that `symbolToLocation()` also has some "pick the definition or the declaration" logic which is less appropriate here. I can factor out the `SymbolLocation` --> `Location` conversion logic from `symbolToLocation()`, and just use that here.
M->Info->getDefinitionLoc(), *MainFilePath)) {		M->Info->getDefinitionLoc(), *MainFilePath)) {
LocatedSymbol Macro;		LocatedSymbol Macro;
Macro.Name = M->Name;		Macro.Name = M->Name;
Macro.PreferredDeclaration = *Loc;		Macro.PreferredDeclaration = *Loc;
Macro.Definition = Loc;		Macro.Definition = Loc;
Result.push_back(std::move(Macro));		Result.push_back(std::move(Macro));
		sammccall (Done): I wouldn't bother qualifying this as "for now". Any code is subject to change in the future, but requiring an exact name match for index-based results seems more like a design decision than a fixme.
		nridge (Author, Done): Do we want to rule out the possibility of handling typos in an identifier name in a comment (in cases where we have high confidence in the match, e.g. a long / unique name, small edit distance, only one potential match) in the future? This is also relevant to whether we want to keep the `FuzzyMatcher` or not.
		sammccall (Done): No idea whether typo-correction is a good idea in principle - tradeoff between current false negatives and false positives+compute. However neither FuzzyMatcher nor the existing index implementations support/can easily support real typo correction, and it seems implausible to me we'd add it for this feature. Compare to e.g:
		- allowing case-insensitive match in some cases: `fooBar` vs `FooBar` is a plausible "typo". This is easy to implement.
		- correct the typo where we see the fixed version used as an identifier in this file (and not the original). Excludes some cases, but drives false-positives way down, and easy to implement.
		I don't think we need to rule things out, but I'm uncertain enough about the approach to think that putting comments, fuzzymatcher etc here speculatively isn't worth it.
		nridge (Author, Done): Perhaps I'm unclear on the distinction between fuzzy matching and typo correction. Are they not both a matter of comparing a candidate string against a test string, and considering it a match if they are "close enough" according to some metric (with the metric potentially being a simple edit distance in the case of typo correction)?

// Don't look at the AST or index if we have a macro result.		// Don't look at the AST or index if we have a macro result.
// (We'd just return declarations referenced from the macro's		// (We'd just return declarations referenced from the macro's
// expansion.)		// expansion.)
return Result;		return Result;
}		}
}		}

[...]	std::vector<LocatedSymbol> locateSymbolAt(ParsedAST &AST, Position Pos,
SourceLocation SourceLoc;		SourceLocation SourceLoc;
if (auto L = sourceLocationInMainFile(SM, Pos)) {		if (auto L = sourceLocationInMainFile(SM, Pos)) {
SourceLoc = *L;		SourceLoc = *L;
} else {		} else {
elog("locateSymbolAt failed to convert position to source location: {0}",		elog("locateSymbolAt failed to convert position to source location: {0}",
L.takeError());		L.takeError());
return Result;		return Result;
}		}

		sammccall (Done): I don't think this should be logged, particularly by default - it doesn't really indicate anything other than we should have a "look up symbol by name" API (ok, actually I think this is just dead code because we've already checked name above)
// Emit all symbol locations (declaration or definition) from AST.		// Emit all symbol locations (declaration or definition) from AST.
DeclRelationSet Relations =		DeclRelationSet Relations =
DeclRelation::TemplatePattern | DeclRelation::Alias;		DeclRelation::TemplatePattern | DeclRelation::Alias;
for (const NamedDecl *D : getDeclAtPosition(AST, SourceLoc, Relations)) {		for (const NamedDecl *D : getDeclAtPosition(AST, SourceLoc, Relations)) {
const NamedDecl *Def = getDefinition(D);		const NamedDecl *Def = getDefinition(D);
const NamedDecl *Preferred = Def ? Def : D;		const NamedDecl *Preferred = Def ? Def : D;

// If we're at the point of declaration of a template specialization,		// If we're at the point of declaration of a template specialization,
[...]	Index->lookup(QueryRequest, [&](const Symbol &Sym) {
getPreferredLocation(R.PreferredDeclaration,		getPreferredLocation(R.PreferredDeclaration,
Sym.CanonicalDeclaration, Scratch),		Sym.CanonicalDeclaration, Scratch),
*MainFilePath))		*MainFilePath))
R.PreferredDeclaration = *Loc;		R.PreferredDeclaration = *Loc;
}		}
});		});
}		}

		if (Result.empty()) {
		return navigationFallback(AST, Index, Pos, *MainFilePath);
		}

return Result;		return Result;
}		}
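
The fallback added at the end of locateSymbolAt only runs when the AST-based lookup produced nothing, and the inline comments on this revision discuss capping it at 3 results and bailing out early on too many. A rough, self-contained sketch of that control flow follows. It is not the patch's code: IndexedSymbol, textualFallback and locate are hypothetical stand-ins, locations are plain strings, and the "give up when there are too many matches" policy is an assumption drawn from the review discussion.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a symbol stored in the index.
struct IndexedSymbol {
  std::string Name;     // unqualified name, e.g. "uniqueMethodName"
  std::string Location; // "file:line", kept as a string for brevity
  bool IsConstructor;   // constructors are treated as unusable candidates
};

// Textual fallback: collect index symbols whose name equals Word.
// Returns nothing as soon as more than MaxResults candidates are seen,
// on the assumption that a long list is noise rather than a useful answer.
std::vector<std::string>
textualFallback(const std::vector<IndexedSymbol> &Index,
                const std::string &Word, std::size_t MaxResults = 3) {
  std::vector<std::string> Results;
  for (const IndexedSymbol &Sym : Index) {
    if (Sym.IsConstructor || Sym.Name != Word)
      continue; // unusable candidate
    if (Results.size() >= MaxResults)
      return {}; // too many matches: bail out early
    Results.push_back(Sym.Location);
  }
  return Results;
}

// Prefer AST-based results; consult the index only when the AST found nothing.
std::vector<std::string> locate(const std::vector<std::string> &ASTResults,
                                const std::vector<IndexedSymbol> &Index,
                                const std::string &Word) {
  if (!ASTResults.empty())
    return ASTResults;
  return textualFallback(Index, Word);
}
```

With an index holding two methods named uniqueMethodName and no AST results, locate returns both locations; with a non-empty AST result the index is never consulted.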

namespace {		namespace {

/// Collects references to symbols within the main file.		/// Collects references to symbols within the main file.
class ReferenceFinder : public index::IndexDataConsumer {		class ReferenceFinder : public index::IndexDataConsumer {
public:		public:
[55 lines hidden; enclosing context: indexTopLevelDecls(AST.getASTContext(), AST.getPreprocessor(),]
AST.getLocalTopLevelDecls(), RefFinder, IndexOpts);		AST.getLocalTopLevelDecls(), RefFinder, IndexOpts);
return std::move(RefFinder).take();		return std::move(RefFinder).take();
}		}

} // namespace		} // namespace

std::vector<DocumentHighlight> findDocumentHighlights(ParsedAST &AST,		std::vector<DocumentHighlight> findDocumentHighlights(ParsedAST &AST,
Position Pos) {		Position Pos) {
const SourceManager &SM = AST.getSourceManager();		const SourceManager &SM = AST.getSourceManager();
		sammccall (Done): (The fuzzy matcher and topN are still here - I think we don't need them, right? With only up-to-3 results, std::sort seems more obvious)
// FIXME: show references to macro within file?		// FIXME: show references to macro within file?
DeclRelationSet Relations =		DeclRelationSet Relations =
		sammccall (Done): maybe bail out early (on unusable/too many) instead of doing all the score computations first?
		  fuzzyFind(..., {
		    // bail out if it's a constructor or name doesn't match
		    if (Results.size() >= 3) { TooMany = true; return; }
		    // add result
		  });
DeclRelation::TemplatePattern | DeclRelation::Alias;		DeclRelation::TemplatePattern | DeclRelation::Alias;
		nridge (author, Done): Sorry this location-setting code is so messy. All my attempts to make it more concise have been thwarted by `llvm::Expected`'s very restrictive API.
		sammccall (Not Done): Ugh, don't get me started on Error/Expected :-( I'd love to get rid of it somehow, but it seems like we'd inevitably just end up with the new thing + Error/Expected + error_code/ErrorOr + return-a-bool, and I'm not sure it'd be better. (If you have more energy than me, I'd enthusiastically +1 an llvm-dev proposal to drop the clever checks from llvm::Error, and I know some others who would...)
auto References = findRefs(		auto References = findRefs(
getDeclAtPosition(AST,		getDeclAtPosition(AST,
SM.getMacroArgExpandedLocation(getBeginningOfIdentifier(		SM.getMacroArgExpandedLocation(getBeginningOfIdentifier(
Pos, SM, AST.getLangOpts())),		Pos, SM, AST.getLangOpts())),
Relations),		Relations),
AST);		AST);

// FIXME: we may get multiple DocumentHighlights with the same location and		// FIXME: we may get multiple DocumentHighlights with the same location and
[29 lines hidden; enclosing context: if (!MainFilePath) {]
return Results;		return Results;
}		}
auto URIMainFile = URIForFile::canonicalize(*MainFilePath, *MainFilePath);		auto URIMainFile = URIForFile::canonicalize(*MainFilePath, *MainFilePath);
auto Loc = SM.getMacroArgExpandedLocation(		auto Loc = SM.getMacroArgExpandedLocation(
getBeginningOfIdentifier(Pos, SM, AST.getLangOpts()));		getBeginningOfIdentifier(Pos, SM, AST.getLangOpts()));
RefsRequest Req;		RefsRequest Req;

if (auto Macro = locateMacroAt(Loc, AST.getPreprocessor())) {		if (auto Macro = locateMacroAt(Loc, AST.getPreprocessor())) {
// Handle references to macro.		// Handle references to macro.
		nridge (author, Done): Oh whoops, this assumption is another dependency on `findNearbyIdentifier()`
		nridge (author, Done): For now, I just had it restrict to 3 results in general (even if they're in the same file). Once `findNearbyIdentifier()` lands, the behaviour will automatically become what we intended.
if (auto MacroSID = getSymbolID(Macro->Name, Macro->Info, SM)) {		if (auto MacroSID = getSymbolID(Macro->Name, Macro->Info, SM)) {
// Collect macro references from main file.		// Collect macro references from main file.
const auto &IDToRefs = AST.getMacros().MacroRefs;		const auto &IDToRefs = AST.getMacros().MacroRefs;
auto Refs = IDToRefs.find(*MacroSID);		auto Refs = IDToRefs.find(*MacroSID);
if (Refs != IDToRefs.end()) {		if (Refs != IDToRefs.end()) {
for (const auto Ref : Refs->second) {		for (const auto Ref : Refs->second) {
Location Result;		Location Result;
Result.range = Ref;		Result.range = Ref;
[392 lines hidden]

clang-tools-extra/clangd/unittests/XRefsTests.cpp

[579 lines hidden; enclosing context: if (!WantDecl) {]
llvm::Optional<Range> GotDef;		llvm::Optional<Range> GotDef;
if (Results[0].Definition)		if (Results[0].Definition)
GotDef = Results[0].Definition->range;		GotDef = Results[0].Definition->range;
EXPECT_EQ(WantDef, GotDef) << Test;		EXPECT_EQ(WantDef, GotDef) << Test;
}		}
}		}
}		}

		TEST(LocateSymbol, Textual) {
		sammccall (Done): `#ifdef`'d out code is another interesting motivation worth testing.
		const char *Tests[] = {
		R"cpp(// Comment
		struct [[Foo]] {};
		// Comment mentioning F^oo
		)cpp",
		R"cpp(// String
		struct [[Foo]] {};
		const char* = "String literal mentioning F^oo";
		)cpp",
		R"cpp(// Invalid code
		int [[foo]](int);
		int var = f^oo();
		)cpp",
		R"cpp(// Dependent type
		struct Foo {
		void [[uniqueMethodName]]();
		};
		template <typename T>
		void f(T t) {
		t->u^niqueMethodName();
		}
		)cpp"};

		for (const char *Test : Tests) {
		Annotations T(Test);
		llvm::Optional<Range> WantDecl;
		if (!T.ranges().empty())
		WantDecl = T.range();

		auto TU = TestTU::withCode(T.code());

		auto AST = TU.build();
		auto Index = TU.index();
		auto Results = locateSymbolAt(AST, T.point(), Index.get());

		if (!WantDecl) {
		EXPECT_THAT(Results, IsEmpty()) << Test;
		} else {
		ASSERT_THAT(Results, ::testing::SizeIs(1)) << Test;
		EXPECT_EQ(Results[0].PreferredDeclaration.range, *WantDecl) << Test;
		}
		}
		}
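
For readers unfamiliar with the test inputs above: `^` marks the cursor position and `[[...]]` marks the range the test expects to be returned, and both are stripped before the code is parsed. The sketch below illustrates that convention only. ParsedAnnotations and parseAnnotations are hypothetical names, not the Annotations helper the tests use, and named markers such as `$FooLoc[[...]]` and real C++ `[[attributes]]` are deliberately not handled.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified result of stripping markers from annotated code.
// All offsets refer to positions in the stripped text.
struct ParsedAnnotations {
  std::string Code;                // text without markers
  std::vector<std::size_t> Points; // offset of each '^'
  std::vector<std::pair<std::size_t, std::size_t>> Ranges; // [begin, end)
};

ParsedAnnotations parseAnnotations(const std::string &Text) {
  ParsedAnnotations Out;
  std::vector<std::size_t> OpenRanges; // begin offsets of unclosed "[["
  for (std::size_t I = 0; I < Text.size(); ++I) {
    if (Text[I] == '^') {
      Out.Points.push_back(Out.Code.size());
    } else if (Text.compare(I, 2, "[[") == 0) {
      OpenRanges.push_back(Out.Code.size());
      ++I; // skip the second '['
    } else if (Text.compare(I, 2, "]]") == 0 && !OpenRanges.empty()) {
      Out.Ranges.emplace_back(OpenRanges.back(), Out.Code.size());
      OpenRanges.pop_back();
      ++I; // skip the second ']'
    } else {
      Out.Code.push_back(Text[I]); // ordinary character: keep it
    }
  }
  return Out;
}
```

Applied to the "Comment" case, this yields one range covering `Foo` in the struct declaration and one point inside the word `Foo` in the comment, which is the pairing the test loop compares against the locateSymbolAt result.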

TEST(LocateSymbol, Ambiguous) {		TEST(LocateSymbol, Ambiguous) {
auto T = Annotations(R"cpp(		auto T = Annotations(R"cpp(
struct Foo {		struct Foo {
Foo();		Foo();
Foo(Foo&&);		Foo(Foo&&);
$ConstructorLoc[[Foo]](const char*);		$ConstructorLoc[[Foo]](const char*);
};		};

[58 lines hidden; enclosing context: TEST(LocateSymbol, Ambiguous) {]
EXPECT_THAT(locateSymbolAt(AST, T.point("12")),		EXPECT_THAT(locateSymbolAt(AST, T.point("12")),
UnorderedElementsAre(Sym("bar", T.range("NonstaticOverload1")),		UnorderedElementsAre(Sym("bar", T.range("NonstaticOverload1")),
Sym("bar", T.range("NonstaticOverload2"))));		Sym("bar", T.range("NonstaticOverload2"))));
EXPECT_THAT(locateSymbolAt(AST, T.point("13")),		EXPECT_THAT(locateSymbolAt(AST, T.point("13")),
UnorderedElementsAre(Sym("baz", T.range("StaticOverload1")),		UnorderedElementsAre(Sym("baz", T.range("StaticOverload1")),
Sym("baz", T.range("StaticOverload2"))));		Sym("baz", T.range("StaticOverload2"))));
}		}

		TEST(LocateSymbol, TextualAmbiguous) {
		auto T = Annotations(R"cpp(
		struct Foo {
		void $FooLoc[[uniqueMethodName]]();
		};
		struct Bar {
		void $BarLoc[[uniqueMethodName]]();
		};
		template <typename T>
		void f(T t) {
		t->u^niqueMethodName();
		}
		)cpp");
		auto TU = TestTU::withCode(T.code());
		auto AST = TU.build();
		auto Index = TU.index();
		EXPECT_THAT(locateSymbolAt(AST, T.point(), Index.get()),
		UnorderedElementsAre(Sym("uniqueMethodName", T.range("FooLoc")),
		Sym("uniqueMethodName", T.range("BarLoc"))));
		}
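
In the textual tests the cursor sits in the middle of a word that the AST cannot resolve, so a textual fallback first has to recover the identifier touching that offset before it can query the index by name. A minimal sketch of that step, assuming ASCII identifiers; wordAt is a hypothetical helper and is not the function this patch adds to SourceCode.h.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// True for characters that may appear in an ASCII C++ identifier.
static bool isIdentifierChar(char C) {
  return std::isalnum(static_cast<unsigned char>(C)) || C == '_';
}

// Returns the identifier-like word touching Offset, or "" if there is none.
// Purely textual: it knows nothing about tokens, comments, or strings.
std::string wordAt(const std::string &Code, std::size_t Offset) {
  if (Offset > Code.size())
    return "";
  std::size_t Begin = Offset, End = Offset;
  // Extend left and right while the neighbouring characters are word chars.
  while (Begin > 0 && isIdentifierChar(Code[Begin - 1]))
    --Begin;
  while (End < Code.size() && isIdentifierChar(Code[End]))
    ++End;
  std::string Word = Code.substr(Begin, End - Begin);
  // Reject empty words and ones starting with a digit (numeric literals).
  if (Word.empty() || std::isdigit(static_cast<unsigned char>(Word[0])))
    return "";
  return Word;
}
```

Because the helper is purely textual it behaves the same in comments, string literals and dependent expressions, which is also why it will happily return ordinary English words; any filtering of such false positives would have to happen in the index lookup that follows.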

TEST(LocateSymbol, TemplateTypedefs) {		TEST(LocateSymbol, TemplateTypedefs) {
auto T = Annotations(R"cpp(		auto T = Annotations(R"cpp(
template <class T> struct function {};		template <class T> struct function {};
template <class T> using callback = function<T()>;		template <class T> using callback = function<T()>;

c^allback<int> foo;		c^allback<int> foo;
)cpp");		)cpp");
auto AST = TestTU::withCode(T.code()).build();		auto AST = TestTU::withCode(T.code()).build();
[561 lines hidden]