This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Add USR to textDocument/definition response
AbandonedPublic

Authored by jkorous on Nov 14 2018, 8:06 AM.

Details

Summary

We need a way for given code position to get the best definition/declaration location in given translation unit and it's USR.

Since the first element in list returned by textDocument/definition is meeting the def/decl location requirement we'd just need to add the USR. Other option would be to split this to a separate method. I thought this patch to be a better fit from protocol perspective but I am happy to go the other way if that's preferred.

I'd like to ask if someone more familiar with gtest matchers could take a look at changes I made in unit-tests. I am not sure it's exactly the idiomatic way. I was trying to use Property() matchers first but gave up since I wasn't able to debug it (got lost in template error messages).

Diff Detail

Event Timeline

jkorous created this revision.Nov 14 2018, 8:06 AM

Hi Jan,

Clangd uses SymbolIDs rather than USRs, to identify symbols.

However these are used only internally, and for extension point APIs (SymbolIndex), and not actually exposed over the wire.
Can you explain more about the need to identify the symbol in go-to-definition?

Conceptually the idea of providing at least an opaque identifier when naming symbols makes sense - the cursor position can often be ambiguous.
However LSP decided not to go this route, and we risk diverging (e.g. should "hover" be able to accept such an identifier too? etc).

Giving particular semantics to the identifiers seems much more risky.

One more thing - shall I create new client capability flag for this?

Hi Sam!

I am aware of clangd using SymbolID. We basically need USR for integration with our external indexing service. We don't plan to systematically use it in the protocol and I am not aware of any other requirement for using USR in LSP - will double check that to be sure.

We are also discussing creating separate method as we'll likely want to add other data to the response in the future. Would you prefer USR not to be in the standard part of LSP but only in our extension?

jkorous added a comment.EditedNov 14 2018, 10:42 AM

Correction - I asked offline about our intended use of USRs in LSP and it seems we might want to receive it as part of other responses too. Nothing specific for now but it's probable that it won't be just this singular case.

@sammccall RE using USRs: we want to integrate clangd into a larger source tooling system that is already using USRs extensively to identify symbols. The most obvious case is that we have an index outside of clangd that uses USRs from clang-the-compiler, so exposing USRs in clangd lets us integrate information from both places. The most critical piece of this is being able to ask "what symbol is at this location" and correlate that with index queries. Beyond this one case, my experience is that any time a language service API is providing information about a symbol or list of symbols it may be useful to add the USR (e.g. hover, definition or workspace symbols).

You mentioned a concern that this would make us diverge from LSP - could you expand on that? I think for any existing requests the USR is purely additive. If the concern is about affecting the return values of existing queries, would it help to put this behind a configuration option?

There is also an extension that might be interesting to look at here: https://github.com/sourcegraph/language-server-protocol/blob/master/extension-symbol-descriptor.md In this case the symbol descriptor is opaque, but in practice our use case would require knowing how to dig out a USR string.

For the specific use case of getting cursor info, as @jkorous mentioned we might want to move this out of "definition". Our experience with sourcekitd for Swift is that a cursor info request is a useful place to put an assortment of information that may grow over time - e.g. name, type name, USR, type USR, symbol kind, local declaration/definition location, etc.

If we're extending the API, we should be careful to do it in an orthogonal and reasonable general way. Apple may not need USR in other places now, but there are users other than Apple and times other than today :-)

One of the reasons this looks odd is the method is "find definitions" rather than "get symbol info". So e.g. it's odd to return the symbol USR here but not in "find references", and doing a lookup in the index is unnecessary if what you want want is the USR. So I do think a new "get symbol info" method is better, if that's what you want.

This is doable as an LSP extension, though it does give me pause to reconsider the design:

    • the C++ API is for this much easier to design: get a SymbolID & Decl* while the AST is alive, with the option of looking up in the index. With LSP, it's not obvious what to include/exclude.
    • an LSP call can't return any persistent handle to the symbol (i.e Decl*) as the AST has no lifetime guarantees. So any follow-up operation on the symbol needs to repeat work and handle errors (symbol gone?). This puts further pressure on orthogonality.
  • it's early days, are there more? The more custom actions there are that don't fit LSP, the less coherent ClangdLSPServer becomes (it's meant to model LSP quite closely, where ClangdServer matches "reality")

So are we still sure "mostly unmodified clangd over a different transport with a few extensions" is a better fit for the xcode XPC service than "AST management from ClangdServer, some features from ClangdServer, some custom features built on TUScheduler, exposed over an XPC protocol inspired by LSP"?

@benlangmuir: I hadn't seen your comment while replying, sorry for any confusion...

The most critical piece of this is being able to ask "what symbol is at this location" and correlate that with index queries.

This makes sense (at least doing follow-up index queries, I'm not sure exactly what "correlate" would entail).
clangd does have an internal concept of an index that is designed as an integration point for out-of-tree implementations.
Of course you're free not to use that and to query a separate index based on results returned from the LSP layer, but this is likely to be lossy (e.g. find refs will give you an empty list if the symbol under the cursor has no refs visible to clangd, leaving nothing to query the external index for).
The SymbolIndex API is not USR based, but in practice SymbolIDs are truncated hashes of USRs and we could probably add some guarantees there if it would help - integration may be an option.

Beyond this one case, my experience is that any time a language service API is providing information about a symbol or list of symbols it may be useful to add the USR (e.g. hover, definition or workspace symbols).

So in general I'm afraid this is unlikely.

  • As mentioned, clangd doesn't use USRs to identify symbols, but rather SymbolID. This representation was chosen specifically to avoid coupling to USR, and to be more compact. Index results do not have USRs available.
  • in principle embedding SymbolID in results seems ok. I do think in practice there's not much you can *do* with one outside clangd, without guarantees about SymbolID format. At that point (with guarantees) we're closer to the C++-embedding level of stability/coupling rather than the LSP-protocol level.

Our experience with sourcekitd for Swift is that a cursor info request is a useful place to put an assortment of information that may grow over time

Good to know, this does seem like a reasonable direction, and exposing USR there seems totally fine. The only high-level question I can see here is should such a request hit the index or rely on AST only? The index is needed for e.g. definitions not visible in the AST, but may spend some CPU or even a network RPC.

I don't think we should be using textDocument/definition here, and I agree with Sam that a new method would be better. We don't actually need the semantic guarantees/constrains imposed by LSP's description of textDocument/definition, as we want to find any declaration that corresponds to the reference at the location, so we don't need to guarantee that we return some definition. This would also avoid any confusion about USRs returned in textDocument/definition/textDocument/references.

@sammccall

Good to know, this does seem like a reasonable direction, and exposing USR there seems totally fine. The only high-level question I can see here is should such a request hit the index or rely on AST only

It's up to all of us to come up with the semantics of this new request. For our use-case we need the AST information only, and we don't want to pay the index hit here. I would propose to define the semantics similar to returning information about the symbol at the location from the view point of just the textDocument in which the query is located. This might not be useful to high level clients that want all queries to be backed by the Index, but it could be useful to other clients that are interested in the symbol as its viewed by that text document. We don't have to specify whether it's a declaration or a definition as well.

it's early days, are there more? The more custom actions there are that don't fit LSP, the less coherent ClangdLSPServer becomes (it's meant to model LSP quite closely, where ClangdServer matches "reality")

So are we still sure "mostly unmodified clangd over a different transport with a few extensions" is a better fit for the xcode XPC service than "AST management from ClangdServer, some features from ClangdServer, some custom features built on TUScheduler, exposed over an XPC protocol inspired by LSP"?

We are committed to relying on LSP with necessary extension / capability points. We don't really have any other option at this point due to scheduling concerns. As far as the need for different customizations, this "cursor-info" request is probably the most low level extension we need, and the majority of our future work will be relevant to high level, pure LSP clients (e.g. semantic highlighting, refactoring).

@jkorous

We don't need to rely on finding the definition or using the index for the new method. What we really need is to map from a location to a Decl * which we can represent as a symbol (USR, etc.) in the response. A declaration in this case is perfectly fine. You can leverage existing code like clang::tooling::getNamedDeclAt that will return the declaration for a location, and that's used by things like clang-rename.

OK, sounds like we have something to move forward with. I'd suggest we start with an operation returning {SymbolID, scope qualifiers, unqualified name, USR} and ignoring location for now, unless you have an immediate need. Reason being this sidesteps the index question, what the semantics of the location are, which redecl to use etc. Hmm, this looks a lot like SymbolInformation as used by textDocument/documentSymbol with a USR extension. We could consider reusing that type...

A declaration in this case is perfectly fine. You can leverage existing code like clang::tooling::getNamedDeclAt that will return the declaration for a location, and that's used by things like clang-rename.

I'd suggest putting this operation in XRefs.cpp and reusing getSymbolAtPosition, because that's consistent with how we handle other "symbol at cursor" operations, and is aware of our preamble handling. If the implementation can be improved or reuse other libraries that's a good idea, but it should be uniform.
(getNamedDeclAt looks simpler/cleaner, but I assume you want this to work when the cursor is on a reference rather than a decl? If I'm reading getNamedDeclAt correctly, it doesn't seem to handle that).

jkorous planned changes to this revision.Nov 21 2018, 6:51 AM