This is an archive of the discontinued LLVM Phabricator instance.

[clangd] textDocument/SymbolInfo method
ClosedPublic

Authored by jkorous on Nov 21 2018, 7:23 AM.

Details

Summary

Hi,

I implemented textDocument/cursorInfo method based on consensus in https://reviews.llvm.org/D54529.
I'd like to ask for early feedback - what's still missing is relevant client capability.

Couple things that I'd love to hear opinions about are:

  • conditional return in getCursorInfo - Should we return for example data with empty USR?
  • containerName of local variables - It's currently empty even if semantic parent has a name (say it's a function). (Please search for local.cpp in the test.) Is that what we want?
  • For now I used getSymbolAtPosition() as it's simpler and fits the context better. However I assume I could use this optimization from tooling::getNamedDeclAt() (in a separate patch): https://github.com/llvm-mirror/clang/blob/master/lib/Tooling/Refactoring/Rename/USRFinder.cpp#L82
  • One thing I am wondering about is whether we could use (and whether it's a significant improvement) some early return in indexTopLevelDecls (using DeclarationAndMacrosFinder) also for hover and definition use-cases. Is it correct to assume that at a given Location there can be maximum of one declaration and one definition?

P. S. Alex and Ben have thanksgiving break now so they'll probably add any feedback next week.

Diff Detail

Repository
rL LLVM

Event Timeline

jkorous created this revision.Nov 21 2018, 7:23 AM

Thanks for sending this! Broadly looks good, a few details to work out. I think the biggest one is multiple symbols which you've flagged.

I'd like to ask for early feedback - what's still missing is relevant client capability.

I'm actually not 100% sure that's necessary for custom LSP methods.

  • a server capability would suffice at least, unless we're worried clients are going to call it "by mistake"?
  • even that doesn't seem much better than just allowing the call where other servers would return an error. Keep it simple?

I'd suggest leaving it out unless others feel strongly.

Couple things that I'd love to hear opinions about are:

  • conditional return in getCursorInfo - Should we return for example data with empty USR?

Please return a symbol unless it has no SymbolID (we don't treat those as symbols in clangd).
In practice this amounts to the same thing for now (SymbolID is derived from USR). May change in the future though.

  • containerName of local variables - It's currently empty even if semantic parent has a name (say it's a function). (Please search for local.cpp in the test.) Is that what we want?

good question, no idea.

Yes, though at least for C++ this optimization won't usually buy anything, as the top-level decl will just be a namespace decl I think.
We could plumb this deeper if we want to win more here.

  • One thing I am wondering about is whether we could use (and whether it's a significant improvement) some early return in indexTopLevelDecls (using DeclarationAndMacrosFinder) also for hover and definition use-cases. Is it correct to assume that at a given Location there can be maximum of one declaration and one definition?

There can be multiple and we should return them all. (Not sure what we do for hover, but defs handles this correctly).
I can't remember which are the important cases, but one we handle like this is implicit constructor calls:

class Foo { Foo(const char*); }
void g(Foo);
const char* str = "abc";
g(str);

completion on the s in the last line finds both the Foo constructor and the definition of str.

Maybe it's worth revisiting this if we can't find any really important examples. That would involve digging up the historical context (I don't remember it right now).

P. S. Alex and Ben have thanksgiving break now so they'll probably add any feedback next week.

Hope they're having a good break!

ClangdLSPServer.cpp
728 ↗(On Diff #174926)

I'd suggest textDocument/symbolInfo for consistency with LSP (and similar with internal names).
The term "symbol" is used throughout LSP to describe roughly what we're using it to mean here.

Conversely I guess "cursor" refers to the editor cursor, but LSP seems to prefer just talking about positions.

ClangdServer.h
205 ↗(On Diff #174926)

nit: just cursorInfo (or symbolInfo etc) for consistency with other where there isn't a specific verb.

Protocol.cpp
436 ↗(On Diff #174926)

(you only want containername if non-empty, right?)

Protocol.h
676 ↗(On Diff #174926)

Unless I'm misunderstanding, this is about a symbol, not an identifier. (e.g. overloads of C++ functions are distinct symbols). Maybe SymbolDetails would be a useful name.

There's a question of scope here: getCursorInfo seems to return a single instance of these, but there can be 0, 1, or many symbols at a given location. These could be vector<SymbolDetails> or so, or if this struct is intended to encapsulate them all, it could be named SymbolsAtLocation or so.

676 ↗(On Diff #174926)

/// This is returned from textDocument/cursorInfo, which is a clangd extension.

678 ↗(On Diff #174926)

This comment doesn't really say anything. It can be omitted, or we could add additional information.
(Some of the existing comments are equally content-free, because we've copied the comments from LSP verbatim, but that's not the case here)

682 ↗(On Diff #174926)

If we're not going to merge, I'd suggest making this "scope" and allowing it to end with ::.
This makes it easier to compose (qname = scope + name), and makes it clearer what "contains" means. (LSP got this wrong IMO, at least for C++).

682 ↗(On Diff #174926)

what do you want this to do for objc methods and fields?
It'd be worth having a test if it's important, I'm guilty of not knowing/checking ObjC behavior.

684 ↗(On Diff #174926)

USR could use some definition/references.

685 ↗(On Diff #174926)

please don't micro-optimize with smallstring etc in this file.

689 ↗(On Diff #174926)

can this just be SymbolID?

690 ↗(On Diff #174926)

I think SymbolKind would be useful (in particular distinguishing macros might be important?)

This does point towards maybe adding to SymbolInformation rather than a new struct, but I'm not sure which is best.

XRefs.cpp
95 ↗(On Diff #174926)

Please don't do this - it's inconsistent with the other XRefs features.
(If for some reason you *really* want just one result, then this is easy to do on the client side if we get the API right)

770 ↗(On Diff #174926)

(as alluded to elsewhere, we should return all decls/macros, not just one)

clangd/cursor-info.test
1 ↗(On Diff #174926)

It's great to have a smoke test for features like this (probably just a trivial call), but we test the details in unit tests for easier maintenance.

XRefsTests has a bunch of tests for similar features. TestTU + Annotations should make these tests fairly easy to write.

jkorous marked 7 inline comments as done.Nov 22 2018, 12:45 PM

Hi Sam.

I'd like to ask for early feedback - what's still missing is relevant client capability.

I'd suggest leaving it out unless others feel strongly.

More than happy to keep it simple! I somehow assumed there'll be demand for this capability.

  • conditional return in getCursorInfo - Should we return for example data with empty USR?

Please return a symbol unless it has no SymbolID (we don't treat those as symbols in clangd).

Ok.

  • containerName of local variables - It's currently empty even if semantic parent has a name (say it's a function). (Please search for local.cpp in the test.) Is that what we want?

good question, no idea.

I am adding an attempt to get a named declaration context for such cases. It's easy to remove if we decide against that.

Yes, though at least for C++ this optimization won't usually buy anything, as the top-level decl will just be a namespace decl I think.
We could plumb this deeper if we want to win more here.

Ok, I'll leave it for the hypothetical future.

  • One thing I am wondering about is whether we could use (and whether it's a significant improvement) some early return in indexTopLevelDecls (using DeclarationAndMacrosFinder) also for hover and definition use-cases. Is it correct to assume that at a given Location there can be maximum of one declaration and one definition?

There can be multiple and we should return them all. (Not sure what we do for hover, but defs handles this correctly).

Ah, my bad - I was considering only explicit cases. Thanks for the explanation. In the context of textDocument/cursorInfo - does it make sense to use the first explicit declaration we find or am I missing something still? (This is related to discussion about StopOnFirstDeclFound.)

I processed and implemented most of your comments. Two major tasks are: finishing the discussion about multiple declarations and writing unittests (preferably after we agree on interfaces).

Protocol.h
676 ↗(On Diff #174926)

I probably don't understand "many symbols at a given location" - more at the end of my out-of-line comment.

678 ↗(On Diff #174926)

I was actually wondering what is the value of those comments but decided to keep it consistent. Deleting.

682 ↗(On Diff #174926)

I don't really have any opinion about this but @benlangmuir might have.

I'd just make sure whatever we decide makes sense for local symbols (if we keep returning them).

I'll add tests for ObjC. Thanks for pointing that out!

684 ↗(On Diff #174926)

True. Would you go further than expanding the acronym and copying description from doxygen annotation of SymbolID?
I haven't actually found any better description in clang repository.

Shall I note include/clang/Index/USRGeneration.h?

689 ↗(On Diff #174926)

I didn't want to pull SymbolID from index originally but sure.

Are you fine with it living in Protocol.h or shall I move it/factor it out?

690 ↗(On Diff #174926)

AFAIK for our use-case the information in USR is good enough (e. g. "c:foo.cpp@24@macro@FOO") so we might wait until we have a real need. WDYT?

XRefs.cpp
95 ↗(On Diff #174926)

Sure, let's discuss. I probably got confused by your suggestion of using getSymbolAtPosition() in https://reviews.llvm.org/D54529#1299531.

Do I understand it correctly that you mean visiting all declarations and selecting the single explicit one in getSymbolInfo()?
Or do you actually mean change of interface such as this? llvm::Optional<SymbolDetails> getSymbolInfo() -> std::vector<SymbolDetails> getSymbolInfo().
Is my assumption that there could be only single explicit declaration for any given position correct in the first place?

770 ↗(On Diff #174926)

I replied to the above comment as it's related.

jkorous updated this revision to Diff 175069.Nov 22 2018, 12:50 PM
jkorous marked 2 inline comments as done.

Changes based on review.

  • conditional return in getCursorInfo - Should we return for example data with empty USR?

Please return a symbol unless it has no SymbolID (we don't treat those as symbols in clangd).

Ok.

Sorry, this was a mistake - we do in fact use decls that don't have symbol IDs (we just don't look them up in the index).
So I think both SymbolID and USR are optional.

ClangdServer.h
205 ↗(On Diff #174926)

(not done I think? Still says getSymbolInfo)

Protocol.h
684 ↗(On Diff #174926)

I'd add something like:

This is an opaque string uniquely identifying a symbol.
Unlike SymbolID, it is variable-length and somewhat human-readable.
It is a common representation across several clang tools (See USRGeneration.h)
689 ↗(On Diff #174926)

SymbolID shouldn't live in Protocol.h.

It hadn't occurred to me, but including all of Index/Index.h is risky here - it's pretty broad in scope and we run the risk of a circular dependency down the line. Sorry for not noticing!

I'm fine with any of:

  • pull out SymbolID into Index/SymbolID.h which would have few dependencies
  • revert to using std::string here as you originally did
  • depend on index.h for now, and solve this when it becomes a problem
690 ↗(On Diff #174926)

I would personally advise against parsing information out of USRs, but up to you.
(On the other hand if you're just using them as keys to look up information from elsewhere, that makes sense)

XRefs.cpp
95 ↗(On Diff #174926)

I looked into this, and the multiple-symbols is more of a mess than I thought it was.

At a high level: I think it's important that various features that use "symbol under cursor" are consistent. If you do defs or refs or highlights at a point, you should be querying the same set of symbols. And the same goes for anything that starts with a symbolInfo call.

Unfortunately the way the sorting was introduced in rL341463 meant the existing things are not quite consistent, which confuses things a bit.

But at a high level:

  • the status quo is that findDefinition/findRefs/highlights deal with multiple symbols. So yes, symbolInfo should return a vector<SymbolDetails> as things stand.
  • the multiple-symbols situation is a bit surprising and maybe not that useful. It might be possible to change this behavior. It should be changed for all methods (i.e. either we land this patch with vector<SymbolDetails> and then (breaking) change it to Optional<SymbolDetails> later, or we remove the multiple-symbols behavior first.
  • I don't know what the deal is with "explicit". The cases with multiple symbols are already obscure, cases with multiple explicit symbols are probably even rarer if they exist. But I don't think it's particularly safe to assume they don't. And getSymbolInfo really shouldn't be adding its own unique logic around this.

@hokein @ilya-biryukov We should chat about the historical context here.

hokein added inline comments.Nov 23 2018, 1:33 AM
XRefs.cpp
95 ↗(On Diff #174926)

I have the same question when I first touched this code :)
IIRC, we want all symbols in GoToDef (as we can show multiple symbols to clients). Unfortunately, there are some inconsistencies between def (return all), ref (only return explicit symbols), hover (return the first one).

I agree that most cases are single-symbol, but the typical case with multiple-symbols is below. Personally, I'm +1 on removing the multiple-symbols behavior, it can simplify many things...

namespace ns {
void foo(int);
void foo(double);
}
using ns::f^oo;
jkorous marked 7 inline comments as done.Nov 23 2018, 4:56 AM

So I think both SymbolID and USR are optional.

No problem.
I am just wondering if it make sense to include any symbol with empty name, empty USR and no ID in LSP response. I assume either clangd or our LSP client (and hypothetically others too) will have to filter these out. But it might violate consistency with other LSP methods. Anyway, if we agree on filtering such symbols either in getSymbolInfo() or ClangdServer::getSymbolInfo() I am happy to do it, if not it's fine with me.

ClangdServer.h
205 ↗(On Diff #174926)

Sorry, my bad - I missed the verb dropping. Fixing now.
BTW if we want consistency here we should probably rename findReferences(), findHover() too.

Protocol.h
689 ↗(On Diff #174926)

To me the cleanest solution seems to be Index/SymbolID.h.

Generally if we ever get to a point where we'd hit this kind of issues more often it might be worthwhile to keep data structures for LSP protocol and for server implementation separate. But I don't think we're anywhere close yet.

690 ↗(On Diff #174926)

I assume that's what we do but I'll let @benlangmuir or @arphaman confirm this.

XRefs.cpp
95 ↗(On Diff #174926)

Thanks a lot @hokein! This is really helpful because I was assuming there can be just a single explicit symbol the whole time. Probably because getNamedDeclAt() which was our starting point makes that assumption too.

I reread the discussion and agree with @sammccall about keeping the behavior consistent.

My thoughts are:

  • I am going to use vector<SymbolDetails> as a return type.
  • We need to land this patch soon-ish (ideally before the end of next week, hard deadline is mid-December) so we'd prefer not getting blocked by changing all XRefs functionality to single symbol.
  • I assume our client can deal with any (rare) occurrence of multiple symbols in some reasonable fashion (use the first one, ignore all or similar). @benlangmuir could you please confirm this?
jkorous updated this revision to Diff 175115.Nov 23 2018, 5:14 AM
jkorous marked 2 inline comments as done.

Couple minor changes based on discussion.

  • Move SymbolID to index/SymbolID.h.
  • Rename in ClangdServer - drop the verb from method name.
  • Remove conditional return in XRefs.cpp.
jkorous updated this revision to Diff 175137.Nov 23 2018, 12:28 PM

Multiple symbols per location.

Thanks, this looks good! Just nits, and please do port most of the test cases to unit tests.

clangd/ClangdServer.cpp
529 ↗(On Diff #175137)

nit: SymbolInfo

clangd/Protocol.cpp
429 ↗(On Diff #175137)

for each of the attributes that can be logically absent, we should probably serialize it conditionally (or serialize it as null).

We're usually happy enough to use sentinel values like "" to mean missing internally, but we try not to send them over the wire.
(SymbolInformation isn't a great example unfortunately...)

clangd/Protocol.h
691 ↗(On Diff #175137)

Optional<SymbolID>?
"" is a reasonable "absent" value for strings, but we don't particularly have one for symbolID

clangd/XRefs.cpp
771 ↗(On Diff #175137)

nit: just else if (const auto* ParentND = dyn_cast_or_null<NamedDecl>(ND->getDeclContext())?

785 ↗(On Diff #175137)

nit: push_back (and below)

clangd/index/SymbolID.h
58 ↗(On Diff #175137)

nit: missing include of llvm/ADT/Hashing.h

test/clangd/cursor-info.test
1 ↗(On Diff #175137)

(this file should be renamed)

jkorous marked 8 inline comments as done.Nov 26 2018, 4:58 AM

Thank you very much for the review Sam!

I am going to write proper unit tests and then just wait for Alex and Ben to take a look.

clangd/XRefs.cpp
785 ↗(On Diff #175137)

Sure, no problem. I have to admit I thought these two are the same but I guess you pointed this out because there's a functional difference. I'd appreciate if you could advise where can I learn the details.

jkorous updated this revision to Diff 175229.Nov 26 2018, 5:00 AM
jkorous marked an inline comment as done.

Addressed comments from the reivew.

jkorous updated this revision to Diff 175230.Nov 26 2018, 5:02 AM

Removed empty line noise and fixed doxygen annotation.

jkorous retitled this revision from [clangd][WIP] textDocument/CursorInfo method to [clangd][WIP] textDocument/SymbolInfo method.Nov 26 2018, 5:02 AM
sammccall accepted this revision.Nov 26 2018, 5:23 AM

LG from my side.
If you have unit tests in the next couple of days, happy to take a look, otherwise go ahead once Alex/Ben are happy.

(I'm at work today/tomorrow and then I'm going to be away through new year's, so don't wait for me)

clangd/ClangdServer.cpp
529 ↗(On Diff #175137)

(This still says CursorInfo in the runWithAST call)

clangd/ClangdServer.h
204 ↗(On Diff #175137)

Maybe mention "clangd extension" here.
We should really set up some more structured docs for these one day, but a comment is something.

clangd/Protocol.cpp
429 ↗(On Diff #175137)

elsewhere we tend to write:

json::Object Result{
 ... mandatory fields ...
};

if (!P.name.empty())

Result["name"] = P.name;
etc. I find this a little bit more readable than the ternary expressions (due to the need for conversions) but up to you.
clangd/XRefs.cpp
785 ↗(On Diff #175137)

Here the behavior is identical, so this is really a style thing.

emplace_back is a more powerful function, with semantics "make a new element": it will forward *any* argument list to a constructor of Symbol, selected by overload resolution.

push_back is a more constrained function, with semantics "insert this element": it takes one Symbol and forwards it to the move constructor (or copy constructor, as appropriate).

They're equivalent here because emplace_back can select the move-constructor when passed a single Symbol&&. But it communicates intent less clearly, so humans have a harder time (harder to understand the code) as does the compiler (error messages are worse).

This revision is now accepted and ready to land.Nov 26 2018, 5:23 AM
jkorous marked 5 inline comments as done.Nov 26 2018, 12:26 PM
jkorous added inline comments.
clangd/ClangdServer.cpp
529 ↗(On Diff #175137)

I sent my comments first and updated the diff a minute after. You are probably just too fast and saw the stale version :)

clangd/XRefs.cpp
785 ↗(On Diff #175137)

Thanks!

jkorous updated this revision to Diff 175314.Nov 26 2018, 12:27 PM
jkorous marked an inline comment as done.

Adressed last comments + initial unit tests.

LGTM, we can optimize later. It looks like you also moved SymbolID to a new file, this should be an NFC precommit.

arphaman accepted this revision.Nov 26 2018, 6:59 PM
sammccall accepted this revision.Nov 27 2018, 4:55 AM
sammccall marked an inline comment as done.

Tests look great, thanks!

clangd/ClangdServer.cpp
529 ↗(On Diff #175137)

Haha, then I reloaded and saw your changes, but couldn't find my comment to delete it...

jkorous updated this revision to Diff 175475.Nov 27 2018, 6:56 AM

Finished tests and fixed operator<<(SymbolDetails) for containerNames that aren't part of fully qualified name.

jkorous retitled this revision from [clangd][WIP] textDocument/SymbolInfo method to [clangd] textDocument/SymbolInfo method.Nov 27 2018, 6:56 AM

Going to split SymbolID.h as a NFC commit per Alex's suggestion and land this.
Thanks everyone!

This revision was automatically updated to reflect the committed changes.
thakis added a subscriber: thakis.Mar 25 2019, 7:13 AM

SymbolInfoTests.All takes > 20% of all of check-clang-tools time. Can the test be made faster?

Also, since it's using a large-ish non-pod initializer clang is currently slow to compile it, see http://llvm.org/viewvc/llvm-project?view=revision&revision=345329 / PRPR38829

Herald added a project: Restricted Project. · View Herald TranscriptMar 25 2019, 7:13 AM
Herald added a subscriber: jdoerfert. · View Herald Transcript

SymbolInfoTests.All takes > 20% of all of check-clang-tools time. Can the test be made faster?

Also, since it's using a large-ish non-pod initializer clang is currently slow to compile it, see http://llvm.org/viewvc/llvm-project?view=revision&revision=345329 / PRPR38829

SymbolInfoTests.All takes 300ms on my machine and all clangd tests take 12s. So it's definitely not >20% of check-clang-tools. What's your configuration? Any other tips for reproducing this?