This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Add support for hierarchical documentSymbol
ClosedPublic

Authored by ilya-biryukov on Sep 20 2018, 9:11 AM.

Diff Detail

Repository
rL LLVM

Event Timeline

ilya-biryukov created this revision.Sep 20 2018, 9:11 AM
ilya-biryukov planned changes to this revision.Sep 20 2018, 9:14 AM

Posted to make sure it's visible that I've started doing this.
Still need to update the tests and check for the capability from the client (and fallback to SymbolInformation if client does not support the new implementation).

Nevertheless, it works and looks really good in VSCode.

Will update the thread when the code is ready for review.

Ohh awesome, I didn't know the LSP supported that. I'll try it it Theia when I have time.

simark added a comment.Oct 4 2018, 2:38 PM

I just tried this, this looks very promising! It should help build our outline view in a much more robust way than we do currently.

A nit for the final patch, I would suggest omitting the fields that are optional, like children (when the list is empty) and deprecated.

In vscode, is there a way to get a tree representation of this data? When I look at "Go to symbol in File..." (ctrl-shift-o) or the outline view at the bottom of the file explorer, they are both a flat list. What difference does this patch make in how vscode shows the data?

I just tried this, this looks very promising! It should help build our outline view in a much more robust way than we do currently.
A nit for the final patch, I would suggest omitting the fields that are optional, like children (when the list is empty) and deprecated.

SG, will do.

In vscode, is there a way to get a tree representation of this data? When I look at "Go to symbol in File..." (ctrl-shift-o) or the outline view at the bottom of the file explorer, they are both a flat list. What difference does this patch make in how vscode shows the data?

There's an outline view that shows the tree after this patch.
IIRC, the "Go to symbol in File..." also shows the tree view now without the lack of functionality like filtering by name, etc.
Overall, the experience with a tree seemed strictly better in all scenarios for me.

  • Improve traversal of the AST.
  • Update the tests.
  • Add a fallback to SymbolInformation (flat structure) if the client does not support the hierarhical reponse.

Still some work left to do:

  • Do not drop the qualifiers for out-of-line declarations, i.e. 'X::foo' is presented as 'foo' now, which is a bit confusing.
  • See why some names pop up at the global scope (GTest generates macros like that).
ilya-biryukov added a comment.EditedOct 25 2018, 5:57 AM

Another annoyance to fix:

  • 'using namespace any_ns' is shown as '<using-directive>'

Very nice! Mostly just a few style/structure nits.

clangd/AST.cpp
69 ↗(On Diff #171067)

just use ND?

clangd/ClangdLSPServer.cpp
28 ↗(On Diff #171067)

move this nearer the call site?

29 ↗(On Diff #171067)

or flattenSymbolHierarchy?

31 ↗(On Diff #171067)

auto-typed lambdas can't be recursive, but std::function shouldn't be too slow here? And clearer I think:

vector<SymbolInformation> Results;
std::function<void(const DocumentSymbol&, StringRef)> Process = [&](const DocumentSymbol &Sym, StringRef ParentName) {
  ...
};
for (const auto& TopLevel : Symbols)
  Process(TopLevel);
return Results;

failing that, I found this pattern confusing - I'd suggest either making it a recursive plain function, or a standard functor object as if it were a lambda (with the recursive call being operator())

44 ↗(On Diff #171067)

nit: optional<StringRef> for parent name seems slightly neater

clangd/FindSymbols.cpp
190 ↗(On Diff #171067)

may want to add a FIXME here: per the tests, it's not classifying constructor/destructor/operator correctly (they're all "methods").

cons/dest is just indexSymbolKindToSymbolKind needing an update, but operator isn't represented in index::SymbolKind.
Maybe we should stop using index::SymbolKind entirely, it doesn't appear to be great.

208 ↗(On Diff #171067)

this class seems maybe too long to live inside this function. (I wouldn't worry about it being visible to other stuff, the file is small and focused)

208 ↗(On Diff #171067)

(this looks more like a class than a struct?)

208 ↗(On Diff #171067)

should this be a RecursiveASTVisitor?
I guess not, but please explain why (there's no documentation on the implementation strategy, and it's complicated)

211 ↗(On Diff #171067)

again, doing the work in the constructor is a bit odd - own the data and expose it?

253 ↗(On Diff #171067)

we only consider visiting named decls, maybe reflect that here?
(not needed now but may allow future refinement)

264 ↗(On Diff #171067)

isn't this covered by D->isImplicit?

276 ↗(On Diff #171067)

isn't this covered by D->isImplicit() above?

clangd/Protocol.h
43 ↗(On Diff #171067)

why?

154 ↗(On Diff #171067)

just an overload of contains()?

unittests/clangd/FindSymbolsTests.cpp
48 ↗(On Diff #171067)

ChildrenAre() (with no args) and NoChildren() are the same matcher I think.

I'd suggest flattening Children() into ChildrenAre() and renaming it to Children() - less things for the reader to understand.

442 ↗(On Diff #171067)

this one is to be fixed, right?

521 ↗(On Diff #171067)

hmm, this seems pretty confusing - I think Tmpl<float> would be a clearer name for a specialization, even if we just have Tmpl for the primary template.
Partial specializations are confusing, though :-/

523 ↗(On Diff #171067)

why the change in policy with this patch? (one of these previously was deliberately not indexed, now is)

524 ↗(On Diff #171067)

these assertions are confusing as it's not clear which expected belongs with which piece of actual code. Obviously they all need to have the same name though.
Best workaround I can think of is putting "int a", "int b" etc between the confusing decls, and including them in the expected top-level symbols (since you assert in order)

571 ↗(On Diff #171067)

hmm, our handling of anonymous enums seems different than anonymous namespaces - why?

ilya-biryukov marked 15 inline comments as done.
  • Address comments
clangd/FindSymbols.cpp
190 ↗(On Diff #171067)

Added a FIXME.
From what I can tell, writing our own classification function shouldn't be too much work and would clearly fix those cases. We should do this!

208 ↗(On Diff #171067)

Added a comment.

264 ↗(On Diff #171067)

Nope, IIUC things are only marked "implicit" in clangd if they were generated by the compiler from the start.
For this case, the initial template code was written by the user, so isImplicit() returns false for both the template and its instantiations.

276 ↗(On Diff #171067)

See the other comment.

clangd/Protocol.h
43 ↗(On Diff #171067)

Sorry, accidental change. Reverted.

unittests/clangd/FindSymbolsTests.cpp
442 ↗(On Diff #171067)

Why? The outline view gives both the declaration and the definition, since both were written in the code.
That seems to be in line with what I'd expect from the outline view.

521 ↗(On Diff #171067)

Done. Now prints as Tmpl<float>.
This may also include some arguments not written by the users (e.g. some default args), but added a FIXME to fix this, it's not entirely trivial

523 ↗(On Diff #171067)

We might have different goals in mind here.
From my perspective, the purpose of the outline is to cover all things written in the code: there are 4 decls that the user wrote: one primary template, one template specialization and two template instantiations.

What is your model here?

571 ↗(On Diff #171067)

No rigorous reasons. The namespaces tend to group more things together, so it's arguably more useful to have an option of folding their subitems in the tree.
Anonymous enums are typically used only in the classes (and on the top-level in C to define constants?) and I feel having an extra node there merely produces noise.
Happy to reconsider, what's your preferred behavior?

  • Remove accidentally added qualifiers
sammccall accepted this revision.Nov 22 2018, 6:54 AM
sammccall added inline comments.
unittests/clangd/FindSymbolsTests.cpp
442 ↗(On Diff #171067)

Sorry, I meant the name should be Foo::def instead of def, which you mentioned in a previous comment.

OK to land without this, but I think it deserves a fixme.
(If you think this is the *right* behavior, let's discuss further!)

521 ↗(On Diff #171067)

Nice! I thought this was going to be really hard.

(You can specialize without naming the defaulted args? That sounds... obscure. Definitely fine to leave for later)

523 ↗(On Diff #171067)

I don't have a particular opinion, just wondering if this was a deliberate change or fell out of the implementation.

(The old behavior was tested, so I guess Marc-Andre had a reason, but it didn't come up in the review. I don't think this is common enough to worry much about either way)

571 ↗(On Diff #171067)

I *think* I might prefer the opposite - namespaces flat and enums grouped :-)
I'm not certain nor adamant about either, so consider my thoughts below, and pick something.

Namespaces: anonymous namespaces are mostly (entirely?) used to control visibility. I don't think they *group* in a meaningful way, they just hide things like static does. In a perfect world I think there'd be a "private" bit on outline elements that was shown in the UI, and we'd put it on both things-in-anon-namespaces and static-as-in-private decls. The main counterargument I can see: because we typically don't indent inside namespaces, the enclosing anonymous namespace can be hard to recognize and jump to.

Enums: conversely, I think these _typically_ do group their elements, and there's often a strong semantic boundary between the last entry in the enum and the first decl after it. When people use a namespacing prefix (C-style VK_Foo, VK_Bar) then it's visually noticeable, but this is far from universal in C++ (particularly at class scope).

Consistency: It's not *really* confusing if these are different, just *slightly* confusing. Anonymous structs etc must certainly be grouped, so this is a (weak) argument for grouping everything.

This revision is now accepted and ready to land.Nov 22 2018, 6:54 AM
ilya-biryukov added inline comments.Nov 23 2018, 5:52 AM
unittests/clangd/FindSymbolsTests.cpp
442 ↗(On Diff #171067)

Oh, yeah, that I agree with! Will look into fixing this in a follow-up change.

521 ↗(On Diff #171067)

Yeah, you can. It's especially common with function where specializations don't name any arguments and those are inferred from the function type...

523 ↗(On Diff #171067)

Yeah, this fell out of the implementation of walking over the "user-written" part of the AST and it looked sensible so I updated the test.

571 ↗(On Diff #171067)

I'm sold on the consistency argument. Besides, the anonymous enums are not too frequent anyway, so treating them in a special way might be an overkill.
We can always tweak this in the future.

  • Do not drop anonymous enums
  • Handle 'using namespace' in printName
  • Do not mix-in symbols with locations from other files.
This revision was automatically updated to reflect the committed changes.