This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Type hierarchy subtypes
ClosedPublic

Authored by nridge on Mar 2 2019, 5:22 PM.

Details

Summary

This builds on the relations support added in D59407, D62459, D62471, and D62839 to implement type hierarchy subtypes.

Diff Detail

Repository
rL LLVM

Event Timeline

nridge created this revision.Mar 2 2019, 5:22 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 2 2019, 5:22 PM
nridge updated this revision to Diff 189093.Mar 3 2019, 11:40 AM

Rebased onto updated D56370

Removing Sam as reviewer as he's on parental leave.

Kadir, if you have other reviewers to suggest to provide feedback on the API changes here, please feel free to add them. Thanks!

For starters, what about just introducing a new field Bases into the Symbol. Which can store symbol IDs for the parents of the mentioned symbol, then during index build we can simply add same relations from base to this symbol. This should help us get rid of RelationSlab logic.
If there wasn't a specific reason that I missed for introducing RelationSlab logic.

Sorry I didn't notice the mailing thread beforehand, looks like Sam had a good point regarding performing operations on types rather than symbols(http://lists.llvm.org/pipermail/clangd-dev/2019-January/000241.html).
Does current implementation take this into account by either pointing at template specializations and/or base template declarations?

nridge updated this revision to Diff 189275.Mar 4 2019, 9:52 PM

Add tests involving templates and template specializations

nridge added a comment.Mar 4 2019, 9:56 PM

Sorry I didn't notice the mailing thread beforehand, looks like Sam had a good point regarding performing operations on types rather than symbols(http://lists.llvm.org/pipermail/clangd-dev/2019-January/000241.html).
Does current implementation take this into account by either pointing at template specializations and/or base template declarations?

I thought about this, and I believe that having the relationship be between symbols is sufficient.

The main utility in a type hierarchy view is to allow the user to navigate to base/derived classes. If a base or derived class is an implicit specialization, the desired navigation is to the corresponding template definition; if it's an explicit specialization, the desired navigation is to the explicit specialization's definition.

A class template will obviously have its own symbol, so that part's fine. Explicit specializations are not currently stored in the index, but I think it would make sense for them to be, for other reasons as well (e.g. a user may want to navigate to an explicit specialization definition via "workspace/symbol"), in which case they too will get their own symbols. So, in either case, the desired type hierarchy item can be obtained from a symbol.

The updated patch has tests covering these scenarios, specifically Subtypes.ClassTemplate and Subtypes.DependentBase. Two additional tests, Subtypes.TemplateSpec1 and Subtypes.TemplateSpec2 are disabled until we start storing explicit specializations in the index.

hokein added a subscriber: hokein.Mar 5 2019, 4:17 AM

Haven't looked at the patch. I think we can extend the existing Ref to support it, so that most of the stuff could be reused, rather than implementing a new slab:

  • introduce a new RefKind, like BaseOf
  • add a new field SymbolID in Ref

and clangIndex library has already implemented relationship functionality (we could use handleDeclOccurence to store the symbols relationships).

I think we can extend the existing Ref to support it, so that most of the stuff could be reused, rather than implementing a new slab:

  • introduce a new RefKind, like BaseOf
  • add a new field SymbolID in Ref

I had considered this approach as well, but figured that it would be wasteful in terms of storage space.

My understanding is that the storage space taken up for Refs is currently 8 bytes per Ref (4 each for the start and end positions), plus filename strings which are deduplicated across all refs. If we add a SymbolID, that adds an additional 8 bytes to each Ref. Given that Refs are numerous, and most of them won't use the SymbolID, that seems wasteful.

That said, I do appreciate that this is a simpler approach in terms of complexity, so if folks feel the right tradeoff is to take the Refs approach, I am open to doing that.

My understanding is that the storage space taken up for Refs is currently 8 bytes per Ref (4 each for the start and end positions), plus filename strings which are deduplicated across all refs. If we add a SymbolID, that adds an additional 8 bytes to each Ref. Given that Refs are numerous, and most of them won't use the SymbolID, that seems wasteful.

I see. This is a very good observation, we didn't consider increase in memory usage when we discussed your patch. Sorry about that! We discussed your patch again and we agree with the approach of adding a RelationsSlab.

Here are the options we considered, and the trade-offs we identified.

Option 1: Add SymbolID to Ref. Advantage: simpler API, fewer types in the API. Disadvantage: increased memory usage, as you noted.

Option 2: Add Relation struct, but store them in RefSlab, using a compact representation that skips storing SymbolID where it is not present. Advantages: we avoid adding a type that's almost an exact duplicate of RefSlab, we don't unnecessarily increase memory usage. Disadvantages: tricky code to avoid storing SymbolID, and RefSlab now stores both Refs and Relations, which is confusing from the API point of view.

Option 3 (what your patch implements): Add Relation struct and RelationSlab. Advantage: API matches the data model, we don't unnecessarily increase memory usage. Disadvantage: have to plumb the new slab everywhere in the code.

Option 4: Add a Relation struct. Change RefSlab to a template so that it can store either Refs, or Relations. Advantage: API matches the data model, we don't unnecessarily increase memory usage, we avoid duplicating RefSlab type. Disadvantage: possibly complex template in a in important clangd API, we avoided duplicating at most 100 lines, but we still have to plumb the new slab everywhere in the code.

clang-tools-extra/clangd/index/FileIndex.h
125 ↗(On Diff #189275)

AllSlabs? And even better, a struct?

nridge planned changes to this revision.Mar 7 2019, 6:02 PM

Thanks for the detailed analysis!

For completeness, I think another advantage of option 3 is that it allows us to make the key for relation lookups be a (SymbolID, RelationKind) pair rather than just a SymbolID.

It sounds like I should continue implementing the current approach. Marking the revision as "changes planned" accordingly.

Hi Nathan,

I would also suggest splitting up current changes so that we can start reviewing them, which might result in other changes in your planned changes and help reduce duplicate work both on our and your side.
Also please have a look at D59083, and make sure it helps implement what you have in might and let me know if there's anything missing.

As for splitting changes I would suggest something like:

  • RelationSlab
  • Serialization/Deserialization of the RelationSlab
  • Integrating RelationSlab with SymbolCollector
  • Adding queries to index structures(memindex, dex, mergedindex etc.)
  • Surfacing those queries through typehierarchy request

Hi Nathan,

I would also suggest splitting up current changes so that we can start reviewing them, which might result in other changes in your planned changes and help reduce duplicate work both on our and your side.
[...]

As for splitting changes I would suggest something like:

  • RelationSlab
  • Serialization/Deserialization of the RelationSlab
  • Integrating RelationSlab with SymbolCollector
  • Adding queries to index structures(memindex, dex, mergedindex etc.)
  • Surfacing those queries through typehierarchy request

Thanks. The proposed split looks good to me. I've posted the first patch here: https://reviews.llvm.org/D59407

Also please have a look at D59083, and make sure it helps implement what you have in might and let me know if there's anything missing.

I verified that with D59083 + D59354, I'm able to get my template specialization tests passing. Thanks for working on those!

nridge updated this revision to Diff 203073.Jun 4 2019, 10:12 PM

Rebased onto D62839 and predecessors

nridge planned changes to this revision.Jun 4 2019, 10:12 PM
nridge retitled this revision from [WIP] [Looking for API feedback] [clangd] Type hierarchy subtypes to [WIP] [clangd] Type hierarchy subtypes.Jun 4 2019, 10:13 PM
nridge edited the summary of this revision. (Show Details)
nridge edited the summary of this revision. (Show Details)
nridge updated this revision to Diff 204201.Jun 11 2019, 6:31 PM

Rebase, add lit test, and post for review

nridge retitled this revision from [WIP] [clangd] Type hierarchy subtypes to [clangd] Type hierarchy subtypes.Jun 11 2019, 6:32 PM
nridge edited the summary of this revision. (Show Details)
nridge updated this revision to Diff 204208.Jun 11 2019, 7:39 PM

Get disabled tests passing

mostly LG, thanks!

clang-tools-extra/clangd/XRefs.cpp
1057 ↗(On Diff #204208)

nit: could we directly use THI.range.start.line (same for the following 3 lines)

1061 ↗(On Diff #204208)

a few different approaches that comes to my mind:

  • store the full range in index.
  • check AST cache to see if we have AST for CD.FileURI, and use that decl.
  • build AST for CD.FileURI
  • heuristically parse CD.FileURI

Could you create a bug report in https://github.com/clangd/clangd/issues ?

1064 ↗(On Diff #204208)

yeah it would be great to have a Location symbolToLocation(const Symbol& S);

1071 ↗(On Diff #204208)

you do not need WorkspaceRoot, you can pass the translationunit getTypeHierarchy was triggered on. Which is available in File parameter.

Also it always exists whereas WorkspaceRoot might be missing depending on the editor.

1212 ↗(On Diff #204208)

nit: braces(just the inner one, keep the outer one)

clang-tools-extra/clangd/unittests/TypeHierarchyTests.cpp
459 ↗(On Diff #204208)

bool GotResult = false;

462 ↗(On Diff #204208)
EXPECT_FALSE(GotResult);
GotResult = true;
466 ↗(On Diff #204208)

EXPECT_TRUE(GotResult);

470 ↗(On Diff #204208)

Maybe call it Subject instead of Type ?

478 ↗(On Diff #204208)

could you add another child that derives Parent to check multiple children case?

479 ↗(On Diff #204208)

could you get rid of member fields and put definitions into single lines to make test smaller?

505 ↗(On Diff #204208)

same suggestion for member fields and single line definitions

nridge updated this revision to Diff 204696.Jun 13 2019, 10:00 PM
nridge marked 13 inline comments as done.

Address review comments

nridge added inline comments.Jun 13 2019, 10:00 PM
clang-tools-extra/clangd/XRefs.cpp
1061 ↗(On Diff #204208)
kadircet accepted this revision.Jun 14 2019, 12:51 AM

LGTM with a few small comments.

Thanks for implementing this awesome feature!

clang-tools-extra/clangd/FindSymbols.cpp
42 ↗(On Diff #204696)

let's rather return an llvm::Expected<Location> and propagate the error messages in the error then log in the caller site

clang-tools-extra/clangd/FindSymbols.h
21 ↗(On Diff #204696)

let's rather #include "Symbol.h"

clang-tools-extra/clangd/XRefs.cpp
1052 ↗(On Diff #204696)

this part looks a little messed up ?

I suppose you wanted to propagate TUPath to symbolToLocation right?

clang-tools-extra/clangd/XRefs.h
139 ↗(On Diff #204696)

I believe TUPath is not used if Index is nullptr, maybe replace those two?

This revision is now accepted and ready to land.Jun 14 2019, 12:51 AM
nridge updated this revision to Diff 204937.Jun 15 2019, 7:27 PM
nridge marked 5 inline comments as done.

Address remaining review comments

clang-tools-extra/clangd/XRefs.cpp
1052 ↗(On Diff #204696)

Good catch! Fixed.

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptJun 15 2019, 7:28 PM