This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Add targetDecl(), which determines what declaration an AST node refers to.
ClosedPublic

Authored by sammccall on Aug 26 2019, 9:47 AM.

Details

Summary

This is the first part of an effort to "unbundle" our libIndex use into separate
concerns (AST traversal, token<->node mapping, node<->decl mapping,
decl<->decl relationshipes).

Currently, clangd relies on libIndex to associate tokens, AST nodes, and decls.
This leads to rather convoluted implementations of e.g. hover and
extract-function, which are not naturally thought of as indexing applications.

The idea is that by decoupling different concerns, we make them easier
to use, test, and combine, and more efficient when only one part is needed.
There are some synergies between e.g. traversal and finding
relationships between decls, hopefully the benefits outweight these.

Diff Detail

Repository
rL LLVM

Event Timeline

sammccall created this revision.Aug 26 2019, 9:47 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 26 2019, 9:47 AM
ilya-biryukov added a comment.EditedAug 27 2019, 2:12 AM

Mostly LGTM, thanks! Just a few clarifying questions and suggestions

This API allows us to get a target declaration for a reference, but does not help with finding the source locations for those references. Do you plan to put this into the API? Have this as a separate function?
E.g. finding a source location of field for DeclRefExpr produced for MyBase::field seems to be the same amount of work (in general case) as finding the Decl* of field.

clang-tools-extra/clangd/FindTarget.cpp
374 ↗(On Diff #217181)

NIT: maybe add braces around multi-line body of the for statement?

clang-tools-extra/clangd/FindTarget.h
70 ↗(On Diff #217181)

Could we add convenience overloads for Expr*, QualType, TypeLoc and maybe other interesting cases?
Having to call DynTypedNode::create for each invocation of these two functions would feel awkward, saving some typing for the clients seems to be worth a little boilerplate.

86 ↗(On Diff #217181)

Could you provide examples for Alias and Underlying in the comments?
One with a template and one with a namespace alias

92 ↗(On Diff #217181)

Could you also provide an example here?
It this a delegating constructor in the constructor-init-list?

103 ↗(On Diff #217181)

Why not unsigned? What are the interesting corner cases?

I agree with Ilya's concerns on SourceLocations, even though most of the time user will have the source locations available in the dyntyped node, they might need some traversals to fetch the exact range especially in cases of nested name specifiers.
It would be nice to provide a referencing range in addition to the decls.

clang-tools-extra/clangd/FindTarget.cpp
79 ↗(On Diff #217181)

nit: s/SD/CRD/

clang-tools-extra/clangd/FindTarget.h
97 ↗(On Diff #217181)

Could you rather put an end marker into the DeclRelation and make use of it in here? As people might forget to update this one if they add anything into DeclRelation.

(It seems rare for this enum to change though, so feel free to ignore if you believe it will stay the same).

clang-tools-extra/clangd/unittests/FindTargetTests.cpp
49 ↗(On Diff #217181)

can't you just call A.range() ? :D
I hope it wasn't clangd that somehow code-completed this.

First a wall of text to reply to the general comments. Let's chat if this doesn't make sense.

Mostly LGTM, thanks! Just a few clarifying questions and suggestions

This API allows us to get a target declaration for a reference, but does not help with finding the source locations for those references. Do you plan to put this into the API? Have this as a separate function?

Yes. I actually ripped this out of the patch, because it wasn't related enough and the patch is large.

To restate the relationship between these functions:
Many operations ultimately want to join source location and decl, e.g. gtd is location -> node -> decl, xrefs is decl -> node -> location, indexing is node -> (location, decl).
So providing node->decl and node->location primitives is sufficient. If you want to go in the other direction (decl->node or location->node) then of course you need to traverse and/or build an index.

With that said, there are several legitimate node->location models, and I think this is a good reason to to keep that decoupled from node->decl (testability is another). In particular:

the "flat" model

says that each node may have a token which is its "handle", and each token can be the handle for (refer to) only one node.

int a = X::y;
iii a   X  y

What is the range for a node? Its handle token.
What is the node for a range? The unique handle token covered by the range.

Strengths: accurate, precise triggering on node names. simplicity (conforms to user's view of the code)
Weaknesses: multi-token ranges (selections), implicit nodes (they don't get to have locations), requires specific per-node implementation
Use case: go-to-definition/xrefs triggering, xrefs results (nodes referenced by name)

the "hierarchical" model

says that each node owns a range of tokens, and these nest.

int a = X::y;
        X
        XXX
iii     yyyy
aaaaaaaaaaaa

What is the range for a node? Its full range.
What is the node for a range? The innermost node that entirely contains the selection.

Strengths: exposing underlying structure/hierarchy of the code, can be (mostly) implemented generically
Weaknesses: loose triggering that doesn't distinguish e.g. node names from qualifiers
Use case: extract variable, structured selection, xrefs results (nodes referenced implicitly)


I think we're going to need/want both. The hierarchical model corresponds to SelectionTree. The flat model is the thing we need to build. libIndex looks more like the flat model than the hierarchical one, but has elements of both I think.
(If you think the above explanation is useful, we can check something like it in somewhere)

E.g. finding a source location of field for DeclRefExpr produced for MyBase::field seems to be the same amount of work (in general case) as finding the Decl* of field.

For the hierarchical model it's easy, this is one of the few methods DynTypeNode actually has :-)
For the flat model: it's not trivial, but is less work because while you do have to dispatch over all node types, you don't have general multi-hop/graph cases (I think).
Usually the "handle" tokens are well-supported by the AST because they're precisely the ones that diagnostics want to report. So in the DeclRefExpr case, DeclRefExpr::getLoc() returns the right thing.

I agree with Ilya's concerns on SourceLocations, even though most of the time user will have the source locations available in the dyntyped node, they might need some traversals to fetch the exact range especially in cases of nested name specifiers.

I think if you want the (full) range, this is easy as mentioned above: getSourceRange() (available on DynTypedNode, but also on each specific node type).
If you want just the "handle" token for the flat model, NNS isn't a particularly hard case: it's basically a linked list. Each node claims one component of the NNS as its "handle". If you want to treat NNS as part of the parent (so pointing at X in X::y counts as a ref to Y) then I think you're better off with the hierarchical model.

It would be nice to provide a referencing range in addition to the decls.

Concretely, can you give an example of which node and what you'd want it to point to? If it's just the single unqualified-name token, I agree and would like to add it (as a separate patch). If it's the full range, I think that's getSourceRange(). Is it something else?

sammccall updated this revision to Diff 217392.Aug 27 2019, 7:15 AM
sammccall marked 11 inline comments as done.

review comments

clang-tools-extra/clangd/FindTarget.cpp
79 ↗(On Diff #217181)

Done.
(This was copy/pasted from libIndex, I wonder what SD stood for. StructDecl?)

clang-tools-extra/clangd/FindTarget.h
86 ↗(On Diff #217181)

Done. Added the code example as a section above the values, and the results with the comment on each value - hope this isn't too confusing.

92 ↗(On Diff #217181)

It was, but it was misguided - the delegating constructor calls have a TypeLoc which references the type being delegated to, so there's not much point handling them I think.

(There would be if we could get the constructor, but we can't: I guess in general it may not be resolved)

This was the only user of PureLexical, so I just removed it.

97 ↗(On Diff #217181)

Hmm, I'm reluctant to add a "last" member to a strongly-typed enum, and force switch to handle it etc.

I could add it next to the enum, but then it's ugly and still easy enough to miss.

Ultimately, I don't think it's likely this will be missed: bitset::set() checks its argument and throws out_of_range (i.e. aborts) if we try to set an invalid bit. So any test that exercised a new value would crash in all build configs.

103 ↗(On Diff #217181)

Oops, this dates back to when I was prematurely-templatizing this class. Reverted.

clang-tools-extra/clangd/unittests/FindTargetTests.cpp
49 ↗(On Diff #217181)

A.range() is an LSP Location (line/col pairs), offsets seem slightly more convenient here.

This is ugly though, using Annotations with SourceLocations seems unneccesarily clunky. Ideas?

(this reminds me, these test helpers need comments, added some)

kadircet accepted this revision.Aug 28 2019, 12:26 AM

Concretely, can you give an example of which node and what you'd want it to point to? If it's just the single unqualified-name token, I agree and would like to add it (as a separate patch). If it's the full range, I think that's getSourceRange(). Is it something else?

Yes I was talking about for the single unqualified name.

clang-tools-extra/clangd/unittests/FindTargetTests.cpp
49 ↗(On Diff #217181)

ah sorry somehow thought this was already an llvm::Annotations. You seem to be only using Code and Base::range and they are both coming from llvm::Annotations already. Any reason for not using it directly instead of clangd::Annotations ?

This revision is now accepted and ready to land.Aug 28 2019, 12:26 AM
sammccall marked an inline comment as done.

use llvm::Annotations

ilya-biryukov accepted this revision.Aug 29 2019, 2:36 AM

Yes. I actually ripped this out of the patch, because it wasn't related enough and the patch is large.

Makes sense and thanks for summing it up. Just wanted to make sure we are thinking about this use-case.

My comment was specifically referring to the lack of a "flat" model in our codebase.
I believe hierarchical model is very well handled by the selection tree (there might be bugs, obviously, but conceptually it's doing the right thing).

E.g. finding a source location of field for DeclRefExpr produced for MyBase::field seems to be the same amount of work (in general case) as finding the Decl* of field.

For the hierarchical model it's easy, this is one of the few methods DynTypeNode actually has :-)
For the flat model: it's not trivial, but is less work because while you do have to dispatch over all node types, you don't have general multi-hop/graph cases (I think).
Usually the "handle" tokens are well-supported by the AST because they're precisely the ones that diagnostics want to report. So in the DeclRefExpr case, DeclRefExpr::getLoc() returns the right thing.

To be clear, the concern is not that it's hard to find the corresponding locations for each particular AST node.
The concern is that it's hard to cover all possible nodes and we should do it once rather than repeat this multiple times.

DeclRefExpr was a bad example for that purpose. E.g. one could make the same claim about targetDecl function we are introducing: it's very easy to get a target decl of DeclRefExpr,
but it's hard to enumerate all nodes that can have target decls and ensure they are handled properly.

sammccall updated this revision to Diff 218432.Sep 3 2019, 4:19 AM

fix accidental reverted change

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptSep 3 2019, 4:34 AM