This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Lib to compute and represent selection under cursor.
ClosedPublic

Authored by sammccall on Jan 31 2019, 7:55 PM.

Details

Summary

The primary problem this solves is to expose the codeAction selection to
AST-based refactorings in a way that makes it easy and efficient for them to
bind to the right parts of the AST.

It should also allow us to make XRefs based features (textDocument/definition)
more robust, more easily implement textDocument/typeDefinition etc.
As an example, template parameter references can be identified without special
handling.
There should be slight speedup too: we can prune most of the AST traversal
in most cases.

Elephant in the room: this is similar-but-different to Tooling/Refactoring/ASTSelection.
That captures a smaller set of AST nodes, has a slightly different way of
representing selections, and generally has mare features and does more work.
The overall shape is pretty similar, and yet I can't quite get to behave as I
expect.

Diff Detail

Repository
rL LLVM

Event Timeline

sammccall created this revision.Jan 31 2019, 7:55 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 31 2019, 7:55 PM
sammccall updated this revision to Diff 184672.Jan 31 2019, 8:07 PM

eliminate a gratuitous template

Awesome job. Any data on how much time this takes to build? Do you think this would ever be a bottleneck?

clangd/Selection.cpp
30 ↗(On Diff #184672)

Any reason to limit this to the main file and not take a FileID as a parameter? This looks like an artificial limitation, even if that's the only way we can use it now in clangd.

51 ↗(On Diff #184672)

NIT: maybe make the ctor private and create a static function? Alternatively, maybe create a free function that creates returns deque<Node>?
Would simplify the callsite a bit, making it a bit more readable.

113 ↗(On Diff #184672)

NIT: maybe specify auto twice?

auto B = ...;
auto E = ...;
158 ↗(On Diff #184672)

NIT: specify auto twice?

223 ↗(On Diff #184672)

Why not vector? We seem to only push_back/pop_back

224 ↗(On Diff #184672)

NIT: maybe split into three declarations?

sammccall marked 5 inline comments as done.Feb 1 2019, 2:14 AM

Awesome job. Any data on how much time this takes to build? Do you think this would ever be a bottleneck?

I don't have data yet, I need to add some tracing to D57570 and try it out with big files in VSCode.
Unless I'm *seriously* miscalculating though, it should be:

  • almost free on code that's not mostly macro-expanded where the selection is smallish (canSafelySkipNode optimization means not traversing much)
  • a fairly cheap RecursiveASTVisitor if the selection is fairly small (we just do push/pop and integer compares on SourceLocations, lexing only happens for nodes that actually are partially selected)
  • maybe quite expensive if you select-all on a huge file - this is what I should test. Could put some caps on size if needed.
clangd/Selection.cpp
30 ↗(On Diff #184672)

I can't think of anything that specificallly would break (unless someone passed a FileID corresponding to a macro...)

On the other hand, I can't think of any reason to allow it: this is very geared to user-driven selections, and users only ever see code in the main file. Do you have something in mind?

(In the absence of a need for it, I find plumbing an extra parameter and losing a little bit of concreteness in the mental model to be the main reasons to resist...)

113 ↗(On Diff #184672)

hmm, when these are logically pairs I prefer to omit the second auto, to emphasize that they're the same type...
is there a gotcha here?

223 ↗(On Diff #184672)

deque guarantees pointer stability, if we used vectors then push/pop would break the parent/child links between nodes.

224 ↗(On Diff #184672)

as above, I think SelBegin/SelEnd belong together.
Split out SelBeginTokenStart and gave it a lengthy comment, as it's pretty subtle.

sammccall updated this revision to Diff 184699.Feb 1 2019, 2:15 AM

Organization in Selection.cpp

The "select everything in a very large file" is exactly the problematic case that came to mind before. I'm sure it's possible to build the selection tree in linear time, but the traversals might also become a problem.

For the code tweaks, we definitely don't want each tweak traversing the whole AST. There are multiple potential options there, e.g. cutting the tree at a reasonable depth.

clangd/Selection.cpp
30 ↗(On Diff #184672)

Nothing in particular, it was just a code style thing, I sometimes find more generic functions easier to read, e.g. in this case I won't need to keep the fact that we're calling getMainFileID() because selections are currently only used in the main file (even though in principle they're also applicable to any file).

Not a big deal, though, I'm happy either way.

113 ↗(On Diff #184672)

Only if one starts using pointers(int *a, *b), but that alone is enough of a reason for me to keep away from multiple declarators in a single declaration.

I think in most of our code in clangd we tend to use separate declarations, so we may also want to keep away from them for consistency.

Not a big deal, though, keeping them is also totally fine.

223 ↗(On Diff #184672)

Got you! And thanks for adding a comment.

sammccall marked 10 inline comments as done.Feb 1 2019, 5:38 AM

The "select everything in a very large file" is exactly the problematic case that came to mind before. I'm sure it's possible to build the selection tree in linear time, but the traversals might also become a problem.

(Just so we're on the same page: this is good discussion regardless, but do you see this as an issue we should try and resolve in this patch?)

So recursive traversal downward is definitely the case that could be slow. My goal was to make it hard to be slow and I didn't entirely succeed (yet?).

I don't expect that to be a really common pattern. More common is traversing *upward* from the LCA (cheap because no branching), and possibly traversing downward but only looking for fairly specific nodes.

For the code tweaks, we definitely don't want each tweak traversing the whole AST. There are multiple potential options there, e.g. cutting the tree at a reasonable depth.

Another fairly natural-seeming idea is not to store children of Completely covered nodes - as soon as you hit something that's completely covered you should take it or leave it.
This should keep the number of nodes in the tree low, because it's hard to fractally select half of all parts of the tree - in practice you're selecting big subtrees.
However it's slightly tricky from an implementation point of view: we should still require a node own at least one character to be complete, which requires knowing the children, which we only know after traversal, which means it's too late to prune traversal of this subtree. And I'm also not sure if this might be too limiting for some tweaks we want to write.

clangd/Selection.cpp
30 ↗(On Diff #184672)

Fair enough. I've put a FileID member in SelectionVisitor, and have removed all the hardcoded main-file references, it's clearer and the plumbing is only a little noisy. IIUC the code-style thing mainly applies to the implementation?

I'll still keep this out of the *public* API for now though, until we have some need for it.
The SelectionTree constructors document that offsets are with respect to the main file, and I don't think that interface is particularly prone to misinterpretation.

113 ↗(On Diff #184672)

Expanded them all to separate decls, this is indeed more common.

sammccall updated this revision to Diff 184732.Feb 1 2019, 5:42 AM
sammccall marked 2 inline comments as done.

Review comments

ilya-biryukov accepted this revision.Feb 1 2019, 5:53 AM

LGTM and sorry for delaying this with discussions.
The change looks solid, regardless of potential problems we may still run into.

clangd/Selection.cpp
30 ↗(On Diff #184672)

Totally agree, public API is a totally different beast, the comment only applied to the implementation.

This revision is now accepted and ready to land.Feb 1 2019, 5:53 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptFeb 1 2019, 7:05 AM

(And another review that omitted lists..)