This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Fall back to selecting token-before-cursor if token-after-cursor fails.
ClosedPublic

Authored by sammccall on Dec 11 2019, 4:20 AM.

Details

Summary

The problem:

LSP specifies that Positions are between characters. Therefore when a position
(or an empty range) is used to target elements of the source code, there is an
ambiguity - should we look left or right of the cursor?

Until now, SelectionTree resolved this to the right except in trivial cases
(where there's whitespace, semicolon, or eof on the right).
This meant that it's unable to e.g. out-line int foo^() today.

Complicating this, LSP notwithstanding the cursor is *on* a character in many
editors (mostly terminal-based). In these cases there's no ambiguity - we must
"look right" - but there's also no way to tell in LSP.

(Several features currently resolve this by using getBeginningOfIdentifier,
which tries to rewind and supports end-of-identifier. But this relies on
raw lexing and is limited and buggy).

Precedent: well - most other languages aren't so full of densely packed symbols
that we might want to target. Bias-towards-identifier works well enough.
MS C++ for vscode seems to mostly use bias-toward-identifier too.
The problem with this solution is it doesn't provide any way to target some
things such as the constructor call in Foo^(bar());

Presented solution:

When an ambiguous selection is found, we generate *both* possible selection
trees. We try to run the feature on the rightward tree first, and then on the
leftward tree if it fails.

This is basically do-what-I-mean, the main downside is the need to do this on
a feature-by-feature basis (because each feature knows what "fail" means).
The most complicated instance of this is Tweaks, where the preferred selection
may vary tweak-by-tweak.

Diff Detail

Event Timeline

sammccall created this revision.Dec 11 2019, 4:20 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 11 2019, 4:20 AM
sammccall planned changes to this revision.Dec 11 2019, 4:20 AM

This needs tests, but wanted to share the ideas.

sammccall updated this revision to Diff 233364.Dec 11 2019, 7:11 AM

Make tweaktests do the fallback dance so that they test how the feature actually works.

Use callbacks to avoid creating too many trees eagerly. Add tests.

sammccall edited the summary of this revision. (Show Details)Dec 11 2019, 10:41 AM
sammccall added reviewers: kadircet, ilya-biryukov.

So a reasonable outcome here is we conclude this does more harm than good.
If that happens, I'd like to be clear on whether we have a better way of doing this in general or just don't think it's worth the cost, and should hack around it where possible as in D71284

Unit tests: pass. 60717 tests passed, 0 failed and 726 were skipped.

clang-format: fail. Please format your changes with clang-format by running git-clang-format HEAD^ or apply this patch.

Build artifacts: console-log.txt, CMakeCache.txt, clang-format.patch, test-results.xml, diff.json

nridge added a subscriber: nridge.Dec 11 2019, 3:57 PM

I tried to do a less general version of this (for go-to-definition only) in D70727 :)

I tried to do a less general version of this (for go-to-definition only) in D70727 :)

Ah, I hadn't seen that. After thinking about this a bit, I think the behavior in that patch is OK, and it's complexity that will do us in. More cases keep coming up - internally people complained about this in code actions which is the hardest case to fix and needs most of this complexity.

How do you feel about the approach here? I like having the description of the problem centralized and directing callsites toward a solution.
At the same time, it does make me sad that a fairly nice abstraction becomes so much harder to pick up and use, the API is pretty horrendous.

nridge added a comment.EditedDec 12 2019, 3:21 PM

How do you feel about the approach here?

I agree that having more of the logic centralized (in SelectionTree) is preferable to having it directly in a call site like getDeclAtPosition.

I also agree that this makes the SelectionTree API a bit clunky.

I tested the behaviour in Eclipse CDT, on the following testcase:

struct Material {
    friend Material operator+(Material, Material);
};

void foo() {
    Material hydrogen, oxygen;
    hydrogen^+^oxygen;
}

Here, both navigations target the overloaded operator, but if you comment out the overloaded operator, they target the respective local variable declarations. I think this is pretty good do-what-I-mean behaviour, arguably better than a categorical preference for the right side over the left side.

I can describe how CDT achieves this:

  • There is an API that takes a textual selection, and an AST node type, and finds the innermost node of that type that encloses the selection.
  • The AST nodes built for the statement are as follows:
hydrogen+oxygen;
AAAAAAAABCCCCCC;   // A and C have type Name, B has type ImplicitName
  • (If the overloaded operator is commented out, there is no node B.)
  • For the purpose of determining "encloses", the cursor being at either boundary of a node counts as a match (so A and B are both considered to enclose the first cursor, and B and C are both considered to enclose the second cursor).
  • The algorithm for go-to-definition is: first look for an enclosing ImplicitName node; if one is found, go to its target. Otherwise, look for an enclosing Name node and go to its target.

Other features use the same API, but perhaps with a different node type; for example, the "define out of line" refactoring looks for an enclosing FunctionDeclarator node.

Perhaps we could have a simpler SelectionTree API along these lines, where the type of desired node informs the selection at each call site?

Here, both navigations target the overloaded operator, but if you comment out the overloaded operator, they target the respective local variable declarations. I think this is pretty good do-what-I-mean behaviour, arguably better than a categorical preference for the right side over the left side.

That's reasonable in eclipse (though personally I'd find it frustrating to have no way to target b in a+b+c).
It seems broken for editors where the cursor is on a character rather than between characters, though.

  • For the purpose of determining "encloses", the cursor being at either boundary of a node counts as a match (so A and B are both considered to enclose the first cursor, and B and C are both considered to enclose the second cursor).
  • The algorithm for go-to-definition is: first look for an enclosing ImplicitName node; if one is found, go to its target. Otherwise, look for an enclosing Name node and go to its target.

Other features use the same API, but perhaps with a different node type; for example, the "define out of line" refactoring looks for an enclosing FunctionDeclarator node.

Perhaps we could have a simpler SelectionTree API along these lines, where the type of desired node informs the selection at each call site?

Thanks for the details on CDT. This diverges quite a lot from what we currently have:

  • we have abstractions at the preprocessor level (tokens) and at the semantic level (clang AST), but nothing modeling the concept of e.g. "name" that we would need here.
  • our refactorings (tweak) relies on computing the selection by traversing the AST first and then querying it for each operation, which doesn't work if the query affects the traversal and the traversal is expensive

Syntax trees would address this by providing a tree with the relevant syntactic constructs that's cheap to traverse, and would certainly end up replacing SelectionTree altogether.

kadircet accepted this revision.Dec 13 2019, 5:26 AM

tl;dr; LGTM, from my side as long as you are also happy with the extra complexity introduced by this to all call sites.

As told in the offline discussions, only problem I have with this approach is its intrusiveness. But arguably all of the call sites that cares about what to do in such situations already has some sort of handling themselves. So I am OK with landing this generic situation and adding some more mental overhead to all of the callers, hopefully we could come up with some helpers that would enable callers to choose one behavior or the other all the time and get rid of extra complexity(not sure if they are likely to stick though).

This revision is now accepted and ready to land.Dec 13 2019, 5:26 AM
nridge added a comment.EditedDec 13 2019, 7:20 AM

(though personally I'd find it frustrating to have no way to target b in a+b+c).

(For completeness, there is a way to target b in a+b+c: by selecting b (such that your selection range has length 1 rather than 0). Then, neither + node will enclose the selection range, but the b node will.)

(though personally I'd find it frustrating to have no way to target b in a+b+c).

(For completeness, there is a way to target b in a+b+c: by selecting b (such that your selection range has length 1 rather than 0). Then, neither + node will enclose the selection range, but the b node will.)

Indeed, sorry - I meant that if we incorporated it into an LSP server, there'd be no way to target it in methods that take a position rather than a selection (go to defn, hover etc).

(For completeness, there is a way to target b in a+b+c: by selecting b (such that your selection range has length 1 rather than 0). Then, neither + node will enclose the selection range, but the b node will.)

yes but this doesn't work with most of the LSP features, e.g. go-to-def, hover, etc.

clang-tools-extra/clangd/ClangdServer.cpp
443

nit: no need for else

sammccall marked an inline comment as done.Dec 13 2019, 7:42 AM

OK, I'm going to land this with some reservations. It's a mess, but the number of callsites isn't overwhelming.
I'd been hoping to get away with bias-right, but we keep getting complaints about this.
It's hard to find an approach that solves this, doesn't obviously break other things, and is close enough to what we do now.
Happy to revisit in future, would be nice to get rid of this wart.

sammccall updated this revision to Diff 233804.Dec 13 2019, 7:56 AM

Bias selection-tree right for Hover.
In VSCode (and presumably other GUI-based editors) the location sent is always
the one on the left of the character under the cursor.

Clang-format.

This revision was automatically updated to reflect the committed changes.

Indeed, sorry - I meant that if we incorporated it into an LSP server, there'd be no way to target it in methods that take a position rather than a selection (go to defn, hover etc).

Ah, right, I forgot that LSP tends to send positions rather than ranges as inputs. (I hope that changes in the future, e.g. an issue is on file for hover.)

Unit tests: pass. 60850 tests passed, 0 failed and 726 were skipped.

clang-format: pass.

Build artifacts: console-log.txt, CMakeCache.txt, test-results.xml, diff.json

thakis added a subscriber: thakis.Dec 14 2019, 5:13 AM

This breaks compile on most bots in the "clang" section on http://lab.llvm.org:8011/console , probably the ones that use gcc as host compiler.

Here's an example:
http://lab.llvm.org:8011/builders/clang-cmake-armv7-selfhost-neon/builds/2714/steps/build%20stage%201/logs/stdio