This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Implement semanticTokens modifiers
ClosedPublic

Authored by sammccall on Apr 9 2020, 10:49 AM.

Details

Summary
  • Infrastructure to support modifiers (protocol etc)
  • standard modifiers:
    • declaration (but no definition, yet)
    • deprecated
    • readonly (based on a fairly fuzzy const checking)
    • static (for class members and locals, but *not* file-scope things!)
    • abstract (for C++ classes, and pure-virtual methods)
  • nonstandard modifier:
    • deduced (on "auto" whose Kind is Class etc) Happy to drop this if it's controversial at all.
  • While here, update sample tweak to use our internal names, in anticipation of theia TM scopes going away.

This addresses some of the goals of D77702, but leaves some things undone.
Mostly because I think these will want some discussion.

  • no split between dependent type/name. (We may want to model this as a modifier, type+dependent vs ???+dependent)
  • no split between primitive/typedef. (Is introducing a nonstandard kind is worth this distinction?)
  • no nonstandard local attribute This probably makes sense, I'm wondering if we want others and how they fit together.

There's one minor regression in explicit template specialization declarations
due to a latent bug in findExplicitReferences, but fixing it after seems OK.

Diff Detail

Event Timeline

sammccall created this revision.Apr 9 2020, 10:49 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 9 2020, 10:49 AM
sammccall updated this revision to Diff 319591.Jan 27 2021, 8:37 AM

Rebase, polish, add tests.

Herald added a project: Restricted Project. · View Herald TranscriptJan 27 2021, 8:37 AM
sammccall updated this revision to Diff 319592.Jan 27 2021, 8:40 AM
sammccall retitled this revision from [clangd] WIP playing with semantic highlighting modifiers to [clangd] Implement semanticTokens modifiers.
sammccall edited the summary of this revision. (Show Details)

Update phab with commit message

sammccall updated this revision to Diff 319594.Jan 27 2021, 9:08 AM

Tidy up, explain static

sammccall updated this revision to Diff 320218.Jan 29 2021, 2:26 PM

Add tests for 'abstract' and fix bugs

Thanks a lot for working on this! I agree this is well aligned with the goals of D77702 (and in fact goes even further and meets the goals of D66990 via the declaration modifier). The other modifiers are neat as well; I like the expressiveness of the kind + modifier model in general.

This addresses some of the goals of D77702, but leaves some things undone.
Mostly because I think these will want some discussion.

  • no split between dependent type/name. (We may want to model this as a modifier, type+dependent vs ???+dependent)

+1

  • no split between primitive/typedef. (Is introducing a nonstandard kind is worth this distinction?)

Here's my general take on this:

  • There is too much variety between programming languages to expect that a standard list of highlighting kinds is going to satisfactorily cover all languages.
  • If LSP aims to be the basis for tools that are competitive with purpose-built language-specific tools, it's reasonable to allow for clients willing to go the extra mile in terms of customization (e.g. use a language-specific theme which provides colors for language-specific token kinds).
  • Therefore, I would say yes, we should take the liberty of adding nonstandard highlighting kinds and modifiers where it makes sense from a language POV.

That said... for typedef specifically, I wonder if it actually makes more sense as a modifier than a kind. That is, have the kind be the target type (falling back to Unknown/Dependent if we don't know), and have a modifier flag for "the type is referred to via an alias" (as opposed to "the type is referred to directly"). WDYT?

  • no nonstandard local attribute This probably makes sense, I'm wondering if we want others and how they fit together.

Are you referring to local-scope variables (i.e. FunctionScope in D95701)? If so, I think the approach in that patch is promising.

clang-tools-extra/clangd/SemanticHighlighting.cpp
130

It seems a bit unintuitive that the path which non-pointer types (e.g. unqualified class or builtin types) take is to return false in this recursive call because getPointeeType() returns null... but I guess that works.

134

Do you think in the future it might make sense to have the readonly modifier reflect, at least for variables, whether the particular reference to the variable is a readonly reference (e.g. binding the variable to a const X& vs. an X&)?

182

Will anything take this path (rather than being covered by VD->isStaticDataMember() above)?

376

This is a change of behaviour from before, in the case where the ReferenceLoc has multiple targets.

Before, we would produce at most one highlighting token for the ReferenceLoc. In the case where different target decls had different highlighting kinds, we wouldn't produce any.

Now, it looks like we produce a separate token for every target whose kind matches the kind of the first target (and skip targets with a conflicting kind).

Is that the intention?

It seems a bit strange: if we allow multiple tokens, why couldn't they have different kinds?

clang-tools-extra/clangd/refactor/tweaks/AnnotateHighlightings.cpp
11

Maybe add // for llvm::to_string at it's not obvious

clang-tools-extra/clangd/unittests/SemanticHighlightingTests.cpp
264 ↗(On Diff #320218)

Presumably, the highlighting kinds StaticField and StaticMethod are going to be collapsed into Field and Method in a future change (after the removal of TheiaSemanticHighlighting, I guess)?

718 ↗(On Diff #320218)

Should we add a test case for _deprecated as well?

sammccall updated this revision to Diff 320464.Feb 1 2021, 7:44 AM
sammccall marked an inline comment as done.

add missing test

Thanks a lot for working on this! I agree this is well aligned with the goals of D77702 (and in fact goes even further and meets the goals of D66990 via the declaration modifier).

Ooh, I hadn't seen D66990, thanks for the pointer and also giving the upstream feedback to LSP folks! I'm also really happy about the multi-dimensional nature of highlightings in the new protocol, looks like you pushed it in that directyon.

  • no split between primitive/typedef. (Is introducing a nonstandard kind is worth this distinction?)

Here's my general take on this:

  • There is too much variety between programming languages to expect that a standard list of highlighting kinds is going to satisfactorily cover all languages.
  • If LSP aims to be the basis for tools that are competitive with purpose-built language-specific tools, it's reasonable to allow for clients willing to go the extra mile in terms of customization (e.g. use a language-specific theme which provides colors for language-specific token kinds).
  • Therefore, I would say yes, we should take the liberty of adding nonstandard highlighting kinds and modifiers where it makes sense from a language POV.

Totally agree, I should have been more explicit.
Introducing nonstandard kinds is backwards-incompatible. If the client doesn't understand primitiveType, then the token kind is now completely unknown. This could be a regression from the current state (type).
But for primitive type specifically, the information we're losing is "auto denotes a type" which the client's non-semantic highlighting should manage anyway. So I agree we should do this.

That said... for typedef specifically, I wonder if it actually makes more sense as a modifier than a kind. That is, have the kind be the target type (falling back to Unknown/Dependent if we don't know), and have a modifier flag for "the type is referred to via an alias" (as opposed to "the type is referred to directly"). WDYT?

Agree. Do you think it should be the *same* modifier as deduced which I included here?
(In a similar vein, there's an argument for pointer, ref, rvalue-ref as modifiers)

  • no nonstandard local attribute This probably makes sense, I'm wondering if we want others and how they fit together.

Are you referring to local-scope variables (i.e. FunctionScope in D95701)? If so, I think the approach in that patch is promising.

Yes, exactly.

clang-tools-extra/clangd/unittests/SemanticHighlightingTests.cpp
264 ↗(On Diff #320218)

Yeah, merging any kinds that are exported with the same name should be NFC at that point.

Hmm, though currently StaticField --> Variable, not Field (similarly StaticMethod --> Method).
So we can have static fields be Variable+Static+ClassScope or Field+Static+ClassScope.

I can see arguments for either...

718 ↗(On Diff #320218)

Oops, done!

Introducing nonstandard kinds is backwards-incompatible. If the client doesn't understand primitiveType, then the token kind is now completely unknown. This could be a regression from the current state (type).

Good point. I guess, if we're aiming for backwards compatibility, we'll want to do most of our future customization via modifiers, since those will gracefully degrade to the highlighting for base kind for clients that don't recongize them.

That said... for typedef specifically, I wonder if it actually makes more sense as a modifier than a kind. That is, have the kind be the target type (falling back to Unknown/Dependent if we don't know), and have a modifier flag for "the type is referred to via an alias" (as opposed to "the type is referred to directly"). WDYT?

Agree. Do you think it should be the *same* modifier as deduced which I included here?

Conceptually, they seem distinct to me.

(In a similar vein, there's an argument for pointer, ref, rvalue-ref as modifiers)

Indeed. I'm definitely happy to defer type modifiers like these (including typedef) to a future patch.

nridge added inline comments.Feb 2 2021, 9:13 PM
clang-tools-extra/clangd/unittests/SemanticHighlightingTests.cpp
264 ↗(On Diff #320218)

I don't have a strong opinion on this one.

Maybe Field+Static+ClassScope, that way clients that use a generic theme (and thus do not recognize ClassScope) will color it as Field rather than Variable?

nridge added inline comments.Feb 5 2021, 12:58 AM
clang-tools-extra/clangd/SemanticHighlighting.cpp
376

Thinking more about this, the behaviour may actually be reasonable as written.

  • Tokens with different kinds would get discarded via resolveConflict().
  • Multiple tokens with the same kind are potentially useful because they may have different modifiers, and the modifiers are then merged in resolveConflict().
nridge accepted this revision.Feb 5 2021, 1:00 AM

I think this patch is in pretty good shape, the outstanding comments are either minor or thoughts about future changes.

I am curious what you think of the extension idea for readonly, though obviously it wouldn't be something to do in this patch.

This revision is now accepted and ready to land.Feb 5 2021, 1:00 AM
sammccall marked 2 inline comments as done.Feb 9 2021, 7:31 AM

Thanks for the excellent comments, and sorry for the radio silence. (Kids got quarantined recently so work time has been... scarce).

I think there are improvements to be had here but it's probably time to land this...

I am curious what you think of the extension idea for readonly, though obviously it wouldn't be something to do in this patch.

Longer comment below, but:

  • I love it
  • it's not trivial to implement, I think
  • not obvious whether this should replace the simple readonly given here, augment it, or be a different attribute - I'm probably happy with any of these.
clang-tools-extra/clangd/SemanticHighlighting.cpp
134

Do I understand correctly?

std::vector<int> X; // X is not readonly
X.push_back(42); // X is not readonly
X.size(); // X is readonly

Distinguishing potentially-mutating from non-mutating uses seems really useful to me.
My only question is whether mapping this concept onto readonly is the right thing to do:

  • there are really three sets: const access, mutable access, and non-access (e.g. declaration, or on RHS of using a=b). And I think maybe the most useful thing to distinguish is mutable access vs everything else, which is awkward with an attribute called "readonly".
  • this also seems likely to diverge from how other languages use the attribute (most don't have this concept)
  • on the other hand, standard attribute names will be better supported by clients

This also might be somewhat complicated to implement :-)
I'd like to leave in this simple decl-based thing as a placeholder, and either we can replace it and add an additional "mutation" attribute later (once we work out how to implement it!) I've left a comment about this...

(Related: a while ago Google's style guide dropped its requirement to use pointers rather than references for mutable function params. Having &x at the callsite rather than just x is a useful hint, but obviously diverging from common practice has a cost. We discussed how we could use semantic highlighting to highlight where a param was being passed by mutable reference, though didn't have client-side support for it yet)

376

Oops, I don't think the part where we treat the first kind specially was intended.
I think the simplest thing is to emit all the tokens and let conflict resolution sort them out.
In practice this means:

  • if there's multiple kinds, discard all
  • if there's different sets of modifiers, take the union

Which seems reasonable to me until proven otherwise

(This part of the patch is really old so I can't be 100% sure what my intention was...)

clang-tools-extra/clangd/unittests/SemanticHighlightingTests.cpp
264 ↗(On Diff #320218)

Field+Static is probably better than Variable, yeah.

In the back of my head I wonder if there will be significant clients that handle type but not modifiers, and if Variable is a better first-order description than Field. (I was surprised to find out that static members in clang are VarDecls not FieldDecls, but now I got used to the idea...)

This revision was landed with ongoing or failed builds.Feb 9 2021, 7:35 AM
This revision was automatically updated to reflect the committed changes.
nridge added inline comments.Feb 9 2021, 11:46 AM
clang-tools-extra/clangd/SemanticHighlighting.cpp
134

Do I understand correctly?

std::vector<int> X; // X is not readonly
X.push_back(42); // X is not readonly
X.size(); // X is readonly

Yup, that's what I had in mind.

Distinguishing potentially-mutating from non-mutating uses seems really useful to me.
My only question is whether mapping this concept onto readonly is the right thing to do:

  • there are really three sets: const access, mutable access, and non-access (e.g. declaration, or on RHS of using a=b). And I think maybe the most useful thing to distinguish is mutable access vs everything else, which is awkward with an attribute called "readonly".
  • this also seems likely to diverge from how other languages use the attribute (most don't have this concept)
  • on the other hand, standard attribute names will be better supported by clients

This also might be somewhat complicated to implement :-)
I'd like to leave in this simple decl-based thing as a placeholder, and either we can replace it and add an additional "mutation" attribute later (once we work out how to implement it!) I've left a comment about this...

Sounds good!

(Related: a while ago Google's style guide dropped its requirement to use pointers rather than references for mutable function params. Having &x at the callsite rather than just x is a useful hint, but obviously diverging from common practice has a cost. We discussed how we could use semantic highlighting to highlight where a param was being passed by mutable reference, though didn't have client-side support for it yet)

Funny you mention this, because I implemented this exact highlighting in Eclipse a few years ago:

https://wiki.eclipse.org/CDT/User/NewIn92#Syntax_coloring_for_variables_passed_by_non-const_reference

(and it's my main motivation for the suggestion I've made here).

nridge added inline comments.Feb 9 2021, 11:57 AM
clang-tools-extra/clangd/SemanticHighlighting.cpp
134

I implemented this exact highlighting in Eclipse a few years ago

I will readily admit that classifying references as mutable or not can get tricky. For example:

int* array = new int[10];
array[0] = 42;

Is the reference to array on the second line mutable? Does the answer change if the declaration is changed to int array[10];?

Fun questions to figure out later :-)