This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Use token modifiers and more token types for semanticTokens
AbandonedPublic

Authored by nridge on Apr 7 2020, 7:59 PM.

Details

Reviewers
sammccall
Summary

The intention here is to allow clients to continue using a different
color for each of our HighlightingKinds, as with the old API.

For some highlighting kinds, this is accomplished by adding more
token types (e.g. splitting "dependent" into "dependentType" and
"dependentName"). For others, it is accomplished by using token
modifiers.

Note that the patch introduces a "member" modifier in place of
the "member" token type. This allows member variables and member
functions to be colored differently, by using the "variable" and
"function" token types, respectively, combined with the "member"
modifier.

Diff Detail

Unit TestsFailed

Event Timeline

nridge created this revision.Apr 7 2020, 7:59 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 7 2020, 7:59 PM

Some context for me: I did also start some work on supporting modifiers, and after some experiments with VSCode concluded it was too early (no support in any shipped themes, I saw some hard-to-explain interactions with the default syntax highlighting...).
My plan was to revisit once this was in the standard and VSCode had some out-of-the-box support. If there's a goal to implement already in other editors (Theia?) that's a good reason to move forward sooner, but we should be wary of getting too far ahead of VSCode I think, as they're the ones who will mostly decide how the standard mostly gets interpreted.
The unfinished code is in D77811 for what it's worth (no tests, not sure if the scope is right etc). With that out of the way...


A couple of observations about the spec:

1: LSP modifiers are clearly designed to be orthogonal to the token kind (in principle at least, even if static namespace isn't a thing).
Some of the most useful modifiers mentioned in the spec (e.g. readonly and definition) are certainly going to apply to a wide range of kinds. And I imagine these will often be orthogonal in presentation too (e.g. underline all definitions, color indicates what it is).
I think this makes a lot of sense and would like our design to reflect this internally (e.g. modifiers as boolean flags in HighlightingToken>) rather than having modifiers+kind encode some internal enum. While it's OK to have some modifiers that aren't orthogonal internally, I worry about *starting* there. And I'm skeptical that some of those added here are best implemented this way.

2: The "predefined" modifiers/kinds in the semanticToken spec are a bit of a dilemma.
We care about lots of clients, and know from experience that a) we have very little influence on how vscode interprets the spec and b) we have some influence on other clients, but they mostly copy vscode by default
Because servers, client plugins, editors, and themes typically come from different vendors, getting modifiers/kinds highlighted well out-of-the-box is going to be really hard unless there's alignment on what the modifiers/kinds *are*.
I think the predefined values in the spec are probably the best/most likely way to get this alignment, so there's a lot of value in sticking to them closely.
On the other hand we'll never capture all interesting C++ semantics in a language-neutral spec, and there are some seriously questionable choices ("member" as a kind rather than an attribute or at least method/field?).
There's no "subclasses" at least for kinds, so we have to make a call one way or the other. I do think we should stick to the standard names where we possibly can, and fight (and probably lose) to get member fixed, local added etc in the spec.


I'm not sure that making the results of the new API easy to exactly map back onto the results of the old API is something we should be taking as a hard design goal, which I know makes transition a bit painful :-(
Similar to the first point, we don't want our design constrained too much by an interface that will go away.

clang-tools-extra/clangd/SemanticHighlighting.cpp
450

I like this a lot. There's a function computeScope in Quality.cpp that determines if something's class/function/file scoped, though for function-local I guess Decl::isLexicallyWithinFunctionOrMethod() is enough.

451

I think we should probably call this StaticMember (internally), to avoid accidentally conflating the various meanings of static.

(Which reminds me, I'd love to make hover on static explain what it means here!)

nridge added a comment.EditedApr 9 2020, 2:22 PM

Thanks for your thoughts!

Some context for me: I did also start some work on supporting modifiers, and after some experiments with VSCode concluded it was too early (no support in any shipped themes, I saw some hard-to-explain interactions with the default syntax highlighting...).
My plan was to revisit once this was in the standard and VSCode had some out-of-the-box support. If there's a goal to implement already in other editors (Theia?) that's a good reason to move forward sooner, but we should be wary of getting too far ahead of VSCode I think, as they're the ones who will mostly decide how the standard mostly gets interpreted.

VSCode Insiders today allows themes and user settings to customize styles for token types -- and, if desired, for combinations of token types and token modifiers -- using the "editor.tokenColorCustomizationsExperimental" settings entry, as documented here. For example:

"editor.tokenColorCustomizationsExperimental": {
  // Applies to tokens which have the "function" token type and both the "member" and "static" modifiers
  "function.member.static": {
    "foreground": "#123456",
    "fontStyle": "underline"
  }
}

My intention in this patch is to preserve the ability for users to have e.g. static and non-static methods styled differently, by using things like the above (whether via a theme or their user settings).

In terms of a timeline, my hope is that support for enough modifiers to be able to express our existing token kinds, can be added no later than the removal of the old API (which I understand from another patch is planned for clangd 12?).

(Since you mentioned Theia: I believe its plans are to move towards greater reuse of VSCode components. So, while it had its own implementation of the old semantic highlighting API, for semanticTokens it will likely be using VSCode's implementation, and therefore have the same capabilities and behaviours as VSCode.)

Some of the most useful modifiers mentioned in the spec (e.g. readonly and definition) are certainly going to apply to a wide range of kinds. And I imagine these will often be orthogonal in presentation too (e.g. underline all definitions, color indicates what it is).

Having modifiers be orthogonal in presentation is certainly a nice goal, but I don't think it's realistic to expect that clients will be able to accomplish this for all modifiers. Simply put, there are more potentially interesting modifiers than there are orthogonal ways to style text in your typical editor (e.g. color, bold, italic, underline, etc.). Even if that weren't the case, tying modifiers to orthogonal aspects of text styling will not necessarily produce the best-looking highlighting.

Notably, the customization ability described above allows associating styles with modifiers both in an orthogonal way (e.g. using "*.static" to apply a common style to all static entities) and in a non-orthogonal way (e.g. using "function.static" to apply a style specifically to static functions). As a server, I think we should leave it open to clients to use this larger degree of customizability / freedom if they choose to.

would like our design to reflect this internally (e.g. modifiers as boolean flags in HighlightingToken>) rather than having modifiers+kind encode some internal enum.

That makes a lot of sense, and I was actually thinking of moving in that direction in follow-up patches as well :)

I just wanted to start with a minimal change to avoid regressing the ability to express some of the existing distinctions that appear in our current API.

2: The "predefined" modifiers/kinds in the semanticToken spec are a bit of a dilemma.
[...]

I think languages are different enough that, while some common concepts may be shared, ultimately each language will end up having its own set of token types and modifiers. It seems to me, based on comments like this one, that the protocol is designed with this in mind.

I do think it's important to agree on a set of token types and modifiers within a language community (so e.g. among C++ language servers and clients). However, I think we (clangd) have enough of a "de facto standard" / front-runner status in the C++ community that we can take the lead on adding new ones? (That's my impression, anyways.)

nridge abandoned this revision.Feb 15 2021, 4:44 PM

Abandoning as this was superseded by { D77811, D95701, D95706 }.