This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Add semantic token for labels
ClosedPublic

Authored by ckandeler on Feb 3 2023, 3:41 AM.

Diff Detail

Event Timeline

ckandeler created this revision.Feb 3 2023, 3:41 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 3 2023, 3:41 AM
ckandeler requested review of this revision.Feb 3 2023, 3:41 AM
nridge added a subscriber: nridge.Feb 4 2023, 7:09 PM

thanks for the patch, but could you elaborate a little bit on "why this is useful for clients".
in theory semantic highlighting tries to provide annotations for tokens that are hard to disambiguate by a syntax highlighter due to need for context.
at hindsight i can't see why goto X; and X: is not enough for clients to implement this without any need for semantic analysis. are there contexts where this kind of syntactical match is not enough?
moreover there are other label-like constructs that we're not handling, e.g. access specifiers and switch cases. any particular reason for not handling them as part of "label" highlights if we were to handle label-decls (the argument above applies to this case too though, I think these can be inferred without any semantic analysis)?

at hindsight i can't see why goto X; and X: is not enough for clients to implement this without any need for semantic analysis. are there contexts where this kind of syntactical match is not enough?

I suppose the label *use* could be identified by looking at the previous token, but not the label *declaration* (see below).

moreover there are other label-like constructs that we're not handling, e.g. access specifiers and switch cases.

But access specifiers are a completely different thing semantically, that's the point: The user does not tell the client: "I want everything that is followed by a single colon in this color"; that would be silly. They say "I want goto labels in this color", exactly because then they immediately stand out compared to access specifiers.
switch cases are indeed similar semantically, but the difference is that they already have a category assigned: They are either enum values (a semantic token type in clangd), macros (ditto) or number literals (likely to be its own category in the client's syntax highlighter).

nridge added a comment.EditedFeb 13 2023, 9:43 AM

FWIW, I agree that switch labels and goto labels are conceptually quite different: the thing before a switch label is an expression, and if it's an identifier it references an already-declared entity with its own kind (e.g. enumerator), whereas a goto label basically declares a new entity of a special kind (which can then be referenced from goto statements); having a distinct color for that kind would make sense to me.

Another potential consideration here is the GNU "labels as values" language extension, which clang seems to support, and which allows referencing a LabelDecl from a context other than a goto-statement (and thus less obvious from the lexical context only).

sorry for long silence.

But access specifiers are a completely different thing semantically, that's the point: The user does not tell the client: "I want everything that is followed by a single colon in this color"; that would be silly. They say "I want goto labels in this color", exactly because then they immediately stand out compared to access specifiers.

I am not sure if access specifiers and label declarations occur in the same context often enough for this mixing to actually cause trouble TBH.


I was mostly being cautious for new semantic highlighting token. I believe today we've too low of a bar in clangd's semantic highlighting functionality for introducing "custom" token types, which are probably not used by any people apart from the ones involved in the introduction of the token (we don't have custom themes/scopes even for vscode, we don't even mention them in our user-facing documentation).
So I feel like these custom semantic token types are getting us into "death by thousand cuts" situation. Each of these changes are small enough on their own, but we'll slowly get into a state in which we're spending time both calculating and emitting all of those token types that are just dropped on the floor (and also sometimes the extra cost to maintain them as language semantics change).

Hence my stance here is still towards "we don't need it", unless we have some big editor(s) that can already recognize "label" as a semantic token type to provide the functionality for a set of users. that way we can justify of maintaining the code and the runtime costs. (or finding a way to make this more closely aligned with an existing token type/modifier)

In my opinion, it is not possible to have a competitive client if you limit yourself to the official LSP feature set; you just need language-specific extensions in practice.
And while of course not every silly idea should be blindly accepted, I think this "lowest common denominator" approach artificially limits clients' potential feature sets and inhibits innovation.
I am of course biased, but it seems to me that when in doubt, client requirements should take precedence over server-side purity concerns.

nridge added a comment.EditedApr 15 2023, 1:06 AM

I am not sure if access specifiers and label declarations occur in the same context often enough for this mixing to actually cause trouble TBH.

I think the focus on access specifiers is a bit of a distraction, especially given the mentioned "labels as values" language extension, which makes possible usages of the label of the form &&labelname in any expression context (producing a pointer value).

In my mind, having a semantic token for labels is a matter of completeness: all other identifier tokens get some sort of semantic token depending on what kind of Decl they resolve to. LabelDecl is a kind of Decl that can be declared as label:, and referenced as goto label or &&label where label is lexically an identifier token -- so it makes sense for those identifiers to be covered by a semantic token as well.

I believe today we've too low of a bar in clangd's semantic highlighting functionality for introducing "custom" token types, which are probably not used by any people apart from the ones involved in the introduction of the token (we don't have custom themes/scopes even for vscode, we don't even mention them in our user-facing documentation).

In the case of labels (and angle brackets (D139926) and operators (D136594)), the set of users benefiting from them includes presumably "all QtCreator users" (assuming QtCreator assigns a default color to these).

You're right that vscode users don't benefit from them unless they configure semanticTokenColorCustomizations; in my mind, this is a reason for us to write some vscode themes, rather than to avoid adding new tokens. (I volunteer to write vscode themes if that helps change your mind.)

we'll slowly get into a state in which we're spending time both calculating and emitting all of those token types that are just dropped on the floor

I agree that the volume of semantic tokens is potentially a concern. I think it's more of a concern in the case of tokens that occur fairly frequently, like operator tokens. (This is why, in the review of D136594, I suggested only emitting semantic tokens for overloaded operators, since it's distinguishing those from built-in operators that's the more interesting use case in highlighting operators.) By comparison, label tokens would be relatively rare and thus their overhead would be quite small.

Hence my stance here is still towards "we don't need it", unless we have some big editor(s) that can already recognize "label" as a semantic token type

I'm not sure whether you mean LSP-based editors, but if we're counting editors that had "label" as a semantic token type pre-LSP, and are now looking to maintain that through their switch to LSP, then both Eclipse CDT and (I assume) Qt Creator fall into that category.

(sorry for taking so long to get back to this)

In the case of labels (and angle brackets (D139926) and operators (D136594)), the set of users benefiting from them includes presumably "all QtCreator users" (assuming QtCreator assigns a default color to these).

thanks this is useful, i think i know about this reasoning in the previous two reviews but wasn't sure about this one.

You're right that vscode users don't benefit from them unless they configure semanticTokenColorCustomizations; in my mind, this is a reason for us to write some vscode themes, rather than to avoid adding new tokens. (I volunteer to write vscode themes if that helps change your mind.)

Vscode is one that we have control over (similar to Qt creator if that helps reasoning) but I am afraid customizing vscode too much and assessing impact of features with a bias of those customization might hinder usability in the long run (also again custom themes etc. doesn't really scale).

I'm not sure whether you mean LSP-based editors, but if we're counting editors that had "label" as a semantic token type pre-LSP, and are now looking to maintain that through their switch to LSP, then both Eclipse CDT and (I assume) Qt Creator fall into that category.

Considering that we have added a bunch of token kinds, let me ask you another question then, how many more new token kinds do we expect to have? e.g. if label is the last one sure, but if we plan to have 10 more or if we don't know how many more we would like to have, i think it's better to first come up with a plan and an agreement to make sure we can avoid such discussions with every attempt. does that sound fair?

nridge added a comment.May 9 2023, 1:31 AM

Considering that we have added a bunch of token kinds, let me ask you another question then, how many more new token kinds do we expect to have? e.g. if label is the last one sure, but if we plan to have 10 more or if we don't know how many more we would like to have, i think it's better to first come up with a plan and an agreement to make sure we can avoid such discussions with every attempt. does that sound fair?

That's a totally fair question.

For my part, I don't have any additional token kinds in mind to add. (The suggestion in https://github.com/clangd/clangd/issues/787 to split "template parameter" into type vs. non-type is an interesting one, but it's more like a nice-to-have enhancement.)

Not sure about Qt Creator, will let @ckandeler chime in on whether he had any other token kinds in mind for it.

Considering that we have added a bunch of token kinds, let me ask you another question then, how many more new token kinds do we expect to have? e.g. if label is the last one sure, but if we plan to have 10 more or if we don't know how many more we would like to have, i think it's better to first come up with a plan and an agreement to make sure we can avoid such discussions with every attempt. does that sound fair?

Speaking for myself, I don't currently have any plans to add further tokens, but it's difficult to say with certainty that this will be the last one. For instance, new C++ features in future standards might suggest new token types (like "concept" for C++20). And if https://github.com/clangd/clangd/issues/1115 gets implemented, then there will be "string", "number" and "keyword", though these are all part of the LSP spec already.

kadircet accepted this revision.Jun 7 2023, 12:44 AM

alright then, this patch LGTM. going forward let's try to introduce new kinds (or handling for different semantic constructs) in a more holistic manner.

concerns with new language structs are OK, as the language evolves we'll surely need to add more highlighting, hopefully we can fit them under existing highlight kinds.

(sorry for the late reply)

This revision is now accepted and ready to land.Jun 7 2023, 12:44 AM
This revision was automatically updated to reflect the committed changes.