This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Support multiline semantic tokens
ClosedPublic

Authored by kadircet on Jun 15 2022, 6:46 AM.

Details

Summary

Per LSP, multiline tokens should be handled as if they end at the end
of the line starting the token (there's also a capability to enable them, but
that's an adventure for a different day).

Fixes https://github.com/clangd/clangd/issues/1145

Diff Detail

Event Timeline

kadircet created this revision.Jun 15 2022, 6:46 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 15 2022, 6:46 AM
kadircet requested review of this revision.Jun 15 2022, 6:46 AM
sammccall accepted this revision.Jun 20 2022, 5:15 AM

(My first reaction was that this belongs in semanticHighlights() rather than toSemanticTokens, but I think you got it right - the single-line restriction is pretty unnatural from clang's point of view, and cuts across tokens generated in different places).

clang-tools-extra/clangd/SemanticHighlighting.cpp
951

This wording is hard for me to follow. I think it's saying:

"If the token spans a line break, truncate it to avoid this"

951

It seems it would be better to split into one token per line, rather than simply truncating.

Is truncation for simplicity or is there a reason to prefer it?

FWIW I think this wouldn't be too hard to implement if you reordered the tokenType/tokenModifiers above so this part is the last step, and just copied the whole SemanticToken object from the back of the Result. But on the other hand it's not terribly important, either.

At least I think we should have a comment for the truncate/split tradeoff.

clang-tools-extra/clangd/SemanticHighlighting.h
90

I think this deserves a comment like:

// This is clang's token bounds, which may span multiple lines.
// This may be split/truncated to a single line when required by LSP.
This revision is now accepted and ready to land.Jun 20 2022, 5:15 AM
kadircet updated this revision to Diff 438387.Jun 20 2022, 7:29 AM

Split highlights into multiple tokens rather than trimming

clang-tools-extra/clangd/SemanticHighlighting.cpp
951

This wording is hard for me to follow. I think it's saying:

It is actually from LSP 😅 changed into a simpler version, I don't think we need references to LSP in here.

Is truncation for simplicity or is there a reason to prefer it?

Yeah I didn't want to do a loop, as in theory there can be many lines not just two. Also we need to take care of the Last and delta calculations for the following tokens.
But it isn't as bad, i suppose. Changed into splitting.

sammccall accepted this revision.Jun 21 2022, 6:28 AM
sammccall added inline comments.
clang-tools-extra/clangd/SemanticHighlighting.cpp
950

this copy feels gratuitous, can we just use Tokens.back(), or do the copy in the rare case?

956

This isn't literally true - files with multiline raw strings aren't rare. maybe "multiline tokens are rare"?
(even so I wonder if there's a real performance problem here...)

962

can we use < rather than != here? (or an assert)
I realize something has to have gone horribly wrong to infloop here, but it's something nonlocal...

973

reassigning fields one by one seems fragile, copy the last token and then overwrite the fields needed?

clang-tools-extra/clangd/unittests/SemanticHighlightingTests.cpp
1007

please have the first token start somewhere other than column 0

kadircet updated this revision to Diff 438711.Jun 21 2022, 8:18 AM
kadircet marked 4 inline comments as done.

Get rid of the copy in common case

clang-tools-extra/clangd/SemanticHighlighting.cpp
950

unfortunately copy is needed, as the delta needs to be from start of the last token :/

changing the logic to do the copy only in the rare case by using a scratch element, LMK if you have better ideas here.

This revision was landed with ongoing or failed builds.Jun 29 2022, 4:49 AM
This revision was automatically updated to reflect the committed changes.