This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Inactive regions support as an extension to semantic highlighting
ClosedPublic

Authored by nridge on Sep 12 2019, 10:29 PM.

Event Timeline

nridge created this revision.Sep 12 2019, 10:29 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 12 2019, 10:29 PM

This patch adds server-side support for greying out code in inactive preprocessor branches (issue #132).

I didn't write test yet. I wanted to post the patch for feedback first, to see if the general approach is ok.

For reference, I also posted a proof-of-concept client impl in D67537, which I used to test/validate this patch.

Rather than a separate method with parallel implementation, this seems very closely related to the syntax highlighting feature.

The minimal way to model this (no new protocol) would be for each disabled line, to add one token spanning the whole line with a TextMate scope like comment.disabled or meta.disabled.
There's no technical reason tokens can't overlap in the protocol, though it's possible editors will choose to turn off local heuristic highlighting on anything that's touched by an LSP-provided token.

Failing that, I'd suggest encoding a list of line-styles on SemanticHighlightingInformation, that should be combined with any tokens on that line.

nridge planned changes to this revision.Sep 15 2019, 8:00 PM

Ok, I can try formulating this as part of / an extension to semantic highlighting.

nridge updated this revision to Diff 222336.Sep 29 2019, 5:16 PM

Reformulate as an extension to semantic highlighting

The updated patch formulates this as a new semantic highlighting kind.

The tokens created for inactive regions are one per line, with the character offset and length being zero; the idea is that the client will handle this highlighting specially (see D67537) and highlight the entire line.

Note, highlighting the entire line is different than highlighting just the portion of the line that contains text. We want the grey background highlight to extend to the entire row in the editor, even over columns beyond the end of the text line (as accomplished with vscode's isWholeLine=true option in DecorationRenderOptions).

I don't love special-casing the scope in the client, but this does avoid having to extend the protocol itself which would be considerably more work.

hokein added a comment.Oct 2 2019, 3:39 AM

Could you add tests for this?

clang-tools-extra/clangd/Compiler.cpp
66 ↗(On Diff #222336)

I'm not sure how does this flag impact the size of Preamble/AST, @ilya-biryukov any thoughts?

clang-tools-extra/clangd/SemanticHighlighting.cpp
194

I think the current implementation only collects the skipped preprocessing ranges after preamble region in the main file. We probably need to store these ranges in PreambleData to make the ranges in the preamble region work, no need to do it in this patch, but we'd better have a test reflecting this fact.

#ifdef ActiveCOde
// inactive code here
#endif

#include "foo.h"
// preamble ends here

namespace ns {
// ..
}
203

This seems too couple with VSCode client, I would prefer to calculate the range of the line and return to the client.

Is there any big differences in VSCode between highlighting with the isWholeLine and highlighting with the range of the line?

clang-tools-extra/clangd/SemanticHighlighting.h
45

This is a different kind group, I would put it after the Macro, we'd need to update the LastKind.

The name seems too specific, how about "UnreachableCode"?

nridge marked an inline comment as done.Oct 3 2019, 8:21 PM

Could you add tests for this?

Certainly, I just wanted to discuss the general approach first, as it will affect what the tests look like.

clang-tools-extra/clangd/SemanticHighlighting.cpp
203

I took some screenshots to illustrate to difference.

Highlighting only to the end of the line of text:

Highlighting the whole line:

I think the first one looks pretty bad, and is inconsistent with existing practice.

Note also that the suggestion is not to special-case the VSCode client specifically; it's to special-case one particular highlighting, which any client can implement.

If this special-casing is really unpalatable, we could instead try this suggestion by @sammccall:

Failing that, I'd suggest encoding a list of line-styles on SemanticHighlightingInformation, that should be combined with any tokens on that line.

I guess one consideration when evaluating these options is, do we expect to use that "list of line-styles" for anything else in the future? I can't think of anything at the moment, but perhaps there are other uses for it.

If not, we could do something slightly simpler, and add a single isInactive flag to SemanticHighlightingInformation.

Why "inactive region", not "skipped ranges"?

ilya-biryukov added inline comments.Oct 4 2019, 12:55 AM
clang-tools-extra/clangd/Compiler.cpp
66 ↗(On Diff #222336)

Have no idea, but why do we need this in the first place?
PPCallbacks::SourceRangeSkipped should allow to record all skipped ranges in the main file. Can we use it?

nridge added a comment.Oct 7 2019, 6:59 AM

Why "inactive region", not "skipped ranges"?

I got the name from "$cquery/publishInactiveRegions" :) But I don't particularly care what we call it.

nridge updated this revision to Diff 223577.Oct 7 2019, 7:00 AM

Update to use PPCallbacks::SourceRangeSkipped

sammccall added inline comments.Oct 7 2019, 7:00 AM
clang-tools-extra/clangd/SemanticHighlighting.cpp
191

I think this comment could be clearer, e.g. // Add tokens indicating lines skipped by the preprocessor.

203

Three approaches seem feasible here:

  1. clients that know about the specific scope can extend it to the whole line.
  2. [0,0) or so indicates "highlight the whole line"
  3. use a dedicated property for line styles (vs token styles)

3 is clearly better than 2 I think, it's more explicit. I don't have a strong opinion of 1 vs 3, but if going with 1 I think it's a good idea to measure the line as Haojian says, so we at least get a basic version of the feature if the client doesn't know about line styles.

I guess one consideration when evaluating these options is, do we expect to use that "list of line-styles" for anything else in the future? I can't think of anything at the moment

Preprocessor directives maybe? (Though these are easy enough for clients to highlight with regex)

nridge marked 5 inline comments as done.Oct 7 2019, 7:02 AM
nridge added inline comments.
clang-tools-extra/clangd/Compiler.cpp
66 ↗(On Diff #222336)

Yes, PPCallbacks::SourceRangeSkipped works in place of using DetailedRecord. Thank you for the suggestion.

clang-tools-extra/clangd/SemanticHighlighting.h
45

I changed it to "InactiveCode". ("Unreachable" seemed like the wrong word, it brings to mind control flow.)

nridge marked 2 inline comments as done.Oct 7 2019, 7:04 AM

How would one even measure the line length? SourceManager doesn't sem to have a method like getLineLength() or similar.

How would one even measure the line length? SourceManager doesn't sem to have a method like getLineLength() or similar.

If you look at functions like offsetToPosition in SourceCode.h, basically you get the buffer as a string (contains utf-8), find the offset you're interested in, and then zoom around looking for \n. Once you have a substring, calling lspLength() on it will give you the length in UTF-16 or whatever LSP is speaking at the moment.

hokein added a comment.Oct 7 2019, 8:07 AM

How would one even measure the line length? SourceManager doesn't sem to have a method like getLineLength() or similar.

Yes, there is no existing API for that, I think you'd need to get the source code from the SM at the specific offset, and do the \n counting.

clang-tools-extra/clangd/ParsedAST.h
100 ↗(On Diff #223577)

Instead of adding new member and methods in ParsedAST, I think we can do it in CollectMainFileMacros (adding a new field SkippRanges in MainFileMacros), then we can get the skipped ranges for preamble region within the main file for free.

clang-tools-extra/clangd/SemanticHighlighting.cpp
203

I can't say whether highlighting the line is better than highlighting the range of the line text, but below is the how the inactive TS code is highlighted in VSCode (only the range of text), I personally prefer this style.

nridge added a comment.Oct 7 2019, 8:15 AM

One thing that may be worth considering as well, is that if the client prefers to highlight the text of the line only, it can calculate the length of the line itself. In VSCode for instance, the line lengths are readily available; I imagine other editors are similar since they need that information for many purposes.

One thing that may be worth considering as well, is that if the client prefers to highlight the text of the line only, it can calculate the length of the line itself. In VSCode for instance, the line lengths are readily available; I imagine other editors are similar since they need that information for many purposes.

So I don't think clients will/should prefer that - for best rendering they should know this is a line highlight.

I think this comes down to how line highlights are represented in the protocol:

  • by a separate field: no need to send line length
  • by a special token bounds (e.g. [0,0)): no need to send line length
  • by a special scope: sending line length is a nice-to-have as it provides graceful degradation for clients that don't understand this extension
clang-tools-extra/clangd/SemanticHighlighting.cpp
203

I think that's an argument for making sure clients clearly distinguish between regular tokens and marking lines: overlapping tokens don't compose well, but we can easily say lines and token styles should compose.

(That particular style is not for me, but it doesn't matter)

So I don't think clients will/should prefer that - for best rendering they should know this is a line highlight.

I have actually seen clients that just make the text gray and it looks pretty nice (ReSharper for Visual Studio and IntelliJ IDEA definitely do that).
It also lets them consistently highlight part of the line (e.g. dead expressions or statements can be marked in gray even if they are on the same line).

So I wouldn't be so sure, at least some clients might prefer to do exclusively range (as opposed to line-based) highlightings.

PS Sorry if this was discussed before and I'm just missing context.

nridge planned changes to this revision.Oct 8 2019, 8:48 PM
nridge marked an inline comment as done.
nridge added inline comments.
clang-tools-extra/clangd/SemanticHighlighting.cpp
203

I find this a convincing argument for using line styles, thanks.

Especially since, at some point in the future, I would like us to be able to produce regular token highlightings for inactive code, much like in that TS code screenshot.

nridge added a comment.Oct 8 2019, 8:53 PM

I have actually seen clients that just make the text gray and it looks pretty nice (ReSharper for Visual Studio and IntelliJ IDEA definitely do that).

I don't think there is a conflict between this style, and using line highlightings: if the style being applied is a foreground color (gray text) as opposed to a background color, then it will look the same regardless of whether you apply it as a line style, or as a character range style.

It also lets them consistently highlight part of the line (e.g. dead expressions or statements can be marked in gray even if they are on the same line).

Highlighting part of a line is not applicable to inactive preprocessor branches in C++.

If we later introduce highlightings for dead expressions or statements, we can of course use character ranges and not lines for them.

It also lets them consistently highlight part of the line (e.g. dead expressions or statements can be marked in gray even if they are on the same line).

Highlighting part of a line is not applicable to inactive preprocessor branches in C++.

Note that the opposite is true - inactive preprocessor branches can be expressed as range-based highlightings.

If we later introduce highlightings for dead expressions or statements, we can of course use character ranges and not lines for them.

Then we'll have two modes of operation - line-based and range-based highlightings. Clients will have to support both, we'll have code dealing with both.
The alternative is having only range-based highlightings and using them everywhere. That's the main reason I would advocate for range-based.

Just noticed the next version of LSP added diagnostic tags for things like "unused field" or "dead code":
https://microsoft.github.io/language-server-protocol/specifications/specification-3-15, search for DiagnosticTag.

So I guess we won't need ever add range-based highlightings for dead code (and similar), we'll use diagnostics instead.

Concerns about having two kinds of highlightings to handle on the clients are still there, though.

nridge added a comment.Oct 9 2019, 8:14 AM

It also lets them consistently highlight part of the line (e.g. dead expressions or statements can be marked in gray even if they are on the same line).

Highlighting part of a line is not applicable to inactive preprocessor branches in C++.

Note that the opposite is true - inactive preprocessor branches can be expressed as range-based highlightings.

Except that if we want to allow background styling, the client would need to special-case the highlighting scope to know to apply the background style to the entire line.

nridge updated this revision to Diff 224480.Oct 10 2019, 2:33 PM

Updated to use as isInactive flag on SemanticHighlightingInformation

Ok, I updated the patch to convey the line highlight separately from the token highlights.

I did use the simplified approach I mentioned, where I use a single isInactive flag. If you'd prefer the more general approach where we send an array of line styles, I can do that, though my personal preference would be to wait until we have an actual use case before generalizing.

There are still no tests yet, as I'm still not sure whether we have agreement on the approach. If you like the approach, please let me know, and I'll follow up with tests.

Review / feedback ping :)

Just a few NITs from my side. I'll let the other review the actual implementation

clang-tools-extra/clangd/ParsedAST.cpp
217 ↗(On Diff #224480)

Maybe keep only ranges from the main file?

clang-tools-extra/clangd/ParsedAST.h
100 ↗(On Diff #224480)

NIT: please return llvm::ArrayRef<SourceRange> instead

nridge updated this revision to Diff 226563.Oct 26 2019, 9:09 PM

Addressed nits

Thanks, I'm fine with the current approach, feel free to add unittests.

clang-tools-extra/clangd/Protocol.h
1212

could you add some documentation describing this is a clangd extension?

clang-tools-extra/clangd/SemanticHighlighting.cpp
426

it took me a while to understand this code,

If the NewLine is IsInactive, it just contains a single token whose range is [0, 0), can't we just?

if (NewLine.back().Tokens.empty()) continue;

bool InactiveLine = NewLine.back().Tokens.front().Kind == InactiveCode;
assert(InactiveLine && NewLine.back().Tokens.size() == 1 && "IncativeCode must have a single token");
DiffedLines.back().IsInactive = true;
clang-tools-extra/clangd/SemanticHighlighting.h
46

we should document this Kind as it is different than other kinds, it is for line-style I believe?

nridge updated this revision to Diff 227374.Oct 31 2019, 4:39 PM
nridge marked 3 inline comments as done.

Addressed some nits and got existing tests to pass

Will follow up with new tests in the next update

clang-tools-extra/clangd/SemanticHighlighting.cpp
426

An inactive line can contain token highlightings as well. For example, we highlight macro references in the condition of an #ifdef, and that line is also inactive if the condition is false. Clients can merge the line style (which is typically a background color) with the token styles (typically a foreground color).

I did expand the comment to explain what the loop is doing more clearly.

nridge updated this revision to Diff 228959.Nov 12 2019, 2:15 PM

Add unit tests

nridge retitled this revision from [WIP] [clangd] Add support for an inactive regions notification to [clangd] Inactive regions support as an extension to semantic highlighting.Nov 12 2019, 2:16 PM
nridge updated this revision to Diff 228960.Nov 12 2019, 2:18 PM
nridge marked 11 inline comments as done.

Address one minor remaining comment

clang-tools-extra/clangd/SemanticHighlighting.cpp
194

Your observation is correct: the current implementation only highlights inactive lines after the preamble. For now, I added a test case with a FIXME for this.

Harbormaster completed remote builds in B40844: Diff 228960.
hokein added inline comments.Nov 15 2019, 2:08 AM
clang-tools-extra/clangd/ParsedAST.h
100 ↗(On Diff #223577)

This comment is undone.

clang-tools-extra/clangd/SemanticHighlighting.cpp
194

I'd prefer to fix it in this patch, it should not require large effort, and probably simplify the patch, you don't need to implement a new PPCallbacks. See my another comment.

nridge marked an inline comment as done.Nov 16 2019, 6:30 PM
nridge added inline comments.
clang-tools-extra/clangd/ParsedAST.h
100 ↗(On Diff #223577)

My apologies, I overlooked this.

nridge updated this revision to Diff 229707.Nov 16 2019, 6:30 PM

Support preamble as well

nridge marked 2 inline comments as done.Nov 16 2019, 6:31 PM

thanks, mostly good, a few more nits.

clang-tools-extra/clangd/ParsedAST.h
134 ↗(On Diff #229707)

nit: it is not used anymore, remove it.

clang-tools-extra/clangd/Protocol.h
1212

could you document that the line-style highlightings can be combined with any token-style highlightings?

clang-tools-extra/clangd/SemanticHighlighting.cpp
426

thanks, I see.

I think we can still simplify the code like below, this could improve the code readability, and avoid the comment explaining it.

llvm::erase_if(AddedLine.Tokens, [](const Token& T) { return T.Kind == InactiveCode;});
nridge updated this revision to Diff 230195.Nov 19 2019, 9:44 PM
nridge marked 4 inline comments as done.

Address nits

clang-tools-extra/clangd/SemanticHighlighting.cpp
426

Done. Note that we still need to set AddedLine.IsInactive appropriately. I did that in the lambda, which feels like a strange thing for an erase_if predicate to do. The alternative would be doing a count_if first (but then we're looping over the tokens twice).

hokein accepted this revision.Nov 20 2019, 12:50 AM

looks good, thanks!

This revision is now accepted and ready to land.Nov 20 2019, 12:50 AM
This revision was automatically updated to reflect the committed changes.