Page MenuHomePhabricator

[clangd] Duplicate lines of semantic highlightings sent removed.
ClosedPublic

Authored by jvikstrom on Jul 10 2019, 4:29 AM.

Details

Summary

Added a function for diffing highlightings and removing duplicate lines. It also returns empty lines if the IDE should remove previous highlighting tokens on the line. Integrated with state into the highlighting generation flow. Only works correctly as long as there are no multiline tokens.

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

A few suggestions from me, I haven't looked into the diffing algorithm and the tests yet.

clang-tools-extra/clangd/ClangdLSPServer.cpp
458

Should can't we handle this on didClose instead?

1115

NIT: maybe std::move() into Old?

If we had exceptions, that would be complicated, but we don't have them

1124

To follow the same pattern as diagnostics, could we store before calling publish...?

clang-tools-extra/clangd/SemanticHighlighting.cpp
243

Do you have any ideas on how we can fix this?

I would simply split those tokens into multiple tokens of the same kind at newlines boundaries, but maybe there are other ways to handle this.

In any case, could you put a suggested way to fix this (in case someone will want to pick it up, they'll have a starting point)

clang-tools-extra/clangd/SemanticHighlighting.h
78

Are we locked into the line-based diff implementation?
It works nicely while editing inside the same line, but seem to do a bad job on a common case of inserting/removing lines.

Does the protocol have a way to communicate this cleanly?

jvikstrom marked 6 inline comments as done.Jul 18 2019, 7:53 AM
jvikstrom added inline comments.
clang-tools-extra/clangd/ClangdLSPServer.cpp
458

We are removing in didClose but the problem is that there is a race condition (I think).

If someone does some edits and closes the document right after, the highlightings for the final edit might finish being generated after the FileToHighlightings have earsed the highlightings for the file. So next time when opening the file we will have those final highlightings that were generated for the last edit in the map.
I don't really know how we could handle this in didClose without having another map and with an open/closed bit and checking that every time we generate new highlightings. But we'd still have to set the bit to be open every didOpen so I don't really see the benefit.

However I've head that ASTWorked is synchronous for a single file, is that the case for the didClose call as well? Because in that case there is no race condition.

clang-tools-extra/clangd/SemanticHighlighting.cpp
243

Yeah, I would split the tokens into multiple lines as well. Or enforce that a token is single line and handle it in addToken in the collector. (i.e. change the HighlightingToken struct)

It's a bit annoying to solve because we'd have to get the line lengths of every line that the multiline length token covers when doing that.

clang-tools-extra/clangd/SemanticHighlighting.h
78

We are looked into some kind of line based diff implementation as the protocol sends token line by line.
There should be away of solving the inserting/removing lines, but I'll have a look at that in a follow up.
Theia seems to do a good job of moving tokens to where they should be automatically when inserting a new line though. I want to be able to see how vscode handles it first as well though before.

ilya-biryukov added inline comments.Jul 18 2019, 7:53 AM
clang-tools-extra/clangd/SemanticHighlighting.cpp
241

NIT: maybe rename to New and Old, the suffix of the name could be easily inferred from the variable type (clangd has hover/go-to-definition to find the type quickly, the code is rather small so its should always be visible on the screen too).

243

NIT: add assertions that highlightings are sorted.

253

NIT: maybe mention diff in the name somehow. I was confused at first, as I thought it's all highlightings grouped by line, not the diff.

259

NIT: maybe just inline NewEnd and OldEnd to save a few lines?

261

Could we skip directly to the next interesting line?
The simplest way to do this I could come up with is this:

auto NextLine = [&]() {
  int NextNew = NewLine.end() != NewHighlightings.end() ? NewLine.end()->start.line : numeric_limits<int>::max();
  int NextOld = ...;
  return std::min(NextNew, NextOld);
}

for (int Line = 0; ...; Line = NextLine()) {
}
jvikstrom updated this revision to Diff 210567.Jul 18 2019, 7:54 AM
jvikstrom edited the summary of this revision. (Show Details)

Added additional note to FIXME on how to solve and also move(ing) Old.

ilya-biryukov added inline comments.Jul 18 2019, 8:04 AM
clang-tools-extra/clangd/unittests/SemanticHighlightingTests.cpp
272

Can we test this in a more direct manner by specifying:

  1. annotated input for old highlightings,
  2. annotated input for new highlightings,
  3. the expected diff?

The resulting tests don't have to be real C++ then, allowing to express what we're testing in a more direct manner.

{/*Old*/ "$Variable[[a]]", /*New*/ "$Class[[a]]", /*Diff*/ "{{/*line */0, "$Class[[a]]"}}

It also seems that the contents of the lines could even be checked automatically (they should be equal to the corresponding line from /*New*/, right?), that leaves us with even simpler inputs:

{/*Old*/ "$Variable[[a]]", /*New*/ "$Class[[a]]", /*DiffLines*/ "{0}}
356

NIT: remove braces

ilya-biryukov marked an inline comment as done.Jul 18 2019, 8:17 AM
ilya-biryukov added inline comments.
clang-tools-extra/clangd/ClangdLSPServer.cpp
458

You are correct, there is actually a race condition. We worked hard to eliminate it for diagnostics, but highlightings are going through a different code path in ASTWorker, not guarded by DiagsMu and ReportDiagnostics.

And, unfortunately, I don't think this guard here prevents it in all cases. In particular, there is still a possibility (albeit very low, I guess) that the old highlightings you are trying to remove here are still being computed. If they are reported after this erase() runs, we will end up with inconsistent highlightings.

Ideally, we would guard the diagnostics callback with the same mutex, but given our current layering it seems like a hard problem, will need to think what's the simplest way to fix this.

clang-tools-extra/clangd/SemanticHighlighting.h
78

It seems there no compact way to send interesting diffs then, although the clients can do a reasonably good job of moving the older version of highlightings on changes until the server sends them a new version.

The diff-based approach only seems to be helping with changes on a single line.

Which is ok, thanks for the explanation.

jvikstrom updated this revision to Diff 210600.Jul 18 2019, 9:22 AM
jvikstrom marked 7 inline comments as done.

Address comments and fixed bug where we'd send diffs outside of the file.

jvikstrom updated this revision to Diff 210601.Jul 18 2019, 9:23 AM

Remove braces.

Harbormaster completed remote builds in B35277: Diff 210601.
jvikstrom updated this revision to Diff 210606.Jul 18 2019, 9:40 AM

Fixed test case that was actually (kinda) broken even though it didn't fail.

(The last two edits in semantic-highlighting test. They removed the = 2 part and did not insert an extra ; which produced invalid code. It was still fine because the highlightings that were supposed to be generated were still generated. This just makes the code not be invalid)

jvikstrom updated this revision to Diff 210613.Jul 18 2019, 9:50 AM

Removed one lock from onSemanticHighlighting.

jvikstrom added inline comments.Jul 18 2019, 10:34 AM
clang-tools-extra/clangd/SemanticHighlighting.cpp
259

Wouldn't that violate code style: https://llvm.org/docs/CodingStandards.html#don-t-evaluate-end-every-time-through-a-loop ?
Or maybe that doesn't matter? (I have no preference either way, just picked this way as I could remeber reading from the styleguide)

clang-tools-extra/clangd/unittests/SemanticHighlightingTests.cpp
272

That's a great idea on how to write these tests.

hokein added inline comments.Jul 19 2019, 2:28 AM
clang-tools-extra/clangd/ClangdLSPServer.cpp
458

The race condition of highlighting sounds bad (especially when a user opens a large file and closes it immediately, then the highlighting is finished and we emit it to the client). No need to fix it in this patch, just add a FIXME.

Can we use the same mechanism for Diagnostic to guard the highlighting here? or use the DiagsMu and ReportDiagnostics to guard the callback.onMainAST() as well (which is probably not a good idea)...

clang-tools-extra/clangd/unittests/SemanticHighlightingTests.cpp
272

hmm, I have a different opinion here (I'd prefer the previous one), let's chat.

ilya-biryukov added inline comments.Jul 19 2019, 6:39 AM
clang-tools-extra/clangd/unittests/SemanticHighlightingTests.cpp
276

@hokein rightfully pointed out that mentioning all changed lines makes the tests unreadable.
An alternative idea we all came up with is to force people to put ^ on each of the changed lines inside the NewCode, i.e.

{/*Before*/ R"(
  $Var[[a]]
  $Func[[b]]
"),
 /*After*/ R"(
  $Var[[a]]
 ^$Func[[b]]
)"} // and no list of lines is needed!

Could we do that here?

One interesting case that we can't test this way to removing lines from the end of the file. But for that particular case, could we just write a separate test case?

The fix for a race condition on remove has landed in rL366577, this revision would need a small update after it.

jvikstrom updated this revision to Diff 211890.Jul 26 2019, 1:21 AM
jvikstrom marked 2 inline comments as done.

Made tests more readable. Updated to work with rL366577.

The fix for a race condition on remove has landed in rL366577, this revision would need a small update after it.

Fixed to work with that patch.
Should the diffing be moved somewhere else that is not inside the publish function though? That would require moving the state outside the LSP server though and handle the onClose/onOpen events somewhere else though.
Maybe it's ok to keep the diffing in the publish lock?

clang-tools-extra/clangd/unittests/SemanticHighlightingTests.cpp
276

We don't want to send diffs that are beyond the end of the file. There is a testcase for that as well (the count of newlines in the new code is sent to the differ as the number of lines in the file).

The fix for a race condition on remove has landed in rL366577, this revision would need a small update after it.

Fixed to work with that patch.
Should the diffing be moved somewhere else that is not inside the publish function though? That would require moving the state outside the LSP server though and handle the onClose/onOpen events somewhere else though.
Maybe it's ok to keep the diffing in the publish lock?

Yeah, let's keep it in publish for now. Diffing in there does not change complexity of the operations.
If it ever turns out that the constant factor is too high, we can figure out ways to fight this.

Mostly final NITs from me, but there is an important one about removing the erase call on didOpen.

clang-tools-extra/clangd/ClangdLSPServer.cpp
459

Now that the patch landed, this is obsolete. Could you remove?

clang-tools-extra/clangd/ClangdServer.cpp
80 ↗(On Diff #211890)

NIT: use unsigned instead of unsigned int

clang-tools-extra/clangd/ClangdServer.h
56 ↗(On Diff #211890)

NIT: could you document NLines

clang-tools-extra/clangd/SemanticHighlighting.cpp
260

Could we try to find a better name for this? Having Line and NextLine() which represent line numbers and NewLine which represents highlightings, creates some confusion.

276

Line != intmax is redundant (NLines is <= intmax by definition).
Maybe remove it altogether to simplify the loop condition?

clang-tools-extra/clangd/SemanticHighlighting.h
79

Could you document what NLines is? Specifically, whether it's the number of lines for New or Old.

clang-tools-extra/clangd/unittests/SemanticHighlightingTests.cpp
75

NIT: add documentation on how this should be used
Most importantly, the fact that we need to put ^ on all changed lines should be mentioned.

81

NIT: use llvm::DenseMap

a few more comments from my side.

clang-tools-extra/clangd/SemanticHighlighting.cpp
275

nit: I believe theia is crashing because of this? could we file a bug to theia? I think it would be nice for client to be robust on this case.

clang-tools-extra/clangd/SemanticHighlighting.h
69

nit: add , before this

72

could you refine this sentence, I can't parse it.

73

and here as well.

79

could we use a more describe name for NLines? maybe MaxLines

jvikstrom updated this revision to Diff 212301.Jul 30 2019, 2:26 AM
jvikstrom marked 14 inline comments as done.

Address comments.

jvikstrom added inline comments.Jul 30 2019, 2:33 AM
clang-tools-extra/clangd/SemanticHighlighting.cpp
260

I renamed the Line and NextLine() instead. Could also rename NewLine to something like NewLineHighlightings but I feel like it almost becomes to verbose at that point.

275

I seem to be misremembering, when I first started testing in theia I think I was crashing the client (due to a broken base64 encoding function, will take a look and see what was actually happening, can't quite remember)
It actually seems like theia just ignores any out of bounds highlightings.

So this could be removed, it will just sometimes return highlightings that are outside of the file and hopefully any future clients would handle that as well.

hokein added inline comments.Jul 30 2019, 4:51 AM
clang-tools-extra/clangd/ClangdServer.cpp
81 ↗(On Diff #212301)

from the comment of the getLineNumber, this is not cheap to call this method. We may use SM.getBufferData(MainFileID).count('\n') to count the line numbers.

// This requires building and caching a table of line offsets for the
/// MemoryBuffer, so this is not cheap: use only when about to emit a
/// diagnostic.
clang-tools-extra/clangd/ClangdServer.h
55 ↗(On Diff #212301)

nit: drop the needed ..., this is an interface, don't mention the details of subclass (diffs are subclass implementation details).

Maybe name it "NumLines"?

jvikstrom updated this revision to Diff 212522.Jul 31 2019, 1:13 AM
jvikstrom marked 2 inline comments as done.

Stopped using expensive getLineNumber function.

ilya-biryukov added inline comments.Jul 31 2019, 1:17 AM
clang-tools-extra/clangd/ClangdLSPServer.cpp
1116

NIT: avoid copying (from Highlightings into the map) under a lock, make the copy outside the lock instead

clang-tools-extra/clangd/unittests/SemanticHighlightingTests.cpp
373

Could you also add a separate test that checks diffs when the number of lines is getting smaller (i.e. taking NLines into account)

ilya-biryukov added inline comments.Jul 31 2019, 1:18 AM
clang-tools-extra/clangd/unittests/SemanticHighlightingTests.cpp
373

I would expect this to be better handled outside checkDiffedHighlights to avoid over-generalizing this function. But up to you.

jvikstrom updated this revision to Diff 212528.Jul 31 2019, 1:39 AM
jvikstrom marked an inline comment as done.

Copy outside lock.

jvikstrom added inline comments.Jul 31 2019, 1:40 AM
clang-tools-extra/clangd/unittests/SemanticHighlightingTests.cpp
373

That's what this test does:

{
                    R"(
        $Class[[A]]
        $Variable[[A]]
        $Class[[A]]
        $Variable[[A]]
      )",
                    R"(
        $Class[[A]]
        $Variable[[A]]
      )"},

But do you mean I should do a testcase like:

{
                    R"(
        $Class[[A]]
        $Variable[[A]]
        $Class[[A]]
        $Variable[[A]]
      )",
                    R"(
        $Class[[A]]
        $Variable[[A]]
        $Class[[A]]
        $Variable[[A]]
      )"},

And just set NLines = 2 instead when doing the diffing?

hokein added inline comments.Jul 31 2019, 1:43 AM
clang-tools-extra/clangd/ClangdLSPServer.cpp
1116

I think we can even avoid copy, since we only use the Highlightings for calculating the diff.

Maybe just

{
    std::lock_guard<std::mutex> Lock(HighlightingsMutex);
    Old = std::move(FileToHighlightings[File]);
}
// calculate the diff.
{  
     std::lock_guard<std::mutex> Lock(HighlightingsMutex);
     FileToHighlightings[File] = std::move(Highlightings);
}
jvikstrom marked an inline comment as done.Jul 31 2019, 8:20 AM
jvikstrom added inline comments.
clang-tools-extra/clangd/ClangdLSPServer.cpp
1116

I think I changed this to only lock once (and copy instead) at the same time me and @ilya-biryukov were talking about the race condition (?)

ilya-biryukov accepted this revision.Jul 31 2019, 9:01 AM
ilya-biryukov marked an inline comment as done.

LGTM from my side

clang-tools-extra/clangd/ClangdLSPServer.cpp
1116

It means there's a window where nobody can access the highlightings for this file. Which is probably fine, we really do use this only for computing the diff and nothing else.

But doing the copy shouldn't matter here either, so leaving as is also looks ok from my side.

If you decide to avoid an extra copy, please add comments to FileToHighlightings (e.g. only used to compute diffs, must not be used outside onHighlightingsReady and didRemove).
It is declared really close to FixItsMap and looks very similar, but the latter is used in completely different ways; that could lead to confusion.

Don't remember exact details about race conditions, but we should be good now that it's called inside Publish()

clang-tools-extra/clangd/unittests/SemanticHighlightingTests.cpp
373

Ah, the test actually needs the set of changed lines to be exactly the same as marked lines.
In that case, all is good, I've just missed this detail.

hokein accepted this revision.Aug 1 2019, 12:52 AM
hokein added inline comments.
clang-tools-extra/clangd/ClangdLSPServer.cpp
1116

I'm fine with the current state, avoiding the copy cost would make the code here a bit mess...

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptAug 1 2019, 1:08 AM