Added a function for diffing highlightings and removing duplicate lines. It also returns empty lines if the IDE should remove previous highlighting tokens on the line. Integrated with state into the highlighting generation flow. Only works correctly as long as there are no multiline tokens.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
A few suggestions from me, I haven't looked into the diffing algorithm and the tests yet.
clang-tools-extra/clangd/ClangdLSPServer.cpp | ||
---|---|---|
458 ↗ | (On Diff #210527) | Should can't we handle this on didClose instead? |
1115 ↗ | (On Diff #210527) | NIT: maybe std::move() into Old? If we had exceptions, that would be complicated, but we don't have them |
1124 ↗ | (On Diff #210527) | To follow the same pattern as diagnostics, could we store before calling publish...? |
clang-tools-extra/clangd/SemanticHighlighting.cpp | ||
276 ↗ | (On Diff #210527) | Do you have any ideas on how we can fix this? I would simply split those tokens into multiple tokens of the same kind at newlines boundaries, but maybe there are other ways to handle this. In any case, could you put a suggested way to fix this (in case someone will want to pick it up, they'll have a starting point) |
clang-tools-extra/clangd/SemanticHighlighting.h | ||
79 ↗ | (On Diff #210527) | Are we locked into the line-based diff implementation? Does the protocol have a way to communicate this cleanly? |
clang-tools-extra/clangd/ClangdLSPServer.cpp | ||
---|---|---|
458 ↗ | (On Diff #210527) | We are removing in didClose but the problem is that there is a race condition (I think). If someone does some edits and closes the document right after, the highlightings for the final edit might finish being generated after the FileToHighlightings have earsed the highlightings for the file. So next time when opening the file we will have those final highlightings that were generated for the last edit in the map. However I've head that ASTWorked is synchronous for a single file, is that the case for the didClose call as well? Because in that case there is no race condition. |
clang-tools-extra/clangd/SemanticHighlighting.cpp | ||
276 ↗ | (On Diff #210527) | Yeah, I would split the tokens into multiple lines as well. Or enforce that a token is single line and handle it in addToken in the collector. (i.e. change the HighlightingToken struct) It's a bit annoying to solve because we'd have to get the line lengths of every line that the multiline length token covers when doing that. |
clang-tools-extra/clangd/SemanticHighlighting.h | ||
79 ↗ | (On Diff #210527) | We are looked into some kind of line based diff implementation as the protocol sends token line by line. |
clang-tools-extra/clangd/SemanticHighlighting.cpp | ||
---|---|---|
274 ↗ | (On Diff #210527) | NIT: maybe rename to New and Old, the suffix of the name could be easily inferred from the variable type (clangd has hover/go-to-definition to find the type quickly, the code is rather small so its should always be visible on the screen too). |
276 ↗ | (On Diff #210527) | NIT: add assertions that highlightings are sorted. |
286 ↗ | (On Diff #210527) | NIT: maybe mention diff in the name somehow. I was confused at first, as I thought it's all highlightings grouped by line, not the diff. |
292 ↗ | (On Diff #210527) | NIT: maybe just inline NewEnd and OldEnd to save a few lines? |
294 ↗ | (On Diff #210527) | Could we skip directly to the next interesting line? auto NextLine = [&]() { int NextNew = NewLine.end() != NewHighlightings.end() ? NewLine.end()->start.line : numeric_limits<int>::max(); int NextOld = ...; return std::min(NextNew, NextOld); } for (int Line = 0; ...; Line = NextLine()) { } |
clang-tools-extra/clangd/unittests/SemanticHighlightingTests.cpp | ||
---|---|---|
307 ↗ | (On Diff #210527) | Can we test this in a more direct manner by specifying:
The resulting tests don't have to be real C++ then, allowing to express what we're testing in a more direct manner. {/*Old*/ "$Variable[[a]]", /*New*/ "$Class[[a]]", /*Diff*/ "{{/*line */0, "$Class[[a]]"}} It also seems that the contents of the lines could even be checked automatically (they should be equal to the corresponding line from /*New*/, right?), that leaves us with even simpler inputs: {/*Old*/ "$Variable[[a]]", /*New*/ "$Class[[a]]", /*DiffLines*/ "{0}} |
391 ↗ | (On Diff #210527) | NIT: remove braces |
clang-tools-extra/clangd/ClangdLSPServer.cpp | ||
---|---|---|
458 ↗ | (On Diff #210527) | You are correct, there is actually a race condition. We worked hard to eliminate it for diagnostics, but highlightings are going through a different code path in ASTWorker, not guarded by DiagsMu and ReportDiagnostics. And, unfortunately, I don't think this guard here prevents it in all cases. In particular, there is still a possibility (albeit very low, I guess) that the old highlightings you are trying to remove here are still being computed. If they are reported after this erase() runs, we will end up with inconsistent highlightings. Ideally, we would guard the diagnostics callback with the same mutex, but given our current layering it seems like a hard problem, will need to think what's the simplest way to fix this. |
clang-tools-extra/clangd/SemanticHighlighting.h | ||
79 ↗ | (On Diff #210527) | It seems there no compact way to send interesting diffs then, although the clients can do a reasonably good job of moving the older version of highlightings on changes until the server sends them a new version. The diff-based approach only seems to be helping with changes on a single line. Which is ok, thanks for the explanation. |
Fixed test case that was actually (kinda) broken even though it didn't fail.
(The last two edits in semantic-highlighting test. They removed the = 2 part and did not insert an extra ; which produced invalid code. It was still fine because the highlightings that were supposed to be generated were still generated. This just makes the code not be invalid)
clang-tools-extra/clangd/SemanticHighlighting.cpp | ||
---|---|---|
292 ↗ | (On Diff #210527) | Wouldn't that violate code style: https://llvm.org/docs/CodingStandards.html#don-t-evaluate-end-every-time-through-a-loop ? |
clang-tools-extra/clangd/unittests/SemanticHighlightingTests.cpp | ||
307 ↗ | (On Diff #210527) | That's a great idea on how to write these tests. |
clang-tools-extra/clangd/ClangdLSPServer.cpp | ||
---|---|---|
458 ↗ | (On Diff #210527) | The race condition of highlighting sounds bad (especially when a user opens a large file and closes it immediately, then the highlighting is finished and we emit it to the client). No need to fix it in this patch, just add a FIXME. Can we use the same mechanism for Diagnostic to guard the highlighting here? or use the DiagsMu and ReportDiagnostics to guard the callback.onMainAST() as well (which is probably not a good idea)... |
clang-tools-extra/clangd/unittests/SemanticHighlightingTests.cpp | ||
307 ↗ | (On Diff #210527) | hmm, I have a different opinion here (I'd prefer the previous one), let's chat. |
clang-tools-extra/clangd/unittests/SemanticHighlightingTests.cpp | ||
---|---|---|
307 ↗ | (On Diff #210613) | @hokein rightfully pointed out that mentioning all changed lines makes the tests unreadable. {/*Before*/ R"( $Var[[a]] $Func[[b]] "), /*After*/ R"( $Var[[a]] ^$Func[[b]] )"} // and no list of lines is needed! Could we do that here? One interesting case that we can't test this way to removing lines from the end of the file. But for that particular case, could we just write a separate test case? |
The fix for a race condition on remove has landed in rL366577, this revision would need a small update after it.
Fixed to work with that patch.
Should the diffing be moved somewhere else that is not inside the publish function though? That would require moving the state outside the LSP server though and handle the onClose/onOpen events somewhere else though.
Maybe it's ok to keep the diffing in the publish lock?
clang-tools-extra/clangd/unittests/SemanticHighlightingTests.cpp | ||
---|---|---|
307 ↗ | (On Diff #210613) | We don't want to send diffs that are beyond the end of the file. There is a testcase for that as well (the count of newlines in the new code is sent to the differ as the number of lines in the file). |
Yeah, let's keep it in publish for now. Diffing in there does not change complexity of the operations.
If it ever turns out that the constant factor is too high, we can figure out ways to fight this.
Mostly final NITs from me, but there is an important one about removing the erase call on didOpen.
clang-tools-extra/clangd/ClangdLSPServer.cpp | ||
---|---|---|
467 ↗ | (On Diff #211890) | Now that the patch landed, this is obsolete. Could you remove? |
clang-tools-extra/clangd/ClangdServer.cpp | ||
80 ↗ | (On Diff #211890) | NIT: use unsigned instead of unsigned int |
clang-tools-extra/clangd/ClangdServer.h | ||
56 ↗ | (On Diff #211890) | NIT: could you document NLines |
clang-tools-extra/clangd/SemanticHighlighting.cpp | ||
294 ↗ | (On Diff #211890) | Could we try to find a better name for this? Having Line and NextLine() which represent line numbers and NewLine which represents highlightings, creates some confusion. |
310 ↗ | (On Diff #211890) | Line != intmax is redundant (NLines is <= intmax by definition). |
clang-tools-extra/clangd/SemanticHighlighting.h | ||
80 ↗ | (On Diff #211890) | Could you document what NLines is? Specifically, whether it's the number of lines for New or Old. |
clang-tools-extra/clangd/unittests/SemanticHighlightingTests.cpp | ||
60 ↗ | (On Diff #211890) | NIT: add documentation on how this should be used |
66 ↗ | (On Diff #211890) | NIT: use llvm::DenseMap |
a few more comments from my side.
clang-tools-extra/clangd/SemanticHighlighting.cpp | ||
---|---|---|
309 ↗ | (On Diff #211890) | nit: I believe theia is crashing because of this? could we file a bug to theia? I think it would be nice for client to be robust on this case. |
clang-tools-extra/clangd/SemanticHighlighting.h | ||
70 ↗ | (On Diff #211890) | nit: add , before this |
73 ↗ | (On Diff #211890) | could you refine this sentence, I can't parse it. |
74 ↗ | (On Diff #211890) | and here as well. |
80 ↗ | (On Diff #211890) | could we use a more describe name for NLines? maybe MaxLines |
clang-tools-extra/clangd/SemanticHighlighting.cpp | ||
---|---|---|
294 ↗ | (On Diff #211890) | I renamed the Line and NextLine() instead. Could also rename NewLine to something like NewLineHighlightings but I feel like it almost becomes to verbose at that point. |
309 ↗ | (On Diff #211890) | I seem to be misremembering, when I first started testing in theia I think I was crashing the client (due to a broken base64 encoding function, will take a look and see what was actually happening, can't quite remember) So this could be removed, it will just sometimes return highlightings that are outside of the file and hopefully any future clients would handle that as well. |
clang-tools-extra/clangd/ClangdServer.cpp | ||
---|---|---|
81 ↗ | (On Diff #212301) | from the comment of the getLineNumber, this is not cheap to call this method. We may use SM.getBufferData(MainFileID).count('\n') to count the line numbers. // This requires building and caching a table of line offsets for the /// MemoryBuffer, so this is not cheap: use only when about to emit a /// diagnostic. |
clang-tools-extra/clangd/ClangdServer.h | ||
55 ↗ | (On Diff #212301) | nit: drop the needed ..., this is an interface, don't mention the details of subclass (diffs are subclass implementation details). Maybe name it "NumLines"? |
clang-tools-extra/clangd/ClangdLSPServer.cpp | ||
---|---|---|
1125 ↗ | (On Diff #212301) | NIT: avoid copying (from Highlightings into the map) under a lock, make the copy outside the lock instead |
clang-tools-extra/clangd/unittests/SemanticHighlightingTests.cpp | ||
408 ↗ | (On Diff #212301) | Could you also add a separate test that checks diffs when the number of lines is getting smaller (i.e. taking NLines into account) |
clang-tools-extra/clangd/unittests/SemanticHighlightingTests.cpp | ||
---|---|---|
408 ↗ | (On Diff #212301) | I would expect this to be better handled outside checkDiffedHighlights to avoid over-generalizing this function. But up to you. |
clang-tools-extra/clangd/unittests/SemanticHighlightingTests.cpp | ||
---|---|---|
408 ↗ | (On Diff #212301) | That's what this test does: { R"( $Class[[A]] $Variable[[A]] $Class[[A]] $Variable[[A]] )", R"( $Class[[A]] $Variable[[A]] )"}, But do you mean I should do a testcase like: { R"( $Class[[A]] $Variable[[A]] $Class[[A]] $Variable[[A]] )", R"( $Class[[A]] $Variable[[A]] $Class[[A]] $Variable[[A]] )"}, And just set NLines = 2 instead when doing the diffing? |
clang-tools-extra/clangd/ClangdLSPServer.cpp | ||
---|---|---|
1125 ↗ | (On Diff #212301) | I think we can even avoid copy, since we only use the Highlightings for calculating the diff. Maybe just { std::lock_guard<std::mutex> Lock(HighlightingsMutex); Old = std::move(FileToHighlightings[File]); } // calculate the diff. { std::lock_guard<std::mutex> Lock(HighlightingsMutex); FileToHighlightings[File] = std::move(Highlightings); } |
clang-tools-extra/clangd/ClangdLSPServer.cpp | ||
---|---|---|
1125 ↗ | (On Diff #212301) | I think I changed this to only lock once (and copy instead) at the same time me and @ilya-biryukov were talking about the race condition (?) |
LGTM from my side
clang-tools-extra/clangd/ClangdLSPServer.cpp | ||
---|---|---|
1125 ↗ | (On Diff #212301) | It means there's a window where nobody can access the highlightings for this file. Which is probably fine, we really do use this only for computing the diff and nothing else. But doing the copy shouldn't matter here either, so leaving as is also looks ok from my side. If you decide to avoid an extra copy, please add comments to FileToHighlightings (e.g. only used to compute diffs, must not be used outside onHighlightingsReady and didRemove). Don't remember exact details about race conditions, but we should be good now that it's called inside Publish() |
clang-tools-extra/clangd/unittests/SemanticHighlightingTests.cpp | ||
408 ↗ | (On Diff #212301) | Ah, the test actually needs the set of changed lines to be exactly the same as marked lines. |
clang-tools-extra/clangd/ClangdLSPServer.cpp | ||
---|---|---|
1125 ↗ | (On Diff #212301) | I'm fine with the current state, avoiding the copy cost would make the code here a bit mess... |