This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Support incremental document syncing
ClosedPublic

Authored by simark on Mar 8 2018, 1:24 PM.

Details

Summary

This patch adds support for incremental document syncing, as described
in the LSP spec. The protocol specifies ranges in terms of Position (a
line and a character), and our drafts are stored as plain strings. So I
see two things that may not be super efficient for very large files:

  • Converting a Position to an offset (the positionToOffset function) requires searching for end of lines until we reach the desired line.
  • When we update a range, we construct a new string, which implies copying the whole document.

However, for the typical size of a C++ document and the frequency of
update (at which a user types), it may not be an issue. This patch aims
at getting the basic feature in, and we can always improve it later if
we find it's too slow.

Signed-off-by: Simon Marchi <simon.marchi@ericsson.com>

Diff Detail

Repository
rL LLVM

Event Timeline

simark created this revision.Mar 8 2018, 1:24 PM
ilya-biryukov requested changes to this revision.Mar 9 2018, 2:06 AM
ilya-biryukov added a reviewer: ilya-biryukov.
ilya-biryukov added inline comments.
clangd/ClangdLSPServer.cpp
101 ↗(On Diff #137648)

A comment mentioning that 2 means incremental document sync from LSP would be useful here.

149 ↗(On Diff #137648)

We should handle more than a single change here.

clangd/ClangdServer.h
159 ↗(On Diff #137648)

I don't think we should do this on ClangdServer level. We will have to copy the whole file anyway before passing it to clang, so there are no performance wins here and it complicates the interface.
I suggest we update the document text from diffs on ClangdLSPServer level and continue passing the whole document to ClangdServer.
It would mean DraftMgr will need to move to ClangdLSPServer, but it's fine.

clangd/DraftStore.cpp
47 ↗(On Diff #137648)

NIT: LLVM uses UpperCamelCase for parameters and local variables

57 ↗(On Diff #137648)

We should return an error to the client in that case. Changing return type to llvm::Expected<DocVersion> and handling an error in the callers should do the trick.

66 ↗(On Diff #137648)

We need to check that startIndex and endIndex are inside the bounds of the string.

clangd/DraftStore.h
60 ↗(On Diff #137648)

Let's model the LSP more closely here:

// Protocol.h
struct TextDocumentContentChangeEvent {
  llvm::Optional<Range> range;
  llvm::Optional<int>  rangeLength;
  std::string text;
};

/// DraftManager
class DraftManager {
  /// For initial didOpen.
  DocVersion addDraft(PathRef File, std::string Contents);

  /// For didChange, handles both incremental and full updates. \p Changes cannot be empty.
  DocVersion updateDraft(PathRef File, ArrayRef<TextDocumentContentChangeEvent> Changes);
};

Keeping track of LSP versions here could also be useful to make sure we drop any updates in between (which would definitely lead to surprising results), but it's ok if we add a FIXME and do it later.

clangd/Protocol.h
289 ↗(On Diff #137648)

Please add rangeLength from lsp spec too. On one hand, it seems redundant, but it can be used to add assertions that the text is the same.

This revision now requires changes to proceed.Mar 9 2018, 2:06 AM
ilya-biryukov added inline comments.Mar 9 2018, 2:14 AM
clangd/ClangdServer.cpp
557 ↗(On Diff #137648)

When you'll try remove the DraftMgr, this piece of code will be hard to refactor because it relies on DocVersion.
This is a workaround for a possible race condition that should really be handled by TUScheduler rather than this code, but we haven't got around to fixing it yet. (It's on the list, though, would probably get in next week).

A proper workaround for now would be to add llvm::StringMap<uint64_t> InternalVersions field to ClangdServer, increment those versions on each call to addDocument and use them here in the same way we currently use DraftMgr's versions.

simark marked 2 inline comments as done.Mar 9 2018, 8:20 AM
simark added inline comments.
clangd/ClangdLSPServer.cpp
101 ↗(On Diff #137648)

I opted to add a TextDocumentSyncKind enum to Protocol.h and use that. Though instead of an enum, I did a class with static const fields:

struct TextDocumentSyncKind {
  /// Documents should not be synced at all.
  static const int None = 0;

  /// Documents are synced by always sending the full content of the document.
  static const int Full = 1;

  /// Documents are synced by sending the full content on open.  After that
  /// only incremental updates to the document are send.
  static const int Incremental = 2;
};

otherwise it would've required an (int) cast when using it to generate JSON:

{"textDocumentSync", TextDocumentSyncKind::Incremental},
149 ↗(On Diff #137648)

Ok, I'll try that. I am not sure I interpret the spec correctly. If you have two changes, the second change applies to the state of the document after having applied the first change, is that right? If that's the case, I think we only need to iterate on the changes and call addDocument on them sequentially (which could be done in the DraftMgr, given the interface change to DraftStore you suggest lower).

clangd/ClangdServer.h
159 ↗(On Diff #137648)

ClangdServer also needs DraftMgr for forceReparse and reparseOpenedFiles. I guess that too would move to ClangdLSPServer? I'm just not sure how to adapt the unit tests, since we don't have unittests using ClangdLSPServer (yet).

simark added inline comments.Mar 9 2018, 8:20 AM
clangd/ClangdServer.cpp
557 ↗(On Diff #137648)

Hmm ok, it will probably make sense when I try to do it. The InternalVersions map maps URIs to the latest version seen?

Is the race condition because we could get diagnostics for a stale version of the document, so we don't want to emit them?

clangd/DraftStore.cpp
47 ↗(On Diff #137648)

Woops. I should learn to use clang-tidy. It found other instances (the local variables) but it doesn't find the parameters not written in camel case. Do you have an idea why? I dumped the config and see these:

- key:             readability-identifier-naming.ClassCase
  value:           CamelCase
- key:             readability-identifier-naming.EnumCase
  value:           CamelCase
- key:             readability-identifier-naming.UnionCase
  value:           CamelCase
- key:             readability-identifier-naming.VariableCase
  value:           CamelCase

I assume there must be a ParamCase or something like that, but I can't find the exhaustive list of parameters for that check.

66 ↗(On Diff #137648)

Right.

clangd/DraftStore.h
60 ↗(On Diff #137648)

By DraftManager I suppose you mean DraftStore? I'm fine with that interface change.

clangd/Protocol.h
289 ↗(On Diff #137648)

Ok.

simark marked 2 inline comments as done.Mar 9 2018, 8:43 AM
simark added inline comments.
clangd/ClangdServer.h
159 ↗(On Diff #137648)

Actually, forceReparse doesn't really need DraftMgr (it's only used for the assert), only reparseOpenedFiles needs it. So in the test, we can manually call forceReparse on all the files by hand... this just means that reparseOpenedFiles won't be tested anymore.

I uploaded a new patch that moves the draft store to ClangdLSPServer, that implements what you suggested.

https://reviews.llvm.org/D44408

I will update the incremental sync patch once that patch is in.

ilya-biryukov added inline comments.Mar 16 2018, 2:52 AM
clangd/ClangdLSPServer.cpp
149 ↗(On Diff #137648)

I am interpreting it in the same manner, i.e. we need to apply the edits in the sequence they were provided.
But I would double-check on a client that actually send multiple changes. Does VSCode do that?

clangd/ClangdServer.cpp
557 ↗(On Diff #137648)

For the record:
The race only happens if you add a file(version 1), remove the same file(version 2), add the same file again(version 3).
The diagnostics callbacks for version 1 and version 3 are triggered concurrently and we right report (3) and later (1).
It breaks "eventual consistency", i.e. clangd should eventually provide diagnostics for the latest version of the file, and we provide stale diagnostics in that case.

clangd/DraftStore.cpp
47 ↗(On Diff #137648)

There is indeed a readability-idenfitier.ParameterCase key.
Found in the code, it is probably missing from the docs. IdentifierNamingCheck.cpp.
We should probably update the relevant clang-tidy config. (not sure where it is, though, I don't use clang-tidy for clangd development. should really start doing that :-))

ilya-biryukov added inline comments.Mar 16 2018, 4:41 AM
clangd/ClangdLSPServer.cpp
101 ↗(On Diff #137648)

I'd opt for a real enum. Having some extra type-safety and compiler warnings (i.e. non-covered enum value in switch) is much more valuable than a small inconvenience when converting the value back to json and back.

simark marked 3 inline comments as done.Mar 16 2018, 11:39 AM
simark added inline comments.
clangd/ClangdLSPServer.cpp
101 ↗(On Diff #137648)

Makes sense.

149 ↗(On Diff #137648)

I don't know of any client that does that. AFAIK, vscode doesn't:

https://github.com/Microsoft/vscode-languageserver-node/blob/master/client/src/client.ts#L907-L908

It's the only place I could find where a DidChangeTextDocumentParams is created, and it comes from a single TextDocumentChangeEvent, so there will always be only one item in the array.

clangd/ClangdServer.cpp
557 ↗(On Diff #137648)

Ok, thanks.

clangd/DraftStore.cpp
47 ↗(On Diff #137648)

The .clang-tidy file in the root of the llvm repo contains it. But the one in the clang repo overrides the readability-identifier-naming section without defining ParameterCase. That's probably why it's not enabled.

'llvm-header-guard' is enabled in the /home/emaisin/src/llvm/tools/clang/tools/extra/clangd/.clang-tidy.
'llvm-include-order' is enabled in the /home/emaisin/src/llvm/tools/clang/tools/extra/clangd/.clang-tidy.
'llvm-namespace-comment' is enabled in the /home/emaisin/src/llvm/tools/clang/tools/extra/clangd/.clang-tidy.
'llvm-twine-local' is enabled in the /home/emaisin/src/llvm/tools/clang/tools/extra/clangd/.clang-tidy.
'misc-definitions-in-headers' is enabled in the /home/emaisin/src/llvm/tools/clang/tools/extra/clangd/.clang-tidy.
'misc-misplaced-const' is enabled in the /home/emaisin/src/llvm/tools/clang/tools/extra/clangd/.clang-tidy.
'misc-new-delete-overloads' is enabled in the /home/emaisin/src/llvm/tools/clang/tools/extra/clangd/.clang-tidy.
'misc-non-copyable-objects' is enabled in the /home/emaisin/src/llvm/tools/clang/tools/extra/clangd/.clang-tidy.
'misc-redundant-expression' is enabled in the /home/emaisin/src/llvm/tools/clang/tools/extra/clangd/.clang-tidy.
'misc-static-assert' is enabled in the /home/emaisin/src/llvm/tools/clang/tools/extra/clangd/.clang-tidy.
'misc-throw-by-value-catch-by-reference' is enabled in the /home/emaisin/src/llvm/tools/clang/tools/extra/clangd/.clang-tidy.
'misc-unconventional-assign-operator' is enabled in the /home/emaisin/src/llvm/tools/clang/tools/extra/clangd/.clang-tidy.
'misc-uniqueptr-reset-release' is enabled in the /home/emaisin/src/llvm/tools/clang/tools/extra/clangd/.clang-tidy.
'misc-unused-alias-decls' is enabled in the /home/emaisin/src/llvm/tools/clang/tools/extra/clangd/.clang-tidy.
'misc-unused-using-decls' is enabled in the /home/emaisin/src/llvm/tools/clang/tools/extra/clangd/.clang-tidy.
'readability-identifier-naming' is enabled in the /home/emaisin/src/llvm/tools/clang/tools/extra/clangd/.clang-tidy.

So we should probably update the .clang-tidy file in the clang repo?

simark updated this revision to Diff 138741.Mar 16 2018, 11:49 AM
simark marked an inline comment as done.

Address review comments, rebase on latest master

ilya-biryukov added inline comments.Mar 19 2018, 8:32 AM
clangd/ClangdLSPServer.cpp
164 ↗(On Diff #138741)

We should signal an error to the client by calling replyError

149 ↗(On Diff #137648)

But TextDocumentChangeEvent may contain multiple content changes, see codeConverter.ts.

Anyway, the spec seems to be clear enough here. Let's support multiple changes, it should be super-easy on our side (just call a function that does the update in a loop)

clangd/DraftStore.cpp
61 ↗(On Diff #138741)

positionToOffset always truncates to size of contents. It has to return Expected<int> too here.

64 ↗(On Diff #138741)

NIT: maybe use change or edit instead of range?
range's start position .... is out of range is a bit confusing.

73 ↗(On Diff #138741)

Having partially applied changes in the map seems weird. Maybe the code applying the diffs to a separate function that returns either new contents or an error and use it here?
I.e. we won't be able to end up in a state with partially applied changes:

Expected<string> applyChanges(StringRef Contents, ArrayRef<TextDocumentChange> Changes) {
}

// See other comment about Version.
Expected<string> updateDraft(StringRef File, /*int Version,*/ ArrayRef<TextDocumentChange> Changes) {
  // check File is in the map and version matches....
  //...
  string& Contents = ...;
  auto NewContents = applyChanges(Contents, Changes);
  if (!NewContents) 
    return NewContents.takeError();
  Contents = *NewContents;
  return std::move(*NewContents);
}
97 ↗(On Diff #138741)

The math is correct, but I'm confused on how to properly read the formula here.
Maybe change to:
Contents.length() - (EndIndex - StartIndex) + Change.text.length()?

This clearly reads as:
LengthBefore - LengthRemoved + LengthAdded

103 ↗(On Diff #138741)

This check seems unnecessary, we've already checked for range errors. Maybe remove it?

106 ↗(On Diff #138741)

It is impossible to mix full content changes with incremental range changes, right?

I suggest handling the full content change as a separate case at the start of the function:

if (Changes.size() == 1 && !Changes[0].range) {
  Contents = std::move(Change.text);
  return Contents;
}

for (auto &Change : Changes) {
  if (!Change.range)
    return make_error("Full change in the middle of incremental changes");
}
47 ↗(On Diff #137648)

Yeah, it looks outdated, .clang_tidy in top-level llvm directory already has ParameterCase. Created D44628 to update the config in clang too.

clangd/DraftStore.h
36 ↗(On Diff #138741)

Could we add versions from LSP's VersionedTextDocumentIdentifier here and make the relevant sanity checks?
Applying diffs to the wrong version will cause everything to fall apart, so we should detect this error and signal it to the client as soon as possible.

unittests/clangd/DraftStoreTests.cpp
27 ↗(On Diff #138741)

no need for static, since function is already inside an anonymous namespace.
remove it?

38 ↗(On Diff #138741)

NIT: remove empty line after comment?
NIT: make the comment doxygen-like (i.e. start with ///)?

42 ↗(On Diff #138741)

NIT: use constexpr StringLiteral Path("/hello.cpp") instead?
It's a StringRef with size known at compile-time.

58 ↗(On Diff #138741)

Use ASSERT_TRUE(!!Result), the code below won't work for failure results anyway.

simark marked 4 inline comments as done.Mar 19 2018, 9:03 AM
simark added inline comments.
clangd/ClangdLSPServer.cpp
164 ↗(On Diff #138741)

textDocument/didChange is a jsonrpc notification, not request, so we can't send back an error.

clangd/DraftStore.cpp
106 ↗(On Diff #138741)

I'd say it is unlikely and there's probably no reason to do it, but the way the data structure is allows it.

simark marked 3 inline comments as done.Mar 19 2018, 10:29 AM
simark added inline comments.
clangd/DraftStore.cpp
61 ↗(On Diff #138741)

Indeed, it would be better.

73 ↗(On Diff #138741)

It makes sense, but I think we can achieve the same result by just tweaking the current code. I don't think a separate function is really needed here.

97 ↗(On Diff #138741)

I saw it as LengthOfTheSliceFromTheOriginalThatWeKeepBefore + LengthOfNewStuff + LengthOfTheSliceFromTheOriginalThatWeKeepAfter. But I don't mind changing it.

103 ↗(On Diff #138741)

Note that this check is not equivalent to the previous

if (EndIndex > Contents.length())

for the case where EndIndex == Contents.length(). In our case, it's possible (and valid) for EndIndex to be equal to Contents.length(), when referring to the end of the document (past the last character). I thought that doing Contents.substr(Contents.length()) would be throw std::out_of_range or be undefined behavior, but now that I read it correctly:

@throw  std::out_of_range  If __pos > size().

So it looks like it would fine to have pos == Contents.length().

ilya-biryukov added inline comments.Mar 20 2018, 2:56 AM
clangd/ClangdLSPServer.cpp
164 ↗(On Diff #138741)

Right, my mistake. Sorry for the confusion.

clangd/DraftStore.cpp
73 ↗(On Diff #138741)

Sure, tweaking the function is also ok. Have to be careful to not accidentally update the old Contents before all changes are applied, but no problems other than that.

97 ↗(On Diff #138741)

Ha, that also makes sense to me, thanks for the explanation. And even mirrors the code better.

103 ↗(On Diff #138741)

Right, the substr will simply return an empty string. An extra check is not really necessary.

106 ↗(On Diff #138741)

I would be very surprised to see a request that mixes both, even though the data structure allows it. It's up to you if you want to change the behaviour or not.
If we keep the code this way, maybe first handle the full content update? It would make the code somewhat easier to read and reduce nesting (see the relevant bit of LLVM style guide):

for (auto &Change : Changes) {
  if (!Change.range) {
    Contents = std::move(Change.Text);
    continue;
  }

  // Handle incremental change
}
simark marked 34 inline comments as done.Mar 21 2018, 9:33 AM
simark added inline comments.
clangd/DraftStore.h
36 ↗(On Diff #138741)

I agree that applying diffs to the wrong version will break basically everything, but even if we detect a version mismatch, I don't see what we could do, since we don't have a mean to report the error to the client. The only thing we could do is log it (which we already do.

simark updated this revision to Diff 139310.Mar 21 2018, 9:46 AM
simark marked an inline comment as done.

Address review comments, except LSP version check

ilya-biryukov added inline comments.Mar 21 2018, 9:50 AM
clangd/DraftStore.h
36 ↗(On Diff #138741)

If we couldn't apply a diff, we should return errors from all future operations on this file until it gets closed and reopened. Otherwise clangd and the editor would see inconsistent contents for the file and results provided by clangd would be unreliable.
The errors from any actions on the invalid file would actually be visible to the users.

The simplest way to achieve that is to remove the file from DraftStore and ClangdServer on errors from updateDraft.
This will give "calling action on non-tracked file" errors for future operations and the actual root cause can be figured out from the logs.

simark updated this revision to Diff 139321.Mar 21 2018, 10:13 AM

Remove draft if update fails

simark updated this revision to Diff 139328.Mar 21 2018, 10:35 AM

Add lit test case

ilya-biryukov added inline comments.Mar 21 2018, 12:14 PM
unittests/clangd/DraftStoreTests.cpp
27 ↗(On Diff #138741)

This comment is marked as done, but not addressed.

38 ↗(On Diff #138741)

This comment is marked as done, but not addressed.

42 ↗(On Diff #138741)

This comment is marked as done, but not addressed.

58 ↗(On Diff #138741)

This comment is marked as done, but not addressed.

simark marked 6 inline comments as done.Mar 21 2018, 12:46 PM
simark added inline comments.
unittests/clangd/DraftStoreTests.cpp
27 ↗(On Diff #138741)

Hmm I addressed the comments at two different times, but I probably messed up my branches or something, so I lost the changes of my first pass. Sorry about that.

simark updated this revision to Diff 139355.Mar 21 2018, 12:48 PM
simark marked an inline comment as done.

Address review comments I failed to address previously

Thanks for addressing the comments. I have just one comment left, about the LSP versions and sanity-checking.

clangd/DraftStore.h
36 ↗(On Diff #138741)

We still ignore version numbers from the LSP.
Is this another change that didn't get in?

simark added inline comments.Mar 22 2018, 12:51 PM
clangd/DraftStore.h
36 ↗(On Diff #138741)

The more I think about it, the less sure I am that this is the intended usage of the version. The spec doesn't even say what the initial version of a file should be, 0 or 1? Then, it just says that the version optionally contained in the VersionedTextDocumentIdentifier reflects the version of the document after having applied the edit. But I don't think we can predict and validate what that version should be. Most clients will probably just use a sequence number, but I guess they don't have to...

Also, I initially assumed that having N changes in the contentChanges array would mean that the version number would be bumped by N. But now that I re-read it, there's really nothing that says it should behave like this.

I think that we should just record the version number and treat it as opaque, so that we can use it later when sending text edits to the client, for example. The client can then verify that the edit is for the same version of the document that it has at the moment.

Therefore, I don't think it would really be useful in this patch.

Thanks again. This generally looks LGTM, just a last drop of small nits. Will approve as soon as they land.

clangd/DraftStore.cpp
75 ↗(On Diff #139355)

Could we add a format provide for Position to make this formatting code shorter?
We'll need to define the corresponding specialization in Protocol.h:

namespace llvm {
  template <>
  struct format_provider<clang::clangd::Position> {
    static void format(const clang::clangd::Position &Pos, raw_ostream &OS, StringRef Style) { 
     assert(Style.empty() && "style modifiers for this type are not supported");
     OS << Pos;
    }
  };
}

This will allow to simplify this call site to:

llvm::formatv("Range's end position ({0}) is before start position ({1})", End, Start)
100 ↗(On Diff #139355)

Doing return std::move(localvar) is a pessimization. Use return Contents; instead.
(for reference see this SO post, clang should give a warning for that case too)

clangd/DraftStore.h
36 ↗(On Diff #138741)

The initial version is provided in didOpen (see TextDocumentItem).
I've also misread description of contentChanges in the same way. It's true that versions might grow by any any positive number, so we can't rely on them for the checks I propose here.
Thanks for looking into that and explaining the spec :-)

26 ↗(On Diff #139355)

Could we also mention in the comment that this class now handles applying incremental document changes?

unittests/clangd/DraftStoreTests.cpp
36 ↗(On Diff #139355)

NIT: another redundant static

62 ↗(On Diff #139355)

NIT: another redundant static

66 ↗(On Diff #139355)

NIT: another instance of constexpr StringLiteral

simark marked 10 inline comments as done.Mar 23 2018, 7:19 AM
simark added inline comments.
clangd/DraftStore.cpp
75 ↗(On Diff #139355)

Good idea, this is good for printing them in a consistent way too.

100 ↗(On Diff #139355)

Thanks, good catch.

simark updated this revision to Diff 139596.Mar 23 2018, 7:30 AM
simark marked 2 inline comments as done.

Address review comments

ilya-biryukov accepted this revision.Mar 26 2018, 1:51 AM

LGTM.

unittests/clangd/DraftStoreTests.cpp
1 ↗(On Diff #139596)

NIT: s/Clangd unit tests/Draft store tests
Or remove the comment title altogether, the filename seems to provide the same information.

20 ↗(On Diff #139596)

It seems this using namespace can be removed (at least there are lots of names qualified with llvm:: throughout the file)

This revision is now accepted and ready to land.Mar 26 2018, 1:51 AM
simark marked 2 inline comments as done.Mar 26 2018, 7:44 AM
This revision was automatically updated to reflect the committed changes.