[clangd] Initial cancellation mechanism for LSP requests.
Needs ReviewPublic

Authored by kadircet on Thu, Aug 9, 3:48 AM.


kadircet created this revision.Thu, Aug 9, 3:48 AM

I have left a few comments, but wanted to start with a higher-level design consideration.

I believe we should probably expose the cancellations in the ClangdServer API directly.
The reasons for that are:

  1. We have an internal client that would also benefit from using the cancellation API and we should try to share as much code as possible between that and the rest of clangd. Creating the cancellation cookies, stashing them in the context are all details that could be handled by ClangdServer so two clients don't have to do the same things.
  2. ClangdServer seems like a natural layer for adding this, since it's a user-facing async API. It spawns async tasks, so it is a perfect place to provide a way to cancel the spawned tasks as well.

The concrete API that I propose is:

  • Relevant methods of ClangdServer (we could start we code completion) that spawn async reqeuests returns a cancellation object that allows to cancel the spawned job:
class TaskHandle {
  void cancel();

class ClangdServer {
  TaskHandle codeComplete(File, Position, Callback);
  • Users of the API are free to ignore the TaskHandle if they don't need to cancel the corresponding requests.
  • If the users are capable of cancelling the request, they are responsible for keeping the TaskHandle around before the corresponding callback is called.
  • If the request was cancelled, the Callback might receive the TaskCancelledError in the callback and should handle it accordingly.

The rest of the design LG, i.e. the code that can actually be cancelled is responsible for checking CancellationHandler::HasCancelled and should return a corresponding error in the callback if it was cancelled.


This file could use some comments about the API and samples of how it can be used.
Could probably be done after we finalize the design, though.


I think we might want to expose the context key for the cancellation primitive here.
The reason is that checking if request is cancelled is a common operation (i.e. I can imagine it being called on every few parsed ast node, etc.), so we'd want to keep the overhead of checking for cancellation very low.

Exposing a key in the context would allow to avoid doing the lookups in the context every time we need to check if request was cancelled.

Note that exposing shared_ptr<atomic<bool>> is probably not a great idea, since it'll allow to write into that object too. I suggest we go with a class that wraps it and only allows to read the value of the variable, i.e. only has isCancelled method.


The LLVM_NODISCARD is misplaced. The current code applies the attribute to the type (i.e. Context), and we want to apply to the function.
There are two ways to do that:

  1. LLVM_NODISCARD static Context SetCurrentCancellationToken()
  2. static Context SetCurrentCancellationToken LLVM_NODISCARD ()

The first one is definitely less awkward and more widely used, so I suggest we stick to it


NIT: use lowerCamelCase for function names (per LLVM style guide)


It seems the contract for ErrorInfo actually requires this function to be implemented, so using llvm_unreachable doesn't seem right here.
We could get away without actually unique error codes by returning llvm::inconvertibleErrorCode(), though.

  1. Maybe use getAsInteger() to convert directly to int64_t?
  2. Maybe use itostr to avoid conversion from int64 to uint64, which is undefined for negative numbers?
  3. What if input is neither string nor number? Maybe add an assertion and check at the call-sites? That's a form of user input and we don't want clangd to crash in case of incorrect ones.

Alternatively, could we simply pipe the json::Value to a raw_string_stream and get the pretty-printed string? Would avoid dealing with input validation issues.


Maybe clarify in the comment that the key of the map is a serialized request-id?

kadircet updated this revision to Diff 160531.Tue, Aug 14, 1:36 AM
kadircet marked 7 inline comments as done.
  • Polished API.
  • Resolved discussions.

Sorry for missing this one. The biggest concern that I have is about the thread-safety of the API, it's too easy to misuse from multiple threads at this point.

We should define thread-safety guarantees for TaskHandle and CancellationToken and make sure they're not easily broken. The suggested way is to make both move-only types.
For CancellationToken we might need a non-thread-safe copy operation, though. But maybe make it a separate method (clone?) with a clear statement that it's not thread-safe?

Other than that just NITs.


Having a shared_ptr key in the Context can cause data races (e.g. if we copy it concurrently from multiple threads).
I suggest we make CancellationToken move-only (i.e. disallow copies) and return const CancellationToken& when getting it from the context.


I'm not sure if this function buys us much in terms of clarity, maybe remove it and inline everywhere?


The comments seems to mention all the important bits, but maybe we could separate different concerns more clearly based on how the code should be used?
E.g. here's a rough draft to illustrate the idea:

Cancellation mechanism for async tasks. Roughly all the clients of this code can be classified into three categories:
1. The code that creates and schedules async tasks, e.g. TUScheduler.
2. The callers of the async method that can cancel some of the running tasks, e.g. `ClangdLSPServer`
3. The code running inside the async task itself, i.e. code completion or find definition implementation that run clang, etc.

For (1), the guideline is to accept a callback for the result of async operation and return a `TaskHandle` to allow cancelling the request.
    TaskHandle someAsyncMethod(function<void(llvm::Expected<ResultType>)> Callback);

The callers of async methods (2) can issue cancellations and should be prepared to handle `TaskCancelledError` result:
    TaskHandle H = someAsyncMethod([](llvm::Expected<ResultType> R) {
        if (/* code to check for TaskCancelledError */)
           llvm::errs() << "Task got cancelled";
           llvm::errs() << "The result is: " << *R;        

The worker code itself (3) should check for cancellations using `CancellationToken` that can be retrieved via `CancellationToken::isCancelled()`.
    void codeComplete() {
      const CancellationToken& CT = CancellationToken::isCancelled();
      if (CT.isCancelled()) 
        return llvm::makeError<TaskCancelledError>();

      if (CT.isCancelled()) 
        return llvm::makeError<TaskCancelledError>();

      return results;
If the operation was cancelled before it could run to completion, it should propagate the TaskCancelledError as a result.

We're missing docs on this and the other classes. I suggest a small doc describing what they're used for, e.g. used for checking if the operation was cancelled and used to signal the operation should be cancelled, etc.

Also noting the threading-safety guarantees is a probably a good idea.


NIT: move members to the end of the class to be consistent with the rest of clangd.


Can we make this ctor private?


I wonder if we should make TaskHandle move-only?

The reasoning is that is can be easily used from multiple threads (since it's used by code dealing with async tasks anyway) and copying it concurrently is a data race.
On the other hand, calling cancel() is perfectly thread-safe and shouldn't cause any problems.


NIT: maybe use a shorter name, e.g. TaskHandle::create?


Maybe remove CancellationHandler and put all its functions into the clangd namespace? It's somewhat easily confused with CancellationToken.
We could declare some function friends if we need to


NIT: maybe use a shorter name: CancelledError?


I expect the pattern if cancelled -> reply with ErrorCode::RequestCancelled else reply with ErrorCode::InvalidParams to be pretty common when we add cancellation to more operations.

Maybe add a small helper that does that? E.g.

/// Replies to a request with an error, passing proper error codes to the LSP client.
/// Properly handles cancellations.
void replyError(Error E);

NIT: use Lock or L for consistencty with the rest of clangd. it's somewhat easy to misread std::lock_guard<std::mutex> _(Mut) as std::lock_guard<std::mutex> (Mut)

kadircet updated this revision to Diff 160636.Tue, Aug 14, 10:37 AM
kadircet marked 4 inline comments as done.
  • Get rid of getCancellationError.
  • Add replyError helper.
kadircet added inline comments.Tue, Aug 14, 10:37 AM

As talked offline, copying std::shared_ptr is thread safe.

kadircet updated this revision to Diff 160652.Tue, Aug 14, 11:38 AM
kadircet marked 8 inline comments as done.
  • Resolve discussions.
  • Get rid of CancellationHandler class.
  • Change error class name.
  • Improve documentation.
kadircet added inline comments.Tue, Aug 14, 2:30 PM

As similar to above mentioned case copying doesn't introduce any data races.

kadircet updated this revision to Diff 160986.Thu, Aug 16, 2:41 AM
  • Made TaskHandle move-only. Since it is costly and most likely unnecessary to copy it other than to move it into Context.
  • Provided an explicit clone method for copying.

Mostly LSP-specific comments and comments about making TaskHandle thread-safe.

I'm also starting to wonder if the scope of this patch is too big, we could potentially split it into three separate bits:

  1. Generic cancellation API, i.e. Cancellation.h and unit-tests of the API,
  2. ClangdServer using the cancellation API.
  3. LSP-specific bits, i.e. cancelRequest etc.

Not sure if it's worth splitting the last two, but splitting out the Cancellation API in a separate change could make review simpler.
But also happy to review the whole thing and land after it's done, feel free to pick whatever suits you best.


My mistake, thanks for explaining this one to me.
Copying of CancellationTokens LG!


NIT: Maybe replace 'Totally thread-safe' with just Thread-safe? The latter conveys the same information.


NIT: maybe use Token->load() instead of static_cast<bool>(Token)? Arguably, looks more concise.


It's a bit confusing that this name clashes with a member function.
We seem to optimize for making this work anywhere:

if (isCancelled())  // <-- get token from ctx, then use implicit conversion to check if it was cancelled
  return llvm::makeError<CancelledError>();

Which is probably a nice pattern. But we could have two functions that do the same thing with less magic (i.e. implicit conversions) involved at the call sites:

CancellationToken getCurrentCancellation();
void setCurrentCancellation(CancellationToken CT);

inline bool isCancelled() { return getCurrentCancellation().isCancelled(); }

Would allow to get rid of implicit conversion and friend function with the same signature as member. Would also make cases where we actually want to stash the token somewhere cleaner, e.g. instead of CancellationToken CT = isCancelled(), we'll have CancellationToken CT = getCurrentCancellation().




As discussed offline, maybe we should make TaskHandle move-only?
Conceptually, TaskHandle is an abstraction that represent some spawned async task and in that sense somewhat similar to futures.


I wonder if we should have a createCancellationToken() method and a method to stash it in the context instead?
Would help if we want to make TaskHandle move-only and in general seems like a simpler user API, because things you put and get back out of the context is the same, i.e. it's a CancellationToken.


NIT: return std::string, const does not buys us anything here.


NIT: remove braces around short, single-statement branches


Use a value type here, i.e. std::string instead of const std::string&.


Use a value type here, i.e. std::string instead of const std::string&.


NIT: remove these braces, they don't buy us anything here?


I'm not sure what associated with only their thread means.
The tokens are associated with the request-ids rather than threads, so having something like ..., they create and remove cancellation handles for the request-id stored in the current Context


These two methods look like a nice suit for better wrapped in a RAII-object. It would add the value to the associated map on construction and remove on destruction.
One problem is that all operations calls would become weird, because we'll have to move this RAII object around and moving into lambdas is not fun without C++14 (which allows [x=std::move(y)]() {}..

Maybe add a comment that calls to this function should always be paired and a FIXME that we should have an API that enforces this?


NIT: remove braces around single-statement branches


Isn't there a form of handleErrors that can handle the second case in the last argument?
I.e. can we merge the following if into the handleErrors call?


Maybe adds docs explaining what this function does?
Specifically, what ErrorCode it chooses and how the message would look like


naming-nit: use lowerCamelCase for function names


I suggest we document this function and put to next to reply and replyError, since they are closely related (i.e. the latter put the current request-id into the replies).
We should probably discourage using this function in its documentation unless absolutely necessary, the users should probably directly call reply or replyError instead.


Another interesting case to test:
create multiple threads, propagate cancellation into them and check they get properly cancelled.
Using Notification from Threading.h should allow to write it in a pretty simplistic manner.

The last set of comments is for the previous version, might be missing some updates, sorry about that. Will make sure to review the one too!

kadircet updated this revision to Diff 161018.Thu, Aug 16, 6:36 AM
kadircet marked 21 inline comments as done.
  • Resolve discussions.
kadircet added inline comments.Thu, Aug 16, 6:36 AM

Totally agreed, actually, didn't even notice they had the same signature :D, thanks!
But I still think implicit conversion of CancellationToken to bool seems like a good thing to have.


Yeah that would be really nice to have them in a RAII object, but the thing is cleanup can occur only after the task dies. Which we don't directly know since threads are being detached. So the only way to trigger destruction is at the callback, which is where I call cleanup currently. Maybe we can store the object within context to trigger destructor at the end of the thread, but still don't think it would look any better or ease the job of the caller.

Mostly NITs, but please take a look at the CancelParams handling problem. I believe there might be a potential bug hiding there :-)


s/Can be created with clangd::isCancelled()/Tokens for the currently running task can be obtained via clangd::getCurrentCancellationToken/?
(to focus on which kind of tokens can be obtained from the tokens)

Another idea: maybe add a reference no why someone would want to use CancellationToken instead of isCancelled() helper, e.g.

/// If cancellation checks are rare, one could use the isCancelled() helper to simplify the code.
/// However, if cancellation checks are frequent, the guideline is first obtain the CancellationToken for the currently running task with getCurrentCancellationToken() and do cancel checks using it to avoid extra lookups in the Context



Wouldn't it work incorrectly for string IDs?
When normalizing json::Value from getRequestID, we simply print out json. For strings, this should result in quoted output, e.g. "req". However, when parsing CancelParams, we parse this json. So we'll end up inserting with a key "req" and erasing with a key 'req' (without the quotes).


Could we elog if insert indicated that a value already exists in a map? This would probably mean malformed user input, but could also be immensely helpful for debugging.


Having these paired method calls is probably fine for the first iteration, but we should abstract away this pattern when supporting cancellations for more operations.
LG with a FIXME.


Maybe use a shorter name? E.g. Task or TH :-)


Instead of moving cancellation token around explicitly, we could stash in a context right away and it would nicely propagate into the thread. This looks like less boilerplate.

auto CancellableTaskHandle = TaskHandle::create();
WithContext ContextWithCancellation(setCurrentCancellationToken(TaskHandle.createCancellationToken()));

auto Task = [](File, Callback<CodeCompleteResult> CB, llvm::Expected<InputsAndPreamble> IP) {
  if (getCancellationToken().isCancelled())
    return CB(make_error<CancelledError>());
 /// rest of the code is the same

Thinking out loud: the TaskHandle and CancellationToken handling will probably eventually move into TUScheduler::runWithPreamble and other thread-spawning code to avoid this kind of boilerplate here.

No need to change anything in this patch, though.


Maybe add a comment from the LSP spec to be consistent with the rest of the file?
Also note that this can be either a number or a string, but we choose to print out a number in that case and just always use strings.

kadircet updated this revision to Diff 161261.Fri, Aug 17, 8:26 AM
kadircet marked 10 inline comments as done.
  • Resolve discussions & Delete enclosing quotes when normalizing.
kadircet marked an inline comment as done.Fri, Aug 17, 8:27 AM
kadircet added inline comments.

You are right, that's exactly the case, thanks!

kadircet marked 2 inline comments as done.Fri, Aug 17, 8:28 AM
ilya-biryukov added inline comments.Fri, Aug 17, 8:59 AM

This still misses string escaping issues. E.g. " will get quoted as \", etc.
I suggest we actually switch on this json value kind.
Better yet, reuse the part of fromJSON(CancelParams) that does the request-id parsing.


NIT: check for it even before looking at the preamble?