Page MenuHomePhabricator

[clangd] refactoring for XPC transport layer [NFCI]
AbandonedPublic

Authored by jkorous on Jun 25 2018, 1:01 PM.

Details

Summary

Hi all,

We finally finished a self-contained first version of our implementation of alternative transport layer for macOS based on XPC.

To enable this I did couple of changes to current clangd design. Generally I aimed for small amount of disruption and tried to keep it simple.

The notable changes are:

  • enabled registration of request handlers to different dispatchers by templatizing registerCallbackHandlers()
  • factored out parts of JsonDispatcher that could be reused (created DispatcherCommon.h/cpp)
  • created abstraction over JsonOutput (class LSPOutput)
  • removed of ClangdLSPServer::run() method so server can be run with different dispatcher
  • published IsDone and ShutdownRequestReceived through methods in ClangdLSPServer class interface to support new dispatcher

We are also putting up the transport layer implementation itself for a review soon so it will be more obvious where are we going to and what motivated some of these changes.

Big thanks in advance to all reviewers!

Jan

Diff Detail

Event Timeline

jkorous created this revision.Jun 25 2018, 1:01 PM

Followed by these two patches:
[clangd] JSON <-> XPC conversions
https://reviews.llvm.org/D48560
[clangd] XPC transport layer
https://reviews.llvm.org/D48562

Ping. Added reviewers suggestion.

Hi Jan,

Thanks for putting this together, and apologies - this is one of the places where we don't have nice abstractions/layering, so adding XPC was harder than it should be.

As I mentioned on the other review, I think maybe this patch isn't invasive enough, we could do a more thorough job of making the json transport a standalone, first-class thing. This makes it easier to swap out for XPC but also would improve the layering in the core clangd classes.

I put together a sketch of a Transport interface in D49389 (that patch is extremely unfinished and won't compile, but the idea might be interesting).
The idea is that we should be able to implement that class for XPC and then it should just drop-in as a replacement for the JSON one.
I've indicated there where XPC and JSON implementation can share code (assuming you wouldn't rather "tighten up" the protocol and eliminate some JSON-RPC noise).

If you like this direction I'm happy for you to pick the bits you like, or I can finish it as a refactoring and we can make sure it works for XPC.
If not, that's fine too - happy to look at other ways to reduce duplication between them.

DispatcherCommon.h
38

I think I wrote this bit... it's a hack that I'm not sure deserves to be visible in a header (it was bad enough in JSONRPCDispatcher.cpp!)

Rather than exposing it so we can use it twice, I'd hope we can manage to keep it hidden so that JSON and XPC can share it.

Hi Sam, thanks for your feedback!

In general I agree that explicitly factoring out the transport layer and improving layering would be a good thing. Unfortunately it's highly probable that we'd like to drop JSON completely from XPC dispatch (XPC -> JSON -> ProtocolCallbacks -> JSON -> XPC) in not too distant future. (Please don't take this as a promise as our priorities might change but it's probable enough that I don't recommend to base any common abstraction on JSON.) I originally tried to create similar abstraction but ultimately gave up as it was way too complex. Not saying it's impossible or that I would not like to though!

I basically see these different parts of the problem:

  1. dispatch implementation

I think that different dispatch implementations might be better kept separate without any common abstraction as it's not a lot of code and not much of opportunities for shared implementation - DispatcherCommon.h/cpp effectively vanishes in case we drop JSON from XPC dispatch.

  1. LSP request deserialization

Deserialization of LSP requests for different transport layers is easy - it's a single line of code (currently lambda wrapper in HandlerRegisterer) so it would be easy to parametrize this.

  1. LSP response serialization

This is where I failed. Basically I haven't found any nice way how to make it a part of transport layer interface. I could imagine having a set of response classes in Protocol.h and toXpc functions but still not sure what kind of interface to have for these in transport layer.

Hi Sam, thanks for your feedback!

In general I agree that explicitly factoring out the transport layer and improving layering would be a good thing. Unfortunately it's highly probable that we'd like to drop JSON completely from XPC dispatch (XPC -> JSON -> ProtocolCallbacks -> JSON -> XPC) in not too distant future. (Please don't take this as a promise as our priorities might change but it's probable enough that I don't recommend to base any common abstraction on JSON.) I originally tried to create similar abstraction but ultimately gave up as it was way too complex. Not saying it's impossible or that I would not like to though!

That makes sense and is good to know. I definitely don't want to add a transport abstraction that isn't a good fit for your long-term plans.

I think we need to clearly see what the goal is to get the design right though - it doesn't make sense to refactor in order to share code temporarily. Let's take code completion as an example, as it's one of the ones we've fleshed out most for our internal (non-LSP) clients.

  • if the goal is to mirror JSON-RPC with a different encoding (current patches), and you're just concerned about the *efficiency* of CodeCompletion -> CompletionItem -> json::Value -> xpc_object_t, this seems like premature optimization to me, but let's discuss!
  • if the goal is to basically follow the LSP (same textDocument/completion for), using the CompletionItem structs in Protocol.h but exposing more/fewer fields, renaming them etc, then we should probably have the feature code express replies as protocol structs, and make the transport define a way to transform specific objects (CompletionItem) into generic ones whose type (json::Value or xpc_object_t) are transport-specific
  • if the goal is to maintain a stable protocol that can be evolved independently of LSP (different methods or semantics) then this isn't really a transport, it's a new protocol. I think the best option is probably to skip ClangdLSPServer and have the xpc service wrap ClangdServer instead, with a separate main entrypoint. The litmus test here: as people make changes to clangd that affect the LSP surface, do you want the XPC service's interface to change too? This is the hardest model to support as the C++ API (ClangdServer) is not a stable one, but it's what we do for the other non-LSP clients.

Does one of these seem close to what you want? Any thoughts on the conclusions?

Hi Sam, thanks for your feedback!

In general I agree that explicitly factoring out the transport layer and improving layering would be a good thing. Unfortunately it's highly probable that we'd like to drop JSON completely from XPC dispatch (XPC -> JSON -> ProtocolCallbacks -> JSON -> XPC) in not too distant future. (Please don't take this as a promise as our priorities might change but it's probable enough that I don't recommend to base any common abstraction on JSON.) I originally tried to create similar abstraction but ultimately gave up as it was way too complex. Not saying it's impossible or that I would not like to though!

That makes sense and is good to know. I definitely don't want to add a transport abstraction that isn't a good fit for your long-term plans.

I think we need to clearly see what the goal is to get the design right though - it doesn't make sense to refactor in order to share code temporarily. Let's take code completion as an example, as it's one of the ones we've fleshed out most for our internal (non-LSP) clients.

  • if the goal is to mirror JSON-RPC with a different encoding (current patches), and you're just concerned about the *efficiency* of CodeCompletion -> CompletionItem -> json::Value -> xpc_object_t, this seems like premature optimization to me, but let's discuss!
  • if the goal is to basically follow the LSP (same textDocument/completion for), using the CompletionItem structs in Protocol.h but exposing more/fewer fields, renaming them etc, then we should probably have the feature code express replies as protocol structs, and make the transport define a way to transform specific objects (CompletionItem) into generic ones whose type (json::Value or xpc_object_t) are transport-specific
  • if the goal is to maintain a stable protocol that can be evolved independently of LSP (different methods or semantics) then this isn't really a transport, it's a new protocol. I think the best option is probably to skip ClangdLSPServer and have the xpc service wrap ClangdServer instead, with a separate main entrypoint. The litmus test here: as people make changes to clangd that affect the LSP surface, do you want the XPC service's interface to change too? This is the hardest model to support as the C++ API (ClangdServer) is not a stable one, but it's what we do for the other non-LSP clients.

    Does one of these seem close to what you want? Any thoughts on the conclusions?

Hi Sam,
Thanks for your feedback!

Generally right now will be mirroring JSON-RPC with a different encoding, but in the future we will send some optimized payloads for a subset of replies which are constructed out of things like CompletionItem directly (so bypassing the JSON step).

I believe that your patch (https://reviews.llvm.org/D49389) is implementing the JSON-RPC with a choice of encoding, which will work for us right now, and it doesn't look like we will have problems optimizing our payloads later when we choose to do so.
However, the second option that you proposed does sound quite compelling as well (i.e. then we should probably have the feature code express replies as protocol structs), as it would give us more flexibility. However, I think it's orthogonal to the first option (JSON-RPC with a choice of encoding), so maybe we can leave it as something that will be on the table if we need it later? I think for the transport layer idea you proposed in https://reviews.llvm.org/D49389 makes sense and will work for us, and in the future we always can implement the second option that your proposed on top of of your patch, or use a simpler way to optimize our payloads without much disruption.

Thanks,
Alex

Sam, just out of curiosity - would it be possible for you to share any relevant experience gained by using porting clangd to protobuf-based transport layer?

Hi Sam, we discussed a bit more with Alex offline. Ultimately we don't mind using JSON as part of the generic interface but we'd like to optimize specific cases in the future (thinking only about code completion so far) which might need to "work-around" the abstraction. What is your opinion about this - does it ruin the whole point of having the abstraction or do you consider pragmatic to have abstraction covering most of the functionality with maybe couple exceptions? I kind of assume (please correct me if I am wrong) that you might have done something similar with profobuf interface so we're definitely happy to discuss.

BTW Just for the record - I put the rebased & updated patch here: [clangd] XPC WIP https://reviews.llvm.org/D49548

Sorry for the delay here, and I'm sorry I haven't been into the patches in detail again yet - crazy week. After experiments, I am convinced the Transport abstraction patch can turn into something nice if we want XPC vs JSON to be a pure transport-level difference with the same semantics. But you need to decide the approach here based on your requirements.

Sam, just out of curiosity - would it be possible for you to share any relevant experience gained by using porting clangd to protobuf-based transport layer?

Sure! (and sorry for the delay here).

This is done as part of an internal-only editor that's a hosted web application with tight integration to our internal source control, build system etc.
That's not my immediate team, but we talk a lot and collaborate on the integration, which is different than a typical LSP server/editor integration in a few ways.
(I expect some of these aspects will apply to XCode+clangd, and others won't). This is kind of a brain dump...

  • like a regular LSP client, the editor wants to run clangd out-of-process for isolation reasons, and because of cross-language issues. Because of our hosted datacenter environment, a network RPC server was the most obvious choice, and protobufs are our lingua franca for such servers. This sounds analagous to the choice of XPC to me.
  • the editor is multi-language, so the idea of using the existing LSP is appealing (reuse servers, we can't rewrite the world).
  • adapting to protobuf is some work (write a wrapper for each LSP server). In principle a generic one is possible, but not for us due to VFS needs etc.
  • you can wrap at the LSP level (CompletionList -> CompletionListProto) or at the transport level (json::Value --> JsonValueProto). Editor team team went for the former.
  • consuming clangd as a C++ API rather than as an external process or opaque JSON stream, gives lots more flexibility, and we can add extension points relatively easily. Robust VFS support, performance tracing, logging, different threading model are examples of this outside the scope of LSP. The internal code uses unmodified upstream clangd code, but provides its own main().
  • Inside the scope of LSP, changes are tempting too due to LSP limitations. Since the editor team controls the protocol and the frontend, they can add features. This is a source of tension: we (clangd folks) don't want to support two divergent APIs. So in limited cases we added to clangd a richer/more semantic C++ API that provides extra features over more-presentational LSP. Best examples are diagnostics (C++ API includes fixits and 'note' stack, before LSP supported these), and code completion (C++ API includes semantic things like "name of include to insert"). We've tried to keep this limited to what's both valuable and sensible.
  • adding LSP extensions to the *protocol* causes skew across languages. So the divergences at that level tend to be more presentational (e.g. "completion item documentation subtitle"). The clangd-wrapper decides how to map the semantic Clangd C++ API fields onto the presentational mutant-LSP fields, in the same way that regular clangd decides how to map them onto regular LSP.
  • it's annoying that the issues you run into with this setup need to be carefully dissected (editor vs wrapper vs clangd) to work out where the bug belongs. Compare to "plain" clangd where you can just check with a different editor and then know the bug is in clangd.

In summary:

  • speaking a different wire protocol that's semantically identical to LSP is an easy win
  • owning a separate main() so you can plug in cross-cutting facilities (logging, tracing, index) is very manageable and worthwhile
  • maintaining a separate protocol is a bunch more work and decisions all the time, even if divergences from LSP are small. It *is* more flexible though.

One halfway possibility if you're on the fence: we could support some limited extensions to the protocol behind an option (e.g. extra fields serialized with the option, whether XPC or JSON is used). It's less flexibility than full ownership of the protocol semantics on Apple's side though.

Sorry for the delay here, and I'm sorry I haven't been into the patches in detail again yet - crazy week. After experiments, I am convinced the Transport abstraction patch can turn into something nice if we want XPC vs JSON to be a pure transport-level difference with the same semantics. But you need to decide the approach here based on your requirements.

Sam, just out of curiosity - would it be possible for you to share any relevant experience gained by using porting clangd to protobuf-based transport layer?

Sure! (and sorry for the delay here).

This is done as part of an internal-only editor that's a hosted web application with tight integration to our internal source control, build system etc.
That's not my immediate team, but we talk a lot and collaborate on the integration, which is different than a typical LSP server/editor integration in a few ways.
(I expect some of these aspects will apply to XCode+clangd, and others won't). This is kind of a brain dump...

  • like a regular LSP client, the editor wants to run clangd out-of-process for isolation reasons, and because of cross-language issues. Because of our hosted datacenter environment, a network RPC server was the most obvious choice, and protobufs are our lingua franca for such servers. This sounds analagous to the choice of XPC to me.
  • the editor is multi-language, so the idea of using the existing LSP is appealing (reuse servers, we can't rewrite the world).
  • adapting to protobuf is some work (write a wrapper for each LSP server). In principle a generic one is possible, but not for us due to VFS needs etc.
  • you can wrap at the LSP level (CompletionList -> CompletionListProto) or at the transport level (json::Value --> JsonValueProto). Editor team team went for the former.
  • consuming clangd as a C++ API rather than as an external process or opaque JSON stream, gives lots more flexibility, and we can add extension points relatively easily. Robust VFS support, performance tracing, logging, different threading model are examples of this outside the scope of LSP. The internal code uses unmodified upstream clangd code, but provides its own main().
  • Inside the scope of LSP, changes are tempting too due to LSP limitations. Since the editor team controls the protocol and the frontend, they can add features. This is a source of tension: we (clangd folks) don't want to support two divergent APIs. So in limited cases we added to clangd a richer/more semantic C++ API that provides extra features over more-presentational LSP. Best examples are diagnostics (C++ API includes fixits and 'note' stack, before LSP supported these), and code completion (C++ API includes semantic things like "name of include to insert"). We've tried to keep this limited to what's both valuable and sensible.
  • adding LSP extensions to the *protocol* causes skew across languages. So the divergences at that level tend to be more presentational (e.g. "completion item documentation subtitle"). The clangd-wrapper decides how to map the semantic Clangd C++ API fields onto the presentational mutant-LSP fields, in the same way that regular clangd decides how to map them onto regular LSP.
  • it's annoying that the issues you run into with this setup need to be carefully dissected (editor vs wrapper vs clangd) to work out where the bug belongs. Compare to "plain" clangd where you can just check with a different editor and then know the bug is in clangd.

    In summary:
    • speaking a different wire protocol that's semantically identical to LSP is an easy win
    • owning a separate main() so you can plug in cross-cutting facilities (logging, tracing, index) is very manageable and worthwhile
    • maintaining a separate protocol is a bunch more work and decisions all the time, even if divergences from LSP are small. It *is* more flexible though.

      One halfway possibility if you're on the fence: we could support some limited extensions to the protocol behind an option (e.g. extra fields serialized with the option, whether XPC or JSON is used). It's less flexibility than full ownership of the protocol semantics on Apple's side though.

A custom protocol does sound quite appealing. I previously was quite skeptical just because ClangdLSPServer seemed to have had a lot of important logic, but it seems that that's not the case right now. I will lead an internal discussion on this to figure out if that's something we'd like to do. I'll respond here sometime next week.

jkorous planned changes to this revision.Jul 25 2018, 6:39 AM

Hi Sam, we are still discussing internally how to fit clangd and XPC together. Please bear with us and ignore our patches until we decide.

Hi, we decided to go with a different solution:
https://reviews.llvm.org/D50452

jkorous abandoned this revision.Apr 23 2019, 5:29 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 23 2019, 5:29 PM
Herald added a subscriber: kadircet. · View Herald Transcript