This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Extend dexp to support remote index
ClosedPublic

Authored by kbobyrev on Apr 20 2020, 3:25 PM.

Details

Summary
  • Merge clangd-remote-client into dexp
  • Implement clangd::remote::IndexClient that is derived from SymbolIndex
  • Upgrade remote mode-related CMake infrastructure

Diff Detail

Event Timeline

kbobyrev created this revision.Apr 20 2020, 3:25 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 20 2020, 3:25 PM
kbobyrev updated this revision to Diff 258861.Apr 20 2020, 3:55 PM

Fix build.

sammccall added inline comments.Apr 20 2020, 4:01 PM
clang-tools-extra/clangd/index/dex/dexp/Dexp.cpp
14

this include and the stuff depending on it needs to be ifdef'd

clang-tools-extra/clangd/index/remote/CMakeLists.txt
1–4

why is this extra param needed?

13

let's avoid propagating the clangDaemon name any further. clangdRemoteIndex? or clangdRemoteIndexClient?

clang-tools-extra/clangd/index/remote/Index.h
21 ↗(On Diff #258853)

the class doesn't need to be exposed here, as the SymbolIndex interface seems to do everything we need. Just expose a factory?

Actually, I think that means we can expose this header whether grpc is enabled or not. If no grpc, then we just link in an implementation of the factory that always returns an error. WDYT?

clang-tools-extra/clangd/index/remote/Index.proto
16

should also return a stream. has_more is annoying, but you can keep it in the message and set it only on the last element of the stream.

67

confused... why not use Symbol now?

clang-tools-extra/clangd/index/remote/server/Server.cpp
46

move all the conversions into a file, marshalling.cpp or whatever?

kbobyrev marked 5 inline comments as done.Apr 20 2020, 4:30 PM
kbobyrev added inline comments.
clang-tools-extra/clangd/index/dex/dexp/Dexp.cpp
14

Ah, right, I forgot about this. Was present in the previous patch where I had custom definitions, but forgot about it for some reason.

41

@sammccall do you know why this opt is not being set for some reason? When I try to run dexp --help it is displayed in the possible arguments and when I try to pass anything invalid for boolean flags this also fails, but the value is not changed regardless of the values I put here :( Must be something simple I've missed.

clang-tools-extra/clangd/index/remote/CMakeLists.txt
1–4

Because this needs to be included both here and for dexp binary.

clang-tools-extra/clangd/index/remote/Index.proto
16

Is this an idiomatic way of using gRPC + Protobuf? Seems counter-intuitive to me since repeated stream of symbols in the message + single has_more seems quite logical.

67

Couldn't put Symbols into FuzzyFindReply for some reason. Clangd behaves really weird with Protobuf inclusions for me... Need to figure out why that happens, but might be me doing something wrong.

kbobyrev updated this revision to Diff 258936.Apr 21 2020, 2:36 AM
kbobyrev marked 2 inline comments as done.

Implement refs(), address some comments.

kbobyrev marked an inline comment as done and an inline comment as not done.Apr 21 2020, 2:37 AM
kbobyrev updated this revision to Diff 258942.Apr 21 2020, 2:53 AM

Add correct definition and fix build.

kbobyrev updated this revision to Diff 258958.Apr 21 2020, 3:38 AM

Move type conversions to Marshalling.(h|cpp) and hide IndexClient behind a factory function.

kbobyrev marked 2 inline comments as done.Apr 21 2020, 3:39 AM
kbobyrev updated this revision to Diff 259167.Apr 21 2020, 10:16 PM

Resolve a couple of FIXMEs. The patch is review-ready now.

kbobyrev updated this revision to Diff 259233.Apr 22 2020, 4:21 AM
kbobyrev marked an inline comment as done.

Make sure the value of --remote flag is stored.

kbobyrev updated this revision to Diff 259257.Apr 22 2020, 6:05 AM

Stream responses in refs() and fuzzyFind()

kbobyrev marked 2 inline comments as done.Apr 22 2020, 6:05 AM
sammccall added inline comments.Apr 22 2020, 7:50 AM
clang-tools-extra/clangd/CMakeLists.txt
161

Can we use features.inc.in instead for this?
(I don't have a strong opinion on which one is better, but I'd rather not do things two ways)

clang-tools-extra/clangd/index/dex/dexp/Dexp.cpp
41

I think you need to read it before we ResetCommandLineParser(). That part is a bit of a hack :-(

clang-tools-extra/clangd/index/remote/CMakeLists.txt
1–4

still? I thought the idea was stuff outside this directory could include the client header, whether it's compiled in or not, and that header doesn't require protos

13

clang -> clangd though :-)

21

do protobuf/grpc++ not get linked in transitively by linking against RemoteIndexProtos? Sad :-(

clang-tools-extra/clangd/index/remote/Index.h
1 ↗(On Diff #259257)

Suggest calling this Client.h

12 ↗(On Diff #259257)

no implementation headers should be needed now

21 ↗(On Diff #259257)

this needs docs:

  • what it does
  • error conditions
  • what happens if support isn't compiled in
21 ↗(On Diff #259257)

is the address resolved synchronously? is there a timeout?
does this block until the channel is ready, or will that happen on the first request?

(My sense is you're probably going to want a timeout param and to describe what it does, and *maybe* return different error types)

sammccall added inline comments.Apr 22 2020, 7:50 AM
clang-tools-extra/clangd/index/remote/Index.cpp
36 ↗(On Diff #259257)

error-handling: what if the symbol is invalid? (log and skip)

36 ↗(On Diff #259257)

this should call a function in marshalling, which just does the YAML thing for now

38 ↗(On Diff #259257)

use log()/vlog if you want to log.
But this should probably be a trace::Span instead so you get the latency.

47 ↗(On Diff #259257)

I'd consider pulling out a streamRPC template, these tend to attract cross-cutting code (setting deadlines, tracing/logging, and such)

97 ↗(On Diff #259257)

if remote is disabled we can't compile the rest of this file.
Rather than ifdefs here, I'd suggest doing it at the cmake level:

if(CLANGD_ENABLE_REMOTE)
  add_clang_library(... Index.cpp)
else()
  add_clang_library(... IndexUnimplemented.cpp)
endif()
clang-tools-extra/clangd/index/remote/Index.proto
13

Just realized: we should probably call this SymbolIndex to match clangd::SymbolIndex (or rename the latter to match this)

21

nit: I'd suggest grouping requests with corresponding replies, and all the common vocab types either above or below.

51–64

"Actual messages" is a bit vague.
Maybe "The response is a stream of symbol messages, and one terminating has_more message."

54

I think a generic name like kind would be more readable here.

If the stream/final-result pattern is going to be our common one, you could consider giving them *all* generic names, like:

message FuzzyFindReply {
  oneof kind {
    Symbol stream_result;
    bool final_result; // HasMore
  }
}

Then they conform to a Stream<StreamT, FinalT> concept and you can abstract the handling in a template.

This makes sense if we want to lean into the index service being a mechanical translation of clangd::SymbolIndex. But I'm not sure whether that's a good idea, e.g. this one needs to be backwards-compatible, so they may drift.

67

Did you find out? Fixing this isn't a backwards-compatible change, and "this code is broken and I don't know why" isn't a great state for checkin.

Happy to help but please provide some details.

76

Ref should be a message with ref_yaml for now?
(Consistency with Symbol)

clang-tools-extra/clangd/index/remote/server/Server.cpp
79

new and set_allocated_* are scary :-)

NextMessage.mutable_symbol()->set_yaml_serialization(...)

92

fromProtobuf(Request)?

kbobyrev updated this revision to Diff 259356.Apr 22 2020, 12:17 PM
kbobyrev marked 10 inline comments as done.

Just a checkpoint: resolve several comments.

clang-tools-extra/clangd/index/remote/Index.proto
67

Yes, it has to do with different code generated by Protobuf and I learned how to discover which API is becoming removed and which is accessible.

kbobyrev updated this revision to Diff 259547.Apr 23 2020, 6:37 AM
kbobyrev marked 13 inline comments as done.

Address the remaining comments.

clang-tools-extra/clangd/index/remote/Index.cpp
97 ↗(On Diff #259257)

I would argue that including index/remote/Client.h without CLANGD_ENABLE_REMOTE is not correct. We would still have to put #ifdefs in the user code regardless of whether what you proposed is implemented. I don't see any benefits in allowing users to include index/remote/Client.h, link clangdRemoteIndex and getting a runtime error. All of those steps have _remote_ in them and if _remote mode_ is not enabled, something certainly went wrong.

Also, this will complicate CMake structure as I can't put files that are conditionally added/excluded from the clang library and I would either have to make a separate directory with an "empty" library, or put a bunch of #ifdefs in here. Either is not optimal and I think it'd be better to leave it like this. WDYT?

kbobyrev edited the summary of this revision. (Show Details)Apr 23 2020, 6:38 AM
kbobyrev updated this revision to Diff 259553.Apr 23 2020, 6:50 AM

Convert plain function pointers to std::function in templates.

kbobyrev updated this revision to Diff 259604.Apr 23 2020, 9:26 AM

Implement (B) for now

kbobyrev updated this revision to Diff 259644.Apr 23 2020, 11:25 AM
kbobyrev marked an inline comment as done.

Resolve the last comment.

sammccall added inline comments.Apr 23 2020, 1:15 PM
clang-tools-extra/clangd/Features.inc.in
2

nit: can we use the same name for the cmake variable and the preprocessor define?

There's a lot of potential for conceptual confusion between them, but I don't think giving them subtly different names helps, I'd rather make it just work for confused people :-)

clang-tools-extra/clangd/index/dex/dexp/Dexp.cpp
14

Features.inc - that's the version that's been cmake-substituted
(I'm not sure why this isn't a regular include-guarded header, let's fix that separately)

18

Being able to include this header but not call the function in it doesn't make sense - linker errors aren't a friendly way to surface this constraint.

If the goal is to forbid use of the API without it compiled in, the header should #ifndef #CLANGD_ENABLE_REMOTE #error "Client.h included but CLANGD_ENABLE_REMOTE is off" #endif, and all the include-sites should be #ifdef-guarded.

(But please see other comments first)

40

I'd consider rather making this a string prefix on IndexLocation, so you'd invoke as dexp remote:localhost:4004 or whatever.
This encapsulates more easily, e.g. we can eventually move this into openIndex so all the callers transparently support the remote index.
(Though we shouldn't push that dependency into clangd in *this* patch, it just makes it easier in future)

clang-tools-extra/clangd/index/remote/Client.h
22

I can't follow this sentence - it's not clear what "SymbolIndex request" or "the actual connection" are.

I guess you mean something like "This method blocks to resolve the address, but does not wait for the channel to become ready".

22

The blocking behaviour is interesting:

  • clangd should keep working while offline or in bad connectivity (thus no timeout on RPCs is a bug, we shouldn't rely on the channel outright detecting failure)
  • this implies you don't want to return null if the connection can't be established, and so there's indeed not much point blocking. But blocking on resolving the name removes a lot of the benefit here.

I think GRPC actually has a pretty good model, it's documented here: https://github.com/grpc/grpc/blob/master/doc/connectivity-semantics-and-api.md
(may be worth adding a link).
I'm not sure if this is actually blocked on name resolution, but it seems like it's at most blocked on *one attempt* at name resolution: the channel can still be created if resolution hasn't succeeded yet.

23

RPCs have no timeout should be a FIXME

30

I'm not sure what this is saying or why a caller would want to know it.

clang-tools-extra/clangd/index/remote/Index.cpp
1 ↗(On Diff #259604)

This implements Client.h, should be Client.cpp

14 ↗(On Diff #259604)

I'd consider using `<> for this include, or at least splitting it into a different section - non-LLVM non-stdlib includes are really rare.

26 ↗(On Diff #259604)

no need to pass the stub if you make this a member func

27 ↗(On Diff #259604)

Hmm, I think member pointers are totally the right thing here. And all the params should be deducible.

Have a look at https://godbolt.org/z/AnprJ- (simplified but self-contained and I think shows all the bits)

The thing I can't work out how to do is make the member pointer a template param *and* have it be deduced...

31 ↗(On Diff #259604)

SpanName can be RPCRequestT::descriptor()->name() (e.g. "LookupRequest") rather than having to pass it explicitly

46 ↗(On Diff #259604)

not even attempting to open the connection on startup seems to introduce more latency on first request than necessary. Call Channel->GetState(true)?
That won't block, but it will cause the connection to establish in the background.

68 ↗(On Diff #259604)

this callback is almost the same - if you added a dummy stream result to Lookup you could move it into streamrpc :-)

97 ↗(On Diff #259257)

I would argue that including index/remote/Client.h without CLANGD_ENABLE_REMOTE is not correct. We would still have to put #ifdefs in the user code regardless of whether what you proposed is implemented

This is begging the question - what I'm proposing is to give that header/library defined behavior when CLANGD_ENABLE_REMOTE is off, and so you wouldn't need #ifdefs.

I don't see any benefits in allowing users to include index/remote/Client.h, link clangdRemoteIndex and getting a runtime error.

The benefits are:

  • no ifdefs and fewer conditional cmake rules, which are both hard to read and result in untested constructs without even basic syntax checking
  • fewer cases to consider in clients, as this falls into the regular "can't connect to index" codepath
  • readers trying to follow the possible behaviours end up at the header documentation of a function that is called, which is an easy and familiar workflow

All of those steps have _remote_ in them and if _remote mode_ is not enabled, something certainly went wrong

Again, this is begging the question. If the header says "if GRPC mode is not enabled, always returns nullptr", then everything is working as designed.

Also, this will complicate CMake structure as I can't put files that are conditionally added/excluded from the clang library

Why not? (This sounds like a fair technical objection, but may be surmountable)

clang-tools-extra/clangd/index/remote/Marshalling.h
25 ↗(On Diff #259604)

You could also just log the error and return Optional at this level... there's no recovery other than logging anyway. I kind of hate Expected/Error because of their asserting and weird non-constness so I'd be tempted by that.
Up to you though

clang-tools-extra/clangd/index/remote/server/CMakeLists.txt
17

why does the server depend on the client?

kbobyrev updated this revision to Diff 259701.Apr 23 2020, 1:52 PM
kbobyrev marked 13 inline comments as done.

Resolve comments that refer to the outdated patch and address few small issues.

clang-tools-extra/clangd/index/dex/dexp/Dexp.cpp
18

Should be resolved in the newer version with (A) implementation.

clang-tools-extra/clangd/index/remote/Client.h
23

streamRPC already has a FIXME for that.

clang-tools-extra/clangd/index/remote/Index.cpp
97 ↗(On Diff #259257)

I believe this is resolved in new version.

Also, this will complicate CMake structure as I can't put files that are conditionally added/excluded from the clang library

Why not? (This sounds like a fair technical objection, but may be surmountable)

As discussed before, there is a CMake module in LLVM that errors out in case some files are not included in any library. In case of conditional compilation, either UnimplementedClient.cpp or Client.cpp will be left out (or this has to be the single file with #ifdefs). See new version for the exact layout I came up with.

This is especially unfortunate because I actually need two libraries - one for Client implementation and one for Marshalling. And because of that I have to have three separate directories :( Suboptimal, but probably OK.

clang-tools-extra/clangd/index/remote/server/CMakeLists.txt
17

Fixed in a newer diff.

kbobyrev updated this revision to Diff 259762.Apr 23 2020, 5:29 PM
kbobyrev marked 3 inline comments as done.

Resolve couple more comments.

clang-tools-extra/clangd/index/remote/Index.cpp
27 ↗(On Diff #259604)

Uh, I had the pointers but decided against it because of rather... bizarre syntax :D

And yes, just like you said, the calls contained too many explicit template parameters, so I decided the implementation should be more verbose than "user code". I'm not against pointers, but I think the current version looks simpler, WDYT?

kbobyrev updated this revision to Diff 259836.Apr 24 2020, 2:31 AM
kbobyrev marked 6 inline comments as done.

Resolve the last comments.

kbobyrev updated this revision to Diff 259839.Apr 24 2020, 2:44 AM

Don't use callback on the last message.

sammccall accepted this revision.Apr 24 2020, 3:45 AM

LG, nits apart from the llvm_unreachable in unimplemented, let's discuss further if you disagree on that.

We do have to go back and add unit tests for this stuff. I'm OK with doing that once the marshalling is sorted out (not YAML), but we should make sure we remember!

clang-tools-extra/clangd/index/YAMLSerialization.cpp
536

we should still log (or at least vlog)

(Actually personally I'd return Expected here and then log and drop to optional in Marshalling, but it doesn't matter)

clang-tools-extra/clangd/index/dex/dexp/CMakeLists.txt
2

This comment is likely to get stale. I'd drop it: if we're including the clangd source dir, also including the clangd binary dir is appropriate and doesn't really require an explanation.

clang-tools-extra/clangd/index/dex/dexp/Dexp.cpp
32

Dex -> index file
(dex is the in-memory implementation, not the data format)

44

dex -> dexp (and in the example command)

331

no longer needs to be ifdef'd: if the user passes remote:foo and it's not compiled in, they'll get an appropriate error

clang-tools-extra/clangd/index/remote/CMakeLists.txt
26

Maybe "# provides a dummy implementation of clangdRemoteIndex"

clang-tools-extra/clangd/index/remote/Client.cpp
30

whether this is a member pointer or a function wrapper, it should be in a typedef or readability.
e.g.
template <typename ReqT, typename RspT>
using StreamingCall = std::function<unique_ptr<ClientReader<RspT>>>(Stub*, ClientContext*, const RspT&);

48

nit: I'd suggest dropping "YAML" from this message so we don't forget later. Rather "Received invalid {0}: {1}", ReplyT::descriptor()->name(), Sym.takeError()

96

nit: for slightly cleaner separation, I'd suggest creating the channel here and passing it into the IndexClient constructor.

The channel is where various configuration (auth, etc) might live, and it's nice if the client class stays doesn't need to deal with that level of abstraction.
e.g. a test version that doesn't use the real network would pass in a different channel and the IndexClient wouldn't care. (Not sure if that's how testing is actually done in grpc these days, but you get the idea...)

clang-tools-extra/clangd/index/remote/Client.h
2

Client.h - Connect to a remote index via gRPC

clang-tools-extra/clangd/index/remote/Index.cpp
27 ↗(On Diff #259604)

There are advantages of member function pointers here:

  • the fact that it's a member function on the stub *is* significant information here I think
  • passing a std::function allows deduction of RPCRequestT and ReplyT as the compiler can pattern-match, while that doesn't work for function<> because there's an implicit conversion in the way.
  • (std::function is slow, but that shouldn't matter here. I guess function_ref doesn't have support for member functions?)

And yes, just like you said, the calls contained too many explicit template parameters

I think this is backwards, the function pointer version should require no explicit type template parameters, where the function<> version requires 2. See the godbolt example above for deduction.

(The thing I couldn't get working was making the member fptr a *non-type-template-param* while keeping the type template params deduced, but that's not a disadvantage vs function<>)

I think the current version looks simpler, WDYT?

After extracting an alias template for the type (which should happen anyway) I don't think there's a significant difference really:

template <typename ReqT, RspT>
using StreamingCall = std::function<std::unique_ptr<grpc::ClientReader<ReplyT>>(Stub*, ClientContext*, const ReqT&)>;
// vs
template <typename ReqT, RspT>
using StreamingCall = std::unique_ptr<grpc::ClientReader<ReplyT>>(Stub::*)(ClientContext*, const ReqT&);

the member fptr syntax is more unusual/exotic, but shorter and less nested. Neither is really readable, and the readability of this line isn't that significant.

clang-tools-extra/clangd/index/remote/server/Server.cpp
64

*NextMessage.mutable_stream_result() = toProtobuf(Sym)

and move the trivial toProtobuf logic to Marshalling (so it's trivial to replace later)

clang-tools-extra/clangd/index/remote/unimplemented/CMakeLists.txt
2

This is unusual, include a comment for what it's doing.

clang-tools-extra/clangd/index/remote/unimplemented/Client.cpp
1 ↗(On Diff #259839)

Probably call this UnimplementedClient.cpp, no need to have the source filenames unneccesarily coincide? Makes it clear the other one is the primary.

19 ↗(On Diff #259839)

no llvm_unreachable, we shouldn't crash. Just elog and continue...

This revision is now accepted and ready to land.Apr 24 2020, 3:46 AM
kbobyrev updated this revision to Diff 259859.Apr 24 2020, 4:42 AM
kbobyrev marked 16 inline comments as done.

Address the last round of comments.

sammccall accepted this revision.Apr 24 2020, 5:05 AM
sammccall added inline comments.
clang-tools-extra/clangd/index/remote/Client.cpp
65

nit: LookupRequest and LookupReply should be deducible now I think

This revision was automatically updated to reflect the committed changes.
kbobyrev marked an inline comment as done.Apr 24 2020, 5:27 AM
clang-tools-extra/clangd/index/remote/Index.proto