Page MenuHomePhabricator

[clangd] Log remote index connectivity status
ClosedPublic

Authored by kbobyrev on Nov 26 2020, 4:20 PM.

Diff Detail

Event Timeline

kbobyrev created this revision.Nov 26 2020, 4:20 PM
kbobyrev requested review of this revision.Nov 26 2020, 4:20 PM
kbobyrev planned changes to this revision.Nov 26 2020, 4:21 PM

This is not working properly yet.

kbobyrev updated this revision to Diff 307936.Nov 26 2020, 4:25 PM

Remove some logs.

kbobyrev updated this revision to Diff 307942.Nov 26 2020, 6:06 PM

This is working now. However, the handshake is actually deferred for
ProjectAwareIndex until the first request. I tried to invoke it via something
like Opts.StaticIdx->estimateMemoryUsage() in ClangdMain.cpp but the
problem is getIndex() will return nullptr until the Config::current is
ready. I'm not entirely sure when this happens so I can't promote handshake to
some sensible time point yet.

kbobyrev edited the summary of this revision. (Show Details)Nov 26 2020, 6:07 PM
kbobyrev added a reviewer: sammccall.
kbobyrev updated this revision to Diff 307943.Nov 26 2020, 6:08 PM

Remove unused header.

Harbormaster completed remote builds in B80286: Diff 307942.

Haven't checked the details but is there a specific reason for implementing a custom protocol rather than making use of NotifyOnStateChange (https://grpc.github.io/grpc/cpp/classgrpc_1_1_channel_interface.html) or even WaitForStateChange if we really want to block the channel creation ?

Also if there's a need to make this an RPC request, there's the default HealthCheck request we can send, as described in https://github.com/grpc/grpc/blob/master/doc/health-checking.md. This is already enabled for remote-index-server in https://github.com/llvm/llvm-project/blob/master/clang-tools-extra/clangd/index/remote/server/Server.cpp#L310, but I suppose registering a callback for statechange and logging that should be enough for a "debugging" use case (not sure what the motivation currently is).

It'd be nice to know what problem this is trying to solve :-)
I can guess, but don't feel entirely sure here.

Haven't checked the details but is there a specific reason for implementing a custom protocol rather than making use of NotifyOnStateChange (https://grpc.github.io/grpc/cpp/classgrpc_1_1_channel_interface.html) or even WaitForStateChange if we really want to block the channel creation ?

Right, this is usually the way to go: we should definitely log NotifyOnStateChange events (note that the connection can go between available/unavailable many times over the life of the program). WaitForStateChange would be useful for blocking, but in a program where we want to proceed even if the connection is down (which is AFAIK the plan here) it's usually best not to block, and just handle fast-failure because the channel is down the same way as any other RPC error.

(I can imagine scenarios where we'd want to notify the user with showMessage or so whether the index is connected, but this requires more design)

clang-tools-extra/clangd/index/remote/Client.cpp
179

This causes getClient to become blocking.

Granted, we expect this to be used from ProjectAwareIndex which currently places it on a separate thread anyway. But in general, what's the advantage of blocking here?

183

this means if we're offline or so when clangd starts up, the remote index will not work until restarting clangd. Do we want this behavior?

clang-tools-extra/clangd/index/remote/Index.proto
14 ↗(On Diff #307943)

I find this protocol confusing, e.g. the "checksum" isn't actually a checksum of anything.
But this wolud be easier to assess if we knew what this is for.

kbobyrev updated this revision to Diff 308269.Nov 30 2020, 12:06 AM
kbobyrev marked 3 inline comments as done.

Use WaitForStateChange instead. Simplify code structure.

Yes, the idea here is to provide logging for gRPC server state. My original thought was that default HealthCheck might not be enough and we might want to add more things in the mechanism but I guess it's not obvious what these extensions might be and getting grpc_connectivity_state might be enough.

kbobyrev retitled this revision from [clangd] Implement remote index handshake to [clangd] Log remote index connectivity status.Nov 30 2020, 1:21 AM

Thanks! Hooking into the native status stuff seems much nicer!

Sorry if it seems like we're going in circles here, I think we should be careful about adding background threads so want to look at an alternative too.

clang-tools-extra/clangd/index/remote/Client.cpp
27

this seems pedantic, but we really shouldn't specialize templates we don't own for types we don't own (ODR concerns)

I'd just make this function (state -> StringRef) and call it

33

any reason not to make these a bit more human-readable? connecting/ready/transient failure/shutdown?

123

The thread is a bit surprising here. I was assuming we'd use some callback-based API to watch rather than spawn a thread to do the "work" of blocking on a change.
Also this is going to "wake up" periodically to refresh the watch, which seems undesirable if we want to go truly idle.

Ah, so there's no callback-type connection-changed API! Sorry to mislead you :-( I hadn't read the API carefully. (I think this is coming as some point - they're talking about replacing CompletionQueue with EventManager which is a callback-based abstraction)

Seems the "async" option is to subscribe on a CompletionQueue, but then we have to poll it. We can use a zero timeout so the poll doesn't block... But then we might as well poll GetStatus directly.

Then the question is where to do that from. What if we called GetStatus before/after each request, and reported whenever it has changed? That means:

  • no idle background work
  • we know the connection state for each request
  • we know when a request triggered a connection, and whether that connection succeeded in time for the request
  • if a connection goes idle, we don't know *when*, it gets reported on the next request
132

This is spinning in the background keeping the channel alive (true)... I'm actually not sure that's advisable.

If the connection is being used, it'll stay alive in the same way. So true here affects idle (and init) behavior.

Specifically, we maintain the TCP connection for the idle period to reduce latency of the first couple of requests after waking up (reestablish connection). I'm not sure this is a good trade, idle could easily be long and I don't think we know better than the OS when to drop a connection. (Cost might be battery life, background CPU load... IDK but probably something)

I think we should ask the connection to establish on startup (*high* chance we need it soon) but otherwise only in response to requests

133

Can we include the remote address and prior status?

e.g. "Remote index ({0}): {1} => {2}"

kbobyrev planned changes to this revision.Dec 8 2020, 2:54 PM

Need to resolve review comments.

kbobyrev updated this revision to Diff 310381.Dec 8 2020, 4:07 PM
kbobyrev marked 3 inline comments as done.

Resolve the comments.

clang-tools-extra/clangd/index/remote/Client.cpp
123

("resolved" in this version)

I can understand your point and I support the arguments but this would mean that we only get status during requests and can't query it at arbitrary time which seems like a loss. Maybe it's OK because there might not be many interesting cases when we want to query the server status but not do any requests and your proposal makes it easier to implement & maintain. But also maybe it could be nice to have some kind of callback on status change.

As you mentioned, I was looking for something like this in gRPC itself but couldn't find it there outside the CompletionQueue and EvenManager things. Sadly, I don't think we'd be in a position to use the replacement even when it's there since we have to support older gRPC versions :(

When I see the logs and the server connectivity updates are following the requests, it looks OK but I'm not entirely sure giving up on _watching_ the status is great. But it still makes life easier so maybe this change should be about it.

133

I can see how it'd be nice but I'm not sure how to do this properly. My only idea is to query prior status in the beginning of the request? Otherwise it needs to be stored somewhere and gRPC interfaces require const functions, so IndexClient does not change the state within requests. I could either wrap it into something or have global grpc_connnectivity_state for the last observed status but neither look OK. Maybe querying the prior status right before the request is OK (that's what I've done here) but I'm not sure if that's exactly what you suggested.

sammccall added inline comments.Dec 9 2020, 5:16 AM
clang-tools-extra/clangd/index/remote/Client.cpp
107

we might want to put the server address at the end in [], mirroring D92181

123

I agree, it's absolutely a loss vs watching the state continuously.
But I think simplicity needs to win out here. Imagine if we have trouble coordinating shutdown in certain configurations because the thread gets blocked (grpc bug?)... this would not be much fun to debug.

In principle we could #ifdef based on GRPC version in the future, in practice you're probably right that we just suffer without this :-(

133

TL;DR: mutable atomic

Using a mutable member is the usual solution to "I want to be stateful but have to adhere to a const interface.
We usually have some philosophical debate about whether the function is still "conceptually const" but it certainly is here as the state in question is only affecting logging.

With mutable you usually have to add a mutex to provide the usual thread-safety guarantees (const = threadsafe). However in this case we just want to detect transitions, and atomics are great at this:

auto NewStatus = getStatus();
auto OldStatus = last_status_.exchange(NewStatus);
if (NewStatus != OldStatus)
  log(...);

This is guaranteed to log each time last_status_ changes value. (I think this will correspond closely enough to actual status changes, though calling using a mutex and calling getStatus() under the mutex would be a little more accurate)

Note that there's no difference between what you want to do before vs after an RPC - both are just chances to detect a transition - so pull out a function and call it in both places.

(I'm fairly sure atomic<grpc_connectivity_state> should be directly usable, on any decent impl this should be the same implementation as atomic<underlying_type<grpc_connectivity_state>>)

kbobyrev updated this revision to Diff 310642.Dec 9 2020, 1:39 PM
kbobyrev marked 5 inline comments as done.

Resolve review comments.

sammccall accepted this revision.Dec 15 2020, 6:01 AM

LG with one fix around the atomic

clang-tools-extra/clangd/index/remote/Client.cpp
50

you access ConnectionStatus twice, and have a race between them.

auto OldStatus = ConnectionStatus.exchange(NewStatus);
if (OldStatus != NewStatus)
  log(...)
This revision is now accepted and ready to land.Dec 15 2020, 6:01 AM
kbobyrev updated this revision to Diff 311937.Dec 15 2020, 9:23 AM
kbobyrev marked an inline comment as done.

Remove the race between ConnectionStatus access.

kbobyrev updated this revision to Diff 311939.Dec 15 2020, 9:25 AM

Cleanup includes.

kbobyrev updated this revision to Diff 311942.Dec 15 2020, 9:27 AM

Cleanup rebase artifact.

kbobyrev updated this revision to Diff 311943.Dec 15 2020, 9:28 AM

Actually clean up rebase artifacts.

This revision was landed with ongoing or failed builds.Dec 15 2020, 9:30 AM
This revision was automatically updated to reflect the committed changes.