This is an archive of the discontinued LLVM Phabricator instance.

[lldb] [gdb-remote] Client support for using the non-stop protocol
Needs Review · Public

Authored by mgorny on May 29 2022, 7:28 AM.

Details

Summary

Add client support for using the non-stop protocol to run processes
in all-stop mode. The operations are still done synchronously, so this
does not enable the client to asynchronously resume threads or send
additional requests while the process is running.

If plugin.process.gdb-remote.use-non-stop-protocol is set to true prior
to connecting and the server reports support for the QNonStop extension,
the client attempts to enable non-stop mode. Using non-stop mode has
the following implications (see the sketch after this list):

  1. When a continue packet is sent, the client receives the immediate "OK" response and blocks until a %Stop notification arrives. Then it drains the notification queue.
  2. When a "?" packet is sent, the client issues a series of "vStopped" commands in order to drain the notification queue.
  3. When the process is about to be stopped, the client issues a "vCtrlC" command instead of the break sequence.
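
To make the flow concrete, here is a minimal client-side sketch of the continue-and-drain sequence. The helper names (conn, send_packet, read_notification) are hypothetical stand-ins, not the actual GDBRemoteClientBase API; only the packet strings come from the protocol:

def continue_and_wait(conn):
  # In non-stop mode the resume packet gets an immediate "OK" reply...
  assert conn.send_packet("vCont;c") == "OK"
  # ...while the actual stop arrives later as an asynchronous %Stop notification.
  stops = [conn.read_notification()]
  # Drain any queued notifications; an "OK" reply means the queue is empty.
  while True:
    reply = conn.send_packet("vStopped")
    if reply == "OK":
      return stops
    stops.append(reply)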

Sponsored by: The FreeBSD Foundation

Diff Detail

Event Timeline

mgorny created this revision. May 29 2022, 7:28 AM
mgorny updated this revision to Diff 432910. May 30 2022, 6:37 AM

Split the main tests into two variants: one assuming that the server runs in all-stop mode (like lldb-server), the other assuming it runs in non-stop mode (like gdbserver on Linux).

mgorny updated this revision to Diff 434007. Jun 3 2022, 6:17 AM

Use vCont;t instead of vCtrlC to stop other threads, for better gdbserver compatibility. Defer enabling non-stop mode until after the ? status query, to work around a bug in gdbserver.

(I think Jim would be interested in following this.)

I'm not sure I understand what this does. Is it that, when the setting is enabled, lldb uses non-stop communication at the protocol level, but then forces an all-stop whenever one thread stops (so that the upper layers are happy)?

API tests required changing the Process::Halt() API to inspect the
process's private state rather than the public state, which remains
eStateUnloaded in gdb_remote_client tests.

I don't think that checking the private state is correct there. Doesn't that just mean there's a bug in the tests, in that they're not simulating a real gdbserver sufficiently well? Or maybe (if the simulated responses are reasonable) lldb should be made to work (properly set the state) for the simulated responses?

lldb/packages/Python/lldbsuite/test/gdbclientutils.py
26

I guess this comment needs updating. I've also been thinking about whether this is a sufficiently obvious way to send notifications, but I kinda like the brevity of it.

lldb/source/Plugins/Process/gdb-remote/GDBRemoteClientBase.cpp
128

not vCont;t? Also, I think these log lines are unnecessarily verbose and are obscuring the real code. I think a single log line (indicating failure) should be more than enough, as more information (the kind of failure, for instance) can be determined by looking at the packet log.

147

This is unfortunate. Isn't there a better way to do this? Why aren't you using vCtrlC as all the comments say? Did you find any parallel to this in gdb?

450

Isn't the vCtrlC packet supposed to have a response (OK or Exx according to https://sourceware.org/gdb/onlinedocs/gdb/Packets.html#Packets) ?

lldb/source/Plugins/Process/gdb-remote/GDBRemoteClientBase.h
62

Maybe call this GetNonStopEnabled or GetNonStopInUse ?

lldb/test/API/functionalities/gdb_remote_client/TestGDBRemoteClient.py
686

This is not a good assertion. Maybe you could replace it with an after-the-fact check using assertPacketLogContains?

717

I think here you'll need to manually extract (and verify) the running/stopped events from the appropriate listener. In async mode, Continue and Stop return immediately, so there's no guarantee that anything has actually happened at this point.
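
Roughly along these lines (a sketch; lldb here is the real SB API module, but the helper and its use in the test are hypothetical):

import lldb

def assert_next_state(listener, expected_state, timeout=5):
  # Pull the next event off the listener and check the process state it carries.
  event = lldb.SBEvent()
  assert listener.WaitForEvent(timeout, event)
  assert lldb.SBProcess.GetStateFromEvent(event) == expected_state

After a Continue() you would expect an eStateRunning event and then an eStateStopped one, in that order.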

Thanks for the review and welcome back!

(I think Jim would be interested in following this.)

Ok, I've added him to all the related diffs.

I'm not sure I understand what this does. Is it that, when the setting is enabled, lldb uses non-stop communication at the protocol level, but then forces an all-stop whenever one thread stops (so that the upper layers are happy)?

Yes. This way, it preserves compatibility with a genuine gdbserver that actually implements non-stop mode.

API tests required changing the Process::Halt() API to inspect the
process's private state rather than the public state, which remains
eStateUnloaded in gdb_remote_client tests.

I don't think that checking the private state is correct there. Doesn't that just mean there's a bug in the tests, in that they're not simulating a real gdbserver sufficiently well? Or maybe (if the simulated responses are reasonable) lldb should be made to work (properly set the state) for the simulated responses?

Yes, I think it's a limitation of the test logic that the public state isn't being updated. I haven't really managed to get past figuring that part out. Do you have any suggestion where to look for the problematic code?

That said, I'm currently focused on multiprocess support in lldb-server. While this is relevant, it isn't blocking the server work, and I don't want to distract myself by switching contexts right now, but I'll look into implementing your suggestions later on.

lldb/packages/Python/lldbsuite/test/gdbclientutils.py
26

Yeah, I can't say I'm very proud of it, but I can't think of a way that would both be really clean and wouldn't involve major changes all over the place.

I suppose one option would be to wrap these packets in a minimal Notification class, and then do something like:

if isinstance(message, Notification):
  ...

and then pass Notification("foo") instead of "%foo", but I'm not convinced it'd actually be more obvious.
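
For what it's worth, the wrapper itself would be tiny (just a sketch, not actual gdbclientutils code):

class Notification:
  """Marks a message to be sent as a %-notification rather than a regular packet."""
  def __init__(self, payload):
    self.payload = payload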

lldb/source/Plugins/Process/gdb-remote/GDBRemoteClientBase.cpp
128

Good catch.

Just to be clear, do I understand correctly that you're talking about having a single LLDB_LOG that covers all non-successful responses?

147

Yes, it is unfortunate. I'm going to think more and try to figure out a better way. I was originally thinking of simply not waiting for the response, instead letting it come later and taking care of any pending notifications before sending the next continue packet (this would require merging D126655 first), but I'm somewhat afraid of race conditions. Though maybe unnecessarily.

Though OTOH I guess 500 ms may be insufficient for debugging over the Internet, and this would lead to even worse results.

As for vCtrlC, I've changed the code to use vCont;t, as the former didn't work well with gdbserver (and the latter is also more correct with respect to the protocol). I forgot to update the comments.

450

Indeed, you're correct. I guess I missed it because the stop-reason handler took care of eating it.

lldb/source/Plugins/Process/gdb-remote/GDBRemoteClientBase.h
62

I kinda wanted to emphasize 'protocol' here, to make it clear we aren't introducing full-scale non-stop support.

lldb/test/API/functionalities/gdb_remote_client/TestGDBRemoteClient.py
686

Do you mean because the client could issue an irrelevant QNonStop:0 at some point? I suppose that makes sense then.

mgorny marked 5 inline comments as done. Jul 5 2022, 9:56 AM
mgorny added inline comments.
lldb/source/Plugins/Process/gdb-remote/GDBRemoteClientBase.cpp
147

Ok, I've been thinking and thinking about this, and I have a better idea. However, since this is not needed for LLGS, only for true gdbserver, I'm going to split it into a separate patch.

My idea is basically to, in a loop:

  1. issue vCont;t to request stopping all threads.
  2. issue ? to get the list of all stopped threads.
  3. issue qfThreadInfo to get the list of all threads.

and repeat this until both lists are the same (see the sketch below). That should take care of pretty much every corner case and race condition I can think of.
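
A rough sketch of that loop, with hypothetical helper names (only the packet strings are the real ones):

def stop_all_threads(conn):
  while True:
    conn.send_packet("vCont;t")  # ask the server to stop every thread
    # "?" reports the threads that have pending stop replies...
    stopped = set(conn.query_stopped_threads())
    # ...while qfThreadInfo lists every thread that currently exists.
    known = set(conn.query_thread_list())
    if stopped == known:
      return  # nothing raced a new start or stop past us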

lldb/source/Plugins/Process/gdb-remote/GDBRemoteClientBase.h
62

Thinking about it again, I suppose you're right.

mgorny updated this revision to Diff 442519. Jul 6 2022, 4:48 AM
mgorny marked 2 inline comments as done.

Partial update. Implemented most of the requests, except what's noted below. Also implemented support for the %Stdio: stdout-forwarding style that was eventually implemented in LLGS.

I haven't been able to figure out how to make the test suite work without the public-private state change.

As for the OK response to vCtrlC… it's hard because the incoming packet causes the pending read in GDBRemoteClientBase::SendContinuePacketAndWaitForResponse() to succeed, so the OK is swallowed there. I can't think of any way to do this that wouldn't be super ugly.

mgorny updated this revision to Diff 443427. Jul 9 2022, 3:52 AM
mgorny edited the summary of this revision. (Show Details)

Good news, everyone! It turns out that the private/public state is not needed anymore — I guess fixing the races in tests resolved that.

Did some final cleanup of tests as well. I guess this covers everything except for grabbing the "OK" response to the "vCtrlC" interrupt.

mgorny updated this revision to Diff 443428. Jul 9 2022, 4:07 AM

Move common parts of the mock responder into a base class. Include PIDs in responses, and add trailing ; to T packets.

@labath, gentle ping. We appreciate all the work you've done reviewing my patches already, and we realize this is a lot of extra work for you. However, could you please try to review this one as well? It's starting to block our further work.

Sorry about the delay. I've been putting this off because it's complicated, although that's a lousy excuse. The gist of it is that I don't think the existing setup of the "async threads" and "continue locks" is going to work in this new world. The reason it has worked until now is that the async thread has been exclusively reading packets, and the only thing the other threads could do is write a special single-character "packet" which has no response. With this patch neither of those things is true anymore -- the async thread sends a bunch of notification packets, and the vCtrlC packet sent by other threads is a regular packet that we should get a response to. Currently, I don't see anything preventing the two threads from talking over each other.

I think we need to do something about this. *One* way to do it (I'm sure there are others) is to move all the (interruption-related) communication into the async thread. Instead of sending the interrupt (both ^C and vCtrlC) packet from the interrupting thread directly, we would instead signal the async thread in some way, and let it do the interruption for us. The communication will be single-threaded then, and it should be easier to maintain the protocol integrity. We might be able to reuse/repurpose the InterruptRead functionality to achieve this. (It's possible that won't work on Windows -- I should be able to help if that's the case.)
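
To make that concrete, a sketch of the single-writer scheme (all names hypothetical; Python-level pseudocode rather than the actual C++):

import threading

class AsyncPacketLoop:
  def __init__(self, conn):
    self.conn = conn
    self.interrupt_requested = threading.Event()

  def request_interrupt(self):
    # Called from any thread; no packet is written here.
    self.interrupt_requested.set()
    self.conn.interrupt_read()  # wake the blocked read (cf. InterruptRead)

  def run(self, on_packet):
    while True:
      packet = self.conn.read_packet()  # may return None when woken early
      if self.interrupt_requested.is_set():
        self.interrupt_requested.clear()
        # Only this thread ever writes, so the vCtrlC and its OK/Exx
        # response cannot interleave with traffic from other threads.
        on_packet(self.conn.send_packet("vCtrlC"))
      if packet is not None:
        on_packet(packet)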

lldb/test/API/functionalities/gdb_remote_client/TestNonStop.py
80–82 (On Diff #443428)

I don't think this is necessary anymore. These days we should be resetting all settings before each test.

mgorny updated this revision to Diff 448707. Jul 29 2022, 1:25 PM
mgorny marked an inline comment as done.

Thanks for the suggestions. I have to admit I don't necessarily like the InterruptRead() approach, but given that the alternative involves major changes to how we read packets, I suppose it's a good enough solution (and big thanks for that suggestion!). All should be done now.

labath added a comment. Aug 3 2022, 6:43 AM

Ok, I think I like this, but since this is pretty finicky code, could I ask you to split it up into smaller patches? The first could be just the interruption refactor. The second could contain all the m_notification_packet_queue and ReadPacket stuff (I think this one could come with a C++ unit test in unittests/Process/gdb-remote/). And the last one could be the vCtrlC/vCont;t stuff and whatever is left (?)

lldb/source/Plugins/Process/gdb-remote/GDBRemoteClientBase.cpp
105

Could we just send vCont;t to begin with? (I know this was discussed already, but that was before we switched to the new interruption mechanism)

mgorny added inline comments. Aug 3 2022, 6:56 AM
lldb/source/Plugins/Process/gdb-remote/GDBRemoteClientBase.cpp
105

I'm sorry, but I don't understand what you're asking about. The vCont;t part is split into D129554, since it involves some finicky logic because we can't predict whether we're going to get any notifications or not.

labath added inline comments. Aug 3 2022, 7:13 AM
lldb/source/Plugins/Process/gdb-remote/GDBRemoteClientBase.cpp
105

Nevermind, it's me who is not understanding things. I forgot what the problem with vCont;t was (serves me right for not reviewing on time).

I guess my point was that we may be able to ignore the vCtrlC part, and immediately start off with a vCont;t (and then do the wait-until-everything-stops dance), but we can leave that for the other patch.

mgorny updated this revision to Diff 449744. Aug 3 2022, 12:41 PM

Rebase on top of the split patches.