This is an archive of the discontinued LLVM Phabricator instance.

[lldb] [llgs] Implement non-stop style stop notification packets
ClosedPublic

Authored by mgorny on May 13 2022, 11:52 AM.

Details

Summary

Implement the support for %Stop asynchronous notification packet format
in LLGS. This does not implement full support for non-stop mode for
threaded programs -- process plugins continue stopping all threads
on every event. However, it will be used to implement asynchronous
events in multiprocess debugging.

The non-stop protocol is enabled using QNonStop packet. When it is
enabled, the server uses notification protocol instead of regular stop
replies. Since all threads are always stopped, notifications are always
generated for all active threads and copied into stop notification
queue.

If the queue was empty, the initial asynchronous %Stop notification
is sent to the client immediately. The client needs to (eventually)
acknowledge the notification by sending the vStopped packet, in which
case it is popped from the queue and the stop reason for the next thread
is reported. This continues until notification queue is empty again,
in which case an OK reply is sent.

Asychronous notifications are also used for vAttach results and program
exits. The ? packet uses a hybrid approach -- it returns the first
stop reason synchronously, and exposes the stop reasons for remaining
threads via vStopped queue.

The change includes a test case for a program generating a segfault
on 3 threads. The server is expected to generate a stop notification
for the segfaulting thread, along with the notifications for the other
running threads (with "no stop reason"). This verifies that the stop
reasons are correctly reported for all threads, and that notification
queue works.

Sponsored by: The FreeBSD Foundation

Diff Detail

Event Timeline

mgorny created this revision.May 13 2022, 11:52 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 13 2022, 11:52 AM
mgorny requested review of this revision.May 13 2022, 11:52 AM
mgorny updated this revision to Diff 431124.May 20 2022, 10:04 PM

Fix missing OK response for c/C/vCont/… packets.

mgorny updated this revision to Diff 431140.May 21 2022, 8:56 AM

Implemented vCtrlC packet and added a test for it.

mgorny updated this revision to Diff 432787.May 29 2022, 7:27 AM
mgorny edited the summary of this revision. (Show Details)

Improve protocol conformance:

  • use non-stop notifications for vAttach packets
  • clear notification queue and use hybrid approach for ? packets
  • ensure that the server doesn't exit before the client gets a chance to ACK the exit notification
mgorny updated this revision to Diff 434006.Jun 3 2022, 6:16 AM

Send OK before the asynchronous stop reason to k packet. Support vCont;t (as a special case) in non-stop mode.

I think this is a pretty good start. The main thing bugging me is the two bool flags on the stop reply packet functions. I wonder if it would be more readable to split this into two functions. Something like:

SendStopReplyForThread(bool force_synchronous /*seems better than allow_async, but that could just be me*/, bool enqueue /*enqueues just this thread*/);
EnqueueStopReplyPackets(NativeThreadProtocol &thread_to_skip);

perhaps?

lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServerLLGS.cpp
3709

Is this correct? I would expect to this to first read (front()) the packet before popping it.

lldb/test/API/tools/lldb-server/TestLldbGdbServer.py
1414

Could you put these into a new file? TestLldbGdbServer is already one of the longest running tests...

1443

Is this necessary? (I get that it's not useful, but it seems weird...)

1525

Are you sure this is not racy (like, can the inferior terminate before we get a chance to interrupt it)?

mgorny added inline comments.Jun 15 2022, 12:55 PM
lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServerLLGS.cpp
3709

Yes, it is. I suppose it could be a bit misleading. The first message is sent as asynchronous notification but we keep it on the queue (and technically, the protocol assumes we may resend the asynchronous notification if we think client may have not gotten it). vStopped means client has processed it and is ready for the next one, so we pop the one that's been sent already and send the next one, wait for the next ACK and so on.

lldb/test/API/tools/lldb-server/TestLldbGdbServer.py
1443

It's convenient because it lets us compare the matches from threads and threads_verify.

1525

Well, if I'm reading the code right, the new thread waits 60 seconds, so yes, technically it's racy ;-). Do you have a better idea? Some new argument that puts the process in an infinite sleep loop?

mgorny marked an inline comment as done.Jun 17 2022, 9:28 AM

I think this is a pretty good start. The main thing bugging me is the two bool flags on the stop reply packet functions. I wonder if it would be more readable to split this into two functions. Something like:

SendStopReplyForThread(bool force_synchronous /*seems better than allow_async, but that could just be me*/, bool enqueue /*enqueues just this thread*/);
EnqueueStopReplyPackets(NativeThreadProtocol &thread_to_skip);

perhaps?

force_synchronous makes sense. I guess moving the loop into another function does as well. However, I think the bool enqueue change is going to be more confusing than helpful (note that the notification is always enqueued through SendNotificationPacketNoLock() unless force_synchronous, so bool enqueue only applies for that case), and I don't really see how to avoid bool queue_all_threads other than repeating the calls to EnqueueStopReplyPackets() all over the place — which ofc is possible but seems like a lot of duplication.

The thing is, we generally have three cases to handle:

  1. A function that uses asynchronous replies in non-stop mode — i.e. we always enqueue, and send async notification if one wasn't sent already.
  2. A function that always uses synchronous replies — i.e. we send plain old reply and don't queue anything.
  3. ? — which in non-stop mode enqueues for all threads but sends the first one synchronous rather than asynchronous.

Perhaps the best approach would be to eliminate bool enqueue entirely and call PrepareStopReplyPacketForThread() from ? handler directly. I need to think about it more.

lldb/test/API/tools/lldb-server/TestLldbGdbServer.py
1414

Yes, I agree. TestNonStop.py coming once!

mgorny marked an inline comment as done.Jun 17 2022, 10:11 AM

Thinking about it more, ? needs to be special-cased anyway to handle multiprocess, so that's probably the best way forward.

mgorny updated this revision to Diff 438007.Jun 17 2022, 12:29 PM

Move tests to a separate file. Rename and reverse allow_async into force_synchronous. Remove queue_all_threads — we now always queue all in async mode. Rework Handle_stop_reason not to rely on SendStopReasonForState(), and fix reporting exiting processes via ? when not acked by client yet. Add tests for correct exit reporting.

labath accepted this revision.Jun 21 2022, 7:26 AM

Ok, looks good. Some extra suggestions inline.

lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServerLLGS.cpp
487

no else after return

1564–1574

Could we have a helper function for this, and make all the continue-like actions go through that?

1720

llvm::erase_if(m_stop_notification_queue, []{ return !begins_with_W_or_X;});

3203

add /*force_synchronous=*/ here and elsewhere.

3709

Yeah, it is confusing. Having a comment (with the explanation you just made) would help a lot. However, if you don't see any use case for the resending, we could also delete it, and use the standard queue pattern. Up to you...

lldb/test/API/tools/lldb-server/TestLldbGdbServer.py
1443

Ok, I suppose I can live with that.

1525

That's ok, a 60s wait is fine. I forgot there was a default sleep action -- I though the process would quit immediately...

This revision is now accepted and ready to land.Jun 21 2022, 7:26 AM
mgorny marked 10 inline comments as done.Jun 21 2022, 9:49 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 21 2022, 10:10 AM
mib added a subscriber: mib.Jun 21 2022, 5:54 PM

Hi @mgorny ! TestNonStop.py is also failing on the macOS bots: https://green.lab.llvm.org/green/job/lldb-cmake/44743/

Let me know if you need help reproducing the issue, otherwise skip the test on Darwin systems @skipIfDarwin

Thanks!

Hi @mgorny ! TestNonStop.py is also failing on the macOS bots: https://green.lab.llvm.org/green/job/lldb-cmake/44743/

Let me know if you need help reproducing the issue, otherwise skip the test on Darwin systems @skipIfDarwin

Sorry about that. Actually, I need to mark them as LLGS-specific. Thanks for reporting!