This is an archive of the discontinued LLVM Phabricator instance.

Introduce a MainLoop class and switch llgs to use it
ClosedPublic

Authored by labath on Jul 9 2015, 9:32 AM.

Details

Summary

This is the first part of our effort to make llgs single threaded. Currently, llgs consists of
about three threads and the synchronisation between them is a major source of latency when
debugging linux and android applications.

In order to be able to go single threaded, we must have the ability to listen for events from
multiple sources (primarily, client commands coming over the network and debug events from the
inferior) and perform necessary actions. For this reason I introduce the concept of a MainLoop.
A main loop has the ability to register callback's which will be invoked upon receipt of certain
events. MainLoopPosix has the ability to listen for file descriptors and signals. I have also
introduced a stubbed-out implementation of MainLoopWindows, which was necessary to make code
compile on windows (this does not introduce any regression as the llgs component is not used on
windows). It is my hope that if can get a windows implementation of this class, we could
gradually replace the uses of select in lldb (which are currently #ifdefed for windows) with
MainLoop invocations.

For the moment, I have merely made the GDBRemoteCommunicationServerLLGS class use MainLoop
instead of waiting on the network socket directly, but the other threads still remain. In the
followup patches I indend to migrate NativeProcessLinux to this class and remove the remaining
threads.

Diff Detail

Repository
rL LLVM

Event Timeline

labath updated this revision to Diff 29342.Jul 9 2015, 9:32 AM
labath retitled this revision from to Introduce a MainLoop class and switch llgs to use it.
labath updated this object.
labath added a subscriber: lldb-commits.
labath added a comment.Jul 9 2015, 9:35 AM

I would welcome any feedback about the general design of the class (and in particular, whether such a design is workable on windows).

clayborg edited edge metadata.Jul 9 2015, 9:51 AM

Since there are going to require multiple patches to make this work it might be best to do this in a branch. I worry about being able to handle everything in a single threaded way and have it be faster. I worry about deadlocks and other things that might not be anticipated. Did you profile this to make sure the threading is a huge overhead? What exact parts are holding up things?

include/lldb/Core/Connection.h
202 ↗(On Diff #29342)

return "lldb::IOObjectSP()" instead of nullptr.

tools/lldb-server/lldb-gdbserver.cpp
667 ↗(On Diff #29342)

Why do you need to pass "mainloop" if it is already in gdb_server?

The design is very signal centric right now. Not sure if that will bode well for windows.

What are the signals supposed to be for? Just handling "SIGHUP" and "SIGINT"? If so, you need to make sure no one does any real work inside any of these functions because it isn't safe to do 95% of any function calls in a signal handler function. It would be better to just create a lldb_private::Event into your loop that can be handled on the up and up and the signal handler would just post an event to the queue for the MainLoop to consume. Then you can do anything you want when you get the event because you aren't handling it from a signal handler callback. But of course this is usually the problem: something has locked up your lldb-server and you want to interrupt it. If the MainLoop is handling a packet that is deadlocked somehow, you will never be able to consume the event you just created for SIGHUP and you won't be able to cancel/kill/disconnect...

Are the signal handlers designed to handle the signals received from the program that is being debugged? If so this would be better served up using StopInfoSP objects for specific threads.

I look forward to seeing your responses.

amccarth edited edge metadata.Jul 9 2015, 4:54 PM

Assuming I've understood correctly ...

I don't think any approach will completely eliminate the need for a separate thread on Windows to run the debug loop (at least, not without an inefficient hack like polling for events).

On Windows, debug events are special--they don't work like other types of asynchronous events, so you can't just wait for them like you might wait for an asynchronous i/o or for some handle to become signaled. Nor is there a hybrid wait function (like MsgWaitForMultipleObjects that lets you wait for a either a windows message or a handle to become signaled). As far as I know, there's just WaitForDebugEvent, which only waits for debug events (and imposes some restrictions on other things that can be done in that thread). You could poll by waiting with a 0 timeout, but that would be an inefficient hack, so that's a non-starter.

But that doesn't rule out this MainLoop abstraction. A simple debug loop in a separate thread could translate raw Windows debugging events into events that could be detected by MainLoop. So if MainLoop helps on other platforms, the Windows implementation could be made to work with this model, probably without a significant difference in performance or code complexity.

include/lldb/Host/posix/MainLoopPosix.h
23 ↗(On Diff #29342)

s/Montitoring/Monitoring/

24 ↗(On Diff #29342)

s/recieves/receives/

81 ↗(On Diff #29342)

What sorts of signals might this mechanism be used for?

Be aware that Windows defines fewer signals than POSIX.

include/lldb/Host/windows/MainLoopWindows.h
21 ↗(On Diff #29342)

Is there a particular reason you didn't defined a common interface in a base class that both MainLoopWindows and MainLoopPosix would derive from? Is it concern over the performance of virtual function calls? Or are you expecting different platforms to provide significantly different public interfaces?

labath marked 3 inline comments as done.Jul 10 2015, 3:10 AM

Thank you for the comments. My (rather lengthy) replies are below.

Since there are going to require multiple patches to make this work it might be best to do this in a branch. I worry about being able to handle everything in a single threaded way and have it be faster. I worry about deadlocks and other things that might not be anticipated. Did you profile this to make sure the threading is a huge overhead? What exact parts are holding up things?

It is going to be a series of patches, but it's not going to be that big, since I have already done a big thread cleanup inside NativeProcessLinux. llgs now consists of three threads
a) network communication thread
b) inferior monitoring thread
c) stdio forwarding thread
This commit makes (a) use the mainloop abstraction. I expect I will need two more to port over (b) and (c), and then maybe some small cleanup commits. I think it will be easier to review this if I keep the work on the main branch, and I believe it will work without introducing regressions.
( Just to make things clear, I do not intend to do anything with the client threads here. The IOHandler thing i suggest below is merely an abstraction switch, but all the thread interactions remain the same. I do feel that the client has too many threads, but that is a completely different topic. :) )

I already have a butchered version of llgs, which runs on a single thread. I have noticed no deadlocks and my benchmarks indicate that the round-trip time for the qThreadStopInfo packet has gone down from 7.8 to 5.4 ms (these numbers include network latency). For comparison switching to jThreadsInfo did not speed things up at all, it merely added up the individual round-trip (but I have some ideas on how to improve that as well). The threading is an overhead for us since we have to perform all debug operations on a designated thread. This involves a lot of back-and-forth when we are responding to packets since our debug operations are quite small-grained (e.g. "read this register"). Now we could alleviate this by making the operations more coarse-grained ("read all GPRs"), but I think we would hit this wall sooner or later. As I understand it, debugserver is also single-threaded for performance reasons, right?

This is why I think it would be better to get rid of the thread ping-pong first, and then profiling the remaining bottle-necks (of which there are many, I am sure) will become easier.

What are the signals supposed to be for? Just handling "SIGHUP" and "SIGINT"? If so, you need to make sure no one does any real work inside any of these functions because it isn't safe to do 95% of any function calls in a signal handler function.

Currently, the signal handler just sets a flag. This flag will be checked in the Run() function and the appropriate callback invoked if it is set. NativeProcessLinux will install a callback for SIGCHLD and will retrieve all the info it needs when it gets called.

If the MainLoop is handling a packet that is deadlocked somehow, you will never be able to consume the event you just created for SIGHUP and you won't be able to cancel/kill/disconnect...

SIGHUP is not my main motivation for this, but I have ported it while I was at it. SIGHUP is used as a request for graceful termination, so checking for the flag after packet handling completes is a good choice I think (and this is what the current implementation is doing). If the process is locked up, then we are unlikely to be able to terminate gracefully, so sending a signal which will kill us outright (SIGTERM or any other unhandled signal) is as good solution as any.

But that doesn't rule out this MainLoop abstraction. A simple debug loop in a separate thread could translate raw Windows debugging events into events that could be detected by MainLoop. So if MainLoop helps on other platforms, the Windows implementation could be made to work with this model, probably without a significant difference in performance or code complexity.

I don't think the current implementation of ProcessWindows needs to worry about the MainLoop (see the comment about IOHandler on why i am involving you). However, if windows ever moves to the remote debugging model then we may need to do something like you suggest.

include/lldb/Host/posix/MainLoopPosix.h
81 ↗(On Diff #29342)

The main use for me now is the receipt of SIGCHLD (which signals that there is something interesting happening in the inferior). On linux we can use this to multiplex inferior events and network communication on a single thread.

I know that on windows, signals aren't really a thing and if we don't need them there, MainLoopWindows does not even have to declare this method. On the other hand, I can imagine implementing a MLW::RegisterWindowsSpecificHandle, which would then multiplex socket I/O with some other events using whatever method is right for that on windows. Or maybe it does not need additional methods and everything that we need to wait for can be represented with IOObjects.

The main value of this abstraction I see for windows would be for things like IOHandlerProcessSTDIO, which is disabled for windows, because there you cannot select() over non-socket file descriptors. I imagine the code there can be replaced by something like

void IOHandlerProcessSTDIO::Run() {
  MainLoop::ReadHandleUP = m_mainloop.RegisterReadObject(m_read_file,
    [] { do_whatever_you_need_to_do_when_data_is_available(); } );
  m_mainloop.Run();
}
void IOHandlerProcessSTDIO::Cancel() {
  m_mainloop.RequestTermination();
}

(for this we would need to make RequestTermination be safe to execute from another thread, which I haven't done yet, since I don't need it, but it can be easily added)

include/lldb/Host/windows/MainLoopWindows.h
21 ↗(On Diff #29342)

I saw the typedef pattern in LockFile.h, but I did not notice that there actually is a LockFileBase that they inherit from. I will add a Base class for this.

tools/lldb-server/lldb-gdbserver.cpp
667 ↗(On Diff #29342)

I have refactored this to avoid passing the parameter. Main loop is now run directly from the main function. (mainloop is in gdb_server, because the NativeProcessLinux will need to register additional events when it get's spawned).

labath updated this revision to Diff 29439.Jul 10 2015, 4:22 AM
labath edited edge metadata.
  • Address review comments
clayborg accepted this revision.Jul 10 2015, 10:01 AM
clayborg edited edge metadata.

I am fine with this part of the patch since it doesn't introduce any changes and most of the threading issues seem to be with the ProcessMonitor if I understood your comments correctly.

To clarify how debugserver works: It has 3 threads:

  • main thread that just does read packet, process packet, write packet, loop
  • exception thread for catching exceptions from the debugged process
  • reap thread to read the unix process
This revision is now accepted and ready to land.Jul 10 2015, 10:01 AM
ovyalov accepted this revision.Jul 10 2015, 10:39 AM
ovyalov edited edge metadata.

Minor comments.

include/lldb/Host/posix/MainLoopPosix.h
91 ↗(On Diff #29439)

s/int/IOObject::WaitableHandle ?

source/Host/posix/MainLoopPosix.cpp
108 ↗(On Diff #29439)

Could add a check for iterator != m_signals.end() ?

171 ↗(On Diff #29439)

Check for termination flag set here?
It seems no need to read from sockets if signal handlers requested termination?

tools/lldb-server/lldb-gdbserver.cpp
411 ↗(On Diff #29439)

Could you log if error is returned?

470 ↗(On Diff #29439)

ditto.

labath marked 3 inline comments as done.Jul 13 2015, 3:38 AM
labath added inline comments.
source/Host/posix/MainLoopPosix.cpp
171 ↗(On Diff #29439)

Ok, makes sense. I will add the check after every callback then, because by extension, there is no need to read from the second socket (if any) if the first socket requested termination.

tools/lldb-server/lldb-gdbserver.cpp
411 ↗(On Diff #29439)

Done. I've also merged the two branches a bit.

This revision was automatically updated to reflect the committed changes.