This is an archive of the discontinued LLVM Phabricator instance.

[lldb/driver] Fix SIGTSTP handling
ClosedPublic

Authored by labath on Feb 22 2022, 5:48 AM.

Details

Summary

Our SIGTSTP handler was working, but that was mostly accidental.

The reason it worked is because lldb is multithreaded for most of its
lifetime and the OS is reasonably fast at responding to signals. So,
what happened was that the kill(SIGTSTP) which we sent from inside the
handler was delivered to another thread while the handler was still set
to SIG_DFL (which then correctly put the entire process to sleep).

Sometimes it happened that the other thread got the second signal after
the first thread had already restored the handler, in which case the
signal handler would run again, and it would again attempt to send the
SIGTSTP signal back to itself.

Normally it didn't take many iterations for the signal to be delivered
quickly enough. However, if you were unlucky (or were playing around
with pexpect) you could get SIGTSTP while lldb was single-threaded, and
in that case, lldb would go into an endless loop because the second
SIGTSTP could only be handled on the main thread, and only after the
handler for the first signal returned (and re-installed itself). In that
situation the handler would keep re-sending the signal to itself.

This patch fixes the issue by implementing the handler the way it
supposed to be done:

  • before sending the second SIGTSTP, we unblock the signal (it gets automatically blocked upon entering the handler)
  • we use raise to send the signal, which makes sure it gets delivered to the thread which is running the handler

This also means we don't need the SIGCONT handler, as our TSTP handler
resumes right after the entire process is continued, and we can do the
required work there.

I also include a test case for the SIGTSTP flow. It uses pexpect, but it
includes a couple of extra twists. Specifically, I needed to create an
extra process on top of lldb, which will run lldb in a separate process
group and simulate the role of the shell. This is needed because SIGTSTP
is not effective on a session leader (the signal gets delivered, but it
does not cause a stop) -- normally there isn't anyone to notice the
stop.

Diff Detail

Event Timeline

labath requested review of this revision.Feb 22 2022, 5:48 AM
labath created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptFeb 22 2022, 5:48 AM
DavidSpickett added inline comments.Feb 23 2022, 5:36 AM
lldb/test/API/driver/job_control/TestJobControl.py
10

Unused import?

25

Am I correct that self.launch will wait until post_spawn finishes, meaning that self.child will always be set by this point?
(or post_spawn timed out and we don't get here anyway)

Also if the point is just to assert that self.child has been set, it would be a bit clearer to use an assert. Even if it is:

self.assertTrue(hasattr(self, "child"))

At least it looks like a deliberate check rather than a mistake if it fails. (I probably have the arguments the wrong way around, I always do somehow)

lldb/test/API/driver/job_control/shell.py
11

Please comment this function.

I'm not sure if this intends to put the new process into a new process group or the same process group as this script. (but mostly because I haven't used process groups before now)

lldb/tools/driver/Driver.cpp
829

Is sigcont_handler now unused? It only references itself and is registered here.

labath updated this revision to Diff 410800.Feb 23 2022, 6:19 AM

Add comments, delete unused code.

labath marked 4 inline comments as done.Feb 23 2022, 6:19 AM
labath added inline comments.
lldb/test/API/driver/job_control/TestJobControl.py
10

Yes, so are lldb and decorator imports. :)

25

The point of this was that I copy-pasted this code from some other test, which wanted to save some characters by typing child instead of self.child. However, I did not make use of that variable in any new code I actually wrote. :)

I'll just delete this line.

lldb/test/API/driver/job_control/shell.py
11

It creates a new process group.

The way that process groups (and sessions), terminals and signals interact is both awesome and horrifying at the same time. One has to admire the ingenuity of the job control implementation, which completely avoids threads and waits in the shell. But otoh, we're talking about signals, so chances are any random implementation will have a bug/race somewhere, particularly if the app is multi-threaded.

lldb/tools/driver/Driver.cpp
829

The patch actually does delete the function. It's just that the phabricator formatting makes that unobvious, since the patch essentially merges the two functions into one.

I think I mostly get it and the code looks fine, but my signal foo is weak so @mgorny should give a second look.

we use raise to send the signal, which makes sure it gets delivered to the thread which is running the handler

https://man7.org/linux/man-pages/man3/raise.3.html says:

If the signal causes a handler to be called, raise() will return
only after the signal handler has returned.

So what is going to be the "thread which is running the signal handler" here? Will all the lldb threads follow these steps of enter handler -> unregister handler -> raise until finally lldb as a whole sleeps until we get a continue and do the reverse?

labath marked 4 inline comments as done.Feb 23 2022, 7:34 AM

The behavior of the stop signals (SIGTTIN, SIGTTOU, SIGTSTP) is this:

  • if the process has a signal handler, then the kernel picks an arbitrary thread (which does not have the signal blocked) to handle it. As far as the kernel is concerned, that's it. Everything else is up to the handler.
  • if the process has no handler (and the signal is unblocked by at least one thread), then the _entire process_ goes to sleep

So, the answer to your question is: "thread which is running the signal handler" is "whichever is picked by the kernel". Its identity is not important. What's important is that the second signal gets delivered (not "handled", because at that point we have removed the handler) to the same thread, as that's the only one we're sure that will have it unblocked (although, in practice, all threads will probably have it unblocked).

Got it, thanks for the explanation.

What's important is that the second signal gets delivered (not "handled", because at that point we have removed the handler) to the same thread, as that's the only one we're sure that will have it unblocked (although, in practice, all threads will probably have it unblocked).

Right, I was focusing on the "will return only after the signal handler has returned" aspect, but that's not the reason to you're choosing to use raise.

JDevlieghere added inline comments.Feb 23 2022, 12:17 PM
lldb/tools/driver/Driver.cpp
676–677

I see an opportunity for a little RAII helper.

labath added inline comments.Feb 24 2022, 3:03 AM
lldb/tools/driver/Driver.cpp
676–677

What kind of a helper did you have in mind? Practically the entire function consists of setup and teardown in preparation for the raise(signo) call. If I wanted to be fancy I could put all of that in a helper, but I don't think that would make it cleaner. Plus, we also need to be careful about the functions we call from a signal handler, and I really don't know whether e.g. llvm::make_scope_exit is guaranteed to not allocate (heap) memory.

JDevlieghere added inline comments.Feb 24 2022, 9:58 AM
lldb/tools/driver/Driver.cpp
676–677

I was only referring to the Save/RestoreInputTerminalState() part of this function. Something like:

class TerminalStateRAII() {
public:
  TerminalStateRAII(Driver* driver) : driver(m_driver) {
    if (m_driver)
      m_driver->GetDebugger().SaveInputTerminalState();
  }

  ~SignalHelper() {
    if (m_driver)
      m_driver->GetDebugger().SaveInputTerminalState();
  }

private:
  Driver* m_driver;
};

Obviously, this isn't at all important, just something that came to mind.

andcarminati added inline comments.
lldb/tools/driver/Driver.cpp
676–677

I think this is a good idea to reduce code duplication. Another approach:

class TerminalStateRAII() {
public:
  TerminalStateRAII(Driver* driver) : driver(m_driver) {
    SaveInputTerminalState();
  }

  ~TerminalStateRAII() {
    SaveInputTerminalState();
  }

private:
  Driver* m_driver;
  void SaveInputTerminalState(){
    if (m_driver)
      m_driver->GetDebugger().SaveInputTerminalState();
  }
};
JDevlieghere added inline comments.Feb 24 2022, 5:18 PM
lldb/tools/driver/Driver.cpp
676–677

That's a typo on my part, the destructor needs to call RestoreInputTerminalState (as opposed to SaveInputTerminalState).

labath added inline comments.Feb 25 2022, 1:49 AM
lldb/tools/driver/Driver.cpp
676–677

Ok, I see. If this was a more complex function (e.g. multiple return points), then I'd agree, but this function is really simple and linear (just like a signal handler should be). I am not convinced by the "code duplication" argument -- the way I see it, the helper class replaces 4 lines of (simple) code with ~15 lines of boilerplate. And this function is literally the only caller of Save/RestoreInputTerminalState.

andcarminati added inline comments.Feb 25 2022, 5:01 AM
lldb/tools/driver/Driver.cpp
676–677

Considering this argument:

Ok, I see. If this was a more complex function (e.g. multiple return points), then I'd agree, but this function is really simple and linear (just like a signal handler should be).

LGTM

mgorny added inline comments.Mar 2 2022, 11:24 AM
lldb/packages/Python/lldbsuite/test/lldbpexpect.py
27

I think we should follow PEP8 for Python code, i.e. align the first param with the first param from line above.

32
51
lldb/test/API/driver/job_control/shell.py
10
24
Herald added a project: Restricted Project. · View Herald TranscriptMar 2 2022, 11:24 AM
labath updated this revision to Diff 412643.Mar 3 2022, 2:22 AM
labath marked 3 inline comments as done.

Address mgorny's comments.

mgorny accepted this revision.Mar 3 2022, 2:37 AM

LGTM (but I cannot test)

This revision is now accepted and ready to land.Mar 3 2022, 2:37 AM
This revision was automatically updated to reflect the committed changes.