This is an archive of the discontinued LLVM Phabricator instance.

If kevent() is interrupted by signal (or is being debugged) and we get EINTR, retry
Needs RevisionPublic

Authored by jasonmolenda on Jan 17 2018, 3:29 PM.

Details

Summary

I hit this while trying to debug lldb-server. When lldb-server is waiting for connections, it will be in MainLoop::RunImpl::Poll(), in the kevent() call. On darwin systems, attaching to the process (or interrupting it to add a breakpoint or detach) will interrupt the system call it is sitting in. We will get back an error return value (-1) and errno will be set to EINTR.

Currently we flag this as an error and lldb-server exits with

error: IO object is not valid.

which makes it a bit difficult to debug.

This section of code is only used on the *BSD systems and Darwin (it's #ifdef'ed HAVE_SYS_EVENT_H). I tested it on Darwin and on Linux (before noticing that linux doesn't have sys/event.h) but don't have access to a *BSD system. I don't expect any problems handling the interrupted kevent() call on those systems as I'm doing here.

Diff Detail

Repository
rL LLVM

Event Timeline

jasonmolenda created this revision.Jan 17 2018, 3:29 PM
davide requested changes to this revision.Jan 17 2018, 3:41 PM
davide added reviewers: zturner, labath, vsk, aprantl.

I think this one is reasonable, but please wait for another opinion before committing (and write a test). Pavel touched this code very recently and added tests for this so it should be possible to extend.
@labath, what do you think?

This revision now requires changes to proceed.Jan 17 2018, 3:42 PM

Hmmm, testing this would require launching lldb-server, attaching to it from a debugger, detaching, and seeing if lldb-server is still running.

clayborg accepted this revision.Jan 17 2018, 3:46 PM
clayborg added a subscriber: clayborg.

This is a very common unix thing.

vsk requested changes to this revision.Jan 17 2018, 3:57 PM

Why not take an approach similar to the one in https://reviews.llvm.org/D41008? It looks like it's possible to set up a poll loop, call signal(), and verify that the loop is still running.

In D42206#979566, @vsk wrote:

Why not take an approach similar to the one in https://reviews.llvm.org/D41008? It looks like it's possible to set up a poll loop, call signal(), and verify that the loop is still running.

Yes, that was what I had in mind.

I tried sending signals to lldb-server via kill() and the signal handler caught them, the bit of code I had printing out the return value & errno value never got executed. The only way I was able to repo this was by debugging lldb-server.

I tried sending signals to lldb-server via kill() and the signal handler caught them, the bit of code I had printing out the return value & errno value never got executed. The only way I was able to repo this was by debugging lldb-server.

I think Pavel might have ideas on how to test this. Let's wait for him and see. We could all learn something :)

vsk added a comment.Jan 17 2018, 4:06 PM

I tried sending signals to lldb-server via kill() and the signal handler caught them, the bit of code I had printing out the return value & errno value never got executed. The only way I was able to repo this was by debugging lldb-server.

Is it possible to unregister the handler within the unit test?

Are we going to test each unix call that can fail with EINTR? Seems a bit overkill.

vsk added a comment.Jan 17 2018, 4:28 PM

Are we going to test each unix call that can fail with EINTR? Seems a bit overkill.

I think going forward, we should test all functional changes unless it really is prohibitively expensive to do so. Useful changes like this shouldn't be lost to accidental/unrelated commits, and having a regression test is a good way to ensure that.

I actually enjoy debugging things like this, so I tried playing around and came up with this test case:

// Test that a signal which is not monitored by the MainLoop does not cause a premature exit.
TEST_F(MainLoopTest, UnmonitoredSignal) {
  MainLoop loop;
  Status error;
  struct sigaction sa;
  sa.sa_sigaction = [](int, siginfo_t *, void *) { };
  sa.sa_flags = SA_SIGINFO; // important: no SA_RESTART
  sigemptyset(&sa.sa_mask);
  ASSERT_EQ(0, sigaction(SIGUSR2, &sa, nullptr));

  auto handle = loop.RegisterSignal(SIGUSR1, make_callback(), error);
  ASSERT_TRUE(error.Success());
  std::thread killer([]() {
    sleep(1);
    kill(getpid(), SIGUSR2);
    sleep(1);
    kill(getpid(), SIGUSR1);
  });
  ASSERT_TRUE(loop.Run().Success());
  killer.join();
  ASSERT_EQ(1u, callback_count);
}

It's not the nicest of tests: it uses sleep, which is necessary to make sure the main thread gets a chance to block in kevent(2), and one needs to be very careful when doing signal catching in a multithreaded environment (e.g. creating the killer thread before registering the SIGUSR1 handler would make the test flaky on linux, as the new thread would not inherit the signal mask which has USR1 blocked). However, I think it should be sufficient for the purposes of this patch.

In case anyone is interested, the fact that attaching a debugger produces an observable side-effect (EINTR) in the debugged process is considered a bug by the linux kernel folks and they have worked hard to fix it. Right now linux kernel will automatically restart most syscalls after a debugger resume regardless of the value of SA_RESTART flags or anything like that. There is just a handful of syscalls that are really complex and they haven't figured out how to resume them part-way. These still return EINTR.

source/Host/common/MainLoop.cpp
108–112

There's an llvm::sys::RetryAfterSignal which implements the EINTR loop. I had to pass out_events as &out_events[0] to make template deduction happy.

krytarowski added a subscriber: krytarowski.EditedFeb 14 2019, 7:22 AM

'attaching a debugger produces an observable side-effect (EINTR) in the debugged process is considered a bug by the linux kernel folks'

There are differences between Linux and BSD.

On BSD we try hard to make return of such syscall peacefully with valid (sometimes partial) input (like returning already received bytes int read(2)), while on Linux there is hard interruption of operation.

Also we can introduce EINTR easily on BSD in arbitrary moments with ctrl-T (SIGINFO) that is used in many programs to pass information and the unprepared ones can misbehave due to EINTR.

So even if this could be a bug to cause EINTR in some scenarios on BSD under a debugger, it's a program fault to not handle it.

Herald added a project: Restricted Project. · View Herald TranscriptFeb 14 2019, 7:22 AM
labath resigned from this revision.Aug 9 2019, 2:07 AM