This is an archive of the discontinued LLVM Phabricator instance.

Disable exit-on-SIGPIPE in lldb
ClosedPublic

Authored by vsk on Oct 17 2019, 4:09 PM.

Details

Summary

Occasionally, during test teardown, LLDB writes to a closed pipe.
Sometimes the communication is inherently unreliable, so LLDB tries to
avoid being killed due to SIGPIPE. Actually, it explicitly calls
signal(SIGPIPE, SIG_IGN). However, LLVM's default SIGPIPE behavior is
to exit with IO_ERR. Opt LLDB out of that.

I expect that this will resolve some LLDB test suite flakiness (tests
randomly failing with IO_ERR) that we've seen since r344372.

rdar://55750240

Diff Detail

Event Timeline

vsk created this revision.Oct 17 2019, 4:09 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 17 2019, 4:09 PM
jfb added inline comments.Oct 17 2019, 4:20 PM
lldb/tools/driver/Driver.cpp
864

Could this instead pass a lambda, like other signal handlers? So sigpipe would call any user-specified lambda, and by default would instead call exit.

vsk planned changes to this revision.Oct 17 2019, 4:23 PM
vsk marked an inline comment as done.
vsk added inline comments.
lldb/tools/driver/Driver.cpp
864

Ah, sure. That would make things a bit more customizable.

vsk updated this revision to Diff 225546.Oct 17 2019, 4:37 PM
vsk edited the summary of this revision. (Show Details)
  • Allow setting a SIGPIPE handler.
jfb accepted this revision.Oct 17 2019, 4:55 PM
jfb added a subscriber: jordan_rose.

More of an FYI, @jordan_rose might be interested in this.

This revision is now accepted and ready to land.Oct 17 2019, 4:55 PM
jordan_rose added inline comments.Oct 17 2019, 5:03 PM
llvm/lib/Support/Unix/Signals.inc
372–373

Should it be doing the same sort of exchange as the interrupt function?

vsk marked an inline comment as done.Oct 18 2019, 8:16 AM
vsk added inline comments.
llvm/lib/Support/Unix/Signals.inc
372–373

The interrupt handler is apparently meant to be disabled after it’s used once (not sure why, the docs don’t say). As lldb is long-lived, I expect it should survive multiple SIGPIPEs.

JDevlieghere accepted this revision.Oct 18 2019, 8:37 AM

Thanks for fixing this, Vedant!

jordan_rose added inline comments.Oct 18 2019, 9:38 AM
llvm/lib/Support/Unix/Signals.inc
372–373

I guess the SIGINT one makes sense so that you can kill a stuck process with ^C^C.

vsk planned changes to this revision.Oct 18 2019, 10:22 AM

The death tests are flaky. I've noticed two issues:

  1. When run under lit, the DisableExitOnSIGPIPE doesn't actually exit when it receives SIGPIPE. Dtrace suggests that the unit test process inherits an "ignore" sigaction from its parent.
  2. When run in parallel, exiting causes the unit test to run atexit destructors for other unit tests lumped into the SupportTests binary. One of these is a condition_variable destructor and it hangs. Also, gtest warns: Death tests use fork(), which is unsafe particularly in a threaded context. For this test, Google Test detected 21 threads.

So I plan to give up on testing "EnableExitOnSIGPIPE" and will write a non-death test to get some coverage.

vsk updated this revision to Diff 225661.Oct 18 2019, 10:27 AM

Replace the death tests with a non-death test. I plan to commit this later in the day if there aren't any objections.

This revision is now accepted and ready to land.Oct 18 2019, 10:27 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptOct 18 2019, 2:04 PM
labath added a subscriber: labath.Oct 19 2019, 12:01 AM
In D69148#1714785, @vsk wrote:

The death tests are flaky. I've noticed two issues:

  1. When run under lit, the DisableExitOnSIGPIPE doesn't actually exit when it receives SIGPIPE. Dtrace suggests that the unit test process inherits an "ignore" sigaction from its parent.
  2. When run in parallel, exiting causes the unit test to run atexit destructors for other unit tests lumped into the SupportTests binary. One of these is a condition_variable destructor and it hangs. Also, gtest warns: Death tests use fork(), which is unsafe particularly in a threaded context. For this test, Google Test detected 21 threads.

So I plan to give up on testing "EnableExitOnSIGPIPE" and will write a non-death test to get some coverage.

gtest gives you a (somewhat rudimentary) mechanism of ensuring death tests run in a single threaded fashion. It consists of tacking "DeathTest" at the end of your test case name (like TEST(SignalDeathTest, Whatever)). These kinds of tests are run first, hopefully before the tests which spin up various other threads.

As for the parent "ignore" action (it's more likely to be a signal mask, actually), this can be reset with an appropriate sigprocmask(2) call.

lldb/tools/driver/Driver.cpp
850

I don't think this line does anything anymore.

vsk marked an inline comment as done.EditedOct 21 2019, 3:41 PM
In D69148#1714785, @vsk wrote:

The death tests are flaky. I've noticed two issues:

  1. When run under lit, the DisableExitOnSIGPIPE doesn't actually exit when it receives SIGPIPE. Dtrace suggests that the unit test process inherits an "ignore" sigaction from its parent.
  2. When run in parallel, exiting causes the unit test to run atexit destructors for other unit tests lumped into the SupportTests binary. One of these is a condition_variable destructor and it hangs. Also, gtest warns: Death tests use fork(), which is unsafe particularly in a threaded context. For this test, Google Test detected 21 threads.

So I plan to give up on testing "EnableExitOnSIGPIPE" and will write a non-death test to get some coverage.

gtest gives you a (somewhat rudimentary) mechanism of ensuring death tests run in a single threaded fashion. It consists of tacking "DeathTest" at the end of your test case name (like TEST(SignalDeathTest, Whatever)). These kinds of tests are run first, hopefully before the tests which spin up various other threads.

A problem I'm encountering with this is that the static initializer from Signals.inc don't appear to be re-run between death tests. This makes the death tests pretty brittle, imo, as swapping the order of the tests can break them. Do you think the extra coverage is worth it? (See https://reviews.llvm.org/D69283 for what this looks like.)

As for the parent "ignore" action (it's more likely to be a signal mask, actually), this can be reset with an appropriate sigprocmask(2) call.

Thanks, that's a great catch.

lldb/tools/driver/Driver.cpp
850

I don't think that's true. LLVM still does registerHandler(SIGPIPE, SignalKind::IsKill) in RegisterHandlers. At least, removing the signal(SIGPIPE, SIG_IGN) calls causes the unit test to "fail" (i.e. the test RUNs but doesn't reach OK).

In D69148#1717361, @vsk wrote:

A problem I'm encountering with this is that the static initializer from Signals.inc don't appear to be re-run between death tests. This makes the death tests pretty brittle, imo, as swapping the order of the tests can break them. Do you think the extra coverage is worth it? (See https://reviews.llvm.org/D69283 for what this looks like.)

Yeah, not being able to run each test in a separate process is a big problem here. I'm not too worried about adding those tests here. However, looking at those tests, and your inline comment, I get the impression you're not understanding how the various signal functions actually work. One definitely shouldn't be calling the signal function manually, and also using the llvm signal handler functions (at least, not without a very big comment explaining what he's doing). They both do the same thing -- set the global signal handler. And since the llvm handling is initialized lazily this sets us up for all sorts of unpredictible and confusing behavior. See the inline comment for details.

lldb/tools/driver/Driver.cpp
850

Ah, right, I see now. This does have a very subtle change in behavior -- it changes the signal handler that llvm restores *after* the signal has been received the first time (Signals.inc:355). Normally, it would put SIG_DFL there (which would kill the process). But with this, it puts SIG_IGN there. So this changes what we do when we receive SIGPIPE for the *second* time.

This means that the first SIGPIPE signal is ignored by the virtue of us calling SetPipeSignalFunction(nullptr), but all subsequent SIGPIPEs are ignored thanks to this SIG_IGN. It also means this whole talk of "being able to survive multiple SIGPIPEs" is moot, as llvm does not allow that. In fact it looks like the very first occurence of SIGPIPE will cause _all_ signal handlers to be uninstalled. I think this is a very confusing situation to be in.

If we don't want to make llvm support handling/ignoring multiple SIGPIPEs correctly, then I think that a much better solution would be to just work around this in lldb by making sure the llvm handler is never invoked for SIGPIPE (this can be arranged by making sure signal(SIGPIPE, SIG_IGN) is called *after* llvm installs its handlers.

I'm not too worried about adding those tests here.

I'm not too worried about *not* adding those tests here.

vsk marked an inline comment as done.Oct 21 2019, 5:20 PM

I admit I hadn't paid enough attention to the order in which signal handlers were registered in lldb.

lldb/tools/driver/Driver.cpp
850

Point taken. I agree that when a SIGPIPE is received, it doesn't make sense to reset the signal handler state as it was prior to llvm's handler registration. Do you think it'd be reasonable to:

  • In lldb, force llvm to register its handlers, then immediately ignore SIGPIPE.
  • Revert the SetPipeSignalFunction changes, as they would no longer be required.
labath added inline comments.Oct 23 2019, 6:15 AM
lldb/tools/driver/Driver.cpp
850

As far as workarounds go, I think that's the best we can do know (+ add a big comment explaining what we're doing there).

I think that overall the llvm signal handling needs a lot more work, but as that would probably require buy-in from various actors, I don't think that it should be you who has to fix all that.

FTR, I find the current signal handling very strange, and completely unsuitable for something which claims to be a library. I don't believe a library should install any signal handler unless it's explicitly asked to, or, at the very least, it should provide a mechanism to opt out of this.
I would make this handler installation a part of InitLLVM, or something similar, and make it possible to customize its behavior a lot more. The current implementation is very clearly tailored for one-shot non-interactive tools that immediately exit when encountering a problem. Besides SIGPIPE (which even the non-interactive tools might want to ignore), it's handling of SIGINT is also definitely not what lldb wants to do (forward it to the debugged process). There are even some JIT tools that might want to catch & recover from SIGSEGVs, but this implementation makes that very hard to do because of the whole race about who installs the signal handlers last.

vsk added a comment.Oct 24 2019, 1:24 PM

Reverted in d0bd3fc88be.