This is an archive of the discontinued LLVM Phabricator instance.

Don't stop execution in batch mode when process stops with SIGINT or SIGSTOP
ClosedPublic

Authored by tatyana-krasnukha on Sep 19 2019, 3:32 PM.

Diff Detail

Repository
rL LLVM

Event Timeline

labath added a subscriber: labath.Sep 20 2019, 5:03 AM

What behavior does this change exactly? This is not immediately obvious to me, and probably anyone else not intimately familiar with the stop reason machinery (= everyone except Jim, probably). Could you add a test case for this (which I am hoping will also shed some light on what is this patch doing)?

The flag m_abnormal_stop_was_expected was added in the commit to avoid stop on crash because attach always stops
with a SIGSTOP on OSX. I faced the same issue with SIGTRAP and SIGSTOP sent by a gdb-remote stub on some commands. This patch filters out such harmless signals instead of setting the flag everywhere in commands.

+1 on a test case. Other than that this seems like a good improvement to the way we filter stop reasons.

source/Interpreter/CommandInterpreter.cpp
2164 ↗(On Diff #220913)

I'm curious why you swapped the two operands. Is there a benefit in eStopReasonException == reason compared to reason == eStopReasonException?

source/Interpreter/CommandInterpreter.cpp
2164 ↗(On Diff #220913)

This technique prevents unintended assignment instead of comparison since left-hand value is constant.

JDevlieghere added inline comments.Sep 20 2019, 11:05 AM
packages/Python/lldbsuite/test/driver/batch_mode/TestBatchMode.py
83 ↗(On Diff #221058)

Update the comment?

source/Interpreter/CommandInterpreter.cpp
2164 ↗(On Diff #220913)

Ah, because the StopReason is not const. Fair enough, thanks for the explanation. I don't think I've seen that in other parts of LLDB but I'm not opposed to it.

jingham requested changes to this revision.EditedSep 20 2019, 11:18 AM

This is all about the decision in "batch" mode for what constitutes a stop by the program that should cause batch mode to return the control to the user.

I don't think anybody would want to stop for SIGSTOP and SIGINT if the program under the debugger can recover.

However, builtin_trap on most UNIX systems results in a SIGTRAP. It's not at all uncommon to use builtin_trap to signal a fatal condition in a library (for instance Swift uses this on array overflow and for a number of other error conditions.) It has the benefit over "abort" that it doesn't push a bunch of stack frames onto the stack before dying, which makes crash analysis easier. This use means we certainly want batch mode to stop for SIGTRAP.

If the SIGTRAP is for a breakpoint, lldb will convert the incoming signal event into a breakpoint-hit event, which doesn't cause the batch mode to stop.

So maybe we don't actually need to have "should suppress" be true for SIGTRAP. We should be able to distinguish between traps that were really in the code of the binary and traps lldb inserted, and make the decision based on that. That "process handle" behavior was just copied from gdb, which kept the breakpoint SIGTRAP's raw. And after all, non-breakpoint SIGTRAP's are generally fatal so whether you pass them or not is not of much real concern.

I can't tell how real SIGTRAP signaled breakpoints work through lldb because on MacOS because don't come in as SIGTRAP's (they come in as Mach exceptions). It would be worth seeing how this works for Linux hosts.

But in any case, a SIGTRAP that doesn't have an underlying breakpoint should definitely cause batch mode to stop. Under what circumstances did you need to suppress it?

source/Interpreter/CommandInterpreter.cpp
2164 ↗(On Diff #220913)

We don't use that convention anywhere else in lldb. Introducing it piecemeal just makes the code harder to read, and I personally find this convention makes code generally harder to parse in my head.

This revision now requires changes to proceed.Sep 20 2019, 11:18 AM

This is an interesting problem. As Jim suspects, breakpoint int3s come out as stop reason = breakpoint in lldb, and non-breakpoint int3 as SIGTRAP. So, in theory, it should be possible to change the SIGTRAP behavior so that it does not impact SIGTRAPs from lldb-generated breakpoints, though it will require some fiddling with the QPassSignals packet -- the way it's implemented now is that it would bypass the stop reason determination and forcefully reinject all signals (even the "debugger generated" ones into the process).

That said, I'm not sure this is actually a good idea. :) One of the uses of application-generated SIGTRAPs I've seen (and used) was to implement a poor man's "fast conditional breakpoint" -- you insert something like if (do_i_want_to_stop) int3() into the code, and let the application run under a debugger. The intention here is to be able to continue the program from this "breakpoint" and this will only work if the debugger does not reinject the SIGTRAP.

Now, this is not a super common pattern, so it might be fine to just ask to user to change the SIGTRAP disposition if he wants to do this. However, given that this is the default in gdb too, I am not sure if it is such a good idea to change it. As this problem is specifically about the behavior of process attach, maybe the problem should be solved at a higher level? Could we make "process attach" never set the "crashed" flag? Attaching to a process just at the moment it is about the crash is very unlikely, and I suspect it is actually impossible (on linux at least).

Alternatively, (and this is something that would be useful in in (lit) tests too), we could add a mechanism to selectively suppress the stop-on-error behavior. Something like nohup process attach perhaps?

First off, note that this becomes complicated because this patch tied the decision of whether a given stop reason should warrant returning control to the user in lldb's "batch mode", and whether the signal should be passed back to the target. That isn't a necessary binding, and indeed it might be surprising to people to find out that changing the "target handle -p" option changes the behavior of batch mode. So I'm not 100% sure I think these two notions should be tied together.

Pavel's example shows why these two choices might be better separated. If you are using __builtin_trap to implement a stop when in the debugger of some sort, it seems likely you would want to suppress the signal, but it also seems likely that when you are running in batch mode, you would want the batch mode run to return control to the user at this point. batch mode is really so you can run some scenario in a loop, looking for something odd to happen. These debug breaks are probably a signal of that "something odd"...

To Pavel: this problem is not only about the behavior of process attach, but for process launch too (and possibly something else). That's why I'm trying to move this logic out of specific commands.

Now I see that there are too many different usages of SIGTRAP and most of them are architecture-dependent (like __builtin_trap which can send SIGTRAP for one processor and SIGILL for another). So, a process cannot know how to distinguish the signal sender's intention. The problem is common to gdb.

tatyana-krasnukha retitled this revision from Use UnixSignals::ShouldSuppress to filter out signals a process expects to Don't stop execution in batch mode when process stops with SIGINT or SIGSTOP.
tatyana-krasnukha edited the summary of this revision. (Show Details)

Removed using UnixSignals::ShouldSuppress, just check if a signal is either SIGINT or SIGSTOP.

Fixed header of the test.

tatyana-krasnukha marked 2 inline comments as done.Sep 25 2019, 6:42 AM
jingham accepted this revision.Sep 25 2019, 3:26 PM

LGTM. Thanks for sticking with this.

This revision is now accepted and ready to land.Sep 25 2019, 3:26 PM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptSep 26 2019, 3:56 AM