This is an archive of the discontinued LLVM Phabricator instance.

[lldb] Skip signal handlers for ignored signals on lldb-server's side when stepping
Needs RevisionPublic

Authored by kpdev42 on Feb 20 2023, 6:11 AM.

Details

Summary

Before this patch, stepping off a breakpoint in lldb might end up inside a signal handler if a signal was received, which would result in continuing and hitting the same breakpoint again, while on the surface no instruction was executed and the user is not interested in that specific signal or its handling. This patch uses the machinery set up for software single-stepping to circumvent this behavior. This changes what a eStateStepping in lldb-server does to a thread on linux platforms: if a single-step is requested right after an ignored (pass=true, stop=false, notify=false) signal is received by the inferior, then a breakpoint is installed on the current pc, and the thread is continued (not single-stepped). When that breakpoint hits (presumably after the user signal handler returns or immediately after the continue, if the handler is not installed), it is removed, the stop is ignored and the thread is continued with single-stepping again. If a stop occurs inside the handler, then the stepping ends there, and the breakpoint is removed as well.

Diff Detail

Event Timeline

kpdev42 created this revision.Feb 20 2023, 6:11 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 20 2023, 6:11 AM
kpdev42 requested review of this revision.Feb 20 2023, 6:11 AM
  1. There is a situation where the patch might go wrong. When stepping while inside:
    • a signal handler when SA_NODEFER flag is set (the same signal can be received while it is still being handled), and the signal is received again;
    • a handler that is set on multiple signals and they are not masked, and these signals are received one after another;
    • a handler that happens to call the current function

recursion happens and stopping at this specifically set up breakpoint on pc is not valid. With this patch this would result in unexpected stops, though this situation is very specific. Would it make sense to complicate this logic and store current sp or fp values from NativeRegisterContext and compare if they are the same?

  1. The basic behavior is pretty easy to test by sending a lot of signals, but testing specific cases like setting the temporary breakpoint on an address where there is already another breakpoint is tough. One test was added for this, but it does not actually trigger the situation. This very much depends on when the signal is going to be received by the process, and it can't really be controlled.
  1. This slightly changes 1 line of code in NativeProcessFreeBSD, but I don't have the means to test if it even compiles, though the logic should be equivalent.
  1. When I looked at NativeProcessFreeBSD, it seems that software single stepping isn't even being set up there, just checked for when the process stops. As I can see from history, software single stepping with similar code was in the "legacy FreeBSD plugin" which was removed here: https://reviews.llvm.org/D96555 , and then single stepping with shared code was added to the new plugin here: https://reviews.llvm.org/D95802 . Because I couldn't find where the SetupSoftwareSingleStepping would be called for FreeBSD I didn't change that, but this probably should be addressed?
labath requested changes to this revision.Feb 20 2023, 7:33 AM
labath added a reviewer: jingham.
labath added subscribers: jingham, labath.

I'm afraid you're fixing this at the wrong end. This kind of complex thread control does not belong inside the debug stub.

IIUC, the problematic sequence of events is:

  1. lldb sends a vCont:s packet
  2. lldb-server resumes (PTRACE_SINGLESTEP)
  3. process immediatelly stops again (without stepping) due to a pending signal
  4. lldb-server decides to inject the signal and resumes (steps) again
  5. process stops inside the signal handler
  6. confusion ensues

If that's the case, then I think the bug is at step 4, specifically in this code:

// Check if debugger should stop at this signal or just ignore it and resume
// the inferior.
if (m_signals_to_ignore.contains(signo)) {
   ResumeThread(thread, thread.GetState(), signo);
   return;
}

I believe we should not be attempting to inject the signal if the thread was stepping. I think we should change this to report the signal to the client and let it figure out what to do. I.e., change this code into something like:

if (m_signals_to_ignore.contains(signo) && thread.GetState() == eStateRunning) {
   ResumeThread(thread, eStateRunning, signo);
   return;
}
// else report the signal as usual

If that is not enough to fix the bug, then we can figure out what needs to be done on the client. @jingham might have something to say about that.

This revision now requires changes to proceed.Feb 20 2023, 7:33 AM

I'm afraid you're fixing this at the wrong end. This kind of complex thread control does not belong inside the debug stub.

IIUC, the problematic sequence of events is:

  1. lldb sends a vCont:s packet
  2. lldb-server resumes (PTRACE_SINGLESTEP)
  3. process immediatelly stops again (without stepping) due to a pending signal
  4. lldb-server decides to inject the signal and resumes (steps) again
  5. process stops inside the signal handler
  6. confusion ensues

If that's the case, then I think the bug is at step 4, specifically in this code:

// Check if debugger should stop at this signal or just ignore it and resume
// the inferior.
if (m_signals_to_ignore.contains(signo)) {
   ResumeThread(thread, thread.GetState(), signo);
   return;
}

I believe we should not be attempting to inject the signal if the thread was stepping. I think we should change this to report the signal to the client and let it figure out what to do. I.e., change this code into something like:

if (m_signals_to_ignore.contains(signo) && thread.GetState() == eStateRunning) {
   ResumeThread(thread, eStateRunning, signo);
   return;
}
// else report the signal as usual

If that is not enough to fix the bug, then we can figure out what needs to be done on the client. @jingham might have something to say about that.

The full problematic sequence is this:

  • current pc is at a breakpoint, user hits continue (or step, etc.)
  • ThreadPlanStepOverBreakpoint is pushed
  • the plan wants eStateStepping, so lldb sens a vCont;s packet to the lldb-server
  • lldb-server resumes with PTRACE_SINGLESTEP
  • process stops due to a (pass=true, stop=false) signal

Then there are two possibilities, depending on signal filtering settings:

  1. The signal is ignored (notify=false)
    • lldb-server injects the signal back to the process and resumes (steps again)
  2. The signal is not ignored (notify=true)
    • lldb-server sends the stop reason with signal to the lldb client
    • lldb does not care, because it is configured to not stop, so wants to step again, sends the packet

After 1. and 2., the events are the same:

  • process stops due to stepping into the signal handler, lldb client sees a successful step
  • StepOverBreakpoint plan sees a pc != breakpoint address, thinks its job is done, pops off successfuly with an autocontinue
  • process resumes, gets out of the handler right back to the breakpoint address
  • the breakpoint hits, so the user sees a second breakpoint hit, but the instruction under that address was still not executed

Technically, this is correct, because the pc was at a breakpoint, the process did execute some instructions and went back to the breakpoint, and the program state could have changed. But this is very annoying when a lot of signals are received in the background (and the signal is not interesting to the user, as it, for example, comes from a low level library, the same reason real-time signals are stop=false notify=false by default right now: https://reviews.llvm.org/D12795)

So it does not depend on the signal filtering (it can be configured anyway), but the problem I would think is that client does not handle the situation with signals while stepping (at least stepping off a breakpoint) properly, and lldb-server does not help.

Gdb does this for example:

GDB optimizes for stepping the mainline code. If a signal that has handle nostop and handle pass set arrives while a stepping command (e.g., stepi, step, next) is in progress, GDB lets the signal handler run and then resumes stepping the mainline code once the signal handler returns. In other words, GDB steps over the signal handler. This prevents signals that you’ve specified as not interesting (with handle nostop) from changing the focus of debugging unexpectedly.

(https://sourceware.org/gdb/onlinedocs/gdb/Signals.html), which seems reasonable.

If this logic does not belong in lldb-server, then it could be fixed right inside the StepOverBreakpoint plan, by enabling the breakpoint being stepped over when a signal is received, waiting for it to hit when the potential handler has returned, and then trying to step off again. But then doing a step-into from a line with a breakpoint will step over the signal handler, and from a line without a breakpoint will stop inside the handler, if a signal is received. Then probably creating a new ThreadPlan instead with the same logic, executing on top of the plan stack, is the way to go?

Anyway, in my attempts at fixing it on the client side, additionally, for some reason the following situation is often triggered:

BreakpointSiteSP bp_site_sp =
    GetProcess()->GetBreakpointSiteList().FindByAddress(thread_pc);
if (bp_site_sp) {
...
    ThreadPlanSP step_bp_plan_sp(new ThreadPlanStepOverBreakpoint(*this));
    ...
        QueueThreadPlan(step_bp_plan_sp, false);

pushes a step over breakpoint plan, and the breakpoint is skipped. This should not happen, but I couldn't reproduce this without the patch.

Gdb does this for example:

GDB optimizes for stepping the mainline code. If a signal that has handle nostop and handle pass set arrives while a stepping command (e.g., stepi, step, next) is in progress, GDB lets the signal handler run and then resumes stepping the mainline code once the signal handler returns. In other words, GDB steps over the signal handler. This prevents signals that you’ve specified as not interesting (with handle nostop) from changing the focus of debugging unexpectedly.

(https://sourceware.org/gdb/onlinedocs/gdb/Signals.html), which seems reasonable.

I agree that this is the most useful behavior in most normal situations.

If this logic does not belong in lldb-server,

Yes, I believe this is too complex to be done inside lldb-server. I might reconsider if you can show that the gdb logic you described above is also implemented on the gdbserver side, but I'd rather not.. :)

then it could be fixed right inside the StepOverBreakpoint plan, by enabling the breakpoint being stepped over when a signal is received, waiting for it to hit when the potential handler has returned, and then trying to step off again. But then doing a step-into from a line with a breakpoint will step over the signal handler, and from a line without a breakpoint will stop inside the handler, if a signal is received. Then probably creating a new ThreadPlan instead with the same logic, executing on top of the plan stack, is the way to go?

Sounds approximately right. The way I'd probably do is it to have the StepOverBreakpoint step into the signal handler, and then push a StepOut (which hopefully knows how to step out of a signal handler -- if not it may need some tuning, or creating a specialized StepOutOfSignal plan). After the step out is done, it can retry the step over operation.

@jingham is our thread plan guru, though, so he may have a better idea...

@jingham could you please take a look?

lldb/source/Plugins/Process/Utility/NativeProcessContinueUntilBreakpoint.cpp