Page MenuHomePhabricator

[lldb/Target] Select most relevant frame only in case of signal
Needs ReviewPublic

Authored by PatriosTheGreat on May 27 2021, 11:06 AM.

Details

Summary

In order to select most relevant frame the debugger needs to unwind current frames for each thread which could be slow.
This is a problem in case of executing with remote debugger attached and conditional breakpoints set.
In that case the debugger will stop the execution on a breakpoint hit to run conditional expression and will resume the execution after.
If there are a lot of threads (90+) the simple "for" loop with 50 iterations and conditional breakpoint could take more than 2 minutes to execute.
From my observation most of this time will be spent on SelectMostRelevantFrame method (since it need to unwind stack for all threads).

Since the most relevant frame is needed only for assert failure it looks like we can collect it only for signals.

Diff Detail

Event Timeline

PatriosTheGreat requested review of this revision.May 27 2021, 11:06 AM
mib requested changes to this revision.May 27 2021, 11:43 AM

At first sight, this looks like a fair addition, but this feature is also used to report Swift Runtime Failure in the debugger (and Xcode) ...

However, a Swift Runtime Failure is triggered as an exception not a signal so your change would break the feature. May be we could add a method that looks up the current StopReason in a list of supported stop reasons before selecting the most relevant frame ?

This revision now requires changes to proceed.May 27 2021, 11:43 AM
jingham requested changes to this revision.May 27 2021, 11:54 AM
jingham added a subscriber: jingham.

It is incorrect to say that SelectMostRelevantFrame is only relevant for asserts. The FrameRecognizer is a generic mechanism, and you have no a priori way to know what kinds of thread stops it wants to operate on. So this patch is not correct.

Other than calling the Frame Recognizers, SelectMostRelevantFrame does NOT trigger a stack walk itself.

So if there is a frame recognizer that is doing a frame walk on threads that aren't relevant to it, then changing the frame recognizer to check that the stop reason of its thread makes sense before doing any work seems a better solution to your problem.

PatriosTheGreat requested review of this revision.May 31 2021, 3:37 AM

Thanks for feedback.

I made a small sample: https://pastebin.com/F8nde1zi
Compile command: clang++ -O0 -g -pthread sample.cpp
LLDB command: breakpoint set -f sample.cpp -l 24 --condition 'i == 100'
This break should not be hit due to the condition.

I got followed results for local and remote run:
LLDB 11:
Local:
Time difference = 10 330 [ms]
Remote:
Time difference = 189 707 [ms]

Patched (turned off select of the most relevant frame for threads which were stopped without a reason):
Local:
Time difference = 7 865 [ms]
Remote:
Time difference = 13 630 [ms]

Looks like there is a performance improvement even with local run, but with remote run it becomes much more significant.

It is incorrect to say that SelectMostRelevantFrame is only relevant for asserts. The FrameRecognizer is a generic mechanism, and you have no a priori way to know what kinds of thread stops it wants to operate on. So this patch is not correct.

Other than calling the Frame Recognizers, SelectMostRelevantFrame does NOT trigger a stack walk itself.

So if there is a frame recognizer that is doing a frame walk on threads that aren't relevant to it, then changing the frame recognizer to check that the stop reason of its thread makes sense before doing any work seems a better solution to your problem.

From performance analyzer I see that SelectMostRelevantFrame calls StackFrameList::GetFrameAtIndex -> StackFrameList::GetFramesUpTo -> UnwindLLDB::DoGetFrameInfoAtIndex -> UnwindLLDB::AddFirstFrame
UnwindLLDB::AddFirstFrame calls two expensive methods: UnwindLLDB::UpdateUnwindPlanForFirstFrameIfInvalid (takes around 2/3 of execution time) and UnwindLLDB::RegisterContextUnwind (takes around 1/3 of execution time)

Does frame recognition relevant for threads which were stopped without any reason?
May I filter this logic like this:

lldb::StopReason stop_reason = GetStopReason();
if (stop_reason != lldb::StopReason::eStopReasonNone)
  SelectMostRelevantFrame();

If this is also an incorrect solution may we somehow select most relevant frame without calling of UnwindLLDB::DoGetFrameInfoAtIndex?

Select most related frame only in threads which were stopped with reason.
This diff is for further discussion.

jingham added a comment.EditedJun 1 2021, 3:56 PM

One of the "frame recognizers" - probably the one that's causing this particular problem - is the "assert" recognizer. The idea is that when you call "assert" in user code, you pretty much always end up with a 4 or 5 frames deep stack that wanders it's way from assert to __pthread_kill. But you pretty much never care about any of those stack frames. The stack frame you really want to see is the caller of the assert. So the assert recognizer detects a thread in that situation and resets the selected frame to the assert caller.

The assert recognizer is a little special because the process is about to die, so what it does after this stop is not so crucial. But this is not a guarantee for recognizers, they are just a general mechanism to look for possibly useful representations of a stack frame.

So if you only applied recognizers on frames that stopped "with a reason", you would get into the situation where one thread stops because of a breakpoint, say, and gets a cooked presentation because it triggered a recognizer, but then you continue and another thread stops - say for another breakpoint. While you're stopped if you go back to look at the first thread you might see that, even though it had not made any progress, it's presentation was different. That's because it didn't have a stop reason and so didn't trigger the recognizer on this stop. That's a little weak even theoretically, but to a user, that's going to be quite confusing.

So, first off, in situations which are highly sensitive to doing general work on "the stacks of all threads", you probably just want to turn frame recognizers off. To do that, call frame recognizer clear. Then none of this work will be done.

But there are two other things to try if you want to keep these recognizers on.

The first is to see if the recognizers can be run lazily when the user prints a thread backtrace. If someone stops in the debugger because of a breakpoint, say, and never does a "thread list" and only backtraces the stopped thread, they should not pay the cost of running the recognizers on the other threads. It might take a little work to find/make a hook for "thread about to be presented to the user", but that would be a better place to put this computation, and would nicely avoid the problem you are seeing.

The other thing is that the way the assert frame recognizer is currently written it will always unwind 5 frames looking for the signature of an assert -> __pthread_kill (or whatever it is on other systems...). You could probably rewrite this to be more careful, and only continue unwinding if the first frame or two looked likely to lead to an assert. You're going to have to do this somewhat carefully, however, since you don't want to miss actual asserts.

Thanks for your suggestions.
I tried both of them, however the solution to patch assert recognizer doesn’t solve the original performance issue. Since the performance degradation doesn’t come from the recognizer. It appears since the LLDB needs to unwind at least one stack frame for each thread in order to determine does the LLDB even need to run recognizers (see this line: https://github.com/llvm/llvm-project/blob/main/lldb/source/Target/Thread.cpp#L587)

The solution to select the most relevant frame at the moment when the user actually needs a stack frame seems to fork fine. If I understand correctly it can be done at the Thread::GetStackFrameAtIndex method, since to provide stack trace view to the user all clients need to call it for all stack frames.

One problem I see in this solution is that recognizers should be aware to not call Thread::GetStackFrameAtIndex for the zeroth frame (which they actually got as a parameter), since otherwise it will bring to the infinity recursion. See changes to AssertFrameRecognizer.cpp

I’ve attached this solution to the current revision for further discussion.

jingham added a comment.EditedJun 8 2021, 11:42 AM

"Unwinding" the zero-th frame is really just getting the PC & FP. Since you are very likely to need this information on any given stop, lldb has accelerated versions of the gdb-remote stop reply packets that provide this information up front so this sort of request is cheap to fulfill. I know that debugserver & lldbserver both provide these accelerated stop-reply packets. So if you are using either of these stubs, we should figure out why asking for frame 0 is slowing you down.

If you are using a different stub and can either switch to using the lldb-server stub - or fixing your stub to provide the accelerated stop reply packets - your lldb sessions will be a lot more efficient...

I don't think it's right to do it in GetStackFrameAtIndex. For instance, suppose lldb stops somewhere, and the user does a backtrace, then they select a different frame on the stack, and then invoke some other command that happens to ask for the frame at index 0. If that subsequent GetStackFrameAtIndex(0) request triggers the recognizer to select a different frame, that would be surprising and not desirable.

I think you need to do this somewhere in the StackFrameList, somewhere where we start filling in frames (maybe GetFramesUpTo) but you are going to have to be careful here to avoid recursion as well.

Thanks for the review feedback.
Looks like moving frame recognizer call to StackFrameList::GetFramesUpTo also fixes infinity recursion issue.

Regarding the zeroth frame issue: I'm actually using the lldb-server compiled with followed cmake command:
cmake -G Ninja -DLLVM_ENABLE_PROJECTS='lldb;clang;libcxx' ../llvm

I see a performance improvement even on a local run, however it's not as huge:
In the sample I provided previously with this patch the execution time will be around 4 seconds instead of 6 seconds without it during the local run.

I see at the profiler that asking 0th frame triggers UnwindLLDB::DoGetFrameInfoAtIndex -> UnwindLLDB::AddFirstFrame -> UnwindLLDB::UpdateUnwindPlanForFirstFrameIfInvalid -> UnwindLLDB::AddOneMoreFrame -> UnwindLLDB::GetOneMoreFrame -> RegisterContextUnwind::RegisterContextUnwind -> RegisterContextUnwind::InitializeNonZerothFrame

  • which is costly.

If I understand correctly the LLDB won't do any of this for background threads by default.

Could it be that accelerated stop-reply packets you said previously is turned off by default in lldb-server and I need to set a special setting to turn them on?