This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
lldb/
-
source/Interpreter/
-
Interpreter/
1/2
CommandInterpreter.cpp
-
test/Shell/Driver/
-
Shell/
-
Driver/
3/4
CommandOnCrashMultiThreaded.test
-
Inputs/
2/3
CommandOnCrashMultiThreaded.cpp

Differential D134037

[lldb] Fix CommandInterpreter::DidProcessStopAbnormally() with multiple threads
ClosedPublic

Authored by mstorsjo on Sep 16 2022, 6:23 AM.

Download Raw Diff

Details

Reviewers

labath
DavidSpickett
jingham

Commits

rG8a3597d73c8f: [lldb] Fix CommandInterpreter::DidProcessStopAbnormally() with multiple threads

Summary

If a process has multiple threads, the thread with the stop
info might not be the first one in the thread list.

On Windows, under certain circumstances, processes seem to have
one or more extra threads that haven't been launched by the
executable itself, waiting in NtWaitForWorkViaWorkerFactory. If the
main (stopped) thread isn't the first one in the list (the order
seems nondeterministic), DidProcessStopAbnormally() would return
false prematurely, instead of inspecting later threads.

The main observable effect of DidProcessStopAbnormally() erroneously
returning false, is when running lldb with multiple "-o" parameters
to specify multiple commands to execute on the command line.

After an abnormal stop, lldb would stop executing "-o" parameters
and execute "-k" parameters instead - but due to this issue, it
would instead keep executing "-o" parameters as if there was no
abnormal stop. (If multiple parameters are specified via a script
file via the "-s" option, all of the commands in that file are
executed regardless of whether there's an abnormal stop inbetween.)

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

mstorsjo created this revision.Sep 16 2022, 6:23 AM

Herald added a project: Restricted Project. · View Herald TranscriptSep 16 2022, 6:23 AM

mstorsjo requested review of this revision.Sep 16 2022, 6:23 AM

Herald added a project: Restricted Project. · View Herald TranscriptSep 16 2022, 6:23 AM

Harbormaster completed remote builds in B187122: Diff 460722.Sep 16 2022, 6:26 AM

Herald added a subscriber: JDevlieghere. · View Herald TranscriptSep 16 2022, 6:26 AM

We never made any guarantee about the order threads would be listed in the array returned by GetThreadList. I'm not sure it would be worth trying to do that, because often you don't know whether a thread's public StopInfo will be valid even if the private stop info is. For instance, if a thread hits a thread specific breakpoint for a thread that isn't in the breakpoint's thread specifier, then we reset the stop info back to an empty one because formally that thread didn't hit our breakpoint... So this change is clearly right.

I don't suppose there's a way to write a test for this?

In D134037#3796251, @jingham wrote:

We never made any guarantee about the order threads would be listed in the array returned by GetThreadList. I'm not sure it would be worth trying to do that, because often you don't know whether a thread's public StopInfo will be valid even if the private stop info is. For instance, if a thread hits a thread specific breakpoint for a thread that isn't in the breakpoint's thread specifier, then we reset the stop info back to an empty one because formally that thread didn't hit our breakpoint... So this change is clearly right.

I don't suppose there's a way to write a test for this?

Actually, yes, it is testable fairly easily - by firing up a test example with 2 threads and hitting a crash or breakpoint in either of them, and running it with a -k option. Before this fix, the test would still pass maybe half of the time, but now it should pass reliably.

Added a testcase. I'm not sure if this is the most accurate/correct place for the test, but the name is at least verbose enough to describe what it exercises.

Harbormaster completed remote builds in B187234: Diff 460880.Sep 16 2022, 1:34 PM

mstorsjo added a reviewer: jingham.Sep 19 2022, 9:59 AM

Ping

DavidSpickett added inline comments.Sep 23 2022, 5:20 AM

lldb/source/Interpreter/CommandInterpreter.cpp
2473–2474	Please add a comment explaining why we walk all the threads. (though in hindsight it does seem the obvious thing to do)
lldb/test/Shell/Driver/CommandOnCrashMultiThreaded.test
2	Isn't this fixing an issue that you saw on Windows as well?
lldb/test/Shell/Driver/Inputs/CommandOnCrashMultiThreaded.cpp
6	This is an x86 breakpoint, right? Shame there is no `__builtin_break` (well, msvc has one apparently but no help for this). I might add the AArch64 and Arm equivalent if I have some spare time.

mstorsjo added inline comments.Sep 23 2022, 5:35 AM

lldb/source/Interpreter/CommandInterpreter.cpp
2473–2474	Sure, can do
lldb/test/Shell/Driver/CommandOnCrashMultiThreaded.test
2	Yes, that was the place where I observed the issue (as Windows apparently, under some circumstances, can have a couple of extra threads running that haven't been spawned by the code of the process itself), but it's easily reproducible on any platform as long as you make sure there's more than one thread running. I copied the basis for the test from `Shell/Register/x86-multithreaded-read.test` - most of the tests there under `Shell/Register` seem to have such an XFAIL, not sure exactly why. (In this particular test, at least the `-pthread` linker flag might be a trivial platform specific difference that might break it, which would need to be abstracted somewhere.)
lldb/test/Shell/Driver/Inputs/CommandOnCrashMultiThreaded.cpp
6	Yes, afaik this is a regular x86 programmatic breakpoint. About the portability of breakpoints on ARM btw; the exact `udf #xx` code for a programmatic breakpoint is OS dependent, and does differ between Linux and Windows at least - see e.g. https://github.com/mingw-w64/mingw-w64/commit/38c9f0318c6509110706afdc52adc12d8cc34ead.

Add a code comment about the code that is changed in the patch.

Harbormaster completed remote builds in B188382: Diff 462454.Sep 23 2022, 5:37 AM

LGTM

lldb/test/Shell/Driver/CommandOnCrashMultiThreaded.test
2	Yeah I didn't notice the pthread flag, that could be it. Makes sense to me.
lldb/test/Shell/Driver/Inputs/CommandOnCrashMultiThreaded.cpp
6	Good point, definitely would have tripped over that. I'll double check arm64 as well.

This revision is now accepted and ready to land.Sep 23 2022, 5:45 AM

mstorsjo added inline comments.Sep 26 2022, 1:01 AM

lldb/test/Shell/Driver/CommandOnCrashMultiThreaded.test
2	Actually, it turns out that this test does pass on Windows (I hadn't tested before, I had just inherited the XFAIL from the test I based it on) - as `-pthread` isn't a linker flag but a compiler driver flag, the msvc target in clang just does nothing about it. So I'll remove the XFAIL here before pushing it.

Closed by commit rG8a3597d73c8f: [lldb] Fix CommandInterpreter::DidProcessStopAbnormally() with multiple threads (authored by mstorsjo). · Explain WhySep 26 2022, 1:07 AM

This revision was automatically updated to reflect the committed changes.

mstorsjo added a commit: rG8a3597d73c8f: [lldb] Fix CommandInterpreter::DidProcessStopAbnormally() with multiple threads.

I might add the AArch64 and Arm equivalent if I have some spare time.

I looked into this and there's a detail that makes this work on x86 only. So I'll leave it as is, it's getting tested somewhere, and that's what matters.

If you're curious this is the spanner in the works:

size_t NativeProcessProtocol::GetSoftwareBreakpointPCOffset() {
  switch (GetArchitecture().GetMachine()) {
  case llvm::Triple::x86:
  case llvm::Triple::x86_64:
  case llvm::Triple::systemz:
    // These architectures report increment the PC after breakpoint is hit.
    return cantFail(GetSoftwareBreakpointTrapOpcode(0)).size();

Meaning AArch64 doesn't. lldb on x86 can continue past a break in the program because of this:

(lldb) run
Process 31032 launched: '/tmp/test.o' (x86_64)
Process 31032 stopped
* thread #1, name = 'test.o', stop reason = signal SIGTRAP
    frame #0: 0x000055555555460f test.o`main at test.c:3
   1    int main() {
   2            __asm__ __volatile__("int3");
-> 3                  return 0;
   4    }
(lldb) c
Process 31032 resuming
Process 31032 exited with status = 0 (0x00000000)

On AArch64 we just stick on the break:

(lldb) run
Process 2162869 launched: '/tmp/test.o' (aarch64)
Process 2162869 stopped
* thread #1, name = 'test.o', stop reason = signal SIGTRAP
    frame #0: 0x0000aaaaaaaaa71c test.o`main at test.c:2:3
   1    int main() {
-> 2      asm volatile("brk #0xf000\n\t");
   3      return 0;
   4    }
(lldb) c
Process 2162869 resuming
Process 2162869 stopped
* thread #1, name = 'test.o', stop reason = signal SIGTRAP
    frame #0: 0x0000aaaaaaaaa71c test.o`main at test.c:2:3
   1    int main() {
-> 2      asm volatile("brk #0xf000\n\t");
   3      return 0;
   4    }

gdb has the same behaviour on AArch64 so I'll leave it as is until someone complains.

In D134037#3815240, @DavidSpickett wrote:

I might add the AArch64 and Arm equivalent if I have some spare time.

I looked into this and there's a detail that makes this work on x86 only. So I'll leave it as is, it's getting tested somewhere, and that's what matters.

If you're curious this is the spanner in the works:

size_t NativeProcessProtocol::GetSoftwareBreakpointPCOffset() {
  switch (GetArchitecture().GetMachine()) {
  case llvm::Triple::x86:
  case llvm::Triple::x86_64:
  case llvm::Triple::systemz:
    // These architectures report increment the PC after breakpoint is hit.
    return cantFail(GetSoftwareBreakpointTrapOpcode(0)).size();

Meaning AArch64 doesn't. lldb on x86 can continue past a break in the program because of this:

(lldb) run
Process 31032 launched: '/tmp/test.o' (x86_64)
Process 31032 stopped
* thread #1, name = 'test.o', stop reason = signal SIGTRAP
    frame #0: 0x000055555555460f test.o`main at test.c:3
   1    int main() {
   2            __asm__ __volatile__("int3");
-> 3                  return 0;
   4    }
(lldb) c
Process 31032 resuming
Process 31032 exited with status = 0 (0x00000000)

On AArch64 we just stick on the break:

(lldb) run
Process 2162869 launched: '/tmp/test.o' (aarch64)
Process 2162869 stopped
* thread #1, name = 'test.o', stop reason = signal SIGTRAP
    frame #0: 0x0000aaaaaaaaa71c test.o`main at test.c:2:3
   1    int main() {
-> 2      asm volatile("brk #0xf000\n\t");
   3      return 0;
   4    }
(lldb) c
Process 2162869 resuming
Process 2162869 stopped
* thread #1, name = 'test.o', stop reason = signal SIGTRAP
    frame #0: 0x0000aaaaaaaaa71c test.o`main at test.c:2:3
   1    int main() {
-> 2      asm volatile("brk #0xf000\n\t");
   3      return 0;
   4    }

gdb has the same behaviour on AArch64 so I'll leave it as is until someone complains.

Ah, yes, I remember hitting that issue occasionally. On the other hand, if it would be somewhat doable to fix it, it'd would be a great value add IMO - I've had to do acrobatics to set the program counter to the next instruction to resume running after such a breakpoint a couple times...

Turns out you're not the only one https://github.com/llvm/llvm-project/issues/56268.

Revision Contents

Path

Size

lldb/

source/

Interpreter/

CommandInterpreter.cpp

8 lines

test/

Shell/

Driver/

CommandOnCrashMultiThreaded.test

5 lines

Inputs/

CommandOnCrashMultiThreaded.cpp

13 lines

Diff 462836

lldb/source/Interpreter/CommandInterpreter.cpp

Show First 20 Lines • Show All 2,464 Lines • ▼ Show 20 Lines	bool CommandInterpreter::DidProcessStopAbnormally() const {
ProcessSP process_sp(target_sp->GetProcessSP());		ProcessSP process_sp(target_sp->GetProcessSP());
if (!process_sp)		if (!process_sp)
return false;		return false;

if (eStateStopped != process_sp->GetState())		if (eStateStopped != process_sp->GetState())
return false;		return false;

for (const auto &thread_sp : process_sp->GetThreadList().Threads()) {		for (const auto &thread_sp : process_sp->GetThreadList().Threads()) {
StopInfoSP stop_info = thread_sp->GetStopInfo();		StopInfoSP stop_info = thread_sp->GetStopInfo();
if (!stop_info)		if (!stop_info) {
		DavidSpickettUnsubmitted Not Done Reply Inline Actions Please add a comment explaining why we walk all the threads. (though in hindsight it does seem the obvious thing to do) DavidSpickett: Please add a comment explaining why we walk all the threads. (though in hindsight it does seem…
		mstorsjoAuthorUnsubmitted Done Reply Inline Actions Sure, can do mstorsjo: Sure, can do
return false;		// If there's no stop_info, keep iterating through the other threads;
		// it's enough that any thread has got a stop_info that indicates
		// an abnormal stop, to consider the process to be stopped abnormally.
		continue;
		}

const StopReason reason = stop_info->GetStopReason();		const StopReason reason = stop_info->GetStopReason();
if (reason == eStopReasonException \|\|		if (reason == eStopReasonException \|\|
reason == eStopReasonInstrumentation \|\|		reason == eStopReasonInstrumentation \|\|
reason == eStopReasonProcessorTrace)		reason == eStopReasonProcessorTrace)
return true;		return true;

if (reason == eStopReasonSignal) {		if (reason == eStopReasonSignal) {
▲ Show 20 Lines • Show All 1,022 Lines • Show Last 20 Lines

lldb/test/Shell/Driver/CommandOnCrashMultiThreaded.test

This file was added.

				# REQUIRES: native && (target-x86 \|\| target-x86_64)
				# RUN: %clangxx_host %p/Inputs/CommandOnCrashMultiThreaded.cpp -o %t -pthread
				DavidSpickettUnsubmitted Not Done Reply Inline Actions Isn't this fixing an issue that you saw on Windows as well? DavidSpickett: Isn't this fixing an issue that you saw on Windows as well?
				mstorsjoAuthorUnsubmitted Done Reply Inline Actions Yes, that was the place where I observed the issue (as Windows apparently, under some circumstances, can have a couple of extra threads running that haven't been spawned by the code of the process itself), but it's easily reproducible on any platform as long as you make sure there's more than one thread running. I copied the basis for the test from `Shell/Register/x86-multithreaded-read.test` - most of the tests there under `Shell/Register` seem to have such an XFAIL, not sure exactly why. (In this particular test, at least the `-pthread` linker flag might be a trivial platform specific difference that might break it, which would need to be abstracted somewhere.) mstorsjo: Yes, that was the place where I observed the issue (as Windows apparently, under some…
				DavidSpickettUnsubmitted Done Reply Inline Actions Yeah I didn't notice the pthread flag, that could be it. Makes sense to me. DavidSpickett: Yeah I didn't notice the pthread flag, that could be it. Makes sense to me.
				mstorsjoAuthorUnsubmitted Done Reply Inline Actions Actually, it turns out that this test does pass on Windows (I hadn't tested before, I had just inherited the XFAIL from the test I based it on) - as `-pthread` isn't a linker flag but a compiler driver flag, the msvc target in clang just does nothing about it. So I'll remove the XFAIL here before pushing it. mstorsjo: Actually, it turns out that this test does pass on Windows (I hadn't tested before, I had just…
				# RUN: %lldb -b -o "process launch" -k "process continue" -k "exit" %t \| FileCheck %s

				# CHECK: Process {{[0-9]+}} exited with status = 0

lldb/test/Shell/Driver/Inputs/CommandOnCrashMultiThreaded.cpp

This file was added.

				#include <thread>

				void t_func() {
				asm volatile(
				"int3\n\t"
				);
				DavidSpickettUnsubmitted Not Done Reply Inline Actions This is an x86 breakpoint, right? Shame there is no `__builtin_break` (well, msvc has one apparently but no help for this). I might add the AArch64 and Arm equivalent if I have some spare time. DavidSpickett: This is an x86 breakpoint, right? Shame there is no `__builtin_break` (well, msvc has one…
				mstorsjoAuthorUnsubmitted Done Reply Inline Actions Yes, afaik this is a regular x86 programmatic breakpoint. About the portability of breakpoints on ARM btw; the exact `udf #xx` code for a programmatic breakpoint is OS dependent, and does differ between Linux and Windows at least - see e.g. https://github.com/mingw-w64/mingw-w64/commit/38c9f0318c6509110706afdc52adc12d8cc34ead. mstorsjo: Yes, afaik this is a regular x86 programmatic breakpoint. About the portability of breakpoints…
				DavidSpickettUnsubmitted Done Reply Inline Actions Good point, definitely would have tripped over that. I'll double check arm64 as well. DavidSpickett: Good point, definitely would have tripped over that. I'll double check arm64 as well.
				}

				int main() {
				std::thread t(t_func);
				t.join();
				return 0;
				}