Download Raw Diff

Details

Reviewers

labath
clayborg
jfb

Commits

rG77460d3888c5: ProcessMinidump: Suppress reporting stop for signal '0'
rL375244: ProcessMinidump: Suppress reporting stop for signal '0'
rLLDB375244: ProcessMinidump: Suppress reporting stop for signal '0'

Summary

The minidump exception stream can report an exception record with
signal 0. If we try to create a stop reason with signal zero, processing
of the stop event won't find anything, and the debugger will hang.
So, simply early-out of RefreshStateAfterStop in this case.

Also set the UnixSignals object in DoLoadCore as is done for
ProcessElfCore.

Diff Detail

Repository

rG LLVM Github Monorepo

Build Status

Buildable 38685
Build 38684: arc lint + arc unit

Event Timeline

JosephTremoulet created this revision.Sep 26 2019, 12:09 PM

Herald added a project: Restricted Project. · View Herald TranscriptSep 26 2019, 12:09 PM

Herald added a subscriber: lldb-commits. · View Herald Transcript

Harbormaster completed remote builds in B38617: Diff 221998.Sep 26 2019, 12:10 PM

I'm trying to fix an issue where opening a minidump created by breakpad in lldb just hangs, but if I use breakpad's minidump-2-core on that same dump then opening the core dump works fine. From debugging the two cases I can see that the critical logic that ProcessElfCore has which ProcessMinidump lacks is here, and the code in this patch is my attempt to replicate that.

I'd love some help on how to create an appropriate test for this. From looking at existing tests, I gather it would make sense to add a test to lldb/lit/Minidump or packages/Python/lldbsuite/test/functionalities/postmortem/minidump-new (aside: what's the difference? is one preferred?) that has a yaml-ized minidump, the test would try to open the dump and make sure it didn't hang. I'm running into issues when I try to create that test, and I've put details in a gist. There's a program there which launches another program and then uses breakpad to capture a minidump of the child process (another aside: it looks like the existing tests all dump their own process, I think that dumping another process is the reason I'm running into this but others haven't). If I load that minidump without the fixes in this patch, lldb hangs. If I load that minidump with the fixes in this patch, lldb reads it and can show an accurate backtrace, I've put a log of that in the gist. But if I then run that minidump through obj-2-yaml and yaml-2-obj, if I load the resulting minidump, while it doesn't hang, it can't show an accurate backtrace. I've also put up a log of that.

I'd imagine my next step would be to look into what information is getting dropped when I try to do the round-trip through obj-2-yaml and yaml-2-obj, then update those utilities to preserve more, so that I can come back and add this test with a yaml input... does that sound right, or have I gone off track somewhere?

Thanks.

JosephTremoulet added reviewers: labath, clayborg.Sep 26 2019, 12:27 PM

The functionality is fine, though I'd hope it can be cleaned up a bit.

As for testing, yes, yaml2obj has some problems roundtripping complex minidumps (and elf files). Fortunately, for the functionality you're testing, I don't think you really need the executable file, so you can just ignore the elf bit and test with a plain lldb -c foo.dmp (obviously, you won't get the backtrace that way, but you don't really need that here. Round-tripping minidumps is also WIP -- I've only implemented the bits I have needed so far (I am adding MemoryInfoList stream as we speak). For some streams that just means, they're not getting pretty-printed, while for others (those that contain pointers to other parts of the file) it means the reconstituted file ends up containing garbage. The exception stream is one of those cases, but here it looks like you actually want to test the case where the exception stream is not present (?), so that might be fine. Or it might be fine if you manually remove the ExceptionStream from the yaml? If you can send me the minidump you have generated I can try to play around with it to see if I can make a reasonable yaml out of it.

As for the "lldb/lit/Minidump or packages/Python/lldbsuite/test/functionalities/postmortem/minidump-new" part, the main difference there is how the test is written. Which one is better depends on what you want to test and to some extent, also on personal preference. The python tests are better when you want to programatically drive lldb and have complex interactions with it. the lit tests are good for simply executing a bunch of known commands, and verifying static output. For this case, I think a lit test is definitely the easier route.

lldb/source/Plugins/Process/minidump/ProcessMinidump.cpp
218–236	Would it be possible to move this code (except maybe the SetUnixSignals bit) into the `RefreshStateAfterStop` function? Would be less confusing and it would avoid the need for extra member variables...

Fortunately, for the functionality you're testing, I don't think you really need the executable file, so you can just ignore the elf bit and test with a plain lldb -c foo.dmp (obviously, you won't get the backtrace that way, but you don't really need that here.

Ah, yeah, good point.

here it looks like you actually want to test the case where the exception stream is not present (?)

If I'm reading things correctly, the exception stream is there and we get a non-null exception record, which has a tid, but its code and flags are both zero.
This makes me realize that the title I've put here is very misleading, I'll fix that when I update the patch.

If you can send me the minidump you have generated I can try to play around with it to see if I can make a reasonable yaml out of it.

Thanks! Will send.

lldb/source/Plugins/Process/minidump/ProcessMinidump.cpp
218–236	Yeah, happy to. I wasn't sure what the implications of doing so would be, but I guess for a dump file there will only ever be one stop?

Move artificial SIGSTOP injection to RefreshStateAfterStop

Harbormaster completed remote builds in B38685: Diff 222218.Sep 27 2019, 11:55 AM

JosephTremoulet retitled this revision from Add Linux signal support to ProcessMinidump to ProcessMinidump: inject SIGSTOP on Linux if no thread has a signal.Sep 27 2019, 11:57 AM

JosephTremoulet marked 2 inline comments as done.Sep 27 2019, 12:00 PM

JosephTremoulet added inline comments.

lldb/source/Plugins/Process/minidump/ProcessMinidump.cpp
218–236	I've moved it. In doing so I left the early-out for `!m_active_exception`, which is a behavioral difference from my previous iteration, but this covers my use-case so I think better to keep the change focused like this.

Use accessor for m_unix_signals_sp

Harbormaster completed remote builds in B38687: Diff 222225.Sep 27 2019, 12:09 PM

Thanks for cleaning this up, and for sending the minidump. Looking at the dump, I am pretty sure that the problem is the lack of proper exception stream support in yaml2obj. The stream contains a reference to a thread context, and since yaml2obj does not understand this, it copies the offsets verbatim, and so they end up pointing to random garbage after the yaml round trip.

It doesn't look like it should be too hard to add yaml support for the exceptions stream -- it should only be a matter of adapting the patterns already used for other stream to work for this particular case. Could you give a go at that? Alternatively, you can wait a bit until I finish with the MemoryInfoList stream. After that, I can squeeze some time to do the Exception stream too.

lldb/source/Plugins/Process/minidump/ProcessMinidump.cpp
219	I think you can create this unconditionally.
258–265	After some investigation, I re-learned about the DumpRequested constant, which is already checked for in this code (and it is tested (TestMiniDumpNew.py:test_snapshow_minidump) and works/does not hang). It seems to me like it would make sense to treat this case the same way as DumpRequested, and just don't return any stop info (instead of returning a stop info with signal 0 or SIGSTOP). IOW, you would just put `if (signo==0) return;` here. Eventually we could apply the same change to process elf core as well.. WDYT?

Review feedback

Harbormaster completed remote builds in B38759: Diff 222444.Sep 30 2019, 8:51 AM

JosephTremoulet retitled this revision from ProcessMinidump: inject SIGSTOP on Linux if no thread has a signal to ProcessMinidump: Suppress reporting stop for signal '0'.Sep 30 2019, 8:55 AM

JosephTremoulet edited the summary of this revision. (Show Details)

In D68096#1687790, @labath wrote:

It doesn't look like it should be too hard to add yaml support for the exceptions stream -- it should only be a matter of adapting the patterns already used for other stream to work for this particular case. Could you give a go at that?

Yes, will do, thanks for the pointers.

lldb/source/Plugins/Process/minidump/ProcessMinidump.cpp
258–265	SGTM, and seems to work for my use case. I wasn't sure though whether to leave the `if (signo==0) return` under `if (arch.GetTripe().getOS() == llvm::Triple::Linux)`. Originally I'd put logic here just because I was copying logic from ProcessElfCore. It looks like the Apple case here would maybe similarly have an issue with exception code zero (?), but the `else` case (Windows?) doesn't reference the exception_code at all so I'd hesitate to apply this change there... I've left it under the check for Linux, but happy to move it if that seems wrong.

labath marked an inline comment as done.Sep 30 2019, 11:12 PM

labath added inline comments.

lldb/source/Plugins/Process/minidump/ProcessMinidump.cpp
258–265	Yeah, the exception code seems to be used completely differently on other OSs, so making this linux specific seems like the right choice.

Rebase
Add testcase

Herald added a reviewer: jfb. · View Herald TranscriptOct 10 2019, 2:03 PM

JosephTremoulet added a parent revision: D68658: LLDB: Use LLVM's type for minidump ExceptionStream [NFC].Oct 10 2019, 2:04 PM

Herald added a subscriber: dexonsmith. · View Herald TranscriptOct 10 2019, 2:04 PM

Harbormaster completed remote builds in B39363: Diff 224470.Oct 10 2019, 2:05 PM

We need to grab the address from the exception info and pass that along for linux.

lldb/source/Plugins/Process/minidump/ProcessMinidump.cpp
266	Are exceptions on linux always just a signal number? Mac can be a mach exception or a signal. Might be worth looking at the breakpad sources to see if they ever fill in anything else for linux other than a signal. Also, there is more info in the exception record that needs to be passed along. Like for a SIGSEGV, the address should be in the exception info somewhere.

This revision now requires changes to proceed.Oct 10 2019, 2:06 PM

We should add a test that has no exception info and verify we get a stopped process if we don't already have a test like this. LLDB doesn't like it when there is no stop reason and might try to resume the process and might not print out the stop and backtrace?

What lldb does not like the most is a stop info with a signal "0" -- that causes a resume. Having no stop info whatsoever seems to work just fine -- and this is already what happens when there is no exception record around. I was surprised by that distinction, but we already have tests with minidumps like those, and they seem to work ok, so I thought it'd make sense to do the same for the case where we have an exception record, but it contains no signal (as opposed to e.g. inventing a SIGSTOP out of thin air). We already have a test for that in this patch.

lldb/source/Plugins/Process/minidump/ProcessMinidump.cpp
266	Having the address (if it is there) would be nice, but I think that's a separate bug/feature, as this patch is about what happens when there is no exception record around.

Just to make sure I'm understanding the feedback correctly, I'll try to summarize. Please let me know if this is off track:

When one uses breakpad to generate a minidump for a process that *hasn't* crashed, depending which of its methods one calls to write the dump, the generated dump's exception stream's exception code is either the "dump requested" sentinel or zero.
We have a bug, which this patch is meant to fix, wherein loading one of those dumps with exception code zero creates a stop with signal zero, resulting in lldb hanging.
We have some "prior art" in ProcessElfCore, which is how we successfully load the core dump you get by running a zero-exception-code minidump through breakpad's minidump-2-core, which is to artificially report SIGSTOP. @labath, your feedback is that this is a hack that we shouldn't extend and should ideally/eventually remove from ProcessElfCore
We also have some "prior art" in ProcessMinidump, which is how we successfully load a minidump with the "dump requested" sentinel, which is to create no stop at all. That's what this patch currently does when it sees signal zero. @clayborg, your feedback is that simply checking that the error code is zero is insufficient grounds to suppress stop creation, that we should probably check other fields in the exception stream and create a stop if *any* of them are reporting anything non-null, and/or verify that the only way breakpad/crashpad will produce a linux minidump with null signal is if there's no exception whatsoever

I may need to set this aside for a while and come back to it to address @clayborg's concerns (if I've understood them correctly), but I can live with my downstream changes in the meantime, so that's seeming like the best path forward.

In D68096#1706076, @JosephTremoulet wrote:

Just to make sure I'm understanding the feedback correctly, I'll try to summarize. Please let me know if this is off track:

When one uses breakpad to generate a minidump for a process that *hasn't* crashed, depending which of its methods one calls to write the dump, the generated dump's exception stream's exception code is either the "dump requested" sentinel or zero.

I believe so.

We have a bug, which this patch is meant to fix, wherein loading one of those dumps with exception code zero creates a stop with signal zero, resulting in lldb hanging.

If this patch is solely meant to fix this issue, then I remove my "Requires Changes" and this patch is good to go if it no longer hangs LLDB.

We have some "prior art" in ProcessElfCore, which is how we successfully load the core dump you get by running a zero-exception-code minidump through breakpad's minidump-2-core, which is to artificially report SIGSTOP. @labath, your feedback is that this is a hack that we shouldn't extend and should ideally/eventually remove from ProcessElfCore

I agree on this as long as no hang happens!

We also have some "prior art" in ProcessMinidump, which is how we successfully load a minidump with the "dump requested" sentinel, which is to create no stop at all. That's what this patch currently does when it sees signal zero. @clayborg, your feedback is that simply checking that the error code is zero is insufficient grounds to suppress stop creation, that we should probably check other fields in the exception stream and create a stop if *any* of them are reporting anything non-null, and/or verify that the only way breakpad/crashpad will produce a linux minidump with null signal is if there's no exception whatsoever

As Pavel mentioned, I was asking for other fields in the exception info to be passed along for other signals, like SIGSEGV. After reading his comments, I agree this can be done in another patch if it isn't already being done.

I may need to set this aside for a while and come back to it to address @clayborg's concerns (if I've understood them correctly), but I can live with my downstream changes in the meantime, so that's seeming like the best path forward.

If this patch fixes the hang, then it is good to go.

This revision is now accepted and ready to land.Oct 11 2019, 10:59 AM

In D68096#1706258, @clayborg wrote:

If this patch is solely meant to fix this issue, then I remove my "Requires Changes" and this patch is good to go if it no longer hangs LLDB.

Great, thanks. And just to confirm on the testing:

test/functionalities/postmortem/minidump-new/linux-x86_64_not_crashed.cpp is an example program that uses breakpad to generate a dump that uses the sentinel signal, and we have a test checked in that uses a reduced yaml-ized copy of its minidump
https://gist.github.com/JosephTremoulet/2689c5ed21d7be5543cfcb70d548e0bc is an example program that uses breakpad to generate a dump that uses null signal, and in this patch I'm adding a test that uses a reduced yaml-ized copy of its minidump, which lldb hangs on loading without this fix and successfully loads with this fix

labath accepted this revision.Oct 14 2019, 12:38 AM

Closed by commit rG77460d3888c5: ProcessMinidump: Suppress reporting stop for signal '0' (authored by JosephTremoulet). · Explain WhyOct 18 2019, 8:01 AM

This revision was automatically updated to reflect the committed changes.

Diff 222218

lldb/source/Plugins/Process/minidump/ProcessMinidump.cpp

Show First 20 Lines • Show All 209 Lines • ▼ Show 20 Lines	default:
error.SetErrorStringWithFormat("unsupported minidump architecture: %s",		error.SetErrorStringWithFormat("unsupported minidump architecture: %s",
arch.GetArchitectureName());		arch.GetArchitectureName());
return error;		return error;
}		}
GetTarget().SetArchitecture(arch, true /set_platform/);		GetTarget().SetArchitecture(arch, true /set_platform/);

m_thread_list = m_minidump_parser->GetThreads();		m_thread_list = m_minidump_parser->GetThreads();
m_active_exception = m_minidump_parser->GetExceptionStream();		m_active_exception = m_minidump_parser->GetExceptionStream();

		if (arch.GetTriple().isOSLinux())
		labathUnsubmitted Done Reply Inline Actions I think you can create this unconditionally. labath: I think you can create this unconditionally.
		SetUnixSignals(UnixSignals::Create(GetArchitecture()));

ReadModuleList();		ReadModuleList();

llvm::Optional<lldb::pid_t> pid = m_minidump_parser->GetPid();		llvm::Optional<lldb::pid_t> pid = m_minidump_parser->GetPid();
if (!pid) {		if (!pid) {
error.SetErrorString("failed to parse PID");		error.SetErrorString("failed to parse PID");
return error;		return error;
}		}
SetID(pid.getValue());		SetID(pid.getValue());

return error;		return error;
}		}

ConstString ProcessMinidump::GetPluginName() { return GetPluginNameStatic(); }		ConstString ProcessMinidump::GetPluginName() { return GetPluginNameStatic(); }

uint32_t ProcessMinidump::GetPluginVersion() { return 1; }		uint32_t ProcessMinidump::GetPluginVersion() { return 1; }
		labathUnsubmitted Done Reply Inline Actions Would it be possible to move this code (except maybe the SetUnixSignals bit) into the `RefreshStateAfterStop` function? Would be less confusing and it would avoid the need for extra member variables... labath: Would it be possible to move this code (except maybe the SetUnixSignals bit) into the…
		JosephTremouletAuthorUnsubmitted Done Reply Inline Actions Yeah, happy to. I wasn't sure what the implications of doing so would be, but I guess for a dump file there will only ever be one stop? JosephTremoulet: Yeah, happy to. I wasn't sure what the implications of doing so would be, but I guess for a…
		JosephTremouletAuthorUnsubmitted Done Reply Inline Actions I've moved it. In doing so I left the early-out for `!m_active_exception`, which is a behavioral difference from my previous iteration, but this covers my use-case so I think better to keep the change focused like this. JosephTremoulet: I've moved it. In doing so I left the early-out for `!m_active_exception`, which is a…

Status ProcessMinidump::DoDestroy() { return Status(); }		Status ProcessMinidump::DoDestroy() { return Status(); }

void ProcessMinidump::RefreshStateAfterStop() {		void ProcessMinidump::RefreshStateAfterStop() {

if (!m_active_exception)		if (!m_active_exception)
return;		return;

if (m_active_exception->exception_record.exception_code ==		if (m_active_exception->exception_record.exception_code ==
MinidumpException::DumpRequested) {		MinidumpException::DumpRequested) {
return;		return;
}		}

lldb::StopInfoSP stop_info;		lldb::StopInfoSP stop_info;
lldb::ThreadSP stop_thread;		lldb::ThreadSP stop_thread;

Process::m_thread_list.SetSelectedThreadByID(m_active_exception->thread_id);		Process::m_thread_list.SetSelectedThreadByID(m_active_exception->thread_id);
stop_thread = Process::m_thread_list.GetSelectedThread();		stop_thread = Process::m_thread_list.GetSelectedThread();
ArchSpec arch = GetArchitecture();		ArchSpec arch = GetArchitecture();

if (arch.GetTriple().getOS() == llvm::Triple::Linux) {		if (arch.GetTriple().getOS() == llvm::Triple::Linux) {
		uint32_t signo = m_active_exception->exception_record.exception_code;

		if (signo == 0) {
		// Artifically inject a SIGSTOP so that we'll find a stop reason
		// when we process the stop event.
		signo = m_unix_signals_sp->GetSignalNumberFromName("SIGSTOP");
		}

		labathUnsubmitted Not Done Reply Inline Actions After some investigation, I re-learned about the DumpRequested constant, which is already checked for in this code (and it is tested (TestMiniDumpNew.py:test_snapshow_minidump) and works/does not hang). It seems to me like it would make sense to treat this case the same way as DumpRequested, and just don't return any stop info (instead of returning a stop info with signal 0 or SIGSTOP). IOW, you would just put `if (signo==0) return;` here. Eventually we could apply the same change to process elf core as well.. WDYT? labath: After some investigation, I re-learned about the DumpRequested constant, which is already…
		JosephTremouletAuthorUnsubmitted Done Reply Inline Actions SGTM, and seems to work for my use case. I wasn't sure though whether to leave the `if (signo==0) return` under `if (arch.GetTripe().getOS() == llvm::Triple::Linux)`. Originally I'd put logic here just because I was copying logic from ProcessElfCore. It looks like the Apple case here would maybe similarly have an issue with exception code zero (?), but the `else` case (Windows?) doesn't reference the exception_code at all so I'd hesitate to apply this change there... I've left it under the check for Linux, but happy to move it if that seems wrong. JosephTremoulet: SGTM, and seems to work for my use case. I wasn't sure though whether to leave the `if…
		labathUnsubmitted Done Reply Inline Actions Yeah, the exception code seems to be used completely differently on other OSs, so making this linux specific seems like the right choice. labath: Yeah, the exception code seems to be used completely differently on other OSs, so making this…
stop_info = StopInfo::CreateStopReasonWithSignal(		stop_info = StopInfo::CreateStopReasonWithSignal(
		clayborgUnsubmitted Not Done Reply Inline Actions Are exceptions on linux always just a signal number? Mac can be a mach exception or a signal. Might be worth looking at the breakpad sources to see if they ever fill in anything else for linux other than a signal. Also, there is more info in the exception record that needs to be passed along. Like for a SIGSEGV, the address should be in the exception info somewhere. clayborg: Are exceptions on linux always just a signal number? Mac can be a mach exception or a signal.
		labathUnsubmitted Not Done Reply Inline Actions Having the address (if it is there) would be nice, but I think that's a separate bug/feature, as this patch is about what happens when there is no exception record around. labath: Having the address (if it is there) would be nice, but I think that's a separate bug/feature…
*stop_thread, m_active_exception->exception_record.exception_code);		*stop_thread, signo);
} else if (arch.GetTriple().getVendor() == llvm::Triple::Apple) {		} else if (arch.GetTriple().getVendor() == llvm::Triple::Apple) {
stop_info = StopInfoMachException::CreateStopReasonWithMachException(		stop_info = StopInfoMachException::CreateStopReasonWithMachException(
*stop_thread, m_active_exception->exception_record.exception_code, 2,		*stop_thread, m_active_exception->exception_record.exception_code, 2,
m_active_exception->exception_record.exception_flags,		m_active_exception->exception_record.exception_flags,
m_active_exception->exception_record.exception_address, 0);		m_active_exception->exception_record.exception_address, 0);
} else {		} else {
std::string desc;		std::string desc;
llvm::raw_string_ostream desc_stream(desc);		llvm::raw_string_ostream desc_stream(desc);
▲ Show 20 Lines • Show All 560 Lines • Show Last 20 Lines

This is an archive of the discontinued LLVM Phabricator instance.

ProcessMinidump: Suppress reporting stop for signal '0'
ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 222218

lldb/source/Plugins/Process/minidump/ProcessMinidump.cpp

This is an archive of the discontinued LLVM Phabricator instance.

ProcessMinidump: Suppress reporting stop for signal '0'ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 222218

lldb/source/Plugins/Process/minidump/ProcessMinidump.cpp

ProcessMinidump: Suppress reporting stop for signal '0'
ClosedPublic