This is an archive of the discontinued LLVM Phabricator instance.

Add automatic Windows Minidump support for tools crashes
AbandonedPublic

Authored by gbedwell on Oct 23 2015, 11:03 AM.

Details

Summary

Hi,

This patch adds the ability for LLVM-based tools on Windows to automatically generate MiniDump .dmp files in the case of tool crashes which can then be used for post-mortem debugging within Visual Studio. The default Windows behaviour in applications is typically to allow the creation of a .dmp file on a crash by popping up a modal dialog box. As this is hardly ideal for a command line tool that's likely to be running unattended or even on a headless remote machine we've had this disabled since r227470. With this patch a .dmp file is now generated automatically when LLVMUnhandledExceptionFilter is executed in the event of a crash, just prior to the existing code for printing a stacktrace.

We've found this very useful in the past to give us clues in the case of intermittent crashes that can't be replicated locally in a debugging environment (or at all) or cases where our users may be unable or unwilling to provide us with a repro example. Obviously for this to be most useful it needs to be used with a build configuration such as Debug or RelWithDebInfo, but the latter works quite nicely for shipping MSVC builds as the debugging information gets stored in PDB files which don't need to be shipped, but just archived for such a time that you may need to analyze a .dmp file.

The core of this patch is the call to MiniDumpWriteDump ( https://msdn.microsoft.com/en-us/library/windows/desktop/ms680360(v=vs.85).aspx ) inside new function WriteWindowsDumpFile. I've also implemented support for reading registry keys from "SOFTWARE\Microsoft\Windows\Windows Error Reporting\LocalDumps" ( https://msdn.microsoft.com/en-us/library/windows/desktop/bb787181(v=vs.85).aspx ) to allow custom MiniDump options and location to be specified (for example, in cases where a reporting tool may be automatically monitoring a specific location for any crash dumps).

Now, for example, if I force a clang crash using a debug pragma (without having set up the above registry keys) I will now see a message like the following prior to the normal stack trace output:

Wrote crash dump file "C:\Users\GREG~1\AppData\Local\Temp\clang.exe-91d481.dmp"

There are a few things that I'm not entirely happy with that I'd like to get some opinions on, but hopefully the attached patch can serve as a starting point for discussion:

  • PreventCoreFiles(): Obviously there are plenty of cases where we want to disable automatic minidump creation. This is the purpose of the existing Process::PreventCoreFiles(). Unfortunately, the existing cases covered by that function deal with calling various OS APIs, whereas as for this approach it needs to control a later decision we make in the crash handler so we need to carry some information around about whether it's been called or not, hence the introduction of bool Process::CoreFilesPrevented. Is this reasonable, or is there a better way to convey that information to the point at which it is needed?
  • "LLVM_DISABLE_CRASH_REPORT": I'm assuming that we don't want crashing lit tests to start filling the disk up with crash dump files every time someone introduces a crash bug during normal development. I noticed that lit is setting this environment variable which is currently only used in some APPLE guarded code in Unix/Signals.inc with the comment "Environment variable to disable any kind of crash dialog". I can see that it used to also be checked in Windows/Signals.inc but was removed as we unconditionally prevent any crash dialogs now. Right now in this patch I've got the Windows implementation calling PreventCoreFiles() in the presence of the environment variable. Is my assumption that we don't want crash dumps for lit tests correct? If so, is this the way to do it?
  • Testing: Unfortunately I'm at a loss for an effective way to do any useful form of unit or regression testing for this type of change. Internally I think I can test this in our private tests by creating a mock environment with fake Windows registry keys, etc. and using clang's crash pragmas to ensure that files get written to the expected location (at least in the case of clang) but I can't think of an effective way that is going to be contributable to the community. What's the policy for tests in the deep OS-entangled code in the Support libs?

Many Thanks,

Greg Bedwell
SN Systems Ltd. - Sony Computer Entertainment Group

Diff Detail

Event Timeline

gbedwell updated this revision to Diff 38242.Oct 23 2015, 11:03 AM
gbedwell retitled this revision from to Add automatic Windows Minidump support for tools crashes.
gbedwell updated this object.
gbedwell added reviewers: rnk, aaron.ballman, Bigcheese.
gbedwell added a subscriber: llvm-commits.
rnk edited edge metadata.Oct 23 2015, 1:31 PM

For testing, what do you think is most worth testing here? I think testing the registry reading logic is probably the hardest to do since it requires making local modifications to the registry. If we give up on testing that, we could add some environment variable (LLVM_CRASH_DUMP_DIR) that bypasses all that logic and use it to test the integration from the crash handler through to creating an actual file on disk.

You can look at unittests/Support/ProgramTest.cpp for examples of other subprocess tests in LLVM. I think our crash handler might have bad interactions with gtest, though, since it uses SEH __try to attempt to report crashes as test failures. That'll probably block our minidumping. Alternatively, you could add something similar to clang's crash recovery tests. We have pragmas that explicitly force clang to do stuff like dereference null, and then you'd verify that we get a dump file.

lib/Support/Windows/Signals.inc
488

Are there issues with enabling this under mingw?

579

This can just be a StringRef instead of a SmallString. No need to copy it.

584

You never bother calling RegCloseKey on this HKEY. That's fine, this is a crash handler, we don't need to worry, but I would like to see a comment to that effect.

585

This Twine construction should be unnecessary, it should happen implicitly.

591

I think if you want operator+ to do the right thing here, you want to write this as:

FindRegistryKey(Twine(LocalDumpsRegistryLocation) + '\\' + ProgramName);
599

LLVM doesn't normally format one-line ifs this way.

aaron.ballman added inline comments.Nov 1 2015, 2:21 PM
include/llvm/Support/Process.h
47

Do we have to worry about multithreaded access to this? I'm not certain how we handle other situations where we're adding what are effectively globals. Perhaps documenting that this is not thread-safe is sufficient.

I would feel a bit more comfortable if this wasn't exposed as a public member, since it's set to true by PreventCoreFiles, but can also be set to true outside of that API.

lib/Support/Windows/Signals.inc
496

Why do we want to access a 64-bit key from a 32-bit app by default? That seems like surprising behavior for a generic-sounding helper function.

Since you always use the defaults anyway, perhaps it makes more sense to leave them out and hard-code them. Then you can name the function something like FindWERKey or something.

509

I'm concerned that this API will fail on systems with non-ASCII characters in the resulting path.

510

Formatting

536

Formatting

584

I would prefer we clean up properly -- don't we have an RAII helper class that does this for Win32 API types already?

616

Should be able to ditch the explicit creation of Twine.

620

This will have the same problem that rnk commented on above.

631

This needs to check for INVALID_HANDLE_VALUE, not nullptr.

634

Failing to close the handle in this case. I would recommend the RAII helper class (ScopedHandle, IIRC).

Thanks both for your feedback! I'll start tackling it all once my jetlag from developer meeting travel subsides :).

asl added a subscriber: asl.Mar 17 2016, 10:57 AM

Apologies for the (long) delay here. I'm looking at this again now and I'll post an update soon.

Hi,

Just a quick update. I'm continuing to work on this. So far I've addressed the majority of comments locally, but as has sadly been the continuing story of the last few months, other commitments cropping up are meaning that I've had much less time than I'd hoped to actually just sit down in front of an editor and work on LLVM. Currently I'm in the depths of path handling trying to reconcile the half of the existing code that seems to assume char, with the rest that deals with wchar_t paths. I'm going to be out of the office next week, so if there is a pressing need to move forward on this or D18216 I'm happy for that to happen in the meantime and I'll expand on that with a subsequent patch to fill in anything missing such as the Windows registry handling side here, otherwise I'll resume this as soon as I'm back at my desk!

Thanks
-Greg

Hi Greg,

Thank you for the update. There is no time pressure. However, if you don't mind, I can update D18216 with ideas from D14022 and then post it for review.

Hi guys,

I have updated D18216 patch with ideas from D14022. I would be highly appreciated if you have time to review it.

gbedwell abandoned this revision.Apr 18 2016, 8:47 AM

Superseded by D18216.