Page MenuHomePhabricator

Add new hidden option -print-crash-IR that prints out IR that caused opt pipeline to crash
AcceptedPublic

Authored by jamieschmeiser on Aug 26 2020, 1:26 PM.

Details

Summary

A new hidden option -print-on-crash that prints the IR as it was upon entering
the last pass when there is a crash.

The IR is saved in its print form before each pass is started and a
signal handler is registered. If the compilation crashes, the signal
handler will print the saved IR to dbgs(). This option
can be modified using -print-module-scope to get the IR for the complete
module. Note that this option only works with the new pass manager.

Although it is not included in the patch, this was tested by adding a hidden
option to a pass that caused the pass to call __builtin_trap().

Diff Detail

Event Timeline

jamieschmeiser created this revision.Aug 26 2020, 1:26 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 26 2020, 1:26 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
jamieschmeiser requested review of this revision.Aug 26 2020, 1:26 PM

This option will be introduced (along with others) in my tutorial at the LLVM developer's conference in October.

I'm worried that this will affect memory usage.
Is --print-before-all not good enough?

--print-before-all will print the IR before every pass which can result in the IR being printed literally thousands of times while this option prints it only once. The IR is saved in a string which is cleared each time and re-used so the memory overhead is not great; also this overhead is only hit when the option is requested. It does slow the compilation when used but not as much as --print-before-all and the user does not have as difficult a task in extracting the IR from the output. If the option is false (the default), the overhead is only the checking of the option. Basically, --print-before-all is cumbersome for this purpose while this option is convenient. Also, one could reasonably request a user to retrieve the IR (if they are willing) with this option when reporting a crash while requesting a user to use --print-before-all would not be reasonable.

[NFC] Reorder includes and change case of variable as suggested by
clang-tidy/clang-format.

yrouban added inline comments.Aug 26 2020, 9:34 PM
llvm/lib/Passes/StandardInstrumentations.cpp
346–347

static_cast<PrintCrashIRInstrumentation *>(P)

1035–1040

do we need to remove this signal handler when this is destructed?

Ok I guess this makes sense over --print-before-all.

To bikeshed a bit, maybe the option should be named --print-on-crash? For example --print-before-all does not contain "IR" in the name.

It would be nice to include a test. Maybe creating a new pass that crashes upon running could be useful.

Looks like the latest diff on Phab is against the first revision, could that be fixed?

For a test, I could create a pass that calls __builtin_trap that is only added to the opt pipeline by a really hidden option. The danger is that if someone uses that option, the compiler would crash, which seems dangerous just for testing. Is that desirable?

jamieschmeiser added inline comments.Aug 27 2020, 2:07 PM
llvm/lib/Passes/StandardInstrumentations.cpp
1035–1040

I don't think it is necessary because typically, control does not return from the signal handler and destructors are not run.

yrouban added inline comments.Aug 27 2020, 9:26 PM
llvm/lib/Passes/StandardInstrumentations.cpp
1035–1040
  1. The life time of the PrintCrashIRInstrumentation instance is shorter than the scope of signal handlers. So there is a chance when PrintCrashIRInstrumentation is destructed then LLVM is crashed. This results in executing the handler with the deleted instance of PrintCrashIRInstrumentation.
  2. AddSignalHandler() has a limited number of handlers. You should not add a new handler for every new instance of PrintCrashIRInstrumentation. I would suggest that only one crash reporting handler is registered and this reporter maintains a global set of live instances of PrintCrashIRInstrumentation.

Address review comments.

Change the name of the option to -print-on-crash.
Change the registered signal handler to be a global function that only
references other globals to avoid life-span problems.
Only register a single signal handler for multiple handlers that all refer
to the same global object. This may cause corruption of the output
in some cases due to race conditions but this is deemed acceptable
since this is for developer use, rather than seen by a user.
Create a pass for the new pass manager that always triggers a crash but
is not put into any pipeline to allow testing of the option.
Finally, a test for the option is added.

ychen added a subscriber: ychen.Aug 28 2020, 12:37 PM
yrouban added inline comments.Aug 30 2020, 10:14 PM
llvm/lib/Passes/StandardInstrumentations.cpp
351–354

I would suggest that :

  1. SavedIR is an instance field of PrintCrashIRInstrumentation.
  2. There is a global set of live PrintCrashIRInstrumentation. Every instance is added in its constructor and deleted in its destructor. The add and delete operations are synchronized.
  3. For all live PrintCrashIRInstrumentation instances the crashedIRReporter prints whatever it is supposed to print.
932

this must be done for every call of registerCallbacks disregard to HandlerRegistered.

aeubanks added inline comments.Aug 31 2020, 9:27 PM
llvm/test/Other/print-on-crash.ll
6

I think the typical way is not --crash opt ....
Not sure if this may REQUIRES: asserts or not

15

maybe check that some or the IR is printed? at least a function name

Respond to review comments: use mutex to protect building of vector of crash printers.
Create a global vector of crash printers that the signal handler uses to report the IRs in the event of a crash.
Adding a crash printer and removing it from the vector is protected using mutex locking.

Respond to review comments: change test to use not --crash and make it check for part of IR.

yrouban added inline comments.Sep 1 2020, 9:24 PM
llvm/lib/Passes/StandardInstrumentations.cpp
348–350

please make these both static private fields of PrintIRInstrumentation.
rename mtx with first uppercase.

942

This way CrashReporters vector is growing unlimited - memory leak.
Please use a pointer set and no registration index will be needed.
E.g. DenseSet<PrintCrashIRInstrumentation *> CrashReporters;

yrouban added inline comments.Sep 2 2020, 11:13 PM
llvm/lib/Passes/StandardInstrumentations.cpp
348–350

Please elaborate. Do you mean you disagree agree with my suggestion?
I just proposed to hide the mtx and CrashReporters making them static fields of PrintCrashIRInstrumentation:

StandardInstrumentations.h:

class PrintCrashIRInstrumentation {
  ...
  // Avoid races when creating/destroying the crash IR printers.
  static std::mutex mtx;
  // All of the crash reporters that will report on a crash.
  static std::vector<PrintCrashIRInstrumentation *> CrashReporters;
}

StandardInstrumentations.cpp:

std::mutex PrintCrashIRInstrumentation::mtx;
std::vector<PrintCrashIRInstrumentation *> PrintCrashIRInstrumentation::CrashReporters;

This is essentially the same as

std::mutex mtx;
std::vector<PrintCrashIRInstrumentation *> CrashReporters;
aeubanks added inline comments.Sep 3 2020, 1:49 PM
llvm/lib/Passes/StandardInstrumentations.cpp
348–350

These are global variables with constructors right? LLVM coding guide doesn't allow global constructors. I think that also applies to static members of classes.

And I think adding a signal handler per StandardInstrumentation is fine. Apparently LLVM only allows 8 max signal handlers (Signals.cpp), but we probably won't hit that, especially since this is a hidden option and likely we'll only have one StandardInstrumentation in most cases. We can always revisit later if necessary.

yrouban added inline comments.Sep 3 2020, 8:14 PM
llvm/lib/Passes/StandardInstrumentations.cpp
348–350

global variables ...
literally yes, but we have plenty of those of type cl::opt.
Another option I see is to create a singleton at the very first use. But it would complicate things.

Adding a signal handler for every instance of StandardInstrumentation is unacceptable for compilers that compile many units in one process. A new instance is created for every call of llvm::runPassPipeline().

Respond to review comments: Change global vector to pointer to DenseSet,
change objects from unnamed namespace to private statics.

Also, add assertions and make locking thread-safe.

aeubanks added inline comments.Sep 6 2020, 5:56 PM
llvm/lib/Passes/StandardInstrumentations.cpp
348–350

It seems you have some experience in globals based on some older llvm-dev threads you were involved in, seems like you have more background on this than me so I won't block on the mutex global (although of course a non-global way would be better if possible and relatively clean).

Ok multiple runs creating multiple StandardInstrumentations makes sense.

some nits, but lgtm as long as yrouban approves

llvm/include/llvm/Passes/StandardInstrumentations.h
23

is this include necessary?

llvm/lib/Passes/PassBuilder.cpp
268–278

could you move this right after the noop passes below?

jamieschmeiser edited the summary of this revision. (Show Details)

Respond to review comments: replace unneeded include with standard include, move code in file.
React to review comments on other PRs: change test prefixes to something more useful.
React to test failures with other PRs: change tests to have regex rather than having PassID in test
Add another test to test interaction with -print-module-scope

overall lgtm with some nits, but I'd like yrouban to take a look at the signal handler

llvm/lib/Passes/StandardInstrumentations.cpp
57–58

the last two sentences seem unnecessary. also hidden option doesn't need to be specified here and in the commit description.

So just "Print the IR being processed when a pass crashes" is good enough IMO

1077

why is this inline?

llvm/test/Other/print-on-crash.ll
24

nitpicking, but reducing this to
define i32 @main() {
would be ideal

Matt added a subscriber: Matt.Oct 5 2020, 12:19 PM

Made changes requested by @aeubanks, who is satisfied but would like @yrouban to check the signal handler.

@yrouban, @aeubanks indicated that he is satisfied but would like you to sign off on the signal handler.

If @yrouban doesn't respond I'd say it's fine to go ahead and submit this, just make sure that the title and summary are properly updated.

yrouban accepted this revision.Dec 14 2020, 11:52 PM
yrouban added inline comments.
llvm/include/llvm/Passes/StandardInstrumentations.h
156

I would suggest that we use just

static DenseSet<PrintCrashIRInstrumentation *> CrashReporters

It would simplify implementation but take some more bytes for empty DenseSet.

This revision is now accepted and ready to land.Dec 14 2020, 11:52 PM

the title should probably contain the new --print-on-crash flag name

Also, adding a file to dump the IR to would be awesome for debugging crashes, maybe in a future change

I think that is a great idea, especially since it is likely it is being captured anyway... Perhaps with -print-on-crash=<filename> with the current behaviour when =<filename> is not specified? As a subsequent change rather than in this one.

aeubanks accepted this revision.Jan 28 2021, 10:28 AM

that sounds good to me

dmajor added a subscriber: dmajor.Feb 3 2021, 6:56 PM

Is this ready to land?

Also, on further thought, do we really need all the mutex code? This is supposed to be for debugging anyway, not really used in production code like JITs.

Yes, it is ready to land. There was a fair bit of discussion during the review so I left it after it was accepted in case there were still concerns. The mutex code was added because it was requested in review. IIRC, there were concerns about multiple pass managers registering signal handlers. Since it is for debugging, efficiency of the mutex code is not as important.

Rebased and updated call to registerBeforeNonSkippedPassCallback. Also removed extra blank line at end of test.

This revision was landed with ongoing or failed builds.Tue, Mar 23, 6:30 AM
This revision was automatically updated to reflect the committed changes.
jamieschmeiser reopened this revision.Tue, Mar 23, 7:14 AM

The Buildbot has detected a failed build on builder mlir-windows while building llvm.

Full details are available at:

https://urldefense.proofpoint.com/v2/url?u=https-3A__lab.llvm.org_buildbot-23builders_13_builds_6076&d=DwICaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=Nr2SnYQn80wzIpml4veK2C-U4ilVZzi0kWYCc2WSLoA&m=CQQdxPSThSw2OucwuObeggJaCzdQ6vt0CFRALzhLKoY&s=IE0fvkJX313_gvazCZdO2fla5o5wb1oOEemSpIR3ZMM&e=
This revision is now accepted and ready to land.Tue, Mar 23, 7:14 AM

The build problem is that __builtin_trap is not recognized in PassBuilder.cpp.

maybe an assert(false) is more portable?