This is an archive of the discontinued LLVM Phabricator instance.

Add new hidden option -print-on-crash that prints out IR that caused opt pipeline to crash
ClosedPublic

Authored by aeubanks on Aug 26 2020, 1:26 PM.

Details

Summary

A new hidden option -print-on-crash that prints the IR as it was upon entering
the last pass when there is a crash.

The IR is saved in its print form before each pass is started and a
signal handler is registered. If the compilation crashes, the signal
handler will print the saved IR to dbgs(). This option
can be modified using -print-module-scope to get the IR for the complete
module. Note that this option only works with the new pass manager.

Diff Detail

Event Timeline

jamieschmeiser created this revision.Aug 26 2020, 1:26 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 26 2020, 1:26 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
jamieschmeiser requested review of this revision.Aug 26 2020, 1:26 PM

This option will be introduced (along with others) in my tutorial at the LLVM developer's conference in October.

I'm worried that this will affect memory usage.
Is --print-before-all not good enough?

--print-before-all will print the IR before every pass which can result in the IR being printed literally thousands of times while this option prints it only once. The IR is saved in a string which is cleared each time and re-used so the memory overhead is not great; also this overhead is only hit when the option is requested. It does slow the compilation when used but not as much as --print-before-all and the user does not have as difficult a task in extracting the IR from the output. If the option is false (the default), the overhead is only the checking of the option. Basically, --print-before-all is cumbersome for this purpose while this option is convenient. Also, one could reasonably request a user to retrieve the IR (if they are willing) with this option when reporting a crash while requesting a user to use --print-before-all would not be reasonable.

[NFC] Reorder includes and change case of variable as suggested by
clang-tidy/clang-format.

yrouban added inline comments.Aug 26 2020, 9:34 PM
llvm/lib/Passes/StandardInstrumentations.cpp
202–212

static_cast<PrintCrashIRInstrumentation *>(P)

360–386

do we need to remove this signal handler when this is destructed?

Ok I guess this makes sense over --print-before-all.

To bikeshed a bit, maybe the option should be named --print-on-crash? For example --print-before-all does not contain "IR" in the name.

It would be nice to include a test. Maybe creating a new pass that crashes upon running could be useful.

Looks like the latest diff on Phab is against the first revision, could that be fixed?

For a test, I could create a pass that calls __builtin_trap that is only added to the opt pipeline by a really hidden option. The danger is that if someone uses that option, the compiler would crash, which seems dangerous just for testing. Is that desirable?

jamieschmeiser added inline comments.Aug 27 2020, 2:07 PM
llvm/lib/Passes/StandardInstrumentations.cpp
360–386

I don't think it is necessary because typically, control does not return from the signal handler and destructors are not run.

yrouban added inline comments.Aug 27 2020, 9:26 PM
llvm/lib/Passes/StandardInstrumentations.cpp
360–386
  1. The life time of the PrintCrashIRInstrumentation instance is shorter than the scope of signal handlers. So there is a chance when PrintCrashIRInstrumentation is destructed then LLVM is crashed. This results in executing the handler with the deleted instance of PrintCrashIRInstrumentation.
  2. AddSignalHandler() has a limited number of handlers. You should not add a new handler for every new instance of PrintCrashIRInstrumentation. I would suggest that only one crash reporting handler is registered and this reporter maintains a global set of live instances of PrintCrashIRInstrumentation.

Address review comments.

Change the name of the option to -print-on-crash.
Change the registered signal handler to be a global function that only
references other globals to avoid life-span problems.
Only register a single signal handler for multiple handlers that all refer
to the same global object. This may cause corruption of the output
in some cases due to race conditions but this is deemed acceptable
since this is for developer use, rather than seen by a user.
Create a pass for the new pass manager that always triggers a crash but
is not put into any pipeline to allow testing of the option.
Finally, a test for the option is added.

ychen added a subscriber: ychen.Aug 28 2020, 12:37 PM
yrouban added inline comments.Aug 30 2020, 10:14 PM
llvm/lib/Passes/StandardInstrumentations.cpp
207–210

I would suggest that :

  1. SavedIR is an instance field of PrintCrashIRInstrumentation.
  2. There is a global set of live PrintCrashIRInstrumentation. Every instance is added in its constructor and deleted in its destructor. The add and delete operations are synchronized.
  3. For all live PrintCrashIRInstrumentation instances the crashedIRReporter prints whatever it is supposed to print.
375

this must be done for every call of registerCallbacks disregard to HandlerRegistered.

aeubanks added inline comments.Aug 31 2020, 9:27 PM
llvm/test/Other/print-on-crash.ll
5

I think the typical way is not --crash opt ....
Not sure if this may REQUIRES: asserts or not

14

maybe check that some or the IR is printed? at least a function name

Respond to review comments: use mutex to protect building of vector of crash printers.
Create a global vector of crash printers that the signal handler uses to report the IRs in the event of a crash.
Adding a crash printer and removing it from the vector is protected using mutex locking.

Respond to review comments: change test to use not --crash and make it check for part of IR.

yrouban added inline comments.Sep 1 2020, 9:24 PM
llvm/lib/Passes/StandardInstrumentations.cpp
204–206

please make these both static private fields of PrintIRInstrumentation.
rename mtx with first uppercase.

385

This way CrashReporters vector is growing unlimited - memory leak.
Please use a pointer set and no registration index will be needed.
E.g. DenseSet<PrintCrashIRInstrumentation *> CrashReporters;

yrouban added inline comments.Sep 2 2020, 11:13 PM
llvm/lib/Passes/StandardInstrumentations.cpp
204–206

Please elaborate. Do you mean you disagree agree with my suggestion?
I just proposed to hide the mtx and CrashReporters making them static fields of PrintCrashIRInstrumentation:

StandardInstrumentations.h:

class PrintCrashIRInstrumentation {
  ...
  // Avoid races when creating/destroying the crash IR printers.
  static std::mutex mtx;
  // All of the crash reporters that will report on a crash.
  static std::vector<PrintCrashIRInstrumentation *> CrashReporters;
}

StandardInstrumentations.cpp:

std::mutex PrintCrashIRInstrumentation::mtx;
std::vector<PrintCrashIRInstrumentation *> PrintCrashIRInstrumentation::CrashReporters;

This is essentially the same as

std::mutex mtx;
std::vector<PrintCrashIRInstrumentation *> CrashReporters;
aeubanks added inline comments.Sep 3 2020, 1:49 PM
llvm/lib/Passes/StandardInstrumentations.cpp
204–206

These are global variables with constructors right? LLVM coding guide doesn't allow global constructors. I think that also applies to static members of classes.

And I think adding a signal handler per StandardInstrumentation is fine. Apparently LLVM only allows 8 max signal handlers (Signals.cpp), but we probably won't hit that, especially since this is a hidden option and likely we'll only have one StandardInstrumentation in most cases. We can always revisit later if necessary.

yrouban added inline comments.Sep 3 2020, 8:14 PM
llvm/lib/Passes/StandardInstrumentations.cpp
204–206

global variables ...
literally yes, but we have plenty of those of type cl::opt.
Another option I see is to create a singleton at the very first use. But it would complicate things.

Adding a signal handler for every instance of StandardInstrumentation is unacceptable for compilers that compile many units in one process. A new instance is created for every call of llvm::runPassPipeline().

Respond to review comments: Change global vector to pointer to DenseSet,
change objects from unnamed namespace to private statics.

Also, add assertions and make locking thread-safe.

aeubanks added inline comments.Sep 6 2020, 5:56 PM
llvm/lib/Passes/StandardInstrumentations.cpp
204–206

It seems you have some experience in globals based on some older llvm-dev threads you were involved in, seems like you have more background on this than me so I won't block on the mutex global (although of course a non-global way would be better if possible and relatively clean).

Ok multiple runs creating multiple StandardInstrumentations makes sense.

some nits, but lgtm as long as yrouban approves

llvm/include/llvm/Passes/StandardInstrumentations.h
21

is this include necessary?

llvm/lib/Passes/PassBuilder.cpp
261–271

could you move this right after the noop passes below?

jamieschmeiser edited the summary of this revision. (Show Details)

Respond to review comments: replace unneeded include with standard include, move code in file.
React to review comments on other PRs: change test prefixes to something more useful.
React to test failures with other PRs: change tests to have regex rather than having PassID in test
Add another test to test interaction with -print-module-scope

overall lgtm with some nits, but I'd like yrouban to take a look at the signal handler

llvm/test/Other/print-on-crash.ll
24

nitpicking, but reducing this to
define i32 @main() {
would be ideal

aeubanks added inline comments.Sep 29 2020, 3:49 PM
llvm/lib/Passes/StandardInstrumentations.cpp
50–51

the last two sentences seem unnecessary. also hidden option doesn't need to be specified here and in the commit description.

So just "Print the IR being processed when a pass crashes" is good enough IMO

365

why is this inline?

Matt added a subscriber: Matt.Oct 5 2020, 12:19 PM

Made changes requested by @aeubanks, who is satisfied but would like @yrouban to check the signal handler.

@yrouban, @aeubanks indicated that he is satisfied but would like you to sign off on the signal handler.

If @yrouban doesn't respond I'd say it's fine to go ahead and submit this, just make sure that the title and summary are properly updated.

yrouban accepted this revision.Dec 14 2020, 11:52 PM
yrouban added inline comments.
llvm/include/llvm/Passes/StandardInstrumentations.h
91

I would suggest that we use just

static DenseSet<PrintCrashIRInstrumentation *> CrashReporters

It would simplify implementation but take some more bytes for empty DenseSet.

This revision is now accepted and ready to land.Dec 14 2020, 11:52 PM

the title should probably contain the new --print-on-crash flag name

Also, adding a file to dump the IR to would be awesome for debugging crashes, maybe in a future change

I think that is a great idea, especially since it is likely it is being captured anyway... Perhaps with -print-on-crash=<filename> with the current behaviour when =<filename> is not specified? As a subsequent change rather than in this one.

aeubanks accepted this revision.Jan 28 2021, 10:28 AM

that sounds good to me

dmajor added a subscriber: dmajor.Feb 3 2021, 6:56 PM

Is this ready to land?

Also, on further thought, do we really need all the mutex code? This is supposed to be for debugging anyway, not really used in production code like JITs.

Yes, it is ready to land. There was a fair bit of discussion during the review so I left it after it was accepted in case there were still concerns. The mutex code was added because it was requested in review. IIRC, there were concerns about multiple pass managers registering signal handlers. Since it is for debugging, efficiency of the mutex code is not as important.

Rebased and updated call to registerBeforeNonSkippedPassCallback. Also removed extra blank line at end of test.

This revision was landed with ongoing or failed builds.Mar 23 2021, 6:30 AM
This revision was automatically updated to reflect the committed changes.
jamieschmeiser reopened this revision.Mar 23 2021, 7:14 AM

The Buildbot has detected a failed build on builder mlir-windows while building llvm.

Full details are available at:

https://urldefense.proofpoint.com/v2/url?u=https-3A__lab.llvm.org_buildbot-23builders_13_builds_6076&d=DwICaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=Nr2SnYQn80wzIpml4veK2C-U4ilVZzi0kWYCc2WSLoA&m=CQQdxPSThSw2OucwuObeggJaCzdQ6vt0CFRALzhLKoY&s=IE0fvkJX313_gvazCZdO2fla5o5wb1oOEemSpIR3ZMM&e=
This revision is now accepted and ready to land.Mar 23 2021, 7:14 AM

The build problem is that __builtin_trap is not recognized in PassBuilder.cpp.

maybe an assert(false) is more portable?

anna added a subscriber: anna.Jul 26 2021, 10:38 AM

maybe an assert(false) is more portable?

@jamieschmeiser does that help? Wondering if that will allow to land this. This is a good option to have.

@anna Sorry, this kind of dropped off my radar. I will check into it and try to land this.

jamieschmeiser edited the summary of this revision. (Show Details)

Update and change test pass to use assert(false)

fix Clang-tidy problems

back to the mutex discussion, I feel like it's a very narrow use case to use --print-on-crash with multiple threads
I'd rather land this without the mutex stuff and add the mutex stuff later if we really decide we need it

llvm/lib/Passes/PassBuilder.cpp
412

this should probably be a module pass

417

I'd move the definition into the class definition

llvm/test/Other/print-on-crash.ll
2

we should probably add REQUIRES: asserts to make sure the assert in the pass triggers

4

not true anymore

15

unnecessary comment

I agree that the mutex code is overkill, considering that this is intended for debugging. If things go off the rails because of the lacking mutex code, the consequences are not severe since it is expected to crash anyway (or else why would you use this?)
I will removed the mutex code and clean things up based on your comments unless someone raises an objection.

As requested, I removed the mutex code and cleaned up a few things.

mostly looks good
should we always print the full module? generally if we're looking at some crashed IR, we want to be able to run opt on it, and just printing out a function won't allow us to do that

llvm/include/llvm/Passes/StandardInstrumentations.h
91

we don't need a set anymore right? just one crash reporter

llvm/lib/Passes/PassBuilder.cpp
412

extraneous struct?

Regarding always printing the module, I'm not sure that is desirable. The
problematic IR may be very large and the option respects -print-module-scope
so the module IR can be obtained. There are also tools for creating a fleshed-out IR
from the IR for a function.

There are also tools for creating a fleshed-out IR from the IR for a function.

Can you elaborate?

For now this still lgtm, we can iterate on it later

I think I am getting confused with llvm-extract --func=<function> which will extract what is needed for a function from a module IR. However, one could likely use that to get an IR with the necessary addition declarations and then substitute in the IR for the function using this option. I wonder if it might be possible to use the code in llvm-extract to do that automatically? Something to think about...

I think I am getting confused with llvm-extract --func=<function> which will extract what is needed for a function from a module IR. However, one could likely use that to get an IR with the necessary addition declarations and then substitute in the IR for the function using this option. I wonder if it might be possible to use the code in llvm-extract to do that automatically? Something to think about...

llvm-extract is a bit finnicky, it basically relies on GlobalDCE and has to change function linkage in order to preserve functions.

The title needs updating for the option name

Btw I just tried this on a very large file with a (real) LLVM crash and it was very slow and I gave up. It's probably from printing the IR after every pass. Perhaps cloning the module via CloneModule() and keeping that around might be faster?

DaniilSuchkov added a comment.EditedOct 1 2021, 4:06 PM

The title needs updating for the option name

Btw I just tried this on a very large file with a (real) LLVM crash and it was very slow and I gave up. It's probably from printing the IR after every pass. Perhaps cloning the module via CloneModule() and keeping that around might be faster?

Keep in mind that:
a) This is still (supposedly) a quality of life improvement over using -print-after-all -print-module-scope. In my experience -print-after-all -print-module-scope in some cases can take hours (and then I have to deal with a few GB long log).
b) There are bugs that can corrupt IR in other modules (like the one I fixed here: D110752), so keeping a live Module object isn't the best idea.

Serializing the module into the binary representation and then de-serizalining and printing it upon crash is an option, though I'm not sure it's robust enough, since the LLVMContext can be arbitrarily broken at that point.

The title needs updating for the option name

Btw I just tried this on a very large file with a (real) LLVM crash and it was very slow and I gave up. It's probably from printing the IR after every pass. Perhaps cloning the module via CloneModule() and keeping that around might be faster?

Keep in mind that:
a) This is still (supposedly) a quality of life improvement over using -print-after-all -print-module-scope. In my experience -print-after-all -print-module-scope in some cases can take hours (and then I have to deal with a few GB long log).

The current implementation isn't that much better than just -print-after-all -print-module-scope. It's easy to look at a huge log, search for the last instance of "IR Dump", and delete everything from that to the beginning of the file. I think making this quick is very important for it to be much more useful than -print-after-all -print-module-scope.
For example, I had a repro that took 2 seconds to crash. I tried specifying (with this patched in) --print-on-crash, and gave up after several minutes. If could keep the slowdown of turning this on within an order of magnitude of the original repro that'd be super useful.

b) There are bugs that can corrupt IR in other modules (like the one I fixed here: D110752), so keeping a live Module object isn't the best idea.

D110752 is not corrupting IR, it happened because we were looking at a constant's users, which is not a thing that makes sense to do (e.g. look at all users of the constant false).

Serializing the module into the binary representation and then de-serizalining and printing it upon crash is an option, though I'm not sure it's robust enough, since the LLVMContext can be arbitrarily broken at that point.

I guess it's possible that LLVMContext is broken, but that's much rarer than simply seeing a random crash in a pass. I don't think it's worth handling the small chance that something fundamental is broken to slow down the common use case.

But again, still fine to land this and improve on it later if you'd like.

D110752 is not corrupting IR, it happened because we were looking at a constant's users, which is not a thing that makes sense to do (e.g. look at all users of the constant false).

Yep, what it did doesn't make any sense because it was a bug. Since here we're discussing a debugging option, in my opinion it makes sense to take such cases into account.

jamieschmeiser retitled this revision from Add new hidden option -print-crash-IR that prints out IR that caused opt pipeline to crash to Add new hidden option -print-on-crash that prints out IR that caused opt pipeline to crash.

Added the usual filtering options (filter-passes and filter-print-func) with uninteresting passes being saved with a message that they were filtered out. This should speed up things significantly when they are used. If the user isn't aware of the name of the crashing pass, they can use -filter-passes=blah (where blah doesn't exist) and they will get a message saying that the pass was filtered out. They can discover the name from that message and then rerun it with -filter-passes.

aeubanks added inline comments.Oct 4 2021, 6:23 PM
llvm/lib/Passes/StandardInstrumentations.cpp
18

is this necessary?

Remove unnecessary include.

This revision was landed with ongoing or failed builds.Oct 7 2021, 12:03 PM
This revision was automatically updated to reflect the committed changes.
MaskRay added inline comments.
llvm/lib/Passes/PassBuilder.cpp
412

error: non-void function does not return a value [-Werror,-Wreturn-type]

use llvm_unreachable

This revision is now accepted and ready to land.Oct 7 2021, 12:46 PM

missed @MaskRay 's comment, will add that before attempting a relanding

fhahn added a subscriber: fhahn.Mar 4 2022, 5:44 AM

I saw the patch got reverted a while ago. Are there any plans for re-submitting the change?

This would be very valuable to have in combination with an option to print the whole module before the crashing pass to a a file or the function causing the crash extracted from the whole module, so there's a reproducer file that can be used easily.

Herald added a project: Restricted Project. · View Herald TranscriptMar 4 2022, 5:44 AM
aeubanks commandeered this revision.Mar 8 2022, 9:53 AM
aeubanks edited reviewers, added: jamieschmeiser; removed: aeubanks.

I'll reland this using the crash passes added in D120993 once that lands