Page MenuHomePhabricator

[Support] make report_fatal_error `abort` instead of `exit`
AcceptedPublic

Authored by ychen on Sep 20 2019, 10:59 AM.

Details

Summary

This patch could be treated as a rebase of D33960. It also fixes PR35547.
A fix for llvm/test/Other/close-stderr.ll is proposed in D68164. Seems
the consensus is that the test is passing by chance and I'm not
sure how important it is for us. So it is removed like in D33960 for now.
The rest of the test fixes are just adding --crash flag to not tool.

  • The reason it fixes PR35547 is

exit does cleanup including calling class destructor whereas abort
does not do any cleanup. In multithreading environment such as ThinLTO or JIT,
threads may share states which mostly are ManagedStatic<>. If faulting thread
tearing down a class when another thread is using it, there are chances of
memory corruption. This is bad 1. It will stop error reporting like pretty
stack printer; 2. The memory corruption is distracting and nondeterministic in
terms of error message, and corruption type (depending one the timing, it
could be double free, heap free after use, etc.).

Event Timeline

ychen created this revision.Sep 20 2019, 10:59 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 20 2019, 10:59 AM
rnk added a subscriber: MatzeB.Sep 20 2019, 11:48 AM

This has been proposed before by @MatzeB:
https://reviews.llvm.org/D33960

At the time, there was an issue that some people had been "fixing" fuzzer bugs by replacing asserts with if (cond) report_fatal_error(...);. Moving to abort will reopen all of those fuzzer bugs. However, I think we have consensus that that is the desired behavior: every call to report_fatal_error is essentially a bug,. Code should be using the error handlers on MCContext and LLVMContext instead and recovering.

ychen added a comment.Sep 20 2019, 5:17 PM
In D67847#1677251, @rnk wrote:

This has been proposed before by @MatzeB:
https://reviews.llvm.org/D33960

At the time, there was an issue that some people had been "fixing" fuzzer bugs by replacing asserts with if (cond) report_fatal_error(...);. Moving to abort will reopen all of those fuzzer bugs. However, I think we have consensus that that is the desired behavior: every call to report_fatal_error is essentially a bug,. Code should be using the error handlers on MCContext and LLVMContext instead and recovering.

Thanks for the pointer, Reid. It feels to me that both versions of report_fatal_error have their practical value in the codebase. If the only known value of the exit version is to make the fuzzer happy, should we just make fuzzer not report abort version of report_fatal_error? This way we could encourage the correct error handling and not miss anything from the fuzzer side?

PS: I think ThinLTO is another case of should not use report_fatal_error in the first place.

The main alternative worth mentioning is _exit, which LLD already uses for the same reason you mention here.

Honestly, I think there is consensus that calling abort here is the right thing and we should just do it. It's not like LLVM is already fuzzer clean, and if this does reopen fuzzer bugs, it's good to fix them properly anyway. :)

I mostly wanted to link back to the past attempt to do this to get the reviewers involved in that to +1 this new attempt. I guess I'll @ some more: @davide, @filcab, @mehdi_amini

LGTM: report_fatal_error() should not be considered like a "normal way" of aborting the compilation flow, crashing is never a good thing: in general I'm in favor of better error propagation (we claim that LLVM is a library framework right?)

This sounds fine to me. We should probably replace most of report_fatal_error with llvm::Error and propagate them correctly. In the meantime, having better crash is an improvement and does not conflict with what we want to do eventually.

MaskRay accepted this revision.EditedSep 30 2019, 10:36 PM
MaskRay added a subscriber: MaskRay.

This also fixes a C-compliance problem. report_fatal_error() may be called by a destructor, registered by atexit(). C11 says "If a program calls the exit function more than once, or calls the quick_exit function in addition to the exit function, the behavior is undefined."

As noted at https://reviews.llvm.org/D68164#1689155, close-stderr.ll should be updated to use not --crash opt --reject-this-option 2>&-. Don't rely on the exit code.

This revision is now accepted and ready to land.Sep 30 2019, 10:36 PM
ychen updated this revision to Diff 222734.Oct 1 2019, 5:33 PM

Fix tests

Herald added a project: Restricted Project. · View Herald Transcript
ychen edited the summary of this revision. (Show Details)Oct 2 2019, 11:11 AM

The abort() function raises SIGABRT, for which the default behavior is to trigger a coredump. Do we actually want that behavior?

Either _exit() (long available extension, which lld already uses) or quick_exit() (the new C standard way) seem possibly preferable?

ychen added a comment.Oct 2 2019, 2:04 PM

The abort() function raises SIGABRT, for which the default behavior is to trigger a coredump. Do we actually want that behavior?

Either _exit() (long available extension, which lld already uses) or quick_exit() (the new C standard way) seem possibly preferable?

I think this patch is more for the purpose of making error handling better than for fixing PR35547. Fixing PR35547 is probably just a side effect. Apologies that I should have updated the summary to reflect that.

ychen edited the summary of this revision. (Show Details)Oct 2 2019, 4:35 PM
ychen edited the summary of this revision. (Show Details)
rnk accepted this revision.Oct 3 2019, 2:51 PM

The abort() function raises SIGABRT, for which the default behavior is to trigger a coredump. Do we actually want that behavior?

Either _exit() (long available extension, which lld already uses) or quick_exit() (the new C standard way) seem possibly preferable?

It's easy to crash LLVM even without this change, so anyone running LLVM better have core dumps configured the way they like. Failed asserts raise SIGABRT already, for example, and we have tons of those. The only difference is that now end users, who may have never configured this stuff, may see more crashes. If it's a problem, I'd consider it QoI: we should fix the report_fatal_error to use proper diagnostics anyway so end users don't see them as often, just as we would treat any other user-visible crash.