Page MenuHomePhabricator

ErrorHandling: report_fatal_error: call abort() instead of exit()
Needs RevisionPublic

Authored by MatzeB on Jun 6 2017, 2:22 PM.

Details

Summary

report_fatal_error(): Call abort() instead of exit(); this allows
debuggers, crashtracers etc. to show the point in the compiler where the
problem occured.

Note that this change will not affect clang or other frontends that
register a fatal_error_handler that exits the program by other means.

Diff Detail

Repository
rL LLVM

Event Timeline

MatzeB created this revision.Jun 6 2017, 2:22 PM
dblaikie accepted this revision.Jun 7 2017, 8:25 AM

Seems OK to me. Any reason not to use abort always? (probably is a good reason - just curious to hear/understand/have it written down)

This revision is now accepted and ready to land.Jun 7 2017, 8:25 AM
MatzeB updated this revision to Diff 114098.Sep 6 2017, 4:41 PM
MatzeB retitled this revision from ErrorHandling: Replace exit() with abort() for debug builds. to ErrorHandling: report_fatal_error: call abort() instead of exit().
MatzeB edited the summary of this revision. (Show Details)

Turns out this affects a bunch of testcases. Updating them here.

  • There is a problem with test/Other/close-stderr.ll which expected an exit(1) when printing to stderr fails. With this patch in place we do report_fatal_error() that the write to stderr failed and end up in an abort() again. Is it okay to remove the test?
  • I decided that it may be best to always call abort(). This avoid an unnecessary discrepancy between release/debug builds and it turns out that frontends like clang or swift registers a custom handlers for fatal errors anyway and are not affected by this change.
MatzeB requested review of this revision.Sep 6 2017, 4:41 PM
MatzeB edited edge metadata.
MatzeB added a subscriber: atrick.
atrick added a comment.Sep 6 2017, 4:46 PM

Please add a comment that abort() also causes any signal handlers registered with LLVM such as PrettyStackTrace to run, which is important for compiler debugging.

Hrm.

Seeing that closing stderr test - I wouldn't mind some more eyes on this change. There's a fair amount of report_fatal_error usage in LLVM tools that probably aren't abort-worthy? But I'm not sure. But maybe that's OK (I mean they're mostly internal tools anyway)

MatzeB added a comment.Sep 6 2017, 4:53 PM

Hrm.

Seeing that closing stderr test - I wouldn't mind some more eyes on this change. There's a fair amount of report_fatal_error usage in LLVM tools that probably aren't abort-worthy? But I'm not sure. But maybe that's OK (I mean they're mostly internal tools anyway)

Many of the report_fatal_errors in LLVM are abort worthy in my opinion.
There are indeed some like the bitcode parsing failures that really should rather use one of the other mechanisms available such as passing an Error() value to the caller or at least using LLVMContext::diagnose(). I hope this won't stop this patch, in fact it may be nice that this patch makes these things more obvious and hopefully brings people to use more apropriate APIs there.

chandlerc accepted this revision.Sep 7 2017, 4:40 PM

Hrm.

Seeing that closing stderr test - I wouldn't mind some more eyes on this change. There's a fair amount of report_fatal_error usage in LLVM tools that probably aren't abort-worthy? But I'm not sure. But maybe that's OK (I mean they're mostly internal tools anyway)

I agree that some of the LLVM tools probably need to move to other error reporting mechanisms.

Many of the report_fatal_errors in LLVM are abort worthy in my opinion.

I suspect essentially all of the ones in libraries are. =]

There are indeed some like the bitcode parsing failures that really should rather use one of the other mechanisms available such as passing an Error() value to the caller or at least using LLVMContext::diagnose(). I hope this won't stop this patch, in fact it may be nice that this patch makes these things more obvious and hopefully brings people to use more apropriate APIs there.

I think this change is probably a good starting point. I see David's point in that I think we may want to revisit some uses of report_fatal_error in tools. But as he says, they're internal tools, and so I feel like we can incrementally go after the ones that are indeed inappropriate to use 'abort()'. We can even to quick fixes by setting up a custom handler much like frontends do.

This revision is now accepted and ready to land.Sep 7 2017, 4:40 PM
rnk requested changes to this revision.Sep 7 2017, 4:59 PM
rnk added a subscriber: rnk.

I'm not sure this is a good idea. For years, report_fatal_error outside clang hasn't printed a stack trace or a banner, and that's intentional behavior.

I think the question is, is a report_fatal_error exit indicative of a bug, or is it possibly working as intended? For a long time we've treated it as a valid (but lazy and crappy) way to handle errors in library code. Are we changing that? I could be OK with that as long as its announced. If we want to do this, we should announce it as a new direction for the project on llvm-dev.

This revision now requires changes to proceed.Sep 7 2017, 4:59 PM
In D33960#864213, @rnk wrote:

I'm not sure this is a good idea. For years, report_fatal_error outside clang hasn't printed a stack trace or a banner, and that's intentional behavior.

I agree it has been the behavior, but I'm not sure how intentional it has been... =[

I think the question is, is a report_fatal_error exit indicative of a bug, or is it possibly working as intended? For a long time we've treated it as a valid (but lazy and crappy) way to handle errors in library code. Are we changing that? I could be OK with that as long as its announced. If we want to do this, we should announce it as a new direction for the project on llvm-dev.

I feel like almost every time I have hit report_fatal_error, I have wanted it to break in a debugger and potentially provide a stack trace. Maybe I'm weird though.

But I don't see how library code calling exit(1) is *ever* going to be a useful error handling mechanism. So I think any place we do this in libraries is either trying to do something like abort() (and reasonable) or trying to propagate an error and needs to be fixed pretty badly. Maybe by making it print stack traces and such we will actually be better at going and fixing these issues.

All that said I'm pretty happy asking for somewhat widespread communication of the fact that this is changing. It *is* a change, and we shouldn't paper over that. I just think it is probably the right direction.

MatzeB added a comment.Sep 7 2017, 5:12 PM

I think the question is, is a report_fatal_error exit indicative of a bug, or is it possibly working as intended? For a long time we've treated it as a valid (but lazy and crappy) way to handle errors in library code. Are we changing that? I could be OK with that as long as its announced. If we want to do this, we should announce it as a new direction for the project on llvm-dev.

I feel like almost every time I have hit report_fatal_error, I have wanted it to break in a debugger and potentially provide a stack trace. Maybe I'm weird though.

Exactly the same for me and that was my motivation for this patch.

But I don't see how library code calling exit(1) is *ever* going to be a useful error handling mechanism. So I think any place we do this in libraries is either trying to do something like abort() (and reasonable) or trying to propagate an error and needs to be fixed pretty badly. Maybe by making it print stack traces and such we will actually be better at going and fixing these issues.

All that said I'm pretty happy asking for somewhat widespread communication of the fact that this is changing. It *is* a change, and we shouldn't paper over that. I just think it is probably the right direction.

I agree.

atrick added a comment.Sep 7 2017, 5:14 PM

Note that you can get the stack trace just call RunSignalHandlers before calling exit(1)...

But, I wholeheartedly agree with Chandler's evaluation. exit() makes no sense for a library error. Announce it to the community and fix the behavior.

rnk added a comment.Sep 7 2017, 5:24 PM

Oh yeah, the reason I added @filcab: People have been "fixing" AFL and other fuzzer bugs by adding turning asserts into if (X) report_fatal_error("invalid input");. See rL238272, rL238269, rL238268, etc. If you call abort, these will be treated as fuzzer bugs. Maybe that's what you want (for people to actually handle errors), but that's not the status quo.

Another way of looking at it is, does LLVM really need an assert that doesn't compile out in release builds? That's kind of what it will become with this patch.

Re: stopping in the debugger, just do catch syscall exit + exit_group. gdb's behavior of disposing of the inferior after it exits always struck me as weird.

MatzeB added a comment.Sep 7 2017, 5:31 PM

Did a quick ad-hoc and incomplete survey of report_fatal_error usage in LLVM.

Abort Worthy:

  • Situations where IR/MI violated assumptions: IR/Verifier, MachineVerifier, WinEHPrepare, LiveRangeCalc, TargetLoweringObjFile, Coro*, Analysis,
  • Unimplemented functionality: Support/YAMLParser,
  • Out of resources/memory: MachineOutliner
  • Internal compiler problems: Expected passes not in pipeline, target callbacks mibehaving: SafeStack, SplitKit, TargetInstrInfo

User/Input errors:

  • Wrong options: TargetPassConfig
  • Parse Errors: Bitcode/Reader, IR/DataLayout
  • Impossible to allocate registers for function: RegAllocBase
  • Target doesn't support a feature/calling convention/address space/...: TargetLoweringObjectFile, various target files

So far I'd say the ones in the second category really shouldn't be use report_fatal_error.

Did a quick ad-hoc and incomplete survey of report_fatal_error usage in LLVM.

Abort Worthy:

  • Situations where IR/MI violated assumptions: IR/Verifier, MachineVerifier, WinEHPrepare, LiveRangeCalc, TargetLoweringObjFile, Coro*, Analysis,
  • Unimplemented functionality: Support/YAMLParser,
  • Out of resources/memory: MachineOutliner
  • Internal compiler problems: Expected passes not in pipeline, target callbacks mibehaving: SafeStack, SplitKit, TargetInstrInfo

    User/Input errors:
  • Wrong options: TargetPassConfig
  • Parse Errors: Bitcode/Reader, IR/DataLayout

I would like to work on getting rid of report_fatal_error from the Bitstream reader sometime this year / next year as it's used for Clang's serialized diagnostics and report_fatal_error can bring down Xcode if the diagnostics are malformed.

  • Impossible to allocate registers for function: RegAllocBase
  • Target doesn't support a feature/calling convention/address space/...: TargetLoweringObjectFile, various target files

    So far I'd say the ones in the second category really shouldn't be use report_fatal_error.