This is an archive of the discontinued LLVM Phabricator instance.

[Support] - Add bad alloc error handler for handling allocation malfunctions
ClosedPublic

Authored by rnk on Jun 28 2017, 7:53 AM.

Details

Summary

We would like to introduce a new type of llvm error handler for handling bad alloc fault situations.
LLVM already provides a fatal error handler for serious non-recoverable error situations which by
default writes some error information to stderr and calls exit(1) at the end (functions are marked as
'noreturn').

For long running processes (e.g. a server application), exiting the process is not an acceptable option,
especially not when the system is in a temporary resource bottleneck with a good chance to recover from
this fault situation. In such a situation you would rather throw an exception to stop the current
compilation and try to overcome the resource bottleneck. The user should be aware of the problem of throwing
an exception in bad alloc situations, e.g. you must not do any allocations in the unwind chain. This is especially
true when adding exceptions in existing unfamiliar code (as already stated in the comment of the current fatal error
handler)

So the new handler can also be used to distinguish from general fatal error situations where recovering is no option.
It should be used in cases where a clean unwind after the allocation is guaranteed.

This patch contains:

  • A report_bad_alloc function which calls a user defined bad alloc error handler. If no user handler is registered the report_fatal_error function is called. This function is not marked as 'noreturn'.
  • A install/restore_bad_alloc_error_handler to install/restore the bad alloc handler.
  • An example (in Mutex.cpp) where the report_bad_alloc function is called in case of a malloc returns a nullptr.

If this patch gets accepted we would create similar patches to fix corresponding malloc/calloc usages in the llvm code.

Patch by Klaus Kretzschmar

Event Timeline

kkretzsch created this revision.Jun 28 2017, 7:53 AM
MatzeB added a subscriber: MatzeB.Jul 5 2017, 10:40 AM

In general: This may warrant an RFC to llvm-dev to discuss the scope of these patches. I feel that making all of LLVM abort gracefully in out of memory situations will be hard/unmaintainable. I believe the trick would be to limit the scope. Remembering your EuroLLVM talk it seemed that you would only need IR construction in the long running server process so you can hand over that IR to shorter lived/restartable worker processes doing the actual code generation. So the scope may be limited to this use case avoiding the need to fix all the transformation passes to recover gracefully.

Specifically about this patch: So a failing malloc would now call the handler, but a failing new would throw an exception (or more likely abort() in the default no-exceptions configuration of LLVM. Maybe we should rather: Use new instead of malloc wherever possible. I'm not sure why this particular code uses malloc at all? For the cases that still use malloc (only good reason I can think of would be using realloc later) maybe we can trigger the same C++ exception as a failing new?

kkretzsch abandoned this revision.Jul 6 2017, 3:50 AM

In general: This may warrant an RFC to llvm-dev to discuss the scope of these patches. I feel that making all of LLVM abort gracefully in out of memory situations will be hard/unmaintainable. I believe the trick would be to limit the scope. Remembering your EuroLLVM talk it seemed that you would only need IR construction in the long running server process so you can hand over that IR to shorter lived/restartable worker processes doing the actual code generation. So the scope may be limited to this use case avoiding the need to fix all the transformation passes to recover gracefully.

Yes, I agree limiting the scope is necessary and has to be discussed with the people. Howver, on short term we would like to start with small patches that fixes some obvious flaws, like exchanging mallocs by raw new operators as you suggest

! In D34753#799802, @MatzeB wrote:

Specifically about this patch: So a failing malloc would now call the handler, but a failing new would throw an exception (or more likely abort() in the default no-exceptions configuration of LLVM. Maybe we should rather: Use new instead of malloc wherever possible. I'm not sure why this particular code uses malloc at all? For the cases that still use malloc (only good reason I can think of would be using realloc later) maybe we can trigger the same C++ exception as a failing new?

Good idea. I'll run through our malloc patches and check where I can replace the mallocs . So a new bad alloc handler is not necessary anymore, so I'm closing this patch request.

kkretzsch reclaimed this revision.Jul 7 2017, 7:51 AM

! In D34753#799802, @MatzeB wrote:

Specifically about this patch: So a failing malloc would now call the handler, but a failing new would throw an exception (or more likely abort() in the default no-exceptions configuration of LLVM. Maybe we should rather: Use new instead of malloc wherever possible. I'm not sure why this particular code uses malloc at all? For the cases that still use malloc (only good reason I can think of would be using realloc later) maybe we can trigger the same C++ exception as a failing new?

Good idea. I'll run through our malloc patches and check where I can replace the mallocs . So a new bad alloc handler is not necessary anymore, so I'm closing this patch request.

SmallPtrsSet and SmallVector uses realloc which means that the mallocs and frees in that code cannot be replaced with global operator new/delete. Coming back to your original comment, overriding the global ::operator new in a way that it calls the bad alloc handler in case of OOM would create a consistent behaviour between the malloc/realloc/calloc familiy and global ::operator new.

rnk accepted this revision.Jul 7 2017, 2:25 PM

Given that LLVM will contain calls to malloc and it may return null, I'm in favor of this. Even for users that don't install the handler, it turns an OOM crash into a fatal error. That seems good.

include/llvm/Support/ErrorHandling.h
100 ↗(On Diff #104418)

use clang-format to align the wrapped line

lib/Support/ErrorHandling.cpp
130 ↗(On Diff #104418)

Add a space after ','

This revision is now accepted and ready to land.Jul 7 2017, 2:25 PM
kkretzsch updated this revision to Diff 105847.Jul 10 2017, 7:25 AM
kkretzsch removed a reviewer: llvm-commits.
kkretzsch added a subscriber: llvm-commits.

Thanks Reid for reviewing. I updated this patch according to your review comments. In addition I noticed that the report_fatal_error function does allocations inside and is therefore not suitable to be called in the report_bad_alloc_error function. So I changed the default behavior to call an assert instead.

I do not have commit rights to push this change to llvm trunk. Therefore I need someone who is doing this for me.

Thanks,
Klaus

So how about

#include <new>
...
x = malloc(...);
if (x == nullptr)
  throw std::bad_alloc();

that way running out of memory in malloc and new would behave the same way.

include/llvm/Support/ErrorHandling.h
82 ↗(On Diff #105847)

Do not repeat the name of the function in the documentation like this (you see the pattern a lot in older code, but we avoid it in new code).

lib/Support/ErrorHandling.cpp
130 ↗(On Diff #105847)

How about:

#ifdef __EXCEPTIONS
  throw std::bad_alloc();
#else
  report_fatal_error("Unable to allocate memory.\n");
#endif

(__EXCEPTIONS is defined by clang/gcc when exceptions are enabled. An actual patch would probably better put that into llvm/Support/Compiler.h and define some LLVM specific constant like LLVM_ENABLE_EXCEPTIONS so we can add the msvc specific thing later).

lib/Support/Mutex.cpp
52–54 ↗(On Diff #105847)

We omit {} in llvm in simple if and loop statements.

Please, ignore the first part in my last reply that proposes immediately throwing an exception. I realized that we cannot compile that line while -fno-exceptions is enabled. Hence I proposed a different solution that could go as part of llvm::report_bad_alloc_error() (as I proposed later in the review).

kkretzsch updated this revision to Diff 106027.Jul 11 2017, 8:16 AM

Hi Matthias,
thanks for reviewing. I have updated the patch.

Klaus

rnk commandeered this revision.Jul 11 2017, 9:38 AM
rnk edited reviewers, added: kkretzsch; removed: rnk.

I made some minor modifications. Taking the revision to upload them so you can see them.

This revision now requires review to proceed.Jul 11 2017, 9:38 AM
rnk updated this revision to Diff 106053.Jul 11 2017, 9:39 AM
  • minor NFC changes
This revision was automatically updated to reflect the committed changes.
kkretzsch edited edge metadata.Jul 12 2017, 1:00 AM
In D34753#805362, @rnk wrote:

I made some minor modifications. Taking the revision to upload them so you can see them.

Thanks for commiting my change. One comment on your changes. The reason why I changed the fallback behavior to calling an assertion instead of the report_fatal_error function was that 'report_fatal_error' itself does some allocation, e.g. constucts a MutexGuard. This is problematic in an OOM situation.

rnk added a comment.Jul 12 2017, 9:22 AM
In D34753#805362, @rnk wrote:

I made some minor modifications. Taking the revision to upload them so you can see them.

Thanks for commiting my change. One comment on your changes. The reason why I changed the fallback behavior to calling an assertion instead of the report_fatal_error function was that 'report_fatal_error' itself does some allocation, e.g. constucts a MutexGuard. This is problematic in an OOM situation.

Hm. We probably don't want assert(false), though, since that's compiled out in a release build. exit(1) or abort() would be more appropriate.

The MutexGuard I added around the read of BadAllocHandler is also problematic, though. It's not really the MutexGuard that's problematic, though, it's the ManagedStatic<Mutex>, which conditionally allocates memory if we haven't touched the mutex before. The whole design there is troublesome. What we really need is a global statically initialized mutex.

In theory, std::mutex has a constexpr default constructor, so it fits the bill, but I assume there are still portability issues there. It might be worth addressing them. I think at this point, the only supported way to build LLVM with an STL that lacks <mutex> is to disable threading support (cmake -DLLVM_ENABLE_THREADS=OFF), so we can try using std::mutex. That said, we still have to take it on faith that std::mutex::lock doesn't allocate memory. Does that sound reasonable? Certainly pthread_mutex_lock and EnterCriticalSection don't look like they need it.

rnk added a comment.Jul 12 2017, 11:23 AM

Hopefully rL307827 addresses the memory allocation consideration.

In D34753#806738, @rnk wrote:
In D34753#805362, @rnk wrote:

I made some minor modifications. Taking the revision to upload them so you can see them.

Thanks for commiting my change. One comment on your changes. The reason why I changed the fallback behavior to calling an assertion instead of the report_fatal_error function was that 'report_fatal_error' itself does some allocation, e.g. constucts a MutexGuard. This is problematic in an OOM situation.

Hm. We probably don't want assert(false), though, since that's compiled out in a release build. exit(1) or abort() would be more appropriate.

I missed that point, you are right.

! In D34753#806738, @rnk wrote:

The MutexGuard I added around the read of BadAllocHandler is also problematic, though. It's not really the MutexGuard that's problematic, though, it's the ManagedStatic<Mutex>, which conditionally allocates memory if we haven't touched the mutex before. The whole design there is troublesome. What we really need is a global statically initialized mutex.

In theory, std::mutex has a constexpr default constructor, so it fits the bill, but I assume there are still portability issues there. It might be worth addressing them. I think at this point, the only supported way to build LLVM with an STL that lacks <mutex> is to disable threading support (cmake -DLLVM_ENABLE_THREADS=OFF), so we can try using std::mutex. That said, we still have to take it on faith that std::mutex::lock doesn't allocate memory. Does that sound reasonable? Certainly pthread_mutex_lock and EnterCriticalSection don't look like they need it.

In D34753#806906, @rnk wrote:

Hopefully rL307827 addresses the memory allocation consideration.

Yes, using the std::lock_guard on a static mutex variable should work, Thanks!