This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
llvm/lib/Support/Unix/
-
lib/
-
Support/
-
Unix/
6
Signals.inc

Differential D59005

Fix memory leak in CreateSigAltStack
Needs ReviewPublic

Authored by fez on Mar 5 2019, 4:00 PM.

Download Raw Diff

Details

Reviewers

jfb

Summary

In lib/Support/Unix/Signals.inc there is an occasional memory leak, whenever
the function CreateSigAltStack is called multiple times in certain conditions.

To reproduce the bug, just run

valgrind --leak-check=full opt -o /dev/null /dev/null

OS: Ubuntu 18.04
Valgrind version: 3.13.0
llvm version: 7.0.0 (opt-7 system installation in ubuntu, but for what I saw in the sources it affects llvm 8.0.0 as well)
You should be on a Unix system with sigaltstack available.

This is the leak report I get from Valgrind

$ valgrind --leak-check=full opt-7 -o /dev/null /dev/null
==27771== Memcheck, a memory error detector
==27771== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==27771== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==27771== Command: opt-7 -o /dev/null /dev/null
==27771==
==27771==
==27771== HEAP SUMMARY:
==27771==     in use at exit: 67,640 bytes in 3 blocks
==27771==   total heap usage: 3,488 allocs, 3,485 frees, 1,069,258 bytes
allocated
==27771==
==27771== 67,584 bytes in 1 blocks are definitely lost in loss record 3 of 3
==27771==    at 0x4C2FB0F: malloc (in
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==27771==    by 0x59622D3: ??? (in /usr/lib/x86_64-linux-gnu/libLLVM-7.so.1)
==27771==    by 0x5909CED: llvm::EnablePrettyStackTrace() (in
/usr/lib/x86_64-linux-gnu/libLLVM-7.so.1)
==27771==    by 0x58F2C21: llvm::InitLLVM::InitLLVM(int&, char const**&) (in
/usr/lib/x86_64-linux-gnu/libLLVM-7.so.1)
==27771==    by 0x1B43F7: main (in /usr/lib/llvm-7/bin/opt)
==27771==
==27771== LEAK SUMMARY:
==27771==    definitely lost: 67,584 bytes in 1 blocks
==27771==    indirectly lost: 0 bytes in 0 blocks
==27771==      possibly lost: 0 bytes in 0 blocks
==27771==    still reachable: 56 bytes in 2 blocks
==27771==         suppressed: 0 bytes in 0 blocks
==27771== Reachable blocks (those to which a pointer was found) are not shown.
==27771== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==27771==
==27771== For counts of detected and suppressed errors, rerun with: -v
==27771== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

I actually tracked it down in a version compiled from source with all the debug
symbols. The leak report is very similar, except that I get the exact info on
the leaking allocation:

==20704== HEAP SUMMARY:
==20704==     in use at exit: 1,020,675 bytes in 35 blocks
==20704==   total heap usage: 292,536 allocs, 292,501 frees, 49,368,073 bytes
allocated
==20704==
==20704== 67,584 bytes in 1 blocks are definitely lost in loss record 14 of 27
==20704==    at 0x4C2FB0F: malloc (in
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==20704==    by 0x893D63C: safe_malloc (MemAlloc.h:27)
==20704==    by 0x893D63C: CreateSigAltStack (Signals.inc:254)
==20704==    by 0x893D63C: RegisterHandlers() (Signals.inc:277)
==20704==    by 0x893DF9C: llvm::sys::AddSignalHandler(void (*)(void*), void*)
(Signals.inc:386)
==20704==    by 0x88E538D: RegisterCrashPrinter (PrettyStackTrace.cpp:180)
==20704==    by 0x88E538D: llvm::EnablePrettyStackTrace()
(PrettyStackTrace.cpp:188)
==20704==    by 0x88CFFED: PrettyStackTraceProgram (PrettyStackTrace.h:77)
==20704==    by 0x88CFFED: llvm::InitLLVM::InitLLVM(int&, char const**&)
(InitLLVM.cpp:25)
==20704==    by 0x41DF07: InitLLVM (InitLLVM.h:35)
==20704==    by 0x41DF07: main (opt.cpp:426)

The leak shows up whenever CreateSigAltStack is called multiple times, in some
specific conditions.

The first time CreateSigAltStack is called, a new alternate signal stack
isallocated with

AltStack.ss_sp = static_cast<char *>(safe_malloc(AltStackSize));

Then the value of AltStack.ss_sp is saved into NewAltStackPointer. According
to the comments, this is a workaround to stop valgrind or asan complaining about
a leak, but it's not enough. Here's why. Whenever CreateSigAltStack is called
a second time and it reaches past the first if, a new larger alternate stack is
allocated again with

AltStack.ss_sp = static_cast<char *>(safe_malloc(AltStackSize));

But at this point, the NewAltStackPointer is overwritten, and the old
alternate signal stack is never free'd, and we don't hold a pointer to it
anymore, causing the leak.

With this fix the leak should go away. The valgrind leak report report goes away
as well.

Diff Detail

Repository

rG LLVM Github Monorepo

Build Status

Buildable 28821
Build 28820: arc lint + arc unit

Event Timeline

fez created this revision.Mar 5 2019, 4:00 PM

Herald added a reviewer: jfb. · View Herald TranscriptMar 5 2019, 4:00 PM

Herald added a project: Restricted Project. · View Herald Transcript

Herald added subscribers: llvm-commits, jdoerfert, hiraditya. · View Herald Transcript

Harbormaster completed remote builds in B28821: Diff 189416.Mar 5 2019, 4:01 PM

fez added inline comments.Mar 5 2019, 4:08 PM

llvm/lib/Support/Unix/Signals.inc
255	The leak shows up here, whenever the execution reaches this point two times in a single execution. The first time is fine, but the second time the first alternate signal stack leaks, because `free` is not called before reassigning `NewAltStackPointer`.

This seems wrong. How is this point being reached twice? It's called from here:

static void RegisterHandlers() { // Not signal-safe.
  // The mutex prevents other threads from registering handlers while we're
  // doing it. We also have to protect the handlers and their count because
  // a signal handler could fire while we're registeting handlers.
  static ManagedStatic<sys::SmartMutex<true>> SignalHandlerRegistrationMutex;
  sys::SmartScopedLock<true> Guard(*SignalHandlerRegistrationMutex);

  // If the handlers are already registered, we're done.
  if (NumRegisteredSignals.load() != 0)
    return;

  // Create an alternate stack for signal handling. This is necessary for us to
  // be able to reliably handle signals due to stack overflow.
  CreateSigAltStack();

Is the mutex wrong? It should ensure that the alt stack is only ever set up once.

Is the mutex wrong? It should ensure that the alt stack is only ever set up once.

The mutex is fine, but it only guarantees that only one thread at any given time enters CreateSigAltStack. Nothing prevents the entire RegisterHandlers function to be called multiple times, by a single thread, which causes the execution to reach line 255 multiple times in a single execution of the same thread. This is actually exactly what's happening in my debug runs. It's not a concurrency issue (which would be effectively prevented by the mutex), it's just a memory leak in a single thread.

But now I'm puzzled by this check.

// If the handlers are already registered, we're done.
if (NumRegisteredSignals.load() != 0)
  return;

This should prevent the call to CreateSigAltStack to be reached the second time.

I need to investigate further.

In D59005#1420436, @fez wrote:
But now I'm puzzled by this check.
// If the handlers are already registered, we're done.
if (NumRegisteredSignals.load() != 0)
  return;
This should prevent the call to CreateSigAltStack to be reached the second time.

I need to investigate further.

Indeed, that's why I was confused :)

And NumRegisteredSignals uses sequentially-consistent operations on purpose: it's technically slower but way easier to understand. What could be happening is:

// The signal handler that runs.
static RETSIGTYPE SignalHandler(int Sig) {
  // Restore the signal behavior to default, so that the program actually
  // crashes when we return and the signal reissues.  This also ensures that if
  // we crash in our signal handler that the program will terminate immediately
  // instead of recursing in the signal handler.
  UnregisterHandlers();

if you deliver another signal after unregistering all of them, you get the behavior you observed? I'm not sure whether we care about clang-as-a-library being loaded / unloaded in the same program. If not, we could limit signal registration to be a one-time thing.

if you deliver another signal after unregistering all of them, you get the behavior you observed? I'm not sure whether we care about clang-as-a-library being loaded / unloaded in the same program. If not, we could limit signal registration to be a one-time thing.

That's right. That is exactly what happened when I first stumbled on the problem.

I tried to simplify it to a minimal reproducible example with Valgrind. Here's how I realized that even running valgrind --leak-check=full opt -o /dev/null /dev/null reported the leak, whereas with my fix the error report was going away (BTW have you had the chance to try it? It runs in just a few seconds even with Valgrind).

I think there are two things to point out about this situation.

With the suggested fix the error report from Valgrind goes away _and_ also the memory leak with register/unregister goes away, which means that yes, you could prevent a leak if someone has clang-as-a-library loaded/unloaded in the same program (even if I admit that is not something many people are going to do, even in my case I ended up doing something else).
Valgrind is reporting a false positive on the memory leak with valgrind --leak-check=full opt -o /dev/null /dev/null. I just double-checked, with that command line RegisterHandlers is being called only once, which means that there isn't actually any memory leak. Anyway, the fix makes the false positive effectively go away, which I think is a nice thing because in this way you get clean runs with Valgrind.

Both things might not be all that important, but the patch is pretty self-contained and I don't see any downsides.

What do you think?

jfb added inline comments.Mar 6 2019, 2:59 PM

llvm/lib/Support/Unix/Signals.inc
239	This comment leads me to believe that the leak is quite intentional. What happens if you free the alt stack while it's being used by another signal handler?
255	IIUC you need to trigger two signals? How does this happen? What are the signals that trigger the leak? Do they happen concurrently? Maybe the answer is to hold on to the alt stacks, and free them only when there's no pending signal (or not free them, ever). That seems error-prone, and the point of my refactoring was to reduce the number of existing errors in the signal handler code (which was pretty high before). Before we do something I think we need to understand exactly what triggers this.

fez marked an inline comment as done.Mar 6 2019, 4:18 PM

fez added inline comments.

llvm/lib/Support/Unix/Signals.inc
239	Good point. For this to happen we need to assume we're are in a multi-threaded scenario, with more than one thread trying to register signal handlers and alt stacks for them. If we assume such a scenario, then at the moment there actually is a memory leak: thread 0 calls `CreateSigAltStack`, and reserves a new alt stack, storing a pointer to it in the `static void *NewAltStackPointer`. later (never at the same time because of the mutex) thread 1 calls `CreateSigAltStack`, and reserves another new separate alt stack, storing a pointer to it again in `NewAltStackPointer`, which is `static`, hence shared among all threads. This shared `static` global gets overwritten by thread 1, effectively leaking the alt stack of thread 0. The easiest fix I can think of, is to change `OldAltStack` and `NewAltStackPointer` to be `thread_local static` instead of just `static`. In this way we avoid both the leak (because thread 1 does not overwrite the `NewAltStackPointer` version of thread 0) and the race condition caused by one thread to free the alt stack of another thread (because each thread only allocates and frees its own `thread_local` alt stack, pointed-to by the `thread_local` version of `NewAltStackPointer`). With this change, we could even think of moving the lock in `RegisterHandlers` right before for (auto S : IntSigs) registerHandler(S); This should be fine, because with `CreateSigAltStack` only manipulating local and `thread_local` variables there should be no race condition anymore in `CreateSigAltStack`.

fez marked an inline comment as done.Mar 6 2019, 4:38 PM

fez added inline comments.

llvm/lib/Support/Unix/Signals.inc
255	IIUC you need to trigger two signals? How does this happen? What are the signals that trigger the leak? Do they happen concurrently? No, the leak is not triggered by incoming signals, but by the fact that `CreateSigAltStack` is called multiple times. But because of the check `if (NumRegisteredSignals.load() != 0)` this can never actually happen in a multi threaded program (and neither in a single threaded one), unless for some reasons the signal handlers are initially registered (causing `NumRegisteredSignals` to be `!= 0`), then unregistered (causing `NumRegisteredSignals` to drop to 0), and then re-registered again (because after the de-registration, the check on `NumRegisteredSignals.load != 0` fails a second time). I realize this is an extremely unlikely scenario. At this point the only immediate advantage I see is just cleaning up Valgrind's report from this subtle false positive, which I still think is a nice thing. I think there are three options: drop the patch, which leaves Valgrind complaining and effectively leaks in case of register/unregister/re-register; accept it like it is now, which breaks multi-threaded programs if a thread frees another's alt stack like you pointed out; consider the implications of my proposal above about `thread_local`, which should both fix Valgrind's false positive memory leak report, leave the multi-threaded scenario safe, and also support the register/unregister/re-register scenario (both for single- and multi-threaded). What do you think?

fez marked 2 inline comments as not done.Mar 6 2019, 4:38 PM

jfb added inline comments.Mar 7 2019, 9:10 AM

llvm/lib/Support/Unix/Signals.inc
325	I think this is the root of your issue: the signal handlers are de-registered with the expectation that LLVM will never handle signals ever again. Clearly that's not true. What signals are being handled? i.e. to get the bug you're seeing, can you log `Sig` every time it occurs. What are they? Maybe we're installing non-fatal handlers (that's fine!) but then we can only unregister handlers for fatal signals, not for non-fatal ones.

Revision Contents

Path

Size

llvm/

lib/

Support/

Unix/

Signals.inc

7 lines

Diff 189416

llvm/lib/Support/Unix/Signals.inc

	Show First 20 Lines • Show All 230 Lines • ▼ Show 20 Lines
	} RegisteredSignalInfo[array_lengthof(IntSigs) + array_lengthof(KillSigs)];			} RegisteredSignalInfo[array_lengthof(IntSigs) + array_lengthof(KillSigs)];

	#if defined(HAVE_SIGALTSTACK)			#if defined(HAVE_SIGALTSTACK)
	// Hold onto both the old and new alternate signal stack so that it's not			// Hold onto both the old and new alternate signal stack so that it's not
	// reported as a leak. We don't make any attempt to remove our alt signal			// reported as a leak. We don't make any attempt to remove our alt signal
	// stack if we remove our signal handlers; that can't be done reliably if			// stack if we remove our signal handlers; that can't be done reliably if
	// someone else is also trying to do the same thing.			// someone else is also trying to do the same thing.
	static stack_t OldAltStack;			static stack_t OldAltStack;
	static void* NewAltStackPointer;			static void* NewAltStackPointer;
				jfbUnsubmitted Not Done Reply Inline Actions This comment leads me to believe that the leak is quite intentional. What happens if you free the alt stack while it's being used by another signal handler? jfb: This comment leads me to believe that the leak is quite intentional. What happens if you free…
				fezAuthorUnsubmitted Not Done Reply Inline Actions Good point. For this to happen we need to assume we're are in a multi-threaded scenario, with more than one thread trying to register signal handlers and alt stacks for them. If we assume such a scenario, then at the moment there actually is a memory leak: thread 0 calls `CreateSigAltStack`, and reserves a new alt stack, storing a pointer to it in the `static void NewAltStackPointer`. later (never at the same time because of the mutex) thread 1 calls `CreateSigAltStack`, and reserves another new separate alt stack, storing a pointer to it again in `NewAltStackPointer`, which is `static`, hence shared among all threads. This shared `static` global gets overwritten by thread 1, effectively leaking the alt stack of thread 0. The easiest fix I can think of, is to change `OldAltStack` and `NewAltStackPointer` to be `thread_local static` instead of just `static`. In this way we avoid both the leak (because thread 1 does not overwrite the `NewAltStackPointer` version of thread 0) and the race condition caused by one thread to free the alt stack of another thread (because each thread only allocates and frees its own `thread_local` alt stack, pointed-to by the `thread_local` version of `NewAltStackPointer`). With this change, we could even think of moving the lock in `RegisterHandlers` right before for (auto S : IntSigs) registerHandler(S); This should be fine, because with `CreateSigAltStack` only manipulating local and `thread_local` variables there should be no race condition anymore in `CreateSigAltStack`. fez:* Good point. For this to happen we need to assume we're are in a multi-threaded scenario, with…

	static void CreateSigAltStack() {			static void CreateSigAltStack() {
	const size_t AltStackSize = MINSIGSTKSZ + 64 * 1024;			const size_t AltStackSize = MINSIGSTKSZ + 64 * 1024;

	// If we're executing on the alternate stack, or we already have an alternate			// If we're executing on the alternate stack, or we already have an alternate
	// signal stack that we're happy with, there's nothing for us to do. Don't			// signal stack that we're happy with, there's nothing for us to do. Don't
	// reduce the size, some other part of the process might need a larger stack			// reduce the size, some other part of the process might need a larger stack
	// than we do.			// than we do.
	if (sigaltstack(nullptr, &OldAltStack) != 0 \|\|			if (sigaltstack(nullptr, &OldAltStack) != 0 \|\|
	OldAltStack.ss_flags & SS_ONSTACK \|\|			OldAltStack.ss_flags & SS_ONSTACK \|\|
	(OldAltStack.ss_sp && OldAltStack.ss_size >= AltStackSize))			(OldAltStack.ss_sp && OldAltStack.ss_size >= AltStackSize))
	return;			return;

	stack_t AltStack = {};			stack_t AltStack = {};
	AltStack.ss_sp = static_cast<char *>(safe_malloc(AltStackSize));			AltStack.ss_sp = static_cast<char *>(safe_malloc(AltStackSize));
	NewAltStackPointer = AltStack.ss_sp; // Save to avoid reporting a leak.
	fezAuthorUnsubmitted Not Done Reply Inline Actions The leak shows up here, whenever the execution reaches this point two times in a single execution. The first time is fine, but the second time the first alternate signal stack leaks, because `free` is not called before reassigning `NewAltStackPointer`. fez: The leak shows up here, whenever the execution reaches this point two times in a single…
	jfbUnsubmitted Not Done Reply Inline Actions IIUC you need to trigger two signals? How does this happen? What are the signals that trigger the leak? Do they happen concurrently? Maybe the answer is to hold on to the alt stacks, and free them only when there's no pending signal (or not free them, ever). That seems error-prone, and the point of my refactoring was to reduce the number of existing errors in the signal handler code (which was pretty high before). Before we do something I think we need to understand exactly what triggers this. jfb: IIUC you need to trigger two signals? How does this happen? What are the signals that trigger…
	fezAuthorUnsubmitted Not Done Reply Inline Actions IIUC you need to trigger two signals? How does this happen? What are the signals that trigger the leak? Do they happen concurrently? No, the leak is not triggered by incoming signals, but by the fact that `CreateSigAltStack` is called multiple times. But because of the check `if (NumRegisteredSignals.load() != 0)` this can never actually happen in a multi threaded program (and neither in a single threaded one), unless for some reasons the signal handlers are initially registered (causing `NumRegisteredSignals` to be `!= 0`), then unregistered (causing `NumRegisteredSignals` to drop to 0), and then re-registered again (because after the de-registration, the check on `NumRegisteredSignals.load != 0` fails a second time). I realize this is an extremely unlikely scenario. At this point the only immediate advantage I see is just cleaning up Valgrind's report from this subtle false positive, which I still think is a nice thing. I think there are three options: drop the patch, which leaves Valgrind complaining and effectively leaks in case of register/unregister/re-register; accept it like it is now, which breaks multi-threaded programs if a thread frees another's alt stack like you pointed out; consider the implications of my proposal above about `thread_local`, which should both fix Valgrind's false positive memory leak report, leave the multi-threaded scenario safe, and also support the register/unregister/re-register scenario (both for single- and multi-threaded). What do you think? fez: > IIUC you need to trigger two signals? How does this happen? What are the signals that trigger…
	AltStack.ss_size = AltStackSize;			AltStack.ss_size = AltStackSize;
	if (sigaltstack(&AltStack, &OldAltStack) != 0)			if (sigaltstack(&AltStack, &OldAltStack) != 0) {
	free(AltStack.ss_sp);			free(AltStack.ss_sp);
				} else {
				free(NewAltStackPointer);
				NewAltStackPointer = AltStack.ss_sp; // Save to avoid reporting a leak.
				}
	}			}
	#else			#else
	static void CreateSigAltStack() {}			static void CreateSigAltStack() {}
	#endif			#endif

	static void RegisterHandlers() { // Not signal-safe.			static void RegisterHandlers() { // Not signal-safe.
	// The mutex prevents other threads from registering handlers while we're			// The mutex prevents other threads from registering handlers while we're
	// doing it. We also have to protect the handlers and their count because			// doing it. We also have to protect the handlers and their count because
	▲ Show 20 Lines • Show All 47 Lines • ▼ Show 20 Lines
	}			}

	// The signal handler that runs.			// The signal handler that runs.
	static RETSIGTYPE SignalHandler(int Sig) {			static RETSIGTYPE SignalHandler(int Sig) {
	// Restore the signal behavior to default, so that the program actually			// Restore the signal behavior to default, so that the program actually
	// crashes when we return and the signal reissues. This also ensures that if			// crashes when we return and the signal reissues. This also ensures that if
	// we crash in our signal handler that the program will terminate immediately			// we crash in our signal handler that the program will terminate immediately
	// instead of recursing in the signal handler.			// instead of recursing in the signal handler.
	UnregisterHandlers();			UnregisterHandlers();
				jfbUnsubmitted Not Done Reply Inline Actions I think this is the root of your issue: the signal handlers are de-registered with the expectation that LLVM will never handle signals ever again. Clearly that's not true. What signals are being handled? i.e. to get the bug you're seeing, can you log `Sig` every time it occurs. What are they? Maybe we're installing non-fatal handlers (that's fine!) but then we can only unregister handlers for fatal signals, not for non-fatal ones. jfb: I think this is the root of your issue: the signal handlers are de-registered with the…

	// Unmask all potentially blocked kill signals.			// Unmask all potentially blocked kill signals.
	sigset_t SigMask;			sigset_t SigMask;
	sigfillset(&SigMask);			sigfillset(&SigMask);
	sigprocmask(SIG_UNBLOCK, &SigMask, nullptr);			sigprocmask(SIG_UNBLOCK, &SigMask, nullptr);

	{			{
	RemoveFilesToRemove();			RemoveFilesToRemove();
	▲ Show 20 Lines • Show All 255 Lines • Show Last 20 Lines