This is an archive of the discontinued LLVM Phabricator instance.


Differential D39619

Correct atexit(3) support in TSan/NetBSD
ClosedPublic

Authored by krytarowski on Nov 3 2017, 2:27 PM.


Details

Reviewers

vitalybuka
dvyukov
joerg
kcc
eugenis

Commits

rG2fd314e2e2e6: Correct atexit(3) support in TSan/NetBSD
rCRT317735: Correct atexit(3) support in TSan/NetBSD
rL317735: Correct atexit(3) support in TSan/NetBSD

Summary

The NetBSD-specific implementation of __cxa_atexit() does not
preserve the 2nd argument if dso is equal to NULL.

Changes:

Split the paths for handling intercepted __cxa_atexit() and atexit(3). This affects all supported operating systems.
Add a local stack-like structure to hold the __cxa_atexit() context. atexit(3) is documented in the C standard as calling callbacks in reverse order of registration, from the most recently registered to the oldest entry. This path also fixes a potential ABI problem of passing an argument to a function from the atexit(3) callback mechanism.
Add a new test to ensure LIFO ordering of atexit(3) callbacks: atexit3.cc

A proposal to change the behavior of __cxa_atexit() in NetBSD has been rejected.

With the above changes TSan/NetBSD with the current tsan_interceptors.cc
can bootstrap into operation.
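The split described in the summary can be sketched outside of TSan as follows. This is only an illustration under stated assumptions: `fake_cxa_atexit`, `fake_exit`, `Ctx`, `shadow_stack` and the wrapper names are hypothetical stand-ins, and the fake registrar merely imitates the NetBSD behavior described above by dropping the argument when dso is null.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a libc callback list that, like the NetBSD
// behavior described above, drops the argument when dso is null.
struct FakeEntry {
  void (*fn)(void *);
  void *arg;
};
static std::vector<FakeEntry> fake_libc_list;

static int fake_cxa_atexit(void (*fn)(void *), void *arg, void *dso) {
  fake_libc_list.push_back({fn, dso ? arg : nullptr});
  return 0;
}

// Per-callback context; plays the role of AtExitCtx.
struct Ctx {
  std::string *log;
  const char *tag;
};
static std::vector<Ctx *> shadow_stack;  // plays the role of AtExitStack

static void run_and_free(Ctx *ctx) {
  *ctx->log += ctx->tag;
  delete ctx;
}

// Path for dso == null: the argument is lost, so pop from the shadow stack.
static void wrapper_without_arg(void *) {
  Ctx *ctx = shadow_stack.back();
  shadow_stack.pop_back();
  run_and_free(ctx);
}

// Path for dso != null: the argument survives and carries the context.
static void wrapper_with_arg(void *arg) { run_and_free(static_cast<Ctx *>(arg)); }

static int register_callback(std::string *log, const char *tag, void *dso) {
  Ctx *ctx = new Ctx{log, tag};
  int res;
  if (!dso) {
    res = fake_cxa_atexit(wrapper_without_arg, nullptr, nullptr);
    if (res == 0) shadow_stack.push_back(ctx);
  } else {
    res = fake_cxa_atexit(wrapper_with_arg, ctx, dso);
  }
  if (res != 0) delete ctx;
  return res;
}

// Runs the fake libc list in LIFO order, as exit() would.
static void fake_exit() {
  while (!fake_libc_list.empty()) {
    FakeEntry e = fake_libc_list.back();
    fake_libc_list.pop_back();
    e.fn(e.arg);
  }
}

// Registers three callbacks, two without a dso and one with, then "exits".
std::string demo_split_paths() {
  std::string log;
  int dso_token = 0;  // any non-null address serves as a dso handle here
  register_callback(&log, "a", nullptr);
  register_callback(&log, "b", &dso_token);
  register_callback(&log, "c", nullptr);
  fake_exit();
  return log;
}
```

Because both the fake libc list and the shadow stack are LIFO, each argument-less wrapper pops exactly the context that was pushed when it was registered, and the overall order stays the reverse of registration.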

Diff Detail

Repository: rL LLVM

Event Timeline

krytarowski created this revision. Nov 3 2017, 2:27 PM

Herald added a subscriber: kubamracek. Nov 3 2017, 2:28 PM

krytarowski edited the summary of this revision. Nov 3 2017, 2:29 PM

thanks for splitting.
I don't have a particular opinion on the patch, just some nits.
Maybe wait for @dvyukov input?

lib/sanitizer_common/sanitizer_allocator.cc
196 ↗	(On Diff #121552)	!naddr
lib/tsan/rtl/tsan_interceptors.cc
420	could you please move both vars closer to their use?
432	Dmitry, can this be just "Acquire(cur_thread(), 0, (uptr)ctx);" ?
468	maybe int res = REAL(__cxa_atexit)((void (*)(void *a))at_exit_wrapper, dso ? ctx : 0, dso); if (!dso) { ... }

Maybe wait for @dvyukov input?

It would be nice to get feedback from !NetBSD tests whether it works correctly.

Both before and after my change, this code hovers over OS-specific implementations and assumptions.

lib/tsan/rtl/tsan_interceptors.cc
468	There are now two `at_exit_wrapper` functions so I would need to inline the `?:` operator into the first argument as well.

Is there any reason to keep at_exit and __cxa_atexit handling merged? They are pretty much disjoint code paths, especially since the at_exit stack means that the real atexit can be used.

atexit(3) on NetBSD internally calls __cxa_atexit(), which can be intercepted.

Certainly there is some way to support real atexit(3), but it has been disabled explicitly perhaps for the same reason.

Another reason is that with two more distinct paths we can get more code duplication for no good reason.

For the sake of purity (like removal of casting function pointers) we generate extra new code, new structs (or unions) etc.

I decided to go this path in this revision when I noticed that NetBSD casts function pointers in the atexit(3) implementation.
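For illustration only, the cast-free alternative mentioned above (extra structs or unions instead of function pointer casts) might look roughly like this. It is a sketch and not code from the patch; `ExitCallback`, `make_plain`, `make_with_arg` and `invoke` are hypothetical names.

```cpp
// A tagged callback record: the flag says which union member is active,
// so each function is always called through a pointer of its own type.
struct ExitCallback {
  bool takes_arg;
  union {
    void (*plain)();           // atexit(3)-style callback
    void (*with_arg)(void *);  // __cxa_atexit-style callback
  };
  void *arg;
};

ExitCallback make_plain(void (*f)()) {
  ExitCallback cb;
  cb.takes_arg = false;
  cb.plain = f;
  cb.arg = nullptr;
  return cb;
}

ExitCallback make_with_arg(void (*f)(void *), void *arg) {
  ExitCallback cb;
  cb.takes_arg = true;
  cb.with_arg = f;
  cb.arg = arg;
  return cb;
}

// Dispatches on the tag instead of casting one pointer type to the other.
void invoke(const ExitCallback &cb) {
  if (cb.takes_arg)
    cb.with_arg(cb.arg);
  else
    cb.plain();
}

static int g_total = 0;
static void add_one() { g_total += 1; }
static void add_value(void *p) { g_total += *static_cast<int *>(p); }

// Invokes one callback of each kind and returns the accumulated total.
int demo_tagged_callbacks() {
  g_total = 0;
  int ten = 10;
  invoke(make_plain(add_one));
  invoke(make_with_arg(add_value, &ten));
  return g_total;
}
```

The cost is the extra record type and dispatch code, which is the duplication being weighed in this thread.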

Doesn't __cxa_atexit call callbacks in the same order as atexit? You said that atexit is implemented by means of __cxa_atexit.
As far as I remember, __cxa_atexit is called with dso=0 for destructors in main executable. If so, order of execution will depend on if the dtor is in main executable or not, which looks wrong.
Is it really necessary to use different call mechanisms? Please add a test for LIFO atexit order, if it works then we can keep the current code.

Re InternalReallocArr, we already have Vector<T> class in tsan_vector.h for this. So if this code is really necessary, we should use Vector (probably will require declaring ctor as constexpr if we need a global instance).

No, __cxa_atexit will always reference the DSO handle. That exists even in the main executable.

In D39619#916666, @joerg wrote:

No, __cxa_atexit will always reference the DSO handle. That exists even in the main executable.

Ack. Probably confused it with __cxa_finalize.

The behavior of atexit(3) LIFO has been defined in the C standard.

I'm going to switch to the internal vector container and add a dedicated test to verify that this is really LIFO.

The behavior of atexit(3) LIFO has been defined in the C standard.

C++ also defines order of destruction of static and thread-local objects as LIFO:

3.6.3/1
If the completion of the constructor or dynamic initialization of an object with static storage
duration is sequenced before that of another, the completion of the destructor of the second is sequenced
before the initiation of the destructor of the first.

So I think we can assume they all are LIFO.

Hmm, I might be doing something incorrectly, but there is a problem with Vector. We fire the destructor for this LIFO container before actually calling the atexit(3) functions from libc.

diff --git a/lib/tsan/rtl/tsan_interceptors.cc b/lib/tsan/rtl/tsan_interceptors.cc
index 0b4e873a0..106a3fd0e 100644
--- a/lib/tsan/rtl/tsan_interceptors.cc
+++ b/lib/tsan/rtl/tsan_interceptors.cc
@@ -391,7 +391,22 @@ struct AtExitCtx {
   void *arg;
 };
 
-static void at_exit_wrapper(void *arg) {
+Vector<struct AtExitCtx *> AtExitStack(MBlockAtExit);
+
+static void at_exit_wrapper() {
+  ThreadState *thr = cur_thread();
+  uptr pc = 0;
+
+  // Pop AtExitCtx from the top of the stack of callback functions
+  AtExitCtx *ctx = AtExitStack[AtExitStack.Size() - 1];
+  AtExitStack.PopBack();
+
+  Acquire(thr, pc, (uptr)ctx);
+  ((void(*)())ctx->f)();
+  InternalFree(ctx);
+}
+
+static void cxa_at_exit_wrapper(void *arg) {
   ThreadState *thr = cur_thread();
   uptr pc = 0;
   Acquire(thr, pc, (uptr)arg);
@@ -430,7 +445,18 @@ static int setup_at_exit_wrapper(ThreadState *thr, uptr pc, void(*f)(),
   // Memory allocation in __cxa_atexit will race with free during exit,
   // because we do not see synchronization around atexit callback list.
   ThreadIgnoreBegin(thr, pc);
-  int res = REAL(__cxa_atexit)(at_exit_wrapper, ctx, dso);
+  int res;
+  if (dso == 0) {
+    // NetBSD does not preserve the 2nd argument if dso is equal to 0
+    // Store ctx in a local stack-like structure (LIFO order)
+    res = REAL(__cxa_atexit)((void (*)(void *a))at_exit_wrapper, 0, 0);
+    // Push AtExitCtx on the top of the stack of callback functions
+    if (res == 0) {
+      AtExitStack.PushBack(ctx);
+    }
+  } else {
+    res = REAL(__cxa_atexit)(cxa_at_exit_wrapper, ctx, dso);
+  }
   ThreadIgnoreEnd(thr, pc);
   return res;
 }

This means that we need to use internal_malloc() like in the proposed patch.
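One way to keep such a container alive through exit processing, which the InterceptorContext comment in tsan_interceptors.cc also describes, is to construct it with placement new in static storage and never run its destructor. A minimal sketch of that pattern, assuming hypothetical names (`ExitState`, `exit_state_storage`, `exit_state`) and using std::vector in place of the sanitizer Vector:

```cpp
#include <new>
#include <vector>

// Hypothetical stand-in for the runtime's global interceptor state.
struct ExitState {
  std::vector<int> stack;
};

// Raw, suitably aligned storage with static duration. No destructor is
// registered for it, so nothing tears it down during exit processing.
alignas(ExitState) static unsigned char exit_state_storage[sizeof(ExitState)];
static ExitState *exit_state_ptr = nullptr;

// Constructs the state on first use with placement new and never destroys it.
// Not thread-safe as written; a real runtime would construct it once during
// single-threaded initialization.
ExitState *exit_state() {
  if (!exit_state_ptr)
    exit_state_ptr = new (exit_state_storage) ExitState();
  return exit_state_ptr;
}
```

Since the object is never destroyed, callbacks that run late during exit can still pop from `exit_state()->stack`.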

krytarowski updated this revision to Diff 121751. Nov 6 2017, 10:02 AM

krytarowski edited the summary of this revision.

Hmm, I might be doing something incorrectly, but there is a problem with Vector. We fire the destructor for this LIFO container before actually calling the atexit(3) functions from libc.

Right, there is also the dtor.
I've mailed https://reviews.llvm.org/D39721 which should enable this and all other similar use-cases.
Spreading this "linker-initialized ctors/no dtors plague" throughout the code base will lead to maintenance pain long term. This is just a small, local functionality and you had to write a custom container for it.

No need for a custom container, just allocate the vector dynamically and free it when it becomes empty.
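The suggestion above (allocate the vector dynamically and free it when it becomes empty) could be sketched like this. `LazyStack` is a hypothetical name, and std::vector with plain new/delete stands in for the sanitizer-internal Vector and allocator.

```cpp
#include <vector>

// A stack whose backing vector exists only while it holds elements.
// The object itself holds just a pointer, so a global instance needs
// neither a dynamic constructor nor a destructor.
class LazyStack {
 public:
  LazyStack() = default;
  LazyStack(const LazyStack &) = delete;
  LazyStack &operator=(const LazyStack &) = delete;

  void Push(int v) {
    if (!items_) items_ = new std::vector<int>();
    items_->push_back(v);
  }

  // Pops the most recently pushed value; returns false if the stack is empty.
  bool Pop(int *out) {
    if (!items_) return false;  // invariant: allocated implies non-empty
    *out = items_->back();
    items_->pop_back();
    if (items_->empty()) {
      delete items_;
      items_ = nullptr;
    }
    return true;
  }

  bool allocated() const { return items_ != nullptr; }

 private:
  std::vector<int> *items_ = nullptr;
};
```

With this shape nothing is left to destroy at exit once the last callback has been popped, which is why no custom container is needed.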

dvyukov added inline comments. Nov 7 2017, 4:23 AM

lib/tsan/rtl/tsan_interceptors.cc
472	This needs a mutex. at_exit_wrapper too. C++ allows concurrent callbacks, not sure if anybody does it, but at least it races with setup_at_exit_wrapper.

krytarowski updated this revision to Diff 121950. Nov 7 2017, 11:53 AM

krytarowski edited the summary of this revision.

krytarowski marked an inline comment as done.

dvyukov added inline comments. Nov 7 2017, 11:55 PM

lib/tsan/rtl/tsan_interceptors.cc
413	This also needs the mutex for 2 reasons:
1. It can race with setup_at_exit_wrapper in other threads.
2. You push onto the stack _after_ calling __cxa_atexit, so if another thread calls exit in between, at_exit_wrapper can see an empty stack.

krytarowski added inline comments. Nov 8 2017, 7:39 AM

lib/tsan/rtl/tsan_interceptors.cc
413	atexit(3) callbacks are global, not per-thread. I'm not convinced that races for 1 can impact execution of `at_exit_wrapper()`. I agree that someone might try to keep registering new atexit(3) callbacks inside atexit(3) callback functions. I don't know whether this behavior is even defined or seen in the real world. I think that a function-scoped mutex can cause a deadlock with the callback registration mutex, so I will protect only the access to `AtExitStack`.

This comment has been deleted.

lib/tsan/rtl/tsan_interceptors.cc
413	> I'm not convinced that races for 1 can impact execution of at_exit_wrapper().
Why? One thread executes callbacks, another registers new callbacks. AtExitStack is corrupted as a result. Everything explodes.
> I agree that someone might try to keep registering new atexit(3) callbacks inside atexit(3) callback functions.
That's less of an issue. The problem is other threads that know nothing about the process exiting yet and continue registering callbacks.

dvyukov added inline comments. Nov 8 2017, 8:01 AM

lib/tsan/rtl/tsan_interceptors.cc
413	And the second problem is: a thread registers a callback with __cxa_atexit but doesn't push onto AtExitStack yet; another thread calls exit and at_exit_wrapper() discovers an empty AtExitStack.
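The locking discipline being asked for here can be modeled in isolation. In this sketch, registration holds the mutex across both the (simulated) libc registration and the push, and the wrapper takes the same mutex before popping, so a wrapper that libc has already started waits until its context is on the stack. `GuardedExitList` and its members are hypothetical; an atomic counter stands in for the libc callback list, and std::mutex for BlockingMutex.

```cpp
#include <atomic>
#include <mutex>
#include <vector>

class GuardedExitList {
 public:
  // Registration: the lock covers both steps, so no wrapper can observe
  // "registered with libc but not yet on the shadow stack".
  void Register(int id) {
    std::lock_guard<std::mutex> lock(mu_);
    libc_count_.fetch_add(1);  // stand-in for REAL(__cxa_atexit)(...)
    shadow_.push_back(id);
  }

  // Models exit processing running one registered wrapper.
  // Returns false if nothing is registered.
  bool RunOne(int *id) {
    // libc side: claim one registered wrapper, outside of our mutex.
    int n = libc_count_.load();
    while (n > 0 && !libc_count_.compare_exchange_weak(n, n - 1)) {
    }
    if (n == 0) return false;
    // Wrapper side: blocks while a Register() call is in progress, so the
    // shadow stack is non-empty once the lock is held.
    std::lock_guard<std::mutex> lock(mu_);
    *id = shadow_.back();
    shadow_.pop_back();
    return true;
  }

 private:
  std::mutex mu_;
  std::atomic<int> libc_count_{0};
  std::vector<int> shadow_;
};
```

If the push happened after the lock was released, or with no lock at all, a claim could succeed while the stack is still empty, which is the second problem described above.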

Protect AtExitStack with a mutex in at_exit_wrapper().

dvyukov accepted this revision. Nov 8 2017, 8:22 AM

This revision is now accepted and ready to land. Nov 8 2017, 8:22 AM

Thanks for review, I will commit it tonight!

The remaining problems with TSan/NetBSD:

Mutex tracking is broken: mutexes seem not to be registered (in ScopedReport::AddMemoryAccess we always get mset->Size() equal to 0)
Thread Detach and Thread Joined can race and assert that they go in the wrong order (we destroy a thread before marking it Finished)

Less important ones:

setjmp/longjmp unimplemented for NetBSD
StartBackgroundThread() is not executed (needs to be deferred)

krytarowski closed this revision. Nov 8 2017, 2:34 PM

Revision Contents

Path                                 Size
lib/tsan/rtl/tsan_interceptors.cc    57 lines
test/tsan/atexit3.cc                 41 lines

Diff 122099

lib/tsan/rtl/tsan_interceptors.cc

@@ (225 lines hidden) @@ struct ThreadSignalContext {
   atomic_uintptr_t in_blocking_func;
   atomic_uintptr_t have_pending_signals;
   SignalDesc pending_signals[kSigCount];
   // emptyset and oldset are too big for stack.
   __sanitizer_sigset_t emptyset;
   __sanitizer_sigset_t oldset;
 };
 
+// The sole reason tsan wraps atexit callbacks is to establish synchronization
+// between callback setup and callback execution.
+struct AtExitCtx {
+  void (*f)();
+  void *arg;
+};
+
 // InterceptorContext holds all global data required for interceptors.
 // It's explicitly constructed in InitializeInterceptors with placement new
 // and is never destroyed. This allows usage of members with non-trivial
 // constructors and destructors.
 struct InterceptorContext {
   // The object is 64-byte aligned, because we want hot data to be located
   // in a single cache line if possible (it's accessed in every interceptor).
   ALIGNED(64) LibIgnore libignore;
   sigaction_t sigactions[kSigCount];
 #if !SANITIZER_MAC && !SANITIZER_NETBSD
   unsigned finalize_key;
 #endif
 
+  BlockingMutex atexit_mu;
+  Vector<struct AtExitCtx *> AtExitStack;
+
   InterceptorContext()
-      : libignore(LINKER_INITIALIZED) {
+      : libignore(LINKER_INITIALIZED), AtExitStack(MBlockAtExit) {
   }
 };
 
 static ALIGNED(64) char interceptor_placeholder[sizeof(InterceptorContext)];
 InterceptorContext *interceptor_ctx() {
   return reinterpret_cast<InterceptorContext*>(&interceptor_placeholder[0]);
 }

@@ (136 lines hidden) @@ TSAN_INTERCEPTOR(int, nanosleep, void *req, void *rem) {
   return res;
 }
 
 TSAN_INTERCEPTOR(int, pause, int fake) {
   SCOPED_TSAN_INTERCEPTOR(pause, fake);
   return BLOCK_REAL(pause)(fake);
 }
 
-// The sole reason tsan wraps atexit callbacks is to establish synchronization
-// between callback setup and callback execution.
-struct AtExitCtx {
-  void (*f)();
-  void *arg;
-};
-
-static void at_exit_wrapper(void *arg) {
-  ThreadState *thr = cur_thread();
-  uptr pc = 0;
-  Acquire(thr, pc, (uptr)arg);
+static void at_exit_wrapper() {
+  AtExitCtx *ctx;
+  {
+    // Ensure thread-safety.
+    BlockingMutexLock l(&interceptor_ctx()->atexit_mu);
+
+    // Pop AtExitCtx from the top of the stack of callback functions
+    uptr element = interceptor_ctx()->AtExitStack.Size() - 1;
+    ctx = interceptor_ctx()->AtExitStack[element];
+    interceptor_ctx()->AtExitStack.PopBack();
+  }
+
+  Acquire(cur_thread(), (uptr)0, (uptr)ctx);
+  ((void(*)())ctx->f)();
+  InternalFree(ctx);
+}
+
+static void cxa_at_exit_wrapper(void *arg) {
+  Acquire(cur_thread(), 0, (uptr)arg);
   AtExitCtx *ctx = (AtExitCtx*)arg;
   ((void(*)(void *arg))ctx->f)(ctx->arg);
   InternalFree(ctx);
 }
 
 static int setup_at_exit_wrapper(ThreadState *thr, uptr pc, void(*f)(),
       void *arg, void *dso);
 
 #if !SANITIZER_ANDROID
 TSAN_INTERCEPTOR(int, atexit, void (*f)()) {
   if (cur_thread()->in_symbolizer)
@@ (16 lines hidden) @@ static int setup_at_exit_wrapper(ThreadState *thr, uptr pc, void(*f)(),
       void *arg, void *dso) {
   AtExitCtx *ctx = (AtExitCtx*)InternalAlloc(sizeof(AtExitCtx));
   ctx->f = f;
   ctx->arg = arg;
   Release(thr, pc, (uptr)ctx);
   // Memory allocation in __cxa_atexit will race with free during exit,
   // because we do not see synchronization around atexit callback list.
   ThreadIgnoreBegin(thr, pc);
-  int res = REAL(__cxa_atexit)(at_exit_wrapper, ctx, dso);
+  int res;
+  if (!dso) {
+    // NetBSD does not preserve the 2nd argument if dso is equal to 0
+    // Store ctx in a local stack-like structure
+
+    // Ensure thread-safety.
+    BlockingMutexLock l(&interceptor_ctx()->atexit_mu);
+
+    res = REAL(__cxa_atexit)((void (*)(void *a))at_exit_wrapper, 0, 0);
+    // Push AtExitCtx on the top of the stack of callback functions
+    if (!res) {
+      interceptor_ctx()->AtExitStack.PushBack(ctx);
+    }
+  } else {
+    res = REAL(__cxa_atexit)(cxa_at_exit_wrapper, ctx, dso);
+  }
   ThreadIgnoreEnd(thr, pc);
   return res;
 }
 
 #if !SANITIZER_MAC && !SANITIZER_NETBSD
 static void on_exit_wrapper(int status, void *arg) {
   ThreadState *thr = cur_thread();
   uptr pc = 0;
@@ (991 lines hidden) @@

test/tsan/atexit3.cc

This file was added.

// RUN: %clang_tsan -O1 %s -o %t && %run %t 2>&1 | FileCheck %s

#include <stdio.h>
#include <stdlib.h>

static void atexit5() {
  fprintf(stderr, "5");
}

static void atexit4() {
  fprintf(stderr, "4");
}

static void atexit3() {
  fprintf(stderr, "3");
}

static void atexit2() {
  fprintf(stderr, "2");
}

static void atexit1() {
  fprintf(stderr, "1");
}

static void atexit0() {
  fprintf(stderr, "\n");
}

int main() {
  atexit(atexit0);
  atexit(atexit1);
  atexit(atexit2);
  atexit(atexit3);
  atexit(atexit4);
  atexit(atexit5);
}

// CHECK-NOT: FATAL: ThreadSanitizer
// CHECK-NOT: WARNING: ThreadSanitizer
// CHECK: 54321