
Details

Reviewers

kcc
glider
dvyukov
samsonov

Commits

rGcd18f28751ad: [tsan] Alternative ThreadState storage for OS X
rCRT252159: [tsan] Alternative ThreadState storage for OS X
rL252159: [tsan] Alternative ThreadState storage for OS X

Summary

On OS X, there are several issues with using __thread to store the ThreadState objects that TSan relies on in all interceptors and memory instrumentation:

During early process startup, interceptors are called (from dyld, Libc, etc.) when TLV is simply not available and any access to it will crash.
During early new thread initialization, interceptors are called, but the TLV for the current thread is not yet initialized. It will be lazily loaded on the first access, but the initialization actually needs to call one of the intercepted functions (pthread_mutex_lock), creating a circular dependency.
When a thread is finished, during its teardown, the TLV is destroyed (deallocated), but interceptors are still called on that thread, which will cause the TLV to get resurrected (by lazy initialization).

There are several possible workarounds. One would be to use pthread_key_create and pthread_getspecific, but this still has the thread finalization issue. This patch presents a different solution (originally proposed by Kostya): since pthread_self() is always available and reliable, and returns a valid pointer to memory, we'll use the shadow memory of this pointer as a "poor man's TLV". No user code should ever read or write this internal libpthread structure, so it's safe to use its shadow for this purpose. We can simply lazily allocate the ThreadState object and store the pointer there.

To make this work, we need to store the main thread's ThreadState separately, because it needs to be available even before the shadow memory is initialized. Note that the current patch never deallocates the ThreadState objects and simply leaks them, which I'll fix in a subsequent patch.

There are some performance implications here, but I'd like to point out that the hot path contains only a call to pthread_main_np, pthread_self and MemToShadow. At least on OS X, pthread_self is only a single memory access (via the %gs segment) plus a return, and pthread_main_np has an extra memory access plus 2 arithmetic operations. So it seems that this implementation shouldn't hurt too much.

(This is part of an effort to port TSan to OS X, and it's one of the very first steps. Don't expect TSan on OS X to actually work or pass tests at this point.)

Diff Detail

Repository: rL LLVM

Event Timeline

kubamracek updated this revision to Diff 39065. Nov 3 2015, 8:10 AM

kubamracek retitled this revision from to [tsan] Alternative ThreadState storage for OS X.

kubamracek updated this object.

kubamracek added reviewers: kcc, samsonov, glider, dvyukov.

kubamracek added subscribers: llvm-commits, zaks.anna, ismailp, jasonk.

dvyukov added inline comments. Nov 3 2015, 9:32 AM

lib/tsan/rtl/tsan_platform_mac.cc
43 ↗	(On Diff #39065)	#ifndef SANITIZER_GO
51 ↗	(On Diff #39065)	How complex is that patch? If it is not too complex, then I would prefer to resolve this TODO right in this patch.
55 ↗	(On Diff #39065)	What does pthread_self return for the main thread? Or is the issue with early bootstrap? If it is bootstrap, then I think it is better to have a global var that contains the pthread_self value for the main thread. This variable is initialized in Initialize. If this variable is 0, then we also know that this is the main thread. Basically it will allow us to replace the pthread_main_np call with a check of a global. Something along the lines of:
uptr th = pthread_self();
uptr mt = g_main_thread;
if (mt == 0 || mt == th) {
  // main thread
} else {
  // non-main thread
}
Will this work?
64 ↗	(On Diff #39065)	This is not async-signal-safe. A thread can receive a signal concurrent with the first call to cur_thread(), and signal handler will also call cur_thread().
lib/tsan/rtl/tsan_rtl.cc
47 ↗	(On Diff #39065)	!defined(SANITIZER_GO)
lib/tsan/rtl/tsan_rtl.h
412 ↗	(On Diff #39065)	This must be !defined(SANITIZER_GO)
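For readers following the thread, the global-variable check suggested in the inline comment on line 55 can be sketched outside the TSan runtime roughly as below. This is only an illustration under stated assumptions: `g_main_thread`, `RecordMainThread` and `IsMainThreadOrEarly` are hypothetical names, and `std::atomic` stands in for whatever the runtime would actually use.

```cpp
#include <pthread.h>

#include <atomic>
#include <cstdint>

// Hypothetical stand-in for the global proposed in the review.
// 0 means "the runtime has not been initialized yet".
static std::atomic<uintptr_t> g_main_thread{0};

// Assumed to be called once, on the main thread, during runtime
// initialization.
void RecordMainThread() {
  g_main_thread.store((uintptr_t)pthread_self(), std::memory_order_relaxed);
}

// Returns true for the main thread, and also for any call made before
// RecordMainThread() has run (early bootstrap, when only the main thread is
// assumed to exist).
bool IsMainThreadOrEarly() {
  uintptr_t th = (uintptr_t)pthread_self();
  uintptr_t mt = g_main_thread.load(std::memory_order_relaxed);
  return mt == 0 || mt == th;
}
```

The point of the design is that the hot path becomes one pthread_self() call plus one load of a global, with no call to pthread_main_np.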

Updating patch to address comments.

This is not async-signal-safe. A thread can receive a signal concurrent with the first call to cur_thread(), and signal handler will also call cur_thread().

I see. Could you please suggest a way to make it async-signal-safe?


Something along the lines of:

thr = atomic_load(fake_tls, memory_order_relaxed);
atomic_signal_fence(memory_order_acquire); // turns the previous load into acquire wrt signals
if (thr == nullptr) {
  // slow-path
  thr = (ThreadState *)InternalAlloc(sizeof(ThreadState), nullptr);
  internal_memset(thr, 0, sizeof(*thr));
  ThreadState *cmp = nullptr;
  if (!atomic_compare_exchange_strong(fake_tls, &cmp, thr, memory_order_acq_rel)) {
    InternalFree(thr);
    thr = cmp;
  }
}
return thr;

However, InternalAlloc is not async-signal-safe (and probably needs explicit initialization, so it won't work too early), so you need to replace InternalAlloc/InternalFree with internal_mmap/munmap.
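A standalone sketch of that get-or-allocate pattern, using mmap instead of an allocator, might look like the following. It is an illustration rather than the runtime code: `GetOrAllocate` is a hypothetical name, `std::atomic` replaces the sanitizer atomics, and plain `mmap`/`munmap` stand in for `internal_mmap`/`internal_munmap` (the libc wrappers are assumed here to behave like the raw system calls).

```cpp
#include <sys/mman.h>

#include <atomic>
#include <cstddef>
#include <cstdint>

// Returns the pointer stored in *slot, mapping `size` zero-filled bytes on
// first use. If a signal handler on the same thread wins the race and
// publishes its own mapping first, ours is released and the handler's
// pointer is returned instead.
void *GetOrAllocate(std::atomic<uintptr_t> *slot, size_t size) {
  uintptr_t val = slot->load(std::memory_order_relaxed);
  // Turns the relaxed load into an acquire with respect to signal handlers
  // running on this thread.
  std::atomic_signal_fence(std::memory_order_acquire);
  if (val == 0) {
    // Slow path: anonymous mappings are zero-filled, so no memset is needed.
    void *mem = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANON, -1, 0);
    if (mem == MAP_FAILED) return nullptr;
    uintptr_t expected = 0;
    if (slot->compare_exchange_strong(expected, (uintptr_t)mem,
                                      std::memory_order_acq_rel)) {
      val = (uintptr_t)mem;
    } else {
      // Someone else published first; drop our mapping and use theirs.
      munmap(mem, size);
      val = expected;
    }
  }
  return (void *)val;
}
```

Calling it twice on the same slot returns the same pointer, and a handler that interrupts the slow path either sees null and allocates itself, or sees a fully published pointer.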

Updating the patch to also deal with ThreadState cleanup.

Thanks for the explanation of async-signal-safety, I'll take a look at it in another update of this patch.

Looks much better now.

Waiting for the async-signal-safe change, and then we can get this in.

lib/tsan/rtl/tsan_platform_mac.cc
59 ↗	(On Diff #39211)	Move this in InitializePlatform.
64 ↗	(On Diff #39211)	Add UNLIKELY. I know you are now concentrated on getting it to work end-to-end, but these are small things that do improve generated code.
lib/tsan/rtl/tsan_rtl.h
423 ↗	(On Diff #39211)	This also needs to be !defined(SANITIZER_GO). I think it's better to do:
#ifndef SANITIZER_GO
#if SANITIZER_MAC
...
#else
...
#endif
#endif

jevinskie added a subscriber: jevinskie. Nov 4 2015, 11:19 AM

Updating patch to implement async-signal-safety.

I'm unsure about the destruction of ThreadState in cur_thread_finalize, does that need to be signal aware as well? But that only happens once in a thread, and it seems it's already too late for a signal handler to do anything reasonable, and we don't want it to resurrect the ThreadState. What is the behavior on Linux with regular __thread ThreadState?

Signal handlers call SigCtx to obtain signal handling context:

static ThreadSignalContext *SigCtx(ThreadState *thr) {
  ThreadSignalContext *ctx = (ThreadSignalContext*)thr->signal_ctx;
  if (ctx == 0 && !thr->is_dead) {
    ctx = (ThreadSignalContext*)MmapOrDie(sizeof(*ctx), "ThreadSignalContext");
    MemoryResetRange(thr, (uptr)&SigCtx, (uptr)ctx, sizeof(*ctx));
    thr->signal_ctx = ctx;
  }
  return ctx;
}

If the thread is already considered "dead" (during destruction or destroyed), then we just ignore signals.
This is most likely OK, because we can pretend that the thread just exited before catching the signal.

Add a note to cur_thread_finalize saying that it is not signal-safe (in particular you call unmap first and then clear the fake_tls, if we receive a signal in between, handler will try to access the unmapped ThreadState). We can sort it out later. But this is something to keep in mind (in fact, signals are the most common source of tsan bugs and crashes).
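To make the window described above concrete, here is one possible reordering, sketched outside the runtime: clear the slot first and unmap afterwards, so a handler that interrupts in between sees a null slot rather than a dangling pointer. This is not what the patch does, and it is an assumption-laden illustration only. `ReleaseSlot` is a hypothetical name, and the sketch does not stop a late handler from re-allocating the state (the "resurrection" problem), which would still need something like the is_dead check.

```cpp
#include <sys/mman.h>

#include <atomic>
#include <cstddef>
#include <cstdint>

// Detaches the mapping from *slot before unmapping it. A signal handler on
// this thread that runs before the exchange sees a valid pointer and finishes
// before we continue; one that runs after the exchange sees 0.
void ReleaseSlot(std::atomic<uintptr_t> *slot, size_t size) {
  uintptr_t val = slot->exchange(0, std::memory_order_acq_rel);
  // Keep the compiler from moving the unmap ahead of the store as seen by a
  // handler running on this thread.
  std::atomic_signal_fence(std::memory_order_seq_cst);
  if (val != 0) munmap((void *)val, size);
}
```

Calling it a second time on an already cleared slot is a no-op.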

LGTM after addition of note to cur_thread_finalize and removal of memset.

lib/tsan/rtl/tsan_platform_mac.cc
54 ↗	(On Diff #39344)	This is unnecessary with mmap, remove.

This revision is now accepted and ready to land. Nov 5 2015, 5:12 AM

Closed by commit rL252159: [tsan] Alternative ThreadState storage for OS X (authored by kuba.brecka). Nov 5 2015, 5:57 AM

This revision was automatically updated to reflect the committed changes.

Landed in r252159. Thanks for the fast review!

Diff 39352

compiler-rt/trunk/lib/tsan/rtl/tsan_interceptors.cc

	[... 813 lines hidden ...]
	void DestroyThreadState() {			void DestroyThreadState() {
	ThreadState *thr = cur_thread();			ThreadState *thr = cur_thread();
	ThreadFinish(thr);			ThreadFinish(thr);
	ThreadSignalContext *sctx = thr->signal_ctx;			ThreadSignalContext *sctx = thr->signal_ctx;
	if (sctx) {			if (sctx) {
	thr->signal_ctx = 0;			thr->signal_ctx = 0;
	UnmapOrDie(sctx, sizeof(*sctx));			UnmapOrDie(sctx, sizeof(*sctx));
	}			}
				cur_thread_finalize();
	}			}
	} // namespace __tsan			} // namespace __tsan

	static void thread_finalize(void *v) {			static void thread_finalize(void *v) {
	uptr iter = (uptr)v;			uptr iter = (uptr)v;
	if (iter > 1) {			if (iter > 1) {
	if (pthread_setspecific(g_thread_finalize_key, (void*)(iter - 1))) {			if (pthread_setspecific(g_thread_finalize_key, (void*)(iter - 1))) {
	Printf("ThreadSanitizer: failed to set thread key\n");			Printf("ThreadSanitizer: failed to set thread key\n");
	[... 1,809 lines hidden ...]

compiler-rt/trunk/lib/tsan/rtl/tsan_platform_mac.cc

[... 9 lines hidden ...]
// This file is a part of ThreadSanitizer (TSan), a race detector.		// This file is a part of ThreadSanitizer (TSan), a race detector.
//		//
// Mac-specific code.		// Mac-specific code.
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "sanitizer_common/sanitizer_platform.h"		#include "sanitizer_common/sanitizer_platform.h"
#if SANITIZER_MAC		#if SANITIZER_MAC

		#include "sanitizer_common/sanitizer_atomic.h"
#include "sanitizer_common/sanitizer_common.h"		#include "sanitizer_common/sanitizer_common.h"
#include "sanitizer_common/sanitizer_libc.h"		#include "sanitizer_common/sanitizer_libc.h"
		#include "sanitizer_common/sanitizer_posix.h"
#include "sanitizer_common/sanitizer_procmaps.h"		#include "sanitizer_common/sanitizer_procmaps.h"
#include "tsan_platform.h"		#include "tsan_platform.h"
#include "tsan_rtl.h"		#include "tsan_rtl.h"
#include "tsan_flags.h"		#include "tsan_flags.h"

#include <pthread.h>		#include <pthread.h>
#include <signal.h>		#include <signal.h>
#include <stdio.h>		#include <stdio.h>
#include <stdlib.h>		#include <stdlib.h>
#include <string.h>		#include <string.h>
#include <stdarg.h>		#include <stdarg.h>
#include <sys/mman.h>		#include <sys/mman.h>
#include <sys/syscall.h>		#include <sys/syscall.h>
#include <sys/time.h>		#include <sys/time.h>
#include <sys/types.h>		#include <sys/types.h>
#include <sys/resource.h>		#include <sys/resource.h>
#include <sys/stat.h>		#include <sys/stat.h>
#include <unistd.h>		#include <unistd.h>
#include <errno.h>		#include <errno.h>
#include <sched.h>		#include <sched.h>

namespace __tsan {		namespace __tsan {

		static void *SignalSafeGetOrAllocate(uptr *dst, uptr size) {
		  atomic_uintptr_t *a = (atomic_uintptr_t *)dst;
		  void *val = (void *)atomic_load_relaxed(a);
		  atomic_signal_fence(memory_order_acquire);  // Turns the previous load into
		                                              // acquire wrt signals.
		  if (UNLIKELY(val == nullptr)) {
		    val = (void *)internal_mmap(nullptr, size, PROT_READ | PROT_WRITE,
		                                MAP_PRIVATE | MAP_ANON, -1, 0);
		    CHECK(val);
		    void *cmp = nullptr;
		    if (!atomic_compare_exchange_strong(a, (uintptr_t *)&cmp, (uintptr_t)val,
		                                        memory_order_acq_rel)) {
		      internal_munmap(val, size);
		      val = cmp;
		    }
		  }
		  return val;
		}

		#ifndef SANITIZER_GO
		// On OS X, accessing TLVs via __thread or manually by using pthread_key_* is
		// problematic, because there are several places where interceptors are called
		// when TLVs are not accessible (early process startup, thread cleanup, ...).
		// The following provides a "poor man's TLV" implementation, where we use the
		// shadow memory of the pointer returned by pthread_self() to store a pointer to
		// the ThreadState object. The main thread's ThreadState pointer is stored
		// separately in a static variable, because we need to access it even before the
		// shadow memory is set up.
		static uptr main_thread_identity = 0;
		static ThreadState *main_thread_state = nullptr;

		ThreadState *cur_thread() {
		  ThreadState **fake_tls;
		  uptr thread_identity = (uptr)pthread_self();
		  if (thread_identity == main_thread_identity || main_thread_identity == 0) {
		    fake_tls = &main_thread_state;
		  } else {
		    fake_tls = (ThreadState **)MemToShadow(thread_identity);
		  }
		  ThreadState *thr = (ThreadState *)SignalSafeGetOrAllocate(
		      (uptr *)fake_tls, sizeof(ThreadState));
		  return thr;
		}

		// TODO(kuba.brecka): This is not async-signal-safe. In particular, we call
		// munmap first and then clear `fake_tls`; if we receive a signal in between,
		// handler will try to access the unmapped ThreadState.
		void cur_thread_finalize() {
		  uptr thread_identity = (uptr)pthread_self();
		  CHECK_NE(thread_identity, main_thread_identity);
		  ThreadState **fake_tls = (ThreadState **)MemToShadow(thread_identity);
		  internal_munmap(*fake_tls, sizeof(ThreadState));
		  *fake_tls = nullptr;
		}
		#endif

uptr GetShadowMemoryConsumption() {		uptr GetShadowMemoryConsumption() {
return 0;		return 0;
}		}

void FlushShadowMemory() {		void FlushShadowMemory() {
}		}

void WriteMemoryProfile(char *buf, uptr buf_size, uptr nthread, uptr nlive) {		void WriteMemoryProfile(char *buf, uptr buf_size, uptr nthread, uptr nlive) {
[... 39 lines hidden ...]	static void my_pthread_introspection_hook(unsigned int event, pthread_t thread,
if (prev_pthread_introspection_hook != nullptr)		if (prev_pthread_introspection_hook != nullptr)
prev_pthread_introspection_hook(event, thread, addr, size);		prev_pthread_introspection_hook(event, thread, addr, size);
}		}

void InitializePlatform() {		void InitializePlatform() {
DisableCoreDumperIfNecessary();		DisableCoreDumperIfNecessary();
#ifndef SANITIZER_GO		#ifndef SANITIZER_GO
CheckAndProtect();		CheckAndProtect();

		CHECK_EQ(main_thread_identity, 0);
		main_thread_identity = (uptr)pthread_self();
#endif		#endif

prev_pthread_introspection_hook =		prev_pthread_introspection_hook =
pthread_introspection_hook_install(&my_pthread_introspection_hook);		pthread_introspection_hook_install(&my_pthread_introspection_hook);
}		}

#ifndef SANITIZER_GO		#ifndef SANITIZER_GO
// Note: this function runs with async signals enabled,		// Note: this function runs with async signals enabled,
[... 21 lines hidden ...]

compiler-rt/trunk/lib/tsan/rtl/tsan_rtl.h

[... 404 lines hidden ...]	#endif

explicit ThreadState(Context *ctx, int tid, int unique_id, u64 epoch,		explicit ThreadState(Context *ctx, int tid, int unique_id, u64 epoch,
unsigned reuse_count,		unsigned reuse_count,
uptr stk_addr, uptr stk_size,		uptr stk_addr, uptr stk_size,
uptr tls_addr, uptr tls_size);		uptr tls_addr, uptr tls_size);
};		};

#ifndef SANITIZER_GO		#ifndef SANITIZER_GO
		#if SANITIZER_MAC
		ThreadState *cur_thread();
		void cur_thread_finalize();
		#else
__attribute__((tls_model("initial-exec")))		__attribute__((tls_model("initial-exec")))
extern THREADLOCAL char cur_thread_placeholder[];		extern THREADLOCAL char cur_thread_placeholder[];
INLINE ThreadState *cur_thread() {		INLINE ThreadState *cur_thread() {
return reinterpret_cast<ThreadState *>(&cur_thread_placeholder);		return reinterpret_cast<ThreadState *>(&cur_thread_placeholder);
}		}
#endif		INLINE void cur_thread_finalize() { }
		#endif // SANITIZER_MAC
		#endif // SANITIZER_GO

class ThreadContext : public ThreadContextBase {		class ThreadContext : public ThreadContextBase {
public:		public:
explicit ThreadContext(int tid);		explicit ThreadContext(int tid);
~ThreadContext();		~ThreadContext();
ThreadState *thr;		ThreadState *thr;
u32 creation_stack_id;		u32 creation_stack_id;
SyncClock sync;		SyncClock sync;
[... 338 lines hidden ...]

compiler-rt/trunk/lib/tsan/rtl/tsan_rtl.cc

	[... 38 lines hidden ...]
	volatile int __tsan_resumed = 0;			volatile int __tsan_resumed = 0;

	extern "C" void __tsan_resume() {			extern "C" void __tsan_resume() {
	__tsan_resumed = 1;			__tsan_resumed = 1;
	}			}

	namespace __tsan {			namespace __tsan {

	#ifndef SANITIZER_GO			#if !defined(SANITIZER_GO) && !SANITIZER_MAC
	THREADLOCAL char cur_thread_placeholder[sizeof(ThreadState)] ALIGNED(64);			THREADLOCAL char cur_thread_placeholder[sizeof(ThreadState)] ALIGNED(64);
	#endif			#endif
	static char ctx_placeholder[sizeof(Context)] ALIGNED(64);			static char ctx_placeholder[sizeof(Context)] ALIGNED(64);
	Context *ctx;			Context *ctx;

	// Can be overriden by a front-end.			// Can be overriden by a front-end.
	#ifdef TSAN_EXTERNAL_HOOKS			#ifdef TSAN_EXTERNAL_HOOKS
	bool OnFinalize(bool failed);			bool OnFinalize(bool failed);
	[... 963 lines hidden ...]

This is an archive of the discontinued LLVM Phabricator instance.

[tsan] Alternative ThreadState storage for OS X
ClosedPublic
