This is an archive of the discontinued LLVM Phabricator instance.


Differential D110162

[TSan][Darwin] Avoid crashes due to interpreting non-zero shadow content as a pointer
Abandoned · Public

Authored by yln on Sep 21 2021, 6:00 AM.


Details

Reviewers

kubamracek
delcypher
aralisza
dvyukov

Summary

We would like to use TLS to store the ThreadState object (or at least a
reference to it), but on Darwin accessing TLS via __thread or manually
by using pthread_key_* is problematic, because there are several places
where interceptors are called when TLS is not accessible (early process
startup, thread cleanup, ...).

Previously, we used a "poor man's TLS" implementation, where we use the
shadow memory of the pointer returned by pthread_self() to store a
pointer to the ThreadState object.

The problem with that was that certain operations can populate shadow
bytes unbeknownst to TSan, and we later interpret these non-zero bytes
as the pointer to our ThreadState object and crash when dereferencing
the pointer.
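
The failure mode described above can be illustrated with a minimal sketch. It is not the TSan code: `FakeThreadState` and `GetOrAllocate` are hypothetical names, `std::atomic` and `calloc` stand in for the sanitizer's own atomics and `internal_mmap`, and a plain variable stands in for the shadow slot. The sketch only shows that a get-or-allocate routine which treats zero as "not yet allocated" will return any other slot content as if it were a pointer.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical stand-in for TSan's ThreadState (the real struct is far larger).
struct FakeThreadState {
  int tid;
};

// Get-or-allocate on a pointer-sized slot, in the spirit of the old
// shadow-backed scheme: zero means "not allocated yet", and anything else is
// trusted to be a valid pointer.
void *GetOrAllocate(std::atomic<std::uintptr_t> *slot, std::size_t size) {
  std::uintptr_t val = slot->load(std::memory_order_acquire);
  if (val == 0) {
    void *fresh = std::calloc(1, size);
    assert(fresh != nullptr);
    std::uintptr_t expected = 0;
    if (slot->compare_exchange_strong(expected,
                                      reinterpret_cast<std::uintptr_t>(fresh),
                                      std::memory_order_acq_rel)) {
      return fresh;
    }
    std::free(fresh);  // Another writer won the race; use its value instead.
    val = expected;
  }
  return reinterpret_cast<void *>(val);
}
```

If the slot was populated by something other than this routine, the returned value is whatever bytes happen to be there, and the first dereference is where the crash would surface.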

This patch changes how we store the reference to the ThreadState object.
Instead of simulating TLS via the shadow memory, we use a global,
thread-safe hash map to store a pointer to our ThreadState objects and
use mmap() to allocate the backing memory. The main thread's
ThreadState is stored separately in a static variable, because we need
to access it even before we can allocate and initialize the hash map.
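
As a rough sketch of the scheme described in this summary, and not the patch itself: the code below assumes `std::unordered_map` guarded by a `std::mutex` in place of TSan's `AddrHashMap`, and `new`/`delete` in place of `mmap()`. The names `FakeThreadState`, `ThreadStateRegistry`, `InitializeRegistry`, `CurThread`, and `CurThreadFinalize` are hypothetical. The registry takes an explicit key so that distinct thread identities can be exercised without spawning threads.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <pthread.h>
#include <unordered_map>

// Hypothetical stand-in for TSan's ThreadState.
struct FakeThreadState {
  std::uintptr_t owner = 0;
};

// Mutex-guarded map from thread identity to per-thread state.
class ThreadStateRegistry {
 public:
  void Insert(std::uintptr_t key, FakeThreadState *state) {
    std::lock_guard<std::mutex> lock(mu_);
    map_[key] = state;
  }

  // Returns the state for `key`, allocating it on first use.
  FakeThreadState *GetOrCreate(std::uintptr_t key) {
    std::lock_guard<std::mutex> lock(mu_);
    FakeThreadState *&slot = map_[key];
    if (slot == nullptr) {
      slot = new FakeThreadState();
      slot->owner = key;
    }
    return slot;
  }

  // Removes the entry and hands the state back; the caller decides whether
  // to free it. Returns nullptr if there is no entry for `key`.
  FakeThreadState *Remove(std::uintptr_t key) {
    std::lock_guard<std::mutex> lock(mu_);
    auto it = map_.find(key);
    if (it == map_.end()) return nullptr;
    FakeThreadState *state = it->second;
    map_.erase(it);
    return state;
  }

 private:
  std::mutex mu_;
  std::unordered_map<std::uintptr_t, FakeThreadState *> map_;
};

// The main thread's state lives in static storage so that it is usable
// before the registry exists.
FakeThreadState main_thread_state;
ThreadStateRegistry *registry = nullptr;

std::uintptr_t ThreadIdentity() { return (std::uintptr_t)pthread_self(); }

void InitializeRegistry() {
  assert(registry == nullptr);
  registry = new ThreadStateRegistry();  // intentionally never freed
  registry->Insert(ThreadIdentity(), &main_thread_state);
}

FakeThreadState *CurThread() {
  if (registry == nullptr) return &main_thread_state;  // early startup
  return registry->GetOrCreate(ThreadIdentity());
}

void CurThreadFinalize() {
  FakeThreadState *state = registry->Remove(ThreadIdentity());
  if (state != nullptr && state != &main_thread_state) delete state;
}
```

Every lookup in this sketch takes a lock, which is the kind of per-access cost the reviewers raise below, since `cur_thread` is on the path of every memory access.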

Radar-Id: rdar://problem/72010355

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	50 ms	x64 debian > ORC-x86_64-linux.TestCases/Linux/x86-64::trivial-cxa-atexit.S
	60 ms	x64 debian > ORC-x86_64-linux.TestCases/Linux/x86-64::trivial-static-initializer.S
	50 ms	x64 debian > ORC-x86_64-linux.TestCases/Linux/x86-64::trivial-tls.S

Event Timeline

yln requested review of this revision. Sep 21 2021, 6:00 AM

yln created this revision.

Herald added a project: Restricted Project. Sep 21 2021, 6:00 AM

Herald added a subscriber: Restricted Project.

Harbormaster completed remote builds in B124883: Diff 373883. Sep 21 2021, 6:13 AM

kubamracek added a reviewer: dvyukov. Sep 21 2021, 6:52 AM

Hi Julian,

I assume the garbage in shadow you are referring to is written to the shadow before the thread is created, is it correct? If yes, couldn't we reset the shadow to 0 on thread creation? We intercept all thread creations anyway.
I am quite worried about performance overhead. cur_thread is called for every memory access.

delcypher added inline comments. Sep 21 2021, 4:47 PM

compiler-rt/lib/tsan/rtl/tsan_platform_mac.cpp
69	Is this how TSan internals do allocation? I would have expected a call to the internal allocator.
96	Is a write to the value stored in the map safe to do? The comments on the data structure make it sound like it's only safe to write to the stored value when `h.created()` is true:
	// {
	//   Map::Handle h(&m, addr);
	//   use h.operator->() to access the data
	//   if h.created() then the element was just created, and the current thread
	//     has exclusive access to it
	//   otherwise the current thread has only read access to the data
	// }
292	The code in the `else` block is a little confusing. It looks like it's assuming the address returned by `pthread_self` is going to be somewhere in the TLS and so it tries to avoid updating the part of the shadow that we're using to store the `ThreadState` pointer. Presumably it can be removed in this patch because we're not storing the `ThreadState` pointer in the shadow?

In D110162#3012662, @dvyukov wrote:

Hi Julian,

I assume the garbage in shadow you are referring to is written to the shadow before the thread is created, is it correct? If yes, couldn't we reset the shadow to 0 on thread creation? We intercept all thread creations anyway.
I am quite worried about performance overhead. cur_thread is called for every memory access.

Hi Dimitry,

I agree that this would be the best solution, i.e., it would solve the root cause (corrupted shadow memory bytes) and not just the symptom (crash).

Unfortunately, I can't figure out why this approach doesn't work in our customer's setup:
https://reviews.llvm.org/D109184 (not sufficient)

I've confirmed that storing the reference in "true" TLS (via a trick) works. Would you be happy with this approach?
https://reviews.llvm.org/D110236 (confirmed fix)

Thanks,
Julian

Abandoning in favor of D110236

In D110162#3030861, @yln wrote:

In D110162#3012662, @dvyukov wrote:

Hi Julian,

I assume the garbage in shadow you are referring to is written to the shadow before the thread is created, is it correct? If yes, couldn't we reset the shadow to 0 on thread creation? We intercept all thread creations anyway.
I am quite worried about performance overhead. cur_thread is called for every memory access.

Hi Dimitry,

I agree that this would be the best solution, i.e., it would solve the root cause (corrupted shadow memory bytes) and not just the symptom (crash).

Unfortunately, I can't figure out why this approach doesn't work in our customer's setup:
https://reviews.llvm.org/D109184 (not sufficient)

MemoryRangeImitateWriteOrResetRange writes non-0's to shadow. Maybe MemoryResetRange will help?

In D110162#3033354, @dvyukov wrote:

MemoryRangeImitateWriteOrResetRange writes non-0's to shadow. Maybe MemoryResetRange will help?

To give a bit more context:
Calling any of the "proper" shadow memory functions already requires a ThreadState thr, which we don't have in the cases where this matters: we want to initialize the pointer with 0 to make sure that later on other code recognizes that it needs to be created in the first place!

There are 2 more issues:

- MemoryRangeReset() doesn't necessarily force 0 in the shadow bytes, but may only mark the region as "deleted" (like deleting a file in a filesystem).
- Even when it resets bytes, it doesn't necessarily reset all bytes in a large region (just the first and last few pages).

To sidestep these issues, I put a blunt internal_memset(shadow_addr, 0, shadow_size) in all cases (even the ones that return early in the current patch, e.g., because thr isn't initialized) in the mach_vm_map and mach_vm_allocator interceptors, just to see whether it would resolve the issue (and then work on a refined patch for the approach), but our customer still reported the same crashes.

Revision Contents

Path	Size
compiler-rt/lib/tsan/rtl/tsan_platform_mac.cpp	135 lines

Diff 373883

compiler-rt/lib/tsan/rtl/tsan_platform_mac.cpp

//===-- tsan_platform_mac.cpp ---------------------------------------------===//		//===-- tsan_platform_mac.cpp ---------------------------------------------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// This file is a part of ThreadSanitizer (TSan), a race detector.		// This file is a part of ThreadSanitizer (TSan), a race detector.
//		//
// Mac-specific code.		// Mac-specific code.
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "sanitizer_common/sanitizer_platform.h"		#include "sanitizer_common/sanitizer_platform.h"
#if SANITIZER_MAC		#if SANITIZER_MAC

		#include "sanitizer_common/sanitizer_addrhashmap.h"
#include "sanitizer_common/sanitizer_atomic.h"		#include "sanitizer_common/sanitizer_atomic.h"
#include "sanitizer_common/sanitizer_common.h"		#include "sanitizer_common/sanitizer_common.h"
#include "sanitizer_common/sanitizer_libc.h"		#include "sanitizer_common/sanitizer_libc.h"
#include "sanitizer_common/sanitizer_posix.h"		#include "sanitizer_common/sanitizer_posix.h"
#include "sanitizer_common/sanitizer_procmaps.h"		#include "sanitizer_common/sanitizer_procmaps.h"
#include "sanitizer_common/sanitizer_ptrauth.h"		#include "sanitizer_common/sanitizer_ptrauth.h"
#include "sanitizer_common/sanitizer_stackdepot.h"		#include "sanitizer_common/sanitizer_stackdepot.h"
#include "tsan_platform.h"		#include "tsan_platform.h"
#include "tsan_rtl.h"		#include "tsan_rtl.h"
#include "tsan_flags.h"		#include "tsan_flags.h"

#include <mach/mach.h>		#include <mach/mach.h>
#include <pthread.h>		#include <pthread.h>
#include <signal.h>		#include <signal.h>
#include <stdio.h>		#include <stdio.h>
#include <stdlib.h>		#include <stdlib.h>
#include <string.h>		#include <string.h>
#include <stdarg.h>		#include <stdarg.h>
#include <sys/mman.h>		#include <sys/mman.h>
#include <sys/syscall.h>		#include <sys/syscall.h>
#include <sys/time.h>		#include <sys/time.h>
#include <sys/types.h>		#include <sys/types.h>
#include <sys/resource.h>		#include <sys/resource.h>
#include <sys/stat.h>		#include <sys/stat.h>
#include <unistd.h>		#include <unistd.h>
#include <errno.h>		#include <errno.h>
#include <sched.h>		#include <sched.h>

namespace __tsan {		namespace __tsan {

#if !SANITIZER_GO		#if !SANITIZER_GO
static void *SignalSafeGetOrAllocate(uptr *dst, uptr size) {		// We would like to use TLS to store the ThreadState object (or at least a
atomic_uintptr_t *a = (atomic_uintptr_t *)dst;		// reference to it), but on Darwin accessing TLS via __thread or manually by
void *val = (void *)atomic_load_relaxed(a);		// using pthread_key_* is problematic, because there are several places where
atomic_signal_fence(memory_order_acquire); // Turns the previous load into		// interceptors are called when TLS is not accessible (early process startup,
// acquire wrt signals.		// thread cleanup, ...).
if (UNLIKELY(val == nullptr)) {		// Instead, we use a global, thread-safe hash map to store a pointer to our
val = (void *)internal_mmap(nullptr, size, PROT_READ | PROT_WRITE,		// ThreadState objects and use mmap() to allocate the backing memory. The main
MAP_PRIVATE | MAP_ANON, -1, 0);		// thread's ThreadState is stored separately in a static variable, because we
CHECK(val);		// need to access it even before we can allocate and initialize the hash map.
void *cmp = nullptr;
if (!atomic_compare_exchange_strong(a, (uintptr_t *)&cmp, (uintptr_t)val,		constexpr uptr kNumThreads = 67; // prime
memory_order_acq_rel)) {		using ThreadStateMap = AddrHashMap<ThreadState *, kNumThreads>;
internal_munmap(val, size);
val = cmp;		static ThreadStateMap* thread_state_map;
}		static char main_thread_state[sizeof(ThreadState)] ALIGNED(64);
}
return val;		static uptr thread_identity() { return (uptr)pthread_self(); }
}
		static void InitializeThreadStateMap() {
// On OS X, accessing TLVs via __thread or manually by using pthread_key_* is		CHECK_EQ(thread_state_map, nullptr);
// problematic, because there are several places where interceptors are called		thread_state_map = new ThreadStateMap(); // never freed
// when TLVs are not accessible (early process startup, thread cleanup, ...).
// The following provides a "poor man's TLV" implementation, where we use the		uptr main_thread = thread_identity();
// shadow memory of the pointer returned by pthread_self() to store a pointer to		ThreadStateMap::Handle h(thread_state_map, main_thread);
// the ThreadState object. The main thread's ThreadState is stored separately		CHECK(h.created());
// in a static variable, because we need to access it even before the		*h = (ThreadState *)main_thread_state;
// shadow memory is set up.
static uptr main_thread_identity = 0;
ALIGNED(64) static char main_thread_state[sizeof(ThreadState)];
static ThreadState *main_thread_state_loc = (ThreadState *)main_thread_state;

// We cannot use pthread_self() before libpthread has been initialized. Our
// current heuristic for guarding this is checking `main_thread_identity` which
// is only assigned in `__tsan::InitializePlatform`.
static ThreadState **cur_thread_location() {
if (main_thread_identity == 0)
return &main_thread_state_loc;
uptr thread_identity = (uptr)pthread_self();
if (thread_identity == main_thread_identity)
return &main_thread_state_loc;
return (ThreadState **)MemToShadow(thread_identity);
}		}

ThreadState *cur_thread() {		ThreadState *cur_thread() {
return (ThreadState *)SignalSafeGetOrAllocate(		if (UNLIKELY(thread_state_map == nullptr)) {
(uptr *)cur_thread_location(), sizeof(ThreadState));		return (ThreadState *)main_thread_state;
		}

		// Our first interceptors get called before libpthread has been fully
		// initialized and calling pthread_self() would crash. These cases are also
		// covered by the above condition, i.e., we only reach this line once TSan
		// (and therefore libpthread) have been initialized.
		ThreadStateMap::Handle h(thread_state_map, thread_identity());
		if (h.created()) {
		*h = (ThreadState *)MmapOrDie(sizeof(ThreadState), "ThreadState");
		}
		return *h;
}		}

void set_cur_thread(ThreadState *thr) {		void set_cur_thread(ThreadState *thr) {
*cur_thread_location() = thr;		ThreadStateMap::Handle h(thread_state_map, thread_identity());
		CHECK(h.exists());
		*h = thr;
}		}

// TODO(kuba.brecka): This is not async-signal-safe. In particular, we call
// munmap first and then clear `fake_tls`; if we receive a signal in between,
// handler will try to access the unmapped ThreadState.
void cur_thread_finalize() {		void cur_thread_finalize() {
ThreadState **thr_state_loc = cur_thread_location();		ThreadStateMap::Handle h(thread_state_map, thread_identity(), /*remove=*/true);
if (thr_state_loc == &main_thread_state_loc) {		CHECK(h.exists());
		if (UNLIKELY(*h == (ThreadState *)main_thread_state)) {
// Calling dispatch_main() or xpc_main() actually invokes pthread_exit to		// Calling dispatch_main() or xpc_main() actually invokes pthread_exit to
// exit the main thread. Let's keep the main thread's ThreadState.		// exit the main thread. Let's keep the main thread's ThreadState.
return;		return;
}		}
internal_munmap(*thr_state_loc, sizeof(ThreadState));		UnmapOrDie(*h, sizeof(ThreadState));
*thr_state_loc = nullptr;
}		}
#endif		#endif

void FlushShadowMemory() {		void FlushShadowMemory() {
}		}

static void RegionMemUsage(uptr start, uptr end, uptr *res, uptr *dirty) {		static void RegionMemUsage(uptr start, uptr end, uptr *res, uptr *dirty) {
vm_address_t address = start;		vm_address_t address = start;
(100 unchanged lines hidden)		if (thread == pthread_self()) {
ThreadStart(thr, tid, GetTid(), ThreadType::Worker);		ThreadStart(thr, tid, GetTid(), ThreadType::Worker);
}		}
} else if (event == PTHREAD_INTROSPECTION_THREAD_TERMINATE) {		} else if (event == PTHREAD_INTROSPECTION_THREAD_TERMINATE) {
if (thread == pthread_self()) {		if (thread == pthread_self()) {
ThreadState *thr = cur_thread();		ThreadState *thr = cur_thread();
if (thr->tctx) {		if (thr->tctx) {
DestroyThreadState();		DestroyThreadState();
}		}
		// FIXME: We destroyed the ThreadState object above which includes
		// unmapping the backing memory in cur_thread_finalize(). However, other
		// intercepted APIs can still get called on the terminating thread. One
		// example is from an "outer" THREAD_TERMINATE introspection hook (e.g.,
		// libBacktraceRecording for Xcode "Queue Debugging" feature). When this
		// happens we re-allocate more backing storage in cur_thread() and, in
		// most cases, leak the allocated memory. Potential solutions include:
		// * Call ThreadFinish() here, but delay releasing the memory for the
		// ThreadState object in THREAD_DESTROY event (which is guaranteed to
		// be delivered on the parent thread after the thread terminated).
		// This requires splitting up the operations in DestroyThreadState()
		// between THREAD_TERMINATE and THREAD_DESTROY.
		// * Use a dummy "dead thread" ThreadState object (like we do on the
		// Linux side) to avoid re-allocation.
}		}
}		}

if (prev_pthread_introspection_hook != nullptr)		if (prev_pthread_introspection_hook != nullptr)
prev_pthread_introspection_hook(event, thread, addr, size);		prev_pthread_introspection_hook(event, thread, addr, size);
}		}
#endif		#endif

(10 unchanged lines hidden)

static uptr longjmp_xor_key = 0;		static uptr longjmp_xor_key = 0;

void InitializePlatform() {		void InitializePlatform() {
DisableCoreDumperIfNecessary();		DisableCoreDumperIfNecessary();
#if !SANITIZER_GO		#if !SANITIZER_GO
CheckAndProtect();		CheckAndProtect();

CHECK_EQ(main_thread_identity, 0);		InitializeThreadStateMap();
main_thread_identity = (uptr)pthread_self();

prev_pthread_introspection_hook =		prev_pthread_introspection_hook =
pthread_introspection_hook_install(&my_pthread_introspection_hook);		pthread_introspection_hook_install(&my_pthread_introspection_hook);
#endif		#endif

if (GetMacosAlignedVersion() >= MacosVersion(10, 14)) {		if (GetMacosAlignedVersion() >= MacosVersion(10, 14)) {
// Libsystem currently uses a process-global key; this might change.		// Libsystem currently uses a process-global key; this might change.
const unsigned kTLSLongjmpXorKeySlot = 0x7;		const unsigned kTLSLongjmpXorKeySlot = 0x7;
(13 unchanged lines hidden)		uptr ExtractLongJmpSp(uptr *env) {
uptr sp = mangled_sp ^ longjmp_xor_key;		uptr sp = mangled_sp ^ longjmp_xor_key;
sp = (uptr)ptrauth_auth_data((void *)sp, ptrauth_key_asdb,		sp = (uptr)ptrauth_auth_data((void *)sp, ptrauth_key_asdb,
ptrauth_string_discriminator("sp"));		ptrauth_string_discriminator("sp"));
return sp;		return sp;
}		}

#if !SANITIZER_GO		#if !SANITIZER_GO
void ImitateTlsWrite(ThreadState *thr, uptr tls_addr, uptr tls_size) {		void ImitateTlsWrite(ThreadState *thr, uptr tls_addr, uptr tls_size) {
// The pointer to the ThreadState object is stored in the shadow memory		// Unlike Linux, the ThreadState object is not stored in TLS.
// of the tls.
uptr tls_end = tls_addr + tls_size;
uptr thread_identity = (uptr)pthread_self();
if (thread_identity == main_thread_identity) {
MemoryRangeImitateWrite(thr, /*pc=*/2, tls_addr, tls_size);		MemoryRangeImitateWrite(thr, /*pc=*/2, tls_addr, tls_size);
} else {
uptr thr_state_start = thread_identity;
uptr thr_state_end = thr_state_start + sizeof(uptr);
CHECK_GE(thr_state_start, tls_addr);
CHECK_LE(thr_state_start, tls_addr + tls_size);
CHECK_GE(thr_state_end, tls_addr);
CHECK_LE(thr_state_end, tls_addr + tls_size);
MemoryRangeImitateWrite(thr, /*pc=*/2, tls_addr,
thr_state_start - tls_addr);
MemoryRangeImitateWrite(thr, /*pc=*/2, thr_state_end,
tls_end - thr_state_end);
}
}		}
#endif		#endif

#if !SANITIZER_GO		#if !SANITIZER_GO
// Note: this function runs with async signals enabled,		// Note: this function runs with async signals enabled,
// so it must not touch any tsan state.		// so it must not touch any tsan state.
int call_pthread_cancel_with_cleanup(int (*fn)(void *arg),		int call_pthread_cancel_with_cleanup(int (*fn)(void *arg),
void (*cleanup)(void *arg), void *arg) {		void (*cleanup)(void *arg), void *arg) {
(13 unchanged lines hidden)