This is an archive of the discontinued LLVM Phabricator instance.

[tsan] Alternative ThreadState storage for OS X
ClosedPublic

Authored by kubamracek on Nov 3 2015, 8:10 AM.

Details

Summary

On OS X, there are several issues with using __thread to store the ThreadState objects that TSan relies on in all interceptors and memory instrumentation:

  • During early process startup, interceptors are called (from dyld, Libc, etc.) when TLV is simply not available and any access to it will crash.
  • During early new thread initialization, interceptors are called, but the TLV for the current thread is not yet initialized. It will be lazily loaded on the first access, but the initialization actually needs to call one of the intercepted functions (pthread_mutex_lock), creating a circular dependency.
  • When a thread is finished, during its teardown, the TLV is destroyed (deallocated), but interceptors are still called on that thread, which will cause the TLV to get resurrected (by lazy initialization).

There are several possible workarounds, one could be to use pthread_key_create and pthread_getspecific, but this still has the thread finalization issue. This patch presents a different solution (originally proposed by Kostya): Based on the fact that pthread_self() is always available and reliable and returns a valid pointer to memory, we'll use the shadow memory of this pointer as a "poor man's TLV". No user code should ever read/write to this internal libpthread structure, so it's safe to use it for this purpose. We can simply lazily allocate the ThreadState object and store the pointer here.

To make this work, we need to store the main thread's ThreadState separately, because it needs to be available even before the shadow memory is initialized. Note that the current patch never deallocates the ThreadState objects and simply leaks them, which I'll fix in a subsequent patch.

There are some performance implications here, but I'd like to point out that the hot path contains only a call to pthread_main_np, pthread_self and MemToShadow. At least on OS X, pthread_self is only a single memory access (via the %gs segment) plus a return, and pthread_main_np has an extra memory access plus 2 arithmetic operations. So it seems that this implementation shouldn't hurt too much.

(This is part of an effort to port TSan to OS X, and it's one the very first steps. Don't expect TSan on OS X to actually work or pass tests at this point.)

Diff Detail

Repository
rL LLVM

Event Timeline

kubamracek updated this revision to Diff 39065.Nov 3 2015, 8:10 AM
kubamracek retitled this revision from to [tsan] Alternative ThreadState storage for OS X.
kubamracek updated this object.
kubamracek added reviewers: kcc, samsonov, glider, dvyukov.
dvyukov added inline comments.Nov 3 2015, 9:32 AM
lib/tsan/rtl/tsan_platform_mac.cc
43 ↗(On Diff #39065)

#ifndef SANITIZER_GO

51 ↗(On Diff #39065)

How complex is that patch?
If it is not too complex, then I would prefer to resolve this TODO right in this patch.

55 ↗(On Diff #39065)

What does pthread_self return for main thread? Or is the issue with early bootstrap?
If it is bootstrap, then I think it is better to have a global var that contains pthread_self value for main thread. This variable is initialized in Iniitialize. If this variable is 0, then we also know that this is the main thread.
Basically it will allow us to replace pthread_main_np call with a check of a global.
Something along the lines of:

uptr th = pthread_self();
uptr mt = g_main_thread;
if (mt == 0 || mt == th) {
  // main thread
} else {
  // non-main thread
}

Will this work?

64 ↗(On Diff #39065)

This is not async-signal-safe. A thread can receive a signal concurrent with the first call to cur_thread(), and signal handler will also call cur_thread().

lib/tsan/rtl/tsan_rtl.cc
47 ↗(On Diff #39065)

!defined(SANITIZER_GO)

lib/tsan/rtl/tsan_rtl.h
412 ↗(On Diff #39065)

This must be !defined(SANITIZER_GO)

kubamracek updated this revision to Diff 39114.Nov 3 2015, 2:28 PM

Updating patch to address comments.

This is not async-signal-safe. A thread can receive a signal concurrent with the first call to cur_thread(), and signal handler will also call cur_thread().

I see. Could you please suggest a way to make it async-signal-safe?

dvyukov edited edge metadata.EditedNov 4 2015, 1:30 AM

I see. Could you please suggest a way to make it async-signal-safe?

Something along the lines of:

thr = atomic_load(fake_tls, memory_order_relaxed);
atomic_signal_fence(memory_order_acquire); // turns the previous load into acquire wrt signals
if (thr == nullptr) {

// slow-path
thr = (ThreadState *)InternalAlloc(sizeof(ThreadState), nullptr);
internal_memset(thr, 0, sizeof(*thr));
ThreadState *cmp = nullptr;
if (!atomic_compare_exchange_strong(fake_tls, &cmp, thr, memory_order_acq_rel)) {
    InternalFree(thr);
    thr = cmp;
}

}
return thr;

However, InternalAlloc is not async-signal-safe (and probably needs explicit initialization, so don't work too early), so you need to replace them with internal_mmap/munmap.

kubamracek updated this revision to Diff 39211.Nov 4 2015, 8:50 AM
kubamracek edited edge metadata.

Updating the patch to also deal with ThreadState cleanup.

Thanks for the explanation of async-signal-safety, I'll take a look at it in another update of this patch.

Looks much better now.

Waiting for the async-signal-safe change, and then we can get this in.

lib/tsan/rtl/tsan_platform_mac.cc
59 ↗(On Diff #39211)

Move this in InitializePlatform.

64 ↗(On Diff #39211)

Add UNLIKELY.

I know you are now concentrated on getting it to work end-to-end, but these are small things that do improve generated code.

lib/tsan/rtl/tsan_rtl.h
423 ↗(On Diff #39211)

This also needs to be !defined(SANITIZER_GO)

I think it's better to do:

#ifndef SANITIZER_GO

#if SANITIZER_MAC
...
#else
...
#endif

#endif

kubamracek updated this revision to Diff 39344.Nov 5 2015, 4:43 AM

Updating patch to implement async-signal-safety.

I'm unsure about the destruction of ThreadState in cur_thread_finalize, does that need to be signal aware as well? But that only happens once in a thread, and it seems it's already too late for a signal handler to do anything reasonable, and we don't want it to resurrect the ThreadState. What is the behavior on Linux with regular __thread ThreadState?

Signal handlers call SigCtx to obtain signal handling context:

static ThreadSignalContext *SigCtx(ThreadState *thr) {

ThreadSignalContext *ctx = (ThreadSignalContext*)thr->signal_ctx;
if (ctx == 0 && !thr->is_dead) {
  ctx = (ThreadSignalContext*)MmapOrDie(sizeof(*ctx), "ThreadSignalContext");
  MemoryResetRange(thr, (uptr)&SigCtx, (uptr)ctx, sizeof(*ctx));
  thr->signal_ctx = ctx;
}
return ctx;

}

If the thread is already considered "dead" (during destruction or destroyed), then we just ignore signals.
This is most likely OK, because we can pretend that the thread just exited before catching the signal.

Add a note to cur_thread_finalize saying that it is not signal-safe (in particular you call unmap first and then clear the fake_tls, if we receive a signal in between, handler will try to access the unmapped ThreadState). We can sort it out later. But this is something to keep in mind (in fact, signals are the most common source of tsan bugs and crashes).

dvyukov accepted this revision.Nov 5 2015, 5:12 AM
dvyukov edited edge metadata.

LGTM after addition of note to cur_thread_finalize and removal of memset.

lib/tsan/rtl/tsan_platform_mac.cc
54 ↗(On Diff #39344)

This is unnecessary with mmap, remove.

This revision is now accepted and ready to land.Nov 5 2015, 5:12 AM
This revision was automatically updated to reflect the committed changes.

Landed in r252159. Thanks for the fast review!