On OS X, using __thread (thread-local variables, TLV) to store the ThreadState objects that TSan relies on in all interceptors and in memory instrumentation causes several problems:
- During early process startup, interceptors are called (from dyld, Libc, etc.) before TLV is available at all, and any access to it crashes.
- During early initialization of a new thread, interceptors are called before that thread's TLV has been initialized. The TLV would be lazily initialized on first access, but that initialization itself calls one of the intercepted functions (pthread_mutex_lock), creating a circular dependency.
- During thread teardown, the TLV is destroyed (deallocated), but interceptors are still called on that thread, and their accesses resurrect the TLV through lazy initialization.
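The teardown problem in the last item can be reproduced in miniature with POSIX thread-specific data (pthread_key_create/pthread_getspecific), which has the same finalization issue. The sketch below uses TSD instead of __thread because TSD teardown is portable and observable. All names (ThreadState, get_state, intercepted_call, state_destructor, worker, init_key) are hypothetical. POSIX resets a key's value to NULL before running its destructor, so a lazy getter invoked from teardown code simply allocates a fresh state. The resurrection is capped so the demo terminates even on implementations that would keep re-running destructors.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-thread state, stored under a pthread key. */
typedef struct { int dummy; } ThreadState;

static pthread_key_t state_key;
static int destructor_calls; /* how many times the destructor ran */
static int resurrections;    /* states allocated during teardown */
static int in_teardown;      /* set by the destructor (single worker only) */

/* Lazy getter, analogous to a TLV that is initialized on first access. */
static ThreadState *get_state(void) {
  ThreadState *s = pthread_getspecific(state_key);
  if (s == NULL) {
    s = calloc(1, sizeof *s);
    if (s == NULL)
      abort();
    int rc = pthread_setspecific(state_key, s);
    assert(rc == 0);
    (void)rc;
    if (in_teardown)
      resurrections++;
  }
  return s;
}

/* Hypothetical interceptor that still runs while the thread is exiting. */
static void intercepted_call(void) { (void)get_state(); }

static void state_destructor(void *value) {
  destructor_calls++;
  free(value);
  in_teardown = 1;
  /* The key is already NULL here, so this re-creates the state.
     Capped at two resurrections so the loop always terminates. */
  if (destructor_calls < 3)
    intercepted_call();
}

static void *worker(void *arg) {
  (void)arg;
  (void)get_state(); /* normal first use */
  return NULL;       /* thread exit runs state_destructor */
}

static int init_key(void) {
  return pthread_key_create(&state_key, state_destructor);
}
```

After the worker is joined, every destructor call except the last one has resurrected the state. Without the cap, POSIX allows an implementation either to give up after PTHREAD_DESTRUCTOR_ITERATIONS rounds (leaking the last state) or to loop forever.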
There are several possible workarounds. One is to use pthread_key_create and pthread_getspecific, but that still has the thread finalization issue. This patch takes a different approach (originally proposed by Kostya): pthread_self() is always available and reliable and returns a valid pointer to memory, so we use the shadow memory of that pointer as a "poor man's TLV". No user code should ever read or write this internal libpthread structure, so its shadow memory is never needed for tracking user accesses and is safe to use for this purpose. We simply allocate the ThreadState object lazily and store its pointer there.
To make this work, we need to store the main thread's ThreadState separately, because it needs to be available even before the shadow memory is initialized. Note that the current patch never deallocates the ThreadState objects and simply leaks them, which I'll fix in a subsequent patch.
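The lookup described in the last two paragraphs can be sketched roughly as follows. This is a simplified stand-in, not the patch's code: real TSan computes the slot directly as MemToShadow(pthread_self()), whereas this sketch probes a small static table keyed by the pthread_self() value. It also replaces the OS X-only pthread_main_np() with a portable pthread_equal() check against a recorded main thread. It assumes pthread_t converts to a nonzero integer (true for pointer-based pthread_t on OS X and for glibc). ShadowSlot, slot_for, init_main_thread, get_thread_state and report_state are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for TSan's ThreadState. */
typedef struct ThreadState {
  uintptr_t owner; /* pthread_self() value of the thread using it */
} ThreadState;

/* Main thread: static storage, usable before any shadow is set up. */
static ThreadState main_thread_state;
static pthread_t main_thread;

/* Stand-in for shadow memory: a table keyed by the pthread_self() value. */
#define SHADOW_SLOTS 256
typedef struct {
  _Atomic uintptr_t key; /* 0 = free, else the owning pthread_self() value */
  ThreadState *state;    /* only ever touched by the owning thread */
} ShadowSlot;
static ShadowSlot shadow[SHADOW_SLOTS];

/* Call once on the main thread before creating other threads. */
static void init_main_thread(void) {
  main_thread = pthread_self();
  main_thread_state.owner = (uintptr_t)main_thread;
}

/* Finds or claims the slot for `self`; claiming uses CAS, so no lock. */
static ThreadState **slot_for(uintptr_t self) {
  size_t start = (size_t)(self >> 4) % SHADOW_SLOTS;
  for (size_t i = 0; i < SHADOW_SLOTS; i++) {
    ShadowSlot *slot = &shadow[(start + i) % SHADOW_SLOTS];
    uintptr_t key = atomic_load(&slot->key);
    if (key == self)
      return &slot->state;
    uintptr_t expected = 0;
    if (key == 0 && atomic_compare_exchange_strong(&slot->key, &expected, self))
      return &slot->state;
  }
  abort(); /* table full: slots are never released in this sketch */
}

static ThreadState *get_thread_state(void) {
  pthread_t self = pthread_self();
  if (pthread_equal(self, main_thread)) /* pthread_main_np() on OS X */
    return &main_thread_state;
  ThreadState **slot = slot_for((uintptr_t)self);
  if (*slot == NULL) { /* lazy allocation on first use */
    *slot = calloc(1, sizeof(ThreadState));
    if (*slot == NULL)
      abort();
    (*slot)->owner = (uintptr_t)self;
  }
  assert((*slot)->owner == (uintptr_t)self);
  return *slot; /* never freed, like the leaked states in this patch */
}

/* Usage helper: reports this thread's state, or NULL if two lookups differ. */
static void *report_state(void *out) {
  ThreadState *first = get_thread_state();
  ThreadState *second = get_thread_state();
  *(ThreadState **)out = (first == second) ? first : NULL;
  return NULL;
}
```

Because only the owning thread ever matches a slot's key, the ThreadState pointer itself needs no synchronization; the CAS only arbitrates which thread claims a free slot. In the real shadow mapping each pthread_self() address has its own shadow cell, so the probing loop disappears and the hot path reduces to the three calls named below.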
There are some performance implications here, but note that the hot path contains only calls to pthread_main_np, pthread_self and MemToShadow. At least on OS X, pthread_self is a single memory access (via the %gs segment) plus a return, and pthread_main_np adds one more memory access plus 2 arithmetic operations. So I don't expect this implementation to hurt performance much.
(This is part of an effort to port TSan to OS X, and it's one of the very first steps. Don't expect TSan on OS X to actually work or pass tests at this point.)