This is an archive of the discontinued LLVM Phabricator instance.

[tsan] Intercept libcxx __release_shared to avoid false positive with weak_ptrs and destructors in C++
ClosedPublic

Authored by kubamracek on Jun 22 2016, 7:14 AM.

Details

Summary

There is a “well-known” TSan false positive when using C++ weak_ptr/shared_ptr and code in destructors, e.g. described at https://llvm.org/bugs/show_bug.cgi?id=22324. The “standard" solution is to build and use a TSan-instrumented version of libcxx, which is not trivial for end-users. This patch tries a different approach (on OS X): It adds an interceptor for the specific function in libc++.dylib, which implements the atomic operation that needs to be visible to TSan.

The API/ABI of libc++.dylib is stable on OS X, so adding this interceptor is compatible with all OS versions. I went through the sources of libcxx and this seems to be the only function that needs to be intercepted.

Diff Detail

Repository
rL LLVM

Event Timeline

kubamracek updated this revision to Diff 61547.Jun 22 2016, 7:14 AM
kubamracek retitled this revision from to [tsan] Intercept libcxx __release_shared to avoid false positive with weak_ptrs and destructors in C++.
kubamracek updated this object.
kubamracek added reviewers: dvyukov, kcc, glider.
kubamracek added a project: Restricted Project.
dvyukov added inline comments.Jun 22 2016, 11:21 AM
lib/tsan/rtl/tsan_interceptors_mac.cc
286 ↗(On Diff #61547)

Why we don't need to intercept release of weak references?
Is it that last shared references releases weak, or vice versa?

289 ↗(On Diff #61547)

I think Acquire should happen _after_ the actual release to synchronize with all preceding Release's.
Normally, you can't touch the object after release in all but the last thread, because the object can already be freed (unmapped). So ideally we need to determine the last thread and do Acquire only in it.
But I wonder if tsan Release is OK -- it won't touch object memory.

kubamracek added inline comments.Jun 23 2016, 6:08 AM
lib/tsan/rtl/tsan_interceptors_mac.cc
286 ↗(On Diff #61547)

As far as I know, the destructor of the object is only called when the last shared reference is destroyed. Weak references don’t matter, as they don’t run the destructor. The only issue I’m trying to solve here is a false positive when the destructor of a user’s object is called when the last shared reference is freed. I’m not aware of other false positives with C++ shared/weak pointers.

289 ↗(On Diff #61547)

I’m not sure what you’re suggesting here -- are you saying that Acquire should be at the end of the interceptor? Note that o here isn’t the user’s object, nor the shared_ptr, it’s the control block, which is heap-allocated. I actually think that even in the last thread (last shared reference is destroyed), o is already destroyed when the REAL() call finishes (the object destroys itself).

Secondly, I remember accidentally calling Release() on just-freed memory, which resulted in subtle crashes and many wasted hours of debugging. I was actually thinking of implementing a DCHECK that would trigger whenever Release or Acquire is called on a deallocated heap object.

What is the problem you’re seeing here with this patch? Is it about two threads doing Acquires first and only then doing Releases, in which case we’d still report a false positive? Then it sounds we need an atomic Acquire+Release, which we could solve with some atomic CAS loop or, worst case, a spinlock.

Suggestions?

dvyukov added inline comments.Jun 23 2016, 6:24 AM
lib/tsan/rtl/tsan_interceptors_mac.cc
289 ↗(On Diff #61547)

Is it about two threads doing Acquires first and only then doing Releases, in which case we’d still report a false positive?

Yes.
I see there are some other helper functions that return true/false and don't destroy the object. Is it possible to intercept any of them?

kubamracek added inline comments.Jun 23 2016, 6:28 AM
lib/tsan/rtl/tsan_interceptors_mac.cc
289 ↗(On Diff #61547)

I don’t think so, unfortunately. The other functions get inlined into __shared_weak_count::__release_shared and they’re not called directly.

dvyukov added inline comments.Jun 23 2016, 7:17 AM
lib/tsan/rtl/tsan_interceptors_mac.cc
289 ↗(On Diff #61547)

The simplest and the most reliable thing would be:

u64 shared_ptr_sync[kSomePrime];

uptr sync = (uptr)&shared_ptr_sync[hash(o) % kSomePrime];
Release(thr, pc, sync);
SCOPED_TSAN_INTERCEPTOR_USER_CALLBACK_START();
REAL(_ZNSt3__119__shared_weak_count16__release_sharedEv)(o);
SCOPED_TSAN_INTERCEPTOR_USER_CALLBACK_END();
Acquire(thr, pc, sync);

Note that kSomePrime should not be too large, because eventually every slot will contain maximally-sized vector clock which takes 8*num_threads bytes.

Also comment it all.

kubamracek added inline comments.Jun 23 2016, 7:30 AM
lib/tsan/rtl/tsan_interceptors_mac.cc
289 ↗(On Diff #61547)

I’m probably missing some point. This doesn’t work in the case when two threads release a shared ptr: T1 calls Release(), T2 calls Release(), T1 calls REAL(), T2 calls REAL(), which is the last reference, it gets destroyed, the user’s object destructor is called and we report a false positive (because there hasn’t been any acquires at all). If I put a Sleep(1) before Acquire(), the test case I provided reports the false positive reliably.

dvyukov edited edge metadata.Jun 23 2016, 7:42 AM

You are right.

Then we can try:

Mutex shared_ptr_sync[kSomePrime];

Mutex *mtx = &shared_ptr_sync[hash(o) % kSomePrime];
mtx.Lock();
Acquire(thr, pc, o);
Release(thr, pc, o);
SCOPED_TSAN_INTERCEPTOR_USER_CALLBACK_START();
REAL(_ZNSt3__119__shared_weak_count16__release_sharedEv)(o);
SCOPED_TSAN_INTERCEPTOR_USER_CALLBACK_END();
mtx.Unlock();

But I am scared by potential consequences of running user code under internal mutex...

You are right.

Then we can try:

Mutex shared_ptr_sync[kSomePrime];

Mutex *mtx = &shared_ptr_sync[hash(o) % kSomePrime];
mtx.Lock();
Acquire(thr, pc, o);
Release(thr, pc, o);
SCOPED_TSAN_INTERCEPTOR_USER_CALLBACK_START();
REAL(_ZNSt3__119__shared_weak_count16__release_sharedEv)(o);
SCOPED_TSAN_INTERCEPTOR_USER_CALLBACK_END();
mtx.Unlock();

But I am scared by potential consequences of running user code under internal mutex...

Can we unlock before calling the user code? It sounds to me we just need the Acquire+Release to be atomic.

No, we need them to be atomic with free.
Consider:
T1 does Acq+Rel
T2 does Acq+Rel
T2 releases reference
T1 releases reference and frees the object
free is not synchronized with T2

No, we need them to be atomic with free.
Consider:
T1 does Acq+Rel
T2 does Acq+Rel
T2 releases reference
T1 releases reference and frees the object
free is not synchronized with T2

Ah, you’re right.

Do we have a recursive mutex in the runtime? I’m afraid destroying another shared_ptr from the destructor is quite likely.

Uh, if destructor is called there, then it won't work. Recursive mutexes won't help. A destructor can spawn another thread and wait for its completion, while the other thread can destroy other shared_ptr's which in turn will try to acquire the same mutex.

Uh, if destructor is called there, then it won't work. Recursive mutexes won't help. A destructor can spawn another thread and wait for its completion, while the other thread can destroy other shared_ptr's which in turn will try to acquire the same mutex.

:(

Okay, another option: Reimplementing the __release_shared method without calling REAL(). Either we could cast it to a carefully-crafted C++ class that’s set up in the same way as __shared_weak_count is, or we could do some extremely ugly pointer arithmetics & vtable magic. But it should work, because the ABI is (and needs to be) stable. Opinions?

I can only think of other ugly solutions.
But first I need to ask: have you considered bundling tsan-instrumented libc++ with clang? It would benefit asan/tsan, and is a requirement for msan.

I can only think of other ugly solutions.
But first I need to ask: have you considered bundling tsan-instrumented libc++ with clang? It would benefit asan/tsan, and is a requirement for msan.

Yes. I'm able to build it and use it. But distributing it to users is logistically extremely problematic. The original library ships with the OS, but we cannot ship an instrumented version as well, because that depends on the version of the instrumentation. Having the dylib in the toolchain would yet again complicate a lot of use cases, like deploying to different machines (e.g. for testing).

Having an interceptor (or a few of them), even when it's ugly and complex, is by far the easiest option for us.

kubamracek edited edge metadata.

Updating the patch to re-implement the whole method using.

I can't make my mind. I well understand that saying users "build your own tsan-instrumented libcxx" is close to having no answer at all. So I guess if that's the solution you want to maintain, I am OK with it.

Does it make sense to add a runtime flag that turns the interceptor off for emergency cases?

Also please document all alternatives that we've considered and why they don't work.

I would also write some kind of a stress test that creates/destroys shared_ptrs from different threads and takes/releases strong/weak references.

One other alternative is to insert empty non-inlinable hook functions into libcxx so that tsan can intercept them. But that will slow down normal execution, so I guess people will complain.

kubamracek updated this revision to Diff 61771.Jun 24 2016, 3:56 AM

Updating patch, adding comment, adding a flag to disable the interceptor, adding a test for recursive shared_ptr destruction, adding a stress test.

dvyukov accepted this revision.Jun 24 2016, 2:09 PM
dvyukov edited edge metadata.

LGTM if you agree to change acq_rel to release

lib/tsan/rtl/tsan_interceptors_mac.cc
309 ↗(On Diff #61771)

Since we are reimplementing the whole function we can actually do better than the original. The problem (for tsan) with mo_acq_rel is that all threads synchronize with each other, while only the last thread needs to synchronize with all other threads. So I would suggest to do:

if (__tsan_atomic64_fetch_add(&o->shared_owners, -1, mo_release) == 0) {
  Acquire(thr, pc, &o->shared_owners);
  o->on_zero_shared();
  if (__tsan_atomic64_fetch_add(&o->shared_weak_owners, -1, mo_release) == 0)
    Acquire(thr, pc, &o->shared_weak_owners);
    o->on_zero_shared_weak();
  }
}
This revision is now accepted and ready to land.Jun 24 2016, 2:09 PM

... and tests pass after that

This revision was automatically updated to reflect the committed changes.