This is an archive of the discontinued LLVM Phabricator instance.

[tsan] Implement basic GCD interceptors for OS X
ClosedPublic

Authored by kubamracek on Nov 17 2015, 5:13 AM.

Details

Summary

We need to intercept libdispatch APIs (dispatch_sync, dispatch_async, etc.) to add synchronization between the code that submits the task and the code that gets executed (possibly on a different thread).

This code adds release+acquire semantics for dispatch_sync, dispatch_async, dispatch_after and dispatch_apply (plus their "_f" variants). Added a simple testcase using dispatch_async, I'll add more GCD tests later.

For a program using libdispatch, TSan produces a bunch of false positives (from libxpc, libobjc, etc.). In order to suppress those, I added a few system libraries into a list of default suppressions.

Secondly, dispatch_once is not properly interceptible, because part of it is inlined, see the comment in the patch. To avoid false positives here, this patch ignores all reads and writes coming from the initializer within dispatch_once.

Diff Detail

Event Timeline

kubamracek updated this revision to Diff 40387.Nov 17 2015, 5:13 AM
kubamracek retitled this revision from to [tsan] Implement basic GCD interceptors for OS X.
kubamracek updated this object.
kubamracek added reviewers: dvyukov, kcc, glider, samsonov.
dvyukov edited edge metadata.Nov 17 2015, 5:51 AM

Doh! Don't know whether I need to laugh or cry.
If you mean this one:
http://www.opensource.apple.com/source/libdispatch/libdispatch-339.1.9/dispatch/once.h
It does contain real, harmful data race. Load of predicate on fast path must be an atomic load with acquire memory ordering.

First we need to fix the actual bug in libdispatch.
Then you can do the same thing I did for pthread_once. Namely, reimplement dispatch_once entirely expecting that the inlined fast-path will load acquire load of the predicate address.

dvyukov added inline comments.Nov 17 2015, 6:03 AM
lib/tsan/rtl/tsan_libdispatch_mac.cc
1

s/tsan_platform_mac.cc/tsan_libdispatch_mac.cc/

162

I see the issue with non-instrumented system libraries.
Does it actually happen?

171

Why do you use a code block here, but don't use it in dispatch_once_f?
Can't we simply wrap REAL(dispatch_once) in ThreadIgnoreBegin/End here as well?

lib/tsan/rtl/tsan_suppressions.cc
39 โ†—(On Diff #40387)

Note that if you ignore interceptors, this can also introduce new false positives because of missed synchronization. So if you ignore a library you need to annotate all synchonization this library provides to user code.

test/tsan/Darwin/gcd-simple.mm
20 โ†—(On Diff #40387)

Also print DONE at the end of the program and check that it is printed. It happened several times that interceptors crash badly, but tests pass.

dvyukov added inline comments.Nov 17 2015, 6:36 AM
lib/tsan/rtl/tsan_libdispatch_mac.cc
65

You've added libdispatch to suppressions, and caller_pc inside of SCOPED_INTERCEPTOR_RAW should point into libdispatch, and you call user functions in the scope of SCOPED_INTERCEPTOR_RAW. This means that ScopedInterceptor constructor will disable memory accesses inside of user callback as well and we don't get any race detection.
Does it work at all? Please add a test with a data race inside of the callback.

75

It would be good to synchronize on an address associated with the work item submitted, rather than on the queue address. Synchronization on queue address will introduce lots of parasitic synchronization. Think of a widely used concurrent queue.

I guess we can't use the context for synchronization, because internal allocator memory does not have meta shadow mapped. Don't know whether a code block address can be used for that purpose (and whether it can be casted to uptr at all or not). If block address can't be used, then it worth it to allocate a context with malloc and use it address for synchronization.

81

Are closures for these code blocks dynamically allocated? With malloc?

86

empty line between macros, please

88

Why here you use a code block, but below you use an explicit context and function wrapper?

153

Is RELEASE_ACQUIRE_QUEUE_FUNCTION_INDEX intended to be used more? Currently it is the only call site. Which makes introduction of the helper macro questionable.

Doh! Don't know whether I need to laugh or cry.
If you mean this one:
http://www.opensource.apple.com/source/libdispatch/libdispatch-339.1.9/dispatch/once.h
It does contain real, harmful data race. Load of predicate on fast path must be an atomic load with acquire memory ordering.

First we need to fix the actual bug in libdispatch.
Then you can do the same thing I did for pthread_once. Namely, reimplement dispatch_once entirely expecting that the inlined fast-path will load acquire load of the predicate address.

I talked to the maintainers of libdispatch, and I couldn't convince them that the race is actually harmful (including linking to https://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong). I think they'd be fine with changing the read into a *relaxed* atomic read (because that wouldn't actually change the generated ASM instructions), but that's still not enough for us (it needs to be *acquire*). Even if they did change it into a *load acquire* that TSan would catch, there'll be these issues left:

  1. TSan will miss the *load acquire* in already compiled code (system libraries). This is unfortunately quite common.
  2. It will only work properly in the next OS X release.

So at this point, I'd like to implement something that 1) works on current OS X and 2) doesn't produce false positives in system libraries. If you know of a better way (other than disabling reads/writes in the initializer), I'll be glad to consider it.

I talked to the maintainers of libdispatch, and I couldn't convince them that the race is actually harmful

This is a bug and it must be fixed. Filed:
https://libdispatch.macosforge.org/trac/ticket/2779

So at this point, I'd like to implement something that 1) works on current OS X and 2) doesn't produce false positives in system libraries. If you know of a better way (other than disabling reads/writes in the initializer), I'll be glad to consider it.

We can do:

TSAN_INTERCEPTOR(void, dispatch_once_f, dispatch_once_t *predicate, void *context, dispatch_function_t function) {

SCOPED_INTERCEPTOR_RAW(dispatch_once_f, predicate, context, function);
atomic_uint32_t *a = static_cast<atomic_uint32_t*>(predicate);
u32 v = atomic_load(a, memory_order_acquire);
if (v == 0 && atomic_compare_exchange_strong(a, &v, 1,
                                             memory_order_relaxed)) {
  (*function)(context);
  Release(thr, pc, (uptr)a);
  atomic_store(a, 2, memory_order_release);
} else {
  while (v != 2) {
    internal_sched_yield();
    v = atomic_load(a, memory_order_acquire);
  }
  Acquire(thr, pc, (uptr)a);
}
return 0;

}

Since we don't store ~0 into the predicate, the fast path will never be taken. So we will be able to annotate it even for non-instrumented code. Also, since tsan does not see own atomic stores, the race on predicate won't be reported: the only accesses to it that tsan sees is the load on fast path. Loads don't race.

I talked to the maintainers of libdispatch, and I couldn't convince them that the race is actually harmful

This is a bug and it must be fixed. Filed:
https://libdispatch.macosforge.org/trac/ticket/2779

This bugtracking system seems to be unused. Would you mind filing a bug at https://bugreport.apple.com instead?

Filed radar bug 23575208. There does not seem to be a way to get a link to it....

Filed radar bug 23575208. There does not seem to be a way to get a link to it....

Thanks!

I extracted the dispatch_once part of this patch into a separate revision: http://reviews.llvm.org/D14811

kubamracek edited edge metadata.

Simplifying this patch to only add interceptors for dispatch_sync and dispatch_async (and their _f and _barrier variants). Added more tests to show usage of dispatch_sync and dispatch_async, for cases where we expect no warnings and for cases where TSan finds races.

This patch doesn't need any suppressions (which was a bad idea and didn't work, you were right). The synchronization is now done on malloc'd contexts (separate for each submitted block/callback).

dvyukov accepted this revision.Nov 23 2015, 11:37 AM
dvyukov edited edge metadata.
This revision is now accepted and ready to land.Nov 23 2015, 11:37 AM
This revision was automatically updated to reflect the committed changes.