Reimplement dispatch_once in an interceptor to solve these issues that may produce false positives with TSan on OS X:
- there is a racy load inside an inlined part of dispatch_once,
- the fast path in dispatch_once doesn't perform an acquire load, so we don't properly synchronize the initialization and subsequent uses of whatever is initialized,
- dispatch_once is already used in a lot of already-compiled code, so TSan doesn't see the inlined fast-path.
This patch uses a trick to avoid ever taking the fast path (by never storing ~0 into the predicate), which means the interceptor will always be called even from already-compiled code. Within the interceptor, our own atomic reads and writes are not written into shadow cells, so the race in the inlined part is not reported (because the accesses are only loads).