This patch adds a wrapper for call_once. call_once uses an already-compiled helper, __call_once, whose atomic release is invisible to TSan. To avoid false positives, the interceptor performs an explicit atomic release in the callback wrapper.
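For illustration only, the callback-wrapper idea might be sketched as below. This is not the code from the patch: `OnceSync`, `WrapperCtx`, and `callback_wrapper` are hypothetical names, and a `std::atomic` release store stands in for whatever release the interceptor performs on the once-flag.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the sync state associated with the once-flag.
struct OnceSync {
  std::atomic<int> done{0};
};

// Hypothetical context passed through the helper's opaque callback argument.
struct WrapperCtx {
  void (*user_fn)(void *);
  void *user_arg;
  OnceSync *sync;
};

// Runs the user's callback, then publishes its writes with an explicit
// release that instrumented code can observe.
inline void callback_wrapper(void *p) {
  auto *ctx = static_cast<WrapperCtx *>(p);
  assert(ctx->user_fn != nullptr);
  ctx->user_fn(ctx->user_arg);
  ctx->sync->done.store(1, std::memory_order_release);
}
```

The point of the sketch is the ordering: the release comes after the user function returns, so everything the callback wrote is covered by it.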
The test passes only once a real race in libc++ is fixed; that patch is at https://reviews.llvm.org/D24028.
Do we need an acquire here?
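Consider two threads that both take this slow path. One calls the user function and then does the release. The other waits on some internal synchronization and returns. What is missing is a release->acquire edge between the first thread and the second: without an acquire in the second thread, TSan sees nothing that orders the callback's writes before the second thread's subsequent reads.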
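A minimal model of the scenario, assuming the fix is an acquire performed by every caller after the helper returns. The names `OnceModel`, `flag_sync`, `call_once_model`, and `run_two_threads` are hypothetical. `std::call_once` plays the role of the precompiled helper whose internal synchronization the tool cannot see, and the `std::atomic` operations stand in for the release and acquire the interceptor would issue.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical model: flag_sync stands in for the happens-before edge
// tracked on the once-flag address.
struct OnceModel {
  std::once_flag flag;
  std::atomic<bool> flag_sync{false};
  int payload = 0;  // plain, non-atomic data written by the user function
};

// Every caller goes through here, whether it runs the callback or waits.
inline int call_once_model(OnceModel &m) {
  std::call_once(m.flag, [&m] {
    m.payload = 42;                                      // user function
    m.flag_sync.store(true, std::memory_order_release);  // release in the wrapper
  });
  // The acquire in question: pairs with the release above, so the waiting
  // thread is ordered after the callback's writes.
  bool seen = m.flag_sync.load(std::memory_order_acquire);
  assert(seen);
  return m.payload;
}

// Two threads race into the slow path; one runs the callback, one waits.
inline int run_two_threads() {
  OnceModel m;
  int a = 0, b = 0;
  std::thread t1([&] { a = call_once_model(m); });
  std::thread t2([&] { b = call_once_model(m); });
  t1.join();
  t2.join();
  return a + b;
}
```

In this model the waiting thread's read of `payload` is ordered by the acquire load, which is the edge the comment above says is missing.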