OS X provides atomic functions in libkern/OSAtomic.h. These functions provide atomicity guarantees, and most of them have counterparts with barrier semantics. This patch adds proper TSan support for the following functions (all from libkern/OSAtomic.h):
- OSAtomicAdd32
- OSAtomicAdd32Barrier
- OSAtomicAdd64
- OSAtomicAdd64Barrier
- OSAtomicAnd32
- OSAtomicAnd32Barrier
- OSAtomicAnd32Orig
- OSAtomicAnd32OrigBarrier
- OSAtomicCompareAndSwap32
- OSAtomicCompareAndSwap32Barrier
- OSAtomicCompareAndSwap64
- OSAtomicCompareAndSwap64Barrier
- OSAtomicCompareAndSwapInt
- OSAtomicCompareAndSwapIntBarrier
- OSAtomicCompareAndSwapLong
- OSAtomicCompareAndSwapLongBarrier
- OSAtomicCompareAndSwapPtr
- OSAtomicCompareAndSwapPtrBarrier
- OSAtomicDecrement32
- OSAtomicDecrement32Barrier
- OSAtomicDecrement64
- OSAtomicDecrement64Barrier
- OSAtomicDequeue
- OSAtomicEnqueue
- OSAtomicFifoDequeue
- OSAtomicFifoEnqueue
- OSAtomicIncrement32
- OSAtomicIncrement32Barrier
- OSAtomicIncrement64
- OSAtomicIncrement64Barrier
- OSAtomicOr32
- OSAtomicOr32Barrier
- OSAtomicOr32Orig
- OSAtomicOr32OrigBarrier
- OSAtomicTestAndClear
- OSAtomicTestAndClearBarrier
- OSAtomicTestAndSet
- OSAtomicTestAndSetBarrier
- OSAtomicXor32
- OSAtomicXor32Barrier
- OSAtomicXor32Orig
- OSAtomicXor32OrigBarrier
This is not correct. If you separate the operation itself from the memory barrier, a release barrier needs to precede the operation (while an acquire barrier needs to follow it). And if you split this into acquire + operation + release, it will increase the cost of the operation significantly, as we will lock the sync mutex twice and process the vector clock twice.
I would suggest using __tsan_atomic8_fetch_or/and to implement these bit-flag operations. For the NoBarrier versions you can pass mo_relaxed, and it should be as cheap as calling the REAL function.