In tsan, release sequences are treated as if they are never broken and continue indefinitely, which causes some data races to be missed. On the other hand, tsan does not recognize fence semantics and their role in synchronization, which causes it to produce false positives.
See more in https://www.doc.ic.ac.uk/~afd/homepages/papers/pdfs/2017/POPL.pdf
This patch modifies the vector clock algorithm according to the paper. Correctness is verified by several unit tests.
Diff Detail
- Repository
- rCRT Compiler Runtime
- Build Status
Buildable 18377 Build 18377: arc lint + arc unit
Event Timeline
Hi Yorov,
Sorry for the delay, I was on vacation and now have a backlog of work.
Thanks for working on this.
This needs some additional work before we can merge it. High-level notes:
- Fences and release sequences need to be separated into individual patches. I would suggest starting with fences, as it looks simpler.
- Tsan is used on very large programs with thousands of threads and tens of millions of sync objects. Performance, and especially memory consumption, of synchronization handling is crucial. Since tsan is already widely deployed, any non-constant overheads need to be controlled by flags. For example, any fence handling needs to be completely skipped if the flag is not set. The default value for these flags is a hard question, but we can decide that later.
- Vectors should not be used inside of sync objects. There are 2 problems with Vectors:
- they can use memory suboptimally, which is unacceptable for sync objects since they can consume tens of gigabytes
- our memory allocator does not reuse memory between different size classes, which leads to O(N^2) memory consumption as these Vectors slowly grow during program startup and eat memory in all consecutive size classes; that's the reason the current clocks are built as chunked data structures with constant-size blocks
- For fences, check out what we did in KTSAN to reduce their overhead (acquire_active and release_active):
https://github.com/google/ktsan/blob/tsan/mm/ktsan/sync_atomic.c#L6
I think we need something similar here. The actual threshold for discarding fence effects should be controllable with a flag.
- For release fences it's possible to get their overhead down to almost zero in the common case. The idea is as follows. In the common case there are no acquire operations between a release fence and the corresponding relaxed store, so we don't need to memorize the whole thread vector clock. Instead we can memorize only the current thread time. Then, when we need to turn a relaxed store into a release operation, we use the current thread vector clock (it did not change, because we did no acquire operations), except for the current thread time, for which we use the previously saved value. This makes a release fence cost O(1) instead of O(N).
But if there is an acquire operation in between, then we need to save the whole vector clock at that point. However, if we also do (4), then usually there won't be an acquire fence while a release fence is active.
lib/tsan/rtl/tsan_clock.cc

- Line 185: This comment and "Check if we need to resize dst" do not seem to add much value. I would rather see more extensive comments for the more complex parts of the change.
- Line 186: This code base does not use {} around if/for blocks if they contain a single statement. Please fix here and below.
- Line 274: The TODOs need to be resolved before commit, either by fixing the code or by removing the TODO if we are not going to fix the code.
- Line 349: revert
- Line 383: Same here: either resolve it or remove it.
- Line 396: We use 2 styles for comments. So if you do this as a full line, this needs to be: // Unshare before changing dst.
- Line 443: Empty line between functions, please.

lib/tsan/rtl/tsan_clock.h

- Line 180: I think we can use kMaxTid here.

lib/tsan/rtl/tsan_interface_atomic.cc

- Line 227: No dead code, please.

test/CMakeLists.txt

- Line 18: I don't think this needs to be here, as no other tests are listed here.
> I think we can use kMaxTid here.

kMaxTidInClock is kMaxTid*2; that's required to detect accesses to freed memory at low cost. But we should not access the second half of the clock here.