This is an archive of the discontinued LLVM Phabricator instance.

tsan: model atomic read for failing CAS
AcceptedPublic

Authored by dvyukov on Apr 27 2022, 1:34 AM.

Download Raw Diff

Details

Reviewers

melver
vitalybuka

Commits

rG2fec52a40261: tsan: model atomic read for failing CAS

Summary

See the added test and https://github.com/google/sanitizers/issues/1520
for the description of the problem.
The standard says that failing CAS is a memory load only,
model it as such to avoid false positives.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

dvyukov created this revision.Apr 27 2022, 1:34 AM

Herald added a project: Restricted Project. · View Herald TranscriptApr 27 2022, 1:34 AM

dvyukov requested review of this revision.Apr 27 2022, 1:34 AM

Herald added a project: Restricted Project. · View Herald TranscriptApr 27 2022, 1:34 AM

Herald added a subscriber: Restricted Project. · View Herald Transcript

This is tricky, I am not 100% sure (1) it's the right thing to do, (2) that destruction safety problem (see the comment) will hit us or not (maybe there are some other cases I don't see).

In D124507#3476655, @dvyukov wrote:

This is tricky, I am not 100% sure (1) it's the right thing to do, (2) that destruction safety problem (see the comment) will hit us or not (maybe there are some other cases I don't see).

also (3) do I do correct reading of the standard?
(4) is it correct to extend this guarantee to the __atomic builtins?

Harbormaster completed remote builds in B161548: Diff 425439.Apr 27 2022, 1:54 AM

In D124507#3476656, @dvyukov wrote:

In D124507#3476655, @dvyukov wrote:

This is tricky, I am not 100% sure (1) it's the right thing to do, (2) that destruction safety problem (see the comment) will hit us or not (maybe there are some other cases I don't see).

also (3) do I do correct reading of the standard?
(4) is it correct to extend this guarantee to the __atomic builtins?

The question I wonder is: do we need 100% correctness at cost of complexity and/or other issues?

Several cons:

The fact we haven't encountered this on any realistic large codebase yet tells me that the example in the Github issue is a contrived example. Are there any real codebases where this false positive is a problem?
Mixing atomic and non-atomic ops should be discouraged.
If someone deliberately does mix plain and atomic ops and runs into this case, I'm assuming it's some one-off complex algorithm and the author could also add no_sanitize("thread").
In most algorithms, the CAS will eventually also do a write, so the cases where it will always be a false positive is extremely small.
(2) might be a more real problem.

One pro:

More correct interpretation of the standard.

Assuming the implementation can't be simpler and it causes other issues, pragmatically I would therefore avoid supporting this.

melver accepted this revision.Apr 28 2022, 3:58 AM

melver added inline comments.

compiler-rt/lib/tsan/rtl/tsan_interface_atomic.cpp
436–441	This leads me to think this is a real problem, but because of the memory ordering requirements it's not (AFAIK, and as just discussed in chat).

This revision is now accepted and ready to land.Apr 28 2022, 3:58 AM

Closed by commit rG2fec52a40261: tsan: model atomic read for failing CAS (authored by dvyukov). · Explain WhyMay 2 2022, 3:58 AM

This revision was automatically updated to reflect the committed changes.

dvyukov added a commit: rG2fec52a40261: tsan: model atomic read for failing CAS.

I've merged this. In few weeks it will pass more extensive testing, if there will be unforeseen issues, I will revert this. But at least we will know on what code it fails and add a test.

sanitizer-x86_64-linux-autoconf failed:

https://lab.llvm.org/buildbot/#/builders/70/builds/21206

[30/31] Running ThreadSanitizer tests

Testing: 442 tests, 80 workers --

command timed out: 1200 seconds without output running [b'python', b'../sanitizer_buildbot/sanitizers/zorg/buildbot/builders/sanitizers/buildbot_selector.py'], attempting to kill
process killed by signal 9

Not sure how this new test could have caused the timeout...

vitalybuka added a reverting change: rGeeccdd318d25: Revert "tsan: model atomic read for failing CAS".May 2 2022, 10:28 PM

vitalybuka reopened this revision.May 2 2022, 10:29 PM

This revision is now accepted and ready to land.May 2 2022, 10:29 PM

In D124507#3485363, @dvyukov wrote:

sanitizer-x86_64-linux-autoconf failed:

https://lab.llvm.org/buildbot/#/builders/70/builds/21206

[30/31] Running ThreadSanitizer tests

Testing: 442 tests, 80 workers --

command timed out: 1200 seconds without output running [b'python', b'../sanitizer_buildbot/sanitizers/zorg/buildbot/builders/sanitizers/buildbot_selector.py'], attempting to kill
process killed by signal 9

Not sure how this new test could have caused the timeout...

It's likely deadlock or other hang.

Yes, debug versions of compare_exchange.cpp and atomic_norace2.cpp hang with:

ThreadSanitizer: internal deadlock: can't lock Slots under SyncVar mutex

at:

(gdb) bt
#0 sanitizer::FutexWait (p=p@entry=0x7ffff7fa80a8, cmp=cmp@entry=0) at sanitizer_common/sanitizer_linux.cpp:678
#1 0x000000000042d9ea in sanitizer::Semaphore::Wait (this=0x7ffff7fa80a8) at sanitizer_common/sanitizer_mutex.cpp:35
#2 0x000000000042be7c in sanitizer::Mutex::Lock (this=this@entry=0x7ffff7fa8098) at sanitizer_common/sanitizer_mutex.h:196
#3 0x000000000043d6e0 in sanitizer::GenericScopedLock<sanitizer::Mutex>::GenericScopedLock (mu=0x7ffff7fa8098, this=<optimized out>) at sanitizer_common/sanitizer_mutex.h:383
#4 sanitizer::Symbolizer::SymbolizePC (this=0x7ffff7fa8028, addr=4815253) at sanitizer_common/sanitizer_symbolizer_libcdep.cpp:86
#5 0x000000000043be57 in sanitizer::(anonymous namespace)::StackTraceTextPrinter::ProcessAddressFrames (this=this@entry=0x7ffff3efbe88, pc=4815253) at sanitizer_common/sanitizer_stacktrace_libcdep.cpp:36
#6 0x000000000043bd79 in sanitizer::StackTrace::PrintTo (this=this@entry=0x7ffff3efbf18, output=output@entry=0x7ffff3efbee8) at sanitizer_common/sanitizer_stacktrace_libcdep.cpp:109
#7 0x000000000043c101 in sanitizer::StackTrace::Print (this=0x7ffff3efbf18) at sanitizer_common/sanitizer_stacktrace_libcdep.cpp:132
#8 0x00000000004a28f3 in sanitizer::PrintMutexPC (pc=4815254) at tsan/rtl/tsan_rtl.cpp:1064
#9 0x000000000042df9b in sanitizer::InternalDeadlockDetector::Lock (this=0x7ffff3efccc0, type=type@entry=12, pc=pc@entry=4498873) at sanitizer_common/sanitizer_mutex.cpp:177
#10 0x000000000042de05 in sanitizer::CheckedMutex::LockImpl (this=<optimized out>, pc=0) at sanitizer_common/sanitizer_mutex.cpp:218
#11 0x000000000042bdb2 in sanitizer::CheckedMutex::Lock (this=0x1868b80 <tsan::ctx_placeholder+10518912>) at sanitizer_common/sanitizer_mutex.h:129
#12 sanitizer::Mutex::Lock (this=this@entry=0x1868b80 <tsan::ctx_placeholder+10518912>) at sanitizer_common/sanitizer_mutex.h:167
#13 0x000000000044a5b9 in sanitizer::GenericScopedLock<sanitizer::Mutex>::GenericScopedLock (mu=0x1868b80 <tsan::ctx_placeholder+10518912>, this=<optimized out>) at tsan/rtl/../../sanitizer_common/sanitizer_mutex.h:383
#14 tsan::TraceAcquire<tsan::EventFunc> (thr=0x7ffff3efcf00, ev=<optimized out>) at tsan/rtl/tsan_rtl.h:659
#15 tsan::TryTraceFunc (thr=0x7ffff3efcf00, pc=4411429) at tsan/rtl/tsan_rtl.h:702
#16 tsan::FuncEntry (thr=0x7ffff3efcf00, pc=4411429) at tsan/rtl/tsan_rtl.h:733
#17 tsan::ScopedInterceptor::ScopedInterceptor (this=this@entry=0x7ffff3efc010, thr=thr@entry=0x7ffff3efcf00, fname=<optimized out>, pc=4411429) at tsan/rtl/tsan_interceptors_posix.cpp:251
#18 0x0000000000452cf9 in interceptor_dl_iterate_phdr (cb=0x435030 <sanitizer::dl_iterate_phdr_cb(dl_phdr_info*, unsigned long, void*)>, data=0x7ffff3efc040) at tsan/rtl/tsan_interceptors_posix.cpp:2300
#19 0x0000000000435025 in sanitizer::ListOfModules::init (this=0x7ffff7fa8050) at sanitizer_common/sanitizer_linux_libcdep.cpp:691
#20 0x000000000043d7b3 in sanitizer::Symbolizer::RefreshModules (this=0x7ffff7fa8028) at sanitizer_common/sanitizer_symbolizer_libcdep.cpp:185
#21 sanitizer::Symbolizer::FindModuleForAddress (this=this@entry=0x7ffff7fa8028, address=address@entry=4815253) at sanitizer_common/sanitizer_symbolizer_libcdep.cpp:204
#22 0x000000000043d6fb in sanitizer::Symbolizer::SymbolizePC (this=0x7ffff7fa8028, addr=4815253) at sanitizer_common/sanitizer_symbolizer_libcdep.cpp:88
#23 0x000000000043be57 in sanitizer::(anonymous namespace)::StackTraceTextPrinter::ProcessAddressFrames (this=this@entry=0x7ffff3efc158, pc=4815253) at sanitizer_common/sanitizer_stacktrace_libcdep.cpp:36
#24 0x000000000043bd79 in sanitizer::StackTrace::PrintTo (this=this@entry=0x7ffff3efc1e8, output=output@entry=0x7ffff3efc1b8) at sanitizer_common/sanitizer_stacktrace_libcdep.cpp:109
#25 0x000000000043c101 in sanitizer::StackTrace::Print (this=0x7ffff3efc1e8) at sanitizer_common/sanitizer_stacktrace_libcdep.cpp:132
#26 0x00000000004a28f3 in sanitizer::PrintMutexPC (pc=4815254) at tsan/rtl/tsan_rtl.cpp:1064
#27 0x000000000042df9b in sanitizer::InternalDeadlockDetector::Lock (this=0x7ffff3efccc0, type=type@entry=12, pc=pc@entry=4876568) at sanitizer_common/sanitizer_mutex.cpp:177
#28 0x000000000042de05 in sanitizer::CheckedMutex::LockImpl (this=<optimized out>, pc=0) at sanitizer_common/sanitizer_mutex.cpp:218
#29 0x000000000042bdb2 in sanitizer::CheckedMutex::Lock (this=0x1868b80 <tsan::ctx_placeholder+10518912>) at sanitizer_common/sanitizer_mutex.h:129
#30 sanitizer::Mutex::Lock (this=this@entry=0x1868b80 <tsan::ctx_placeholder+10518912>) at sanitizer_common/sanitizer_mutex.h:167
#31 0x00000000004a6918 in sanitizer::GenericScopedLock<sanitizer::Mutex>::GenericScopedLock (mu=0x1868b80 <tsan::ctx_placeholder+10518912>, this=<optimized out>) at tsan/rtl/../../sanitizer_common/sanitizer_mutex.h:383
#32 tsan::TraceAcquire<tsan::EventAccess> (thr=0x7ffff3efcf00, ev=<optimized out>) at tsan/rtl/tsan_rtl.h:659
#33 tsan::TryTraceMemoryAccess (thr=0x7ffff3efcf00, pc=5011256, addr=26177776, size=8, typ=3) at tsan/rtl/tsan_rtl_access.cpp:25
#34 tsan::MemoryAccess (thr=thr@entry=0x7ffff3efcf00, pc=pc@entry=5011256, addr=addr@entry=26177776, size=size@entry=8, typ=typ@entry=3) at tsan/rtl/tsan_rtl_access.cpp:449
#35 0x00000000004979ce in AtomicCAS<unsigned long long> (thr=thr@entry=0x7ffff3efcf00, pc=5011256, a=a@entry=0x18f70f0 <_nodes>, c=c@entry=0x7ffff3efc390, v=v@entry=0, mo=tsan::mo_relaxed, fmo=tsan::mo_relaxed) at tsan/rtl/tsan_interface_atomic.cpp:462
#36 0x0000000000498913 in AtomicCAS<unsigned long long> (thr=0x7ffff3efcf00, pc=128, a=0x18f70f0 <_nodes>, c=135257110085632, v=0, mo=tsan::mo_relaxed, fmo=tsan::mo_relaxed) at tsan/rtl/tsan_interface_atomic.cpp:479
#37 tsan_atomic64_compare_exchange_val (a=0x18f70f0 <_nodes>, c=0, v=0, mo=tsan::mo_relaxed, fmo=tsan::mo_relaxed) at tsan/rtl/tsan_interface_atomic.cpp:853
#38 0x00000000004c7738 in std::atomic_base<node*>::compare_exchange_strong (m1=<optimized out>, this=<optimized out>, p1=<optimized out>, p2=<optimized out>, m2=<optimized out>) at /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/atomic_base.h:880
#39 std::atomic<node*>::compare_exchange_weak (this=this@entry=0x18f70f0 <_nodes>, p1=@0x7ffff3efc408: 0x0, p2=p2@entry=0x0, m1=m1@entry=std::memory_order_release, m2=m2@entry=std::memory_order_relaxed) at /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/atomic:589
#40 0x00000000004c7263 in f2<0> (i=i@entry=0, mo=mo@entry=std::memory_order_release, fmo=std::memory_order_relaxed) at ../../../test/tsan/compare_exchange.cpp:25
#41 0x00000000004c785b in std::invoke_impl<void, void (*)(int, std::memory_order, std::memory_order), unsigned int, std::memory_order, std::memory_order> (f=<optimized out>, args=@0x7b0800000048: std::memory_order_relaxed, args=@0x7b0800000048: std::memory_order_relaxed, args=@0x7b0800000048: std::memory_order_relaxed) at /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:61
#42 std::invoke<void (*)(int, std::memory_order, std::memory_order), unsigned int, std::memory_order, std::memory_order> (fn=<optimized out>, args=@0x7b0800000048: std::memory_order_relaxed, args=@0x7b0800000048: std::memory_order_relaxed, args=@0x7b0800000048: std::memory_order_relaxed) at /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:96
#43 std::thread::_Invoker<std::tuple<void (*)(int, std::memory_order, std::memory_order), unsigned int, std::memory_order, std::memory_order> >::_M_invoke<0ul, 1ul, 2ul, 3ul> (this=0x7b0800000048) at /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_thread.h:253
#44 std::thread::_Invoker<std::tuple<void (*)(int, std::memory_order, std::memory_order), unsigned int, std::memory_order, std::memory_order> >::operator() (this=0x7b0800000048) at /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_thread.h:260
#45 std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)(int, std::memory_order, std::memory_order), unsigned int, std::memory_order, std::memory_order> > >::_M_run (this=0x7b0800000040) at /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_thread.h:211
#46 0x00007ffff7e6c954 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#47 0x000000000044d29c in __tsan_thread_start_func (arg=<optimized out>) at tsan/rtl/tsan_interceptors_posix.cpp:1012
#48 0x00007ffff7c3dd80 in start_thread (arg=0x7ffff3efd640) at pthread_create.c:481
#49 0x00007ffff7b2b76f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

And modelling the memory access after the CAS and after releasing the mutex it subject to use-after-frees even without standalone fences.
So so far this looks infeasible.

Revision Contents

Path

Size

compiler-rt/

lib/

tsan/

rtl/

tsan_interface_atomic.cpp

38 lines

test/

tsan/

atomic_norace3.cpp

23 lines

Diff 425439

compiler-rt/lib/tsan/rtl/tsan_interface_atomic.cpp

	Show First 20 Lines • Show All 405 Lines • ▼ Show 20 Lines
	template <typename T>			template <typename T>
	static bool AtomicCAS(ThreadState thr, uptr pc, volatile T a, T *c, T v,			static bool AtomicCAS(ThreadState thr, uptr pc, volatile T a, T *c, T v,
	morder mo, morder fmo) {			morder mo, morder fmo) {
	// 31.7.2.18: "The failure argument shall not be memory_order_release			// 31.7.2.18: "The failure argument shall not be memory_order_release
	// nor memory_order_acq_rel". LLVM (2021-05) fallbacks to Monotonic			// nor memory_order_acq_rel". LLVM (2021-05) fallbacks to Monotonic
	// (mo_relaxed) when those are used.			// (mo_relaxed) when those are used.
	DCHECK(IsLoadOrder(fmo));			DCHECK(IsLoadOrder(fmo));

	MemoryAccess(thr, pc, (uptr)a, AccessSize<T>(), kAccessWrite \| kAccessAtomic);
	if (LIKELY(mo == mo_relaxed && fmo == mo_relaxed)) {			if (LIKELY(mo == mo_relaxed && fmo == mo_relaxed)) {
	T cc = *c;			T cc = *c;
	T pr = func_cas(a, cc, v);			T pr = func_cas(a, cc, v);
	if (pr == cc)			bool success = pr == cc;
	return true;			if (!success)
	*c = pr;			*c = pr;
	return false;			// We used to always model a write and before the actual CAS.
				// However, the standard 29.6.5/21 says:
				// If the operation returns true, these operations are atomic
				// read-modify-write operations (1.10). Otherwise, these operations
				// are atomic load operations.
				//
				// And this difference can lead to false positive reports when
				// a program uses __atomic builtins and one thread executes a failing CAS
				// while another executes a non-atomic load. If we always model a write
				// these operations will race.
				//
				// This leads to another subtle aspect:
				// Atomics provide destruction-safety and can be used to synchronize
				// own destruction/freeing of underlying memory.
				// So one thread can do (a successful) CAS to signal to free the memory,
				// and another thread do an atomic load, observe the CAS and free the
				// memory. So modelling the memory access after the actual access is
				// somewhat risky (can potentially lead to other false positive reports if
				// the memory is already reused when we model the memory access). However,
				// destruction signalling must use acquire/release memory ordering
				// constraints. So we should lock s->mtx below and it should prevent the
				// reuse race. If/when we support stand-alone memory fences, it's unclear
				melverUnsubmitted Not Done Reply Inline Actions This leads me to think this is a real problem, but because of the memory ordering requirements it's not (AFAIK, and as just discussed in chat). melver: This leads me to think this is a real problem, but because of the memory ordering requirements…
				// how to handle this case. We could always lock s->mtx even for relaxed
				// accesses, but it will have severe performance impact on relaxed loads.
				MemoryAccess(thr, pc, (uptr)a, AccessSize<T>(),
				kAccessAtomic \| (success ? kAccessWrite : kAccessRead));
				return success;
	}			}
	SlotLocker locker(thr);			SlotLocker locker(thr);
	bool release = IsReleaseOrder(mo);			bool release = IsReleaseOrder(mo);
	bool success;			bool success;
	{			{
	auto s = ctx->metamap.GetSyncOrCreate(thr, pc, (uptr)a, false);			auto s = ctx->metamap.GetSyncOrCreate(thr, pc, (uptr)a, false);
	RWLock lock(&s->mtx, release);			RWLock lock(&s->mtx, release);
	T cc = *c;			T cc = *c;
	T pr = func_cas(a, cc, v);			T pr = func_cas(a, cc, v);
	success = pr == cc;			success = pr == cc;
	if (!success) {			if (!success) {
	*c = pr;			*c = pr;
	mo = fmo;			mo = fmo;
	}			}
				// See the comment on the previous MemoryAccess.
				MemoryAccess(thr, pc, (uptr)a, AccessSize<T>(),
				kAccessAtomic \| (success ? kAccessWrite : kAccessRead));
	if (success && IsAcqRelOrder(mo))			if (success && IsAcqRelOrder(mo))
	thr->clock.ReleaseAcquire(&s->clock);			thr->clock.ReleaseAcquire(&s->clock);
	else if (success && IsReleaseOrder(mo))			else if (success && IsReleaseOrder(mo))
	thr->clock.Release(&s->clock);			thr->clock.Release(&s->clock);
	else if (IsAcquireOrder(mo))			else if (IsAcquireOrder(mo))
	thr->clock.Acquire(s->clock);			thr->clock.Acquire(s->clock);
	}			}
	if (success && release)			if (success && release)
	▲ Show 20 Lines • Show All 482 Lines • Show Last 20 Lines

compiler-rt/test/tsan/atomic_norace3.cpp

This file was added.

				// RUN: %clangxx_tsan -O1 %s %link_libcxx_tsan -o %t && %run %t 2>&1 \| FileCheck %s

				#include "test.h"
				#include <thread>

				int main() {
				barrier_init(&barrier, 2);
				volatile int x = 0;
				std::thread reader([&]() {
				barrier_wait(&barrier);
				int l = x;
				(void)l;
				});
				int cmp = 1;
				__atomic_compare_exchange_n(&x, &cmp, 1, 1, __ATOMIC_RELAXED,
				__ATOMIC_RELAXED);
				barrier_wait(&barrier);
				reader.join();
				fprintf(stderr, "DONE\n");
				}

				// CHECK-NOT: ThreadSanitizer: data race
				// CHECK: DONE