This is an archive of the discontinued LLVM Phabricator instance.

sanitizer_common: add deadlock detection to the Mutex2
ClosedPublic

Authored by dvyukov on Jul 22 2021, 6:50 AM.

Details

Summary
Copy internal deadlock detector from tsan to sanitizer_common
(with some cosmetic changes).
Tsan version will be deleted in subsequent changes.
This allows us to switch tsan to the sanitizer_common mutex
and remove tsan's mutex.

Diff Detail

Event Timeline

dvyukov requested review of this revision.Jul 22 2021, 6:50 AM
dvyukov created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptJul 22 2021, 6:50 AM
Herald added a subscriber: Restricted Project. · View Herald Transcript
dvyukov updated this revision to Diff 360795.Jul 22 2021, 6:55 AM

re-upload in the hope that arc will auto-set dependent reviews

melver added inline comments.Jul 22 2021, 9:53 AM
compiler-rt/lib/sanitizer_common/sanitizer_mutex.cpp
58

should this be 'static constexpr' ?

Summary says: "Move internal deadlock", but the patch only adds new code

dvyukov updated this revision to Diff 360864.Jul 22 2021, 10:03 AM
dvyukov marked an inline comment as done.

s/const/static constexpr/ and update commit description

Summary says: "Move internal deadlock", but the patch only adds new code

Updated commit description

dvyukov edited the summary of this revision. (Show Details)Jul 22 2021, 10:05 AM
vitalybuka added inline comments.Jul 22 2021, 10:21 AM
compiler-rt/lib/sanitizer_common/sanitizer_mutex.cpp
76

if mutex_count == kMutexTypeMax
if (!mutex_meta[t].name) will overflow?

Why it can't just?
for (int t = 0; t < kMutexTypeMax; t++) {

dvyukov updated this revision to Diff 360876.Jul 22 2021, 10:32 AM
dvyukov edited the summary of this revision. (Show Details)

fixed CHECK_LE(mutex_count, kMutexTypeMax);

dvyukov added inline comments.Jul 22 2021, 10:35 AM
compiler-rt/lib/sanitizer_common/sanitizer_mutex.cpp
76

I guess that needed to be CHECK_LT.
The original reason for this code is that I declared:

SANITIZER_WEAK_ATTRIBUTE MutexMeta mutex_meta[] = {{}};

(array with 1 element)
but this made compilers generate some wild code because they assumed the array is const size and has 1 element. But in reality its overriden in tsan with a different size. Compilers didn't understand the intention behind the WEAK part.

Now with mutex_meta[kMutexTypeMax] we can do this.
Done. PTAL

melver added inline comments.Jul 22 2021, 10:47 AM
compiler-rt/lib/sanitizer_common/sanitizer_mutex.cpp
59

Given this is just the dummy, could this be mutex_meta[1] ? I think in the case that this is not overridden, the DebugMutexInit loop will just immediately see !mutex_meta[0].name, right?

76

We might still overflow with the improved loop if the overridden version has <kMutexTypeMax elements and no terminator (unless I misunderstand how weak linking works for arrays). I think the previous version was correct, because it is impossible to know what the size of the override array is.

Do we now still have a CHECK-fail if the last element was not the terminating element? Because otherwise somebody might keep adding entries and they will just be ignored.

Non-dummy arrays of mutex_meta should therefore actually be able to have up to kMutexTypeMax+1 elements to be checked, where the last one should always be the terminator.

melver added inline comments.Jul 22 2021, 10:48 AM
compiler-rt/lib/sanitizer_common/sanitizer_mutex.cpp
59

Never mind, I just saw your comment about miscompilation.

dvyukov added inline comments.Jul 22 2021, 10:55 AM
compiler-rt/lib/sanitizer_common/sanitizer_mutex.cpp
76

What about adding CHECK_LT(mutex_count, kMutexTypeMax) after the loop? It will catch "too many" types.

melver added inline comments.Jul 22 2021, 11:00 AM
compiler-rt/lib/sanitizer_common/sanitizer_mutex.cpp
76

An override array can have up to kMutexTypeMax+1 elements with +1 being the terminator. Otherwise kMutexTypeMax is not the max, and the other arrays are wasting space.

I think to fix:

  • iterate kMutexTypeMax+1 elements.
  • add CHECK_LE(mutex_count, kMutexTypeMax) after loop.

The dummy mutex_meta might need a size-change to kMutexTypeMax+1 in case the compiler decides to be clever again.

melver added inline comments.Jul 22 2021, 11:06 AM
compiler-rt/lib/sanitizer_common/sanitizer_mutex.cpp
76

... and there needs to be a CHECK after if (!mutex_meta[t].name)

mutex_count = t;
if (!mutex_meta[t].name)
  break;
CHECK(t < kMutexTypeMax); // Indicates array is not properly terminated.
vitalybuka accepted this revision.Jul 22 2021, 11:18 AM

Summary says: "Move internal deadlock", but the patch only adds new code

Updated commit description

Oh, it's actual copying, but it looks very different from TSAN version.

compiler-rt/lib/sanitizer_common/sanitizer_mutex.cpp
51

Why this detector operates on types and not individual mutexes?
Is this enough? I guess we can have deadlock on two different mutexes of the same type?

152

mutex_count -> mutex_type_count ?

210

static THREADLOCAL

compiler-rt/lib/sanitizer_common/sanitizer_mutex.h
79

enum class (maybe)

This revision is now accepted and ready to land.Jul 22 2021, 11:18 AM
dvyukov added inline comments.Jul 22 2021, 11:22 AM
compiler-rt/lib/sanitizer_common/sanitizer_mutex.cpp
76

and the other arrays are wasting space

This is a very slow debug mode. Execution speed effect is worse then spending few bytes of memory. And we are wasting half of the array right now, what tsan uses is half of kMutexTypeMax. And other sanitizer waste all of it. Not just one and not just if we are not careful with checks here. They are always wasting all of it.

dvyukov updated this revision to Diff 360903.Jul 22 2021, 11:32 AM
dvyukov marked 2 inline comments as done.

added static
s/mutex_count/mutex_type_count/
CHECK for mutex_meta array overflow

PTAL

compiler-rt/lib/sanitizer_common/sanitizer_mutex.cpp
51

We don't lock more than 1 of the same type.
There will be 1 exception in the new tsan, but it's very local and I don't see how we can reasonably handle it. We would need to allocate per-mutex data (there can be millions of them), build runtime graph, traverse it, somehow protect from concurrent accesses in a scalable way, etc. That's tons of code complexity and memory/performance overheads. And for all types but one we still want to prohibit any recursion on mutexes of the same type.

compiler-rt/lib/sanitizer_common/sanitizer_mutex.h
79

The same reason we can't use enum call in tsan applies here. It's not possible to "extend" enum classes, and we need to be able to extend it in each sanitizer.

vitalybuka accepted this revision.Jul 22 2021, 11:59 AM
vitalybuka added inline comments.
compiler-rt/lib/sanitizer_common/sanitizer_mutex.cpp
51

That's fine, but maybe add comment explaining that. It's not obvious without digging into details.

dvyukov updated this revision to Diff 360919.Jul 22 2021, 12:08 PM

add comment explaining why deadlock detector operates on mutex types

dvyukov edited the summary of this revision. (Show Details)Jul 22 2021, 12:09 PM
dvyukov added inline comments.
compiler-rt/lib/sanitizer_common/sanitizer_mutex.cpp
51

Done

melver accepted this revision.Jul 22 2021, 12:12 PM
This revision was automatically updated to reflect the committed changes.
pgousseau added inline comments.
compiler-rt/lib/sanitizer_common/sanitizer_mutex.cpp
76

Hi Dmitry, the statement "int cnt[kMutexTypeMax] = {};" seems to be causing some unit test to fail to in some configurations ? Can we maybe remove the empty initializer list?
Thanks!
Pierre
cf:

$ cmake -G Ninja "../llvm" -DLLVM_ENABLE_PROJECTS="clang;compiler-rt;" -DLLVM_TARGETS_TO_BUILD="X86" -DCMAKE_BUILD_TYPE=Release -DCOMPILER_RT_DEBUG=ON -DCMAKE_C_COMPILER=clang-10 -DCMAKE_CXX_COMPILER=clang++-10
$ ninja -j12 check-sanitizer
[6/22] Generating Sanitizer-x86_64-Test-Nolibc
FAILED: projects/compiler-rt/lib/sanitizer_common/tests/Sanitizer-x86_64-Test-Nolibc
cd /home/pierre/llvm-project/build/projects/compiler-rt/lib/sanitizer_common/tests && /home/pierre/llvm-project/build/./bin/clang sanitizer_nolibc_test_main.x86_64.o -Wl,-whole-archive libRTSanitizerCommon.test.nolibc.x86_64.a -Wl,-no-whole-archive -o /home/pierre/llvm-project/build/projects/compiler-rt/lib/sanitizer_common/tests/./Sanitizer-x86_64-Test-Nolibc -static -nostdlib -m64 -m64
libRTSanitizerCommon.test.nolibc.x86_64.a(sanitizer_mutex.cpp.o): In function `__sanitizer::DebugMutexInit()':
/home/pierre/llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_mutex.cpp:76: undefined reference to `memset'
clang-14: error: linker command failed with exit code 1 (use -v to see invocation)
[13/22] Running lint check for sanitizer sources...
ninja: build stopped: subcommand failed.
dvyukov added inline comments.Jul 28 2021, 11:05 AM
compiler-rt/lib/sanitizer_common/sanitizer_mutex.cpp
76

I've sent https://reviews.llvm.org/D106978
thanks for reporting