This is an archive of the discontinued LLVM Phabricator instance.

tsan: unify __cxa_guard_acquire and pthread_once implementations
ClosedPublic

Authored by dvyukov on Aug 3 2021, 8:20 AM.

Details

Summary

Currently we effectively duplicate "once" logic for __cxa_guard_acquire
and pthread_once. Unify the implementations.

This is not a no-op change:

  • constants used for pthread_once are changed to match cxa_guard_acquire (cxa_guard_acquire constants are tied to ABI, but it does not seem to be the case for pthread_once)
  • pthread_once now also uses PotentiallyBlockingRegion annotations
  • __cxa_guard_acquire checks thr->in_ignored_lib to skip user synchronization

It's unclear if these 2 differences are intentional or a mere sloppy inconsistency.
Since all tests still pass, let's assume the latter.

Diff Detail

Event Timeline

dvyukov created this revision.Aug 3 2021, 8:20 AM
dvyukov requested review of this revision.Aug 3 2021, 8:20 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 3 2021, 8:20 AM
Herald added a subscriber: Restricted Project. · View Herald Transcript
vitalybuka accepted this revision.Aug 3 2021, 10:09 AM
This revision is now accepted and ready to land.Aug 3 2021, 10:09 AM
melver accepted this revision.Aug 3 2021, 10:16 AM

@dvyukov, this appears to be causing test failure on Darwin. Can you please look into it.

Failed Tests (1):

ThreadSanitizer-x86_64 :: cxa_guard_acquire.cpp

https://green.lab.llvm.org/green/job/clang-stage1-RA/23006/testReport/junit/ThreadSanitizer-x86_64/ThreadSanitizer-x86_64/cxa_guard_acquire_cpp/

@dvyukov, this appears to be causing test failure on Darwin. Can you please look into it.

Failed Tests (1):

ThreadSanitizer-x86_64 :: cxa_guard_acquire.cpp

https://green.lab.llvm.org/green/job/clang-stage1-RA/23006/testReport/junit/ThreadSanitizer-x86_64/ThreadSanitizer-x86_64/cxa_guard_acquire_cpp/

Humm... I've looked at this, but so far don't have any concrete ideas.

First, I see that this build also includes these 2 other changes that touched this code:
https://github.com/llvm/llvm-project/commit/0bc626d516a201953ef6a45be0d059d70672d7db
https://github.com/llvm/llvm-project/commit/e3f4c63e78b1ed54f0a35aeb30730e5c74bcfeed
here is build info:
https://green.lab.llvm.org/green/job/clang-stage1-RA/23006/

Also I see this is the only test that failed:
https://green.lab.llvm.org/green/job/clang-stage1-RA/23006/testReport/junit/ThreadSanitizer-x86_64/ThreadSanitizer-x86_64/

But it seems that we don't have any tests for pthread_once, so we can't say if this issue is related to cxa_guard_acquire specifically, or to both cxa_guard_acquire and pthread_once. That may have helped to narrow the problem down.

The only semantic change for __cxa_guard_acquire that I see is that we now test "!thr->in_ignored_lib" before doing Acquire/Release. But I don't see how _not_ calling these functions can cause any crashes...

OnPotentiallyBlockingRegionBegin/End was already present in __cxa_guard_acquire.

So for now I am puzzled...

Azharuddin, can you run this test under gdb and do a "bt", "disass" and "info registers" on SIGSEGV?

And FutexWait is the same sched_yield on mac as we had before in this code.

@dvyukov, this appears to be causing test failure on Darwin. Can you please look into it.

Failed Tests (1):

ThreadSanitizer-x86_64 :: cxa_guard_acquire.cpp

https://green.lab.llvm.org/green/job/clang-stage1-RA/23006/testReport/junit/ThreadSanitizer-x86_64/ThreadSanitizer-x86_64/cxa_guard_acquire_cpp/

Humm... I've looked at this, but so far don't have any concrete ideas.

First, I see that this build also includes these 2 other changes that touched this code:
https://github.com/llvm/llvm-project/commit/0bc626d516a201953ef6a45be0d059d70672d7db
https://github.com/llvm/llvm-project/commit/e3f4c63e78b1ed54f0a35aeb30730e5c74bcfeed
here is build info:
https://green.lab.llvm.org/green/job/clang-stage1-RA/23006/

Also I see this is the only test that failed:
https://green.lab.llvm.org/green/job/clang-stage1-RA/23006/testReport/junit/ThreadSanitizer-x86_64/ThreadSanitizer-x86_64/

But it seems that we don't have any tests for pthread_once, so we can't say if this issue is related to cxa_guard_acquire specifically, or to both cxa_guard_acquire and pthread_once. That may have helped to narrow the problem down.

The only semantic change for __cxa_guard_acquire that I see is that we now test "!thr->in_ignored_lib" before doing Acquire/Release. But I don't see how _not_ calling these functions can cause any crashes...

OnPotentiallyBlockingRegionBegin/End was already present in __cxa_guard_acquire.

So for now I am puzzled...

Azharuddin, can you run this test under gdb and do a "bt", "disass" and "info registers" on SIGSEGV?

And FutexWait is the same sched_yield on mac as we had before in this code.

Sorry, I couldn't get back earlier. I was able to narrow it down to this particular commit.

Here is the test output:

ThreadSanitizer:DEADLYSIGNAL
==31465==ERROR: ThreadSanitizer: stack-overflow on address 0x7ffee73fffd8 (pc 0x00010807fd2a bp 0x7ffee7400050 sp 0x7ffee73fffb0 T93815)
    #0 __tsan::MetaMap::GetSync(__tsan::ThreadState*, unsigned long, unsigned long, bool, bool) tsan_sync.cpp:195 (libclang_rt.tsan_osx_dynamic.dylib:x86_64+0x78d2a)
    #1 __tsan::MutexPreLock(__tsan::ThreadState*, unsigned long, unsigned long, unsigned int) tsan_rtl_mutex.cpp:143 (libclang_rt.tsan_osx_dynamic.dylib:x86_64+0x6cefc)
    #2 wrap_pthread_mutex_lock sanitizer_common_interceptors.inc:4240 (libclang_rt.tsan_osx_dynamic.dylib:x86_64+0x3dae0)
    #3 flockfile <null>:2 (libsystem_c.dylib:x86_64+0x38a69)
    #4 puts <null>:2 (libsystem_c.dylib:x86_64+0x3f69b)
    #5 wrap_puts sanitizer_common_interceptors.inc (libclang_rt.tsan_osx_dynamic.dylib:x86_64+0x34d83)
    #6 __tsan::OnPotentiallyBlockingRegionBegin() cxa_guard_acquire.cpp:8 (foo:x86_64+0x100000e48)
    #7 wrap_pthread_once tsan_interceptors_posix.cpp:1512 (libclang_rt.tsan_osx_dynamic.dylib:x86_64+0x2f6e6)

I ran it under lldb and here is more information:

(lldb) disassemble --frame
libclang_rt.tsan_osx_dynamic.dylib`__tsan::MetaMap::GetSync:
    0x100180d10 <+0>:    pushq  %rbp
    0x100180d11 <+1>:    movq   %rsp, %rbp
    0x100180d14 <+4>:    pushq  %r15
    0x100180d16 <+6>:    pushq  %r14
    0x100180d18 <+8>:    pushq  %r13
    0x100180d1a <+10>:   pushq  %r12
    0x100180d1c <+12>:   pushq  %rbx
    0x100180d1d <+13>:   subq   $0x78, %rsp
    0x100180d21 <+17>:   movl   %r9d, %r14d
    0x100180d24 <+20>:   movl   %r8d, %r12d
    0x100180d27 <+23>:   movq   %rcx, %r11
    0x100180d2a <+26>:   movq   %rdx, -0x78(%rbp)
    0x100180d2e <+30>:   movq   %rsi, %r10
    0x100180d31 <+33>:   movq   %rdi, %r15
    0x100180d34 <+36>:   movq   %rcx, %rax
    0x100180d37 <+39>:   shrq   %rax
    0x100180d3a <+42>:   movabsq $0x7fffc3fffffffffc, %rcx ; imm = 0x7FFFC3FFFFFFFFFC
    0x100180d44 <+52>:   andq   %rax, %rcx
    0x100180d47 <+55>:   movabsq $0x300000000000, %rax     ; imm = 0x300000000000
    0x100180d51 <+65>:   leaq   (%rcx,%rax), %r13
    0x100180d55 <+69>:   movl   (%rcx,%rax), %eax
    0x100180d58 <+72>:   leaq   0xa00018(%rdi), %rcx
    0x100180d5f <+79>:   movq   %rcx, -0x68(%rbp)
    0x100180d63 <+83>:   leaq   0x200018(%rdi), %rcx
->  0x100180d6a <+90>:   movq   %rcx, -0x98(%rbp)
    0x100180d71 <+97>:   xorl   %edx, %edx
    0x100180d73 <+99>:   xorl   %esi, %esi
    0x100180d75 <+101>:  movq   %r11, -0x40(%rbp)
    0x100180d79 <+105>:  movq   %r10, -0x38(%rbp)
    0x100180d7d <+109>:  testl  %esi, %esi
    0x100180d7f <+111>:  jne    0x100180f35               ; <+549> at sanitizer_atomic_clang.h
    0x100180d85 <+117>:  testb  %r12b, %r12b
    0x100180d88 <+120>:  je     0x1001811b7               ; <+1191> at tsan_sync.cpp
    0x100180d8e <+126>:  movl   %eax, %edi
    0x100180d90 <+128>:  testl  %eax, %eax
    0x100180d92 <+130>:  je     0x100180de0               ; <+208> at tsan_sync.cpp:215:9
    0x100180d94 <+132>:  movl   %edi, %ecx
    0x100180d96 <+134>:  movl   %edi, %eax
    0x100180d98 <+136>:  andl   $0x40000000, %ecx         ; imm = 0x40000000
    0x100180d9e <+142>:  jne    0x100180de0               ; <+208> at tsan_sync.cpp:215:9
    0x100180da0 <+144>:  andl   $0x3fffffff, %eax         ; imm = 0x3FFFFFFF
    0x100180da5 <+149>:  movq   %rax, %rcx
(lldb) register read
General Purpose Registers:
       rax = 0x0000000080000035
       rbx = 0x00007fff8fa3ec18  __sFX + 216
       rcx = 0x0000000100cdcde0  libclang_rt.tsan_osx_dynamic.dylib`__tsan::ctx_placeholder + 2097184
       rdx = 0x0000000100145a41  libclang_rt.tsan_osx_dynamic.dylib`::wrap_pthread_mutex_lock(void *) + 49 at sanitizer_common_interceptors.inc:4239:3
       rdi = 0x0000000100adcdc8  libclang_rt.tsan_osx_dynamic.dylib`__tsan::ctx_placeholder + 8
       rsi = 0x00000001034ddbc0  libclang_rt.tsan_osx_dynamic.dylib`__tsan::main_thread_state
       rbp = 0x00007ffeef400080
       rsp = 0x00007ffeef3fffe0
        r8 = 0x0000000000000001
        r9 = 0x0000000000000001
       r10 = 0x00000001034ddbc0  libclang_rt.tsan_osx_dynamic.dylib`__tsan::main_thread_state
       r11 = 0x00007fff8fa3ec18  __sFX + 216
       r12 = 0x0000000000000001
       r13 = 0x000033ffc7d1f60c
       r14 = 0x0000000000000001
       r15 = 0x0000000100adcdc8  libclang_rt.tsan_osx_dynamic.dylib`__tsan::ctx_placeholder + 8
       rip = 0x0000000100180d6a  libclang_rt.tsan_osx_dynamic.dylib`__tsan::MetaMap::GetSync(__tsan::ThreadState*, unsigned long, unsigned long, bool, bool) + 90 at tsan_sync.cpp
    rflags = 0x0000000000010206
        cs = 0x000000000000002b
        fs = 0x0000000000000000
        gs = 0x0000000000000000

Let me go ahead and mark this test as unsupported on Darwin for now until we get it sorted out.

glider added a subscriber: glider.Aug 18 2021, 4:36 AM

Let me go ahead and mark this test as unsupported on Darwin for now until we get it sorted out.

Hi Azharuddin,

Dmitry is out for a while. Any chance you could try a couple of changes locally to see if the problem is reproducible for you?

  1. Revert changes to pthread_once() interceptor, which shouldn't be involved in this crash. (Would be interesting to see it actually is - perhaps one of the interceptors is calling the other one)
  2. Revert the whole patch, but add the if (!thr->in_ignored_lib) checks in front of Acquire() and Release() calls in __cxa_guard_acquire() and __cxa_guard_release() interceptors.

If you don't have resources for that, we can probably revert this patch if doing so doesn't break Linux.

Let me go ahead and mark this test as unsupported on Darwin for now until we get it sorted out.

Can you check if this works: https://reviews.llvm.org/D108305