This is an archive of the discontinued LLVM Phabricator instance.

[asan] Fix deadlock issue on FreeBSD, caused by use of .preinit_array in rL325240
ClosedPublic

Authored by MaskRay on Jun 30 2018, 1:27 PM.

Details

Summary

Without this patch,
clang -fsanitize=address -xc =(printf 'int main(){}') -o a; ./a => deadlock in __asan_init>AsanInitInternal>AsanTSDInit>...>__getcontextx_size>_rtld_bind>rlock_acquire(rtld_bind_lock, &lockstate)

libexec/rtld-elf/rtld.c

wlock_acquire(rtld_bind_lock, &lockstate);
if (obj_main->crt_no_init)
  preinit_main(); // unresolved PLT functions cannot be called here

lib/libthr/thread/thr_rtld.c

  uc_len = __getcontextx_size(); // unresolved PLT function in libthr.so.3

/* So sad that I cannot attend LLVM Social - Shanghai: July 1st, 2018 */
/* This revision is dedicated to hotpxl who will start a new adventure in New York City. */

Diff Detail

Repository
rL LLVM

Event Timeline

MaskRay created this revision.Jun 30 2018, 1:27 PM
Herald added subscribers: Restricted Project, llvm-commits, delcypher and 3 others. · View Herald TranscriptJun 30 2018, 1:27 PM
MaskRay edited the summary of this revision. (Show Details)Jun 30 2018, 1:41 PM
MaskRay edited the summary of this revision. (Show Details)
MaskRay edited the summary of this revision. (Show Details)Jun 30 2018, 3:29 PM

Is Xray instrumentation working now ?

MaskRay edited the summary of this revision. (Show Details)Jun 30 2018, 3:33 PM
MaskRay added a comment.EditedJun 30 2018, 3:43 PM

Is Xray instrumentation working now ?

% fclang++ -fxray-instrument -fxray-instruction-threshold=1 a.cc -o a
% XRAY_OPTIONS=patch_premain=true:verbosity=1:xray_mode=xray-basic ./a                                         
==29379==Missing rdtscp support.                                                                               
==29379==Missing rdtscp support.                                                                               
==29379==WARNING: Required CPU features missing for XRay instrumentation, using                                
emulation instead.                                                                                             
==29379==XRay: Log file in 'xray-log.a.q1WUln'                                                                 
meow                                                                                                           
meow                                                                                                           
==29379==Cleaned up log for TID: 100557  
% ls -l xray-log.a.q1WUln                                                                                      
-rw-------  1 ray  wheel  224 Jun 30 15:41 xray-log.a.q1WUln
% fllvm-xray account xray-log.a.q1WUln
Functions with latencies: 2
   funcid      count [      min,       med,       90p,       99p,       max]
   sum  function
        1          2 [ 0.000015,  0.000143,  0.000143,  0.000143,  0.000143]  0.000158  (unknown): #1
        2          1 [ 0.000191,  0.000191,  0.000191,  0.000191,  0.000191]  0.000191  (unknown): #2

Hopefully unit tests still passing.

MaskRay added a comment.EditedJun 30 2018, 6:10 PM

Hopefully unit tests still passing.

Some check-xray tests will fail on FreeBSD, e.g. test/xray/TestCases/Posix/fdr-mode.cc. It seems these xray tests rely on eager initialization of __xray_init, but I don't think this revision should be blocked by these failures. The current check-xray status isn't clean either:

Failing Tests (2):
  XRay-x86_64-freebsd :: TestCases/Posix/profiling-multi-threaded.cc
  XRay-x86_64-freebsd :: TestCases/Posix/profiling-single-threaded.cc

Note, rL325240 makes ASAN seriously broken on FreeBSD: all -fsanitize=address tests will deadlock (they will dead lock in __asan_init) (so ninja check-asan will hang forever)

clang -fsanitize=address -xc =(printf 'int main(){}') -o a; ./a => deadlock in __asan_init>AsanInitInternal>AsanTSDInit>...>__getcontextx_size>_rtld_bind>rlock_acquire(rtld_bind_lock, &lockstate)
MaskRay updated this revision to Diff 153641.Jun 30 2018, 6:16 PM

80 columns rule

This issue is well known in fact, we were trying to solve it. @dim for advice here.

The failing tests might depend on FreeBSD version, on 10.4

[100%] Running the XRay tests
Testing Time: 2.81s
  Expected Passes    : 57
  Unsupported Tests  : 1
[100%] Built target check-xray

Also, as an alternative, what if we disable the early init in asan only instead of generalising it ? Works too.

MaskRay updated this revision to Diff 153651.Jul 1 2018, 9:37 AM

Special case xray to use unsafe .preinit_array

The failing tests might depend on FreeBSD version, on 10.4

[100%] Running the XRay tests
Testing Time: 2.81s
  Expected Passes    : 57
  Unsupported Tests  : 1
[100%] Built target check-xray

Also, as an alternative, what if we disable the early init in asan only instead of generalising it ? Works too.

Made a special case in xray_init.cc

devnexen accepted this revision.Jul 1 2018, 9:58 AM
This revision is now accepted and ready to land.Jul 1 2018, 9:58 AM
This revision was automatically updated to reflect the committed changes.
dim added a comment.Jul 1 2018, 2:23 PM

Sorry for not getting to this one earlier, indeed it fixes the deadlock for me too!

Now the only "big" thing left is the hundreds of:

==41641==AddressSanitizer CHECK failed: /share/dim/src/llvm/trunk/projects/compiler-rt/lib/asan/asan_posix.cc:48 "((0)) == ((pthread_key_create(&tsd_key, destructor)))" (0x0, 0x4e)
`

failures. :-)

devnexen added a comment.EditedJul 2 2018, 1:35 AM
In D48806#1149021, @dim wrote:

Sorry for not getting to this one earlier, indeed it fixes the deadlock for me too!

Now the only "big" thing left is the hundreds of:

==41641==AddressSanitizer CHECK failed: /share/dim/src/llvm/trunk/projects/compiler-rt/lib/asan/asan_posix.cc:48 "((0)) == ((pthread_key_create(&tsd_key, destructor)))" (0x0, 0x4e)
`

failures. :-)

I guess it s FreeBSD 11.2 ? Seems his fix working on 10.4, will check on 11.x when I can ... otherwise there is always the thread local solution I threw a while ago. WE might need to come up with another solution anyway @krytarowski reports it is still an issue on NetBSD.

dim added a comment.Jul 2 2018, 9:35 AM
In D48806#1149021, @dim wrote:

Sorry for not getting to this one earlier, indeed it fixes the deadlock for me too!

Now the only "big" thing left is the hundreds of:

==41641==AddressSanitizer CHECK failed: /share/dim/src/llvm/trunk/projects/compiler-rt/lib/asan/asan_posix.cc:48 "((0)) == ((pthread_key_create(&tsd_key, destructor)))" (0x0, 0x4e)
`

failures. :-)

I guess it s FreeBSD 11.2 ? Seems his fix working on 10.4, will check on 11.x when I can ... otherwise there is always the thread local solution I threw a while ago. WE might need to come up with another solution anyway @krytarowski reports it is still an issue on NetBSD.

This is on FreeBSD 12-CURRENT, but it could also show the same issue on 11.x.

dim added a comment.Jul 2 2018, 2:37 PM
In D48806#1149655, @dim wrote:
In D48806#1149021, @dim wrote:

Sorry for not getting to this one earlier, indeed it fixes the deadlock for me too!

Now the only "big" thing left is the hundreds of:

==41641==AddressSanitizer CHECK failed: /share/dim/src/llvm/trunk/projects/compiler-rt/lib/asan/asan_posix.cc:48 "((0)) == ((pthread_key_create(&tsd_key, destructor)))" (0x0, 0x4e)
`

failures. :-)

I guess it s FreeBSD 11.2 ? Seems his fix working on 10.4, will check on 11.x when I can ... otherwise there is always the thread local solution I threw a while ago. WE might need to come up with another solution anyway @krytarowski reports it is still an issue on NetBSD.

This is on FreeBSD 12-CURRENT, but it could also show the same issue on 11.x.

I verified this, just to be sure, and it also happens on 11.x.

Maybe we need to move from pthread_get/set_specific ? https://reviews.llvm.org/D47448

Maybe we need to move from pthread_get/set_specific ? https://reviews.llvm.org/D47448

Yes, please. I was thinking about the same _lwp_kill(2) trick like in TSan or LSan for NetBSD. I've noted that FreeBSD adopted this solution there too.