For platforms which support slow unwinder only, we restrict the store context size to 1, basically only storing the current pc. We do this because the slow unwinder which is based on libunwind is not async signal safe and causes random freezes in forking applications as well as in signal handlers.
Diff Detail
- Repository
- rL LLVM
Event Timeline
Sounds like a bug in msan runtime - it should not hang while reading origins. What is really happening?
Hi @eugenis
This test fails on MIPS because many threads are waiting indefinitely on a futex wait. The test pases with reduced number of child processes. Is it okay if we reduce the number of child processes of this test for MIPS?
Regards,
Sagar
Well, that's simply another way of saying "hangs while reading the chained origins". Reducing the number of child processes is likely to reduce the probability of the deadlock instead of eliminating it. Anyway, this sounds like a bug in the msan runtime library.
This test is for the ChainedOriginDepotLockAll() logic in msan_interceptors.cc. Could you verify that the fork interceptor is being used on MIPS? I wonder if we need to intercept something else, like vfork.
This test is for the ChainedOriginDepotLockAll() logic in msan_interceptors.cc. Could you verify that the fork interceptor is being used on MIPS? I wonder if we need to intercept something else, like vfork.
The fork call is being intercepted by the runtime MIPS.
In order to find the root cause of this bug I debugged fork.cc on x86 (as gdb on mips was unable to give a proper backtrace from the point of deadlock) with the fast unwinder disabled (the deadlock appears only when fast unwinder is disabled and slow unwinder is used instead). Following is the stack trace from the point of deadlock on x86:
#0 lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1 0x00007fbdeaaa7e92 in GI___pthread_mutex_lock ( mutex=0x7fbdeb56c970 <_rtld_global+2352>) at ../nptl/pthread_mutex_lock.c:115
#2 0x00007fbdea1f542f in GI_dl_iterate_phdr ( callback=0x437630 <msan_dl_iterate_phdr_cb(sanitizer::sanitizer_dl_phdr_info*, SIZE_T, void*)>, data=0x7ffdaaaaca60) at dl-iteratephdr.c:41
#3 0x00000000004269fc in interceptor_dl_iterate_phdr (callback=0x7fbdea48cdf0, data=0x7ffdaaaaca90) at /home/slt/LLVM/llvm/projects/compiler-rt/lib/msan/msan_interceptors.cc:1430
#4 0x00007fbdea48e16e in _Unwind_Find_FDE () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#5 0x00007fbdea48ab63 in ?? () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#6 0x00007fbdea48bd80 in ?? () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#7 0x00007fbdea48c638 in _Unwind_Backtrace () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#8 0x00000000004895c3 in sanitizer::BufferedStackTrace::SlowUnwindStack ( this=0x7ffdaaaad040, pc=pc@entry=4790675, max_depth=max_depth@entry=20) at /home/slt/LLVM/llvm/projects/compiler-rt/lib/sanitizer_common/sanitizer_unwind_linux_libcdep.cc:125
#9 0x000000000048641a in sanitizer::BufferedStackTrace::Unwind ( this=this@entry=0x7ffdaaaad040, max_depth=max_depth@entry=20, pc=pc@entry=4790675, bp=bp@entry=140727466776720, context=context@entry=0x0, stack_top=stack_top@entry=0, stack_bottom=0, request_fast_unwind=true) at /home/slt/LLVM/llvm/projects/compiler-rt/lib/sanitizer_common/sanitizer_stacktrace_libcdep.cc:76
#10 0x000000000041be78 in msan::GetStackTrace (request_fast_unwind=true, bp=140727466776720, pc=4790675, max_s=20, stack=0x7ffdaaaad040) at /home/slt/LLVM/llvm/projects/compiler-rt/lib/msan/msan.cc:226
#11 __msan_chain_origin (id=2147483648) at /home/slt/LLVM/llvm/projects/compiler-rt/lib/msan/msan.cc:562
#12 0x0000000000491993 in child () at /home/slt/LLVM/llvm/projects/compiler-rt/test/msan/fork.cc:60
#13 test () at /home/slt/LLVM/llvm/projects/compiler-rt/test/msan/fork.cc:80
#14 0x00000000004924b4 in main () at /home/slt/LLVM/llvm/projects/compiler-rt/test/msan/fork.cc:91
From this backtrace, it looks like the deadlock is in libunwind where dl_interate_phdr is waiting on a lock.
Good catch!
So, the slow unwinder is not async signal safe. This makes _any_ store of uninit value not async signal safe with chained-origins on platforms w/o the fast unwinder (mips, sparc).
I think we should play it safe and disable the slow unwinder in __msan_chain_origin. If there is no fast unwinder for the platform, the next best thing would be reducing store_context_size to 1 (or is it 0? basically, just recording the current PC). This would not be great for the user experience, but the alternative is random freezes in forking applications as well as in signal handlers.
Phabricator shows lots of affected files with empty diffs for some reason.
lib/msan/msan.h | ||
---|---|---|
340 | max(1, flags()->store_context_size) | |
341 | and this can be just "false" | |
test/msan/chained_origin.cc | ||
16 | I can't find any use of CHECK-(NON-)MIPS-FRAMES-HEAP in this test. | |
test/msan/chained_origin_limits.cc | ||
157 | Hm, this test expects three instances of "Uninitialized value was stored to memory at" on non-mips, and only one instance on mips. Why is that? | |
test/msan/lit.cfg | ||
42 | fast unwinder is also unavailable on sparc, please include that here | |
43 | i don't think these braces around "CHECK-MIPS-FRAMES" are necessary | |
45 | Instead of mips and non-mips, let's call it something neutral, like |
Addressed review comments
lib/msan/msan.h | ||
---|---|---|
340 | max(1, flags()->store_context_size) will return the value of flags()->store_context_size which is 1000 in the test case fork.cc. This will invoke the slow unwinder and cause deadlock. | |
test/msan/chained_origin_limits.cc | ||
157 | On architectures with short stack, due to reduced origin context, all the stacks in the chain are same because the stack trace contains only 1 frame and does not contain frames upto the functions (fn1, fn2, fn3) from where the uninitialized stores actually originate. And since origin_history_per_stack_limit is set to 1 we only get 1 instance of "Uninitialized value was stored to memory at" per stack. Therefore even though the uninitialized stores come from 3 different functions we only have one stack trace. In case of architectures having full stack, there are 3 different stacks because the uninitialized stores originate from 3 different functions in the chain and therefore we can see 1 instance (per stack) of "Uninitialized value was stored to memory at". The full stack (for architectures with full origin context) which is different from the 2 other stacks in the chain is :
Whereas the short stack (for architectures with short origin context) which is same as all other stacks in the chain is :
| |
test/msan/lit.cfg | ||
42 | Msan is not supported on sparc architecture. Do you still want me to include it ? | |
45 | If we use "" in place of %short-stack for architectures which have full stack, then we will have to check the output lines for the full stack with the "CHECK" prefix . And these full stack frame lines will also be checked for architectures which have short stack because most of the tests have "--check-prefix=CHECK" for checking output lines other than stack frames. Therefore it is important to distinguish between architectures having short stack and architectures having full stack context. So I have used |
LGTM w/ 2 minor comments
lib/msan/msan.h | ||
---|---|---|
340 | Sorry, I meant min(). | |
test/msan/chained_origin_limits.cc | ||
157 | Yeah, I've completely forgot about this feature. Please add a short comment to the test case to avoid future questions. | |
test/msan/lit.cfg | ||
42 | No, that's fine. | |
45 | agreed. |
max(1, flags()->store_context_size)