This is an archive of the discontinued LLVM Phabricator instance.

[MSAN][MIPS] Fix fork.cc test on MIPS
ClosedPublic

Authored by slthakur on Aug 3 2016, 12:14 AM.

Details

Summary

For platforms which support slow unwinder only, we restrict the store context size to 1, basically only storing the current pc. We do this because the slow unwinder which is based on libunwind is not async signal safe and causes random freezes in forking applications as well as in signal handlers.

Diff Detail

Repository
rL LLVM

Event Timeline

slthakur updated this revision to Diff 66620.Aug 3 2016, 12:14 AM
slthakur retitled this revision from to [MSAN][MIPS] Fix fork.cc test on MIPS.
slthakur updated this object.
slthakur added reviewers: eugenis, kcc, samsonov.
slthakur set the repository for this revision to rL LLVM.
slthakur added a project: Restricted Project.
eugenis requested changes to this revision.Aug 4 2016, 1:36 PM
eugenis edited edge metadata.

Sounds like a bug in msan runtime - it should not hang while reading origins. What is really happening?

This revision now requires changes to proceed.Aug 4 2016, 1:36 PM

Hi @eugenis

This test fails on MIPS because many threads are waiting indefinitely on a futex wait. The test pases with reduced number of child processes. Is it okay if we reduce the number of child processes of this test for MIPS?

Regards,
Sagar

Well, that's simply another way of saying "hangs while reading the chained origins". Reducing the number of child processes is likely to reduce the probability of the deadlock instead of eliminating it. Anyway, this sounds like a bug in the msan runtime library.

This test is for the ChainedOriginDepotLockAll() logic in msan_interceptors.cc. Could you verify that the fork interceptor is being used on MIPS? I wonder if we need to intercept something else, like vfork.

This test is for the ChainedOriginDepotLockAll() logic in msan_interceptors.cc. Could you verify that the fork interceptor is being used on MIPS? I wonder if we need to intercept something else, like vfork.

The fork call is being intercepted by the runtime MIPS.

In order to find the root cause of this bug I debugged fork.cc on x86 (as gdb on mips was unable to give a proper backtrace from the point of deadlock) with the fast unwinder disabled (the deadlock appears only when fast unwinder is disabled and slow unwinder is used instead). Following is the stack trace from the point of deadlock on x86:

#0 lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1 0x00007fbdeaaa7e92 in
GI___pthread_mutex_lock ( mutex=0x7fbdeb56c970 <_rtld_global+2352>) at ../nptl/pthread_mutex_lock.c:115
#2 0x00007fbdea1f542f in GI_dl_iterate_phdr ( callback=0x437630 <msan_dl_iterate_phdr_cb(sanitizer::sanitizer_dl_phdr_info*, SIZE_T, void*)>, data=0x7ffdaaaaca60) at dl-iteratephdr.c:41
#3 0x00000000004269fc in interceptor_dl_iterate_phdr (callback=0x7fbdea48cdf0, data=0x7ffdaaaaca90) at /home/slt/LLVM/llvm/projects/compiler-rt/lib/msan/msan_interceptors.cc:1430
#4 0x00007fbdea48e16e in _Unwind_Find_FDE () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#5 0x00007fbdea48ab63 in ?? () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#6 0x00007fbdea48bd80 in ?? () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#7 0x00007fbdea48c638 in _Unwind_Backtrace () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#8 0x00000000004895c3 in
sanitizer::BufferedStackTrace::SlowUnwindStack ( this=0x7ffdaaaad040, pc=pc@entry=4790675, max_depth=max_depth@entry=20) at /home/slt/LLVM/llvm/projects/compiler-rt/lib/sanitizer_common/sanitizer_unwind_linux_libcdep.cc:125
#9 0x000000000048641a in sanitizer::BufferedStackTrace::Unwind ( this=this@entry=0x7ffdaaaad040, max_depth=max_depth@entry=20, pc=pc@entry=4790675, bp=bp@entry=140727466776720, context=context@entry=0x0, stack_top=stack_top@entry=0, stack_bottom=0, request_fast_unwind=true) at /home/slt/LLVM/llvm/projects/compiler-rt/lib/sanitizer_common/sanitizer_stacktrace_libcdep.cc:76
#10 0x000000000041be78 in
msan::GetStackTrace (request_fast_unwind=true, bp=140727466776720, pc=4790675, max_s=20, stack=0x7ffdaaaad040) at /home/slt/LLVM/llvm/projects/compiler-rt/lib/msan/msan.cc:226
#11 __msan_chain_origin (id=2147483648) at /home/slt/LLVM/llvm/projects/compiler-rt/lib/msan/msan.cc:562
#12 0x0000000000491993 in child () at /home/slt/LLVM/llvm/projects/compiler-rt/test/msan/fork.cc:60
#13 test () at /home/slt/LLVM/llvm/projects/compiler-rt/test/msan/fork.cc:80
#14 0x00000000004924b4 in main () at /home/slt/LLVM/llvm/projects/compiler-rt/test/msan/fork.cc:91

From this backtrace, it looks like the deadlock is in libunwind where dl_interate_phdr is waiting on a lock.

Good catch!
So, the slow unwinder is not async signal safe. This makes _any_ store of uninit value not async signal safe with chained-origins on platforms w/o the fast unwinder (mips, sparc).

I think we should play it safe and disable the slow unwinder in __msan_chain_origin. If there is no fast unwinder for the platform, the next best thing would be reducing store_context_size to 1 (or is it 0? basically, just recording the current PC). This would not be great for the user experience, but the alternative is random freezes in forking applications as well as in signal handlers.

slthakur updated this revision to Diff 80058.Dec 2 2016, 4:29 AM
slthakur updated this object.
slthakur edited edge metadata.

As suggested, reduced store_context_size to 1 when slow unwinder is being used.

Phabricator shows lots of affected files with empty diffs for some reason.

lib/msan/msan.h
340

max(1, flags()->store_context_size)

341

and this can be just "false"

test/msan/chained_origin.cc
16

I can't find any use of CHECK-(NON-)MIPS-FRAMES-HEAP in this test.

test/msan/chained_origin_limits.cc
164

Hm, this test expects three instances of "Uninitialized value was stored to memory at" on non-mips, and only one instance on mips. Why is that?

test/msan/lit.cfg
42

fast unwinder is also unavailable on sparc, please include that here

43

i don't think these braces around "CHECK-MIPS-FRAMES" are necessary

45

Instead of mips and non-mips, let's call it something neutral, like
%short-stack -> "SHORT-STACK" on mips and sparc, and "" on other platforms.
Then any affected CHECK can be given a -%short-stack suffix.

slthakur updated this revision to Diff 80387.Dec 6 2016, 1:34 AM
slthakur edited edge metadata.
slthakur marked 5 inline comments as done.

Addressed review comments

lib/msan/msan.h
340

max(1, flags()->store_context_size) will return the value of flags()->store_context_size which is 1000 in the test case fork.cc. This will invoke the slow unwinder and cause deadlock.

test/msan/chained_origin_limits.cc
164

On architectures with short stack, due to reduced origin context, all the stacks in the chain are same because the stack trace contains only 1 frame and does not contain frames upto the functions (fn1, fn2, fn3) from where the uninitialized stores actually originate. And since origin_history_per_stack_limit is set to 1 we only get 1 instance of "Uninitialized value was stored to memory at" per stack. Therefore even though the uninitialized stores come from 3 different functions we only have one stack trace. In case of architectures having full stack, there are 3 different stacks because the uninitialized stores originate from 3 different functions in the chain and therefore we can see 1 instance (per stack) of "Uninitialized value was stored to memory at".

The full stack (for architectures with full origin context) which is different from the 2 other stacks in the chain is :

Uninitialized value was stored to memory at

#0 0xaaaef00f10 in __msan_memmove /home/slt/LLVM/llvm/projects/compiler-rt/lib/msan/msan_interceptors.cc:1480:3
#1 0xaaaef9ccd4 in line_flush /home/slt/LLVM/llvm/projects/compiler-rt/test/msan/chained_origin_limits.cc:90:5
#2 0xaaaef9ccd4 in buffered_write /home/slt/LLVM/llvm/projects/compiler-rt/test/msan/chained_origin_limits.cc:102
#3 0xaaaef9ccd4 in fn3 /home/slt/LLVM/llvm/projects/compiler-rt/test/msan/chained_origin_limits.cc:115
#4 0xaaaef9ccd4 in main /home/slt/LLVM/llvm/projects/compiler-rt/test/msan/chained_origin_limits.cc:123

Whereas the short stack (for architectures with short origin context) which is same as all other stacks in the chain is :

Uninitialized value was stored to memory at

#0 0xaab04dcf20 in __msan_memmove /home/slt/LLVM/llvm/projects/compiler-rt/lib/msan/msan_interceptors.cc:1480:3
test/msan/lit.cfg
42

Msan is not supported on sparc architecture. Do you still want me to include it ?

45

If we use "" in place of %short-stack for architectures which have full stack, then we will have to check the output lines for the full stack with the "CHECK" prefix . And these full stack frame lines will also be checked for architectures which have short stack because most of the tests have "--check-prefix=CHECK" for checking output lines other than stack frames.

Therefore it is important to distinguish between architectures having short stack and architectures having full stack context. So I have used
%short-stack -> "CHECK-FULL-STACK" substitution for architectures having full stack size. Please let me know if this should be done in a different way.

eugenis accepted this revision.Dec 6 2016, 1:59 PM
eugenis edited edge metadata.

LGTM w/ 2 minor comments

lib/msan/msan.h
340

Sorry, I meant min().
store_context_size = 0 should keep working.

test/msan/chained_origin_limits.cc
164

Yeah, I've completely forgot about this feature. Please add a short comment to the test case to avoid future questions.

test/msan/lit.cfg
42

No, that's fine.

45

agreed.

This revision is now accepted and ready to land.Dec 6 2016, 1:59 PM
slthakur updated this revision to Diff 80718.Dec 7 2016, 10:29 PM
slthakur edited edge metadata.

Addressed review comments

slthakur closed this revision.Dec 7 2016, 10:42 PM

Committed in rL289027.