
[ASan] Fix stack-overflow.cc test on PowerPC64 Linux
Accepted, Public

Authored by foad on Dec 24 2014, 7:29 AM.

Details

Summary

On PowerPC64 Linux the stack-overflow.cc test fails intermittently with:

==27505==AddressSanitizer CHECK failed: /home/buildbots/sanitizerslave1/sanitizer-ppc64-1/build/llvm/projects/compiler-rt/lib/sanitizer_common/sanitizer_linux_libcdep.cc:94 "(((uptr)&rl >= start && (uptr)&rl < end)) != (0)" (0x0, 0x0)

I have managed to catch this failure in the debugger, but only
occasionally, and so far only with ASAN_OPTIONS=use_sigaltstack=1
and unlimited stacks ("ulimit -s unlimited").

The problem occurs when GetThreadStackTopAndBottom tries to look up the
address of a local variable in /proc/self/maps (abbreviated to /proc/maps
below). On Linux, the entry for the stack in this file deliberately
excludes the first, lowest-addressed page (the "stack guard page"):

https://github.com/torvalds/linux/blob/c164e038eee805147e95789dddb88ae3b3aca11c/fs/proc/task_mmu.c#L285

But sometimes when we get to GetThreadStackTopAndBottom, we are already
in the guard page, so the test "(uptr)&rl >= start" fails. The fix is to
tweak the start address before this test, to try to undo the adjustment
that was done in /proc/maps.
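
A minimal sketch of the adjusted range check described above, assuming the adjustment to undo is exactly one page. `AddrInStackMapping` and its parameters are hypothetical names chosen for illustration; this is not the code in sanitizer_linux_libcdep.cc or in the patch itself:

```cpp
#include <cstdint>

// Hypothetical helper: decide whether `addr` belongs to a stack mapping that
// /proc/<pid>/maps reports as [start, end), allowing for the one page that
// the kernel leaves out at the low end of the range.
// Assumption: the hidden region is exactly `page_size` bytes.
bool AddrInStackMapping(std::uint64_t addr, std::uint64_t start,
                        std::uint64_t end, std::uint64_t page_size) {
  // Undo the guard-page adjustment: the real mapping may begin one page lower.
  std::uint64_t adjusted_start = start >= page_size ? start - page_size : 0;
  return addr >= adjusted_start && addr < end;
}
```

With this widening, a local variable that already lives in the hidden page still passes the check, while addresses more than one page below the reported start are still rejected.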

Event Timeline

foad updated this revision to Diff 17626. Dec 24 2014, 7:29 AM
foad retitled this revision to [ASan] Fix stack-overflow.cc test on PowerPC64 Linux.
foad updated this object.
foad edited the test plan for this revision.
foad added reviewers: kcc, eugenis, samsonov.
foad added a subscriber: Unknown Object (MLST).

eugenis edited edge metadata. Dec 24 2014, 7:39 AM

The only call to GetThreadStackTopAndBottom I see is in AsanThread::Init(), and this test creates only one thread. Could you provide more details?

foad added a comment. Dec 28 2014, 1:32 PM

Here is the stack trace at the point of failure and the value of &rl:

(gdb) bt
#0  0x00003fffa7748a28 in __nanosleep_nocancel () from /lib64/libc.so.6
#1  0x00003fffa774881c in .__sleep () from /lib64/libc.so.6
#2  0x0000000010144110 in __sanitizer::CheckFailed (
    file=0x10179600 "/home/foad/llvm-project/llvm/projects/compiler-rt/lib/sanitizer_common/sanitizer_linux_libcdep.cc", line=94, 
    cond=0x101796c8 "(((uptr)&rl >= start && (uptr)&rl < end)) != (0)", v1=0, v2=0)
    at /home/foad/llvm-project/llvm/projects/compiler-rt/lib/sanitizer_common/sanitizer_common.cc:124
#3  0x000000001015d424 in __sanitizer::GetThreadStackTopAndBottom (at_initialization=true, stack_top=0x3fffe798f960, 
    stack_bottom=0x3fffe798f968)
    at /home/foad/llvm-project/llvm/projects/compiler-rt/lib/sanitizer_common/sanitizer_linux_libcdep.cc:94
#4  0x000000001015da44 in __sanitizer::GetThreadStackAndTls (main=true, stk_addr=0x3fffa72b0020, stk_size=0x3fffa72b0028, 
    tls_addr=0x3fffa72b0030, tls_size=0x3fffe798f9f0)
    at /home/foad/llvm-project/llvm/projects/compiler-rt/lib/sanitizer_common/sanitizer_linux_libcdep.cc:306
#5  0x000000001013fc60 in __asan::AsanThread::SetThreadStackAndTls (this=0x3fffa72b0000)
    at /home/foad/llvm-project/llvm/projects/compiler-rt/lib/asan/asan_thread.cc:200
#6  0x000000001013f8a8 in __asan::AsanThread::Init (this=0x3fffa72b0000)
    at /home/foad/llvm-project/llvm/projects/compiler-rt/lib/asan/asan_thread.cc:155
#7  0x000000001013fab8 in __asan::AsanThread::ThreadStart (this=0x3fffa72b0000, os_id=19870, signal_thread_is_registered=0x0)
    at /home/foad/llvm-project/llvm/projects/compiler-rt/lib/asan/asan_thread.cc:169
#8  0x000000001013d9c8 in __asan::AsanInitInternal () at /home/foad/llvm-project/llvm/projects/compiler-rt/lib/asan/asan_rtl.cc:425
#9  0x000000001013dca0 in __asan_init_v5 () at /home/foad/llvm-project/llvm/projects/compiler-rt/lib/asan/asan_rtl.cc:509
#10 0x00003fffa7c37f88 in ._dl_init_internal () from /lib64/ld64.so.1
#11 0x00003fffa7c23d5c in ._dl_start_user () from /lib64/ld64.so.1
(gdb) up 3
#3  0x000000001015d424 in __sanitizer::GetThreadStackTopAndBottom (at_initialization=true, stack_top=0x3fffe798f960, 
    stack_bottom=0x3fffe798f968)
    at /home/foad/llvm-project/llvm/projects/compiler-rt/lib/sanitizer_common/sanitizer_linux_libcdep.cc:94
94	    CHECK((uptr)&rl >= start && (uptr)&rl < end);
(gdb) p/x $sp
$1 = 0x3fffe798f780
(gdb) p &rl
$2 = (rlimit *) 0x3fffe798f870

Here is the stack mapping from /proc/*/maps:

3fffe7990000-3fffe79a0000 rw-p 00000000 00:00 0                          [stack]

And from /proc/*/smaps:

3fffe7990000-3fffe79a0000 rw-p 00000000 00:00 0                          [stack]
Size:                128 kB
Rss:                 128 kB
Pss:                 128 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:       128 kB
Referenced:          128 kB
Anonymous:           128 kB
AnonHugePages:         0 kB
Swap:                  0 kB
KernelPageSize:       64 kB
MMUPageSize:          64 kB
Locked:                0 kB
VmFlags: rd wr mr mw me gd ac

Note that Size is 128k, but the difference between the start and end addresses is only 64k! This machine has 64k pages.
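
The arithmetic behind that observation can be checked directly. The constants below are copied from the gdb session and the maps/smaps output above; the constant names are illustrative only, and the block is a sanity check of the numbers rather than part of any fix:

```cpp
#include <cstdint>

constexpr std::uint64_t kPageSize  = 64 * 1024;          // KernelPageSize from smaps
constexpr std::uint64_t kMapsStart = 0x3fffe7990000ULL;  // start reported by maps
constexpr std::uint64_t kMapsEnd   = 0x3fffe79a0000ULL;  // end reported by maps
constexpr std::uint64_t kRlAddr    = 0x3fffe798f870ULL;  // &rl from gdb
constexpr std::uint64_t kSp        = 0x3fffe798f780ULL;  // $sp from gdb

// The reported range covers a single 64k page...
static_assert(kMapsEnd - kMapsStart == kPageSize, "maps range is one page");
// ...but adding one hidden page below it gives the 128 kB that smaps reports.
static_assert(kMapsEnd - (kMapsStart - kPageSize) == 128 * 1024,
              "range plus hidden page matches Size");
// Both &rl and $sp fall inside that hidden page, below the reported start.
static_assert(kRlAddr >= kMapsStart - kPageSize && kRlAddr < kMapsStart,
              "&rl is in the hidden page");
static_assert(kSp >= kMapsStart - kPageSize && kSp < kMapsStart,
              "$sp is in the hidden page");
```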

Linux's proc fs seems to think that there is a "stack guard page" at address 0x3fffe7980000, but I don't understand why, because it appears to be both readable and writable.

foad added a comment. Edited Jan 2 2015, 4:52 AM

I can demonstrate the same failure on x86-64 by setting a low limit on the stack size:

$ clang -fsanitize=address ~/svn/llvm-project/compiler-rt/trunk/test/asan/TestCases/stack-overflow.cc -o stack-overflow
$ ulimit -S -s 12 # soft limit stack to 12k
$ ./stack-overflow
==2423==AddressSanitizer CHECK failed: /home/jay/svn/llvm-project/llvm/trunk/projects/compiler-rt/lib/sanitizer_common/sanitizer_linux_libcdep.cc:94 "(((uptr)&rl >= start && (uptr)&rl < end)) != (0)" (0x0, 0x0)
ASAN:SIGSEGV
=================================================================
==2423==ERROR: AddressSanitizer: stack-overflow on address 0x7fff10736fb8 (pc 0x7fe212b02883 bp 0x7fff10737060 sp 0x7fff10736f40 T0)
    <empty stack>

==2423==ABORTING

I get the "CHECK failed" message only intermittently, maybe 10% or 20% of the time, depending on the Linux kernel's stack address randomization (which needs to be enabled; make sure that /proc/sys/kernel/randomize_va_space is non-0).

On my PowerPC64 box the page size is 64k. If you run stack-overflow with unlimited stack, it will impose its own limit of 128k, which is only two pages, which I think explains why the failure is more common on PowerPC: the address randomization can give you an initial sp near the beginning of the second page, so the stack quickly grows down into the first page, which is (mis?-)interpreted as a guard page by /proc/maps.
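
To make the page arithmetic concrete, here is a small sketch. `PagesInStackLimit` is a hypothetical helper, and the 4k page size mentioned in the note after it is the usual x86-64 default rather than something stated in this thread:

```cpp
#include <cstdint>

// Hypothetical helper: how many whole pages fit inside a stack size limit.
constexpr std::uint64_t PagesInStackLimit(std::uint64_t limit_bytes,
                                          std::uint64_t page_size) {
  return limit_bytes / page_size;
}
```

For the 128k self-imposed limit, `PagesInStackLimit(128 * 1024, 64 * 1024)` is 2 on a 64k-page machine, whereas with 4k pages the same limit spans 32 pages, so a single hidden page is a far larger fraction of the stack in the 64k case.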

foad added a comment. Jan 2 2015, 5:43 AM

Here's my current theory about what's going wrong with /proc/maps:

  1. Whenever the kernel allocates some stack memory, it also tries to allocate an extra guard page just before the stack. But this can fail, e.g. if you have already reached RLIMIT_STACK. See mm/memory.c:check_stack_guard_page(), which calls mm/mmap.c:expand_downwards(), which calls acct_stack_growth() to check the rlimits.
  2. /proc/maps adjusts the start address of a stack mapping so as not to include the guard page, but it does this unconditionally, even if the attempt to allocate a guard page failed. See fs/proc/task_mmu.c:show_map_vma().
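
The two steps of this theory can be sketched as a toy model. Everything here is an assumption made for illustration: `StackVma`, `TryAddGuardPage` and `ReportedStart` are invented names, and the logic only mirrors the theory as stated, not the actual kernel code paths:

```cpp
#include <cstdint>

// Toy model of a stack mapping: [start, end) is its real extent.
struct StackVma {
  std::uint64_t start;
  std::uint64_t end;
};

// Step 1 (modelled): try to grow the mapping down by one page to make room
// for a guard page. Fails, leaving the mapping unchanged, if the result
// would exceed the stack size limit.
bool TryAddGuardPage(StackVma &vma, std::uint64_t page_size,
                     std::uint64_t stack_limit) {
  if (vma.start < page_size) return false;
  if (vma.end - (vma.start - page_size) > stack_limit) return false;
  vma.start -= page_size;
  return true;
}

// Step 2 (modelled): the reported start skips one page unconditionally,
// whether or not step 1 succeeded.
std::uint64_t ReportedStart(const StackVma &vma, std::uint64_t page_size) {
  return vma.start + page_size;
}
```

In this model, a 128k mapping that is already at a 128k limit cannot grow, yet its lowest page is still omitted from the reported range, so an in-use address in that page appears to lie outside the stack.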

foad added a comment. Jan 5 2015, 1:23 AM

In the meantime, may I bump up the self-imposed stack limit in stack-overflow.cc from 128k to 256k? That will make it pass on PowerPC64 the same way that it does on x86-64: by luck!

kcc edited edge metadata. Jan 5 2015, 1:55 PM
In D6777#105332, @foad wrote:

In the meantime, may I bump up the self-imposed stack limit in stack-overflow.cc from 128k to 256k? That will make it pass on PowerPC64 the same way that it does on x86-64: by luck!

Yes, let's do it if it helps.

foad added a comment. Jan 6 2015, 1:25 AM
In D6777#105104, @foad wrote:

Here's my current theory about what's going wrong with /proc/maps:

  1. Whenever the kernel allocates some stack memory, it also tries to allocate an extra guard page just before the stack. But this can fail, e.g. if you have already reached RLIMIT_STACK. See mm/memory.c:check_stack_guard_page(), which calls mm/mmap.c:expand_downwards(), which calls acct_stack_growth() to check the rlimits.
  2. /proc/maps adjusts the start address of a stack mapping so as not to include the guard page, but it does this unconditionally, even if the attempt to allocate a guard page failed. See fs/proc/task_mmu.c:show_map_vma().

I've reported this to the kernel people here: http://lkml.iu.edu//hypermail/linux/kernel/1501.0/01025.html

foad added a comment. Jan 6 2015, 2:04 AM
In D6777#105597, @kcc wrote:
In D6777#105332, @foad wrote:

In the meantime, may I bump up the self-imposed stack limit in stack-overflow.cc from 128k to 256k? That will make it pass on PowerPC64 the same way that it does on x86-64: by luck!

Yes, let's do it if it helps.

OK, done in r225261.

eugenis accepted this revision. Jan 12 2015, 1:46 AM
eugenis edited edge metadata.

OK, the fix looks reasonable. Thanks for getting to the bottom of this!

This revision is now accepted and ready to land. Jan 12 2015, 1:46 AM

foad added a comment. Jan 12 2015, 2:23 AM

OK, the fix looks reasonable. Thanks for getting to the bottom of this!

Thanks. I will clean up the description and comments in the patch before committing.

Looks like the patch was not committed.