This is an archive of the discontinued LLVM Phabricator instance.

[asan] Fix GC of FakeFrames
ClosedPublic

Authored by vitalybuka on Aug 9 2023, 2:58 PM.

Details

Summary

When FakeStack GC from altstack, it may see default stack on lower
addressed and incorectly disard all frames.

Fixes bug exposed by D153536.

Diff Detail

Event Timeline

vitalybuka created this revision.Aug 9 2023, 2:58 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 9 2023, 2:58 PM
Herald added a subscriber: Enna1. · View Herald Transcript
vitalybuka requested review of this revision.Aug 9 2023, 2:58 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 9 2023, 2:58 PM
Herald added a subscriber: Restricted Project. · View Herald Transcript
fmayer added inline comments.Aug 9 2023, 3:05 PM
compiler-rt/lib/asan/asan_fake_stack.cpp
138

just confirming (not that familiar with ASan) that this leads to false negatives, correct?

vitalybuka added inline comments.Aug 9 2023, 3:12 PM
compiler-rt/lib/asan/asan_fake_stack.cpp
138

No, nested stacks may still cause false positives, if we discard frame, which is actually alive.

vitalybuka updated this revision to Diff 548781.Aug 9 2023, 3:15 PM

update comment

kstoimenov accepted this revision.Aug 9 2023, 3:58 PM
kstoimenov added inline comments.
compiler-rt/lib/asan/asan_fake_stack.cpp
165–166

Add a comment what is the intent of this code.

This revision is now accepted and ready to land.Aug 9 2023, 3:58 PM
vitalybuka marked an inline comment as done.Aug 9 2023, 8:46 PM
This revision was landed with ongoing or failed builds.Aug 9 2023, 8:47 PM
This revision was automatically updated to reflect the committed changes.
ro added a subscriber: ro.Aug 10 2023, 12:18 AM

The new test SEGVs on Solaris/amd64. I haven't started investigating yet.

ro added a comment.Aug 10 2023, 1:38 AM
In D157552#4575681, @ro wrote:

The new test SEGVs on Solaris/amd64. I haven't started investigating yet.

I've digged some more: here's the stacktrace from the SEGV:

Thread 3 received signal SIGSEGV, Segmentation fault.
0x08158265 in Fn<715u> ()
    at /opt/llvm-buildbot/home/solaris11-amd64/clang-solaris11-amd64/llvm/compiler-rt/test/asan/TestCases/Posix/fake_stack_gc.cpp:23
23	template <size_t N> void Fn() {
1: x/i $pc
=> 0x8158265 <_Z2FnILj715EEvv+101>:	movl   $0x41b58ab3,(%eax)
(gdb) bt
#0  0x08158265 in Fn<715u> ()
    at /opt/llvm-buildbot/home/solaris11-amd64/clang-solaris11-amd64/llvm/compiler-rt/test/asan/TestCases/Posix/fake_stack_gc.cpp:23
#1  0x08158187 in Fn<716u> ()
    at /opt/llvm-buildbot/home/solaris11-amd64/clang-solaris11-amd64/llvm/compiler-rt/test/asan/TestCases/Posix/fake_stack_gc.cpp:27
[...]
#285 0x0813e067 in Fn<1000u> ()
    at /opt/llvm-buildbot/home/solaris11-amd64/clang-solaris11-amd64/llvm/compiler-rt/test/asan/TestCases/Posix/fake_stack_gc.cpp:27
#286 0x0813d848 in Handler (signo=6)
    at /opt/llvm-buildbot/home/solaris11-amd64/clang-solaris11-amd64/llvm/compiler-rt/test/asan/TestCases/Posix/fake_stack_gc.cpp:36
#287 <signal handler called>
#288 0xfe15e145 in __lwp_sigqueue () from /lib/libc.so.1
#289 0xfe155eaf in thr_kill () from /lib/libc.so.1
#290 0xfe08d9c2 in raise () from /lib/libc.so.1
#291 0xfe05b094 in abort () from /lib/libc.so.1
#292 0x0813d7f8 in Thread (arg=0xfdafe800)
    at /opt/llvm-buildbot/home/solaris11-amd64/clang-solaris11-amd64/llvm/compiler-rt/test/asan/TestCases/Posix/fake_stack_gc.cpp:64
#293 0x0810315c in asan_thread_start ()
    at /opt/llvm-buildbot/home/solaris11-amd64/clang-solaris11-amd64/llvm/compiler-rt/lib/asan/asan_interceptors.cpp:234
#294 0xfe158a9b in _thrp_setup () from /lib/libc.so.1
#295 0xfe158dd0 in ?? () from /lib/libc.so.1
#296 0x00000000 in ?? ()

At that point, %eax is 0xfdafc6c0. If you look at the process mappings with pmap -x, you get:

4093:	/opt/llvm-buildbot/home/solaris11-amd64/clang-solaris11-amd64/stage1/p
 Address Kbytes  RSS Anon Lock Mode     Mapped File
08050000   1332  836    -    - r-x----  [ text ] fake_stack_gc.cpp.tmp
081AC000    188  188   16    - rwx----  [ data ] fake_stack_gc.cpp.tmp
081DB000   9160  232  232    - rwx----  [ data ] fake_stack_gc.cpp.tmp
08ACD000      4    4    4    - rwx----  [ heap ]
21010000      4    4    4    - rw--R-E  [ anon ]
[...]
FD9FA000   1032   20   20    - rw-----  [ anon ]
FDAFD000   1032 1032 1032    - rw-----  [ anon ]
FDC00000   2184    -    -    - rw-----  [ anon ]

The mapping at 0xFD9FA000 is from FD9FA000 to FDAFC000. So %eax points to unmapped memory, thus a stack overflow. It seems the test assumes larger stacks somehow.

@vitalybuka could you please take a look at the failure on the macOS bot that Steven highlighted above?

I marked it as Unsupported on Darwin for now while it's being investigated. Please let me know if you can resolve it.

Thanks for feedback. Probably fixed and reenabled on Darwin.
Recursion stack usage was expected to be least 2Mb (more with redzones), but I allocated only 1Mb. Strange that it consistently passed on Linux x86_64, aarch64 bots.

ro added a comment.Aug 12 2023, 1:57 AM

Recursion stack usage was expected to be least 2Mb (more with redzones), but I allocated only 1Mb. Strange that it consistently passed on Linux x86_64, aarch64 bots.

That patch also fixed the Solaris/amd64 buildbot. Thanks.

Thanks for feedback. Probably fixed and reenabled on Darwin.

Unfortunately it looks like it still fails on Darwin even after that change:

<stdin>:1:1: note: scanning from here
Assertion failed: (pthread_attr_setstack(&attr, main_stack, kStackSize) == 0), function main, file /Users/buildslave/jenkins/workspace/clang-stage1-RA/llvm-project/compiler-rt/test/asan/TestCases/Posix/fake_stack_gc.cpp, line 79.
^

https://green.lab.llvm.org/green/job/clang-stage1-RA/35285/testReport/junit/AddressSanitizer-x86_64-darwin/TestCases_Posix/fake_stack_gc_cpp/

Could you please take another look?