This is an archive of the discontinued LLVM Phabricator instance.

[PowerPC]Adapt fast stack unwinding to work for Power.
AbandonedPublic

Authored by seurer on Apr 24 2015, 10:55 AM.

Details

Summary

These changes allow Power to use fast stack unwinding. This is a huge performance benefit for asan.
For instance, using the testcase mentioned here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63927
these are sample runs with and without the change. The numbers are seconds taken to run that test.

With asan and existing "slow" stack unwinding:
size array vector_point vector_iters deque list set multiset
10 12.40 11.74 11.39 24.85 220.89 124.04 246.55
100 0.84 0.82 0.86 3.48 110.01 67.22 128.29
1000 0.35 0.32 0.33 1.67 74.42 47.36 89.42
10000 0.27 0.28 0.26 1.37 56.92 38.08 70.83
100000 0.26 0.26 0.26 1.03 45.02 32.75 58.08
1000000 0.26 0.21 0.26 0.90 37.96 30.64 53.24

With asan and now working "fast" stack unwinding:
size array vector_point vector_iters deque list set multiset
10 0.57 0.58 0.62 1.19 6.57 3.69 6.60
100 0.30 0.27 0.31 0.48 3.07 1.89 3.46
1000 0.31 0.28 0.29 0.36 2.26 1.28 2.44
10000 0.25 0.25 0.26 0.33 1.82 1.14 1.90
100000 0.26 0.25 0.24 0.28 1.75 1.20 2.01
1000000 0.25 0.21 0.26 0.28 2.28 1.93 3.22

Diff Detail

Event Timeline

seurer updated this revision to Diff 24401.Apr 24 2015, 10:55 AM
seurer retitled this revision from to [PowerPC]Adapt fast stack unwinding to work for Power. .
seurer updated this object.
seurer edited the test plan for this revision. (Show Details)
seurer added a subscriber: Unknown Object (MLST).
wschmidt edited edge metadata.Apr 24 2015, 1:56 PM

This looks ok to me -- you should probably add kcc or somebody else from the ASAN project to the review, though.

Also I should say that I'm really happy you were able to fix this. :D

seurer added a reviewer: kcc.Apr 24 2015, 2:22 PM
hfinkel edited edge metadata.Apr 24 2015, 6:52 PM

Can you please post this patch with full context; see: http://llvm.org/docs/Phabricator.html#requesting-a-review-via-the-web-interface

/home/seurer/llvm/llvm-test/projects/compiler-rt/lib/sanitizer_common/sanitizer_stacktrace.cc
93 ↗(On Diff #24401)

Please explain why.

seurer updated this revision to Diff 24471.Apr 27 2015, 6:45 AM
seurer edited edge metadata.

Updated diffs with full context. Sorry!

seurer updated this revision to Diff 24472.Apr 27 2015, 6:52 AM
hfinkel added inline comments.Apr 27 2015, 8:38 AM
/home/seurer/llvm/llvm-test/projects/compiler-rt/lib/sanitizer_common/sanitizer_stacktrace.h
52 ↗(On Diff #24472)

What is special about PPC that makes this true?

seurer added inline comments.Apr 27 2015, 9:06 AM
/home/seurer/llvm/llvm-test/projects/compiler-rt/lib/sanitizer_common/sanitizer_stacktrace.h
52 ↗(On Diff #24472)

The offset where the PC is stored is different on ppc than on whatever this was originally written for (from the comments possibly Macintosh).

The stack frame trace for "slow" unwinding does the "pop the frames until we get to pc" behavior so I duplicated that here. Without popping the extra frames many of the test cases fail. The checks leave the existing behavior alone for whatever else uses it.

seurer updated this revision to Diff 24556.Apr 28 2015, 9:08 AM

Updated comments to better explain popping of stack frames.

hfinkel added inline comments.May 5 2015, 8:32 PM
/home/seurer/llvm/llvm-test/projects/compiler-rt/lib/sanitizer_common/sanitizer_stacktrace.h
56 ↗(On Diff #24556)

Kostya, can you please look at this (or ping the appropriate person)?

I talked to Bill about this offline, and he explained that there is nothing PowerPC-specific about needing to pop these extra frames to match the regular unwinder (he says the same is needed on all platforms). Maybe we should be changing this in general, or this is a deliberate optimization that PowerPC should just match (and adjust test cases as necessary?).

I know this is extremely hard in this case, but please try to make a test for this change.

/home/seurer/llvm/llvm-test/projects/compiler-rt/lib/sanitizer_common/sanitizer_stacktrace.h
50 ↗(On Diff #24556)

Does this have to be in the header?

51 ↗(On Diff #24556)

if you need constants like this, name them kOffsetToPcInFrame

56 ↗(On Diff #24556)

I wrote this code initially, samsonov@ and eugenis@ touched it a few times.
This is probably the most fragile part in the entire family of tools -- hard to test, easy to break.

Since I don't know what problem you are solving on PPC it's hard for me to comment.
Can you describe the actual problem (difference from x86 case)?

eugenis added inline comments.May 6 2015, 2:27 PM
/home/seurer/llvm/llvm-test/projects/compiler-rt/lib/sanitizer_common/sanitizer_stacktrace.cc
93 ↗(On Diff #24556)

I don't fully understand why this is needed. In line 73 we have "trace_buffer[0] = pc" which is supposed to ensure that pc is, in fact, in the first stack frame. What is the effect of this call? Could you show the example of a stack trace with undesirable frames on top?

The stack popping makes the fast unwinding act closer to how slow unwinding works (though still not identical). I don't know how this works on x86 so I can't comment on that. I will move and rename the constants.

Here is an example. With slow unwinding:

68283==ERROR: AddressSanitizer: heap-use-after-free on address 0x0a100000eff0 at pc 0x000010103af8 bp 0x3ffff7640400 sp 0x3ffff7640420

WRITE of size 4 at 0x0a100000eff0 thread T0

#0 0x10103af4 in Func6(int, int*) /home/seurer/test/stack/crash.cpp:19:7
#1 0x10103c9c in Func5(int, int*) /home/seurer/test/stack/crash.cpp:27:10
#2 0x10103f3c in Func4(int, int*) /home/seurer/test/stack/crash.cpp:34:10
#3 0x101041dc in Func3(int, int*) /home/seurer/test/stack/crash.cpp:41:10
#4 0x1010447c in Func2(int, int*) /home/seurer/test/stack/crash.cpp:48:10
#5 0x1010471c in Func1(int, int*) /home/seurer/test/stack/crash.cpp:55:10
#6 0x101049dc in main /home/seurer/test/stack/crash.cpp:64:10
#7 0x3fff9c134cfc in generic_start_main /build/buildd/eglibc-2.19/csu/../csu/libc-start.c:287

0x0a100000eff0 is located 0 bytes inside of 4-byte region [0x0a100000eff0,0x0a100000eff4)
freed by thread T0 here:

#0 0x1010115c in operator delete(void*) /home/seurer/llvm/llvm-base/projects/compiler-rt/lib/asan/asan_new_delete.cc:94
#1 0x10104990 in main /home/seurer/test/stack/crash.cpp:62:3
#2 0x3fff9c134cfc in generic_start_main /build/buildd/eglibc-2.19/csu/../csu/libc-start.c:287

previously allocated by thread T0 here:

#0 0x1010096c in operator new(unsigned long) /home/seurer/llvm/llvm-base/projects/compiler-rt/lib/asan/asan_new_delete.cc:62
#1 0x10104954 in main /home/seurer/test/stack/crash.cpp:60:8
#2 0x3fff9c134cfc in generic_start_main /build/buildd/eglibc-2.19/csu/../csu/libc-start.c:287

With fast unwinding and no stack popping:

117763==ERROR: AddressSanitizer: heap-use-after-free on address 0x0a100000eff0 at pc 0x0000100ee418 bp 0x3fffdcc66870 sp 0x3fffdcc668d8

WRITE of size 4 at 0x0a100000eff0 thread T0

#0 0x100ee414 in Func6(int, int*) /home/seurer/test/stack/crash.cpp:19:7
#1 0x100ee5bc in Func5(int, int*) /home/seurer/test/stack/crash.cpp:27:10
#2 0x100ee85c in Func4(int, int*) /home/seurer/test/stack/crash.cpp:34:10
#3 0x100eeafc in Func3(int, int*) /home/seurer/test/stack/crash.cpp:41:10
#4 0x100eed9c in Func2(int, int*) /home/seurer/test/stack/crash.cpp:48:10
#5 0x100ef03c in Func1(int, int*) /home/seurer/test/stack/crash.cpp:55:10
#6 0x100ef2fc in main /home/seurer/test/stack/crash.cpp:64:10
#7 0x3fffa9574cfc in generic_start_main /build/buildd/eglibc-2.19/csu/../csu/libc-start.c:287

0x0a100000eff0 is located 0 bytes inside of 4-byte region [0x0a100000eff0,0x0a100000eff4)
freed by thread T0 here:

#0 0x100ecd14 in operator delete(void*) /home/seurer/llvm/llvm-test/projects/compiler-rt/lib/asan/asan_new_delete.cc:94:3
#1 0x100ecda8 in GetStackTraceWithPcBpAndContext /home/seurer/llvm/llvm-test/projects/compiler-rt/lib/asan/asan_stack.h:50:7
#2 0x100ecda8 in operator delete(void*) /home/seurer/llvm/llvm-test/projects/compiler-rt/lib/asan/asan_new_delete.cc:94
#3 0x100ef2b0 in main /home/seurer/test/stack/crash.cpp:62:3
#4 0x3fffa9574cfc in generic_start_main /build/buildd/eglibc-2.19/csu/../csu/libc-start.c:287
#5 0x3fffa9574ef4 in __libc_start_main /build/buildd/eglibc-2.19/csu/../sysdeps/unix/sysv/linux/powerpc/libc-start.c:93

previously allocated by thread T0 here:

#0 0x100ec5d4 in operator new(unsigned long) /home/seurer/llvm/llvm-test/projects/compiler-rt/lib/asan/asan_new_delete.cc:62:35
#1 0x100ec668 in GetStackTraceWithPcBpAndContext /home/seurer/llvm/llvm-test/projects/compiler-rt/lib/asan/asan_stack.h:50:7
#2 0x100ec668 in operator new(unsigned long) /home/seurer/llvm/llvm-test/projects/compiler-rt/lib/asan/asan_new_delete.cc:62
#3 0x100ef274 in main /home/seurer/test/stack/crash.cpp:60:8
#4 0x3fffa9574cfc in generic_start_main /build/buildd/eglibc-2.19/csu/../csu/libc-start.c:287
#5 0x3fffa9574ef4 in __libc_start_main /build/buildd/eglibc-2.19/csu/../sysdeps/unix/sysv/linux/powerpc/libc-start.c:93

With fast unwinding and stack popping:

75053==ERROR: AddressSanitizer: heap-use-after-free on address 0x0a100000eff0 at pc 0x0000100ee518 bp 0x3fffed4849f0 sp 0x3fffed484a58

WRITE of size 4 at 0x0a100000eff0 thread T0

#0 0x100ee514 in Func6(int, int*) /home/seurer/test/stack/crash.cpp:19:7
#1 0x100ee6bc in Func5(int, int*) /home/seurer/test/stack/crash.cpp:27:10
#2 0x100ee95c in Func4(int, int*) /home/seurer/test/stack/crash.cpp:34:10
#3 0x100eebfc in Func3(int, int*) /home/seurer/test/stack/crash.cpp:41:10
#4 0x100eee9c in Func2(int, int*) /home/seurer/test/stack/crash.cpp:48:10
#5 0x100ef13c in Func1(int, int*) /home/seurer/test/stack/crash.cpp:55:10
#6 0x100ef3fc in main /home/seurer/test/stack/crash.cpp:64:10
#7 0x3fff9b9f4cfc in generic_start_main /build/buildd/eglibc-2.19/csu/../csu/libc-start.c:287

0x0a100000eff0 is located 0 bytes inside of 4-byte region [0x0a100000eff0,0x0a100000eff4)
freed by thread T0 here:

#0 0x100ece14 in operator delete(void*) /home/seurer/llvm/llvm-test/projects/compiler-rt/lib/asan/asan_new_delete.cc:94:3
#1 0x100ef3b0 in main /home/seurer/test/stack/crash.cpp:62:3
#2 0x3fff9b9f4cfc in generic_start_main /build/buildd/eglibc-2.19/csu/../csu/libc-start.c:287
#3 0x3fff9b9f4ef4 in __libc_start_main /build/buildd/eglibc-2.19/csu/../sysdeps/unix/sysv/linux/powerpc/libc-start.c:93

previously allocated by thread T0 here:

#0 0x100ec6d4 in operator new(unsigned long) /home/seurer/llvm/llvm-test/projects/compiler-rt/lib/asan/asan_new_delete.cc:62:35
#1 0x100ef374 in main /home/seurer/test/stack/crash.cpp:60:8
#2 0x3fff9b9f4cfc in generic_start_main /build/buildd/eglibc-2.19/csu/../csu/libc-start.c:287
#3 0x3fff9b9f4ef4 in __libc_start_main /build/buildd/eglibc-2.19/csu/../sysdeps/unix/sysv/linux/powerpc/libc-start.c:93

One more thing I should mention is that the fast unwinding without popping the extra frames causes 7 test cases to fail though fixing them would probably not be all that difficult.

Failing Tests (7):

AddressSanitizer-powerpc64le-linux :: TestCases/Linux/stack-trace-dlclose.cc
AddressSanitizer-powerpc64le-linux :: TestCases/double-free.cc
AddressSanitizer-powerpc64le-linux :: TestCases/large_func_test.cc
AddressSanitizer-powerpc64le-linux :: TestCases/strncpy-overflow.cc
AddressSanitizer-powerpc64le-linux :: TestCases/use-after-delete.cc
AddressSanitizer-powerpc64le-linux :: TestCases/use-after-free-right.cc
AddressSanitizer-powerpc64le-linux :: TestCases/use-after-free.cc
eugenis edited edge metadata.May 18 2015, 10:45 AM

This is unexpected. Reading the code, operator new() does
GET_STACK_TRACE_MALLOC, which grabs the current PC (though
builtin_return_address) and frame address (builtin_frame_address)
and starts frame pointer-based unwind from there.

#0 0x100ec5d4 in operator new(unsigned long)

/home/seurer/llvm/llvm-test/projects/compiler-rt/lib/asan/asan_new_delete.cc:62:35

#1 0x100ec668 in GetStackTraceWithPcBpAndContext

/home/seurer/llvm/llvm-test/projects/compiler-rt/lib/asan/asan_stack.h:50:7

#2 0x100ec668 in operator new(unsigned long)

/home/seurer/llvm/llvm-test/projects/compiler-rt/lib/asan/asan_new_delete.cc:62

#3 0x100ef274 in main /home/seurer/test/stack/crash.cpp:60:8
#4 0x3fffa9574cfc in generic_start_main

/build/buildd/eglibc-2.19/csu/../csu/libc-start.c:287

#5 0x3fffa9574ef4 in __libc_start_main

/build/buildd/eglibc-2.19/csu/../sysdeps/unix/sysv/linux/powerpc/libc-start.c:93

Here, it looks like frame #0 is the one we got from GetCurrentPc(),
and frames 1 and 2 (inlined) correspond to the return address of the
frame of operator new() (at the point of GET_STACK_TRACE_MALLOC). I'm
not sure why it is, again, operator new(). Also, if there is a real
extra stack frame there, this is not guaranteed to succeed, because we
build ASan runtime library without -fno-omit-frame-pointer.

I wonder if there is a problem with __builtin_frame_address on
PowerPC, or if ALWAYS_INLINE on GetStackTraceWithPcBpAndContext did
not have any effect, or something like that?

Interesting. I tried this on an x86 box and it removed "main" from the stack for my little test (same one I ran on Power) for new and delete.

On x86 with the stack popping (the code as-is):

==10965==ERROR: AddressSanitizer: heap-use-after-free on address 0x60200000eff0 at pc 0x0000004dd5b8 bp 0x7fff42515970 sp 0x7fff42515968
WRITE of size 4 at 0x60200000eff0 thread T0

#0 0x4dd5b7 in Func6(int, int*) /home/seurer/test/stack/crash.cpp:19:7
#1 0x4dd6d6 in Func5(int, int*) /home/seurer/test/stack/crash.cpp:27:10
#2 0x4dd886 in Func4(int, int*) /home/seurer/test/stack/crash.cpp:34:10
#3 0x4dda36 in Func3(int, int*) /home/seurer/test/stack/crash.cpp:41:10
#4 0x4ddbe6 in Func2(int, int*) /home/seurer/test/stack/crash.cpp:48:10
#5 0x4ddd96 in Func1(int, int*) /home/seurer/test/stack/crash.cpp:55:10
#6 0x4ddf63 in main /home/seurer/test/stack/crash.cpp:64:10
#7 0x3af5a1ed5c in __libc_start_main (/lib64/libc.so.6+0x3af5a1ed5c)
#8 0x41b324 in _start (/home/seurer/test/stack/crash-test+0x41b324)

0x60200000eff0 is located 0 bytes inside of 4-byte region [0x60200000eff0,0x60200000eff4)
freed by thread T0 here:

#0 0x4db0d0 in operator delete(void*) /home/seurer/llvm/llvm-test/projects/compiler-rt/lib/asan/asan_new_delete.cc:94
#1 0x3af5a1ed5c in __libc_start_main (/lib64/libc.so.6+0x3af5a1ed5c)

previously allocated by thread T0 here:

#0 0x4daad0 in operator new(unsigned long) /home/seurer/llvm/llvm-test/projects/compiler-rt/lib/asan/asan_new_delete.cc:62
#1 0x3af5a1ed5c in __libc_start_main (/lib64/libc.so.6+0x3af5a1ed5c)

With the stack popping removed:

==14038==ERROR: AddressSanitizer: heap-use-after-free on address 0x60200000eff0 at pc 0x0000004dd398 bp 0x7fffa7320310 sp 0x7fffa7320308
WRITE of size 4 at 0x60200000eff0 thread T0

#0 0x4dd397 in Func6(int, int*) /home/seurer/test/stack/crash.cpp:19:7
#1 0x4dd4b6 in Func5(int, int*) /home/seurer/test/stack/crash.cpp:27:10
#2 0x4dd666 in Func4(int, int*) /home/seurer/test/stack/crash.cpp:34:10
#3 0x4dd816 in Func3(int, int*) /home/seurer/test/stack/crash.cpp:41:10
#4 0x4dd9c6 in Func2(int, int*) /home/seurer/test/stack/crash.cpp:48:10
#5 0x4ddb76 in Func1(int, int*) /home/seurer/test/stack/crash.cpp:55:10
#6 0x4ddd43 in main /home/seurer/test/stack/crash.cpp:64:10
#7 0x3af5a1ed5c in __libc_start_main (/lib64/libc.so.6+0x3af5a1ed5c)
#8 0x41b324 in _start (/home/seurer/test/stack/crash-test+0x41b324)

0x60200000eff0 is located 0 bytes inside of 4-byte region [0x60200000eff0,0x60200000eff4)
freed by thread T0 here:

#0 0x4daeb0 in operator delete(void*) /home/seurer/llvm/llvm-test/projects/compiler-rt/lib/asan/asan_new_delete.cc:94
#1 0x4ddd07 in main /home/seurer/test/stack/crash.cpp:62:3
#2 0x3af5a1ed5c in __libc_start_main (/lib64/libc.so.6+0x3af5a1ed5c)

previously allocated by thread T0 here:

#0 0x4da8b0 in operator new(unsigned long) /home/seurer/llvm/llvm-test/projects/compiler-rt/lib/asan/asan_new_delete.cc:62
#1 0x4ddcd4 in main /home/seurer/test/stack/crash.cpp:60:8
#2 0x3af5a1ed5c in __libc_start_main (/lib64/libc.so.6+0x3af5a1ed5c)
seurer added a comment.Jun 5 2015, 8:01 AM

I did some additional investigation on this.

The "second" copy of the calling function that appears on the stack ("new" and "delete" above) are caused by the code on lines 73 and 74 of sanitizer_stacktrace.cc.

trace_buffer[0] = pc;
size = 1;

That can be avoided but I am not sure if it is necessary for other platforms and in any case it still works as-is.

With that bit of code removed

// trace_buffer[0] = pc;

size = 0;

then the trace comes out as

#0 0x10104888 in __asan::GetStackTraceWithPcBpAndContext(__sanitizer::BufferedStackTrace*, unsigned long, unsigned long, unsigned long, void*, bool) /home/seurer/llvm/llvm-test/projects/compiler-rt/lib/asan/asan_stack.h:50
#1 0x10104888 in operator delete(void*) /home/seurer/llvm/llvm-test/projects/compiler-rt/lib/asan/asan_new_delete.cc:94
#2 0x10106fb4 in main (/home/seurer/test/stack/crash-small+0x10106fb4)
#3 0x3fffa9ab4dfc  (/lib/powerpc64le-linux-gnu/libc.so.6+0x24dfc)
#4 0x3fffa9ab4ff0 in __libc_start_main (/lib/powerpc64le-linux-gnu/libc.so.6+0x24ff0)

The question is where does __asan::GetStackTraceWithPcBpAndContext come from?

I looked through the inliner output and it shows that GetStackTraceWithPcBpAndContext is being inlined. I also stepped through the compiled code in the debugger and it too shows it was inlined. Note that GetStackTraceWithPcBpAndContext and operator delete show the same address (0x10104888) above in the trace which is weird.

So something odd is going on (in the debugger maybe?) but the same thing happens with the slow unwinder too. That is maybe something that can be investigated but it doesn't really affect this code which is written to match what the slow unwinder does.

seurer updated this revision to Diff 27446.Jun 10 2015, 8:58 AM
seurer edited edge metadata.

This is a reworking of the fast unwind changes for power. There is now no popping of "extra" stack frames.

10 test cases that check for specific stack locations had to be adjusted to allow for an extra frame.

samsonov edited edge metadata.Jun 11 2015, 11:33 AM

I don't think we should do this. Our fast unwinder *should* produce stack traces that have the correct top frame. If we start unwinding from PC inside free() interceptor, then free() should be frame #0.

We seem to be kind of stuck here. Either we remove the inlined function leaving the function that started the unwinding (free and etc.) as the top frame in the stack which was my previous solution or we don't remove it which is this solution. Which should we go for?

We seem to be kind of stuck here. Either we remove the inlined function leaving the function that started the unwinding (free and etc.) as the top frame in the stack which was my previous solution or we don't remove it which is this solution. Which should we go for?

I agree with Alexey. I feel that your earlier solution was correct. That solution is correct for PowerPC and does not harm other architectures. At this point it would be nice if we could get a resolution on this issue -- it's been held up for a very long time.

Thanks,
Bill

I agree that the earlier solution is better than relaxing tests to meet the broken unwinder behavior. Still, it papers over the problem instead of fixing it. As I explained in earlier comments, this code does not make sense; unwinding starts from exactly the frame we want it to start and the should be no extra frames to remove. This means that something else is broken, maybe the way we obtain current frame address? Also note that this extra frame means we made one step of frame pointer based unwind through sanitizer library code, which is intentionally built w/o frame pointers. This can break at any time.

I accept that this could be the easiest way to work around the (unknown) problem, and a platform-specific #ifdef is not the end of the world. But I'd like to understand (and leave a comment in the code) what is actually going on there.

I looked into why the extra frame was showing up when I first looked at this but did not see anything obvious. Unfortunately I have been tasked with some other work right now and don't time for any further in-depth investigation on it.

So if things can be accepted as-is I will make a note to investigate the cause later.

I'll let Evgeniy comment on that, but I feel that we shouldn't add a crutch here. Unwinder functionality is essential for ASan usability, so it's better to keep the problems we're aware of visible until we have a good understanding of their root causes.

Bill Schmidt had me try something: disable the stack popping in the slow unwinder for power to see what the popping code really does. Note that the unwanted function __asan::GetStackTraceWithPcBpAndContext appears in both slow and fast unwinders. So whatever is causing it to appear has been there all along and is already compensated for in the slow unwinder.

This is the stack for operator new *without popping* with the slow unwinder. Note that normally (i.e., with popping) the top 3 frames are removed.

#0 0x100ff688 in

__sanitizer::BufferedStackTrace::SlowUnwindStack(unsigned long, unsigned
int)
/home/seurer/llvm/llvm-test/projects/compiler-rt/lib/sanitizer_common/sanitizer_unwind_posix_libcdep.cc:116

#1 0x100fc684 in __sanitizer::BufferedStackTrace::Unwind(unsigned

int, unsigned long, unsigned long, void*, unsigned long, unsigned long,
bool)
/home/seurer/llvm/llvm-test/projects/compiler-rt/lib/sanitizer_common/sanitizer_stacktrace_libcdep.cc:66

#2 0x10104748 in

asan::GetStackTraceWithPcBpAndContext(sanitizer::BufferedStackTrace*, unsigned
long, unsigned long, unsigned long, void*, bool)
/home/seurer/llvm/llvm-test/projects/compiler-rt/lib/asan/asan_stack.h:50

#3 0x10104748 in operator new(unsigned long)

/home/seurer/llvm/llvm-test/projects/compiler-rt/lib/asan/asan_new_delete.cc:62

#4 0x10107c30 in main /home/seurer/test/stack/crash-small.cpp:11:8
#5 0x3fff93b84dfc  (/lib/powerpc64le-linux-gnu/libc.so.6+0x24dfc)

And here it is *without popping* with the fast unwinder.

#0 0x10104768 in

asan::GetStackTraceWithPcBpAndContext(sanitizer::BufferedStackTrace*, unsigned
long, unsigned long, unsigned long, void*, bool)
/home/seurer/llvm/llvm-test/projects/compiler-rt/lib/asan/asan_stack.h:50

#1 0x10104768 in operator new(unsigned long)

/home/seurer/llvm/llvm-test/projects/compiler-rt/lib/asan/asan_new_delete.cc:62

#2 0x10107c50 in main /home/seurer/test/stack/crash-small.cpp:11:8
#3 0x3fff97ba4dfc  (/lib/powerpc64le-linux-gnu/libc.so.6+0x24dfc)
#4 0x3fff97ba4ff0 in __libc_start_main

(/lib/powerpc64le-linux-gnu/libc.so.6+0x24ff0)

The same thing happens with the other functions, too.

Interesting. Apparently, this happens with symbolization enabled. Frame 0 and Frame 1 in the example above share the *same* PC 0x10104768. Which is somewhat weird - normally PC you should unwind from should be inside the interceptor (e.g. "operator new"), but not inside GetStackTraceWithPcBpAndContext() that is inlined into it.

seurer updated this revision to Diff 30743.Jul 27 2015, 3:14 PM
seurer edited edge metadata.

OK, I've been looking at a small example. I've taken the portion of Bill Seurer's patch that modifies the offset to the return address in the stack frame, but omitted the part that pops extra stack frames, so that I can get a look at why we get a bogus extra frame.

In the example, we produce this:

freed by thread T0 here:

#0 0x1010d988 in GetStackTraceWithPcBpAndContext /home/wschmidt/llvm/llvm-test2/projects/compiler-rt/lib/asan/asan_stack.h:50
#1 0x1010d988 in operator delete(void*) /home/wschmidt/llvm/llvm-test2/projects/compiler-rt/lib/asan/asan_new_delete.cc:94
#2 0x10110474 in main /home/wschmidt/seurer/tests/stack/crash-small.cpp:13:3
#3 0x3fffb7ad4cfc in generic_start_main /build/buildd/eglibc-2.19/csu/../csu/libc-start.c:287
#4 0x3fffb7ad4ef4 in __libc_start_main /build/buildd/eglibc-2.19/csu/../sysdeps/unix/sysv/linux/powerpc/libc-start.c:93

The issue is "frames" #0 and #1, which share an identical call address, but claim to be in different functions.

Let's start with a picture of the stack:

...callers of main...
0x3ffffffff410: stack frame for main()
0x3fffffffeb40: address of the BufferedStackTrace object created during operator delete(void*)
0x3fffffffeb20: stack frame for operator delete(void*)

Looking through all the various macro definitions, this is the order of events I see:

  • Inside operator delete(void*), we call StackTrace::GetCurrentPc(), which correctly returns the address following the call to StackTrace::GetCurrentPc(). This is 0x1010d910. It also calls GET_CURRENT_FRAME(), which equates to __builtin_frame_address(0), to obtain the address of the stack frame for operator delete(void*), which is 0x3fffffffeb20.
  • Operator delete(void*) then calls GetStackTraceWithPcBpAndContext(). This is inlined as expected, so no frame is stacked. The callee is passed 0x1010d910 as "pc", and 0x3fffffffeb20 as "bp".
  • GetStackTraceWithPcBpAndContext() calls stack->Unwind() on behalf of the uninitialized BufferedStackTrace object at 0x3fffffffeb40, passing the "pc" and "bp" values as above. This has the effect of storing the address following the call to Unwind() into operator delete(void*)'s stack frame at offset 16, i.e., at 0x3fffffffeb30. That address is 0x1010d98c. Observe that both 0x1010d910 ("pc") and 0x1010d98c (the saved return address) are addresses within the text of operator delete(void*).
  • BufferedStackTrace::Unwind() calls BufferedStackTrace::FastUnwindStack() with the same pc and bp parameters. This code pulls "pc1" out of the frame "bp" at offset 16, which is 0x3fffffffeb30. Thus pc1 = 0x1010d98c, whereas pc = 0x1010d910. There is code within FastUnwindStack() to check whether pc and pc1 are identical, presumably to determine whether they are in the same stack frame. This is a fallacious test, at least with the Power ABI. Even though both pc and pc1 are addresses within operator delete(void*), this test fails to detect this, and thus both addresses are included in the stack frame to be reported.

The only remaining question is why the two stack frames name different functions. This is simply a matter of the debug data attempting to track inlining. This library code is optimized, and the statement number information thus bounces around quite a lot. While stepping through the code with gdb, I saw gdb reporting that it was inside operator delete() for a few instructions, then that it was inside GetStackTraceWithPcBpAndContext() for a few instructions, and back and forth. It just so happens that 0x1010d910 was associated with a statement number from the inlined function, so we got the different name.

My conclusion is that the code in BufferedStackTrace::FastUnwindStack() that compares pc with pc1 isn't doing what it intends to do. I am frankly not entirely sure what it is intending to do. If that can be fixed, then perhaps Bill Seurer's code to change the amount of frame popping is not necessary. If it can't, then it seems to me to be as good a solution as any to avoid misreporting the same stack frame twice.

OK, I've been looking at a small example. I've taken the portion of Bill Seurer's patch that modifies the offset to the return address in the stack frame, but omitted the part that pops extra stack frames, so that I can get a look at why we get a bogus extra frame.

In the example, we produce this:

freed by thread T0 here:

#0 0x1010d988 in GetStackTraceWithPcBpAndContext /home/wschmidt/llvm/llvm-test2/projects/compiler-rt/lib/asan/asan_stack.h:50
#1 0x1010d988 in operator delete(void*) /home/wschmidt/llvm/llvm-test2/projects/compiler-rt/lib/asan/asan_new_delete.cc:94

^^
Why do we output two identical (0x1010d988) addresses here, if you refer to them as 0x1010d98c and 0x1010d910 later?
We do call StackTrace::GetPreviousInstructionPC() when we symbolize addresses, but why isn't second address turned into 0x1010d99c?

#2 0x10110474 in main /home/wschmidt/seurer/tests/stack/crash-small.cpp:13:3
#3 0x3fffb7ad4cfc in generic_start_main /build/buildd/eglibc-2.19/csu/../csu/libc-start.c:287
#4 0x3fffb7ad4ef4 in __libc_start_main /build/buildd/eglibc-2.19/csu/../sysdeps/unix/sysv/linux/powerpc/libc-start.c:93

The issue is "frames" #0 and #1, which share an identical call address, but claim to be in different functions.

Let's start with a picture of the stack:

...callers of main...
0x3ffffffff410: stack frame for main()
0x3fffffffeb40: address of the BufferedStackTrace object created during operator delete(void*)
0x3fffffffeb20: stack frame for operator delete(void*)

Looking through all the various macro definitions, this is the order of events I see:

  • Inside operator delete(void*), we call StackTrace::GetCurrentPc(), which correctly returns the address following the call to StackTrace::GetCurrentPc(). This is 0x1010d910. It also calls GET_CURRENT_FRAME(), which equates to __builtin_frame_address(0), to obtain the address of the stack frame for operator delete(void*), which is 0x3fffffffeb20.
  • Operator delete(void*) then calls GetStackTraceWithPcBpAndContext(). This is inlined as expected, so no frame is stacked. The callee is passed 0x1010d910 as "pc", and 0x3fffffffeb20 as "bp".
  • GetStackTraceWithPcBpAndContext() calls stack->Unwind() on behalf of the uninitialized BufferedStackTrace object at 0x3fffffffeb40, passing the "pc" and "bp" values as above. This has the effect of storing the address following the call to Unwind() into operator delete(void*)'s stack frame at offset 16, i.e., at 0x3fffffffeb30. That address is 0x1010d98c. Observe that both 0x1010d910 ("pc") and 0x1010d98c (the saved return address) are addresses within the text of operator delete(void*).
  • BufferedStackTrace::Unwind() calls BufferedStackTrace::FastUnwindStack() with the same pc and bp parameters. This code pulls "pc1" out of the frame "bp" at offset 16, which is 0x3fffffffeb30. Thus pc1 = 0x1010d98c, whereas pc = 0x1010d910. There is code within FastUnwindStack() to check whether pc and pc1 are identical, presumably to determine whether they are in the same stack frame. This is a fallacious test, at least with the Power ABI. Even though both pc and pc1 are addresses within operator delete(void*), this test fails to detect this, and thus both addresses are included in the stack frame to be reported.

I don't know why pc1 and pc are compared, this code looks really weird to me. However, I'm pretty sure it's not intending to solve the problem you're facing (two top PCs coming from the same function). On x86_64 ABI two PCs from the same function will almost never be equal as well.

I think there's some confusion here. You mention that we call BufferedStackTrace::FastUnwindStack(pc = 0x1010d910, bp = 0x3fffffffeb20, ...), and 0x3fffffffeb20 is a frame of operator delete, 0x1010d910 is an instruction inside operator delete code. So far so good! But the value of pc1 we fetch from a function frame is supposed to be a *return address of operator delete*, not a return address of StackTrace::Unwind() called from operator delete. Probably, we're just reading incorrect value from the function frame? (that's weird, though, docs I found suggest that "link register" used to store return address is callee-saved, as on x86: https://developer.apple.com/library/mac/documentation/DeveloperTools/Conceptual/LowLevelABI/110-64-bit_PowerPC_Function_Calling_Conventions/64bitPowerPC.html#//apple_ref/doc/uid/TP40002471-SW14)

The only remaining question is why the two stack frames name different functions. This is simply a matter of the debug data attempting to track inlining. This library code is optimized, and the statement number information thus bounces around quite a lot. While stepping through the code with gdb, I saw gdb reporting that it was inside operator delete() for a few instructions, then that it was inside GetStackTraceWithPcBpAndContext() for a few instructions, and back and forth. It just so happens that 0x1010d910 was associated with a statement number from the inlined function, so we got the different name.

My conclusion is that the code in BufferedStackTrace::FastUnwindStack() that compares pc with pc1 isn't doing what it intends to do. I am frankly not entirely sure what it is intending to do. If that can be fixed, then perhaps Bill Seurer's code to change the amount of frame popping is not necessary. If it can't, then it seems to me to be as good a solution as any to avoid misreporting the same stack frame twice.

OK, I've been looking at a small example. I've taken the portion of Bill Seurer's patch that modifies the offset to the return address in the stack frame, but omitted the part that pops extra stack frames, so that I can get a look at why we get a bogus extra frame.

In the example, we produce this:

freed by thread T0 here:

#0 0x1010d988 in GetStackTraceWithPcBpAndContext /home/wschmidt/llvm/llvm-test2/projects/compiler-rt/lib/asan/asan_stack.h:50
#1 0x1010d988 in operator delete(void*) /home/wschmidt/llvm/llvm-test2/projects/compiler-rt/lib/asan/asan_new_delete.cc:94

^^
Why do we output two identical (0x1010d988) addresses here, if you refer to them as 0x1010d98c and 0x1010d910 later?
We do call StackTrace::GetPreviousInstructionPC() when we symbolize addresses, but why isn't second address turned into 0x1010d99c?

I haven't looked into this yet, but the address that's printed is the value of the saved link register - 4 (the call address, not the return address). That would be the same for both. That's just an artifact, and not the root problem. (see below)
<SNIP>

I don't know why pc1 and pc are compared, this code looks really weird to me. However, I'm pretty sure it's not intending to solve the problem you're facing (two top PCs coming from the same function). On x86_64 ABI two PCs from the same function will almost never be equal as well.

Agreed. Since this code isn't documented and makes no sense, you might want to experiment with removing the check and see why somebody put it in there in the first place...

I think there's some confusion here. You mention that we call BufferedStackTrace::FastUnwindStack(pc = 0x1010d910, bp = 0x3fffffffeb20, ...), and 0x3fffffffeb20 is a frame of operator delete, 0x1010d910 is an instruction inside operator delete code. So far so good! But the value of pc1 we fetch from a function frame is supposed to be a *return address of operator delete*, not a return address of StackTrace::Unwind() called from operator delete. Probably, we're just reading incorrect value from the function frame? (that's weird, though, docs I found suggest that "link register" used to store return address is callee-saved, as on x86: https://developer.apple.com/library/mac/documentation/DeveloperTools/Conceptual/LowLevelABI/110-64-bit_PowerPC_Function_Calling_Conventions/64bitPowerPC.html#//apple_ref/doc/uid/TP40002471-SW14)

Aha! This is where the problem comes from. The link register is indeed callee-saved, but it is saved by the callee at offset 16 from the base of the *caller's* stack frame prior to pushing the callee's frame. To find the correct return address on Power from the callee's stack frame, you must load the back chain pointer (located at the base of the callee's stack frame) and add 16 to that. See Figure 2-18 of the ELFv2 ABI at https://www-03.ibm.com/technologyconnect/tgcm/TGCMFileServlet.wss/ABI64BitOpenPOWERv1.1_16July2015_pub.pdf?id=B81AEC1A37F5DAF185257C3E004E8845. (Requires free registration. Let me know if you have trouble accessing this and I'll send you a copy.)

In prolog code on Power, you will see:

mflr r0
std r0,16(r1)

prior to decrementing r1 to establish the callee stack frame.

Thus I can fix this by making the following changes to Bill Seurer's patch:

  • Remove changes involving stack popping;
  • Remove the kFastUnwindAddPC change; and
  • Change calculation of pc1 as follows:
#ifdef __powerpc__
     uhwptr *caller_frame = (uhwptr*)frame[0];
     if (!IsValidFrame((uptr)caller_frame, stack_top, bottom))
       break;
     uhwptr pc1 = caller_frame[kOffsetToPcInFrame];
 #else
     uhwptr pc1 = frame[kOffsetToPcInFrame];
 #endif

Result with those changes:

freed by thread T0 here:

#0 0x1010d94c in operator delete(void*) /home/wschmidt/llvm/llvm-test2/projects/compiler-rt/lib/asan/asan_new_delete.cc:94
#1 0x101104b4 in main /home/wschmidt/seurer/tests/stack/crash-small.cpp:13:3
#2 0x3fffa1824cfc in generic_start_main /build/buildd/eglibc-2.19/csu/../csu/libc-start.c:287
#3 0x3fffa1824ef4 in __libc_start_main /build/buildd/eglibc-2.19/csu/../sysdeps/unix/sysv/linux/powerpc/libc-start.c:93

With these changes, the ASAN test suite passes on powerpc64le.