This is an archive of the discontinued LLVM Phabricator instance.

[sanitizer] [SystemZ] Fix stack traces.
ClosedPublic

Authored by koriakin on Apr 8 2016, 3:21 AM.

Details

Summary

On s390, the return address is in %r14, which is saved 14 words from
the frame pointer.

Unfortunately, there's no way to do a proper fast backtrace on SystemZ
with current LLVM - the saved %r15 in fixed-layout register save
area points to the containing frame itself, and not to the next one.
Likewise for %r11 - it's identical to %r15, unless alloca is used
(and even if it is, it's still useless). There's just no way to
determine frame size / next frame pointer. -mbackchain would fix that
(and make the current code just work), but that's not yet supported
in LLVM. The end result is that stack traces show only 2 functions.
We will thus need to XFAIL some asan tests
(Linux/stack-trace-dlclose.cc, deep_stack_uaf.cc).

Diff Detail

Repository
rL LLVM

Event Timeline

koriakin retitled this revision from to [sanitizer] [SystemZ] Fix stack traces..
koriakin updated this object.
koriakin set the repository for this revision to rL LLVM.
koriakin added a project: Restricted Project.
koriakin added a subscriber: llvm-commits.
aizatsky accepted this revision.Apr 14 2016, 1:52 PM
aizatsky added a reviewer: aizatsky.
aizatsky added a subscriber: aizatsky.

LGTM

This revision is now accepted and ready to land.Apr 14 2016, 1:52 PM
This revision was automatically updated to reflect the committed changes.

@uweigand I have committed this as-is, but what do you think about the backchain issue? Should we prioritize adding -mbackchain support before ASan, or just XFAIL these tests?

eugenis edited edge metadata.Apr 14 2016, 2:27 PM

Also, if we know that the 3rd frame is always garbage, maybe break unwind after the second step?

Also, if we know that the 3rd frame is always garbage, maybe break unwind after the second step?

The sanitizers are also going to be used by gcc, which supports -mbackchain just fine and will give proper backtraces.

uweigand edited edge metadata.Apr 14 2016, 2:30 PM

@uweigand I have committed this as-is, but what do you think about the backchain issue? Should we prioritize adding -mbackchain support before ASan, or just XFAIL these tests?

Just adding -mbackchain isn't going to help much, since on all current SystemZ Linux distros the whole system -including all libraries- is built without backchain, so you'll always run into code without backchain.

Generally, the only safe way to unwind the stack is via DWARF CFI; we always default to -fasynchronous-unwind-tables, so .eh_frame should always have good CFI.

@uweigand I have committed this as-is, but what do you think about the backchain issue? Should we prioritize adding -mbackchain support before ASan, or just XFAIL these tests?

Just adding -mbackchain isn't going to help much, since on all current SystemZ Linux distros the whole system -including all libraries- is built without backchain, so you'll always run into code without backchain.

We can make -fsanitize=whatever force -mbackchain though (it's already done for -fno-omit-frame-pointer), which makes the main app traceable, plus add the flag to sanitizer build, which is all that really matters. And mixing non-sanitized libraries is quite sketchy for MSan and TSan either way...

Generally, the only safe way to unwind the stack is via DWARF CFI; we always default to -fasynchronous-unwind-tables, so .eh_frame should always have good CFI.

That is true, and it works in general - but IIRC one of the failing tests checks unwinding through a library that's already unloaded (it prints the stack trace of whatever allocated the problematic memory area, I forgot how it works exactly...). I'll take a closer look at why the fast unwinder is used in the other tests (this patch fixed 10 or so of them).

@uweigand I have committed this as-is, but what do you think about the backchain issue? Should we prioritize adding -mbackchain support before ASan, or just XFAIL these tests?

Just adding -mbackchain isn't going to help much, since on all current SystemZ Linux distros the whole system -including all libraries- is built without backchain, so you'll always run into code without backchain.

We can make -fsanitize=whatever force -mbackchain though (it's already done for -fno-omit-frame-pointer), which makes the main app traceable, plus add the flag to sanitizer build, which is all that really matters. And mixing non-sanitized libraries is quite sketchy for MSan and TSan either way...

Generally, the only safe way to unwind the stack is via DWARF CFI; we always default to -fasynchronous-unwind-tables, so .eh_frame should always have good CFI.

That is true, and it works in general - but IIRC one of the failing tests checks unwinding through a library that's already unloaded (it prints the stack trace of whatever allocated the problematic memory area, I forgot how it works exactly...). I'll take a closer look at why the fast unwinder is used in the other tests (this patch fixed 10 or so of them).

@uweigand I have debugged the issue - the failing tests involved

@uweigand I have committed this as-is, but what do you think about the backchain issue? Should we prioritize adding -mbackchain support before ASan, or just XFAIL these tests?

Just adding -mbackchain isn't going to help much, since on all current SystemZ Linux distros the whole system -including all libraries- is built without backchain, so you'll always run into code without backchain.

We can make -fsanitize=whatever force -mbackchain though (it's already done for -fno-omit-frame-pointer), which makes the main app traceable, plus add the flag to sanitizer build, which is all that really matters. And mixing non-sanitized libraries is quite sketchy for MSan and TSan either way...

Generally, the only safe way to unwind the stack is via DWARF CFI; we always default to -fasynchronous-unwind-tables, so .eh_frame should always have good CFI.

That is true, and it works in general - but IIRC one of the failing tests checks unwinding through a library that's already unloaded (it prints the stack trace of whatever allocated the problematic memory area, I forgot how it works exactly...). I'll take a closer look at why the fast unwinder is used in the other tests (this patch fixed 10 or so of them).

@uweigand The issue here is that DWARF CFI is *slow*. Since ASan does a stack trace on every malloc and free (to print it if someone oversteps the buffer later), it uses a "fast" unwinder for those by default, which naïvely walks the stack using fixed-location frame pointers. So we have three options here:

  1. Always use the DWARF unwinder (causes a HUGE perf hit in the ASan testsuite - it hasn't finished yet, but it seems to be about 10×),
  2. Implement -mbackchain and make -fsanitize=* imply it,
  3. Make peace with 2-element backtraces and XFAIL the tests.

I'm currently testing a patch for #1 and will submit it soon. WDYT?

@uweigand I have committed this as-is, but what do you think about the backchain issue? Should we prioritize adding -mbackchain support before ASan, or just XFAIL these tests?

Just adding -mbackchain isn't going to help much, since on all current SystemZ Linux distros the whole system -including all libraries- is built without backchain, so you'll always run into code without backchain.

We can make -fsanitize=whatever force -mbackchain though (it's already done for -fno-omit-frame-pointer), which makes the main app traceable, plus add the flag to sanitizer build, which is all that really matters. And mixing non-sanitized libraries is quite sketchy for MSan and TSan either way...

Generally, the only safe way to unwind the stack is via DWARF CFI; we always default to -fasynchronous-unwind-tables, so .eh_frame should always have good CFI.

That is true, and it works in general - but IIRC one of the failing tests checks unwinding through a library that's already unloaded (it prints the stack trace of whatever allocated the problematic memory area, I forgot how it works exactly...). I'll take a closer look at why the fast unwinder is used in the other tests (this patch fixed 10 or so of them).

@uweigand I have debugged the issue - the failing tests involved

@uweigand I have committed this as-is, but what do you think about the backchain issue? Should we prioritize adding -mbackchain support before ASan, or just XFAIL these tests?

Just adding -mbackchain isn't going to help much, since on all current SystemZ Linux distros the whole system -including all libraries- is built without backchain, so you'll always run into code without backchain.

We can make -fsanitize=whatever force -mbackchain though (it's already done for -fno-omit-frame-pointer), which makes the main app traceable, plus add the flag to sanitizer build, which is all that really matters. And mixing non-sanitized libraries is quite sketchy for MSan and TSan either way...

Generally, the only safe way to unwind the stack is via DWARF CFI; we always default to -fasynchronous-unwind-tables, so .eh_frame should always have good CFI.

That is true, and it works in general - but IIRC one of the failing tests checks unwinding through a library that's already unloaded (it prints the stack trace of whatever allocated the problematic memory area, I forgot how it works exactly...). I'll take a closer look at why the fast unwinder is used in the other tests (this patch fixed 10 or so of them).

@uweigand The issue here is that DWARF CFI is *slow*. Since ASan does a stack trace on every malloc and free (to print it if someone oversteps the buffer later), it uses a "fast" unwinder for those by default, which naïvely walks the stack using fixed-location frame pointers. So we have three options here:

  1. Always use the DWARF unwinder (causes a HUGE perf hit in the ASan testsuite - it hasn't finished yet, but it seems to be about 10×),
  2. Implement -mbackchain and make -fsanitize=* imply it,
  3. Make peace with 2-element backtraces and XFAIL the tests.

We have the exact same situation on linux/x86_64: system libraries are built w/o frame pointers. This is not a problem in practice at all - very few allocations that matter are done from system libraries. Simple cases like strdup are handled by intercepting the function and unwinding through the implementation in ASan runtime library instead. In the super rare cases where it matters, we have a runtime flag to turn on the dwarf unwind.

Dwarf unwind is too slow for malloc; it slows down malloc-intensive programs a lot (like 10x or more). This is a problem with real-world code, too, not just ASan test suite.

I vote for (2), but the behaviour should be the same as -fno-omit-frame-pointer which is currently not enabled automatically by ASan.

OK, in that case I agree with option (2). Implementing -mbackchain in itself doesn't hurt anyway, if only for compatibility with GCC.

[ Note that -fno-omit-frame-pointer doesn't suffice on s390x to enable unwinding; even though functions up the call stack will then use a frame pointer, and save it into their register save area, you still do not know without consulting DWARF CFI *where* that save area is (since it is at the top of the frame and you don't know the frame size). ]