This is an archive of the discontinued LLVM Phabricator instance.

Wrong sanitizer handling of PCs in backtrace
Needs ReviewPublic

Authored by jakubjelinek on Feb 15 2017, 9:02 AM.

Details

Reviewers
kcc
koriakin
Summary

The null-deref-1.c testcase (from compiler-rt null_deref.cc) fails on s390x with -march=zEC12.
The problem is that sanitizer_common and asan doesn't care about important difference in PCs in backtraces.
Most of the PCs are return addresses after calls, and as such for unwind info as well as source locations (symbolization)
it is desirable to apply StackTrace::GetPreviousInstructionPc on them (though it is questionable if it should be applied
also to the addresses actually printed in the backtrace, e.g. GDB prints the return addresses instead), because at
the return address might be e.g. already start of a different function or unrelated code (especially for noreturn functions,
but generally you want the location of the call, not whatever instruction is after it too).
But, for synchronously delivered signals, such as SIGSEGV and various others, the PC you get from unwind info
or from siginfo of the fatal signal is not any kind of return pc, but the start of the instruction that couldn't be executed
because it faulted. If such instruction e.g. happens to be the very first one in the function, by subtracting one you might
appear in a completely unrelated function. And in any case, you want the location of the faulting instruction, not whatever
appears before it.

The unwinder has for many years a new API I've added, _Unwind_GetIPInfo, which has an extra argument that tells
what kind of PC it returns, if it is a PC after a call insn or PC before faulting instruction (signal frame).
Unfortunately right now the sanitizer backtrace code has recording of PCs into an array and calling
StackTrace::GetPreviousInstructionPc as separate steps quite far away, where all context is lost.

This patch is a hack, basically for PCs from the second category (i.e. which point before the faulting instruction)
it adds a small number to those so that when StackTrace::GetPreviousInstructionPc is called, it will return
the address of the faulting instruction rather than something before it. The printed hex PCs will be then
correct for these and still bogus (pointing to inside the call insn) otherwise (the latter as before).
If we want to fix that, I think we'd need to record the flags what kind of PCs it is in trace_buffer or in array next to it,
and then always use the PCs from the trace_buffer as is for printing, and for the return addresses
only use StackTrace::GetPreviousInstructionPc for symbolization purposes.

Diff Detail

Event Timeline

jakubjelinek created this revision.Feb 15 2017, 9:02 AM
kcc edited edge metadata.Feb 16 2017, 6:26 PM

Please:

  • no more #ifdefs int this part of code, at least in the first two hunks
  • no code duplications (the first two hunks look similar)
  • no C-style comments, use //
  • a test

Also, if possible, please try to change the logic from "harm, then unharm" to "don't harm"

In D29992#679484, @kcc wrote:

Please:

  • no more #ifdefs int this part of code, at least in the first two hunks
  • no code duplications (the first two hunks look similar)
  • no C-style comments, use //
  • a test

Also, if possible, please try to change the logic from "harm, then unharm" to "don't harm"

The patch has been written mainly as a hack for discussions on how to fix it for real (i.e. whether to use a bool [kStackTraceMax] array field next to trace_buffer, bitmap or whatever else.
Also, I don't have much time for this.
As for testcase, the testsuite already has one (the null_deref.cc), though just with -march=zEC12 on s390x.
A slight modification that makes it fail also on x86_64-linux is e.g.
-attribute((noinline))
+attribute((noinline, no_sanitize_address, noclone))
(noclone needed only for g++ at -O2 and higher, otherwisejust no_sanitize_address is enough).
With that I can reproduce the bug with g++ 6.3, trunk, clang++ 3.8.1 and 4.0.0 (r291659) - the faulting instruction
is then the very first one in NullDeref function and therefore one gets:

13538==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000028 (pc 0x000000400730 bp 0x000000400740 sp 0x7fff29334330 T0)

#0 0x40072f  (/tmp/null-deref+0x40072f)
#1 0x400606 in main /tmp/null_deref.cc:21

rather than

13544==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000028 (pc 0x000000400706 bp 0x7ffe2102a0f0 sp 0x7ffe2102a0f0 T0)

#0 0x400705 in NullDeref /tmp/null_deref.cc:15
#1 0x40071d in main /tmp/null_deref.cc:21

E.g. with clang++ -g -O2 the function is:
000000000050c470 <_ZL9NullDerefPi>:

50c470:       ff 04 25 28 00 00 00    incl   0x28
50c477:       c3                      retq

you get PC of 0x50c470 for that (before faulting insn) and that -1 already is outside of any function or if unlucky enough inside another function.