This is an archive of the discontinued LLVM Phabricator instance.

[compiler-rt] bug 23600 - sanitizer stack trace pc off by 1
AbandonedPublic

Authored by msebor on May 27 2015, 10:10 AM.

Details

Reviewers
samsonov
Summary

The PC printed in sanitizer stack traces is consistently off by 1 byte or 1 instruction from the value printed by GDB and the value returned by the __builtin_return_address intrinsic function. The difference in the PC for the active frame when a stack trace is being generated in response to a signal causes lookups in DWARF line number programs to return the wrong line number for certain programs.

This change fixes the problem with the lookups and makes the sanitizer print PC values that are consistent with both GDB and __builtin_return_address.

Diff Detail

Event Timeline

msebor updated this revision to Diff 26610.May 27 2015, 10:10 AM
msebor retitled this revision from to [compiler-rt] bug 23600 - sanitizer stack trace pc off by 1.
msebor updated this object.
msebor edited the test plan for this revision. (Show Details)
msebor added a subscriber: Unknown Object (MLST).May 27 2015, 10:13 AM
kcc added a subscriber: kcc.

+samsonov for lit.common.cfg

No need for the "Changelog:" entry -- I personally see it as redundancy.

lib/sanitizer_common/sanitizer_stacktrace.h
47

I frankly don't remember is there is a reason to keep StackTrace <= 16 bytes.
Let me check later today.

84

I think the old code was intentional.
It cancels the thumb bit (whatever that is) and then executes the rest of the function.
Did you test it on arm?

lib/sanitizer_common/sanitizer_stacktrace_libcdep.cc
37

I frankly don't understand this.
The old code was to call GetPreviousInstructionPc(), which on some archs subtracted 4 or 8.
Now you just subtract 1. Why?

kcc added inline comments.May 27 2015, 10:40 AM
test/sanitizer_common/TestCases/print-stack-trace-pc.cc
8

This is strange. The test should know where the symbolizer is,
other tests don't require this.

23

I think you can make this test simpler by using e.g. [[@LINE-1]] in the CHECK: line, see e.g.
test/asan/TestCases/unaligned_loads_and_stores.cc

Just add the CHECK lines closer to the code, then use a small constant offset, e.g. [[@LINE-4]]

27

This code repeats 3 times, right?
Maybe put it in a macro?

kcc added inline comments.May 27 2015, 11:24 AM
lib/sanitizer_common/sanitizer_stacktrace.h
47

Note that the StackTrace is copied into the StackDepot and the signalled bit will be lost there.
I'd guess that instead of adding a new field we should add a new tag type to lib/sanitizer_common/sanitizer_stacktrace.h, e.g. TAG_SIGNAL.
(If we need it at all)

test/sanitizer_common/TestCases/print-stack-trace-pc.cc
2

That's a great test to have, but it is not testing the case with signals, is it?

msebor added inline comments.May 27 2015, 2:17 PM
lib/sanitizer_common/sanitizer_stacktrace.h
84

This change isn't necessary for the fix, but I included it because it was trivial and obvious (the function otherwise is missing a return statement), and because I didn't manage to create a test case that made the bug manifest in an observable way. I tested the same change to the GCC sanitizer on arm with no regressions.

lib/sanitizer_common/sanitizer_stacktrace_libcdep.cc
37

DWARF line number programs are represented as ranges of addresses that correspond to a given source line. When looking up a line number for a given PC value, the lookup compares the PC value to the beginning and end address of each range and considers it a match when the PC is anywhere in that range. It doesn't matter if the PC value points to the beginning of an instruction in the range or some byte past it. So subtracting 1 byte is just as good as subtracting the size of an instruction.

test/sanitizer_common/TestCases/print-stack-trace-pc.cc
8

As far as I can tell, no other ASan test depends on the symbolizer and the other sanitizers use their own symbolizer. I didn't spend too much time debugging this beyond simply making it work.

23

I'm afraid I don't see how to simplify the test by making use of the [[@LINE]] expressions. The test uses the XXX_LINE constants to set the line numbers printed by the first backtrace (the for loop in foo) and to set (via the #line directives) the corresponding line numbers printed by the sanitizer backtrace. The two sets of numbers must match. Can you flesh out in a bit more detail what you have in mind?

I tried changing the test to make use of [[@LINE]] and while it seems that it should be possible to do as you suggested I'm having trouble getting FileCheck to match the patterns for the second and subsequent pairs of CHECK directives (i.e., in bar and main). I spent quite a bit of time figuring out how to get CHECK to work the way I wanted to for the first patch and I'm not sure that investing even more time into changing it to use @LINE would be worth it.

27

Sure, I can add a macro for this.

msebor added inline comments.May 27 2015, 2:41 PM
test/sanitizer_common/TestCases/print-stack-trace-pc.cc
2

No, it isn't. It didn't seem too important since the code that generates the stack trace is the same in either case.

What seemed more important to me and what's not covered is testing the PC value for the active frame. Unfortunately, I'm not aware of a straightforward and portable way of getting the PC value in this case. I considered two approaches to verifying that the PC value for the active frame when the stack trace is generated in response to a signal:

  1. Invoke GDB and have it print the stack trace and use it for comparison.
  1. Handle the signal, extract and print the PC value in it, and raise it again to let the sanitizer generate its own.

I rejected (1) because I couldn't find any other tests that relied on GDB and introducing a dependency for just this one test seemed like overkill. Adding the machinery to invoke GDB also seemed like a nontrivial effort that I cannot justify.

(2) should be doable on POSIX using sigaction and siginfo_t but wouldn't work on non-POSIX targets. Since I don't have access to such targets I decided to keep the test simple at the expense of not exercising this case.

If you can suggest a portable way that's not overly involved I'd be happy to code it up.

msebor updated this revision to Diff 26636.May 27 2015, 2:47 PM

Added ADD_FRAME macro to the new print-stack-trace-pc.cc test.

samsonov added inline comments.May 27 2015, 2:50 PM
lib/sanitizer_common/sanitizer_stacktrace.h
84

I also think the original code was written like that intentionally - we're first canceling the thumb bit, and then subtract one. I agree that it's probably should be rewritten as

return (pc & (~1)) - 1;
lib/sanitizer_common/sanitizer_stacktrace_libcdep.cc
37

We've added architecture-specific GetPreviousInstructionPc for a reason: we don't care at all about actual instruction sizes and addresses, we just need PC's that would be correctly handled by symbolizer, and subtracting one provided to be insufficient. As, for the line programs, suppose I have two consecutive instructions:
<instruction 1> at PC [0x100, 0x102], source line 42.
<instruction 2> at PC [0x104, 0x105], source line 43.

Note that instructions are 4-byte aligned. I can imagine the line table for this program matching [0x100,0x102] to line 42, and [0x104,0x105] to line 43. The lookup of 0x103 would not find any line table entry, and thus return no line number at all.

test/sanitizer_common/TestCases/print-stack-trace-pc.cc
8

All tests under compiler-rt/test/sanitizer_common depend on the llvm-symbolizer (the dependency is added in compiler-rt/test/CMakeLists.txt). It should be available in $PATH, and thus picked up by ASan/TSan/MSan.

kcc added inline comments.May 27 2015, 3:36 PM
lib/sanitizer_common/sanitizer_stacktrace.h
84

the function otherwise is missing a return statement

No it is not. There is an #else clause.

It's very sad that none of our tests cover this use case (although you can't be sure w/o running clang tests on arm),
but it is not a good reason to change the logic of the code.

test/sanitizer_common/TestCases/print-stack-trace-pc.cc
2

Yea, I'd like to keep gdb out of our tests, it's too heavy.
But I am totally find with having a posix-only or even a linux-only test that uses sigaction.
(put it into test/sanitizer_common/TestCases/Linux/)
Since the patch aims exactly at signals, it sounds like having such a test would be nice.

23

[[@LINE]] matches the current line number
[[@LINE-1]] matches the previous, and so on.
So, do something like this

DoSomething(LINE);
CHECK: ExpexctedFoo:[[@LINE-1]]
CHECK: UnrelatedStuff1
CHECK: UnrelatedStuff2
CHECK: ExpexctedBar:[[@LINE-4]]

You will not need #define FOO_LINE nor the #line directives

msebor added inline comments.Jun 5 2015, 5:05 PM
lib/sanitizer_common/sanitizer_stacktrace.h
84

You're right. I completely missed there are two sets of #if/#endf blocks. My bad! I'll remove the change.

lib/sanitizer_common/sanitizer_stacktrace_libcdep.cc
37

I don't think the scenario you described can happen. After a branch/call, the PC points at the first byte of the instruction following the branch instruction (or following the delay slot). An address that's one less than that is that of the last byte of the branch instruction (or the instruction in the delay slot). There can't very well be a gap between the branch instruction and what follows it in the line number program, but if there was, it has to be dealt with by mapping addresses in the gap to the preceding address range. That's just what libbacktrace does: it stores the line number program as an array of triples (PC, file, line) sorted by PC. When looking up a line number for an address, it calls the bsearch function on the array with a callback that compares the address against the range denoted by the PC of current element and the PC of the element just past it (there's a sentinel element at the end of the array). I.e., no gaps are possible.

(As an aside, subtracting 1 on RISC is no different than doing the same on x86 where the call instruction is at least 2 or more bytes wide.)

All that being said, I don't think this part of the patch is essential (so far none of the tests failed after replacing the subtraction by 1 with GetPreviousInstructionPc). I'll do some more testing and if there are no regressions I'll submit a new patch with GetPreviousInstructionPc instead.

msebor added inline comments.Jun 9 2015, 2:48 PM
test/sanitizer_common/TestCases/print-stack-trace-pc.cc
8

I tried to debug the problem some more but wasn't able to reproduce it with the latest tests so I've removed the changes to this file from the latest patch.

msebor updated this revision to Diff 27402.Jun 9 2015, 2:55 PM
msebor updated this object.
msebor edited the test plan for this revision. (Show Details)

Added a new test, test/asan/TestCases/print-stack-trace-pc-on-signal.cc, to verify that the PC printed in stack traces generated in response to a signal matches the PC obtained in the signal handler for frame #0 and the PC returned from __builtin_return_address for frames #1 and beyond.
Removed an incorrect change to GetPreviousInstructionPc for ARM.
Removed unnecessary changes to lit.* files.

Are there any concerns with this version of the patch?

samsonov edited edge metadata.Jun 19 2015, 3:53 PM

Sorry for delayed response, I lost track of this patch for some reason =/

lib/sanitizer_common/sanitizer_stacktrace.h
47

I'm still concerned about us failing to propagate this field. StackTrace objects, for instance, are stored in / fetched from StackDepot, that would do nothing to save this "signaled" bit. If you use it only in Print method, perhaps you would now at call site where did you obtain this stack trace from, and whether you need to "adjust" the top frame.

That is, StackTrace::Unwind is not the right place to determine where the stack comes from - if context is missing, it means nothing - the context only serves as a hint for slow unwinder, and caller is not obliged to pass it down, if it has copied pc/bp from the context himself.

lib/sanitizer_common/sanitizer_stacktrace_libcdep.cc
44

Why do you need to adjust PCs in the error report? This would break offline symbolization - scripts that process ASan reports and provide file/line information would *not* calculate GetPreviousInstructionPc() on their own - they rely on ASan to give them to PCs of call instructions, not return addresses we see in the stack trace.

MTC added a subscriber: MTC.Aug 16 2021, 7:30 PM
msebor abandoned this revision.Mar 28 2022, 4:13 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 28 2022, 4:13 PM