This is an archive of the discontinued LLVM Phabricator instance.

[Symbolizers] On Darwin compute function offset when possible.
ClosedPublic

Authored by delcypher on Oct 28 2019, 11:41 PM.

Details

Summary

The sanitizer symbolizers support printing the function offset
(difference between pc and function start) of a stackframe using the
%q format specifier.

Unfortunately this didn't actually work because neither the atos
or dladdr symbolizer set the AddressInfo::function_offset field.

This patch teaches both symbolizers to try to compute the function
offset. In the case of the atos symbolizer, atos might not report the
function offset (e.g. it reports a source location instead) so in this
case it fallsback to using dladdr() to compute the function offset.

Two test cases are included.

rdar://problem/56695185

Diff Detail

Event Timeline

delcypher created this revision.Oct 28 2019, 11:41 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptOct 28 2019, 11:41 PM
Herald added a subscriber: Restricted Project. · View Herald Transcript
yln added a comment.Oct 30 2019, 1:03 PM

I have a few small comments. Looks good overall!

compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_mac.cpp
32

If we fail to dladdr, stack->info.function_offset will not be assigned.
What is the initial value? garbage? 0x0, or kUnknown = ~(uptr)0?

What do we want here? I think 0x0 would be a good choice.
Would it be possible to construct a test case for this code path to pin down the behavior?

171

stack->info.function_offset is assigned unconditionally, which is inconsistent with DlAddrSymbolizer::SymbolizePC.

Also: kUnknown = ~(uptr)0 so I would expect the "invalid address case" to print 0xFFFF.
But our tests only check for 0x0:
// BADADDR-NOT: function_name:baz{{(\(\))?}} function_offset:0x0

Is it possible to construct a test for the fallback not working? (Same as above.)

compiler-rt/test/sanitizer_common/TestCases/Darwin/symbolizer-function-offset-atos.cpp
3

Another way to structure this test would be to

  1. compile only once (with -g) then
  2. RUN with debug info
  3. rm *.dSYM
  4. RUN same command, but now without debug info.

I think this would nicely communicate that without debug info---but with everything else being equal---offsets still work as expected.

5

Is verbosity=2 required?

compiler-rt/test/sanitizer_common/TestCases/Darwin/symbolizer-function-offset-dladdr.cpp
10

Could this be replaced with --implicit-check-not='function_offset:0x0 ?
Maybe also add --implicit-check-not='function_offset:0xF.
Also in above test.

delcypher marked 2 inline comments as done.Nov 7 2019, 4:30 PM
delcypher added inline comments.
compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_mac.cpp
32

The initial value is kUnknown. See sanitizer_symbolizer.cpp.

AddressInfo::AddressInfo() {
  internal_memset(this, 0, sizeof(AddressInfo));
  function_offset = kUnknown;
}

I don't know a way of writing a test case such that dladdr() fails. Perhaps stripping the binary would do it?

171

stack->info.function_offset is assigned unconditionally, which is inconsistent with DlAddrSymbolizer::SymbolizePC.

No it's not. We assign AddressInfo::kUnknown which is the default value of the function offset.

Also: kUnknown = ~(uptr)0 so I would expect the "invalid address case" to print 0xFFFF

The stacktrace printer prints 0x0 for kUnknown. See lib/sanitizer_common/sanitizer_stacktrace_printer.cpp.

But our tests only check for 0x0

Given my point above, this isn't relevant.

Is it possible to construct a test for the fallback not working? (Same as above.)

We have a test case for falling back to dlsym (when debug information is missing) but we don't have a test case for when dlsym fails.

compiler-rt/test/sanitizer_common/TestCases/Darwin/symbolizer-function-offset-atos.cpp
3

This might work. I'll give it a try.

5

This is for knowing which symbolizer is being used (e.g. CHECK: Using atos found at)

compiler-rt/test/sanitizer_common/TestCases/Darwin/symbolizer-function-offset-dladdr.cpp
10

Maybe. I've never used this feature. I had to do the check as two separate FileCheck invocations because otherwise FileCheck failed to detect function_offset:0x0.

delcypher marked an inline comment as done.Nov 7 2019, 4:33 PM
delcypher added inline comments.
compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_mac.cpp
171

No it's not. We assign AddressInfo::kUnknown which is the default value of the function offset.

Just to elaborate. It looks inconsistent but it actually isn't given how function_offset is initialized. Do you have a suggestion to make it more consistent. Maybe in DlAddrSymbolizer::SymbolizePC we should always assign to function_offset too?

yln added inline comments.Nov 8 2019, 9:51 AM
compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_mac.cpp
32

Got it, thanks for explaining. Let's ignore this test case for this change.

171

stack->info.function_offset is assigned unconditionally, which is inconsistent with DlAddrSymbolizer::SymbolizePC.

I meant: stack->info.function_offset = function_offset; is executed unconditionally.

Do you have a suggestion to make it more consistent. Maybe in DlAddrSymbolizer::SymbolizePC we should always assign to function_offset too?

I would flip it: only assign to it here, if we were able to compute something.

uptr start_address = AddressInfo::kUnknown;
// Attempt 1: fill via ParseCommandOutput

// Attempt 2/fallback: fill via dladdr
if (start_address == AddressInfo::kUnknown) {
  Dl_info info;
    int result = dladdr((const void *)addr, &info);
    if (result)
      start_address = reinterpret_cast<uptr>(info.dli_saddr);
}

// Only assign if we were able to compute start_address
if (start_address != AddressInfo::kUnknown) {
  CHECK(addr >= start_address);
  stack->info.function_offset = addr - start_address;
}
delcypher marked an inline comment as done.Nov 13 2019, 2:35 PM
delcypher added inline comments.
compiler-rt/test/sanitizer_common/TestCases/Darwin/symbolizer-function-offset-dladdr.cpp
10

I tried using --implicit-check-not=. It doesn't work for this use case. This flag adds implicit CHECK-NOT in between patterns, it won't apply to lines that match the CHECK patterns. The existing patterns are the same lines we want to apply CHECK-NOT on but that doesn't work here. Initially I thought it was working but that's because the CHECK-NOT was firing on the __sanitizer_print_stack_trace on the top of the stacktrace.

delcypher marked an inline comment as done.Nov 13 2019, 3:07 PM
delcypher added inline comments.
compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_mac.cpp
32

I tried stripping the binary. It prevents us from getting the name of the function but it seems that dladdr is still able to determine where the start of the function is.

delcypher updated this revision to Diff 229194.Nov 13 2019, 3:09 PM
  • Rework test to use a single binary
  • Rework logic to only set function_offset when a useful value was found.

@yln I've re-worked this patch addressing everything I could.

  • Fix line length violation.
  • Fix broken test.
yln added inline comments.Nov 14 2019, 12:26 PM
compiler-rt/test/sanitizer_common/TestCases/Darwin/symbolizer-function-offset-atos.cpp
9

Are the comments accidentally switched?
I thought:
debug -> atos works
no debug -> dladdr fallback

13

Same as below. Refined positive regex would allow for removal of BADADDR, second FileCheck. invocation, temporary file, etc.

compiler-rt/test/sanitizer_common/TestCases/Darwin/symbolizer-function-offset-dladdr.cpp
10

Ah, got it, this is the explicit negative case. Could we make the regex for the positive case smarter so it doesn't match 0x0? This would remove the need for the separate BADADDR FileCheck run and the temporary output file, etc.

delcypher marked 2 inline comments as done.Nov 14 2019, 2:01 PM
delcypher added inline comments.
compiler-rt/test/sanitizer_common/TestCases/Darwin/symbolizer-function-offset-atos.cpp
9

No the comments are the right way round.

It there's debug info atos will report a source location without a function offset. In that case we'll fallback to dladdr.
If there's no debug info atos will not report a source location and it will report the function offset. In that case we don't fallback to dladdr.

compiler-rt/test/sanitizer_common/TestCases/Darwin/symbolizer-function-offset-dladdr.cpp
10

Regexes are good for explicitly stating what you want to match. They are (IMHO) terrible if you want to explicitly not match a pattern unless the regex language variant has explicit support for it (e.g. (?!...) in Python's re module). LLVM's regex implementation is "POSIX extended regular expression (ERE)" which I don't think has (?!...). We could probably come up with a complicated regex that does what we want but it would be difficult to understand.

So let's just go with what we have here. The approach here is a lot easier to understand than a complicated regex.

delcypher marked an inline comment as done.Nov 14 2019, 2:06 PM
delcypher added inline comments.
compiler-rt/test/sanitizer_common/TestCases/Darwin/symbolizer-function-offset-dladdr.cpp
10

@yln Just be clear. If you have a suggested regex then I'll consider it but I don't see how to write a comprehensible regex that does what we want it to do

yln added inline comments.Nov 15 2019, 11:04 AM
compiler-rt/test/sanitizer_common/TestCases/Darwin/symbolizer-function-offset-atos.cpp
4

// With debug info atos reports the source location, but no function offset. We fallback to dladdr() to retrieve the function offset.

9

Got it. Suggesting comment:

// Without debug info atos already reports the function offset. No fallback required.

compiler-rt/test/sanitizer_common/TestCases/Darwin/symbolizer-function-offset-dladdr.cpp
10

How about any number of 0s, followed by at least one non-0?
0*[1-9a-f]
https://regexr.com/4ou65

delcypher marked an inline comment as done.Nov 15 2019, 3:22 PM
delcypher added inline comments.
compiler-rt/test/sanitizer_common/TestCases/Darwin/symbolizer-function-offset-dladdr.cpp
10

Yes I guess that would work. If we want to match the whole address I guess we'd have to do

0*[1-9a-f][0-9a-f]*

It's not clear that we want to disallow 0x0 though so if we go this way I'd want a comment explaining that.

delcypher marked an inline comment as done.Nov 15 2019, 3:23 PM
delcypher added inline comments.
compiler-rt/test/sanitizer_common/TestCases/Darwin/symbolizer-function-offset-atos.cpp
4

That's a good description. I'll add it.

delcypher updated this revision to Diff 229970.Nov 18 2019, 9:16 PM
  • Update some test comments.
  • Use more sophisticated regex so we can drop separate 0x0 bad address check.
delcypher marked 3 inline comments as done.Nov 18 2019, 9:18 PM

@yln I've done another pass over this. Hopefully there are no remaining issues.

yln accepted this revision.Nov 19 2019, 9:38 AM
This revision is now accepted and ready to land.Nov 19 2019, 9:38 AM
This revision was automatically updated to reflect the committed changes.