This is an archive of the discontinued LLVM Phabricator instance.

[Sanitizers] Remove BuildId from sanitizers stacktrace on Darwin
ClosedPublic

Authored by usama54321 on May 10 2023, 1:23 PM.

Details

Summary

On Darwin, we do not want to show the BuildId appended at the end of stack
frames in Sanitizers. According to our understanding, the BuildId string is not being used anywhere on Darwin.

rdar://108324403

Diff Detail

Event Timeline

usama54321 created this revision.May 10 2023, 1:23 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 10 2023, 1:23 PM
Herald added a subscriber: Enna1. · View Herald Transcript
usama54321 requested review of this revision.May 10 2023, 1:23 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 10 2023, 1:23 PM
Herald added a subscriber: Restricted Project. · View Herald Transcript

There are two more calls to MaybeBuildIdToBuffer in the switch statement. I did not guard them with !SANITIZER_APPLE because they do not affect the codepath on Darwin. If we want, I can also guard them with !SANITIZER_APPLE, or just guard the function implementation itself rather than the call sites.

yln added a comment.May 10 2023, 2:27 PM

Nit: can we push the conditional inside MaybeBuildIdToBuffer()? Since it's already aptly named "Maybe..."

Please also show the before/after outputs for an example ASan/TSan/UBSan bug.

Feedback fixes

There is no change in UBSan.

Old ASan

WRITE of size 4 at 0x000100502ee0 thread T0
    #0 0x100493ec8 in main+0x84 (a.out:arm64+0x100003ec8) (BuildId: 6b02f9ebe9513beeb808945f611316dc32000000200000000100000000000e00)
    #1 0x18937a054  (<unknown module>)

0x000100502ee0 is located 1100 bytes after 100-byte region [0x000100502a30,0x000100502a94)
allocated by thread T0 here:
    #0 0x100f22b08 in wrap_malloc+0x94 (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x52b08) (BuildId: 487e1a780e8f3396a9f9da766c5e185b32000000200000000100000000000b00)
    #1 0x100493e68 in main+0x24 (a.out:arm64+0x100003e68) (BuildId: 6b02f9ebe9513beeb808945f611316dc32000000200000000100000000000e00)
    #2 0x18937a054  (<unknown module>)

SUMMARY: AddressSanitizer: heap-buffer-overflow (a.out:arm64+0x100003ec8) (BuildId: 6b02f9ebe9513beeb808945f611316dc32000000200000000100000000000e00) in main+0x84

New Asan

WRITE of size 4 at 0x60b0000004f0 thread T0
    #0 0x1008bfec8 in main+0x84 (a.out:arm64+0x100003ec8)
    #1 0x18937a054  (<unknown module>)
    #2 0x3e79fffffffffffc  (<unknown module>)

0x60b0000004f0 is located 1100 bytes after 100-byte region [0x60b000000040,0x60b0000000a4)
allocated by thread T0 here:
    #0 0x101260800 in wrap_malloc sanitizer_malloc_mac.inc:137
    #1 0x1008bfe68 in main+0x24 (a.out:arm64+0x100003e68)
    #2 0x18937a054  (<unknown module>)
    #3 0x3e79fffffffffffc  (<unknown module>)

SUMMARY: AddressSanitizer: heap-buffer-overflow (a.out:arm64+0x100003ec8) in main+0x84

Old TSan:

 Write of size 4 at 0x00010420c000 by thread T1:
    #0 Thread1 <null>:79480708 (a.out:arm64+0x100003dc8) (BuildId: 5e79074a7b753da999cf8789fa452d1432000000200000000100000000000e00)

  Previous write of size 4 at 0x00010420c000 by main thread:
    #0 main <null>:79480708 (a.out:arm64+0x100003e3c) (BuildId: 5e79074a7b753da999cf8789fa452d1432000000200000000100000000000e00)

  Location is global 'Global' at 0x00010420c000 (a.out+0x100008000)

  Thread T1 (tid=6744635, running) created by main thread at:
    #0 pthread_create <null>:79480708 (libclang_rt.tsan_osx_dynamic.dylib:arm64e+0x2ffec) (BuildId: e5ecae67005633f0b93152d2bf60e86e32000000200000000100000000000b00)
    #1 main <null>:79480708 (a.out:arm64+0x100003e34) (BuildId: 5e79074a7b753da999cf8789fa452d1432000000200000000100000000000e00)

SUMMARY: ThreadSanitizer: data race (a.out:arm64+0x100003dc8) (BuildId: 5e79074a7b753da999cf8789fa452d1432000000200000000100000000000e00) in Thread1+0x34

New TSan:

  Write of size 4 at 0x0001048e4000 by thread T1:
    #0 Thread1 <null>:131073952 (a.out:arm64+0x100003dc0)

  Previous write of size 4 at 0x0001048e4000 by main thread:
    #0 main <null>:131073600 (a.out:arm64+0x100003e3c)

  Location is global 'Global' at 0x0001048e4000 (a.out+0x100008000)

  Thread T1 (tid=6745255, running) created by main thread at:
    #0 pthread_create tsan_interceptors_posix.cpp:1048 (libclang_rt.tsan_osx_dynamic.dylib:arm64+0x2d20c)
    #1 main <null>:131073952 (a.out:arm64+0x100003e24)

SUMMARY: ThreadSanitizer: data race (a.out:arm64+0x100003dc0) in Thread1+0x3c
vitalybuka added inline comments.
compiler-rt/lib/sanitizer_common/sanitizer_stacktrace_printer.cpp
129

how about runtime sanitizer common flag which is FALSE by default for apple

fmayer added a subscriber: fmayer.May 18 2023, 12:59 PM

Can you please elaborate in the change description why "On Darwin, we do not want to show the BuildId appended at the end of stack
frames in Sanitizers."?

thetruestblue added a comment.EditedMay 18 2023, 1:06 PM

My only nit is to add more description to commit message (as fmayer suggested) about why we are removing this and perhaps a link to the review that this was introduced. https://reviews.llvm.org/D114294

Can you please elaborate in the change description why "On Darwin, we do not want to show the BuildId appended at the end of stack
frames in Sanitizers."?

We do not want to show it since according to our understanding it is currently not being used for anything on Darwin. Can you kindly elaborate a bit about the motivation of introducing this in https://reviews.llvm.org/D114294? Thanks

usama54321 edited the summary of this revision. (Show Details)May 18 2023, 1:19 PM

Can you please elaborate in the change description why "On Darwin, we do not want to show the BuildId appended at the end of stack
frames in Sanitizers."?

We do not want to show it since according to our understanding it is currently not being used for anything on Darwin. Can you kindly elaborate a bit about the motivation of introducing this in https://reviews.llvm.org/D114294? Thanks

It makes it a lot easier to symbolize crashes from stripped binary, because you can find the correct version of the unstripped binary from the build id.

Can you please elaborate in the change description why "On Darwin, we do not want to show the BuildId appended at the end of stack
frames in Sanitizers."?

We do not want to show it since according to our understanding it is currently not being used for anything on Darwin. Can you kindly elaborate a bit about the motivation of introducing this in https://reviews.llvm.org/D114294? Thanks

It makes it a lot easier to symbolize crashes from stripped binary, because you can find the correct version of the unstripped binary from the build id.

With that in mind I would suggest _not removing this_ unless you have a reason why this is causing problems. This can be useful for people.

fmayer requested changes to this revision.May 18 2023, 1:38 PM

Apart from my other comment, if you want to go ahead I would suggest implementing this in a different way and only removing the MaybeBuildIdToBuffer from the implementation of 'L' in sanitizer_stacktrace_printer.cpp, then a) people can still manually get the build id if their format string uses %b, and b) the MaybeBuildIdToBuffer function doesn't unnecessarily not work on OS X, which is confusing.

This revision now requires changes to proceed.May 18 2023, 1:38 PM
yln added a comment.May 18 2023, 2:35 PM

Apart from my other comment, if you want to go ahead I would suggest implementing this in a different way and only removing the MaybeBuildIdToBuffer from the implementation of 'L' in sanitizer_stacktrace_printer.cpp, then a) people can still manually get the build id if their format string uses %b, and b) the MaybeBuildIdToBuffer function doesn't unnecessarily not work on OS X, which is confusing.

I think these are good suggestions. How about combining this with Vitaly's suggestion for using a flag to only change this on Apple platforms (and to enable everyone to still get the current output)?

Just thought of an even more flexible approach. Make the frame format string a configuration option, so we can specify platform-dependent defaults, allow users to get current behavior (on Apple platforms), and give them the ability to fully customize if they desire.

static const char kDefaultFormat[] = "    #%n %p %F %L";

@vitalybuka @fmayer
What do you think?

Apart from my other comment, if you want to go ahead I would suggest implementing this in a different way and only removing the MaybeBuildIdToBuffer from the implementation of 'L' in sanitizer_stacktrace_printer.cpp, then a) people can still manually get the build id if their format string uses %b, and b) the MaybeBuildIdToBuffer function doesn't unnecessarily not work on OS X, which is confusing.

I think these are good suggestions. How about combining this with Vitaly's suggestion for using a flag to only change this on Apple platforms (and to enable everyone to still get the current output)?

Just thought of an even more flexible approach. Make the frame format string a configuration option, so we can specify platform-dependent defaults, allow users to get current behavior (on Apple platforms), and give them the ability to fully customize if they desire.

static const char kDefaultFormat[] = "    #%n %p %F %L";

@vitalybuka @fmayer
What do you think?

The custom strings would involve splitting the %b from the %L and putting "%L %b" as the custom string, which would lead to an extraneous space if %b is empty. We could add the space as the first character of the %b, but that's kind of ugly.

yln added a comment.EditedMay 18 2023, 2:43 PM

Oh, I and just realized, we already have this option to customize the stack frame format (sanitizer common flag stack_trace_format)!
https://github.com/llvm/llvm-project/blob/ca50840b5bc0679e824850ca4fd322504d6713a0/compiler-rt/lib/sanitizer_common/sanitizer_flags.inc#L194

So then it's just about specifying a platform-dependent default?

yln added a comment.May 18 2023, 2:51 PM

Maybe something like this?

#if APPLE
static const char kDefaultFormat[] = "    #%n %p %F %xxxx<figure this out>";
#else
static const char kDefaultFormat[] = "    #%n %p %F %L";
#endif

Apart from my other comment, if you want to go ahead I would suggest implementing this in a different way and only removing the MaybeBuildIdToBuffer from the implementation of 'L' in sanitizer_stacktrace_printer.cpp, then a) people can still manually get the build id if their format string uses %b, and b) the MaybeBuildIdToBuffer function doesn't unnecessarily not work on OS X, which is confusing.

I think these are good suggestions. How about combining this with Vitaly's suggestion for using a flag to only change this on Apple platforms (and to enable everyone to still get the current output)?

Just thought of an even more flexible approach. Make the frame format string a configuration option, so we can specify platform-dependent defaults, allow users to get current behavior (on Apple platforms), and give them the ability to fully customize if they desire.

static const char kDefaultFormat[] = "    #%n %p %F %L";

@vitalybuka @fmayer
What do you think?

The custom strings would involve splitting the %b from the %L and putting "%L %b" as the custom string, which would lead to an extraneous space if %b is empty. We could add the space as the first character of the %b, but that's kind of ugly.

The current codeflow makes this refactoring difficult since the call to MaybeBuildIdToBuffer appears in an else if branch in both %L and %M specifiers. There is already a %b specifier present for MaybeBuildIdToBuffer but I can not just use that. I could write another specifier like %B which only calls MaybeBuildIdToBuffer with the conditions (!info->file && info->module) || (!(address & kExternalPCBit) && info->module) but that looks really ugly. Do you have any other suggestions?

MaskRay added inline comments.
compiler-rt/lib/sanitizer_common/tests/sanitizer_stacktrace_printer_test.cpp
140

This can use a regular if statement.

Apart from my other comment, if you want to go ahead I would suggest implementing this in a different way and only removing the MaybeBuildIdToBuffer from the implementation of 'L' in sanitizer_stacktrace_printer.cpp, then a) people can still manually get the build id if their format string uses %b, and b) the MaybeBuildIdToBuffer function doesn't unnecessarily not work on OS X, which is confusing.

I think these are good suggestions. How about combining this with Vitaly's suggestion for using a flag to only change this on Apple platforms (and to enable everyone to still get the current output)?

Just thought of an even more flexible approach. Make the frame format string a configuration option, so we can specify platform-dependent defaults, allow users to get current behavior (on Apple platforms), and give them the ability to fully customize if they desire.

static const char kDefaultFormat[] = "    #%n %p %F %L";

@vitalybuka @fmayer
What do you think?

The custom strings would involve splitting the %b from the %L and putting "%L %b" as the custom string, which would lead to an extraneous space if %b is empty. We could add the space as the first character of the %b, but that's kind of ugly.

The current codeflow makes this refactoring difficult since the call to MaybeBuildIdToBuffer appears in an else if branch in both %L and %M specifiers. There is already a %b specifier present for MaybeBuildIdToBuffer but I can not just use that. I could write another specifier like %B which only calls MaybeBuildIdToBuffer with the conditions (!info->file && info->module) || (!(address & kExternalPCBit) && info->module) but that looks really ugly. Do you have any other suggestions?

Could you please explain why these build ids do harm? Otherwise it would really be preferable to just have consistency between platforms.

At any rate, if you have to: however you do the change, please do not make`'%b` a no-op, and don't make MaybeBuildIdToBuffer a no-op. If the easiest way is to put the two calls in %M and %L into an ifdef, then I'd begrudgingly accept that.

usama54321 updated this revision to Diff 529072.EditedJun 6 2023, 4:05 PM

There are a couple of problems with Build IDs on Darwin. Firstly, Build IDs/UUIDs are 16 bytes on Darwin. I put up a patch at https://reviews.llvm.org/D152309 to fix this. Secondly, a stack trace is not the right place to show Build Ids as multiple frames from the same library lead to repetition. This purpose is better served by the print_module_map sanitizer flag which dumps the Build Ids of all the libraries without repetition. I have followed your recommendations and made the %b specifier and the MaybeBuildIdToBuffer method functional again. @fmayer

fmayer accepted this revision.Jun 6 2023, 4:14 PM

I still don't like the inconsistency between platforms, and I don't think a few bytes of repeated build id matter, but fine.

This revision is now accepted and ready to land.Jun 6 2023, 4:14 PM
This revision was landed with ongoing or failed builds.Jun 6 2023, 4:38 PM
This revision was automatically updated to reflect the committed changes.
fmayer added a comment.Jun 6 2023, 5:23 PM

Could this have broken check-hwasan? http://45.33.8.238/linux/109046/step_10.txt

Looks like it.

thakis added a comment.Jun 6 2023, 5:25 PM

Please take a look and revert for now if it takes a while to fix.

compiler-rt/lib/sanitizer_common/sanitizer_stacktrace_printer.cpp
221

sanitizer_platform.h does

#else
#  define SANITIZER_APPLE 0

so it's defined on non-apple too. I think that's why this breaks tests.

fmayer reopened this revision.Jun 6 2023, 5:27 PM
This revision is now accepted and ready to land.Jun 6 2023, 5:27 PM