This is an archive of the discontinued LLVM Phabricator instance.

[asan] On OS X, log reports to syslog and os_trace (version 2)
ClosedPublic

Authored by zaks.anna on Oct 5 2015, 2:49 PM.

Details

Summary

When ASan currently detects a bug, by default it will only print out the text of the report to stderr. This patch changes this behavior and writes the full text of the report to syslog before we terminate the process. It also calls os_trace (Activity Tracing available on OS X and iOS) with a message saying that the report is available in syslog. This is useful, because this message will be shown in the crash log.

For this to work, the patch makes sure we store the full report into error_message_buffer unconditionally, and it also strips out ANSI escape sequences from the report (they are used when producing colored reports).

This is the second version of the patch that addresses several concerns raised in http://reviews.llvm.org/D11981

  • Added a unit test for removal of color sequences
  • Moved the logic to sanitizer_common, instead of ASan so that all other tools could use error report logging. (Note, that this introduced some ugliness due to the specifics of os_trace, which only allows us to log literal strings and I think it's important to log the name of the tool that produced the report there.)
  • Refactored the Android specific WriteToSyslog logic to make it reusable by other platforms.
  • Added a lock to AppendToErrorMessageBuffer.
  • Made sure the logging is off for testing. The code will get executed by abort_on_error.cc test that runs with the default options.

I've initially tried to log to syslog during printing, which is done on Android right now. The advantage is that if we crash during error reporting or the produced error does not go through ScopedInErrorReport, we would still get a (partial) message in the syslog. However, that solution is very problematic on OS X. One issue is that the logging routine uses GCD, which may spawn a new thread on its behalf. In many cases, the reporting logic locks threadRegistry, which leads to deadlocks.

Diff Detail

Event Timeline

zaks.anna updated this revision to Diff 36555.Oct 5 2015, 2:49 PM
zaks.anna retitled this revision from to [asan] On OS X, log reports to syslog and os_trace (version 2).
zaks.anna updated this object.
zaks.anna added reviewers: glider, samsonov, kcc.
filcab added a comment.Oct 6 2015, 6:24 AM

Looks good, but wait on another LGTM.
Minor comments above.

lib/asan/asan_report.cc
48

I'd prefer this to have "mutex" or "lock" (or a variation of one) in the name.

lib/sanitizer_common/sanitizer_mac.cc
408

Can we verify any/some of these things (e.g: someone was holding the mutex and called us) with some CHECK() calls?

lib/sanitizer_common/sanitizer_printf.cc
282

It's not really testing Android, it's something more general.
Both rephrasing or removing the comment are ok for me.

test/ubsan/lit.common.cfg
45

Extra ':'.

glider added inline comments.Oct 6 2015, 10:00 AM
lib/asan/asan_report.cc
48

Which variables does this mutex protect?
I suspect |error_message_buffer| belongs to those, therefore you need to acquire this mutex every time you access |error_message_buffer|.
I also suggest you declare the mutex next to error_message_xxx declarations (and change the name as proposed by Filipe)

lib/sanitizer_common/sanitizer_common_libcdep.cc
15

Please keep the includes sorted alphabetically.

lib/sanitizer_common/sanitizer_mac.cc
385

I think the spelling outside Xcode must be "AddressSanitizer" (ditto for other sanitizers).
Kostya, WDYT?

kcc added inline comments.Oct 6 2015, 10:47 AM
lib/sanitizer_common/sanitizer_mac.cc
385

The tool name is certainly "AddressSanitizer" w/o spaces,
but that this is a Mac-only code, so I don't have strong opinion here.
On non-Mac-specific code -- no spaces please.

eugenis added inline comments.Oct 6 2015, 12:34 PM
lib/sanitizer_common/sanitizer_common.cc
322

a shorter way to do the same:

if (s ! = z) *z = *s;
z++;
s++;

}

lib/sanitizer_common/sanitizer_common.h
659

We already have knowledge of the current sanitizer: SanitizerToolName. Could you either dispatch on that, or use it directly?

lib/sanitizer_common/tests/sanitizer_common_test.cc
221

Please test 2 ansi sequencies with nothing in between.

test/asan/lit.cfg
39

This way log_to_syslog is completely untested. How bad is it? Is there a way to tune this down, but still exercise the logging code?

zaks.anna added inline comments.Oct 6 2015, 1:57 PM
lib/sanitizer_common/sanitizer_common.h
659

One of the APIs we use only allows string literals (for privacy reasons).

Dispatching on the string literal instead of the enum feels bad.

Also, how does this global SanitizerToolName works when we have more than one sanitizer on at the same time? I think UBSan and ASan are not mutually exclusive. It might not matter which tool failed in some cases; however, when a tool reports an error, differentiating between the two might be useful. One way of doing that would be to have the error reporting APIs take the tool name as a parameter.

In most cases, SanitizerToolName is used in calls to Report() when reporting an internal error. It would be useful to log the internal errors to syslog on OS X as well (at least that should be the long term goal).

Any further opinions on this?

lib/sanitizer_common/sanitizer_common_libcdep.cc
15

They are sorted. sanitizer-common.h is "the module header" for this cc file.

lib/sanitizer_common/sanitizer_mac.cc
408

It would be good to check if there is a lock on threadRegestry; however, I don't know if we can access that from a utility in sanitizer_common without passing it in explicitly.

test/asan/lit.cfg
39

The code will get executed by abort_on_error.cc test that runs with the default options.

eugenis added inline comments.Oct 6 2015, 2:15 PM
lib/sanitizer_common/sanitizer_common.h
659

I don't like 2 things about this: duplication between SanitizerType and SanitizerToolName, and the fact the SanitizerType list is incomplete and looks arbitrary.

Maybe extend the list to include, at least, all possible values of SanitizerToolName, and define a global instance of that to replace SanitizerToolName? With a static list of pretty tool names indexed by SanitizerType.

Also, how does this global SanitizerToolName works when we have more than one sanitizer on at the same time?

There is always a "main sanitizer" which provides the name to be used in reports. In the case of asan+ubsan the main sanitizer is asan. We never really saw this as a problem. After all, the report contains a type string like "use-after-free" which clearly tells the user what it is about.

zaks.anna added inline comments.Oct 6 2015, 2:37 PM
lib/sanitizer_common/sanitizer_common.h
659

I agree about the duplication.

Maybe extend the list to include, at least, all possible values of SanitizerToolName, and define a global instance of that to
replace SanitizerToolName? With a static list of pretty tool names indexed by SanitizerType.

In the solution you propose above, we would also replace uses of SanitizerToolName by a call to a function that dispatches on the enum and returns the name. This function would be implemented in sanitizer_common and would have to know about all sanitizers, correct?

I can do this as a separate patch.

After all, the report contains a type string like "use-after-free" which clearly tells the user what it is about.

I do no think it is true of every user and every error message. Also, this is not friendly to (hypothetical but likely) post processing scripts that check the syslog trying to detect and categorize the failures.

Another issue that is Mac-specific is that the cash report might only have the messages from os_trace, which will not include the rest of the message. Calling out which tool failed there is valuable.

zaks.anna updated this revision to Diff 36671.Oct 6 2015, 3:49 PM

Address all outstanding review comments.

zaks.anna marked 6 inline comments as done.Oct 6 2015, 3:50 PM
zaks.anna added inline comments.
lib/sanitizer_common/sanitizer_common.h
659

Using the SanitizerToolName in this patch; we can follow up with a different solution later as discussed.

samsonov added inline comments.Oct 6 2015, 4:03 PM
lib/asan/asan_report.cc
48

Agree. Note that currently there is a race between setting error_report_callback from __asan_set_error_report_callback and invoking it when the report is printed. We may use this mutex to fix the problem.

lib/sanitizer_common/sanitizer_common.cc
310

s = internal_strchrnul(s, 'm');

lib/sanitizer_common/tests/sanitizer_common_test.cc
212

s/0/nullptr

227

indentation went off

zaks.anna updated this revision to Diff 36684.Oct 6 2015, 4:54 PM

Address Alexey's comments. (I think now I've got the synchronization right.)

samsonov edited edge metadata.Oct 6 2015, 5:11 PM

Looks reasonable to me. I'm kind of worried about the increased probability of deadlock inside runtime (e.g. if we decide to call AppendToErrorMessageBuffer when executing user-provided error report callback), but I don't see a good way to deal with this.

lib/asan/asan_report.cc
34

This should be error_message_buf_mutex(LINKER_INITIALIZED);

674

You can just add a scope and BlockingMutexLock here.

lib/sanitizer_common/sanitizer_mac.cc
379

const char *kNameMapping[][2] = {
{"AddressSanitizer", "Address Sanitizer"},
{"ThreadSanitizer", "Thread Sanitizer"},
...
};

and then a loop, would probably reduce duplication. Also, add some kind of comment about why you want space.

zaks.anna marked an inline comment as done.Oct 6 2015, 6:32 PM
zaks.anna added inline comments.
lib/asan/asan_report.cc
674

Report calls Print, which might call AppendToErrorMessageBuffer resulting in double locking.

lib/sanitizer_common/sanitizer_mac.cc
379

The main issue here is not the space but that the os_trace API does not work with anything but the string literal constants (for privacy reasons).

zaks.anna added inline comments.Oct 6 2015, 6:33 PM
lib/asan/asan_report.cc
674

Oh, "add a scope". OK. Will do.

zaks.anna updated this revision to Diff 36696.Oct 6 2015, 6:44 PM
zaks.anna edited edge metadata.

Addressed additional comments from Alexey.

glider added inline comments.Oct 7 2015, 9:56 AM
lib/sanitizer_common/sanitizer_mac.cc
378

There's a name clash between this parameter and the global variable.

399

Any idea about the conditions under which the logging calls pthread_create?
Maybe we can warm up the logging subsystem when initializing ASan?
Will it help to create actual asl clients using asl_open()?

zaks.anna added inline comments.Oct 7 2015, 1:33 PM
lib/sanitizer_common/sanitizer_mac.cc
378

The variable is a static from another file, so technically there is no clash, but I can rename the parameter to buffer.

399

asl_log relies on libdispatch, for example, it calls dispatch_once. We cannot control when GCD will spawn a new thread. The deadlocks are unpredictable, they often happen when the client code uses GCD.

asl_open should be used if we want to log from multiple threads without locking:
"A client handle may only be safely shared amongst multiple threads if the application uses locks or some synchronization strategy to ensure single-threaded access."

LogFullErrorReport is only called behind a lock right now. However, I could add another lock just to protect logging to syslog. Ex: maybe we could lock before calling WriteToSyslog below and check the lock in WriteOneLineToSyslog. This way we will future proof the code.

zaks.anna updated this revision to Diff 36900.Oct 8 2015, 3:23 PM

Adding a new diff, where the variable is renamed and an extra lock is added to guard calls to ask_log.

All outstanding issues have been resolved.

glider accepted this revision.Oct 13 2015, 9:42 AM
glider edited edge metadata.

LGTM

lib/sanitizer_common/sanitizer_common_libcdep.cc
15

I think there should be a newline between them then.

This revision is now accepted and ready to land.Oct 13 2015, 9:42 AM
zaks.anna closed this revision.Nov 30 2015, 5:39 PM

In r253688.