[tsan] Don't report bugs from interceptors called from libignored modules
ClosedPublic

Authored by kubamracek on Mar 28 2017, 6:53 PM.

Details

Summary

I'd like to expand the ignore_noninstrumented_modules=1 flag to cover all report types (it currently only suppresses data race detection). We're seeing some false positives from system libraries where they guard the mutex acquisition via atomics (so the deadlock cannot happen). The general approach that I'd like to achieve with ignore_noninstrumented_modules=1 is to only report bugs where we know they actually come from instrumented code, regardless of bug type (race, mutex problem, deadlock).

This patch make sure we don't report deadlocks and other bug types when we're inside an interceptor that was called from a noninstrumented module (when ignore_noninstrumented_modules=1 is set). Adding a testcase that shows that deadlock detection still works on Darwin (to make sure we're not silencing too many reports).

Diff Detail

Repository
rL LLVM
kubamracek created this revision.Mar 28 2017, 6:53 PM
kcc added a subscriber: kcc.Mar 29 2017, 5:25 PM

I don't like this change -- it is very un-intuitive that reads/writes affects deadlock checker.
Also remember that deadlock detector does not require instrumentation at all.
Can we solve it with a blacklist?

remember that deadlock detector does not require instrumentation at all.

Unfortunately, the fact that TSan is able to detect issues on non-instrumented code brings more harm than benefits, at least on macOS/iOS. Almost all reports that don't come from instrumented code are false positives. Our users are usually fine with recompiling their source code with instrumentation and they expect to only find issues with their code, not some 3rd party library. Same applies to mutexes and deadlocks. If a user wants to see reports from non-instrumented libraries, they can just turn the ignore_noninstrumented_modules flag to zero.

Can we solve it with a blacklist?

Maybe, but to use libignore (which I would need for this), the same patch would be needed. Libignore currently sets ignore_reads_and_writes, which doesn't prevent reporting deadlocks.

it is very un-intuitive that reads/writes affects deadlock checker.

That's why the patches renamed ignore_reads_and_writes to ignore_reads_writes_and_reports. Would it help if I added a separate ignore_reports (or even ignore_deadlocks) flag?

When an interceptor comes from a non-instrumented library, we set thr->in_ignored_lib=true. This causes SCOPED_TSAN_INTERCEPTOR to call the real function an skip all tsan processing. Consequently mutex lock/unlock operations in non-instrumented libraries are not handled by tsan already. So it's not clear to me how you get deadlock reports there. What am I missing?

One problem is if one mutex in deadlock cycles is taken from non-instrumented code, but the second mutex is taken from instrumented code, tsan won't report the deadlock on the mutex operations coming from non-instrumented code, but it will still report it on operations coming from instrumented code. So it seems to me that a better way to handle it is to hide all mutex operations in non-instrumented code from tsan. But it already seems to be happening, so I am confused.

When an interceptor comes from a non-instrumented library, we set thr->in_ignored_lib=true. This causes SCOPED_TSAN_INTERCEPTOR to call the real function an skip all tsan processing. Consequently mutex lock/unlock operations in non-instrumented libraries are not handled by tsan already. So it's not clear to me how you get deadlock reports there. What am I missing?

There's two issues with the current state of things:

  1. in_ignored_lib=true only affects second (nested) calls to other interceptor. So if a libignored library directly calls pthread_mutex_lock+pthread_mutex_unlock, those interceptors are still called and they detect deadlocks.
  2. Even if libignore affects direct interceptor calls, we'd have even more issues, because the pthread_mutex_lock+pthread_mutex_unlock interceptors provide important synchronization and if we just ignore them, we'll get even more false positives.

What I'm suggesting is: Non-instrumented modules should still call interceptors like pthread_mutex_lock+pthread_mutex_unlock, but these should just not report any issues, including deadlocks. Do you disagree?

dvyukov added inline comments.Apr 2 2017, 3:03 AM
lib/tsan/dd/dd_rtl.cc
40 ↗(On Diff #93331)

This is a Thread from dd_rtl.h, this is not ThreadState from tsan_rtl.h. It does not have that field.

  1. s/ignore_reads_and_writes/ignore_reads_writes_and_reports/ does not look like a good idea to me, because these things are not necessary related. If we go this route we need separate flags.
  2. From the description it seems that your intention is to ignore all reports, but you are ignoring only deadlocks. There are many more types of reports.

How about:

 bool OutputReport(ThreadState *thr, const ScopedReport &srep) {
-  if (!flags()->report_bugs)
+  if (!flags()->report_bugs || thr->suppress_reports)
     return false;
   atomic_store_relaxed(&ctx->last_symbolize_time_ns, NanoTime());
   const ReportDesc *rep = srep.GetReport();

This will be a minimalistic change and it will suppress all reports and won't affect anything other than non-instrumented libs.
But note that it will effectively nullify the external API thing as it will never report anything.

kubamracek updated this revision to Diff 93886.Apr 3 2017, 10:45 AM
kubamracek retitled this revision from [tsan] Don't report deadlocks when ignore_reads_and_writes > 0 to [tsan] Don't report bugs from interceptors called from libignored modules.
kubamracek edited the summary of this revision. (Show Details)

Addressing review comments, renaming.

From the description it seems that your intention is to ignore all reports, but you are ignoring only deadlocks. There are many more types of reports.

Right, this patch now suppresses all types.

note that it will effectively nullify the external API thing as it will never report anything.

I tested this and the external API still works fine. This is because this API isn't an interceptor and we're not using ScopedInterceptor in it.

kubamracek updated this revision to Diff 94548.Apr 7 2017, 11:21 AM

Is this okay to land?

dvyukov accepted this revision.Apr 21 2017, 4:45 AM
This revision is now accepted and ready to land.Apr 21 2017, 4:45 AM
This revision was automatically updated to reflect the committed changes.