Page MenuHomePhabricator

sanitizer_common: split LibIgnore into fast/slow paths
ClosedPublic

Authored by dvyukov on Jul 9 2021, 11:36 AM.

Details

Summary

LibIgnore is checked in every interceptor.
Currently it has all logic in the single function
in the header, which makes it uninlinable.
Split it into fast path (no libraries ignored)
and slow path (have ignored libraries).
It makes the fast path inlinable (single load).

Diff Detail

Event Timeline

dvyukov created this revision.Jul 9 2021, 11:36 AM
dvyukov requested review of this revision.Jul 9 2021, 11:36 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 9 2021, 11:36 AM
Herald added a subscriber: Restricted Project. · View Herald Transcript
vitalybuka accepted this revision.Jul 9 2021, 12:10 PM
This revision is now accepted and ready to land.Jul 9 2021, 12:10 PM
This revision was landed with ongoing or failed builds.Jul 11 2021, 2:34 AM
This revision was automatically updated to reflect the committed changes.

This patch seems to break TSAN_OPTION=ignore_noninstrumented_modules=1 behavior. Before this patch, make check-libarcher succeeds, after this patch, the tests fail. As I understand the description, on semantic change is intended? Any idea, what happened here?

make check-libarcher runt the tests for libarcher providing the TsanAnnotations for OpenMP. Since some intercepted pthread calls in libomp are identified as data race, while they are synchronized, we set this environmental variable to suppress analysis of the non-instrumented libomp.

I've tried:

$ ninja check-libarcher
ninja: error: unknown target 'check-libarcher'

Please extract what libarcher is doing and add as a tsan unit test. It will help to fix this and prevent future breakages of libarcher.
You can see existing libignore tests here:
https://github.com/llvm/llvm-project/blob/main/compiler-rt/test/tsan/ignore_lib0.cpp

This might have one of two reasons: libarcher is part of the openmp project, so you need to add archer to the enabled projects. Furthermore, libarcher is only supported for linux/macos and I think x86_64.

I'll try to come back with a standalone reproducer.

I was actually thinking about reverting this change, because e.g. Java, Mac (and now OpenMP) always require some ignores enabled, so the "fast path" includes checking ranges in these contexts. But this change is definitely positive for synthetic microbenchmarks :)

I have not decided re revert as there were no pressing need to decide quickly. And a test would be good anyway.

https://bugs.llvm.org/show_bug.cgi?id=51205 reports a small reproducer of the issue:

int main()
{
  #pragma omp parallel
  {  }
  #pragma omp parallel
  {  }
}

compiled with

clang -fopenmp -fsanitize=thread main.c

execute with

env OMP_NUM_THREADS=4 ./a.out

The application code does not have any data race. The application does not even access any data. Nevertheless threadsanitizer reports data race for pthread functions intercepted from libomp:

WARNING: ThreadSanitizer: data race (pid=8344)
  Atomic read of size 1 at 0x7b6800002f40 by main thread:
    #0 pthread_mutex_lock sanitizer_common_interceptors.inc:4249:215 (a.out+0x464758)
    #1 void __kmp_resume_64<false, true>(int, kmp_flag_64<false, true>*) <null> (libomp.so+0x8ef94)

  Previous write of size 1 at 0x7b6800002f40 by thread T4:
    #0 pthread_mutex_init tsan_interceptors_posix.cpp:1269:215 (a.out+0x447bdf)
    #1 __kmp_suspend_initialize_thread <null> (libomp.so+0x8e19e)

TSAN_OPTIONS='ignore_noninstrumented_modules=1' previously helped to suppress such reports, so that we actually suggest to set this variable and print a warning, if the variable is not set at runtime (Warning: please export TSAN_OPTIONS='ignore_noninstrumented_modules=1' to avoid false positive reports from the OpenMP runtime!)

From my perspective, it would make sense to name-shift the intercepted functions during TSan compiler instrumentation. A runtime flag could then easily limit the handling of intercepted functions to name-shifted function.

Here is my reproducer:

lib.cpp (Tsan detects a race between malloc and pthread_mutex_lock):

#include <thread>
#include <mutex>
#include <atomic>

struct globals{
  std::atomic<int> g_inited{0};
  std::mutex* g_mutex;
} g;

void worker(){
  while(g.g_inited<1);
  std::lock_guard<std::mutex> lock(*g.g_mutex);
}

extern "C"
int init(){
  auto t1 = std::thread(worker);
  g.g_mutex = new std::mutex();
  g.g_inited=1;
  t1.join();
  return 0;
}

main.c:

int init();

int main(){
  return init();
}

build (the rpath part might be linux-specific laziness to allow finding the shared library in the current directory):

clang++ -g -fPIC -shared -pthread lib.cpp -o libtest.so
clang -g -fsanitize=thread -L. -Wl,--rpath,. -ltest main.c

exec:

TSAN_OPTIONS='ignore_noninstrumented_modules=1' ./a.out

After revisiting the existing test cases, I guess it would be sufficient to add another runline:

// RUN: echo running with generic suppression of noninstrumented code:
// RUN: env  %env_tsan_opts=ignore_noninstrumented_modules=1 %run %t 2>&1 | FileCheck %s --check-prefix=CHECK-WITHSUPP

After revisiting the existing test cases, I guess it would be sufficient to add another runline:

// RUN: echo running with generic suppression of noninstrumented code:
// RUN: env  %env_tsan_opts=ignore_noninstrumented_modules=1 %run %t 2>&1 | FileCheck %s --check-prefix=CHECK-WITHSUPP

Somehow we did not have any portable tests with ignore_noninstrumented_modules=1 (only Darwin ones).
I've sent https://reviews.llvm.org/D106855 with a revert. Thanks for tracking down the reproducer.

The reason for using ignore_noninstrumented_modules=1 is the general interception of pthread / libc functions.
If the instrumentation pass would name shift the functions during compilation, turning off the analysis for non-name shifted functions, i.e., function calls from non-instrumented code, could be done at runtime without looking at the caller.