People keep complaining about spurious failures in the heavy halt_on_error* tests. After r275539 I've managed to reproduce a failure in halt_on_error-torture.cc, reported by Kostya:
/home/max/build/llvm/./bin/clang --driver-mode=g++ -fsanitize=address -mno-omit-leaf-frame-pointer -fno-omit-frame-pointer -fno-optimize-sibling-calls -gline-tables-only -m64 -fsanitize-recover=address -pthread /home/max/src/llvm/projects/compiler-rt/test/asan/TestCases/Posix/halt_on_error-torture.cc -o /home/max/build/llvm/projects/compiler-rt/test/asan/X86_64LinuxConfig/TestCases/Posix/Output/halt_on_error-torture.cc.tmp
env ASAN_OPTIONS=halt_on_error=false:suppress_equal_pcs=false /home/max/build/llvm/projects/compiler-rt/test/asan/X86_64LinuxConfig/TestCases/Posix/Output/halt_on_error-torture.cc.tmp 10 20 >10.txt 2>&1 || true
FileCheck --check-prefix=CHECK-COLLISION /home/max/src/llvm/projects/compiler-rt/test/asan/TestCases/Posix/halt_on_error-torture.cc < 10.txt || FileCheck --check-prefix=CHECK-NO-COLLISION /home/max/src/llvm/projects/compiler-rt/test/asan/TestCases/Posix/halt_on_error-torture.cc < 10.txt
--
Exit Code: 1

Command Output (stderr):
--
/home/max/src/llvm/projects/compiler-rt/test/asan/TestCases/Posix/halt_on_error-torture.cc:76:22: error: expected string not found in input
// CHECK-COLLISION: AddressSanitizer: nested bug in the same thread, aborting
                     ^
<stdin>:1:1: note: scanning from here
=================================================================
^
<stdin>:22:10: note: possible intended match here
SUMMARY: AddressSanitizer: use-after-poison /home/max/src/llvm/projects/compiler-rt/test/asan/TestCases/Posix/halt_on_error-torture.cc:45:14 in run(void*)
         ^
/home/max/src/llvm/projects/compiler-rt/test/asan/TestCases/Posix/halt_on_error-torture.cc:77:25: error: expected string not found in input
// CHECK-NO-COLLISION: All threads terminated
                        ^
<stdin>:1:1: note: scanning from here
=================================================================
^
<stdin>:3:31: note: possible intended match here
WRITE of size 1 at 0x7f36b00fedb0 thread T1
                              ^
--
********************
Here is what I found out:
When we run halt_on_error-torture.cc with 10 threads and 20 iterations under halt_on_error=false:suppress_equal_pcs=false, we write 200 reports to 10.txt and sometimes get collisions. The CHECK-COLLISION check looks for the 'AddressSanitizer: nested bug in the same thread, aborting' message in 10.txt, but in the failing runs the file does not contain this line. If I don't redirect stderr to 10.txt, 'AddressSanitizer: nested bug in the same thread, aborting' is printed to my screen as expected.
This happens because we hit a race in the WriteToFile function called from the ScopedInErrorReport constructor:
  u32 current_tid = GetCurrentTidOrInvalid();
  if (reporting_thread_tid_ == current_tid ||
      reporting_thread_tid_ == kInvalidTid) {
    // This is either asynch signal or nested error during error reporting.
    // Fail simple to avoid deadlocks in Report().

    // Can't use Report() here because of potential deadlocks
    // in nested signal handlers.
    const char msg[] = "AddressSanitizer: nested bug in the same thread, "
                       "aborting.\n";
    WriteToFile(kStderrFd, msg, sizeof(msg));

    internal__exit(common_flags()->exitcode);
  }
Here one thread writes the "AddressSanitizer: nested bug in the same thread, aborting.\n" message while another thread is concurrently writing its report. On Unix there is no guarantee that such writes are atomic unless the corresponding file is opened with the O_APPEND flag, so the "AddressSanitizer: nested bug in the same thread, aborting.\n" message can be lost. When a collision occurs and the message is lost, neither CHECK-COLLISION nor CHECK-NO-COLLISION matches and the test fails.
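To make the O_APPEND difference concrete, here is a minimal, deterministic sketch. It is not a reproduction of the race above: the test shares one stderr descriptor between threads, whereas this sketch deliberately opens the same file twice so the effect does not depend on timing. WriteViaTwoDescriptors is a hypothetical helper written for illustration only; it is not part of compiler-rt or its tests.

```cpp
#include <fcntl.h>
#include <unistd.h>

#include <string>

// Hypothetical helper, not part of compiler-rt or its tests.
// Writes `first` and `second` through two independently opened descriptors
// for `path` and returns the resulting file contents ("" on any error).
std::string WriteViaTwoDescriptors(const char *path, bool append,
                                   const std::string &first,
                                   const std::string &second) {
  // Start from an empty file so repeated calls do not see stale data.
  int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
  if (fd < 0) return "";
  close(fd);

  const int flags = O_WRONLY | (append ? O_APPEND : 0);
  int a = open(path, flags);
  int b = open(path, flags);
  bool ok = a >= 0 && b >= 0;
  // Without O_APPEND each descriptor keeps its own offset, starting at 0.
  // With O_APPEND the offset is moved to end-of-file before every write.
  ok = ok && write(a, first.data(), first.size()) ==
                 static_cast<ssize_t>(first.size());
  ok = ok && write(b, second.data(), second.size()) ==
                 static_cast<ssize_t>(second.size());
  if (a >= 0) close(a);
  if (b >= 0) close(b);
  if (!ok) return "";

  // Read the file back so the caller can inspect what survived.
  std::string contents;
  int in = open(path, O_RDONLY);
  if (in < 0) return "";
  char buf[256];
  ssize_t n;
  while ((n = read(in, buf, sizeof(buf))) > 0) contents.append(buf, n);
  close(in);
  return contents;
}
```

Called with append=false, the second write starts at offset 0 and overwrites the first, so one of the two messages disappears from the file. Called with append=true, both messages are kept in order. This is the property the `>>` redirection relies on.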
Ideally, this race in ScopedInErrorReport should be eliminated, but for now we can fix the heavy recovery mode tests by implicitly setting O_APPEND on the opened files (use >> instead of > for the stderr redirection).