Add structures for the new trace format,
functions that serialize and add events to the trace
and trace replaying logic.
Depends on D107910.
Differential D107911
tsan: add new trace dvyukov on Aug 11 2021, 9:22 AM. Authored by
Details
Add structures for the new trace format, Depends on D107910.
Diff Detail
Event TimelineComment Actions I want to add few more tests, but the rest of the code is ready. So I mailed it to get any high-level feedback sooner. Comment Actions (review incomplete, but this is what I have so far)
Comment Actions Hi, I believe that your change is causing a build failure on one of the bots, can you please take a look? https://lab.llvm.org/buildbot/#/builders/37/builds/6103
Comment Actions I think GCC version 8.3 is bad (which the build bot uses). GCC 8.4 seems to have fixed the issue: https://godbolt.org/z/66xYK1xMf Comment Actions FTR, one bot failed with: /home/buildbots/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/lib/tsan/tests/unit/tsan_trace_test.cpp:153:12: error: unused variable 'res' [-Werror,-Wunused-variable] bool res = RestoreStack(thr->tid, v3::EventType::kLock, thr->sid, ^ /home/buildbots/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/lib/tsan/tests/unit/tsan_trace_test.cpp:194:12: error: unused variable 'res' [-Werror,-Wunused-variable] bool res = https://lab.llvm.org/buildbot/#/builders/19/builds/6003/steps/24/logs/stdio I've sent https://reviews.llvm.org/D108118 to fix this. Comment Actions In builds with the -Werror -Wmissing-field-initializers option(s) enabled, the following instantiation will cause an error. Thread::Params tests[] = { {1, 0, 1, true}, {4, 0, 2, true}, {4, 2, 2, true}, {8, 3, 1, true}, {2, 1, 1, true}, {1, 1, 1, false}, {8, 5, 4, false}, {4, static_cast<uptr>(-1l), 4, false}, }; Could the patch be updated to explicitly initialize the last int field in the Thread::Params struct? Comment Actions This patch is causing some failures on the clang-ppc64le-rhel buildbot. Here is the failing build. Specifically in the compiler-rt/lib/tsan/tests/unit/tsan_trace_test.cpp test case:
Comment Actions That may be the case, but unless GCC 8.3 is an unsupported compiler (I don't believe it is), can we get a fix, a workaround or get this change reverted? It is causing the bot I linked to be red, and also one of our internal bots are failing because of this problem. Comment Actions This is not breaking any default builds. This is only failing because Werror is on. By default it isn't. It can be disabled with -DCOMPILER_RT_ENABLE_WERROR=OFF, which I'd recommend for the bots still using GCC 8.3, given we know it's buggy. Comment Actions We probably should not leave it to the user Simple change should fix it. struct Event { u64 type : 3; EventType GetType() const { return static_cast<EventType>(type); } }; Comment Actions @dvyukov To avoid revert I've done in commit above. Feel free to change if you have a better solution.
Comment Actions This seems simple enough, but still would be nice if the code doesn't need to suffer too much because of known buggy compilers. Would it be reasonable to force Werror off for known bad compiler versions? In this case GCC 9.2 or earlier (except 8.4 or later from 8 series which also include the fix). Comment Actions This doesn't seem to fix the problem on our internal bot (and likely also the upstream one). Interestingly, @melver's solution of -DCOMPILER_RT_ENABLE_WERROR=OFF doesn't work either. It would appear that -Werror is being hardcoded in line 62 of compiler-rt/lib/tsan/go/buildgo.sh. If I remove that, I can get the compilation to "succeed". I'll put up that workaround shortly. Comment Actions This patch still fails on the PPC64LE-RHEL Bot (https://lab.llvm.org/buildbot/#/builders/57/builds/9429). Could we please revert this patch or add a follow up fix to it? (The fix should be to explicitly initialize all fields in the struct within the initializer list (in lines 128 - 131) of the Thread::Params struct) Comment Actions I will have to respectfully disagree here. "Buggy compilers" are a fact of life. We are all developers on a software project like any other and just as developers on other projects sometimes have to put in code workarounds for bugs in Clang (which we develop), we too have to do the same for bugs in compilers we support for building our project. Turning on -Werror is a very useful thing and any suggestion to turn it off for bots that turn it on needs very solid justification. There have been many subtle issues identified by bots that use this option and turning it off completely just because of one issue is ill-advised IMHO. Not to mention that as @dyung mentioned, it doesn't fix the problem. While I can certainly sympathize with not wanting to put in code to work around bugs that are not yours, I am afraid that this is something we all have to do sometimes. All in all, if there is no obvious immediate fix that will bring the bot back to green, I recommend pulling this patch and any follow-up fixes that depend on it. Once you think you have a complete fixed patch, you can post it here and we'll be happy to test it ahead of the commit on the very machine that the bot runs on. Comment Actions The newly added unit tests are failing on Darwin. Can you please look into it? Would it possible to revert this until the issue is resolved? TsanUnitTest-x86_64-Test --gtest_filter=Trace.MemoryAccessSize -- Note: Google Test filter = Trace.MemoryAccessSize [==========] Running 1 test from 1 test suite. [----------] Global test environment set-up. [----------] 1 test from Trace [ RUN ] Trace.MemoryAccessSize access_size=1, offset=0, size=1, res=1, type=0 ThreadSanitizer:DEADLYSIGNAL ==49176==ERROR: ThreadSanitizer: SEGV on unknown address 0x0000000000f8 (pc 0x00010a190c52 bp 0x7e8000080b70 sp 0x7e8000080b00 T570136) ==49176==The signal is caused by a READ memory access. ==49176==Hint: address points to the zero page. #0 __tsan::v3::TraceSwitchPart(__tsan::ThreadState*) tsan_rtl.cpp:682 (TsanUnitTest-x86_64-Test:x86_64+0x1000bcc52) #1 __tsan::v3::TraceFunc(__tsan::ThreadState*, unsigned long) tsan_rtl.cpp:630 (TsanUnitTest-x86_64-Test:x86_64+0x1000bd2b1) #2 __tsan::Trace_MemoryAccessSize_Test::TestBody()::Thread::Func(void*) tsan_trace_test.cpp:93 (TsanUnitTest-x86_64-Test:x86_64+0x10003fe77) #3 _pthread_start <null>:2 (libsystem_pthread.dylib:x86_64+0x6108) #4 thread_start <null>:2 (libsystem_pthread.dylib:x86_64+0x1b8a) ==49176==Register values: rax = 0x0000000000000000 rbx = 0x0000000000000000 rcx = 0x0000000000000008 rdx = 0x0000000000000000 rdi = 0x000000010ea8e000 rsi = 0x0000000000001000 rbp = 0x00007e8000080b70 rsp = 0x00007e8000080b00 r8 = 0x0000000063000000 r9 = 0x0000000000000000 r10 = 0x000000010ea8e000 r11 = 0x0000000000000206 r12 = 0x0000000000000000 r13 = 0x0000000000000000 r14 = 0x0000000000001000 r15 = 0x0000000000000000 ThreadSanitizer can not provide additional info. SUMMARY: ThreadSanitizer: SEGV tsan_rtl.cpp:682 in __tsan::v3::TraceSwitchPart(__tsan::ThreadState*) ==49176==ABORTING Refer to https://green.lab.llvm.org/green/job/clang-stage1-RA/23374/testReport/junit/ThreadSanitizer-Unit/. Comment Actions The test and the code is new, and the only user of the code is the test. So I think it's fair to just disable the test on Darwin until Dmitry has a chance to look at it. I'll try and do this later today. |
Other code below uses reinterpret_cast<Event *>(atomic_load_relaxed(&thr->trace_pos)); -- also switch to reintepret_cast?