This patch adds functions for managing fibers:
- __tsan_get_current_fiber()
- __tsan_create_fiber()
- __tsan_destroy_fiber()
- __tsan_switch_to_fiber()
- __tsan_set_fiber_name()
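For illustration, a minimal usage sketch of the API is below. This is hedged: it follows the pattern discussed later in this review (annotate immediately before the real context switch, here done with swapcontext), and the exact signatures and flag arguments are assumptions, not a verbatim copy of the patch.

#include <sanitizer/tsan_interface.h>  // assumed location of the __tsan_* fiber declarations
#include <ucontext.h>

static ucontext_t main_ctx, worker_ctx;
static void *main_fiber, *worker_fiber;
static char worker_stack[64 * 1024];

static void worker(void) {
  // ... fiber body: anything done here is attributed to the "worker" fiber ...
  __tsan_switch_to_fiber(main_fiber, 0);   // annotate right before the real switch back
  swapcontext(&worker_ctx, &main_ctx);
}

int run_once(void) {
  main_fiber = __tsan_get_current_fiber();
  worker_fiber = __tsan_create_fiber(0);
  __tsan_set_fiber_name(worker_fiber, "worker");

  getcontext(&worker_ctx);
  worker_ctx.uc_stack.ss_sp = worker_stack;
  worker_ctx.uc_stack.ss_size = sizeof(worker_stack);
  worker_ctx.uc_link = &main_ctx;
  makecontext(&worker_ctx, worker, 0);

  __tsan_switch_to_fiber(worker_fiber, 0);  // annotate, then actually switch
  swapcontext(&main_ctx, &worker_ctx);

  __tsan_destroy_fiber(worker_fiber);       // the fiber has finished and will not run again
  return 0;
}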
Differential D54889
Fiber support for thread sanitizer. Authored by yuri on Nov 26 2018, 2:07 AM.
Comment Actions Can you explain what this patch does a bit? The way I see it, it's just adding an extra level of indirection to each cur_thread() call. Is that it? Does that affect the performance of TSan? Can we have some tests added that show what exactly TSan can now achieve with fibers? If the approach gets approved by @dvyukov, I'd like to make sure that the work can be re-used to implement proper understanding of Darwin GCD queues. Comment Actions It's unclear how this is supposed to be used. I would expect to see an interceptor for swapcontext that will make use of this, but there is none... Comment Actions
What kind of API is required for GCD queues? Comment Actions This patch adds an indirection for cur_thread() on Linux; there is no way to implement the feature without it. On macOS the indirection already exists, so nothing has changed there. While preparing this patch I did my best not to affect the performance of programs that do not need the new API. Comment Actions An interceptor for swapcontext without an additional sanitizer API is not enough because it is not known when a context is created and destroyed. Actually, I am not sure if swapcontext is really used by anyone because it is slow and has strange limitations. My interface assumes that users call __tsan_switch_to_fiber() immediately before switching context and then call swapcontext or whatever they want to perform the actual switch.
The interface change is already present in the diff.
OK, I will work on it and add some tests. Comment Actions What do people use nowadays? Will swapcontext work along with a makecontext interceptor?
Missed it.
Comment Actions The proposed fiber API is well suited for GCD tasks:
I will add a test that covers such a scenario. Comment Actions We use custom assembly code. You can see it at https://github.com/acronis/pcs-io/blob/master/libpcs_io/pcs_ucontext.c
It is possible to implement interceptors for swapcontext() and makecontext(), but the user still has to call something when a context is destroyed. Actually, such interceptors are not needed by everyone. If an application executes fibers in a single thread, then it does not need any special support. Comment Actions I think single-threaded programs would also benefit from swapcontext support. At the very least it will give correct stacks. I think the tsan shadow stack can also overflow on some fiber patterns (if a fiber has an uneven number of function entries and exits). I also wonder if it's theoretically possible to support fibers done with longjmp. I did fibers in Relacy with makecontext+swapcontext to do an initial switch to a new fiber, and then setjmp/longjmp to switch between already running fibers. If that's the common pattern, then we could see in longjmp that we are actually switching to a different stack and do a fiber switch. Re cleanup, we could do it the same way we clean up atomic variables. Each atomic variable holds a large context object in tsan, but there are no explicit destructors for atomic variables (and it probably would not be feasible to ask users to annotate each atomic var). So we should have most of the required machinery already. Comment Actions
Comment Actions Got it.
Maybe disabling interceptors before setjmp/longjmp and enabling them after it will help?
Are you sure it is a good solution?
QEMU already has annotations for AddressSanitizer at the points of fiber switch. It is not a big problem to add annotations for ThreadSanitizer at the same places. Comment Actions Some high-level comments:
I think check_analyze.sh test will fail. Kinda hard to say, because it's broken already at the moment, but this perturbs runtime functions enough:

Before:
write1 tot 367; size 1425; rsp 8; push 1; pop 5; call 2; load 21; store 13; sh 40; mov 94; lea 4; cmp 39
write2 tot 373; size 1489; rsp 8; push 1; pop 5; call 2; load 21; store 13; sh 40; mov 95; lea 4; cmp 43
write4 tot 374; size 1457; rsp 8; push 1; pop 5; call 2; load 21; store 13; sh 40; mov 95; lea 4; cmp 43
write8 tot 364; size 1429; rsp 8; push 1; pop 5; call 2; load 21; store 13; sh 40; mov 95; lea 4; cmp 39
read1 tot 409; size 1674; rsp 8; push 1; pop 4; call 2; load 21; store 13; sh 45; mov 97; lea 4; cmp 39
read2 tot 412; size 1694; rsp 8; push 1; pop 4; call 2; load 21; store 13; sh 45; mov 97; lea 4; cmp 43
read4 tot 412; size 1694; rsp 8; push 1; pop 4; call 2; load 21; store 13; sh 45; mov 97; lea 4; cmp 43
read8 tot 404; size 1660; rsp 8; push 1; pop 4; call 2; load 21; store 13; sh 45; mov 97; lea 4; cmp 39
func_entry tot 42; size 165; rsp 0; push 0; pop 0; call 1; load 4; store 2; sh 3; mov 12; lea 1; cmp 1
func_exit tot 37; size 149; rsp 0; push 0; pop 0; call 1; load 2; store 1; sh 3; mov 9; lea 1; cmp 1

After:
write1 tot 367; size 1439; rsp 4; push 1; pop 5; call 2; load 21; store 14; sh 40; mov 93; lea 4; cmp 40
write2 tot 375; size 1493; rsp 4; push 1; pop 5; call 2; load 21; store 14; sh 40; mov 94; lea 4; cmp 44
write4 tot 375; size 1471; rsp 4; push 1; pop 5; call 2; load 21; store 14; sh 40; mov 94; lea 4; cmp 44
write8 tot 365; size 1435; rsp 4; push 1; pop 5; call 2; load 21; store 14; sh 40; mov 94; lea 4; cmp 40
read1 tot 412; size 1682; rsp 4; push 1; pop 4; call 2; load 21; store 14; sh 45; mov 96; lea 4; cmp 40
read2 tot 414; size 1698; rsp 4; push 1; pop 4; call 2; load 21; store 14; sh 45; mov 96; lea 4; cmp 44
read4 tot 414; size 1698; rsp 4; push 1; pop 4; call 2; load 21; store 14; sh 45; mov 96; lea 4; cmp 44
read8 tot 404; size 1642; rsp 4; push 1; pop 4; call 2; load 21; store 14; sh 45; mov 96; lea 4; cmp 40
func_entry tot 48; size 197; rsp 0; push 0; pop 0; call 1; load 4; store 3; sh 3; mov 14; lea 1; cmp 2
func_exit tot 45; size 181; rsp 0; push 0; pop 0; call 1; load 2; store 2; sh 3; mov 11; lea 1; cmp 2
ThreadState -> TaskContext (of physical thread, always linked) -> TaskContext (current fiber context)
Comment Actions Hi Dmitry, In the case of a single-threaded scheduler, it is worth synchronizing fibers on each switch. Currently I do it in client code using __tsan_release()/__tsan_acquire(), but it is possible to add a flag for __tsan_switch_to_fiber() that will do it. In the case of a multithreaded scheduler, the client probably has its own synchronization primitives (for example, a custom mutex), and it is the client's responsibility to call the corresponding TSan functions.
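For illustration, the manual synchronization described above might look roughly like the sketch below. This is a hedged sketch of the client-side pattern, not code from the patch: it assumes the __tsan_acquire/__tsan_release annotation entry points exported by the runtime and uses a scheduler-owned address purely as a synchronization token.

#include <ucontext.h>

extern "C" {
void __tsan_acquire(void *addr);    // assumed runtime annotation entry points
void __tsan_release(void *addr);
void __tsan_switch_to_fiber(void *fiber, unsigned flags);
}

static int scheduler_sync;          // any stable address owned by the scheduler

void switch_to(void *next_fiber, ucontext_t *from, ucontext_t *to) {
  __tsan_release(&scheduler_sync);          // publish this fiber's writes to whoever runs next
  __tsan_switch_to_fiber(next_fiber, 0);
  swapcontext(from, to);
  __tsan_acquire(&scheduler_sync);          // resumed later: observe writes made while we were away
}

A freshly created fiber would perform the matching __tsan_acquire at the top of its trampoline before touching shared state.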
I have checked parts of the patch separately and found:
For us, the possibility to use the context of another thread is a required feature. We use it to call into a library which uses fibers from external fiber-unaware code.
I think it's enough to add a check and abort until someone really needs support for it.
It will not meet our requirements. It is convenient to set the fiber name from the fiber itself and even change it sometimes, analogously to pthread_setname_np().
OK
GCD has no nested tasks. GCD tasks can start other tasks, but can't switch or wait. To me the current API looks more generic than what you suggest.
My idea is that when setjmp/longjmp is used for a fiber switch, the client should disable interceptors before the call and enable them after it. This procedure could be simplified by a flag of __tsan_switch_to_fiber().
I prefer not to do it:
Comment Actions FTR, Lingfeng did a prototype change to qemu that uses these annotations: Comment Actions Thanks for the mention, Dmitry. yuri, that change points to the needed changes to support TSAN with QEMU using the patch set before the latest upload today. We've tried it and it seems to work with the full emulator, but there are a lot of false positive messages, and we think that it's possible to not require annotations in QEMU if interceptors for the ucontext API were added here, and a refactoring was done to make sure jmpbufs did not get changed on all fiber switches. The problem is that QEMU has this way of implementing coroutines: Calling thread, qemu_coroutine_new:
In the new context, in coroutine_trampoline:
Later on, in the QEMU code when switching to that coroutine, in qemu_coroutine_switch:
The current way suggests inserting fiber creation/switching around getcontext and swapcontext. However, since swapcontext is used in conjunction with longjmp, this creates an issue where one also needs to add fiber switch calls to longjmp, where we didn't have to before. That's because the current way of switching fibers identifies fibers with the complete state of a cur_thread, so fiber switch + swapcontext may also unintentionally exchange the jmpbufs as well. At least for this use case, it looks like we need to separate jmpbufs from fiber switches, and generally separate fibers from physical threads, because they differ at least in the set of jmpbufs that must be preserved.
We also found that there were a lot of false positive warnings generated by QEMU due to switching fibers on the same physical thread + updating common state. TSAN should be aware of fiber switches that are restricted to one physical thread and consider the operations implicitly synchronized. This is another reason to separate their state. To summarize: The set of jmpbufs should stay invariant when changing "fibers" via setjmp/longjmp. The set of jmpbufs is not necessarily invariant under swapcontext, because swapcontext can be done across multiple physical threads, while setjmp/longjmp must remain on one thread. However, qemu runs setjmp -> swapcontext -> longjmp all in one thread; the physical thread should be tracked for those calls and it should allow access to the jmpbuf set. For example, if we swapcontext in another physical thread we cannot use the same jmpbufs, but if we swapcontext in one thread it should use the current jmpbufs. This should allow that logic to work for QEMU at least, and it feels like it should work for all possible other uses (?) On the fiber API versus intercepting swapcontext/makecontext: maybe we can have both. Comment Actions Lingfeng, thanks for the extensive feedback.
I am not sure there are no programs that use setjmp/longjmp to switch fibers across threads. Yes, the man page says the behavior is undefined. But strictly speaking, longjmp shouldn't be used for fiber switching either (it's solely for jumping upwards on the same stack). So I feel people are just relying on implementation details here (let's test it, works, ok, let's use it). You said that QEMU uses sigsetjmp/siglongjmp for coroutine switches. How does a coroutine always stay on the same thread then? Say, if they have a thread pool servicing all coroutines, I would expect that coroutines are randomly intermixed across threads. So it can happen that we setjmp on one thread, but then reschedule the coroutine with longjmp on another thread. If that's the case, then it's quite unpleasant b/c it means we need a global list of jmpbufs. And that won't work with the current jmpbuf garbage collection logic and sp level checks. So we will probably need to understand that this particular jmpbuf is a fiber and then handle it specially. Comment Actions I think that synchronization will be exactly in the wrong place. If the fiber switch synchronizes the parent thread and the fiber, then we will get the desired transitive synchronization. Looking at the race reports produced by the qemu prototype, it seems there are actually all kinds of these false reports. There are false races on TLS storage. There are races on the fiber start function. There are numerous races on arguments passed to fibers. You said that for your case you need the synchronization, and it seems that it's also needed for all other cases too. I think we need to do it. People will get annotations wrong, or won't think about acquire/release at all, and then come to the mailing list with questions about all these races. If it proves to be needed, we can always add an opt-out of synchronization with a new flag for the annotations. Comment Actions Yes, cur_thread() is the problem. It's used in all the hottest functions.
You mean Processor?
Not if we always synchronize on fiber switch.
Go is different. Processor is not user visible, ThreadState is. Comment Actions
Tell me more.
Let's add a check and a test then. These things tend to lead to very obscure failure modes that take days to debug on large user programs.
I see. Let's leave it as a separate function. I guess it's more flexible and will cover all possible use cases.
Let's leave tasks aside for now. It's quite complex without tasks already.
What do you mean by "disable interceptors"? Do we have such a thing?
How? setjmp/longjmp happen outside of __tsan_switch_to_fiber. Comment Actions Thinking of how we can eliminate the indirection in cur_thread... We could avoid even allocating a tid for a fiber (tid space is expensive) if we record fiber switch events in the trace. Then all fibers running on a thread could reuse its tid, and we could restore fiber identity for memory accesses from the trace. But it gets pretty complicated at this point. I think we should split ThreadState into a Thread part (which can still be called ThreadState) and a Fiber part. ThreadState should contain at least fast_state, fast_synch_epoch, clock and maybe some of the int's and bool's (as they are easy to copy) and pointers to the current Fiber and to the physical thread's Fiber. The Fiber part will contain everything else. A fiber switch will switch the Fiber pointer in ThreadState and copy out/copy in/update fast_state, fast_synch_epoch and clock. This will eliminate the indirection on hot paths and will also open a path to nested tasks (as a Fiber can contain a pointer to a nested Fiber).
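A rough sketch, purely for illustration, of the split described above (field and type names follow the comment; FastState and ThreadClock are existing tsan types; this is not the actual patch):

struct Fiber;                      // "everything else": shadow stack, trace position, sync state, ...

struct ThreadState {
  FastState fast_state;            // copied out/in on every fiber switch
  u64 fast_synch_epoch;            // copied out/in on every fiber switch
  ThreadClock clock;               // copied out/in on every fiber switch
  // ... a few cheap-to-copy ints and bools ...
  Fiber *cur_fiber;                // fiber currently running on this physical thread
  Fiber *thread_fiber;             // the physical thread's own fiber
};

// A fiber switch saves fast_state/fast_synch_epoch/clock into *cur_fiber,
// repoints cur_fiber at the new fiber, and restores the same fields from it.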
Comment Actions Thanks for pointing that out. I have to check more closely, but I haven't observed migration of fibers across physical threads at all in qemu. The tendency is to both create and switch the coroutine on a dedicated thread. Swapcontext is done to a trampoline function, to which one can return again later on. Kind of like a fiber pool, where reuse of a fiber means longjmping to the trampoline instead of another swapcontext. I don't think there is a thread pool that mixes around the fibers. I guess this means we would have to annotate in the more general use case, but for QEMU, if we make the assumption that setjmp/longjmp follow the spec, while only allowing for physical thread switches in swapcontext, we should be OK (and physical thread switching in swapcontext is something QEMU does not do). Comment Actions Makes sense. If somebody wants to do longjmp to switch fibers across threads, we will need some additional support in tsan for that. Let's stick with Yuri's and your use cases for now. Comment Actions
Comment Actions I added default synchronization and a flag to opt out. It would be great if someone could check this version with QEMU. Comment Actions The new version should be as fast as the original code.
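For illustration, the opt-out mentioned above might be used roughly like this (the flag constant below is a placeholder, not necessarily the name or value in the final interface):

#include <sanitizer/tsan_interface.h>

// Placeholder constant for the opt-out flag; the real name/value may differ.
enum { kSwitchNoSync = 1u };

void schedule_next(void *next_fiber) {
  // Skip the implicit synchronization done on switch, e.g. when the scheduler
  // already synchronizes fibers with its own primitives.
  __tsan_switch_to_fiber(next_fiber, kSwitchNoSync);
  // ... then perform the real context switch (swapcontext, longjmp, custom asm) ...
}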
IMHO it is hardly possible: function entry/exit use the shadow stack, memory access functions use the clock. Both structures are too big to be copied on each fiber switch.
Can't agree with this. Only the fiber is visible to the user. Of course, it is the same as the thread until the user switches it. Comment Actions In a standalone thread we create a temporary context and switch into it. Then the original thread context is posted into the event loop thread, where it runs for some time together with other fibers. At some point the code decides that it should migrate to the original thread. The context is removed from the list of fibers and the original thread (running the temporary context) is signalled. The thread switches into its own context and destroys the temporary context. This trick significantly simplifies the implementation of bindings for a library that uses fibers internally and improves debuggability because stack traces are preserved after the switch.
Currently there is no interceptor for pthread_exit(). Do you suggest adding it?
Isn't it enough to place __tsan_ignore_thread_begin()/__tsan_ignore_thread_end() around setjmp/longjmp calls?
Comment Actions
Whatever we call them, Processor is for different things. It must not hold anything related to user state.
Oh, I see. I guess it can have some rough edges if used incorrectly, but also probably can work if used carefully.
Yes.
These only ignore memory accesses, they don't affect interceptor operation as far as I see. Comment Actions I don't completely follow the logic behind cur_thread/cur_thread_fast/cur_thread1 and how it does not introduce a slowdown. Comment Actions
Can you switch to it in your codebase? I would expect that you need all (or almost all) of this synchronization anyway?
Please add such a test (or we will break it in the future).
This is a good question. Comment Actions FTR, here is current code: 00000000004b2c50 <__tsan_read2>: 4b2c50: 48 b8 f8 ff ff ff ff movabs $0xffff87fffffffff8,%rax 4b2c57: 87 ff ff 4b2c5a: 48 ba 00 00 00 00 00 movabs $0x40000000000,%rdx 4b2c61: 04 00 00 4b2c64: 53 push %rbx 4b2c65: 48 21 f8 and %rdi,%rax 4b2c68: 48 8b 74 24 08 mov 0x8(%rsp),%rsi 4b2c6d: 48 31 d0 xor %rdx,%rax 4b2c70: 48 83 3c 85 00 00 00 cmpq $0xffffffffffffffff,0x0(,%rax,4) 4b2c77: 00 ff 4b2c79: 0f 84 9d 00 00 00 je 4b2d1c <__tsan_read2+0xcc> 4b2c7f: 49 c7 c0 c0 04 fc ff mov $0xfffffffffffc04c0,%r8 4b2c86: 64 49 8b 08 mov %fs:(%r8),%rcx 4b2c8a: 48 85 c9 test %rcx,%rcx 4b2c8d: 0f 88 89 00 00 00 js 4b2d1c <__tsan_read2+0xcc> ... 4b2cf6: 64 f3 41 0f 7e 50 08 movq %fs:0x8(%r8),%xmm2 ... 4b2d36: 64 49 89 10 mov %rdx,%fs:(%r8) Here is new code: 00000000004b8460 <__tsan_read2>: 4b8460: 48 b8 f8 ff ff ff ff movabs $0xffff87fffffffff8,%rax 4b8467: 87 ff ff 4b846a: 48 ba 00 00 00 00 00 movabs $0x40000000000,%rdx 4b8471: 04 00 00 4b8474: 53 push %rbx 4b8475: 48 21 f8 and %rdi,%rax 4b8478: 48 8b 74 24 08 mov 0x8(%rsp),%rsi 4b847d: 48 31 d0 xor %rdx,%rax 4b8480: 48 83 3c 85 00 00 00 cmpq $0xffffffffffffffff,0x0(,%rax,4) 4b8487: 00 ff 4b8489: 0f 84 9f 00 00 00 je 4b852e <__tsan_read2+0xce> 4b848f: 48 c7 c2 f8 ff ff ff mov $0xfffffffffffffff8,%rdx 4b8496: 64 4c 8b 02 mov %fs:(%rdx),%r8 4b849a: 49 8b 08 mov (%r8),%rcx 4b849d: 48 85 c9 test %rcx,%rcx 4b84a0: 0f 88 88 00 00 00 js 4b852e <__tsan_read2+0xce> ... 4b8509: f3 41 0f 7e 50 08 movq 0x8(%r8),%xmm2 ... 4b8546: 49 89 10 mov %rdx,(%r8) The additional indirection is "mov (%r8),%rcx". Comment Actions https://android-review.googlesource.com/c/platform/external/qemu/+/844675 Latest version of patch doesn't work with QEMU anymore, at least with those annotations. Error log: qemu_coroutine_new:181:0x7f02938cf700:0x7f029388f7c0 Start new coroutine #0 __tsan::TsanCheckFailed(char const*, int, char const*, unsigned long long, unsigned long long) /usr/local/google/home/lfy/aosp-llvm-toolchain/toolchain/llvm/projects/compiler-rt/lib/tsan/rtl/tsan_rtl_report.cc:48:25 (qemu-system-x86_64+0x536168) #1 __sanitizer::CheckFailed(char const*, int, char const*, unsigned long long, unsigned long long) /usr/local/google/home/lfy/aosp-llvm-toolchain/toolchain/llvm/projects/compiler-rt/lib/sanitizer_common/sanitizer_termination.cc:79:5 (qemu-system-x86_64+0x4bcf4f) #2 LongJmp(__tsan::ThreadState*, unsigned long*) /usr/local/google/home/lfy/aosp-llvm-toolchain/toolchain/llvm/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:531:7 (qemu-system-x86_64+0x4d2311) #3 siglongjmp /usr/local/google/home/lfy/aosp-llvm-toolchain/toolchain/llvm/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:647:3 (qemu-system-x86_64+0x4d23ce) #4 coroutine_trampoline /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../util/coroutine-ucontext.c:165:9 (qemu-system-x86_64+0xb72000) #5 <null> <null> (libc.so.6+0x43fcf) Do you know what I'm doing wrong here? Comment Actions
Comment Actions That worked, thanks! Sorry, I had updated the patch with a refactoring the meantime that probably actually caused the break. Comment Actions QEMU status: Fewer false positives now; there are many warnings that seem more real now, mostly about not atomically reading variables that got atomically updated. There might be other issues as well. Comment Actions WARNING: ThreadSanitizer: data race (pid=3742)
Atomic write of size 1 at 0x7b0c00051620 by thread T14 (mutexes: write M1097):
#0 __tsan_atomic8_exchange /usr/local/google/home/lfy/aosp-llvm-toolchain/toolchain/llvm/projects/compiler-rt/lib/tsan/rtl/tsan_interface_atomic.cc:579:3 (qemu-system-x86_64+0x51d0a8)
#1 qemu_bh_schedule /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../util/async.c:167:9 (qemu-system-x86_64+0xb4675e)
#2 worker_thread /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../util/thread-pool.c:114:9 (qemu-system-x86_64+0xb4838c)
#3 qemu_thread_trampoline /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../util/qemu-thread-posix.c:551:17 (qemu-system-x86_64+0xb4fe36)
Previous read of size 1 at 0x7b0c00051620 by thread T10 (mutexes: write M985): #0 aio_compute_timeout /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../util/async.c:198:17 (qemu-system-x86_64+0xb46868) #1 aio_poll /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../util/aio-posix.c:617:26 (qemu-system-x86_64+0xb4c38c) #2 blk_prw /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../block/block-backend.c:1247:9 (qemu-system-x86_64+0x98bbbd) #3 blk_pread /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../block/block-backend.c:1409:15 (qemu-system-x86_64+0x98b91a) #4 find_image_format /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../block.c:701:11 (qemu-system-x86_64+0x958068) #5 bdrv_open_inherit /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../block.c:2690 (qemu-system-x86_64+0x958068) #6 bdrv_open /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../block.c:2802:12 (qemu-system-x86_64+0x958d1f) #7 blk_new_open /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../block/block-backend.c:375:10 (qemu-system-x86_64+0x98927a) #8 blockdev_init /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../blockdev.c:598:15 (qemu-system-x86_64+0x9c5700) #9 drive_new /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../blockdev.c:1092 (qemu-system-x86_64+0x9c5700) #10 drive_init(void*, QemuOpts*, Error**) /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../android-qemu2-glue/drive-share.cpp:366:28 (qemu-system-x86_64+0xb1e9a2) #11 qemu_opts_foreach /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../util/qemu-option.c:1106:14 (qemu-system-x86_64+0xb696e5) #12 android_drive_share_init /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../android-qemu2-glue/drive-share.cpp:530:9 (qemu-system-x86_64+0xb1e1e0) #13 main_impl /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../vl.c:5244:13 (qemu-system-x86_64+0x54f811) #14 run_qemu_main /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../vl.c:3349:21 (qemu-system-x86_64+0x547a02) #15 enter_qemu_main_loop(int, char**) /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../android-qemu2-glue/main.cpp:606:5 (qemu-system-x86_64+0x54427c) #16 MainLoopThread::run() /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../android/android-emu/android/skin/qt/emulator-qt-window.h:73:13 (qemu-system-x86_64+0xc40b41) #17 QThreadPrivate::start(void*) /usr/local/google/home/joshuaduong/qt-build/src/qt-everywhere-src-5.11.1/qtbase/src/corelib/thread/qthread_unix.cpp:367:14 (libQt5Core.so.5+0xa7d35) Location is heap block of size 40 at 0x7b0c00051600 allocated by thread T13: #0 malloc /usr/local/google/home/lfy/aosp-llvm-toolchain/toolchain/llvm/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:667:5 (qemu-system-x86_64+0x4d246c) #1 g_malloc /tmp/jansene-build-temp-193494/src/glib-2.38.2/glib/gmem.c:104 (qemu-system-x86_64+0xecfdc0) #2 thread_pool_init_one /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../util/thread-pool.c:307:27 (qemu-system-x86_64+0xb47a63) #3 thread_pool_new /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../util/thread-pool.c:327 (qemu-system-x86_64+0xb47a63) #4 aio_get_thread_pool /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../util/async.c:320:28 (qemu-system-x86_64+0xb46934) #5 paio_submit_co /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../block/file-posix.c:1565:12 (qemu-system-x86_64+0xa15d23) #6 raw_co_prw 
/usr/local/google/home/lfy/emu2/master/external/qemu/objs/../block/file-posix.c:1620 (qemu-system-x86_64+0xa15d23) #7 raw_co_preadv /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../block/file-posix.c:1627 (qemu-system-x86_64+0xa15d23) #8 bdrv_driver_preadv /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../block/io.c:924:16 (qemu-system-x86_64+0x8d91f7) #9 bdrv_aligned_preadv /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../block/io.c:1228 (qemu-system-x86_64+0x8d91f7) #10 bdrv_co_preadv /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../block/io.c:1324:11 (qemu-system-x86_64+0x8d8dd7) #11 blk_co_preadv /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../block/block-backend.c:1158:11 (qemu-system-x86_64+0x98b0da) #12 blk_read_entry /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../block/block-backend.c:1206:17 (qemu-system-x86_64+0x98c1e3) #13 coroutine_trampoline /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../util/coroutine-ucontext.c:173:9 (qemu-system-x86_64+0xb71c8a) #14 <null> <null> (libc.so.6+0x43fcf) Mutex M1097 (0x7b3800009bd0) created at: #0 pthread_mutex_init /usr/local/google/home/lfy/aosp-llvm-toolchain/toolchain/llvm/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:1187:3 (qemu-system-x86_64+0x4d4ebc) #1 qemu_mutex_init /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../util/qemu-thread-posix.c:61:11 (qemu-system-x86_64+0xb4f2a7) #2 thread_pool_init_one /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../util/thread-pool.c:308:5 (qemu-system-x86_64+0xb47a7b) #3 thread_pool_new /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../util/thread-pool.c:327 (qemu-system-x86_64+0xb47a7b) #4 aio_get_thread_pool /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../util/async.c:320:28 (qemu-system-x86_64+0xb46934) #5 paio_submit_co /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../block/file-posix.c:1565:12 (qemu-system-x86_64+0xa15d23) #6 raw_co_prw /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../block/file-posix.c:1620 (qemu-system-x86_64+0xa15d23) #7 raw_co_preadv /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../block/file-posix.c:1627 (qemu-system-x86_64+0xa15d23) #8 bdrv_driver_preadv /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../block/io.c:924:16 (qemu-system-x86_64+0x8d91f7) #9 bdrv_aligned_preadv /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../block/io.c:1228 (qemu-system-x86_64+0x8d91f7) #10 bdrv_co_preadv /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../block/io.c:1324:11 (qemu-system-x86_64+0x8d8dd7) #11 blk_co_preadv /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../block/block-backend.c:1158:11 (qemu-system-x86_64+0x98b0da) #12 blk_read_entry /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../block/block-backend.c:1206:17 (qemu-system-x86_64+0x98c1e3) #13 coroutine_trampoline /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../util/coroutine-ucontext.c:173:9 (qemu-system-x86_64+0xb71c8a) #14 <null> <null> (libc.so.6+0x43fcf) Mutex M985 (0x000003bb0fb8) created at: #0 pthread_mutex_init /usr/local/google/home/lfy/aosp-llvm-toolchain/toolchain/llvm/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:1187:3 (qemu-system-x86_64+0x4d4ebc) #1 qemu_mutex_init /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../util/qemu-thread-posix.c:61:11 (qemu-system-x86_64+0xb4f2a7) #2 qemu_init_cpu_loop 
/usr/local/google/home/lfy/emu2/master/external/qemu/objs/../cpus.c:1123:5 (qemu-system-x86_64+0x5ddb0c) #3 main_impl /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../vl.c:3437:5 (qemu-system-x86_64+0x547ae0) #4 run_qemu_main /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../vl.c:3349:21 (qemu-system-x86_64+0x547a02) #5 enter_qemu_main_loop(int, char**) /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../android-qemu2-glue/main.cpp:606:5 (qemu-system-x86_64+0x54427c) #6 MainLoopThread::run() /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../android/android-emu/android/skin/qt/emulator-qt-window.h:73:13 (qemu-system-x86_64+0xc40b41) #7 QThreadPrivate::start(void*) /usr/local/google/home/joshuaduong/qt-build/src/qt-everywhere-src-5.11.1/qtbase/src/corelib/thread/qthread_unix.cpp:367:14 (libQt5Core.so.5+0xa7d35) Thread T14 (tid=3886, running) created by thread T10 at: #0 pthread_create /usr/local/google/home/lfy/aosp-llvm-toolchain/toolchain/llvm/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:968:3 (qemu-system-x86_64+0x4d3cda) #1 qemu_thread_create /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../util/qemu-thread-posix.c:591:11 (qemu-system-x86_64+0xb4fcef) #2 do_spawn_thread /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../util/thread-pool.c:135:5 (qemu-system-x86_64+0xb48072) #3 spawn_thread_bh_fn /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../util/thread-pool.c:143 (qemu-system-x86_64+0xb48072) #4 aio_bh_call /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../util/async.c:90:5 (qemu-system-x86_64+0xb46616) #5 aio_bh_poll /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../util/async.c:118 (qemu-system-x86_64+0xb46616) #6 aio_poll /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../util/aio-posix.c:706:17 (qemu-system-x86_64+0xb4cf45) #7 blk_prw /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../block/block-backend.c:1247:9 (qemu-system-x86_64+0x98bbbd) #8 blk_pread /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../block/block-backend.c:1409:15 (qemu-system-x86_64+0x98b91a) #9 find_image_format /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../block.c:701:11 (qemu-system-x86_64+0x958068) #10 bdrv_open_inherit /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../block.c:2690 (qemu-system-x86_64+0x958068) #11 bdrv_open /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../block.c:2802:12 (qemu-system-x86_64+0x958d1f) #12 blk_new_open /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../block/block-backend.c:375:10 (qemu-system-x86_64+0x98927a) #13 blockdev_init /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../blockdev.c:598:15 (qemu-system-x86_64+0x9c5700) #14 drive_new /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../blockdev.c:1092 (qemu-system-x86_64+0x9c5700) #15 drive_init(void*, QemuOpts*, Error**) /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../android-qemu2-glue/drive-share.cpp:366:28 (qemu-system-x86_64+0xb1e9a2) #16 qemu_opts_foreach /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../util/qemu-option.c:1106:14 (qemu-system-x86_64+0xb696e5) #17 android_drive_share_init /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../android-qemu2-glue/drive-share.cpp:530:9 (qemu-system-x86_64+0xb1e1e0) #18 main_impl /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../vl.c:5244:13 (qemu-system-x86_64+0x54f811) #19 run_qemu_main 
/usr/local/google/home/lfy/emu2/master/external/qemu/objs/../vl.c:3349:21 (qemu-system-x86_64+0x547a02) #20 enter_qemu_main_loop(int, char**) /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../android-qemu2-glue/main.cpp:606:5 (qemu-system-x86_64+0x54427c) #21 MainLoopThread::run() /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../android/android-emu/android/skin/qt/emulator-qt-window.h:73:13 (qemu-system-x86_64+0xc40b41) #22 QThreadPrivate::start(void*) /usr/local/google/home/joshuaduong/qt-build/src/qt-everywhere-src-5.11.1/qtbase/src/corelib/thread/qthread_unix.cpp:367:14 (libQt5Core.so.5+0xa7d35) Thread T10 'MainLoopThread' (tid=3881, running) created by main thread at: #0 pthread_create /usr/local/google/home/lfy/aosp-llvm-toolchain/toolchain/llvm/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:968:3 (qemu-system-x86_64+0x4d3cda) #1 QThread::start(QThread::Priority) /usr/local/google/home/joshuaduong/qt-build/src/qt-everywhere-src-5.11.1/qtbase/src/corelib/thread/qthread_unix.cpp:726:16 (libQt5Core.so.5+0xa84fb) #2 skin_winsys_spawn_thread /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../android/android-emu/android/skin/qt/winsys-qt.cpp:519:17 (qemu-system-x86_64+0xbcf68b) #3 main /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../android-qemu2-glue/main.cpp:1624:5 (qemu-system-x86_64+0x543f9c) Thread T13 (tid=0, running) created by thread T10 at: #0 on_new_fiber /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../util/coroutine-ucontext.c:90:25 (qemu-system-x86_64+0xb71b48) #1 qemu_coroutine_new /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../util/coroutine-ucontext.c:217 (qemu-system-x86_64+0xb71b48) #2 qemu_coroutine_create /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../util/qemu-coroutine.c:88:14 (qemu-system-x86_64+0xb70349) #3 blk_prw /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../block/block-backend.c:1245:25 (qemu-system-x86_64+0x98baca) #4 blk_pread /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../block/block-backend.c:1409:15 (qemu-system-x86_64+0x98b91a) #5 find_image_format /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../block.c:701:11 (qemu-system-x86_64+0x958068) #6 bdrv_open_inherit /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../block.c:2690 (qemu-system-x86_64+0x958068) #7 bdrv_open /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../block.c:2802:12 (qemu-system-x86_64+0x958d1f) #8 blk_new_open /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../block/block-backend.c:375:10 (qemu-system-x86_64+0x98927a) #9 blockdev_init /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../blockdev.c:598:15 (qemu-system-x86_64+0x9c5700) #10 drive_new /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../blockdev.c:1092 (qemu-system-x86_64+0x9c5700) #11 drive_init(void*, QemuOpts*, Error**) /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../android-qemu2-glue/drive-share.cpp:366:28 (qemu-system-x86_64+0xb1e9a2) #12 qemu_opts_foreach /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../util/qemu-option.c:1106:14 (qemu-system-x86_64+0xb696e5) #13 android_drive_share_init /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../android-qemu2-glue/drive-share.cpp:530:9 (qemu-system-x86_64+0xb1e1e0) #14 main_impl /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../vl.c:5244:13 (qemu-system-x86_64+0x54f811) #15 run_qemu_main 
/usr/local/google/home/lfy/emu2/master/external/qemu/objs/../vl.c:3349:21 (qemu-system-x86_64+0x547a02) #16 enter_qemu_main_loop(int, char**) /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../android-qemu2-glue/main.cpp:606:5 (qemu-system-x86_64+0x54427c) #17 MainLoopThread::run() /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../android/android-emu/android/skin/qt/emulator-qt-window.h:73:13 (qemu-system-x86_64+0xc40b41) #18 QThreadPrivate::start(void*) /usr/local/google/home/joshuaduong/qt-build/src/qt-everywhere-src-5.11.1/qtbase/src/corelib/thread/qthread_unix.cpp:367:14 (libQt5Core.so.5+0xa7d35) SUMMARY: ThreadSanitizer: data race /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../util/async.c:167:9 in qemu_bh_scheduleWARNING: ThreadSanitizer: data race (pid=3742)
Read of size 4 at 0x7b4400025758 by thread T14 (mutexes: write M1097):
#0 aio_notify /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../util/async.c:342:14 (qemu-system-x86_64+0xb46778)
#1 qemu_bh_schedule /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../util/async.c:168 (qemu-system-x86_64+0xb46778)
#2 worker_thread /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../util/thread-pool.c:114:9 (qemu-system-x86_64+0xb4838c)
#3 qemu_thread_trampoline /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../util/qemu-thread-posix.c:551:17 (qemu-system-x86_64+0xb4fe36)
Previous atomic write of size 4 at 0x7b4400025758 by thread T10 (mutexes: write M985): #0 __tsan_atomic32_fetch_add /usr/local/google/home/lfy/aosp-llvm-toolchain/toolchain/llvm/projects/compiler-rt/lib/tsan/rtl/tsan_interface_atomic.cc:616:3 (qemu-system-x86_64+0x51df18) #1 aio_poll /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../util/aio-posix.c:608:9 (qemu-system-x86_64+0xb4c342) #2 blk_prw /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../block/block-backend.c:1247:9 (qemu-system-x86_64+0x98bbbd) #3 blk_pread /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../block/block-backend.c:1409:15 (qemu-system-x86_64+0x98b91a) #4 find_image_format /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../block.c:701:11 (qemu-system-x86_64+0x958068) #5 bdrv_open_inherit /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../block.c:2690 (qemu-system-x86_64+0x958068) #6 bdrv_open /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../block.c:2802:12 (qemu-system-x86_64+0x958d1f) #7 blk_new_open /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../block/block-backend.c:375:10 (qemu-system-x86_64+0x98927a) #8 blockdev_init /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../blockdev.c:598:15 (qemu-system-x86_64+0x9c5700) #9 drive_new /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../blockdev.c:1092 (qemu-system-x86_64+0x9c5700) #10 drive_init(void*, QemuOpts*, Error**) /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../android-qemu2-glue/drive-share.cpp:366:28 (qemu-system-x86_64+0xb1e9a2) #11 qemu_opts_foreach /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../util/qemu-option.c:1106:14 (qemu-system-x86_64+0xb696e5) #12 android_drive_share_init /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../android-qemu2-glue/drive-share.cpp:530:9 (qemu-system-x86_64+0xb1e1e0) #13 main_impl /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../vl.c:5244:13 (qemu-system-x86_64+0x54f811) #14 run_qemu_main /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../vl.c:3349:21 (qemu-system-x86_64+0x547a02) #15 enter_qemu_main_loop(int, char**) /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../android-qemu2-glue/main.cpp:606:5 (qemu-system-x86_64+0x54427c) #16 MainLoopThread::run() /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../android/android-emu/android/skin/qt/emulator-qt-window.h:73:13 (qemu-system-x86_64+0xc40b41) #17 QThreadPrivate::start(void*) /usr/local/google/home/joshuaduong/qt-build/src/qt-everywhere-src-5.11.1/qtbase/src/corelib/thread/qthread_unix.cpp:367:14 (libQt5Core.so.5+0xa7d35) Location is heap block of size 296 at 0x7b44000256c0 allocated by thread T10: #0 calloc /usr/local/google/home/lfy/aosp-llvm-toolchain/toolchain/llvm/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:684:5 (qemu-system-x86_64+0x4d26ef) #1 g_malloc0 /tmp/jansene-build-temp-193494/src/glib-2.38.2/glib/gmem.c:134 (qemu-system-x86_64+0xecfe18) #2 qemu_init_main_loop /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../util/main-loop.c:165:24 (qemu-system-x86_64+0xb4abac) #3 main_impl /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../vl.c:4605:9 (qemu-system-x86_64+0x54b937) #4 run_qemu_main /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../vl.c:3349:21 (qemu-system-x86_64+0x547a02) #5 enter_qemu_main_loop(int, char**) /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../android-qemu2-glue/main.cpp:606:5 (qemu-system-x86_64+0x54427c) #6 
MainLoopThread::run() /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../android/android-emu/android/skin/qt/emulator-qt-window.h:73:13 (qemu-system-x86_64+0xc40b41) #7 QThreadPrivate::start(void*) /usr/local/google/home/joshuaduong/qt-build/src/qt-everywhere-src-5.11.1/qtbase/src/corelib/thread/qthread_unix.cpp:367:14 (libQt5Core.so.5+0xa7d35) Mutex M1097 (0x7b3800009bd0) created at: #0 pthread_mutex_init /usr/local/google/home/lfy/aosp-llvm-toolchain/toolchain/llvm/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:1187:3 (qemu-system-x86_64+0x4d4ebc) #1 qemu_mutex_init /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../util/qemu-thread-posix.c:61:11 (qemu-system-x86_64+0xb4f2a7) #2 thread_pool_init_one /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../util/thread-pool.c:308:5 (qemu-system-x86_64+0xb47a7b) #3 thread_pool_new /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../util/thread-pool.c:327 (qemu-system-x86_64+0xb47a7b) #4 aio_get_thread_pool /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../util/async.c:320:28 (qemu-system-x86_64+0xb46934) #5 paio_submit_co /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../block/file-posix.c:1565:12 (qemu-system-x86_64+0xa15d23) #6 raw_co_prw /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../block/file-posix.c:1620 (qemu-system-x86_64+0xa15d23) #7 raw_co_preadv /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../block/file-posix.c:1627 (qemu-system-x86_64+0xa15d23) #8 bdrv_driver_preadv /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../block/io.c:924:16 (qemu-system-x86_64+0x8d91f7) #9 bdrv_aligned_preadv /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../block/io.c:1228 (qemu-system-x86_64+0x8d91f7) #10 bdrv_co_preadv /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../block/io.c:1324:11 (qemu-system-x86_64+0x8d8dd7) #11 blk_co_preadv /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../block/block-backend.c:1158:11 (qemu-system-x86_64+0x98b0da) #12 blk_read_entry /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../block/block-backend.c:1206:17 (qemu-system-x86_64+0x98c1e3) #13 coroutine_trampoline /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../util/coroutine-ucontext.c:173:9 (qemu-system-x86_64+0xb71c8a) #14 <null> <null> (libc.so.6+0x43fcf) Mutex M985 (0x000003bb0fb8) created at: #0 pthread_mutex_init /usr/local/google/home/lfy/aosp-llvm-toolchain/toolchain/llvm/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:1187:3 (qemu-system-x86_64+0x4d4ebc) #1 qemu_mutex_init /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../util/qemu-thread-posix.c:61:11 (qemu-system-x86_64+0xb4f2a7) #2 qemu_init_cpu_loop /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../cpus.c:1123:5 (qemu-system-x86_64+0x5ddb0c) #3 main_impl /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../vl.c:3437:5 (qemu-system-x86_64+0x547ae0) #4 run_qemu_main /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../vl.c:3349:21 (qemu-system-x86_64+0x547a02) #5 enter_qemu_main_loop(int, char**) /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../android-qemu2-glue/main.cpp:606:5 (qemu-system-x86_64+0x54427c) #6 MainLoopThread::run() /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../android/android-emu/android/skin/qt/emulator-qt-window.h:73:13 (qemu-system-x86_64+0xc40b41) #7 QThreadPrivate::start(void*) 
/usr/local/google/home/joshuaduong/qt-build/src/qt-everywhere-src-5.11.1/qtbase/src/corelib/thread/qthread_unix.cpp:367:14 (libQt5Core.so.5+0xa7d35) Thread T14 (tid=3886, running) created by thread T10 at: #0 pthread_create /usr/local/google/home/lfy/aosp-llvm-toolchain/toolchain/llvm/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:968:3 (qemu-system-x86_64+0x4d3cda) #1 qemu_thread_create /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../util/qemu-thread-posix.c:591:11 (qemu-system-x86_64+0xb4fcef) #2 do_spawn_thread /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../util/thread-pool.c:135:5 (qemu-system-x86_64+0xb48072) #3 spawn_thread_bh_fn /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../util/thread-pool.c:143 (qemu-system-x86_64+0xb48072) #4 aio_bh_call /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../util/async.c:90:5 (qemu-system-x86_64+0xb46616) #5 aio_bh_poll /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../util/async.c:118 (qemu-system-x86_64+0xb46616) #6 aio_poll /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../util/aio-posix.c:706:17 (qemu-system-x86_64+0xb4cf45) #7 blk_prw /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../block/block-backend.c:1247:9 (qemu-system-x86_64+0x98bbbd) #8 blk_pread /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../block/block-backend.c:1409:15 (qemu-system-x86_64+0x98b91a) #9 find_image_format /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../block.c:701:11 (qemu-system-x86_64+0x958068) #10 bdrv_open_inherit /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../block.c:2690 (qemu-system-x86_64+0x958068) #11 bdrv_open /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../block.c:2802:12 (qemu-system-x86_64+0x958d1f) #12 blk_new_open /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../block/block-backend.c:375:10 (qemu-system-x86_64+0x98927a) #13 blockdev_init /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../blockdev.c:598:15 (qemu-system-x86_64+0x9c5700) #14 drive_new /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../blockdev.c:1092 (qemu-system-x86_64+0x9c5700) #15 drive_init(void*, QemuOpts*, Error**) /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../android-qemu2-glue/drive-share.cpp:366:28 (qemu-system-x86_64+0xb1e9a2) #16 qemu_opts_foreach /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../util/qemu-option.c:1106:14 (qemu-system-x86_64+0xb696e5) #17 android_drive_share_init /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../android-qemu2-glue/drive-share.cpp:530:9 (qemu-system-x86_64+0xb1e1e0) #18 main_impl /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../vl.c:5244:13 (qemu-system-x86_64+0x54f811) #19 run_qemu_main /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../vl.c:3349:21 (qemu-system-x86_64+0x547a02) #20 enter_qemu_main_loop(int, char**) /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../android-qemu2-glue/main.cpp:606:5 (qemu-system-x86_64+0x54427c) #21 MainLoopThread::run() /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../android/android-emu/android/skin/qt/emulator-qt-window.h:73:13 (qemu-system-x86_64+0xc40b41) #22 QThreadPrivate::start(void*) /usr/local/google/home/joshuaduong/qt-build/src/qt-everywhere-src-5.11.1/qtbase/src/corelib/thread/qthread_unix.cpp:367:14 (libQt5Core.so.5+0xa7d35) Thread T10 'MainLoopThread' (tid=3881, running) created by main thread at: #0 pthread_create 
/usr/local/google/home/lfy/aosp-llvm-toolchain/toolchain/llvm/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:968:3 (qemu-system-x86_64+0x4d3cda) #1 QThread::start(QThread::Priority) /usr/local/google/home/joshuaduong/qt-build/src/qt-everywhere-src-5.11.1/qtbase/src/corelib/thread/qthread_unix.cpp:726:16 (libQt5Core.so.5+0xa84fb) #2 skin_winsys_spawn_thread /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../android/android-emu/android/skin/qt/winsys-qt.cpp:519:17 (qemu-system-x86_64+0xbcf68b) #3 main /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../android-qemu2-glue/main.cpp:1624:5 (qemu-system-x86_64+0x543f9c) SUMMARY: ThreadSanitizer: data race /usr/local/google/home/lfy/emu2/master/external/qemu/objs/../util/async.c:342:14 in aio_notifyComment Actions
Comment Actions I assume that after a program or a new thread starts, the sanitizer will first see a few (or maybe zero) interceptors, then __tsan_func_entry(), and then everything else. Following this logic, I use cur_thread_fast() in all performance-critical entry points excluding __tsan_func_entry(). I ran the tsan benchmarks and it looks like the memory indirection by itself does not affect performance because the variable is in the CPU cache in most cases. At the same time, the conditional check has a visible effect on performance.
Comment Actions Unfortunately, I can't check it. The current implementation in our codebase works fine with synchronization on fiber switch. I plan to implement another mode where fibers are not synchronized by default, but only when they call special synchronization APIs (events, mutexes, etc.). That way I can catch more errors in code where fibers run in the same thread only by chance. In order to implement it, I need some way to opt out of synchronization in tsan. Comment Actions Sorry, I meant to switch to the built-in synchronization to validate that it works in real projects and either removes the need for manual synchronization annotations or at least significantly reduces it.
Comment Actions Re performance, we have this open performance regression in clang codegen which makes it harder to do analysis: Comment Actions I've benchmarked on 350140 with host gcc version 7.3.0 (Debian 7.3.0-5), running the old/new binary alternated:

int main() {
  const int kSize = 2<<10;
  const int kRepeat = 1<<19;
  volatile long data[kSize];
  for (int i = 0; i < kRepeat; i++) {
    for (int j = 0; j < kSize; j++)
      data[j] = 1;
    __atomic_load_n(&data[0], __ATOMIC_ACQUIRE);
    __atomic_store_n(&data[0], 1, __ATOMIC_RELEASE);
  }
}

compiler-rt$ TIME="%e" nice -20 time taskset -c 0 ./current.test
compiler-rt$ TIME="%e" nice -20 time taskset -c 0 ./fiber.test

This looks like a 14% degradation. Comment Actions I improved the test execution time. On my system I got the following execution times (compared to the original version of the code):
Comment Actions "current" is compiler-rt HEAD with no local changes, or with the previous version of this change? This makes the clang-compiled runtime 12% faster? Comment Actions What exactly has changed wrt performance? I see some small tweaks, but I am not sure if they are the main reason behind the speedup or I am missing something important. Comment Actions "current" is compiler-rt HEAD without any changes. In the case of the clang-compiled library, even the previous version of the patch was faster than HEAD. It looks like the additional indirection of ThreadState by itself introduces minimal overhead, but affects code generation in unpredictable ways, especially for clang.
Yes, I can see a slowdown of around 3.5% with the previous version of the patch and a gcc-compiled library.
You may consider this "cheating" because the changes are general and not related to fibers. Most of the speedup is because of a change in tsan_rtl.cc:

- if (!SANITIZER_GO && *shadow_mem == kShadowRodata) {
+ if (!SANITIZER_GO && !kAccessIsWrite && *shadow_mem == kShadowRodata) {

Placing LIKELY/UNLIKELY in the code gives an additional 1-2%. Comment Actions Benchmark results with clang: write8: read8: read4: fibers(new) is the current version of the change. fibers(old) is the one I used with gcc (no spot optimizations). So this change indeed makes it faster for clang, but the fastest clang version is still slower than gcc, so this suggests that there is something to improve in clang codegen. If we improve clang codegen, will this change again lead to slower code or not? It's bad that we have that open clang regression... Comment Actions I have found out which optimization is applied by gcc but not by clang and did it manually. Execution times for the new version:
Comment Actions Hi Dmitry, Comment Actions
Just me finding time to review it. Comment Actions Spent a while benchmarking this. I used 4 variations of a test program: I used 3 variations of the runtime: the current tsan runtime without modifications, this change, and this change with cur_thread/cur_thread_fast returning reinterpret_cast<ThreadState *>(cur_thread_placeholder). In all cases I used a self-hosted clang build on revision 353000. Fibers seem to incur ~4% slowdown on average. Comment Actions Since fiber support incurs a slowdown for all current tsan users who don't use fibers, this is a hard decision. I've prototyped a change which leaves fast_state and fast_synch_epoch in TLS (the only state accessed on the fast path besides the clock):

struct FastThreadState {
  FastState fast_state;
  u64 fast_synch_epoch;
};
__attribute__((tls_model("initial-exec"))) extern THREADLOCAL char cur_thread_faststate1[];
INLINE FastThreadState& cur_thread_faststate() {
  return *reinterpret_cast<FastThreadState *>(cur_thread_faststate1);
}

But this seems to be even slower than just using the single pointer indirection. Comment Actions Also benchmarked function entry/exit using the following benchmark:

// foo1.c
volatile int kRepeat = 1 << 30;
const int repeat = kRepeat;
for (int i = 0; i < repeat; i++)
  foo(false);
}

// foo2.c
if (x)
  bar();
}

The program spends ~75% of time in __tsan_func_entry/exit. The rest of the conditions are the same as in the previous benchmark. Current code runs in 7.16s. Using the pointer indirection seems to positively affect func entry/exit codegen. Comment Actions But if I do:

INLINE ThreadState *cur_thread_fast() {
  ThreadState* thr;
  __asm__("": "=a"(thr): "a"(&cur_thread_placeholder[0]));
  return thr;
}

(which is a dirty trick to force the compiler to cache the address of the TLS object in a register) then the program runs in 5.94s -- faster than any other option, as it takes advantage of both no indirection and faster instructions. But this is not beneficial for the __tsan_read/write functions because caching the address takes a register and these functions are already severely short on registers. Comment Actions Going forward I think we should get in all unrelated/preparatory changes first: the thread type (creates lots of diffs), the pthread_exit interceptor/test, and the spot optimizations to memory access functions. Comment Actions This looked like an interesting optimization, but turns out to be too sensitive to unrelated code changes. Comment Actions Did another round of benchmarking of this change on the current HEAD using these 2 benchmarks: Here fibers is this change, and fibers* is this change with 2 additional changes:
Now ~2% slowdown on highly synthetic benchmarks looks like something we can tolerate (2 cases are actually faster). The cur_thread/cur_thread_fast separation still looks confusing to me. It's a convoluted way to do lazy initialization. If one adds any new calls to these functions, which one to choose is non-obvious. I think we should do lazy initialization explicitly. Namely, leave cur_thread alone, don't introduce cur_thread_fast, don't change any call sites. Instead, add an init_cur_thread call that does lazy initialization to the interceptor entry point and any other points that we expect can be the first call into the tsan runtime overall or within a new thread. I think interceptors and tsan_init should be enough (no tsan_func_entry): we call __tsan_init from .preinit_array, instrumented code can't be executed before .preinit_array, and only interceptors from the dynamic loader can precede .preinit_array callbacks. With these 3 changes, it looks good to me and I am ready to merge it. Comment Actions To clarify the graph: it's the difference in execution time in percent compared to the current HEAD. I.e. -4 means that fibers are 4% slower than the current HEAD. Comment Actions
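A minimal sketch of what the explicit lazy initialization proposed above could look like, with illustrative names only (this is not the runtime code; attributes, the real ThreadState layout and the fiber-switch bookkeeping are omitted):

// Illustrative only.
struct ThreadState { bool in_symbolizer; /* ... per-thread runtime state ... */ };

static __thread char placeholder[sizeof(ThreadState)];   // the thread's own state
static __thread ThreadState *cur;                         // null until first use on this thread

inline void cur_thread_init() {            // called at interceptor entry points and in __tsan_init
  if (cur == nullptr)
    cur = reinterpret_cast<ThreadState *>(placeholder);
}

inline ThreadState *cur_thread() {         // hot paths pay only a single extra load
  return cur;                              // a fiber switch can later repoint this at another state
}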
Comment Actions For now I added calls to cur_thread_init() into 3 places. It was enough to pass all the tests on my system. I am not sure if it will work with different versions of glibc. What do you think about it? Comment Actions The change is now in very good shape.
I've tested on 2 more distributions and it worked for me.
Comment Actions There are a lot of interceptors that do if (cur_thread()->in_symbolizer) before SCOPED_INTERCEPTOR_RAW. What to do with them? Comment Actions Yikes! Good question! If we are in the symbolizer we've already initialized cur_thread, since we are coming recursively from the runtime. But this does not help because if we are not in the symbolizer, we can have cur_thread not initialized... We have it in malloc, atexit and similar fundamental functions that can well be the first thing called during process or thread start. All of the in_symbolizer checks call cur_thread in the same expression rather than use some local variable, i.e. they are of the form: if (cur_thread()->in_symbolizer) which suggests that we should introduce a helper in_symbolizer(void) function that will encapsulate cur_thread_init and the check (probably should go into tsan_interceptors.h); a rough sketch follows below. Comment Actions Committed in: Thanks for bearing with me, this touched very sensitive parts of the runtime so I did not want to rush. This adds a useful capability to ThreadSanitizer. The performance improvements resulting from this work are much appreciated too. Comment Actions FTR updated check_analyze.sh in http://llvm.org/viewvc/llvm-project?view=revision&revision=353820 Comment Actions @lei, please help to investigate what happened in http://lab.llvm.org:8011/builders/sanitizer-ppc64be-linux/builds/11413 Is it possible to get a stack trace of the crash? Comment Actions Hi Yuri, I think this might be breaking our aarch64-full buildbot [1]. I ran compiler-rt/test/sanitizer_common/tsan-aarch64-Linux/Linux/Output/clock_gettime.c.tmp and got this stack trace [2]: Any ideas? The bot has been red for quite a while now, so I think I will have to revert this while we investigate. [1] http://lab.llvm.org:8011/builders/clang-cmake-aarch64-full/builds/6556 Comment Actions
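A rough sketch of the in_symbolizer helper discussed above, assuming the cur_thread()/cur_thread_init() entry points from the preceding comments (illustrative only; the real helper lives inside the runtime and may differ):

// Illustrative only: fold the lazy init into the check.
inline bool in_symbolizer() {
  cur_thread_init();                       // make sure cur_thread() is usable even on the first call
  return cur_thread()->in_symbolizer;
}

// Interceptors like malloc/atexit can then start with:
//   if (in_symbolizer()) { /* take the internal, non-intercepted path */ }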
From: Yi-Hong Lyu via Phabricator <reviews@reviews.llvm.org> Yi-Hong.Lyu added a comment. Comment Actions I see a similar failure was already reported: Comment Actions Looks like it is already fixed. See http://lab.llvm.org:8011/builders/clang-ppc64le-linux-multistage/builds/9157 Comment Actions I am working on integrating this for OpenMP tasks into openmp/tools/archer/ompt-tsan.cpp. Initial experiments with the integration into ompt-tsan show two issues:
==274531==FATAL: ThreadSanitizer: internal allocator is out of memory trying to allocate 0x3fb58 bytes

or

==274830==ERROR: ThreadSanitizer failed to deallocate 0x41000 (266240) bytes at address 0x7f8813fd2000
==274830==ERROR: ThreadSanitizer failed to deallocate 0x43000 (274432) bytes at address 0x0e2050cf1000
FATAL: ThreadSanitizer CHECK failed: /home/pj416018/TSAN/llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_posix.cc:61 "(("unable to unmap" && 0)) != (0)" (0x0, 0x0)
The bulk of this logic should go into tsan_rtl_thread.cc.
This file is not meant to contain any significant logic; it's only the interface part.
Please add a set of runtime functions, following the current naming convention (e.g. FiberCreate, FiberSwitch, etc.), and forward to these functions.