In tsan, release sequences are treated as if they are never broken and continue indefinitely, which causes some data races to be missed. On the other hand, tsan does not recognize fence semantics and their role in synchronization, which causes it to produce false positives.
See more in https://www.doc.ic.ac.uk/~afd/homepages/papers/pdfs/2017/POPL.pdf
This patch modifies the vector clock algorithm according to the paper. Correctness is verified by several unit tests.
Diff Detail
- Repository
- rCRT Compiler Runtime
- Build Status
Buildable 18377 Build 18377: arc lint + arc unit
Event Timeline
Hi Yorov,
Sorry for the delay, I was on vacation and now have a backlog of work.
Thanks for working on this.
This needs some additional work before we can merge it. High-level notes:
- Fences and release sequences need to be separated into individual patches. I would suggest starting with fences, as it looks simpler.
- Tsan is used on very large programs with thousands of threads and tens of millions of sync objects. Performance, and especially memory consumption, of synchronization handling is crucial. Since tsan is already widely deployed, any non-constant overheads need to be controlled by flags. For example, any fence handling needs to be completely skipped if the flag is not set. The default value for these flags is a hard question, but we can decide that later.
- Vectors should not be used inside of sync objects. There are 2 problems with Vectors:
- they can use memory suboptimally, which is unacceptable for sync objects since they can consume tens of gigabytes
- our memory allocator does not reuse memory between different size classes, which leads to O(N^2) memory consumption as these Vectors slowly grow during program startup and eat memory in all consecutive size classes; that's the reason the current clocks are built as chunked data structures with constant-size blocks
- For fences, check out what we did in KTSAN to reduce their overhead (acquire_active and release_active):
https://github.com/google/ktsan/blob/tsan/mm/ktsan/sync_atomic.c#L6
I think we need something similar here. The actual threshold for discarding fence effects should be controllable with a flag.
- For release fences it's possible to get their overhead down to almost zero in the common case. The idea is as follows. In the common case there are no acquire operations between a release fence and the corresponding relaxed store, so we don't need to memorize the whole thread vector clock. Instead we can memorize only the current thread time. Then, when we need to turn a relaxed store into a release operation, we use the current thread vector clock (it did not change, because we did no acquire operations), except for the current thread time, for which we use the previously saved value. This makes a release fence cost O(1) instead of O(N).
But if there is an acquire operation in between, then we need to save the whole vector clock at that point. However, if we also do (4), then usually there won't be an acquire fence while a release fence is active.
lib/tsan/rtl/tsan_clock.cc

- Line 185: This comment and "Check if we need to resize dst" do not seem to add much value. I would rather see more extensive comments for the more complex parts of the change.
- Line 186: This code base does not use {} around if/for blocks if they contain a single statement. Please fix here and below.
- Line 274: The TODOs need to be resolved before commit, either by fixing the code or by removing the TODO if we are not going to fix the code.
- Line 349: revert
- Line 383: Same here: either resolve it or remove it.
- Line 396: We use 2 styles for comments. So if you do this as a full line, this needs to be: // Unshare before changing dst.
- Line 443: Empty line between functions, please.

lib/tsan/rtl/tsan_clock.h

- Line 180: I think we can use kMaxTid here.

lib/tsan/rtl/tsan_interface_atomic.cc

- Line 227: No dead code, please.

test/CMakeLists.txt

- Line 18: I don't think this needs to be here, as no other tests are listed here.
> I think we can use kMaxTid here.

kMaxTidInClock is kMaxTid*2; that's required to detect accesses to freed memory at low cost. But we should not access the second half of the clock here.