Objective-C employs tagged pointers, that is, small objects/values may be encoded directly in the pointer bits. The resulting pointer is not backed by an allocation/does not point to a valid memory. TSan infrastructure requires a valid address for Acquire/Release and Mutex{Lock/Unlock}.
This patch establishes such a mapping via a "dummy allocation" for each encountered tagged pointer value.
Details
Diff Detail
- Repository
- rCRT Compiler Runtime
Event Timeline
What are tagged pointers? When they are used? What is the actual value? How does synchronized handle them? Is there a man page or something with more info? It should be included in the comment for future generations.
lib/tsan/rtl/tsan_interceptors_mac.cc | ||
---|---|---|
302 | I still wonder what does synchronized itself do to lock them? | |
319 | It's safer/better to use a user allocation. Internal allocations may not be covered by shadow in future (not sure it has not in all configurations). And mutexes have limited functionality in non-app mem. |
lib/tsan/rtl/tsan_interceptors_mac.cc | ||
---|---|---|
302 | Ah, I think I now understand your question. Sorry for not communicating more clearly. The Obj-C runtime maintains a mapping: <pointer value -> lock>. So two pointers with the same value (pointer bits) will map to the same lock, regardless whether or not they are backed by an allocation. So our implementation already replicates the behavior of the Obj-C runtime. We considered reporting a blanket warning any time @synchronized is used with a tagged pointer, since it might have surprising behavior from the programmer's point of view. However, Devin made a convincing point that tagged pointers are only one special case of a larger issue here. Summary: | |
319 | With user_alloc TSan reports a race in the objc-synchronize-tagged.mm test. I do not completely understand what the cause is: the address/allocation is supposed to be racy by nature since it represents a mutex, which itself is used to establish synchronization. Do you have any insights here? |
lib/tsan/rtl/tsan_interceptors_mac.cc | ||
---|---|---|
302 | I see. Thanks for explanation. | |
319 | malloc and user_alloc should generally be the same. But I wonder if malloc interceptor detects that it's called a non-intrumented library and enables ignores so that it does not model access to the range. Yes, I think a false report is possible if we use user memory without ignores. Let's try to surround user_alloc with ThreadIgnoreBegin/End. Does it help? |
lib/tsan/rtl/tsan_interceptors_mac.cc | ||
---|---|---|
302 |
Some APIs return tagged pointers and certain patterns create tagged pointers. So user code can create new tagged pointers, but should not depend on it (no guarantees) and ideally the programmer shouldn't even be aware of it (transparent optimization). We can say that @synchronized is a place where this optimization leaks its implementation details.
Yes, that's our intention here. | |
319 | Your analysis is correct. With TSAN_OPTIONS=ignore_noninstrumented_modules=0 also malloc reports a warning and ThreadIgnoreBegin/End fixes the failing test. Good that we had that test. :) One last question: looking at the code I don't think so, but does the size of the dummy allocation matter, i.e., should it be something like sizeof(pthread_mutex_t)? |
lib/tsan/rtl/tsan_interceptors_mac.cc | ||
---|---|---|
319 | No, it doesn't have to be larger than 1 byte because a user mutex implementation can be just 1 byte (think of a simple spinlock). Tsan does not store any info inline (that would interfere with the mutex code itself), all info is stored on the side, so 1 byte enough. |
lib/tsan/rtl/tsan_interceptors_mac.cc | ||
---|---|---|
319 | Okay, got it. Thanks for the quick answers and great feedback, Dmitry! |
I still wonder what does synchronized itself do to lock them?
Since the optimization is transparent, it suggest that these things still have reference identity rather than values identity. But I fail to see how we respect this reference identity. Consider, we have two different objects that are small and converted to a tagger pointer with the same value (say, integer 1). Now we will use the same address for these 2 objects because they have the same value, so we think they are the same. Since we merge them we can get false deadlock reports and all kinds of bad stuff. But then I am confused how synchronized distinguish them.