Page MenuHomePhabricator

[TSan] Support Objective-C @synchronized with tagged pointers

Authored by yln on Jan 2 2019, 5:17 PM.



Objective-C employs tagged pointers, that is, small objects/values may be encoded directly in the pointer bits. The resulting pointer is not backed by an allocation/does not point to a valid memory. TSan infrastructure requires a valid address for Acquire/Release and Mutex{Lock/Unlock}.
This patch establishes such a mapping via a "dummy allocation" for each encountered tagged pointer value.

Diff Detail


Event Timeline

yln created this revision.Jan 2 2019, 5:17 PM
Herald added subscribers: Restricted Project, llvm-commits. · View Herald TranscriptJan 2 2019, 5:17 PM

What are tagged pointers? When they are used? What is the actual value? How does synchronized handle them? Is there a man page or something with more info? It should be included in the comment for future generations.

yln updated this revision to Diff 180112.Jan 3 2019, 11:35 AM
yln retitled this revision from [TSan] Support Objective-C @synchronized(ob) where obj is a tagged pointer to [TSan] Support Objective-C @synchronized with tagged pointers.

Improve/extend comments.

dvyukov added inline comments.Jan 4 2019, 1:24 AM
302 ↗(On Diff #180112)

I still wonder what does synchronized itself do to lock them?
Since the optimization is transparent, it suggest that these things still have reference identity rather than values identity. But I fail to see how we respect this reference identity. Consider, we have two different objects that are small and converted to a tagger pointer with the same value (say, integer 1). Now we will use the same address for these 2 objects because they have the same value, so we think they are the same. Since we merge them we can get false deadlock reports and all kinds of bad stuff. But then I am confused how synchronized distinguish them.

319 ↗(On Diff #180112)

It's safer/better to use a user allocation. Internal allocations may not be covered by shadow in future (not sure it has not in all configurations). And mutexes have limited functionality in non-app mem.

yln marked 2 inline comments as done.Jan 4 2019, 11:44 AM
yln added inline comments.
302 ↗(On Diff #180112)

Ah, I think I now understand your question. Sorry for not communicating more clearly.

The Obj-C runtime maintains a mapping: <pointer value -> lock>. So two pointers with the same value (pointer bits) will map to the same lock, regardless whether or not they are backed by an allocation. So our implementation already replicates the behavior of the Obj-C runtime.

We considered reporting a blanket warning any time @synchronized is used with a tagged pointer, since it might have surprising behavior from the programmer's point of view. However, Devin made a convincing point that tagged pointers are only one special case of a larger issue here.
In general, it is bad practice to lock on objects that you don't own/allocate/control the lifetime of. Think of a cache for small numbers, or interned string literals. If you use @synchronized with, e.g., two numbers of the same value, with the expectation that they are different objects, then it doesn't matter if they are tagged pointers or two pointers pointing to the same object. In both cases your assumption is broken. So this would need a more general warning to make sure programmers explicitly manage locks.

@synchronized locks on pointer values. We already replicate this behavior (I hope ;)

319 ↗(On Diff #180112)

With user_alloc TSan reports a race in the test. I do not completely understand what the cause is: the address/allocation is supposed to be racy by nature since it represents a mutex, which itself is used to establish synchronization. Do you have any insights here?
Plain old malloc seems to work.

dvyukov added inline comments.Jan 6 2019, 10:12 PM
302 ↗(On Diff #180112)

I see. Thanks for explanation.
Is it that a user can't create new tagged pointers?
But regardless, if this is what synchronized do, then I guess we can do the same. In some sense deadlock reports caused by merging will be true.

319 ↗(On Diff #180112)

malloc and user_alloc should generally be the same. But I wonder if malloc interceptor detects that it's called a non-intrumented library and enables ignores so that it does not model access to the range.
This can be checked by running with TSAN_OPTIONS=ignore_noninstrumented_modules=0

Yes, I think a false report is possible if we use user memory without ignores.
Consider a thread creates a mutex, then passes it to 2 threads and these 2 threads lock the mutex concurrently. In this case all proper synchronization is in place and the code is correct.
However, what tsan sees is that the mutex is created by one of the 2 threads that calls lock first. So it looks like one thread creates a mutex and another thread immidiately tries to lock it without any synchronization.

Let's try to surround user_alloc with ThreadIgnoreBegin/End. Does it help?

yln updated this revision to Diff 180520.Jan 7 2019, 10:47 AM
yln edited the summary of this revision. (Show Details)

Use user_alloc surrounded by ThreadIgnoreBegin/End.

yln marked 6 inline comments as done.Jan 7 2019, 11:05 AM
yln added inline comments.
302 ↗(On Diff #180112)

Is it that a user can't create new tagged pointers?

Some APIs return tagged pointers and certain patterns create tagged pointers. So user code can create new tagged pointers, but should not depend on it (no guarantees) and ideally the programmer shouldn't even be aware of it (transparent optimization). We can say that @synchronized is a place where this optimization leaks its implementation details.

But regardless, if this is what synchronized do, then I guess we can do the same. In some sense deadlock reports caused by merging will be true.

Yes, that's our intention here.

319 ↗(On Diff #180112)

Your analysis is correct. With TSAN_OPTIONS=ignore_noninstrumented_modules=0 also malloc reports a warning and ThreadIgnoreBegin/End fixes the failing test. Good that we had that test. :)

One last question: looking at the code I don't think so, but does the size of the dummy allocation matter, i.e., should it be something like sizeof(pthread_mutex_t)?

yln marked 2 inline comments as done.Jan 7 2019, 11:06 AM
dvyukov accepted this revision.Jan 7 2019, 11:10 AM
dvyukov added inline comments.
319 ↗(On Diff #180112)

No, it doesn't have to be larger than 1 byte because a user mutex implementation can be just 1 byte (think of a simple spinlock). Tsan does not store any info inline (that would interfere with the mutex code itself), all info is stored on the side, so 1 byte enough.

This revision is now accepted and ready to land.Jan 7 2019, 11:10 AM
yln marked 2 inline comments as done.Jan 7 2019, 11:15 AM
yln added inline comments.
319 ↗(On Diff #180112)

Okay, got it. Thanks for the quick answers and great feedback, Dmitry!

This revision was automatically updated to reflect the committed changes.
yln marked an inline comment as done.