This is an archive of the discontinued LLVM Phabricator instance.

[tsan] Update dispatch_group support to avoid using a disposed group object
ClosedPublic

Authored by kubamracek on Dec 9 2015, 8:08 AM.

Details

Summary

We're using the dispatch group itself to synchronize (to call Release() and Acquire() on it), but in dispatch group notifications, the group can already be disposed/deallocated. This causes a later assertion failure at DCHECK_EQ(*meta, 0); in MetaMap::AllocBlock when the same memory is reused (note that the failure only happens in debug builds).

Fixing this by retaining the group and releasing it in the notification. Adding a stress test case that reproduces this.

Diff Detail

Repository
rL LLVM

Event Timeline

kubamracek updated this revision to Diff 42300.Dec 9 2015, 8:08 AM
kubamracek retitled this revision from to [tsan] Update dispatch_group support to avoid using a disposed group object.
kubamracek updated this object.
kubamracek added reviewers: dvyukov, samsonov, glider, kcc.
kubamracek added subscribers: llvm-commits, zaks.anna.

Also, shouldn't we have some CHECK/DCHECK to make sure Acquire() and Release() aren't called on deallocated memory?

dvyukov added inline comments.Dec 11 2015, 5:18 AM
lib/tsan/rtl/tsan_libdispatch_mac.cc
93 ↗(On Diff #42300)

Looks like use-after-free. You've just freed the context line above.

93 ↗(On Diff #42300)

Please add a comment somewhere here explaining why we need this additional lifetime management.

93 ↗(On Diff #42300)

I don't understand the exact scenario that leads to the problem. The group is context->object_to_acquire, which we touch in the very beginning of this function. Then we call user callback orig_work. In your tests orig_work calls dispatch_group_leave on the group. So either (1) test code is also buggy, or (2) we don't have use-after-free in this function and don't need this change, or (3) I am missing something.

kubamracek added inline comments.Dec 13 2015, 7:42 AM
lib/tsan/rtl/tsan_libdispatch_mac.cc
93 ↗(On Diff #42300)

In your tests orig_work calls dispatch_group_leave

No, orig_work here is the empty block (or empty function) passed as argument to dispatch_group_notify (or dispatch_group_notify_f). The issue is that the API doesn't guarantee that the group itself will still be alive, when the notification is invoked.

Looks like use-after-free. You've just freed the context line above.

Oops, you're right.

dvyukov added inline comments.Dec 14 2015, 4:08 AM
lib/tsan/rtl/tsan_libdispatch_mac.cc
93 ↗(On Diff #42300)

I've missed that it is only for dispatch_group_notify.
Then, I think you need to move the release right below the Acquire and add a comment.

As a side note, it can make sense to detect such bugs in user code. If you add MemoryRead/Write in necessary places, tsan will be able to detect such racy-use-after-free usages. You can see tsan_rtl_mutex.cc for an example.

kubamracek updated this revision to Diff 42703.Dec 14 2015, 4:40 AM

Updating patch.

kubamracek added inline comments.Dec 14 2015, 4:46 AM
lib/tsan/rtl/tsan_libdispatch_mac.cc
100 ↗(On Diff #42703)

To detect racy-use-after-free, do we need to annotate all uses of the object (a dispatch group in this case) or is it okay to miss some? Will that produce false positives? Because here, the group is deallocated either by the explicit dispatch_retain on the main thread, or implicitly when the async block finishes. I'm not sure how to write a reliable test that would catch the bug here.

dvyukov accepted this revision.Dec 14 2015, 5:22 AM
dvyukov edited edge metadata.

LGTM

lib/tsan/rtl/tsan_libdispatch_mac.cc
100 ↗(On Diff #42703)

It is OK to miss some. As for reliable test, check out e.g. race_on_mutex2.c.
I can't say about implicit/explicit deallocation, you will need to experiment to see how we can detect some misuses without false positives.

This revision is now accepted and ready to land.Dec 14 2015, 5:22 AM
This revision was automatically updated to reflect the committed changes.