This is an archive of the discontinued LLVM Phabricator instance.

[libcxx] Fix bug in shared_timed_mutex that could cause a program to hang.
ClosedPublic

Authored by EricWF on Apr 1 2015, 9:09 PM.

Details

Summary

The summary of the bug, provided by Stephan T. Lavavej:

In shared_timed_mutex::try_lock_until() (line 195 in 3.6.0), you need to deliver a notification. The scenario is:

  • There are N threads holding the shared lock.
  • One thread calls try_lock_until() to attempt to acquire the exclusive lock. It sets the "I want to write" bool/bit, then waits for the N readers to drain away.
  • K more threads attempt to acquire the shared lock, but they notice that someone said "I want to write", so they block on a condition_variable.
  • At least one of the N readers is stubborn and doesn't release the shared lock.
  • The wannabe-writer times out, gives up, and unsets the "I want to write" bool/bit.

At this point, a notification (it needs to be notify_all) must be delivered to the condition_variable that the K wannabe-readers are waiting on. Otherwise, they can block forever without waking up.

Diff Detail

Event Timeline

EricWF updated this revision to Diff 23116.Apr 1 2015, 9:09 PM
EricWF retitled this revision from to [libcxx] Fix bug in shared_timed_mutex that could cause a program to hang..
EricWF updated this object.
EricWF edited the test plan for this revision. (Show Details)
EricWF added a reviewer: mclow.lists.
EricWF added a subscriber: Unknown Object (MLST).
EricWF updated this revision to Diff 23118.Apr 1 2015, 9:49 PM

Change notify_one() to notify_all(), using notify_one() can still cause deadlock. Also modify test so it would have caught that bug.

EricWF updated this revision to Diff 23159.Apr 2 2015, 9:23 AM

I think my last fix where we released the lock before calling notify_all() introduced a race condition where another thread could enter the lock() method reach gate2_ before gate1_ was notified. The thread waiting on __gate2_ would never be notified.

There must be better names for those variables than gate1 and gate2.

It's weird that the condition_variable::wait_until() loops aren't using the predicate versions of wait_until() so they wouldn't need the loops. Even in C++98 mode where you'd have to name the predicates, it seems like it'd be worth it.

The fix itself looks correct to me.

test/std/thread/thread.mutex/thread.mutex.requirements/thread.sharedtimedmutex.requirements/thread.sharedtimedmutex.class/try_lock_until_deadlock_bug.pass.cpp
28 ↗(On Diff #23159)

Nit: Can't you initialize this as:

std::atomic<int> readers_started(0);

?

31 ↗(On Diff #23159)

This comment is out of date.

45 ↗(On Diff #23159)

"blocked by main()."

EricWF added a comment.Apr 2 2015, 1:03 PM

The fix itself looks correct to me.

Good! The fix I had before released the mutex before calling notify_all(). I was hoping to do that as an optimization but I don't think it is possible. Should I be at all concerned about notifying other threads while holding the lock?

EricWF updated this revision to Diff 23175.Apr 2 2015, 1:06 PM

Address review comments.

EricWF updated this object.Apr 2 2015, 1:08 PM
EricWF added a comment.Apr 2 2015, 1:10 PM

Because the initialism for Stefan's name and that of the Standard Template Library are the same, I'd rather see this named something reflecting what the bug was, instead of the initials of the guy who reported it. This avoids the
confused question of "isn't every libc++ bug an STL bug?", despite him having made a big name for himself in the community.

Good point. I've changed it.

Also, did he file a PR for this?

No PR unfortunately. He emailed this bug out to a couple of people.

The fix itself looks correct to me.

Good! The fix I had before released the mutex before calling notify_all(). I was hoping to do that as an optimization but I don't think it is possible. Should I be at all concerned about notifying other threads while holding the lock?

There's no requirement that the mutex be either locked or unlocked when you call notify_*(). One choice might be faster than the other, but it depends on the mutex+condvar implementation. (A smart mutex+condvar can ensure that no thread is unblocked before the mutex is unlocked.) If you're tempted to optimize this, consider optimizing the fact that std::mutex uses pthread_mutex, which is big and slow, first.

jyasskin accepted this revision.Apr 2 2015, 1:44 PM
jyasskin added a reviewer: jyasskin.
This revision is now accepted and ready to land.Apr 2 2015, 1:44 PM
EricWF closed this revision.Apr 2 2015, 2:04 PM