This is an archive of the discontinued LLVM Phabricator instance.

Implement std::condition_variable via pthread_cond_clockwait() where available
ClosedPublic

Authored by tomcherry on Jul 26 2019, 10:22 AM.

Details

Summary

std::condition_variable is currently implemented via
pthread_cond_timedwait() on systems that use pthread. This is
problematic, since that function waits by default on CLOCK_REALTIME
and libc++ does not provide any mechanism to change from this
default.

Due to this, regardless of if condition_variable::wait_until() is
called with a chrono::system_clock or chrono::steady_clock parameter,
condition_variable::wait_until() will wait using CLOCK_REALTIME. This
is not accurate to the C++ standard as calling
condition_variable::wait_until() with a chrono::steady_clock parameter
should use CLOCK_MONOTONIC.

This is particularly problematic because CLOCK_REALTIME is a bad
choice as it is subject to discontinuous time adjustments, that may
cause condition_variable::wait_until() to immediately timeout or wait
indefinitely.

This change fixes this issue with a new POSIX function,
pthread_cond_clockwait() proposed on
http://austingroupbugs.net/view.php?id=1216. The new function is
similar to pthread_cond_timedwait() with the addition of a clock
parameter that allows it to wait using either CLOCK_REALTIME or
CLOCK_MONOTONIC, thus allowing condition_variable::wait_until() to
wait using CLOCK_REALTIME for chrono::system_clock and CLOCK_MONOTONIC
for chrono::steady_clock.

pthread_cond_clockwait() is implemented in glibc (2.30 and later) and
Android's bionic (Android API version 30 and later).

This change additionally makes wait_for() and wait_until() with clocks
other than chrono::system_clock use CLOCK_MONOTONIC.

Bug: 35756266
Test: boot, condition variables unaffected by time changes
Test: libc++ tests

Change-Id: I0246d85df106daa7a3e59ed2bb21180c86d6722d

Event Timeline

tomcherry created this revision.Jul 26 2019, 10:22 AM

So, to make sure I understand, this patch uses something that is *not yet* in POSIX, it's only proposed, correct?

Overall, I think this is a good direction since it patches a hole in condition_variable.

include/__mutex_base
423

Can't you just overload on chrono::time_point<chrono::system_clock, _Duration>& and avoid the enable_if dance?

503

Please pardon my naiveness, but what is that number?

Clarify that pthread_cond_clockwait() is only proposed to POSIX.
Remove enable_if.

tomcherry marked 3 inline comments as done.Jul 26 2019, 12:21 PM

So, to make sure I understand, this patch uses something that is *not yet* in POSIX, it's only proposed, correct?

Correct. I updated the commit message to clarify that it's only proposed.

include/__mutex_base
423

Thanks, yes, the new patch simplifies all of this.

503

It's taken from condition_variable.cpp, added in https://reviews.llvm.org/rGaad745a024a3d0fd573ac9649d4c659e6a713702.

I tried to keep all of the safety checks around chrono intact.

EricWF added inline comments.Aug 9 2019, 11:25 PM
include/__mutex_base
18

Can't we include time.h unconditionally?

308

This deduction is wrong. Either you deduce the entire time point, like we were originally, or you use the timepoint and duration specified by the clock.

You don't specify the clock and deduce the other parameters

Also, fewer overloads is better. We want to declare a _single_ public overload. We can do the magic behind the scenes .

340

This can likely be the same implementation as above.

459

This section seems like it can be generalized to work for both halves of this ifdef.
Having the same code tested in all cases is always better than preprocessor branches.

Forgive me if I'm missing something.

460

using instead of typedef in new code if you don't mind.

461

This assumes the resolution of the steady_clock, no?

495

__lk, __tp, __d, etc.

Everything that isn't a part of the public API needs a reserved name.

503

Lets give it a name so it makes sense.

508

const works the same in all dialects.

512

What does the cast do except silence bugs?

(Same question above).

ldionne added inline comments.Aug 12 2019, 8:26 AM
include/__mutex_base
308

This deduction is wrong. Either you deduce the entire time point, like we were originally, or you use the timepoint and duration specified by the clock.

Why? I was the one to suggest this in the last diff, because we otherwise needed SFINAE and the resulting code was a lot more complex than it needed to be. What's wrong with this implementation?

tomcherry updated this revision to Diff 214736.Aug 12 2019, 4:26 PM
tomcherry marked 13 inline comments as done.

Responding to review comments

I simplified the code a bit more and responded to the comments. I haven't had a good chance to test this yet, but pushing early for feedback.

include/__mutex_base
18

I thought it would be better to not include it if not needed, but I'll make it unconditionally included.

308

The latest patchset has it back to the original public interface.

340

The new patchset does this all with the one public interface.

459

I'm not sure it can. The first branch does the overflow checking with steady_clock and the second does it with system_clock. If we have steady_clock available, we'll want to use it. If we don't have steady_clock available, then we must use system_clock, since all non supported clocks end up calling wait_for() with the difference between their now() and the input time.

460

Will do.

461

Sort of. It mostly assumes that steady_clock::now() can be converted into nanoseconds safely, which I think is a reasonable assumption to make. The old code assumed the resolution of system_clock too, so it's not a regression from there.

I'm happy to hear a better solution, because both this code and the original have never sit well with me. What we really want is a safe_duration_cast, where we get a ceiling effect if there would be overflow. Since that is not how std::chrono was created, we're left with two options:

  1. This function ends up with signed integer overflow if someone passes milliseconds::max() to it and steady/system_clock.
  2. Work-arounds like this
495

Got it, done in next patch.

503

In retrospect, I removed it. There's no accompanying comments on why it was added originally and there's no obvious reason why it would be needed.

508

It's constexpr in the original code, but const is sufficient, so I changed it to that.

512

The first cast makes sense, we don't know what size time_t is, but we know that if s.count() < ts_sec_max, that we can convert into it.

The second cast isn't needed, removed.

mclow.lists added inline comments.Aug 13 2019, 10:17 AM
include/__mutex_base
461

What we really want is a safe_duration_cast, where we get a ceiling effect if there would be overflow. Since that is not how std::chrono was created, we're left with two options:

I suggest a third option: Write __safe_duration_cast

tomcherry updated this revision to Diff 214967.Aug 13 2019, 4:15 PM

Add __safe_nanosecond_cast

tomcherry marked an inline comment as done.Aug 13 2019, 4:21 PM
tomcherry added inline comments.
include/__mutex_base
461

I intentionally didn't suggest that since that full solution would be out of the scope of this change, and I find it strange that we'd implement a private version, when we really do need a public safe_duration_cast.

But I did write something similar today: a safe_nanosecond_cast that only handles the cases needed for this code.

I assume that I cannot use __builtin_mul_overflow or similar in libcxx, so I avoided those, though would happily add them back if it were acceptable.

Are there any other comments for this change? I think my latest patchset takes into consideration the previous comments.

Thanks!

danalbert accepted this revision.Sep 9 2019, 12:39 PM

LGTM. @EricWF any further comments before I submit this?

This revision is now accepted and ready to land.Sep 9 2019, 12:39 PM
tomcherry updated this revision to Diff 219593.Sep 10 2019, 1:17 PM

Restore original behavior that cv_status::timeout is immediately returned if the absolute time of wait_until() has already passed.
Fix return 0 location in test.

tomcherry updated this revision to Diff 219840.Sep 11 2019, 6:04 PM

Add an extra conversion since clocks might not necessarily return nanoseconds from time_from_epoch().