This is an archive of the discontinued LLVM Phabricator instance.

[libc++] Fixes a flaky test.
ClosedPublic

Authored by Mordante on Feb 11 2023, 8:04 AM.

Details

Reviewers
ldionne
Group Reviewers
Restricted Project
Commits
rG261d21f8c375: [libc++] Fixes a flaky test.
Summary

While investigating the flaky tests on the mac backdeployment targets
it seems the tests are different than suggested in the LWG issue.
That tests "does work", grabs the mutex, marks the task as done, and
finally calls the notifier.
Our test emulated "does work" after the notification, effectively
sleeping with a lock held. This has been fixed.

A second improvement is that the test fails when, due to OS
scheduling, the condition variable is not used in the main thread.

During discussing the approach of the patch with @ldionne, he
noticed the real reason why the test is flaky; the Apple
backdeployment targets use a dylib, which does not contain the
fix. So the test can't be tested on that platform; it only
proves the LWG issue was correct and the Standard contained
a bug.

Diff Detail

Event Timeline

Mordante created this revision.Feb 11 2023, 8:04 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 11 2023, 8:04 AM

If all the threads exit before the cv.wait is called. cv.wait would immediately return as the predicate returns true.

If all the threads exit before the cv.wait is called. cv.wait would immediately return as the predicate returns true.

Good point, I had looked at the overload without a predicate.

Still I think this test is better, we want to make sure notify_all_at_thread_exit does a notification. Depending on the OS scheduling the previous test may pass without any notification. Which would happen when the predicate is true at its first evaluation.

If all the threads exit before the cv.wait is called. cv.wait would immediately return as the predicate returns true.

Good point, I had looked at the overload without a predicate.

Still I think this test is better, we want to make sure notify_all_at_thread_exit does a notification. Depending on the OS scheduling the previous test may pass without any notification. Which would happen when the predicate is true at its first evaluation.

The fact that the test can hang might indicate there might be a problem either in our libc++ , or the OS, if we cannot prove that the test is not correct.

Maybe we should try to figure that out as well

If all the threads exit before the cv.wait is called. cv.wait would immediately return as the predicate returns true.

Good point, I had looked at the overload without a predicate.

Still I think this test is better, we want to make sure notify_all_at_thread_exit does a notification. Depending on the OS scheduling the previous test may pass without any notification. Which would happen when the predicate is true at its first evaluation.

The fact that the test can hang might indicate there might be a problem either in our libc++ , or the OS, if we cannot prove that the test is not correct.

Maybe we should try to figure that out as well

I fully agree. That's why the review is still in draft state. But I wanted to test whether this fix passes the CI for all platforms. I'm still looking into it.

ldionne added inline comments.Feb 14 2023, 9:22 AM
libcxx/test/std/thread/thread.condition/notify_all_at_thread_exit_lwg3343.pass.cpp
12

I think this needs to be marked as UNSUPPORTED on the back-deployment target. The fix is in the dylib so we should expect it to fail on back-deployment targets.

I can't believe I didn't think of that earlier, I'm sorry I hadn't realized the original fix was in thread.cpp, not in <thread>.

Mordante updated this revision to Diff 497717.Feb 15 2023, 10:02 AM

Rewritten the test based on a private discussion with @ldionne.

Mordante edited the summary of this revision. (Show Details)Feb 15 2023, 10:09 AM
Mordante updated this revision to Diff 497726.Feb 15 2023, 10:14 AM
Mordante marked an inline comment as done.

Minor comment improvement.

libcxx/test/std/thread/thread.condition/notify_all_at_thread_exit_lwg3343.pass.cpp
18

@ldionne please verify whether these targets are correct. These are the same values I use for the format backdeployment tests.

Mordante published this revision for review.Feb 15 2023, 10:15 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 15 2023, 10:15 AM
Herald added a reviewer: Restricted Project. · View Herald Transcript
var-const retitled this revision from [libc++] Fixes a flacky test. to [libc++] Fixes a flaky test..Feb 15 2023, 11:02 AM

hmm. I am not sure if I understand why we need to change the test. According to the description, it is not the test that is flaky. Instead, it is the system under test (the dylib which contains the bug) that is flaky.
So the test does exact what it should do: detect the bug. The fact that it fails very frequently probably means that it does a good job.

the added assertion might actually makes the test flaky since that totally depends on the scheduler.

hmm. I am not sure if I understand why we need to change the test. According to the description, it is not the test that is flaky. Instead, it is the system under test (the dylib which contains the bug) that is flaky.
So the test does exact what it should do: detect the bug. The fact that it fails very frequently probably means that it does a good job.

the added assertion might actually makes the test flaky since that totally depends on the scheduler.

There are two reasons for the change:

  1. The typical use case of notify_all_at_thread_exit is to issue it at the end of the thread. In the original case the worker thread was sleeping while holding a mutex. Something which is not a typical use case.
  2. The test might never actually test that notify_all_at_thread_exit is called. Testing on Mac revealed it did happen, but was extremely rare. So if this fails 25% of the time on other platforms we should look at how to make the test more realistic, longer sleep, yielding at the start of the thread execution, add more thread, etc. So this is basically a test that the test tests what it expects.

I agree these improvements are not related to the "fix", so maybe it would be better to commit them separately.

ldionne accepted this revision.Feb 16 2023, 1:09 PM

I think I like those changes, however I am wondering -- does the test still fail as reliably with those changes? Did you run it on the backdeployment host with those changes and you could see it fail pretty much as often as before? If so, this LGTM. If this makes the test not reproduce (or reproduce much less frequently), then we're reducing its effectiveness and I'd argue we shouldn't.

LGTM assuming this does not power-down the test.

libcxx/test/std/thread/thread.condition/notify_all_at_thread_exit_lwg3343.pass.cpp
18

Yeah, that sounds fine to me. I think 12.0 would fail too, but this can be fixed in bulk when we add testing for 12.0 in backdeployment. Other tests will also need fixing anyway.

78–79
This revision is now accepted and ready to land.Feb 16 2023, 1:09 PM

I think I like those changes, however I am wondering -- does the test still fail as reliably with those changes? Did you run it on the backdeployment host with those changes and you could see it fail pretty much as often as before? If so, this LGTM. If this makes the test not reproduce (or reproduce much less frequently), then we're reducing its effectiveness and I'd argue we shouldn't.

LGTM assuming this does not power-down the test.

I didn't run this version on back-deployment, I can do that tomorrow. During testing I've tested with a version where I removed the sleep entirely and I got the feeling it made the test less stable on Mac. At that time I didn't investigate it further and I wasn't aware of the real issue on the Mac.

libcxx/test/std/thread/thread.condition/notify_all_at_thread_exit_lwg3343.pass.cpp
78–79

Thanks, when is indeed not intended.

This revision was automatically updated to reflect the committed changes.
Mordante marked an inline comment as done.

I think I like those changes, however I am wondering -- does the test still fail as reliably with those changes? Did you run it on the backdeployment host with those changes and you could see it fail pretty much as often as before? If so, this LGTM. If this makes the test not reproduce (or reproduce much less frequently), then we're reducing its effectiveness and I'd argue we shouldn't.

Well it seems the CI hates me ;-) The original test, the sleep before mutex, and even removing the sleep are running stable on the Mac. I will try again another time.

LGTM assuming this does not power-down the test.

It shouldn't.