
Hold mutex lock while notify_all is called at notify_all_at_thread_exit
AcceptedPublic

Authored by Quuxplusone on Jul 10 2021, 5:19 AM.

Details

Reviewers
vinipsmaker
ldionne
post.kadirselcuk
lewissbaker
Group Reviewers
Restricted Project
Summary

Releasing the mutex before the call to notify_all is an optimization.
That optimization cannot be used here: the thread waiting on the
condition might destroy the associated resources (mutex + condition
variable), and the notifier thread would then access a destroyed
object (the condition variable). In fact, notify_all_at_thread_exit
exists precisely to allow joining on detached threads, and the waiting
thread doesn't expect the notifier thread to access any further shared
resources, which makes this scenario very likely. The waiting thread
might also wake spuriously when the mutex lock is released. Reordering
the two calls is necessary to prevent this race.

Further details can be found at
https://cplusplus.github.io/LWG/issue3343.

Diff Detail

Event Timeline

vinipsmaker requested review of this revision.Jul 10 2021, 5:19 AM
vinipsmaker created this revision.
Herald added a reviewer: Restricted Project. · View Herald TranscriptJul 10 2021, 5:19 AM
vinipsmaker edited the summary of this revision. (Show Details)Jul 10 2021, 5:21 AM

LGTM based on a quick read of that LWG issue. @ldionne, WDYT?

post.kadirselcuk planned changes to this revision.Jul 10 2021, 10:53 PM
post.kadirselcuk accepted this revision.
xgupta added a subscriber: xgupta.Jul 10 2021, 11:09 PM

Kadir, I guess it is enough for today 😂

I disabled post.kadirselcuk who is apparently spamming the website.

ldionne requested changes to this revision.EditedJul 12 2021, 8:01 AM

The change does look legit, but the submitter not so much. We should:

  1. Add a test
  2. Update the LWG status page

Edit Oh, sorry, I thought the submitter was the spammer, not @vinipsmaker. In all cases, my comments above hold but let's move forward with this.

Update the LWG status page

Given you'll spend time with the committee anyway, could one of you also write a related unlock_at_thread_exit? The use case is described at https://lists.isocpp.org/std-proposals/2021/07/2809.php.

Quuxplusone commandeered this revision.Jul 20 2021, 8:36 AM
Quuxplusone edited reviewers, added: post.kadirselcuk; removed: Quuxplusone.

Commandeering to add a test and update the LWG status page, per ldionne's request.

tambre added a subscriber: tambre.Jul 20 2021, 8:42 AM

Commandeering to add a test and update the LWG status page, per ldionne's request.

What exactly does "commandeering" refer to here? I don't understand this comment.

@vinipsmaker wrote:

What exactly does "commandeering" refer to here?

It means I'm taking over the responsibility of updating this patch, i.e., it's "mine" now instead of "yours" (as far as Phabricator is concerned). Louis's impression was that you weren't planning to follow up on adding tests and updating the status page, so he asked me to commandeer the patch. However, if you want it back so that you can add tests and status-page update, then I have no problem with you commandeering it back. I'm not planning to actively work on it for a few hours at least. :)

Btw, "could one of you also write a related unlock_at_thread_exit?" — I agree with your rationale and think such a function ought to exist, but I don't think @ldionne would consent to add it to libc++ unless it was already in the Standard (or at least tentatively ready for adoption by the Standard), and I'm not personally willing to write the paper and champion it and so on. But I think it's a good idea for someone (not me) to write that paper. :)

It means I'm taking over the responsibility of updating this patch, i.e., it's "mine" now instead of "yours" (as far as Phabricator is concerned).

Okay, I see. Thanks.

Louis's impression was that you weren't planning to follow up on adding tests and updating the status page, so he asked me to commandeer the patch.

His impression is correct. I just want to solve my problem. I want to stay as far away as possible from C++ committees. Thanks for taking care of this.

[...] I agree with your rationale and think such a function [unlock_at_thread_exit] ought to exist, but I don't think @ldionne would consent to add it to libc++ unless it was already in the Standard [...] But I think it's a good idea for someone (not me) to write that paper. :)

I didn't express myself clearly. A paper is actually what I was suggesting. Several of you are already in constant contact with the C++ committee. I just want to solve my problem. I don't want my name in the standard nor any contact with the C++ committee. Just like yourself... I don't want to write such a paper.

Looks like this function won't see the light of day, but that's okay.

Quuxplusone added inline comments.
libcxx/src/thread.cpp
162

Btw, update: I've emailed @lewissbaker to ask why the proof-of-concept wandbox from https://cplusplus.github.io/LWG/issue3343 (namely, https://wandbox.org/permlink/eUu3eiQbLl7JQKMm ) doesn't reproduce anymore. It's trivial to turn that code snippet into a libc++ unit test; but since it's already green on trunk, it wouldn't be very useful.
He hasn't gotten back to me yet. But this PR is still on my radar!

xgupta removed a subscriber: xgupta.Jul 21 2021, 8:07 PM
Quuxplusone added a reviewer: lewissbaker.

Still haven't heard from @lewissbaker , so this did fall off my radar. At this point I'm inclined to say we should just commit the fix, presumably without the attached test case (because the test case doesn't actually reproduce the bug with libc++ anyway). However, I'm showing the test case in this PR right now, just so that people see what I tried.
Again, this test case does not reproduce the bug: it's green both before and after the proposed change.

At this point I'm inclined to say we should just commit the fix, presumably without the attached test case

Well... I can't even use notify_all_at_thread_exit in its current state, and my workaround isn't general enough to cover all cases. Tests should exist to ensure the library is correct, not the other way around. If a test case is just going to guarantee that libc++ stays buggy... I don't see the point of having a test case at all.

@vinipsmaker wrote:

If a test case is just going to guarantee libc++ will be buggy...

Well, this test case is green both before and after the change. So it wouldn't "guarantee libc++ will be buggy"; it simply would "not guarantee libc++ won't be buggy."

And actually it looks like this test case does fail with GCC 11! https://reviews.llvm.org/harbormaster/unit/view/943944/
Is that bot using libstdc++ instead of libc++? or is something weird going on here?

@ldionne, what are your thoughts?
Situation at the moment: This apparently fixes a fairly important bug in <thread>. The attached test case is supposed to detect the bug, but AFAICT it does not. The wording for the fix is proposed in LWG3343 but has not yet been adopted into any draft.
I think the obvious alternatives are "Land this to fix the bug and get it over with," or "Continue sitting on this indefinitely until LWG3343 gets adopted." Either way, we'll have to grapple with the fact that the test case doesn't seem to reproduce the bug.

(Also, re-ping: @lewissbaker , but I don't think he's listening)

ldionne accepted this revision.Aug 24 2021, 1:58 PM

LGTM, but...

Is that bot using libstdc++ instead of libc++? or is something weird going on here?

No, it should be using libc++ (which you can validate by looking at the command-line used to compile the test). Are you able to reproduce in Docker with this?

$ ./libcxx/utils/ci/run-buildbot-container
$ ./libcxx/utils/ci/run-buildbot generic-gcc-cxx11
This revision is now accepted and ready to land.Aug 24 2021, 1:58 PM
lewissbaker added inline comments.Aug 27 2021, 8:36 PM
libcxx/src/thread.cpp
162

One approach you could take to verify that this fix works is to put a sleep between the unlock() and notify_all() calls, in both the before and after code, to increase the chances that the other thread acquires the lock, sees the state that says the thread is done, and then destroys the CV before the exiting thread calls notify_all().

It's obviously not something you'd leave in the code, but it could serve to give higher confidence in the fix.

libcxx/test/std/thread/thread.condition/notify_all_at_thread_exit_lwg3343.pass.cpp
62 ↗(On Diff #365644)

We are trying to make the cv destructor race with the call to notify_all().
This is generally only going to happen if the main thread happens to see threads_active as 0 between the lock being released and the cv being notified.

You could possibly increase the likelihood of a race by taking out a lock on the mutex before launching the threads.
The idea is that threads that launch quickly will wait to acquire the lock, and once the main thread releases the lock inside the call to cv.wait(), the other threads all start at once.

I'd also consider removing the sleep_for() in the threads to encourage some more raciness of all of the threads exiting at the same time.
Really, it's only ever going to be the last thread to execute that will potentially run into the lifetime issue.

Another approach could be to have the main thread simply spin, acquiring the lock, looking at the threads_active and then releasing the lock and having it continue on and destroy the CV once it sees that no more threads are active.
i.e. just do a spin-loop version of cv.wait() without actually calling cv.wait()

If you do all of this in conjunction with inserting a sleep_for() call in the implementation of notify_all_at_thread_exit() between the unlock and notify_all calls (just for testing) then you should be able to reproduce the issue.