Page MenuHomePhabricator

[libcxx] Fix potential lost wake-up in counting semaphore
AcceptedPublic

Authored by fwolff on Wed, Nov 17, 1:31 PM.

Details

Reviewers
Quuxplusone
ldionne
Group Reviewers
Restricted Project
Summary

Fixes PR#47013. The implementation in libstdc++ seems to have had the same problem (see Bug #100806) and also solved it by always notifying all (instead of just one if __update is equal to 1).

Diff Detail

Unit TestsFailed

TimeTest
120 mslibcxx CI 32 bit > llvm-libunwind-shared-cfg-in.llvm-libunwind-shared-cfg-in::forceunwind.pass.cpp
Script: -- : 'COMPILED WITH'; /usr/bin/c++ /home/libcxx-builder/.buildkite-agent/builds/014f282531a3-1/llvm-project/libcxx-ci/libunwind/test/forceunwind.pass.cpp --target=x86_64-unknown-linux-gnu -m32 -nostdinc++ -I /home/libcxx-builder/.buildkite-agent/builds/014f282531a3-1/llvm-project/libcxx-ci/libunwind/include -funwind-tables -std=c++2b -Werror -Wall -Wextra -Wshadow -Wundef -Wno-unused-command-line-argument -Wno-attributes -Wno-pessimizing-move -Wno-c++11-extensions -Wno-user-defined-literals -Wno-noexcept-type -Wno-atomic-alignment -Wsign-compare -Wunused-variable -Wunused-parameter -Wunreachable-code -Wno-unused-local-typedef -D_LIBCPP_DISABLE_AVAILABILITY -fcoroutines-ts -Werror=thread-safety -Wuser-defined-warnings -D_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER -L /home/libcxx-builder/.buildkite-agent/builds/014f282531a3-1/llvm-project/libcxx-ci/build/generic-32bit/lib -Wl,-rpath,/home/libcxx-builder/.buildkite-agent/builds/014f282531a3-1/llvm-project/libcxx-ci/build/generic-32bit/lib -lunwind -ldl -Wl,--export-dynamic -o /home/libcxx-builder/.buildkite-agent/builds/014f282531a3-1/llvm-project/libcxx-ci/build/generic-32bit/Output/forceunwind.pass.cpp.dir/t.tmp.exe
110 mslibcxx CI 32 bit > llvm-libunwind-shared-cfg-in.llvm-libunwind-shared-cfg-in::frameheadercache_test.pass.cpp
Script: -- : 'COMPILED WITH'; /usr/bin/c++ /home/libcxx-builder/.buildkite-agent/builds/014f282531a3-1/llvm-project/libcxx-ci/libunwind/test/frameheadercache_test.pass.cpp --target=x86_64-unknown-linux-gnu -m32 -nostdinc++ -I /home/libcxx-builder/.buildkite-agent/builds/014f282531a3-1/llvm-project/libcxx-ci/libunwind/include -funwind-tables -std=c++2b -Werror -Wall -Wextra -Wshadow -Wundef -Wno-unused-command-line-argument -Wno-attributes -Wno-pessimizing-move -Wno-c++11-extensions -Wno-user-defined-literals -Wno-noexcept-type -Wno-atomic-alignment -Wsign-compare -Wunused-variable -Wunused-parameter -Wunreachable-code -Wno-unused-local-typedef -D_LIBCPP_DISABLE_AVAILABILITY -fcoroutines-ts -Werror=thread-safety -Wuser-defined-warnings -D_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER -L /home/libcxx-builder/.buildkite-agent/builds/014f282531a3-1/llvm-project/libcxx-ci/build/generic-32bit/lib -Wl,-rpath,/home/libcxx-builder/.buildkite-agent/builds/014f282531a3-1/llvm-project/libcxx-ci/build/generic-32bit/lib -lunwind -ldl -Wl,--export-dynamic -o /home/libcxx-builder/.buildkite-agent/builds/014f282531a3-1/llvm-project/libcxx-ci/build/generic-32bit/Output/frameheadercache_test.pass.cpp.dir/t.tmp.exe
110 mslibcxx CI 32 bit > llvm-libunwind-shared-cfg-in.llvm-libunwind-shared-cfg-in::libunwind_01.pass.cpp
Script: -- : 'COMPILED WITH'; /usr/bin/c++ /home/libcxx-builder/.buildkite-agent/builds/014f282531a3-1/llvm-project/libcxx-ci/libunwind/test/libunwind_01.pass.cpp --target=x86_64-unknown-linux-gnu -m32 -nostdinc++ -I /home/libcxx-builder/.buildkite-agent/builds/014f282531a3-1/llvm-project/libcxx-ci/libunwind/include -funwind-tables -std=c++2b -Werror -Wall -Wextra -Wshadow -Wundef -Wno-unused-command-line-argument -Wno-attributes -Wno-pessimizing-move -Wno-c++11-extensions -Wno-user-defined-literals -Wno-noexcept-type -Wno-atomic-alignment -Wsign-compare -Wunused-variable -Wunused-parameter -Wunreachable-code -Wno-unused-local-typedef -D_LIBCPP_DISABLE_AVAILABILITY -fcoroutines-ts -Werror=thread-safety -Wuser-defined-warnings -D_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER -L /home/libcxx-builder/.buildkite-agent/builds/014f282531a3-1/llvm-project/libcxx-ci/build/generic-32bit/lib -Wl,-rpath,/home/libcxx-builder/.buildkite-agent/builds/014f282531a3-1/llvm-project/libcxx-ci/build/generic-32bit/lib -lunwind -ldl -Wl,--export-dynamic -o /home/libcxx-builder/.buildkite-agent/builds/014f282531a3-1/llvm-project/libcxx-ci/build/generic-32bit/Output/libunwind_01.pass.cpp.dir/t.tmp.exe
100 mslibcxx CI 32 bit > llvm-libunwind-shared-cfg-in.llvm-libunwind-shared-cfg-in::libunwind_02.pass.cpp
Script: -- : 'COMPILED WITH'; /usr/bin/c++ /home/libcxx-builder/.buildkite-agent/builds/014f282531a3-1/llvm-project/libcxx-ci/libunwind/test/libunwind_02.pass.cpp --target=x86_64-unknown-linux-gnu -m32 -nostdinc++ -I /home/libcxx-builder/.buildkite-agent/builds/014f282531a3-1/llvm-project/libcxx-ci/libunwind/include -funwind-tables -std=c++2b -Werror -Wall -Wextra -Wshadow -Wundef -Wno-unused-command-line-argument -Wno-attributes -Wno-pessimizing-move -Wno-c++11-extensions -Wno-user-defined-literals -Wno-noexcept-type -Wno-atomic-alignment -Wsign-compare -Wunused-variable -Wunused-parameter -Wunreachable-code -Wno-unused-local-typedef -D_LIBCPP_DISABLE_AVAILABILITY -fcoroutines-ts -Werror=thread-safety -Wuser-defined-warnings -D_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER -L /home/libcxx-builder/.buildkite-agent/builds/014f282531a3-1/llvm-project/libcxx-ci/build/generic-32bit/lib -Wl,-rpath,/home/libcxx-builder/.buildkite-agent/builds/014f282531a3-1/llvm-project/libcxx-ci/build/generic-32bit/lib -lunwind -ldl -Wl,--export-dynamic -o /home/libcxx-builder/.buildkite-agent/builds/014f282531a3-1/llvm-project/libcxx-ci/build/generic-32bit/Output/libunwind_02.pass.cpp.dir/t.tmp.exe
100 mslibcxx CI 32 bit > llvm-libunwind-shared-cfg-in.llvm-libunwind-shared-cfg-in::signal_frame.pass.cpp
Script: -- : 'COMPILED WITH'; /usr/bin/c++ /home/libcxx-builder/.buildkite-agent/builds/014f282531a3-1/llvm-project/libcxx-ci/libunwind/test/signal_frame.pass.cpp --target=x86_64-unknown-linux-gnu -m32 -nostdinc++ -I /home/libcxx-builder/.buildkite-agent/builds/014f282531a3-1/llvm-project/libcxx-ci/libunwind/include -funwind-tables -std=c++2b -Werror -Wall -Wextra -Wshadow -Wundef -Wno-unused-command-line-argument -Wno-attributes -Wno-pessimizing-move -Wno-c++11-extensions -Wno-user-defined-literals -Wno-noexcept-type -Wno-atomic-alignment -Wsign-compare -Wunused-variable -Wunused-parameter -Wunreachable-code -Wno-unused-local-typedef -D_LIBCPP_DISABLE_AVAILABILITY -fcoroutines-ts -Werror=thread-safety -Wuser-defined-warnings -D_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER -L /home/libcxx-builder/.buildkite-agent/builds/014f282531a3-1/llvm-project/libcxx-ci/build/generic-32bit/lib -Wl,-rpath,/home/libcxx-builder/.buildkite-agent/builds/014f282531a3-1/llvm-project/libcxx-ci/build/generic-32bit/lib -lunwind -ldl -Wl,--export-dynamic -o /home/libcxx-builder/.buildkite-agent/builds/014f282531a3-1/llvm-project/libcxx-ci/build/generic-32bit/Output/signal_frame.pass.cpp.dir/t.tmp.exe
View Full Test Results (8 Failed)

Event Timeline

fwolff requested review of this revision.Wed, Nov 17, 1:31 PM
fwolff created this revision.
Herald added 1 blocking reviewer(s): Restricted Project. · View Herald TranscriptWed, Nov 17, 1:31 PM

I believe the old code had a problem. I'm not fully convinced that the new code doesn't still have a similar problem. ;) But this PR seems like a good idea to me, and might even be a candidate for 13.x (@ldionne?).
Is it possible to write a very naïve test for this? E.g.

int main(int, char**)
{
    std::counting_semaphore s(0);
    std::barrier b(3);
    std::thread t1 = std::thread([&]() {
        for (int i=0; i < 100000; ++i) {
            s.acquire();
            b.arrive_and_wait();
        }
    });
    std::thread t2 = std::thread([&]() {
        for (int i=0; i < 100000; ++i) {
            s.acquire();
            b.arrive_and_wait();
        }
    });
    std::thread t3 = std::thread([&]() {
        for (int i=0; i < 100000; ++i) {
            s.release(1);
            s.release(1);
            b.arrive_and_wait();
        }
    });
    t1.join();
    t2.join();
    t3.join();

    return 0;
}

(This fails to reproduce the problem on my personal laptop, but I don't think that's too surprising.)

fwolff updated this revision to Diff 388044.Wed, Nov 17, 2:46 PM

I've added a simple test, although I couldn't reproduce the issue locally either.

fwolff updated this revision to Diff 388302.Thu, Nov 18, 1:11 PM
Quuxplusone accepted this revision as: Quuxplusone.Thu, Nov 18, 2:50 PM

LGTM % test nitpicks. However, I believe we should force @ldionne to look at this one. :) <semaphore> has already gotten two patches merged back to 13.x, and this might be another one.

libcxx/test/std/thread/thread.semaphore/lost_wakeup.pass.cpp
10

I assume that REQUIRES: long_tests is a tradeoff — faster tests in most configurations, but testing of this specific test only in one (or a few) configurations; right?
This test takes only 1.3 seconds on my laptop. Maybe we should just knock the iteration count down from 100'000 to 10'000 (so it takes 0.13 seconds) and then we can definitely remove this REQUIRES. Defer to @ldionne here.

34–37

Throughout: please int [ij] = 0 instead of auto [ij] = 0.
Also, I find this nested-loop version much harder to grok at first sight (I mean, maybe just because I wrote the other version myself ;)). IMO it would help to unroll the j loop.
I guess the real question is, does the change from "2 acquiring threads, 2 releases in 1 releasing thread" to "8 acquiring threads, 8 releases in 2 releasing threads" make the buggy situation more or less likely to occur? From the description in https://bugs.llvm.org/show_bug.cgi?id=47013#c1 , I think that increasing the number of waiting acquirers is good (because each adjacent pair of acquirers has a chance to race with each other, so we have 7x the chances to reproduce in each loop iteration), but I don't see how increasing the number of releasers helps. If you think it'd be harmless to go back to just one releaser (keeping all 8 acquirers, and ideally still unrolling the loop), that would simplify this code IMHO.

fwolff updated this revision to Diff 388335.Thu, Nov 18, 3:21 PM
fwolff marked 2 inline comments as done.Thu, Nov 18, 3:29 PM
fwolff added inline comments.
libcxx/test/std/thread/thread.semaphore/lost_wakeup.pass.cpp
34–37

It's hard to say. From my understanding, the releases have to happen in a quick enough succession that none of the acquirer threads get around to decrementing the variable in between. This might be more likely with two releaser threads to happen, but maybe not because it increases the contention on the atomic variable. I really don't know either, so I've implemented your suggestion for now.

ldionne requested changes to this revision.Mon, Nov 22, 10:37 AM
ldionne added a subscriber: __simt__.

I would actually like @__simt__ to take a look at this one.

If we do agree this is a proper fix, then yeah I would cherry-pick to release/13.x.

libcxx/test/std/thread/thread.semaphore/lost_wakeup.pass.cpp
10

REQUIRES: long_tests was used for slow devices like simulators. It's not really maintained anymore.

By default, it is enabled. I'm OK with the current state, i.e. we always run this test.

12

You will need to add:

// This test requires the dylib support introduced in D68480, which shipped in macOS 11.0.
// XFAIL: use_system_cxx_lib && target={{.+}}-apple-macosx10.{{9|10|11|12|13|14|15}}

You will also need to add:

// TODO(ldionne): This test fails on Ubuntu Focal on our CI nodes (and only there), in 32 bit mode.
// UNSUPPORTED: linux && 32bits-on-64bits

Sorry, I still have not figured this one out.

52–54

This is used for some freestanding configurations where there is no way to create a thread without providing a stack size. In those configurations, support::make_test_thread can be used to pass a stack size.

That isn't covered by CI yet so you couldn't notice.

This revision now requires changes to proceed.Mon, Nov 22, 10:37 AM

I've just read up the backstory of this.

I agree with the explanation of the bug, and agree the fix is to just notify_all() regardless of the update amount.

Wouldn't it also be possible to just unconditionally notify the atomic? Something like this:

__a.fetch_add(__update, memory_order_release);
if(__update > 1)
    __a.notify_all();
else
    __a.notify_one();

One downside is that there might be some unneeded calls to the platform wake function (futex, etc.), but it would have the advantage of avoiding thundering herd problems if there are many consumers and a single producer repeatedly calling notify_one().

fwolff updated this revision to Diff 389308.Tue, Nov 23, 1:37 PM
fwolff marked an inline comment as done.
fwolff marked 2 inline comments as done.
ldionne accepted this revision.Wed, Nov 24, 3:25 PM

Wouldn't it also be possible to just unconditionally notify the atomic? Something like this:

__a.fetch_add(__update, memory_order_release);
if(__update > 1)
    __a.notify_all();
else
    __a.notify_one();

One downside is that there might be some unneeded calls to the platform wake function (futex, etc.), but it would have the advantage of avoiding thundering herd problems if there are many consumers and a single producer repeatedly calling notify_one().

IMO it makes more sense to try and diminish spurious wakeups, but I'm far from an expert in that domain.

The current fix LGTM -- I did my due diligence and I understand the problem and how this fixes it. I'd simply like to make sure the test is not flaky.

libcxx/include/semaphore
87–91

We might want to reformulate this as

if(__a.fetch_add(__update, memory_order_release) == 0)
  __a.notify_all();

This seems easier to understand.

libcxx/test/std/thread/thread.semaphore/lost_wakeup.pass.cpp
2

Do you think this test is the reason why the CI nodes timed out after running for 2 hours? It seems like it's not consistently slow (I just restarted the macOS CI and it finished quickly, but before that it froze for 2 hours, which I've never seen). https://buildkite.com/llvm-project/libcxx-ci/builds/6835#4278d152-fa4d-482f-a12e-e9697caa99e3

This revision is now accepted and ready to land.Wed, Nov 24, 3:25 PM
fwolff updated this revision to Diff 389889.Thu, Nov 25, 4:52 PM
fwolff marked 2 inline comments as done.Thu, Nov 25, 5:05 PM
fwolff added inline comments.
libcxx/test/std/thread/thread.semaphore/lost_wakeup.pass.cpp
2

I don't see what could be causing this, but I agree that it looks suspicious.

The fix itself should be uncontroversial, though, at least as far as correctness is concerned, because we are always waking up at least as many threads as before, so if anything, we're making deadlocks less likely to occur.

What options do we have regarding this test? Could the problem also lie with, say, std::barrier?

ldionne added inline comments.Wed, Dec 1, 8:56 AM
libcxx/test/std/thread/thread.semaphore/lost_wakeup.pass.cpp
2

I pulled the patch down to run it locally. First time went well, second time -- I'm still running the test after a couple minutes. It looks like it's hanging for some reason.

2

Oh and I'm running on a Mac obviously.