This is an archive of the discontinued LLVM Phabricator instance.

[libcxx] [test] Repair thread unsafety in thread tests
ClosedPublic

Authored by BillyONeal on Aug 9 2018, 9:48 PM.

Details

Summary
  • Repair thread-unsafe modifications of n_alive in F.pass.cpp

In this example, the ctor of G runs in the main thread in the expression G(), and also in the copy ctor of G() in the DECAY_COPY inside std::thread. The main thread destroys the G() instance at the semicolon, and the started thread destroys the G() after it returns. Thus there is a race between the threads on the n_alive variable.

The fix is to join with the background thread before attempting to destroy the G in the main thread.

Diff Detail

Event Timeline

BillyONeal created this revision.Aug 9 2018, 9:48 PM

We should add some TSAN folks to this review, since I think these are also TSAN false negatives; perhaps correctly, perhaps not.

We should add some TSAN folks to this review, since I think these are also TSAN false negatives; perhaps correctly, perhaps not.

Do you know who that is / can you add them?

I was hoping my comment would summon them magically. I'll figure it out shortly.

CC some sanitizer folks.

dvyukov added inline comments.Aug 13 2018, 9:54 AM
test/std/thread/thread.threads/thread.thread.class/thread.thread.member/detach.pass.cpp
73 ↗(On Diff #160055)

I don't immediately see how the race on n_alive/op_run happens. It seems that the modifications in the thread happen before this line, and modifications in main happen after this line. How can both of them modify the variables at the same time?

BillyONeal added inline comments.Aug 13 2018, 10:01 AM
test/std/thread/thread.threads/thread.thread.class/thread.thread.member/detach.pass.cpp
73 ↗(On Diff #160055)

The destruction of g here races with the destruction of the DECAY_COPY'd copy of G used as the parameter of operator(). That is, line 69 creates a copy of g, passes that to the started thread, the started thread calls gCopy(). gCopy() doesn't return until the done flag is set, but the destruction of the object on which op() is being called is not so synchronized. Most of the other thread tests avoid this problem by joining with the thread; joining waits for the destruction of the DECAY_COPY'd parameters, but this does not.

(This is one of the reasons detach() should basically never be called anywhere)

BillyONeal added inline comments.Aug 13 2018, 10:05 AM
test/std/thread/thread.threads/thread.thread.class/thread.thread.member/detach.pass.cpp
73 ↗(On Diff #160055)

(That is to say, there's nothing to prevent both threads from executing G::!G() on the two different copies of g... making op_run atomic is probably avoidable but I'm being paranoid given that there was already thread unsafety here...)

dvyukov added inline comments.Aug 13 2018, 10:07 AM
test/std/thread/thread.threads/thread.thread.class/thread.thread.member/detach.pass.cpp
73 ↗(On Diff #160055)

What is gCopy? I don't see anything named gCopy in this file...

Do we care about completion of destruction? Why? We wait for done to be set, and other variables are already updated at that point. Why does it matter that "the destruction of the object on which op() is being called is not so synchronized."?

BillyONeal added inline comments.Aug 13 2018, 10:11 AM
test/std/thread/thread.threads/thread.thread.class/thread.thread.member/detach.pass.cpp
73 ↗(On Diff #160055)

Because the destructor does --n_alive;

BillyONeal added inline comments.Aug 13 2018, 10:21 AM
test/std/thread/thread.threads/thread.thread.class/thread.thread.member/detach.pass.cpp
73 ↗(On Diff #160055)

What is gCopy? I don't see anything named gCopy in this file...

The copy is made in the constructor of std::thread. std::thread makes a copy of all the input parameters, gives the copy to the started thread, and then std::invoke is called there.

Why does it matter that "the destruction of the object on which op() is being called is not so synchronized."?

Because the two dtors race on --n_alive; when n_alive is not atomic.

dvyukov added inline comments.Aug 13 2018, 10:26 AM
test/std/thread/thread.threads/thread.thread.class/thread.thread.member/detach.pass.cpp
73 ↗(On Diff #160055)

But the first dtor runs before "while (!done) {}" line and the second dtor runs after "while (!done) {}" line, no?
Or there is third object involved? But then I don't see how joining the thread would help either.

BillyONeal added inline comments.Aug 13 2018, 10:33 AM
test/std/thread/thread.threads/thread.thread.class/thread.thread.member/detach.pass.cpp
73 ↗(On Diff #160055)

But the first dtor runs before "while (!done) {}" line

No, both dtors are run after the while (!done) {} line. The first dtor runs on line 76 (when the local variable g is destroyed), and the second dtor runs after operator() returns in the constructed thread. The constructed thread is morally doing:

void threadproc(G * g) {
    g->operator(); // setting done happens in here
    delete g; // dtor of second copy runs here
}

I don't see how joining the thread would help either.

Joining with the thread would wait for the second dtor -- the one after op() returns -- to complete. Of course joining with the thread isn't doable here given that the point is to test thread::detach :)

BillyONeal added inline comments.Aug 13 2018, 10:38 AM
test/std/thread/thread.threads/thread.thread.class/thread.thread.member/detach.pass.cpp
75 ↗(On Diff #160055)

Actually I just realized that this assert is bogus too; there's no synchronization to ensure the copied g is destroyed here.

dvyukov added inline comments.Aug 13 2018, 4:41 PM
test/std/thread/thread.threads/thread.thread.class/thread.thread.member/detach.pass.cpp
73 ↗(On Diff #160055)

No, both dtors are run after the while (!done) {} line.

But how do we get past while (!done) line before the desctructor in the thread has finished? The destructor sets done. So after while (!done) line the destructor is effectively finished. What am I missing?

BillyONeal added inline comments.Aug 13 2018, 5:59 PM
test/std/thread/thread.threads/thread.thread.class/thread.thread.member/detach.pass.cpp
73 ↗(On Diff #160055)

The destructor sets done.

Hmmm I thought done was getting set on 56 but that's done_ (with a trailing underscore). :sigh:

BillyONeal edited the summary of this revision. (Show Details)

Remove changes to detach tests.

EricWF accepted this revision.Oct 12 2018, 9:48 PM
This revision is now accepted and ready to land.Oct 12 2018, 9:48 PM
BillyONeal closed this revision.Oct 19 2018, 4:47 PM

Committed r344820