Page MenuHomePhabricator

[Support] ThreadPoolExecutor fixes for Windows/MinGW
ClosedPublic

Authored by andrewng on Nov 19 2019, 8:38 AM.

Details

Summary

Changed ThreadPoolExecutor to no longer use detached threads and instead
to join threads on destruction. This is to prevent intermittent crashing
on Windows when doing a normal full exit, e.g. via exit().

Changed ThreadPoolExecutor to be a ManagedStatic so that it can be
stopped on llvm_shutdown(). Without this, it would only be stopped in
the destructor when doing a full exit. This is required to avoid
intermittent crashing on Windows due to a race condition between the
ThreadPoolExecutor starting up threads and the process doing a fast
exit, e.g. via _exit().

The Windows crashes appear to only occur with the MSVC static runtimes
and are more frequent with the debug static runtime.

These changes also prevent intermittent deadlocks on exit with the MinGW
runtime.

Diff Detail

Event Timeline

andrewng created this revision.Nov 19 2019, 8:38 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 19 2019, 8:38 AM

This patch basically reduces the amount of dependence that ThreadPoolExecutor has on what is effectively undefined behaviour. The changes have been based on issues that have been observed when running LLD on Windows built with the MSVC (2015 & 2017) static run-times and MinGW LLD on Windows.

The patch has been tested on Ubuntu 18.04.3 LTS. I have checked that there is little to no impact on LLD lit testing performance, as the lit testing disables the "fast" exit mode of LLD. In fact, this patch appears to give a slight performance improvement with higher core counts.

ormris added a subscriber: ormris.Nov 19 2019, 11:14 AM

FWIW, this seems to touch on areas also (partially, incompletely) discussed earlier in D53968. (In that one, the issues seemed to be specific to builds with LLVM_LINK_LLVM_DYLIB enabled.)

Also regarding deadlocks on exit in MinGW, for cases with libstdc++ with winpthreads, there are some known such deadlocks that were caused by signaling a condvar without holding the corresponding mutex (see https://sourceforge.net/p/mingw-w64/bugs/774/ for a report on the matter); if I recall correctly this particular deadlock should now be fixed with the latest version of winpthreads, but there might still be other deadlocks lingering.

How come we have both lib/Support/Parallel.cpp and lib/Support/ThreadPool.cpp? What is the difference? I thought this was the general LLVM threadpool and I was going to send this over to @mehdi_amini who added that in rG396abbb6f045232f92439dbce052db3177a474f9, but I guess @zturner added Parallel.cpp.

In D70447#1760618, @rnk wrote:

How come we have both lib/Support/Parallel.cpp and lib/Support/ThreadPool.cpp? What is the difference? I thought this was the general LLVM threadpool and I was going to send this over to @mehdi_amini who added that in rG396abbb6f045232f92439dbce052db3177a474f9, but I guess @zturner added Parallel.cpp.

LLD started the Parallel.h thread-pool in 2013, unfortunately they didn't do it in libSupport so it grew inside LLD. I added ThreadPool.cpp in 2015 to LLVM, I wasn't aware about the LLD implementation (that's what happens when utilities aren't put in libSupport...). Zachary noticed that later, and moved the LLD implementation in libSupport in 2017. However I haven't seen a plan for reconciliation the two.

I'm not fond of global state in general (this patch is also a good example of why), and so reluctant to the Parallel.cpp implementation.

ruiu added a comment.Nov 27 2019, 11:17 PM

Aren't Parallel and ThreadPool different? We are using parallel_for_loop but we are not using any thread pool.

Aren't Parallel and ThreadPool different? We are using parallel_for_loop but we are not using any thread pool.

It seems to me that parallel_for_loop is nothing else than a thin abstraction over a thread pool?

https://github.com/llvm/llvm-project/blob/master/llvm/lib/Support/Parallel.cpp#L35

I have also tested this patch with LLD built with libc++ on Ubuntu 18.04.3 LTS.

Does anyone have any comments on the actual patch itself?

I think it's beyond the scope of this patch to resolve the issues regarding the existence of both Parallel.cpp and ThreadPool.cpp. I agree with @ruiu that they serve different purposes and although some functionality could be shared/merged, I'm not entirely sure it would be worthwhile. Until we get to C++17, we're kind of stuck with Parallel.cpp for now...

rnk added a comment.Dec 3 2019, 11:54 AM

I raised the question of merging the thread pool implementations if possible because this code is subtle and hard to understand, and I was hoping I could get out of understanding it by reusing some code that already works. Now that I look at the other thread pool, I see it doesn't manage a global thread pool instance, so it doesn't suffer from the shutdown bugs you are running into. I will chalk that up as a win for not using global variables. =P

Given that this code is already written to use a global default executor instance, I think your proposal as I understand it makes sense:

  • Create the threadpool lazily on first access with static locals.
  • Stop the threads on llvm_shutdown if and when that is called, but do not deallocate any memory.
  • Deallocate the memory if and when the image is unloaded via a static unique_ptr.

I see that LLD today calls llvm_shutdown to destroy ManagedStatics:
https://github.com/llvm/llvm-project/blob/master/lld/Common/ErrorHandler.cpp#L59
Please update that comment to indicate that this will also stop any threads that have been spawned, and this is critical for avoiding crashes on shutdown. It's actually a waste of time to deallocate other ManagedStatics, so someone might try to remove this at some point.

However, I wonder if LLD (and any other hypothetical clients of Parallel.h) should explicitly cancel, join, or detach threads during shutdown instead.

  • Stop the threads on llvm_shutdown if and when that is called, but do not deallocate any memory.

To be clear (and as stated in the code comment), this only stops the threads from taking on more tasks and does not actually wait for running threads to finish. The important thing that does occur, is that it ensures there is no further thread creation going on which could cause a crash on Windows as the main thread is doing process exit.

  • Deallocate the memory if and when the image is unloaded via a static unique_ptr.

This also joins the threads, again to avoid intermittent crashes on Windows.

I see that LLD today calls llvm_shutdown to destroy ManagedStatics:
https://github.com/llvm/llvm-project/blob/master/lld/Common/ErrorHandler.cpp#L59
Please update that comment to indicate that this will also stop any threads that have been spawned, and this is critical for avoiding crashes on shutdown. It's actually a waste of time to deallocate other ManagedStatics, so someone might try to remove this at some point.

I will update the comment. This is already needed for -time-passes for LTO and I'm not sure it's entirely safe to generalize that all ManagedStatics do not need to be "shutdown".

However, I wonder if LLD (and any other hypothetical clients of Parallel.h) should explicitly cancel, join, or detach threads during shutdown instead.

The only truly "safe" option is to always join threads on exit. This should avoid any possible undefined behaviour. However, this has been ruled out for performance reasons :(.

andrewng updated this revision to Diff 232129.Dec 4 2019, 7:20 AM

Updated the comment about llvm_shutdown() in LLD.

ruiu added inline comments.Dec 10 2019, 11:02 PM
llvm/lib/Support/Parallel.cpp
135

OK, I had to take a look at ManagedStatic's class definition to understnad that ManagedStatic is a pointer-ish class which defines operator*.

ManagedStatic::operator* calls RegisterManagedStatic if it is not registered yet. So I guess you don't actually have to create a unique pointer, but instead you can just do return &*ManagedExec;.

andrewng marked an inline comment as done.Dec 11 2019, 4:27 AM
andrewng added inline comments.
llvm/lib/Support/Parallel.cpp
135

As mentioned in the comment, the unique_ptr is used to actually destroy the ThreadPoolExecutor via the destructor which "stops" the ThreadPoolExecutor and also waits for the threads to complete.

The "destroy" of the ManagedStatic only "stops" the ThreadPoolExecutor and does not wait for the threads to complete.

rnk added a comment.Dec 11 2019, 11:43 AM

Sorry, I lost track of this.

llvm/lib/Support/Parallel.cpp
135

You don't truly need unique_ptr to set up an atexit call, all you need is an atexit callback that runs after the ManagedStatic is destroyed. I think the idiomatic way to do that is to make your own class with a ctor/dtor pair that saves a pointer to the ThreadPoolExecutor, and then calls the appropriate methods on it when it is destroyed. I think that will make it clearer that no memory deallocation is occuring.

andrewng marked an inline comment as done.Dec 12 2019, 8:59 AM
andrewng added inline comments.
llvm/lib/Support/Parallel.cpp
135

With the unique_ptr, the ThreadPoolExecutor is destructed and its memory is deallocated. That was the intention and hence the use of unique_ptr.

Are you saying you would prefer that there is no memory deallocation of the ThreadPoolExecutor?

rnk added a subscriber: MaskRay.Dec 16 2019, 3:29 PM
llvm/lib/Support/Parallel.cpp
135

I see, no, I think I just misunderstood.

I guess there is a use case here that we aren't handling, which is multiple LLVM sessions in one process, where LLVM is shutdown multiple times. The way this code is structured, the thread pool will be created, started, "stopped", and then it will remain stopped for all future sessions. Right now LLD is the only client, but this code is in Support, so it should be general. I see that it is not used by the rest of LLVM. Perhaps it should be moved back to LLD Core, so that we can make such assumptions about process lifetime.

Right now LLD is the only client, but this code is in Support, so it should be general. I see that it is not used by the rest of LLVM. Perhaps it should be moved back to LLD Core, so that we can make such assumptions about process lifetime.

lld/Common/Threads.h is the only user of llvm/Support/Parallel.h. parallelForEach and parallelSort in lld do not take a context parameter, and hence the implementation of llvm/Support/Parallel.h uses a function-local static variable. If we can add a context parameter (llvm::ThreadPool), we may be able to make llvm/Support/Parallel.h more general and probably re-implement it on top of lib/Support/ThreadPool.cpp . If we can achieve that goal, do we still need to move the parallel utilities to lld?

rriddle added a subscriber: rriddle.EditedDec 16 2019, 5:38 PM

FWIW I've been using parallel::for_each in a few places in MLIR to do multi-threading, because it seemed like the right solution for what I needed. From what I gather here it seems like I should switch all of those usages to ThreadPool instead. If we can't resolve the deficiencies present in the current implementation it would be nice to move it out of Support/ so that others don't make the same mistake.

lld/Common/Threads.h is the only user of llvm/Support/Parallel.h. parallelForEach and parallelSort in lld do not take a context parameter, and hence the implementation of llvm/Support/Parallel.h uses a function-local static variable. If we can add a context parameter (llvm::ThreadPool), we may be able to make llvm/Support/Parallel.h more general and probably re-implement it on top of lib/Support/ThreadPool.cpp . If we can achieve that goal, do we still need to move the parallel utilities to lld?

I think the lack of a context parameter was deliberate to closer mimic the C++17 execution policies feature.

andrewng marked an inline comment as done.Dec 17 2019, 5:52 AM
andrewng added inline comments.
llvm/lib/Support/Parallel.cpp
135

Another solution would be to add support for "restarting" to this patch. It probably wouldn't be super efficient but it should be possible to handle "multiple shutdowns". Although it has to be said that "multiple shutdowns" just doesn't really sound right...

andrewng marked an inline comment as done.Dec 18 2019, 11:11 AM
andrewng added inline comments.
llvm/lib/Support/Parallel.cpp
135

There is potentially another alternative to using ManagedStatic / llvm_shutdown() by instead using std::at_quick_exit() along with std::quick_exit(). As far as I can tell, std::quick_exit() is equivalent to _exit(), except for the calling of the std::at_quick_exit() handlers.

Could this be a preferred option?

rnk added a comment.Dec 18 2019, 1:08 PM

I guess I like the quick_exit idea. Supporting multiple shutdowns of a thread pool seems fraught with complexity.

I also think it's kind of reasonable to keep the Parallel.h APIs simple. You don't actually want to have more than one threadpool per process running at the same time, so if we were to change all the APIs to pass a context parameter, I'm not sure that'd be a good thing.

In D70447#1790286, @rnk wrote:

I also think it's kind of reasonable to keep the Parallel.h APIs simple.

+1

llvm/lib/Support/Parallel.cpp
135

Can you elaborate how you'd use quick_exit and at_quick_exit?

andrewng marked an inline comment as done.Dec 19 2019, 8:52 AM
andrewng added inline comments.
llvm/lib/Support/Parallel.cpp
135

Instead of ManagedStatic, I would create a similar "holding" class that would hook into std::quick_exit() using std::at_quick_exit() to perform the "stop". However, this means that LLD would need to call std::quick_exit() rather than _exit().

In my current prototyping of this alternative, I'm actually running into a new issue with the MSVC static run-time, in that I appear now to be intermittently losing output to stderr and thus getting random lit test failures for where LLD is expected to fail and output an error message. Note that in these situations, LLD is exiting in a thread from the ThreadPoolExecutor!

In D70447#1790286, @rnk wrote:

I guess I like the quick_exit idea. Supporting multiple shutdowns of a thread pool seems fraught with complexity.

Why exactly? Isn't it just about joining threads and deleting them? I feel I am missing something.

I also think it's kind of reasonable to keep the Parallel.h APIs simple. You don't actually want to have more than one threadpool per process running at the same time, so if we were to change all the APIs to pass a context parameter, I'm not sure that'd be a good thing.

I'm not sure I totally agree with this: a global thread pool without preemption and priority inversion seems to be easily leading to deadlock through starvation. It is also breaking composability of library components since you may end-up enqueuing to the same pool from a thread in the pool.
Explicitly controlling the context may be suboptimal in some cases, but at least it allows to keep things scoped inside a component.

andrewng marked an inline comment as done.Dec 20 2019, 7:27 AM

I'm not sure I totally agree with this: a global thread pool without preemption and priority inversion seems to be easily leading to deadlock through starvation. It is also breaking composability of library components since you may end-up enqueuing to the same pool from a thread in the pool.

There is already a "primitive" guard to prevent nesting of parallel invocations. Although this guard could also prevent other threads (not belonging to the "parallel" thread pool) from enqueuing parallel tasks.

llvm/lib/Support/Parallel.cpp
135

Another problem with std::quick_exit() / std::at_quick_exit() is that MinGW does not support it!

rnk accepted this revision.Jan 3 2020, 3:58 PM

Given that quick_exit won't work then, let's do this thing. I don't think multiple llvm_shutdown is a real use case for anyone using these APIs today.

This revision is now accepted and ready to land.Jan 3 2020, 3:58 PM
MaskRay added inline comments.Jan 3 2020, 5:52 PM
llvm/lib/Support/Parallel.cpp
47

std::lock_guard

51

Why if (Stop) break

70

You can use

static ManagedStatic<ThreadPoolExecutor, object_creator<ThreadPoolExecutor>,
                     ThreadPoolExecutor::Deleter>
    ManagedExec;
75

Why detach?

andrewng marked 5 inline comments as done.Jan 6 2020, 6:49 AM
andrewng added inline comments.
llvm/lib/Support/Parallel.cpp
47

I will also change stop() and add() to use a scoped lock which is actually my preferred style.

51

This is an "early out" in the case that the pool is stopped before all the threads of the pool have been created.

75

This is to follow the general guideline that a thread should either be detached or joined at exit.

andrewng updated this revision to Diff 236355.Jan 6 2020, 6:52 AM
andrewng marked an inline comment as done.

Updated to address review comments.

MaskRay accepted this revision.Jan 7 2020, 9:59 AM
This revision was automatically updated to reflect the committed changes.