This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
lib/Support/
-
Support/
-
Parallel.cpp

Differential D53968

[LLD]fix windows mingw build with 'LLVM_LINK_LLVM_DYLIB' the lld will hang when exiting
Needs ReviewPublic

Authored by SquallATF on Oct 31 2018, 11:37 PM.

Download Raw Diff

Details

Reviewers

mstorsjo
ruiu
zturner

Summary

Windows mingw shared build LLVM the work threads will exit before static ThreadPoolExecutor exec object's destructor run. Then the Done.dec() function will not be executed, lld will hang at Latch destructor. So need to use llvm::ManagedStatic wrap it to protect work threads safe exit when call llvm_shutdown().

Diff Detail

Repository

rL LLVM

Build Status

Buildable 24441
Build 24440: arc lint + arc unit

Event Timeline

SquallATF created this revision.Oct 31 2018, 11:37 PM

Herald added subscribers: llvm-commits, kristina. · View Herald TranscriptOct 31 2018, 11:37 PM

Harbormaster completed remote builds in B24441: Diff 172091.Oct 31 2018, 11:37 PM

mstorsjo added a subscriber: mstorsjo.Nov 1 2018, 5:14 AM

Does it hang on a normal exit or only on an abnormal exit? Does it always hang?

+Martin for the MinGW specific aspect of this, I can't comment on that.

However, we usually don't put ManagedStatic inside of functions. Just make them global.

I _think_ this isn't mingw specific, but I don't often do MSVC based builds...

However, I think I have run into this issue before, in a different setup. When exiting lld, we exit by calling llvm_shutdown(); _exit(Val); - as this is supposed not to run destructors. At some point I had a broken setup with global destructors, so they ran even when _exit() was called, and I ran into a hang with that. I just fixed that by fixing the mingw global destructors to not run on _exit().

However, in the case reported here, if built with LLVM_LINK_LLVM_DYLIB enabled, the global destructors (from this object here) are in a different DLL. When exiting the process with _exit(), global destructors in other DLLs actually do run - both with mingw and MSVC builds.

I never did dig into the matter any further back then, so I can't confirm/deny the explanation in this post (which is a bit hard to follow), but I would guess that you can try to reproduce the same issue by changing _exit() to exit() in LLD, and try running LLD in a normal windows build - I think it could be reproducible in MSVC style builds as well.

I think it is OK to wrap these objects with ManagedStatic as long as it fixes a crash bug and doesn't do any harm.

This bug only in shared dll, static build does not trigger this bug Here is a issue with another project here that is very similar to this one.
I create a small test code to reproduce the bug. I have tried use clang, MSVC and gcc,I post MSVC build command here.

cpptest.zip1 KBDownload

bug build, test program will hang when exiting.

cl /EHsc /LD cppdll.cpp /Fecppdllcl.dll
cl cppexe.cpp /Fecppexecl.exe /link cppdllcl.lib

static build does not have this bug

cl /EHsc cppexe.cpp cppdll.cpp -DSTATIC /Fecppdllcls.exe

fix use std::weak_ptr, but std::weak_ptr is not a real singleton.

cl /EHsc /LD cppdll.cpp -DFIX /Fecppdllclfix.dll
cl cppexe.cpp /Fecppexeclfix.exe -DFIX /link cppdllclfix.lib

In D53968#1284789, @ruiu wrote:

I think it is OK to wrap these objects with ManagedStatic as long as it fixes a crash bug and doesn't do any harm.

If I understand this correctly, this change makes lld on all other platforms to clean up the default thread pool executor in the call to llvm_shutdown() in exitLld(), before calling _exit(). So it potentially has some performance effect for other platforms.

Btw, it does indeed seem that you can't repro this issue by changing _exit() into exit(), I was mistaken about this.

After looking a bit closer at the repro example, as far as I understand it, the main issue is that you cannot do threading cleanup from within a DLL unload handler (DLLMain), but global objects' destructors within a DLL run within the unload handler.

As long as the main LLVM code is linked within lld.exe, the LLVM threadpool executor destructor doesn't get run (or if changing _exit()into exit(), the destructors run in a context where it is ok to clean up threads), and everything is fine. But if the threadpool code is within LLVM code built as a separate DLL, the global destructor for static ThreadPoolExecutor exec; does run, in DLLMain when unloading the DLL, where you can't do thread cleanup and the destructor subsequently hangs.

So if @ruiu is ok with the performance impact on always running this destructor when exiting lld, this change probably is fine. If not, we'd have to do something like this:

#if defined(_WIN32) && defined(LLVM_LINK_LLVM_DYLIB) // I don't think there's a define for this?
static llvm::ManagedStatic<ThreadPoolExecutor> exec; // This destructor can't run on DLL unload where it hangs; make sure it is called already on llvm_shutdown().
#else
static ThreadPoolExecutor exec;
#endif

mstorsjo added reviewers: ruiu, zturner.Nov 2 2018, 11:30 PM

I don't think I understand what could run as a result of destructing the thread pool executor on exit. My guess is that it calls the destructors of thread-local global objects and then call the operating system to kill threads. If that's the case, the cost of doing that should be negligible because we don't have any thread-local globals. But is that understanding correct?

In D53968#1286945, @ruiu wrote:

I don't think I understand what could run as a result of destructing the thread pool executor on exit. My guess is that it calls the destructors of thread-local global objects and then call the operating system to kill threads. If that's the case, the cost of doing that should be negligible because we don't have any thread-local globals. But is that understanding correct?

I guess so, plus the overhead of actually joining and waiting for all the threads to complete.

Put another way though; lld currently exits with _exit instead of exit explicitly to skip running destructors, for performance reasons. If you have a case where you can measure the difference between these two, to conclude that the distinction is worthwhile, it should be easy to measure the impact of this patch, whether it still keeps the same beneficial tradeoff as before, or if it goes closer to the case of exit with running all destructors, which we explicitly didn't want to.

If it's isn't doable to actually measure any such difference, I would say this patch also should be fine because whatever difference it makes is within the unmeasurable range.

I guess so, plus the overhead of actually joining and waiting for all the threads to complete.

Waiting for all the threads for completion may incur a noticeable delay when exiting in an error condition. Currently, when we exit from one thread running a parallel-for loop, I believe the entire process exit immediately, but if we wait for all thread to complete, I think we'll wait for the parallel-for loop to complete. That situation could happen if we are processing relocations, for example, and found a lot of relocation errors.

In D53968#1286962, @ruiu wrote:

I guess so, plus the overhead of actually joining and waiting for all the threads to complete.

Waiting for all the threads for completion may incur a noticeable delay when exiting in an error condition. Currently, when we exit from one thread running a parallel-for loop, I believe the entire process exit immediately, but if we wait for all thread to complete, I think we'll wait for the parallel-for loop to complete. That situation could happen if we are processing relocations, for example, and found a lot of relocation errors.

Ok, I see.

I think this one might be correct in general though (for any other tool that uses the LLVM internals built into a separate DLL), but for LLD for wanting to exit quickly, should there be a way to tell the ThreadPoolExecutor object to skip any cleanup in the destructor?

I think this one might be correct in general though (for any other tool that uses the LLVM internals built into a separate DLL), but for LLD for wanting to exit quickly, should there be a way to tell the ThreadPoolExecutor object to skip any cleanup in the destructor?

Yeah I guess so. The point of existing as quickly as possible if we decide to exit is to minimize the latency for interactive users, and I think if we can achieve that goal by introducing some mechanism, it's probably worth it.

grimar added a subscriber: grimar.Nov 5 2018, 6:58 AM

mati865 added a subscriber: mati865.Dec 28 2018, 2:55 AM

mstorsjo mentioned this in D70447: [Support] ThreadPoolExecutor fixes for Windows/MinGW.Nov 19 2019, 12:36 PM

Revision Contents

Path

Size

lib/

Support/

Parallel.cpp

17 lines

Diff 172091

lib/Support/Parallel.cpp

//===- llvm/Support/Parallel.cpp - Parallel algorithms --------------------===//		//===- llvm/Support/Parallel.cpp - Parallel algorithms --------------------===//
//		//
// The LLVM Compiler Infrastructure		// The LLVM Compiler Infrastructure
//		//
// This file is distributed under the University of Illinois Open Source		// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.		// License. See LICENSE.TXT for details.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "llvm/Support/Parallel.h"		#include "llvm/Support/Parallel.h"
#include "llvm/Config/llvm-config.h"		#include "llvm/Config/llvm-config.h"

#if LLVM_ENABLE_THREADS		#if LLVM_ENABLE_THREADS

		#include "llvm/Support/ManagedStatic.h"
#include "llvm/Support/Threading.h"		#include "llvm/Support/Threading.h"

#include <atomic>		#include <atomic>
#include <stack>		#include <stack>
#include <thread>		#include <thread>

using namespace llvm;		using namespace llvm;

namespace {		namespace {

/// An abstract class that takes closures and runs them asynchronously.		/// An abstract class that takes closures and runs them asynchronously.
class Executor {		class Executor {
public:		public:
virtual ~Executor() = default;		virtual ~Executor() = default;
virtual void add(std::function<void()> func) = 0;		virtual void add(std::function<void()> func) = 0;

static Executor *getDefaultExecutor();		static Executor &getDefaultExecutor();
};		};

#if defined(_MSC_VER)		#if defined(_MSC_VER)
/// An Executor that runs tasks via ConcRT.		/// An Executor that runs tasks via ConcRT.
class ConcRTExecutor : public Executor {		class ConcRTExecutor : public Executor {
struct Taskish {		struct Taskish {
Taskish(std::function<void()> Task) : Task(Task) {}		Taskish(std::function<void()> Task) : Task(Task) {}

std::function<void()> Task;		std::function<void()> Task;

static void run(void *P) {		static void run(void *P) {
Taskish Self = static_cast<Taskish >(P);		Taskish Self = static_cast<Taskish >(P);
Self->Task();		Self->Task();
concurrency::Free(Self);		concurrency::Free(Self);
}		}
};		};

public:		public:
virtual void add(std::function<void()> F) {		virtual void add(std::function<void()> F) {
Concurrency::CurrentScheduler::ScheduleTask(		Concurrency::CurrentScheduler::ScheduleTask(
Taskish::run, new (concurrency::Alloc(sizeof(Taskish))) Taskish(F));		Taskish::run, new (concurrency::Alloc(sizeof(Taskish))) Taskish(F));
}		}
};		};

Executor *Executor::getDefaultExecutor() {		Executor &Executor::getDefaultExecutor() {
static ConcRTExecutor exec;		static llvm::ManagedStatic<ConcRTExecutor> exec;
return &exec;		return *exec;
}		}

#else		#else
/// An implementation of an Executor that runs closures on a thread pool		/// An implementation of an Executor that runs closures on a thread pool
/// in filo order.		/// in filo order.
class ThreadPoolExecutor : public Executor {		class ThreadPoolExecutor : public Executor {
public:		public:
explicit ThreadPoolExecutor(unsigned ThreadCount = hardware_concurrency())		explicit ThreadPoolExecutor(unsigned ThreadCount = hardware_concurrency())
Show All 40 Lines	private:

std::atomic<bool> Stop{false};		std::atomic<bool> Stop{false};
std::stack<std::function<void()>> WorkStack;		std::stack<std::function<void()>> WorkStack;
std::mutex Mutex;		std::mutex Mutex;
std::condition_variable Cond;		std::condition_variable Cond;
parallel::detail::Latch Done;		parallel::detail::Latch Done;
};		};

Executor *Executor::getDefaultExecutor() {		Executor &Executor::getDefaultExecutor() {
static ThreadPoolExecutor exec;		static llvm::ManagedStatic<ThreadPoolExecutor> exec;
return &exec;		return *exec;
}		}
#endif		#endif
}		}

void parallel::detail::TaskGroup::spawn(std::function<void()> F) {		void parallel::detail::TaskGroup::spawn(std::function<void()> F) {
L.inc();		L.inc();
Executor::getDefaultExecutor()->add([&, F] {		Executor::getDefaultExecutor().add([&, F] {
F();		F();
L.dec();		L.dec();
});		});
}		}
#endif // LLVM_ENABLE_THREADS		#endif // LLVM_ENABLE_THREADS