Windows mingw shared build LLVM the work threads will exit before static ThreadPoolExecutor exec object's destructor run. Then the Done.dec() function will not be executed, lld will hang at Latch destructor. So need to use llvm::ManagedStatic wrap it to protect work threads safe exit when call llvm_shutdown().
Diff Detail
- Repository
- rL LLVM
- Build Status
Buildable 24441 Build 24440: arc lint + arc unit
Event Timeline
+Martin for the MinGW specific aspect of this, I can't comment on that.
However, we usually don't put ManagedStatic inside of functions. Just make them global.
I _think_ this isn't mingw specific, but I don't often do MSVC based builds...
However, I think I have run into this issue before, in a different setup. When exiting lld, we exit by calling llvm_shutdown(); _exit(Val); - as this is supposed not to run destructors. At some point I had a broken setup with global destructors, so they ran even when _exit() was called, and I ran into a hang with that. I just fixed that by fixing the mingw global destructors to not run on _exit().
However, in the case reported here, if built with LLVM_LINK_LLVM_DYLIB enabled, the global destructors (from this object here) are in a different DLL. When exiting the process with _exit(), global destructors in other DLLs actually do run - both with mingw and MSVC builds.
I never did dig into the matter any further back then, so I can't confirm/deny the explanation in this post (which is a bit hard to follow), but I would guess that you can try to reproduce the same issue by changing _exit() to exit() in LLD, and try running LLD in a normal windows build - I think it could be reproducible in MSVC style builds as well.
I think it is OK to wrap these objects with ManagedStatic as long as it fixes a crash bug and doesn't do any harm.
This bug only in shared dll, static build does not trigger this bug Here is a issue with another project here that is very similar to this one.
I create a small test code to reproduce the bug. I have tried use clang, MSVC and gcc,I post MSVC build command here.
bug build, test program will hang when exiting.
cl /EHsc /LD cppdll.cpp /Fecppdllcl.dll cl cppexe.cpp /Fecppexecl.exe /link cppdllcl.lib
static build does not have this bug
cl /EHsc cppexe.cpp cppdll.cpp -DSTATIC /Fecppdllcls.exe
fix use std::weak_ptr, but std::weak_ptr is not a real singleton.
cl /EHsc /LD cppdll.cpp -DFIX /Fecppdllclfix.dll cl cppexe.cpp /Fecppexeclfix.exe -DFIX /link cppdllclfix.lib
If I understand this correctly, this change makes lld on all other platforms to clean up the default thread pool executor in the call to llvm_shutdown() in exitLld(), before calling _exit(). So it potentially has some performance effect for other platforms.
Btw, it does indeed seem that you can't repro this issue by changing _exit() into exit(), I was mistaken about this.
After looking a bit closer at the repro example, as far as I understand it, the main issue is that you cannot do threading cleanup from within a DLL unload handler (DLLMain), but global objects' destructors within a DLL run within the unload handler.
As long as the main LLVM code is linked within lld.exe, the LLVM threadpool executor destructor doesn't get run (or if changing _exit()into exit(), the destructors run in a context where it is ok to clean up threads), and everything is fine. But if the threadpool code is within LLVM code built as a separate DLL, the global destructor for static ThreadPoolExecutor exec; does run, in DLLMain when unloading the DLL, where you can't do thread cleanup and the destructor subsequently hangs.
So if @ruiu is ok with the performance impact on always running this destructor when exiting lld, this change probably is fine. If not, we'd have to do something like this:
#if defined(_WIN32) && defined(LLVM_LINK_LLVM_DYLIB) // I don't think there's a define for this? static llvm::ManagedStatic<ThreadPoolExecutor> exec; // This destructor can't run on DLL unload where it hangs; make sure it is called already on llvm_shutdown(). #else static ThreadPoolExecutor exec; #endif
I don't think I understand what could run as a result of destructing the thread pool executor on exit. My guess is that it calls the destructors of thread-local global objects and then call the operating system to kill threads. If that's the case, the cost of doing that should be negligible because we don't have any thread-local globals. But is that understanding correct?
I guess so, plus the overhead of actually joining and waiting for all the threads to complete.
Put another way though; lld currently exits with _exit instead of exit explicitly to skip running destructors, for performance reasons. If you have a case where you can measure the difference between these two, to conclude that the distinction is worthwhile, it should be easy to measure the impact of this patch, whether it still keeps the same beneficial tradeoff as before, or if it goes closer to the case of exit with running all destructors, which we explicitly didn't want to.
If it's isn't doable to actually measure any such difference, I would say this patch also should be fine because whatever difference it makes is within the unmeasurable range.
I guess so, plus the overhead of actually joining and waiting for all the threads to complete.
Waiting for all the threads for completion may incur a noticeable delay when exiting in an error condition. Currently, when we exit from one thread running a parallel-for loop, I believe the entire process exit immediately, but if we wait for all thread to complete, I think we'll wait for the parallel-for loop to complete. That situation could happen if we are processing relocations, for example, and found a lot of relocation errors.
Ok, I see.
I think this one might be correct in general though (for any other tool that uses the LLVM internals built into a separate DLL), but for LLD for wanting to exit quickly, should there be a way to tell the ThreadPoolExecutor object to skip any cleanup in the destructor?
I think this one might be correct in general though (for any other tool that uses the LLVM internals built into a separate DLL), but for LLD for wanting to exit quickly, should there be a way to tell the ThreadPoolExecutor object to skip any cleanup in the destructor?
Yeah I guess so. The point of existing as quickly as possible if we decide to exit is to minimize the latency for interactive users, and I think if we can achieve that goal by introducing some mechanism, it's probably worth it.