This is an archive of the discontinued LLVM Phabricator instance.

[LLD] Allow disabling the early exit codepath as a build configuration
Abandoned · Public

Authored by mstorsjo on May 18 2021, 5:34 AM.

Details

Reviewers
rnk
MaskRay
Summary

In certain cases, the early exit codepath can be brittle and cause
rare deadlocks: on Windows, even when exiting while bypassing destructors,
all loaded DLLs are still deinitialized normally, running their
destructors. This is known to cause problems when e.g. building with
LLVM_LINK_LLVM_DYLIB (where the destructors of the LLVM Support
threadpool classes are run), and also seems to cause rare (around 0.05% of
executions) hangs if LLD is linked against libc++ in the form of a DLL.

Alternatively, should we disable the early exit codepath altogether
when running on Windows?
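
For illustration, the kind of build-time guard being proposed could look roughly like this; the macro and function names below are invented for this sketch and are not taken from the actual diff (llvm::sys::Process::Exit(int, bool) is the existing entry point that also shows up in the backtraces later in this thread):

  #include "llvm/Support/Process.h"

  // Hypothetical guard; LLD_ENABLE_EARLY_EXIT is an invented name standing in
  // for whatever build configuration option the patch introduces.
  void exitExample(int Ret) {
  #if LLD_ENABLE_EARLY_EXIT
    // Early exit: skip atexit handlers and static destructors entirely.
    llvm::sys::Process::Exit(Ret, /*NoCleanup=*/true);
  #else
    // Early exit disabled: go through the normal C++ shutdown sequence.
    llvm::sys::Process::Exit(Ret, /*NoCleanup=*/false);
  #endif
  }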

Diff Detail

Event Timeline

mstorsjo created this revision. May 18 2021, 5:34 AM
mstorsjo requested review of this revision. May 18 2021, 5:34 AM
Herald added a project: Restricted Project. May 18 2021, 5:34 AM

FWIW, I think the difference is measurable, but it's on the order of 0.6% of the runtime.

Or should we keep the option, but default it to off on Windows?

mati865 added a subscriber: mati865. Edited May 18 2021, 9:10 AM

Funnily enough, I hit this deadlock (LLVM_LINK_LLVM_DYLIB, LLD_DEFAULT_LD_LLD_IS_MINGW and shared libc++) last week in a single LLD unit test. It never reproduced despite dozens of LLD testsuite runs.

Haven't seen it in real use yet.

Funnily enough, I hit this deadlock (LLVM_LINK_LLVM_DYLIB, LLD_DEFAULT_LD_LLD_IS_MINGW and shared libc++) last week in a single LLD unit test. It never reproduced despite dozens of LLD testsuite runs.

Haven't seen it in real use yet.

I see it when running the libcxx tests (which link ~6000 executables), where around 3-6 of those linker jobs get stuck, without LLVM_LINK_LLVM_DYLIB but with libc++ linked dynamically. FWIW, if you want to cherry-pick this in one form or another, you'll need to pick dd7575ba44f0e255e3d3c04bc60648a8d41a18d4 too (which I just went ahead and committed directly, as it is trivial).

rnk added a comment. May 20 2021, 10:05 AM

I'd be OK with this option, but I think it makes more sense to try to disable the thread pool destructor in the support library. Oh, it looks like that has already been done:
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Support/Parallel.cpp#L121

Why isn't that working? I guess the reason it's not working is that the unique_ptr destructor destroys the ThreadPoolExecutor when the LLVM DLL is unloaded, even during _exit:

static std::unique_ptr<ThreadPoolExecutor> Exec(&(*ManagedExec));

So, here is my (possibly terrible) idea: let's just leak the thread pool executor. It really can't be destroyed until the user calls llvm_shutdown to cancel/join the threads. This code would do it, right?

static ThreadPoolExecutor *Exec = &(*ManagedExec);
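
As a standalone illustration of the difference between the two lines above (Widget is a stand-in type, not the actual Parallel.cpp code):

  #include <memory>

  struct Widget {
    ~Widget() {} // stands in for joining/tearing down the pool's threads
  };

  // Variant 1: the static unique_ptr registers a destructor, so when this code
  // lives in a DLL the Widget is destroyed at DLL_PROCESS_DETACH, even if the
  // executable exits via _exit.
  static std::unique_ptr<Widget> Owned(new Widget);

  // Variant 2: deliberately leak. No destructor is ever registered, so nothing
  // runs at shutdown; any cleanup has to happen explicitly (cf. llvm_shutdown).
  static Widget *Leaked = new Widget;

  int main() {}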

I'd be OK with this option, but I think it makes more sense to try to disable the thread pool destructor in the support library. Oh, it looks like that has already been done:
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Support/Parallel.cpp#L121

Why isn't that working?

I think that might be the case when using LLVM_LINK_LLVM_DYLIB, but that's not the case I ran into.

The case I identified is when all of LLVM is linked statically (i.e. its destructors are bypassed, as intended), but libc++ is linked in dynamically. So something in the libc++ DLL contains a destructor which can occasionally cause these hangs. I have no idea offhand what that might be or how it might relate to the LLVM threadpool.

andrewng added a subscriber: andrewng. Edited May 21 2021, 3:37 AM

I looked into this in some detail back in the day and am mainly responsible for how this thread pool currently functions. I primarily focused on static linking with the MSVC static runtime, as that's what we use downstream and were seeing issues with, but also included MinGW, GNU and libc++. To be honest, I can't remember if I tried LLVM_LINK_LLVM_DYLIB, there are just too many possible configurations! The various compilers/runtimes have changed since I last looked at this code, so it's not surprising that there may be issues depending on your configuration.

From past experience, I think it has always been "dangerous" on Windows to exit without waiting/terminating any running threads. If this can't be done, then a "hard" exit, e.g. via ExitProcess(), is usually the only reliable way to finish and this is what we currently still use downstream for the LLD early exit. IIRC, although this may have now changed, the LLD lit testsuite disables early exit for running the tests (at least for ELF) and it was actually this scenario with the MSVC debug static runtime and MS ConcRT (used at that time for the Windows parallel functionality) that was intermittently crashing on exit during lit testing.

The other thing that complicates the situation is that LLD can exit from a thread in the thread pool in certain error cases, i.e. it doesn't always exit from the main thread. This scenario is going to be rare in actual practical use, but AFAIK it is still a path exercised by the LLD testing.

So in terms of how to "fix" this, I would say that disabling early exit for Windows is definitely an option, assuming the impact for high core counts isn't too significant. Or perhaps using ExitProcess() for early exit which has worked for us downstream, at least for static linking with the MSVC static runtime. The trickiest part is really going to be checking any "fix" works for the configurations that people care about.
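
To make the terminology concrete, here is a minimal sketch of the exit flavours being compared; the helper names are invented, and the comments summarize the documented CRT/Win32 behaviour:

  #include <windows.h>
  #include <cstdlib>

  // atexit handlers and static destructors run, then DLLs get DLL_PROCESS_DETACH.
  void softExit(unsigned Code) { std::exit(Code); }

  // Skips this module's atexit handlers/destructors, but loaded DLLs still
  // receive DLL_PROCESS_DETACH during process exit (which is what bites here).
  void fastExit(unsigned Code) { std::_Exit(Code); }

  // Terminates all other threads first, then still sends DLL_PROCESS_DETACH.
  void hardExit(unsigned Code) { ExitProcess(Code); }

  // No DLL_PROCESS_DETACH notifications at all.
  void hardestExit(unsigned Code) { TerminateProcess(GetCurrentProcess(), Code); }

  int main() { hardExit(0); }

The downstream approach described above corresponds to hardExit; the MSYS2 patch mentioned further down corresponds to hardestExit.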

rnk added a comment. May 21 2021, 9:56 AM

I'm in favor of strengthening _exit to ExitProcess. Fewer configs is better. Users won't have to discover these flaky shutdown crashes and add this customization.

rnk added a comment. May 21 2021, 9:58 AM

If ExitProcess still unloads DLLs before exiting, I believe there are ways to call TerminateProcess to avoid this.

FWIW at MSYS2 we build LLVM with both GNU and LLVM toolchains for different "subsystems".
LLD from the GNU toolchain was deadlocking a lot, so we patched it to use TerminateProcess, which worked around all deadlocks except when running tests (the patch probably needs more work). We don't apply it to the LLVM-toolchain LLD because I thought it was a bit overkill.

IIRC ExitProcess was not enough, but that is caused by a GCC emutls bug, so I don't think this configuration should be considered here.

The patch: https://github.com/msys2/MINGW-packages/blob/847e7c0cca7997d8f1b106b6df47825ece22d835/mingw-w64-clang/0305-use-TerminateProcess-instead-of-exit.patch

I looked into this in some detail back in the day and am mainly responsible for how this thread pool currently functions. I primarily focused on static linking with the MSVC static runtime, as that's what we use downstream and were seeing issues with, but also included MinGW, GNU and libc++. To be honest, I can't remember if I tried LLVM_LINK_LLVM_DYLIB, there are just too many possible configurations! The various compilers/runtimes have changed since I last looked at this code, so it's not surprising that there may be issues depending on your configuration.

From past experience, I think it has always been "dangerous" on Windows to exit without waiting/terminating any running threads. If this can't be done, then a "hard" exit, e.g. via ExitProcess(), is usually the only reliable way to finish....

Agreed. ExitProcess is usually the best choice when you don't want to bother with an orderly shutdown. I'd go so far as to say that if you experience a deadlock or other problem with ExitProcess, the code likely has a bug or at least a questionable design choice. For example, hanging in a destructor in DLL_PROCESS_DETACH is akin to throwing an exception from a destructor: you shouldn't do that.

The other thing that complicates the situation is that LLD can exit from a thread in the thread pool in certain error cases, i.e. it doesn't always exit from the main thread. This scenario is going to be rare in actual practical use, but AFAIK it is still a path exercised by the LLD testing.

I could see how that might make even ExitProcess a risky proposition if there are any waits on synchronization objects happening during DLL_THREAD_DETACH. Is this a well understood scenario? If the DLL_THREAD_DETACH destructor problems are hard to fix, would it be feasible to have the thread pool thread signal the main thread to have _it_ call ExitProcess?

Windows has evolved the shutdown rules to reduce the risk of deadlocking in DLL_THREAD_DETACH, but it's not perfect. https://devblogs.microsoft.com/oldnewthing/20100122-00/?p=15193

I could see how that might make even ExitProcess a risky proposition if there are any waits on synchronization objects happening during DLL_THREAD_DETACH. Is this a well understood scenario? If the DLL_THREAD_DETACH destructor problems are hard to fix, would it be feasible to have the thread pool thread signal the main thread to have _it_ call ExitProcess?

Having the threads signal the main thread and make the main thread call ExitProcess is pretty much the same as disabling the early exit mechanism, i.e. what this patch does (behind a configure option).

(The early exit flag does two things: First, it exits from any thread as soon as the error limit is exceeded, instead of waiting until the main thread checks whether it should exit. Secondly, when exiting - both from the main thread and from worker threads - it decides whether to skip destructors. If we exit from a non-main thread we really must not run destructors. If we exit from the main thread we could run all of them, but bypassing destructors in that case too shaves off some fraction of the runtime and keeps the destructor bypass mechanism better exercised.)
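
As a minimal sketch (with invented names, not LLD's actual error-limit handling), the "signal the main thread" alternative would look roughly like this, which is why it is equivalent to not early-exiting at all:

  #include <atomic>
  #include <cstdlib>
  #include <thread>
  #include <vector>

  static std::atomic<bool> FatalErrorSeen{false};

  static void workerTask() {
    // On hitting the error limit, only record it instead of exiting here.
    FatalErrorSeen = true;
  }

  int main() {
    std::vector<std::thread> Pool;
    for (int I = 0; I < 4; ++I)
      Pool.emplace_back(workerTask);
    for (std::thread &T : Pool)
      T.join(); // the main thread only acts after all workers are joined...
    if (FatalErrorSeen)
      std::exit(1); // ...so destructors can run safely, but nothing exits early
    return 0;
  }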

While exiting from a non-main thread is tricky, things also hang when exiting from the main thread while bypassing destructors. The fatal issue there is that we have a number of threads still running, which can hold various locks. When exiting, all other threads are stopped where they are, and we run the destructors for DLLs.

The hangs on exit look like this:

* frame #0: 0x00007ffe068ae087 ntdll.dll`RtlAllocateHeap + 4215
  frame #1: 0x00007ffe068ad997 ntdll.dll`RtlAllocateHeap + 2439
  frame #2: 0x00007ffe029baeee ucrtbase.dll`_calloc_base + 78
  frame #3: 0x00007ffe029f94bc ucrtbase.dll`__intrinsic_setjmpex + 6012
  frame #4: 0x00007ffe029d142d ucrtbase.dll`_configthreadlocale + 13
  frame #5: 0x00007ffde79be2b7 libc++.dll`std::__1::ios_base::Init::~Init() + 167
  frame #6: 0x00007ffde79f8d5a libc++.dll`std::__1::strstreambuf::operator=(std::__1::strstreambuf&&) + 1418
  frame #7: 0x00007ffde79c7751 libc++.dll`std::__1::codecvt<wchar_t, char, _Mbstatet>::do_unshift(_Mbstatet&, char*, char*, char*&) const + 81
  frame #8: 0x00007ffde79beda9 libc++.dll`std::__1::ios_base::Init::~Init() + 2969
  frame #9: 0x00007ffde79b0016 libc++.dll`std::__1::basic_ostream<wchar_t, std::__1::char_traits<wchar_t> >::flush() + 86
  frame #10: 0x00007ffde79be1dc libc++.dll`std::__1::ios_base::Init::Init() + 124
  frame #11: 0x00007ffe029c3e26 ucrtbase.dll`_execute_onexit_table + 342
  frame #12: 0x00007ffe029c3d4b ucrtbase.dll`_execute_onexit_table + 123
  frame #13: 0x00007ffe029c3d04 ucrtbase.dll`_execute_onexit_table + 52
  frame #14: 0x00007ffde799109e libc++.dll`_CRT_INIT(hDllHandle=<unavailable>, dwReason=<unavailable>, lpreserved=<unavailable>) at crtdll.c:130:11
  frame #15: 0x00007ffde7991311 libc++.dll`__DllMainCRTStartup(hDllHandle=0x00007ffde7990000, dwReason=0, lpreserved=0x0000000000000001) at crtdll.c:195:6
  frame #16: 0x00007ffe068e8f07 ntdll.dll`RtlAnsiStringToUnicodeString + 663
  frame #17: 0x00007ffe068e35bc ntdll.dll`LdrShutdownProcess + 300
  frame #18: 0x00007ffe06906dcd ntdll.dll`RtlExitUserProcess + 173
  frame #19: 0x00007ffe042fd62a kernel32.dll`ExitProcess + 10
  frame #20: 0x00007ffe029d00a4 ucrtbase.dll`exit + 468
  frame #21: 0x00007ffe029cff4f ucrtbase.dll`exit + 127
  frame #22: 0x00000001400d2679 ld.lld.exe`_Exit + 9
  frame #23: 0x00000001400d265f ld.lld.exe`llvm::sys::Process::Exit(int, bool) + 31
  frame #24: 0x000000014003beac ld.lld.exe`lld::exitLld(int) + 460
  frame #25: 0x000000014004ad55 ld.lld.exe`lld::coff::link(llvm::ArrayRef<char const*>, bool, llvm::raw_ostream&, llvm::raw_ostream&) + 565
  frame #26: 0x00000001400ad85e ld.lld.exe`lld::mingw::link(llvm::ArrayRef<char const*>, bool, llvm::raw_ostream&, llvm::raw_ostream&) + 20046
  frame #27: 0x0000000140001f08 ld.lld.exe`lldMain(int, char const**, llvm::raw_ostream&, llvm::raw_ostream&, bool) + 1608

(The top two frames, viewed from WinDbg instead of lldb, are ntdll!RtlpLowFragHeapAllocFromContext and ntdll!RtlpAllocateHeapInternal.)

So while libc++ ideally wouldn't be calling _configthreadlocale when cleaning up, we're in a pretty bad place in any case if RtlAllocateHeap hangs.

I'd go so far as to say that if you experience a deadlock or other problem with ExitProcess, the code likely has a bug or at least a questionable design choice.

Well, the questionable design choice in this case is that LLD wants to exit while lots of threads are still "running"/not joined. This is designed to work by not running any destructors at all, but when there can be DLL boundaries in multiple places (around libc++, or around libLLVM) we end up running some (but not all) destructors.

I'll go ahead and push D102944, which should make sure that no destructors whatsoever are called - hopefully that will put an end to these issues. If not, we might need to just disable the early exit mechanism altogether on Windows and avoid the whole questionable design that way.

mstorsjo abandoned this revision. May 22 2021, 1:54 PM

Superseded (for now) by D102944 unless that one turns out to cause other problems in practice.

While exiting from a non-main thread is tricky, things also hang when exiting from the main thread while bypassing destructors. The fatal issue there is that we have a number of threads still running, which can hold various locks. When exiting, all other threads are stopped where they are, and we run the destructors for DLLs.

I guess I misread a few sources. ExitProcess first terminates all threads other than the calling thread. I expected that would force the terminated threads to release all of their synchronization objects, thus preventing many types of deadlocks. But now I see that the synchronization objects are not released until later--when all of the process's kernel objects are reclaimed.

I assumed the short circuit option would just skip running destructors for statics in the DLLs (by having the DLLs do nothing on DLL_PROCESS_DETACH), but I see in the stack trace that it's libc++'s DllMain, which is outside our scope. Bummer.

Of course, the real mind bender is wondering why the "universal" CRT needs to setjmp (?!) (and therefore allocate memory) during shutdown. Yikes!
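
For reference, the "do nothing on DLL_PROCESS_DETACH" idea hinges on DllMain's lpReserved argument, which is non-null when the whole process is terminating. A minimal sketch of that pattern follows; it would have to live in the DLL's own CRT/DllMain (e.g. libc++'s), which is exactly the part that is outside our scope here:

  #include <windows.h>

  BOOL WINAPI DllMain(HINSTANCE /*Inst*/, DWORD Reason, LPVOID Reserved) {
    if (Reason == DLL_PROCESS_DETACH && Reserved != nullptr) {
      // The process is terminating (ExitProcess/_exit path): skip running
      // static destructors and other cleanup, since suspended threads may
      // have left locks and heap state inconsistent.
      return TRUE;
    }
    // FreeLibrary path (Reserved == nullptr): orderly cleanup would go here.
    return TRUE;
  }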

I assumed the short circuit option would just skip running destructors for statics in the DLLs (by having the DLLs do nothing on DLL_PROCESS_DETACH), but I see in the stack trace that it's libc++'s DllMain, which is outside our scope. Bummer.

Of course, the real mind bender is wondering why the "universal" CRT needs to setjmp (?!) (and therefore allocate memory) during shutdown. Yikes!

It's not the UCRT itself that needs setjmp during shutdown, it's libc++ that calls _configthreadlocale (which IIRC is used to emulate functions that take a locale parameter). I'm not sure which call within libc++ it is, though. The root cause remains in any case - running threads that are abruptly stopped can leave things in an unclean state, and the only really safe way to exit is to not let things run destructors.

Just out of curiosity, here's the same backtrace with libc++.dll built with debug info:

* frame #0: 0x00007ffa21a9dd08 ntdll.dll`RtlAllocateHeap + 3320
  frame #1: 0x00007ffa21a9d997 ntdll.dll`RtlAllocateHeap + 2439
  frame #2: 0x00007ffa1e40aeee ucrtbase.dll`_calloc_base + 78
  frame #3: 0x00007ffa1e4494bc ucrtbase.dll`__intrinsic_setjmpex + 6012
  frame #4: 0x00007ffa1e42142d ucrtbase.dll`_configthreadlocale + 13
  frame #5: 0x00007ffa10accee7 libc++.dll`std::__1::__libcpp_locale_guard::__libcpp_locale_guard(this=0x00000000013fdfb0, __l=locale_t @ 0x00000000013fdfc0) at __locale:75:18
  frame #6: 0x00007ffa10b01e9a libc++.dll`wcrtomb_l(s="", wc=L'\0', ps=0x00007ffa10b923c0, loc=<unavailable>) at locale_win32.cpp:58:27
  frame #7: 0x00007ffa10ad5941 libc++.dll`std::__1::codecvt<wchar_t, char, _Mbstatet>::do_unshift(this=<unavailable>, st=<unavailable>, to=<unavailable>, to_end="XzM\U00000001", to_nxt=<no value available>) const at locale.cpp:1673:16
  frame #8: 0x00007ffa10acd9e9 libc++.dll`std::__1::__stdoutbuf<wchar_t>::sync() [inlined] std::__1::codecvt<wchar_t, char, _Mbstatet>::unshift(this=<unavailable>, __st=<unavailable>, __to_end="XzM\U00000001", __to_nxt=<no value available>) const at __locale:1022:16
  frame #9: 0x00007ffa10acd9d8 libc++.dll`std::__1::__stdoutbuf<wchar_t>::sync(this=0x00007ffa10b92360) at __std_stream:334
  frame #10: 0x00007ffa10abea56 libc++.dll`std::__1::basic_ostream<wchar_t, std::__1::char_traits<wchar_t> >::flush() [inlined] std::__1::basic_streambuf<wchar_t, std::__1::char_traits<wchar_t> >::pubsync(this=<unavailable>) at streambuf:167:28
  frame #11: 0x00007ffa10abea50 libc++.dll`std::__1::basic_ostream<wchar_t, std::__1::char_traits<wchar_t> >::flush(this=0x00007ffa10b91ef8) at ostream:951
  frame #12: 0x00007ffa10acccec libc++.dll`::__dtor__ZZNSt3__18ios_base4InitC1EvE16init_the_streams() at iostream.cpp:141:16
  frame #13: 0x00007ffa10acccd4 libc++.dll`::__dtor__ZZNSt3__18ios_base4InitC1EvE16init_the_streams() at iostream.cpp:0
  frame #14: 0x00007ffa1e413e26 ucrtbase.dll`_execute_onexit_table + 342
  frame #15: 0x00007ffa1e413d4b ucrtbase.dll`_execute_onexit_table + 123
  frame #16: 0x00007ffa1e413d04 ucrtbase.dll`_execute_onexit_table + 52
  frame #17: 0x00007ffa10aa109e libc++.dll`_CRT_INIT(hDllHandle=<unavailable>, dwReason=<unavailable>, lpreserved=<unavailable>) at crtdll.c:130:11
  frame #18: 0x00007ffa10aa1311 libc++.dll`__DllMainCRTStartup(hDllHandle=0x00007ffa10aa0000, dwReason=0, lpreserved=0x0000000000000001) at crtdll.c:195:6