LLD on 32-bit Windows would frequently fail on large projects with an exception "thread constructor failed: Exec format error". The stack trace pointed to this usage of std::async, and looking at the implementation in libc++ it seems using std::async with std::launch::async results in the immediate creation of a new thread for every call. This could result in a potentially unbounded number of threads, depending on the number of input files. This seems to be hitting some limit in 32-bit Windows host.
I took the easy route, and only use threads on 64-bit Windows, not all Windows as before. I was thinking a more proper solution might involve using a thread pool rather than blindly spawning any number of new threads, but that may have other unforeseen consequences.