Page MenuHomePhabricator

[LLD] [COFF] Avoid thread exhaustion on 32-bit Windows host
ClosedPublic

Authored by jeremyd2019 on Jul 6 2021, 12:28 PM.

Details

Summary

LLD on 32-bit Windows would frequently fail on large projects with an exception "thread constructor failed: Exec format error". The stack trace pointed to this usage of std::async, and looking at the implementation in libc++ it seems using std::async with std::launch::async results in the immediate creation of a new thread for every call. This could result in a potentially unbounded number of threads, depending on the number of input files. This seems to be hitting some limit in 32-bit Windows host.

I took the easy route, and only use threads on 64-bit Windows, not all Windows as before. I was thinking a more proper solution might involve using a thread pool rather than blindly spawning any number of new threads, but that may have other unforeseen consequences.

Diff Detail

Event Timeline

jeremyd2019 requested review of this revision.Jul 6 2021, 12:28 PM
jeremyd2019 created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptJul 6 2021, 12:28 PM

I do not have commit rights, and in fact this is my first attempt to submit a patch to llvm.

mstorsjo added a subscriber: pcc.

Adding @pcc who I think designed this performance optimization back in the day, in case he's still active and interested in commenting on it.

I think this might be a reasonably pragmatic fix for the issue. For the case we care most about, 64 bit, I don't think we would want to use a threadpool - using potentially unbounded number of threads might be pretty much intentional, to get as much parallelism as possible while waiting on IO. But for more limited cases like 32 bit mode, using a thread pool might be a reasonable approximation. This probably would be something of a performance degradation for 32 bit cases, while making it more robust - and spending more code on optimizing that case might not be worth it if it results in lots of code churn. But that depends on the scale of the degradation of course.

@jeremyd2019 - You wouldn't happen to have a reasonbly sized testcase to link lying around that you could try to run and see if the performance difference is measurable (i.e. a built project with e.g. lots of loose object files)? (Running lld often is so fast so it's hard to even measure anything unless you're linking something on the scale of Chromium, but I think this optimization is kinda vital.)

@jeremyd2019 - You wouldn't happen to have a reasonbly sized testcase to link lying around that you could try to run and see if the performance difference is measurable (i.e. a built project with e.g. lots of loose object files)? (Running lld often is so fast so it's hard to even measure anything unless you're linking something on the scale of Chromium, but I think this optimization is kinda vital.)

I'm not quite sure what you are getting at. I know of several projects that crash before with 32-bit LLD but succeed after. Generally when being built with higher -j options.

@jeremyd2019 - You wouldn't happen to have a reasonbly sized testcase to link lying around that you could try to run and see if the performance difference is measurable (i.e. a built project with e.g. lots of loose object files)? (Running lld often is so fast so it's hard to even measure anything unless you're linking something on the scale of Chromium, but I think this optimization is kinda vital.)

I'm not quite sure what you are getting at. I know of several projects that crash before with 32-bit LLD but succeed after. Generally when being built with higher -j options.

Not a testcase for whether this crashes or not, but to see how much the suggested fix regresses linking performance for such projects that do succeed even before this patch.

Not a testcase for whether this crashes or not, but to see how much the suggested fix regresses linking performance for such projects that do succeed even before this patch.

Hmm, I guess I could build qt6-base with the non-patched lld, let it fail, then manually run and time the link (which should succeed in isolation because it only tends to crash with parallel builds), and manually run and time the link with the patched lld.

Not a testcase for whether this crashes or not, but to see how much the suggested fix regresses linking performance for such projects that do succeed even before this patch.

Hmm, I guess I could build qt6-base with the non-patched lld, let it fail, then manually run and time the link (which should succeed in isolation because it only tends to crash with parallel builds), and manually run and time the link with the patched lld.

Thanks, that'd be awesome. Yeah a project on that scale hopefully is big enough where the difference is noticable.

FWIW, see https://github.com/llvm/llvm-project/commit/6ee0b4e9f543fa108b47d3ae7c030d835cacd2c1 for the original commit that added this feature, giving the scale of the speedup that it gave. So it might make the linking time around 30-100% longer than before, roughly, on large projects. Given how fast lld is otherwise I think that might be tolerable (and lld on 32 bit can't link huge projects anyway due to using memory mapping).

I couldn't tell anything useful from that. Both patched and unpatched lld linked QT6Core.dll (which crashed LLD in a parallel build) in about 0.3 seconds.

I couldn't tell anything useful from that. Both patched and unpatched lld linked QT6Core.dll (which crashed LLD in a parallel build) in about 0.3 seconds.

Ok, thanks for testing anyway. Based on the numbers from the old commit from @pcc, it'd be a worst case slowdown of 2x, but most actual cases you can link on 32 bit are so small anyway that you probably run into the memory mapping limit fairly soon anyway, so the cases that can be linked are small enough that it's plenty fast anyway.

So with that I think this is fine, but I'd still like an ack from @rnk.

Yes, I suspect that anything with enough files for this to make a noticeable performance difference would have crashed on 32-bit Windows either because of this or the 32-bit address space being exhausted.

rnk accepted this revision.Jul 7 2021, 7:27 AM

32-bit builds of LLD are not well supported (see aforementioned address space issues), so sure, we can power this optimization down there.

This revision is now accepted and ready to land.Jul 7 2021, 7:27 AM

Great! @jeremyd2019, let me know your preferred git author line (Real Name <email@address>) to use for the commit when I push it?

Great! @jeremyd2019, let me know your preferred git author line (Real Name <email@address>) to use for the commit when I push it?

I suppose "Jeremy Drake <github (at) jdrake.com>"

This revision was automatically updated to reflect the committed changes.